The search for a C# DAL code generator that actually works. Part 3: the solution.

September 4, 2008

In part 2 of this series, I outlined my requirements for an ideal DAL generator.

Well, not having found any tool out there that meets my criteria, I ended up writing the tool myself, meeting all but one of my objectives [I dont handle stored procedures yet].
 
Sql Server 2005 is the target DBMS for now, using VS2005 as the compiler.

Lets walk through how to use the tool.

  • From the command line, the tool is invoked like this.
mycodegentool.exe config.xml


  • The simplest config.xml file is
<CodeGen>
 <Directory>D:\development\codegenExample\myDatabase</Directory>
 <ConnectionString>Data Source=.;Initial Catalog=myDatabase;
    Integrated Security=SSPI;</ConnectionString>
 <ConcurrencyColumn>update_date</ConcurrencyColumn>
<CodeGen>


This will give you a DAL dll with every table and view of the database in it, and handle concurrency for any table that that has an update_date field.


  • To get a unit test dll as well, add this xml fragment to the config file.
<UnitTest>
 <Directory>D:\development\codegenExample\myDatabase\CodeGenUnitTest</Directory>
</UnitTest>


This takes care of the plumbing. Now reference the DAL dll from your business layer dll, and happy porcelain coding.

Here are a few use case scenarios in actually using the DAL from your business layer.

  • Load up a table.
ProductsFactory factory = new ProductsFactory();
List rows = factory.LoadAll( connection ) ;
Console.WriteLine( rows.Count ) ;


  • Quicker way to see have many rows in a table.
ProductsFactory factory = new ProductsFactory();
int rowcount = factory.GetRowCount( connection ) ;


  • Load up a specific row from a table using the primary key.
int id = 17 ;
ProductsFactory factory = new ProductsFactory();
Products product = factory.LoadByPrimaryKey( connection, id ) ;


  • Load up a set of rows from a table based on custom search criteria.
bool discontinued = true ;
ProductsFactory factory = new ProductsFactory();
List rows = factory.LoadByDiscontinued( connection, discontinued ) ;


To add custom search criteria to a table or view, add this xml fragment to the config file.


<Tables>
 <Table Class="Products" >
  <Methods>
   <Method Name="LoadByDiscontinued" where="LoadByDiscontinued.sql">
     <Parameters>
      <Parameter Name="discontinued" Type="bool?" />
     </Parameters>
    </Method>
  </Methods>
 </Table>
<Tables>


where the contents of the sql file are


where discontinued = @discontinued order by productname


  • Get a row’s parent row.
Here we are walking the employee hierarchary using ootb generated methods - cool.


EmployeesFactory factory = new EmployeesFactory();

// robert - the pleb
int? employeeId = 7;
Employees robert = factory.LoadByPrimaryKey( connection, employeeId);
Assert.AreEqual( robert.FirstName, "Robert" ) ;

// robert reports to steven
Employees steven = robert.parentEmployeesByReportsTo( connection ) ;
Assert.AreEqual(steven.FirstName, "Steven");

// steven reports to andrew
Employees andy = steven.parentEmployeesByReportsTo( connection);
Assert.AreEqual(andy.FirstName, "Andrew");

// andrew is the big cheese
Employees nevilleNobody = andy.parentEmployeesByReportsTo(connection);
Assert.IsNull(nevilleNobody);


  • Get a row’s children rows.
Here we start with an order, then go up to get its customer, then go down to get all the orders of this customer - powerful pattern in a few lines of code.


OrdersFactory factory = new OrdersFactory();
int orderId = 10289 ;

// get an order
Orders order = factory.LoadByPrimaryKey( connection, orderId);

// the customer for this order
Customers customer = order.parentCustomersByCustomerID( connection) ;

// all the orders of this customer
List orderList = customer.childOrdersByCustomerID( connection ) ;


  • Load up from a view.
Same as loading from a table.


Alphabetical_list_of_productsFactory factory = new Alphabetical_list_of_productsFactory();
List rows = factory.LoadAll( connection ) ;


  • Load up from a custom query.
Same as loading from a table, with some extra config.


CustomerOrderFactory factory = new CustomerOrderFactory();
List rows = factory.LoadByRegion( connection, "Quebec" );
List rows2 = factory.LoadByCustomer( connection, "quick" ) ;


The config in this case would be

<Queries>
 <Query Class="CustomerOrder" CodeFile="CustomerOrder.cs" Select="CustomerOrder.sql" >
  <Methods>
   <Method Name="LoadByRegion" Where="LoadByRegion.sql">
    <Parameters>
     <Parameter Name="region" Type="string" />
   </Parameters>
  </Method>
 </Methods>
  <Method Name="LoadByCustomer" Where="LoadByCustomer.sql">
   <Parameters>
    <Parameter Name="customerId" Type="string" />
   </Parameters>
  </Method>
 </Query>
</Queries>


Here we have a customer order view, for display on a web page data grid for example.

The CustomerOrder.sql file contains

select * from customers c, orders o

whilst the LoadByCustomer method has the following criteria in the config sql file reference

where c.customerid = o.customerid
and c.customerid = @customerid
order by c.companyname, o.orderdate

and the LoadByRegion method’s where clause is

where c.customerid = o.customerid
and isnull(c.region, '') = isnull(@region, '')
order by c.companyname, o.orderdate


  • Insert a new row
OrdersFactory factory = new OrdersFactory();
Orders order = new Orders() ; // create a new row in memory
order.CustomerID = 123;
order.EmployeeID = 7 ;
order = factory.Save( order, connection ) ; // does the insert


  • Update an existing row
OrdersFactory factory = new OrdersFactory();
int orderId = 10289 ;
Orders order = factory.LoadByPrimaryKey( connection, orderId); // get an order
order.ShipPostalCode = "2565"
order = factory.Save( order, connection ) ; // does the update


  • Insert then update a row
OrdersFactory factory = new OrdersFactory();

Orders order = new Orders() ; // create a new row in memory
order.CustomerID = 123;
order.EmployeeID = 7 ;
order = factory.Save( order, connection ) ; // does the insert

order.ShipPostalCode = "2565"
order = factory.Save( order, connection ) ; // does the update


  • Delete an existing row
OrdersFactory factory = new OrdersFactory();

int orderId = 10289 ;
Orders order = factory.LoadByPrimaryKey( connection, orderId); // get an order

order.MarkForDelete = true ;
factory.Save( order, connection ) ; // does the delete


  • Negative use case - get a concurrency error.
TestTableFactory factory = new TestTableFactory();

TestTable alice = factory.LoadByPrimaryKey( connection, 12345 ); // get a row
TestTable bob = factory.LoadByPrimaryKey( connection, 12345 ); // get it again

alice.update_date = DateTime.Now ;
bob.update_date = DateTime.Now ;

alice = factory.Save( alice, connection ) ; // does the save ok
bob = factory.Save( bob, connection ) ; // will throw a concurrency exception


  • Negative use case - get a save error when saving a stale object
OrdersFactory factory = new OrdersFactory();

int orderId = 10289 ;
Orders order = factory.LoadByPrimaryKey( connection, orderId); // get an order

order.ShipPostalCode = "2565"
factory.Save( order, connection ) ; // does the update - ok

order.ShipPostalCode = "2000"
factory.Save( order, connection ) ; // will throw a stale row exception


  • Many users simultaneously inserting rows into a table.
In this example, a total of 5000 rows will be inserted into a table by 50 threads running in parallel. The table in question has a monotonically increasing non-identity integer ID primary key field, and therefore it is a contention hotspot.


In the first version of this use case, the inserts fail, as the code and sql that determine the next primary key value are not protected against concurrent threads. The test fails with the error “Violation of PRIMARY KEY constraint ‘PK_TestTable3′. Cannot insert duplicate key”.

protected void InsertLotsOfRowsInParallel()
{
 using ( MyThreadPoolManager tpm = new MyThreadPoolManager( 50, 5000 ))
 {
  for (int i = 1; i <= rowsToInsert; i++)
   tpm.Queue( InsertOneRow() ) ;
  tpm.WaitUntilAllFinished() ;
 }
}
// every invocation of this method runs in its own thread
protected void InsertOneRow()
{
 using (TransactionScope transaction =
         new TransactionScope(TransactionScopeOption.RequiresNew))
 {
  using (SqlConnection connection = new SqlConnection( _connectionString ) )
  {
   connection.Open();
   TestTable3Factory factory = new TestTable3Factory();
   TestTable3 obj = new TestTable3();
   // BAD: get the next available ID = max(ID) + 1
   obj.ID = select max(ID) + 1 from the table
   obj.update_date = DateTime.Now;
   // insert the row - this will fail with PRIMARY KEY constraint violation
   obj = factory.Save(obj, connection);
   transaction.Complete(); // commit - wont get here
  }
 }
}



In the correct version of this use case, the codegen tool has ootb capability to handle this common scenario. The tool will escalate the lock around the next primary key value and serialise access to the table, giving everyone a warm fuzzy feeling.



// every invocation of this method runs in its own thread
protected void InsertOneRow()
{
 using (TransactionScope transaction =
         new TransactionScope(TransactionScopeOption.RequiresNew))
 {
  using (SqlConnection connection = new SqlConnection( _connectionString ) )
  {
   connection.Open();
   TestTable3Factory factory = new TestTable3Factory();
   TestTable3 obj = new TestTable3();
   // GOOD: the tool will get the next available ID, the safe way
   obj.ID = factory.GetNextId( connection, TestTable3.Column_ID, true );
   obj.update_date = DateTime.Now;
   // insert the row - now it works
   obj = factory.Save(obj, connection);
   transaction.Complete(); // commit - aha
  }
 }
}


  • Here are a few .NET Reflector screen dumps of the DAL layer that the tool generates for Northwind.
northwind row

This is what the Orders row object looks like.


northwind factory

And the corresponding factory object for the Orders table.


northwind custom query

Here we have a custom query object that is a join between Customers and Orders tables.


northwind custom query factory

And the corresponding factory object.


northwind DAL code gen unit test

This screen shot shows some of the generated code involved in unit testing the DAL. Here, in the insert phase, each table is being populated and saved in referential integrity order, with pre and post row counts, object state, and field values being asserted.


  • One of the databases that I have recently codegened is the popular DotNetNuke database.
You can download my DAL for DNN here and also download the corresponding unit test dll here.
I recommend .NET Reflector to inspect the generated code.


Contact Buildmasters if you would like a DAL generated for your database.


  • This is the to do list for the tool
  • Make sure it compiles and works under Mono and Debian Linux.
  • Make sure it compiles and runs under VS2008.
  • Extend to target SqlServer 2008, MySql, Oracle databases.
  • Extend to handle stored procedures.


So far I have used the tool against 13 corporate, vendor and public databases, and it has my made my programming experience easier, faster, and more reliable.


I do not think about the DAL layer at all now - it just works.

The search for a C# DAL code generator that actually works. Part 2: the requirements.

September 4, 2008

Now that I have outlined the reasons why current DAL tools do not satisfy me, I will list my requirements for the ideal DAL code gen tool.

The tool should:

  • Generate the DAL for the whole database in one hit, not table by table, generating all non-system tables and views out of the box.
  • Have a simple pleasant configuration file experience.
    Ideally, the minimum configuration required would be a connection string to the database, and a directory path to which files are generated.
  • Be able to support the addition of custom query objects and custom criteria clauses, and to do so in an easy manner.
  • Create the DAL in a separate dll, so that the source code of my business rules layer contains only business rules code.
  • Be a command line tool, so I can run it as part of my build process, and via cruise control, regenerating the DAL every night and on-demand during the day, keeping it in synchronised with the underlying database schema.
  • Create objects that do not hold any database locks.
    Therefore all objects returned from the DAL are to be disconnected.
  • Be designed with concurrency in mind from the ground up.
    No joke, the vast majority of systems that I have encountered over the last two decades have not taken concurrency into account.
    Consider this scenario: Alice and Bob both read row A from a table in the database.
    Alice modifies her copy of row A and saves it back to the database.
    Bob then changes his copy and successfully saves it back.
    Congratulations, your database is now corrupt.
    This is incorrect behaviour, as Bob should have received a concurrency exception when he attempted to save his stale row.
    The DAL tool should support this out of the box.
    This is plumbing - the programmer should not have to worry about it.
    Concurrency handling is especially important now that we live a disconnected data set world.
  • Uses nullable types. I do not want to explicitly check for nulls before I get or set a field.
    The tool should do it.
  • Not use events.
  • Be designed with database transactions in mind from the ground up, and should work with implicit transactions via TransactionScope, or explicit BYO transactions.
  • Generate clean, understandable code that is easy to follow and debug.
  • Generate deterministic predictable code, so that small schema changes do not cause large diffs.
  • Use the referential integrity of the database itself to automatically generate accessor methods on objects that allow you to walk the database tree.
    These accessors to parent and child objects are to be lazy populated, ie when the accessor is invoked..
  • Have a simple object model using the factory pattern.
    TRow : RowBase , List<TRow>, TFactory: FactoryBase.
  • At buildmasters, we have a very strong test ethic.
    Even generated code needs to be tested.
    The tool should auto generate unit tests to code cover the generated DAL code.
    The unit tests should perform the following tasks in one database transaction:

    • Insert a row into every table, populating each field, read the rows back, and assert the row counts and field values.
    • Update the inserted rows, changing the values of each field, read the rows back, and assert the row counts and field values.
    • Delete the inserted rows, and assert the row counts.
    • Rollback the transaction [unit tests should leave no footprint in the database].
  • Have the following persistance patterns:
    1. Successfully persisting a row object invalidates the object, so it cannot be saved again, and returns the latest version of the row in a new object.
    2. Persisting an unmodified row object does nothing.
    3. If a table does not have a primary key, its row objects cannot to be persisted.
      This is a classic case of bad database design. Every table needs a primary key.
    4. A standard persistance mode that follows the rules of database integrity:
      A new memory row object can only be inserted, not updated, based on primary key.
      A row object that is read from the database can only be updated or deleted, based on primary key and concurrency cookie, and only if the concurrency cookie is still valid.
    5. An ETL persistence mode:
      Persisting a row object involves trying an update, and failing that, an insert is attempted, all based on primary key.
      The concurrency flag is ignored.
      This mode of operation is used in batch ETL type tasks where you just want to transfer or inject a whole bunch of rows in to a table, whether that table is in the same database or not, and maybe on a different connection.
  • Provide a DAL wrapper around legacy stored procedures.
    Im not a big fan of TSQL [see my post here ].
    I have seen many instances of whole systems that have been written in TSQL, where the business rules have been buried away in unmaintainable several-thousand-line TSQL stored procs, as opposed to having the business rules in a proper SOA/business tier written in a high level language like C#.
    It is a pain to debug TSQL and it does not scale.
    But sometimes you will inherit such a mess, so at least the DAL tool can hide it behind a nice C# interface.

In part 3 of this article, I will outline a tool that meets these requirements.

The search for a C# DAL code generator that actually works. Part 1: the problem.

September 4, 2008

Besides developers, the database is your company’s greatest asset.

The data in your database, your ability to get at it, shape it, analyse it, report on it, turn it in to understandable meaningful information, is what gives your company competitive advantage.

I have been designing, using, optimising and programming against relational databases for over 20 years. DB2, Oracle, SqlServer, Sybase, Access, and MySql are some of the major relational databases engines that I have worked with, and DAO, ADO, ADO.NET, visual studio dataset designer, nHibernate, CSLA, codesmith, and LINQ are some of the tools and DAL technologies that I have used to get at the those databases.

In all that time, I have been frustrated with all of the above DAL methods of accessing data, for various reasons that I will discuss below.

I found that I was spending too much time stuffing around trying to placate or work around the DAL technology and not enough time just getting the job done.

Developers should spend their time writing the porcelain (the business rules that are specific to solving a business problem that cannot be generated by a tool) and very little time writing and working around with the plumbing (the repeatable monkey code infrastructure that can be generated by a tool).

Let us take the visual studio dataset designer for example.
The visual studio DAL tool is fairly good, but I have several issues with it which have caused me a great considerable frustration .

  • For one, it does things one table at a time. This is very cumbersome when one has 100+ tables in a database that needs to be DALd.
  • It is also very slow. When I right click on the designer, my pc freezes for several seconds until the context menu pops up. FWIW I have a high spec dev box.
  • The generated code does not compile in SQLCLR safe mode.
    Why can’t something I generate in visual studio work in SQLCLR safe mode?
    Hello Microsoft. Visual studio and Sql Server are both your products.
    The problem is that the designer inserts a bunch of delegates and events into the generated code, and events use synchronisation objects, which are forbidden in SQLCLR safe mode, and there is no way to prevent generation of these events every time you touch the designer.
    Who actually uses those events anyway?
    Am I the only person that thinks that events are overrated and over used.
    For app support teams, events are a nightmare to debug.
    One needs innate familiarity with the code to visualise the execution flow, even when using a debugger.
    In my experience, events have been more trouble than their worth over the years.
  • Null checking. The code generated does not use nullable types for some reason, so every time I get or set a property, I have to write code to first test for null, otherwise the dataset code will kindly throw an exception.
    I just want to set a property – 1 line of code, not 5, please.
  • Also, after your program is complied, you don’t know if or when the underlying database schema has changed. Guess when you find out – runtime!
    I want the DAL layer to be in sync with the actual underlying database schema at all times.
    If I had a command line interface to the visual studio dataset designer, I could regenerate the DAL layer through my continuous build process and have confidence in the accuracy of my DAL.
  • Another annoyance is that sometimes the designer changes column orders, seemingly at random, and a small change in the designer causes illogical reordering in the designer.cs file, making diffs hard to read.
  • The typed data adapter classes do not expose their contained SqlCommand objects publicly, so I cannot associate an explicit transaction to them.

The other data access methods don’t do the job for me either:

  • CSLA: the version I used was based on XSLT templates – OMG -–enough said.
  • nHibernate: getting closer, doesn’t handle my requirements see below.
  • LINQ: ditto.

In part 2 of this article, I will outline my requirements for an ideal DAL code generator.