Thursday, 23 June 2011

Magic Unicorn and attachment disorders

As previously noted, I’m a big fan of Entity Framework 4.1 – but it does have a few shortcomings/quirks that can take a bit of head scratching to overcome.  One of its features that is useful and annoying in equal measure is the ability to persist whole layers of complex objects in a single call.  This is great because…it can do it in a single call!  But there are scenarios where automatic persistence of everything is less than desirable.  A good example of this is lookups – let’s take Northwind’s Products table:

northwindproducts

The SupplierID and CategoryID fields are foreign keys (to the Supplier and Category tables).  If we were to create an object model around the product table, then we would probably create separate classes for the suppliers and categories:

public class Supplier
{
    public int Id { get; set; }
    public string CompanyName { get; set; }
    // ...other fields
}

public class Category
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Description { get; set; }
}

public class Product
{
    public int Id { get; set; }
public string Name { get; set; } public virtual Supplier Supplier { get; set; } public virtual Category Category { get; set; } public string QuantityPerUnit { get; set; } public decimal? UnitPrice { get; set; } public short? UnitsInStock { get; set; } public short? UnitsOnOrder { get; set; } public short? ReorderLevel { get; set; } public bool Discontinued { get; set; } }

However if we run this through Magic Unicorn and set the Supplier and/or Category property of the Product object to a supplier and/or category that isn’t already in the database, then EF 4.1. will helpfully create new rows in the appropriate tables for us:

using (var context = new NorthwindContext())
{
    var supplier = new Supplier()
    {
        CompanyName = "Random Co"
    };

    var category = new Category()
    {
        Name = "Expensive stuff",
        Description = "Things that cost a lot of money"
    };

    var product = new Product() 
    { 
        Name = "Money burner",
        Category = category,
        Supplier = supplier, 
        QuantityPerUnit = "lots of boxes", 
        UnitPrice = 20,
        UnitsInStock = 500,
        UnitsOnOrder = 0,
        ReorderLevel = 10,
        Discontinued = false
    };

    context.Products.Add(product);
    context.SaveChanges();
}

Whether this is a good thing depends on whether the “dependent” object being created is specific to the parent or just a general lookup value.  In this case it’s a one-to-many with category as the principal (i.e. there can be many products for each category) so it’s much better practice to separate the processes of persisting products, suppliers, and categories.  As an aside I should highlight what happens if we use values already in the database for the category and supplier:

using (var context = new NorthwindContext())
{
    var supplier = context.Suppliers.FirstOrDefault();
    var category = context.Categories.FirstOrDefault();

    var product = new Product() 
    { 
        Name = "Money burner",
        Category = category,
        Supplier = supplier
        // other fields...
    };

    context.Products.Add(product);
    context.SaveChanges();
}

As one would expect, Entity Framework patches up the references correctly and everything saves fine with no duplication of the ‘lookup’ values.  This is because when entities are retrieved/deserialised, Entity Framework actually returns a proxy (rather than the POCO).  It does this so that it can do all sorts of clever interception to support change tracking.  It uses this ‘special magic’ to work out whether objects already relate to rows in the database so that it doesn’t duplicate them – in fact it’s key (no pun intended) to the way that a good OR/M should work.

So far, so good.  What happens if we stick on a few data contract attributes and expose those POCOs/entities over WCF?  Well, once we get past the data contract resolver problem (for getting the objects out of the database and over SOAP), we hit the issue of sending back real POCOs (instead of proxies) and trying to match them up with database rows.  This can lead to some interesting errors:

Violation of PRIMARY KEY constraint 'PK_xxx'. Cannot insert duplicate key in object 'MyTable.

What this is saying is that there is already an object with the same key, but the one being persisted isn’t it.  So EF tries to create the object again but fails because it already exists.  So what we need to do is tell EF that “the object I’m giving you is the same as the database row with the same key”.  After unsuccessfully trying to use the Attach() method of the data context, which produced this error:

An object with the same key already exists in the ObjectStateManager. The ObjectStateManager cannot track multiple objects with the same key

I still don’t understand why this happens – I thought that the Attach() method was supposed to make the DbContext recognise the object.  After much gnashing of teeth and wailing around stuff to do with the entity key, I realised that the easiest thing to do was to replace the object with the actual object from the database context.  But in cases whether the object I’m trying to retrieve doesn’t exist, I want to know about it.  To this end (and with a little help from Kevin McNeish’s blog) I wrote the GetEntityForObject() extension method:

public static class DataContextExtensions
{
    #region Public methods

    public static T GetEntityForObject<T>(this DbContext context, T value)
        where T : class, new()
    {
        T castEntity = null;

        if (value != null)
        {
            var adapter = context as IObjectContextAdapter;
            Debug.Assert(adapter != null);

            var objectContext = adapter.ObjectContext;
            string entitySetName = context.GetEntitySetName(typeof(T));

            try
            {
                var key = objectContext.CreateEntityKey(entitySetName, value);
                object entity = null;
                bool isExists = objectContext.TryGetObjectByKey(key, out entity);

                if (isExists)
                {
                    Debug.Assert(entity != null);
                    castEntity = entity as T;
                    Debug.Assert(castEntity != null);
                }
            }
            catch { }
        }

        return castEntity;
    }

    public static string GetEntitySetName(this DbContext context, Type type)
    {
        var adapter = context as IObjectContextAdapter;
        Debug.Assert(adapter != null);

        var objectContext = adapter.ObjectContext;
        string entitySetName = GetEntitySetName(type, objectContext);

        return entitySetName;
    }

    #endregion

    #region Private methods
        
    private static string GetEntitySetName(Type type, ObjectContext context)
    {
        // Thanks to Kevin McNeish for this little gem

        string entityTypeName = type.Name;

        var container = context.MetadataWorkspace.GetEntityContainer(
            context.DefaultContainerName, DataSpace.CSpace);

        string entitySetName = (from meta in container.BaseEntitySets
                                where meta.ElementType.Name == entityTypeName
                                select meta.Name).First();

        return entitySetName;
    }

    #endregion

Basically this tries to get the “same” entity from the data context, but returns null if there is no match.  So in the Northwind example we could use it to set the value of a property, so that if it already existed in the database then it would use the database version:

product.Category = context.GetEntityForObject(product.Category);

If it can’t find a matching version in the database then nothing (or more accurately null) gets persisted for that field.  Obviously this could be handled as you see fit (e.g. throw a validation error back to the user if the field gets nulled).

I’ve thrown together a demo that shows the whole process against a WPF client.