Thursday, November 02, 2006

Using MemoryStream and BinaryFormatter for reusable GetHashCode and DeepCopy functions

Here are a couple of techniques I learnt a while back that add two important capabilities to your objects: computing a hash code and executing a deep copy. I can't find the original source for the hash code example, but the deep copy comes from Rockford Lhotka's CSLA. Both examples are my implementation of the basic idea. Both techniques utilise the MemoryStream and BinaryFormatter by getting the object to serialize itself to a byte array. To compute the hash code I simply use SHA1CryptoServiceProvider to create a 20 byte hash of the serialized object and then XOR the bytes of that hash together to get an integer value.

public override int GetHashCode()
{
    byte[] thisSerialized;
    using(System.IO.MemoryStream stream = new System.IO.MemoryStream())
    {
        new System.Runtime.Serialization.Formatters.Binary.BinaryFormatter().Serialize(stream, this);
        thisSerialized = stream.ToArray();
    }
    byte[] hash = new System.Security.Cryptography.SHA1CryptoServiceProvider().ComputeHash(thisSerialized);
    uint hashResult = 0;
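    for(int i = 0; i < hash.Length; i++)
    {
        // Shift by whole bytes so the 20 hash bytes are XORed across all 32 bits of the result.
        hashResult ^= (uint)hash[i] << ((i % 4) * 8);
    }
    return unchecked((int)hashResult);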
}
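
The folding step, reducing the 20 byte SHA-1 hash to a single int, can be looked at on its own. Here's a minimal sketch of one way to do it, XORing each hash byte into one of the four byte positions of a 32-bit value. The names HashFolding and FoldToInt32 are made up for this illustration, and the string payload is just a stand-in for the serialized object bytes:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class HashFolding
{
    // Hypothetical helper: XOR-fold a hash of any length into 32 bits.
    public static int FoldToInt32(byte[] hash)
    {
        uint result = 0;
        for (int i = 0; i < hash.Length; i++)
        {
            // Byte i is XORed into byte position i % 4 of the result.
            result ^= (uint)hash[i] << ((i % 4) * 8);
        }
        return unchecked((int)result);
    }

    public static void Main()
    {
        // Stand-in for the bytes produced by serializing an object.
        byte[] payload = Encoding.UTF8.GetBytes("stand-in for serialized bytes");
        using (SHA1 sha1 = SHA1.Create())
        {
            byte[] hash = sha1.ComputeHash(payload);
            Console.WriteLine(hash.Length);        // SHA-1 always yields 20 bytes
            Console.WriteLine(FoldToInt32(hash));  // same payload, same int every time
        }
    }
}
```

Because every byte of the hash contributes to the result, all 32 bits of the returned int can vary.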

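The most common uses for a hash code are to make hash tables efficient and to implement Equals(). Note that two different objects can end up with the same hash code: with a 32-bit hash there's roughly a one in 4,294,967,296 chance that any given pair of different objects will produce a false equals (thanks to Richard for pointing that out to me):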

public override bool Equals(object obj)
{
    if(!(obj is MyClass)) return false;
    return this.GetHashCode() == obj.GetHashCode();
}
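
If the false equals matters to you, one option that stays with the same serialization technique is to compare the serialized bytes themselves rather than their hash. This is only a sketch under a few assumptions: it needs a runtime where BinaryFormatter is still available (it is marked obsolete in .NET 5 and later), and the names SerializedComparer, ToBytes, SerializedEquals and Point are made up for the illustration:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Runtime.Serialization.Formatters.Binary;

// Hypothetical class used only for this illustration.
[Serializable]
public class Point
{
    public int X;
    public int Y;
    public Point(int x, int y) { X = x; Y = y; }
}

public static class SerializedComparer
{
    // Serialize any [Serializable] object graph to a byte array.
    public static byte[] ToBytes(object value)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            new BinaryFormatter().Serialize(stream, value);
            return stream.ToArray();
        }
    }

    // True only when both objects have the same type and identical serialized bytes.
    public static bool SerializedEquals(object a, object b)
    {
        if (ReferenceEquals(a, b)) return true;
        if (a == null || b == null) return false;
        if (a.GetType() != b.GetType()) return false;
        return ToBytes(a).SequenceEqual(ToBytes(b));
    }

    public static void Main()
    {
        Console.WriteLine(SerializedEquals(new Point(1, 2), new Point(1, 2)));
        Console.WriteLine(SerializedEquals(new Point(1, 2), new Point(2, 1)));
    }
}
```

This costs two full serializations per comparison, so it trades even more speed for the guarantee that different bytes are never reported as equal.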

To do a deep copy I simply get the object to serialize itself and then deserialize the result as a new instance. Be careful: this technique will serialize everything in this object's graph, so make sure you're aware of what is referenced by it and that all the objects in the graph are marked as [Serializable]. Here's a generic example that you can reuse in any object that needs deep copy:

public T DeepCopy<T>()
{
    T snapshot;
    using(System.IO.MemoryStream stream = new System.IO.MemoryStream())
    {
        System.Runtime.Serialization.Formatters.Binary.BinaryFormatter formatter = 
            new System.Runtime.Serialization.Formatters.Binary.BinaryFormatter();
        formatter.Serialize(stream, this);
        stream.Position = 0;
        snapshot = (T)formatter.Deserialize(stream);
    }
    return snapshot;
}
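
To see what "deep" means in practice, here's a small usage sketch. The Order class and its fields are made up for the illustration, and it assumes a runtime where BinaryFormatter is still available (it is marked obsolete in .NET 5 and later):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

// Hypothetical class used only for this illustration.
[Serializable]
public class Order
{
    public string Customer;
    public List<string> Items = new List<string>();

    public T DeepCopy<T>()
    {
        using (MemoryStream stream = new MemoryStream())
        {
            BinaryFormatter formatter = new BinaryFormatter();
            formatter.Serialize(stream, this);
            stream.Position = 0;
            return (T)formatter.Deserialize(stream);
        }
    }
}

public static class DeepCopyDemo
{
    public static void Main()
    {
        Order original = new Order();
        original.Customer = "Alice";
        original.Items.Add("widget");

        Order copy = original.DeepCopy<Order>();
        copy.Items.Add("gadget");

        // The copy has its own list, so the original still holds one item.
        Console.WriteLine(original.Items.Count);
        Console.WriteLine(copy.Items.Count);
        Console.WriteLine(ReferenceEquals(original.Items, copy.Items));
    }
}
```

A shallow copy such as MemberwiseClone() would have shared the Items list between the two orders, so adding to one would have changed the other.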

1 comment:

Anonymous said...

I realize this is a rather old post but I stumbled on it while looking up the BinaryFormatter.

Implementing GetHashCode in this way is ridiculously inefficient (and hash codes are all about efficiency), and basing Equals exclusively on the hash code is a terrible idea and will likely fail in some horrible and unpredictable way in production code.

This advice is bad, and you should feel bad.