Tuesday, July 01, 2008

Eagerly Loading Northwind: playing with DataLoadOptions

An email today from Hrvoje Hudoletnjak set me off experimenting with LINQ-to-SQL DataLoadOptions. By default LINQ-to-SQL lazy loads entities. Here's a bit of the Northwind database mapped with the LINQ-to-SQL designer: If we do some simple data access with the Northwind database like this:
var dataContext = new NorthwindDataContext();

var theOrder = dataContext.GetTable<Order>().Single(order => order.OrderID == 10248);

Console.WriteLine("Order Date: {0}\r\nShipper: {1}\r\nCustomer Name: {2}\r\n", 
    theOrder.OrderDate, 
    theOrder.Shipper.CompanyName,
    theOrder.Customer.ContactName);

Console.WriteLine("Customer Demographic:");
foreach (var customerCustomerDemo in theOrder.Customer.CustomerCustomerDemos)
{
    Console.Write("{0}, ", customerCustomerDemo.CustomerDemographic.CustomerDesc);
}
Console.WriteLine("\r\n");

foreach (var orderDetail in theOrder.Order_Details)
{
    Console.WriteLine("Product Name: {0},\r\nSuppler Name: {1},\r\nCategory {2},\r\nQuantity {3}\r\n\r\n",
        orderDetail.Product.ProductName,
        orderDetail.Product.Supplier.CompanyName,
        orderDetail.Product.Category.CategoryName,
        orderDetail.Quantity);
}
Which gives this result: And then we use SQL Profiler to see what SQL gets thrown at the database, we get this: 17 separate SQL select statements thrown at our database: If we then use DataLoadOptions to eagerly load our order like this:
using System;
using System.Data.Linq;
using System.Linq;

namespace Mike.Northwind
{
    class Program
    {
        static void Main(string[] args)
        {
            var dataContext = new NorthwindDataContext();

            var options = new DataLoadOptions();

            options.LoadWith<Order>(order => order.Shipper);
            options.LoadWith<Order>(order => order.Customer);
            options.LoadWith<Customer>(customer => customer.CustomerCustomerDemos);
            options.LoadWith<CustomerCustomerDemo>(ccd => ccd.CustomerDemographic);

            options.LoadWith<Order>(order => order.Order_Details);
            options.LoadWith<Order_Detail>(orderDetail => orderDetail.Product);
            options.LoadWith<Product>(product => product.Supplier);
            options.LoadWith<Product>(product => product.Category);

            dataContext.LoadOptions = options;

            var theOrder = dataContext.GetTable<Order>().Single(order => order.OrderID == 10248);

            Console.WriteLine("Order Date: {0}\r\nShipper: {1}\r\nCustomer Name: {2}\r\n", 
                theOrder.OrderDate, 
                theOrder.Shipper.CompanyName,
                theOrder.Customer.ContactName);

            Console.WriteLine("Customer Demographic:");
            foreach (var customerCustomerDemo in theOrder.Customer.CustomerCustomerDemos)
            {
                Console.Write("{0}, ", customerCustomerDemo.CustomerDemographic.CustomerDesc);
            }
            Console.WriteLine("\r\n");

            foreach (var orderDetail in theOrder.Order_Details)
            {
                Console.WriteLine("Product Name: {0},\r\nSuppler Name: {1},\r\nCategory {2},\r\nQuantity {3}\r\n\r\n",
                    orderDetail.Product.ProductName,
                    orderDetail.Product.Supplier.CompanyName,
                    orderDetail.Product.Category.CategoryName,
                    orderDetail.Quantity);
            }
        }
    }
}
We only get two hits. First this large select which is pretty much everything: And this one which gets CustomerDemographic. Now I wondered why CustomerDemographic has to be fetched separately and I guess it's either that it's at the other end of a many-to-many relationship with a join table, or that it's another collection in addition to the order details and it only makes sense to get one collection at a time? What is plain is that you can dramatically reduce the round trips to the database that are required by judicious use of DataLoadOptions. Source for this here http://static.mikehadlow.com.s3.amazonaws.com/Mike.Northwind.zip

4 comments:

Ken Egozi said...

I'd guess that the second query is needed cuz it's Many-to-Many, so joining to Demographic would cause a huge number of rows to be returned.

btw, can you make LinqToSQL to issue these two queries within a single roundtrip?

Mike Hadlow said...

Hi Ken,

Yeah, I think you're right about the many-to-many join, but I should do some more experiments to find out what the behaviour is when a root entity has many collections and you ask to eagerly load them all. I can't see how it can efficiently do it in a single statement.

I'm not aware of any way to get LINQ-to-SQL to issue multiple queries in a single round trip.

Mike

Anonymous said...

I don't know if I"m too late to comment on this but if you eagerly load Custoemrs-->Orders-->OrderDetails then I will issue one statement for eagerly fetching Custoemrs-->Orders but for each OrderDetail, it will execute a seperate statement. In order words if we need to fetch two many-to-many collections, we are gonna end up executing too many statements.

Mike Hadlow said...

Hi Anonymous,

Thanks for that. You are right, I experimented with trying to get a list of customers with all their orders and order details. Even if you set the DataLoadOptions to load orders with customers and order details with orders, LINQ-to-SQL will issue one SQL select statement for the customers and then a separate one one for each order with its order details. This is plainly something that one should be aware of, especially when loading hierarchial lists of items.