Tuesday, September 27, 2011

Some Thoughts On Service Oriented Architecture (Part 2)

I’ve been writing a high-level ‘architectural vision’ document for my current clients. I thought it might be nice to republish bits of it here. This is part 2. The first part is here.

My client has a core product that is heavily customised for each customer. In this post we look at the different kinds of components that make up this architecture: how some are common services that make up the core product, and how others might be bespoke pieces for a particular customer. We also examine the difference between workflow, services and endpoints.

It is very important that we make a clear distinction between components that we write as part of the product and bespoke components that we write for a particular customer. We should not put customer specific code into product components and we should not replicate common product code in customer specific pieces.

Because we are favouring small single-purpose components over large multi-purpose monolithic applications, it should be easy for us to differentiate between product and customer pieces.

There are three main kinds of components that make up a working system: services, workflow and endpoints. The diagram below shows how they communicate via EasyNetQ, our open-source infrastructure layer. The green parts are product pieces. The blue parts are bespoke customer-specific pieces.

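[Diagram: services, workflow and endpoints communicating via EasyNetQ; product pieces in green, customer-specific pieces in blue]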

Services

Services are components that implement a piece of the core product. An example of a service is a component called Renderer that takes templates and data and does a kind of mail-merge. Because Renderer is a service it should never contain any client specific code. Of course customer requirements might mean that enhancements need to be made to Renderer, but these enhancements should always be done with the understanding that Renderer is part of the product. We should be able to deploy the enhanced Renderer to all our customers without the enhancement affecting them.

Services (in fact all components) should maintain their own state using a service specific database. This database should not be shared with other services. The service should communicate via EasyNetQ with other services and not use a shared database as a back-channel. In the case of an updated Renderer, templates would be stored in Renderer’s local database. Any new or updated templates would arrive as messages via EasyNetQ. Each data item to be rendered would also arrive as a message, and once the data has been rendered, the document should also be published via EasyNetQ.

The core point here is that each service should have a clear API, defined by the message types that it subscribes to and publishes. We should be able to fully exercise a component via messages independently of other services. Because the service’s database is only used by the service, we should be able to flexibly modify its schema in response to changing requirements, without having to worry about the impact that will have on other parts of the system.
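To make the idea of a message-defined API concrete, here is a minimal C# sketch of what Renderer’s contract might look like. The message types (TemplateUpdated, RenderRequest, DocumentRendered), the {Field} placeholder syntax and the injected publish delegate are all hypothetical stand-ins, not part of EasyNetQ; in a real system the Handle methods would be wired to bus subscriptions and the dictionary would be Renderer’s own database.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical message contract: these types ARE the service's public API.
public class TemplateUpdated
{
    public string TemplateName { get; set; }
    public string Body { get; set; }   // e.g. "Dear {Name}, flight {Flight} is delayed."
}

public class RenderRequest
{
    public string TemplateName { get; set; }
    public Dictionary<string, string> Fields { get; set; }
}

public class DocumentRendered
{
    public string TemplateName { get; set; }
    public string Document { get; set; }
}

public class Renderer
{
    // Stands in for Renderer's private database; no other component can see it.
    private readonly Dictionary<string, string> templates = new Dictionary<string, string>();

    // Stands in for publishing via the bus.
    private readonly Action<object> publish;

    public Renderer(Action<object> publish)
    {
        this.publish = publish;
    }

    // New or updated templates arrive as messages.
    public void Handle(TemplateUpdated message)
    {
        templates[message.TemplateName] = message.Body;
    }

    // The 'mail-merge': substitute each field into the stored template, then publish the result.
    public void Handle(RenderRequest message)
    {
        var document = templates[message.TemplateName];
        foreach (var field in message.Fields)
        {
            document = document.Replace("{" + field.Key + "}", field.Value);
        }
        publish(new DocumentRendered { TemplateName = message.TemplateName, Document = document });
    }
}
```

Because everything goes in and out as messages, the class can be exercised without RabbitMQ at all: pass a publish delegate that records what was published and assert on it.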

It’s important that services do not implement workflow. As we’ll see in the next section, a core feature of this architecture is that workflow and services are separate. Renderer, for example, should not make decisions about what happens to a document after it is rendered, or implement batching logic. These are separate concerns.

Workflow

Workflow components are customer-specific components that describe what happens in response to a specific business trigger. They also implement customer-specific business rules. For an airline, an example would be a workflow that is triggered by a flight event, say a delay. When the component receives the delay message, it might first retrieve the manifest for the flight by sending a request to a manifest service, then render a message telling each passenger about the delay by sending a render request to Renderer, and finally send that message by email by publishing an email request. It would typically implement business rules describing when a delay is considered important, etc.
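The delay workflow described above might look something like the following sketch. It is not EasyNetQ’s SagaHost API: the message types, the 30-minute threshold and the injected getManifest and publish delegates are hypothetical, standing in for the request to the manifest service and for publishing via the bus.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical messages, for illustration only.
public class FlightDelayed
{
    public string FlightNumber { get; set; }
    public TimeSpan Delay { get; set; }
}

public class RenderRequest
{
    public string TemplateName { get; set; }
    public string Recipient { get; set; }
}

public class DocumentRendered
{
    public string Recipient { get; set; }
    public string Document { get; set; }
}

public class EmailRequest
{
    public string To { get; set; }
    public string Body { get; set; }
}

// Customer-specific workflow: it decides WHAT happens; the services do the work.
public class FlightDelaySaga
{
    // Hypothetical customer business rule: only delays of 30 minutes or more matter.
    private static readonly TimeSpan SignificantDelay = TimeSpan.FromMinutes(30);

    private readonly Func<string, IEnumerable<string>> getManifest; // flight number -> passenger email addresses
    private readonly Action<object> publish;

    public FlightDelaySaga(Func<string, IEnumerable<string>> getManifest, Action<object> publish)
    {
        this.getManifest = getManifest;
        this.publish = publish;
    }

    // Step 1: the business trigger arrives; apply the customer's rule.
    public void Handle(FlightDelayed message)
    {
        if (message.Delay < SignificantDelay) return;

        // Step 2: ask Renderer for one notification per passenger on the manifest.
        foreach (var passenger in getManifest(message.FlightNumber))
        {
            publish(new RenderRequest { TemplateName = "FlightDelay", Recipient = passenger });
        }
    }

    // Step 3: when a rendered document comes back, request that it be emailed.
    public void Handle(DocumentRendered message)
    {
        publish(new EmailRequest { To = message.Recipient, Body = message.Document });
    }
}
```

Note that all the customer-specific decisions (the threshold, the template name, the choice of email) live in the saga, so Renderer and the email service stay generic.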

By separating workflow from services, we can flexibly implement customer requirements by creating custom workflows without having to customise our services. We can deliver bespoke customer solutions on a common product platform.

We call these workflow pieces ‘Sagas’; this is a commonly used term in the industry for a long-running business process. Because sagas all need a common infrastructure for hosting, EasyNetQ includes a ‘SagaHost’. SagaHost is a Windows service that hosts sagas, just like it says on the box. This means that the sagas themselves are written as simple assemblies that can be xcopy deployed.

Sagas will usually require a database to store their state. Once again, this should be saga-specific and not a database shared by other services. However, since a single customer workflow might well consist of several distinct sagas, it makes sense for these to be thought of as a unit, and they may well share a single database.

Endpoints

Endpoints are components that communicate with the outside world. They are a bridge between our internal AMQP messaging infrastructure and our external HTTP API. The only way into and out of our product should be via this API. We want to be able to integrate with diverse customer systems, but these integration pieces should be implemented as bridges between the customer system and our official API, rather than as bespoke pieces that publish or subscribe directly to the message bus.

Endpoints come in two flavours: externally triggered and internally triggered. An externally triggered endpoint is where communication is initiated by the customer. An example of this would be a flight event. These components are best implemented as web services that simply wait to be called and then publish an appropriate message using EasyNetQ.

An internally triggered endpoint is where communication is triggered by an internal event. An example of this would be the completion of a workflow with the final step being an update of a customer system. Such an endpoint would be implemented as a Windows service that subscribes to the update message using EasyNetQ and implements an HTTP client that makes a web service request to a configured endpoint.

The Importance of Testability

A core requirement for any component is that it should be testable. It should be possible to test Services and Workflow (Sagas) simply by sending them messages and checking that they respond with the correct messages. Because ‘back-channel’ communication, especially via shared databases, is not allowed, we can treat these components as black boxes that always respond with the same output to the same input.

Endpoints are slightly more complicated to test. It should be possible to send an externally triggered endpoint a request and watch for the message that’s published. An internally triggered endpoint should make a web request when it receives the correct message.
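One way to make an internally triggered endpoint testable is to hide the outgoing HTTP call behind a small interface, so that a test can substitute a fake and check that the right request is made when the right message arrives. The sketch below assumes a hypothetical CustomerUpdated message and IHttpSender interface; a production IHttpSender would wrap a real HTTP client, and Handle would be wired to an EasyNetQ subscription.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical message published at the end of a workflow.
public class CustomerUpdated
{
    public string BookingReference { get; set; }
    public string Status { get; set; }
}

// Hypothetical abstraction over the outgoing web service call.
public interface IHttpSender
{
    void Post(Uri endpoint, string body);
}

public class CustomerUpdateEndpoint
{
    private readonly Uri configuredEndpoint;
    private readonly IHttpSender sender;

    public CustomerUpdateEndpoint(Uri configuredEndpoint, IHttpSender sender)
    {
        this.configuredEndpoint = configuredEndpoint;
        this.sender = sender;
    }

    // Called for each CustomerUpdated message received from the bus.
    public void Handle(CustomerUpdated message)
    {
        var body = "reference=" + Uri.EscapeDataString(message.BookingReference)
                 + "&status=" + Uri.EscapeDataString(message.Status);
        sender.Post(configuredEndpoint, body);
    }
}

// Test double that records requests instead of sending them.
public class RecordingHttpSender : IHttpSender
{
    public readonly List<KeyValuePair<Uri, string>> Requests = new List<KeyValuePair<Uri, string>>();

    public void Post(Uri endpoint, string body)
    {
        Requests.Add(new KeyValuePair<Uri, string>(endpoint, body));
    }
}
```

An externally triggered endpoint can be tested the same way in reverse: call it directly and record what it publishes.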

Developers should provide automated tests to QA for any modification to a component.

Monday, September 26, 2011

Some Thoughts On Service Oriented Architecture

I’ve been writing a high-level ‘architectural vision’ document for my current clients. I thought it might be nice to republish bits of it here. This is the section that makes the case for a service oriented architecture based on messaging.

I’ve taken out anything client-specific.

SOA is one of those snake-oil consultancy terms that seem to mean different things depending on who you talk to. My own views on SOA have been formed from three main sources. Firstly, there’s the bible on SOA, Hohpe & Woolf’s Enterprise Integration Patterns. This really is essential reading for anyone attempting to get systems to work together. Next there’s the work of Udi Dahan. He is the author of the excellent NServiceBus messaging framework for .NET. I’ve attended his course, and his blog is a fantastic mine of knowledge on all things SOA. Lastly, there is the fantastic series of blog posts by Bill Poole on SOA. Just check out ‘JBOWS is Bad’ for a taster.

So what is SOA? Software has a complexity problem. There appears to be a non-linear relationship between the complexity of a monolithic system and its stability and maintainability. So a system that is twice as complex as another one will be maybe four times more expensive to maintain and a quarter as stable. There is also the human and organisational side of software. Individual teams tend to build or buy their own solutions. The organisation then needs to find ways for these disparate systems to share information and workflow. Small single-purpose systems are always a better choice than large monolithic ones. If we build our systems as components, we can build and maintain them independently. SOA is a set of design patterns that guide us in building and integrating these mini-application pieces.

What is a component?

A component in SOA terms is very different from a component in Object-Oriented terms. A component in OO terms is a logical component that is not independently compiled and deployed. A component or ‘service’ in SOA is a physical component that is an independently built and deployed stand-alone application. It is usually designed as a Windows service or possibly as a web service or an executable. It may or may not have a UI, but it will have some mechanism to communicate with other services.

Each component/service should have the following characteristics:

  • Encapsulated. The service should not share its internal state with the outside world. The most common way people break this rule is by having services share a single database. The service should only change state in response to input from its public interface.
  • Contract. The service should have a clear contract with the outside world. Typically this will be a web-service API and/or the message types that it publishes and consumes.
  • Single Purpose. A component should have one job within the system as a whole. Rendering PDFs for example.
  • Context Free. A component should not be dependent on other components; it should only depend on contracts. For example, if my business process component relies on getting a flight manifest from somewhere, it should simply be able to publish a flight manifest request; it shouldn’t expect a specific flight manifest service to be present.
  • Independently deployable. I should be able to deploy the component/service independently.
  • Independently testable. It should be possible to test the service in isolation without having other services in the system running.
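The ‘context free’ idea can be illustrated with a toy, in-process bus where subscribers register interest in a message type and publishers never name a recipient. This is a hypothetical sketch for illustration only, not EasyNetQ or RabbitMQ, and FlightManifestRequest is a made-up type. Unlike a real broker it has no queues, so a message published with no subscriber is simply dropped; it shows the decoupling, not the temporal decoupling.

```csharp
using System;
using System.Collections.Generic;

// A toy in-memory bus: routing is by message type, so publishers never name a recipient.
public class InMemoryBus
{
    private readonly Dictionary<Type, List<Action<object>>> subscribers =
        new Dictionary<Type, List<Action<object>>>();

    public void Subscribe<T>(Action<T> handler)
    {
        List<Action<object>> handlers;
        if (!subscribers.TryGetValue(typeof(T), out handlers))
        {
            handlers = new List<Action<object>>();
            subscribers.Add(typeof(T), handlers);
        }
        handlers.Add(message => handler((T)message));
    }

    public void Publish<T>(T message)
    {
        List<Action<object>> handlers;
        if (!subscribers.TryGetValue(typeof(T), out handlers)) return; // nobody listening: dropped
        foreach (var handle in handlers)
        {
            handle(message);
        }
    }
}

// Hypothetical contract shared by the business process and any manifest service.
public class FlightManifestRequest
{
    public string FlightNumber { get; set; }
}
```

The business process component only calls bus.Publish(new FlightManifestRequest { ... }); whichever manifest service subscribes to that type, if any, is a deployment decision rather than something the component is configured with.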

How do components communicate?

Now that we’ve defined the characteristics of a component in our SOA, the next challenge is to select the technology and patterns that they use to communicate with each other. We can list the things that we want from our communication technology:

  • Components should only be able to communicate via a well defined contract - logically-decoupled. They should not be able to dip into each other’s internal state.
  • Components should not have to be configured to communicate with specific endpoints – decoupled configuration. It should be possible to deploy a new service without having to reconfigure the services that it talks to.
  • The communication technology should support ‘temporal-decoupling’. This means that all the services do not have to be running at the same time for the system to work.
  • The communication should be low latency. There should be a minimal delay between a component sending a message and the consumer receiving it.
  • The communication should be based on open standards.

Let’s consider how some common communication technologies meet these criteria…

                          Logically   Decoupled      Temporal     Low       Open
                          decoupled   configuration  decoupling   latency   standards
File Transfer             yes         yes            yes          no        yes
Shared Database           no          yes            yes          no        no
RPC (Remoting, COM+)      yes         no             no           yes       no
SOAP Web Services         yes         no             no           yes       yes
Message Queue (MSMQ)      yes         no             yes          yes       no
Pub/Sub Messaging (AMQP)  yes         yes            yes          yes       yes

From the table we can see that there’s only one technology that ticks all our boxes, and that is pub/sub messaging based on an open standard like AMQP. AMQP is a bit like HTTP for messaging: an open wire-level protocol for brokers and their clients. The leading AMQP implementation is RabbitMQ, and this is the technology we’ve chosen as our core messaging platform.

There is a caveat however. For communicating with external clients and their systems, building endpoints with open-standards outweighs all other considerations. Now AMQP is indeed an open standard, but it’s a relatively new one that’s only supported by a handful of products. To be confident that we can interoperate with a wide variety of 3rd party systems over the internet we need a ubiquitous technology, so for all external communication we will use HTTP based web-services.

So, AMQP internally, HTTP externally.

Wednesday, September 14, 2011

Thoughts on Windows 8 (part 2)

Back in June I wrote some thoughts on Windows 8 after the initial announcement. Now that we’ve got more details from the Build conference, I thought I’d do a little update.

Microsoft have climbed down and made a significant concession to WPF/Silverlight/.NET devs. Gone is the previous message that Metro applications will only be written in HTML/Javascript; developers can now choose which technology they want to use on the new platform. There still seems to be a bias towards HTML/Javascript, however, judging by the number of sessions on each, and it seems like MS would prefer developers to go down the HTML/Javascript route. How much this double-headed personality affects the development experience is yet to be seen.

Somebody high up in MS must have banged some heads together to get the Windows and Developer Divisions talking to each other. They'd become two opposing camps after the fallout from the failed Vista WinFX experiment. Now Windows is forced to support XAML/.NET, and Dev-Division has been arm-wrestled into supporting the HTML/Javascript model in Visual Studio and Blend. Once again, the depth of this rapprochement will be the deciding factor when it comes to getting a consistent message across to us developers.

The message is still clear though: Javascript has won the latest round of the language wars. Whatever you think about it as a language, it's becoming as ubiquitous as C. But .NET developers are going to have to be dragged kicking and screaming to this party.

The big question is still 'will it work?' Will Windows 8 be enough to get MS back into the game? There are two main problems I can see:

1. Microsoft has a strategic problem. They make money by selling operating systems. Their two main competitors, Google and Apple, don't. Will they be able to make Windows 8 financially attractive to tablet developers when they can get Android licence free and they are competing on price with the dominant iPad? Having said that, Android has been struggling on tablets, so there's still an opportunity for Microsoft to get traction in that form factor.

2. Windows 8 is a hybrid. There's the traditional Windows mouse-and-keyboard UI that they have to keep, and there's the new Metro UI that they want everyone to develop for. What's the experience going to be like for a tablet user when they end up in Windows classic, or for the desktop user when they switch to Metro? The development experience for both sets of UI is going to be very different too, almost like developing for two different platforms. It will be interesting to see if Microsoft sells a business version of Windows 8 without Metro, or indeed a tablet version without the classic UI.

Despite having raised all these questions, I do think Microsoft have a workable strategy for Windows 8, and it’s going to be an exciting time over the next few years to see how it pans out. There is still a window (sorry) of opportunity in the tablet form factor for Microsoft to challenge Apple if they can get this right. Let’s hope they can.

Tuesday, September 13, 2011

Why Write a .NET API For RabbitMQ?

Anyone who reads this blog knows that my current focus is writing a .NET API for RabbitMQ which I’ve named EasyNetQ. This is being paid for by my excellent clients 15Below who build high volume messaging solutions for the airline industry. EasyNetQ is a core part of their strategy moving forwards, so it’s going to get plenty of real-world use.

One question I’ve been asked quite a lot is ‘why?’. Why am I building a .NET API when one is already available: the C# AMQP client from RabbitMQ? Think of AMQP as the HTTP of messaging. It’s a relatively low-level protocol. You typically wouldn’t build a web application directly against a low-level HTTP API such as System.Net.WebRequest; instead you would use a higher-level toolkit such as WCF or ASP.NET MVC. Think of EasyNetQ as the ASP.NET MVC of AMQP.

AMQP is designed to be cross platform and language agnostic. It is also designed to flexibly support a wide range of messaging patterns based on the Exchange/Binding/Queue model. It’s great having this flexibility, but with flexibility comes complexity. It means that you will need to write a significant amount of code in order to implement a RabbitMQ client. Typically this code would include:

  • Implementing messaging patterns such as Publish/Subscribe or Request/Response. Although, to be fair, the .NET client does provide some support here.
  • Implement a routing strategy. How will you design your exchange-queue bindings, and how will you route messages between producers and consumers?
  • Implement message serialization/deserialization. How will you convert the binary representation of messages in AMQP to something your programming language understands?
  • Implement a consumer thread for subscriptions. You will need to have a dedicated consumer loop waiting for messages you have subscribed to. How will you deal with multiple subscribers, or transient subscribers, like those waiting for responses from a request?
  • Implement a versioning strategy for your messages. What happens when your message schema needs to change in response to business requirements?
  • Implement subscriber reconnection. If the connection is disrupted or the RabbitMQ server bounces, how do you detect it and make sure all your subscriptions are rebuilt?
  • Understand and implement quality of service settings. What settings do you need to make to ensure that you have a reliable client?
  • Implement an error handling strategy. What should your client do if it receives a malformed message, or if an unexpected exception is thrown?
  • Implement monitoring tools. How will you monitor your client applications so that you are alerted if there are any problems?

With EasyNetQ, you get all these out of the box. You lose some of the flexibility in exchange for a model based on .NET types for routing, but it saves an awful lot of code.
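To give a feel for what ‘a model based on .NET types for routing’ means, here is a sketch of one possible convention: the exchange name is derived from the message type’s full name and assembly, and each subscriber gets its own queue, named from the type plus a subscription id. This illustrates the idea under those assumptions; it is not EasyNetQ’s actual naming scheme.

```csharp
using System;

public static class TypeNameConvention
{
    // e.g. "MyCompany.Messages.FlightDelayed:MyCompany.Messages"
    public static string ExchangeName(Type messageType)
    {
        return messageType.FullName + ":" + messageType.Assembly.GetName().Name;
    }

    // One queue per (message type, subscription id): consumers that share a subscription id
    // share a queue and compete for messages, while each distinct subscription gets its own copy.
    public static string QueueName(Type messageType, string subscriptionId)
    {
        return ExchangeName(messageType) + "_" + subscriptionId;
    }
}
```

The application developer only ever deals in types and a subscription id; the exchange-queue bindings fall out of the convention.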

Monday, September 12, 2011

Restarting RabbitMQ With Running EasyNetQ Clients

Here’s a screenshot of one of my tests of EasyNetQ (my easy-to-use .NET API for RabbitMQ). I’m running two publishing applications (top) and two subscribing applications (bottom), all publishing and subscribing to the same queue. We’re getting a throughput of around 5000 messages / second on my local machine. Once they’re all humming along nicely, I bounce the RabbitMQ service. As you can see, some messages get logged, and once the RabbitMQ service comes back, things recover nicely and all the applications continue publishing and subscribing as before. I think that’s pretty sweet :)

[Screenshot: RabbitRestart]

The publisher and subscriber console apps are both in the EasyNetQ repository on GitHub:

EasyNetQ.Tests.Performance.Producer
EasyNetQ.Tests.Performance.Consumer

Friday, August 26, 2011

How to stop System.Uri un-escaping forward slash characters

Sometimes you want to construct a URI that has an escaped forward slash. For example, the RabbitMQ Management API requires that you encode the default rabbit VirtualHost ‘/’ as ‘%2f’. Here is the URL to get the details of a queue:

http://192.168.1.4:55672/api/queues/%2f/EasyNetQ_Default_Error_Queue

But if I try to use WebRequest or WebClient, the ‘%2f’ is un-escaped to a ‘/’, so the URL becomes:

http://192.168.1.4:55672/api/queues///EasyNetQ_Default_Error_Queue

And I get a 404 not found back :(

Both WebRequest and WebClient use System.Uri internally. It’s easy to demonstrate this behaviour with the following code:

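const string url = "http://192.168.1.4:55672/api/queues/%2f/EasyNetQ_Default_Error_Queue";
var uri = new Uri(url);
Console.Out.WriteLine("uri = {0}", uri.PathAndQuery);
// on .NET 4.0, outputs /api/queues///EasyNetQ_Default_Error_Queue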

A bit of digging in the System.Uri code thanks to the excellent ReSharper 6.0, and help from this Stack Overflow question, shows that it’s possible to reset some flags and stop this behaviour. Here’s my LeaveDotsAndSlashesEscaped method (it’s .NET 4.0 specific):

// Requires: using System; using System.Reflection;
// Uses reflection on non-public .NET 4.0 internals to clear the http UriParser's updatable
// flags, so that System.Uri stops un-escaping dots and slashes in http URLs.
private void LeaveDotsAndSlashesEscaped()
{
    var getSyntaxMethod =
        typeof (UriParser).GetMethod("GetSyntax", BindingFlags.Static | BindingFlags.NonPublic);
    if (getSyntaxMethod == null)
    {
        throw new MissingMethodException("UriParser", "GetSyntax");
    }

    var uriParser = getSyntaxMethod.Invoke(null, new object[] { "http" });

    var setUpdatableFlagsMethod =
        uriParser.GetType().GetMethod("SetUpdatableFlags", BindingFlags.Instance | BindingFlags.NonPublic);
    if (setUpdatableFlagsMethod == null)
    {
        throw new MissingMethodException("UriParser", "SetUpdatableFlags");
    }

    setUpdatableFlagsMethod.Invoke(uriParser, new object[] {0});
}

The usual caveats of poking into system assemblies with reflection apply. Don’t expect this to work with any version of .NET other than 4.0.

Now if we re-run our test…

LeaveDotsAndSlashesEscaped();
const string url = "http://192.168.1.4:55672/api/queues/%2f/EasyNetQ_Default_Error_Queue";
var uri = new Uri(url);
Console.Out.WriteLine("uri = {0}", uri.PathAndQuery);
// outputs /api/queues/%2f/EasyNetQ_Default_Error_Queue

Voila!

If you have a Web.config or App.config file, you can also try this configuration setting, which should do the same thing (from this SO question), but I haven’t tried it personally:

<configuration>
  <uri>
    <schemeSettings>
      <add name="http" genericUriParserOptions="DontUnescapePathDotsAndSlashes" />
    </schemeSettings>
  </uri>
</configuration>

Monday, July 18, 2011

An Action Cache

Do you ever find yourself in a loop calling a method that expects an Action or a Func as an argument? Here’s an example from an EasyNetQ test method where I’m doing just that:

[Test, Explicit("Needs a Rabbit instance on localhost to work")]
public void Should_be_able_to_do_simple_request_response_lots()
{
    for (int i = 0; i < 1000; i++)
    {
        var request = new TestRequestMessage { Text = "Hello from the client! " + i.ToString() };
        bus.Request<TestRequestMessage, TestResponseMessage>(request, response =>
            Console.WriteLine("Got response: '{0}'", response.Text));
    }

    Thread.Sleep(1000);
}

My initial naive implementation of IBus.Request set up a new response subscription each time Request was called. Obviously this is inefficient. It would be much nicer if I could identify when Request is called more than once with the same callback and re-use the subscription.

The question I had was: how can I uniquely identify each callback? It turns out that action.Method.GetHashCode() reliably identifies a unique action. I can demonstrate this with the following code:

public class UniquelyIdentifyDelegate
{
    // Cache of delegates keyed on the hash code of the method each delegate points to.
    readonly IDictionary<int, Action> actionCache = new Dictionary<int, Action>();

    public void DemonstrateActionCache()
    {
        for (var i = 0; i < 3; i++)
        {
            RunAction(() => Console.Out.WriteLine("Hello from A {0}", i));
            RunAction(() => Console.Out.WriteLine("Hello from B {0}", i));

            Console.Out.WriteLine("");
        }
    }

    public void RunAction(Action action)
    {
        Console.Out.WriteLine("Method = {0}, Cache Size = {1}", action.Method.GetHashCode(), actionCache.Count);
        if (!actionCache.ContainsKey(action.Method.GetHashCode()))
        {
            actionCache.Add(action.Method.GetHashCode(), action);
        }

        var actionFromCache = actionCache[action.Method.GetHashCode()];

        actionFromCache();
    }
}

Here, I’m creating an action cache keyed on the action method’s hash code. Then I’m calling RunAction a few times with two distinct action delegates. Note that they also close over a variable, i, from the outer scope.

Running DemonstrateActionCache() outputs the expected result:

Method = 59022676, Cache Size = 0
Hello from A 0
Method = 62968415, Cache Size = 1
Hello from B 0

Method = 59022676, Cache Size = 2
Hello from A 1
Method = 62968415, Cache Size = 2
Hello from B 1

Method = 59022676, Cache Size = 2
Hello from A 2
Method = 62968415, Cache Size = 2
Hello from B 2

Rather nice I think :)
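One caveat: hash codes aren’t guaranteed to be unique, so two different methods could in principle collide. A slightly safer variation, sketched below under that assumption, keys the dictionary on the MethodInfo itself; the dictionary still uses the hash code to pick a bucket but then confirms the match with Equals. Like the original, it caches the first delegate seen for each method, so if a later call passes a delegate over a different closure instance, the cached one still runs against the first closure.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public class MethodKeyedActionCache
{
    // Keyed on the delegate's MethodInfo: equality, not just the hash code, decides a match.
    private readonly Dictionary<MethodInfo, Action> actionCache = new Dictionary<MethodInfo, Action>();

    public int Count
    {
        get { return actionCache.Count; }
    }

    public void RunAction(Action action)
    {
        Action cached;
        if (!actionCache.TryGetValue(action.Method, out cached))
        {
            cached = action;
            actionCache.Add(action.Method, cached);
        }
        cached();
    }
}
```

Called from the same loop as DemonstrateActionCache, the cache should again settle at two entries, one per lambda.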

Task Parallel Library: How To Write a Simple Delay Task

I just had a need for a delay task: a simple method that turns a Func<T> into a Task<T> that will execute after a given delay.

The starting point for any Task creation based on an external asynchronous operation, like a Timer callback, is the TaskCompletionSource class. It provides methods to transition the task it creates to different states. You call SetResult when the operation completes, SetException if the operation fails, and SetCanceled if you want to cancel the task.

Here’s my RunDelayed method:

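// Requires: using System; using System.Threading; using System.Threading.Tasks;
private static Task<T> RunDelayed<T>(int millisecondsDelay, Func<T> func)
{
    if (func == null)
    {
        throw new ArgumentNullException("func");
    }
    if (millisecondsDelay < 0)
    {
        throw new ArgumentOutOfRangeException("millisecondsDelay");
    }

    var taskCompletionSource = new TaskCompletionSource<T>();

    // The single-argument Timer constructor passes the timer itself as the callback's state.
    var timer = new Timer(self =>
    {
        ((Timer)self).Dispose();
        try
        {
            var result = func();
            taskCompletionSource.SetResult(result);
        }
        catch (Exception exception)
        {
            taskCompletionSource.SetException(exception);
        }
    });

    // Fire once after the delay. A period of Timeout.Infinite means the callback can never
    // run a second time (which would call func again and make SetResult throw).
    timer.Change(millisecondsDelay, Timeout.Infinite);

    return taskCompletionSource.Task;
}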

I simply create a new TaskCompletionSource and a Timer where the callback calls SetResult with the result of the given Func<T>. If the Func<T> throws, we simply catch the exception and call SetException. Finally we start the timer and return the Task.

You would use it like this:

var task = RunDelayed(1000, () => "Hello World!");
task.ContinueWith(t =>
{
    // 'Hello World!' is output a second later on a threadpool thread.
    Console.WriteLine(t.Result);
});

You can use the same technique to turn any asynchronous operation into a Task.

Note however if your operation exposes an APM API, it’s much easier to use the Task.Factory.FromAsync method.
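For comparison, here is what wrapping an APM Begin/End pair looks like with Task.Factory.FromAsync. Stream.BeginRead/EndRead is used purely as a convenient, self-contained example of the pattern (ReadAsTask is a hypothetical helper name); for real stream I/O, .NET 4.5 and later also offer Stream.ReadAsync directly.

```csharp
using System.IO;
using System.Threading.Tasks;

public static class ApmExample
{
    // Wraps the Begin/End (APM) pair Stream.BeginRead/EndRead in a Task<int>
    // whose result is the number of bytes read.
    public static Task<int> ReadAsTask(Stream stream, byte[] buffer)
    {
        return Task<int>.Factory.FromAsync(
            stream.BeginRead, stream.EndRead, buffer, 0, buffer.Length, null);
    }
}
```

FromAsync takes care of the AsyncCallback plumbing and of routing exceptions thrown by the End method into the task, which is exactly what the TaskCompletionSource code above does by hand.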

Thursday, July 14, 2011

EasyNetQ: How Should a Messaging Client Handle Errors?

EasyNetQ is my simple .NET API for RabbitMQ.

I’ve started thinking about the best patterns for implementing error handling in EasyNetQ. One of the aims of EasyNetQ is to remove as many infrastructure concerns from the application developer as possible. This means that the API should correctly handle any exceptions that bubble up from the application layer.

One of the core requirements is that we shouldn’t lose messages when the application throws. The question then becomes: where should the message that the application was consuming when it threw go? There seem to be three choices:

  1. Put the failed message back on the queue it was consumed from.
  2. Put the failed message on an error queue.
  3. A combination of 1 and 2.

Option 1 has the benefit that it’s the out-of-the-box behaviour of AMQP. In the case of EasyNetQ, I would simply catch any exceptions, log them, and just send a noAck command back to RabbitMQ. Rabbit would put the message at the back of the queue and then resend it when it got to the front.

Another advantage of this technique is that it gives competing consumers the opportunity to process the message. If you have more than one consumer on a queue, Rabbit will send the messages to them in turn, so this is out-of-the-box.

The drawback of this method is that there’s the possibility of the queue filling up with failed messages. The consumer would just be cycling around throwing exceptions, and any messages that it might be able to consume would be slowed down by having to wait their turn amongst a long queue of failed messages.

Another problem is that it’s difficult to manually inspect the messages and selectively delete or retry them.

Option 2 is harder to implement. When an error occurs I would wrap the failed message in a special error message wrapper. This can include details about the type and location of the exception and other information such as stack traces. I would then publish the error message to an error exchange. Each consumer queue should have a matching error exchange. This gives the opportunity to bind generic error queues to all error exchanges, but also to have special case error consumers for particular queues.
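A minimal sketch of such a wrapper follows, assuming the raw message body is kept as a string and that the exchange and routing key it arrived on are recorded so it can be retried later. The ErrorMessage type and its fields are hypothetical, not a RabbitMQ or EasyNetQ type.

```csharp
using System;

// Hypothetical envelope published to the error exchange when a consumer throws.
public class ErrorMessage
{
    public string Exchange { get; set; }       // where the failed message was originally published
    public string RoutingKey { get; set; }
    public string Body { get; set; }           // the original message, untouched, ready for a retry
    public string ExceptionType { get; set; }
    public string ExceptionMessage { get; set; }
    public string Details { get; set; }        // exception.ToString(): message, inner exceptions and stack trace
    public DateTime FailedAtUtc { get; set; }

    public static ErrorMessage From(string exchange, string routingKey, string body, Exception exception)
    {
        return new ErrorMessage
        {
            Exchange = exchange,
            RoutingKey = routingKey,
            Body = body,
            ExceptionType = exception.GetType().FullName,
            ExceptionMessage = exception.Message,
            Details = exception.ToString(),
            FailedAtUtc = DateTime.UtcNow
        };
    }
}
```

Keeping the original body untouched, alongside where it came from, is what makes the later ignore/retry decision and developer-machine replay possible.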

I would need to write an error queue consumer to store the messages in a database. I would then need to provide the user with some way to inspect the messages alongside the error that caused them to arrive in the error queue, so that they could make an ignore/retry decision.

I could also implement some kind of wait-and-retry function on the error queue, but that would also add additional complexity.

It has the advantage that the original queue remains clear of failing messages. Failed messages and the error condition that caused the failure can be inspected together, and failed messages can be manually ignored or retried.

With the failed messages sitting in a database, it would also be simple to create a mechanism where those messages could be replayed on a developer machine to aid in debugging.

Option 3 combines 1 and 2, and I’m moving towards thinking that it might be the best strategy. When a message fails initially, we simply noAck it and it goes back to the queue. AMQP provides a Redelivered flag, so when the message is consumed a second time we can be aware that it’s a retry. Unfortunately there doesn’t seem to be a retry count in AMQP, so the best we can do is allow for a single retry. This has the benefit that it gives a competing consumer a chance to process the message.

No retry count is a problem. One option some people use is to roll their own ‘nack’ mechanism. In this case, when an error occurs in the consumer, rather than sending a ‘nack’ to Rabbit and relying on the built-in behaviour, the client ‘acks’ the message to remove it from the queue, and then re-publishes it via the default exchange back to the originating queue. Doing this gives the client access to the message and allows a ‘retry count’ header to be set.

After the single retry we fall back to Option 2. The message is passed to the error queue on the second failure.
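With a roll-your-own retry count, the combined policy boils down to a small decision per failed message. The sketch below assumes a hypothetical ‘x-retry-count’ header, stored as an integer, on re-published messages; the header name, the enum and the retry limit are illustrative choices, and the actual ack, re-publish and error-exchange calls are left to the client library.

```csharp
using System;
using System.Collections.Generic;

public enum FailureAction
{
    AckAndRepublishForRetry, // ack the original, re-publish it to its queue with the count incremented
    AckAndSendToErrorQueue   // ack the original, publish it (wrapped) to the error exchange
}

public static class RetryPolicy
{
    public const string RetryCountHeader = "x-retry-count"; // hypothetical header name

    // Decides what to do with a message whose handler has just thrown.
    public static FailureAction Decide(IDictionary<string, object> headers, int maxRetries)
    {
        object value;
        var retries = headers != null && headers.TryGetValue(RetryCountHeader, out value)
            ? Convert.ToInt32(value)   // assumes the count was stored as an integer
            : 0;                       // no header: this is the first failure

        return retries < maxRetries
            ? FailureAction.AckAndRepublishForRetry
            : FailureAction.AckAndSendToErrorQueue;
    }
}
```

With maxRetries set to 1 this reproduces the single-retry behaviour described above, but the limit is now a setting rather than something imposed by the Redelivered flag.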

I would be very interested in hearing how other people have implemented error handling with AMQP/RabbitMQ.

Updated based on feedback on the 15th July

Wednesday, July 13, 2011

MEF DirectoryCatalog Fails to Load Assemblies

I had an interesting problem with the Managed Extensibility Framework yesterday. I’m using the DirectoryCatalog to load assemblies from a given directory. Pretty standard stuff. When I tested my host on my developer machine, it got the works on my machine badge, but when I ran the host on one of our servers, it ignored all the assemblies.

Nothing loaded …

Hmm …

It turns out, after much digging and help from my Twitter crew, that the assembly loader that MEF’s DirectoryCatalog uses ignores any files that have a URL Zone set. I described these zones in detail in my previous post here:

http://mikehadlow.blogspot.com/2011/07/detecting-and-changing-files-internet.html

Because we copy our plugins from a file share, Windows was marking them as belonging to the Intranet Zone. Thus the odd only-when-deployed behaviour.

How you deal with this depends on whether you think that files marked in this way represent a security threat or not. If you do, the best policy is to detect any assemblies in your DirectoryCatalog directory that have a Zone set and log them. You can do that with the System.Security.Policy.Zone class:

// Zone is in System.Security.Policy; SecurityZone is in System.Security.
var zone = Zone.CreateFromUrl("file:///C:/temp/ZoneTest.doc");
if (zone.SecurityZone != SecurityZone.MyComputer)
{
    Console.WriteLine("File is blocked");
}
Console.Out.WriteLine("zone.SecurityZone = {0}", zone.SecurityZone);

If you don’t consider files copied from elsewhere a security concern, but rather a feature of your operating procedure, then you can clear the Zone flags from all the assemblies in the directory with the help of Richard Deeming’s Trinet.Core.IO.Ntfs library. I wrote a little class using this:

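using System.IO;
using Trinet.Core.IO.Ntfs; // provides the DeleteAlternateDataStream extension method

public class UrlZoneService
{
    // Removes the "Zone.Identifier" NTFS alternate data stream, where Windows records
    // the URL Zone, from every file in the directory.
    public static void ClearUrlZonesInDirectory(string directoryPath)
    {
        foreach (var filePath in Directory.EnumerateFiles(directoryPath))
        {
            var fileInfo = new FileInfo(filePath);
            fileInfo.DeleteAlternateDataStream("Zone.Identifier");
        }
    }
}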

I just run this before initiating my DirectoryCatalog, and now network-copied assemblies load as expected.