Thursday, April 04, 2013

Lua as a Distributed Workflow Scripting Language

I’ve been spending a lot of time recently thinking about ways of orchestrating long-running workflows in a service-oriented architecture. I was talking this over at last Tuesday’s Brighton ALT.NET when Jay Kannan, who’s a game developer amongst other things, mentioned that Lua is a popular choice for scripting game platforms. Maybe I should check it out. So I did. And it turns out to be very interesting.

If you haven’t heard of Lua, it’s a “powerful, fast, lightweight, embeddable scripting language” originally conceived by a team at the Pontifical Catholic University of Rio de Janeiro in Brazil. It’s the leading scripting language for game platforms and also pops up in other interesting locations including Adobe Photoshop Lightroom and Wikipedia. It’s got a straightforward C API that makes it relatively simple to P/Invoke from .NET, and indeed there’s a LuaInterface library that provides a managed API.

I got the LuaInterface source code from the Google Code SVN repository and built it in VS 2012, but there are NuGet packages available as well.

It turned out to be very simple to use Lua to script a distributed workflow. Lua has first-class coroutines, which means that you can pause and continue a Lua script at will. The LuaInterface library allows you to inject C# functions and call them as Lua functions, so it’s simply a case of calling an asynchronous C# ‘begin’ function, suspending the script by yielding the coroutine, waiting for the asynchronous function to return, setting the return value, and starting up the script again.
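To make the pause-and-continue idea concrete, here’s a minimal sketch in plain Lua, independent of LuaInterface. The names ‘log’ and ‘co’ are only there for illustration.

```lua
-- Minimal illustration of pausing and continuing a script with a coroutine.
local log = {}

local co = coroutine.create(function()
    table.insert(log, "step 1")
    -- Suspend here; the value passed to the next resume becomes yield's result.
    local value = coroutine.yield()
    table.insert(log, "step 2 got " .. tostring(value))
end)

coroutine.resume(co)          -- runs the body until the yield
table.insert(log, "paused")   -- the host is free to do other work here
coroutine.resume(co, 42)      -- continues the body right after the yield

for _, line in ipairs(log) do print(line) end
```

The second resume hands 42 back into the script as the return value of ‘coroutine.yield’, which is one way a host could deliver a result to a suspended script.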

Let me show you how.

First here’s a little Lua script:

a = 5
b = 6

print('doing remote add ...')

r1 = remoteAdd(a, b)

print('doing remote multiply ...')

r2 = remoteMultiply(r1, 4)

print('doing remote divide ...')

r3 = remoteDivide(r2, 2)

print(r3)

The three functions ‘remoteAdd’, ‘remoteMultiply’ and ‘remoteDivide’ are all asynchronous. Behind the scenes a message is sent via RabbitMQ to a remote OperationServer where the calculation is carried out and a message is returned.

The script runs in my LuaRuntime class. This creates and sets up the Lua environment that the script runs in:

using System;
using LuaInterface;

public class LuaRuntime : IDisposable
{
    private readonly Lua lua = new Lua();
    private readonly Functions functions = new Functions();

    public LuaRuntime()
    {
        // Expose two C# methods to Lua as global functions.
        lua.RegisterFunction("print", functions, typeof(Functions).GetMethod("Print"));
        lua.RegisterFunction("startOperation", this, GetType().GetMethod("StartOperation"));

        // Define the Lua-side helpers that workflow scripts call.
        lua.DoString(
            @"
            function remoteAdd(a, b) return remoteOperation(a, b, '+'); end
            function remoteMultiply(a, b) return remoteOperation(a, b, '*'); end
            function remoteDivide(a, b) return remoteOperation(a, b, '/'); end

            function remoteOperation(a, b, op)
                startOperation(a, b, op)
                -- suspend the script until the C# callback resumes it
                local cor = coroutine.running()
                coroutine.yield(cor)

                return LUA_RUNTIME_OPERATION_RESULT
            end
            "
        );
    }

    // Called from Lua. Starts the remote call and returns immediately.
    public void StartOperation(int a, int b, string operation)
    {
        functions.RunOperation(a, b, operation, result =>
        {
            // Publish the result as a Lua global, then restart the script.
            lua["LUA_RUNTIME_OPERATION_RESULT"] = result;
            lua.DoString("coroutine.resume(co)");
        });
    }

    public void Execute(string script)
    {
        // Wrap the whole script in a coroutine stored in the global 'co'.
        const string coroutineWrapper =
            @"co = coroutine.create(function()
                {0}
            end)";

        lua.DoString(string.Format(coroutineWrapper, script));
        lua.DoString("coroutine.resume(co)");
    }

    public void Dispose()
    {
        lua.Dispose();
        functions.Dispose();
    }
}

When this class is instantiated it creates a new LuaInterface environment (the Lua class) and a new instance of a Functions class that I’ll explain below.

The constructor is where most of the interesting setup happens. First we register two C# functions that we want to call from inside Lua: ‘print’, which simply prints to the console, and ‘startOperation’, which starts an asynchronous math operation.

Next we define our three functions: ‘remoteAdd’, ‘remoteMultiply’ and ‘remoteDivide’, which all in turn invoke a common function, ‘remoteOperation’. This calls the registered C# function ‘startOperation’ and then yields the currently running coroutine. Effectively the script stops here until it’s resumed. Once it resumes, the result of the asynchronous operation is read from the LUA_RUNTIME_OPERATION_RESULT variable and returned to the caller.
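Here’s a sketch of that handshake in plain Lua, with the C# side replaced by stand-ins so that it runs on its own. ‘pending’ and ‘complete’ are hypothetical helpers playing the part of the RabbitMQ round trip; only ‘remoteOperation’, ‘startOperation’, ‘co’ and LUA_RUNTIME_OPERATION_RESULT correspond to names in the runtime above.

```lua
-- Stand-in for the C# startOperation: remember the request instead of sending it.
local pending = nil
local function startOperation(a, b, op)
    pending = { a = a, b = b, op = op }
end

local function remoteOperation(a, b, op)
    startOperation(a, b, op)
    coroutine.yield()                    -- script pauses until the host resumes it
    return LUA_RUNTIME_OPERATION_RESULT  -- global set by the host before resuming
end

local results = {}
co = coroutine.create(function()
    local r1 = remoteOperation(5, 6, '+')
    results[#results + 1] = r1
    local r2 = remoteOperation(r1, 4, '*')
    results[#results + 1] = r2
end)

-- Hypothetical host step: compute the answer locally, store it in the
-- global, then resume the script where it yielded.
local function complete()
    local p = pending
    pending = nil
    if p.op == '+' then
        LUA_RUNTIME_OPERATION_RESULT = p.a + p.b
    elseif p.op == '*' then
        LUA_RUNTIME_OPERATION_RESULT = p.a * p.b
    end
    coroutine.resume(co)
end

coroutine.resume(co)  -- runs the script up to the first yield
complete()            -- answers the add, script runs to the second yield
complete()            -- answers the multiply, script finishes
print(results[1], results[2])
```

The script itself reads like ordinary sequential code; all of the stopping and starting is hidden inside ‘remoteOperation’.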

The C# function StartOperation calls RunOperation on our Functions class, passing a callback that is invoked asynchronously when the response arrives. In the callback we set the result value in the Lua environment and execute ‘coroutine.resume’, which restarts the Lua script at the point where it yielded.
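As a possible variation (this is not what the code above does), the result could travel through ‘coroutine.resume’ itself rather than through a global variable. The sketch below assumes a hypothetical local ‘compute’ function standing in for the remote server. It also checks the first return value of ‘coroutine.resume’, since resume reports a failing script through its return values rather than raising an error.

```lua
local function remoteOperation(a, b, op)
    -- Hand the request to the host as the yield value and receive the result
    -- as the value passed to the next resume (no global variable needed).
    return coroutine.yield({ a = a, b = b, op = op })
end

local script = coroutine.create(function()
    local r1 = remoteOperation(5, 6, '+')
    return remoteOperation(r1, 4, '*')
end)

-- Hypothetical local stand-in for the remote server.
local function compute(req)
    if req.op == '+' then return req.a + req.b end
    if req.op == '*' then return req.a * req.b end
    error("unsupported operation: " .. tostring(req.op))
end

-- Drive the script. Each resume returns ok plus either the next request
-- (while the script is suspended) or the script's return value (once done).
local ok, value = coroutine.resume(script)
while ok and coroutine.status(script) == "suspended" do
    ok, value = coroutine.resume(script, compute(value))
end

print(ok, value)
```

If the script fails, ‘ok’ is false and ‘value’ holds the error message, so the host can decide what to do with a broken workflow instead of silently losing it.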

The Execute function actually runs the script. First it embeds it in a ‘coroutine.create’ call so that the entire script is created as a coroutine, then it simply starts the coroutine by calling ‘coroutine.resume’.
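The same wrapping can be sketched in plain Lua. Treat this as an alternative under my own assumptions rather than what Execute does: instead of pasting the script into a text template, it compiles the script text to a function and hands that function to ‘coroutine.create’. The ‘trace’ table and the sample script are made up for the example.

```lua
-- loadstring exists in Lua 5.1; later versions use load for strings.
local loadchunk = loadstring or load

-- A made-up script that pauses once in the middle.
local script = [[
    trace[#trace + 1] = 'before yield'
    coroutine.yield()
    trace[#trace + 1] = 'after yield'
]]

trace = {}  -- global so that the loaded script can see it

-- Compile the text into a function; a syntax error comes back as nil plus a message.
local chunk, err = loadchunk(script)
assert(chunk, err)

-- Wrap the compiled script in a coroutine and start it.
local co = coroutine.create(chunk)
coroutine.resume(co)
local statusAfterStart = coroutine.status(co)

-- Resume once more so the script runs to completion.
coroutine.resume(co)
print(statusAfterStart, coroutine.status(co), #trace)
```

One small benefit of compiling first is that a syntax error in the script shows up as an error message from the load call, before any of the workflow has run.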

The Functions class is just a wrapper that maintains an EasyNetQ connection to RabbitMQ and makes an EasyNetQ request to a remote server somewhere else on the network.

using System;
using EasyNetQ;

public class Functions : IDisposable
{
    private readonly IBus bus;

    public Functions()
    {
        // Connect to a RabbitMQ broker on the local machine.
        bus = RabbitHutch.CreateBus("host=localhost");
    }

    public void Dispose()
    {
        bus.Dispose();
    }

    // Sends the operation to the remote server and invokes resultCallback
    // with the answer when the response message arrives.
    public void RunOperation(int a, int b, string operation, Action<int> resultCallback)
    {
        using (var channel = bus.OpenPublishChannel())
        {
            var request = new OperationRequest()
            {
                A = a,
                B = b,
                Operation = operation
            };
            channel.Request<OperationRequest, OperationResponse>(request, response =>
            {
                Console.WriteLine("Got response {0}", response.Result);
                resultCallback(response.Result);
            });
        }
    }

    // Registered in Lua as 'print'.
    public void Print(string msg)
    {
        Console.WriteLine("LUA> {0}", msg);
    }
}
 
Here’s a sample run of the script:
 
DEBUG: Trying to connect
DEBUG: OnConnected event fired
INFO: Connected to RabbitMQ. Broker: 'localhost', Port: 5672, VHost: '/'
LUA> doing remote add ...
DEBUG: Declared Consumer. queue='easynetq.response.143441ff-3635-4d5d-8e42-6b379b3f8356', prefetchcount=50
DEBUG: Published to exchange: 'easy_net_q_rpc', routing key: 'Mike_DistributedLua_Messages_OperationRequest:Mike_DistributedLua_Messages', correlationId: '50560dd9-2be1-49a1-96f6-9c62641080ae'
DEBUG: Recieved
RoutingKey: 'easynetq.response.143441ff-3635-4d5d-8e42-6b379b3f8356'
CorrelationId: '50560dd9-2be1-49a1-96f6-9c62641080ae'
ConsumerTag: '101343d9-9497-4893-88e6-b89cc1de29a4'
Got response 11
LUA> doing remote multiply ...
DEBUG: Declared Consumer. queue='easynetq.response.f571f6d7-b963-4a88-bf62-f05785009e39', prefetchcount=50
DEBUG: Published to exchange: 'easy_net_q_rpc', routing key: 'Mike_DistributedLua_Messages_OperationRequest:Mike_DistributedLua_Messages', correlationId: '0ea7e1c3-6f12-4cb9-a861-2f5de8f2600d'
DEBUG: Model Shutdown for queue: 'easynetq.response.143441ff-3635-4d5d-8e42-6b379b3f8356'
DEBUG: Recieved
RoutingKey: 'easynetq.response.f571f6d7-b963-4a88-bf62-f05785009e39'
CorrelationId: '0ea7e1c3-6f12-4cb9-a861-2f5de8f2600d'
ConsumerTag: '2c35f24e-7745-4475-885a-d214a1446a70'
Got response 44
LUA> doing remote divide ...
DEBUG: Declared Consumer. queue='easynetq.response.060f7882-685c-4b00-a930-aa4f20f7c057', prefetchcount=50
DEBUG: Published to exchange: 'easy_net_q_rpc', routing key: 'Mike_DistributedLua_Messages_OperationRequest:Mike_DistributedLua_Messages', correlationId: 'ea9a90cc-cd7d-4f05-b171-c6849026ac4a'
DEBUG: Model Shutdown for queue: 'easynetq.response.f571f6d7-b963-4a88-bf62-f05785009e39'
DEBUG: Recieved
RoutingKey: 'easynetq.response.060f7882-685c-4b00-a930-aa4f20f7c057'
CorrelationId: 'ea9a90cc-cd7d-4f05-b171-c6849026ac4a'
ConsumerTag: '90e6b024-c5c4-440a-abdf-cb9a000c131c'
Got response 22
LUA> 22
DEBUG: Model Shutdown for queue: 'easynetq.response.060f7882-685c-4b00-a930-aa4f20f7c057'
Completed
DEBUG: Connection disposed

You can see the Lua print statements interleaved with EasyNetQ DEBUG statements showing the messages being published and consumed.
 
So there you go, a distributed workflow scripting engine in under 100 lines of code. All I’ve got to do now is serialize the Lua environment at each yield and then restart it again from its serialized state. This is possible according to a bit of googling yesterday afternoon. Watch this space.
 
You can find the code for all this on GitHub here:
 

3 comments:

Anonymous said...

Interesting.

Having spent time with Erlang / actor model, and NServiceBus sagas, I probably wouldn't use this.

Still interesting though.

Vasily said...

Why don't you use F# async workflows for this? It'd be static typed, clearer and faster.

Travelling Greg said...

While a fun toy, a considerable amount of work would need to be done on this to make it anything resembling useful. As example it is common to want to do a join in such scripts. This model does not support such an operation and it would be non-trivial to make work.