Thursday, January 20, 2011

Monads in C#-6. Introducing a simple parser

This is the sixth part of my Monad series. See the introduction for a list of all the posts.

You can find the code for this post on GitHub here.

So far we’ve learnt how to create a simple Monad, the Maybe Monad, and seen how it can help us eliminate boiler plate code by wrapping the mechanics of handling nullability in the Bind function. But as I said before, there’s no real limit to the complexity we can encapsulate in Bind and still treat our Monad as if it were a simple type. To illustrate this, I’m going to leap ahead in complexity and use the canonical Monad example, a Parser Monad. In this post I’ll introduce a simple concept of a Parser and in later posts I’ll show how we can turn it into a Monad.

I’ve ripped this example straight from Graham Hutton’s excellent Haskell textbook, Programming in Haskell.

OK, so let’s think about what a parser does. It takes some input, usually some text, and produces some desired output from it. So a CSV parser would take a text file and output rows and columns of data, preferably typed as numbers, strings and dates. In abstract terms we can think of a parser as a function that takes a string and returns some type T:

Func<string, T>

It’s always easier if we can break large tasks into smaller ones, so it would be good if we could build our parser from lots of mini-parsers. Each mini-parser might only consume some characters of the string, so we really want a function that consumes a string and returns a T and and the unparsed remainder of the input string:

Func<string, Tuple<T,string>>

I’m using the new Tuple type from .NET4 here, but you could just as well NameValuePair or your own custom type.

Our parser might not get a string it understands, so we should be able to account for failure. We’ll do the simplest thing possible for now and reuse our Maybe type:

Func<string, Maybe<Tuple<T,string>>>

Now let’s create a simple parser that just consumes a particular sequence of characters, ‘Hello’:

public static Maybe<Tuple<string, string>> FindHello(string input)
{
return input.StartsWith("Hello")
? new Just<Tuple<string, string>>(Tuple.Create("Hello", input.Skip("Hello".Length).AsString()))
: (Maybe<Tuple<string, string>>)new Nothing<Tuple<string, string>>();
}

If we feed it ‘Hello’ it will return ‘Hello’ plus the remainder of the string:
var result = Parsers.FindHello("Hello World");
var justResult = result as Just<Tuple<string,string>>;
Console.Out.WriteLine("justResult.Value.Item1 = {0}", justResult.Value.Item1);
// justResult.Value.Item1 = Hello
Console.Out.WriteLine("justResult.Value.Item2 = {0}", justResult.Value.Item2);
//justResult.Value.Item2 = World

If we feed it ‘Goodbye’ it will return Nothing:
var result2 = Parsers.FindHello("Goodbye World");
Console.Out.WriteLine("result2 = {0}", result2);
// result2 = Nothing
 
We can make our parser a little more generic by creating a little parser factory that can make a token parser for any sequence of characters. But first, to make our lives easier, let’s define a Parser delegate:
public delegate Maybe<Tuple<T, string>> Parser<T>(string input);

Now here’s our ‘Find’ generator, we’ll write it as an extension method:
public static Parser<string> Find(this string stringToFind)
{
return s => s.StartsWith(stringToFind)
? new Just<Tuple<string, string>>(Tuple.Create(stringToFind,s.Skip(stringToFind.Length).AsString()))
: (Maybe<Tuple<string, string>>)new Nothing<Tuple<string, string>>();
}

This is a higher order function, it returns a function, our parser. Don’t loose sight of the fact that Parser<T> represents a delegate.
 
Now we can use it to make a couple of parsers, a ‘Hello’ parser and a ‘World’ parser:
var helloParser = "Hello".Find();
var worldParser = "World".Find();

Let’s write a convenience extension method to turn parse results into strings:
 
public static string AsString<T>(this Maybe<Tuple<T,string>> parseResult, Func<T, string> unwrap)
{
var justParseResult = parseResult as Just<Tuple<T, string>>;

return (justParseResult != null)
? unwrap(justParseResult.Value.Item1)
: "Nothing";
}

Now we can use our helloParser and worldParser to parse strings:
var result = helloParser("Hello World").AsString(s => s);
Console.Out.WriteLine("result = {0}", result);
// result = Hello

var result2 = worldParser("World Hello").AsString(s => s);
Console.Out.WriteLine("result2 = {0}", result2);
// result2 = World

How could we join them together to make a parser for ‘HelloWorld’ (we won’t worry about whitespace just yet). Here’s a brute force attempt:
Parser<Tuple<string,string>> helloWorldParser = s =>
{
var helloResult = helloParser(s) as Just<Tuple<string, string>>;
if (helloResult == null) return new Nothing<Tuple<Tuple<string, string>, string>>();

var worldResult = worldParser(helloResult.Value.Item2) as Just<Tuple<string, string>>;
if (worldResult == null) return new Nothing<Tuple<Tuple<string, string>, string>>();

return new Just<Tuple<Tuple<string, string>, string>>(Tuple.Create(
Tuple.Create(helloResult.Value.Item1, worldResult.Value.Item1), worldResult.Value.Item2));
};

var result3 = helloWorldParser("HelloWorld").AsString(s => s.Item1 + " " + s.Item2);
Console.Out.WriteLine("result3 = {0}", result3);
// result3 = Hello World

Phew! that was hard work. Imagine if we were combining parsers for anything a little more complex, like our CSV parser for example. We’d have a nasty headache pretty quickly. But don’t worry, we’ll see in the next post that we can turn our parser into a Monad and combine them with sublime simplicity. Stay tuned combinator fans!

5 comments:

Anonymous said...

I'm really enjoying this series, can't wait to read the next one!

Nick said...

I love that Hutton book too :) ... I worked through the combinator parsing example a bit further and posted the eventual result at http://code.google.com/p/sprache/ - though it is still very basic I've found it useful time and time again.

Cheers!
Nick

Mike Hadlow said...

Hi Nick,

Thanks for pointing me to Sprache, it looks like a very nice example. I hadn't come across it before. Have you had much interest in it? I think it would be very nice if the C# community could coalesce around a nice parser combinator library with the status of something like Parsec in the Haskell world. I've noticed several project specific examples in various places, so it seems everybody is currently writing their own. That's fine, but it's probably time for a standard to emerge.

Unknown said...

Good series, unfortunately I missed DDD but Chris (@calcock) said that you gave a cracking good talk.

Mike Hadlow said...

Hi Damian, Monkey,

Thanks :) glad you enjoyed the talk.