Saturday, March 15, 2008

LINQ to String


I'm currently reading the excellent LINQ in Action by Marguerie, Eichert and Wooley. It's a great exposition of all things Linq, with lots of really well-explained examples. Something that tickled me, which I hadn't realized before, is that you can use Linq over strings. Well, of course: System.String implements IEnumerable<char>. That gives you some interesting alternatives to the standard string functions:

string text = "The quick brown fox jumped over the lazy dog.";

// substring
text.Skip(4).Take(5).Write();   // "quick"

// keep only letters (drops spaces and punctuation)
text.Where(c => char.IsLetter(c)).Write(); // "Thequickbrownfoxjumpedoverthelazydog"

string map = "abcdefg";

// keep only the characters that appear in map
text.Join(map, c1 => c1, c2 => c2, (c1, c2) => c1).Write(); // "ecbfedeeadg"

// does text contain q?
Console.WriteLine(text.Contains('q')); // True (returns a bool, so Write() doesn't apply here)

The 'Write()' at the end of each expression is just an extension method on IEnumerable<T> that writes to the console.
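The post doesn't show Write() itself. Here's a minimal sketch, assuming it simply concatenates the elements and prints them on one console line; the class name EnumerableExtensions is hypothetical. The Main method reruns the sequence examples above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical class name; the original post doesn't show how Write() is implemented.
public static class EnumerableExtensions
{
    // Concatenates every element of the sequence and writes the result as one console line.
    public static void Write<T>(this IEnumerable<T> source)
    {
        if (source == null) throw new ArgumentNullException("source");
        Console.WriteLine(string.Concat(source.Select(item => Convert.ToString(item))));
    }
}

public static class Program
{
    public static void Main()
    {
        string text = "The quick brown fox jumped over the lazy dog.";
        text.Skip(4).Take(5).Write();                                     // quick
        text.Where(c => char.IsLetter(c)).Write();                        // Thequickbrownfoxjumpedoverthelazydog
        text.Join("abcdefg", c1 => c1, c2 => c2, (c1, c2) => c1).Write(); // ecbfedeeadg
    }
}
```

Convert.ToString(object) is used so that a null element prints as an empty string instead of throwing.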

OK, so most of these aren't that useful, and there are built-in string functions for most of them (except the mapping, I think). The cool thing is that you can write a little function, called 'Chars' in my example, that exposes a file (via a StreamReader) as IEnumerable<char>. This means you can do the same tricks on a file, and since we're just building up a decorator chain of enumerators, the entire file is never loaded into memory: the chain pulls one character at a time from the reader (StreamReader itself still reads the underlying stream in small buffered blocks). In this example, showing the 'substring' again, enumeration stops after the first nine characters: the four skipped plus the five taken.

string text = "The quick brown fox jumped over the lazy dog.";
string myDocuments = System.Environment.GetFolderPath(Environment.SpecialFolder.Personal);
string path = Path.Combine(myDocuments, "someText.txt");

File.WriteAllText(path, text);

using (StreamReader reader = File.OpenText(path))
{
    reader.Chars().Skip(4).Take(5).Write(); // "quick"
}

Here's the code for the 'Chars' function.

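public static class StreamExtensions
{
    // Yields the reader's characters lazily, one per iteration.
    // Note: as an iterator method, the null check only runs when enumeration begins.
    public static IEnumerable<char> Chars(this StreamReader reader)
    {
        if (reader == null) throw new ArgumentNullException("reader");

        char[] buffer = new char[1];
        while (reader.Read(buffer, 0, 1) > 0)
            yield return buffer[0];
    }
}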
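To see the "stops after nine characters" claim in action, here's a small sketch that places a hypothetical counting decorator (Counted) between Chars() and the Skip/Take operators. It reads from an in-memory stream rather than a file, purely to keep the example self-contained; Chars() is repeated so the program compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

public static class Program
{
    // Same logic as the post's Chars(): yields one character per MoveNext.
    public static IEnumerable<char> Chars(this StreamReader reader)
    {
        if (reader == null) throw new ArgumentNullException("reader");
        char[] buffer = new char[1];
        while (reader.Read(buffer, 0, 1) > 0)
            yield return buffer[0];
    }

    // Hypothetical pass-through decorator that counts the characters pulled through it.
    public static IEnumerable<char> Counted(this IEnumerable<char> source, int[] counter)
    {
        foreach (char c in source)
        {
            counter[0]++;
            yield return c;
        }
    }

    public static void Main()
    {
        byte[] bytes = Encoding.UTF8.GetBytes("The quick brown fox jumped over the lazy dog.");
        using (var reader = new StreamReader(new MemoryStream(bytes)))
        {
            int[] pulled = { 0 };
            var result = new StringBuilder();
            foreach (char c in reader.Chars().Counted(pulled).Skip(4).Take(5))
                result.Append(c);

            Console.WriteLine(result);    // quick
            Console.WriteLine(pulled[0]); // 9
        }
    }
}
```

Take(5) stops asking for more once it has produced five items, so nothing beyond the ninth character is ever requested from Chars().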

In the book the authors show a similar function that reads a file line by line, and they use it to enumerate over a CSV file. The resulting syntax is extremely neat, but I'll let you buy the book and see it for yourself :)
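This isn't the book's code, but a rough sketch of the same idea: a hypothetical Lines() extension that yields one line per iteration via TextReader.ReadLine(), followed by a naive CSV query. The comma split assumes no quoted fields or embedded commas; real CSV needs a proper parser.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class LineExtensions
{
    // Hypothetical helper: consumes the reader one line at a time.
    // Works for StreamReader and StringReader, since both derive from TextReader.
    public static IEnumerable<string> Lines(this TextReader reader)
    {
        if (reader == null) throw new ArgumentNullException("reader");
        string line;
        while ((line = reader.ReadLine()) != null)
            yield return line;
    }
}

public static class Program
{
    public static void Main()
    {
        string csv = "name,price\nfox,3\ndog,5\n";
        using (var reader = new StringReader(csv))
        {
            // Naive split: assumes no quoted fields or embedded commas.
            var prices = reader.Lines()
                               .Skip(1) // skip the header row
                               .Select(line => line.Split(','))
                               .Select(fields => new { Name = fields[0], Price = int.Parse(fields[1]) });

            foreach (var p in prices)
                Console.WriteLine("{0}: {1}", p.Name, p.Price); // fox: 3, then dog: 5
        }
    }
}
```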


Anonymous said...

Very cool. I wonder what kind of workaround could be used for the "public static" extension method; I can only use instance methods, so how could that code sample work for me?

Mike Hadlow said...

Hi Coconet,

Thanks, I'm glad you liked the post. Why can you only use instance methods?


Andrew Webb said...

Hah! You're right, but what fooled me was that IntelliSense (in VS2008 at least) makes you think you can't use LINQ on strings. Type in the method name, though (e.g. "Skip"), and IntelliSense will kick in.