Tuesday, August 28, 2012

Using Git and GitHub in a Microsoft Development Team

The team at 15Below, my excellent clients, have been using Git and GitHub since last September. Although I’ve been using GitHub for open source projects for several years now, this is the first time I’ve worked with it in a largish (20+ developers) team. The default VCS for a Microsoft shop is, of course, TFS, so deciding to use GitHub might be seen as somewhat curious. This post describes why we came to the decision, how we integrate GitHub into our development process, and our experience so far.

So why did we choose Git as our VCS?

  • I, and several of my colleagues, had had experience with distributed VCSs, specifically Git, from working on open source projects. Having your own local repository gives you so much more flexibility and opportunity for experimentation that a non-distributed VCS seemed like a step backwards.
  • The team is split into small project teams of 2 or 3 developers, each working on different features, so being able to branch-per-feature was also a requirement. We needed a VCS with excellent branching and merging capabilities.
  • We also have a distributed team with members in the UK, India and Australia, so a cloud based solution seemed appropriate. Our OSS experience with GitHub made it the obvious choice.
  • Whenever one is choosing tools, the level of adoption in the wider development community should be a consideration, and although Git is rare in the Microsoft world, it’s fast becoming the default VCS elsewhere.

GitHub is Git’s killer app. Without GitHub, Git would simply be another DVCS. After you’ve used a cloud-based VCS like GitHub it feels like overkill to even consider hosting your own master repository. We pay $25 per month for the basic Bronze plan, which is a trivial cost for an organisation of our size, yet it allows us to host our 5GB core repository and gives access to 20+ committers. I’m constantly amazed at Git and GitHub’s performance: I can pull the entire master branch down in just a few minutes, and most normal pulls and pushes take a few seconds. Just to give you some idea of the scale of our software, running NDepend on our master branch gives:

  • 635,884 Lines of code
  • 435 Assemblies
  • 17,831 Types
  • 185,423 Methods
  • And we have 7,665 commits since we started using GitHub last September.

So you can see that we are far from trivial users. GitHub and Git have proven to be reliable, scalable and fast (no, really fast) even for our rather bloated codebase.

The GitHub UI has also proved to be very useful. It gives a clear view of commits, and makes it easy to browse and comment on changes. Another nice touch is GitHub’s support for markdown syntax. We’ve started keeping technical documentation next to the code as .md files. This is great when you’re branching and merging because the documentation branches and merges along with the code. It also makes finding the docs for a particular assembly trivial since they’re part of the VS project.

Having decided on Git and GitHub, how did we integrate it into our existing tools and development process?

One lesson we’ve learnt is that source control tools that integrate into Visual Studio are problematic:

  • They tend to conflate changes to source code on disk with changes in the IDE. Weaning developers away from seeing everything through the Solution Explorer has led to far fewer problems with inadvertently changed files and corrupted solution and project files.
  • Source-controlled assets that are not controlled by the IDE get forgotten. ‘Everything that the IDE cares about’ is different from ‘everything that’s not ignored in this directory tree’. Using a source control tool that’s not integrated into VS gives a much cleaner view of the repository.
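
To make that ‘everything that’s not ignored’ view concrete, a minimal .gitignore for a Visual Studio solution might look something like this (a sketch; real projects usually need a few more entries):

    # build output and per-user IDE state
    bin/
    obj/
    *.suo
    *.user

    # restored NuGet packages
    packages/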

I still use the command line tools via Cygwin, but I’m in a minority of one; most of the team use Git Extensions and fall back on the bash shell when they need to do something complex. We initially tried Tortoise Git, but it wasn’t ready for prime time. We’ve also looked at GitHub for Windows, but I don’t think anyone is using it day-to-day.

We have a single master repository on GitHub with multiple branches for releases and development. Each developer is a committer to the master repository. This is closest to the way one would work with a more old-fashioned client-server tool like SVN, and it seemed like the obvious model when we initially considered using GitHub. So far it’s worked reasonably well. We ‘branch-per-feature’, so each team works in their own feature branches and then merges into the development branch when they are done. We have discussed feature switches, but felt that they would introduce an orthogonal source control concern into our code base.
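
In day-to-day terms the workflow boils down to a handful of Git commands; the branch names below are purely illustrative:

    git checkout development                 # start from the shared development branch
    git checkout -b feature/seat-selection   # create and switch to a feature branch
    # ...commit and push to the feature branch while the work is in progress...
    git checkout development
    git merge --no-ff feature/seat-selection # merge the finished feature back
    git push origin development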

We have also discussed using the GitHub model more directly with each developer having their own GitHub repository and issuing pull requests to a core repository. I quite like the idea of having a code review process built into the source control model, so it’s something I’d like to consider using in the future. I guess you’d have to have a ‘repo-guardian’ who handled all the pull requests. Whether this would be a single individual’s full time job, or something that would be shared around the team, is an interesting question.

We use TeamCity to manage our CI build process. It integrates well with GitHub and it only takes a few clicks to get it pulling on each push to GitHub. An essential piece of the branch-per-feature pattern is to have CI on every branch. Luckily TeamCity makes this pretty easy to do, and with its new support for feature branches it should become trivial.

Problems with Git and GitHub

  • The security model doesn’t integrate with Active Directory, so we have to manage users and logins separately, which is a pain. People often needed help with their SSH keys when getting started.
  • Git is hard to learn. I think Git’s greatest strength, and its greatest weakness, is that there is no abstraction on top of the architecture of its implementation. You really have to understand how it works internally in order to use it correctly, which means there’s a non-trivial learning curve. Having said that, even our most junior developers now use it successfully, so if your excuse is that ‘it’s far too difficult for my team to learn’, you are probably underestimating your team.
  • Some developers might worry that not having TFS experience on their CV could hurt their employment opportunities. On the other hand, our top developers think it’s pretty cool that we use the same tools that they use for their open source projects.

So …

On the whole our experience of Git and GitHub has been good. Our primary fear, that some of the junior people would find it too difficult to learn, has proved to be unfounded. There’s no doubt that the learning curve is greater than for TFS or SVN, but the power is also greater. The performance of Git and GitHub continues to impress, and we have no complaints with the robustness or stability of either tool. The merging and branching power of Git has allowed us to introduce a far more flexible product strategy and the repo-in-the-cloud has made the geographic spread of the team a non-issue. In short, GitHub is a compelling alternative to TFS and is a choice that I’m happy we made.

Friday, August 24, 2012

REST – Epic Semantic Fail

Roy Fielding writes a PhD dissertation describing the architectural style of the World Wide Web. He coins the term ‘Representational State Transfer’ (REST) to describe it – after all, if you’re going to talk about something, you need to give it a name. Somehow, in a semantic shift of epic fail proportions, the term REST has also come to mean something completely different to most people, a ‘style’ of writing web services. This style has no agreed protocol.

The result? The internet is ablaze with an out-of-control REST flame war. It seems that many people think there’s a REST protocol, when in fact there’s no such thing. Looking for a protocol in Roy Fielding’s dissertation will get you nowhere because it’s an academic paper describing an architectural style; there’s no protocol to be had. The only contribution Mr Fielding makes to the debate is to tell almost anyone who describes their API as RESTful that it is not.

Writing RESTful web services, in practice – in the real world – means that you are on your own. You have to write your own protocol (probably implicitly, because you don’t even realise that’s what you’re doing). Now the whole point of a protocol – TCP/IP, HTTP, SMTP, SOAP – is that everyone agrees on a set of (reasonably) unambiguous rules for communication. These can then be coded into libraries, toolkits, servers, what have you, and my Linux web server written in PHP can communicate with your .NET client running on Windows because the TCP/IP, HTTP and HTML specs are unambiguous enough to ensure that if you follow them, stuff will work. If you write your own protocol and nobody else adopts it, it’s not very useful. If I want to write a client to communicate with your REST API I’m in a world of pain; there’s no svcutil I can point at your API to generate a proxy for me. Instead I have to carefully read your documentation (if it exists) and then lovingly handcraft my client using low-level HTTP API calls.
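
To make that concrete, here’s a sketch of the kind of hand-rolled client code you end up writing; the URI, media type and resource are entirely hypothetical:

    using System.IO;
    using System.Net;

    public class CustomerClient
    {
        public string GetCustomerJson(int id)
        {
            // No generated proxy and no machine-readable contract: just an HTTP GET
            // built by hand from whatever the API documentation happens to say.
            var request = (HttpWebRequest)WebRequest.Create(
                "http://api.example.com/customers/" + id);
            request.Method = "GET";
            request.Accept = "application/vnd.sutekishop.customer+json";

            using (var response = (HttpWebResponse)request.GetResponse())
            using (var reader = new StreamReader(response.GetResponseStream()))
            {
                // The raw JSON still has to be mapped onto your own types by hand.
                return reader.ReadToEnd();
            }
        }
    }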

Now don’t get me wrong, I think a web service protocol based on a RESTful architectural style would be a wonderful thing, but let’s not kid ourselves that such a thing exists.

Show me the metadata.

The core missing pieces of any RESTful web service protocol are agreed standards for media type and link metadata. Everyone seems to agree that the Content-Type header should describe the structure and purpose of the resource, but it’s currently up for grabs how you might navigate from a media type description (like ‘application/vnd.sutekishop.customer+json’) to a machine-readable schema definition for the customer – should it be XSD? JSON Schema? The same goes for hyperlinks. Although there’s an established HTML standard (the A tag), how links should be encoded in an XML or JSON representation is up to the individual API author. Similarly, although there seems to be agreement that the link metadata should live at the URI described by the ‘rel’ attribute, what that metadata should look like is also undefined.

Sure, there are some valiant attempts to come up with a protocol – I quite liked HAL, and the HAL browser is an interesting take on how RIA UIs might be purely metadata-driven at runtime – but these are all still just proposals.
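
To show the kind of thing I mean, a hypothetical HAL-style representation of a customer might carry its links alongside the resource’s own properties like this:

    {
      "_links": {
        "self":   { "href": "/customers/42" },
        "orders": { "href": "/customers/42/orders" }
      },
      "name": "Alice Example",
      "email": "alice@example.com"
    }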

I think we’ll know when we have an established RESTful web service protocol, or protocols: it will be when we stop using the term ‘REST’ to describe what we are doing. When we’re writing SOAP-based services we call them just that: “I’m writing a SOAP web service”, not “I’m writing XML-based RPC tunnelled over HTTP”, which of course would be open to interpretation about exactly how we’re doing it. When we have an established protocol, ‘REST’ will be returned to its rightful place, and the only people who will use the term will be software architecture academics like Mr Fielding.

Evolution works for me

So far the tone of this rant has been somewhat negative; it seems like I’ve been rubbishing the whole ‘REST’ project. Actually I think the current situation is healthy. Monolithic ‘enterprise’ protocols like SOAP usually end up being a bad idea. The internet is a soup of small layered protocols, each one simple in itself, but working together to make a much larger whole. The debate around REST has reminded us that much of the infrastructure to create hugely scalable services already exists. If the community can build on this, and fill in the missing pieces, preferably with small protocols that solve each problem individually, then I think we will arrive at a much better place. My reason for writing this piece is simply to warn the unwary, regular ‘Morts’ like myself, that when someone says, “Mike, can you write us a REST API?”, the rule book has not yet been written and you will be making much of it up as you go along.

Tuesday, August 07, 2012

Sprache – A Monadic Parser For C#

Recently I had a requirement to parse AMQP error messages. A typical message looks something like this:

The AMQP operation was interrupted: AMQP close-reason, initiated by Peer, code=406,
text="PRECONDITION_FAILED - parameters for queue 'my.redeclare.queue' in vhost '/' not equivalent",
classId=50, methodId=10, cause=

It starts off with the high-level message text ‘The AMQP operation was interrupted’, then a colon, then some comma-separated values, some of which are key-value pairs.

I wanted to parse these into a ‘semantic model’ – an object graph that represents the structure of the error. Now I could have done some pretty nasty string manipulation: looking for the first colon, splitting the rest on commas, looking for ‘=’ and separating out the key-value pairs, but the code would have been rather ugly to say the least. I could have used regular expressions, but once again I doubt very much that I would have been able to read the resulting expression if I revisited the code in a couple of weeks’ time.

Then I remembered Sprache, a little monadic parser library by Nicholas Blumhardt that I’d encountered last year when I was writing about Monads. The lovely thing about Sprache is that you write your parser in readable C# code and build the semantic model directly in the parser code. It’s very easy to use and very readable. Nicholas has an excellent step-by-step post here that I’d strongly recommend reading.
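
To give a flavour of the style, here’s a simplified sketch (not the actual parser linked at the end of this post) that parses a single ‘code=406’ style element into a key-value pair:

    using System;
    using Sprache;

    public static class AmqpErrorElements
    {
        // a key: one or more letters or digits, e.g. "code" or "classId"
        static readonly Parser<string> Key =
            Parse.LetterOrDigit.AtLeastOnce().Text().Token();

        // a value: either a double-quoted string...
        static readonly Parser<string> QuotedValue =
            from open in Parse.Char('"')
            from content in Parse.CharExcept('"').Many().Text()
            from close in Parse.Char('"')
            select content;

        // ...or a bare run of characters up to the next comma
        static readonly Parser<string> BareValue =
            Parse.CharExcept(',').Many().Text().Token();

        // key '=' value, projected straight into the semantic model
        // (just a Tuple here for brevity)
        public static readonly Parser<Tuple<string, string>> KeyValue =
            from key in Key
            from eq in Parse.Char('=')
            from value in QuotedValue.Or(BareValue)
            select Tuple.Create(key, value);
    }

    // usage: var pair = AmqpErrorElements.KeyValue.Parse("code=406");
    //        pair.Item1 == "code", pair.Item2 == "406"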

I found it on NuGet, but whoever had put it up there had since disappeared. After contacting Nicholas I decided to adopt it, first moving the source to GitHub and setting up continuous deployment to NuGet via TeamCity.CodeBetter.com.

If you go to the NuGet.org Sprache page, you’ll see that its owners are myself and Nicholas and each push to the GitHub repository results in a new package upload (the last three all done today while I was getting it working ;).

So if you need to do some parsing, give Sprache a try, it’s much easier than writing your own parser from scratch, and unlike regex you can actually read the code after you’ve written it.

You can see my AMQP parser experiment here.