This post was sparked by some work I've been doing recently: I've been fortunate enough to have time to really get under the skin of continuous integration around a large UK-based e-commerce website. I'm busily working on some hands-on blog posts applying what I've learned to one of my open-source projects, but first, a little bit of history.
Since my first encounter with continuous integration (CI) in April 2009, I've been through a bit of a journey with it, along which it's intruded on and controlled a bigger and bigger chunk of the whole software process. I've gone from CruiseControl giving me a simple version number, to unit tests and Selenium tests under Hudson (followed by Jenkins), with recent forays into NuGet feeds, automated deployments and code metrics under TeamCity and PSake.
This whole thing's taken me almost four years to get to where I am today, but looking back it really should have taken me as many months. Seriously, if you're just getting started with CI, don't waste time playing with the basics. This post will explain the benefits of the most basic CI setup, and take you step by step through my journey towards the ultimate CI setup that automates every humdrum, error-prone step of your software build and deploy process.
Continuous Integration Level 1 – Get Me A Build Number!
Initially, most people start with the CI server simply building their code and producing an associated build number. CI is your measure of, "Is what's in our version control system okay?", and when you get a "yes", you get a corresponding build number.
You've checked in a change which builds on your machine, but blows up on your work-mate Bob's machine. Without CI, how do you know if you've messed up the commit, or if something's screwy on Bob's machine, or if Bob's screwed up when he updated (merge conflict ahoy!), or if he's got some screwy code he's not committed yet? With a simple CI build, within a couple of minutes you have the decider between "Dammit I broke the build!", and "Dammit Bob, you fix it!"
Continuous Integration Level 2 – Is It Healthy?
Now that we know we've not totally screwed the source code (it builds, yay!), we can start worrying about how healthy our source code is.
What do I mean by "healthy"? There's lots of breadth to this (the first few of the more hands-on blog posts I'm working on will cover this area). After getting a simple build, you can move on to all sorts of metrics, just off the top of my head:
- Unit tests pass
- Integration tests pass
- Percentage of test coverage
- Number of duplicate code fragments
- Average method length, class length, etc
- Number of ReSharper/FxCop/StyleCop/whatever warnings and errors
- Cyclomatic complexity
- Performance stats
TeamCity can handle some of this stuff, and there are also other tools out there that can work alongside TeamCity to help in this space, for example Sonar (which I'm sure will also be the subject of an upcoming blog post).
Continuous Integration Level 3 – Producing Artefacts
When your CI successfully builds, tests and labels a particular revision of your code, you should see that as "the thing" you want to deploy. You might go on to do further manual testing of "the thing"; if that manual testing passes as well, then you really want to get "the thing" live and in front of your customers as quickly and safely as possible.
How do you guarantee that "the thing" that you eventually deploy to live is the same "the thing" that's been through your CI build and test, your QA environment and QA testing? How do you know for sure that no one has snuck in a half-assed last minute bug fix that's going to blow up in production? You produce artefacts.
Artefacts are great. When a CI build completes successfully, the output of that build gets packaged up, usually into a zip file or a NuGet package, and that's the thing that gets deployed into each environment. So long as that artefact is used, you then know that it's been automatically and manually tested through a process identical to that of every other deployment.
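As a rough sketch of the idea (the names and paths here are invented, and in practice a CI tool like TeamCity does this step for you), packaging a build's output into a versioned artefact can be as simple as zipping up the build folder under the build number:

```python
import zipfile
from pathlib import Path

def package_artefact(build_dir: str, build_number: str, out_dir: str) -> Path:
    """Zip the contents of build_dir into a single versioned artefact file.

    The artefact is named after the build number, so a given zip can always
    be traced back to the exact CI build (and revision) that produced it.
    """
    out = Path(out_dir) / f"MyApp.{build_number}.zip"
    out.parent.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in Path(build_dir).rglob("*"):
            if f.is_file():
                # Store paths relative to the build folder, so the zip
                # unpacks with the same layout in every environment.
                zf.write(f, f.relative_to(build_dir))
    return out
```

The key point is that the artefact is produced exactly once, named after the build, and never rebuilt per environment.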
Artefacts and Environment Configuration
I can hear someone saying this – "But hold on – if you deploy that same artefact into each environment, what about your config files? Surely they can't be the same in each environment?"
As you proceed along the "CI Journey", at some point you're going to hit the configuration problem, and it needs solving; the sooner the better. I'm going to talk in detail about a couple of solutions to this in another blog post, but for now you've got a few options:
- Keep your environments identical, using tricks like "hosts" files to repoint domains – this of course isn't a complete solution on its own, but always aim to keep your environments as similar as possible.
- Don't deploy your config files at all; instead, hand-ball config changes using a combination of documented changes, revision history of config files, and a tool like WinMerge.
- Use the Visual Studio Config File Transformations process – I'm personally not a fan of this process, I can't see how it enforces the same transformations in each environment.
- Roll your own config file transform process using a list of XPath locators, and a set of "environment" files with values for each transform. You can throw an error if a value is missing for a particular transform for a particular environment, and halt proceedings.
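To make that last option concrete, here's a minimal sketch of what such a transform runner might look like. The locators, attribute names and environment values are all invented for illustration, and Python's built-in `xml.etree` stands in for whatever XML tooling your build scripts actually use:

```python
import xml.etree.ElementTree as ET

# Each transform names an XPath locator, the attribute to rewrite, and a
# value per environment. These entries are made-up examples.
TRANSFORMS = [
    {
        "xpath": ".//connectionStrings/add[@name='Main']",
        "attribute": "connectionString",
        "values": {
            "qa": "Server=qa-db;Database=Shop",
            "live": "Server=live-db;Database=Shop",
        },
    },
]

def transform_config(config_path, environment, transforms=TRANSFORMS):
    """Apply every transform for the given environment to the config file.

    A missing locator or a missing environment value raises an error,
    which in a real build script would fail the build and halt the deploy.
    """
    tree = ET.parse(config_path)
    for t in transforms:
        node = tree.find(t["xpath"])
        if node is None:
            raise ValueError(f"Locator not found in config: {t['xpath']}")
        if environment not in t["values"]:
            raise ValueError(
                f"No value for environment '{environment}' "
                f"in transform {t['xpath']}")
        node.set(t["attribute"], t["values"][environment])
    return tree
```

The nice property is the fail-fast behaviour: forget to add a value for a new environment and the deploy stops dead, rather than limping into production with a dev connection string.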
Continuous Integration Level 3.5 – NuGet
Spoiler alert – CI level 4 is automating your deployments. We'll come to that in a moment, because there's a spot between build artefacts and deployments that's recently hit the .Net world big time, and that's NuGet.
For the uninitiated, NuGet is a system that manages artefacts called packages (NuGet packages are really zip files with a different extension and some special extra files in them), and defines a system for a build or development tool to fetch these packages when needed from a web feed, similar to RSS for blogs. NuGet has some added sugar to deal with things like package versioning, managing dependencies between packages, running on different versions of .Net, etc. For more details, see my recent blog post on creating and publishing a NuGet package.
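You can check the "zip with a different extension" claim for yourself: rename a .nupkg to .zip and open it with any archive tool. The sketch below builds a deliberately stripped-down, package-shaped zip just to show the structure (a real .nupkg produced by nuget.exe contains additional bookkeeping files beyond the .nuspec manifest and the lib/ payload):

```python
import zipfile

def make_minimal_nupkg(path, package_id, version):
    """Build a bare-bones NuGet-style package: an ordinary zip archive
    containing a .nuspec manifest plus the DLL payload under lib/.

    Illustrative only -- not a fully valid package.
    """
    nuspec = f"""<?xml version="1.0"?>
<package>
  <metadata>
    <id>{package_id}</id>
    <version>{version}</version>
    <authors>me</authors>
    <description>demo package</description>
  </metadata>
</package>"""
    with zipfile.ZipFile(path, "w") as zf:
        zf.writestr(f"{package_id}.nuspec", nuspec)
        # Empty placeholder standing in for the real compiled DLL.
        zf.writestr(f"lib/net45/{package_id}.dll", b"")
```

Seeing the package as "just a zip" demystifies a lot of NuGet debugging: when a package misbehaves, crack it open and look inside.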
In our case, we have a bunch of solutions that we build, each producing a DLL for sharing between multiple other solutions. TeamCity (and many other CI tools) can automatically produce NuGet packages, and TeamCity (I'm not so sure about other CI tools) has a built-in NuGet feed. Instant win.
We now have TeamCity building and testing our DLLs, and a NuGet feed hosting current and historic versions of those DLLs. We use the Visual Studio NuGet tools to manage the DLL versions against the applications that depend on these DLLs, and we can be sure that if we update a DLL needed by another DLL, it'll automatically get sorted when we build our solutions, all's well.
Apart from the hellish time we had trying to use an authenticated NuGet feed. Don't do that, ever. It'll make you mad. Anonymous feed all the way.
Continuous Integration Level 4 – Automated Deploy
Now we're getting serious – we're talking about getting a commit from a developer's machine into an environment, with no manual steps. No manual keying in of version numbers, no manual running of scripts on a developer's machine, no handballing of config files – in short, a whole slew of potential bugs stopped dead.
Depending on what you're building and where you want it deployed, automated deploy can mean a couple of different things.
- If you're building desktop software, automated deploy probably means building a .msi file (or something similar), exposing that as an artefact, and automating the install of said .msi file into an environment where you can automate tests against your app.
- If you're in web world (like ourselves), automated deploy is in a lot of ways simpler – it can be as simple as positioning build artefacts in the appropriate location(s), and potentially updating something like IIS to point to a new folder location.
- If you're in cloud world, then you're in a world I currently know very little of, so I won't say much here. But I have a suspicion that your automated deploy will involve spinning up a new environment, deploying into it clean, and spinning up integration testing.
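For the web case, one common shape for "positioning artefacts and repointing the server" can be sketched as follows. This assumes a POSIX-style host where a `current` symlink is practical; on IIS you'd repoint the site's physical path (which is exactly the kind of thing Octopus Deploy handles for you). All names here are illustrative:

```python
import os
import zipfile
from pathlib import Path

def deploy(artefact_zip, site_root, build_number):
    """Unzip the build artefact into a versioned releases folder, then
    flip a 'current' symlink to point at it.

    The web server serves from 'current', so the switch is near-instant,
    and rolling back is just flipping the link to the previous release.
    """
    release = Path(site_root) / "releases" / build_number
    release.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(artefact_zip) as zf:
        zf.extractall(release)
    current = Path(site_root) / "current"
    tmp = Path(site_root) / "current.tmp"
    if tmp.exists() or tmp.is_symlink():
        tmp.unlink()
    tmp.symlink_to(release, target_is_directory=True)
    os.replace(tmp, current)  # atomic swap of the symlink on POSIX
    return current
```

Because every release lands in its own folder, the locked-files and stale-folder problems mentioned below largely disappear: nothing is ever overwritten in place.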
If you're doing this kinda stuff, particularly if IIS is involved, take a look at Octopus Deploy – we've got some serious love for this tool.
But even if you've got tools like Octopus helping out, actually implementing auto deploy is tough. You need a good handle on the artefacts coming out of each build and what subsequent builds do with those artefacts; you need a fair bit of patience to run the same build 10-20 times sorting out crappy issues like folders not being cleared down because of locked files; and you need to sort out excluding files from NuGet that you don't want to publish (hint – an xcopy-style publish step before you NuGet can help here). Again, expect some future blog posts looking at this in a little more detail.
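That xcopy-style publish hint can be sketched like so (the exclusion patterns are invented for illustration): copy the build output into a cleared-down staging folder, skipping anything you don't want packaged, then point the NuGet pack step at the staging folder instead of the raw build output:

```python
import shutil
from pathlib import Path

# Illustrative patterns -- adjust to whatever your build shouldn't publish.
EXCLUDE = {"*.pdb", "*.config", "Thumbs.db"}

def staged_publish(build_dir, stage_dir):
    """Copy build output to a clean staging folder, skipping excluded
    files, so the packaging step only ever sees what should ship."""
    stage = Path(stage_dir)
    if stage.exists():
        shutil.rmtree(stage)  # always start from a cleared-down folder
    for f in Path(build_dir).rglob("*"):
        if f.is_file() and not any(f.match(p) for p in EXCLUDE):
            dest = stage / f.relative_to(build_dir)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(f, dest)
    return stage
```

Deleting and recreating the staging folder on every run is the cheap insurance here: leftovers from a previous build can never sneak into the package.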
Journey's End – We Made It!
Now that you've actually got that ninja uber-build automating everything, you may think you can sit back and enjoy a well-earned break. But if your software's anything like mine, you've maybe just realised that integration testing suddenly has a whole new significance, and there's a whole new journey of Selenium WebDriver, SpecFlow, Watir, WatiN, Selenium Grid...