Monthly Archives: February 2013

Agile vs. uncertainty

Hi all,

This time I want to talk about how agile methods and approaches can help you cope with uncertainty.
I want to tell you about the last little development job I did and how I approached it.

A project of ours had just been successfully finished and the product shipped to the customer when I found out there was no proper way to determine how many installations we had.
So I started researching and found a little project that had been started and could help me get those numbers.
The little project was about parsing logfiles and writing the parse results to a database.
But the way the logfiles were parsed was very specific to other products that had been released earlier, so it didn’t help me with the new and fancy product that our customers apparently liked a lot.

Analysing what we had a bit more showed that I was not the only one who needed installation numbers for their products. And it was not only product management that was interested, but also Sales and others.

So my idea was to start a mini-agile-project out of this.
I called it v4.

The plan for v4 was simple.

  1. Adjust the parser so it is more generic and writes all the needed data into the database
  2. Modify the database or create a new database schema
  3. Visualize the data in a simple filtering webpage with a line chart
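To make step 2 of the plan a bit more concrete, here is a minimal schema sketch. All table and column names are hypothetical, and sqlite3 stands in for the MySQL database the project actually used, just so the sketch is runnable:

```python
import sqlite3

# Hypothetical sketch for step 2 of the plan. The real project used MySQL on
# an Ubuntu server; sqlite3 stands in here so the sketch is runnable. All
# table and column names are made up.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE installations (
        id      INTEGER PRIMARY KEY,
        product TEXT NOT NULL,  -- product name parsed from the logfile
        version TEXT NOT NULL,  -- product version
        seen_at TEXT NOT NULL   -- timestamp of the log entry
    )
""")
conn.execute(
    "INSERT INTO installations (product, version, seen_at) VALUES (?, ?, ?)",
    ("fancy-product", "4.0", "2013-02-01 12:00:00"),
)
count = conn.execute("SELECT COUNT(*) FROM installations").fetchone()[0]
print(count)
```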

Now you might wonder when the uncertainty comes into play.

My background is that of a Windows developer: I developed for more than 10 years in C, C++ and C#, mainly with MFC and WPF (UI technologies from Microsoft). I did a bit of SQL, but never really productively for work, more for fun in a small PHP web page project.
Part 1 of the plan was to adjust the parser, which was written in Perl and Bash scripts.
And it was completely unknown territory for me. The scripts and the database had to run on an Ubuntu server.
Now what does all of that have to do with agile?
One of the principles when developing in agile is the following:
Implement the simplest solution that you could possibly think of.
This one principle kept me going all the time.
If you have ever developed like that on a new and unknown platform, you might understand how frustrating it can be from time to time. But if you constantly achieve small goals and are able to “deliver” value to your customer, that can be very motivating.
So to fulfill part 1 of the plan I needed to adjust the parser to write that data into the database.
I first analysed what was there and learnt the ins and outs of the Bash scripts that dealt with the raw logfiles.
In the next step I had to adjust the Perl script that was already there.
Unfortunately, Perl is not necessarily the most descriptive programming language I have ever worked with, so I took some time to get used to it.

This is what Perl can look like:

  • $text =~ s/\/[^\/]*\/\.\.//; # remove a '/dir/..' sequence from a path
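For comparison, here is my reading of that line as a rough Python equivalent, assuming it strips the first `/dir/..` sequence from a path-like string:

```python
import re

# Rough Python equivalent of the Perl substitution above, assuming it
# removes the first "/dir/.." sequence from a path-like string.
text = "/usr/local/../bin"
text = re.sub(r"/[^/]*/\.\.", "", text, count=1)
print(text)
```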
Ok, I learned it and created a first version of the Perl script.
1st version:
  • parse the logfiles and create an external SQL file with INSERT statements that could be fed to mysql
It worked and did what it should do. A little drawback was that it proved to be quite slow: a few hours per logfile. Like that it was not usable.
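A rough sketch of the version 1 idea, translated to Python for illustration (the real parser was Perl; the log format, table name and function name here are all made up): parse log lines and emit an SQL file full of INSERT statements to feed to mysql later.

```python
import io
import re

# Hypothetical log format: "<timestamp> install product=<name>"
LOG_LINE = re.compile(r"(?P<ts>\S+ \S+) install product=(?P<product>\S+)")

def logfile_to_sql(log_lines, out):
    """Write one INSERT statement per matching log line to `out`."""
    for line in log_lines:
        m = LOG_LINE.match(line)
        if m:
            out.write(
                "INSERT INTO installations (seen_at, product) "
                "VALUES ('%s', '%s');\n" % (m.group("ts"), m.group("product"))
            )

out = io.StringIO()
logfile_to_sql(["2013-02-01 12:00:00 install product=fancy"], out)
print(out.getvalue().strip())
```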

2nd version:
  • directly insert the data from the Perl script into the database using DBI

When I started using direct database access from Perl, things improved quite a lot. A full logfile could be parsed in about 1.5 to 2 hours on the test machine. On the production machine, which had much more horsepower (8 quad-core CPUs and a huge amount of RAM), I hoped the speed would be much higher.
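The version 2 idea in a small sketch: insert each parsed record directly through a database API instead of going through an external SQL file. The original used Perl’s DBI against MySQL; Python’s sqlite3 stands in here, and all names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE installations (seen_at TEXT, product TEXT)")

def insert_record(conn, seen_at, product):
    # One round trip and one commit per record: simple, but the commits
    # add up over millions of rows, which is what made this version slow.
    conn.execute(
        "INSERT INTO installations (seen_at, product) VALUES (?, ?)",
        (seen_at, product),
    )
    conn.commit()

insert_record(conn, "2013-02-01 12:00:00", "fancy")
n = conn.execute("SELECT COUNT(*) FROM installations").fetchone()[0]
print(n)
```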
Then I checked how many files in total I’d have to parse to get the data of 4 weeks.
The bad news: ~11,000 files. 11k files * 2 hours each is about 22,000 hours of parsing: totally not doable.

3rd version:
  • greatly improve the speed of the DBI inserts

After some research about MySQL I was able to do it: from 2 hours down to 20 seconds. But the journey was not over yet. After parsing a few thousand logfiles the database grew too large and became unresponsive. What took only about 20 seconds when the database was empty took around 2 hours when it held over 200 million records.
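A sketch of the kind of change version 3 is about: batch many rows into one statement and one transaction instead of committing row by row. With MySQL the same effect comes from multi-row INSERTs or LOAD DATA INFILE; sqlite3 stands in here, and the numbers are illustrative, not the ones from the post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (product TEXT)")

# Insert 10,000 rows as one batch inside one transaction instead of
# committing each row individually.
rows = [("fancy",)] * 10000
conn.executemany("INSERT INTO hits (product) VALUES (?)", rows)
conn.commit()  # a single commit for the whole batch

total = conn.execute("SELECT COUNT(*) FROM hits").fetchone()[0]
print(total)
```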

4th version:

  • reduce the raw data to keep the database responsive
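The version 4 idea as a sketch: reduce the raw data to aggregates (for example, installations per product per day) so the database stays small and responsive. Table and column names are made up; sqlite3 again stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_hits (day TEXT, product TEXT)")
conn.executemany(
    "INSERT INTO raw_hits VALUES (?, ?)",
    [("2013-02-01", "fancy"), ("2013-02-01", "fancy"), ("2013-02-02", "fancy")],
)

# Roll the raw records up into one row per day and product ...
conn.execute("""
    CREATE TABLE daily_counts AS
    SELECT day, product, COUNT(*) AS hits
    FROM raw_hits
    GROUP BY day, product
""")
conn.execute("DELETE FROM raw_hits")  # ... then the raw rows can be dropped

counts = list(conn.execute("SELECT day, hits FROM daily_counts ORDER BY day"))
print(counts)
```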

There were many more steps but I’ll stop here.
What I wanted to show you with these steps is the way I approached the problem. I couldn’t do it the way I was used to developing. I didn’t have the time to first learn the programming language and the system I was working on, so that had to happen along the way.
If you are in an area of absolute uncertainty, it is very hard, if not impossible, to take the traditional approach of requirements, design, programming, test.

This is where the agile approach can shine:

  • implement a basic prototype to find out if you’re even able to do what you planned to do (version 1)
  • continuously refactor as you find out more about the platform and the system you are using (version 2)
  • let the architecture of your system emerge while you learn the pitfalls of the new environment (versions 3 + 4)
  • constantly question the requirements you have; there might be a better/different way to do it (version 4)
  • automate as much as possible; the easier you can start from scratch, the more likely you will succeed
  • test a lot; ideally create automated tests for the system so you can be sure the refactoring doesn’t break anything
  • keep your code as simple and readable as possible; this will benefit the refactoring
  • if you find complex blocks of code, refactor them and break them down into smaller chunks or modules until you are satisfied with the result
  • expect the unexpected and be flexible enough to deal with things that could not be foreseen
  • deal with unknowns as soon as you uncover them (thx @lukadotnet for pointing me to these last 2)
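To make the “test a lot” point concrete: even a tiny automated check around the parsing function means a refactoring cannot silently break it. The function name and log format below are hypothetical.

```python
import re

def parse_line(line):
    # Hypothetical log format: "<timestamp> install product=<name>"
    m = re.match(r"(\S+ \S+) install product=(\S+)", line)
    return (m.group(1), m.group(2)) if m else None

# A couple of automated checks: one good line, one garbage line.
assert parse_line("2013-02-01 12:00:00 install product=fancy") == (
    "2013-02-01 12:00:00",
    "fancy",
)
assert parse_line("garbage") is None
print("parser checks passed")
```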

I’m pretty sure this list is far from complete. So I would like to ask about the experiences you have made. If you have points to add to the list, I’d be more than happy to put them there. Just leave a comment with your addition. Thank you!

v4 was never entirely completed, but parts 1 and 2 were finished. The resulting database could answer questions that no system before it could answer. Even though the gold plating, the small Node.js webpage, was not finished, the whole project was a great success. A project that in its entirety was supposed to take only 1-2 weeks turned into a development effort of about 4-6 weeks for parts 1-2. I’m pretty sure there are different and more efficient ways to do it. But if you are uncertain – you never know…