Start-Up Mode

India Lately by Gold Panda on Grooveshark
In December of last year, I joined Wajam, a great social search start-up. Since then, I've been very busy and have learned a ton. I've coded more in the last 6 months than in the previous 2 years. I've learned about Information Retrieval, NoSQL databases and Hadoop. I can now program in Scala and write Pig scripts and Map/Reduce jobs. The pace is incredibly fast; I often come home on Friday completely drained. I work with pure geniuses and lightning-fast typing hackers. Nothing can't be done. Ideas are implemented. I love it! At 34, I'm on the old side. I must admit I sometimes wonder how I can keep up the pace with small children at home, but somehow, it works.

One aspect of start-up mode that I want to discuss is continuous deployment. When I joined, I was scared: new code was deployed many times a day. There were practically no unit tests, no formal QA, no stress tests. Initially, I thought the system would be continuously unstable and that most developers would spend their time fighting production fires. It happened...sometimes...but not as often as I thought.

If you know me or if you've been reading some of my posts, you know that I'm a test-oriented developer. So I started adding unit tests and integration tests and made sure they were run every time new code was added. Then we added code reviews before pushing to production, using the GitHub development flow. As of now, I've never seen a system as stable as the one we have at Wajam...and the best part is that the team still pushes code to production many times a day. I'm not scared anymore.

Unit tests, code reviews...they all help keep the system stable. But I think the real key to a stable system is deploying often. I know it sounds counterintuitive, so let me explain. In the beginning, I would write a feature, then write some more code. The more code I wrote, the more I was scared of deploying it, which somehow led me to add more fixes and small improvements before deploying?! The problem is: the bigger the code change, the more likely it is to break the system and the harder it will be to troubleshoot. Conversely, if small chunks of code are deployed continuously, it is easier to validate the code during the code review and easier to find the bug if the change breaks something, because the difference between the stable system and the now unstable system is small.

Deploying code often keeps the adrenaline high and the developers alert. There is no sloppy commit since you know you are the last line of defense between implementation and production. Developers take a second, careful look at the code they write, which results in better code in general. And knowing that your code will be reviewed pushes you to be more thorough.

There is also a motivational aspect to continuous deployment. The reward is instantaneous. You see the new feature live a few minutes after you finish it, you feel the relief when the CPU usage drops by 20% after the code optimization is deployed, etc. It keeps the momentum going.

I remember working for months on a release that would spend some more months in QA...then spending weeks identifying and fixing stress test problems because the diff between the two releases was a set of hundreds of files. Now, I wonder how this could possibly have worked.


Data, Functions and Objects

Assassinations by Stateless on Grooveshark

One thing I realized while learning Clojure is how data, functions and objects relate to each other.

Data is a structured set of values or, said another way, values within a data structure (e.g. a list of integers, a map of [key, value] pairs, a struct).

A function takes data as input and produces data as output.

An object is a combination of data and functions within the same entity.

The problem with object-oriented languages like Java is that Java programmers tend to forget about data and functions and think only in terms of objects. This is problematic because data and functions are simpler than objects. Not only simpler to write, but also simpler to test and maintain. Objects are great for modeling because this type of entity is closer to the way humans see the world. But objects bring complexity because of the notion of state that comes with them. Stateful objects can be painful to test and difficult to maintain. When you add multi-threading to the mix, it can become a real nightmare.

So, in order to keep the code simple, the programmer should favor data and functions over stateful objects.

Let's look at an example. Here we have an Employee class defined the typical Java way. Data (employee's information) and functions (processing) are coupled together in a class. Some methods support the data (getters/setters) and the other methods are the functions that can be applied to employees.

Listing 1
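A minimal sketch of what such a class could look like. Only Employee, targetBonus, computeBonus() and isEligibleForBenefits() come from the text; the other fields, the bonus formula and the one-year eligibility rule are assumptions made up for illustration.

```java
// Sketch of a "typical" mutable Employee: data and functions in one class.
// Fields other than targetBonus, and both formulas, are illustrative assumptions.
class Employee {
    private String name;
    private double salary;
    private double targetBonus;   // assumed to be a fraction of salary, e.g. 0.10
    private int yearsOfService;

    Employee(String name, double salary, double targetBonus, int yearsOfService) {
        this.name = name;
        this.salary = salary;
        this.targetBonus = targetBonus;
        this.yearsOfService = yearsOfService;
    }

    // Methods that support the data: getters and setters.
    String getName() { return name; }
    void setName(String name) { this.name = name; }
    double getSalary() { return salary; }
    void setSalary(double salary) { this.salary = salary; }
    double getTargetBonus() { return targetBonus; }
    void setTargetBonus(double targetBonus) { this.targetBonus = targetBonus; }
    int getYearsOfService() { return yearsOfService; }
    void setYearsOfService(int yearsOfService) { this.yearsOfService = yearsOfService; }

    // Salary-related function (assumed formula).
    double computeBonus() {
        return salary * targetBonus;
    }

    // HR-related function (assumed rule: at least one year of service).
    boolean isEligibleForBenefits() {
        return yearsOfService >= 1;
    }
}
```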
As I mentioned in a previous post, a better approach would be to have the employee's data immutable to avoid any issues with multi-threading. For example, if computeBonus() is called and targetBonus is changed by the setter while it executes, the result is incoherent and wrong. So a better version of this class would be immutable.

Listing 2
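As a sketch, the immutable version might look like this. The fields besides targetBonus and the two formulas are again assumptions for illustration; the point is only that every field is final and there are no setters, so computeBonus() can never observe a half-updated employee.

```java
// Sketch of an immutable Employee: final fields, no setters.
// Fields other than targetBonus, and both formulas, are illustrative assumptions.
final class Employee {
    private final String name;
    private final double salary;
    private final double targetBonus;   // assumed fraction of salary
    private final int yearsOfService;

    Employee(String name, double salary, double targetBonus, int yearsOfService) {
        this.name = name;
        this.salary = salary;
        this.targetBonus = targetBonus;
        this.yearsOfService = yearsOfService;
    }

    String getName() { return name; }
    double getSalary() { return salary; }
    double getTargetBonus() { return targetBonus; }
    int getYearsOfService() { return yearsOfService; }

    // Safe to call from any thread: the inputs cannot change mid-computation.
    double computeBonus() {
        return salary * targetBonus;
    }

    boolean isEligibleForBenefits() {
        return yearsOfService >= 1;
    }
}
```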
Now, if we take a closer look at the employee's methods, we have salary-related methods and an HR-related method (isEligibleForBenefits()). As the system grows, the Employee class will probably grow too, and more and more unrelated methods will be added to it. So what if we look at it from the functional programming perspective? Let's split the data from the functions.

Listing 3
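A sketch of the data-only class, assuming the same illustrative fields (only targetBonus is named in the text):

```java
// Sketch of Employee as pure data: public final fields, no behavior.
// Fields other than targetBonus are illustrative assumptions.
final class Employee {
    public final String name;
    public final double salary;
    public final double targetBonus;   // assumed fraction of salary
    public final int yearsOfService;

    Employee(String name, double salary, double targetBonus, int yearsOfService) {
        this.name = name;
        this.salary = salary;
        this.targetBonus = targetBonus;
        this.yearsOfService = yearsOfService;
    }
}
```

"Modifying" an employee then means building a new one, for example new Employee(e.name, e.salary * 1.05, e.targetBonus, e.yearsOfService) for a raise.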
I'm not sure yet what the best way to represent data in Java is, but using public final fields seems nice since the data is immutable and the code is not cluttered with getters. Now if the data has to be modified, either a new Employee is constructed or an EmployeeBuilder can be written.

To write the functions, there are many alternatives. Let's try with static methods.

Listing 4
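A sketch of the static-method variant. The holder class name EmployeeFunctions, the fields other than targetBonus and the formulas are hypothetical, chosen for illustration.

```java
// Data: immutable Employee with public final fields (fields are assumptions).
final class Employee {
    public final String name;
    public final double salary;
    public final double targetBonus;   // assumed fraction of salary
    public final int yearsOfService;

    Employee(String name, double salary, double targetBonus, int yearsOfService) {
        this.name = name;
        this.salary = salary;
        this.targetBonus = targetBonus;
        this.yearsOfService = yearsOfService;
    }
}

// Functions: static methods that take the data as input.
// EmployeeFunctions is a hypothetical name.
final class EmployeeFunctions {
    private EmployeeFunctions() { }   // not meant to be instantiated

    static double computeBonus(Employee employee) {
        return employee.salary * employee.targetBonus;
    }

    static boolean isEligibleForBenefits(Employee employee) {
        return employee.yearsOfService >= 1;
    }
}
```

Client code calls EmployeeFunctions.computeBonus(employee) directly, which is exactly the hard-wired call discussed next.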
Static methods are easy to use, but they have one drawback: the client code is strongly coupled to the method. This means that the client code will always call the exact same method at runtime. This can cause trouble for the tests. Mocking libraries such as PowerMock could be used to mock static methods, but the test code seems odd, as the programmer has to mock a class (the one containing the static methods) that seems unrelated to the test since the dependency is hidden in the class under test. The test code then looks like dark magic to me. Also, if we compare this option with the original Employee version, we have lost the option of polymorphic methods (runtime method dispatch). So I think static methods are only good for utility functions, those which are not likely to change and that can be used in many unrelated contexts, such as Math.round().

Another option is to create a class that wraps the Employee data and adds the functions.

Listing 5
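A sketch of the wrapper approach. The wrapper names EmployeeSalaryWrapper and EmployeeHrWrapper, the fields other than targetBonus and the formulas are hypothetical.

```java
// Data: immutable Employee with public final fields (fields are assumptions).
final class Employee {
    public final String name;
    public final double salary;
    public final double targetBonus;   // assumed fraction of salary
    public final int yearsOfService;

    Employee(String name, double salary, double targetBonus, int yearsOfService) {
        this.name = name;
        this.salary = salary;
        this.targetBonus = targetBonus;
        this.yearsOfService = yearsOfService;
    }
}

// Wrapper holding the data and adding the salary functions (hypothetical name).
final class EmployeeSalaryWrapper {
    private final Employee employee;

    EmployeeSalaryWrapper(Employee employee) {
        this.employee = employee;
    }

    double computeBonus() {
        return employee.salary * employee.targetBonus;
    }
}

// A second wrapper grouping the HR functions (hypothetical name).
final class EmployeeHrWrapper {
    private final Employee employee;

    EmployeeHrWrapper(Employee employee) {
        this.employee = employee;
    }

    boolean isEligibleForBenefits() {
        return employee.yearsOfService >= 1;
    }
}
```

Every call site has to wrap first, as in new EmployeeSalaryWrapper(employee).computeBonus().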
Basically, we are back to square one. This class is equivalent to the original Employee class, but more cumbersome to use. The only small gain I see is that it is possible to group the different employee functions in different classes.

I think a good option is to use an attribute-less class.

Listing 6
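A sketch of an attribute-less function class behind an interface, injected into a client. SalaryFunctions, StandardSalaryFunctions and PayrollReport are hypothetical names, the fields other than targetBonus are assumptions, and passing factor as a parameter is inferred from the discussion of factor rather than taken from the original listing.

```java
import java.util.List;

// Data: immutable Employee with public final fields (fields are assumptions).
final class Employee {
    public final String name;
    public final double salary;
    public final double targetBonus;   // assumed fraction of salary

    Employee(String name, double salary, double targetBonus) {
        this.name = name;
        this.salary = salary;
        this.targetBonus = targetBonus;
    }
}

// The functions, expressed as an interface so implementations can vary.
interface SalaryFunctions {
    double computeBonus(Employee employee, double factor);
}

// Stateless implementation: no attributes at all (assumed formula).
final class StandardSalaryFunctions implements SalaryFunctions {
    @Override
    public double computeBonus(Employee employee, double factor) {
        return employee.salary * employee.targetBonus * factor;
    }
}

// Client code: receives the functions by injection instead of using "new".
final class PayrollReport {
    private final SalaryFunctions salaryFunctions;

    PayrollReport(SalaryFunctions salaryFunctions) {
        this.salaryFunctions = salaryFunctions;
    }

    double totalBonus(List<Employee> employees, double factor) {
        double total = 0.0;
        for (Employee employee : employees) {
            total += salaryFunctions.computeBonus(employee, factor);
        }
        return total;
    }
}
```

In a test, PayrollReport can be given a stub such as (employee, factor) -> 1.0, with no mocking library involved.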
With this option, we have gained something compared to the static functions: polymorphism. We could create an interface for the class and implement the same methods differently. This is great for tests and great for flexibility. We can also group the functions logically (salary functions in one class, HR functions in another, ...). The downside is that we need to instantiate the class in order to use it and inject it into the client code; otherwise the benefits are lost, because instantiating the class in the client code using the new operator is the same as using static methods. The new operator is, in effect, a static method that hardcodes the dependency.

We could go one step further...

Listing 7
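A sketch of that next step, where factor is fixed at construction time. BonusCalculator is a hypothetical name; the fields other than targetBonus and the formula are assumptions.

```java
// Data: immutable Employee with public final fields (fields are assumptions).
final class Employee {
    public final String name;
    public final double salary;
    public final double targetBonus;   // assumed fraction of salary

    Employee(String name, double salary, double targetBonus) {
        this.name = name;
        this.salary = salary;
        this.targetBonus = targetBonus;
    }
}

// Function object: the factor is a parameter of the function, set once.
// BonusCalculator is a hypothetical name.
final class BonusCalculator {
    private final double factor;

    BonusCalculator(double factor) {
        this.factor = factor;
    }

    // Callers only supply the variable (the employee), never the factor.
    double computeBonus(Employee employee) {
        return employee.salary * employee.targetBonus * factor;
    }
}
```

A part of the system that always uses the same factor can be handed new BonusCalculator(1.5) once and then call computeBonus(employee) without knowing the factor exists.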
Now factor is an immutable attribute. This is great if, for some parts of the system, the factor value is always the same; the programmer does not have to specify it at every call and, even better, does not have to propagate it to all the callers, which are now independent of this bonus computation detail. Put another way, employee is a variable and factor is a parameter of the function (as x is a variable and A and B are parameters in f(x) = Ax + B).

There are cases where an object (state attributes and methods) is the way to go. But when the intention is to apply functions to data, I think a clean alternative is to split the data from the functions by using a simple data object for the data and stateless objects for the functions.

What do you think?



The Low Hum by Moby on Grooveshark

The detour took longer than expected. I left to learn about Clojure and how its constructs can simplify a program, but I ended up taking a different turn and left the OZ -> Nokia -> Synchronica boat after more than 7 years to work for a very interesting Montreal start-up called Wajam.

In the next few weeks, I'll focus on entering the start-up mode and working on my confoo presentation...but I have a few more ideas to write about: single responsibility principle, questioning test driven development, software binary mode and some ideas from my small incursion into functional programming...

Stay tuned for more music driven development!

...and Happy New Year!



I'll take a detour. I will pause writing about simple testable code for the month of November. Instead, I will focus on learning and understanding Clojure. Why? Because of these two talks by Rich Hickey, Clojure's creator.

InfoQ: Are We There Yet?
InfoQ: Simple Made Easy

In these talks, Rich explains the limitations of object-oriented programming and how they affect program complexity. In Simple Made Easy, he gives some pointers as to how programs could be simplified by using simpler constructs. His explanations are clear and the arguments are compelling. I strongly suggest watching these two talks; they are well worth the time.

At this point in my quest for simple testable code, I believe I need to understand how Clojure simplifies programs and why its constructs make it so. Hopefully, I will be able to bring some of this back to the Java world to create even simpler testable code :)


Final, Clean and Simple

I've read Clean Code. This is a great book for any developer who values craftsmanship. Throughout the book, I could not stop thinking about the link between clean code and simple code. They should be the same, shouldn't they? Clean code is simple, simple code is clean. Robert C. Martin's book is clearly about making code clean and easy to read. But could someone who follows all the rules in Clean Code still end up with complex but readable code?

The following passage of Clean Code highlights a potential clash between clean and simple:

"I think that there are a few good uses for final, such as the occasional final constant, but otherwise the keyword adds little value and creates a lot of clutter. Perhaps I feel this way because the kinds of errors that final might catch are already caught by the unit tests I write."

In other words: to reduce code clutter, programmers should rarely use final. I think this is wrong. It is wrong because the final keyword reduces complexity. Applied to attributes, it limits the state space of the object. It tells the programmer not to worry about this attribute's reference changing over the lifetime of the object. Whenever I see a final attribute, I can prune possible state transitions. I'm all for adding a little clutter to the code to explicitly reduce the program's apparent complexity. Regarding the second statement of the passage, I do not believe final should be used to catch errors. It should be used to limit complexity, to simplify the code. Tests and the use of final are not related.

More generally, minimization of code and simplification of code are not the same thing. I initially thought that less code is better. I was wrong. After watching this talk by Rich Hickey, I understood a more powerful meaning of simplicity: one concept, one responsibility, one role. Clean Code does a good job of highlighting this desirable code attribute. If you split the code into single-purpose, untangled methods, classes and components, the number of elements in the design will increase. You will end up with more, not less. As Rich says in his talk, simplicity is not about counting.

Understanding the real meaning of simplicity (vs easy, vs clean) is important. It is also important to understand what has to be simple. Is it the code syntax and format, or the program structure and the elements that it consists of? I think programmers too often apply simplicity to the surface of the code, but it is when it is applied to the structure that the long-term gains are impressive.


About Structure and Value

I've been thinking about the expert vs master dichotomy a bit more these days and my conclusion is that it is all about structure and value.

Expert programmers value structure first, master programmers build value first.


The expert, the master...a Java Version

One of the inspirations for this blog is this post by Zed Shaw, where he points out what he thinks is the difference between an expert and a master programmer.

The expert, the master, the programmer.

For fun, I wrote a Java version.