Monthly Archives: September 2012

Exorcism: follow-up

Yesterday I told you about a debugging method I like to call exorcism, and how, until recently, I thought I had invented it. Now let me tell you a story.

Today I was at my parents’ house. It was tea time and I was telling my father that I had discovered that “exorcism” was actually a common practice. My father never went to college. He even left school in tenth grade (we call it “seconde” in France). Nevertheless he always helped me with my math problems when I was in high school, and even afterwards. But today, when I told him about exorcism, he said: “That’s exactly what I did with you when you were in high school. Every time you came to me with a math problem I used to sit down and listen to you, but I never understood a word of what you said! Neither the problem nor the solution! The thing is that you needed to rephrase the problem out loud to find a solution, and as I’m a pretty good listener…”

Wise man.

Exorcism

A Terminator exorcizing my code

In 2007 I started my career as a web developer in a web agency. One day one of my colleagues had a problem that seemed to have been bothering her for quite a long time. I asked her to tell me about it, but before she had finished explaining: bang! She found the solution just as she was trying to describe her problem to me.

We tried this several times after that, and it worked every time. As there was some kind of magic in all this, I called it Exorcism. It was such an efficient way to find solutions that I have been using it ever since.

The current rules for a good exorcism are as follows:

  1. A developer has been stuck on a problem for a while. He has read and reread the whole thing without finding what is wrong with the code
  2. He calls for an exorcism
  3. Another member of the team grabs his scepter of exorcism (a thirty-centimeter ruler) and approaches the developer in trouble
  4. The developer explains the problem to the exorcist in great detail while the exorcist pretends to listen (actually listening is optional)
  5. Suddenly the developer stops talking, most likely in the middle of a sentence: he has just found the cause of his problem
  6. The exorcist brandishes his scepter in front of his exorcised teammate, letting a mystical sound flow out of his mouth (something like “haaaaaaaaaaaa…”)

For a long time I thought that I had invented this debugging methodology, until I recently discovered that many people do the exact same thing. It is actually a pretty common practice that goes by many different names. Some people do not even call on another member of the team but explain their problems to a toy, a wall, a mirror, a stuffed duck, IRC or a text file. So do not hesitate to introduce it in your team, as it is both a fun and an efficient way to debug tricky problems.

As a concluding anecdote, here is the best exorcism I have ever practiced. One of my colleagues was struggling with some code. He did not understand why a loop was not doing what he wanted it to do. He had already placed several watches and breakpoints but still could not figure out what was happening… so he called for an exorcism. After a brief explanation he said: “So the only way for the loop to behave like this is that there is a break right here!” You know exactly how it ended…

To swarm or not to swarm: return on experience

BACKGROUND

Our team has been using Kanban for a while now, with pretty good results I must say. As we are developing and maintaining a long-term project, we deal with both new feature development and bug fixes. Bug fixes are divided into three classes of service: “regular” (they actually have no name), “urgent” and “panic”. Regular bugs flow normally, and urgent ones must be pulled before regular ones: as their impact on customers is greater, we want them to be fixed first. Panic bugs are critical: they may mean a crash of the whole system, a loss of customers’ data or something similar. Dealing with a panic bug fix can break the WIP limits and usually stops the whole “line”, triggering the swarming effect.
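The pull policy described above can be sketched in a few lines of Python. This is only a minimal illustration, not our actual tooling: the `Bug` class, the `pull_next` function and the numeric priorities are hypothetical names and values I made up for the example.

```python
from dataclasses import dataclass

# Pull order for the three classes of service.
# Lower number = pulled first. The numeric values are an assumption.
PRIORITY = {"panic": 0, "urgent": 1, "regular": 2}


@dataclass
class Bug:
    title: str
    service_class: str  # "regular", "urgent" or "panic"


def pull_next(backlog, wip_count, wip_limit):
    """Return the next bug to pull, or None if nothing may be pulled.

    Urgent bugs are pulled before regular ones. A panic bug is pulled
    even when the WIP limit is already reached.
    """
    if not backlog:
        return None
    # min() returns the first of equally ranked items, so bugs of the
    # same class keep their arrival order.
    candidate = min(backlog, key=lambda bug: PRIORITY[bug.service_class])
    if candidate.service_class != "panic" and wip_count >= wip_limit:
        return None
    backlog.remove(candidate)
    return candidate
```

With a full board (`wip_count == wip_limit`), `pull_next` returns nothing for regular and urgent bugs but still hands out a panic bug, which is the only case where the limit is allowed to break.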

SWARMING

If you’ve never heard of swarming applied to software development teams before, you might want to read the following articles:

In short, swarming occurs when nearly everybody on a team focuses on a single backlog item.

A SIMPLE OBSERVATION

In our team, a “panic” item is usually analysed, developed, validated and released within one or two working days, whereas a regular bug fix takes three to nine working days to be deployed. So one might conclude that we should swarm around more items than just panic bugs and thus improve the average cycle time… but it is not that simple. We must study why swarming is efficient on panic items to know whether we could swarm around other, lower-priority items.

NORMAL FLOW VS SWARMING

When a bug is reported to our team, we first try to evaluate its possible consequences. This often means that we need to find the causes of the bug. When the consequences are critical enough, we call the issue a “panic” bug; otherwise the bug goes to the backlog and a fix will be developed for it later. This part is always somewhat swarmed, as everybody is interested in finding out what is wrong with the product. So when a bug turns out to be a “panic” bug, the environment is already set up in swarming mode and the team easily self-organizes to develop a fix. In fact, analysis and development are often a single stage if the fix is trivial, like “The problem comes from here: this condition is wrong as it does not handle this case (that we never thought about before…). We just need to replace it with blah blah”, and the “dev” part is nearly done.

For the “QA” part it is slightly different. The validation itself is the same (and must be the same) as for any development. It takes the same amount of time as for a regular bug fix. The difference is that in swarming mode, as the QA guys are involved from the beginning, they can do some work in parallel to speed up validation, like preparing a specific test configuration. It is also possible to validate the fix step by step, part by part, if the implementation is not trivial.

Once the fix has been validated, we immediately release it, following the normal procedure. Still, in swarming mode we can prepare some steps of the release before validation is over.

BENEFITS

The first benefit we can point out is that we avoid any kind of buffer. In the normal flow, when a fix is developed (when it is considered “done” as far as the development stage is concerned), it sometimes has to wait a little while before someone from QA validates it. Similarly, when a fix has been validated, it might have to wait before we release it. Even if the normal flow is usually pretty smooth, the cycle time will necessarily be shorter in swarming mode.

The other big difference between the normal flow and the panic flow is parallel work. In the normal flow there is no parallel work at all. First we analyse the problem, then we develop and test a fix, then we validate it at a higher level, and then we release it. In swarming mode some of the tasks are done in parallel. We can see that as a sort of read-ahead, or maybe a read-above: “as we know what you are developing we can set up the right testing configuration”, “as we know that there is an emergency release, let’s start the release process right now”. Of course daily stand-up meetings also provide a kind of read-above, but not as much as swarming.
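To make these two benefits a bit more concrete, here is a rough sketch of how they add up in the cycle time. It is a toy model under my own assumptions: the function names, the stage names and every number (in working days) are invented for the illustration and are not measurements from our team.

```python
def normal_flow_cycle_time(stages, buffers):
    """Stages run one after another, with a waiting time (buffer) between them."""
    return sum(stages.values()) + sum(buffers)


def swarming_cycle_time(stages, overlap):
    """No buffers, and part of some stages is prepared ahead of time.

    overlap[stage] is the share of that stage (in days) already done
    in parallel while the earlier stages were still running.
    """
    return sum(duration - overlap.get(name, 0) for name, duration in stages.items())


# Hypothetical durations, in working days.
stages = {"analysis": 0.5, "dev": 1, "qa": 1, "release": 0.5}
buffers = [0, 1, 1]                      # waiting before dev, before QA, before release
overlap = {"qa": 0.5, "release": 0.25}   # test configuration and release steps prepared early

normal = normal_flow_cycle_time(stages, buffers)
swarmed = swarming_cycle_time(stages, overlap)
```

In this made-up example the stages themselves take three days in both flows. The whole gain comes from removing the two days of waiting and from the three quarters of a day of preparation done ahead of time.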

THE EMERGENCY EFFECT

I think that much of the speed gain also comes from what I call the emergency effect. It is a kind of distributed adrenaline rush that kicks in when the team is dealing with a “panic” bug. Actually I think that “panic” is not the right term: nobody is panicking and nobody is scared, as we know this can happen and the team is prepared for it. Let’s say we are… excited. The team as a whole is just excited about a challenging bug to fix.

Anyway, the emergency effect that accompanies a panic issue contributes to the success of swarming, as it catalyses self-organization. But what if we changed our policies to swarm around every kind of bug? The emergency effect would vanish and we would lose the extra motivation it brings.

DRAWBACKS

The main drawback I see here is throughput. By swarming around a single item at a time you miss the opportunity to pipeline the work. So unless avoiding buffers and doing parallel work gives you enough speed, you won’t have the same throughput as with the normal flow.
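A back-of-the-envelope model shows the trade-off. It is a deliberately simplified sketch: I assume that every stage takes the same time and holds one item at a time, and the function names and numbers are hypothetical.

```python
def pipelined_total_time(items, stages, stage_time):
    """Total time when all stages work at once, each on a different item."""
    if items == 0:
        return 0
    # The first item needs `stages` steps; each following item finishes one step later.
    return (stages + items - 1) * stage_time


def swarmed_total_time(items, swarm_cycle_time):
    """Total time when the whole team handles one item at a time."""
    return items * swarm_cycle_time


# Hypothetical figures: 10 items, 4 stages of 1 day each,
# and a swarmed cycle time of 2 days per item.
pipelined_days = pipelined_total_time(10, 4, 1)
swarmed_days = swarmed_total_time(10, 2)
```

With these invented numbers each swarmed item is done in 2 days instead of 4, yet the ten items take 20 days in total against 13 for the pipeline. In this model swarming only matches the pipeline's throughput once the swarmed cycle time gets close to the duration of a single stage.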

In this article, Mike Cohn also points out that swarming “introduces too many opportunities to be in someone else’s way as they try to make progress”. Swarming too often might cause some serious team management problems.

CONCLUSION

Swarming is efficient. It really is. It drastically decreases the cycle time. But swarming should not be enforced too often, as it relies on a natural and spontaneous cohesion of the team. You know, a swarm is not a flock of sheep. I think that one cannot force a team to gather and work together perfectly around just any kind of PBI. There must be some kind of emergent behavior to make swarming really efficient. That’s why a critical bug is a good candidate for swarming: the whole beehive is in danger and must be protected. But when the team does not naturally swarm, it must be because there is no need to swarm.