Tag Archives: Kanban

Stuffed with tasks

This post is almost exactly the answer I gave to a question from Pent0x on pm.stackexchange.com (Best way to divide & assign development work on projects?).

Pent0x is a developer who is being assigned many tasks from various projects at the same time. He does not know how to complete his assigned work in time, particularly when he also has to deal with issues that arise in addition to his regular work. He really believes that there must be a better way to assign tasks than what he experiences – endures.

There is no Holy Grail answer to this question, but let me give you some food for thought:

Issues happen

This is not a real eye-opener, is it? Bugs, issues, defects or whatever you call them should not jeopardize the schedule, at least to a certain extent. Although rework is a loss of time, it happens in every (software) project and should be planned for from the beginning. If every person – people are called “resources” in such a case – is loaded with tasks at 100% capacity, any issue will push that load to 120%.

My advice here is to identify classes of service based on urgency and criticality, and to allow some slack in the system so that the team can expedite important bugs when they happen without killing the schedule.

“[I]ssues […] you can’t always predict a time frame for”

So what? You cannot crystal-ball predict how long it will take to solve a particular issue: is this really going to change the fact that the issue must be solved? If so, then maybe this issue was not that important after all, and some other tasks could have had higher priority.

When running too many projects at the same time, none of them will meet the schedule.

Running too many projects at the same time produces at least two effects:

  • People have to switch between projects. Switching from one task to another is hard. Switching between projects is harder by an order of magnitude. When switching, people lose focus, have difficulty recalling the context and basically lose time.
  • People have to deal with too many things at once. The human brain does not multitask. When someone has to work on several work items at the same time, the quality of their work drops significantly. It is no surprise that making phone calls while driving (at least in France) or chatting with your neighbor in a classroom is forbidden.

My advice here is to limit the amount of work in progress at every level of granularity.

Pushing work

[…]and so most of my tasks that keep getting assigned to me start stacking on top of each other

Giving – pushing – work to someone is easy. When we tell someone to work on a specific task, we can then come back later and blame her for not having finished the assigned work, which is pretty comfortable, isn’t it? But by doing so we blind ourselves: we ignore systemic problems and transform them into personal problems. The PM won’t think “We have a problem with our process: we start too many things without finishing them…” but “This guy is really slow: look at how late he is”.

So my last piece of advice is: implement a pull system and visualize work. A pull system will make sure that the team “stops starting and starts finishing” (a Kanban adage, I can’t remember who said it first) while visualizing work might trigger some improvements in the way work is processed (“Don’t tell me that we currently have 26 work items at the same time for only 4 people?!”).

Sticky notes do not make Kanban

I am often surprised to read or hear that many people think they are using a Kanban board when they stick notes on a wall. They say “We represent each stage of the process by a column and move notes from left to right to represent work progress: a Kanban board.” This is frequently the case with Scrum teams, and you can find plenty of articles on the Web entitled “Using a Kanban board to deal with impediments” or saying things like “The main difference between a Scrum board and a Kanban board is that a Scrum board is reset after every sprint.” I’m sorry to disappoint you but sticky notes do not make Kanban.

The main confusion is due to the fact that the sticky notes we use on the board are not kanban “cards” (or just kanbans). Indeed, the sticky notes are usually used to represent work items, tasks, user stories, etc. Those work items travel through the board. Kanbans, on the contrary, represent the need to move work items. In a Toyota-like environment, they are messages telling the people upstream of us in the process chain that we need them to refill the inventory, to produce more pieces.

So what about Kanban in software development then? Where are those famous kanbans if not the sticky notes? Let’s take an example to make this clear: a Scrum team of 5 people, 1 of whom is a tester and 4 of whom are developers. To take advantage of everyone’s specialty, they decide to change the classical Scrum board and process (to do – doing – done) into to do – dev – test – done.

[Figure: revisited Scrum board]

After a couple of sprints they notice a flaw in their process. Although everything goes fine most of the time, some stories turn out to be easy and quick to develop but hard to test. When this happens, developers push many stories into the testing stage; the tester starts too many tests, plays too many scenarios at the same time and misses important defects, or cannot finish anything at all.

During a sprint retrospective, the team reminds itself of one of the most important things about Agile: work must be completed (or done or whatever you call it). They realize that they cannot let work pile up in the “test” column and suggest several solutions. The first solution is to hire another tester, but this is impossible and would be unnecessary most of the time. The second is to implement a one-piece flow: the team swarms around a single item at a time, but it seems hard to implement and maybe a bit too extreme. Nevertheless the idea behind the one-piece flow looks good: limiting the work in progress to make sure that things get done. We could say that the tester cannot test more than one item at a time while the 4 developers can develop 2 items at once, as we practice pair-programming, plus a small buffer to improve flexibility. But how can we do that? If developers push items into the test column, there will sometimes be more than one item in it. The solution: a pull system. The resulting board looks like this:

[Figure: revisited Scrum board with kanban]

Developers will pull work from the TODO stage and the tester from the DEV DONE stage…and this board is a kanban board.

Still cannot see kanbans on this board? This is because kanbans are virtual. Here are the two main ingredients that make kanbans appear on the board:

  • Focus on demand. The team has committed to providing a list of items during this sprint. To be provided, those items must be validated (the “test” column). To be tested, items must be developed (the “dev” stage). This is what makes a kanban system a pull system based on need. Kanbans represent this need. There is no work without a kanban.
  • Limited-size inventory. To avoid piling up half-done work we need to limit WIP. Kanbans represent the availability of an empty slot to put a work item in.

So are you gonna tell us where those f***ing kanbans are? They are the difference between the WIP limits (aka the maximum size of the inventory) and the actual number of work items (aka the size of the inventory). We don’t usually represent them explicitly, but I have already seen boards on which WIP limits are not numbers but actual slots. What is important here is that kanban boards are not just about visualizing work items: they are an impressive tool to visualize resource availability, help identify bottlenecks, help manage risk, improve predictability, etc.

In short, a kanban board is not a board with sticky notes representing work items; it is a board that represents a demand-focused pull system with limited work-in-progress. Sticky notes are not required.
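To make the "kanbans are the difference between the WIP limit and the actual number of items" idea concrete, here is a minimal sketch. The `Column` class, the `pull` function, the story names and the limits are all hypothetical, chosen only to mirror the tester-limited-to-one-item example; this is one possible model, not a reference implementation of Kanban.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    """One stage of the board with a WIP limit (hypothetical model)."""
    name: str
    wip_limit: int
    items: list = field(default_factory=list)

    def free_kanbans(self) -> int:
        # Virtual kanbans = WIP limit minus the items currently in the stage.
        return self.wip_limit - len(self.items)


def pull(source: Column, target: Column) -> bool:
    """Move the oldest item from source to target, only if target has a free slot."""
    if not source.items or target.free_kanbans() <= 0:
        return False
    target.items.append(source.items.pop(0))
    return True


# Assumed limits for illustration: the tester handles one item at a time.
dev_done = Column("dev done", wip_limit=2, items=["story A", "story B"])
testing = Column("test", wip_limit=1)

print(testing.free_kanbans())   # 1
print(pull(dev_done, testing))  # True: "story A" moves into test
print(pull(dev_done, testing))  # False: no kanban left in test
```

The second pull is refused because the test stage has no free slot: the work waits upstream instead of piling up in front of the tester.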

EDIT (3 Jan 2013) : Mike Burrows just posted a very good, very complete article on this subject here. He gives an interesting value-based introduction to Kanban instead of the classical sticky-notes-based description.

Slack Time : A try-learn-improve catalyst

In my opinion, a software development actor – by actor I mean a company, a team, a PM, a developer, the PM’s dog – starts being Agile and thinking Agile when he realizes two things. First that the customer – or product owner – cannot detail all the requirements and all the features at the beginning of the project, and will probably change his mind anyway. Second that software development is a creation activity (craftsmanship), not an engineering activity.

Once he realizes that, he begins to feel confident in the fact that some things cannot be streamlined and plan-driven from the start, that there is no such thing as a recipe for making good software, and that both the requirement part and the pure development part are somehow made of try-failure-try-success cycles. So we try and experiment. If you are a developer you might want to try and build your own set of best practices that fit your current situation. If you are a project manager you might want to try to improve the process by making some small adjustments in a try-and-learn format. In short you stop being dogmatic, you stop thinking in terms of plan-and-apply and begin to believe in try-learn-improve.

This new try-learn-improve culture is basically what Japanese people call kaizen, which is kind of a buzz-word these days. The whole Agile thinking is based on the concepts behind kaizen : improve what you are doing every day, step by step, little by little, from the inside, each time the environment changes, etc.

At the highest level this is now a no-brainer: we stop trying to deliver the whole value – the whole product – at once but deliver it little by little, adding value at each step and gathering feedback as quickly as possible, allowing a quick learn-and-improve loop. Scrum, for example, completes this loop on every sprint through a review with the stakeholders and a retrospective with the team.

At a lower level, however, things are not so obvious. Of course people know that there might be some improvements to make. If you ask a developer, she might give you a list of two or three things that should be improved. Same for a tester. But they just can’t make those improvements. They just don’t have time. And because they don’t have time, they cannot complete the try-learn-improve loop. Why? Because their project managers try to hunt slack time down, believing in the 100%-utilization dogma.

There is a misunderstanding here, a confusion between “a good process utilizes all the resources at 100%”, which is questionable, and “utilizing all the resources at 100% makes the process good”, which is definitely wrong. And project managers who continuously try to maximize resource utilization are wrong. They are running after a symptom, a consequence of what they really seek. Metaphorically, having a fever does not mean you have the flu.

If you are a Kanban aficionado like me, you already know that limiting the work in progress can help us get the process under control and ensures that things will end up being done. We are not going to start a hundred things at the same time without completing them. But WIP limits serve another purpose: creating slack time. When someone is “blocked” because of WIP limits and cannot do regular work, he has to do something else. That can be improving his own work, thus completing his own try-learn-improve loop, or, if you believe in self-organized teams, improving the current flow by helping unblock the bottleneck, or even improving the whole process if possible.

But even if you don’t practice Kanban, you should still try to create slack time, thus creating room for improvement. The main advantage of creating slack time with work-in-progress limits instead of scheduling it is that it might create opportunities for improvement right away. For example, a developer who cannot do regular work because the testers are overburdened might work on test automation improvement, etc.

As a final note, you are invited to read (and sign) Pawel Brodzinski’s Slacker Manifesto. You might also want to watch the video of his talk at LKCE2012 that deals with the subject.

A Bug Tracking Story

I’ve been working at Ve-hotech for years now. I started as a developer and then moved to a project manager position…and as far as I can remember the tracking of bugs has always been a problem. The problem is not actually the bug tracking itself but finding a way to handle bugs when you cannot solve them as quickly as they get reported by end users (yes, we’ve been through pretty tough times…).

A classical approach for handling a large number of bugs is to use a web-based bug tracking system. This is a pretty convenient way to centralize all the reported issues. The business stakeholders can then sort the bug list and prioritize it, allowing the team to pick up the next most important issues and solve them.

Although this approach might be compulsory when dealing with hundreds of bugs or when the team is not co-located, it surely is a pretty big overhead when the number of bugs is small and the team and product owners are working at the same place.

Understanding that nobody really wanted to use a web interface for managing bugs, that the Redmine instance we were using was beginning to get out of sync, and as we were moving to Kanban, I decided to morph the bug tracking system into a physical, visual, post-it driven bug backlog. Imagine a 2-meter high sheet of paper covered with little, yellow sticky notes – 80 or so.

The goal of this bug wall was to ease the work of product owners, since they could see everything at once and select the defects they wanted the team to solve more easily. New issues were added into a special area of the wall so that stakeholders could identify them and decide where to place them on the wall – and when to put them into the Kanban flow.

There were three main drawbacks with this bug tracking implementation:

  1. The team – the people who know about the technical part – was not much involved in prioritizing bugs or evaluating their criticality
  2. There was no simple way to get a whole-picture view of the product stability
  3. Who can possibly sort 80 sticky notes?

To solve these problems, the boss (I wish I had thought of that first but…) came up with the idea of using a sort of criticality matrix.

We could actually dramatically improve how bugs were handled using two criteria:

  1. Intrinsic severity : Does this bug jeopardize users’ data? Does it happen all the time? Is it highly visible? On the contrary is it only a highly improbable situation? Maybe we could not even see it by ourselves? etc. We decided that four levels of criticality were enough, each one having its own set of criteria.
  2. Technical impact : How the team feels about this issue. Is there any identified risk? Is the fix difficult to implement? Do we need to rewrite an important part of the product? Do we even have a clue about how to debug this? etc. We don’t need any precise measures or calculations here. A simple gut feeling will do.

So the current implementation of the matrix is a 2×2-meter grid with the intrinsic severity, from A to D, on the horizontal axis and the team's feeling on the vertical axis:

[Figure: the bug criticality matrix]

As you can see, the first row is special. Bugs that cannot be quickly evaluated go there and need special attention, as basically the team does not know where they come from.

In the above example the two bugs in [A/???] are highly critical and must be dealt with as quickly as possible since, for example, they happen all the time and threaten users’ data, and the team does not know at all how to solve them. On the contrary the four bugs in [D/:-)] are likely to be very improbable bugs that would be very easy to solve with no real impact on the rest of the product. There is no urgency to treat them.

As we wanted to have a global stability indicator, we assigned each row and column an arbitrary coefficient: A=20, B=10, C=5 and D=1 for the columns; ???=4 for the first row, then 3, 2 and 1 for the following rows, down to :-). In the above example, each [A/???] bug is worth 80 points (20×4) whereas [D/:-)] bugs are worth 1 point each. The score of the above matrix is thus: 20*(2*4 + 2*2) + 10*(1*4 + 4*3 + 2*2 + 1*1) + 5*(1*4 + 4*3 + 4*2 + 2*1) + 1*(2*3 + 1*2 + 4*1) = 592
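The same arithmetic can be sketched in a few lines of Python. The bug counts and the row coefficients (4 for the ??? row down to 1 for the last row) are read off the formula above; the names `SEVERITY_WEIGHT`, `BUG_COUNTS` and `stability_score` are made up for this illustration.

```python
# Column coefficients as given in the text.
SEVERITY_WEIGHT = {"A": 20, "B": 10, "C": 5, "D": 1}

# Number of bugs in each cell, keyed by severity column, then by row
# coefficient (4 = "???" row ... 1 = last row), taken from the formula above.
BUG_COUNTS = {
    "A": {4: 2, 2: 2},
    "B": {4: 1, 3: 4, 2: 2, 1: 1},
    "C": {4: 1, 3: 4, 2: 4, 1: 2},
    "D": {3: 2, 2: 1, 1: 4},
}


def stability_score(counts, severity_weight):
    """Sum of (column coefficient x row coefficient x number of bugs) over all cells."""
    return sum(
        severity_weight[severity] * row_coefficient * bug_count
        for severity, rows in counts.items()
        for row_coefficient, bug_count in rows.items()
    )


print(stability_score(BUG_COUNTS, SEVERITY_WEIGHT))  # 592
```

Moving a single bug from [A/???] to [D/:-)] would lower the score by 79 points, which is what makes the indicator easy to read: the lower, the more stable.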

This approach facilitates decision making about scheduling while providing a good visualization and an easy-to-understand stability indicator. It can also break the last silo that might still exist in an Agile organization : the one between product owners and teams.

Do you use a visual tool for bugs with your teams? How do you prioritize them? Go share your experience and answer this follow-up question on pm.stackexchange.com.

To swarm or not to swarm : return on experience

BACKGROUND

Our team has been using Kanban for a while now, with pretty good results I must say. As we are developing and maintaining a long-term project, we deal with both new feature development and bug fixes. Bug fixes are divided into three classes of service: “regular” (they actually have no name), “urgent” and “panic”. Regular bugs flow normally and urgent ones must be pulled before regular ones: as their impact on customers is greater, we want them to be fixed first. Panic bugs are critical: they possibly mean a crash of the whole system, a loss of customers’ data or the like. Dealing with a panic bug fix can break the WIP limits and usually stops the whole “line”, triggering the swarming effect.
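As a rough illustration of these pull rules, the sketch below picks the next bug to pull: panic first, then urgent, then regular, oldest first within a class. The `ServiceClass` enum, the dictionary fields and the sample bugs are hypothetical; our real board is physical, so this only mirrors the policy.

```python
from enum import IntEnum


class ServiceClass(IntEnum):
    # Lower value = pulled first (hypothetical encoding of the three classes).
    PANIC = 0
    URGENT = 1
    REGULAR = 2


def next_item(backlog):
    """Return the bug to pull next: highest class first, oldest first within a class."""
    if not backlog:
        return None
    return min(backlog, key=lambda bug: (bug["service_class"], bug["reported_order"]))


backlog = [
    {"title": "typo in footer", "service_class": ServiceClass.REGULAR, "reported_order": 1},
    {"title": "export fails", "service_class": ServiceClass.URGENT, "reported_order": 2},
    {"title": "slow report", "service_class": ServiceClass.REGULAR, "reported_order": 3},
]

print(next_item(backlog)["title"])  # export fails
```

The sketch deliberately leaves out the part where a panic bug is allowed to break the WIP limits; it only covers the ordering.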

SWARMING

If you’ve never heard of swarming applied to software development teams before, you might want to read the following articles:

In short, swarming occurs when almost everybody on a team focuses on a specific backlog item.

A SIMPLE OBSERVATION

In our team, a “panic” item is usually analysed-developed-validated-released within one or two working days, whereas a regular bug fix takes an average of three to nine working days to be deployed. So one might come to the conclusion that we should swarm more items than just panic bugs and thus improve the average cycle time…but this is not that simple. We must study the reasons why swarming is efficient on panic items to know if we could swarm around other, lower priority, items.

NORMAL FLOW VS SWARMING

When a bug is reported to our team, we first try to evaluate its possible consequences. This often implies that we need to find the causes of the bug. When the consequences are critical enough, we name the issue a “panic” bug; otherwise the bug goes to the backlog and a fix will be developed for it later. This part is always kind of swarmed, as everybody is interested in finding what is wrong with the product. So when a bug is found to be a “panic” bug, the environment is already set up in swarming mode and the team easily self-organizes to develop a fix. In fact analysis and development are often one single stage if the fix is trivial, like “The problem comes from here: this condition is wrong as it does not handle this case (that we never thought about before…). We just need to replace it by blah blah”, and the “dev” part is nearly done.

For the “QA” part this is slightly different. Indeed the validation on its own is the same – and must be the same – as for any development. It takes the same amount of time as for a regular bug fix. The difference here is that in swarming mode, as the QA guys are involved from the beginning, they can do some parallel work to speed up validation, like preparing a specific test configuration. It is also possible to validate the fix step-by-step, part-by-part if the implementation is not trivial.

Once the fix has been validated, we immediately release, following the normal procedure. Still, when in swarming mode, we can prepare some steps of the release before validation ends.

BENEFITS

The first benefit we can point out is that we avoid any kind of buffer. When a fix is developed – when it is considered as “done” regarding the development stage – it sometimes has to wait a little while before someone from QA validates it. Similarly when a fix has been validated, it might have to wait before we release it. Even if the flow is usually pretty smooth, the cycle time will necessarily be shorter in swarming mode.

The other big difference between the normal flow and the panic flow is parallel work. In the normal flow there is no parallel work at all. First we analyse the problem, then we develop and test a fix, then we validate it at a higher level, and then we release it. In swarming mode some of the tasks are done in parallel. We can see that as a sort of read-ahead, or maybe a read-above: “as we know what you are developing we can set up the right testing configuration”, “as we know that there is an emergency release, let’s start the release process right now”. Of course daily stand-up meetings also provide a kind of read-above, but not as much as swarming.

THE EMERGENCY EFFECT

I think that most of the speed gain also comes from what I call the emergency effect. This is some kind of distributed adrenaline rush that explodes when the team is dealing with a “panic” bug. Actually I think that “panic” is not the right term, as nobody is panicking, nobody is scared as we know this can happen, the team is prepared for it. Let’s say we are…excited. That’s why “panic” is definitely not the right word : the team as a whole is just excited about a challenging bug to fix.

Anyway the emergency effect that accompanies a panic issue contributes to the success of swarming, as it catalyses self-organization. But what if we change our policies to swarm around every kind of bug? The emergency effect will vanish and we will lose its extra piece of motivation.

DRAWBACKS

The main drawback I see here is throughput. Indeed, by swarming around a single item at a time you miss the opportunity to pipeline work. So unless avoiding buffers and doing parallel work can give you enough speed, you won’t have the same throughput as with the normal flow.

In this article, Mike Cohn also points out that swarming “introduces too many opportunities to be in someone else’s way as they try to make progress”. There might be some big team management problems coming from swarming too often.

CONCLUSION

Swarming is efficient. It really is. It drastically decreases the cycle time. But swarming should not be enforced too often as it relies on a natural and spontaneous cohesion of the team. You know, a swarm is not a flock of sheep. I think that one cannot force a team to gather and work together perfectly around any kind of PBI. There must be some kind of emergent behavior to make swarming really efficient. That’s why a critical bug is a good candidate for swarming: the whole beehive is in danger and must be protected. But when the team does not naturally swarm, then it must be because there is no need to swarm.