A Closer Look at Risk Burndown

I like the idea of the risk burndown chart. Burndown is an effective and satisfying visual indicator of progress and it’s relatively easy to calculate to boot. But does looking at a project’s risks through the lens of a burndown chart make sense?

I see several problems with thinking about risk in this way.

Numbers can be Misleading

The first key to effective risk management is to value accuracy over precision. This means that it’s better to be right in your predictions than it is to be exact. Remember, risk is about assessing your likelihood of project success. It doesn’t matter if you miss your threshold of success by a little or a lot; either way the project still fails!

Pop quiz. Say there are two risks in your project. There’s a 25% probability that Risk A will become a problem while Risk B only has a 20% probability. For now, assume the impact is the same for both risks. Which risk is a greater threat to the project?

That one’s easy. Risk A is a greater threat because, impacts aside, Risk A is 5 percentage points more likely to turn into a problem. OK. What if I told you that I made up the probabilities based on my gut feelings so I could easily rank risks? Now which risk is a greater threat to the project?

The real question I’m asking is this: are you willing to bet the success of your project on those numbers? If my best-guess, gut-feeling probabilities are off by more than 5 percentage points, the ranking could flip, and the project could be in serious trouble depending on the risks’ impacts.
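To make that worry concrete, here is a minimal Python sketch. It assumes both risks share a hypothetical 10-day impact and invents "actual" probabilities that each differ from the guesses by 6 percentage points; none of these numbers come from a real project.

```python
# Hypothetical sketch: a small error in gut-feeling probabilities can flip
# which risk looks like the bigger threat. The 10-day impact and the
# "actual" probabilities are invented for illustration.

def exposure(probability: float, impact_days: float) -> float:
    """Exposure = probability x impact, in person-days."""
    return probability * impact_days


def rank(risks: dict[str, float], impact_days: float) -> list[str]:
    """Return risk names ordered from highest to lowest exposure."""
    return sorted(risks, key=lambda name: exposure(risks[name], impact_days), reverse=True)


IMPACT_DAYS = 10.0  # same impact for both risks, as in the quiz

guessed = {"A": 0.25, "B": 0.20}
# Assumption: each guess is off by 6 percentage points, in opposite directions.
actual = {"A": 0.19, "B": 0.26}

print(rank(guessed, IMPACT_DAYS))  # ['A', 'B']
print(rank(actual, IMPACT_DAYS))   # ['B', 'A']
```

The guessed numbers say to work on Risk A first; the (equally made-up) true numbers say Risk B. Nothing in the exposure calculation warns you which one you have.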

I know, I know. That was a trick question. Nobody on your team would make up numbers on one of your software projects. In all fairness, nobody goes out of their way to fabricate false values. But use your logic. If you were any good at guessing the probability of future events, you would not be reading this post right now. You would be a multi-millionaire, off enjoying your winnings from the ponies. Too much precision gives folks too much confidence in the correctness of the assessment when, in reality, probability and impact are based on best guesses and gut feelings. Probability and impact numbers just make it easier to calculate exposure so risks can be ranked automatically. Burndown is a fairly precise metric built on top of those imprecise guesses.

Not all Risks are Created Equal

If you are monitoring project risk with a risk burndown chart, how do you know whether the right risks are being reduced? Let's take a look at an example. Which of these sets of risks should be addressed?

Set 1 with a total exposure of 7 days made up of the following risks:

Risk A has a probability of 20% and an impact of 15 days for an exposure of 3 days.
Risk B has a probability of 25% and an impact of 10 days for an exposure of 2.5 days.
Risk C has a probability of 30% and an impact of 5 days for an exposure of 1.5 days.

Or Set 2 with a total exposure of about 7 days (6.65 rounded up) made up of the following risk:

Risk D has a probability of 95% and an impact of 7 days for an exposure of 6.65 days.

In the first set, I can mitigate 3 risks, each with a relatively low probability of becoming a problem. In the second set, I mitigate only 1 risk that is almost certainly going to become a problem. Reducing the imminent risk seems to make the most sense, but this choice is not reflected in a risk burndown chart: either set removes about 7 days of exposure, so the chart drops by roughly the same amount whichever you pick. Simply reducing risk over time is not enough. You have to reduce the right risks.
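The arithmetic behind the two sets can be sketched in Python. The `Risk` class and `total_exposure` function are hypothetical names for illustration; the probabilities and impacts are the ones from the example (0.95 × 7 = 6.65, which the example rounds up to 7).

```python
# Sketch: the two risk sets from the example, expressed as exposure
# (probability x impact in days). Mitigating either set removes roughly
# the same exposure, so a burndown chart would drop by about the same amount.
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    name: str
    probability: float  # best-guess chance the risk becomes a problem
    impact_days: float  # best-guess cost in person-days if it does

    @property
    def exposure(self) -> float:
        return self.probability * self.impact_days


def total_exposure(risks: list[Risk]) -> float:
    """Sum of exposures: the only number a burndown chart plots."""
    return sum(r.exposure for r in risks)


set_1 = [Risk("A", 0.20, 15), Risk("B", 0.25, 10), Risk("C", 0.30, 5)]
set_2 = [Risk("D", 0.95, 7)]

print(round(total_exposure(set_1), 2))  # 7.0
print(round(total_exposure(set_2), 2))  # 6.65
```

The chart only ever sees those totals. It cannot tell you that Risk D is nearly certain to happen while A, B, and C probably won't.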

Impact Isn’t Really About Money or Effort

The only way for a visual chart such as risk burndown to work is if we’re able to quantify risks. This is generally done with exposure: Exposure = probability x impact. Impact is a funny thing. It is an assessment of how much the consequence of a risk will affect the project if the risk becomes a problem. Traditionalists like to think about this from a money perspective (which makes sense since software engineers stole most of their risk management practices from the finance world, originally anyway). For small teams, effort is a better measure: the number of person-days it will cost to fix a risk that becomes a problem. This is a quantifiable loss.

There’s a problem with thinking about impact in terms of days lost. Since not all risks are created equal, not all loss is truly equal either. Some kinds of loss can’t be measured in terms of effort; it all depends on your project’s threshold of success. Some example risks (which don’t rely on ye olde life-critical system standby) from which you might never recover if they became problems include:

We don’t have a reliable backup solution; might lose all of our project data. (Lost yer data? You’re up a creek, son!)
We don’t have backup power for our data center; the data center might go offline for more than a few hours. (How many days will it take you to get those customers back?)
The demo has bugs and our contract renewal is based exclusively on how much the client likes our demo; a bug might occur during the demo. (HA! HA! You don't have a job!)

In all of these cases you would reduce the risk by working on attributes other than impact (e.g., reduce probability, eliminate the condition, extend the time frame). Enough said. When it comes to calculating exposure, each of these risks has a catastrophic impact. That’s catastrophic, as in epic failure. No number of days can really capture the essence of complete catastrophe. Impact works best when considered in terms of success, not days or dollars lost.
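One possible way to act on this, sketched in Python under loudly stated assumptions: model impact as a coarse level relative to the project’s threshold of success instead of a day count, and rank by that level before probability. The `Impact` levels, the `rank` function, and the "schedule slip" risk are all hypothetical; the other two risks paraphrase the examples above. This is an illustration, not a standard scale.

```python
# Hypothetical sketch: treat impact as a threat to the threshold of success
# rather than a number of days. The levels, their ordering, and the ranking
# rule are illustrative assumptions.
from enum import IntEnum


class Impact(IntEnum):
    MINOR = 1         # absorbable within the schedule
    MAJOR = 2         # success threatened but recoverable
    CATASTROPHIC = 3  # the project fails; no day count captures it


def rank(risks: list[tuple[str, float, Impact]]) -> list[str]:
    """Order risks by impact level first, then by probability.

    A catastrophic risk always outranks a non-catastrophic one,
    no matter how likely the lesser risk is.
    """
    ordered = sorted(risks, key=lambda r: (r[2], r[1]), reverse=True)
    return [name for name, _, _ in ordered]


risks = [
    ("Schedule slip on reporting module", 0.60, Impact.MAJOR),  # hypothetical
    ("No reliable backup; lose project data", 0.05, Impact.CATASTROPHIC),
    ("Buggy demo decides contract renewal", 0.30, Impact.CATASTROPHIC),
]
print(rank(risks))
```

Here the 5% backup risk lands above the 60% schedule slip, which no exposure figure in days would do.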

Forget Risk Burndown

I want risk burndown to make sense, but given these problems I can't help but think of it as a meaningless metric. Sure, some risks will be reduced and some will go away by turning into problems or being overcome by events, and a chart showing this would be really neat. But you’ll also uncover new risks as the project goes on. And some risks are just not worth caring about while others deserve a lot of attention. Risk management is about identifying the things that are most likely to kill your project so you can deal with them before doing so becomes too expensive (or impossible). A burndown chart doesn't directly reflect any of these things.
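For reference, a risk burndown series is typically just the total open exposure at the end of each iteration. This minimal Python sketch uses invented iteration data (including a hypothetical new Risk E) to show how the line can move for reasons the chart does not record.

```python
# Sketch: a burndown series is the total open exposure per iteration.
# The iteration data and Risk E are invented for illustration.

def burndown(iterations: list[dict[str, float]]) -> list[float]:
    """Total open exposure (person-days) at the end of each iteration."""
    return [round(sum(open_risks.values()), 2) for open_risks in iterations]


iterations = [
    {"A": 3.0, "B": 2.5, "C": 1.5},  # start: 7.0 days of exposure
    {"B": 2.5, "C": 1.5},            # A mitigated: 4.0
    {"C": 1.5, "E": 4.0},            # B became a problem; new risk E found: 5.5
]
print(burndown(iterations))  # [7.0, 4.0, 5.5]
```

Between the second and third iterations, Risk B drops off the chart because it became a problem, and the newly uncovered Risk E pushes the line back up. The series alone cannot tell you which of those happened.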

Burndown masks project risks too much and gives teams a false sense of confidence. To put it another way, there’s a risk with using risk burndown:

Our new risk management strategy assumes our estimation precision is better than it is; we may not mitigate the right risks.

Exposure is a ruse. And risk burndown is a metric for showing a reduction in exposure over time. To wax poetic, perception is reality and risk burndown provides a false perception.

That said, any risk management is better than none at all. If a risk burndown chart helps get your team thinking about risk, then so be it. But there are other ways to manage risk (perhaps not as fancy) that are easier and more effective.

Reflections on Software Engineering