Wednesday, October 29, 2014

Defect-Driven Process Improvement

Defects can tell you a lot about your team.  And not just what the problems are in the code.  The Team Software Process uses a very simple method of defect classification to unlock hidden insights. My team recently did some analysis on our bug database at the conclusion of a project and unearthed some surprising information about how we deal with requirements as well as some weaknesses we need to work on.

Simple Defect Classification 

The simplest defect classification method involves manually evaluating defects to add contextual metadata.  The Team Software Process suggests the following pieces of information as a starting point for creating a defect taxonomy.
  • Related Feature - the feature against which the defect was reported 
  • Module Discovered - the code in which the defect was found 
  • Defect Type - a standardized description of the defect, such as "assignment," "data," or "syntax." 
  • Reason Injected - why did the bug make it into the software in the first place?    This should also be standardized, for example "education," "oversight," or "mistake." 
  • Phase Injected - what were the activities being performed when the defect was injected?  For example, "requirements," "design," or "implementation." 
  • Phase Discovered - what were the activities being performed when the defect was discovered? These would be the same as the list for phase injected.
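
As a rough illustration, the taxonomy above could be captured as a small record type. This is only a sketch: the class and field names are hypothetical, the enum values are just the examples quoted in the list (a real taxonomy would have more), and the sample defect is invented.

```python
from dataclasses import dataclass
from enum import Enum


class DefectType(Enum):
    # Only the example values named in the list above.
    ASSIGNMENT = "assignment"
    DATA = "data"
    SYNTAX = "syntax"


class Reason(Enum):
    EDUCATION = "education"
    OVERSIGHT = "oversight"
    MISTAKE = "mistake"


class Phase(Enum):
    # Declared in the order the activities normally happen.
    REQUIREMENTS = "requirements"
    DESIGN = "design"
    IMPLEMENTATION = "implementation"


@dataclass(frozen=True)
class ClassifiedDefect:
    """One bug report plus the classification added during analysis (hypothetical shape)."""
    related_feature: str
    module_discovered: str
    defect_type: DefectType
    reason_injected: Reason
    phase_injected: Phase
    phase_discovered: Phase

    def phases_escaped(self) -> int:
        """How many phases passed between injection and discovery (0 = same phase)."""
        order = list(Phase)  # Enum iteration follows declaration order
        return order.index(self.phase_discovered) - order.index(self.phase_injected)


# Invented sample record, for illustration only.
example = ClassifiedDefect(
    related_feature="report export",
    module_discovered="data conversion",
    defect_type=DefectType.DATA,
    reason_injected=Reason.OVERSIGHT,
    phase_injected=Phase.REQUIREMENTS,
    phase_discovered=Phase.IMPLEMENTATION,
)
print(example.phases_escaped())
```

Standardizing the values as enums rather than free text is what makes the later counting and grouping possible.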

This simple method has some disadvantages. The biggest is that you usually cannot determine why a defect was injected until you understand its root cause. Also, unless you wrote the code yourself, you can only guess at the reason the defect was injected (was this an oversight, a mistake, or a gap in education?). As such, some knowledge of the system is required to perform the analysis. There are other methods for defect classification that overcome this problem but take a little more training and planning to apply well.  One example is the Orthogonal Defect Classification method from IBM Research.

Since the defects in a software system are a direct reflection of the process, knowledge, and skills of the team that injected them, once the defects are classified you have some serious power for enacting positive change.

A Snapshot of our Defect Data

The following graphs summarize some of the more interesting trends we discovered by looking at our data.  There were two very interesting findings.  First, we had a major problem with requirements that resulted in significant rework.  Second, we need to rethink the way we test and work to improve.

Figure 1 shows the reason why various defects were injected. Defects that originate in tests were excluded from this particular analysis so that we could focus on other areas that might also be interesting.  "Communication," in our vernacular, refers to cases where the defect resulted from misinformation originating in documentation or personal interactions (e.g., a meeting).  We noticed that a disproportionately large number of defects resulted from miscommunication, so we broke communication down into true communication problems, changes that were not effectively communicated, and new requirements that were only flushed out as reported bugs from the customer.

Distribution of defects injected by reason
Figure 1. Defect injection reason count and percentage.
Defects related to poor test cases are excluded from this summary.

Figure 2 summarizes the information in Figure 1 into two basic categories.  It turns out that 55% of reported non-test defects resulted from requirements or communication issues, while 18% of reported non-test defects were injected directly by the development team.  This 18% includes oversights (things you just forgot), education (things you should have known how to do), and mistakes (things you understood and meant to do but botched somehow).  We also included process issues, in which the process we were following created an opportunity for a defect that was actually injected into the software and escaped to the customer for review.

Summary of distribution of defects injected by reason
Figure 2.  Defect injection reason count and percentage, identical to Figure 1.
Communication and requirements issues account for 55% of all non-test-related
defects. Further, 18% of defects were injected directly by the team and might
be avoided through better training, testing, or peer reviews.
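
The roll-up behind Figure 2 is simple arithmetic: map each injection reason to a broader category, count, and divide by the total. Here is a minimal sketch of that calculation. The function name, the category labels, and the sample reasons are all made up for illustration; the sample is not our project data and does not reproduce the 55% and 18% figures.

```python
from collections import Counter

# Hypothetical mapping from detailed reason to summary category,
# following the grouping described in the text.
CATEGORY_BY_REASON = {
    "communication": "requirements/communication",
    "change": "requirements/communication",
    "new requirement": "requirements/communication",
    "oversight": "team",
    "education": "team",
    "mistake": "team",
    "process": "team",
}


def summarize(reasons):
    """Return {category: (count, percent of all defects)} for a list of reasons."""
    counts = Counter(CATEGORY_BY_REASON.get(r, "other") for r in reasons)
    total = sum(counts.values())
    return {cat: (n, round(100 * n / total, 1)) for cat, n in counts.items()}


# Invented sample: ten non-test defects, for illustration only.
sample = ["communication"] * 4 + ["change"] * 3 + ["mistake"] * 2 + ["unclassified"]
for category, (count, percent) in summarize(sample).items():
    print(f"{category}: {count} ({percent}%)")
```

Any reason missing from the mapping falls into "other," so the percentages always sum over every classified defect rather than silently dropping some.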

While understanding why we injected defects is interesting, understanding where the defects are injected is important to creating a plan for doing something about it.  Figure 3 shows the defect count and percentage for each of the main areas of the software product we were developing.  The top three trouble spots, our "bug farms," were the display configuration (user interface), test cases, and data conversion and extraction code.  That poor test cases were one of our greatest sources of defects is extremely disturbing.

Distribution of defects by software code module
Figure 3.  Count and percentage of defects injected into different software modules.
This graph includes defects related to tests.

Putting it all together (Figure 4) really shows our trouble spots and gives a pretty fair view of where the team needs to improve.  Also important, the data shows low-value areas and establishes a baseline for other modules and reasons that we don't need to focus on currently but also don't want to see suddenly increase over time.

Defects by module by reason for injection
Figure 4.  Count of defects by module by reason injected.
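
A view like Figure 4 is a cross-tabulation of two classification fields. The sketch below shows one way to build it and pull out the largest cells. The function names are hypothetical and the (module, reason) pairs are invented sample data, not our actual counts.

```python
from collections import Counter, defaultdict


def cross_tab(defects):
    """Count defects per module per reason from (module, reason) pairs."""
    table = defaultdict(Counter)
    for module, reason in defects:
        table[module][reason] += 1
    return table


def hot_spots(table, top=3):
    """Return the `top` largest (module, reason) cells, biggest first."""
    cells = [((m, r), n) for m, row in table.items() for r, n in row.items()]
    # Sort by descending count, then by name so ties come out in a stable order.
    return sorted(cells, key=lambda cell: (-cell[1], cell[0]))[:top]


# Invented sample data, for illustration only.
sample = (
    [("display configuration", "change")] * 3
    + [("display configuration", "communication")]
    + [("test cases", "mistake")] * 2
    + [("data conversion", "oversight")]
)
table = cross_tab(sample)
for (module, reason), count in hot_spots(table, top=2):
    print(f"{module} / {reason}: {count}")
```

Keeping the full table around, not just the top cells, is what gives you the baseline for the quiet modules.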

When I look at this data, I see that communication in general is a pretty big problem.  There were acute problems when trying to communicate requirements in the UI, especially late feature requests and changes.  Finally, my team just couldn't get it together for testing and made mistakes all over the place.

In the remainder of this post I am going to focus on how we dealt with changing requirements and how we might improve.  Testing is hopefully a topic for another day...

Change, Essence, and Accident in Agile Software Development

As you can see, we had a lot of defects escape implementation to customer review.  Digging a little deeper, close to 50% of our defects reported coming out of customer review had to do with UI issues. Digging deeper still, the majority of these UI issues weren’t bugs but rather changes to the originally specified requirements.  In other words, we built something that we thought was correct, but after the fully implemented feature was reviewed the customer decided to change how the feature works. 

This is rework.  Changes like this are waste.

"But wait!" you say, "You are an agile team!  You should be embracing change!  Change isn’t waste, change is the name of the game!"

Of course we embrace change.  Every UI issue reported by our customer was a learning opportunity and a chance for us to make better software for them.  But that does not change the fact that change has a cost.

With any software development there is going to be change.  Agile methods anticipate and even embrace this uncertainty.  That’s the whole point.  The question we should be asking is this: how much change is expected, and what is a reasonable cost of change in a project?
In other words, how much rework is essential to an Agile project and how much is avoidable, accidental rework?

Regardless of whether you are embracing change or not, reducing rework is still a good thing. Getting it right the first time, before you have implemented a feature and delivered it to the customer for review, is the most cost-effective way to build a system.

But at the same time an essential factor for success is to create feedback loops for learning more about your users and what they need.  Change is waste.  Change is also necessary.

Applying Knowledge from Defect Classification

In our case, after looking at the data and reflecting on what we saw, the team felt strongly that there is room for improving our requirements process.  We had a hunch that requirements were a problem, but we had no idea it was as bad as it was.  The next problem we needed to solve was what to do about it.

Looking back at past projects, it seemed like requirements churn was a problem but it was never as obvious as it is now.  We've tried several different approaches over the past year and a half with mixed results.
  • "Shall statements" didn't work because they aren't descriptive enough.  The traditional method for specifying these requirements is in an Excel spreadsheet.  This encourages brevity and makes for bad requirements. 
  • User stories are slightly better, but the majority of our customers are not familiar enough with the format.  The end result is a confused customer and a poorly specified and poorly validated system.
  • Letting the customer drive the requirements has also been mixed.  The customer is usually very happy with the format (they choose it) and eager to review.  Unfortunately, the requirements are usually lacking important information or the customer's chosen format is otherwise awkward for the implementation team.  In other words, it's good for them but not good for us or the project on the whole. 
  • Use cases were extremely successful with the development team, but we ran into the same issues as with user stories when presenting the use cases to the customer.  The format and information were just abstract or foreign enough to make the representation a challenge.

For a very long time I had assumed that the problem was with the requirements representation itself. I assumed that there must be some "holy grail" format for specifying functional requirements.  After examining this data and reflecting on our past experiences, I'm not so sure anymore.  User stories have been extremely successful within other parts of our engineering organization.  And I have firsthand success stories applying use cases.

Knowing that the majority of our requirements issues stem from change is the enlightening nugget of knowledge here. Changes are a positive thing. But can we flush out these changes before we fully implement the feature? 

Based on our data, here are some things we are thinking about trying on our next project. 
  • Apply use cases and take more time upfront to educate the customer on how to read and evaluate them. 
  • Try to match the right requirements specification type with the customer's behavioral and technical inclinations.
  • Try out a different requirement specification representation. We're working on a new idea, "FAQ Requirements." Maybe an easier to read and review representation will resolve some of these problems? 
  • Wait to conduct a UI workshop until after at least part of the system has been implemented. Our hope is that by this point the customer will have seen some aspects of the system and we will have had an opportunity to teach them core concepts in the application space. 
  • Spend more time prototyping using cheap prototyping methods. In some cases, paper prototyping might have resolved some of our uncertainty much more quickly than a fully or even partially implemented feature.

The goal in making a process change is not to remove the feedback loops or eliminate change by trying to get it right the first time. Instead the goal is to help us fail faster, cheaper, earlier, and more often. We want to make the mistakes early, before they are realized in code, when change is cheap. It's a lot easier to change a few lines on a whiteboard than it is to rewrite an entire code model.

Failure is a mechanism for learning. This education needs to be as cheap as possible. Our focus over the next few months is going to be figuring out how to make our requirements representation so good that it fails fast as a requirement.  Hopefully this way we can avoid waste from rework.