[Year 12 Its] Why Buggy Software Gets Shipped

Fri May 26 12:56:48 EST 2006

An interesting article to share with kids ..
http://developers.slashdot.org/developers/06/05/25/1318212.shtml

and an interesting discussion about it ...
http://developers.slashdot.org/developers/06/05/25/1318212.shtml

enjoy!

Margaret

PS, if you use Moodle, Slashdot has a good RSS feed that you can hook upto

----

Why we all sell code with bugs

Creating quality software products means knowing when to fix bugs and when
to leave well alone, writes Eric Sink

Thursday May 25, 2006
The Guardian

The world's six billion people can be divided into two groups: group one,
who know why every good software company ships products with known bugs;
and group two, who don't. Those in group 1 tend to forget what life was
like before our youthful optimism was spoiled by reality. Sometimes we
encounter a person in group two, a new hire on the team or a customer, who
is shocked that any software company would ship a product before every
last bug is fixed.
Every time Microsoft releases a version of Windows, stories are written
about how the open bug count is a five-digit number. People in group two
find that interesting. But if you are a software developer, you need to
get into group one, where I am. Why would an independent software vendor -
like SourceGear - release a product with known bugs? There are several
reasons:

· We care about quality so deeply that we know how to decide which bugs
are acceptable and which ones are not.

· It is better to ship a product with a known quality level than to ship a
product full of surprises.

· The alternative is to fix them and risk introducing worse bugs.

All the reasons are tied up in one truth: every time you fix a bug, you
risk introducing another. Don't we all start out with the belief that
software only gets better as we work on it? Nobody on our team
intentionally creates new bugs. Yet we have done accidentally.

Risky changes

Every code change is a risk. If you don't recognise this you will never
create a shippable product. At some point, you have to decide which bugs
aren't going to be fixed.

Think about what we want to say to ourselves just after our product is
released. The people in group two want to say: "Our bug database has zero
open items. We didn't defer a single bug."

The group 1 person wants to say: "Our bug database has lots of open items.
We have reviewed every one and consider each to be acceptable. We are not
ashamed of this list. On the contrary, we draw confidence because we are
shipping a product with a quality that is well known. We admit our product
would be even better if all these items were 'fixed', but fixing them
would risk introducing new bugs."

I'm not suggesting anybody should ship products of low quality. But
decisions about software quality can be tough and subtle.

There are four questions to ask about every bug. The first two are
customer ones, and the next two are developer ones.

1) How bad is its impact? (Severity)

2) How often does it happen? (Frequency)

3) How much effort is required to fix it? (Cost)

4) What is the risk of fixing it? (Risk)

I like to visualise the first two plotted on a 2D graph, with severity on
the vertical axis. The top of the graph is a bug with extreme impact ("the
user's computer bursts into flames") and the bottom one has very low
impact ("one splash screen pixel is the wrong shade of grey").

The horizontal axis is frequency: on the right side is a bug that happens
very often ("the user sees this each day") and on the left, one that
seldom happens.

Broadly speaking, stuff gets more important as you move up or to the right
of the graph. A bug in the upper right should be fixed. A bug in the lower
left should not. Sometimes I draw this graph on a whiteboard when arguing
for or against a bug.

Questions three and four are about the tradeoffs involved in fixing the
bug. The answers can only ever make the priority of a bug go down - never
up. If, after answering questions one and two, a bug does not deserve
attention, skip the other two. A common mistake is to use question three
to justify fixing a bug that isn't important. We never make unimportant
code changes just because they're easy.

Every code change has a cost and a risk. Bad decisions happen when people
make code changes ignoring these two issues.

For instance, our product, Vault, stores all data using Microsoft SQL
Server. Some people don't like this. We've been asked to port the back end
to Oracle, PostgreSQL, MySQL and Firebird. This issue is in our bug
database as item 6740. The four questions would look like this:

· Severity: People who refuse to use SQL Server can't use Vault.

· Frequency: This "bug" affects none of our users; it merely prevents a
group of people from using our product.

· Cost: Very high. Vault's backend makes extensive use of features
specific to Microsoft SQL Server. Contrary to popular belief, SQL isn't
portable. Adapting the backend for any other database would take months,
and the maintenance costs of two back ends would be quite high.

· Risk: The primary risk lies in any code changes made to the server to
enable it to speak to different backend implementations of the underlying
SQL store.

Obviously, this is more of a feature request than a bug.

Example: Item 10016. Linux and MacOS users have problems over how
end-of-line terminators show up. Last October, we tried to fix this and
accidentally introduced a nastier bug that prevented users creating new
versions of a project. So the four questions for 10016 would look like
this:

· Severity: For a certain class of users, this bug is a showstopper. It
does not threaten data integrity, but makes Vault unusable.

· Frequency: This bug only affects users on non-Windows platforms, a
rather small percentage of our user base.

· Cost: The code change is small and appears simple.

· Risk: We thought - wrongly - that the risk was low.

If testing had told us that the risk was higher than we thought, we would
have revisited the four questions. Because the frequency is relatively
low, we might have decided to defer this fix until we figured out how to
do it without breaking things. In fact, that's what we ended up doing: we
"undid" the fix for Bug 10016 in a minor update to Vault, so it's now open
again.

Contextual understanding

Not only do you have to answer the four questions, you have to answer them
with a good understanding of the context in which you are doing business.
You need to understand the quality expectations of your market segment and
what the market window is for your product. We can ship products with
"bugs" because there are some that customers will accept.

I know what you want, and I want it too: a way to make these decisions
easy. I want an algorithm with simple inputs to tell me which bugs I
should fix and in what order.

I want to implement this algorithm as a feature in our bug-tracking
product. Wouldn't it be a killer feature? In the Project Settings dialog,
the user would enter a numeric value for "Market Quality Expectations" and
a schedule for the closing of the "Market Window". For every bug, the user
would enter numeric values for severity, frequency, cost and risk. The
Priority field for each bug would be automatically calculated. Sort the
list on Priority descending and you see the order in which bugs should be
fixed. The ones near the bottom should not be fixed at all.

I'd probably even patent this algorithm even though, in principle, I
believe software patents are fundamentally evil.

Alas, this ethical quandary is not going to happen, as "Eric's Magic Bug
Priority Algorithm" will never exist. There is no shortcut. Understand
your context, ask all four questions and use your judgment.

Experienced developers can usually make these decisions quickly. It only
takes a few seconds mentally to process the four questions. In tougher
cases, gather two co-workers near a whiteboard and the right answer will
probably show up soon.

· Eric Sink is a software developer at SourceGear. A longer version of
this article appeared on his website: see http://software.ericsink.com/