The Universal Pattern of Huge Software Losses

On my way to the Change Artistry Workshop in September 2013, I read an interesting article: The Universal Pattern of Huge Software Losses, by Jerry Weinberg, one of the hosts of this monumental workshop.

Let me share a small section of the article:

The Pattern of Large Failures
Every such case that I have investigated follows a universal pattern:
1. There is an existing system in operation and it is considered reliable and crucial to the operation.
2. A quick change to the system is desired, usually from very high in the organization.
3. The change is labelled “trivial.”
4. Nobody notices that statement 3 is a statement about the difficulty of making the change, not the consequences of making it, or of making it wrong.
5. The change is made without any of the usual software engineering safeguards, however minimal, that the organization has in place.
6. The change is put directly into the normal operations.
7. The individual effect of the change is small, so that nobody notices immediately.
8. This small effect is multiplied by many uses, producing a large consequence.

Whenever I have been able to trace management action subsequent to the loss, I have found that the universal pattern continues. After the failure is spotted:

9. Management’s first reaction is to minimize its magnitude, so the consequences are continued for somewhat longer than necessary.
10. When the magnitude of the loss becomes undeniable, the programmer who actually touched the code is fired—for having done exactly what the supervisor said.
11. The supervisor is demoted to programmer, perhaps because of a demonstrated understanding of the technical aspects of the job. [not]
12. The manager who assigned the work to the supervisor is slipped sideways into a staff position, presumably to work on software engineering practices.
13. Higher managers are left untouched. After all, what could they have done?
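
Steps 7 and 8 are easy to underestimate, so here is a back-of-the-envelope sketch of the arithmetic. All figures are hypothetical and chosen only for illustration; the function name and its parameters are my own and do not come from the article.

```python
def total_loss_cents(loss_per_use_cents: int, uses_per_day: int, days_unnoticed: int) -> int:
    """Aggregate loss when a small per-use error goes unnoticed for a while."""
    return loss_per_use_cents * uses_per_day * days_unnoticed


# Hypothetical figures: a 5-cent error per transaction,
# 200,000 transactions a day, unnoticed for 30 days.
loss = total_loss_cents(loss_per_use_cents=5, uses_per_day=200_000, days_unnoticed=30)
print(f"Total loss: {loss / 100:,.2f}")
```

The per-use effect is what makes the change look harmless; the other two factors decide how much it costs.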

The article amused me. A few weeks after the workshop, our IT supplier sent an email to the Testing Department (I was in cc):

Hi Frank,

Please find a list of known open incidents that could be fixed for the next repair release (I refer to bug tracking system IDs), together with our point of view regarding risks.

5791: Typos/translations -> impact = low risk, we will fix it.
5802: Sorting in the table on attribute. ->impact = also low risk (one line of code has to be changed)
5839: expanding of the list in the table view ->impact = low risk, consequence: the list will be scrolled to the top, please comment that!

5841: Appearance of “unknown error message”-> impact = none, already fixed in next version
5845: Comma between first name and last name -> impact = low risk, should be fixed
5846: Double click function does not work for certain objects. -> impact = low risk, should be fixed
5857: English error text ->impact = low risk, should be fixed
5859: Expanding of the list for the export filter -> impact = low risk
5863: view object in read-only mode -> impact = low risk (just one flag needs to be set correctly), should be fixed
5865: Rename the object window -> impact = low risk, of course 
5867: small error in standard calculation behaviour.. is that really a blocker?!? -> impact = anyway, risk seems to be low, but developer is on leave! Very quick decision required!!!

Best regards,

John

Hmmm… I notice a pattern… we just covered steps 1, 2, 3 and 4. Time is running out, but we still have some weeks to make sure we don’t end up in Jerry’s updated version of this article.
