Wednesday, January 14, 2009

Thoughts on Formal Methods in Software Engineering

Are formal methods in software engineering really all they are cracked up to be?

First, let me clarify what I mean by formal methods.  Formal methods use mathematical representations of software to formally prove that the software behaves as expected.  Most formal methods use predicate logic and set theory along with other mathematical notations to describe software in a formal specification.  Examples include Z (pronounced zed), state machines (such as the finite state process algebra FSP), Petri Nets, the use of pre and post conditions, and the Object Constraint Language used in conjunction with class diagrams in UML.  Most formal methods have some kind of automated proof capability be it by brute force or through the use of more sophisticated theorem proving programs.

Let's take a quick look at Z which is probably the most widely available formal specification method in terms of books, online documentation, and automated theorem provers.  Here’s a simple example of the birthday book taken from "An introduction to Z and formal specifications" by Mike Spivey.


Looks like gibberish, right?  It isn’t actually important that you understand what this means.  In fact, if you don’t understand it then I’ve already made my point.  Writing formal specifications using a formal modeling language is time consuming, costly, and in the end, not effective since it is too difficult to understand what the specification is actually telling me, the programmer, even if I’m trained in Z.

Formal methods get a lot of traction from the ridiculously high cost of defect injection early in the software lifecycle and the unacceptably high cost of defects in safety critical systems.  Injecting a bug in the requirements phase that is not caught and fixed until testing has been estimated to cost upwards of 100 times the cost of finding and fixing the same bug while the requirements are being written.  In safety critical systems, where human life is literally in the hands of software, deployed defects could result in injury or death (the Therac-25 accidents being some of the most famous).

Formal methods folks argue that a clear, precise, unambiguous, and proven formal specification can prevent many of these early defects from making their way into the software and thus prevent these problems.  Economic claims aside, formal methods fail because they still ignore the weakest link in building software: the human programmers.

Mathematics tells us that it is possible to create functions that map one representation of a set to another equivalent set.  For example, given a cube in three-dimensional space, a rotation function will maintain the dimensions of the cube while rotating the shape about a desired axis.  A similar principle is applied when considering Z and other formal methods.  Mapping functions are used to transform a high-level Z specification into a more concrete representation of the same specification.  Eventually the specification reaches a level of detail that is easily translated to code.  In practices this might mean that a set, such as birthday in the BirthdayBook schema above might be represented by a pair of arrays in a more concrete version of the schema.

While theoretically this sounds great, the problem is that the mapping cannot be automated in most cases so we’re essentially back to square one in terms of what we can say about the more concrete version of the specification.  This is thanks to the inevitable mistakes injected during the transformation process.  Kennedy-Carter’s iUML from the Model Driven Architecture camp is the closest thing to an automated transformation tool I’ve yet seen and even that did not include OCL or any sort of pre and post condition analysis.

So formal methods are difficult to write, difficult to understand, few people know them, and they don’t translate easily into code.  Is there a silver lining?  After all, a lot of time and research has gone into devising formal methods and some of the greatest minds in software engineering are behind them.

I think the answer is a big fat maybe.

Formal methods, in the right hands on the right project can make it easier to understand the system being specified.  A Navy track manager might be one good example.  Rules for how ships assume reporting responsibility might be easily recorded as a predicate.  Safety is certainly a concern and though the validation process will have to begin anew once code is written at least the basic algorithms will be clear and unambiguous.  That is of course assuming the right people are working on the project.  Then again, in the case of the IABM, any specification would have been better than the big plate of nothing the developers had to work with.

For almost every project in the world, I think formal methods should be generally avoided.  Given the option of spending money and time on mathematicians or extremely smart coders I would chose the latter.  With smart coders, code inspection is a fun and effective defect filtering option.  And let’s face it.  Why would you have your amazing coders do something other than write amazing code?