### Thoughts on Formal Methods in Software Engineering

Are formal methods in software engineering really all they are cracked up to be?

First, let me clarify what I mean by formal methods. Formal methods use mathematical representations of software to formally prove that the software behaves as expected. Most formal methods use predicate logic and set theory along with other mathematical notations to describe software in a formal specification. Examples include Z (pronounced zed), state machines (such as the finite state process algebra FSP), Petri Nets, the use of pre and post conditions, and the Object Constraint Language used in conjunction with class diagrams in UML. Most formal methods have some kind of automated proof capability be it by brute force or through the use of more sophisticated theorem proving programs.

Let's take a quick look at Z which is probably the most widely available formal specification method in terms of books, online documentation, and automated theorem provers. Here’s a simple example of the birthday book taken from "An introduction to Z and formal specifications" by Mike Spivey.

Looks like gibberish, right? It isn’t actually important that you understand what this means. In fact, if you don’t understand it then I’ve already made my point. Writing formal specifications using a formal modeling language is time consuming, costly, and in the end, not effective since it is too difficult to understand what the specification is actually telling me, the programmer, even if I’m trained in Z.

Formal methods get a lot of traction from the ridiculously high cost of defect injection early in the software lifecycle and the unacceptably high cost of defects in safety critical systems. Injecting a bug in the requirements phase that is not caught and fixed until testing has been estimated to cost upwards of 100 times the cost of finding and fixing the same bug while the requirements are being written. In safety critical systems, where human life is literally in the hands of software, deployed defects could result in injury or death (the Therac-25 accidents being some of the most famous).

Formal methods folks argue that a clear, precise, unambiguous, and proven formal specification can prevent many of these early defects from making their way into the software and thus prevent these problems. Economic claims aside, formal methods fail because they still ignore the weakest link in building software: the human programmers.

Mathematics tells us that it is possible to create functions that map one representation of a set to another equivalent set. For example, given a cube in three-dimensional space, a rotation function will maintain the dimensions of the cube while rotating the shape about a desired axis. A similar principle is applied when considering Z and other formal methods. Mapping functions are used to transform a high-level Z specification into a more concrete representation of the same specification. Eventually the specification reaches a level of detail that is easily translated to code. In practices this might mean that a set, such as

While theoretically this sounds great, the problem is that the mapping cannot be automated in most cases so we’re essentially back to square one in terms of what we can say about the more concrete version of the specification. This is thanks to the inevitable mistakes injected during the transformation process. Kennedy-Carter’s iUML from the Model Driven Architecture camp is the closest thing to an automated transformation tool I’ve yet seen and even that did not include OCL or any sort of pre and post condition analysis.

So formal methods are difficult to write, difficult to understand, few people know them, and they don’t translate easily into code. Is there a silver lining? After all, a lot of time and research has gone into devising formal methods and some of the greatest minds in software engineering are behind them.

I think the answer is a big fat maybe.

Formal methods, in the right hands on the right project can make it easier to understand the system being specified. A Navy track manager might be one good example. Rules for how ships assume reporting responsibility might be easily recorded as a predicate. Safety is certainly a concern and though the validation process will have to begin anew once code is written at least the basic algorithms will be clear and unambiguous. That is of course assuming the right people are working on the project. Then again, in the case of the IABM, any specification would have been better than the big plate of nothing the developers had to work with.

For almost every project in the world, I think formal methods should be generally avoided. Given the option of spending money and time on mathematicians or extremely smart coders I would chose the latter. With smart coders, code inspection is a fun and effective defect filtering option. And let’s face it. Why would you have your amazing coders do something other than write amazing code?

First, let me clarify what I mean by formal methods. Formal methods use mathematical representations of software to formally prove that the software behaves as expected. Most formal methods use predicate logic and set theory along with other mathematical notations to describe software in a formal specification. Examples include Z (pronounced zed), state machines (such as the finite state process algebra FSP), Petri Nets, the use of pre and post conditions, and the Object Constraint Language used in conjunction with class diagrams in UML. Most formal methods have some kind of automated proof capability be it by brute force or through the use of more sophisticated theorem proving programs.

Let's take a quick look at Z which is probably the most widely available formal specification method in terms of books, online documentation, and automated theorem provers. Here’s a simple example of the birthday book taken from "An introduction to Z and formal specifications" by Mike Spivey.

Looks like gibberish, right? It isn’t actually important that you understand what this means. In fact, if you don’t understand it then I’ve already made my point. Writing formal specifications using a formal modeling language is time consuming, costly, and in the end, not effective since it is too difficult to understand what the specification is actually telling me, the programmer, even if I’m trained in Z.

Formal methods get a lot of traction from the ridiculously high cost of defect injection early in the software lifecycle and the unacceptably high cost of defects in safety critical systems. Injecting a bug in the requirements phase that is not caught and fixed until testing has been estimated to cost upwards of 100 times the cost of finding and fixing the same bug while the requirements are being written. In safety critical systems, where human life is literally in the hands of software, deployed defects could result in injury or death (the Therac-25 accidents being some of the most famous).

Formal methods folks argue that a clear, precise, unambiguous, and proven formal specification can prevent many of these early defects from making their way into the software and thus prevent these problems. Economic claims aside, formal methods fail because they still ignore the weakest link in building software: the human programmers.

Mathematics tells us that it is possible to create functions that map one representation of a set to another equivalent set. For example, given a cube in three-dimensional space, a rotation function will maintain the dimensions of the cube while rotating the shape about a desired axis. A similar principle is applied when considering Z and other formal methods. Mapping functions are used to transform a high-level Z specification into a more concrete representation of the same specification. Eventually the specification reaches a level of detail that is easily translated to code. In practices this might mean that a set, such as

*birthday*in the*BirthdayBook*schema above might be represented by a pair of arrays in a more concrete version of the schema.While theoretically this sounds great, the problem is that the mapping cannot be automated in most cases so we’re essentially back to square one in terms of what we can say about the more concrete version of the specification. This is thanks to the inevitable mistakes injected during the transformation process. Kennedy-Carter’s iUML from the Model Driven Architecture camp is the closest thing to an automated transformation tool I’ve yet seen and even that did not include OCL or any sort of pre and post condition analysis.

So formal methods are difficult to write, difficult to understand, few people know them, and they don’t translate easily into code. Is there a silver lining? After all, a lot of time and research has gone into devising formal methods and some of the greatest minds in software engineering are behind them.

I think the answer is a big fat maybe.

Formal methods, in the right hands on the right project can make it easier to understand the system being specified. A Navy track manager might be one good example. Rules for how ships assume reporting responsibility might be easily recorded as a predicate. Safety is certainly a concern and though the validation process will have to begin anew once code is written at least the basic algorithms will be clear and unambiguous. That is of course assuming the right people are working on the project. Then again, in the case of the IABM, any specification would have been better than the big plate of nothing the developers had to work with.

For almost every project in the world, I think formal methods should be generally avoided. Given the option of spending money and time on mathematicians or extremely smart coders I would chose the latter. With smart coders, code inspection is a fun and effective defect filtering option. And let’s face it. Why would you have your amazing coders do something other than write amazing code?

Your argument would be stronger if you had copied out those two schemas correctly, but you have given the second one the wrong name.

ReplyDeleteAnd it would be fairer if you pointed out that Z specifications are usually strong on informal commentary that explains the mathematics and relates it to an informal understanding of the system being described.

Mike, thanks for your comment, it's great to hear from you! I'll look into updating the image to have the correct schema name, though regardless, I think my point is still pretty clear — Z notation, and other formal models like it, is difficult to read and, in some ways moot since mapping functions from model to code are still a major weakness.

ReplyDeleteAlso, after some serious reflection on this subject and a little growth, I changed my mind on some of these points: http://neverletdown.net/2009/12/the-domestication-of-formal-methods/. Two things brought me around from thinking formal methods were complete waste to believing that they can provide serious value in the right circumstances.

One is seeing the WSDL 2.0 Z specification — the only "real world" Z specifications that I have ever seen. http://www.w3.org/TR/wsdl20/wsdl20-z.html The only downside to this is that I don’t think the specification was really used for anything other than… making a specification. In the context of the WSDL 2.0, that is OK, but my argument about a lack of solid mapping functions still stands. I want to see working code in the end and arguments about preventing defect injection fall apart in my opinion simply because for the cost required to create a formal mathematical model I likely could have written a piece of the system and tested it. Now, switching to a "domesticated" formal method and getting closer to the code is a really good compromise and starts to make a lot of sense in my opinion.

The other is reading Anthony Hall’s "7 Myths of Formal Methods" paper. Mike, you are 100% correct about your second statement (and I was wrong to omit this from my essay, though at the time I didn't understand or believe it) – the model is good, but the prose that goes with the model is invaluable. Further, the process of creating the prose and model, in the right situations, is priceless. The WSDL specification is a much better example for all of these positive attributes than the toy "Birthday Book" example.