Sunday, November 13, 2011

Static Testing Vs Dynamic Testing


In software development, static analysis and dynamic testing are two different ways of detecting defects. Unfortunately, they are too often thought of as competing with one another, and developers are sometimes encouraged to favor one to the exclusion of the other. This inaccurate and potentially harmful impression may be a consequence of confusion over the power and role of the new generation of static-analysis tools. This article attempts to set the record straight and clear up the misunderstandings.
First, let’s be clear: just because you use static-analysis tools doesn’t mean you should skimp on testing. As I’ll explain, the two techniques are entirely complementary.
Second, for most programs there are no tools or techniques that can prove beyond all doubt that the software is free of defects. Maybe in the future it will become possible to do such things, but we’re not there yet.
The term “static analysis” has indeed been around for decades. The primary purpose of the first-generation tools such as lint was to do stronger type checking, but they could also do some style checking and lightweight bug-finding. Nowadays there are three classes of tool:
  1. Lightweight tools such as lint, and others that find violations of coding standards, make up the first class. They typically use superficial techniques that examine the syntax of the program to find issues. A common complaint about them is their excessive rate of false alarms, and that the warnings they issue do not correlate very well with real defects.
  2. Advanced static-analysis tools such as CodeSonar (the tool I work on) are bug hunters—their primary purpose is to find serious programming defects.
  3. Tools that support formal methods can prove the absence of certain classes of error. These are heavyweight tools that require that programs conform to a strict subset of the language, so they have limited applicability and are not widely used.
There is of course some overlap between these classes. The lightweight tools can be useful if there is a requirement for the code to conform to coding standards; they can also find some kinds of bugs, albeit not nearly as effectively as the other classes of tools. Similarly the advanced tools can also be used to check lightweight properties.
This article concentrates on the advanced static-analysis tools, or bug hunters: those whose primary purpose is to find serious bugs. It is wrong to state that these tools are little different from those that came before. The techniques that distinguish them from the first-generation tools have only been perfected and made feasible fairly recently.
They are simultaneously interprocedural, flow sensitive, context sensitive, whole program, and path sensitive. They scale to tens of millions of lines of code and yield results with a low level of false positives.
These techniques give them the capability to find deep semantic errors in huge code bases. There is no contradiction in observing that they can also subsume the capabilities of the earlier generation of tools. The commercial success and wide adoption of these tools indicates that many users find that these tools deliver real value.
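To make this concrete, here is a small hypothetical sketch (not from the article, and deliberately simplified) of the kind of defect that calls for this style of analysis: a pointer that is NULL only on a failure path decided inside a different function. A checker that looks at each function in isolation, or that ignores which path is taken, will either miss it or flood the user with false alarms.

    #include <stdio.h>

    #define MAX_RECORDS 16

    static int storage[MAX_RECORDS];

    static int *find_record(int id)
    {
        if (id < 0 || id >= MAX_RECORDS)
            return NULL;            /* lookup fails for out-of-range ids */
        return &storage[id];
    }

    static void print_record(int id)
    {
        int *rec = find_record(id);
        /* Defect: when the lookup fails, rec is NULL and this dereference
         * crashes.  Detecting it requires tracking the return value across
         * the call (interprocedural) along the failure path (path-sensitive),
         * while staying quiet at call sites where the id is known to be in
         * range (context-sensitive). */
        printf("record %d = %d\n", id, *rec);
    }

    int main(void)
    {
        print_record(3);    /* a typical test input; the defect is never triggered */
        return 0;
    }

Note that a test suite exercising only in-range ids, as in this main function, would never observe the crash, yet the defect is sitting there waiting for the first bad input.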
Testing is very good at finding ways in which the software does not meet its requirements. However testing is only as good as the test cases, and it is expensive to create test cases and time-consuming to execute them. It is not uncommon for half of a software development effort to be consumed by testing activities.
Despite all the effort that goes into testing, bugs routinely evade detection and show up in the field. The practice of updating deployed software with patches to fix bugs has become commonplace. And these are not all low-impact defects either; hardly a week goes by without new security vulnerabilities in popular software coming to light, many of which have their roots in programming errors. If it were possible for testing to be more exhaustive, then maybe there would be no need for additional defect-finding techniques, but there is clearly no shortage of defects to be found in software that has been well tested.
Advanced static analysis tools routinely find serious errors that escaped detection through testing. One such example is the Zune bug. On December 31st 2008, thousands of Microsoft Zune music players worldwide simultaneously failed due to a leap-year bug in the firmware. The incident was highly embarrassing for Microsoft and led to a further decline in the reputation of a beleaguered (and now discontinued) product. That very bug (an infinite loop) was easily found by our static analysis tool in a few minutes.
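The article does not reproduce the code, but the loop at the heart of the failure has been widely circulated; the sketch below is a simplified rendering of it, with names and surrounding details paraphrased. On December 31st of a leap year the day count is exactly 366, and on that value the loop body makes no progress.

    #include <stdbool.h>
    #include <stdio.h>

    static bool IsLeapYear(int year)
    {
        return (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);
    }

    /* Simplified rendering of the widely reported clock-driver loop.
     * 'days' is the number of days elapsed since January 1, 1980. */
    static int year_from_days(int days)
    {
        int year = 1980;
        while (days > 365)
        {
            if (IsLeapYear(year))
            {
                if (days > 366)
                {
                    days -= 366;
                    year += 1;
                }
                /* Defect: when days == 366 (December 31st of a leap year),
                 * neither 'days' nor 'year' changes, so the loop never
                 * terminates. */
            }
            else
            {
                days -= 365;
                year += 1;
            }
        }
        return year;
    }

    int main(void)
    {
        printf("%d\n", year_from_days(400));   /* 1981: terminates normally */
        /* year_from_days(10593) -- the value for December 31, 2008 -- never returns. */
        return 0;
    }
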
This happens with safety-critical software too. For example, recently the FDA used static analysis to examine the source code in a medical device that was under investigation because it had failed in the field. The device had already undergone rigorous dynamic testing as required for certification. Yet advanced static analysis was able to uncover over 40 serious software problems of which the manufacturer was unaware.
Even in avionics, where software development is highly regulated and the most safety-critical components are required to undergo very stringent testing with clearly-defined coverage criteria, static analysis tools still manage to find serious defects, usually to the surprise of the developers.
This is of course an economic issue: testing is expensive and increased investment yields diminishing returns. Absolute guaranteed perfection is unattainable, so how much testing is enough is a business decision, and clearly tied to the level of quality required for the situation in which the software will be deployed.
Advanced static-analysis tools are successful because they change these economics in several ways:
  1. They can find bugs early in the development cycle, and bugs found earlier are less expensive to fix.
  2. They do not require program inputs, so bugs can be found and eliminated without incurring the expense of developing test cases.
  3. They can make it easier and less expensive to develop dynamic test cases. The net effect is that they can eliminate more bugs for less expense.
Static analysis can be used as soon as the code can be compiled. The code does not have to be complete, or integrated with the rest of the program, before the analysis can begin to find bugs. It can be run on a single file at a time, on a complete codebase, or on anything in between. The results are better when the entire program is analyzed, but an analysis of small parts can also be useful. Thus developers can get very quick feedback on their code quality.
It is relatively easy to create test cases for normal inputs, but expensive to try to cover the many corner cases. Advanced static-analysis tools work by doing a symbolic execution of the code. Instead of using real data as input, the analysis generates symbols that represent program inputs, so the results approximate the program's behavior over all possible inputs. This means that the tools are good at finding errors that occur only for very unusual or unexpected combinations of values.
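As an illustration (a hypothetical sketch, not taken from the article), consider a bounds check with an off-by-one error. A test suite built around typical inputs may never hit the one value that slips past the check, but an analysis that treats the parameter as a symbolic value considers every value it can take and reports the out-of-bounds write.

    #define NUM_CHANNELS 16

    static int readings[NUM_CHANNELS];

    /* Hypothetical example: the defect occurs only for one unusual input value. */
    static void record_reading(int channel, int sample)
    {
        if (channel < 0 || channel > NUM_CHANNELS)   /* off-by-one: should be >= */
            return;
        /* Out-of-bounds write when channel == NUM_CHANNELS; a symbolic value
         * for 'channel' exposes this path without any test case. */
        readings[channel] = sample;
    }

    int main(void)
    {
        record_reading(3, 42);      /* typical input: no failure observed */
        return readings[3] == 42 ? 0 : 1;
    }
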
Finally, static analysis tools can actually help make test-case generation easier by pointing out non-obvious flaws in the code that make it difficult or even impossible to test. For example, in order to get FAA certification for the most safety-critical avionics code, it must be tested to full MCDC (modified condition/decision) coverage. Roughly speaking, this means that there must be test cases that evaluate every condition in every decision to both true and false, and that demonstrate each condition independently affecting the decision's outcome.
It can be very time consuming to generate such test cases because you have to trace the flow of data and control back through the code to figure out what inputs can influence the condition and how. It can be especially frustrating if it turns out that it is fundamentally impossible to create inputs because the condition always evaluates in the same way; something that is surprisingly common in our experience. Static-analysis tools can find such redundant conditions very easily, and so can save a great deal of time in test case generation.
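A hypothetical sketch of such a redundant condition, assuming a simple clamping routine (not from the article):

    #include <stdio.h>

    /* Hypothetical example: a condition that no test input can drive to
     * false, so MCDC coverage of both outcomes is unachievable. */
    static int clamp_percent(int value)
    {
        if (value < 0)
            value = 0;
        if (value > 100)
            value = 100;
        /* Redundant: after the clamping above, value >= 0 always holds, so
         * this condition can never be false and the return below it is
         * unreachable. */
        if (value >= 0)
            return value;
        return -1;      /* dead code */
    }

    int main(void)
    {
        printf("%d\n", clamp_percent(150));   /* prints 100 */
        return 0;
    }

A static analysis reports the invariant condition immediately, before any effort is spent trying to construct inputs for an outcome that cannot occur.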
In conclusion, dynamic testing and static analysis should never be thought of as if they were in direct competition with each other. They use completely different techniques to find defects, so it does not make sense to compare them directly. It is more useful to consider them as mutually supporting techniques that together make the task of creating high-quality software easier and more efficient.
 Paul Anderson is VP of Engineering at GrammaTech, a spin-off of Cornell University that specializes in static analysis. He received his B.Sc. from King's College, University of London and his Ph.D. in computer science from City University London. Paul manages GrammaTech’s engineering team and is the architect of the company’s static-analysis tools. He has helped a wide variety of organizations, including NASA, the FDA, the FAA, MITRE, Draper Laboratory, GE, Lockheed Martin, and Boeing, apply automated code analysis to critical projects. Paul has worked in the software industry for 16 years, with most of his experience focused on developing static-analysis, automated-testing, and program-transformation tools. A significant portion of his work has involved applying program analysis to improve security. His research on static analysis tools and techniques has been reported in numerous articles, journal publications, book chapters, and international conferences.
