Tuesday, November 15, 2011

Cyclomatic Complexity

Software metrics are often criticized because they are treated as an exact science, uniformly applicable to all scenarios. True, software metrics are an unbiased, objective measurement of a particular aspect of code; however, a particular metric's applicability to a domain is usually subjective. For instance, highly coupled code may have been intentionally designed that way for performance reasons; consequently, a coupling metric that suggests problems with this code must be evaluated in the context of the overall application.
With this in mind, applying various software metrics to a code base can be an effective overall gauge of software quality. One such metric, cyclomatic complexity, can be helpful in ascertaining areas of code that may require additional attention to head off future maintenance issues. That attention, moreover, can take the form of unit testing and refactoring.
Pioneered in the 1970s by Thomas McCabe of McCabe & Associates fame, cyclomatic complexity essentially represents the number of linearly independent paths through a particular section of code, which in object-oriented languages usually means a method. Cyclomatic complexity's formal academic equation from graph theory is as follows:
CC = E - N + 2P
where E represents the number of edges on a graph, N the number of nodes, and P the number of connected components.
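As a quick worked example, take a method whose body is a single if/else (like the one shown below): treating the condition, the two branches, and the return statement as nodes gives N = 4, E = 4, and P = 1, so CC = 4 - 4 + 2(1) = 2, which agrees with the simpler counting rule described next.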
If you've already given up on software metrics after reading that equation, don't worry: there is a much simpler way to arrive at the same number. Cyclomatic complexity can be explained in layman's terms as follows: every decision point in a method (i.e., an if, for, while, or case statement) is counted, and one is added for the method's entry point, resulting in an integer-based measurement denoting a method's complexity.
For instance, the following uncomplicated method will yield a cyclomatic complexity value of 2.
public int getValue(int param1) {
  int value = 0;
  if (param1 == 0) {
    value = 4;
  } else {
    value = 0;
  }
  return value;
}
In the above method, there is one decision point: the if statement (the else clause is simply the other branch of that same decision, so it is not counted separately). Remembering that a method's entry point automatically adds one, the final value equals 2.
Indeed, the example method is quite simple; however, one can imagine that if there were 10 additional decision points (yielding a cyclomatic complexity value of 12), the method would likely be perceived as complex. While opinions as to what constitutes complex code are quite subjective, over the years the software industry has largely agreed that highly complex code can be difficult for engineers to understand and therefore harder to maintain. Moreover, highly complex code has a high probability of containing defects.
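To see how quickly the count climbs, consider the following contrived, hypothetical method. Exact counting conventions vary slightly from tool to tool (for example, whether a default label counts), but the comments below follow the decision-point rule described above.
public int classify(int[] values, int threshold) {  // 1 for the method's entry point
  int score = 0;
  for (int v : values) {                             // +1 for the for loop
    if (v > threshold) {                             // +1 for the if
      score += 2;
    } else if (v == threshold) {                     // +1 for the else if
      score += 1;
    }
    while (score > 100) {                            // +1 for the while loop
      score -= 10;
    }
  }
  switch (score % 3) {
    case 0:                                          // +1 for the case
      return score;
    case 1:                                          // +1 for the case
      return score + 1;
    default:
      return score - 1;
  }
}                                                    // total: 7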
Various authors and studies have, in fact, suggested that a method with a cyclomatic complexity value of 10 or higher should be considered complex. This is the key point. By determining the cyclomatic complexity of various methods found in objects and paying attention to outlier values, one can uncover code that most likely should become a candidate for a highly focused unit testing/refactoring effort.

Determining Cyclomatic Complexity

There are various commercial and open source software metrics tools on the market that can examine source code and report on the gamut of software metrics. One such tool, which happens to be open source, is PMD.
Running PMD against a code base is quite easy (see this article by Tom Copeland). Upon examining the code, PMD produces a report, in formats such as XML and HTML, describing the (customizable) rule violations it found.
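For instance, PMD has long shipped with an Ant task; a minimal target might look like the following sketch. The ruleset path, report location, and classpath reference shown here are illustrative and vary between PMD versions, so consult the documentation for the release you are using.
<taskdef name="pmd" classname="net.sourceforge.pmd.ant.PMDTask"
         classpathref="pmd.classpath"/>

<target name="pmd-report">
  <!-- the codesize ruleset contains the CyclomaticComplexity rule -->
  <pmd rulesetfiles="rulesets/codesize.xml">
    <formatter type="xml" toFile="build/pmd_report.xml"/>
    <fileset dir="src">
      <include name="**/*.java"/>
    </fileset>
  </pmd>
</target>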
For example, running PMD against a code base may produce a report that contains the following snippet of XML:
<file name="C:\com\dgg\web\AnomalyAction.java">
  <violation line="1" rule="CouplingBetweenObjectsRule">
    A value of 27 may denote a high amount of coupling within the class
  </violation>
  <violation line="103" rule="CyclomaticComplexityRule">
    The method 'updateAnomaly' has a Cyclomatic Complexity of 22.
  </violation>
</file>
As the XML demonstrates, AnomalyAction.java has a method, updateAnomaly, which has a rather high value of cyclomatic complexity. Interestingly enough, this class also appears to be quite coupled to various other objects within the code base, which only serves to complicate the code further. After viewing this report, red flags should start popping up, as clearly the AnomalyAction object's quality and relative stability have been called into question.
Software metrics, however, need to be evaluated in the context of the application against which they are being applied. While some red flags have now been raised, are there others? Reading the entire report may reveal even more egregiously high values of cyclomatic complexity. Additionally, it could turn out that every class examined possesses a high cyclomatic complexity value. The key to effective utilization of cyclomatic complexity and other metrics is to look for the outliers; i.e., those whose complexity is much higher than the rest of the code.
The outliers indicate areas where one should focus. If there is a uniform pattern of complexity throughout a code base, then deciding where to make repairs becomes significantly more challenging. In this situation, other metrics, such as CM delta rates, can help indicate areas of needed attention.
There are, however, multiple ways of reporting cyclomatic complexity, and they affect how easily outliers can be spotted. Tools like PMD report on individual methods' cyclomatic complexities. Other tools may report an object's aggregate cyclomatic complexity, or even a package's collective value. Unfortunately, aggregating cyclomatic complexity is rarely useful. Because every concrete method found in an object yields a minimum value of 1, any object following a JavaBean-like pattern (entity beans, DAOs, value objects, etc.) may produce a high aggregate value despite containing no complex logic, and large classes containing many methods may produce similarly elevated scores.
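As a hypothetical illustration, the value object below contains no complex logic at all, yet a tool that sums per-method values would report an aggregate cyclomatic complexity of 6 for the class (one per method):
// Hypothetical value object: every method has a cyclomatic complexity of 1,
// so summing per-method values yields an aggregate of 6 despite the absence
// of any real complexity.
public class AnomalyDTO {
  private long id;
  private String name;
  private String status;

  public long getId() { return id; }
  public void setId(long id) { this.id = id; }
  public String getName() { return name; }
  public void setName(String name) { this.name = name; }
  public String getStatus() { return status; }
  public void setStatus(String status) { this.status = status; }
}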

Taking Action

When the determination has been made that a class is, in fact, complex, there are two steps available to mitigate the associated risk: unit testing, followed by refactoring.
As the cyclomatic complexity of a method is fundamentally the number of independent paths through that method, a general rule of thumb states that, in order to ensure a high level of test coverage, the number of test cases for a method should equal the method's cyclomatic complexity. Unfortunately, this bold statement is often ignored.
Indeed, writing 22 unique test cases for updateAnomaly, while a noble goal, will probably not happen. Time constraints, developer attention spans, caffeine deprivation, etc. all work against even the most well-intentioned quality plans. While 22 test cases for updateAnomaly may be unattainable, surely zero should also be cause for concern.
As before, common sense needs to be applied. A further examination of the code may reveal that five test cases can effectively exercise all paths of the code under focus. Conversely, it may be determined that most of the paths in the code under focus are rarely executed edge cases; therefore, fewer test cases may suffice (for now). At this point in the process, one's tolerance for risk must be applied.
If five test cases can indeed assist in lowering the risk of the method's complexity, the next question becomes, "What is the most effective way to build the associated test cases?" By far, the most useful approach is unit testing with a framework like JUnit. White-box unit testing is close to the code and ensures that the code under test fulfills its fundamental responsibilities at the lowest possible level. Unit testing is also quite easy to adopt, as one can write unit tests before or after actually writing the code. Higher-level testing usually involves other aspects of a system, such as other packages, objects, databases, containers, etc., that introduce dependencies and complexities that ultimately make it harder to effectively address low-level code details.
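As a minimal sketch, the earlier getValue() example has a cyclomatic complexity of 2, so two JUnit test cases, one per path, exercise every path through it. The Calculator class name below is assumed purely for illustration, since the original snippet did not show its enclosing class:
import static org.junit.Assert.assertEquals;

import org.junit.Test;

// Two test cases, one per path, matching getValue()'s cyclomatic complexity of 2.
// Calculator is an assumed class name for the earlier example method.
public class CalculatorTest {

  @Test
  public void zeroTakesTheIfBranch() {
    assertEquals(4, new Calculator().getValue(0));
  }

  @Test
  public void nonZeroTakesTheElseBranch() {
    assertEquals(0, new Calculator().getValue(7));
  }
}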

Refactoring

Unit testing helps mitigate risk by instilling developer confidence and facilitating rapid code change; however, unit testing will not make the code under test less complex. To simplify complex code, one must surgically remove the knots via refactoring. Be warned, however, that well-written unit tests are required before one attempts to refactor code.
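As a hypothetical sketch of what such a refactoring might look like, the nested conditional below can be decomposed into small, intention-revealing helper methods; each resulting method carries a lower cyclomatic complexity than the original and can be unit tested in isolation:
// Before (hypothetical): one method holds every decision point.
public double shippingCost(boolean international, double weight) {
  double cost;
  if (international) {
    if (weight > 20) {
      cost = 45;
    } else {
      cost = 30;
    }
  } else {
    if (weight > 20) {
      cost = 15;
    } else {
      cost = 8;
    }
  }
  return cost;
}

// After: the branching is split across two small helpers, so each method
// has a lower cyclomatic complexity and a correspondingly smaller test burden.
public double shippingCost(boolean international, double weight) {
  return international ? internationalCost(weight) : domesticCost(weight);
}

private double internationalCost(double weight) {
  return weight > 20 ? 45 : 30;
}

private double domesticCost(double weight) {
  return weight > 20 ? 15 : 8;
}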
