Saturday, November 19, 2011

Path and Branch Coverage


Coverage has been around for a long time but is often misunderstood. In the realm of unit testing, understanding coverage is critical, especially if you want a solid unit testing strategy. So, let's start with the basics.
Code coverage is a way to measure the level of testing you've performed on your software. Gathering coverage metrics is a straightforward process: Instrument your code and run your tests against the instrumented version. This produces data showing what code you did—or, more importantly, did not—execute. Coverage is the perfect complement to unit testing: Unit tests tell you whether your code performed as expected, and code coverage tells you what remains to be tested.
Most developers understand this process and agree on its value proposition, and they often target 100% coverage. Although 100% coverage is an admirable goal, 100% of the wrong type of coverage can lead to problems. A typical software development effort measures coverage in terms of the number of either statements or branches to be tested. Even with 100% statement or branch coverage, critical bugs may still be present in the logic of your code, leaving both developers and managers with a false sense of security.
How can 100% coverage be insufficient? Because statement and branch coverage do not tell you whether the logic in your code was executed. Statement and branch coverage are great for uncovering glaring problems found in unexecuted blocks of code, but they often miss bugs related to both decision structures and decision interactions. Path coverage, on the other hand, is a more robust and comprehensive technique that helps reveal defects early.
Before you learn about path coverage, look at some of the problems with statement and branch coverage.

Statement Coverage

Statement coverage identifies which statements in a method or class have been executed. It is a simple metric to calculate, and a number of open source products exist that measure this level of coverage. Ultimately, the benefit of statement coverage is its ability to identify which blocks of code have not been executed. The problem with statement coverage, however, is that it does not identify bugs that arise from the control flow constructs in your source code, such as compound conditions or consecutive switch labels. This means that you can easily get 100% coverage and still have glaring, uncaught bugs.
The following example demonstrates this. Here, the returnInput() method is made up of seven statements and has a simple requirement: Its output should equal its input.
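The original listing is not reproduced in this post, so here is a plausible sketch of returnInput(), assuming an int input plus three boolean parameters that drive the three decisions (the class name and parameter names are invented):

public class Example {
    // Requirement: the return value should always equal the input.
    public int returnInput(int input, boolean condition1, boolean condition2,
                           boolean condition3) {
        int x = input;
        int y = x;
        if (condition1) {
            x++;          // first decision modifies x
        }
        if (condition2) {
            x--;          // second decision is meant to undo the first
        }
        if (condition3) {
            y = x;        // third decision copies x into the result
        }
        return y;
    }
}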


Next, you can create one JUnit test case that satisfies the requirement and gets 100% statement coverage.
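Such a test might look like the following (a minimal sketch against the hypothetical returnInput() above; the test name follows the article's naming convention):

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class ReturnInputTest {
    @Test
    public void testReturnInputIntBooleanBooleanBooleanTTT() {
        // TTT executes every statement, and the output equals the input,
        // so the test passes and 100% statement coverage is reported.
        assertEquals(7, new Example().returnInput(7, true, true, true));
    }
}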

There's an obvious bug in returnInput(). If the first or second decision evaluates true and the other evaluates false, the return value will not equal the method's input. An astute software developer will notice this right away, but the statement coverage report shows 100% coverage. If a manager sees 100% coverage, he or she may get a false sense of security, decide that testing is complete, and release the buggy code into production.
Recognizing that statement coverage may not fit the bill, the developer decides to move on to a better testing technique: branch coverage.

Branch Coverage

A branch is the outcome of a decision, so branch coverage simply measures which decision outcomes have been tested. This sounds great because it takes a more in-depth view of the source code than simple statement coverage, but branch coverage can also leave you wanting more.
Determining the number of branches in a method is easy. Boolean decisions obviously have two outcomes, true and false, whereas switches have one outcome for each case—and don't forget the default case! The total number of branches in a method is therefore equal to the number of decision outcomes plus one for the method's entry branch (after all, even a method made up of straight-line code has one branch).
In the example above, returnInput() has seven branches—three true, three false, and one invisible branch for the method entry. You can cover the six true and false branches with two test cases:
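Again as a sketch against the hypothetical returnInput() above, TTT and FFF together exercise the true and false outcome of every decision (these two tests would live in the ReturnInputTest class shown earlier):

@Test
public void testReturnInputIntBooleanBooleanBooleanTTT() {
    // Covers the three true branches plus the entry branch.
    assertEquals(7, new Example().returnInput(7, true, true, true));
}

@Test
public void testReturnInputIntBooleanBooleanBooleanFFF() {
    // Covers the three false branches.
    assertEquals(7, new Example().returnInput(7, false, false, false));
}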


Both tests verify the requirement (output equals input) and they generate 100% branch coverage. But, even with 100% branch coverage, the tests missed finding the bug. And again, the manager may believe that testing is complete and that this method is ready for production.
A savvy developer recognizes that you're missing some of the possible paths through the method under test. The example above hasn't tested the TRUE-FALSE-TRUE or FALSE-TRUE-TRUE paths, and you can check those by adding two more tests.
There are only three decisions in this method, so testing all eight possible paths is easy. For methods that contain more decisions, though, the number of possible paths increases exponentially. For example, a method with only ten Boolean decisions has 1024 possible paths. Good luck with that one!
So, achieving 100% statement and 100% branch coverage may not be adequate, and testing every possible path exhaustively is probably not feasible for a complex method either. What's the alternative? Enter basis path coverage.

Basis Path Coverage

A path represents the flow of execution from the start of a method to its exit. A method with N decisions has 2^N possible paths, and if the method contains a loop, it may have an infinite number of paths. Fortunately, you can use a metric called cyclomatic complexity to reduce the number of paths you need to test.
The cyclomatic complexity of a method is one plus the number of unique decisions in the method. Cyclomatic complexity helps you define the number of linearly independent paths, called the basis set, through a method. The definition of linear independence is beyond the scope of this article, but, in summary, the basis set is the smallest set of paths that can be combined to create every other possible path through a method.
Like branch coverage, testing the basis set of paths ensures that you test every decision outcome, but, unlike branch coverage, basis path coverage ensures that you test all decision outcomes independently of one another. In other words, each new basis path "flips" exactly one previously executed decision, leaving all other executed branches unchanged. This is the crucial factor that makes basis path coverage more robust than branch coverage, and allows you to see how changing that one decision affects the method's behavior.
I'll use the same example to demonstrate.

To achieve 100% basis path coverage, you need to define your basis set. The cyclomatic complexity of this method is four (one plus the number of decisions), so you need to define four linearly independent paths. To do this, you pick an arbitrary first path as a baseline, and then flip decisions one at a time until you have your basis set.
Path 1: Any path will do for your baseline, so pick true for the decisions' outcomes (represented as TTT). This is the first path in your basis set.
Path 2: To find the next basis path, flip the first decision (only) in your baseline, giving you FTT for your desired decision outcomes.
Path 3: Flip the second decision in your baseline path, giving you TFT for your third basis path; the first and third decisions keep their baseline outcome of true.
Path 4: Finally, flip the third decision in your baseline path, giving you TTF for your fourth basis path; the first and second decisions keep their baseline outcome of true.
So, your four basis paths are TTT, FTT, TFT, and TTF. Now, make up your tests and see what happens.
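Sticking with the hypothetical returnInput() sketch, the two basis-path tests that flip one of the first two decisions might look like this (the test names follow the article's naming convention; the TTT and TTF tests are analogous):

@Test
public void testReturnInputIntBooleanBooleanBooleanTFT() {
    // Flips only the second decision relative to the TTT baseline.
    assertEquals(7, new Example().returnInput(7, true, false, true));
}

@Test
public void testReturnInputIntBooleanBooleanBooleanFTT() {
    // Flips only the first decision relative to the TTT baseline.
    assertEquals(7, new Example().returnInput(7, false, true, true));
}

Against the sketch of returnInput() above, both assertions fail: the TFT case returns 8 instead of 7, and the FTT case returns 6.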
Running these tests, you can see that testReturnInputIntBooleanBooleanBooleanTFT() and testReturnInputIntBooleanBooleanBooleanFTT() found the bug that was missed by your statement and branch coverage efforts. Further, the number of basis paths grows linearly with the number of decisions, not exponentially, keeping the number of required tests on par with the number required to achieve full branch coverage. In fact, because basis path testing covers all statements and branches in a method, it effectively subsumes branch and statement coverage.
But why didn't you test the other potential paths? Remember, the goal of basis path testing is to test all decision outcomes independently of one another. Testing the four basis paths achieves this goal, making the other paths extraneous. If you had started with FFF as your baseline path, you'd wind up with the basis set of (FFF, TFF, FTF, FFT), making the TTT path extraneous. Both basis sets are equally valid, and either satisfies your independent decision outcome criterion.

Creating Test Data

Achieving 100% basis path coverage is easy in this example, but fully testing a basis set of paths in the real world is more challenging, and sometimes impossible. Because basis path coverage tests the interaction between decisions in a method, you need test data that forces execution down a specific path, not just a single decision outcome, as suffices for branch coverage. Injecting data to force execution down a specific path is difficult, but there are a few coding practices that you can keep in mind to make the testing process easier.
  1. Keep your code simple. Avoid methods with cyclomatic complexity greater than ten. Not only does this reduce the number of basis paths that you need to test, but it reduces the number of decisions along each path.
  2. Avoid duplicate decisions.
  3. Avoid data dependencies.
Consider the following example:
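The original listing is not shown in this post; the fragment below is a hypothetical stand-in for the kind of hidden data dependency being described (all names are invented):

public class DataDependencyExample {
    public String classify(java.util.List<String> object1) {
        int length = 0;
        for (String s : object1) {  // intermediate computation derived from the parameter
            length += s.length();
        }
        int x = length % 7;         // x now depends indirectly on object1
        if (x > 3) {                // forcing this outcome means reasoning back
            return "long";          // through the whole chain of intermediate values
        }
        return "short";
    }
}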

The variable x depends indirectly on the object1 parameter, but the intervening code makes it difficult to see the relationship. As a method grows more complex, it may be nearly impossible to see the relationship between the method's input and the decision expression.


Code Coverage

Introduction

Code coverage analysis is the process of:
  • Finding areas of a program not exercised by a set of test cases,
  • Creating additional test cases to increase coverage, and
  • Determining a quantitative measure of code coverage, which is an indirect measure of quality.
An optional aspect of code coverage analysis is:
  • Identifying redundant test cases that do not increase coverage.
A code coverage analyzer automates this process.
You use coverage analysis to assure the quality of your set of tests, not the quality of the actual product. You do not generally use a coverage analyzer when running your set of tests through your release candidate. Coverage analysis requires access to test program source code and often requires recompiling it with a special command.
This paper discusses the details you should consider when planning to add coverage analysis to your test plan. Coverage analysis has certain strengths and weaknesses. You must choose from a range of measurement methods. You should establish a minimum percentage of coverage, to determine when to stop analyzing coverage. Coverage analysis is one of many testing techniques; you should not rely on it alone.
Code coverage analysis is sometimes called test coverage analysis. The two terms are synonymous. The academic world more often uses the term "test coverage" while practitioners more often use "code coverage". Likewise, a coverage analyzer is sometimes called a coverage monitor. I prefer the practitioner terms.

Structural Testing and Functional Testing

Code coverage analysis is a structural testing technique (AKA glass box testing and white box testing). Structural testing compares test program behavior against the apparent intention of the source code. This contrasts with functional testing (AKA black-box testing), which compares test program behavior against a requirements specification. Structural testing examines how the program works, taking into account possible pitfalls in the structure and logic. Functional testing examines what the program accomplishes, without regard to how it works internally.
Structural testing is also called path testing since you choose test cases that cause paths to be taken through the structure of the program. Do not confuse path testing with the path coverage metric, explained later.
At first glance, structural testing seems unsafe. Structural testing cannot find errors of omission. However, requirements specifications sometimes do not exist, and are rarely complete. This is especially true near the end of the product development time line when the requirements specification is updated less frequently and the product itself begins to take over the role of the specification. The difference between functional and structural testing blurs near release time.

The Premise

The basic assumptions behind coverage analysis tell us about the strengths and limitations of this testing technique. Some fundamental assumptions are listed below.
  • Bugs relate to control flow and you can expose bugs by varying the control flow [Beizer1990 p.60]. For example, a programmer wrote "if (c)" rather than "if (!c)".
  • You can look for failures without knowing what failures might occur and all tests are reliable, in that successful test runs imply program correctness [Morell1990]. The tester understands what a correct version of the program would do and can identify differences from the correct behavior.
  • Other assumptions include achievable specifications, no errors of omission, and no unreachable code.
Clearly, these assumptions do not always hold. Coverage analysis exposes some plausible bugs but does not come close to exposing all classes of bugs. Coverage analysis provides more benefit when applied to an application that makes a lot of decisions rather than data-centric applications, such as a database application.

Basic Metrics

A large variety of coverage metrics exist. This section contains a summary of some fundamental metrics and their strengths, weaknesses and issues.
The U.S. Department of Transportation Federal Aviation Administration (FAA) has formal requirements for structural coverage in the certification of safety-critical airborne systems [DO-178B]. Few other organizations have such requirements, so the FAA is influential in the definitions of these metrics.

Statement Coverage

This metric reports whether each executable statement is encountered. Declarative statements that generate executable code are considered executable statements. Control-flow statements, such as if, for, and switch are covered if the expression controlling the flow is covered as well as all the contained statements. Implicit statements, such as an omitted return, are not subject to statement coverage.
Also known as: line coverage, segment coverage [Ntafos1988], C1 [Beizer1990 p.75] and basic block coverage. Basic block coverage is the same as statement coverage except the unit of code measured is each sequence of non-branching statements.
I highly discourage using the non-descriptive name C1. People sometimes incorrectly use the name C1 to identify decision coverage. Therefore this term has become ambiguous.
The chief advantage of this metric is that it can be applied directly to object code and does not require processing source code. Performance profilers commonly implement this metric.
The chief disadvantage of statement coverage is that it is insensitive to some control structures. For example, consider the following C/C++ code fragment:
int* p = NULL;
if (condition)
    p = &variable;
*p = 123;
Without a test case that causes condition to evaluate false, statement coverage rates this code fully covered. In fact, if condition ever evaluates false, this code fails. This is the most serious shortcoming of statement coverage. If-statements are very common.
Statement coverage does not report whether loops reach their termination condition - only whether the loop body was executed. With C, C++, and Java, this limitation affects loops that contain break statements.
Since do-while loops always execute at least once, statement coverage considers them the same rank as non-branching statements.
Statement coverage is completely insensitive to the logical operators (|| and &&).
Statement coverage cannot distinguish consecutive switch labels.
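A short illustration of both blind spots (my own sketch, not from the original article): a single call such as total(1, true) executes every statement below and reports 100% statement coverage, yet the false outcome of the if is never taken, the || never needs to examine b, and the case 2 label is never actually selected.

public int total(int n, boolean a) {
    boolean b = (n < 0);
    int result = 0;
    if (a || b) {        // short-circuit: with a true, b is never examined here
        result += n;
    }
    switch (n) {
        case 1:
        case 2:          // consecutive labels share one statement block
            result *= 2;
            break;
    }
    return result;
}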
Test cases generally correlate more to decisions than to statements. You probably would not have 10 separate test cases for a sequence of 10 non-branching statements; you would have only one test case. For example, consider an if-else statement containing one statement in the then-clause and 99 statements in the else-clause. After exercising one of the two possible paths, statement coverage gives extreme results: either 1% or 99% coverage. Basic block coverage eliminates this problem.
One argument in favor of statement coverage over other metrics is that bugs are evenly distributed through code; therefore the percentage of executable statements covered reflects the percentage of faults discovered. However, one of our fundamental assumptions is that faults are related to control flow, not computations. Additionally, we could reasonably expect that programmers strive for a relatively constant ratio of branches to statements.
In summary, this metric is affected more by computational statements than by decisions.

Decision Coverage

This metric reports whether Boolean expressions tested in control structures (such as the if-statement and while-statement) evaluated to both true and false. The entire Boolean expression is considered one true-or-false predicate regardless of whether it contains logical-and or logical-or operators. Additionally, this metric includes coverage of switch-statement cases, exception handlers, and all points of entry and exit. Constant expressions controlling the flow are ignored.
Also known as: branch coverage, all-edges coverage [Roper1994 p.58], C2 [Beizer1990 p.75], decision-decision-path testing [Roper1994 p.39]. I discourage using the non-descriptive name C2 because of the confusion with the term C1.
The FAA makes a distinction between branch coverage and decision coverage, with branch coverage weaker than decision coverage [SVTAS2007]. The FAA definition of a decision is, in part, "A Boolean expression composed of conditions and zero or more Boolean operators." So the FAA definition of decision coverage requires all Boolean expressions to evaluate to both true and false, even those that do not affect control flow. There is no precise definition of "Boolean expression." Some languages, especially C, allow mixing integer and Boolean expressions and do not require Boolean variables be declared as Boolean. The FAA suggests using context to identify Boolean expressions, including whether expressions are used as operands to Boolean operators or tested to control flow. The suggested definition of "Boolean operator" is a built-in (not user-defined) operator with operands and result of Boolean type. The logical-not operator is exempted due to its simplicity. The C conditional operator (?:) is considered a Boolean operator if all three operands are Boolean expressions.
This metric has the advantage of simplicity without the problems of statement coverage.
A disadvantage is that this metric ignores branches within Boolean expressions which occur due to short-circuit operators. For example, consider the following C/C++/Java code fragment:
if (condition1 && (condition2 || function1()))
    statement1;
else
    statement2;
This metric could consider the control structure completely exercised without a call to function1. The test expression is true when condition1 is true and condition2 is true, and the test expression is false when condition1 is false. In this instance, the short-circuit operators preclude a call to function1.
The FAA suggests that for the purposes of measuring decision coverage, the operands of short-circuit operators (including the C conditional operator) be interpreted as decisions [SVTAS2007].

Condition Coverage

Condition coverage reports the true or false outcome of each condition. A condition is an operand of a logical operator that does not contain logical operators. Condition coverage measures the conditions independently of each other.
This metric is similar to decision coverage but has better sensitivity to the control flow.
However, full condition coverage does not guarantee full decision coverage. For example, consider the following C++/Java fragment.
bool f(bool e) { return false; }
bool a[2] = { false, false };
if (f(a && b)) ...
if (a[int(a && b)]) ...
if ((a && b) ? false : false) ...
All three of the if-statements above branch false regardless of the values of a and b. However if you exercise this code with a and b having all possible combinations of values, condition coverage reports full coverage.
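A simpler way to see the gap, using Java's non-short-circuiting & operator (my own sketch, not from the original paper): the two calls check(true, false) and check(false, true) give each condition both a true and a false outcome, so condition coverage is 100%, yet the decision as a whole is never true and its true branch goes untested.

public String check(boolean a, boolean b) {
    if (a & b) {         // & evaluates both operands, unlike &&
        return "both";   // never reached by the two test cases above
    }
    return "not both";
}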

Multiple Condition Coverage

Multiple condition coverage reports whether every possible combination of conditions occurs. The test cases required for full multiple condition coverage of a decision are given by the logical operator truth table for the decision.
For languages with short circuit operators such as C, C++, and Java, an advantage of multiple condition coverage is that it requires very thorough testing. For these languages, multiple condition coverage is very similar to condition coverage.
A disadvantage of this metric is that it can be tedious to determine the minimum set of test cases required, especially for very complex Boolean expressions. An additional disadvantage of this metric is that the number of test cases required could vary substantially among conditions that have similar complexity. For example, consider the following two C/C++/Java conditions.
a && b && (c || (d && e))
((a || b) && (c || d)) && e
To achieve full multiple condition coverage, the first condition requires 6 test cases while the second requires 11. Both conditions have the same number of operands and operators. The test cases are listed below.
   a && b && (c || (d && e))
1. F    -     -     -    -
2. T    F     -     -    -
3. T    T     F     F    -
4. T    T     F     T    F
5. T    T     F     T    T
6. T    T     T     -    -

    ((a || b) && (c || d)) && e
 1.   F    F      -    -      -
 2.   F    T      F    F      -  
 3.   F    T      F    T      F
 4.   F    T      F    T      T
 5.   F    T      T    -      F
 6.   F    T      T    -      T
 7.   T    -      F    F      -
 8.   T    -      F    T      F
 9.   T    -      F    T      T
10.   T    -      T    -      F
11.   T    -      T    -      T
As with condition coverage, multiple condition coverage does not include decision coverage.
For languages without short circuit operators such as Visual Basic and Pascal, multiple condition coverage is effectively path coverage (described below) for logical expressions, with the same advantages and disadvantages. Consider the following Visual Basic code fragment.
If a And b Then
...
Multiple condition coverage requires four test cases, for each of the combinations of a and b both true and false. As with path coverage each additional logical operator doubles the number of test cases required.

Condition/Decision Coverage

Condition/Decision Coverage is a hybrid metric composed by the union of condition coverage and decision coverage.
It has the advantage of simplicity but without the shortcomings of its component metrics.
BullseyeCoverage measures condition/decision coverage.

Modified Condition/Decision Coverage

The formal definition of modified condition/decision coverage is:
Every point of entry and exit in the program has been invoked at least once, every condition in a decision has taken all possible outcomes at least once, every decision in the program has taken all possible outcomes at least once, and each condition in a decision has been shown to independently affect that decision's outcome. A condition is shown to independently affect a decision's outcome by varying just that condition while holding fixed all other possible conditions [DO-178B].
Also known as MC/DC and MCDC. This metric is stronger than condition/decision coverage, requiring more test cases for full coverage.
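For example, for the decision (a && b), multiple condition coverage needs all four combinations, while MC/DC can be satisfied with three (a sketch of the usual unique-cause reasoning, ignoring for the moment the short-circuit subtleties discussed next):

// MC/DC test cases for the decision (a && b):
//   a=T, b=T -> true
//   a=T, b=F -> false  (differs from TT only in b, so b independently affects the outcome)
//   a=F, b=T -> false  (differs from TT only in a, so a independently affects the outcome)
boolean decision(boolean a, boolean b) {
    return a && b;
}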
This metric is specified for safety critical aviation software by RCTA/DO-178B and has been the subject of much study, debate and clarification for many years. Two difficult issues with MCDC are:
  • short circuit operators
  • multiple occurrences of a condition
There are two competing ideas of how to handle short-circuit operators. One idea is to relax the requirement that conditions be held constant if those conditions are not evaluated due to a short-circuit operator [Chilenski1994]. The other is to consider the condition operands of short-circuit operators as separate decisions [DO-248B].
A condition may occur more than once in a decision. In the expression "A or (not A and B)", the conditions "A" and "not A" are coupled - they cannot be varied independently as required by the definition of MCDC. One approach to this dilemma, called Unique Cause MCDC, is to interpret the term "condition" to mean "uncoupled condition." Another approach, called Masking MCDC, is to permit more than one condition to vary at once, using an analysis of the logic of the decision to ensure that only the condition of interest influences the outcome.

Path Coverage

This metric reports whether each of the possible paths in each function have been followed. A path is a unique sequence of branches from the function entry to the exit.
Also known as predicate coverage. Predicate coverage views paths as possible combinations of logical conditions [Beizer1990 p.98].
Since loops introduce an unbounded number of paths, this metric considers only a limited number of looping possibilities. A large number of variations of this metric exist to cope with loops. Boundary-interior path testing considers two possibilities for loops: zero repetitions and more than zero repetitions [Ntafos1988]. For do-while loops, the two possibilities are one iteration and more than one iteration.
Path coverage has the advantage of requiring very thorough testing. Path coverage has two severe disadvantages. The first is that the number of paths is exponential to the number of branches. For example, a function containing 10 if-statements has 1024 paths to test. Adding just one more if-statement doubles the count to 2048. The second disadvantage is that many paths are impossible to exercise due to relationships of data. For example, consider the following C/C++ code fragment:
if (success)
    statement1;
statement2;
if (success)
    statement3;
Path coverage considers this fragment to contain 4 paths. In fact, only two are feasible: success=false and success=true.
Researchers have invented many variations of path coverage to deal with the large number of paths. For example, n-length sub-path coverage reports whether you exercised each path of length n branches. Basis path testing selects paths that achieve decision coverage, with each path containing at least one decision outcome differing from the other paths [Roper1994 p.48]. Other variations include linear code sequence and jump (LCSAJ) coverage and data flow coverage.

Other Metrics

Here is a description of some variations of the fundamental metrics and some less commonly used metrics.

Function Coverage

This metric reports whether you invoked each function or procedure. It is useful during preliminary testing to assure at least some coverage in all areas of the software. Broad, shallow testing finds gross deficiencies in a test suite quickly.
BullseyeCoverage measures function coverage.

Call Coverage

This metric reports whether you executed each function call. The hypothesis is that bugs commonly occur in interfaces between modules.
Also known as call pair coverage.

Linear Code Sequence and Jump (LCSAJ) Coverage

This variation of path coverage considers only sub-paths that can easily be represented in the program source code, without requiring a flow graph [Woodward1980]. An LCSAJ is a sequence of source code lines executed in sequence. This "linear" sequence can contain decisions as long as the control flow actually continues from one line to the next at run-time. Sub-paths are constructed by concatenating LCSAJs. Researchers refer to the coverage ratio of paths of length n LCSAJs as the test effectiveness ratio (TER) n+2.
The advantage of this metric is that it is more thorough than decision coverage yet avoids the exponential difficulty of path coverage. The disadvantage is that it does not avoid infeasible paths.

Data Flow Coverage

This variation of path coverage considers only the sub-paths from variable assignments to subsequent references of the variables.
The advantage of this metric is the paths reported have direct relevance to the way the program handles data. One disadvantage is that this metric does not include decision coverage. Another disadvantage is complexity. Researchers have proposed numerous variations, all of which increase the complexity of this metric. For example, variations distinguish between the use of a variable in a computation versus a use in a decision, and between local and global variables. As with data flow analysis for code optimization, pointers also present problems.

Object Code Branch Coverage

This metric reports whether each machine language conditional branch instruction both took the branch and fell through.
This metric gives results that depend on the compiler rather than on the program structure since compiler code generation and optimization techniques can create object code that bears little similarity to the original source code structure.
Since branches disrupt the instruction pipeline, compilers sometimes avoid generating a branch and instead generate an equivalent sequence of non-branching instructions. Compilers often expand the body of a function inline to save the cost of a function call. If such functions contain branches, the number of machine language branches increases dramatically relative to the original source code.
You are better off testing the original source code since it relates to program requirements better than the object code.

Loop Coverage

This metric reports whether you executed each loop body zero times, exactly once, and more than once (consecutively). For do-while loops, loop coverage reports whether you executed the body exactly once, and more than once.
The valuable aspect of this metric is determining whether while-loops and for-loops execute more than once, information not reported by other metrics.
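A sketch of what the metric asks for on an ordinary while-loop (my own example):

public int sum(java.util.List<Integer> items) {
    int total = 0;
    int i = 0;
    while (i < items.size()) {   // loop coverage wants this body executed
        total += items.get(i);   // zero times (empty list), exactly once
        i++;                     // (one element), and more than once (several elements)
    }
    return total;
}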
As far as I know, only GCT implements this metric.

Race Coverage

This metric reports whether multiple threads execute the same code at the same time. It helps detect failure to synchronize access to resources. It is useful for testing multi-threaded programs such as in an operating system.
As far as I know, only GCT implements this metric.

Relational Operator Coverage

This metric reports whether boundary situations occur with relational operators (<, <=, >, >=). The hypothesis is that boundary test cases find off-by-one mistakes and uses of the wrong relational operators such as < instead of <=. For example, consider the following C/C++ code fragment:
if (a < b)
    statement;
Relational operator coverage reports whether the situation a==b occurs. If a==b occurs and the program behaves correctly, you can assume the relational operator is not supposed to be <=.
As far as I know, only GCT implements this metric.

Weak Mutation Coverage

This metric is similar to relational operator coverage but much more general [Howden1982]. It reports whether test cases occur which would expose the use of wrong operators and also wrong operands. It works by reporting coverage of conditions derived by substituting (mutating) the program's expressions with alternate operators, such as "-" substituted for "+", and with alternate variables substituted.
This metric interests the academic world mainly. Caveats are many; programs must meet special requirements to enable measurement.
As far as I know, only GCT implements this metric.

Table Coverage

This metric indicates whether each entry in a particular array has been referenced. This is useful for programs that are controlled by a finite state machine.

Comparing Metrics

You can compare relative strengths when a stronger metric includes a weaker metric; academia says the stronger metric subsumes the weaker one. Coverage metrics cannot be compared quantitatively.

Coverage Goal for Release

Each project must choose a minimum percent coverage for release criteria based on available testing resources and the importance of preventing post-release failures. Clearly, safety-critical software should have a high goal. You might set a higher coverage goal for unit testing than for system testing since a failure in lower-level code may affect multiple high-level callers.
Using statement coverage, decision coverage, or condition/decision coverage you generally want to attain 80%-90% coverage or more before releasing. Some people feel that setting any goal less than 100% coverage does not assure quality. However, you expend a lot of effort attaining coverage approaching 100%. The same effort might find more bugs in a different testing activity, such as formal technical review. Avoid setting a goal lower than 80%.

Intermediate Coverage Goals

Choosing good intermediate coverage goals can greatly increase testing productivity.
Your highest level of testing productivity occurs when you find the most failures with the least effort. Effort is measured by the time required to create test cases, add them to your test suite and run them. It follows that you should use a coverage analysis strategy that increases coverage as fast as possible. This gives you the greatest probability of finding failures sooner rather than later. Figure 1 illustrates the coverage rates for high and low productivity. Figure 2 shows the corresponding failure discovery rates.
(Figure 1: coverage rates for high- and low-productivity strategies. Figure 2: the corresponding failure discovery rates.)
One strategy that usually increases coverage quickly is to first attain some coverage throughout the entire test program before striving for high coverage in any particular area. By briefly visiting each of the test program features, you are likely to find obvious or gross failures early. For example, suppose your application prints several types of documents, and a bug exists which completely prevents printing one (and only one) of the document types. If you first try printing one document of each type, you probably find this bug sooner than if you thoroughly test each document type one at a time by printing many documents of that type before moving on to the next type. The idea is to first look for failures that are easily found by minimal testing.
The sequence of coverage goals listed below illustrates a possible implementation of this strategy.
  1. Invoke at least one function in 90% of the source files (or classes).
  2. Invoke 90% of the functions.
  3. Attain 90% condition/decision coverage in each function.
  4. Attain 100% condition/decision coverage.
Notice we do not require 100% coverage in any of the initial goals. This allows you to defer testing the most difficult areas. This is crucial to maintaining high testing productivity; achieve maximum results with minimum effort.
Avoid using a weaker metric for an intermediate goal combined with a stronger metric for your release goal. Effectively, this allows the weaknesses in the weaker metric to decide which test cases to defer. Instead, use the stronger metric for all goals and allow the difficulty of the individual test cases help you decide whether to defer them.


Friday, November 18, 2011

White Box Testing Technique

Data-Flow Analysis

Data-flow analysis can be used to increase program understanding and to develop test cases based on data flow within the program. The data-flow testing technique is based on investigating the ways values are associated with variables and the ways that these associations affect the execution of the program. Data-flow analysis focuses on occurrences of variables, following paths from the definition (or initialization) of a variable to its uses. The variable values may be used for computing values for defining other variables or used as predicate variables to decide whether a predicate is true for traversing a specific execution path. A data-flow analysis for an entire program involving all variables and traversing all usage paths requires immense computational resources; however, this technique can be applied for select variables. The simplest approach is to validate the usage of select sets of variables by executing a path that starts with definition and ends at uses of the definition. The path and the usage of the data can help in identifying suspicious code blocks and in developing test cases to validate the runtime behavior of the software. For example, for a chosen data definition-to-use path, with well-crafted test data, testing can uncover time-of-check-to-time-of-use (TOCTTOU) flaws. The "Security Testing" section in [Howard 02] explains the data mutation technique, which deals with perturbing environment data. The same technique can be applied to internal data as well, with the help of data-flow analysis.
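As a concrete, hypothetical illustration of a definition-to-use path and the TOCTTOU concern mentioned above (names are invented):

import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

public class AppendLog {
    public void append(String path, String message) throws IOException {
        File file = new File(path);      // definition: 'file' is derived from the input
        if (file.exists()) {             // predicate use (the "check")
            // The window between the check above and the write below is a classic
            // time-of-check-to-time-of-use (TOCTTOU) gap that data-flow-guided
            // test data can target.
            FileWriter out = new FileWriter(file, true);  // computational use (the "use")
            out.write(message);
            out.close();
        }
    }
}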

Code-Based Fault Injection

The fault injection technique perturbs program states by injecting software source code to force changes into the state of the program as it executes. Instrumentation is the process of non-intrusively inserting code into the software that is being analyzed and then compiling and executing the modified (or instrumented) software. Assertions are added to the code to raise a flag when a violation condition is encountered. This form of testing measures how software behaves when it is forced into anomalous circumstances. Basically this technique forces non-normative behavior of the software, and the resulting understanding can help determine whether a program has vulnerabilities that can lead to security violations. This technique can be used to force error conditions to exercise the error handling code, change execution paths, input unexpected (or abnormal) data, change return values, etc. In [Thompson 02], runtime fault injection is explained and advocated over code-based fault injection methods. One of the drawbacks of code based methods listed in the book is the lack of access to source code. However, in this content area, the assumptions are that source code is available and that the testers have the knowledge and expertise to understand the code for security implications. Refer to [Voas 98] for a detailed understanding of software fault injection concepts, methods, and tools.
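A minimal sketch of code-based fault injection (names invented): an injected flag forces a failure so the error-handling path can be exercised and asserted on.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ConfigReader {
    static boolean INJECT_READ_FAULT = false;   // instrumentation added only for testing

    static String readConfig(String file) {
        try {
            Path p = Paths.get(file);
            if (INJECT_READ_FAULT) {            // injected fault forces the anomalous state
                throw new IOException("injected fault");
            }
            return new String(Files.readAllBytes(p));
        } catch (IOException e) {
            return "";                          // error-handling behavior under test
        }
    }
}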

Abuse Cases

Abuse cases help security testers view the software under test in the same light as attackers do. Abuse cases capture the non-normative behavior of the system. While in [McGraw 04c] abuse cases are described more as a design analysis technique than as a white box testing technique, the same technique can be used to develop innovative and effective test cases mirroring the way attackers would view the system. With access to the source code, a tester is in a better position to quickly see where the weak spots are compared to an outside attacker. The abuse case can also be applied to interactions between components within the system to capture abnormal behavior, should a component misbehave. The technique can also be used to validate design decisions and assumptions. The simplest, most practical method for creating abuse cases is usually through a process of informed brainstorming, involving security, reliability, and subject matter expertise. Known attack patterns form a rich source for developing abuse cases.

Trust Boundaries Mapping

Defining zones of varying trust in an application helps identify vulnerable areas of communication and possible attack paths for security violations. Certain components of a system have trust relationships (sometimes implicit, sometimes explicit) with other parts of the system. Some of these trust relationships offer "trust elevation" possibilities—that is, these components can escalate trust privileges of a user when data or control flow cross internal boundaries from a region of less trust to a region of more trust [Hoglund 04]. For systems that have n-tier architecture or that rely on several third-party components, the potential for missing trust validation checks is high, so drawing trust boundaries becomes critical for such systems. Drawing clear boundaries of trust on component interactions and identifying data validation points (or chokepoints, as described in [Howard 02]) helps in validating those chokepoints and testing some of the design assumptions behind trust relationships. Combining trust zone mapping with data-flow analysis helps identify data that move from one trust zone to another and whether data checkpoints are sufficient to prevent trust elevation possibilities. This insight can be used to create effective test cases.

Code Coverage Analysis

Code coverage is an important type of test effectiveness measurement. Code coverage is a way of determining which code statements or paths have been exercised during testing. With respect to testing, coverage analysis helps in identifying areas of code not exercised by a set of test cases. Alternatively, coverage analysis can also help in identifying redundant test cases that do not increase coverage. During ad hoc testing (testing performed without adhering to any specific test approach or process), coverage analysis can greatly reduce the time to determine the code paths exercised and thus improve understanding of code behavior. There are various measures for coverage, such as path coverage, path testing, statement coverage, multiple condition coverage, and function coverage. When planning to use coverage analysis, establish the coverage measure and the minimum percentage of coverage required. Many tools are available for code coverage analysis. It is important to note that coverage analysis should be used to measure test coverage and should not be used to create tests. After performing coverage analysis, if certain code paths or statements were found to be not covered by the tests, the questions to ask are whether the code path should be covered and why the tests missed those paths. A risk-based approach should be employed to decide whether additional tests are required. Covering all the code paths or statements does not guarantee that the software does not have faults; however, the missed code paths or statements should definitely be inspected. One obvious risk is that unexercised code will include Trojan horse functionality, whereby seemingly innocuous code can carry out an attack. Less obvious (but more pervasive) is the risk that unexercised code has serious bugs that can be leveraged into a successful attack [McGraw 02].

Classes of Tests

Creating security tests other than ones that directly map to security specifications is challenging, especially tests that intend to exercise the non-normative or non-functional behavior of the system. When creating such tests, it is helpful to view the software under test from multiple angles, including the data the system is handling, the environment the system will be operating in, the users of the software (including software components), the options available to configure the system, and the error handling behavior of the system. There is an obvious interaction and overlap between the different views; however, treating each one with specific focus provides a unique perspective that is very helpful in developing effective tests.

Data

All input data should be untrusted until proven otherwise, and all data must be validated as it crosses the boundary between trusted and untrusted environments [Howard 02]. Data sensitivity/criticality plays a big role in data-based testing; however, this does not imply that other data can be ignored—non-sensitive data could allow a hacker to control a system. When creating tests, it is important to test and observe the validity of data at different points in the software. Tests based on data and data flow should explore incorrectly formed data and stress the size of the data. The section "Attacking with Data Mutation" in [Howard 02] describes different properties of data and how to mutate data based on given properties. To understand different attack patterns relevant to program input, refer to chapter six, "Crafting (Malicious) Input," in [Hoglund 04]. Tests should validate data from all channels, including web inputs, databases, networks, file systems, and environment variables. Risk analysis should guide the selection of tests and the data set to be exercised.

Fuzzing

Although normally associated exclusively with black box security testing, fuzzing can also provide value in a white box testing program. Specifically, [Howard 06] introduced the concept of “smart fuzzing.” Indeed, a rigorous testing program involving smart fuzzing can be quite similar to the sorts of data testing scenarios presented above and can produce useful and meaningful results as well. [Howard 06] claims that Microsoft finds some 20-25 percent of the bugs in their code via fuzzing techniques. Although much of that is no doubt “dumb” fuzzing in black box tests, “smart” fuzzing should also be strongly considered in a white box testing program.

Environment

Software can only be considered secure if it behaves securely under all operating environments. The environment includes other systems, users, hardware, resources, networks, etc. A common cause of software field failure is miscommunication between the software and its environment [Whittaker 02]. Understanding the environment in which the software operates, and the interactions between the software and its environment, helps in uncovering vulnerable areas of the system. Understanding dependency on external resources (memory, network bandwidth, databases, etc.) helps in exploring the behavior of the software under different stress conditions. Another common source of input to programs is environment variables. If the environment variables can be manipulated, then they can have security implications. Similar conditions occur for registry information, configuration files, and property files. In general, analyzing entities outside the direct control of the system provides good insights in developing tests to ensure the robustness of the software under test, given the dependencies.

Component Interfaces

Applications usually communicate with other software systems. Within an application, components interface with each other to provide services and exchange data. Common causes of failure at interfaces are misunderstanding of data usage, data lengths, data validation, assumptions, trust relationships, etc. Understanding the interfaces exposed by components is essential in exposing security bugs hidden in the interactions between components. The need for such understanding and testing becomes paramount when third-party software is used or when the source code is not available for a particular component. Another important benefit of understanding component interfaces is validation of principles of compartmentalization. The basic idea behind compartmentalization is to minimize the amount of damage that can be done to a system by breaking up the system into as few units as possible while still isolating code that has security privileges [McGraw 02]. Test cases can be developed to validate compartmentalization and to explore failure behavior of components in the event of security violations and how the failure affects other components.

Configuration

In many cases, software comes with various parameters set by default, possibly with no regard for security. Often, functional testing is performed only with the default settings, thus leaving sections of code related to non-default settings untested. Two main concerns with configuration parameters with respect to security are storing sensitive data in configuration files and configuration parameters changing the flow of execution paths. For example, user privileges, user roles, or user passwords are stored in the configuration files, which could be manipulated to elevate privilege, change roles, or access the system as a valid user. Configuration settings that change the path of execution could exercise vulnerable code sections that were not developed with security in mind. The change of flow also applies to cases where the settings are changed from one security level to another, where the code sections are developed with security in mind. For example, changing an endpoint from requiring authentication to not requiring authentication means the endpoint can be accessed by everyone. When a system has multiple configurable options, testing all combinations of configuration can be time consuming; however, with access to source code, a risk-based approach can help in selecting combinations that have higher probability in exposing security violations. In addition, coverage analysis should aid in determining gaps in test coverage of code paths.

Error handling

The most neglected code paths during the testing process are error handling routines. Error handling in this paper includes exception handling, error recovery, and fault tolerance routines. Functionality tests are normally geared towards validating requirements, which generally do not describe negative (or error) scenarios. Even when negative functional tests are created, they don’t test for non-normative behavior or extreme error conditions, which can have security implications. For example, functional stress testing is not performed with an objective to break the system to expose security vulnerability. Validating the error handling behavior of the system is critical during security testing, especially subjecting the system to unusual and unexpected error conditions. Unusual errors are those that have a low probability of occurrence during normal usage. Unexpected errors are those that are not explicitly specified in the design specification, and the developers did not think of handling the error. For example, a system call may throw an "unable to load library" error, which may not be explicitly listed in the design documentation as an error to be handled. All aspects of error handling should be verified and validated, including error propagation, error observability, and error recovery. Error propagation is how the errors are propagated through the call chain. Error observability is how the error is identified and what parameters are passed as error messages. Error recovery is getting back to a state conforming to specifications. For example, return codes for errors may not be checked, leading to uninitialized variables and garbage data in buffers; if the memory is manipulated before causing a failure, the uninitialized memory may contain attacker-supplied data. Another common mistake to look for is when sensitive information is included as part of the error messages.



Ad-hoc Testing

This type of testing is done without any formal Test Plan or Test Case creation. Ad-hoc testing helps in deciding the scope and duration of the various other types of testing, and it also helps testers learn the application prior to starting any other testing. It is the least formal method of testing.


One of the best uses of ad hoc testing is for discovery. Reading the requirements or specifications (if they exist) rarely gives you a good sense of how a program actually behaves. Even the user documentation may not capture the “look and feel” of a program. Ad hoc testing can find holes in your test strategy, and can expose relationships between subsystems that would otherwise not be apparent. In this way, it serves as a tool for checking the completeness of your testing. Missing cases can be found and added to your testing arsenal. Finding new tests in this way can also be a sign that you should perform root cause analysis.
Ask yourself or your test team, “What other tests of this class should we be running?” Defects found while doing ad hoc testing are often examples of entire classes of forgotten test cases. Another use for ad hoc testing is to determine the priorities for your other testing activities. In our example program, Panorama may allow the user to sort photographs that are being displayed. If ad hoc testing shows this to work well, the formal testing of this feature might be deferred until the problematic areas are completed. On the other hand, if ad hoc testing of this sorting photograph feature uncovers problems, then the formal testing might receive a higher priority.


Beta Testing

A product's beta is an officially released version of a product which includes most of the product's functionality. The beta version is intended for external testing of the product in order to identify configurations that cause problems, as well as collect requirements and suggestions from users.
Before its official release, a beta version ALWAYS undergoes a full cycle of internal testing, after which the application is sufficiently stable in the majority of computing environments.
A release notes file is supplied with each beta version. Release notes provide the following information:
  • the exact version number,
  • system and technical requirements for the equipment used for testing,
  • the list of changes since the previous version, and,
  • descriptions of known problems (if any) and other relevant information.
Please note that a beta version is NOT the final version of the product and therefore the developer does not guarantee an absence of errors that may disrupt the computer's operation and/or result in data loss.
Consequently, beta testers use the beta version at their own risk and Kaspersky Lab bears no responsibility for any consequences arising out of the use of the beta version.

Participating in beta testing enables you to:

  • be among the first to gain access to the latest versions of Kaspersky Lab solutions and share your opinion with us;
  • help Kaspersky Lab improve the quality of the product being tested;
  • provide your suggestions on possible ways of improving the product;
  • receive free technical support;
  • collaborate directly with developers and other beta testers using dedicated sections of our forum; and,
  • receive free versions of the product, which are awarded to the most active beta testers.

When taking part in the beta testing it is necessary to:

  • download and install the product you are interested in testing;
  • spend a certain amount of time on familiarizing yourself with the product and testing it;
  • prepare and send out bug reports on any errors found;
  • provide suggestions on ways to improve the product being tested; and,
  • report on compatibility issues (specifically related to your configuration).

Reporting problems:

  • To report an error, please provide a detailed description of the ways in which it manifests itself on your system, the steps which lead up to the error and characteristics of the hardware used for testing. Send your description to the email address specified on the page containing the description of the beta version you are testing.
  • You can also use this procedure to provide your suggestions on improving the product.


Alpha Testing

Before any software product can be released it must be tested. Typically a formal test strategy is planned and executed on the software before it can be considered for release. Often after the formal phases of testing have been completed, additional testing is performed called Alpha and Beta testing.
Alpha testing is done before the software is made available to the general public. Typically, the developers will perform the Alpha testing using white box testing techniques, with black box and grey box techniques carried out afterwards. The focus is on simulating real users by using these techniques and carrying out tasks and operations that a typical user might perform. Normally, the actual Alpha testing will be carried out in a lab-type environment rather than in the usual workplaces. Once these techniques have been satisfactorily completed, the Alpha testing is considered to be complete.
The next phase of testing is known as Beta testing. Unlike Alpha testing, people outside of the company are included in the testing. As the aim is to perform a sanity check before the product's release, there may be defects found during this stage, so the distribution of the software is limited to a selection of users outside of the company. Typically, outsourced testing companies are used because their feedback is independent and comes from a different perspective than that of the software development company's employees. The feedback can be used to fix defects that were missed, assist in preparing support teams for expected issues, or, in some cases, even enforce last-minute changes to functionality.
In some cases, the Beta version of software will be made available to the general public. This can give vital 'real-world' information for software/systems that rely on acceptable performance and load to function correctly.
The techniques used during a public Beta test are typically restricted to black box techniques. This is because the general public does not have inside knowledge of the code under test, and because the aim of a Beta test is usually a sanity check combined with gathering feedback on how customers will use the product in the real world.
Various sectors of the public are often eager to take part in Beta testing, as it gives them the opportunity to see and use products before their public release. Many companies use this phase of testing to assist with marketing their product. For example, Beta versions of a software application get people using the product and talking about it, which (if the application is any good) builds hype and pre-orders before its public release.

Read more »

Thursday, November 17, 2011

Entry and Exit Criteria

The Entrance Criteria specified by the System Test Controller should be fulfilled before System Test can commence. In the event that any criterion has not been achieved, System Test may commence only if the Business Team and Test Controller are in full agreement that the risk is manageable.

* All developed code must be unit tested. Unit and Link Testing must be completed and signed off by the development team.
* System Test plans must be signed off by the Business Analyst and Test Controller.
* All human resources must be assigned and in place.
* All test hardware and environments must be in place and free for System Test use.
* The Acceptance Tests must be completed, with a pass rate of not less than 80%.

The Exit Criteria detailed below must be achieved before the Phase 1 software can be recommended for promotion to Operations Acceptance status. Furthermore, I recommend a minimum of two days' effort of Final Integration testing AFTER the final fix/change has been retested.

* All high-priority errors from System Test must be fixed and tested.
* If any medium- or low-priority errors are outstanding, the implementation risk must be signed off as acceptable by the Business Analyst and Business Expert.
* Project Integration Test must be signed off by Test Controller and Business Analyst.
* Business Acceptance Test must be signed off by Business Expert.

Read more »

Wednesday, November 16, 2011

Visual testing

Visual testing

The aim of visual testing is to provide developers with the ability to examine what was happening at the point of software failure by presenting the data in such a way that the developer can easily find the information he requires, and the information is expressed clearly.
At the core of visual testing is the idea that showing someone a problem (or a test failure), rather than just describing it, greatly increases clarity and understanding. Visual testing therefore requires the recording of the entire test process – capturing everything that occurs on the test system in video format. Output videos are supplemented by real-time tester input via picture-in-a-picture webcam and audio commentary from microphones.
Visual testing provides a number of advantages. The quality of communication increases dramatically because testers can show the problem (and the events leading up to it) to the developer, as opposed to just describing it, and in many cases the need to replicate test failures disappears. The developer will have all the evidence he requires of a test failure and can instead focus on the cause of the fault and how it should be fixed.
Visual testing is particularly well-suited for environments that deploy agile methods in their development of software, since agile methods require greater communication between testers and developers and collaboration within small teams.
Ad hoc testing and exploratory testing are important methodologies for checking software integrity, because they require less preparation time to implement, whilst important bugs can be found quickly. In ad hoc testing, where testing takes place in an improvised, impromptu way, the ability of a test tool to visually record everything that occurs on a system becomes very important.
Visual testing is gathering recognition in customer acceptance and usability testing, because the test can be used by many individuals involved in the development process. For the customer, it becomes easy to provide detailed bug reports and feedback, and for program users, visual testing can record user actions on screen, as well as their voice and image, to provide the developer with a complete picture at the time of software failure.

Read more »

Tuesday, November 15, 2011

Cyclomatic Complexity

Software metrics often receive negative criticism, as they are viewed as an exact science, uniformly applicable to all scenarios. True, software metrics are an unbiased, objective measurement of a particular aspect of code; however, a particular metric's applicability to a domain is usually subjective. For instance, highly coupled code may have been intentionally designed that way for performance reasons; consequently, a coupling metric that suggests problems with this code must be evaluated in the context of the overall application.
With this in mind, applying various software metrics to a code base can be an effective overall gauge of software quality. One such metric, cyclomatic complexity, can be helpful in ascertaining areas of code that may require additional attention to head off future maintenance issues. That attention, moreover, can take the form of unit testing and refactoring.
Pioneered in the 1970s by Thomas McCabe of McCabe & Associates fame, cyclomatic complexity essentially represents the number of paths through a particular section of code, which in object-oriented languages applies to methods. Cyclomatic complexity's formal academic equation from graph theory is as follows:
CC = E - N + P
where E represents the number of edges on a graph, N the number of nodes, and P the number of connected components.
If you've already given up on software metrics after reading that equation, there is an easier equation that will make sense. Cyclomatic complexity can be explained in layman's terms as follows: every decision point in a method (i.e., an if, for, while, or case statement) is counted; additionally, one is added for the method's entry point, resulting in an integer-based measurement denoting a method's complexity.
For instance, the following uncomplicated method yields a cyclomatic complexity value of 2.
public int getValue(int param1) {
  int value = 0;
  if (param1 == 0)  {
    value = 4;
  } else {
    value = 0;
  }
  return value;      
}
In the above method, there is one decision point: the if statement (the else is simply the alternative outcome of that decision, not a separate decision). Remembering that a method's entry point automatically adds one, the final value equals 2.
Indeed, the example method is quite simple; however, one can imagine that if there were 10 additional decision points (yielding a cyclomatic complexity value of 12), the method might be perceived as complex. While opinions as to what constitutes code complexity are quite subjective, over the years the software industry has largely agreed that highly complex code is difficult for engineers to understand and therefore harder to maintain. Moreover, highly complex code has a higher probability of containing defects.
Various authors and studies have, in fact, suggested that a cyclomatic complexity value of 10 or higher for a particular method is considered complex. This is the key point. By determining the cyclomatic complexity of various methods found in objects and paying attention to outlier values, one can uncover code that most likely should become a candidate for a highly focused unit testing/refactoring effort.

Determining Cyclomatic Complexity

There are various commercial and open source software metrics tools on the market that can examine source code and report on the gamut of software metrics. One such tool, which happens to be open source, is PMD.
Running PMD against a code base is quite easy (see this article by Tom Copeland). Upon examining the code, PMD will produce a report in various forms, such as XML and HTML, that describes the various customizable infractions found.
For example, running PMD against a code base may produce a report that contains the following snippet of XML:
<file name="C:\com\dgg\web\AnomalyAction.java">
<violation line="1" 
           rule="CouplingBetweenObjectsRule">
A value of 27 may denote a high amount of coupling 
within the class
</violation>
<violation line="103" 
           rule="CyclomaticComplexityRule">
The method 'updateAnomaly' has a 
Cyclomatic Complexity of 22.
</violation>
</file>
As the XML demonstrates, AnomalyAction.java has a method, updateAnomaly, which has a rather high value of cyclomatic complexity. Interestingly enough, this class also appears to be quite coupled to various other objects within the code base, which only serves to complicate the code further. After viewing this report, red flags should start popping up, as clearly the AnomalyAction object's quality and relative stability have been called into question.
Software metrics, however, need to be evaluated in the context of the application against which they are being applied. While some red flags have now been raised, are there others? Reading the entire report may reveal even more egregiously high values of cyclomatic complexity. Additionally, it could turn out that every class examined possesses a high cyclomatic complexity value. The key to effective utilization of cyclomatic complexity and other metrics is to look for the outliers; i.e., those whose complexity is much higher than the rest of the code.
The outliers indicate areas where one should focus. If there is a uniform pattern of complexity throughout a code base, then deciding where to make repairs becomes significantly more challenging. In this situation, other metrics, such as CM delta rates, can help indicate areas of needed attention.
Interestingly enough, there are multiple practices for discovering outliers. Tools like PMD report on individual methods' cyclomatic complexities. Other tools may report an object's aggregate cyclomatic complexity, or even a package's collective value. Unfortunately, aggregating cyclomatic complexity is rarely useful. As every concrete method found in an object will yield a minimum value of 1, any object following a JavaBean-like pattern (entity beans, DAOs, value objects, etc.) may produce a high cyclomatic complexity value. Large classes containing many methods may also produce elevated cyclomatic complexity scores.

Taking Action

When the determination has been made that a class is, in fact, complex, there are two steps available to mitigate the associated risk: unit testing, followed by refactoring.
As the cyclomatic complexity of a method is fundamentally the number of paths contained in that method, a general rule of thumb states that in order to ensure a high level of test coverage, the number of test cases for a method should equal the method's cyclomatic complexity. Unfortunately, this bold statement often is ignored.
Indeed, writing 22 unique test cases for updateAnomaly, while a noble goal, will probably not happen. Time constraints, developer attention spans, caffeine deprivation, etc. all work against even the most well-intentioned quality plans. While 22 test cases for updateAnomaly may be unattainable, surely zero should also be cause for concern.
As before, common sense needs to be applied. A further examination of the code may reveal that five test cases can effectively exercise all paths of the code under focus. Conversely, it may be determined that most of the paths in the code under focus are rarely executed edge cases; therefore, fewer test cases may suffice (for now). At this point in the process, one's tolerance for risk must be applied.
If five test cases can indeed assist in lowering the risk of the method's complexity, the next question becomes, "What is the most effective method for building the associated test cases?" By far, the most useful manner is unit testing with a framework like JUnit. White-box unit testing is close to the code and ensures that the code under test fulfills its fundamental responsibilities at the lowest possible level. Unit testing is also quite easy as one can implement unit tests before and after actually writing the code. Higher-level testing usually involves other aspects of a system, such as other packages, objects, databases, containers, etc., that introduce dependencies and complexities that ultimately make it harder to effectively address low-level code details.
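As a minimal sketch (JUnit 4 is assumed, and the enclosing Sample class is invented here purely so the example is self-contained), the two independent paths through the earlier getValue() method map directly onto two test cases:

import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class GetValueTest {

  // Hypothetical holder for the getValue() method shown earlier; in real code
  // the method would live in the production class under test.
  static class Sample {
    public int getValue(int param1) {
      int value = 0;
      if (param1 == 0) {
        value = 4;
      } else {
        value = 0;
      }
      return value;
    }
  }

  // A cyclomatic complexity of 2 suggests two test cases,
  // one per independent path through the method.

  @Test
  public void zeroInputTakesTheIfBranch() {
    assertEquals(4, new Sample().getValue(0));
  }

  @Test
  public void nonZeroInputTakesTheElseBranch() {
    assertEquals(0, new Sample().getValue(7));
  }
}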

Refactoring

Unit testing helps mitigate risk by instilling developer confidence and facilitating rapid code change; however, unit testing will not make the code under test less complex. To simplify complex code, one must surgically remove the knots via refactoring. Be warned, however, that well-written unit tests are required before one attempts to refactor code.
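To make the idea concrete, here is a purely illustrative sketch (the method names and tier labels are invented, not taken from any code discussed above) of extracting decision-laden fragments into small, well-named methods, so that no single method carries all of the complexity and each piece can be unit tested on its own:

// Before: nested decisions concentrated in one method.
public String classify(int score, boolean isMember) {
  String result;
  if (score > 90) {
    if (isMember) {
      result = "gold";
    } else {
      result = "silver";
    }
  } else {
    if (isMember) {
      result = "bronze";
    } else {
      result = "standard";
    }
  }
  return result;
}

// After: each decision lives in a small, individually testable method.
public String classify(int score, boolean isMember) {
  return score > 90 ? highTier(isMember) : lowTier(isMember);
}

private String highTier(boolean isMember) {
  return isMember ? "gold" : "silver";
}

private String lowTier(boolean isMember) {
  return isMember ? "bronze" : "standard";
}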

Read more »

Error Guessing

Error guessing is a testing technique used to create test cases aimed at finding bugs. It works best when the person creating the test cases has prior knowledge of testing, of the business domain, or of the application itself. In error guessing, the tester uses intuition and experience to work out the situations in which the software is most likely to fail. It is an ad hoc method for identifying bugs that depends on the tester's insight and gut feeling; there are no dedicated tools for it, and it has no explicit rules.
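As a sketch only (AgeParser and its parse() method are hypothetical names invented for this illustration, and JUnit 4 is assumed), error guessing typically turns experience into a handful of tests that feed historically failure-prone inputs into a routine, such as null, empty, non-numeric, and out-of-range values:

import org.junit.Test;

public class AgeParserErrorGuessingTest {

  // Hypothetical class under test, stubbed here only so the sketch compiles.
  static class AgeParser {
    static int parse(String text) {
      if (text == null || text.trim().isEmpty()) {
        throw new IllegalArgumentException("age is required");
      }
      int age = Integer.parseInt(text.trim()); // NumberFormatException extends IllegalArgumentException
      if (age < 0) {
        throw new IllegalArgumentException("age cannot be negative");
      }
      return age;
    }
  }

  // Inputs chosen purely from experience of where input-handling code tends to fail.

  @Test(expected = IllegalArgumentException.class)
  public void rejectsNullInput() {
    AgeParser.parse(null);
  }

  @Test(expected = IllegalArgumentException.class)
  public void rejectsEmptyInput() {
    AgeParser.parse("");
  }

  @Test(expected = IllegalArgumentException.class)
  public void rejectsNonNumericInput() {
    AgeParser.parse("forty two");
  }

  @Test(expected = IllegalArgumentException.class)
  public void rejectsNegativeAge() {
    AgeParser.parse("-1");
  }
}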

Read more »

Boundary Value Analysis

Boundary Value Analysis is a Blackbox Testing Technique. It makes use of the fact that the inputs and outputs of the component under test can be partitioned into ordered sets with identifiable boundaries. Values in the same set will be treated in the same way. Test values are chosen that are just inside, on and just outside the boundaries.

For example, suppose an application collects some data about a traveller using the dialog box shown in the diagram. When the OK button is pressed the component calculates a fare from the current location using the input values.

The Discount Table

Age                  Discount
0-4 years            100%
5-15 years           50%
16-64 years          0%
65 years and older   25%
There is a standard fare to each destination. Our travel service offers discounts to travellers based on their age. For example, children under 5 travel free and those aged 65 and over get a 25% discount.
We can use boundary value analysis on the age field of our fare calculation component. We can partition the age input data into ordered sets using the data in the Discount Table. This gives us the following sets:
  • 0, 1, 2, 3, 4.
  • 5, 6, 7, …15.
  • 16, 17, 18, …64.
  • 65, 66, 67, …120.
  • Ages greater than 120 years.
It is often useful to draw these partitions as adjacent ranges on a number line.
Boundary value analysis requires us to identify the set boundaries. In our example, the boundaries are at ages 0, 5, 15, 65 and 120. We then have to test values on, and on either side of, each boundary. So we test the component with ages:
  • -1, 0, 1.
  • 4, 5, 6.
  • 14, 15, 16.
  • 64, 65, 66.
  • 119, 120, 121.
The component handles all values in the name field in the same way. Because every value is treated identically, it is not possible to separate the name field values into ordered sets, so boundary value analysis cannot be performed on the name field.
Our application does treat different values in the destination field differently, based on region. However, it is not possible to identify boundary values for destination regions. Consequently, we are unable to perform boundary value analysis on the destination field.
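As an illustrative sketch only (FareCalculator and discountFor() are invented names standing in for the fare component described above, and JUnit 4 is assumed), the boundary values identified for the age field translate directly into test cases:

import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class AgeBoundaryTest {

  // Hypothetical component under test, stubbed here so the sketch compiles;
  // the discount percentages follow the Discount Table above.
  static class FareCalculator {
    int discountFor(int age) {
      if (age < 0 || age > 120) {
        throw new IllegalArgumentException("age out of range: " + age);
      }
      if (age <= 4) {
        return 100;
      }
      if (age <= 15) {
        return 50;
      }
      if (age <= 64) {
        return 0;
      }
      return 25;
    }
  }

  private final FareCalculator calculator = new FareCalculator();

  // Values on, and on either side of, each partition boundary.

  @Test(expected = IllegalArgumentException.class)
  public void rejectsAgeBelowLowerBoundary() {
    calculator.discountFor(-1);
  }

  @Test
  public void boundariesAroundFreeTravel() {
    assertEquals(100, calculator.discountFor(0));
    assertEquals(100, calculator.discountFor(4));
    assertEquals(50, calculator.discountFor(5));
  }

  @Test
  public void boundariesAroundChildDiscount() {
    assertEquals(50, calculator.discountFor(15));
    assertEquals(0, calculator.discountFor(16));
  }

  @Test
  public void boundariesAroundSeniorDiscount() {
    assertEquals(0, calculator.discountFor(64));
    assertEquals(25, calculator.discountFor(65));
  }

  @Test
  public void boundariesAroundUpperAgeLimit() {
    assertEquals(25, calculator.discountFor(119));
    assertEquals(25, calculator.discountFor(120));
  }

  @Test(expected = IllegalArgumentException.class)
  public void rejectsAgeAboveUpperBoundary() {
    calculator.discountFor(121);
  }
}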

Read more »

Testing Techniques

Testing Techniques

Testing techniques refer to the different methods used to test particular features of a computer program, system, or product. Each testing type has its own testing techniques, while some techniques combine features of both types.

Black box testing techniques:

* Graph Based Testing Methods
* Error Guessing
* Boundary Value analysis
* Equivalence partitioning
* Comparison Testing
* Orthogonal Array Testing

White box testing techniques:

* Basis Path Testing
* Flow Graph Notation
* Cyclomatic Complexity
* Graph Matrices
* Control Structure Testing
* Loop Testing

Difference between Testing Types and Testing Techniques?

Testing types deal with what aspect of the computer software would be tested, while testing techniques deal with how a specific part of the software would be tested.

That is, testing types mean whether we are testing the function or the structure of the software. In other words, we may test each function of the software to see if it is operational, or we may test the internal components of the software to check whether its internal workings conform to the specification.

On the other hand, a ‘testing technique’ refers to the methods, approaches, or calculations applied to test a particular feature of the software (sometimes we test the interfaces, sometimes the segments, sometimes the loops, and so on).

Read more »

Bug Report


Anybody who has written software for public use will probably have received at least one bad bug report. Reports that say nothing ("It doesn't work!"); reports that make no sense; reports that don't give enough information; reports that give wrong information. Reports of problems that turn out to be user error; reports of problems that turn out to be the fault of somebody else's program; reports of problems that turn out to be network failures.
There's a reason why technical support is seen as a horrible job to be in, and that reason is bad bug reports. However, not all bug reports are unpleasant: I maintain free software when I'm not earning my living, and sometimes I receive wonderfully clear, helpful, informative bug reports.
In this essay I'll try to state clearly what makes a good bug report. Ideally I would like everybody in the world to read this essay before reporting any bugs to anybody. Certainly I would like everybody who reports bugs to me to have read it.
In a nutshell, the aim of a bug report is to enable the programmer to see the program failing in front of them. You can either show them in person, or give them careful and detailed instructions on how to make it fail. If they can make it fail, they will try to gather extra information until they know the cause. If they can't make it fail, they will have to ask you to gather that information for them.
In bug reports, try to make very clear what are actual facts ("I was at the computer and this happened") and what are speculations ("I think the problem might be this"). Leave out speculations if you want to, but don't leave out facts.
When you report a bug, you are doing so because you want the bug fixed. There is no point in swearing at the programmer or being deliberately unhelpful: it may be their fault and your problem, and you might be right to be angry with them, but the bug will get fixed faster if you help them by supplying all the information they need. Remember also that if the program is free, then the author is providing it out of kindness, so if too many people are rude to them then they may stop feeling kind.

"It doesn't work."

Give the programmer some credit for basic intelligence: if the program really didn't work at all, they would probably have noticed. Since they haven't noticed, it must be working for them. Therefore, either you are doing something differently from them, or your environment is different from theirs. They need information; providing this information is the purpose of a bug report. More information is almost always better than less.
Many programs, particularly free ones, publish their list of known bugs. If you can find a list of known bugs, it's worth reading it to see if the bug you've just found is already known or not. If it's already known, it probably isn't worth reporting again, but if you think you have more information than the report in the bug list, you might want to contact the programmer anyway. They might be able to fix the bug more easily if you can give them information they didn't already have.
This essay is full of guidelines. None of them is an absolute rule. Particular programmers have particular ways they like bugs to be reported. If the program comes with its own set of bug-reporting guidelines, read them. If the guidelines that come with the program contradict the guidelines in this essay, follow the ones that come with the program!
If you are not reporting a bug but just asking for help using the program, you should state where you have already looked for the answer to your question. ("I looked in chapter 4 and section 5.2 but couldn't find anything that told me if this is possible.") This will let the programmer know where people will expect to find the answer, so they can make the documentation easier to use.

"Show me."

One of the very best ways you can report a bug is by showing it to the programmer. Stand them in front of your computer, fire up their software, and demonstrate the thing that goes wrong. Let them watch you start the machine, watch you run the software, watch how you interact with the software, and watch what the software does in response to your inputs.
They know that software like the back of their hand. They know which parts they trust, and they know which parts are likely to have faults. They know intuitively what to watch for. By the time the software does something obviously wrong, they may well have already noticed something subtly wrong earlier which might give them a clue. They can observe everything the computer does during the test run, and they can pick out the important bits for themselves.
This may not be enough. They may decide they need more information, and ask you to show them the same thing again. They may ask you to talk them through the procedure, so that they can reproduce the bug for themselves as many times as they want. They might try varying the procedure a few times, to see whether the problem occurs in only one case or in a family of related cases. If you're unlucky, they may need to sit down for a couple of hours with a set of development tools and really start investigating. But the most important thing is to have the programmer looking at the computer when it goes wrong. Once they can see the problem happening, they can usually take it from there and start trying to fix it.

"Show me how to show myself."

This is the era of the Internet. This is the era of worldwide communication. This is the era in which I can send my software to somebody in Russia at the touch of a button, and he can send me comments about it just as easily. But if he has a problem with my program, he can't have me standing in front of it while it fails. "Show me" is good when you can, but often you can't.
If you have to report a bug to a programmer who can't be present in person, the aim of the exercise is to enable them to reproduce the problem. You want the programmer to run their own copy of the program, do the same things to it, and make it fail in the same way. When they can see the problem happening in front of their eyes, then they can deal with it.
So tell them exactly what you did. If it's a graphical program, tell them which buttons you pressed and what order you pressed them in. If it's a program you run by typing a command, show them precisely what command you typed. Wherever possible, you should provide a verbatim transcript of the session, showing what commands you typed and what the computer output in response.
Give the programmer all the input you can think of. If the program reads from a file, you will probably need to send a copy of the file. If the program talks to another computer over a network, you probably can't send a copy of that computer, but you can at least say what kind of computer it is, and (if you can) what software is running on it.

"Works for me. So what goes wrong?"

If you give the programmer a long list of inputs and actions, and they fire up their own copy of the program and nothing goes wrong, then you haven't given them enough information. Possibly the fault doesn't show up on every computer; your system and theirs may differ in some way. Possibly you have misunderstood what the program is supposed to do, and you are both looking at exactly the same display but you think it's wrong and they know it's right.
So also describe what happened. Tell them exactly what you saw. Tell them why you think what you saw is wrong; better still, tell them exactly what you expected to see. If you say "and then it went wrong", you have left out some very important information.
If you saw error messages then tell the programmer, carefully and precisely, what they were. They are important! At this stage, the programmer is not trying to fix the problem: they're just trying to find it. They need to know what has gone wrong, and those error messages are the computer's best effort to tell you that. Write the errors down if you have no other easy way to remember them, but it's not worth reporting that the program generated an error unless you can also report what the error message was.
In particular, if the error message has numbers in it, do let the programmer have those numbers. Just because you can't see any meaning in them doesn't mean there isn't any. Numbers contain all kinds of information that can be read by programmers, and they are likely to contain vital clues. Numbers in error messages are there because the computer is too confused to report the error in words, but is doing the best it can to get the important information to you somehow.
At this stage, the programmer is effectively doing detective work. They don't know what's happened, and they can't get close enough to watch it happening for themselves, so they are searching for clues that might give it away. Error messages, incomprehensible strings of numbers, and even unexplained delays are all just as important as fingerprints at the scene of a crime. Keep them!
If you are using Unix, the program may have produced a core dump. Core dumps are a particularly good source of clues, so don't throw them away. On the other hand, most programmers don't like to receive huge core files by e-mail without warning, so ask before mailing one to anybody. Also, be aware that the core file contains a record of the complete state of the program: any "secrets" involved (maybe the program was handling a personal message, or dealing with confidential data) may be contained in the core file.

"So then I tried . . ."

There are a lot of things you might do when an error or bug comes up. Many of them make the problem worse. A friend of mine at school deleted all her Word documents by mistake, and before calling in any expert help, she tried reinstalling Word, and then she tried running Defrag. Neither of these helped recover her files, and between them they scrambled her disk to the extent that no Undelete program in the world would have been able to recover anything. If she'd only left it alone, she might have had a chance.
Users like this are like a mongoose backed into a corner: with its back to the wall and seeing certain death staring it in the face, it attacks frantically, because doing something has to be better than doing nothing. This is not well adapted to the type of problems computers produce.
Instead of being a mongoose, be an antelope. When an antelope is confronted with something unexpected or frightening, it freezes. It stays absolutely still and tries not to attract any attention, while it stops and thinks and works out the best thing to do. (If antelopes had a technical support line, it would be telephoning it at this point.) Then, once it has decided what the safest thing to do is, it does it.
When something goes wrong, immediately stop doing anything. Don't touch any buttons at all. Look at the screen and notice everything out of the ordinary, and remember it or write it down. Then perhaps start cautiously pressing "OK" or "Cancel", whichever seems safest. Try to develop a reflex reaction - if a computer does anything unexpected, freeze.
If you manage to get out of the problem, whether by closing down the affected program or by rebooting the computer, a good thing to do is to try to make it happen again. Programmers like problems that they can reproduce more than once. Happy programmers fix bugs faster and more efficiently.

"I think the tachyon modulation must be wrongly polarised."

It isn't only non-programmers who produce bad bug reports. Some of the worst bug reports I've ever seen come from programmers, and even from good programmers.
I worked with another programmer once, who kept finding bugs in his own code and trying to fix them. Every so often he'd hit a bug he couldn't solve, and he'd call me over to help. "What's gone wrong?" I'd ask. He would reply by telling me his current opinion of what needed to be fixed.
This worked fine when his current opinion was right. It meant he'd already done half the work and we were able to finish the job together. It was efficient and useful.
But quite often he was wrong. We would work for some time trying to figure out why some particular part of the program was producing incorrect data, and eventually we would discover that it wasn't, that we'd been investigating a perfectly good piece of code for half an hour, and that the actual problem was somewhere else.
I'm sure he wouldn't do that to a doctor. "Doctor, I need a prescription for Hydroyoyodyne." People know not to say that to a doctor: you describe the symptoms, the actual discomforts and aches and pains and rashes and fevers, and you let the doctor do the diagnosis of what the problem is and what to do about it. Otherwise the doctor dismisses you as a hypochondriac or crackpot, and quite rightly so.
It's the same with programmers. Providing your own diagnosis might be helpful sometimes, but always state the symptoms. The diagnosis is an optional extra, and not an alternative to giving the symptoms. Equally, sending a modification to the code to fix the problem is a useful addition to a bug report but not an adequate substitute for one.
If a programmer asks you for extra information, don't make it up! Somebody reported a bug to me once, and I asked him to try a command that I knew wouldn't work. The reason I asked him to try it was that I wanted to know which of two different error messages it would give. Knowing which error message came back would give a vital clue. But he didn't actually try it - he just mailed me back and said "No, that won't work". It took me some time to persuade him to try it for real.
Using your intelligence to help the programmer is fine. Even if your deductions are wrong, the programmer should be grateful that you at least tried to make their life easier. But report the symptoms as well, or you may well make their life much more difficult instead.

"That's funny, it did it a moment ago."

Say "intermittent fault" to any programmer and watch their face fall. The easy problems are the ones where performing a simple sequence of actions will cause the failure to occur. The programmer can then repeat those actions under closely observed test conditions and watch what happens in great detail. Too many problems simply don't work that way: there will be programs that fail once a week, or fail once in a blue moon, or never fail when you try them in front of the programmer but always fail when you have a deadline coming up.
Most intermittent faults are not truly intermittent. Most of them have some logic somewhere. Some might occur when the machine is running out of memory, some might occur when another program tries to modify a critical file at the wrong moment, and some might occur only in the first half of every hour! (I've actually seen one of these.)
Also, if you can reproduce the bug but the programmer can't, it could very well be that their computer and your computer are different in some way and this difference is causing the problem. I had a program once whose window curled up into a little ball in the top left corner of the screen, and sat there and sulked. But it only did it on 800x600 screens; it was fine on my 1024x768 monitor.
The programmer will want to know anything you can find out about the problem. Try it on another machine, perhaps. Try it twice or three times and see how often it fails. If it goes wrong when you're doing serious work but not when you're trying to demonstrate it, it might be long running times or large files that make it fall over. Try to remember as much detail as you can about what you were doing to it when it did fall over, and if you see any patterns, mention them. Anything you can provide has to be some help. Even if it's only probabilistic (such as "it tends to crash more often when Emacs is running"), it might not provide direct clues to the cause of the problem, but it might help the programmer reproduce it.
Most importantly, the programmer will want to be sure of whether they're dealing with a true intermittent fault or a machine-specific fault. They will want to know lots of details about your computer, so they can work out how it differs from theirs. A lot of these details will depend on the particular program, but one thing you should definitely be ready to provide is version numbers. The version number of the program itself, and the version number of the operating system, and probably the version numbers of any other programs that are involved in the problem.

"So I loaded the disk on to my Windows . . ."

Writing clearly is essential in a bug report. If the programmer can't tell what you meant, you might as well not have said anything.
I get bug reports from all around the world. Many of them are from non-native English speakers, and a lot of those apologise for their poor English. In general, the bug reports with apologies for their poor English are actually very clear and useful. All the most unclear reports come from native English speakers who assume that I will understand them even if they don't make any effort to be clear or precise.
  • Be specific. If you can do the same thing two different ways, state which one you used. "I selected Load" might mean "I clicked on Load" or "I pressed Alt-L". Say which you did. Sometimes it matters.
  • Be verbose. Give more information rather than less. If you say too much, the programmer can ignore some of it. If you say too little, they have to come back and ask more questions. One bug report I received was a single sentence; every time I asked for more information, the reporter would reply with another single sentence. It took me several weeks to get a useful amount of information, because it turned up one short sentence at a time.
  • Be careful of pronouns. Don't use words like "it", or references like "the window", when it's unclear what they mean. Consider this: "I started FooApp. It put up a warning window. I tried to close it and it crashed." It isn't clear what the user tried to close. Did they try to close the warning window, or the whole of FooApp? It makes a difference. Instead, you could say "I started FooApp, which put up a warning window. I tried to close the warning window, and FooApp crashed." This is longer and more repetitive, but also clearer and less easy to misunderstand.
  • Read what you wrote. Read the report back to yourself, and see if you think it's clear. If you have listed a sequence of actions which should produce the failure, try following them yourself, to see if you missed a step.

Summary

  • The first aim of a bug report is to let the programmer see the failure with their own eyes. If you can't be with them to make it fail in front of them, give them detailed instructions so that they can make it fail for themselves.
  • In case the first aim doesn't succeed, and the programmer can't see it failing themselves, the second aim of a bug report is to describe what went wrong. Describe everything in detail. State what you saw, and also state what you expected to see. Write down the error messages, especially if they have numbers in.
  • When your computer does something unexpected, freeze. Do nothing until you're calm, and don't do anything that you think might be dangerous.
  • By all means try to diagnose the fault yourself if you think you can, but if you do, you should still report the symptoms as well.
  • Be ready to provide extra information if the programmer needs it. If they didn't need it, they wouldn't be asking for it. They aren't being deliberately awkward. Have version numbers at your fingertips, because they will probably be needed.
  • Write clearly. Say what you mean, and make sure it can't be misinterpreted.
  • Above all, be precise. Programmers like precision.

Read more »