by Srivaths Sankaran 08/17/2006
Problems in software are easier to fix the earlier they are detected. In describing the software constructive cost model (COCOMO), Boehm (in Software Engineering Economics) contends that the cost of remediation goes up nearly three-fold from coding to testing and a further ten-fold if the problem is not detected before release. Each layer of software built on top of unratified code aggravates the problem by infesting other parts of the application.
A code review is an excellent checkpoint that can help flush out erroneous assumptions and gaps in reasoning. It helps minimize the impact of a problem by means of early detection. While testing is an excellent way to improve the quality, testing alone is not enough. For one, as shown by McConnell in Code Complete, it is statistically impossible to completely test a non-trivial software project. Testing must be bolstered by up-front code reviews.
Conducting a code review, however, can also be a costly proposition, due to the involvement (interruption) of several individuals and their loss of productivity. With the context switch and the cost to the project as a whole, code reviews can add a sizable lengthening of the project timeline. The purist may point to the benefits of identifying, diagnosing, and fixing problems early in the project lifecycle.
How does one reconcile these polarizing effects of a code review? The following sections show how you can stay on the high horse and yet deliver on time. They synthesize the essence of lessons learned from experience and best practices such as those evangelized by the Fagan review process (PDF, 7MB). Use these lessons as guidelines while developing in a more agile and cost-effective manner.
Adherence to coding standards Writing to standards has several benefits, such as a shared vocabulary, which makes the code easier to reuse and maintain. One of the primary focuses of a code review is to ensure abidance with these norms. As you'll see later, this benefit of a review is best derived in an automated manner. Such an approach has the dual benefit of a machine's thoroughness while circumventing the cost of a manual review.
Addressing the requirements One of the key goals of a code review is to ensure that the code under scrutiny is feature-complete. Beautiful code that fails to meet user requirements is useless. The review must ensure that the logic does what it is called upon to do.
Code correctness Ensure that the authors are following proper programming techniques as appropriate:
Errors and omissions A well-written code fragment that abides by all the departmental norms, satisfies requirements, and otherwise "stays within the lines" could still be in error. It is the task of the reviewer to make sure that the code does not make incorrect business/usage assumptions and that all possible usage scenarios are taken into account. A business process expert helping with the review will be quick to spot logic that presupposes the state of the application while the logic is executed. For example, an application that allows guest users cannot be guaranteed to know the current user's identity.
Test for coverage. Test coverage is a measure of the quality of testing. Tools like Maven and CruiseControl simplify the process of testing and auditing coverage reports. The reviewer must understand the significance of branch and path coverage.
Silo-busting pedagogical aid This tool exposes others in the team to the code. The reviewer gets to read, analyze, and discuss someone else's work. Such cross-pollination is an excellent way to raise the skill of the entire group. This approach has the added tangential benefit of busting silos of expertise that projects tend to unwittingly foster.
How can a project have an effective review and oversight process, thereby assuring quality and on-time delivery? How can interruptions be minimized so that the developer/reviewer isn't constantly switching between coding and reviewing? Such task switches not only add to the timeline, they have a serious deleterious effect on the quality of the work. There are different facets to the solution. I will demonstrate that you can make a significant impact to the code base and yet work within the real-world project delivery constraints.
Managing what is reviewed You must accept that you cannot realistically manually review every last line of code that goes into a project. So what can you omit? Are there aspects of code that are, by nature, robust and whose veracity is implicit? The next section shows that you can make such a distinction and focus on what must be reviewed.
Using tools An automated tool is an ideal partner in your quest for code quality. A good tool functions while the developer is working, either directly during the writing of the code or during the build process. This means that the developer is not interrupted, and a formal manual review does not have to be conducted. However, tools aren't a panacea for this problem. A tool cannot validate business requirements. It cannot ensure against incorrect assumptions. It can only perform minimal oversight against insecure coding practices. Nevertheless, it is a major ally in the review process. It greatly simplifies what must be manually examined. Automated reviews are discussed in a later section.
Pair programming Proponents of extreme programming (XP) and other agile project development techniques have long sung the virtues of pair programming. In the current context, it has the benefit of being an on-the-spot code review. It is as if you are at a formal review process while you are coding. The added benefit is that it doesn't incur the additional cost that would otherwise be the case for a manual (scheduled) review.
The purpose of a code review is to ensure a high level of code quality. The ideal way to ensure this is by verifying every line and every statement that has been written. This, however, is unrealistic given the premium on delivery faced by most projects. This statement should not be mistaken as indicating that you do not have time for quality assurance. It points out the need to be more judicious with your use of the time to deliver the highest quality product.
Focus on the programmer Java development tools have matured to the point where several routine/repetitive tasks can be safely delegated to tools. Examples include IDE getter/setter generators, JavaBeans, and XDoclet-generated code. I'll make a simplifying assumption that such code is accurate and does not need to be further validated by a code review. However, you must be aware of possible manual updates to machine-generated code. For example, a developer may tweak the Hibernate mapping file generated by middlegen. It is the responsibility of the developer to highlight such exceptions, so that the reviewer is aware of the need to include the affected file during the review. The reviewer will assign the appropriate weight to such code during the examination process.
Risk-driven The higher the risk of using a given software component, the better its oversight should be. One way of quantifying risk is to use the formula from Madiera's Experimental software risk assessment [PDF 364KB]:
Risk = probability of bug x probability of bug activation x impact of bug activation
The probability of a bug is predicated on a couple factors. It is directly proportional to the frequency of usage of the affected code fragment and its complexity. The more complex the logic, the more prone it is to human error. This brings me to the next item on which to focus the review.
Complexity-driven Complexity in code is sometimes unavoidable; it may just be the nature of the problem. However, I have found that more often than not, complexity is a result of poor coding practice. Using metrics such as cyclomatic complexity, afferent, and efferent coupling allows you to quantify the complexity of the code. Several tools (IDE or Maven plug-ins) measure and report on the complexity of the code base. These are an integral part of a code review.
Test coverage-driven As cited above, a high degree of test coverage isn't a guarantor of quality. However, it is a step in the right direction. Conversely, logic that is not well tested is a candidate for thorough evaluation during a code review. It could also be argued that this egregious problem be fixed prior to a review.
Just as important as identifying what to review is the determination of when to conduct a review. If the reviews are held too frequently, they become ineffective due to the fluidity of the code under review. If they are held too far apart, you risk a mushrooming cost of remedy due to afferent coupling of the code under review. I therefore postulate that a scheduled review must be conducted either after a development task has been completed or after a bug has been fixed.
Automate the review where possible. A review conducted by a computer has several benefits:
The benefits notwithstanding, it must be acknowledged that the scope of automated reviews is limited to rules and conventions and are hard to extend to study business logic. The latter have to be handled by manual (or peer) examination.
From the prior sections you have seen that interactive and build-time tools at your disposal, while a great boon to the review process, cannot substitute for manual review for a certain class of problems. You have to manually review for certain types of improper coding practices. Tools cannot highlight errors of omission as described in the earlier section. Given that, let's see how a manual review must be conducted.
Participants: The review must focus on different facets of code correctness. Therefore, the participants of a code review must be chosen to satisfy a diverse set of roles.
How to conduct: The technical lead will work with the authors to coordinate the review. This process includes:
There is some debate as to the effectiveness of review meetings (for an example, see A Controlled Experiment to Assess the Effectiveness of Inspection Meetings [PDF, 120KB]). I recommend that the participants and roles be determined as described above. All participants will meet after they have had a chance to inspect the code. The purpose of the meeting is solely administrative and to clarify corrections recommended by the reviewer.
A code review is an integral part of a project. Just as every coding task has an associated test process, so too should it have a code review. Therefore, it must be treated as another task that is tracked on the project plan by the project's manager. The project's technical lead who is working in concert with the authors will be able to assign appropriate resources to the review process.
This article has shown how you can have your cake and eat it too. You canconduct thorough code reviews without succumbing to time pressures. Did you know that more than 60 percent of the defects can be captured by effective use of peer reviews (c.f., Boehm and Basili's Software Defect Reduction Top 10 List [PDF, 124KB])? With a judicious combination of automation, interactive tools, and risk-based evaluation, you can reap the benefits of early problem detection. The techniques described here come from experience and are directly applicable to your next project. Use the references for additional relevant information on the subject of quality assurance as a whole.
Srivaths Sankaran is a jack-of-all-trades when it comes to software development.
View all java.net Articles.