Reading Notes on "Debug It"

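Note: I have read several books on debugging, and almost all of them touch on the same key points:

1. Building the infrastructure: a source control system, automated builds and continuous integration, and a bug tracking system;

2. The importance of unit testing;

3. First understand how the system works, so that you can estimate the impact a change will have on it;

4. Orthogonal design: for example, isolating interference from other factors, changing only one variable at a time while holding the others constant, and divide and conquer; this in turn relates to coupling in software design;

5. Make every effort to reproduce the bug and eliminate the effects of nondeterminism;

6. Design for visibility: expose the system's internal state in some form, such as logs or debug windows;

7. Work from coarse to fine, progressively narrowing the search;

8. Hypothesis and experiment: use the scientific method to track down bugs;

9. Root cause analysis: move from the single point to the broader picture, find the underlying cause of the problem, and fix it at its root;

10. Discuss the problem with others to bring in fresh perspectives;

11. Prevent similar problems from happening again, and learn from them both individually and as a team.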

 

Part I

The Heart of the Problem

 

Chapter 1

A Method in the Madness

 

 

Debugging Is More Than “Making the Bug Go Away”

 

Effective debugging requires that we take these steps:

1. Work out why the software is behaving unexpectedly.

2. Fix the problem.

3. Avoid breaking anything else.

4. Maintain or improve the overall quality (readability, architecture, test coverage, performance, and so on) of the code.

5. Ensure that the same problem does not occur elsewhere and cannot occur again.

 

Understanding Is Everything

Inexperienced developers (and sometimes, unfortunately, those of us who should know better) often skip diagnosis altogether. Instead, they immediately implement what they think might be a fix. If they’re lucky, it won’t work, and all they will have done is waste their time. The real danger comes if it works, or seems to work, because now they’ve made a change to the source that they don’t really understand. It might fix the bug, but there is a real chance that in reality it is only masking the true underlying cause. Worse, there is a good chance that this kind of change will introduce regressions—breaking something that used to work correctly beforehand.

 

Without first understanding the true root cause of the bug, we are outside the realms of software engineering and delving instead into voodoo programming or programming by coincidence.

 

 

One Problem at a Time

Check Simple Things First

 

 

Chapter 2

Reproduce

 

Reproduce First, Ask Questions Later

 

Successful reproduction is all about control. If you control all the relevant variables, you will reproduce your problem. The trick, of course, is identifying which variables are relevant to the bug at hand, discovering what you need to set them to, and finding a way to do so.

 

 

The things you need to control break down into three areas:

 

The software itself:

If the bug is in an area that has changed recently, then ensuring that you’re running the same version of the software as it was reported against is a good first step.

The environment it’s running within:

If interaction with an external system (some particular piece of hardware or a remote server, perhaps) is involved, then you probably want to ensure that you’re using the same one.

The inputs you provide to it:

If the bug is related to an area that behaves very differently depending upon how the software is configured, then start by replicating the user’s configuration.

 

Controlling the Software

Controlling the Environment

Controlling Inputs

 

 

Load and Stress

Some bugs manifest only when the software is under some kind of stress.

 

Aim for a minimal reproduction.

 

Make Nondeterministic Bugs Deterministic

Nondeterminism can have only a few causes:

• Starting from an unpredictable initial state

• Interaction with external systems

• Deliberate randomness

• Multithreading
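Deliberate randomness is usually the easiest of these to tame. The following is a minimal sketch, assuming the code under test can be handed its random number generator instead of using a global one (`shuffle_playlist` and `run` are hypothetical names). Every run records its seed, and a failing run can be replayed exactly by passing that seed back in.

```python
import logging
import random

logger = logging.getLogger("repro")


def shuffle_playlist(tracks, rng):
    """Hypothetical code under test: its only source of randomness is `rng`."""
    shuffled = list(tracks)
    rng.shuffle(shuffled)
    return shuffled


def run(tracks, seed=None):
    """Run once and return (seed, result) so any run can be replayed exactly."""
    if seed is None:
        # Draw a fresh seed from the OS, and record it before using it.
        seed = random.SystemRandom().randrange(2**32)
    logger.info("random seed = %d", seed)
    rng = random.Random(seed)  # private generator, independent of global state
    return seed, shuffle_playlist(tracks, rng)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    seed, result = run(["a", "b", "c", "d"])
    # Feeding the logged seed back in reproduces the same ordering.
    assert run(["a", "b", "c", "d"], seed) == (seed, result)
```

The design choice is to inject the generator rather than call `random.seed()` globally. This keeps other code that also uses `random` from consuming numbers and silently changing the sequence the code under test sees.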

 

 

Chapter 3

Diagnose

 

Having discovered that things aren’t as you believed them to be, your task is to modify your understanding of the software until you do understand what’s really going on. To do that, you operate in the other of the two possible directions—create a hypothesis that might provide an explanation and then construct experiments to test it.

So, here’s our idealized process:

1. Examine what you know about the software’s behavior, and construct a hypothesis about what might cause it.

2. Design an experiment that will allow you to test its truth (or otherwise).

3. If the experiment disproves your hypothesis, come up with a new one, and start again.

4. If it supports your hypothesis, keep coming up with experiments until you have either disproved it or reached a high enough level of certainty to consider it proven.

 

 

Run Different Types of Experiments

 

You can design experiments that are intended to prove your hypothesis or to disprove it.

 

One Change at a Time

 

Keep a Record of What You’ve Tried

 

Ignore Nothing

 

How Else Can a Daybook Help When Debugging?

As well as maintaining a record of your experiments, a daybook can also be useful for the following:

• Writing out hypotheses. Getting things onto paper can help identify flaws in assumptions, especially when the hypothesis is complex.

• Keeping track of details such as stack traces, argument values, and variable names. Not only does this help with finding things again, but it also helps you communicate with colleagues when explaining the problem, avoiding the need to rely upon memory.

• Keeping a list of ideas to try. Often you will notice something else you want to investigate, or a possible follow-up experiment will occur to you, but you don’t want to abandon the current experiment to pursue it. A “to-do” list ensures that you don’t forget to come back to it later.

• Doodling when you need to take your mind off the problem.

 

 

Even if the odd behavior you notice doesn’t have any bearing on the problem at hand, the fact that you’ve discovered something unexpected is valuable. Anything that you don’t understand is potentially a bug.

 

By far the simplest and most direct way to observe what the software is doing is adding instrumentation to the software itself.

Instrumentation is code that doesn’t affect how the software behaves but instead provides insight into why it behaves as it does. We already discussed the most common and important type of instrumentation, logging.
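The following is a minimal sketch of logging as instrumentation, using Python's standard `logging` module. The `apply_discount` function is hypothetical. The log calls record the inputs and an intermediate value but do not change what the function returns, which is the property the text asks of instrumentation.

```python
import logging

logger = logging.getLogger("orders")


def apply_discount(price, percent):
    # Instrumentation: record the inputs without altering them.
    logger.debug("apply_discount(price=%r, percent=%r)", price, percent)
    discounted = price * (100 - percent) / 100
    # Record an intermediate value the caller never sees directly.
    logger.debug("discounted price = %r", discounted)
    return discounted


if __name__ == "__main__":
    # Switch the instrumentation on only while diagnosing; output goes to stderr.
    logging.basicConfig(level=logging.DEBUG,
                        format="%(asctime)s %(name)s %(levelname)s %(message)s")
    apply_discount(80.0, 25)
```

The values are passed as arguments rather than pre-formatted with an f-string. As a result, the message text is only built when DEBUG output is enabled, which keeps dormant instrumentation cheap. Enabled logging still costs time, though, so it can still disturb timing-sensitive bugs.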

 

Beware of Heisenberg. Instrumenting software intrinsically involves changing it, which raises the specter of affecting, instead of simply observing, its behavior. This is dangerous during diagnosis, because introducing an unintentional change during a series of experiments can easily lead you to draw invalid conclusions.

 

Divide and conquer, or binary chop, is the Swiss Army knife of debugging—it crops up again and again in a wide variety of situations. Sometimes it will narrow your search only as far as a single module, but even that is a considerable help.
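Binary chop applies to failing inputs as well as to code. The following is a minimal sketch that assumes two things: a hypothetical `fails(records)` predicate that reruns the failing scenario on a subset of the input, and a bug triggered by a single bad record. At each step it keeps whichever half still fails.

```python
def find_bad_record(records, fails):
    """Narrow a failing input down to one record by repeatedly halving it.

    Assumes `fails(records)` is True for the full input and that a single
    record on its own is enough to trigger the failure.
    """
    candidates = list(records)
    if not fails(candidates):
        raise ValueError("the full input does not reproduce the failure")
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first, second = candidates[:mid], candidates[mid:]
        if fails(first):
            candidates = first
        elif fails(second):
            candidates = second
        else:
            # Neither half fails alone: the bug needs records from both halves,
            # so plain bisection cannot narrow it any further.
            break
    return candidates


if __name__ == "__main__":
    # Hypothetical bug: any negative record makes processing fail.
    print(find_bad_record([3, 7, -1, 12, 5], lambda rs: any(r < 0 for r in rs)))
```

If the bug needs a combination of records, the loop stops and returns the smallest failing set it found. More elaborate schemes such as delta debugging exist for that case.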

 

Occasionally, you will find yourself chasing a regression—a bug in functionality that used to work correctly but was broken by some subsequent change. Your normal diagnostic toolbox remains just as applicable to this kind of problem as any other, but there is one tool of particular value when regression hunting—your source control system. If you can identify exactly which change introduced the problem, then diagnosing why it did so may be trivial.
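Source control turns regression hunting into another binary chop over the history. Git, for example, automates this with `git bisect`. The idea can be sketched in Python under the following assumptions: a hypothetical history ordered oldest to newest, a known-good first entry, a known-bad last entry, and an `is_bad(revision)` check. In practice that check might check out the revision and run the failing test.

```python
def first_bad_revision(history, is_bad):
    """Return the earliest revision in `history` for which `is_bad` is True.

    Assumes history[0] is good, history[-1] is bad, and that once the bug
    appears it stays (every revision after the first bad one is also bad).
    """
    good, bad = 0, len(history) - 1  # indices known to be good and bad
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(history[mid]):
            bad = mid
        else:
            good = mid
    return history[bad]


if __name__ == "__main__":
    revisions = ["r1", "r2", "r3", "r4", "r5", "r6"]
    # Hypothetical: the bug was introduced in r4.
    print(first_bad_revision(revisions, lambda r: int(r[1:]) >= 4))
```

With n revisions this needs only about log2(n) checks, so a range of 1,000 commits takes roughly 10 test runs.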

 

Focus on the Differences

 

 

There’s no doubt that your debugger is one of the most powerful tools in your toolbox, and you should certainly take the time to become familiar with what it can do and proficient in its use. But here’s the thing—as time goes on, I find myself using the debugger less and less.

 

What has changed is test-first development. Where in the past my first instinct might have been to break out the debugger, now it’s to write a test. To understand why, it helps to think about why we might use a debugger. It’s particularly helpful at three different points of the development life cycle:

1. During initial development, single-stepping through the code helps convince us that what it’s really doing agrees with what we thought we were implementing.

2. If we have a theory about why the code is behaving in a particular way, we can use the debugger to confirm or refute this theory.

3. Finally, a debugger helps us explore code that is behaving in a way we simply don’t understand.

 

If we have a theory about what’s causing a bug, we create a test that proves it. And the beauty of this is that unlike stepping through in a debugger, the results of which are ephemeral, a test is permanent. Not only does it prove that the code works now, but it continues to do so in the future and can be run (and even improved) by other team members. Not only does it prove that our theory is correct, but we can subsequently use it to verify that our fix addresses the issue.

 

If the changes you’re making don’t seem to be having an effect, you’re not changing what you think you are. The only defense is to always have the possibility at the back of your mind.

 

Know what assumptions you’re making, and examine them critically.

 

The most fruitful approach to multiple causes is to isolate the problems and find a way to reproduce a bug that depends upon one of the causes and not the other.

 

An alternative approach is to start by looking at any other bugs you might be aware of in the same area. Addressing these can sometimes clear things up or improve your understanding enough to throw your original problem into sharper relief.

 

Another cause of that “twilight zone” feeling is a changing underlying system. The rock upon which the empirical method we’re relying upon depends is that we can reproduce the problem over and over again, obtaining the same results each and every time. Take that certainty away, and making progress becomes extremely difficult.

 

If faced with a changing underlying system, stop and work out what’s changing and why.

 

Explaining the problem helps get your thoughts in order. There are excellent reasons why things work this way—explaining your problem to someone else forces you to get your thoughts in order, enumerate your assumptions, and construct an argument from basic principles. Very often, putting that structure in place is all it takes for you to see the solution yourself.

 

When the stroke of genius arrives out of nowhere, write it down. If a pen and paper isn’t available, send yourself an SMS or tell whoever you happen to be with—there’s nothing more frustrating than being unable to recall your insight the following day.

Particularly difficult problems can benefit from a longer break. The fresh perspective of a new morning often helps immeasurably. But beware of overdoing it—tracking down an involved bug means that you need to understand a lot of different things. Take too much time off, and you might find that you’re having to remind yourself of too much.

 

Sherlock Holmes famously said, “When you have eliminated the impossible, whatever remains, however improbable, must be the truth.”

 

We humans are multitalented creatures. Unfortunately, one of our talents is self-deception—we’re very good at convincing ourselves of something we want to be true. With that in mind, time spent validating that your diagnosis really stands up to scrutiny is time very well spent.

 

 

Chapter 4

Fix

 

Start from a clean source tree.

 

Start by ensuring that all your tests pass.

 

Here’s the sequence to follow:

1. Run the existing tests, and demonstrate that they pass.

2. Add one or more new tests, or fix the existing tests, to demonstrate the bug (in other words, to fail).

3. Fix the bug.

4. Demonstrate that your fix works (the failing tests no longer fail).

5. Demonstrate that you haven’t introduced any regressions (none of the tests that previously passed now fail).
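The fix sequence can be sketched with Python's built-in `unittest`. The bug here is hypothetical: an `average` function that raised `ZeroDivisionError` on an empty list. The new test is written first and fails against the old code (step 2). After the fix (step 3) it passes, and the pre-existing test guards against regressions (steps 1 and 5).

```python
import unittest


def average(values):
    """Return the arithmetic mean of `values`, or 0.0 for an empty sequence.

    Hypothetical fixed version: the original divided by len(values)
    unconditionally and raised ZeroDivisionError for [].
    """
    if not values:
        return 0.0
    return sum(values) / len(values)


class AverageTests(unittest.TestCase):
    def test_existing_behaviour(self):
        # Pre-existing test: must pass before and after the fix (steps 1 and 5).
        self.assertEqual(average([2, 4, 6]), 4.0)

    def test_empty_list_bug(self):
        # New test for the bug: fails before the fix, passes after it (steps 2 and 4).
        self.assertEqual(average([]), 0.0)


if __name__ == "__main__":
    unittest.main()
```

Returning 0.0 for an empty list is only one possible fix. Raising a clearer `ValueError` would be another, and the right choice depends on what callers expect, which is why the test should be settled before the fix is designed.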

 

 

Make sure you know how you’re going to test it before designing your fix.

 

 

Sometimes, even if we do understand the root cause, there’s still a temptation to “paper over the cracks.” Perhaps the bug is deeply rooted in the architecture, and a true fix would involve dangerously widespread changes. Or there might be a danger of introducing compatibility issues with previous versions.

 

The upshot is that you’re very likely to be under pressure to just “make the bug go away” and move on to the next task.

 

Bug fixing often uncovers opportunities for refactoring. The very fact that you’re working with code that contains a bug indicates that there is a chance that it could be clearer or better structured.

 

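It can be tempting to collect a number of small changes together and check them all in at once. The problem is that unrelated changes then become tangled in a single check-in, which makes each one harder to review and impossible to back out on its own. To avoid this, stick to the rule: one logical change, one check-in.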

 

The rule of thumb is to consider a review whenever you reach an area of uncertainty or risk.

 

 

Chapter 5

Reflect

 

One of the humorous emails that turns up in my inbox every once in a while is entitled “The six stages of debugging” and reads as follows:

1. That can’t happen.

2. That doesn’t happen on my machine.

3. That shouldn’t happen.

4. Why is that happening?

5. Oh, I see.

6. How did that ever work?

 

 

The first step toward learning the lessons of the bug is determining what went wrong.

 

A useful trick when performing root cause analysis is to ask “Why?” five times. For example:

• The software crashed. Why?

• The code didn’t handle network failure during data transmission. Why?

• There was no unit test to check for network failure. Why?

• The original developer wasn’t aware that he should create such a test. Why?

• None of our unit tests check for network failure. Why?

• We failed to take network failure into account in the original design.
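The chain ends at a missing kind of test, which suggests a concrete follow-up: tests that simulate network failure. The following is a minimal sketch, assuming the code sends data through a connection object that can be replaced in tests. `send_report` and `FlakyConnection` are hypothetical names, and the fake simply raises Python's built-in `ConnectionError` to stand in for a dropped network.

```python
import unittest


class FlakyConnection:
    """Hypothetical test double that fails partway through a transmission."""

    def __init__(self, fail_after):
        self.fail_after = fail_after  # number of sends that succeed
        self.sent = []

    def send(self, chunk):
        if len(self.sent) >= self.fail_after:
            raise ConnectionError("simulated network failure")
        self.sent.append(chunk)


def send_report(conn, chunks):
    """Send every chunk; return False on network failure instead of crashing."""
    try:
        for chunk in chunks:
            conn.send(chunk)
    except ConnectionError:
        return False
    return True


class NetworkFailureTests(unittest.TestCase):
    def test_failure_during_transmission_is_handled(self):
        conn = FlakyConnection(fail_after=1)
        self.assertFalse(send_report(conn, [b"header", b"body"]))
        self.assertEqual(conn.sent, [b"header"])


if __name__ == "__main__":
    unittest.main()
```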

 

What we’re talking about here is the bigger picture—how did the mistake make its way into the software in the first place?

 

Once you’ve identified the source of the error, you can take steps to ensure that it doesn’t happen again. In some cases, this might mean nothing more than a “note to self” to be more careful in that area in the future, or a quiet word with a colleague to let them know about their mistake.

 

Letting a colleague know that they’ve made a mistake can be a minefield. On the one hand, it’s extremely valuable information—you owe it to them to let them know so that they can avoid the same mistake in the future. On the other hand, we programmers are not always known for our interpersonal skills, and telling someone that they’ve screwed up can easily go wrong if done without tact.

 

• Most important, give feedback for the right reason. If you’re really telling someone about their mistake because you like the feeling of superiority it gives you, hold your tongue.

• Avoid personal comments. It can be helpful to use “I” and “we” language instead of “you” language.

• Be constructive.

• Remember that you might be mistaken. Don’t simply announce that they’ve made a mistake—explore the possibility with them.

 

The project that you are working on will have its own set of norms, for example:

• Coding standards

• Testing standards

• Documentation standards

• Reporting/tracking processes

• Design guidelines

• Performance requirements

Whenever you fix a bug, you need to bear these in mind. Do you need to update the end-user documentation as a result of the fix? Or the change log for the next release? Does the work need to be tracked against a particular client or project?

 

Part II

The Bigger Picture

 

Chapter 6

Discovering That You Have a Problem

 

• Tracking bugs

• Working with users

• Working with the customer support and QA teams

 

Imagine how things might appear from your user’s perspective.

 

Chapter 7

Pragmatic Zero Tolerance

 

Early bug fixing is by far the superior strategy.

Early bug fixing depends upon two principles:

• Processes that are likely to uncover bugs (testing, code reviews, getting running software into users’ hands) happen continuously during development.

• Bug fixing takes priority over everything else.

 

Early Bug Fixing Decreases Uncertainty

Until you start looking for them, you can have little or no idea how many bugs remain to be found. And until you start fixing them, you can’t know how long they’re going to take to fix. Early bug detection and fixing allows you to measure how much of your time you need to spend on bug fixing and adjust your plan accordingly. Late bug fixing, on the other hand, gives you the illusion that you’re making progress, but you’re just storing up technical debt—a backlog of problems lurking under the surface of the software. You can have no idea when you will be done—it’s impossible to predict how many more issues are waiting to be found.

 

No Broken Windows

 

Poor quality is contagious.

 

Detect bugs early, and do so from day one.

 

As a bare minimum, this means the following:

• Source control

• A fully automated build system

• A fully automated test harness

• Overnight builds or continuous integration

 

Separate Clean from Unclean

One challenge you’re going to face is that you’ll be fighting against the broken windows effect—when you’re surrounded by broken windows, it takes a strong effort of will to avoid backsliding.

 

A good strategy can be to clearly demarcate “clean” (well-written, well-tested, and debugged) code from “unclean.”

 

 

Part III

Debug-Fu

 

Chapter 8

Special Cases

 

 

When patching an existing release, concentrate on reducing risk.

 

 

A true fix might involve extensive refactoring or even deep architectural changes. In the absence of the normal checks and balances of the full release process, it’s difficult to be certain that these changes won’t introduce regressions and end up making things worse rather than better.

 

The bug will need fixing in the development version too.

 

 

Concurrent software can be a rich source of difficult-to-reproduce, difficult-to-diagnose, and difficult-to-fix problems. Bugs in such software often exhibit nondeterminism, depend upon subtle and difficult-to-understand interactions, and suffer from mysterious failure modes.

 

Simplicity and Control

You can build a number of things into your concurrent software that will help during debugging. The two keys are simplicity and control. Simplicity is a key element of any software design, but it’s particularly valuable when dealing with concurrency. Keep the interactions between independent threads straightforward, and constrain them to as small a number of areas of code as possible.
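One way to keep interactions between threads straightforward and confined to a small area of code is to let threads communicate only through thread-safe queues rather than shared mutable state. The following is a minimal sketch using Python's standard `threading` and `queue` modules. The worker and its job (squaring numbers) are placeholders.

```python
import queue
import threading

_STOP = object()  # sentinel telling the worker to exit


def worker(jobs, results):
    # The two queues are the only objects shared between threads.
    while True:
        item = jobs.get()
        if item is _STOP:
            break
        results.put(item * item)


def square_all(numbers):
    jobs, results = queue.Queue(), queue.Queue()
    t = threading.Thread(target=worker, args=(jobs, results), daemon=True)
    t.start()
    for n in numbers:
        jobs.put(n)
    jobs.put(_STOP)
    t.join()  # wait until the worker has drained the job queue
    return [results.get() for _ in numbers]


if __name__ == "__main__":
    print(square_all([1, 2, 3]))
```

With a single worker, results come back in input order, which also makes runs repeatable. Adding more workers would require tagging each result with its input index, and that is exactly the kind of extra interaction worth keeping in one place.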

 

 

Debugging embedded software can be particularly tricky, not because it’s complicated or involved (although it can be) but because of the environment it runs within.

 

The system you’re working on is controlling something. You can use that control as a communication channel. Perhaps there’s an LCD display you can use? Or a serial port you can write to? It doesn’t have to be a rich channel—one bit is enough. Is there an LED you can light up? Or a motor you can run?

 

 

Chapter 9

The Ideal Debugging Environment

 

Chapter 10

Teach Your Software to Debug Itself

 

Chapter 11

Anti-patterns

 

 

 

 

 
