Development issues part 2: Bug tracking
There comes a time, though, when even the most die-hard free software proponent wishes that things would just work. As our software finds its way into more situations where failures are unwelcome (at best), the level of tolerance for bugs is falling. The desire for fewer flaws, however, runs counter to the desire for increasingly capable (and thus more complex) software. Somehow we have to find ways to simultaneously grow our systems and reduce the total number of bugs. To this end, a few projects have been having some interesting discussions on the tracking and fixing of bugs.
As has been discussed in this companion article, Eric Raymond has been busily stirring up trouble on the Emacs development list. His point, deemed reasonable by your editor, is that Emacs must adopt a number of relatively modern development practices if it is to have any hope of remaining relevant at all. One of his key points is that Emacs needs to have a real bug tracking system. Says Eric:
While some of Eric's suggestions appear to be non-starters - imagine trying to get Richard Stallman to hang out on an IRC channel - the bug tracker suggestion might just go somewhere. Certainly it could only be an improvement for a project of that size to have some sort of idea of what the current list of outstanding bugs looks like. It might even help bring about another Emacs release before the end of the decade.
Bug trackers are not a magical solution to the bug problem, though; in fact, they can create some problems of their own. The Fedora project, which does have a bug tracker, is currently trying to figure out what to do with the contents of that tracker. It seems that said tracker contains over 13,000 bugs, almost 10,000 of which apply to Fedora 7 and later.
A bug database of this size is simply overwhelming to anybody who tries to do something about it. As a result, Fedora users are filing bugs, only to see nothing happen in response. Not even a "thanks for your report" message. This situation is discouraging for everybody involved, causing Fedora users to give up on reporting bugs and developers to fear looking at the tracker.
In the Fedora case, there appears to be a near-consensus that the biggest problem is in triaging bug entries. This is not a job which can be automated; somebody has to go through bug submissions, weed out the duplicates, identify those which are really "features," figure out which developer should be notified, etc. Tying bug entries to those found in upstream trackers would be a highly useful bonus. Without this sort of effort, the bug tracker quickly fills with low-quality entries which help nobody.
For the most part, nobody is doing this job for Fedora now. Red Hat is not paying for a staff member to triage bugs, and the wider community has not filled this gap. In the short term, any sort of solution looks like it will have to come from the community, so the Fedora folks are wondering what can be done to encourage more participation. Simply asking for help is the obvious first step, as is making sure that the process is easy. Then they may consider the tactics adopted by other large projects - Mozilla's policy of expressing its appreciation by sending a T-shirt, for example.
As an aside, one of the more useful bits of information to come from this discussion was the existence of this family of URLs:
http://bugz.fedoraproject.org/<package-name>
Fill in the name, and the result is an immediate list of open bugs for the given package. Thus, for example, a visit to bugz.fedoraproject.org/gcc yields a list of compiler bugs. This result can be had directly from bugzilla, of course, but this interface is faster and easier.
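As a small illustration of the URL pattern just described, the following Python sketch builds the per-package address from a package name. The function name `package_bugs_url` and the percent-encoding step are assumptions made for the example, not part of any Fedora tool; the sketch relies only on the pattern quoted above and does not check whether the service actually responds.

```python
from urllib.parse import quote

# Base of the URL family quoted in the article.
BASE_URL = "http://bugz.fedoraproject.org/"


def package_bugs_url(package_name: str) -> str:
    """Return the open-bugs URL for a package (hypothetical helper)."""
    name = package_name.strip()
    if not name:
        raise ValueError("package name must not be empty")
    # Percent-encoding unusual characters is an assumption of this sketch;
    # the article only shows the plain <package-name> form.
    return BASE_URL + quote(name, safe="")


if __name__ == "__main__":
    print(package_bugs_url("gcc"))
```

For the article's own example, `package_bugs_url("gcc")` yields the gcc address mentioned above.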
The Fedora developers have discussed a number of related issues, such as whether the Fedora bug database should be separated from the RHEL system and what can be done to make Red Hat better appreciate the value of doing more of its quality assurance work in the Fedora repository. But the core problem is just getting human attention applied to the bug reports. Digging through bug databases is a relatively unglamorous job; it is not an easy path toward rock-star hacker status. But it is an important and relatively easy way to help make free software better.
Just in time to serve as an example of how well bug management can work, the GNOME project has posted its annual bugzilla statistics. It seems that over 110,000 GNOME bugs were filed in 2007, and almost 109,000 were closed. The top bug-closers for the year were:
    14254  Andre Klapper
     9800  Tom Parker
     7047  Susana Pereira
     6882  Bruno Boaventura
     6649  Pedro Villavicencio
It is worth pondering for a moment the amount of energy required to close over 14,000 bugs in a year - that's almost 40 per day, every day, without a break. This kind of energy does exist within our community, and some projects are putting it to very good use.
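The per-day figure is easy to check. Here is a minimal sketch that divides each of the closure counts listed above by the 365 days of 2007; the helper name `closes_per_day` is made up for the example, and the sketch assumes the work was spread evenly over every day of the year.

```python
DAYS_IN_2007 = 365  # 2007 was not a leap year

# Closure counts as given in the GNOME statistics quoted above.
top_closers = {
    "Andre Klapper": 14254,
    "Tom Parker": 9800,
    "Susana Pereira": 7047,
    "Bruno Boaventura": 6882,
    "Pedro Villavicencio": 6649,
}


def closes_per_day(total: int, days: int = DAYS_IN_2007) -> float:
    """Average number of bugs closed per day, assuming an even spread."""
    return total / days


if __name__ == "__main__":
    for name, total in top_closers.items():
        print(f"{name}: {closes_per_day(total):.1f} per day")
```

The top entry works out to a little over 39 bugs per day, which is where the "almost 40" figure comes from.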
While it is easy to get a contrary impression, the kernel does, in fact, have a bug tracker; there is also, in the form of Natalie Protasevich, somebody who handles the care and feeding of that tracker. But, as a recent episode shows, that still is not always sufficient to actually get the bugs fixed.
On November 13, 2007, a bug in the SCSI subsystem was reported to the linux-kernel mailing list. It was put into the tracker as bug 9370 on the same day. Some developers looked at it over the next few days, but, even though a specific commit which appeared to cause the bug had been identified, no solution was forthcoming. Discussion eventually died out. At least until January 2, when Ingo Molnar decided to stir the pot by posting a patch to revert the seemingly guilty commit. At that point the discussion picked up and a reliable way of reproducing the bug was found. The commit which was said to have caused the problem was, in fact, not guilty; it had just caused an older bug to come to light. The discussion did not stop there, though.
A number of charges went back and forth which do not require discussion here. But one core point is this: as long as the bug report sat in the tracker, nothing much appeared to be happening with it - though, it seems, the SCSI developers had not forgotten it and were trying to figure out what was really going on. But once the problem came back to the linux-kernel list in the form of a brute-force solution, the root cause was found in short order. The key here was bringing the problem to the attention of a wider group of people; the crucial recipe for reproducing the problem came from a developer who had not been looking at the problem previously.
In the kernel context, at least, giving wide exposure to a bug often helps immensely in getting that bug fixed. That is especially true for the sort of hard-to-reproduce bugs which tend to come up in kernel programming. So, while bug trackers are a useful tool for ensuring that problems do not fall through the cracks, it seems that one of the most potent anti-bug tools we have - discussing the problem via a widely-distributed email list - is the same tool we have been using for decades.
| Index entries for this article | |
|---|---|
| Kernel | Debugging |
