A suspend blockers post-mortem
Suspend blockers first surfaced as wakelocks in February, 2009. They were immediately and roundly criticized by the development community; in response, Android developer Arve Hjønnevåg made a long series of changes before eventually bowing to product schedules and letting the patches drop for some months. After the Linux Foundation's Collaboration Summit this year, Arve came back with a new version of the patch set after being encouraged to do so by a number of developers. Several rounds of revisions later, each seemingly driven by a new set of developers who came in with new complaints, these patches failed to get into the mainline and, at this point, probably never will.
In a number of ways, the situation looks pretty grim - an expensive failure of the kernel development process. Ted Ts'o described it this way:
Ted's comments point to what is arguably the most discouraging part of the suspend blocker story: the Android developers were given conflicting advice over the course of more than one year. They were told several times: fix X to get this code merged. But once they had fixed X, another group of developers came along and insisted that they fix Y instead. There never seemed to be a point where the job was done - the finish line kept moving every time they seemed to get close to it. The developers who had the most say in the matter did not, for the most part, weigh in until the last week or so, when they decisively killed any chance of this code being merged.
Meanwhile, in public, the Android developers were being criticized for not getting their code upstream and having their code removed from the staging tree. It can only have been demoralizing - and expensive too:
No doubt plenty of others would have long since given up and walked away.
There are plenty of criticisms which can be directed against Android, starting with the way they developed a short-term solution behind closed doors and shipped it in thousands of handsets before even trying to upstream the code. That is not the way the "upstream first" policy says things should be done; that policy is there to prevent just this sort of episode. Once the code has been shipped and applications depend on it, making any sort of change becomes much harder.
On the other hand, it clearly would not have been reasonable to expect the Android project to delay the shipping of handsets for well over a year while the kernel community argued about suspend blockers.
In any case, this should be noted: once the Android developers decided to engage with the kernel community, they did so in a persistent, professional, and solution-oriented manner. They deserve some real credit for trying to do the right thing, even when "the right thing" looks like a different solution than the one they implemented.
The development community can also certainly be criticized for allowing this situation to go on for so long before coming together and working out a mutually acceptable solution. It is hard to say, though, how we could have done better. While kernel developers often see defending the quality of the kernel as a whole as part of their jobs, it's hard to tell them that helping others find the right solutions to problems is also a part of their jobs. Kernel developers tend to be busy people. So, while it is unfortunate that so many of them did not jump in until motivated by the imminent merging of the suspend blocker code, it's also an entirely understandable expression of basic human nature.
Anybody who wants to criticize the process needs to look at one other thing: in the end it appears to have come out with a better solution. Suspend blockers work well for current Android phones, but they are a special-case solution which will not work well for other use cases, and might not even work well on future Android-based hardware. The proposed alternative, based on a quality-of-service mechanism, seems likely to be more flexible, more maintainable, and better applicable to other situations (including realtime and virtualization). Had suspend blockers been accepted, it would have been that much harder to implement the better solution later on.
And that points to how one of the best aspects of the kernel development
process was on display here as well. We don't accept solutions which look
like they may not stand the test of time, and we don't accept code just
because it is already in wide use. That has a lot to do with how we've
managed to keep the code base vital and maintainable through nearly twenty
years of active development. Without that kind of discipline, the kernel would
have long since collapsed under its own weight. So, while we can certainly
try to find ways to make the contribution process less painful in
situations like this, we cannot compromise on code quality and
maintainability. After all, we fully expect to still be running (and
developing) Linux-based systems after another twenty years.
| Index entries for this article | |
|---|---|
| Kernel | Development model |
