Machine learning’s boosting as a model of scientific community

Boosting is a classic, very simple, clever algorithm for training a crappy classifier into a group of less crappy classifiers that are collectively one impressively good classifier. Classifiers are important for automatically making decisions about how to categorize things.

Here is how boosting works:

Take a classifier. It doesn’t have to be any good. In fact, its performance can be barely above chance.
Collect all the mistakes and modify the classifier into a new one that is more likely to get those particular ones right next time.
Repeat, say a hundred times, keeping each iteration, so that you end up with a hundred classifiers.
Now, on a new task, for every instance you want to classify, ask all of your classifiers which category that instance belongs in, giving more weight to the ones that make fewer mistakes. Collectively, they’ll be very accurate.
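
For the curious, here is a minimal sketch of those four steps in Python. It assumes one particular flavor of boosting, AdaBoost, in which "modifying the classifier" means re-weighting the training examples so the next classifier concentrates on the ones that were missed. The weak classifier is a one-threshold rule (a "decision stump"), and the ten-point dataset and all function names are made up for illustration.

```python
import math

def stump_predict(stump, x):
    """A stump says `polarity` to the left of its threshold, the opposite to the right."""
    threshold, polarity = stump
    return polarity if x < threshold else -polarity

def fit_stump(xs, ys, weights):
    """Return the stump with the lowest weighted error, plus that error."""
    points = sorted(set(xs))
    # One threshold below all points (a constant guess) plus every midpoint.
    thresholds = [points[0] - 1.0] + [(a + b) / 2 for a, b in zip(points, points[1:])]
    best, best_err = None, float("inf")
    for t in thresholds:
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict((t, polarity), x) != y)
            if err < best_err:
                best, best_err = (t, polarity), err
    return best, best_err

def adaboost(xs, ys, rounds):
    """Train `rounds` stumps; return a list of (vote_weight, stump) pairs."""
    n = len(xs)
    weights = [1.0 / n] * n              # every example matters equally at first
    ensemble = []
    for _ in range(rounds):
        stump, err = fit_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)      # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)    # fewer mistakes, bigger vote
        ensemble.append((alpha, stump))
        # Mistakes get heavier and correct answers lighter, then renormalize.
        weights = [w * math.exp(-alpha * y * stump_predict(stump, x))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def ensemble_predict(ensemble, x):
    """Weighted vote of all the stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

def accuracy(predict, xs, ys):
    return sum(predict(x) == y for x, y in zip(xs, ys)) / len(xs)

# Hypothetical toy data: no single threshold separates the +1s from the -1s.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, 1]

ensemble = adaboost(xs, ys, rounds=3)
first_stump = ensemble[0][1]
print("first stump alone:", accuracy(lambda x: stump_predict(first_stump, x), xs, ys))
print("three stumps voting:", accuracy(lambda x: ensemble_predict(ensemble, x), xs, ys))
```

On this toy data the best any single stump can do is seven out of ten, while three of them voting together get all ten right. Real problems need more rounds and a held-out test set, but the mechanics are the same.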

The connection to scientific community?

With a few liberties, science is like boosting. Let’s say there are a hundred scientists in a community, and each gets to take a stab at the twenty problems of their discipline. The first one tries, does great on some, not so great on others, and gets a certain level of prestige based on how well he did. The second one comes along, giving a bit of extra attention to the ones that the last guy flubbed, and when it’s all over earns a certain level of prestige herself. The third follows the second, and so on. Then I come along and write a textbook on the twenty problems. To do it, I have to read all 100 papers about each problem, and make a decision based on each paper and the prestige of each author. When I’m done, I’ve condensed the contributions of this whole scientific community into its collective answers to the twenty questions.

This is a simple, powerful model for explaining how a community of so-so scientists can collectively reach impressive levels of know-how. Its shortcomings are clear, but, hey, that’s part of what makes a good model.

If one fully accepts boosting as a model of scientific endeavor, then a few implications fall right out:

Science should be effective enough to help even really stupid humans develop very accurate theories.
It is most likely that no scholar holds a complete account of a field’s knowledge, and that many have to be consulted for a complete view.
Research that synthesizes the findings of others is of a different kind than research that addresses this or that problem.

