David Stark / Zarkonnen
I recently came across this Mitchell and Webb classic: "Cheesoid", the story of one man attempting to make a robot that can smell things.
Apart from being pretty funny, I think it's a great rapid-fire illustration of a lot of common mistakes in machine learning. That's what Cheesoid is, after all - it's a classifier system for smells.
- The first mistake is an inefficient setup for validation. Cheesoid cannot see, only smell, so Mitchell has to manually pick up items and present them to the robot. Manual validation is both slow and prone to accidental variation between validation runs.
- Cheesoid appears to correctly identify the first validation item, cheese. Mitchell is very pleased at this, and pretty much convinced that the robot works. Attempting to come to any conclusions based on a validation data set that is too small is pointless, though. And indeed, the next two items in the validation set, flowers and petrol, are also identified as cheese.
- In the second validation run, having adjusted the robot, Mitchell presents the data in a different order: cheese, petrol, cheese, leaving out the flowers altogether. Cheesoid correctly identifies the first instance of the cheese, and the first instance of the petrol (petrel), but upon being presented with the second instance of the cheese, also identifies it as petrol (petrel). At this point it ought to be clear to Mitchell that either the classification is not stable, or worse, that earlier members of the data set influence the classification of later ones. He can't tell the difference because his manual setup is terrible, and because he's simultaneously changing the learning system and the validation procedure.
- Next, we see Mitchell install a switch on the back of the robot labelled "Cheese / Petrol" to supply priors to the system about whether the smelly thing in front of it is cheese or petrol (petrel). This "works", but the system's output for flowers is now also simply whatever the switch is set to. By this point, the system appears to have simply connected its input straight to its output, and is clearly, utterly broken.
- Mitchell now declares the robot to be "good enough", ignoring the "flowers" data point. Ignoring contradictory data and moving the goalposts after the project has begun can make even well-implemented validation worthless.
- As the rest of the video shows, once confronted with complex, real-world scenarios, Cheesoid's classifier fails to work, leading to a number of negative outcomes.
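For what it's worth, here's roughly what a less terrible validation setup could look like. This is only a sketch under my own assumptions: `validate`, `StickyCheesoid` and `SAMPLES` are names I made up for illustration, and the "classifier" is a deliberately broken toy that starts saying "petrol" forever once it has smelled petrol. The idea is to keep one fixed, labelled validation set, run it automatically in several orders against a fresh classifier each time, and report both per-label accuracy and which samples get different answers depending on the order.

```python
import random
from collections import Counter

# Hypothetical fixed validation set: (smell, expected label) pairs.
SAMPLES = [("cheese", "cheese"), ("flowers", "flowers"), ("petrol", "petrol")]


class StickyCheesoid:
    """Toy broken classifier: says "cheese" until it smells petrol,
    then says "petrol" for everything afterwards."""

    def __init__(self):
        self.stuck = False

    def __call__(self, smell):
        if smell == "petrol":
            self.stuck = True
        return "petrol" if self.stuck else "cheese"


def validate(make_classifier, samples, orders=None, n_orders=5, seed=0):
    """Run a fresh classifier over the same samples in several orders.

    Returns (accuracy per label, indices of samples whose prediction
    changed between orders).
    """
    if orders is None:
        # No explicit orders given: generate reproducible random shuffles.
        rng = random.Random(seed)
        orders = []
        for _ in range(n_orders):
            order = list(range(len(samples)))
            rng.shuffle(order)
            orders.append(order)

    seen = {}  # sample index -> set of labels predicted for it
    correct, total = Counter(), Counter()
    for order in orders:
        classify = make_classifier()  # fresh state for every run
        for i in order:
            smell, label = samples[i]
            predicted = classify(smell)
            seen.setdefault(i, set()).add(predicted)
            total[label] += 1
            correct[label] += predicted == label

    accuracy = {label: correct[label] / total[label] for label in total}
    unstable = sorted(i for i, labels in seen.items() if len(labels) > 1)
    return accuracy, unstable


# Two explicit orders: cheese-flowers-petrol, then the reverse.
accuracy, unstable = validate(StickyCheesoid, SAMPLES,
                              orders=[[0, 1, 2], [2, 1, 0]])
print(accuracy)  # {'cheese': 0.5, 'flowers': 0.0, 'petrol': 1.0}
print(unstable)  # [0, 1]
```

The non-empty `unstable` list is the thing Mitchell couldn't see by hand: the same cheese gets a different answer depending on what came before it. And because the validation set stays fixed between runs, the flowers can't quietly drop out of it.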