What is Overfitting?
Most exams are designed to test our real understanding by including some questions that we’ve never seen before. To answer those questions well, we need to generalise: to apply our knowledge flexibly in novel situations.
But if all the questions on an upcoming exam had been asked in recent previous years, then you could get a great grade by regurgitating memorised good answers from those previous years without any real understanding. And the more time you spend on memorising, the worse you get at adapting to slightly different questions.
This happens in machine learning too: if your model performs well on the training set but can’t generalise to the test set, it is said to be overfitting. It has over-learned the training set without picking up on the underlying principles that would allow it to generalise.
Overfitting is a common problem in machine learning. Simply put, the tell-tale sign is that your performance on the training set keeps improving as you continue to train, while your performance on held-out data (such as the test set) is actually getting worse.
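To make the gap between training and test performance concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the toy rule y = 2x, the deterministic +1/-1 wobble standing in for noise, and the helper names `memoriser`, `fit_line` and `mse`. It compares a "model" that simply memorises the training answers with one that fits a straight line.

```python
# Toy data: the "true" rule is y = 2x. The training labels carry a
# deterministic +1/-1 wobble standing in for noise (chosen for reproducibility).
train_x = [float(i) for i in range(10)]
train_y = [2 * x + (1 if i % 2 == 0 else -1) for i, x in enumerate(train_x)]

# Unseen "exam questions": new x values, scored against the true rule.
test_x = [i + 0.25 for i in range(10)]
test_y = [2 * x for x in test_x]


def memoriser(x):
    """Return the stored answer for the closest training question."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(predict, xs, ys):
    """Mean squared error of a prediction function on a dataset."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


slope, intercept = fit_line(train_x, train_y)


def line(x):
    return slope * x + intercept


memoriser_train = mse(memoriser, train_x, train_y)
memoriser_test = mse(memoriser, test_x, test_y)
line_train = mse(line, train_x, train_y)
line_test = mse(line, test_x, test_y)

print(f"memoriser: train={memoriser_train:.3f}  test={memoriser_test:.3f}")
print(f"line:      train={line_train:.3f}  test={line_test:.3f}")
```

The memoriser gets a perfect score on the training set because it has stored the wobble along with the answers, but it does much worse on the unseen questions. The straight line never matches the training set exactly, yet its test error is far lower, because it has picked up the underlying rule rather than the quirks of the examples.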
My favourite example of overfitting actually comes from a human, rather than a machine. Solomon Shereshevsky was a famous Russian mnemonist whose brain naturally emphasised the distinctiveness of the world through synaesthesia, making it easy for him to memorise long strings of playing cards or random numbers for his act. But on the flip side, he struggled to recognise the faces of people he knew, which he saw as ‘very changeable’.
This is the same problem we have when an AI facial recognition algorithm has overfit to the training set and can’t generalise to the “test set” of photos taken a year later!
There are many techniques for avoiding overfitting. One solution is to stop training once performance on held-out data stops improving (known as ‘early stopping’). Another is to constrain your algorithms to look for simple solutions (known as ‘regularisation’). We’ll discuss some of these in future articles.
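As a taster, here is a rough sketch of both ideas in plain Python. The function names, the `patience` parameter and the loss values are all hypothetical, made up for illustration rather than taken from a real training run or a particular library. The first function decides when to stop by watching the loss on held-out data. The second shows one common form of regularisation (an L2 or "ridge" penalty), which pulls the model's single weight towards zero.

```python
def early_stopping(val_losses, patience=2):
    """Return (best_epoch, stop_epoch) for a sequence of held-out losses.

    Training stops once the loss has failed to improve on its best value
    for `patience` consecutive epochs; the model saved at `best_epoch`
    is the one to keep.
    """
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    # Never ran out of patience: training used every available epoch.
    return best_epoch, len(val_losses) - 1


def ridge_slope(xs, ys, penalty):
    """Weight w for y = w * x minimising sum((w*x - y)**2) + penalty * w**2."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + penalty)


# Hypothetical held-out losses: they improve at first, then creep back up.
made_up_losses = [1.00, 0.80, 0.65, 0.60, 0.62, 0.66, 0.73]
best_epoch, stop_epoch = early_stopping(made_up_losses, patience=2)
print(f"keep the model from epoch {best_epoch}, stop at epoch {stop_epoch}")

# A larger penalty shrinks the fitted weight towards zero.
xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
for penalty in (0.0, 14.0):
    print(f"penalty={penalty}: w={ridge_slope(xs, ys, penalty)}")
```

With the made-up losses above, the held-out loss bottoms out at epoch 3, so that is the model to keep, even though the training loss would typically carry on falling after that point. In the ridge example, increasing the penalty shrinks the weight, trading a slightly worse fit on the training data for a simpler model.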