'Catastrophic forgetting': What it is, and how to prevent it
The fight against threat actors can often seem to be an uphill battle, and every weapon in a cybersecurity team's arsenal should be used to its greatest potential. One of those weapons is artificial intelligence (AI).
AI and cybersecurity complement each other well: for example, AI can learn patterns from millions of malware samples within a matter of days. This means that malware detection systems can be constantly updated as new malware samples are seen.
However, the threat landscape evolves so quickly that keeping AI-based detection models up to date is a significant challenge. One solution is to simply add new samples to the overall database and retrain the model from scratch on an ever-larger volume of data. But this can slow down the learning process and mean that updates are released less frequently.
A second option is to fine-tune the AI model on a selection of new samples. However, training a model only on new samples can cause it to "forget" the older ones, a phenomenon dubbed "catastrophic forgetting."
Retraining the entire neural network from scratch takes about a week; fine-tuning takes about an hour.
SophosAI wanted to see whether a fine-tuning approach could keep up with the evolving threat landscape, learning new patterns while still remembering older ones and minimising the impact on performance. Researcher Hillary Sanders evaluated a number of update options and has detailed her findings in this Sophos AI blog.
Data rehearsal
Data rehearsal involves taking a small selection of old samples and mixing them in with new, never-before-seen training data. In this way, the model is "reminded" of the information it needs to detect older samples, while at the same time learning to detect the newer ones.
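As a rough illustration, here is a minimal sketch of how a rehearsal set might be assembled. The names `old_samples`, `new_samples` and `train_one_epoch`, and the 10% rehearsal fraction, are hypothetical placeholders rather than details of the SophosAI implementation.

```python
import random

def build_rehearsal_set(old_samples, new_samples, rehearsal_fraction=0.1, seed=42):
    """Mix a small random selection of old samples into the new training data."""
    rng = random.Random(seed)
    n_old = int(len(old_samples) * rehearsal_fraction)
    rehearsal = rng.sample(old_samples, n_old)  # a small "reminder" of the old data
    mixed = new_samples + rehearsal
    rng.shuffle(mixed)                          # interleave old and new samples
    return mixed

# Usage (hypothetical): fine-tune on the mixed set instead of the new samples alone.
# training_set = build_rehearsal_set(old_samples, new_samples)
# train_one_epoch(model, training_set)
```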
Learning rate
This approach involves modifying how quickly the model "learns" by adjusting how much it can change after seeing any given sample. If the learning rate is too fast (in which case the model can change a lot with each sample added), it will only "remember" the most recent samples it has seen. If the learning rate is too slow (the model can change only slightly with each sample added) it takes too long to learn anything. Finding the right trade-off between learning rate, retaining old information and adding new information can be tricky.
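The sketch below shows the idea in PyTorch terms: the same model, but a much smaller learning rate for fine-tuning than for training from scratch, so each new sample nudges the weights only slightly. The specific model and learning-rate values are illustrative assumptions, not the ones SophosAI used.

```python
import torch

model = torch.nn.Linear(256, 2)  # stand-in for a malware detection network

# Training from scratch typically tolerates a larger learning rate...
scratch_optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# ...while fine-tuning uses a much smaller one, so older knowledge is
# overwritten more slowly at the cost of learning new patterns more gradually.
finetune_optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
```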
Elastic weight consolidation
This approach uses the "old" model to keep a "new" model grounded. If the new model starts to "forget," the old model acts as a spring and pulls it right back — hence, elastic weight consolidation. This approach is still largely academic and when testing it SophosAI had to make some adjustments to accommodate real-world needs. The advantage is that it means models can be trained on new samples only.
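In standard formulations of elastic weight consolidation, the "spring" is a quadratic penalty that discourages weights important to the old model from drifting. The following is a minimal sketch of that penalty; `fisher` (a per-parameter importance estimate), `old_params` (a snapshot of the previous model's weights), `strength` and `detection_loss` are assumed, illustrative names and do not reflect SophosAI's adjustments for real-world use.

```python
import torch

def ewc_penalty(model, old_params, fisher, strength=100.0):
    """Quadratic penalty pulling important weights back toward the old model."""
    penalty = 0.0
    for name, param in model.named_parameters():
        # Weights that mattered for detecting old samples (high Fisher value)
        # are penalised more heavily for drifting from their previous values.
        penalty = penalty + (fisher[name] * (param - old_params[name]) ** 2).sum()
    return (strength / 2.0) * penalty

# During fine-tuning on new samples only (hypothetical usage):
# loss = detection_loss(model, new_batch) + ewc_penalty(model, old_params, fisher)
```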
Conclusion
When comparing all three approaches, it is important to consider the trade-offs between performance, computation costs, and data maintenance costs. Elastic weight consolidation and changing the learning rate do not require older samples, but they are not the highest performers either.
Meanwhile, data rehearsal delivers the strongest performance when compared against a model trained from scratch on both old and new data, while also reducing the overall cost of computation and data storage.
In short, data rehearsal proved the most effective approach to addressing catastrophic forgetting, in terms of both detection rate and cost.
To learn more about how Sophos uses AI to combat cyber threats, visit Sophos AI.