When researchers at JFrog, a software supply chain company, examined artificial intelligence (AI) and machine learning models uploaded to Hugging Face earlier this year, they found roughly 100 malicious models. The discovery sheds light on an often-underestimated category of cybersecurity problems: data poisoning and manipulation.
Data poisoning targets the information used to train AI and machine learning models. This type of attack is unusual in cybersecurity and can be difficult, sometimes impossible, to detect or counter. Manipulating the training data of large language models (LLMs) like ChatGPT is relatively easy and does not require traditional hacking.
Wrong results
Data poisoning seeks to influence a model's behavior by corrupting the data it learns from; a related attack manipulates the data sent to the already-trained model to trick it into producing erroneous results. These are two distinct types of attack: one before the model is deployed, the other after. Both are extremely difficult to detect and prevent.
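To make the training-time case concrete, here is a minimal, hypothetical sketch in Python: an attacker who can flip the labels of a fraction of the training rows degrades the model trained on them. The toy dataset, the logistic regression model, and the 20 percent flip rate are illustrative assumptions, not a description of any real incident.

```python
# Illustrative sketch of label-flipping poisoning on a toy dataset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Toy binary classification data standing in for "training data".
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The attacker flips the labels of 20% of the training rows before training.
poisoned = y_train.copy()
idx = rng.choice(len(poisoned), size=int(0.2 * len(poisoned)), replace=False)
poisoned[idx] = 1 - poisoned[idx]

clean_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
poisoned_model = LogisticRegression(max_iter=1000).fit(X_train, poisoned)

# The poisoned model typically scores noticeably worse on held-out data.
print("clean accuracy:   ", clean_model.score(X_test, y_test))
print("poisoned accuracy:", poisoned_model.score(X_test, y_test))
```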
Manipulating training data is not new. Since the early days of machine learning, researchers have shown how subtle attacks can lead a model to give incorrect answers with high confidence.
Models that poison each other
Over time, generative AI models may even "poison" themselves as they scour the internet, because their outputs become training data for future models, a phenomenon known as "model collapse."
The problem is compounded by how difficult it is to reproduce the results of AI models, given the enormous datasets used for training. Researchers and data scientists may not fully understand what goes into a model and what comes out of it, making malicious code even harder to detect and trace.
Faced with this reality, ignoring the risks of data poisoning and manipulation can encourage attackers to develop stealthy exploits in AI software.
Depending on the attackers’ goals, consequences can include malicious code execution, new attack vectors for phishing campaigns, and misclassified model outputs leading to unexpected behavior.
Protect yourself from attacks
To protect against data poisoning attacks, several techniques are recommended, particularly during the training phase and in the algorithms themselves.
The Open Worldwide Application Security Project (OWASP) recommends, in its "Top 10 for Large Language Model Applications" list, paying attention to the data supply chain, whether internal or external. It is crucial to continuously verify data sources throughout the pre-training, fine-tuning, and integration phases, and to look for biases or anomalies.
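One practical way to verify a data supply chain, sketched below under assumed file names and layout, is to check the cryptographic digest of each dataset file against a trusted manifest before training begins; a mismatch signals that the data may have been altered at its source or in transit.

```python
# Minimal sketch: compare the SHA-256 digest of each dataset file against a
# trusted manifest. The manifest path and file layout are assumptions.
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_datasets(data_dir: str, manifest_file: str) -> list[str]:
    """Return the names of files whose digests do not match the manifest."""
    # Manifest format assumed: {"file.csv": "<hex digest>", ...}
    manifest = json.loads(Path(manifest_file).read_text())
    mismatches = []
    for name, expected in manifest.items():
        if sha256_of(Path(data_dir) / name) != expected:
            mismatches.append(name)
    return mismatches

if __name__ == "__main__":
    bad = verify_datasets("data/", "manifest.json")
    if bad:
        raise SystemExit(f"Refusing to train: tampered or unexpected files: {bad}")
```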
OWASP also recommends "cleaning" data with outlier and anomaly detection methods to keep hostile data out of the fine-tuning process.
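As a rough illustration of that recommendation, the sketch below filters anomalous rows out of a feature matrix with an off-the-shelf outlier detector before the data reaches fine-tuning. The choice of IsolationForest and the one percent contamination rate are assumptions made for the example, not part of OWASP's guidance.

```python
# Minimal sketch of data "cleaning" via outlier detection before fine-tuning.
import numpy as np
from sklearn.ensemble import IsolationForest

def drop_outliers(X: np.ndarray, y: np.ndarray, contamination: float = 0.01):
    """Remove rows flagged as anomalous before training on (X, y)."""
    detector = IsolationForest(contamination=contamination, random_state=0)
    flags = detector.fit_predict(X)  # -1 = anomaly, 1 = inlier
    keep = flags == 1
    return X[keep], y[keep]

# Example: filter a feature matrix before it enters the tuning pipeline.
X = np.random.default_rng(0).normal(size=(1000, 16))
y = np.random.default_rng(1).integers(0, 2, size=1000)
X_clean, y_clean = drop_outliers(X, y)
print(f"kept {len(X_clean)} of {len(X)} rows")
```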
Without trust and reliability, the greatest technological innovations risk losing their momentum.
Organizations must take a holistic approach to preventing threats in AI code generation, treating the entire ecosystem and supply chains around generative AI, LLMs, and related tooling as part of the overall threat landscape.
Sam Curry is Global Vice President and CISO in Residence at Zscaler