Episode 46 — Prevent Data Poisoning: Supply Chain Controls for Training Data Integrity (Domain 3)
Data poisoning is a long-term threat in which an attacker corrupts training data to plant "backdoors" or systemic biases in the resulting model, a key concern in Domain 3. This episode explores the supply chain risks associated with training data and emphasizes the need for strict controls over data sources and ingestion pipelines. For the AAIR certification, you must understand how to verify the integrity of large-scale datasets, especially those sourced from third parties or the public web. We discuss cryptographic hashing to detect tampering, anomaly detection to flag suspicious samples in training sets, and data lineage to trace the provenance of every sample. Preventive measures include "gold-set" comparisons, where a model's performance on a trusted dataset is compared against its performance on the potentially poisoned set; an unexplained gap between the two is a signal to investigate. By securing the data supply chain, risk professionals help ensure that the model's foundational "knowledge" is accurate and has not been tampered with to favor an attacker’s objectives or to cause hidden failures in production.

Produced by BareMetalCyber.com, where you’ll find more cyber audio courses, books, and information to strengthen your educational path. To stay up to date with the latest news, visit DailyCyber.News for a newsletter you can use and a daily podcast you can commute with.
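As an illustration of the cryptographic hashing control mentioned in the episode description, the following is a minimal Python sketch and not a definitive implementation. It assumes a manifest of SHA-256 digests was recorded when the dataset was first vetted and is stored somewhere the attacker cannot modify. The function names (`build_manifest`, `verify_against_manifest`) and the sample IDs are hypothetical and chosen only for this example.

```python
import hashlib
import hmac
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a sample as a hex string."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(samples: Dict[str, bytes]) -> Dict[str, str]:
    """Record one digest per sample at the time the data is vetted.

    Assumption: the manifest is then stored or signed out of band,
    separately from the dataset itself.
    """
    return {sample_id: sha256_hex(content) for sample_id, content in samples.items()}


def verify_against_manifest(
    samples: Dict[str, bytes], manifest: Dict[str, str]
) -> Dict[str, List[str]]:
    """Compare received samples with the trusted manifest.

    Reports samples whose content changed, samples that disappeared,
    and samples that were never part of the vetted set.
    """
    modified: List[str] = []
    missing: List[str] = []
    for sample_id, expected_digest in manifest.items():
        if sample_id not in samples:
            missing.append(sample_id)
        elif not hmac.compare_digest(sha256_hex(samples[sample_id]), expected_digest):
            modified.append(sample_id)
    unexpected = sorted(set(samples) - set(manifest))
    return {
        "modified": sorted(modified),
        "missing": sorted(missing),
        "unexpected": unexpected,
    }


# Hypothetical vetted dataset and the manifest recorded for it.
trusted = {
    "s1": b"the cat sat on the mat",
    "s2": b"invoice approved",
    "s3": b"hello world",
}
manifest = build_manifest(trusted)

# Simulate what arrives at the ingestion pipeline after tampering.
received = dict(trusted)
received["s2"] = b"invoice approved TRIGGER"  # content altered
del received["s3"]                            # sample removed
received["s4"] = b"injected sample"           # sample injected

report = verify_against_manifest(received, manifest)
print(report)
# {'modified': ['s2'], 'missing': ['s3'], 'unexpected': ['s4']}
```

A check like this only detects changes made after the manifest was created. It cannot tell whether the data was already poisoned at the source, which is why the description pairs hashing with anomaly detection, data lineage, and gold-set comparisons.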