What are Adversarial Attacks?
Adversarial attacks are techniques designed to fool machine learning models by introducing carefully crafted, often human-imperceptible perturbations to input data. These attacks exploit vulnerabilities in how models process information, causing them to make incorrect predictions or classifications.
Types of Adversarial Attacks
Evasion Attacks
- Modify inputs at inference time, leaving the model itself untouched
- The most common attack type in practice
- Examples: adversarial images and audio (see the sketch after this list)
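To make the idea concrete, here is a minimal evasion sketch against a toy linear classifier. The weights, input, and perturbation budget are all illustrative, not drawn from any real model.

```python
import numpy as np

# Toy binary classifier: predicts class 1 when w.x + b > 0.
# These weights are illustrative only.
w = np.array([1.0, -2.0, 0.5])
b = 0.1

def predict(x):
    return int(w @ x + b > 0)

x = np.array([0.5, 0.1, 0.2])                    # clean input
print("clean prediction:", predict(x))           # -> 1

# Evasion: push every feature against the model's score,
# staying within an L-infinity budget epsilon.
epsilon = 0.4
x_adv = x - epsilon * np.sign(w)
print("adversarial prediction:", predict(x_adv))       # -> 0
print("max perturbation:", np.max(np.abs(x_adv - x)))  # -> 0.4
```

Under an L-infinity budget, the worst-case perturbation moves every feature by epsilon against the score; this is the same intuition FGSM applies to neural networks (see the white-box sketches later in this section).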
Poisoning Attacks
- Corrupt training data
- Degrade model performance
- Insert backdoors (sketched below)
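A common poisoning variant is a backdoor attack. The sketch below stamps a trigger patch onto a small fraction of synthetic, illustrative training data and relabels those samples to an attacker-chosen class; the shapes, poison fraction, and trigger pattern are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.random((1000, 28, 28))   # stand-in for image data
y_train = rng.integers(0, 10, 1000)

TARGET_CLASS = 7        # attacker-chosen label (illustrative)
POISON_FRACTION = 0.05  # fraction of the training set to corrupt

n_poison = int(POISON_FRACTION * len(X_train))
idx = rng.choice(len(X_train), n_poison, replace=False)

# Trigger: a bright 3x3 patch in the bottom-right corner.
X_train[idx, -3:, -3:] = 1.0
y_train[idx] = TARGET_CLASS

# A model trained on this data tends to associate the patch with
# the target class; at test time, adding the same patch to any
# input steers the prediction toward class 7.
```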
Model Extraction
- Steal model functionality
- Query-based attacks
- Reverse engineering (see the surrogate-training sketch below)
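A rough sketch of query-based extraction: the attacker labels self-chosen inputs with the victim's predictions, then fits a local surrogate to them. The `victim_predict` function below is a stand-in for a remote model API, and the hidden linear victim is purely illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def victim_predict(X):
    # Stand-in for a black-box endpoint; a real attack would send
    # queries over the network instead.
    secret_w = np.array([2.0, -1.0, 0.5, 0.0])
    return (X @ secret_w > 0).astype(int)

rng = np.random.default_rng(0)
X_queries = rng.normal(size=(5000, 4))  # attacker-chosen query set
y_stolen = victim_predict(X_queries)    # labels harvested via queries

surrogate = LogisticRegression().fit(X_queries, y_stolen)
print("agreement with victim:",
      (surrogate.predict(X_queries) == y_stolen).mean())
```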
Model Inversion
- Extract training data
- Privacy violations
- Related: membership inference, which tests whether a specific record was in the training set (sketched below)
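A baseline membership-inference sketch: overfit models tend to be more confident on points they were trained on, so thresholding the top predicted probability already separates members from non-members to some degree. The model, data, and threshold below are illustrative; real attacks typically calibrate the threshold with shadow models.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 5))
y_train = (X_train[:, 0] > 0).astype(int)
target_model = RandomForestClassifier().fit(X_train, y_train)

def infer_membership(x, threshold=0.95):
    # Overfit models give near-1.0 confidence on training points;
    # the threshold here is an assumption, not a tuned value.
    return target_model.predict_proba(x.reshape(1, -1)).max() > threshold

print("train point flagged:", infer_membership(X_train[0]))
print("fresh point flagged:", infer_membership(rng.normal(size=5)))
```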
Attack Methods
White-Box Attacks
The attacker has full access to the model, including its architecture, weights, and gradients (FGSM and PGD are sketched below):
- FGSM (Fast Gradient Sign Method)
- PGD (Projected Gradient Descent)
- C&W (Carlini-Wagner) attack
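Sketches of FGSM and PGD in PyTorch, assuming a differentiable `model` and a suitable `loss_fn` such as cross-entropy; `epsilon` is the L-infinity budget, and the PGD step size and iteration count are conventional defaults, not prescribed values.

```python
import torch

def fgsm(model, loss_fn, x, y, epsilon):
    # Single step in the direction of the sign of the input gradient.
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)
    loss.backward()
    return (x + epsilon * x.grad.sign()).detach()

def pgd(model, loss_fn, x, y, epsilon, alpha=None, steps=40):
    # Iterated small steps, projected back into the epsilon-ball.
    alpha = alpha if alpha is not None else epsilon / 10
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        loss.backward()
        with torch.no_grad():
            x_adv = x_adv + alpha * x_adv.grad.sign()
            x_adv = x + (x_adv - x).clamp(-epsilon, epsilon)
        x_adv = x_adv.detach()
    return x_adv
```

FGSM takes one signed-gradient step; PGD iterates smaller steps with projection, which generally yields stronger attacks. In practice you would also clamp `x_adv` to the valid input range (e.g. [0, 1] for images).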
Black-Box Attacks
The attacker has no internal access and can only observe the model's outputs (a transfer attack is sketched below):
- Transfer attacks
- Query-based attacks
- Boundary attacks
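A transfer-attack sketch, reusing the `fgsm` function above: examples crafted against a locally trained surrogate often fool a separate victim model trained on similar data. The two linear models here are untrained placeholders; in a real attack both would be trained classifiers.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the attacker's surrogate and the victim.
surrogate = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))
victim = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))
loss_fn = nn.CrossEntropyLoss()

x = torch.rand(1, 1, 28, 28)  # illustrative input
y = torch.tensor([3])

# Craft on the surrogate (white-box), then replay on the victim.
x_adv = fgsm(surrogate, loss_fn, x, y, epsilon=0.1)
with torch.no_grad():
    print("victim on clean:", victim(x).argmax(1).item())
    print("victim on adversarial:", victim(x_adv).argmax(1).item())
```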
Defense Strategies
- Adversarial training (a sketch follows this list)
- Input preprocessing
- Defensive distillation
- Certified defenses
- Ensemble methods
- Detection mechanisms
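Of these, adversarial training is the most widely used. A minimal sketch, reusing the `fgsm` function above; `model`, `loader`, `optimizer`, and `epsilon` are assumptions, and stronger variants generate the inner examples with PGD instead of single-step FGSM.

```python
def adversarial_training_epoch(model, loader, loss_fn, optimizer,
                               epsilon=0.1):
    model.train()
    for x, y in loader:
        # Generate adversarial examples on the fly for each batch.
        x_adv = fgsm(model, loss_fn, x, y, epsilon)
        optimizer.zero_grad()  # clear gradients left by the attack
        # Train on the adversarial batch; some variants mix in the
        # clean batch as well.
        loss = loss_fn(model(x_adv), y)
        loss.backward()
        optimizer.step()
```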