AdvSecureNet Documentation


AdvSecureNet is a Python library for Machine Learning Security, developed by Melih Catal at the University of Zurich as part of his Master’s Thesis under the supervision of Prof. Dr. Manuel Günther. The main focus of the library is on adversarial attacks and defenses for vision tasks, with plans to extend support to other tasks such as natural language processing.

The library provides tools to generate adversarial examples, evaluate the robustness of machine learning models against adversarial attacks, and train robust machine learning models. Built on top of PyTorch, it is designed to be modular and extensible, making it easy to run experiments with different configurations. AdvSecureNet supports multi-GPU setups to enhance computational efficiency and fully supports both CLI and API interfaces, along with external YAML configuration files, which enables comprehensive testing and evaluation and facilitates the sharing and reproducibility of experiments.

Features

Adversarial Attacks: AdvSecureNet supports a diverse range of evasion attacks on computer vision tasks, including gradient-based, decision-based, single-step, iterative, white-box, black-box, targeted, and untargeted attacks, enabling comprehensive testing and evaluation of neural network robustness against various types of adversarial examples.

Adversarial Defenses: The toolkit includes adversarial training and ensemble adversarial training. Adversarial training incorporates adversarial examples into the training process to improve model robustness, while ensemble adversarial training uses multiple models or attacks for a more resilient defense strategy.

Evaluation Metrics: AdvSecureNet supports metrics like accuracy, robustness, transferability, and similarity. Accuracy measures performance on clean data, robustness assesses resistance to attacks, transferability evaluates how well adversarial examples deceive different models, and similarity quantifies perceptual differences using PSNR and SSIM.

Multi-GPU Support: AdvSecureNet is optimized for multi-GPU setups, enhancing the efficiency of training, evaluation, and adversarial attack generation, especially for large models and datasets or complex methods. By utilizing multiple GPUs in parallel, AdvSecureNet aims to reduce computational time, making it ideal for large-scale experiments and deep learning models.

CLI and API Interfaces: AdvSecureNet offers both CLI and API interfaces. The CLI allows for quick execution of attacks, defenses, and evaluations, while the API provides advanced integration and extension within user applications.

External Configuration Files: The toolkit supports YAML configuration files for easy parameter tuning and experimentation. This feature enables users to share experiments, reproduce results, and manage setups effectively, facilitating collaboration and comparison.

Built-in Models and Datasets Support: AdvSecureNet supports all PyTorch vision library models and well-known datasets like CIFAR-10, CIFAR-100, MNIST, FashionMNIST, and SVHN. Users can start without additional setup, but the toolkit also allows for custom datasets and models, offering flexibility for various research and applications.

Automated Adversarial Target Generation: AdvSecureNet can automatically generate adversarial targets for targeted attacks, simplifying the process and ensuring consistent and reliable results. As a user, you don’t need to manually specify targets. This feature is especially useful for targeted attacks on large datasets. You can also provide custom targets if you prefer.

Supported Attacks

  • Fast Gradient Sign Method (FGSM)

  • Carlini and Wagner (C&W)

  • Projected Gradient Descent (PGD)

  • DeepFool

  • Decision Boundary

  • Layerwise Origin-Target Synthesis (LOTS)

Supported Defenses

  • Adversarial Training

  • Ensemble Adversarial Training

Supported Evaluation Metrics

  • Benign Accuracy

  • Attack Success Rate

  • Transferability

  • Perturbation Distance

  • Robustness Gap

  • Perturbation Effectiveness

Similarity Metrics

  • PSNR (Peak Signal-to-Noise Ratio)

  • SSIM (Structural Similarity Index Measure)

Why AdvSecureNet?

  • Research-Oriented: Easily run and share experiments with different configurations using YAML configuration files.

  • Supports Various Attacks and Defenses: Experiment with a wide range of adversarial attacks and defenses.

  • Supports Any PyTorch Model: Use pre-trained models or your own PyTorch models with the library.

  • Supports Various Evaluation Metrics: Evaluate the robustness of models, performance of adversarial attacks, and defenses.

  • Benign Use Case Support: Train and evaluate models on benign data.

  • Native Multi-GPU Support: Efficiently run large-scale experiments utilizing multiple GPUs.

  • CLI and API Support: Use the command line interface for quick experiments or the Python API for advanced integration.

  • Automated Adversarial Target Generation: Simplify targeted attacks by letting the library generate targets automatically.

  • Active Maintenance: Regular updates and improvements to ensure the library remains relevant and useful.

  • Comprehensive Documentation: Detailed documentation to help you get started and make the most of the library.

  • Open Source: Free and open-source under the MIT license, allowing you to use, modify, and distribute the library.

Comparison with Other Libraries

AdvSecureNet stands out among adversarial machine learning toolkits like IBM ART, AdverTorch, SecML, FoolBox, ARES, and CleverHans. Key advantages include:

  • Active Maintenance: Ensures ongoing support and updates.

  • Comprehensive Training Support: One of the few toolkits supporting both adversarial and ensemble adversarial training.

  • Multi-GPU Support: The first toolkit with native multi-GPU support for attacks, defenses, and evaluations, ideal for large-scale experiments.

  • Flexible Interfaces: The first toolkit to fully support CLI and API usage, together with external YAML configuration files, across all features, which aids reproducibility.

  • Performance: AdvSecureNet excels in performance, significantly reducing execution times on multi-GPU setups.

[Figure] Feature comparison table and performance comparison with other toolkits.

[1] SecML supports attacks from CleverHans and FoolBox.
[2] This feature is only available for adversarial training.

Installation

You can install the library using pip:

pip install advsecurenet

Or install it from source:

git clone https://github.com/melihcatal/advsecurenet.git
cd advsecurenet
pip install -e .

Usage

The library can be used as a command line tool or as an importable Python package.

Command Line Tool

Use the advsecurenet command to interact with the library. Use advsecurenet --help to see available commands and options. It is recommended to use YAML configuration files to run experiments. You can list the available configuration options using advsecurenet utils configs list and generate a template configuration file using advsecurenet utils configs get -c <config_name> -o <output_file>.

Running an adversarial attack:

advsecurenet attack -c ./fgsm.yml

Running an adversarial defense:

advsecurenet defense adversarial-training -c ./adv_training.yml

Running an evaluation:

advsecurenet evaluate benign -c ./evaluate_benign.yml

or

advsecurenet evaluate adversarial -c ./evaluate_adversarial.yml

Python Package

The library can also be imported as a Python package via the advsecurenet module. The available modules and classes are described in the API documentation.

[Figure] Usage example of AdvSecureNet demonstrating the equivalence between a YAML configuration file used with a command-line interface (CLI) command and the corresponding Python API implementation.

Examples

Examples of different use cases can be found in the examples directory.

Architecture

The high-level architecture of the toolkit is shown in the figure below.

[Figure] High-level architecture of the advsecurenet package.

[Figure] High-level architecture of the CLI.

The toolkit is designed to be modular and extensible. The CLI and the Python API are implemented separately; however, they share the same core components and follow the same package structure for consistency. Tests are implemented for both the CLI and the Python API to ensure the correctness of the implementation, and they follow the same structure as well. New attacks, defenses, and evaluation metrics can be added by implementing the corresponding classes and registering them in the corresponding registries.

Testing

The library is tested using pytest and coverage is measured using coverage. You can run the tests using the following command:

pytest tests/

Some tests take longer to run. To speed up the tests, you can use the --device option to run tests on a specific device (e.g., --device cuda:0).

pytest tests/ --device cuda:0

Tests are categorized into the following groups:

  • cli: tests for the command line interface

  • advsecurenet: tests for the Python API

  • essential: tests for essential functionality (e.g., smoke and unit tests)

  • comprehensive: tests for comprehensive functionality (e.g., integration tests)

  • extended: tests for extended functionality (e.g., performance tests, security tests)

You can run tests for a specific group using the -m option and the group name. For example, to run tests for the CLI:

pytest tests/ -m cli

CI/CD pipelines are set up to run tests automatically on every push and pull request. You can see the status of the tests in the badges at the top of the README.

Quality Assurance

AdvSecureNet is designed with a strong emphasis on code quality and maintainability. The toolkit follows best practices in software engineering and ensures high standards through the following measures:

  • PEP 8 Compliance: The codebase adheres to PEP 8 guidelines, the de facto coding standard for Python. We use Black for automatic code formatting to maintain consistent style and readability.

  • Static Code Analysis: We employ Pylint for static code analysis and MyPy for type checking. These tools help catch potential errors and enforce coding standards.

  • Code Quality and Complexity: Tools like SonarQube and Radon provide insights into code quality and complexity. These tools are integrated into our CI/CD pipelines to ensure that the code remains clean and maintainable.

  • Comprehensive Testing: The project features a robust testing suite, ensuring that all components are thoroughly tested. This helps in maintaining the reliability and stability of the toolkit.

  • Continuous Integration/Continuous Deployment (CI/CD): CI/CD pipelines are set up to automate the testing, analysis, and deployment processes. This ensures that any changes to the codebase are automatically verified for quality and correctness before being merged.

  • Documentation: Comprehensive documentation is available on GitHub Pages, providing detailed guidance on installation, usage, and API references. This ensures that users and contributors can easily understand and work with the toolkit.

By adhering to these practices and leveraging these tools, AdvSecureNet maintains a high standard of code quality, ensuring a reliable and user-friendly experience for developers and researchers alike.

License

This project is licensed under the terms of the MIT license. See LICENSE for more details.

How to Contribute?

Thank you for considering contributing to AdvSecureNet. We welcome all contributions, including new features, bug fixes, and documentation improvements. To make the process seamless, we have prepared a guide for different types of contributions.

If you find a bug in the project, please open an issue on the GitHub repository. When reporting a bug, include the following details:

  • A detailed description of the issue.

  • Steps to reproduce the issue.

  • The version of the project you are using.

  • The operating system you are running.

  • The Python version.

  • If applicable:

    • The CUDA version.

  • Any other relevant information.

An example of a good bug report would be:

Description: [Detailed description of the issue]

Steps to Reproduce:

1.  [Step one]
2.  [Step two]
3.  [Step three]

Version: [Project version]
OS: [Operating system]
Python: [Python version]
CUDA: [CUDA version if applicable]
Other: [Any other relevant information]

AdvSecureNet has two main components: the API and the CLI. You can contribute to either component. The high-level process is the same:

  1. Fork the repository.

  2. Create a new branch for your changes.

  3. Make your changes.

  4. Write tests for your changes.

  5. Run the tests.

  6. Document your changes.

  7. Submit a pull request.

  8. Stay tuned for feedback.

Code Quality Standards

To ensure code quality, follow these standards:

  • Format code using black.

  • Lint code using pylint.

  • Test code using pytest.

  • Document code using sphinx.

  • Type-check code using mypy.

  • Have your code reviewed by at least one other contributor before merging.

AdvSecureNet uses Gitflow as its branching model:

  • Develop new features in a feature branch.

  • Develop bug fixes in a hotfix branch.

  • The main branch is reserved for stable releases.

  • The develop branch is used for development.

When submitting a pull request, target the develop branch. For more information on Gitflow, refer to this guide.

Creating a New Feature

You have three options for creating a new feature:

  • A new attack

  • A new defense

  • A new evaluation metric / evaluator

Depending on the feature type and the target component (API or CLI), the process will differ. Refer to the following sections for more information.

The advsecurenet package contains an attacks module with various submodules based on attack types (e.g., gradient-based, decision-based). If your attack does not fit into any existing submodule, feel free to create a new one. Currently, AdvSecureNet supports evasion attacks on computer vision models only. All attacks should inherit from the AdversarialAttack class, an abstract base class that defines the interface for all attacks. The AdversarialAttack class is defined in the attacks.base module. Additionally, each attack should have its own configuration class, which defines the parameters of the attack. This approach keeps the attack class clean and makes it easier to use the attack in the CLI. The configuration class should be defined in the shared.types.configs.attack_configs folder and should inherit from the AttackConfig class. The AttackConfig class, also defined in the shared.types.configs.attack_configs folder, contains the device attribute, specifying the device on which the attack should run, and a flag indicating whether the attack is targeted or untargeted.

Follow these steps to create a new attack:

  1. Create a new submodule in the attacks module.

  2. Create a class for your attack that inherits from the AdversarialAttack class.

  3. Create a configuration class for your attack that inherits from the AttackConfig class.

  4. Implement the __init__ method, accepting the configuration as an argument.

  5. Implement the attack method, taking the model, input, and target as arguments and returning the adversarial example.

Example:

import torch
from dataclasses import dataclass, field

from advsecurenet.attacks.base import AdversarialAttack
from advsecurenet.models.base_model import BaseModel
from advsecurenet.shared.types.configs.attack_configs import AttackConfig

@dataclass(kw_only=True)
class RandomNoiseAttackConfig(AttackConfig):
    epsilon: float = field(default=0.1)

class RandomNoiseAttack(AdversarialAttack):
    def __init__(self, config: RandomNoiseAttackConfig):
        self.epsilon = config.epsilon

    def attack(self, model: BaseModel, input: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # Add random Gaussian noise scaled by epsilon to the input.
        noise = torch.randn_like(input) * self.epsilon
        return input + noise

Using a dataclass for the configuration class makes it easy to create instances of the class with default values. It also prevents users from passing invalid arguments to the attack. Using kw_only=True ensures that users have to pass the arguments by keyword, which makes the code more readable and less error-prone. Additionally, it facilitates the future extension of the configuration class with new parameters without breaking the existing code.

To use your attack in the CLI, follow these additional steps:

  1. Create Default YAML Configuration Files

1.1. For Adversarial Training Command:

  • Create a configuration file for the attack parameters in the cli/configs/attacks/base folder.

  • Create another configuration file for the attack itself (including other necessary parameters) in the cli/configs/attacks folder.

This separation makes it easier to use the attack both in the adversarial training command and as a standalone attack.

Example of a configuration file for the attack parameters:

# Description: Base configuration file for the attack parameters. Located in the cli/configs/attacks/base folder
target_parameters: !include ../shared/attack_target_config.yml # Targeted attack configs
attack_parameters:
  epsilon: 0.3 # The epsilon value to be used for the FGSM attack. The higher the value, the larger the perturbation

Example of a configuration file for the attack:

# Description: Configuration file for the attack. Located in the cli/configs/attacks folder

model: !include ../shared/model_config.yml
dataset: !include ./shared/attack_dataset_config.yml
dataloader: !include ../shared/dataloader_config.yml
device: !include ../shared/device_config.yml
attack_procedure: !include ./shared/attack_procedure_config.yml

# Attack Specific Configuration
attack_config: !include ./base/attack_base_config.yml

The first file contains the parameters specific to the attack, while the second file contains the parameters common to all attacks. The first file is used when another command wants to use the attack as a parameter, which means that command takes care of the essential parameters like the model, dataset, etc. The second file is used when the attack is used as a standalone command, which means that the attack command needs to configure the necessary parameters to prepare the environment for the attack and then run the attack.

  2. Create a configuration dataclass for your attack in the cli/shared/types/attack/attacks folder.

  3. Update the attack mapping in the cli/shared/utils/attack_mappings.py file to include your attack.

  4. Finally, update the cli/commands/attack/commands.py module to include your attack as a subcommand.

There is no base class for defenses in AdvSecureNet. However, each defense should have its own configuration class similar to attacks. Follow these steps:

  1. Create a new submodule in the defenses module.

  2. Create a configuration class for your defense.

  3. Create a class for your defense and use the configuration class you created to initialize the defense in the __init__ method.

Unlike attacks, which are executed by the attacker, there is no defender module to run defenses, so the defense logic lives in the cli/logic/defense folder. To add your defense to the CLI:

  1. Create a default YAML configuration file for your defense in the cli/configs/defenses folder.

  2. Create a configuration dataclass for your defense in the cli/shared/types/defense/defenses folder.

  3. Implement the defense logic in the cli/logic/defense folder.

  4. Add the defense to the cli/commands/defense/commands.py module.

Evaluation metrics are used to assess the performance of attacks, defenses or models. The advsecurenet package includes an evaluation module that contains all the evaluation metrics. Each evaluation metric should inherit from the BaseEvaluator class, an abstract base class that defines the interface for all evaluation metrics. The evaluators are context managers, allowing them to be used in a with statement to automatically clean up any resources they use. The BaseEvaluator class is defined in the evaluation.base_evaluator module.

  1. Create a new class for your evaluation metric that inherits from the BaseEvaluator class in the evaluation.evaluators folder.

  2. Implement the update method of your evaluation metric class. This method defines how the evaluation metric should be updated when a new sample is evaluated.

  3. Implement the get_results method of your evaluation metric class. This method should return the final result of the evaluation metric.

  4. If the evaluator is an adversarial evaluator, update the advsecurenet.shared.adversarial_evaluators module to include your evaluator.

Evaluation metrics are automatically available in the CLI once the API is updated. No additional steps are needed. This is because the CLI uses the attacker to run evaluation metrics, and they are not run independently.
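
As a rough sketch of these steps, the evaluator below tracks the mean L2 perturbation per sample. The import path follows the module names above, but the update signature shown here is only an illustrative assumption; mirror the signature used by the existing evaluators in evaluation.evaluators.

import torch

from advsecurenet.evaluation.base_evaluator import BaseEvaluator

class MeanPerturbationEvaluator(BaseEvaluator):
    """Illustrative evaluator tracking the mean L2 perturbation per sample."""

    def __init__(self):
        self.total_distance = 0.0
        self.num_samples = 0

    def update(self, original_images: torch.Tensor, adversarial_images: torch.Tensor) -> None:
        # Accumulate per-sample L2 distances for the current batch.
        distances = (adversarial_images - original_images).flatten(start_dim=1).norm(p=2, dim=1)
        self.total_distance += distances.sum().item()
        self.num_samples += distances.numel()

    def get_results(self) -> float:
        # Final metric: mean L2 perturbation over all evaluated samples.
        return self.total_distance / self.num_samples if self.num_samples else 0.0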

Documentation

Improving documentation is always appreciated. If you find any part of the codebase that is not well-documented or could be improved, please open a pull request with your changes. We value any help in making the documentation more comprehensive and easier to understand.

Evaluation

Attack Success Rate (ASR) is a fundamental metric for evaluating a model’s adversarial robustness. It is the percentage of adversarial examples that are successfully misclassified by the model. In the context of adversarial attacks, success is defined differently for targeted and untargeted attacks. For targeted attacks, success means that the adversarial example is classified as a specific target class. For untargeted attacks, success means that the adversarial example is classified as any class other than the true class. The higher the ASR, the more vulnerable the model is to adversarial attacks. The ASR is given by the equation below, and a small computational sketch follows the definitions.

\[ASR = \frac{\text{Number of Successful Adversarial Examples}}{\text{Total Number of Adversarial Examples}}\]

where:

  • Number of Successful Adversarial Examples refers to the count of adversarial examples that lead to a successful attack, where success is defined as:

    • In targeted attacks, the adversarial example is classified as the target class, \(y_{\text{target}}\).

    • In untargeted attacks, the adversarial example is classified as any class other than the true class, \(y_{\text{true}}\).

  • Total Number of Adversarial Examples is the total number of adversarial examples generated from inputs where the model initially makes a correct prediction.
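
A minimal PyTorch-style sketch of this computation (independent of AdvSecureNet’s own evaluators; the function name and arguments are illustrative):

import torch

def attack_success_rate(clean_preds, adv_preds, true_labels, target_labels=None):
    # Only samples the model originally classified correctly are counted.
    valid = clean_preds == true_labels
    if target_labels is not None:
        # Targeted: success means the adversarial example is classified as the target class.
        success = (adv_preds == target_labels) & valid
    else:
        # Untargeted: success means any prediction other than the true class.
        success = (adv_preds != true_labels) & valid
    n_valid = valid.sum().item()
    return success.sum().item() / n_valid if n_valid else 0.0

preds_clean = torch.tensor([0, 1, 2, 3])
preds_adv = torch.tensor([0, 2, 2, 1])
labels = torch.tensor([0, 1, 2, 3])
print(attack_success_rate(preds_clean, preds_adv, labels))  # 0.5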

The evaluation of adversarial transferability is facilitated by the transferability_evaluator class within the advsecurenet.evaluation.evaluators module. This evaluator is designed to assess the effectiveness of adversarial examples, originally generated for a source model, in deceiving various target models.

The evaluator is initialized with a list of target models. During the evaluation phase, the update method calculates whether the adversarial examples successfully mislead both the source and the target models. Success in targeted attacks is determined by the adversarial example being classified as the target class, while in untargeted attacks, success is achieved if the example is classified as any class other than its true class.

The evaluator maintains a tally of successful deceptions for each target model, as well as the total count of adversarial examples that successfully deceive the source model. The transferability rate for each target model is calculated as follows:

\[Transferability\ Rate = \frac{\text{Number of Successful Transfers to Target Model}}{\text{Total Number of Successful Adversarial Examples on Source Model}}\]

where:

  • Number of Successful Transfers to Target Model is the count of adversarial examples that successfully deceive the target model.

  • Total Number of Successful Adversarial Examples on Source Model refers to the count of adversarial examples that initially misled the source model.

This evaluation method provides a thorough analysis of the transferability of adversarial examples across different models, shedding light on the robustness of each model against such attacks.

Example

Consider a scenario with a source model, Model_A, and two target models, Model_B and Model_C. Suppose Model_A generates 100 adversarial examples, out of which 80 successfully deceive Model_A. When these 80 adversarial examples are tested against Model_B, 50 are successful, and against Model_C, 30 are successful.

Using the transferability rate formula:

For Model_B:

\[Transferability\ Rate_{Model_B} = \frac{50}{80} = 0.625\]

For Model_C:

\[Transferability\ Rate_{Model_C} = \frac{30}{80} = 0.375\]

This indicates that adversarial examples from Model_A are more transferable to Model_B than to Model_C.
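
The same calculation in a few lines of Python, reproducing the numbers above:

# 80 adversarial examples fooled the source model (Model_A).
successful_on_source = 80
successful_on_target = {"Model_B": 50, "Model_C": 30}

transferability = {
    name: count / successful_on_source
    for name, count in successful_on_target.items()
}
print(transferability)  # {'Model_B': 0.625, 'Model_C': 0.375}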

Robustness Gap is a metric that measures the difference between the accuracy on clean examples and the accuracy on adversarial examples. The higher the robustness gap, the more vulnerable the model is to adversarial attacks. Possible values range from 0 to 1: 0 means the model performs the same on clean and adversarial examples, while 1 means it performs perfectly on clean examples but completely fails on adversarial examples.

Clean Accuracy (Aclean) is calculated as the ratio of the total number of correctly classified clean images (Ncorrect_clean) to the total number of samples (Ntotal):

\[A_{\text{clean}} = \frac{N_{\text{correct_clean}}}{N_{\text{total}}}\]

Adversarial Accuracy (Aadv) is the ratio of the total number of correctly classified adversarial images (Ncorrect_adv) to the total number of samples (Ntotal):

\[A_{\text{adv}} = \frac{N_{\text{correct_adv}}}{N_{\text{total}}}\]

The Robustness Gap (Grobust) is the difference between Clean Accuracy and Adversarial Accuracy:

\[G_{\text{robust}} = A_{\text{clean}} - A_{\text{adv}}\]


Example

Suppose we have a model tested on a dataset of 1000 images. Out of these, the model correctly classifies 950 clean images, giving us Ncorrect_clean = 950 and Ntotal = 1000. When exposed to adversarial examples, the model correctly classifies only 700 of these images, thus Ncorrect_adv = 700.

Using the provided formulas, we can calculate the Clean Accuracy (Aclean) and the Adversarial Accuracy (Aadv):

\[A_{\text{clean}} = \frac{950}{1000} = 0.95\]
\[A_{\text{adv}} = \frac{700}{1000} = 0.70\]

Then, the Robustness Gap (Grobust) is calculated as:

\[G_{\text{robust}} = A_{\text{clean}} - A_{\text{adv}} = 0.95 - 0.70 = 0.25\]

This Robustness Gap of 0.25 indicates that the model’s performance significantly degrades when exposed to adversarial examples, revealing a vulnerability to such attacks.
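
As a generic sketch (not AdvSecureNet’s own evaluator), the robustness gap can be computed directly from model predictions on the same batch of samples:

import torch

def robustness_gap(model, clean_images, adv_images, labels):
    # G_robust = A_clean - A_adv, computed over the same set of samples.
    with torch.no_grad():
        clean_acc = (model(clean_images).argmax(dim=1) == labels).float().mean()
        adv_acc = (model(adv_images).argmax(dim=1) == labels).float().mean()
    return (clean_acc - adv_acc).item()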

Perturbation Effectiveness (PE) is a metric for evaluating how efficiently an attack uses its perturbation budget: it is the attack success rate divided by the perturbation magnitude. The higher the perturbation effectiveness, the more effective the adversarial perturbation. The purpose of this metric is to distinguish between attacks that have a high success rate but require a large perturbation magnitude and attacks that have a lower success rate but require a smaller perturbation magnitude. The perturbation effectiveness is given by the equation below.

\[PE = \frac{\text{Attack Success Rate}}{\text{Perturbation}}\]

where:

  • Attack Success Rate is the percentage of the adversarial examples that are successfully misclassified by the model.

  • Perturbation is the perturbation magnitude of the adversarial examples. It can be measured using different norms, such as L1, L2, or Linf.
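
A minimal sketch of one way to compute this, using the mean L2 distance over a batch as the perturbation term (the norm choice and the internals of AdvSecureNet’s evaluator may differ):

import torch

def perturbation_effectiveness(attack_success_rate, originals, adversarials, p=2):
    # Mean p-norm perturbation distance over the batch (L2 by default).
    distances = (adversarials - originals).flatten(start_dim=1).norm(p=p, dim=1)
    return attack_success_rate / distances.mean().item()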

The Peak Signal-to-Noise Ratio (PSNR) metric is a standard used in the field of image processing for assessing the quality of reconstructed or compressed images in relation to the original ones. The PSNR is derived from the mean squared error (MSE) between the original image and the reconstructed one. It is typically expressed in decibels (dB), indicating the ratio of the maximum possible power of a signal to the power of corrupting noise.

The formula for PSNR is given by:

\[\text{PSNR} = 10 \cdot \log_{10} \left( \frac{\text{MAX}_I^2}{\text{MSE}} \right)\]

where MAX_I represents the maximum possible pixel value of the image (e.g., 255 for 8-bit images), and MSE is the mean squared error between the original and reconstructed images.

The range of PSNR is typically between 0 dB to infinity, with higher values indicating a smaller difference between the original and reconstructed image, and thus, better quality. In cases where the original and reconstructed images are identical, the MSE becomes zero, leading to an undefined PSNR in the logarithmic scale, which can be theoretically considered as infinite. A higher PSNR value generally suggests that the reconstructed image closely resembles the original image in quality.
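
A direct implementation of the PSNR formula above in PyTorch (a generic sketch, not AdvSecureNet’s own similarity evaluator):

import torch

def psnr(original, reconstructed, max_value=1.0):
    # PSNR = 10 * log10(MAX_I^2 / MSE); use max_value=255 for 8-bit images.
    mse = torch.mean((original - reconstructed) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return (10 * torch.log10(max_value ** 2 / mse)).item()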

The Structural Similarity Index Measure (SSIM) is a metric used for measuring the similarity between two images. Unlike traditional methods like PSNR that focus on pixel-level differences, SSIM considers changes in structural information, luminance, and contrast, providing a more perceptually relevant assessment of image quality.

The formula for SSIM is given by:

\[\text{SSIM}(x, y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}\]

where \(x\) and \(y\) are the two images being compared, \(\mu_x\), \(\mu_y\) are the average pixel values, \(\sigma_x^2\), \(\sigma_y^2\) are the variances, \(\sigma_{xy}\) is the covariance of the images, and \(C_1\), \(C_2\) are constants used to stabilize the division.

The SSIM index is a decimal value between -1 and 1, where a value of 1 for SSIM implies no difference between the compared images. As the value decreases, the differences between the images increase. SSIM is particularly useful in contexts where a human observer’s assessment of quality is important, as it aligns more closely with human visual perception than metrics based solely on pixel differences, which makes it suitable for adversarial robustness evaluation.
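
The formula above can be evaluated globally over a whole image as a simplified sketch; practical SSIM implementations compute it over local windows and average the results. The stabilization constants assume the conventional choices \(C_1 = (0.01 L)^2\) and \(C_2 = (0.03 L)^2\), where \(L\) is the dynamic range:

import torch

def global_ssim(x, y, max_value=1.0):
    # Single-window (global) form of the SSIM formula above.
    c1 = (0.01 * max_value) ** 2
    c2 = (0.03 * max_value) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(unbiased=False), y.var(unbiased=False)
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return (numerator / denominator).item()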

[Figure] Comparison of SSIM and PSNR metrics vs. epsilon in the FGSM attack.

[Figure] SSIM and PSNR example. Taken from https://medium.com/@datamonsters/a-quick-overview-of-methods-to-measure-the-similarity-between-images-f907166694ee

Adversarial Attacks

AdvSecureNet supports various adversarial attacks, including:

  • Fast Gradient Sign Method (FGSM)

  • Carlini and Wagner (C&W)

  • Projected Gradient Descent (PGD)

  • DeepFool

  • Decision Boundary

  • Layerwise Origin-Target Synthesis (LOTS)

Some of these attacks are targeted, while others are untargeted. AdvSecureNet simplifies the use of targeted adversarial attacks through an automatic target generation mechanism. This mechanism generates a target label that differs from the original label of the input image; the target label is chosen randomly from the set of possible labels, excluding the original label. This ensures that the attack is targeted, since the goal is to mislead the model into predicting the target label instead of the correct one, without the user having to specify the target explicitly. This feature is particularly useful for large datasets, where manually specifying target labels for each input image is impractical. However, users can also specify the target label manually if they prefer.

Below, we provide a brief overview of each adversarial attack supported by AdvSecureNet, including its characteristics, purpose, and potential applications.

Adversarial attacks can be categorized in different ways. One way to categorize them is based on the information they use. Broadly, these attacks fall into two categories: white-box and black-box attacks [1] [2]. White-box attacks necessitate access to the model’s parameters, making them intrinsically reliant on detailed knowledge of the model’s internals [3] [4]. In contrast, black-box attacks operate without requiring access to the model’s parameters [5] [6]. Among black-box attack methods, one prevalent approach involves training a substitute model to exploit the transferability of adversarial attacks [7], targeting the victim model indirectly [8]. Additionally, there are other black-box attack methods such as decision boundary attacks [9] and zeroth-order optimization-based attacks such as ZOO [10]. These methods, distinct from the substitute model approach, rely solely on the output of the model, further reinforcing their classification as black-box attacks.

Adversarial attacks can also be differentiated based on the number of steps involved in generating adversarial perturbations. This categorization divides them into single-step and iterative attacks [11]. Single-step attacks are characterized by their speed, as they require only one step to calculate the adversarial perturbation. On the other hand, iterative attacks are more time-consuming, involving multiple steps to incrementally compute the adversarial perturbation [12].

Additionally, another classification of adversarial attacks hinges on the objective of the attack. In this context, attacks are grouped into targeted and untargeted categories [13]. Targeted attacks are designed with the specific goal of manipulating the model’s output to a predetermined class. In contrast, untargeted attacks are aimed at causing the model to incorrectly classify the input into any class, provided it is not the correct one [14].


FGSM

FGSM, short for Fast Gradient Sign Method, is a type of adversarial attack introduced by Goodfellow et al. [15] in 2015. It is a single-step, white-box attack. Initially, the attack was designed as an untargeted attack; however, it can be modified to be a targeted attack. The idea of the FGSM attack is to compute the adversarial perturbation by taking the sign of the gradient of the loss function with respect to the input. FGSM is a fast attack since it only requires one step to compute the adversarial perturbation, which makes it a popular attack in the adversarial robustness literature. However, it has been shown that FGSM is not effective against adversarial defenses: it is a relatively weak attack and can be readily defended against by defenses such as adversarial training.

If the attack is untargeted, the formula tries to maximize the loss function with respect to the input and correct label. If the attack is targeted, the formula tries to minimize the loss function with respect to the input and target label since the purpose of the targeted attack is to get closer to the target label. The untargeted FGSM attack is given in the equation below, and the targeted FGSM attack is given in the subsequent equation.

\[\text{adv}_x = x + \epsilon \cdot \text{sign}(\nabla_x J(\theta, x, y))\]
\[\text{adv}_x = x - \epsilon \cdot \text{sign}(\nabla_x J(\theta, x, y))\]

where:

  • \(\text{adv}_x\) is the adversarial image.

  • \(x\) is the original input image.

  • \(y\) is the original label for untargeted attacks, or the target label for targeted attacks.

  • \(\epsilon\) is a multiplier to ensure the perturbations are small.

  • \(\theta\) represents the model parameters.

  • \(J(\theta, x, y)\) is the loss function used by the model.
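
A minimal PyTorch sketch of the two equations above (a generic illustration, not AdvSecureNet’s own FGSM implementation):

import torch

def fgsm(model, x, y, epsilon, loss_fn=torch.nn.functional.cross_entropy, targeted=False):
    # y is the true label for untargeted attacks or the target label for targeted attacks.
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)
    loss.backward()
    step = epsilon * x.grad.sign()
    # Untargeted: ascend the loss; targeted: descend toward the target label.
    adv_x = x - step if targeted else x + step
    return adv_x.clamp(0, 1).detach()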


C&W

The Carlini and Wagner (C&W) attack [16], introduced by Nicholas Carlini and David Wagner in 2017, is a sophisticated method of adversarial attack aimed at machine learning models, particularly those used in computer vision. As an iterative, white-box attack, it requires access to the model’s architecture and parameters. The core of the C&W attack involves formulating and solving an optimization problem that minimally perturbs the input image in a way that leads to incorrect model predictions. This is done while maintaining the perturbations imperceptible to the human eye, thus preserving the image’s visual integrity. The optimization process often employs techniques like binary search to find the smallest possible perturbation that can deceive the model. The C&W attack is versatile, capable of being deployed as both untargeted and targeted attacks. The attack can also use different distance metrics when computing the perturbation, such as the L0, L2, and L-infinity norms. The choice of distance metric can affect the attack’s effectiveness and the perturbation’s perceptibility.

The attack’s effectiveness lies in its ability to subtly manipulate the input data, challenging the robustness and security of machine learning models, and it has become a benchmark for testing the vulnerability of these models to adversarial examples. However, the biggest drawback of the C&W attack is its computational complexity, which stems from the iterative nature of the attack and the optimization problem that needs to be solved. This makes the C&W attack less practical for real-world applications.

\[\begin{split}\begin{aligned} & \text{minimize} \quad \|\delta\|_p + c \cdot f(x + \delta) \\ & \text{such that} \quad x + \delta \in [0,1]^n \end{aligned}\end{split}\]

where:

  • \(\delta\) is the perturbation added to the input image \(x\).

  • \(\|\delta\|_p\) is the p-norm of the perturbation, which measures the size of the perturbation.

  • \(c\) is a constant that balances the perturbation magnitude and the success of the attack.

  • \(f(x + \delta)\) is the objective function, designed to mislead the model into making incorrect predictions.

  • \(x + \delta \in [0,1]^n\) ensures that the perturbed input remains within the valid input range for the model.
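
A simplified, untargeted sketch of this optimization in PyTorch. It minimizes \(\|\delta\|_2 + c \cdot f(x + \delta)\) directly and omits the tanh change of variables and the binary search over \(c\) used in the original attack; all names here are illustrative:

import torch

def cw_l2_sketch(model, x, y_true, c=1.0, steps=100, lr=0.01):
    delta = torch.zeros_like(x, requires_grad=True)
    optimizer = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        adv = (x + delta).clamp(0, 1)                     # keep inputs in [0, 1]^n
        logits = model(adv)
        true_logit = logits.gather(1, y_true.unsqueeze(1)).squeeze(1)
        other_logit = logits.scatter(1, y_true.unsqueeze(1), float("-inf")).max(dim=1).values
        f = torch.clamp(true_logit - other_logit, min=0)  # > 0 while still classified correctly
        loss = (delta.flatten(start_dim=1).norm(p=2, dim=1) + c * f).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return (x + delta).clamp(0, 1).detach()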


PGD

The Projected Gradient Descent (PGD) attack [17] is a prominent adversarial attack method in the field of machine learning, particularly for evaluating the robustness of models against adversarial examples. Introduced by Madry et al. [18], the PGD attack is an iterative method that generates adversarial examples by repeatedly applying a small perturbation and projecting this perturbation onto an \(\varepsilon\)-ball around the original input within a specified norm. This process is repeated for a fixed number of steps or until a successful adversarial example is found. The PGD attack operates under a white-box setting, where the attacker has full knowledge of the model, including its architecture and parameters. The strength of the PGD attack lies in its simplicity and effectiveness in finding adversarial examples within a constrained space, making it a standard benchmark in adversarial robustness research, and it is among the most widely used attacks in adversarial training. However, similar to other adversarial attacks like the C&W attack, the PGD attack can be computationally intensive, particularly when dealing with complex models and high-dimensional input spaces, which may limit its practicality in real-world scenarios.
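
A minimal PyTorch sketch of the L-infinity variant of this procedure (a generic illustration, not AdvSecureNet’s own PGD implementation):

import torch

def pgd_linf(model, x, y, loss_fn=torch.nn.functional.cross_entropy,
             epsilon=8 / 255, alpha=2 / 255, steps=10):
    # Random start inside the epsilon ball, then iterative gradient steps
    # followed by projection back onto the ball and the valid input range.
    x_adv = (x + torch.empty_like(x).uniform_(-epsilon, epsilon)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.clone().detach().requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        loss.backward()
        x_adv = x_adv + alpha * x_adv.grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon).clamp(0, 1)
    return x_adv.detach()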


DeepFool

The DeepFool attack, introduced by Moosavi-Dezfooli et al. [19] in 2016, is a type of adversarial attack that aims to generate adversarial examples that are close to the original input but mislead the model. It is an iterative, white-box attack. The algorithm works by linearizing the decision boundaries of the model and then applying a small perturbation that pushes the input just across this boundary. This process is repeated iteratively until the input is misclassified, ensuring that the resulting adversarial example is as close to the original input as possible. One of the key strengths of DeepFool is its ability to compute these minimal perturbations with relatively low computational overhead compared to other methods because of its linearization approach. Despite its efficiency, the attack assumes a somewhat idealized linear model, which may not always accurately reflect the complex decision boundaries in more advanced, non-linear models. Nonetheless, DeepFool has become a valuable tool in the adversarial machine learning toolkit for its ability to provide insights into model vulnerabilities with minimal perturbations.


Decision Boundary

The Decision Boundary attack is a black-box attack that was introduced by Brendel et al. [20] in 2017. The idea of the Decision Boundary attack is to find the decision boundary of the model and then apply a small perturbation that pushes the input just across this boundary. The Decision Boundary attack is an iterative attack and can be both targeted and untargeted. The attack starts with a random input that is initially adversarial and then iteratively updates the input to get closer to the decision boundary and minimize the perturbation. The advantage of the attack is that it does not require any information about the model. This makes it more suitable for real-world applications where the model’s information is not available. However, the drawback of the attack is that it is computationally expensive since it requires iteratively updating the input to get closer to the decision boundary.


LOTS

LOTS, Layerwise Origin-Target Synthesis [21], is a type of adversarial attack introduced by Rozsa et al. in 2017. It is a versatile, white-box attack that can be used as a targeted or untargeted attack, in either single-step or iterative form. The idea of the LOTS attack is to compute the adversarial perturbation using the deep feature layers of the model: the attack adjusts the deep feature representation of the input to match the deep feature representation of the target class. Utilizing deep feature representations makes the LOTS attack suitable for systems that rely on them, such as face recognition systems. The results show that the iterative LOTS attack is highly successful against the VGG Face network, with success rates ranging from 98.28% to 100% [22]. However, the drawback of the LOTS attack is that it requires knowledge of the deep feature representation of the target class.


References

Adversarial Defenses

Adversarial training is a methodology in machine learning, particularly within the field of deep learning, aimed at improving the robustness and generalization of models against adversarial examples. These are inputs deliberately crafted to deceive models into making incorrect predictions or classifications [1]. The concept of adversarial training emerged as a critical response to the observation that neural networks, despite their high accuracy, are often vulnerable to subtly modified inputs that are imperceptible to humans [2].

The core idea behind adversarial training involves the intentional generation of adversarial examples during the training process. By exposing the model to these challenging scenarios, the model learns to generalize better and becomes more resistant to such attacks [3]. So far, adversarial training represents the only known defense that works to some extent and scale against adversarial attacks [4].
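
A schematic PyTorch training step illustrating this idea (a generic sketch rather than the toolkit’s own trainer; attack is any callable of the form attack(model, x, y) returning adversarial examples, such as the FGSM or PGD sketches above):

import torch

def adversarial_training_epoch(model, loader, optimizer, loss_fn, attack, device="cpu"):
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = attack(model, x, y)          # craft adversarial examples on the fly
        inputs = torch.cat([x, x_adv])       # train on clean and adversarial data together
        targets = torch.cat([y, y])
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        optimizer.step()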

[Figure] Adversarial Training Flow.

[Figure] Adversarial Data Generator.

Ensemble Adversarial Training, proposed by Florian Tramèr et al. [5] in 2018, is a type of adversarial training that aims to improve robustness to unseen attacks and black-box attacks by generalizing the adversarial training process. The idea is to craft adversarial examples from a set of pretrained substitute models in addition to the adversarial examples crafted from the original source model that the defender wants to robustify. The intuition is that crafting adversarial samples only from the source model can lead to overfitting to that model, leaving it vulnerable to unseen attacks and black-box attacks, whereas crafting adversarial samples from a set of pretrained substitute models generalizes the adversarial training process and improves robustness to such attacks. The experiments in [5] showed that ensemble adversarial training can improve robustness to unseen attacks and black-box attacks, but it lowers accuracy on clean examples.

The ensemble aspect of ensemble adversarial training refers to an ensemble of models. However, it is also possible to ensemble the adversarial attacks [5]. The intuition is similar to the ensemble of models, namely generalization, since adversarial training offers no guarantee against unseen attacks [6] [7]. It has also been shown that a model robust to one type of attack can be more vulnerable to other types of attacks [7] [8].

The ensemble of adversarial attacks refers to crafting adversarial examples from a set of adversarial attacks in addition to the adversarial examples crafted from the original adversarial attack. The purpose is to obtain a model that is robust to different types of perturbations simultaneously [5]. However, the results show that models trained with an ensemble of adversarial attacks are not as robust as models trained with each attack individually [5].

[Figure] Ensemble Adversarial Generator. The generator randomly picks one model from the model pool and one attack from the attack pool. Using only the source model and a single attack is equivalent to classical adversarial training, while one attack with multiple pretrained models corresponds to Ensemble Adversarial Training. It is also possible to ensemble models and attacks at the same time.
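
A minimal sketch of the selection step performed by the generator described above (a hypothetical helper, not the toolkit’s implementation; attacks holds callables of the form attack(model, x, y) -> x_adv):

import random

def ensemble_adversarial_batch(source_model, substitute_models, attacks, x, y):
    # Randomly pick a model from the pool (source + pretrained substitutes)
    # and an attack from the attack pool. With only the source model and a
    # single attack this reduces to classical adversarial training.
    model_pool = [source_model] + list(substitute_models)
    model = random.choice(model_pool)
    attack = random.choice(attacks)
    return attack(model, x, y)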
