advsecurenet.dataloader package


advsecurenet.dataloader.data_loader_factory module

This module contains the DataLoaderFactory class that creates a DataLoader for the given dataset.

class advsecurenet.dataloader.data_loader_factory.DataLoaderFactory

Bases: object

The DataLoaderFactory class that creates a DataLoader for the given dataset.

static create_dataloader(config: DataLoaderConfig | None = None, **kwargs) → DataLoader

A static method that creates a DataLoader for the given dataset with the given parameters.

Parameters:
  • config (DataLoaderConfig, optional) – The DataLoader configuration. If omitted, the DataLoader is built from the keyword arguments instead.

  • **kwargs – Arbitrary keyword arguments for the DataLoader.

Returns:

The DataLoader for the given dataset.

Return type:

TorchDataLoader

Raises:

ValueError – If the dataset is not of type TorchDataset.

Note

It is possible to create a DataLoader without providing a DataLoaderConfig. In this case, the DataLoader will be created with the provided keyword arguments. DataLoaderConfig contains the following fields:

  • dataset: TorchDataset

  • batch_size: int

  • num_workers: int

  • shuffle: bool

  • drop_last: bool

  • pin_memory: bool

  • sampler: Optional[torch.utils.data.Sampler]
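For example, the factory can be driven either by a config object or by plain keyword arguments. The sketch below illustrates both call styles; the import path for DataLoaderConfig and the exact keyword handling in the config-less call are assumptions, not verified against the library.

from torchvision import datasets, transforms

from advsecurenet.dataloader.data_loader_factory import DataLoaderFactory
from advsecurenet.shared.types.configs.dataloader_config import DataLoaderConfig  # assumed import path

dataset = datasets.FakeData(size=64, transform=transforms.ToTensor())

# Option 1: configure everything through a DataLoaderConfig instance.
config = DataLoaderConfig(
    dataset=dataset,
    batch_size=16,
    num_workers=2,
    shuffle=True,
    drop_last=False,
    pin_memory=True,
    sampler=None,
)
loader = DataLoaderFactory.create_dataloader(config=config)

# Option 2: skip the config and pass DataLoader keyword arguments directly
# (assumed to be forwarded to torch.utils.data.DataLoader as-is).
loader = DataLoaderFactory.create_dataloader(dataset=dataset, batch_size=16, shuffle=True)

for images, labels in loader:
    ...  # training / evaluation loop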

advsecurenet.dataloader.data_loader_factory.dataclass_to_dict(instance)

Convert a dataclass instance to a dictionary, handling non-serializable fields appropriately.
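A short, illustrative continuation of the snippet above. How non-serializable fields (such as the dataset object itself) are treated is an implementation detail of the library, and the forwarding to torch.utils.data.DataLoader is an assumption about how the factory likely uses this helper.

from advsecurenet.dataloader.data_loader_factory import dataclass_to_dict

kwargs = dataclass_to_dict(config)  # config is the DataLoaderConfig from the previous example
# kwargs can then be expanded into a DataLoader call, e.g. DataLoader(**kwargs)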

advsecurenet.dataloader.distributed_eval_sampler module

class advsecurenet.dataloader.distributed_eval_sampler.DistributedEvalSampler(dataset, num_replicas=None, rank=None, shuffle=False, seed=0)

Bases: Sampler

This code is taken from: https://github.com/SeungjunNah/DeepDeblur-PyTorch/blob/master/src/data/sampler.py

DistributedEvalSampler is different from DistributedSampler. It does NOT add extra samples to make the dataset evenly divisible. DistributedEvalSampler should NOT be used for training, as the distributed processes could hang forever. See this issue for details: https://github.com/pytorch/pytorch/issues/22584. Shuffling is disabled by default.

DistributedEvalSampler is for evaluation purpose where synchronization does not happen every epoch. Synchronization should be done outside the dataloader loop.
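A minimal sketch of that evaluation pattern, assuming the default process group has already been initialized and that model, dataset, and device exist; only the sampler usage and the synchronize-outside-the-loop step come from this module's documentation.

import torch
import torch.distributed as dist
from torch.utils.data import DataLoader

from advsecurenet.dataloader.distributed_eval_sampler import DistributedEvalSampler

# `model`, `dataset`, and `device` are placeholders assumed to be defined elsewhere.
sampler = DistributedEvalSampler(dataset)  # no padding: every sample is evaluated exactly once
loader = DataLoader(dataset, batch_size=32, sampler=sampler)

correct = torch.zeros(1, device=device)
total = torch.zeros(1, device=device)
with torch.no_grad():
    for images, labels in loader:
        preds = model(images.to(device)).argmax(dim=1)
        correct += (preds == labels.to(device)).sum()
        total += labels.numel()

# Synchronize once, outside the dataloader loop, as noted above.
dist.all_reduce(correct)
dist.all_reduce(total)
if dist.get_rank() == 0:
    print(f"accuracy: {(correct / total).item():.4f}")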

Sampler that restricts data loading to a subset of the dataset.

It is especially useful in conjunction with torch.nn.parallel.DistributedDataParallel. In such a case, each process can pass a torch.utils.data.DistributedSampler instance as a DataLoader sampler, and load a subset of the original dataset that is exclusive to it.

Note

Dataset is assumed to be of constant size.

Parameters:
  • dataset – Dataset used for sampling.

  • num_replicas (int, optional) – Number of processes participating in distributed training. By default, the world size is retrieved from the current distributed group.

  • rank (int, optional) – Rank of the current process within num_replicas. By default, rank is retrieved from the current distributed group.

  • shuffle (bool, optional) – If True, sampler will shuffle the indices. Default: False.

  • seed (int, optional) – random seed used to shuffle the sampler if shuffle=True. This number should be identical across all processes in the distributed group. Default: 0.

Warning

In distributed mode, calling the set_epoch(epoch) method at the beginning of each epoch before creating the DataLoader iterator is necessary to make shuffling work properly across multiple epochs. Otherwise, the same ordering will always be used.

Example:

>>> sampler = DistributedSampler(dataset) if is_distributed else None
>>> loader = DataLoader(dataset, shuffle=(sampler is None),
...                     sampler=sampler)
>>> for epoch in range(start_epoch, n_epochs):
...     if is_distributed:
...         sampler.set_epoch(epoch)
...     train(loader)
set_epoch(epoch)

Sets the epoch for this sampler. When shuffle=True, this ensures all replicas use a different random ordering for each epoch. Otherwise, the next iteration of this sampler will yield the same ordering.

Parameters:

epoch (int) – Epoch number.