Introduction

This is an extension to pystematic that adds functionality for running machine learning experiments in pytorch. Its main contribution is the Context class and its related classes, whose goal is to make your code agnostic to whether you are running on cuda, on the cpu, or with distributed data-parallel training.

Installation

All you have to do for pystematic to load the extension is to install it:

$ pip install pystematic-torch

Experiment API

This extension publishes its API under the pystematic.torch namespace.

General

pystematic.torch.move_to_device(device, *args)

Utility function for placing a batch of data on a specific device (e.g. cuda or cpu). It handles nested dicts and lists by traversing every element and moving it to the proper device if possible. Unrecognized objects are left as-is.

Parameters:
  • device (str, torch.device) – The device to move to

  • *args (any) – Any objects that you want to move

Returns:

The moved objects

Return type:

any
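
A minimal usage sketch (assuming that when several objects are passed, they are returned in the same order they were given):

import torch
import pystematic

# A nested batch: tensors inside dicts and lists are moved,
# unrecognized objects (such as the string ids) are left as-is.
inputs = {"image": torch.randn(4, 3, 32, 32), "ids": ["a", "b", "c", "d"]}
labels = torch.tensor([0, 1, 0, 1])

device = "cuda" if torch.cuda.is_available() else "cpu"

inputs, labels = pystematic.torch.move_to_device(device, inputs, labels)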

pystematic.torch.save_checkpoint(state_dict, id) → None

Saves the provided state_dict to a file in pystematic.output_dir. This function will make sure to only save the checkpoint in the master process when called in distributed mode.

Parameters:
  • state_dict (dict) – The state dict to save, such as the one returned from Context.state_dict()

  • id (any) – An id that uniquely identifies this checkpoint. E.g. epoch number, step number etc.

pystematic.torch.load_checkpoint(checkpoint_file_path) → dict

Loads and returns a checkpoint from the given filepath.

Parameters:

checkpoint_file_path (str, pathlib.Path) – Path to the file to load.

Returns:

The loaded state dict.

Return type:

dict
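
The two functions are typically used together for checkpointing, as in this sketch (assuming ctx and params are defined as in the Context example below):

import pystematic

# Save the full training state after an epoch. In distributed mode only
# the master process writes the file, so no extra guarding is needed.
pystematic.torch.save_checkpoint(ctx.state_dict(), id=ctx.epoch)

# When resuming, load a checkpoint file and restore the state. The path
# typically comes from the "checkpoint" parameter; the file naming inside
# pystematic.output_dir is handled by save_checkpoint.
if params["checkpoint"] is not None:
    ctx.load_state_dict(pystematic.torch.load_checkpoint(params["checkpoint"]))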

pystematic.torch.run_parameter_sweep(experiment, list_of_params, max_num_processes=1, num_gpus_per_process=None) → None

Extends pystematic.run_parameter_sweep() with GPU allocation capabilities.

Runs an experiment multiple times with a set of different params. At most max_num_processes concurrent processes will be used. This call will block until all experiments have been run.

Parameters:
  • experiment (Experiment) – The experiment to run.

  • list_of_params (list of dict) – A list of parameter dictionaries. Each corresponding to one run of the experiment. See pystematic.param_matrix() for a convenient way of generating such a list.

  • max_num_processes (int, optional) – The maximum number of concurrent processes to use for running the experiments. Defaults to 1.

  • num_gpus_per_process (int, optional) – The number of GPUs to allocate for each experiment. If None no allocation is done. Default is None.
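
A sketch of a sweep over two hyperparameters (assuming pystematic.param_matrix() accepts keyword arguments of value lists, and that learning_rate and batch_size are parameters defined on the experiment):

import pystematic

@pystematic.experiment
def train(params):
    ...

if __name__ == "__main__":
    # One parameter dict per run: 2 x 2 = 4 runs in total.
    param_list = pystematic.param_matrix(
        learning_rate=[0.1, 0.01],
        batch_size=[16, 32],
    )

    # Run at most two experiments at a time, each with exclusive
    # access to one GPU.
    pystematic.torch.run_parameter_sweep(
        train,
        param_list,
        max_num_processes=2,
        num_gpus_per_process=1,
    )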

Distributed

pystematic.torch.is_distributed() → bool

Alias for torch.distributed.is_initialized().

Returns:

Returns True if the torch distributed runtime is initialized.

Return type:

bool

pystematic.torch.is_master() → bool

If running in distributed mode, returns whether or not the current process is the master process. In non-distributed mode, this always returns True.

Returns:

Whether the current process is the master process.

Return type:

bool

pystematic.torch.get_num_processes() → int

Alias for torch.distributed.get_world_size(). In non-distributed mode, this always returns 1.

Returns:

The total number of processes in the distributed runtime.

Return type:

int

pystematic.torch.get_rank() → int

Returns the global rank of the current process. If the current process is not running in distributed mode, this always returns 0. In single-node training the rank is the same as the local rank.

Returns:

The rank of the current process.

Return type:

int

pystematic.torch.broadcast_from_master(value)

Alias for torch.distributed.broadcast(value, 0). In non-distributed mode, this just returns the value.

pystematic.torch.distributed_barrier() → None

Alias for torch.distributed.barrier(). In non-distributed mode, this is a noop.
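
Because the helpers degrade gracefully in non-distributed mode, code like the following sketch runs unchanged in both settings (a tensor is broadcast here, since the underlying call is torch.distributed.broadcast):

import torch
import pystematic

rank = pystematic.torch.get_rank()
world_size = pystematic.torch.get_num_processes()

if pystematic.torch.is_master():
    # Only the master process writes logs and artifacts.
    print(f"Running with {world_size} process(es)")

# Share a value decided by the master process with all other processes.
seed = pystematic.torch.broadcast_from_master(torch.tensor([1234]))

# Make sure every process has reached this point before continuing.
pystematic.torch.distributed_barrier()

print(f"Process {rank}: seed = {seed.item()}")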

Context

When you are developing a model in pytorch, you often want to be able to train the model in many different settings, such as multi-node distributed, single gpu or even just on the cpu depending on your work location and on available resources. The main purpose of the context object is to allow you to transition seamlessly between these different modes of training, without changing your code.

If you are familiar with the torch.nn.Module object, you know that whenever you add a parameter to the object, it gets registered with it, and when you want to move the model to another device, you simply call module.cuda() or module.cpu() to move all parameters registered with the module.

A context object is like a torch module on steroids. You are meant to register every object important to your training session with it, e.g. models, optimizers, epoch counter etc. You can then transition your session with the Context.cpu(), Context.cuda() and Context.ddp() methods.

You can also serialize and restore the state of the entire session with the Context.state_dict() and Context.load_state_dict() methods, which makes checkpointing painless.

Here is a short example showing how the Context may be used:

import torch
import pystematic

@pystematic.experiment
def context_example(params):
    ctx = pystematic.torch.Context()

    ctx.epoch = 0

    ctx.recorder = pystematic.torch.Recorder()

    ctx.model = torch.nn.Sequential(
        torch.nn.Linear(2, 1),
        torch.nn.Sigmoid()
    )

    ctx.optimizer = torch.optim.SGD(ctx.model.parameters(), lr=0.01)

    # We use the smart dataloader so that batches are moved to
    # the correct device
    ctx.dataloader = pystematic.torch.SmartDataLoader(
        dataset=Dataset(),
        batch_size=2
    )
    ctx.loss_function = torch.nn.BCELoss()

    ctx.cuda() # Move everything to cuda
    # ctx.ddp() # and maybe distributed data-parallel?


    if params["checkpoint"]:
        # Load checkpoint
        ctx.load_state_dict(pystematic.torch.load_checkpoint(params["checkpoint"]))

    # Train one epoch
    for input, lbl in ctx.dataloader:
        # The smart dataloader makes sure the batch is placed on
        # the correct device.
        output = ctx.model(input)

        loss = ctx.loss_function(output, lbl)

        ctx.optimizer.zero_grad()
        loss.backward()
        ctx.optimizer.step()

        ctx.recorder.scalar("train/loss", loss)
        ctx.recorder.step()

    ctx.epoch += 1

    # Save checkpoint
    pystematic.torch.save_checkpoint(ctx.state_dict(), id=ctx.epoch)

The following list specifies the transformations applied to each type of object:

torch.nn.Module:

  • cuda: moved to torch.cuda.current_device()

  • cpu: moved to cpu

  • ddp: Gets wrapped in torch.nn.parallel.DistributedDataParallel and then in an object proxy that delegates all attribute accesses not found on the wrapper to the underlying module. This means that you can still use any custom attributes and methods of the original module, even after it gets wrapped in the DDP module. This is needed to make the code you write agnostic to whether or not it is currently running in distributed mode.

torch.optim.Optimizer:

  • cuda, cpu, ddp: Optimizer parameters will be moved to the correct device.

pystematic.torch.Recorder:

  • ddp: gets silenced on non-master processes

pystematic.torch.SmartDataLoader:

  • cuda, cpu: Moves the dataloader to the proper device. If you initialize the dataloader with move_output = True, the items yielded when iterating the dataloader are moved to the correct device.

Any object with a method named to() (such as torch.Tensor):

  • cuda, cpu, ddp: The to() method is called with the device the object should be moved to.

All other types of objects are left unchanged.

The autotransform() method uses the parameters cuda, distributed, checkpoint to automatically determine how the context should be transformed.
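
As a minimal sketch, the manual checkpoint loading and device transitions in the example above can then be replaced by a single call:

import torch
import pystematic

@pystematic.experiment
def autotransform_example(params):
    ctx = pystematic.torch.Context()
    ctx.model = torch.nn.Linear(2, 1)
    ctx.optimizer = torch.optim.SGD(ctx.model.parameters(), lr=0.01)

    # Reads the "checkpoint", "cuda" and "distributed" parameters of the
    # current experiment and transforms the context accordingly, replacing
    # the explicit load_state_dict() / cuda() / ddp() calls.
    ctx.autotransform()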

class pystematic.torch.Context
autotransform()

Transforms the context according to the current experiment parameters. More specifically, it loads a state dict from the checkpoint pointed to by the checkpoint parameter (if set), moves the context to cuda if the cuda parameter is set, and moves it to distributed mode if the distributed parameter is set.

cpu()

Moves the context to the cpu.

cuda()

Moves the context to torch.cuda.current_device().

ddp()

Moves the context to a distributed data-parallel setting. Can only be used if torch.distributed is initialized.

load_state_dict(state: dict) → None

Sets the state for the context.

Parameters:

state (dict) – The state to load.

state_dict() → dict

Returns the whole state of the context by iterating over all registered items and calling state_dict() on each item to retrieve its state. Primitive values are also saved.

Returns:

A dict representing the state of all registered objects.

Return type:

dict

Other

class pystematic.torch.Recorder(output_dir=None, tensorboard=True, file=True, console=False)

Used for recording metrics during training and evaluation.

The recorder has an internal counter count that is recorded together with all values. The count typically represents the ‘global_step’ during training. Remember to increment the counter appropriately.

Each recorded value is also associated with a tag that uniquely determines which time series the value should be recorded to. The tag can use slashes (‘/’) to build hierarchies. E.g. train/loss, test/loss etc.

Parameters:
  • output_dir (str, optional) – The output directory to store data in. Defaults to pystematic.output_dir.

  • tensorboard (bool, optional) – If the recorder should write tensorboard logs. Defaults to True.

  • file (bool, optional) – If the recorder should write to plain files. Defaults to True.

  • console (bool, optional) – If the recorder should write to stdout. Defaults to False.

property count: int

Counter that represents the x-axis when logging data. You can assign a value to this property or call step() to increase the counter.

figure(tag, fig)

Logs a matplotlib figure

Parameters:
  • tag (str) – A string that determines which time series the value should be recorded to.

  • fig (Figure) – A matplotlib figure

image(tag, image)

Logs an image

Parameters:
  • tag (str) – A string that determines which time series the value should be recorded to.

  • image (PIL.Image, np.ndarray, torch.Tensor) – The image

load_state_dict(state)

Loads a state dict

Parameters:

state (dict) – The state dict to load.

params(params_dict)

Logs a parameter dict.

Parameters:

params_dict (dict) – dict of param values.

scalar(tag, scalar)

Logs a scalar value.

Parameters:
  • tag (str) – A string that determines which time series the value should be recorded to.

  • scalar (float) – The value of the scalar.

state_dict()

Returns the state of the recorder, which consists of the counter count.

Returns:

A dict representing the state of this recorder

Return type:

dict

step()

Increases count by 1.
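
A minimal standalone sketch of the recorder (in the Context example above the recorder is instead registered on the context, so that it is automatically silenced on non-master processes in ddp mode):

import pystematic

@pystematic.experiment
def recorder_example(params):
    # Writes tensorboard logs and plain files under pystematic.output_dir.
    recorder = pystematic.torch.Recorder()

    recorder.params(params)  # log the experiment parameters once

    for loss in [0.9, 0.7, 0.5]:
        recorder.scalar("train/loss", loss)
        recorder.step()  # advance the shared counter (the x-axis)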

class pystematic.torch.SmartDataLoader(dataset, shuffle=False, random_seed=None, sampler=None, batch_sampler=None, move_output=True, loading_bar=True, **kwargs)

Extends the torch.utils.data.DataLoader with the following:

  • A loading bar is displayed when iterating the dataloader.

  • The items yielded when iterating are moved to the device previously set with to().

  • Transparently handles both distributed and non-distributed modes.

Parameters:
  • dataset (torch.utils.data.Dataset) – The dataset to construct a loader for

  • shuffle (bool, optional) – Whether to shuffle the data when loading. Ignored if sampler is not None. Defaults to False.

  • random_seed (int, optional) – Random seed to use when shuffling data. Ignored if sampler is not None. Defaults to None.

  • sampler (torch.utils.data.Sampler, Iterable, optional) – An object defining how to sample data items. Defaults to None.

  • move_output (bool, optional) – If items yielded during iteration should automatically be moved to the current device. Defaults to True.

  • loading_bar (bool, optional) – If a loading bar should be displayed during iteration. Defaults to True.

to(device)

Sets the device that yielded items should be placed on when iterating.

Parameters:

device (str, torch.device) – The device to move the items to.
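
A minimal sketch of standalone usage (the dataset class is purely illustrative; when the loader is registered on a Context, the to() call is made for you by ctx.cuda(), ctx.cpu() or ctx.ddp()):

import torch
import pystematic

class XorDataset(torch.utils.data.Dataset):
    # Tiny illustrative dataset, not part of pystematic.
    def __len__(self):
        return 4

    def __getitem__(self, idx):
        x = torch.tensor([idx % 2, idx // 2], dtype=torch.float32)
        y = torch.tensor([float((idx % 2) ^ (idx // 2))])
        return x, y

loader = pystematic.torch.SmartDataLoader(
    dataset=XorDataset(),
    batch_size=2,
    shuffle=True,
)

# Tell the loader where yielded items should be placed.
loader.to("cuda" if torch.cuda.is_available() else "cpu")

for inputs, labels in loader:
    pass  # inputs and labels are already on the chosen device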

Default parameters

The following parameters are added to all experiments by default. Note that these are also listed if you run an experiment from the command line with the --help option.

  • checkpoint: If using the context autotransform() method, it will load the checkpoint pointed to by this parameter (if set). Default value is None.

  • cuda: If using the context autotransform() method, setting this to True will move the context to cuda. Default value is True.

  • distributed: Controls whether the experiment should be run in a distributed fashion (multiple GPUs). When set to True, distributed processes are launched (similar to torch.distributed.launch) before the experiment main function is run. If using the context autotransform() method, this parameter also tells the context whether to move to distributed mode (ddp). Default value is False.

  • node_rank: The rank of the node for multi-node distributed training. Default value is 0.

  • nproc_per_node: The number of processes to launch on each node. For GPU training, this is recommended to be set to the number of GPUs in your system, so that each process can be bound to a single GPU. Default value is 1.

  • nnodes: The number of nodes to use for distributed training. Default value is 1.

  • master_addr: The master node’s (rank 0) IP address or the hostname. Leave default for single node training. Default value is 127.0.0.1.

  • master_port: The master node’s (rank 0) port used for communication during distributed training. Default value is 29500.
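
Inside the experiment main function these defaults are available in the params dict, for example:

import pystematic

@pystematic.experiment
def defaults_example(params):
    # These keys are always present thanks to the default parameters above.
    print(params["cuda"])         # True unless overridden
    print(params["distributed"])  # False unless overridden
    print(params["checkpoint"])   # None unless a checkpoint path was given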