MolecularDiffusion.utils.comm

Attributes

cpu_group

gpu_group

Functions

cat(obj[, dst])

Concatenate any nested container of tensors along the 0-th axis.

get_cpu_count()

Get the number of CPUs on this node.

get_group(device)

Get the process group corresponding to the given device.

get_rank()

Get the rank of this process in distributed processes.

get_world_size()

Get the total number of distributed processes.

init_process_group(backend[, init_method])

Initialize CPU and/or GPU process groups.

reduce(obj[, op, dst])

Reduce any nested container of tensors.

stack(obj[, dst])

Stack any nested container of tensors. The new dimension is inserted at the 0-th axis.

synchronize()

Synchronize among all distributed processes.

Module Contents

MolecularDiffusion.utils.comm.cat(obj, dst=None)

Concatenate any nested container of tensors along the 0-th axis.

Parameters:
  • obj (Object) – any container object. Can be nested list, tuple or dict.

  • dst (int, optional) – rank of destination worker. If not specified, broadcast the result to all workers.

Example:

>>> # assume 4 workers
>>> rank = comm.get_rank()
>>> rng = torch.arange(10)
>>> obj = {"range": rng[rank * (rank + 1) // 2: (rank + 1) * (rank + 2) // 2]}
>>> obj = comm.cat(obj)
>>> assert torch.allclose(obj["range"], rng)

MolecularDiffusion.utils.comm.get_cpu_count()

Get the number of CPUs on this node.
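A minimal sketch of what this helper likely wraps, assuming it defers to the standard library (the actual implementation may differ):

```python
import os

def get_cpu_count():
    # os.cpu_count() reports the number of CPUs on this node; it can return
    # None on exotic platforms, so fall back to 1 in that case.
    return os.cpu_count() or 1
```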

MolecularDiffusion.utils.comm.get_group(device)

Get the process group corresponding to the given device.

Parameters:
  • device (torch.device) – query device
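The module-level cpu_group and gpu_group attributes listed at the end of this page suggest a simple dispatch on the device type. A torch-free sketch of that lookup (an assumption, not the actual implementation):

```python
# Hypothetical caches mirroring the module attributes cpu_group/gpu_group;
# init_process_group would populate them.
cpu_group = None
gpu_group = None

def get_group(device_type):
    # The real function takes a torch.device; this sketch accepts its
    # ``.type`` string ("cpu" or "cuda") to stay torch-free.
    if device_type == "cpu":
        return cpu_group
    if device_type == "cuda":
        return gpu_group
    raise ValueError("unknown device type: %s" % device_type)
```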

MolecularDiffusion.utils.comm.get_rank()

Get the rank of this process in distributed processes.

Returns 0 in the single-process case.

MolecularDiffusion.utils.comm.get_world_size()

Get the total number of distributed processes.

Returns 1 in the single-process case.
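Both helpers follow the usual fallback pattern for distributed utilities: defer to torch.distributed when a process group is initialized, otherwise return the single-process defaults documented above. A sketch (assumption; the real implementation may differ):

```python
def get_rank():
    # Defer to torch.distributed when a process group is initialized,
    # else fall back to 0 (the single-process case).
    try:
        import torch.distributed as dist
        if dist.is_available() and dist.is_initialized():
            return dist.get_rank()
    except ImportError:
        pass
    return 0

def get_world_size():
    # Same fallback pattern: 1 when running as a single process.
    try:
        import torch.distributed as dist
        if dist.is_available() and dist.is_initialized():
            return dist.get_world_size()
    except ImportError:
        pass
    return 1
```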

MolecularDiffusion.utils.comm.init_process_group(backend, init_method=None, **kwargs)

Initialize CPU and/or GPU process groups.

Parameters:
  • backend (str) – Communication backend. Use nccl for GPUs and gloo for CPUs.

  • init_method (str, optional) – URL specifying how to initialize the process group
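A sketch of a typical call, following the backend convention stated above. The pick_backend helper is illustrative, not part of this module:

```python
def pick_backend(have_gpu):
    # Convention from the docstring: nccl for GPUs, gloo for CPUs.
    # ``have_gpu`` stands in for a torch.cuda.is_available() check to
    # keep this sketch torch-free.
    return "nccl" if have_gpu else "gloo"

# Typical launch (sketch): a launcher such as torchrun exports RANK,
# WORLD_SIZE, MASTER_ADDR and MASTER_PORT, and init_method="env://"
# (the torch.distributed default) reads them:
#
#     comm.init_process_group(pick_backend(have_gpu=True), init_method="env://")
```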

MolecularDiffusion.utils.comm.reduce(obj, op='sum', dst=None)

Reduce any nested container of tensors.

Parameters:
  • obj (Object) – any container object. Can be nested list, tuple or dict.

  • op (str, optional) – element-wise reduction operator. Available operators are sum, mean, min, max, product.

  • dst (int, optional) – rank of destination worker. If not specified, broadcast the result to all workers.

Example:

>>> # assume 4 workers
>>> rank = comm.get_rank()
>>> x = torch.rand(5)
>>> obj = {"polynomial": x ** rank}
>>> obj = comm.reduce(obj)
>>> assert torch.allclose(obj["polynomial"], x ** 3 + x ** 2 + x + 1)

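"Any nested container" implies a recursive traversal: containers are walked structurally and only the leaves are combined across workers. A torch-free sketch of that recursion pattern, using plain numbers in place of tensors (an assumption about the shape of the implementation, not the implementation itself):

```python
def reduce_trees(objs, op=sum):
    """Combine a list of structurally identical nested containers leaf-wise.

    ``objs`` models the per-worker copies of the same container; the real
    comm.reduce combines tensors across distributed workers instead.
    """
    first = objs[0]
    if isinstance(first, dict):
        # Recurse into each key, gathering that key's value from every worker.
        return {k: reduce_trees([o[k] for o in objs], op) for k in first}
    if isinstance(first, (list, tuple)):
        # Preserve the container type while recursing into each position.
        return type(first)(
            reduce_trees([o[i] for o in objs], op) for i in range(len(first))
        )
    # Leaf: combine the per-worker values element-wise.
    return op(objs)
```

For example, reducing two workers' copies of `{"a": 1, "b": [2, 3]}` and `{"a": 10, "b": [20, 30]}` with the default sum yields `{"a": 11, "b": [22, 33]}`.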
MolecularDiffusion.utils.comm.stack(obj, dst=None)

Stack any nested container of tensors. The new dimension is inserted at the 0-th axis.

Parameters:
  • obj (Object) – any container object. Can be nested list, tuple or dict.

  • dst (int, optional) – rank of destination worker. If not specified, broadcast the result to all workers.

Example:

>>> # assume 4 workers
>>> rank = comm.get_rank()
>>> x = torch.rand(5)
>>> obj = {"exponent": x ** rank}
>>> obj = comm.stack(obj)
>>> truth = torch.stack([torch.ones_like(x), x, x ** 2, x ** 3])
>>> assert torch.allclose(obj["exponent"], truth)

MolecularDiffusion.utils.comm.synchronize()

Synchronize among all distributed processes.
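Synchronization here means a barrier: no worker proceeds until every worker has reached the call. A minimal model of that behavior using 4 threads in place of 4 workers (illustration only; the real call wraps a distributed barrier):

```python
import threading

events = []
lock = threading.Lock()
barrier = threading.Barrier(4)

def worker(rank):
    with lock:
        events.append(("before", rank))
    barrier.wait()  # <- plays the role of comm.synchronize()
    with lock:
        events.append(("after", rank))

threads = [threading.Thread(target=worker, args=(r,)) for r in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# All "before" events precede all "after" events: the barrier guarantees
# every worker has arrived before any worker continues.
assert all(tag == "before" for tag, _ in events[:4])
assert all(tag == "after" for tag, _ in events[4:])
```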

MolecularDiffusion.utils.comm.cpu_group = None
MolecularDiffusion.utils.comm.gpu_group = None