MolecularDiffusion.utils.file

Attributes

logger

Functions

compute_md5(file_name[, chunk_size])

Compute MD5 of the file.

download(url, path[, save_file, md5])

Download a file from the specified url.

extract(zip_file[, member])

Extract files from an archive. Supported file types: zip, gz, tar.gz, and tar.

get_line_count(file_name[, chunk_size])

Get the number of lines in a file.

smart_open(file_name[, mode])

Open a regular file or a zipped file.

Module Contents

MolecularDiffusion.utils.file.compute_md5(file_name, chunk_size=65536)

Compute MD5 of the file.

Parameters:
  • file_name (str) – file name

  • chunk_size (int, optional) – chunk size for reading large files
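The chunked reading that chunk_size suggests can be sketched with the standard library. This is a minimal illustration, not the module's actual source: the name md5_of_file is hypothetical, and returning a hexadecimal string is an assumption, since the return type is not documented here.

```python
import hashlib


def md5_of_file(file_name, chunk_size=65536):
    """Hypothetical stand-in for compute_md5: hash a file chunk by chunk."""
    digest = hashlib.md5()
    with open(file_name, "rb") as fin:
        # fin.read() returns b"" at end of file, which stops the iterator.
        for chunk in iter(lambda: fin.read(chunk_size), b""):
            digest.update(chunk)
    # Assumption: the checksum is reported as a hex string.
    return digest.hexdigest()
```

Reading in fixed-size chunks keeps memory use bounded by chunk_size regardless of the file size.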

MolecularDiffusion.utils.file.download(url, path, save_file=None, md5=None)

Download a file from the specified URL. The download is skipped if a file matching the given MD5 already exists.

Parameters:
  • url (str) – URL to download

  • path (str) – path to store the downloaded file

  • save_file (str, optional) – name of the saved file. If not specified, the file name is inferred from the URL.

  • md5 (str, optional) – MD5 of the file
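The skip-if-cached behavior described above might look roughly like the following sketch. It is not the module's implementation: download_if_needed is a hypothetical name, the returned path is an assumption, and the sketch does not strip query strings from the URL when inferring the file name.

```python
import hashlib
import os
from urllib.request import urlretrieve


def download_if_needed(url, path, save_file=None, md5=None):
    """Hypothetical stand-in for download: fetch only when no matching file exists."""
    if save_file is None:
        # Infer the file name from the last component of the URL.
        save_file = os.path.basename(url)
    target = os.path.join(path, save_file)

    if md5 is not None and os.path.exists(target):
        with open(target, "rb") as fin:
            if hashlib.md5(fin.read()).hexdigest() == md5:
                # An existing file already satisfies the checksum: skip.
                return target

    urlretrieve(url, target)
    # Assumption: the caller gets back the path of the stored file.
    return target
```

In this sketch a file that exists but has no MD5 to check against is downloaded again; whether the real function behaves the same way is not stated in this reference.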

MolecularDiffusion.utils.file.extract(zip_file, member=None)

Extract files from an archive. Supported file types: zip, gz, tar.gz, and tar.

Parameters:
  • zip_file (str) – file name

  • member (str, optional) – a specific member to extract from the archive. If not specified, all members are extracted.
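Dispatching on the listed file types can be illustrated with the standard zipfile, tarfile, and gzip modules. This is a sketch under assumptions rather than the module's code: extract_archive is a hypothetical name, and both the output location (next to the archive) and the return value are guesses that this reference does not confirm.

```python
import gzip
import os
import shutil
import tarfile
import zipfile


def extract_archive(archive, member=None):
    """Hypothetical stand-in for extract: unpack into the archive's directory."""
    out_dir = os.path.dirname(os.path.abspath(archive))

    if archive.endswith(".zip"):
        with zipfile.ZipFile(archive) as zf:
            if member is None:
                zf.extractall(out_dir)
            else:
                zf.extract(member, out_dir)
    elif archive.endswith((".tar.gz", ".tar")):
        # Checked before the plain ".gz" case, since ".tar.gz" also ends in ".gz".
        with tarfile.open(archive, "r:*") as tf:
            if member is None:
                tf.extractall(out_dir)
            else:
                tf.extract(member, out_dir)
    elif archive.endswith(".gz"):
        # A bare .gz holds a single stream, so `member` does not apply here.
        with gzip.open(archive, "rb") as fin, open(archive[:-3], "wb") as fout:
            shutil.copyfileobj(fin, fout)
    else:
        raise ValueError("Unsupported archive type: %s" % archive)

    # Assumption: report where the files were written.
    return out_dir
```

Only extract archives from trusted sources with a sketch like this, since it performs no checks on member paths.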

MolecularDiffusion.utils.file.get_line_count(file_name, chunk_size=8192 * 1024)

Get the number of lines in a file.

Parameters:
  • file_name (str) – file name

  • chunk_size (int, optional) – chunk size for reading large files
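One plausible way to count lines in chunks is to count newline bytes as the file is read. The sketch below assumes that approach; count_lines is a hypothetical name, and whether the real function counts a final line that lacks a trailing newline is not specified here.

```python
def count_lines(file_name, chunk_size=8192 * 1024):
    """Hypothetical stand-in for get_line_count: count newline bytes per chunk."""
    count = 0
    with open(file_name, "rb") as fin:
        # Binary mode avoids decoding costs and works for any text encoding
        # in which b"\n" marks the end of a line.
        for chunk in iter(lambda: fin.read(chunk_size), b""):
            count += chunk.count(b"\n")
    return count
```

Because this version counts newline characters, a last line without a terminating newline is not included in the total.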

MolecularDiffusion.utils.file.smart_open(file_name, mode='rb')

Open a regular file or a zipped file.

This function can be used as a drop-in replacement for the builtin function open().

Parameters:
  • file_name (str) – file name

  • mode (str, optional) – open mode for the file stream
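A drop-in replacement for open() of this kind is often written by choosing an opener from the file extension. The sketch below assumes that design; open_maybe_compressed is a hypothetical name, and the handled extensions (.gz and .bz2) are an assumption, since this reference does not list which compressed formats are supported.

```python
import bz2
import gzip
import os


def open_maybe_compressed(file_name, mode="rb"):
    """Hypothetical stand-in for smart_open: pick an opener from the extension."""
    extension = os.path.splitext(file_name)[1]
    if extension == ".gz":
        return gzip.open(file_name, mode)
    if extension == ".bz2":
        return bz2.open(file_name, mode)
    # Anything else is treated as a regular, uncompressed file.
    return open(file_name, mode)
```

The returned object works as a context manager in all three cases, so calling code can use a `with` block without knowing whether the file is compressed.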

MolecularDiffusion.utils.file.logger