dml_util.aws.s3

S3 storage utilities.

Classes

S3Store([bucket, prefix, client])

S3 Store for DML

class dml_util.aws.s3.S3Store(bucket=<factory>, prefix=None, client=<factory>)

Bases: object

S3 Store for DML

Parameters:
  • bucket (str) – S3 bucket name. Defaults to the value of the environment variable “DML_S3_BUCKET”.

  • prefix (str, optional) – S3 key prefix. Defaults to the value of the environment variable “DML_S3_PREFIX” with “/data” appended.

  • client (boto3.client, optional) – Boto3 S3 client. Defaults to a new client created using the get_client function.

Notes

  • If prefix is not provided, it defaults to the value of the DML_S3_PREFIX environment variable with “/data” appended.

  • prefix is stripped of leading and trailing slashes, so a prefix like “/foo/” cannot be expressed through the prefix argument; pass full URIs instead. For example, to put data at “s3://my-bucket//foo/bar”, use S3Store().put(data, uri="s3://my-bucket//foo/bar").
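The two notes above can be sketched in plain Python. This is an illustration of the documented behavior, not the library's implementation; `resolve_prefix` is a hypothetical helper, and treating an unset DML_S3_PREFIX as an empty string is an assumption.

```python
import os


def resolve_prefix(prefix=None, env=None):
    """Mimic the documented prefix defaulting and slash stripping."""
    env = os.environ if env is None else env
    if prefix is None:
        # Assumption: an unset DML_S3_PREFIX is treated as an empty string.
        prefix = env.get("DML_S3_PREFIX", "") + "/data"
    # Leading and trailing slashes are dropped, so "/foo/" becomes "foo".
    return prefix.strip("/")


print(resolve_prefix("/foo/"))                                  # foo
print(resolve_prefix(None, {"DML_S3_PREFIX": "team/project"}))  # team/project/data
```

Because the stripping applies only to the stored prefix, a key that really begins with a slash has to be addressed through a full s3:// URI, as the second note describes.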

Examples

>>> s3 = S3Store(bucket="my-bucket", prefix="my-prefix")
>>> s3.put(data=b"Hello, World!", name="greeting.txt")
Resource(uri='s3://my-bucket/my-prefix/greeting.txt')
>>> s3.ls(recursive=True)
['s3://my-bucket/my-prefix/greeting.txt']
>>> s3.get("greeting.txt")
b'Hello, World!'
>>> s3.exists("greeting.txt")
True
>>> s3.rm("greeting.txt")
>>> s3.exists("greeting.txt")
False
>>> s3.put_js({"key": "value"}, name="data")
Resource(uri='s3://my-bucket/my-prefix/data.json')
>>> s3.get_js("data")
{'key': 'value'}
>>> s3.tar(dml, path="my_data", excludes=["*.tmp"])
Resource(uri='s3://my-bucket/my-prefix/my_data.tar')
>>> s3.untar("s3://my-bucket/my-prefix/my_data.tar", dest="my_data")
# Extracts the tar archive to the local directory "my_data"
>>> s3.cd("new-prefix")
S3Store(bucket='my-bucket', prefix='my-prefix/new-prefix')
>>> s3.cd("..")  # Go up one level from "my-prefix"
S3Store(bucket='my-bucket', prefix='')

bucket: str
cd(new_prefix)

Change the prefix of the S3 store.

Return type:

S3Store
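The prefix arithmetic suggested by the class examples (a relative name is appended, ".." moves up one level) can be sketched with `posixpath`. `cd_prefix` is a hypothetical helper written for illustration; how the library resolves prefixes internally is an assumption.

```python
import posixpath


def cd_prefix(current, new_prefix):
    """Resolve new_prefix against current using POSIX path rules."""
    joined = posixpath.normpath(posixpath.join(current, new_prefix))
    # normpath returns "." for an empty result; the store uses "" instead.
    return "" if joined == "." else joined.strip("/")


print(cd_prefix("my-prefix", "new-prefix"))  # my-prefix/new-prefix
print(cd_prefix("my-prefix", ".."))          # (empty string)
```

The results match the class examples: cd returns a store with the resolved prefix, and calling cd("..") on a store whose prefix is "my-prefix" yields an empty prefix.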

client: client
exists(name_or_uri)
get(name_or_uri)
get_js(uri)
ls(s3_root=None, *, recursive=False, lazy=False)

List objects in the S3 bucket.

Parameters:
  • s3_root (str, optional) – Name or S3 URI to list under. Defaults to s3://<bucket>/<prefix>/.

  • recursive (bool) – If True, list all objects recursively. Defaults to False.

  • lazy (bool) – If True, return a generator. Defaults to False.

Returns:

A generator or list of S3 URIs.

Return type:

generator or list
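The difference between lazy=True and lazy=False can be sketched as a generator that is either returned as-is or consumed into a list. The page dictionaries below follow the shape of S3 `list_objects_v2` responses; `iter_uris`, `ls_sketch`, and the sample pages are hypothetical, and whether the library paginates this way is an assumption.

```python
def iter_uris(pages, bucket):
    """Yield s3:// URIs from list_objects_v2-style response pages."""
    for page in pages:
        # A page with no matching objects has no "Contents" entry.
        for obj in page.get("Contents", []):
            yield f"s3://{bucket}/{obj['Key']}"


def ls_sketch(pages, bucket, *, lazy=False):
    """Return a generator when lazy, otherwise a fully built list."""
    uris = iter_uris(pages, bucket)
    return uris if lazy else list(uris)


# Hypothetical pages standing in for paginated S3 responses.
pages = [
    {"Contents": [{"Key": "my-prefix/a.txt"}, {"Key": "my-prefix/b.txt"}]},
    {},
]
print(ls_sketch(pages, "my-bucket"))
# ['s3://my-bucket/my-prefix/a.txt', 's3://my-bucket/my-prefix/b.txt']
```

A lazy listing is useful for large prefixes, since URIs are produced one page at a time instead of being collected up front.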

parse_uri(name_or_uri)

Parse a URI or name into bucket and key.

Examples

>>> s3 = S3Store(bucket="my-bucket", prefix="my-prefix")
>>> s3.parse_uri("s3://my-other-bucket/my-key")
('my-other-bucket', 'my-key')
>>> s3.parse_uri("my-key")
('my-bucket', 'my-prefix/my-key')
>>> s3.parse_uri(Resource("s3://my-other-bucket/my-key"))
('my-other-bucket', 'my-key')
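The resolution rule shown in these examples can be sketched as a standalone function: full s3:// URIs are split into bucket and key, and bare names are placed under the store's own bucket and prefix. `parse_uri_sketch` is hypothetical, and reading the location from a `.uri` attribute on Resource-like objects is an assumption based on the `Resource(uri=...)` repr.

```python
from urllib.parse import urlparse


def parse_uri_sketch(name_or_uri, bucket, prefix):
    """Split a name, s3:// URI, or Resource-like object into (bucket, key)."""
    # Assumption: Resource-like objects expose their location as `.uri`.
    text = getattr(name_or_uri, "uri", name_or_uri)
    parsed = urlparse(text)
    if parsed.scheme == "s3":
        # Drop only the single slash separating bucket and key, so a key
        # that itself starts with "/" is preserved.
        return parsed.netloc, parsed.path[1:]
    key = f"{prefix}/{text}" if prefix else text
    return bucket, key


print(parse_uri_sketch("s3://my-other-bucket/my-key", "my-bucket", "my-prefix"))
# ('my-other-bucket', 'my-key')
print(parse_uri_sketch("my-key", "my-bucket", "my-prefix"))
# ('my-bucket', 'my-prefix/my-key')
```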
prefix: str = None
put(data=None, filepath=None, name=None, uri=None, suffix=None)
put_js(data, uri=None, **kw)
Return type:

Resource
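In the class examples, put_js({"key": "value"}, name="data") stores an object named data.json and get_js("data") returns the original dictionary. A minimal sketch of that round trip, assuming the data is serialized with the standard `json` module and a ".json" suffix is appended to the name (both helpers are hypothetical):

```python
import json


def to_json_object(data, name, suffix=".json"):
    """Return the (object name, body bytes) pair a JSON upload would use."""
    body = json.dumps(data).encode("utf-8")
    return name + suffix, body


def from_json_object(body):
    """Decode the stored bytes back into Python data."""
    return json.loads(body.decode("utf-8"))


name, body = to_json_object({"key": "value"}, "data")
print(name)                    # data.json
print(from_json_object(body))  # {'key': 'value'}
```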

rm(*name_or_uris)

Remove objects from S3.

Parameters:

name_or_uris (str | Resource | list[str | Resource]) – Names or S3 URIs of the objects to remove.
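Since the signature accepts both individual items and lists, the arguments have to be flattened before deletion. One way this could be done, as a sketch only (`flatten_targets` is a hypothetical helper, not part of the library):

```python
def flatten_targets(*name_or_uris):
    """Flatten a mix of single items and lists into one list of targets."""
    flat = []
    for item in name_or_uris:
        if isinstance(item, (list, tuple)):
            flat.extend(item)
        else:
            flat.append(item)
    return flat


print(flatten_targets("a.txt", ["b.txt", "s3://my-bucket/c.txt"]))
# ['a.txt', 'b.txt', 's3://my-bucket/c.txt']
```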

tar(dml, path, excludes=())

Create a tar archive and store it in S3.
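The effect of excludes can be illustrated with the standard `tarfile` module. The sketch below only builds the archive bytes locally and does not upload anything; `make_tar` is hypothetical, and matching patterns against file base names with `fnmatch` is an assumption about how excludes is interpreted.

```python
import fnmatch
import io
import os
import tarfile


def make_tar(path, excludes=()):
    """Pack the directory at `path` into tar bytes, skipping excluded names."""

    def keep(info):
        name = os.path.basename(info.name)
        if any(fnmatch.fnmatch(name, pattern) for pattern in excludes):
            return None  # returning None drops the member from the archive
        return info

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        # arcname="." stores members relative to the directory itself.
        tar.add(path, arcname=".", filter=keep)
    return buf.getvalue()
```

With excludes=["*.tmp"], as in the class example, files such as scratch.tmp would be left out of the archive while everything else under path is included.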

untar(tar_uri, dest)

Extract a tar archive from S3 to a local directory.
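The local half of this operation can be sketched with the standard library: once the archive bytes have been downloaded, they are unpacked into dest. `extract_tar` is a hypothetical helper, and the download step is omitted.

```python
import io
import os
import tarfile


def extract_tar(data, dest):
    """Unpack tar bytes into the local directory `dest`, creating it if needed."""
    os.makedirs(dest, exist_ok=True)
    with tarfile.open(fileobj=io.BytesIO(data), mode="r") as tar:
        tar.extractall(dest)
```

For archives from untrusted sources, recent Python versions also accept tar.extractall(dest, filter="data"), which rejects members that would be written outside dest.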