Ethan Chapman
Aug 12, 2020

Pandas + Minio: Uploading & Downloading Files

At Oak-Tree, we use the S3-compatible storage application Minio to house many terabytes of data on our cluster. Minio is easy to use, and it has been remarkably stable for us. A common use case for our engineers is storing files in a Minio bucket and then reading them with Pandas from JupyterHub or their local machine.

Downloading from a public Minio bucket is trivial, because Pandas can read directly from a URL:

import pandas as pd
df = pd.read_csv("https://storage.centerville.oak-tree.tech/public/examples/test.csv")

However, reading from a protected bucket requires credentials. For that, we'll need to use Minio's Python client.

from minio import Minio
client = Minio(
    "storage.centerville.oak-tree.tech",
    access_key="my-access-key",
    secret_key="my-secret-key",
    secure=True
)
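
Hard-coded keys are fine for a quick test, but they tend to leak into notebooks and version control. As a minimal sketch, the settings could be read from environment variables instead. The function name and the variable names (`MINIO_ENDPOINT`, `MINIO_ACCESS_KEY`, `MINIO_SECRET_KEY`, `MINIO_SECURE`) are assumptions made for this example, not something Minio defines.

```python
import os


def minio_settings_from_env(env=os.environ):
    """Build keyword arguments for Minio() from environment variables.

    The variable names below are hypothetical; use whatever your
    deployment defines.
    """
    return {
        "endpoint": env["MINIO_ENDPOINT"],
        "access_key": env["MINIO_ACCESS_KEY"],
        "secret_key": env["MINIO_SECRET_KEY"],
        # Treat anything other than an explicit "false" as HTTPS
        "secure": env.get("MINIO_SECURE", "true").lower() != "false",
    }
```

With those variables set, the client can be created with `Minio(**minio_settings_from_env())`.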

With the Minio client initialized, we can now do two things: download objects from protected buckets and upload objects to Minio.

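Let's download a protected object. The client returns a file-like HTTP response that Pandas can read directly; once we're done with it, we close it and release the connection:

obj = client.get_object(
    "my-protected-bucket",
    "examples/test.csv",
)
try:
    df = pd.read_csv(obj)
finally:
    # Return the underlying HTTP connection to the pool
    obj.close()
    obj.release_conn()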
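
If protected downloads happen often, the pattern above can be wrapped in a small helper. This is only a sketch: `read_csv_from_minio` is a name made up for this example, and it assumes the object is small enough to buffer in memory before parsing.

```python
from io import BytesIO

import pandas as pd


def read_csv_from_minio(client, bucket, key, **read_csv_kwargs):
    """Download one object and parse it as CSV.

    `client` is expected to behave like a minio.Minio instance.
    """
    response = client.get_object(bucket_name=bucket, object_name=key)
    try:
        # Buffer the whole object in memory before parsing
        payload = response.read()
    finally:
        # Always hand the HTTP connection back to the pool
        response.close()
        response.release_conn()
    return pd.read_csv(BytesIO(payload), **read_csv_kwargs)
```

A call would look like `df = read_csv_from_minio(client, "my-protected-bucket", "examples/test.csv")`, and any extra keyword arguments are passed through to `pd.read_csv`.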

Uploading a Pandas DataFrame to Minio takes a little more work than downloading. Minio expects a file-like object along with its length in bytes, so we encode the CSV text as UTF-8 and wrap it in BytesIO.

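from io import BytesIO
# index=False keeps the row index out of the file, so the CSV
# reads back without an extra "Unnamed: 0" column
csv_bytes = df.to_csv(index=False).encode('utf-8')
client.put_object(
    "my-bucket",
    "examples/test.csv",
    data=BytesIO(csv_bytes),
    length=len(csv_bytes),  # length in bytes, not characters
    content_type='text/csv'
)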
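
The upload steps can be bundled the same way. The sketch below is an illustration rather than part of the Minio client: `upload_dataframe_as_csv` is a hypothetical helper, and it assumes the index is not needed in the file and that `text/csv` is the desired content type.

```python
from io import BytesIO

import pandas as pd


def upload_dataframe_as_csv(client, df, bucket, key):
    """Serialize `df` to CSV and store it as one object.

    `client` is expected to behave like a minio.Minio instance.
    Returns the number of bytes uploaded.
    """
    # index=False keeps the row index out of the file
    payload = df.to_csv(index=False).encode("utf-8")
    client.put_object(
        bucket_name=bucket,
        object_name=key,
        data=BytesIO(payload),
        # length is a byte count, so measure the encoded payload
        length=len(payload),
        content_type="text/csv",
    )
    return len(payload)
```

Measuring the encoded bytes rather than the string matters as soon as the data contains non-ASCII characters, because one character can take several bytes in UTF-8.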

All of the examples here used CSV, but the same approach works for other file types: JSON, Excel, and pretty much any format that Pandas can read and write.
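
As an illustration of another format, here is a rough sketch of a JSON round trip. Both function names are invented for this example, and the `records` orientation is just one reasonable choice among those Pandas offers.

```python
from io import BytesIO

import pandas as pd


def upload_dataframe_as_json(client, df, bucket, key):
    """Store `df` as a JSON array of records."""
    payload = df.to_json(orient="records").encode("utf-8")
    client.put_object(
        bucket_name=bucket,
        object_name=key,
        data=BytesIO(payload),
        length=len(payload),
        content_type="application/json",
    )


def read_json_from_minio(client, bucket, key):
    """Load a JSON object written by upload_dataframe_as_json."""
    response = client.get_object(bucket_name=bucket, object_name=key)
    try:
        payload = response.read()
    finally:
        # Release the HTTP connection whether or not the read succeeded
        response.close()
        response.release_conn()
    return pd.read_json(BytesIO(payload), orient="records")
```

Only the serialization call and the content type change from the CSV version; the Minio calls stay the same.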
