Pandas + Minio: Uploading & Downloading Files
At Oak-Tree, we use the S3-compatible storage application Minio to house many terabytes of data on our cluster. Minio is easy to use and has been remarkably stable for us. A common use case for our engineers is storing files in a Minio bucket and then accessing them with Pandas, either from JupyterHub or from their local machine.
Downloading from a public Minio bucket is trivial, because Pandas can read directly from the object's URL:
    import pandas as pd

    df = pd.read_csv("https://storage.centerville.oak-tree.tech/public/examples/test.csv")
However, accessing from a protected bucket is a bit more involved. To do so, we'll need to use Minio's Python client.
    from minio import Minio

    client = Minio(
        "storage.centerville.oak-tree.tech",
        access_key="my-access-key",
        secret_key="my-secret-key",
        secure=True,
    )
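Hard-coding keys is fine for a quick experiment, but in a shared notebook it is usually safer to read them from the environment. Here is a minimal sketch of that idea. The environment variable names (MINIO_ENDPOINT, MINIO_ACCESS_KEY, MINIO_SECRET_KEY, MINIO_SECURE) and the helper name are assumptions made for this example, not something Minio defines.

```python
import os


def minio_kwargs_from_env(environ=os.environ) -> dict:
    """Build keyword arguments for minio.Minio from environment variables.

    The variable names below are hypothetical, chosen for this sketch.
    """
    return {
        "endpoint": environ["MINIO_ENDPOINT"],
        "access_key": environ["MINIO_ACCESS_KEY"],
        "secret_key": environ["MINIO_SECRET_KEY"],
        # Use TLS unless it is explicitly switched off.
        "secure": environ.get("MINIO_SECURE", "true").lower() != "false",
    }
```

With those variables set, the client can be created with Minio(**minio_kwargs_from_env()).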
With the Minio client initialized, we can do two things: download objects from protected buckets and upload objects to Minio.
Let's try to download a protected object:
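    obj = client.get_object(
        "my-protected-bucket",
        "examples/test.csv",
    )
    try:
        # The response is a file-like stream that Pandas can read directly
        df = pd.read_csv(obj)
    finally:
        # Release the underlying HTTP connection once the data is loaded
        obj.close()
        obj.release_conn()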
Uploading a Pandas DataFrame to Minio is a bit more involved than downloading. Minio accepts file-like objects, so we can wrap the encoded CSV in a BytesIO:
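    from io import BytesIO

    # Serialize the DataFrame to CSV and encode it as bytes
    csv_bytes = df.to_csv().encode("utf-8")

    client.put_object(
        "my-bucket",
        "examples/test.csv",
        data=BytesIO(csv_bytes),
        length=len(csv_bytes),  # size in bytes, not characters
        content_type="text/csv",
    )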
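One detail in the upload call is easy to get wrong: length is the size of the payload in bytes, so it has to be measured after encoding. The sketch below shows the difference using a small made-up DataFrame. The helper name dataframe_to_csv_stream is hypothetical, and index=False is a choice made for this example.

```python
from io import BytesIO

import pandas as pd


def dataframe_to_csv_stream(df: pd.DataFrame):
    """Serialize a DataFrame to CSV and return (stream, size in bytes)."""
    payload = df.to_csv(index=False).encode("utf-8")
    return BytesIO(payload), len(payload)


# Illustrative data containing non-ASCII characters
df = pd.DataFrame({"city": ["Zürich", "São Paulo"], "visits": [3, 5]})

text = df.to_csv(index=False)
stream, length = dataframe_to_csv_stream(df)

# "ü" and "ã" each take two bytes in UTF-8, so the character count
# is smaller than the number of bytes that will actually be sent.
print(len(text), length)
```

Passing len(text) here would understate the payload size by two bytes, which is why the length is taken from the encoded bytes.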
All of the examples here used CSV, but the same approach works for JSON, Excel, and pretty much any other file format that Pandas supports.
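As an illustration, here is roughly what the same upload might look like for JSON. This is a sketch rather than a definitive implementation: the helper name upload_json is hypothetical, client is assumed to be a Minio client like the one created earlier, and orient="records" is just one of the layouts Pandas offers.

```python
from io import BytesIO

import pandas as pd


def upload_json(client, bucket: str, object_name: str, df: pd.DataFrame) -> None:
    """Serialize a DataFrame to JSON and upload it with put_object."""
    # One JSON object per row, encoded to bytes for the upload
    payload = df.to_json(orient="records").encode("utf-8")
    client.put_object(
        bucket,
        object_name,
        data=BytesIO(payload),
        length=len(payload),
        content_type="application/json",
    )
```

Reading the object back follows the download pattern shown earlier, with pd.read_json in place of pd.read_csv.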