Managing Large Files
Learn how to connect external storage with Outpost.
Outpost lets you connect external storage to your Outpost repositories, so you can access and interact with your data and large files without leaving the Outpost platform.
This way you can manage and version your datasets and models in the same place where you manage your code.
It can also be used to provide your own backing storage for your DVC remote.
Outpost provides integration with AWS S3, Google Cloud Storage (GCS), Azure Blob Storage and any S3-compatible storage, including MinIO.
This guide will walk you through connecting your bucket to Outpost.
It assumes you have already created your bucket and set it up with the correct permissions.
The example uses AWS S3, but the other options work similarly: simply choose the relevant provider from the list.
If you've already added code or connected your Git repo to your project, click the "Connect external storage bucket" button and select the storage you want to connect. Otherwise, if your project is entirely empty, go to the settings tab and open the Integrations sub-section, where you'll see all available integrations. Some integrations also appear on the repo empty state.
Follow the connection wizard and add the relevant details.
!!! note
    Your bucket URL should include the relevant prefix: s3:// for AWS S3, gs:// for Google Cloud Storage, azure:// for Azure Blob Storage, and s3:// for S3-compatible storage.
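The prefix rule in the note above can be checked before you paste a URL into the connection wizard. The following is a minimal sketch, not part of Outpost: `check_bucket_url` and `SUPPORTED_SCHEMES` are hypothetical names introduced for illustration, and the check only looks at the shape of the URL, not at whether the bucket exists or is reachable.

```python
from urllib.parse import urlparse

# Prefixes listed in the note above (assumption: no other schemes are accepted).
SUPPORTED_SCHEMES = {"s3", "gs", "azure"}


def check_bucket_url(url: str) -> str:
    """Return the scheme of a bucket URL, or raise ValueError if it looks wrong."""
    parsed = urlparse(url)
    if parsed.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported or missing prefix in {url!r}")
    if not parsed.netloc:
        raise ValueError(f"no bucket name in {url!r}")
    return parsed.scheme


# The bucket used in this guide's examples passes the check.
print(check_bucket_url("s3://good-dog-pics/10plus"))
```

A URL without a prefix, such as `good-dog-pics/10plus`, raises `ValueError` instead of being passed along silently.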
Once you have connected your bucket you'll see the bucket appear in the "Storage Buckets" section of your files tab. You'll be able to see its contents, download files, and create a Data Engine Datasource from it for convenient annotation and training.
Data Streaming supports streaming files from the buckets connected to an Outpost repository. This means you can access the files as if they were on your local system, and they are downloaded on demand.
Here's some example code for an S3 bucket:
from outpost.streaming import OutpostFilesystem

fs = OutpostFilesystem(".", repo_url="https://outpost.run/<user_name>/<repo_name>")

fs.listdir("s3://good-dog-pics/10plus")
It also works with the magical install_hooks() functionality:
import os
from outpost.streaming import install_hooks
install_hooks()

os.listdir("s3://good-dog-pics/10plus")
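Because hooked paths go through the standard `os.listdir` call, code written for a local folder should not need to know where the files live. Here is a minimal sketch of that idea, run against a temporary local directory so it works without a bucket. `list_images` and `IMAGE_SUFFIXES` are hypothetical names for illustration, and the assumption that the same function accepts a bucket path relies on `install_hooks()` having been called as in the example above.

```python
import os
import tempfile

IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png")  # assumption: adjust to your data


def list_images(path):
    """Return the sorted image file names found directly under path."""
    return sorted(
        name for name in os.listdir(path)
        if name.lower().endswith(IMAGE_SUFFIXES)
    )


# Local demonstration with a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("rex.JPG", "bella.png", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    local_images = list_images(tmp)

print(local_images)
```

With the hooks installed, the same function could presumably be called as `list_images("s3://good-dog-pics/10plus")`. Whether calls other than `os.listdir` are covered by the hooks is not stated in this guide.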