Managing Large Files

Connect External Storage

Learn how to connect external storage with Outpost.

Outpost lets you connect external storage to Outpost repositories, so you can access and interact with your data and large files without leaving the Outpost platform.

This lets you manage and version your datasets and models in the same place where you manage your code.

It can also be used to provide your own backing storage for your DVC remote.

What type of storage is supported?

Outpost provides integration with AWS S3, Google Cloud Storage (GCS), Azure Blob Storage and any S3-compatible storage, including MinIO.

How to connect an S3 bucket to an Outpost repository?

This guide will walk you through connecting your bucket to Outpost.

It assumes you have already created your bucket and set it up with the correct permissions.

In this example we'll use AWS S3. Other providers work the same way; simply choose the relevant provider from the list.

If you've already added code or connected your Git repository to your project, click the "Connect external storage bucket" button and select the storage you want to connect. If your project is entirely empty, go to the settings tab and select the Integrations sub-section, where you'll see all available integrations. Some integrations also appear on the empty repository page.

Follow the connection wizard and add the relevant details.

!!! note
    Your bucket URL should include the prefix for its provider: `s3://` for AWS S3 and S3-compatible storage, `gs://` for Google Cloud Storage, and `azure://` for Azure Blob Storage.
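
The prefix rule in the note can be checked before you enter a URL in the wizard. The following is a minimal sketch using only the Python standard library; the provider keys and the `check_bucket_url` helper are hypothetical illustrations and not part of Outpost.

```python
from urllib.parse import urlparse

# Hypothetical provider keys mapped to the URL scheme listed in the note.
EXPECTED_SCHEME = {
    "aws-s3": "s3",
    "s3-compatible": "s3",  # e.g. MinIO
    "gcs": "gs",
    "azure": "azure",
}


def check_bucket_url(url: str, provider: str) -> str:
    """Return the bucket name if `url` uses the scheme expected for `provider`.

    Raises ValueError for an unknown provider, a wrong prefix, or a missing bucket name.
    """
    try:
        expected = EXPECTED_SCHEME[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None
    parsed = urlparse(url)
    if parsed.scheme != expected:
        raise ValueError(f"{url!r} should start with {expected}://")
    if not parsed.netloc:
        raise ValueError(f"{url!r} has no bucket name after {expected}://")
    return parsed.netloc


print(check_bucket_url("s3://good-dog-pics", "aws-s3"))  # good-dog-pics
```

Parsing with `urlparse` rather than a plain string comparison also rejects URLs that have the right prefix but no bucket name.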

Accessing and viewing your connected storage bucket

Once you have connected your bucket, it appears in the "Storage Buckets" section of your files tab. From there you can browse its contents, download files, and create a Data Engine Datasource from it for convenient annotation and training.

How to stream files hosted on an S3 bucket?

Data Streaming supports streaming files from buckets connected to an Outpost repository. This means you can access the files as if they were on your local file system, and they are downloaded on demand.
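
To illustrate what downloading on demand means, here is a conceptual sketch of a lazily populated local cache: a file is fetched the first time it is read and served from local disk afterwards. This is not Outpost's implementation; the `OnDemandCache` class, the `fetch` callable (a stand-in for the real network download), and the bucket path are all hypothetical.

```python
import os
import tempfile
from typing import Callable


class OnDemandCache:
    """Conceptual sketch: download a remote file on first read, reuse the local copy after.

    `fetch` is a hypothetical stand-in for the real download from a bucket.
    """

    def __init__(self, fetch: Callable[[str], bytes], cache_dir: str) -> None:
        self.fetch = fetch
        self.cache_dir = cache_dir
        self.downloads = 0  # number of times fetch() was actually called

    def local_path(self, remote_path: str) -> str:
        # Map "s3://bucket/dir/file" to "<cache_dir>/s3/bucket/dir/file".
        scheme, _, rest = remote_path.partition("://")
        return os.path.join(self.cache_dir, scheme, *rest.split("/"))

    def read(self, remote_path: str) -> bytes:
        local = self.local_path(remote_path)
        if not os.path.exists(local):
            # First access: download the file, then store it locally.
            # Fetching before opening avoids leaving an empty file if fetch fails.
            data = self.fetch(remote_path)
            os.makedirs(os.path.dirname(local), exist_ok=True)
            with open(local, "wb") as f:
                f.write(data)
            self.downloads += 1
        # Every access is served from the local copy.
        with open(local, "rb") as f:
            return f.read()


# Usage with a fake in-memory "bucket" (hypothetical path and contents).
fake_bucket = {"s3://example-bucket/images/a.jpg": b"fake image bytes"}
with tempfile.TemporaryDirectory() as tmp:
    cache = OnDemandCache(fake_bucket.__getitem__, tmp)
    cache.read("s3://example-bucket/images/a.jpg")  # downloads
    cache.read("s3://example-bucket/images/a.jpg")  # served from the cache
    print(cache.downloads)  # 1
```

The trade-off of this design is that only the files you actually touch are transferred, at the cost of a delay on the first access to each file.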

Here's some example code for an S3 bucket:

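```python
from outpost.streaming import OutpostFilesystem

# Replace the placeholders with your user and repository names
fs = OutpostFilesystem(".", repo_url="https://outpost.run/<user_name>/<repo_name>")

# List a folder inside the connected bucket
print(fs.listdir("s3://good-dog-pics/10plus"))
```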

It also works with the install_hooks() function, which lets standard Python file operations such as os.listdir work with your connected bucket:

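```python
import os
from outpost.streaming import install_hooks

# Enable streaming for standard Python file calls
install_hooks()

# os.listdir now works with paths inside the connected bucket
print(os.listdir("s3://good-dog-pics/10plus"))
```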