Repositories Overview

Version control built for AI — store models, datasets, and code together with no file size limits.

Outpost Repositories provide version control built from the ground up for AI workflows. The underlying SCM engine is written in Rust and designed to handle large binary data — model checkpoints, datasets, images, video, sensor data — with content-addressed storage, deduplication, and no file size limits.

Every repository tracks changes, supports branching and merging, and works through the outpost CLI and the web UI.

Key capabilities

No file size limits — a 50GB model checkpoint is versioned the same way as a 2KB script. No LFS configuration, no external plugins.
Content-addressed storage — every file is identified by its SHA-2 hash, so identical content is stored once, regardless of where it appears. Push a model, change one layer, and push again: only the diff is transferred.
Branching and merging — create branches for experiments, open pull requests for review, merge when ready.
Deduplication — when you push a commit, only changed files are uploaded. The rest are referenced by hash across all versions and repositories.
Metadata extraction — Outpost automatically extracts metadata from tracked files (image dimensions, CSV schemas, audio sample rates) and makes it searchable.
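The hashing and deduplication behavior described above can be illustrated with a toy in-memory store. This is a minimal sketch, not Outpost's implementation. It assumes SHA-256 (one member of the SHA-2 family) and whole-file hashing, which matches the file-level deduplication described here; a finer-grained scheme would be needed to transfer only a changed layer inside a file. `ContentStore` and `push` are hypothetical names.

```python
import hashlib


class ContentStore:
    """Toy in-memory blob store keyed by the SHA-256 digest of each blob."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Identical bytes always produce the same key, so they are stored once.
        self._blobs.setdefault(digest, data)
        return digest

    def has(self, digest: str) -> bool:
        return digest in self._blobs

    def __len__(self) -> int:
        return len(self._blobs)


def push(files: dict[str, bytes], remote: ContentStore) -> list[str]:
    """Hypothetical helper: upload only files whose content the remote lacks.

    Hashes whole files and returns the paths that were actually uploaded.
    """
    uploaded = []
    for path, data in sorted(files.items()):
        if not remote.has(hashlib.sha256(data).hexdigest()):
            remote.put(data)
            uploaded.append(path)
    return uploaded


if __name__ == "__main__":
    remote = ContentStore()
    print(push({"model.bin": b"weights-v1", "train.py": b"train()"}, remote))
    # ['model.bin', 'train.py']
    print(push({"model.bin": b"weights-v2", "train.py": b"train()"}, remote))
    # ['model.bin']  (train.py is unchanged, so it is only referenced by hash)
    print(push({"copy/model.bin": b"weights-v1"}, remote))
    # []  (this content is already in the store under another path)
    print(len(remote))
    # 3
```

Because the key is derived from the bytes themselves, the "is this already stored?" check is a single lookup, and it works the same whether the matching content came from an earlier commit, another branch, or another repository.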

When to use Outpost Repositories

Datasets alongside code — version your training data, evaluation sets, and preprocessing scripts together in one repository.
Large binary files — model checkpoints, images, video, audio that would break a traditional Git repository.
Reproducibility — every commit is a snapshot of your data and code, linked to specific training runs and experiments.
Team collaboration — code reviews, pull requests, and branch protection for ML workflows.

How it works

Content-addressed storage — every file is hashed with SHA-2 and stored under its content hash. Identical files across branches or repositories are stored once.
Merkle trees — directory structures are represented as Merkle trees, enabling efficient comparison and verification of entire repository states.
Streaming — large files can be streamed on demand, so you never need to download an entire dataset to access a single file.
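The Merkle tree idea can be sketched in a few lines: each file's hash covers its bytes, and each directory's hash covers its children's names and hashes, so two repository states can be compared by descending only into subtrees whose hashes differ. This is an illustrative toy under stated assumptions, not Outpost's storage format. It assumes SHA-256, models directories as nested Python dicts, and uses hypothetical `build` and `diff` helpers; the `blob`/`tree` prefixes are an arbitrary choice to keep file and directory hashes distinct.

```python
import hashlib


def build(node):
    """Return (hash, children) for a file (bytes) or a directory (dict).

    children is None for files, otherwise a dict of name -> (hash, children).
    """
    if isinstance(node, bytes):
        return hashlib.sha256(b"blob\0" + node).hexdigest(), None
    children = {name: build(child) for name, child in node.items()}
    h = hashlib.sha256(b"tree\0")
    # A directory hash covers its sorted (name, child hash) entries, so any
    # change anywhere below it changes every hash on the path to the root.
    for name in sorted(children):
        h.update(name.encode() + b"\0" + children[name][0].encode() + b"\n")
    return h.hexdigest(), children


def diff(old, new, prefix=""):
    """List paths that differ between two built trees."""
    old_hash, old_children = old
    new_hash, new_children = new
    if old_hash == new_hash:
        return []  # identical subtree: nothing below it needs inspecting
    if old_children is None or new_children is None:
        return [prefix or "/"]  # a file changed (or a file/directory swap)
    changed = []
    for name in sorted(old_children.keys() | new_children.keys()):
        path = f"{prefix}/{name}"
        if name not in old_children or name not in new_children:
            changed.append(path)  # added or removed entry
        else:
            changed.extend(diff(old_children[name], new_children[name], path))
    return changed


if __name__ == "__main__":
    v1 = {"data": {"train.csv": b"1,2", "eval.csv": b"3,4"}, "train.py": b"fit()"}
    v2 = {"data": {"train.csv": b"1,2", "eval.csv": b"3,5"}, "train.py": b"fit()"}
    print(diff(build(v1), build(v2)))
    # ['/data/eval.csv']
```

Comparing the two root hashes alone tells you whether two states are identical; when they differ, `diff` skips every matching subtree (here, `train.py` and `data/train.csv`) instead of comparing file contents one by one.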

Next steps

Managing Large Files

Connect and Develop