mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-06-21 19:10:45 -04:00
302 lines
12 KiB
Markdown
302 lines
12 KiB
Markdown
# Setting Up Storage
|
|
|
|
> *💬 We offer [consulting services](https://docs.monadical.com/s/archivebox-consulting-services) to set up, secure, and maintain ArchiveBox on your preferred storage provider.*
|
|
> <sub>We use this revenue (from corporate clients who can afford to pay) to support open source development and keep ArchiveBox free.</sub>
|
|
|
|
<br/>
|
|
|
|
ArchiveBox supports a wide range of local and remote filesystems using `rclone` and/or Docker storage plugins. The examples below use [Docker Compose bind mounts](https://docs.docker.com/storage/bind-mounts/) to demonstrate the concepts, you can adapt them to your OS and environment needs.
|
|
|
|
Example [`docker-compose.yml`](https://github.com/ArchiveBox/ArchiveBox/blob/dev/docker-compose.yml) storage setup:
|
|
```yaml
|
|
services:
|
|
archivebox:
|
|
...
|
|
volumes:
|
|
# your index db, config, logs, etc. should be stored on a local SSD (usually <10Gb)
|
|
- ./data:/data
|
|
|
|
# but bulk archive/ content can be located on an HDD or remote filesystem
|
|
- /mnt/archivebox-s3/data/archive:/data/archive
|
|
```
|
|
|
|
<h4>Related Docs</h4>
|
|
<ul>
|
|
<li><a href="https://github.com/ArchiveBox/ArchiveBox#archive-layout">README: Archive Layout</a></li>
|
|
<li><a href="https://github.com/ArchiveBox/ArchiveBox/wiki/Usage#Disk-Layout">Wiki: Usage (Disk Layout)</a></li>
|
|
<li><a href="https://github.com/ArchiveBox/ArchiveBox/wiki/Usage#large-archives">Wiki: Usage (Large Archives)</a></li>
|
|
<li><a href="https://github.com/ArchiveBox/ArchiveBox/wiki/Security-Overview#output-folder">Wiki: Security Overview (Output Folder)</a></li>
|
|
<li><a href="https://github.com/ArchiveBox/ArchiveBox/wiki/Publishing-Your-Archive">Wiki: Publishing Your Archive</a></li>
|
|
<li><a href="https://github.com/ArchiveBox/ArchiveBox/wiki/Upgrading-or-Merging-Archives">Wiki: Upgrading or Merging Archives</a></li>
|
|
<li><a href="https://github.com/ArchiveBox/ArchiveBox/wiki/Troubleshooting#other-database-or-filesystem-issues">Wiki: Troubleshooting Filesystem Issues</a></li>
|
|
</ul>
|
|
|
|
---
|
|
|
|
## Supported Local Filesystems
|
|
|
|
<img src="https://github.com/ArchiveBox/ArchiveBox/assets/511499/45abfe78-87c4-4c87-ab11-9dae2f3b2518" alt="local filesystem icon" width="80px" align="right"/>
|
|
|
|
<a name="ext4"></a><a name="apfs"></a>
|
|
|
|
### `EXT4` (default on Linux), `APFS` (default on macOS)
|
|
|
|
> [!TIP]
|
|
> These default filesystems are fully supported by ArchiveBox on Linux and macOS (w/wo Docker).
|
|
|
|
<a name="zfs"></a>
|
|
|
|
### `ZFS` (recommended for best experience on Linux/BSD) ⭐️
|
|
|
|
> [!TIP]
|
|
> *This is the recommended filesystem for ArchiveBox on Linux, macOS, and BSD (w/wo Docker).*
|
|
> [`apt install zfsutils-linux`](https://openzfs.github.io/openzfs-docs/Getting%20Started/Ubuntu/index.html)
|
|
> <sub>Provides RAID, compression, encryption, deduping, 0-cost point-in-time backups, remote sync, integrity verification, and more...</sub>
|
|
|
|
- https://openzfs.github.io/openzfs-docs/
|
|
- https://openzfs.github.io/openzfs-docs/man/v2.2/8/zpool-create.8.html
|
|
- https://openzfs.github.io/openzfs-docs/man/v2.2/8/zfs-create.8.html
|
|
- https://docs.docker.com/storage/storagedriver/zfs-driver/
|
|
- https://www.ixsystems.com/blog/fast-dedup-is-a-valentines-gift-to-the-openzfs-and-truenas-communities/
|
|
|
|
```bash
|
|
# create a new archivebox pool to hold your dataset
|
|
zpool create -f \
|
|
-O mountpoint=/mnt/archivebox \
|
|
-O sync=standard \
|
|
-O compression=lz4 \
|
|
-O recordsize=128K \
|
|
-O dnodesize=auto \
|
|
-O atime=off \
|
|
-O xattr=sa \
|
|
-O acltype=posixacl \
|
|
-O aclinherit=passthrough \
|
|
-O utf8only=on \
|
|
-O normalization=formD \
|
|
-O casesensitivity=sensitive \
|
|
archivebox /dev/disk/by-uuid/disk1... /dev/disk/by-uuid/disk2...
|
|
|
|
# create the archivebox/data ZFS dataset
|
|
zfs create \
|
|
-o mountpoint=/mnt/archivebox/data \
|
|
archivebox/data
|
|
|
|
# optional: add encryption
|
|
-o encryption=on \
|
|
-o keysource=passphrase,prompt \
|
|
```
|
|
|
|
<a name="ntfs"></a><a name="hfs"></a><a name="btrfs"></a>
|
|
|
|
### `NTFS`, `HFS+`, `BTRFS`
|
|
|
|
> [!WARNING]
|
|
> These filesystems are likely supported, but are not officially tested.
|
|
|
|
<a name="ext2"></a><a name="ext3"></a><a name="fat32"></a><a name="exfat"></a>
|
|
|
|
### `EXT2`, `EXT3`, `FAT32`, `exFAT`
|
|
|
|
> [!CAUTION]
|
|
> Not recommended. Cannot store files >4GB or more than 31k ~ 65k Snapshot entries due to directory entry limits.
|
|
|
|
<br/>
|
|
|
|
---
|
|
|
|
<br/>
|
|
<a name="remote-filesystems"></a>
|
|
|
|
## Supported Remote Filesystems
|
|
|
|
<img src="https://github.com/ArchiveBox/ArchiveBox/assets/511499/6124b92a-df5a-47c4-b3c2-006ebd28785b" alt="local filesystem icon" width="80px" align="right"/>
|
|
|
|
ArchiveBox supports many common types of remote filesystems using RClone, FUSE, Docker Storage providers, and Docker Volume Plugins.
|
|
|
|
The `data/archive/` subfolder contains the bulk archived content, and it supports being stored on a slower remote server (SMB/NFS/SFTP/etc.) or object store (S3/B2/R2/etc.). For data integrity and performance reasons, the rest of the `data/` directory (`data/ArchiveBox.conf`, `data/logs`, etc.) must be stored locally while ArchiveBox is running.
|
|
|
|
> [!IMPORTANT]
|
|
> `data/index.sqlite3` is your main archive DB, *it must be on a fast, reliable, local filesystem* which supports [FSYNC](https://stackoverflow.com/questions/40849596/git-clone-fsync-input-output-error-in-linux#:~:text=Some%20filesystems%20%2D%20especially%20remote%20filesystems%20like%20NFS%2C%20sshfs%2C&text=do%20not%20support%20fsync()%20but%20git%20has%20no%20flag%20to%20disable%20these%20calls) (SSD/NVMe recommended for best experience).
|
|
|
|
> [!TIP]
|
|
> If you use a remote filesystem, you should switch ArchiveBox's search backend from [`ripgrep`](https://github.com/ArchiveBox/ArchiveBox/wiki/Setting-up-Search#ripgrep) to [`sonic`](https://github.com/ArchiveBox/ArchiveBox/wiki/Setting-up-Search#sonic) (or [`FTS5`](https://github.com/ArchiveBox/ArchiveBox/wiki/Setting-up-Search#fts5)).
|
|
> <sub>(`ripgrep` scans over every byte in the archive to do each search, which is **slow and potentially costly** on remote cloud storage)</sub>
|
|
|
|
<a name="nfs"></a>
|
|
|
|
### `NFS` (Docker Driver)
|
|
|
|
`docker-compose.yml`:
|
|
```yaml
|
|
services:
|
|
archivebox:
|
|
volumes:
|
|
- ./data:/data
|
|
- archivebox-archive:/data/archive
|
|
|
|
volumes:
|
|
archivebox-archive:
|
|
driver_opts:
|
|
type: "nfs"
|
|
o: "addr=some-remote-server.example.com,nolock,soft,rw,nfsvers=4"
|
|
device: ":/archivebox-archive"
|
|
```
|
|
|
|
<a name="smb"></a><a name="ceph"></a>
|
|
|
|
### `SMB` / `Ceph` (Docker CIFS Driver)
|
|
|
|
`docker-compose.yml`:
|
|
```yaml
|
|
services:
|
|
archivebox:
|
|
volumes:
|
|
- ./data:/data
|
|
- archivebox-archive:/data/archive
|
|
|
|
volumes:
|
|
archivebox-archive:
|
|
driver: local
|
|
driver_opts:
|
|
type: cifs
|
|
device: "//some-remote-server.example.com/archivebox-archive"
|
|
o: "username=XXX,password=YYY,uid=911,gid=911"
|
|
```
|
|
|
|
<br/>
|
|
|
|
<img src="https://github.com/ArchiveBox/ArchiveBox/assets/511499/0a159c27-5d54-46b9-814b-480f239ed27e" alt="local filesystem icon" height="80px" align="right"/><img src="https://github.com/ArchiveBox/ArchiveBox/assets/511499/5ca561b4-4597-401f-84b6-d53042fd7359" alt="local filesystem icon" height="80px" align="right"/>
|
|
|
|
<a name="s3"></a><a name="b2"></a><a name="gdrive"></a><a name="rclone"></a>
|
|
|
|
### Amazon S3 / Backblaze B2 / Google Drive / etc. (RClone)
|
|
|
|
```bash
|
|
# install the RClone and FUSE packages on your host
|
|
apt install rclone fuse # or brew install
|
|
|
|
# IMPORTANT: needed to allow FUSE drives to be shared with Docker
|
|
echo 'user_allow_other' >> /etc/fuse.conf
|
|
```
|
|
|
|
Then define your remote storage config `~/.config/rclone/rclone.conf`:
|
|
|
|
> [!TIP]
|
|
> You can also create `rclone.conf` using the RClone Web GUI: `rclone rcd --rc-web-gui`
|
|
|
|
```ini
|
|
# Example rclone.conf using Amazon S3 for storage:
|
|
[archivebox-s3]
|
|
type = s3
|
|
provider = AWS
|
|
access_key_id = XXX
|
|
secret_access_key = YYY
|
|
region = us-east-1
|
|
```
|
|
|
|
#### RClone Config Examples
|
|
|
|
- [SMB](https://rclone.org/smb/) / [Ceph](https://rclone.org/s3/#ceph) / [SFTP](https://rclone.org/sftp/) / [FTP](https://rclone.org/ftp/) / [WebDAV (e.g. Nextcloud)](https://rclone.org/webdav/)
|
|
- [Google Drive](https://rclone.org/drive/) / [Dropbox](https://rclone.org/dropbox/) / [OneDrive](https://rclone.org/onedrive/)
|
|
- [Amazon S3](https://rclone.org/s3/#configuration) / [Backblaze B2](https://rclone.org/b2/) / [Cloudflare R2](https://rclone.org/s3/#cloudflare-r2) / [DigitalOcean Spaces](https://rclone.org/s3/#digitalocean-spaces)
|
|
- [Google Cloud Storage](https://rclone.org/s3/#google-cloud-storage) / [Azure Blob](https://rclone.org/azureblob/) / [Azure Files](https://rclone.org/azurefiles/)
|
|
- [Storj](https://rclone.org/s3/#storj) / [Sia](https://rclone.org/sia/) / [Archive.org Storage](https://rclone.org/internetarchive/)
|
|
- And many more...
|
|
- https://rclone.org/s3/
|
|
- https://rclone.org/overview/
|
|
|
|
*Bonus:*
|
|
- Set up gzip compression: https://rclone.org/compress/
|
|
- Set up file encryption: https://rclone.org/crypt/
|
|
- Set up hashing engine: https://rclone.org/hasher/
|
|
|
|
<br/>
|
|
|
|
#### Option A: Running RClone on Bare Metal host
|
|
|
|
1. *If Needed:* Transfer any existing local archive data to the remote volume first
|
|
```bash
|
|
rclone sync --fast-list --transfers 20 --progress /opt/archivebox/data/archive/ archivebox-s3:/data/archive
|
|
mv /opt/archivebox/data/archive /opt/archivebox/data/archive.localbackup
|
|
```
|
|
2. **Mount the remote storage volume as FUSE filesystem**
|
|
```
|
|
rclone mount
|
|
--allow-other \ # essential, allows Docker to access FUSE mounts
|
|
--uid 911 --gid 911 \ # 911 is the default used by ArchiveBox
|
|
--vfs-cache-mode=full \ # cache both file metadata and contents
|
|
--transfers=16 --checkers=4 \ # use 16 threads for transfers & 4 for checking
|
|
archivebox-s3/data/archive:/opt/archivebox/data/archive # remote:local
|
|
```
|
|
|
|
See here for full more detailed instructions here: [RClone Documentation: The `rclone mount` command](https://rclone.org/commands/rclone_mount/)
|
|
|
|
> [!TIP]
|
|
> You can use any RClone FUSE mounts as a normal volumes (bind mount) for Docker ArchiveBox, typically no storage plugin is needed as long as `allow-other` is setup properly.
|
|
|
|
`docker run -v $PWD:/data -v /opt/archivebox/data/archive:/data/archive`
|
|
|
|
`docker-compose.yml`:
|
|
```yaml
|
|
services:
|
|
archivebox:
|
|
...
|
|
volumes:
|
|
- ./data:/data
|
|
- /opt/archivebox/data/archive:/data/archive
|
|
```
|
|
|
|
<br/>
|
|
|
|
#### Option B: Running RClone with Docker Storage Plugin
|
|
|
|
*This is only needed if you are unable to `Option A` for compatibility or performance reasons, or if you prefer defining your remote storage config in `docker-compose.yml` instead of `rclone.conf`.*
|
|
|
|
See here for full instructions: [RClone Documentation: Docker Plugin](https://rclone.org/docker/)
|
|
|
|
1. First, install the [Rclone Docker Volume Plugin](https://rclone.org/docker/#installing-as-managed-plugin) for your CPU architecture (e.g. `amd64` or `arm64`):
|
|
```bash
|
|
docker plugin install rclone/docker-volume-rclone:amd64 --grant-all-permissions --alias rclone
|
|
ln -sf ~/.config/rclone/rclone.conf /var/lib/docker-plugins/rclone/config/rclone.conf
|
|
```
|
|
|
|
2. Then, [create a volume using the Docker CLI](https://rclone.org/docker/#creating-volumes-via-cli) or [define one using Docker Compose / Swarm](https://rclone.org/docker/#using-with-swarm-or-compose):
|
|
|
|
`docker-compose.yml`:
|
|
```yaml
|
|
services:
|
|
archivebox:
|
|
volumes:
|
|
- ./data:/data
|
|
- archivebox-s3:/data/archive
|
|
|
|
volumes:
|
|
archivebox-s3:
|
|
driver: rclone
|
|
driver_opts:
|
|
remote: 'archivebox-s3/data/archive'
|
|
allow_other: 'true'
|
|
vfs_cache_mode: full
|
|
poll_interval: 0
|
|
uid: 911
|
|
gid: 911
|
|
transfers: 16
|
|
checkers: 4
|
|
```
|
|
|
|
|
|
To start the container and verify the filesystem is accessible within it:
|
|
```bash
|
|
docker compose run archivebox /bin/bash 'ls -lah /data/archive/ | tee /data/archive/.write_test.txt'
|
|
```
|
|
|
|
<br/>
|
|
---
|
|
<br/>
|
|
|
|
### More Docker Storage Plugins
|
|
|
|
- [IPFS](https://github.com/djdv/go-filesystem-utils/pull/40) / [Peergos](https://github.com/peergos/peergos) / [GlusterFS](https://github.com/calavera/docker-volume-glusterfs)
|
|
- [DigitalOcean Block Storage Volumes](https://github.com/djmaze/dobs-volume-plugin) / [Linode Block Storage Volumes](https://github.com/linode/docker-volume-linode)
|
|
- [More volume plugins...](https://docs.docker.com/engine/extend/legacy_plugins/#volume-plugins)
|