mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-06-21 19:10:45 -04:00
290 lines
12 KiB
Markdown
290 lines
12 KiB
Markdown
# Docker
|
|
|
|
## Overview
|
|
|
|
Running ArchiveBox with Docker allows you to manage it in a container without exposing it to the rest of your system. ArchiveBox generally works the same in Docker as it does outside Docker. You can even use `pip`-installed ArchiveBox and Docker ArchiveBox in tandem, as they both share the same data directory format.
|
|
|
|
<img src="https://imgur.zervice.io/qFAPRwC.png" width="20%" align="right"/>
|
|
|
|
- [Overview](#Overview)
|
|
- [Docker Compose](#docker-compose) ⭐️ (recommended)
|
|
- [Setup](#setup)
|
|
- [Upgrading](https://github.com/ArchiveBox/ArchiveBox/wiki/Upgrading-or-Merging-Archives#upgrading-with-docker-compose-%EF%B8%8F)
|
|
- [Usage](#usage)
|
|
- [Accessing the data](#accessing-the-data)
|
|
- [Configuration](#configuration)
|
|
- [Plain Docker](#docker)
|
|
- [Setup](#setup-1)
|
|
- [Upgrading](https://github.com/ArchiveBox/ArchiveBox/wiki/Upgrading-or-Merging-Archives#upgrading-with-plain-docker)
|
|
- [Usage](#usage-1)
|
|
- [Accessing the data](#accessing-the-data-1)
|
|
- [Configuration](#configuration-1)
|
|
|
|
<br/>
|
|
|
|
**Official Docker Hub image: [`hub.docker.com/r/archivebox/archivebox`](https://hub.docker.com/r/archivebox/archivebox)**
|
|
```bash
|
|
docker pull archivebox/archivebox:latest
|
|
```
|
|
|
|
- [`Dockerfile`](https://github.com/ArchiveBox/ArchiveBox/blob/dev/Dockerfile)
|
|
- [`docker-compose.yml`](https://github.com/ArchiveBox/ArchiveBox/blob/dev/docker-compose.yml)
|
|
- [`archivebox-kubernetes.yml`](https://github.com/ArchiveBox/docker-archivebox/blob/master/archivebox.yml)
|
|
|
|
Published [Docker tags](https://hub.docker.com/r/archivebox/archivebox/tags):
|
|
- `:latest`, `:stable` (latest stable release, the default)
|
|
- `:x.x` and `:x.x.x` for specific versions (e.g. `:0.7` or `:0.7.2`)
|
|
- `:dev` for unstable alpha builds (breaks often, only for developers and willing beta testers)
|
|
- `:sha-xxxxxxx` for builds of specific git commits (to test or pin specific PRs or commits)
|
|
|
|
<br/>
|
|
|
|
> [!IMPORTANT]
|
|
> *Make sure Docker is **[installed](https://docs.docker.com/install/#supported-platforms)** and up-to-date before following any instructions below!* ➡️
|
|
> To check installed version, run: `docker --version` (must be `>=17.04.0`)
|
|
|
|
<br/>
|
|
|
|
<img src="https://github.com/ArchiveBox/ArchiveBox/assets/511499/9e8658f7-7d00-452e-a10e-f7d22ef9365a" height="40px" align="right"/>
|
|
|
|
## Docker Compose
|
|
|
|
<br/>
|
|
|
|
### Setup
|
|
|
|
A full [`docker-compose.yml`](https://github.com/ArchiveBox/ArchiveBox/blob/dev/docker-compose.yml) file is provided with all the extras included.
|
|
You can uncomment sections within it to enable extra features, or run the basic version as-is.
|
|
|
|
|
|
```bash
|
|
# create a folder to store your data (can be anywhere)
|
|
mkdir -p ~/archivebox/data && cd ~/archivebox
|
|
|
|
# download the compose file into the directory
|
|
curl -fsSL 'https://docker-compose.archivebox.io' > docker-compose.yml
|
|
# (shortcut for getting https://raw.githubusercontent.com/ArchiveBox/ArchiveBox/stable/docker-compose.yml)
|
|
|
|
# initialize your collection and create an admin user for the Web UI (or set ADMIN_USERNAME/ADMIN_PASSWORD env vars)
|
|
docker compose run archivebox init
|
|
docker compose run archivebox manage createsuperuser
|
|
```
|
|
|
|
To use [Sonic](https://github.com/valeriansaliou/sonic) for improved full-text search, download this config & uncomment the sonic service in `docker-compose.yml`:
|
|
```bash
|
|
# download the sonic config file into your data folder (e.g. ~/archivebox)
|
|
curl -fsSL 'https://raw.githubusercontent.com/ArchiveBox/ArchiveBox/dev/etc/sonic.cfg' > sonic.cfg
|
|
|
|
# then uncomment the sonic-related sections in docker-compose.yml
|
|
nano docker-compose.yml
|
|
|
|
# to backfill any existing archive data into the search index, run:
|
|
docker compose run archivebox update --index-only
|
|
```
|
|
|
|
<br/>
|
|
|
|
### Upgrading
|
|
|
|
See the wiki page on [Upgrading or Merging Archives: Upgrading with Docker Compose](https://github.com/ArchiveBox/ArchiveBox/wiki/Upgrading-or-Merging-Archives#upgrading-with-docker-compose-%EF%B8%8F) for instructions. ➡️
|
|
|
|
<br/>
|
|
|
|
### Usage
|
|
|
|
You can use `docker compose run archivebox [subcommand]` just like the non-Docker `archivebox [subcommand]` CLI.
|
|
|
|
First, make sure you're `cd`'ed into the same folder as your `docker-compose.yml` file (e.g. `~/archivebox`):
|
|
```bash
|
|
docker compose run archivebox help
|
|
```
|
|
|
|
To add an individual URL, pass it in as an arg or via stdin:
|
|
```bash
|
|
docker compose run archivebox add 'https://example.com'
|
|
# OR
|
|
echo 'https://example.com' | docker compose run -T archivebox add
|
|
```
|
|
|
|
To add multiple URLs at once, pipe them in via stdin, or place them in a file inside `./data/sources` so that ArchiveBox can access it from within the container:
|
|
```bash
|
|
# pipe URLs in from a file outside Docker
|
|
docker compose run -T archivebox add < ~/Downloads/example_urls.txt
|
|
|
|
# OR ingest URLs from a file mounted inside Docker
|
|
docker compose run archivebox add --depth=1 /data/sources/example_urls.txt
|
|
|
|
# OR pipe in URLs from a remote source
|
|
curl 'https://example.com/some/rss/feed.xml' | docker compose run archivebox add
|
|
docker compose run archivebox add --depth=1 'https://example.com/some/rss/feed.xml'
|
|
```
|
|
|
|
The `--depth=1` flag tells ArchiveBox to look inside the provided source and archive all the URLs within:
|
|
```bash
|
|
# this archives just the RSS file itself (probably not what you want)
|
|
docker compose run archivebox add 'https://example.com/some/feed.rss'
|
|
|
|
# this archives the RSS feed file + all the URLs mentioned inside of it
|
|
docker compose run archivebox add --depth=1 'https://example.com/some/feed.rss'
|
|
```
|
|
|
|
<br/>
|
|
|
|
### Accessing the data
|
|
|
|
The outputted archive data is stored in `data/` (relative to the project root), or whatever folder path you specified in the `docker-compose.yml` `volumes:` section. Make sure the `data/` folder on the host has permissions initially set to `777` so that the ArchiveBox command is able to set it to the specified `OUTPUT_PERMISSIONS` config setting on the first run.
|
|
|
|
To access the results directly via the filesystem, open `./data/archive/<timestamp>/index.html` (timestamp is shown in output of previous command).
|
|
|
|
Alternatively, to use the web UI, start the server with:
|
|
```bash
|
|
docker compose up # add -d to run in the background
|
|
```
|
|
|
|
Then open [`http://127.0.0.1:8000`](http://127.0.0.1:8000).
|
|
|
|
<br/>
|
|
|
|
### Configuration
|
|
|
|
ArchiveBox running with `docker compose` accepts all the same config options as other ArchiveBox distributions, see the full list of options available on the [[Configuration]] page.
|
|
|
|
The recommended way configure ArchiveBox in Docker Compose is using `archivebox config --set ...` or by editing `ArchiveBox.conf`.
|
|
```bash
|
|
docker compose run archivebox config --set TIMEOUT=120
|
|
# OR
|
|
echo 'TIMEOUT=120' >> ./data/ArchiveBox.conf
|
|
|
|
# plugin-specific options work the same way (see https://archivebox.github.io/abx-plugins/)
|
|
docker compose run archivebox config --set MEDIA_MAX_SIZE=750mb
|
|
```
|
|
This will apply the config to all containers or archivebox instances that access the collection.
|
|
|
|
If you're only running one container, or if you want to scope config options to only apply to a particular container, you can set them in that container's `environment:` section:
|
|
|
|
```yaml
|
|
...
|
|
|
|
services:
|
|
archivebox:
|
|
...
|
|
environment:
|
|
- USE_COLOR=False
|
|
- SHOW_PROGRESS=False
|
|
- CHECK_SSL_VALIDITY=False
|
|
- RESOLUTION=1900,1820
|
|
- MEDIA_TIMEOUT=512000
|
|
...
|
|
```
|
|
|
|
You can also specify an env file via CLI when running compose using `docker compose --env-file=/path/to/config.env ...` although you must specify the variables in the `environment:` section that you want to have passed down to the ArchiveBox container from the passed env file.
|
|
|
|
If you want to access your archive server with HTTPS, the bundled `docker-compose.yml` includes two opt-in ingress profiles:
|
|
|
|
- `COMPOSE_PROFILES=https` runs Traefik in front of ArchiveBox for HTTPS/TLS, with optional wildcard certificates via DNS-01.
|
|
- `COMPOSE_PROFILES=tunnel` runs a Cloudflare Tunnel for deployments without a public IP.
|
|
|
|
Set `BASE_URL=https://archive.example.com` in the `.env` file next to `docker-compose.yml`, then follow the inline comments in the compose file for the profile you choose. You can still bring your own reverse proxy such as Nginx or Caddy in front of `http://127.0.0.1:8000`; [`etc/nginx.conf`](https://github.com/ArchiveBox/ArchiveBox/blob/dev/etc/nginx.conf) remains a standalone example.
|
|
|
|
<br/>
|
|
|
|
---
|
|
|
|
<br/>
|
|
|
|
## Docker
|
|
|
|
<br/>
|
|
|
|
### Setup
|
|
|
|
Fetch and run the ArchiveBox Docker image to create your initial archive.
|
|
|
|
```bash
|
|
docker pull archivebox/archivebox
|
|
|
|
mkdkir -p ~/archivebox/data && cd ~/archivebox/data
|
|
docker run -it -v $PWD:/data archivebox/archivebox init --setup
|
|
```
|
|
|
|
*(You can create a collection in any directory you want, `~/archivebox/data` is just used as an example here)*
|
|
|
|
If you encounter permissions issues, make sure the mounted data directory is writable by its intended owner. Docker startup automatically uses the first non-root owner detected from the existing collection, or the default `archivebox` user when the data directory is root-owned.
|
|
|
|
<br/>
|
|
|
|
### Upgrading
|
|
|
|
See the wiki page on [Upgrading or Merging Archives: Upgrading with plain Docker](https://github.com/ArchiveBox/ArchiveBox/wiki/Upgrading-or-Merging-Archives#upgrading-with-plain-docker) for instructions. ➡️
|
|
|
|
<br/>
|
|
|
|
### Usage
|
|
|
|
The Docker CLI `docker run ... archivebox/archivebox [subcommand]` works just like the non-Docker `archivebox [subcommand]` CLI.
|
|
|
|
First, make sure you're `cd`'ed into your collection data folder (e.g. `~/archivebox/data`).
|
|
|
|
```bash
|
|
docker run -it -v $PWD:/data archivebox/archivebox help
|
|
```
|
|
|
|
To add a single URL, pass it as an arg or pipe it in via stdin:
|
|
```bash
|
|
docker run -it -v $PWD:/data archivebox/archivebox add 'https://example.com'
|
|
# OR
|
|
echo 'https://example.com' | docker run -i -v $PWD:/data archivebox/archivebox add
|
|
```
|
|
|
|
To archive multiple URLs at once, pass text containing URLs in via stdin:
|
|
```bash
|
|
docker run -i -v $PWD:/data archivebox/archivebox add < urls.txt
|
|
# OR
|
|
curl 'https://example.com/some/rss/feed.xml' | docker run -i -v $PWD:/data archivebox/archivebox add
|
|
```
|
|
|
|
You can also use the `--depth=1` flag to tell ArchiveBox to recursively archive the URLs within a provided source.
|
|
```bash
|
|
docker run -it -v $PWD:/data archivebox/archivebox add --depth=1 'https://example.com/some/rss/feed.xml'
|
|
```
|
|
|
|
<br/>
|
|
|
|
### Accessing the data
|
|
|
|
The `docker run` `-v /path/on/host:/path/inside/container` flag specifies where your data dir lives on the host.
|
|
|
|
For example to use a folder on an external USB drive (instead of the current directory `$PWD` or `~/archivebox/data`):
|
|
```bash
|
|
docker run -it -v /media/USB-DRIVE/archivebox/data:/data archivebox/archivebox ...
|
|
```
|
|
|
|
Then to view your data, you can look in the folder on the host `/media/USB-DRIVE/archivebox/data`, or use the Web UI:
|
|
```bash
|
|
docker run -it -v /media/USB_DRIVE/archivebox/data:/data -p 8000:8000 archivebox/archivebox
|
|
# then open https://127.0.0.1:8000
|
|
```
|
|
|
|
<br/>
|
|
|
|
### Configuration
|
|
|
|
The easiest way is to use `archivebox config --set KEY=value` or edit `./ArchiveBox.conf` (in your collection dir).
|
|
|
|
For example, this sets `TIMEOUT=120` as a persistent setting for the collection:
|
|
```bash
|
|
docker run -it -v $PWD:/data archivebox/archivebox config --set TIMEOUT=120
|
|
# OR
|
|
echo 'TIMEOUT=120' >> ./ArchiveBox.conf
|
|
```
|
|
|
|
ArchiveBox in Docker also accepts config as environment variables, see more on the [[Configuration]] page (and the [abx-plugins config reference](https://archivebox.github.io/abx-plugins/) for per-plugin options).
|
|
|
|
For example, this disables the screenshot extractor for a single run (without persisting for other runs):
|
|
```bash
|
|
docker run -it -v $PWD:/data -e SCREENSHOT_ENABLED=False archivebox/archivebox add 'https://example.com'
|
|
# OR
|
|
echo 'SCREENSHOT_ENABLED=False' >> ./.env
|
|
docker run ... --env-file=./.env archivebox/archivebox ...
|
|
```
|