Compare commits

...

18 Commits

Author SHA1 Message Date
adamlamers 8de46f538d more readme notes
Continuous Integration / backend-tests (push) Successful in 1m1s
Continuous Integration / frontend-check (push) Successful in 20s
Continuous Integration / e2e-tests (push) Successful in 11m20s
2026-05-05 23:47:45 -04:00
adamlamers f5ed1adec4 not JUST tape
Continuous Integration / backend-tests (push) Successful in 1m18s
Continuous Integration / frontend-check (push) Successful in 22s
Continuous Integration / e2e-tests (push) Successful in 8m18s
2026-05-05 23:39:58 -04:00
adamlamers fb1ead7d63 new readme
Continuous Integration / backend-tests (push) Successful in 1m42s
Continuous Integration / frontend-check (push) Successful in 50s
Continuous Integration / e2e-tests (push) Successful in 7m24s
2026-05-05 23:33:26 -04:00
adamlamers f5ddfed38b let user set ionice in settings
Continuous Integration / backend-tests (push) Successful in 1m20s
Continuous Integration / frontend-check (push) Successful in 51s
Continuous Integration / e2e-tests (push) Successful in 6m55s
2026-05-05 22:07:30 -04:00
adamlamers 65860e0408 check staging area has enough capacity
Continuous Integration / backend-tests (push) Successful in 39s
Continuous Integration / frontend-check (push) Successful in 20s
Continuous Integration / e2e-tests (push) Successful in 5m17s
2026-05-05 21:33:44 -04:00
adamlamers 32fc9e4506 always call sg_read_attr to try and read tape info
Continuous Integration / backend-tests (push) Successful in 36s
Continuous Integration / frontend-check (push) Successful in 15s
Continuous Integration / e2e-tests (push) Successful in 5m8s
2026-05-05 20:59:34 -04:00
adamlamers f40a76aa14 better scsi ready state checking
Continuous Integration / backend-tests (push) Successful in 36s
Continuous Integration / frontend-check (push) Successful in 16s
Continuous Integration / e2e-tests (push) Successful in 5m51s
2026-05-05 20:51:41 -04:00
adamlamers d398664e51 better hardware polling on media page
Continuous Integration / backend-tests (push) Successful in 37s
Continuous Integration / frontend-check (push) Successful in 16s
Continuous Integration / e2e-tests (push) Successful in 6m9s
2026-05-05 20:41:03 -04:00
adamlamers fa171176fc media input refinement
Continuous Integration / backend-tests (push) Successful in 36s
Continuous Integration / frontend-check (push) Successful in 15s
Continuous Integration / e2e-tests (push) Successful in 5m46s
2026-05-05 20:07:35 -04:00
adamlamers 9e51247564 fast discover was also slower than os.walk
Continuous Integration / e2e-tests (push) Successful in 5m18s
Continuous Integration / backend-tests (push) Successful in 38s
Continuous Integration / frontend-check (push) Successful in 15s
2026-05-05 19:36:51 -04:00
adamlamers 4d4d9fa1e0 remove 'fast hashing' that was actually slower
Continuous Integration / backend-tests (push) Successful in 39s
Continuous Integration / frontend-check (push) Successful in 15s
Continuous Integration / e2e-tests (push) Successful in 5m57s
2026-05-05 19:13:32 -04:00
adamlamers c3457308ba make test_list_jobs_populated deterministic
Continuous Integration / backend-tests (push) Successful in 40s
Continuous Integration / frontend-check (push) Successful in 14s
Continuous Integration / e2e-tests (push) Successful in 5m12s
2026-05-05 18:54:41 -04:00
adamlamers 1ef2c194db media tests
Continuous Integration / e2e-tests (push) Successful in 5m22s
Continuous Integration / backend-tests (push) Successful in 38s
Continuous Integration / frontend-check (push) Successful in 16s
2026-05-05 18:48:47 -04:00
adamlamers d77a79876f cloud provider coverage
Continuous Integration / backend-tests (push) Successful in 39s
Continuous Integration / frontend-check (push) Successful in 16s
Continuous Integration / e2e-tests (push) Successful in 6m48s
2026-05-05 18:38:42 -04:00
adamlamers ae74a0bf02 more test improvements & new tests
Continuous Integration / backend-tests (push) Successful in 36s
Continuous Integration / frontend-check (push) Successful in 18s
Continuous Integration / e2e-tests (push) Successful in 5m13s
2026-05-05 17:26:03 -04:00
adamlamers f44895d40f more checks in archiver & scanner tests
Continuous Integration / backend-tests (push) Successful in 33s
Continuous Integration / frontend-check (push) Successful in 16s
Continuous Integration / e2e-tests (push) Successful in 5m11s
2026-05-05 17:13:47 -04:00
adamlamers c76ccd0dfa strengthen tests
Continuous Integration / backend-tests (push) Successful in 42s
Continuous Integration / frontend-check (push) Successful in 16s
Continuous Integration / e2e-tests (push) Successful in 5m29s
2026-05-05 17:02:59 -04:00
adamlamers 06eb00ab3e test cleanup
2026-05-05 15:28:31 -04:00
26 changed files with 1858 additions and 923 deletions
+104 -35
@@ -1,57 +1,126 @@
# TapeHoard
A robust, index-driven Tape Backup Manager designed for single-tape drive users and scalable to tape libraries.
> Physical media archival for people who don't trust the cloud alone.
For full architectural details, see [PLAN.md](PLAN.md).
**TapeHoard is not just for tapes.** It's a self-hosted backup manager for any offline-capable storage you already own:
## Docker Deployment
- **Offline HDDs / USB drives** — Any mountable filesystem (ext4, NTFS, APFS, exFAT)
- **S3-compatible cloud** — Encrypted copies on Wasabi, Backblaze B2, MinIO, or any S3-compat provider
- **LTO tape** — If you happen to own a tape drive like some of us do
TapeHoard is designed to run as a Docker container with native hardware access.
It indexes your source filesystems, tracks what has been archived to which medium, and gives you a searchable catalog—even when the media itself is sitting in a vault across town.
### Permissions (PUID/PGID)
The container supports `PUID` and `PGID` environment variables to ensure files written to volumes match your host user's identity.
![Dashboard](docs/screenshots/dashboard.png)
**Critical:** To ensure fast startup times, TapeHoard **does not** perform a recursive `chown` on your data. You must ensure your host directories are owned by the same PUID/PGID you provide to the container:
## Features
```bash
# Example: If PUID=1000 and PGID=1000
sudo chown -R 1000:1000 ./db ./staging ./source_data ./restores
```
| Feature | Description |
|---|---|
| **Index-First Design** | Browse, search, and check discrepancies against the database. Live filesystem is only touched during scans. |
| **Multi-Media Fleet** | Manage HDDs, USB drives, S3-compatible cloud, and LTO tape in one inventory. |
| **Ordered Auto-Archival** | Drag media to set fill priority. Backups flow to the first available medium in your sequence. |
| **LTO Tape Native** | Barcode discovery via MAM, hardware compression control, direct SCSI streaming to tape. |
| **Restore Queue** | Stage files for recovery. Get a minimum-media manifest so you only mount what you need. |
| **Discrepancy Detection** | Find missing files, changes without backup, or policy exclusions. |
| **Encrypted at Rest** | Per-media encryption secrets in a built-in keystore. Compatible with LTO hardware encryption (`stenc`) and client-side cloud encryption. |
| **Scheduled Scans** | Cron-like scheduling for automatic filesystem discovery and hashing. |
| **Exclusion Policies** | Global gitignore-style patterns to skip caches, build artifacts, and temp files. |
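As a rough illustration of how gitignore-style exclusion patterns behave (the project itself uses the `pathspec` library, as the diffs in this changeset show — this stdlib sketch with hypothetical pattern names is a simplification, not the real matcher):

```python
import fnmatch

# Illustrative patterns; real patterns come from the user's settings.
EXCLUSION_PATTERNS = ["*.pyc", "__pycache__/*", "node_modules/*", "*.tmp"]

def is_excluded(path: str, patterns=EXCLUSION_PATTERNS) -> bool:
    """Return True if the path matches any exclusion pattern.

    Note: fnmatch's `*` also crosses `/` boundaries, which is simpler
    than true gitignore semantics but close enough for illustration.
    """
    return any(fnmatch.fnmatch(path, pat) for pat in patterns)
```

So `is_excluded("node_modules/left-pad/index.js")` is true while `is_excluded("src/main.py")` is false.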
### Example `docker-compose.yml`
## Screenshots
| Dashboard | Media Inventory |
|---|---|
| ![Dashboard](docs/screenshots/dashboard.png) | ![Inventory](docs/screenshots/inventory.png) |
| Live Filesystem | System Settings |
|---|---|
| ![Filesystem](docs/screenshots/filesystem.png) | ![Settings](docs/screenshots/settings.png) |
## Quick Start (Docker Compose)
The recommended deployment is a single container with persistent volumes for the database, staging area, and source/restore mounts.
```yaml
services:
tapehoard:
image: tapehoard:latest
image: ghcr.io/tapehoard/tapehoard:latest
container_name: tapehoard
environment:
- PUID=1000
- PGID=1000
- TZ=UTC
volumes:
- ./db:/database
- ./staging:/staging
- /mnt/my_data:/source_data:ro
- /mnt/restores:/restores
# LTO Tape Drive Passthrough
- /dev/nst0:/dev/nst0
- /dev/sgX:/dev/sgX
cap_add:
- SYS_RAWIO
devices:
- /dev/nst0:/dev/nst0
- /dev/sgX:/dev/sgX
environment:
- TZ=UTC
- DATABASE_URL=sqlite:////database/tapehoard.db
- STAGING_DIRECTORY=/staging
ports:
- "8000:8000"
restart: unless-stopped
- '30265:8000'
volumes:
- ./database:/database
- ./staging:/staging
- /mnt/archive:/source_data:ro
- /mnt/restores:/restores
```
## Project Structure
### Requirements
* `backend/`: Python/FastAPI application handling the heavy lifting (hashing, streaming, db indexing).
* `frontend/`: Svelte 5 application providing the Web UI.
* `docker/`: Files required for building the multi-stage Docker container.
* `docs/`: Additional documentation.
- **Linux host** (LTO tape support requires `mt`, `sg_read_attr`, and optionally `stenc` on the host or in the container)
- **Persistent volumes** — Database and staging must survive container restarts
## Quickstart
> **No tape drive?** Remove the `cap_add` and `devices` sections from the compose file above (`SYS_RAWIO` lives under `cap_add`). TapeHoard works great with just HDDs, USB drives, or cloud storage.
(Coming soon)
### Hardware-Specific Notes
**HDDs / USB Drives (Recommended for most users):**
- Mount the drive filesystem into the container at `/source_data` or a restore destination
- The HDD provider reads a `.tapehoard_id` file on the drive root to identify media
- No special capabilities required — works on any Linux, macOS, or Docker host
**S3-Compatible Cloud:**
- Configure endpoint URL, bucket, region, and access credentials in settings
- Optional client-side filename obfuscation and encryption
**LTO Tape (For the dedicated):**
- The container must run as root or have access to the SCSI device node
- Requires `SYS_RAWIO` capability for direct SCSI access
- Set `TAPEHOARD_TEST_MODE=true` to enable a mock LTO provider for development without hardware
### First Run
1. Start the container: `docker compose up -d`
2. Open `http://host:30265`
3. Go to **Settings → Drives** and configure your source roots (and tape drive path, if applicable)
4. Trigger an initial scan from the dashboard
5. Register media under **Physical Inventory**
6. Run your first backup
## Development
TapeHoard uses [`just`](https://github.com/casey/just) as its command runner. Install it (`brew install just` or `cargo install just`), then run `just` to see all available commands.
### Common Tasks
```bash
just dev # Start backend + frontend with hot reload
just backend # Start only the backend
just frontend # Start only the frontend
just test # Run linting, backend tests, and E2E tests
just lint # Run Ruff, ty, and Svelte checks
just format # Auto-format Python code
just generate-client # Regenerate TypeScript SDK from OpenAPI spec
```
### Database Migrations
```bash
just db-upgrade # Apply pending migrations
just db-migrate "add user table" # Autogenerate a new migration
```
## Why TapeHoard?
Most backup tools are built for always-online replication. TapeHoard is built for media you can unplug:
- **Air-gappable** — Pull the drive or tape, store it offline. Your index stays searchable even when the media is in a vault.
- **Auditability** — Every file's SHA-256, every version's offset on every medium, tracked in SQLite.
- **No vendor lock-in** — Standard tar archives on tape, standard files on disk, standard S3 objects in cloud. If TapeHoard disappears, your data doesn't.
+8 -1
@@ -56,7 +56,7 @@ def get_exclusion_spec(db_session: Session) -> Optional[pathspec.PathSpec]:
for pattern in settings_record.value.splitlines()
if pattern.strip()
]
return pathspec.PathSpec.from_lines("gitwildmatch", exclusion_patterns)
return pathspec.PathSpec.from_lines("gitignore", exclusion_patterns)
def get_ignored_status(
@@ -159,6 +159,13 @@ class DashboardStatsSchema(BaseModel):
redundancy_ratio: float
class StagingInfoSchema(BaseModel):
path: str
total_bytes: int
used_bytes: int
free_bytes: int
class JobSchema(BaseModel):
model_config = ConfigDict(from_attributes=True)
+38 -1
@@ -1,7 +1,10 @@
import shutil
from fastapi import APIRouter, Depends
from sqlalchemy.orm import Session
from app.db.database import get_db
from app.api.common import DashboardStatsSchema
from app.api.common import DashboardStatsSchema, StagingInfoSchema
from app.core.config import settings
from sqlalchemy import func, text
from app.db import models
@@ -113,3 +116,37 @@ def get_dashboard_stats(db_session: Session = Depends(get_db)):
last_scan_time=last_scan.completed_at if last_scan else None,
redundancy_ratio=round(redundancy_percentage, 1),
)
@router.get(
"/staging/info", response_model=StagingInfoSchema, operation_id="get_staging_info"
)
def get_staging_info():
"""Returns disk usage information for the backup staging directory."""
path = settings.staging_directory
try:
usage = shutil.disk_usage(path)
return StagingInfoSchema(
path=path,
total_bytes=usage.total,
used_bytes=usage.used,
free_bytes=usage.free,
)
except OSError:
# Fallback: if the configured path doesn't exist yet, check its parent
parent = path if path == "/" else path.rsplit("/", 1)[0] or "/"
try:
usage = shutil.disk_usage(parent)
return StagingInfoSchema(
path=path,
total_bytes=usage.total,
used_bytes=usage.used,
free_bytes=usage.free,
)
except OSError:
return StagingInfoSchema(
path=path,
total_bytes=0,
used_bytes=0,
free_bytes=0,
)
+2 -2
@@ -127,7 +127,7 @@ def test_exclusions(
total_files=0, total_size=0, matched_count=0, matched_size=0, sample=[]
)
spec = pathspec.PathSpec.from_lines("gitwildmatch", patterns)
spec = pathspec.PathSpec.from_lines("gitignore", patterns)
all_files = (
db_session.query(models.FilesystemState)
@@ -179,7 +179,7 @@ def download_exclusion_report(
if not patterns:
raise HTTPException(status_code=400, detail="No patterns provided")
spec = pathspec.PathSpec.from_lines("gitwildmatch", patterns)
spec = pathspec.PathSpec.from_lines("gitignore", patterns)
all_files = (
db_session.query(models.FilesystemState)
+48
@@ -4,6 +4,54 @@ import sys
from loguru import logger
def _get_ionice_setting() -> str:
"""Reads the user's preferred I/O scheduling class from settings."""
try:
from app.db.database import SessionLocal
from app.db import models
with SessionLocal() as db_session:
record = (
db_session.query(models.SystemSetting)
.filter(models.SystemSetting.key == "ionice_level")
.first()
)
if record and record.value in ("idle", "best-effort", "realtime"):
return record.value
except Exception:
pass
return "idle" # Default: be the most polite
def set_process_priority(level: str):
"""Adjusts CPU and I/O priority of the current process.
Args:
level: "background" for lowest priority (ionice idle + nice 19),
"normal" to reset (ionice best-effort + nice 0).
"""
try:
import psutil
p = psutil.Process(os.getpid())
if level == "background":
ionice_level = _get_ionice_setting()
if hasattr(p, "ionice"):
if ionice_level == "idle":
p.ionice(psutil.IOPRIO_CLASS_IDLE) # type: ignore[attr-defined]
elif ionice_level == "realtime":
p.ionice(psutil.IOPRIO_CLASS_RT) # type: ignore[attr-defined]
else:
p.ionice(psutil.IOPRIO_CLASS_BE) # type: ignore[attr-defined]
p.nice(19)
else:
if hasattr(p, "ionice"):
p.ionice(psutil.IOPRIO_CLASS_BE) # type: ignore[attr-defined]
p.nice(0)
except Exception as e:
logger.debug(f"Could not set process priority to '{level}': {e}")
def get_path_uuid(path: str) -> str | None:
"""Attempts to retrieve a stable hardware/filesystem UUID for a given path."""
if not os.path.exists(path):
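`set_process_priority` above relies on `psutil` for both CPU and I/O priority. As a stdlib-only sketch of the CPU half (niceness) — the I/O class (`ionice`) part genuinely needs `psutil` or the `ionice(1)` tool, and the helper name here is hypothetical:

```python
import os

def lower_cpu_priority(target_nice: int = 19) -> int:
    """Raise process niceness toward `target_nice` (lowest CPU priority).

    Unprivileged processes can only increase niceness, so this is a
    one-way operation; returns the resulting niceness.
    """
    current = os.nice(0)  # an increment of 0 just reports current niceness
    if current < target_nice:
        return os.nice(target_nice - current)
    return current
```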
+49 -18
@@ -10,7 +10,7 @@ from loguru import logger
class LTOProvider(AbstractStorageProvider):
provider_id = "lto_tape"
name = "LTO Tape Drive"
name = "LTO Tape"
description = "Hardware Linear Tape-Open (LTO) drives."
capabilities = {
"supports_random_access": False,
@@ -70,7 +70,8 @@ class LTOProvider(AbstractStorageProvider):
"drive": {},
"mam": {},
"online": False,
"last_check": 0.0,
"last_online_check": 0.0,
"last_mam_check": 0.0,
}
def _log_command(self, cmd: List[str]):
@@ -116,7 +117,8 @@ class LTOProvider(AbstractStorageProvider):
# Throttle MAM reads to once every 2 seconds unless forced
now = time.time()
if not force and (
now - LTOProvider._lkg_state[self.device_path].get("last_check", 0) < 2.0
now - LTOProvider._lkg_state[self.device_path].get("last_mam_check", 0)
< 2.0
):
return LTOProvider._lkg_state[self.device_path]["mam"]
@@ -233,22 +235,33 @@ class LTOProvider(AbstractStorageProvider):
# SUCCESS! Update LKG MAM state
LTOProvider._lkg_state[self.device_path]["mam"] = mam
LTOProvider._lkg_state[self.device_path]["last_check"] = time.time()
LTOProvider._lkg_state[self.device_path]["last_mam_check"] = (
time.time()
)
return mam
# If we get "Device or resource busy", wait a bit and retry
# Log failure so we can diagnose why sg_read_attr isn't working
stderr_text = (
(result.stderr or b"").decode().lower()
(result.stderr or b"").decode()
if isinstance(result.stderr, bytes)
else (result.stderr or "").lower()
else (result.stderr or "")
)
if result.returncode != 0 and "busy" in stderr_text:
if result.returncode != 0:
logger.warning(
f"sg_read_attr returned code {result.returncode} for {self.device_path} (attempt {attempt + 1}/3): {stderr_text[:200]}"
)
if "busy" in stderr_text.lower():
time.sleep(0.2)
continue
except FileNotFoundError:
logger.error(
f"'sg_read_attr' binary not found in PATH. Cannot read MAM for {self.device_path}."
)
break
except Exception as e:
logger.debug(
f"MAM read attempt {attempt} failed for {self.device_path}: {e}"
logger.warning(
f"MAM read attempt {attempt + 1}/3 failed for {self.device_path}: {e}"
)
time.sleep(0.1)
@@ -296,11 +309,15 @@ class LTOProvider(AbstractStorageProvider):
now = time.time()
if (
not force
and now - LTOProvider._lkg_state[self.device_path].get("last_check", 0)
and now
- LTOProvider._lkg_state[self.device_path].get("last_online_check", 0)
< 2.0
):
return LTOProvider._lkg_state[self.device_path]["online"]
is_online = False
# 1. Try mt status
try:
cmd = ["mt", "-f", self.device_path, "status"]
self._log_command(cmd)
@@ -314,22 +331,36 @@ class LTOProvider(AbstractStorageProvider):
"Device or resource busy" in stderr
or "Device or resource busy" in stdout
):
LTOProvider._lkg_state[self.device_path]["online"] = True
return True
is_online = True
else:
is_online = (
"ONLINE" in stdout or "READY" in stdout or result.returncode == 0
)
except FileNotFoundError:
logger.debug(f"'mt' binary not found for {self.device_path}")
except Exception as e:
logger.debug(f"mt status failed for {self.device_path}: {e}")
# If we transitioned from online -> offline, clear the LKG MAM (tape was likely ejected)
# 2. Fallback: try sg_turs (SCSI Test Unit Ready)
if not is_online:
try:
cmd = ["sg_turs", self.device_path]
self._log_command(cmd)
result = subprocess.run(cmd, capture_output=True, timeout=5)
if result.returncode == 0:
is_online = True
except FileNotFoundError:
logger.debug(f"'sg_turs' binary not found for {self.device_path}")
except Exception as e:
logger.debug(f"sg_turs failed for {self.device_path}: {e}")
# 3. If we transitioned from online -> offline, clear the LKG MAM (tape was likely ejected)
if LTOProvider._lkg_state[self.device_path]["online"] and not is_online:
LTOProvider._lkg_state[self.device_path]["mam"] = {}
LTOProvider._lkg_state[self.device_path]["online"] = is_online
LTOProvider._lkg_state[self.device_path]["last_check"] = now
LTOProvider._lkg_state[self.device_path]["last_online_check"] = now
return is_online
except Exception:
return LTOProvider._lkg_state[self.device_path]["online"]
def is_write_protected(self) -> bool:
"""Checks if the tape is write-protected (read-only)"""
+30
@@ -307,6 +307,10 @@ class ArchiverService:
)
JobManager.add_job_log(job_id, f"Starting backup to {media_record.identifier}")
from app.core.utils import set_process_priority
set_process_priority("background")
workload_batch = self.assemble_backup_batch(db_session, media_id)
if not workload_batch:
JobManager.add_job_log(job_id, "No files require backup")
@@ -374,6 +378,31 @@ class ArchiverService:
if current_chunk:
chunks.append(current_chunk)
# --- Staging Space Validation ---
# Sequential media (tape) requires staging the full tarfile before writing.
# Ensure the staging directory has enough free space for the largest chunk.
if not storage_provider.capabilities.get("supports_random_access"):
largest_chunk_size = max(
sum(i["offset_end"] - i["offset_start"] for i in chunk)
for chunk in chunks
)
try:
usage = shutil.disk_usage(self.staging_directory)
# Require 110% of chunk size to leave headroom for tar overhead
required = int(largest_chunk_size * 1.1)
if usage.free < required:
free_gb = usage.free / (1024**3)
req_gb = required / (1024**3)
JobManager.fail_job(
job_id,
f"Staging area at {self.staging_directory} has only {free_gb:.1f} GB free, "
f"but the largest archive chunk requires {req_gb:.1f} GB. "
f"Free up space or reduce the backup set.",
)
return
except OSError as e:
logger.warning(f"Could not check staging disk usage: {e}")
JobManager.add_job_log(job_id, f"Packed into {len(chunks)} archive(s)")
for chunk_index, chunk_items in enumerate(chunks):
@@ -719,6 +748,7 @@ class ArchiverService:
logger.exception(f"Archival failed: {e}")
JobManager.fail_job(job_id, str(e))
finally:
set_process_priority("normal")
# Clean up any residual staging files
for chunk_file in os.listdir(self.staging_directory):
if chunk_file.startswith("backup_") and chunk_file.endswith(".tar"):
+11 -399
@@ -1,12 +1,10 @@
import concurrent.futures
import hashlib
import os
import shutil
import subprocess
import threading
import time
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional, Tuple
from typing import Any, Dict, List, Optional
import psutil
from loguru import logger
@@ -16,278 +14,6 @@ from sqlalchemy.orm.exc import ObjectDeletedError, StaleDataError
from app.db import models
from app.db.database import SessionLocal
# Fast file discovery via `find -printf` (GNU find or compatible).
# Detected once at import time; falls back to os.walk if unavailable.
_FAST_FIND_BINARY: Optional[str] = None
# Fast hashing via `sha256sum` or `shasum`.
# Detected once at import time; falls back to Python hashlib if unavailable.
_FAST_HASH_BINARY: Optional[str] = None
def _detect_fast_find() -> Optional[str]:
"""Check if a `find` binary with `-printf` support is available.
Tries `gfind` (GNU find via Homebrew on macOS) first, then `find`.
Returns the binary path if `-printf` works, otherwise ``None``.
"""
for candidate in ("gfind", "find"):
binary = shutil.which(candidate)
if binary is None:
continue
try:
result = subprocess.run(
[binary, "/tmp", "-maxdepth", "0", "-printf", "%f\n"],
capture_output=True,
timeout=5,
)
if result.returncode == 0 and result.stdout.strip() == b"tmp":
return binary
except Exception:
continue
return None
def _detect_fast_hash() -> Optional[str]:
"""Check if a SHA-256 binary is available for batch hashing.
Tries `sha256sum` (GNU coreutils, Linux/Homebrew) then `shasum` (macOS).
Returns the binary path if it works, otherwise ``None``.
"""
# Try sha256sum first (Linux, Homebrew gnu-coreutils)
binary = shutil.which("sha256sum")
if binary:
try:
result = subprocess.run(
[binary, "/dev/null"],
capture_output=True,
timeout=5,
)
if (
result.returncode == 0
and b"e3b0c44298fc1c149afbf4c8996fb924" in result.stdout
):
return binary
except Exception:
pass
# Try shasum (macOS default)
binary = shutil.which("shasum")
if binary:
try:
result = subprocess.run(
[binary, "-a", "256", "/dev/null"],
capture_output=True,
timeout=5,
)
if (
result.returncode == 0
and b"e3b0c44298fc1c149afbf4c8996fb924" in result.stdout
):
return binary
except Exception:
pass
return None
def _init_fast_features() -> Tuple[Optional[str], Optional[str]]:
global _FAST_FIND_BINARY, _FAST_HASH_BINARY
_FAST_FIND_BINARY = _detect_fast_find()
_FAST_HASH_BINARY = _detect_fast_hash()
if _FAST_FIND_BINARY:
logger.info(f"Fast file discovery enabled: using {_FAST_FIND_BINARY} -printf")
else:
logger.info("Fast file discovery unavailable: falling back to os.walk")
if _FAST_HASH_BINARY:
logger.info(f"Fast hashing enabled: using {_FAST_HASH_BINARY}")
else:
logger.info("Fast hashing unavailable: falling back to Python hashlib")
return _FAST_FIND_BINARY, _FAST_HASH_BINARY
_FAST_FIND_BINARY, _FAST_HASH_BINARY = _init_fast_features()
def _hash_file_batch_fast(
file_paths: List[str], binary: str
) -> Dict[str, Optional[str]]:
"""Hash a batch of files using a native SHA-256 binary.
Streams output line-by-line via subprocess.Popen for incremental progress.
Args:
file_paths: Paths to hash.
binary: Path to sha256sum or shasum.
Returns a mapping of file_path -> hex_digest (or None on failure).
"""
results: Dict[str, Optional[str]] = {}
if not file_paths:
return results
# Build command: shasum needs -a 256 prefix, sha256sum doesn't
if binary.endswith("sha256sum"):
cmd = [binary, "--"] + file_paths
else:
# shasum
cmd = [binary, "-a", "256", "--"] + file_paths
try:
proc = subprocess.Popen(
cmd,
stdout=subprocess.PIPE,
stderr=subprocess.DEVNULL,
)
# Stream output line-by-line for incremental progress
if proc.stdout is None:
return results
for line in iter(proc.stdout.readline, b""):
line = line.strip()
if not line:
continue
# Format: "<hash> <path>" or "<hash> *<path>"
parts = line.split(b" ", 1)
if len(parts) != 2:
# Try single space with binary marker: "<hash> *<path>"
parts = line.split(b" *", 1)
if len(parts) != 2:
continue
file_hash = parts[0].decode("ascii", errors="replace").lower()
raw_path = parts[1].decode("utf-8", errors="replace")
# sha256sum may escape backslashes in filenames; handle common case
clean_path = raw_path.replace("\\\\", "\\")
results[clean_path] = file_hash
proc.stdout.close()
proc.wait()
except Exception as e:
logger.error(f"Native hash batch failed: {e}")
return results
def _discover_files_fast(
root_base: str,
job_id: Optional[int],
batch_size: int,
current_timestamp,
resolve_tracking,
sync_metadata_batch,
metrics_lock,
metrics,
db_session: Session,
) -> Tuple[int, int]:
"""Walk a tree using `find -printf` for fast metadata extraction.
Streams output line-by-line via subprocess.Popen so progress updates
appear as files are discovered instead of waiting for find to finish.
Returns (files_found, files_batched) counts.
"""
total_files_found = 0
files_batched = 0
pending_metadata: List[Dict[str, Any]] = []
# -printf format: path\tsize\tmtime (tab-separated; split from right for safety)
find_binary = _FAST_FIND_BINARY
if find_binary is None:
logger.warning(
"Fast file discovery requested but no compatible `find` binary found"
)
return 0, 0
cmd = [
find_binary,
root_base,
"-type",
"f",
"-printf",
"%p\t%s\t%T@\n",
]
try:
proc = subprocess.Popen(
cmd,
stdout=subprocess.PIPE,
stderr=subprocess.DEVNULL,
)
if proc.stdout is None:
logger.error(
f"Fast file discovery failed: could not open stdout for {root_base}"
)
return 0, 0
except Exception as e:
logger.error(f"Fast file discovery failed for {root_base}: {e}")
return 0, 0
# Stream output line by line (tab-separated: path\tsize\tmtime)
for line in iter(proc.stdout.readline, b""):
if job_id is not None and JobManager.is_cancelled(job_id):
break
if not line.strip():
continue
# Split from right: mtime and size are always numeric
parts = line.split(b"\t")
if len(parts) < 3:
continue
# First n-2 parts may be path components (tabs in filename are rare)
full_file_path = b"\t".join(parts[:-2]).decode("utf-8", errors="replace")
try:
file_size = int(parts[-2])
file_mtime = float(parts[-1])
except (ValueError, IndexError):
continue
total_files_found += 1
with metrics_lock:
metrics["total_files_found"] = total_files_found
metrics["current_path"] = os.path.dirname(full_file_path)
is_ignored = resolve_tracking(full_file_path)
pending_metadata.append(
{
"path": full_file_path,
"size": file_size,
"mtime": file_mtime,
"ignored": is_ignored,
}
)
if len(pending_metadata) >= batch_size:
sync_metadata_batch(db_session, pending_metadata, current_timestamp)
db_session.commit()
files_batched += len(pending_metadata)
pending_metadata = []
if job_id is not None:
JobManager.update_job(
job_id,
10.0,
f"Discovered {total_files_found} items...",
)
proc.stdout.close()
proc.wait()
# Flush remaining batch
if pending_metadata:
sync_metadata_batch(db_session, pending_metadata, current_timestamp)
db_session.commit()
files_batched += len(pending_metadata)
return total_files_found, files_batched
class JobManager:
"""Manages operational job states and persistence with high resilience for background threads."""
@@ -443,23 +169,6 @@ class ScannerService:
return
time.sleep(0.1)
def _set_process_priority(self, level: str):
"""Adjusts the CPU and I/O priority of the current process."""
try:
p = psutil.Process(os.getpid())
if level == "background":
if hasattr(p, "ionice"):
p.ionice(
psutil.IOPRIO_CLASS_IDLE # ty: ignore[unresolved-attribute]
)
p.nice(19)
else:
if hasattr(p, "ionice"):
p.ionice(psutil.IOPRIO_CLASS_BE) # ty: ignore[unresolved-attribute]
p.nice(0)
except Exception:
pass
def compute_sha256(
self, file_path: str, job_id: Optional[int] = None
) -> Optional[str]:
@@ -505,7 +214,9 @@ class ScannerService:
JobManager.update_job(job_id, 0.0, "Starting system scan...")
JobManager.add_job_log(job_id, "Starting system scan")
self._set_process_priority("normal")
from app.core.utils import set_process_priority
set_process_priority("normal")
with self._metrics_lock:
self.files_processed = 0
self.files_new = 0
@@ -556,27 +267,6 @@ class ScannerService:
if not os.path.exists(root_base):
continue
if _FAST_FIND_BINARY:
# Fast path: GNU find -printf (metadata extracted in C)
metrics = {
"total_files_found": 0,
"current_path": root_base,
}
found, _ = _discover_files_fast(
root_base,
job_id,
BATCH_SIZE,
current_timestamp,
resolve_tracking,
self._sync_metadata_batch,
self._metrics_lock,
metrics,
db_session,
)
with self._metrics_lock:
self.total_files_found += found
else:
# Compatibility path: Python os.walk + os.stat
for current_dir, _sub_dirs, file_names in os.walk(root_base):
if job_id is not None and JobManager.is_cancelled(job_id):
break
@@ -729,7 +419,9 @@ class ScannerService:
with self._metrics_lock:
self.is_hashing = True
self._set_process_priority("background")
from app.core.utils import set_process_priority
set_process_priority("background")
try:
with SessionLocal() as db_session:
@@ -751,10 +443,8 @@ class ScannerService:
.count()
)
# Fast hash batch size: more files per batch reduces subprocess overhead
HASH_BATCH_SIZE = 100 if _FAST_HASH_BINARY else 100
# How many files to pull from DB per iteration
FETCH_LIMIT = HASH_BATCH_SIZE * 4
FETCH_LIMIT = 400
while self.is_hashing:
# Find unindexed work (exclude deleted files - they cannot be hashed)
@@ -780,82 +470,7 @@ class ScannerService:
if JobManager.is_cancelled(hashing_job.id):
break
if _FAST_HASH_BINARY:
# Fast path: batch files to native sha256sum/shasum
# Group into sub-batches of HASH_BATCH_SIZE for parallel processing
file_paths = [t.file_path for t in hashing_targets]
path_to_record = {t.file_path: t for t in hashing_targets}
sub_batches = [
file_paths[i : i + HASH_BATCH_SIZE]
for i in range(0, len(file_paths), HASH_BATCH_SIZE)
]
max_workers = min(os.cpu_count() or 4, len(sub_batches))
with concurrent.futures.ThreadPoolExecutor(
max_workers=max_workers
) as hashing_executor:
future_to_batch = {
hashing_executor.submit(
_hash_file_batch_fast,
batch,
_FAST_HASH_BINARY,
): batch
for batch in sub_batches
}
for future in concurrent.futures.as_completed(
future_to_batch
):
if not self.is_hashing:
break
batch = future_to_batch[future]
try:
batch_results = future.result()
except Exception:
continue
# Apply hashes and detect missing files ONLY for this batch
for file_path in batch:
target_record = path_to_record.get(file_path)
if not target_record:
continue
if file_path in batch_results:
target_record.sha256_hash = batch_results[
file_path
]
with self._metrics_lock:
self.bytes_hashed += target_record.size or 0
self.files_hashed += 1
# Report progress incrementally as files complete
if self.files_hashed % 5 == 0:
progress = min(
99.9,
(
self.files_hashed
/ max(total_pending, 1)
)
* 100,
)
JobManager.update_job(
hashing_job.id,
progress,
f"Hashed {self.files_hashed} files ({self._format_throughput()})...",
)
elif not os.path.exists(file_path):
target_record.is_deleted = True
with self._metrics_lock:
self.files_missing += 1
# Throttle between sub-batches if I/O pressure is high
with self._metrics_lock:
should_throttle = self.is_throttled
if should_throttle:
time.sleep(0.5)
else:
# Compatibility path: Python hashlib via thread pool
# Hash files using Python hashlib via thread pool
max_workers = os.cpu_count() or 4
with concurrent.futures.ThreadPoolExecutor(
max_workers=max_workers
@@ -869,9 +484,7 @@ class ScannerService:
for target in hashing_targets
}
for future in concurrent.futures.as_completed(future_to_file):
if not self.is_hashing:
break
@@ -892,8 +505,7 @@ class ScannerService:
if self.files_hashed % 5 == 0:
progress = min(
99.9,
(self.files_hashed / max(total_pending, 1)) * 100,
)
JobManager.update_job(
hashing_job.id,
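Taken in isolation, the fast-path sub-batching above amounts to: chunk the pending paths into fixed-size batches, fan the batches out to a thread pool, and merge results as futures complete. A minimal self-contained sketch (using `hashlib` on the path strings themselves as a stand-in for the native `sha256sum` subprocess the real `_hash_file_batch_fast` shells out to):

```python
import concurrent.futures
import hashlib

def hash_batch(paths):
    # Stand-in worker: hash each path string (the real code hashes
    # file contents via a native binary, one subprocess per batch).
    return {p: hashlib.sha256(p.encode()).hexdigest() for p in paths}

def hash_all(paths, batch_size=100):
    # Group into sub-batches so each worker amortizes per-call overhead.
    sub_batches = [paths[i : i + batch_size] for i in range(0, len(paths), batch_size)]
    results = {}
    max_workers = min(4, len(sub_batches) or 1)
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as ex:
        futures = {ex.submit(hash_batch, b): b for b in sub_batches}
        for fut in concurrent.futures.as_completed(futures):
            results.update(fut.result())
    return results

paths = [f"/data/file{i}" for i in range(250)]
hashed = hash_all(paths, batch_size=100)
assert len(hashed) == 250  # three sub-batches: 100 + 100 + 50
```

The per-batch future map mirrors the service code: results arrive out of order, so progress can be reported incrementally as each batch completes.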
@@ -82,9 +82,11 @@ def db_session():
conn.execute(text("PRAGMA foreign_keys = OFF"))
# Fetch all tables from the metadata
for table_name in reversed(Base.metadata.tables.keys()):
# Avoid truncating internal alembic tables
if "alembic" not in table_name:
conn.execute(text(f"DELETE FROM {table_name}"))
# FTS5 virtual table is not in Base.metadata; clear it explicitly
conn.execute(text("DELETE FROM filesystem_fts"))
conn.execute(text("PRAGMA foreign_keys = ON"))
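The explicit `DELETE FROM filesystem_fts` is needed because FTS5 virtual tables are created with raw SQL and never appear in `Base.metadata`, so metadata-driven cleanup loops skip them. A small stdlib illustration of that table type, assuming a CPython SQLite build with the FTS5 extension compiled in:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# FTS5 virtual tables live outside any ORM metadata, so schema tools
# never see them and cleanup must target them by name.
conn.execute("CREATE VIRTUAL TABLE filesystem_fts USING fts5(file_path)")
conn.execute(
    "INSERT INTO filesystem_fts(rowid, file_path) VALUES (1, 'data/important.doc')"
)
# The default unicode61 tokenizer splits on '/' and '.', so the path
# is searchable by its individual tokens.
rows = conn.execute(
    "SELECT rowid FROM filesystem_fts WHERE filesystem_fts MATCH 'important'"
).fetchall()
assert rows == [(1,)]
conn.execute("DELETE FROM filesystem_fts")
assert conn.execute("SELECT count(*) FROM filesystem_fts").fetchone()[0] == 0
```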
@@ -1,4 +1,5 @@
from app.db import models
from app.services.archiver import archiver_manager
from datetime import datetime, timezone
import json
@@ -147,14 +148,21 @@ def test_search_index(client, db_session):
)
db_session.commit()
# Manually insert into FTS5 since triggers may not fire on ORM inserts in tests
from sqlalchemy import text
db_session.execute(
text("INSERT INTO filesystem_fts(rowid, file_path) VALUES (:rowid, :path)"),
{"rowid": file1.id, "path": file1.file_path},
)
db_session.commit()
response = client.get("/archive/search?q=important")
assert response.status_code == 200
# If FTS5 is working, it should return results.
data = response.json()
assert len(data) == 1
assert data[0]["path"] == "data/important.doc"
assert data[0]["name"] == "important.doc"
def test_get_metadata(client, db_session):
@@ -694,3 +702,500 @@ def test_metadata_directory(client, db_session):
assert data["type"] == "directory"
assert data["child_count"] == 2
assert data["size"] == 300
# ── List Providers ──
def test_list_providers_includes_mock_in_test_mode(client, monkeypatch):
"""Tests that MockLTOProvider is included when TAPEHOARD_TEST_MODE is set."""
monkeypatch.setenv("TAPEHOARD_TEST_MODE", "true")
response = client.get("/inventory/providers")
assert response.status_code == 200
data = response.json()
provider_ids = [p["provider_id"] for p in data]
assert "lto_tape" in provider_ids
assert "local_hdd" in provider_ids
assert "s3_compat" in provider_ids
assert "mock_lto" in provider_ids
# ── List Media with Provider State ──
def test_list_media_with_refresh(client, db_session, mocker):
"""Tests listing media with refresh=True queries hardware status."""
media = models.StorageMedia(
media_type="local_hdd",
identifier="DISK_ONLINE",
capacity=1000,
status="active",
extra_config='{"mount_path": "/tmp"}',
)
db_session.add(media)
db_session.commit()
mock_provider = mocker.MagicMock()
mock_provider.check_online.return_value = True
mock_provider.identify_media.return_value = "DISK_ONLINE"
mock_provider.get_live_info.return_value = {"online": True}
mock_provider.mount_base = "/tmp"
mocker.patch.object(
archiver_manager,
"_get_storage_provider",
return_value=mock_provider,
)
response = client.get("/inventory/media?refresh=true")
assert response.status_code == 200
data = response.json()
assert len(data) == 1
assert data[0]["is_online"] is True
assert data[0]["is_identified"] is True
# ── Create Media Validation ──
def test_create_media_duplicate_identifier(client, db_session):
"""Tests creating media with duplicate identifier returns 400."""
db_session.add(
models.StorageMedia(
media_type="hdd", identifier="DUPE", capacity=1000, status="active"
)
)
db_session.commit()
response = client.post(
"/inventory/media",
json={"media_type": "local_hdd", "identifier": "DUPE", "capacity": 1000},
)
assert response.status_code == 400
assert "already exists" in response.json()["detail"]
# ── Update Media Edge Cases ──
def test_update_media_not_found(client):
"""Tests updating non-existent media returns 404."""
response = client.patch("/inventory/media/99999", json={"location": "Nowhere"})
assert response.status_code == 404
def test_update_status_to_failed_purges_versions(client, db_session):
"""Setting status to FAILED should delete all file_versions."""
media = models.StorageMedia(
media_type="hdd", identifier="DISK_FAIL_001", capacity=1000, status="active"
)
db_session.add(media)
db_session.flush()
file1 = models.FilesystemState(file_path="data/file1.txt", size=100, mtime=1000)
db_session.add(file1)
db_session.flush()
db_session.add(
models.FileVersion(
filesystem_state_id=file1.id,
media_id=media.id,
file_number="1",
offset_start=0,
offset_end=100,
)
)
db_session.commit()
response = client.patch(
f"/inventory/media/{media.id}",
json={"status": "FAILED"},
)
assert response.status_code == 200
from sqlalchemy import text
result = db_session.execute(
text("SELECT COUNT(*) FROM file_versions WHERE media_id = :media_id"),
{"media_id": media.id},
).scalar()
assert result == 0
def test_update_media_all_lto_fields(client, db_session):
"""Tests updating all LTO-specific fields."""
media = models.StorageMedia(
media_type="lto_tape",
identifier="LTO_PATCH",
capacity=1000,
status="active",
)
db_session.add(media)
db_session.commit()
response = client.patch(
f"/inventory/media/{media.id}",
json={
"generation": "LTO-9",
"worm": True,
"write_protected": True,
"compression": False,
"encryption_key_id": "new-key",
"cleaning_cartridge": True,
},
)
assert response.status_code == 200
data = response.json()
assert data["generation"] == "LTO-9"
assert data["worm"] is True
assert data["write_protected"] is True
assert data["compression"] is False
assert data["encryption_key_id"] == "new-key"
assert data["cleaning_cartridge"] is True
def test_update_media_all_hdd_fields(client, db_session):
"""Tests updating all HDD-specific fields."""
media = models.StorageMedia(
media_type="local_hdd",
identifier="HDD_PATCH",
capacity=1000,
status="active",
)
db_session.add(media)
db_session.commit()
response = client.patch(
f"/inventory/media/{media.id}",
json={
"drive_model": "WD-Red",
"device_uuid": "uuid-123",
"is_ssd": True,
"mount_path": "/mnt/backup",
"filesystem_type": "ext4",
"connection_interface": "USB3",
"encrypted": True,
},
)
assert response.status_code == 200
data = response.json()
assert data["drive_model"] == "WD-Red"
assert data["device_uuid"] == "uuid-123"
assert data["is_ssd"] is True
assert data["mount_path"] == "/mnt/backup"
assert data["filesystem_type"] == "ext4"
assert data["connection_interface"] == "USB3"
assert data["encrypted"] is True
def test_update_media_all_cloud_fields(client, db_session):
"""Tests updating all cloud-specific fields."""
media = models.StorageMedia(
media_type="s3_compat",
identifier="CLOUD_PATCH",
capacity=1000,
status="active",
)
db_session.add(media)
db_session.commit()
response = client.patch(
f"/inventory/media/{media.id}",
json={
"provider_template": "wasabi",
"endpoint_url": "https://s3.wasabisys.com",
"region": "us-east-1",
"bucket_name": "my-bucket",
"access_key_id": "AKIA...",
"secret_access_key_name": "wasabi-key",
"path_style_access": False,
"storage_class": "STANDARD",
"max_part_size_mb": 1000,
"obfuscate_filenames": True,
"encryption_secret_name": "enc-secret",
},
)
assert response.status_code == 200
data = response.json()
assert data["provider_template"] == "wasabi"
assert data["endpoint_url"] == "https://s3.wasabisys.com"
assert data["region"] == "us-east-1"
assert data["bucket_name"] == "my-bucket"
assert data["access_key_id"] == "AKIA..."
assert data["secret_access_key_name"] == "wasabi-key"
assert data["path_style_access"] is False
assert data["storage_class"] == "STANDARD"
assert data["max_part_size_mb"] == 1000
assert data["obfuscate_filenames"] is True
assert data["encryption_secret_name"] == "enc-secret"
def test_update_media_legacy_extra_config_migration(client, db_session):
"""Tests that legacy extra_config keys are migrated to first-class columns."""
media = models.StorageMedia(
media_type="local_hdd",
identifier="LEGACY_001",
capacity=1000,
status="active",
extra_config=json.dumps(
{
"device_path": "/mnt/legacy",
"encryption_key": "legacy-key",
"encryption_passphrase": "legacy-pass",
}
),
)
db_session.add(media)
db_session.commit()
response = client.patch(
f"/inventory/media/{media.id}",
json={"location": "Migrated"},
)
assert response.status_code == 200
data = response.json()
assert data["mount_path"] == "/mnt/legacy"
assert data["encryption_key_id"] == "legacy-key"
# ── Delete Media ──
def test_delete_media_not_found(client):
"""Tests deleting non-existent media returns 404."""
response = client.delete("/inventory/media/99999")
assert response.status_code == 404
# ── Initialize Media ──
def test_initialize_media_not_found(client):
"""Tests initializing non-existent media returns 404."""
response = client.post("/inventory/media/99999/initialize")
assert response.status_code == 404
def test_initialize_media_no_provider(client, db_session, mocker):
"""Tests initializing media with unsupported type returns 400."""
media = models.StorageMedia(
media_type="hdd", identifier="NO_PROV", capacity=1000, status="active"
)
db_session.add(media)
db_session.commit()
mocker.patch.object(
archiver_manager,
"_get_storage_provider",
return_value=None,
)
response = client.post(f"/inventory/media/{media.id}/initialize")
assert response.status_code == 400
assert "provider not found" in response.json()["detail"]
def test_initialize_media_existing_data_blocks(client, db_session, mocker):
"""Tests initialize blocks when existing data found and force=False."""
media = models.StorageMedia(
media_type="hdd", identifier="HAS_DATA", capacity=1000, status="active"
)
db_session.add(media)
db_session.commit()
mock_provider = mocker.MagicMock()
mock_provider.check_existing_data.return_value = True
mocker.patch.object(
archiver_manager,
"_get_storage_provider",
return_value=mock_provider,
)
response = client.post(f"/inventory/media/{media.id}/initialize")
assert response.status_code == 409
assert "existing data" in response.json()["detail"]
def test_initialize_media_force_overwrite(client, db_session, mocker):
"""Tests initialize with force=True overwrites existing data."""
media = models.StorageMedia(
media_type="hdd",
identifier="FORCE_INIT",
capacity=1000,
status="active",
extra_config='{"mount_path": "/tmp"}',
)
db_session.add(media)
db_session.commit()
mock_provider = mocker.MagicMock()
mock_provider.check_existing_data.return_value = True
mock_provider.initialize_media.return_value = True
mock_provider.device_path = "/tmp/init"
mocker.patch.object(
archiver_manager,
"_get_storage_provider",
return_value=mock_provider,
)
response = client.post(f"/inventory/media/{media.id}/initialize?force=true")
assert response.status_code == 200
assert "complete" in response.json()["message"]
def test_initialize_media_permission_error(client, db_session, mocker):
"""Tests initialize handles PermissionError."""
media = models.StorageMedia(
media_type="hdd", identifier="PERM_DENY", capacity=1000, status="active"
)
db_session.add(media)
db_session.commit()
mock_provider = mocker.MagicMock()
mock_provider.check_existing_data.return_value = False
mock_provider.initialize_media.side_effect = PermissionError("Access denied")
mocker.patch.object(
archiver_manager,
"_get_storage_provider",
return_value=mock_provider,
)
response = client.post(f"/inventory/media/{media.id}/initialize")
assert response.status_code == 403
assert "Access denied" in response.json()["detail"]
# ── Reorder Media ──
def test_reorder_media(client, db_session):
"""Tests reordering media priority."""
m1 = models.StorageMedia(
media_type="hdd", identifier="A", capacity=1000, status="active"
)
m2 = models.StorageMedia(
media_type="hdd", identifier="B", capacity=1000, status="active"
)
db_session.add_all([m1, m2])
db_session.commit()
response = client.post(
"/inventory/media/reorder", json={"media_ids": [m2.id, m1.id]}
)
assert response.status_code == 200
db_session.expire_all()
assert db_session.get(models.StorageMedia, m2.id).priority_index == 0
assert db_session.get(models.StorageMedia, m1.id).priority_index == 1
# ── Insights Deep Tests ──
def test_insights_with_duplicates_and_aging(client, db_session):
"""Tests insights reports duplicates, aging, redundancy, and extensions."""
now = datetime.now(timezone.utc).timestamp()
# Two files with same hash (duplicate)
f1 = models.FilesystemState(
file_path="/data/a.txt", size=100, mtime=now, sha256_hash="duphash"
)
f2 = models.FilesystemState(
file_path="/data/b.txt", size=100, mtime=now, sha256_hash="duphash"
)
# Protected file
f3 = models.FilesystemState(
file_path="/data/c.txt",
size=200,
mtime=now - 400 * 24 * 3600,
sha256_hash="hash3",
)
db_session.add_all([f1, f2, f3])
db_session.flush()
media = models.StorageMedia(
media_type="hdd", identifier="M1", capacity=1000, status="active"
)
db_session.add(media)
db_session.flush()
db_session.add(
models.FileVersion(
filesystem_state_id=f3.id,
media_id=media.id,
file_number="1",
offset_start=0,
offset_end=200,
)
)
db_session.commit()
response = client.get("/inventory/insights")
assert response.status_code == 200
data = response.json()
assert data["summary"]["total_files"] == 3
assert data["summary"]["total_bytes"] == 400
assert data["summary"]["protected_bytes"] == 200
assert data["summary"]["vulnerable_bytes"] == 200
assert len(data["duplicates"]) == 1
assert data["duplicates"][0]["copies"] == 2
assert data["duplicates"][0]["saved"] == 100
assert len(data["extensions"]) >= 1
assert any(e["ext"] == "txt" for e in data["extensions"])
assert len(data["redundancy"]) >= 1
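The duplicate figures asserted here (two copies sharing a hash, 100 bytes reclaimable) follow from a simple group-by over hashes, where the savings for a group are every copy's size beyond the one that must be kept. A hypothetical standalone reduction, not the endpoint's actual SQL:

```python
from collections import defaultdict

files = [
    ("/data/a.txt", 100, "duphash"),
    ("/data/b.txt", 100, "duphash"),
    ("/data/c.txt", 200, "hash3"),
]

groups = defaultdict(list)
for path, size, digest in files:
    groups[digest].append(size)

# A group is a duplicate set when more than one file shares the hash;
# "saved" is the total size minus the single copy that must remain.
duplicates = [
    {"hash": h, "copies": len(sizes), "saved": sum(sizes) - max(sizes)}
    for h, sizes in groups.items()
    if len(sizes) > 1
]
assert duplicates == [{"hash": "duphash", "copies": 2, "saved": 100}]
```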
# ── Treemap / Directories ──
def test_get_treemap(client, db_session):
"""Tests the treemap endpoint returns hierarchical directory data."""
f1 = models.FilesystemState(file_path="/data/sub/file1.txt", size=100, mtime=1000)
f2 = models.FilesystemState(file_path="/data/sub/file2.txt", size=200, mtime=1000)
db_session.add_all([f1, f2])
db_session.commit()
response = client.get("/inventory/directories")
assert response.status_code == 200
data = response.json()
assert isinstance(data, list)
# Should contain data directory
assert len(data) >= 1
# ── Detect Media ──
def test_detect_media_finds_new_insertion(client, db_session, mocker):
"""Tests detect_media finds newly inserted unregistered media."""
media = models.StorageMedia(
media_type="lto_tape",
identifier="EXISTING_TAPE",
capacity=1000,
status="active",
extra_config=json.dumps({"device_path": "/dev/nst0"}),
)
db_session.add(media)
db_session.commit()
mock_provider = mocker.MagicMock()
mock_provider.provider_id = "lto_tape"
mock_provider.check_online.return_value = True
mock_provider.get_live_info.return_value = {"identity": "NEW_TAPE_01"}
mocker.patch.object(
archiver_manager,
"_get_storage_provider",
return_value=mock_provider,
)
response = client.get("/inventory/detect")
assert response.status_code == 200
data = response.json()
assert len(data) == 1
assert data[0]["identifier"] == "NEW_TAPE_01"
assert data[0]["device_path"] == "/dev/nst0"
@@ -1,4 +1,4 @@
from datetime import datetime, timezone
from datetime import datetime, timedelta, timezone
import pytest
@@ -17,19 +17,22 @@ def test_list_jobs_empty(client):
def test_list_jobs_populated(client, db_session):
"""Tests listing jobs with pagination and latest_log inclusion."""
now = datetime.now(timezone.utc)
job1 = models.Job(
job_type="SCAN",
status="COMPLETED",
progress=100.0,
current_task="Done",
started_at=datetime.now(timezone.utc),
completed_at=datetime.now(timezone.utc),
started_at=now - timedelta(seconds=2),
completed_at=now - timedelta(seconds=1),
created_at=now - timedelta(seconds=2),
)
job2 = models.Job(
job_type="BACKUP",
status="RUNNING",
progress=50.0,
current_task="Writing archive",
created_at=now,
)
db_session.add_all([job1, job2])
db_session.flush()
@@ -0,0 +1,196 @@
from app.db import models
# ── Settings CRUD ──
def test_get_settings_empty(client):
"""Tests retrieving settings when none are set."""
response = client.get("/system/settings")
assert response.status_code == 200
assert response.json() == {}
def test_update_settings(client):
"""Tests updating a system setting."""
response = client.post(
"/system/settings", json={"key": "schedule_scan", "value": "0 2 * * *"}
)
assert response.status_code == 200
assert response.json() == {"message": "Setting committed."}
# Verify retrieval
response = client.get("/system/settings")
assert response.json()["schedule_scan"] == "0 2 * * *"
def test_update_settings_triggers_scheduler_reload(client, mocker):
"""Tests that updating schedule_scan reloads the scheduler."""
from app.services.scheduler import scheduler_manager
reload_spy = mocker.spy(scheduler_manager, "reload")
response = client.post(
"/system/settings", json={"key": "schedule_scan", "value": "0 3 * * *"}
)
assert response.status_code == 200
reload_spy.assert_called_once()
def test_update_global_exclusions_recomputes_policy(client, db_session, mocker):
"""Tests that updating global_exclusions triggers policy recompute."""
recompute_spy = mocker.patch("app.api.system.settings.recompute_exclusion_policy")
response = client.post(
"/system/settings",
json={"key": "global_exclusions", "value": "*.tmp\n*.log"},
)
assert response.status_code == 200
recompute_spy.assert_called_once()
# ── Exclusion Testing ──
def test_test_exclusions_empty_patterns(client):
"""Tests exclusion test with empty patterns returns zeros."""
response = client.post(
"/system/settings/test-exclusions",
json={"patterns": "", "limit": 10},
)
assert response.status_code == 200
data = response.json()
assert data["total_files"] == 0
assert data["matched_count"] == 0
assert data["sample"] == []
def test_test_exclusions_matches_files(client, db_session):
"""Tests exclusion patterns against indexed files."""
db_session.add_all(
[
models.FilesystemState(
file_path="/data/file.txt", size=100, mtime=1000, is_deleted=False
),
models.FilesystemState(
file_path="/data/temp.tmp", size=50, mtime=1000, is_deleted=False
),
models.FilesystemState(
file_path="/data/debug.log", size=200, mtime=1000, is_deleted=False
),
]
)
db_session.commit()
response = client.post(
"/system/settings/test-exclusions",
json={"patterns": "*.tmp\n*.log", "limit": 10},
)
assert response.status_code == 200
data = response.json()
assert data["total_files"] == 3
assert data["matched_count"] == 2
assert data["matched_size"] == 250
assert len(data["sample"]) == 2
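The expected counts follow from ordinary glob matching of each newline-separated pattern against the indexed paths; a sketch with stdlib `fnmatch` (an assumption about the matching rule, not the endpoint's real implementation):

```python
from fnmatch import fnmatch

paths = {"/data/file.txt": 100, "/data/temp.tmp": 50, "/data/debug.log": 200}
patterns = [ln.strip() for ln in "*.tmp\n*.log".splitlines() if ln.strip()]

# fnmatch's "*" crosses "/" boundaries, so "*.tmp" matches any path
# ending in .tmp regardless of directory depth.
matched = {p: sz for p, sz in paths.items() if any(fnmatch(p, pat) for pat in patterns)}

assert len(matched) == 2
assert sum(matched.values()) == 250  # the matched_size the test expects
```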
def test_test_exclusions_deleted_files_excluded(client, db_session):
"""Tests that deleted files are excluded from exclusion testing."""
db_session.add_all(
[
models.FilesystemState(
file_path="/data/keep.txt",
size=100,
mtime=1000,
is_deleted=False,
),
models.FilesystemState(
file_path="/data/old.tmp",
size=50,
mtime=1000,
is_deleted=True,
),
]
)
db_session.commit()
response = client.post(
"/system/settings/test-exclusions",
json={"patterns": "*.tmp", "limit": 10},
)
assert response.status_code == 200
data = response.json()
assert data["total_files"] == 1
assert data["matched_count"] == 0
# ── Exclusion Report Download ──
def test_download_exclusion_report(client, db_session):
"""Tests CSV report generation for exclusion matches."""
db_session.add(
models.FilesystemState(
file_path="/data/target.log", size=100, mtime=1000, is_deleted=False
)
)
db_session.commit()
response = client.post(
"/system/settings/test-exclusions/download",
json={"patterns": "*.log", "limit": 10},
)
assert response.status_code == 200
assert response.headers["content-type"] == "text/csv; charset=utf-8"
assert "exclusion_report.csv" in response.headers["content-disposition"]
content = response.content.decode("utf-8")
assert "path,size,mtime,sha256_hash" in content
assert "target.log" in content
def test_download_exclusion_report_no_patterns(client):
"""Tests download with empty patterns returns 400."""
response = client.post(
"/system/settings/test-exclusions/download",
json={"patterns": "", "limit": 10},
)
assert response.status_code == 400
assert "No patterns provided" in response.json()["detail"]
# ── Secrets Keystore (complementing test_api_system.py) ──
def test_create_secret(client):
"""Tests creating a secret."""
response = client.post(
"/system/secrets", json={"name": "api-key", "value": "secret123"}
)
assert response.status_code == 200
assert "stored" in response.json()["message"]
response = client.get("/system/secrets")
assert "api-key" in response.json()
def test_get_secret_value(client):
"""Tests retrieving a secret value."""
client.post("/system/secrets", json={"name": "key-1", "value": "val-1"})
response = client.get("/system/secrets/key-1")
assert response.status_code == 200
assert response.json()["value"] == "val-1"
def test_delete_secret(client):
"""Tests deleting a secret."""
client.post("/system/secrets", json={"name": "to-delete", "value": "x"})
response = client.request("DELETE", "/system/secrets", json={"name": "to-delete"})
assert response.status_code == 200
response = client.get("/system/secrets")
assert "to-delete" not in response.json()
def test_delete_secret_not_found(client):
"""Tests deleting a non-existent secret returns 404."""
response = client.request("DELETE", "/system/secrets", json={"name": "missing"})
assert response.status_code == 404
@@ -1,3 +1,4 @@
import json
from datetime import datetime, timezone
from app.db import models
@@ -52,13 +53,6 @@ def test_update_settings(client):
assert response.json()["schedule_scan"] == "0 2 * * *"
def test_list_jobs_empty(client):
"""Tests listing jobs when none exist."""
response = client.get("/system/jobs")
assert response.status_code == 200
assert response.json() == []
def test_trigger_scan(client):
"""Tests triggering a system scan."""
response = client.post("/system/scan")
@@ -77,10 +71,17 @@ def test_get_scan_status(client):
def test_ls_root(client):
"""Tests listing the root directory returns actual subdirectories."""
response = client.get("/system/ls?path=/")
assert response.status_code == 200
data = response.json()
assert isinstance(data, list)
assert len(data) > 0
for entry in data:
assert "name" in entry
assert "path" in entry
assert entry["name"] != ""
assert entry["path"] != ""
def test_ignore_hardware(client):
@@ -104,127 +105,6 @@ def test_scan_status_includes_files_missing(client):
assert data["files_missing"] == 0
def test_list_discrepancies_empty(client):
"""Tests listing discrepancies when none exist."""
response = client.get("/system/discrepancies")
assert response.status_code == 200
assert response.json() == []
def test_list_discrepancies_deleted_file(client, db_session):
"""Tests listing a confirmed-deleted file in discrepancies."""
file_record = models.FilesystemState(
file_path="/data/old.txt",
size=100,
mtime=1000,
is_deleted=True,
is_ignored=False,
sha256_hash=None,
)
db_session.add(file_record)
db_session.commit()
response = client.get("/system/discrepancies")
assert response.status_code == 200
data = response.json()
assert len(data) == 1
assert data[0]["path"] == "/data/old.txt"
assert data[0]["is_deleted"] is True
def test_confirm_file_deleted(client, db_session):
"""Tests confirming a file as deleted."""
file_record = models.FilesystemState(
file_path="/data/verify.txt",
size=50,
mtime=2000,
is_deleted=False,
)
db_session.add(file_record)
db_session.commit()
response = client.post(f"/system/discrepancies/{file_record.id}/confirm")
assert response.status_code == 200
assert "marked as deleted" in response.json()["message"]
db_session.expire_all()
db_session.refresh(file_record)
assert file_record.is_deleted is True
def test_confirm_file_deleted_not_found(client):
"""Tests confirming a non-existent file returns 404."""
response = client.post("/system/discrepancies/9999/confirm")
assert response.status_code == 404
def test_dismiss_discrepancy(client, db_session):
"""Tests dismissing a deleted file."""
file_record = models.FilesystemState(
file_path="/data/dismiss.txt",
size=50,
mtime=2000,
is_deleted=True,
)
db_session.add(file_record)
db_session.commit()
response = client.post(f"/system/discrepancies/{file_record.id}/dismiss")
assert response.status_code == 200
assert "dismissed" in response.json()["message"]
db_session.expire_all()
db_session.refresh(file_record)
assert file_record.missing_acknowledged_at is not None
def test_delete_file_record(client, db_session):
"""Tests hard-deleting a file record and its versions."""
media = models.StorageMedia(
media_type="hdd", identifier="M1", capacity=1000, status="active"
)
db_session.add(media)
db_session.flush()
file_record = models.FilesystemState(
file_path="/data/hard_delete.txt",
size=100,
mtime=1000,
is_deleted=True,
)
db_session.add(file_record)
db_session.flush()
db_session.add(
models.FileVersion(
filesystem_state_id=file_record.id,
media_id=media.id,
file_number="1",
offset_start=0,
offset_end=100,
)
)
db_session.commit()
file_id = file_record.id
response = client.delete(f"/system/discrepancies/{file_id}")
assert response.status_code == 200
db_session.expire_all()
# Verify file and version are gone
assert (
db_session.query(models.FilesystemState).filter_by(id=file_id).first() is None
)
assert (
db_session.query(models.FileVersion)
.filter_by(filesystem_state_id=file_id)
.first()
is None
)
def test_dashboard_stats_excludes_failed_media(client, db_session):
"""Tests that dashboard stats do not count versions on failed or retired media."""
active_media = models.StorageMedia(
@@ -593,10 +473,13 @@ def test_ignore_hardware_duplicate(client):
def test_database_export(client):
"""Tests database export endpoint returns a SQLite file download."""
response = client.get("/system/database/export")
assert response.status_code == 200
assert "tapehoard_index_" in response.headers["content-disposition"]
assert ".db" in response.headers["content-disposition"]
# Should contain SQLite magic bytes
assert response.content[:16] == b"SQLite format 3\x00"
# ── Tracking Batch ──
@@ -676,5 +559,69 @@ def test_test_notification_invalid_url(client):
response = client.post(
"/system/notifications/test", json={"url": "not-a-valid-url"}
)
assert response.status_code == 500
assert "Failed to dispatch test alert" in response.json()["detail"]
# ── Host Directory Listing ──
def test_ls_traversal_rejected(client):
"""Tests that path traversal attempts are blocked."""
response = client.get("/system/ls?path=/etc/../secret")
assert response.status_code == 403
assert "Path traversal not allowed" in response.json()["detail"]
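One plausible way an endpoint could reject such requests (the API's actual rule isn't shown here) is to compare the raw path against its normalized form, since any `..` segment changes the result of normalization:

```python
import os.path

def is_traversal(path):
    # Reject any request whose normalized form differs from the input,
    # e.g. "/etc/../secret" normalizes to "/secret"; also reject any
    # explicit ".." component as a belt-and-braces check.
    return os.path.normpath(path) != path or ".." in path.split("/")

assert is_traversal("/etc/../secret") is True
assert is_traversal("/etc") is False
assert is_traversal("/data/sub") is False
```

This is a sketch only; a production check would also resolve symlinks and confirm the result stays under an allowed root.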
def test_ls_nonexistent_path(client):
"""Tests listing a non-existent directory returns empty list."""
response = client.get("/system/ls?path=/nonexistent_path_12345")
assert response.status_code == 200
assert response.json() == []
# ── System Tree ──
def test_system_tree_root(client, db_session):
"""Tests system tree at ROOT returns configured source roots."""
db_session.add(models.SystemSetting(key="source_roots", value='["/source_data"]'))
db_session.commit()
response = client.get("/system/tree")
assert response.status_code == 200
data = response.json()
assert len(data) == 1
assert data[0]["name"] == "/source_data"
assert data[0]["has_children"] is True
def test_system_tree_subdirectory(client, db_session):
"""Tests system tree browsing a subdirectory."""
import tempfile
with tempfile.TemporaryDirectory() as tmpdir:
db_session.add(
models.SystemSetting(key="source_roots", value=json.dumps([tmpdir]))
)
db_session.commit()
# Create a subdirectory
import os
os.makedirs(os.path.join(tmpdir, "subdir"))
response = client.get(f"/system/tree?path={tmpdir}")
assert response.status_code == 200
data = response.json()
assert len(data) == 1
assert data[0]["name"] == "subdir"
def test_system_tree_outside_roots(client, db_session):
"""Tests tree browsing outside roots returns 403."""
db_session.add(models.SystemSetting(key="source_roots", value='["/source_data"]'))
db_session.commit()
response = client.get("/system/tree?path=/etc")
assert response.status_code == 403
@@ -1,84 +1,379 @@
import hashlib
import io
import os
import pytest
from unittest.mock import MagicMock
from app.providers.cloud import CloudStorageProvider
def test_cloud_provider_obfuscation_logic():
"""Verifies that filename hashing and sharding works as expected."""
# CASE 1: Obfuscation Disabled
config_plain = {
"bucket_name": "test-bucket",
"obfuscate_filenames": False,
"access_key": "fake",
"secret_key": "fake",
}
provider_plain = CloudStorageProvider(config_plain)
path = "documents/secret_plan.pdf"
# Expectation: key is exactly the path with prefix
key_plain = provider_plain._get_obfuscated_key("objects", path)
assert key_plain == "objects/documents/secret_plan.pdf"
# CASE 2: Obfuscation Enabled
config_hidden = {
"bucket_name": "test-bucket",
"obfuscate_filenames": True,
"access_key": "fake",
"secret_key": "fake",
}
provider_hidden = CloudStorageProvider(config_hidden)
# Expectation: key is the SHA-256 of the path, sharded by its first byte pairs
expected_hash = hashlib.sha256(path.encode("utf-8")).hexdigest()
expected_prefix = f"objects/{expected_hash[:2]}/{expected_hash[2:4]}"
key_hidden = provider_hidden._get_obfuscated_key("objects", path)
assert key_hidden.startswith("objects/")
assert key_hidden == f"{expected_prefix}/{expected_hash}"
assert "secret_plan.pdf" not in key_hidden
# ── Constructor & Config ──
def test_cloud_provider_endpoint_normalization(mocker):
"""Tests that endpoint URLs without protocol get https:// prepended."""
mock_boto = mocker.patch("app.providers.cloud.boto3")
provider = CloudStorageProvider(
{
"bucket_name": "test-bucket",
"endpoint_url": "s3.example.com",
"region": "eu-west-1",
"access_key": "ak",
"secret_key": "sk",
}
)
call_kwargs = mock_boto.client.call_args[1]
assert call_kwargs["endpoint_url"] == "https://s3.example.com"
assert call_kwargs["region_name"] == "eu-west-1"
assert provider.provider_type == "S3"
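The sharded layout the obfuscation test expects — two two-character shard levels taken from the SHA-256 of the logical path — can be reproduced in a few lines (a sketch of the expected key scheme, not the provider's `_get_obfuscated_key` itself):

```python
import hashlib

def obfuscated_key(prefix, path):
    # Hash the logical path, then shard by the first two byte-pairs so
    # objects spread across up to 65,536 prefixes instead of one flat one.
    digest = hashlib.sha256(path.encode("utf-8")).hexdigest()
    return f"{prefix}/{digest[:2]}/{digest[2:4]}/{digest}"

key = obfuscated_key("objects", "documents/secret_plan.pdf")
assert key.startswith("objects/")
assert "secret_plan.pdf" not in key  # original name never appears in the key
```

Because the key is a pure function of the path, the same file always maps to the same object, while listings of the bucket reveal nothing about filenames.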
def test_cloud_provider_endpoint_no_modification(mocker):
"""Tests that endpoint URLs with existing protocol are left alone."""
mock_boto = mocker.patch("app.providers.cloud.boto3")
CloudStorageProvider(
{
"bucket_name": "test-bucket",
"endpoint_url": "http://minio.local:9000",
}
)
call_kwargs = mock_boto.client.call_args[1]
assert call_kwargs["endpoint_url"] == "http://minio.local:9000"
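Together, the two endpoint tests above pin down a small normalization rule: prepend `https://` only when no scheme is already present. A plausible standalone sketch of that rule (hypothetical helper name):

```python
def normalize_endpoint(url):
    # Prepend https:// only when the URL carries no scheme; leave
    # explicit http:// or https:// endpoints (e.g. local MinIO) alone.
    if url and "://" not in url:
        return f"https://{url}"
    return url

assert normalize_endpoint("s3.example.com") == "https://s3.example.com"
assert normalize_endpoint("http://minio.local:9000") == "http://minio.local:9000"
```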
def test_cloud_provider_defaults(mocker):
"""Tests default values when minimal config is provided."""
mock_boto = mocker.patch("app.providers.cloud.boto3")
provider = CloudStorageProvider({"bucket_name": "b"})
assert provider.region == "us-east-1"
assert provider.endpoint_url is None
assert provider.obfuscate is False
mock_boto.client.assert_called_once()
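The defaulting behavior asserted here is the usual merge-over-defaults pattern; a sketch with assumed names (the provider's actual implementation is not shown in this diff):

```python
DEFAULTS = {
    "region": "us-east-1",        # matches the asserted default region
    "endpoint_url": None,         # no custom endpoint unless configured
    "obfuscate_filenames": False,
}

def resolve_config(user_config: dict) -> dict:
    # User-supplied keys win; everything else falls back to DEFAULTS.
    return {**DEFAULTS, **user_config}

cfg = resolve_config({"bucket_name": "b"})
assert cfg["region"] == "us-east-1" and cfg["endpoint_url"] is None
```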
# ── Online & Identification ──
def test_check_online_success(mocker):
"""Tests check_online returns True when head_bucket succeeds."""
mocker.patch("app.providers.cloud.boto3")
provider = CloudStorageProvider({"bucket_name": "b"})
provider.s3.head_bucket = MagicMock(return_value=None)
assert provider.check_online() is True
def test_check_online_failure(mocker):
"""Tests check_online returns False when head_bucket raises."""
mocker.patch("app.providers.cloud.boto3")
provider = CloudStorageProvider({"bucket_name": "b"})
provider.s3.head_bucket = MagicMock(side_effect=Exception("timeout"))
assert provider.check_online() is False
def test_get_live_info(mocker):
"""Tests get_live_info returns provider metadata."""
mocker.patch("app.providers.cloud.boto3")
provider = CloudStorageProvider({"bucket_name": "my-bucket"})
provider.s3.head_bucket = MagicMock(return_value=None)
info = provider.get_live_info()
assert info["online"] is True
assert info["provider"] == "S3"
assert info["bucket"] == "my-bucket"
def test_check_existing_data_found(mocker):
"""Tests check_existing_data when objects exist under archives/."""
mocker.patch("app.providers.cloud.boto3")
provider = CloudStorageProvider({"bucket_name": "b"})
provider.s3.list_objects_v2 = MagicMock(
return_value={"Contents": [{"Key": "archives/1.tar"}]}
)
assert provider.check_existing_data() is True
def test_check_existing_data_empty(mocker):
"""Tests check_existing_data when no objects exist."""
mocker.patch("app.providers.cloud.boto3")
provider = CloudStorageProvider({"bucket_name": "b"})
provider.s3.list_objects_v2 = MagicMock(return_value={})
assert provider.check_existing_data() is False
def test_identify_media_by_id_file(mocker):
"""Tests identify_media reads .tapehoard_id when available."""
mocker.patch("app.providers.cloud.boto3")
provider = CloudStorageProvider({"bucket_name": "b"})
mock_body = MagicMock()
mock_body.read.return_value = b" BUCKET_001 "
provider.s3.get_object = MagicMock(return_value={"Body": mock_body})
result = provider.identify_media()
assert result == "BUCKET_001"
def test_identify_media_fallback_to_bucket_name(mocker):
"""Tests identify_media falls back to bucket name when .tapehoard_id missing."""
mocker.patch("app.providers.cloud.boto3")
provider = CloudStorageProvider({"bucket_name": "fallback-bucket"})
provider.s3.get_object = MagicMock(side_effect=Exception("NoSuchKey"))
provider.s3.head_bucket = MagicMock(return_value=None)
result = provider.identify_media()
assert result == "fallback-bucket"
def test_identify_media_complete_failure(mocker):
"""Tests identify_media returns None when everything fails."""
mocker.patch("app.providers.cloud.boto3")
provider = CloudStorageProvider({"bucket_name": "b"})
provider.s3.get_object = MagicMock(side_effect=Exception("fail"))
provider.s3.head_bucket = MagicMock(side_effect=Exception("fail"))
assert provider.identify_media() is None
# ── Write Operations ──
def test_write_archive_plain(mocker):
"""Tests writing an unencrypted archive."""
mocker.patch("app.providers.cloud.boto3")
provider = CloudStorageProvider({"bucket_name": "b", "obfuscate_filenames": False})
stream = io.BytesIO(b"archive content")
provider.s3.upload_fileobj = MagicMock(return_value=None)
location = provider.write_archive("M1", stream)
assert location.startswith("archives/archives/")
assert location.endswith(".tar")
provider.s3.upload_fileobj.assert_called_once()
def test_write_file_direct_plain(mocker):
"""Tests writing an unencrypted object directly."""
mocker.patch("app.providers.cloud.boto3")
provider = CloudStorageProvider({"bucket_name": "b", "obfuscate_filenames": False})
stream = io.BytesIO(b"file content")
provider.s3.upload_fileobj = MagicMock(return_value=None)
location = provider.write_file_direct("M1", "photos/image.jpg", stream)
assert location == "objects/photos/image.jpg"
def test_initialize_media_clears_and_tags(mocker):
"""Tests initialize_media clears existing objects and writes .tapehoard_id."""
mocker.patch("app.providers.cloud.boto3")
provider = CloudStorageProvider({"bucket_name": "b"})
provider.s3.head_bucket = MagicMock(return_value=None)
mock_paginator = MagicMock()
mock_paginator.paginate = MagicMock(
return_value=[{"Contents": [{"Key": "old1"}, {"Key": "old2"}]}]
)
provider.s3.get_paginator = MagicMock(return_value=mock_paginator)
provider.s3.delete_objects = MagicMock(return_value=None)
provider.s3.put_object = MagicMock(return_value=None)
result = provider.initialize_media("NEW_DISK")
assert result is True
provider.s3.delete_objects.assert_called_once()
provider.s3.put_object.assert_called_once()
call_kwargs = provider.s3.put_object.call_args[1]
assert call_kwargs["Key"] == ".tapehoard_id"
assert call_kwargs["Body"] == b"NEW_DISK"
def test_initialize_media_failure(mocker):
"""Tests initialize_media returns False on error."""
mocker.patch("app.providers.cloud.boto3")
provider = CloudStorageProvider({"bucket_name": "b"})
provider.s3.head_bucket = MagicMock(side_effect=Exception("no access"))
assert provider.initialize_media("X") is False
def test_prepare_for_write_match(mocker):
"""Tests prepare_for_write when media identifier matches."""
mocker.patch("app.providers.cloud.boto3")
provider = CloudStorageProvider({"bucket_name": "b"})
provider.s3.head_bucket = MagicMock(return_value=None)
provider.s3.get_object = MagicMock(side_effect=Exception("not found"))
# Fallback to bucket name
assert provider.prepare_for_write("b") is True
assert provider.prepare_for_write("wrong") is False
# ── Read Operations ──
def test_read_archive_plain(mocker):
"""Tests reading an unencrypted archive."""
mocker.patch("app.providers.cloud.boto3")
provider = CloudStorageProvider({"bucket_name": "b"})
provider.s3.get_object = MagicMock(
return_value={
"Body": io.BytesIO(b"raw archive data"),
"Metadata": {},
}
)
result = provider.read_archive("M1", "archives/1.tar")
assert result.read() == b"raw archive data"
def test_read_archive_encrypted(mocker, db_session):
"""Tests round-trip encryption/decryption for archives."""
from app.db import models
from Crypto.Cipher import AES
from Crypto.Protocol.KDF import PBKDF2
from Crypto.Hash import SHA256
# Seed passphrase in keystore
db_session.add(
models.SystemSetting(key="secrets", value='{"cloud-enc": "my-passphrase-123"}')
)
db_session.commit()
mocker.patch("app.providers.cloud.boto3")
provider = CloudStorageProvider(
{
"bucket_name": "b",
"encryption_secret_name": "cloud-enc",
}
)
# Encrypt data ourselves to simulate stored payload
original_data = b"secret archive content"
salt = os.urandom(16)
nonce = os.urandom(12)
key = PBKDF2(
"my-passphrase-123", salt, dkLen=32, count=100000, hmac_hash_module=SHA256
)
cipher = AES.new(key, AES.MODE_GCM, nonce=nonce)
ciphertext, tag = cipher.encrypt_and_digest(original_data)
payload = salt + nonce + tag + ciphertext
provider.s3.get_object = MagicMock(
return_value={
"Body": io.BytesIO(payload),
"Metadata": {"tapehoard-encrypted": "v2-gcm"},
}
)
result = provider.read_archive("M1", "archives/enc.tar")
assert result.read() == original_data
def test_read_archive_encrypted_tampered(mocker, db_session):
"""Tests that tampered encrypted archive raises ValueError."""
from app.db import models
db_session.add(
models.SystemSetting(key="secrets", value='{"cloud-enc": "my-passphrase-123"}')
)
db_session.commit()
mocker.patch("app.providers.cloud.boto3")
provider = CloudStorageProvider(
{
"bucket_name": "b",
"encryption_secret_name": "cloud-enc",
}
)
# Corrupt payload: valid structure but wrong ciphertext
fake_payload = os.urandom(16) + os.urandom(12) + os.urandom(16) + b"garbage"
provider.s3.get_object = MagicMock(
return_value={
"Body": io.BytesIO(fake_payload),
"Metadata": {"tapehoard-encrypted": "v2-gcm"},
}
)
with pytest.raises(ValueError, match="tampering detected"):
provider.read_archive("M1", "archives/bad.tar")
# ── Encryption Round-Trip ──
def test_write_and_read_archive_encrypted(mocker, db_session):
"""End-to-end test: write encrypted archive, read it back."""
from app.db import models
db_session.add(
models.SystemSetting(key="secrets", value='{"cloud-enc": "my-passphrase-123"}')
)
db_session.commit()
mocker.patch("app.providers.cloud.boto3")
provider = CloudStorageProvider(
{
"bucket_name": "b",
"encryption_secret_name": "cloud-enc",
"obfuscate_filenames": False,
}
)
# Capture the uploaded payload
uploaded = {}
def capture_put_object(**kwargs):
uploaded["key"] = kwargs["Key"]
uploaded["body"] = kwargs["Body"]
uploaded["metadata"] = kwargs.get("Metadata", {})
provider.s3.put_object = MagicMock(side_effect=capture_put_object)
original = b"round-trip test data"
location = provider.write_archive("M1", io.BytesIO(original))
# Verify upload happened with encryption metadata
assert uploaded["metadata"].get("x-amz-meta-tapehoard-encrypted") == "v2-gcm"
assert uploaded["metadata"].get("x-amz-meta-tapehoard-type") == "archive"
# Now read it back
provider.s3.get_object = MagicMock(
return_value={
"Body": io.BytesIO(uploaded["body"]),
"Metadata": {"tapehoard-encrypted": "v2-gcm"},
}
)
result = provider.read_archive("M1", location)
assert result.read() == original
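The round-trip above implies a fixed on-object layout for the `v2-gcm` payload: a 16-byte PBKDF2 salt, a 12-byte GCM nonce, a 16-byte authentication tag, then the ciphertext. Packing and splitting that layout can be sketched without any crypto dependency (offsets inferred from the test's `salt + nonce + tag + ciphertext` construction):

```python
import os

SALT_LEN, NONCE_LEN, TAG_LEN = 16, 12, 16

def pack_payload(salt: bytes, nonce: bytes, tag: bytes, ciphertext: bytes) -> bytes:
    # Header fields are fixed-width, so no length prefixes are needed.
    return salt + nonce + tag + ciphertext

def unpack_payload(payload: bytes):
    salt = payload[:SALT_LEN]
    nonce = payload[SALT_LEN:SALT_LEN + NONCE_LEN]
    tag = payload[SALT_LEN + NONCE_LEN:SALT_LEN + NONCE_LEN + TAG_LEN]
    ciphertext = payload[SALT_LEN + NONCE_LEN + TAG_LEN:]
    return salt, nonce, tag, ciphertext

salt, nonce, tag = os.urandom(16), os.urandom(12), os.urandom(16)
blob = pack_payload(salt, nonce, tag, b"ciphertext-bytes")
assert unpack_payload(blob) == (salt, nonce, tag, b"ciphertext-bytes")
```

Because GCM authenticates the ciphertext against the tag, any corruption in the last segment surfaces as the "tampering detected" ValueError the earlier test asserts.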
# ── Misc ──
def test_get_name(mocker):
"""Tests get_name returns provider type string."""
mocker.patch("app.providers.cloud.boto3")
provider = CloudStorageProvider({"bucket_name": "b", "provider": "Wasabi"})
assert provider.get_name() == "Cloud (Wasabi)"
def test_finalize_media(mocker):
"""Tests finalize_media is a no-op that logs."""
mocker.patch("app.providers.cloud.boto3")
provider = CloudStorageProvider({"bucket_name": "b"})
# Should not raise
provider.finalize_media("M1")
+23
@@ -150,8 +150,31 @@ def test_run_backup_mocked(db_session, mocker, tmp_path):
# Verify result
db_session.expire_all()
# Media usage updated
assert media.bytes_used > 0
# FileVersion recorded for the archived file
version = (
db_session.query(models.FileVersion)
.filter_by(filesystem_state_id=f1.id)
.first()
)
assert version is not None
assert version.media_id == media.id
assert version.offset_start == 0
assert version.offset_end == f1.size
# Backup job completed successfully
refreshed_job = db_session.get(models.Job, job.id)
assert refreshed_job.status == "COMPLETED"
assert refreshed_job.progress == 100.0
# Provider was asked to write the archive
mock_provider.write_archive.assert_called_once()
call_args = mock_provider.write_archive.call_args
assert call_args[0][0] == "DISK_001" # media identifier
def test_archiver_saturated_media_logic(db_session, mocker, tmp_path):
"""Verifies that media is marked full and priority ceded based on hardware feedback."""
+16 -64
@@ -1,10 +1,9 @@
import hashlib
from datetime import datetime, timezone
from app.services.scanner import (
ScannerService,
JobManager,
_hash_file_batch_fast,
_FAST_HASH_BINARY,
)
from app.db import models
@@ -115,9 +114,6 @@ def test_scan_sources_mocked(db_session, mocker):
"""Tests the discovery scan with mocked filesystem."""
scanner = ScannerService()
# Disable fast find so the test uses the os.walk fallback path
mocker.patch("app.services.scanner._FAST_FIND_BINARY", None)
# Mock settings
mocker.patch("app.api.common.get_source_roots", return_value=["/mock_source"])
mocker.patch("app.api.common.get_exclusion_spec", return_value=None)
@@ -143,52 +139,10 @@ def test_scan_sources_mocked(db_session, mocker):
assert record.size == 500
def test_hash_file_batch_fast(tmp_path):
"""Tests native sha256sum/shasum batch hashing if available."""
if _FAST_HASH_BINARY is None:
# Skip if no native hash binary is available
return
# Create test files
files = {}
for i in range(5):
content = f"test content {i}".encode()
f = tmp_path / f"file_{i}.txt"
f.write_bytes(content)
files[str(f)] = hashlib.sha256(content).hexdigest()
# Hash via native binary
results = _hash_file_batch_fast(list(files.keys()), _FAST_HASH_BINARY)
assert len(results) == 5
for path, expected_hash in files.items():
assert results[path] == expected_hash
def test_hash_file_batch_fast_empty():
"""Tests that empty batch returns empty results."""
if _FAST_HASH_BINARY is None:
return
results = _hash_file_batch_fast([], _FAST_HASH_BINARY)
assert results == {}
def test_hash_file_batch_fast_nonexistent():
"""Tests that non-existent files are gracefully handled."""
if _FAST_HASH_BINARY is None:
return
results = _hash_file_batch_fast(["/nonexistent/path"], _FAST_HASH_BINARY)
# Non-existent files may or may not appear in results depending on binary behavior
assert isinstance(results, dict)
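When no native binary is available, the same contract can be met with a pure-`hashlib` fallback; a sketch of the behavior the batch tests assert (returns a path-to-hex-digest dict and silently omits unreadable paths — the function name is illustrative):

```python
import hashlib

def hash_file_batch_slow(paths: list[str]) -> dict[str, str]:
    results: dict[str, str] = {}
    for path in paths:
        try:
            h = hashlib.sha256()
            with open(path, "rb") as f:
                # Stream in 1 MiB chunks so large files don't load into memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            results[path] = h.hexdigest()
        except OSError:
            # Vanished or unreadable files are simply left out of the results.
            continue
    return results

assert hash_file_batch_slow([]) == {}
assert hash_file_batch_slow(["/nonexistent/path"]) == {}
```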
def test_missing_file_marked_deleted_at_end_of_scan(db_session, mocker):
"""Tests that files not seen during a scan are marked as deleted."""
scanner = ScannerService()
mocker.patch("app.services.scanner._FAST_FIND_BINARY", None)
mocker.patch("app.api.common.get_source_roots", return_value=["/mock_source"])
mocker.patch("app.api.common.get_exclusion_spec", return_value=None)
mocker.patch("os.walk", return_value=[])
@@ -222,10 +176,7 @@ def test_missing_file_marked_deleted_at_end_of_scan(db_session, mocker):
def test_existing_file_not_marked_deleted(db_session, mocker):
"""Tests that files found during scan retain is_deleted=False."""
scanner = ScannerService()
print(f"DEBUG test_existing: scanner.is_running = {scanner.is_running}")
print(f"DEBUG test_existing: scanner.is_hashing = {scanner.is_hashing}")
mocker.patch("app.services.scanner._FAST_FIND_BINARY", None)
mocker.patch("app.api.common.get_source_roots", return_value=["/mock_source"])
mocker.patch("app.api.common.get_exclusion_spec", return_value=None)
mocker.patch("os.path.exists", return_value=True)
@@ -260,8 +211,6 @@ def test_missing_file_during_hashing_marked_deleted(db_session, mocker):
"""Tests that files missing during hashing are marked as deleted."""
scanner = ScannerService()
mocker.patch("app.services.scanner._FAST_HASH_BINARY", None)
f = models.FilesystemState(
file_path="/data/vanished.bin", size=10, mtime=1, is_ignored=False
)
@@ -276,8 +225,11 @@ def test_missing_file_during_hashing_marked_deleted(db_session, mocker):
assert f.is_deleted is True
def test_missing_file_skipped_in_hashing_query(db_session):
"""Tests that already-deleted files are excluded from hashing targets."""
def test_deleted_files_excluded_from_hashing(db_session):
"""Tests that run_hashing skips already-deleted files."""
scanner = ScannerService()
scanner.is_running = False # Causes run_hashing to exit when no targets found
deleted_file = models.FilesystemState(
file_path="/data/deleted.bin",
size=10,
@@ -289,13 +241,13 @@ def test_missing_file_skipped_in_hashing_query(db_session):
db_session.add(deleted_file)
db_session.commit()
pending = (
db_session.query(models.FilesystemState)
.filter(
models.FilesystemState.sha256_hash.is_(None),
models.FilesystemState.is_ignored.is_(False),
models.FilesystemState.is_deleted.is_(False),
)
.all()
)
assert len(pending) == 0
scanner.run_hashing()
# Deleted file should not have been processed (hash still None)
db_session.refresh(deleted_file)
assert deleted_file.sha256_hash is None
# A HASH job should have been created and completed (no work to do)
job = db_session.query(models.Job).filter_by(job_type="HASH").first()
assert job is not None
assert job.status == "COMPLETED"
+135
@@ -0,0 +1,135 @@
from app.services.scheduler import SchedulerService
from app.db import models
def test_scheduler_start_stop():
"""Tests scheduler lifecycle (start, stop, idempotent)."""
scheduler = SchedulerService()
assert not scheduler.scheduler.running
scheduler.start()
assert scheduler.scheduler.running
# Idempotent start
scheduler.start()
assert scheduler.scheduler.running
scheduler.stop()
assert not scheduler.scheduler.running
# Idempotent stop
scheduler.stop()
assert not scheduler.scheduler.running
def test_scheduler_load_schedules_empty():
"""Tests load_schedules with no cron settings configured."""
scheduler = SchedulerService()
scheduler.start()
scheduler.load_schedules()
# No jobs should be registered
assert scheduler.scheduler.get_job("system_scan") is None
assert scheduler.scheduler.get_job("system_archival") is None
scheduler.stop()
def test_scheduler_load_schedules_with_scan(db_session):
"""Tests load_schedules picks up a scan schedule from settings."""
db_session.add(models.SystemSetting(key="schedule_scan", value="0 2 * * *"))
db_session.commit()
scheduler = SchedulerService()
scheduler.start()
scheduler.load_schedules()
job = scheduler.scheduler.get_job("system_scan")
assert job is not None
assert job.id == "system_scan"
scheduler.stop()
def test_scheduler_add_remove_job():
"""Tests adding and removing scheduled jobs."""
scheduler = SchedulerService()
scheduler.start()
def dummy_job():
pass
scheduler.add_job("test_job", dummy_job, "0 0 * * *")
assert scheduler.scheduler.get_job("test_job") is not None
scheduler.remove_job("test_job")
assert scheduler.scheduler.get_job("test_job") is None
# Idempotent remove
scheduler.remove_job("test_job")
assert scheduler.scheduler.get_job("test_job") is None
scheduler.stop()
def test_scheduler_add_job_empty_cron():
"""Tests that empty/whitespace cron expression removes the job."""
scheduler = SchedulerService()
scheduler.start()
def dummy_job():
pass
scheduler.add_job("test_job", dummy_job, "0 0 * * *")
assert scheduler.scheduler.get_job("test_job") is not None
# Empty string should remove
scheduler.add_job("test_job", dummy_job, " ")
assert scheduler.scheduler.get_job("test_job") is None
scheduler.stop()
def test_scheduler_reload(db_session, mocker):
"""Tests reload calls load_schedules."""
db_session.add(models.SystemSetting(key="schedule_scan", value="0 3 * * *"))
db_session.commit()
scheduler = SchedulerService()
scheduler.start()
load_spy = mocker.spy(scheduler, "load_schedules")
scheduler.reload()
load_spy.assert_called_once()
job = scheduler.scheduler.get_job("system_scan")
assert job is not None
scheduler.stop()
def test_scheduler_run_system_scan_skips_when_running(mocker):
"""Tests run_system_scan is skipped when scanner_manager is already running."""
scheduler = SchedulerService()
mocker.patch("app.services.scheduler.scanner_manager.is_running", True)
scan_sources_spy = mocker.patch(
"app.services.scheduler.scanner_manager.scan_sources"
)
scheduler.run_system_scan()
scan_sources_spy.assert_not_called()
def test_scheduler_run_system_archival_no_online_media(db_session, mocker):
"""Tests run_system_archival skips when no active media is online."""
scheduler = SchedulerService()
# No media in DB
run_backup_spy = mocker.patch("app.services.scheduler.archiver_manager.run_backup")
scheduler.run_system_archival()
run_backup_spy.assert_not_called()
Binary file not shown. (126 KiB)
Binary file not shown. (83 KiB)
Binary file not shown. (129 KiB)
Binary file not shown. (91 KiB)
File diff suppressed because one or more lines are too long
+15 -8
@@ -2,7 +2,7 @@
import type { Client, Options as Options2, TDataShape } from './client';
import { client } from './client.gen';
import type { AddDirectoryToRestoreQueueData, AddDirectoryToRestoreQueueErrors, AddDirectoryToRestoreQueueResponses, AddFileToRestoreQueueData, AddFileToRestoreQueueErrors, AddFileToRestoreQueueResponses, ArchiveBrowseData, ArchiveBrowseErrors, ArchiveBrowseResponses, ArchiveMetadataData, ArchiveMetadataErrors, ArchiveMetadataResponses, ArchiveSearchData, ArchiveSearchErrors, ArchiveSearchResponses, ArchiveTreeData, ArchiveTreeErrors, ArchiveTreeResponses, BatchAddToRestoreQueueData, BatchAddToRestoreQueueErrors, BatchAddToRestoreQueueResponses, BatchConfirmDiscrepanciesData, BatchConfirmDiscrepanciesErrors, BatchConfirmDiscrepanciesResponses, BatchDeleteDiscrepanciesData, BatchDeleteDiscrepanciesErrors, BatchDeleteDiscrepanciesResponses, BatchDismissDiscrepanciesData, BatchDismissDiscrepanciesErrors, BatchDismissDiscrepanciesResponses, BatchResolveDiscrepanciesData, BatchResolveDiscrepanciesErrors, BatchResolveDiscrepanciesResponses, BatchTrackData, BatchTrackErrors, BatchTrackResponses, BrowseDiscrepanciesData, BrowseDiscrepanciesErrors, BrowseDiscrepanciesResponses, BrowseRestoreQueueData, BrowseRestoreQueueErrors, BrowseRestoreQueueResponses, CancelJobData, CancelJobErrors, CancelJobResponses, CheckHealthData, CheckHealthResponses, ClearRestoreQueueData, ClearRestoreQueueResponses, ConfirmDiscrepancyData, ConfirmDiscrepancyErrors, ConfirmDiscrepancyResponses, CreateMediaData, CreateMediaErrors, CreateMediaResponses, CreateSecretData, CreateSecretErrors, CreateSecretResponses, DeleteDiscrepancyData, DeleteDiscrepancyErrors, DeleteDiscrepancyResponses, DeleteMediaData, DeleteMediaErrors, DeleteMediaResponses, DeleteSecretData, DeleteSecretErrors, DeleteSecretResponses, DetectMediaData, DetectMediaResponses, DiscoverHardwareData, DiscoverHardwareResponses, DismissDiscrepancyData, DismissDiscrepancyErrors, DismissDiscrepancyResponses, DownloadExclusionReportData, DownloadExclusionReportErrors, DownloadExclusionReportResponses, ExportDatabaseData, 
ExportDatabaseResponses, FilesystemBrowseData, FilesystemBrowseErrors, FilesystemBrowseResponses, FilesystemSearchData, FilesystemSearchErrors, FilesystemSearchResponses, FilesystemTreeData, FilesystemTreeErrors, FilesystemTreeResponses, GetAnalyticsData, GetAnalyticsResponses, GetDashboardStatsData, GetDashboardStatsResponses, GetDiscrepancyTreeData, GetDiscrepancyTreeErrors, GetDiscrepancyTreeResponses, GetJobCountData, GetJobCountResponses, GetJobData, GetJobErrors, GetJobLogsData, GetJobLogsErrors, GetJobLogsResponses, GetJobResponses, GetJobStatsData, GetJobStatsResponses, GetRestoreManifestData, GetRestoreManifestResponses, GetRestoreQueueData, GetRestoreQueueResponses, GetRestoreQueueTreeData, GetRestoreQueueTreeErrors, GetRestoreQueueTreeResponses, GetScanStatusData, GetScanStatusResponses, GetSecretData, GetSecretErrors, GetSecretResponses, GetSettingsData, GetSettingsResponses, GetTreemapData, GetTreemapResponses, IgnoreHardwareData, IgnoreHardwareErrors, IgnoreHardwareResponses, ImportDatabaseData, ImportDatabaseErrors, ImportDatabaseResponses, InitializeMediaData, InitializeMediaErrors, InitializeMediaResponses, ListBackupsData, ListBackupsResponses, ListDirectoriesData, ListDirectoriesErrors, ListDirectoriesResponses, ListDiscrepanciesData, ListDiscrepanciesResponses, ListJobsData, ListJobsErrors, ListJobsResponses, ListMediaData, ListMediaErrors, ListMediaResponses, ListProvidersData, ListProvidersResponses, ListSecretsData, ListSecretsResponses, RemoveFromRestoreQueueData, RemoveFromRestoreQueueErrors, RemoveFromRestoreQueueResponses, ReorderMediaData, ReorderMediaErrors, ReorderMediaResponses, ResetTestEnvironmentData, ResetTestEnvironmentResponses, RetryJobData, RetryJobErrors, RetryJobResponses, StreamJobsData, StreamJobsResponses, TestExclusionsData, TestExclusionsErrors, TestExclusionsResponses, TestNotificationData, TestNotificationErrors, TestNotificationResponses, TriggerAutoBackupData, TriggerAutoBackupResponses, TriggerBackupData, 
TriggerBackupErrors, TriggerBackupResponses, TriggerIndexingData, TriggerIndexingResponses, TriggerRestoreData, TriggerRestoreErrors, TriggerRestoreResponses, TriggerScanData, TriggerScanResponses, UndoDismissDiscrepancyData, UndoDismissDiscrepancyErrors, UndoDismissDiscrepancyResponses, UpdateMediaData, UpdateMediaErrors, UpdateMediaResponses, UpdateSettingsData, UpdateSettingsErrors, UpdateSettingsResponses } from './types.gen';
import type { AddDirectoryToRestoreQueueData, AddDirectoryToRestoreQueueErrors, AddDirectoryToRestoreQueueResponses, AddFileToRestoreQueueData, AddFileToRestoreQueueErrors, AddFileToRestoreQueueResponses, ArchiveBrowseData, ArchiveBrowseErrors, ArchiveBrowseResponses, ArchiveMetadataData, ArchiveMetadataErrors, ArchiveMetadataResponses, ArchiveSearchData, ArchiveSearchErrors, ArchiveSearchResponses, ArchiveTreeData, ArchiveTreeErrors, ArchiveTreeResponses, BatchAddToRestoreQueueData, BatchAddToRestoreQueueErrors, BatchAddToRestoreQueueResponses, BatchConfirmDiscrepanciesData, BatchConfirmDiscrepanciesErrors, BatchConfirmDiscrepanciesResponses, BatchDeleteDiscrepanciesData, BatchDeleteDiscrepanciesErrors, BatchDeleteDiscrepanciesResponses, BatchDismissDiscrepanciesData, BatchDismissDiscrepanciesErrors, BatchDismissDiscrepanciesResponses, BatchResolveDiscrepanciesData, BatchResolveDiscrepanciesErrors, BatchResolveDiscrepanciesResponses, BatchTrackData, BatchTrackErrors, BatchTrackResponses, BrowseDiscrepanciesData, BrowseDiscrepanciesErrors, BrowseDiscrepanciesResponses, BrowseRestoreQueueData, BrowseRestoreQueueErrors, BrowseRestoreQueueResponses, CancelJobData, CancelJobErrors, CancelJobResponses, CheckHealthData, CheckHealthResponses, ClearRestoreQueueData, ClearRestoreQueueResponses, ConfirmDiscrepancyData, ConfirmDiscrepancyErrors, ConfirmDiscrepancyResponses, CreateMediaData, CreateMediaErrors, CreateMediaResponses, CreateSecretData, CreateSecretErrors, CreateSecretResponses, DeleteDiscrepancyData, DeleteDiscrepancyErrors, DeleteDiscrepancyResponses, DeleteMediaData, DeleteMediaErrors, DeleteMediaResponses, DeleteSecretData, DeleteSecretErrors, DeleteSecretResponses, DetectMediaData, DetectMediaResponses, DiscoverHardwareData, DiscoverHardwareResponses, DismissDiscrepancyData, DismissDiscrepancyErrors, DismissDiscrepancyResponses, DownloadExclusionReportData, DownloadExclusionReportErrors, DownloadExclusionReportResponses, ExportDatabaseData, 
ExportDatabaseResponses, FilesystemBrowseData, FilesystemBrowseErrors, FilesystemBrowseResponses, FilesystemSearchData, FilesystemSearchErrors, FilesystemSearchResponses, FilesystemTreeData, FilesystemTreeErrors, FilesystemTreeResponses, GetAnalyticsData, GetAnalyticsResponses, GetDashboardStatsData, GetDashboardStatsResponses, GetDiscrepancyTreeData, GetDiscrepancyTreeErrors, GetDiscrepancyTreeResponses, GetJobCountData, GetJobCountResponses, GetJobData, GetJobErrors, GetJobLogsData, GetJobLogsErrors, GetJobLogsResponses, GetJobResponses, GetJobStatsData, GetJobStatsResponses, GetRestoreManifestData, GetRestoreManifestResponses, GetRestoreQueueData, GetRestoreQueueResponses, GetRestoreQueueTreeData, GetRestoreQueueTreeErrors, GetRestoreQueueTreeResponses, GetScanStatusData, GetScanStatusResponses, GetSecretData, GetSecretErrors, GetSecretResponses, GetSettingsData, GetSettingsResponses, GetStagingInfoData, GetStagingInfoResponses, GetTreemapData, GetTreemapResponses, IgnoreHardwareData, IgnoreHardwareErrors, IgnoreHardwareResponses, ImportDatabaseData, ImportDatabaseErrors, ImportDatabaseResponses, InitializeMediaData, InitializeMediaErrors, InitializeMediaResponses, ListBackupsData, ListBackupsResponses, ListDirectoriesData, ListDirectoriesErrors, ListDirectoriesResponses, ListDiscrepanciesData, ListDiscrepanciesResponses, ListJobsData, ListJobsErrors, ListJobsResponses, ListMediaData, ListMediaErrors, ListMediaResponses, ListProvidersData, ListProvidersResponses, ListSecretsData, ListSecretsResponses, RemoveFromRestoreQueueData, RemoveFromRestoreQueueErrors, RemoveFromRestoreQueueResponses, ReorderMediaData, ReorderMediaErrors, ReorderMediaResponses, ResetTestEnvironmentData, ResetTestEnvironmentResponses, RetryJobData, RetryJobErrors, RetryJobResponses, StreamJobsData, StreamJobsResponses, TestExclusionsData, TestExclusionsErrors, TestExclusionsResponses, TestNotificationData, TestNotificationErrors, TestNotificationResponses, TriggerAutoBackupData, 
TriggerAutoBackupResponses, TriggerBackupData, TriggerBackupErrors, TriggerBackupResponses, TriggerIndexingData, TriggerIndexingResponses, TriggerRestoreData, TriggerRestoreErrors, TriggerRestoreResponses, TriggerScanData, TriggerScanResponses, UndoDismissDiscrepancyData, UndoDismissDiscrepancyErrors, UndoDismissDiscrepancyResponses, UpdateMediaData, UpdateMediaErrors, UpdateMediaResponses, UpdateSettingsData, UpdateSettingsErrors, UpdateSettingsResponses } from './types.gen';
export type Options<TData extends TDataShape = TDataShape, ThrowOnError extends boolean = boolean, TResponse = unknown> = Options2<TData, ThrowOnError, TResponse> & {
/**
@@ -32,6 +32,13 @@ export const resetTestEnvironment = <ThrowOnError extends boolean = false>(optio
*/
export const getDashboardStats = <ThrowOnError extends boolean = false>(options?: Options<GetDashboardStatsData, ThrowOnError>) => (options?.client ?? client).get<GetDashboardStatsResponses, unknown, ThrowOnError>({ url: '/system/dashboard/stats', ...options });
/**
* Get Staging Info
*
* Returns disk usage information for the backup staging directory.
*/
export const getStagingInfo = <ThrowOnError extends boolean = false>(options?: Options<GetStagingInfoData, ThrowOnError>) => (options?.client ?? client).get<GetStagingInfoResponses, unknown, ThrowOnError>({ url: '/system/staging/info', ...options });
/**
* List Jobs
*
@@ -53,6 +60,13 @@ export const getJobCount = <ThrowOnError extends boolean = false>(options?: Opti
*/
export const getJobStats = <ThrowOnError extends boolean = false>(options?: Options<GetJobStatsData, ThrowOnError>) => (options?.client ?? client).get<GetJobStatsResponses, unknown, ThrowOnError>({ url: '/system/jobs/stats', ...options });
/**
* Stream Jobs
*
* Server-Sent Events (SSE) endpoint for real-time job status updates.
*/
export const streamJobs = <ThrowOnError extends boolean = false>(options?: Options<StreamJobsData, ThrowOnError>) => (options?.client ?? client).get<StreamJobsResponses, unknown, ThrowOnError>({ url: '/system/jobs/stream', ...options });
/**
* Get Job
*
@@ -81,13 +95,6 @@ export const cancelJob = <ThrowOnError extends boolean = false>(options: Options
*/
export const retryJob = <ThrowOnError extends boolean = false>(options: Options<RetryJobData, ThrowOnError>) => (options.client ?? client).post<RetryJobResponses, RetryJobErrors, ThrowOnError>({ url: '/system/jobs/{job_id}/retry', ...options });
/**
* Stream Jobs
*
* Server-Sent Events (SSE) endpoint for real-time job status updates.
*/
export const streamJobs = <ThrowOnError extends boolean = false>(options?: Options<StreamJobsData, ThrowOnError>) => (options?.client ?? client).get<StreamJobsResponses, unknown, ThrowOnError>({ url: '/system/jobs/stream', ...options });
/**
* Trigger Scan
*
+52 -14
@@ -1106,6 +1106,28 @@ export type SettingSchema = {
value: string;
};
/**
* StagingInfoSchema
*/
export type StagingInfoSchema = {
/**
* Path
*/
path: string;
/**
* Total Bytes
*/
total_bytes: number;
/**
* Used Bytes
*/
used_bytes: number;
/**
* Free Bytes
*/
free_bytes: number;
};
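On the backend, a payload matching `StagingInfoSchema` can be produced with `shutil.disk_usage`; a Python sketch (the endpoint's actual implementation is not shown in this diff):

```python
import shutil

def staging_info(path: str) -> dict:
    # shutil.disk_usage returns a named tuple of (total, used, free) in bytes.
    usage = shutil.disk_usage(path)
    return {
        "path": path,
        "total_bytes": usage.total,
        "used_bytes": usage.used,
        "free_bytes": usage.free,
    }

info = staging_info(".")
assert 0 <= info["free_bytes"] <= info["total_bytes"]
```

Exposing free space this way is what lets the frontend warn before a backup job would overflow the staging area, per the "check staging area has enough capacity" commit.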
/**
* StorageProviderSchema
*/
@@ -1338,6 +1360,22 @@ export type GetDashboardStatsResponses = {
export type GetDashboardStatsResponse = GetDashboardStatsResponses[keyof GetDashboardStatsResponses];
export type GetStagingInfoData = {
body?: never;
path?: never;
query?: never;
url: '/system/staging/info';
};
export type GetStagingInfoResponses = {
/**
* Successful Response
*/
200: StagingInfoSchema;
};
export type GetStagingInfoResponse = GetStagingInfoResponses[keyof GetStagingInfoResponses];
export type ListJobsData = {
body?: never;
path?: never;
@@ -1402,6 +1440,20 @@ export type GetJobStatsResponses = {
200: unknown;
};
export type StreamJobsData = {
body?: never;
path?: never;
query?: never;
url: '/system/jobs/stream';
};
export type StreamJobsResponses = {
/**
* Successful Response
*/
200: unknown;
};
export type GetJobData = {
body?: never;
path: {
@@ -1520,20 +1572,6 @@ export type RetryJobResponses = {
200: unknown;
};
export type StreamJobsData = {
body?: never;
path?: never;
query?: never;
url: '/system/jobs/stream';
};
export type StreamJobsResponses = {
/**
* Successful Response
*/
200: unknown;
};
export type TriggerScanData = {
body?: never;
path?: never;
+51 -78
@@ -52,8 +52,10 @@
ignoreHardware,
listProviders,
listSecrets,
getStagingInfo,
type MediaSchema,
type StorageProviderSchema
type StorageProviderSchema,
type StagingInfoSchema
} from '$lib/api';
import { LTO_CAPACITY, PROVIDER_TEMPLATES, type LtoTapeCreateData, type OfflineHddCreateData, type CloudCreateData } from '$lib/types';
import { dndzone } from 'svelte-dnd-action';
@@ -67,6 +69,7 @@
let loading = $state(true);
let showRegisterDialog = $state(false);
let editingMedia = $state<MediaSchema | null>(null);
let stagingInfo = $state<StagingInfoSchema | null>(null);
let activeMedia = $derived(mediaList.filter(m => m.status === 'active'));
let fullMedia = $derived(mediaList.filter(m => m.status === 'full'));
@@ -243,6 +246,38 @@
}
}
async function pollHardware() {
try {
const res = await discoverHardware();
if (!res.data) return;
const prevPaths = new Set(discoveredAssets.map(a => a.device_path));
let hasNew = false;
discoveredAssets = (res.data as any[]).map(newAsset => {
if (!prevPaths.has(newAsset.device_path)) {
hasNew = true;
}
const oldAsset = discoveredAssets.find(a => a.device_path === newAsset.device_path);
if (oldAsset && oldAsset.hardware_info && newAsset.hardware_info) {
if (Object.keys(newAsset.hardware_info.tape || {}).length === 0 && Object.keys(oldAsset.hardware_info.tape || {}).length > 0) {
newAsset.hardware_info.tape = oldAsset.hardware_info.tape;
}
if (Object.keys(newAsset.hardware_info.drive || {}).length === 0 && Object.keys(oldAsset.hardware_info.drive || {}).length > 0) {
newAsset.hardware_info.drive = oldAsset.hardware_info.drive;
}
}
return newAsset;
});
if (hasNew) {
loadMedia(true, true);
}
} catch (error) {
console.error("Hardware discovery failed:", error);
}
}
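The stale-field handling inside `pollHardware` amounts to a pure merge rule: keep the previously seen `tape`/`drive` details when a fresh poll comes back empty. That rule can be factored as a standalone helper (a sketch; the `hardware_info` shape is inferred from this diff):

```typescript
type HardwareInfo = {
  tape?: Record<string, unknown>;
  drive?: Record<string, unknown>;
};

// Prefer the fresh value, but fall back to the previous poll's value when the
// new one is missing or empty -- the same preservation logic as pollHardware.
function keepIfEmpty(
  fresh: Record<string, unknown> | undefined,
  prev: Record<string, unknown> | undefined
): Record<string, unknown> | undefined {
  const freshEmpty = Object.keys(fresh ?? {}).length === 0;
  const prevFilled = Object.keys(prev ?? {}).length > 0;
  return freshEmpty && prevFilled ? prev : fresh;
}

function mergeHardwareInfo(fresh: HardwareInfo, prev: HardwareInfo): HardwareInfo {
  return {
    tape: keepIfEmpty(fresh.tape, prev.tape),
    drive: keepIfEmpty(fresh.drive, prev.drive)
  };
}
```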
let prevOnlineCount = $state(0);
$effect(() => {
@@ -263,10 +298,20 @@
}
}
async function loadStagingInfo() {
try {
const res = await getStagingInfo();
if (res.data) stagingInfo = res.data;
} catch (error) {
console.error("Failed to load staging info:", error);
}
}
onMount(async () => {
// Initial load (non-silent and forced refresh to show live hardware status immediately)
loadMedia(false, true);
loadSecrets();
loadStagingInfo();
try {
const res = await listProviders();
@@ -275,7 +320,10 @@
console.error("Failed to load storage providers:", error);
}
pollInterval = setInterval(() => loadMedia(true), POLL_SLOW);
pollInterval = setInterval(() => {
pollHardware();
loadStagingInfo();
}, POLL_SLOW);
});
onDestroy(() => {
@@ -1095,7 +1143,7 @@
<label class="text-xs font-medium text-text-secondary ml-1" for="identifier">
{newMedia.media_type === 'lto_tape' ? 'Barcode' : newMedia.media_type === 'local_hdd' ? 'Identifier / Serial' : 'Friendly Name'}
</label>
<Input id="identifier" bind:value={newMedia.identifier} placeholder={newMedia.media_type === 'lto_tape' ? 'BUP-00001' : newMedia.media_type === 'local_hdd' ? 'Samsung-T7-001' : 'AWS-Production'} class="h-10 bg-bg-primary/50 border-border-color font-mono text-sm" />
<Input id="identifier" bind:value={newMedia.identifier} placeholder={newMedia.media_type === 'lto_tape' ? 'TAPE01' : newMedia.media_type === 'local_hdd' ? 'Samsung-T7-001' : 'AWS-Production'} class="h-10 bg-bg-primary/50 border-border-color font-mono text-sm" />
</div>
{#if newMedia.media_type === 'lto_tape'}
@@ -1232,28 +1280,10 @@
<h3 class="text-xs font-semibold text-text-secondary uppercase tracking-wider">Configuration</h3>
{#if newMedia.media_type === 'lto_tape'}
<div class="grid grid-cols-2 gap-4">
<div class="flex items-center gap-3 h-10 px-1">
<input id="compression" type="checkbox" bind:checked={newMedia.compression} class="w-4 h-4 rounded border-border-color bg-bg-primary text-blue-600 focus:ring-blue-500/20" />
<label class="text-xs font-medium text-text-secondary cursor-pointer" for="compression">Hardware Compression</label>
</div>
<div class="flex items-center gap-3 h-10 px-1">
<input id="worm" type="checkbox" bind:checked={newMedia.worm} class="w-4 h-4 rounded border-border-color bg-bg-primary text-blue-600 focus:ring-blue-500/20" />
<label class="text-xs font-medium text-text-secondary cursor-pointer" for="worm">WORM (Write Once Read Many)</label>
</div>
<div class="flex items-center gap-3 h-10 px-1">
<input id="write_protected" type="checkbox" bind:checked={newMedia.write_protected} class="w-4 h-4 rounded border-border-color bg-bg-primary text-blue-600 focus:ring-blue-500/20" />
<label class="text-xs font-medium text-text-secondary cursor-pointer" for="write_protected">Write Protected (Physical)</label>
</div>
<div class="flex items-center gap-3 h-10 px-1">
<input id="cleaning_cartridge" type="checkbox" bind:checked={newMedia.cleaning_cartridge} class="w-4 h-4 rounded border-border-color bg-bg-primary text-blue-600 focus:ring-blue-500/20" />
<label class="text-xs font-medium text-text-secondary cursor-pointer" for="cleaning_cartridge">Cleaning Cartridge</label>
</div>
</div>
<div class="space-y-2">
<label class="text-xs font-medium text-text-secondary ml-1" for="encryption_key_id">Encryption Key ID</label>
<Input id="encryption_key_id" bind:value={newMedia.encryption_key_id} placeholder="Key reference in system keystore" class="h-10 bg-bg-primary/50 border-border-color font-mono text-sm" />
</div>
<div class="space-y-2">
<label class="text-xs font-medium text-text-secondary ml-1" for="lto-encryption_secret_name">Encryption Secret</label>
<div class="relative">
@@ -1268,49 +1298,10 @@
<p class="text-[10px] text-text-secondary leading-tight opacity-60">Manage secrets in <a href="/settings" class="text-blue-500 hover:underline">Settings</a>.</p>
</div>
{:else if newMedia.media_type === 'local_hdd'}
<div class="grid grid-cols-2 gap-4">
<div class="flex items-center gap-3 h-10 px-1">
<input id="is_ssd" type="checkbox" bind:checked={newMedia.is_ssd} class="w-4 h-4 rounded border-border-color bg-bg-primary text-blue-600 focus:ring-blue-500/20" />
<label class="text-xs font-medium text-text-secondary cursor-pointer" for="is_ssd">SSD (Solid State Drive)</label>
</div>
<div class="flex items-center gap-3 h-10 px-1">
<input id="encrypted" type="checkbox" bind:checked={newMedia.encrypted} class="w-4 h-4 rounded border-border-color bg-bg-primary text-blue-600 focus:ring-blue-500/20" />
<label class="text-xs font-medium text-text-secondary cursor-pointer" for="encrypted">Drive Encrypted (BitLocker/LUKS)</label>
</div>
</div>
<div class="grid grid-cols-2 gap-6">
<div class="space-y-2">
<label class="text-xs font-medium text-text-secondary ml-1" for="filesystem_type">Filesystem Type</label>
<div class="relative">
<select id="filesystem_type" bind:value={newMedia.filesystem_type} class="w-full h-10 bg-bg-primary border border-border-color rounded-xl px-4 pr-10 text-sm font-medium text-text-primary outline-none focus:ring-2 focus:ring-blue-500/20 transition-all appearance-none cursor-pointer">
<option value="">Select...</option>
<option value="ext4">ext4</option>
<option value="NTFS">NTFS</option>
<option value="APFS">APFS</option>
<option value="exFAT">exFAT</option>
</select>
<ChevronDown size={16} class="absolute right-3 top-1/2 -translate-y-1/2 text-text-secondary pointer-events-none" />
</div>
</div>
<div class="space-y-2">
<label class="text-xs font-medium text-text-secondary ml-1" for="connection_interface">Connection Interface</label>
<div class="relative">
<select id="connection_interface" bind:value={newMedia.connection_interface} class="w-full h-10 bg-bg-primary border border-border-color rounded-xl px-4 pr-10 text-sm font-medium text-text-primary outline-none focus:ring-2 focus:ring-blue-500/20 transition-all appearance-none cursor-pointer">
<option value="">Select...</option>
<option value="USB-A">USB-A</option>
<option value="USB-C">USB-C</option>
<option value="Thunderbolt">Thunderbolt</option>
<option value="SATA">SATA</option>
<option value="NVMe">NVMe</option>
</select>
<ChevronDown size={16} class="absolute right-3 top-1/2 -translate-y-1/2 text-text-secondary pointer-events-none" />
</div>
</div>
</div>
<div class="space-y-2">
<label class="text-xs font-medium text-text-secondary ml-1" for="hdd_encryption_key_id">Encryption Key ID</label>
<Input id="hdd_encryption_key_id" bind:value={newMedia.hdd_encryption_key_id} placeholder="Key reference in system keystore" class="h-10 bg-bg-primary/50 border-border-color font-mono text-sm" />
</div>
<div class="space-y-2">
<label class="text-xs font-medium text-text-secondary ml-1" for="hdd-encryption_secret_name">Encryption Secret</label>
<div class="relative">
@@ -1435,24 +1426,10 @@
{#if editingMedia.media_type === 'lto_tape'}
<div class="space-y-4">
<h3 class="text-xs font-semibold text-text-secondary uppercase tracking-wider">LTO Configuration</h3>
<div class="grid grid-cols-2 gap-4">
<div class="flex items-center gap-3 h-10 px-1">
<input id="edit-compression" type="checkbox" bind:checked={editingMedia.compression} class="w-4 h-4 rounded border-border-color bg-bg-primary text-blue-600 focus:ring-blue-500/20" />
<label class="text-xs font-medium text-text-secondary cursor-pointer" for="edit-compression">Hardware Compression</label>
</div>
<div class="flex items-center gap-3 h-10 px-1">
<input id="edit-worm" type="checkbox" bind:checked={editingMedia.worm} class="w-4 h-4 rounded border-border-color bg-bg-primary text-blue-600 focus:ring-blue-500/20" />
<label class="text-xs font-medium text-text-secondary cursor-pointer" for="edit-worm">WORM</label>
</div>
<div class="flex items-center gap-3 h-10 px-1">
<input id="edit-write_protected" type="checkbox" bind:checked={editingMedia.write_protected} class="w-4 h-4 rounded border-border-color bg-bg-primary text-blue-600 focus:ring-blue-500/20" />
<label class="text-xs font-medium text-text-secondary cursor-pointer" for="edit-write_protected">Write Protected</label>
</div>
<div class="flex items-center gap-3 h-10 px-1">
<input id="edit-cleaning_cartridge" type="checkbox" bind:checked={editingMedia.cleaning_cartridge} class="w-4 h-4 rounded border-border-color bg-bg-primary text-blue-600 focus:ring-blue-500/20" />
<label class="text-xs font-medium text-text-secondary cursor-pointer" for="edit-cleaning_cartridge">Cleaning Cartridge</label>
</div>
</div>
<div class="space-y-2">
<label class="text-xs font-medium text-text-secondary ml-1" for="edit-lto-encryption_secret_name">Encryption Secret</label>
<div class="relative">
@@ -1482,10 +1459,6 @@
<input id="edit-is_ssd" type="checkbox" bind:checked={editingMedia.is_ssd} class="w-4 h-4 rounded border-border-color bg-bg-primary text-blue-600 focus:ring-blue-500/20" />
<label class="text-xs font-medium text-text-secondary cursor-pointer" for="edit-is_ssd">SSD</label>
</div>
<div class="flex items-center gap-3 h-10 px-1">
<input id="edit-encrypted" type="checkbox" bind:checked={editingMedia.encrypted} class="w-4 h-4 rounded border-border-color bg-bg-primary text-blue-600 focus:ring-blue-500/20" />
<label class="text-xs font-medium text-text-secondary cursor-pointer" for="edit-encrypted">Encrypted</label>
</div>
</div>
<div class="space-y-2">
<label class="text-xs font-medium text-text-secondary ml-1" for="edit-hdd-encryption_secret_name">Encryption Secret</label>
+30 -5
@@ -20,7 +20,8 @@
Upload,
Terminal,
Globe,
Key
Key,
ChevronDown
} from "lucide-svelte";
import { Button } from "$lib/components/ui/button";
import PageHeader from "$lib/components/ui/PageHeader.svelte";
@@ -51,6 +52,7 @@
let scanSchedule = $state("");
let archivalSchedule = $state("");
let notificationUrls = $state<string[]>([]);
let ioniceLevel = $state("idle");
// Secrets keystore
let secretsList = $state<string[]>([]);
@@ -66,7 +68,8 @@
globalExclusions,
scanSchedule,
archivalSchedule,
notificationUrls
notificationUrls,
ioniceLevel
}));
beforeNavigate((navigation: any) => {
@@ -155,6 +158,7 @@
if (data.schedule_scan) scanSchedule = data.schedule_scan;
if (data.schedule_archival) archivalSchedule = data.schedule_archival;
if (data.notification_urls) notificationUrls = JSON.parse(data.notification_urls);
if (data.ionice_level) ioniceLevel = data.ionice_level;
}
// Load secrets
@@ -169,7 +173,8 @@
globalExclusions,
scanSchedule,
archivalSchedule,
notificationUrls
notificationUrls,
ioniceLevel
});
} catch (error) {
toast.error("Failed to load system configuration");
@@ -188,7 +193,8 @@
updateSettings({ body: { key: "global_exclusions", value: globalExclusions } }),
updateSettings({ body: { key: "schedule_scan", value: scanSchedule } }),
updateSettings({ body: { key: "schedule_archival", value: archivalSchedule } }),
updateSettings({ body: { key: "notification_urls", value: JSON.stringify(notificationUrls) } })
updateSettings({ body: { key: "notification_urls", value: JSON.stringify(notificationUrls) } }),
updateSettings({ body: { key: "ionice_level", value: ioniceLevel } })
]);
// Snapshot saved state
@@ -199,7 +205,8 @@
globalExclusions,
scanSchedule,
archivalSchedule,
notificationUrls
notificationUrls,
ioniceLevel
});
toast.success("System configuration committed");
@@ -648,6 +655,24 @@
{:else if activeTab === 'system'}
<div class="animate-in slide-in-from-bottom-4 duration-500 space-y-6">
<Card class="p-5 shadow-xl">
<SectionHeader title="I/O scheduling" icon={Cpu} class="mb-6 px-0" />
<div class="space-y-4">
<div class="space-y-2">
<label class="text-xs font-medium text-text-secondary ml-1" for="ionice-level">Background job I/O priority</label>
<div class="relative">
<select id="ionice-level" bind:value={ioniceLevel} class="w-full h-10 bg-bg-primary border border-border-color rounded-xl px-4 pr-10 text-sm font-medium text-text-primary outline-none focus:ring-2 focus:ring-blue-500/20 transition-all appearance-none cursor-pointer">
<option value="idle">Idle (only use I/O when system is free)</option>
<option value="best-effort">Best-effort (normal scheduling)</option>
<option value="realtime">Real-time (highest priority, requires root)</option>
</select>
<ChevronDown size={16} class="absolute right-3 top-1/2 -translate-y-1/2 text-text-secondary pointer-events-none" />
</div>
<p class="text-[10px] text-text-secondary leading-tight opacity-60">Applies to scan and backup jobs. Idle is recommended for production systems.</p>
</div>
</div>
</Card>
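The three `ionice_level` values map directly onto Linux I/O scheduling classes if the backend shells out to ionice(1). A hedged sketch of that mapping (the backend's actual invocation is not shown in this diff; class numbers follow util-linux: 1 = realtime, 2 = best-effort, 3 = idle):

```typescript
// Map the stored setting value to ionice(1) class arguments.
// The best-effort fallback for unknown values is an assumption.
function ioniceArgs(level: string): string[] {
  switch (level) {
    case 'idle':
      return ['-c', '3'];
    case 'best-effort':
      return ['-c', '2'];
    case 'realtime':
      return ['-c', '1'];
    default:
      return ['-c', '2']; // fall back to best-effort
  }
}
```

Note that the realtime class (`-c 1`) requires root, matching the hint in the select option above.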
<Card class="p-5 shadow-xl">
<SectionHeader title="Index management" icon={Database} class="mb-6 px-0" />
<div class="grid grid-cols-2 gap-4">