TapeHoard - Developer & AI Assistant Guide

This document (GEMINI.md) contains critical, contextual information about the TapeHoard project. It takes absolute precedence over generic workflows. Always refer to the architecture constraints in PLAN.md before implementing new features.

1. Tooling & Ecosystem

Backend (Python)

  • Package Manager: uv. Never use pip directly. Use uv add <pkg> and uv sync to manage dependencies.
  • Framework: FastAPI.
  • Database: SQLite via SQLAlchemy ORM. Migrations are strictly managed by alembic.
    • To generate migrations: cd backend && uv run alembic revision --autogenerate -m "message"
    • To apply migrations: cd backend && uv run alembic upgrade head
  • Logging: loguru. Do not use standard logging or print statements.
  • Type Safety: ty. All Python code must be fully type-hinted and pass uv run ty without errors.
  • Configuration: pydantic-settings. Define environment variables and constants in a settings schema.

Frontend (Svelte 5 / SvelteKit)

  • Framework: Svelte 5 Runes (using $props(), $state(), etc.).
  • Styling: Tailwind CSS. All new components must use Tailwind utility classes.
  • Component Library: Custom library based on shadcn-svelte and bits-ui. Use existing components in src/lib/components/ui/ or add new ones following the shadcn pattern.
  • Package Manager: npm.
  • API Client Generation: @hey-api/openapi-ts. Never hand-write fetch calls or API response types. With the backend running, run just generate-client to regenerate the strictly typed TypeScript client from the FastAPI OpenAPI spec.
  • Icons: lucide-svelte.
  • Notifications: svelte-sonner.

Global Task Runner

  • just: Use the justfile in the root directory for executing common tasks.
    • just dev: Starts both backend and frontend servers.
    • just lint: Runs Ruff, ty, and Svelte Check.
    • just format: Auto-formats code with Ruff.

2. Code Quality & Pre-commit

  • PEP 8 Compliance: All Python code must strictly adhere to PEP 8 standards. Use explicit, idiomatic language features.
  • Descriptive Naming: Always use very descriptive variable and function names. Avoid abbreviations (e.g., use file_state instead of fs) to maintain high readability.
  • Pre-commit: All code must pass pre-commit hooks (ruff, ruff-format, etc.).
  • Validation: Fulfill the user's request thoroughly, including adding tests when adding features or fixing bugs. You must empirically reproduce failures with new test cases before applying fixes.

3. Core Architectural Rules

Storage Providers & Media Lifecycle

  • Plugin Architecture: All storage destinations are treated as plugins implementing AbstractStorageProvider. Avoid hardcoding hardware logic (tape, hdd) in the API or UI.
  • Dynamic UI: The frontend dynamically renders registration and edit forms based on a provider's config_schema (fetched from GET /inventory/providers).
  • Standardized Telemetry: Providers must implement get_live_info(force: bool) to return unified telemetry (e.g., drive status, capacity).
  • Sanitization: Initializing media performs a full purge of existing TapeHoard data if the force flag is set.
  • Hardware Failure: Marking media as "Failed" triggers an automatic atomic purge of all associated file_versions to surface those files as "Pending" on the dashboard.
  • Tape Registration is Discovery-Only: Tape media (lto_tape, mock_lto) cannot be registered through the manual "Register media" dialog. Tapes are only registered via the hardware discovery section (/inventory → "Discovered unregistered drives") where the system auto-captures device_path, barcode, and serial number from the connected drive's MAM. The device_path is excluded from LTOProvider.config_schema because it is a per-drive setting configured globally in tape_drives, not a per-media attribute. The archiver resolves the drive at runtime when instantiating the provider.
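
The plugin contract above can be sketched as an abstract base class. Only `config_schema` and `get_live_info(force)` are named in this document; the `MockHDDProvider` subclass and its schema fields are purely illustrative, not the codebase's actual API:

```python
from abc import ABC, abstractmethod
from typing import Any


class AbstractStorageProvider(ABC):
    """Sketch of the provider plugin contract described above."""

    # Machine-readable form schema the frontend renders dynamically
    # (served via GET /inventory/providers).
    @property
    @abstractmethod
    def config_schema(self) -> dict[str, Any]: ...

    # Unified telemetry (drive status, capacity); force bypasses throttling.
    @abstractmethod
    def get_live_info(self, force: bool = False) -> dict[str, Any]: ...


class MockHDDProvider(AbstractStorageProvider):
    """Hypothetical random-access provider, used here only as an example."""

    @property
    def config_schema(self) -> dict[str, Any]:
        return {"fields": [{"name": "mount_path", "type": "string", "required": True}]}

    def get_live_info(self, force: bool = False) -> dict[str, Any]:
        return {"status": "idle", "capacity_bytes": 4_000_000_000_000}
```

Because the API and UI only ever see this interface, adding a new destination (cloud, NAS) means adding a subclass, not touching hardware branches in routers or components.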

Database & Performance

  • High Concurrency: SQLite must always run in WAL (Write-Ahead Logging) mode with a 30-second busy timeout and an enlarged page cache.
  • Archival Intent: is_ignored in filesystem_state is the single source of truth. The scanner indexes all files but lazily marks excluded ones as is_ignored = 1. Explicit user tracking policies override global exclusions.
  • Aggregate Intelligence: Use Raw SQL Aggregates for dashboard stats and directory protection status to avoid N+1 query patterns.
  • FTS5 Search: Full-text search is managed via triggers. Ensure searches filter for has_version = 1 when browsing the Archive Index, regardless of current is_ignored state on disk.
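
The WAL/timeout requirement boils down to three PRAGMAs per connection. A minimal stdlib sketch follows; in the real app the same statements would run in a SQLAlchemy `connect` event listener, and the cache size shown is an illustrative value, not taken from the codebase:

```python
import sqlite3


def configure_connection(conn: sqlite3.Connection) -> None:
    """Apply the concurrency PRAGMAs described above on every new connection."""
    conn.execute("PRAGMA journal_mode=WAL")    # concurrent readers alongside one writer
    conn.execute("PRAGMA busy_timeout=30000")  # 30 seconds, in milliseconds
    conn.execute("PRAGMA cache_size=-65536")   # negative = KiB, i.e. a 64 MiB page cache
```

Note that WAL mode only takes effect on file-backed databases, which is one more reason the test suite uses file-based SQLite rather than `:memory:`.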

Scanning & Hashing Architecture

  • Concurrent Phasing: Scanning is decoupled into a SCAN phase (metadata, normal priority) and a HASH phase (content, idle priority with dynamic iowait throttling).
  • Thread-Safe Metrics: All counters (files processed, bytes hashed) must be protected by a threading.Lock.
  • Hashing Progress: Hashing jobs calculate progress against a dynamically updating snapshot of total sha256_hash IS NULL AND is_ignored = 0 files.
  • Streaming Subprocess I/O: Both _discover_files_fast (find -printf) and _hash_file_batch_fast (sha256sum/shasum) use subprocess.Popen with line-by-line readline streaming — never subprocess.run(capture_output=True). This enables incremental progress updates as each file is discovered or hashed.
  • Streaming Callback Pattern: The hashing sub-batch workers accept an on_result(file_path, hex_digest) callback (created via _make_hash_callback) that assigns hashes to DB records and reports job progress with throughput every 5 files, providing responsive UI updates during large batches.
  • Partial Batch Results: sha256sum/shasum may return non-zero exit codes when some files in a batch are missing. Output is always parsed regardless of returncode to capture partial results.
  • Missing File Guard: Files that cannot be hashed (deleted or inaccessible) are detected via os.path.exists fallback and marked is_deleted = True to prevent infinite re-query loops in the hashing worker.
  • Provider Temp Dir Lifecycle: MockLTOProvider auto-creates temp dirs when no device_path is configured. These are tracked in a module-level _auto_temp_dirs set and cleaned up via atexit on server shutdown. The device_path is persisted to StorageMedia.extra_config on /initialize so background threads can locate the correct directory.
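
The streaming-subprocess and partial-results rules combine into one pattern. The sketch below assumes GNU coreutils sha256sum on PATH and a bare `on_result` callback; the real `_hash_file_batch_fast` / `_make_hash_callback` helpers additionally report throughput and handle the shasum fallback:

```python
import subprocess
from collections.abc import Callable


def hash_batch_streaming(
    paths: list[str],
    on_result: Callable[[str, str], None],
) -> None:
    """Stream sha256sum output line by line (never capture_output=True),
    parsing every line regardless of exit code so missing files in a batch
    don't discard the digests of the files that did hash."""
    proc = subprocess.Popen(
        ["sha256sum", *paths],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        text=True,
    )
    assert proc.stdout is not None
    for line in proc.stdout:  # incremental: one line arrives per finished file
        digest, _, path = line.rstrip("\n").partition("  ")
        if digest and path:
            on_result(path, digest)
    proc.wait()  # non-zero exit is expected when some files are missing
```

Calling this with a batch containing a deleted file still yields digests for every file that exists, which is exactly the "Partial Batch Results" behavior described above.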

Archival & Recovery

  • Format Negotiation: The Archiver adapts formats based on provider capabilities (supports_random_access).
    • Sequential (Tape): Uses .tar streams to maintain drive streaming.
    • Random Access (HDD/Cloud): Uses native direct file copying/objects to enable instant seekless restores without unpacking gigabytes of data.
  • High-Speed Hybrid Archival:
    • The system prioritizes the system tar binary for whole-file chunks, delivering a 10x-20x performance boost over pure Python and ensuring optimal buffer saturation for LTO drives.
    • It transparently falls back to the Python RangeFile logic only for chunks containing split fragments, maintaining bit-perfect alignment for multi-tape files.
  • Industrial Tar Chunking:
    • Large backup sets are automatically split into multiple independent archives. The system dynamically aims for at least 100 archives per tape (calculated based on generational capacity, e.g., ~15GB for LTO-5) to provide high seek granularity during restoration.
    • Exception: Single large files are allowed to occupy their own archives even if they exceed the target chunk size, preventing unnecessary fragmentation while keeping them as independent, seekable objects.
  • Refined Splitting Philosophy:
    • Files are only split if they are physically larger than the media's entire capacity (multi-tape spanning).
    • Skip-and-Defer: If a file is larger than the remaining space on a tape but smaller than its total capacity, it is deferred to the next fresh medium to minimize fragmentation.
  • Hardware-First Utilization: The system trusts Physical Hardware Feedback (MAM) over logical byte counts. Tapes are only marked as "Full" when the drive reporting (via get_utilization) confirms saturation, maximizing utilization when hardware compression is active.
  • Bitstream Integrity: RangeFile must guarantee exact byte counts for tar alignment.
  • Metadata Fidelity: The restorer must preserve original permissions (chmod), timestamps (utime), and ownership (chown) when recovering files natively or via tar.
  • Independence: Force all tar archive members to be Regular Files to break fragile hard-link dependencies. Symlinks are preserved as SYMTYPE (or .symlink stub objects for native format).
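
The exact-byte-count contract this document assigns to RangeFile can be illustrated with a minimal file-like wrapper. This is a sketch of the idea only (the class in the codebase certainly carries more), showing just how `read()` is clamped so a tar member's size never drifts from its header:

```python
import io


class RangeFile(io.RawIOBase):
    """Expose bytes [start, start + length) of an underlying file as a
    file-like object that yields exactly `length` bytes, no more and no
    fewer, keeping tar member sizes aligned across split fragments."""

    def __init__(self, path: str, start: int, length: int) -> None:
        self._fh = open(path, "rb")
        self._fh.seek(start)
        self._remaining = length

    def read(self, size: int = -1) -> bytes:
        if self._remaining <= 0:
            return b""  # hard stop at the range boundary
        if size < 0 or size > self._remaining:
            size = self._remaining  # clamp reads to the bytes left in range
        chunk = self._fh.read(size)
        self._remaining -= len(chunk)
        return chunk

    def close(self) -> None:
        self._fh.close()
        super().close()
```

Handing one of these to `tarfile.addfile` per fragment is what lets a file span tapes while each fragment remains a well-formed tar member.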

Deployment & Testing

  • Temporal Standard: Backend uses UTC. Frontend uses parseUTCDate to convert to browser Local Time.
  • Unsaved Changes Guard: UI must use beforeNavigate and beforeunload listeners to warn users if they leave the Settings or Media registration forms with uncommitted changes.
  • Backend Testing: Use Alembic-driven file-based SQLite for tests to ensure 100% schema fidelity (including FTS5 and triggers) and reliable cross-thread data visibility. Atomic truncation must occur between tests. Run just pytest to execute backend tests.
  • End-to-End (E2E) Testing: Playwright is used for E2E testing (frontend/tests/).
    • Mock Hardware: To simulate LTO drives in CI, the backend supports a TAPEHOARD_TEST_MODE=true flag. This registers a MockLTOProvider that uses local directories instead of physical SCSI devices.
    • Running E2E: Use just e2e-server to start the mock backend (on port 8001), and then just playwright to execute the Playwright test suite against it.
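
The "atomic truncation between tests" step can be sketched with stdlib sqlite3. The real fixture is Alembic-driven and must also skip FTS5 shadow tables and reset AUTOINCREMENT counters; this sketch keeps only the core idea of one transaction wiping every user table:

```python
import sqlite3


def truncate_all_tables(conn: sqlite3.Connection) -> None:
    """Delete every row from every user table in a single transaction,
    so a failing test can't leak rows into the next one."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type='table' AND name NOT LIKE 'sqlite_%'"
        )
    ]
    with conn:  # one atomic transaction for all deletes
        for table in tables:
            conn.execute(f'DELETE FROM "{table}"')
```

Running this in a fixture's teardown keeps the file-based test database schema-complete (migrations, triggers, FTS5) while guaranteeing data isolation between tests.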

UI & UX Philosophy

  • Direct Terminology: Use technical terms like "Backup Manager", "System Status", "Archive Index". Avoid marketing fluff.
  • Layout: Natural page scrolling only. No sticky headers.
  • Navigation: The FileBrowser must maintain internal back/forward history separate from browser page navigation.
  • Refined Industrial Design Paradigm:
    • Scale: Standard root font size is 16px.
    • Typography: Transition from aggressive all-caps and heavy weights to Sentence case and font-medium for general UI text. Reserve font-bold for primary headers and high-impact dashboard metrics.
    • Modular Components: Use standardized layout components to maintain visual consistency:
      • PageHeader: Centralized logic for page titles, descriptions, and action buttons.
      • SectionHeader: Standardized "Industrial" divider (Icon + Title + Gradient Line).
      • StatCard: Modular metric tiles with consistent scaling and alignment for big numbers.
      • ProgressBar: Unified utilization and task indicators with industrial glow effects.
      • StatusBadge: Centralized state indicators (Success, Error, Warning, Neutral, Blue) with consistent padding.
      • Dialog: Standardized modal/dialog system with backdrop blurring and consistent ARIA roles.
      • EmptyState: Unified visual pattern for empty views with consistent icons and typography.
      • IconButton: Standardized boilerplate for icon-only buttons with fixed SVG scaling and consistent sizes.
      • Card: Unified p-5 padding, rounded-xl borders, and shadow-xl for all content containers.
      • Button: Standardized high-density h-9 px-4 sizing (or h-11 for primary CTAs) with font-medium sentence-case labels.
    • High Density: Maintain maximum information density without sacrificing legibility by utilizing high-density typography classes (text-4xs to text-6xs) for metadata and technical labels.
    • Color Strategy: Use low-opacity backgrounds (e.g., bg-blue-500/10) and subtle borders (border-blue-500/20) for interactive elements and badges to preserve the "professional terminal" aesthetic.

API & Type Safety

  • Explicit Response Models: All FastAPI endpoints MUST explicitly declare a response_model. This is critical for generating accurate OpenAPI specs and strictly typed TypeScript SDKs for the frontend.
  • Centralized Schemas: Define shared Pydantic models in app.api.schemas to avoid circular dependencies when importing across different routers.

Hardware Polling & Stability

  • Non-Intrusive Polling: Hardware status checks must prioritize non-intrusive methods (e.g., reading MAM via sg_read_attr). Intrusive operations (mt rewind) are strictly fallbacks. Always verify device path existence (os.path.exists) before issuing SCSI/CLI commands to prevent log spam on disconnected drives.
  • Last Known Good (LKG) Caching: Implement LKG caching in both backend hardware providers and frontend UI state. If a status poll fails or returns empty because a device is temporarily busy with an archival job, preserve and return the LKG state to prevent UI flickering.
  • Forced Refreshes: Hardware polling defaults to throttled (e.g., 2 seconds) intervals. Use force=True on provider calls and ?refresh=true on API endpoints to bypass throttling when the user explicitly requests a live update or upon initial page loads.

Frontend Reactivity

  • Svelte 5 State: When mutating complex data structures like Map or Set in Svelte 5 $state, always explicitly reassign the variable (e.g., myMap = new Map(myMap)) after mutation to trigger the reactivity engine.

4. Pending Feature Implementations

  • Media Pools & Sets: Transition from targeting individual media to targeting logical MediaPool entities. Archiver logic should resolve a pool to its active appendable member. Requires a new DB model and UI management.
  • Location & Custody Tracking: Implement a formalized check-in/out ledger (MediaCustodyLog) for physical offline media.
  • Barcode & Label Generation: Add a feature using reportlab or weasyprint to generate printable Avery-format PDF sheets containing Code 39 barcodes for tapes and QR codes for HDDs.
  • Lifecycle Policies: Implement background tasks in scheduler.py to flag expired data for pruning based on user-defined retention rules. Add physical wear alerts to the dashboard based on tape load_count and lifetime_mib_written.