343 Commits

Author SHA1 Message Date
Nick Sweeting 0e0759a680 Skip binary probes outside data dirs 2026-06-21 05:42:26 -07:00
Nick Sweeting 59a956bf9f Show dynamic DB binaries in version output 2026-06-21 05:24:15 -07:00
Nick Sweeting 9dbeece35b fix version command exit status 2026-06-21 02:41:18 -07:00
Nick Sweeting d259ac8095 fix machine network interface identity 2026-06-21 02:29:53 -07:00
Nick Sweeting 00e818018a release: v0.9.35rc46 2026-06-21 01:44:30 -07:00
Nick Sweeting 8b57085827 chore: commit local archivebox changes 2026-06-14 11:44:38 -07:00
Nick Sweeting 6f321797c9 fix: publish dev build with updated package deps 2026-06-14 09:34:32 -07:00
Nick Sweeting 88a92b6548 fix: keep version informational without installed binaries 2026-06-14 09:28:41 -07:00
Nick Sweeting 0ddda66ee3 release: archivebox 0.9.35rc35 2026-06-14 08:55:48 -07:00
Nick Sweeting 1dbde4776f release: archivebox 0.9.35rc27 2026-06-13 23:12:36 -07:00
Nick Sweeting e547abbf27 Align direct URL Crawl flow with historical depth=0 convention
archivebox add and other entry points now seed Crawl.urls as
CrawlSeed JSONL at depth=0 (the input layer) with max_depth=depth
for direct URLs and depth+1 only for stdin/import text where the
synthetic archivebox://internal root lives at depth=0. The runner
also accepts one plain URL per line for ORM/crawl-create/schedule
callers so every Crawl row goes through the same expansion path
without scattering CrawlSeed knowledge across the codebase.

Tests updated to match restored convention.
2026-06-13 18:04:45 -07:00
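
The depth convention in the commit above can be illustrated with a minimal sketch. It is not the project's actual code: the function names `seed_rows_for_direct_urls`, `max_depth_for`, and `iter_seeds` are hypothetical, and only the `{type: CrawlSeed, url, depth}` row shape, the depth=0 input layer, and the "one plain URL per line" fallback are taken from the commit messages.

```python
import json
from typing import Iterator


def seed_rows_for_direct_urls(urls: list[str]) -> str:
    """Serialize direct URLs as CrawlSeed JSONL rows at depth=0 (hypothetical helper)."""
    return "\n".join(
        json.dumps({"type": "CrawlSeed", "url": url, "depth": 0}) for url in urls
    )


def max_depth_for(depth: int, is_import_text: bool) -> int:
    """Direct URLs use max_depth=depth; stdin/import text uses depth+1,
    because the synthetic root occupies depth=0 in that case."""
    return depth + 1 if is_import_text else depth


def iter_seeds(urls_text: str) -> Iterator[dict]:
    """Yield seeds from text holding CrawlSeed JSONL rows or one plain URL per line."""
    for line in urls_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("{"):
            row = json.loads(line)
            # Assumption: JSON rows of any other type are simply skipped here.
            if row.get("type") == "CrawlSeed":
                yield {"url": row["url"], "depth": int(row.get("depth", 0))}
            continue
        # Plain URL line, as an ORM/crawl-create/schedule caller might write it.
        yield {"url": line, "depth": 0}
```

With this shape, `max_depth_for(1, False)` gives 1 for a direct URL crawl and `max_depth_for(1, True)` gives 2 for pasted import text, so both reach the same number of link hops beyond the user's actual input.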
Nick Sweeting c9e63ccffd Centralize synthetic root snapshot creation in CrawlRunner
Direct URL inputs from CLI/UI/API now seed Crawl.urls as explicit
{type:CrawlSeed,url,depth} JSONL rows; raw stdin/UI/API import text
stays verbatim. The runner's create_initial_snapshots() is now the
single place that either expands seed rows or creates the synthetic
archivebox://internal root + staticfile/stdin.txt, so add paths no
longer perform DB/FS side effects and the parser hooks run through
the same Snapshot lifecycle as every other extractor.
2026-06-13 17:10:06 -07:00
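
A rough sketch of the decision this commit describes for `create_initial_snapshots()`, written as a pure planning function so it runs without a database. `plan_initial_snapshots`, `PlannedSnapshot`, and `is_seed_row` are hypothetical names, and the rule "every non-empty line is a CrawlSeed row" is an assumption about how seed input is told apart from raw import text. The `staticfile/stdin.txt` side effect is omitted.

```python
import json
from dataclasses import dataclass

SYNTHETIC_ROOT_URL = "archivebox://internal"  # root URL named in the commit message


@dataclass
class PlannedSnapshot:
    url: str
    depth: int


def is_seed_row(line: str) -> bool:
    """True if the line is a JSON object shaped like {type: CrawlSeed, url, ...}."""
    try:
        row = json.loads(line)
    except json.JSONDecodeError:
        return False
    return isinstance(row, dict) and row.get("type") == "CrawlSeed" and "url" in row


def plan_initial_snapshots(crawl_urls: str) -> list[PlannedSnapshot]:
    """Either expand explicit seed rows, or fall back to a single synthetic root."""
    lines = [line.strip() for line in crawl_urls.splitlines() if line.strip()]
    if lines and all(is_seed_row(line) for line in lines):
        rows = [json.loads(line) for line in lines]
        return [PlannedSnapshot(row["url"], int(row.get("depth", 0))) for row in rows]
    # Raw stdin/UI/API import text: one synthetic root at depth=0, whose parser
    # hooks would later discover the real URLs inside the text.
    return [PlannedSnapshot(SYNTHETIC_ROOT_URL, 0)]
```

Keeping this choice in one function is what lets the add paths stay free of DB/FS side effects: callers only write text into `Crawl.urls`, and the runner decides how to materialize it.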
Nick Sweeting 6a635d3cd6 Fix UI add direct URL runner flow 2026-06-11 09:28:28 -07:00
Nick Sweeting 14d43c88f5 Repair stale binaries in version output 2026-06-11 07:52:05 -07:00
Nick Sweeting b6921d5e03 Restore direct URL add snapshots 2026-06-11 06:21:46 -07:00
Nick Sweeting f6c98b67d7 Limit stale binary checks to version output 2026-06-11 01:32:30 -07:00
Nick Sweeting 98aecfaf91 release: archivebox 0.9.35rc8 2026-06-09 23:12:22 -07:00
Nick Sweeting 2bfb3ad4eb release: archivebox 0.9.35rc5 2026-06-09 22:09:28 -07:00
Nick Sweeting a1449c2822 Use plugin URL patterns for source imports 2026-06-08 23:40:25 -07:00
Nick Sweeting 5adec53b4c Expand add flow runtime handling 2026-06-08 23:27:08 -07:00
Nick Sweeting 4fa90e484a release: archivebox 0.9.34rc71 2026-06-07 20:51:28 -07:00
Nick Sweeting 2659f20dc4 fix runner takeover for scoped snapshot workers 2026-06-07 11:41:17 -07:00
Nick Sweeting 1ba5281343 release: archivebox 0.9.34rc68 2026-06-07 04:19:40 -07:00
Nick Sweeting 87b518314a release: archivebox 0.9.34rc67 2026-06-07 04:06:45 -07:00
Nick Sweeting 1b19736b2f Recover interrupted hook work by hook identity 2026-06-05 03:25:38 -07:00
Nick Sweeting 4e4ee8cdb0 fix runner stdin and update maintenance lifecycle 2026-06-04 22:38:22 -07:00
Nick Sweeting 73587a1a4d use archivebox plugin discovery for extraction queues 2026-06-04 21:57:33 -07:00
Nick Sweeting 7f8af6357d fix index and binary runner lifecycle 2026-06-04 21:40:58 -07:00
Nick Sweeting c0fb8eb532 release: archivebox 0.9.34rc39 2026-06-03 17:19:48 -07:00
Nick Sweeting 83a2099851 fix: scope update search backfill runner 2026-06-02 21:54:25 -07:00
Nick Sweeting 3669133a05 fix: allow install to initialize collections 2026-06-02 21:25:20 -07:00
Nick Sweeting 9f0544857c test: require success in cli workflows 2026-06-02 21:19:21 -07:00
Nick Sweeting e4ec848da8 fix: keep background add crawls runnable 2026-06-02 18:53:59 -07:00
Nick Sweeting 39eac65ed0 Stabilize frozen config CLI test flows
(cherry picked from commit 2bca869e32)
2026-06-02 18:44:57 -07:00
Nick Sweeting b46d142cc6 test cleanup 2026-06-02 12:13:47 -07:00
Nick Sweeting 96437e1ffd Publish local ArchiveBox changes 2026-06-02 02:25:52 -07:00
Nick Sweeting 7dd738b5b7 release: archivebox 0.9.34rc37 2026-06-01 21:44:23 -07:00
Nick Sweeting ac6e018672 release: archivebox 0.9.34rc34 2026-06-01 19:23:06 -07:00
Nick Sweeting 065fcfc0ba Preserve queued index jobs during reindex 2026-06-01 15:29:38 -07:00
Nick Sweeting c075d654d8 Consolidate runtime config handling 2026-06-01 15:03:40 -07:00
Nick Sweeting 453d998e7d fix: schedule background admin crawls 2026-06-01 10:44:21 -07:00
Nick Sweeting 72a67bd511 Project abxpkg binary events 2026-06-01 02:02:42 -07:00
Nick Sweeting cab05eb1c6 Refactor plugins search progress and config flows 2026-06-01 00:08:27 -07:00
Nick Sweeting 9bcba41b58 release: archivebox 0.9.33rc58 2026-05-31 04:38:45 -07:00
Nick Sweeting 83d5161b3e release: v0.9.33rc51 2026-05-31 01:14:40 -07:00
Nick Sweeting 5a38193f56 release: archivebox 0.9.33rc50 2026-05-30 22:27:28 -07:00
Nick Sweeting 6ce2555dfd fix: rename utils.py → util.py across modules, fix add --index-only, misc cleanups
Renames (no functional change, just consistency with the rest of the codebase):
- cli/cli_utils.py → cli/cli_util.py
- core/host_utils.py → core/host_util.py
- core/tag_utils.py → core/tag_util.py
- crawls/schedule_utils.py → crawls/schedule_util.py
- machine/env_utils.py → machine/env_util.py

Functional fixes:
- archivebox add --index-only now materializes Snapshot rows synchronously
  via crawl.create_snapshots_from_urls() instead of just queueing the Crawl
  and leaving the index empty. The previous behavior broke every test that
  expected --index-only to populate the index, since the runner is never
  started in index-only mode.
- config/collection.py: add _coerce_from_str_dict as the inverse of
  _coerce_to_str_dict so JSON-encoded INI values are decoded back to native
  dict/list types when mirrored into Machine.config (a JSONField). Without
  this, downstream consumers like MachineEvent / abx-dl get raw JSON
  strings where they expect dicts.

Plus matching admin / middleware / model touch-ups, the registration
password_change_form template, and assorted small cleanups the user
worked through while validating the deploy path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-30 14:30:33 -07:00
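
The `_coerce_from_str_dict` fix can be pictured with the following sketch. The function bodies are assumptions and not the code in `config/collection.py`. They only show the round trip the commit describes: dict/list values are JSON-encoded into strings for INI storage, then decoded back to native types before being mirrored into a JSONField.

```python
import json


def coerce_to_str_dict(config: dict) -> dict[str, str]:
    """Flatten values to strings; dicts and lists are JSON-encoded."""
    out = {}
    for key, value in config.items():
        if isinstance(value, (dict, list)):
            out[key] = json.dumps(value)
        else:
            out[key] = str(value)
    return out


def coerce_from_str_dict(config: dict[str, str]) -> dict:
    """Inverse for containers: decode JSON-looking strings back to dict/list."""
    out = {}
    for key, value in config.items():
        decoded = value
        stripped = value.strip()
        # Only attempt decoding for values that look like a JSON object or array.
        if stripped[:1] in ("{", "["):
            try:
                decoded = json.loads(stripped)
            except json.JSONDecodeError:
                decoded = value  # not valid JSON after all: keep the raw string
        out[key] = decoded
    return out
```

In this sketch scalar values stay strings on the way back, so it inverts only the dict/list encoding, which is the case the commit message calls out.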
Nick Sweeting 383e4b5c6e release: archivebox 0.9.33rc45 2026-05-30 05:42:53 -07:00
Nick Sweeting b0a47e8bf5 wip: snapshot live progress, universal --init, runner perms, supervisord SIGINT
- Snapshot detail page: embed scoped live-progress monitor (same-origin
  /progress.json on whichever host the page is served from); hide admin
  action buttons when scoped; per-snapshot perms via can_view_snapshot.
- crawl_file API: respect crawl-level permissions; PUBLIC/UNLISTED served
  to guests, PRIVATE returns 404 for non-admin/non-owner.
- CrawlRunner: replace allow_paused_snapshot_maintenance with
  allow_maintenance_on_inactive_crawl so SEALED crawls don't short-circuit
  the cancellation guard for legitimate maintenance hooks (search backend
  backfill, fs migration, etc.). Fixes infinite STARTED loop on snapshots
  with queued search_backend results.
- Universal `--init` flag: works on any subcommand (server, update, add,
  shell, install, ...). Detected at module load, stripped from argv, and
  consumed in the dispatcher so subprocesses inherit a clean env.
- supervisord_util.run_runner_worker: route Ctrl+C through
  supervisor.signalProcess(name, "SIGINT") instead of raw os.kill on a
  cached pid, gated on statename=RUNNING. Prevents killing unrelated
  processes when the worker's pid has been reused by the OS.
- Login page: remove non-functional password-reset links; add
  has_real_admin_users template tag to gate the bootstrap hint.
- Add page: hide underline on the "Get the extension" link.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-30 04:45:15 -07:00
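
The supervisord change in this commit can be sketched as below. `interrupt_worker` is a hypothetical helper, not the body of `supervisord_util.run_runner_worker`. It assumes `supervisor` is the `supervisor` namespace of an XML-RPC proxy exposing `getProcessInfo` and `signalProcess`, and a small fake stands in for a live supervisord so the example runs on its own.

```python
def interrupt_worker(supervisor, name: str) -> bool:
    """Send SIGINT through supervisord, but only while the process is RUNNING.

    Asking supervisord by process name avoids os.kill() on a cached pid,
    which the OS may have reassigned to an unrelated process.
    """
    info = supervisor.getProcessInfo(name)
    if info.get("statename") != "RUNNING":
        return False
    supervisor.signalProcess(name, "SIGINT")
    return True


class FakeSupervisor:
    """Stand-in for the XML-RPC namespace, for demonstration only."""

    def __init__(self, statename: str):
        self.statename = statename
        self.signals = []  # records (process name, signal) pairs that were sent

    def getProcessInfo(self, name: str) -> dict:
        return {"name": name, "statename": self.statename}

    def signalProcess(self, name: str, signal: str) -> bool:
        self.signals.append((name, signal))
        return True
```

Against a real daemon the first argument would be something like `xmlrpc.client.ServerProxy(...).supervisor`. The connection details are left out here because they depend on the local supervisord configuration.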
Nick Sweeting 2d2b8ff047 release: archivebox 0.9.33rc39 2026-05-29 03:53:41 -07:00