125 Commits

Author SHA1 Message Date
Nick Sweeting 3fa4fd5d18 Fix runtime hook config regressions 2026-06-21 06:09:04 -07:00
Nick Sweeting c9e63ccffd Centralize synthetic root snapshot creation in CrawlRunner
Direct URL inputs from CLI/UI/API now seed Crawl.urls as explicit
{type:CrawlSeed,url,depth} JSONL rows; raw stdin/UI/API import text
stays verbatim. The runner's create_initial_snapshots() is now the
single place that either expands seed rows or creates the synthetic
archivebox://internal root + staticfile/stdin.txt, so add paths no
longer perform DB/FS side effects and the parser hooks run through
the same Snapshot lifecycle as every other extractor.
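
A minimal sketch of the seed-row format described above, assuming one JSON object per line with exactly the three keys named in the message. The helper names `seed_rows` and `parse_seed_rows` are hypothetical and are not taken from the ArchiveBox codebase.

```python
import json


def seed_rows(urls, depth=0):
    """Serialize direct URL inputs as one CrawlSeed JSON object per line."""
    return "\n".join(
        json.dumps({"type": "CrawlSeed", "url": url, "depth": depth})
        for url in urls
    )


def parse_seed_rows(text):
    """Return the CrawlSeed rows found in text, ignoring every other line."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # raw import text stays verbatim and is not a seed row
        try:
            row = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(row, dict) and row.get("type") == "CrawlSeed":
            rows.append(row)
    return rows
```

If the parser finds no rows, the input would be treated as raw import text, which is the case where the message says the runner creates the synthetic root instead.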
2026-06-13 17:10:06 -07:00
Nick Sweeting 6e0f2a41a3 Fix add input routing and plugin dependencies 2026-06-11 07:06:41 -07:00
Nick Sweeting b46d142cc6 test cleanup 2026-06-02 12:13:47 -07:00
Nick Sweeting 96437e1ffd Publish local ArchiveBox changes 2026-06-02 02:25:52 -07:00
Nick Sweeting c075d654d8 Consolidate runtime config handling 2026-06-01 15:03:40 -07:00
Nick Sweeting cab05eb1c6 Refactor plugins search progress and config flows 2026-06-01 00:08:27 -07:00
Nick Sweeting 6ce2555dfd fix: rename utils.py → util.py across modules, fix add --index-only, misc cleanups
Renames (no functional change, just consistency with the rest of the codebase):
- cli/cli_utils.py → cli/cli_util.py
- core/host_utils.py → core/host_util.py
- core/tag_utils.py → core/tag_util.py
- crawls/schedule_utils.py → crawls/schedule_util.py
- machine/env_utils.py → machine/env_util.py

Functional fixes:
- archivebox add --index-only now materializes Snapshot rows synchronously
  via crawl.create_snapshots_from_urls() instead of just queueing the Crawl
  and leaving the index empty. The previous behavior broke every test that
  expected --index-only to populate the index, since the runner is never
  started in index-only mode.
- config/collection.py: add _coerce_from_str_dict as the inverse of
  _coerce_to_str_dict so JSON-encoded INI values are decoded back to native
  dict/list types when mirrored into Machine.config (a JSONField). Without
  this, downstream consumers like MachineEvent / abx-dl get raw JSON
  strings where they expect dicts.
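
The round trip described in the last bullet could look roughly like this. It is a sketch only: the function bodies are assumptions, the names drop the leading underscore to mark them as stand-ins, and only dict/list values are decoded, so scalars come back as strings.

```python
import json


def coerce_to_str_dict(config):
    """Encode dict/list values as JSON strings so every value is a str."""
    return {
        key: json.dumps(value) if isinstance(value, (dict, list)) else str(value)
        for key, value in config.items()
    }


def coerce_from_str_dict(config):
    """Decode JSON-encoded dict/list strings back to native types."""
    decoded = {}
    for key, value in config.items():
        if isinstance(value, str) and value.strip()[:1] in ("{", "["):
            try:
                value = json.loads(value)
            except json.JSONDecodeError:
                pass  # looked like JSON but was not; keep the raw string
        decoded[key] = value
    return decoded
```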

Plus matching admin / middleware / model touch-ups, the registration
password_change_form template, and assorted small cleanups the user
worked through while validating the deploy path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-30 14:30:33 -07:00
Nick Sweeting 383e4b5c6e release: archivebox 0.9.33rc45 2026-05-30 05:42:53 -07:00
Nick Sweeting b0a47e8bf5 wip: snapshot live progress, universal --init, runner perms, supervisord SIGINT
- Snapshot detail page: embed scoped live-progress monitor (same-origin
  /progress.json on whichever host the page is served from); hide admin
  action buttons when scoped; per-snapshot perms via can_view_snapshot.
- crawl_file API: respect crawl-level permissions; PUBLIC/UNLISTED served
  to guests, PRIVATE returns 404 for non-admin/non-owner.
- CrawlRunner: replace allow_paused_snapshot_maintenance with
  allow_maintenance_on_inactive_crawl so SEALED crawls don't short-circuit
  the cancellation guard for legitimate maintenance hooks (search backend
  backfill, fs migration, etc.). Fixes infinite STARTED loop on snapshots
  with queued search_backend results.
- Universal `--init` flag: works on any subcommand (server, update, add,
  shell, install, ...). Detected at module load, stripped from argv, and
  consumed in the dispatcher so subprocesses inherit a clean env.
- supervisord_util.run_runner_worker: route Ctrl+C through
  supervisor.signalProcess(name, "SIGINT") instead of raw os.kill on a
  cached pid, gated on statename=RUNNING. Prevents killing unrelated
  processes when the worker's pid has been reused by the OS.
- Login page: remove non-functional password-reset links; add
  has_real_admin_users template tag to gate the bootstrap hint.
- Add page: hide underline on the "Get the extension" link.
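
As an illustration of the universal `--init` bullet, one way to detect the flag and strip it before dispatch is sketched below. The function name `pop_init_flag` is hypothetical, and the sketch ignores edge cases such as arguments following a literal `--`.

```python
def pop_init_flag(argv):
    """Return (init_requested, argv_without_flag) without mutating argv."""
    remaining = [arg for arg in argv if arg != "--init"]
    return len(remaining) != len(argv), remaining
```

A dispatcher would call this once at startup, run the init step when the first element is true, and pass only the cleaned list on to the subcommand.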

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-30 04:45:15 -07:00
Nick Sweeting c750084c3c release: archivebox 0.9.33rc37 2026-05-29 02:50:04 -07:00
Nick Sweeting 881a858383 release: archivebox 0.9.33rc35 2026-05-28 20:23:01 -07:00
Nick Sweeting 031e956080 release: archivebox 0.9.33rc29 2026-05-28 15:35:33 -07:00
Nick Sweeting b1fa3de5a5 release: archivebox 0.9.33rc13 2026-05-28 06:27:57 -07:00
Nick Sweeting fbcc972441 backup: save in-progress dev changes 2026-05-28 05:20:52 -07:00
Nick Sweeting 09f7c8bba0 release: archivebox 0.9.32rc32 2026-05-27 14:50:49 -07:00
Nick Sweeting 89a3119f2a release: v0.9.32rc24 2026-05-27 12:40:49 -07:00
Nick Sweeting 69a6bb360d release: archivebox 0.9.31rc42 2026-05-24 15:40:51 -07:00
Nick Sweeting d9596756e8 release: archivebox 0.9.31rc41 2026-05-24 14:07:33 -07:00
Nick Sweeting ec4eded38a release: archivebox 0.9.31rc39 2026-05-24 13:25:16 -07:00
Nick Sweeting a18a10c5f3 release: archivebox 0.9.31rc34 2026-05-24 03:36:47 -07:00
Nick Sweeting 9c70ac426b release: archivebox 0.9.31rc32 2026-05-24 03:09:59 -07:00
Nick Sweeting 687c27ab23 release: archivebox 0.9.31rc31 2026-05-24 03:07:01 -07:00
Nick Sweeting cadd3f517d release: archivebox 0.9.31rc30 2026-05-24 02:05:13 -07:00
Nick Sweeting ea4ce5b641 Prepare ArchiveBox 0.9.30rc81 demo release 2026-05-18 01:51:55 -07:00
Nick Sweeting 0cc4251c8a Update ArchiveBox for abx dependency releases 2026-05-16 23:50:16 -07:00
Claude 7c3a3e0dba Put tag slug back in JS download filename
Address pirate's review: restore the slug in the client-side
download fallback filename. Expose tag.slug as data-slug on the
card element and in the search card schema so the JS can read it
directly without slugifying client-side.
2026-04-21 17:35:57 +00:00
Claude 2ea66d05d1 Move tag slug logic onto Tag.slug @property
Replaces the tag_filename_safe() helper with a Tag.slug property
that returns the slugified form via django.utils.text.slugify.
Call sites now just use tag.slug directly.
2026-04-21 17:32:25 +00:00
Claude 0041a2d407 Sanitize tag export filenames via django.utils.text.slugify
Addresses review feedback from cubic and devin: quote()'s percent-
encoding isn't decoded by browsers in Content-Disposition's filename
parameter (Safari saves literal %20). Switch to Django's slugify()
which applies NFKD normalization, folds accented characters to ASCII
and drops the rest, strips punctuation, lowercases, and collapses
whitespace into hyphens, producing clean names like
"tag-alpha-research-urls.txt".

- Add tag_filename_safe(name) helper wrapping slugify
- Use it in both tag export endpoints
- Drop the now-unneeded JS fallback name (server always sets
  Content-Disposition)
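
For readers without Django at hand, the sketch below approximates the default ASCII mode of slugify() using only the standard library. `slugify_ascii` and `tag_export_filename` are hypothetical names, and the `tag-<slug>-urls.txt` pattern and the `untitled` fallback are assumptions inferred from the example filename above.

```python
import re
import unicodedata


def slugify_ascii(value):
    """Approximate Django's slugify() with allow_unicode=False."""
    value = unicodedata.normalize("NFKD", str(value))
    value = value.encode("ascii", "ignore").decode("ascii")  # drop non-ASCII
    value = re.sub(r"[^\w\s-]", "", value.lower())  # strip punctuation
    return re.sub(r"[-\s]+", "-", value).strip("-_")  # whitespace to hyphens


def tag_export_filename(tag_name):
    """Build a Content-Disposition-safe filename for a tag export."""
    slug = slugify_ascii(tag_name) or "untitled"  # names with no ASCII at all
    return f"tag-{slug}-urls.txt"
```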
2026-04-21 17:30:50 +00:00
Claude b83e2de73a Add TODO on tag export filename encoding
Applies pirate's review suggestion on PR #1789: mark the
Content-Disposition filename encoding as a known-rough approach
that could be hardened further (strip punctuation, convert to
ASCII equivalents) in a follow-up.
2026-04-21 17:27:06 +00:00
Claude ec9c7c89f4 Drop Tag slug column and use URL-encoded names
Tags now support full unicode with no restrictions. URL-encode the tag
name wherever it previously used the slug (export filenames, lookups).

- Remove `slug` field, `_generate_unique_slug`, and slug handling in save()
- Add migration 0034 to drop the slug column
- `get_tag_by_ref` now resolves by URL-decoded exact name match
- Tag search/autocomplete/export filenames use the name directly
- Drop slug from admin search_fields/readonly_fields/fieldsets
- Remove slug display from similar-tag cards and client download filename
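
The URL-encoded lookup can be sketched with `urllib.parse` alone. This assumes tags are plain name strings held in a list; `tag_ref` and `find_tag_by_ref` are hypothetical helpers, not the project's `get_tag_by_ref`.

```python
from urllib.parse import quote, unquote


def tag_ref(name):
    """Percent-encode a tag name so it is safe in a URL path segment."""
    return quote(name, safe="")


def find_tag_by_ref(ref, tag_names):
    """Resolve a reference by URL-decoding it and matching the exact name."""
    name = unquote(ref)
    for candidate in tag_names:
        if candidate == name:
            return candidate
    return None
```

Because the match is exact, a reference to `News` does not resolve to a tag named `news`.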
2026-04-20 17:05:07 +00:00
Nick Sweeting b749b26c5d wip 2026-03-23 03:58:32 -07:00
Nick Sweeting f400a2cd67 WIP: checkpoint working tree before rebasing onto dev 2026-03-22 20:25:18 -07:00
Nick Sweeting 57e11879ec cleanup archivebox tests 2026-03-15 22:09:56 -07:00
Nick Sweeting 49436af869 Tighten CLI and admin typing 2026-03-15 19:33:15 -07:00
Nick Sweeting 5381f7584c Tighten API typing and add return values 2026-03-15 19:24:54 -07:00
Nick Sweeting 95a105feb9 small fixes 2026-03-15 19:22:06 -07:00
Nick Sweeting f932054915 add stricter locking around stage machine models 2026-03-15 19:21:41 -07:00
Nick Sweeting 934e02695b fix lint 2026-03-15 18:45:29 -07:00
Nick Sweeting 70c9358cf9 Improve scheduling, runtime paths, and API behavior 2026-03-15 18:31:56 -07:00
Nick Sweeting 7d42c6c8b5 bump versions and fix docs 2026-03-15 17:43:07 -07:00
Nick Sweeting ec4b27056e wip 2026-01-21 03:19:56 -08:00
Nick Sweeting dd77511026 unified Process source of truth and better screenshot tests 2026-01-02 04:20:34 -08:00
Nick Sweeting 099d955ef5 Implement tags editor widget for Django admin (#1729)
Implement a sleek inline tag editor with autocomplete and AJAX support:

- Create TagEditorWidget and InlineTagEditorWidget in core/widgets.py
  - Pills display with X remove button, sorted alphabetically
  - Text input with HTML5 datalist autocomplete
  - Enter/Space/Comma to add tags; auto-creates the tag if it doesn't exist
  - Backspace removes last tag when input is empty

- Add API endpoints in api/v1_core.py
  - GET /tags/autocomplete/ - search tags by name
  - POST /tags/create/ - get_or_create tag
  - POST /tags/add-to-snapshot/ - add tag to snapshot via AJAX
  - POST /tags/remove-from-snapshot/ - remove tag from snapshot

- Update admin_snapshots.py
  - Replace FilteredSelectMultiple with TagEditorWidget in bulk actions
  - Create SnapshotAdminForm with tags_editor field
  - Update title_str() to render inline tag editor in list view
  - Remove TagInline, use widget instead

- Add CSS styles in templates/admin/base.html
  - Blue gradient pill styling matching admin theme
  - Focus ring and hover states
  - Compact inline variant for list view


<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Implemented a new interactive tags editor for Django admin with
autocomplete and AJAX add/remove, replacing the old multi-select widget
and TagInline. This makes tagging snapshots faster and safer in the
detail view, the list view, and bulk actions.

- **New Features**
  - TagEditorWidget and InlineTagEditorWidget with pill UI and remove
    buttons, XSS-safe rendering, and delegated events.
  - Keyboard support: Enter/Space/Comma to add, Backspace to remove the
    last tag when input is empty.
  - Datalist autocomplete and debounced search via GET
    /tags/autocomplete/.
  - AJAX endpoints: POST /tags/create/, /tags/add-to-snapshot/,
    /tags/remove-from-snapshot/.

- **Refactors**
  - Replaced FilteredSelectMultiple with TagEditorWidget in bulk actions;
    parse comma-separated tags and use bulk_create/delete for efficient
    add/remove.
  - Added SnapshotAdminForm with tags_editor field; saves tags
    case-insensitively and fixes remove_tags matching.
  - Rendered inline tag editor in list view via title_str; removed
    TagInline.
  - Added CSS in admin/base.html for pill styling, focus ring, and compact
    inline variant.
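
The comma-separated parsing and case-insensitive matching mentioned above might look like the following. It is a sketch under assumptions: `parse_tag_input` is a hypothetical helper, tags are plain strings, and an existing tag's spelling wins over the typed one.

```python
def parse_tag_input(raw, existing=()):
    """Split comma-separated input into tag names, reusing existing spellings."""
    by_key = {name.casefold(): name for name in existing}
    result, seen = [], set()
    for part in raw.split(","):
        name = part.strip()
        key = name.casefold()
        if not name or key in seen:
            continue  # skip blanks and case-insensitive duplicates
        seen.add(key)
        result.append(by_key.get(key, name))
    return result
```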

<sup>Written for commit 0dee662f41.
Summary will update on new commits.</sup>

<!-- End of auto-generated description by cubic. -->
2025-12-30 11:59:39 -08:00
Nick Sweeting 95beddc5fc more migration fixes 2025-12-29 22:12:57 -08:00
Nick Sweeting 80f75126c6 more fixes 2025-12-29 21:03:05 -08:00
Claude 202e5b2e59 Add interactive tags editor widget for Django admin
Implement a sleek inline tag editor with autocomplete and AJAX support:

- Create TagEditorWidget and InlineTagEditorWidget in core/widgets.py
  - Pills display with X remove button, sorted alphabetically
  - Text input with HTML5 datalist autocomplete
  - Enter/Space/Comma to add tags; auto-creates the tag if it doesn't exist
  - Backspace removes last tag when input is empty

- Add API endpoints in api/v1_core.py
  - GET /tags/autocomplete/ - search tags by name
  - POST /tags/create/ - get_or_create tag
  - POST /tags/add-to-snapshot/ - add tag to snapshot via AJAX
  - POST /tags/remove-from-snapshot/ - remove tag from snapshot

- Update admin_snapshots.py
  - Replace FilteredSelectMultiple with TagEditorWidget in bulk actions
  - Create SnapshotAdminForm with tags_editor field
  - Update title_str() to render inline tag editor in list view
  - Remove TagInline, use widget instead

- Add CSS styles in templates/admin/base.html
  - Blue gradient pill styling matching admin theme
  - Focus ring and hover states
  - Compact inline variant for list view
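
To make the autocomplete and create endpoints concrete, here is an in-memory sketch of the behaviour they describe. It uses no Django, the function names are hypothetical, and ranking prefix matches first is an assumption rather than something the commit states.

```python
def autocomplete(query, tag_names, limit=10):
    """Return tag names containing the query, prefix matches first."""
    q = query.strip().casefold()
    if not q:
        return []
    matches = [name for name in tag_names if q in name.casefold()]
    matches.sort(key=lambda n: (not n.casefold().startswith(q), n.casefold()))
    return matches[:limit]


def get_or_create(name, tag_names):
    """Return (tag, created), appending the tag only if it is not present."""
    name = name.strip()
    if name in tag_names:
        return name, False
    tag_names.append(name)
    return name, True
```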
2025-12-30 02:18:08 +00:00
Nick Sweeting 30c60eef76 much better tests and add page ui 2025-12-29 04:02:11 -08:00
Nick Sweeting f4e7820533 use full dotted paths for all archivebox imports, add migrations and more fixes 2025-12-29 00:47:08 -08:00
Nick Sweeting f0aa19fa7d wip 2025-12-28 17:51:54 -08:00