2311 Commits

Author SHA1 Message Date
Claude 7c3a3e0dba Put tag slug back in JS download filename
Address pirate's review: restore the slug in the client-side
download fallback filename. Expose tag.slug as data-slug on the
card element and in the search card schema so the JS can read it
directly without slugifying client-side.
2026-04-21 17:35:57 +00:00
Claude 2ea66d05d1 Move tag slug logic onto Tag.slug @property
Replaces the tag_filename_safe() helper with a Tag.slug property
that returns the slugified form via django.utils.text.slugify.
Call sites now just use tag.slug directly.
2026-04-21 17:32:25 +00:00
Claude 0041a2d407 Sanitize tag export filenames via django.utils.text.slugify
Addresses review feedback from cubic and devin: browsers do not decode
quote()'s percent-encoding in the Content-Disposition filename
parameter (Safari saves a literal %20). Switch to Django's slugify(),
which applies NFKD normalization, drops characters with no ASCII
equivalent, strips punctuation, and collapses whitespace into hyphens,
producing clean names like "tag-alpha-research-urls.txt".

- Add tag_filename_safe(name) helper wrapping slugify
- Use it in both tag export endpoints
- Drop the now-unneeded JS fallback name (server always sets
  Content-Disposition)
2026-04-21 17:30:50 +00:00
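
A rough, standard-library-only sketch of the slug-based export filename this commit describes. The commit itself calls `django.utils.text.slugify`; `ascii_slug` below only approximates that behaviour, and `export_filename`, `content_disposition`, the `-urls.txt` suffix handling, and the `"tag"` fallback are hypothetical names and choices made for illustration.

```python
import re
import unicodedata


def ascii_slug(value: str) -> str:
    """Approximate Django's slugify() for ASCII-only output (not the real helper)."""
    # NFKD splits accented characters into a base letter plus combining marks;
    # the ASCII encode step then drops anything without an ASCII form.
    value = unicodedata.normalize("NFKD", value)
    value = value.encode("ascii", "ignore").decode("ascii")
    # Remove everything except word characters, whitespace, and hyphens.
    value = re.sub(r"[^\w\s-]", "", value.lower())
    # Collapse runs of whitespace/hyphens into one hyphen and trim the ends.
    return re.sub(r"[-\s]+", "-", value).strip("-_")


def export_filename(tag_name: str, suffix: str = "urls.txt") -> str:
    """Build a filename that needs no quoting tricks in Content-Disposition."""
    # Assumption: fall back to a fixed word when the slug comes out empty,
    # e.g. for names made entirely of non-ASCII characters.
    slug = ascii_slug(tag_name) or "tag"
    return f"{slug}-{suffix}"


def content_disposition(tag_name: str) -> str:
    """Header value a server could send for the export download."""
    return f'attachment; filename="{export_filename(tag_name)}"'
```

Because the slug contains only ASCII letters, digits, underscores, and hyphens, the header value needs no percent-encoding, which sidesteps the Safari `%20` behaviour mentioned in the message.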
Claude b83e2de73a Add TODO on tag export filename encoding
Applies pirate's review suggestion on PR #1789: mark the
Content-Disposition filename encoding as a known-rough approach
that could be hardened further (strip punctuation, convert to
ASCII equivalents) in a follow-up.
2026-04-21 17:27:06 +00:00
Claude ec9c7c89f4 Drop Tag slug column and use URL-encoded names
Tags now support full unicode with no restrictions. URL-encode the tag
name wherever it previously used the slug (export filenames, lookups).

- Remove `slug` field, `_generate_unique_slug`, and slug handling in save()
- Add migration 0034 to drop the slug column
- `get_tag_by_ref` now resolves by URL-decoded exact name match
- Tag search/autocomplete/export filenames use the name directly
- Drop slug from admin search_fields/readonly_fields/fieldsets
- Remove slug display from similar-tag cards and client download filename
2026-04-20 17:05:07 +00:00
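
A minimal sketch of the URL-encoded name round trip this commit describes, using only `urllib.parse`. The name `get_tag_by_ref` comes from the message, but its signature here is an assumption: a plain dict stands in for the database lookup, and `tag_ref` is a hypothetical helper.

```python
from typing import Dict, Optional
from urllib.parse import quote, unquote


def tag_ref(name: str) -> str:
    """Encode a tag name for use in a URL path segment."""
    # safe="" also encodes "/", so a tag name can never span path segments.
    return quote(name, safe="")


def get_tag_by_ref(ref: str, tags_by_name: Dict[str, int]) -> Optional[int]:
    """Resolve a reference by URL-decoded exact name match.

    Assumption: tags_by_name maps tag name -> tag id and stands in for
    the real model query.
    """
    return tags_by_name.get(unquote(ref))
```

For example, `get_tag_by_ref(tag_ref("alpha beta"), {"alpha beta": 1})` returns `1`, while a name that differs only in case does not match, since the lookup is exact.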
Nick Sweeting 0b9b3b7e54 split tag editor issue 2026-04-07 20:21:58 -07:00
Nick Sweeting 4d66996569 small fixes 2026-04-06 23:47:38 -07:00
Nick Sweeting 3e7b83ac91 bump versions 2026-04-04 23:10:12 -07:00
Nick Sweeting f3622d8cd3 update working changes 2026-03-25 05:36:07 -07:00
Nick Sweeting 80243accfd Fix archivebox CI regressions 2026-03-24 15:36:23 -07:00
Nick Sweeting 68d9e30c5f Fix pytest basetemp handling in test harness 2026-03-24 14:46:05 -07:00
Nick Sweeting ed1ddbc95e Fix CI workflows and migration tests 2026-03-24 13:37:02 -07:00
Nick Sweeting 50286d3c38 Reuse cached binaries in archivebox runtime 2026-03-24 11:03:43 -07:00
Nick Sweeting 39450111dd Update CI uv handling and runner changes 2026-03-23 13:27:23 -07:00
Nick Sweeting e1eb5693c9 split CrawlSetup into Install phase with new Binary + BinaryRequest events 2026-03-23 13:16:47 -07:00
Nick Sweeting 25f935b9d1 split CrawlSetup into Install phase with new Binary + BinaryRequest events 2026-03-23 13:15:41 -07:00
Nick Sweeting 8a25704aac add harness tests 2026-03-23 04:12:46 -07:00
Nick Sweeting 1d94645abd test fixes 2026-03-23 04:12:31 -07:00
Nick Sweeting b749b26c5d wip 2026-03-23 03:58:32 -07:00
Nick Sweeting 268856bcfb Preserve common config console handling after rebase 2026-03-22 20:25:53 -07:00
Nick Sweeting f400a2cd67 WIP: checkpoint working tree before rebasing onto dev 2026-03-22 20:25:18 -07:00
Nick Sweeting a6548df8d0 Add configurable server security modes (#1773)
Fixes https://github.com/ArchiveBox/ArchiveBox/issues/239

## Summary
- add `SERVER_SECURITY_MODE` presets for safe subdomain replay, safe
one-domain no-JS replay, unsafe one-domain no-admin, and dangerous
one-domain full replay
- make host routing, replay URLs, static serving, and control-plane
access mode-aware
- add strict routing/header coverage plus a browser-backed
Chrome/Puppeteer test that verifies real same-origin behavior in all
four modes

## Testing
- `uv run pytest archivebox/tests/test_urls.py -v`
- `uv run pytest archivebox/tests/test_admin_views.py -v`
- `uv run pytest archivebox/tests/test_server_security_browser.py -v`


<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Adds configurable server security modes to isolate admin/API from
archived content, with a safe subdomain default and single-domain
fallbacks. Routing, replay endpoints, headers, and middleware are
mode-aware, with browser tests validating same-origin behavior.

- New Features
  - Introduced SERVER_SECURITY_MODE with presets:
    safe-subdomains-fullreplay (default), safe-onedomain-nojsreplay,
    unsafe-onedomain-noadmin, danger-onedomain-fullreplay.
  - Mode-aware routing and base URLs; one-domain modes use path-based
    replay: /snapshot/<id>/... and /original/<domain>/....
  - Control plane gate: block admin/API and non-GET methods in
    unsafe-onedomain-noadmin; allow full access in
    danger-onedomain-fullreplay.
  - Safer replay: detect risky HTML/SVG and apply CSP sandbox (no scripts)
    in safe-onedomain-nojsreplay; add X-ArchiveBox-Security-Mode and
    X-Content-Type-Options: nosniff on replay responses.
  - Middleware and serving: added ServerSecurityModeMiddleware, improved
    HostRouting, and static server byte-range/CSP handling.
  - Tests: added Chrome/Puppeteer browser tests and stricter URL routing
    tests covering all modes.

- Migration
  - Default requires wildcard subdomains for full isolation (admin., web.,
    api., and snapshot-id.<base>).
  - To run on one domain, set SERVER_SECURITY_MODE to a one-domain preset;
    URLs switch to /snapshot/<id>/ and /original/<domain>/ paths.
  - For production, prefer safe-subdomains-fullreplay; lower-security
    modes print a startup warning.

<sup>Written for commit ad41b15581.
Summary will update on new commits.</sup>

<!-- End of auto-generated description by cubic. -->
2026-03-22 20:17:21 -07:00
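
The mode-dependent behaviour summarized in this commit can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in ArchiveBox: `SecurityMode`, `control_plane_allowed`, `replay_headers`, the `/admin/` and `/api/` prefixes, and the exact `Content-Security-Policy` value are all hypothetical. Only the four preset names and the two response header names come from the summary.

```python
from enum import Enum
from typing import Dict


class SecurityMode(str, Enum):
    SAFE_SUBDOMAINS_FULLREPLAY = "safe-subdomains-fullreplay"  # default preset
    SAFE_ONEDOMAIN_NOJSREPLAY = "safe-onedomain-nojsreplay"
    UNSAFE_ONEDOMAIN_NOADMIN = "unsafe-onedomain-noadmin"
    DANGER_ONEDOMAIN_FULLREPLAY = "danger-onedomain-fullreplay"


# Assumption: the control plane lives under these path prefixes.
CONTROL_PLANE_PREFIXES = ("/admin/", "/api/")


def control_plane_allowed(mode: SecurityMode, method: str, path: str) -> bool:
    """Gate described for unsafe-onedomain-noadmin: no admin/API, GET only.

    Assumption: every other mode passes through this particular gate.
    """
    if mode is not SecurityMode.UNSAFE_ONEDOMAIN_NOADMIN:
        return True
    if path.startswith(CONTROL_PLANE_PREFIXES):
        return False
    return method.upper() == "GET"


def replay_headers(mode: SecurityMode, risky_content: bool) -> Dict[str, str]:
    """Headers attached to a replay response."""
    headers = {
        "X-ArchiveBox-Security-Mode": mode.value,
        "X-Content-Type-Options": "nosniff",
    }
    # In the no-JS one-domain mode, risky HTML/SVG is sandboxed. A bare
    # `sandbox` directive grants no allow-scripts token, so scripts are blocked.
    if mode is SecurityMode.SAFE_ONEDOMAIN_NOJSREPLAY and risky_content:
        headers["Content-Security-Policy"] = "sandbox"
    return headers
```

Keeping the gate and the header logic as pure functions of the mode makes them easy to cover with routing tests for all four presets, in the spirit of the strict routing/header coverage the summary mentions.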
Nick Sweeting c87079aa0a Refactor ArchiveBox onto abx-dl bus runner 2026-03-21 11:47:57 -07:00
Nick Sweeting ad41b15581 Add configurable server security modes 2026-03-15 23:34:40 -07:00
Nick Sweeting 57e11879ec cleanup archivebox tests 2026-03-15 22:09:56 -07:00
Nick Sweeting 9de084da65 bump package versions 2026-03-15 20:47:28 -07:00
Nick Sweeting bc21d4bfdb type and test fixes 2026-03-15 20:12:27 -07:00
Nick Sweeting 3889eb4efa Tighten config and admin typing 2026-03-15 19:49:52 -07:00
Nick Sweeting 44cabac8d0 fix typing 2026-03-15 19:47:36 -07:00
Nick Sweeting 4756697a17 Use ruff pyright and ty for linting 2026-03-15 19:43:59 -07:00
Nick Sweeting 49436af869 Tighten CLI and admin typing 2026-03-15 19:33:15 -07:00
Nick Sweeting 5381f7584c Tighten API typing and add return values 2026-03-15 19:24:54 -07:00
Nick Sweeting 95a105feb9 small fixes 2026-03-15 19:22:06 -07:00
Nick Sweeting f932054915 add stricter locking around stage machine models 2026-03-15 19:21:41 -07:00
Nick Sweeting 311e4340ec Fix add CLI input handling and lint regressions 2026-03-15 19:04:13 -07:00
Nick Sweeting 5f0cfe5251 add new persona tests 2026-03-15 18:46:45 -07:00
Nick Sweeting 934e02695b fix lint 2026-03-15 18:45:29 -07:00
Nick Sweeting 70c9358cf9 Improve scheduling, runtime paths, and API behavior 2026-03-15 18:31:56 -07:00
Nick Sweeting 7d42c6c8b5 bump versions and fix docs 2026-03-15 17:43:07 -07:00
Nick Sweeting e598614b05 Avoid filesystem lookups in snapshot admin list 2026-03-15 17:18:53 -07:00
Nick Sweeting 21a0a27091 Remove 7 dead functions and 4 unused imports from hooks.py
Dead functions: extract_step, run_hooks, is_parser_plugin,
get_all_plugin_icons, discover_plugin_templates, find_binary_for_cmd,
create_model_record, get_parser_plugins

Dead imports: re, signal, subprocess, django.utils.timezone

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 16:34:20 -07:00
Nick Sweeting 0ac83c8799 Wait for crawl hook records before advancing 2026-03-15 14:15:04 -07:00
Nick Sweeting 1d16038ceb Relax archive output readiness check 2026-03-15 13:31:05 -07:00
Nick Sweeting 2585ef5870 Use npm package for readability extractor installs 2026-03-15 13:09:18 -07:00
Nick Sweeting 957387fd88 Fix plugin hook env and extractor retries 2026-03-15 12:39:27 -07:00
Nick Sweeting 1fc860e901 Remove legacy binary override coercion 2026-03-15 11:45:04 -07:00
Nick Sweeting f92ca93ae9 Skip puppeteer browser download during package install 2026-03-15 11:39:43 -07:00
Nick Sweeting 7c55259ed0 Update title HTML test for search export 2026-03-15 11:17:58 -07:00
Nick Sweeting 86fdc3be1e Refresh worker config from resolved plugin installs 2026-03-15 11:07:55 -07:00
Nick Sweeting 47f540c094 Resolve crawl provider dependencies lazily 2026-03-15 10:18:49 -07:00