5030 Commits

Author SHA1 Message Date
Nick Sweeting caba6e4246 Link publicsite capability chips
Signed-off-by: Nick Sweeting <git@sweeting.me>
2026-04-23 22:18:30 -07:00
Nick Sweeting 9b8f00fa26 more tweaks
Signed-off-by: Nick Sweeting <git@sweeting.me>
2026-04-23 22:16:51 -07:00
Nick Sweeting 166dcd5a6d tweaks
Signed-off-by: Nick Sweeting <git@sweeting.me>
2026-04-23 22:16:28 -07:00
Nick Sweeting 9c71acc2b2 Add README shields to publicsite hero
Signed-off-by: Nick Sweeting <git@sweeting.me>
2026-04-23 22:14:44 -07:00
Nick Sweeting 166a161b85 Align publicsite hero and nav with design system
Signed-off-by: Nick Sweeting <git@sweeting.me>
2026-04-23 22:13:27 -07:00
Nick Sweeting 4fef401bcd Refine publicsite hero heading
Signed-off-by: Nick Sweeting <git@sweeting.me>
2026-04-23 22:10:04 -07:00
Nick Sweeting 4804ad315e Update publicsite source header
Signed-off-by: Nick Sweeting <git@sweeting.me>
2026-04-23 22:04:36 -07:00
Nick Sweeting fc3682abfb Fix publicsite hero typo
Signed-off-by: Nick Sweeting <git@sweeting.me>
2026-04-23 22:03:40 -07:00
Nick Sweeting 35d630ba76 Tighten publicsite hero header
Signed-off-by: Nick Sweeting <git@sweeting.me>
2026-04-23 22:03:09 -07:00
Nick Sweeting abc987c403 Update publicsite intro header
Signed-off-by: Nick Sweeting <git@sweeting.me>
2026-04-23 21:39:08 -07:00
Nick Sweeting ca7eeb77eb Update publicsite configuration header
Signed-off-by: Nick Sweeting <git@sweeting.me>
2026-04-23 21:37:05 -07:00
Nick Sweeting 163e9bd626 Rename publicsite Pages workflow
Signed-off-by: Nick Sweeting <git@sweeting.me>
2026-04-23 21:36:05 -07:00
Nick Sweeting e013817dd0 public site tweaks
Signed-off-by: Nick Sweeting <git@sweeting.me>
2026-04-23 21:03:37 -07:00
Nick Sweeting 2b71c474ca [codex] Add static ArchiveBox landing page (#1791)
## Summary
- add a vanilla HTML/CSS landing page under repo-root `publicsite/`
- keep the existing ArchiveBox logo and custom domain CNAME in the Pages
artifact
- use the light-mode ArchiveBox design tokens with no dark-mode CSS
- update the GitHub Pages workflow to deploy `./publicsite` directly
without Jekyll
- remove the old top-level `website/` tree and duplicate Jekyll Pages
workflow

## Validation
- `ruby -e "require 'yaml'; YAML.load_file('.github/workflows/gh-pages.yml')"`
- parsed `publicsite/index.html` with Python `HTMLParser`
- served `publicsite` locally and verified `/`, `styles.css`,
`icon.png`, and `CNAME` return 200
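
The HTML check in the validation list can be sketched as follows. Python's `HTMLParser` is lenient and rarely raises on malformed markup, so this sketch tracks open and close tags itself. The `BalanceChecker` class and `check_html` helper are hypothetical names for illustration, not part of the repository, and the check is stricter than the HTML spec (it does not allow optional end tags such as an omitted `</p>`).

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_TAGS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class BalanceChecker(HTMLParser):
    """Hypothetical helper: records unbalanced tags while parsing."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open non-void tags
        self.errors = []  # human-readable problems found so far

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.errors.append(f"unexpected </{tag}>")


def check_html(text):
    """Return a list of problems; an empty list means the tags balance."""
    checker = BalanceChecker()
    checker.feed(text)
    checker.close()
    return checker.errors + [f"unclosed <{tag}>" for tag in checker.stack]


if __name__ == "__main__":
    page = "<!doctype html><html><head><meta charset='utf-8'></head><body><p>ok</p></body></html>"
    print(check_html(page))  # [] for a balanced page
```

Running `check_html(open("publicsite/index.html").read())` would be one way to apply it to the landing page, assuming the file exists at that path in a local checkout.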
2026-04-23 20:51:17 -07:00
Nick Sweeting 2c1700a8e0 Add static ArchiveBox landing page
Signed-off-by: Nick Sweeting <git@sweeting.me>
2026-04-23 20:48:12 -07:00
Nick Sweeting 42dc87f271 Remove slug field from Tag model (#1789)
## Summary

This PR removes the `slug` field from the Tag model and all related slug
generation logic. Tags are now identified and referenced by their name
instead of a generated slug, simplifying the data model and reducing
complexity.

## Related issues

N/A

## Changes to these areas

- [x] Internal architecture
- [x] Snapshot data layout on disk

## Details

### What changed

1. **Model changes**: Removed the `slug` field from the Tag model,
including the `_generate_unique_slug()` method and slug generation logic
in the `save()` method
2. **Database migration**: Added migration `0034_remove_tag_slug` to
drop the slug column
3. **API updates**: Removed `slug` from all API schemas (TagSchema,
TagSearchCardSchema, TagUpdateResponseSchema) and responses
4. **Tag lookup**: Updated `get_tag_by_ref()` to use URL-decoded tag
names instead of slugs for lookups
5. **Tag filtering**: Simplified `get_matching_tags()` to only filter by
name instead of both name and slug
6. **Export filenames**: Changed tag export filenames to use
`quote(tag.name)` instead of `tag.slug`
7. **Admin interface**: Removed slug from TagAdmin search fields,
readonly fields, and fieldsets
8. **Templates**: Removed slug display from tag cards and similar tags
UI
9. **Tests**: Updated test expectations and removed slug assertions;
updated export filename checks to use `quote(tag.name)`

### Why

This simplifies the Tag model by removing the derived slug field. Tags
can be uniquely identified by their name, and URL encoding handles
special characters in filenames and URLs. This reduces database
complexity and eliminates the need for slug generation and uniqueness
logic.

## Test Plan

Existing tests have been updated to verify the new behavior:
- `test_tag_rename_api_updates_name` verifies tag renaming works without
slug
- `test_tag_snapshots_export_returns_jsonl` and
`test_tag_urls_export_returns_plain_text_urls` verify export filenames
use encoded tag names
- `test_tag_table_has_required_columns` verifies the database schema no
longer includes slug

All related tests pass with the updated assertions.

https://claude.ai/code/session_014KmEXoA64Ayp2t8BW2xfVP

<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Removed the stored `slug` from Tag and moved to name-based tags. Added a
derived `Tag.slug` via `django.utils.text.slugify` for clean export
filenames and an admin download fallback; public APIs no longer include
slugs and lookups resolve by URL-decoded exact name.

- **Refactors**
  - Replaced stored slug with a derived `Tag.slug` property; removed slug
    generation/save logic.
  - Public API schemas and autocomplete drop `slug`; matching/filtering
    uses `name` only.
  - `get_tag_by_ref` resolves by URL-decoded `name` (case-insensitive
    exact match).
  - Export endpoints set filenames using `tag.slug`; admin tag cards
    expose `data-slug`, and the client uses it as a fallback filename.
  - Removed slug from admin search fields/fieldsets and UI displays.

- **Migration**
  - Run database migrations.
  - Update any consumers expecting `slug` in Tag API/admin; use the tag
    `name` for references (URL-encode names in links). Rely on
    server-provided filenames, with the built-in client fallback using
    `tag.slug` where needed.

<sup>Written for commit 7c3a3e0dba.
Summary will update on new commits.</sup>

<!-- End of auto-generated description by cubic. -->
2026-04-21 11:54:00 -07:00
Claude 7c3a3e0dba Put tag slug back in JS download filename
Address pirate's review: restore the slug in the client-side
download fallback filename. Expose tag.slug as data-slug on the
card element and in the search card schema so the JS can read it
directly without slugifying client-side.
2026-04-21 17:35:57 +00:00
Claude 2ea66d05d1 Move tag slug logic onto Tag.slug @property
Replaces the tag_filename_safe() helper with a Tag.slug property
that returns the slugified form via django.utils.text.slugify.
Call sites now just use tag.slug directly.
2026-04-21 17:32:25 +00:00
Claude 0041a2d407 Sanitize tag export filenames via django.utils.text.slugify
Addresses review feedback from cubic and devin: quote()'s percent-
encoding isn't decoded by browsers in Content-Disposition's filename
parameter (Safari saves literal %20). Switch to Django's slugify(),
which applies NFKD normalization, drops non-ASCII characters, strips
punctuation, and collapses whitespace into hyphens, producing clean
names like "tag-alpha-research-urls.txt".

- Add tag_filename_safe(name) helper wrapping slugify
- Use it in both tag export endpoints
- Drop the now-unneeded JS fallback name (server always sets
  Content-Disposition)
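
A minimal sketch of the difference this commit describes, using only the standard library. `slug_like` approximates the documented behaviour of `django.utils.text.slugify` without importing Django; it is not the project's `tag_filename_safe()` helper. `export_disposition`, the `urls.txt` suffix, and the `"tag"` fallback for names that slugify to an empty string are assumptions for illustration.

```python
import re
import unicodedata
from urllib.parse import quote


def slug_like(name: str) -> str:
    """Approximate an ASCII slug (stand-in for Django's slugify)."""
    # NFKD splits accented letters into base letter + combining mark,
    # then the ASCII round-trip drops anything that is not plain ASCII.
    ascii_name = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Strip punctuation, then collapse runs of whitespace/hyphens.
    cleaned = re.sub(r"[^\w\s-]", "", ascii_name.lower())
    return re.sub(r"[-\s]+", "-", cleaned).strip("-_")


def export_disposition(tag_name: str, suffix: str = "urls.txt") -> str:
    """Hypothetical helper: build a Content-Disposition header value."""
    # Assumption: fall back to a fixed stem when the slug comes out empty,
    # which happens for names with no ASCII-representable characters.
    stem = slug_like(tag_name) or "tag"
    return f'attachment; filename="{stem}-{suffix}"'


if __name__ == "__main__":
    name = "Tag Alpha: Research"
    print(quote(name))               # Tag%20Alpha%3A%20Research
    print(slug_like(name))           # tag-alpha-research
    print(export_disposition(name))  # attachment; filename="tag-alpha-research-urls.txt"
```

The percent-encoded form is what a browser may save literally, while the slug form contains only characters that are safe in a quoted `filename` parameter.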
2026-04-21 17:30:50 +00:00
Claude b83e2de73a Add TODO on tag export filename encoding
Applies pirate's review suggestion on PR #1789: mark the
Content-Disposition filename encoding as a known-rough approach
that could be hardened further (strip punctuation, convert to
ASCII equivalents) in a follow-up.
2026-04-21 17:27:06 +00:00
Claude ec9c7c89f4 Drop Tag slug column and use URL-encoded names
Tags now support full unicode with no restrictions. URL-encode the tag
name wherever it previously used the slug (export filenames, lookups).

- Remove `slug` field, `_generate_unique_slug`, and slug handling in save()
- Add migration 0034 to drop the slug column
- `get_tag_by_ref` now resolves by URL-decoded exact name match
- Tag search/autocomplete/export filenames use the name directly
- Drop slug from admin search_fields/readonly_fields/fieldsets
- Remove slug display from similar-tag cards and client download filename
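
The name-based lookup in this commit could look roughly like the sketch below. It works on a plain list of names rather than the Django ORM, and `resolve_tag` and its `case_insensitive` flag are hypothetical; the real `get_tag_by_ref` may differ in signature and behaviour.

```python
from typing import Iterable, Optional
from urllib.parse import unquote


def resolve_tag(
    ref: str,
    tag_names: Iterable[str],
    case_insensitive: bool = False,
) -> Optional[str]:
    """Hypothetical sketch: resolve a URL-encoded reference to a tag name."""
    wanted = unquote(ref)  # e.g. "Deep%20Dive" -> "Deep Dive"
    if case_insensitive:
        wanted = wanted.casefold()
    for name in tag_names:
        candidate = name.casefold() if case_insensitive else name
        if candidate == wanted:
            return name
    return None


if __name__ == "__main__":
    names = ["Deep Dive", "r&d"]
    print(resolve_tag("Deep%20Dive", names))  # Deep Dive
    print(resolve_tag("r%26d", names))        # r&d
    print(resolve_tag("deep%20dive", names))  # None (exact match only)
```

Because the reference is decoded before comparison, names containing spaces, ampersands, or other reserved characters remain addressable without a separate slug column.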
2026-04-20 17:05:07 +00:00
Nick Sweeting b68ff3ed29 fix monorepo script 2026-04-18 21:55:02 -07:00
Nick Sweeting 7d8c468659 rename abx-pkg to abxpkg 2026-04-17 10:48:40 -07:00
Nick Sweeting ee5685353b rename abx-pkg to abxpkg 2026-04-17 10:37:10 -07:00
Nick Sweeting f1287510ff rename abxpkg 2026-04-17 10:36:19 -07:00
Nick Sweeting 2cc5a11662 update dev instructions 2026-04-07 20:29:33 -07:00
Nick Sweeting 0b9b3b7e54 split tag editor issue 2026-04-07 20:21:58 -07:00
Nick Sweeting f126c6e628 symlink lock_pkgs to setup monorepo script 2026-04-07 20:21:41 -07:00
Nick Sweeting 4d66996569 small fixes 2026-04-06 23:47:38 -07:00
Nick Sweeting 1c6b78223c ignore outfiles 2026-04-04 23:11:06 -07:00
Nick Sweeting 3e7b83ac91 bump versions 2026-04-04 23:10:12 -07:00
Nick Sweeting c8221d5b13 Remove local uv sources override 2026-04-02 16:17:43 -07:00
Nick Sweeting b40b5b8b4d chore: bump abx dependency minimums 2026-04-02 15:18:39 -07:00
Nick Sweeting f3622d8cd3 update working changes 2026-03-25 05:36:07 -07:00
Nick Sweeting 80243accfd Fix archivebox CI regressions 2026-03-24 15:36:23 -07:00
Nick Sweeting 68d9e30c5f Fix pytest basetemp handling in test harness 2026-03-24 14:46:05 -07:00
Nick Sweeting ed1ddbc95e Fix CI workflows and migration tests 2026-03-24 13:37:02 -07:00
Nick Sweeting 50286d3c38 Reuse cached binaries in archivebox runtime 2026-03-24 11:03:43 -07:00
Nick Sweeting 39450111dd Update CI uv handling and runner changes 2026-03-23 13:27:23 -07:00
Nick Sweeting e1eb5693c9 split CrawlSetup into Install phase with new Binary + BinaryRequest events 2026-03-23 13:16:47 -07:00
Nick Sweeting 25f935b9d1 split CrawlSetup into Install phase with new Binary + BinaryRequest events 2026-03-23 13:15:41 -07:00
Nick Sweeting f2c81142e1 tweak release script 2026-03-23 04:21:12 -07:00
Nick Sweeting 8a25704aac add harness tests 2026-03-23 04:12:46 -07:00
Nick Sweeting 1d94645abd test fixes 2026-03-23 04:12:31 -07:00
Nick Sweeting b749b26c5d wip 2026-03-23 03:58:32 -07:00
Nick Sweeting 268856bcfb Preserve common config console handling after rebase 2026-03-22 20:25:53 -07:00
Nick Sweeting f400a2cd67 WIP: checkpoint working tree before rebasing onto dev 2026-03-22 20:25:18 -07:00
Nick Sweeting a6548df8d0 Add configurable server security modes (#1773)
Fixes https://github.com/ArchiveBox/ArchiveBox/issues/239

## Summary
- add `SERVER_SECURITY_MODE` presets for safe subdomain replay, safe
one-domain no-JS replay, unsafe one-domain no-admin, and dangerous
one-domain full replay
- make host routing, replay URLs, static serving, and control-plane
access mode-aware
- add strict routing/header coverage plus a browser-backed
Chrome/Puppeteer test that verifies real same-origin behavior in all
four modes
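
As a rough illustration of mode-aware control-plane access, the sketch below models the four presets and the gate the PR's auto-generated summary describes (admin/API and non-GET requests blocked in `unsafe-onedomain-noadmin`). The `SecurityMode` enum, the function names, the `/admin/` and `/api/` path prefixes, and the bare `sandbox` CSP value are assumptions; the actual `ServerSecurityModeMiddleware` may be structured differently.

```python
from enum import Enum


class SecurityMode(str, Enum):
    """The four SERVER_SECURITY_MODE presets named in this PR."""
    SAFE_SUBDOMAINS_FULLREPLAY = "safe-subdomains-fullreplay"
    SAFE_ONEDOMAIN_NOJSREPLAY = "safe-onedomain-nojsreplay"
    UNSAFE_ONEDOMAIN_NOADMIN = "unsafe-onedomain-noadmin"
    DANGER_ONEDOMAIN_FULLREPLAY = "danger-onedomain-fullreplay"


# Assumption: control-plane routes live under these prefixes.
CONTROL_PLANE_PREFIXES = ("/admin/", "/api/")


def control_plane_allowed(mode: SecurityMode, method: str, path: str) -> bool:
    """Hypothetical gate: only the no-admin preset restricts requests."""
    if mode is SecurityMode.UNSAFE_ONEDOMAIN_NOADMIN:
        if method.upper() != "GET":
            return False
        if path.startswith(CONTROL_PLANE_PREFIXES):
            return False
    return True


def replay_headers(mode: SecurityMode, risky: bool) -> dict:
    """Hypothetical helper: headers attached to a replay response."""
    headers = {
        "X-ArchiveBox-Security-Mode": mode.value,
        "X-Content-Type-Options": "nosniff",
    }
    # Assumption: a bare `sandbox` directive, which disables scripts.
    if mode is SecurityMode.SAFE_ONEDOMAIN_NOJSREPLAY and risky:
        headers["Content-Security-Policy"] = "sandbox"
    return headers


if __name__ == "__main__":
    mode = SecurityMode.UNSAFE_ONEDOMAIN_NOADMIN
    print(control_plane_allowed(mode, "GET", "/snapshot/abc/"))  # True
    print(control_plane_allowed(mode, "POST", "/snapshot/abc/"))  # False
    print(control_plane_allowed(mode, "GET", "/admin/"))          # False
```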

## Testing
- `uv run pytest archivebox/tests/test_urls.py -v`
- `uv run pytest archivebox/tests/test_admin_views.py -v`
- `uv run pytest archivebox/tests/test_server_security_browser.py -v`


<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Adds configurable server security modes to isolate admin/API from
archived content, with a safe subdomain default and single-domain
fallbacks. Routing, replay endpoints, headers, and middleware are
mode-aware, with browser tests validating same-origin behavior.

- New Features
  - Introduced `SERVER_SECURITY_MODE` with presets:
    `safe-subdomains-fullreplay` (default), `safe-onedomain-nojsreplay`,
    `unsafe-onedomain-noadmin`, `danger-onedomain-fullreplay`.
  - Mode-aware routing and base URLs; one-domain modes use path-based
    replay: `/snapshot/<id>/...` and `/original/<domain>/...`.
  - Control plane gate: block admin/API and non-GET methods in
    `unsafe-onedomain-noadmin`; allow full access in
    `danger-onedomain-fullreplay`.
  - Safer replay: detect risky HTML/SVG and apply CSP sandbox (no scripts)
    in `safe-onedomain-nojsreplay`; add `X-ArchiveBox-Security-Mode` and
    `X-Content-Type-Options: nosniff` on replay responses.
  - Middleware and serving: added `ServerSecurityModeMiddleware`, improved
    `HostRouting`, and static server byte-range/CSP handling.
  - Tests: added Chrome/Puppeteer browser tests and stricter URL routing
    tests covering all modes.

- Migration
  - Default requires wildcard subdomains for full isolation (`admin.`, `web.`,
    `api.`, and `snapshot-id.<base>`).
  - To run on one domain, set `SERVER_SECURITY_MODE` to a one-domain preset;
    URLs switch to `/snapshot/<id>/` and `/original/<domain>/` paths.
  - For production, prefer `safe-subdomains-fullreplay`; lower-security
    modes print a startup warning.
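
The URL switch described in the migration notes might be sketched like this. The function name, the `scheme` parameter, and the exact subdomain layout (snapshot ID as the leftmost label) are assumptions read off the `snapshot-id.<base>` and `/snapshot/<id>/` patterns above, not the project's actual URL builder.

```python
SUBDOMAIN_MODE = "safe-subdomains-fullreplay"
ONE_DOMAIN_MODES = {
    "safe-onedomain-nojsreplay",
    "unsafe-onedomain-noadmin",
    "danger-onedomain-fullreplay",
}


def snapshot_replay_url(
    mode: str,
    base_host: str,
    snapshot_id: str,
    path: str = "",
    scheme: str = "http",
) -> str:
    """Hypothetical sketch: build a replay URL for one snapshot file."""
    path = path.lstrip("/")
    if mode == SUBDOMAIN_MODE:
        # Each snapshot gets its own origin, isolating it from admin/API.
        return f"{scheme}://{snapshot_id}.{base_host}/{path}"
    if mode in ONE_DOMAIN_MODES:
        # Everything shares one origin, so the snapshot moves into the path.
        return f"{scheme}://{base_host}/snapshot/{snapshot_id}/{path}"
    raise ValueError(f"unknown SERVER_SECURITY_MODE: {mode!r}")


if __name__ == "__main__":
    host = "archive.example.com"
    print(snapshot_replay_url(SUBDOMAIN_MODE, host, "abc123", "index.html"))
    # http://abc123.archive.example.com/index.html
    print(snapshot_replay_url("safe-onedomain-nojsreplay", host, "abc123", "index.html"))
    # http://archive.example.com/snapshot/abc123/index.html
```

The subdomain form is why the default needs wildcard DNS: every snapshot ID must resolve as a host under the base domain.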

<sup>Written for commit ad41b15581.
Summary will update on new commits.</sup>

<!-- End of auto-generated description by cubic. -->
2026-03-22 20:17:21 -07:00
Nick Sweeting c87079aa0a Refactor ArchiveBox onto abx-dl bus runner 2026-03-21 11:47:57 -07:00
Nick Sweeting ee9ed440d1 bump dependencies 2026-03-21 10:23:59 -07:00