127 Commits

Author SHA1 Message Date
Nick Sweeting a350f58e44 Mark hook subprocesses as abxpkg scripts 2026-06-21 08:03:00 -07:00
Nick Sweeting 614a5bd095 Prefer explicit hook library overrides 2026-06-21 07:30:32 -07:00
Nick Sweeting 3fa4fd5d18 Fix runtime hook config regressions 2026-06-21 06:09:04 -07:00
Nick Sweeting 8b57085827 chore: commit local archivebox changes 2026-06-14 11:44:38 -07:00
Nick Sweeting 4fa90e484a release: archivebox 0.9.34rc71 2026-06-07 20:51:28 -07:00
Nick Sweeting c0fb8eb532 release: archivebox 0.9.34rc39 2026-06-03 17:19:48 -07:00
Nick Sweeting 7dd738b5b7 release: archivebox 0.9.34rc37 2026-06-01 21:44:23 -07:00
Nick Sweeting ac6e018672 release: archivebox 0.9.34rc34 2026-06-01 19:23:06 -07:00
Nick Sweeting 5c3161a5c1 release: archivebox 0.9.34rc29 2026-06-01 17:22:59 -07:00
Nick Sweeting c075d654d8 Consolidate runtime config handling 2026-06-01 15:03:40 -07:00
Nick Sweeting 72a67bd511 Project abxpkg binary events 2026-06-01 02:02:42 -07:00
Nick Sweeting cab05eb1c6 Refactor plugins search progress and config flows 2026-06-01 00:08:27 -07:00
Nick Sweeting ecb1764590 switch to external plugins 2026-03-15 03:46:23 -07:00
Nick Sweeting ec4b27056e wip 2026-01-21 03:19:56 -08:00
Nick Sweeting 86e7973334 cleanup tui, startup, card templates, and more 2026-01-19 14:33:20 -08:00
Nick Sweeting bef67760db working singlefile 2026-01-19 03:05:49 -08:00
Nick Sweeting b5bbc3b549 better tui 2026-01-19 01:53:32 -08:00
Nick Sweeting 1cb2d5070e bump version 2026-01-19 01:11:59 -08:00
Nick Sweeting c7b2217cd6 tons of fixes with codex 2026-01-19 01:00:53 -08:00
Nick Sweeting 0a2ac11b01 more binary fixes 2026-01-05 02:26:33 -08:00
Nick Sweeting b80e80439d more binary fixes 2026-01-05 02:18:38 -08:00
Nick Sweeting 7ceaeae2d9 rename archive_org to archivedotorg, add BinaryWorker, fix config pass-through 2026-01-04 22:38:15 -08:00
Nick Sweeting 456aaee287 more migration id/uuid and config propagation fixes 2026-01-04 16:16:26 -08:00
Nick Sweeting 839ae744cf simplify entrypoints for orchestrator and workers 2026-01-04 13:17:07 -08:00
Nick Sweeting dd77511026 unified Process source of truth and better screenshot tests 2026-01-02 04:20:34 -08:00
Nick Sweeting 3672174dad fix transition mid transition 2026-01-02 00:24:44 -08:00
Nick Sweeting 65ee09ceab move tests into subfolder, add missing install hooks 2026-01-02 00:22:07 -08:00
Nick Sweeting c2afb40350 fix lib bin dir and archivebox add hanging 2026-01-01 16:58:47 -08:00
Nick Sweeting 9008cefca2 codecov, migrations, orchestrator fixes 2026-01-01 16:57:04 -08:00
Nick Sweeting 60422adc87 fix orchestrator statemachine and Process from archiveresult migrations 2026-01-01 16:43:02 -08:00
Nick Sweeting 876feac522 actually working migration path from 0.7.2 and 0.8.6 + renames and test coverage 2026-01-01 15:50:00 -08:00
Nick Sweeting 6fadcf5168 remove model health stats from models that don't need it 2026-01-01 15:50:00 -08:00
Nick Sweeting e903fa1d2b Fix: Make SingleFile use SINGLEFILE_CHROME_ARGS with fallback to CHROME_ARGS (#1754)
Fixes #1445

This PR resolves the issue where SingleFile was not respecting Chrome
user data directory and other Chrome launch options that work for other
Chrome-based extractors (PDF, Screenshot, etc.).

## Changes
- Added `SINGLEFILE_CHROME_ARGS` config option with fallback to
`CHROME_ARGS`
- Updated SingleFile extractor to pass Chrome arguments via
`--browser-args`
- Updated documentation

This ensures SingleFile respects the same Chrome configuration as other
Chrome-based extractors.

Generated with [Claude Code](https://claude.ai/code)
2026-01-01 14:34:05 -08:00
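
The fallback described in the commit above (#1754) can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the helper names `resolve_singlefile_chrome_args` and `browser_args_flag` are hypothetical, and it assumes the config values are shell-style strings read from the environment and that `--browser-args` takes a JSON array.

```python
import json
import os
import shlex


def resolve_singlefile_chrome_args(env=None):
    """Return SINGLEFILE_CHROME_ARGS if set, otherwise fall back to CHROME_ARGS.

    Hypothetical helper: assumes both options are shell-style strings.
    """
    env = os.environ if env is None else env
    raw = env.get("SINGLEFILE_CHROME_ARGS") or env.get("CHROME_ARGS") or ""
    return shlex.split(raw)


def browser_args_flag(chrome_args):
    """Build the CLI flag for SingleFile (assumed to accept a JSON array)."""
    if not chrome_args:
        return []  # nothing configured, so pass no flag at all
    return ["--browser-args=" + json.dumps(chrome_args)]


if __name__ == "__main__":
    args = resolve_singlefile_chrome_args({"CHROME_ARGS": "--user-data-dir=/data/chrome"})
    print(browser_args_flag(args))
```

An empty or unset `SINGLEFILE_CHROME_ARGS` falls through to `CHROME_ARGS`, which matches the "same Chrome configuration as other extractors" behaviour the commit describes.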
Claude 09a1ca3134 Fix hook priority conflicts and standardize on_Binary naming
on_Snapshot priority fixes:
- redirects.bg.js stays at 31, staticfile.bg.js → 32
- headers.js stays at 55, readability.py → 56
- mercury.py → 57, htmltotext.py → 58

on_Binary hooks now have numeric priorities:
- 10: npm_install.py
- 11: pip_install.py
- 12: brew_install.py
- 13: apt_install.py
- 14: custom_install.py
- 15: env_install.py
2026-01-01 01:31:52 +00:00
Claude 4d33084496 Remove redundant chrome_validate hook, rename wget_validate to wget_install
- Delete chrome/on_Crawl__10_chrome_validate.py (duplicates chrome_install)
- Rename wget/on_Crawl__11_wget_validate.py → on_Crawl__06_wget_install.py

All hooks now follow consistent naming: install, launch, or config
2025-12-31 23:41:40 +00:00
Nick Sweeting a04e4a7345 cleanup migrations, json, jsonl 2025-12-31 15:36:43 -08:00
Claude 4c77949197 Clean up on_Crawl hooks: remove duplicates and standardize naming
Deleted dead/duplicate hooks:
- wget/on_Crawl__10_install_wget.py (duplicate of __10_wget_validate_config.py)
- chrome/on_Crawl__00_chrome_install.py (simpler version, kept full one)
- chrome/on_Crawl__20_chrome_launch.bg.js (legacy, kept __30 version)
- singlefile/on_Crawl__20_install_singlefile_extension.js (disabled/dead)
- istilldontcareaboutcookies/on_Crawl__20_install_*.js (legacy)
- ublock/on_Crawl__03_ublock.js (legacy, kept __20 version)
- Entire captcha2/ plugin (legacy version of twocaptcha/)

Renamed hooks to follow consistent pattern: on_Crawl__XX_<plugin>_<action>.<ext>
Priority bands:
  00-09: Binary/extension installation
  10-19: Config validation
  20-29: Browser launch and post-launch config

Final hooks:
  00 ripgrep_install.py, 01 chrome_install.py
  02 istilldontcareaboutcookies_install.js
  03 ublock_install.js, 04 singlefile_install.js
  05 twocaptcha_install.js
  10 chrome_validate.py, 11 wget_validate.py
  20 chrome_launch.bg.js, 25 twocaptcha_config.js
2025-12-31 22:47:36 +00:00
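
The naming pattern and priority bands listed in the commit above can be illustrated with a small parser. This is a sketch under stated assumptions, not ArchiveBox's actual hook loader: `Hook`, `parse_hook`, `priority_band`, and `sort_hooks` are hypothetical names, the regex is inferred from the filenames shown in the commit message, and the bands apply only to `on_Crawl` hooks as listed there.

```python
import re
from typing import List, NamedTuple, Optional

# Inferred from names such as on_Crawl__20_chrome_launch.bg.js (assumption).
HOOK_RE = re.compile(
    r"^on_(?P<event>[A-Za-z]+)__(?P<priority>\d{2})_(?P<name>.+?)\.(?P<ext>.+)$"
)


class Hook(NamedTuple):
    event: str
    priority: int
    name: str  # "<plugin>_<action>", e.g. "chrome_launch"
    ext: str   # everything after the first dot, e.g. "bg.js"


def parse_hook(filename: str) -> Optional[Hook]:
    """Split a hook filename into its parts, or return None if it does not match."""
    m = HOOK_RE.match(filename)
    if m is None:
        return None
    return Hook(m["event"], int(m["priority"]), m["name"], m["ext"])


def priority_band(priority: int) -> str:
    """Map an on_Crawl priority to the band named in the commit message."""
    if 0 <= priority <= 9:
        return "Binary/extension installation"
    if 10 <= priority <= 19:
        return "Config validation"
    if 20 <= priority <= 29:
        return "Browser launch and post-launch config"
    return "unassigned"


def sort_hooks(filenames: List[str]) -> List[Hook]:
    """Parse filenames, drop non-hooks, and order by numeric priority."""
    hooks = [h for h in map(parse_hook, filenames) if h is not None]
    return sorted(hooks, key=lambda h: (h.priority, h.name))
```

Sorting on the two-digit prefix is what makes a shared priority (the conflict fixed in 09a1ca3134) ambiguous, which is presumably why each hook was given a distinct number.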
Nick Sweeting 73fde81fce more migrations tweaks 2025-12-31 12:34:31 -08:00
Nick Sweeting 469932b469 more 2025-12-31 12:34:31 -08:00
Nick Sweeting 72f6a91b31 more progress bar and migrations fixes 2025-12-31 12:34:31 -08:00
Nick Sweeting d5c0c64dcd fix progress bars 2025-12-31 12:34:29 -08:00
Nick Sweeting cb97f6651b Add DNS traffic recorder plugin (#1748) 2025-12-31 11:02:43 -08:00
Nick Sweeting 60a4581ed8 Add tests for accessibility, parse_dom_outlinks, and consolelog plugins (#1749) 2025-12-31 11:01:56 -08:00
claude[bot] 1f84d1b467 Fix test assertions to fail when data is missing
- Add assertIsNotNone for accessibility_data to ensure test fails if no data generated
- Capture and report JSON decode errors in parse_dom_outlinks test
- Add assertIsNotNone for outlinks_data with error details
- Removes conditional checks that allowed tests to pass without verifying functionality

Addresses review comments from cubic-dev-ai

Co-authored-by: Nick Sweeting <pirate@users.noreply.github.com>
2025-12-31 19:00:30 +00:00
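
The problem this commit addresses, conditional checks that let a test pass when no data was produced, can be shown in a few lines. This is a generic sketch, not the project's test code: `weak_check` and `strict_check` are hypothetical helpers, and the `"outline"` key is an illustrative stand-in for whatever the plugin emits.

```python
import unittest

# A bare TestCase instance gives access to the assert* helpers outside a test class.
tc = unittest.TestCase()


def weak_check(accessibility_data):
    """Old style: silently passes when the plugin produced nothing."""
    if accessibility_data:
        tc.assertIn("outline", accessibility_data)


def strict_check(accessibility_data):
    """New style: missing data is itself a failure."""
    tc.assertIsNotNone(accessibility_data, "no accessibility data generated")
    tc.assertIn("outline", accessibility_data)


if __name__ == "__main__":
    weak_check(None)  # passes without verifying anything
    try:
        strict_check(None)
    except AssertionError as err:
        print("strict check failed as intended:", err)
```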
claude[bot] 483929391d Fix test assertions to fail properly and add NXDOMAIN deduplication
- test_seo.py: Add assertIsNotNone before conditional to catch SEO extraction failures
- test_ssl.py: Add assertIsNotNone to ensure SSL data is captured from HTTPS URLs
- test_pip_provider.py: Assert jsonl_found variable to verify binary discovery
- dns plugin: Deduplicate NXDOMAIN records using seenResolutions map

Tests now fail when functionality doesn't work (no cheating).

Co-authored-by: Nick Sweeting <pirate@users.noreply.github.com>
2025-12-31 19:00:28 +00:00
Nick Sweeting edc83bfac6 Add persona CLI command with browser cookie import (#1747) 2025-12-31 10:56:40 -08:00
Claude 2a68248602 Update all Chrome plugins to use shared chrome_utils.js
Refactored 8 plugins to import shared utilities instead of
duplicating code locally:
- consolelog, redirects: Complete rewrite using shared utils
- modalcloser, staticfile: Use readCdpUrl, readTargetId, parseArgs
- dom, screenshot, pdf: Remove local parseArgs/getCdpUrl
- headers: Import getEnv, getEnvBool, getEnvInt, parseArgs

Removes ~380 lines of duplicated boilerplate code.
2025-12-31 18:35:25 +00:00
Claude 263335dc6d Add tests for merkletree and custom binary provider plugins
- merkletree: Tests merkle tree generation with real files,
  empty directory handling, and disabled mode
- custom: Tests custom bash command execution and binary discovery
2025-12-31 18:30:04 +00:00
Claude 9703a8e88c Add tests for responses, staticfile, and env provider plugins
- responses: Tests network response capture during page load
- staticfile: Tests static file detection and download skip for HTML
- env: Tests PATH-based binary discovery (python3, bash)
2025-12-31 18:28:01 +00:00
Claude cfa5edb160 Add tests for accessibility, parse_dom_outlinks, and consolelog plugins
Real integration tests using Chrome sessions with example.com:
- accessibility: Tests page outline and accessibility tree extraction
- parse_dom_outlinks: Tests link extraction and categorization
- consolelog: Tests console output capture
2025-12-31 18:25:48 +00:00