Nick Sweeting
a350f58e44
Mark hook subprocesses as abxpkg scripts
2026-06-21 08:03:00 -07:00
Nick Sweeting
614a5bd095
Prefer explicit hook library overrides
2026-06-21 07:30:32 -07:00
Nick Sweeting
3fa4fd5d18
Fix runtime hook config regressions
2026-06-21 06:09:04 -07:00
Nick Sweeting
8b57085827
chore: commit local archivebox changes
2026-06-14 11:44:38 -07:00
Nick Sweeting
4fa90e484a
release: archivebox 0.9.34rc71
2026-06-07 20:51:28 -07:00
Nick Sweeting
c0fb8eb532
release: archivebox 0.9.34rc39
2026-06-03 17:19:48 -07:00
Nick Sweeting
7dd738b5b7
release: archivebox 0.9.34rc37
2026-06-01 21:44:23 -07:00
Nick Sweeting
ac6e018672
release: archivebox 0.9.34rc34
2026-06-01 19:23:06 -07:00
Nick Sweeting
5c3161a5c1
release: archivebox 0.9.34rc29
2026-06-01 17:22:59 -07:00
Nick Sweeting
c075d654d8
Consolidate runtime config handling
2026-06-01 15:03:40 -07:00
Nick Sweeting
72a67bd511
Project abxpkg binary events
2026-06-01 02:02:42 -07:00
Nick Sweeting
cab05eb1c6
Refactor plugins search progress and config flows
2026-06-01 00:08:27 -07:00
Nick Sweeting
ecb1764590
switch to external plugins
2026-03-15 03:46:23 -07:00
Nick Sweeting
ec4b27056e
wip
2026-01-21 03:19:56 -08:00
Nick Sweeting
86e7973334
cleanup tui, startup, card templates, and more
2026-01-19 14:33:20 -08:00
Nick Sweeting
bef67760db
working singlefile
2026-01-19 03:05:49 -08:00
Nick Sweeting
b5bbc3b549
better tui
2026-01-19 01:53:32 -08:00
Nick Sweeting
1cb2d5070e
bump version
2026-01-19 01:11:59 -08:00
Nick Sweeting
c7b2217cd6
tons of fixes with codex
2026-01-19 01:00:53 -08:00
Nick Sweeting
0a2ac11b01
more binary fixes
2026-01-05 02:26:33 -08:00
Nick Sweeting
b80e80439d
more binary fixes
2026-01-05 02:18:38 -08:00
Nick Sweeting
7ceaeae2d9
rename archive_org to archivedotorg, add BinaryWorker, fix config pass-through
2026-01-04 22:38:15 -08:00
Nick Sweeting
456aaee287
more migration id/uuid and config propagation fixes
2026-01-04 16:16:26 -08:00
Nick Sweeting
839ae744cf
simplify entrypoints for orchestrator and workers
2026-01-04 13:17:07 -08:00
Nick Sweeting
dd77511026
unified Process source of truth and better screenshot tests
2026-01-02 04:20:34 -08:00
Nick Sweeting
3672174dad
fix transition mid transition
2026-01-02 00:24:44 -08:00
Nick Sweeting
65ee09ceab
move tests into subfolder, add missing install hooks
2026-01-02 00:22:07 -08:00
Nick Sweeting
c2afb40350
fix lib bin dir and archivebox add hanging
2026-01-01 16:58:47 -08:00
Nick Sweeting
9008cefca2
codecov, migrations, orchestrator fixes
2026-01-01 16:57:04 -08:00
Nick Sweeting
60422adc87
fix orchestrator statemachine and Process from archiveresult migrations
2026-01-01 16:43:02 -08:00
Nick Sweeting
876feac522
actually working migration path from 0.7.2 and 0.8.6 + renames and test coverage
2026-01-01 15:50:00 -08:00
Nick Sweeting
6fadcf5168
remove model health stats from models that don't need it
2026-01-01 15:50:00 -08:00
Nick Sweeting
e903fa1d2b
Fix: Make SingleFile use SINGLEFILE_CHROME_ARGS with fallback to CHROME_ARGS (#1754)
Fixes #1445
This PR resolves the issue where SingleFile was not respecting the Chrome
user data directory and other Chrome launch options that work for other
Chrome-based extractors (PDF, Screenshot, etc.).
## Changes
- Added `SINGLEFILE_CHROME_ARGS` config option with fallback to
`CHROME_ARGS`
- Updated SingleFile extractor to pass Chrome arguments via
`--browser-args`
- Updated documentation
This ensures SingleFile respects the same Chrome configuration as other
Chrome-based extractors.
Generated with [Claude Code](https://claude.ai/code)
2026-01-01 14:34:05 -08:00
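The `SINGLEFILE_CHROME_ARGS` → `CHROME_ARGS` fallback described in #1754 can be sketched as follows. This is a minimal illustration, not the extractor's code: it assumes config values are plain Python lists (or `None` when unset), the helper names `resolve_singlefile_chrome_args` and `browser_args_flag` are hypothetical, and it assumes SingleFile's `--browser-args` option accepts the arguments as a JSON array.

```python
import json
from typing import Mapping, Optional, Sequence


def resolve_singlefile_chrome_args(
    config: Mapping[str, Optional[Sequence[str]]],
) -> list[str]:
    """Use SINGLEFILE_CHROME_ARGS when set, otherwise fall back to CHROME_ARGS."""
    override = config.get("SINGLEFILE_CHROME_ARGS")
    if override is not None:
        return list(override)
    # Fallback: reuse the same Chrome args as the other Chrome-based extractors.
    return list(config.get("CHROME_ARGS") or [])


def browser_args_flag(chrome_args: Sequence[str]) -> str:
    """Serialize the args for SingleFile (assumed JSON-array format)."""
    return f"--browser-args={json.dumps(list(chrome_args))}"
```

In this sketch an explicitly empty `SINGLEFILE_CHROME_ARGS` (`[]`) counts as set and disables the fallback; treating only `None` as unset is a design choice made here, not something the commit specifies.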
Claude
09a1ca3134
Fix hook priority conflicts and standardize on_Binary naming
on_Snapshot priority fixes:
- redirects.bg.js stays at 31, staticfile.bg.js → 32
- headers.js stays at 55, readability.py → 56
- mercury.py → 57, htmltotext.py → 58
on_Binary hooks now have numeric priorities:
- 10: npm_install.py
- 11: pip_install.py
- 12: brew_install.py
- 13: apt_install.py
- 14: custom_install.py
- 15: env_install.py
2026-01-01 01:31:52 +00:00
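Assuming hooks for one event run in ascending order of the two-digit number in their filename, two hooks that share a number have no well-defined order relative to each other, which is the kind of conflict fixed above. A minimal sketch of detecting such conflicts and computing the run order, assuming filenames follow the `on_<Event>__<NN>_<name>.<ext>` shape seen elsewhere in this log (for example `on_Crawl__10_chrome_validate.py`); both function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed filename shape: on_<Event>__<NN>_<name>.<ext>
HOOK_RE = re.compile(r"^on_(?P<event>[A-Za-z]+)__(?P<priority>\d{2})_(?P<name>.+)$")


def find_priority_conflicts(filenames: list[str]) -> dict[tuple[str, int], list[str]]:
    """Return every (event, priority) slot that more than one hook claims."""
    slots: dict[tuple[str, int], list[str]] = defaultdict(list)
    for filename in filenames:
        match = HOOK_RE.match(filename)
        if match is None:
            continue  # not a hook file
        slots[(match["event"], int(match["priority"]))].append(filename)
    return {slot: sorted(names) for slot, names in slots.items() if len(names) > 1}


def execution_order(filenames: list[str], event: str) -> list[str]:
    """Hooks for one event, sorted by ascending numeric priority."""
    hooks = []
    for filename in filenames:
        match = HOOK_RE.match(filename)
        if match is not None and match["event"] == event:
            hooks.append((int(match["priority"]), filename))
    return [filename for _, filename in sorted(hooks)]
```

For example, with hypothetical files `on_Snapshot__55_headers.js` and `on_Snapshot__56_readability.py`, `find_priority_conflicts` returns an empty dict, while two files both numbered `55` would be reported under `("Snapshot", 55)`.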
Claude
4d33084496
Remove redundant chrome_validate hook, rename wget_validate to wget_install
- Delete chrome/on_Crawl__10_chrome_validate.py (duplicates chrome_install)
- Rename wget/on_Crawl__11_wget_validate.py → on_Crawl__06_wget_install.py
All hooks now follow consistent naming: install, launch, or config
2025-12-31 23:41:40 +00:00
Nick Sweeting
a04e4a7345
cleanup migrations, json, jsonl
2025-12-31 15:36:43 -08:00
Claude
4c77949197
Clean up on_Crawl hooks: remove duplicates and standardize naming
Deleted dead/duplicate hooks:
- wget/on_Crawl__10_install_wget.py (duplicate of __10_wget_validate_config.py)
- chrome/on_Crawl__00_chrome_install.py (simpler version, kept full one)
- chrome/on_Crawl__20_chrome_launch.bg.js (legacy, kept __30 version)
- singlefile/on_Crawl__20_install_singlefile_extension.js (disabled/dead)
- istilldontcareaboutcookies/on_Crawl__20_install_*.js (legacy)
- ublock/on_Crawl__03_ublock.js (legacy, kept __20 version)
- Entire captcha2/ plugin (legacy version of twocaptcha/)
Renamed hooks to follow consistent pattern: on_Crawl__XX_<plugin>_<action>.<ext>
Priority bands:
00-09: Binary/extension installation
10-19: Config validation
20-29: Browser launch and post-launch config
Final hooks:
00 ripgrep_install.py, 01 chrome_install.py
02 istilldontcareaboutcookies_install.js
03 ublock_install.js, 04 singlefile_install.js
05 twocaptcha_install.js
10 chrome_validate.py, 11 wget_validate.py
20 chrome_launch.bg.js, 25 twocaptcha_config.js
2025-12-31 22:47:36 +00:00
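The naming pattern `on_Crawl__XX_<plugin>_<action>.<ext>` and the priority bands listed above can be checked mechanically. A minimal sketch, assuming the action is the last underscore-separated word before the first `.` (so `chrome_launch.bg.js` parses as plugin `chrome`, action `launch`, extension `bg.js`); `parse_crawl_hook` and `priority_band` are hypothetical helpers, and the band labels are copied from the commit message.

```python
import re
from typing import Optional

CRAWL_HOOK_RE = re.compile(
    r"^on_Crawl__(?P<priority>\d{2})_"
    r"(?P<plugin>[a-z0-9_]+?)_(?P<action>[a-z]+)"  # non-greedy plugin; last word is the action
    r"\.(?P<ext>[a-z0-9.]+)$"
)

PRIORITY_BANDS = [
    (0, 9, "Binary/extension installation"),
    (10, 19, "Config validation"),
    (20, 29, "Browser launch and post-launch config"),
]


def parse_crawl_hook(filename: str) -> Optional[dict]:
    """Split a Crawl hook filename into priority, plugin, action and extension."""
    match = CRAWL_HOOK_RE.match(filename)
    if match is None:
        return None
    parts = match.groupdict()
    parts["priority"] = int(parts["priority"])
    return parts


def priority_band(priority: int) -> str:
    """Map a numeric priority to the band it belongs to."""
    for low, high, label in PRIORITY_BANDS:
        if low <= priority <= high:
            return label
    raise ValueError(f"priority {priority} is outside the documented bands")
```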
Nick Sweeting
73fde81fce
more migrations tweaks
2025-12-31 12:34:31 -08:00
Nick Sweeting
469932b469
more
2025-12-31 12:34:31 -08:00
Nick Sweeting
72f6a91b31
more progress bar and migrations fixes
2025-12-31 12:34:31 -08:00
Nick Sweeting
d5c0c64dcd
fix progress bars
2025-12-31 12:34:29 -08:00
Nick Sweeting
cb97f6651b
Add DNS traffic recorder plugin (#1748)
2025-12-31 11:02:43 -08:00
Nick Sweeting
60a4581ed8
Add tests for accessibility, parse_dom_outlinks, and consolelog plugins (#1749)
2025-12-31 11:01:56 -08:00
claude[bot]
1f84d1b467
Fix test assertions to fail when data is missing
- Add assertIsNotNone for accessibility_data to ensure test fails if no data generated
- Capture and report JSON decode errors in parse_dom_outlinks test
- Add assertIsNotNone for outlinks_data with error details
- Removes conditional checks that allowed tests to pass without verifying functionality
Addresses review comments from cubic-dev-ai
Co-authored-by: Nick Sweeting <pirate@users.noreply.github.com>
2025-12-31 19:00:30 +00:00
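The assertion pattern described above (fail when the hook produced nothing, and report JSON decode errors instead of swallowing them) can be sketched with the standard `unittest` API. This assumes the hook writes one JSON record per line to stdout (JSONL) with plain log lines mixed in; `load_first_json_record` and `assert_hook_produced` are hypothetical helpers, not the actual test code.

```python
import json
import unittest
from typing import Optional


def load_first_json_record(stdout: str) -> tuple[Optional[dict], list[str]]:
    """Return the first JSON object line in the output, plus any decode errors seen."""
    errors: list[str] = []
    for line in stdout.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip plain log lines
        try:
            return json.loads(line), errors
        except json.JSONDecodeError as exc:
            errors.append(f"{exc.msg} in {line[:60]!r}")
    return None, errors


def assert_hook_produced(tc: unittest.TestCase, stdout: str, key: str) -> dict:
    data, errors = load_first_json_record(stdout)
    # Assert before any conditional check, so a missing record fails the test
    # instead of silently skipping the remaining assertions.
    tc.assertIsNotNone(data, f"hook produced no JSON record; decode errors: {errors}")
    tc.assertIn(key, data)
    return data
```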
claude[bot]
483929391d
Fix test assertions to fail properly and add NXDOMAIN deduplication
- test_seo.py: Add assertIsNotNone before conditional to catch SEO extraction failures
- test_ssl.py: Add assertIsNotNone to ensure SSL data is captured from HTTPS URLs
- test_pip_provider.py: Assert jsonl_found variable to verify binary discovery
- dns plugin: Deduplicate NXDOMAIN records using seenResolutions map
Tests now fail when functionality doesn't work (no cheating).
Co-authored-by: Nick Sweeting <pirate@users.noreply.github.com>
2025-12-31 19:00:28 +00:00
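The NXDOMAIN deduplication mentioned for the dns plugin can be sketched as a seen-set keyed per resolution. The plugin itself is JavaScript and keeps a `seenResolutions` map; this Python version is only an illustration, and its record fields (`hostname`, `status`, `type`, `answer`) and keying scheme are assumptions.

```python
from typing import Iterable, Iterator


def dedupe_resolutions(records: Iterable[dict]) -> Iterator[dict]:
    """Yield each DNS resolution once; repeated NXDOMAIN lookups collapse to one record."""
    seen: set[tuple] = set()  # analogous to the plugin's seenResolutions map
    for record in records:
        if record.get("status") == "NXDOMAIN":
            # A failed lookup has no answer, so the hostname alone identifies it.
            key = (record["hostname"], "NXDOMAIN")
        else:
            key = (record["hostname"], record.get("type"), record.get("answer"))
        if key in seen:
            continue
        seen.add(key)
        yield record
```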
Nick Sweeting
edc83bfac6
Add persona CLI command with browser cookie import (#1747)
2025-12-31 10:56:40 -08:00
Claude
2a68248602
Update all Chrome plugins to use shared chrome_utils.js
Refactored 8 plugins to import shared utilities instead of
duplicating code locally:
- consolelog, redirects: Complete rewrite using shared utils
- modalcloser, staticfile: Use readCdpUrl, readTargetId, parseArgs
- dom, screenshot, pdf: Remove local parseArgs/getCdpUrl
- headers: Import getEnv, getEnvBool, getEnvInt, parseArgs
Removes ~380 lines of duplicated boilerplate code.
2025-12-31 18:35:25 +00:00
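The shared `chrome_utils.js` helpers named above (`getEnv`, `getEnvBool`, `getEnvInt`, `parseArgs`) are JavaScript; the sketch below is a Python analog showing what such helpers typically do, not a port of that file. The accepted boolean spellings, the fall-back-to-default behavior, and the `--key=value` argument format are assumptions.

```python
import os
from typing import Mapping, Optional

TRUTHY = {"1", "true", "yes", "on"}
FALSY = {"0", "false", "no", "off"}


def get_env(name: str, default: str = "", env: Optional[Mapping[str, str]] = None) -> str:
    """Read a variable from env (defaults to os.environ), stripped of whitespace."""
    source = os.environ if env is None else env
    value = source.get(name)
    return default if value is None else value.strip()


def get_env_bool(name: str, default: bool = False, env: Optional[Mapping[str, str]] = None) -> bool:
    raw = get_env(name, "", env).lower()
    if raw in TRUTHY:
        return True
    if raw in FALSY:
        return False
    return default  # unset or unrecognized spelling


def get_env_int(name: str, default: int = 0, env: Optional[Mapping[str, str]] = None) -> int:
    try:
        return int(get_env(name, "", env))
    except ValueError:
        return default  # unset or not an integer


def parse_args(argv: list[str]) -> dict[str, object]:
    """Parse --key=value and bare --flag arguments; --snapshot-id becomes snapshot_id."""
    args: dict[str, object] = {}
    for arg in argv:
        if not arg.startswith("--"):
            continue  # ignore positional arguments
        key, sep, value = arg[2:].partition("=")
        args[key.replace("-", "_")] = value if sep else True
    return args
```

Keeping a single copy of these helpers means every plugin interprets the same environment value the same way.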
Claude
263335dc6d
Add tests for merkletree and custom binary provider plugins
- merkletree: Tests merkle tree generation with real files,
empty directory handling, and disabled mode
- custom: Tests custom bash command execution and binary discovery
2025-12-31 18:30:04 +00:00
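A merkle tree over an archive directory can be sketched as follows. The plugin's actual hashing scheme is not described in this log, so every detail here is an assumption: SHA-256, leaves hashed over the relative path, a NUL byte, and the file contents, in sorted path order; the last node duplicated on odd-sized levels; and an empty directory mapped to the hash of empty input, which is one way to give the "empty directory handling" case a defined result.

```python
import hashlib
from pathlib import Path


def _sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(directory: Path) -> str:
    """Hex merkle root over all files under directory (assumed scheme, see above)."""
    files = sorted(
        (p for p in directory.rglob("*") if p.is_file()),
        key=lambda p: p.relative_to(directory).as_posix(),
    )
    # Leaf = H(relative path || NUL || contents), so renames also change the root.
    level = [
        _sha256(p.relative_to(directory).as_posix().encode() + b"\0" + p.read_bytes())
        for p in files
    ]
    if not level:
        return _sha256(b"").hex()  # assumed convention for an empty directory
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        level = [_sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0].hex()
```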
Claude
9703a8e88c
Add tests for responses, staticfile, and env provider plugins
- responses: Tests network response capture during page load
- staticfile: Tests static file detection and download skip for HTML
- env: Tests PATH-based binary discovery (python3, bash)
2025-12-31 18:28:01 +00:00
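PATH-based binary discovery of the kind the env provider test exercises can be sketched with the standard library's `shutil.which`. The fields of the returned record (`name`, `abspath`, `binprovider`) are hypothetical, and the sketch does not attempt version detection.

```python
import shutil
from typing import Optional


def find_binary_on_path(name: str, path: Optional[str] = None) -> Optional[dict]:
    """Look up an executable on PATH (or on an explicit os.pathsep-separated path)."""
    abspath = shutil.which(name, path=path)
    if abspath is None:
        return None  # not installed or not executable
    return {"name": name, "abspath": abspath, "binprovider": "env"}
```

Passing `path=` restricts the search to the given directories, which makes the lookup testable without depending on the host's `PATH`.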
Claude
cfa5edb160
Add tests for accessibility, parse_dom_outlinks, and consolelog plugins
Real integration tests using Chrome sessions with example.com:
- accessibility: Tests page outline and accessibility tree extraction
- parse_dom_outlinks: Tests link extraction and categorization
- consolelog: Tests console output capture
2025-12-31 18:25:48 +00:00
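Link extraction and categorization like the parse_dom_outlinks test covers can be sketched with `html.parser` and `urllib.parse`. The three categories (`internal`, `external`, `other`) and the same-hostname rule are assumptions for illustration; the plugin may categorize differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _AnchorCollector(HTMLParser):
    """Collect href values from <a> tags in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for key, value in attrs:
                if key == "href" and value:
                    self.hrefs.append(value)


def categorize_outlinks(html: str, base_url: str) -> dict[str, list[str]]:
    parser = _AnchorCollector()
    parser.feed(html)
    parser.close()
    base_host = urlparse(base_url).hostname
    result: dict[str, list[str]] = {"internal": [], "external": [], "other": []}
    for href in parser.hrefs:
        url = urljoin(base_url, href)  # resolve relative links against the page URL
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https"):
            bucket = "other"  # mailto:, javascript:, etc.
        elif parsed.hostname == base_host:
            bucket = "internal"
        else:
            bucket = "external"
        if url not in result[bucket]:
            result[bucket].append(url)  # keep first occurrence only
    return result
```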