mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-06-22 19:40:39 -04:00
c9e63ccffd
Direct URL inputs from CLI/UI/API now seed Crawl.urls as explicit
{type:CrawlSeed,url,depth} JSONL rows; raw stdin/UI/API import text
stays verbatim. The runner's create_initial_snapshots() is now the
single place that either expands seed rows or creates the synthetic
archivebox://internal root + staticfile/stdin.txt, so the add code
paths no longer perform DB/FS side effects, and the parser hooks run
through the same Snapshot lifecycle as every other extractor.
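
The split described above can be pictured with a minimal sketch. It is
not the project's code: urls_to_seed_rows, parse_seed_rows and
plan_initial_snapshots are hypothetical helpers, the returned plan
dicts and the "stdin_txt" key are assumptions made for illustration,
and only the {type:CrawlSeed,url,depth} row shape and the
archivebox://internal root come from this message. The planner returns
data only, mirroring the idea that the add paths perform no DB/FS side
effects.

```python
import json

SEED_TYPE = "CrawlSeed"  # type tag named in the commit message


def urls_to_seed_rows(urls, depth=0):
    """Hypothetical helper: serialize direct URL inputs as one JSONL row per URL."""
    return "\n".join(
        json.dumps({"type": SEED_TYPE, "url": url, "depth": depth}) for url in urls
    )


def parse_seed_rows(text):
    """Return a list of seed dicts if every non-blank line is a seed row, else None."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            row = json.loads(line)
        except json.JSONDecodeError:
            return None  # not JSONL, so treat the whole input as raw import text
        if not isinstance(row, dict) or row.get("type") != SEED_TYPE or "url" not in row:
            return None
        rows.append(row)
    return rows or None


def plan_initial_snapshots(crawl_urls):
    """Hypothetical stand-in for the runner's decision; returns a plan, touches no DB/FS."""
    seeds = parse_seed_rows(crawl_urls)
    if seeds is not None:
        # Explicit seed rows: one planned snapshot per row.
        return [{"url": s["url"], "depth": s.get("depth", 0)} for s in seeds]
    # Raw import text: one synthetic root carrying the verbatim text (assumed shape).
    return [{"url": "archivebox://internal", "depth": 0, "stdin_txt": crawl_urls}]
```

In this sketch, any input that is not entirely made of seed rows falls
back to the raw-text branch, so pasted import text is never partially
interpreted as seeds.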