55 Commits

Author SHA1 Message Date
Vinta Chen 6c18b6447e feat: use explicit Projects section in README 2026-05-04 21:24:57 +08:00
Vinta Chen 921d47b455 remove index.md 2026-05-04 17:11:34 +08:00
Vinta Chen 3510db9df9 update llms.txt 2026-05-04 17:05:05 +08:00
Vinta Chen 509ebaff7a use file modification time as lastmod in sitemap 2026-05-04 16:24:52 +08:00
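
As a rough illustration of the sitemap change above, a file's modification time can be turned into a `lastmod` date as follows. This is a minimal sketch, not the repository's code: the helper name `lastmod_for` and the choice of a UTC date-only value are assumptions.

```python
from datetime import datetime, timezone
from pathlib import Path


def lastmod_for(path: Path) -> str:
    """Return the file's modification time as a YYYY-MM-DD date in UTC.

    Hypothetical helper: the real build may format or source this differently.
    """
    mtime = path.stat().st_mtime  # seconds since the epoch
    return datetime.fromtimestamp(mtime, tz=timezone.utc).date().isoformat()
```

Using an explicit UTC timezone keeps the output independent of the build machine's local time.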
Vinta Chen 28b61a9212 style(seo): switch category page title separator from pipe to hyphen
Google truncates pipe separators and treats hyphens as cleaner word
boundaries in SERP titles.

Co-Authored-By: Claude <noreply@anthropic.com>
2026-05-03 20:03:29 +08:00
Vinta Chen c886e470b6 feat(website): lead category meta description with real description when present, count first as fallback
Co-Authored-By: Claude <noreply@anthropic.com>
2026-05-03 19:57:31 +08:00
Vinta Chen 2f398acefb fix(seo): align JSON-LD with Yoast/RankMath conventions
- Wrap category pages in a self-contained @graph (WebSite + CollectionPage)
- Set canonical @id on CollectionPage to its URL (no hash fragment)
- Expand isPartOf to typed object {"@type": "WebSite", "@id": ...}
- Extract _website_node() and ISPARTOF_WEBSITE constants to avoid repetition
- Update tests to assert @graph structure on category pages

Co-Authored-By: Claude <noreply@anthropic.com>
2026-05-03 19:31:14 +08:00
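
The `@graph` structure described in the commit above might look roughly like the sketch below. The function names, the `#website` identifier, the site name, and the example URL are assumptions made for illustration; only the overall shape (WebSite + CollectionPage, fragment-free `@id`, typed `isPartOf`) comes from the commit message.

```python
import json

SITE_URL = "https://awesome-python.com/"  # illustrative base URL
WEBSITE_ID = SITE_URL + "#website"        # assumed @id for the WebSite node


def website_node() -> dict:
    """Build the shared WebSite node (hypothetical stand-in for _website_node())."""
    return {"@type": "WebSite", "@id": WEBSITE_ID, "url": SITE_URL}


def category_jsonld(name: str, url: str) -> dict:
    """Build a self-contained @graph for one category page."""
    return {
        "@context": "https://schema.org",
        "@graph": [
            website_node(),
            {
                "@type": "CollectionPage",
                "@id": url,  # canonical URL, no hash fragment
                "url": url,
                "name": name,
                # Typed reference back to the WebSite node in the same graph.
                "isPartOf": {"@type": "WebSite", "@id": WEBSITE_ID},
            },
        ],
    }


if __name__ == "__main__":
    page = category_jsonld("Web Frameworks", SITE_URL + "categories/web-frameworks/")
    print(json.dumps(page, indent=2))
```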
Vinta Chen 86d2aa7e01 feat(website): add CollectionPage JSON-LD to category, group, and subcategory pages
Co-Authored-By: Claude <noreply@anthropic.com>
2026-05-03 19:23:14 +08:00
Vinta Chen b2910d59c8 feat(website): add homepage JSON-LD with WebSite, CollectionPage, ItemList for SEO/AEO
Co-Authored-By: Claude <noreply@anthropic.com>
2026-05-03 19:18:15 +08:00
Vinta Chen 3d99f7336d style(website): apply ruff format
Co-Authored-By: Claude <noreply@anthropic.com>
2026-05-03 12:23:55 +08:00
Vinta Chen d3f35a9d21 test(website): remove redundant and brittle tests
Drops tests that either duplicate coverage already provided by adjacent
cases (single-word slugify, trailing-slash checks) or hard-code first-
category names and specific description strings that break whenever the
README content shifts.

Co-Authored-By: Claude <noreply@anthropic.com>
2026-05-03 12:19:32 +08:00
Vinta Chen a068219684 fix(website): type build template entries 2026-05-03 12:08:41 +08:00
Vinta Chen c68b985d7c feat(website): add /sponsorship/ landing page
Adds a dedicated sponsorship page at /sponsorship/ built from the Jinja2
template, with hero stats, tier cards, and CSS. Updates the index.html
sponsor sidebar link to point to /sponsorship/ instead of the GitHub
SPONSORSHIP.md. Adds the URL to the sitemap and test fixtures.

Also renames .impeccable.md to DESIGN.md.

Co-Authored-By: Claude <noreply@anthropic.com>
2026-05-03 09:35:39 +08:00
Vinta Chen d64b47b910 feat(website): mirror index layout on category pages
Add search input, filter chips, no-results block, and back-to-top
button to category/group/subcategory pages. Pass filter_urls_json to
all page types so tag-chip navigation works site-wide. Fix JS so
filter-clear and no-results-clear redirect to / on non-index pages
instead of trying to filter a non-existent local table. Remove the
now-redundant .category-results CSS overrides.

Co-Authored-By: Claude <noreply@anthropic.com>
2026-05-03 08:26:37 +08:00
Vinta Chen 04a04a136b feat(website): add data-url to tag buttons for client-side routing
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 00:44:43 +08:00
Vinta Chen 704332271b fix(website): escape </script> in embedded filter URLs JSON
`| safe` bypasses Jinja autoescape. If a category name ever contained
"</script>", the literal substring would close the script block early,
leaking JSON content into the DOM and creating an XSS vector. Replace
"</" with "<\\/" (still valid JSON) and pass ensure_ascii=False so
non-ASCII names render readably. Also add a group_path() helper to
parallel category_path()/subcategory_path() and reuse category_urls
when seeding filter_urls.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 00:40:52 +08:00
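
The escaping described in the commit above can be sketched in a few lines. The function name `script_safe_json` and the sample payload are hypothetical; the two techniques (replacing `</` with `<\/` and passing `ensure_ascii=False`) are the ones the commit message names.

```python
import json


def script_safe_json(data) -> str:
    """Serialize data for embedding inside a <script> block.

    "<\\/" is still valid JSON (the parser reads "\\/" as "/"), but the HTML
    parser no longer sees a literal "</script>" that would close the block.
    """
    return json.dumps(data, ensure_ascii=False).replace("</", "<\\/")


if __name__ == "__main__":
    payload = {"categories": {"</script><b>x": "/categories/x/"}}
    print(script_safe_json(payload))
```

Because the replacement happens after serialization, parsing the result yields the original data unchanged.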
Vinta Chen e0e7fc9168 feat(website): embed filter-to-url map in index for client routing
Adds filter_urls dict (categories, groups, subcategories) in build.py,
passes filter_urls_json to the template, and injects a JSON script block
before the results section in index.html. Covered by a new test that
verifies all three URL types are present and correctly resolved.

Co-Authored-By: Claude <noreply@anthropic.com>
2026-05-03 00:36:41 +08:00
Vinta Chen e320ba7278 test(website): restore exact sitemap URL list and lastmod count check
Membership-only assertions wouldn't catch phantom URLs added by future
build changes. Tighten back to an exact-list assertion now that we know
the fixture's exact output, and assert lastmod count tracks loc count.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 00:35:11 +08:00
Vinta Chen fe7fd35e18 feat(website): include group and subcategory URLs in sitemap
Co-Authored-By: Claude <noreply@anthropic.com>
2026-05-03 00:32:21 +08:00
Vinta Chen 20df47e1e9 style(website): add CSS for category-breadcrumb and assert absence on parent
Mirrors the .category-subtitle a underline style for visual cohesion in
the hero, and locks in the gating behavior with a negative assertion so
a regression that drops the page_kind guard would be caught.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 00:30:56 +08:00
Vinta Chen 03702231af feat(website): show parent category breadcrumb on subcategory pages
Co-Authored-By: Claude <noreply@anthropic.com>
2026-05-03 00:27:22 +08:00
Vinta Chen eeecacc3bd feat(website): generate static pages for subcategories
Co-Authored-By: Claude <noreply@anthropic.com>
2026-05-03 00:23:14 +08:00
Vinta Chen 532d93d436 feat(website): generate static pages for groups under /categories/
Co-Authored-By: Claude <noreply@anthropic.com>
2026-05-03 00:19:14 +08:00
Vinta Chen cee1e65fb3 test(website): hoist pytest import to module level
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 00:17:53 +08:00
Vinta Chen 583d5e7c51 feat(website): assert unique slugs across categories and groups
Categories and groups will share the /categories/ URL namespace.
Fail the build with a clear error message if a future README change
introduces a collision.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 00:15:25 +08:00
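
A build-time uniqueness check like the one described above could be written as follows. This is a sketch under assumptions: the `slugify` rule shown here is a simplified stand-in, and `assert_unique_slugs` is a hypothetical name, not the repository's function.

```python
import re
from collections import Counter


def slugify(name: str) -> str:
    """Simplified slug rule (assumption): lowercase, non-alphanumerics to hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def assert_unique_slugs(category_names: list[str], group_names: list[str]) -> None:
    """Raise ValueError if any slug appears more than once in the shared namespace."""
    slugs = [slugify(n) for n in [*category_names, *group_names]]
    dupes = sorted(s for s, count in Counter(slugs).items() if count > 1)
    if dupes:
        raise ValueError(f"Slug collision under /categories/: {', '.join(dupes)}")
```

With this check, a category and a group both named "Miscellaneous" would fail the build instead of silently overwriting each other's page.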
Vinta Chen a46b57e428 fix(readme): rename group "Miscellaneous" to "Other"
Avoids a slug collision between the group "Miscellaneous" and the
category of the same name once both share the /categories/ URL
namespace introduced in the upcoming filter-URL refactor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 00:15:19 +08:00
Vinta Chen 39d4b3db4b feat(website): add subcategory_path and subcategory_public_url helpers
Co-Authored-By: Claude <noreply@anthropic.com>
2026-05-03 00:08:04 +08:00
Vinta Chen 4005c2ea82 feat(website): add slug and url to subcategory entries
Co-Authored-By: Claude <noreply@anthropic.com>
2026-05-03 00:05:02 +08:00
Vinta Chen 7fadbaf6fe feat(website): add homepage category directory 2026-05-02 23:44:27 +08:00
Vinta Chen b00395a301 add missing links of category descriptions 2026-05-02 23:35:24 +08:00
Vinta Chen e11afd1730 feat(website): generate static category pages 2026-05-02 23:31:08 +08:00
Vinta Chen 429c9b3d12 feat: generate llms.txt from template and annotate entries with star counts
- Add llms.txt Jinja2 template with a categories_md placeholder
- Extract categories body from README and inject it into the template
- Annotate bullet-entry lines with GitHub star counts (N GitHub stars)
  for the main index.md and bare numbers for llms.txt
- Add TestAnnotateEntriesWithStars unit tests

Co-Authored-By: Claude <noreply@anthropic.com>
2026-05-02 02:32:18 +08:00
Vinta Chen d9f26a8635 Improve SEO/AEO discovery surface for awesome-python.com (#3103)
* update gitignore

* feat: tighten homepage metadata

* fix: trim generated HTML whitespace

* feat(website): add discovery files and markdown alternate

* feat(website): add sitemap lastmod

* feat(seo): add Content-Signal directive to robots.txt

Signals search, ai-input, and ai-train to crawlers
via the experimental Content-Signal directive in robots.txt.

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
2026-05-02 01:53:19 +08:00
Vinta Chen f10337bb31 refactor(tests): modernize test_readme_parser to use pathlib.Path
Replace os.path.join + manual open() with Path(__file__).resolve().parents[2]
and Path.read_text() for locating and reading README.md.

Co-Authored-By: Claude <noreply@anthropic.com>
2026-04-19 22:07:16 +08:00
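
To make the `parents[2]` lookup above concrete, the sketch below shows how the index walks up from a test file to the repository root. The directory layout (`website/tests/`) is an assumption, and `PurePosixPath` is used only so the example runs without touching the filesystem; the commit itself uses `Path(__file__).resolve()`.

```python
from pathlib import PurePosixPath


def repo_root(test_file: PurePosixPath) -> PurePosixPath:
    """Return the directory three levels above the test file.

    parents[0] is the containing directory, parents[1] its parent, and so on.
    """
    return test_file.parents[2]


# Assumed layout: <repo>/website/tests/test_readme_parser.py
example = PurePosixPath("/repo/website/tests/test_readme_parser.py")
readme = repo_root(example) / "README.md"
# In a real test this would be Path(__file__).resolve().parents[2] / "README.md",
# followed by .read_text(encoding="utf-8").
```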
Vinta Chen 39b65bc994 refactor(build): inline format_stars_short into its call site
The helper only appeared once and the logic is two lines, so the named
function added indirection without clarity. Removed the four dedicated
unit tests that covered the function directly.

Co-Authored-By: Claude <noreply@anthropic.com>
2026-04-19 22:00:45 +08:00
Vinta Chen c85f81bb24 refactor(build): accept Path directly in build() signature
Remove internal str->Path conversion; callers and tests now pass
Path objects directly.

Co-Authored-By: Claude <noreply@anthropic.com>
2026-04-19 21:56:06 +08:00
Vinta Chen 520e285e8e test: add entry validation and broken-link detection tests
Add three tests against the real README: verify all entries have
non-empty names, valid http(s) URLs, and no broken markdown link
syntax (e.g. '[name(url)', where the '](' between name and URL is missing).

Co-Authored-By: Claude <noreply@anthropic.com>
2026-04-03 15:55:53 +08:00
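
One way to detect the broken link syntax mentioned above is a regular expression that looks for a `[` followed by `(http` with no `]` in between. This is an illustrative sketch; the pattern and the helper name `find_broken_links` are assumptions, not the test code from the commit.

```python
import re

# A "[" followed by "(http://" or "(https://" before any "]" closes the link text.
BROKEN_LINK_RE = re.compile(r"\[[^\[\]]*\(https?://")


def find_broken_links(markdown: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs whose link is missing the '](' separator."""
    return [
        (number, line)
        for number, line in enumerate(markdown.splitlines(), start=1)
        if BROKEN_LINK_RE.search(line)
    ]
```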
Vinta Chen 1ae889b4fd fix: stricter GitHub owner/repo regexes and injection tests
Split _GITHUB_NAME_RE into separate owner and repo patterns.
Owner regex now rejects leading/trailing hyphens and dots (matching
GitHub's actual username rules). Repo regex requires alphanumeric
start but allows dots and underscores anywhere after.

New tests cover GraphQL injection attempts, invalid leading chars,
and valid hyphenated/underscore/dot combinations.

Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-30 15:03:06 +08:00
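
The split into owner and repo patterns described above might be sketched like this. The exact patterns are assumptions: the owner pattern here allows only alphanumerics and inner hyphens (so dots are rejected everywhere), it does not reject consecutive hyphens, and neither pattern enforces length limits.

```python
import re

# Owner: alphanumeric at both ends, hyphens allowed only in between.
OWNER_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")
# Repo: alphanumeric first character, then letters, digits, dots, underscores, hyphens.
REPO_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def is_valid_repo(owner: str, repo: str) -> bool:
    """Check both parts with fullmatch so no extra characters can slip through."""
    return bool(OWNER_RE.fullmatch(owner)) and bool(REPO_RE.fullmatch(repo))
```

Validating both parts before interpolating them into a GraphQL query means quotes, braces, and whitespace never reach the query string.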
Vinta Chen e71f38ef4e test: add coverage for detect_source_type, format_stars_short, extract_entries, and last_commit_at parsing
Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-23 02:25:44 +08:00
Vinta Chen f27b7c80fb feat(website): add social proof line to hero with star count and build date
Display the awesome-python repo's star count (formatted as '230k+') and
the last data refresh date below the hero CTA. Fetches the self-repo
star count by always including vinta/awesome-python in the stars fetch.
Also removes the footer date stamp, which is now surfaced in the hero.

Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-23 01:56:15 +08:00
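
A label such as '230k+' can be produced by flooring the star count to whole thousands. The sketch below is an assumption about the formatting rule; the commit does not state how counts are rounded, and `format_stars_floor` is a hypothetical name.

```python
def format_stars_floor(stars: int) -> str:
    """Format a star count as a floored 'Nk+' label; small counts stay as-is."""
    if stars < 1000:
        return str(stars)
    return f"{stars // 1000}k+"
```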
Vinta Chen 25a3f4d903 refactor(parser): remove resources parsing, preview, and content_html fields
parse_readme now returns list[ParsedGroup] instead of a tuple. The
resources section (Newsletters, Podcasts), preview string, and
content_html rendering are no longer produced by the parser or consumed
by the build. Removes _render_section_html, _group_by_h2, and the
associated dead code and tests.

Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-23 01:43:19 +08:00
Vinta Chen c5dd3060ef chore: add __pycache__ to .gitignore and remove sys.path hack in tests
Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-23 01:43:12 +08:00
Vinta Chen df2191fc05 refactor(build): remove unused group_categories wrapper
group_categories only ever appended a Resources group when the
resources list was non-empty. All call sites passed an empty list,
making it a no-op indirection. Inline parsed_groups directly and
remove the dead code along with its tests.

Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-22 15:58:42 +08:00
Vinta Chen 81074548b5 test: lower category count floor to 69 to match current README
Several sections were removed in recent cleanup commits, so the previous
floor of 76 was no longer accurate.

Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-22 01:32:32 +08:00
Vinta Chen 4322026817 refactor: parse thematic groups from README bold markers instead of hardcoding them
The website builder previously relied on a hardcoded SECTION_GROUPS list in
build.py to organize categories into thematic groups. This was fragile: any
rename or addition to README.md required a matching code change.

Replace this with a parser-driven approach:
- readme_parser.py now detects bold-only paragraphs (**Group Name**) as
  group boundary markers and groups H2 categories beneath them into
  ParsedGroup structs.
- build.py drops SECTION_GROUPS entirely; group_categories() now just
  passes parsed groups through and appends the Resources group.
- sort.py is removed as it relied on the old flat section model.
- Tests updated throughout to reflect the new (groups, resources) return
  shape and to cover the new grouping logic.

Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-20 18:43:09 +08:00
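
The grouping idea above (bold-only paragraphs open a group, H2 headings beneath them become its categories) can be illustrated with a simplified line-based sketch. Note the swap: the commit describes a markdown-it-py token traversal, whereas this stand-in uses standard-library regexes only, and the `Group` dataclass is a hypothetical reduction of `ParsedGroup`.

```python
import re
from dataclasses import dataclass, field

BOLD_ONLY = re.compile(r"^\*\*(?P<name>[^*]+)\*\*$")  # e.g. **Web Development**
H2 = re.compile(r"^##\s+(?P<name>.+?)\s*$")           # e.g. ## Web Frameworks


@dataclass
class Group:
    name: str
    categories: list[str] = field(default_factory=list)


def parse_groups(markdown: str) -> list[Group]:
    """Collect H2 headings under the most recent bold-only marker line."""
    groups: list[Group] = []
    for raw in markdown.splitlines():
        line = raw.strip()
        if m := BOLD_ONLY.match(line):
            groups.append(Group(m["name"].strip()))
        elif (m := H2.match(line)) and groups:
            # H2 headings before the first marker are ignored in this sketch.
            groups[-1].categories.append(m["name"])
    return groups
```

Because the groups come from the README itself, renaming or adding a group needs no matching code change.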
Vinta Chen 6148c13c0c feat: skip fetching repos whose cache entry is still fresh
Introduce CACHE_MAX_AGE_HOURS (12 h) and filter current_repos before
the fetch loop so repos that were updated recently are not re-requested.
Prints a breakdown of fetched vs. cached counts.

Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-18 22:55:21 +08:00
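
The freshness filter described above could look roughly like the following. The cache shape (a dict keyed by "owner/repo" with an ISO 8601 `fetched_at` field) and the function name `stale_repos` are assumptions; only `CACHE_MAX_AGE_HOURS` and its 12-hour value come from the commit message.

```python
from datetime import datetime, timedelta, timezone

CACHE_MAX_AGE_HOURS = 12


def stale_repos(current_repos: list[str], cache: dict, now: datetime) -> list[str]:
    """Return the repos that are missing from the cache or older than the cutoff."""
    cutoff = now - timedelta(hours=CACHE_MAX_AGE_HOURS)
    stale = []
    for repo in current_repos:
        entry = cache.get(repo)
        # Assumed field name; timestamps are expected to carry a UTC offset.
        fetched_at = datetime.fromisoformat(entry["fetched_at"]) if entry else None
        if fetched_at is None or fetched_at < cutoff:
            stale.append(repo)
    return stale


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    cache = {"vinta/awesome-python": {"stars": 0, "fetched_at": now.isoformat()}}
    repos = ["vinta/awesome-python", "example/new-repo"]
    to_fetch = stale_repos(repos, cache, now)
    print(f"fetching {len(to_fetch)}, cached {len(repos) - len(to_fetch)}")
```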
Vinta Chen 280f250ce0 feat: migrate README parser to markdown-it-py and refresh website
Switch readme_parser.py from regex-based parsing to markdown-it-py for
more robust and maintainable Markdown AST traversal. Update build pipeline,
templates, styles, and JS to support the new parser output. Refresh GitHub
stars data and update tests to match new parser behavior.

Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-18 20:33:36 +08:00
Vinta Chen af3baab2ed refactor: consolidate load_cache into build.load_stars
load_cache was a duplicate of logic now living in build.load_stars.
Switch the call site to the shared helper and remove the redundant
local function and its tests.

Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-18 17:28:53 +08:00
Vinta Chen 0f374970dd refactor: extract parsing logic from build.py into readme_parser module
slugify, parse_readme, count_entries, extract_preview, render_content_html,
and related helpers are moved to a dedicated readme_parser module.
build.py now imports from readme_parser rather than defining these inline.
Tests for the removed functions are dropped from test_build.py since they
now live with the module they test.

Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-18 17:27:14 +08:00
Vinta Chen 03ac212880 test: add integration tests against the real README.md
Adds TestParseRealReadme covering category count, slug generation,
descriptions, entry counts, previews, content HTML, subcategory
rendering, also-see links, and description link stripping.

Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-18 17:25:12 +08:00