mirror of
https://github.com/supabase/supabase.git
synced 2026-06-28 11:33:52 -04:00
20290c71bd
## Summary

Since the guides UA-redirect shipped (GROWTH-811), named LLM bots requesting `/docs/guides/*` get rewritten to the markdown handler, which returns a 404 when no `.md` file exists. About 90K of those 404s per day land on real pages that serve HTML 200 fine: the bot gets nothing on a page that works.

The root cause is that the docs middleware hardcoded `hasMarkdownVariant: true` for every guide path, so it never checked whether a `.md` actually existed.

I fixed it in two layers:

1. A build-time slug manifest makes `hasMarkdownVariant` truthful. Guide pages with no `.md` now fall through to HTML 200 instead of a 404. This is content-source-agnostic and future-proof: a new content source can never silently regress to a 404.
2. A second generator pass emits real markdown for the troubleshooting collection (the largest source, ~70% of the 404 volume), so those bots get clean markdown rather than just HTML.

## Changes

- Add a shared `markdown-sources` module: a single source of truth for which slugs get a `.md` (guides + troubleshooting), so the generator output and the manifest cannot drift.
- Generate markdown for the troubleshooting collection (196 pages, TOML frontmatter parsed via `smol-toml`), written under `public/markdown/guides/troubleshooting/`.
- Emit a build-time slug manifest (a gitignored generated `.ts` module, regenerated in `prebuild`, `predev`, and `pretypecheck`, mirroring the existing `__generated__/graphql.ts` lifecycle).
- Gate the middleware's `hasMarkdownVariant` on the manifest: serve HTML 200 instead of a 404 for guide paths with no markdown variant.

This PR intentionally does not generate markdown for the ai-prompts, YAML config, and externally-fetched (splinter) sources. The HTML fallback covers them now; generating their markdown is follow-up work.

## Testing

Local verification (deterministic, against the real manifest and the real negotiation function):

- Manifest invariant holds: 744 manifest slugs equal 744 generated `.md` files.
- Generator emits 196 troubleshooting files with zero warnings, frontmatter stripped, no leaked delimiters.
- Negotiation decision matrix, 6/6: covered slug + bot UA to markdown; uncovered real page + bot UA to pass (HTML 200); nonexistent + bot UA to pass; browser to HTML; covered + `.md` suffix to markdown; uncovered + `.md` suffix to pass.

Verified on the Vercel preview deploy:

- [x] `User-Agent: ChatGPT-User` on a troubleshooting page returns `200 text/markdown` (real markdown body, frontmatter stripped).
- [x] `User-Agent: ChatGPT-User` on an uncovered real page (`ai-tools/ai-prompts/code-format-sql`) returns `200 text/html` (was 404).
- [x] Browser request to the same uncovered page returns `200 text/html` (unchanged for humans).
- [x] `User-Agent: ChatGPT-User` on a covered standard guide returns `200 text/markdown` (no regression).
- [x] `User-Agent: ChatGPT-User` on a nonexistent guide URL returns `404` (correct).

Known limitation: an explicit `.md`-suffix request on an uncovered page still 404s by design (an explicit markdown request for a page that has no markdown). The ~90K/day volume is plain-URL UA-based, so it is unaffected.

Post-deploy, I will re-run the request-grain 404 reclassification in the GROWTH-915 BQ workspace to confirm fixable guide markdown 404s drop to near zero.

## Linear

- fixes GROWTH-946

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

* **New Features**
  * Added generated markdown slug tracking for docs guides, improving markdown availability detection.
  * Added automated manifest generation and validation during docs build and CI workflows.
* **Bug Fixes**
  * Improved guide markdown negotiation so only supported guide slugs are treated as having a markdown variant.
  * Standardized markdown source handling for guides and troubleshooting pages.
* **Tests**
  * Added coverage for guide and troubleshooting slug generation.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Alaister Young <10985857+alaister@users.noreply.github.com>
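The negotiation decision matrix described in the commit message can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the docs middleware itself: `GUIDE_MARKDOWN_SLUGS`, `LLM_BOT_PATTERN`, `negotiateGuide`, and the slug `auth/overview` are hypothetical names chosen for the example, and the real bot list and manifest module are not shown in this commit.

```typescript
// Hypothetical sketch of manifest-gated markdown negotiation.
// None of these identifiers are taken from the actual docs app.
type Decision = 'markdown' | 'pass'

// Stand-in for the generated build-time slug manifest.
const GUIDE_MARKDOWN_SLUGS: ReadonlySet<string> = new Set([
  'auth/overview', // hypothetical covered guide
  'troubleshooting/example-issue', // hypothetical covered troubleshooting page
])

// Assumed bot detection; only ChatGPT-User is named in the commit message.
const LLM_BOT_PATTERN = /ChatGPT-User/i

function negotiateGuide(pathname: string, userAgent: string): Decision {
  const prefix = '/docs/guides/'
  if (!pathname.startsWith(prefix)) return 'pass'

  const explicitMd = pathname.endsWith('.md')
  const slug = pathname.slice(prefix.length).replace(/\.md$/, '')

  // The fix: consult the manifest instead of hardcoding `true`.
  const hasMarkdownVariant = GUIDE_MARKDOWN_SLUGS.has(slug)
  const wantsMarkdown = explicitMd || LLM_BOT_PATTERN.test(userAgent)

  // 'pass' means the request continues to the normal HTML route,
  // which serves 200 for real pages and 404 for nonexistent ones.
  return wantsMarkdown && hasMarkdownVariant ? 'markdown' : 'pass'
}
```

In this sketch an uncovered page never reaches the markdown handler, so a bot request falls through to whatever the HTML route returns.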
43 lines
1.3 KiB
TypeScript
import path from 'node:path'
import { globby } from 'globby'

// Frontmatter syntax used by a content collection.
export type FrontmatterFormat = 'yaml' | 'toml'

// One page that gets a generated markdown variant.
export interface MarkdownSource {
  // Path of the source `.mdx` file, as returned by the glob.
  sourceFile: string
  // URL slug relative to the guides root.
  slug: string
  // Destination of the generated `.md` file.
  outPath: string
  // Frontmatter format to parse before stripping.
  frontmatter: FrontmatterFormat
}

const OUTPUT_ROOT = 'public/markdown/guides'
// `!(_)*.mdx` skips files whose names start with an underscore.
const GUIDES_GLOB = 'content/guides/**/!(_)*.mdx'
// Troubleshooting pages are matched one level deep only.
const TROUBLESHOOTING_GLOB = 'content/troubleshooting/!(_)*.mdx'

// Strips the `content/guides/` prefix and the `.mdx` extension.
export function guideSlug(sourceFile: string): string {
  return sourceFile.replace(/^content\/guides\//, '').replace(/\.mdx$/, '')
}

// Troubleshooting pages are served under the guides `troubleshooting/` prefix.
export function troubleshootingSlug(sourceFile: string): string {
  return `troubleshooting/${path.basename(sourceFile, '.mdx')}`
}

// Collects every source that should produce a `.md`, guides first.
export async function collectMarkdownSources(): Promise<MarkdownSource[]> {
  const [guideFiles, troubleshootingFiles] = await Promise.all([
    globby([GUIDES_GLOB]),
    globby([TROUBLESHOOTING_GLOB]),
  ])

  // Guides carry YAML frontmatter.
  const guides: MarkdownSource[] = guideFiles.map((sourceFile) => {
    const slug = guideSlug(sourceFile)
    return { sourceFile, slug, outPath: `${OUTPUT_ROOT}/${slug}.md`, frontmatter: 'yaml' }
  })

  // Troubleshooting pages carry TOML frontmatter.
  const troubleshooting: MarkdownSource[] = troubleshootingFiles.map((sourceFile) => {
    const slug = troubleshootingSlug(sourceFile)
    return { sourceFile, slug, outPath: `${OUTPUT_ROOT}/${slug}.md`, frontmatter: 'toml' }
  })

  return [...guides, ...troubleshooting]
}
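As an illustration of how a slug manifest might be derived from the output of `collectMarkdownSources`, here is a small sketch. `SourceLike` and `buildSlugManifest` are hypothetical names, not part of this module, and the real manifest generator is not shown here. The duplicate check is an added assumption: both globs can in principle yield a slug starting with `troubleshooting/`, though whether that happens depends on the content tree.

```typescript
// Hypothetical helper, not part of the module above.
// Structural subset of MarkdownSource that this sketch needs.
interface SourceLike {
  sourceFile: string
  slug: string
}

// Returns the sorted list of slugs, throwing if two sources share a slug
// (they would otherwise overwrite the same generated `.md` file).
function buildSlugManifest(sources: SourceLike[]): string[] {
  const seen = new Map<string, string>()
  for (const source of sources) {
    const previous = seen.get(source.slug)
    if (previous !== undefined) {
      throw new Error(
        `Duplicate slug "${source.slug}" from ${previous} and ${source.sourceFile}`
      )
    }
    seen.set(source.slug, source.sourceFile)
  }
  return Array.from(seen.keys()).sort()
}
```

A generator could pass the array returned by `collectMarkdownSources()` straight into such a helper, since `MarkdownSource` has both fields.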