## Summary
- Fix TypeScript view examples to use `ctx.sender` as a property,
matching the server SDK `ViewCtx` API.
- Update the architecture overview TypeScript view snippet to use
`players.rowType` and `undefined` for optional view returns.
- Clarify shared `ViewContext` prose so it does not imply every language
uses a callable `ctx.sender()` API.
- Improve docs agent-readiness metadata:
  - publish `/docs/robots.txt` with a docs sitemap directive and
    Content-Signal policy
  - add Markdown alternate links for the existing `/docs/llms.txt` and
    `/docs/llms-full.txt` outputs
  - generate `/.well-known/agent-skills`-style discovery metadata under
    `/docs/.well-known/agent-skills/` from the repo's existing `skills/`
    source files during docs builds
## Validation
- `pnpm --dir docs build`
- `pnpm --dir docs typecheck`
- Verified the docs build emits `robots.txt` and
`.well-known/agent-skills/index.json`.
- Verified generated Agent Skills SHA-256 digests match the emitted
`SKILL.md` artifacts.
## Notes
The Cloudflare agent-readiness scan for `spacetimedb.com` still depends
on the root/marketing host exposing root-level files such as
`/robots.txt`, `/sitemap.xml`, and
`/.well-known/agent-skills/index.json`. This PR makes the docs origin
produce the corresponding docs-scoped artifacts at `/docs/...`; the root
host can route or mirror these if we want the exact `spacetimedb.com`
scan to pick them up.
---------
Co-authored-by: clockwork-labs-bot <clockwork-labs-bot@users.noreply.github.com>
Co-authored-by: rain <rain@rain.local>
Co-authored-by: Tyler Cloutier <cloutiertyler@users.noreply.github.com>
# Description of Changes
LLM benchmark infrastructure improvements and new benchmark tasks.
**Runner & scoring:**
- Add retry logic with backoff for LLM API calls (rate limits,
502/503/504, timeouts)
- Fix `generation_duration_ms` to time only the successful attempt,
excluding failed attempts and backoff sleeps
- Add `--dry-run` flag to run benchmarks without saving results
- Add OpenRouter client as unified fallback when direct vendor keys
aren't set
- Add web search mode via OpenRouter `:online` suffix
- Extract shared OpenAI-compatible response types into `oa_compat.rs`
- Add `ReducerCallBothScorer` for calling reducers on both golden and
LLM databases
- Set `max_tokens` on OpenRouter and Meta clients to prevent silent
truncation
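
For reviewers, the retry and timing bullets above can be pictured with the following minimal sketch. It is not the code in this PR: `CallError`, `Timed`, `is_retryable_status`, and `call_with_retry` are hypothetical names, mapping "rate limit" to HTTP 429 is an assumption, and the doubling backoff schedule is one plausible choice rather than the schedule the runner uses.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical error type: separates failures worth retrying from the rest.
#[derive(Debug)]
pub enum CallError {
    /// Rate limit, 502/503/504, or timeout.
    Transient(String),
    /// Anything else; retrying would not help.
    Fatal(String),
}

/// A successful result plus the duration of the successful attempt only.
#[derive(Debug)]
pub struct Timed<T> {
    pub value: T,
    pub generation_duration_ms: u128,
    pub attempts: u32,
}

/// Assumption: rate limits surface as HTTP 429.
pub fn is_retryable_status(status: u16) -> bool {
    matches!(status, 429 | 502 | 503 | 504)
}

/// Retry `call` up to `max_attempts` times, sleeping between transient failures.
pub fn call_with_retry<T>(
    max_attempts: u32,
    base_delay: Duration,
    mut call: impl FnMut() -> Result<T, CallError>,
) -> Result<Timed<T>, CallError> {
    let mut attempt = 0;
    loop {
        attempt += 1;
        // Restart the clock on every attempt so that earlier failures and
        // backoff sleeps never count toward the reported duration.
        let started = Instant::now();
        match call() {
            Ok(value) => {
                return Ok(Timed {
                    value,
                    generation_duration_ms: started.elapsed().as_millis(),
                    attempts: attempt,
                });
            }
            Err(CallError::Transient(_)) if attempt < max_attempts => {
                // Doubling backoff: base, 2 * base, 4 * base, ...
                let factor = 2u32.saturating_pow(attempt - 1);
                sleep(base_delay.saturating_mul(factor));
            }
            // Fatal errors, or transient errors on the last attempt.
            Err(err) => return Err(err),
        }
    }
}
```

Taking the timestamp inside the loop is the design point: a single timer started before the loop would fold every failed attempt and every sleep into `generation_duration_ms`.
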
**Model routing:**
- Add `ModelRoute` with display name, vendor, API model, and OpenRouter
model ID
- Support ad-hoc model IDs via `--models vendor:model` without static
registration
- Add model name normalization (OpenRouter IDs, case variants →
canonical display names)
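
As a rough illustration of the routing bullets above, here is a sketch of what an ad-hoc `vendor:model` route might look like. The field names, `parse_adhoc_model`, and `normalized_key` are hypothetical and only mirror the four pieces of data listed for `ModelRoute`; deriving the OpenRouter ID as `vendor/model` is an assumption, and the real normalization table for display names is not reproduced here.

```rust
/// Hypothetical shape of a route: display name, vendor, API model,
/// and OpenRouter model ID.
#[derive(Debug, Clone, PartialEq)]
pub struct ModelRoute {
    pub display_name: String,
    pub vendor: String,
    pub api_model: String,
    pub openrouter_model: Option<String>,
}

/// Build a route from an ad-hoc `vendor:model` spec, with no registry lookup.
pub fn parse_adhoc_model(spec: &str) -> Option<ModelRoute> {
    // Split on the first ':' only, so a model ID containing ':' stays intact.
    let (vendor, model) = spec.split_once(':')?;
    let vendor = vendor.trim().to_ascii_lowercase();
    let model = model.trim();
    if vendor.is_empty() || model.is_empty() {
        return None;
    }
    Some(ModelRoute {
        display_name: model.to_string(),
        // Assumption: the OpenRouter ID is `vendor/model`.
        openrouter_model: Some(format!("{}/{}", vendor, model)),
        api_model: model.to_string(),
        vendor,
    })
}

/// Comparison key for name normalization: drop an OpenRouter-style
/// `vendor/` prefix and ignore case.
pub fn normalized_key(name: &str) -> String {
    let bare = name.rsplit('/').next().unwrap_or(name);
    bare.trim().to_ascii_lowercase()
}
```

With this shape, `--models openai:gpt-5-mini` would resolve without a static entry, and `openai/GPT-5-mini` and `gpt-5-mini` would compare equal after normalization.
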
**Context modes:**
- Add `guidelines`, `cursor_rules`, `search`, `no_context` modes with
`is_empty_context_mode()` helper
- Add mode-specific prompt preambles
- Consolidate mode alias normalization (`none`/`no_guidelines` →
`no_context`)
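
The alias consolidation above might look roughly like the sketch below. `normalize_mode` is a hypothetical name, and the body of `is_empty_context_mode` is a guess: this description does not say which modes the helper treats as empty, so the sketch assumes only `no_context` and its aliases qualify.

```rust
/// Map mode aliases onto their canonical name; unknown modes pass through
/// lowercased.
pub fn normalize_mode(mode: &str) -> String {
    match mode.trim().to_ascii_lowercase().as_str() {
        "none" | "no_guidelines" | "no_context" => "no_context".to_string(),
        other => other.to_string(),
    }
}

/// Assumption: only `no_context` (after alias normalization) counts as an
/// empty-context mode. Whether `search` also does is not stated here.
pub fn is_empty_context_mode(mode: &str) -> bool {
    normalize_mode(mode) == "no_context"
}
```

Normalizing in one place means callers can compare against `no_context` alone instead of repeating the alias list.
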
**CI workflows:**
- Add `llm-benchmark-periodic.yml` for scheduled nightly runs with
per-language failure tracking
  - **Note**: The periodic workflow requires `OPENROUTER_API_KEY`,
    `LLM_BENCHMARK_UPLOAD_URL`, and `LLM_BENCHMARK_API_KEY` as GitHub
    secrets.
- Add `llm-benchmark-validate-goldens.yml` to validate that golden
answers still compile
**Results & summary:**
- Add `cmd_status` to show incomplete benchmark combinations with rerun
commands
- Add `cmd_analyze` for LLM-powered failure analysis
- Split `normalize_details_file` from `write_summary_from_details_file`
- Derive task categories from filesystem for summary generation
- Add timestamp tracking (`started_at`/`finished_at`) and token usage
tracking
**New benchmark tasks:**
- 30 new tasks across auth, data_modeling, queries, basics, and schema
categories
- Updated/fixed existing task prompts and golden answers
# API and ABI breaking changes
None. Internal tooling only.
# Expected complexity level and risk
2 — Changes are scoped to the LLM benchmark CLI tool
(`xtask-llm-benchmark`) and CI workflows. No impact on SpacetimeDB core.
# Testing
- [x] `cargo check -p xtask-llm-benchmark` — zero errors, zero warnings
- [x] Dry run: `llm_benchmark run --lang typescript --modes no_context
--tasks t_001 --models openai:gpt-5-mini --dry-run` — ran end-to-end,
confirmed no results saved to disk
- [ ] Verify periodic workflow runs successfully on next scheduled
trigger
---------
Co-authored-by: Tyler Cloutier <cloutiertyler@users.noreply.github.com>