mirror of https://github.com/clockworklabs/SpacetimeDB.git
synced 2026-06-27 16:30:35 -04:00
be86a512f2
# Description of Changes

LLM benchmark infrastructure improvements and new benchmark tasks.

**Runner & scoring:**

- Add retry logic with backoff for LLM API calls (rate limits, 502/503/504, timeouts)
- Fix `generation_duration_ms` to only time the successful attempt, not retries+sleep delays
- Add `--dry-run` flag to run benchmarks without saving results
- Add OpenRouter client as unified fallback when direct vendor keys aren't set
- Add web search mode via OpenRouter `:online` suffix
- Extract shared OpenAI-compatible response types into `oa_compat.rs`
- Add `ReducerCallBothScorer` for calling reducers on both golden and LLM databases
- Set `max_tokens` on OpenRouter and Meta clients to prevent silent truncation

**Model routing:**

- Add `ModelRoute` with display name, vendor, API model, and OpenRouter model ID
- Support ad-hoc model IDs via `--models vendor:model` without static registration
- Add model name normalization (OpenRouter IDs, case variants → canonical display names)

**Context modes:**

- Add `guidelines`, `cursor_rules`, `search`, `no_context` modes with `is_empty_context_mode()` helper
- Add mode-specific prompt preambles
- Consolidate mode alias normalization (`none`/`no_guidelines` → `no_context`)

**CI workflows:**

- Add `llm-benchmark-periodic.yml` for scheduled nightly runs with per-language failure tracking
  - **Note**: The periodic workflow requires `OPENROUTER_API_KEY`, `LLM_BENCHMARK_UPLOAD_URL`, and `LLM_BENCHMARK_API_KEY` as GitHub secrets.
- Add `llm-benchmark-validate-goldens.yml` for validating golden answers still compile

**Results & summary:**

- Add `cmd_status` to show incomplete benchmark combinations with rerun commands
- Add `cmd_analyze` for LLM-powered failure analysis
- Split `normalize_details_file` from `write_summary_from_details_file`
- Derive task categories from filesystem for summary generation
- Add timestamp tracking (`started_at`/`finished_at`) and token usage

**New benchmark tasks:**

- 30 new tasks across auth, data_modeling, queries, basics, and schema categories
- Updated/fixed existing task prompts and golden answers

# API and ABI breaking changes

None. Internal tooling only.

# Expected complexity level and risk

2: Changes are scoped to the LLM benchmark CLI tool (`xtask-llm-benchmark`) and CI workflows. No impact on SpacetimeDB core.

# Testing

- [x] `cargo check -p xtask-llm-benchmark`: zero errors, zero warnings
- [x] Dry run: `llm_benchmark run --lang typescript --modes no_context --tasks t_001 --models openai:gpt-5-mini --dry-run` ran end-to-end; confirmed no results saved to disk
- [ ] Verify periodic workflow runs successfully on next scheduled trigger

---------

Co-authored-by: Tyler Cloutier <cloutiertyler@users.noreply.github.com>
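The Runner & scoring items pair two ideas: retry transient LLM API failures (rate limits, 502/503/504, timeouts) with backoff, and report `generation_duration_ms` for the successful attempt only. The benchmark tool itself is Rust and its code is not shown here, so the following is only a minimal JavaScript sketch of that pattern. `withRetry`, `isRetryable`, the `status`/`code`/`name` error fields, and the doubling delay schedule are all hypothetical choices, not the tool's actual implementation.

```javascript
// Hypothetical sketch: retry an async LLM API call with exponential backoff.
// withRetry, isRetryable, and the err.status / err.code / err.name fields are
// illustrative assumptions, not the benchmark tool's actual (Rust) API.

// 429 = rate limited; 502/503/504 = transient gateway/upstream errors.
const RETRYABLE_STATUS = new Set([429, 502, 503, 504]);

function isRetryable(err) {
  return (
    RETRYABLE_STATUS.has(err?.status) ||
    err?.code === 'ETIMEDOUT' || // socket timeout in Node
    err?.name === 'TimeoutError' // e.g. fetch aborted via AbortSignal.timeout()
  );
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function withRetry(call, { maxAttempts = 4, baseDelayMs = 500 } = {}) {
  for (let attempt = 1; ; attempt++) {
    // Reset the clock on every attempt so failed attempts and backoff
    // sleeps are excluded from the reported duration.
    const start = performance.now();
    try {
      const value = await call();
      return { value, attempts: attempt, durationMs: performance.now() - start };
    } catch (err) {
      if (attempt >= maxAttempts || !isRetryable(err)) throw err;
      // Wait base, 2*base, 4*base, ... before the next attempt.
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
}

// Usage: a fake call that fails with 503 twice, then succeeds.
let calls = 0;
async function flakyCall() {
  calls += 1;
  if (calls < 3) throw Object.assign(new Error('unavailable'), { status: 503 });
  return 'ok';
}

withRetry(flakyCall, { baseDelayMs: 10 }).then((r) => console.log(r.value, r.attempts));
// prints "ok 3"
```

Because `start` is taken inside the loop, the returned `durationMs` covers only the final, successful call; measuring from before the loop would instead fold every failed attempt and every backoff sleep into the figure.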
49 lines
1.3 KiB
JavaScript
#!/usr/bin/env node
/**
 * Post-build script: copies the plugin-generated llms.txt to static/llms.md
 * so it can be committed to the repo.
 *
 * Usage: pnpm build && node scripts/generate-llms.mjs
 * or: pnpm generate-llms
 */
import { promises as fs } from 'node:fs';
import path from 'node:path';
import { fileURLToPath } from 'node:url';

const __dirname = path.dirname(fileURLToPath(import.meta.url));
const BUILD_DIR = path.resolve(__dirname, '../build');
const STATIC_DIR = path.resolve(__dirname, '../static');

async function findInBuild(filename) {
  for (const candidate of [filename, `docs/${filename}`]) {
    const p = path.join(BUILD_DIR, candidate);
    try {
      await fs.access(p);
      return p;
    } catch {}
  }
  return null;
}

async function main() {
  const src = await findInBuild('llms.txt');
  if (!src) {
    console.error('Error: llms.txt not found in build output.');
    console.error('Run "pnpm build" first to generate it via the plugin.');
    process.exit(1);
  }

  const content = await fs.readFile(src, 'utf8');
  const dest = path.join(STATIC_DIR, 'llms.md');
  await fs.writeFile(dest, content, 'utf8');

  const lines = content.split('\n').length;
  console.log(`${src} -> ${dest}`);
  console.log(` ${lines} lines`);
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
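The script resolves `BUILD_DIR` and `STATIC_DIR` relative to its own location, so the lookup order in `findInBuild` (build root first, then `docs/`) is hard to observe directly. A minimal sketch, assuming a hypothetical `copyLlms(buildDir, staticDir)` helper that is not part of the script: it repeats the same lookup and copy, but takes the directories as arguments and returns `null` instead of exiting, so it can run against throwaway fixtures. Like the script, it assumes an ES module context (for example, a `.mjs` file).

```javascript
import { promises as fs } from 'node:fs';
import os from 'node:os';
import path from 'node:path';

// Hypothetical helper (not part of generate-llms.mjs): the same lookup-and-copy
// logic, with directories passed in and null returned when nothing is found.
async function copyLlms(buildDir, staticDir) {
  // Same order as findInBuild: build root first, then docs/.
  for (const candidate of ['llms.txt', 'docs/llms.txt']) {
    const src = path.join(buildDir, candidate);
    try {
      await fs.access(src);
    } catch {
      continue; // not at this location; try the next candidate
    }
    const content = await fs.readFile(src, 'utf8');
    const dest = path.join(staticDir, 'llms.md');
    await fs.writeFile(dest, content, 'utf8');
    return { src, dest, lines: content.split('\n').length };
  }
  return null;
}

// Usage: place llms.txt only under build/docs/ so the fallback location is used.
const tmp = await fs.mkdtemp(path.join(os.tmpdir(), 'llms-demo-'));
const buildDir = path.join(tmp, 'build');
const staticDir = path.join(tmp, 'static');
await fs.mkdir(path.join(buildDir, 'docs'), { recursive: true });
await fs.mkdir(staticDir);
await fs.writeFile(path.join(buildDir, 'docs', 'llms.txt'), '# Title\nbody\n', 'utf8');

const result = await copyLlms(buildDir, staticDir);
console.log(path.relative(tmp, result.src), result.lines);
// On POSIX this prints "build/docs/llms.txt 3": split('\n') counts the empty
// string after the trailing newline, as the script's own line count does.
```

Returning `null` rather than calling `process.exit(1)` leaves the failure policy to the caller, which is what allows the helper to be exercised several times in one process.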