SpacetimeDB

PublicArchive/SpacetimeDB

mirror of https://github.com/clockworklabs/SpacetimeDB.git synced 2026-06-28 00:38:30 -04:00

Commit Graph

Author

SHA1

Message

Date

clockwork-labs-bot

f83d41c75c

docs: consolidate outstanding docs fixes (#5166)

## Summary

Consolidates the outstanding docs PRs opened by `clockwork-labs-bot` /
Docs Gremlin into one reviewable PR:

- #4958
- #5085
- #5089
- #5097
- #5112
- #5114
- #5117
- #5127
- #5138
- #5165
- #5175
- #5222

This combines docs updates for:

- getting-started links
- server-issued auth token reconnect behavior in the Unity tutorial
- C++ module/client language coverage
- Unreal client ticking / `FrameTick` guidance
- C# connection callback signatures and codegen language spelling
- deterministic schedule-table sample time
- client frame ticking troubleshooting
- corrected `spacetime generate` usage for Unreal bindings
- TypeScript framework integration reference updates for SolidJS and
current React query-builder tuple usage

## Validation

- `rg -n '^(<<<<<<<|=======|>>>>>>>)' docs crates skills`
- `git diff --check origin/master...HEAD`
- `pnpm --dir docs typecheck`
- `pnpm --dir docs build`
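
The first validation command searches for lines that begin with a Git conflict marker. As a rough illustration of what that pattern matches, here is a minimal Rust sketch; `conflict_marker_lines` is a hypothetical helper written for this note, not part of the repository, and it only mirrors the `^(<<<<<<<|=======|>>>>>>>)` prefix test on a single string.

```rust
/// Hypothetical helper: returns the 1-based numbers of lines that start with
/// a Git conflict marker, mirroring the `^(<<<<<<<|=======|>>>>>>>)` pattern.
pub fn conflict_marker_lines(text: &str) -> Vec<usize> {
    const MARKERS: [&str; 3] = ["<<<<<<<", "=======", ">>>>>>>"];
    text.lines()
        .enumerate()
        // Keep a line only if it begins with one of the three markers.
        .filter(|(_, line)| MARKERS.iter().any(|m| line.starts_with(*m)))
        // Convert the 0-based index into a 1-based line number.
        .map(|(i, _)| i + 1)
        .collect()
}
```

Like the `rg` pattern, this also flags a legitimate line that happens to start with `=======`, so a hit needs a manual look.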

---------

Co-authored-by: clockwork-labs-bot <clockwork-labs-bot@users.noreply.github.com>
Co-authored-by: rain <rain@rain.local>
Co-authored-by: Tyler Cloutier <cloutiertyler@users.noreply.github.com>

2026-06-05 15:43:27 +00:00

bradleyshep

be86a512f2

LLM Benchmark Improvements + More Evals (#4740)

# Description of Changes

LLM benchmark infrastructure improvements and new benchmark tasks.

**Runner & scoring:**
- Add retry logic with backoff for LLM API calls (rate limits,
502/503/504, timeouts)
- Fix `generation_duration_ms` to only time the successful attempt, not
retries+sleep delays
- Add `--dry-run` flag to run benchmarks without saving results
- Add OpenRouter client as unified fallback when direct vendor keys
aren't set
- Add web search mode via OpenRouter `:online` suffix
- Extract shared OpenAI-compatible response types into `oa_compat.rs`
- Add `ReducerCallBothScorer` for calling reducers on both golden and
LLM databases
- Set `max_tokens` on OpenRouter and Meta clients to prevent silent
truncation
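
The retry and timing items above can be sketched roughly as follows. This is not the code from `xtask-llm-benchmark`: `CallError`, `retry_with_backoff`, and the doubling delay are assumptions made for illustration. The point shown is that the clock restarts on every attempt, so the reported duration covers only the attempt that succeeded and excludes earlier failures and sleeps.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical error type: says whether a failed call is worth retrying.
#[derive(Debug, PartialEq)]
pub enum CallError {
    /// For example a rate limit, a 502/503/504 response, or a timeout.
    Retryable(String),
    /// Anything that retrying will not fix.
    Fatal(String),
}

/// Calls `attempt` up to `max_attempts` times, doubling the delay after each
/// retryable failure. On success, returns the value together with the
/// duration of the successful attempt only.
pub fn retry_with_backoff<T>(
    max_attempts: u32,
    base_delay: Duration,
    mut attempt: impl FnMut() -> Result<T, CallError>,
) -> Result<(T, Duration), CallError> {
    let mut delay = base_delay;
    let mut last_err = CallError::Fatal("max_attempts was zero".to_string());
    for n in 1..=max_attempts {
        // Restart the clock for every attempt so retries are not counted.
        let started = Instant::now();
        match attempt() {
            Ok(value) => return Ok((value, started.elapsed())),
            Err(CallError::Retryable(msg)) => {
                last_err = CallError::Retryable(msg);
                if n < max_attempts {
                    sleep(delay);
                    delay = delay.saturating_mul(2);
                }
            }
            Err(fatal) => return Err(fatal),
        }
    }
    Err(last_err)
}
```

A caller would wrap the API request in the closure and store the returned duration in a field such as `generation_duration_ms`.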

**Model routing:**
- Add `ModelRoute` with display name, vendor, API model, and OpenRouter
model ID
- Support ad-hoc model IDs via `--models vendor:model` without static
registration
- Add model name normalization (OpenRouter IDs, case variants →
canonical display names)
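
As an illustration of the `--models vendor:model` form, the sketch below parses one entry. `ModelSpec` and `parse_model_spec` are hypothetical names, and lowercasing the vendor is an assumption; the actual `ModelRoute` type carries more fields than this. Splitting on the first colon only keeps model IDs that themselves contain a colon (such as one ending in `:online`) intact.

```rust
/// Hypothetical parsed form of one `--models` entry.
#[derive(Debug, PartialEq)]
pub struct ModelSpec {
    pub vendor: String,
    pub model: String,
}

/// Parses `vendor:model`, splitting on the first ':' only.
pub fn parse_model_spec(raw: &str) -> Result<ModelSpec, String> {
    let (vendor, model) = raw
        .trim()
        .split_once(':')
        .ok_or_else(|| format!("expected vendor:model, got {raw:?}"))?;
    if vendor.is_empty() || model.is_empty() {
        return Err(format!("expected vendor:model, got {raw:?}"));
    }
    Ok(ModelSpec {
        // Assumption: vendors are matched case-insensitively.
        vendor: vendor.to_ascii_lowercase(),
        model: model.to_string(),
    })
}
```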

**Context modes:**
- Add `guidelines`, `cursor_rules`, `search`, `no_context` modes with
`is_empty_context_mode()` helper
- Add mode-specific prompt preambles
- Consolidate mode alias normalization (`none`/`no_guidelines` →
`no_context`)
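
The alias consolidation in the last item might look something like the sketch below. `normalize_mode` is a hypothetical name, and the trimming and case-folding are assumptions; only the `none`/`no_guidelines` → `no_context` mapping comes from the description above.

```rust
/// Hypothetical helper: maps mode aliases onto one canonical name.
pub fn normalize_mode(raw: &str) -> String {
    // Assumption: surrounding whitespace and letter case are ignored.
    let key = raw.trim().to_ascii_lowercase();
    if key == "none" || key == "no_guidelines" {
        "no_context".to_string()
    } else {
        // Every other mode name passes through unchanged.
        key
    }
}
```

Normalizing once at the CLI boundary means the rest of the runner only has to compare against the canonical names.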

**CI workflows:**
- Add `llm-benchmark-periodic.yml` for scheduled nightly runs with
per-language failure tracking
- **Note**: The periodic workflow requires `OPENROUTER_API_KEY`,
`LLM_BENCHMARK_UPLOAD_URL`, and `LLM_BENCHMARK_API_KEY` as GitHub
secrets.
- Add `llm-benchmark-validate-goldens.yml` for validating golden answers
still compile

**Results & summary:**
- Add `cmd_status` to show incomplete benchmark combinations with rerun
commands
- Add `cmd_analyze` for LLM-powered failure analysis
- Split `normalize_details_file` from `write_summary_from_details_file`
- Derive task categories from filesystem for summary generation
- Add timestamp tracking (`started_at`/`finished_at`) and token usage

**New benchmark tasks:**
- 30 new tasks across auth, data_modeling, queries, basics, and schema
categories
- Updated/fixed existing task prompts and golden answers

# API and ABI breaking changes

None. Internal tooling only.

# Expected complexity level and risk

2 — Changes are scoped to the LLM benchmark CLI tool
(`xtask-llm-benchmark`) and CI workflows. No impact on SpacetimeDB core.

# Testing

- [x] `cargo check -p xtask-llm-benchmark` — zero errors, zero warnings
- [x] Dry run: `llm_benchmark run --lang typescript --modes no_context
--tasks t_001 --models openai:gpt-5-mini --dry-run` — ran end-to-end,
confirmed no results saved to disk
- [ ] Verify periodic workflow runs successfully on next scheduled
trigger

---------

Co-authored-by: Tyler Cloutier <cloutiertyler@users.noreply.github.com>

2026-05-11 22:53:24 +00:00

2 Commits