# Description of Changes
LLM benchmark updates for local development:
- **Local SDK paths**: Templates use relative paths to workspace crates
(`crates/bindings`, `crates/bindings-csharp`,
`crates/bindings-typescript`) instead of published packages, so the
bench runs against local SDK changes.
- **NODEJS_DIR support**: On Windows (e.g. with nvm4w), if `pnpm` is not
on PATH, the bench uses `NODEJS_DIR` to locate `pnpm` and prepends that
directory to PATH for subprocesses.
- **Refactor**: Extracted `relative_to_workspace()` in `templates.rs`
and removed noisy `NODEJS_DIR` logging in `publishers.rs`.
- **Benchmark results**: Updated `docs/llms/llm-comparison-details.json`
and `docs/llms/llm-comparison-summary.json`.
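A minimal sketch of what the extracted `relative_to_workspace()` helper might look like (the signature and behavior here are assumed; the real helper in `templates.rs` may handle paths outside the workspace differently):

```rust
use std::path::{Path, PathBuf};

// Hypothetical sketch: express `target` relative to the workspace root by
// stripping the workspace prefix. Returns None if `target` is not inside
// the workspace.
fn relative_to_workspace(workspace: &Path, target: &Path) -> Option<PathBuf> {
    target.strip_prefix(workspace).ok().map(Path::to_path_buf)
}

fn main() {
    let ws = Path::new("/workspace");
    let rel = relative_to_workspace(ws, Path::new("/workspace/crates/bindings"));
    println!("{:?}", rel); // Some("crates/bindings")
}
```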
# API and ABI breaking changes
None.
# Expected complexity level and risk
**2** — Local-only changes to the benchmark tool. Templates now require
local SDKs to be built (especially TypeScript: `pnpm build` in
`crates/bindings-typescript`). No impact on published SDKs or runtime.
# Testing
- [ ] Run `cargo llm run --lang rust --modes docs --providers openai`
from repo root
- [ ] Run TypeScript benchmarks with `pnpm build` in
`crates/bindings-typescript` first
- [ ] On Windows with nvm4w, set `NODEJS_DIR` if `pnpm` is not on PATH
and run TypeScript benchmarks
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: clockwork-labs-bot <bot@clockworklabs.com>
Co-authored-by: clockwork-labs-bot <clockwork-labs-bot@users.noreply.github.com>
# Description of Changes
Major documentation overhaul focusing on tables, column types, and
indexes.
**Quickstart Guides:**
- Updated React, TypeScript, Rust, and C# quickstarts with table/reducer
examples
- Fixed CLI syntax (positional `--database` argument)
- Improved template consistency across languages
**Tables Documentation:**
- Added "Why Tables" section explaining table-oriented design philosophy
(tables as fundamental unit, system tables, data-oriented design
principles)
- Added "Physical and Logical Independence" section explaining how
subscription queries use the relational model independently of physical
storage
- Added brief sections linking to related pages (Visibility,
Constraints, Schedule Tables)
- Renamed "Scheduled Tables" to "Schedule Tables" throughout (tables
store schedules; reducers are scheduled)
**Column Types:**
- Split into dedicated page with unified type reference table
- Added "Representing Collections" section (`Vec`/`Array` vs table
tradeoffs)
- Added "Binary Data and Files" section for `Vec<u8>` storage patterns
- Added "Type Performance" section (smaller types, fixed-size types,
column ordering for alignment)
- Added complete example struct demonstrating all type categories
- Renamed "Structured" category to "Composite"
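By way of illustration, the "complete example struct" mentioned above might resemble the following (field names are invented here, and a real SpacetimeDB table would carry the table attribute and use only SDK-supported types):

```rust
// Illustrative only: one field per column-type category the docs describe.
#[derive(Debug)]
struct Player {
    id: u64,              // integer (smaller types can reduce row size)
    name: String,         // text
    online: bool,         // boolean
    rating: f32,          // floating point
    avatar_png: Vec<u8>,  // binary data / files
    tags: Vec<String>,    // collection (vs. modeling as a separate table)
    position: (f32, f32), // composite (the category formerly named "Structured")
}

fn main() {
    let p = Player {
        id: 1,
        name: "alice".to_string(),
        online: true,
        rating: 4.5,
        avatar_png: vec![0x89, 0x50, 0x4e, 0x47], // PNG magic bytes
        tags: vec!["builder".to_string()],
        position: (0.0, 0.0),
    };
    println!("{} online: {}", p.name, p.online);
}
```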
**Indexes:**
- Complete rewrite with textbook-style documentation
- Added "When to Use Indexes" guidance
- Documented single-column and multi-column index syntax (field-level
and table-level)
- Comprehensive range query examples with correct TypeScript `Range`
class syntax
- Explained multi-column index prefix matching semantics
- Added index-accelerated deletion examples
- Included index design guidelines
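The prefix-matching semantics can be illustrated with a plain `BTreeMap` analogy (this is not SpacetimeDB's index API): a btree index on `(country, city)` serves queries that fix `country` alone, because those rows form one contiguous key range, but it cannot efficiently serve queries on `city` alone.

```rust
use std::collections::BTreeMap;

// Prefix query on the first column of a two-column key: all (country, city)
// entries for a fixed country are contiguous in btree order.
fn cities_in(idx: &BTreeMap<(String, String), u32>, country: &str) -> Vec<u32> {
    // Appending NUL yields the smallest string strictly greater than
    // `country`, so the range covers exactly the keys whose first column
    // equals `country`.
    let lo = (country.to_string(), String::new());
    let hi = (format!("{}\u{0}", country), String::new());
    idx.range(lo..hi).map(|(_, v)| *v).collect()
}

fn main() {
    let mut idx = BTreeMap::new();
    idx.insert(("DE".to_string(), "Berlin".to_string()), 1);
    idx.insert(("US".to_string(), "Austin".to_string()), 2);
    idx.insert(("US".to_string(), "Boston".to_string()), 3);
    // Fixing the `country` prefix is an efficient range scan...
    println!("{:?}", cities_in(&idx, "US")); // [2, 3]
    // ...while a query on `city` alone would have to scan every key.
}
```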
**Styling:**
- Added CSS for table border radius and row separators
- Created Check component for green checkmarks in tables
# API and ABI breaking changes
None. Documentation only.
# Expected complexity level and risk
1 - Documentation changes only, no code changes.
# Testing
- [ ] Verify docs build without errors
- [ ] Review rendered pages for formatting issues
- [ ] Confirm code examples are syntactically correct
---------
Signed-off-by: Tyler Cloutier <cloutiertyler@users.noreply.github.com>
Signed-off-by: John Detter <4099508+jdetter@users.noreply.github.com>
Co-authored-by: clockwork-labs-bot <clockwork-labs-bot@users.noreply.github.com>
Co-authored-by: John Detter <4099508+jdetter@users.noreply.github.com>
# Description of Changes
I believe that local users generally do not have OpenAI API tokens, so
the existing hint was not helpful. Apparently the correct path is to
comment `/update-llm-benchmark` on the PR and let the CI take care of it.
# API and ABI breaking changes
None
# Expected complexity level and risk
1
# Testing
None
---------
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
# Description of Changes
Introduce a new **LLM benchmarking app** and supporting code.
* **CLI:** `llm` with subcommands `run`, `routes list`, `diff`,
`ci-check`.
* **Runner:** executes globally numbered tasks; filters by `--lang`,
`--categories`, `--tasks`, `--providers`, `--models`.
* **Providers/clients:** route layer (`provider:model`) with HTTP clients
for LLM vendors; env-driven API keys and base URLs.
* **Evaluation:** deterministic scorers (hash/equality, JSON
shape/count, light schema/reducer parity) with clear failure messages.
* **Results:** stable JSON schema; single-file HTML viewer to
inspect/filter/export CSV.
* **Build & guards:** build script for compile-time setup and a
Spacetime guard.
* **Docs:** `DEVELOP.md` includes `cargo llm …` usage.
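The route layer's `provider:model` strings can be illustrated with a tiny parser (a hedged sketch; the actual routes module may validate providers and models further):

```rust
// Hypothetical sketch: split a route like "openai:gpt-5" into its
// provider and model halves, rejecting malformed inputs.
fn parse_route(route: &str) -> Option<(&str, &str)> {
    match route.split_once(':') {
        Some((provider, model)) if !provider.is_empty() && !model.is_empty() => {
            Some((provider, model))
        }
        _ => None,
    }
}

fn main() {
    println!("{:?}", parse_route("openai:gpt-5")); // Some(("openai", "gpt-5"))
    println!("{:?}", parse_route("bad-route"));    // None
}
```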
This PR is the initial addition of the app and its modules (runner,
config, routes, prompt/segmentation, scorers, schema/types,
defaults/constants/paths/hashing/combine, publishers, spacetime guard,
HTML stats viewer).
### How it works
1. **Pick what to run**
* Choose tasks (`--tasks 0,7,12`), or a language (`--lang rust|csharp`),
or categories (`--categories basics,schema`).
* Optionally limit vendors/models (`--providers …`, `--models …`).
2. **Resolve routes**
* Read env (API keys + base URLs) and build the active set (e.g.,
`openai:gpt-5`).
3. **Build context**
* Start Spacetime
* Publish golden-answer modules
* Prepare prompts and send them to the LLM
* Attempt to publish the LLM-generated module
4. **Execute calls**
* Run the selected tasks against the selected models and languages.
5. **Score outputs**
* Apply deterministic scorers (hash/equality, JSON shape/count, simple
schema/reducer checks).
* Record the score and any short failure reason.
6. **Update results file**
* Write/update the single results JSON with task/route outcomes,
timings, and summaries.
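The hash/equality scorer in step 5 might be sketched like this (an assumed shape; the real scorers also compare richer structures such as JSON shape and schema/reducer parity):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hedged sketch of a deterministic equality scorer: fingerprint both
// outputs after whitespace normalization and report a short failure reason.
// DefaultHasher is deterministic within a process, which is enough here.
fn fingerprint(s: &str) -> u64 {
    let mut h = DefaultHasher::new();
    s.split_whitespace().collect::<Vec<_>>().join(" ").hash(&mut h);
    h.finish()
}

fn score(expected: &str, actual: &str) -> Result<(), String> {
    if fingerprint(expected) == fingerprint(actual) {
        Ok(())
    } else {
        Err(format!(
            "output hash mismatch ({:#x} vs {:#x})",
            fingerprint(expected),
            fingerprint(actual)
        ))
    }
}

fn main() {
    println!("{:?}", score("a  b", "a b")); // Ok(())
    assert!(score("x", "y").is_err());
}
```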
# API and ABI breaking changes
None. New application and modules; no existing public APIs/ABIs altered.
# Expected complexity level and risk
**4/5.** New CLI, routing, evaluation, and artifact format.
* External model APIs may rate-limit/timeout; concurrency tunable via
`LLM_BENCH_CONCURRENCY` / `LLM_BENCH_ROUTE_CONCURRENCY`.
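For reference, reading such a tunable might look like the following (a sketch only; the fallback value of 4 is invented, not the tool's actual default):

```rust
use std::env;

// Hypothetical sketch: read LLM_BENCH_CONCURRENCY from the environment,
// falling back to an invented default when unset or unparsable.
fn bench_concurrency() -> usize {
    env::var("LLM_BENCH_CONCURRENCY")
        .ok()
        .and_then(|v| v.parse().ok())
        .filter(|&n| n > 0)
        .unwrap_or(4)
}

fn main() {
    println!("concurrency = {}", bench_concurrency());
}
```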
# Testing
I ran the full test matrix and generated results for every task against
every vendor, model, and language (rust + C#). I also tested the CI
check locally using [act](https://github.com/nektos/act).
**Please verify**
* [ ] `llm run --tasks 0,1,2` (explicit `run`)
* [ ] `llm run --lang rust --categories basics` (filters)
* [ ] `llm run --categories basics,schema` (multiple categories)
* [ ] `llm run --lang csharp` (language switch)
* [ ] `llm run --providers openai,anthropic --models "openai:gpt-5
anthropic:claude-sonnet-4-5"` (provider/model limits)
* [ ] `llm run --hash-only` (dry integrity)
* [ ] `llm run --goldens-only` (test goldens only)
* [ ] `llm run --force` (skip hash check)
* [ ] `llm ci-check`
* [ ] Stats viewer loads the JSON; filtering and CSV export work
* [ ] CI works as intended
---------
Signed-off-by: bradleyshep <148254416+bradleyshep@users.noreply.github.com>
Signed-off-by: Tyler Cloutier <cloutiertyler@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: Tyler Cloutier <cloutiertyler@aol.com>
Co-authored-by: Tyler Cloutier <cloutiertyler@users.noreply.github.com>
Co-authored-by: spacetimedb-bot <spacetimedb-bot@users.noreply.github.com>
Co-authored-by: John Detter <4099508+jdetter@users.noreply.github.com>