mirror of https://github.com/clockworklabs/SpacetimeDB.git synced 2026-05-10 01:30:37 -04:00


bradleyshep b75bf6decf LLM Benchmarking (#3486)

# Description of Changes

Introduce a new **LLM benchmarking app** and supporting code.

* **CLI:** `llm` with subcommands `run`, `routes list`, `diff`,
`ci-check`.
* **Runner:** executes globally numbered tasks; filters by `--lang`,
`--categories`, `--tasks`, `--providers`, `--models`.
* **Providers/clients:** route layer (`provider:model`) with HTTP LLM vendor
clients; env-driven API keys and base URLs.
* **Evaluation:** deterministic scorers (hash/equality, JSON
shape/count, light schema/reducer parity) with clear failure messages.
* **Results:** stable JSON schema; single-file HTML viewer to
inspect/filter/export CSV.
* **Build & guards:** build script for compile-time setup.
* **Docs:** `DEVELOP.md` includes `cargo llm …` usage.
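
As a rough illustration of the deterministic scorers described above, here is a minimal TypeScript sketch of an equality scorer and a JSON shape/count scorer. Each returns a pass/fail flag plus a short failure reason. The `ScoreResult` type, the function names, and the whitespace normalization are hypothetical and are not the app's actual API. The hash-based variant is omitted.

```typescript
// Hypothetical result type; the app's real score/reason schema is not shown here.
interface ScoreResult {
  passed: boolean;
  reason?: string; // short failure message recorded alongside the score
}

// Equality scorer: compares text after trimming trailing whitespace on each line
// and surrounding whitespace overall (an assumed normalization).
function scoreEquality(expected: string, actual: string): ScoreResult {
  const normalize = (s: string) =>
    s.split("\n").map((line) => line.trimEnd()).join("\n").trim();
  if (normalize(expected) === normalize(actual)) return { passed: true };
  return { passed: false, reason: "output differs from golden answer" };
}

// JSON shape/count scorer: the output must parse as a JSON array with the
// expected number of rows, and every row must contain the expected keys.
function scoreJsonShape(actual: string, expectedKeys: string[], expectedCount: number): ScoreResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(actual);
  } catch {
    return { passed: false, reason: "output is not valid JSON" };
  }
  if (!Array.isArray(parsed)) return { passed: false, reason: "expected a JSON array" };
  if (parsed.length !== expectedCount) {
    return { passed: false, reason: `expected ${expectedCount} rows, got ${parsed.length}` };
  }
  for (let i = 0; i < parsed.length; i++) {
    const row = parsed[i];
    if (typeof row !== "object" || row === null) {
      return { passed: false, reason: `row ${i} is not an object` };
    }
    const missing = expectedKeys.filter((k) => !(k in row));
    if (missing.length > 0) {
      return { passed: false, reason: `row ${i} missing keys: ${missing.join(", ")}` };
    }
  }
  return { passed: true };
}
```

Returning a reason string instead of throwing lets a runner record every failure in the results file without aborting the batch.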

This PR is the initial addition of the app and its modules (runner,
config, routes, prompt/segmentation, scorers, schema/types,
defaults/constants/paths/hashing/combine, publishers, spacetime guard,
HTML stats viewer).

### How it works

1. **Pick what to run**
   * Choose tasks (`--tasks 0,7,12`), a language (`--lang rust|csharp`),
     or categories (`--categories basics,schema`).
   * Optionally limit vendors/models (`--providers …`, `--models …`).

2. **Resolve routes**
   * Read the environment (API keys and base URLs) and build the set of
     active routes (e.g., `openai:gpt-5`).

3. **Build context**
   * Start Spacetime.
   * Publish the golden-answer modules.
   * Prepare prompts and send them to the LLM.
   * Attempt to publish the LLM-generated module.

4. **Execute calls**
   * Run the selected tasks against the selected models and languages.

5. **Score outputs**
   * Apply deterministic scorers (hash/equality, JSON shape/count, simple
     schema/reducer checks).
   * Record the score and any short failure reason.

6. **Update results file**
   * Write or update the single results JSON file with task/route outcomes,
     timings, and summaries.
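
Steps 1 and 2 can be sketched as a filter over tasks followed by route resolution. This is a minimal TypeScript illustration, not the runner's implementation. The `Task`, `Route`, and `Filters` types, the helper names, and the `<PROVIDER>_API_KEY` environment-variable naming are all assumptions made for the example.

```typescript
// Hypothetical task descriptor; the real task schema is not shown in this PR text.
interface Task {
  id: number; // tasks are globally numbered
  lang: "rust" | "csharp";
  category: string;
}

interface Route {
  provider: string;
  model: string;
}

// Each filter is optional; an absent filter matches everything.
interface Filters {
  tasks?: number[];
  lang?: "rust" | "csharp";
  categories?: string[];
  providers?: string[];
  models?: string[]; // full "provider:model" strings
}

// Step 1: keep only the tasks that match every filter that was given.
function selectTasks(all: Task[], f: Filters): Task[] {
  return all.filter(
    (t) =>
      (f.tasks === undefined || f.tasks.includes(t.id)) &&
      (f.lang === undefined || t.lang === f.lang) &&
      (f.categories === undefined || f.categories.includes(t.category)),
  );
}

// Parses a "provider:model" route string; returns null if either side is empty.
function parseRoute(s: string): Route | null {
  const i = s.indexOf(":");
  if (i <= 0 || i === s.length - 1) return null;
  return { provider: s.slice(0, i), model: s.slice(i + 1) };
}

// Step 2: a route is active only if its provider has an API key in the
// environment (assumed naming: OPENAI_API_KEY, ANTHROPIC_API_KEY, ...)
// and it passes the --providers / --models limits.
function resolveRoutes(
  candidates: string[],
  env: Record<string, string | undefined>,
  f: Filters,
): Route[] {
  const active: Route[] = [];
  for (const s of candidates) {
    const r = parseRoute(s);
    if (r === null) continue;
    const hasKey = Boolean(env[`${r.provider.toUpperCase()}_API_KEY`]);
    const providerOk = f.providers === undefined || f.providers.includes(r.provider);
    const modelOk = f.models === undefined || f.models.includes(s);
    if (hasKey && providerOk && modelOk) active.push(r);
  }
  return active;
}
```

The remaining steps then run the cross product of selected tasks and active routes.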


# API and ABI breaking changes

None. New application and modules; no existing public APIs/ABIs altered.

# Expected complexity level and risk

**4/5.** New CLI, routing, evaluation, and artifact format.

* External model APIs may rate-limit or time out; concurrency is tunable via
`LLM_BENCH_CONCURRENCY` / `LLM_BENCH_ROUTE_CONCURRENCY`.
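
A bounded worker pool is one common way to make this kind of concurrency tunable. The TypeScript sketch below assumes the limit is read from a variable such as `LLM_BENCH_CONCURRENCY`. The helper names, the fallback value of 4, and the pool design are hypothetical and may differ from how the app actually applies `LLM_BENCH_CONCURRENCY` / `LLM_BENCH_ROUTE_CONCURRENCY`.

```typescript
// Reads a positive integer limit from an env-like map.
// The fallback of 4 is a hypothetical default, not the app's actual value.
function readLimit(env: Record<string, string | undefined>, name: string, fallback = 4): number {
  const n = Number(env[name]);
  return Number.isInteger(n) && n > 0 ? n : fallback;
}

// Runs async jobs with at most `limit` in flight; results keep input order.
async function runBounded<T>(jobs: Array<() => Promise<T>>, limit: number): Promise<T[]> {
  const results: T[] = new Array(jobs.length);
  let next = 0;
  // Each worker claims the next unclaimed index. The check and the increment
  // happen without an intervening await, so no two workers share an index.
  async function worker(): Promise<void> {
    while (next < jobs.length) {
      const i = next++;
      results[i] = await jobs[i]();
    }
  }
  const workers: Promise<void>[] = [];
  for (let w = 0; w < Math.min(limit, jobs.length); w++) workers.push(worker());
  await Promise.all(workers);
  return results;
}
```

Because a worker only starts a new call after its previous one settles, a lower limit trades throughput for fewer rate-limit errors from the vendor API.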

# Testing

I ran the full test matrix and generated results for every task against
every vendor, model, and language (Rust and C#). I also tested the CI
check locally using [act](https://github.com/nektos/act).

**Please verify**

* [ ] `llm run --tasks 0,1,2` (explicit `run`)
* [ ] `llm run --lang rust --categories basics` (filters)
* [ ] `llm run --categories basics,schema` (multiple categories)
* [ ] `llm run --lang csharp` (language switch)
* [ ] `llm run --providers openai,anthropic --models "openai:gpt-5 anthropic:claude-sonnet-4-5"` (provider/model limits)
* [ ] `llm run --hash-only` (dry integrity)
* [ ] `llm run --goldens-only` (test goldens only)
* [ ] `llm run --force` (skip hash check)
* [ ] `llm ci-check`
* [ ] Stats viewer loads the JSON; filtering and CSV export work
* [ ] CI works as intended

---------

Signed-off-by: bradleyshep <148254416+bradleyshep@users.noreply.github.com>
Signed-off-by: Tyler Cloutier <cloutiertyler@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: Tyler Cloutier <cloutiertyler@aol.com>
Co-authored-by: Tyler Cloutier <cloutiertyler@users.noreply.github.com>
Co-authored-by: spacetimedb-bot <spacetimedb-bot@users.noreply.github.com>
Co-authored-by: John Detter <4099508+jdetter@users.noreply.github.com>

2026-01-06 22:22:57 +00:00

| Name | Last commit | Date |
|---|---|---|
| .cursor/rules | spacetime init rewrite (#3366) | 2025-10-30 04:26:08 +00:00 |
| docs | Make /v1/database/:name/call/:func call procedures too, remove procedure route (#3883) | 2025-12-31 23:31:02 +00:00 |
| llms | LLM Benchmarking (#3486) | 2026-01-06 22:22:57 +00:00 |
| scripts | Refactor /docs to close in on the final form (#3917) | 2025-12-23 15:06:57 +00:00 |
| src | Refactor /docs to close in on the final form (#3917) | 2025-12-23 15:06:57 +00:00 |
| static | Added and tested procedure docs for Unreal C++ & Unreal Blueprint (#3810) | 2025-12-03 19:47:25 +00:00 |
| .editorconfig | Docusaurus migration (#3343) | 2025-10-24 14:36:38 +00:00 |
| .gitignore | Docusaurus migration (#3343) | 2025-10-24 14:36:38 +00:00 |
| DEVELOP.md | LLM Benchmarking (#3486) | 2026-01-06 22:22:57 +00:00 |
| docusaurus.config.ts | Fix broken SpacetimeDB logo in docs (#3886) | 2025-12-16 20:17:27 +00:00 |
| LICENSE.txt | Docusaurus migration (#3343) | 2025-10-24 14:36:38 +00:00 |
| package.json | Fix CLI reference generation (#3403) | 2025-12-16 20:17:51 +00:00 |
| pnpm-lock.yaml | Docusaurus migration (#3343) | 2025-10-24 14:36:38 +00:00 |
| README.md | Fix CLI reference generation (#3403) | 2025-12-16 20:17:51 +00:00 |
| sidebars.ts | Docusaurus migration (#3343) | 2025-10-24 14:36:38 +00:00 |
| STYLE.md | Docusaurus migration (#3343) | 2025-10-24 14:36:38 +00:00 |
| tsconfig.json | First pass at reorganizing the docs and making them nice (#3494) | 2025-11-26 15:06:02 +00:00 |

README.md

SpacetimeDB Documentation

This repository contains the markdown files which are used to display documentation on our website. This documentation is built using Docusaurus.

Making Edits

To make changes to our docs, you can open a pull request in this repository. You can typically edit the files directly using the GitHub web interface, but you can also clone our repository and make your edits locally.

Instructions

Fork our repository
Clone your fork:

git clone ssh://git@github.com/<username>/SpacetimeDB
cd SpacetimeDB/docs

Make your edits to the docs and test them locally (see Testing Locally).
Commit your changes:

git add .
git commit -m "A specific description of the changes I made and why"

Create a branch and push your changes to your fork:

git checkout -b a-branch-name-that-describes-my-change
git push -u origin a-branch-name-that-describes-my-change

Go to our GitHub repository and open a PR from the branch on your fork.

CLI Reference Section

To regenerate the CLI reference section, run pnpm generate-cli-docs.

Docusaurus Documentation

For more information on how to use Docusaurus, see the Docusaurus documentation.

Testing Locally

Installation

Make sure you have Node.js installed (version 22 or higher is recommended).
Clone the repository and navigate to the docs directory.
Install the dependencies: pnpm install
Run the development server: pnpm dev, which will start a local server and open a browser window. All changes you make to the markdown files will be reflected live in the browser.

Adding new pages

All of our directory and file names are prefixed with a five-digit number that determines how they're sorted. We started with the hundreds place as the smallest significant digit, leaving the tens and ones places free for inserting new pages between existing ones. When adding a new page between two existing pages, choose a number that:

Doesn't use any more significant figures than it needs to.
Is approximately halfway between the previous and next page.

For example, if you want to add a new page between 00300-foo and 00400-bar, name it 00350-baz. To add a new page between 00350-baz and 00400-bar, prefer 00370-quux or 00380-quux, rather than 00375-quux, to avoid populating the ones place.

To add a new page after all previous pages, use the smallest multiple of 100 larger than all other pages. For example, if the highest-numbered existing page is 01350-abc, create 01400-def.
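
The numbering rules above can be expressed as a small helper. This is a hypothetical TypeScript sketch, not a script shipped with the docs. It tries the hundreds place first, then the tens, then the ones. Within the first place that has room, it picks the value closest to the midpoint, breaking ties toward the lower number.

```typescript
// Picks a numeric prefix strictly between two existing pages, preferring the
// coarsest place (hundreds, then tens, then ones) so the number uses as few
// significant figures as possible. Returns null if no integer fits.
function prefixBetween(prev: number, next: number): number | null {
  const mid = (prev + next) / 2;
  for (const step of [100, 10, 1]) {
    let best: number | null = null;
    // Start at the first multiple of `step` strictly greater than `prev`.
    for (let n = Math.floor(prev / step) * step + step; n < next; n += step) {
      // Strict `<` keeps the lower candidate when two are equally close.
      if (best === null || Math.abs(n - mid) < Math.abs(best - mid)) best = n;
    }
    if (best !== null) return best;
  }
  return null;
}

// Prefix for a page appended after all others: the smallest multiple of 100
// strictly larger than the current highest prefix.
function prefixAfter(max: number): number {
  return Math.floor(max / 100) * 100 + 100;
}

// Formats a prefix in the five-digit form used in file names.
function formatPrefix(n: number): string {
  return String(n).padStart(5, "0");
}
```

For example, `prefixBetween(300, 400)` yields 350 and `prefixBetween(350, 400)` yields 370, matching the `00350-baz` and `00370-quux` examples. `prefixAfter(1350)` yields 1400, which formats as `01400`.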

License

This documentation repository is licensed under Apache 2.0. See LICENSE.txt for more details.