5 Commits

Author SHA1 Message Date
bradleyshep efa6f382b1 LLM benchmark tool updates (#4413)
# Description of Changes

LLM benchmark updates for local development:

- **Local SDK paths**: Templates use relative paths to workspace crates
(`crates/bindings`, `crates/bindings-csharp`,
`crates/bindings-typescript`) instead of published packages, so the
bench runs against local SDK changes.
- **NODEJS_DIR support**: On Windows (e.g. nvm4w), if `pnpm` is not on
PATH, the bench uses `NODEJS_DIR` to locate `pnpm` and prepends it to
PATH for subprocesses.
- **Refactor**: Extracted `relative_to_workspace()` in `templates.rs`
and removed noisy `NODEJS_DIR` logging in `publishers.rs`.
- **Benchmark results**: Updated `docs/llms/llm-comparison-details.json`
and `docs/llms/llm-comparison-summary.json`.

# API and ABI breaking changes

None.

# Expected complexity level and risk

**2** — Local-only changes to the benchmark tool. Templates now require
local SDKs to be built (especially TypeScript: `pnpm build` in
`crates/bindings-typescript`). No impact on published SDKs or runtime.

# Testing

- [ ] Run `cargo llm run --lang rust --modes docs --providers openai`
from repo root
- [ ] Run TypeScript benchmarks with `pnpm build` in
`crates/bindings-typescript` first
- [ ] On Windows with nvm4w, set `NODEJS_DIR` if `pnpm` is not on PATH
and run TypeScript benchmarks

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: clockwork-labs-bot <bot@clockworklabs.com>
Co-authored-by: clockwork-labs-bot <clockwork-labs-bot@users.noreply.github.com>
2026-03-01 02:22:59 +00:00
Tyler Cloutier 3b9497e318 Empty commit basically (#4088)
Empty commit to fix llm benchmark.

---------

Co-authored-by: clockwork-labs-bot <clockwork-labs-bot@users.noreply.github.com>
2026-01-21 20:41:38 -05:00
Tyler Cloutier 73881e38f7 Further misc docs changes (#4029)
# Description of Changes

Major documentation overhaul focusing on tables, column types, and
indexes.

  **Quickstart Guides:**
- Updated React, TypeScript, Rust, and C# quickstarts with table/reducer
examples
  - Fixed CLI syntax (positional `--database` argument)
  - Improved template consistency across languages

  **Tables Documentation:**
- Added "Why Tables" section explaining table-oriented design philosophy
(tables as fundamental unit, system tables, data-oriented design
principles)
- Added "Physical and Logical Independence" section explaining how
subscription queries use the relational model independently of physical
storage
- Added brief sections linking to related pages (Visibility,
Constraints, Schedule Tables)
- Renamed "Scheduled Tables" to "Schedule Tables" throughout (tables
store schedules; reducers are scheduled)

  **Column Types:**
  - Split into dedicated page with unified type reference table
- Added "Representing Collections" section (Vec/Array vs table
tradeoffs)
  - Added "Binary Data and Files" section for Vec<u8> storage patterns
- Added "Type Performance" section (smaller types, fixed-size types,
column ordering for alignment)
  - Added complete example struct demonstrating all type categories
  - Renamed "Structured" category to "Composite"

  **Indexes:**
  - Complete rewrite with textbook-style documentation
  - Added "When to Use Indexes" guidance
- Documented single-column and multi-column index syntax (field-level
and table-level)
- Comprehensive range query examples with correct TypeScript `Range`
class syntax
  - Explained multi-column index prefix matching semantics
  - Added index-accelerated deletion examples
  - Included index design guidelines

  **Styling:**
  - Added CSS for table border radius and row separators
  - Created Check component for green checkmarks in tables

  # API and ABI breaking changes

  None. Documentation only.

  # Expected complexity level and risk

  1 - Documentation changes only, no code changes.

  # Testing

  - [ ] Verify docs build without errors
  - [ ] Review rendered pages for formatting issues
  - [ ] Confirm code examples are syntactically correct

---------

Signed-off-by: Tyler Cloutier <cloutiertyler@users.noreply.github.com>
Signed-off-by: John Detter <4099508+jdetter@users.noreply.github.com>
Co-authored-by: clockwork-labs-bot <clockwork-labs-bot@users.noreply.github.com>
Co-authored-by: John Detter <4099508+jdetter@users.noreply.github.com>
2026-01-17 17:44:58 +00:00
Zeke Foppa 38ee9e89ba CI - Fix hint for fixing llm benchmarks (#4040)
# Description of Changes

I believe that local users do not have API tokens for OpenAI, so the
existing hint was not helpful. Apparently the correct path is to post
`/update-llm-benchmark` on the PR and let the CI take care of it.

# API and ABI breaking changes

None

# Expected complexity level and risk

1

# Testing

None

---------

Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
2026-01-16 23:15:32 +00:00
bradleyshep b75bf6decf LLM Benchmarking (#3486)
# Description of Changes

Introduce a new **LLM benchmarking app** and supporting code.

* **CLI:** `llm` with subcommands `run`, `routes list`, `diff`,
`ci-check`.
* **Runner:** executes globally numbered tasks; filters by `--lang`,
`--categories`, `--tasks`, `--providers`, `--models`.
* **Providers/clients:** route layer (`provider:model`) with HTTP LLM
Vendor clients; env-driven keys/base URLs.
* **Evaluation:** deterministic scorers (hash/equality, JSON
shape/count, light schema/reducer parity) with clear failure messages.
* **Results:** stable JSON schema; single-file HTML viewer to
inspect/filter/export CSV.
* **Build & guards:** build script for compile-time setup;
* **Docs:** `DEVELOP.md` includes `cargo llm …` usage.

This PR is the initial addition of the app and its modules (runner,
config, routes, prompt/segmentation, scorers, schema/types,
defaults/constants/paths/hashing/combine, publishers, spacetime guard,
HTML stats viewer).

### How it works
1. **Pick what to run**

* Choose tasks (`--tasks 0,7,12`), or a language (`--lang rust|csharp`),
or categories (`--categories basics,schema`).
   * Optionally limit vendors/models (`--providers …`, `--models …`).

2. **Resolve routes**

* Read env (API keys + base URLs) and build the active set (e.g.,
`openai:gpt-5`).

3. **Build context**

   * Start Spacetime
   * Publish golden answer modules
   * Prepare prompts and send to LLM model
   * Attempt to publish LLM module

4. **Execute calls**

* Run the selected tasks within each test against selected models and
languages.

5. **Score outputs**

* Apply deterministic scorers (hash/equality, JSON shape/count, simple
schema/reducer checks).
   * Record the score and any short failure reason.

6. **Update results file**

* Write/update the single results JSON with task/route outcomes,
timings, and summaries.


# API and ABI breaking changes

None. New application and modules; no existing public APIs/ABIs altered.

# Expected complexity level and risk

**4/5.** New CLI, routing, evaluation, and artifact format.

* External model APIs may rate-limit/timeout; concurrency tunable via
`LLM_BENCH_CONCURRENCY` / `LLM_BENCH_ROUTE_CONCURRENCY`.

# Testing

I ran the full test matrix and generated results for every task against
every vendor, model, and language (rust + C#). I also tested the CI
check locally using [act](https://github.com/nektos/act).

**Please verify**

* [ ] `llm run --tasks 0,1,2` (explicit `run`)
* [ ] `llm run --lang rust --categories basics` (filters)
* [ ] `llm run --categories basics,schema` (multiple categories)
* [ ] `llm run --lang csharp` (language switch)
* [ ] `llm run --providers openai,anthropic --models "openai:gpt-5
anthropic:claude-sonnet-4-5"` (provider/model limits)
* [ ] `llm run --hash-only` (dry integrity)
* [ ] `llm run --goldens-only` (test goldens only)
* [ ] `llm run --force` (skip hash check)
* [ ] `llm ci-check`
* [ ] Stats viewer loads the JSON; filtering and CSV export work
* [ ] CI works as intended

---------

Signed-off-by: bradleyshep <148254416+bradleyshep@users.noreply.github.com>
Signed-off-by: Tyler Cloutier <cloutiertyler@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: Tyler Cloutier <cloutiertyler@aol.com>
Co-authored-by: Tyler Cloutier <cloutiertyler@users.noreply.github.com>
Co-authored-by: spacetimedb-bot <spacetimedb-bot@users.noreply.github.com>
Co-authored-by: John Detter <4099508+jdetter@users.noreply.github.com>
2026-01-06 22:22:57 +00:00