# Description of Changes
(jdetter): Github marked Ryan's PR as merged because I rebased it
without updating the base branch first - my bad.
Original PR: https://github.com/clockworklabs/SpacetimeDB/pull/4120
This PR fixes C#-related CI failures when testing against `v1.12.0`
before the corresponding NuGet packages are available on nuget.org.
* CI: Ensure local C# NuGet packages are used consistently
* Pack `crates/bindings-csharp/{BSATN.Runtime,Runtime}` with `-c
Release` so the configured local package sources (`bin/Release`)
actually contain the `.nupkg`s.
* Run `./sdks/csharp/tools~/write-nuget-config.sh` in CI to generate a
`NuGet.Config` that maps `SpacetimeDB.BSATN.Runtime` /
`SpacetimeDB.Runtime` to those local sources (with `nuget.org` as
fallback for non-SpacetimeDB dependencies).
* Add an explicit `dotnet restore --configfile NuGet.Config` step before
tests, and `run dotnet test --no-restore`, so restore always uses the
intended config and does not “accidentally” restore from `nuget.org`.
* `write-nuget-config.sh`: Make NuGet config discovery + mapping
deterministic
* Write `NuGet.Config` (capitalized) and include `<clear />` + an
explicit `nuget.org` source to avoid inherited sources and to keep
`PackageSourceMapping` functional.
* Also write a repo-root `NuGet.Config` so `dotnet publish/pack` invoked
from templates (outside `sdks/csharp/`) still discovers the same
override behavior.
* Smoketests quickstart: avoid `PackageSourceMapping` restore failures
* Ensure smoketest helper packing uses `Release`, and when repo-root
`NuGet.Config` exists, perform `dotnet restore --configfile ...` +
`dotnet pack --no-restore` so pack/restore are evaluated with the same
sources/mapping.
* When editing configs, prefer `NuGet.Config` casing for Linux discovery
and ensure `nuget.org` exists as a source for the fallback mapping.
# API and ABI breaking changes
No changes
# Expected complexity level and risk
1
# Testing
- [X] Locally tested concept after simulating local repro.
- [x] Confirmed CI passes in this branch
---------
Co-authored-by: rekhoff <r.ekhoff@clockworklabs.io>
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
# Description of Changes
<!-- Please describe your change, mention any related tickets, and so on
here. -->
This fixes 2 issues with the upgrade version tool:
1. The typescript bindings need to be updated otherwise the typescript
test in CI will fail
2. The snapshots need to be updated
When the version upgrade tool check in CI runs, snapshot changes are
accepted automatically via the `--accept-snapshots` cli argument. When
you are running this tool locally without `--accept-snapshots` you will
be asked to manually review the snapshot changes before doing a final
test to make sure the snapshots are correct.
# API and ABI breaking changes
<!-- If this is an API or ABI breaking change, please apply the
corresponding GitHub label. -->
None
# Expected complexity level and risk
1 - this just updates the version upgrade tool
<!--
How complicated do you think these changes are? Grade on a scale from 1
to 5,
where 1 is a trivial change, and 5 is a deep-reaching and complex
change.
This complexity rating applies not only to the complexity apparent in
the diff,
but also to its interactions with existing and future code.
If you answered more than a 2, explain what is complex about the PR,
and what other components it interacts with in potentially concerning
ways. -->
# Testing
<!-- Describe any testing you've done, and any testing you'd like your
reviewers to do,
so that you're confident that all the changes work as expected! -->
- [x] Version bump 1.12.0 worked:
https://github.com/clockworklabs/SpacetimeDB/pull/4084
- [x] CI passes
---------
Signed-off-by: John Detter <4099508+jdetter@users.noreply.github.com>
Co-authored-by: Zeke Foppa <196249+bfops@users.noreply.github.com>
- Add smoketest_args matrix (--docker for Linux, --no-build-cli for Windows)
- Add excluded tests: -x clear_database replication teams
- Add Build crates step
- Add Docker daemon startup and database setup for Linux
- Add database startup for Windows
- Add container cleanup step
- Match Python version to 3.12 on Windows like master
- Rename existing smoketests job to use cargo smoketest (xtask) which
handles pre-building binaries, using nextest, and optimal parallelism
- Add cargo-nextest installation step for better parallel execution
- Add new "Smoketests (Python Legacy)" job that runs the old Python
smoketests alongside the Rust ones for comparison during transition
- Add warn-python-smoketests job that posts a PR comment when Python
smoketests are modified, encouraging use of Rust smoketests instead
- Update smoketests/README.md to note the transition to Rust smoketests
# Description of Changes
This PR renames the templates to always use shorthand for the language,
specify a framework (or console) if necessary, and shorten the naming in
general
# Expected complexity level and risk
1
# Testing
I've tested generating templates manually
---------
Co-authored-by: clockwork-labs-bot <clockwork-labs-bot@users.noreply.github.com>
# Description of Changes
<!-- Please describe your change, mention any related tickets, and so on
here. -->
@cloutiertyler when you fix the llm benchmarks feel free to revert this
PR.
# API and ABI breaking changes
<!-- If this is an API or ABI breaking change, please apply the
corresponding GitHub label. -->
None
# Expected complexity level and risk
0
<!--
How complicated do you think these changes are? Grade on a scale from 1
to 5,
where 1 is a trivial change, and 5 is a deep-reaching and complex
change.
This complexity rating applies not only to the complexity apparent in
the diff,
but also to its interactions with existing and future code.
If you answered more than a 2, explain what is complex about the PR,
and what other components it interacts with in potentially concerning
ways. -->
# Testing
<!-- Describe any testing you've done, and any testing you'd like your
reviewers to do,
so that you're confident that all the changes work as expected! -->
- [x] CI runs even if llm benchmark fails
# Description of Changes
Adds TypeScript as a third language for LLM benchmark tests alongside
Rust and C#, and fixes table naming convention mismatches.
**TypeScript Support:**
- Added `Lang::TypeScript` variant with camelCase naming conventions
- Created TypeScript project template (`templates/typescript/server/`)
with package.json, tsconfig.json, and index.ts
- Added TypeScript publisher that uses `spacetime build` and `spacetime
publish`
- Created 22 TypeScript task prompts and golden answer files for all
benchmark tests
- Updated prompt discovery to find `tasks/typescript.txt` files
**Table Naming Fix:**
- Standardized on singular table names across all languages:
- Rust: `user` (snake_case singular)
- C#: `User` (PascalCase singular)
- TypeScript: `user` (camelCase singular)
- Updated `table_name()` helper to convert singular names to appropriate
case per language
- Updated all spec.rs files to use `table_name("user", lang)` instead of
hardcoded `"users"`
**CI/Hashing Improvements:**
- Added `compute_processed_context_hash()` for language-specific hash
computation after tab filtering
- Updated CI check to verify both `rustdoc_json` and `docs` modes for
Rust
- Fixed `--hash-only` mode to skip golden builds
# API and ABI breaking changes
None - these are internal benchmark tooling changes only.
# Expected complexity level and risk
**Complexity: 2**
The changes add a new language following existing patterns for Rust and
C#. The table naming fixes are straightforward find-and-replace style
updates. Low risk since this only affects the benchmark tooling, not the
core SpacetimeDB codebase.
# Testing
- [x] `cargo build -p xtask-llm-benchmark` compiles successfully
- [x] All 22 TypeScript golden modules build and publish successfully
- [x] Rust and C# benchmarks unaffected by changes
---------
Signed-off-by: Tyler Cloutier <cloutiertyler@users.noreply.github.com>
Co-authored-by: clockwork-labs-bot <clockwork-labs-bot@users.noreply.github.com>
Co-authored-by: John Detter <4099508+jdetter@users.noreply.github.com>
# Description of Changes
<!-- Please describe your change, mention any related tickets, and so on
here. -->
- This will eliminate a lot of the test flakes with the unity testsuite.
A huge amount of the test flakes here were that we were trying to
acquire more than 2 unity license seats at a time which is not allowed.
This will have the largest impact when the runners are busy.
The downside here: This will cause jobs to wait which is what we want
right now. Jobs may wait to queue for several minutes while other jobs
finish but I don't expect this to have a big impact on the total time a
PR will spend in the merge queue.
# API and ABI breaking changes
<!-- If this is an API or ABI breaking change, please apply the
corresponding GitHub label. -->
None
# Expected complexity level and risk
1
<!--
How complicated do you think these changes are? Grade on a scale from 1
to 5,
where 1 is a trivial change, and 5 is a deep-reaching and complex
change.
This complexity rating applies not only to the complexity apparent in
the diff,
but also to its interactions with existing and future code.
If you answered more than a 2, explain what is complex about the PR,
and what other components it interacts with in potentially concerning
ways. -->
# Testing
<!-- Describe any testing you've done, and any testing you'd like your
reviewers to do,
so that you're confident that all the changes work as expected! -->
- [x] Unity CI passes even if multiple jobs are running at the same time
# Description of Changes
Major documentation overhaul focusing on tables, column types, and
indexes.
**Quickstart Guides:**
- Updated React, TypeScript, Rust, and C# quickstarts with table/reducer
examples
- Fixed CLI syntax (positional `--database` argument)
- Improved template consistency across languages
**Tables Documentation:**
- Added "Why Tables" section explaining table-oriented design philosophy
(tables as fundamental unit, system tables, data-oriented design
principles)
- Added "Physical and Logical Independence" section explaining how
subscription queries use the relational model independently of physical
storage
- Added brief sections linking to related pages (Visibility,
Constraints, Schedule Tables)
- Renamed "Scheduled Tables" to "Schedule Tables" throughout (tables
store schedules; reducers are scheduled)
**Column Types:**
- Split into dedicated page with unified type reference table
- Added "Representing Collections" section (Vec/Array vs table
tradeoffs)
- Added "Binary Data and Files" section for Vec<u8> storage patterns
- Added "Type Performance" section (smaller types, fixed-size types,
column ordering for alignment)
- Added complete example struct demonstrating all type categories
- Renamed "Structured" category to "Composite"
**Indexes:**
- Complete rewrite with textbook-style documentation
- Added "When to Use Indexes" guidance
- Documented single-column and multi-column index syntax (field-level
and table-level)
- Comprehensive range query examples with correct TypeScript `Range`
class syntax
- Explained multi-column index prefix matching semantics
- Added index-accelerated deletion examples
- Included index design guidelines
**Styling:**
- Added CSS for table border radius and row separators
- Created Check component for green checkmarks in tables
# API and ABI breaking changes
None. Documentation only.
# Expected complexity level and risk
1 - Documentation changes only, no code changes.
# Testing
- [ ] Verify docs build without errors
- [ ] Review rendered pages for formatting issues
- [ ] Confirm code examples are syntactically correct
---------
Signed-off-by: Tyler Cloutier <cloutiertyler@users.noreply.github.com>
Signed-off-by: John Detter <4099508+jdetter@users.noreply.github.com>
Co-authored-by: clockwork-labs-bot <clockwork-labs-bot@users.noreply.github.com>
Co-authored-by: John Detter <4099508+jdetter@users.noreply.github.com>
This also has a few minor changes to fix build errors for the
`test-app`.
# Description of Changes
Updates the typescript-test CI job to build some packages that weren't
being built before.
This also updates the root-level `pnpm generate/format/lint/build`
commands to also apply to templates.
# Expected complexity level and risk
1
---------
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
# Description of Changes
Augmented the concurrency group to not have unrelated comments cancel
the `/update-llm-benchmark` command.
# API and ABI breaking changes
CI only
# Expected complexity level and risk
1
# Testing
I think this can only be tested once merged? Comments on this PR still
seem to use the old workflow.
---------
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
# Description of Changes
Make the GH username for this job correspond to an actual GH user, so we
can have it sign our CLA.
# API and ABI breaking changes
CI only
# Expected complexity level and risk
1
# Testing
I think we could test this by having clockworklabs-bot create an
external PR, if we think that's worth it. Alternatively we could merge
and test in prod.
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
# Description of Changes
We would like to move all of the templates to a central directory
# API and ABI breaking changes
None
# Expected complexity level and risk
2
# Testing
---------
Co-authored-by: spacetimedb-bot <spacetimedb-bot@users.noreply.github.com>
# Description of Changes
Introduce a new **LLM benchmarking app** and supporting code.
* **CLI:** `llm` with subcommands `run`, `routes list`, `diff`,
`ci-check`.
* **Runner:** executes globally numbered tasks; filters by `--lang`,
`--categories`, `--tasks`, `--providers`, `--models`.
* **Providers/clients:** route layer (`provider:model`) with HTTP LLM
Vendor clients; env-driven keys/base URLs.
* **Evaluation:** deterministic scorers (hash/equality, JSON
shape/count, light schema/reducer parity) with clear failure messages.
* **Results:** stable JSON schema; single-file HTML viewer to
inspect/filter/export CSV.
* **Build & guards:** build script for compile-time setup;
* **Docs:** `DEVELOP.md` includes `cargo llm …` usage.
This PR is the initial addition of the app and its modules (runner,
config, routes, prompt/segmentation, scorers, schema/types,
defaults/constants/paths/hashing/combine, publishers, spacetime guard,
HTML stats viewer).
### How it works
1. **Pick what to run**
* Choose tasks (`--tasks 0,7,12`), or a language (`--lang rust|csharp`),
or categories (`--categories basics,schema`).
* Optionally limit vendors/models (`--providers …`, `--models …`).
2. **Resolve routes**
* Read env (API keys + base URLs) and build the active set (e.g.,
`openai:gpt-5`).
3. **Build context**
* Start Spacetime
* Publish golden answer modules
* Prepare prompts and send to LLM model
* Attempt to publish LLM module
4. **Execute calls**
* Run the selected tasks within each test against selected models and
languages.
5. **Score outputs**
* Apply deterministic scorers (hash/equality, JSON shape/count, simple
schema/reducer checks).
* Record the score and any short failure reason.
6. **Update results file**
* Write/update the single results JSON with task/route outcomes,
timings, and summaries.
# API and ABI breaking changes
None. New application and modules; no existing public APIs/ABIs altered.
# Expected complexity level and risk
**4/5.** New CLI, routing, evaluation, and artifact format.
* External model APIs may rate-limit/timeout; concurrency tunable via
`LLM_BENCH_CONCURRENCY` / `LLM_BENCH_ROUTE_CONCURRENCY`.
# Testing
I ran the full test matrix and generated results for every task against
every vendor, model, and language (rust + C#). I also tested the CI
check locally using [act](https://github.com/nektos/act).
**Please verify**
* [ ] `llm run --tasks 0,1,2` (explicit `run`)
* [ ] `llm run --lang rust --categories basics` (filters)
* [ ] `llm run --categories basics,schema` (multiple categories)
* [ ] `llm run --lang csharp` (language switch)
* [ ] `llm run --providers openai,anthropic --models "openai:gpt-5
anthropic:claude-sonnet-4-5"` (provider/model limits)
* [ ] `llm run --hash-only` (dry integrity)
* [ ] `llm run --goldens-only` (test goldens only)
* [ ] `llm run --force` (skip hash check)
* [ ] `llm ci-check`
* [ ] Stats viewer loads the JSON; filtering and CSV export work
* [ ] CI works as intended
---------
Signed-off-by: bradleyshep <148254416+bradleyshep@users.noreply.github.com>
Signed-off-by: Tyler Cloutier <cloutiertyler@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: Tyler Cloutier <cloutiertyler@aol.com>
Co-authored-by: Tyler Cloutier <cloutiertyler@users.noreply.github.com>
Co-authored-by: spacetimedb-bot <spacetimedb-bot@users.noreply.github.com>
Co-authored-by: John Detter <4099508+jdetter@users.noreply.github.com>
# Description of Changes
Introduce a hacky workaround to our `csharp-testsuite` to address
missing `librusty_v8.a`: manually check for that file and manually build
the package if it's missing.
# API and ABI breaking changes
CI-only change
# Expected complexity level and risk
1
# Testing
- [x] Locally tested removing the `librusty_v8.a` and then running
`cargo clean -p v8 && cargo build -p v8`, and this does seem to repair
it.
- [x] The CI has run with a cache that is "broken", but successfully
passes `csharp-testsuite`
---------
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
# Description of Changes
Disable `cache-on-failure` as we think that it's contributing to
mysterious `rusty_v8` linker issues.
I bumped the prefix key so that PRs with this change don't share caches
with PRs missing this change.
# API and ABI breaking changes
CI only.
# Expected complexity level and risk
1
# Testing
None. We'll have to see if we stop having issues once this is merged.
---------
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
# Description of Changes
Make `cargo ci` work properly on Windows, in preparation for
https://github.com/clockworklabs/SpacetimeDB/pull/3702.
# API and ABI breaking changes
No. CI-only.
# Expected complexity level and risk
2. Not trivial, but not complicated.
# Testing
- [x] CI output seems to be genuinely running the tests, and it's
passing on Windows
- [x] Make a change to `crates/bindings-csharp` and see that `cargo ci
test` fails
- [x] I can manually run a minimal `cargo ci smoketests` invocation on a
Windows machine
---------
Signed-off-by: Zeke Foppa <196249+bfops@users.noreply.github.com>
Co-authored-by: Kasama <robertoaall@gmail.com>
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
# Description of Changes
We were using `rust-toolchain` in some places, and `rust-toolchain-file`
in others. I think Rust released a new version, which made the
`rust-toolchain` parts break with:
```
info: downloading component 'rustfmt'
info: removing previous version of component 'rust-src'
info: rolling back changes
error: could not rename component file from '/root/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/term/terminfo/searcher' to '/root/.rustup/tmp/a4eo07uz83vsyfhk_dir/bk': Invalid cross-device link (os error 18)
Error: Process completed with exit code 1.
```
(Separately, this breakage is confusing.. we'll probably run into this
again when we roll forward our rust version..)
# API and ABI breaking changes
None. CI-only change.
# Expected complexity level and risk
1
# Testing
- [x] CI passes
- [x] There are no more instances of `rust-toolchain`:
```
$ grep -rIP 'rust-toolchain(?!-file)' .github/workflows
.github/CODEOWNERS:/rust-toolchain.toml @cloutiertyler
```
(on `master`, this finds the instances we changed)
---------
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
# Description of Changes
This changes the ci runs to execute `cargo ci` instead of running
commands directly from the github workflow.
The goal here is to unify the commands under `cargo ci` so that it's
easier and more intuitive to run locally
# API and ABI breaking changes
There are no API/ABI changes.
<!-- If this is an API or ABI breaking change, please apply the
corresponding GitHub label. -->
# Expected complexity level and risk
Complexity: 1
It is not a complex change as it is mostly localized to the ci runs and
is easily reversible if something goes wrong. The biggest risk here is
to have future CI runs break, which can be remediated by reverting these
changes.
<!--
How complicated do you think these changes are? Grade on a scale from 1
to 5,
where 1 is a trivial change, and 5 is a deep-reaching and complex
change.
This complexity rating applies not only to the complexity apparent in
the diff,
but also to its interactions with existing and future code.
If you answered more than a 2, explain what is complex about the PR,
and what other components it interacts with in potentially concerning
ways. -->
# Testing
<!-- Describe any testing you've done, and any testing you'd like your
reviewers to do,
so that you're confident that all the changes work as expected! -->
- [x] run `cargo ci` and its subcommands locally
- [x] run the github workflow against this branch to check if the CI
jobs are working properly.
---------
Signed-off-by: Zeke Foppa <196249+bfops@users.noreply.github.com>
Signed-off-by: Roberto Pommella Alegro <robertoaall@gmail.com>
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
Co-authored-by: Zeke Foppa <196249+bfops@users.noreply.github.com>
# Description of Changes
<!-- Please describe your change, mention any related tickets, and so on
here. -->
This has 2 benefits:
1. If the Unity test fails because of a license issue then we don't have
to re-run the C# tests again as part of this flow. Re-running the Unity
tests will be much faster if that's the only thing the job is doing.
2. These tests will run faster because they will now run in parallel as
separate CI jobs.
# API and ABI breaking changes
<!-- If this is an API or ABI breaking change, please apply the
corresponding GitHub label. -->
None
# Expected complexity level and risk
1
<!--
How complicated do you think these changes are? Grade on a scale from 1
to 5,
where 1 is a trivial change, and 5 is a deep-reaching and complex
change.
This complexity rating applies not only to the complexity apparent in
the diff,
but also to its interactions with existing and future code.
If you answered more than a 2, explain what is complex about the PR,
and what other components it interacts with in potentially concerning
ways. -->
# Testing
<!-- Describe any testing you've done, and any testing you'd like your
reviewers to do,
so that you're confident that all the changes work as expected! -->
- [x] Both tests pass
---------
Signed-off-by: John Detter <4099508+jdetter@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: Zeke Foppa <196249+bfops@users.noreply.github.com>
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
# Description of Changes
<!-- Please describe your change, mention any related tickets, and so on
here. -->
Sets a 2 hour timeout for the smoketests. If the smoketests are taking
more than 2 hours they likely will never finish and we should just time
them out.
# API and ABI breaking changes
None
<!-- If this is an API or ABI breaking change, please apply the
corresponding GitHub label. -->
# Expected complexity level and risk
1
<!--
How complicated do you think these changes are? Grade on a scale from 1
to 5,
where 1 is a trivial change, and 5 is a deep-reaching and complex
change.
This complexity rating applies not only to the complexity apparent in
the diff,
but also to its interactions with existing and future code.
If you answered more than a 2, explain what is complex about the PR,
and what other components it interacts with in potentially concerning
ways. -->
# Testing
<!-- Describe any testing you've done, and any testing you'd like your
reviewers to do,
so that you're confident that all the changes work as expected! -->
- [x]
https://github.com/clockworklabs/SpacetimeDB/actions/runs/19697510980/job/56425630374?pr=3772
# Description of Changes
<!-- Please describe your change, mention any related tickets, and so on
here. -->
Just moving more jobs to the custom runner.
|Test|Before|After|% Change|
|---|---|---|---|
|Docs - Build + Test|2m|1m|No change|
|Docs - Publish|2m|<2m|No change|
# API and ABI breaking changes
<!-- If this is an API or ABI breaking change, please apply the
corresponding GitHub label. -->
None
# Expected complexity level and risk
1 - This is very low risk
<!--
How complicated do you think these changes are? Grade on a scale from 1
to 5,
where 1 is a trivial change, and 5 is a deep-reaching and complex
change.
This complexity rating applies not only to the complexity apparent in
the diff,
but also to its interactions with existing and future code.
If you answered more than a 2, explain what is complex about the PR,
and what other components it interacts with in potentially concerning
ways. -->
# Testing
I ran a publish test and it worked just fine:
<https://github.com/clockworklabs/SpacetimeDB/actions/runs/19515454265/job/55865908554>
- [x] CI still passes
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
# Description of Changes
Move anything running on `spacetimedb-runner` to
`spacetimedb-new-runner`
# API and ABI breaking changes
None.
# Expected complexity level and risk
1
# Testing
- [x] CI passes
---------
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
# Description of Changes
Our automated discord posts have a more explicit call to action for the
PR author.
# API and ABI breaking changes
None
# Expected complexity level and risk
1
# Testing
Did a test post and it looks fine
---------
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>