# Description of Changes
Makes the keynote benchmark job reusable so that it can be invoked and
run in other CI environments.
# API and ABI breaking changes
N/A
# Expected complexity level and risk
2
# Testing
Refactor. Relies on existing coverage.
# Description of Changes
Uses `jsonwebtoken v10.4.0` instead. Important changes include:
**1. Token serialization**
Old tokens with `"exp": null` are still accepted, but new no-expiry
tokens now omit `exp` instead of serializing it as `"exp": null`.
**2. OIDC/JWKS validation**
Issuer extraction now uses `jsonwebtoken::dangerous::insecure_decode`
for key discovery only, not validation. The old `spacetimedb-jwks`
crate also required every JWK to have a `kid`; this patch drops that
requirement.
# API and ABI breaking changes
I don't believe this is considered breaking, but it bears repeating that
new no-expiry tokens now serialize without `exp` instead of
`"exp": null`.
# Expected complexity level and risk
2
# Testing
- [x] Verify a legacy no-expiry token serialized as `"exp": null` still
validates.
- [x] Verify a token with an expired `exp` is still rejected.
- [x] Verify OIDC/JWKS validation works when the JWKS keys omit the
optional `kid` field.
# Description of Changes
We currently have Discord notifications when PRs merge into master. We
have two problems:
1. These notifications don't run properly for external PRs (due to
missing secrets)
2. We don't get notifications for other pushes to master
This PR fixes those issues by changing the job to run on any push to
master rather than when a PR is closed.
# API and ABI breaking changes
None. CI only.
# Expected complexity level and risk
1
# Testing
I don't think we can test this without merging into master.
---------
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
### Note 1: this requires a website PR to merge
### Note 2:
I was able to run all workflow smoke tests successfully, including
golden validation and dry-run benchmarks, except for the C# dry-run
benchmark path. C# golden validation passes, but the C# benchmark dry
run still fails intermittently/consistently on the runner despite
several attempts to align its build/publish setup with the known-good
smoketest path.
```
gh workflow run llm-benchmark-periodic.yml `
--repo ClockworkLabs/SpacetimeDB `
--ref bradley/fix-validate-goldens-ci `
-f model_set=explicit `
-f models="openrouter:openai/gpt-5.4-mini" `
-f languages=rust,csharp,typescript `
-f modes=guidelines `
-f tasks=t_000_empty_reducers `
-f dry_run=true
```
# Description of Changes
This updates the LLM benchmark automation and runner plumbing.
- Move periodic LLM benchmark and golden validation workflows from
daily/nightly to weekly Monday UTC runs.
- Add manual workflow inputs for benchmark smoke runs:
  - model set: website-managed, local defaults, or explicit models
  - languages, modes, categories, tasks
  - dry-run mode
- Build the local TypeScript SDK before TypeScript benchmark/golden
validation runs.
- Add support for fetching active/available benchmark models from the
website API via `--model-source remote`.
- Keep explicit `--models ...` working for manual/local overrides.
- Add OpenRouter preflight checks before benchmark execution:
  - checks key/account credits when available
  - probes the selected model when credit balance cannot be checked
  - supports `OPENROUTER_ALLOW_UNCHECKED_CREDITS=1` escape hatch
  - supports `OPENROUTER_MIN_CREDITS` / `LLM_MIN_CREDITS`
- Force scheduled benchmark workflow runs through OpenRouter with
`LLM_VENDOR=openrouter`, while preserving direct OpenAI support for
local/manual use.
- Improve benchmark publishing isolation:
  - isolated SpacetimeDB CLI root per publish
  - serialized C# benchmark publish concurrency
  - local NuGet package references for generated C# benchmark projects
  - Windows/PATH handling for TypeScript `pnpm`
- Update default benchmark model routes to current model names/ids.
- Update TypeScript golden answers for current SDK shape.
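The OpenRouter preflight bullets above can be read as a small decision procedure. The following is a rough Python sketch of that reading, not the runner's actual Rust logic. The function name `preflight_ok` is hypothetical, and two details are assumptions: that `OPENROUTER_MIN_CREDITS` takes precedence over `LLM_MIN_CREDITS`, and that the escape hatch is consulted before the model probe result.

```python
def preflight_ok(env, credits=None, probe_ok=False):
    """Decide whether a benchmark run may start.

    env      -- mapping of environment variables
    credits  -- remaining credit balance, or None when it cannot be checked
    probe_ok -- result of probing the selected model (used only without credits)
    """
    # Assumption: the OpenRouter-specific variable wins; the default minimum is 0.
    minimum = float(env.get("OPENROUTER_MIN_CREDITS") or env.get("LLM_MIN_CREDITS") or 0)
    if credits is not None:
        return credits >= minimum
    # Balance unknown: honor the escape hatch, otherwise rely on the model probe.
    if env.get("OPENROUTER_ALLOW_UNCHECKED_CREDITS") == "1":
        return True
    return bool(probe_ok)
```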
# API and ABI breaking changes
None.
This adds benchmark-runner/workflow behavior and CLI options, but does
not change SpacetimeDB runtime API or ABI.
# Expected complexity level and risk
3/5
The changes are mostly isolated to the LLM benchmark runner and GitHub
workflows, but the risk is moderate because they touch CI execution
paths, local SDK build assumptions, website-managed model resolution,
OpenRouter routing, and generated module publish behavior across Rust,
C#, and TypeScript.
The most sensitive pieces are:
- GitHub Actions workflow dispatch/manual input behavior.
- Remote model registry parsing from the website.
- C# benchmark publish behavior on the self-hosted runner.
# Testing
- [x] `cargo check -p xtask-llm-benchmark --bin llm_benchmark`
- [x] `cargo test -p xtask-llm-benchmark --bin llm_benchmark`
- [x] `cargo test -p xtask-llm-benchmark parses_active_available_model_routes`
- [x] Manual GitHub Actions golden validation smoke runs for Rust, C#,
and TypeScript.
- [ ] Run a dry-run periodic benchmark workflow from this branch with
one explicit OpenRouter model, one task, and all languages.
- [ ] Run a website-dispatched dry-run benchmark and verify it sends
`model_set=explicit` plus selected model/task inputs.
---------
Co-authored-by: clockwork-labs-bot <clockwork-labs-bot@clockworklabs.io>
Co-authored-by: clockwork-labs-bot <clockwork-labs-bot@users.noreply.github.com>
# Description of Changes
Fixes the `Release Notifications` workflow startup failure seen in
<https://github.com/clockworklabs/SpacetimeDB/actions/runs/27775721225/workflow>.
The internal announcement job referenced `needs: on-release`, but no
`on-release` job exists in `.github/workflows/tag-release.yml`, so the
workflow failed before scheduling any jobs. This removes the dangling
dependency and gates the internal Discord announcement to real `release`
events so manual `workflow_dispatch` dry runs do not try to send an
internal release announcement using missing release-event fields.
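A dangling `needs:` target like this one can be caught mechanically. The sketch below is a hypothetical helper, not part of this PR. It assumes the workflow's `jobs` mapping has already been loaded into a Python dict (for example by a YAML parser) and reports every `needs` entry that does not name an existing job.

```python
def dangling_needs(jobs):
    """Map each job name to the `needs` targets that are not defined jobs."""
    problems = {}
    for name, job in jobs.items():
        needs = job.get("needs", [])
        if isinstance(needs, str):  # `needs` may be a single name or a list
            needs = [needs]
        missing = [target for target in needs if target not in jobs]
        if missing:
            problems[name] = missing
    return problems
```

An empty result means every dependency resolves; the job names used to exercise it are illustrative only.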
# API and ABI breaking changes
None. This only changes GitHub Actions configuration.
# Expected complexity level and risk
1 - Low complexity. The change is limited to one workflow job
dependency/condition.
# Testing
- [x] Parsed `.github/workflows/tag-release.yml` as YAML.
- [x] Checked that all remaining `needs:` targets in
`.github/workflows/tag-release.yml` refer to existing jobs.
- [x] Ran `git diff --check`.
- [ ] Optional reviewer check: run the workflow manually with the
default dry-run inputs after merge.
---------
Signed-off-by: Zeke Foppa <196249+bfops@users.noreply.github.com>
Co-authored-by: clockwork-labs-bot <clockwork-labs-bot@users.noreply.github.com>
Co-authored-by: Zeke Foppa <196249+bfops@users.noreply.github.com>
# Description of Changes
Creates a new GitHub action that triggers any time a `Release` on the
SpacetimeDB repo changes to the `published` state.
When this triggers, the workflow will take information from that
release, and build a message from it, in the form of:
```
**SpacetimeDB ${RELEASE_TAG}**
View the full release notes:
${RELEASE_URL}
${RELEASE_BODY}
```
And send that message to the SpacetimeDB Public Discord Webhook.
Note: This PR itself does not set up or configure the Discord Webhook,
and relies on the Webhook URL already being available.
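As a rough illustration of the message format above, here is a minimal Python sketch. The workflow itself builds the message in GitHub Actions, so the function names here are hypothetical. The only external detail relied on is that a Discord webhook takes its message text from the `content` field of the JSON body.

```python
import json


def build_release_message(tag, url, body):
    """Fill in the release template shown above."""
    return (
        f"**SpacetimeDB {tag}**\n"
        "View the full release notes:\n"
        f"{url}\n"
        f"{body}"
    )


def webhook_payload(tag, url, body):
    """JSON body for the webhook request; the text goes in `content`."""
    return json.dumps({"content": build_release_message(tag, url, body)})
```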
# API and ABI breaking changes
No API or ABI changes, this is only related to GitHub tooling.
# Expected complexity level and risk
1 - Low complexity. The only risk is in sending garbage messages to the
Discord URL if this automation is improperly configured.
# Testing
- [X] Ran a local version of similar code to test formatting. No testing
of this GitHub Action has been performed.
---------
Signed-off-by: Ryan <r.ekhoff@clockworklabs.io>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
# Description of Changes
Adds a merge-queue fast path for CI when the synthetic merge-group
commit has the same tree as the queued PR head.
The new `merge_queue_noop` job parses the PR number from the merge-group
ref, resolves the PR head SHA, and compares that tree to `GITHUB_SHA`.
If there is no diff, the expensive CI jobs are skipped as duplicate
work. Matrix jobs with required per-matrix check names get lightweight
no-op counterparts so branch protection still sees the expected
successful check names.
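The parse-and-compare step can be sketched as follows. This is an illustration in Python, not the workflow's actual shell logic. It assumes merge-group refs follow GitHub's `refs/heads/gh-readonly-queue/<base>/pr-<number>-<sha>` naming, and the function names are hypothetical. Any parse or lookup failure falls back to running normal CI.

```python
import re

# Assumed shape of a merge-group ref; anything else is treated as "unknown".
_MERGE_GROUP_REF = re.compile(r"^refs/heads/gh-readonly-queue/.+/pr-(\d+)-[0-9a-f]+$")


def pr_number_from_merge_group_ref(ref):
    """Return the PR number encoded in a merge-group ref, or None."""
    match = _MERGE_GROUP_REF.match(ref)
    return int(match.group(1)) if match else None


def can_skip_ci(ref, merge_tree, pr_head_tree):
    """Skip only when the PR number parses and both trees are identical."""
    if pr_number_from_merge_group_ref(ref) is None or pr_head_tree is None:
        return False  # fall back to normal CI
    return merge_tree == pr_head_tree
```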
# API and ABI breaking changes
None.
# Expected complexity level and risk
2. This is limited to GitHub Actions wiring, but it interacts with merge
queue semantics and required check names. The implementation
intentionally falls back to normal CI if it cannot parse the PR number
or resolve the PR head.
# Testing
- [x] Parsed `.github/workflows/ci.yml` with Ruby YAML.
- [x] Ran `git diff --check`.
---------
Signed-off-by: Zeke Foppa <196249+bfops@users.noreply.github.com>
Co-authored-by: clockwork-labs-bot <clockwork-labs-bot@users.noreply.github.com>
Co-authored-by: Zeke Foppa <196249+bfops@users.noreply.github.com>
## What changed
Adds an explicit checkout step before `dorny/paths-filter` in the
Internal Tests workflow.
## Why
`dorny/paths-filter@v3` needs a git working tree for `push` events. The
Internal Tests workflow ran it before any checkout, so every `push` run
on `master` failed immediately in `Detect non-docs changes` with:
```text
fatal: not a git repository (or any of the parent directories): .git
```
This only showed up consistently on `master` because those runs are
`push` events. On `pull_request` events, `dorny/paths-filter` can use
the GitHub pull request files API with the PR number, so it does not
need a local checkout for the same file detection path.
Adding checkout gives the action a repository when it handles `push`
events, while leaving PR behavior unchanged.
## Testing
- `git diff --check`
- PR #5295 `Internal Tests` job completed `Checkout` and
`Detect non-docs changes` successfully, then moved on to private dispatch/wait.
---------
Signed-off-by: Zeke Foppa <196249+bfops@users.noreply.github.com>
Co-authored-by: clockwork-labs-bot <clockwork-labs-bot@users.noreply.github.com>
Co-authored-by: Zeke Foppa <196249+bfops@users.noreply.github.com>
# Description of Changes
We believe this Docker build is completely unused. This Docker container
is different from the Docker build that we ship for releases, which
is the one users actually end up using. This specific Docker build used
to be for internal deploys, but we have not used it in a very long time,
probably more than a year or two.
# API and ABI breaking changes
None - this is just a CI change.
# Expected complexity level and risk
1 - just a CI change
# Testing
- Not tested, but we synced on this in Discord and there were no
objections from the devops team or @joshua-spacetime.
---------
Co-authored-by: Zeke Foppa <196249+bfops@users.noreply.github.com>
# Description of Changes
This removes the `spacetimedb-update` check specifically on ARM. This
test doesn't add much value because we're already covering
Linux + Windows on x86 and macOS on aarch64. Removing it will
allow us to decommission the ARM runner.
# API and ABI breaking changes
None - just a CI change
# Expected complexity level and risk
1
# Testing
- I have not tested this, but Zeke and I synced on it and we think it
makes sense.
# Description of Changes
Moves `RelationalDB` and related database code into a new
`spacetimedb-engine` crate.
The main motivation is to tighten dependency control around the engine
layer and isolate `RelationalDB`
behind a crate boundary.
- Majority of this PR is code-motion.
- Removes direct production dependence on `tokio` from
`spacetimedb-engine`.
- Keeps `tokio` only as a dev-dependency for test-only code in
`spacetimedb-engine`.
- This is intended to be a structural refactor only and should not
result in any functional change in
production.
- Adds a CI check to ensure `spacetimedb-engine` continues to compile in
simulation mode
# API and ABI breaking changes
N/A
# Expected complexity level and risk
1.5.
# Testing
Existing tests should be enough.
---------
Co-authored-by: clockwork-labs-bot <clockwork-labs-bot@users.noreply.github.com>
# Description of Changes
Adding a stub version of this workflow so we can test it with
`workflow_dispatch`
# API and ABI breaking changes
# Expected complexity level and risk
1
# Testing
None, can't test until it's merged
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
# Description of Changes
Restores `CLA Gate` as a repository-owned commit status on the actual
target SHA.
The merged workflow tried to use the Actions job result for `status`
events. Those runs are attached to the default-branch SHA, so a
`license/cla` status on a PR head can trigger the workflow without
creating any required context on the PR commit. This change mirrors the
`license/cla` result back to `CLA Gate` on the PR or merge-group SHA.
# API and ABI breaking changes
None.
# Expected complexity level and risk
1. This is a narrow workflow fix for the required CLA check context.
# Testing
- [x] Ruby YAML parse for `.github/workflows/cla-gate.yml`
- [x] `git diff --check`
- [ ] Confirm a new `license/cla` status posts `CLA Gate` on the same PR
head SHA
Co-authored-by: clockwork-labs-bot <clockwork-labs-bot@users.noreply.github.com>
## Summary
- keep the CLA Gate workflow using `cargo ci cla-assistant status` for
CLA Assistant lookups
- stop mirroring pull request/status `license/cla` results into a custom
`CLA Gate` commit status
- use a plain `pull_request` trigger for PR checks, and keep merge-group
status publishing for synthetic queue SHAs
- let the Actions job pass when `license/cla` is success and fail when
it is missing or non-success
## Notes
Merge-group handling is left as-is. This PR intentionally keeps the
helper command and only changes the workflow behavior needed to avoid
the custom status mirror on PR/status events.
## Testing
- `cargo fmt --all`
- `cargo check -p ci`
- `cargo ci self-docs --check`
- Ruby YAML parse
- `git diff --check`
`actionlint` unavailable locally.
Co-authored-by: clockwork-labs-bot <clockwork-labs-bot@users.noreply.github.com>
## Summary
- Add a Discord announcement job for published GitHub releases.
- Use `DISCORD_WEBHOOK_RELEASE_CHANNEL_URL` so the target channel can be
configured as a GitHub secret.
- Run the announcement job even if Docker `latest` retagging fails, and
include the retag result in the message.
## Verification
- `ruby -e 'require "yaml"; YAML.load_file(".github/workflows/tag-release.yml")'`
- `git diff --check`
---------
Signed-off-by: Zeke Foppa <196249+bfops@users.noreply.github.com>
Co-authored-by: clockwork-labs-bot <clockwork-labs-bot@users.noreply.github.com>
Co-authored-by: Zeke Foppa <196249+bfops@users.noreply.github.com>
## Summary
- Add a CLA Gate workflow that publishes a repository-owned commit
status.
- Mirror CLA Assistant license/cla status on pull requests and status
events.
- Publish CLA Gate=success on merge_group runs, because entries have
already passed PR checks before entering the queue.
## Required settings change after merge
- Remove license/cla from required checks.
- Add CLA Gate as the required CLA check.
This keeps CLA enforcement before merge queue while removing CLA
Assistant from the merge-group critical path.
## Test plan
- Parsed .github/workflows/cla-gate.yml with Ruby YAML loader.
- Ran git diff --check.
---------
Signed-off-by: Zeke Foppa <196249+bfops@users.noreply.github.com>
Co-authored-by: clockwork-labs-bot <clockwork-labs-bot@users.noreply.github.com>
Co-authored-by: Zeke Foppa <196249+bfops@users.noreply.github.com>
# Description of Changes
Moves `test_index_scans` to its own job that uses the same runner as the
keynote benchmark.
This test had several issues:
1. It was a performance regression test that didn't run in an isolated
environment because it was just a test.
2. It measured timings by searching the module log for `ns` (nanosecond) and
`us` (microsecond) suffixes.
As a result it would occasionally flake.
Now it runs on dedicated hardware in an isolated environment, so we
shouldn't see any more flakes.
# API and ABI breaking changes
N/A
# Expected complexity level and risk
1
# Testing
N/A
# Description of Changes
The objective here is to get rid of the ARM runner that we have deployed.
It is heavily underutilized and is sometimes the bottleneck during a
release, because it can only run a small number of jobs at any given
time. Instead, we will cross-compile to ARM on our existing x86 GitHub
runner fleet.
# API and ABI breaking changes
None - CI only change.
# Expected complexity level and risk
1 - CI only change
# Testing
https://github.com/clockworklabs/SpacetimeDB/actions/runs/26833018052
# Description of Changes
(Moving this to a tools repo)
# API and ABI breaking changes
None
# Expected complexity level and risk
1
# Testing
None
---------
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
Co-authored-by: clockwork-labs-bot <clockwork-labs-bot@users.noreply.github.com>
## Summary
- Add a small `Retry CLA Assistant` workflow for cases where
`license/cla` is missing or pending after the normal PR checks have gone
green.
- Keep the workflow thin: it checks out trusted default-branch code,
determines the PR number from the GitHub event, and runs
`cargo ci retry-cla-assistant --pr-number <number>`.
- Put the retry gate and CLA Assistant recheck call in the existing Rust
`tools/ci` crate.
- Trigger it from PR updates, any workflow completion, manual dispatch
with a PR number, and a 15-minute scheduled fallback sweep.
## Behavior
The workflow passes explicit PR numbers into the Rust command. For
scheduled runs, it enumerates open `master` PRs in the workflow and
invokes the Rust command for each one.
The `cargo ci retry-cla-assistant --pr-number <number>` command only
calls CLA Assistant's recheck endpoint when the PR is open, non-draft,
targets `master`, the head SHA is at least 10 minutes old, at least one
check run exists, all reported check runs are green, no non-CLA commit
status is non-green, and `license/cla` is missing/pending/failing.
The `workflow_run` trigger is intentionally unfiltered, so adding or
renaming CI workflows does not require changing this workflow.
It calls
`https://cla-assistant.io/check/{owner}/{repo}?pullRequest={number}` and
polls `license/cla` for up to 3 minutes.
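The retry gate described above amounts to a single predicate over a snapshot of the PR. The sketch below restates it in Python for illustration. The real gate lives in the Rust `tools/ci` crate, and the `PrSnapshot` type and its field names are hypothetical. Treating any non-`success` state of `license/cla` as retryable is an interpretation of "missing/pending/failing".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PrSnapshot:
    is_open: bool
    is_draft: bool
    base_branch: str
    head_sha_age_minutes: float
    check_runs_green: List[bool]      # one entry per reported check run
    other_statuses_green: List[bool]  # one entry per non-CLA commit status
    cla_state: Optional[str]          # None when `license/cla` is missing


def should_retry_cla(pr):
    """True only when every condition listed in the description holds."""
    return (
        pr.is_open
        and not pr.is_draft
        and pr.base_branch == "master"
        and pr.head_sha_age_minutes >= 10
        and len(pr.check_runs_green) >= 1
        and all(pr.check_runs_green)
        and all(pr.other_statuses_green)
        and pr.cla_state != "success"  # missing, pending, or failing
    )
```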
## Safety
- The workflow uses `pull_request_target`, but it does not check out or
execute PR code. It checks out the trusted default branch before running
the CI tool.
- It ignores completion of the `Retry CLA Assistant` workflow itself.
- It intentionally does not forge a `license/cla` status.
## Validation
- Parsed the workflow YAML locally with Ruby/Psych.
- Ran `cargo fmt --package ci`.
- Ran `cargo check -p ci`.
- Ran `cargo ci retry-cla-assistant --help` and verified `--pr-number`
is required.
- Tested the first Rust version against live PR #5164 metadata and
caught that a hard-coded required-check list was too strict; replaced
that list with a reported-checks-all-green gate.
Addresses #5215.
---------
Signed-off-by: Zeke Foppa <196249+bfops@users.noreply.github.com>
Co-authored-by: clockwork-labs-bot <clockwork-labs-bot@users.noreply.github.com>
Co-authored-by: Zeke Foppa <196249+bfops@users.noreply.github.com>
# Description of Changes
Updates `llm-benchmark-periodic.yml` and
`llm-benchmark-validate-goldens.yml` to new CI infrastructure.
- Switch to the current runner (`spacetimedb-new-runner-2`)
- Drop the dead `localhost:5000` container + `--privileged`
- Build the local SpacetimeDB server the benchmark harness needs, and
use that
same local CLI for publishing
# API and ABI breaking changes
None. CI-only.
# Expected complexity level and risk
1 — workflow-only, no production code.
# Testing
- [ ] Run both workflows via `workflow_dispatch` and confirm they pass
# Description of Changes
Set `RUST_BACKTRACE=full` for the CI jobs.
# API and ABI breaking changes
None
# Expected complexity level and risk
1
# Testing
None
---------
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
# Description of Changes
Making @joshua-spacetime a CODEOWNER of this instead of me.
# API and ABI breaking changes
# Expected complexity level and risk
1
# Testing
None tbh
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
# Description of Changes
This is no longer used.
# API and ABI breaking changes
None
# Expected complexity level and risk
1
# Testing
- [x] CI passes
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
# Description of Changes
- Completes the Blackholio demo with username selection, leaderboard,
split mechanic, improved visuals...
- Add Godot tests
# API and ABI breaking changes
- Nothing
# Expected complexity level and risk
1. It just updates the Blackholio demo project
# Testing
- [X] Play the updated demo
- [X] Run the tests locally
# Description of Changes
Moving `Internal Tests` to its own workflow so it can be canceled and
re-run independently from the rest of CI.
# API and ABI breaking changes
# Expected complexity level and risk
1
# Testing
- [x] `Internal Tests` succeed on this PR and appear to have run
properly
---------
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
## Summary
- Add a `dorny/paths-filter` step to the `internal-tests` job that
checks for any changed file outside `docs/`.
- Gate the private-repo dispatch and the wait-for-completion steps on
that filter, so a docs-only PR no longer fires off the private CI
workflow.
- The job itself still runs and completes successfully, so any branch
protection requiring the `Internal Tests` check continues to be
satisfied.
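The filter condition ("any changed file outside `docs/`") is simple enough to state as a one-line predicate. The sketch below is only an illustration in Python. The workflow itself relies on `dorny/paths-filter`, and the function name is hypothetical.

```python
def has_non_docs_changes(changed_files):
    """True when at least one changed path lies outside `docs/`."""
    return any(not path.startswith("docs/") for path in changed_files)
```

When this returns false, the dispatch and wait steps would be skipped while the job itself still reports success.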
## Test plan
- [ ] Open a docs-only follow-up PR and confirm `Internal Tests` reports
success without dispatching a private run.
- [ ] Open a PR touching `crates/` (or anything non-docs) and confirm
the private dispatch still fires as before.
# Description of Changes
Adds a new required CI check for keynote-2 benchmark regressions. The
test runs for 60s and fails if throughput < 300K TPS.
Note, this check will be flaky as long as it's running concurrently with
other CI jobs. It may need a dedicated runner/host machine, although it
may be sufficient to only schedule one runner/VM to a single host
machine at a time. I'll need to sync with @jdetter to determine the best
way forward here.
UPDATE: We're using a dedicated runner. See the **Testing** section.
# API and ABI breaking changes
N/A
# Expected complexity level and risk
2
Mainly copy-paste from the other CI workflows.
# Testing
This job now uses `spacetimedb-benchmark-runner` which is entirely
dedicated to this one CI job. I've tested this at different times of the
day when the CI runners are under load and not. The performance is
consistent and the test isn't flaky. It has passed every time.
# Description of Changes
Very small additions to the C# SDK specific for Godot.
I wrote the Godot Blackholio tutorial and updated the Unity one.
I added the image assets necessary for the tutorial.
I added the files for the Godot demo.
# API and ABI breaking changes
No breaking changes.
# Expected complexity level and risk
1. There's really no risk for current systems or projects. It's all new
additions: reasonably nice Godot support, a new Blackholio demo for Godot,
and updates to the tutorials.
# Testing
- [X] Follow the tutorial and verify everything works.
- [X] Build for Windows and verify it works.
- [x] Build for Linux and verify it works.
- [x] Build SDK
- [x] Publish to NuGet
---------
Co-authored-by: rekhoff <r.ekhoff@clockworklabs.io>
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
# Description of Changes
Use an AWS bucket for client binaries instead of DigitalOcean.
Note that this will effectively hamstring the DigitalOcean mirror for
any old clients (i.e. ones before this change is released). But it's
only a mirror, and they can "fix" it by upgrading to a version with this
change.
# API and ABI breaking changes
None
# Expected complexity level and risk
1
# Testing
- [x] Manually ran the package job on this PR and confirmed that some of
the resulting binaries are indeed downloadable from the AWS URLs
---------
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
# Description of Changes
Due to the relatively frequent supply chain attacks, especially on npm
packages, we're instituting a minimum package age in the whole repo.
- Globally set a minimum npm package age in CI
- Best-effort set npm package age using `.npmrc` beside any
`package.json`
- Add CI checks that pnpm version and minimum package age values are the
same everywhere
# API and ABI breaking changes
None
# Expected complexity level and risk
2
# Testing
- [x] CI passes
- [x] if I remove a `.npmrc` then `cargo ci lint` fails
- [x] if I change a value in `.npmrc` then `cargo ci lint` fails
---------
Signed-off-by: Zeke Foppa <196249+bfops@users.noreply.github.com>
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
# Description of Changes
LLM benchmark infrastructure improvements and new benchmark tasks.
**Runner & scoring:**
- Add retry logic with backoff for LLM API calls (rate limits,
502/503/504, timeouts)
- Fix `generation_duration_ms` to only time the successful attempt, not
retries+sleep delays
- Add `--dry-run` flag to run benchmarks without saving results
- Add OpenRouter client as unified fallback when direct vendor keys
aren't set
- Add web search mode via OpenRouter `:online` suffix
- Extract shared OpenAI-compatible response types into `oa_compat.rs`
- Add `ReducerCallBothScorer` for calling reducers on both golden and
LLM databases
- Set `max_tokens` on OpenRouter and Meta clients to prevent silent
truncation
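The interaction between the retry logic and the `generation_duration_ms` fix can be sketched as below. This is a Python illustration rather than the runner's Rust code. The function name, the attempt limit, and the exponential delay schedule are assumptions. The point it shows is that the timer restarts on every attempt, so failed attempts and backoff sleeps are excluded from the reported duration.

```python
import time


def call_with_retry(call, is_retryable, max_attempts=4, base_delay=1.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Return (result, duration_ms), timing only the successful attempt."""
    for attempt in range(max_attempts):
        started = clock()  # restart the timer for each attempt
        try:
            result = call()
        except Exception as err:
            if attempt == max_attempts - 1 or not is_retryable(err):
                raise
            # Exponential backoff; this sleep is not part of the duration.
            sleep(base_delay * 2 ** attempt)
            continue
        return result, (clock() - started) * 1000.0
```

Passing `sleep` and `clock` as parameters keeps the sketch easy to exercise without real delays.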
**Model routing:**
- Add `ModelRoute` with display name, vendor, API model, and OpenRouter
model ID
- Support ad-hoc model IDs via `--models vendor:model` without static
registration
- Add model name normalization (OpenRouter IDs, case variants →
canonical display names)
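For the ad-hoc `--models vendor:model` form, a plausible parsing rule is to split on the first colon only. The Python sketch below illustrates that rule; the function name is hypothetical and the actual parser in `xtask-llm-benchmark` may differ. Splitting on the first colon keeps any later colon, such as the `:online` suffix, inside the model ID.

```python
def parse_model_spec(spec):
    """Split `vendor:model` on the first colon, keeping later colons in the model."""
    vendor, sep, model = spec.partition(":")
    if not sep or not vendor or not model:
        raise ValueError(f"expected vendor:model, got {spec!r}")
    return vendor, model
```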
**Context modes:**
- Add `guidelines`, `cursor_rules`, `search`, `no_context` modes with
`is_empty_context_mode()` helper
- Add mode-specific prompt preambles
- Consolidate mode alias normalization (`none`/`no_guidelines` →
`no_context`)
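The alias consolidation can be pictured as a single lookup table. This Python sketch is an illustration only; the function name is hypothetical and the table contains just the two aliases named above.

```python
# Aliases named in this PR; any other mode name passes through unchanged.
_MODE_ALIASES = {"none": "no_context", "no_guidelines": "no_context"}


def normalize_mode(mode):
    """Map a mode alias to its canonical name."""
    return _MODE_ALIASES.get(mode, mode)
```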
**CI workflows:**
- Add `llm-benchmark-periodic.yml` for scheduled nightly runs with
per-language failure tracking
  - **Note**: The periodic workflow requires `OPENROUTER_API_KEY`,
    `LLM_BENCHMARK_UPLOAD_URL`, and `LLM_BENCHMARK_API_KEY` as GitHub
    secrets.
- Add `llm-benchmark-validate-goldens.yml` for validating golden answers
still compile
**Results & summary:**
- Add `cmd_status` to show incomplete benchmark combinations with rerun
commands
- Add `cmd_analyze` for LLM-powered failure analysis
- Split `normalize_details_file` from `write_summary_from_details_file`
- Derive task categories from filesystem for summary generation
- Add timestamp tracking (`started_at`/`finished_at`) and token usage
**New benchmark tasks:**
- 30 new tasks across auth, data_modeling, queries, basics, and schema
categories
- Updated/fixed existing task prompts and golden answers
# API and ABI breaking changes
None. Internal tooling only.
# Expected complexity level and risk
2 — Changes are scoped to the LLM benchmark CLI tool
(`xtask-llm-benchmark`) and CI workflows. No impact on SpacetimeDB core.
# Testing
- [x] `cargo check -p xtask-llm-benchmark` — zero errors, zero warnings
- [x] Dry run: `llm_benchmark run --lang typescript --modes no_context --tasks t_001 --models openai:gpt-5-mini --dry-run`
(ran end-to-end, confirmed no results saved to disk)
- [ ] Verify periodic workflow runs successfully on next scheduled
trigger
---------
Co-authored-by: Tyler Cloutier <cloutiertyler@users.noreply.github.com>
# Description of Changes
CI was running `gen-quickstart.sh` and then checking for a diff, but it
was checking in the wrong directory.
I have also regenerated the files because the fixed check was failing.
# API and ABI breaking changes
None.
# Expected complexity level and risk
1
# Testing
- [x] CI passes
- [x] updated CI failed without the changes to the other files
---------
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
# Description of Changes
Merged the `upgrade-version-check.yml` into `ci.yml`, and moved the
business logic under `cargo ci`.
I would also be very open to just removing this test until we choose to
define a better suite of tests for `cargo bump-version`.
# API and ABI breaking changes
None. CI only.
# Expected complexity level and risk
1
# Testing
- [x] Ran it locally. It made a diff
---------
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
# Description of Changes
https://github.com/clockworklabs/SpacetimeDB/pull/4231 changed our CI to
always pass a parameter corresponding to the PR number, which broke on
`master` commits since they don't have a PR number.
# API and ABI breaking changes
None. CI only.
# Expected complexity level and risk
1.
# Testing
I think I don't know how to test this. But it's basically the old
behavior on `master` commits, so it should work fine? One hopes?
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
# Description of Changes
Add EV code signing for Windows CLI binaries using DigiCert KeyLocker.
The workflow now signs `spacetimedb-update.exe`, `spacetimedb-cli.exe`,
and `spacetimedb-standalone.exe` on tag pushes using `smctl sign` with a
cloud HSM-backed certificate.
These changes reflect the updated DigiCert guidance for code signing
through GitHub found here:
https://github.com/marketplace/actions/digicert-binary-signing
# API and ABI breaking changes
No API or ABI changes. This change only affects the CI/CD packaging
workflow.
# Expected complexity level and risk
1 - This PR only adds code signing to existing CI packaging. Risk is
limited to the Windows packaging step failing on tags; Linux and macOS
builds are unaffected.
# Testing
- [X] Tested via workflow dispatch on tag `test-signing-v0.0.1`
- [X] All three executables signed and verified successfully
- [X] Signature verification confirms certificate chain
- [X] Signed artifacts uploaded successfully
# Description of Changes
Remove the Python smoketests and the CI check that tests for edits.
# API and ABI breaking changes
None. CI only.
# Expected complexity level and risk
1
# Testing
- [ ] All CI passes
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
# Description of Changes
Flipping on some new inputs to `Internal Tests` to get new
functionality.
I'll follow up with a more detailed description in Discord.
# API and ABI breaking changes
None. CI only.
# Expected complexity level and risk
2
# Testing
- [x] public PR without private PR just uses master
- [x] public PR with private PR uses that one
- [x] fails if private PR is not approved
- [x] fails if private PR does not pass its CI
- [x] passes if private PR is ready to go
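
The cases in the checklist above can be summarized as a small decision function. This is only a sketch of the intended behavior as listed here. The types and names are hypothetical, and the real logic lives in the `Internal Tests` workflow inputs:

```rust
/// State of the private PR paired with a public PR (hypothetical model).
#[derive(Debug, Clone, Copy)]
struct PrivatePr {
    approved: bool,
    ci_passed: bool,
}

/// What the internal tests should run against.
#[derive(Debug, PartialEq, Eq)]
enum Target {
    Master,
    Private,
}

/// Picks the target, or returns an error message when the paired
/// private PR exists but is not ready.
fn resolve_target(private_pr: Option<PrivatePr>) -> Result<Target, String> {
    match private_pr {
        // No private PR: just use master.
        None => Ok(Target::Master),
        Some(pr) if !pr.approved => Err("private PR is not approved".to_string()),
        Some(pr) if !pr.ci_passed => Err("private PR did not pass its CI".to_string()),
        // Approved and green: use the private PR.
        Some(_) => Ok(Target::Private),
    }
}
```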
---------
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
# Description of Changes
Revert the following PRs that have caused some breakage:
```
a32cffa76 Finish refactoring out replay (#4850)
d639be0af Replay: some code motion & reuse `ReplayCommittedState` (#4849)
78d6b6f7d Update NativeAOT-LLVM infrastructure to current ABI (#4515)
d5c1738c1 Better module backtraces for panics and whatnot (#577)
6f23b19f3 Wait for database update to become durable (#4846)
81c9eab86 Add `spacetime lock/unlock` to prevent accidental database deletion (#4502)
809aebd7c Move field `replay_table_updated` to `ReplayCommittedState` (#4807)
21b58ef99 Update axum (#2713)
b5cadff7a Extract replay stuff out of `CommittedState`, part 1 (#4804)
```
I also updated the Python smoketests for breakage introduced in
https://github.com/clockworklabs/SpacetimeDB/pull/4502. Reverting that
PR caused conflicts, so this fix is more straightforward.
# API and ABI breaking changes
Maybe kind of, but we haven't released any of these.
# Expected complexity level and risk
1
# Testing
Ask @bfops about testing
---------
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
# Description of Changes
Migrate these checks into `cargo ci`:
- Check that packages are publishable
- Docs test
- TypeScript - Tests
# API and ABI breaking changes
None. CI only.
# Expected complexity level and risk
2
# Testing
- [ ] CI passes
---------
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
# Description of Changes
Removed some "if we're on Windows" checks in the CI so that we're always
running through `cargo ci update-flow`.
# API and ABI breaking changes
None. CI only.
# Expected complexity level and risk
1.
# Testing
- [x] Upgrade flow tests pass
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
# Description of Changes
Merged `typescript-test.yml` and `docs-test.yml` into `ci.yml`.
Note: The required checks will need to be updated when this PR is ready
to merge.
# API and ABI breaking changes
None. CI only.
# Expected complexity level and risk
1
# Testing
- [x] CI passes
---------
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
# Description of Changes
Invoke a private workflow when a PR merges, so that we can do extra
follow-up actions.
# API and ABI breaking changes
None. CI only.
# Expected complexity level and risk
2
# Testing
- [x] When a PR merged with a corresponding private PR, I got a Discord
notification:
<img width="543" height="70" alt="image"
src="https://github.com/user-attachments/assets/209347c3-57be-47d7-8d75-6154c9e222cb"
/>
- [x] When a PR merged without a corresponding private PR, no Discord
notification was sent
---------
Signed-off-by: Zeke Foppa <196249+bfops@users.noreply.github.com>
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
# Description of Changes
Merge the `TypeScript - Lint` CI job into `cargo ci lint`.
Note that this removes the custom caching for the pnpm store, but we're
planning to overhaul our CI cache approach anyway.
# API and ABI breaking changes
None. CI only.
# Expected complexity level and risk
1
# Testing
- [x] Lint step passes on this PR
---------
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
## Summary
- Update the experimental NativeAOT-LLVM build path
(`EXPERIMENTAL_WASM_AOT=1`) to include all host function imports from
ABI versions 10.0 through 10.4
- Fix the compiler package reference to work on both Windows x64 and
Linux x64 (was hardcoded to Windows only)
- Add a CI smoketest to verify AOT builds work on Linux x64
## Context
See #4514 for the full writeup on the C# AOT situation. The
`wasi-experimental` workload that all C# module builds depend on is
deprecated and removed from .NET 9+. NativeAOT-LLVM is the recommended
path forward for ahead-of-time compilation of C# to WebAssembly.
The existing NativeAOT-LLVM support (added by RReverser in #713) was
stale: it was missing the imports added since then and had a Windows-only
package reference.
## Changes
**`SpacetimeDB.Runtime.targets`:**
- Add 10 missing `WasmImport` declarations across spacetime_10.0 through
10.4
- Replace `runtime.win-x64.Microsoft.DotNet.ILCompiler.LLVM` with
`runtime.$(NETCoreSdkPortableRuntimeIdentifier).Microsoft.DotNet.ILCompiler.LLVM`
so it resolves correctly on Linux x64 as well
- Use explicit version strings instead of the `$(SpacetimeNamespace)`
variable
**`ci.yml`:**
- Add AOT build smoketest step in the `csharp-testsuite` job
## Test plan
- [x] CI smoketest passes: `EXPERIMENTAL_WASM_AOT=1 dotnet publish -c
Release` builds successfully on Linux x64
- [ ] Existing C# tests continue to pass (no changes to the default
interpreter path)
---------
Signed-off-by: Ryan <r.ekhoff@clockworklabs.io>
Co-authored-by: Ryan <r.ekhoff@clockworklabs.io>
Co-authored-by: Jason Larabie <jason@clockworklabs.io>
Co-authored-by: John Detter <4099508+jdetter@users.noreply.github.com>