Commit Graph

19 Commits

Author SHA1 Message Date
Zeke Foppa 179b3e0c3e [tyler/translate-smoketests]: [REVERT] debugging changes 2026-01-29 11:55:04 -08:00
Zeke Foppa 92a404832e [tyler/translate-smoketests]: fix build 2026-01-29 10:25:38 -08:00
Zeke Foppa 89a76f8664 [tyler/translate-smoketests]: properly use pg_port 2026-01-28 15:42:26 -08:00
Zeke Foppa 6a47135d90 [tyler/translate-smoketests]: unused 2026-01-28 14:03:53 -08:00
Zeke Foppa 46e63e439c Update crates/guard/src/lib.rs
Signed-off-by: Zeke Foppa <196249+bfops@users.noreply.github.com>
2026-01-28 13:48:42 -08:00
Zeke Foppa 6d59cb12a9 Update crates/guard/src/lib.rs
Signed-off-by: Zeke Foppa <196249+bfops@users.noreply.github.com>
2026-01-28 13:45:53 -08:00
Tyler Cloutier db95efd38c Ran cargo fmt 2026-01-25 21:27:37 -05:00
= 5a8107f633 Add precompiled WASM modules for smoketests
Extract static smoketest modules into a nested workspace at
crates/smoketests/modules/ that is pre-compiled during warmup.
This eliminates per-test WASM compilation overhead.

Key changes:
- Add 38 precompiled module crates in nested workspace
- Add module registry (src/modules.rs) for WASM path lookup
- Add precompiled_module() builder and use_precompiled_module() method
- Update xtask warmup to build nested workspace
- Migrate all static tests to use precompiled modules
- Tests using precompiled modules run in ~0.5-3s vs ~4-7s before

Tests that need runtime compilation (auto_migration, detect_wasm_bindgen,
intentionally-broken modules) continue to use module_code().
2026-01-25 21:17:03 -05:00
= a2db6afa08 Fix clippy warnings in guard crate 2026-01-23 15:28:47 -05:00
Zeke Foppa 70bae5b94b [tyler/translate-smoketests]: lints 2026-01-23 12:24:09 -08:00
= 9835e1ee7f Add server restart smoketests
Translate server restart tests from smoketests/tests/zz_docker.py to Rust.
These tests verify SpacetimeDB behavior across server restarts:
- Data persistence (test_restart_module)
- SQL queries after restart (test_restart_sql)
- Client auto-disconnection (test_restart_auto_disconnect)
- Autoinc sequence integrity (test_add_remove_index_after_restart)

Infrastructure changes:
- Add data_dir and restart() to SpacetimeDbGuard
- Add restart_server() to Smoketest
- Consolidate duplicated kill/spawn logic into helpers
2026-01-23 15:10:44 -05:00
= 157a81434d cargo fmt --all 2026-01-23 14:09:32 -05:00
Zeke Foppa 8b6506bf5e [tyler/translate-smoketests]: more lints 2026-01-23 10:20:00 -08:00
= 1db4180fa3 Translate 5 more Python smoketests to Rust
Add test translations for:
- connect_disconnect_from_cli.rs - client connection callbacks
- domains.rs - database rename functionality
- client_connection_errors.rs - client_connected error handling
- confirmed_reads.rs - --confirmed flag for subscriptions/SQL
- create_project.rs - spacetime init command

Also fix subscription race condition by waiting for initial update
before returning from subscribe_background_*, matching Python behavior.
2026-01-23 02:00:34 -05:00
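The race fix in 1db4180fa3 (wait for the initial update before returning from the background subscribe helper) can be illustrated with a channel handshake. This is only a sketch of the general technique, not the crate's code: `spawn_after_ready` and the `ready` sender are hypothetical names, and the real `subscribe_background_*` helpers may be structured differently.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Spawns `worker` on a background thread and blocks until it signals
/// readiness (hypothetical helper, for illustration only).
fn spawn_after_ready<T, F>(worker: F) -> JoinHandle<T>
where
    T: Send + 'static,
    F: FnOnce(Sender<()>) -> T + Send + 'static,
{
    let (ready_tx, ready_rx) = mpsc::channel();
    let handle = thread::spawn(move || worker(ready_tx));
    // Wait for the "initial update seen" signal. If the worker exits without
    // sending, the sender is dropped and `recv` returns an error, so the
    // caller is never blocked forever.
    let _ = ready_rx.recv();
    handle
}
```

With this shape, anything the test does after the call is ordered after the worker's readiness signal, which is the property the commit message describes.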
= 2d996f3127 Fix unnecessary rebuilds in ensure_binaries_built
Clear CARGO* environment variables (except CARGO_HOME) when spawning
child cargo build processes. When running under `cargo test`, cargo
sets env vars like CARGO_ENCODED_RUSTFLAGS that differ from a normal
build, causing child cargo processes to think they need to recompile.

This reduces single-test runtime from ~45s to ~18s by avoiding
redundant rebuilds of spacetimedb-standalone and spacetimedb-cli.
2026-01-23 01:38:10 -05:00
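The environment scrubbing described in 2d996f3127 might look roughly like the following. This is a minimal sketch under stated assumptions: `should_clear` and `clean_cargo_build` are hypothetical names, and the package names are supplied by the caller rather than taken from the actual guard crate.

```rust
use std::process::Command;

/// True for variables that should be stripped before spawning a nested
/// cargo: everything starting with `CARGO` except `CARGO_HOME`.
fn should_clear(name: &str) -> bool {
    name.starts_with("CARGO") && name != "CARGO_HOME"
}

/// Builds (but does not run) a `cargo build` command whose environment has
/// the inherited CARGO* variables removed.
fn clean_cargo_build(packages: &[&str]) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.arg("build");
    for &pkg in packages {
        cmd.args(["-p", pkg]);
    }
    for (name, _value) in std::env::vars_os() {
        // Names that are not valid UTF-8 are left untouched.
        if let Some(name_str) = name.to_str() {
            if should_clear(name_str) {
                cmd.env_remove(&name);
            }
        }
    }
    cmd
}
```

The caller would then invoke `.status()` on the returned command. Keeping `CARGO_HOME` lets the child reuse the same registry and cache.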
= 0df9a8b01d Add Rust smoketests crate with sql and call test translations
Create `crates/smoketests/` to translate Python smoketests to Rust:

- Add `Smoketest` struct with builder pattern for test setup
- Implement CLI helpers: `spacetime_cmd()`, `call()`, `sql()`, `logs()`, etc.
- Translate `smoketests/tests/sql.py` → `tests/sql.rs`
- Translate `smoketests/tests/call.py` → `tests/call.rs`
- Reuse `ensure_binaries_built()` from guard crate (now public)

Also fix Windows process cleanup in `SpacetimeDbGuard`:
- Use `taskkill /F /T /PID` to kill entire process tree
- Prevents orphaned `spacetimedb-standalone.exe` processes
2026-01-22 22:15:54 -05:00
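For reference, the Windows cleanup mentioned in 0df9a8b01d amounts to invoking `taskkill` with `/F` (force), `/T` (include child processes), and `/PID`. The helper below is a hypothetical sketch that only constructs the command; how `SpacetimeDbGuard` actually wires this up is not shown in the commit message.

```rust
use std::process::Command;

/// Builds (but does not run) the `taskkill` invocation that force-kills a
/// process and all of its children on Windows.
fn kill_process_tree_command(pid: u32) -> Command {
    let mut cmd = Command::new("taskkill");
    cmd.args(["/F", "/T", "/PID"]).arg(pid.to_string());
    cmd
}
```

A caller on Windows would run it with `.status()`, typically from a `Drop` implementation so that cleanup also happens when a test panics.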
= 921c76b012 Pre-build SpacetimeDB binaries once for tests
Use OnceLock to build spacetimedb-cli and spacetimedb-standalone once
per test process, then run the pre-built binary directly instead of
using `cargo run`. This avoids repeated cargo overhead and ensures
consistent binary reuse across parallel tests.
2026-01-22 21:25:11 -05:00
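The build-once pattern from 921c76b012 can be sketched with `std::sync::OnceLock`. The names here are assumptions for illustration: `build_cli` stands in for the real (expensive) cargo invocation, and the returned path is a placeholder.

```rust
use std::path::PathBuf;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how many times the expensive step actually ran.
static BUILD_COUNT: AtomicUsize = AtomicUsize::new(0);
/// Holds the path of the built binary for the lifetime of the process.
static CLI_PATH: OnceLock<PathBuf> = OnceLock::new();

/// Stand-in for the expensive build (hypothetical): a real helper would run
/// `cargo build` here and return the path of the produced binary.
fn build_cli() -> PathBuf {
    BUILD_COUNT.fetch_add(1, Ordering::SeqCst);
    PathBuf::from("target/debug/spacetimedb-cli")
}

/// Returns the binary path, building at most once per process even when
/// called concurrently from parallel tests.
fn cli_path() -> &'static PathBuf {
    CLI_PATH.get_or_init(build_cli)
}
```

`get_or_init` blocks concurrent callers until the first initialization completes, so parallel tests all observe the same path and the build closure runs once.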
John Detter a9892aae0e Fix logic for ipv6 connections in is_port_available (#4005)
# Description of Changes

- Small fix for checking whether a port is available on a given
interface.

updated:

The original implementation used `bind` to discover whether a
port is currently in use. This isn't reliable due to platform
differences, especially on Windows, where it's apparently acceptable to
have a service running on both `0.0.0.0:3000` and `127.0.0.1:3000`. This
would cause `bind` to return successfully when we wanted it to fail. Also,
binding on an IPv6 interface when a machine doesn't have IPv6 enabled
caused random errors, making the check too unreliable to be useful.

This new implementation uses `get_socket_info` which returns info on all
sockets in use on the system. We can then look through this list to find
services which conflict with our requested port.

updated 1/14:

This PR now includes a fix for flaky CLI tests. Originally we were using
`find_free_port` to pick a random free port, but that was causing a race
condition which resulted in test flakes. This PR fixes this issue by
using `127.0.0.1:0` as the listen addr so the kernel will automatically
pick a free port for us.
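As a minimal sketch of the port-0 approach described above (the function name is hypothetical and not taken from the PR):

```rust
use std::net::{SocketAddr, TcpListener};

/// Binds to loopback with port 0 so the kernel assigns a free port, then
/// reports the address that was actually chosen.
fn listen_on_ephemeral_port() -> std::io::Result<(TcpListener, SocketAddr)> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    Ok((listener, addr))
}
```

Because the listener is returned still bound, there is no window between "find a free port" and "start listening" in which another process could take the port, which is the race the paragraph above attributes to `find_free_port`.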

# API and ABI breaking changes

None

# Expected complexity level and risk

1 - this is a pretty isolated check, unlikely to introduce larger
issues.

# Testing

I tested on macOS, Windows, and Linux:
```
ALLOW
docusaurus is already running on   127.0.0.1:3000
SpacetimeDB then tries to start on 192.168.1.10:3000

ALLOW
docusaurus is already running on   ::1:3000
SpacetimeDB then tries to start on 192.168.1.10:3000

DENY
docusaurus is already running on   ::1:3000
SpacetimeDB then tries to start on 127.0.0.1:3000

DENY
docusaurus is already running on   0:0:0:0:0:0:0:0:3000
SpacetimeDB then tries to start on 0.0.0.0:3000

DENY
docusaurus is already running on   0:0:0:0:0:0:0:0:3000
SpacetimeDB then tries to start on 127.0.0.1:3000

DENY
docusaurus is already running on   0:0:0:0:0:0:0:0:3000
SpacetimeDB then tries to start on 192.168.1.10:3000

DENY
docusaurus is already running on   127.0.0.1:3000
SpacetimeDB then tries to start on 0:0:0:0:0:0:0:0:3000

DENY
docusaurus is already running on   192.168.1.10:3000
SpacetimeDB then tries to start on 0:0:0:0:0:0:0:0:3000
```
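One rule that is consistent with all eight cases listed above, assuming both services use the same port: two addresses conflict if they are equal, if either is a wildcard (unspecified) address, or if both are loopback. This is an illustrative sketch only. `addresses_conflict` is a hypothetical name, and the PR's actual check, which inspects the system socket list, may apply different or additional rules.

```rust
use std::net::IpAddr;

/// Returns true if a listener on `existing` should block a new listener on
/// `requested`, assuming both use the same port.
fn addresses_conflict(existing: IpAddr, requested: IpAddr) -> bool {
    existing == requested
        // A wildcard address (0.0.0.0 or ::) overlaps every interface.
        || existing.is_unspecified()
        || requested.is_unspecified()
        // IPv4 and IPv6 loopback are treated as the same interface.
        || (existing.is_loopback() && requested.is_loopback())
}
```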
2026-01-15 04:07:16 +00:00
bradleyshep b75bf6decf LLM Benchmarking (#3486)
# Description of Changes

Introduce a new **LLM benchmarking app** and supporting code.

* **CLI:** `llm` with subcommands `run`, `routes list`, `diff`,
`ci-check`.
* **Runner:** executes globally numbered tasks; filters by `--lang`,
`--categories`, `--tasks`, `--providers`, `--models`.
* **Providers/clients:** route layer (`provider:model`) with HTTP clients
for LLM vendors; env-driven keys/base URLs.
* **Evaluation:** deterministic scorers (hash/equality, JSON
shape/count, light schema/reducer parity) with clear failure messages.
* **Results:** stable JSON schema; single-file HTML viewer to
inspect/filter/export CSV.
* **Build & guards:** build script for compile-time setup.
* **Docs:** `DEVELOP.md` includes `cargo llm …` usage.
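
To illustrate the `provider:model` route notation, here is a small parsing sketch. The `Route` struct and `parse_route` function are hypothetical and are not the PR's actual types.

```rust
/// A parsed `provider:model` route (hypothetical type, for illustration).
#[derive(Debug, PartialEq, Eq)]
struct Route {
    provider: String,
    model: String,
}

/// Splits a route string at the first `:`. Returns `None` when the separator
/// is missing or either side is empty.
fn parse_route(s: &str) -> Option<Route> {
    let (provider, model) = s.split_once(':')?;
    if provider.is_empty() || model.is_empty() {
        return None;
    }
    Some(Route {
        provider: provider.to_string(),
        model: model.to_string(),
    })
}
```

Splitting at the first colon keeps any later colons inside the model name, which seems the safer choice if model identifiers ever contain one.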

This PR is the initial addition of the app and its modules (runner,
config, routes, prompt/segmentation, scorers, schema/types,
defaults/constants/paths/hashing/combine, publishers, spacetime guard,
HTML stats viewer).

### How it works
1. **Pick what to run**

   * Choose tasks (`--tasks 0,7,12`), or a language (`--lang rust|csharp`),
     or categories (`--categories basics,schema`).
   * Optionally limit vendors/models (`--providers …`, `--models …`).

2. **Resolve routes**

   * Read env (API keys + base URLs) and build the active set (e.g.,
     `openai:gpt-5`).

3. **Build context**

   * Start Spacetime
   * Publish golden answer modules
   * Prepare prompts and send to LLM model
   * Attempt to publish LLM module

4. **Execute calls**

   * Run the selected tasks within each test against selected models and
     languages.

5. **Score outputs**

   * Apply deterministic scorers (hash/equality, JSON shape/count, simple
     schema/reducer checks).
   * Record the score and any short failure reason.

6. **Update results file**

   * Write/update the single results JSON with task/route outcomes,
     timings, and summaries.


# API and ABI breaking changes

None. New application and modules; no existing public APIs/ABIs altered.

# Expected complexity level and risk

**4/5.** New CLI, routing, evaluation, and artifact format.

* External model APIs may rate-limit/timeout; concurrency tunable via
`LLM_BENCH_CONCURRENCY` / `LLM_BENCH_ROUTE_CONCURRENCY`.
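
A sketch of how such an env-driven limit could be read. The fallback value and both function names are assumptions, since the PR text does not state the defaults.

```rust
/// Hypothetical fallback; the real default is not stated in the PR text.
const DEFAULT_CONCURRENCY: usize = 4;

/// Parses a raw setting into a positive worker count, falling back to
/// `default` when the value is missing, non-numeric, or zero.
fn concurrency_from(raw: Option<&str>, default: usize) -> usize {
    raw.and_then(|s| s.trim().parse::<usize>().ok())
        .filter(|&n| n > 0)
        .unwrap_or(default)
}

/// Reads `LLM_BENCH_CONCURRENCY` from the process environment.
fn bench_concurrency() -> usize {
    let raw = std::env::var("LLM_BENCH_CONCURRENCY").ok();
    concurrency_from(raw.as_deref(), DEFAULT_CONCURRENCY)
}
```

Separating parsing from the environment lookup keeps the fallback logic testable without mutating process-wide state.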

# Testing

I ran the full test matrix and generated results for every task against
every vendor, model, and language (Rust + C#). I also tested the CI
check locally using [act](https://github.com/nektos/act).

**Please verify**

* [ ] `llm run --tasks 0,1,2` (explicit `run`)
* [ ] `llm run --lang rust --categories basics` (filters)
* [ ] `llm run --categories basics,schema` (multiple categories)
* [ ] `llm run --lang csharp` (language switch)
* [ ] `llm run --providers openai,anthropic --models "openai:gpt-5
anthropic:claude-sonnet-4-5"` (provider/model limits)
* [ ] `llm run --hash-only` (dry integrity)
* [ ] `llm run --goldens-only` (test goldens only)
* [ ] `llm run --force` (skip hash check)
* [ ] `llm ci-check`
* [ ] Stats viewer loads the JSON; filtering and CSV export work
* [ ] CI works as intended

---------

Signed-off-by: bradleyshep <148254416+bradleyshep@users.noreply.github.com>
Signed-off-by: Tyler Cloutier <cloutiertyler@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: Tyler Cloutier <cloutiertyler@aol.com>
Co-authored-by: Tyler Cloutier <cloutiertyler@users.noreply.github.com>
Co-authored-by: spacetimedb-bot <spacetimedb-bot@users.noreply.github.com>
Co-authored-by: John Detter <4099508+jdetter@users.noreply.github.com>
2026-01-06 22:22:57 +00:00