Files
bradleyshep fb0a458d4f LLM Benchmark: Sequential Upgrades Test (#4817)
# Description of Changes

AI app generation benchmark comparing SpacetimeDB vs PostgreSQL (Express
+ Socket.io + Drizzle ORM). Same AI model (Claude Sonnet 4.6), same
prompts, same chat app, two backends. Upgraded through 12 feature
levels, manually graded at each level, bugs fixed, all costs measured
via OpenTelemetry.

Results viewable at:
https://spacetimedb.com/llms-benchmark-sequential-upgrade

## Benchmark harness (`tools/llm-sequential-upgrade/`)

- `run.sh`: orchestrates headless Claude Code sessions for code
generation, sequential upgrades, and bug fixes. Tracks all API costs via
OTel. Supports `--upgrade`, `--fix`, `--composed-prompt`,
`--resume-session` modes.
- `grade.sh` / `grade-agents.sh`: grading harnesses for manual testing
of generated apps.
- `docker-compose.otel.yaml`: OTel collector + PostgreSQL services.
- `generate-report.mjs` / `parse-telemetry.mjs`: aggregate per-session
telemetry into cost reports.
- Backend guidelines in `backends/`: SpacetimeDB SDK reference, config
templates, server setup docs, PostgreSQL setup with Drizzle/Socket.io
guidance.
**After https://github.com/clockworklabs/SpacetimeDB/pull/4740 merges,
we will likely want to update this so that it reads backend and SDK
guidance from SKILLS**

## Two complete benchmark runs

**Run 1 (20260403):** Original methodology.
**Run 2 (20260406):** Refined methodology with domain bias removed from
SpacetimeDB SDK docs and PostgreSQL instructions made
feature-spec-neutral.
**Note: no meaningful changes in results were observed with these
changes. Domain familiarity biases were very small and almost certainly
not the cause of STDB's major gains over PG stack.**

Each run contains full L1-L12 app source for both backends, level
snapshots preserving state before each upgrade, and per-session OTel
cost summaries.

## 12 feature levels

| Level | Feature |
|---|---|
| L1 | Basic Chat + Typing + Read Receipts + Unread Counts |
| L2 | Scheduled Messages |
| L3 | Ephemeral Messages |
| L4 | Message Reactions |
| L5 | Message Editing with History |
| L6 | Real-Time Permissions (kick, ban, promote) |
| L7 | Rich User Presence |
| L8 | Message Threading |
| L9 | Private Rooms + Direct Messages |
| L10 | Room Activity Indicators |
| L11 | Draft Sync |
| L12 | Anonymous to Registered Migration |

## Results

| | Run 1 (20260403) | Run 2 (20260406) |
|---|---|---|
| **SpacetimeDB total cost** | $13.33 | $12.62 |
| **PostgreSQL total cost** | $17.80 | $19.68 |
| **SpacetimeDB bugs** | 5 | 2 |
| **PostgreSQL bugs** | 19 | 8 |
| **SpacetimeDB fix sessions** | 4 | 1 |
| **PostgreSQL fix sessions** | 17 | 10 |

Both runs agree: SpacetimeDB apps are cheaper to build, have fewer bugs,
and require fewer fix iterations. The refined methodology (Run 2)
widened the cost gap and **confirmed the advantage is structural, not an
artifact of domain-biased SDK docs.**

## Performance benchmark (`perf-benchmark/`)

Stress throughput tool that fires concurrent writers at peak saturation
against the AI-generated `send_message` handlers.

| Tier | SpacetimeDB (avg) | PostgreSQL (avg) | Ratio |
|---|---|---|---|
| AI-generated (as-shipped) | 5,267 msgs/sec | 694 msgs/sec | 7.6x |
| PG rate limit removed | 5,267 msgs/sec | 1,070 msgs/sec | 4.9x |
| Optimized (same features kept) | 25,278 msgs/sec | 1,139 msgs/sec |
22x |

The gap widens with optimization because SpacetimeDB's bottleneck is
fixable code patterns in the reducer while PostgreSQL's bottleneck is
architectural (sequential network round-trips to an external database).

Optimized reference code with all features preserved is in
`perf-benchmark/results/optimized-reference/`.

## Data handling

Per-session cost summaries (`cost-summary.json`, `COST_REPORT.md`,
`metadata.json`) are committed. Raw OTel telemetry
(`raw-telemetry.jsonl`) containing PII is excluded via `.gitignore` and
stored privately.

# API and ABI breaking changes

None. All changes are in `tools/llm-sequential-upgrade/`. No production
code, library, or SDK changes.

# Expected complexity level and risk

**1 - Trivial.** Self-contained benchmarking tooling and data. No
interaction with production code.

# Testing

- [x] L1-L12 upgrades completed on all 4 apps (2 backends x 2 runs) with
OTel cost capture
- [x] All levels manually graded after each upgrade; bugs filed and
fixed via the harness
- [x] Methodology refinement between runs validated (domain bias
removal, feature-neutral instructions)
- [x] Stress benchmarks run across both runs x 3 tiers (as-shipped,
rate-limit-removed, optimized)
- [x] Optimized benchmarks verified to preserve all original features
- [x] Sensitive data (PII in raw telemetry) removed from repo and
gitignored
- [ ] Reviewer: spot-check that METRICS_DATA.json / METRICS_REPORT.json
numbers match the telemetry cost-summary.json files

---------

Co-authored-by: Tyler Cloutier <cloutiertyler@users.noreply.github.com>
Co-authored-by: clockwork-labs-bot <clockwork-labs-bot@users.noreply.github.com>
2026-06-10 16:37:33 +00:00

7.8 KiB

Backend: PostgreSQL

Instructions for generating, building, and deploying the PostgreSQL backend.

Do NOT read SpacetimeDB SDK rule files. This backend uses standard Node.js/TypeScript patterns.


Architecture

  • Server: Node.js + Express + Drizzle ORM + Socket.io
  • Client: React + Vite + TypeScript + Socket.io-client
  • Database: PostgreSQL (running in Docker)

The server handles:

  • REST API endpoints for CRUD operations
  • Socket.io for real-time events (messages, typing, presence, etc.)
  • Drizzle ORM for database queries
  • Session/identity management

PostgreSQL Connection

PostgreSQL is already running in a Docker container.

Parameter Value
Host localhost
Port 6432 (mapped from container 5432)
User spacetime
Password spacetime
Database spacetime
Container spacetime-web-postgres-1
Connection URL postgresql://spacetime:spacetime@localhost:6432/spacetime

Pre-flight Check

docker exec spacetime-web-postgres-1 psql -U spacetime -d spacetime -c "SELECT 1"

If PostgreSQL is not reachable, STOP and report the error.


Directory Structure

<app-dir>/
  server/
    package.json
    tsconfig.json
    drizzle.config.ts
    .env
    src/
      schema.ts      # Drizzle ORM table definitions
      index.ts       # Express server + Socket.io + routes
  client/
    package.json
    vite.config.ts
    tsconfig.json
    index.html
    src/
      main.tsx       # React entry point
      App.tsx        # Main application component
      styles.css     # Dark theme styling

Phase 1: Generate Server

Create the Express + Socket.io server:

  • server/package.json:

    {
      "name": "chat-server",
      "type": "module",
      "scripts": {
        "dev": "tsx watch src/index.ts",
        "start": "tsx src/index.ts"
      },
      "dependencies": {
        "express": "^4.18.2",
        "@types/express": "^4.17.21",
        "drizzle-orm": "^0.39.0",
        "pg": "^8.13.0",
        "@types/pg": "^8.11.0",
        "socket.io": "^4.7.4",
        "cors": "^2.8.5",
        "@types/cors": "^2.8.17",
        "dotenv": "^16.4.5",
        "drizzle-kit": "^0.30.0",
        "tsx": "^4.19.0",
        "typescript": "^5.4.0"
      }
    }
    
  • server/tsconfig.json:

    {
      "compilerOptions": {
        "target": "ES2022",
        "module": "ES2022",
        "moduleResolution": "bundler",
        "esModuleInterop": true,
        "strict": true,
        "outDir": "dist",
        "rootDir": "src",
        "skipLibCheck": true
      },
      "include": ["src/**/*"]
    }
    
  • server/.env:

    DATABASE_URL=postgresql://spacetime:spacetime@localhost:6432/spacetime
    PORT=6001
    
  • server/drizzle.config.ts:

    import { defineConfig } from 'drizzle-kit';
    
    export default defineConfig({
      schema: './src/schema.ts',
      out: './drizzle',
      dialect: 'postgresql',
      dbCredentials: {
        url: process.env.DATABASE_URL || 'postgresql://spacetime:spacetime@localhost:6432/spacetime',
      },
    });
    
  • server/src/schema.ts — Drizzle ORM table definitions for all features

  • server/src/index.ts — Express server with:

    • CORS configured for http://localhost:6273
    • Socket.io with CORS
    • REST endpoints for the app's resources (per the feature spec)
    • Socket.io events for real-time updates (per the feature spec)
    • Database queries via Drizzle ORM

Install and push schema:

cd <server-dir> && npm install
npx drizzle-kit push

Phase 2: (No bindings step)

Skip — PostgreSQL has no binding generation. The client calls REST/Socket.io APIs directly.


Phase 3: Generate Client

  • client/package.json:

    {
      "name": "chat-client",
      "type": "module",
      "scripts": {
        "dev": "vite",
        "build": "tsc -b && vite build"
      },
      "dependencies": {
        "react": "^18.3.1",
        "react-dom": "^18.3.1",
        "socket.io-client": "^4.7.4"
      },
      "devDependencies": {
        "@types/react": "^18.3.12",
        "@types/react-dom": "^18.3.1",
        "@vitejs/plugin-react": "^4.3.4",
        "typescript": "^5.4.0",
        "vite": "^6.0.0"
      }
    }
    
  • client/vite.config.ts — port 6273 (NOT 6173 — that's SpacetimeDB), proxy /api and /socket.io to http://localhost:6001

    import { defineConfig } from 'vite';
    import react from '@vitejs/plugin-react';
    
    export default defineConfig({
      plugins: [react()],
      server: {
        port: 6273,
        proxy: {
          '/api': 'http://localhost:6001',
          '/socket.io': {
            target: 'http://localhost:6001',
            ws: true,
          },
        },
      },
    });
    
  • client/tsconfig.json

  • client/index.html

  • client/src/main.tsx — React entry point

  • client/src/App.tsx — Main component using fetch('/api/...') + Socket.io client

  • client/src/styles.css — Dark theme styling

The client connects to the server via the Vite proxy — no hardcoded localhost:6001 in client code.

Critical: Initialize the socket.io client without a hardcoded URL so it routes through the Vite proxy (e.g. io() or io({ path: '/socket.io' })). Hardcoding http://localhost:6001 bypasses the proxy and breaks WebSocket upgrades.


Phase 4: Verify

# Server
cd <server-dir> && npm install && npx tsc --noEmit

# Client
cd <client-dir> && npm install && npx tsc --noEmit && npm run build

Both must pass. If either fails:

  1. Read the error
  2. Fix the code
  3. Retry (up to 3 attempts)
  4. Each fix counts as a reprompt — log it

Phase 5: Deploy

# Kill any existing servers
npx kill-port 6273 2>/dev/null || true
npx kill-port 6001 2>/dev/null || true

# Start the API server in background
cd <server-dir> && npx tsx src/index.ts &

# Wait for API server to be ready (poll http://localhost:6001 up to 30s)

# Start client dev server in background
cd <client-dir> && npm run dev &

Wait for both servers to be ready:

  • API server at http://localhost:6001
  • Client dev server at http://localhost:6273

Redeploy (for fix iterations)

  • If server changed: kill and restart the Express server
    npx kill-port 6001 2>/dev/null || true
    cd <server-dir> && npx tsx src/index.ts &
    
  • If schema changed: push new schema before restarting
    cd <server-dir> && npx drizzle-kit push
    
  • If client changed: Vite HMR handles it automatically (or restart dev server if needed)

Key Differences from SpacetimeDB

For context on what makes this backend different (this helps the benchmark comparison):

Aspect SpacetimeDB PostgreSQL
Real-time Built-in subscriptions Socket.io (manual)
API layer Reducers (auto-exposed) Express routes (manual)
Schema table() + reducer() Drizzle pgTable()
Bindings Auto-generated types Manual type definitions
Deployment spacetime publish Start Express server
State sync Automatic client cache Manual fetch + Socket.io
Online presence Via lifecycle hooks Manual Socket.io tracking
Typing indicators Reducer + subscription Socket.io events
Infra dependencies SpacetimeDB only PostgreSQL + Express + Socket.io + CORS

App Identity

  • HTML <title> MUST be "PostgreSQL Chat" (not "Chat App", not "SpacetimeDB Chat")
  • The app MUST show "PostgreSQL Chat" as the visible header/title in the UI
  • This distinguishes it from the SpacetimeDB version during testing

Port Configuration

Service Port Notes
PostgreSQL (Docker) 6432 Database
Express API server 6001 REST + Socket.io
Vite dev server 6273 React client — NOT 6173 (that's SpacetimeDB)

Reference Files

The language and feature prompt files are provided as absolute paths in the launch prompt. No additional reference files are needed — this backend uses standard Node.js/TypeScript patterns.