# Description of Changes AI app generation benchmark comparing SpacetimeDB vs PostgreSQL (Express + Socket.io + Drizzle ORM). Same AI model (Claude Sonnet 4.6), same prompts, same chat app, two backends. Upgraded through 12 feature levels, manually graded at each level, bugs fixed, all costs measured via OpenTelemetry. Results viewable at: https://spacetimedb.com/llms-benchmark-sequential-upgrade ## Benchmark harness (`tools/llm-sequential-upgrade/`) - `run.sh`: orchestrates headless Claude Code sessions for code generation, sequential upgrades, and bug fixes. Tracks all API costs via OTel. Supports `--upgrade`, `--fix`, `--composed-prompt`, `--resume-session` modes. - `grade.sh` / `grade-agents.sh`: grading harnesses for manual testing of generated apps. - `docker-compose.otel.yaml`: OTel collector + PostgreSQL services. - `generate-report.mjs` / `parse-telemetry.mjs`: aggregate per-session telemetry into cost reports. - Backend guidelines in `backends/`: SpacetimeDB SDK reference, config templates, server setup docs, PostgreSQL setup with Drizzle/Socket.io guidance. **After https://github.com/clockworklabs/SpacetimeDB/pull/4740 merges, we will likely want to update this so that it reads backend and SDK guidance from SKILLS** ## Two complete benchmark runs **Run 1 (20260403):** Original methodology. **Run 2 (20260406):** Refined methodology with domain bias removed from SpacetimeDB SDK docs and PostgreSQL instructions made feature-spec-neutral. **Note: no meaningful changes in results were observed with these changes. Domain familiarity biases were very small and almost certainly not the cause of STDB's major gains over PG stack.** Each run contains full L1-L12 app source for both backends, level snapshots preserving state before each upgrade, and per-session OTel cost summaries. ## 12 feature levels | Level | Feature | |---|---| | L1 | Basic Chat + Typing + Read Receipts + Unread Counts | | L2 | Scheduled Messages | | L3 | Ephemeral Messages | | L4 | Message Reactions | | L5 | Message Editing with History | | L6 | Real-Time Permissions (kick, ban, promote) | | L7 | Rich User Presence | | L8 | Message Threading | | L9 | Private Rooms + Direct Messages | | L10 | Room Activity Indicators | | L11 | Draft Sync | | L12 | Anonymous to Registered Migration | ## Results | | Run 1 (20260403) | Run 2 (20260406) | |---|---|---| | **SpacetimeDB total cost** | $13.33 | $12.62 | | **PostgreSQL total cost** | $17.80 | $19.68 | | **SpacetimeDB bugs** | 5 | 2 | | **PostgreSQL bugs** | 19 | 8 | | **SpacetimeDB fix sessions** | 4 | 1 | | **PostgreSQL fix sessions** | 17 | 10 | Both runs agree: SpacetimeDB apps are cheaper to build, have fewer bugs, and require fewer fix iterations. The refined methodology (Run 2) widened the cost gap and **confirmed the advantage is structural, not an artifact of domain-biased SDK docs.** ## Performance benchmark (`perf-benchmark/`) Stress throughput tool that fires concurrent writers at peak saturation against the AI-generated `send_message` handlers. | Tier | SpacetimeDB (avg) | PostgreSQL (avg) | Ratio | |---|---|---|---| | AI-generated (as-shipped) | 5,267 msgs/sec | 694 msgs/sec | 7.6x | | PG rate limit removed | 5,267 msgs/sec | 1,070 msgs/sec | 4.9x | | Optimized (same features kept) | 25,278 msgs/sec | 1,139 msgs/sec | 22x | The gap widens with optimization because SpacetimeDB's bottleneck is fixable code patterns in the reducer while PostgreSQL's bottleneck is architectural (sequential network round-trips to an external database). Optimized reference code with all features preserved is in `perf-benchmark/results/optimized-reference/`. ## Data handling Per-session cost summaries (`cost-summary.json`, `COST_REPORT.md`, `metadata.json`) are committed. Raw OTel telemetry (`raw-telemetry.jsonl`) containing PII is excluded via `.gitignore` and stored privately. # API and ABI breaking changes None. All changes are in `tools/llm-sequential-upgrade/`. No production code, library, or SDK changes. # Expected complexity level and risk **1 - Trivial.** Self-contained benchmarking tooling and data. No interaction with production code. # Testing - [x] L1-L12 upgrades completed on all 4 apps (2 backends x 2 runs) with OTel cost capture - [x] All levels manually graded after each upgrade; bugs filed and fixed via the harness - [x] Methodology refinement between runs validated (domain bias removal, feature-neutral instructions) - [x] Stress benchmarks run across both runs x 3 tiers (as-shipped, rate-limit-removed, optimized) - [x] Optimized benchmarks verified to preserve all original features - [x] Sensitive data (PII in raw telemetry) removed from repo and gitignored - [ ] Reviewer: spot-check that METRICS_DATA.json / METRICS_REPORT.json numbers match the telemetry cost-summary.json files --------- Co-authored-by: Tyler Cloutier <cloutiertyler@users.noreply.github.com> Co-authored-by: clockwork-labs-bot <clockwork-labs-bot@users.noreply.github.com>
7.8 KiB
Backend: PostgreSQL
Instructions for generating, building, and deploying the PostgreSQL backend.
Do NOT read SpacetimeDB SDK rule files. This backend uses standard Node.js/TypeScript patterns.
Architecture
- Server: Node.js + Express + Drizzle ORM + Socket.io
- Client: React + Vite + TypeScript + Socket.io-client
- Database: PostgreSQL (running in Docker)
The server handles:
- REST API endpoints for CRUD operations
- Socket.io for real-time events (messages, typing, presence, etc.)
- Drizzle ORM for database queries
- Session/identity management
PostgreSQL Connection
PostgreSQL is already running in a Docker container.
| Parameter | Value |
|---|---|
| Host | localhost |
| Port | 6432 (mapped from container 5432) |
| User | spacetime |
| Password | spacetime |
| Database | spacetime |
| Container | spacetime-web-postgres-1 |
| Connection URL | postgresql://spacetime:spacetime@localhost:6432/spacetime |
Pre-flight Check
docker exec spacetime-web-postgres-1 psql -U spacetime -d spacetime -c "SELECT 1"
If PostgreSQL is not reachable, STOP and report the error.
Directory Structure
<app-dir>/
server/
package.json
tsconfig.json
drizzle.config.ts
.env
src/
schema.ts # Drizzle ORM table definitions
index.ts # Express server + Socket.io + routes
client/
package.json
vite.config.ts
tsconfig.json
index.html
src/
main.tsx # React entry point
App.tsx # Main application component
styles.css # Dark theme styling
Phase 1: Generate Server
Create the Express + Socket.io server:
-
server/package.json:{ "name": "chat-server", "type": "module", "scripts": { "dev": "tsx watch src/index.ts", "start": "tsx src/index.ts" }, "dependencies": { "express": "^4.18.2", "@types/express": "^4.17.21", "drizzle-orm": "^0.39.0", "pg": "^8.13.0", "@types/pg": "^8.11.0", "socket.io": "^4.7.4", "cors": "^2.8.5", "@types/cors": "^2.8.17", "dotenv": "^16.4.5", "drizzle-kit": "^0.30.0", "tsx": "^4.19.0", "typescript": "^5.4.0" } } -
server/tsconfig.json:{ "compilerOptions": { "target": "ES2022", "module": "ES2022", "moduleResolution": "bundler", "esModuleInterop": true, "strict": true, "outDir": "dist", "rootDir": "src", "skipLibCheck": true }, "include": ["src/**/*"] } -
server/.env:DATABASE_URL=postgresql://spacetime:spacetime@localhost:6432/spacetime PORT=6001 -
server/drizzle.config.ts:import { defineConfig } from 'drizzle-kit'; export default defineConfig({ schema: './src/schema.ts', out: './drizzle', dialect: 'postgresql', dbCredentials: { url: process.env.DATABASE_URL || 'postgresql://spacetime:spacetime@localhost:6432/spacetime', }, }); -
server/src/schema.ts— Drizzle ORM table definitions for all features -
server/src/index.ts— Express server with:- CORS configured for
http://localhost:6273 - Socket.io with CORS
- REST endpoints for the app's resources (per the feature spec)
- Socket.io events for real-time updates (per the feature spec)
- Database queries via Drizzle ORM
- CORS configured for
Install and push schema:
cd <server-dir> && npm install
npx drizzle-kit push
Phase 2: (No bindings step)
Skip — PostgreSQL has no binding generation. The client calls REST/Socket.io APIs directly.
Phase 3: Generate Client
-
client/package.json:{ "name": "chat-client", "type": "module", "scripts": { "dev": "vite", "build": "tsc -b && vite build" }, "dependencies": { "react": "^18.3.1", "react-dom": "^18.3.1", "socket.io-client": "^4.7.4" }, "devDependencies": { "@types/react": "^18.3.12", "@types/react-dom": "^18.3.1", "@vitejs/plugin-react": "^4.3.4", "typescript": "^5.4.0", "vite": "^6.0.0" } } -
client/vite.config.ts— port 6273 (NOT 6173 — that's SpacetimeDB), proxy/apiand/socket.iotohttp://localhost:6001import { defineConfig } from 'vite'; import react from '@vitejs/plugin-react'; export default defineConfig({ plugins: [react()], server: { port: 6273, proxy: { '/api': 'http://localhost:6001', '/socket.io': { target: 'http://localhost:6001', ws: true, }, }, }, }); -
client/tsconfig.json -
client/index.html -
client/src/main.tsx— React entry point -
client/src/App.tsx— Main component usingfetch('/api/...')+ Socket.io client -
client/src/styles.css— Dark theme styling
The client connects to the server via the Vite proxy — no hardcoded localhost:6001 in client code.
Critical: Initialize the socket.io client without a hardcoded URL so it routes through the Vite proxy (e.g. io() or io({ path: '/socket.io' })). Hardcoding http://localhost:6001 bypasses the proxy and breaks WebSocket upgrades.
Phase 4: Verify
# Server
cd <server-dir> && npm install && npx tsc --noEmit
# Client
cd <client-dir> && npm install && npx tsc --noEmit && npm run build
Both must pass. If either fails:
- Read the error
- Fix the code
- Retry (up to 3 attempts)
- Each fix counts as a reprompt — log it
Phase 5: Deploy
# Kill any existing servers
npx kill-port 6273 2>/dev/null || true
npx kill-port 6001 2>/dev/null || true
# Start the API server in background
cd <server-dir> && npx tsx src/index.ts &
# Wait for API server to be ready (poll http://localhost:6001 up to 30s)
# Start client dev server in background
cd <client-dir> && npm run dev &
Wait for both servers to be ready:
- API server at
http://localhost:6001 - Client dev server at
http://localhost:6273
Redeploy (for fix iterations)
- If server changed: kill and restart the Express server
npx kill-port 6001 2>/dev/null || true cd <server-dir> && npx tsx src/index.ts & - If schema changed: push new schema before restarting
cd <server-dir> && npx drizzle-kit push - If client changed: Vite HMR handles it automatically (or restart dev server if needed)
Key Differences from SpacetimeDB
For context on what makes this backend different (this helps the benchmark comparison):
| Aspect | SpacetimeDB | PostgreSQL |
|---|---|---|
| Real-time | Built-in subscriptions | Socket.io (manual) |
| API layer | Reducers (auto-exposed) | Express routes (manual) |
| Schema | table() + reducer() |
Drizzle pgTable() |
| Bindings | Auto-generated types | Manual type definitions |
| Deployment | spacetime publish |
Start Express server |
| State sync | Automatic client cache | Manual fetch + Socket.io |
| Online presence | Via lifecycle hooks | Manual Socket.io tracking |
| Typing indicators | Reducer + subscription | Socket.io events |
| Infra dependencies | SpacetimeDB only | PostgreSQL + Express + Socket.io + CORS |
App Identity
- HTML
<title>MUST be "PostgreSQL Chat" (not "Chat App", not "SpacetimeDB Chat") - The app MUST show "PostgreSQL Chat" as the visible header/title in the UI
- This distinguishes it from the SpacetimeDB version during testing
Port Configuration
| Service | Port | Notes |
|---|---|---|
| PostgreSQL (Docker) | 6432 | Database |
| Express API server | 6001 | REST + Socket.io |
| Vite dev server | 6273 | React client — NOT 6173 (that's SpacetimeDB) |
Reference Files
The language and feature prompt files are provided as absolute paths in the launch prompt. No additional reference files are needed — this backend uses standard Node.js/TypeScript patterns.