supabase

mirror of https://github.com/supabase/supabase.git synced 2026-05-09 02:09:50 -04:00

Author	SHA1	Message	Date
Greg Richardson	538f9e3e82	fix: prevent AI assistant from soliciting sensitive creds (#45692 ) Adds prompt guardrails and evals to prevent the AI assistant from asking users to share sensitive data (API keys, `.env` contents, etc.) and to warn when credentials are shared. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * New Features * Stronger safety behavior: assistant now refuses requests to share full environment files, asks for variable names only, and directs users to secure secret-management tooling. * Immediate warning and guidance if credentials or other sensitive values are pasted in chat, without repeating exposed secrets. * Behavior * Clarified evaluation rules so responses more consistently follow the new safety guidance. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2026-05-07 13:22:19 -06:00
Greg Richardson	5f8906a20e	fix: add destructive operation guardrails to AI assistant (#45194 ) Prevents the AI assistant from helping with local git/filesystem operations, and adds explicit warnings before irreversible database operations (DROP TABLE, DELETE without WHERE, etc.). Adds a `safetyScorer` and eval cases to cover these behaviours. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * New Features * Added a Safety metric to evaluations so assistant responses are scored for safe handling of destructive or risky requests * Assistant guidance updated to refuse destructive local VCS/filesystem actions and require clear warnings for irreversible database operations * Tests * Added evaluation cases covering safe refusals, clear warnings, and correct handling of destructive or risky prompts * Chores * Enabled Safety metric in online evaluation manifests/handlers <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2026-05-06 09:24:21 -06:00
Greg Richardson	e38ba624bc	feat(ai): update rls knowledge for 'secure by default' (#45072 ) Updates the RLS knowledge loaded by the dashboard AI assistant to explain the new secure-by-default functionality. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * Documentation * Clarified PostgreSQL/RLS guidance in Studio: tables are now "secure by default"—SQL-created tables aren’t exposed via the Data API unless explicit grants are given to anon/authenticated/service_role and RLS is enabled; added an “Exposing a Table to the Data API” workflow, strengthened RLS prerequisites in best practices, and improved troubleshooting/error-recovery guidance. * Tests / Evaluations * Added an evaluation case validating guidance for non-RLS tables requiring explicit grants and RLS policies. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Ali Waseem <waseema393@gmail.com>	2026-04-22 10:02:43 -06:00
Matt Rossman	a325e86845	fix: only prefix scorer slugs on PR builds, not master deploys (#43578 ) Cleanup task following https://github.com/supabase/supabase/pull/43194 I noticed the run of `braintrust-scorers-deploy.yml` included the branch prefix on scorers in Assistant. This is unnecessary since there's only one copy of scorers in the "Assistant" project, unlike "Assistant (Staging Scorers)" which uses prefixes to disambiguate branches. <img width="502" height="262" alt="CleanShot 2026-03-09 at 15 45 19@2x" src="https://github.com/user-attachments/assets/214ec1e8-5f40-411f-8d2a-71cc4a5fc294" /> This is a small housekeeping correction so scorers in the main "Assistant" project don't include branch prefixes, whereas scorers from PRs deployed to "Assistant (Staging Scorers)" remain prefixed. https://docs.github.com/en/actions/reference/variables-reference <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * Chores * Updated CI deployment configuration for scorer branch/prefix handling to optimize behavior across different GitHub event types (PR vs. push/dispatch events). <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2026-04-09 09:57:42 -04:00
Charis	205cbe7d26	chore(studio}: enforce import order, remove bare import specifiers (#44585 )	2026-04-07 20:34:10 -04:00
Matt Rossman	82deff37de	feat(assistant): lazy load topic knowledge via load_knowledge tool (#44296 ) Moves knowledge (RLS, Edge Functions, PostgreSQL best practices, Realtime) out of the static system prompt and into a `load_knowledge` tool the model calls on demand, reducing prompt bloat. This is a temporary stopgap until the [standard Supabase agent-skills](https://github.com/supabase/agent-skills) are ready for integration in Assistant. - New always-available `load_knowledge` tool added to `rendering-tools.ts` - Updated `Message.Parts.tsx` so the "Ran load_knowledge" chip renders in chat - System prompt replaces the four knowledge blobs with an `## Available Knowledge` block and is hardened to load knowledge for given topics - New "Knowledge Usage" scorer and `requiredKnowledge` assertions check that knowledge loads as expected in test scenarios - Filters GraphQL error responses out of `output.docs` before faithfulness scoring to reduce noise See "Knowledge Usage" scoring 100% in evals with no major regressions: https://github.com/supabase/supabase/pull/44296#issuecomment-4145760236 Sample trace showing the tool in action ([Braintrust](https://www.braintrust.dev/app/supabase.io/p/Assistant/trace?object_type=project_logs&object_id=5a8d02e5-b3b6-40cc-ba76-ecee286478f4&r=351a11c8-9cb7-4945-93ad-d11e8cc2e3e1&s=351a11c8-9cb7-4945-93ad-d11e8cc2e3e1)) <img width="2192" height="1730" alt="CleanShot 2026-03-30 at 13 53 59@2x" src="https://github.com/user-attachments/assets/f483767c-34e0-401c-8089-5b9834fe696a" /> References - https://ai-sdk.dev/cookbook/guides/agent-skills Closes AI-508 <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * New Features * Added dynamic knowledge loading capability enabling the AI assistant to retrieve on-demand information about PostgreSQL best practices, Row Level Security, Edge Functions, and Realtime. * Bug Fixes * Improved search results filtering to exclude error responses in tool outputs. * Tests * Enhanced evaluation metrics with knowledge usage scoring. * Expanded test dataset cases to validate knowledge requirement handling. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2026-04-02 16:09:06 -04:00
Matt Rossman	0c5f64fcba	feat(assistant): upgrade default models to gpt-5.4-nano and gpt-5.3-codex (#44107 ) Replaces `gpt-5-mini` and `gpt-5` with `gpt-5.4-nano` and `gpt-5.3-codex` respectively. Clients with stale model IDs in IndexedDB will gracefully reset to the new defaults. While we can technically keep the existing models around, we've [opted](https://supabase.slack.com/archives/C051L8U2EJF/p1774283070517609?thread_ts=1773771991.871669&cid=C051L8U2EJF) to replace them w/ the newer models for simplicity. Basic completion endpoints use `'none'` reasoning level for optimal speed. Rationale for these models is they provide they best balance of intelligence/speed and cost. GPT-5.4-nano is less expensive (0.8x price), faster, and smarter than GPT-5-mini. GPT-5.4-mini would be even smarter but is 3x the price. GPT-5.3-Codex is ~1.4x the price of GPT-5, while GPT-5.4 would be 2x price, but 5.3-Codex is still a big intelligence boost from GPT-5. See [eval comparison](https://www.braintrust.dev/app/supabase.io/p/Assistant/experiments/mattrossman%2Fai-509-v2-upgrade-assistant-models-beyond-gpt-5-family-1774468619?c=master-1774458837&diff=between_experiments), scores are relatively stable and conciseness naturally improves on gpt-5.4-nano. Other change: - Fixed an eval test case to clarify that https://supabase.help is also a correct URL for submitting support ticket, which was unfairly scored as incorrect [here](https://www.braintrust.dev/app/supabase.io/p/Assistant/trace?object_type=experiment&object_id=5244cccd-23b2-4f79-9dd2-287f1b40ebad&r=bac9b903-8bde-4c21-99dd-e0ed141c4f9e&s=f248fbf5-75bf-4aab-be0a-87a4298e6d11) I sanity checked the Assistant, natural language filters, and SQL Editor completions on staging preview. References: - https://openai.com/index/introducing-gpt-5-4-mini-and-nano/ - https://openai.com/index/introducing-gpt-5-3-codex/ - https://developers.openai.com/api/docs/pricing Closes AI-509	2026-03-26 14:35:54 +08:00
Matt Rossman	adf8b0c67c	feat(assistant): per-endpoint reasoningEffort + model config cleanup (#43981 ) We're exploring support for newer models like [gpt-5.4-nano](https://openai.com/index/introducing-gpt-5-4-mini-and-nano/) in Assistant. This model doesn't support the `'minimal'` reasoning effort level we use for gpt-5-mini which leads to vague errors. <img width="595" height="263" alt="CleanShot 2026-03-18 at 17 13 05@2x" src="https://github.com/user-attachments/assets/cf7c2370-322d-4a8a-be55-23e680db0aa0" /> Also, we've [previously discussed](https://supabase.slack.com/archives/C0161K73J1J/p1771544464850199?thread_ts=1771493920.775699&cid=C0161K73J1J) that reasoning adds unnecessary latency to otherwise simple AI completion endpoints like `title-v2`. We want more control of reasoning level independent of model/endpoint. This PR aims to solve both problems by: - making reasoning effort configurable on a per-request basis - adding compile-time guardrails to prevent selecting an incompatible reasoning level for models - adding a `DEFAULT_COMPLETION_MODEL` with minimal reasoning that we can update with newer models that support disabling reasoning (independent of Assistant chat model reasoning) Other improvements to our model config logic: - Fixes bug in `onboarding/design.ts` and `assistant.eval.ts` where `providerOptions` was being dropped - `getModel()` now returns a bundled `modelParams` object (spread into AI SDK calls) so `providerOptions` can't be accidentally omitted (this [has happened before](https://supabase.slack.com/archives/C0161K73J1J/p1771518443534309?thread_ts=1771493920.775699&cid=C0161K73J1J)) - Introduces an `ASSISTANT_MODELS` registry as a single source of truth for assistant model config, eliminating hardcoded model IDs across the codebase - Aligns free/pro model conditional logic with `assistant.advance_model` entitlement naming conventions instead of the `isLimited` pattern - Adds `console.error` logging of Assistant stream errors so we can interpret reasoning effort compatibility errors in the future (instead of just opaque "Sorry, I'm having trouble responding right now" card) - Removes unnecessary type casts and generally making the model config logic stricter - Removes pre-existing dead code: `anthropic` provider variant in `GetModelParams` / `PROVIDERS` registry that was never implemented in `getModel()` Now if you try to select an unsupported reasoning level you get a type error: <img width="1306" height="320" alt="CleanShot 2026-03-20 at 14 37 24@2x" src="https://github.com/user-attachments/assets/a6ac234b-5ea5-4d81-8e01-ac4be34a0800" /> And if for some reason an invalid reasoning level slips through, you now get a server-side error surfacing the issue: <img width="1268" height="204" alt="CleanShot 2026-03-20 at 14 58 14@2x" src="https://github.com/user-attachments/assets/aadc1b7a-9495-475f-9741-39979bd27cd7" /> I've tested gpt-5 and gpt-5-mini are still working on the staging preview and verified the models were selected properly in Braintrust logs. Both models are available on my Pro test account, and my Free test account shows the Pro upgrade CTA. Closes AI-446 Closes AI-551	2026-03-25 11:29:23 -04:00
Matt Rossman	25036af80e	fix(assistant): sanitize backslash-escaped apostrophes in SQL (#43728 ) Fix for the LLM occasionally generating MySQL-style `\'` escapes in SQL, which are invalid in PostgreSQL. Example trace where this happened in the wild: ([Braintrust](https://www.braintrust.dev/app/supabase.io/p/Assistant/review?tab=experiment&r=5fcf1b12-8584-455c-9e9a-bdc0fa3ed21c&s=5fcf1b12-8584-455c-9e9a-bdc0fa3ed21c&o=0627ada8-b567-4117-9fe8-49d847cb73a7&review=1)) Changes - Adds `fixSqlBackslashEscapes` to convert `\'` → `''` before SQL is executed - Unit tests + adversarial eval dataset case Compare the results of the adversarial test case: - `master`: 0% SQL Validity ([Braintrust](https://www.braintrust.dev/app/supabase.io/p/Dev%20(mattrossman%2FAssistant)/trace?object_type=experiment&object_id=b469cbf7-4d6f-429c-9819-6c4099294123&r=dce5a29b-2fde-44c3-80f8-4e14d1f657c0&s=dce5a29b-2fde-44c3-80f8-4e14d1f657c0)) - This branch: 100% SQL Validity ([Braintrust](https://www.braintrust.dev/app/supabase.io/p/Assistant/trace?object_type=experiment&object_id=160e9ce0-e320-4f6d-8aa7-c5ad7e01fbd2&r=d75ef0e3-90ed-42a7-9ef3-8bf69592f193&s=0eeca492-dbe6-451e-8d81-127caff30320)) Closes AI-400	2026-03-17 13:44:14 -04:00
Matt Rossman	e4a9b6882c	fix(assistant): use extractTextOnly in conciseness scorer (#43612 ) `concisenessScorer` was passing full serialized text + tool calls JSON to the LLM judge (SQL queries, GraphQL payloads, etc.). Switches to `extractTextOnly` so the judge only evaluates text the user actually sees. Prerequisite for https://github.com/supabase/supabase/pull/43613 to set a fair conciseness baseline score. Ref AI-402	2026-03-11 13:55:02 -04:00
Matt Rossman	517171b246	feat(assistant): online evals support and CI workflows (#43194 ) Lays groundwork for online evals on Assistant chat logs. https://www.braintrust.dev/docs/observe/score-online ### Changes - New workflows: - `braintrust-scorers-deploy.yml` keeps prod scorers in sync on push to `master` - `braintrust-preview-scorers-deploy.yml` deploys preview scorers to the staging project for PRs labeled `preview-scorers`, posting a comment with scorer links ([example](https://github.com/supabase/supabase/pull/43194#issuecomment-4000097222)) - `braintrust-preview-scorers-cleanup.yml` deletes preview scorers when the PR is closed ([example](https://github.com/supabase/supabase/pull/43194#issuecomment-4000749847)) - Adds `evals/scorer-online.ts` entry point invoked with `pnpm scorers:deploy`, registering scorers for online evals in the Braintrust "Assistant" project - Refactors scorer code to separate online-compatible scorers (`scorer-online.ts`) from WASM-dependent ones (`scorer-wasm.ts`) - "URL Validity" scorer now only checks Supabase domains to prevent requests to untrusted origins - Span `input` is now shaped `{ prompt: string }` instead of plain `string` for compatibility with offline eval scorers - Env vars `BRAINTRUST_STAGING_PROJECT_ID` and `BRAINTRUST_PROJECT_ID` configured in GitHub repo settings - `generateAssistantResponse` now uses `startSpan` + `withCurrent` instead of `traced()` to manually manage the root span lifecycle — this ensures `onFinish` logs output to the span _before_ `span.end()` is called, which is when Braintrust triggers scoring automations ### Online Scorers We share scoring logic across offline and online evals, but some of our scorers aren't transferrable to an "online" setting due to runtime challenges or ground truth requirements. Supported - Goal Completion - Conciseness - Completeness - Docs Faithfulness - URL Validity Unsupported - Correctness (requires ground truth output) - Tool Usage (requires ground truth requiredTools) - SQL Syntax (uses libpg-query WASM) - SQL Identifier Quoting (uses libpg-query WASM) ### How to use these scorers Going forward if you want to add/edit online eval scorers, add the `preview-scorers` label to a PR. This deploys scorers to the [Assistant (Staging Scorers)](https://www.braintrust.dev/app/supabase.io/p/Assistant%20(Staging%20Scorers)?v=Overview) project in Braintrust with branch-specific slugs, and comments on the PR ([example](https://github.com/supabase/supabase/pull/43194#issuecomment-4000097222)). From the Braintrust dashboard you can "Test" the scorer with traces from any project. <img width="1866" height="528" alt="CleanShot 2026-03-05 at 15 15 00@2x" src="https://github.com/user-attachments/assets/4f15cebc-3f2d-4e8a-9ee2-fe8ef7bf4199" /> Once merged, scorers are deployed to the primary [Assistant](https://www.braintrust.dev/app/supabase.io/p/Assistant) project, and preview scorers are deleted from the staging project. Down the road, scorers on the Assistant project will run automatically on a sample of production traces. Closes AI-437	2026-03-09 13:05:26 -04:00
Matt Rossman	851cc00545	feat(assistant): run 3 trials for Assistant evals in CI (#42510 ) Runs 3 trials for Assistant evals in CI to reduce random variation. Locally, only 1 trial is run. Also adds `CI` to `studio#build` env in turbo.json. This env var is [automatically set by GitHub Actions](https://github.blog/changelog/2020-04-15-github-actions-sets-the-ci-environment-variable-to-true/). Compare number of trials: - [Assistant (mattrossman/ai-398-increase-trial-count-for-assistant-evals-1770305591)](https://www.braintrust.dev/app/supabase.io/p/Assistant/experiments/mattrossman%2Fai-398-increase-trial-count-for-assistant-evals-1770305591) - [Assistant (master)](https://www.braintrust.dev/app/supabase.io/p/Assistant/experiments/master-1770305906?c=mattrossman/ai-398-increase-trial-count-for-assistant-evals-1770305591) References: - https://www.braintrust.dev/docs/evaluate/run-evaluations#trials Closes AI-398 <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * Chores * Updated evaluation configuration to adjust trial counts based on CI environment * Integrated CI environment variable into build system configuration <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Ali Waseem <waseema393@gmail.com>	2026-02-05 11:21:44 -05:00
Matt Rossman	4b8bab4d14	feat(assistant): score URL validity and fix support ticket URL guidance (#42227 ) Logic changes - Adds function in `helpers.ts` to extract URLs from text via regex - I also considering using a library like [linkify-it](https://www.npmjs.com/package/linkify-it) for this but figured it's not worth the extra dep - Adds associated tests in `helpers.test.ts` - Adds "URL Validity" scorer which performs a HEAD request for links in Assistant response text and determins what portion of links have `.ok` responses - Adds eval case to check correctness of support ticket URL answers Prompt changes - Informs Assistant of https://supabase.com/dashboard/support/new being the URL to create support tickets - Encourages Assistant to "self-debug" issues before directing users to create support tickets See [Eval Report](https://github.com/supabase/supabase/pull/42227#issuecomment-3807772871) and [Correctness](https://www.braintrust.dev/app/supabase.io/p/Assistant/trace?object_type=experiment&object_id=1ad0f9b0-5adb-436c-9812-a87aac62c036&r=1ef13459-a98c-4904-925e-6d81276cebb2&s=dbe5c607-a560-462b-8745-41d430744431) analysis for new support ticket test case. Resolves AI-384 <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * New Features * Added URL validity scoring to evaluations and helper utilities for extracting/cleaning URLs. * Added evaluation cases for support-ticket URL handling and OAuth callback guidance. * Documentation * Updated assistant guidance to prefer self-resolution, include support-ticket direction, clarified data-recovery search steps, and added template-URL notation. * Tests * Expanded URL extraction and related utility tests to cover many formats and edge cases. <sub>✏️ Tip: You can customize this high-level summary in your review settings.</sub> <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2026-01-30 09:53:21 -05:00
Matt Rossman	a127f2cbbc	test(assistant): add eval case for `execute_sql` usage on default "Generate sample data" prompt (#42219 ) test: add eval case for execute_sql on sample data generation	2026-01-28 10:02:22 -05:00
Matt Rossman	eb259f1364	feat(assistant): score and improve SQL identifier quoting (#42122 ) * feat: SQL correctness scorer, override mock tables * feat: replace "SQL Correctness" with "SQL Identifier Quoting" scorer * fix(prompt): discourage simulating confirmation of execute_sql tool this is already handled at the UI layer * fix(prompt): encourage quotes on identifiers with caps * feat: move extractIdentifiers to own file, add tests * chore: shorten tests * feat: extract ColumnDef column names in extractIdentifiers * refactor: sqlIdentifierQuotingScorer with more thorough checks * refactor: consolidate into `sql-identifier-quoting.ts` * feat: support mocking schemas, eval test case with case sensitive schema * fix: test cases that don't match default mock schema * chore: format * feat(prompt): mention special characters and reserved words * feat: optional description in metadata, test with special characters * feat: consolidated comprehensive test case * fix(prompt): revert conflicting instruction	2026-01-27 09:28:56 -05:00
Matt Rossman	4553f09bb5	feat(assistant): hallucination scorers + corrective measures for storage versioning answers (#41655 ) * feat: "Docs Faithfulness" scorer * feat: test case for storage object restoration * feat: "Factuality" scorer * feat: "Factuality" -> "Correctness" * feat: update Storage recovery test case * feat: finishReason in task output * feat: encourage parallel tool calls + docs search, discourage superfluous context gathering * prompt tuning (tool selection strategy) * add data recovery section in chat prompt * test: S3 versioning support correctness * refactor: derive stepsSerialized/textOnly from shared steps data * fix: input in correctness scorer	2026-01-15 14:22:46 +07:00
Matt Rossman	072883bcec	feat: assistant evals (#41311 ) * chore: bump `supabase` CLI * chore: stricter message types in `generate-v4.ts` * feat: tutorial eval https://www.braintrust.dev/docs/evaluation * feat: project ID for eval * refactor: `generateAssistantResponse` out of `handlePost` * refactor: generateAssistantResponse to lib/ai * feat: factuality eval with assistant response * chore: upgrade braintrust to v1.0.1 * chore: silence tsconfig warning * feat: assertion scorer * fix: aggregate tools across all steps * refactor: strict tool names, remove need for `as const` * refactor: generic tool name type in assertions * feat: transfer mocks from `feature/braintrust` * feat: LLM criteria assertion * feat: braintrust evals workflow * fix: BRAINTRUST_PROJECT_ID * feat: `sql_similar` assertion * fix: `OPENAI_API_KEY` in workflow env * feat: split AssertionScorer into separate scorers * feat: remove tutorial eval * feat: 20 minute CI timeout * feat: category in test case metadata * feat: score with gpt-5 * refactor: dataset to own file, colocate scorers * feat: "gpt-5.2-2025-12-11" for llm as a judge * feat: SQL syntax scorer with `libpg-query` * feat: `evals:setup` and `evals:run` scripts * feat: `evals:setup` in CI * feat: human readable scorer names * chore: rename to "SQL Validity" * feat: add 2 "sql_generation" test cases * feat: update requiredTools in test cases * chore: ignore Cursor MCP config * feat: "Conciseness" score * feat: "Completeness" scorer * fix: generate-v4 test mocks * feat: serialize "steps" for scorer inputs * updated node mem options for typecheck * updated runner * remove ram update as actions handle this * feat: read `BRAINTRUST_PROJECT_ID` from secrets * feat: score helpfulness, remove old scorers * feat: separate `evals:run` and `evals:upload` scripts * feat: passthrough entire classifier result * feat: use live `search_docs` impl, store docs result in metadata * feat: reduce classifier options * feat: filter workflow by `run-evals` PR label or `master` branch * chore: cleanup stubbed mock tools * fix: checkout actual branch with `ref:` * fix: capture search_docs results from all content parts * feat: simplify sql syntax score calculation * feat: use AI SDK's UI message validator * docs: justification for relative `extends` * fix: cleanup leftover validatedMessages * doc: note mock token isn't secret for snyk * fix: mock ui message to pass validation * feat: revert ignoring Cursor MCP config Using `.git/info/exclude` instead until we have an opinion on this * feat: add "tsconfig" as shared-data devDependency, revert relative path in tsconfig * refactor: tool call parsing into function * Update apps/studio/evals/assistant.eval.ts Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * refactor: organize mock schemas and tool factories --------- Co-authored-by: Ali Waseem <waseema393@gmail.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>	2025-12-22 23:45:48 -05:00

17 Commits