Commit Graph

17 Commits

Author SHA1 Message Date
Greg Richardson 538f9e3e82 fix: prevent AI assistant from soliciting sensitive creds (#45692)
Adds prompt guardrails and evals to prevent the AI assistant from asking
users to share sensitive data (API keys, `.env` contents, etc.) and to
warn when credentials are shared.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Stronger safety behavior: assistant now refuses requests to share full
environment files, asks for variable names only, and directs users to
secure secret-management tooling.
* Immediate warning and guidance if credentials or other sensitive
values are pasted in chat, without repeating exposed secrets.
* **Behavior**
* Clarified evaluation rules so responses more consistently follow the
new safety guidance.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-05-07 13:22:19 -06:00
Greg Richardson 5f8906a20e fix: add destructive operation guardrails to AI assistant (#45194)
Prevents the AI assistant from helping with local git/filesystem
operations, and adds explicit warnings before irreversible database
operations (DROP TABLE, DELETE without WHERE, etc.).

Adds a `safetyScorer` and eval cases to cover these behaviours.
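A naive sketch of the kind of check involved (purely illustrative; the real `safetyScorer` and evals work differently, and `isDestructiveSql` is a hypothetical helper):

```typescript
// Purely illustrative: flag the destructive patterns called out above
// (DROP TABLE, DELETE without a WHERE clause).
function isDestructiveSql(sql: string): boolean {
  const s = sql.trim().replace(/\s+/g, " ").toUpperCase();
  if (s.startsWith("DROP TABLE")) return true;
  if (s.startsWith("DELETE FROM") && !s.includes(" WHERE ")) return true;
  return false;
}
```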

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Added a Safety metric to evaluations so assistant responses are scored
for safe handling of destructive or risky requests
* Assistant guidance updated to refuse destructive local VCS/filesystem
actions and require clear warnings for irreversible database operations

* **Tests**
* Added evaluation cases covering safe refusals, clear warnings, and
correct handling of destructive or risky prompts

* **Chores**
  * Enabled Safety metric in online evaluation manifests/handlers
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-05-06 09:24:21 -06:00
Greg Richardson e38ba624bc feat(ai): update rls knowledge for 'secure by default' (#45072)
Updates the RLS knowledge loaded by the dashboard AI assistant to
explain the new secure-by-default functionality.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Documentation**
* Clarified PostgreSQL/RLS guidance in Studio: tables are now "secure by
default"—SQL-created tables aren’t exposed via the Data API unless
explicit grants are given to anon/authenticated/service_role and RLS is
enabled; added an “Exposing a Table to the Data API” workflow,
strengthened RLS prerequisites in best practices, and improved
troubleshooting/error-recovery guidance.
* **Tests / Evaluations**
* Added an evaluation case validating guidance for non-RLS tables
requiring explicit grants and RLS policies.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Ali Waseem <waseema393@gmail.com>
2026-04-22 10:02:43 -06:00
Matt Rossman a325e86845 fix: only prefix scorer slugs on PR builds, not master deploys (#43578)
Cleanup task following https://github.com/supabase/supabase/pull/43194

I noticed the run of `braintrust-scorers-deploy.yml` included the branch
prefix on scorers in Assistant. This is unnecessary since there's only
one copy of scorers in the "Assistant" project, unlike "Assistant
(Staging Scorers)" which uses prefixes to disambiguate branches.

<img width="502" height="262" alt="CleanShot 2026-03-09 at 15 45 19@2x"
src="https://github.com/user-attachments/assets/214ec1e8-5f40-411f-8d2a-71cc4a5fc294"
/>

This is a small housekeeping correction: scorers in the main
"Assistant" project no longer include branch prefixes, whereas scorers from
PRs deployed to "Assistant (Staging Scorers)" remain prefixed.

https://docs.github.com/en/actions/reference/variables-reference
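The logic amounts to keying the prefix off the GitHub event type; a hypothetical sketch of the condition (the real logic lives in the workflow YAML, and `scorerSlug` is an illustrative name):

```typescript
// Hypothetical sketch: only pull_request events get branch-prefixed scorer
// slugs; push/dispatch deploys to the main "Assistant" project use bare slugs.
function scorerSlug(base: string, eventName: string, branch: string): string {
  return eventName === "pull_request" ? `${branch}-${base}` : base;
}
```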

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Chores**
* Updated CI deployment configuration for scorer branch/prefix handling
to optimize behavior across different GitHub event types (PR vs.
push/dispatch events).

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-04-09 09:57:42 -04:00
Charis 205cbe7d26 chore(studio): enforce import order, remove bare import specifiers (#44585) 2026-04-07 20:34:10 -04:00
Matt Rossman 82deff37de feat(assistant): lazy load topic knowledge via load_knowledge tool (#44296)
Moves knowledge (RLS, Edge Functions, PostgreSQL best practices,
Realtime) out of the static system prompt and into a `load_knowledge`
tool the model calls on demand, reducing prompt bloat. This is a
temporary stopgap until the [standard Supabase
agent-skills](https://github.com/supabase/agent-skills) are ready for
integration in Assistant.

- New always-available `load_knowledge` tool added to
`rendering-tools.ts`
- Updated `Message.Parts.tsx` so the "Ran load_knowledge" chip renders
in chat
- System prompt replaces the four knowledge blobs with an `## Available
Knowledge` block and is hardened to load knowledge for given topics
- New "Knowledge Usage" scorer and `requiredKnowledge` assertions check
that knowledge loads as expected in test scenarios
- Filters GraphQL error responses out of `output.docs` before
faithfulness scoring to reduce noise
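At its core, the tool is a lookup from topic name to a knowledge blob; a dependency-free sketch (topic names and contents here are placeholders, and the real tool is wired through `rendering-tools.ts` and the AI SDK):

```typescript
// Hypothetical sketch of load_knowledge: knowledge lives outside the
// system prompt and is returned only when the model asks for a topic.
const KNOWLEDGE: Record<string, string> = {
  rls: "Row Level Security guidance...",
  edge_functions: "Edge Functions guidance...",
  postgres: "PostgreSQL best practices...",
  realtime: "Realtime guidance...",
};

function loadKnowledge(topic: string): string {
  return (
    KNOWLEDGE[topic] ??
    `Unknown topic "${topic}". Available: ${Object.keys(KNOWLEDGE).join(", ")}`
  );
}
```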


See "Knowledge Usage" scoring 100% in evals with no major regressions:
https://github.com/supabase/supabase/pull/44296#issuecomment-4145760236

Sample trace showing the tool in action
([Braintrust](https://www.braintrust.dev/app/supabase.io/p/Assistant/trace?object_type=project_logs&object_id=5a8d02e5-b3b6-40cc-ba76-ecee286478f4&r=351a11c8-9cb7-4945-93ad-d11e8cc2e3e1&s=351a11c8-9cb7-4945-93ad-d11e8cc2e3e1))

<img width="2192" height="1730" alt="CleanShot 2026-03-30 at 13 53
59@2x"
src="https://github.com/user-attachments/assets/f483767c-34e0-401c-8089-5b9834fe696a"
/>


**References**
- https://ai-sdk.dev/cookbook/guides/agent-skills

Closes AI-508

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
* Added dynamic knowledge loading capability enabling the AI assistant
to retrieve on-demand information about PostgreSQL best practices, Row
Level Security, Edge Functions, and Realtime.

* **Bug Fixes**
* Improved search results filtering to exclude error responses in tool
outputs.

* **Tests**
  * Enhanced evaluation metrics with knowledge usage scoring.
* Expanded test dataset cases to validate knowledge requirement
handling.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-04-02 16:09:06 -04:00
Matt Rossman 0c5f64fcba feat(assistant): upgrade default models to gpt-5.4-nano and gpt-5.3-codex (#44107)
Replaces `gpt-5-mini` and `gpt-5` with `gpt-5.4-nano` and
`gpt-5.3-codex` respectively. Clients with stale model IDs in IndexedDB
will gracefully reset to the new defaults. While we can technically keep
the existing models around, we've
[opted](https://supabase.slack.com/archives/C051L8U2EJF/p1774283070517609?thread_ts=1773771991.871669&cid=C051L8U2EJF)
to replace them with the newer models for simplicity. Basic completion
endpoints use the `'none'` reasoning level for optimal speed.

The rationale for these models is that they provide the best balance of
intelligence, speed, and cost. GPT-5.4-nano is less expensive (0.8x the
price), faster, and smarter than GPT-5-mini. GPT-5.4-mini would be even
smarter but is 3x the price. GPT-5.3-Codex is ~1.4x the price of GPT-5
(GPT-5.4 would be 2x), and is still a big intelligence boost over GPT-5.

See [eval
comparison](https://www.braintrust.dev/app/supabase.io/p/Assistant/experiments/mattrossman%2Fai-509-v2-upgrade-assistant-models-beyond-gpt-5-family-1774468619?c=master-1774458837&diff=between_experiments),
scores are relatively stable and conciseness naturally improves on
gpt-5.4-nano.

Other change:
- Fixed an eval test case to clarify that https://supabase.help is also
a correct URL for submitting a support ticket, which was unfairly scored
as incorrect
[here](https://www.braintrust.dev/app/supabase.io/p/Assistant/trace?object_type=experiment&object_id=5244cccd-23b2-4f79-9dd2-287f1b40ebad&r=bac9b903-8bde-4c21-99dd-e0ed141c4f9e&s=f248fbf5-75bf-4aab-be0a-87a4298e6d11)

I sanity checked the Assistant, natural language filters, and SQL Editor
completions on staging preview.

References:
- https://openai.com/index/introducing-gpt-5-4-mini-and-nano/
- https://openai.com/index/introducing-gpt-5-3-codex/
- https://developers.openai.com/api/docs/pricing

Closes AI-509
2026-03-26 14:35:54 +08:00
Matt Rossman adf8b0c67c feat(assistant): per-endpoint reasoningEffort + model config cleanup (#43981)
We're exploring support for newer models like
[gpt-5.4-nano](https://openai.com/index/introducing-gpt-5-4-mini-and-nano/)
in Assistant. This model doesn't support the `'minimal'` reasoning
effort level we use for gpt-5-mini, which leads to vague errors.

<img width="595" height="263" alt="CleanShot 2026-03-18 at 17 13 05@2x"
src="https://github.com/user-attachments/assets/cf7c2370-322d-4a8a-be55-23e680db0aa0"
/>


Also, we've [previously
discussed](https://supabase.slack.com/archives/C0161K73J1J/p1771544464850199?thread_ts=1771493920.775699&cid=C0161K73J1J)
that reasoning adds unnecessary latency to otherwise simple AI
completion endpoints like `title-v2`. We want to control the reasoning
level independently of the model/endpoint.

This PR aims to solve both problems by:
- making reasoning effort configurable on a per-request basis
- adding compile-time guardrails to prevent selecting an incompatible
reasoning level for models
- adding a `DEFAULT_COMPLETION_MODEL` with minimal reasoning that we can
update with newer models that support disabling reasoning (independent
of Assistant chat model reasoning)

Other improvements to our model config logic:
- Fixes bug in `onboarding/design.ts` and `assistant.eval.ts` where
`providerOptions` was being dropped
- `getModel()` now returns a bundled `modelParams` object (spread into
AI SDK calls) so `providerOptions` can't be accidentally omitted (this
[has happened
before](https://supabase.slack.com/archives/C0161K73J1J/p1771518443534309?thread_ts=1771493920.775699&cid=C0161K73J1J))
- Introduces an `ASSISTANT_MODELS` registry as a single source of truth
for assistant model config, eliminating hardcoded model IDs across the
codebase
- Aligns free/pro model conditional logic with `assistant.advance_model`
entitlement naming conventions instead of the `isLimited` pattern
- Adds `console.error` logging of Assistant stream errors so we can
interpret reasoning-effort compatibility errors in the future (instead
of just the opaque "Sorry, I'm having trouble responding right now" card)
- Removes unnecessary type casts and generally makes the model config
logic stricter
- Removes pre-existing dead code: `anthropic` provider variant in
`GetModelParams` / `PROVIDERS` registry that was never implemented in
`getModel()`

Now if you try to select an unsupported reasoning level you get a type
error:

<img width="1306" height="320" alt="CleanShot 2026-03-20 at 14 37 24@2x"
src="https://github.com/user-attachments/assets/a6ac234b-5ea5-4d81-8e01-ac4be34a0800"
/>

And if for some reason an invalid reasoning level slips through, you now
get a server-side error surfacing the issue:

<img width="1268" height="204" alt="CleanShot 2026-03-20 at 14 58 14@2x"
src="https://github.com/user-attachments/assets/aadc1b7a-9495-475f-9741-39979bd27cd7"
/>

I've tested that gpt-5 and gpt-5-mini still work on the staging
preview and verified that the models were selected properly in Braintrust
logs. Both models are available on my Pro test account, and my Free test
account shows the Pro upgrade CTA.


Closes AI-446
Closes AI-551
2026-03-25 11:29:23 -04:00
Matt Rossman 25036af80e fix(assistant): sanitize backslash-escaped apostrophes in SQL (#43728)
Fix for the LLM occasionally generating MySQL-style `\'` escapes in SQL,
which are invalid in PostgreSQL.

Example trace where this happened in the wild:
([Braintrust](https://www.braintrust.dev/app/supabase.io/p/Assistant/review?tab=experiment&r=5fcf1b12-8584-455c-9e9a-bdc0fa3ed21c&s=5fcf1b12-8584-455c-9e9a-bdc0fa3ed21c&o=0627ada8-b567-4117-9fe8-49d847cb73a7&review=1))

**Changes**
- Adds `fixSqlBackslashEscapes` to convert `\'` → `''` before SQL is
executed
- Unit tests + adversarial eval dataset case
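The fix itself is tiny; a minimal, hypothetical reimplementation (the real `fixSqlBackslashEscapes` may handle more edge cases, e.g. `E''` strings):

```typescript
// Hypothetical sketch: standard PostgreSQL escapes an apostrophe inside a
// string by doubling it, so the MySQL-style \' the model sometimes emits
// is rewritten to ''.
function fixSqlBackslashEscapes(sql: string): string {
  return sql.replace(/\\'/g, "''");
}
```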

Compare the results of the adversarial test case:
- `master`: 0% SQL Validity
([Braintrust](https://www.braintrust.dev/app/supabase.io/p/Dev%20(mattrossman%2FAssistant)/trace?object_type=experiment&object_id=b469cbf7-4d6f-429c-9819-6c4099294123&r=dce5a29b-2fde-44c3-80f8-4e14d1f657c0&s=dce5a29b-2fde-44c3-80f8-4e14d1f657c0))
- This branch: 100% SQL Validity
([Braintrust](https://www.braintrust.dev/app/supabase.io/p/Assistant/trace?object_type=experiment&object_id=160e9ce0-e320-4f6d-8aa7-c5ad7e01fbd2&r=d75ef0e3-90ed-42a7-9ef3-8bf69592f193&s=0eeca492-dbe6-451e-8d81-127caff30320))

Closes AI-400
2026-03-17 13:44:14 -04:00
Matt Rossman e4a9b6882c fix(assistant): use extractTextOnly in conciseness scorer (#43612)
`concisenessScorer` was passing full serialized text + tool calls JSON
to the LLM judge (SQL queries, GraphQL payloads, etc.). Switches to
`extractTextOnly` so the judge only evaluates text the user actually
sees.
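In other words, the scorer input goes from serialized steps to just the text parts; a simplified sketch (the part shapes here are illustrative, not the actual AI SDK types):

```typescript
// Hypothetical sketch: drop tool calls/results so the LLM judge only sees
// text the user actually reads.
type Part =
  | { type: "text"; text: string }
  | { type: "tool-call"; toolName: string; args: unknown };

function extractTextOnly(parts: Part[]): string {
  return parts
    .filter((p): p is Extract<Part, { type: "text" }> => p.type === "text")
    .map((p) => p.text)
    .join("\n");
}
```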

Prerequisite for https://github.com/supabase/supabase/pull/43613 to set
a fair conciseness baseline score.

Ref AI-402
2026-03-11 13:55:02 -04:00
Matt Rossman 517171b246 feat(assistant): online evals support and CI workflows (#43194)
Lays groundwork for online evals on Assistant chat logs.

https://www.braintrust.dev/docs/observe/score-online

### Changes

- New workflows:
- `braintrust-scorers-deploy.yml` keeps prod scorers in sync on push to
`master`
- `braintrust-preview-scorers-deploy.yml` deploys preview scorers to the
staging project for PRs labeled `preview-scorers`, posting a comment
with scorer links
([example](https://github.com/supabase/supabase/pull/43194#issuecomment-4000097222))
- `braintrust-preview-scorers-cleanup.yml` deletes preview scorers when
the PR is closed
([example](https://github.com/supabase/supabase/pull/43194#issuecomment-4000749847))
- Adds `evals/scorer-online.ts` entry point invoked with `pnpm
scorers:deploy`, registering scorers for online evals in the Braintrust
"Assistant" project
- Refactors scorer code to separate online-compatible scorers
(`scorer-online.ts`) from WASM-dependent ones (`scorer-wasm.ts`)
- "URL Validity" scorer now only checks Supabase domains to prevent
requests to untrusted origins
- Span `input` is now shaped `{ prompt: string }` instead of plain
`string` for compatibility with offline eval scorers
- Env vars `BRAINTRUST_STAGING_PROJECT_ID` and `BRAINTRUST_PROJECT_ID`
configured in GitHub repo settings
- `generateAssistantResponse` now uses `startSpan` + `withCurrent`
instead of `traced()` to manually manage the root span lifecycle — this
ensures `onFinish` logs output to the span _before_ `span.end()` is
called, which is when Braintrust triggers scoring automations

### Online Scorers

We share scoring logic across offline and online evals, but some of our
scorers aren't transferable to an "online" setting due to runtime
challenges or ground-truth requirements.

**Supported**
- Goal Completion
- Conciseness
- Completeness
- Docs Faithfulness
- URL Validity

**Unsupported**
- Correctness (requires ground truth output)
- Tool Usage (requires ground truth requiredTools)
- SQL Syntax (uses libpg-query WASM)
- SQL Identifier Quoting (uses libpg-query WASM)
 
### How to use these scorers

Going forward, if you want to add or edit online eval scorers, add the
`preview-scorers` label to a PR. This deploys scorers to the [Assistant
(Staging
Scorers)](https://www.braintrust.dev/app/supabase.io/p/Assistant%20(Staging%20Scorers)?v=Overview)
project in Braintrust with branch-specific slugs, and comments on the PR
([example](https://github.com/supabase/supabase/pull/43194#issuecomment-4000097222)).
From the Braintrust dashboard you can "Test" the scorer with traces from
any project.

<img width="1866" height="528" alt="CleanShot 2026-03-05 at 15 15 00@2x"
src="https://github.com/user-attachments/assets/4f15cebc-3f2d-4e8a-9ee2-fe8ef7bf4199"
/>

Once merged, scorers are deployed to the primary
[Assistant](https://www.braintrust.dev/app/supabase.io/p/Assistant)
project, and preview scorers are deleted from the staging project. Down
the road, scorers on the Assistant project will run automatically on a
sample of production traces.

Closes AI-437
2026-03-09 13:05:26 -04:00
Matt Rossman 851cc00545 feat(assistant): run 3 trials for Assistant evals in CI (#42510)
Runs 3 trials for Assistant evals in CI to reduce random variation.
Locally, only 1 trial is run.
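The switch is just a read of the `CI` env var (helper name is hypothetical; the real change lives in the eval config):

```typescript
// Hypothetical helper: GitHub Actions sets CI="true" automatically, so
// run 3 trials there and 1 locally.
function getTrialCount(env: Record<string, string | undefined> = process.env): number {
  return env.CI === "true" ? 3 : 1;
}
```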

Also adds `CI` to `studio#build` env in turbo.json. This env var is
[automatically set by GitHub
Actions](https://github.blog/changelog/2020-04-15-github-actions-sets-the-ci-environment-variable-to-true/).

Compare number of trials:
- [Assistant
(mattrossman/ai-398-increase-trial-count-for-assistant-evals-1770305591)](https://www.braintrust.dev/app/supabase.io/p/Assistant/experiments/mattrossman%2Fai-398-increase-trial-count-for-assistant-evals-1770305591)
- [Assistant
(master)](https://www.braintrust.dev/app/supabase.io/p/Assistant/experiments/master-1770305906?c=mattrossman/ai-398-increase-trial-count-for-assistant-evals-1770305591)

References:
- https://www.braintrust.dev/docs/evaluate/run-evaluations#trials

Closes AI-398

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Chores**
* Updated evaluation configuration to adjust trial counts based on CI
environment
  * Integrated CI environment variable into build system configuration

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Ali Waseem <waseema393@gmail.com>
2026-02-05 11:21:44 -05:00
Matt Rossman 4b8bab4d14 feat(assistant): score URL validity and fix support ticket URL guidance (#42227)
**Logic changes**
- Adds function in `helpers.ts` to extract URLs from text via regex
- I also considered using a library like
[linkify-it](https://www.npmjs.com/package/linkify-it) for this, but
figured it's not worth the extra dep
- Adds associated tests in `helpers.test.ts`
- Adds a "URL Validity" scorer which performs a HEAD request for each link
in Assistant response text and determines what portion of links return
`.ok` responses
- Adds eval case to check correctness of support ticket URL answers
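A rough sketch of the idea (the regex and scoring shape are illustrative; the real helpers in `helpers.ts` differ):

```typescript
// Hypothetical sketch: pull URLs out of response text, then score what
// fraction respond OK to a HEAD request.
function extractUrls(text: string): string[] {
  // Illustrative regex; trailing-punctuation handling is simplified.
  return (text.match(/https?:\/\/[^\s)\]]+/g) ?? []).map((u) =>
    u.replace(/[.,]$/, ""),
  );
}

async function urlValidityScore(text: string): Promise<number> {
  const urls = extractUrls(text);
  if (urls.length === 0) return 1; // nothing to penalize
  const results = await Promise.all(
    urls.map((url) =>
      fetch(url, { method: "HEAD" }).then((r) => r.ok).catch(() => false),
    ),
  );
  return results.filter(Boolean).length / urls.length;
}
```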

**Prompt changes**
- Informs Assistant of https://supabase.com/dashboard/support/new being
the URL to create support tickets
- Encourages Assistant to "self-debug" issues before directing users to
create support tickets

See [Eval
Report](https://github.com/supabase/supabase/pull/42227#issuecomment-3807772871)
and
[Correctness](https://www.braintrust.dev/app/supabase.io/p/Assistant/trace?object_type=experiment&object_id=1ad0f9b0-5adb-436c-9812-a87aac62c036&r=1ef13459-a98c-4904-925e-6d81276cebb2&s=dbe5c607-a560-462b-8745-41d430744431)
analysis for new support ticket test case.

Resolves AI-384

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Added URL validity scoring to evaluations and helper utilities for
extracting/cleaning URLs.
* Added evaluation cases for support-ticket URL handling and OAuth
callback guidance.

* **Documentation**
* Updated assistant guidance to prefer self-resolution, include
support-ticket direction, clarified data-recovery search steps, and
added template-URL notation.

* **Tests**
* Expanded URL extraction and related utility tests to cover many
formats and edge cases.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-01-30 09:53:21 -05:00
Matt Rossman a127f2cbbc test(assistant): add eval case for execute_sql usage on default "Generate sample data" prompt (#42219)
test: add eval case for execute_sql on sample data generation
2026-01-28 10:02:22 -05:00
Matt Rossman eb259f1364 feat(assistant): score and improve SQL identifier quoting (#42122)
* feat: SQL correctness scorer, override mock tables

* feat: replace "SQL Correctness" with "SQL Identifier Quoting" scorer

* fix(prompt): discourage simulating confirmation of execute_sql tool

this is already handled at the UI layer

* fix(prompt): encourage quotes on identifiers with caps

* feat: move extractIdentifiers to own file, add tests

* chore: shorten tests

* feat: extract ColumnDef column names in extractIdentifiers

* refactor: sqlIdentifierQuotingScorer with more thorough checks

* refactor: consolidate into `sql-identifier-quoting.ts`

* feat: support mocking schemas, eval test case with case sensitive schema

* fix: test cases that don't match default mock schema

* chore: format

* feat(prompt): mention special characters and reserved words

* feat: optional description in metadata, test with special characters

* feat: consolidated comprehensive test case

* fix(prompt): revert conflicting instruction
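The quoting rule being scored can be sketched as follows (hypothetical helper; the real scorer parses SQL with libpg-query and checks the extracted identifiers):

```typescript
// Hypothetical: PostgreSQL folds unquoted identifiers to lowercase, so
// names with capitals or special characters must be double-quoted
// (reserved words would need a word list and are omitted here).
function quoteIdentifierIfNeeded(name: string): string {
  return /^[a-z_][a-z0-9_]*$/.test(name)
    ? name
    : `"${name.replace(/"/g, '""')}"`;
}
```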
2026-01-27 09:28:56 -05:00
Matt Rossman 4553f09bb5 feat(assistant): hallucination scorers + corrective measures for storage versioning answers (#41655)
* feat: "Docs Faithfulness" scorer

* feat: test case for storage object restoration

* feat: "Factuality" scorer

* feat: "Factuality" -> "Correctness"

* feat: update Storage recovery test case

* feat: finishReason in task output

* feat: encourage parallel tool calls + docs search, discourage superfluous context gathering

* prompt tuning (tool selection strategy)

* add data recovery section in chat prompt

* test: S3 versioning support correctness

* refactor: derive stepsSerialized/textOnly from shared steps data

* fix: input in correctness scorer
2026-01-15 14:22:46 +07:00
Matt Rossman 072883bcec feat: assistant evals (#41311)
* chore: bump `supabase` CLI

* chore: stricter message types in `generate-v4.ts`

* feat: tutorial eval

https://www.braintrust.dev/docs/evaluation

* feat: project ID for eval

* refactor: `generateAssistantResponse` out of `handlePost`

* refactor: generateAssistantResponse to lib/ai

* feat: factuality eval with assistant response

* chore: upgrade braintrust to v1.0.1

* chore: silence tsconfig warning

* feat: assertion scorer

* fix: aggregate tools across all steps

* refactor: strict tool names, remove need for `as const`

* refactor: generic tool name type in assertions

* feat: transfer mocks from `feature/braintrust`

* feat: LLM criteria assertion

* feat: braintrust evals workflow

* fix: BRAINTRUST_PROJECT_ID

* feat: `sql_similar` assertion

* fix: `OPENAI_API_KEY` in workflow env

* feat: split AssertionScorer into separate scorers

* feat: remove tutorial eval

* feat: 20 minute CI timeout

* feat: category in test case metadata

* feat: score with gpt-5

* refactor: dataset to own file, colocate scorers

* feat: "gpt-5.2-2025-12-11" for llm as a judge

* feat: SQL syntax scorer with `libpg-query`

* feat: `evals:setup` and `evals:run` scripts

* feat: `evals:setup` in CI

* feat: human readable scorer names

* chore: rename to "SQL Validity"

* feat: add 2 "sql_generation" test cases

* feat: update requiredTools in test cases

* chore: ignore Cursor MCP config

* feat: "Conciseness" score

* feat: "Completeness" scorer

* fix: generate-v4 test mocks

* feat: serialize "steps" for scorer inputs

* updated node mem options for typecheck

* updated runner

* remove ram update as actions handle this

* feat: read `BRAINTRUST_PROJECT_ID` from secrets

* feat: score helpfulness, remove old scorers

* feat: separate `evals:run` and `evals:upload` scripts

* feat: passthrough entire classifier result

* feat: use live `search_docs` impl, store docs result in metadata

* feat: reduce classifier options

* feat: filter workflow by `run-evals` PR label or `master` branch

* chore: cleanup stubbed mock tools

* fix: checkout actual branch with `ref:`

* fix: capture search_docs results from all content parts

* feat: simplify sql syntax score calculation

* feat: use AI SDK's UI message validator

* docs: justification for relative `extends`

* fix: cleanup leftover validatedMessages

* doc: note mock token isn't secret for snyk

* fix: mock ui message to pass validation

* feat: revert ignoring Cursor MCP config

Using `.git/info/exclude` instead until we have an opinion on this

* feat: add "tsconfig" as shared-data devDependency, revert relative path in tsconfig

* refactor: tool call parsing into function

* Update apps/studio/evals/assistant.eval.ts

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* refactor: organize mock schemas and tool factories

---------

Co-authored-by: Ali Waseem <waseema393@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-12-22 23:45:48 -05:00