Commit Graph

3186 Commits

Author SHA1 Message Date
Noa c6c0ba3051 Kick clients that are backing up their message channel (#930)
* Wrap ClientConnectionSender in an Arc everywhere

* Kick clients that are backing up their message channel

* Set tcp nodelay on incoming sockets
2024-03-06 22:39:17 +00:00
Phoebe Goldman 43eafda15b Avoid Header::find_pos_by_name in eval_incr. (#935)
Prior to this commit, `to_mem_table_with_op_type` would call `Header::find_pos_by_name`
to locate the `__op_type` column, if it existed.
This was slow, as `Header::find_pos_by_name` is a linear scan in increasing order,
and the `__op_type` column is always either last or not present,
so `to_mem_table_with_op_type` would traverse every column in the `Header`.
This happened during every incremental evaluation for every query.

With this commit, we rely on the fact that `__op_type` is always the last column if present,
and check only the last column of the header.
2024-03-06 21:37:57 +00:00
John Detter 4c547f02e6 Improve generate help text (#934)
Co-authored-by: John Detter <no-reply@boppygames.gg>
2024-03-06 17:22:16 +00:00
Phoebe Goldman f657e95e63 Single-table subscription queries: plan once, run repeatedly (#928)
* Single-table subscription queries: plan once, run repeatedly

Prior to this commit, every call to `ExecutionUnit::eval_incr`
re-invoked the query planner to convert its `eval` plan
into a plan suitable for reading from a `MemTable` of updates.

With this commit, specifically for single-table select queries,
we invoke the query planner once during `ExecutionUnit::new`,
and store the resuling `eval_incr_plan` for repeated use.

A follow-up will do the same for multi-table semijoins.

* Docstrings for `SourceSet::len` and `::is_empty`

Co-authored-by: Mazdak Farrokhzad <twingoow@gmail.com>
Signed-off-by: Phoebe Goldman <phoebe@goldman-tribe.org>

* Compile select queries all the way to `QueryCode` ahead of time

---------

Signed-off-by: Phoebe Goldman <phoebe@goldman-tribe.org>
Co-authored-by: Mazdak Farrokhzad <twingoow@gmail.com>
2024-03-06 16:58:25 +00:00
Shubham Mishra 71d435d096 proto changes (#924)
* proto changes

* messageId -> requestId

* fix oneoffquery

* add metrics for subscription

* requestId for reducer

* timer for reducer

* revert oneOffQeury related changes

* request_id u32 -> bytes

* use default RequestId for rust sdk

* added json changes

* added test

* clippy
2024-03-06 16:13:52 +00:00
Noa 5bcd5e2002 Use recv_many in ws_client_actor (#913) 2024-03-05 22:00:18 +00:00
Mazdak Farrokhzad b485ca20e8 Re-land mult-col index selection for queries (#918)
* Revert "Revert "Adding an index selector that take in account multi-column indexes (and improve the `query!` macro) (#694)" (#914)"

This reverts commit 8e5ce79df4.

* drive-by: refactor impl From<IndexScan> for ColumnOp

* reactor IndexScan bounds structure

* remove temp allocation in extract_fields

* skip index scan for NotEq

* drive-by: simplify Select

* clarify unreachable!(...) for NotEq

Signed-off-by: Mazdak Farrokhzad <twingoow@gmail.com>

* address Joshua's review + refactor compiler tests

---------

Signed-off-by: Mazdak Farrokhzad <twingoow@gmail.com>
2024-03-05 18:39:58 +00:00
joshua-spacetime 313f592db9 perf(747): Single query execution for multiple subscriptions (#917)
Closes #747.

Before this change,
we would evaluate each and every query,
for each and every subscription,
on each and every row update.

If N subscriptions had a query Q in common,
it would be evaluated N different times.

With this change,
distinct queries are evaluated once,
and the results copied for each client.

So in the example above, Q would be evaluated once,
with the results transmitted to N different clients.
2024-03-05 16:52:24 +00:00
Mazdak Farrokhzad f2a75f7f33 take AV by ref in iter_by_col_eq (#925) 2024-03-05 10:46:54 +00:00
Phoebe Goldman 9afbfa7b3c Remove Tables from query plans (#912)
* First step towards removing `Table` from query plans

Rework `SourceExpr` to be a logical placeholder for a table,
rather than a table itself.

Make query eval functions take an additional argument, a set of tables.
When evaluating a `SourceExpr` to a table, they will treat the `SourceExpr`
as a reference into the set of sources, and use the referred table.

This commit modifies only the VM crate; modifications to `core` are forthcoming.
It's possible that this commit's scheme for referring to `SourceExpr`s
will need to change,
as currently it forbids duplicate `SourceExpr`s,
which I think might occur during index joins.

* `SourceBuilder` interface for assigning `SourceId`s.

* Integrate new query plan repr into `core`

* Put `DbTable` back in the AST; only `MemTable` is separate

Per review from Joshua, this commit makes `SourceExpr` into an enum
similar to the previous definition, with a `DbTable(DbTable)` variant.
Indirection to a `SourceSet` is imposed only for the `MemTable` variant.

This sould make the PR's overall diff much simpler
(assuming I haven't inadvertently made any changes
in the process of reverting the `DbTable` code paths).

Related to the above, this PR simplifies `SourceSet`.
`SourceSet` now holds a `Vec<Option<MemTable>>`, where previously
it was a transparent newtype around `[Option<Table>]`.
This change eliminates the need for unsafe unsized conversions,
removes `SourceBuilder`,
and causes `SourceSet` to be uniformly consumed by the high-level query eval operators,
where previously `SourceSet`s had to be semi-reusable
because they could contain `DbTable`s.

* Per Mazdak's review, docs!
2024-03-04 20:33:14 +00:00
Mario Montoya 7a5b66e2e3 Fix the RowCount estimation for select operator (#900)
* Fix the RowCount estimation for select operator

* Update crates/vm/src/rel_ops.rs

Co-authored-by: Mazdak Farrokhzad <twingoow@gmail.com>
Signed-off-by: Mario Montoya <mamcx@elmalabarista.com>

---------

Signed-off-by: Mario Montoya <mamcx@elmalabarista.com>
Co-authored-by: Mazdak Farrokhzad <twingoow@gmail.com>
2024-03-04 16:26:19 +00:00
John Detter 8e5ce79df4 Revert "Adding an index selector that take in account multi-column indexes (and improve the query! macro) (#694)" (#914)
This reverts commit 726080dadc.

Co-authored-by: John Detter <no-reply@boppygames.gg>
2024-03-01 20:26:24 +00:00
Mario Montoya 726080dadc Adding an index selector that take in account multi-column indexes (and improve the query! macro) (#694)
* Adding an index selector that take in account multi-column indexes (and improve the query! macro)

* move select_best_index to vm/src/expr.rs; get rid of OpCmpIdx

* refactor test best_index

* simplify best_index* tests more

* create_table_multi_index: use ColListBuilder

* move & simplify create_table_multi_index

* simplify assert_index_scan + uses

* remove create_table, twas dead code

* ColumnOpFlat: use SmallVec instead

* simplify ScanIndex

* simplify best_index_range

* Add test for sql + joins + multi-index and fix invalid ambiguos field error

* slightly refactor select_best_index

* remove nonempty dependency

* Add test that actually run the multi-column sql

* Adding benchmark for multi vs many indexes

* simplify create_table_for_test*

* Add comments

* impl new algo for select_best_index + clone less

* improve select_best_index docs

* ScanIndex -> ScanOrIndex

* simplify is_sargable + use smallvec more

* let make_index handle a single ScanOrIndex

* make index stuff more private + remove dead code

* select_best_index: return IndexColumnOp directly; nix ScanOrindex -- this removes an allocation

* do not reconstruct scan argument; avoid heap allocations

* borrow ColList in IndexArgument + avoid temp alloc in is_sargable

* optmize_select: remove Cow from fields_found

* is_sargable: reuse allocation from extract_fields

* rename is_sargable, avoid temp fields_found allocs, simplify optmize_select

* fix subscription benches

* drive-by refactor benches/subscription

* Keep a single benchmark for location

* Squashed commit of the following:

commit e54b09bab2
Author: Mario Montoya <mamcx@elmalabarista.com>
Date:   Thu Feb 29 20:19:24 2024 -0500

    Correctly show the error for AmbiguousField and simplify the code (#910)

commit 48a205a818
Author: Kim Altintop <kim@eagain.io>
Date:   Thu Feb 29 19:32:21 2024 +0100

    core: Fix host controller to not replace module if lifecyle hooks failed (#904)

    * core: Fix host controller to not replace module if lifecyle hooks failed

    Previously, `spawn_module_host` would unconditionally insert the new
    module into the controller state, and not remove it if the lifecycle
    hooks (`init_database` / `update_database`) returned an error.

    This would mean that the module code was replaced with the new one, even
    if it should be rejected because the schema was not updated or the init
    / update reducer failed.

    Fix this by starting the module, and later "committing" it to the
    controller state in two phases.

    * Add commentary about database mutations / transactionality

---------

Signed-off-by: Mario Montoya <mamcx@elmalabarista.com>
Co-authored-by: Mazdak Farrokhzad <twingoow@gmail.com>
2024-03-01 17:46:24 +00:00
Mario Montoya e54b09bab2 Correctly show the error for AmbiguousField and simplify the code (#910) 2024-03-01 01:19:24 +00:00
Kim Altintop 48a205a818 core: Fix host controller to not replace module if lifecyle hooks failed (#904)
* core: Fix host controller to not replace module if lifecyle hooks failed

Previously, `spawn_module_host` would unconditionally insert the new
module into the controller state, and not remove it if the lifecycle
hooks (`init_database` / `update_database`) returned an error.

This would mean that the module code was replaced with the new one, even
if it should be rejected because the schema was not updated or the init
/ update reducer failed.

Fix this by starting the module, and later "committing" it to the
controller state in two phases.

* Add commentary about database mutations / transactionality
2024-02-29 18:32:21 +00:00
joshua-spacetime b19de57522 perf(831): Remove row_pk computation from query (#908)
* perf(831): Remove row_pk computation from query

Closes #831.

* perf(831): remove Op, use identical TableOp.

perf(831): remove unnecesary .into_iter().map_into().collect()

perf(831): cleanup evaluators and make them lazier

* nix PrimaryKey & UniqueValue

* add TableOp::{insert, delete}

---------

Co-authored-by: Mazdak Farrokhzad <twingoow@gmail.com>
2024-02-29 17:04:04 +00:00
james gilles 00aa2eedb4 Adjust benchmarks yaml to run callgrind plus with better comments (#907) 2024-02-28 21:45:36 +00:00
Shubham Mishra f7b0e0431a acquire remove subscription lock in spawn_blocking (#903)
* acquire remove subscription lock in spawn_blocking

* nit

Signed-off-by: Shubham Mishra <shubham@clockworklabs.io>

---------

Signed-off-by: Shubham Mishra <shubham@clockworklabs.io>
2024-02-28 20:47:40 +00:00
joshua-spacetime b2b8993e8e perf(832): Remove redundant row deduplication in subscriptions (#863)
* perf(832): Remove redundant row deduplication in subscriptions

Closes #832.

The database already operates under set semantics,
so unless multiple queries return rows from the same table,
deduplication of the result set is not necessary.

* Rip out all deduplication

---------

Co-authored-by: Noa <coolreader18@gmail.com>
2024-02-28 19:51:18 +00:00
Phoebe Goldman 0e02ad425c WebSocket API ref: remove row_pk. (#29)
Re https://github.com/clockworklabs/SpacetimeDB/pull/840 .

We're removing the `row_pk` from the WebSocket API `TableRowOperation`,
as computing it has a major performance impact on the server.

This commit removes references to it from the WebSocket API reference.
2024-02-28 11:12:58 -08:00
Phoebe Goldman d7a7586f71 Remove row_pk from the client API (#34)
Re: https://github.com/clockworklabs/SpacetimeDB/pull/840

This commit updates the TypeScript SDK to no longer use the `row_pk` field
in the client API,
as that field no longer exists.

As in the other SDKs, we replace our use of the `row_pk`
with the serialized representation of the row,
as this saves our needing to have objects/dicts/hash-maps keyed on domain types.

Unlike the other SDKs, we support either the binary (protobuf) or JSON APIs.
When using the binary API, we convert the BSATN row to a string,
and use that as the `rowPk`.
When using the JSON API, we `JSON.stringify` the row itself, and use that as the `rowPk`.
The latter is ugly and not performant,
but we don't care because the JSON API is slow anyways.

This commit also removes some uses of `any` from the deserialization code,
because I wanted the compiler to double-check my work.
2024-02-28 13:56:56 -05:00
Phoebe Goldman 75dc6b59b0 Remove the rowPk from the client API (#72)
Re: https://github.com/clockworklabs/SpacetimeDB/pull/840

This commit updates the C# SDK to no longer use the `row_pk` field
of the Protobuf client API,
as that field has been removed. (Will have been removed, as of merging.)

Where a table cache was previously keyed on `byte[] rowPk`,
it is now keyed on `byte[] rowBytes`,
where `rowBytes` is the BSATN-encoded bytes of the row.
This means we effectively store two copies of each row in the client cache:
the BSATN serialized format, and the decoded domain type.

An alternate implementation would be to make the table caches be sets of domain types,
discarding the BSATN bytes.
We find this undesirable for several reasons:
- Hashing and equality-comparing `byte[]` is almost certainly more efficient
  than doing the same for domain types.
- Even if hashing and equality-comparing domain types were efficient,
  we would still have to update codegen to emit hashing and equality methods
  for all types in the module_bindings.
  This implementation requires no changes to the module_bindings.
- We already have the BSATN bytes sitting around,
  as they're necessarily part of the message we recieve from the server.
  This change does no additional serialization or deserialization.

In essence, we're trading memory for time and simplicity.
Keeping the BSATN bytes live approximately doubles the table cache's memory usage,
but simplifies the implementation greatly,
and (we suspect) speeds up table cache insertions, deletions and lookups.
2024-02-28 13:56:38 -05:00
Phoebe Goldman efde86d981 Remove row_pk from client API; hash rows not row_ids on client (#840)
* Remove `row_pk` from client API; hash rows not row_ids on client

This commit removes the `row_pk` field from `message TableRowOperation` in the client API.
This is the beginning of addressing issue #831.

This has several implications for clients:
- Because we no longer have a stable content-addressed ID for each row,
  the client is responsible for hashing rows itself.
- This means that generated types must be `Hash + Eq`.
- This means that generated types must use `sats::F32` or `sats::F64`
  rather than `f32` or `f64`,
  as the former are `Eq + Hash` while the latter are not.
  (Aside: it's stupid that Rust floats are not `Eq + Ord + Hash`,
   since IEEE-754 defines a total equality and total ordering on floats.)

Exposing `sats::F32` to users is ugly, but I can't think of a better design.

This commit also makes some minor formatting changes to Rust codegen,
since it was emitting code that rustc warned about.

Still outstanding:

- [ ] Remove row_pk from the JSON API.
- [ ] Avoid computing row_pk in the subscription engine.
- [ ] Update other SDKs.
  - [ ] C#
  - [ ] TypeScript
  - [ ] Python

* Combine similar patterns

Co-authored-by: Mazdak Farrokhzad <twingoow@gmail.com>
Signed-off-by: Phoebe Goldman <phoebe@goldman-tribe.org>

* Remove `row_pk` from JSON messages

* fmt

* SDK: ClientCache keyed on BSATN bytes, not domain types

Per Ingvar's suggestion, this commit makes the client cache
be `HashMap<Vec<u8>, T>`, where the key is a BSATN byte-buffer
containing the serialized representation of the value.

This means that generated structs and enums don't need to be `Hash + Eq`.
The key upside here is that we can revert to using `f32`/`f64`
rather than `sats::F32`/`sats::F64`.

---------

Signed-off-by: Phoebe Goldman <phoebe@goldman-tribe.org>
Co-authored-by: Mazdak Farrokhzad <twingoow@gmail.com>
2024-02-28 18:56:24 +00:00
Noa 558822e9ed Parallelize QuerySet::eval (#891)
* Parallelize QuerySet::eval

* Reduce number of arguments for make_actor
2024-02-28 06:14:30 +00:00
Tyler Cloutier 6e088a734f Added #[tracing::instrument(skip_all)] to add_subscriber 2024-02-27 18:17:08 -08:00
Phoebe Goldman d24ead52c6 VM: Wrap Header in Arc to avoid cloning (#897) 2024-02-27 18:26:38 +00:00
joshua-spacetime 1904008ffc refactor(890): Improve magic constant for index join selection (#895)
Closes #890.

This changes the magic constant from 1000 to 3000 rows,
which means if the indexed table in an index join has 3000 rows or less,
we make it the probe table instead.
2024-02-27 17:40:57 +00:00
Ingvar Stepanyan f63723a845 C#: reuse Consume helper (#704) 2024-02-27 11:25:36 +00:00
Tyler Cloutier 488d1355ac Revert "Run initial subscriptions evals on rayon (#888)" (#894)
This reverts commit 0ef145d435.
2024-02-27 02:08:52 +00:00
Noa 0ef145d435 Run initial subscriptions evals on rayon (#888) 2024-02-26 21:14:13 +00:00
Shubham Mishra 28868647cb serialise disconnect logic (#886)
* serialise disconnect logic

* Keep the ModuleHost::disconnect_client method

---------

Co-authored-by: Noa <coolreader18@gmail.com>
2024-02-26 20:37:51 +00:00
Shubham Mishra 90032f8f58 Metric for request RTT (#872)
* Metric for request RTT

* moved metric from db to worker
2024-02-26 17:58:51 +00:00
Noa 63e97a4954 num_threads = available_concurrency (#887) 2024-02-25 23:07:06 +00:00
james gilles a0a18ddd57 Rename types in benchmarks (#803)
* Rename types in benchmarks, modify benches run in action

* Fix lints
2024-02-23 20:30:59 +00:00
Phoebe Goldman 87da94a9a1 HostThreadPool: don't run Rayon threads in Tokio blocking threads (#881)
Prior to this commit, we ran our Rayon threads
within the Tokio blocking threadpool.
This was bad for several reasons, of which the most obvious was
that it ignored the Rayon `thread_name`.

Our Rayon threads still need access to the Tokio runtime
(which I'm not super jazzed about; see comment),
so this commit adds a custom `spawn_handler`
which behaves like the Rayon default spawn handler,
but also enters the Tokio runtime.

This commit also adds a comment about how we're over-allocating our available parallelism,
and in the future we should not do that.
2024-02-23 18:52:42 +00:00
Phoebe Goldman 77e1c68154 Script to run perf against SpacetimeDB (#882)
* Script to run perf against SpacetimeDB

* Non-controversial script improvements

* No args is fine

---------

Co-authored-by: Boppy <no-reply@boppygames.gg>
2024-02-22 21:53:25 +00:00
Zeke Foppa 3357d8304f Add GitHub workflow to check for PRs with merge-blocking labels (#862)
* [bfops/deny-merge-labels]: add gh workflow to check merge-blocking labels

* [bfops/deny-merge-labels]: fix?

* [bfops/deny-merge-labels]: fix?

* [bfops/deny-merge-labels]: fix?

* [bfops/deny-merge-labels]: fix?

* [bfops/deny-merge-labels]: fix?

* [bfops/deny-merge-labels]: fix?

* [bfops/deny-merge-labels]: fix?

* [bfops/deny-merge-labels]: fix?

* [bfops/deny-merge-labels]: fix?

* [bfops/deny-merge-labels]: fix?

* [bfops/deny-merge-labels]: fix?

* [bfops/deny-merge-labels]: fix?

* [bfops/deny-merge-labels]: fix?

* [bfops/deny-merge-labels]: fix?

* [bfops/deny-merge-labels]: fix?

* [bfops/deny-merge-labels]: simplify
2024-02-22 00:17:35 +00:00
Noa 5de8fe06fc Fix module import warnings (#820)
* Fix module import warnings

* Regenerate module bindings
2024-02-21 20:23:15 +00:00
Mazdak Farrokhzad 913801e22a - Make RelValue into a cow-like structure. (#869)
- Move it and friends from sats to vm.
- MemTable now stores a Vec<PV>.
- Other related improvements.

Co-authored-by: Phoebe Goldman <phoebe@goldman-tribe.org>
2024-02-21 20:07:39 +00:00
Mario Montoya 8bd13a0046 Replace DbProgram with a leaner version that compile directly to queries without environment (#860)
* Replace DbProgram with a leaner version that compile directly to queries without environment

* Remove all the dead-code caused by removing the environment

* Addressing some PR comments

* Addressing some PR comments

* Lint
2024-02-21 18:41:40 +00:00
joshua-spacetime 3a6f5fd288 chore: Remove instrumentation from DataKey deserializers (#852) 2024-02-21 17:24:36 +00:00
Tyler Cloutier c7878a7faf Adds an isolation_level argument to begin_mut_tx calls (#864) 2024-02-21 11:18:05 +00:00
Phoebe Goldman c416f68650 Set name for Rayon threads (#870)
Set a name for Rayon threads, so that they don't inherit their thread names
from the parent, i.e. `tokio-runtime-w`.

Use the same name for all Rayon threads, i.e. ignore the thread-index,
so that tools like `perf` can merge all the Rayon threads into a single backtrace.
2024-02-21 10:33:14 +00:00
Mazdak Farrokhzad 3e2ab71540 dedup some with_label_values + reduce cloning in bootstrapping (#867) 2024-02-20 13:12:21 +00:00
Mario Montoya a0f40d4415 Fix boostrating ids for relating objects like indexes, caused by cache invalidation (#777) 2024-02-20 11:01:08 +00:00
Mazdak Farrokhzad 04120e778a system_tables, mut_tx, and friends: bye bye to_product_value (#851)
* optimize build_missing_tables; collect less + use read_col

* schema_for_table: don't go through PV

* table_id_from_name: use read_col

* relational_db tests: remove uses of to_product_value

* build_sequence_state: don't use to_pv

* build_indexes: don't use to_pv

* remove dead code: CommittedStateIter

* get_all_tables_tx: use read_col

* SystemTableQuery: don't use to_pv

* StModuleRow: don't use to_pv

* get rd of more to_product_value calls

* read_col / system_tables / mut_tx: cleanup + less work2
2024-02-19 19:21:13 +00:00
Tyler Cloutier 4a92618db8 Added extra check to create_table transactionality test (#866) 2024-02-19 02:43:16 +00:00
Mazdak Farrokhzad 5ab4342187 Refactor some ReadColumn stuff + relational_db tests (#847)
* refactor with read_col method + simplify InvalidFieldError creation

* simplify + dedup relational_db tests

* relational_db: dedup tests & nix some to_product_value call

* dedup relational_db tests more
2024-02-19 01:57:36 +00:00
Noa d1c7055665 Fix consistency issue (#861) 2024-02-16 20:06:07 +00:00
Noa e6cef1b627 Fix bench errors and include in CI (#855) 2024-02-16 19:15:13 +00:00