turn off use_insertmanyvalues for SQL Server

we will keep trying to find workarounds; however, this
patch is the "turn it off" patch

Due to a critical bug identified in SQL Server, the SQLAlchemy
"insertmanyvalues" feature, which allows fast INSERT of many rows while
also supporting RETURNING, unfortunately needs to be disabled for SQL
Server. SQL Server is apparently unable to guarantee that the order of
rows inserted via a table-valued constructor matches the order in which
they are sent back by OUTPUT inserted. We are trying to see if Microsoft
can confirm this undocumented behavior; for now there is no known
workaround other than avoiding table-valued expressions in conjunction
with OUTPUT inserted.
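For background, here is a minimal sketch (table and column names are made up for illustration; the statement is compiled statically, no database needed) of how SQLAlchemy renders RETURNING as OUTPUT inserted on SQL Server. The single-row form shown here is safe; it is the batched multi-row form produced by "insertmanyvalues" whose OUTPUT row ordering SQL Server does not guarantee:

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, insert
from sqlalchemy.dialects import mssql

# hypothetical table, for illustration only
metadata = MetaData()
t = Table(
    "t",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("x", String(10)),
)

# a single-row INSERT with RETURNING compiles to OUTPUT inserted on
# SQL Server; only one row comes back, so ordering is not an issue
stmt = insert(t).returning(t.c.id)
print(stmt.compile(dialect=mssql.dialect()))

# the "insertmanyvalues" rewrite batches many parameter sets into one
# multi-row VALUES clause; it is the ordering of the OUTPUT rows for
# that batched form that SQL Server does not guarantee
```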

Fixes: #9603
Change-Id: I4b932fb8774390bbdf4e870a1f6cfe9a78c4b105
Mike Bayer
2023-04-05 12:59:13 -04:00
parent e79ab08165
commit 7fb7365622
9 changed files with 165 additions and 50 deletions
+14
@@ -0,0 +1,14 @@
.. change::
    :tags: bug, mssql
    :tickets: 9603

    Due to a critical bug identified in SQL Server, the SQLAlchemy
    "insertmanyvalues" feature, which allows fast INSERT of many rows while
    also supporting RETURNING, unfortunately needs to be disabled for SQL
    Server. SQL Server is apparently unable to guarantee that the order of
    rows inserted via a table-valued constructor matches the order in which
    they are sent back by OUTPUT inserted. We are trying to see if Microsoft
    can confirm this undocumented behavior; for now there is no known
    workaround other than avoiding table-valued expressions in conjunction
    with OUTPUT inserted.
+16 -7
@@ -859,7 +859,8 @@ Optimized ORM bulk insert now implemented for all backends other than MySQL
The dramatic performance improvement introduced in the 1.4 series and described
at :ref:`change_5263` has now been generalized to all included backends that
support RETURNING, which is all backends other than MySQL: SQLite, MariaDB,
PostgreSQL (all drivers), Oracle, and SQL Server. While the original feature
PostgreSQL (all drivers), and Oracle; SQL Server has support but unfortunately
had to be turned off due to an issue with SQL Server [#]_. While the original feature
was most critical for the psycopg2 driver which otherwise had major performance
issues when using ``cursor.executemany()``, the change is also critical for
other PostgreSQL drivers such as asyncpg, as when using RETURNING,
@@ -894,16 +895,16 @@ With most databases now offering RETURNING (with the conspicuous exception of
MySQL, given that MariaDB supports it), the new change generalizes the psycopg2
"fast execution helper" approach to all dialects that support RETURNING, which
now includes SQLite and MariaDB, and for which no other approach for
"executemany plus RETURNING" is possible, which includes SQLite, MariaDB, all
PG drivers, and SQL Server. The cx_Oracle and oracledb drivers used for Oracle
"executemany plus RETURNING" is possible, which includes SQLite, MariaDB, and all
PG drivers. The cx_Oracle and oracledb drivers used for Oracle
support RETURNING with executemany natively, and this has also been implemented
to provide equivalent performance improvements. With SQLite and MariaDB now
offering RETURNING support, ORM use of ``cursor.lastrowid`` is nearly a thing
of the past, with only MySQL still relying upon it.
For INSERT statements that don't use RETURNING, traditional executemany()
behavior is used for most backends, with the current exceptions of psycopg2
and mssql+pyodbc, which both have very slow executemany() performance overall
behavior is used for most backends, with the current exception of psycopg2,
which has very slow executemany() performance overall
and is still improved by the "insertmanyvalues" approach.
Benchmarks
@@ -974,10 +975,18 @@ sqlite+pysqlite2 (memory) 6.204843 3.554856
postgresql+asyncpg (network) 88.292285 4.561492
postgresql+psycopg (network) N/A (psycopg3) 4.861368
oracle+cx_Oracle (network) 92.603953 4.809520
mssql+pyodbc (network) 158.396667 4.825139
mariadb+mysqldb (network) 71.705197 4.075377
============================ ==================== ====================
.. mssql+pyodbc (network) .. 158.396667 .. 4.825139
.. note::
.. [#] The feature is disabled for SQL Server as of SQLAlchemy 2.0.9 due
to incompatibilities in how table-valued expressions are handled by
SQL Server. See https://github.com/sqlalchemy/sqlalchemy/issues/9603
Two additional drivers have no change in performance; the psycopg2 drivers,
for which fast executemany was already implemented in SQLAlchemy 1.4,
and MySQL, which continues to not offer RETURNING support:
+6 -9
@@ -1854,7 +1854,9 @@ as follows:
* SQLite - supported for SQLite versions 3.35 and above
* PostgreSQL - all supported Postgresql versions (9 and above)
* SQL Server - all supported SQL Server versions
* SQL Server - **disabled by default as of SQLAlchemy 2.0.9** - the SQL syntax
used has been shown to not be safe for RETURNING
(see https://github.com/sqlalchemy/sqlalchemy/issues/9603)
* MariaDB - supported for MariaDB versions 10.5 and above
* MySQL - no support, no RETURNING feature is present
* Oracle - supports RETURNING with executemany using native cx_Oracle / OracleDB
@@ -1887,11 +1889,7 @@ The feature can also be disabled from being used implicitly for a particular
)
The reason one might want to disable RETURNING for a specific table is to
work around backend-specific limitations. For example, there is a known
limitation of SQL Server that the ``OUTPUT inserted.<colname>`` feature
may not work correctly for a table that has INSERT triggers established;
such a table may need to include ``implicit_returning=False`` (see
:ref:`mssql_triggers`).
work around backend-specific limitations.
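A minimal sketch of that per-table switch, using the documented :paramref:`_schema.Table.implicit_returning` parameter (the table name and columns here are hypothetical):

```python
from sqlalchemy import Column, Integer, MetaData, String, Table

# implicit_returning=False tells SQLAlchemy not to use RETURNING
# (OUTPUT inserted on SQL Server) to fetch server-generated values
# for this table
metadata = MetaData()
mytable = Table(
    "mytable",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("data", String(50)),
    implicit_returning=False,
)
```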
.. _engine_insertmanyvalues_page_size:
@@ -1929,9 +1927,8 @@ varies by dialect and server version; the largest size is 32700 (chosen as a
healthy distance away from PostgreSQL's limit of 32767 and SQLite's modern
limit of 32766, while leaving room for additional parameters in the statement
as well as for DBAPI quirkiness). Older versions of SQLite (prior to 3.32.0)
will set this value to 999; SQL Server sets it to 2099. MariaDB has no
established limit however 32700 remains as a limiting factor for SQL message
size.
will set this value to 999. MariaDB has no established limit however 32700
remains as a limiting factor for SQL message size.
The value of the "batch size" can be set :class:`_engine.Engine`-wide via the
:paramref:`_sa.create_engine.insertmanyvalues_page_size` parameter.
+15 -13
@@ -251,18 +251,11 @@ The process for fetching this value has several variants:
INSERT INTO t (x) OUTPUT inserted.id VALUES (?)
As of SQLAlchemy 2.0, the :ref:`engine_insertmanyvalues` feature is also
used by default to optimize many-row INSERT statements; for SQL Server
the feature takes place for both RETURNING and non-RETURNING
INSERT statements.
* The value of :paramref:`_sa.create_engine.insertmanyvalues_page_size`
defaults to 1000, however the ultimate page size for a particular INSERT
statement may be limited further, based on an observed limit of
2100 bound parameters for a single statement in SQL Server.
The page size may also be modified on a per-engine
or per-statement basis; see the section
:ref:`engine_insertmanyvalues_page_size` for details.
.. note:: SQLAlchemy 2.0 introduced the :ref:`engine_insertmanyvalues`
feature for SQL Server, which is used by default to optimize many-row
INSERT statements; however as of SQLAlchemy 2.0.9 this feature had
to be turned off for SQL Server as the database does not support
deterministic RETURNING of INSERT rows for a multi-row INSERT statement.
* When RETURNING is not available or has been disabled via
``implicit_returning=False``, either the ``scope_identity()`` function or
@@ -3017,7 +3010,8 @@ class MSDialect(default.DefaultDialect):
# may be changed at server inspection time for older SQL server versions
supports_multivalues_insert = True
use_insertmanyvalues = True
# disabled due to #9603
use_insertmanyvalues = False
# note pyodbc will set this to False if fast_executemany is set,
# as of SQLAlchemy 2.0.9
@@ -3085,6 +3079,14 @@ class MSDialect(default.DefaultDialect):
super().__init__(**opts)
if self.use_insertmanyvalues:
raise exc.ArgumentError(
"The use_insertmanyvalues feature on SQL Server is currently "
"not safe to use, as returned result rows may be returned in "
"random order. Ensure use_insertmanyvalues is left at its "
"default of False (this setting changed in SQLAlchemy 2.0.9)"
)
self._json_serializer = json_serializer
self._json_deserializer = json_deserializer
+7 -13
@@ -290,25 +290,19 @@ Pyodbc have been resolved as of SQLAlchemy 2.0.5. See the notes at
Fast Executemany Mode
---------------------
.. note:: SQLAlchemy 2.0 now includes an equivalent "fast executemany"
handler for INSERT statements that is more robust than the PyODBC feature
(but is not quite as performant particularly for very large datasets);
the feature is called :ref:`insertmanyvalues <engine_insertmanyvalues>`
and is enabled for all INSERT statements by default.
SQLAlchemy's feature integrates with the PyODBC ``setinputsizes()`` method
which allows for more accurate specification of datatypes, and additionally
uses a dynamically sized, batched approach that scales to any number of
columns and/or rows.
The SQL Server ``fast_executemany`` parameter may be used at the same time
as ``insertmanyvalues`` is enabled; however, the parameter will not be used
for INSERT statements that include RETURNING.
.. note:: SQLAlchemy 2.0 introduced the :ref:`engine_insertmanyvalues`
feature for SQL Server, which is used by default to optimize many-row
INSERT statements; however as of SQLAlchemy 2.0.9 this feature had
to be turned off for SQL Server as the database does not support
deterministic RETURNING of INSERT rows for a multi-row INSERT statement.
.. versionchanged:: 2.0.9 - ``fast_executemany`` executions will be used
for INSERT statements that don't include RETURNING, when
``fast_executemany`` is set. Previously, ``use_insertmanyvalues`` would
cause ``fast_executemany`` to not be used in most cases.
``use_insertmanyvalues`` is disabled for SQL Server overall as of 2.0.9.
The PyODBC driver includes support for a "fast executemany" mode of execution
which greatly reduces round trips for a DBAPI ``executemany()`` call when using
Microsoft ODBC drivers, for **limited size batches that fit in memory**. The
+10 -1
@@ -3,6 +3,7 @@ import re
from unittest.mock import Mock
from sqlalchemy import Column
from sqlalchemy import create_engine
from sqlalchemy import event
from sqlalchemy import exc
from sqlalchemy import inspect
@@ -423,7 +424,7 @@ class FastExecutemanyTest(fixtures.TestBase):
@testing.variation("add_event", [True, False])
@testing.variation("setinputsizes", [True, False])
@testing.variation("fastexecutemany", [True, False])
@testing.variation("insertmanyvalues", [True, False])
@testing.variation("insertmanyvalues", [False]) # disabled due to #9603
@testing.variation("broken_types", [True, False])
def test_insert_typing(
self,
@@ -628,6 +629,14 @@ class MiscTest(fixtures.TestBase):
__only_on__ = "mssql"
__backend__ = True
def test_no_insertmanyvalues(self):
with expect_raises_message(
exc.ArgumentError,
"The use_insertmanyvalues feature on SQL Server is "
"currently not safe to use",
):
create_engine("mssql+pyodbc://", use_insertmanyvalues=True)
@testing.variation("enable_comments", [True, False])
def test_comments_enabled_disabled(
self, testing_engine, metadata, enable_comments
+69
@@ -19,6 +19,7 @@ from sqlalchemy import Table
from sqlalchemy import testing
from sqlalchemy.dialects.mssql import base as mssql
from sqlalchemy.dialects.mssql import pyodbc as mssql_pyodbc
from sqlalchemy.orm import Session
from sqlalchemy.testing import AssertsCompiledSQL
from sqlalchemy.testing import config
from sqlalchemy.testing import engines
@@ -159,6 +160,74 @@ class QueryTest(testing.AssertsExecutionResults, fixtures.TestBase):
__only_on__ = "mssql"
__backend__ = True
def test_broken_table_values_test(self, decl_base, connection):
"""test #9603.
this uses the ORM to ensure the ORM is not using any kind of
insertmany that causes the problem.
"""
class Datum(decl_base):
__tablename__ = "datum"
id = Column(Integer, autoincrement=False, primary_key=True)
data = Column(String(10))
class Result(decl_base):
__tablename__ = "result"
id = Column(Integer, primary_key=True)
thing = Column(Integer)
lft_datum_id = Column(Integer, ForeignKey(Datum.id))
decl_base.metadata.create_all(connection)
with Session(connection) as sess:
size = 15
datum_ids = list(range(1, size + 1))
sess.add_all([Datum(id=id_, data=f"d{id_}") for id_ in datum_ids])
sess.flush()
result_data = [
Result(thing=num, lft_datum_id=datum_ids[num % size])
for num in range(size * size)
]
sess.add_all(result_data)
sess.flush()
# this is what we expected we put in
the_data_in_order_should_be = [
(num + 1, num, datum_ids[num % size])
for num in range(size * size)
]
# and yes, that's what went in
eq_(
sess.execute(
select(
Result.id, Result.thing, Result.lft_datum_id
).order_by(Result.id)
).all(),
the_data_in_order_should_be,
)
# however, if insertmanyvalues is turned on, OUTPUT inserted
# did not give us the rows in the order we sent, so ids were
# mis-applied. even if we sort the original records by the
# ids that were given
eq_(
[
(r.id, r.thing, r.lft_datum_id)
for r in sorted(result_data, key=lambda r: r.id)
],
the_data_in_order_should_be,
)
def test_fetchid_trigger(self, metadata, connection):
# TODO: investigate test hang on mssql when connection fixture is used
"""
+2 -1
@@ -1335,7 +1335,8 @@ class StringRoundTripTest(fixtures.TestBase):
id_="ia",
)
@testing.combinations(
("insertmany", True),
# disabled due to #9603
# ("insertmany", True),
("insertsingle", False),
argnames="insertmany",
id_="ia",
+26 -6
@@ -3217,14 +3217,34 @@ class EagerDefaultsSettingTest(
Conditional(
expect_returning,
[
CompiledSQL(
"INSERT INTO test (id, bar) VALUES (:id, :bar) "
"RETURNING test.foo",
Conditional(
connection.dialect.insert_executemany_returning,
[
{"id": 1, "bar": 6},
{"id": 2, "bar": 6},
CompiledSQL(
"INSERT INTO test (id, bar) "
"VALUES (:id, :bar) "
"RETURNING test.foo",
[
{"id": 1, "bar": 6},
{"id": 2, "bar": 6},
],
)
],
)
[
CompiledSQL(
"INSERT INTO test (id, bar) "
"VALUES (:id, :bar) "
"RETURNING test.foo",
{"id": 1, "bar": 6},
),
CompiledSQL(
"INSERT INTO test (id, bar) "
"VALUES (:id, :bar) "
"RETURNING test.foo",
{"id": 2, "bar": 6},
),
],
),
],
[
CompiledSQL(