mirror of
https://github.com/sqlalchemy/sqlalchemy.git
synced 2026-06-01 21:38:55 -04:00
- edits to the subqueryload ordering merge
This commit is contained in:
Vendored
+57
-39
@@ -603,62 +603,78 @@ The same idea applies to all the other arguments, such as ``foreign_keys``::
|
||||
|
||||
foo = relationship(Dest, foreign_keys=[foo_id, bar_id])
|
||||
|
||||
.. _faq_subqueryload_sort:
|
||||
.. _faq_subqueryload_limit_sort:
|
||||
|
||||
Why must I always ``ORDER BY`` a unique column when using ``subqueryload``?
|
||||
----------------------------------------------------------------------------
|
||||
Why is ``ORDER BY`` required with ``LIMIT`` (especially with ``subqueryload()``)?
|
||||
---------------------------------------------------------------------------------
|
||||
|
||||
The SQL standard prescribes that RDBMSs are free to return rows in any order it
|
||||
deems appropriate, if no ``ORDER BY`` clause is specified. This even extends to
|
||||
the case where the ``ORDER BY`` clause is not unique across all rows, i.e. rows
|
||||
with the same value in the ``ORDER BY`` column(s) will not necessarily be
|
||||
returned in a deterministic order.
|
||||
A relational database can return rows in any
|
||||
arbitrary order, when an explicit ordering is not set.
|
||||
While this ordering very often corresponds to the natural
|
||||
order of rows within a table, this is not the case for all databases and
|
||||
all queries. The consequence of this is that any query that limits rows
|
||||
using ``LIMIT`` or ``OFFSET`` should **always** specify an ``ORDER BY``.
|
||||
Otherwise, it is not deterministic which rows will actually be returned.
|
||||
|
||||
SQLAlchemy implements :func:`.orm.subqueryload` by issuing a separate query
|
||||
(where the table specified in the relationship is joined to the original query)
|
||||
and then attempting to match up the results in Python. This works fine
|
||||
normally:
|
||||
When we use a SQLAlchemy method like :meth:`.Query.first`, we are in fact
|
||||
applying a ``LIMIT`` of one to the query, so without an explicit ordering
|
||||
it is not deterministic what row we actually get back.
|
||||
While we may not notice this for simple queries on databases that usually
|
||||
returns rows in their natural
|
||||
order, it becomes much more of an issue if we also use :func:`.orm.subqueryload`
|
||||
to load related collections, and we may not be loading the collections
|
||||
as intended.
|
||||
|
||||
SQLAlchemy implements :func:`.orm.subqueryload` by issuing a separate query,
|
||||
the results of which are matched up to the results from the first query.
|
||||
We see two queries emitted like this:
|
||||
|
||||
.. sourcecode:: python+sql
|
||||
|
||||
>>> session.query(User).options(subqueryload(User.addresses)).all()
|
||||
{opensql}# the "main" query
|
||||
{opensql}-- the "main" query
|
||||
SELECT users.id AS users_id
|
||||
FROM users
|
||||
{stop}
|
||||
{opensql}# the "load" query issued by subqueryload
|
||||
SELECT addresses.id AS addresses_id, addresses.user_id AS addresses_user_id, anon_1.users_id AS anon_1_users_id
|
||||
FROM (SELECT users.id AS users_id
|
||||
FROM users) AS anon_1 JOIN addresses ON anon_1.users_id = addresses.user_id ORDER BY anon_1.users_id
|
||||
{opensql}-- the "load" query issued by subqueryload
|
||||
SELECT addresses.id AS addresses_id,
|
||||
addresses.user_id AS addresses_user_id,
|
||||
anon_1.users_id AS anon_1_users_id
|
||||
FROM (SELECT users.id AS users_id FROM users) AS anon_1
|
||||
JOIN addresses ON anon_1.users_id = addresses.user_id
|
||||
ORDER BY anon_1.users_id
|
||||
|
||||
Notice how the main query is a subquery in the load query. When an
|
||||
``OFFSET``/``LIMIT`` is involved, however, things get a bit tricky:
|
||||
The second query embeds the first query as a source of rows.
|
||||
When the inner query uses ``OFFSET`` and/or ``LIMIT`` without ordering,
|
||||
the two queries may not see the same results:
|
||||
|
||||
.. sourcecode:: python+sql
|
||||
|
||||
>>> user = session.query(User).options(subqueryload(User.addresses)).first()
|
||||
{opensql}# the "main" query
|
||||
{opensql}-- the "main" query
|
||||
SELECT users.id AS users_id
|
||||
FROM users
|
||||
LIMIT 1
|
||||
{stop}
|
||||
{opensql}# the "load" query issued by subqueryload
|
||||
SELECT addresses.id AS addresses_id, addresses.user_id AS addresses_user_id, anon_1.users_id AS anon_1_users_id
|
||||
FROM (SELECT users.id AS users_id
|
||||
FROM users
|
||||
LIMIT 1) AS anon_1 JOIN addresses ON anon_1.users_id = addresses.user_id ORDER BY anon_1.users_id
|
||||
{opensql}-- the "load" query issued by subqueryload
|
||||
SELECT addresses.id AS addresses_id,
|
||||
addresses.user_id AS addresses_user_id,
|
||||
anon_1.users_id AS anon_1_users_id
|
||||
FROM (SELECT users.id AS users_id FROM users LIMIT 1) AS anon_1
|
||||
JOIN addresses ON anon_1.users_id = addresses.user_id
|
||||
ORDER BY anon_1.users_id
|
||||
|
||||
The main query is still a subquery in the load query, but *it may return a
|
||||
different set of results in the second query from the first* because it does
|
||||
not have a deterministic sort order! Depending on database internals, there is
|
||||
a chance we may get the following resultset for the two queries::
|
||||
Depending on database specifics, there is
|
||||
a chance we may get the a result like the following for the two queries::
|
||||
|
||||
-- query #1
|
||||
+--------+
|
||||
|users_id|
|
||||
+--------+
|
||||
| 1|
|
||||
+--------+
|
||||
|
||||
-- query #2
|
||||
+------------+-----------------+---------------+
|
||||
|addresses_id|addresses_user_id|anon_1_users_id|
|
||||
+------------+-----------------+---------------+
|
||||
@@ -667,26 +683,28 @@ a chance we may get the following resultset for the two queries::
|
||||
| 4| 2| 2|
|
||||
+------------+-----------------+---------------+
|
||||
|
||||
From SQLAlchemy's point of view, it didn't get any addresses back for user 1,
|
||||
so ``user.addresses`` is empty. Oops.
|
||||
Above, we receive two ``addresses`` rows for ``user.id`` of 2, and none for
|
||||
1. We've wasted two rows and failed to actually load the collection. This
|
||||
is an insidious error because without looking at the SQL and the results, the
|
||||
ORM will not show that there's any issue; if we access the ``addresses``
|
||||
for the ``User`` we have, it will emit a lazy load for the collection and we
|
||||
won't see that anything actually went wrong.
|
||||
|
||||
The solution to this problem is to always specify a deterministic sort order,
|
||||
so that the main query always returns the same set of rows. This generally
|
||||
means that you should :meth:`.Query.order_by` on a unique column on the table,
|
||||
usually the primary key::
|
||||
means that you should :meth:`.Query.order_by` on a unique column on the table.
|
||||
The primary key is a good choice for this::
|
||||
|
||||
session.query(User).options(subqueryload(User.addresses)).order_by(User.id).first()
|
||||
|
||||
You can get away with not doing a sort if the ``OFFSET``/``LIMIT`` does not
|
||||
throw away any rows at all, but it's much simpler to remember to always ``ORDER
|
||||
BY`` the primary key::
|
||||
|
||||
session.query(User).options(subqueryload(User.addresses)).filter(User.id == 1).first()
|
||||
|
||||
Note that :func:`.joinedload` does not suffer from the same problem because
|
||||
only one query is ever issued, so the load query cannot be different from the
|
||||
main query.
|
||||
|
||||
.. seealso::
|
||||
|
||||
:ref:`subqueryload_ordering`
|
||||
|
||||
Performance
|
||||
===========
|
||||
|
||||
|
||||
Vendored
+18
-7
@@ -120,21 +120,32 @@ query options:
|
||||
# set children to load eagerly with a second statement
|
||||
session.query(Parent).options(subqueryload('children')).all()
|
||||
|
||||
.. _subquery_loading_tips:
|
||||
.. _subqueryload_ordering:
|
||||
|
||||
Subquery Loading Tips
|
||||
^^^^^^^^^^^^^^^^^^^^^
|
||||
The Importance of Ordering
|
||||
--------------------------
|
||||
|
||||
If you have ``LIMIT`` or ``OFFSET`` in your query, you **must** ``ORDER BY`` a
|
||||
unique column, generally the primary key of your table, in order to ensure
|
||||
correct results (see :ref:`faq_subqueryload_sort`)::
|
||||
A query which makes use of :func:`.subqueryload` in conjunction with a
|
||||
limiting modifier such as :meth:`.Query.first`, :meth:`.Query.limit`,
|
||||
or :meth:`.Query.offset` should **always** include :meth:`.Query.order_by`
|
||||
against unique column(s) such as the primary key, so that the additional queries
|
||||
emitted by :func:`.subqueryload` include
|
||||
the same ordering as used by the parent query. Without it, there is a chance
|
||||
that the inner query could return the wrong rows::
|
||||
|
||||
# incorrect
|
||||
# incorrect, no ORDER BY
|
||||
session.query(User).options(subqueryload(User.addresses)).first()
|
||||
|
||||
# incorrect if User.name is not unique
|
||||
session.query(User).options(subqueryload(User.addresses)).order_by(User.name).first()
|
||||
|
||||
# correct
|
||||
session.query(User).options(subqueryload(User.addresses)).order_by(User.name, User.id).first()
|
||||
|
||||
.. seealso::
|
||||
|
||||
:ref:`faq_subqueryload_limit_sort` - detailed example
|
||||
|
||||
Loading Along Paths
|
||||
-------------------
|
||||
|
||||
|
||||
Vendored
+5
-4
@@ -1631,11 +1631,12 @@ very easy to use:
|
||||
>>> jack.addresses
|
||||
[<Address(email_address='jack@google.com')>, <Address(email_address='j25@yahoo.com')>]
|
||||
|
||||
.. warning::
|
||||
.. note::
|
||||
|
||||
If you use :func:`.subqueryload`, you should generally
|
||||
:meth:`.Query.order_by` on a unique column in order to ensure correct
|
||||
results. See :ref:`subquery_loading_tips`.
|
||||
:func:`.subqueryload` when used in conjunction with limiting such as
|
||||
:meth:`.Query.first`, :meth:`.Query.limit` or :meth:`.Query.offset`
|
||||
should also include :meth:`.Query.order_by` on a unique column in order to
|
||||
ensure correct results. See :ref:`subqueryload_ordering`.
|
||||
|
||||
Joined Load
|
||||
-------------
|
||||
|
||||
Reference in New Issue
Block a user