sqlalchemy/doc/build/core/pooling.rst
Mike Bayer 2db5256143 - modernize the mysql connection timeout docs
Change-Id: Icb0474509539c1eb7536544749f2a48b4972078a
(cherry picked from commit 4ce46fb0a085c1cc739e21881cc25567e663f8dc)
2017-08-22 16:52:10 -04:00


.. _pooling_toplevel:

Connection Pooling
==================

.. module:: sqlalchemy.pool

A connection pool is a standard technique used to maintain
long running connections in memory for efficient re-use,
as well as to provide
management for the total number of connections an application
might use simultaneously.

Particularly for
server-side web applications, a connection pool is the standard way to
maintain a "pool" of active database connections in memory which are
reused across requests.

SQLAlchemy includes several connection pool implementations
which integrate with the :class:`.Engine`. They can also be used
directly for applications that want to add pooling to an otherwise
plain DBAPI approach.

Connection Pool Configuration
-----------------------------

The :class:`~.engine.Engine` returned by the
:func:`~sqlalchemy.create_engine` function in most cases has a :class:`.QueuePool`
integrated, pre-configured with reasonable pooling defaults. If
you're reading this section only to learn how to enable pooling - congratulations!
You're already done.

The most common :class:`.QueuePool` tuning parameters can be passed
directly to :func:`~sqlalchemy.create_engine` as keyword arguments:
``pool_size``, ``max_overflow``, ``pool_recycle`` and
``pool_timeout``. For example::

    engine = create_engine('postgresql://me@localhost/mydb',
                           pool_size=20, max_overflow=0)

In the case of SQLite, the :class:`.SingletonThreadPool` or
:class:`.NullPool` are selected by the dialect to provide
greater compatibility with SQLite's threading and locking
model, as well as to provide a reasonable default behavior
to SQLite "memory" databases, which maintain their entire
dataset within the scope of a single connection.

All SQLAlchemy pool implementations have in common
that none of them "pre create" connections - all implementations wait
until first use before creating a connection. At that point, if
no additional concurrent checkout requests for more connections
are made, no additional connections are created. This is why it's perfectly
fine for :func:`.create_engine` to default to using a :class:`.QueuePool`
of size five without regard to whether or not the application really needs five connections
queued up - the pool would only grow to that size if the application
actually used five connections concurrently, in which case the usage of a
small pool is an entirely appropriate default behavior.
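
The lazy-creation behavior described above can be illustrated with a toy pool.
The sketch below is not SQLAlchemy's implementation: ``LazyPool`` and its
``created`` counter are hypothetical names, and the standard library
``sqlite3`` module stands in for the DBAPI. It only shows that sequential
checkouts reuse a single connection, and that the pool grows when checkouts
overlap.

```python
import queue
import sqlite3


class LazyPool:
    """Toy pool: connections are created on checkout, never up front."""

    def __init__(self, creator, pool_size=5):
        self._creator = creator
        self._idle = queue.Queue(maxsize=pool_size)
        self.created = 0  # how many real connections were ever opened

    def checkout(self):
        try:
            # Reuse an idle connection if one is available.
            return self._idle.get_nowait()
        except queue.Empty:
            # Nothing idle: only now is a new connection opened.
            self.created += 1
            return self._creator()

    def checkin(self, conn):
        try:
            self._idle.put_nowait(conn)
        except queue.Full:
            # More connections than pool_size: discard the extra one.
            conn.close()


pool = LazyPool(lambda: sqlite3.connect(":memory:"), pool_size=5)
created_before_use = pool.created

# Ten sequential checkouts, each returned before the next one starts.
for _ in range(10):
    conn = pool.checkout()
    conn.execute("SELECT 1")
    pool.checkin(conn)
created_sequential = pool.created

# Three overlapping checkouts force the pool to grow to three connections.
held = [pool.checkout() for _ in range(3)]
for conn in held:
    pool.checkin(conn)
created_concurrent = pool.created
```

In this sketch ``created`` is 0 before first use, 1 after the ten sequential
checkouts, and 3 once three connections are held at the same time.
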

.. _pool_switching:

Switching Pool Implementations
------------------------------

The usual way to use a different kind of pool with :func:`.create_engine`
is to use the ``poolclass`` argument. This argument accepts a class
imported from the ``sqlalchemy.pool`` module, and handles the details
of building the pool for you. Common options include specifying
:class:`.QueuePool` with SQLite::

    from sqlalchemy.pool import QueuePool
    engine = create_engine('sqlite:///file.db', poolclass=QueuePool)

Disabling pooling using :class:`.NullPool`::

    from sqlalchemy.pool import NullPool
    engine = create_engine(
              'postgresql+psycopg2://scott:tiger@localhost/test',
              poolclass=NullPool)

Using a Custom Connection Function
----------------------------------

All :class:`.Pool` classes accept an argument ``creator`` which is
a callable that creates a new connection. :func:`.create_engine`
accepts this function to pass onto the pool via an argument of
the same name::

    from sqlalchemy import create_engine
    import psycopg2

    def getconn():
        c = psycopg2.connect(user='ed', host='127.0.0.1', dbname='test')
        # do things with 'c' to set up
        return c

    engine = create_engine('postgresql+psycopg2://', creator=getconn)

For most "initialize on connection" routines, it's more convenient
to use the :class:`.PoolEvents` event hooks, so that the usual URL argument to
:func:`.create_engine` is still usable. ``creator`` is there as
a last resort for when a DBAPI has some form of ``connect``
that is not at all supported by SQLAlchemy.

Constructing a Pool
-------------------

To use a :class:`.Pool` by itself, the ``creator`` function is
the only argument that's required and is passed first, followed
by any additional options::

    import sqlalchemy.pool as pool
    import psycopg2

    def getconn():
        c = psycopg2.connect(user='ed', host='127.0.0.1', dbname='test')
        return c

    mypool = pool.QueuePool(getconn, max_overflow=10, pool_size=5)

DBAPI connections can then be procured from the pool using the :meth:`.Pool.connect`
function. The return value of this method is a DBAPI connection that's contained
within a transparent proxy::

    # get a connection
    conn = mypool.connect()

    # use it
    cursor = conn.cursor()
    cursor.execute("select foo")

The purpose of the transparent proxy is to intercept the ``close()`` call,
such that instead of the DBAPI connection being closed, it is returned to the
pool::

    # "close" the connection.  Returns
    # it to the pool.
    conn.close()

The proxy also returns its contained DBAPI connection to the pool
when it is garbage collected,
although Python does not guarantee that this occurs immediately (it
typically does with CPython).

The ``close()`` step also performs the important step of calling the
``rollback()`` method of the DBAPI connection. This is so that any
existing transaction on the connection is removed, not only ensuring
that no existing state remains on next usage, but also so that table
and row locks are released as well as that any isolated data snapshots
are removed. This behavior can be disabled using the ``reset_on_return``
option of :class:`.Pool`.
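
The proxy-and-reset behavior described above can be sketched with the standard
library alone. This is an illustration under assumptions, not SQLAlchemy's
proxy: ``PooledConnection`` and ``TinyPool`` are hypothetical names, and
``sqlite3`` stands in for the DBAPI. The proxy forwards everything to the real
connection except ``close()``, which rolls back and returns the connection to
the pool.

```python
import sqlite3


class PooledConnection:
    """Toy proxy: delegates to a DBAPI connection but intercepts close()."""

    def __init__(self, pool, dbapi_conn):
        self._pool = pool
        self._conn = dbapi_conn

    def __getattr__(self, name):
        # Anything not defined on the proxy goes to the real connection.
        return getattr(self._conn, name)

    def close(self):
        # Reset state, then hand the real connection back instead of closing it.
        self._conn.rollback()
        self._pool.idle.append(self._conn)
        self._conn = None


class TinyPool:
    def __init__(self, creator):
        self.creator = creator
        self.idle = []

    def connect(self):
        raw = self.idle.pop() if self.idle else self.creator()
        return PooledConnection(self, raw)


pool = TinyPool(lambda: sqlite3.connect(":memory:"))

conn = pool.connect()
conn.execute("CREATE TABLE t (x INTEGER)")
conn.commit()
conn.execute("INSERT INTO t VALUES (1)")  # opens a transaction, never committed
raw_first = conn._conn
conn.close()  # rollback, then return to the pool

conn2 = pool.connect()
same_raw_connection = conn2._conn is raw_first
rows_after_reset = conn2.execute("SELECT COUNT(*) FROM t").fetchone()[0]
conn2.close()
```

The second checkout receives the same underlying connection, and the
uncommitted ``INSERT`` is gone because of the rollback performed on return.
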

A particular pre-created :class:`.Pool` can be shared with one or more
engines by passing it to the ``pool`` argument of :func:`.create_engine`::

    e = create_engine('postgresql://', pool=mypool)

Pool Events
-----------

Connection pools support an event interface that allows hooks to execute
upon first connect, upon each new connection, and upon checkout and
checkin of connections. See :class:`.PoolEvents` for details.

.. _pool_disconnects:

Dealing with Disconnects
------------------------

The connection pool has the ability to refresh individual connections as well as
its entire set of connections, setting the previously pooled connections as
"invalid". A common use case is to allow the connection pool to gracefully recover
when the database server has been restarted, and all previously established connections
are no longer functional. There are two approaches to this.

.. _pool_disconnects_pessimistic:

Disconnect Handling - Pessimistic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The pessimistic approach refers to emitting a test statement on the SQL
connection at the start of each connection pool checkout, to test
that the database connection is still viable. Typically, this
is a simple statement like "SELECT 1", but may also make use of some
DBAPI-specific method to test the connection for liveness.

The approach adds a small bit of overhead to the connection checkout process,
but is otherwise the simplest and most reliable approach to completely
eliminating database errors due to stale pooled connections. The calling
application does not need to be concerned about organizing operations
to be able to recover from stale connections checked out from the pool.

It is critical to note that the pre-ping approach **does not accommodate
connections dropped in the middle of transactions or other SQL operations**.
If the database becomes unavailable while a transaction is in progress, the
transaction will be lost and the database error will be raised. While
the :class:`.Connection` object will detect a "disconnect" situation and
recycle the connection as well as invalidate the rest of the connection pool
when this condition occurs,
the individual operation where the exception was raised will be lost, and it's
up to the application to either abandon
the operation, or retry the whole transaction again.

Pessimistic testing of connections upon checkout is achievable by
using the :paramref:`.Pool.pre_ping` argument, available from :func:`.create_engine`
via the :paramref:`.create_engine.pool_pre_ping` argument::

    engine = create_engine("mysql+pymysql://user:pw@host/db", pool_pre_ping=True)

The "pre ping" feature will normally emit SQL equivalent to "SELECT 1" each time a
connection is checked out from the pool; if an error is raised that is detected
as a "disconnect" situation, the connection will be immediately recycled, and
all other pooled connections older than the current time are invalidated, so
that the next time they are checked out, they will also be recycled before use.

If the database is still not available when "pre ping" runs, then the initial
connect will fail and the error for failure to connect will be propagated
normally. In the uncommon situation that the database is available for
connections, but is not able to respond to a "ping", the "pre_ping" will try up
to three times before giving up, propagating the database error last received.
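
The checkout-time test can be sketched as follows. This is a simplified
illustration, not the library's code path: ``PrePingPool`` is a hypothetical
class, ``sqlite3`` stands in for the DBAPI, any ``sqlite3.Error`` is treated
as a disconnect, and the sketch neither invalidates the other pooled
connections nor retries the ping.

```python
import sqlite3


class PrePingPool:
    """Toy pool that tests each connection with SELECT 1 at checkout."""

    def __init__(self, creator):
        self._creator = creator
        self._idle = []
        self.replaced = 0  # stale connections discarded by the ping

    def _is_alive(self, conn):
        try:
            conn.execute("SELECT 1").fetchone()
            return True
        except sqlite3.Error:
            # A real dialect would inspect the error to decide whether it
            # indicates a disconnect; this sketch treats any error as one.
            return False

    def checkout(self):
        conn = self._idle.pop() if self._idle else self._creator()
        if not self._is_alive(conn):
            # Stale connection: drop it and open a fresh one.
            self.replaced += 1
            conn = self._creator()
        return conn

    def checkin(self, conn):
        self._idle.append(conn)


pool = PrePingPool(lambda: sqlite3.connect(":memory:"))

conn = pool.checkout()
pool.checkin(conn)

# Simulate a server restart: the pooled connection dies while idle.
conn.close()

fresh = pool.checkout()
ping_result = fresh.execute("SELECT 1").fetchone()[0]
```

The caller of ``checkout()`` never sees the dead connection; the failed ping
is absorbed inside the pool and a working connection is handed out instead.
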

.. note::

    The "SELECT 1" emitted by "pre-ping" is invoked within the scope
    of the connection pool / dialect, using a very short codepath for minimal
    Python latency. As such, this statement is **not logged in the SQL
    echo output**, and will not show up in SQLAlchemy's engine logging.

.. versionadded:: 1.2 Added "pre-ping" capability to the :class:`.Pool`
   class.

Custom / Legacy Pessimistic Ping
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Before :paramref:`.create_engine.pool_pre_ping` was added, the "pre-ping"
approach historically has been performed manually using
the :meth:`.ConnectionEvents.engine_connect` engine event.
The most common recipe for this is below, for reference
purposes in case an application is already using such a recipe, or special
behaviors are needed::

    from sqlalchemy import exc
    from sqlalchemy import event
    from sqlalchemy import select

    some_engine = create_engine(...)

    @event.listens_for(some_engine, "engine_connect")
    def ping_connection(connection, branch):
        if branch:
            # "branch" refers to a sub-connection of a connection,
            # we don't want to bother pinging on these.
            return

        # turn off "close with result". This flag is only used with
        # "connectionless" execution, otherwise will be False in any case
        save_should_close_with_result = connection.should_close_with_result
        connection.should_close_with_result = False

        try:
            # run a SELECT 1. use a core select() so that
            # the SELECT of a scalar value without a table is
            # appropriately formatted for the backend
            connection.scalar(select([1]))
        except exc.DBAPIError as err:
            # catch SQLAlchemy's DBAPIError, which is a wrapper
            # for the DBAPI's exception. It includes a .connection_invalidated
            # attribute which specifies if this connection is a "disconnect"
            # condition, which is based on inspection of the original exception
            # by the dialect in use.
            if err.connection_invalidated:
                # run the same SELECT again - the connection will re-validate
                # itself and establish a new connection. The disconnect detection
                # here also causes the whole connection pool to be invalidated
                # so that all stale connections are discarded.
                connection.scalar(select([1]))
            else:
                raise
        finally:
            # restore "close with result"
            connection.should_close_with_result = save_should_close_with_result

The above recipe has the advantage that we are making use of SQLAlchemy's
facilities for detecting those DBAPI exceptions that are known to indicate
a "disconnect" situation, as well as the :class:`.Engine` object's ability
to correctly invalidate the current connection pool when this condition
occurs and allowing the current :class:`.Connection` to re-validate onto
a new DBAPI connection.

Disconnect Handling - Optimistic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When pessimistic handling is not employed, as well as when the database is
shut down and/or restarted in the middle of a connection's period of use within
a transaction, the other approach to dealing with stale / closed connections is
to let SQLAlchemy handle disconnects as they occur, at which point all
connections in the pool are invalidated, meaning they are assumed to be
stale and will be refreshed upon next checkout. This behavior assumes the
:class:`.Pool` is used in conjunction with an :class:`.Engine`.
The :class:`.Engine` has logic which can detect
disconnection events and refresh the pool automatically.

When the :class:`.Connection` attempts to use a DBAPI connection, and an
exception is raised that corresponds to a "disconnect" event, the connection
is invalidated. The :class:`.Connection` then calls the :meth:`.Pool.recreate`
method, effectively invalidating all connections not currently checked out so
that they are replaced with new ones upon next checkout. This flow is
illustrated by the code example below::

    from sqlalchemy import create_engine, exc
    e = create_engine(...)
    c = e.connect()

    try:
        # suppose the database has been restarted.
        c.execute("SELECT * FROM table")
        c.close()
    except exc.DBAPIError as err:
        # an exception is raised, Connection is invalidated.
        if err.connection_invalidated:
            print("Connection was invalidated!")

    # after the invalidate event, a new connection
    # starts with a new Pool
    c = e.connect()
    c.execute("SELECT * FROM table")

The above example illustrates that no special intervention is needed to
refresh the pool, which continues normally after a disconnection event is
detected. However, one database exception is raised for each connection
that was in use when the database became unavailable.
In a typical web application using an ORM Session, the above condition would
correspond to a single request failing with a 500 error, then the web application
continuing normally beyond that. Hence the approach is "optimistic" in that frequent
database restarts are not anticipated.

.. _pool_setting_recycle:

Setting Pool Recycle
~~~~~~~~~~~~~~~~~~~~

An additional setting that can augment the "optimistic" approach is to set the
pool recycle parameter. This parameter prevents the pool from using a particular
connection that has passed a certain age, and is appropriate for database backends
such as MySQL that automatically close connections that have been idle for a particular
period of time::

    from sqlalchemy import create_engine
    e = create_engine("mysql://scott:tiger@localhost/test", pool_recycle=3600)

Above, any DBAPI connection that has been open for more than one hour will be invalidated and replaced,
upon next checkout. Note that the invalidation **only** occurs during checkout - not on
any connections that are held in a checked out state. ``pool_recycle`` is a function
of the :class:`.Pool` itself, independent of whether or not an :class:`.Engine` is in use.
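
The age check at checkout can be sketched with a toy pool. The names below
(``RecyclingPool``, the ``clock`` argument and the ``now`` list used as a fake
clock) are hypothetical and exist only to keep the example deterministic;
``sqlite3`` stands in for the DBAPI, and this is not how the library stores
its connection records.

```python
import sqlite3
import time


class RecyclingPool:
    """Toy pool that replaces connections older than ``recycle`` seconds."""

    def __init__(self, creator, recycle, clock=time.monotonic):
        self._creator = creator
        self._recycle = recycle
        self._clock = clock
        self._idle = []  # list of (connection, created_at) pairs
        self.recycled = 0

    def checkout(self):
        if self._idle:
            conn, created_at = self._idle.pop()
            if self._clock() - created_at > self._recycle:
                # Too old: close it and fall through to create a new one.
                conn.close()
                self.recycled += 1
            else:
                return conn, created_at
        return self._creator(), self._clock()

    def checkin(self, conn, created_at):
        # Age is only checked at checkout, so checkin never recycles.
        self._idle.append((conn, created_at))


# A fake clock keeps the example deterministic (hypothetical helper).
now = [0.0]
pool = RecyclingPool(lambda: sqlite3.connect(":memory:"), recycle=3600,
                     clock=lambda: now[0])

conn, created_at = pool.checkout()
pool.checkin(conn, created_at)

now[0] = 1800.0  # 30 minutes later: still young enough
conn_30min, created_at = pool.checkout()
pool.checkin(conn_30min, created_at)

now[0] = 3601.0  # just over an hour: recycled at checkout
conn_61min, created_at = pool.checkout()
```

After 30 minutes the same connection is handed out again; after 3601 seconds
the old connection is closed and a new one is created in its place.
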

.. _pool_connection_invalidation:

More on Invalidation
^^^^^^^^^^^^^^^^^^^^

The :class:`.Pool` provides "connection invalidation" services which allow
both explicit invalidation of a connection as well as automatic invalidation
in response to conditions that are determined to render a connection unusable.

"Invalidation" means that a particular DBAPI connection is removed from the
pool and discarded. The ``.close()`` method is called on this connection
if it is not certain that the connection is already closed; if
this method fails, the exception is logged but the operation still proceeds.

When using a :class:`.Engine`, the :meth:`.Connection.invalidate` method is
the usual entrypoint to explicit invalidation. Other conditions by which
a DBAPI connection might be invalidated include:

* A DBAPI exception such as :class:`.OperationalError`, raised when a
  method like ``connection.execute()`` is called, is detected as indicating
  a so-called "disconnect" condition. As the Python DBAPI provides no
  standard system for determining the nature of an exception, all SQLAlchemy
  dialects include a system called ``is_disconnect()`` which will examine
  the contents of an exception object, including the string message and
  any potential error codes included with it, in order to determine if this
  exception indicates that the connection is no longer usable. If this is the
  case, the :meth:`._ConnectionFairy.invalidate` method is called and the
  DBAPI connection is then discarded.

* When the connection is returned to the pool, and
  calling the ``connection.rollback()`` or ``connection.commit()`` methods,
  as dictated by the pool's "reset on return" behavior, throws an exception.
  A final attempt at calling ``.close()`` on the connection will be made,
  and it is then discarded.

* When a listener implementing :meth:`.PoolEvents.checkout` raises the
  :class:`~sqlalchemy.exc.DisconnectionError` exception, indicating that the connection
  won't be usable and a new connection attempt needs to be made.

All invalidations which occur will invoke the :meth:`.PoolEvents.invalidate`
event.
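
The kind of inspection a dialect performs on an exception can be sketched as
below. This is a hypothetical helper, not any dialect's actual
``is_disconnect()`` method: ``FakeOperationalError`` stands in for a DBAPI
exception, and the codes and phrases are illustrative samples only (2006 and
2013 are the MySQL client error codes for "server has gone away" and "lost
connection during query").

```python
class FakeOperationalError(Exception):
    """Stand-in for a DBAPI OperationalError carrying (code, message) args."""


# Illustrative values only; a real dialect maintains its own list.
DISCONNECT_CODES = {2006, 2013}
DISCONNECT_PHRASES = ("server has gone away", "connection already closed")


def looks_like_disconnect(exc):
    """Return True if the exception looks like a dropped connection."""
    code = exc.args[0] if exc.args and isinstance(exc.args[0], int) else None
    if code in DISCONNECT_CODES:
        return True
    # Fall back to matching known phrases in the error message.
    message = str(exc).lower()
    return any(phrase in message for phrase in DISCONNECT_PHRASES)


gone_away = FakeOperationalError(2006, "MySQL server has gone away")
syntax_error = FakeOperationalError(1064, "You have an error in your SQL syntax")
closed = FakeOperationalError("connection already closed")
```

Only the first and third exceptions would lead to invalidation in this
sketch; the syntax error is an ordinary failure on a connection that is still
usable.
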

Using Connection Pools with Multiprocessing
-------------------------------------------

It's critical that when using a connection pool, and by extension when
using an :class:`.Engine` created via :func:`.create_engine`, that
the pooled connections **are not shared with a forked process**. TCP connections
are represented as file descriptors, which usually work across process
boundaries, meaning this will cause concurrent access to the file descriptor
on behalf of two or more entirely independent Python interpreter states.

There are two approaches to dealing with this.

The first is to either create a new :class:`.Engine` within the child
process, or, on an existing :class:`.Engine`, call :meth:`.Engine.dispose`
before the child process uses any connections. This will remove all existing
connections from the pool so that it makes all new ones. Below is
a simple version using ``multiprocessing.Process``, but this idea
should be adapted to the style of forking in use::

    from multiprocessing import Process

    from sqlalchemy import create_engine

    eng = create_engine("...")

    def run_in_process():
        eng.dispose()

        with eng.connect() as conn:
            conn.execute("...")

    p = Process(target=run_in_process)

The next approach is to instrument the :class:`.Pool` itself with events
so that connections are automatically invalidated in the subprocess.
This is a little more magical but probably more foolproof::

    from sqlalchemy import create_engine
    from sqlalchemy import event
    from sqlalchemy import exc
    import os

    eng = create_engine("...")

    @event.listens_for(eng, "connect")
    def connect(dbapi_connection, connection_record):
        connection_record.info['pid'] = os.getpid()

    @event.listens_for(eng, "checkout")
    def checkout(dbapi_connection, connection_record, connection_proxy):
        pid = os.getpid()
        if connection_record.info['pid'] != pid:
            connection_record.connection = connection_proxy.connection = None
            raise exc.DisconnectionError(
                    "Connection record belongs to pid %s, "
                    "attempting to check out in pid %s" %
                    (connection_record.info['pid'], pid)
            )

Above, we use an approach similar to that described in
:ref:`pool_disconnects_pessimistic` to treat a DBAPI connection that
originated in a different parent process as an "invalid" connection,
coercing the pool to recycle the connection record to make a new connection.

API Documentation - Available Pool Implementations
--------------------------------------------------

.. autoclass:: sqlalchemy.pool.Pool

   .. automethod:: __init__
   .. automethod:: connect
   .. automethod:: dispose
   .. automethod:: recreate
   .. automethod:: unique_connection

.. autoclass:: sqlalchemy.pool.QueuePool

   .. automethod:: __init__
   .. automethod:: connect
   .. automethod:: unique_connection

.. autoclass:: SingletonThreadPool

   .. automethod:: __init__

.. autoclass:: AssertionPool

.. autoclass:: NullPool

.. autoclass:: StaticPool

.. autoclass:: _ConnectionFairy
    :members:

    .. autoattribute:: _connection_record

.. autoclass:: _ConnectionRecord
    :members:

Pooling Plain DB-API Connections
--------------------------------

Any :pep:`249` DB-API module can be "proxied" through the connection
pool transparently. Usage of the DB-API is exactly as before, except
the ``connect()`` method will consult the pool. Below we illustrate
this with ``psycopg2``::

    import sqlalchemy.pool as pool
    import psycopg2 as psycopg

    psycopg = pool.manage(psycopg)

    # then connect normally
    connection = psycopg.connect(database='test', user='scott',
                                 password='tiger')

This produces a :class:`_DBProxy` object which supports the same
``connect()`` function as the original DB-API module. Upon
connection, a connection proxy object is returned, which delegates its
calls to a real DB-API connection object. This connection object is
stored persistently within a connection pool (an instance of
:class:`.Pool`) that corresponds to the exact connection arguments sent
to the ``connect()`` function.

The connection proxy supports all of the methods on the original
connection object, most of which are proxied via ``__getattr__()``.
The ``close()`` method will return the connection to the pool, and the
``cursor()`` method will return a proxied cursor object. Both the
connection proxy and the cursor proxy will also return the underlying
connection to the pool after they have both been garbage collected,
which is detected via weakref callbacks (``__del__`` is not used).

Additionally, when connections are returned to the pool, a
``rollback()`` is issued on the connection unconditionally. This is
to release any locks still held by the connection that may have
resulted from normal activity.

By default, the ``connect()`` method will return the same connection
that is already checked out in the current thread. This allows a
particular connection to be used in a given thread without needing to
pass it around between functions. To disable this behavior, specify
``use_threadlocal=False`` to the ``manage()`` function.
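
The thread-local behavior can be pictured with ``threading.local``. The
sketch below is an assumption-laden illustration rather than the actual
mechanism: ``ThreadLocalPool`` and ``FakeConnection`` are hypothetical names,
and the sketch does not track checkin, so a thread simply keeps its first
connection.

```python
import threading


class FakeConnection:
    """Stand-in for a DBAPI connection (hypothetical, for illustration)."""


class ThreadLocalPool:
    """Toy pool: a thread that already holds a connection gets it again."""

    def __init__(self, creator):
        self._creator = creator
        self._local = threading.local()

    def connect(self):
        conn = getattr(self._local, "conn", None)
        if conn is None:
            # First request from this thread: create and remember it.
            conn = self._creator()
            self._local.conn = conn
        return conn


pool = ThreadLocalPool(FakeConnection)

# Two calls in the same thread return the same object.
first = pool.connect()
second = pool.connect()

# A different thread gets its own connection.
seen_in_worker = []
worker = threading.Thread(target=lambda: seen_in_worker.append(pool.connect()))
worker.start()
worker.join()
```

Functions running in the same thread can each call ``connect()`` and end up
sharing one connection, while the worker thread receives a separate one.
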

.. autofunction:: sqlalchemy.pool.manage

.. autofunction:: sqlalchemy.pool.clear_managers