Commit Graph

470 Commits

Author SHA1 Message Date
Mark Shannon 600f4dbd54 GH-145668: Add FOR_ITER specialization for virtual iterators. Specialize GET_ITER. (GH-147967)
* Add FOR_ITER_VIRTUAL to specialize FOR_ITER for virtual iterators
* Add GET_ITER_SELF to specialize GET_ITER for iterators (including generators)
* Add GET_ITER_VIRTUAL to specialize GET_ITER for iterables as virtual iterators
* Add new (internal) _tp_iteritem function slot to PyTypeObject
* Put limited RESUME at start of genexpr for free-threading. Fix up exception handling in genexpr
2026-04-16 15:22:22 +01:00
Pieter Eendebak 1f6a09fb36 gh-100239: Specialize more binary operations using BINARY_OP_EXTEND (GH-128956) 2026-04-16 09:22:41 +01:00
Serhiy Storchaka 473d2a35ce gh-147944: Increase range of bytes_per_sep (GH-147946)
Accepted range for the bytes_per_sep argument of bytes.hex(),
bytearray.hex(), memoryview.hex(), and binascii.b2a_hex()
is now increased, so passing sys.maxsize and -sys.maxsize is now
valid.
2026-04-01 08:33:30 +00:00
Stan Ulbrych 62a6e898e0 gh-147856: Allow the 'count' argument of bytes.replace() to be a keyword (#147943) 2026-03-31 19:27:52 +02:00
Serhiy Storchaka 2520617361 gh-142037: Fix a refleak introduced in GH-142081 (GH-144256) 2026-01-26 21:15:21 +00:00
Serhiy Storchaka 012c498035 gh-142037: Improve error messages for printf-style formatting (GH-142081)
This affects string formatting as well as bytes and bytearray formatting.

* For errors in the format string, always include the position of the
  start of the format unit.
* For errors related to the formatted arguments, always include the number
  or the name of the formatted argument.
* Suggest more probable causes of errors in the format string (stray %,
  unsupported format, unexpected character).
* Provide more information when the number of arguments does not match
  the number of format units.
* Raise more specific errors when access of arguments by name is mixed with
  sequential access and when * is used with a mapping.
* Add tests for some uncovered cases.
2026-01-24 11:13:50 +00:00
Gregory P. Smith a966d94e76 gh-144157: Optimize bytes.translate() by deferring change detection (GH-144158)
Optimize bytes.translate() by deferring change detection

Move the equality check out of the hot loop to allow better compiler
optimization. Instead of checking each byte during translation, perform
a single memcmp at the end to determine if the input can be returned
unchanged.

This allows compilers to unroll and pipeline the loops, resulting in ~2x
throughput improvement for medium-to-large inputs (tested on an AMD zen2).
No change observed on small inputs.

It will also be faster for bytes subclasses as those do not need change
detection.
2026-01-22 09:21:07 -08:00
Cody Maloney 732224e113 gh-139871: Add bytearray.take_bytes([n]) to efficiently extract bytes (GH-140128)
Update `bytearray` to contain a `bytes` and provide a zero-copy path to
"extract" the `bytes`. This allows making several code paths more efficient.

This does not move any codepaths to make use of this new API. The documentation
changes include common code patterns which can be made more efficient with
this API.

---

When just changing `bytearray` to contain `bytes` I ran pyperformance on a
`--with-lto --enable-optimizations --with-static-libpython` build and don't see
any major speedups or slowdowns with this; all seems to be in the noise of
my machine (Generally changes under 5% or benchmarks that don't touch
bytes/bytearray).


Co-authored-by: Victor Stinner <vstinner@python.org>
Co-authored-by: Maurycy Pawłowski-Wieroński <5383+maurycy@users.noreply.github.com>
2025-11-13 13:19:44 +00:00
Stan Ulbrych d6c89a2df2 gh-140939: Fix memory leak in _PyBytes_FormatEx error path (#140957) 2025-11-06 11:20:57 +05:30
Sergey Miryanov 32c264982e gh-140061: Use _PyObject_IsUniquelyReferenced() to check if objects are uniquely referenced (gh-140062)
The previous `Py_REFCNT(x) == 1` checks can have data races in the free
threaded build. `_PyObject_IsUniquelyReferenced(x)` is a more conservative
check that is safe in the free threaded build and is identical to
`Py_REFCNT(x) == 1` in the default GIL-enabled build.
2025-10-15 09:48:21 -04:00
Victor Stinner c9a79a02a8 gh-139156: Use PyBytesWriter in _PyUnicode_EncodeCharmap() (#139251)
Replace PyBytes_FromStringAndSize() and _PyBytes_Resize() with the
PyBytesWriter API.

Add _PyBytesWriter_GetSize() and _PyBytesWriter_GetData() static
inline functions.
2025-09-24 16:15:34 +02:00
Victor Stinner dd45179fa0 gh-129813, PEP 782: Remove the private _PyBytesWriter API (#139264)
It is now replaced with the new public PyBytesWriter API (PEP 782).
2025-09-23 15:29:55 +00:00
Victor Stinner 9f7bbafffe gh-129813, PEP 782: Optimize byteswriter_resize() (#139101)
There is no need to copy the small_buffer in PyBytesWriter_Create().
2025-09-18 09:52:16 +00:00
Victor Stinner c72ffe71f1 gh-129813, PEP 782: Set invalid bytes in PyBytesWriter (#139054)
Initialize the buffer with 0xFF byte pattern when creating a writer
object, but also when resizing/growing the writer.
2025-09-17 17:58:32 +02:00
Serhiy Storchaka a1cf6e92b6 gh-71679: Share the repr implementation between bytes and bytearray (GH-138181)
This allows to use the smart quotes algorithm in the bytearray's repr.
2025-09-17 11:10:29 +03:00
Victor Stinner 4e00e2504f gh-129813, PEP 782: Add PyBytesWriter.overallocate (#138941)
Disable overallocation in _PyBytes_FormatEx() at the last write.
2025-09-16 00:15:32 +02:00
Victor Stinner 7c6efc3a4f gh-129813, PEP 782: Init small_buffer in PyBytesWriter_Create() (#138924)
Fill small_buffer with 0xFF byte pattern to detect the usage of
uninitialized bytes in debug build.
2025-09-15 16:23:11 +02:00
Victor Stinner 2d724939d7 gh-129813, PEP 782: Use PyBytesWriter in _PyBytes_FormatEx() (#138839)
Replace the private _PyBytesWriter API with the new public
PyBytesWriter API.
2025-09-15 12:23:36 +02:00
Victor Stinner 430900d15b gh-129813, PEP 782: Use PyBytesWriter in _PyBytes_FromList() (#138837)
Use the new public PyBytesWriter API in:

* _PyBytes_FromHex()
* _PyBytes_FromBuffer()
* _PyBytes_FromList()
* _PyBytes_FromTuple()
* _PyBytes_FromIterator()

Add _PyBytesWriter_ResizeAndUpdatePointer() and
_PyBytesWriter_GetAllocated() helper functions.
2025-09-13 19:23:57 +02:00
Victor Stinner 06b7891f12 gh-129813, PEP 782: Use Py_GetConstant(Py_CONSTANT_EMPTY_BYTES) (#138830)
Replace PyBytes_FromStringAndSize(NULL, 0) with
Py_GetConstant(Py_CONSTANT_EMPTY_BYTES). Py_GetConstant() cannot
fail.
2025-09-13 18:30:25 +02:00
Victor Stinner 8fc6ae68b0 gh-129813, PEP 782: Use PyBytesWriter in _PyBytes_DecodeEscape2() (#138838)
Replace the private _PyBytesWriter API with the new public
PyBytesWriter API.
2025-09-13 18:28:08 +02:00
Victor Stinner c3fca5d478 gh-129813, PEP 782: Add PyBytesWriter_Format() (#138824)
Modify PyBytes_FromFormatV() to use the public PyBytesWriter API
rather than the _PyBytesWriter private API.
2025-09-12 14:21:57 +02:00
Victor Stinner adb414044f gh-129813, PEP 782: Add PyBytesWriter C API (#138822) 2025-09-12 13:41:59 +02:00
Adam Turner 918e3ba6c0 GH-137623: Use an AC decorator for docstring line length enforcement (#137690) 2025-08-18 18:29:00 +01:00
Semyon Moroz 968f6e523a gh-130821: Add type information to error messages for invalid return type (GH-130835) 2025-08-14 11:04:41 +03:00
Serhiy Storchaka 9f69a58623 gh-133767: Fix use-after-free in the unicode-escape decoder with an error handler (GH-129648)
If the error handler is used, a new bytes object is created to set as
the object attribute of UnicodeDecodeError, and that bytes object then
replaces the original data. A pointer to the decoded data will became invalid
after destroying that temporary bytes object. So we need other way to return
the first invalid escape from _PyUnicode_DecodeUnicodeEscapeInternal().

_PyBytes_DecodeEscape() does not have such issue, because it does not
use the error handlers registry, but it should be changed for compatibility
with _PyUnicode_DecodeUnicodeEscapeInternal().
2025-05-12 20:42:23 +03:00
Ageev Maxim 7c3692fe27 gh-130928: Fix error message during bytes formatting for the 'i' flag (#130967) 2025-03-24 22:07:03 +03:00
Bénédikt Tran a1205ef524 gh-111178: fix UBSan failures for PyBytesObject (#131603) 2025-03-24 11:02:09 +01:00
Victor Stinner 20c5f969dd gh-131238: Remove more includes from pycore_interp.h (#131480) 2025-03-19 23:01:32 +01:00
Victor Stinner 061da44bac gh-111178: Change Argument Clinic signature for @classmethod (#131157)
Use "PyObject*", instead of "PyTypeObject*", for `@classmethod`
functions to fix an undefined behavior.
2025-03-12 17:42:07 +01:00
Daniel Pope e0637cebe5 gh-129349: Accept bytes in bytes.fromhex()/bytearray.fromhex() (#129844)
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Co-authored-by: Victor Stinner <vstinner@python.org>
2025-03-12 11:40:11 +01:00
Victor Stinner 9d759b63d8 gh-111178: Change Argument Clinic signature for METH_O (#130682)
Use "PyObject*" for METH_O functions to fix an undefined behavior.
2025-03-11 16:33:36 +01:00
Umar Butler 8d8b854824 gh-128016: Improved invalid escape sequence warning message (#128020) 2025-01-15 18:00:54 +01:00
Bénédikt Tran 295776c7f3 gh-111178: fix UBSan failures in Objects/bytesobject.c (GH-128237)
* remove redundant casts for `bytesobject`
* fix UBSan failures for `striterobject`
2025-01-10 11:48:06 +01:00
Abhijeet 0706bab1c0 gh-128133: use relaxed atomics for hash of bytes (#128412) 2025-01-03 13:50:56 +05:30
Srinivas Reddy Thatiparthy (తాటిపర్తి శ్రీనివాస్ రెడ్డి) db9bea0386 gh-127740: For odd-length input to bytes.fromhex(...) change the error message to ValueError: fromhex() arg must be of even length (#127756) 2024-12-11 08:35:17 +01:00
Pablo Galindo Salgado 30aeb00d36 gh-126076: Account for relocated objects in tracemalloc (#126077) 2024-11-19 10:35:17 +00:00
Mark Shannon fa40922597 GH-126547: Pre-assign version numbers for a few common classes (GH-126551) 2024-11-08 16:44:44 +00:00
Mark Shannon c9014374c5 GH-125174: Make immortal objects more robust, following design from PEP 683 (GH-125251) 2024-10-10 18:19:08 +01:00
Victor Stinner 6a39e96ab8 gh-115754: Use Py_GetConstant(Py_CONSTANT_EMPTY_BYTES) (#125195)
Replace PyBytes_FromString("") and PyBytes_FromStringAndSize("", 0)
with Py_GetConstant(Py_CONSTANT_EMPTY_BYTES).
2024-10-09 17:12:11 +02:00
Victor Stinner 1d3700f943 gh-111178: Fix function signatures in bytesobject.c (#124806) 2024-10-02 13:35:51 +02:00
Victor Stinner f1a0d96f41 gh-123091: Use _Py_IsImmortalLoose() (#123511)
Use _Py_IsImmortalLoose() in bytesobject.c, typeobject.c
and ceval.c.
2024-09-02 14:25:19 +02:00
Victor Stinner d8e69b2c1b gh-122854: Add Py_HashBuffer() function (#122855) 2024-08-30 15:42:27 +00:00
Victor Stinner 3d60dfbe17 gh-121645: Add PyBytes_Join() function (#121646)
* Replace _PyBytes_Join() with PyBytes_Join().
* Keep _PyBytes_Join() as an alias to PyBytes_Join().
2024-08-30 12:57:33 +00:00
Victor Stinner fda6bd842a Replace PyObject_Del with PyObject_Free (#122453)
PyObject_Del() is just a alias to PyObject_Free() kept for backward
compatibility. Use directly PyObject_Free() instead.
2024-08-01 14:12:33 +02:00
Serhiy Storchaka b313cc68d5 gh-117557: Improve error messages when a string, bytes or bytearray of length 1 are expected (GH-117631) 2024-05-28 12:01:37 +03:00
Geoffrey Thomas ef172521a9 Remove almost all unpaired backticks in docstrings (#119231)
As reported in #117847 and #115366, an unpaired backtick in a docstring
tends to confuse e.g. Sphinx running on subclasses of standard library
objects, and the typographic style of using a backtick as an opening
quote is no longer in favor. Convert almost all uses of the form

    The variable `foo' should do xyz

to

    The variable 'foo' should do xyz

and also fix up miscellaneous other unpaired backticks (extraneous /
missing characters).

No functional change is intended here other than in human-readable
docstrings.
2024-05-22 12:35:18 -04:00
Erlend E. Aasland deb921f851 gh-117431: Adapt bytes and bytearray .find() and friends to Argument Clinic (#117502)
This change gives a significant speedup, as the METH_FASTCALL calling
convention is now used. The following bytes and bytearray methods are adapted:

- count()
- find()
- index()
- rfind()
- rindex()

Co-authored-by: Inada Naoki <songofacandy@gmail.com>
2024-04-12 07:40:55 +00:00
Sam Gross 1a6594f661 gh-117439: Make refleak checking thread-safe without the GIL (#117469)
This keeps track of the per-thread total reference count operations in
PyThreadState in the free-threaded builds. The count is merged into the
interpreter's total when the thread exits.
2024-04-08 12:11:36 -04:00
Erlend E. Aasland 595bb496b0 gh-117431: Adapt bytes and bytearray .startswith() and .endswith() to Argument Clinic (#117495)
This change gives a significant speedup, as the METH_FASTCALL calling
convention is now used.
2024-04-03 13:11:14 +02:00