Commit Graph

1764 Commits

Author SHA1 Message Date
Victor Stinner 7e5fcae09b gh-142217: Remove internal _Py_Identifier functions (#142219)
Remove internal functions:

* _PyDict_ContainsId()
* _PyDict_DelItemId()
* _PyDict_GetItemIdWithError()
* _PyDict_SetItemId()
* _PyEval_GetBuiltinId()
* _PyObject_CallMethodIdNoArgs()
* _PyObject_CallMethodIdObjArgs()
* _PyObject_CallMethodIdOneArg()
* _PyObject_VectorcallMethodId()
* _PyUnicode_EqualToASCIIId()

These functions were not exported and so no usable outside CPython.
2025-12-03 14:33:32 +01:00
Victor Stinner 600f3feb23 gh-141070: Add PyUnstable_Object_Dump() function (#141072)
* Promote _PyObject_Dump() as a public function.
* Keep _PyObject_Dump() alias to PyUnstable_Object_Dump()
  for backward compatibility.
* Replace _PyObject_Dump() with PyUnstable_Object_Dump().

Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
Co-authored-by: Kumar Aditya <kumaraditya@python.org>
Co-authored-by: Petr Viktorin <encukou@gmail.com>
2025-11-18 16:13:13 +00:00
Stan Ulbrych a3ce2f77f0 gh-55531: Implement normalize_encoding in C (#136643)
Closes gh-55531
2025-10-30 15:31:47 +01:00
Victor Stinner efc37ba49e gh-139353: Add Objects/unicode_writer.c file (#139911)
Move the public PyUnicodeWriter API and the private _PyUnicodeWriter
API to a new Objects/unicode_writer.c file.

Rename a few helper functions to share them between unicodeobject.c
and unicode_writer.c, such as resize_compact() or unicode_result().
2025-10-30 14:36:15 +01:00
Stan Ulbrych dbe3950a76 gh-129117: Add unicodedata.isxidstart() function (#140269)
Expose `_PyUnicode_IsXidContinue/Start` in `unicodedata`:
add isxidstart() and isxidcontinue() functions.

Co-authored-by: Victor Stinner <vstinner@python.org>
2025-10-30 10:18:12 +00:00
Shamil 7d70a147f5 Remove dead stores to 'size' in UTF-8 decoder (unicodeobject.c) (#140637) 2025-10-27 11:55:57 +03:00
Victor Stinner 166cdaa6fb gh-111489: Remove _PyTuple_FromArray() alias (#139973)
Replace _PyTuple_FromArray() with PyTuple_FromArray().
Remove pycore_tuple.h includes.
2025-10-11 22:58:14 +02:00
Victor Stinner 4c119714d5 gh-139353: Add Objects/unicode_format.c file (#139491)
* Move PyUnicode_Format() implementation from unicodeobject.c
  to unicode_format.c.
* Replace unicode_modifiable() with _PyUnicode_IsModifiable()
* Add empty lines to have two empty lines between functions.
2025-10-10 12:52:59 +02:00
Victor Stinner 3d3f126e86 gh-139353: Rename formatter_unicode.c to unicode_formatter.c (#139723)
* Move Python/formatter_unicode.c to Objects/unicode_formatter.c.
* Move Objects/stringlib/localeutil.h content into
  unicode_formatter.c. Remove localeutil.h.
* Move _PyUnicode_InsertThousandsGrouping() to unicode_formatter.c
  and mark the function as static.
* Rename unicode_fill() to _PyUnicode_Fill() and export it in
  pycore_unicodeobject.h.
* Move MAX_UNICODE to pycore_unicodeobject.h as _Py_MAX_UNICODE.
2025-10-08 14:56:00 +02:00
Victor Stinner e9c538dd54 gh-139156: Optimize _PyUnicode_EncodeCharmap() (#139306)
Specialize _PyUnicode_EncodeCharmap() for EncodingMapType which is
used by Python codecs such as iso8859_15.
2025-09-25 11:42:16 +02:00
Victor Stinner 8d83b7df3f gh-139156: Optimize the UTF-7 encoder (#139253)
Remove base64SetO and base64WhiteSpace parameters.
2025-09-24 17:57:29 +02:00
Victor Stinner c7b11b7546 gh-139156: Use PyBytesWriter in PyUnicode_EncodeCodePage() (#139259)
Replace PyBytes_FromStringAndSize() and _PyBytes_Resize() with the
PyBytesWriter API.
2025-09-24 16:39:40 +02:00
Victor Stinner c9a79a02a8 gh-139156: Use PyBytesWriter in _PyUnicode_EncodeCharmap() (#139251)
Replace PyBytes_FromStringAndSize() and _PyBytes_Resize() with the
PyBytesWriter API.

Add _PyBytesWriter_GetSize() and _PyBytesWriter_GetData() static
inline functions.
2025-09-24 16:15:34 +02:00
Victor Stinner 8cfd7b4ecf gh-129813, PEP 782: Use PyBytesWriter in utf8_encoder() (#138874)
Replace the private _PyBytesWriter API with the new public
PyBytesWriter API in utf8_encoder() and unicode_encode_ucs1().
2025-09-23 11:47:09 +02:00
Victor Stinner 49e83e31bd gh-139156: Use PyBytesWriter in PyUnicode_AsRawUnicodeEscapeString() (#139250)
Replace PyBytes_FromStringAndSize() and _PyBytes_Resize() with the
PyBytesWriter API.
2025-09-22 23:46:19 +02:00
Victor Stinner c497694f77 gh-139156: Use PyBytesWriter in UTF-16 encoder (#139233)
Replace PyBytes_FromStringAndSize() and _PyBytes_Resize() with the
PyBytesWriter API.
2025-09-22 23:36:05 +02:00
Victor Stinner e578a9e6a5 gh-139156: Use PyBytesWriter in PyUnicode_AsUnicodeEscapeString() (#139249)
Replace PyBytes_FromStringAndSize() and _PyBytes_Resize() with the
PyBytesWriter API.
2025-09-22 23:22:27 +02:00
Victor Stinner c863349f98 gh-139156: Use PyBytesWriter in the UTF-7 encoder (#139248)
Replace PyBytes_FromStringAndSize() and _PyBytes_Resize() with the
PyBytesWriter API.
2025-09-22 22:49:25 +02:00
Victor Stinner 92ba2c92c4 gh-139156: Use PyBytesWriter in UTF-32 encoder (#139157)
Replace PyBytes_FromStringAndSize() and _PyBytes_Resize() with the
PyBytesWriter API.
2025-09-22 20:05:35 +00:00
Victor Stinner 06b7891f12 gh-129813, PEP 782: Use Py_GetConstant(Py_CONSTANT_EMPTY_BYTES) (#138830)
Replace PyBytes_FromStringAndSize(NULL, 0) with
Py_GetConstant(Py_CONSTANT_EMPTY_BYTES). Py_GetConstant() cannot
fail.
2025-09-13 18:30:25 +02:00
Petr Viktorin 0c74fc8af0 gh-137210: Add a struct, slot & function for checking an extension's ABI (GH-137212)
Co-authored-by: Steve Dower <steve.dower@microsoft.com>
2025-09-05 16:23:18 +02:00
Adam Turner 918e3ba6c0 GH-137623: Use an AC decorator for docstring line length enforcement (#137690) 2025-08-18 18:29:00 +01:00
Peter Bierma 082f370cdd gh-137514: Add a free-threading wrapper for mutexes (GH-137515)
Add `FT_MUTEX_LOCK`/`FT_MUTEX_UNLOCK`, which call `PyMutex_Lock` and `PyMutex_Unlock` on the free-threaded build, and no-op otherwise.
2025-08-07 11:24:50 -04:00
Victor Stinner ce1b747ff6 gh-58124: Avoid CP_UTF8 in UnicodeDecodeError (#137415)
Fix name of the Python encoding in Unicode errors of the code page
codec: use "cp65000" and "cp65001" instead of "CP_UTF7" and "CP_UTF8"
which are not valid Python code names.
2025-08-06 14:35:27 +02:00
Dave Peck c5e77af131 gh-132661: Disallow Template/str concatenation after PEP 750 spec update (#135996)
Co-authored-by: sobolevn <mail@sobolevn.me>
Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
2025-07-21 08:44:26 +02:00
Petr Viktorin e413e26719 gh-134891: Add PyUnstable_Unicode_GET_CACHED_HASH (GH-134892) 2025-06-06 15:51:00 +02:00
Victor Stinner f49a07b531 gh-133968: Add PyUnicodeWriter_WriteASCII() function (#133973)
Replace most PyUnicodeWriter_WriteUTF8() calls with
PyUnicodeWriter_WriteASCII().

Unrelated change to please the linter: remove an unused
import in test_ctypes.

Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
2025-05-29 14:54:30 +00:00
Victor Stinner fe9f6e829a gh-133968: Add fast path to PyUnicodeWriter_WriteStr() (#133969)
Don't call PyObject_Str() if the input type is str.
2025-05-13 15:31:41 +02:00
Serhiy Storchaka 9f69a58623 gh-133767: Fix use-after-free in the unicode-escape decoder with an error handler (GH-129648)
If the error handler is used, a new bytes object is created to set as
the object attribute of UnicodeDecodeError, and that bytes object then
replaces the original data. A pointer to the decoded data will became invalid
after destroying that temporary bytes object. So we need other way to return
the first invalid escape from _PyUnicode_DecodeUnicodeEscapeInternal().

_PyBytes_DecodeEscape() does not have such issue, because it does not
use the error handlers registry, but it should be changed for compatibility
with _PyUnicode_DecodeUnicodeEscapeInternal().
2025-05-12 20:42:23 +03:00
Stan Ulbrych 4fd1095280 gh-133610: Remove PyUnicode_AsDecoded/Encoded functions (#133612) 2025-05-09 17:31:24 +02:00
Petr Viktorin 987e45e632 gh-128972: Add _Py_ALIGN_AS and revert PyASCIIObject memory layout. (GH-133085)
Add `_Py_ALIGN_AS` as per C API WG vote: https://github.com/capi-workgroup/decisions/issues/61
This patch only adds it to free-threaded builds; the `#ifdef Py_GIL_DISABLED`
can be removed in the future.

Use this to revert `PyASCIIObject` memory layout for non-free-threaded builds.
The long-term plan is to deprecate the entire struct; until that happens
it's better to keep it unchanged, as courtesy to people that rely on it despite
it not being stable ABI.
2025-05-02 18:30:40 +02:00
Lysandros Nikolaou 60202609a2 gh-132661: Implement PEP 750 (#132662)
Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
Co-authored-by: Wingy <git@wingysam.xyz>
Co-authored-by: Koudai Aono <koxudaxi@gmail.com>
Co-authored-by: Dave Peck <davepeck@gmail.com>
Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>
Co-authored-by: Paul Everitt <pauleveritt@me.com>
Co-authored-by: sobolevn <mail@sobolevn.me>
2025-04-30 11:46:41 +02:00
Donghee Na 75cbb8d89e gh-132070: Use _PyObject_IsUniquelyReferenced in unicodeobject (gh-133039)
---------

Co-authored-by: Kumar Aditya <kumaraditya@python.org>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
2025-04-29 09:48:53 +09:00
Stan Ulbrych f6fb498c97 gh-132798: Schedule removal of PyUnicode_AsDecoded/Encoded functions for 3.15 (#132799)
Co-authored-by: Victor Stinner <vstinner@python.org>
2025-04-25 15:07:41 +02:00
Jon Crall fc0ec29889 gh-103997: Automatically dedent the argument to "-c" (#103998)
Co-authored-by: sunmy2019 <59365878+sunmy2019@users.noreply.github.com>
Co-authored-by: Kirill Podoprigora <80244920+Eclips4@users.noreply.github.com>
Co-authored-by: Inada Naoki <songofacandy@gmail.com>
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
2025-04-18 17:39:30 +09:00
Bénédikt Tran edbf7fb129 gh-111178: remove redundant casts for functions with correct signatures (#131673) 2025-04-01 17:18:11 +02:00
Victor Stinner 22706843e0 gh-131238: Remove many includes from pycore_interp.h (#131472) 2025-03-19 17:46:24 +00:00
Mark Shannon a1aeec61c4 GH-131238: Core header refactor (GH-131250)
* Moves most structs in pycore_ header files into pycore_structs.h and pycore_runtime_structs.h

* Removes many cross-header dependencies
2025-03-17 09:19:04 +00:00
Mark Shannon f30376c650 GH-127705: Fix _Py_RefcntAdd to handle objects becoming immortal (GH-131140) 2025-03-12 16:54:10 +00:00
Victor Stinner ed8675c571 gh-111178: Fix function signatures of unicodeiter (#130684) 2025-03-04 10:33:09 +01:00
Sergey Miryanov 3a7f17c7e2 gh-130790: Remove references about unicode's readiness from comments (#130801) 2025-03-03 19:18:09 +00:00
Sergey B Kirpichev f39a07be47 gh-87790: support thousands separators for formatting fractional part of floats (#125304)
```pycon
>>> f"{123_456.123_456:_._f}"  # Whole and fractional
'123_456.123_456'
>>> f"{123_456.123_456:_f}"    # Integer component only
'123_456.123456'
>>> f"{123_456.123_456:._f}"   # Fractional component only
'123456.123_456'
>>> f"{123_456.123_456:.4_f}"  # with precision
'123456.1_235'
```
2025-02-25 16:27:07 +01:00
Sam Gross b9d2ee687c gh-129701: Fix a data race in intern_common in the free threaded build (GH-130089)
* gh-129701: Fix a data race in `intern_common` in the free threaded build

* Use a mutex to avoid potentially returning a non-immortalized string,
  because immortalization happens after the insertion into the interned
  dict.

* Use `Py_DECREF()` calls instead of `Py_SET_REFCNT(s, Py_REFCNT(s) - 2)`
  for thread-safety. This code path isn't performance sensistive, so
  just use `Py_DECREF()` unconditionally for simplicity.
2025-02-17 14:15:40 +01:00
Stan Ulbrych 3402e133ef gh-82045: Correct and deduplicate "isprintable" docs; add test. (GH-130118)
We had the definition of what makes a character "printable" documented in three places, giving two different definitions.
The definition in the comment on `_PyUnicode_IsPrintable` was inverted; correct that.

With that correction, the two definitions turn out to be equivalent -- but to confirm that, you have to go look up, or happen to know, that those are the only five "Other" categories and only three "Separator" categories in the Unicode character database.  That makes it hard for the reader to tell whether they really are the same, or if there's some subtle difference in the intended semantics.

Fix that by cutting the C API docs' and the C comment's copies of the subtle details, in favor of referring to the Python-level docs. That ensures it's explicit that these are all meant to agree, and also lets us concentrate improvements to the wording in one place.

Speaking of which, borrow some ideas from the C comment, along with other tweaks, to hopefully add a bit more clarity to that one newly-centralized copy in the docs.

Also add a thorough test that the implementation agrees with this definition.

Author:    Greg Price <gnprice@gmail.com>

Co-authored-by: Greg Price <gnprice@gmail.com>
2025-02-14 18:16:47 +01:00
Victor Stinner 0373926260 gh-129354: Use PyErr_FormatUnraisable() function (#129511)
Replace PyErr_WriteUnraisable() with PyErr_FormatUnraisable().
2025-01-31 13:16:08 +01:00
Victor Stinner a810cb89f1 gh-89188: Implement PyUnicode_KIND() as a function (#129412)
Implement PyUnicode_KIND() and PyUnicode_DATA() as function, in
addition to the macros with the same names. The macros rely on C bit
fields which have compiler-specific layout.
2025-01-30 11:27:27 +00:00
Umar Butler 8d8b854824 gh-128016: Improved invalid escape sequence warning message (#128020) 2025-01-15 18:00:54 +01:00
Donghee Na ae23a012e6 gh-128137: Update PyASCIIObject to handle interned field with the atomic operation (gh-128196) 2025-01-05 18:17:06 +09:00
Alexander Shadchin 46cb6340d7 gh-127903: Fix a crash on debug builds when calling Objects/unicodeobject::_copy_characters` (#127876) 2025-01-03 18:47:58 +00:00
Sam Gross 8eebe4e6d0 gh-128212: Fix race in _PyUnicode_CheckConsistency (GH-128367)
There was a data race on the utf8 field between `PyUnicode_SET_UTF8` and
`_PyUnicode_CheckConsistency`. Use the `_PyUnicode_UTF8()` accessor,
which uses an atomic load internally, to avoid the data race.
2025-01-02 14:02:54 -05:00