Commit Graph

151 Commits

Author SHA1 Message Date
Victor Stinner db9e7556c3 [3.13] gh-151253: Dump the Python path configuration on _PyCodec_InitRegistry() failure (#151250) (#151269) (#151283) (#151287)
[3.14][3.15] gh-151253: Dump the Python path configuration on _PyCodec_InitRegistry() failure (#151250) (#151269) (#151283)

[3.15] gh-151253: Dump the Python path configuration on _PyCodec_InitRegistry() failure (#151250) (#151269)

gh-151253: Dump the Python path configuration on _PyCodec_InitRegistry() failure (#151250)

If "import encodings" fails at Python startup, dump the Python path
configuration to help users debugging their configuration. The
encodings module is the first module imported during Python startup.

(cherry picked from commit 7b6e98911e)
(cherry picked from commit 10f616cf39)
(cherry picked from commit b3a7758d8a)
2026-06-10 23:24:46 +02:00
Miss Islington (bot) f2d6931656 [3.13] gh-58124: Avoid CP_UTF8 in UnicodeDecodeError (GH-137415) (#137461)
gh-58124: Avoid CP_UTF8 in UnicodeDecodeError (GH-137415)

Fix name of the Python encoding in Unicode errors of the code page
codec: use "cp65000" and "cp65001" instead of "CP_UTF7" and "CP_UTF8"
which are not valid Python code names.
(cherry picked from commit ce1b747ff6)

Co-authored-by: Victor Stinner <vstinner@python.org>
2025-08-06 12:59:11 +00:00
Petr Viktorin 9769b7ae06 [3.13] gh-113993: Allow interned strings to be mortal, and fix related issues (GH-120520) (GH-120945)
* Add an InternalDocs file describing how interning should work and how to use it.

* Add internal functions to *explicitly* request what kind of interning is done:
  - `_PyUnicode_InternMortal`
  - `_PyUnicode_InternImmortal`
  - `_PyUnicode_InternStatic`

* Switch uses of `PyUnicode_InternInPlace` to those.

* Disallow using `_Py_SetImmortal` on strings directly.
  You should use `_PyUnicode_InternImmortal` instead:
  - Strings should be interned before immortalization, otherwise you're possibly
    interning a immortalizing copy.
  - `_Py_SetImmortal` doesn't handle the `SSTATE_INTERNED_MORTAL` to
    `SSTATE_INTERNED_IMMORTAL` update, and those flags can't be changed in
    backports, as they are now part of public API and version-specific ABI.

* Add private `_only_immortal` argument for `sys.getunicodeinternedsize`, used in refleak test machinery.

* Make sure the statically allocated string singletons are unique. This means these sets are now disjoint:
  - `_Py_ID`
  - `_Py_STR` (including the empty string)
  - one-character latin-1 singletons

  Now, when you intern a singleton, that exact singleton will be interned.

* Add a `_Py_LATIN1_CHR` macro, use it instead of `_Py_ID`/`_Py_STR` for one-character latin-1 singletons everywhere (including Clinic).

* Intern `_Py_STR` singletons at startup.

* For free-threaded builds, intern `_Py_LATIN1_CHR` singletons at startup.

* Beef up the tests. Cover internal details (marked with `@cpython_only`).

* Add lots of assertions

Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
2024-06-24 20:24:19 +02:00
Brett Simmers f8290df63f gh-116738: Make _codecs module thread-safe (#117530)
The module itself is a thin wrapper around calls to functions in
`Python/codecs.c`, so that's where the meaningful changes happened:

- Move codecs-related state that lives on `PyInterpreterState` to a
  struct declared in `pycore_codecs.h`.

- In free-threaded builds, add a mutex to `codecs_state` to synchronize
  operations on `search_path`. Because `search_path_mutex` is used as a
  normal mutex and not a critical section, we must be extremely careful
  with operations called while holding it.

- The codec registry is explicitly initialized as part of
  `_PyUnicode_InitEncodings` to simplify thread-safety.
2024-05-02 18:25:36 -04:00
Kirill Podoprigora 0785c68559 gh-111972: Make Unicode name C APIcapsule initialization thread-safe (#112249) 2023-11-30 11:12:49 +01:00
Serhiy Storchaka aa438bdd6d gh-111789: Use PyDict_GetItemRef() in Python/codecs.c (gh-112082) 2023-11-27 18:53:43 +01:00
Victor Stinner 03c4080c71 gh-108765: Python.h no longer includes <ctype.h> (#108831)
Remove <ctype.h> in C files which don't use it; only sre.c and
_decimal.c still use it.

Remove _PY_PORT_CTYPE_UTF8_ISSUE code from pyport.h:

* Code added by commit b5047fd019
  in 2004 for MacOSX and FreeBSD.
* Test removed by commit 52ddaefb6b
  in 2007, since Python str type now uses locale independent
  functions like Py_ISALPHA() and Py_TOLOWER() and the Unicode
  database.

Modules/_sre/sre.c replaces _PY_PORT_CTYPE_UTF8_ISSUE with new
functions: sre_isalnum(), sre_tolower(), sre_toupper().

Remove unused includes:

* _localemodule.c: remove <stdio.h>.
* getargs.c: remove <float.h>.
* dynload_win.c: remove <direct.h>, it no longer calls _getcwd()
  since commit fb1f68ed7c (in 2001).
2023-09-03 18:54:27 +02:00
Victor Stinner 4dc9f48930 gh-108308: Replace _PyDict_GetItemStringWithError() (#108372)
Replace _PyDict_GetItemStringWithError() calls with
PyDict_GetItemStringRef() which returns a strong reference to the
item.

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
2023-08-23 22:59:00 +02:00
Victor Stinner 615f6e946d gh-106320: Remove _PyDict_GetItemStringWithError() function (#108313)
Remove private _PyDict_GetItemStringWithError() function of the
public C API: the new PyDict_GetItemStringRef() can be used instead.

* Move private _PyDict_GetItemStringWithError() to the internal C API.
* _testcapi get_code_extra_index() uses PyDict_GetItemStringRef().
  Avoid using private functions in _testcapi which tests the public C
  API.
2023-08-22 18:17:25 +00:00
Serhiy Storchaka be1b968dc1 gh-106521: Remove _PyObject_LookupAttr() function (GH-106642) 2023-07-12 08:57:10 +03:00
Victor Stinner bc7eb17084 gh-106320: Use _PyInterpreterState_GET() (#106336)
Replace PyInterpreterState_Get() with inlined
_PyInterpreterState_GET().
2023-07-02 16:37:37 +00:00
Irit Katriel 55c99d97e1 gh-77757: replace exception wrapping by PEP-678 notes in typeobject's __set_name__ (#103402) 2023-04-11 11:53:06 +01:00
Irit Katriel 76350e85eb gh-102406: replace exception chaining by PEP-678 notes in codecs (#102407) 2023-03-21 21:36:31 +00:00
Victor Stinner 8211cf5d28 gh-99300: Replace Py_INCREF() with Py_NewRef() (#99530)
Replace Py_INCREF() and Py_XINCREF() using a cast with Py_NewRef()
and Py_XNewRef().
2022-11-16 18:34:24 +01:00
Victor Stinner d8f239d86e gh-99300: Use Py_NewRef() in Python/ directory (#99302)
Replace Py_INCREF() and Py_XINCREF() with Py_NewRef() and
Py_XNewRef() in C files of the Python/ directory.
2022-11-10 09:03:39 +01:00
Eric Snow 81c72044a1 bpo-46541: Replace core use of _Py_IDENTIFIER() with statically initialized global objects. (gh-30928)
We're no longer using _Py_IDENTIFIER() (or _Py_static_string()) in any core CPython code.  It is still used in a number of non-builtin stdlib modules.

The replacement is: PyUnicodeObject (not pointer) fields under _PyRuntimeState, statically initialized as part of _PyRuntime.  A new _Py_GET_GLOBAL_IDENTIFIER() macro facilitates lookup of the fields (along with _Py_GET_GLOBAL_STRING() for non-identifier strings).

https://bugs.python.org/issue46541#msg411799 explains the rationale for this change.

The core of the change is in:

* (new) Include/internal/pycore_global_strings.h - the declarations for the global strings, along with the macros
* Include/internal/pycore_runtime_init.h - added the static initializers for the global strings
* Include/internal/pycore_global_objects.h - where the struct in pycore_global_strings.h is hooked into _PyRuntimeState
* Tools/scripts/generate_global_objects.py - added generation of the global string declarations and static initializers

I've also added a --check flag to generate_global_objects.py (along with make check-global-objects) to check for unused global strings.  That check is added to the PR CI config.

The remainder of this change updates the core code to use _Py_GET_GLOBAL_IDENTIFIER() instead of _Py_IDENTIFIER() and the related _Py*Id functions (likewise for _Py_GET_GLOBAL_STRING() instead of _Py_static_string()).  This includes adding a few functions where there wasn't already an alternative to _Py*Id(), replacing the _Py_Identifier * parameter with PyObject *.

The following are not changed (yet):

* stop using _Py_IDENTIFIER() in the stdlib modules
* (maybe) get rid of _Py_IDENTIFIER(), etc. entirely -- this may not be doable as at least one package on PyPI using this (private) API
* (maybe) intern the strings during runtime init

https://bugs.python.org/issue46541
2022-02-08 13:39:07 -07:00
Kumar Aditya 41026c3155 bpo-45855: Replaced deprecated PyImport_ImportModuleNoBlock with PyImport_ImportModule (GH-30046) 2021-12-12 10:45:20 +02:00
Victor Stinner d943d19172 bpo-45439: Move _PyObject_CallNoArgs() to pycore_call.h (GH-28895)
* Move _PyObject_CallNoArgs() to pycore_call.h (internal C API).
* _ssl, _sqlite and _testcapi extensions now call the public
  PyObject_CallNoArgs() function, rather than _PyObject_CallNoArgs().
* _lsprof extension is now built with Py_BUILD_CORE_MODULE macro
  defined to get access to internal _PyObject_CallNoArgs().
2021-10-12 08:38:19 +02:00
Victor Stinner ce3489cfdb bpo-45439: Rename _PyObject_CallNoArg() to _PyObject_CallNoArgs() (GH-28891)
Fix typo in the private _PyObject_CallNoArg() function name: rename
it to _PyObject_CallNoArgs() to be consistent with the public
function PyObject_CallNoArgs().
2021-10-12 00:42:23 +02:00
Victor Stinner 920cb647ba bpo-42157: unicodedata avoids references to UCD_Type (GH-22990)
* UCD_Check() uses PyModule_Check()
* Simplify the internal _PyUnicode_Name_CAPI structure:

  * Remove size and state members
  * Remove state and self parameters of getcode() and getname()
    functions

* Remove global_module_state
2020-10-26 19:19:36 +01:00
Victor Stinner 47e1afd2a1 bpo-1635741: _PyUnicode_Name_CAPI moves to internal C API (GH-22713)
The private _PyUnicode_Name_CAPI structure of the PyCapsule API
unicodedata.ucnhash_CAPI moves to the internal C API. Moreover, the
structure gets a new state member which must be passed to the
getcode() and getname() functions.

* Move Include/ucnhash.h to Include/internal/pycore_ucnhash.h
* unicodedata module is now built with Py_BUILD_CORE_MODULE.
* unicodedata: move hashAPI variable into unicodedata_module_state.
2020-10-26 16:43:47 +01:00
Hai Shi c9f696cb96 bpo-41919, test_codecs: Move codecs.register calls to setUp() (GH-22513)
* Move the codecs' (un)register operation to testcases.
* Remove _codecs._forget_codec() and _PyCodec_Forget()
2020-10-16 10:34:15 +02:00
Hai Shi d332e7b816 bpo-41842: Add codecs.unregister() function (GH-22360)
Add codecs.unregister() and PyCodec_Unregister() functions
to unregister a codec search function.
2020-09-28 23:41:11 +02:00
Victor Stinner e5014be049 bpo-40268: Remove a few pycore_pystate.h includes (GH-19510) 2020-04-14 17:52:15 +02:00
Victor Stinner 81a7be3fa2 bpo-40268: Rename _PyInterpreterState_GET_UNSAFE() (GH-19509)
Rename _PyInterpreterState_GET_UNSAFE() to _PyInterpreterState_GET()
for consistency with _PyThreadState_GET() and to have a shorter name
(help to fit into 80 columns).

Add also "assert(tstate != NULL);" to the function.
2020-04-14 15:14:01 +02:00
Victor Stinner 4a3fe08353 bpo-40268: Include explicitly pycore_interp.h (GH-19505)
pycore_pystate.h no longer includes pycore_interp.h:
it's now included explicitly in files accessing PyInterpreterState.
2020-04-14 14:26:24 +02:00
Serhiy Storchaka cd8295ff75 bpo-39943: Add the const qualifier to pointers on non-mutable PyUnicode data. (GH-19345) 2020-04-11 10:48:40 +03:00
Victor Stinner ff4584caca bpo-39947: Use _PyInterpreterState_GET_UNSAFE() (GH-18978)
Replace _PyInterpreterState_Get() function call with
_PyInterpreterState_GET_UNSAFE() macro which is more efficient but
don't check if tstate or interp is NULL.

_Py_GetConfigsAsDict() now uses _PyThreadState_GET().
2020-03-13 18:03:56 +01:00
Andy Lester 7386a70746 closes bpo-39630: Update pointers to string literals to be const char *. (GH-18510) 2020-02-13 20:42:56 -08:00
Petr Viktorin ffd9753a94 bpo-39245: Switch to public API for Vectorcall (GH-18460)
The bulk of this patch was generated automatically with:

    for name in \
        PyObject_Vectorcall \
        Py_TPFLAGS_HAVE_VECTORCALL \
        PyObject_VectorcallMethod \
        PyVectorcall_Function \
        PyObject_CallOneArg \
        PyObject_CallMethodNoArgs \
        PyObject_CallMethodOneArg \
    ;
    do
        echo $name
        git grep -lwz _$name | xargs -0 sed -i "s/\b_$name\b/$name/g"
    done

    old=_PyObject_FastCallDict
    new=PyObject_VectorcallDict
    git grep -lwz $old | xargs -0 sed -i "s/\b$old\b/$new/g"

and then cleaned up:

- Revert changes to in docs & news
- Revert changes to backcompat defines in headers
- Nudge misaligned comments
2020-02-11 17:46:57 +01:00
Victor Stinner a102ed7d2f bpo-39573: Use Py_TYPE() macro in Python and Include directories (GH-18391)
Replace direct access to PyObject.ob_type with Py_TYPE().
2020-02-07 02:24:48 +01:00
Victor Stinner d3a1de2270 bpo-38631: Avoid Py_FatalError() in _PyCodecRegistry_Init() (GH-18217)
_PyCodecRegistry_Init() now reports exceptions to the caller,
rather than calling Py_FatalError().
2020-01-27 23:23:12 +01:00
Jordon Xu 20f59fe1f7 bpo-37751: Fix codecs.lookup() normalization (GH-15092)
Fix codecs.lookup() to normalize the encoding name the same way
than encodings.normalize_encoding(), except that codecs.lookup()
also converts the name to lower case.
2019-08-21 14:26:20 +01:00
Jeroen Demeyer 1dbd084f1f bpo-29548: no longer use PyEval_Call* functions (GH-14683) 2019-07-12 00:57:32 +09:00
Jeroen Demeyer 6e43d07324 bpo-37483: fix reference leak in _PyCodec_Lookup (GH-14600) 2019-07-05 19:57:32 +09:00
Jeroen Demeyer 196a530e00 bpo-37483: add _PyObject_CallOneArg() function (#14558) 2019-07-04 19:31:34 +09:00
Serhiy Storchaka a24107b04c bpo-35459: Use PyDict_GetItemWithError() instead of PyDict_GetItem(). (GH-11112) 2019-02-25 17:59:46 +02:00
Serhiy Storchaka 8905fcc85a bpo-35454: Fix miscellaneous minor issues in error handling. (#11077)
* bpo-35454: Fix miscellaneous minor issues in error handling.

* Fix a null pointer dereference.
2018-12-11 08:38:03 +02:00
Victor Stinner 621cebe81b bpo-35081: Rename internal headers (GH-10275)
Rename Include/internal/ headers:

* pycore_hash.h -> pycore_pyhash.h
* pycore_lifecycle.h -> pycore_pylifecycle.h
* pycore_mem.h -> pycore_pymem.h
* pycore_state.h -> pycore_pystate.h

Add missing headers to Makefile.pre.in and PCbuild:

* pycore_condvar.h.
* pycore_hamt.h
* pycore_pyhash.h
2018-11-12 16:53:38 +01:00
Victor Stinner 27e2d1f219 bpo-35081: Add pycore_ prefix to internal header files (GH-10263)
* Rename Include/internal/ header files:

  * pyatomic.h -> pycore_atomic.h
  * ceval.h -> pycore_ceval.h
  * condvar.h -> pycore_condvar.h
  * context.h -> pycore_context.h
  * pygetopt.h -> pycore_getopt.h
  * gil.h -> pycore_gil.h
  * hamt.h -> pycore_hamt.h
  * hash.h -> pycore_hash.h
  * mem.h -> pycore_mem.h
  * pystate.h -> pycore_state.h
  * warnings.h -> pycore_warnings.h

* PCbuild project, Makefile.pre.in, Modules/Setup: add the
  Include/internal/ directory to the search paths of header files.
* Update includes. For example, replace #include "internal/mem.h"
  with #include "pycore_mem.h".
2018-11-01 00:52:28 +01:00
Victor Stinner caba55b3b7 bpo-34301: Add _PyInterpreterState_Get() helper function (GH-8592)
sys_setcheckinterval() now uses a local variable to parse arguments,
before writing into interp->check_interval.
2018-08-03 15:33:52 +02:00
INADA Naoki 0c1c4563a6 bpo-33231: Fix potential leak in normalizestring() (GH-6386) 2018-04-06 15:51:24 +09:00
Serhiy Storchaka f320be77ff bpo-32571: Avoid raising unneeded AttributeError and silencing it in C code (GH-5222)
Add two new private APIs: _PyObject_LookupAttr() and _PyObject_LookupAttrId()
2018-01-25 17:49:40 +09:00
Eric Snow 2ebc5ce42a bpo-30860: Consolidate stateful runtime globals. (#3397)
* group the (stateful) runtime globals into various topical structs
* consolidate the topical structs under a single top-level _PyRuntimeState struct
* add a check-c-globals.py script that helps identify runtime globals

Other globals are excluded (see globals.txt and check-c-globals.py).
2017-09-07 23:51:28 -06:00
Victor Stinner 7bfb42d5b7 Issue #28858: Remove _PyObject_CallArg1() macro
Replace
   _PyObject_CallArg1(func, arg)
with
   PyObject_CallFunctionObjArgs(func, arg, NULL)

Using the _PyObject_CallArg1() macro increases the usage of the C stack, which
was unexpected and unwanted. PyObject_CallFunctionObjArgs() doesn't have this
issue.
2016-12-05 17:04:32 +01:00
Victor Stinner 4778eab1f2 Replace PyObject_CallFunction() with fastcall
Replace
    PyObject_CallFunction(func, "O", arg)
and
    PyObject_CallFunction(func, "O", arg, NULL)
with
    _PyObject_CallArg1(func, arg)

Replace
    PyObject_CallFunction(func, NULL)
with
    _PyObject_CallNoArg(func)

_PyObject_CallNoArg() and _PyObject_CallArg1() are simpler and don't allocate
memory on the C stack.
2016-12-01 14:51:04 +01:00
Serhiy Storchaka 85b0f5beb1 Added the const qualifier to char* variables that refer to readonly internal
UTF-8 represenatation of Unicode objects.
2016-11-20 10:16:47 +02:00
Serhiy Storchaka cb33a01bbc Issue #28510: Clean up decoding error handlers.
Since PyUnicodeDecodeError_GetObject() always returns bytes, following
PyBytes_AsString() can be replaced with PyBytes_AS_STRING().
2016-10-23 09:44:50 +03:00
Martin Panter 6245cb3c01 Correct “an” → “a” with “Unicode”, “user”, “UTF”, etc
This affects documentation, code comments, and a debugging messages.
2016-04-15 02:14:19 +00:00
Victor Stinner 38b8ae0f5b Issue #24993: Handle import error in namereplace error handler
Handle PyCapsule_Import() failure (exception) in PyCodec_NameReplaceErrors():
return immedialty NULL.
2015-09-03 16:19:40 +02:00