Files
cpython/Tools
Petr Viktorin 49f6beb56a [3.12] gh-113993: Make interned strings mortal (GH-120520, GH-121364, GH-121903, GH-122303) (#123065)
This backports several PRs for gh-113993, making interned strings mortal so they can be garbage-collected when no longer needed.

* Allow interned strings to be mortal, and fix related issues (GH-120520)

  * Add an InternalDocs file describing how interning should work and how to use it.

  * Add internal functions to *explicitly* request what kind of interning is done:
    - `_PyUnicode_InternMortal`
    - `_PyUnicode_InternImmortal`
    - `_PyUnicode_InternStatic`

  * Switch uses of `PyUnicode_InternInPlace` to those.

  * Disallow using `_Py_SetImmortal` on strings directly.
    You should use `_PyUnicode_InternImmortal` instead:
    - Strings should be interned before immortalization, otherwise you're possibly
      interning a immortalizing copy.
    - `_Py_SetImmortal` doesn't handle the `SSTATE_INTERNED_MORTAL` to
      `SSTATE_INTERNED_IMMORTAL` update, and those flags can't be changed in
      backports, as they are now part of public API and version-specific ABI.

  * Add private `_only_immortal` argument for `sys.getunicodeinternedsize`, used in refleak test machinery.

   Make sure the statically allocated string singletons are unique. This means these sets are now disjoint:
    - `_Py_ID`
    - `_Py_STR` (including the empty string)
    - one-character latin-1 singletons

    Now, when you intern a singleton, that exact singleton will be interned.

  * Add a `_Py_LATIN1_CHR` macro, use it instead of `_Py_ID`/`_Py_STR` for one-character latin-1 singletons everywhere (including Clinic).

  * Intern `_Py_STR` singletons at startup.

  * Beef up the tests. Cover internal details (marked with `@cpython_only`).

  * Add lots of assertions

* Don't immortalize in PyUnicode_InternInPlace; keep immortalizing in other API (GH-121364)

  * Switch PyUnicode_InternInPlace to _PyUnicode_InternMortal, clarify docs

  * Document immortality in some functions that take `const char *`

  This is PyUnicode_InternFromString;
  PyDict_SetItemString, PyObject_SetAttrString;
  PyObject_DelAttrString; PyUnicode_InternFromString;
  and the PyModule_Add convenience functions.

  Always point out a non-immortalizing alternative.

  * Don't immortalize user-provided attr names in _ctypes

* Immortalize names in code objects to avoid crash (GH-121903)

* Intern latin-1 one-byte strings at startup (GH-122303)

There are some 3.12-specific changes, mainly to allow statically allocated strings in deepfreeze. (In 3.13, deepfreeze switched to the general `_Py_ID`/`_Py_STR`.)

Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
2024-09-27 13:28:48 -07:00
..

This directory contains a number of Python programs that are useful
while building or extending Python.

build           Automatically generated directory by the build system
                contain build artifacts and intermediate files.

buildbot        Batchfiles for running on Windows buildbot workers.

c-analyzer      Tools to check no new global variables have been added.

cases_generator Tooling to generate interpreters.

ccbench         A Python threads-based concurrency benchmark. (*)

clinic          A preprocessor for CPython C files in order to automate
                the boilerplate involved with writing argument parsing
                code for "builtins".

freeze          Create a stand-alone executable from a Python program.

gdb             Python code to be run inside gdb, to make it easier to
                debug Python itself (by David Malcolm).

i18n            Tools for internationalization. pygettext.py
                parses Python source code and generates .pot files,
                and msgfmt.py generates a binary message catalog
                from a catalog in text format.

importbench     A set of micro-benchmarks for various import scenarios.

iobench         Benchmark for the new Python I/O system. (*)

msi             Support for packaging Python as an MSI package on Windows.

nuget           Files for the NuGet package manager for .NET.

patchcheck      Tools for checking and applying patches to the Python source code
                and verifying the integrity of patch files.

peg_generator   PEG-based parser generator (pegen) used for new parser.

scripts         A number of useful single-file programs, e.g. tabnanny.py
                by Tim Peters, which checks for inconsistent mixing of
                tabs and spaces, and 2to3, which converts Python 2 code
                to Python 3 code.

ssl             Scripts to generate ssl_data.h from OpenSSL sources, and run
                tests against multiple installations of OpenSSL and LibreSSL.

stringbench     A suite of micro-benchmarks for various operations on
                strings (both 8-bit and unicode). (*)

tz              A script to dump timezone from /usr/share/zoneinfo.

unicode         Tools for generating unicodedata and codecs from unicode.org
                and other mapping files (by Fredrik Lundh, Marc-Andre Lemburg
                and Martin von Loewis).

unittestgui     A Tkinter based GUI test runner for unittest, with test
                discovery.

wasm            Config and helpers to facilitate cross compilation of CPython
                to WebAssembly (WASM).

(*) A generic benchmark suite is maintained separately at https://github.com/python/performance

Note: The pynche color editor has moved to https://gitlab.com/warsaw/pynche