272 Commits

Author SHA1 Message Date
Serhiy Storchaka f7ef0203d4 gh-123803: Support arbitrary code page encodings on Windows (GH-123804)
If the cpXXX encoding is not directly implemented in Python, fall back
to use the Windows-specific API codecs.code_page_encode() and
codecs.code_page_decode().
2024-11-18 17:45:25 +00:00
Serhiy Storchaka 19984fe024 gh-53203: Improve tests for strptime() (GH-125090)
Run them with different locales and different date and time.

Add the @run_with_locales() decorator to run the test with multiple
locales.

Improve the run_with_locale() context manager/decorator -- it now
catches only expected exceptions and reports the test as skipped if no
appropriate locale is available.
2024-10-08 08:40:02 +00:00
Victor Stinner e9f4d80fa6 gh-120417: Add #noqa: F401 to tests (#120627)
Ignore linter "imported but unused" warnings in tests when the linter
doesn't understand how the import is used.
2024-06-18 15:51:47 +00:00
Victor Stinner a557478987 gh-116417: Move limited C API unicode.c tests to _testlimitedcapi (#116993)
Split unicode.c tests of _testcapi into two parts: limited C API
tests in _testlimitedcapi and non-limited C API tests in _testcapi.

Update test_codecs.
2024-03-19 12:30:39 +00:00
John Sloboda 649857a157 gh-85287: Change codecs to raise precise UnicodeEncodeError and UnicodeDecodeError (#113674)
Co-authored-by: Inada Naoki <songofacandy@gmail.com>
2024-03-17 04:58:42 +00:00
Zackery Spytz d180b507c4 gh-63283: IDNA prefix should be case insensitive (GH-17726)
Any capitalization of "xn--" should be acceptable for the ACE prefix
(see https://tools.ietf.org/html/rfc3490#section-5).

Co-authored-by: Pepijn de Vos <pepijndevos@gmail.com>
Co-authored-by: Erlend E. Aasland <erlend@python.org>
Co-authored-by: Petr Viktorin <encukou@gmail.com>
2024-03-15 15:38:13 +01:00
Serhiy Storchaka b987fdb19b gh-109848: Make test_rot13_func in test_codecs independent (GH-109850) 2023-10-07 16:01:39 +03:00
Furkan Onder 3439cb0049 gh-66143: Allow copying and pickling of CodecInfo object (GH-109235)
Co-authored-by: Robert Lehmann <mail@robertlehmann.de>
2023-09-29 20:07:09 +03:00
Serhiy Storchaka d6892c2b92 gh-50644: Forbid pickling of codecs streams (GH-109180)
Attempts to pickle or create a shallow or deep copy of codecs streams
now raise a TypeError.

Previously, copying failed with a RecursionError, while pickling
produced wrong results that eventually caused unpickling to fail with
a RecursionError.
2023-09-10 20:06:09 +03:00
Nikita Sobolev 6e6a4cd523 gh-106300: Improve assertRaises(Exception) usages in tests (GH-106302) 2023-07-07 13:42:40 -07:00
Irit Katriel 76350e85eb gh-102406: replace exception chaining by PEP-678 notes in codecs (#102407) 2023-03-21 21:36:31 +00:00
Gregory P. Smith d315722564 gh-98433: Fix quadratic time idna decoding. (#99092)
There was an unnecessary quadratic loop in idna decoding. This restores
the behavior to linear.

This also adds an early length check in IDNA decoding to outright reject
huge inputs early on given the ultimate result is defined to be 63 or fewer
characters.
2022-11-07 16:54:41 -08:00
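A minimal sketch of the early length check described above, assuming a Python build that includes this fix. The host name and the oversized label are made-up inputs; only the standard "idna" codec is used.

```python
# Round-trip a short internationalized name through the "idna" codec.
name = "pyth\u00f6n.org"
ace = name.encode("idna")          # ASCII-compatible form with an "xn--" label
round_tripped = ace.decode("idna")

# A label far longer than any valid result (63 characters or fewer)
# is expected to be rejected instead of being decoded at great cost.
huge_label = b"xn--016c" + b"a" * 1100
try:
    huge_label.decode("idna")
except UnicodeError:
    rejected = True
else:
    rejected = False

print(round_tripped == name, rejected)
```

The broad `except UnicodeError` is deliberate: the exact subclass and message differ between Python versions.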
Stanley d9407b174c gh-51511: Note that codecs.open()'s encoding parameter affects automatic conversion to binary mode (#94370) 2022-10-21 16:01:05 -07:00
Victor Stinner 3ceb4b8d3a gh-84623: Remove unused imports in tests (#93772) 2022-06-13 16:56:03 +02:00
Serhiy Storchaka 3483299a24 gh-81548: Deprecate octal escape sequences with value larger than 0o377 (GH-91668) 2022-04-30 13:16:27 +03:00
Nikita Sobolev 6c83c8e6b5 bpo-46198: rename duplicate tests and remove unused code (GH-30297) 2022-03-10 08:20:11 -08:00
Victor Stinner ccbe8045fa bpo-46659: Fix the MBCS codec alias on Windows (GH-31218) 2022-02-22 22:04:07 +01:00
Victor Stinner 04dd60e50c bpo-46659: Update the test on the mbcs codec alias (GH-31168)
encodings registers the _alias_mbcs() codec search function before
the search_function() codec search function. Previously, the
_alias_mbcs() was never used.

Fix the test_codecs.test_mbcs_alias() test: use the current ANSI code
page, not a fake ANSI code page number.

Remove the test_site.test_aliasing_mbcs() test: the alias is now
implemented in the encodings module, no longer in the site module.
2022-02-06 21:50:09 +01:00
Victor Stinner ea1a54506b bpo-46303: Move fileutils.h private functions to internal C API (GH-30484)
Move almost all private functions of Include/cpython/fileutils.h to
the internal C API Include/internal/pycore_fileutils.h.

Only keep _Py_fopen_obj() in Include/cpython/fileutils.h, since it's
used by _testcapi which must not use the internal C API.

Move EncodeLocaleEx() and DecodeLocaleEx() functions from _testcapi
to _testinternalcapi, since the C API moved to the internal C API.
2022-01-11 11:56:16 +01:00
Christian Heimes e73283a20f bpo-45668: Fix PGO tests without test extensions (GH-29315) 2021-11-01 11:14:53 +01:00
Serhiy Storchaka 39aa98346d bpo-45467: Fix IncrementalDecoder and StreamReader in the "raw-unicode-escape" codec (GH-28944)
They now support splitting escape sequences between input chunks.

Add the third parameter "final" in codecs.raw_unicode_escape_decode().
It is True by default to match the former behavior.
2021-10-14 20:04:19 +03:00
Serhiy Storchaka c96d1546b1 bpo-45461: Fix IncrementalDecoder and StreamReader in the "unicode-escape" codec (GH-28939)
They now support splitting escape sequences between input chunks.

Add the third parameter "final" in codecs.unicode_escape_decode().
It is True by default to match the former behavior.
2021-10-14 13:17:00 +03:00
Victor Stinner 19ba2122ac bpo-37330: open() no longer accept 'U' in file mode (GH-28118)
open(), io.open(), codecs.open() and fileinput.FileInput no longer
accept "U" ("universal newline") in the file mode. This flag was
deprecated since Python 3.3.
2021-09-02 12:58:00 +02:00
Max Bernstein 3635388f52 bpo-42065: Fix incorrectly formatted _codecs.charmap_decode error message (GH-19940) 2020-10-17 23:38:21 +03:00
Hai Shi c9f696cb96 bpo-41919, test_codecs: Move codecs.register calls to setUp() (GH-22513)
* Move the codecs' (un)register operation to testcases.
* Remove _codecs._forget_codec() and _PyCodec_Forget()
2020-10-16 10:34:15 +02:00
Hai Shi c5b049b91c bpo-39337: encodings.normalize_encoding() now ignores non-ASCII characters (GH-22219) 2020-10-14 17:43:31 +02:00
Hai Shi 3f342376ab bpo-39337: Add a test case for normalizing of codec names (GH-19069) 2020-10-08 21:20:57 +02:00
Hai Shi d332e7b816 bpo-41842: Add codecs.unregister() function (GH-22360)
Add codecs.unregister() and PyCodec_Unregister() functions
to unregister a codec search function.
2020-09-28 23:41:11 +02:00
Victor Stinner 0ee0b2938c bpo-41521: Replace whitelist/blacklist with allowlist/denylist (GH-21823)
Rename 5 test method names in test_codecs and test_typing.
2020-08-11 15:28:43 +02:00
Hai Shi 4660597b51 bpo-40275: Use new test.support helper submodules in tests (GH-21448) 2020-08-03 18:49:18 +02:00
Victor Stinner 942f7a2dea bpo-39674: Revert "bpo-37330: open() no longer accept 'U' in file mode (GH-16959)" (GH-18767)
This reverts commit e471e72977.

The mode will be removed from Python 3.10.
2020-03-04 18:50:22 +01:00
Chris A 2565edec2c bpo-38971: Open file in codecs.open() closes if exception raised. (GH-17666)
An open BPO issue asked to bring the implementation of codecs.open()
to parity with io.open(), which uses a try/except to ensure the file
stream is closed before the exception propagates.
2020-03-02 08:39:50 +02:00
Berker Peksag ba22e8f174 bpo-30566: Fix IndexError when using punycode codec (GH-18632)
Trying to decode an invalid string with the punycode codec
should raise UnicodeError.
2020-02-25 06:19:03 +03:00
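A minimal sketch of the behavior this fix targets, assuming a Python version that includes it. The malformed input is an illustrative byte string; the handler catches the UnicodeError base class because newer versions may raise a more specific subclass.

```python
malformed = b"xn--w&"  # "&" is not a valid punycode digit

try:
    malformed.decode("punycode")
except UnicodeError as exc:
    # Record the exception type; UnicodeDecodeError is a subclass of UnicodeError.
    outcome = type(exc).__name__
else:
    outcome = "decoded without error"

print(outcome)
```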
Pablo Galindo 293dd23477 Remove binding of captured exceptions when not used to reduce the chances of creating cycles (GH-17246)
Capturing exceptions into names can lead to reference cycles through the __traceback__ attribute of the exceptions in some obscure cases that have been reported previously and fixed individually. As these variables are not used anyway, we can remove the binding to reduce the chances of creating reference cycles.

See for example GH-13135
2019-11-19 21:34:03 +00:00
Victor Stinner e471e72977 bpo-37330: open() no longer accept 'U' in file mode (GH-16959)
open(), io.open(), codecs.open() and fileinput.FileInput no longer
accept "U" ("universal newline") in the file mode. This flag was
deprecated since Python 3.3.
2019-10-28 15:40:08 +01:00
Zeth b3b48c81f0 bpo-37876: Tests for ROT-13 codec (GH-15314)
The Rot-13 codec is for educational use but does not have unit tests,
dragging down test coverage. This adds a few very simple tests.
2019-09-09 07:50:36 -07:00
Steve Dower 7ebdda0dbe bpo-36311: Fixes decoding multibyte characters around chunk boundaries and improves decoding performance (GH-15083) 2019-08-21 16:22:33 -07:00
Victor Stinner 8f4ef3b019 Remove unused imports in tests (GH-14518) 2019-07-01 18:28:25 +02:00
Serhiy Storchaka 894263ba80 bpo-24214: Fixed the UTF-8 and UTF-16 incremental decoders. (GH-14304)
* The UTF-8 incremental decoders now fail fast if they encounter
  a sequence that can't be handled by the error handler.
* The UTF-16 incremental decoders with the surrogatepass error
  handler now decode a lone low surrogate with final=False.
2019-06-25 11:54:18 +03:00
Victor Stinner ca612a9728 bpo-36778: Remove outdated comment from CodePageTest (GH-13807)
CP65001Test has been removed.
2019-06-04 17:09:10 +02:00
Ammar Askar a6ec1ce1ac bpo-33361: Fix bug with seeking in StreamRecoders (GH-8278) 2019-05-31 22:44:00 +03:00
Jelle Zijlstra b3be407288 bpo-33482: fix codecs.StreamRecoder.writelines (GH-6779)
A very simple fix. I found this while writing typeshed stubs for StreamRecoder.


https://bugs.python.org/issue33482
2019-05-22 08:18:26 -07:00
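A small sketch of StreamRecoder.writelines() once it works as intended. The choice of UTF-8 on the caller side and Latin-1 on the underlying stream is an assumption made for illustration only.

```python
import codecs
import io

backend = io.BytesIO()
utf8 = codecs.lookup("utf-8")
latin1 = codecs.lookup("latin-1")

# Callers write UTF-8 bytes; the underlying stream receives Latin-1 bytes.
recoder = codecs.StreamRecoder(
    backend,
    utf8.encode,
    utf8.decode,
    latin1.streamreader,
    latin1.streamwriter,
)
recoder.writelines(["\u00e9".encode("utf-8"), b"!"])

written = backend.getvalue()
print(written)
```

The two-byte UTF-8 sequence for "é" arrives in the backing stream as the single Latin-1 byte 0xE9.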
Victor Stinner d267ac20c3 bpo-36778: cp65001 encoding becomes an alias to utf_8 (GH-13230) 2019-05-10 03:19:54 +02:00
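As a quick check of the alias, assuming Python 3.8 or newer, looking up "cp65001" should resolve to the regular UTF-8 codec:

```python
import codecs

info = codecs.lookup("cp65001")
encoded = "\u20ac".encode("cp65001")  # euro sign

print(info.name, encoded)
```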
Paul Monson 62dfd7d6fe bpo-35920: Windows 10 ARM32 platform support (GH-11774) 2019-04-25 18:36:45 +00:00
Serhiy Storchaka 7a465cb5ee bpo-24214: Fixed the UTF-8 incremental decoder. (GH-12603)
The bug occurred when an encoded surrogate character was passed
to the incremental decoder in two chunks.
2019-03-30 08:23:38 +02:00
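The scenario from this fix can be sketched as follows, assuming a Python version that includes it. The split point after the first byte is an arbitrary choice for the example.

```python
import codecs

# A lone high surrogate can only be encoded with the surrogatepass handler.
data = "\ud901".encode("utf-8", "surrogatepass")

decoder = codecs.getincrementaldecoder("utf-8")("surrogatepass")
first = decoder.decode(data[:1])               # incomplete sequence is buffered
second = decoder.decode(data[1:], final=True)  # remaining bytes complete it

print(repr(first), repr(second))
```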
Serhiy Storchaka c1e2c288f4 bpo-36312: Fix decoders for some code pages. (GH-12369) 2019-03-20 21:45:18 +02:00
Inada Naoki 6a16b18224 bpo-36297: remove "unicode_internal" codec (GH-12342) 2019-03-18 15:44:11 +09:00
Serhiy Storchaka 5b10b98247 bpo-22831: Use "with" to avoid possible fd leaks in tests (part 2). (GH-10929) 2019-03-05 10:06:26 +02:00
Serhiy Storchaka 4013c17911 bpo-35372: Fix the code page decoder for input > 2 GiB. (GH-10848) 2018-12-03 10:36:45 +02:00
Victor Stinner bde9d6bbb4 bpo-34523, bpo-35322: Fix unicode_encode_locale() (GH-10759)
Fix memory leak in PyUnicode_EncodeLocale() and
PyUnicode_EncodeFSDefault() on error handling.

Changes:

* Fix unicode_encode_locale() error handling
* Fix test_codecs.LocaleCodecTest
2018-11-28 10:26:20 +01:00