Benjamin Peterson
4d94474ba3
rewrite the parsing of field names to be more consistent wrt recursive expansion
13 years ago
Benjamin Peterson
d2b58a9880
only recursively expand in the format spec ( closes #17644 )
13 years ago
Ezio Melotti
6b02772c13
Remove trailing whitespace.
13 years ago
Victor Stinner
8f674ccd64
Close #17694 : Add minimum length to _PyUnicodeWriter
* Add also min_char attribute to _PyUnicodeWriter structure (currently unused)
* _PyUnicodeWriter_Init() has no more argument (except the writer itself):
min_length and overallocate must be set explicitly
* In error handlers, only enable overallocation if the replacement string
is longer than 1 character
* CJK decoders don't use overallocation anymore
* Set min_length, instead of preallocating memory using
_PyUnicodeWriter_Prepare(), in many decoders
* _PyUnicode_DecodeUnicodeInternal() checks for integer overflow
13 years ago
Victor Stinner
76b3b2726c
stringlib: remove unused STRINGLIB_RESIZE macro
13 years ago
Serhiy Storchaka
e2cef885a2
Issue #16061 : Speed up str.replace() for replacing 1-character strings.
13 years ago
Victor Stinner
7efa3b8242
Close #13126 : "Simplify" FASTSEARCH() code to help the compiler to emit more
efficient machine code. Patch written by Antoine Pitrou.
Without this change, str.find() was 10% slower than str.rfind() in the worst
case.
13 years ago
Victor Stinner
cfc4c13b04
Add _PyUnicodeWriter_WriteSubstring() function
Write a function to enable more optimizations:
* If the substring is the whole string and overallocation is disabled, just
keep a reference to the string, don't copy characters
* Avoid a call to the expensive _PyUnicode_FindMaxChar() function when
possible
13 years ago
Serhiy Storchaka
18809fa94e
Remove unused defines.
13 years ago
Antoine Pitrou
4de7457009
Issue #17173 : Remove uses of locale-dependent C functions (isalpha() etc.) in the interpreter.
I've left a couple of them in: zlib (third-party lib), getaddrinfo.c
(doesn't include Python.h, and probably obsolete), _sre.c (legitimate
use for the re.LOCALE flag).
13 years ago
Serhiy Storchaka
18ba40b945
Check for NULL before the pointer aligning in fastsearch_memchr_1char.
There is no guarantee that NULL is aligned.
13 years ago
Christian Heimes
5f7e8dab11
Issue #16592 : stringlib_bytes_join doesn't raise MemoryError on allocation failure
13 years ago
Victor Stinner
ab60de478d
Issue #8271 : Fix compilation on Windows
13 years ago
Ezio Melotti
f7ed5d111b
#8271 : the utf-8 decoder now outputs the correct number of U+FFFD characters when used with the "replace" error handler on invalid utf-8 sequences. Patch by Serhiy Storchaka, tests by Ezio Melotti.
13 years ago
Mark Dickinson
fb90c0934c
Issue #14700 : Fix buggy overflow checks for large precision and width in new-style and old-style formatting.
13 years ago
Mark Dickinson
75d3600466
Issue #14700 : Fix buggy overflow checks for large precision and width in new-style and old-style formatting.
13 years ago
Antoine Pitrou
6f7b0da6bc
Issue #12805 : Make bytes.join and bytearray.join faster when the separator is empty.
Patch by Serhiy Storchaka.
13 years ago
Christian Heimes
743e0cd6b5
Issue #16166 : Add PY_LITTLE_ENDIAN and PY_BIG_ENDIAN macros and unified
endianess detection and handling.
13 years ago
Antoine Pitrou
cfc22b4a9b
Issue #15958 : bytes.join and bytearray.join now accept arbitrary buffer objects.
13 years ago
Antoine Pitrou
ca8aa4acf6
Issue #15144 : Fix possible integer overflow when handling pointers as integer values, by using Py_uintptr_t instead of size_t.
Patch by Serhiy Storchaka.
14 years ago
Victor Stinner
b3f5501250
Close #15534 : Fix a typo in the fast search function of the string library (_s => s)
Replace _s with ptr to avoid future confusion. Add also non regression tests.
14 years ago
Mark Dickinson
01ac8b6ab1
Use correct types for ASCII_CHAR_MASK integer constants.
14 years ago
Mark Dickinson
106c4145ff
Issue #14923 : Optimize continuation-byte check in UTF-8 decoding. Patch by Serhiy Storchaka.
14 years ago
Antoine Pitrou
a759d4e9f4
Make private function static (from `make smelly`)
14 years ago
Antoine Pitrou
27f6a3b0bf
Issue #15026 : utf-16 encoding is now significantly faster (up to 10x).
Patch by Serhiy Storchaka.
14 years ago
Victor Stinner
d7b7c7472b
Issue #14993 : Use standard "unsigned char" instead of a unsigned char bitfield
14 years ago
Victor Stinner
d3f0882dfb
Issue #14744 : Use the new _PyUnicodeWriter internal API to speed up str%args and str.format(args)
* Formatting string, int, float and complex use the _PyUnicodeWriter API. It
avoids a temporary buffer in most cases.
* Add _PyUnicodeWriter_WriteStr() to restore the PyAccu optimization: just
keep a reference to the string if the output is only composed of one string
* Disable overallocation when formatting the last argument of str%args and
str.format(args)
* Overallocation allocates at least 100 characters: add min_length attribute
to the _PyUnicodeWriter structure
* Add new private functions: _PyUnicode_FastCopyCharacters(),
_PyUnicode_FastFill() and _PyUnicode_FromASCII()
The speed up is around 20% in average.
14 years ago
Antoine Pitrou
63065d761e
Issue #14624 : UTF-16 decoding is now 3x to 4x faster on various inputs.
Patch by Serhiy Storchaka.
14 years ago
Antoine Pitrou
ca5f91b888
Issue #14738 : Speed-up UTF-8 decoding on non-ASCII data. Patch by Serhiy Storchaka.
14 years ago
Victor Stinner
3b1a74a9c3
Rename unicode_write_t structure and its methods to "_PyUnicodeWriter"
14 years ago
Victor Stinner
ee4544c920
Issue #14744 : Inline unicode_writer_write_char() and unicode_write_str()
Optimize also PyUnicode_Format(): call unicode_writer_prepare() only once
per argument.
14 years ago
Victor Stinner
202fdca133
Close #14716 : str.format() now uses the new "unicode writer" API instead of the
PyAccu API. For example, it makes str.format() from 25% to 30% faster on Linux.
14 years ago
Victor Stinner
41a863cb81
Issue #13706 : Fix format(int, "n") for locale with non-ASCII thousands separator
* Decode thousands separator and decimal point using PyUnicode_DecodeLocale()
(from the locale encoding), instead of decoding them implicitly from latin1
* Remove _PyUnicode_InsertThousandsGroupingLocale(), it was not used
* Change _PyUnicode_InsertThousandsGrouping() API to return the maximum
character if unicode is NULL
* Replace MIN/MAX macros by Py_MIN/Py_MAX
* stringlib/undef.h undefines STRINGLIB_IS_UNICODE
* stringlib/localeutil.h only supports Unicode
14 years ago
Benjamin Peterson
21e0da228d
remove some usage of Py_UNICODE_TOUPPER/LOWER
14 years ago
Victor Stinner
6099a03202
Issue #13624 : Write a specialized UTF-8 encoder to allow more optimization
The main bottleneck was the PyUnicode_READ() macro.
14 years ago
Victor Stinner
f8eac00779
Issue #13623 : Fix a performance regression introduced by issue #12170 in
bytes.find() and handle correctly OverflowError (raise the same ValueError than
the error for -1).
14 years ago
Victor Stinner
b37b17423b
Replace PyUnicode_FromUnicode(NULL, 0) by PyUnicode_New(0, 0)
Create an empty string with the new Unicode API.
14 years ago
Antoine Pitrou
0a3229de6b
Issue #13417 : speed up utf-8 decoding by around 2x for the non-fully-ASCII case.
This almost catches up with pre-PEP 393 performance, when decoding needed
only one pass.
14 years ago
Victor Stinner
0fc35196bb
stringlib: remove unused STRINGLIB_FILL
14 years ago
Victor Stinner
7931d9a951
Replace PyUnicodeObject type by PyObject
* _PyUnicode_CheckConsistency() now takes a PyObject* instead of void*
* Remove now useless casts to PyObject*
14 years ago
Victor Stinner
9db1a8b69f
Replace PyUnicodeObject* by PyObject* where it was irrevelant
A Unicode string can now be a PyASCIIObject, PyCompactUnicodeObject or
PyUnicodeObject. Aliasing a PyASCIIObject* or PyCompactUnicodeObject* to
PyUnicodeObject* is wrong
14 years ago
Antoine Pitrou
ac65d96777
Issue #12170 : The count(), find(), rfind(), index() and rindex() methods
of bytes and bytearray objects now accept an integer between 0 and 255
as their first argument. Patch by Petri Lehtinen.
14 years ago
Antoine Pitrou
5b9f4c1539
Fix typo
14 years ago
Antoine Pitrou
c198d0599b
Add a comment explaining this heuristic.
14 years ago
Antoine Pitrou
dda339e6d2
Simplify heuristic for when to use memchr
14 years ago
Antoine Pitrou
dd4e2f0153
Issue #13155 : Optimize finding the optimal character width of an unicode string
14 years ago
Victor Stinner
d218bf14cc
stringlib: Fix STRINGLIB_STR for UCS2/UCS4
14 years ago
Victor Stinner
8cc70dcf70
Fix fastsearch for UCS2 and UCS4
* If needle is 0, try (p[0] >> 16) & 0xff for UCS4
* Disable fastsearch_memchr_1char() if needle is zero for UCS2 and UCS4
14 years ago
Antoine Pitrou
2c3b2302ad
Issue #13134 : optimize finding single-character strings using memchr
14 years ago
Martin v. Löwis
c47adb04b3
Change PyUnicode_KIND to 1,2,4. Drop _KIND_SIZE and _CHARACTER_SIZE.
14 years ago