Victor Stinner
9e6b4d715c
Issue #18408 : _PyUnicodeWriter_Finish() now clears its buffer attribute in all
cases, so _PyUnicodeWriter_Dealloc() can be called after finish.
13 years ago
Victor Stinner
15a0bd3965
Issue #18408 : Fix _PyUnicodeWriter_Finish(): clear writer->buffer,
so _PyUnicodeWriter_Dealloc() can be called on the writer after finish.
13 years ago
Victor Stinner
6f8eeee7b9
Issue #18203 : Fix _Py_DecodeUTF8_surrogateescape(), use PyMem_RawMalloc() as _Py_char2wchar()
13 years ago
Victor Stinner
1a7425f67a
Issue #18203 : Replace malloc() with PyMem_RawMalloc() at Python initialization
* Replace malloc() with PyMem_RawMalloc()
* Replace PyMem_Malloc() with PyMem_RawMalloc() where the GIL is not held.
* _Py_char2wchar() now returns a buffer allocated by PyMem_RawMalloc(), instead
of PyMem_Malloc()
13 years ago
Christian Heimes
d47802eef7
Fix ref leak in error case of unicode find, count, formatlong
CID 983315: Resource leak (RESOURCE_LEAK)
CID 983316: Resource leak (RESOURCE_LEAK)
CID 983317: Resource leak (RESOURCE_LEAK)
13 years ago
Christian Heimes
d47a0456b1
Fix ref leak in error case of unicode index
CID 983319 (#1 of 2): Resource leak (RESOURCE_LEAK)
leaked_storage: Variable substring going out of scope leaks the storage it points to.
13 years ago
Christian Heimes
ea71a525c3
Fix ref leak in error case of unicode rindex and rfind
CID 983320: Resource leak (RESOURCE_LEAK)
CID 983321: Resource leak (RESOURCE_LEAK)
leaked_storage: Variable substring going out of scope leaks the storage it points to.
13 years ago
Christian Heimes
305e49e17e
Fix memory leak in endswith
CID 1040368 (#1 of 1): Resource leak (RESOURCE_LEAK)
leaked_storage: Variable substring going out of scope leaks the storage it points to.
13 years ago
Serhiy Storchaka
8eeae2126c
Issue #18184 : PyUnicode_FromFormat() and PyUnicode_FromFormatV() now raise
OverflowError when an argument of %c format is out of range.
13 years ago
Benjamin Peterson
7e30373126
remove MAX_MAXCHAR because it's unsafe for computing maximum codepoitn value (see #18183 )
13 years ago
Victor Stinner
9f067f490f
Issue #9566 : Fix compiler warning on Windows 64-bit
13 years ago
Antoine Pitrou
8b0e98426d
Issue #17237 : Fix crash in the ASCII decoder on m68k.
13 years ago
Victor Stinner
f4f24248dc
Fix uninitialized value in charmap_decode_mapping()
13 years ago
Victor Stinner
8cecc8c262
Issue #7330 : Implement width and precision (ex: "%5.3s") for the format string
of PyUnicode_FromFormat() function, original patch written by Ysj Ray.
13 years ago
Victor Stinner
bb4503f61e
Partial revert of changeset 9744b2df134c
PyUnicode_Append() cannot call directly resize_compact(): I forgot that a
string can be ready *and* not compact (a legacy string can also be ready).
13 years ago
Victor Stinner
fb161b1b6d
Split PyUnicode_DecodeCharmap() into subfunction for readability
13 years ago
Victor Stinner
170ca6f84b
Fix bug in Unicode decoders related to _PyUnicodeWriter
Bug introduced by changesets 7ed9993d53b4 and edf029fc9591.
13 years ago
Victor Stinner
376cfa122d
Fix typo in unicode_decode_call_errorhandler_writer()
Bug introduced by changeset 7ed9993d53b4.
13 years ago
Victor Stinner
8f674ccd64
Close #17694 : Add minimum length to _PyUnicodeWriter
* Add also min_char attribute to _PyUnicodeWriter structure (currently unused)
* _PyUnicodeWriter_Init() has no more argument (except the writer itself):
min_length and overallocate must be set explicitly
* In error handlers, only enable overallocation if the replacement string
is longer than 1 character
* CJK decoders don't use overallocation anymore
* Set min_length, instead of preallocating memory using
_PyUnicodeWriter_Prepare(), in many decoders
* _PyUnicode_DecodeUnicodeInternal() checks for integer overflow
13 years ago
Victor Stinner
77282cb4f8
Cleanup PyUnicode_Contains()
* No need to double-check that strings are ready: test already done by
PyUnicode_FromObject()
* Remove useless kind variable (use kind1 instead)
13 years ago
Victor Stinner
d92e078c8d
Minor change: fix character in do_strip() for the ASCII case
13 years ago
Victor Stinner
f033510fee
Cleanup PyUnicode_Append()
* Check also that right is a Unicode object
* call directly resize_compact() instead of unicode_resize() for a more
explicit error handling, and to avoid testing some properties twice
(ex: unicode_modifiable())
13 years ago
Victor Stinner
4560f9c63f
PyUnicode_Join(): move use_memcpy test out of the loop to cleanup and optimize the code
13 years ago
Victor Stinner
55c08781e8
Optimize repr(str): use _PyUnicode_FastCopyCharacters() when no character is escaped
13 years ago
Victor Stinner
af03757d20
Optimize ascii(str): don't encode/decode repr if repr is already ASCII
13 years ago
Victor Stinner
8a1a6cffd6
Add _PyUnicodeWriter_WriteCharInline()
13 years ago
Serhiy Storchaka
e2cef885a2
Issue #16061 : Speed up str.replace() for replacing 1-character strings.
13 years ago
Victor Stinner
a0dd0213cc
Close #17693 : Rewrite CJK decoders to use the _PyUnicodeWriter API instead of
the legacy Py_UNICODE API.
Add also a new _PyUnicodeWriter_WriteChar() function.
13 years ago
Victor Stinner
247109e74d
Issue #17615 : On Windows (VS2010), Performances of wmemcmp() to compare Unicode
strings are not convincing. For UCS2 (16-bit wchar_t type), use a dummy loop
instead of wmemcmp(). The dummy loop is as fast, or a little bit faster.
wchar_t is only 16-bit long on Windows. wmemcmp() is still used for 32-bit
wchar_t.
13 years ago
Victor Stinner
0cff4b16d9
replace(): only call PyUnicode_DATA(u) once
13 years ago
Victor Stinner
cc7af72192
Write super-fast version of str.strip(), str.lstrip() and str.rstrip() for pure ASCII
13 years ago
Victor Stinner
f50a4e9bc9
Don't calls macros in PyUnicode_WRITE() parameters
PyUnicode_WRITE() expands some parameters twice or more.
13 years ago
Victor Stinner
9c79e41fc5
Fix do_strip(): don't call PyUnicode_READ() in Py_UNICODE_ISSPACE() to not call
it twice
13 years ago
Victor Stinner
b3a6014504
Fix _PyUnicode_XStrip()
Inline the BLOOM_MEMBER() to only call PyUnicode_READ() only once (per loop
iteration). Store also the length of the seperator in a variable to avoid calls
to PyUnicode_GET_LENGTH().
13 years ago
Victor Stinner
63d5c1a14a
Optimize PyUnicode_DecodeCharmap()
Avoid expensive PyUnicode_READ() and PyUnicode_WRITE(), manipulate pointers
instead.
13 years ago
Victor Stinner
a85af502a4
Optimize make_bloom_mask(), used by str.strip(), str.lstrip() and str.rstrip()
Write specialized functions per Unicode kind to avoid the expensive
PyUnicode_READ() macro.
13 years ago
Victor Stinner
69ed0f4c86
Use PyUnicode_READ() instead of PyUnicode_READ_CHAR()
"PyUnicode_READ_CHAR() is less efficient than PyUnicode_READ() because it calls
PyUnicode_KIND() and might call it twice." according to its documentation.
13 years ago
Victor Stinner
03c3e35d42
Add fast-path in PyUnicode_DecodeCharmap() for pure 8 bit encodings:
cp037, cp500 and iso8859_1 codecs
13 years ago
Victor Stinner
cd777eaf53
Issue #17615 : Comparing two Unicode strings now uses wmemcmp() when possible
wmemcmp() is twice faster than a dummy loop (342 usec vs 744 usec) on Fedora
18/x86_64, GCC 4.7.2.
13 years ago
Victor Stinner
c1302bba4c
Issue #17615 : Expand expensive PyUnicode_READ() macro in unicode_compare():
write specialized functions for each combination of Unicode kinds.
13 years ago
Victor Stinner
207dd38726
fix unused variable
13 years ago
Victor Stinner
eb4b5ac8af
Close #16757 : Avoid calling the expensive _PyUnicode_FindMaxChar() function
when possible
13 years ago
Victor Stinner
cfc4c13b04
Add _PyUnicodeWriter_WriteSubstring() function
Write a function to enable more optimizations:
* If the substring is the whole string and overallocation is disabled, just
keep a reference to the string, don't copy characters
* Avoid a call to the expensive _PyUnicode_FindMaxChar() function when
possible
13 years ago
Benjamin Peterson
da2c7ebd23
allow any type with __getitem__ to be a mapping for the purposes of % ( #15801 )
13 years ago
Raymond Hettinger
378170d5d9
Issue 17447: Clarify that str.isidentifier doesn't check for reserved keywords.
13 years ago
Victor Stinner
2cb16aa3cb
_PyUnicode_Writer() now also reuses Unicode singletons:
empty string and latin1 single character
13 years ago
Victor Stinner
cf77da9fb5
Backed out changeset b9f7b1bf36aa
13 years ago
Victor Stinner
313cac88c5
Issue #17223 : Fix PyUnicode_FromUnicode() on Windows (16-bit wchar_t type)
to reject invalid UTF-16 surrogate.
13 years ago
Victor Stinner
d21b58c05d
Issue #17223 : Fix PyUnicode_FromUnicode() for string of 1 character outside
the range U+0000-U+10ffff.
13 years ago
Victor Stinner
bbbac2ec34
Issue #17137 : When an Unicode string is resized, the internal wide character
string (wstr) format is now cleared.
13 years ago