Benjamin Peterson
821e4cfd01
make fix_decimal_and_space_to_ascii check if it modifies the string
15 years ago
Benjamin Peterson
0c91392fe6
kill capwords implementation which has been disabled since the beginning
15 years ago
Benjamin Peterson
b2bf01d824
use full unicode mappings for upper/lower/title case ( #12736 )
Also broaden the category of characters that count as lowercase/uppercase.
15 years ago
Victor Stinner
3fe553160c
Add a new PyUnicode_Fill() function
It is faster than the unicode_fill() function which was implemented in
formatter_unicode.c.
15 years ago
Benjamin Peterson
5e458f520c
also decref the right thing
15 years ago
Benjamin Peterson
4c13a4a352
ready the correct string
15 years ago
Benjamin Peterson
22a29708fd
fix some possible refleaks from PyUnicode_READY error conditions
15 years ago
Benjamin Peterson
9ca3ffac94
== -1 is convention
15 years ago
Benjamin Peterson
e157cf1012
make switch more robust
15 years ago
Benjamin Peterson
c0b95d18fa
4 space indentation
15 years ago
Benjamin Peterson
ead6b53659
fix spacing around switch statements
15 years ago
Victor Stinner
6099a03202
Issue #13624 : Write a specialized UTF-8 encoder to allow more optimization
The main bottleneck was the PyUnicode_READ() macro.
15 years ago
Victor Stinner
73f53b57d1
Optimize str * n for len(str)==1 and UCS-2 or UCS-4
15 years ago
Victor Stinner
f644110816
Issue #13621 : Optimize str.replace(char1, char2)
Use findchar() which is more optimized than a dummy loop using
PyUnicode_READ(). PyUnicode_READ() is a complex and slow macro.
15 years ago
Victor Stinner
2f197078fb
The locale decoder raises a UnicodeDecodeError instead of an OSError
Search the invalid character using mbrtowc().
15 years ago
Victor Stinner
1b57967b96
Issue #13560 : Locale codec functions use the classic "errors" parameter,
instead of surrogateescape
So it would be possible to support more error handlers later.
15 years ago
Victor Stinner
ab59594326
What's New in Python 3.3: complete the deprecation list
Also add FIXMEs in unicodeobject.c
15 years ago
Victor Stinner
1f33f2b0c3
Issue #13560 : os.strerror() now uses the current locale encoding instead of UTF-8
15 years ago
Victor Stinner
f2ea71fcc8
Issue #13560 : Add PyUnicode_EncodeLocale()
* Use PyUnicode_EncodeLocale() in time.strftime() if wcsftime() is not
available
* Document my last changes in Misc/NEWS
15 years ago
Victor Stinner
af02e1c85a
Add PyUnicode_DecodeLocaleAndSize() and PyUnicode_DecodeLocale()
* PyUnicode_DecodeLocaleAndSize() and PyUnicode_DecodeLocale() decode a string
from the current locale encoding
* _Py_char2wchar() writes an "error code" in the size argument to indicate
if the function failed because of memory allocation failure or because of a
decoding error. The function doesn't write the error message directly to
stderr.
* Fix time.strftime() (if wcsftime() is missing): decode strftime() result
from the current locale encoding, not from the filesystem encoding.
15 years ago
Victor Stinner
16e6a80923
PyUnicode_Resize(): warn about canonical representation
Also call unicode_resize() directly in unicodeobject.c
15 years ago
Victor Stinner
b0a82a6a7f
Fix PyUnicode_Resize() for compact strings: leave the string unchanged on error
Also fix the PyUnicode_Resize() doc
15 years ago
Victor Stinner
bf6e560d0c
Make PyUnicode_Copy() private => _PyUnicode_Copy()
Undocument the function.
Also make decode_utf8_errors() private (static).
15 years ago
Victor Stinner
7a9105a380
resize_copy() now supports legacy ready strings
15 years ago
Victor Stinner
488fa49acf
Rewrite PyUnicode_Append(); unicode_modifiable() is more strict
* Rename unicode_resizable() to unicode_modifiable()
* Rename _PyUnicode_Dirty() to unicode_check_modifiable() to make it clear
that the function is private
* Inline PyUnicode_Concat() and unicode_append_inplace() in PyUnicode_Append()
to simplify the code
* unicode_modifiable() returns 0 if the hash has been computed or if the string
is not an exact unicode string
* Remove _PyUnicode_DIRTY(): no need to reset the hash anymore, because if the
hash has already been computed, you cannot modify a string inplace anymore
* PyUnicode_Concat() checks for integer overflow
15 years ago
Victor Stinner
c4b495497a
Create unicode_result_unchanged() subfunction
15 years ago
Victor Stinner
eaab604829
Fix fixup() for unchanged unicode subtype
If maxchar_new == 0 and self is a unicode subtype, return u instead of duplicating u.
15 years ago
Victor Stinner
e6b2d4407a
unicode_fromascii() doesn't check string content twice in debug mode
_PyUnicode_CheckConsistency() also checks string content.
15 years ago
Victor Stinner
a1d12bb119
Call PyUnicode_DecodeUTF8Stateful() directly instead of PyUnicode_DecodeUTF8()
* Remove micro-optimization from PyUnicode_FromStringAndSize():
PyUnicode_DecodeUTF8Stateful() already has these optimizations (for size=0
and one ascii char).
* Rename utf8_max_char_size_and_char_count() to utf8_scanner(), and remove a
useless variable
15 years ago
Victor Stinner
382955ff4e
Use unicode_empty directly instead of PyUnicode_New(0, 0)
15 years ago
Victor Stinner
785938eebd
Move the slowest UTF-8 decoder to its own subfunction
* Create decode_utf8_errors()
* Reuse unicode_fromascii()
* decode_utf8_errors() doesn't refit at the beginning
* Remove refit_partial_string(), use unicode_adjust_maxchar() instead
15 years ago
Victor Stinner
84def3774d
Fix error handling in resize_compact()
15 years ago
Victor Stinner
8faf8216e4
PyUnicode_FromWideChar() and PyUnicode_FromUnicode() raise a ValueError if a
character is not in range [U+0000; U+10ffff].
15 years ago
Victor Stinner
551ac95733
Py_UNICODE_HIGH_SURROGATE() and Py_UNICODE_LOW_SURROGATE() macros
And use the surrogate macros everywhere in unicodeobject.c
15 years ago
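For readers unfamiliar with what the surrogate macros in the commit above compute, here is a minimal sketch of the standard UTF-16 surrogate arithmetic. The function names are hypothetical and the bodies are written from the UTF-16 definition; they are not copied from the Py_UNICODE_HIGH_SURROGATE() and Py_UNICODE_LOW_SURROGATE() macro definitions, which may be expressed differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers illustrating UTF-16 surrogate pairs.
   A code point above U+FFFF is reduced by 0x10000, leaving 20 bits:
   the upper 10 go into the high surrogate, the lower 10 into the low one. */

static uint16_t
high_surrogate(uint32_t ch)
{
    assert(ch >= 0x10000 && ch <= 0x10FFFF);
    return (uint16_t)(0xD800 + ((ch - 0x10000) >> 10));
}

static uint16_t
low_surrogate(uint32_t ch)
{
    assert(ch >= 0x10000 && ch <= 0x10FFFF);
    return (uint16_t)(0xDC00 + ((ch - 0x10000) & 0x3FF));
}

/* Inverse operation: rebuild the code point from a valid pair. */
static uint32_t
join_surrogates(uint16_t high, uint16_t low)
{
    assert(high >= 0xD800 && high <= 0xDBFF);
    assert(low >= 0xDC00 && low <= 0xDFFF);
    return 0x10000 + (((uint32_t)(high - 0xD800) << 10)
                      | (uint32_t)(low - 0xDC00));
}
```

For example, U+1F600 splits into the pair 0xD83D, 0xDE00, and join_surrogates(0xD83D, 0xDE00) gives back 0x1F600.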
Victor Stinner
6345be9a14
Close #13093 : PyUnicode_EncodeDecimal() doesn't support error handlers
different than "strict" anymore. The caller was unable to compute the
size of the output buffer: it depends on the error handler.
15 years ago
Benjamin Peterson
1518e8713d
and back to the "magic" formula (with a comment) it is
15 years ago
Benjamin Peterson
5944c36931
cave to those who like readable code
15 years ago
Benjamin Peterson
0268675193
fix compiler warning by implementing this more cleverly
15 years ago
Victor Stinner
ca4f20782e
find_maxchar_surrogates() reuses surrogate macros
15 years ago
Victor Stinner
0d3721d986
Issue #13441 : Temporarily disable the check on the maximum character until
the Solaris issue is solved.
But add assertion on the maximum character in various encoders: UTF-7, UTF-8,
wide character (wchar_t*, Py_UNICODE*), unicode-escape, raw-unicode-escape.
Also fix unicode_encode_ucs1() for the backslashreplace error handler: Python is
now always "wide".
15 years ago
Victor Stinner
f8facacf30
Fix compiler warnings
15 years ago
Victor Stinner
b84d723509
(Merge 3.2) Issue #13093 : Fix error handling on PyUnicode_EncodeDecimal()
15 years ago
Victor Stinner
ab1d16b456
Issue #13093 : Fix error handling on PyUnicode_EncodeDecimal()
* Add tests for PyUnicode_EncodeDecimal() and PyUnicode_TransformDecimalToASCII()
* Remove the unused "e" variable in replace()
15 years ago
Victor Stinner
cfed46e00a
PyUnicode_FromKindAndData() fails with a ValueError if size < 0
15 years ago
Victor Stinner
42885206ec
UTF-8 decoder: set consumed value in the latin1 fast-path
15 years ago
Victor Stinner
d3df8ab377
Replace _PyUnicode_READY_REPLACE() and _PyUnicode_ReadyReplace() with unicode_ready()
* unicode_ready() has a simpler API
* try to reuse unicode_empty and latin1_char singleton everywhere
* Fix a reference leak in _PyUnicode_TranslateCharmap()
* PyUnicode_InternInPlace() doesn't try to get a singleton anymore, to avoid
having to handle a failure
15 years ago
Victor Stinner
f01245067a
Rewrite PyUnicode_TransformDecimalToASCII() to use the new Unicode API
15 years ago
Victor Stinner
2d718f39a5
Remove an unused variable from PyUnicode_Copy()
15 years ago
Victor Stinner
87af4f2f3a
Simplify PyUnicode_Copy()
Use PyUnicode_Copy() in fixup()
15 years ago
Victor Stinner
5bbe5e7c85
Fix a compiler warning in _PyUnicode_CheckConsistency()
15 years ago