
1075 lines
41 KiB

Merged revisions 55007-55179 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/branches/p3yk

........
  r55077 | guido.van.rossum | 2007-05-02 11:54:37 -0700 (Wed, 02 May 2007) | 2 lines

  Use the new print syntax, at least.
........
  r55142 | fred.drake | 2007-05-04 21:27:30 -0700 (Fri, 04 May 2007) | 1 line

  remove old cruftiness
........
  r55143 | fred.drake | 2007-05-04 21:52:16 -0700 (Fri, 04 May 2007) | 1 line

  make this work with the new Python
........
  r55162 | neal.norwitz | 2007-05-06 22:29:18 -0700 (Sun, 06 May 2007) | 1 line

  Get asdl code gen working with Python 2.3. Should continue to work with 3.0
........
  r55164 | neal.norwitz | 2007-05-07 00:00:38 -0700 (Mon, 07 May 2007) | 1 line

  Verify checkins to p3yk (sic) branch go to 3000 list.
........
  r55166 | neal.norwitz | 2007-05-07 00:12:35 -0700 (Mon, 07 May 2007) | 1 line

  Fix this test so it runs again by importing warnings_test properly.
........
  r55167 | neal.norwitz | 2007-05-07 01:03:22 -0700 (Mon, 07 May 2007) | 8 lines

  So long xrange. range() now supports values that are outside
  -sys.maxint to sys.maxint. floats raise a TypeError.

  This has been sitting for a long time. It probably has some problems and
  needs cleanup. Objects/rangeobject.c now uses 4-space indents since
  it is almost completely new.
........
  r55171 | guido.van.rossum | 2007-05-07 10:21:26 -0700 (Mon, 07 May 2007) | 4 lines

  Fix two tests that were previously depending on significant spaces
  at the end of a line (and before that on Python 2.x print behavior
  that has no exact equivalent in 3.0).
........
19 years ago
Merged revisions 68633,68648,68667,68706,68718,68720-68721,68724-68727,68739 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r68633 | thomas.heller | 2009-01-16 12:53:44 -0600 (Fri, 16 Jan 2009) | 3 lines

  Change an example in the docs to avoid a mistake when the code is copy
  pasted and changed afterwards.
........
  r68648 | benjamin.peterson | 2009-01-16 22:28:57 -0600 (Fri, 16 Jan 2009) | 1 line

  use enumerate
........
  r68667 | amaury.forgeotdarc | 2009-01-17 14:18:59 -0600 (Sat, 17 Jan 2009) | 3 lines

  #4077: No need to append \n when calling Py_FatalError
  + fix a declaration to make it match the one in pythonrun.h
........
  r68706 | benjamin.peterson | 2009-01-17 19:28:46 -0600 (Sat, 17 Jan 2009) | 1 line

  fix grammar
........
  r68718 | georg.brandl | 2009-01-18 04:42:35 -0600 (Sun, 18 Jan 2009) | 1 line

  #4976: union() and intersection() take multiple args, but talk about "the other".
........
  r68720 | georg.brandl | 2009-01-18 04:45:22 -0600 (Sun, 18 Jan 2009) | 1 line

  #4974: fix redundant mention of lists and tuples.
........
  r68721 | georg.brandl | 2009-01-18 04:48:16 -0600 (Sun, 18 Jan 2009) | 1 line

  #4914: trunc is in math.
........
  r68724 | georg.brandl | 2009-01-18 07:24:10 -0600 (Sun, 18 Jan 2009) | 1 line

  #4979: correct result range for some random functions.
........
  r68725 | georg.brandl | 2009-01-18 07:47:26 -0600 (Sun, 18 Jan 2009) | 1 line

  #4857: fix augmented assignment target spec.
........
  r68726 | georg.brandl | 2009-01-18 08:41:52 -0600 (Sun, 18 Jan 2009) | 1 line

  #4923: clarify what was added.
........
  r68727 | georg.brandl | 2009-01-18 12:25:30 -0600 (Sun, 18 Jan 2009) | 1 line

  #4986: augassigns are not expressions.
........
  r68739 | benjamin.peterson | 2009-01-18 15:11:38 -0600 (Sun, 18 Jan 2009) | 1 line

  fix test that wasn't working as expected #4990
........
17 years ago
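Most tests in the listing follow one protocol. A callable registered with `codecs.register_error` receives the `UnicodeEncodeError` or `UnicodeDecodeError`. It inspects `exc.object[exc.start:exc.end]` and returns a `(replacement, resume_position)` tuple. When encoding, the replacement string is itself passed through the target codec (compare the comment in `test_charmapencode`).

The sketch below is illustrative only. The handler `hex_replace`, its `<U+XXXX>`/`<0xXX>` output format, and the registered name `example.hexreplace` are hypothetical and are not part of the test suite.

```python
import codecs


def hex_replace(exc):
    """Hypothetical handler: spell out each offending code point or byte."""
    if isinstance(exc, UnicodeEncodeError):
        # exc.object is the str being encoded; the slice holds the unencodable run
        bad = exc.object[exc.start:exc.end]
        return ("".join("<U+%04X>" % ord(c) for c in bad), exc.end)
    if isinstance(exc, UnicodeDecodeError):
        # exc.object is bytes; iterating over it yields ints
        bad = exc.object[exc.start:exc.end]
        return ("".join("<0x%02X>" % b for b in bad), exc.end)
    # Same convention as the handlers in the tests: refuse other error types
    raise TypeError("don't know how to handle %r" % exc)


# "example.hexreplace" is a made-up name used only for this sketch
codecs.register_error("example.hexreplace", hex_replace)

encoded = "g\xfcrk \u20ac".encode("ascii", "example.hexreplace")
decoded = b"a\xffb".decode("ascii", "example.hexreplace")
print(encoded)  # b'g<U+00FC>rk <U+20AC>'
print(decoded)  # a<0xFF>b
```

Returning `exc.end` resumes processing just after the offending span. A handler that returns a position at or before `exc.start` makes the codec re-examine the same input; this is why `PosReturn.handle` in the listing forces an advance on the following call.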
import codecs
import html.entities
import sys
import test.support
import unicodedata
import unittest
import warnings


class PosReturn:
    # this can be used for configurable callbacks

    def __init__(self):
        self.pos = 0

    def handle(self, exc):
        oldpos = self.pos
        realpos = oldpos
        if realpos<0:
            realpos = len(exc.object) + realpos
        # if we don't advance this time, terminate on the next call
        # otherwise we'd get an endless loop
        if realpos <= exc.start:
            self.pos = len(exc.object)
        return ("<?>", oldpos)

# A UnicodeEncodeError object with a bad start attribute
class BadStartUnicodeEncodeError(UnicodeEncodeError):
    def __init__(self):
        UnicodeEncodeError.__init__(self, "ascii", "", 0, 1, "bad")
        self.start = []

# A UnicodeEncodeError object with a bad object attribute
class BadObjectUnicodeEncodeError(UnicodeEncodeError):
    def __init__(self):
        UnicodeEncodeError.__init__(self, "ascii", "", 0, 1, "bad")
        self.object = []

# A UnicodeDecodeError object without an end attribute
class NoEndUnicodeDecodeError(UnicodeDecodeError):
    def __init__(self):
        UnicodeDecodeError.__init__(self, "ascii", bytearray(b""), 0, 1, "bad")
        del self.end

# A UnicodeDecodeError object with a bad object attribute
class BadObjectUnicodeDecodeError(UnicodeDecodeError):
    def __init__(self):
        UnicodeDecodeError.__init__(self, "ascii", bytearray(b""), 0, 1, "bad")
        self.object = []

# A UnicodeTranslateError object without a start attribute
class NoStartUnicodeTranslateError(UnicodeTranslateError):
    def __init__(self):
        UnicodeTranslateError.__init__(self, "", 0, 1, "bad")
        del self.start

# A UnicodeTranslateError object without an end attribute
class NoEndUnicodeTranslateError(UnicodeTranslateError):
    def __init__(self):
        UnicodeTranslateError.__init__(self,  "", 0, 1, "bad")
        del self.end

# A UnicodeTranslateError object without an object attribute
class NoObjectUnicodeTranslateError(UnicodeTranslateError):
    def __init__(self):
        UnicodeTranslateError.__init__(self, "", 0, 1, "bad")
        del self.object


class CodecCallbackTest(unittest.TestCase):

    def test_xmlcharrefreplace(self):
        # replace unencodable characters with numeric character entities.
        # For ascii, latin-1 and charmaps this is completely implemented
        # in C and should be reasonably fast.
        s = "\u30b9\u30d1\u30e2 \xe4nd eggs"
        self.assertEqual(
            s.encode("ascii", "xmlcharrefreplace"),
            b"&#12473;&#12497;&#12514; &#228;nd eggs"
        )
        self.assertEqual(
            s.encode("latin-1", "xmlcharrefreplace"),
            b"&#12473;&#12497;&#12514; \xe4nd eggs"
        )

    def test_xmlcharnamereplace(self):
        # This time use a named character entity for unencodable
        # characters, if one is available.

        def xmlcharnamereplace(exc):
            if not isinstance(exc, UnicodeEncodeError):
                raise TypeError("don't know how to handle %r" % exc)
            l = []
            for c in exc.object[exc.start:exc.end]:
                try:
                    l.append("&%s;" % html.entities.codepoint2name[ord(c)])
                except KeyError:
                    l.append("&#%d;" % ord(c))
            return ("".join(l), exc.end)

        codecs.register_error(
            "test.xmlcharnamereplace", xmlcharnamereplace)

        sin = "\xab\u211c\xbb = \u2329\u1234\u20ac\u232a"
        sout = b"&laquo;&real;&raquo; = &lang;&#4660;&euro;&rang;"
        self.assertEqual(sin.encode("ascii", "test.xmlcharnamereplace"), sout)
        sout = b"\xab&real;\xbb = &lang;&#4660;&euro;&rang;"
        self.assertEqual(sin.encode("latin-1", "test.xmlcharnamereplace"), sout)
        sout = b"\xab&real;\xbb = &lang;&#4660;\xa4&rang;"
        self.assertEqual(sin.encode("iso-8859-15", "test.xmlcharnamereplace"), sout)

    def test_uninamereplace(self):
        # We're using the names from the unicode database this time,
        # and we're doing "syntax highlighting" here, i.e. we include
        # the replaced text in ANSI escape sequences. For this it is
        # useful that the error handler is not called for every single
        # unencodable character, but for a complete sequence of
        # unencodable characters, otherwise we would output many
        # unnecessary escape sequences.

        def uninamereplace(exc):
            if not isinstance(exc, UnicodeEncodeError):
                raise TypeError("don't know how to handle %r" % exc)
            l = []
            for c in exc.object[exc.start:exc.end]:
                l.append(unicodedata.name(c, "0x%x" % ord(c)))
            return ("\033[1m%s\033[0m" % ", ".join(l), exc.end)

        codecs.register_error(
            "test.uninamereplace", uninamereplace)

        sin = "\xac\u1234\u20ac\u8000"
        sout = b"\033[1mNOT SIGN, ETHIOPIC SYLLABLE SEE, EURO SIGN, CJK UNIFIED IDEOGRAPH-8000\033[0m"
        self.assertEqual(sin.encode("ascii", "test.uninamereplace"), sout)

        sout = b"\xac\033[1mETHIOPIC SYLLABLE SEE, EURO SIGN, CJK UNIFIED IDEOGRAPH-8000\033[0m"
        self.assertEqual(sin.encode("latin-1", "test.uninamereplace"), sout)

        sout = b"\xac\033[1mETHIOPIC SYLLABLE SEE\033[0m\xa4\033[1mCJK UNIFIED IDEOGRAPH-8000\033[0m"
        self.assertEqual(sin.encode("iso-8859-15", "test.uninamereplace"), sout)

    def test_backslashescape(self):
        # Does the same as the "unicode-escape" encoding, but with different
        # base encodings.
        sin = "a\xac\u1234\u20ac\u8000\U0010ffff"
        sout = b"a\\xac\\u1234\\u20ac\\u8000\\U0010ffff"
        self.assertEqual(sin.encode("ascii", "backslashreplace"), sout)

        sout = b"a\xac\\u1234\\u20ac\\u8000\\U0010ffff"
        self.assertEqual(sin.encode("latin-1", "backslashreplace"), sout)

        sout = b"a\xac\\u1234\xa4\\u8000\\U0010ffff"
        self.assertEqual(sin.encode("iso-8859-15", "backslashreplace"), sout)

    def test_nameescape(self):
        # Does the same as backslashescape, but prefers ``\N{...}`` escape
        # sequences.
        sin = "a\xac\u1234\u20ac\u8000\U0010ffff"
        sout = (b'a\\N{NOT SIGN}\\N{ETHIOPIC SYLLABLE SEE}\\N{EURO SIGN}'
                b'\\N{CJK UNIFIED IDEOGRAPH-8000}\\U0010ffff')
        self.assertEqual(sin.encode("ascii", "namereplace"), sout)

        sout = (b'a\xac\\N{ETHIOPIC SYLLABLE SEE}\\N{EURO SIGN}'
                b'\\N{CJK UNIFIED IDEOGRAPH-8000}\\U0010ffff')
        self.assertEqual(sin.encode("latin-1", "namereplace"), sout)

        sout = (b'a\xac\\N{ETHIOPIC SYLLABLE SEE}\xa4'
                b'\\N{CJK UNIFIED IDEOGRAPH-8000}\\U0010ffff')
        self.assertEqual(sin.encode("iso-8859-15", "namereplace"), sout)

    def test_decoding_callbacks(self):
        # This is a test for a decoding callback handler
        # that allows the decoding of the invalid sequence
        # "\xc0\x80" and returns "\x00" instead of raising an error.
        # All other illegal sequences will be handled strictly.
        def relaxedutf8(exc):
            if not isinstance(exc, UnicodeDecodeError):
                raise TypeError("don't know how to handle %r" % exc)
            if exc.object[exc.start:exc.start+2] == b"\xc0\x80":
                return ("\x00", exc.start+2) # retry after two bytes
            else:
                raise exc

        codecs.register_error("test.relaxedutf8", relaxedutf8)

        # all the "\xc0\x80" will be decoded to "\x00"
        sin = b"a\x00b\xc0\x80c\xc3\xbc\xc0\x80\xc0\x80"
        sout = "a\x00b\x00c\xfc\x00\x00"
        self.assertEqual(sin.decode("utf-8", "test.relaxedutf8"), sout)

        # "\xc0\x81" is not valid and a UnicodeDecodeError will be raised
        sin = b"\xc0\x80\xc0\x81"
        self.assertRaises(UnicodeDecodeError, sin.decode,
                          "utf-8", "test.relaxedutf8")

    def test_charmapencode(self):
        # For charmap encodings the replacement string will be
        # mapped through the encoding again. This means that
        # to be able to use e.g. the "replace" handler, the
        # charmap has to have a mapping for "?".
        charmap = dict((ord(c), bytes(2*c.upper(), 'ascii')) for c in "abcdefgh")
        sin = "abc"
        sout = b"AABBCC"
        self.assertEqual(codecs.charmap_encode(sin, "strict", charmap)[0], sout)

        sin = "abcA"
        self.assertRaises(UnicodeError, codecs.charmap_encode, sin, "strict", charmap)

        charmap[ord("?")] = b"XYZ"
        sin = "abcDEF"
        sout = b"AABBCCXYZXYZXYZ"
        self.assertEqual(codecs.charmap_encode(sin, "replace", charmap)[0], sout)

        charmap[ord("?")] = "XYZ" # wrong type in mapping
        self.assertRaises(TypeError, codecs.charmap_encode, sin, "replace", charmap)

    def test_decodeunicodeinternal(self):
        with test.support.check_warnings(('unicode_internal codec has been '
                                          'deprecated', DeprecationWarning)):
            self.assertRaises(
                UnicodeDecodeError,
                b"\x00\x00\x00\x00\x00".decode,
                "unicode-internal",
            )
            if len('\0'.encode('unicode-internal')) == 4:
                def handler_unicodeinternal(exc):
                    if not isinstance(exc, UnicodeDecodeError):
                        raise TypeError("don't know how to handle %r" % exc)
                    return ("\x01", 1)

                self.assertEqual(
                    b"\x00\x00\x00\x00\x00".decode("unicode-internal", "ignore"),
                    "\u0000"
                )

                self.assertEqual(
                    b"\x00\x00\x00\x00\x00".decode("unicode-internal", "replace"),
                    "\u0000\ufffd"
                )

                self.assertEqual(
                    b"\x00\x00\x00\x00\x00".decode("unicode-internal", "backslashreplace"),
                    "\u0000\\x00"
                )

                codecs.register_error("test.hui", handler_unicodeinternal)

                self.assertEqual(
                    b"\x00\x00\x00\x00\x00".decode("unicode-internal", "test.hui"),
                    "\u0000\u0001\u0000"
                )

    def test_callbacks(self):
        def handler1(exc):
            r = range(exc.start, exc.end)
            if isinstance(exc, UnicodeEncodeError):
                l = ["<%d>" % ord(exc.object[pos]) for pos in r]
            elif isinstance(exc, UnicodeDecodeError):
                l = ["<%d>" % exc.object[pos] for pos in r]
            else:
                raise TypeError("don't know how to handle %r" % exc)
            return ("[%s]" % "".join(l), exc.end)

        codecs.register_error("test.handler1", handler1)

        def handler2(exc):
            if not isinstance(exc, UnicodeDecodeError):
                raise TypeError("don't know how to handle %r" % exc)
            l = ["<%d>" % exc.object[pos] for pos in range(exc.start, exc.end)]
            return ("[%s]" % "".join(l), exc.end+1) # skip one character

        codecs.register_error("test.handler2", handler2)

        s = b"\x00\x81\x7f\x80\xff"

        self.assertEqual(
            s.decode("ascii", "test.handler1"),
            "\x00[<129>]\x7f[<128>][<255>]"
        )
        self.assertEqual(
            s.decode("ascii", "test.handler2"),
            "\x00[<129>][<128>]"
        )

        self.assertEqual(
            b"\\u3042\\u3xxx".decode("unicode-escape", "test.handler1"),
            "\u3042[<92><117><51>]xxx"
        )

        self.assertEqual(
            b"\\u3042\\u3xx".decode("unicode-escape", "test.handler1"),
            "\u3042[<92><117><51>]xx"
        )

        self.assertEqual(
            codecs.charmap_decode(b"abc", "test.handler1", {ord("a"): "z"})[0],
            "z[<98>][<99>]"
        )

        self.assertEqual(
            "g\xfc\xdfrk".encode("ascii", "test.handler1"),
            b"g[<252><223>]rk"
        )

        self.assertEqual(
            "g\xfc\xdf".encode("ascii", "test.handler1"),
            b"g[<252><223>]"
        )

    def test_longstrings(self):
        # test long strings to check for memory overflow problems
        errors = [ "strict", "ignore", "replace", "xmlcharrefreplace",
                   "backslashreplace", "namereplace"]
        # register the handlers under different names,
        # to prevent the codec from recognizing the name
        for err in errors:
            codecs.register_error("test." + err, codecs.lookup_error(err))
        l = 1000
        errors += [ "test." + err for err in errors ]
        for uni in [ s*l for s in ("x", "\u3042", "a\xe4") ]:
            for enc in ("ascii", "latin-1", "iso-8859-1", "iso-8859-15",
                        "utf-8", "utf-7", "utf-16", "utf-32"):
                for err in errors:
                    try:
                        uni.encode(enc, err)
                    except UnicodeError:
                        pass

    def check_exceptionobjectargs(self, exctype, args, msg):
        # Test UnicodeError subclasses: construction, attribute assignment and __str__ conversion
        # check with one missing argument
        self.assertRaises(TypeError, exctype, *args[:-1])
        # check with one argument too much
        self.assertRaises(TypeError, exctype, *(args + ["too much"]))
        # check with one argument of the wrong type
        wrongargs = [ "spam", b"eggs", b"spam", 42, 1.0, None ]
        for i in range(len(args)):
            for wrongarg in wrongargs:
                if type(wrongarg) is type(args[i]):
                    continue
                # build argument array
                callargs = []
                for j in range(len(args)):
                    if i==j:
                        callargs.append(wrongarg)
                    else:
                        callargs.append(args[i])
                self.assertRaises(TypeError, exctype, *callargs)

        # check with the correct number and type of arguments
        exc = exctype(*args)
        self.assertEqual(str(exc), msg)

    def test_unicodeencodeerror(self):
        self.check_exceptionobjectargs(
            UnicodeEncodeError,
            ["ascii", "g\xfcrk", 1, 2, "ouch"],
            "'ascii' codec can't encode character '\\xfc' in position 1: ouch"
        )
        self.check_exceptionobjectargs(
            UnicodeEncodeError,
            ["ascii", "g\xfcrk", 1, 4, "ouch"],
            "'ascii' codec can't encode characters in position 1-3: ouch"
        )
        self.check_exceptionobjectargs(
            UnicodeEncodeError,
            ["ascii", "\xfcx", 0, 1, "ouch"],
            "'ascii' codec can't encode character '\\xfc' in position 0: ouch"
        )
        self.check_exceptionobjectargs(
            UnicodeEncodeError,
            ["ascii", "\u0100x", 0, 1, "ouch"],
            "'ascii' codec can't encode character '\\u0100' in position 0: ouch"
        )
        self.check_exceptionobjectargs(
            UnicodeEncodeError,
            ["ascii", "\uffffx", 0, 1, "ouch"],
            "'ascii' codec can't encode character '\\uffff' in position 0: ouch"
        )
        self.check_exceptionobjectargs(
            UnicodeEncodeError,
            ["ascii", "\U00010000x", 0, 1, "ouch"],
            "'ascii' codec can't encode character '\\U00010000' in position 0: ouch"
        )

    def test_unicodedecodeerror(self):
        self.check_exceptionobjectargs(
            UnicodeDecodeError,
            ["ascii", bytearray(b"g\xfcrk"), 1, 2, "ouch"],
            "'ascii' codec can't decode byte 0xfc in position 1: ouch"
        )
        self.check_exceptionobjectargs(
            UnicodeDecodeError,
            ["ascii", bytearray(b"g\xfcrk"), 1, 3, "ouch"],
            "'ascii' codec can't decode bytes in position 1-2: ouch"
        )

    def test_unicodetranslateerror(self):
        self.check_exceptionobjectargs(
            UnicodeTranslateError,
            ["g\xfcrk", 1, 2, "ouch"],
            "can't translate character '\\xfc' in position 1: ouch"
        )
        self.check_exceptionobjectargs(
            UnicodeTranslateError,
            ["g\u0100rk", 1, 2, "ouch"],
            "can't translate character '\\u0100' in position 1: ouch"
        )
        self.check_exceptionobjectargs(
            UnicodeTranslateError,
            ["g\uffffrk", 1, 2, "ouch"],
            "can't translate character '\\uffff' in position 1: ouch"
        )
        self.check_exceptionobjectargs(
            UnicodeTranslateError,
            ["g\U00010000rk", 1, 2, "ouch"],
            "can't translate character '\\U00010000' in position 1: ouch"
        )
        self.check_exceptionobjectargs(
            UnicodeTranslateError,
            ["g\xfcrk", 1, 3, "ouch"],
            "can't translate characters in position 1-2: ouch"
        )

    def test_badandgoodstrictexceptions(self):
        # "strict" complains about a non-exception passed in
        self.assertRaises(
            TypeError,
            codecs.strict_errors,
            42
        )
        # "strict" complains about the wrong exception type
        self.assertRaises(
            Exception,
            codecs.strict_errors,
            Exception("ouch")
        )

        # If the correct exception is passed in, "strict" raises it
        self.assertRaises(
            UnicodeEncodeError,
            codecs.strict_errors,
            UnicodeEncodeError("ascii", "\u3042", 0, 1, "ouch")
        )
        self.assertRaises(
            UnicodeDecodeError,
            codecs.strict_errors,
            UnicodeDecodeError("ascii", bytearray(b"\xff"), 0, 1, "ouch")
        )
        self.assertRaises(
            UnicodeTranslateError,
            codecs.strict_errors,
            UnicodeTranslateError("\u3042", 0, 1, "ouch")
        )

    def test_badandgoodignoreexceptions(self):
        # "ignore" complains about a non-exception passed in
        self.assertRaises(
           TypeError,
           codecs.ignore_errors,
           42
        )
        # "ignore" complains about the wrong exception type
        self.assertRaises(
           TypeError,
           codecs.ignore_errors,
           UnicodeError("ouch")
        )
        # If the correct exception is passed in, "ignore" returns an empty replacement
        self.assertEqual(
            codecs.ignore_errors(
                UnicodeEncodeError("ascii", "a\u3042b", 1, 2, "ouch")),
            ("", 2)
        )
        self.assertEqual(
            codecs.ignore_errors(
                UnicodeDecodeError("ascii", bytearray(b"a\xffb"), 1, 2, "ouch")),
            ("", 2)
        )
        self.assertEqual(
            codecs.ignore_errors(
                UnicodeTranslateError("a\u3042b", 1, 2, "ouch")),
            ("", 2)
        )

    def test_badandgoodreplaceexceptions(self):
        # "replace" complains about a non-exception passed in
        self.assertRaises(
           TypeError,
           codecs.replace_errors,
           42
        )
        # "replace" complains about the wrong exception type
        self.assertRaises(
           TypeError,
           codecs.replace_errors,
           UnicodeError("ouch")
        )
        self.assertRaises(
            TypeError,
            codecs.replace_errors,
            BadObjectUnicodeEncodeError()
        )
        self.assertRaises(
            TypeError,
            codecs.replace_errors,
            BadObjectUnicodeDecodeError()
        )
        # With the correct exception, "replace" returns a "?" or "\ufffd" replacement
        self.assertEqual(
            codecs.replace_errors(
                UnicodeEncodeError("ascii", "a\u3042b", 1, 2, "ouch")),
            ("?", 2)
        )
        self.assertEqual(
            codecs.replace_errors(
                UnicodeDecodeError("ascii", bytearray(b"a\xffb"), 1, 2, "ouch")),
            ("\ufffd", 2)
        )
        self.assertEqual(
            codecs.replace_errors(
                UnicodeTranslateError("a\u3042b", 1, 2, "ouch")),
            ("\ufffd", 2)
        )

    def test_badandgoodxmlcharrefreplaceexceptions(self):
        # "xmlcharrefreplace" complains about a non-exception passed in
        self.assertRaises(
           TypeError,
           codecs.xmlcharrefreplace_errors,
           42
        )
        # "xmlcharrefreplace" complains about the wrong exception types
        self.assertRaises(
           TypeError,
           codecs.xmlcharrefreplace_errors,
           UnicodeError("ouch")
        )
        # "xmlcharrefreplace" can only be used for encoding
        self.assertRaises(
            TypeError,
            codecs.xmlcharrefreplace_errors,
            UnicodeDecodeError("ascii", bytearray(b"\xff"), 0, 1, "ouch")
        )
        self.assertRaises(
            TypeError,
            codecs.xmlcharrefreplace_errors,
            UnicodeTranslateError("\u3042", 0, 1, "ouch")
        )
        # Use the correct exception
        cs = (0, 1, 9, 10, 99, 100, 999, 1000, 9999, 10000, 99999, 100000,
              999999, 1000000)
        cs += (0xd800, 0xdfff)
        s = "".join(chr(c) for c in cs)
        self.assertEqual(
            codecs.xmlcharrefreplace_errors(
                UnicodeEncodeError("ascii", "a" + s + "b",
                                   1, 1 + len(s), "ouch")
            ),
            ("".join("&#%d;" % c for c in cs), 1 + len(s))
        )

    def test_badandgoodbackslashreplaceexceptions(self):
        # "backslashreplace" complains about a non-exception passed in
        self.assertRaises(
           TypeError,
           codecs.backslashreplace_errors,
           42
        )
        # "backslashreplace" complains about the wrong exception types
        self.assertRaises(
           TypeError,
           codecs.backslashreplace_errors,
           UnicodeError("ouch")
        )
        # Use the correct exception
        tests = [
            ("\u3042", "\\u3042"),
            ("\n", "\\x0a"),
            ("a", "\\x61"),
            ("\x00", "\\x00"),
            ("\xff", "\\xff"),
            ("\u0100", "\\u0100"),
            ("\uffff", "\\uffff"),
            ("\U00010000", "\\U00010000"),
            ("\U0010ffff", "\\U0010ffff"),
            # Lone surrogates
            ("\ud800", "\\ud800"),
            ("\udfff", "\\udfff"),
            ("\ud800\udfff", "\\ud800\\udfff"),
        ]
        for s, r in tests:
            with self.subTest(str=s):
                self.assertEqual(
                    codecs.backslashreplace_errors(
                        UnicodeEncodeError("ascii", "a" + s + "b",
                                           1, 1 + len(s), "ouch")),
                    (r, 1 + len(s))
                )
                self.assertEqual(
                    codecs.backslashreplace_errors(
                        UnicodeTranslateError("a" + s + "b",
                                              1, 1 + len(s), "ouch")),
                    (r, 1 + len(s))
                )
        tests = [
            (b"a", "\\x61"),
            (b"\n", "\\x0a"),
            (b"\x00", "\\x00"),
            (b"\xff", "\\xff"),
        ]
        for b, r in tests:
            with self.subTest(bytes=b):
                self.assertEqual(
                    codecs.backslashreplace_errors(
                        UnicodeDecodeError("ascii", bytearray(b"a" + b + b"b"),
                                           1, 2, "ouch")),
                    (r, 2)
                )

    def test_badandgoodnamereplaceexceptions(self):
        # "namereplace" complains about a non-exception passed in
        self.assertRaises(
           TypeError,
           codecs.namereplace_errors,
           42
        )
        # "namereplace" complains about the wrong exception types
        self.assertRaises(
           TypeError,
           codecs.namereplace_errors,
           UnicodeError("ouch")
        )
        # "namereplace" can only be used for encoding
        self.assertRaises(
            TypeError,
            codecs.namereplace_errors,
            UnicodeDecodeError("ascii", bytearray(b"\xff"), 0, 1, "ouch")
        )
        self.assertRaises(
            TypeError,
            codecs.namereplace_errors,
            UnicodeTranslateError("\u3042", 0, 1, "ouch")
        )
        # Use the correct exception
        tests = [
            ("\u3042", "\\N{HIRAGANA LETTER A}"),
            ("\x00", "\\x00"),
            ("\ufbf9", "\\N{ARABIC LIGATURE UIGHUR KIRGHIZ YEH WITH "
                       "HAMZA ABOVE WITH ALEF MAKSURA ISOLATED FORM}"),
            ("\U000e007f", "\\N{CANCEL TAG}"),
            ("\U0010ffff", "\\U0010ffff"),
            # Lone surrogates
            ("\ud800", "\\ud800"),
            ("\udfff", "\\udfff"),
            ("\ud800\udfff", "\\ud800\\udfff"),
        ]
        for s, r in tests:
            with self.subTest(str=s):
                self.assertEqual(
                    codecs.namereplace_errors(
                        UnicodeEncodeError("ascii", "a" + s + "b",
                                           1, 1 + len(s), "ouch")),
                    (r, 1 + len(s))
                )

    def test_badandgoodsurrogateescapeexceptions(self):
        surrogateescape_errors = codecs.lookup_error('surrogateescape')
        # "surrogateescape" complains about a non-exception passed in
        self.assertRaises(
           TypeError,
           surrogateescape_errors,
           42
        )
        # "surrogateescape" complains about the wrong exception types
        self.assertRaises(
           TypeError,
           surrogateescape_errors,
           UnicodeError("ouch")
        )
        # "surrogateescape" cannot be used for translating
        self.assertRaises(
            TypeError,
            surrogateescape_errors,
            UnicodeTranslateError("\udc80", 0, 1, "ouch")
        )
        # Use the correct exception
        for s in ("a", "\udc7f", "\udd00"):
            with self.subTest(str=s):
                self.assertRaises(
                    UnicodeEncodeError,
                    surrogateescape_errors,
                    UnicodeEncodeError("ascii", s, 0, 1, "ouch")
                )
        self.assertEqual(
            surrogateescape_errors(
                UnicodeEncodeError("ascii", "a\udc80b", 1, 2, "ouch")),
            (b"\x80", 2)
        )
        self.assertRaises(
            UnicodeDecodeError,
            surrogateescape_errors,
            UnicodeDecodeError("ascii", bytearray(b"a"), 0, 1, "ouch")
        )
        self.assertEqual(
            surrogateescape_errors(
                UnicodeDecodeError("ascii", bytearray(b"a\x80b"), 1, 2, "ouch")),
            ("\udc80", 2)
        )

    def test_badandgoodsurrogatepassexceptions(self):
        surrogatepass_errors = codecs.lookup_error('surrogatepass')
        # "surrogatepass" complains about a non-exception passed in
        self.assertRaises(
            TypeError,
            surrogatepass_errors,
            42
        )
        # "surrogatepass" complains about the wrong exception types
        self.assertRaises(
            TypeError,
            surrogatepass_errors,
            UnicodeError("ouch")
        )
        # "surrogatepass" cannot be used for translating
        self.assertRaises(
            TypeError,
            surrogatepass_errors,
            UnicodeTranslateError("\ud800", 0, 1, "ouch")
        )
        # Use the correct exception
        for enc in ("utf-8", "utf-16le", "utf-16be", "utf-32le", "utf-32be"):
            with self.subTest(encoding=enc):
                self.assertRaises(
                    UnicodeEncodeError,
                    surrogatepass_errors,
                    UnicodeEncodeError(enc, "a", 0, 1, "ouch")
                )
                self.assertRaises(
                    UnicodeDecodeError,
                    surrogatepass_errors,
                    UnicodeDecodeError(enc, "a".encode(enc), 0, 1, "ouch")
                )
        for s in ("\ud800", "\udfff", "\ud800\udfff"):
            with self.subTest(str=s):
                self.assertRaises(
                    UnicodeEncodeError,
                    surrogatepass_errors,
                    UnicodeEncodeError("ascii", s, 0, len(s), "ouch")
                )
        tests = [
            ("utf-8", "\ud800", b'\xed\xa0\x80', 3),
            ("utf-16le", "\ud800", b'\x00\xd8', 2),
            ("utf-16be", "\ud800", b'\xd8\x00', 2),
            ("utf-32le", "\ud800", b'\x00\xd8\x00\x00', 4),
            ("utf-32be", "\ud800", b'\x00\x00\xd8\x00', 4),
            ("utf-8", "\udfff", b'\xed\xbf\xbf', 3),
            ("utf-16le", "\udfff", b'\xff\xdf', 2),
            ("utf-16be", "\udfff", b'\xdf\xff', 2),
            ("utf-32le", "\udfff", b'\xff\xdf\x00\x00', 4),
            ("utf-32be", "\udfff", b'\x00\x00\xdf\xff', 4),
            ("utf-8", "\ud800\udfff", b'\xed\xa0\x80\xed\xbf\xbf', 3),
            ("utf-16le", "\ud800\udfff", b'\x00\xd8\xff\xdf', 2),
            ("utf-16be", "\ud800\udfff", b'\xd8\x00\xdf\xff', 2),
            ("utf-32le", "\ud800\udfff", b'\x00\xd8\x00\x00\xff\xdf\x00\x00', 4),
            ("utf-32be", "\ud800\udfff", b'\x00\x00\xd8\x00\x00\x00\xdf\xff', 4),
        ]
        for enc, s, b, n in tests:
            with self.subTest(encoding=enc, str=s, bytes=b):
                self.assertEqual(
                    surrogatepass_errors(
                        UnicodeEncodeError(enc, "a" + s + "b",
                                           1, 1 + len(s), "ouch")),
                    (b, 1 + len(s))
                )
                self.assertEqual(
                    surrogatepass_errors(
                        UnicodeDecodeError(enc, bytearray(b"a" + b[:n] + b"b"),
                                           1, 1 + n, "ouch")),
                    (s[:1], 1 + n)
                )
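
    # Illustrative extra check (a minimal sketch, not part of the original
    # suite): with "surrogatepass" a lone surrogate is written using the
    # ordinary UTF-8 bit layout (the same bytes as the first row of the
    # table above) and the same handler accepts those bytes on decoding.
    # Strict UTF-8 rejects them in both directions.
    # The method name is hypothetical.
    def test_surrogatepass_utf8_roundtrip_sketch(self):
        encoded = "a\ud800b".encode("utf-8", "surrogatepass")
        self.assertEqual(encoded, b"a\xed\xa0\x80b")
        self.assertEqual(encoded.decode("utf-8", "surrogatepass"), "a\ud800b")
        # Without the handler the lone surrogate is an error
        self.assertRaises(UnicodeEncodeError, "a\ud800b".encode, "utf-8")
        self.assertRaises(UnicodeDecodeError, encoded.decode, "utf-8")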

    def test_badhandlerresults(self):
        # Every entry is an invalid handler result (the handler must
        # return a (str, int) tuple); duplicate entries removed
        results = (42, "foo", (1, 2, 3), ("foo", 1, 3), ("foo", None), ("foo",))
        encs = ("ascii", "latin-1", "iso-8859-1", "iso-8859-15")
        for res in results:
            codecs.register_error("test.badhandler", lambda x: res)
            for enc in encs:
                self.assertRaises(
                    TypeError,
                    "\u3042".encode,
                    enc,
                    "test.badhandler"
                )
            for (enc, data) in (
                ("ascii", b"\xff"),
                ("utf-8", b"\xff"),
                ("utf-7", b"+x-"),
                ("unicode-internal", b"\x00"),
            ):
                with test.support.check_warnings():
                    # unicode-internal has been deprecated
                    self.assertRaises(
                        TypeError,
                        data.decode,
                        enc,
                        "test.badhandler"
                    )

    def test_lookup(self):
        self.assertEqual(codecs.strict_errors, codecs.lookup_error("strict"))
        self.assertEqual(codecs.ignore_errors, codecs.lookup_error("ignore"))
        self.assertEqual(codecs.replace_errors, codecs.lookup_error("replace"))
        self.assertEqual(
            codecs.xmlcharrefreplace_errors,
            codecs.lookup_error("xmlcharrefreplace")
        )
        self.assertEqual(
            codecs.backslashreplace_errors,
            codecs.lookup_error("backslashreplace")
        )
        self.assertEqual(
            codecs.namereplace_errors,
            codecs.lookup_error("namereplace")
        )
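
    # Illustrative extra check (a minimal sketch, not part of the original
    # suite): test_lookup only checks that each name resolves to the
    # module-level function.  This shows what each built-in handler
    # produces for one unencodable character (U+00E9) under ASCII.
    # The method name is hypothetical.
    def test_builtin_handler_output_sketch(self):
        expected = {
            "ignore": b"a",
            "replace": b"a?",
            "xmlcharrefreplace": b"a&#233;",
            "backslashreplace": b"a\\xe9",
            "namereplace": b"a\\N{LATIN SMALL LETTER E WITH ACUTE}",
        }
        for name, result in expected.items():
            with self.subTest(handler=name):
                self.assertEqual("a\xe9".encode("ascii", name), result)
        # "strict" raises instead of substituting anything
        self.assertRaises(UnicodeEncodeError, "a\xe9".encode, "ascii", "strict")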

    def test_unencodablereplacement(self):
        def unencrepl(exc):
            if isinstance(exc, UnicodeEncodeError):
                return ("\u4242", exc.end)
            else:
                raise TypeError("don't know how to handle %r" % exc)
        codecs.register_error("test.unencreplhandler", unencrepl)
        for enc in ("ascii", "iso-8859-1", "iso-8859-15"):
            self.assertRaises(
                UnicodeEncodeError,
                "\u4242".encode,
                enc,
                "test.unencreplhandler"
            )
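
    # Illustrative extra check (a minimal sketch, not part of the original
    # suite): the counterpart of test_unencodablereplacement.  When a
    # custom handler returns a replacement that *is* encodable, encoding
    # succeeds and resumes at the returned position.  The handler name
    # "test.qmarkreplace" and the method name are hypothetical.
    def test_encodablereplacement_sketch(self):
        def qmarkrepl(exc):
            if isinstance(exc, UnicodeEncodeError):
                # One "?" per unencodable character, resume after them
                return ("?" * (exc.end - exc.start), exc.end)
            raise TypeError("don't know how to handle %r" % exc)
        codecs.register_error("test.qmarkreplace", qmarkrepl)
        for enc in ("ascii", "iso-8859-1", "iso-8859-15"):
            with self.subTest(encoding=enc):
                self.assertEqual(
                    "a\u4242\u4243b".encode(enc, "test.qmarkreplace"),
                    b"a??b"
                )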

    def test_badregistercall(self):
        # enhance coverage of:
        # Modules/_codecsmodule.c::register_error()
        # Python/codecs.c::PyCodec_RegisterError()
        self.assertRaises(TypeError, codecs.register_error, 42)
        self.assertRaises(TypeError, codecs.register_error, "test.dummy", 42)

    def test_badlookupcall(self):
        # enhance coverage of:
        # Modules/_codecsmodule.c::lookup_error()
        self.assertRaises(TypeError, codecs.lookup_error)

    def test_unknownhandler(self):
        # enhance coverage of:
        # Modules/_codecsmodule.c::lookup_error()
        self.assertRaises(LookupError, codecs.lookup_error, "test.unknown")
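
    # Illustrative extra check (a minimal sketch, not part of the original
    # suite): the tests above only cover failing calls.  After a
    # successful register_error(), lookup_error() returns the very same
    # callable.  The handler name "test.strictlike" and the method name
    # are hypothetical.
    def test_registerlookup_sketch(self):
        def strictlike(exc):
            # Behaves like "strict": re-raise the exception it was given
            raise exc
        codecs.register_error("test.strictlike", strictlike)
        self.assertIs(codecs.lookup_error("test.strictlike"), strictlike)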

    def test_xmlcharrefvalues(self):
        # enhance coverage of:
        # Python/codecs.c::PyCodec_XMLCharRefReplaceErrors()
        # and inline implementations
        v = (1, 5, 10, 50, 100, 500, 1000, 5000, 10000, 50000, 100000,
             500000, 1000000)
        s = "".join([chr(x) for x in v])
        codecs.register_error("test.xmlcharrefreplace", codecs.xmlcharrefreplace_errors)
        for enc in ("ascii", "iso-8859-15"):
            for err in ("xmlcharrefreplace", "test.xmlcharrefreplace"):
                s.encode(enc, err)
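
    # Illustrative extra check (a minimal sketch, not part of the original
    # suite): test_xmlcharrefvalues only checks that encoding succeeds.
    # This spells out the expected output: every unencodable character
    # becomes a decimal character reference, including characters
    # outside the BMP.  The method name is hypothetical.
    def test_xmlcharrefoutput_sketch(self):
        s = "a\u20ac\U0001f600"
        self.assertEqual(
            s.encode("ascii", "xmlcharrefreplace"),
            b"a&#8364;&#128512;"
        )
        # The same rule written out by hand for this input
        expected = "".join(
            c if ord(c) < 128 else "&#%d;" % ord(c) for c in s
        ).encode("ascii")
        self.assertEqual(s.encode("ascii", "xmlcharrefreplace"), expected)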

    def test_decodehelper(self):
        # enhance coverage of:
        # Objects/unicodeobject.c::unicode_decode_call_errorhandler()
        # and callers
        self.assertRaises(LookupError, b"\xff".decode, "ascii", "test.unknown")

        def baddecodereturn1(exc):
            return 42
        codecs.register_error("test.baddecodereturn1", baddecodereturn1)
        self.assertRaises(TypeError, b"\xff".decode, "ascii", "test.baddecodereturn1")
        self.assertRaises(TypeError, b"\\".decode, "unicode-escape", "test.baddecodereturn1")
        self.assertRaises(TypeError, b"\\x0".decode, "unicode-escape", "test.baddecodereturn1")
        self.assertRaises(TypeError, b"\\x0y".decode, "unicode-escape", "test.baddecodereturn1")
        self.assertRaises(TypeError, b"\\Uffffeeee".decode, "unicode-escape", "test.baddecodereturn1")
        self.assertRaises(TypeError, b"\\uyyyy".decode, "raw-unicode-escape", "test.baddecodereturn1")

        def baddecodereturn2(exc):
            return ("?", None)
        codecs.register_error("test.baddecodereturn2", baddecodereturn2)
        self.assertRaises(TypeError, b"\xff".decode, "ascii", "test.baddecodereturn2")

        handler = PosReturn()
        codecs.register_error("test.posreturn", handler.handle)
        # Valid negative position
        handler.pos = -1
        self.assertEqual(b"\xff0".decode("ascii", "test.posreturn"), "<?>0")
        # Valid negative position
        handler.pos = -2
        self.assertEqual(b"\xff0".decode("ascii", "test.posreturn"), "<?><?>")
        # Negative position out of bounds
        handler.pos = -3
        self.assertRaises(IndexError, b"\xff0".decode, "ascii", "test.posreturn")
        # Valid positive position
        handler.pos = 1
        self.assertEqual(b"\xff0".decode("ascii", "test.posreturn"), "<?>0")
        # Largest valid positive position (one beyond end of input)
        handler.pos = 2
        self.assertEqual(b"\xff0".decode("ascii", "test.posreturn"), "<?>")
        # Invalid positive position
        handler.pos = 3
        self.assertRaises(IndexError, b"\xff0".decode, "ascii", "test.posreturn")
        # Restart at the "0"
        handler.pos = 6
        self.assertEqual(b"\\uyyyy0".decode("raw-unicode-escape", "test.posreturn"), "<?>0")

        class D(dict):
            def __getitem__(self, key):
                raise ValueError
        self.assertRaises(UnicodeError, codecs.charmap_decode, b"\xff", "strict", {0xff: None})
        self.assertRaises(ValueError, codecs.charmap_decode, b"\xff", "strict", D())
        self.assertRaises(TypeError, codecs.charmap_decode, b"\xff", "strict", {0xff: sys.maxunicode+1})
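
    # Illustrative extra check (a minimal sketch, not part of the original
    # suite): the PosReturn cases above show that the integer a decode
    # handler returns is where decoding resumes.  This standalone handler
    # returns len(exc.object), the largest valid position, so everything
    # after the first error is skipped.  The handler name "test.skiprest"
    # and the method name are hypothetical.
    def test_decoderesumeposition_sketch(self):
        def skiprest(exc):
            if isinstance(exc, UnicodeDecodeError):
                return ("<?>", len(exc.object))
            raise TypeError("don't know how to handle %r" % exc)
        codecs.register_error("test.skiprest", skiprest)
        self.assertEqual(b"ab\xffcd".decode("ascii", "test.skiprest"), "ab<?>")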

    def test_encodehelper(self):
        # enhance coverage of:
        # Objects/unicodeobject.c::unicode_encode_call_errorhandler()
        # and callers
        self.assertRaises(LookupError, "\xff".encode, "ascii", "test.unknown")

        def badencodereturn1(exc):
            return 42
        codecs.register_error("test.badencodereturn1", badencodereturn1)
        self.assertRaises(TypeError, "\xff".encode, "ascii", "test.badencodereturn1")

        def badencodereturn2(exc):
            return ("?", None)
        codecs.register_error("test.badencodereturn2", badencodereturn2)
        self.assertRaises(TypeError, "\xff".encode, "ascii", "test.badencodereturn2")

        handler = PosReturn()
        codecs.register_error("test.posreturn", handler.handle)
        # Valid negative position
        handler.pos = -1
        self.assertEqual("\xff0".encode("ascii", "test.posreturn"), b"<?>0")
        # Valid negative position
        handler.pos = -2
        self.assertEqual("\xff0".encode("ascii", "test.posreturn"), b"<?><?>")
        # Negative position out of bounds
        handler.pos = -3
        self.assertRaises(IndexError, "\xff0".encode, "ascii", "test.posreturn")
        # Valid positive position
        handler.pos = 1
        self.assertEqual("\xff0".encode("ascii", "test.posreturn"), b"<?>0")
        # Largest valid positive position (one beyond end of input)
        handler.pos = 2
        self.assertEqual("\xff0".encode("ascii", "test.posreturn"), b"<?>")
        # Invalid positive position
        handler.pos = 3
        self.assertRaises(IndexError, "\xff0".encode, "ascii", "test.posreturn")
        handler.pos = 0

        class D(dict):
            def __getitem__(self, key):
                raise ValueError
        for err in ("strict", "replace", "xmlcharrefreplace",
                    "backslashreplace", "namereplace", "test.posreturn"):
            self.assertRaises(UnicodeError, codecs.charmap_encode, "\xff", err, {0xff: None})
            self.assertRaises(ValueError, codecs.charmap_encode, "\xff", err, D())
            self.assertRaises(TypeError, codecs.charmap_encode, "\xff", err, {0xff: 300})

    def test_translatehelper(self):
        # enhance coverage of:
        # Objects/unicodeobject.c::unicode_encode_call_errorhandler()
        # and callers
        # (Unfortunately the errors argument is not directly accessible
        # from Python, so we can't test that much)
        class D(dict):
            def __getitem__(self, key):
                raise ValueError
        #self.assertRaises(ValueError, "\xff".translate, D())
        self.assertRaises(ValueError, "\xff".translate, {0xff: sys.maxunicode+1})
        self.assertRaises(TypeError, "\xff".translate, {0xff: ()})

    def test_bug828737(self):
        charmap = {
            ord("&"): "&amp;",
            ord("<"): "&lt;",
            ord(">"): "&gt;",
            ord('"'): "&quot;",
        }
        for n in (1, 10, 100, 1000):
            text = 'abc<def>ghi'*n
            text.translate(charmap)
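
    # Illustrative extra check (a minimal sketch, not part of the original
    # suite): test_bug828737 only checks that translate() completes with
    # a multi-character mapping.  This asserts the escaped result for
    # the same charmap.  The method name is hypothetical.
    def test_translateoutput_sketch(self):
        charmap = {
            ord("&"): "&amp;",
            ord("<"): "&lt;",
            ord(">"): "&gt;",
            ord('"'): "&quot;",
        }
        self.assertEqual(
            'abc<def>ghi "&"'.translate(charmap),
            "abc&lt;def&gt;ghi &quot;&amp;&quot;"
        )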

    def test_mutatingdecodehandler(self):
        baddata = [
            ("ascii", b"\xff"),
            ("utf-7", b"++"),
            ("utf-8", b"\xff"),
            ("utf-16", b"\xff"),
            ("utf-32", b"\xff"),
            ("unicode-escape", b"\\u123g"),
            ("raw-unicode-escape", b"\\u123g"),
            ("unicode-internal", b"\xff"),
        ]

        def replacing(exc):
            if isinstance(exc, UnicodeDecodeError):
                exc.object = 42
                return ("\u4242", 0)
            else:
                raise TypeError("don't know how to handle %r" % exc)
        codecs.register_error("test.replacing", replacing)
        with test.support.check_warnings():
            # unicode-internal has been deprecated
            for (encoding, data) in baddata:
                with self.assertRaises(TypeError):
                    data.decode(encoding, "test.replacing")

        def mutating(exc):
            if isinstance(exc, UnicodeDecodeError):
                exc.object[:] = b""
                return ("\u4242", 0)
            else:
                raise TypeError("don't know how to handle %r" % exc)
        codecs.register_error("test.mutating", mutating)
        # If the decoder doesn't pick up the modified input the following
        # will lead to an endless loop
        with test.support.check_warnings():
            # unicode-internal has been deprecated
            for (encoding, data) in baddata:
                with self.assertRaises(TypeError):
                    data.decode(encoding, "test.mutating")

    def test_fake_error_class(self):
        handlers = [
            codecs.strict_errors,
            codecs.ignore_errors,
            codecs.replace_errors,
            codecs.backslashreplace_errors,
            codecs.namereplace_errors,
            codecs.xmlcharrefreplace_errors,
            codecs.lookup_error('surrogateescape'),
            codecs.lookup_error('surrogatepass'),
        ]
        for cls in UnicodeEncodeError, UnicodeDecodeError, UnicodeTranslateError:
            class FakeUnicodeError(str):
                __class__ = cls
            for handler in handlers:
                with self.subTest(handler=handler, error_class=cls):
                    self.assertRaises(TypeError, handler, FakeUnicodeError())
            class FakeUnicodeError(Exception):
                __class__ = cls
            for handler in handlers:
                with self.subTest(handler=handler, error_class=cls):
                    with self.assertRaises((TypeError, FakeUnicodeError)):
                        handler(FakeUnicodeError())


if __name__ == "__main__":
    unittest.main()