This new interface will also allow for future planned enhancements
in control over the parser/generator without requiring any additional
complexity in the parser/generator API.
Patch reviewed by Éric Araujo and Barry Warsaw.
The work on this is not 100% complete, but everything is present to
allow real-world testing of the code. The only remaining major todo
item is to (hopefully!) enhance the handling of non-ASCII bytes in headers
converted to unicode by RFC2047 encoding them rather than replacing them with
'?'s.
svn+ssh://pythondev@svn.python.org/python/branches/py3k
........
r82922 | r.david.murray | 2010-07-16 21:19:57 -0400 (Fri, 16 Jul 2010) | 4 lines
#1555570: correctly handle a \r\n that is split by the read buffer.
Patch and test by Tony Nelson.
........
svn+ssh://pythondev@svn.python.org/python/branches/py3k
................
r82011 | r.david.murray | 2010-06-15 22:19:40 -0400 (Tue, 15 Jun 2010) | 17 lines
Merged revisions 81675 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk
........
r81675 | r.david.murray | 2010-06-03 11:43:20 -0400 (Thu, 03 Jun 2010) | 10 lines
#5610: use \Z not $ so we don't eat extra chars when body part ends with \r\n.
If a body part ended with \r\n, feedparser, using '$' to terminate its
search for the newline, would match on the \r\n, and think that it needed
to strip two characters in order to account for the line end before the
boundary. That made it chop one too many characters off the end of
the body part. Using \Z makes the match correct.
Patch and test by Tony Nelson.
........
................
svn+ssh://pythondev@svn.python.org/python/trunk
........
r81675 | r.david.murray | 2010-06-03 11:43:20 -0400 (Thu, 03 Jun 2010) | 10 lines
#5610: use \Z not $ so we don't eat extra chars when body part ends with \r\n.
If a body part ended with \r\n, feedparser, using '$' to terminate its
search for the newline, would match on the \r\n, and think that it needed
to strip two characters in order to account for the line end before the
boundary. That made it chop one too many characters off the end of
the body part. Using \Z makes the match correct.
Patch and test by Tony Nelson.
........
number of tests, all because of the codecs/_multibytecodecs issue described
here (it's not a Py3K issue, just something Py3K discovers):
http://mail.python.org/pipermail/python-dev/2006-April/064051.html
Hye-Shik Chang promised to look for a fix, so no need to fix it here. The
tests that are expected to break are:
test_codecencodings_cn
test_codecencodings_hk
test_codecencodings_jp
test_codecencodings_kr
test_codecencodings_tw
test_codecs
test_multibytecodec
This merge fixes an actual test failure (test_weakref) in this branch,
though, so I believe merging is the right thing to do anyway.
caused by a self._input.readline() call that wasn't checking for the
NeedsMoreData marker.
msg_43.txt contains a message that illustrates the problem, when
email.message_from_*() is called. That interface uses the Parser API, which
splits reads into 8192 byte chunks. It so happens that for the test message,
the 8192 chunk falls inside a message/delivery-status, which is where in the
FeedParser the readline() call was that didn't check for NeedsMoreData.
I also added an assert to unreadline() so it'll be more evident if an attempt
to push back NeedsMoreData ever happens again.
Bump the email package version number.
in a newline, and it's an end boundary, the FeedParser wasn't recognizing it
as such. Tweak the regexp to make the ending linesep optional.
For grins, clear self._partial when closing the BufferedSubFile.
Added a test case.
capturing_preamble but we found a StartBoundaryNotFoundDefect, we need to
consume all lines from the current position to the EOF, which we'll set as the
epilogue of the current message. If we're not at EOF when we return from
here, the outer message's capturing_preamble assertion will fail.
Briefly (from the NEWS file):
- Updates for the email package:
+ All deprecated APIs that in email 2.x issued warnings have been removed:
_encoder argument to the MIMEText constructor, Message.add_payload(),
Utils.dump_address_pair(), Utils.decode(), Utils.encode()
+ New deprecations: Generator.__call__(), Message.get_type(),
Message.get_main_type(), Message.get_subtype(), the 'strict' argument to
the Parser constructor. These will be removed in email 3.1.
+ Support for Python earlier than 2.3 has been removed (see PEP 291).
+ All defect classes have been renamed to end in 'Defect'.
+ Some FeedParser fixes; also a MultipartInvariantViolationDefect will be
added to messages that claim to be multipart but really aren't.
+ Updates to documentation.
\r\n only get the \n stripped, not the \r (unless it's the last header which
does get the \r stripped). Patch by Tony Meyer.
test_whitespace_continuation_last_header(),
test_strip_line_feed_and_carriage_return_in_headers(): New tests.
_parse_headers(): Be sure to strip \r\n from the right side of header lines.
parser must recognize outer boundaries in inner parts. So cruise through the
EOF stack backwards testing each predicate against the current line.
There's still some discussion about whether this is (always) the best thing to
do. Anthony would rather parse these messages as if the outer boundaries were
ignored. I think that's counter to the RFC, but might be practically more
useful. Can you say behavior flag? (ug).
message/delivery-status clause, and genericize it to handle all (other)
message/* content types. This lets us correctly parse 2 more of Anthony's
MIME torture tests (specifically, the message/external-body examples).
(standard) tests, and doesn't throw parse errors. I still need throw
Anthony's torture test at it, but I wanted to get this checked in and off my
disk.