Ezio Melotti
c45868ec69
#14538: HTMLParser can now correctly parse start tags that contain a bare /.
14 years ago
Ezio Melotti
36b7361fe7
HTMLParser is now able to handle slashes in the start tag.
14 years ago
Ezio Melotti
65d36dab4d
#13987: HTMLParser is now able to handle malformed start tags.
14 years ago
Ezio Melotti
d2307cb48a
#13987: HTMLParser is now able to handle EOFs in the middle of a construct.
14 years ago
Ezio Melotti
369cbd744e
Fix an index, add more tests, avoid raising errors for unknown declarations, and clean up comments.
14 years ago
Ezio Melotti
f117443cb8
#13993: HTMLParser is now able to handle broken end tags.
14 years ago
Ezio Melotti
4b92cc3f79
#13960: HTMLParser is now able to handle broken comments.
14 years ago
Ezio Melotti
00dc60beee
#13358: HTMLParser now calls handle_data only once for each CDATA.
14 years ago
Ezio Melotti
0f1571ce7f
#1745761, #755670, #13357, #12629, #1200313: improve attribute handling in HTMLParser.
14 years ago
Ezio Melotti
7e82b276dd
#670664: Fix HTMLParser to correctly handle the content of ``<script>...</script>`` and ``<style>...</style>``.
14 years ago
Éric Araujo
31890bc9ba
Fix display of html.parser.HTMLParser.feed docstring
15 years ago
Ezio Melotti
9f1ffb2ae9
#7311: fix HTMLParser to accept non-ASCII attribute values.
15 years ago
Senthil Kumaran
3f60f09eb2
Fix Issue10759 - HTMLParser.unescape() to handle malformed charrefs.
15 years ago
Victor Stinner
b0c42877de
Merged revisions 81500-81501 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk
........
r81500 | victor.stinner | 2010-05-24 23:33:24 +0200 (lun., 24 mai 2010) | 2 lines
Issue #6662: Fix parsing of malformatted charref (&#bad;)
........
r81501 | victor.stinner | 2010-05-24 23:37:28 +0200 (lun., 24 mai 2010) | 2 lines
Add the author of the last fix (Issue #6662)
........
16 years ago
Victor Stinner
554a3b82e4
Issue #6662: Fix parsing of malformatted charref (&#bad;)
16 years ago
Fred Drake
d995e1150c
revert creation of the html.entities and html.parser modules
(http://bugs.python.org/issue2882)
18 years ago
Fred Drake
cb51d84214
update references and documentation for modules in the new html package
(http://bugs.python.org/issue2882)
18 years ago
Fred Drake
91ae250273
rename HTMLParser to html.parser, htmlentitydefs to html.entities
(http://bugs.python.org/issue2882)
18 years ago
Martin v. Löwis
ab8a6bba25
Patch #912410: Replace HTML entity references for attribute values
in HTMLParser.
19 years ago
Georg Brandl
cd3c26a717
Reverting previous checkin. This breaks too much of HTMLParser to be applied
without thought. Anyway, such malformed HTML is better handled by something
like BeautifulSoup.
21 years ago
Georg Brandl
7847405a76
bug [ 761452 ] HTMLParser chokes on my.yahoo.com output
21 years ago
Fred Drake
49b4d19172
remove unnecessary override of base class method
22 years ago
Andrew M. Kuchling
b7d8ce0275
[Bug #921657] Allow '@' in unquoted HTML attributes. Not strictly legal according to the HTML REC, but HTMLParser is already a pretty loose parser. Reported by Bernd Zimmermann.
22 years ago
Walter Dörwald
70a6b49821
Replace backticks with repr() or "%r"
From SF patch #852334.
22 years ago
Fred Drake
0834d77bc4
Accept commas in unquoted attribute values.
This closes SF patch #669683.
23 years ago
Fred Drake
30d59baecd
Simplify code to remove an unnecessary test.
24 years ago
Fred Drake
248b04383f
Convert to using string methods instead of the string module.
In goahead(), use a bound version of rawdata.startswith() since we use the
same method all the time and never change the value of rawdata. This can
save a lot of bound method creation.
24 years ago
Fred Drake
bfc8fea1e0
Re-factor the HTMLParser class to use the new markupbase.ParserBase class.
Use a new internal method, error(), consistently to raise parse errors;
the new base class also uses this.
25 years ago
Tim Peters
b64bec3ec0
Whitespace normalization.
25 years ago
Fred Drake
7cf613dc77
HTMLParser is allowed to be more strict than sgmllib, so let's not
change their basic behavior: When parsing something that cannot possibly
be valid in either HTML or XHTML, raise an exception.
25 years ago
Fred Drake
68eac2b574
Added reasonable parsing of the DOCTYPE declaration, fixed edge cases
regarding bare ampersands in content.
25 years ago
Fred Drake
029acfb922
Deal more appropriately with bare ampersands and pointy brackets; this
module has to deal with "classic" HTML-as-deployed as well as XHTML, so we
cannot be as strict as XHTML allows.
This closes SF bug #453059, but uses a different fix than suggested in
the bug comments.
25 years ago
Fred Drake
1d4601d306
Change some comments into docstrings.
Fix handling of hexadecimal character references (legal in XHTML) so that
they are properly interpreted as character references.
This fixes SF bug #445196.
25 years ago
Fred Drake
1c48eb74c9
Merge my changes to the offending comment with Guido's changes.
25 years ago
Guido van Rossum
07f353c560
Removed incorrect comment left over from sgmllib.py.
25 years ago
Guido van Rossum
8846d7178b
A much improved HTML parser -- a replacement for sgmllib. The API is
derived from but not quite compatible with that of sgmllib, so it's a
new file. I suppose it needs documentation, and htmllib needs to be
changed to use this instead of sgmllib, and sgmllib needs to be
declared obsolete. But that can all be done later.
This code was first published as part of TAL (part of Zope Page
Templates), but that was strongly based on sgmllib anyway. Authors
are Fred Drake and Guido van Rossum.
25 years ago