
.. _xml:

XML Processing Modules
======================

.. module:: xml
   :synopsis: Package containing XML processing modules

.. sectionauthor:: Christian Heimes <christian@python.org>
.. sectionauthor:: Georg Brandl <georg@python.org>

**Source code:** :source:`Lib/xml/`

--------------

Python's interfaces for processing XML are grouped in the ``xml`` package.

.. warning::

   The XML modules are not secure against erroneous or maliciously
   constructed data.  If you need to parse untrusted or
   unauthenticated data, see the :ref:`xml-vulnerabilities` and
   :ref:`defusedxml-package` sections.

It is important to note that modules in the :mod:`xml` package require that
there be at least one SAX-compliant XML parser available. The Expat parser is
included with Python, so the :mod:`xml.parsers.expat` module will always be
available.

The documentation for the :mod:`xml.dom` and :mod:`xml.sax` packages is the
definition of the Python bindings for the DOM and SAX interfaces.

The XML handling submodules are:

* :mod:`xml.etree.ElementTree`: the ElementTree API, a simple and lightweight
  XML processor

..

* :mod:`xml.dom`: the DOM API definition
* :mod:`xml.dom.minidom`: a minimal DOM implementation
* :mod:`xml.dom.pulldom`: support for building partial DOM trees

..

* :mod:`xml.sax`: SAX2 base classes and convenience functions
* :mod:`xml.parsers.expat`: the Expat parser binding

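
As a brief, illustrative comparison of two of these APIs, the sketch below
parses the same small, trusted document with :mod:`xml.etree.ElementTree` and
:mod:`xml.dom.minidom`. The document string and variable names are made up for
this example; read the vulnerability notes below before using either parser on
untrusted input.

.. code-block:: python

   import xml.dom.minidom
   import xml.etree.ElementTree as ET

   # A small, trusted document (illustrative only).
   DOC = "<library><book id='1'>Dune</book><book id='2'>Emma</book></library>"

   # ElementTree: each element exposes .tag, .attrib and .text, and its
   # children can be iterated like a list.
   root = ET.fromstring(DOC)
   titles = [book.text for book in root.iter("book")]
   print(titles)  # ['Dune', 'Emma']

   # minidom: a W3C-style DOM made of nodes, attributes and text nodes.
   dom = xml.dom.minidom.parseString(DOC)
   ids = [node.getAttribute("id") for node in dom.getElementsByTagName("book")]
   print(ids)  # ['1', '2']

ElementTree is usually the more compact choice for simple data extraction,
while minidom follows the DOM interfaces described in :mod:`xml.dom`.
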
.. _xml-vulnerabilities:

XML vulnerabilities
-------------------

The XML processing modules are not secure against maliciously constructed data.
An attacker can abuse XML features to carry out denial of service attacks,
access local files, generate network connections to other machines, or
circumvent firewalls.

The following table gives an overview of the known attacks and whether
the various modules are vulnerable to them.

=========================  ==================  ==================  ==================  ==================  ==================
kind                       sax                 etree               minidom             pulldom             xmlrpc
=========================  ==================  ==================  ==================  ==================  ==================
billion laughs             **Vulnerable** (1)  **Vulnerable** (1)  **Vulnerable** (1)  **Vulnerable** (1)  **Vulnerable** (1)
quadratic blowup           **Vulnerable** (1)  **Vulnerable** (1)  **Vulnerable** (1)  **Vulnerable** (1)  **Vulnerable** (1)
external entity expansion  Safe (5)            Safe (2)            Safe (3)            Safe (5)            Safe (4)
`DTD`_ retrieval           Safe (5)            Safe                Safe                Safe (5)            Safe
decompression bomb         Safe                Safe                Safe                Safe                **Vulnerable**
=========================  ==================  ==================  ==================  ==================  ==================

1. Expat 2.4.1 and newer versions are not vulnerable to the "billion laughs"
   and "quadratic blowup" attacks. These items are still listed as vulnerable
   because Python may rely on a system-provided Expat library. Check
   :data:`xml.parsers.expat.EXPAT_VERSION`.
2. :mod:`xml.etree.ElementTree` doesn't expand external entities and raises a
   :exc:`~xml.etree.ElementTree.ParseError` when an entity occurs.
3. :mod:`xml.dom.minidom` doesn't expand external entities and simply returns
   the unexpanded entity verbatim.
4. :mod:`xmlrpc.client` doesn't expand external entities and omits them.
5. Since Python 3.7.1, external general entities are no longer processed by
   default.

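
Following note 1, the sketch below reports which Expat version the interpreter
is linked against. The helper name ``expat_has_entity_expansion_limits`` is
hypothetical, and the ``(2, 4, 1)`` threshold is taken from note 1; a newer
Expat addresses only the "billion laughs" and "quadratic blowup" rows of the
table, not the others.

.. code-block:: python

   import xml.parsers.expat

   # EXPAT_VERSION is a string such as "expat_2.5.0"; version_info holds the
   # same information as a (major, minor, micro) tuple of integers.
   print(xml.parsers.expat.EXPAT_VERSION)

   def expat_has_entity_expansion_limits():
       """Hypothetical helper: True if the linked Expat is 2.4.1 or newer."""
       return xml.parsers.expat.version_info >= (2, 4, 1)

   print(expat_has_entity_expansion_limits())
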
billion laughs / exponential entity expansion
  The `Billion Laughs`_ attack -- also known as exponential entity expansion --
  uses multiple levels of nested entities. Each entity refers to another entity
  several times, and the final entity definition contains a small string.
  The exponential expansion results in several gigabytes of text and
  consumes lots of memory and CPU time.

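
  To make the growth concrete, the sketch below builds a deliberately tiny
  nested-entity document and parses it with :mod:`xml.etree.ElementTree`. The
  helper name and the entity names ``lol0``, ``lol1`` and so on are
  illustrative assumptions. With ``levels`` nesting levels and ``fanout``
  references per level, the root's text is ``3 * fanout**levels`` characters
  long, while the document itself stays at a few hundred bytes; keep
  ``levels`` small when experimenting.

  .. code-block:: python

     import xml.etree.ElementTree as ET

     def billion_laughs_payload(levels, fanout=10):
         """Build a nested-entity document (hypothetical helper, demos only).

         Entity lolN expands to ``fanout`` copies of lol(N-1), so the root's
         text grows as 3 * fanout**levels characters.
         """
         decls = ['<!ENTITY lol0 "lol">']
         for n in range(1, levels + 1):
             body = f"&lol{n - 1};" * fanout
             decls.append(f'<!ENTITY lol{n} "{body}">')
         dtd = "\n".join(decls)
         return f"<!DOCTYPE root [\n{dtd}\n]>\n<root>&lol{levels};</root>"

     # Three levels expand to only 3,000 characters; do not raise this much.
     doc = billion_laughs_payload(levels=3)
     root = ET.fromstring(doc)
     print(len(root.text))  # 3000
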
quadratic blowup entity expansion
  A quadratic blowup attack is similar to a `Billion Laughs`_ attack; it abuses
  entity expansion, too. Instead of nested entities it repeats one large entity
  with a couple of thousand characters over and over again. The attack isn't as
  efficient as the exponential case but it avoids triggering parser countermeasures
  that forbid deeply nested entities.

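
  The sketch below, using a hypothetical helper, shows that no nesting is
  needed: a single 1,000-character entity referenced 100 times turns a
  document of about 1.3 KB into 100,000 characters of text. The entity name
  ``a`` and the sizes are arbitrary choices for illustration.

  .. code-block:: python

     import xml.etree.ElementTree as ET

     def quadratic_blowup_payload(entity_size, repeats):
         """One large entity referenced ``repeats`` times (hypothetical helper).

         The document grows roughly as entity_size + 3 * repeats bytes, while
         the parsed text grows as entity_size * repeats characters.
         """
         big = "A" * entity_size
         return (f'<!DOCTYPE root [<!ENTITY a "{big}">]>'
                 f"<root>{'&a;' * repeats}</root>")

     doc = quadratic_blowup_payload(entity_size=1000, repeats=100)
     root = ET.fromstring(doc)
     print(len(root.text))  # 100000
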
external entity expansion
  Entity declarations can contain more than just text for replacement. They can
  also point to external resources or local files. The XML
  parser accesses the resource and embeds the content into the XML document.

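
  The behaviour described in note 2 can be observed directly. In this sketch
  the entity name ``ext`` and the file path are illustrative;
  :mod:`xml.etree.ElementTree` does not open the referenced file and instead
  rejects the document when it meets the entity reference.

  .. code-block:: python

     import xml.etree.ElementTree as ET

     # The entity points at a local file; ElementTree does not open it.
     XXE_DOC = """<?xml version="1.0"?>
     <!DOCTYPE root [<!ENTITY ext SYSTEM "file:///etc/passwd">]>
     <root>&ext;</root>"""

     try:
         ET.fromstring(XXE_DOC)
         rejected = False
     except ET.ParseError as exc:
         # The reference to the external entity is reported as an error.
         rejected = True
         print("rejected:", exc)
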
`DTD`_ retrieval
  Some XML libraries like Python's :mod:`xml.dom.pulldom` retrieve document type
  definitions from remote or local locations. The feature has similar
  implications to the external entity expansion issue.

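
  For :mod:`xml.sax`, and for :mod:`xml.dom.pulldom`, which builds on it, the
  default described in note 5 is exposed as the
  :data:`~xml.sax.handler.feature_external_ges` feature. The sketch below, with
  a made-up ``TagCollector`` handler, checks that the feature is off and then
  sets it explicitly before parsing, which documents the intent even where the
  default already applies.

  .. code-block:: python

     import io
     import xml.sax
     from xml.sax.handler import ContentHandler, feature_external_ges

     class TagCollector(ContentHandler):
         """Record the name of every start tag (illustrative handler)."""

         def __init__(self):
             super().__init__()
             self.tags = []

         def startElement(self, name, attrs):
             self.tags.append(name)

     parser = xml.sax.make_parser()
     # Off by default since Python 3.7.1; setting it explicitly states intent.
     print(bool(parser.getFeature(feature_external_ges)))  # False
     parser.setFeature(feature_external_ges, False)

     handler = TagCollector()
     parser.setContentHandler(handler)
     parser.parse(io.BytesIO(b"<root><item/><item/></root>"))
     print(handler.tags)  # ['root', 'item', 'item']
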
decompression bomb
  Decompression bombs (also known as `ZIP bomb`_) apply to all XML libraries
  that can parse compressed XML streams such as gzipped HTTP streams or
  LZMA-compressed files. For an attacker, this can reduce the amount of
  transmitted data by three orders of magnitude or more.

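
  One common mitigation, sketched here under the assumption that the
  application decompresses the data itself, is to cap the decompressed size
  before any parser sees it. The limit value and the helper name
  ``parse_gzipped_xml`` are hypothetical. :meth:`gzip.GzipFile.read` with a
  size argument returns at most that many decompressed bytes, so a bomb is
  never fully inflated.

  .. code-block:: python

     import gzip
     import io
     import xml.etree.ElementTree as ET

     MAX_XML_BYTES = 1_000_000  # hypothetical limit; tune per application

     def parse_gzipped_xml(data, limit=MAX_XML_BYTES):
         """Decompress gzip ``data`` and parse it, refusing oversized output."""
         with gzip.GzipFile(fileobj=io.BytesIO(data)) as stream:
             # Read at most one byte past the limit to detect oversized input.
             xml_bytes = stream.read(limit + 1)
         if len(xml_bytes) > limit:
             raise ValueError(f"decompressed XML exceeds {limit} bytes")
         return ET.fromstring(xml_bytes)

     # About 2 MB of whitespace compresses to a few kilobytes.
     bomb = gzip.compress(b"<root>" + b" " * 2_000_000 + b"</root>")
     print(len(bomb))
     try:
         parse_gzipped_xml(bomb)
         refused = False
     except ValueError as exc:
         refused = True
         print(exc)  # decompressed XML exceeds 1000000 bytes
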
The documentation for `defusedxml`_ on PyPI has further information about
all known attack vectors with examples and references.

.. _defusedxml-package:

The :mod:`defusedxml` Package
------------------------------------------------------

`defusedxml`_ is a pure Python package with modified subclasses of all stdlib
XML parsers that prevent any potentially malicious operation. Use of this
package is recommended for any server code that parses untrusted XML data. The
package also ships with example exploits and extended documentation on more
XML exploits such as XPath injection.

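
To illustrate the kind of check defusedxml performs, the sketch below uses only
:mod:`xml.parsers.expat` and rejects any entity declaration before it can be
expanded. It is a simplified, stdlib-only assumption of the technique, not
defusedxml's API; the ``EntitiesForbidden`` name merely mirrors the exception
defusedxml raises, and in real code the package itself (for example
``defusedxml.ElementTree.fromstring``) is the better choice.

.. code-block:: python

   import xml.parsers.expat

   class EntitiesForbidden(ValueError):
       """Raised when a document declares an entity (name borrowed from defusedxml)."""

   def start_tags_without_entities(data):
       """Return the start-tag names in ``data``; refuse any entity declaration."""
       parser = xml.parsers.expat.ParserCreate()
       tags = []

       def on_entity_decl(name, is_parameter_entity, value, base,
                          system_id, public_id, notation_name):
           # Called for every <!ENTITY ...> declaration, internal or external.
           raise EntitiesForbidden(f"entity {name!r} is not allowed")

       def on_start(name, attrs):
           tags.append(name)

       parser.EntityDeclHandler = on_entity_decl
       parser.StartElementHandler = on_start
       parser.Parse(data, True)  # exceptions raised in handlers propagate here
       return tags

   print(start_tags_without_entities(b"<root><a/></root>"))  # ['root', 'a']

   try:
       start_tags_without_entities(b'<!DOCTYPE r [<!ENTITY x "y">]><r>&x;</r>')
       blocked = False
   except EntitiesForbidden as exc:
       blocked = True
       print(exc)  # entity 'x' is not allowed
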
.. _defusedxml: https://pypi.org/project/defusedxml/
.. _Billion Laughs: https://en.wikipedia.org/wiki/Billion_laughs
.. _ZIP bomb: https://en.wikipedia.org/wiki/Zip_bomb
.. _DTD: https://en.wikipedia.org/wiki/Document_type_definition