././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1702224198.3803241 feedparser-6.0.11/0000775000175000017500000000000014535360506012524 5ustar00kurtkurt././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702223577.0 feedparser-6.0.11/LICENSE0000664000175000017500000000611214535357331013533 0ustar00kurtkurtfeedparser and its unit tests are released under the following license: ----- begin license block ----- Copyright (C) 2010-2023 Kurt McKee Copyright (C) 2002-2008 Mark Pilgrim All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
----- end license block ----- The feedparser documentation (everything in the docs/ directory) is released under the following license: ----- begin license block ----- Copyright (C) 2010-2023 Kurt McKee Copyright (C) 2004-2008 Mark Pilgrim. All rights reserved. Redistribution and use in source (Sphinx ReST) and "compiled" forms (HTML, PDF, PostScript, RTF and so forth) with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code (Sphinx ReST) must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in compiled form (converted to HTML, PDF, PostScript, RTF and other formats) must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. THIS DOCUMENTATION IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS DOCUMENTATION, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/MANIFEST.in0000664000175000017500000000016114535121615014254 0ustar00kurtkurtrecursive-include tests *.py *.xml *.gz *.z recursive-include docs *.rst *.py *.css include LICENSE include NEWS ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702223577.0 feedparser-6.0.11/NEWS0000664000175000017500000006172514535357331013240 0ustar00kurtkurtcoming in the next release: 6.0.11 - 9 December 2023 * Resolve ``cgi`` module deprecation warnings. (#330) 6.0.10 - 21 May 2022 * Populate ```` correctly if it comes after ````. (#260) 6.0.9 - 19 May 2022 * Fix a crash that can occur with GeoRSS feeds that lack a ```` tag. (#305) 6.0.8 - 22 June 2021 * Fix the name and link to the chardet module in the documentation. (#280) No code changed in this hotfix, only documentation. 6.0.7 - 21 June 2021 * Catch ``urllib.error.URLError`` to prevent crashes. (#239) 6.0.6 - 15 June 2021 * Prevent an AttributeError that occurs when a server returns HTTP 3xx but doesn't include a Location header as well. (#267) 6.0.5 - 14 June 2021 * Prevent a TypeError crash that may occur when including a username and password in the feed URL. (#276) 6.0.4 - 13 June 2021 * Prevent a UnicodeDecodeError crash that may occur when the title element's type attribute exists but is empty. (#277) * Prevent a UnicodeEncodeError crash that may occur if the URL contains Unicode characters in the path. (#273) 6.0.3 - 12 June 2021 * Fix an issue with the HTTP request status on Python >= 3.9. 6.0.2 - 25 October 2020 * Stop building Python wheels with ``universal=1`` set. (#251) This was causing pip to find and install the feedparser 6.x wheels on Python 2 even though Python 2 is no longer supported. * Fix a bug that put a trailing quote in the documentation version. (#232) * Update the documentation URL to point to ReadTheDocs. 
6.0.1 - 15 September 2020 [YANKED] * Remove all Python 2 compatibility code (#228) * Add *python_requires* to ``setup.py`` (#231) 6.0.0 - 12 September 2020 [YANKED] * Support Python 3.6, 3.7, 3.8 and 3.9 * Drop support for Python 2.4 through 2.7, and Python 3.0 through 3.5 (#169) * Convert feedparser from a monolithic file to a package * ``feedparser.parse(sanitize_html=bool)`` argument replaces the ``feedparser.SANITIZE_HTML`` global * ``feedparser.parse(resolve_relative_uris=bool)`` replaces the ``feedparser.RESOLVE_RELATIVE_URIS`` global * Unify the codebase so that 2to3 conversion is no longer required * Remove references to iconv_codecs * Update the Creative Commons namespace URI's * Update the default User-Agent name and URL * Support Middle European (Summer) Time timezones (#20) * Pass ``data`` to ``lazy_chardet_encoding()`` (#50) * Document that datetimes are returned in UTC (#51) * Remove cjkpython references in the documentation (#57) * Resolve ResourceWarnings thrown during unit tests (#170) * Fix tox build failures (#213) * Use ``base64.decodebytes()`` directly to support Python 3.9 (#201) * Fix Python 3.8 ``urllib.parse.splittype()`` deprecation warning (#211) * Support parsing colons in RFC822 timezones (#144) * Add `chardet` as an optional tox environment dependency * Fix the Big5 unit test that fails when chardet is installed (#184) 5.2.1 - July 23, 2015 * Fix #22 (pip package keeps upgrading all the time) 5.2.0 - April 16, 2015 * Support PyPy * Remove the HTTP Status 9001 test that caused unit test tracebacks * Remove the completely-untested HTML tidy code * Remove BeautifulSoup as a dependency * Remove the XFN microformat parsing code * Remove the rel_enclosure microformat parsing code * Remove the rel_hcard microformat parsing code * Remove the rel_tag microformat parsing code * Replace the regex-based RFC 822 date parser with a procedural one * Replace the Python-licensed W3DTF date parser * Support HTML5 audio/source/video element relative 
URL's * Remove the unparsed itunes_keywords key from the result dictionary * Fix issue 321 just a little more (yet another code path was missed) * Issue 62 (support georss and gml namespaces) * Issue 296 (GUID's are always treated like relative URI's) * Issue 334 (media:restriction element content is not returned) * Issue 335 (sub-elements of media:group are not parsed and returned) * Issue 342 (support multiple dc:creator elements) * Issue 357 (loose parser breaks ampersands in link element URL's) * Issue 374 (support the Podlove Simple Chapters namespace) * Issue 380 (support media:rating element) * Issue 384 (fix chardet support in Python 3) * Issue 389 (elements in unknown uppercase namespaces are ignored) * Issue 392 (tags element subverts 'tags' key in result dictionary) * Issue 396 (Podlove Simple Chapters version 1.0 causes a KeyError) * Issue 399 (docs call `request_headers` parameter `extra_headers`) * Issue 401 (support additional dcterms and media namespaces elements) * Issue 404 (support asctime datetime strings with timezone information) * Issue 407 (decode forward slashes encoded as character entities) * Issue 421 (delay chardet invocation as long as possible) * Issue 422 (add return types docstrings) * Issue 433 (update the list of allowed MathML elements and attributes) 5.1.3 - December 9, 2012 * Consolidated and simplified the character encoding detection code * Issue 346 (the gb2312 encoding isn't always upgraded to gb18030) * Issue 350 (HTTP Last-Modified example is incorrect in documentation) * Issue 352 (importing lxml.etree changes what exceptions libxml2 throws) * Issue 356 (add support for the HTML5 attributes `poster` and `preload`) * Issue 364 (enclosure-sniffing microformat code can throw ValueError) * Issue 373 (support RFC822-ish dates with swapped days and months) * Issue 376 (uppercase 'X' in hex character references cause ValueError) * Issue 382 (don't strip inline user:password credentials from FTP URL's) 5.1.2 - May 3, 2012 * 
Minor changes to the documentation * Strip potentially dangerous ENTITY declarations in encoded feeds * feedparser will now try to continue parsing despite compression errors * Fix issue 321 a little more (the initial fix missed a code path) * Issue 337 (`_parse_date_rfc822()` returns None on single-digit days) * Issue 343 (add magnet links to the ACCEPTABLE_URI_SCHEMES) * Issue 344 (handle deflated data with no headers nor checksums) * Issue 347 (support `itunes:image` elements with a `url` attribute) 5.1.1 - March 20, 2011 * Fix mistakes, typos, and bugs in the unit test code * Fix crash in Python 2.4 and 2.5 if the feed has a UTF_32 byte order mark * Replace the RFC822 date parser for more extensibility * Issue 304 (handle RFC822 dates with timezones like GMT+00:00) * Issue 309 (itunes:keywords should be split by commas, not whitespace) * Issue 310 (pubDate should map to `published`, not `updated`) * Issue 313 (include the compression test files in MANIFEST.in) * Issue 314 (far-flung RFC822 dates don't throw OverflowError on x64) * Issue 315 (HTTP server for unit tests runs on 0.0.0.0) * Issue 321 (malformed URIs can cause ValueError to be thrown) * Issue 322 (HTTP redirect to HTTP 304 causes SAXParseException) * Issue 323 (installing chardet causes 11 unit test failures) * Issue 325 (map `description_detail` to `summary_detail`) * Issue 326 (Unicode filename causes UnicodeEncodeError if locale is ASCII) * Issue 327 (handle RFC822 dates with extraneous commas) * Issue 328 (temporarily map `updated` to `published` due to issue 310) * Issue 329 (escape backslashes in Windows path in docs/introduction.rst) * Issue 331 (don't escape backslashes that are in raw strings in the docs) 5.1 - December 2, 2011 * Extensive, extensive unit test refactoring * Convert the Docbook documentation to ReST * Include the documentation in the source distribution * Consolidate the disparate README files into one * Support Jython somewhat (almost all unit tests pass) * Support Python 
3.2 * Fix Python 3 issues exposed by improved unit tests * Fix international domain name issues exposed by improved unit tests * Issue 148 (loose parser doesn't always return unicode strings) * Issue 204 (FeedParserDict behavior should not be controlled by `assert`) * Issue 247 (mssql date parser uses hardcoded tokyo timezone) * Issue 249 (KeyboardInterrupt and SystemExit exceptions being caught) * Issue 250 (`updated` can be a 9-tuple or a string, depending on context) * Issue 252 (running setup.py in Python 3 fails due to missing sgmllib) * Issue 253 (document that text/plain content isn't sanitized) * Issue 260 (Python 3 doesn't decompress gzip'ed or deflate'd content) * Issue 261 (popping from empty tag list) * Issue 262 (docs are missing from distribution files) * Issue 264 (vcard parser crashes on non-ascii characters) * Issue 265 (http header comparisons are case sensitive) * Issue 271 (monkey-patching sgmllib breaks other libraries) * Issue 272 (can't pass bytes or str to `parse()` in Python 3) * Issue 275 (`_parse_date()` doesn't catch OverflowError) * Issue 276 (mutable types used as default values in `parse()`) * Issue 277 (`python3 setup.py install` fails) * Issue 281 (`_parse_date()` doesn't catch ValueError) * Issue 282 (`_parse_date()` crashes when passed `None`) * Issue 285 (crash on empty xmlns attribute) * Issue 286 ('apos' character entity not handled properly) * Issue 289 (add an option to disable microformat parsing) * Issue 290 (Blogger's invalid img tags are unparseable) * Issue 292 (atom id element not explicitly supported) * Issue 294 ('categories' key exists but raises KeyError) * Issue 297 (unresolvable external doctype causes crash) * Issue 298 (nested nodes clobber actual values) * Issue 300 (performance improvements) * Issue 303 (unicode characters cause crash during relative uri resolution) * Remove "Hot RSS" support since the format doesn't actually exist * Remove the old feedparser.org website files from the source * Remove the 
feedparser command line interface * Remove the Zope interoperability hack * Remove extraneous whitespace 5.0.1 - February 20, 2011 * Fix issue 91 (invalid text in XML declaration causes sanitizer to crash) * Fix issue 254 (sanitization can be bypassed by malformed XML comments) * Fix issue 255 (sanitizer doesn't strip unsafe URI schemes) 5.0 - January 25, 2011 * Improved MathML support * Support microformats (rel-tag, rel-enclosure, xfn, hcard) * Support IRIs * Allow safe CSS through sanitization * Allow safe HTML5 through sanitization * Support SVG * Support inline XML entity declarations * Support unescaped quotes and angle brackets in attributes * Support additional date formats * Added the `request_headers` argument to parse() * Added the `response_headers` argument to parse() * Support multiple entry, feed, and source authors * Officially make Python 2.4 the earliest supported version * Support Python 3 * Bug fixes, bug fixes, bug fixes =============================================================================== 1.0 - 9/27/2002 - MAP - fixed namespace processing on prefixed RSS 2.0 elements, added Simon Fell's test suite 1.1 - 9/29/2002 - MAP - fixed infinite loop on incomplete CDATA sections 2.0 - 10/19/2002 JD - use inchannel to watch out for image and textinput elements which can also contain title, link, and description elements JD - check for isPermaLink='false' attribute on guid elements JD - replaced openAnything with open_resource supporting ETag and If-Modified-Since request headers JD - parse now accepts etag, modified, agent, and referrer optional arguments JD - modified parse to return a dictionary instead of a tuple so that any etag or modified information can be returned and cached by the caller 2.0.1 - 10/21/2002 - MAP - changed parse() so that if we don't get anything because of etag/modified, return the old etag/modified to the caller to indicate why nothing is being returned 2.0.2 - 10/21/2002 - JB - added the inchannel to the if 
statement, otherwise its useless. Fixes the problem JD was addressing by adding it. 2.1 - 11/14/2002 - MAP - added gzip support 2.2 - 1/27/2003 - MAP - added attribute support, admin:generatorAgent. start_admingeneratoragent is an example of how to handle elements with only attributes, no content. 2.3 - 6/11/2003 - MAP - added USER_AGENT for default (if caller doesn't specify); also, make sure we send the User-Agent even if urllib2 isn't available. Match any variation of backend.userland.com/rss namespace. 2.3.1 - 6/12/2003 - MAP - if item has both link and guid, return both as-is. 2.4 - 7/9/2003 - MAP - added preliminary Pie/Atom/Echo support based on Sam Ruby's snapshot of July 1 ; changed project name 2.5 - 7/25/2003 - MAP - changed to Python license (all contributors agree); removed unnecessary urllib code -- urllib2 should always be available anyway; return actual url, status, and full HTTP headers (as result['url'], result['status'], and result['headers']) if parsing a remote feed over HTTP -- this should pass all the HTTP tests at ; added the latest namespace-of-the-week for RSS 2.0 2.5.1 - 7/26/2003 - RMK - clear opener.addheaders so we only send our custom User-Agent (otherwise urllib2 sends two, which confuses some servers) 2.5.2 - 7/28/2003 - MAP - entity-decode inline xml properly; added support for inline and as used in some RSS 2.0 feeds 2.5.3 - 8/6/2003 - TvdV - patch to track whether we're inside an image or textInput, and also to return the character encoding (if specified) 2.6 - 1/1/2004 - MAP - dc:author support (MarekK); fixed bug tracking nested divs within content (JohnD); fixed missing sys import (JohanS); fixed regular expression to capture XML character encoding (Andrei); added support for Atom 0.3-style links; fixed bug with textInput tracking; added support for cloud (MartijnP); added support for multiple category/dc:subject (MartijnP); normalize content model: 'description' gets description (which can come from description, summary, or 
full content if no description), 'content' gets dict of base/language/type/value (which can come from content:encoded, xhtml:body, content, or fullitem); fixed bug matching arbitrary Userland namespaces; added xml:base and xml:lang tracking; fixed bug tracking unknown tags; fixed bug tracking content when element is not in default namespace (like Pocketsoap feed); resolve relative URLs in link, guid, docs, url, comments, wfw:comment, wfw:commentRSS; resolve relative URLs within embedded HTML markup in description, xhtml:body, content, content:encoded, title, subtitle, summary, info, tagline, and copyright; added support for pingback and trackback namespaces 2.7 - 1/5/2004 - MAP - really added support for trackback and pingback namespaces, as opposed to 2.6 when I said I did but didn't really; sanitize HTML markup within some elements; added mxTidy support (if installed) to tidy HTML markup within some elements; fixed indentation bug in _parse_date (FazalM); use socket.setdefaulttimeout if available (FazalM); universal date parsing and normalization (FazalM): 'created', modified', 'issued' are parsed into 9-tuple date format and stored in 'created_parsed', 'modified_parsed', and 'issued_parsed'; 'date' is duplicated in 'modified' and vice-versa; 'date_parsed' is duplicated in 'modified_parsed' and vice-versa 2.7.1 - 1/9/2004 - MAP - fixed bug handling " and '. fixed memory leak not closing url opener (JohnD); added dc:publisher support (MarekK); added admin:errorReportsTo support (MarekK); Python 2.1 dict support (MarekK) 2.7.4 - 1/14/2004 - MAP - added workaround for improperly formed
<br/>
tags in encoded HTML (skadz); fixed unicode handling in normalize_attrs (ChrisL); fixed relative URI processing for guid (skadz); added ICBM support; added base64 support 2.7.5 - 1/15/2004 - MAP - added workaround for malformed DOCTYPE (seen on many blogspot.com sites); added _debug variable 2.7.6 - 1/16/2004 - MAP - fixed bug with StringIO importing 3.0b3 - 1/23/2004 - MAP - parse entire feed with real XML parser (if available); added several new supported namespaces; fixed bug tracking naked markup in description; added support for enclosure; added support for source; re-added support for cloud which got dropped somehow; added support for expirationDate 3.0b4 - 1/26/2004 - MAP - fixed xml:lang inheritance; fixed multiple bugs tracking xml:base URI, one for documents that don't define one explicitly and one for documents that define an outer and an inner xml:base that goes out of scope before the end of the document 3.0b5 - 1/26/2004 - MAP - fixed bug parsing multiple links at feed level 3.0b6 - 1/27/2004 - MAP - added feed type and version detection, result['version'] will be one of SUPPORTED_VERSIONS.keys() or empty string if unrecognized; added support for creativeCommons:license and cc:license; added support for full Atom content model in title, tagline, info, copyright, summary; fixed bug with gzip encoding (not always telling server we support it when we do) 3.0b7 - 1/28/2004 - MAP - support Atom-style author element in author_detail (dictionary of 'name', 'url', 'email'); map author to author_detail if author contains name + email address 3.0b8 - 1/28/2004 - MAP - added support for contributor 3.0b9 - 1/29/2004 - MAP - fixed check for presence of dict function; added support for summary 3.0b10 - 1/31/2004 - MAP - incorporated ISO-8601 date parsing routines from xml.util.iso8601 3.0b11 - 2/2/2004 - MAP - added 'rights' to list of elements that can contain dangerous markup; fiddled with decodeEntities (not right); liberalized date parsing even further 3.0b12 
- 2/6/2004 - MAP - fiddled with decodeEntities (still not right); added support to Atom 0.2 subtitle; added support for Atom content model in copyright; better sanitizing of dangerous HTML elements with end tags (script, frameset) 3.0b13 - 2/8/2004 - MAP - better handling of empty HTML tags (br, hr, img, etc.) in embedded markup, in either HTML or XHTML form (
<br>, <br/>, <br />
) 3.0b14 - 2/8/2004 - MAP - fixed CDATA handling in non-wellformed feeds under Python 2.1 3.0b15 - 2/11/2004 - MAP - fixed bug resolving relative links in wfw:commentRSS; fixed bug capturing author and contributor URL; fixed bug resolving relative links in author and contributor URL; fixed bug resolving relative links in generator URL; added support for recognizing RSS 1.0; passed Simon Fell's namespace tests, and included them permanently in the test suite with his permission; fixed namespace handling under Python 2.1 3.0b16 - 2/12/2004 - MAP - fixed support for RSS 0.90 (broken in b15) 3.0b17 - 2/13/2004 - MAP - determine character encoding as per RFC 3023 3.0b18 - 2/17/2004 - MAP - always map description to summary_detail (Andrei); use libxml2 (if available) 3.0b19 - 3/15/2004 - MAP - fixed bug exploding author information when author name was in parentheses; removed ultra-problematic mxTidy support; patch to workaround crash in PyXML/expat when encountering invalid entities (MarkMoraes); support for textinput/textInput 3.0b20 - 4/7/2004 - MAP - added CDF support 3.0b21 - 4/14/2004 - MAP - added Hot RSS support 3.0b22 - 4/19/2004 - MAP - changed 'channel' to 'feed', 'item' to 'entries' in results dict; changed results dict to allow getting values with results.key as well as results[key]; work around embedded illformed HTML with half a DOCTYPE; work around malformed Content-Type header; if character encoding is wrong, try several common ones before falling back to regexes (if this works, bozo_exception is set to CharacterEncodingOverride); fixed character encoding issues in BaseHTMLProcessor by tracking encoding and converting from Unicode to raw strings before feeding data to sgmllib.SGMLParser; convert each value in results to Unicode (if possible), even if using regex-based parsing 3.0b23 - 4/21/2004 - MAP - fixed UnicodeDecodeError for feeds that contain high-bit characters in attributes in embedded HTML in description (thanks Thijs van de Vossen); moved 
guid, date, and date_parsed to mapped keys in FeedParserDict; tweaked FeedParserDict.has_key to return True if asking about a mapped key 3.0fc1 - 4/23/2004 - MAP - made results.entries[0].links[0] and results.entries[0].enclosures[0] into FeedParserDict; fixed typo that could cause the same encoding to be tried twice (even if it failed the first time); fixed DOCTYPE stripping when DOCTYPE contained entity declarations; better textinput and image tracking in illformed RSS 1.0 feeds 3.0fc2 - 5/10/2004 - MAP - added and passed Sam's amp tests; added and passed my blink tag tests 3.0fc3 - 6/18/2004 - MAP - fixed bug in _changeEncodingDeclaration that failed to parse utf-16 encoded feeds; made source into a FeedParserDict; duplicate admin:generatorAgent/@rdf:resource in generator_detail.url; added support for image; refactored parse() fallback logic to try other encodings if SAX parsing fails (previously it would only try other encodings if re-encoding failed); remove unichr madness in normalize_attrs now that we're properly tracking encoding in and out of BaseHTMLProcessor; set feed.language from root-level xml:lang; set entry.id from rdf:about; send Accept header 3.0 - 6/21/2004 - MAP - don't try iso-8859-1 (can't distinguish between iso-8859-1 and windows-1252 anyway, and most incorrectly marked feeds are windows-1252); fixed regression that could cause the same encoding to be tried twice (even if it failed the first time) 3.0.1 - 6/22/2004 - MAP - default to us-ascii for all text/* content types; recover from malformed content-type header parameter with no equals sign ('text/xml; charset:iso-8859-1') 3.1 - 6/28/2004 - MAP - added and passed tests for converting HTML entities to Unicode equivalents in illformed feeds (aaronsw); added and passed tests for converting character entities to Unicode equivalents in illformed feeds (aaronsw); test for valid parsers when setting XML_AVAILABLE; make version and encoding available when server returns a 304; add handlers 
parameter to pass arbitrary urllib2 handlers (like digest auth or proxy support); add code to parse username/password out of url and send as basic authentication; expose downloading-related exceptions in bozo_exception (aaronsw); added __contains__ method to FeedParserDict (aaronsw); added publisher_detail (aaronsw) 3.2 - 7/3/2004 - MAP - use cjkcodecs and iconv_codec if available; always convert feed to UTF-8 before passing to XML parser; completely revamped logic for determining character encoding and attempting XML parsing (much faster); increased default timeout to 20 seconds; test for presence of Location header on redirects; added tests for many alternate character encodings; support various EBCDIC encodings; support UTF-16BE and UTF16-LE with or without a BOM; support UTF-8 with a BOM; support UTF-32BE and UTF-32LE with or without a BOM; fixed crashing bug if no XML parsers are available; added support for 'Content-encoding: deflate'; send blank 'Accept-encoding: ' header if neither gzip nor zlib modules are available 3.3 - 7/15/2004 - MAP - optimize EBCDIC to ASCII conversion; fix obscure problem tracking xml:base and xml:lang if element declares it, child doesn't, first grandchild redeclares it, and second grandchild doesn't; refactored date parsing; defined public registerDateHandler so callers can add support for additional date formats at runtime; added support for OnBlog, Nate, MSSQL, Greek, and Hungarian dates (ytrewq1); added zopeCompatibilityHack() which turns FeedParserDict into a regular dictionary, required for Zope compatibility, and also makes command- line debugging easier because pprint module formats real dictionaries better than dictionary-like objects; added NonXMLContentType exception, which is stored in bozo_exception when a feed is served with a non-XML media type such as 'text/plain'; respect Content-Language as default language if not xml:lang is present; cloud dict is now FeedParserDict; generator dict is now FeedParserDict; better 
tracking of xml:lang, including support for xml:lang='' to unset the current language; recognize RSS 1.0 feeds even when RSS 1.0 namespace is not the default namespace; don't overwrite final status on redirects (scenarios: redirecting to a URL that returns 304, redirecting to a URL that redirects to another URL with a different type of redirect); add support for HTTP 303 redirects 4.0 - MAP - support for relative URIs in xml:base attribute; fixed encoding issue with mxTidy (phopkins); preliminary support for RFC 3229; support for Atom 1.0; support for iTunes extensions; new 'tags' for categories/keywords/etc. as array of dict {'term': term, 'scheme': scheme, 'label': label} to match Atom 1.0 terminology; parse RFC 822-style dates with no time; lots of other bug fixes 4.1 - MAP - removed socket timeout; added support for chardet library ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1702224198.3803241 feedparser-6.0.11/PKG-INFO0000644000175000017500000000452414535360506013624 0ustar00kurtkurtMetadata-Version: 2.1 Name: feedparser Version: 6.0.11 Summary: Universal feed parser, handles RSS 0.9x, RSS 1.0, RSS 2.0, CDF, Atom 0.3, and Atom 1.0 feeds Home-page: https://github.com/kurtmckee/feedparser Download-URL: https://pypi.python.org/pypi/feedparser Author: Kurt McKee Author-email: contactme@kurtmckee.org License: BSD-2-Clause Keywords: atom,cdf,feed,parser,rdf,rss Platform: POSIX Platform: Windows Classifier: Development Status :: 5 - Production/Stable Classifier: Intended Audience :: Developers Classifier: License :: OSI Approved :: BSD License Classifier: Operating System :: OS Independent Classifier: Programming Language :: Python Classifier: Programming Language :: Python :: 3.6 Classifier: Programming Language :: Python :: 3.7 Classifier: Programming Language :: Python :: 3.8 Classifier: Programming Language :: Python :: 3.9 Classifier: Topic :: Software Development :: Libraries :: Python Modules Classifier: Topic :: Text 
Processing :: Markup :: XML Requires-Python: >=3.6 Description-Content-Type: text/x-rst License-File: LICENSE Requires-Dist: sgmllib3k .. This file is part of feedparser. Copyright 2010-2023 Kurt McKee Copyright 2002-2008 Mark Pilgrim Released under the BSD 2-clause license. feedparser ########## Parse Atom and RSS feeds in Python. ---- Installation ============ feedparser can be installed by running pip: .. code-block:: console $ pip install feedparser Documentation ============= The feedparser documentation is available on the web at: https://feedparser.readthedocs.io/en/latest/ It is also included in its source format, ReST, in the ``docs/`` directory. To build the documentation you'll need the Sphinx package, which is available at: https://www.sphinx-doc.org/ You can then build HTML pages using a command similar to: .. code-block:: console $ sphinx-build -b html docs/ fpdocs This will produce HTML documentation in the ``fpdocs/`` directory. Testing ======= Feedparser has an extensive test suite, powered by tox. To run it, type this: .. code-block:: console $ python -m venv venv $ source venv/bin/activate # or "venv\Scripts\Activate.ps1" on Windows (venv) $ pip install -r requirements-dev.txt (venv) $ tox This will spawn an HTTP server that will listen on port 8097. The tests will fail if that port is in use. ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702223577.0 feedparser-6.0.11/README.rst0000664000175000017500000000242214535357331014215 0ustar00kurtkurt.. This file is part of feedparser. Copyright 2010-2023 Kurt McKee Copyright 2002-2008 Mark Pilgrim Released under the BSD 2-clause license. feedparser ########## Parse Atom and RSS feeds in Python. ---- Installation ============ feedparser can be installed by running pip: ..
code-block:: console $ pip install feedparser Documentation ============= The feedparser documentation is available on the web at: https://feedparser.readthedocs.io/en/latest/ It is also included in its source format, ReST, in the ``docs/`` directory. To build the documentation you'll need the Sphinx package, which is available at: https://www.sphinx-doc.org/ You can then build HTML pages using a command similar to: .. code-block:: console $ sphinx-build -b html docs/ fpdocs This will produce HTML documentation in the ``fpdocs/`` directory. Testing ======= Feedparser has an extensive test suite, powered by tox. To run it, type this: .. code-block:: console $ python -m venv venv $ source venv/bin/activate # or "venv\Scripts\Activate.ps1" on Windows (venv) $ pip install -r requirements-dev.txt (venv) $ tox This will spawn an HTTP server that will listen on port 8097. The tests will fail if that port is in use. ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1702224198.2323234 feedparser-6.0.11/docs/0000775000175000017500000000000014535360506013454 5ustar00kurtkurt././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1702224198.2323234 feedparser-6.0.11/docs/_static/0000775000175000017500000000000014535360506015102 5ustar00kurtkurt././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1679075946.0 feedparser-6.0.11/docs/_static/feedparser.css0000664000175000017500000000013314405125152017721 0ustar00kurtkurt.pre, .pre * { font-style: normal; font-family: monospace; white-space: pre; } ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/add_custom_css.py0000664000175000017500000000017314535121615017015 0ustar00kurtkurt# Makes Sphinx create a <link> to feedparser.css in the HTML output def setup(app): app.add_css_file('feedparser.css') ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1679075946.0
feedparser-6.0.11/docs/advanced.rst0000664000175000017500000000111614405125152015742 0ustar00kurtkurtAdvanced Features ################# .. toctree:: :maxdepth: 2 date-parsing html-sanitization content-normalization namespace-handling resolving-relative-links version-detection character-encoding bozo .. COMMENT:
<para>xxx</para> </abstract> </sectioninfo> <title>Language Detection xxx
././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1679075946.0 feedparser-6.0.11/docs/annotated-atom03.rst0000664000175000017500000001117014405125152017254 0ustar00kurtkurt.. _annotated.atom03: Atom 0.3 ======== This is a sample Atom 0.3 feed, annotated with links that show how each value can be accessed once the feed is parsed. .. caution:: Even though many of these elements are required according to the specification, real-world feeds may be missing any element. If an element is not present in the feed, it will not be present in the parsed results. You should not rely on any particular element being present. .. rubric:: Annotated Atom 0.3 feed .. container:: pre `"?> :ref:`Sample Feed <reference.feed.title>` :ref:`For documentation <em>only</em> ` :ref:`<p>Copyright 2004, Mark Pilgrim</p>< ` :ref:`Sample Toolkit ` \ :ref:`tag:feedparser.org,2004-04-20:/docs/examples/atom03.xml `\ \ :ref:`2004-04-20T11:56:34Z `\ :ref:`\
\

This is an Atom syndication feed.\

\
`
\ :ref:`First entry title <reference.entry.title>`\ \ :ref:`tag:feedparser.org,2004-04-20:/docs/examples/atom03.xml:3 `\ \ :ref:`2004-04-19T07:45:00Z `\ \ :ref:`2004-04-20T00:23:47Z `\ \ :ref:`2004-04-20T11:56:34Z `\ \ :ref:`Mark Pilgrim `\ \ :ref:`http://diveintomark.org/ `\ \ :ref:`mark@example.org `\ \ :ref:`Joe `\ \ :ref:`http://example.org/joe/ `\ \ :ref:`joe@example.org `\ \ :ref:`Sam `\ \ :ref:`http://example.org/sam/ `\ \ :ref:`sam@example.org `\ :ref:`Watch out for nasty tricks ` :ref:`\
Watch out for \ nasty tricks\\
`
././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1679075946.0 feedparser-6.0.11/docs/annotated-atom10.rst0000664000175000017500000001076714405125152017265 0ustar00kurtkurt.. _annotated.atom10: Atom 1.0 ======== This is a sample Atom 1.0 feed, annotated with links that show how each value can be accessed once the feed is parsed. .. caution:: Even though many of these elements are required according to the specification, real-world feeds may be missing any element. If an element is not present in the feed, it will not be present in the parsed results. You should not rely on any particular element being present. .. rubric:: Annotated Atom 1.0 feed .. container:: pre `"?> :ref:`Sample Feed <reference.feed.title>` :ref:`For documentation <em>only</em> ` :ref:`<p>Copyright 2005, Mark Pilgrim</p> ` :ref:`Sample Toolkit ` \ :ref:`tag:feedparser.org,2005-11-09:/docs/examples/atom10.xml `\ \ :ref:`2005-11-09T11:56:34Z `\ \ :ref:`First entry title <reference.entry.title>`\ \ :ref:`tag:feedparser.org,2005-11-09:/docs/examples/atom10.xml:3 `\ \ :ref:`2005-11-09T00:23:47Z `\ \ :ref:`2005-11-09T11:56:34Z `\ \ :ref:`Mark Pilgrim `\ \ :ref:`http://diveintomark.org/ `\ \ :ref:`mark@example.org `\ \ :ref:`Joe `\ \ :ref:`http://example.org/joe/ `\ \ :ref:`joe@example.org `\ \ :ref:`Sam `\ \ :ref:`http://example.org/sam/ `\ \ :ref:`sam@example.org `\ :ref:`Watch out for nasty tricks ` \ :ref:`\
Watch out for \ nasty tricks\\
`
././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/annotated-examples.rst0000664000175000017500000000027214535121615017774 0ustar00kurtkurt.. _annotated: Annotated Examples ################## .. toctree:: :maxdepth: 2 annotated-atom10 annotated-atom03 annotated-rss20 annotated-rss20-dc annotated-rss10 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1679075946.0 feedparser-6.0.11/docs/annotated-rss10.rst0000664000175000017500000000560414405125152017126 0ustar00kurtkurt.. _annotated.rss10: :abbr:`RSS (Rich Site Summary)` 1.0 =================================== This is a sample :abbr:`RSS (Rich Site Summary)` 1.0 feed, annotated with links that show how each value can be accessed once the feed is parsed. .. caution:: Even though many of these elements are required according to the specification, real-world feeds may be missing any element. If an element is not present in the feed, it will not be present in the parsed results. You should not rely on any particular element being present. .. rubric:: Annotated :abbr:`RSS (Rich Site Summary)` 1.0 feed .. container:: pre `"?> \ :ref:`Sample Feed <reference.feed.title>`\ \ :ref:`http://www.example.org/ `\ \ :ref:`For documentation only `\ \ :ref:`en `\ \ :ref:`Mark Pilgrim ` (:ref:`mark@example.org `) \ :ref:`2004-06-04T17:40:33-05:00 `\ \ :ref:`First of all <reference.entry.title>`\ \ :ref:`http://example.org/archives/2002/09/04.html#first_of_all `\ :ref:`Americans are fat. Smokers are stupid. People who don't speak Perl are irrelevant. ` \ :ref:`Quotes `\ \ :ref:`2004-05-30T14:23:54-06:00 `\ Ian Hickson\: \\ Americans are fat. Smokers are stupid. People who don't speak Perl are irrelevant. \\]]> ` ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1679075946.0 feedparser-6.0.11/docs/annotated-rss20-dc.rst0000664000175000017500000000537114405125152017514 0ustar00kurtkurt.. 
_annotated.rss20dc: RSS 2.0 with Namespaces ======================= This is a sample :abbr:`RSS (Rich Site Summary)` 2.0 feed that uses several allowable extension modules in namespaces. The feed is annotated with links that show how each value can be accessed once the feed is parsed. .. caution:: Even though many of these elements are required according to the specification, real-world feeds may be missing any element. If an element is not present in the feed, it will not be present in the parsed results. You should not rely on any particular element being present. .. rubric:: Annotated :abbr:`RSS (Rich Site Summary)` 2.0 feed with namespaces .. container:: pre `"?> \ :ref:`Sample Feed <reference.feed.title>`\ \ :ref:`http://example.org/ `\ \ :ref:`For documentation only `\ \ :ref:`en-us `\ \ :ref:`Mark Pilgrim ` (:ref:`mark@example.org `) \ :ref:`Copyright 2004 Mark Pilgrim `\ \ :ref:`2004-06-04T17:40:33-05:00 `\ \ :ref:`First of all <reference.entry.title>`\ \ :ref:`http://example.org/archives/2002/09/04.html#first_of_all `\ \ :ref:`1983@example.org `\ :ref:`Americans are fat. Smokers are stupid. People who don't speak Perl are irrelevant. 
` \ :ref:`Quotes `\ \ :ref:`2002-09-04T13:54:20-05:00 `\ Ian Hickson\: \\ \ :ref:`Sample Feed <reference.feed.title>`\ \ :ref:`For documentation <em>only</em> `\ \ :ref:`http://example.org/ `\ \ :ref:`en `\ \ :ref:`Copyright 2004, Mark Pilgrim `\ \ :ref:`editor@example.org `\ \ :ref:`webmaster@example.org `\ \ :ref:`Sat, 07 Sep 2002 0:00:01 GMT `\ \ :ref:`Examples `\ \ :ref:`Sample Toolkit `\ \ :ref:`http://feedvalidator.org/docs/rss2.html `\ \ :ref:`60 `\ \ :ref:`http://example.org/banner.png `\ \ :ref:`Example banner <reference.feed.image.title>`\ \ :ref:`http://example.org/ `\ \ :ref:`80 `\ \ :ref:`15 `\ \ :ref:`Search <reference.feed.textinput.title>`\ \ :ref:`Search this site: `\ \ :ref:`q `\ \ :ref:`http://example.org/mt/mt-search.cgi `\ \ :ref:`First item title <reference.entry.title>`\ \ :ref:`http://example.org/item/1 `\ \ :ref:`Watch out for <span style="background: url(javascript:window.location='http://example.org/')"> nasty tricks</span> ` \ :ref:`mark@example.org `\ \ :ref:`Miscellaneous `\ \ :ref:`http://example.org/comments/1 `\ \ :ref:`http://example.org/guid/1 `\ \ :ref:`Thu, 05 Sep 2002 0:00:01 GMT `\ ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/atom-detail.rst0000664000175000017500000000316114535121615016403 0ustar00kurtkurtGetting Detailed Information on Atom Elements ============================================= Several Atom elements share the Atom content model: title, subtitle, rights, summary, and of course content. (Atom 0.3 also had an info element which shared this content model.) :program:`Universal Feed Parser` captures all relevant metadata about these elements, most importantly the content type and the value itself. 
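As a rough sketch of why the content type matters, the hypothetical helper below (``render_detail`` is not part of feedparser) decides how to embed a detail dictionary in an HTML page. It escapes ``text/plain`` values and passes ``text/html`` and ``application/xhtml+xml`` values through unchanged, on the assumption that they are already markup. Plain dictionaries stand in for parsed results so that the example runs without network access.

```python
from html import escape


def render_detail(detail):
    """Return an HTML fragment for a ``*_detail`` style mapping.

    Hypothetical helper, not part of feedparser. It relies only on the
    'type' and 'value' keys.
    """
    content_type = detail.get('type', 'text/plain')
    value = detail.get('value', '')
    if content_type == 'text/plain':
        # Plain text may contain '<' or '&', so escape it before embedding.
        return escape(value)
    # text/html and application/xhtml+xml values are treated as markup.
    return value


# Stand-ins for d.feed.title_detail and d.feed.subtitle_detail.
title_detail = {'type': 'text/plain', 'value': 'Q&A <Sample Feed>'}
subtitle_detail = {'type': 'text/html', 'value': 'For documentation <em>only</em>'}

print(render_detail(title_detail))
print(render_detail(subtitle_detail))
```

With a real feed, the same function could be called on ``d.feed.title_detail`` after checking that the key exists.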
Detailed Information on Feed Elements ------------------------------------- :: >>> import feedparser >>> d = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml') >>> d.feed.title_detail {'type': u'text/plain', 'base': u'http://example.org/', 'language': u'en', 'value': u'Sample Feed'} >>> d.feed.subtitle_detail {'type': u'text/html', 'base': u'http://example.org/', 'language': u'en', 'value': u'For documentation only'} >>> d.feed.rights_detail {'type': u'text/html', 'base': u'http://example.org/', 'language': u'en', 'value': u'

Copyright 2004, Mark Pilgrim

'} >>> d.entries[0].title_detail {'type': u'text/plain', 'base': u'http://example.org/', 'language': u'en', 'value': u'First entry title'} >>> d.entries[0].summary_detail {'type': u'text/plain', 'base': u'http://example.org/', 'language': u'en', 'value': u'Watch out for nasty tricks'} >>> len(d.entries[0].content) 1 >>> d.entries[0].content[0] {'type': u'application/xhtml+xml', 'base': u'http://example.org/entry/3', 'language': u'en-US', 'value': u'
Watch out for nasty tricks
'} ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/basic-existence.rst0000664000175000017500000000127314535121615017253 0ustar00kurtkurtTesting for Existence ===================== Feeds in the real world may be missing elements, even elements that are required by the specification. You should always test for the existence of an element before getting its value. Never assume an element is present. To test whether elements exist, you can use standard :program:`Python` dictionary idioms. Testing if elements are present ------------------------------- :: >>> import feedparser >>> d = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml') >>> 'title' in d.feed True >>> 'ttl' in d.feed False >>> d.feed.get('title', 'No title') u'Sample feed' >>> d.feed.get('ttl', 60) 60 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/basic.rst0000664000175000017500000000030014535121615015254 0ustar00kurtkurtBasic Features ############## .. toctree:: :maxdepth: 2 introduction common-rss-elements common-atom-elements atom-detail uncommon-rss uncommon-atom basic-existence ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/bozo.rst0000664000175000017500000000226614535121615015161 0ustar00kurtkurt.. _advanced.bozo: Bozo Detection ============== :program:`Universal Feed Parser` can parse feeds whether they are well-formed :abbr:`XML (Extensible Markup Language)` or not. However, since some applications may wish to reject or warn users about non-well-formed feeds, :program:`Universal Feed Parser` sets the ``bozo`` bit when it detects that a feed is not well-formed. Thanks to `Tim Bray `_ for suggesting this terminology. 
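A minimal sketch of how an application might act on the bozo bit. The ``classify_feed`` helper and its return labels are hypothetical; the only keys it reads are ``bozo`` and ``bozo_exception``. Plain dictionaries stand in for the result of ``feedparser.parse()`` so that the example runs without network access.

```python
def classify_feed(result):
    """Return ('ok', None) or ('warn', reason) for a parsed result.

    Hypothetical helper, not part of feedparser. ``result`` is any
    object with a dict-style ``get`` method.
    """
    if not result.get('bozo'):
        return ('ok', None)
    # bozo_exception may be absent, so fall back to a generic message.
    exc = result.get('bozo_exception')
    reason = str(exc) if exc is not None else 'feed is not well-formed'
    return ('warn', reason)


# Stand-ins for the results of parsing a well-formed and an ill-formed feed.
well_formed = {'bozo': 0}
ill_formed = {'bozo': 1, 'bozo_exception': ValueError("expected '>'")}

print(classify_feed(well_formed))
print(classify_feed(ill_formed))
```

Whether to reject the feed or merely warn the user is left to the caller; the sketch only separates the two cases.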
Detecting a non-well-formed feed
--------------------------------

::

    >>> import feedparser
    >>> d = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml')
    >>> d.bozo
    0
    >>> d = feedparser.parse('http://feedparser.org/tests/illformed/rss/aaa_illformed.xml')
    >>> d.bozo
    1
    >>> d.bozo_exception
    >>> exc = d.bozo_exception
    >>> exc.getMessage()
    "expected '>'\\n"
    >>> exc.getLineNumber()
    6

There are many reasons an :abbr:`XML (Extensible Markup Language)` document could be non-well-formed besides this example (incomplete end tags). See :ref:`advanced.encoding` for some other ways to trip the bozo bit.

feedparser-6.0.11/docs/changes-26.rst

Changes in version 2.6
======================

:program:`Ultra-liberal Feed Parser` 2.6 was released on January 1, 2004.

- dc:author support (MarekK)
- fixed bug tracking nested divs within content (JohnD)
- fixed missing :file:`sys` import (JohanS)
- fixed regular expression to capture :abbr:`XML (Extensible Markup Language)` character encoding (Andrei)
- added support for Atom 0.3-style links
- fixed bug with textInput tracking
- added support for cloud (MartijnP)
- added support for multiple category/dc:subject (MartijnP)
- normalize content model: ``description`` gets description (which can come from ``<description>``, ``<summary>``, or full content if no ``<description>``), ``content`` gets dict of ``base``/``language``/``type``/``value`` (which can come from ``<content:encoded>``, ``<xhtml:body>``, ``<content>``, or ``<fullitem>``)
- fixed bug matching arbitrary Userland namespaces
- added xml:base and xml:lang tracking
- fixed bug tracking unknown tags
- fixed bug tracking content when ``<content>`` element is not in default namespace (like Pocketsoap feed)
- resolve relative URLs in ``<link>``, ``<guid>``, ``<docs>``, ``<url>``, ``<comments>``, ``<wfw:comment>``, ``<wfw:commentRSS>``
- resolve relative :abbr:`URI (Uniform Resource Identifier)`\ s within embedded :abbr:`HTML (HyperText Markup Language)` markup in ``<description>``, ``<xhtml:body>``, ``<content>``, ``<content:encoded>``, ``<title>``, ``<subtitle>``, ``<summary>``, ``<info>``, ``<tagline>``, and ``<copyright>``
- added support for pingback and trackback namespaces

feedparser-6.0.11/docs/changes-27.rst

Changes in version 2.7.x
========================

The 2.7 series was a brief but necessary transition towards some of the core ideas in version 3.0.

:program:`Ultra-liberal Feed Parser` 2.7.6 was released on January 16, 2004.

- fixed bug with :file:`StringIO` importing

:program:`Ultra-liberal Feed Parser` 2.7.5 was released on January 15, 2004.

- added workaround for malformed DOCTYPE (seen on many ``blogspot.com`` sites)
- added ``_debug`` variable

:program:`Ultra-liberal Feed Parser` 2.7.4 was released on January 14, 2004.

- added workaround for improperly formed <br/> tags in encoded :abbr:`HTML (HyperText Markup Language)` (skadz)
- fixed unicode handling in normalize_attrs (ChrisL)
- fixed relative :abbr:`URI (Uniform Resource Identifier)` processing for guid (skadz)
- added ICBM support
- added :file:`base64` support

:program:`Ultra-liberal Feed Parser` 2.7.3 was released on January 14, 2004.

- reverted all changes made in 2.7.2

:program:`Ultra-liberal Feed Parser` 2.7.2 was released on January 13, 2004.

- "Version 2.7.2 of my feed parser, released today, will by default refuse to parse `this feed <http://intertwingly.net/stories/2004/01/12/broken.rss>`_. It does a first-pass check for wellformedness, and when that fails it sets the 'bozo' bit in the result to ``1`` and immediately terminates. You can revert to the previous behavior by passing ``disableWellFormedCheck=1``, but it will print arrogant warning messages to stderr to the effect that anyone who can't create a well-formed :abbr:`XML (Extensible Markup Language)` feed is a bozo and an incompetent fool." `source <http://intertwingly.net/blog/2004/01/12/Scientific-Method#c1074047818>`_

:program:`Ultra-liberal Feed Parser` 2.7.1 was released on January 9, 2004.

- fixed bug handling &quot; and &apos;
- fixed memory leak not closing url opener (JohnD)
- added dc:publisher support (MarekK)
- added admin:errorReportsTo support (MarekK)
- :program:`Python` 2.1 ``dict`` support (MarekK)

:program:`Ultra-liberal Feed Parser` 2.7 was released on January 5, 2004.

- really added support for trackback and pingback namespaces, as opposed to 2.6 when I said I did but didn't really
- sanitize :abbr:`HTML (HyperText Markup Language)` markup within some elements
- added :file:`mxTidy` support (if installed) to tidy :abbr:`HTML (HyperText Markup Language)` markup within some elements
- fixed indentation bug in ``_parse_date`` (FazalM)
- use ``socket.setdefaulttimeout`` if available (FazalM)
- universal date parsing and normalization (FazalM): ``created``, ``modified``, ``issued`` are parsed into 9-tuple date format and stored in ``created_parsed``, ``modified_parsed``, and ``issued_parsed``
- ``date`` is duplicated in ``modified`` and vice-versa
- ``date_parsed`` is duplicated in ``modified_parsed`` and vice-versa

feedparser-6.0.11/docs/changes-30.rst

Changes in version 3.0
======================

:program:`Universal Feed Parser` 3.0 was released on June 21, 2004.

- don't try ``iso-8859-1`` (can't distinguish between ``iso-8859-1`` and ``windows-1252`` anyway, and most incorrectly marked feeds are ``windows-1252``)
- fixed regression that could cause the same encoding to be tried twice (even if it failed the first time)

:program:`Universal Feed Parser` 3.0fc3 was released on June 18, 2004.

- fixed bug in ``_changeEncodingDeclaration`` that failed to parse UTF-16 encoded feeds
- made ``source`` into a FeedParserDict
- duplicate admin:generatorAgent/@rdf:resource in ``generator_detail.url``
- added support for image
- refactored ``parse()`` fallback logic to try other encodings if SAX parsing fails (previously it would only try other encodings if re-encoding failed)
- remove ``unichr`` madness in normalize_attrs now that we're properly tracking encoding in and out of BaseHTMLProcessor
- set ``feed.language`` from root-level xml:lang
- set ``entry.id`` from rdf:about
- send ``Accept`` header

:program:`Universal Feed Parser` 3.0fc2 was released on May 10, 2004.

- added and passed Sam's amp tests
- added and passed my blink tag tests

:program:`Universal Feed Parser` 3.0fc1 was released on April 23, 2004.

- made ``results.entries[0].links[0]`` and ``results.entries[0].enclosures[0]`` into FeedParserDict
- fixed typo that could cause the same encoding to be tried twice (even if it failed the first time)
- fixed DOCTYPE stripping when DOCTYPE contained entity declarations
- better textinput and image tracking in illformed :abbr:`RSS (Rich Site Summary)` 1.0 feeds

:program:`Universal Feed Parser` 3.0b23 was released on April 21, 2004.

- fixed ``UnicodeDecodeError`` for feeds that contain high-bit characters in attributes in embedded :abbr:`HTML (HyperText Markup Language)` in description (thanks Thijs van de Vossen)
- moved ``guid``, ``date``, and ``date_parsed`` to mapped keys in FeedParserDict
- tweaked FeedParserDict.has_key to return ``True`` if asking about a mapped key

:program:`Universal Feed Parser` 3.0b22 was released on April 19, 2004.

- changed ``channel`` to ``feed``, ``item`` to ``entries`` in ``results`` dict
- changed ``results`` dict to allow getting values with ``results.key`` as well as ``results[key]``
- work around embedded illformed :abbr:`HTML (HyperText Markup Language)` with half a DOCTYPE
- work around malformed ``Content-Type`` header
- if character encoding is wrong, try several common ones before falling back to regexes (if this works, ``bozo_exception`` is set to ``CharacterEncodingOverride``)
- fixed character encoding issues in BaseHTMLProcessor by tracking encoding and converting from Unicode to raw strings before feeding data to sgmllib.SGMLParser
- convert each value in results to Unicode (if possible), even if using regex-based parsing

:program:`Universal Feed Parser` 3.0b21 was released on April 14, 2004.

- added Hot RSS support

:program:`Universal Feed Parser` 3.0b20 was released on April 7, 2004.

- added :abbr:`CDF (Channel Definition Format)` support

:program:`Universal Feed Parser` 3.0b19 was released on March 15, 2004.

- fixed bug exploding author information when author name was in parentheses
- removed ultra-problematic :file:`mxTidy` support
- patch to workaround crash in PyXML/expat when encountering invalid entities (MarkMoraes)
- support for textinput/textInput

:program:`Universal Feed Parser` 3.0b18 was released on February 17, 2004.

- always map description to ``summary_detail`` (Andrei)
- use :file:`libxml2` (if available)

:program:`Universal Feed Parser` 3.0b17 was released on February 13, 2004.

- determine character encoding as per `RFC 3023 <http://www.ietf.org/rfc/rfc3023.txt>`_

:program:`Universal Feed Parser` 3.0b16 was released on February 12, 2004.

- fixed support for :abbr:`RSS (Rich Site Summary)` 0.90 (broken in b15)

:program:`Universal Feed Parser` 3.0b15 was released on February 11, 2004.

- fixed bug resolving relative links in wfw:commentRSS
- fixed bug capturing author and contributor :abbr:`URI (Uniform Resource Identifier)`
- fixed bug resolving relative links in author and contributor :abbr:`URI (Uniform Resource Identifier)`
- fixed bug resolving relative links in generator :abbr:`URI (Uniform Resource Identifier)`
- added support for recognizing :abbr:`RSS (Rich Site Summary)` 1.0
- passed Simon Fell's namespace tests, and included them permanently in the test suite with his permission
- fixed namespace handling under :program:`Python` 2.1

:program:`Universal Feed Parser` 3.0b14 was released on February 8, 2004.

- fixed CDATA handling in non-wellformed feeds under :program:`Python` 2.1

:program:`Universal Feed Parser` 3.0b13 was released on February 8, 2004.

- better handling of empty :abbr:`HTML (HyperText Markup Language)` tags (br, hr, img, etc.) in embedded markup, in either :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML (Extensible HyperText Markup Language)` form (<br>, <br/>, <br />)

:program:`Universal Feed Parser` 3.0b12 was released on February 6, 2004.

- fiddled with ``decodeEntities`` (still not right)
- added support for Atom 0.2 subtitle
- added support for Atom content model in copyright
- better sanitizing of dangerous :abbr:`HTML (HyperText Markup Language)` elements with end tags (script, frameset)

:program:`Universal Feed Parser` 3.0b11 was released on February 2, 2004.

- added rights to list of elements that can contain dangerous markup
- fiddled with ``decodeEntities`` (not right)
- liberalized date parsing even further

:program:`Universal Feed Parser` 3.0b10 was released on January 31, 2004.

- incorporated ISO-8601 date parsing routines from :file:`xml.util.iso8601`

:program:`Universal Feed Parser` 3.0b9 was released on January 29, 2004.

- fixed check for presence of ``dict`` function
- added support for summary

:program:`Universal Feed Parser` 3.0b8 was released on January 28, 2004.

- added support for contributor

:program:`Universal Feed Parser` 3.0b7 was released on January 28, 2004.

- support Atom-style author element in ``author_detail`` (dictionary of ``name``, ``url``, ``email``)
- map ``author`` to ``author_detail`` if ``author`` contains name + email address

:program:`Universal Feed Parser` 3.0b6 was released on January 27, 2004.

- added feed type and version detection, ``result['version']`` will be one of ``SUPPORTED_VERSIONS.keys()`` or empty string if unrecognized
- added support for creativeCommons:license and cc:license
- added support for full Atom content model in title, tagline, info, copyright, summary
- fixed bug with gzip encoding (not always telling server we support it when we do)

:program:`Universal Feed Parser` 3.0b5 was released on January 26, 2004.

- fixed bug parsing multiple links at feed level

:program:`Universal Feed Parser` 3.0b4 was released on January 26, 2004.

- fixed xml:lang inheritance
- fixed multiple bugs tracking xml:base :abbr:`URI (Uniform Resource Identifier)`, one for documents that don't define one explicitly and one for documents that define an outer and an inner xml:base that goes out of scope before the end of the document

:program:`Universal Feed Parser` 3.0b3 was released on January 23, 2004.

- parse entire feed with real :abbr:`XML (Extensible Markup Language)` parser (if available)
- added several new supported namespaces
- fixed bug tracking naked markup in description
- added support for enclosure
- added support for source
- re-added support for cloud which got dropped somehow
- added support for expirationDate

:program:`Universal Feed Parser` 3.0b2 and 3.0b1 have been lost in the mists of time.

feedparser-6.0.11/docs/changes-301.rst

Changes in version 3.0.1
========================

:program:`Universal Feed Parser` 3.0.1 was released on June 21, 2004.

- default to ``us-ascii`` for all text/* content types
- recover from malformed ``content-type`` header parameter with no equals sign ("text/xml; charset:iso-8859-1")
- docs: added :file:`reference-feed.html` and :file:`reference-entry.html` (bug #977723)
- docs: fixed ``entry[i]`` in documentation (should be ``entries[i]``) (bug #977722)
- docs: added note about Unicode string usage (bug #977716)
- docs: added :file:`basic-existence.html` (bug #977704)
- docs: fixed description of feed title (bug #977685)
- docs: fixed typo in annotated :abbr:`RSS (Rich Site Summary)` 1.0 feed (bug #977682)

feedparser-6.0.11/docs/changes-31.rst

Changes in version 3.1
======================

:program:`Universal Feed Parser` 3.1 was released on June 28, 2004.

- added and passed tests for converting :abbr:`HTML (HyperText Markup Language)` entities to Unicode equivalents in illformed feeds (aaronsw)
- added and passed tests for converting character entities to Unicode equivalents in illformed feeds (aaronsw)
- test for valid parsers when setting ``XML_AVAILABLE``
- make version and encoding available when server returns a ``304``
- add ``handlers`` parameter to pass arbitrary :file:`urllib2` handlers (like digest auth or proxy support)
- add code to parse username/password out of url and send as basic authentication
- expose downloading-related exceptions in ``bozo_exception`` (aaronsw)
- added __contains__ method to FeedParserDict (aaronsw)
- added ``publisher_detail`` (aaronsw)

.. feedparser-6.0.11/docs/changes-32.rst

Changes in version 3.2
======================

:program:`Universal Feed Parser` 3.2 was released on July 3, 2004.

- use :file:`cjkcodecs` and :file:`iconv_codec` if available
- always convert feed to UTF-8 before passing to :abbr:`XML (Extensible Markup Language)` parser
- completely revamped logic for determining character encoding and attempting :abbr:`XML (Extensible Markup Language)` parsing (much faster)
- increased default timeout to 20 seconds
- test for presence of ``Location`` header on redirects
- added tests for many alternate character encodings
- support various :abbr:`EBCDIC` encodings
- support UTF-16BE and UTF-16LE with or without a :abbr:`BOM (Byte Order Mark)`
- support UTF-8 with a :abbr:`BOM (Byte Order Mark)`
- support UTF-32BE and UTF-32LE with or without a :abbr:`BOM (Byte Order Mark)`
- fixed crashing bug if no :abbr:`XML (Extensible Markup Language)` parsers are available
- added support for ``Content-encoding: deflate``
- send blank ``Accept-encoding`` header if neither :file:`gzip` nor :file:`zlib` modules are available
.. feedparser-6.0.11/docs/changes-33.rst

Changes in version 3.3
======================

:program:`Universal Feed Parser` 3.3 was released on July 15, 2004.

- optimized :abbr:`EBCDIC` to :abbr:`ASCII` conversion
- fixed obscure problem tracking xml:base and xml:lang if element declares it, child doesn't, first grandchild redeclares it, and second grandchild doesn't
- refactored date parsing
- defined public ``registerDateHandler`` so callers can add support for additional date formats at runtime
- added support for OnBlog, Nate, MSSQL, Greek, and Hungarian dates (ytrewq1)
- added ``zopeCompatibilityHack()`` which turns FeedParserDict into a regular dictionary, required for :program:`Zope` compatibility, and also makes command-line debugging easier because pprint module formats real dictionaries better than dictionary-like objects
- added NonXMLContentType exception, which is stored in ``bozo_exception`` when a feed is served with a non-:abbr:`XML (Extensible Markup Language)` media type such as ``'text/plain'``
- respect ``Content-Language`` as default language if no xml:lang is present
- ``cloud`` dict is now FeedParserDict
- generator dict is now FeedParserDict
- better tracking of xml:lang, including support for xml:lang='' to unset the current language
- recognize :abbr:`RSS (Rich Site Summary)` 1.0 feeds even when :abbr:`RSS (Rich Site Summary)` 1.0 namespace is not the default namespace
- don't overwrite final status on redirects (scenarios: redirecting to a :abbr:`URI (Uniform Resource Identifier)` that returns ``304``, redirecting to a :abbr:`URI (Uniform Resource Identifier)` that redirects to another :abbr:`URI (Uniform Resource Identifier)` with a different type of redirect)
- add support for ``HTTP 303`` redirects

.. feedparser-6.0.11/docs/changes-40.rst

Changes in version 4.0
======================

:program:`Universal Feed Parser` 4.0 was released on December 23, 2005.
- Support for :ref:`annotated.atom10`.
- Support for :program:`iTunes` extensions.
- Support for dc:contributor.
- :program:`Universal Feed Parser` now captures the feed's :ref:`reference.namespaces`. See :ref:`advanced.namespaces` for details.
- Lots of things have been renamed to match Atom 1.0 terminology. ``issued`` is now :ref:`reference.entry.published`, ``modified`` is now :ref:`reference.entry.updated`, and ``url`` is now ``href`` everywhere. You can still access these elements with the old names, so you shouldn't need to change any existing code, but don't be surprised if you can't find them during debugging.
- ``category`` and ``categories`` have been replaced by ``tags``; see :ref:`reference.feed.tags` and :ref:`reference.entry.tags`. The old names still work.
- ``mode`` is gone from all detail and content dictionaries. It was never terribly useful, since :program:`Universal Feed Parser` unescapes content automatically.
- :ref:`reference.entry.source` is now a dictionary of feed metadata as per section 4.2.11 of RFC 4287. :program:`Universal Feed Parser` no longer supports the :abbr:`RSS (Rich Site Summary)` 2.0 source element.
- Content in unknown namespaces is no longer discarded (`bug 993305 <http://sourceforge.net/tracker/index.php?func=detail&aid=993305&group_id=112328&atid=661937>`_)
- Lots of other bug fixes.

.. feedparser-6.0.11/docs/changes-401.rst

Changes in version 4.0.1
========================

:program:`Universal Feed Parser` 4.0.1 was released on December 24, 2005.
- bug fixes for :program:`Python` 2.1 compatibility.

.. feedparser-6.0.11/docs/changes-402.rst

Changes in version 4.0.2
========================

:program:`Universal Feed Parser` 4.0.2 was released on December 24, 2005.

- cleared ``_debug`` flag.

.. feedparser-6.0.11/docs/changes-41.rst

Changes in version 4.1
======================

:program:`Universal Feed Parser` 4.1 was released on January 11, 2006.

- Support for the `Universal Encoding Detector <http://chardet.feedparser.org/>`_ to autodetect character encoding of feeds that declare their encoding incorrectly or don't declare it at all. See :ref:`advanced.encoding` for details of when this gets called.
- :program:`Universal Feed Parser` no longer sets a default socket timeout (SourceForge bug `1392140 <http://sourceforge.net/tracker/index.php?func=detail&aid=1392140&group_id=112328&atid=661937>`_). If you were relying on this feature, you will need to call ``socket.setdefaulttimeout(TIMEOUT_IN_SECONDS)`` yourself.
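
For callers who relied on the old default, restoring a process-wide timeout might look like the following minimal sketch. The value of 20 seconds is an assumption borrowed from the default mentioned in the 3.2 changes; pick whatever suits your application.

.. sourcecode:: python

    import socket

    # Assumed value: 20 seconds mirrors the default that version 3.2 introduced.
    TIMEOUT_IN_SECONDS = 20.0

    # Applies to every socket created afterwards in this process,
    # including those opened while fetching a remote feed.
    socket.setdefaulttimeout(TIMEOUT_IN_SECONDS)

Because the setting is global, it also affects unrelated network code in the same process.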
.. feedparser-6.0.11/docs/changes-42.rst

Changes in version 4.2
======================

:program:`Universal Feed Parser` 4.2 was released on March 12, 2008.

- Support for parsing microformats, including rel=enclosure, rel=tag, XFN, and hCard.
- Updated the whitelist of :ref:`acceptable HTML elements and attributes <advanced.sanitization.html>` based on the latest draft of the :abbr:`HTML (HyperText Markup Language)` 5 specification.
- Support for :ref:`advanced.sanitization.css`. (Previous versions of :program:`Universal Feed Parser` simply stripped all inline styles.) Many thanks to Sam Ruby for implementing this, despite my insistence that it was impossible.
- Support for :ref:`advanced.sanitization.svg`.
- Support for :ref:`advanced.sanitization.mathml`. Many thanks to Jacques Distler for patiently debugging this feature.
- :abbr:`IRI (International Resource Identifier)` support for every element that can contain a :abbr:`URI (Uniform Resource Identifier)`.
- Ability to :ref:`disable relative URI resolution <advanced.base.disable>`.
- Command-line arguments and alternate serializers, for manipulating :program:`Universal Feed Parser` from shell scripts or other non-Python sources.
- More robust parsing of author email addresses, misencoded win-1252 content, and rel=self links, plus better detection of HTML content in elements with ambiguous content types.

.. feedparser-6.0.11/docs/changes-early.rst
Changes in earlier versions
===========================

:program:`Universal Feed Parser` began as an "ultra-liberal RSS parser" named :file:`rssparser.py`. It was written as a weapon for battles that no one remembers, to work around problems that no longer exist.

:program:`Ultra-liberal Feed Parser` 2.5.3 was released on August 3, 2003.

- track whether we're inside an image or textInput (TvdV)
- return the character encoding, if specified

:program:`Ultra-liberal Feed Parser` 2.5.2 was released on July 28, 2003.

- entity-decode inline :abbr:`XML (Extensible Markup Language)` properly
- added support for inline <xhtml:body> and <xhtml:div> as used in some :abbr:`RSS (Rich Site Summary)` 2.0 feeds

:program:`Ultra-liberal Feed Parser` 2.5.1 was released on July 26, 2003.

- clear ``opener.addheaders`` so we only send our custom ``User-Agent`` (otherwise :file:`urllib2` sends two, which confuses some servers) (RMK)

:program:`Ultra-liberal Feed Parser` 2.5 was released on July 25, 2003.

- changed to :program:`Python` license (all contributors agree)
- removed unnecessary :file:`urllib` code -- :file:`urllib2` should always be available anyway
- return actual ``url``, ``status``, and full :abbr:`HTTP (Hypertext Transfer Protocol)` headers (as ``result['url']``, ``result['status']``, and ``result['headers']``) if parsing a remote feed over :abbr:`HTTP (Hypertext Transfer Protocol)`. This should pass all the `Aggregator client HTTP tests <https://web.archive.org/web/20110404234421/http://diveintomark.org/tests/client/http/>`_.
- added the latest namespace-of-the-week for :abbr:`RSS (Rich Site Summary)` 2.0

:program:`Ultra-liberal Feed Parser` 2.4 was released on July 9, 2003.

- added preliminary Pie/Atom/Echo support based on `Sam Ruby's snapshot of July 1 <http://www.intertwingly.net/blog/1506.html>`_
- changed project name

:program:`Ultra-liberal RSS Parser` 2.3.1 was released on June 12, 2003.

- if item has both link and guid, return both as-is

:program:`Ultra-liberal RSS Parser` 2.3 was released on June 11, 2003.

- added ``USER_AGENT`` for default (if caller doesn't specify)
- make sure we send the ``User-Agent`` even if :file:`urllib2` isn't available
- match any variation of ``backend.userland.com/rss`` namespace

:program:`Ultra-liberal RSS Parser` 2.2 was released on January 27, 2003.

- added attribute support and admin:generatorAgent. ``start_admingeneratoragent`` is an example of how to handle elements with only attributes, no content.

:program:`Ultra-liberal RSS Parser` 2.1 was released on November 14, 2002.

- added gzip support

:program:`Ultra-liberal RSS Parser` 2.0.2 was released on October 21, 2002.

- added the ``inchannel`` to the ``if`` statement, otherwise it's useless. Fixes the problem JD was addressing by adding it. (JB)

:program:`Ultra-liberal RSS Parser` 2.0.1 was released on October 21, 2002.

- changed ``parse()`` so that if we don't get anything because of ``etag``/``modified``, return the old ``etag``/``modified`` to the caller to indicate why nothing is being returned

:program:`Ultra-liberal RSS Parser` 2.0 was released on October 19, 2002.
- use ``inchannel`` to watch out for image and textinput elements which can also contain title, link, and description elements (JD)
- check for isPermaLink='false' attribute on guid elements (JD)
- replaced ``openAnything`` with ``open_resource`` supporting ``ETag`` and ``If-Modified-Since`` request headers (JD)
- ``parse`` now accepts ``etag``, ``modified``, ``agent``, and ``referrer`` optional arguments (JD)
- modified ``parse`` to return a dictionary instead of a tuple so that any ``etag`` or ``modified`` information can be returned and cached by the caller

:program:`Ultra-liberal RSS Parser` 1.1 was released on September 27, 2002.

- fixed infinite loop on incomplete CDATA sections

:program:`Ultra-liberal RSS Parser` 1.0 was released on September 27, 2002.

- fixed namespace processing on prefixed :abbr:`RSS (Rich Site Summary)` 2.0 elements
- added Simon Fell's namespace test suite

:program:`Ultra-liberal RSS Parser` was first released on August 13, 2002.

`Announcement <https://web.archive.org/web/20110424133115/http://diveintomark.org/archives/2002/08/13/ultraliberal_rss_parser>`_:

Aaron Swartz has been looking for an ultra-liberal :abbr:`RSS (Rich Site Summary)` parser. Now that I'm experimenting with a homegrown :abbr:`RSS (Rich Site Summary)`-to-email news aggregator, so am I. You see, most :abbr:`RSS (Rich Site Summary)` feeds suck. Invalid characters, unescaped ampersands (Blogger feeds), invalid entities (Radio feeds), unescaped and invalid HTML (The Register's feed most days). Or just a bastardized mix of :abbr:`RSS (Rich Site Summary)` 0.9x elements with :abbr:`RSS (Rich Site Summary)` 1.0 elements (Movable Type feeds). Then there are feeds, like Aaron's feed, which are too bleeding edge. He puts an excerpt in the description element but puts the full text in the content:encoded element (as CDATA). This is valid :abbr:`RSS (Rich Site Summary)` 1.0, but nobody actually uses it (except Aaron), few news aggregators support it, and many parsers choke on it.
Other parsers are confused by the new elements (guid) in :abbr:`RSS (Rich Site Summary)` 0.94 (see Dave Winer's feed for an example). And then there's Jon Udell's feed, with the fullitem element that he just sort of made up.

:file:`rssparser.py`. GPL-licensed. Tested on 5000 active feeds.

.. feedparser-6.0.11/docs/character-encoding.rst

.. _advanced.encoding:

Character Encoding Detection
============================

`RFC 3023 <http://www.ietf.org/rfc/rfc3023.txt>`_ defines the interaction between :abbr:`XML (Extensible Markup Language)` and :abbr:`HTTP (Hypertext Transfer Protocol)` as it relates to character encoding. :abbr:`XML (Extensible Markup Language)` and :abbr:`HTTP (Hypertext Transfer Protocol)` have different ways of specifying character encoding and different defaults in case no encoding is specified, and determining which value takes precedence depends on a variety of factors.

Introduction to Character Encoding
----------------------------------

In :abbr:`XML (Extensible Markup Language)`, the character encoding is optional and may be given in the :abbr:`XML (Extensible Markup Language)` declaration in the first line of the document, like this:

.. sourcecode:: xml

    <?xml version="1.0" encoding="utf-8"?>

If no encoding is given, :abbr:`XML (Extensible Markup Language)` supports the use of a Byte Order Mark to identify the document as some flavor of UTF-32, UTF-16, or UTF-8. `Section F of the XML specification <http://www.w3.org/TR/REC-xml/#sec-guessing-no-ext-info>`_ outlines the process for determining the character encoding based on unique properties of the Byte Order Mark in the first two to four bytes of the document.

If no encoding is specified and no Byte Order Mark is present, :abbr:`XML (Extensible Markup Language)` defaults to UTF-8.

:abbr:`HTTP (Hypertext Transfer Protocol)` uses :abbr:`MIME` to define a method of specifying the character encoding, as part of the Content-Type :abbr:`HTTP (Hypertext Transfer Protocol)` header, which looks like this:

::

    Content-Type: text/html; charset="utf-8"

If no charset is specified, :abbr:`HTTP (Hypertext Transfer Protocol)` defaults to iso-8859-1, but only for text/* media types. For other media types, the default encoding is undefined, which is where :abbr:`RFC (Request For Comments)` 3023 comes in.
According to :abbr:`RFC (Request For Comments)` 3023, if the media type given in the Content-Type :abbr:`HTTP (Hypertext Transfer Protocol)` header is application/xml, application/xml-dtd, application/xml-external-parsed-entity, or any one of the subtypes of application/xml such as application/atom+xml or application/rss+xml or even application/rdf+xml, then the encoding is

#. the encoding given in the ``charset`` parameter of the Content-Type :abbr:`HTTP (Hypertext Transfer Protocol)` header, or
#. the encoding given in the encoding attribute of the :abbr:`XML (Extensible Markup Language)` declaration within the document, or
#. utf-8.

On the other hand, if the media type given in the Content-Type :abbr:`HTTP (Hypertext Transfer Protocol)` header is text/xml, text/xml-external-parsed-entity, or a subtype like text/AnythingAtAll+xml, then the encoding attribute of the :abbr:`XML (Extensible Markup Language)` declaration within the document is ignored completely, and the encoding is

#. the encoding given in the ``charset`` parameter of the Content-Type :abbr:`HTTP (Hypertext Transfer Protocol)` header, or
#. us-ascii.

Handling Incorrectly-Declared Encodings
---------------------------------------

:program:`Universal Feed Parser` initially uses the rules specified in :abbr:`RFC (Request For Comments)` 3023 to determine the character encoding of the feed. If parsing succeeds, then that's that. If parsing fails, :program:`Universal Feed Parser` sets the ``bozo`` bit to ``1`` and sets ``bozo_exception`` to ``feedparser.CharacterEncodingOverride``. Then it tries to reparse the feed with the following character encodings:

#. the encoding specified in the :abbr:`XML (Extensible Markup Language)` declaration
#. the encoding sniffed from the first four bytes of the document (as per `Section F <http://www.w3.org/TR/REC-xml/#sec-guessing-no-ext-info>`_)
#. the encoding auto-detected by the `chardet <https://github.com/chardet/chardet>`_ library, if installed
#. utf-8
#. windows-1252

If the character encoding cannot be determined, :program:`Universal Feed Parser` sets the ``bozo`` bit to ``1`` and sets ``bozo_exception`` to ``feedparser.CharacterEncodingUnknown``. In this case, parsed values will be strings, not Unicode strings.

Handling Incorrectly-Declared Media Types
-----------------------------------------

:abbr:`RFC (Request For Comments)` 3023 only applies when the feed is served over :abbr:`HTTP (Hypertext Transfer Protocol)` with a Content-Type that declares the feed to be some kind of :abbr:`XML (Extensible Markup Language)`. However, some web servers are severely misconfigured and serve feeds with a Content-Type of text/plain, application/octet-stream, or some completely bogus media type. :program:`Universal Feed Parser` will attempt to parse such feeds, but it will set the ``bozo`` bit to ``1`` and set ``bozo_exception`` to ``feedparser.NonXMLContentType``.

.. seealso::

   * `RFC 3023 <http://www.ietf.org/rfc/rfc3023.txt>`_
   * `Section F of the XML specification <http://www.w3.org/TR/REC-xml/#sec-guessing-no-ext-info>`_
   * `On the well-formedness of XML documents served as text/plain <http://www.imc.org/atom-syntax/mail-archive/msg05575.html>`_
x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1702142861.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������feedparser-6.0.11/docs/common-atom-elements.rst�����������������������������������������������������0000664�0001750�0001750�00000010561�14535121615�020245� 0����������������������������������������������������������������������������������������������������ustar�00kurt����������������������������kurt�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������Common Atom Elements ==================== Atom feeds generally contain more information than :abbr:`RSS (Rich Site Summary)` feeds (because more elements are required), but the most commonly used elements are still title, link, subtitle/description, various dates, and ID. This sample Atom feed is at `http://feedparser.org/docs/examples/atom10.xml <http://feedparser.org/docs/examples/atom10.xml>`_. .. 
sourcecode:: xml <?xml version="1.0" encoding="utf-8"?> <feed xmlns="http://www.w3.org/2005/Atom" xml:base="http://example.org/" xml:lang="en"> <title type="text">Sample Feed For documentation <em>only</em> <p>Copyright 2005, Mark Pilgrim</p>< tag:feedparser.org,2005-11-09:/docs/examples/atom10.xml Sample Toolkit 2005-11-09T11:56:34Z First entry title tag:feedparser.org,2005-11-09:/docs/examples/atom10.xml:3 2005-11-09T00:23:47Z 2005-11-09T11:56:34Z Watch out for nasty tricks
Watch out for nasty tricks
The feed elements are available in ``d.feed``.

Accessing Common Feed Elements
------------------------------

::

    >>> import feedparser
    >>> d = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml')
    >>> d.feed.title
    u'Sample Feed'
    >>> d.feed.link
    u'http://example.org/'
    >>> d.feed.subtitle
    u'For documentation only'
    >>> d.feed.updated
    u'2005-11-09T11:56:34Z'
    >>> d.feed.updated_parsed
    (2005, 11, 9, 11, 56, 34, 2, 313, 0)
    >>> d.feed.id
    u'tag:feedparser.org,2005-11-09:/docs/examples/atom10.xml'

Entries are available in ``d.entries``, which is a list. You access entries in the order in which they appear in the original feed, so the first entry is ``d.entries[0]``.

Accessing Common Entry Elements
-------------------------------

::

    >>> import feedparser
    >>> d = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml')
    >>> d.entries[0].title
    u'First entry title'
    >>> d.entries[0].link
    u'http://example.org/entry/3'
    >>> d.entries[0].id
    u'tag:feedparser.org,2005-11-09:/docs/examples/atom10.xml:3'
    >>> d.entries[0].published
    u'2005-11-09T00:23:47Z'
    >>> d.entries[0].published_parsed
    (2005, 11, 9, 0, 23, 47, 2, 313, 0)
    >>> d.entries[0].updated
    u'2005-11-09T11:56:34Z'
    >>> d.entries[0].updated_parsed
    (2005, 11, 9, 11, 56, 34, 2, 313, 0)
    >>> d.entries[0].summary
    u'Watch out for nasty tricks'
    >>> d.entries[0].content
    [{'type': u'application/xhtml+xml',
    'base': u'http://example.org/entry/3',
    'language': u'en-US',
    'value': u'
Watch out for nasty tricks
'}]

.. note::

    The parsed summary and content are not the same as they appear in the original feed. The original elements contained dangerous :abbr:`HTML (HyperText Markup Language)` markup which was sanitized. See :ref:`advanced.sanitization` for details.

Because Atom entries can have more than one content element, ``d.entries[0].content`` is a list of dictionaries. Each dictionary contains metadata about a single content element. The two most important values in the dictionary are the content type, in ``d.entries[0].content[0].type``, and the actual content value, in ``d.entries[0].content[0].value``.

You can get this level of detail on other Atom elements too.

././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/common-rss-elements.rst0000664000175000017500000000527214535121615020117 0ustar00kurtkurt
Common :abbr:`RSS (Rich Site Summary)` Elements
===============================================

The most commonly used elements in :abbr:`RSS (Rich Site Summary)` feeds (regardless of version) are title, link, description, publication date, and entry ID. The publication date comes from the pubDate element, and the entry ID comes from the guid element.

This sample :abbr:`RSS (Rich Site Summary)` feed is at `http://feedparser.org/docs/examples/rss20.xml <http://feedparser.org/docs/examples/rss20.xml>`_.

.. sourcecode:: xml

    <rss version="2.0">
    <channel>
    <title>Sample Feed</title>
    <description>For documentation &lt;em&gt;only&lt;/em&gt;</description>
    <link>http://example.org/</link>
    <pubDate>Sat, 07 Sep 2002 00:00:01 GMT</pubDate>
    <item>
    <title>First entry title</title>
    <link>http://example.org/entry/3</link>
    <description>Watch out for &lt;span style="background-image:
    url(javascript:window.location='http://example.org/')"&gt;nasty
    tricks&lt;/span&gt;</description>
    <pubDate>Thu, 05 Sep 2002 00:00:01 GMT</pubDate>
    <guid>http://example.org/entry/3</guid>
    </item>
    </channel>
    </rss>

The channel elements are available in ``d.feed``.
Accessing Common Channel Elements --------------------------------- :: >>> import feedparser >>> d = feedparser.parse('http://feedparser.org/docs/examples/rss20.xml') >>> d.feed.title u'Sample Feed' >>> d.feed.link u'http://example.org/' >>> d.feed.description u'For documentation only' >>> d.feed.published u'Sat, 07 Sep 2002 00:00:01 GMT' >>> d.feed.published_parsed (2002, 9, 7, 0, 0, 1, 5, 250, 0) The items are available in ``d.entries``, which is a list. You access items in the list in the same order in which they appear in the original feed, so the first item is available in ``d.entries[0]``. Accessing Common Item Elements ------------------------------ :: >>> import feedparser >>> d = feedparser.parse('http://feedparser.org/docs/examples/rss20.xml') >>> d.entries[0].title u'First item title' >>> d.entries[0].link u'http://example.org/item/1' >>> d.entries[0].description u'Watch out for nasty tricks' >>> d.entries[0].published u'Thu, 05 Sep 2002 00:00:01 GMT' >>> d.entries[0].published_parsed (2002, 9, 5, 0, 0, 1, 3, 248, 0) >>> d.entries[0].id u'http://example.org/guid/1' .. tip:: You can also access data from :abbr:`RSS (Rich Site Summary)` feeds using Atom terminology. See :ref:`advanced.normalization` for details. 
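Feeds in the wild do not always carry every one of these elements, so it can help to read them defensively. The following is a minimal sketch, assuming only that each entry behaves like a dictionary with a ``get`` method (which holds for the dictionary-style access shown elsewhere in this documentation); the helper name ``summarize_items`` is hypothetical and not part of :program:`Universal Feed Parser`.

```python
def summarize_items(entries):
    """Collect (title, link, published, id) for each item.

    Missing elements are reported as None instead of raising an error.
    """
    rows = []
    for entry in entries:
        rows.append((
            entry.get('title'),
            entry.get('link'),
            entry.get('published'),
            entry.get('id'),
        ))
    return rows


# Typical use (assumes feedparser is installed and the sample feed is
# reachable, so it is left commented out here):
#   import feedparser
#   d = feedparser.parse('http://feedparser.org/docs/examples/rss20.xml')
#   rows = summarize_items(d.entries)
```

Returning plain tuples keeps the sketch independent of the feed format, so the same helper would apply to Atom entries as well.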
././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702223577.0 feedparser-6.0.11/docs/conf.py0000664000175000017500000000135214535357331014756 0ustar00kurtkurt
import os
import pathlib
import re
import sys

content = (pathlib.Path(__file__).parent.parent / 'feedparser/__init__.py').read_text()
match = re.search(r"""__version__ = ['"](?P<version>.+?)['"]""", content)
version = match.group('version')
release = version

# project information
project = 'feedparser'
copyright = '2010-2023 Kurt McKee, 2004-2008 Mark Pilgrim'
language = 'en'

# documentation options
master_doc = 'index'
exclude_patterns = ['_build']

# use a custom extension to make Sphinx add a <link> to feedparser.css
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
extensions = ['add_custom_css']

# customize the html
# files in html_static_path will be copied into _static/ when compiled
html_static_path = ['_static']

././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/content-normalization.rst0000664000175000017500000000450314535121615020542 0ustar00kurtkurt
.. _advanced.normalization:

Content Normalization
=====================

:program:`Universal Feed Parser` can parse many different types of feeds: Atom, :abbr:`CDF (Channel Definition Format)`, and nine different versions of :abbr:`RSS (Rich Site Summary)`. You should not be forced to learn the differences between these formats. :program:`Universal Feed Parser` does its best to ensure that you can treat all feeds the same way, regardless of format or version.

You can access the basic elements of an Atom feed using :abbr:`RSS (Rich Site Summary)` terminology.
Accessing an Atom feed as an :abbr:`RSS (Rich Site Summary)` feed
-----------------------------------------------------------------

::

    >>> import feedparser
    >>> d = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml')
    >>> d['channel']['title']
    u'Sample Feed'
    >>> d['channel']['link']
    u'http://example.org/'
    >>> d['channel']['description']
    u'For documentation only'
    >>> len(d['items'])
    1
    >>> e = d['items'][0]
    >>> e['title']
    u'First entry title'
    >>> e['link']
    u'http://example.org/entry/3'
    >>> e['description']
    u'Watch out for nasty tricks'
    >>> e['author']
    u'Mark Pilgrim (mark@example.org)'

The same thing works in reverse: you can access :abbr:`RSS (Rich Site Summary)` feeds as if they were Atom feeds.

Accessing an :abbr:`RSS (Rich Site Summary)` feed as an Atom feed
-----------------------------------------------------------------

::

    >>> import feedparser
    >>> d = feedparser.parse('http://feedparser.org/docs/examples/rss20.xml')
    >>> d.feed.subtitle_detail
    {'type': 'text/html',
    'base': 'http://feedparser.org/docs/examples/rss20.xml',
    'language': None,
    'value': u'For documentation only'}
    >>> len(d.entries)
    1
    >>> e = d.entries[0]
    >>> e.links
    [{'rel': 'alternate',
    'type': 'text/html',
    'href': u'http://example.org/item/1'}]
    >>> e.summary_detail
    {'type': 'text/html',
    'base': 'http://feedparser.org/docs/examples/rss20.xml',
    'language': u'en',
    'value': u'Watch out for nasty tricks'}
    >>> e.updated_parsed
    (2002, 9, 5, 0, 0, 1, 3, 248, 0)

.. note::

    For more examples of how :program:`Universal Feed Parser` normalizes content from different formats, see :ref:`annotated`.

././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/date-parsing.rst0000664000175000017500000002177314535121615016572 0ustar00kurtkurt
.. _advanced.date:

Date Parsing
============

Different feed types and versions use wildly different date formats.
:program:`Universal Feed Parser` will attempt to auto-detect the date format used in any date element, and parse it into a standard :program:`Python` 9-tuple in UTC, as documented in `the Python time module `_. The following elements are parsed as dates: - :ref:`reference.feed.updated` is parsed into :ref:`reference.feed.updated_parsed`. - :ref:`reference.entry.published` is parsed into :ref:`reference.entry.published_parsed`. - :ref:`reference.entry.updated` is parsed into :ref:`reference.entry.updated_parsed`. - :ref:`reference.entry.created` is parsed into :ref:`reference.entry.created_parsed`. - :ref:`reference.entry.expired` is parsed into :ref:`reference.entry.expired_parsed`. History of Date Formats ----------------------- Here is a brief history of feed date formats: - :abbr:`CDF (Channel Definition Format)` states that all date values must conform to ISO 8601:1988. ISO 8601:1988 is not a freely available specification, but a brief (non-normative) description of the date formats it describes is available here: `ISO 8601:1988 Date/Time Representations `_. - :abbr:`RSS (Rich Site Summary)` 0.90 has no date elements. - Netscape :abbr:`RSS (Rich Site Summary)` 0.91 does not specify a date format, but examples within the specification show :abbr:`RFC (Request For Comments)` 822-style dates with 4-digit years. - Userland :abbr:`RSS (Rich Site Summary)` 0.91 states, "All date-times in :abbr:`RSS (Rich Site Summary)` conform to the Date and Time Specification of :abbr:`RFC (Request For Comments)` 822." `RFC 822 `_ mandates 2-digit years; it does not allow 4-digit years. - :abbr:`RSS (Rich Site Summary)` 1.0 states that all date elements must conform to `W3CDTF `_, which is a profile of ISO 8601:1988. - :abbr:`RSS (Rich Site Summary)` 2.0 states, "All date-times in :abbr:`RSS (Rich Site Summary)` conform to the Date and Time Specification of RFC 822, with the exception that the year may be expressed with two characters or four characters (four preferred)." 
- Atom 0.3 states that all date elements must conform to `W3CDTF `_.
- Atom 1.0 states that all date elements "MUST conform to the date-time production in `RFC 3339 `_. In addition, an uppercase T character MUST be used to separate date and time, and an uppercase Z character MUST be present in the absence of a numeric time zone offset."

Recognized Date Formats
-----------------------

Here is a representative list of the formats that :program:`Universal Feed Parser` can recognize in any date element:

Recognized Date Formats

============================================ ================================= =====================================
Description                                  Example                           Parsed Value
============================================ ================================= =====================================
valid RFC 822 (2-digit year)                 Thu, 01 Jan 04 19:48:21 GMT       (2004, 1, 1, 19, 48, 21, 3, 1, 0)
valid RFC 822 (4-digit year)                 Thu, 01 Jan 2004 19:48:21 GMT     (2004, 1, 1, 19, 48, 21, 3, 1, 0)
invalid RFC 822 (no time)                    01 Jan 2004                       (2004, 1, 1, 0, 0, 0, 3, 1, 0)
invalid RFC 822 (no seconds)                 01 Jan 2004 00:00 GMT             (2004, 1, 1, 0, 0, 0, 3, 1, 0)
valid W3CDTF (numeric timezone)              2003-12-31T10:14:55-08:00         (2003, 12, 31, 18, 14, 55, 2, 365, 0)
valid W3CDTF (UTC timezone)                  2003-12-31T10:14:55Z              (2003, 12, 31, 10, 14, 55, 2, 365, 0)
valid W3CDTF (yyyy)                          2003                              (2003, 1, 1, 0, 0, 0, 2, 1, 0)
valid W3CDTF (yyyy-mm)                       2003-12                           (2003, 12, 1, 0, 0, 0, 0, 335, 0)
valid W3CDTF (yyyy-mm-dd)                    2003-12-31                        (2003, 12, 31, 0, 0, 0, 2, 365, 0)
valid ISO 8601 (yyyymmdd)                    20031231                          (2003, 12, 31, 0, 0, 0, 2, 365, 0)
valid ISO 8601 (-yy-mm)                      -03-12                            (2003, 12, 1, 0, 0, 0, 0, 335, 0)
valid ISO 8601 (-yymm)                       -0312                             (2003, 12, 1, 0, 0, 0, 0, 335, 0)
valid ISO 8601 (-yy-mm-dd)                   -03-12-31                         (2003, 12, 31, 0, 0, 0, 2, 365, 0)
valid ISO 8601 (yymmdd)                      031231                            (2003, 12, 31, 0, 0, 0, 2, 365, 0)
valid ISO 8601 (yyyy-o)                      2003-335                          (2003, 12, 1, 0, 0, 0, 0, 335, 0)
valid ISO 8601 (yyo)                         03335                             (2003, 12, 1, 0, 0, 0, 0, 335, 0)
valid asctime                                Sun Jan 4 16:29:06 PST 2004       (2004, 1, 5, 0, 29, 6, 0, 5, 0)
bogus RFC 822 (invalid day/month)            Thu, 31 Jun 2004 19:48:21 GMT     (2004, 7, 1, 19, 48, 21, 3, 183, 0)
bogus RFC 822 (invalid month)                Mon, 26 January 2004 16:31:00 EST (2004, 1, 26, 21, 31, 0, 0, 26, 0)
bogus RFC 822 (invalid timezone)             Mon, 26 Jan 2004 16:31:00 ET      (2004, 1, 26, 21, 31, 0, 0, 26, 0)
bogus W3CDTF (invalid hour)                  2003-12-31T25:14:55Z              (2004, 1, 1, 1, 14, 55, 3, 1, 0)
bogus W3CDTF (invalid minute)                2003-12-31T10:61:55Z              (2003, 12, 31, 11, 1, 55, 2, 365, 0)
bogus W3CDTF (invalid second)                2003-12-31T10:14:61Z              (2003, 12, 31, 10, 15, 1, 2, 365, 0)
bogus (MSSQL)                                2004-07-08 23:56:58.0             (2004, 7, 8, 14, 56, 58, 3, 190, 0)
bogus (MSSQL-ish, without fractional second) 2004-07-08 23:56:58               (2004, 7, 8, 14, 56, 58, 3, 190, 0)
bogus (Korean)                               2004-05-25 오 11:23:17             (2004, 5, 25, 14, 23, 17, 1, 146, 0)
bogus (Greek)                                Κυρ, 11 Ιούλ 2004 12:00:00 EST    (2004, 7, 11, 17, 0, 0, 6, 193, 0)
bogus (Hungarian)                            július-13T9:15-05:00              (2004, 7, 13, 14, 15, 0, 1, 195, 0)
============================================ ================================= =====================================

:program:`Universal Feed Parser` recognizes all character-based timezone abbreviations defined in :abbr:`RFC (Request For Comments)` 822. In addition, :program:`Universal Feed Parser` recognizes the following invalid timezones:

- ``AT`` is treated as ``AST``
- ``ET`` is treated as ``EST``
- ``CT`` is treated as ``CST``
- ``MT`` is treated as ``MST``
- ``PT`` is treated as ``PST``

Supporting Additional Date Formats
----------------------------------

:program:`Universal Feed Parser` supports many different date formats, but there are probably many more in the wild that are still unsupported. If you find other date formats, you can support them by registering them with ``registerDateHandler``. It takes a single argument, a callback function.
The callback function should take a single argument, a string, and return a single value, a 9-tuple :program:`Python` date in UTC.

Registering a third-party date handler
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

    import feedparser
    import re

    # Require at least one digit for month, day, and hour; an empty
    # group would make int() fail in the handler below.
    _my_date_pattern = re.compile(
        r'(\d{1,2})/(\d{1,2})/(\d{4}) (\d{1,2}):(\d{2}):(\d{2})')

    def myDateHandler(aDateString):
        """parse a UTC date in MM/DD/YYYY HH:MM:SS format"""
        month, day, year, hour, minute, second = \
            _my_date_pattern.search(aDateString).groups()
        return (int(year), int(month), int(day),
                int(hour), int(minute), int(second), 0, 0, 0)

    feedparser.registerDateHandler(myDateHandler)
    d = feedparser.parse(...)

Your newly-registered date handler will be tried before all the other date handlers built into :program:`Universal Feed Parser`. (More specifically, all date handlers are tried in "last in, first out" order; i.e. the last handler to be registered is the first one tried, and so on in reverse order of registration.)

If your date handler returns ``None``, or anything other than a :program:`Python` 9-tuple date, or raises an exception of any kind, the error will be silently ignored and the other registered date handlers will be tried in order. If no date handlers succeed, then the date is not parsed, and the \*_parsed value will not be present in the results dictionary. The original date string will still be available in the appropriate element in the results dictionary.

.. tip::

    If you write a new date handler, you are encouraged (but not required) to `submit a patch `_ so it can be integrated into the next version of :program:`Universal Feed Parser`.

././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/history.rst0000664000175000017500000000037514535121615015710 0ustar00kurtkurt
Revision history
################

.. 
toctree:: :maxdepth: 2 changes-42 changes-41 changes-402 changes-401 changes-40 changes-33 changes-32 changes-31 changes-301 changes-30 changes-27 changes-26 changes-early ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/html-sanitization.rst0000664000175000017500000003603214535121615017664 0ustar00kurtkurt.. _advanced.sanitization: Sanitization ============ Most feeds embed :abbr:`HTML (HyperText Markup Language)` markup within feed elements. Some feeds even embed other types of markup, such as :abbr:`SVG (Scalable Vector Graphics)` or :abbr:`MathML (Mathematical Markup Language)`. Since many feed aggregators use a web browser (or browser component) to display content, :program:`Universal Feed Parser` sanitizes embedded markup to remove things that could pose security risks. These elements are sanitized by default: * :ref:`reference.entry.content` * :ref:`reference.entry.summary` * :ref:`reference.entry.title` * :ref:`reference.feed.info` * :ref:`reference.feed.rights` * :ref:`reference.feed.subtitle` * :ref:`reference.feed.title` .. note:: If the content is declared to be (or is determined to be) :mimetype:`text/plain`, it will not be sanitized. This is to avoid data loss. It is recommended that you check the content type in e.g. :py:attr:`entries[i].summary_detail.type`. If it is :mimetype:`text/plain` then it has not been sanitized (and you should perform HTML escaping before rendering the content). .. _advanced.sanitization.html: :abbr:`HTML (HyperText Markup Language)` Sanitization ----------------------------------------------------- The following :abbr:`HTML (HyperText Markup Language)` elements are allowed by default (all others are stripped): .. 
hlist:: :columns: 3 * a * abbr * acronym * address * area * article * aside * audio * b * big * blockquote * br * button * canvas * caption * center * cite * code * col * colgroup * command * datagrid * datalist * dd * del * details * dfn * dialog * dir * div * dl * dt * em * event-source * fieldset * figure * font * footer * form * h1 * h2 * h3 * h4 * h5 * h6 * header * hr * i * img * input * ins * kbd * keygen * label * legend * li * m * map * menu * meter * multicol * nav * nextid * noscript * ol * optgroup * option * output * p * pre * progress * q * s * samp * section * select * small * sound * source * spacer * span * strike * strong * sub * sup * table * tbody * td * textarea * tfoot * th * thead * time * tr * tt * u * ul * var * video The following :abbr:`HTML (HyperText Markup Language)` attributes are allowed by default (all others are stripped): .. hlist:: :columns: 3 * abbr * accept * accept-charset * accesskey * action * align * alt * autocomplete * autofocus * autoplay * axis * background * balance * bgcolor * bgproperties * border * bordercolor * bordercolordark * bordercolorlight * bottompadding * cellpadding * cellspacing * ch * challenge * char * charoff * charset * checked * choff * cite * class * clear * color * cols * colspan * compact * contenteditable * coords * data * datafld * datapagesize * datasrc * datetime * default * delay * dir * disabled * draggable * dynsrc * enctype * end * face * for * form * frame * galleryimg * gutter * headers * height * hidden * hidefocus * high * href * hreflang * hspace * icon * id * inputmode * ismap * keytype * label * lang * leftspacing * list * longdesc * loop * loopcount * loopend * loopstart * low * lowsrc * max * maxlength * media * method * min * multiple * name * nohref * noshade * nowrap * open * optimum * pattern * ping * point-size * poster * pqg * preload * prompt * radiogroup * readonly * rel * repeat-max * repeat-min * replace * required * rev * rightspacing * rows * rowspan * rules * scope * 
selected * shape * size * span * src * start * step * summary * suppress * tabindex * target * template * title * toppadding * type * unselectable * urn * usemap * valign * value * variable * volume * vrml * vspace * width * wrap * xml:lang .. _advanced.sanitization.svg: :abbr:`SVG (Scalable Vector Graphics)` Sanitization --------------------------------------------------- The following SVG elements are allowed by default (all others are stripped): .. hlist:: :columns: 3 * a * animate * animateColor * animateMotion * animateTransform * circle * defs * desc * ellipse * font-face * font-face-name * font-face-src * foreignObject * g * glyph * hkern * line * linearGradient * marker * metadata * missing-glyph * mpath * path * polygon * polyline * radialGradient * rect * set * stop * svg * switch * text * title * tspan * use The following :abbr:`SVG (Scalable Vector Graphics)` attributes are allowed by default (all others are stripped): .. hlist:: :columns: 3 * accent-height * accumulate * additive * alphabetic * arabic-form * ascent * attributeName * attributeType * baseProfile * bbox * begin * by * calcMode * cap-height * class * color * color-rendering * content * cx * cy * d * descent * display * dur * dx * dy * end * fill * fill-opacity * fill-rule * font-family * font-size * font-stretch * font-style * font-variant * font-weight * from * fx * fy * g1 * g2 * glyph-name * gradientUnits * hanging * height * horiz-adv-x * horiz-origin-x * id * ideographic * k * keyPoints * keySplines * keyTimes * lang * marker-end * marker-mid * marker-start * markerHeight * markerUnits * markerWidth * mathematical * max * min * name * offset * opacity * orient * origin * overline-position * overline-thickness * panose-1 * path * pathLength * points * preserveAspectRatio * r * refX * refY * repeatCount * repeatDur * requiredExtensions * requiredFeatures * restart * rotate * rx * ry * slope * stemh * stemv * stop-color * stop-opacity * strikethrough-position * strikethrough-thickness * 
stroke * stroke-dasharray * stroke-dashoffset * stroke-linecap * stroke-linejoin * stroke-miterlimit * stroke-opacity * stroke-width * systemLanguage * target * text-anchor * to * transform * type * u1 * u2 * underline-position * underline-thickness * unicode * unicode-range * units-per-em * values * version * viewBox * visibility * width * widths * x * x-height * x1 * x2 * xlink:actuate * xlink:arcrole * xlink:href * xlink:role * xlink:show * xlink:title * xlink:type * xml:base * xml:lang * xml:space * xmlns * xmlns:xlink * y * y1 * y2 * zoomAndPan .. _advanced.sanitization.mathml: :abbr:`MathML (Mathematical Markup Language)` Sanitization ---------------------------------------------------------- The following :abbr:`MathML (Mathematical Markup Language)` elements are allowed by default (all others are stripped): .. hlist:: :columns: 3 * annotation * annotation-xml * maction * maligngroup * malignmark * math * menclose * merror * mfenced * mfrac * mglyph * mi * mlabeledtr * mlongdiv * mmultiscripts * mn * mo * mover * mpadded * mphantom * mprescripts * mroot * mrow * ms * mscarries * mscarry * msgroup * msline * mspace * msqrt * msrow * mstack * mstyle * msub * msubsup * msup * mtable * mtd * mtext * mtr * munder * munderover * none * semantics The following :abbr:`MathML (Mathematical Markup Language)` attributes are allowed by default (all others are stripped): .. 
hlist:: :columns: 3 * accent * accentunder * actiontype * align * alignmentscope * altimg * altimg-height * altimg-valign * altimg-width * alttext * bevelled * charalign * close * columnalign * columnlines * columnspacing * columnspan * columnwidth * crossout * decimalpoint * denomalign * depth * dir * display * displaystyle * edge * encoding * equalcolumns * equalrows * fence * fontstyle * fontweight * form * frame * framespacing * groupalign * height * href * id * indentalign * indentalignfirst * indentalignlast * indentshift * indentshiftfirst * indentshiftlast * indenttarget * infixlinebreakstyle * largeop * length * linebreak * linebreakmultchar * linebreakstyle * lineleading * linethickness * location * longdivstyle * lquote * lspace * mathbackground * mathcolor * mathsize * mathvariant * maxsize * minlabelspacing * minsize * movablelimits * notation * numalign * open * other * overflow * position * rowalign * rowlines * rowspacing * rowspan * rquote * rspace * scriptlevel * scriptminsize * scriptsizemultiplier * selection * separator * separators * shift * side * src * stackalign * stretchy * subscriptshift * superscriptshift * symmetric * voffset * width * xlink:href * xlink:show * xlink:type * xmlns * xmlns:xlink .. _advanced.sanitization.css: :abbr:`CSS (Cascading Style Sheets)` Sanitization ------------------------------------------------- The following :abbr:`CSS (Cascading Style Sheets)` properties are allowed by default in style attributes (all others are stripped): .. 
hlist:: :columns: 3 * azimuth * background-color * border-bottom-color * border-collapse * border-color * border-left-color * border-right-color * border-top-color * clear * color * cursor * direction * display * elevation * float * font * font-family * font-size * font-style * font-variant * font-weight * height * letter-spacing * line-height * overflow * pause * pause-after * pause-before * pitch * pitch-range * richness * speak * speak-header * speak-numeral * speak-punctuation * speech-rate * stress * text-align * text-decoration * text-indent * unicode-bidi * vertical-align * voice-family * volume * white-space * width .. note:: Not all possible CSS values are allowed for these properties. The allowable values are restricted by a whitelist and a regular expression that allows color values and lengths. :abbr:`URI (Uniform Resource Identifier)`\s are not allowed, to prevent `platypus attacks `_. See the _HTMLSanitizer class for more details. Whitelist, Don't Blacklist -------------------------- I am often asked why :program:`Universal Feed Parser` is so hard-assed about :abbr:`HTML (HyperText Markup Language)` and :abbr:`CSS (Cascading Style Sheets)` sanitizing. To illustrate the problem, here is an incomplete list of potentially dangerous :abbr:`HTML (HyperText Markup Language)` tags and attributes: * script, which can contain malicious script * applet, embed, and object, which can automatically download and execute malicious code * meta, which can contain malicious redirects * onload, onunload, and all other on* attributes, which can contain malicious script * style, link, and the style attribute, which can contain malicious script *style?* Yes, style. :abbr:`CSS (Cascading Style Sheets)` definitions can contain executable code. Embedding Javascript in :abbr:`CSS (Cascading Style Sheets)` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This sample is taken from `http://feedparser.org/docs/examples/rss20.xml `_: .. 
sourcecode:: html Watch out for <span style="background: url(javascript:window.location='http://example.org/')"> nasty tricks</span> This sample is more advanced, and does not contain the keyword javascript: that many naive :abbr:`HTML (HyperText Markup Language)` sanitizers scan for: .. sourcecode:: html Watch out for <span style="any: expression(window.location='http://example.org/')"> nasty tricks</span> Internet Explorer for Windows will execute the Javascript in both of these examples. Now consider that in :abbr:`HTML (HyperText Markup Language)`, attribute values may be entity-encoded in several different ways. Embedding encoded Javascript in :abbr:`CSS (Cascading Style Sheets)` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ To a browser, this: .. sourcecode:: html is the same as this (without the line breaks): .. sourcecode:: html which is the same as this (without the line breaks): .. sourcecode:: html And so on, plus several other variations, plus every combination of every variation. The more I investigate, the more cases I find where Internet Explorer for Windows will treat seemingly innocuous markup as code and blithely execute it. This is why :program:`Universal Feed Parser` uses a whitelist and not a blacklist. I am reasonably confident that none of the elements or attributes on the whitelist are security risks. I am not at all confident about elements or attributes that I have not explicitly investigated. And I have no confidence at all in my ability to detect strings within attribute values that Internet Explorer for Windows will treat as executable code. Disabling HTML Sanitization ~~~~~~~~~~~~~~~~~~~~~~~~~~~ Though not recommended, it is possible to disable :program:`Universal Feed Parser`\'s HTML sanitization by passing ``sanitize_html=False`` to :func:`feedparser.parse()`. When passing this flag you are responsible for manually sanitizing HTML from the feed. .. 
seealso:: `How to consume RSS safely `_ Explains the platypus attack. ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/http-authentication.rst0000664000175000017500000001240214535121615020175 0ustar00kurtkurtPassword-Protected Feeds ======================== :program:`Universal Feed Parser` supports downloading and parsing password-protected feeds that are protected by :abbr:`HTTP (Hypertext Transfer Protocol)` authentication. Both basic and digest authentication are supported. Downloading a feed protected by basic authentication (the easy way) ------------------------------------------------------------------- The easiest way is to embed the username and password in the feed :abbr:`URL (Uniform Resource Locator)` itself. In this example, the username is test and the password is basic. :: >>> import feedparser >>> d = feedparser.parse('http://test:basic@feedparser.org/docs/examples/basic_auth.xml') >>> d.feed.title u'Sample Feed' The same technique works for digest authentication. (Technically, :program:`Universal Feed Parser` will attempt basic authentication first, but if that fails and the server indicates that it requires digest authentication, :program:`Universal Feed Parser` will automatically re-request the feed with the appropriate digest authentication headers. *This means that this technique will send your password to the server in an easily decryptable form.*) .. _example.auth.inline.digest: Downloading a feed protected by digest authentication (the easy but horribly insecure way) ------------------------------------------------------------------------------------------ In this example, the username is test and the password is digest. 
:: >>> import feedparser >>> d = feedparser.parse('http://test:digest@feedparser.org/docs/examples/digest_auth.xml') >>> d.feed.title u'Sample Feed' You can also construct an HTTPBasicAuthHandler that contains the password information, then pass that as a handler to the ``parse`` function. HTTPBasicAuthHandler is part of the standard `urllib.request `_ module (``urllib2`` in Python 2). Downloading a feed protected by :abbr:`HTTP (Hypertext Transfer Protocol)` basic authentication (the hard way) -------------------------------------------------------------------------------------------------------------- :: import urllib.request, feedparser # Construct the authentication handler auth = urllib.request.HTTPBasicAuthHandler() # Add password information: realm, host, user, password. # A single handler can contain passwords for multiple sites; # urllib.request will sort out which passwords get sent to which sites # based on the realm and host of the URL you're retrieving auth.add_password('BasicTest', 'feedparser.org', 'test', 'basic') # Pass the authentication handler to the feed parser. # handlers is a list because there might be more than one # type of handler (urllib.request defines lots of different ones, # and you can build your own) d = feedparser.parse('http://feedparser.org/docs/examples/basic_auth.xml', handlers=[auth]) Digest authentication is handled in much the same way, by constructing an HTTPDigestAuthHandler and populating it with the necessary realm, host, user, and password information. This is more secure than :ref:`stuffing the username and password in the URL `, since the password is hashed before being sent to the server.
Downloading a feed protected by :abbr:`HTTP (Hypertext Transfer Protocol)` digest authentication (the secure way) ----------------------------------------------------------------------------------------------------------------- :: import urllib.request, feedparser auth = urllib.request.HTTPDigestAuthHandler() auth.add_password('DigestTest', 'feedparser.org', 'test', 'digest') d = feedparser.parse('http://feedparser.org/docs/examples/digest_auth.xml', handlers=[auth]) The examples so far have assumed that you know in advance that the feed is password-protected. But what if you don't know? If you try to download a password-protected feed without sending all the proper password information, the server will return an :abbr:`HTTP (Hypertext Transfer Protocol)` status code ``401``. :program:`Universal Feed Parser` makes this status code available in ``d.status``. Details on the authentication scheme are in ``d.headers['www-authenticate']``. :program:`Universal Feed Parser` does not do any further parsing on this field; you will need to parse it yourself. Everything before the first space is the type of authentication (probably ``Basic`` or ``Digest``), which controls which type of handler you'll need to construct. The realm name is given as realm="foo" -- so foo would be your first argument to auth.add_password. Other information in the www-authenticate header is probably safe to ignore; the :file:`urllib.request` module will handle it for you.
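The header parsing described above can be sketched roughly as follows. This is only an illustration, not part of :program:`Universal Feed Parser`: the helper names ``parse_www_authenticate`` and ``build_auth_handler`` are hypothetical, the sketch assumes Python 3 (where the authentication handlers live in ``urllib.request``), and it assumes the header carries a single challenge.

```python
import re
import urllib.request

# Hypothetical helpers for illustration; feedparser does not provide these.
_HANDLERS = {
    'basic': urllib.request.HTTPBasicAuthHandler,
    'digest': urllib.request.HTTPDigestAuthHandler,
}


def parse_www_authenticate(header):
    """Return (scheme, realm) from a www-authenticate header value."""
    # Everything before the first space is the authentication type.
    scheme, _, params = header.partition(' ')
    # The realm is given as realm="foo"; other parameters are ignored here.
    match = re.search(r'realm="([^"]*)"', params)
    return scheme, (match.group(1) if match else None)


def build_auth_handler(header, host, user, password):
    """Construct the handler type that matches the advertised scheme."""
    scheme, realm = parse_www_authenticate(header)
    handler = _HANDLERS[scheme.lower()]()  # KeyError for unknown schemes
    handler.add_password(realm, host, user, password)
    return handler
```

The returned handler can then be passed to ``parse`` in the ``handlers`` list, as in the earlier examples.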
Determining that a feed is password-protected --------------------------------------------- :: >>> import feedparser >>> d = feedparser.parse('http://feedparser.org/docs/examples/basic_auth.xml') >>> d.status 401 >>> d.headers['www-authenticate'] 'Basic realm="Use test/basic"' >>> d = feedparser.parse('http://feedparser.org/docs/examples/digest_auth.xml') >>> d.status 401 >>> d.headers['www-authenticate'] 'Digest realm="DigestTest", nonce="+LV/uLLdAwA=5d77397291261b9ef256b034e19bcb94f5b7992a", algorithm=MD5, qop="auth"' ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/http-etag.rst0000664000175000017500000000644314535121615016106 0ustar00kurtkurt.. _http.etag: ETag and Last-Modified Headers ============================== ETags and Last-Modified headers are two ways that feed publishers can save bandwidth, but they only work if clients take advantage of them. :program:`Universal Feed Parser` gives you the ability to take advantage of these features, but you must use them properly. The basic concept is that a feed publisher may provide a special :abbr:`HTTP (Hypertext Transfer Protocol)` header, called an ETag, when it publishes a feed. You should send this ETag back to the server on subsequent requests. If the feed has not changed since the last time you requested it, the server will return a special :abbr:`HTTP (Hypertext Transfer Protocol)` status code (``304``) and no feed data. Using ETags to reduce bandwidth ------------------------------- :: >>> import feedparser >>> d = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml') >>> d.etag '"6c132-941-ad7e3080"' >>> d2 = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml', etag=d.etag) >>> d2.status 304 >>> d2.feed {} >>> d2.entries [] >>> d2.debug_message 'The feed has not changed since you last checked, so the server sent no data. This is a feature, not a bug!' 
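A polling routine built on ETags might look something like the sketch below. It is an assumption-laden illustration rather than anything shipped with :program:`Universal Feed Parser`: the function name ``poll`` and the plain-dictionary ``cache`` are hypothetical, and the ``parse`` argument exists only so the logic can be exercised without a network connection.

```python
def poll(url, cache, parse=None):
    """Fetch url, sending back any ETag remembered in cache.

    cache is a plain dict mapping feed URL -> last seen ETag (hypothetical).
    Returns the parsed result, or None if the server answered 304.
    """
    if parse is None:
        # Imported lazily so the sketch can be tried with a stand-in parser.
        import feedparser
        parse = feedparser.parse

    result = parse(url, etag=cache.get(url))

    if result.get('status') == 304:
        # Not modified: the server sent no feed data, keep what we have.
        return None

    if result.get('etag'):
        # Remember the new ETag for the next request.
        cache[url] = result['etag']
    return result
```

Persisting ``cache`` between runs (in a database or configuration file) is what actually saves the bandwidth; an in-memory dictionary only helps within a single process.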
There is a related concept which accomplishes the same thing, but slightly differently. In this case, the server publishes the last-modified date of the feed in an :abbr:`HTTP (Hypertext Transfer Protocol)` header. You can send this back to the server on subsequent requests, and if the feed has not changed, the server will return :abbr:`HTTP (Hypertext Transfer Protocol)` status code ``304`` and no feed data. Using Last-Modified headers to reduce bandwidth ----------------------------------------------- :: >>> import feedparser >>> d = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml') >>> d.modified 'Fri, 11 Jun 2004 23:00:34 GMT' >>> d.modified_parsed (2004, 6, 11, 23, 0, 34, 4, 163, 0) >>> d2 = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml', modified=d.modified) >>> d2.status 304 >>> d2.feed {} >>> d2.entries [] >>> d2.debug_message 'The feed has not changed since you last checked, so the server sent no data. This is a feature, not a bug!' Clients should support both ETag and Last-Modified headers, as some servers support one but not the other. .. important:: If you do not support ETag and Last-Modified headers, you will repeatedly download feeds that have not changed. This wastes your bandwidth and the publisher's bandwidth, and the publisher may ban you from accessing their server. .. note:: You can control the behaviour of :abbr:`HTTP (Hypertext Transfer Protocol)` caches between your application and the origin server by using the ``request_headers`` parameter. For example, you may want to send ``Cache-control: max-age=60`` to make the caches revalidate against the origin server unless their cached copy is less than a minute old. Again, this should be used with consideration. ..
seealso:: * `HTTP Conditional Get For RSS Hackers `_ * `HTTP Web Services `_ ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/http-other.rst0000664000175000017500000000265114535121615016304 0ustar00kurtkurtOther :abbr:`HTTP (Hypertext Transfer Protocol)` Headers ======================================================== You can specify additional :abbr:`HTTP (Hypertext Transfer Protocol)` request headers as a dictionary. When you download a feed from a remote web server, :program:`Universal Feed Parser` exposes the complete set of :abbr:`HTTP (Hypertext Transfer Protocol)` response headers as a dictionary. .. _example.http.headers.request: Sending custom :abbr:`HTTP (Hypertext Transfer Protocol)` request headers ------------------------------------------------------------------------- :: >>> import feedparser >>> d = feedparser.parse('http://feedparser.org/docs/examples/atom03.xml', request_headers={'Cache-control': 'max-age=0'}) Accessing other :abbr:`HTTP (Hypertext Transfer Protocol)` response headers --------------------------------------------------------------------------- :: >>> import feedparser >>> d = feedparser.parse('http://feedparser.org/docs/examples/atom03.xml') >>> d.headers {'date': 'Fri, 11 Jun 2004 23:57:50 GMT', 'server': 'Apache/2.0.49 (Debian GNU/Linux)', 'last-modified': 'Fri, 11 Jun 2004 23:00:34 GMT', 'etag': '"6c132-941-ad7e3080"', 'accept-ranges': 'bytes', 'vary': 'Accept-Encoding,User-Agent', 'content-encoding': 'gzip', 'content-length': '883', 'connection': 'close', 'content-type': 'application/xml'} ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/http-redirect.rst0000664000175000017500000000523014535121615016760 0ustar00kurtkurt:abbr:`HTTP (Hypertext Transfer Protocol)` Redirects ==================================================== When you download a feed from a remote web server, :program:`Universal 
Feed Parser` exposes the :abbr:`HTTP (Hypertext Transfer Protocol)` status code. You need to understand the different codes, including permanent and temporary redirects, and feeds that have been marked "gone". When a feed has temporarily moved to a new location, the web server will return a ``302`` status code. :program:`Universal Feed Parser` makes this available in ``d.status``. There is nothing special you need to do with temporary redirects; by the time you learn about it, :program:`Universal Feed Parser` has already followed the redirect to the new location (available in ``d.href``), downloaded the feed, and parsed it. Since the redirect is temporary, you should continue requesting the original :abbr:`URL (Uniform Resource Locator)` the next time you want to parse the feed. Noticing temporary redirects ---------------------------- :: >>> import feedparser >>> d = feedparser.parse('http://feedparser.org/docs/examples/temporary.xml') >>> d.status 302 >>> d.href 'http://feedparser.org/docs/examples/atom10.xml' >>> d.feed.title u'Sample Feed' When a feed has permanently moved to a new location, the web server will return a ``301`` status code. Again, :program:`Universal Feed Parser` makes this available in ``d.status``. If you are polling a feed on a regular basis, it is very important to check the status code (``d.status``) every time you download. If the feed has been permanently redirected, you should update your database or configuration file with the new address (``d.href``). Repeatedly requesting the original address of a feed that has been permanently redirected is very rude, and may get you banned from the server. 
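The advice on status codes can be condensed into a small decision function. This is a minimal sketch under stated assumptions: ``next_url_to_poll`` is a hypothetical name, and where you store the feed address (database, configuration file) is up to your application.

```python
def next_url_to_poll(stored_url, status, href):
    """Decide which address to request next time, or None to stop polling.

    stored_url -- the address you currently have on file
    status     -- d.status from the last download (may be None)
    href       -- d.href, the address the feed was finally fetched from
    """
    if status == 410:
        # The feed is marked "gone": stop polling it.
        return None
    if status == 301 and href:
        # Permanent redirect: remember the new address.
        return href
    # 302 and everything else: keep requesting the original address.
    return stored_url
```

After each download you might call it as ``next_url_to_poll(stored_url, d.get('status'), d.get('href'))``, using ``get`` because ``status`` is not present when the feed did not come from a web server.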
Noticing permanent redirects ---------------------------- :: >>> import feedparser >>> d = feedparser.parse('http://feedparser.org/docs/examples/permanent.xml') >>> d.status 301 >>> d.href 'http://feedparser.org/docs/examples/atom10.xml' >>> d.feed.title u'Sample Feed' When a feed has been permanently deleted, the web server will return a ``410`` status code. If you ever receive a ``410``, you should stop polling the feed and inform the end user that the feed is gone for good. Repeatedly requesting a feed that has been marked as "gone" is very rude, and may get you banned from the server. Noticing feeds marked "gone" ---------------------------- :: >>> import feedparser >>> d = feedparser.parse('http://feedparser.org/docs/examples/gone.xml') >>> d.status 410 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/http-useragent.rst0000664000175000017500000000302514535121615017154 0ustar00kurtkurtUser-Agent and Referer Headers ============================== :program:`Universal Feed Parser` sends a default User-Agent string when it requests a feed from a web server. The default User-Agent string looks like this: :: UniversalFeedParser/5.0.1 +http://feedparser.org/ If you are embedding :program:`Universal Feed Parser` in a larger application, you should change the User-Agent to your application name and :abbr:`URL (Uniform Resource Locator)`. Customizing the User-Agent -------------------------- :: >>> import feedparser >>> d = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml', agent='MyApp/1.0 +http://example.com/') You can also set the User-Agent once, globally, and then call the ``parse`` function normally. 
Customizing the User-Agent permanently -------------------------------------- :: >>> import feedparser >>> feedparser.USER_AGENT = "MyApp/1.0 +http://example.com/" >>> d = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml') :program:`Universal Feed Parser` also lets you set the referrer when you download a feed from a web server. This is discouraged, because it is a violation of `RFC 2616 `_. The default behavior is to send a blank referrer, and you should never need to override this. Customizing the referrer ------------------------ :: >>> import feedparser >>> d = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml', referrer='http://example.com/') ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1679075946.0 feedparser-6.0.11/docs/http.rst0000664000175000017500000000033414405125152015155 0ustar00kurtkurt:abbr:`HTTP (Hypertext Transfer Protocol)` Features ################################################### .. toctree:: :maxdepth: 2 http-etag http-useragent http-redirect http-authentication http-other ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/index.rst0000664000175000017500000000124314535121615015311 0ustar00kurtkurt============= Documentation ============= This documentation claims to describe the behavior of :program:`feedparser` |version|. It does not claim to describe the behavior of any other version. This documentation lives at `https://feedparser.readthedocs.io/en/latest/ `_. If you're reading it somewhere else, you may not have the latest version. This documentation is provided by the author "as is" without any express or implied warranties. See :ref:`the documentation license ` for more details. .. 
toctree:: :maxdepth: 2 basic advanced http annotated-examples history reference license ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/introduction.rst0000664000175000017500000000513314535121615016725 0ustar00kurtkurtIntroduction ============ :program:`Universal Feed Parser` is a :program:`Python` module for downloading and parsing syndicated feeds. It can handle :abbr:`RSS (Rich Site Summary)` 0.90, Netscape :abbr:`RSS (Rich Site Summary)` 0.91, Userland :abbr:`RSS (Rich Site Summary)` 0.91, :abbr:`RSS (Rich Site Summary)` 0.92, :abbr:`RSS (Rich Site Summary)` 0.93, :abbr:`RSS (Rich Site Summary)` 0.94, :abbr:`RSS (Rich Site Summary)` 1.0, :abbr:`RSS (Rich Site Summary)` 2.0, Atom 0.3, Atom 1.0, and :abbr:`CDF (Channel Definition Format)` feeds. It also parses several popular extension modules, including Dublin Core and Apple's :program:`iTunes` extensions. To use :program:`Universal Feed Parser`, you will need :program:`Python` 3.6 or later. :program:`Universal Feed Parser` is not meant to run standalone; it is a module for you to use as part of a larger :program:`Python` program. :program:`Universal Feed Parser` is easy to use; the module is self-contained in a single file, :file:`feedparser.py`, and it has one primary public function, ``parse``. ``parse`` takes a number of arguments, but only one is required, and it can be a :abbr:`URL (Uniform Resource Locator)`, a local filename, or a raw string containing feed data in any format. Parsing a feed from a remote :abbr:`URL (Uniform Resource Locator)` ------------------------------------------------------------------- :: >>> import feedparser >>> d = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml') >>> d['feed']['title'] u'Sample Feed' The following example assumes you are on Windows, and that you have saved a feed at :file:`c:\\incoming\\atom10.xml`. .. 
note:: :program:`Universal Feed Parser` works on any platform that can run :program:`Python`; use the path syntax appropriate for your platform. Parsing a feed from a local file -------------------------------- :: >>> import feedparser >>> d = feedparser.parse(r'c:\incoming\atom10.xml') >>> d['feed']['title'] u'Sample Feed' :program:`Universal Feed Parser` can also parse a feed in memory. Parsing a feed from a string ---------------------------- :: >>> import feedparser >>> rawdata = """<rss version="2.0"> <channel> <title>Sample Feed</title> </channel> </rss>""" >>> d = feedparser.parse(rawdata) >>> d['feed']['title'] u'Sample Feed' Values are returned as :program:`Python` Unicode strings (except when they're not -- see :ref:`advanced.encoding` for all the gory details). .. seealso:: `Introduction to Python Unicode strings `_ ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702223577.0 feedparser-6.0.11/docs/license.rst0000664000175000017500000000276214535357331015641 0ustar00kurtkurt.. _license: Documentation license ===================== Copyright 2010-2023 Kurt McKee, 2004-2008 Mark Pilgrim. All rights reserved. Redistribution and use in source (Sphinx ReST) and "compiled" forms (HTML, PDF, PostScript, RTF and so forth) with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code (Sphinx ReST) must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in compiled form (converted to HTML, PDF, PostScript, RTF and other formats) must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. THIS DOCUMENTATION IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS DOCUMENTATION, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/namespace-handling.rst0000664000175000017500000001477314535121615017734 0ustar00kurtkurt.. _advanced.namespaces: Namespace Handling ================== :program:`Universal Feed Parser` attempts to expose all possible data in feeds, including elements in extension namespaces. Some common namespaced elements are mapped to core elements. For further information about these mappings, see :ref:`reference`. Other namespaced elements are available as ``prefix_element``. The namespaces defined in the feed are available in the parsed results as ``namespaces``, a dictionary of {prefix: namespaceURI}. If the feed defines a default namespace, it is listed as ``namespaces['']``. Accessing namespaced elements ----------------------------- :: >>> import feedparser >>> d = feedparser.parse('http://feedparser.org/docs/examples/prism.rdf') >>> d.feed.prism_issn u'0028-0836' >>> d.namespaces {'': u'http://purl.org/rss/1.0/', 'prism': u'http://prismstandard.org/namespaces/1.2/basic/', 'rdf': u'http://www.w3.org/1999/02/22-rdf-syntax-ns#'} The prefix used to construct the variable name is not guaranteed to be the same as the prefix of the namespaced element in the original feed. If :program:`Universal Feed Parser` recognizes the namespace, it will use the namespace's preferred prefix to construct the variable name.
It will also list the namespace in the ``namespaces`` dictionary using the namespace's preferred prefix. In the previous example, the namespace (http://prismstandard.org/namespaces/1.2/basic/) was defined with the namespace's preferred prefix (prism), so the prism:issn element was accessible as the variable ``d.feed.prism_issn``. However, if the namespace is defined with a non-standard prefix, :program:`Universal Feed Parser` will still construct the variable name using the preferred prefix, *not* the actual prefix that is used in the feed. This will become clear with an example. Accessing namespaced elements with non-standard prefixes -------------------------------------------------------- :: >>> import feedparser >>> d = feedparser.parse('http://feedparser.org/docs/examples/nonstandard_prefix.rdf') >>> d.feed.prism_issn u'0028-0836' >>> d.feed.foo_issn Traceback (most recent call last): File "", line 1, in ? File "feedparser.py", line 158, in __getattr__ raise AttributeError, "object has no attribute '%s'" % key AttributeError: object has no attribute 'foo_issn' >>> d.namespaces {'': u'http://purl.org/rss/1.0/', 'prism': u'http://prismstandard.org/namespaces/1.2/basic/', 'rdf': u'http://www.w3.org/1999/02/22-rdf-syntax-ns#'} This is the complete list of namespaces that :program:`Universal Feed Parser` recognizes and uses to construct the variable names for data in these namespaces: =============== ===================================================== Prefix Namespace =============== ===================================================== admin http://webns.net/mvcb/ ag http://purl.org/rss/1.0/modules/aggregation/ annotate http://purl.org/rss/1.0/modules/annotate/ audio http://media.tangent.org/rss/1.0/ blogChannel http://backend.userland.com/blogChannelModule cc http://web.resource.org/cc/ co http://purl.org/rss/1.0/modules/company content http://purl.org/rss/1.0/modules/content/ cp http://my.theinfo.org/changed/1.0/rss/ creativeCommons 
http://backend.userland.com/creativeCommonsRssModule dc http://purl.org/dc/elements/1.1/ dcterms http://purl.org/dc/terms/ email http://purl.org/rss/1.0/modules/email/ ev http://purl.org/rss/1.0/modules/event/ feedburner http://rssnamespace.org/feedburner/ext/1.0 fm http://freshmeat.net/rss/fm/ foaf http://xmlns.com/foaf/0.1/ geo http://www.w3.org/2003/01/geo/wgs84_pos# icbm http://postneo.com/icbm/ image http://purl.org/rss/1.0/modules/image/ itunes http://example.com/DTDs/PodCast-1.0.dtd itunes http://www.itunes.com/DTDs/PodCast-1.0.dtd l http://purl.org/rss/1.0/modules/link/ media http://search.yahoo.com/mrss pingback http://madskills.com/public/xml/rss/module/pingback/ prism http://prismstandard.org/namespaces/1.2/basic/ rdf http://www.w3.org/1999/02/22-rdf-syntax-ns# rdfs http://www.w3.org/2000/01/rdf-schema# ref http://purl.org/rss/1.0/modules/reference/ reqv http://purl.org/rss/1.0/modules/richequiv/ search http://purl.org/rss/1.0/modules/search/ slash http://purl.org/rss/1.0/modules/slash/ soap http://schemas.xmlsoap.org/soap/envelope/ ss http://purl.org/rss/1.0/modules/servicestatus/ str http://hacks.benhammersley.com/rss/streaming/ sub http://purl.org/rss/1.0/modules/subscription/ sy http://purl.org/rss/1.0/modules/syndication/ szf http://schemas.pocketsoap.com/rss/myDescModule/ taxo http://purl.org/rss/1.0/modules/taxonomy/ thr http://purl.org/rss/1.0/modules/threading/ ti http://purl.org/rss/1.0/modules/textinput/ trackback http://madskills.com/public/xml/rss/module/trackback/ wfw http://wellformedweb.org/CommentAPI/ wiki http://purl.org/rss/1.0/modules/wiki/ xhtml http://www.w3.org/1999/xhtml xlink http://www.w3.org/1999/xlink xml http://www.w3.org/XML/1998/namespace =============== ===================================================== .. note:: :program:`Universal Feed Parser` treats namespaces as case-insensitive to match the behavior of certain versions of :program:`iTunes`. .. 
warning:: Data from namespaced elements is not :ref:`sanitized ` (even if it contains :abbr:`HTML (HyperText Markup Language)` markup). ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-bozo.rst0000664000175000017500000000154114535121615017110 0ustar00kurtkurt:py:attr:`bozo` =============== An integer, either ``1`` or ``0``. Set to ``1`` if the feed is not well-formed :abbr:`XML (Extensible Markup Language)`, and ``0`` otherwise. See :ref:`advanced.bozo` for more details on the :py:attr:`bozo` bit. .. tip:: :py:attr:`bozo` may not be present. Some platforms, such as Mac OS X 10.2 and some versions of FreeBSD, do not include an :abbr:`XML (Extensible Markup Language)` parser in their :program:`Python` distributions. :program:`Universal Feed Parser` will still work on these platforms, but it will not be able to detect whether a feed is well-formed. However, it *can* detect whether a feed's character encoding is incorrectly declared. (This is done in :program:`Python`, not by the :abbr:`XML (Extensible Markup Language)` parser.) See :ref:`advanced.encoding` for details. ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-bozo_exception.rst0000664000175000017500000000040214535121615021161 0ustar00kurtkurt:py:attr:`bozo_exception` ========================= The exception raised when attempting to parse a non-well-formed feed. See :ref:`advanced.bozo` for more details. .. tip:: :py:attr:`bozo_exception` will only be present if :py:attr:`bozo` is ``1``. ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-encoding.rst0000664000175000017500000000063614535121615017731 0ustar00kurtkurt.. _reference.encoding: :py:attr:`encoding` =================== The character encoding that was used to parse the feed. .. 
note:: The process by which :program:`Universal Feed Parser` determines the character encoding of the feed is explained in :ref:`advanced.encoding`. .. tip:: This element always exists, although it may be an empty string if the character encoding cannot be determined. ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-entry-author.rst0000664000175000017500000000066014535121615020601 0ustar00kurtkurt.. _reference.entry.author: :py:attr:`entries[i].author` ============================ The author of this entry. .. seealso:: * :ref:`reference.entry.author_detail` .. rubric:: Comes from * /atom10:feed/atom10:entry/atom10:author * /atom03:feed/atom03:entry/atom03:author * /rss/channel/item/dc:creator * /rss/channel/item/dc:author * /rss/channel/itunes:author * /rdf:RDF/rdf:item/dc:creator * /rdf:RDF/rdf:item/dc:author ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-entry-author_detail.rst0000664000175000017500000000227214535121615022124 0ustar00kurtkurt.. _reference.entry.author_detail: :py:attr:`entries[i].author_detail` =================================== A dictionary with details about the author of this entry. .. seealso:: * :ref:`reference.entry.author` .. _reference.entry.author_detail.name: :py:attr:`entries[i].author_detail.name` ---------------------------------------- The name of this entry's author. .. _reference.entry.author_detail.href: :py:attr:`entries[i].author_detail.href` ---------------------------------------- The :abbr:`URL (Uniform Resource Locator)` of this entry's author. This can be the author's home page, or a contact page with a webmail form. If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is :ref:`resolved according to a set of rules `. .. 
_reference.entry.author_detail.email: :py:attr:`entries[i].author_detail.email` ----------------------------------------- The email address of this entry's author. .. rubric:: Comes from * /atom10:feed/atom10:entry/atom10:author * /atom03:feed/atom03:entry/atom03:author * /rss/channel/item/dc:creator * /rss/channel/item/dc:author * /rss/channel/itunes:author * /rdf:RDF/rdf:item/dc:creator * /rdf:RDF/rdf:item/dc:author ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-entry-comments.rst0000664000175000017500000000064614535121615021130 0ustar00kurtkurt.. _reference.entry.comments: :py:attr:`entries[i].comments` ============================== A :abbr:`URL (Uniform Resource Locator)` of the :abbr:`HTML (HyperText Markup Language)` comment submission page associated with this entry. If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is :ref:`resolved according to a set of rules `. .. rubric:: Comes from * /rss/channel/item/comments ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-entry-content.rst0000664000175000017500000000742614535121615020760 0ustar00kurtkurt.. _reference.entry.content: :py:attr:`entries[i].content` ============================= A list of dictionaries with details about the full content of the entry. Atom feeds may contain multiple content elements. Clients should render as many of them as possible, based on the type and the client's abilities. .. _reference.entry.content.value: :py:attr:`entries[i].content[j].value` -------------------------------------- The value of this piece of content. If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML (Extensible HyperText Markup Language)`, it is :ref:`sanitized ` by default. 
If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML (Extensible HyperText Markup Language)`, certain (X)HTML elements within this value may contain relative :abbr:`URI (Uniform Resource Identifier)`\s. If so, they are :ref:`resolved according to a set of rules `. .. _reference.entry.content.type: :py:attr:`entries[i].content[j].type` ------------------------------------- The content type of this piece of content. Most likely values for `type`: * :mimetype:`text/plain` * :mimetype:`text/html` * :mimetype:`application/xhtml+xml` For Atom feeds, the content type is taken from the type attribute, which defaults to :mimetype:`text/plain` if not specified. For :abbr:`RSS (Rich Site Summary)` feeds, the content type is auto-determined by inspecting the content, and defaults to :mimetype:`text/html`. Note that this may cause silent data loss if the value contains plain text with angle brackets. There is nothing I can do about this problem; it is a limitation of :abbr:`RSS (Rich Site Summary)`. Future enhancement: some versions of :abbr:`RSS (Rich Site Summary)` clearly specify that certain values default to :mimetype:`text/plain`, and :program:`Universal Feed Parser` should respect this, but it doesn't yet. .. _reference.entry.content.language: :py:attr:`entries[i].content[j].language` ----------------------------------------- The language of this piece of content. :py:attr:`~entries[i].content[j].language` is supposed to be a language code, as specified by :rfc:`3066`, but publishers have been known to publish random values like "English" or "German". :program:`Universal Feed Parser` does not do any parsing or normalization of language codes. :py:attr:`~entries[i].content[j].language` may come from the element's xml:lang attribute, or it may inherit from a parent element's xml:lang, or the :mailheader:`Content-Language` :abbr:`HTTP (Hypertext Transfer Protocol)` header. 
If the feed does not specify a language, :py:attr:`~entries[i].content[j].language` will be ``None``, the :program:`Python` null value. .. _reference.entry.content.base: :py:attr:`entries[i].content[j].base` ------------------------------------- The original base :abbr:`URI (Uniform Resource Identifier)` for links within this piece of content. :py:attr:`~entries[i].content[j].base` is only useful in rare situations and can usually be ignored. It is the original base :abbr:`URI (Uniform Resource Identifier)` for this value, as specified by the element's xml:base attribute, or a parent element's xml:base, or the appropriate :abbr:`HTTP (Hypertext Transfer Protocol)` header, or the :abbr:`URI (Uniform Resource Identifier)` of the feed. (See :ref:`advanced.base` for more details.) By the time you see it, :program:`Universal Feed Parser` has already resolved relative links in all values where it makes sense to do so. *Clients should never need to manually resolve relative links.* .. rubric:: Comes from * /atom03:feed/atom03:entry/atom03:content * /atom10:feed/atom10:entry/atom10:content * /rdf:RDF/rdf:item/content:encoded * /rss/channel/item/body * /rss/channel/item/content:encoded * /rss/channel/item/fullitem * /rss/channel/item/xhtml:body ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-entry-contributors.rst0000664000175000017500000000177514535121615022044 0ustar00kurtkurt:py:attr:`entries[i].contributors` ================================== A list of contributors (secondary authors) to this entry. .. _reference.entry.contributors.name: :py:attr:`entries[i].contributors[j].name` ------------------------------------------ The name of this contributor. .. _reference.entry.contributors.href: :py:attr:`entries[i].contributors[j].href` ------------------------------------------ The :abbr:`URL (Uniform Resource Locator)` of this contributor. 
This can be the contributor's home page, or a contact page with a webmail form. If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is :ref:`resolved according to a set of rules `. .. _reference.entry.contributors.email: :py:attr:`entries[i].contributors[j].email` ------------------------------------------- The email address of this contributor. .. rubric:: Comes from * /atom03:feed/atom03:entry/atom03:contributor * /atom10:feed/atom10:entry/atom10:contributor * /rss/channel/item/dc:contributor ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-entry-created.rst0000664000175000017500000000101514535121615020701 0ustar00kurtkurt.. _reference.entry.created: :py:attr:`entries[i].created` ============================= The date this entry was first created (drafted), as a string in the same format as it was published in the original feed. This element is :ref:`parsed as a date ` and stored in :ref:`reference.entry.created_parsed`. .. rubric:: Comes from * /atom03:feed/atom03:entry/atom03:created * /rdf:RDF/rdf:item/dcterms:created * /rss/channel/item/dcterms:created .. seealso:: * :ref:`reference.entry.created_parsed` ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-entry-created_parsed.rst0000664000175000017500000000061414535121615022243 0ustar00kurtkurt.. _reference.entry.created_parsed: :py:attr:`entries[i].created_parsed` ==================================== The date this entry was first created (drafted), as a standard :program:`Python` 9-tuple. .. rubric:: Comes from * /atom03:feed/atom03:entry/atom03:created * /rdf:RDF/rdf:item/dcterms:created * /rss/channel/item/dcterms:created .. 
seealso:: * :ref:`reference.entry.created` ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-entry-enclosures.rst0000664000175000017500000000254414535121615021464 0ustar00kurtkurt.. _reference.entry.enclosures: :py:attr:`entries[i].enclosures` ================================ A list of links to external files associated with this entry. Some aggregators automatically download enclosures (although this technique has `known problems `_). Some aggregators render each enclosure as a link. Most aggregators ignore them. The :abbr:`RSS (Rich Site Summary)` specification states that there can be at most one enclosure per item. However, because some feeds break this rule, :program:`Universal Feed Parser` captures all of them and makes them available as a list. .. rubric:: Comes from - /atom10:feed/atom10:entry/atom10:link[@rel="enclosure"] - /rss/channel/item/enclosure .. _reference.entry.enclosures.href: :py:attr:`entries[i].enclosures[j].href` ---------------------------------------- The :abbr:`URL (Uniform Resource Locator)` of the linked file. If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is :ref:`resolved according to a set of rules `. .. _reference.entry.enclosures.length: :py:attr:`entries[i].enclosures[j].length` ------------------------------------------ The length of the linked file. .. _reference.entry.enclosures.type: :py:attr:`entries[i].enclosures[j].type` ---------------------------------------- The content type of the linked file. ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-entry-expired.rst0000664000175000017500000000117614535121615020742 0ustar00kurtkurt.. _reference.entry.expired: :py:attr:`entries[i].expired` ============================= The date this entry is set to expire, as a string in the same format as it was published in the original feed. 
This element is :ref:`parsed as a date ` and stored in :ref:`reference.entry.expired_parsed`. This element is rare. It only existed in :abbr:`RSS (Rich Site Summary)` 0.93, and it was never widely implemented by publishers. Most clients ignore it in favor of user-defined expiration algorithms. .. rubric:: Comes from * /rss/channel/item/expirationDate .. seealso:: * :ref:`reference.entry.expired_parsed` ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-entry-expired_parsed.rst0000664000175000017500000000077514535121615022304 0ustar00kurtkurt.. _reference.entry.expired_parsed: :py:attr:`entries[i].expired_parsed` ==================================== The date this entry is set to expire, as a standard :program:`Python` 9-tuple. This element is rare. It only existed in :abbr:`RSS (Rich Site Summary)` 0.93, and it was never widely implemented by publishers. Most clients ignore it in favor of user-defined expiration algorithms. .. rubric:: Comes from * /rss/channel/item/expirationDate .. seealso:: * :ref:`reference.entry.expired` ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-entry-id.rst0000664000175000017500000000063414535121615017674 0ustar00kurtkurt.. _reference.entry.id: :py:attr:`entries[i].id` ======================== A globally unique identifier for this entry. If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is :ref:`resolved according to a set of rules `. .. rubric:: Comes from * /atom03:feed/atom03:entry/atom03:id * /atom10:feed/atom10:entry/atom10:id * /rdf:RDF/rdf:item/@rdf:about * /rss/channel/item/guid ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-entry-license.rst0000664000175000017500000000075714535121615020730 0ustar00kurtkurt.. 
_reference.entry.license: :py:attr:`entries[i].license` ============================= A :abbr:`URL (Uniform Resource Locator)` of the license under which this entry is distributed. If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is :ref:`resolved according to a set of rules `. .. rubric:: Comes from * /atom10:feed/atom10:entry/atom10:link[@rel="license"]/@href * /rdf:RDF/rdf:item/cc:license/@rdf:resource * /rss/channel/item/creativeCommons:license ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-entry-link.rst0000664000175000017500000000216214535121615020233 0ustar00kurtkurt.. _reference.entry.link: :py:attr:`entries[i].link` ========================== The primary link of this entry. Most feeds use this as the permanent link to the entry in the site's archives. If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is :ref:`resolved according to a set of rules `. Some :abbr:`RSS (Rich Site Summary)` feeds use guid when they mean link. guid can also be used as an opaque identifier that has nothing to do with links. If an :abbr:`RSS (Rich Site Summary)` feed uses guid as the entry link and no link is present, :program:`Universal Feed Parser` detects this and makes the guid available in :py:attr:`entries[i].link`. In other words, you can always use :py:attr:`entries[i].link` to get the entry link, regardless of how the feed is actually structured. .. rubric:: Comes from - /atom03:feed/atom03:entry/atom03:link[@rel="alternate"]/@href - /atom10:feed/atom10:entry/atom10:link[@rel="alternate"]/@href - /atom10:feed/atom10:entry/atom10:link[not(@rel)]/@href - /rdf:RDF/rdf:item/rdf:link - /rss/channel/item/link .. seealso:: * :ref:`reference.entry.links` ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-entry-links.rst0000664000175000017500000000276014535121615020422 0ustar00kurtkurt.. 
_reference.entry.links: :py:attr:`entries[i].links` =========================== A list of dictionaries with details on the links associated with this entry. Each link has a rel (relationship), type (content type), and href (the :abbr:`URL (Uniform Resource Locator)` that the link points to). Some links may also have a title. .. _reference.entry.links.rel: :py:attr:`entries[i].links[j].rel` ---------------------------------- The relationship of this entry link. Atom 1.0 defines five standard link relationships and describes the process for registering others. Here are the five standard rel values: * `alternate` * `enclosure` * `related` * `self` * `via` .. _reference.entry.links.type: :py:attr:`entries[i].links[j].type` ----------------------------------- The content type of the page that this entry link points to. .. _reference.entry.links.href: :py:attr:`entries[i].links[j].href` ----------------------------------- The :abbr:`URL (Uniform Resource Locator)` of the page that this entry link points to. If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is :ref:`resolved according to a set of rules `. .. _reference.entry.links.title: :py:attr:`entries[i].links[j].title` ------------------------------------ The title of this entry link. .. rubric:: Comes from - /atom03:feed/atom03:entry/atom03:link - /atom10:feed/atom10:entry/atom10:link - /rdf:RDF/rdf:item/rdf:link - /rss/channel/item/link .. seealso:: * :ref:`reference.entry.link` ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-entry-published.rst0000664000175000017500000000112414535121615021252 0ustar00kurtkurt.. _reference.entry.published: :py:attr:`entries[i].published` =============================== The date this entry was first published, as a string in the same format as it was published in the original feed. This element is :ref:`parsed as a date ` and stored in :ref:`reference.entry.published_parsed`. .. 
rubric:: Comes from * /atom10:feed/atom10:entry/atom10:published * /atom03:feed/atom03:entry/atom03:issued * /rss/channel/item/dcterms:issued * /rss/channel/item/pubDate * /rdf:RDF/rdf:item/dcterms:issued .. seealso:: * :ref:`reference.entry.published_parsed` ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-entry-published_parsed.rst0000664000175000017500000000072214535121615022613 0ustar00kurtkurt.. _reference.entry.published_parsed: :py:attr:`entries[i].published_parsed` ====================================== The date this entry was first published, as a standard :program:`Python` 9-tuple. .. rubric:: Comes from * /atom10:feed/atom10:entry/atom10:published * /atom03:feed/atom03:entry/atom03:issued * /rss/channel/item/dcterms:issued * /rdf:RDF/rdf:item/dcterms:issued * /rss/channel/item/pubDate .. seealso:: * :ref:`reference.entry.published` ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-entry-publisher.rst0000664000175000017500000000045014535121615021271 0ustar00kurtkurt.. _reference.entry.publisher: :py:attr:`entries[i].publisher` =============================== The publisher of the entry. .. rubric:: Comes from * /rss/item/dc:publisher * /rss/item/itunes:owner * /rdf:RDF/rdf:item/dc:publisher .. seealso:: * :ref:`reference.entry.publisher_detail` ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-entry-publisher_detail.rst0000664000175000017500000000177114535121615022622 0ustar00kurtkurt.. _reference.entry.publisher_detail: :py:attr:`entries[i].publisher_detail` ====================================== A dictionary with details about the entry publisher. :py:attr:`entries[i].publisher_detail.name` ------------------------------------------- The name of this entry's publisher. .. 
_reference.entry.publisher_detail.href: :py:attr:`entries[i].publisher_detail.href` ------------------------------------------- The :abbr:`URL (Uniform Resource Locator)` of this entry's publisher. This can be the publisher's home page, or a contact page with a webmail form. If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is :ref:`resolved according to a set of rules `. :py:attr:`entries[i].publisher_detail.email` -------------------------------------------- The email address of this entry's publisher. .. rubric:: Comes from * /rss/item/dc:publisher * /rss/item/itunes:owner * /rdf:RDF/rdf:item/dc:publisher .. seealso:: * :ref:`reference.entry.publisher` ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-entry-source.rst0000664000175000017500000004100014535121615020570 0ustar00kurtkurt.. _reference.entry.source: :py:attr:`entries[i].source` ============================ A dictionary with details about the source of the entry. .. rubric:: Comes from * /atom10:feed/atom10:entry/atom10:source :py:attr:`entries[i].source.author` ----------------------------------- The author of the source of this entry. :py:attr:`entries[i].source.author_detail` ------------------------------------------ A dictionary containing details about the author of the source of this entry. :py:attr:`entries[i].source.author_detail.name` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The name of the author of the source of this entry. .. _reference.entry.source.author_detail.href: :py:attr:`entries[i].source.author_detail.href` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The :abbr:`URL (Uniform Resource Locator)` of the author of the source of this entry. This can be the author's home page, or a contact page with a webmail form. If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is :ref:`resolved according to a set of rules `. 
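As a rough illustration of what resolving a relative :abbr:`URI (Uniform Resource Identifier)` means for an ``href`` value, the sketch below uses ``urljoin`` from the :program:`Python` standard library. It is not :program:`Universal Feed Parser`'s own code, and both the base URI and the relative value are made-up examples.

```python
from urllib.parse import urljoin

# Hypothetical values: a base URI (as it might come from xml:base or the
# feed's own URL) and a relative href as it might appear in the feed.
base = "http://example.org/feeds/atom.xml"
relative_href = "../people/jane"

# Resolve the relative reference against the base URI.
resolved = urljoin(base, relative_href)
print(resolved)  # http://example.org/people/jane
```

Because the parser performs this step itself, client code normally reads the already-resolved value and never needs to call ``urljoin`` on it.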
:py:attr:`entries[i].source.author_detail.email` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The email address of the author of the source of this entry. :py:attr:`entries[i].source.contributors` ----------------------------------------- A list of contributors to the source of this entry. :py:attr:`entries[i].source.contributors[j].name` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The name of a contributor to the source of this entry. .. _reference.entry.source.contributors.href: :py:attr:`entries[i].source.contributors[j].href` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The :abbr:`URL (Uniform Resource Locator)` of a contributor to the source of this entry. This can be the contributor's home page, or a contact page with a webmail form. If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is :ref:`resolved according to a set of rules `. :py:attr:`entries[i].source.contributors[j].email` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The email address of a contributor to the source of this entry. :py:attr:`entries[i].source.icon` --------------------------------- The :abbr:`URL (Uniform Resource Locator)` of an icon representing the source of this entry. If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is :ref:`resolved according to a set of rules `. :py:attr:`entries[i].source.id` ------------------------------- A globally unique identifier for the source of this entry. :py:attr:`entries[i].source.link` --------------------------------- The primary permanent link of the source of this entry. :py:attr:`entries[i].source.links` ---------------------------------- A list of all links defined by the source of this entry. :py:attr:`entries[i].source.links[j].rel` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The relationship of a link defined by the source of this entry. Atom 1.0 defines five standard link relationships and describes the process for registering others. 
Here are the five standard rel values: * ``alternate`` * ``self`` * ``related`` * ``via`` * ``enclosure`` :py:attr:`entries[i].source.links[j].type` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The content type of the page pointed to by a link defined by the source of this entry. .. _reference.entry.source.links.href: :py:attr:`entries[i].source.links[j].href` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The :abbr:`URL (Uniform Resource Locator)` of the page pointed to by a link defined by the source of this entry. If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is :ref:`resolved according to a set of rules `. :py:attr:`entries[i].source.links[j].title` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The title of a link defined by the source of this entry. :py:attr:`entries[i].source.logo` --------------------------------- The :abbr:`URL (Uniform Resource Locator)` of a logo representing the source of this entry. If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is :ref:`resolved according to a set of rules `. .. _reference.entry.source.rights: :py:attr:`entries[i].source.rights` ----------------------------------- A human-readable copyright statement for the source of this entry. :py:attr:`entries[i].source.rights_detail` ------------------------------------------ A dictionary containing details about the copyright statement for the source of this entry. :py:attr:`entries[i].source.rights_detail.value` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Same as :ref:`reference.entry.source.rights`. If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML (Extensible HyperText Markup Language)`, it is :ref:`sanitized ` by default. If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML (Extensible HyperText Markup Language)`, certain (X)HTML elements within this value may contain relative :abbr:`URI (Uniform Resource Identifier)`\s. If so, they are :ref:`resolved according to a set of rules `. 
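To make the idea of resolving relative :abbr:`URI (Uniform Resource Identifier)`\s embedded in markup more concrete, here is a deliberately simplified sketch. The helper name ``resolve_embedded_links`` and the sample values are hypothetical, and the regular expression is only a stand-in; this is not how :program:`Universal Feed Parser` implements the step.

```python
import re
from urllib.parse import urljoin


def resolve_embedded_links(markup, base):
    """Rewrite quoted href/src attribute values in markup against base.

    Illustrative only: a regular expression is not a robust way to
    process real-world (X)HTML.
    """
    def fix(match):
        attr, quote, url = match.group(1), match.group(2), match.group(3)
        return f"{attr}={quote}{urljoin(base, url)}{quote}"

    return re.sub(r'\b(href|src)=(["\'])(.*?)\2', fix, markup)


# Hypothetical copyright statement containing a relative link.
value = 'Copyright <a href="/legal">Example Org</a>'
print(resolve_embedded_links(value, "http://example.org/feed/"))
```

With the sample values above, the relative ``/legal`` link becomes ``http://example.org/legal`` while the rest of the markup is left untouched.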
:py:attr:`entries[i].source.rights_detail.type` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The content type of the copyright statement for the source of this entry. Most likely values for :py:attr:`~entries[i].source.rights_detail.type`: * :mimetype:`text/plain` * :mimetype:`text/html` * :mimetype:`application/xhtml+xml` For Atom feeds, the content type is taken from the type attribute, which defaults to :mimetype:`text/plain` if not specified. For :abbr:`RSS (Rich Site Summary)` feeds, the content type is auto-determined by inspecting the content, and defaults to :mimetype:`text/html`. Note that this may cause silent data loss if the value contains plain text with angle brackets. There is nothing I can do about this problem; it is a limitation of :abbr:`RSS (Rich Site Summary)`. Future enhancement: some versions of :abbr:`RSS (Rich Site Summary)` clearly specify that certain values default to :mimetype:`text/plain`, and :program:`Universal Feed Parser` should respect this, but it doesn't yet. :py:attr:`entries[i].source.rights_detail.language` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The language of the copyright statement for the source of this entry. :py:attr:`~entries[i].source.rights_detail.language` is supposed to be a language code, as specified by `RFC 3066`_, but publishers have been known to publish random values like "English" or "German". :program:`Universal Feed Parser` does not do any parsing or normalization of language codes. .. _RFC 3066: http://www.ietf.org/rfc/rfc3066.txt :py:attr:`~entries[i].source.rights_detail.language` may come from the element's xml:lang attribute, or it may inherit from a parent element's xml:lang, or the Content-Language :abbr:`HTTP (Hypertext Transfer Protocol)` header. If the feed does not specify a language, :py:attr:`~entries[i].source.rights_detail.language` will be ``None``, the :program:`Python` null value. 
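Since language values are passed through without normalization and may be ``None``, a client that needs consistent codes has to clean them up itself. The following is a minimal sketch under stated assumptions: the helper ``normalize_language``, the tiny ``LANGUAGE_NAMES`` table, and the ``"und"`` fallback are all illustrative choices, not part of :program:`Universal Feed Parser`.

```python
# Hypothetical lookup table for non-standard values that publishers
# are known to emit; a real client would need a much larger mapping.
LANGUAGE_NAMES = {"english": "en", "german": "de"}


def normalize_language(value, default="und"):
    """Return a lowercase language tag, or default when none is usable."""
    if value is None:
        # The feed did not specify a language.
        return default
    cleaned = value.strip().lower().replace("_", "-")
    # Map known free-text names to codes; otherwise keep the cleaned value.
    return LANGUAGE_NAMES.get(cleaned, cleaned) or default


print(normalize_language(None))       # und
print(normalize_language("English"))  # en
print(normalize_language("en_US"))    # en-us
```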
:py:attr:`entries[i].source.rights_detail.base` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The original base :abbr:`URI (Uniform Resource Identifier)` for links within the copyright statement for the source of this entry. :py:attr:`entries[i].source.rights_detail.base` is only useful in rare situations and can usually be ignored. It is the original base :abbr:`URI (Uniform Resource Identifier)` for this value, as specified by the element's xml:base attribute, or a parent element's xml:base, or the appropriate :abbr:`HTTP (Hypertext Transfer Protocol)` header, or the :abbr:`URI (Uniform Resource Identifier)` of the feed. (See :ref:`advanced.base` for more details.) By the time you see it, :program:`Universal Feed Parser` has already resolved relative links in all values where it makes sense to do so. *Clients should never need to manually resolve relative links.* .. _reference.entry.source.subtitle: :py:attr:`entries[i].source.subtitle` ------------------------------------- A subtitle, tagline, slogan, or other short description of the source of this entry. If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML (Extensible HyperText Markup Language)`, it is :ref:`sanitized ` by default. If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML (Extensible HyperText Markup Language)`, certain (X)HTML elements within this value may contain relative :abbr:`URI (Uniform Resource Identifier)`\s. If so, they are :ref:`resolved according to a set of rules `. :py:attr:`entries[i].source.subtitle_detail` -------------------------------------------- A dictionary containing details about the subtitle for the source of this entry. :py:attr:`entries[i].source.subtitle_detail.value` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Same as :ref:`reference.entry.source.subtitle`. If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML (Extensible HyperText Markup Language)`, it is :ref:`sanitized ` by default. 
If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML (Extensible HyperText Markup Language)`, certain (X)HTML elements within this value may contain relative :abbr:`URI (Uniform Resource Identifier)`\s. If so, they are :ref:`resolved according to a set of rules `. :py:attr:`entries[i].source.subtitle_detail.type` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The content type of the subtitle of the source of this entry. Most likely values for :py:attr:`~entries[i].source.subtitle_detail.type`: * :mimetype:`text/plain` * :mimetype:`text/html` * :mimetype:`application/xhtml+xml` For Atom feeds, the content type is taken from the type attribute, which defaults to :mimetype:`text/plain` if not specified. For :abbr:`RSS (Rich Site Summary)` feeds, the content type is auto-determined by inspecting the content, and defaults to :mimetype:`text/html`. Note that this may cause silent data loss if the value contains plain text with angle brackets. There is nothing I can do about this problem; it is a limitation of :abbr:`RSS (Rich Site Summary)`. Future enhancement: some versions of :abbr:`RSS (Rich Site Summary)` clearly specify that certain values default to :mimetype:`text/plain`, and :program:`Universal Feed Parser` should respect this, but it doesn't yet. :py:attr:`entries[i].source.subtitle_detail.language` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The language of the subtitle of the source of this entry. :py:attr:`~entries[i].source.subtitle_detail.language` is supposed to be a language code, as specified by `RFC 3066`_, but publishers have been known to publish random values like "English" or "German". :program:`Universal Feed Parser` does not do any parsing or normalization of language codes. :py:attr:`~entries[i].source.subtitle_detail.language` may come from the element's xml:lang attribute, or it may inherit from a parent element's xml:lang, or the Content-Language :abbr:`HTTP (Hypertext Transfer Protocol)` header. 
If the feed does not specify a language, :py:attr:`~entries[i].source.subtitle_detail.language` will be ``None``, the :program:`Python` null value. :py:attr:`entries[i].source.subtitle_detail.base` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The original base :abbr:`URI (Uniform Resource Identifier)` for links within the subtitle of the source of this entry. :py:attr:`entries[i].source.subtitle_detail.base` is only useful in rare situations and can usually be ignored. It is the original base :abbr:`URI (Uniform Resource Identifier)` for this value, as specified by the element's xml:base attribute, or a parent element's xml:base, or the appropriate :abbr:`HTTP (Hypertext Transfer Protocol)` header, or the :abbr:`URI (Uniform Resource Identifier)` of the feed. (See :ref:`advanced.base` for more details.) By the time you see it, :program:`Universal Feed Parser` has already resolved relative links in all values where it makes sense to do so. *Clients should never need to manually resolve relative links.* .. _reference.entry.source.title: :py:attr:`entries[i].source.title` ---------------------------------- The title of the source of this entry. If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML (Extensible HyperText Markup Language)`, it is :ref:`sanitized ` by default. If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML (Extensible HyperText Markup Language)`, certain (X)HTML elements within this value may contain relative :abbr:`URI (Uniform Resource Identifier)`\s. If so, they are :ref:`resolved according to a set of rules `. :py:attr:`entries[i].source.title_detail` ----------------------------------------- A dictionary containing details about the title for the source of this entry. :py:attr:`entries[i].source.title_detail.value` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Same as :ref:`reference.entry.source.title`. 
If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML (Extensible HyperText Markup Language)`, it is :ref:`sanitized ` by default. If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML (Extensible HyperText Markup Language)`, certain (X)HTML elements within this value may contain relative :abbr:`URI (Uniform Resource Identifier)`\s. If so, they are :ref:`resolved according to a set of rules `. :py:attr:`entries[i].source.title_detail.type` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The content type of the title of the source of this entry. Most likely values for :py:attr:`entries[i].source.title_detail.type`: * :mimetype:`text/plain` * :mimetype:`text/html` * :mimetype:`application/xhtml+xml` For Atom feeds, the content type is taken from the type attribute, which defaults to :mimetype:`text/plain` if not specified. For :abbr:`RSS (Rich Site Summary)` feeds, the content type is auto-determined by inspecting the content, and defaults to :mimetype:`text/html`. Note that this may cause silent data loss if the value contains plain text with angle brackets. There is nothing I can do about this problem; it is a limitation of :abbr:`RSS (Rich Site Summary)`. Future enhancement: some versions of :abbr:`RSS (Rich Site Summary)` clearly specify that certain values default to :mimetype:`text/plain`, and :program:`Universal Feed Parser` should respect this, but it doesn't yet. :py:attr:`entries[i].source.title_detail.language` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The language of the title of the source of this entry. :py:attr:`~entries[i].source.title_detail.language` is supposed to be a language code, as specified by `RFC 3066`_, but publishers have been known to publish random values like "English" or "German". :program:`Universal Feed Parser` does not do any parsing or normalization of language codes. 
:py:attr:`~entries[i].source.title_detail.language` may come from the element's xml:lang attribute, or it may inherit from a parent element's xml:lang, or the Content-Language :abbr:`HTTP (Hypertext Transfer Protocol)` header. If the feed does not specify a language, :py:attr:`~entries[i].source.title_detail.language` will be ``None``, the :program:`Python` null value. :py:attr:`entries[i].source.title_detail.base` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The original base :abbr:`URI (Uniform Resource Identifier)` for links within the title of the source of this entry. :py:attr:`entries[i].source.title_detail.base` is only useful in rare situations and can usually be ignored. It is the original base :abbr:`URI (Uniform Resource Identifier)` for this value, as specified by the element's xml:base attribute, or a parent element's xml:base, or the appropriate :abbr:`HTTP (Hypertext Transfer Protocol)` header, or the :abbr:`URI (Uniform Resource Identifier)` of the feed. (See :ref:`advanced.base` for more details.) By the time you see it, :program:`Universal Feed Parser` has already resolved relative links in all values where it makes sense to do so. *Clients should never need to manually resolve relative links.* :py:attr:`entries[i].source.updated` ------------------------------------ The date the source of this entry was last updated, as a string in the same format as it was published in the original feed. This element is :ref:`parsed as a date ` and stored in :ref:`reference.entry.source.updated_parsed`. .. _reference.entry.source.updated_parsed: :py:attr:`entries[i].source.updated_parsed` ------------------------------------------- The date the source of this entry was last updated, as a standard :program:`Python` 9-tuple. ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-entry-summary.rst0000664000175000017500000000266614535121615021004 0ustar00kurtkurt.. 
_reference.entry.summary: :py:attr:`entries[i].summary` ============================= A summary of the entry. If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML (Extensible HyperText Markup Language)`, it is :ref:`sanitized ` by default. If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML (Extensible HyperText Markup Language)`, certain (X)HTML elements within this value may contain relative :abbr:`URI (Uniform Resource Identifier)`\s. If so, they are :ref:`resolved according to a set of rules `. Some publishing systems auto-generate this value from the first few words or first paragraph of the entry. Other publishing systems misuse it to include the full content. In the latter case, :program:`Universal Feed Parser` ought to detect it and put the value in :ref:`reference.entry.content` instead, but it doesn't. .. note:: Some feeds include both a summary and a description element for each entry. In this case, the first element will be available in ``entry['summary']`` and the second will be available in ``entry['content'][0]``. .. rubric:: Comes from * /atom10:feed/atom10:entry/atom10:summary * /atom03:feed/atom03:entry/atom03:summary * /rss/channel/item/description * /rss/channel/item/dc:description * /rdf:RDF/rdf:item/rdf:description * /rdf:RDF/rdf:item/dc:description .. seealso:: * :ref:`reference.entry.summary_detail` ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-entry-summary_detail.rst0000664000175000017500000000733014535121615022317 0ustar00kurtkurt.. _reference.entry.summary_detail: :py:attr:`entries[i].summary_detail` ==================================== A dictionary with details about the entry summary. .. 
rubric:: Comes from * /atom10:feed/atom10:entry/atom10:summary * /atom03:feed/atom03:entry/atom03:summary * /rss/channel/item/description * /rss/channel/item/dc:description * /rdf:RDF/rdf:item/rdf:description * /rdf:RDF/rdf:item/dc:description .. seealso:: * :ref:`reference.entry.summary` .. _reference.entry.summary_detail.value: :py:attr:`entries[i].summary_detail.value` ------------------------------------------ Same as :ref:`reference.entry.summary`. If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML (Extensible HyperText Markup Language)`, it is :ref:`sanitized ` by default. If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML (Extensible HyperText Markup Language)`, certain (X)HTML elements within this value may contain relative :abbr:`URI (Uniform Resource Identifier)`\s. If so, they are :ref:`resolved according to a set of rules `. .. _reference.entry.summary_detail.type: :py:attr:`entries[i].summary_detail.type` ----------------------------------------- The content type of the entry summary. Most likely values for :py:attr:`~entries[i].summary_detail.type`: * :mimetype:`text/plain` * :mimetype:`text/html` * :mimetype:`application/xhtml+xml` For Atom feeds, the content type is taken from the type attribute, which defaults to :mimetype:`text/plain` if not specified. For :abbr:`RSS (Rich Site Summary)` feeds, the content type is auto-determined by inspecting the content, and defaults to :mimetype:`text/html`. Note that this may cause silent data loss if the value contains plain text with angle brackets. There is nothing I can do about this problem; it is a limitation of :abbr:`RSS (Rich Site Summary)`. Future enhancement: some versions of :abbr:`RSS (Rich Site Summary)` clearly specify that certain values default to :mimetype:`text/plain`, and :program:`Universal Feed Parser` should respect this, but it doesn't yet. 
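The content type matters mostly when a client renders the summary. The following is a minimal sketch of one way a client might act on it, using only the Python standard library. ``render_summary`` is a hypothetical helper, not part of :program:`Universal Feed Parser`, and it assumes its argument is a mapping shaped like :py:attr:`entries[i].summary_detail` with ``value`` and ``type`` keys. Treating a missing or unrecognized type as plain text is also an assumption of this sketch.

```python
import html

# Content types whose value is already markup (see the list above).
MARKUP_TYPES = ("text/html", "application/xhtml+xml")


def render_summary(detail):
    """Return a string that can be placed inside an HTML page.

    `detail` is assumed to look like entries[i].summary_detail:
    a mapping with "value" and "type" keys (hypothetical helper).
    """
    value = detail.get("value", "")
    content_type = detail.get("type", "text/plain")
    if content_type in MARKUP_TYPES:
        # Markup values are sanitized by the parser by default,
        # so this sketch passes them through unchanged.
        return value
    # Plain text (or an unknown type): escape it so that literal
    # angle brackets and ampersands survive rendering.
    return html.escape(value)


plain = {"type": "text/plain", "value": "if a < b: swap(a, b)"}
markup = {"type": "text/html", "value": "<p>Hello, <em>world</em></p>"}
print(render_summary(plain))   # if a &lt; b: swap(a, b)
print(render_summary(markup))  # <p>Hello, <em>world</em></p>
```

Escaping only the plain-text case keeps angle brackets visible in Atom summaries marked :mimetype:`text/plain`. It cannot recover text that an :abbr:`RSS (Rich Site Summary)` feed published as plain text but that was auto-detected as :mimetype:`text/html`.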
:py:attr:`entries[i].summary_detail.language` --------------------------------------------- The language of the entry summary. :py:attr:`~entries[i].summary_detail.language` is supposed to be a language code, as specified by `RFC 3066`_, but publishers have been known to publish random values like "English" or "German". :program:`Universal Feed Parser` does not do any parsing or normalization of language codes. .. _RFC 3066: http://www.ietf.org/rfc/rfc3066.txt :py:attr:`~entries[i].summary_detail.language` may come from the element's xml:lang attribute, or it may inherit from a parent element's xml:lang, or the Content-Language :abbr:`HTTP (Hypertext Transfer Protocol)` header. If the feed does not specify a language, :py:attr:`~entries[i].summary_detail.language` will be ``None``, the :program:`Python` null value. :py:attr:`entries[i].summary_detail.base` ----------------------------------------- The original base :abbr:`URI (Uniform Resource Identifier)` for links within the entry summary. :py:attr:`~entries[i].summary_detail.base` is only useful in rare situations and can usually be ignored. It is the original base :abbr:`URI (Uniform Resource Identifier)` for this value, as specified by the element's xml:base attribute, or a parent element's xml:base, or the appropriate :abbr:`HTTP (Hypertext Transfer Protocol)` header, or the :abbr:`URI (Uniform Resource Identifier)` of the feed. (See :ref:`advanced.base` for more details.) By the time you see it, :program:`Universal Feed Parser` has already resolved relative links in all values where it makes sense to do so. *Clients should never need to manually resolve relative links.* ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-entry-tags.rst0000664000175000017500000000223414535121615020234 0ustar00kurtkurt.. 
_reference.entry.tags: :py:attr:`entries[i].tags` ========================== A list of dictionaries that contain details of the categories for the entry. .. note:: Prior to version 4.0, :program:`Universal Feed Parser` exposed categories in ``feed.category`` (the primary category) and ``feed.categories`` (a list of tuples containing the domain and term of each category). These uses are still supported for backward compatibility, but you will not see them in the parsed results unless you explicitly ask for them. .. _reference.entry.tags.term: :py:attr:`entries[i].tags[j].term` ---------------------------------- The category term (keyword). :py:attr:`entries[i].tags[j].scheme` ------------------------------------ The category scheme (domain). :py:attr:`entries[i].tags[j].label` ----------------------------------- A human-readable label for the category. .. rubric:: Comes from * /atom10:feed/atom10:entry/category * /atom03:feed/atom03:entry/dc:subject * /rss/channel/item/category * /rss/channel/item/dc:subject * /rss/channel/item/itunes:category * /rss/channel/item/itunes:keywords * /rdf:RDF/rdf:channel/rdf:item/dc:subject ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-entry-title.rst0000664000175000017500000000154214535121615020420 0ustar00kurtkurt.. _reference.entry.title: :py:attr:`entries[i].title` =========================== The title of the entry. If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML (Extensible HyperText Markup Language)`, it is :ref:`sanitized ` by default. If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML (Extensible HyperText Markup Language)`, certain (X)HTML elements within this value may contain relative :abbr:`URI (Uniform Resource Identifier)`\s. If so, they are :ref:`resolved according to a set of rules `. ..
rubric:: Comes from * /atom03:feed/atom03:entry/atom03:title * /atom10:feed/atom10:entry/atom10:title * /rdf:RDF/rdf:item/dc:title * /rdf:RDF/rdf:item/rdf:title * /rss/channel/item/dc:title * /rss/channel/item/title .. seealso:: * :ref:`reference.entry.title_detail` ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-entry-title_detail.rst0000664000175000017500000000714414535121615021746 0ustar00kurtkurt.. _reference.entry.title_detail: :py:attr:`entries[i].title_detail` ================================== A dictionary with details about the entry title. .. _reference.entry.title_detail.value: :py:attr:`entries[i].title_detail.value` ---------------------------------------- Same as :ref:`reference.entry.title`. If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML (Extensible HyperText Markup Language)`, it is :ref:`sanitized ` by default. If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML (Extensible HyperText Markup Language)`, certain (X)HTML elements within this value may contain relative :abbr:`URI (Uniform Resource Identifier)`\s. If so, they are :ref:`resolved according to a set of rules `. :py:attr:`entries[i].title_detail.type` --------------------------------------- The content type of the entry title. Most likely values for :py:attr:`~entries[i].title_detail.type`: * :mimetype:`text/plain` * :mimetype:`text/html` * :mimetype:`application/xhtml+xml` For Atom feeds, the content type is taken from the type attribute, which defaults to :mimetype:`text/plain` if not specified. For :abbr:`RSS (Rich Site Summary)` feeds, the content type is auto-determined by inspecting the content, and defaults to :mimetype:`text/html`. Note that this may cause silent data loss if the value contains plain text with angle brackets. There is nothing I can do about this problem; it is a limitation of :abbr:`RSS (Rich Site Summary)`. 
Future enhancement: some versions of :abbr:`RSS (Rich Site Summary)` clearly specify that certain values default to :mimetype:`text/plain`, and :program:`Universal Feed Parser` should respect this, but it doesn't yet. :py:attr:`entries[i].title_detail.language` ------------------------------------------- The language of the entry title. :py:attr:`~entries[i].title_detail.language` is supposed to be a language code, as specified by `RFC 3066`_, but publishers have been known to publish random values like "English" or "German". :program:`Universal Feed Parser` does not do any parsing or normalization of language codes. .. _RFC 3066: http://www.ietf.org/rfc/rfc3066.txt :py:attr:`~entries[i].title_detail.language` may come from the element's xml:lang attribute, or it may inherit from a parent element's xml:lang, or the Content-Language :abbr:`HTTP (Hypertext Transfer Protocol)` header. If the feed does not specify a language, :py:attr:`~entries[i].title_detail.language` will be ``None``, the :program:`Python` null value. :py:attr:`entries[i].title_detail.base` --------------------------------------- The original base :abbr:`URI (Uniform Resource Identifier)` for links within the entry title. :py:attr:`~entries[i].title_detail.base` is only useful in rare situations and can usually be ignored. It is the original base :abbr:`URI (Uniform Resource Identifier)` for this value, as specified by the element's xml:base attribute, or a parent element's xml:base, or the appropriate :abbr:`HTTP (Hypertext Transfer Protocol)` header, or the :abbr:`URI (Uniform Resource Identifier)` of the feed. (See :ref:`advanced.base` for more details.) By the time you see it, :program:`Universal Feed Parser` has already resolved relative links in all values where it makes sense to do so. *Clients should never need to manually resolve relative links.* .. 
rubric:: Comes from * /atom10:feed/atom10:entry/atom10:title * /atom03:feed/atom03:entry/atom03:title * /rss/channel/item/title * /rss/channel/item/dc:title * /rdf:RDF/rdf:item/rdf:title * /rdf:RDF/rdf:item/dc:title .. seealso:: * :ref:`reference.entry.title` ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-entry-updated.rst0000664000175000017500000000246414535121615020731 0ustar00kurtkurt.. _reference.entry.updated: :py:attr:`entries[i].updated` ============================= The date this entry was last updated, as a string in the same format as it was published in the original feed. This element is :ref:`parsed as a date ` and stored in :ref:`reference.entry.updated_parsed`. .. note:: As of version 5.1.1, if this key doesn't exist but :py:attr:`entries[i].published` does, the value of :py:attr:`entries[i].published` will be returned. In the past the RSS pubDate element was stored in `updated`, but this incorrect behavior was reported in issue 310. However, developers may have come to rely on this incorrect behavior -- as was reported in issue 328 -- so to help avoid hurting their users' experience, this mapping from `updated` to `published` was temporarily introduced to give developers time to update their software, and to give users time to upgrade. This mapping is temporary and will be removed in a future version of feedparser. .. rubric:: Comes from * /atom03:feed/atom03:entry/atom03:modified * /atom10:feed/atom10:entry/atom10:updated * /rdf:RDF/rdf:item/dc:date * /rdf:RDF/rdf:item/dcterms:modified * /rss/channel/item/dc:date * /rss/channel/item/dcterms:modified .. seealso:: * :ref:`reference.entry.updated_parsed` ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-entry-updated_parsed.rst0000664000175000017500000000232014535121615022256 0ustar00kurtkurt..
_reference.entry.updated_parsed: :py:attr:`entries[i].updated_parsed` ==================================== The date this entry was last updated, as a standard :program:`Python` 9-tuple. .. note:: As of version 5.1.1, if this key doesn't exist but :py:attr:`entries[i].published_parsed` does, the value of :py:attr:`entries[i].published_parsed` will be returned. In the past the RSS pubDate element was stored in `updated`, but this incorrect behavior was reported in issue 310. However, developers may have come to rely on this incorrect behavior -- as was reported in issue 328 -- so to help avoid hurting their users' experience, this mapping from `updated_parsed` to `published_parsed` was temporarily introduced to give developers time to update their software, and to give users time to upgrade. This mapping is temporary and will be removed in a future version of feedparser. .. rubric:: Comes from * /atom10:feed/atom10:entry/atom10:updated * /atom03:feed/atom03:entry/atom03:modified * /rss/channel/item/dc:date * /rss/channel/item/dcterms:modified * /rdf:RDF/rdf:item/dc:date * /rdf:RDF/rdf:item/dcterms:modified .. seealso:: * :ref:`reference.entry.updated` ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-entry.rst0000664000175000017500000000061114535121615017275 0ustar00kurtkurt:py:attr:`entries` ================== A list of dictionaries. Each dictionary contains data from a different entry. Entries are listed in the order in which they appear in the original feed. .. tip:: This element always exists, although it may be an empty list. .. 
rubric:: Comes from * /atom03:feed/atom03:entry * /atom10:feed/atom10:entry * /rdf:RDF/rdf:item * /rss/channel/item ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-etag.rst0000664000175000017500000000102014535121615017047 0ustar00kurtkurt:py:attr:`etag` =============== The ETag of the feed, as specified in the :abbr:`HTTP (Hypertext Transfer Protocol)` headers. The purpose of :py:attr:`etag` is explained more fully in :ref:`http.etag`. .. tip:: :py:attr:`etag` will only be present if the feed was retrieved from a web server, and only if the web server provided an ETag :abbr:`HTTP (Hypertext Transfer Protocol)` header for the feed. If the feed was parsed from a local file or from a string in memory, :py:attr:`etag` will not be present. ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-feed-author.rst0000664000175000017500000000064314535121615020344 0ustar00kurtkurt.. _reference.feed.author: :py:attr:`feed.author` ====================== The author of this feed. .. rubric:: Comes from * /atom03:feed/atom03:author * /atom10:feed/atom10:author * /rdf:RDF/rdf:channel/dc:author * /rdf:RDF/rdf:channel/dc:creator * /rss/channel/dc:author * /rss/channel/dc:creator * /rss/channel/itunes:author * /rss/channel/managingEditor .. seealso:: * :ref:`reference.feed.author_detail` ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-feed-author_detail.rst0000664000175000017500000000216214535121615021664 0ustar00kurtkurt.. _reference.feed.author_detail: :py:attr:`feed.author_detail` ============================= A dictionary with details about the feed author. .. _reference.feed.author_detail.name: :py:attr:`feed.author_detail.name` ---------------------------------- The name of the feed author. .. 
_reference.feed.author_detail.href: :py:attr:`feed.author_detail.href` ---------------------------------- The :abbr:`URL (Uniform Resource Locator)` of the feed author. This can be the author's home page, or a contact page with a webmail form. If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is :ref:`resolved according to a set of rules `. .. _reference.feed.author_detail.email: :py:attr:`feed.author_detail.email` ----------------------------------- The email address of the feed author. .. rubric:: Comes from * /atom03:feed/atom03:author * /atom10:feed/atom10:author * /rdf:RDF/rdf:channel/dc:author * /rdf:RDF/rdf:channel/dc:creator * /rss/channel/dc:author * /rss/channel/dc:creator * /rss/channel/itunes:author * /rss/channel/managingEditor .. seealso:: * :ref:`reference.feed.author` ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-feed-cloud.rst0000664000175000017500000000346414535121615020154 0ustar00kurtkurt:py:attr:`feed.cloud` ===================== No one really knows what a cloud is. It is vaguely documented in `:abbr:`SOAP (Simple Object Access Protocol)` meets :abbr:`RSS (Rich Site Summary)` `_. .. _reference.feed.cloud.domain: :py:attr:`feed.cloud.domain` ---------------------------- The domain of the cloud. Should be just the domain name, not including the http:// protocol. All clouds are presumed to operate over :abbr:`HTTP (Hypertext Transfer Protocol)`. The cloud specification does not support secure clouds over :abbr:`HTTPS`, nor can clouds operate over other protocols. .. _reference.feed.cloud.port: :py:attr:`feed.cloud.port` -------------------------- The port of the cloud. Should be an integer, but :program:`Universal Feed Parser` currently returns it as a string. .. _reference.feed.cloud.path: :py:attr:`feed.cloud.path` -------------------------- The :abbr:`URL (Uniform Resource Locator)` path of the cloud. .. 
_reference.feed.cloud.registerProcedure: :py:attr:`feed.cloud.registerProcedure` --------------------------------------- The name of the procedure to call on the cloud. .. _reference.feed.cloud.protocol: :py:attr:`feed.cloud.protocol` ------------------------------ The protocol of the cloud. Documentation differs on what the acceptable values are. Acceptable values definitely include xml-rpc and soap, although only in lowercase, despite both being acronyms. There is no way for a publisher to specify the version number of the protocol to use. soap refers to :abbr:`SOAP (Simple Object Access Protocol)` 1.1; the cloud interface does not support :abbr:`SOAP (Simple Object Access Protocol)` 1.0 or 1.2. post or http-post might also be acceptable values; nobody really knows for sure. .. rubric:: Comes from * /rss/channel/cloud ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-feed-contributors.rst0000664000175000017500000000153314535121615021576 0ustar00kurtkurt:py:attr:`feed.contributors` ============================ A list of contributors (secondary authors) to this feed. :py:attr:`feed.contributors[i].name` ------------------------------------ The name of this contributor. .. _reference.feed.contributors.href: :py:attr:`feed.contributors[i].href` ------------------------------------ The :abbr:`URL (Uniform Resource Locator)` of this contributor. This can be the contributor's home page, or a contact page with a webmail form. If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is :ref:`resolved according to a set of rules `. :py:attr:`feed.contributors[i].email` ------------------------------------- The email address of this contributor. .. 
rubric:: Comes from * /atom03:feed/atom03:contributor * /atom10:feed/atom10:contributor * /rss/channel/dc:contributor ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-feed-docs.rst0000664000175000017500000000115514535121615017771 0ustar00kurtkurt.. _reference.feed.docs: :py:attr:`feed.docs` ==================== A :abbr:`URL (Uniform Resource Locator)` pointing to the specification which this feed conforms to. This element is rare. The reasoning was that in 25 years, someone will stumble on an :abbr:`RSS (Rich Site Summary)` feed and not know what it is, so we should waste everyone's bandwidth with useless links until then. Most publishers skip it, and all clients ignore it. If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is :ref:`resolved according to a set of rules `. .. rubric:: Comes from * /rss/channel/docs ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-feed-errorreportsto.rst0000664000175000017500000000034414535121615022153 0ustar00kurtkurt.. _reference.feed.errorreportsto: :py:attr:`feed.errorreportsto` ============================== An email address for reporting errors in the feed itself. .. rubric:: Comes from * /rdf:RDF/admin:errorReportsTo/@rdf:resource ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-feed-generator.rst0000664000175000017500000000060214535121615021023 0ustar00kurtkurt.. _reference.feed.generator: :py:attr:`feed.generator` ========================= A human-readable name of the application used to generate the feed. .. rubric:: Comes from * /atom03:feed/atom03:generator * /atom10:feed/atom10:generator * /rdf:RDF/rdf:channel/admin:generatorAgent/@rdf:resource * /rss/channel/generator .. 
seealso:: * :ref:`reference.feed.generator_detail` ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-feed-generator_detail.rst0000664000175000017500000000216414535121615022352 0ustar00kurtkurt.. _reference.feed.generator_detail: :py:attr:`feed.generator_detail` ================================ A dictionary with details about the feed generator. :py:attr:`feed.generator_detail.name` ------------------------------------- Same as :ref:`reference.feed.generator`. .. _reference.feed.generator_detail.href: :py:attr:`feed.generator_detail.href` ------------------------------------- The :abbr:`URL (Uniform Resource Locator)` of the application used to generate the feed. If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is :ref:`resolved according to a set of rules `. .. _reference.feed.generator_detail.version: :py:attr:`feed.generator_detail.version` ---------------------------------------- The version number of the application used to generate the feed. There is no required format for this, but most applications use a MAJOR.MINOR version number. .. rubric:: Comes from * /atom03:feed/atom03:generator * /atom10:feed/atom10:generator * /rdf:RDF/rdf:channel/admin:generatorAgent/@rdf:resource * /rss/channel/generator .. seealso:: * :ref:`reference.feed.generator` ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-feed-icon.rst0000664000175000017500000000042214535121615017765 0ustar00kurtkurt:py:attr:`feed.icon` ==================== A URL to a small icon representing the feed. If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is :ref:`resolved according to a set of rules `. .. 
rubric:: Comes from * /atom10:feed/atom10:icon ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-feed-id.rst0000664000175000017500000000047414535121615017440 0ustar00kurtkurt.. _reference.feed.id: :py:attr:`feed.id` ================== A globally unique identifier for this feed. If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is :ref:`resolved according to a set of rules `. .. rubric:: Comes from * /atom03:feed/atom03:id * /atom10:feed/atom10:id ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-feed-image.rst0000664000175000017500000000540314535121615020123 0ustar00kurtkurt:py:attr:`feed.image` ===================== A dictionary with details about the feed image. A feed image can be a logo, banner, or a picture of the author. .. _reference.feed.image.title: :py:attr:`feed.image.title` --------------------------- The alternate text of the feed image, which would go in the alt attribute if you rendered the feed image as an :abbr:`HTML (HyperText Markup Language)` img element. .. _reference.feed.image.href: :py:attr:`feed.image.href` -------------------------- The :abbr:`URL (Uniform Resource Locator)` of the feed image itself, which would go in the src attribute if you rendered the feed image as an :abbr:`HTML (HyperText Markup Language)` img element. If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is :ref:`resolved according to a set of rules `. .. _reference.feed.image.link: :py:attr:`feed.image.link` -------------------------- The :abbr:`URL (Uniform Resource Locator)` which the feed image would point to. If you rendered the feed image as an :abbr:`HTML (HyperText Markup Language)` img element, you would wrap it in an a element and put this in the href attribute. 
If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is :ref:`resolved according to a set of rules `. .. _reference.feed.image.width: :py:attr:`feed.image.width` --------------------------- The width of the feed image, which would go in the width attribute if you rendered the feed image as an :abbr:`HTML (HyperText Markup Language)` img element. .. _reference.feed.image.height: :py:attr:`feed.image.height` ---------------------------- The height of the feed image, which would go in the height attribute if you rendered the feed image as an :abbr:`HTML (HyperText Markup Language)` img element. :py:attr:`feed.image.description` --------------------------------- A short description of the feed image, which would go in the title attribute if you rendered the feed image as an :abbr:`HTML (HyperText Markup Language)` img element. This element is rare; it was available in Netscape :abbr:`RSS (Rich Site Summary)` 0.91 but was dropped from Userland :abbr:`RSS (Rich Site Summary)` 0.91. .. rubric:: Annotated example This is a feed image:

::

    <image>
    <title>Feed logo</title>
    <url>http://example.org/logo.png</url>
    <link>http://example.org/</link>
    <width>80</width>
    <height>15</height>
    <description>Visit my home page</description>
    </image>

This feed image could be rendered in :abbr:`HTML (HyperText Markup Language)` as this:

::

    <a href="http://example.org/">
    <img src="http://example.org/logo.png"
         width="80"
         height="15"
         alt="Feed logo"
         title="Visit my home page">
    </a>

.. rubric:: Comes from * /rdf:RDF/rdf:image * /rss/channel/image ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-feed-info-detail.rst0000664000175000017500000000657614535121615021250 0ustar00kurtkurt.. _reference.feed.info_detail: :py:attr:`feed.info_detail` =========================== A dictionary with details about the feed info. .. rubric:: Comes from * /atom03:feed/atom03:info .. seealso:: * :ref:`reference.feed.info` .. _reference.feed.info_detail.value: :py:attr:`feed.info_detail.value` --------------------------------- Same as :ref:`reference.feed.info`. If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML (Extensible HyperText Markup Language)`, it is :ref:`sanitized ` by default. If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML (Extensible HyperText Markup Language)`, certain (X)HTML elements within this value may contain relative :abbr:`URI (Uniform Resource Identifier)`\s. If so, they are :ref:`resolved according to a set of rules `. .. _reference.feed.info_detail.type: :py:attr:`feed.info_detail.type` -------------------------------- The content type of the feed info. Most likely values for :py:attr:`~feed.info_detail.type`: * :mimetype:`text/plain` * :mimetype:`text/html` * :mimetype:`application/xhtml+xml` For Atom feeds, the content type is taken from the type attribute, which defaults to :mimetype:`text/plain` if not specified. For :abbr:`RSS (Rich Site Summary)` feeds, the content type is auto-determined by inspecting the content, and defaults to :mimetype:`text/html`. Note that this may cause silent data loss if the value contains plain text with angle brackets. There is nothing I can do about this problem; it is a limitation of :abbr:`RSS (Rich Site Summary)`.
Future enhancement: some versions of :abbr:`RSS (Rich Site Summary)` clearly specify that certain values default to :mimetype:`text/plain`, and :program:`Universal Feed Parser` should respect this, but it doesn't yet. :py:attr:`feed.info_detail.language` ------------------------------------ The language of the feed info. :py:attr:`~feed.info_detail.language` is supposed to be a language code, as specified by `RFC 3066 <http://www.ietf.org/rfc/rfc3066.txt>`_, but publishers have been known to publish random values like "English" or "German". :program:`Universal Feed Parser` does not do any parsing or normalization of language codes. :py:attr:`~feed.info_detail.language` may come from the element's xml:lang attribute, or it may inherit from a parent element's xml:lang, or the Content-Language :abbr:`HTTP (Hypertext Transfer Protocol)` header. If the feed does not specify a language, :py:attr:`~feed.info_detail.language` will be ``None``, the :program:`Python` null value. :py:attr:`feed.info_detail.base` -------------------------------- The original base :abbr:`URI (Uniform Resource Identifier)` for links within the feed info. :py:attr:`~feed.info_detail.base` is only useful in rare situations and can usually be ignored. It is the original base :abbr:`URI (Uniform Resource Identifier)` for this value, as specified by the element's xml:base attribute, or a parent element's xml:base, or the appropriate :abbr:`HTTP (Hypertext Transfer Protocol)` header, or the :abbr:`URI (Uniform Resource Identifier)` of the feed. (See :ref:`advanced.base` for more details.) By the time you see it, :program:`Universal Feed Parser` has already resolved relative links in all values where it makes sense to do so. *Clients should never need to manually resolve relative links.* ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-feed-info.rst0000664000175000017500000000160714535121615017776 0ustar00kurtkurt..
_reference.feed.info: :py:attr:`feed.info` ==================== Free-form human-readable description of the feed format itself. Intended for people who view the feed in a browser, to explain what they just clicked on. This element is generally ignored by feed readers. If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML (Extensible HyperText Markup Language)`, it is :ref:`sanitized ` by default. If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML (Extensible HyperText Markup Language)`, certain (X)HTML elements within this value may contain relative :abbr:`URI (Uniform Resource Identifier)`\s. If so, they are :ref:`resolved according to a set of rules `. .. rubric:: Comes from * /atom03:feed/atom03:info * /rss/channel/feedburner:browserFriendly .. seealso:: * :ref:`reference.feed.info_detail` ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-feed-language.rst0000664000175000017500000000042514535121615020623 0ustar00kurtkurt.. _reference.feed.language: :py:attr:`feed.language` ======================== The primary language of the feed. .. rubric:: Comes from * /atom03:feed/@xml:lang * /atom10:feed/@xml:lang * /rdf:RDF/rdf:channel/dc:language * /rss/channel/dc:language * /rss/channel/language ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-feed-license.rst0000664000175000017500000000070714535121615020465 0ustar00kurtkurt.. _reference.feed.license: :py:attr:`feed.license` ======================= A :abbr:`URL (Uniform Resource Locator)` of the license under which this feed is distributed. If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is :ref:`resolved according to a set of rules `. ..
rubric:: Comes from * /atom10:feed/atom10:link[@rel="license"]/@href * /rdf:RDF/cc:license/@rdf:resource * /rss/channel/creativeCommons:license ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-feed-link.rst0000664000175000017500000000151214535121615017773 0ustar00kurtkurt.. _reference.feed.link: :py:attr:`feed.link` ==================== The :abbr:`URL (Uniform Resource Locator)` of the :abbr:`HTML (HyperText Markup Language)` page associated with this feed. For site feeds, this is probably the home page of the site. For category feeds, this is probably the category's archive page. For search feeds, this is probably the web page that displays the search results for the given search parameters. If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is :ref:`resolved according to a set of rules `. .. rubric:: Comes from * /atom03:feed/atom03:link[@rel="alternate"]/@href * /atom10:feed/atom10:link[@rel="alternate"]/@href * /atom10:feed/atom10:link[not(@rel)]/@href * /rdf:RDF/rdf:channel/rdf:link * /rss/channel/link .. seealso:: * :ref:`reference.feed.links` ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-feed-links.rst0000664000175000017500000000255514535121615020166 0ustar00kurtkurt.. _reference.feed.links: :py:attr:`feed.links` ===================== A list of dictionaries with details on the links associated with the feed. Each link has a rel (relationship), type (content type), and href (the :abbr:`URL (Uniform Resource Locator)` that the link points to). Some links may also have a title. .. _reference.feed.links.rel: :py:attr:`feed.links[i].rel` ---------------------------- The relationship of this feed link. Atom 1.0 defines five standard link relationships and describes the process for registering others. 
Here are the five standard rel values: - `alternate` - `enclosure` - `related` - `self` - `via` .. _reference.feed.links.type: :py:attr:`feed.links[i].type` ----------------------------- The content type of the page that this feed link points to. .. _reference.feed.links.href: :py:attr:`feed.links[i].href` ----------------------------- The :abbr:`URL (Uniform Resource Locator)` of the page that this feed link points to. If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is :ref:`resolved according to a set of rules `. :py:attr:`feed.links[i].title` ------------------------------ The title of this feed link. .. rubric:: Comes from * /atom03:feed/atom03:link * /atom10:feed/atom10:link * /rdf:RDF/rdf:channel/rdf:link * /rss/channel/link .. seealso:: * :ref:`reference.feed.link` ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-feed-logo.rst0000664000175000017500000000043214535121615017776 0ustar00kurtkurt:py:attr:`feed.logo` ==================== A URL to a graphic representing a logo for the feed. If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is :ref:`resolved according to a set of rules `. .. rubric:: Comes from * /atom10:feed/atom10:logo ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-feed-published.rst0000664000175000017500000000063314535121615021020 0ustar00kurtkurt.. _reference.feed.published: :py:attr:`feed.published` ========================= The date the feed was published, as a string in the same format as it was published in the original feed. This element is :ref:`parsed as a date ` and stored in :ref:`reference.feed.published_parsed`. .. rubric:: Comes from * /rss/channel/pubDate .. 
seealso:: * :ref:`reference.feed.published_parsed` ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-feed-published_parsed.rst0000664000175000017500000000043214535121615022353 0ustar00kurtkurt.. _reference.feed.published_parsed: :py:attr:`feed.published_parsed` ================================ The date the feed was published, as a standard :program:`Python` 9-tuple. .. rubric:: Comes from * /rss/channel/pubDate .. seealso:: * :ref:`reference.feed.published` ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-feed-publisher.rst0000664000175000017500000000047314535121615021040 0ustar00kurtkurt.. _reference.feed.publisher: :py:attr:`feed.publisher` ========================= The publisher of the feed. .. rubric:: Comes from * /rdf:RDF/rdf:channel/dc:publisher * /rss/channel/dc:publisher * /rss/channel/itunes:owner * /rss/channel/webMaster .. seealso:: * :ref:`reference.feed.publisher_detail` ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-feed-publisher_detail.rst0000664000175000017500000000174414535121615022364 0ustar00kurtkurt.. _reference.feed.publisher_detail: :py:attr:`feed.publisher_detail` ================================ A dictionary with details about the feed publisher. :py:attr:`feed.publisher_detail.name` ------------------------------------- The name of this feed's publisher. .. _reference.feed.publisher_detail.href: :py:attr:`feed.publisher_detail.href` ------------------------------------- The :abbr:`URL (Uniform Resource Locator)` of this feed's publisher. This can be the publisher's home page, or a contact page with a webmail form. If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is :ref:`resolved according to a set of rules `. 
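Because any of the ``publisher_detail`` keys (``name``, ``href``, ``email``) may be missing from a given feed, client code usually reads them defensively. The following is a minimal sketch of that pattern, not part of :program:`Universal Feed Parser` itself; ``format_publisher`` is a hypothetical helper name, and it only assumes the mapping supports ``.get()``, as a plain ``dict`` does.

```python
def format_publisher(detail):
    """Build a one-line contact string from a publisher_detail-style mapping.

    ``detail`` is assumed to be dict-like; every key is optional.
    """
    name = detail.get("name")
    href = detail.get("href")
    email = detail.get("email")

    # Prefer the name, then fall back to whatever else is available.
    label = name or email or href or "unknown publisher"

    # Append the remaining contact details without repeating the label.
    extras = [item for item in (email, href) if item and item != label]
    return f"{label} ({', '.join(extras)})" if extras else label
```

A caller would pass something like ``d.feed.get("publisher_detail", {})``, so a feed without publisher information yields the fallback string instead of raising ``KeyError``.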
:py:attr:`feed.publisher_detail.email` -------------------------------------- The email address of this feed's publisher. .. rubric:: Comes from * /rdf:RDF/rdf:channel/dc:publisher * /rss/channel/dc:publisher * /rss/channel/itunes:owner * /rss/channel/webMaster .. seealso:: * :ref:`reference.feed.publisher` ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-feed-rights.rst0000664000175000017500000000163714535121615020346 0ustar00kurtkurt.. _reference.feed.rights: :py:attr:`feed.rights` ====================== A human-readable copyright statement for the feed. If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML (Extensible HyperText Markup Language)`, it is :ref:`sanitized ` by default. If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML (Extensible HyperText Markup Language)`, certain (X)HTML elements within this value may contain relative :abbr:`URI (Uniform Resource Identifier)`s. If so, they are :ref:`resolved according to a set of rules `. .. note:: For machine-readable copyright information, see :ref:`reference.feed.license`. .. rubric:: Comes from * /atom03:feed/atom03:copyright * /atom10:feed/atom10:rights * /rdf:RDF/rdf:channel/dc:rights * /rss/channel/copyright * /rss/channel/dc:rights .. seealso:: * :ref:`reference.feed.rights_detail` ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-feed-rights_detail.rst0000664000175000017500000000705014535121615021663 0ustar00kurtkurt.. _reference.feed.rights_detail: :py:attr:`feed.rights_detail` ============================= A dictionary with details on the feed copyright. .. _reference.feed.rights_detail.value: :py:attr:`feed.rights_detail.value` ----------------------------------- Same as :ref:`reference.feed.rights`. 
If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML (Extensible HyperText Markup Language)`, it is :ref:`sanitized ` by default. If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML (Extensible HyperText Markup Language)`, certain (X)HTML elements within this value may contain relative :abbr:`URI (Uniform Resource Identifier)`s. If so, they are :ref:`resolved according to a set of rules `. .. _reference.feed.rights_detail.type: :py:attr:`feed.rights_detail.type` ---------------------------------- The content type of the feed copyright. Most likely values for :py:attr:`~feed.rights_detail.type`: * :mimetype:`text/plain` * :mimetype:`text/html` * :mimetype:`application/xhtml+xml` For Atom feeds, the content type is taken from the type attribute, which defaults to :mimetype:`text/plain` if not specified. For :abbr:`RSS (Rich Site Summary)` feeds, the content type is auto-determined by inspecting the content, and defaults to :mimetype:`text/html`. Note that this may cause silent data loss if the value contains plain text with angle brackets. There is nothing I can do about this problem; it is a limitation of :abbr:`RSS (Rich Site Summary)`. Future enhancement: some versions of :abbr:`RSS (Rich Site Summary)` clearly specify that certain values default to :mimetype:`text/plain`, and :program:`Universal Feed Parser` should respect this, but it doesn't yet. :py:attr:`feed.rights_detail.language` -------------------------------------- The language of the feed copyright. :py:attr:`~feed.rights_detail.language` is supposed to be a language code, as specified by `:abbr:`RFC (Request For Comments)` 3066 `_, but publishers have been known to publish random values like "English" or "German". :program:`Universal Feed Parser` does not do any parsing or normalization of language codes. 
:py:attr:`~feed.rights_detail.language` may come from the element's xml:lang attribute, or it may inherit from a parent element's xml:lang, or the Content-Language :abbr:`HTTP (Hypertext Transfer Protocol)` header. If the feed does not specify a language, :py:attr:`~feed.rights_detail.language` will be ``None``, the :program:`Python` null value. :py:attr:`feed.rights_detail.base` ---------------------------------- The original base :abbr:`URI (Uniform Resource Identifier)` for links within the feed copyright. :py:attr:`~feed.rights_detail.base` is only useful in rare situations and can usually be ignored. It is the original base :abbr:`URI (Uniform Resource Identifier)` for this value, as specified by the element's xml:base attribute, or a parent element's xml:base, or the appropriate :abbr:`HTTP (Hypertext Transfer Protocol)` header, or the :abbr:`URI (Uniform Resource Identifier)` of the feed. (See :ref:`advanced.base` for more details.) By the time you see it, :program:`Universal Feed Parser` has already resolved relative links in all values where it makes sense to do so. *Clients should never need to manually resolve relative links.* .. rubric:: Comes from * /atom03:feed/atom03:copyright * /atom10:feed/atom10:rights * /rdf:RDF/rdf:channel/dc:rights * /rss/channel/copyright * /rss/channel/dc:rights .. seealso:: * :ref:`reference.feed.rights` ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-feed-subtitle.rst0000664000175000017500000000165314535121615020677 0ustar00kurtkurt.. _reference.feed.subtitle: :py:attr:`feed.subtitle` ======================== A subtitle, tagline, slogan, or other short description of the feed. If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML (Extensible HyperText Markup Language)`, it is :ref:`sanitized ` by default. 
If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML (Extensible HyperText Markup Language)`, certain (X)HTML elements within this value may contain relative :abbr:`URI (Uniform Resource Identifier)`s. If so, they are :ref:`resolved according to a set of rules `. .. rubric:: Comes from * /atom03:feed/atom03:tagline * /atom10:feed/atom10:subtitle * /rdf:RDF/rdf:channel/dc:description * /rdf:RDF/rdf:channel/rdf:description * /rss/channel/dc:description * /rss/channel/description * /rss/channel/itunes:subtitle .. seealso:: * :ref:`reference.feed.subtitle_detail` ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-feed-subtitle_detail.rst0000664000175000017500000000724114535121615022220 0ustar00kurtkurt.. _reference.feed.subtitle_detail: :py:attr:`feed.subtitle_detail` =============================== A dictionary with details about the feed subtitle. .. rubric:: Comes from * /atom03:feed/atom03:tagline * /atom10:feed/atom10:subtitle * /rdf:RDF/rdf:channel/dc:description * /rdf:RDF/rdf:channel/rdf:description * /rss/channel/dc:description * /rss/channel/description * /rss/channel/itunes:subtitle .. seealso:: * :ref:`reference.feed.subtitle` .. _reference.feed.subtitle_detail.value: :py:attr:`feed.subtitle_detail.value` ------------------------------------- Same as :ref:`reference.feed.subtitle`. If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML (Extensible HyperText Markup Language)`, it is :ref:`sanitized ` by default. If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML (Extensible HyperText Markup Language)`, certain (X)HTML elements within this value may contain relative :abbr:`URI (Uniform Resource Identifier)`\s. If so, they are :ref:`resolved according to a set of rules `. .. _reference.feed.subtitle_detail.type: :py:attr:`feed.subtitle_detail.type` ------------------------------------ The content type of the feed subtitle. 
Most likely values for :py:attr:`~feed.subtitle_detail.type`: * :mimetype:`text/plain` * :mimetype:`text/html` * :mimetype:`application/xhtml+xml` For Atom feeds, the content type is taken from the type attribute, which defaults to :mimetype:`text/plain` if not specified. For :abbr:`RSS (Rich Site Summary)` feeds, the content type is auto-determined by inspecting the content, and defaults to :mimetype:`text/html`. Note that this may cause silent data loss if the value contains plain text with angle brackets. There is nothing I can do about this problem; it is a limitation of :abbr:`RSS (Rich Site Summary)`. Future enhancement: some versions of :abbr:`RSS (Rich Site Summary)` clearly specify that certain values default to :mimetype:`text/plain`, and :program:`Universal Feed Parser` should respect this, but it doesn't yet. :py:attr:`feed.subtitle_detail.language` ---------------------------------------- The language of the feed subtitle. :py:attr:`~feed.subtitle_detail.language` is supposed to be a language code, as specified by `:abbr:`RFC (Request For Comments)` 3066 `_, but publishers have been known to publish random values like "English" or "German". :program:`Universal Feed Parser` does not do any parsing or normalization of language codes. :py:attr:`~feed.subtitle_detail.language` may come from the element's xml:lang attribute, or it may inherit from a parent element's xml:lang, or the Content-Language :abbr:`HTTP (Hypertext Transfer Protocol)` header. If the feed does not specify a language, :py:attr:`~feed.subtitle_detail.language` will be ``None``, the :program:`Python` null value. :py:attr:`feed.subtitle_detail.base` ------------------------------------ The original base :abbr:`URI (Uniform Resource Identifier)` for links within the feed subtitle. :py:attr:`~feed.subtitle_detail.base` is only useful in rare situations and can usually be ignored. 
It is the original base :abbr:`URI (Uniform Resource Identifier)` for this value, as specified by the element's xml:base attribute, or a parent element's xml:base, or the appropriate :abbr:`HTTP (Hypertext Transfer Protocol)` header, or the :abbr:`URI (Uniform Resource Identifier)` of the feed. (See :ref:`advanced.base` for more details.) By the time you see it, :program:`Universal Feed Parser` has already resolved relative links in all values where it makes sense to do so. *Clients should never need to manually resolve relative links.* ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-feed-tags.rst0000664000175000017500000000206214535121615017775 0ustar00kurtkurt.. _reference.feed.tags: :py:attr:`feed.tags` ==================== A list of dictionaries that contain details of the categories for the feed. .. note:: Prior to version 4.0, :program:`Universal Feed Parser` exposed categories in ``feed.category`` (the primary category) and ``feed.categories`` (a list of tuples containing the domain and term of each category). These uses are still supported for backward compatibility, but you will not see them in the parsed results unless you explicitly ask for them. .. _reference.feed.tags.term: :py:attr:`feed.tags[i].term` ---------------------------- The category term (keyword). :py:attr:`feed.tags[i].scheme` ------------------------------ The category scheme (domain). :py:attr:`feed.tags[i].label` ----------------------------- A human-readable label for the category. .. 
rubric:: Comes from * /atom03:feed/dc:subject * /atom10:feed/category * /rdf:RDF/rdf:channel/dc:subject * /rss/channel/category * /rss/channel/dc:subject * /rss/channel/itunes:category * /rss/channel/itunes:keywords ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-feed-textinput.rst0000664000175000017500000000331114535121615021101 0ustar00kurtkurt:py:attr:`feed.textinput` ========================= A text input form. No one actually uses this. Why are you? .. _reference.feed.textinput.title: :py:attr:`feed.textinput.title` ------------------------------- The title of the text input form, which would go in the value attribute of the form's submit button. .. _reference.feed.textinput.link: :py:attr:`feed.textinput.link` ------------------------------ The link of the script which processes the text input form, which would go in the action attribute of the form. If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is :ref:`resolved according to a set of rules <advanced.base>`. .. _reference.feed.textinput.name: :py:attr:`feed.textinput.name` ------------------------------ The name of the text input box in the form, which would go in the name attribute of the form's input box. .. _reference.feed.textinput.description: :py:attr:`feed.textinput.description` ------------------------------------- A short description of the text input form, which would go in the label element of the form. .. rubric:: Annotated example This is a text input in a feed: :: <textInput> <title>Go!</title> <link>http://example.org/search</link> <name>keyword</name> <description>Search this site:</description> </textInput> This is how it could be rendered in :abbr:`HTML (HyperText Markup Language)`: :: <form method="get" action="http://example.org/search"> <label for="keyword">Search this site:</label> <input type="text" id="keyword" name="keyword" value=""> <input type="submit" value="Go!"> </form>
.. rubric:: Comes from * /rdf:RDF/rdf:textinput * /rss/channel/textInput * /rss/channel/textinput ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-feed-title.rst0000664000175000017500000000146514535121615020166 0ustar00kurtkurt.. _reference.feed.title: :py:attr:`feed.title` ===================== The title of the feed. If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML (Extensible HyperText Markup Language)`, it is :ref:`sanitized ` by default. If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML (Extensible HyperText Markup Language)`, certain (X)HTML elements within this value may contain relative :abbr:`URI (Uniform Resource Identifier)`s. If so, they are :ref:`resolved according to a set of rules `. .. rubric:: Comes from * /atom03:feed/atom03:title * /atom10:feed/atom10:title * /rdf:RDF/rdf:channel/dc:title * /rdf:RDF/rdf:channel/rdf:title * /rss/channel/dc:title * /rss/channel/title .. seealso:: * :ref:`reference.feed.title_detail` ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-feed-title_detail.rst0000664000175000017500000000711014535121615021501 0ustar00kurtkurt.. _reference.feed.title_detail: :py:attr:`feed.title_detail` ============================ A dictionary with details about the feed title. .. _reference.feed.title_detail.value: :py:attr:`feed.title_detail.value` ---------------------------------- Same as :ref:`reference.feed.title`. If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML (Extensible HyperText Markup Language)`, it is :ref:`sanitized ` by default. If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML (Extensible HyperText Markup Language)`, certain (X)HTML elements within this value may contain relative :abbr:`URI (Uniform Resource Identifier)`\s. 
If so, they are :ref:`resolved according to a set of rules `. .. _reference.feed.title_detail.type: :py:attr:`feed.title_detail.type` --------------------------------- The content type of the feed title. Most likely values for :py:attr:`~feed.title_detail.type`: * :mimetype:`text/plain` * :mimetype:`text/html` * :mimetype:`application/xhtml+xml` For Atom feeds, the content type is taken from the type attribute, which defaults to :mimetype:`text/plain` if not specified. For :abbr:`RSS (Rich Site Summary)` feeds, the content type is auto-determined by inspecting the content, and defaults to :mimetype:`text/html`. Note that this may cause silent data loss if the value contains plain text with angle brackets. There is nothing I can do about this problem; it is a limitation of :abbr:`RSS (Rich Site Summary)`. Future enhancement: some versions of :abbr:`RSS (Rich Site Summary)` clearly specify that certain values default to :mimetype:`text/plain`, and :program:`Universal Feed Parser` should respect this, but it doesn't yet. .. _reference.feed.title_detail.language: :py:attr:`feed.title_detail.language` ------------------------------------- The language of the feed title. :py:attr:`~feed.title_detail.language` is supposed to be a language code, as specified by `:abbr:`RFC (Request For Comments)` 3066 `_, but publishers have been known to publish random values like "English" or "German". :program:`Universal Feed Parser` does not do any parsing or normalization of language codes. :py:attr:`~feed.title_detail.language` may come from the element's xml:lang attribute, or it may inherit from a parent element's xml:lang, or the Content-Language :abbr:`HTTP (Hypertext Transfer Protocol)` header. If the feed does not specify a language, :py:attr:`~feed.title_detail.language` will be ``None``, the :program:`Python` null value. 
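The ``type`` and ``language`` keys described above are typically consumed together when a client wants a plain-text title. The sketch below is illustrative only: ``title_as_text`` and the ``"und"`` fallback (the conventional "undetermined" language tag) are assumptions introduced here, and the tag stripping uses the standard library's ``html.parser`` rather than anything from :program:`Universal Feed Parser`.

```python
from html.parser import HTMLParser

# Content types whose value may contain markup, per the list above.
MARKUP_TYPES = {"text/html", "application/xhtml+xml"}


class _TextExtractor(HTMLParser):
    """Collect only the text nodes of an (X)HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def title_as_text(detail, default_language="und"):
    """Return (plain_text, language) for a title_detail-style mapping."""
    value = detail.get("value", "")
    if detail.get("type") in MARKUP_TYPES:
        # Drop tags and keep the text; text/plain values are left untouched.
        extractor = _TextExtractor()
        extractor.feed(value)
        extractor.close()
        value = "".join(extractor.parts)
    # language is None when the feed does not specify one.
    language = detail.get("language") or default_language
    return value, language
```

Checking ``type`` before stripping matters because a :mimetype:`text/plain` title such as ``1 < 2`` contains an angle bracket that is not markup.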
:py:attr:`feed.title_detail.base` --------------------------------- The original base :abbr:`URI (Uniform Resource Identifier)` for links within the feed title. :py:attr:`~feed.title_detail.base` is only useful in rare situations and can usually be ignored. It is the original base :abbr:`URI (Uniform Resource Identifier)` for this value, as specified by the element's xml:base attribute, or a parent element's xml:base, or the appropriate :abbr:`HTTP (Hypertext Transfer Protocol)` header, or the :abbr:`URI (Uniform Resource Identifier)` of the feed. (See :ref:`advanced.base` for more details.) By the time you see it, :program:`Universal Feed Parser` has already resolved relative links in all values where it makes sense to do so. *Clients should never need to manually resolve relative links.* .. rubric:: Comes from * /atom03:feed/atom03:title * /atom10:feed/atom10:title * /rdf:RDF/rdf:channel/dc:title * /rdf:RDF/rdf:channel/rdf:title * /rss/channel/dc:title * /rss/channel/title .. seealso:: * :ref:`reference.feed.title` ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-feed-ttl.rst0000664000175000017500000000135114535121615017642 0ustar00kurtkurt.. _reference.feed.ttl: :py:attr:`feed.ttl` =================== According to the :abbr:`RSS (Rich Site Summary)` specification, "ttl stands for time to live. It's a number of minutes that indicates how long a channel can be cached before refreshing from the source. This makes it possible for RSS sources to be managed by a file-sharing network such as Gnutella." No one is quite sure what this means, and no one publishes feeds via file-sharing networks. Some clients have interpreted this element to be some sort of inline caching mechanism, albeit one that completely ignores the underlying :abbr:`HTTP (Hypertext Transfer Protocol)` protocol, its robust caching mechanisms, and the huge amount of :abbr:`HTTP (Hypertext Transfer Protocol)`-savvy network infrastructure that understands them. Given the vague documentation, it is impossible to say that this interpretation is any more ridiculous than the element itself. .. 
rubric:: Comes from * /rss/channel/ttl ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-feed-updated.rst0000664000175000017500000000233114535121615020464 0ustar00kurtkurt.. _reference.feed.updated: :py:attr:`feed.updated` ======================= The date the feed was last updated, as a string in the same format as it was published in the original feed. This element is :ref:`parsed as a date ` and stored in :ref:`reference.feed.updated_parsed`. .. note:: As of version 5.1.1, if this key doesn't exist but :py:attr:`feed.published` does, the value of :py:attr:`feed.published` will be returned. In the past the RSS pubDate element was stored in `updated`, but this incorrect behavior was reported in issue 310. However, developers may have come to rely on this incorrect behavior -- as was reported in issue 328 -- so to help avoid hurting their users' experience, this mapping from `updated` to `published` was temporarily introduced to give developers time to update their software, and to give users time to upgrade. This mapping is temporary and will be removed in a future version of feedparser. .. rubric:: Comes from * /atom03:feed/atom03:modified * /atom10:feed/atom10:updated * /rdf:RDF/rdf:channel/dc:date * /rdf:RDF/rdf:channel/dcterms:modified * /rss/channel/dc:date .. seealso:: * :ref:`reference.feed.updated_parsed` ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-feed-updated_parsed.rst0000664000175000017500000000216614535121615022030 0ustar00kurtkurt.. _reference.feed.updated_parsed: :py:attr:`feed.updated_parsed` ============================== The date the feed was last updated, as a standard :program:`Python` 9-tuple. .. note:: As of version 5.1.1, if this key doesn't exist but :py:attr:`feed.published_parsed` does, the value of :py:attr:`feed.published_parsed` will be returned. 
In the past the RSS pubDate element was stored in `updated`, but this incorrect behavior was reported in issue 310. However, developers may have come to rely on this incorrect behavior -- as was reported in issue 328 -- so to help avoid hurting their users' experience, this mapping from `updated_parsed` to `published_parsed` was temporarily introduced to give developers time to update their software, and to give users time to upgrade. This mapping is temporary and will be removed in a future version of feedparser. .. rubric:: Comes from * /atom03:feed/atom03:modified * /atom10:feed/atom10:updated * /rdf:RDF/rdf:channel/dc:date * /rdf:RDF/rdf:channel/dcterms:modified * /rss/channel/dc:date .. seealso:: * :ref:`reference.feed.updated` ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-feed.rst0000664000175000017500000000037014535121615017041 0ustar00kurtkurt:py:attr:`feed` =============== A dictionary of data about the feed. .. rubric:: Comes from * /atom03:feed * /atom10:feed * /rdf:RDF/rdf:channel * /rss/channel .. tip:: This element always exists, although it may be an empty dictionary. ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-headers.rst0000664000175000017500000000060314535121615017550 0ustar00kurtkurt:py:attr:`headers` ================== A dictionary of all the :abbr:`HTTP (Hypertext Transfer Protocol)` headers received from the web server when retrieving the feed. .. tip:: :py:attr:`headers` will only be present if the feed was retrieved from a web server. If the feed was parsed from a local file or from a string in memory, :py:attr:`headers` will not be present. 
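Since ``headers`` is absent for feeds parsed from a local file or an in-memory string, code that reads a header should tolerate the missing key. The following is a small sketch of one way to do that; ``get_header`` is a hypothetical helper, and the case-insensitive comparison is a defensive assumption on our part rather than a documented guarantee about how header names are stored.

```python
def get_header(result, name, default=None):
    """Look up one HTTP header on a parse result, tolerating its absence.

    ``result`` is assumed to be dict-like (anything with ``.get()``).
    """
    # No "headers" key at all means the feed did not come from a web server.
    headers = result.get("headers") or {}
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    return default
```

With a result ``d`` returned by ``feedparser.parse()``, ``get_header(d, "Content-Type")`` would return ``None`` for a locally parsed feed instead of raising an exception.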
././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-href.rst0000664000175000017500000000067614535121615017073 0ustar00kurtkurt:py:attr:`href` =============== The final :abbr:`URL (Uniform Resource Locator)` of the feed that was parsed. If the feed was redirected from the original requested address, :py:attr:`href` will contain the final (redirected) address. .. tip:: :py:attr:`href` will only be present if the feed was retrieved from a web server. If the feed was parsed from a local file or from a string in memory, :py:attr:`href` will not be present. ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-modified.rst0000664000175000017500000000107614535121615017722 0ustar00kurtkurt:py:attr:`modified` =================== The last-modified date of the feed, as specified in the :abbr:`HTTP (Hypertext Transfer Protocol)` headers. The purpose of :py:attr:`modified` is explained more fully in :ref:`http.etag`. .. tip:: :py:attr:`modified` will only be present if the feed was retrieved from a web server, and only if the web server provided a Last-Modified :abbr:`HTTP (Hypertext Transfer Protocol)` header for the feed. If the feed was parsed from a local file or from a string in memory, :py:attr:`modified` will not be present. ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-namespaces.rst0000664000175000017500000000112214535121615020251 0ustar00kurtkurt.. _reference.namespaces: :py:attr:`namespaces` ===================== A dictionary of all :abbr:`XML (Extensible Markup Language)` namespaces defined in the feed, as ``{prefix: namespaceURI}``. .. note:: The prefixes listed in the :py:attr:`namespaces` dictionary may not match the prefixes defined in the original feed. See :ref:`advanced.namespaces` for more details. .. 
tip:: This element always exists, although it may be an empty dictionary if the feed does not define any namespaces (such as an :abbr:`RSS (Rich Site Summary)` 2.0 feed with no extensions). ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-status.rst0000664000175000017500000000154414535121615017465 0ustar00kurtkurt:py:attr:`status` ================= The :abbr:`HTTP (Hypertext Transfer Protocol)` status code that was returned by the web server when the feed was fetched. If the feed was redirected from its original :abbr:`URL (Uniform Resource Locator)`, :py:attr:`status` will contain the redirect status code, not the final status code. If :py:attr:`status` is ``301``, the feed was permanently redirected to a new :abbr:`URL (Uniform Resource Locator)`. Clients should update their address book to request the new :abbr:`URL (Uniform Resource Locator)` from now on. If :py:attr:`status` is ``410``, the feed is gone. Clients should stop polling the feed. .. tip:: :py:attr:`status` will only be present if the feed was retrieved from a web server. If the feed was parsed from a local file or from a string in memory, :py:attr:`status` will not be present. ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/reference-version.rst0000664000175000017500000000370314535121615017626 0ustar00kurtkurt.. _reference.version: :py:attr:`version` ================== The format and version of the feed. 
Here is the complete list of known feed types and versions that may be returned in :py:attr:`version`: ============ ==================================================================================== ``atom`` Atom (unknown or unrecognized version) ``atom01`` `Atom 0.1 `_ ``atom02`` `Atom 0.2 `_ ``atom03`` `Atom 0.3 `_ ``atom10`` `Atom 1.0 `_ ``cdf`` `CDF `_ ``rss`` :abbr:`RSS (Rich Site Summary)` (unknown or unrecognized version) ``rss090`` `RSS 0.90 `_ ``rss091n`` `Netscape RSS 0.91 `_ ``rss091u`` `Userland RSS 0.91 `_ ``rss092`` `RSS 0.92 `_ ``rss093`` `RSS 0.93 `_ ``rss094`` :abbr:`RSS (Rich Site Summary)` 0.94 (no accurate specification is known to exist) ``rss10`` `RSS 1.0 `_ ``rss20`` `RSS 2.0 `_ ============ ==================================================================================== If the feed type is completely unknown, :py:attr:`version` will be an empty string. .. tip:: This element always exists, although it may be an empty string if the version can not be determined. .. seealso:: `The Myth of RSS compatibility `_ Mark Pilgrim's excellent analysis of the extraordinary variety of incompatibilities each version of "RSS" introduced. ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1679075946.0 feedparser-6.0.11/docs/reference.rst0000664000175000017500000000013414405125152016132 0ustar00kurtkurt.. _reference: Reference ######### .. toctree:: :maxdepth: 2 :glob: reference-* ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1702142861.0 feedparser-6.0.11/docs/resolving-relative-links.rst0000664000175000017500000002430614535121615021146 0ustar00kurtkurt.. _advanced.base: Relative Link Resolution ======================== Many feed elements and attributes are :abbr:`URI (Uniform Resource Identifier)`\s. :program:`Universal Feed Parser` resolves relative :abbr:`URI (Uniform Resource Identifier)`\s according to the `XML:Base `_ specification. 
We'll see how that works in a minute, but first let's talk about which values are treated as :abbr:`URI (Uniform Resource Identifier)`\s. Which Values Are :abbr:`URI (Uniform Resource Identifier)`\s ------------------------------------------------------------ These feed elements are treated as :abbr:`URI (Uniform Resource Identifier)`\s, and resolved if they are relative: * :ref:`reference.entry.author_detail.href` * :ref:`reference.entry.comments` * :ref:`reference.entry.contributors.href` * :ref:`reference.entry.enclosures.href` * :ref:`reference.entry.id` * :ref:`reference.entry.license` * :ref:`reference.entry.link` * :ref:`reference.entry.links.href` * :ref:`reference.entry.publisher_detail.href` * :ref:`reference.entry.source.author_detail.href` * :ref:`reference.entry.source.contributors.href` * :ref:`reference.entry.source.links.href` * :ref:`reference.feed.author_detail.href` * :ref:`reference.feed.contributors.href` * :ref:`reference.feed.docs` * :ref:`reference.feed.generator_detail.href` * :ref:`reference.feed.id` * :ref:`reference.feed.image.href` * :ref:`reference.feed.image.link` * :ref:`reference.feed.license` * :ref:`reference.feed.link` * :ref:`reference.feed.links.href` * :ref:`reference.feed.publisher_detail.href` * :ref:`reference.feed.textinput.link` In addition, several feed elements may contain :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML (Extensible HyperText Markup Language)` markup. Certain elements and attributes in :abbr:`HTML (HyperText Markup Language)` can be relative :abbr:`URI (Uniform Resource Identifier)`\s, and :program:`Universal Feed Parser` will resolve these :abbr:`URI (Uniform Resource Identifier)`\s according to the same rules as the feed elements listed above. These feed elements may contain :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML (Extensible HyperText Markup Language)` markup. 
In Atom feeds, whether these elements are treated as :abbr:`HTML (HyperText Markup Language)` depends on the value of the type attribute. In :abbr:`RSS (Rich Site Summary)` feeds, these values are always treated as :abbr:`HTML (HyperText Markup Language)`. * :ref:`reference.entry.content.value` * :ref:`reference.entry.summary` (:ref:`reference.entry.summary_detail.value`) * :ref:`reference.entry.title` (:ref:`reference.entry.title_detail.value`) * :ref:`reference.feed.info` (:ref:`reference.feed.info_detail.value`) * :ref:`reference.feed.rights` (:ref:`reference.feed.rights_detail.value`) * :ref:`reference.feed.subtitle` (:ref:`reference.feed.subtitle_detail.value`) * :ref:`reference.feed.title` (:ref:`reference.feed.title_detail.value`) When any of these feed elements contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML (Extensible HyperText Markup Language)` markup, the following :abbr:`HTML (HyperText Markup Language)` elements are treated as :abbr:`URI (Uniform Resource Identifier)`\s and are resolved if they are relative: * * * *