Reference
- readability.browser.open_in_browser(html)
Saves the HTML document to a temporary file and opens that file in a web browser. The temporary file is not deleted after use. This is mainly meant for debugging.
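The behaviour described above can be approximated with the standard library alone. The sketch below is not the library's implementation, and the helper names `save_html_to_temp_file` and `preview` are hypothetical. It shows why the temporary file outlives the call and how a caller might remove it afterwards.

```python
import os
import pathlib
import tempfile
import webbrowser


def save_html_to_temp_file(html):
    """Write html to a new temporary .html file and return its path.

    The file is deliberately left on disk so a browser can load it later.
    """
    handle, path = tempfile.mkstemp(suffix=".html")
    with os.fdopen(handle, "w", encoding="utf-8") as f:
        f.write(html)
    return path


def preview(html):
    """Open html in the default browser and return the temporary file path."""
    path = save_html_to_temp_file(html)
    # mkstemp returns an absolute path, so as_uri() yields a valid file:// URL.
    webbrowser.open(pathlib.Path(path).as_uri())
    return path
```

Because nothing deletes the file, repeated debugging runs leave files behind in the temporary directory. Calling `os.remove(path)` once the browser has loaded the page cleans up.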
- readability.encoding.fix_charset(encoding)
Overrides the encoding when the declared or detected charset is a subset of a larger charset. Created because of issues with Chinese websites.
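To illustrate the idea of swapping a charset for a larger one that contains it, here is a minimal sketch. The mapping and the name `widen_charset` are assumptions made for illustration, not the library's actual table. The single entry relies on GB18030 being a superset of GB2312.

```python
# Hypothetical mapping from a narrow charset to a superset of it.
SUPERSETS = {
    "gb2312": "gb18030",  # GB18030 covers every GB2312 character and more
}


def widen_charset(encoding):
    """Return a known superset charset for encoding, else encoding itself."""
    encoding = encoding.lower()
    return SUPERSETS.get(encoding, encoding)


# A page may declare gb2312 yet contain characters outside that set;
# decoding with the superset handles both cases.
raw = "朱镕基".encode("gb18030")
text = raw.decode(widen_charset("GB2312"))
```

Unknown names pass through unchanged (only lowercased), so the helper is safe to apply to any declared or detected encoding.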
- class readability.readability.Document(input, positive_keywords=None, negative_keywords=None, url=None, min_text_length=25, retry_length=250, xpath=False, handle_failures='discard')
Bases: object
Class to build an etree document out of HTML.
- author()
Returns the document author.
- content()
Returns the document body.
- get_clean_html()
An internal method that subclasses can override, for example to disable or improve the DOM-to-text conversion performed by the .summary() method.
- short_title()
Returns the cleaned-up document title.
- summary(html_partial=False, keep_all_images=False)
Given an HTML file, extracts the text of the article.
- Parameters:
html_partial – return only the div of the document, without wrapping it in html and body tags.
keep_all_images – keep all images in the summary.
Warning: This method mutates the internal DOM representation of the HTML document, so it is better to call other API methods before this one.
- title()
Returns the document title.
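A minimal usage sketch of the Document methods listed above, following the warning on summary() by reading the titles first. It assumes the package is installed (it is distributed on PyPI as readability-lxml). The sample HTML and the helper name `extract` are made up for illustration.

```python
try:
    from readability.readability import Document
except ImportError:  # package not installed
    Document = None

SAMPLE_HTML = """
<html>
  <head><title>Example article</title></head>
  <body>
    <div id="content">
      <p>This first paragraph holds plenty of readable text, with commas,
      clauses, and enough words that an extractor should treat it as part
      of the main article rather than as navigation or boilerplate.</p>
      <p>A second paragraph continues the story, adding more sentences,
      more commas, and more length, so that the extracted summary is
      comfortably longer than the default retry threshold.</p>
    </div>
  </body>
</html>
"""


def extract(html):
    """Return (title, short_title, article_html) for an HTML string."""
    if Document is None:
        raise RuntimeError("readability-lxml is not installed")
    doc = Document(html)
    # Read the titles first: summary() mutates the internal DOM.
    title = doc.title()
    short_title = doc.short_title()
    # html_partial=True asks for the article div only, without html/body tags.
    article_html = doc.summary(html_partial=True)
    return title, short_title, article_html


if Document is not None:
    title, short_title, article_html = extract(SAMPLE_HTML)
    print(title)
    print(article_html)
```

Calling `summary()` last keeps the example independent of whatever changes it makes to the parsed tree.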
- exception readability.readability.Unparseable
Bases: ValueError
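Because Unparseable derives from ValueError, callers can catch either one. The sketch below assumes, which this reference does not state, that summary() is the call that raises Unparseable when extraction fails. `safe_summary` is a hypothetical helper, and the package is assumed to be installed.

```python
try:
    from readability.readability import Document, Unparseable
except ImportError:  # package not installed
    Document = Unparseable = None


def safe_summary(html):
    """Return the article HTML, or None if the input cannot be parsed."""
    if Document is None:
        raise RuntimeError("readability-lxml is not installed")
    try:
        return Document(html).summary()
    except Unparseable:
        # `except ValueError` would catch this too, since it is the base class.
        return None
```

Returning None keeps a batch job running when one page is malformed. Code that needs to know why extraction failed should log or re-raise the exception instead.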