Utilities¶
The utilities package and its modules provide functions and helpers for transforming the scraped data.
- tofeed.utilities.shorten_to_title(string, length, separator=' ', appendix='...')¶
Use the string to create a human readable title.
The function searches for the last occurrence of the separator string within the range of the string limited by the length parameter and appends the appendix.
Parameters: - string (str) – The string to create the title out of
- length (int) – The approximate length of the title. The length is only approximate to this value, because it’s possible that the next separator string encountered may lie further ahead.
- separator (str) – The character that separates words in the text. Usually of course a space, this should only have to be changed in very rare cases.
- appendix (str) – The string to append to the end of the title
Returns: The title created from the string
Spoon¶
Helper functions for modifying BeautifulSoup objects.
- tofeed.utilities.spoon.absolutize_references(base_url, tag, attributes=['href', 'src'], recursive=True)¶
Turns references found within the tag’s attributes absolute.
Parameters: - base_url (str) – The base URL used to absolutize the references
- tag (bs4.element.Tag) – The tag to absolutize references in
- attributes (list) – The attributes containing the URLs that should be made absolute
- recursive (bool) – If true the tag and all its sub tags will be searched, else only the tag and its direct descendants will be searched.
- tofeed.utilities.spoon.collapse_tag(tag)¶
Replaces the tag’s descendants with their strings.
Parameters: tag (bs4.element.Tag) – The tag to collapse.
- tofeed.utilities.spoon.convert_newlines(tag, recursive=True)¶
Replaces newline characters found in the tag’s strings with line break tags.
Parameters: - tag (bs4.element.Tag) – The tag to convert newline characters in.
- recursive (bool) – If true the tag and all its sub tags will be searched, else only the tag and its direct descendants will be searched.
- tofeed.utilities.spoon.new_string = <bound method BeautifulSoup.new_string of >¶
Shortcut for BeautifulSoup.new_string(). This allows using the bound methods of the BeautifulSoup class without having to instantiate it manually.
- tofeed.utilities.spoon.new_tag = <bound method BeautifulSoup.new_tag of >¶
Shortcut for BeautifulSoup.new_tag(). This allows using the bound methods of the BeautifulSoup class without having to instantiate it manually.
- tofeed.utilities.spoon.replace_string_with_tag(tag, string, replacement, recursive=True)¶
Replaces all occurrences of string within the tag’s strings with the replacement tag.
Parameters: - tag (bs4.element.Tag) – The tag to replace strings in
- string (str) – The string to replace
- bs4.element.Tag replacement (str) – The tag replacing the string
- recursive (bool) – If true the tag and all its sub tags will be searched, else only the tag and its direct descendants will be searched.