Link Grabber
Link Grabber provides a quick and easy way to grab links from a single web page. This Python package is a simple wrapper around BeautifulSoup that focuses on HTML's hyperlink tag, <a>.
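To illustrate the idea Link Grabber builds on, here is a minimal sketch that collects the attributes and text of every <a> tag on a page. It swaps BeautifulSoup for Python's standard-library html.parser so it runs without extra installs. The names AnchorCollector and grab_links are hypothetical and are not part of linkGrabber.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Hypothetical helper: collect attributes and text of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []       # one dict per <a> tag, in document order
        self._current = None  # the <a> tag currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs arrives as a list of (name, value) pairs
            self._current = dict(attrs)
            self._current["text"] = ""

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than assign
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.links.append(self._current)
            self._current = None


def grab_links(html):
    """Hypothetical helper: return a list of dicts, one per <a> tag."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = '<p><a href="/a" class="x">First</a> and <a href="/b">Second</a></p>'
    for link in grab_links(page):
        print(link)
```

One difference worth noting: html.parser returns class as a plain string, whereas BeautifulSoup returns multi-valued attributes such as class as a list (compare 'class': ['gb1'] in the sample output), which is one reason a BeautifulSoup wrapper is convenient.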
Documentation
find
- Parameters:
- filters (dict): Beautiful Soup's filters as a dictionary
- limit (int): Limit the number of links in sequential order
- reverse (bool): Reverses the order in which the list of <a> tags is sorted
- sort (function): A function that receives each link and returns the key to sort on
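The limit, reverse and sort parameters can be pictured as post-processing on a list of link dicts. The sketch below is hypothetical: select_links is not the library's code, and the order in which it applies the steps (sort, then reverse, then limit) is an assumption.

```python
def select_links(links, limit=None, reverse=False, sort=None):
    """Hypothetical sketch: order and trim a list of link dicts.

    Assumed order of operations: sort, then reverse, then limit.
    """
    result = list(links)  # copy so the caller's list is left untouched
    if sort is not None:
        result.sort(key=sort)  # sort receives each link dict and returns a key
    if reverse:
        result.reverse()
    if limit is not None:
        result = result[:limit]  # keep the first `limit` links
    return result


links = [
    {"href": "/b", "text": "Beta"},
    {"href": "/a", "text": "Alpha"},
    {"href": "/c", "text": "Gamma"},
]
print(select_links(links, limit=2, sort=lambda key: key["text"]))
```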
Sort by a link's attribute:
from linkGrabber import Links
links = Links("http://www.google.com")
links.find(limit=3, sort=lambda key: key['text'])
Exclude text:
import re
from linkGrabber import Links
links = Links("http://www.google.com")
links.find(exclude=[{ "text": re.compile("Read More") }])
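The exclude argument takes a list of dicts whose values are compiled regular expressions. One plausible way such rules could be matched is sketched below; the helpers is_excluded and apply_exclude are hypothetical, and the rule semantics (a rule matches when every one of its patterns is found in the corresponding link value) are an assumption, not documented linkGrabber behavior.

```python
import re


def is_excluded(link, rules):
    """Hypothetical sketch: True if any exclude rule matches the link.

    Assumed semantics: each rule maps a link key (e.g. "text") to a
    compiled regex, and matches when every pattern is found in its value.
    """
    for rule in rules:
        if all(
            key in link and pattern.search(str(link[key]))
            for key, pattern in rule.items()
        ):
            return True
    return False


def apply_exclude(links, rules):
    """Hypothetical helper: drop every link matched by an exclude rule."""
    return [link for link in links if not is_excluded(link, rules)]


links = [
    {"href": "/post-1", "text": "Read More"},
    {"href": "/about", "text": "About us"},
]
print(apply_exclude(links, [{"text": re.compile("Read More")}]))
```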
Remove duplicate URLs and make the output pretty:
from linkGrabber import Links
links = Links("http://www.google.com")
links.find(duplicates=False, pretty=True)
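Removing duplicates and pretty-printing can be sketched in plain Python. This assumes "duplicate" means a repeated href value and that pretty output resembles the standard library's pprint, which the sample output's layout suggests; drop_duplicate_urls is a hypothetical helper, not the library's implementation.

```python
from pprint import pprint


def drop_duplicate_urls(links):
    """Hypothetical sketch: keep the first link for each href, in order."""
    seen = set()
    unique = []
    for link in links:
        href = link.get("href")
        if href is None:
            unique.append(link)  # anchors without an href are never treated as duplicates
            continue
        if href in seen:
            continue  # a later repeat of an href already kept
        seen.add(href)
        unique.append(link)
    return unique


links = [
    {"href": "/a", "text": "Home"},
    {"href": "/b", "text": "Blog"},
    {"href": "/a", "text": "Home (footer)"},
]
# pprint spreads nested structures over several indented lines
pprint(drop_duplicate_urls(links), indent=2)
```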
The code works, but the results depend on the connection to the website. Sample output (truncated):
'href': 'http://www.google.lt/imghp?hl=lt&tab=wi',
u'seo': 'imghp?hl=lt&tab=wi',
u'text': u'Vaizdai'},
{ 'class': ['gb1'],
'href': 'http://maps.google.lt/maps?hl=lt&tab=wl',
u'seo': 'maps?hl=lt&tab=wl',
u'text': u'\u017dem\u0117lapiai'},
...
{ 'href': '/intl/lt/policies/privacy/', u'seo': '', u'text': u'Privatumas'},
{ 'href': '/intl/lt/policies/terms/', u'seo': '', u'text': u'S\u0105lygos'}]
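Judging only from the sample output, the seo value appears to be whatever follows the last "/" in the href. This is an inference from four entries, not documented behavior, and seo_slug below is a hypothetical helper that reproduces it.

```python
def seo_slug(href):
    """Hypothetical sketch: return the text after the last "/" in an href.

    Inferred from sample output only; an href whose query string
    contains "/" would not be handled the same way.
    """
    return href.rsplit("/", 1)[-1]


samples = [
    "http://www.google.lt/imghp?hl=lt&tab=wi",
    "http://maps.google.lt/maps?hl=lt&tab=wl",
    "/intl/lt/policies/privacy/",
]
for href in samples:
    print(repr(seo_slug(href)))
```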