I found this code on Python For Beginners (www.pythonforbeginners.com).
In this script, we are going to use the re module to get all links from any website.
One of the most powerful functions in the re module is re.findall().
While re.search() is used to find the first match for a pattern, re.findall() finds *all*
the matches and returns them as a list — a list of strings, or of tuples when the pattern
contains capturing groups, with each entry representing one match.
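For example, here is a minimal comparison of the two (the HTML snippet and pattern below are my own illustration, not from the original script):

```python
import re

html = '<a href="https://example.com/a">A</a> <a href="http://example.com/b">B</a>'

# re.search stops at the first match
first = re.search(r'https?://[^"]+', html)
print(first.group())        # https://example.com/a

# re.findall keeps going and returns every match in a list
all_matches = re.findall(r'https?://[^"]+', html)
print(all_matches)          # ['https://example.com/a', 'http://example.com/b']
```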
Get all links from a website
This example will get all the links from any website's HTML code.
To find all the links, we will in this example use the urllib2 module together
with the re module.
import urllib2
import re

#the URL we want to scrape (replace with any site you like)
url = "http://www.example.com"

#connect to a URL
website = urllib2.urlopen(url)

#read html code
html = website.read()

#use re.findall to get all the links
links = re.findall('"((http|ftp)s?://.*?)"', html)

print links
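Notice that the pattern contains a capturing group, (http|ftp), so re.findall returns tuples of (full URL, scheme) rather than plain strings — you can see this in the output further down. A quick sketch on an inline HTML string (my own example; no network needed):

```python
import re

# Inline HTML standing in for a fetched page (illustrative only)
html = '<a href="http://example.com/page">x</a> <img src="ftp://files.example.com/f.zip">'

# The (http|ftp) capturing group makes findall return (url, scheme) tuples
links = re.findall('"((http|ftp)s?://.*?)"', html)
print(links)   # [('http://example.com/page', 'http'), ('ftp://files.example.com/f.zip', 'ftp')]

# Use a non-capturing group (?:...) if you only want the URLs themselves
plain = re.findall('"((?:http|ftp)s?://.*?)"', html)
print(plain)   # ['http://example.com/page', 'ftp://files.example.com/f.zip']
```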
You can read more about the urllib2 library and its syntax in the HOWTO "Fetch Internet Resources Using urllib2".
This code works; I tested it. Below are URLs from this blog extracted by the code:
[('http://learningpythontobuildbusiness.blogspot.com/feeds/posts/default', 'http'),
 ('http://learningpythontobuildbusiness.blogspot.com/feeds/posts/default?alt=rss', 'http'),
 ('https://www.blogger.com/feeds/7427797381533068430/posts/default', 'http'),
 ...
 ('https://www.blogger.com/static/v1/widgets/1929302928-widgets.js', 'http')]
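One side note: urllib2 only exists on Python 2. On Python 3 the same idea can be sketched with urllib.request — this version is my own adaptation, not from the original post:

```python
import re
from urllib.request import urlopen  # Python 3 replacement for urllib2

def extract_links(html):
    """Return every quoted http/ftp(s) URL in an HTML string."""
    return re.findall(r'"((?:http|ftp)s?://.*?)"', html)

# Works on any HTML string:
sample = '<a href="https://example.com/start">start</a>'
print(extract_links(sample))   # ['https://example.com/start']

# To scrape a live page (requires network access):
#   html = urlopen("http://www.example.com").read().decode("utf-8", errors="replace")
#   print(extract_links(html))
```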