Python - URL errors in Beautiful Soup

I am trying to obtain the data-pid and price from Craigslist using BeautifulSoup. I have written separate code that gives me the file clallsites.txt. In this code I am trying to grab each of the sites from that txt file and get the PIDs of all the entries in the first 10 pages. My code is:

    from bs4 import BeautifulSoup
    from urllib2 import urlopen

    readfile = open("clallsites.txt")
    product = "mcy"
    while 1:
        u = ""
        count = 0
        line = readfile.readline()
        commaposition = line.find(',')
        site = line[0:commaposition]
        location = line[commaposition+1:]
        site_filename = location + '.txt'
        f = open(site_filename, "a")
        while (count < 10):
            sitenow = site + "\\" + product + "\\" + str(u)
            html = urlopen(str(sitenow))
            soup = BeautifulSoup(html)
            postings = soup('p',{"class":"row"})
            for post in postings:
                y = post['data-pid']
                print y
            count = count +1
            index = count*100
            u = "index" + str(index) + ".html"
        if not line:
            break
        pass

My clallsites.txt looks like this:

Craigslist site, location (Stack Overflow does not allow posting Craigslist links, so I cannot show the actual text; I can try to attach the text file if that helps.)

When I run the code, I get the following error:

    Traceback (most recent call last):
      File "reading.py", line 16, in <module>
        html = urlopen(str(sitenow))
      File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
        return _opener.open(url, data, timeout)
      File "/usr/lib/python2.7/urllib2.py", line 400, in open
        response = self._open(req, data)
      File "/usr/lib/python2.7/urllib2.py", line 418, in _open
        '_open', req)
      File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
        result = func(*args)
      File "/usr/lib/python2.7/urllib2.py", line 1207, in http_open
        return self.do_open(httplib.HTTPConnection, req)
      File "/usr/lib/python2.7/urllib2.py", line 1177, in do_open
        raise URLError(err)
    urllib2.URLError:

Any ideas what I am doing wrong?

I don't know the content of sitenow, but it looks like an invalid URL. Note that URLs use slashes, not backslashes (so the statement should be similar to sitenow = site + "/" + product + "/" + str(u)).
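To make the slash fix concrete, here is a minimal sketch of how the page URLs could be built. The helper names parse_site_line and page_urls are hypothetical, http://example.org is only a placeholder host (the real entries in clallsites.txt are not shown in the question), and the "index100.html" paging pattern is taken from the question's code rather than verified against the site. It also assumes each line has the form "site, location", and strips the trailing newline so it does not leak into the URL or the output file name.

```python
def parse_site_line(line):
    """Split one 'site, location' line; strip spaces and the trailing newline."""
    site, _, location = line.partition(",")
    return site.strip(), location.strip()


def page_urls(site, product, pages=10):
    """Build the listing-page URLs for one site, joined with forward slashes."""
    base = site.rstrip("/")  # avoid a doubled slash if the site ends with one
    urls = []
    for count in range(pages):
        # First page has no suffix; later pages use index100.html, index200.html, ...
        suffix = "" if count == 0 else "index%d.html" % (count * 100)
        urls.append("%s/%s/%s" % (base, product, suffix))
    return urls


# Placeholder input line, standing in for one entry of clallsites.txt.
site, location = parse_site_line("http://example.org, somewhere\n")
urls = page_urls(site, "mcy", pages=3)
for url in urls:
    print(url)
```

Printing each URL before passing it to urlopen is also a quick way to see exactly which address triggers the URLError.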
