wolfmat
Confirmed Asshole
So you're writing Python there.
How do I check if an element is already in a list/array?
I have the following code:
Code:
for item in page_info:
    if len(item.contents) > 3:
        a_element = item.contents[3].find("a")
        if a_element:
            link = a_element.get('href')
            if "/places" in link:
                full_url.append(url + link)
I'm scraping a page for URLs and storing each one in "link". The hrefs coming from this part look like "/places/blahblahblah" or "/events/blahblahblah". The last two lines of the code make sure it only grabs the "/places" URLs.
The problem is that there are duplicate URLs on each page. How do I check whether a URL has already been scraped before adding it to the full_url list?
I would appreciate any help/direction.
Sets guarantee that each item is unique.
If you add() an element that is already in the set, nothing happens.
Code:
url_set = set()
for item in page_info:
    if len(item.contents) > 3:
        a_element = item.contents[3].find("a")
        if a_element:
            link = a_element.get('href')
            if "/places" in link:
                url_set.add(url + link)
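To see the no-op behaviour of add() without the scraping parts, here is a minimal sketch. The base URL and the hrefs list are made-up stand-ins for what the scraper would produce, and the `None` entry models an `<a>` tag without an href, which would otherwise make the `in` test raise a TypeError:

```python
# Hypothetical stand-ins for the scraped data: a base URL and the href values
# pulled from each <a> element (None models an <a> tag without an href).
url = "https://example.com"
hrefs = ["/places/a", "/events/x", "/places/b", "/places/a", None]

url_set = set()
for link in hrefs:
    # Skip missing hrefs; `"/places" in None` would raise a TypeError.
    if link and "/places" in link:
        url_set.add(url + link)  # adding a duplicate leaves the set unchanged

print(len(url_set))
```

The second "/places/a" is silently ignored, so only the two distinct "/places" URLs end up in url_set.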
Edit: You can also keep what you have and then use a set to make the list unique once the for-loop is through:
Code:
full_url = list(set(full_url))
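Edit: Note that sets do not guarantee order preservation, neither mathematically nor in Python: iterating over a Python set can return the elements in any order, so list(set(full_url)) may shuffle your URLs. If the original order matters, dict keys _do_ preserve insertion order (Python 3.7+), so list(dict.fromkeys(full_url)) removes duplicates while keeping the first occurrence of each URL in place. Just so you know.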
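If the order in which the links were scraped matters, one option is to deduplicate through dict keys, which keep insertion order in Python 3.7 and later. A minimal sketch, with made-up sample URLs in place of real scraped data:

```python
# Made-up sample data standing in for the scraped list, duplicates included.
full_url = [
    "https://example.com/places/b",
    "https://example.com/places/a",
    "https://example.com/places/b",
]

# dict.fromkeys() keeps one key per distinct URL, in first-seen order.
unique_urls = list(dict.fromkeys(full_url))
print(unique_urls)
```

Each URL appears once in unique_urls, in the position where it was first seen.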
Edit: Also note that you of course _could_ scan the array each time and append() conditionally:
Code:
if (url+link) not in full_url:
    full_url.append(url+link)
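Sets internally operate on hashes if I'm not mistaken (similar to dictionaries), so adding an element or checking membership takes constant time on average. The "not in" check on a list, on the other hand, is a linear scan, so deduplicating that way gets slow as full_url grows.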
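Since membership tests on a set take roughly constant time on average while `in` on a list scans the whole list, a common pattern is to keep a set next to the list: the set answers "seen before?" and the list keeps the scrape order. A rough sketch, where unique_in_order is a hypothetical helper name and the input is made up:

```python
def unique_in_order(links):
    """Return links without duplicates, keeping first-seen order."""
    seen = set()   # fast "have we seen this?" lookups
    result = []    # preserves the order in which links first appeared
    for link in links:
        if link not in seen:
            seen.add(link)
            result.append(link)
    return result


print(unique_in_order(["/places/a", "/places/b", "/places/a", "/places/c"]))
```

The same two-container idea works inside the scraping loop itself: check the set before calling full_url.append() and add to both.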