Python多线程抓取Google搜索链接网页

1）urllib2+BeautifulSoup抓取Goolge搜索链接

近期，参加的项目需要对Google搜索功效举办处理惩罚，之前进修了Python处理惩罚网页相关的东西。实际应用中，利用了urllib2和beautifulsoup来举办网页的抓取，可是在抓取google搜索功效的时候，发明假如是直接对google搜索功效页面的源代码举办处理惩罚，会获得许多“脏”链接。

看下图为搜索“titanic james”的功效：

QQ截图20130407145449

图中赤色标志的是不需要的，蓝色标志的是需要抓取处理惩罚的。

这种“脏链接”虽然可以通过法则过滤的要领来过滤掉，可是这样措施的巨大度就高了。合法本身没精打彩的正在写过滤法则时。同学提醒说google应该提供相关的api，才恍然大大白。

（2）Google Web Search API+多线程

文档中给出利用Python举办搜索的例子：

import simplejson 
# The request also includes the userip parameter which provides the end

# user's IP address. Doing so will help distinguish this legitimate

# server-side traffic from traffic which doesn't come from an end-user.

url = ('https://ajax.googleapis.com/ajax/services/search/web'

       '?v=1.0&q=Paris%20Hilton&userip=USERS-IP-ADDRESS') 
request = urllib2.Request(

    url, None, {'Referer': /* Enter the URL of your site here *

当前位置：以往代写 > Python教程 >Python多线程抓取Google搜索链接网页

Python多线程抓取Google搜索链接网页

Python多线程抓取Google搜索链接网页

Python多线程抓取Google搜索链接网页

在线提交作业

当前位置：以往代写 > Python教程 >Python多线程抓取Google搜索链接网页

Python多线程抓取Google搜索链接网页

关键字：

在线提交作业