1. Create the middleware in ./project_name/middlewares.py. Import the base64 library; it is needed only if the proxy you use requires authentication.

    import base64

Define the middleware class:

    class ProxyMiddleware(object):
        # Override process_request so every outgoing request is routed through the proxy
        def process_request(self, request, spider):
            # Set the location of the proxy
            request.meta['proxy'] = "http://YOUR_PROXY_IP:PORT"

            # Use the following lines if your proxy requires authentication
            proxy_user_pass = "USERNAME:PASSWORD"
            # Set up basic authentication for the proxy. b64encode, unlike the
            # removed encodestring, does not append a newline to the header value.
            encoded_user_pass = base64.b64encode(proxy_user_pass.encode('utf-8'))
            request.headers['Proxy-Authorization'] = b'Basic ' + encoded_user_pass
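The authentication step can be tried on its own, without Scrapy. The following is a minimal sketch using only the standard library; `basic_proxy_auth_header` is a hypothetical helper name, and the credentials are the same placeholders used in the middleware.

```python
import base64


def basic_proxy_auth_header(username, password):
    """Build a Proxy-Authorization value for HTTP Basic authentication.

    Hypothetical helper for illustration; not part of Scrapy.
    """
    credentials = "{}:{}".format(username, password).encode("utf-8")
    # b64encode returns bytes with no trailing newline
    return b"Basic " + base64.b64encode(credentials)


header = basic_proxy_auth_header("USERNAME", "PASSWORD")
print(header)

# encodebytes (the successor of encodestring) ends its output with a
# newline, which does not belong inside an HTTP header value
legacy = base64.encodebytes(b"USERNAME:PASSWORD")
print(legacy.endswith(b"\n"))
```

The trailing newline is the reason to prefer `b64encode` over the `encodestring` family when building header values.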

This code snippet comes from: http://www.sharejs.com/codes/python/8309

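2. In the project settings file (./project_name/settings.py), add:

    DOWNLOADER_MIDDLEWARES = {
        # Scrapy releases before 1.0 used the path
        # scrapy.contrib.downloadermiddleware.httpproxy.HttpProxyMiddleware
        'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
        # The custom middleware has the lower number, so its process_request runs first
        'project_name.middlewares.ProxyMiddleware': 100,
    }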
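The numbers in this setting decide the order in which middlewares see a request: Scrapy calls `process_request` in increasing order of the values, and a value of `None` disables a middleware. The sketch below only imitates that ordering with plain Python so it can be run without Scrapy; `process_request_order` is a hypothetical helper, and the dictionary is shaped like the setting above.

```python
# Illustrative stand-in for Scrapy's ordering rule; not Scrapy's own code.
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
    'project_name.middlewares.ProxyMiddleware': 100,
}


def process_request_order(middlewares):
    """Return middleware paths in the order process_request would be called."""
    # Entries set to None are treated as disabled and skipped
    enabled = {path: prio for path, prio in middlewares.items() if prio is not None}
    # Lower numbers come first for process_request
    return sorted(enabled, key=enabled.get)


for path in process_request_order(DOWNLOADER_MIDDLEWARES):
    print(path)
```

With the values 100 and 110, the custom `ProxyMiddleware` sets `request.meta['proxy']` before the built-in `HttpProxyMiddleware` handles the request.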

With just these two steps, requests now go through the proxy. Let's test it:

    import scrapy

    class TestSpider(scrapy.Spider):
        name = "test"
        domain_name = "whatismyip.com"
        # The following URL is subject to change; you can get the latest one from here:
        # http://www.whatismyip.com/faq/automation.asp
        start_urls = ["http://xujian.info"]

        def parse(self, response):
            # Save the page fetched through the proxy so it can be inspected
            with open('test.html', 'wb') as f:
                f.write(response.body)
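To check that the proxy itself accepts connections, independently of Scrapy, a small standard-library script can help. This is only a sketch under stated assumptions: `build_proxy_opener` and `fetch_via_proxy` are hypothetical helper names, and the proxy address must be replaced with a real one before anything is fetched.

```python
import urllib.request


def build_proxy_opener(proxy_url):
    """Return an opener that routes HTTP and HTTPS requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    # Passing our own ProxyHandler replaces the default, environment-based one
    return urllib.request.build_opener(handler)


def fetch_via_proxy(url, proxy_url, timeout=10):
    """Fetch url through the proxy and return the raw response body as bytes."""
    opener = build_proxy_opener(proxy_url)
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

Calling `fetch_via_proxy("http://xujian.info", "http://YOUR_PROXY_IP:PORT")` with a working proxy address should return a page body comparable to what the spider writes to test.html; a connection error at this stage points at the proxy rather than at the Scrapy configuration.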

Using a proxy for web scraping with Python Scrapy
