Using Proxy IPs in Python Web Crawlers

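Some websites limit how often a single IP address may send requests. To avoid having your IP banned while crawling, you can route requests through a proxy IP. This article shows how to do that with three common Python tools: requests, urllib.request, and scrapy.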

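Sending requests with the requests module

With the requests module, you set the proxy through the proxies parameter, a dict that maps each URL scheme to a proxy address. Here is an example: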

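import requests

url = 'http://www.example.com'
# Map each URL scheme to the proxy that should carry it.
# 127.0.0.1:8888 is a placeholder; substitute your own proxy address.
proxies = {
    'http': 'http://127.0.0.1:8888',
    'https': 'http://127.0.0.1:8888'
}
# A timeout keeps the crawler from hanging on an unresponsive proxy.
response = requests.get(url, proxies=proxies, timeout=10)
print(response.text)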

In the example above, we send a GET request with the requests module, route it through the proxy, and print the response body as text.
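A single fixed proxy can itself be rate-limited or go offline. As a minimal sketch, assuming you have several proxies available, the code below tries them in random order until one succeeds. The names `PROXY_POOL`, `build_proxies`, and `fetch_with_rotation` are hypothetical and not part of requests, and the addresses are placeholders.

```python
import random

import requests

# Hypothetical proxy pool; replace with proxies you actually control.
PROXY_POOL = [
    'http://127.0.0.1:8888',
    'http://127.0.0.1:8889',
]


def build_proxies(proxy_url):
    """Return the mapping that the requests `proxies` parameter expects."""
    return {'http': proxy_url, 'https': proxy_url}


def fetch_with_rotation(url, pool=PROXY_POOL, timeout=10, session=None):
    """Try each proxy in the pool, in random order, until one succeeds."""
    http = session or requests  # a requests.Session may be passed in
    last_error = None
    for proxy_url in random.sample(pool, len(pool)):
        try:
            response = http.get(
                url, proxies=build_proxies(proxy_url), timeout=timeout
            )
            response.raise_for_status()  # treat HTTP 4xx/5xx as failures
            return response
        except requests.RequestException as exc:
            last_error = exc  # remember the failure and try the next proxy
    raise RuntimeError(f'all proxies failed for {url}') from last_error
```

Calling `fetch_with_rotation('http://www.example.com')` returns the first successful response, or raises `RuntimeError` once every proxy in the pool has failed.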

Sending requests with the urllib.request module

With the urllib.request module, you set the proxy through a ProxyHandler object and build an opener from it. Here is an example:

from urllib import request

url = 'http://www.example.com'
# ProxyHandler takes a dict mapping URL schemes to proxy addresses.
proxy_handler = request.ProxyHandler({'http': 'http://127.0.0.1:8888', 'https': 'http://127.0.0.1:8888'})
# Requests made through this opener go via the proxy.
opener = request.build_opener(proxy_handler)
response = opener.open(url, timeout=10)
print(response.read().decode('utf-8'))

In the example above, we send a GET request with the urllib.request module, route it through the proxy, and print the response body as text.
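Proxies fail often, so it can help to wrap the opener in a small helper that handles connection errors. The following is a minimal sketch using only the standard library. `build_proxy_opener` and `fetch` are hypothetical helper names, and the proxy address is a placeholder.

```python
from urllib import error, request


def build_proxy_opener(proxy_url):
    """Build an opener that sends http and https traffic through proxy_url."""
    handler = request.ProxyHandler({'http': proxy_url, 'https': proxy_url})
    return request.build_opener(handler)


def fetch(url, opener, timeout=10):
    """Return the decoded response body, or None if the request fails."""
    try:
        with opener.open(url, timeout=timeout) as response:
            # Fall back to UTF-8 when the server does not declare a charset.
            charset = response.headers.get_content_charset() or 'utf-8'
            return response.read().decode(charset)
    except error.URLError as exc:
        # URLError covers refused connections and, via HTTPError, HTTP errors.
        print(f'request through proxy failed: {exc}')
        return None


opener = build_proxy_opener('http://127.0.0.1:8888')  # placeholder address
```

If every later call to `request.urlopen` should also use the proxy, `request.install_opener(opener)` makes this opener the process-wide default.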

Sending requests with the scrapy framework

With the scrapy framework, you set the proxy by enabling a proxy middleware in the DOWNLOADER_MIDDLEWARES setting. Here is an example:

import scrapy

class ExampleSpider(scrapy.Spider):
    name = 'example'
    start_urls = ['http://www.example.com']

    custom_settings = {
        'DOWNLOADER_MIDDLEWARES': {
            # Built-in middleware that applies request.meta['proxy'].
            'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 543,
            # Project-specific middleware that picks the proxy. The lower
            # number means its process_request runs first.
            'example.middlewares.ProxyMiddleware': 125,
        },
        # RetryMiddleware stays enabled (the default) so RETRY_TIMES applies.
        'RETRY_TIMES': 10,
        'DOWNLOAD_TIMEOUT': 30,
    }

    def parse(self, response):
        print(response.text)

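In the example above, the spider sends a GET request to each URL in start_urls and prints the response body as text. The proxy itself is chosen by example.middlewares.ProxyMiddleware, a project-specific class that is not shown here. Such a class would typically set request.meta['proxy'], which the built-in HttpProxyMiddleware then applies.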
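The settings above refer to `example.middlewares.ProxyMiddleware` without showing it. As a minimal sketch of what such a class might look like (an assumption, not the original author's code), a downloader middleware can assign a proxy in `process_request` by writing to `request.meta['proxy']`. The `PROXIES` list is hypothetical and its address is a placeholder.

```python
import random


class ProxyMiddleware:
    """Downloader middleware that assigns a proxy to each outgoing request."""

    # Hypothetical pool; replace with proxies you actually control.
    PROXIES = ['http://127.0.0.1:8888']

    def process_request(self, request, spider):
        # Keep a proxy that was already set on the request; otherwise pick
        # one at random. Scrapy reads the proxy from request.meta['proxy'].
        request.meta.setdefault('proxy', random.choice(self.PROXIES))
        # Returning None tells Scrapy to keep processing the request.
        return None
```

With the project layout implied by the settings, this class would live in `example/middlewares.py`. Using `setdefault` leaves individual requests free to override the proxy through their own `meta`.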

These are three ways to use a proxy IP in a Python crawler. We hope you find them helpful.

Unless otherwise noted, articles on this site are original. If you repost this article, please credit the source: "Python爬虫使用代理IP的实现" - Python技术站
