A Summary of Common Python Web Scraping Techniques

yizhihongxing

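This guide covers common web scraping techniques in Python: sending HTTP requests with requests, parsing HTML documents with BeautifulSoup, extracting data with regular expressions, simulating browser behavior with Selenium, and using proxy IPs and User-Agent headers. Two examples at the end show how to apply these techniques to scrape web page data.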

Step 1: Install the required libraries

Before we begin, we need to install the required libraries. The following command installs them:

pip install requests beautifulsoup4 selenium
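
After installing, it can help to confirm which versions are present, because the Selenium API in particular differs between major versions. The following is a minimal sketch using only the standard library (Python 3.8 or later); the names PACKAGES and installed_versions are hypothetical helpers introduced here for illustration.

```python
from importlib import metadata

# Distribution names as published on PyPI (the same names passed to pip).
PACKAGES = ("requests", "beautifulsoup4", "selenium")


def installed_versions(packages=PACKAGES):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in the current environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Running the script prints one line per package; a package reported as not installed means the pip command above did not run in the same environment as the interpreter.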

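Step 2: Send HTTP requests with requests

requests is one of the most commonly used HTTP libraries for Python. It offers a simple API for sending HTTP requests and reading the responses. We can use it as follows:

  1. Import the requests library.
import requests
  2. Send an HTTP request and read the response data.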
url = 'http://example.com'
response = requests.get(url)
html = response.text

In the code above, we define a URL, send a GET request with requests.get(), and read the HTML from response.text, which is the response body decoded to a string.
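
The basic call above sets no timeout and does not check the HTTP status code. The following is a minimal sketch of a more defensive version, not the only way to do it: fetch_html is a hypothetical helper name, and the 10-second timeout is an arbitrary assumption.

```python
import requests


def fetch_html(url, timeout=10.0):
    """Fetch a page and return its decoded HTML, raising on HTTP errors."""
    # Without a timeout, requests may wait indefinitely for a stalled server.
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx instead of parsing an error page.
    response.raise_for_status()
    # When a text response declares no charset, requests falls back to
    # ISO-8859-1; apparent_encoding guesses the encoding from the body instead.
    if response.encoding is None or response.encoding.lower() == "iso-8859-1":
        response.encoding = response.apparent_encoding
    return response.text
```

Calling fetch_html('http://example.com') returns the page HTML as a string. Network failures and HTTP errors surface as subclasses of requests.RequestException, so a caller can catch that one exception type.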

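Step 3: Parse HTML documents with BeautifulSoup

BeautifulSoup is one of the most commonly used HTML parsing libraries for Python. It parses an HTML document into a tree of Python objects and offers a simple API for extracting data from it. We can use it as follows:

  1. Import BeautifulSoup.
from bs4 import BeautifulSoup
  2. Parse the HTML document into a Python object.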
soup = BeautifulSoup(html, 'html.parser')

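In the code above, we parse the HTML document into a BeautifulSoup object. We pass 'html.parser', the parser from Python's standard library, so no extra package is needed.

  3. Extract data.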
title = soup.title.text

In the code above, the text attribute returns the text content of the <title> tag. If the page has no <title> tag, soup.title is None and soup.title.text raises an AttributeError.

Step 4: Extract data with regular expressions

Regular expressions are a powerful text-processing tool for matching and extracting data from text. We can use them as follows:

  1. Import the re module.
import re
  2. Write the regular expression.
pattern = r'<title>(.*?)</title>'

In the code above, we define a regular expression that captures the text between <title> and </title>. The non-greedy (.*?) stops at the first closing tag. Because . does not match newlines by default, pass re.DOTALL as the flags argument if the title may span several lines.

  3. Use the re module to match and extract the data.
match = re.search(pattern, html)
# re.search() returns None when nothing matches, so check before calling group()
title = match.group(1) if match else None

In the code above, we use re.search() to find the first match of the regular expression and group(1) to extract the captured text.

Step 5: Simulate browser behavior with Selenium

Selenium is one of the most commonly used web automation testing libraries for Python. It drives a real browser and can simulate actions such as clicking, typing, and scrolling. We can use it as follows:

  1. Import Selenium.
from selenium import webdriver
from selenium.webdriver.common.by import By
  2. Create a browser object.
driver = webdriver.Chrome()

In the code above, we create a Chrome browser object. This assumes Chrome is installed on the machine.

  3. Open a web page.
url = 'http://example.com'
driver.get(url)

In the code above, we use the get() method to open a web page.

  4. Simulate browser actions.
# find_element_by_xpath() was removed in Selenium 4.3; use find_element(By.XPATH, ...) instead
element = driver.find_element(By.XPATH, '//input[@name="q"]')
element.send_keys('Python')
element.submit()

In the code above, we use find_element() with By.XPATH to locate an input box, send_keys() to type text into it, and submit() to submit the enclosing form. The XPath assumes the page contains an input named q. The URL http://example.com is only a placeholder and has no such field, so substitute a page with a search box; otherwise find_element() raises NoSuchElementException.

Step 6: Use proxy IPs and a User-Agent

Proxy IPs and User-Agent headers are common scraping techniques. A proxy hides our real IP address from the target site, and a custom User-Agent header replaces the default one sent by requests, which identifies the client as python-requests. Together they can reduce the chance of being blocked. We can use them as follows:

  1. Define the proxies and the User-Agent.
proxies = {
    # Keys are the scheme of the target URL; values are the address of the proxy itself.
    # Most proxies are reached over plain HTTP even when the target URL is https.
    'http': 'http://127.0.0.1:8888',
    'https': 'http://127.0.0.1:8888'
}
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
}

In the code above, we define a proxy and a User-Agent. The address 127.0.0.1:8888 is a placeholder for a proxy running on the local machine.

  2. Send an HTTP request through the proxy with the User-Agent.
url = 'http://example.com'
response = requests.get(url, proxies=proxies, headers=headers)
html = response.text

In the code above, we send the HTTP request with requests and set the proxy and the User-Agent through the proxies and headers parameters.

Example 1: Scrape web page data with requests and BeautifulSoup

The following example shows how to scrape web page data with requests and BeautifulSoup:

import requests
from bs4 import BeautifulSoup

url = 'http://example.com'
response = requests.get(url)
html = response.text
soup = BeautifulSoup(html, 'html.parser')
title = soup.title.text
print(title)

In the code above, we first send an HTTP request with requests and read the HTML text of the response from response.text. We then parse the HTML text into a BeautifulSoup object and use the text attribute to get the text content of the <title> tag. Finally, we print the title with print().

Example 2: Simulate browser behavior with Selenium

The following example shows how to simulate browser behavior with Selenium:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
url = 'http://example.com'
driver.get(url)
# Assumes the page has an input named "q"; example.com is a placeholder and has none
element = driver.find_element(By.XPATH, '//input[@name="q"]')
element.send_keys('Python')
element.submit()
print(driver.title)
driver.quit()

In the code above, we first create a Chrome browser object and open a web page with get(). We then locate an input box with find_element() and By.XPATH, type text into it with send_keys(), and submit the form with submit(). After that, we read the page title from the title attribute and close the browser with quit().
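
As a closing sketch, the proxy and User-Agent settings from Step 6 can be set once on a requests.Session so that every request reuses them, and the title extraction from Step 3 can guard against pages without a <title> tag. This is one possible arrangement under stated assumptions, not a definitive implementation: the helper names build_session, extract_title and fetch_title, the USER_AGENT string, and the 10-second timeout are all hypothetical choices made for illustration.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical User-Agent string for illustration; replace it with your own.
USER_AGENT = "Mozilla/5.0 (compatible; example-crawler/0.1)"


def build_session(proxy_url=None):
    """Create a session that sends the same User-Agent (and proxy) on every request."""
    session = requests.Session()
    session.headers.update({"User-Agent": USER_AGENT})
    if proxy_url:
        # Keys are the scheme of the target URL; values are the proxy address.
        session.proxies.update({"http": proxy_url, "https": proxy_url})
    return session


def extract_title(html):
    """Return the stripped page title, or None if the page has no usable <title>."""
    soup = BeautifulSoup(html, "html.parser")
    # soup.title is None when the document has no <title> tag.
    if soup.title is None or not soup.title.string:
        return None
    return soup.title.string.strip()


def fetch_title(url, session, timeout=10.0):
    """Download a page through the session and return its title."""
    response = session.get(url, timeout=timeout)
    response.raise_for_status()
    return extract_title(response.text)
```

For example, fetch_title('http://example.com', build_session()) downloads the page with the custom User-Agent and returns its title. Passing proxy_url='http://127.0.0.1:8888' to build_session() would route the request through a proxy at that placeholder address.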