Summary of Common Python Web Scraping Techniques

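This guide covers common Python web scraping techniques:

- sending HTTP requests with the requests library
- parsing HTML documents with BeautifulSoup
- extracting data with regular expressions
- simulating browser actions with Selenium
- using proxy IPs and User-Agent headers

Two examples then show how to combine these techniques to scrape web page data.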

Step 1: Install the required libraries

Before starting, install the required libraries with the following command:

pip install requests beautifulsoup4 selenium

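Step 2: Send HTTP requests with requests

requests is one of the most widely used HTTP libraries in Python. Its simple API makes it easy to send HTTP requests and read the response data. To send an HTTP request with requests:

  1. Import requests.
import requests
  2. Send the HTTP request and read the response data.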
url = 'http://example.com'
response = requests.get(url)
html = response.text

In the code above, we define a URL, send an HTTP GET request with requests.get(), and read the HTML text of the response from the response.text attribute.
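The two-line request above is fine for a quick test. The following sketch is a slightly more defensive version. It assumes you want connection reuse, a timeout, HTTP error checking, and correct decoding of pages that do not declare a charset. The helper names make_session and fetch_html and the User-Agent string are illustrative; they are not part of requests.

```python
import requests

# Illustrative browser-like User-Agent (an assumption, not a required value).
DEFAULT_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
}


def make_session(headers=None):
    """Create a Session that reuses connections and cookies and sends the given headers."""
    session = requests.Session()
    session.headers.update(headers or DEFAULT_HEADERS)
    return session


def fetch_html(session, url, timeout=10.0):
    """GET url and return the decoded HTML text."""
    # Without a timeout, requests can wait indefinitely on an unresponsive server.
    response = session.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx instead of silently parsing an error page.
    response.raise_for_status()
    # For text/* responses without a declared charset, requests assumes ISO-8859-1,
    # which garbles Chinese pages; switch to the encoding detected from the body.
    if response.encoding is None or response.encoding.lower() == "iso-8859-1":
        response.encoding = response.apparent_encoding
    return response.text
```

For example, fetch_html(make_session(), 'http://example.com') returns the page's HTML. It raises requests.HTTPError for an error status and requests.Timeout when connecting or waiting for data exceeds the timeout.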

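Step 3: Parse HTML documents with BeautifulSoup

BeautifulSoup is one of the most widely used HTML parsing libraries in Python. It parses an HTML document into a tree of Python objects and offers a simple API for extracting data. To parse an HTML document with BeautifulSoup:

  1. Import BeautifulSoup.
from bs4 import BeautifulSoup
  2. Parse the HTML document into a Python object.
soup = BeautifulSoup(html, 'html.parser')

In the code above, BeautifulSoup parses the HTML document into a Python object, using Python's built-in 'html.parser' as the parser.

  3. Extract data.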
title = soup.title.text

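In the code above, soup.title returns the <title> tag, and its text attribute returns the tag's text content.

Step 4: Extract data with regular expressions

Regular expressions are a powerful text-processing tool for matching and extracting data in text. To extract data with a regular expression:

  1. Import the re module.
import re
  2. Write the regular expression.
pattern = r'<title>(.*?)</title>'

In the code above, we define a regular expression that matches the text inside the <title> tag. The non-greedy (.*?) group stops at the first </title>.

  3. Match and extract the data with re.
match = re.search(pattern, html)
title = match.group(1) if match else None

In the code above, re.search() looks for the first match of the pattern. group(1) returns the text captured by the first parenthesized group. Because search() returns None when there is no match, the code checks match before calling group().

Step 5: Simulate browser actions with Selenium

Selenium is one of the most widely used web automation testing tools, and it has Python bindings. It drives a real browser and can simulate user actions such as clicking, typing, and scrolling. To simulate browser actions with Selenium:

  1. Import Selenium.
from selenium import webdriver
from selenium.webdriver.common.by import By
  2. Create a browser object.
driver = webdriver.Chrome()

In the code above, we create a Chrome browser object. Chrome must be installed. Selenium 4.6 and later can download a matching ChromeDriver automatically.

  3. Open a web page.
url = 'http://example.com'
driver.get(url)

In the code above, the get() method opens the page.

  4. Simulate browser actions.
element = driver.find_element(By.XPATH, '//input[@name="q"]')
element.send_keys('Python')
element.submit()

In the code above, find_element() with By.XPATH locates an input box. send_keys() types text into it, and submit() submits the form that contains it.

The older find_element_by_xpath() method was removed in Selenium 4.3, so current versions require find_element() with a By locator. Note also that example.com is only a placeholder and has no input named q. Use a page that actually has such a search box.

Step 6: Use proxy IPs and User-Agent headers

Proxy IPs and User-Agent headers are common scraping techniques. A proxy hides your real IP address from the target site. A browser-like User-Agent hides the fact that the request comes from a script. Together they reduce the chance of being blocked. To use a proxy IP and a User-Agent:

  1. Define the proxy IP and the User-Agent.
proxies = {
    'http': 'http://127.0.0.1:8888',
    'https': 'http://127.0.0.1:8888',
}
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
}

In the code above, we define a proxy and a User-Agent. The keys of proxies are the schemes of the URLs being requested, and each value is the proxy's own address.

An ordinary HTTP proxy is addressed with http:// even for HTTPS traffic, which it tunnels. If it is addressed as https://, recent versions of requests try to open a TLS connection to the proxy itself, which a plain HTTP proxy does not accept.

  2. Send the HTTP request through the proxy with the User-Agent.
url = 'http://example.com'
response = requests.get(url, proxies=proxies, headers=headers)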
html = response.text

In the code above, we send the HTTP request with requests and set the proxy IP and User-Agent through the proxies and headers parameters.

Example 1: Scrape a web page with requests and BeautifulSoup

The following example shows how to scrape web page data with requests and BeautifulSoup:

import requests
from bs4 import BeautifulSoup

url = 'http://example.com'
response = requests.get(url, timeout=10)  # give up if the server stalls
response.raise_for_status()  # raise an exception for 4xx/5xx responses
html = response.text
soup = BeautifulSoup(html, 'html.parser')
title = soup.title.text
print(title)

In the code above, we first send an HTTP request with requests, using a 10-second timeout. raise_for_status() raises an exception for error responses. We then read the HTML text of the response from response.text and parse it into a Python object with BeautifulSoup. Next, we read the text content of the <title> tag through its text attribute. Finally, print() outputs the title.

Example 2: Simulate browser actions with Selenium

The following example shows how to simulate browser actions with Selenium:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
url = 'http://example.com'  # placeholder: use a page with an input named "q"
try:
    driver.get(url)
    element = driver.find_element(By.XPATH, '//input[@name="q"]')
    element.send_keys('Python')
    element.submit()
    print(driver.title)
finally:
    driver.quit()  # close the browser even if a step above fails

In the code above, we first create a Chrome browser object and open the page with get(). We then locate an input box with find_element(By.XPATH, ...) and type text into it with send_keys(). Next, submit() submits the form, and the title attribute gives the page title. Finally, quit() closes the browser. Because quit() sits in a finally block, the browser closes even if an earlier step raises an exception.

Unless otherwise noted, articles on this site are original. When reprinting, please credit the source: Summary of Common Python Web Scraping Techniques - Python技术站 (https://pythonjishu.com/jonhakwgvtszmbt/)
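As a supplement to Step 3: soup.title.text raises AttributeError when a page has no <title>, and real pages usually need more than the title. The sketch below runs against a small inline HTML string, whose content is hypothetical and stands in for a downloaded page. It shows a None check, find_all() with an attribute filter, and CSS selectors through select().

```python
from bs4 import BeautifulSoup

# Hypothetical page content standing in for response.text.
html = """
<html><head><title> Demo page </title></head>
<body>
  <ul id="links">
    <li><a class="item" href="/a">First</a></li>
    <li><a class="item" href="/b">Second</a></li>
    <li><a href="/c">Not an item</a></li>
  </ul>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# soup.title is None when the page has no <title>, so check before reading it.
title = soup.title.get_text(strip=True) if soup.title else ""

# find_all() filters by tag name and attributes; class_ avoids the Python keyword "class".
items = [(a.get_text(strip=True), a["href"]) for a in soup.find_all("a", class_="item")]

# select() takes a CSS selector, which is often shorter for nested structures.
all_hrefs = [a.get("href") for a in soup.select("ul#links a")]
```

Here items holds only the two links with class="item", while all_hrefs collects every link inside the list.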
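As a supplement to Step 4: the pattern r'<title>(.*?)</title>' misses titles that span several lines, use uppercase tags, or carry attributes. The following sketch handles those cases and returns None instead of failing when nothing matches. extract_title and extract_links are hypothetical helper names. For anything beyond simple, predictable markup, an HTML parser such as BeautifulSoup remains the more robust choice.

```python
import re

# re.S lets "." match newlines; re.I makes the tag name case-insensitive.
# [^>]* tolerates attributes in the opening tag, e.g. <title lang="en">.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.S | re.I)
# Captures double-quoted href values only; single-quoted or unquoted ones are skipped.
HREF_RE = re.compile(r'href="([^"]*)"', re.I)


def extract_title(text):
    """Return the stripped <title> text, or None if the page has no title."""
    match = TITLE_RE.search(text)
    # search() returns None when nothing matches, so check before calling group().
    return match.group(1).strip() if match else None


def extract_links(text):
    """Return every double-quoted href value in document order."""
    # With exactly one capture group, findall() returns a list of the captured strings.
    return HREF_RE.findall(text)
```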
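As a supplement to Step 6: a single fixed proxy and User-Agent are easy for a site to recognize, so scrapers often rotate them per request. The following is a minimal sketch that assumes you have a pool of proxies you are permitted to use. The proxy addresses and User-Agent strings are placeholders, and pick_request_options and get_rotated are hypothetical helper names.

```python
import random

import requests

# Placeholder proxy pool: replace with proxies you are allowed to use.
PROXY_POOL = [
    "http://127.0.0.1:8888",
    "http://127.0.0.1:8889",
]
# Placeholder browser User-Agent strings.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
]


def pick_request_options(rng=random):
    """Pick one proxy and one User-Agent at random for the next request."""
    proxy = rng.choice(PROXY_POOL)
    return {
        # The same HTTP proxy handles both http:// and https:// target URLs.
        "proxies": {"http": proxy, "https": proxy},
        "headers": {"User-Agent": rng.choice(USER_AGENTS)},
    }


def get_rotated(url, timeout=10.0):
    """GET url through a randomly chosen proxy with a randomly chosen User-Agent."""
    response = requests.get(url, timeout=timeout, **pick_request_options())
    response.raise_for_status()
    return response.text
```

The rng parameter makes the choice reproducible in tests, for example pick_request_options(random.Random(0)).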