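Summary of Common Python Web-Scraping Techniques

This guide covers common Python web-scraping techniques: sending HTTP requests with the requests library, parsing HTML documents with BeautifulSoup, extracting data with regular expressions, simulating browser behavior with Selenium, and using proxy IPs and custom User-Agent headers. Two complete examples at the end show how to apply these techniques to scrape data from a web page.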

Step 1: Install the required libraries

Before starting, install the libraries used in this guide with the following command:

pip install requests beautifulsoup4 selenium

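Step 2: Send HTTP requests with the requests library

requests is one of the most widely used HTTP libraries for Python. Its simple API makes it easy to send an HTTP request and read the response. The basic usage is:

  1. Import the requests library.
import requests
  2. Send an HTTP request and read the response.
url = 'http://example.com'
response = requests.get(url)
html = response.text

The code above defines a URL, sends a GET request to it with requests.get(), and reads the decoded response body, here an HTML document, from the response.text attribute.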
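
The request above assumes the server answers promptly and successfully. As a minimal sketch of a more defensive version, the helper below adds a timeout and a status check; fetch_html is a hypothetical name chosen for illustration, and the 10-second timeout is an arbitrary assumption rather than a recommended value.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the page text, or None if the request fails.

    fetch_html is a hypothetical helper name, not part of requests.
    """
    try:
        response = requests.get(url, timeout=timeout)
        # Raise requests.HTTPError for 4xx/5xx status codes.
        response.raise_for_status()
    except requests.RequestException as exc:
        print(f"Request failed: {exc}")
        return None
    return response.text


# A URL without a scheme is rejected before any network traffic is sent,
# so this call exercises the error path and returns None.
result = fetch_html("example.com")
print(result)
```

With a complete URL such as 'http://example.com', the same helper would return the HTML text, provided the site is reachable.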

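Step 3: Parse HTML documents with BeautifulSoup

BeautifulSoup is one of the most widely used HTML parsing libraries for Python. It parses an HTML document into a tree of Python objects and provides a simple API for extracting data from it. The basic usage is:

  1. Import BeautifulSoup.
from bs4 import BeautifulSoup
  2. Parse the HTML document into a Python object.
soup = BeautifulSoup(html, 'html.parser')

The code above parses the HTML text into a BeautifulSoup object, using 'html.parser', the parser based on Python's standard library.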
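
Parsing can be tried without any network access. The following is a minimal sketch that assumes a small inline HTML string (a made-up sample, not a real page) and shows how the parsed object can be queried for several tags at once.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, used instead of a downloaded page.
html = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <a href="/first">First link</a>
    <a href="/second">Second link</a>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all() returns every matching tag; get() reads an attribute value.
links = [(a.get_text(), a.get("href")) for a in soup.find_all("a")]
print(links)
```

Each entry in links pairs the visible link text with its href attribute, in document order.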

  3. Extract the data.
title = soup.title.text

The code above reads the text content of the <title> tag through its text attribute.

Step 4: Extract data with regular expressions

Regular expressions are a powerful text-processing tool for matching and extracting data from text. The basic usage is:

  1. Import the re module.
import re
  2. Write the regular expression.
pattern = r'<title>(.*?)</title>'

The code above defines a regular expression whose capture group matches the text inside the <title> tag.

  3. Match the pattern and extract the data.
match = re.search(pattern, html)
# re.search() returns None when nothing matches, so check before calling group().
title = match.group(1) if match else None

The code above calls re.search() to find the first match and group(1) to read the text captured by the parentheses.

Step 5: Simulate browser behavior with Selenium

Selenium is one of the most widely used web automation testing libraries for Python. It drives a real browser and can simulate actions such as clicking, typing, and scrolling. The basic usage is:

  1. Import Selenium.
from selenium import webdriver
from selenium.webdriver.common.by import By
  2. Create a browser object.
driver = webdriver.Chrome()

The code above creates a Chrome browser object.

  3. Open a web page.
url = 'http://example.com'
driver.get(url)

The code above opens a web page with the get() method.

  4. Simulate browser actions.
# Selenium 4.3 and later no longer provide find_element_by_xpath(); use find_element() with a By locator.
# example.com is a placeholder: the target page must contain an input named "q".
element = driver.find_element(By.XPATH, '//input[@name="q"]')
element.send_keys('Python')
element.submit()

The code above locates an input box with find_element() and an XPath expression, types text into it with send_keys(), and then submits the enclosing form with submit().

Step 6: Use a proxy IP and a custom User-Agent

Proxy IPs and custom User-Agent headers are common scraping techniques. A proxy hides the client's real IP address from the target site, and the User-Agent header changes the browser information the client reports, which lowers the chance of being blocked. The basic usage is:

  1. Define the proxy and the User-Agent.
# 127.0.0.1:8888 is a placeholder for a local proxy. Each value is the proxy's own URL;
# a proxy that accepts plain HTTP connections uses http:// for both entries.
proxies = {
    'http': 'http://127.0.0.1:8888',
    'https': 'http://127.0.0.1:8888'
}
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
}

The code above defines one proxy and one User-Agent string.

  2. Send the HTTP request through the proxy with the custom User-Agent.
url = 'http://example.com'
response = requests.get(url, proxies=proxies, headers=headers)
html = response.text

The code above sends the request with requests and sets the proxy and the User-Agent through the proxies and headers parameters.

Example 1: Scrape a web page with requests and BeautifulSoup

The following example shows how to scrape a web page with requests and BeautifulSoup:

import requests
from bs4 import BeautifulSoup

url = 'http://example.com'
response = requests.get(url)
html = response.text
soup = BeautifulSoup(html, 'html.parser')
title = soup.title.text
print(title)

The code above first sends an HTTP request with requests and reads the HTML text from response.text. It then parses the HTML text with BeautifulSoup, reads the text content of the <title> tag through the text attribute, and prints the title with print().

Example 2: Simulate browser behavior with Selenium

The following example shows how to simulate browser behavior with Selenium:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
try:
    # example.com is a placeholder: use a page that contains an input named "q".
    url = 'http://example.com'
    driver.get(url)
    element = driver.find_element(By.XPATH, '//input[@name="q"]')
    element.send_keys('Python')
    element.submit()
    print(driver.title)
finally:
    # Close the browser even if a step above fails.
    driver.quit()

The code above first creates a Chrome browser object and opens a web page with get(). It then locates an input box with find_element(), types text into it with send_keys(), submits the form with submit(), and prints the page title from the title attribute. Finally, quit() closes the browser.

Unless otherwise stated, articles on this site are original. If you repost this article, please credit the source: Python常用的爬虫技巧总结 - Python技术站 (https://pythonjishu.com/jonhakwgvtszmbt/)
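
As a supplement to the regular-expression step (Step 4), the sketch below runs the same kind of pattern on an inline sample string, which is made up for illustration. It assumes two things the step does not state: that a title may span several lines, which needs the re.DOTALL flag, and that a page may contain many matches, which findall() collects.

```python
import re

# Hypothetical sample document; note the newline inside <title>.
html = """<html><head><title>
Sample page</title></head>
<body><a href="/first">First</a> <a href="/second">Second</a></body></html>"""

# re.DOTALL lets "." match newlines; re.IGNORECASE also accepts <TITLE>.
title_pattern = re.compile(r"<title>(.*?)</title>", re.DOTALL | re.IGNORECASE)
match = title_pattern.search(html)
title = match.group(1).strip() if match else None
print(title)

# findall() returns the captured group for every match in the text.
hrefs = re.findall(r'href="([^"]*)"', html)
print(hrefs)
```

Regular expressions work well for small, predictable fragments like these; for nested or irregular markup, a parser such as BeautifulSoup is usually the more robust choice.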
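
As a supplement to the proxy and User-Agent step (Step 6), the settings can also be stored once on a requests.Session so that every request reuses them. This is a minimal sketch under stated assumptions: build_session is a hypothetical helper name, and the User-Agent string and proxy address are placeholders. The request is only prepared, not sent, so the sketch runs without a working proxy.

```python
import requests


def build_session(user_agent, proxy_url=None):
    """Create a requests.Session that sends the given User-Agent and,
    optionally, routes traffic through one proxy.

    build_session is a hypothetical helper name, not part of requests.
    """
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    if proxy_url is not None:
        # The same proxy URL handles both http:// and https:// targets.
        session.proxies.update({"http": proxy_url, "https": proxy_url})
    return session


# Placeholder values; replace them with a real proxy and User-Agent string.
session = build_session("my-crawler/0.1", "http://127.0.0.1:8888")

# Preparing a request shows the headers that would be sent, without any network traffic.
prepared = session.prepare_request(requests.Request("GET", "http://example.com"))
print(prepared.headers["User-Agent"])
print(session.proxies)
```

To actually send a request, call session.get(url, timeout=10) on the returned session; that call needs a proxy listening at the configured address.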