django-beautifulsoup: basic usage

April 12, 2023, 10:59 PM • Django

I. Introduction: Put simply, Beautiful Soup is a Python library whose main job is extracting data from web pages. The official description reads:

'''
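Beautiful Soup provides a few simple, Pythonic functions for navigating, searching, and modifying a parse tree.
It is a toolkit that parses a document and hands you the data you want to extract; because it is simple, a complete application takes very little code.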
'''
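
The three capabilities named in the description (navigating, searching, and modifying the parse tree) can be sketched as follows. This is a minimal illustration, not part of the official text; the `snippet` markup and all variable names are made up for the example.

```python
from bs4 import BeautifulSoup

# A tiny made-up document, used only for illustration.
snippet = '<div><h1>Title</h1><p class="note">First</p><p>Second</p></div>'
soup = BeautifulSoup(snippet, "html.parser")

# Navigate: step through the tree by tag name.
heading = soup.div.h1.string

# Search: collect every <p> tag, or only those with a given class.
paragraphs = soup.find_all("p")
notes = soup.find_all("p", class_="note")

# Modify: change a tag's text and attributes in place.
soup.h1.string = "New title"
soup.h1["id"] = "main"

print(heading)                      # Title
print(len(paragraphs), len(notes))  # 2 1
print(soup.h1)                      # <h1 id="main">New title</h1>
```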

1. Installation

pip3 install beautifulsoup4
pip3 install bs4    # alternative package name; afterwards use: from bs4 import BeautifulSoup
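
A quick way to confirm the installation is to import the package and parse a trivial string. This is only a sanity check, and the sample markup is invented for the purpose.

```python
import bs4
from bs4 import BeautifulSoup

# Printing the version confirms which release pip installed.
print(bs4.__version__)

# Parsing a trivial string confirms the import works end to end.
soup = BeautifulSoup("<p>ok</p>", "html.parser")
print(soup.p.string)  # ok
```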

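Parsers
Beautiful Soup supports the HTML parser in Python's standard library as well as several third-party parsers. If no third-party parser is installed, Beautiful Soup uses Python's built-in parser. The lxml parser is more capable and faster, so installing it is recommended: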
pip3 install lxml
Another option is html5lib, a pure-Python parser that parses pages the same way a browser does. It can be installed with:
pip install html5lib

Parser comparison
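
One way to see how the parsers differ is to feed each of them the same invalid markup and compare the results. The sketch below assumes nothing beyond Beautiful Soup itself: lxml and html5lib are tried only if they are installed, and the `broken` string and `results` dictionary are names chosen for this example.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Deliberately invalid markup: an unclosed <a> followed by a stray </p>.
broken = "<a></p>"

results = {}
for parser in ("html.parser", "lxml", "html5lib"):
    try:
        results[parser] = str(BeautifulSoup(broken, parser))
    except FeatureNotFound:
        # Raised when the requested parser library is not installed.
        results[parser] = None

for parser, markup in results.items():
    shown = markup if markup is not None else "(not installed)"
    print(f"{parser:12} {shown}")
```

The built-in parser simply drops the stray closing tag and returns `<a></a>`. The third-party parsers repair the markup differently, which is why it is worth naming the parser explicitly when results must be reproducible across machines.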


II. Quick start

The following HTML snippet is used as the running example. It is an excerpt from Alice in Wonderland (referred to from here on as the "Alice" document):

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""

Parsing this snippet with BeautifulSoup yields a BeautifulSoup object, which can print the document as a conventionally indented structure:
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')

print(soup.prettify())
After formatting, the document looks like this:

<html>
 <head>
  <title>
   The Dormouse's story
  </title>
 </head>
 <body>
  <p class="title">
   <b>
    The Dormouse's story
   </b>
  </p>
  <p class="story">
   Once upon a time there were three little sisters; and their names were
   <a class="sister" href="http://example.com/elsie" id="link1">
    Elsie
   </a>
   ,
   <a class="sister" href="http://example.com/lacie" id="link2">
    Lacie
   </a>
   and
   <a class="sister" href="http://example.com/tillie" id="link3">
    Tillie
   </a>
   ;
and they lived at the bottom of a well.
  </p>
  <p class="story">
   ...
  </p>
 </body>
</html>
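
Beyond pretty-printing, the parsed object can be queried for the data itself. The following is a minimal sketch of pulling the title and the three links out of the "Alice" document; it repeats a shortened copy of the markup so that it runs on its own, and the `links` record layout is an assumption made for illustration.

```python
from bs4 import BeautifulSoup

# A shortened copy of the "Alice" document so the sketch runs on its own.
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# The text inside the <title> tag.
title = soup.title.string

# One record per <a class="sister"> tag: its id, link target, and text.
links = [
    {"id": a.get("id"), "href": a.get("href"), "text": a.get_text()}
    for a in soup.find_all("a", class_="sister")
]

print(title)
for link in links:
    print(link["id"], link["href"], link["text"])
```

Using `a.get("href")` rather than `a["href"]` returns `None` instead of raising `KeyError` when an attribute is missing, which tends to be more forgiving on real-world pages.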


Unless otherwise stated, all articles on this site are original. If you repost this one, please credit the source: django-beautifulsoup的简单使用 - Python技术站
