Python requests库爬取豆瓣电视剧数据并保存到本地详解

在进行爬虫开发时，我们可能需要使用Python的requests库来爬取网站数据。本文将介绍如何使用Python requests库爬取豆瓣电视剧数据并保存到本地，并提供两个示例。

实现步骤

步骤一：安装requests库和BeautifulSoup库

在Python中，我们可以使用pip命令安装requests库和BeautifulSoup库：

pip install requests
pip install beautifulsoup4

步骤二：编写爬虫代码

以下是一个示例，演示如何使用Python requests库爬取豆瓣电视剧数据并保存到本地：

import requests
from bs4 import BeautifulSoup

url = 'https://movie.douban.com/tv/'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
items = soup.find_all('div', class_='item')

with open('douban_tv.txt', 'w', encoding='utf-8') as f:
    for item in items:
        title = item.find('span', class_='title').text
        rating = item.find('span', class_='rating_num').text
        f.write(title + ' ' + rating + '\n')

在上面的示例中，我们使用requests库发送GET请求，并使用BeautifulSoup库解析响应内容。我们使用find_all方法查找所有class为item的div元素。我们使用with语句打开文件douban_tv.txt，并使用write方法将电视剧名称和评分写入文件中。

步骤三：运行爬虫代码

我们可以使用以下命令运行爬虫代码：

python douban_tv.py

在运行爬虫代码时，我们会看到douban_tv.txt文件被创建，并包含豆瓣电视剧数据。

示例一：爬取豆瓣电影数据

以下是一个示例，演示如何使用Python requests库爬取豆瓣电影数据并保存到本地：

import requests
from bs4 import BeautifulSoup

url = 'https://movie.douban.com/top250'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
items = soup.find_all('div', class_='item')

with open('douban_movie.txt', 'w', encoding='utf-8') as f:
    for item in items:
        title = item.find('span', class_='title').text
        rating = item.find('span', class_='rating_num').text
        f.write(title + ' ' + rating + '\n')

在上面的示例中，我们使用requests库发送GET请求，并使用BeautifulSoup库解析响应内容。我们使用find_all方法查找所有class为item的div元素。我们使用with语句打开文件douban_movie.txt，并使用write方法将电影名称和评分写入文件中。

示例二：爬取新浪新闻数据

以下是一个示例，演示如何使用Python requests库爬取新浪新闻数据并保存到本地：

import requests
from bs4 import BeautifulSoup

url = 'https://news.sina.com.cn/'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
items = soup.find_all('div', class_='news-item')

with open('sina_news.txt', 'w', encoding='utf-8') as f:
    for item in items:
        title = item.find('a').text
        f.write(title + '\n')

在上面的示例中，我们使用requests库发送GET请求，并使用BeautifulSoup库解析响应内容。我们使用find_all方法查找所有class为news-item的div元素。我们使用with语句打开文件sina_news.txt，并使用write方法将新闻标题写入文件中。

总结

本文介绍了如何使用Python requests库爬取豆瓣电视剧数据并保存到本地，并提供了两个示例。我们可以使用requests库方便地发送HTTP请求，并使用BeautifulSoup库解析响应内容。使用Python requests库爬取数据可以帮助我们快速获取网站数据，提高爬虫开发效率。

本站文章如无特殊说明，均为本站原创，如若转载，请注明出处：基于Python模拟浏览器发送http请求 - Python技术站

基于Python模拟浏览器发送http请求