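A Small Python Taobao Crawler Example
Introduction
This is a Taobao crawler written in Python. It fetches a Taobao search results page and extracts the title, price, sales count, shop name, and shop rating of each listed product.
Preparation
The crawler needs the requests and BeautifulSoup libraries. Install them with the following commands: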
pip install requests
pip install beautifulsoup4
Crawling the data
- First, use the requests library to fetch the HTML source of the Taobao search page:
import requests
url = 'https://s.taobao.com/search?q=%E5%8D%AB%E8%A1%A3'
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36'
}
response = requests.get(url, headers=headers)
html = response.text
- Next, parse the HTML source with BeautifulSoup:
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
- Use find_all to collect every product tag, then loop over the tags to read each product's details:
items = soup.find_all('div', {'class': 'item'})
for item in items:
    # Product title
    title = item.find('a', {'class': 'title'}).text.strip()
    # Product price
    price = item.find('strong').text.strip()
    # Sales count
    deal = item.find('div', {'class': 'deal-cnt'}).text.strip()
    # Shop name
    shop_name = item.find('div', {'class': 'shop'}).text
    # Shop rating
    shop_score = item.find('div', {'class': 'score'}).text.strip()
    print(title, price, deal, shop_name, shop_score)
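The loop above calls `.text` directly on the result of `find`, which raises `AttributeError` whenever a product block lacks one of the expected tags. The sketch below shows one possible way to tolerate missing fields. The helper names `text_or_none` and `parse_items`, and the `SAMPLE_HTML` snippet, are hypothetical and exist only for illustration; the class names are the ones used in this article and may not match the markup Taobao actually serves.

```python
from bs4 import BeautifulSoup


def text_or_none(parent, name, class_name=None):
    """Return the stripped text of the first matching tag, or None if absent."""
    attrs = {'class': class_name} if class_name else {}
    tag = parent.find(name, attrs)
    return tag.get_text(strip=True) if tag else None


def parse_items(html):
    """Parse product blocks into dicts; missing fields become None."""
    soup = BeautifulSoup(html, 'html.parser')
    results = []
    for item in soup.find_all('div', {'class': 'item'}):
        results.append({
            'title': text_or_none(item, 'a', 'title'),
            'price': text_or_none(item, 'strong'),
            'deal': text_or_none(item, 'div', 'deal-cnt'),
            'shop_name': text_or_none(item, 'div', 'shop'),
            'shop_score': text_or_none(item, 'div', 'score'),
        })
    return results


# Hypothetical markup that mirrors the class names assumed in this article.
# The shop rating is left out on purpose to show the None fallback.
SAMPLE_HTML = """
<div class="item">
  <a class="title"> Hoodie A </a>
  <strong>99.00</strong>
  <div class="deal-cnt">120人付款</div>
  <div class="shop"> Shop A </div>
</div>
"""

if __name__ == '__main__':
    for product in parse_items(SAMPLE_HTML):
        print(product)
```

Returning dictionaries instead of printing inside the loop also makes the results easier to sort or save later.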
Examples
Example 1: hoodies ranked by sales
import requests
from bs4 import BeautifulSoup
url = 'https://s.taobao.com/search?q=%E5%8D%AB%E8%A1%A3&sort=sale-desc'
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36'
}
response = requests.get(url, headers=headers)
html = response.text
soup = BeautifulSoup(html, 'html.parser')
items = soup.find_all('div', {'class': 'item'})
for item in items:
    title = item.find('a', {'class': 'title'}).text.strip()
    price = item.find('strong').text.strip()
    deal = item.find('div', {'class': 'deal-cnt'}).text.strip()
    shop_name = item.find('div', {'class': 'shop'}).text
    shop_score = item.find('div', {'class': 'score'}).text.strip()
    print(title, price, deal, shop_name, shop_score)
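The sales count extracted above is a string, so it cannot be compared or sorted numerically as it stands. The following is a minimal sketch of a conversion step. It assumes the text looks like `1234人付款` or `1.5万+人付款`; that format is an assumption rather than something this article confirms, and `parse_deal_count` is a hypothetical helper name.

```python
import re


def parse_deal_count(text):
    """Convert a sales string to an integer, or return None if no number is found.

    Assumed formats (not confirmed by the article): '1234人付款', '1.5万+人付款'.
    """
    match = re.search(r'(\d+(?:\.\d+)?)(万)?', text)
    if match is None:
        return None
    number = float(match.group(1))
    if match.group(2):
        # '万' means ten thousand
        number *= 10000
    return int(round(number))


if __name__ == '__main__':
    for sample in ['1234人付款', '1.5万+人付款', '']:
        print(repr(sample), parse_deal_count(sample))
```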
Example 2: iPad Pro listings sorted by price
import requests
from bs4 import BeautifulSoup
url = 'https://s.taobao.com/search?q=ipad%20pro&sort=price-asc'
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36'
}
response = requests.get(url, headers=headers)
html = response.text
soup = BeautifulSoup(html, 'html.parser')
items = soup.find_all('div', {'class': 'item'})
for item in items:
    title = item.find('a', {'class': 'title'}).text.strip()
    price = item.find('strong').text.strip()
    deal = item.find('div', {'class': 'deal-cnt'}).text.strip()
    shop_name = item.find('div', {'class': 'shop'}).text
    shop_score = item.find('div', {'class': 'score'}).text.strip()
    print(title, price, deal, shop_name, shop_score)
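The two examples differ only in the `q` and `sort` query parameters of the search URL. As a sketch, the URL could be built from a plain keyword instead of a hand-encoded string. `build_search_url` and `SEARCH_URL` are hypothetical names introduced here; the parameter names and sort values are the ones that appear in the examples above.

```python
from urllib.parse import quote, urlencode

# Base search address used throughout this article
SEARCH_URL = 'https://s.taobao.com/search'


def build_search_url(keyword, sort=None):
    """Build a search URL; non-ASCII keywords and spaces are percent-encoded."""
    params = {'q': keyword}
    if sort:
        params['sort'] = sort
    # quote_via=quote encodes spaces as %20 rather than '+'
    return SEARCH_URL + '?' + urlencode(params, quote_via=quote)


if __name__ == '__main__':
    print(build_search_url('卫衣', 'sale-desc'))
    print(build_search_url('ipad pro', 'price-asc'))
```

The resulting string can be passed to `requests.get` together with the same `headers` dictionary used in the examples.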
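Summary
A Python Taobao crawler is a practical tool with many uses in real development work. When running a crawler, stay within the applicable laws and regulations, and never use it for illegal activity.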