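Here is a complete guide to scraping Baidu Cloud (百度云) share links with Python and urllib:
Prerequisites
Before you start, install Python and the common scraping libraries requests and BeautifulSoup, and make sure you are familiar with URL encoding.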
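As a quick refresher on URL encoding, here is a minimal sketch of how `urllib.parse.quote` percent-encodes characters and how its `safe` argument exempts some of them. The sample string is made up for illustration.

```python
from urllib.parse import quote, unquote

# Sample value invented for this illustration
raw = 'https://pan.baidu.com/s/demo file?pwd=a b'

# By default only '/' (plus letters, digits and "_.-~") is left unescaped
default_encoded = quote(raw)

# Listing more characters in `safe` keeps the URL punctuation readable
url_encoded = quote(raw, safe='/:?=&')

print(default_encoded)
print(url_encoded)
print(unquote(url_encoded) == raw)  # decoding restores the original string
```

The scripts in this guide pass a longer `safe` list so that the separators of an already assembled URL are not escaped.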
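Approach
- Use the requests library to request the Baidu Cloud share page and get its HTML;
- Use the BeautifulSoup library to parse the HTML and extract the Baidu Cloud share links;
- URL-encode each link. Because Baidu Cloud share links can expire, save the extracted links for later use.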
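The last step asks for the extracted links to be saved, which the scripts in this guide do not show. One possible sketch, using only the standard library, appends links to a text file and skips any that were saved before. The function name `save_links` and the one-link-per-line file format are assumptions made for this illustration.

```python
from pathlib import Path


def save_links(links, path):
    """Append links to a text file, one per line, skipping ones already saved.

    Hypothetical helper: the name and file format are illustrative choices.
    Returns the list of links that were newly written.
    """
    file = Path(path)
    # Load what was saved earlier so repeated runs do not duplicate entries
    existing = set(file.read_text(encoding='utf-8').splitlines()) if file.exists() else set()
    new_links = []
    for link in links:
        if link not in existing:
            existing.add(link)
            new_links.append(link)
    with file.open('a', encoding='utf-8') as fh:
        for link in new_links:
            fh.write(link + '\n')
    return new_links
```

For example, `save_links(encoded_links, 'baidu_links.txt')` could be called with the list of encoded URLs collected in the loop instead of printing them; both `encoded_links` and the file name are placeholders.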
Code
Here is a sample script that scrapes a share page from 机器学习大街 as an example:
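import urllib.parse

import requests
from bs4 import BeautifulSoup

# Share page to scrape
page_url = 'http://www.jiqizhixin.com/share/detail/38d5fde8-2f5a-464c-9e0d-03f57088deaa'
response = requests.get(page_url, timeout=10)
response.raise_for_status()  # stop early on HTTP errors
soup = BeautifulSoup(response.text, 'html.parser')

# Download buttons are <a> tags with class "downbtn"; skip any without an href
for link in soup.find_all('a', class_='downbtn', href=True):
    # Rewrite the share link into the PCS file endpoint form
    download_url = link['href'].replace('pan.baidu.com/s/', 'www.baidupcs.com/rest/2.0/pcs/file')
    download_url = download_url.replace('?', '&').replace('=', '/')
    download_url += '&method=download&access_token=null&app_id=250528'
    # Percent-encode everything except the listed URL punctuation
    print(urllib.parse.quote(download_url, safe="'/:&?=.,;~"))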
Examples
- Example 1: scraping the share link of an image on Baidu Cloud
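import urllib.parse

import requests
from bs4 import BeautifulSoup

# Baidu Cloud share page for the image
page_url = 'https://pan.baidu.com/s/1IrkY2Jw2gGj6s-CL5SDQHw'
response = requests.get(page_url, timeout=10)
response.raise_for_status()  # stop early on HTTP errors
soup = BeautifulSoup(response.text, 'html.parser')

# Here the download button uses the class "new-dbtn"
for link in soup.find_all('a', class_='new-dbtn', href=True):
    # Rewrite the share link into the PCS file endpoint form
    download_url = link['href'].replace('pan.baidu.com/s/', 'www.baidupcs.com/rest/2.0/pcs/file')
    download_url = download_url.replace('?', '&').replace('=', '/')
    download_url += '&method=download&access_token=null&app_id=250528'
    # Percent-encode everything except the listed URL punctuation
    print(urllib.parse.quote(download_url, safe="'/:&?=.,;~"))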
- Example 2: scraping the share link of a movie on Baidu Cloud
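import urllib.parse

import requests
from bs4 import BeautifulSoup

# Baidu Cloud share page for the movie
page_url = 'https://pan.baidu.com/s/1TTmmdmfR8dIFrw_Js98eyQ'
response = requests.get(page_url, timeout=10)
response.raise_for_status()  # stop early on HTTP errors
soup = BeautifulSoup(response.text, 'html.parser')

# Here the download button uses the class "down-btn"
for link in soup.find_all('a', class_='down-btn', href=True):
    # Rewrite the share link into the PCS file endpoint form
    download_url = link['href'].replace('pan.baidu.com/s/', 'www.baidupcs.com/rest/2.0/pcs/file')
    download_url = download_url.replace('?', '&').replace('=', '/')
    download_url += '&method=download&access_token=null&app_id=250528'
    # Percent-encode everything except the listed URL punctuation
    print(urllib.parse.quote(download_url, safe="'/:&?=.,;~"))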
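As you can see, both examples request the Baidu Cloud share link with the requests library, parse the returned HTML with BeautifulSoup to extract the share link, and then URL-encode that link, which completes the scraping of Baidu Cloud share links.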
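The examples differ only in the page URL and the button class, while the string rewriting is identical. A minimal sketch of that shared step as a function might look like the following. `build_download_url` and the constant names are hypothetical names introduced here, and the sample href is made up. The sketch only reproduces the string transformation from the examples; whether Baidu accepts the resulting URL is not verified here.

```python
import urllib.parse

# Values copied from the example scripts; the constant names are illustrative
SHARE_PREFIX = 'pan.baidu.com/s/'
PCS_PREFIX = 'www.baidupcs.com/rest/2.0/pcs/file'
QUERY_SUFFIX = '&method=download&access_token=null&app_id=250528'
SAFE_CHARS = "'/:&?=.,;~"


def build_download_url(href):
    """Apply the same rewriting steps as the examples to one share href."""
    url = href.replace(SHARE_PREFIX, PCS_PREFIX)
    url = url.replace('?', '&').replace('=', '/')
    url += QUERY_SUFFIX
    # Percent-encode everything except the listed URL punctuation
    return urllib.parse.quote(url, safe=SAFE_CHARS)


# Made-up href, for illustration only
print(build_download_url('https://pan.baidu.com/s/1abc?pwd=xyz'))
```

Keeping the rewriting in a pure function means it can be checked without any network access, and the scraping loop only has to pass each `href` to it.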
Unless otherwise noted, all articles on this site are original. If you repost this article, please credit the source: python urllib爬取百度云连接的实例代码 - Python技术站