Requirement: use bs4 to scrape every chapter of the novel Romance of the Three Kingdoms (三国演义) from the shicimingju poetry site and save the content to local disk.
http://www.shicimingju.com/book/sanguoyanyi.html
from bs4 import BeautifulSoup
import requests

url = 'http://www.shicimingju.com/book/sanguoyanyi.html'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36'
}

# Fetch the table-of-contents page; force UTF-8 in case the site
# mis-reports its charset and the text comes back garbled
response = requests.get(url=url, headers=headers)
response.encoding = 'utf-8'
page_text = response.text

# Parse out each chapter's title and its detail-page URL
soup = BeautifulSoup(page_text, 'lxml')
li_list = soup.select('.book-mulu > ul > li')

fp = open('./xiaoshuo.txt', 'w', encoding='utf-8')
for li in li_list:
    title = li.a.string
    detail_url = 'http://www.shicimingju.com' + li.a['href']

    # Request the chapter's detail page
    detail_response = requests.get(url=detail_url, headers=headers)
    detail_response.encoding = 'utf-8'
    detail_soup = BeautifulSoup(detail_response.text, 'lxml')
    # .text flattens the whole chapter content into a single string
    text = detail_soup.find('div', class_='chapter_content').text

    fp.write(title + '\n' + text)
fp.close()
print('over!!!')
Crawler code
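For reference, here is a minimal variant (a sketch, not part of the original post) that stores each chapter in its own file instead of one big xiaoshuo.txt, guards against the content div being missing, and pauses briefly between requests. The output directory name sanguo_chapters and the 0.5-second delay are assumptions made for illustration.

import os
import time

import requests
from bs4 import BeautifulSoup

BASE = 'http://www.shicimingju.com'
headers = {'User-Agent': 'Mozilla/5.0'}

# Hypothetical output directory for the per-chapter files
os.makedirs('sanguo_chapters', exist_ok=True)

toc = requests.get(BASE + '/book/sanguoyanyi.html', headers=headers)
toc.encoding = 'utf-8'
soup = BeautifulSoup(toc.text, 'lxml')

for i, li in enumerate(soup.select('.book-mulu > ul > li'), start=1):
    title = li.a.string
    detail = requests.get(BASE + li.a['href'], headers=headers)
    detail.encoding = 'utf-8'
    chapter = BeautifulSoup(detail.text, 'lxml').find('div', class_='chapter_content')
    if chapter is None:
        # The class name may change if the site is redesigned; skip rather than crash
        continue
    # Prefix with a zero-padded index so the files sort in chapter order
    with open(f'sanguo_chapters/{i:03d}_{title}.txt', 'w', encoding='utf-8') as f:
        f.write(chapter.text)
    time.sleep(0.5)  # be polite to the server between requests

Writing one file per chapter makes it easy to resume a partial crawl or re-fetch a single chapter, at the cost of slightly more filesystem bookkeeping than the single-file approach above.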