Automatically Downloading All Files from a Website with Python

To download the files from a website automatically with Python, follow the steps below:

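Import the required modules: this crawler relies on two third-party modules, requests and beautifulsoup4, so install both with pip first (pip install requests beautifulsoup4). Once they are installed, import them in your Python script with import statements.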

import requests
from bs4 import BeautifulSoup

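Fetch the page's HTML source: call the get method of the requests module with the URL of the target site to retrieve the page source. Once the request succeeds, parse the HTML with BeautifulSoup so that the file links can be located.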

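url = 'https://example.com'
response = requests.get(url, timeout=10)  # avoid hanging forever on a slow server
response.raise_for_status()  # stop early on 4xx/5xx responses
soup = BeautifulSoup(response.text, 'html.parser')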

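Parse the HTML and extract the file links: use BeautifulSoup's find_all or find method to pull out every link in the HTML that points to a file you want to download, and collect the links in a list.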

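from urllib.parse import urljoin

file_links = []
for link in soup.find_all('a', href=True):  # skip <a> tags that have no href
    href = link['href']
    if href.lower().endswith('.pdf'):
        # urljoin resolves relative and root-relative hrefs against the page URL
        file_links.append(urljoin(url, href))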
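Relative links are the usual stumbling block in this step: an href can be relative to the page, relative to the site root, already absolute, or missing altogether. The following is a minimal sketch of how these cases could be handled with the standard library only. The helper name resolve_links and the sample URLs are assumptions made for illustration; they do not come from the original article.

```python
from urllib.parse import urljoin, urlparse


def resolve_links(page_url, hrefs, suffix):
    """Return absolute URLs for the hrefs whose path ends with `suffix`."""
    resolved = []
    for href in hrefs:
        if not href:  # <a> tags without an href yield None
            continue
        absolute = urljoin(page_url, href)
        # Compare against the path only, so '?download=1' does not hide the extension.
        if urlparse(absolute).path.lower().endswith(suffix):
            resolved.append(absolute)
    return resolved


# Hypothetical page URL and hrefs, chosen to cover each case.
page_url = 'https://example.com/docs/index.html'
hrefs = ['a.pdf', '/files/b.pdf', 'https://cdn.example.com/c.pdf?download=1', None, 'about.html']
for link in resolve_links(page_url, hrefs, '.pdf'):
    print(link)
```

In practice the hrefs list would come from something like [a.get('href') for a in soup.find_all('a')]. Plain string concatenation (url + href) only works when every href happens to be a path that fits directly after the page URL, which is rarely guaranteed.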

Download all the files: iterate over the list of file links and download each one with the get method of the requests module.

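for link in file_links:
    filename = link.split('/')[-1]  # use the last path segment as the file name
    response = requests.get(link, timeout=30)
    response.raise_for_status()  # do not save an error page as a file
    with open(filename, 'wb') as f:
        f.write(response.content)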
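Taking the last segment of the link as the file name breaks down when the URL carries a query string, contains percent-encoded characters, or ends with a slash. The sketch below shows one possible way to derive a safer name using only the standard library. The function name filename_from_url and the fallback name download.bin are assumptions introduced here for illustration.

```python
import posixpath
from urllib.parse import unquote, urlparse


def filename_from_url(link, default='download.bin'):
    """Derive a local file name from the path component of a URL."""
    path = urlparse(link).path                    # drops '?query' and '#fragment'
    name = posixpath.basename(unquote(path))      # decode %20 and similar escapes
    return name or default                        # fall back when the path ends in '/'


# Hypothetical links covering the three problem cases.
print(filename_from_url('https://example.com/files/report.pdf?version=2'))
print(filename_from_url('https://example.com/files/annual%20report.pdf'))
print(filename_from_url('https://example.com/files/'))
```

The three calls print report.pdf, annual report.pdf, and download.bin respectively. Two different links can still map to the same name, so a real script may also want to check whether the file already exists before writing.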

Here are two examples:

Example 1: Download the movie posters from a well-known movie site

import requests
from bs4 import BeautifulSoup

url = 'https://www.imdb.com'
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on 4xx/5xx responses
soup = BeautifulSoup(response.text, 'html.parser')

# Collect the .jpg images whose src matches the prefix used in the filter
poster_links = []
for link in soup.find_all('img', src=True):  # only <img> tags that have a src
    src = link['src']
    if src.startswith('https://m.media-amazon.com/images/') and src.endswith('.jpg'):
        poster_links.append(src)

for link in poster_links:
    filename = link.split('/')[-1]
    response = requests.get(link, timeout=30)
    response.raise_for_status()  # do not save an error page as an image
    with open(filename, 'wb') as f:
        f.write(response.content)

Example 2: Download the audio files from a music site

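from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

url = 'https://www.example.com/music'
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on 4xx/5xx responses
soup = BeautifulSoup(response.text, 'html.parser')

audio_links = []
for link in soup.find_all('a', href=True):  # skip <a> tags that have no href
    href = link['href']
    if href.lower().endswith('.mp3'):
        # Resolve relative links against the page URL instead of concatenating
        audio_links.append(urljoin(url, href))

for link in audio_links:
    filename = link.split('/')[-1]
    response = requests.get(link, timeout=30)
    response.raise_for_status()  # do not save an error page as audio
    with open(filename, 'wb') as f:
        f.write(response.content)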

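With the code above you can automatically download every matching file linked from the target page. Note that when downloading a large number of files, you need to keep an eye on the disk space they take up and the network bandwidth they consume.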
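As a rough illustration of the disk-space and bandwidth concerns, the sketch below removes duplicate links, checks free disk space before starting, and pauses between items. The helper names (unique_links, has_free_space, throttled), the 100 MB threshold, and the 0.1-second delay are all assumptions chosen for the example, not recommendations from the original article.

```python
import shutil
import time


def unique_links(links):
    """Drop repeated links while keeping the first-seen order."""
    return list(dict.fromkeys(links))


def has_free_space(directory, required_bytes):
    """Check whether `directory` has at least `required_bytes` free."""
    return shutil.disk_usage(directory).free >= required_bytes


def throttled(items, delay_seconds):
    """Yield items one by one, pausing between them to go easy on the server."""
    for index, item in enumerate(items):
        if index:  # no pause before the first item
            time.sleep(delay_seconds)
        yield item


# Hypothetical link list containing one duplicate.
links = unique_links([
    'https://example.com/a.pdf',
    'https://example.com/b.pdf',
    'https://example.com/a.pdf',
])

if has_free_space('.', 100 * 1024 * 1024):  # assumed budget of 100 MB
    for link in throttled(links, delay_seconds=0.1):
        print('would download', link)  # the real download call would go here
```

The download loops shown earlier could iterate over throttled(file_links, delay) in place of the plain list. A suitable delay depends on the target site and its terms of use.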

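Unless otherwise noted, articles on this site are original. If you repost this article, please credit the source: Automatically Downloading All Files from a Website with Python - Python技术站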