Implementing Scheduled Crawler Tasks with Python's while True

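A simple way to run a crawler on a schedule is to combine a while True loop with time.sleep(): the loop body does the actual crawling, and the sleep call sets the interval between rounds. The steps are as follows: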

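1. Import the required modules

The first modules to import are requests, for making HTTP requests, and BeautifulSoup, for parsing pages (it ships in the beautifulsoup4 package and is imported from bs4). The time module is also needed to set the interval between rounds.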

import requests
from bs4 import BeautifulSoup
import time

2. Write the main body of the crawler

The actual crawling work, such as fetching data from a particular site, goes inside the while True loop.

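while True:
    # Fetch the page; the timeout keeps a stalled request from blocking the loop
    html = requests.get('http://example.com', timeout=10).content

    # Parse the page and look for the target element
    soup = BeautifulSoup(html, 'html.parser')
    node = soup.find('div', {'class': 'data'})

    # Append the data to a file; find() returns None when nothing matches
    if node is not None:
        with open('data.txt', 'a', encoding='utf-8') as f:
            f.write(node.text + '\n')

    # Sleep for 10 seconds
    time.sleep(10)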

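In this example we first fetch the content of http://example.com, then parse the data out of it, and finally append that data to data.txt. The program then sleeps for 10 seconds and starts the next round of the loop.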
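
One weakness of the loop above is that a single network error or parsing failure raises an exception and ends the whole program. The sketch below shows one possible way to keep the loop alive across failed rounds. The names run_periodically, max_runs and sleep are assumptions introduced here for illustration (the last two exist only to make the loop easy to try out); they are not part of the original example.

```python
import time


def run_periodically(task, interval_seconds, max_runs=None, sleep=time.sleep):
    """Call task() repeatedly, pausing interval_seconds after each call.

    max_runs and sleep are hypothetical conveniences for testing; with
    max_runs=None the loop behaves like a plain `while True`.
    Returns (rounds run, rounds that failed).
    """
    runs = 0
    failures = 0
    while max_runs is None or runs < max_runs:
        try:
            task()
        except Exception as exc:
            # Log the failed round and carry on. KeyboardInterrupt is not a
            # subclass of Exception, so Ctrl+C still stops the program.
            failures += 1
            print(f"round {runs + 1} failed: {exc!r}")
        runs += 1
        sleep(interval_seconds)
    return runs, failures


# Typical use, where crawl_once is your own function holding the
# fetch / parse / save steps from the loop body:
#     run_periodically(crawl_once, 10)
```

Catching Exception rather than using a bare except is deliberate here: it lets a keyboard interrupt pass through, so the exit mechanism in the next step still works.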

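3. Add an exit condition to the loop

Because the loop repeats forever, it needs a way to exit, for example when the user presses Ctrl+C or when some other interrupt signal is received.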

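try:
    while True:
        # Fetch the page; the timeout keeps a stalled request from blocking the loop
        html = requests.get('http://example.com', timeout=10).content

        # Parse the page and look for the target element
        soup = BeautifulSoup(html, 'html.parser')
        node = soup.find('div', {'class': 'data'})

        # Append the data to a file; find() returns None when nothing matches
        if node is not None:
            with open('data.txt', 'a', encoding='utf-8') as f:
                f.write(node.text + '\n')

        # Sleep for 10 seconds
        time.sleep(10)

except KeyboardInterrupt:
    # Ctrl+C was pressed: leave the loop and exit quietly
    pass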

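In this example the try...except statement catches the KeyboardInterrupt that Python raises when Ctrl+C is pressed, so the program exits cleanly instead of printing a traceback. Note that KeyboardInterrupt only covers Ctrl+C (SIGINT); other signals, such as SIGTERM, are not caught by this handler and need to be handled separately.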
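
For signals other than Ctrl+C, one possible approach is to register a handler with the standard signal module and let the loop check a threading.Event between rounds. The following is only a sketch under that assumption: the names stop_requested, request_stop, install_signal_handlers and crawl_until_stopped are hypothetical and do not appear in the original example.

```python
import signal
import threading

# Set by the signal handler; checked by the loop between rounds.
stop_requested = threading.Event()


def request_stop(signum=None, frame=None):
    """Signal handler: ask the loop to finish instead of killing it mid-round."""
    stop_requested.set()


def install_signal_handlers():
    """Route Ctrl+C (SIGINT) and SIGTERM to request_stop.

    signal.signal() must be called from the main thread.
    """
    signal.signal(signal.SIGINT, request_stop)
    signal.signal(signal.SIGTERM, request_stop)


def crawl_until_stopped(task, interval_seconds, stop_event=stop_requested):
    """Run task() until stop_event is set; return the number of completed rounds."""
    rounds = 0
    while not stop_event.is_set():
        task()
        rounds += 1
        # wait() pauses like time.sleep() but returns early once the event is set
        stop_event.wait(interval_seconds)
    return rounds


# Typical use, where crawl_once is your own crawling function:
#     install_signal_handlers()
#     crawl_until_stopped(crawl_once, 10)
```

Compared with catching KeyboardInterrupt, this lets the current round finish (including its file write) before the program exits, and it shortens the wait because Event.wait() returns as soon as a stop is requested.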

Example 1

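try:
    while True:
        # Crawl the trending searches on the 360 navigation page
        r = requests.get('https://hao.360.com/?src=3600&ls=nf29aac9ced', timeout=10)
        soup = BeautifulSoup(r.content, 'html.parser')
        data = soup.find_all('a', class_='tag-link')

        # Append the matched elements to a file, one round per line
        with open('hotsearch.txt', 'a', encoding='utf-8') as f:
            f.write(str(data) + '\n')

        # Sleep for 5 minutes
        time.sleep(300)

except KeyboardInterrupt:
    # Ctrl+C was pressed: leave the loop and exit quietly
    pass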

This example crawls the trending searches on the 360 navigation page, writes the results to hotsearch.txt, and sleeps for 5 minutes between rounds.
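
Writing str(data) stores the raw HTML of every matched tag, which makes the output file hard to read later. As a rough sketch of an alternative, the helper below turns a list of link texts into timestamped lines. The name format_records and the tab-separated layout are assumptions made for illustration; get_text() in the usage comment is the standard BeautifulSoup method.

```python
from datetime import datetime


def format_records(texts, when=None):
    """Return one 'timestamp<TAB>text' line per non-empty entry in texts."""
    stamp = (when or datetime.now()).isoformat(timespec="seconds")
    lines = []
    for text in texts:
        cleaned = " ".join(text.split())  # collapse inner whitespace and newlines
        if cleaned:
            lines.append(f"{stamp}\t{cleaned}\n")
    return lines


# Inside the loop, with `data` taken from soup.find_all(...):
#     texts = [a.get_text(strip=True) for a in data]
#     with open('hotsearch.txt', 'a', encoding='utf-8') as f:
#         f.writelines(format_records(texts))
```

With one record per line and a timestamp on each, the results of different rounds can be told apart when the file is read back.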

Example 2

try:
    while True:
        # Crawl the job list on Lagou; a browser-like User-Agent is sent with the request
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0;Win64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
        r = requests.get('https://www.lagou.com/zhaopin/', headers=headers, timeout=10)
        soup = BeautifulSoup(r.content, 'html.parser')
        data = soup.find_all('li', class_='con_list_item')

        # Append the matched elements to a file, one round per line
        with open('joblist.txt', 'a', encoding='utf-8') as f:
            f.write(str(data) + '\n')

        # Sleep for 10 minutes
        time.sleep(600)

except KeyboardInterrupt:
    # Ctrl+C was pressed: leave the loop and exit quietly
    pass

This example crawls the job list on Lagou and writes the results to joblist.txt, sleeping for 10 minutes between rounds.
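
With a 10-minute interval, a single failed request means a long wait before the next attempt. One option is to retry a few times within the same round before giving up. The sketch below assumes a hypothetical helper named fetch_with_retry with a doubling delay between attempts; the injected fetch and sleep parameters are illustrative and not part of the original example.

```python
import time


def fetch_with_retry(fetch, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts run out.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts
    and re-raises the last error if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller decide what to do
            sleep(base_delay * 2 ** attempt)


# Typical use inside the loop (url and headers as in the example above):
#     def fetch():
#         r = requests.get(url, headers=headers, timeout=10)
#         r.raise_for_status()  # treat HTTP error status codes as failures
#         return r
#     r = fetch_with_retry(fetch)
```

Doubling the delay keeps the retries from hammering a site that is already struggling, while still recovering quickly from a brief network hiccup.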

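Unless otherwise noted, articles on this site are original work. If you repost this article, please credit the source: Implementing Scheduled Crawler Tasks with Python's while True - Python技术站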
