python_crawler_scraping 7*24-hour financial news

April 11, 2023, 4:09 AM • Crawler

import time

import requests
from bs4 import BeautifulSoup

def sina():
    is_first = True
    task_q = []  # local copy of the news items
    task_time = []
    while True:
        data_list = getNews()

        if is_first:
            task_q = data_list
            for data in data_list:
                print(data['n_time'],data['n_info'])
                time.sleep(0.5)
                task_time.append(data['n_time'])
            is_first = False
        else:
            for data in data_list:
                if data['n_time'] in task_time:
                    pass
                else:
                    task_time.append(data['n_time'])
                    print('-'*30)
                    print('New item', data['n_time'], data['n_info'])

        time.sleep(5)


def getNews():  # fetch the current list of news items
    news_list = []
    base_url = 'http://live.sina.com.cn/zt/f/v/finance/globalnews1'
    # A timeout keeps the polling loop from hanging forever on a stalled request.
    response = requests.get(base_url, timeout=10)
    response.encoding = response.apparent_encoding
    html = response.text

    html_bs4 = BeautifulSoup(html,'lxml')
    info_list = html_bs4.find_all('div', {'data-nick': 'fin_图文直播'})

    for info in info_list:  # the auto-refreshing news entries on the page
        n_time = info.select('p[class="bd_i_time_c"]')[0].get_text()  # timestamp of the item
        n_info = info.select('p[class="bd_i_txt_c"]')[0].get_text()  # text of the item
        data = {
            'n_time': n_time,
            'n_info': n_info
        }
        news_list.append(data)
    return news_list[::-1]  # reverse the list so older items print first and newer ones last


if __name__ == '__main__':
    sina()


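'''
1. First fetch the 15 news items on the page.
2. Put the 15 items in a list and pass it on.
3. Request the page again at a fixed interval (time.sleep(5) in the code, i.e. every 5 seconds),
   compare the times on the page with the times already in the list, and read any item whose time is not there yet.
'''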
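
The three steps above can be sketched as a small helper that is separate from the network code. This is only an illustration under stated assumptions, not the author's method: `find_new_items` and `poll` are hypothetical names, and the sketch keys each item on both `n_time` and `n_info` (the script above compares `n_time` only, so two different items with the same timestamp would be treated as one). A set is used instead of a list so the membership check stays fast as the history grows.

```python
import time
from typing import Callable, Dict, List, Set, Tuple

NewsItem = Dict[str, str]


def find_new_items(items: List[NewsItem], seen: Set[Tuple[str, str]]) -> List[NewsItem]:
    """Return the items not seen before, in order, and record them in `seen`."""
    fresh = []
    for item in items:
        # Assumption: timestamp plus text identifies an item.
        key = (item['n_time'], item['n_info'])
        if key not in seen:
            seen.add(key)
            fresh.append(item)
    return fresh


def poll(fetch: Callable[[], List[NewsItem]], rounds: int, interval: float = 5.0) -> List[NewsItem]:
    """Call `fetch` `rounds` times, printing and collecting only unseen items."""
    seen: Set[Tuple[str, str]] = set()
    printed = []
    for i in range(rounds):
        for item in find_new_items(fetch(), seen):
            print(item['n_time'], item['n_info'])
            printed.append(item)
        if i < rounds - 1:
            time.sleep(interval)  # wait before the next request
    return printed
```

With the script above, `poll(getNews, rounds=3)` would request the page three times, five seconds apart. Unlike `sina()`, this sketch stops after a fixed number of rounds, which makes it easier to try out with a fake `fetch` function.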

Unless otherwise stated, articles on this site are original. If you repost, please credit the source: python_crawler_scraping 7*24-hour financial news - Python技术站



