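Scraping Sohu Securities Stock Data with Python: A Walkthrough

This guide walks through scraping stock data from Sohu Securities with Python, step by step.

1. Fetching the page

First, use Python's requests library to send an HTTP request and retrieve the stock data from Sohu Securities. The following code does this: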

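import requests

# Historical-quote endpoint; callback and rt=jsonp request a JSONP response
url = 'https://q.stock.sohu.com/hisHq?code=cn_600519&start=20220101&end=20220131&stat=1&order=D&period=d&callback=historySearchHandler&rt=jsonp'

response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on an HTTP error status
# Decode the raw bytes as GBK
html_content = response.content.decode('gbk')
print(html_content)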

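This example fetches the January 2022 daily history for Kweichow Moutai (stock code cn_600519). The code prints the raw response text. Because the URL sets rt=jsonp and callback=historySearchHandler, that text is expected to be a JSONP string, that is, a JSON payload wrapped in a historySearchHandler(...) call, rather than an HTML page.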
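The long query string is easier to read and change when it is built from a dictionary. The sketch below rebuilds the same URL with the standard library; `build_history_url` is a hypothetical helper name, and the parameter values are simply copied from the URL above without assuming anything further about what each one means.

```python
from urllib.parse import urlencode

BASE_URL = 'https://q.stock.sohu.com/hisHq'

def build_history_url(code, start, end):
    """Build the history URL (hypothetical helper; values copied from the example URL)."""
    params = {
        'code': code,      # e.g. 'cn_600519'
        'start': start,    # start date as YYYYMMDD
        'end': end,        # end date as YYYYMMDD
        'stat': 1,
        'order': 'D',
        'period': 'd',
        'callback': 'historySearchHandler',
        'rt': 'jsonp',
    }
    # urlencode keeps the dictionary's insertion order
    return BASE_URL + '?' + urlencode(params)

print(build_history_url('cn_600519', '20220101', '20220131'))
```

The printed URL should match the hard-coded one character for character, so the helper can be swapped in without changing the request.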

2. Parsing the response

Next, use Python's re module to parse the response text and extract the part that is needed. The following code does this:

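import re

# Capture everything between the parentheses of the JSONP callback.
# Greedy matching with re.S keeps the nested brackets of the payload intact.
pattern = r'historySearchHandler\((.*)\)'
match = re.search(pattern, html_content, re.S)
if match is None:
    raise ValueError('unexpected response format')
data_str = match.group(1)
print(data_str)

The regular expression strips the historySearchHandler(...) wrapper and keeps the JSON text inside it, which the code then prints.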
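To try the extraction step without a network call, here is a minimal sketch that unwraps a JSONP string and parses it. `strip_jsonp` is a hypothetical helper name, and the sample string is made up to mimic the assumed response shape (a list holding one object with an `hq` key); it is not real market data.

```python
import json
import re

def strip_jsonp(text, callback='historySearchHandler'):
    """Return the JSON text wrapped by callback(...); raise ValueError if absent."""
    pattern = re.escape(callback) + r'\((.*)\)'
    match = re.search(pattern, text, re.S)  # greedy: runs to the last ')'
    if match is None:
        raise ValueError('no JSONP wrapper found')
    return match.group(1)

# Made-up sample in the assumed response shape, not real market data
sample = ('historySearchHandler([{"status":0,'
          '"hq":[["2022-01-04","10.00","10.50"]],"code":"cn_000000"}])\n')

payload = json.loads(strip_jsonp(sample))
print(payload[0]['hq'][0][0])
```

A non-greedy bracket pattern such as `\[(.*?)\]` would stop at the first closing bracket of a nested payload like this one and return text that `json.loads` cannot parse, which is why the sketch matches on the callback's parentheses instead.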

3. Parsing the data

Next, parse the JSON data and extract the stock's historical records. The following code does this:

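import json

# The payload is assumed to be a list holding one result object whose 'hq' key holds the daily rows
data_list = json.loads(data_str)
history_list = []
for items in data_list[0]['hq']:
    # Assumed row layout: date, open, close, change, change %, low, high, volume, turnover, turnover rate
    history = {}
    history['date'] = items[0]
    history['open'] = float(items[1])
    history['close'] = float(items[2])
    history['high'] = float(items[6])
    history['low'] = float(items[5])
    history_list.append(history)
print(history_list)

The code parses the JSON text into a list of historical records. Each record is a dictionary holding the date, opening price, closing price, highest price, and lowest price. The index mapping assumes the row layout commonly reported for this endpoint (date, open, close, change, change percent, low, high, volume, turnover, turnover rate), so check it against an actual response before relying on it.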
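Indexing rows by position is easy to get wrong, so naming every column once can help. The sketch below maps one row to a dictionary with `zip`. The field names in `FIELDS` and the helper `row_to_record` are hypothetical, the column order is the assumed layout described above, and the sample row is made up.

```python
# Assumed column order for one daily row; the names are illustrative only
FIELDS = ('date', 'open', 'close', 'change', 'change_pct',
          'low', 'high', 'volume', 'turnover', 'turnover_rate')

def row_to_record(row):
    """Convert one row of strings into a dict, casting the numeric fields."""
    record = dict(zip(FIELDS, row))
    for key in ('open', 'close', 'change', 'low', 'high', 'volume', 'turnover'):
        record[key] = float(record[key])
    # Percent fields are assumed to look like '5.00%'; drop the trailing '%'
    for key in ('change_pct', 'turnover_rate'):
        record[key] = float(record[key].rstrip('%'))
    return record

# Made-up row, not real market data
sample_row = ['2022-01-04', '10.00', '10.50', '0.50', '5.00%',
              '9.90', '10.60', '1200', '1250.00', '0.12%']
record = row_to_record(sample_row)
print(record['date'], record['low'], record['high'])
```

If the real layout differs, only the `FIELDS` tuple needs to change.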

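Example 1

The same steps apply to any stock on the Shanghai and Shenzhen markets; only the code parameter in the URL changes. To fetch the history of a particular stock, wrap the steps in a function as follows: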

import requests
import re
import json

def crawl_stock_history(stock_code, start_date, end_date):
    """Fetch daily history for stock_code between start_date and end_date (YYYYMMDD)."""
    url_tpl = 'https://q.stock.sohu.com/hisHq?code=%s&start=%s&end=%s&stat=1&order=D&period=d&callback=historySearchHandler&rt=jsonp'
    url = url_tpl % (stock_code, start_date, end_date)

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    html_content = response.content.decode('gbk')

    # Strip the JSONP wrapper and keep the JSON text inside the parentheses
    pattern = r'historySearchHandler\((.*)\)'
    match = re.search(pattern, html_content, re.S)
    if match is None:
        raise ValueError('unexpected response format')
    data_str = match.group(1)

    history_list = []
    data_list = json.loads(data_str)
    # Assumed row layout: date, open, close, change, change %, low, high, volume, turnover, turnover rate
    for items in data_list[0].get('hq', []):  # tolerate a missing 'hq' key
        history = {}
        history['date'] = items[0]
        history['open'] = float(items[1])
        history['close'] = float(items[2])
        history['high'] = float(items[6])
        history['low'] = float(items[5])
        history_list.append(history)

    return history_list

stock_code = 'cn_600519'
start_date = '20220101'
end_date = '20220131'
history_list = crawl_stock_history(stock_code, start_date, end_date)
print(history_list)

The code adds a crawl_stock_history function that fetches the historical data for a given stock code, start date, and end date.
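With a per-stock function in place, fetching several stocks is a loop. The sketch below is one possible shape, not part of the original code: `crawl_many` is a hypothetical helper that takes the fetch function as an argument, pauses between requests, and records failures instead of stopping. A stand-in fetcher and a made-up code `cn_000000` let it run offline.

```python
import time

def crawl_many(stock_codes, start_date, end_date, fetch, pause=0.5):
    """Call fetch(code, start_date, end_date) per code; collect results and errors."""
    results, errors = {}, {}
    for i, code in enumerate(stock_codes):
        if i:
            time.sleep(pause)  # short pause between consecutive requests
        try:
            results[code] = fetch(code, start_date, end_date)
        except Exception as exc:  # keep going if one stock fails
            errors[code] = exc
    return results, errors

# Stand-in fetcher so the sketch runs offline; returns made-up data
def fake_fetch(code, start_date, end_date):
    if code == 'cn_000000':
        raise ValueError('no data')
    return [{'date': '2022-01-04', 'close': 10.5}]

results, errors = crawl_many(['cn_600519', 'cn_000000'],
                             '20220101', '20220131', fake_fetch, pause=0)
print(sorted(results), sorted(errors))
```

For real use, pass `crawl_stock_history` as `fetch` and keep a non-zero `pause`.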

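Example 2

Technical indicators such as moving averages and MACD are often computed from historical stock data. Python's ta library, which works on pandas data, provides these indicators, and the results can be added to the historical records. For example, the code below adds the MA5 and MA10 indicators to the stock history: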

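import pandas as pd
from ta.trend import SMAIndicator

def add_tech_indicators(history_list):
    """Add 5- and 10-day simple moving averages of the close as 'ma5' and 'ma10'."""
    # Sort oldest-first; with order=D the rows are assumed to arrive newest-first
    df = pd.DataFrame(history_list).sort_values('date').reset_index(drop=True)
    df['ma5'] = SMAIndicator(close=df['close'], window=5).sma_indicator()
    df['ma10'] = SMAIndicator(close=df['close'], window=10).sma_indicator()
    history_list = df.to_dict('records')
    return history_list

# Relies on crawl_stock_history from Example 1
stock_code = 'cn_600519'
start_date = '20220101'
end_date = '20220131'
history_list = crawl_stock_history(stock_code, start_date, end_date)
history_list = add_tech_indicators(history_list)
print(history_list)

The code uses the ta library to compute MA5 and MA10 from the closing prices and adds both to the stock history, so each record gains two new fields, ma5 and ma10. The earliest records hold NaN for these fields because a 5-day or 10-day average needs at least that many prior closes.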
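To make clear what MA5 and MA10 actually compute, here is a dependency-free sketch of a simple moving average. It is an illustration of the arithmetic, not the ta library's implementation; `simple_moving_average` is a hypothetical helper, and the closing prices are made up.

```python
def simple_moving_average(values, window):
    """Entry i is the mean of the last `window` values, or None if too few so far."""
    result = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        result.append(running / window if i >= window - 1 else None)
    return result

closes = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]  # made-up closes, oldest first
print(simple_moving_average(closes, 5))
# [None, None, None, None, 12.0, 13.0]
```

The input must be ordered oldest first; averaging a newest-first list would give each date the mean of the days after it rather than before it.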

