Python Data Analysis: Time Series Analysis

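Time series analysis is an important branch of data analysis that deals with data recorded at successive points or intervals in time. Python's data analysis tools can analyze and visualize such data, helping us understand trends, seasonality, cyclical patterns, and other dependencies.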

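Reading time series data

First, we need to read and prepare the time series data. In Python, the pandas library handles both reading and processing. The following simple example reads a CSV file and converts its date/time column into a datetime index. We use the walmart_stock.csv dataset: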

import pandas as pd

# Read the CSV file
df = pd.read_csv('walmart_stock.csv')

# Convert the date/time column to datetimes and use it as the index
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)
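
The same preparation can also be done while reading the file. The sketch below is only an illustration: it uses a tiny in-memory CSV as a hypothetical stand-in for walmart_stock.csv (the real file's columns and values may differ), and the name prices is made up for this example. It also sorts the index, since most time series tools expect timestamps in increasing order.

```python
import io

import pandas as pd

# Hypothetical stand-in for walmart_stock.csv; the real file may look different.
csv_text = """Date,Close
2012-01-05,59.42
2012-01-03,60.33
2012-01-04,59.71
"""

# parse_dates and index_col build the DatetimeIndex while reading.
prices = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

# Sort chronologically so that later steps see increasing timestamps.
prices = prices.sort_index()

print(prices.index.is_monotonic_increasing)
print(prices)
```

With a real file, io.StringIO(csv_text) would simply be replaced by the file path.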

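Visualizing time series data

We can use the matplotlib and seaborn libraries to visualize time series data. The following simple example plots the closing prices from the walmart_stock.csv dataset: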

import matplotlib.pyplot as plt
import seaborn as sns

# Set the plot style and create the figure
sns.set_style('whitegrid')
fig, ax = plt.subplots(figsize=(12, 6))

# Plot the closing prices
ax.plot(df['Close'])

# Add axis labels and a title
ax.set_xlabel('Date')
ax.set_ylabel('Closing Price ($)')
ax.set_title('Walmart Stock Closing Prices')
plt.show()
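
A raw price line can be noisy, so a rolling mean is a common way to make the trend easier to see. The sketch below is a minimal illustration on synthetic data standing in for df['Close']; the 30-day window and the names close and trend are assumptions chosen for this example, not values from the dataset.

```python
import numpy as np
import pandas as pd

# Synthetic daily prices standing in for df['Close'] (not real data).
rng = np.random.default_rng(0)
dates = pd.date_range("2020-01-01", periods=120, freq="D")
close = pd.Series(60 + rng.normal(0, 1, len(dates)).cumsum(), index=dates, name="Close")

# 30-day rolling mean. The first window - 1 entries are NaN because
# the window is not yet full.
window = 30
trend = close.rolling(window=window).mean()

print(trend.dropna().head())
```

The resulting trend series can be passed to ax.plot alongside the original series to overlay both lines on one chart.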

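Testing time series data for stationarity

Many time series models, including AR, MA, and ARMA, assume that the series is stationary, meaning that its mean and variance do not change over time. Before modeling, we therefore check the data for stationarity. A common choice is the augmented Dickey-Fuller (ADF) unit root test. Its null hypothesis is that the series has a unit root (is non-stationary), so a p-value below a chosen significance level such as 0.05 is evidence of stationarity. The following simple example runs the ADF test on the closing prices in walmart_stock.csv: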

from statsmodels.tsa.stattools import adfuller

# Run the ADF unit root test
result = adfuller(df['Close'])

# Print the test results
print('ADF Statistic:', result[0])
print('p-value:', result[1])
print('Critical Values:')
for key, value in result[4].items():
    print('\t%s: %.3f' % (key, value))

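Differencing time series data

If the series is not stationary, we can difference it to obtain a stationary series. Differencing computes the change between each pair of adjacent time points. The following simple example differences the closing prices in walmart_stock.csv and plots the result: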

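# Difference the closing prices (the first value is NaN and is dropped)
df_diff = df['Close'].diff().dropna()

# Set the plot style and create the figure
sns.set_style('whitegrid')
fig, ax = plt.subplots(figsize=(12, 6))

# Plot the differenced series
ax.plot(df_diff)

# Add axis labels and a title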
ax.set_xlabel('Date')
ax.set_ylabel('Difference of Closing Price ($)')
ax.set_title('Walmart Stock Closing Prices (Differenced)')
plt.show()
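
Differencing is easy to check by hand on a few numbers, and it can be undone with a cumulative sum as long as the first original value is kept. The sketch below uses four made-up prices (not taken from the dataset) to show both directions.

```python
import pandas as pd

# Four made-up closing prices, for illustration only.
close = pd.Series(
    [10.0, 12.0, 11.0, 15.0],
    index=pd.date_range("2020-01-01", periods=4, freq="D"),
)

# First difference: x[t] - x[t-1]. The first entry is NaN and is dropped.
diff = close.diff().dropna()

# Undo the differencing: cumulative sum of the changes plus the first original value.
restored = diff.cumsum() + close.iloc[0]

print(diff.tolist())      # [2.0, -1.0, 4.0]
print(restored.tolist())  # [12.0, 11.0, 15.0]
```

The restored series matches the original from its second value onward, which is why forecasts made on a differenced series can be converted back to the original scale.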

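Fitting time series data

Once we know how much differencing the series needs, we can fit a model. Common choices are the AR, MA, ARMA, and ARIMA models. Note that the d term in ARIMA(p, d, q) applies the differencing inside the model, so the example below passes the original closing prices with order=(1, 1, 1) rather than the differenced series. It fits the model to the closing prices in walmart_stock.csv and plots the fitted values: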

from statsmodels.tsa.arima.model import ARIMA

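# Fit an ARIMA(1, 1, 1) model to the closing prices
model = ARIMA(df['Close'], order=(1, 1, 1))
model_fit = model.fit()

# Set the plot style and create the figure
sns.set_style('whitegrid')
fig, ax = plt.subplots(figsize=(12, 6))

# Plot the original data and the fitted values.
# The first fitted value is skipped: with d=1 there is no earlier
# observation to difference against, so it is not a meaningful prediction.
ax.plot(df['Close'], label='Actual')
ax.plot(model_fit.fittedvalues.iloc[1:], label='ARIMA(1,1,1) Model')

# Add axis labels and a title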
ax.set_xlabel('Date')
ax.set_ylabel('Closing Price ($)')
ax.set_title('Walmart Stock Closing Prices (ARIMA(1,1,1) Model)')
ax.legend()
plt.show()

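This concludes the brief introduction to time series analysis with Python, covering data reading, visualization, stationarity testing, differencing, and model fitting. If you work with time series data, you can use these tools to explore and analyze it and to forecast future trends.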

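Examples

The following two examples illustrate applications of time series analysis:

Example 1: Correlation between wind speed and power output

In this example, we use pandas and matplotlib to examine the relationship between wind speed and power output. We generate our own dataset containing both variables. Here is the code: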

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Generate synthetic wind speed and power output data
wind_speed = np.random.normal(10, 3, 1000)
power_output = np.sin(wind_speed) + np.random.normal(0, 0.5, 1000)

# Put the data into a pandas DataFrame
df = pd.DataFrame({'Wind Speed': wind_speed, 'Power Output': power_output})

# Draw a scatter plot of wind speed against power output
plt.scatter(df['Wind Speed'], df['Power Output'])
plt.xlabel('Wind Speed')
plt.ylabel('Power Output')
plt.title('Wind Speed vs. Power Output Scatter Plot')
plt.show()

Running the code above produces a scatter plot that shows how wind speed and power output relate to each other.
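
A scatter plot shows the relationship visually; a correlation coefficient puts a number on it. The sketch below regenerates the same kind of synthetic data with a fixed seed (an assumption added here for reproducibility) and computes the Pearson coefficient with pandas.

```python
import numpy as np
import pandas as pd

# Same construction as the example data, with a seeded generator.
rng = np.random.default_rng(0)
wind_speed = rng.normal(10, 3, 1000)
power_output = np.sin(wind_speed) + rng.normal(0, 0.5, 1000)
df = pd.DataFrame({"Wind Speed": wind_speed, "Power Output": power_output})

# Series.corr uses the Pearson coefficient by default.
r = df["Wind Speed"].corr(df["Power Output"])
print(f"Pearson r = {r:.3f}")
```

Pearson's coefficient only measures linear association. Because power output here is built from the sine of wind speed, the coefficient can be small even though the two variables are clearly dependent, which is one reason to look at the scatter plot as well.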

Next, we can use pandas and matplotlib to draw a line chart of wind speed and power output over time. Here is the code:

# Build an hourly DatetimeIndex ('h' is the hourly alias; the uppercase 'H' is deprecated in recent pandas)
dti = pd.date_range(start='2022-01-01', end='2022-02-01', freq='h')

# Build the time series DataFrame, truncating the data to the length of the index
df = pd.DataFrame({'Date/Time': dti, 'Wind Speed': wind_speed[:dti.size], 'Power Output': power_output[:dti.size]})
df.set_index('Date/Time', inplace=True)

# Create the figure
fig, ax = plt.subplots(figsize=(12, 6))

# Plot wind speed and power output over time
ax.plot(df.index, df['Wind Speed'], label='Wind Speed')
ax.plot(df.index, df['Power Output'], label='Power Output')

# Add axis labels and a title
ax.set_xlabel('Date/Time')
ax.set_ylabel('Value')
ax.set_title('Wind Speed and Power Output Time Series Data')
ax.legend()
plt.show()

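Running the code above produces a time series plot that shows how wind speed and power output change over time and how they move relative to each other.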
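
Hourly data over a month gives several hundred points per line, which can be hard to read. Resampling to a coarser frequency is one way to summarize it. The sketch below is an illustration on synthetic hourly wind speeds covering the same date range; the daily mean and the names hourly and daily are assumptions chosen for this example.

```python
import numpy as np
import pandas as pd

# Hourly timestamps from 2022-01-01 00:00 to 2022-02-01 00:00 inclusive.
index = pd.date_range("2022-01-01", "2022-02-01", freq=pd.Timedelta(hours=1))

# Synthetic hourly wind speeds (not real measurements).
rng = np.random.default_rng(0)
hourly = pd.DataFrame({"Wind Speed": rng.normal(10, 3, len(index))}, index=index)

# Aggregate to one row per calendar day using the mean of that day's readings.
daily = hourly.resample("D").mean()

print(len(hourly), len(daily))
print(daily.head())
```

The last daily row covers 1 February, which contains only the single midnight reading, so it is worth checking the edges of the range before interpreting the aggregated values.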

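Example 2: Visualizing and fitting COVID-19 data

In this example, we use pandas, matplotlib, and statsmodels to analyze COVID-19 data and fit a time series model to forecast the future trend. We use the disease.csv dataset, which contains COVID-19 confirmed, recovered, and death counts (the Confirmed, Recovered, and Deaths columns). Here is the code: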

import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA

# Read the COVID-19 data
df = pd.read_csv('disease.csv')

# Convert the Date column to datetimes and use it as the index
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)

# Create the figure
plt.figure(figsize=(12, 8))

# Plot confirmed, recovered, and death counts over time
plt.plot(df.index, df['Confirmed'], label='Confirmed Cases')
plt.plot(df.index, df['Recovered'], label='Recovered Cases')
plt.plot(df.index, df['Deaths'], label='Deaths')

# Add axis labels and a title
plt.xlabel('Date')
plt.ylabel('Number of Cases')
plt.title('COVID-19 Time Series Data')
plt.legend()
plt.show()

# Fit an ARIMA model to the confirmed-case series
model = ARIMA(df['Confirmed'], order=(7, 0, 1))
model_fit = model.fit()

# Forecast the next 10 steps (10 days if the data is daily)
forecast = model_fit.forecast(steps=10)

# Create the figure
plt.figure(figsize=(12, 8))

# Plot the confirmed-case series, the fitted values, and the forecast
plt.plot(df.index, df['Confirmed'], label='Actual')
plt.plot(model_fit.fittedvalues, label='ARIMA Model')
plt.plot(forecast.index, forecast.values, label='Forecast')

# Add axis labels and a title
plt.xlabel('Date')
plt.ylabel('Number of Cases')
plt.title('COVID-19 Time Series Data (ARIMA Model)')
plt.legend()
plt.show()

Running the code above produces a time series plot showing the trends in the COVID-19 data. It also fits an ARIMA model and uses it to forecast confirmed cases for the next 10 days.

Unless otherwise noted, articles on this site are original. If you repost this article, please credit the source: python数据分析之时间序列分析详情 - Python技术站
