A Guide to WeChat Friend Data Analysis with Python

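Preparation

To analyze WeChat friend data, complete the following preparation first:

Install a Python environment and the required packages, such as pandas and matplotlib.
Obtain a data file of chat records with your WeChat friends. You can export the chat history to a file, usually saved in txt format.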
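
Before going further, it can help to confirm that the packages are importable. The sketch below uses only the standard library; the helper name `missing_packages` is made up for this illustration.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # pandas and matplotlib are the packages named above.
    missing = missing_packages(["pandas", "matplotlib"])
    if missing:
        print("Please install:", ", ".join(missing))
    else:
        print("All required packages are available.")
```

Any name it reports can then be installed with your usual package manager.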

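Data Cleaning

Before analysis, the data needs to be cleaned to remove noise and anything that is not actual data. In this example, we mainly clean the WeChat chat history.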

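Steps

Read the data file: load the exported txt file into a pandas DataFrame with one chat line per row. pandas' read_csv function can do this when the lines contain no separator characters; otherwise, read the lines directly.
Filter non-text content: drop chat lines that hold non-text content such as images and emoji.
Remove duplicates: drop duplicate rows so that they do not skew the results.
Drop unused columns: keep only the data we need. For chat records, the fields of interest are the sender, the receiver, the time, and the message content.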

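Example

import pandas as pd

# Read the data file, one chat line per row (the file is assumed to be UTF-8).
# The lines are read directly because read_csv with its default separator
# would split any line that contains a comma.
with open('wechat_chat_history.txt', encoding='utf-8') as f:
    lines = [line for line in f.read().splitlines() if line.strip()]
df = pd.DataFrame({'text': lines})
# Drop lines that contain non-text content such as images and emoji
df = df[~df.text.str.contains('<img', regex=False)]
df = df[~df.text.str.contains('<span class="emoji', regex=False)]
# Remove duplicate rows
df.drop_duplicates(inplace=True)
# Split each line into five whitespace-separated fields and keep the sender
# ('user') and the rest of the line ('chat'). This assumes every line has at
# least five fields; the time is read from the start of 'chat' later on.
df[['user', 'type1', 'type2', 'type3', 'chat']] = df.text.str.split(n=4, expand=True)
df.drop(['text', 'type1', 'type2', 'type3'], axis=1, inplace=True)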
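
To try the cleaning steps without an exported file, the following sketch applies them to a few in-memory lines. The sample lines, their five-field layout, and the helper name `clean_chat_lines` are assumptions made for illustration, not the actual WeChat export format.

```python
import pandas as pd

# Hypothetical sample lines: sender, three fields that are discarded,
# then the message text (which starts with a date).
sample_lines = [
    "Alice f1 f2 f3 2023-01-01 Happy new year",
    "Bob f1 f2 f3 2023-01-01 <img src='a.png'>",
    "Alice f1 f2 f3 2023-01-01 Happy new year",
    "Carol f1 f2 f3 2023-01-02 See you tomorrow",
]


def clean_chat_lines(lines):
    """Filter, de-duplicate, and split raw chat lines into user/chat columns."""
    df = pd.DataFrame({"text": list(lines)})
    # Drop lines that contain image or emoji markup.
    for marker in ("<img", '<span class="emoji'):
        df = df[~df["text"].str.contains(marker, regex=False, na=False)]
    df = df.drop_duplicates()
    # Split on whitespace into at most five fields.
    parts = df["text"].str.split(n=4, expand=True)
    parts.columns = ["user", "type1", "type2", "type3", "chat"]
    return parts[["user", "chat"]].reset_index(drop=True)


cleaned = clean_chat_lines(sample_lines)
print(cleaned)
```

The image line and the repeated line are removed, leaving one row each for Alice and Carol.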

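Data Analysis

With the data cleaned, we can start the analysis. This example explores the following questions:

Overall chat trend: how does the chat volume change per day, per week, and per month?
Most active friends: which friends chat the most?
Gender ratio of friends: what share of my friends are male and what share are female?
Distribution of friends by province: how are friends spread across provinces?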

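Steps

Overall chat trend

Count the number of messages per day, per week, and per month.
Plot the counts as a time series with a visualization tool such as matplotlib to show the trend.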

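Most active friends

Sort the friends by number of messages and output the list.
Support viewing friend activity within different time intervals.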
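
Viewing activity within a time interval can be sketched as a date filter followed by a count. The sample data, the column names `user` and `date`, and the helper `friend_activity` are hypothetical and only illustrate the idea.

```python
import pandas as pd

# Hypothetical cleaned data: one row per message, with sender and date.
messages = pd.DataFrame(
    {
        "user": ["Alice", "Bob", "Alice", "Carol", "Bob", "Alice"],
        "date": pd.to_datetime(
            ["2023-01-01", "2023-01-02", "2023-01-15",
             "2023-02-01", "2023-02-03", "2023-02-10"]
        ),
    }
)


def friend_activity(df, start, end):
    """Count messages per friend with start <= date <= end, most active first."""
    in_range = df["date"].between(pd.Timestamp(start), pd.Timestamp(end))
    return df.loc[in_range, "user"].value_counts()


january = friend_activity(messages, "2023-01-01", "2023-01-31")
print(january)
```

Calling the helper with different start and end dates gives the ranking for any other interval.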

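Gender ratio of friends

Use each friend's avatar as the basis for the gender statistics.
Convert the avatar to a black-and-white image and use an image processing tool such as OpenCV to compute statistics over its pixel values.
Count the ratio of male to female avatars and present it as a pie chart or bar chart.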
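
The last step, computing the ratio, might look like the sketch below. It assumes a gender label per friend is already available; the labels here are made-up sample values, and how they are derived is outside the scope of this snippet.

```python
import pandas as pd

# Hypothetical labels, one per friend.
genders = pd.Series(["male", "female", "female", "male", "female", "unknown"])

# Share of each label, as a fraction of all friends.
gender_ratio = genders.value_counts(normalize=True)
print(gender_ratio)

# To draw the pie chart mentioned above (requires matplotlib):
# gender_ratio.plot(kind="pie", autopct="%.1f%%")
```

Keeping an explicit "unknown" label avoids forcing every friend into one of two groups when no label could be determined.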

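Distribution of friends by province

Parse each friend's location to get the province.
Group all friends by province and count the friends in each one.
Draw a heat map or a map to visualize the distribution.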
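
The parsing and grouping steps can be sketched as follows. The friend list and the "province city" layout of the `location` column are assumptions for illustration; real location data may be structured differently.

```python
import pandas as pd

# Hypothetical friend list; "location" is assumed to be "province city".
friends = pd.DataFrame(
    {
        "name": ["Alice", "Bob", "Carol", "Dave", "Erin"],
        "location": ["Guangdong Shenzhen", "Guangdong Guangzhou",
                     "Zhejiang Hangzhou", "Beijing", ""],
    }
)

# Take the first whitespace-separated token as the province.
friends["province"] = friends["location"].str.split().str[0]
# Friends with an empty location get NaN here and are left out of the counts.
province_counts = friends.groupby("province").size().sort_values(ascending=False)
print(province_counts)
```

The resulting counts can then be passed to whichever plotting or mapping tool is used for the visualization.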

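Example

Overall chat trend

import pandas as pd
import matplotlib.pyplot as plt

# Read the data file, one chat line per row (the file is assumed to be UTF-8).
# The lines are read directly because read_csv with its default separator
# would split any line that contains a comma.
with open('wechat_chat_history.txt', encoding='utf-8') as f:
    lines = [line for line in f.read().splitlines() if line.strip()]
df = pd.DataFrame({'text': lines})
# Drop lines that contain non-text content such as images and emoji
df = df[~df.text.str.contains('<img', regex=False)]
df = df[~df.text.str.contains('<span class="emoji', regex=False)]
# Remove duplicate rows
df.drop_duplicates(inplace=True)
# Split each line into five whitespace-separated fields and keep the sender
# ('user') and the rest of the line ('chat'). This assumes every line has at
# least five fields.
df[['user', 'type1', 'type2', 'type3', 'chat']] = df.text.str.split(n=4, expand=True)
df.drop(['text', 'type1', 'type2', 'type3'], axis=1, inplace=True)

# Count messages per day. This assumes 'chat' starts with a 10-character date.
df['date'] = pd.to_datetime(df['chat'].str[:10])
daily_counts = df.groupby('date').size().reset_index(name='counts')
daily_counts.set_index('date', inplace=True)

# Count messages per week
weekly_counts = daily_counts.resample('W').sum()

# Count messages per month ('M' means month end; pandas 2.2 and later prefer 'ME')
monthly_counts = daily_counts.resample('M').sum()

plt.figure(figsize=(16, 4))
plt.plot(daily_counts.index, daily_counts['counts'], label='daily counts')
plt.plot(weekly_counts.index, weekly_counts['counts'], label='weekly counts')
plt.plot(monthly_counts.index, monthly_counts['counts'], label='monthly counts')
plt.xlabel('date')
plt.ylabel('chat counts')
plt.legend()
plt.show()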

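Most active friends

import pandas as pd
import matplotlib.pyplot as plt

# Read the data file, one chat line per row (the file is assumed to be UTF-8).
# The lines are read directly because read_csv with its default separator
# would split any line that contains a comma.
with open('wechat_chat_history.txt', encoding='utf-8') as f:
    lines = [line for line in f.read().splitlines() if line.strip()]
df = pd.DataFrame({'text': lines})
# Drop lines that contain non-text content such as images and emoji
df = df[~df.text.str.contains('<img', regex=False)]
df = df[~df.text.str.contains('<span class="emoji', regex=False)]
# Remove duplicate rows
df.drop_duplicates(inplace=True)
# Split each line into five whitespace-separated fields and keep the sender
# ('user') and the rest of the line ('chat'). This assumes every line has at
# least five fields.
df[['user', 'type1', 'type2', 'type3', 'chat']] = df.text.str.split(n=4, expand=True)
df.drop(['text', 'type1', 'type2', 'type3'], axis=1, inplace=True)

# Count messages per contact, sorted from most to least active.
# rename_axis plus reset_index(name=...) gives the same column names
# on both older and newer pandas versions.
friend_counts = df['user'].value_counts().rename_axis('friend').reset_index(name='counts')
friend_counts.plot(kind='bar', x='friend', y='counts', figsize=(16, 6))
plt.xlabel("friend")
plt.ylabel("counts")
plt.show()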

References

Pandas documentation, “Pandas next-generation data manipulation for Python”. https://pandas.pydata.org/
Matplotlib documentation, “2D plotting library for Python”. https://matplotlib.org/
OpenCV documentation, “Open Source Computer Vision Library”. https://opencv.org/

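Unless otherwise stated, articles on this site are original. If you repost this article, please credit the source: WeChat Friend Data Analysis with Python - Python技术站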
