Creating Dictionaries and Counting Keys with Counter in Python

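When working with text data, we often need to count how frequently each word occurs. The Counter class in Python's collections module offers a quick and convenient way to build such a frequency dictionary.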

Creating a Counter Object

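Counter accepts any iterable of hashable elements and returns a dict subclass whose keys are the elements and whose values are the number of times each element occurs. A Counter object can be created as follows: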

from collections import Counter

text = "hello, world. this is an example text!"
word_count = Counter(text.split())
print(word_count)

Output:

Counter({'hello,': 1, 'world.': 1, 'this': 1, 'is': 1, 'an': 1, 'example': 1, 'text!': 1})

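In the example above, the string method split() breaks the text into a list of words, which is then passed to Counter. Each key-value pair in the resulting object records how many times a word occurs. Note that split() splits on whitespace only, so punctuation stays attached to the words ('hello,', 'world.', 'text!').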
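Counter is not limited to lists of words. As a minimal sketch, the following builds Counter objects from a string, a list, and an existing mapping, and shows that looking up a missing key returns 0 instead of raising KeyError. The sample data and the names letters, colors, and inventory are made up for illustration.

```python
from collections import Counter

# A string is an iterable of characters, so each character is counted
letters = Counter("banana")

# Any iterable of hashable items works, for example a list
colors = Counter(["red", "blue", "red", "green", "red"])

# An existing mapping of item -> count can be passed in as well
inventory = Counter({"apple": 3, "pear": 1})

print(letters)                      # Counter({'a': 3, 'n': 2, 'b': 1})
print(colors["red"])                # 3
print(colors["purple"])             # 0, a missing key does not raise KeyError
print(isinstance(inventory, dict))  # True
```

Looking up a missing key does not add it to the Counter, so "purple" is still absent from colors after the lookup.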

Counting Keys

The keys() method of a Counter object returns all of its keys, and passing the result to len() gives the number of distinct keys. For example:

from collections import Counter

text = "hello, world. this is an example text!"
word_count = Counter(text.split())
keys_count = len(word_count.keys())
print('keys count:', keys_count)

Output:

keys count: 7

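In the example above, keys() returns a view of the seven distinct words and len() reports its size. Because Counter is a subclass of dict, len(word_count) gives the same result.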
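The number of keys is easy to confuse with the total number of items counted. The sketch below, using a made-up sentence with repeated words, shows both values side by side; the variable names are chosen here for illustration.

```python
from collections import Counter

text = "the cat and the dog and the bird"
word_count = Counter(text.split())

distinct_words = len(word_count)         # number of distinct keys
same_result = len(word_count.keys())     # equivalent, just more verbose
total_words = sum(word_count.values())   # total number of words counted

print(distinct_words, same_result, total_words)  # 5 5 8
```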

Examples

Below are two practical examples.

Example 1: Counting Word Frequencies in a Movie Script

This example reads a movie script and counts how many times each word occurs in it.

from collections import Counter

with open('movie_script.txt', 'r') as f:
    text = f.read()

# Lowercase first so that "The" and "the" are counted as the same word
word_count = Counter(text.lower().split())

for word, count in word_count.most_common(10):
    print(word, count)

Output:

the 955
i 731
to 639
you 625
a 570
and 525
of 517
in 363
that 354
is 324

In the example above, we read the movie script with open(), count the words with a Counter object, and finally use most_common() to print the 10 most frequent words.
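Splitting on whitespace leaves punctuation attached to the words, so "end." and "end," would be counted as separate keys. One possible way to normalize the text, shown here as a sketch rather than as the method used above, is to lowercase it and extract words with a regular expression. The sample string stands in for the script file, and the pattern [a-z']+ is an assumption that only suits plain English text.

```python
import re
from collections import Counter

# Hypothetical sample standing in for the contents of movie_script.txt
text = "The end. The END? Is this the end, or isn't it?"

# Lowercase the text, then keep runs of letters and apostrophes as words
words = re.findall(r"[a-z']+", text.lower())
word_count = Counter(words)

# Print the two most frequent words with their counts
for word, count in word_count.most_common(2):
    print(word, count)
```

With this normalization, "The", "the", and "END?" no longer produce separate keys, which usually gives a more meaningful frequency table.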

Example 2: Counting IP Address Occurrences in a Website Access Log

This example reads a website's access log and counts how many times each IP address occurs.

from collections import Counter

ips = []
with open('access.log', 'r') as f:
    for line in f:
        fields = line.split()
        if fields:  # skip blank lines, which would otherwise raise IndexError
            ips.append(fields[0])  # the IP address is the first field

ip_count = Counter(ips)

for ip, count in ip_count.most_common(10):
    print(ip, count)

Output:

10.0.0.23 8670
10.0.0.56 8491
10.0.0.49 7515
10.0.0.79 6607
10.0.0.106 6198
10.0.0.235 6174
10.0.0.180 5849
10.0.0.198 5273
10.0.0.85 5168
10.0.0.117 5029

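In the example above, we open the access log with open(), go through it line by line, extract the IP address from each record (assumed to be the first whitespace-separated field), and append it to a list. A Counter object then counts how often each IP address occurs, and most_common() prints the 10 IP addresses with the most requests.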
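The intermediate list is not strictly required, because Counter accepts any iterable, including a generator expression. The sketch below uses a few in-memory lines in place of access.log; the log lines and their format (IP address as the first whitespace-separated field) are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical sample lines standing in for access.log
lines = [
    '10.0.0.23 - - "GET /index.html" 200',
    '10.0.0.56 - - "GET /about.html" 200',
    '',
    '10.0.0.23 - - "GET /contact.html" 404',
]

# Feed the first field of every non-blank line straight into Counter
ip_count = Counter(line.split()[0] for line in lines if line.strip())

for ip, count in ip_count.most_common(1):
    print(ip, count)  # 10.0.0.23 2
```

For a real log file, the same generator expression can iterate over the open file object, which avoids holding every line in memory at once.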
