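This guide walks through a Python implementation of the Apriori association rule algorithm, covering the algorithm's principle, the Python code, and two worked examples.
Algorithm Principle
Apriori is a widely used association rule mining algorithm. Its basic idea is to scan the dataset to find the frequent itemsets, and then use those frequent itemsets to generate association rules. The steps are as follows: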
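- Scan the dataset and count the support of each individual item;
- Keep the items that meet the minimum support threshold as the frequent 1-itemsets;
- Generate candidate 2-itemsets from the frequent 1-itemsets;
- Scan the dataset and count the support of each candidate 2-itemset;
- Keep the candidates that meet the minimum support threshold as the frequent 2-itemsets;
- Generate candidate 3-itemsets from the frequent 2-itemsets;
- Repeat the generate, count, and filter steps for larger itemsets until no new frequent itemsets can be found;
- Generate association rules from the frequent itemsets and compute the confidence of each rule;
- Keep the rules that meet the minimum confidence threshold.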
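The two quantities used in the steps above, support and confidence, can be sketched in a few lines. This is only a minimal illustration and not part of the implementation that follows; the helper names support_of and confidence_of and the toy transactions are assumptions made for the example.

```python
def support_of(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset.issubset(t))
    return hits / len(transactions)


def confidence_of(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support_of(both, transactions) / support_of(antecedent, transactions)


# Hypothetical toy data: four transactions over three items.
toy = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]

print(support_of({"a"}, toy))            # 0.75 (3 of 4 transactions)
print(support_of({"a", "b"}, toy))       # 0.5  (2 of 4 transactions)
print(confidence_of({"a"}, {"b"}, toy))  # 0.5 / 0.75, about 0.67
```

With a minimum confidence of 0.8, the rule a => b in this toy data would be discarded, even though both a and the pair (a, b) are frequent at a minimum support of 0.5.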
Python Implementation
The following is example Python code that implements the Apriori algorithm:
def apriori(transactions, min_support, min_confidence):
    # Mine the frequent itemsets first, then derive rules from them.
    itemsets, support = find_frequent_itemsets(transactions, min_support)
    rules = generate_rules(itemsets, support, min_confidence)
    return rules


def find_frequent_itemsets(transactions, min_support):
    # Use sets so that repeated items in a transaction are counted once.
    transactions = [set(t) for t in transactions]
    n = len(transactions)
    # Count every single item, keyed by a one-element frozenset.
    counts = {}
    for transaction in transactions:
        for item in transaction:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    # Keep the 1-itemsets that meet the support threshold.
    current = {}
    support = {}
    for itemset, count in counts.items():
        if count / n >= min_support:
            current[itemset] = count
            support[itemset] = count / n
    # Collect the frequent itemsets of every size, not only the last level.
    frequent = list(current)
    k = 2
    while current:
        current = generate_candidate_itemsets(current, k)
        current, support = prune_itemsets(current, support, min_support, transactions)
        frequent.extend(current)
        k += 1
    return frequent, support


def generate_candidate_itemsets(itemsets, k):
    # Join pairs of frequent (k-1)-itemsets whose union has exactly k items.
    candidates = {}
    for itemset1 in itemsets:
        for itemset2 in itemsets:
            union = itemset1.union(itemset2)
            if len(union) == k:
                candidates[union] = 0
    return candidates


def prune_itemsets(itemsets, support, min_support, transactions):
    n = len(transactions)
    for itemset in itemsets.copy():
        # Count the transactions that contain the whole candidate.
        for transaction in transactions:
            if itemset.issubset(transaction):
                itemsets[itemset] += 1
        if itemsets[itemset] / n < min_support:
            del itemsets[itemset]
        else:
            support[itemset] = itemsets[itemset] / n
    return itemsets, support


def generate_rules(itemsets, support, min_confidence):
    rules = []
    for itemset in itemsets:
        if len(itemset) > 1:
            # Only single-item antecedents are considered here.
            for item in itemset:
                antecedent = frozenset([item])
                consequent = itemset.difference(antecedent)
                confidence = support[itemset] / support[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, consequent, confidence))
    return rules
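The code above defines the apriori function, which implements the Apriori algorithm. Its transactions parameter is the list of transactions, min_support is the minimum support threshold, and min_confidence is the minimum confidence threshold. The function calls find_frequent_itemsets to find the frequent itemsets and generate_rules to generate the association rules. Note that generate_rules only produces rules whose antecedent is a single item.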
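Because generate_rules only tries single-item antecedents, rules such as {a, c} => {b} are never produced. One possible way to cover them is to try every non-empty proper subset of each frequent itemset. The sketch below is an illustration under stated assumptions rather than part of the original code: rules_with_all_antecedents is a hypothetical name, and support is assumed to be a dict that maps every frequent itemset (as a frozenset, including all of its subsets) to its support value. The toy supports are made up for the example.

```python
from itertools import combinations


def rules_with_all_antecedents(support, min_confidence):
    """Try every non-empty proper subset of each frequent itemset as antecedent.

    Assumes `support` maps each frequent itemset (a frozenset) to its support
    and also contains every subset of those itemsets.
    """
    rules = []
    for itemset, itemset_support in support.items():
        # Sizes 1 .. len(itemset) - 1 leave at least one item for the consequent.
        for size in range(1, len(itemset)):
            for items in combinations(itemset, size):
                antecedent = frozenset(items)
                confidence = itemset_support / support[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules


# Hypothetical supports for the frequent itemsets of a tiny dataset.
toy_support = {
    frozenset({"a"}): 0.75,
    frozenset({"b"}): 0.75,
    frozenset({"c"}): 0.75,
    frozenset({"a", "b"}): 0.75,
    frozenset({"a", "c"}): 0.5,
    frozenset({"b", "c"}): 0.5,
    frozenset({"a", "b", "c"}): 0.5,
}
for antecedent, consequent, confidence in rules_with_all_antecedents(toy_support, 0.9):
    print(sorted(antecedent), "=>", sorted(consequent), round(confidence, 2))
```

With these made-up supports, four rules pass the 0.9 threshold, two of which have a two-item antecedent. The number of subsets grows exponentially with itemset size, so this approach is only practical when the frequent itemsets are small.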
Examples
The following two examples show how to use the apriori function.
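Example 1
Use the apriori function to mine association rules from shopping basket data.
transactions = [
    {"milk", "bread", "diapers"},
    {"cola", "bread", "diapers", "beer"},
    {"milk", "diapers", "beer", "eggs"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"}
]
rules = apriori(transactions, min_support=0.4, min_confidence=0.8)
for antecedent, consequent, confidence in rules:
    print(f"{antecedent} => {consequent} (confidence: {confidence:.2f})")
Output (the order of the lines, and of the items inside each set, may vary between runs):
frozenset({'milk'}) => frozenset({'diapers'}) (confidence: 1.00)
frozenset({'diapers'}) => frozenset({'milk'}) (confidence: 0.80)
frozenset({'bread'}) => frozenset({'diapers'}) (confidence: 1.00)
frozenset({'diapers'}) => frozenset({'bread'}) (confidence: 0.80)
frozenset({'cola'}) => frozenset({'bread'}) (confidence: 1.00)
frozenset({'cola'}) => frozenset({'diapers'}) (confidence: 1.00)
frozenset({'beer'}) => frozenset({'diapers'}) (confidence: 1.00)
frozenset({'cola'}) => frozenset({'bread', 'diapers'}) (confidence: 1.00)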
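Example 2
Use the apriori function to mine association rules from movie rating data. The code assumes that ratings.csv contains userId and movieId columns and that movies.csv contains movieId and title columns.
import pandas as pd

# Load the ratings and the movie titles, then join them on movieId.
ratings = pd.read_csv("ratings.csv")
movies = pd.read_csv("movies.csv")
data = pd.merge(ratings, movies, on="movieId")
data = data[["userId", "title"]]
# Collect the titles rated by each user into one transaction.
data = data.groupby("userId")["title"].apply(list).reset_index(name="movies")
transactions = data["movies"].tolist()
rules = apriori(transactions, min_support=0.1, min_confidence=0.5)
for antecedent, consequent, confidence in rules:
    print(f"{antecedent} => {consequent} (confidence: {confidence:.2f})")
Output (the actual rules and confidence values depend on the contents of ratings.csv and movies.csv):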
frozenset({'Pulp Fiction (1994)'}) => frozenset({'Forrest Gump (1994)'}) (confidence: 0.50)
frozenset({'Forrest Gump (1994)'}) => frozenset({'Pulp Fiction (1994)'}) (confidence: 0.50)
frozenset({'Shawshank Redemption, The (1994)'}) => frozenset({'Forrest Gump (1994)'}) (confidence: 0.50)
frozenset({'Forrest Gump (1994)'}) => frozenset({'Shawshank Redemption, The (1994)'}) (confidence: 0.50)
frozenset({'Shawshank Redemption, The (1994)'}) => frozenset({'Pulp Fiction (1994)'}) (confidence: 0.50)
frozenset({'Pulp Fiction (1994)'}) => frozenset({'Shawshank Redemption, The (1994)'}) (confidence: 0.50)
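The grouping step in Example 2 can also be approximated with the standard library when pandas is not available. The sketch below assumes the ratings have already been joined into (userId, title) pairs; build_transactions is a hypothetical helper name, and the inline rows are made-up sample data, not values taken from ratings.csv.

```python
from collections import defaultdict


def build_transactions(rows):
    """Group (user_id, title) pairs into one set of titles per user."""
    by_user = defaultdict(set)
    for user_id, title in rows:
        by_user[user_id].add(title)  # a set drops repeated titles
    return list(by_user.values())


# Made-up (userId, title) pairs standing in for the merged ratings table.
rows = [
    (1, "Movie A"), (1, "Movie B"),
    (2, "Movie A"), (2, "Movie B"), (2, "Movie B"),
    (3, "Movie C"),
]
transactions = build_transactions(rows)
print(transactions)
```

Each user becomes one transaction, which is the shape the apriori function expects. Using sets rather than lists also keeps a title that appears twice for the same user from being counted twice.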
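Summary
This article covered a Python implementation of the Apriori algorithm: the algorithm's principle, the implementation code, and two examples. Apriori is a widely used association rule mining algorithm that scans the dataset to find frequent itemsets and then generates association rules from them. In practice, the minimum support and minimum confidence thresholds need to be chosen with care: thresholds that are too low produce a large number of weak rules, while thresholds that are too high can filter out useful ones.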
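To make the effect of the support threshold concrete, the sketch below counts how many itemsets remain frequent at several thresholds. It uses plain brute-force enumeration rather than Apriori, which is only workable for a handful of items; count_frequent_itemsets is a hypothetical helper and the toy transactions are made up for the illustration.

```python
from itertools import combinations


def count_frequent_itemsets(transactions, min_support):
    """Brute-force count of itemsets whose support is at least min_support."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    total = 0
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            # Number of transactions that contain every item of the combination.
            hits = sum(1 for t in transactions if set(combo) <= t)
            if hits / n >= min_support:
                total += 1
    return total


# Hypothetical toy data: four transactions over three items.
toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
for threshold in (0.25, 0.5, 0.75):
    print(threshold, count_frequent_itemsets(toy, threshold))
# prints:
# 0.25 7
# 0.5 5
# 0.75 2
```

Even on four transactions, raising the threshold from 0.25 to 0.75 cuts the frequent itemsets from seven to two, which in turn limits the rules that can be generated from them.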
Unless otherwise stated, articles on this site are original. If you repost this article, please credit the source: python 实现关联规则算法Apriori的示例 - Python技术站