Association Rule Mining with Apriori Algorithm

Ishanka Udishan
Jan 31, 2021


Before moving on to association rule mining, let’s consider what data mining is. Wikipedia defines it as follows: “Data mining is a process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems”. Broadly, there are two kinds of data mining methods, predictive methods and descriptive methods, and under those two we can identify the main data mining tasks.

  • Classification — Predictive
  • Clustering — Descriptive
  • Association Rule Mining — Descriptive
  • Sequential Pattern Mining — Descriptive
  • Regression — Predictive
  • Deviation Detection — Predictive

Now, let’s move on to the topic of this article.

Using association rule mining, we can find interesting associations and relationships among items in large data sets. The discovered rules show how frequently the items occur together in the data set.

Applications of Association Rule Mining

  • Market basket analysis (the most popular application of this technique)
  • Medical diagnosis
  • Census data analysis
  • Protein sequence analysis

How can we use association rule mining?

When we talk about association rule mining, we come across some special terms and measures.

  • Itemset
  • Support count (σ)
  • Support
  • Frequent itemset
  • Association rule
  • Rule evaluation metrics

Let’s consider these terms one by one, with the help of the transactions table below.

Example of Transactions Table

  • Itemset

A collection of one or more items is known as an itemset.

Example: {printer, tablet, monitor}

Another frequently used term related to itemsets is k-itemset, which refers to an itemset containing k items.

  • Support count (σ)

The number of transactions that contain an itemset is called its support count.

Example: σ({printer, monitor, headset}) = 1

  • Support

There is a direct relationship between support and support count. Support is the fraction of transactions that contain an itemset, i.e. s(X) = σ(X) / |T|, where |T| is the total number of transactions.

Example: s({printer, monitor, headset}) = 1/10
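To make these two measures concrete, here is a small Python sketch. The article’s transaction table appears only as an image, so the dataset below is a hypothetical stand-in of 10 transactions, chosen to be consistent with the counts quoted in these examples; the helper functions compute σ and s directly from their definitions.

    # Hypothetical stand-in for the article's 10-transaction table; the
    # item names and the counts below match the figures quoted in the text.
    transactions = [
        {"laptop", "headset", "tablet"},
        {"laptop", "headset", "tablet", "monitor"},
        {"laptop", "headset", "tablet"},
        {"laptop", "headset", "tablet", "printer"},
        {"printer", "monitor", "headset"},
        {"printer", "tablet"},
        {"monitor"},
        {"laptop", "tablet"},
        {"headset"},
        {"printer", "monitor"},
    ]

    def support_count(itemset, transactions):
        # sigma(X): the number of transactions that contain every item of X.
        return sum(1 for t in transactions if itemset <= t)

    def support(itemset, transactions):
        # s(X) = sigma(X) / |T|: the fraction of transactions that contain X.
        return support_count(itemset, transactions) / len(transactions)

    print(support_count({"printer", "monitor", "headset"}, transactions))  # 1
    print(support({"printer", "monitor", "headset"}, transactions))        # 0.1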

  • Frequent itemset

An itemset whose support is greater than or equal to the minsup (minimum support) threshold value.

  • Association rules

An implication expression of the form X → Y, where X and Y are itemsets.

Example: {laptop, headset} → {tablet}

  • Rule evaluation metrics

These metrics are used to evaluate candidate association rules against our transaction data. The most important metrics are described below.

Support

The fraction of transactions that contain both X and Y itemsets.

Example:

A.R. — {laptop, headset} → {tablet}

s = σ({laptop, headset, tablet}) / |T| = 4/10 = 0.4

Confidence

Measures how often items in Y appear in transactions that contain X.

Example:

A.R. — {laptop, headset} → {tablet}

c = σ({laptop, headset, tablet}) / σ({laptop, headset}) = 4/4 = 1
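Reusing the hypothetical transactions and the support_count helper from the sketch above, confidence is simply a ratio of two support counts:

    def confidence(antecedent, consequent, transactions):
        # c(X -> Y) = sigma(X union Y) / sigma(X).
        return (support_count(antecedent | consequent, transactions)
                / support_count(antecedent, transactions))

    # {laptop, headset} -> {tablet}: 4/4 = 1.0, matching the example above.
    print(confidence({"laptop", "headset"}, {"tablet"}, transactions))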

There are several approaches to finding association rules. The simplest is the brute-force approach: enumerate every possible rule, compute support and confidence for each, and keep only the rules that satisfy support ≥ minsup and confidence ≥ minconf. Computing and checking these measures for every candidate rule is computationally very expensive, as the sketch below illustrates. The Apriori algorithm is used to overcome this problem.
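For illustration only, a minimal brute-force sketch might look like the following (a hypothetical implementation, not code from the article). The nested loops over all itemsets and all possible antecedents are exactly what makes this approach intractable for real data:

    from itertools import combinations

    def brute_force_rules(transactions, minsup, minconf):
        # Enumerate every itemset, split it into every antecedent/consequent
        # pair, and keep the rules that meet both thresholds. The number of
        # candidates grows exponentially with the number of distinct items.
        items = sorted(set().union(*transactions))
        n = len(transactions)
        rules = []
        for k in range(2, len(items) + 1):
            for candidate in combinations(items, k):
                itemset = frozenset(candidate)
                sup = sum(1 for t in transactions if itemset <= t) / n
                if sup < minsup:  # assumes minsup > 0
                    continue
                for r in range(1, k):
                    for lhs in combinations(itemset, r):
                        x = frozenset(lhs)
                        conf = sup * n / sum(1 for t in transactions if x <= t)
                        if conf >= minconf:
                            rules.append((set(x), set(itemset - x), sup, conf))
        return rules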

Apriori Algorithm

Apriori Principle

If an itemset is frequent, then all of its subsets must also be frequent. Equivalently, if an itemset is infrequent, then all of its supersets must be infrequent, which is what allows Apriori to prune candidate itemsets. This principle is shown in mathematical notation as below.

∀ X,Y : (X⊆Y) ⟶ s(X) ≥ s(Y)

The support of an itemset never exceeds the support of its subsets; this is called the anti-monotone property of support.

Pseudo code of Apriori Algorithm
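The pseudo code itself is shown as an image in the original post. As a rough stand-in, here is a minimal Python sketch of the same level-wise idea: build candidate k-itemsets from the frequent (k − 1)-itemsets, prune candidates that have an infrequent subset, and then count support.

    from itertools import combinations

    def apriori(transactions, minsup):
        n = len(transactions)

        def sup(itemset):
            return sum(1 for t in transactions if itemset <= t) / n

        # Level 1: frequent 1-itemsets.
        items = set().union(*transactions)
        current = {frozenset([i]) for i in items if sup(frozenset([i])) >= minsup}
        frequent = set(current)
        k = 2
        while current:
            # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
            candidates = {a | b for a in current for b in current if len(a | b) == k}
            # Prune step (anti-monotone property): every (k-1)-subset of a
            # surviving candidate must itself be frequent.
            candidates = {c for c in candidates
                          if all(frozenset(s) in current
                                 for s in combinations(c, k - 1))}
            current = {c for c in candidates if sup(c) >= minsup}
            frequent |= current
            k += 1
        return frequent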

Next, let’s consider an example of the Apriori algorithm.

Minimum Support = 50%

Minimum Confidence = 50%

According to the transactions, we can determine the frequent itemsets.

For the rule A → C:

support = support({A, C}) = 50%

confidence = support({A, C}) / support({A}) = 66.6%

According to the Apriori principle, any subset of a frequent itemset must also be frequent.
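The transaction table for this example is likewise shown only as an image in the original post. A hypothetical four-transaction dataset consistent with the quoted figures (support({A, C}) = 50%, confidence = 66.6%) lets us check the arithmetic:

    # Hypothetical transactions consistent with the quoted figures;
    # the original table is not reproduced in the text.
    transactions = [{"A", "C", "D"}, {"A", "B", "E"},
                    {"A", "B", "C", "E"}, {"B", "E"}]
    n = len(transactions)

    sup_ac = sum(1 for t in transactions if {"A", "C"} <= t) / n  # 2/4 = 50%
    sup_a = sum(1 for t in transactions if {"A"} <= t) / n        # 3/4 = 75%
    print(sup_ac, sup_ac / sup_a)  # 0.5 0.666...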
