**The core of Machine Learning** is finding patterns in a set of data. One basic Machine Learning operation is “clustering”, or, loosely speaking, sorting data into groups. For example, suppose there are 1000 records of toy sales data. If we can divide those records into groups (such as a “buys stuffed toys” group and a “buys electronic toys” group), it becomes easier to anticipate the behavior of new customers.

So we need **partition methods** to group the sales data. There are several of them, and one simple method is called “*K-Means*”. It calculates the distance between each data point and each centroid (the center point of a group), and then assigns each data point to its closest centroid. Wikipedia has a detailed description of the method.

As you can see, the key to “K-Means” is to **calculate distance**. There are several ways to do that, and “*Euclidean Distance*” is one of them; see Wikipedia for a deeper dive. In short, you plot the data on a 2D plane, so each data point has an x and a y value. The “*Euclidean Distance*” between two data points $(x_1, y_1)$ and $(x_2, y_2)$ is:

$$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$
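
As a quick worked example, take the first sample record used later in this post, A1 = (2, 10), and a centroid K1 = (3.67, 9). The centroid value is an assumption: it is not stated in the text and was inferred from the results listed further down. The distance is the square root of the sum of the squared coordinate differences:

$$d(K_1, A_1) = \sqrt{(3.67 - 2)^2 + (9 - 10)^2} = \sqrt{2.7889 + 1} = \sqrt{3.7889} \approx 1.9465$$

This agrees with the “K1 to A1” value in the results.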

The formula is simple, but you have to calculate the distance from each data point to every centroid. The following is a very simple PowerShell script that calculates the Euclidean Distance for data partitioned into 3 clusters.

This script assumes you want to partition the records into 3 clusters (K1, K2, K3). $K1, $K2, and $K3 are the centroids of the clusters (groups). You can adjust them to suit your purpose.
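
To make the centroid setup concrete, here is a minimal sketch of the distance calculation for a single record. It is not the original script. The function name `Get-EuclideanDistance` is hypothetical, and the centroid values are assumptions inferred from the results printed later in this post.

```powershell
# Hypothetical helper: Euclidean distance between two 2D points,
# each given as a hashtable with X and Y keys.
function Get-EuclideanDistance {
    param(
        [hashtable]$P,
        [hashtable]$Q
    )
    $dx = $P.X - $Q.X
    $dy = $P.Y - $Q.Y
    return [math]::Sqrt($dx * $dx + $dy * $dy)
}

# Centroids of the 3 clusters (assumed values, inferred from the printed results).
$K1 = @{ X = 3.67; Y = 9 }
$K2 = @{ X = 7;    Y = 4.33 }
$K3 = @{ X = 1.5;  Y = 3.5 }

# First sample record (A1).
$A1 = @{ X = 2; Y = 10 }

"K1 to A1 distance is $(Get-EuclideanDistance $K1 $A1)"
"K2 to A1 distance is $(Get-EuclideanDistance $K2 $A1)"
"K3 to A1 distance is $(Get-EuclideanDistance $K3 $A1)"
```

To try other clusters, change the X and Y values of `$K1`, `$K2`, and `$K3`.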

The script loads the records in the “*input.txt*” file and then calculates the *Euclidean Distance* from each record to each centroid. Each record in “*input.txt*” has only an x and a y value. The following is a sample “*input.txt*” that you can copy for testing.

x,y

2, 10

2, 5

8, 4

5, 8

7, 5

6, 4

1, 2

4, 9
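
A file in this format could be loaded and processed roughly as follows. This is a sketch rather than the original script. It assumes that “input.txt” sits in the current directory, that it holds the header line followed by one record per line, and that the centroid values (inferred from the printed results) are the ones you want.

```powershell
# Centroids of the 3 clusters as (x, y) pairs (assumed values; adjust as needed).
$centroids = [ordered]@{
    K1 = @(3.67, 9)
    K2 = @(7, 4.33)
    K3 = @(1.5, 3.5)
}

# Import-Csv uses the first line ("x,y") as the property names.
$records = Import-Csv -Path .\input.txt

$i = 0
foreach ($record in $records) {
    $i++
    # Fields arrive as strings, so trim stray spaces and convert to numbers.
    $x = [double]$record.x.Trim()
    $y = [double]$record.y.Trim()

    foreach ($name in $centroids.Keys) {
        $cx, $cy = $centroids[$name]
        $distance = [math]::Sqrt([math]::Pow($x - $cx, 2) + [math]::Pow($y - $cy, 2))
        "$name to A$i distance is $distance"
    }
}
```

The records are labeled A1, A2, and so on in file order. The number of digits printed for each distance can differ between PowerShell versions.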

Running the script on the sample input gives the following result, where A1 to A8 are the records in file order:

K1 to A1 distance is 1.9465096968677

K2 to A1 distance is 7.55968914704831

K3 to A1 distance is 6.51920240520265

K1 to A2 distance is 4.33461647669087

K2 to A2 distance is 5.04469027790607

K3 to A2 distance is 1.58113883008419

K1 to A3 distance is 6.61429512495474

K2 to A3 distance is 1.05304320899002

K3 to A3 distance is 6.51920240520265

K1 to A4 distance is 1.66400120192264

K2 to A4 distance is 4.17958131874474

K3 to A4 distance is 5.70087712549569

K1 to A5 distance is 5.20469979921993

K2 to A5 distance is 0.67

K3 to A5 distance is 5.70087712549569

K1 to A6 distance is 5.5162396612185

K2 to A6 distance is 1.05304320899002

K3 to A6 distance is 4.52769256906871

K1 to A7 distance is 7.49192231673554

K2 to A7 distance is 6.43652856748108

K3 to A7 distance is 1.58113883008419

K1 to A8 distance is 0.33

K2 to A8 distance is 5.55057654663009

K3 to A8 distance is 6.04152298679729
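
With these distances, the K-Means assignment step places each record in the cluster whose centroid is closest. The sketch below illustrates that step on the sample data. It is not part of the original script, and the centroid values are again assumptions inferred from the distances above.

```powershell
# Sample records in file order (A1..A8).
$points = @(
    @(2, 10), @(2, 5), @(8, 4), @(5, 8),
    @(7, 5), @(6, 4), @(1, 2), @(4, 9)
)

# Centroids of the 3 clusters (assumed values).
$centroids = [ordered]@{ K1 = @(3.67, 9); K2 = @(7, 4.33); K3 = @(1.5, 3.5) }

for ($i = 0; $i -lt $points.Count; $i++) {
    $x, $y = $points[$i]

    # Distance from this record to every centroid.
    $distances = foreach ($name in $centroids.Keys) {
        $cx, $cy = $centroids[$name]
        [pscustomobject]@{
            Cluster  = $name
            Distance = [math]::Sqrt([math]::Pow($x - $cx, 2) + [math]::Pow($y - $cy, 2))
        }
    }

    # The record joins the cluster with the smallest distance.
    $nearest = $distances | Sort-Object Distance | Select-Object -First 1
    "A$($i + 1) belongs to $($nearest.Cluster)"
}
```

Reading the smallest distance per record from the list above gives the same grouping: A1, A4, and A8 fall into K1; A3, A5, and A6 into K2; A2 and A7 into K3.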
