Clustering is a type of *unsupervised learning*, where the goal is to group $n$ observations into $K$ clusters based on their similarity, without using any labels. The observations form the set
$ S_n = \{x^{(i)} \mid i=1, \dots, n\} $
## Partitioning
A partition is an assignment of all elements of a set into clusters $\mathbb C_1, \mathbb C_2, \dots , \mathbb C_K$. Each cluster is a set of indices (i.e. the positions of the elements of $S_n$ that belong to that cluster).
**Criteria for Valid Partition:**
- *Complete coverage:* The union of all clusters contains the index of every element of $S_n$.
$ \mathbb C_1 \cup \mathbb C_2 \cup \dots \cup \mathbb C_K = \{1,2, \dots, n\} $
- *No overlap:* The intersection of any two different clusters is the empty set, i.e. each observation belongs to exactly one cluster.
$ \mathbb C_i \cap \mathbb C_j = \emptyset \quad \text{for any } i \neq j \text{ in } \{1, \dots , K\} $
>[!note]
>A trivial but valid partition would be having only a single cluster containing all observations.
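As an illustration, both criteria are straightforward to check in code. Below is a minimal Python sketch (the function name and the example clusters are hypothetical, not from the source); it uses $1$-based indices to match the notation above:

```python
def is_valid_partition(clusters, n):
    """Check that `clusters` (a list of index sets) partitions {1, ..., n}."""
    seen = set()
    for C in clusters:
        # No overlap: an index already seen would lie in two clusters
        if seen & C:
            return False
        seen |= C
    # Complete coverage: every index 1..n appears in some cluster
    return seen == set(range(1, n + 1))

print(is_valid_partition([{1, 3}, {2, 4, 5}], n=5))     # True
print(is_valid_partition([{1, 2}, {2, 3, 4, 5}], n=5))  # False: index 2 overlaps
print(is_valid_partition([set(range(1, 6))], n=5))      # True: trivial partition
```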
## Costs
Instead of a traditional loss function, clustering uses a *cost function* to evaluate the quality of a partition. The total cost is the sum of the costs for individual clusters:
$ \text{Cost}(\mathbb C_1, \dots, \mathbb C_K)= \sum_{j=1}^K \text{Cost}(\mathbb C_j) $
The cost of each cluster can be defined in various ways (each variant is sketched in code below):
- Diameter of the cluster (i.e. the distance between its two farthest points)
- Average distance over all pairs of points in the cluster
- Sum of distances between each point and a cluster representative $z_j$
The last variant can be written as follows, where $x^{(i)}$ is an observed data point and $z_j$ is the representative (e.g. the centroid) of cluster $\mathbb C_j$.
$ \text{Cost}(\mathbb C_j, z_j)=\sum_{i \in \mathbb C_j} \text{distance}(x^{(i)}, z_j) $
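To make the three definitions concrete, here is a minimal NumPy sketch (the function names are my own; `X` is assumed to be the matrix of points belonging to a single cluster):

```python
import numpy as np

def diameter_cost(X):
    """Diameter: distance between the two farthest points in the cluster."""
    diffs = X[:, None, :] - X[None, :, :]  # pairwise difference vectors
    return np.linalg.norm(diffs, axis=-1).max()

def average_pairwise_cost(X):
    """Average distance over all pairs of distinct points (assumes >= 2 points)."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Keep only the upper triangle so each pair is counted once
    return dists[np.triu_indices(len(X), k=1)].mean()

def representative_cost(X, z):
    """Sum of distances between each point and the representative z."""
    return np.linalg.norm(X - z, axis=1).sum()
```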
**Total Clustering Cost:**
Using squared Euclidean distance as the measure of dissimilarity, the total cost can be expressed as follows.
$ \text{Cost}(\mathbb C_1, \dots, \mathbb C_K, z_1, \dots , z_K)=\sum_{j=1}^K \sum_{i \in \mathbb C_j} \|x^{(i)}-z_j \|^2 $
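Putting the pieces together, a sketch of this total cost (the K-means objective) might look like the following; `X`, `clusters`, and the centroid representatives are hypothetical example data, and indices here are $0$-based as is usual in Python:

```python
import numpy as np

def total_cost(X, clusters, Z):
    """Sum of squared Euclidean distances from each point
    to its cluster's representative."""
    return sum(
        np.sum(np.linalg.norm(X[C] - z, axis=1) ** 2)
        for C, z in zip(clusters, Z)
    )

# Hypothetical example: 5 points in 2D split into 2 clusters
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
clusters = [[0, 1, 2], [3, 4]]
Z = [X[C].mean(axis=0) for C in clusters]  # centroids as representatives
print(total_cost(X, clusters, Z))          # ≈ 1.833
```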