22 January 2020 · Johannes Blömer (Universität Paderborn) · Soft and Hard Clustering


Soft and Hard Clustering

Clustering is one of the main techniques in data analysis. Usually its goal is to partition a set of objects into disjoint clusters such that objects within a cluster are more similar to each other than to objects in different clusters. In this hard clustering setting, each cluster is represented by a so-called centroid. In soft clustering, objects may belong to several clusters, each with a certain percentage, while clusters are again represented by centroids. In some cases and applications, the centroids obtained from soft clustering are better representatives of the whole data set. Many different algorithms have been designed to solve hard clustering problems approximately optimally. For soft clustering problems, much less is known. We present several results for soft clustering that rely on a general technique for transforming solutions of a soft clustering problem into solutions of a hard clustering problem with very similar properties. We also show that the same techniques can be used to simplify and improve the so-called expectation-maximization heuristic (EM algorithm) for computing mixture models for data sets. This is joint work with Sascha Brauer and Katrin Bujna.
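
To make the distinction between hard and soft assignments concrete, here is a minimal Python sketch. It is an illustration only, not the method presented in the talk: the abstract does not say which soft clustering objective is used, so the membership rule below assumes one common formulation (fuzzy k-means with fuzzifier m = 2). The function names, the one-dimensional data, and the centroid values are all hypothetical.

```python
def hard_assign(points, centroids):
    """Hard clustering: each point gets the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: (x - centroids[j]) ** 2)
        for x in points
    ]


def soft_memberships(points, centroids, m=2.0):
    """Soft clustering (assumed fuzzy k-means rule): each point gets one
    membership value per centroid, and the values of a point sum to 1."""
    rows = []
    for x in points:
        d2 = [(x - c) ** 2 for c in centroids]  # squared distances
        if 0.0 in d2:
            # The point coincides with a centroid: full membership there.
            hit = d2.index(0.0)
            rows.append([1.0 if j == hit else 0.0 for j in range(len(centroids))])
            continue
        # Membership is proportional to d2 ** (-1 / (m - 1)).
        inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
        total = sum(inv)
        rows.append([v / total for v in inv])
    return rows


# Hypothetical one-dimensional data and centroids.
points = [0.0, 1.0, 2.5, 4.0, 5.0]
centroids = [0.5, 4.5]

hard = hard_assign(points, centroids)
soft = soft_memberships(points, centroids)
```

The point 2.5 lies exactly halfway between the two centroids. The hard assignment must still place it in a single cluster (ties go to the lower index here), whereas the soft memberships split it evenly between both.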