Hard and Soft Clustering Explained
November 17, 2016
I read “An Introduction to Clustering and Different Methods of Clustering.” Clustering, it seems, remains a popular topic among the quasi-search and content processing crowd. What’s interesting about this write up is that it introduces hard clustering and soft clustering. I had assumed that clustering was neither hard nor soft. Here’s the distinction:
- In hard clustering, each data point either belongs to a cluster completely or not. For example, in the above example each customer is put into one group out of the 10 groups.
- In soft clustering, instead of putting each data point into a separate cluster, a probability or likelihood of that data point to be in those clusters is assigned.
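For the curious, here is a minimal sketch of the distinction in Python. It is not from the write up. The function names, the two sample centroids, and the choice to turn distances into weights with a softmax over negative squared distance are all my own assumptions, picked for illustration only; real soft clustering methods (Gaussian mixture models, fuzzy c-means) compute memberships differently.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hard_assign(point, centroids):
    """Hard clustering: return the index of the single nearest centroid."""
    distances = [squared_distance(point, c) for c in centroids]
    return distances.index(min(distances))


def soft_assign(point, centroids, temperature=1.0):
    """Soft clustering: return one membership weight per centroid.

    Assumption: weights come from a softmax over negative squared
    distances. This is an illustrative choice, not the article's method.
    """
    distances = [squared_distance(point, c) for c in centroids]
    closest = min(distances)  # subtract the minimum for numerical stability
    scores = [math.exp(-(d - closest) / temperature) for d in distances]
    total = sum(scores)
    return [s / total for s in scores]


# Hypothetical data: two cluster centers and one point to classify.
centroids = [(0.0, 0.0), (4.0, 0.0)]
point = (1.5, 0.0)

hard_label = hard_assign(point, centroids)    # a single cluster index
soft_weights = soft_assign(point, centroids)  # one weight per cluster, summing to 1
```

The hard version hands back one cluster and nothing else. The soft version hands back a weight for every cluster, so a point sitting between two groups shows up as partly belonging to each.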
The write up then highlights these go-to methods of clustering:
- K-means clustering
- Hierarchical clustering
The write up introduces the idea of supervised learning. I noted that the article did not point out that training is a time-consuming and often expensive exercise. The omission complements the “quick look” approach in the write up.
I am not sure that a person interested in clustering will be able to make a giant leap forward. Perhaps the effort will result in a hard soft landing?
Stephen E Arnold, November 17, 2016