Data mining and knowledge discovery missing topic: anomalous cluster clustering
Boris Mirkin (Higher School of Economics, Russian Federation)
Wednesday, 25th of March 2015, 14h00
FCT/UNL, Seminar Room (Ed. II)
I first consider a rather simple, intuitive criterion for individual cluster analysis: the product of the average within-cluster similarity and the number of elements in the cluster, to be maximized. I bring forth its mathematical properties, relating the criterion to high-density subgraphs and the spectral clustering approach. Then I present a simple approximation model of an anomalous cluster that leads to this criterion and to families of very effective methods: ADDI crisp clustering (Mirkin, 1987) and FADDIS fuzzy clustering (Mirkin, Nascimento, 2012), the latter leading to mysteries in the popular Laplace data normalization.
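As a rough illustration of the criterion described above, the following Python sketch scores a candidate cluster S as |S| times the average similarity within S. It assumes the average is taken over all ordered pairs in S, diagonal included, so that the score equals the sum of the within-cluster similarities divided by |S|. The function names and the greedy add/remove search are hypothetical choices made for this sketch; they are not the ADDI or FADDIS methods from the talk.

```python
import numpy as np

def cluster_score(A, members):
    """Score of a cluster: |S| times the average similarity within S.

    Assumption: the average runs over all ordered pairs (i, j) in S,
    diagonal included, so the score equals sum(A[S, S]) / |S|.
    """
    idx = np.asarray(members, dtype=int)
    if idx.size == 0:
        return 0.0
    block = A[np.ix_(idx, idx)]
    return idx.size * block.mean()

def greedy_cluster(A, seed):
    """Illustrative local search (not ADDI): starting from one seed entity,
    repeatedly add or remove the single entity that most increases the score,
    and stop when no single change improves it."""
    n = A.shape[0]
    members = {seed}
    best = cluster_score(A, sorted(members))
    while True:
        best_move = None
        for i in range(n):
            candidate = members ^ {i}  # toggle membership of entity i
            if not candidate:
                continue  # never consider the empty cluster
            score = cluster_score(A, sorted(candidate))
            if score > best + 1e-12:
                best, best_move = score, candidate
        if best_move is None:
            return sorted(members), best
        members = best_move
```

The score strictly increases at every accepted move, so the search stops after finitely many steps, though in general only at a local maximum.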
Then I show that the celebrated square-error k-means clustering criterion can be equivalently reformulated as that of finding a partition consisting of anomalous clusters. I will finish with the problem of consensus clustering, showing that it is equivalent to anomalous similarity clustering, and present experimental results on the superiority of this approach over competing methods.
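One standard identity that makes a reformulation of this kind plausible is the decomposition of the data scatter: for any partition with clusters S_k of sizes N_k and centroids c_k, the sum of squared norms of the data points equals the sum of N_k·||c_k||² over clusters plus the square-error criterion. Minimizing the square error is therefore the same as maximizing the sum of N_k·||c_k||², which, for data centered at the origin, favors large clusters lying far from the origin. The sketch below checks the identity numerically; it is an assumption that this is the formulation used in the talk, and the function name is hypothetical.

```python
import numpy as np

def scatter_decomposition(Y, labels):
    """Return (total, explained, within) for a data matrix Y and a partition.

    total     = sum of squared entries of Y (data scatter about the origin)
    explained = sum over clusters of N_k * ||c_k||^2
    within    = square-error (k-means) criterion of the partition
    """
    Y = np.asarray(Y, dtype=float)
    labels = np.asarray(labels)
    total = float((Y ** 2).sum())
    explained = 0.0
    within = 0.0
    for k in np.unique(labels):
        cluster = Y[labels == k]
        center = cluster.mean(axis=0)  # centroid c_k
        within += float(((cluster - center) ** 2).sum())
        explained += cluster.shape[0] * float((center ** 2).sum())
    return total, explained, within
```

For any labeling, `total` should equal `explained + within` up to floating-point error, so the partition that minimizes `within` is the one that maximizes `explained`.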
Boris Mirkin is a Professor at the Faculty of Computer Science, National Research University Higher School of Economics, Moscow, Russian Federation. He holds a PhD in Computer Science and a DSc in Systems Analysis from Russian universities. Between 1991 and 2010 he travelled extensively, taking visiting research appointments in France (1991-3), the USA (1993-8), and Germany (1996-9), and a teaching appointment at Birkbeck, University of London, UK (2000-2010).
He develops methods for clustering and interpretation of complex data within the “data recovery” perspective. Currently these approaches are being extended to the automation of text analysis problems, including the development and use of hierarchical ontologies. His latest publications are the textbook "Core concepts in data analysis" (Springer, 2011) and the monograph "Clustering: A data recovery approach" (Chapman and Hall/CRC Press, 2012).
Departamento de Informática, FCT/UNL
Quinta da Torre 2829-516 CAPARICA - Portugal
Tel. (+351) 21 294 8536 FAX (+351) 21 294 8541