Joshua Huang (E-Business Technology Institute, HKU)
Title: Clustering Large Data Sets in Data Mining and Cluster Validation
Abstract: Data mining is the process of discovering new knowledge from very large real-world databases. Scalability and the ability to handle different data types are two basic requirements for data mining algorithms. In this talk, I will focus on clustering, one of the fundamental operations in data mining. I will first give a brief review of clustering algorithms used in data mining. I will then discuss the family of k-means, k-modes, and k-prototypes clustering algorithms. After that, I will discuss the problem of cluster validation in data mining and present a visual cluster validation method. Next, I will present a method that uses the k-prototypes algorithm and visual cluster validation to build classification models interactively. Finally, I will conclude the talk with some future research topics.