David Cheung (Computer Science and Information Systems, HKU)

Title: S-GRACE, An S-Graph-based Clustering Algorithm for Query Performance Enhancement on XML Documents

Abstract: Using relational tables to store XML documents is an established trend. However, this approach fragments the documents and creates a large number of joins that seriously impact query performance. If the collection contains documents of different structures, we show that a proper clustering of the documents will alleviate the problem. To achieve a good clustering of XML documents, we propose an algorithm, S-GRACE, which clusters documents according to their XML structures. S-GRACE is a hierarchical clustering algorithm for semi-structured data. We introduce the notion of a structure graph (s-graph), which facilitates the definition of a distance metric applicable between documents as well as between clusters of documents. Our experiments with real data such as the DBLP database show that S-GRACE can discover clusters that cannot be spotted easily by manual inspection.
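
The abstract does not spell out how an s-graph or its distance metric is defined, so the following Python sketch is only an illustration under stated assumptions. It assumes an s-graph can be approximated by the set of parent-child element-name edges in a document, that a cluster's s-graph is the union of its members' edges, and that distance is one minus the number of shared edges divided by the size of the larger edge set. The function names s_graph, distance, and merge are hypothetical, and attributes are ignored for brevity.

```python
import xml.etree.ElementTree as ET


def s_graph(xml_text):
    """Return the set of (parent_tag, child_tag) edges found in a document.

    Assumption: this edge set stands in for the s-graph of the abstract.
    """
    root = ET.fromstring(xml_text)
    edges = set()
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node:
            edges.add((node.tag, child.tag))
            stack.append(child)
    return edges


def distance(g1, g2):
    """Assumed metric: 1 - |shared edges| / max(|g1|, |g2|)."""
    larger = max(len(g1), len(g2))
    if larger == 0:
        return 0.0  # two edgeless documents are treated as identical
    return 1.0 - len(g1 & g2) / larger


def merge(g1, g2):
    """Assumed s-graph of a cluster: the union of its members' edges."""
    return g1 | g2


# Hypothetical documents, loosely in the style of bibliographic records.
doc_a = "<article><title/><author/><journal/></article>"
doc_b = "<article><title/><author/><year/></article>"
doc_c = "<book><title/><publisher/></book>"

g_a, g_b, g_c = s_graph(doc_a), s_graph(doc_b), s_graph(doc_c)
print(distance(g_a, g_b))             # structurally similar documents
print(distance(g_a, g_c))             # structurally unrelated documents
print(distance(merge(g_a, g_b), g_c)) # cluster-to-document distance
```

Because a cluster is represented by the same kind of edge set as a single document, the same distance function applies between documents and between clusters, which is the property the abstract attributes to the s-graph. A hierarchical clustering pass could then repeatedly merge the closest pair.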