Tsinghua Science and Technology


self-supervised clustering, graph convolutional network, feature correspondence, semantic feature guidance, confusion matrix, evaluation indicator


Semi-supervised clustering of unlabeled data often outperforms unsupervised learning, which indicates that semantic information attached to clusters can significantly improve feature representation capability. In a graph convolutional network (GCN), each node aggregates information about itself and its neighbors, which helps capture both common and unique features among samples. Combining these findings, we propose a deep clustering method based on GCN and semantic feature guidance (GFDC), in which a deep convolutional network is used as a feature generator and a GCN with a softmax layer performs clustering assignment. First, the diversity and amount of input information are enhanced to generate highly useful representations for downstream tasks. Subsequently, a topological graph is constructed to express the spatial relationship of features. For a pair of datasets, feature correspondence constraints are used to regularize the clustering loss, and the clustering outputs are iteratively optimized. Three external evaluation indicators, i.e., clustering accuracy, normalized mutual information, and the adjusted Rand index, and one internal indicator, i.e., the Davies-Bouldin index (DBI), are employed to evaluate clustering performance. Experimental results on eight public datasets show that the GFDC algorithm is significantly better than the majority of competing clustering methods; for example, its clustering accuracy is 20% higher than that of the best competing method on the United States Postal Service dataset. The GFDC algorithm also achieves the highest accuracy on the smaller Amazon and Caltech datasets. Moreover, the DBI reflects the dispersion between clusters and the compactness within each cluster.
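As an illustration of the first external indicator, clustering accuracy is commonly computed by finding the one-to-one mapping between cluster IDs and class labels that maximizes the number of matched samples in the confusion matrix. The following is a minimal sketch under stated assumptions, not the evaluation code used for GFDC: the function names and the toy labels are hypothetical, the number of clusters is assumed to equal the number of classes, and the mapping is found by brute force over permutations (practical evaluations typically use the Hungarian algorithm instead).

```python
from itertools import permutations


def confusion_matrix(y_true, y_pred, k):
    """Count how often class i (row) co-occurs with cluster j (column)."""
    counts = [[0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        counts[t][p] += 1
    return counts


def clustering_accuracy(y_true, y_pred, k):
    """Fraction of samples matched under the best one-to-one cluster-to-class map.

    Brute force over k! permutations: acceptable for a toy example only.
    Assumes labels and cluster IDs are integers in range(k).
    """
    counts = confusion_matrix(y_true, y_pred, k)
    best = max(
        sum(counts[perm[j]][j] for j in range(k))  # perm[j] = class given to cluster j
        for perm in permutations(range(k))
    )
    return best / len(y_true)


# Hypothetical toy labels: cluster IDs are a permutation of the class IDs,
# with one class-1 sample placed in the cluster that mostly holds class 2.
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [2, 2, 2, 0, 0, 1, 1, 1, 1]
acc = clustering_accuracy(y_true, y_pred, k=3)
print(f"clustering accuracy = {acc:.4f}")
```

Because cluster IDs are arbitrary, comparing `y_true` and `y_pred` element by element would understate performance; the permutation step is what makes the indicator invariant to how the clusters are numbered.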