protein complex, similarity preclustering, protein-protein interaction networks, K-means
Proteins interact with each other to form protein complexes, and cell functionality depends on both protein interactions and these complexes. Based on the assumption that protein complexes are highly connected and correspond to the dense regions in Protein-protein Interaction Networks (PINs), many methods have been proposed to identify the dense regions in PINs. Because protein complexes may be formed by proteins with similar properties, such as topological and functional properties, in this paper, we propose a protein complex identification framework (KCluster). In KCluster, a PIN is divided into K subnetworks using a K-means algorithm, and each subnetwork comprises proteins of similar degrees. We adopt a strategy based on the expected number of common neighbors to detect the protein complexes in each subnetwork. Moreover, we identify the protein complexes spanning two subnetworks by combining closely linked protein complexes from different subnetworks. Finally, we refine the predicted protein complexes using protein subcellular localization information. We apply KCluster and nine existing methods to identify protein complexes from a highly reliable yeast PIN. The results show that KCluster achieves higher Sn and Sp values and f-measures than other nine methods. Furthermore, the number of perfect matches predicted by KCluster is significantly higher than that of other nine methods.
Tsinghua University Press
Xiaoqing Peng, Xiaodong Yan, Jianxin Wang. Framework to Identify Protein Complexes Based on Similarity Preclustering. Tsinghua Science and Technology 2017, 22(1): 42-51.