Tsinghua Science and Technology


genetic algorithm, gene signature, breast cancer, sparse logistic regression, predictor, chemosensitivity


Neoadjuvant chemotherapy is a necessary treatment for breast cancer patients with large tumors. Patients who achieve a pathologic Complete Response (pCR) after this treatment usually have a more favorable prognosis than those who do not. pCR is therefore now considered the best prognosticator for patients receiving neoadjuvant chemotherapy. However, not all patients benefit from this treatment, so a way is needed to predict which patients will achieve pCR. Various gene signatures of chemosensitivity in breast cancer have been identified, from which such predictors can be built. Nevertheless, many of them have a prediction accuracy of around 80%. Identifying gene signatures that can be used to build high-accuracy predictors is thus a prerequisite for their clinical testing and application. Elucidating the importance of each individual gene in a signature is another pressing need before such a signature can be tested in clinical settings. In this study, a Genetic Algorithm (GA) and Sparse Logistic Regression (SLR), along with the t-test, were employed to identify one signature. The signature comprised 28 probe sets selected by GA from the top 65 probe sets that differed most in expression between pCR and Residual Disease (RD), and it was used to build an SLR predictor of pCR (SLR-28). Tested on a training set (n = 81) and a validation set (n = 52), this predictor gave very precise predictions as measured by accuracy, specificity, sensitivity, positive predictive value, and negative predictive value, with all corresponding P values equal to zero. Furthermore, the predictor identified 12 important genes in the 28-probe-set signature. Our findings also demonstrated that the genes that were most discriminative as a group, as selected by GA and measured by SLR, were not necessarily those with the smallest individual P values by t-test, highlighting the ability of GA, as a multivariate technique, to capture interacting genes in pCR prediction.
Our gene signature outperformed a signature reported in a previous study (prediction accuracy 92% vs. 76%), demonstrating the potential of GA and SLR for identifying robust gene signatures for chemotherapy response prediction in breast cancer.
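
For readers less familiar with the five performance measures named above, the following is a minimal Python sketch of how they are conventionally computed from a binary confusion matrix. It is an illustration only and not the authors' implementation. The function name `classification_metrics`, the label coding (1 = pCR, 0 = RD), and the toy labels are assumptions introduced here, and the toy data are not taken from the study.

```python
import math
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    sensitivity: float
    specificity: float
    ppv: float  # positive predictive value
    npv: float  # negative predictive value


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def classification_metrics(y_true, y_pred) -> Metrics:
    """Compute the five measures from true and predicted labels.

    Assumed (hypothetical) coding: 1 = pCR (positive class), 0 = RD.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # pCR predicted as pCR
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # RD predicted as RD
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # RD predicted as pCR
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # pCR predicted as RD
    return Metrics(
        accuracy=_ratio(tp + tn, tp + tn + fp + fn),
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
    )


if __name__ == "__main__":
    # Toy labels for illustration only; these are not data from the study.
    toy_true = [1, 1, 1, 0, 0, 0, 0, 0]
    toy_pred = [1, 1, 0, 0, 0, 0, 0, 1]
    print(classification_metrics(toy_true, toy_pred))
```

Because pCR is typically the minority class, accuracy alone can be misleading, which is presumably why sensitivity, specificity, and the two predictive values are reported alongside it.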


Tsinghua University Press