Clinical implementation of the intrinsic subtypes of breast cancer

CM Perou, JS Parker, A Prat, MJ Ellis… - The lancet …, 2010 - thelancet.com
The Lancet Oncology, 2010
Our group has reported several intrinsic gene sets important for identifying subtypes of breast cancer with clinical significance.1–3 In these studies we have explored and published methods for sample classification across different genomic platforms and tissue qualities. In 2009, we suggested the use of a standardised gene set (PAM50) for subtype classification to improve the concordance of classifications reported by different investigators.3 However, a standardised gene set does not completely resolve discrepancies between researchers, since the genes might be quantitatively measured using different platforms and normalisation methods. Weigelt and co-workers4 applied three different intrinsic gene sets to four datasets using one prediction method and showed a range of agreement. Because of the level of discordance they reported, they concluded that identification of the intrinsic subtypes is not ready for clinical implementation. We disagree with this interpretation. Careful examination of Weigelt and co-workers’ analyses4 revealed bioinformatics-based technical limitations that reduced the accuracy of subtype predictions and the concordance of these three predictors. These limitations are highlighted in the accompanying letters, but we emphasise here the importance of dataset-to-dataset normalisation. The webappendix shows the relationship between the four datasets when they are not normalised (as done by Weigelt and co-workers4) and when row centring and column standardisation are done (as advocated by ourselves and others1–3,5,6). Additionally, differences in the composition of datasets (ie, the proportion of oestrogen receptor-positive [ER+] tumours) can affect sample classification.6 Nonetheless, all three gene sets were significant predictors of outcomes in univariate and multivariate testing, which suggests that this is a robust classification method. Many clinical assays begin in the research setting and are refined over time until they are ready for clinical use.
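The two normalisation steps mentioned above can be sketched in Python with NumPy on a genes × samples matrix. This is a minimal illustration under stated assumptions, not the pipeline used in the cited studies: centring each gene on its median and z-scoring each sample are choices made here for concreteness, and the function names and expression values are hypothetical. The second helper shows, with made-up numbers, why dataset composition matters: the same tumour's centred value for an ER-associated gene depends on the proportion of ER+ tumours in the cohort it is centred against.

```python
import numpy as np


def row_centre_column_standardise(x):
    """Median-centre each gene (row), then standardise each sample (column).

    x: genes x samples matrix of log-scale expression values.
    Median centring and per-sample z-scoring are illustrative assumptions,
    not a transcription of any published pipeline.
    """
    x = np.asarray(x, dtype=float)
    # Row centring: subtract each gene's median across the samples in this dataset.
    centred = x - np.median(x, axis=1, keepdims=True)
    # Column standardisation: give every sample mean 0 and SD 1.
    mu = centred.mean(axis=0, keepdims=True)
    sd = centred.std(axis=0, ddof=1, keepdims=True)
    sd[sd == 0] = 1.0  # avoid dividing by zero for a constant column
    return (centred - mu) / sd


def centred_value(value, cohort):
    """Value of one tumour after median-centring against a given cohort."""
    return value - np.median(cohort)


if __name__ == "__main__":
    # Hypothetical log2 values of one ER-associated gene; high values stand in
    # for ER+ tumours. The first entry in each cohort is the same tumour (8.0).
    mostly_er_pos = np.array([8.0, 9.0, 8.5, 7.5, 2.0])  # 4 of 5 high
    mostly_er_neg = np.array([8.0, 2.5, 3.0, 2.0, 9.0])  # 2 of 5 high
    print(centred_value(8.0, mostly_er_pos))  # 0.0
    print(centred_value(8.0, mostly_er_neg))  # 5.0
```

In the ER-rich cohort the tumour sits at the gene's median and looks unremarkable; in the ER-poor cohort the same measurement appears strongly elevated, which is one way differences in ER+ proportion can shift subtype calls after centring.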
Clinical concordance testing rarely shows perfect agreement even under the best of circumstances, such as measuring a single analyte with a locked-down protocol across CLIA (Clinical Laboratory Improvement Amendments) laboratories. There is little value, and potential harm, in drawing conclusions about the robustness and utility of a test from research data that were generated in independent laboratories and were not intended for concordance testing, as Weigelt and colleagues4 and the accompanying commentary7 did. The interpretation of Weigelt and co-workers4 is also based on the hypothesis that training sets with different tumours and genes should result in high agreement in subtype classification. In fact, these three training sets were not designed to be concordant at the individual-sample level, because the three classifiers reflect the logical evolution over time of a classification method built on the most up-to-date data and technologies available. Over the past decade we have learned a great deal about microarray experimental design and objective statistical selection procedures, and the microarray technology itself has improved dramatically. For these reasons, we believe that the most accurate and trustworthy assay is the PAM50 assay.3 In view of the confusion over what a single sample predictor (SSP) is, we define an SSP here as any predictor whose algorithm and parameter values are determined exclusively from a training set, and in which test cases are assessed independently. This requires that the normalisation of a test case (for example, a red/green [R/G] ratio or normalisation to housekeeping genes) does not depend on measurements from other test cases. In theory, any centroid …
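To make the SSP definition concrete, here is a minimal, hypothetical nearest-centroid predictor in Python. Everything in it is an illustrative assumption rather than the PAM50 implementation: the class and function names, centring with training-set gene medians, Spearman rank correlation as the similarity to each centroid, and the toy four-gene data. It contrasts `predict_one`, which uses only parameters learned from the training set plus the values of the single case being classified, with `predict_batch_centred`, which deliberately violates the definition by centring each case against the other cases in its test batch.

```python
import numpy as np


def _ranks(v):
    # Ranks 0..n-1; ties are broken by position (a simplification).
    return np.argsort(np.argsort(v)).astype(float)


def spearman(a, b):
    """Spearman correlation computed as Pearson correlation of ranks."""
    return float(np.corrcoef(_ranks(a), _ranks(b))[0, 1])


class CentroidSSP:
    """Hypothetical nearest-centroid single sample predictor.

    Every parameter (per-gene medians, subtype centroids) comes from the
    training set only; predict_one() looks at one test case at a time.
    """

    def fit(self, x_train, labels):
        # x_train: genes x samples; labels: one subtype label per sample.
        x_train = np.asarray(x_train, dtype=float)
        labels = np.asarray(labels)
        self.gene_median_ = np.median(x_train, axis=1)
        centred = x_train - self.gene_median_[:, None]
        self.subtypes_ = sorted(set(labels.tolist()))
        self.centroids_ = {
            s: centred[:, labels == s].mean(axis=1) for s in self.subtypes_
        }
        return self

    def predict_one(self, sample):
        # Centre with TRAINING medians, never with statistics of other test cases.
        v = np.asarray(sample, dtype=float) - self.gene_median_
        scores = {s: spearman(v, c) for s, c in self.centroids_.items()}
        return max(scores, key=scores.get)


def predict_batch_centred(model, x_test):
    # NOT an SSP: centres each test case with medians of the whole test batch,
    # so the call for one case can change with the other cases in the batch.
    x_test = np.asarray(x_test, dtype=float)
    centred = x_test - np.median(x_test, axis=1, keepdims=True)
    return [
        max(model.subtypes_, key=lambda s: spearman(col, model.centroids_[s]))
        for col in centred.T
    ]


if __name__ == "__main__":
    x_train = np.array([
        [9, 8, 9, 2, 3, 2],
        [8, 9, 8, 3, 2, 3],
        [2, 3, 2, 9, 8, 9],
        [3, 2, 3, 8, 9, 8],
    ], dtype=float)  # toy genes x samples matrix
    model = CentroidSSP().fit(x_train, ["A", "A", "A", "B", "B", "B"])
    # First column is a borderline case; the other three resemble subtype "A".
    batch = np.array([
        [6.0, 9.0, 8.5, 9.0],
        [5.8, 8.5, 9.0, 8.5],
        [5.0, 2.0, 2.5, 2.5],
        [5.2, 2.5, 2.0, 2.0],
    ])
    print(model.predict_one(batch[:, 0]))          # A
    print(predict_batch_centred(model, batch)[0])  # B
```

Because `predict_one` never reads other test cases, its call for the borderline tumour is the same whether it is classified alone or in any batch. The batch-centred variant shifts that tumour's profile according to what else happens to be in the batch, so the same tumour flips from "A" to "B" when surrounded by "A"-like cases.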