Mapping the proteo-genomic convergence of human diseases

M Pietzner, E Wheeler, J Carrasco-Zanini, A Cortes, M Koprulu, MA Wörheide, E Oerton… - Science, 2021 - science.org
INTRODUCTION
Proteins are essential functional units of the human body and represent the largest class of drug targets.
RATIONALE
Broad-capture proteomics has the potential to identify causal disease genes, mechanisms, and candidate drug targets by systematically integrating knowledge about genetic signals that are shared among the protein-encoding gene, the resulting protein abundance or function, and common complex diseases. Although technological advances now enable such enquiry at scale, the genetic architecture of most proteins and its relevance for human health remain unknown. We performed a genome-proteome–wide association study including 4775 protein targets measured in plasma from 10,708 European-descent individuals (mean age 48.6 years, 53.3% women). We used the identified protein–quantitative trait loci (pQTLs) to create a proteo-genomic map of human health based on shared, colocalized genetic architecture tested across thousands of phenotypes at protein-encoding loci (cis-pQTLs).
RESULTS
We identified 10,674 genetic variant–protein target associations (P < 1.004 × 10^-11) distributed across 2548 genomic regions (1097 unreported) and covering 3892 distinct protein targets. Of 1538 protein targets with at least one cis-pQTL, we found that half share a genetic signal with gene expression in at least one of 49 tissues; alternative splicing events account for about one-fifth of those, demonstrating the utility of plasma proteomics as a means to infer tissue effects. We demonstrated that cis-pQTLs helped to prioritize candidate causal genes at 558 established risk loci for 537 collated phenotypes. For one-fourth of these (24.6%), this included genes not reported or different from those prioritized by gene expression QTLs, including PRSS8 (encoding prostasin) for Alzheimer’s disease or RSPO1 (encoding R-spondin–1) for endometrial cancer. We created a cis-anchored proteo-genomic map of human health including 1859 gene-protein-phenotype connections comprising 412 proteins and 506 curated traits. The map highlighted strong cross-disease biological convergence. For example, the genetic signal at EFEMP1 (EGF-containing fibulin-like extracellular matrix protein 1) was shared across diverse connective tissue disorders consistent with abnormal elastic fiber morphology of the Efemp1 knockout mouse. Integration of diverse “omic” layers identified a supersaturated bile to promote cholesterol crystallization and gallstone formation as the mode of action at SULT2A1. We developed an approach to classify pQTLs by integrating ontology mapping with a data-derived protein network. This showed that 39% (n = 2302) of trans-pQTLs (i.e., those distant from the protein-encoding gene) were protein- or pathway-specific and identified established risk loci, such as rs738409 (PNPLA3), an established liver fibrosis locus, to act on several proteins that are all part of a specific protein community.
We developed an interactive web resource (www.omicscience.org/apps/pgwas) to facilitate rapid access to and interrogation of our results.
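The cis/trans distinction and the significance threshold quoted above can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Association` record, the `classify` and `is_significant` helpers, and the 500 kb cis window are all assumptions made for illustration. Only the P-value threshold and the notion that trans-pQTLs lie distant from the protein-encoding gene come from the abstract.

```python
from dataclasses import dataclass

# Threshold quoted in the abstract (P < 1.004 x 10^-11).
P_THRESHOLD = 1.004e-11

# ASSUMPTION: the abstract does not state how far "distant" is;
# 500 kb on either side of the gene is a hypothetical choice.
CIS_WINDOW_BP = 500_000


@dataclass(frozen=True)
class Association:
    """One variant-protein association (hypothetical record layout)."""
    variant: str
    protein: str
    variant_chrom: str
    variant_pos: int
    gene_chrom: str
    gene_start: int
    gene_end: int
    p_value: float


def is_significant(assoc: Association) -> bool:
    """True if the association passes the abstract's P-value threshold."""
    return assoc.p_value < P_THRESHOLD


def classify(assoc: Association, window: int = CIS_WINDOW_BP) -> str:
    """Label a pQTL 'cis' if the variant lies within `window` base pairs
    of the protein-encoding gene on the same chromosome, else 'trans'."""
    if assoc.variant_chrom != assoc.gene_chrom:
        return "trans"
    lower = assoc.gene_start - window
    upper = assoc.gene_end + window
    return "cis" if lower <= assoc.variant_pos <= upper else "trans"


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    examples = [
        Association("var_A", "PROT_X", "1", 1_000_000, "1", 1_200_000, 1_250_000, 1e-20),
        Association("var_B", "PROT_X", "2", 1_000_000, "1", 1_200_000, 1_250_000, 1e-15),
        Association("var_C", "PROT_Y", "1", 9_000_000, "1", 1_200_000, 1_250_000, 2e-11),
    ]
    for a in examples:
        if is_significant(a):
            print(a.variant, a.protein, classify(a))
```

Any real analysis would take the window definition and gene coordinates from the study's methods rather than from the placeholder values used here.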
CONCLUSION
Genetically anchored plasma proteomics identifies shared etiologies across diseases, enables prioritization of drug targets, and provides a systems biology context for gene-to-phenotype and protein-to-phenotype connections.
Summary of the study design (outer circle) to construct a proteo-genomic map (inner circle) of human health.
Connections between protein-encoding genes, proteins, diseases, and phenotypes were drawn for all examples with strong evidence of a shared genetic signal based on statistical colocalization (posterior probability > 80%). Parts of the …
AAAS