Sep 21, 2011

Beyond the Genome 2011: Day 2: Exome sequencing

A complete overview of all challenges, achievements, case studies and future plans on using exome sequencing.


Jay Shendure (UW Genome Sciences)

Started exome sequencing in 2009
From 2009 to 2011: solution-based 96-plex exome capture, with multiple exomes per lane. More than 10,000 exomes sequenced, more than 200 publications, applied to cancers and Mendelian disorders

a) Gene discovery
b) Validation, follow-up and diagnostics
c) Kabuki syndrome
d) Autism spectrum disorders

Kabuki syndrome: suspected dominant, 10 probands with Kabuki. The first analysis gave the wrong answer (MUC16).
Reasons: genetic heterogeneity, undercalling of coding variants, causal non-coding or structural variants
New filtering strategy: Ng et al. Nature Genetics (2010). Did phenotypic stratification and genotypic stratification. Narrowed down to MLL2 (7/10 cases had nonsense / frameshift mutations in this gene). Each of these was a de novo mutation when the parents were studied. Sanger sequencing of MLL2 was done in 110 cases. Only 74% had an MLL2 mutation. How do you explain the 26% who did not? Maybe there are non-coding and structural mutations. So they sequenced the entire MLL2 gene (48 samples). Found NO non-coding or structural mutations in MLL2-negative cases.

Autism spectrum disorders: Complex inheritance. Assumed that a proportion of autism is due to coding variants with large effects, but hundreds of genes are involved, so the earlier approach won't work.
Approach: Trio-sequencing. Pilot study published (O'Roak et al. Nature Genetics 2011). Pilot: 20 trios, all of which are simplex autism. NimblegenEZ ExomeV2.0 used to capture exome.
"Child has a variant that neither parent has". De novo SNV analysis: "Haystack".
242 new de novo variants confirmed. Mutation rate: 2.17 x 10^-8 per base per generation
A protein-protein network shows that about 40% of severe mutations are part of the same pathway.
Some cases may have multiple hits: oligogenic model, extreme heterogeneity. So now how to validate the candidate genes? Solution: Molecular Inversion Probes (Turner et al. Nature Methods 2009). Extreme locus heterogeneity: no single gene causes > 0.5% of simplex autism
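
The trio rule quoted above ("child has a variant that neither parent has") can be sketched in a few lines. This is only a minimal illustration, not the pipeline used in the talk: the genotype encoding (0/1/2 alternate alleles, None for no call), the function names and the 30 Mb target size are all assumptions made for the example.

```python
# Minimal sketch of a trio de novo filter. Everything here is hypothetical:
# genotypes are encoded as the number of alternate alleles (0, 1 or 2),
# and None means the site could not be called in that individual.

def is_candidate_de_novo(child, mother, father):
    """True when the child carries an alternate allele that neither parent carries."""
    return child > 0 and mother == 0 and father == 0


def de_novo_candidates(trio_calls):
    """trio_calls maps a site to a (child, mother, father) genotype tuple."""
    hits = []
    for site, (child, mother, father) in sorted(trio_calls.items()):
        if None in (child, mother, father):
            continue  # skip sites that are not called in all three individuals
        if is_candidate_de_novo(child, mother, father):
            hits.append(site)
    return hits


def expected_de_novo_per_child(rate_per_base, callable_bases):
    """Expected de novo SNVs per child: rate x 2 haploid copies x target size."""
    return 2 * rate_per_base * callable_bases


example = {
    ("chr1", 1000): (1, 0, 0),     # candidate de novo
    ("chr1", 2000): (1, 1, 0),     # inherited from the mother
    ("chr2", 500): (1, None, 0),   # mother not called, skipped
    ("chr3", 42): (0, 0, 0),       # child is homozygous reference
}
candidates = de_novo_candidates(example)
print(candidates)

# Assumed 30 Mb callable target; the rate is the one quoted in the notes.
print(expected_de_novo_per_child(2.17e-8, 30_000_000))
```

In real data most sites that pass this rule are sequencing or calling errors in one of the three samples (hence the "haystack"), so candidates still need validation.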

Tim O'Connor (UW Genome Sciences)
Analysis of 2440 exome sequences (goal: 7000) from the Broad and Univ. of Washington, Seattle; both cases and controls. After analysis, 564,698 SNVs in the intersection of all 2440 samples; after further filtering, only 503,385 remain. 82% of SNVs were novel. High skew towards rare alleles: 57% of SNVs are singletons (<0.1% MAF). The number of segregating sites increases exponentially as the sample size increases.
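
A quick check of why a singleton falls under the 0.1% MAF mark at this sample size, plus how a singleton fraction would be counted. This is a sketch with made-up allele counts; the function names and the assumption of diploid samples are mine, not from the talk.

```python
# Sketch: singleton frequency and singleton fraction (hypothetical helpers).

def singleton_frequency(n_samples, ploidy=2):
    """Allele frequency of a variant seen on exactly one chromosome."""
    return 1 / (n_samples * ploidy)


def singleton_fraction(alt_allele_counts, n_chromosomes):
    """Fraction of segregating sites whose alternate allele is seen exactly once.

    A site is treated as segregating when the alternate allele is present
    but not fixed (0 < count < n_chromosomes).
    """
    segregating = [c for c in alt_allele_counts if 0 < c < n_chromosomes]
    if not segregating:
        return 0.0
    return sum(1 for c in segregating if c == 1) / len(segregating)


freq = singleton_frequency(2440)  # 2440 diploid exomes = 4880 chromosomes
print(f"singleton allele frequency: {freq:.4%}")

# Toy allele counts for ten sites in a sample of 20 chromosomes.
toy_counts = [1, 1, 1, 2, 5, 1, 0, 20, 3, 1]
fraction = singleton_fraction(toy_counts, n_chromosomes=20)
print(fraction)
```

With 4880 chromosomes a singleton sits at roughly 0.02% frequency, well below the 0.1% cutoff mentioned in the talk.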

Ron Do (Harvard; group: Abecasis, Altshuler, Kathiresan, Sunyaev, Kiezun, Farlow, Gabriel)
Exome sequencing as a tool to discover genes for blood lipids and myocardial infarction

Disease: FHBL (familial hypobetalipoproteinemia)
FHBL type 1: APOB related; FHBL type 2: non-APOB related
Family with type 2 FHBL: 30 members across 3 generations, where APOB has been ruled out. They have extremely low lipid values, lower than the population mean. Segregates in autosomal dominant fashion.
Sequenced 16000 genes on Illumina GAII, aligned with MAQ, SNPs called using GATK.
18259 unfiltered variants -> basic QC -> ~16000 -> removed dbSNP or 1000G -> 481 -> remove control variants -> 60 -> only one gene, ANGPTL3, in which both siblings share two nonsense variants: a TCC codon changes to a TGA codon, i.e. serine to an opal (stop) codon. Both occur in the first exon of the gene. Both alleles are in the same sequencing read (IGV view). Confirmed by Sanger. Followed up by genotyping all 38 available family members. Mother had the first mutation, father had the second mutation, and they were transmitted to 10 kids. Individuals with both variants have low TG, LDL-C and HDL-C. Suggests a recessive effect. Extended the observation to the population: GWAS of lipid traits in over 100000 individuals. ANGPTL3 is expressed in liver -> regulates HDL levels
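
The filtering cascade above (QC -> drop known variants -> drop control variants -> genes where both siblings share two hits) is easy to sketch with plain lists and sets. The variant records, quality cutoff and identifiers below are toy data invented for illustration; the real analysis used GATK calls and dbSNP/1000G lookups.

```python
from collections import Counter

# Sketch of a step-wise variant filter. All records and thresholds are
# hypothetical; only the order of the steps follows the notes.

def filter_cascade(variants, known_ids, control_ids, min_quality=30):
    """Return a list of (stage name, remaining variants) after each filter."""
    stages = [("unfiltered", list(variants))]
    passed_qc = [v for v in variants if v["qual"] >= min_quality]
    stages.append(("basic QC", passed_qc))
    novel = [v for v in passed_qc if v["id"] not in known_ids]
    stages.append(("not in dbSNP/1000G", novel))
    not_in_controls = [v for v in novel if v["id"] not in control_ids]
    stages.append(("not in controls", not_in_controls))
    return stages


def genes_with_two_shared_hits(variants, carriers_by_sibling):
    """Genes with at least two variants carried by every sequenced sibling."""
    if not carriers_by_sibling:
        return []
    shared = set.intersection(*carriers_by_sibling.values())
    per_gene = Counter(v["gene"] for v in variants if v["id"] in shared)
    return sorted(gene for gene, n in per_gene.items() if n >= 2)


toy_variants = [
    {"id": "v1", "gene": "ANGPTL3", "qual": 60},
    {"id": "v2", "gene": "ANGPTL3", "qual": 55},
    {"id": "v3", "gene": "GENE_A", "qual": 10},   # fails QC
    {"id": "v4", "gene": "GENE_B", "qual": 50},   # already known
    {"id": "v5", "gene": "GENE_C", "qual": 45},   # seen in controls
    {"id": "v6", "gene": "GENE_D", "qual": 70},   # only in one sibling
]
stages = filter_cascade(toy_variants, known_ids={"v4"}, control_ids={"v5"})
for name, remaining in stages:
    print(f"{name}: {len(remaining)}")

carriers = {"sib1": {"v1", "v2", "v6"}, "sib2": {"v1", "v2"}}
shared_genes = genes_with_two_shared_hits(stages[-1][1], carriers)
print(shared_genes)
```

Printing the count after every stage mirrors the 18259 -> ~16000 -> 481 -> 60 style of reporting and makes it obvious which filter removes the most.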

Exome sequencing applied to the common, complex form of MI:
looking for rare mutations of large effect in unrelated individuals. Both private mutations and low-frequency coding mutations can contribute. 1200 cases (young, strong primary mutations) and 1200 controls (old, protective mutations). Cases are 20 yrs younger than controls. 177x coverage. T1 test (variant of CMC: Li & Leal, 2008) performed. Systematic deflation: many genes don't have rare variants (about 4000 genes have a T1 allele count too low to reach statistical significance). NBEAL1 is a GWAS locus for MI. How do we get to a bona fide discovery? Follow up on both hypotheses: private mutations and rare low-frequency coding mutations. Re-discovered an MI-protective nonsense mutation: PCSK9 C679X. 3rd strategy: imputation (GWAS and exome sequence reference panel, GWAS target). Re-discovered a known low-frequency MI SNP: LPA I1891M. All the top associated genes were already known.
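
To make the T1 idea concrete: collapse all variants in a gene with MAF below 1% into a single "carrier / non-carrier" flag per person, then compare carriers in cases versus controls. The sketch below uses a one-sided Fisher's exact test for that comparison, which is an assumption on my part and not necessarily the statistic used in the talk; the sample names, variant names and frequencies are toy values.

```python
from math import comb

# Sketch of a T1-style collapsing test for one gene. The data layout and
# the choice of Fisher's exact test are assumptions made for illustration.

def t1_carrier_table(genotypes, mafs, case_ids, maf_cutoff=0.01):
    """Return (case carriers, case non-carriers, control carriers, control non-carriers).

    genotypes: dict sample -> dict variant -> alternate allele count
    mafs:      dict variant -> minor allele frequency
    case_ids:  set of samples that are cases; all other samples are controls
    """
    rare = {v for v, f in mafs.items() if f < maf_cutoff}
    a = b = c = d = 0
    for sample, calls in genotypes.items():
        carrier = any(calls.get(v, 0) > 0 for v in rare)
        if sample in case_ids:
            a, b = (a + 1, b) if carrier else (a, b + 1)
        else:
            c, d = (c + 1, d) if carrier else (c, d + 1)
    return a, b, c, d


def fisher_exact_greater(a, b, c, d):
    """One-sided P(case carriers >= a) under the hypergeometric null."""
    n_cases, n_carriers, n = a + b, a + c, a + b + c + d
    upper = min(n_cases, n_carriers)
    tail = sum(comb(n_cases, k) * comb(n - n_cases, n_carriers - k)
               for k in range(a, upper + 1))
    return tail / comb(n, n_carriers)


toy_mafs = {"r1": 0.001, "r2": 0.004, "common": 0.2}
toy_genotypes = {
    "case1": {"r1": 1}, "case2": {"r2": 1},
    "case3": {"r1": 1, "common": 1}, "case4": {"common": 2},
    "ctrl1": {"common": 1}, "ctrl2": {}, "ctrl3": {"common": 1}, "ctrl4": {},
}
table = t1_carrier_table(toy_genotypes, toy_mafs,
                         case_ids={"case1", "case2", "case3", "case4"})
p_value = fisher_exact_greater(*table)
print(table, p_value)
```

This also shows where the deflation comes from: a gene with only a handful of rare-allele carriers cannot produce a small p-value no matter how they split between cases and controls. Testing for protective variants would use the opposite tail.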

Joris Veltman (Nijmegen Centre, Netherlands)
A de novo paradigm for intellectual disability

The dogma that genetic causes of disease are all inherited through the germline is wrong.
In both linkage and association approaches, rare mutations are not found. Many diseases occur sporadically and are associated with reduced fitness, e.g. autism, schizophrenia. How can we identify the genetic causes of these disorders?

Intellectual disability (ID) has high heritability scores and large chromosomal rearrangements. Microarrays find CNVs in ID patients whose parents are normal => de novo CNVs. These are frequent and can occur throughout the genome. 15% of ID is caused by de novo CNVs. Example: CNV on 17q21.31 (parent has an inversion at this locus that makes it susceptible). These de novo events may be drivers of evolution but also causes of disease. Explanation: balance between genetic copying errors that turn normal alleles into harmful mutations and selection eliminating these mutations. If a mutation is beneficial, it leads to evolution. If harmful, it leads to disease. Estimated per-generation mutation rate: 50-100 de novo mutations per genome. Mutational target size determines the frequency of disease. Two examples: SETBP1 (small mutational target) -> disease is extremely rare. In contrast, DHODH, NDUFS1, ACAD9 (multiple genes) -> a common disorder.
ID occurs mostly sporadically, is associated with advanced paternal age and with a low recurrence risk, and many genes are involved (based on X-linked ID). So the hypothesis is: each patient carries a de novo mutation in a different gene -> NGS sequencing of patient-parent trios, filtering for de novo mutations. Pilot study: 10 trios (normal karyotype and CNV profile). Results: after filtering, 140-200 private variants per individual. You cannot look for overlap, as each patient could have a different gene. Finally, left with 0-2 de novo mutations. 9 different genes affected by de novo mutations in 7 patients => captured known mental retardation genes (RAB39B and SYNGAP1). Does the mutation impact protein function? PhyloP score and Grantham score show distinct groups, implying that these mutations could be pathogenic. 7 out of 10 patients had a de novo mutation. It follows the expected per-generation mutation rate (number of mutations not increased, but more of them in functionally relevant genes). Targeted resequencing in 600 ID patients at the YY1 locus. Hence de novo mutations explain about 40-50% of ID cases.
De novo questions:
a) Frequency and timing of de novo mutations
b) Randomly distributed or in hotspots?
c) What are the risk factors that influence de novo mutation generation?
Applied questions:
a) How to discriminate benign from pathogenic de novo mutations?
b) What % of ID and other syndromes are explained by de novo mutations?
c) Which genes/genomic elements cause ID?

Gholson Lyon (Children's Hospital of Philadelphia)
Previously unrecognised X-linked :

We can find previously unreported mutations in (idiopathic) neuro-psychiatric diseases. You have to link mutations to the phenotype. There are many idiopathic disorders not described in the literature. He saw 100s of these, but wanted to choose one family that could prove his hypothesis. He met the first boy who died. More died of cardiac arrhythmia. No mention of this syndrome in the literature (all have consistent facial features). Mainly skeletal, cardiac, genital and neurologic conditions. Sutter died and he attended the autopsy... all internal systems looked fine.
Experimental design is critical for sequencing: with families, you have the power of the design. So he went after carrier mother, carrier grandmother, unaffected brother and unaffected uncle. All affected were boys, so the guess was that it is X-linked. Used ANNOVAR and VAAST. Mutation was segregating in all members of the family (proline to serine mutation). The study describes the first human genetic disorder involving the amino-terminal acetylation of proteins, opening the door to new biology. It is different from progeria (no arterial sclerosis).

Stephen Kingsmore (Children's Mercy Hospital in Kansas City)

7000 OMIM mendelian diseases -> 3280 molecular basis known -> reclassify disorders as genetic based. Disease risk is more than 4-fold in inbred populations.
Monogenic disease inheritance: autosomal recessive, X-linked recessive, dominant, mitochondrial, imprinting disorders
Mendelian disease testing: about 100 of the 3280 diseases of known cause have sufficiently mature knowledge for clinical testing. For the others, knowledge is lacking (only about 200 disease gene tests are available). Multiple differential diagnoses in monogenic diseases: testing is done one gene at a time by Sanger sequencing, costing $10K/patient. Pretty miserable scenario for widespread testing. Newborn screening covers only 60 of the 120 eligible diseases. Preconception carrier testing: in the US this is mainly done in ethnic groups/subpopulations with high susceptibility to certain diseases (e.g. cystic fibrosis). Most testing is done in two ways: a) FDA-approved process and b) CLIA-approved lab. All this will change as the cost of sequencing drops rapidly; the cost of interpretation will be higher. The recent BRCA1/BRCA2 ruling says sequence cannot be patented, so we will have more access to sequencing data. For Mendelian diseases that are non-exonic, we need targeted sequencing, as whole genome sequencing is expensive today.
Illumina TruSeq chemistry has good sensitivity and coverage, with 2 rounds of enrichment. There is no standard for genotype calling, variant detection and interpretation. ACMG has guidelines, but they are not mandated. Need a centralized clinical variation database. Another problem: 22% of literature-cited disease mutations were common SNPs or misannotated (not reproducible). E.g. an exonic deletion which is actually a splice-donor deletion. Due to NIH funding limitations, there is limited testing and therapy, and a lack of ascertainment or timely diagnosis. Even if there is an appealing drug, we cannot use it. The solution is collective R&D funding and rapid testing leading to genotype-phenotype relationships.
