Sep 20, 2011

Beyond the Genome 2011: Workshop

Informatics workshop:
Cutting-edge presentations by Mike Schatz, Ben Langmead and James Taylor, highlighting implementations of reproducible bioinformatics workflows on the Amazon EC2 cloud computing platform.

  • Complete Genomics presentation: 
- Complex structural mutations commonly found in cancer genomes cannot be reliably identified by current algorithms. Such complex indels, substitutions or double translocations are often wrongly called as SNPs.
- 69 completely sequenced genomes are now available at bionumbus.org or dnanexus.com.
- An insert size of 400 bp is the minimum required to span Alu elements in the genome.
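A back-of-the-envelope way to see where a figure like 400 bp can come from is the minimal sketch below. It is my own illustration, not something presented in the talk. It assumes an Alu element of roughly 300 bp, and it assumes that a read pair "spans" the repeat only when each read lies entirely in unique flanking sequence, so the fragment must cover read + repeat + read. The helper names and the 35/50 bp read lengths are hypothetical choices for illustration.

```python
# Minimal sketch (assumptions, not from the talk): a read pair spans a repeat
# only if both end reads fall entirely in unique flanking sequence, so the
# fragment (insert) must cover read + repeat + read.

ALU_LENGTH_BP = 300  # assumed round figure for the length of an Alu element


def min_spanning_insert(repeat_len: int, read_len: int) -> int:
    """Smallest insert size whose two end reads both lie outside the repeat."""
    return repeat_len + 2 * read_len


def spans_repeat(insert_size: int, repeat_len: int, read_len: int) -> bool:
    """True if a pair with this insert size can anchor both reads in unique flanks."""
    return insert_size >= min_spanning_insert(repeat_len, read_len)


if __name__ == "__main__":
    # Hypothetical read lengths, chosen only to show how the bound moves.
    for read_len in (35, 50):
        need = min_spanning_insert(ALU_LENGTH_BP, read_len)
        print(f"{read_len} bp reads: insert >= {need} bp")
```

With 50 bp reads this gives exactly 400 bp. The sketch ignores the spread of real insert-size distributions, so in practice a library would be built with some margin above this bound.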
Excellent talks by:
  • Mike Schatz (experiment design, observation, integration, the discovery process; Crossbow, Contrail, Quake, Jnomics, Hydra),
  • Matt Wood (Amazon AWS: Cloud BioLinux, CycleCloud, parallel BLAST, Taverna, StarCluster, Rocks and Condor, all running on EC2),
  • Ben Langmead (the cloud-based Crossbow pipeline, the Myrna RNA-seq pipeline, Hadoop & MapReduce, Apache Pig),
  • James Taylor (useGalaxy, getGalaxy, useGalaxy/cloud & ToolShed). The way to go is to use or build informatics workflows in Galaxy (leveraging existing tools from the ToolShed) and to run them on cloud compute resources (CloudMan on Amazon EC2, or other custom instances).
  • A very vague and wandering presentation by Yinguri Li (BGI) on 'trees' of information (very conceptual, with no framework presented), and a very nuts-and-bolts talk on the ALLPATHS-LG assembly algorithm by David Jaffe.
  • Mike Schatz posed a challenge: identify a viral insertion sequence in a genome. Two attendees cracked it in just over an hour.

Overall, based on today's talks, I am convinced of the need to adopt cloud-based, Galaxy-enabled informatics workflows for reliable, reproducible and collaborative analysis.
