
== 2010 ==
=== Processing Very Long Illumina Reads - [http://web.mit.edu/biology/www/facultyareas/facresearch/chisholm.html CHISHOLM LAB] - [http://web.mit.edu/biology Biology] ===
Several high-throughput sequencing platforms are currently available, with trade-offs between the cost per sequencing read, the number of reads, and the average read length. The Chisholm lab has been interested in optimizing the Illumina platform for the de novo sequencing of microorganisms. To this end, the Chisholm lab has worked with the BioMicro Center to develop a pipeline that significantly increases the read length yielded by Illumina sequencing technology, generating sequencing reads that can exceed 250 nucleotides in length. Combined with Illumina's low cost and high throughput, the procedure expands the range of applications that can be performed with this platform. <BR><BR>
Illumina reads tend to decrease in quality with length due to slight errors in incorporation and extension of the growing sequence. To improve the error rate at long read lengths, the Chisholm lab developed an algorithm, SHERA (SHortread Error-Reducing Aligner), which uses overlapping paired-end reads to create long and accurate composite reads. With SHERA, more than 87% of the paired-end sequencing reads produce longer composite sequences, with less than 1% of paired reads incorrectly aligned. The quality score of each overlapped base is re-evaluated to take into account the information from the two paired-end reads. The Chisholm lab sequenced a marine metagenomic DNA sample using 454-FLX and the Illumina paired-end overlapping procedure, and found that the taxonomic classification results are highly platform-independent, demonstrating that composite sequencing reads constitute a cost-effective alternative to pyrosequencing. <BR><BR>
The creation of high-quality very long Illumina reads is applicable beyond metagenomic sequencing. The BioMicro Center is currently working to deploy this algorithm for many other applications, including amplicon sequencing, transcriptomics, de novo assembly and resequencing for mutation detection. We anticipate strong growth in the use of very long reads in FY2011. This work has been accepted for publication in PLoS One.
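<BR><BR>
The general idea of building a composite read from an overlapping pair can be illustrated with a short sketch. This is not SHERA itself: the function names, the overlap search (best mismatch fraction over all candidate overlaps), the thresholds, and the quality-combination rule (add Phred scores when the two reads agree, keep the higher-quality base and the score difference when they disagree) are all simplifying assumptions chosen for illustration.

```python
# Illustrative sketch only; NOT the SHERA implementation.
# All names, thresholds and the scoring rule are assumptions.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(fwd, fwd_qual, rev, rev_qual,
               min_overlap=10, max_mismatch_frac=0.1, max_qual=60):
    """Merge a forward read and a reverse read (as sequenced) into one
    composite read. Qualities are lists of Phred integers.
    Returns (sequence, qualities), or None if no acceptable overlap exists."""
    # Put the reverse read on the same strand as the forward read.
    rev_seq = reverse_complement(rev)
    rev_q = rev_qual[::-1]

    # Search every candidate overlap (suffix of fwd vs prefix of rev_seq)
    # and keep the one with the lowest mismatch fraction.
    best_ov, best_frac = None, None
    for ov in range(min(len(fwd), len(rev_seq)), min_overlap - 1, -1):
        mismatches = sum(a != b for a, b in zip(fwd[-ov:], rev_seq[:ov]))
        frac = mismatches / ov
        if best_frac is None or frac < best_frac:
            best_ov, best_frac = ov, frac
    if best_ov is None or best_frac > max_mismatch_frac:
        return None

    # Non-overlapping 5' part of the forward read is copied unchanged.
    seq = list(fwd[:-best_ov])
    qual = list(fwd_qual[:-best_ov])

    # Re-evaluate each overlapped base using both reads.
    for i in range(best_ov):
        b1, q1 = fwd[len(fwd) - best_ov + i], fwd_qual[len(fwd) - best_ov + i]
        b2, q2 = rev_seq[i], rev_q[i]
        if b1 == b2:
            seq.append(b1)
            qual.append(min(q1 + q2, max_qual))  # agreement raises confidence
        else:
            seq.append(b1 if q1 >= q2 else b2)   # trust the better base
            qual.append(abs(q1 - q2))            # disagreement lowers it

    # Non-overlapping 3' part comes from the reverse read.
    seq.extend(rev_seq[best_ov:])
    qual.extend(rev_q[best_ov:])
    return "".join(seq), qual


if __name__ == "__main__":
    fragment = "ACGTTGCAAGGCTTAACCGGATCGATTACG"
    fwd = fragment[:20]
    rev = reverse_complement(fragment[10:])
    merged = merge_pair(fwd, [30] * 20, rev, [30] * 20)
    print(merged[0] == fragment)
```

In this toy case two 20-nucleotide reads that overlap by 10 bases are combined into a single 30-nucleotide composite, and the overlapped bases end up with higher quality scores than either read reported on its own.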

=== High Throughput Metagenomics Using Illumina Sequencing - [http://almlab.mit.edu/l ALM LAB] - [http://web.mit.edu/be BE] - [http://web.mit.edu/cehs CEHS] === 
Understanding the role of the human microbiome in health and disease is an emerging field, and has been targeted as a major NIH Roadmap Initiative. Microbial community analysis by 16S rRNA sequencing is a key component of microbiome studies, together with whole genome sequencing and metagenomics. The BioMicro Center has worked with the Alm lab to establish and optimize an experimental approach to generating partial 16S rRNA sequences that is orders of magnitude less expensive than conventional methods, thus enabling unprecedented resolution in microbiome comparisons using the Illumina Genome Analyzers. <BR><BR>
The use of Illumina sequencing to assay microbiomes has been limited by read length and by financial considerations. In addition, homopolymeric sequences are very difficult to process using the standard Illumina image analysis software. Improvements to read length introduced by Illumina and careful selection of priming sites have addressed the read-length issue. The expense of the reads has been addressed by using very highly multiplexed lanes. While each lane of Illumina sequence costs $3,300 for the read lengths needed for this project, multiplexing the samples (up to 96x) dramatically lowers the cost per sample. Barcoding the samples also allowed the Alm lab to bypass the lack of complexity in ribosomal reads, as the highly diverse barcode sequences meet the criteria needed for spot finding. Data from this project have been used in an NIH grant application and in a patent application.
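<BR><BR>
The effect of multiplexing on cost can be made concrete with a short calculation. The sketch below uses only the $3,300 lane cost and the 96x maximum quoted above; the function name is hypothetical, and the figure covers sequencing cost only, since library preparation and barcoding costs are not given here.

```python
# Sequencing cost per sample when one lane is shared by several barcoded
# samples. Lane cost and the 96x maximum are taken from the text above;
# library preparation costs are not included (they are not stated).

LANE_COST_USD = 3300.0


def cost_per_sample(samples_per_lane, lane_cost=LANE_COST_USD):
    """Return the lane cost divided evenly across the multiplexed samples."""
    if samples_per_lane < 1:
        raise ValueError("samples_per_lane must be at least 1")
    return lane_cost / samples_per_lane


if __name__ == "__main__":
    for n in (1, 12, 48, 96):
        print(f"{n:>2} samples per lane: ${cost_per_sample(n):,.2f} per sample")
```

At the full 96x multiplexing level, the sequencing cost works out to about $34 per sample, compared with $3,300 for a single sample run alone in a lane.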

=== Testing of NuGen’s SPIA Technology for Illumina Sequencing - [http://web.mit.edu/biology/www/facultyareas/facresearch/burge.html BURGE LAB] - [http://web.mit.edu/be BE] - [http://web.mit.edu/biology Biology] - [http://web.mit.edu/ki KI] ===
One of the key limitations of RNA-sequencing is the relatively large amount of RNA required for each sample. While recent protocols have reduced the amount of total RNA input to below 1 µg, the initial protocols required 5-10 µg of material. This is several orders of magnitude higher than we routinely use for microarray analysis.
While most microarray labeling protocols are inappropriate for Illumina sequencing in that they use cRNA as their labeled material, the NuGen kits used by the BioMicro Center since 2009 are unique in that they use amplified cDNA, which has several benefits for microarray analysis. We were particularly interested in the ability of NuGen to handle amounts of RNA in the sub-nanomolar range, which could allow next-generation sequencing of RNA from single cells. In order to test the viability of this approach, we established a collaboration with NuGen and with Dr. Chris Burge of the Biology and Biological Engineering departments. <BR><BR>
To establish the robustness of the NuGen system for RNA-seq, two mRNA samples were isolated from control and UPF-1 knockdown cells and were prepared either with the NuGen kit (by NuGen technicians) or with standard RNA-seq methods from Illumina (prepared in the Burge lab). The NuGen samples were prepared across a variety of concentrations, and seven paired-end libraries were run. Differential error rates, coverage, sensitivity and differential expression were all calculated by the Burge Lab. <BR><BR>
Our results demonstrated that the NuGen kit, unfortunately, has a number of issues that are concerning in the RNA-seq environment. Analysis of coverage showed very uneven coverage of exons, likely due to the semi-random nonamers that NuGen uses in preparing the cDNA. In addition, the level of noise introduced in differential expression was quite large, at least for an experiment with subtle changes in expression. Our results have discouraged us from focusing on NuGen protocols for RNA-seq. They have even raised questions for us about the quality of the NuGen kit for exon array analysis, though other whole-transcriptome amplification methods are probably no better.