DNA Analysis Facility

Analysis

The GA generates an image of each base of each cluster. This image data is passed to a set of utilities known as the Genome Analyzer Pipeline, which performs data analysis on a sequencing run. Terabytes of tif images are processed into tens of gigabytes of txt files. Data analysis consists of three steps:

  1. Image Analysis: raw tif files are processed for cluster position, intensity, and noise. This data is used for base calling.
  2. Base Calling: cluster intensities are used to output the sequence of bases from each cluster. A quality score is given for each base.
  3. Sequence Analysis: sequence data is aligned to a reference sequence, and results are written to txt and html files.
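
To make the base calling step more concrete, the following is a minimal sketch in Python, not the Genome Analyzer Pipeline's actual algorithm. It assumes each cluster yields four intensities per cycle (one per base, in the order A, C, G, T), calls the base with the strongest signal, and derives a toy quality score from how close the second-strongest signal is, using the Phred-style scale Q = -10 * log10(p_error). The function names, the intensity layout, and the error estimate are all illustrative assumptions.

```python
import math

BASES = "ACGT"  # assumed channel order; hypothetical, for illustration only


def call_base(intensities):
    """Call one base for one cluster in one cycle.

    `intensities` holds four signal values in A, C, G, T order.
    Returns (base, quality). This is a toy model, not the real pipeline.
    """
    # Rank channels from strongest to weakest signal.
    ranked = sorted(zip(intensities, BASES), reverse=True)
    (top, base), (second, _) = ranked[0], ranked[1]
    second = max(second, 0.0)  # treat negative background as zero
    total = top + second
    if top <= 0 or total <= 0:
        return "N", 0  # no usable signal: ambiguous base, lowest quality

    # Toy error estimate: the share of signal held by the runner-up channel,
    # floored so the quality score stays finite.
    p_error = max(second / total, 1e-4)
    quality = round(-10 * math.log10(p_error))
    return base, quality


def call_read(cycles):
    """Call a whole read from a list of per-cycle intensity lists."""
    calls = [call_base(c) for c in cycles]
    sequence = "".join(base for base, _ in calls)
    qualities = [quality for _, quality in calls]
    return sequence, qualities


if __name__ == "__main__":
    example_cycles = [
        [900, 50, 30, 20],  # strong A signal
        [10, 800, 40, 30],  # strong C signal
        [5, 5, 5, 700],     # strong T signal
    ]
    sequence, qualities = call_read(example_cycles)
    print(sequence, qualities)
```

In this sketch a clean signal gives a high score and two nearly equal channels give a low one, which is the general idea behind per-base quality scores even though the real pipeline uses a more elaborate model.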

Several programs are available for downstream sequence analysis. A comprehensive list is available here. Many are open source, while others are vendor supplied. We will be happy to assist you with basic analysis, which currently includes ELAND and Maq.

Please note that a large amount of data is associated with each sample, so you should be prepared for the storage requirements. We will be happy to provide any data that you request, but because of its size, we cannot store it for you indefinitely. If you have any questions or concerns, please contact us.