

To generate a BAM file, we need to map our reads to the genome. The alignment process consists of choosing an appropriate reference genome to map our reads against and then performing the read alignment with one of several splice-aware alignment tools, such as STAR or HISAT2. The choice of aligner is often a matter of personal preference and also depends on the computational resources available to you. To determine where on the human genome our reads originated, we will align them to the reference genome using STAR (Spliced Transcripts Alignment to a Reference).
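As a rough illustration of what such an alignment run looks like, the sketch below assembles a STAR command line for single-end reads without executing it. The helper name `build_star_command` and the paths `star_index/`, `sample1.fastq`, and `results/sample1_` are hypothetical placeholders, not values from this lesson; the thread count is likewise an assumption. It also assumes a STAR genome index has already been built in the given directory.

```python
import shlex


def build_star_command(genome_dir, fastq, out_prefix, threads=4):
    """Return the argument list for a single-end STAR alignment run.

    All paths are supplied by the caller; nothing is executed here.
    """
    return [
        "STAR",
        "--runThreadN", str(threads),        # number of threads to use
        "--genomeDir", genome_dir,           # directory holding the STAR index
        "--readFilesIn", fastq,              # input FASTQ file
        "--outFileNamePrefix", out_prefix,   # prefix for all output files
        # Write a coordinate-sorted BAM file instead of the default SAM
        "--outSAMtype", "BAM", "SortedByCoordinate",
    ]


# Hypothetical paths, for illustration only
cmd = build_star_command("star_index/", "sample1.fastq", "results/sample1_")
print(shlex.join(cmd))
```

The printed command can be pasted into a shell or a job submission script; building it as a list first keeps each option and its value visibly paired.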


When Salmon performs the quasi-alignment, the algorithm internally knows the location(s) to which each read is assigned; however, this information is not shared with the user. To assess the quality of the mapping, we need genomic coordinate information for where each read maps. Since this is not part of the Salmon output, we will need to use a genome alignment tool to generate a BAM file. The BAM file will be used as input to a tool called Qualimap, which computes various quality metrics such as DNA or rRNA contamination, 5’-3’ biases, and coverage biases.

The Sequence Alignment Map (SAM) format is a tab-delimited text file that contains all the information from the FASTQ file, with additional fields containing alignment information for each read. Specifically, we can obtain the genomic coordinates of where each read maps in the genome and the quality of that mapping. A BAM file is the binary, compressed version of the SAM file. It is significantly smaller in size and is usually the file format requested by downstream tools that require alignment data as input. The paper by Heng Li et al. provides a lot more detail on the specification.
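To make the tab-delimited layout concrete, here is a minimal sketch that pulls the mapping coordinates and mapping quality out of a single SAM alignment line. The record below is invented for illustration (read name, chromosome, and position are hypothetical), and the parser reads only the first six of the eleven mandatory fields; real files are better handled with a dedicated tool or library.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    qname: str   # read name, carried over from the FASTQ file
    flag: int    # bitwise flag describing the alignment
    rname: str   # reference sequence (e.g. chromosome) the read maps to
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # how the read aligns (matches, gaps, spliced regions)


def parse_sam_line(line):
    """Parse the first six mandatory fields of one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line has at least 11 fields")
    return SamRecord(
        fields[0], int(fields[1]), fields[2],
        int(fields[3]), int(fields[4]), fields[5],
    )


# Invented example record: 11 tab-separated mandatory fields
example = "\t".join(
    ["read1", "0", "chr1", "14363", "255", "50M", "*", "0", "0",
     "A" * 50, "I" * 50]
)
rec = parse_sam_line(example)
is_unmapped = bool(rec.flag & 0x4)  # bit 0x4 marks an unmapped read
print(rec.rname, rec.pos, rec.mapq, is_unmapped)
```

Header lines in a SAM file begin with `@` and would need to be skipped before calling a parser like this one.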

QC with STAR and Qualimap | Introduction to RNA-Seq using high-performance computing - ARCHIVED

Approximate time: 50 minutes

Learning Objectives:

Running an alignment tool to generate BAM files.
