RNA sequencing is performed on all HipSci iPS cell lines that are selected for banking
after passing QC. Sequencing and primary analysis are performed at the
Wellcome Trust Sanger Institute.
HipSci’s RNA-seq analysis pipeline
maps sequence reads to the human GRCh37 reference using the
STAR spliced aligner. The mapping uses
version 19 of the Gencode gene annotation to enable splice-aware alignments.
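As a rough illustration of what a STAR run with a gene annotation can look like, the sketch below assembles a STAR command line in Python. The paths, thread count and output prefix are hypothetical placeholders. The options shown are standard STAR options and are not HipSci's actual pipeline settings, which are not given here.

```python
import shlex

def build_star_command(genome_dir, annotation_gtf, fastq_files, out_prefix, threads=4):
    """Assemble a STAR alignment command as a list of arguments.

    All paths are hypothetical placeholders; this is not HipSci's exact command.
    """
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,            # STAR index built from the GRCh37 reference
        "--sjdbGTFfile", annotation_gtf,      # gene annotation (e.g. a Gencode v19 GTF)
        "--readFilesIn", *fastq_files,        # one file for single-end, two for paired-end
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", out_prefix,
    ]

cmd = build_star_command(
    genome_dir="star_index_GRCh37",
    annotation_gtf="gencode.v19.annotation.gtf",
    fastq_files=["sample_1.fastq", "sample_2.fastq"],
    out_prefix="sample.",
)
print(shlex.join(cmd))  # shell-quoted command, ready for inspection
```

Building the command as a list and only printing it keeps the sketch safe to run without STAR installed. Passing the same list to `subprocess.run` would execute it.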
Getting the data
Complete lists of RNA-seq data can be found under the files tab of
the cell lines and data browser
or in the dataset indexes on the FTP site.
- Raw sequencing reads
– Distributed in the CRAM file format. Any cell line
can have multiple associated CRAM files; each corresponds to a single lane of sequencing.
- Splice-aware STAR alignment
– Distributed in the BAM file format. We distribute one BAM file per cell line.
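Because raw reads come as one CRAM file per lane while alignments come as one BAM file per cell line, it can help to group the per-lane files by cell line before processing. The following is a minimal sketch, assuming a list of (cell line, file name) pairs. The cell line names and file names are invented for illustration and do not reflect the actual index format.

```python
from collections import defaultdict

def group_crams_by_cell_line(records):
    """Map each cell line name to the sorted list of its per-lane CRAM files."""
    grouped = defaultdict(list)
    for cell_line, path in records:
        # Keep only raw-read CRAM files; skip BAM alignments and anything else.
        if path.endswith(".cram"):
            grouped[cell_line].append(path)
    return {name: sorted(paths) for name, paths in grouped.items()}

# Hypothetical index entries: (cell line, file name)
records = [
    ("line_A", "run1_lane2.cram"),
    ("line_A", "run1_lane1.cram"),
    ("line_B", "run2_lane1.cram"),
    ("line_A", "line_A.star.bam"),
]
crams = group_crams_by_cell_line(records)
for name, paths in crams.items():
    print(name, len(paths), "lane(s)")
```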
For managed access cell lines, RNA-seq
files are archived in the EGA. The
data browser contains
links to the relevant EGA dataset page, from where researchers can request access to the data.
For open access cell lines, RNA-seq files
are archived in the ENA. Data are openly available
to anybody, and the data browser
contains direct links to the files on the ENA FTP server.
HipSci’s FTP site contains: