Tutorial guide

In the following page, we describe the steps involved in running TallyNN to error correct barcodes and UMIs. The pipeline can be executed using a test dataset that contains 100,000 Nanopore sequencing reads (https://www.cgat.org/downloads/public/adam/TallyNN/test_nanopore.fastq.gz) or full data (https://www.cgat.org/downloads/public/adam/TallyNN/E1.fastq.gz).

The fastq files are derived from an experiment in which an equal mix of JJN3, NCI-H929 and DF15 myeloma cells.

Download the files

The demo data is available at the following link - https://www.cgat.org/downloads/public/adam/TallyNN/:

wget https://www.cgat.org/downloads/public/adam/TallyNN/test_nanopore.fastq.gz
wget https://www.cgat.org/downloads/public/adam/TallyNN/Homo_sapiens-GRCh38.fa
wget https://www.cgat.org/downloads/public/adam/TallyNN/Homo_sapiens.GRCh38.101.bed
wget https://www.cgat.org/downloads/public/adam/TallyNN/hg38.fasta
wget https://www.cgat.org/downloads/public/adam/TallyNN/Homo_sapiens.GRCh38.101.gtf
mkdir data
mv test_nanopore.fastq.gz data/

To run the demo you will need to download the fasta file of human transcripts, the fastq file of nanopore sequenced reads, a genome fasta file and a junction bed file generated according to minimap2 [documentation](https://lh3.github.io/minimap2/minimap2.html).

Add the downloaded data to a directory called data.dir/

Generate the config file

In order to run the nanopore pipeline you will need to generate a configuration file that can be used to modify the execution of the pipeline.

This can be performed by:

tallynn nanopore config

This will generate a pipeline.yml file in the current directory. However for the purpose of this tutorial please download the pipeline.yml as follows:

wget https://www.cgat.org/downloads/public/adam/TallyNN/pipeline.yml

Modify the config file

Open up the pipeline.yml file. You can see a set of key value pairs that can be modified to change the running of the pipeline. For the current analysis, no changes need to be made to this file.

Run the pipeline

The defult behaviour is for the pipeline to execute across a cluster. This is why it is important to set up your .cgat.yml file correctly. In order to run the pipeline to completion run the following in the commandline:

tallynn nanopore make full -v5

Otherwise, the pipeline can also be ran locally without a cluster:

tallynn nanopore make full -v5 --no-cluster