Generating single-cell sequencing reads

tags: Resimpy

Introduction

resimpy_umi_sc is a module that simulates single-cell sequencing reads, each consisting of only a UMI, from a gene-by-cell matrix generated by an external simulator called SPsimSeq. A typical command looks like this:

resimpy_umi_sc \
-r seq_errs \
-rs umi \
-perm_num 3 \
-umiup 1 \
-umiul 10 \
-umi_num 50 \
-pcr_num 8 \
-pcr_err 0.0001 \
-seq_err 0.0001 \
-ampl_rate 0.85 \
-sim_thres 3 \
-spl_rate 1 \
-seq_errs "1e-3;1e-2;0.1" \
-out_dir ./
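
When the command is launched from a script rather than typed by hand, it can help to assemble the argument list programmatically. The sketch below is only an illustration: the helper name build_command is hypothetical, the flags are copied from the example above, and the command is printed rather than executed. Because a semicolon is a command separator in POSIX shells, shlex.join quotes the semicolon-separated list so that it reaches the program as a single argument.

```python
import shlex

# Fixed parameters copied from the example command above.
FIXED = {
    "perm_num": 3, "umiup": 1, "umiul": 10, "umi_num": 50,
    "pcr_num": 8, "pcr_err": 0.0001, "seq_err": 0.0001,
    "ampl_rate": 0.85, "sim_thres": 3, "spl_rate": 1,
}

def build_command(recipe, values, read_structure="umi", out_dir="./"):
    """Return the argv list for one recipe (hypothetical helper)."""
    argv = ["resimpy_umi_sc", "-r", recipe, "-rs", read_structure]
    for name, value in FIXED.items():
        argv += [f"-{name}", str(value)]
    # The varying values travel as ONE semicolon-separated argument.
    argv += [f"-{recipe}", ";".join(values), "-out_dir", out_dir]
    return argv

argv = build_command("seq_errs", ["1e-3", "1e-2", "0.1"])
print(shlex.join(argv))  # shell-safe form, with the list quoted
```

Passing the argv list directly to subprocess.run (without shell=True) would avoid shell quoting altogether.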

The parameters are described below.

| Parameter acronym | Full name | Function |
| --- | --- | --- |
| r | recipe | to specify a module to work on your requirement |
| rs | read structure | e.g., umi+seq or umi |
| perm_num | permutation number | in silico test numbers |
| umiup | UMI unit pattern | 1 for monomer blocks, 2 for dimer blocks, 3 for trimer blocks |
| umiul | UMI unit len fixed | the fixed length of a monomer UMI |
| umi_num | UMI number fixed | the fixed number of molecules/UMIs to be initiated in the initial read pool |
| sim_thres | similarity threshold fixed | how many nucleotides are different at least between each pair of two randomly generated UMIs |
| pcr_num | PCR number/cycle | a fixed PCR number |
| pcr_err | PCR error | a fixed DNA polymerase error rate during PCR |
| seq_err | sequencing error | a fixed sequencing error rate |
| ampl_rate | amplification rate | PCR amplification rate |
| spl_rate | subsampling rate | subsampling rate used for sequencing |
| seq_errs | sequencing errors | sequencing error rate partitioned by semicolon, e.g., 1e-3;1e-2;0.1 |
| pcr_errs | PCR errors | DNA polymerase error rate partitioned by semicolon, e.g., 1e-3;1e-2;0.1 |
| pcr_nums | PCR numbers | PCR numbers partitioned by semicolon, e.g., 8;9;10;11;12 |
| umi_lens | UMI lengths | UMI lengths partitioned by semicolon, e.g., 8;9;10;11;12 |
| ampl_rates | amplification rates | amplification rates partitioned by semicolon, e.g., 0.1;0.2;0.3;0.4;0.5;0.6;0.7;0.8;0.9;1.0 |
| out_dir | output directory | a directory where you want to output results |
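
To make umiup, umiul, umi_num, and sim_thres concrete, here is a minimal sketch of one way such a UMI pool could be drawn. It is not Resimpy's implementation: the function names, the rejection-sampling strategy, and the use of Hamming distance as the measure of "different nucleotides" are assumptions made for illustration. The sketch also assumes that umiul counts unit blocks, so the final UMI has umiul times umiup bases.

```python
import random

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def generate_umis(umi_num, umiul, umiup=1, sim_thres=3, seed=0, max_tries=100_000):
    """Draw umi_num UMIs that pairwise differ at >= sim_thres positions.

    Assumption: each of the umiul random bases is repeated umiup times
    (1 = monomer, 2 = dimer, 3 = trimer blocks).
    """
    rng = random.Random(seed)
    monomers = []
    for _ in range(max_tries):
        cand = "".join(rng.choice("ACGT") for _ in range(umiul))
        # Rejection sampling: keep the candidate only if it is far enough
        # from every UMI accepted so far.
        if all(hamming(cand, kept) >= sim_thres for kept in monomers):
            monomers.append(cand)
            if len(monomers) == umi_num:
                break
    else:
        raise RuntimeError("could not place enough UMIs; relax sim_thres")
    # Expand every base into a block of umiup identical bases.
    return ["".join(base * umiup for base in m) for m in monomers]

umis = generate_umis(umi_num=50, umiul=10, umiup=1, sim_thres=3)
print(len(umis), umis[:3])
```

Rejection sampling is simple but slows down as umi_num approaches the number of sequences that can satisfy the threshold, which is why the sketch caps the number of attempts.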

SPsimSeq is configured internally, so there is no need to specify it on the command line. For now, all parameters for the SPsimSeq matrix are fixed; we are considering making them configurable. In each permutation test, reads are generated by varying one parameter (for example, seq_errs) while every other parameter keeps its fixed value (for example, pcr_num). In this example the fixed seq_err value is not applied, because seq_errs is specified, so the reads can be examined under this one varying parameter. This is a one-factor-at-a-time experimental design. For pcr_errs, pcr_nums, umi_lens, and ampl_rates, the commands look like this:

Reads changing with PCR errors

resimpy_umi_sc -r pcr_errs -rs umi+seq -perm_num 3 -umiup 1 -umiul 10 -umi_num 50 -pcr_num 8 -pcr_err 0.0001 -seq_err 0.0001 -ampl_rate 0.85 -sim_thres 3 -spl_rate 1 -pcr_errs "1e-3;1e-2;0.1" -out_dir ./
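
As a rough picture of what a per-base error rate such as pcr_err means, the sketch below substitutes each base independently with the given probability. This is a generic substitution-error model assumed for illustration, not necessarily the one Resimpy uses, and the function name introduce_errors is hypothetical.

```python
import random

def introduce_errors(seq, err_rate, rng):
    """Replace each base by a different base with probability err_rate."""
    out = []
    for base in seq:
        if rng.random() < err_rate:
            # Substitution: pick one of the three other bases.
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)

rng = random.Random(1)
umi = "ACGTACGTAC"
for rate in (1e-3, 1e-2, 0.1):  # the values passed to -pcr_errs above
    copies = [introduce_errors(umi, rate, rng) for _ in range(10_000)]
    frac = sum(c != umi for c in copies) / len(copies)
    print(f"rate={rate}: {frac:.3f} of copies carry at least one error")
```

Under this model the fraction of erroneous copies of a 10-base UMI is about 1 - (1 - rate)^10 per copying step, so errors accumulate quickly at the higher rates.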

Reads changing with amplification rates

resimpy_umi_sc -r ampl_rates -rs umi+seq -perm_num 3 -umiup 1 -umiul 10 -umi_num 50 -pcr_num 8 -pcr_err 0.0001 -seq_err 0.0001 -ampl_rate 0.85 -sim_thres 3 -spl_rate 1 -ampl_rates "0.1;0.2;0.3;0.4;0.5;0.6;0.7;0.8;0.9;1.0" -out_dir ./
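
One common way to model an amplification rate, assumed here purely for illustration, is that every molecule in the pool is duplicated with probability ampl_rate in each PCR cycle. The sketch below simulates the pool size under that assumption; the function name amplify is hypothetical and the model may differ from what Resimpy does internally.

```python
import random

def amplify(pool_size, ampl_rate, pcr_num, rng):
    """Return the pool size after pcr_num cycles.

    Assumption: in each cycle every molecule is copied independently
    with probability ampl_rate.
    """
    for _ in range(pcr_num):
        copied = sum(rng.random() < ampl_rate for _ in range(pool_size))
        pool_size += copied
    return pool_size

rng = random.Random(0)
for rate in (0.1, 0.5, 0.85, 1.0):
    size = amplify(pool_size=50, ampl_rate=rate, pcr_num=8, rng=rng)
    print(f"ampl_rate={rate}: {size} molecules after 8 cycles")
```

At ampl_rate 1.0 the pool doubles every cycle, which gives the upper bound for any lower rate.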

Reads changing with PCR numbers

resimpy_umi_sc -r pcr_nums -rs umi+seq -perm_num 3 -umiup 1 -umiul 10 -umi_num 50 -pcr_num 8 -pcr_err 0.0001 -seq_err 0.0001 -ampl_rate 0.85 -sim_thres 3 -spl_rate 1 -pcr_nums "6;7;8;9;10;11;12;13;14" -out_dir ./
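
To get a feel for how the number of PCR cycles drives the size of the read pool, the short calculation below uses a simplifying assumption that may not match Resimpy's internals: each molecule is copied with probability ampl_rate per cycle, so that starting from umi_num molecules the expected pool size after n cycles is umi_num * (1 + ampl_rate)^n. The function name expected_pool_size is hypothetical.

```python
def expected_pool_size(umi_num, ampl_rate, pcr_num):
    """Expected number of molecules after pcr_num cycles (assumed model)."""
    return umi_num * (1 + ampl_rate) ** pcr_num

# Cycle numbers taken from the -pcr_nums list above; other values are
# the fixed umi_num and ampl_rate from the same command.
for n in (6, 7, 8, 9, 10, 11, 12, 13, 14):
    size = expected_pool_size(umi_num=50, ampl_rate=0.85, pcr_num=n)
    print(f"pcr_num={n:2d}  expected molecules ~ {size:,.0f}")
```

Because growth is geometric, each extra cycle multiplies the expected pool by 1.85 at the fixed ampl_rate of 0.85, which is worth keeping in mind when choosing the upper end of the list.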

Reads changing with UMI lengths

resimpy_umi_sc -r umi_lens -rs umi+seq -perm_num 3 -umiup 1 -umiul 10 -umi_num 50 -pcr_num 8 -pcr_err 0.0001 -seq_err 0.0001 -ampl_rate 0.85 -sim_thres 3 -spl_rate 1 -umi_lens "6;7;8;9;10;11;12" -out_dir ./
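
Across all of these recipes, the one-factor design described earlier can be pictured as expanding a single command into a grid of runs in which only one parameter changes. The sketch below is an assumption-laden illustration rather than Resimpy code: the names FIXED, OVERRIDES, and expand are hypothetical, the mapping from each list-valued flag to the fixed parameter it replaces is inferred from the flag names, and perm_num is assumed to mean the number of repeated runs per value.

```python
# Fixed parameters copied from the commands above.
FIXED = {
    "perm_num": 3, "umiup": 1, "umiul": 10, "umi_num": 50,
    "pcr_num": 8, "pcr_err": 0.0001, "seq_err": 0.0001,
    "ampl_rate": 0.85, "sim_thres": 3, "spl_rate": 1,
}

# Which fixed parameter each recipe replaces (assumed from the names).
OVERRIDES = {
    "seq_errs": "seq_err", "pcr_errs": "pcr_err", "pcr_nums": "pcr_num",
    "umi_lens": "umiul", "ampl_rates": "ampl_rate",
}

def expand(recipe, values, perm_num):
    """Yield (value, permutation index, effective parameters) per run."""
    for value in values:
        params = dict(FIXED)
        # Only the varying parameter changes; its fixed value is ignored.
        params[OVERRIDES[recipe]] = value
        for perm in range(perm_num):
            yield value, perm, dict(params)

runs = list(expand("umi_lens", [6, 7, 8, 9, 10, 11, 12], FIXED["perm_num"]))
print(len(runs), "runs; first:", runs[0][:2], "umiul =", runs[0][2]["umiul"])
```

Keeping every other entry at its fixed value is what makes differences between runs attributable to the single varying parameter.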