Generating UMI-exclusive and UMI+Sequence structured reads
tags: Resimpy
Introduction
resimpy_general is a module that simulates reads consisting of
either a UMI alone or a UMI plus a genomic sequence. The module is
named for this general-purpose design. A typical command looks like
this:
resimpy_general \
-r seq_errs \
-rs umi \
-perm_num 3 \
-umiup 1 \
-umiul 10 \
-umi_num 50 \
-seq_len 20 \
-pcr_num 8 \
-pcr_err 0.0001 \
-seq_err 0.0001 \
-ampl_rate 0.85 \
-sim_thres 3 \
-spl_rate 1 \
-seq_errs "1e-3;1e-2;0.1" \
-out_dir ./
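The value lists are separated by semicolons, which POSIX shells treat as command separators, so it can help to assemble the command programmatically. The sketch below is only an illustration: it copies the flags and values from the example command, and the `args` and `argv` names are hypothetical. It uses `shlex.join` (Python 3.8+) to produce a safely quoted command line.

```python
import shlex

# Flags and values copied from the example command; the dict itself is illustrative.
args = {
    "r": "seq_errs",
    "rs": "umi",
    "perm_num": "3",
    "umiup": "1",
    "umiul": "10",
    "umi_num": "50",
    "seq_len": "20",
    "pcr_num": "8",
    "pcr_err": "0.0001",
    "seq_err": "0.0001",
    "ampl_rate": "0.85",
    "sim_thres": "3",
    "spl_rate": "1",
    "seq_errs": "1e-3;1e-2;0.1",
    "out_dir": "./",
}

# Build the argument vector: program name, then "-flag value" pairs.
argv = ["resimpy_general"]
for flag, value in args.items():
    argv += [f"-{flag}", value]

# shlex.join quotes any argument containing shell metacharacters such as ';'.
command = shlex.join(argv)
print(command)

# To run it without involving a shell at all (assuming resimpy_general is on PATH):
# import subprocess
# subprocess.run(argv, check=True)
```

Passing `argv` as a list to `subprocess.run` avoids shell parsing entirely, so the semicolons reach the program unchanged.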
The parameters are described below.
| Parameter acronym | Full name | Function |
|---|---|---|
| r | recipe | specifies the module to run for your requirement |
| rs | read structure | e.g., umi+seq or umi |
| perm_num | permutation number | the number of in silico tests |
| umiup | UMI unit pattern | 1 for monomer blocks, 2 for dimer blocks, 3 for trimer blocks |
| umiul | UMI unit length (fixed) | the fixed length of a monomer UMI |
| umi_num | UMI number (fixed) | the fixed number of molecules/UMIs to be initiated in the initial read pool |
| sim_thres | similarity threshold (fixed) | the minimum number of nucleotides that must differ between each pair of randomly generated UMIs |
| seq_len | sequence length | the length of a genomic sequence |
| pcr_num | PCR number/cycle | a fixed number of PCR cycles |
| pcr_err | PCR error | a fixed DNA polymerase error rate during PCR |
| seq_err | sequencing error | a fixed sequencing error rate |
| ampl_rate | amplification rate | PCR amplification rate |
| spl_rate | subsampling rate | subsampling rate used for sequencing |
| seq_errs | sequencing errors | sequencing error rates separated by semicolons, e.g., 1e-3;1e-2;0.1 |
| pcr_errs | PCR errors | DNA polymerase error rates separated by semicolons, e.g., 1e-3;1e-2;0.1 |
| pcr_nums | PCR numbers | PCR numbers separated by semicolons, e.g., 8;9;10;11;12 |
| umi_lens | UMI lengths | UMI lengths separated by semicolons, e.g., 8;9;10;11;12 |
| ampl_rates | amplification rates | amplification rates separated by semicolons, e.g., 0.1;0.2;0.3;0.4;0.5;0.6;0.7;0.8;0.9;1.0 |
| out_dir | output directory | a directory where you want to output results |
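To make the roles of umiul, umi_num, sim_thres, and umiup more concrete, here is a minimal sketch of how an initial UMI pool could be drawn. It is not Resimpy's implementation: the `generate_umis` and `hamming` functions, the `seed` and `max_tries` arguments, and the reading of umiup as "repeat each base umiup times" are all assumptions made for illustration.

```python
import random

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def generate_umis(umi_num, umiul, sim_thres, umiup=1, seed=0, max_tries=100000):
    """Draw umi_num random monomer UMIs of length umiul so that every pair
    differs in at least sim_thres positions (simple rejection sampling)."""
    rng = random.Random(seed)
    monomers = []
    tries = 0
    while len(monomers) < umi_num:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not satisfy sim_thres; relax the parameters")
        candidate = "".join(rng.choice("ACGT") for _ in range(umiul))
        # Keep the candidate only if it is far enough from every accepted UMI.
        if all(hamming(candidate, u) >= sim_thres for u in monomers):
            monomers.append(candidate)
    # Assumption: a dimer/trimer block repeats each base umiup times.
    return ["".join(base * umiup for base in u) for u in monomers]

# Values taken from the example command: 50 UMIs, length 10, threshold 3.
umis = generate_umis(umi_num=50, umiul=10, sim_thres=3)
print(len(umis), umis[:3])
```

Rejection sampling is adequate here because 50 UMIs of length 10 occupy a tiny fraction of the 4^10 possible sequences; a much larger pool or a stricter threshold would need a smarter construction.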
Because -rs is set to umi, each read contains only one UMI. If -rs
is set to umi+seq, each read contains one UMI and one genomic
sequence. In each permutation test, reads are generated from one
varying parameter (here seq_errs) while all other parameters, such as
pcr_num, stay fixed. In this example the fixed seq_err is not applied
because seq_errs is specified, so reads can be examined as the
sequencing error rate alone changes. This is effectively a
one-factor-at-a-time experiment. Similarly, for pcr_errs, pcr_nums,
umi_lens, and ampl_rates, the CLIs should look like below:
Reads changing with PCR errors
resimpy_general -r pcr_errs -rs umi+seq -perm_num 3 -umiup 1 -umiul 10 -umi_num 50 -seq_len 20 -pcr_num 8 -pcr_err 0.0001 -seq_err 0.0001 -ampl_rate 0.85 -sim_thres 3 -spl_rate 1 -pcr_errs "1e-3;1e-2;0.1" -out_dir ./
Reads changing with amplification rates
resimpy_general -r ampl_rates -rs umi+seq -perm_num 3 -umiup 1 -umiul 10 -umi_num 50 -seq_len 20 -pcr_num 8 -pcr_err 0.0001 -seq_err 0.0001 -ampl_rate 0.85 -sim_thres 3 -spl_rate 1 -ampl_rates "0.1;0.2;0.3;0.4;0.5;0.6;0.7;0.8;0.9;1.0" -out_dir ./
Reads changing with PCR numbers
resimpy_general -r pcr_nums -rs umi+seq -perm_num 3 -umiup 1 -umiul 10 -umi_num 50 -seq_len 20 -pcr_num 8 -pcr_err 0.0001 -seq_err 0.0001 -ampl_rate 0.85 -sim_thres 3 -spl_rate 1 -pcr_nums "6;7;8;9;10;11;12;13;14" -out_dir ./
Reads changing with UMI lengths
resimpy_general -r umi_lens -rs umi+seq -perm_num 3 -umiup 1 -umiul 10 -umi_num 50 -seq_len 20 -pcr_num 8 -pcr_err 0.0001 -seq_err 0.0001 -ampl_rate 0.85 -sim_thres 3 -spl_rate 1 -umi_lens "6;7;8;9;10;11;12" -out_dir ./
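The one-factor design behind these commands can be sketched as follows: the semicolon-separated list is split into values, and each value replaces the corresponding fixed parameter while everything else stays constant. This is an illustrative sketch, not Resimpy's code; the `one_factor_runs` function and the `OVERRIDES` mapping (in particular, that umi_lens replaces umiul) are assumptions.

```python
# Assumed mapping from each recipe to the fixed parameter it replaces.
OVERRIDES = {
    "seq_errs": "seq_err",
    "pcr_errs": "pcr_err",
    "pcr_nums": "pcr_num",
    "umi_lens": "umiul",
    "ampl_rates": "ampl_rate",
}

def one_factor_runs(fixed, recipe, values):
    """Return one parameter set per value in the semicolon-separated string,
    varying only the parameter tied to the recipe."""
    key = OVERRIDES[recipe]
    cast = type(fixed[key])  # keep ints as ints and floats as floats
    runs = []
    for token in values.split(";"):
        params = dict(fixed)       # copy so the fixed settings are untouched
        params[key] = cast(token)  # override only the varying parameter
        runs.append(params)
    return runs

# Fixed parameters taken from the example commands.
fixed = {
    "umiul": 10, "umi_num": 50, "seq_len": 20, "pcr_num": 8,
    "pcr_err": 0.0001, "seq_err": 0.0001, "ampl_rate": 0.85,
    "sim_thres": 3, "spl_rate": 1,
}

runs = one_factor_runs(fixed, "seq_errs", "1e-3;1e-2;0.1")
for params in runs:
    print(params["seq_err"], params["pcr_num"])
```

Each returned dictionary corresponds to one simulation condition, which would then be repeated perm_num times.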