Usage¶
Basic usage¶
Command to launch the pipeline is as follows:
python MIRZA_G_pipeline.py run --config congig.ini --name-suffix name_of_the_run
All parameters for the script:
The main script launching MIRZA-G analysis pipeline
usage: MIRZA-G [-h] {run,clean} ...
- Sub-commands:
- run
Run a pipeline
usage: MIRZA-G run [-h] [-v] --config CONFIG [--name-suffix NAME_SUFFIX] [--protocol {seed,scan}] [--calulate-bls] [--modules [MODULES [MODULES ...]]]
- Options:
-v=False, --verbose=False Be loud! --config Config file --name-suffix=test_run Suffix to add to pipeline name in order to easily differentiate between different run, defaults to test_run --protocol=seed Protocol of MIRZA-G, defaults to seed
Possible choices: seed, scan
--calulate-bls=False NOT AVAILABLE: Calculate Branch Length Score (conservation) --modules A list of modules to load (if HPC or environment requires)
- clean
Clean after previous run
usage: MIRZA-G clean [-h] [-v] [-y]
- Options:
-v=False, --verbose=False Be loud! -y=False, --yes=False Force deletion of files.
Preparing config file¶
Copy config_example.ini from MIRZA-G directory to your working directory (directory where you want to perform calculation, WD):
cd Your/Working/Direcory
cp Path/To/MIRZA-G/config_example.ini config.ini
- Set all the necessary paths in your config.ini file as indicated in the comments inside the file. The most importand are:
- motifs: “Path/To/miRNAs.fa” - abs path to an input fasta file with mi/siRNA sequences of length 21 or more
- seqs: “Path/To/MIRZA-G/data/UTR_Sequences.fa” - abs path to a fasta file with the UTR sequences from which the coordinate file will be generated (you can use 3’UTR sequences in the MIRZA-G/data directory, for this file there are also alignments for conservation precalculated)
- mirza_binary: “MIRZA” - path to MIRZA binary (or how you invoke it in the bash)
- contrafold_binary: “contrafold” - path to CONTRAfold binary (or how you invoke in the bash)
- Models paths:
- model_with_bls: “Path/To/MIRZA-G/data/glm-with-bls.bin” - abs path to the model with BLS (you can find it in the pipeline/data directory)
- model_without_bls: “Path/To/MIRZA-G/data/glm-without-bls.bin” - same as before
- Additionally when you would like to calculate with evolutionary conservation you have to make sure that the variable run_only_MIRZA in CalculateMIRZA task is set to “no” instead of “yes” and that you provide proper paths with aligned UTRs and evolutionary tree:
phylogenetic_tree: “Path/To/MIRZA-G/data/human_tree.nh” - abspath to provided phylogenetic tree
- alignment_directory: “Path/To/MIRZA-G/data/HumanAlignments/” - abspath to provided human alignments directory. If you downloaded package from CLIPz website
this directory is already in the MIRZA-G directory. If you downloaded from GitHub you have to download it additionally.
If you would like to run it on cluster follow instructions in the configuration file and ask your admin what parameters you need to set up before (like DRMAA path, modules necessary, queues names etc.). All these parameters can be set up in config.ini.
To run it locally it takes ~70 to 90 seconds for one miRNA without conservation calculation and ~170 seconds with calculation (This might be substantial amount of time (up to half an hour per miRNA) for worse processors).
Example¶
To test the pipeline go to the tests directory and run:
cd Path/To/MIRZA-G/tests
bash rg_run_test.sh help
Note
Usage: rg_run_test.sh clean/run [MIRZA/binary/path] [‘CONTRAfold/binary/path’]
And if you have installed MIRZA and CONTRAfold to default locations (MIRZA and contrafold) run:
bash rg_run_test.sh run
Otherwise provide paths to BOTH of them:
bash rg_run_test.sh run Path/To/MIRZA/binary Path/To/CONTRAfold/binary