circDeep: Deep learning approach for circular RNA classification from other long non-coding RNA

circDeep fuses a Reverse Complement Matching (RCM) descriptor, an Asymmetric Convolution Neural Network combined with Bidirectional Long Short-Term Memory (ACNN-BLSTM) sequence descriptor, and a conservation descriptor into high-level abstraction descriptors, in which the shared representations across the different modalities are integrated. Experiments show that circDeep is faster than existing tools and improves accuracy over them by more than 12 percent.

Prerequisites

We recommend using the Anaconda 3 platform.

Keras (Deep learning library)
scikit-learn (Machine learning library)
h5py
gensim
pysam >= 0.9.1.4
pybigwig
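
Before running circDeep, it can help to confirm that the prerequisites above are importable. The following is a minimal sketch, not part of circDeep itself. The import names are assumptions: scikit-learn is normally imported as `sklearn` and pybigwig as `pyBigWig`, and `find_missing` is a hypothetical helper name.

```python
import importlib.util

# Import names are assumptions: scikit-learn is imported as "sklearn"
# and pybigwig as "pyBigWig"; the others match their package names.
REQUIRED_MODULES = ["keras", "sklearn", "h5py", "gensim", "pysam", "pyBigWig"]


def find_missing(modules=REQUIRED_MODULES):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All prerequisite modules were found.")
```

This only checks that each module can be found; it does not check versions such as pysam >= 0.9.1.4.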

Installation

Download circDeep with:

git clone https://github.com/UofLBioinformatics/circDeep

Installation has been tested on the Anaconda platform (Linux and Windows) with Python 3.

Usage

usage: circDeep.py [-h] --train TRAIN --genome GENOME --gtf GTF --bigwig BIGWIG
               [--seq SEQ] [--rcm RCM] [--cons CONS] [--predict PREDICT]
               [--out_file OUT_FILE] [--model_dir MODEL_DIR] 
               [--positive_bed POSITIVE_BED] [--negative_bed NEGATIVE_BED] 
               [--testing_bed TESTING_BED] 

circular RNA classification from other long non-coding RNA using multimodal deep learning

Required arguments:
===================
  --data_dir <data_directory>
                        Directory containing the descriptor files used for training, the label file, the genome sequence file, the GTF annotation file and the bigWig file
  --train TRAIN         use this option for training model
  --genome GENOME       Genome sequence. e.g., hg38.fa
  --gtf GTF             The gtf annotation file. e.g., hg38.gtf
  --bigwig BIGWIG       conservation scores in bigWig file format
                        
Optional arguments:
===================
  -h, --help            show this help message and exit
  --seq SEQ             Use the ACNN-BLSTM sequence modality
  --rcm RCM             Use the RCM modality
  --cons CONS           Use the conservation modality
  --predict PREDICT     Predict circular RNAs. When --train is used, this
                        is False
  --out_file OUT_FILE   The output file used to store the prediction
                        probability of testing data
  --model_dir MODEL_DIR
                        The directory to save the trained models for future
                        prediction
  --positive_bed POSITIVE_BED
                        BED input file for circular RNAs for training, it
                        should be like: chromosome start end gene
  --negative_bed NEGATIVE_BED
                        BED input file for other long non-coding RNAs for
                        training, it should be like: chromosome start end gene
  --testing_bed TESTING_BED
                        BED input file for testing data, it should be
                        like: chromosome start end gene

Example

Train the model:

In our experiments, we used circular RNAs from circRNADb and a negative dataset from GENCODE. The original coordinates of our datasets were in the hg19 genome, and we converted them to hg38 using the liftOver tool provided by the UCSC Genome Browser. All necessary files must also be downloaded and placed in the data directory:

Download the genome sequence in FASTA format for the human genome (it can be downloaded from the UCSC Genome Browser).
Download the GTF annotation for the human genome.
Download phastCons scores for the human genome in bigWig format.
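
When files come from different providers, chromosome names may not match: UCSC files use names such as `chr1`, while Ensembl annotations use `1`. Whether circDeep reconciles these names is not stated here, so the sketch below only reports which names the downloaded FASTA and GTF files share. It uses the standard library only, and the function names are hypothetical.

```python
def fasta_chromosomes(path):
    """Collect sequence names from FASTA header lines (first word after '>')."""
    names = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.add(fields[0])
    return names


def gtf_chromosomes(path):
    """Collect the first column of every non-comment, non-empty GTF line."""
    names = set()
    with open(path) as handle:
        for line in handle:
            if line.strip() and not line.startswith("#"):
                names.add(line.split("\t", 1)[0])
    return names


def shared_chromosomes(fasta_path, gtf_path):
    """Return the chromosome names present in both files."""
    return fasta_chromosomes(fasta_path) & gtf_chromosomes(gtf_path)


# Example call (paths taken from the commands in this README):
# shared_chromosomes("data/hg38.fasta", "data/Homo_sapiens.Ensembl.GRCh38.82.gtf")
```

An empty result suggests that the two files follow different naming conventions and that one of them needs to be renamed or replaced before use.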

python3 circDeep.py --data_dir 'data/' --train True --model_dir 'models/' --seq True --rcm True --cons True --genome 'data/hg38.fasta' --gtf 'data/Homo_sapiens.Ensembl.GRCh38.82.gtf' --bigwig 'data/hg38.phastCons20way.bw' --positive_bed 'data/circRNA_dataset.bed' --negative_bed 'data/negative_dataset.bed'

Test the model:

python3 circDeep.py --data_dir 'data/' --train False --model_dir 'models/' --seq True --rcm True --cons True --genome 'data/hg38.fasta' --gtf 'data/Homo_sapiens.Ensembl.GRCh38.82.gtf' --bigwig 'data/hg38.phastCons20way.bw' --testing_bed 'data/test.bed'
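
The training and testing commands share most of their arguments, so they can be assembled from one place. The sketch below builds both argument lists from the flags and paths shown above. `build_command` is a hypothetical helper, not part of circDeep, and it assumes all three modalities are enabled as in the examples.

```python
import shlex


def build_command(train, files, data_dir="data/", model_dir="models/"):
    """Assemble a circDeep.py argument list from the documented flags.

    `files` maps a flag name without dashes (e.g. "genome") to a path.
    """
    cmd = [
        "python3", "circDeep.py",
        "--data_dir", data_dir,
        "--train", str(train),
        "--model_dir", model_dir,
        "--seq", "True", "--rcm", "True", "--cons", "True",
    ]
    for flag, path in files.items():
        cmd += ["--" + flag, path]
    return cmd


# Files used by both training and testing.
shared = {
    "genome": "data/hg38.fasta",
    "gtf": "data/Homo_sapiens.Ensembl.GRCh38.82.gtf",
    "bigwig": "data/hg38.phastCons20way.bw",
}
train_cmd = build_command(True, {
    **shared,
    "positive_bed": "data/circRNA_dataset.bed",
    "negative_bed": "data/negative_dataset.bed",
})
test_cmd = build_command(False, {**shared, "testing_bed": "data/test.bed"})

# Print the commands; to run one, pass the list to subprocess.run(cmd, check=True).
print(shlex.join(train_cmd))
print(shlex.join(test_cmd))
```

Passing the list form to `subprocess.run` avoids shell quoting problems with paths that contain spaces.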

Note:

Input data files for training and testing should be in BED format:

chr17 17507350 17508308 + gene1

chr11 48014405 48015855 - gene2

chr17 77469161 77472770 - gene3
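
Input files can be checked before training with a few lines of Python. The sketch below assumes the five-column layout of the sample lines above (chromosome, start, end, strand, gene name), which differs from standard BED, where strand is the sixth column. `BedRecord` and `parse_bed_line` are hypothetical names, and this is not the parser circDeep uses.

```python
from typing import NamedTuple


class BedRecord(NamedTuple):
    chrom: str
    start: int
    end: int
    strand: str
    name: str


def parse_bed_line(line):
    """Parse one whitespace-separated line: chrom start end strand name."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    chrom, start, end, strand, name = fields
    record = BedRecord(chrom, int(start), int(end), strand, name)
    if record.start >= record.end:
        raise ValueError(f"start must be smaller than end: {line!r}")
    if record.strand not in {"+", "-"}:
        raise ValueError(f"strand must be '+' or '-': {line!r}")
    return record


example = """\
chr17 17507350 17508308 + gene1
chr11 48014405 48015855 - gene2
chr17 77469161 77472770 - gene3
"""
# Skip blank lines so files with empty separator lines still parse.
records = [parse_bed_line(l) for l in example.splitlines() if l.strip()]
for rec in records:
    print(rec.name, rec.chrom, rec.end - rec.start)
```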

License

Citation

Mohamed Chaabane, Robert M Williams, Austin T Stephens, Juw Won Park, circDeep: deep learning approach for circular RNA classification from other long non-coding RNA, Bioinformatics, Volume 36, Issue 1, 1 January 2020, Pages 73–80, https://doi.org/10.1093/bioinformatics/btz537
