Example with CRISPR output files (piler-cr and CRISPRDetect) produces empty file #12

Open
shaman-narayanasamy opened this issue Feb 23, 2024 · 2 comments


Dear authors/developers,

Please find the relevant information for my issue below. Please do not hesitate to ask for more information.

Looking forward to hearing from you.

Expected Behavior

A non-empty output file should be produced, similar to that of a regular run.

Current Behavior

Empty output file produced.

Steps to Reproduce (for bugs)

$ rm -rf tmpFolder
$ mkdir tmpFolder
$ spacepharer easy-predict examples/crisprdetect_test examples/pilercr_test output/targetSetDB predictions.tsv tmpFolder

Spacepharer Output (for bugs)

The output file is empty. SpacePHARER worked when applied to the spacers in FASTA format. Here is the stdout of the run:

predictions.tsv exists and will be overwritten
easy-predict examples/crisprdetect_test examples/pilercr_test output/targetSetDB predictions.tsv tmpFolder 

MMseqs Version:                        	5.c2e680a
Taxonomy mapping file                  	
NCBI tax dump directory                	
Substitution matrix                    	nucl:nucleotide.out,aa:VTML40.out
<< Skipped for brevity >>
[=================================================================] 100.00% 2 0s 1ms
Time for merging to predictions.tsv: 0h 0m 1s 512ms
Time for processing: 0h 0m 2s 633ms

Context

I also tested with only the piler-cr results, to make sure that only one set of CRISPR results was being evaluated. The result was the same: an empty output file.

Your Environment

Include as many relevant details about the environment you experienced the bug in.

  • Git commit used (The string after "MMseqs Version:" when you execute SpacePHARER without any parameters):
MMseqs Version:                        	5.c2e680a
  • Which SpacePHARER version was used (Statically-compiled, self-compiled, Conda, etc.):
$ mamba env export
name: spacepharer_env
channels:
  - bioconda
  - conda-forge
dependencies:
  - _libgcc_mutex=0.1=conda_forge
  - _openmp_mutex=4.5=2_gnu
  - bzip2=1.0.8=hd590300_5
  - ca-certificates=2024.2.2=hbcca054_0
  - gawk=5.3.0=ha916aea_0
  - gettext=0.21.1=h27087fc_0
  - gmp=6.3.0=h59595ed_0
  - libgcc-ng=13.2.0=h807b86a_5
  - libgomp=13.2.0=h807b86a_5
  - libidn2=2.3.7=hd590300_0
  - libstdcxx-ng=13.2.0=h7e041cc_5
  - libunistring=0.9.10=h7f98852_0
  - libxcrypt=4.4.36=hd590300_1
  - libzlib=1.2.13=hd590300_5
  - mpfr=4.2.1=h9458935_0
  - ncurses=6.4=h59595ed_2
  - openssl=3.2.1=hd590300_0
  - perl=5.32.1=7_hd590300_perl5
  - readline=8.2=h8228510_1
  - spacepharer=5.c2e680a=pl5321h6a68c12_3
  - wget=1.20.3=ha35d2d1_1
  - zlib=1.2.13=hd590300_5
prefix: /ibex/user/naras0c/conda-environments/spacepharer_env
  • For self-compiled and Homebrew: Compiler and Cmake versions used and their invocation:
    [Not applicable]
  • Server specifications (especially CPU support for AVX2/SSE and amount of system memory):
    [Not applicable]
  • Operating system and version:
$ uname -a
Linux cn605-27-r 5.14.0-162.23.1.el9_1.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Apr 11 19:09:37 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

RuoshiZhang (Member) commented Feb 23, 2024

Hi!
The test files in the example folder come from different bacterial genomes; only the one from fasta_test is expected to get a hit against one of the example phage genomes. You could try searching against a larger target database (for instance, one downloaded with spacepharer downloaddb GenBank_phage_2018_09 targetSetDB tmpFolder).
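
A minimal sketch of that suggestion, using only the two subcommands already shown in this thread. The function name run_against_genbank and the directory and file names passed to it are placeholders chosen for illustration, and the sketch assumes that downloaddb completes successfully on your installation:

```shell
#!/bin/sh
# Sketch: fetch the larger phage target database, then rerun the prediction
# on the CRISPRDetect and piler-cr example files against it.
# run_against_genbank is a hypothetical helper name, not part of SpacePHARER.
run_against_genbank() {
    db_dir="$1"    # directory that will hold targetSetDB (placeholder)
    tmp_dir="$2"   # scratch directory (placeholder)
    out_tsv="$3"   # where the predictions should be written (placeholder)

    mkdir -p "$db_dir" "$tmp_dir" || return 1

    # Download and build the target set database; stop here if it fails.
    spacepharer downloaddb GenBank_phage_2018_09 "$db_dir/targetSetDB" "$tmp_dir" || return 1

    # Same easy-predict call as in the report, pointed at the new database.
    spacepharer easy-predict examples/crisprdetect_test examples/pilercr_test \
        "$db_dir/targetSetDB" "$out_tsv" "$tmp_dir"
}
```

It could be called as run_against_genbank database tmpFolder predictions.tsv from the repository root, so that the examples/ paths resolve.
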
Hope this answers your question.

shaman-narayanasamy (Author) commented Feb 24, 2024

Hi!

Thanks for the response! Here is what I did:

$ mkdir -p database # Start from a fresh output directory
$ mkdir -p tmpFolder # Create a fresh tmp folder
$ spacepharer downloaddb GenBank_phage_2018_09 database/targetSetDB tmpFolder/
downloaddb GenBank_phage_2018_09 database/targetSetDB tmpFolder/ 

MMseqs Version:         5.c2e680a
Create reversed setdb   1
Threads                 40
Verbosity               3

2024-02-24 14:16:09 URL:https://wwwuser.gwdg.de/~compbiol/spacepharer/2018_09/genbank_phages_2018_09.tar [144250880/144250880] -> "genbank_phages_2018_09.tar" [1]
2024-02-24 14:16:10 URL:https://wwwuser.gwdg.de/~compbiol/spacepharer/2018_09/genbank_phages_2018_09.tsv [405478/405478] -> "genbank_phages_2018_09.tsv" [1]
tar2db genbank_phages_2018_09.tar /ibex/user/naras0c/spacepharer_test/tmpFolder/9610124632266045672/tardb --threads 40 -v 3 

Time for merging to tardb: 0h 0m 0s 81ms
Time for merging to tardb.lookup: 0h 0m 0s 409ms
Time for processing: 0h 0m 6s 409ms
createdb /ibex/user/naras0c/spacepharer_test/tmpFolder/9610124632266045672/tardb /ibex/user/naras0c/spacepharer_test/tmpFolder/9610124632266045672/seqdb -v 3 

Converting sequences
[8283] 1s 195ms
Time for merging to seqdb_h: 0h 0m 0s 383ms
Time for merging to seqdb: 0h 0m 4s 252ms
Database type: Nucleotide
Time for processing: 0h 0m 6s 161ms
createsetdb /ibex/user/naras0c/spacepharer_test/tmpFolder/9610124632266045672/seqdb  /ibex/user/naras0c/spacepharer_test/tmpFolder/9610124632266045672 --reverse-fragments 0 --tax-mapping-file genbank_phages_2018_09.tsv --extractorf-spacer 0 --translation-table 1 --add-orf-stop 0 --compressed 0 --threads 40 -v 3 

cp: '/ibex/user/naras0c/spacepharer_test/tmpFolder/9610124632266045672/seqdb' and '/ibex/user/naras0c/spacepharer_test/tmpFolder/9610124632266045672/seqdb' are the same file
Error: createsetdb failed

Perhaps I am doing something wrong here?

EDIT/UPDATE

I tried running createsetdb manually as follows:

$ tar xvf tmpFolder/9610124632266045672/genbank_phages_2018_09.tar
< stdout of unpacking the fna.gz files >

$ mkdir phages # Create and move them to a different folder
$ mv *.fna.gz phages/
$ rm -rf databases/* # Clean up output directory
$ spacepharer createsetdb phages/*.fna.gz databases/targetSetDb tmpFolder/
< bunch of stdout >
$ spacepharer createsetdb phages/*.fna.gz databases/targetSetDb_rev tmpFolder/ --reverse-fragments 1
< bunch of stdout >

This seems to have worked with no issue. So, I proceeded to run the command that I wanted to run with the example CRISPR data:

$ spacepharer easy-predict examples/crisprdetect_test examples/pilercr_test databases/targetSetDb predictions.tsv tmpFolder
< bunch of stdout >

The predictions file is not empty this time around.
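
For anyone who wants to verify the same thing, here is a small sketch of how the output can be checked from the shell. check_predictions is a hypothetical helper name, not something SpacePHARER provides, and the check only looks at whether the file has any content, not whether the predictions are meaningful:

```shell
#!/bin/sh
# Report whether a predictions file exists and contains any data.
# Returns 0 if non-empty, 1 if empty, 2 if the file is missing.
check_predictions() {
    file="$1"
    if [ ! -e "$file" ]; then
        echo "missing: $file"
        return 2
    elif [ ! -s "$file" ]; then
        # -s is true only for files with a size greater than zero
        echo "empty: $file"
        return 1
    fi
    # Count lines as a rough indication of how much output was written.
    echo "ok: $file has $(wc -l < "$file" | tr -d ' ') line(s)"
}
```

Running check_predictions predictions.tsv right after easy-predict distinguishes a missing file from an empty one, which helps separate a failed run from a run that simply found no hits.
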

Let me know if you need more information :)
