Query database size and predictmatch matches #1

imrambo · 2020-06-26T20:51:39Z

Hello,

I am using SpacePHARER for a project and have a few questions about how query database size can affect predictmatch results. I ran spacepharer predictmatch for two query databases, both times against the same target database. Only 31% of the hits from the first run (which used the smaller query database) were also found in the second run.

#First run spacer query DB:
2,206 spacers from 38 genomes.

#First run results:
161 spacer hits to viral target DB.

#Second run spacer query DB:
15,730 spacers from 450 genomes.

#Second run results:
1,764 spacer hits to viral target DB.
50 of the 161 spacers with hits in the first run were retained in the second run output.
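
For reference, the overlap above can be computed along these lines. This is only a minimal sketch: it assumes the spacer identifiers with hits have already been extracted from each run's output, and the function name and toy identifiers are hypothetical, not part of SpacePHARER.

```python
def retained_hits(first_run_ids, second_run_ids):
    """Compare the spacers with hits in two runs.

    Both arguments are iterables of spacer identifiers (hypothetical
    input; extracting them from the result files is not shown here).
    Returns (retained, lost, fraction), where fraction is the share of
    first-run spacers that also have a hit in the second run.
    """
    first = set(first_run_ids)
    second = set(second_run_ids)
    retained = first & second   # hit in both runs
    lost = first - second       # hit only in the first run
    fraction = len(retained) / len(first) if first else 0.0
    return retained, lost, fraction


# Toy identifiers, for illustration only.
retained, lost, fraction = retained_hits(
    ["sp1", "sp2", "sp3", "sp4"],
    ["sp2", "sp4", "sp9"],
)
print(sorted(retained), sorted(lost), fraction)
```

With the counts reported above, the same ratio is 50 / 161, or about 31%.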

Main question

I am wondering why spacers from the smaller query database that had a hit in the first run are missing from the output of the second run, which used the larger query database. Does the --simple-best-hit setting affect this?

The tmp folder was emptied after each run.

Environment

SpacePHARER Version: 2.fc5e668
Conda
Ubuntu 16.04

#The same parameters are used for both runs:
--strand 2 --fmt 2 --fdr 0.01 --simple-best-hit 1 --use-all-table-starts 1 --translate 1 --search-type 1 --translation-table 11 --rescore-mode 0 --num-iterations 4 --cov 0.50 --e-profile 0.0001 -s 5.70 --report-pam 1 --gap-open 16 --gap-extend 2 --cov-mode 0 --min-seq-id 0.95 --max-seq-id 1.00 --orf-start-mode 1 --remove-tmp-files 0

I'm reading through the supplemental information of the bioRxiv paper to try to understand the algorithm better.

Thank you very much for your time and help.

Cheers,
Ian

RuoshiZhang · 2020-06-26T21:55:44Z

Hi,

Could you please provide the full log of the two runs?
What is the size of your target DB? Are the matches (virus-host pairs) for the remaining hits also missing from the output?
The --simple-best-hit parameter in SpacePHARER is fixed and should not be related to this problem.
