I am using SpacePHARER for a project and have a few questions about how query database size can affect predictmatch results. I ran spacepharer predictmatch for two query databases, both times against the same target database. Only 31% of the hits from the first run (which used the smaller query database) were found in the second run.
#First run spacer query DB:
2,206 spacers from 38 genomes.
#First run results:
161 spacer hits to viral target DB.
#Second run spacer query DB:
15,730 spacers from 450 genomes.
#Second run results:
1,764 spacer hits to viral target DB.
50 of the 161 spacers with hits in the first run were retained in the second run output.
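For anyone wanting to reproduce this kind of comparison, here is a minimal sketch of the overlap count. It assumes the spacer identifiers with hits have already been extracted from each run's output into two lists; the parsing step is left out because it depends on the output format, and the `compare_hits` helper and the placeholder IDs are hypothetical, chosen only to match the counts above.

```python
def compare_hits(first_run_ids, second_run_ids):
    """Return the spacers retained in and lost from the second run,
    plus the fraction of first-run hits that were retained."""
    first = set(first_run_ids)
    second = set(second_run_ids)
    retained = first & second   # hits reported by both runs
    lost = first - second       # hits reported only by the first run
    fraction = len(retained) / len(first) if first else 0.0
    return retained, lost, fraction


# Placeholder IDs (hypothetical), sized to match the numbers reported above:
# 161 hits in the first run, 1,764 in the second, 50 shared.
first_run = [f"spacer_{i}" for i in range(161)]
second_run = [f"spacer_{i}" for i in range(50)] + [f"other_{i}" for i in range(1714)]

retained, lost, fraction = compare_hits(first_run, second_run)
print(f"{len(retained)} retained, {len(lost)} lost, {fraction:.0%} retained")
# prints: 50 retained, 111 lost, 31% retained
```

The `lost` set is the list of spacers worth inspecting individually in the second run's intermediate files.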
Main question
I am wondering why spacers from the smaller query database that had a hit in the first run are absent from the output of the second run, which used the larger query database. Does the --simple-best-hit setting affect this?
The tmp folder was emptied after each run.
Environment
SpacePHARER Version: 2.fc5e668
Conda
Ubuntu 16.04
#The same parameters are used for both runs:
--strand 2 --fmt 2 --fdr 0.01 --simple-best-hit 1 --use-all-table-starts 1 --translate 1 --search-type 1 --translation-table 11 --rescore-mode 0 --num-iterations 4 --cov 0.50 --e-profile 0.0001 -s 5.70 --report-pam 1 --gap-open 16 --gap-extend 2 --cov-mode 0 --min-seq-id 0.95 --max-seq-id 1.00 --orf-start-mode 1 --remove-tmp-files 0
I'm reading through the supplemental info on the bioRxiv paper to try and understand the algorithm better.
Thank you very much for your time and help.
Cheers,
Ian
Could you please provide the full log of the two runs?
What is the size of your target DB? Are the matches (virus-host pairs) for the rest of the hits also missing from the output?
The --simple-best-hit parameter in SpacePHARER is fixed and should not be related to this problem.