fastx_subseq.py

This script extracts subsequences from FASTA/FASTQ files very quickly, but it consumes a lot of memory.
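The speed-for-memory trade-off can be pictured with a minimal sketch. This is not the script's actual implementation: the function names `load_fasta` and `fetch` are hypothetical, and the sketch covers FASTA input only. It assumes every record is loaded into a dictionary once, so each later lookup takes constant time, while the whole file stays in memory.

```python
def load_fasta(lines):
    """Read FASTA lines into a dict mapping sequence name -> sequence.

    Hypothetical helper: the whole input ends up in memory, which is
    what makes lookups fast and memory use high.
    """
    records = {}
    name = None
    chunks = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            # The name is the first whitespace-delimited token after '>'.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        elif name is not None and line:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def fetch(records, names):
    """Return the requested records that exist; unknown names are skipped."""
    return {n: records[n] for n in names if n in records}
```

With a real file, `load_fasta(open(path))` would be called once and `fetch` as often as needed.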

Methods

In Bash:

$ pypy fastx_subseq.py -f FASTX -l SEQ_NAME_LIST # No detailed information printed.

or

$ pypy fastx_subseq.py -f FASTX -l SEQ_NAME_LIST -o OUT_DIR -v

# Warning: This script is memeory-consuming! #
Initializing...
Extracting...
    [================================================================================]  Processing 100.0%... 
All done.

If the "-o" option is not specified, all extracted sequences are written by default to a folder named extracted_sequences in the current working directory. Make sure you have write permission there.
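To check the permission requirement before a long run, something like the following could be used. It is a sketch only: `prepare_out_dir` is a hypothetical helper and not part of fastx_subseq, and it assumes Python 3 (for `exist_ok`).

```python
import os


def prepare_out_dir(out_dir="extracted_sequences"):
    """Create the output directory if needed and verify it is writable.

    Hypothetical helper, not part of fastx_subseq.
    """
    os.makedirs(out_dir, exist_ok=True)
    # Creating files inside a directory needs write and execute permission.
    if not os.access(out_dir, os.W_OK | os.X_OK):
        raise PermissionError("Cannot write to output directory: " + out_dir)
    return os.path.abspath(out_dir)
```

Calling `prepare_out_dir()` with no argument checks the default location relative to the current working directory.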

Imported as a Python module:

An example:

import sys
sys.path.append('/path/to/fastx_subseq/')  # If necessary.
from fastx_subseq import Fastx
f = Fastx(FASTX, verbose=True)             # To process verbosely, set "verbose=True" (default).
f.ExtractInfo()                            # To extract the FASTX's info (consumes memory).
f.FetchSeq(SEQ_NAME_LIST, OUT_DIR)         # To fetch sequences.
f.ReleaseMemory()                          # Recommended.

For more details:

>>> from fastx_subseq import Fastx
>>> help(Fastx)

Formats:

FASTX must be a file in FASTA format or 4-line FASTQ format.
SEQ_NAME_LIST is a plain-text file containing sequence names, one per line (no spaces), such as:

$ head SEQ_NAME_LIST
E00247:343:HYMLVCCXX:8:1101:11363:40583
E00247:343:HYMLVCCXX:8:1101:1813:43941
E00247:343:HYMLVCCXX:8:1101:23023:68658
E00247:343:HYMLVCCXX:8:1101:23409:33041
E00247:343:HYMLVCCXX:8:1101:2656:67058

OUT_DIR is a custom output directory (default: "./extracted_sequences/").
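The input formats described above can be checked up front with a short sketch. Both functions are hypothetical helpers, not part of fastx_subseq; they assume that blank lines in SEQ_NAME_LIST may be skipped and that a FASTQ record's name is the first whitespace-delimited token after "@".

```python
def read_name_list(lines):
    """Return sequence names, one per line; reject names containing whitespace."""
    names = []
    for lineno, line in enumerate(lines, start=1):
        name = line.rstrip("\r\n")
        if not name:
            continue  # Assumption: blank lines are ignored.
        if any(ch.isspace() for ch in name):
            raise ValueError("line %d contains whitespace: %r" % (lineno, name))
        names.append(name)
    return names


def iter_fastq4(lines):
    """Yield (name, sequence, quality) from strictly 4-line FASTQ records."""
    record = []
    for line in lines:
        record.append(line.rstrip("\r\n"))
        if len(record) == 4:
            header, seq, plus, qual = record
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError("not a 4-line FASTQ record: %r" % (record,))
            fields = header[1:].split()
            yield (fields[0] if fields else ""), seq, qual
            record = []
    if record:
        raise ValueError("truncated FASTQ record at end of input")
```

Both accept any iterable of lines, so an open file handle can be passed directly.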

Note:

If only a few subsequences need to be extracted from a FASTA file, samtools is recommended:

$ samtools faidx INPUT_FASTA                          # Build an index for your FASTA file first.  
$ samtools faidx INPUT_FASTA SEQ_NAME > OUTPUT_FASTA  # Extract the subseq.
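The same two-step samtools workflow can be driven from Python. This is a sketch under stated assumptions: `faidx_command` and `extract_with_samtools` are hypothetical names, samtools is assumed to be on the PATH, the index is assumed to have been built already with the first command above, and Python 3 is assumed for `subprocess.run`.

```python
import subprocess


def faidx_command(fasta, names):
    """Build the argument list for `samtools faidx FASTA NAME [NAME ...]`."""
    names = list(names)
    if not names:
        raise ValueError("at least one sequence name is required")
    return ["samtools", "faidx", fasta] + names


def extract_with_samtools(fasta, names, output_fasta):
    """Run samtools and redirect its standard output to output_fasta."""
    with open(output_fasta, "w") as out:
        # check=True raises CalledProcessError if samtools exits non-zero.
        subprocess.run(faidx_command(fasta, names), stdout=out, check=True)
```

Passing the arguments as a list avoids shell quoting problems with sequence names that contain characters such as ":".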

License

MIT License

yangwu91/fastx_subseq
