command line tool to calculate all possible glycan molecules and the m/z values of their ions given a set of input parameters
NOTE: I have now also added an 'R' version - if you use the reticulate package in R, you can source the file (sugarMassesPredict-r.py) and then use the function 'predict_sugars' directly in R to generate output there.
this tool is being frequently updated :) if you have any questions or issue please feel free to contact me at mbligh@mpi-bremen.de
word of caution: please note that this tool will predict sugars that are not possible as the nature of sugar chemistry means that it would take a long time to add in all the constraints!
e.g.
library(reticulate)
py_install("pandas", "numpy")
source_python("sugarMassesPredict-r.py")
dp1 = as.integer(1)
dp2 = as.integer(3)
ESI_mode = 'pos'
scan_range1 = as.integer(100)
scan_range2 = as.integer(800)
pent_option = as.integer(1)
modifications = list('sulphate', 'deoxy')
label = "procainamide"
df <- predict_sugars(dp1 = dp1, dp2 = dp2, ESI_mode = ESI_mode, scan_range1 = scan_range1, scan_range2 = scan_range2, pent_option = pent_option, modifications = modifications, label = label)
- pandas
- numpy
- python 3
- dp (degree of polymerisation) range
- whether pentose monomers should be used in addition to hexose
- modifications - possible options are none, all, or any combination of:
- sulphate
- carboxyl
- phosphate
- deoxy
- N-acetyl
- O-acetyl
- O-methyl
- anhydrobridge
- unsaturated
- alditol
- amino
- dehydrated
- maximum number of modifications per monomer on average
- ionisation mode
- scan range (m/z)
- label - current options are procainamide (added by reductive amination) and benzoic acid (added on free alcohol groups, will calculate glycans with no label to the maximum number of labels possible)
- output file path - defaults to "predicted_sugars.txt"
- options to do with the calculation of the possible number of structural isomers (but this section needs to be fixed)
tab delimited text file with one row per molecule. m/z values outside the scan range as shown as "NA", and molecules with no ions with m/z values within the scan are not returned. columns are as follows:
- degree of polymerisation (dp)
- name
- monoisotopic mass (Da)
- sum formula
- columns with m/ values of possible ions given the input parameters.
- positive mode:
- [M+H]+
- [M+Na]+
- negative mode:
- [M+Cl]-
- [M+CHOO]-
- [M+2Cl]2-
- [M+2CHOO]2-
- [M+Cl-H]2-
- [M+CHOO-H]2-
- [M+CHOO+Cl]2-
- [M-nH]n-, where n is 1 to the maximum number anionic groups that any single molecule in the table has
- positive mode:
- if parameters related to isomers were specified in the input, there is an additional column for the number of possible isomers per molecule
the help menu accessed with:
sugarMassesPredict.py -h
returns the following:
usage: sugarMassesPredict.py [-h] -dp int int [-p int] -m str [str ...]
[-n int] [-ds int] [-ld str [str ...]] [-oh int]
[-b int] -i str [str ...] -s int int [-l label]
[-o filepath]
Script to predict possible masses of unknown sugars. Written by Margot Bligh.
optional arguments:
-h, --help show this help message and exit
-dp int int, --dp_range int int
DP range to predict within: two space separated
numbers required (lower first)
-p int, --pent_option int
should pentose monomers be considered as well as
hexose: 0 for no {default}, 1 for yes
-m str [str ...], --modifications str [str ...]
space separated list of modifications to consider.
note that alditol and unsaturated are max once per
saccharide. allowed values: none OR all OR any
combination of carboxyl, phosphate, deoxy, nacetyl,
omethyl, anhydrobridge, oacetyl, unsaturated, alditol,
sulphate
-n int, --nmod_max int
max no. of modifications per monomer on average
{default 1}. does not take into account unsaturated or
alditol.
-ds int, --double_sulphate int
can monomers be double-sulphated: 0 for no {default},
1 for yes. for this you MUST give a value of at least
2 to -n/--nmod_max
-ld str [str ...], --LorD_isomers str [str ...]
isomers calculated for L and/or D enantiomers {default
D only}. write space separated if both
-oh int, --OH_stereo int
stereochem of OH groups considered when calculating
no. of isomers: 0 for no {default}, 1 for yes
-b int, --bond_stereo int
stereochem of glycosidic bonds and reducing end
anomeric carbons considered when calculating no. of
isomers: 0 for no {default}, 1 for yes
-i str [str ...], --ESI_mode str [str ...]
neg and/or pos mode for ionisation (space separated if
both)
-s int int, --scan_range int int
mass spec scan range to predict within: two space
separated numbers required (lower first)
-l label, --label label
name a label added to the oligosaccharide. if not
labelled do not include. options: procainamide OR
benzoic_acid.
-o filepath, --output filepath
filepath to .txt file for output table {default:
predicted_sugars.txt}