PyPretreatMol module

PyPretreatMol.StandardMol(mol)[source]

Standardize a molecule and derive its parent molecules. The function covers derivation of the fragment, charge, tautomer, isotope and stereo parent molecules. The primary usage is:

from rdkit import Chem
mol1 = Chem.MolFromSmiles('C1=CC=CC=C1')
mol2 = StandardMol(mol1)

PyPretreatMol.StandardSmi(smi)[source]

Standardize a molecule given as a SMILES string and derive its parent molecules. The function covers derivation of the fragment, charge, tautomer, isotope and stereo parent molecules. The primary usage is:

smi = StandardSmi('C[n+]1c([N-](C))cccc1')
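
When many SMILES strings need standardizing, it can help to wrap the call so that one bad input does not stop the run. The sketch below is only an illustration: `standardize_many` is a hypothetical helper, not part of PyPretreatMol, and it assumes the standardization function raises an exception on input it cannot handle, which this page does not confirm. The function to apply is passed in as an argument, so the block runs without the library installed.

```python
from typing import Callable, Dict, Iterable, Optional


def standardize_many(
    smiles: Iterable[str],
    standardize: Callable[[str], str],
) -> Dict[str, Optional[str]]:
    """Map each input SMILES to its standardized form, or None on failure.

    Hypothetical helper: `standardize` is any function taking and returning
    a SMILES string, for example StandardSmi.
    """
    results: Dict[str, Optional[str]] = {}
    for smi in smiles:
        try:
            results[smi] = standardize(smi)
        except Exception:
            # Assumed failure mode: the callable raises on bad input.
            # Record the failure and continue with the next string.
            results[smi] = None
    return results
```

With the library available, a call would look like `standardize_many(['C[n+]1c([N-](C))cccc1'], StandardSmi)`.
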
class PyPretreatMol.StandardizeMol(normalizations=(Normalization(u'Nitro to N+(O-)=O', u'[*:1][N,P,As,Sb:2](=[O,S,Se,Te:3])=[O,S,Se,Te:4]>>[*:1][*+1:2]([*-1:3])=[*:4]'), Normalization(u'Sulfone to S(=O)(=O)', u'[S+2:1]([O-:2])([O-:3])>>[S+0:1](=[O-0:2])(=[O-0:3])'), Normalization(u'Pyridine oxide to n+O-', u'[n:1]=[O:2]>>[n+:1][O-:2]'), Normalization(u'Azide to N=N+=N-', u'[*,H:1][N:2]=[N:3]#[N:4]>>[*,H:1][N:2]=[N+:3]=[N-:4]'), Normalization(u'Diazo/azo to =N+=N-', u'[*:1]=[N:2]#[N:3]>>[*:1]=[N+:2]=[N-:3]'), Normalization(u'Sulfoxide to -S+(O-)-', u'[!O:1][S+0;X3:2](=[O:3])[!O:4]>>[*:1][S+1:2]([O-:3])[*:4]'), Normalization(u'Phosphate to P(O-)=O', u'[O,S,Se,Te;-1:1][P+;D4:2][O,S,Se,Te;-1:3]>>[*+0:1]=[P+0;D5:2][*-1:3]'), Normalization(u'Amidinium to C(=NH2+)NH2', u'[C,S;X3+1:1]([NX3:2])[NX3!H0:3]>>[*+0:1]([N:2])=[N+:3]'), Normalization(u'Normalize hydrazine-diazonium', u'[CX4:1][NX3H:2]-[NX3H:3][CX4:4][NX2+:5]#[NX1:6]>>[CX4:1][NH0:2]=[NH+:3][C:4][N+0:5]=[NH:6]'), Normalization(u'Recombine 1,3-separated charges', u'[N,P,As,Sb,O,S,Se,Te;-1:1]-[A:2]=[N,P,As,Sb,O,S,Se,Te;+1:3]>>[*-0:1]=[*:2]-[*+0:3]'), Normalization(u'Recombine 1,3-separated charges', u'[n,o,p,s;-1:1]:[a:2]=[N,O,P,S;+1:3]>>[*-0:1]:[*:2]-[*+0:3]'), Normalization(u'Recombine 1,3-separated charges', u'[N,O,P,S;-1:1]-[a:2]:[n,o,p,s;+1:3]>>[*-0:1]=[*:2]:[*+0:3]'), Normalization(u'Recombine 1,5-separated charges', u'[N,P,As,Sb,O,S,Se,Te;-1:1]-[A+0:2]=[A:3]-[A:4]=[N,P,As,Sb,O,S,Se,Te;+1:5]>>[*-0:1]=[*:2]-[*:3]=[*:4]-[*+0:5]'), Normalization(u'Recombine 1,5-separated charges', u'[n,o,p,s;-1:1]:[a:2]:[a:3]:[c:4]=[N,O,P,S;+1:5]>>[*-0:1]:[*:2]:[*:3]:[c:4]-[*+0:5]'), Normalization(u'Recombine 1,5-separated charges', u'[N,O,P,S;-1:1]-[c:2]:[a:3]:[a:4]:[n,o,p,s;+1:5]>>[*-0:1]=[c:2]:[*:3]:[*:4]:[*+0:5]'), Normalization(u'Normalize 1,3 conjugated cation', u'[N,O;+0!H0:1]-[A:2]=[N!$(*[O-]),O;+1H0:3]>>[*+1:1]=[*:2]-[*+0:3]'), Normalization(u'Normalize 1,3 conjugated cation', 
u'[n;+0!H0:1]:[c:2]=[N!$(*[O-]),O;+1H0:3]>>[*+1:1]:[*:2]-[*+0:3]'), Normalization(u'Normalize 1,3 conjugated cation', u'[N,O;+0!H0:1]-[c:2]:[n!$(*[O-]),o;+1H0:3]>>[*+1:1]=[*:2]:[*+0:3]'), Normalization(u'Normalize 1,5 conjugated cation', u'[N,O;+0!H0:1]-[A:2]=[A:3]-[A:4]=[N!$(*[O-]),O;+1H0:5]>>[*+1:1]=[*:2]-[*:3]=[*:4]-[*+0:5]'), Normalization(u'Normalize 1,5 conjugated cation', u'[n;+0!H0:1]:[a:2]:[a:3]:[c:4]=[N!$(*[O-]),O;+1H0:5]>>[n+1:1]:[*:2]:[*:3]:[*:4]-[*+0:5]'), Normalization(u'Normalize 1,5 conjugated cation', u'[N,O;+0!H0:1]-[c:2]:[a:3]:[a:4]:[n!$(*[O-]),o;+1H0:5]>>[*+1:1]=[c:2]:[*:3]:[*:4]:[*+0:5]'), Normalization(u'Normalize 1,5 conjugated cation', u'[n;+0!H0:1]1:[a:2]:[a:3]:[a:4]:[n!$(*[O-]);+1H0:5]1>>[n+1:1]1:[*:2]:[*:3]:[*:4]:[n+0:5]1'), Normalization(u'Normalize 1,5 conjugated cation', u'[n;+0!H0:1]:[a:2]:[a:3]:[a:4]:[n!$(*[O-]);+1H0:5]>>[n+1:1]:[*:2]:[*:3]:[*:4]:[n+0:5]'), Normalization(u'Charge normalization', u'[F,Cl,Br,I,At;-1:1]=[O:2]>>[*-0:1][O-:2]'), Normalization(u'Charge recombination', u'[N,P,As,Sb;-1:1]=[C+;v3:2]>>[*+0:1]#[C+0:2]')), acid_base_pairs=(AcidBasePair(u'-OSO3H', u'OS(=O)(=O)[OH]', u'OS(=O)(=O)[O-]'), AcidBasePair(u'u2013SO3H', u'[!O]S(=O)(=O)[OH]', u'[!O]S(=O)(=O)[O-]'), AcidBasePair(u'-OSO2H', u'O[SD3](=O)[OH]', u'O[SD3](=O)[O-]'), AcidBasePair(u'-SO2H', u'[!O][SD3](=O)[OH]', u'[!O][SD3](=O)[O-]'), AcidBasePair(u'-OPO3H2', u'OP(=O)([OH])[OH]', u'OP(=O)([OH])[O-]'), AcidBasePair(u'-PO3H2', u'[!O]P(=O)([OH])[OH]', u'[!O]P(=O)([OH])[O-]'), AcidBasePair(u'-CO2H', u'C(=O)[OH]', u'C(=O)[O-]'), AcidBasePair(u'thiophenol', u'c[SH]', u'c[S-]'), AcidBasePair(u'(-OPO3H)-', u'OP(=O)([OH])[O-]', u'OP(=O)([O-])[O-]'), AcidBasePair(u'(-PO3H)-', u'[!O]P(=O)([OH])[O-]', u'[!O]P(=O)([O-])[O-]'), AcidBasePair(u'phthalimide', u'O=C2c1ccccc1C(=O)[NH]2', u'O=C2c1ccccc1C(=O)[N-]2'), AcidBasePair(u'CO3H (peracetyl)', u'C(=O)O[OH]', u'C(=O)O[O-]'), AcidBasePair(u'alpha-carbon-hydrogen-nitro group', u'O=N(O)[CH]', u'O=N(O)[C-]'), 
AcidBasePair(u'-SO2NH2', u'S(=O)(=O)[NH2]', u'S(=O)(=O)[NH-]'), AcidBasePair(u'-OBO2H2', u'OB([OH])[OH]', u'OB([OH])[O-]'), AcidBasePair(u'-BO2H2', u'[!O]B([OH])[OH]', u'[!O]B([OH])[O-]'), AcidBasePair(u'phenol', u'c[OH]', u'c[O-]'), AcidBasePair(u'SH (aliphatic)', u'C[SH]', u'C[S-]'), AcidBasePair(u'(-OBO2H)-', u'OB([OH])[O-]', u'OB([O-])[O-]'), AcidBasePair(u'(-BO2H)-', u'[!O]B([OH])[O-]', u'[!O]B([O-])[O-]'), AcidBasePair(u'cyclopentadiene', u'[CH2]1C=CC=C1', u'[C-]1C=CC=C1'), AcidBasePair(u'-CONH2', u'C(=O)[NH2]', u'C(=O)[NH-]'), AcidBasePair(u'imidazole', u'c1cnc[n]1', u'c1cnc[n-]1'), AcidBasePair(u'-OH', u'[CX4][OH]', u'[CX4][O-]'), AcidBasePair(u'alpha-carbon-hydrogen-keto group', u'O=C[CH]', u'O=C[C-]'), AcidBasePair(u'alpha-carbon-hydrogen-acetyl ester group', u'OC(=O)[CH]', u'OC(=O)[C-]'), AcidBasePair(u'sp carbon hydrogen', u'C#[CH]', u'C#[C-]'), AcidBasePair(u'alpha-carbon-hydrogen-sulfone group', u'CS(=O)(=O)C[CH]', u'CS(=O)(=O)C[C-]'), AcidBasePair(u'alpha-carbon-hydrogen-sulfoxide group', u'C[SD3](=O)C[CH]', u'C[SD3](=O)C[C-]'), AcidBasePair(u'-NH2', u'[CX4][NH2]', u'[CX4][NH-]'), AcidBasePair(u'benzyl hydrogen', u'c[CD4H]', u'c[CD3-]'), AcidBasePair(u'sp2-carbon hydrogen', u'[CX3]=[CX3H]', u'[CX3]=[CX2-]'), AcidBasePair(u'sp3-carbon hydrogen', u'[CX4H]', u'[CX3-]')), tautomer_transforms=(TautomerTransform(u'1,3 (thio)keto/enol f', u'[CX4!H0][C]=[O,S,Se,Te;X1]', [], []), TautomerTransform(u'1,3 (thio)keto/enol r', u'[O,S,Se,Te;X2!H0][C]=[C]', [], []), TautomerTransform(u'1,5 (thio)keto/enol f', u'[CX4,NX3;!H0][C]=[C][CH0]=[O,S,Se,Te;X1]', [], []), TautomerTransform(u'1,5 (thio)keto/enol r', u'[O,S,Se,Te;X2!H0][CH0]=,:[C][C]=,:[C,N]', [], []), TautomerTransform(u'aliphatic imine f', u'[CX4!H0][C]=[NX2]', [], []), TautomerTransform(u'aliphatic imine r', u'[NX3!H0][C]=[CX3]', [], []), TautomerTransform(u'special imine f', u'[N!H0][C]=[CX3R0]', [], []), TautomerTransform(u'special imine r', u'[CX4!H0][c]=,:[n]', [], []), TautomerTransform(u'1,3 aromatic 
heteroatom H shift f', u'[#7!H0][#6R1]=[O,#7X2]', [], []), TautomerTransform(u'1,3 aromatic heteroatom H shift r', u'[O,#7;!H0][#6R1]=,:[#7X2]', [], []), TautomerTransform(u'1,3 heteroatom H shift', u'[#7,S,O,Se,Te;!H0][#7X2,#6,#15]=[#7,#16,#8,Se,Te]', [], []), TautomerTransform(u'1,5 aromatic heteroatom H shift', u'[n,s,o;!H0]:[c,n]:[c]:[c,n]:[n,s,o;H0]', [], []), TautomerTransform(u'1,5 aromatic heteroatom H shift f', u'[#7,#16,#8,Se,Te;!H0][#6,nX2]=,:[#6,nX2][#6,#7X2]=,:[#7X2,S,O,Se,Te]', [], []), TautomerTransform(u'1,5 aromatic heteroatom H shift r', u'[#7,S,O,Se,Te;!H0][#6,#7X2]=,:[#6,nX2][#6,nX2]=,:[#7,#16,#8,Se,Te]', [], []), TautomerTransform(u'1,7 aromatic heteroatom H shift f', u'[#7,#8,#16,Se,Te;!H0][#6,#7X2]=,:[#6,#7X2][#6,#7X2]=,:[#6][#6,#7X2]=,:[#7X2,S,O,Se,Te,CX3]', [], []), TautomerTransform(u'1,7 aromatic heteroatom H shift r', u'[#7,S,O,Se,Te,CX4;!H0][#6,#7X2]=,:[#6][#6,#7X2]=,:[#6,#7X2][#6,#7X2]=,:[NX2,S,O,Se,Te]', [], []), TautomerTransform(u'1,9 aromatic heteroatom H shift f', u'[#7,O;!H0][#6,#7X2]=,:[#6,#7X2][#6,#7X2]=,:[#6,#7X2][#6,#7X2]=,:[#6,#7X2][#6,#7X2]=,:[#7,O]', [], []), TautomerTransform(u'1,11 aromatic heteroatom H shift f', u'[#7,O;!H0][#6,nX2]=,:[#6,nX2][#6,nX2]=,:[#6,nX2][#6,nX2]=,:[#6,nX2][#6,nX2]=,:[#6,nX2][#6,nX2]=,:[#7X2,O]', [], []), TautomerTransform(u'furanone f', u'[O,S,N;!H0][#6X3r5;$([#6][!#6])]=,:[#6X3r5]', [], []), TautomerTransform(u'furanone r', u'[#6r5!H0][#6X3r5;$([#6][!#6])]=[O,S,N]', [], []), TautomerTransform(u'keten/ynol f', u'[C!H0]=[C]=[O,S,Se,Te;X1]', [rdkit.Chem.rdchem.BondType.TRIPLE, rdkit.Chem.rdchem.BondType.SINGLE], []), TautomerTransform(u'keten/ynol r', u'[O,S,Se,Te;!H0X2][C]#[C]', [rdkit.Chem.rdchem.BondType.DOUBLE, rdkit.Chem.rdchem.BondType.DOUBLE], []), TautomerTransform(u'ionic nitro/aci-nitro f', u'[C!H0][N+;$([N][O-])]=[O]', [], []), TautomerTransform(u'ionic nitro/aci-nitro r', u'[O!H0][N+;$([N][O-])]=[C]', [], []), TautomerTransform(u'oxim/nitroso f', u'[O!H0][N]=[C]', [], []), 
TautomerTransform(u'oxim/nitroso r', u'[C!H0][N]=[O]', [], []), TautomerTransform(u'oxim/nitroso via phenol f', u'[O!H0][N]=[C][C]=[C][C]=[OH0]', [], []), TautomerTransform(u'oxim/nitroso via phenol r', u'[O!H0][c]:[c]:[c]:[c][N]=[OH0]', [], []), TautomerTransform(u'cyano/iso-cyanic acid f', u'[O!H0][C]#[N]', [rdkit.Chem.rdchem.BondType.DOUBLE, rdkit.Chem.rdchem.BondType.DOUBLE], []), TautomerTransform(u'cyano/iso-cyanic acid r', u'[N!H0]=[C]=[O]', [rdkit.Chem.rdchem.BondType.TRIPLE, rdkit.Chem.rdchem.BondType.SINGLE], []), TautomerTransform(u'formamidinesulfinic acid f', u'[O,N;!H0][C][S,Se,Te]=[O]', [rdkit.Chem.rdchem.BondType.DOUBLE, rdkit.Chem.rdchem.BondType.SINGLE, rdkit.Chem.rdchem.BondType.SINGLE], []), TautomerTransform(u'formamidinesulfinic acid r', u'[O!H0][S,Se,Te][C]=[O,N]', [rdkit.Chem.rdchem.BondType.DOUBLE, rdkit.Chem.rdchem.BondType.SINGLE, rdkit.Chem.rdchem.BondType.SINGLE], []), TautomerTransform(u'isocyanide f', u'[C-0!H0]#[N+0]', [rdkit.Chem.rdchem.BondType.TRIPLE], [-1, 1]), TautomerTransform(u'isocyanide r', u'[N+!H0]#[C-]', [rdkit.Chem.rdchem.BondType.TRIPLE], [-1, 1]), TautomerTransform(u'phosphonic acid f', u'[OH][PH0]', [rdkit.Chem.rdchem.BondType.DOUBLE], []), TautomerTransform(u'phosphonic acid r', u'[PH]=[O]', [rdkit.Chem.rdchem.BondType.SINGLE], [])), tautomer_scores=(TautomerScore(u'benzoquinone', u'[#6]1([#6]=[#6][#6]([#6]=[#6]1)=,:[N,S,O])=,:[N,S,O]', 25), TautomerScore(u'oxim', u'[#6]=[N][OH]', 4), TautomerScore(u'C=O', u'[#6]=,:[#8]', 2), TautomerScore(u'N=O', u'[#7]=,:[#8]', 2), TautomerScore(u'P=O', u'[#15]=,:[#8]', 2), TautomerScore(u'C=hetero', u'[#6]=[!#1;!#6]', 1), TautomerScore(u'methyl', u'[CX4H3]', 1), TautomerScore(u'guanidine terminal=N', u'[#7][#6](=[NR0])[#7H0]', 1), TautomerScore(u'guanidine endocyclic=N', u'[#7;R][#6;R]([N])=[#7;R]', 2), TautomerScore(u'aci-nitro', u'[#6]=[N+]([O-])[OH]', -4)), max_restarts=200, max_tautomers=1000, prefer_organic=False)[source]

Bases: object

The main class for performing standardization of molecules and deriving parent molecules.

The primary usage is via the standardize() method:

s = StandardizeMol()
mol1 = Chem.MolFromSmiles('C1=CC=CC=C1')
mol2 = s.standardize(mol1)

There are separate methods to derive fragment, charge, tautomer, isotope and stereo parent molecules.

addhs(mol)[source]

canonicalize_tautomer
Returns: A callable TautomerCanonicalizer instance.

disconnect_metals
Returns: A callable MetalDisconnector instance.

largest_fragment
Returns: A callable LargestFragmentChooser instance.

normalize
Returns: A callable Normalizer instance.

reionize
Returns: A callable Reionizer instance.

rmhs(mol)[source]

uncharge
Returns: A callable Uncharger instance.

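
Several of the attributes above are documented as returning a callable instance rather than taking `mol` themselves. The toy sketch below illustrates that pattern in plain Python. `UpperCaser` and `ToyStandardizer` are hypothetical stand-ins invented for this example; they are not the library's classes and do no chemistry.

```python
class UpperCaser:
    """Toy stand-in for a helper such as Normalizer; instances are callable."""

    def __call__(self, text):
        return text.upper()


class ToyStandardizer:
    """Toy stand-in for StandardizeMol, showing only the property pattern."""

    @property
    def normalize(self):
        # The property does no work itself; it hands back a callable object.
        return UpperCaser()


s = ToyStandardizer()
step = s.normalize            # attribute access yields a callable instance
result = s.normalize("abc")   # attribute access followed by a call
```

By analogy, an expression such as `s.normalize(mol)` on a `StandardizeMol` instance would first read the attribute and then call the returned `Normalizer` with `mol`.
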
PyPretreatMol.ValidatorMol(mol)[source]

Return log messages for a given molecule using the default validations.

Note: This is a convenience function for quickly validating a single molecule.

Parameters: mol (rdkit.Chem.rdchem.Mol) – The molecule to validate.
Returns: A list of log messages.
Return type: list of strings

PyPretreatMol.ValidatorSmi(smi)[source]

Return log messages for a given SMILES string using the default validations.

Note: This is a convenience function for quickly validating a single SMILES string.

Parameters: smi (string) – The SMILES for the molecule.
Returns: A list of log messages.
Return type: list of strings
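
Because both validators return plain strings, downstream code often wants them split into fields. The sketch below is one possible way to do that. It assumes each message follows the `LEVEL: [ValidationName] text` layout used by MolVS, which this page does not spell out. `LogEntry` and `parse_messages` are hypothetical names, not part of PyPretreatMol.

```python
import re
from typing import List, NamedTuple, Optional


class LogEntry(NamedTuple):
    level: str
    validation: Optional[str]
    text: str


# Assumed layout: "LEVEL: [ValidationName] message text".
_PATTERN = re.compile(
    r"^(?P<level>[A-Z]+): (?:\[(?P<validation>[^\]]+)\] )?(?P<text>.*)$"
)


def parse_messages(messages: List[str]) -> List[LogEntry]:
    """Split validator log strings into (level, validation, text) entries."""
    entries = []
    for msg in messages:
        match = _PATTERN.match(msg)
        if match:
            entries.append(
                LogEntry(match["level"], match["validation"], match["text"])
            )
        else:
            # Unknown layout: keep the raw text instead of dropping it.
            entries.append(LogEntry("UNKNOWN", None, msg))
    return entries
```

With the library available, the input would be the list returned by `ValidatorSmi(smi)` or `ValidatorMol(mol)`. Messages that do not fit the assumed layout are kept with level `UNKNOWN`, so nothing is lost.
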