PseudoAAC module
Instead of using the conventional 20-D amino acid composition to represent a protein
sample, Prof. Kuo-Chen Chou proposed the pseudo amino acid (PseAA) composition, which
also captures sequence-order information. Based on Chou's pseudo amino acid composition,
the PseAA server was designed to be flexible, allowing users to generate various kinds of
pseudo amino acid composition for a given protein sequence by selecting different
parameters and their combinations. This module computes two types of PseAA descriptors:
Type I and Type II.
You can freely use and distribute it. If you have any problems, please contact us.
References:
[1]: Kuo-Chen Chou. Prediction of Protein Cellular Attributes Using Pseudo-Amino Acid
Composition. PROTEINS: Structure, Function, and Genetics, 2001, 43: 246-255.
[2]: http://www.csbio.sjtu.edu.cn/bioinf/PseAAC/
[3]: http://www.csbio.sjtu.edu.cn/bioinf/PseAAC/type2.htm
[4]: Kuo-Chen Chou. Using amphiphilic pseudo amino acid composition to predict enzyme
subfamily classes. Bioinformatics, 2005, 21: 10-19.
Authors: Zhijiang Yao and Dongsheng Cao.
Date: 2016.06.04
Email: gadsby@163.com

PseudoAAC.GetAAComposition(ProteinSequence)
Calculate the composition of amino acids for a given protein sequence.
Usage:
result=GetAAComposition(protein)
Input: protein is a pure protein sequence.
Output: result is a dict containing the composition of the 20 amino acids.
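The composition can be illustrated with a minimal sketch: each residue's count divided by the sequence length. It assumes the 20 standard one-letter codes and a percentage output rounded to three decimals; both the scaling and the name `aa_composition` are hypothetical, not the module's confirmed behavior.

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # the 20 standard one-letter codes


def aa_composition(sequence: str) -> dict:
    """Hypothetical sketch: percentage of each standard amino acid in `sequence`."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    length = len(sequence)
    # occurrences / length, scaled to a percentage (assumed scale) and rounded
    return {aa: round(sequence.count(aa) / length * 100, 3) for aa in AMINO_ACIDS}


comp = aa_composition("ADGKAD")
print(comp["A"], comp["D"], comp["W"])  # 33.333 33.333 0.0
```

Amino acids absent from the sequence still appear in the result with a value of 0.0, so every sequence maps to a fixed 20-D vector.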

PseudoAAC.GetAPseudoAAC(ProteinSequence, lamda=30, weight=0.5)
Compute all type II pseudo-amino acid composition descriptors based on hydrophobicity
and hydrophilicity. Note that the number of descriptors depends on the lamda value: type II
yields 20 + 2*lamda descriptors, so lamda = 20 gives 20 + 40 = 60 descriptors. The
descriptor values depend on the choice of both lamda and weight.
Usage:
result=GetAPseudoAAC(protein,lamda,weight)
Input: protein is a pure protein sequence.
The lamda factor reflects the rank of correlation and is a non-negative integer, such as 15.
Note that (1) lamda must be smaller than the length of the input protein sequence;
(2) lamda must be a non-negative integer, such as 0, 1, 2, ...; (3) when lamda = 0, the
output is the 20-D amino acid composition.
The weight factor lets users put weight on the additional PseAA components
relative to the conventional AA components. Any value from 0.05 to 0.7
can be selected for the weight factor.
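Chou's type II (amphiphilic) PseAAC [4] can be sketched as follows. For each gap k, two correlation factors are formed from normalized hydrophobicity H1 and hydrophilicity H2 values: τ_(2k-1) is the mean of H1(R_i)·H1(R_(i+k)) and τ_(2k) the mean of H2(R_i)·H2(R_(i+k)) over all pairs k apart. The first 20 outputs are f_u / (Σf + w·Στ) and the remaining 2·lamda are w·τ_j / (Σf + w·Στ). The function name `type2_pseaac`, the fraction (not percentage) scale, and the toy property values are hypothetical; the module's own scaling and rounding may differ.

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def type2_pseaac(seq: str, lamda: int, weight: float, h1: dict, h2: dict) -> list:
    """Hypothetical sketch of type II (amphiphilic) PseAAC: 20 + 2*lamda values.

    h1 and h2 map each amino acid to a normalized hydrophobicity and
    hydrophilicity value; outputs are fractions rather than percentages.
    """
    n = len(seq)
    if not 0 <= lamda < n:
        raise ValueError("lamda must satisfy 0 <= lamda < len(seq)")
    freqs = [seq.count(aa) / n for aa in AMINO_ACIDS]
    taus = []  # ordered tau_1, tau_2, ..., tau_{2*lamda}
    for k in range(1, lamda + 1):
        pairs = [(seq[i], seq[i + k]) for i in range(n - k)]
        taus.append(sum(h1[a] * h1[b] for a, b in pairs) / len(pairs))  # tau_{2k-1}
        taus.append(sum(h2[a] * h2[b] for a, b in pairs) / len(pairs))  # tau_{2k}
    denom = sum(freqs) + weight * sum(taus)
    return [f / denom for f in freqs] + [weight * t / denom for t in taus]


# hypothetical toy properties, not real hydrophobicity/hydrophilicity scales
h1 = {aa: (i - 9.5) / 10 for i, aa in enumerate(AMINO_ACIDS)}
h2 = {aa: (9.5 - i) / 10 for i, aa in enumerate(AMINO_ACIDS)}
vec = type2_pseaac("ACDEFGHIK", lamda=3, weight=0.5, h1=h1, h2=h2)
print(len(vec))  # 26 = 20 + 2 * 3
```

Because every output shares the same denominator, the values sum to 1; the first 20 correspond to the part returned by GetAPseudoAAC1 and the rest to GetAPseudoAAC2, if the module follows the same layout.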

PseudoAAC.GetAPseudoAAC1(ProteinSequence, lamda=30, weight=0.5)
Compute the first 20 type II pseudo-amino acid composition descriptors based on
hydrophobicity and hydrophilicity.

PseudoAAC.GetAPseudoAAC2(ProteinSequence, lamda=30, weight=0.5)
Compute the last 2*lamda type II pseudo-amino acid composition descriptors based on
hydrophobicity and hydrophilicity.

PseudoAAC.GetCorrelationFunction(Ri='S', Rj='D', AAP=[])
Compute the correlation between two given amino acids using the given
properties.
Usage:
result=GetCorrelationFunction(Ri,Rj,AAP)
Input: Ri and Rj are two amino acids (one-letter codes).
AAP is a list of properties, each of which is a dict.
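For type I PseAAC, Chou [1] defines the correlation function as the mean squared difference of the normalized property values over the Γ given properties: Θ(Ri, Rj) = (1/Γ) Σ_p [H_p(Rj) - H_p(Ri)]². A minimal sketch, assuming AAP holds already-normalized dicts keyed by one-letter codes; `correlation_function` is a hypothetical name and the toy values are made up.

```python
def correlation_function(ri: str, rj: str, properties: list) -> float:
    """Hypothetical sketch of Theta(Ri, Rj) for type I PseAAC.

    Each element of `properties` is a dict mapping one-letter codes to
    (assumed normalized) property values.
    """
    if not properties:
        raise ValueError("at least one property is required")
    squared_diffs = [(prop[rj] - prop[ri]) ** 2 for prop in properties]
    return sum(squared_diffs) / len(properties)


# two hypothetical toy properties covering only S and D
p1 = {"S": 0.0, "D": 1.0}
p2 = {"S": 2.0, "D": -1.0}
print(correlation_function("S", "D", [p1, p2]))  # (1 + 9) / 2 = 5.0
```

The function is symmetric in Ri and Rj and is zero when both residues are identical.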

PseudoAAC.GetPseudoAAC(ProteinSequence, lamda=30, weight=0.05, AAP=[])
Compute all type I pseudo-amino acid composition descriptors based on the given
properties. Note that the number of descriptors depends on the lamda value: type I yields
20 + lamda descriptors, so lamda = 20 gives 20 + 20 = 40 descriptors. The descriptor
values depend on the choice of both lamda and weight. You must supply one or more
properties in AAP.
Usage:
result=GetPseudoAAC(protein,lamda,weight,AAP)
Input: protein is a pure protein sequence.
The lamda factor reflects the rank of correlation and is a non-negative integer, such as 15.
Note that (1) lamda must be smaller than the length of the input protein sequence;
(2) lamda must be a non-negative integer, such as 0, 1, 2, ...; (3) when lamda = 0, the
output is the 20-D amino acid composition.
The weight factor lets users put weight on the additional PseAA components
relative to the conventional AA components. Any value from 0.05 to 0.7
can be selected for the weight factor.
AAP is a list of properties, each of which is a dict.
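Chou's type I PseAAC [1] combines the normalized amino acid frequencies f_1..f_20 with the sequence-order correlation factors θ_1..θ_lamda: X_u = f_u / (Σf + w·Σθ) for u = 1..20 and X_u = w·θ_(u-20) / (Σf + w·Σθ) for u = 21..20+lamda. A minimal sketch follows; the names `theta` and `type1_pseaac`, the fraction scale (the module may report percentages), and the toy property are assumptions.

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def theta(seq: str, k: int, props: list) -> float:
    # mean over all pairs k apart of the mean squared property difference
    n = len(seq)
    total = 0.0
    for i in range(n - k):
        a, b = seq[i], seq[i + k]
        total += sum((p[b] - p[a]) ** 2 for p in props) / len(props)
    return total / (n - k)


def type1_pseaac(seq: str, lamda: int, weight: float, props: list) -> list:
    """Hypothetical sketch of type I PseAAC: 20 + lamda values as fractions."""
    n = len(seq)
    if not 0 <= lamda < n:
        raise ValueError("lamda must satisfy 0 <= lamda < len(seq)")
    freqs = [seq.count(aa) / n for aa in AMINO_ACIDS]
    thetas = [theta(seq, k, props) for k in range(1, lamda + 1)]
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


# hypothetical toy property: position of each residue in AMINO_ACIDS
toy = {aa: float(i) for i, aa in enumerate(AMINO_ACIDS)}
vec = type1_pseaac("ACDEFGHIK", lamda=3, weight=0.05, props=[toy])
print(len(vec))  # 23 = 20 + 3
```

With lamda = 0 the sequence-order part is empty and the result reduces to the 20-D composition, matching note (3).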

PseudoAAC.GetPseudoAAC1(ProteinSequence, lamda=30, weight=0.05, AAP=[])
Compute the first 20 type I pseudo-amino acid composition descriptors based on the given
properties.

PseudoAAC.GetPseudoAAC2(ProteinSequence, lamda=30, weight=0.05, AAP=[])
Compute the last lamda type I pseudo-amino acid composition descriptors based on the given
properties.

PseudoAAC.GetSequenceOrderCorrelationFactor(ProteinSequence, k=1, AAP=[])
Compute the sequence-order correlation factor with gap k based on
the given properties.
Usage:
result=GetSequenceOrderCorrelationFactor(protein,k,AAP)
Input: protein is a pure protein sequence.
k is the gap.
AAP is a list of properties, each of which is a dict.
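The sequence-order correlation factor for gap k averages the correlation function over all L - k residue pairs that are k positions apart: θ_k = (1/(L - k)) Σ_{i=1}^{L-k} Θ(R_i, R_(i+k)), following [1]. The sketch below illustrates that definition; the function names and the toy property are hypothetical. It also shows why k (and hence lamda) must be smaller than the sequence length: otherwise there are no pairs to average.

```python
def correlation_function(ri: str, rj: str, properties: list) -> float:
    # Theta(Ri, Rj): mean squared difference over the given properties
    return sum((p[rj] - p[ri]) ** 2 for p in properties) / len(properties)


def sequence_order_factor(sequence: str, k: int, properties: list) -> float:
    """Hypothetical sketch of theta_k: mean Theta over residue pairs k apart."""
    n = len(sequence)
    if not 0 < k < n:
        raise ValueError("gap k must satisfy 0 < k < len(sequence)")
    pairs = [(sequence[i], sequence[i + k]) for i in range(n - k)]
    return sum(correlation_function(a, b, properties) for a, b in pairs) / len(pairs)


toy = {"A": 0.0, "C": 1.0, "D": 3.0}  # hypothetical property values
print(sequence_order_factor("ACD", 1, [toy]))  # pairs (A,C), (C,D): (1 + 4) / 2 = 2.5
```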

PseudoAAC.GetSequenceOrderCorrelationFactorForAPAAC(ProteinSequence, k=1)
Compute the sequence-order correlation factors with gap k based on
[_Hydrophobicity,_hydrophilicity] for APAAC (type II PseAAC).
Usage:
result=GetSequenceOrderCorrelationFactorForAPAAC(protein,k)
Input: protein is a pure protein sequence.
k is the gap.
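Unlike the type I factor, the type II factors use products of property values rather than squared differences. For gap k, Chou [4] defines τ_(2k-1) = (1/(L - k)) Σ H1(R_i)·H1(R_(i+k)) for hydrophobicity and τ_(2k) likewise for hydrophilicity H2, with both scales normalized beforehand. A sketch under that definition; `apaac_factors` and the toy values are hypothetical, and the [hydrophobicity, hydrophilicity] return order is assumed from the description above.

```python
def apaac_factors(sequence: str, k: int, h1: dict, h2: dict) -> tuple:
    """Hypothetical sketch: (tau_{2k-1}, tau_{2k}) for gap k.

    h1 / h2 map one-letter codes to normalized hydrophobicity /
    hydrophilicity values.
    """
    n = len(sequence)
    if not 0 < k < n:
        raise ValueError("gap k must satisfy 0 < k < len(sequence)")
    pairs = [(sequence[i], sequence[i + k]) for i in range(n - k)]
    tau_hydrophobic = sum(h1[a] * h1[b] for a, b in pairs) / len(pairs)
    tau_hydrophilic = sum(h2[a] * h2[b] for a, b in pairs) / len(pairs)
    return tau_hydrophobic, tau_hydrophilic


h1 = {"A": 1.0, "C": -1.0, "D": 2.0}  # hypothetical toy values
h2 = {"A": 0.5, "C": 0.5, "D": -1.0}
print(apaac_factors("ACD", 1, h1, h2))  # (-1.5, -0.125)
```

Because products of centered values can be negative, these factors may be negative, unlike the type I factor.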

PseudoAAC.NormalizeEachAAP(AAP)
Center and standardize the values of an amino acid index (property)
before the calculation.
Usage:
result=NormalizeEachAAP(AAP)
Input: AAP is a dict containing the property values of the 20 amino acids.
Output: result is a dict containing the normalized property values.
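Chou [1] standardizes each raw index H0 as H(i) = (H0(i) - mean) / sqrt(Σ (H0(i) - mean)² / 20), where the mean and the sums run over the 20 amino acids. A sketch of that step, assuming the module uses the same population (divide-by-20) standard deviation; the name `normalize_property` and the toy property are hypothetical.

```python
import math

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def normalize_property(raw: dict) -> dict:
    """Hypothetical sketch: center and standardize one amino acid property.

    Uses the mean and the population standard deviation over the 20
    amino acids, following Chou's PseAAC definition.
    """
    if set(raw) != set(AMINO_ACIDS):
        raise ValueError("expected values for exactly the 20 standard amino acids")
    mean = sum(raw.values()) / 20
    std = math.sqrt(sum((v - mean) ** 2 for v in raw.values()) / 20)
    return {aa: (v - mean) / std for aa, v in raw.items()}


# hypothetical toy property: position of each residue in AMINO_ACIDS
toy = {aa: float(i) for i, aa in enumerate(AMINO_ACIDS)}
norm = normalize_property(toy)
```

After this step every property has zero mean and unit variance across the 20 amino acids, so properties measured on different scales contribute comparably to the correlation functions.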