# PseudoAAC module

Instead of using the conventional 20-D amino acid composition to represent a protein sample, Prof. Kuo-Chen Chou proposed the pseudo amino acid (PseAA) composition in order to include sequence-order information. Based on the concept of Chou’s pseudo amino acid composition, the PseAA server was designed in a flexible way, allowing users to generate various kinds of pseudo amino acid composition for a given protein sequence by selecting different parameters and their combinations. This module computes two types of PseAA descriptors: Type I and Type II.

You can freely use and distribute it. If you have any problems, please contact us.

References:

[1]: Kuo-Chen Chou. Prediction of Protein Cellular Attributes Using Pseudo-Amino Acid Composition. PROTEINS: Structure, Function, and Genetics, 2001, 43: 246-255.

[4]: Kuo-Chen Chou. Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes. Bioinformatics, 2005, 21: 10-19.

Authors: Zhijiang Yao and Dongsheng Cao.

Date: 2016.06.04

PseudoAAC.GetAAComposition(ProteinSequence)[source]

Calculate the amino acid composition of a given protein sequence.

Usage:

result=GetAAComposition(protein)

Input: protein is a pure protein sequence.

Output: result is a dict containing the amino acid composition.
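As a rough illustration of what such a composition looks like, the sketch below counts each of the 20 standard residues and reports it as a percentage of the sequence length. The name `aa_composition` and the percentage scaling are assumptions made for this example; they are not taken from the module's source.

```python
# Hypothetical helper for illustration; not part of the PseudoAAC module.
AA_LETTERS = "ARNDCEQGHILKMFPSTWYV"  # the 20 standard amino acids


def aa_composition(sequence):
    """Return {residue: percentage of the sequence} for the 20 standard residues."""
    if not sequence:
        raise ValueError("sequence must be non-empty")
    length = len(sequence)
    return {aa: 100.0 * sequence.count(aa) / length for aa in AA_LETTERS}


composition = aa_composition("ACDAAC")
print(composition["A"])  # 3 of 6 residues are A, so 50.0
```

Residues that do not occur in the sequence still appear in the dict, with a value of 0.0, so the result always has 20 entries.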

PseudoAAC.GetAPseudoAAC(ProteinSequence, lamda=30, weight=0.5)[source]

Compute all type II pseudo amino acid composition descriptors. Note that the number of descriptors depends strongly on the lamda value. Because type II uses two properties (hydrophobicity and hydrophilicity, see GetSequenceOrderCorrelationFactorForAPAAC), each rank contributes two correlation factors: if lamda = 20, we obtain 20 + 2*20 = 60 descriptors. The magnitude of the values depends on both lamda and weight.

Usage:

result=GetAPseudoAAC(protein,lamda,weight)

Input: protein is a pure protein sequence.

lamda reflects the rank of correlation and is a non-negative integer, such as 15. Note that (1) lamda should NOT be larger than the length of the input protein sequence; (2) lamda must be a non-negative integer, such as 0, 1, 2, ...; (3) when lamda = 0, the output of the PseAA server is the 20-D amino acid composition.

weight is designed for users to put weight on the additional PseAA components with respect to the conventional AA components. The user can select any value in the range from 0.05 to 0.7 for the weight factor.

PseudoAAC.GetAPseudoAAC1(ProteinSequence, lamda=30, weight=0.5)[source]

Compute the first 20 type II pseudo amino acid composition descriptors based on

PseudoAAC.GetAPseudoAAC2(ProteinSequence, lamda=30, weight=0.5)[source]

Compute the last 2*lamda type II pseudo amino acid composition descriptors based on

PseudoAAC.GetCorrelationFunction(Ri='S', Rj='D', AAP=[])[source]

Compute the correlation between two given amino acids using the given properties.

Usage:

result=GetCorrelationFunction(Ri,Rj,AAP)

Input: Ri and Rj are the two amino acids to compare.

AAP is a list of properties, each of which is a dict.
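The correlation between two residues can be sketched as the mean squared difference of their property values, which is how Chou's type I formulation defines it. The function name `correlation` and the toy property values below are hypothetical, and the sketch assumes each property dict has already been normalized.

```python
# Illustrative sketch only: `correlation` is a hypothetical stand-in for the
# idea behind GetCorrelationFunction, not the module's actual code.
def correlation(ri, rj, properties):
    """Mean squared difference between two residues over all properties."""
    if not properties:
        raise ValueError("at least one property dict is required")
    total = sum((prop[ri] - prop[rj]) ** 2 for prop in properties)
    return total / len(properties)


# Made-up property values for two residues (assumed already normalized).
toy_props = [{"S": 0.5, "D": -0.5}, {"S": 1.0, "D": 3.0}]
theta = correlation("S", "D", toy_props)
print(theta)  # (1.0 + 4.0) / 2 = 2.5
```

The measure is symmetric in its two residues and is zero when a residue is compared with itself.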

PseudoAAC.GetPseudoAAC(ProteinSequence, lamda=30, weight=0.05, AAP=[])[source]

Compute all type I pseudo amino acid composition descriptors based on the given properties. Note that the number of descriptors depends strongly on the lamda value: if lamda = 20, we obtain 20 + 20 = 40 PAAC descriptors. The magnitude of the values depends on both lamda and weight. You must specify at least one property in AAP.

Usage:

result=GetPseudoAAC(protein,lamda,weight,AAP)

Input: protein is a pure protein sequence.

lamda reflects the rank of correlation and is a non-negative integer, such as 15. Note that (1) lamda should NOT be larger than the length of the input protein sequence; (2) lamda must be a non-negative integer, such as 0, 1, 2, ...; (3) when lamda = 0, the output of the PseAA server is the 20-D amino acid composition.

weight is designed for users to put weight on the additional PseAA components with respect to the conventional AA components. The user can select any value in the range from 0.05 to 0.7 for the weight factor.

AAP is a list of properties, each of which is a dict.
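To make the roles of lamda and weight concrete, here is a minimal sketch of a type I computation following Chou's published definition: 20 composition terms followed by lamda weighted correlation terms, all divided by a common denominator. The names `theta` and `pseudo_aac`, the toy property, the use of fractions rather than percentages, and the strict check `lamda < len(sequence)` are assumptions of this example; the module's own scaling and rounding may differ.

```python
# Hypothetical sketch of type I PseAAC; not the module's implementation.
AA_LETTERS = "ARNDCEQGHILKMFPSTWYV"


def theta(sequence, k, properties):
    """Rank-k sequence-order correlation factor (gap of k residues)."""
    n = len(sequence)
    total = 0.0
    for i in range(n - k):
        ri, rj = sequence[i], sequence[i + k]
        total += sum((p[ri] - p[rj]) ** 2 for p in properties) / len(properties)
    return total / (n - k)


def pseudo_aac(sequence, lamda, weight, properties):
    """Return a list of 20 + lamda type I PseAAC components."""
    if not 0 <= lamda < len(sequence):
        raise ValueError("lamda must satisfy 0 <= lamda < len(sequence)")
    freqs = [sequence.count(aa) / len(sequence) for aa in AA_LETTERS]
    thetas = [theta(sequence, k, properties) for k in range(1, lamda + 1)]
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


# Made-up property: one arbitrary value per residue, for demonstration only.
toy_property = {aa: float(i) for i, aa in enumerate(AA_LETTERS)}
features = pseudo_aac("ACDEFGHIKL", lamda=3, weight=0.05, properties=[toy_property])
print(len(features))  # 20 + 3 = 23
```

With lamda = 0 the correlation terms vanish and the result reduces to the plain 20-D amino acid composition, matching note (3) above.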

PseudoAAC.GetPseudoAAC1(ProteinSequence, lamda=30, weight=0.05, AAP=[])[source]

Compute the first 20 type I pseudo amino acid composition descriptors based on the given properties.

PseudoAAC.GetPseudoAAC2(ProteinSequence, lamda=30, weight=0.05, AAP=[])[source]

Compute the last lamda type I pseudo amino acid composition descriptors based on the given properties.

PseudoAAC.GetSequenceOrderCorrelationFactor(ProteinSequence, k=1, AAP=[])[source]

Compute the sequence-order correlation factor with gap equal to k based on the given properties.

Usage:

result=GetSequenceOrderCorrelationFactor(protein,k,AAP)

Input: protein is a pure protein sequence.

k is the gap.

AAP is a list of properties, each of which is a dict.
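For reference, the correlation factor with gap k is usually written as follows in Chou's type I formulation. Here N is the sequence length, R_i is the residue at position i, n is the number of properties in AAP, and H_m is the m-th normalized property; these symbol names are chosen for this note and are not taken from the module's source.

```latex
\theta_k = \frac{1}{N-k} \sum_{i=1}^{N-k} \Theta(R_i, R_{i+k}),
\qquad
\Theta(R_i, R_j) = \frac{1}{n} \sum_{m=1}^{n} \bigl[ H_m(R_i) - H_m(R_j) \bigr]^2
```

The sum runs over the N - k residue pairs that are exactly k positions apart, which is why k has to be smaller than the sequence length.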

PseudoAAC.GetSequenceOrderCorrelationFactorForAPAAC(ProteinSequence, k=1)[source]

Compute the sequence-order correlation factor with gap equal to k based on [_Hydrophobicity,_hydrophilicity] for APAAC (type II PseAAC).

Usage:

result=GetSequenceOrderCorrelationFactorForAPAAC(protein,k)

Input: protein is a pure protein sequence.

k is the gap.
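In the amphiphilic (type II) formulation, the correlation for each property is the average product of the property values of residues k positions apart, giving one factor per property. The sketch below illustrates that idea; the name `apaac_correlation`, the two-element return value, and the toy property values are assumptions for this example only.

```python
# Hypothetical sketch of the type II (amphiphilic) correlation factors.
def apaac_correlation(sequence, k, hydrophobicity, hydrophilicity):
    """Return [hydrophobicity factor, hydrophilicity factor] for gap k."""
    n = len(sequence)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < len(sequence)")
    pairs = [(sequence[i], sequence[i + k]) for i in range(n - k)]
    factors = []
    for prop in (hydrophobicity, hydrophilicity):
        # Average product of property values over all pairs k apart.
        factors.append(sum(prop[a] * prop[b] for a, b in pairs) / (n - k))
    return factors


# Made-up values for a two-letter alphabet (assumed already normalized).
toy_hydrophobicity = {"A": 1.0, "C": -1.0}
toy_hydrophilicity = {"A": 0.5, "C": 2.0}
tau = apaac_correlation("ACA", 1, toy_hydrophobicity, toy_hydrophilicity)
print(tau)  # [-1.0, 1.0]
```

Unlike the squared difference used for type I, a product can be negative, so these factors carry a sign.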

PseudoAAC.NormalizeEachAAP(AAP)[source]

All of the amino acid indices are centralized and standardized before the calculation.

Usage:

result=NormalizeEachAAP(AAP)

Input: AAP is a dict containing the property values of the 20 amino acids.

Output: result is a dict containing the normalized properties.
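Centralizing and standardizing a property amounts to subtracting the mean and dividing by the standard deviation across residues. A minimal sketch is shown below; the name `normalize_property` is hypothetical, and the use of the sample standard deviation (`statistics.stdev`) rather than the population one is an assumption, since the text above does not say which is used.

```python
import statistics


# Hypothetical sketch of the normalization step; not the module's code.
def normalize_property(aap):
    """Return a dict of z-scores: (value - mean) / standard deviation."""
    values = list(aap.values())
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation (assumption)
    if std == 0:
        raise ValueError("property values must not all be identical")
    return {aa: (value - mean) / std for aa, value in aap.items()}


# Made-up values for three residues; a real property would cover all 20.
toy = {"A": 1.0, "C": 2.0, "D": 3.0}
normalized = normalize_property(toy)
print(normalized)  # {'A': -1.0, 'C': 0.0, 'D': 1.0}
```

After this step every property has zero mean, so properties measured on different scales contribute comparably to the correlation terms.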