Overview
Building a powerful predictive model with a machine learning library such as scikit-learn depends heavily on how the molecules under investigation (small molecules, proteins, DNA, and even their complex interactions) are represented by descriptors. PyBioMed is a feature-rich package for characterizing complex biological molecules and interaction samples, such as chemicals, proteins, DNA, and their interactions. It calculates nine types of features: chemical descriptors or molecular fingerprints; structural and physicochemical features of proteins and peptides from amino acid sequences; composition and physicochemical features of DNA from primary sequences; and chemical-chemical, chemical-protein, chemical-DNA, protein-protein, protein-DNA, and DNA-DNA interaction features. We hope that the package can be used to explore questions concerning the structures, functions, and interactions of various molecular data in the context of chemoinformatics, bioinformatics, and systems biology. PyBioMed is developed by the CBDD Group (Computational Biology & Drug Design Group), Xiangya School of Pharmaceutical Sciences, Central South University.
Who uses PyBioMed?
Researchers from different biomedical fields can use PyBioMed to analyze and represent the complex molecular data they are investigating. The package is helpful for exploring questions concerning the structures, functions, and interactions of molecular data in the context of chemoinformatics, bioinformatics, and systems biology.
Motivation
PyBioMed is intended to provide tools for:
- Pretreating molecules, protein sequences, and DNA sequences
- Calculating chemical descriptors or molecular fingerprints from molecular structures
- Calculating structural and physicochemical features of proteins and peptides from amino acid sequences
- Calculating composition and physicochemical features of DNA from primary sequences
- Calculating interaction features, including chemical-chemical, chemical-protein, chemical-DNA, protein-protein, protein-DNA, and DNA-DNA interaction features
- Retrieving molecular structures, protein sequences, and DNA sequences from the Internet by molecular ID, protein ID, and DNA ID
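To make the idea of a sequence-derived descriptor concrete, the sketch below computes two simple composition features using only the Python standard library: the percentage of each standard amino acid in a protein sequence, and the frequency of overlapping k-mers in a DNA sequence. It is an illustration of the general technique, not PyBioMed's own API; the function names `amino_acid_composition` and `kmer_composition` and the example sequences are hypothetical.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Return the percentage of each standard amino acid in a protein sequence.

    Hypothetical helper for illustration; not part of PyBioMed.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


def kmer_composition(dna, k=2):
    """Return the frequency of every overlapping k-mer over the alphabet ACGT.

    Hypothetical helper for illustration; not part of PyBioMed.
    """
    dna = dna.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(dna[i:i + k] for i in range(len(dna) - k + 1))
    total = sum(counts[m] for m in kmers)
    if total == 0:
        raise ValueError("sequence is too short or has no valid k-mers")
    return {m: counts[m] / total for m in kmers}


# Made-up example sequences, used only to exercise the functions.
protein_features = amino_acid_composition("MKTAYIAKQR")
dna_features = kmer_composition("GACTGAACTG", k=2)
print(len(protein_features), len(dna_features))  # 20 16
```

Each function returns a fixed-length mapping (20 entries for proteins, 4^k for DNA), so sequences of different lengths map to vectors of the same dimension, which is what a downstream learning algorithm typically expects.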
Feature overview
The table below lists the descriptors, and the number of each, that PyBioMed can calculate in its four modules: PyMolecule, PyProtein, PyDNA, and PyInteraction. The PyMolecule module calculates 14 types of molecular descriptors and 18 types of molecular fingerprints. The PyProtein module calculates 14 types of protein descriptors. The PyDNA module calculates 14 types of DNA descriptors; the counts shown for PyDNA in the table correspond to the parameters ‘all_property = True, lamada=2, w=0.05’. The PyInteraction module calculates three types of descriptors.
Types | Features | Description |
---|---|---|
PyMolecule | | |
PyProtein | | |
PyDNA | | |
PyInteraction | | |
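As a rough illustration of what an interaction feature can look like, the sketch below combines two per-molecule descriptor vectors into a single vector for the pair. It shows two common constructions: plain concatenation, and an order-independent combination of element-wise sums and products. Both are assumptions made for illustration and are not taken from the PyInteraction module; the function names and the example vectors are hypothetical.

```python
def concat_interaction(fa, fb):
    """Concatenate two descriptor vectors: F = [Fa, Fb].

    The result depends on argument order, which suits pairs of different
    kinds (for example a chemical and a protein). Hypothetical helper.
    """
    return list(fa) + list(fb)


def symmetric_interaction(fa, fb):
    """Combine two equal-length vectors as F = [Fa + Fb, Fa * Fb].

    Sums and products are element-wise, so swapping the arguments gives
    the same result. Hypothetical helper.
    """
    if len(fa) != len(fb):
        raise ValueError("descriptor vectors must have the same length")
    sums = [a + b for a, b in zip(fa, fb)]
    prods = [a * b for a, b in zip(fa, fb)]
    return sums + prods


# Made-up descriptor values, used only to exercise the functions.
chem = [1.0, 2.0]
prot = [3.0, 4.0]
pair_concat = concat_interaction(chem, prot)
pair_symmetric = symmetric_interaction(chem, prot)
print(pair_concat)
print(pair_symmetric)
```

The symmetric form is useful when both members of the pair are the same kind of molecule and the pair has no natural order, since it avoids having to present every pair to a model twice.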
The Python programming language
Python is a powerful programming language that allows simple and flexible representations of biochemical molecules and clear, concise expressions of bioinformatics algorithms. Python has a vibrant and growing ecosystem of packages, such as RDKit and Pybel, that PyBioMed builds on to provide more features. Python is also an excellent “glue” language for combining software written in other languages, which allows reuse of legacy code and engineering of high-performance algorithms. Equally important, Python is free, well supported, and a joy to use. To make full use of PyBioMed, you will want to know how to write basic programs in Python. Among the many guides to Python, we recommend the documentation at http://www.python.org.