Wednesday, May 2, 2012

Pharmaceutical bioinformatics

Bioinformatics and structure-aided drug design are really part of the same continuum. Bioinformatics offers a means to get to a structure through sequence, while structure-aided drug design offers a means to get to a drug through structure. We plan to combine innovative computational techniques with biochemical and structural expertise to bring bioinformatics and structure-aided drug design even closer together. In particular, we intend to blend computational chemistry with computational biology to create software that will aid protein chemists in understanding, evaluating and predicting the structure, function and activity of medically and industrially important proteins. My laboratory is currently involved in three "bioinformatics" projects: (1) the development of novel methods to identify remote sequence/structure relationships; (2) the creation of a compact, relational database with advanced bioinformatics functionality; and (3) the development of novel methods for predicting and evaluating protein secondary and tertiary structure.

Molecular modeling:

A technique for the investigation of molecular structures and properties using computational chemistry and graphical visualization techniques in order to provide a plausible three-dimensional representation under a given set of circumstances. (IUPAC Medicinal Chemistry)

in silico: Literally "in the computer", as contrasted with "in vitro" (in glass) or "in vivo" (in life). In silico methods can be used to screen out compounds which are not druggable.

Mapping and modeling networks and pathways

The experimental task of mapping genetic regulatory networks using genetic footprinting and [yeast] two-hybrid techniques is well underway, and the kinetics of these networks is being generated at an astounding rate. ... If the promise of the genome projects and the structural genomics effort is to be fully realized, then predictive simulation methods must be developed to make sense of this emerging experimental data.

There are three bottlenecks in the numerical analysis of biochemical reaction networks. The first is the multiple time scales involved. Since the time between biochemical reactions decreases exponentially with the total probability of a reaction per unit time, the number of computational steps to simulate a unit of biological time increases roughly exponentially as reactions are added to the system or rate constants are increased. The second bottleneck derives from the necessity to collect sufficient statistics from many runs of the Monte Carlo simulation to predict the phenomenon of interest. The third bottleneck is a practical one of model building and testing: hypothesis exploration, sensitivity analyses, and back calculations will also be computationally intensive.
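The first two bottlenecks can be made concrete with a small stochastic simulation. The sketch below is a minimal Gillespie-style Monte Carlo run of a hypothetical birth-death process (production at rate k_prod, degradation at rate k_deg per molecule); the function name, reactions and rate values are assumptions chosen for illustration and are not taken from the text above. In this formulation the waiting time to the next reaction is drawn from an exponential distribution whose rate is the total reaction probability per unit time, so faster or more numerous reactions mean more steps per unit of simulated time, and many independent runs are needed to gather statistics.

```python
import random


def simulate_birth_death(k_prod, k_deg, x0, t_end, seed=0):
    """Gillespie-style simulation of 0 -> X (rate k_prod) and X -> 0 (rate k_deg * x).

    Returns the final copy number and the number of reaction events simulated.
    """
    rng = random.Random(seed)
    t, x, steps = 0.0, x0, 0
    while True:
        a_prod = k_prod            # propensity of production
        a_deg = k_deg * x          # propensity of degradation
        a0 = a_prod + a_deg        # total reaction probability per unit time
        if a0 <= 0.0:
            break                  # no reaction can fire any more
        t += rng.expovariate(a0)   # waiting time to next event, mean 1 / a0
        if t > t_end:
            break
        # Choose which reaction fires, in proportion to its propensity.
        if x == 0 or rng.random() * a0 < a_prod:
            x += 1
        else:
            x -= 1
        steps += 1
    return x, steps


if __name__ == "__main__":
    # Illustrative rates only: larger rates give more events per unit time.
    for k in (1.0, 10.0, 100.0):
        final_x, n_steps = simulate_birth_death(k, 0.1, 0, 50.0, seed=1)
        print(f"k_prod={k:6.1f}  final copies={final_x:5d}  events={n_steps}")
```

Repeating the run with different seeds and averaging the results is what the second bottleneck refers to: each run is only one sample of the stochastic process.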

Cheminformatics definitions

Mixing of information technology and management to transform data into information and information into knowledge for the intended purpose of making better decisions faster in the arena of drug lead identification and optimization. In chemoinformatics there are really only two [primary] questions: (1) what to test next and (2) what to make next. The main processes within drug discovery are lead identification, where a lead is something that has activity in the low micromolar range, and lead optimization, which is the process of transforming a lead into a drug candidate.

Increasingly incorporates "compound registration into databases, including library enumeration; access to primary and secondary scientific literature; QSAR (Quantitative Structure-Activity Relationships) and similar tools for relating activity to structure; physical and chemical property calculations; chemical structure and property databases; chemical library design and analysis; structure-based design and statistical methods. Because these techniques have traditionally been considered the realms of scientists from different disciplines, differences in computer systems and terminology provide a barrier to effective communication. This is probably the single most challenging problem that chemoinformatics must solve."


Many people view chemoinformatics as an extension of chemical information, which is a well-established concept covering many areas that employ chemical structures, data storage and computational methods, such as compound registration databases, online chemical literature, SAR analysis and molecule-property calculation.

Protein bioinformatics

Structural bioinformatics

Involves the process of determining a protein's three-dimensional structure using comparative primary sequence alignment, secondary and tertiary structure prediction methods, homology modeling, and crystallographic diffraction pattern analyses. Currently, there is no reliable de novo predictive method for protein 3D structure determination. Over the past half-century, protein structure has been determined by purifying a protein, crystallizing it, then bombarding it with X-rays. The X-ray diffraction pattern from the bombardment is recorded electronically and analyzed using software that creates a rough draft of the 3D structure. Biological scientists and crystallographers then tweak and manipulate the rough draft considerably. The resulting spatial coordinate file can be examined using structure modeling software to study the gross and subtle features of the protein's structure.

GEO

Microarray technology has been used extensively by the scientific community in recent years, and it has generated a large volume of gene expression data. These data are scattered and not easily available for public use. To ease access to them, the National Center for Biotechnology Information (NCBI) created the Gene Expression Omnibus (GEO), a data repository that holds gene expression data from varied sources.
Microarray probe design parameters
For 25-35 mers
Parameter                 Minimum Value   Maximum Value   Default Value   Unit
Probe Length              10              99              30              bases
Probe Length tolerance    0               15              3
Probe Target Tm           40              99              63              °C
Probe Tm Tolerance (+)    0.1             99              5
Hairpin Max ΔG            0.1             99.9            4               kcal/mol
Self Dimer ΔG             0.1             99.9            7               kcal/mol
Run/Repeat                2               99              4               bases
For 35-45 mers
Parameter                 Minimum Value   Maximum Value   Default Value   Unit
Probe Length              10              99              40              bases
Probe Length tolerance    0               15              3
Probe Target Tm           40              99              70              °C
Probe Tm Tolerance (+)    0.1             99              5
Hairpin Max ΔG            0.1             99.9            6               kcal/mol
Self Dimer ΔG             0.1             99.9            8               kcal/mol
Run/Repeat                2               99              5               bases
For 65-75 mers
Parameter                         Minimum Value   Maximum Value   Default Value   Unit
Probe Length                      10              99              70              bases
Probe Length tolerance            0               15              3
Probe Target Tm                   40              99              75              °C
Probe Tm Tolerance (+/- above)    0.1             99              5
Hairpin Max ΔG                    0.1             99.9            6               kcal/mol
Self Dimer ΔG                     0.1             99.9            8               kcal/mol
Run/Repeat                        2               99              6               bases

Other Parameters

  • Probe Location
    1. 3' end bias: By default, the oligos chosen should lie towards the 3' end of the gene.
    2. By default, the oligos should be designed within 999 bases of the 3' end. The range can be set from 0 to 1500 bases.
  • The oligos should be free of cross homology (i.e., they should be BLAST searched against the appropriate genome category).
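As a rough illustration of how the design parameters above could be applied, the following sketch screens a candidate probe against the default values listed for 25-35 mers. Several things here are assumptions rather than part of the original parameter list: the melting temperature uses a simple GC-content estimate instead of whatever model a real design tool applies, "Run/Repeat" is interpreted as the longest allowed run of a single base, the hairpin, self-dimer and cross-homology checks are left out, and all names are hypothetical.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class ProbeParams:
    length: int        # default probe length (bases)
    length_tol: int    # allowed deviation from the default length
    target_tm: float   # target melting temperature (deg C)
    tm_tol: float      # allowed deviation from the target Tm
    max_run: int       # assumed meaning of Run/Repeat: longest single-base run


# Default values from the 25-35 mer table.
PARAMS_25_35 = ProbeParams(length=30, length_tol=3, target_tm=63.0,
                           tm_tol=5.0, max_run=4)


def rough_tm(seq):
    """Crude GC-content Tm estimate (assumption): 64.9 + 41 * (GC - 16.4) / N."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def longest_run(seq):
    """Length of the longest stretch of one repeated base."""
    return max(len(list(group)) for _, group in groupby(seq.upper()))


def passes(seq, params):
    """True if the probe meets the length, Tm and run constraints."""
    if not seq:
        return False
    return (abs(len(seq) - params.length) <= params.length_tol
            and abs(rough_tm(seq) - params.target_tm) <= params.tm_tol
            and longest_run(seq) <= params.max_run)


if __name__ == "__main__":
    candidate = "ATGC" * 7 + "AT"   # made-up 30-base sequence
    print(candidate, round(rough_tm(candidate), 2), passes(candidate, PARAMS_25_35))
```

A production pipeline would add thermodynamic hairpin and self-dimer checks and a BLAST search for cross homology, as the parameter list requires.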
DNA Microarray with Array Designer: Array Designer is software for designing highly specific oligos for expression and SNP genotyping microarray experiments.

Applications of Microarrays

Gene discovery: DNA microarray technology helps to identify new genes and to learn about their function and expression levels under different conditions.
Disease diagnosis: DNA microarray technology helps researchers learn more about different diseases such as heart disease, mental illness and infectious disease, and especially about cancer. Until recently, different types of cancer were classified on the basis of the organs in which the tumors develop. With the evolution of microarray technology, researchers will be able to further classify the types of cancer on the basis of the patterns of gene activity in the tumor cells. This will greatly help the pharmaceutical community to develop more effective drugs, as treatment strategies can be targeted directly to the specific type of cancer.
Drug discovery: Microarray technology has extensive application in pharmacogenomics, the study of correlations between therapeutic responses to drugs and the genetic profiles of the patients. Comparative analysis of the genes from a diseased and a normal cell helps to identify the biochemical constitution of the proteins synthesized by the diseased genes. Researchers can use this information to synthesize drugs that combat these proteins and reduce their effect.
Toxicological research: Microarray technology provides a robust platform for studying the impact of toxins on cells and how those effects are passed on to the progeny. Toxicogenomics establishes correlations between responses to toxicants and the changes in the genetic profiles of the cells exposed to such toxicants.

Types of Microarrays

Depending upon the kind of immobilized sample used to construct the arrays and the information obtained, microarray experiments can be categorized in three ways:
1. Microarray expression analysis: In this experimental setup, the cDNA derived from the mRNA of known genes is immobilized. The sample has genes from both the normal and the diseased tissues. Spots with greater intensity are obtained for a gene from the diseased tissue if that gene is overexpressed in the diseased condition. This expression pattern is then compared to the expression pattern of a gene responsible for a disease.
2. Microarray for mutation analysis: For this analysis, the researchers use gDNA. The genes might differ from each other by as little as a single nucleotide base. A single base difference between two sequences is known as a Single Nucleotide Polymorphism (SNP), and detecting such differences is known as SNP detection.
3. Comparative Genomic Hybridization: It is used to identify increases or decreases in important chromosomal fragments harboring genes involved in a disease.
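For the expression analysis case, the comparison of spot intensities between diseased and normal samples is commonly summarized as a log2 ratio per gene. The sketch below shows that calculation under simple assumptions: the gene names and intensity values are invented, and the pseudocount added to avoid division by zero is an illustrative choice, not a prescribed normalization.

```python
import math


def log2_ratios(diseased, normal, pseudocount=1.0):
    """Per-gene log2(diseased / normal) for genes present in both samples.

    Positive values suggest higher expression in the diseased sample,
    negative values suggest lower expression.
    """
    return {
        gene: math.log2((diseased[gene] + pseudocount) / (normal[gene] + pseudocount))
        for gene in diseased
        if gene in normal
    }


if __name__ == "__main__":
    # Hypothetical spot intensities, for illustration only.
    diseased = {"geneA": 799.0, "geneB": 99.0, "geneC": 24.0}
    normal = {"geneA": 99.0, "geneB": 99.0, "geneC": 99.0}
    for gene, ratio in log2_ratios(diseased, normal).items():
        print(f"{gene}: {ratio:+.2f}")
```

Real analyses also correct for background signal and normalize between arrays before ratios are compared.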

Microarray Technique

An array is an orderly arrangement of samples where matching of known and unknown DNA samples is done based on base pairing rules. An array experiment makes use of common assay systems such as microplates or standard blotting membranes. The sample spots are typically less than 200 microns in diameter, and an array usually contains thousands of spots.
Thousands of spotted samples known as probes (with known identity) are immobilized on a solid support (a glass microscope slide, silicon chip or nylon membrane). The spots can be DNA, cDNA, or oligonucleotides. These are used to determine complementary binding of the unknown sequences, thus allowing parallel analysis for gene expression and gene discovery. An experiment with a single DNA chip can provide information on thousands of genes simultaneously. An orderly arrangement of the probes on the support is important, as the location of each spot on the array is used to identify a gene.
Introduction to Microarray

Molecular biology research evolves through the development of the technologies used to carry it out. Traditional methods cannot practically study a large number of genes at once. DNA microarray is one technology that enables researchers to investigate and address issues which were once thought to be untraceable. One can analyze the expression of many genes in a single reaction quickly and efficiently. DNA microarray technology has empowered the scientific community to understand the fundamental aspects underlying the growth and development of life, as well as to explore the genetic causes of anomalies occurring in the functioning of the human body.
A typical microarray experiment involves the hybridization of an mRNA molecule to the DNA template from which it originated. Many DNA samples are used to construct an array. The amount of mRNA bound to each site on the array indicates the expression level of the various genes. The number of genes measured may run into the thousands. All the data is collected and a profile is generated for gene expression in the cell.

Thursday, April 26, 2012


AUTODOCK:


AutoDock is a suite of automated docking tools. It is designed to predict how small molecules, such as substrates or drug candidates, bind to a receptor of known 3D structure.
Current distributions of AutoDock consist of two generations of software: AutoDock 4 and AutoDock Vina.
AutoDock 4 actually consists of two main programs: autodock performs the docking of the ligand to a set of grids describing the target protein; autogrid pre-calculates these grids.
In addition to using them for docking, the atomic affinity grids can be visualised. This can help, for example, to guide organic synthetic chemists in designing better binders.
AutoDock Vina does not require choosing atom types and pre-calculating grid maps for them. Instead, it calculates the grids internally, for the atom types that are needed, and it does this virtually instantly.
The AutoDock developers also provide a graphical user interface called AutoDockTools, or ADT for short, which among other things helps to set up which bonds will be treated as rotatable in the ligand and to analyze dockings.
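To give a feel for how a Vina docking run is typically launched, here is a small sketch that assembles a command line for the vina executable. The file names, box center and box size are hypothetical placeholders; the receptor and ligand are assumed to have already been prepared in PDBQT format (for example with AutoDockTools), and the vina binary is assumed to be on the PATH. The command is only built here, not executed.

```python
def build_vina_command(receptor, ligand, center, size, out, exhaustiveness=8):
    """Return an argument list for an AutoDock Vina run.

    center and size are (x, y, z) tuples in Angstroms describing the search box.
    """
    cx, cy, cz = center
    sx, sy, sz = size
    return [
        "vina",
        "--receptor", receptor,
        "--ligand", ligand,
        "--center_x", str(cx), "--center_y", str(cy), "--center_z", str(cz),
        "--size_x", str(sx), "--size_y", str(sy), "--size_z", str(sz),
        "--exhaustiveness", str(exhaustiveness),
        "--out", out,
    ]


if __name__ == "__main__":
    # Placeholder inputs: replace with real prepared files and a real binding-site box.
    cmd = build_vina_command("receptor.pdbqt", "ligand.pdbqt",
                             center=(10.0, 12.5, -3.0), size=(20.0, 20.0, 20.0),
                             out="docked.pdbqt")
    print(" ".join(cmd))
    # To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Note that no grid maps are prepared beforehand, in line with the description of Vina above; an AutoDock 4 run would instead need an autogrid step before autodock.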
AutoDock has applications in:
  • X-ray crystallography;
  • structure-based drug design;
  • lead optimization;
  • virtual screening (HTS);
  • combinatorial library design;
  • protein-protein docking;
  • chemical mechanism studies.
AutoDock 4 is free and is available under the GNU General Public License. AutoDock Vina is available under the Apache license, allowing commercial and non-commercial use and redistribution.

 

Bioinformatics Books

Sequence Analysis and General Bioinformatics
 
  • Bioinformatics for Geneticists, Michael Barnes, Ian C Gray (Editors), 2003, John Wiley & Sons
  • Bioinformatics for Dummies, Jean-Michel Claverie, Cedric Notredame, 2003, John Wiley & Sons
  • Mathematics of Genome Analysis, Jerome K. Percus, 2002, Cambridge Univ Press
  • Bioinformatics Computing, Bryan P. Bergeron, 2002, Prentice Hall
  • Evolutionary Computation in Bioinformatics, Gary B. Fogel, David W. Corne (Editors), 2002, Morgan Kaufmann
  • Introduction to Bioinformatics, Arthur M. Lesk, 2002, Oxford University Press
  • Instant Notes in Bioinformatics, D.R. Westhead, J. H. Parish, R.M. Twyman, 2002, Bios Scientific Pub
  • Fundamental Concepts of Bioinformatics, Dan E. Krane, Michael L. Raymer, Elaine Nicpon Marieb, 2002, Benjamin/Cummings
  • Essentials of Genomics and Bioinformatics, C. W. Sensen (Editor), 2002, John Wiley & Sons
  • Current Topics in Computational Molecular Biology (Computational Molecular Biology), Tao Jiang, Ying Xu, Michael Zhang (Editors), 2002, MIT Press
  • Hidden Markov Models for Bioinformatics, Timo Koski, Timo Koskinen, 2001, Kluwer Academic Publishers
  • Bioinformatics: From Genomes to Drugs, Thomas Lengauer (Editor), 2001, John Wiley & Sons
  • Statistical Methods in Bioinformatics: An Introduction (Statistics for Biology and Health), Warren Ewens, Gregory Grant, 2001, Springer Verlag
  • Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, Second Edition, Andreas D. Baxevanis, B. F. Francis Ouellette, 2001, Wiley-Interscience
  • Bioinformatics: The Machine Learning Approach, Second Edition (Adaptive Computation and Machine Learning), Pierre Baldi, Soren Brunak, 2001, MIT Press
  • Introduction to Bioinformatics, Teresa Attwood, David Parry-Smith, 2001, Prentice Hall
  • Bioinformatics: A Primer, Charles Staben, 2001, Jones & Bartlett Pub
  • Data Analysis and Classification for Bioinformatics, Arun Jagota, 2000, AKJ Academics
  • Bioinformatics: A Biologist's Guide to Biocomputing and the Internet, Stuart M. Brown, 2000, Eaton Pub Co
  • Bioinformatics: Sequence, Structure and Databanks: A Practical Approach (The Practical Approach Series, 236), Des Higgins (Editor), Willie Taylor (Editor), 2000, Oxford Univ Press
  • Neural Networks and Genome Informatics, Cathy H. Wu, Jerry W. McLarty, 2000, Elsevier Science
  • Computational Molecular Biology: An Introduction (Wiley Series in Mathematical and Computational Biology), Peter Clote and Rolf Backofen, 2000, John Wiley & Sons
  • Post-Genome Informatics, Minoru Kanehisa, 2000, Oxford Univ Press
  • Mathematical and Computational Biology: Computational Morphogenesis, Hierarchical Complexity, and Digital Evolution, Chrystopher L. Nehaniv, 1999, American Mathematical Society
  • Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison, David Sankoff and Joseph Kruskal (Editors), 1999, Cambridge University Press
  • Bioinformatics Basics: Applications in Biological Science and Medicine, Hooman Rashidi, 1999, CRC Press
  • Bioinformatics: Methods and Protocols (Methods in Molecular Biology, Vol 132), Stephen Misener and Stephen A. Krawetz (Editors), 1999, Humana Press
  • Bioinformatics: Databases and Systems, Stanley Letovsky (Editor), 1999, Kluwer Academic Publishers
  • Computational Molecular Biology, P. Green, 1998, Blackwell Science Inc.
  • Guide to Human Genome Computing, M. J. Bishop (Editor), 1998, Academic Press
  • Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Dan Gusfield, 1997, Cambridge University Press
  • Sequence Data Analysis Guidebook, Simon R. Swindell (Editor), 1997, Humana Press
  • High Performance Computational Methods for Biological Sequence Analysis, Tieng K. Yap, Ophir Frieder, Robert L. Martino, 1996, Kluwer Academic Pub.
  • Computer Methods for Macromolecular Sequence Analysis, Methods in Enzymology, volume 266, Russell F. Doolittle (Editor), 1996, Academic Press
  • DNA and Protein Sequence Analysis: A Practical Approach (Practical Approach Series, No 171), M. J. Bishop and C. J. Rawlings (Editors), 1996, IRL Press
  • Molecular Bioinformatics: Algorithms and Applications, Steffen Schulze-Kremer, 1995, Walter De Gruyter
  • Computer Analysis of Sequence Data, Annette M. Griffin and Hugh G. Griffin (Editors), 1994, Humana Press
  • Artificial Intelligence and Molecular Biology, Lawrence Hunter (Editor), 1993, AAAI Press
  • Sequence Analysis Primer, Michael Gribskov and John Devereux (Editors), 1992, Oxford University Press
  • Mathematical Methods of Analysis of Biopolymer Sequences (Dimacs Series in Discrete Mathematics and Theoretical Computer Science ; Volume 8), S. G. Gindikin, 1992, American Mathematical Society
  • Mathematical Methods for DNA Sequences

Inspirational Software for Biologists

Geneious Pro is a bioinformatics software platform designed to be both powerful and easy to use. Scientists, researchers and students can search, organize and analyze genomic and protein information in a single desktop program that produces publication-ready images.
     
    Bioinformatics Software and Tools
    1. Gene Predictor(ChemGenome 2.0)
    Whole Genome Analysis
    2. Bhageerath
    Predicts native-like structures for small globular proteins
    3. Sanjeevini
    A complete drug design software.
    4. Binding Affinity Prediction of Protein-Ligand Server(BAPPL)
    Computes the binding free energy of a protein-ligand complex.
    5. Binding Affinity Prediction of Protein-Ligand complex containing Zinc Server
    (BAPPL-Z)
    Computes the binding free energy of a metalloprotein-ligand complex containing zinc.
    6. Drug-DNA Interaction Energy (PreDDICTA)
    Calculates the Drug-DNA interaction energy.
    7. ParDOCK - Automated Server for Rigid Docking
    Predicts the binding mode of the ligand in receptor target site.
    8. Active Site Prediction
    9. Automated Version Of Active Site Prediction (AADS)
    Predicts 10 binding sites in a protein target and docks the uploaded ligand molecule at all 10 sites predicted in an automated mode.
    10. DnaDOCK - DNA Ligand Docking
    All-atom energy based Monte Carlo DNA ligand docking
    11. Non Redundant Database of Small Molecules
    Virtual high throughput screening of small molecules and their optimization into lead like candidates.
    12. Lipinski Filters
    Checks whether a drug satisfies Lipinski's rule of five.
    13. DNA Sequence to Structure
    Generates double helical secondary structure of DNA using conformational parameters taken from experimental fiber-diffraction studies.
    14. Hydrogen Addition to Nucleic Acid
    Adds the hydrogen coordinates to the X-ray crystal structures of Nucleic acids
    15. Hydrogen Addition to Protein
    Adds the hydrogen coordinates to the X-ray crystal structures of Proteins.

    16. Gene Evaluator(ChemGenome 1.1)
    Characterizes a DNA sequence as gene or nongene
    17. Protein Structure Generation
    Structure Generation from given dihedrals
    18. Persistence Length
    Filters for Globular Protein Evaluation
    19. Radius of Gyration
    Filters for Globular Protein Evaluation
    20. Hydrophobicity
    Filters for Globular Protein Evaluation
    21. Packing Fraction
    Filters for Globular Protein Evaluation
    22. ProRegIn
    Protein Regularity Index
    23. Protein structure optimizer
    Energy minimizer for proteins
    24. ProSEE
    Scoring function for protein structure evaluation. Calculates the intramolecular energy of a protein with a component-wise breakdown.
    25. Superimpose
    Fits two molecules and calculates the RMSD between them.
    26. Protein Angle Descriptor
    Calculates the angles & dihedral in the main chain of the protein
    27. Wiener Index Calculator
    This tool is useful for calculating Wiener index.
    28. RASPD for Preliminary Screening of Drugs
    This tool is useful for preliminary screening of drug molecules based on Wiener index calculation. This will predict binding energy of drug/target at a preliminary stage.
    29. BGPred (Beta Gamma Turn Predictor)
    The BGPred web server predicts beta and gamma turns.
    30. Volume Calculator
    Calculates the volume of a molecule
    31. Melting Temperature Predictor (for oligonucleotides)
    Predicts the melting temperature of short DNA sequences (up to 70 base pairs) at a user-defined salt concentration within the specified range.
    32. Genome analysis by melting
    Predicts the melting temperature of longer (>70 bases) DNA sequences and also gives the melting profile for the sequence.
    33. Transferrable Partial Atomic Charge Model - up to 4 bonds (TPACM4)
    This tool is used for assignment of partial atomic charge of small molecules.
    34. PROSECSC
    Protein secondary structure prediction.
    35. Gene Predictor (ChemGenome 3.0)
    ChemGenome 3.0 is a gene prediction tool that takes a whole genome sequence or part of the genome of a prokaryote or virus as input and predicts genes, along with their corresponding protein sequences, in all six reading frames.

    36. Bhageerath H
    A homology/ab initio hybrid web server for protein tertiary structure prediction. Starting from sequence, the web server predicts 5 native-like candidate structures for the protein.

    37. pcSM Software
    pcSM: Capturing Native Protein Structures with a Physico-Chemical Metric.
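
The Lipinski filter in the list above is simple enough to sketch directly. Lipinski's rule of five is usually stated as four criteria: molecular weight of at most 500 Da, logP of at most 5, no more than 5 hydrogen-bond donors and no more than 10 hydrogen-bond acceptors. The example below counts violations from descriptor values that are supplied by hand; it does not compute them from a structure, the descriptor values shown are approximate and for illustration only, and nothing here is meant to reproduce how the listed server works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    mol_weight: float   # molecular weight in Da
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors


def lipinski_violations(d):
    """Return the names of the rule-of-five criteria the molecule violates."""
    checks = {
        "mol_weight <= 500": d.mol_weight <= 500.0,
        "logP <= 5": d.logp <= 5.0,
        "H-bond donors <= 5": d.h_donors <= 5,
        "H-bond acceptors <= 10": d.h_acceptors <= 10,
    }
    return [name for name, ok in checks.items() if not ok]


if __name__ == "__main__":
    # Approximate, hand-entered descriptor values (assumption, for illustration).
    aspirin_like = Descriptors(mol_weight=180.2, logp=1.2, h_donors=1, h_acceptors=4)
    bulky = Descriptors(mol_weight=812.0, logp=6.5, h_donors=7, h_acceptors=12)
    print(lipinski_violations(aspirin_like))
    print(lipinski_violations(bulky))
```

In practice the descriptors would come from a cheminformatics toolkit, and a compound is often still considered acceptable with a single violation.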
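
The Wiener index used by two of the tools above is the sum of the shortest-path distances (counted in bonds) between every pair of heavy atoms in a molecule's hydrogen-suppressed graph. A minimal sketch, assuming the molecule is given as a hand-written adjacency list with arbitrary atom labels, could look like this; it is an illustration of the definition, not the listed server's implementation.

```python
from collections import deque


def wiener_index(adjacency):
    """Sum of shortest-path distances over all unordered pairs of atoms.

    adjacency maps each atom label to the labels of its bonded neighbours.
    Pairs with no connecting path are ignored.
    """
    total = 0
    for source in adjacency:
        # Breadth-first search gives bond-count distances from this atom.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            atom = queue.popleft()
            for neighbour in adjacency[atom]:
                if neighbour not in dist:
                    dist[neighbour] = dist[atom] + 1
                    queue.append(neighbour)
        total += sum(dist.values())
    return total // 2   # each pair was counted from both ends


if __name__ == "__main__":
    # Carbon skeletons only (hydrogens suppressed).
    n_butane = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
    isobutane = {1: [2, 3, 4], 2: [1], 3: [1], 4: [1]}
    print(wiener_index(n_butane), wiener_index(isobutane))
```

The more branched isomer gets the smaller value, which is why the index is used as a compact descriptor of molecular shape.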