by ZBH

P-1 : A Retrospective Docking Study of PDE4B Ligands and an Introduction into Methods of Avoiding Some Failures of Current Scoring Functions

Chidochangu Mpamhanga; University of Sheffield, Sheffield, GB
Beining Chen, Iain McLay, Daniel Ormsby, and Mika K. Lindvall, University of Sheffield

In general, fast scoring functions fall short in their ability to determine the relative affinities of ligands for their receptors; nevertheless, this study demonstrates a successful docking and scoring methodology for PDE4B. A series of known PDE4B inhibitors was docked into the PDE4B/pyrazolo[3,4-b]pyridine binding site using LigandFit, a fast shape-matching docking algorithm. This work was done as a preliminary study to verify the suitability of the LigandFit/DockScore protocol for virtual screening in another project, which required a method that could enrich the top 5% of a database by a factor of at least four. An RMSD comparison of the poses generated by LigandFit/DockScore (in virtual screening mode) with the crystallographic poses of 19 inhibitors whose X-ray structures were available revealed a reasonable success rate. However, the main objective was to investigate how effectively five available scoring functions (PMF, JAIN, PLP2, LigScore2 and DockScore) enrich the top-ranked fractions of nine artificial databases constructed by seeding 20 randomly selected inhibitors (pIC50 > 6.5) into 1980 inactive ligands (pIC50 < 5). PMF and JAIN showed high average enrichment factors (greater than 4) in the top 5-10% of the ranked databases. Rank-based consensus scoring was also investigated, and the rational combination of three scoring functions resulted in more robust and generalisable scoring schemes; the consensus scores DPmJ (DockScore, PMF and JAIN) and PPmJ (PLP2, PMF and JAIN) yielded particularly good results. Finally, the behaviour of the scoring functions across different chemotypes (chemical classes) was briefly analysed. This revealed an inherent bias of the docking and scoring method towards the binding mode of the initial crystal structure (PDE4B/pyrazolo[3,4-b]pyridine), which suggests a need for better ways of avoiding the problems of using a rigid receptor in docking studies.
Future work will focus on docking ligands into multiple binding sites of PDE4B. Another lesson learnt from this investigation is that scoring functions can be used, to a limited extent, for lead optimization.
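
The two quantities this abstract relies on, the enrichment factor and a rank-based consensus score, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the enrichment factor uses the common definition (hit rate in the top fraction divided by the hit rate in the whole database), and the consensus is a plain mean of per-function ranks under the assumption that a higher score is better for every scoring function.

```python
from typing import Dict, List, Sequence


def enrichment_factor(ranked_is_active: Sequence[bool], top_fraction: float) -> float:
    """Hit rate in the top fraction divided by the hit rate in the whole database.

    `ranked_is_active` lists the compounds from best-ranked to worst-ranked.
    """
    n_total = len(ranked_is_active)
    n_actives = sum(ranked_is_active)
    n_top = max(1, round(n_total * top_fraction))
    actives_in_top = sum(ranked_is_active[:n_top])
    return (actives_in_top / n_top) / (n_actives / n_total)


def rank_consensus(scores: Dict[str, List[float]]) -> List[float]:
    """Mean rank of each compound across scoring functions (rank 1 = best).

    Assumption: a higher score is better for every function; negate the
    scores of any function where lower is better before calling this.
    """
    n = len(next(iter(scores.values())))
    mean_rank = [0.0] * n
    for values in scores.values():
        order = sorted(range(n), key=lambda i: values[i], reverse=True)
        for rank, idx in enumerate(order, start=1):
            mean_rank[idx] += rank / len(scores)
    return mean_rank


# A database shaped like those in the abstract: 20 actives among 2000 compounds.
# Here 8 actives are assumed (for illustration only) to land in the top 100.
ranking = [True] * 8 + [False] * 92 + [True] * 12 + [False] * 1888
ef_top5 = enrichment_factor(ranking, 0.05)
```

With these made-up numbers the top 5% holds 8 of the 20 actives, giving an enrichment factor of about 8, which would meet the factor-of-four requirement mentioned in the abstract.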


  1. C. M. Venkatachalam, X. Jiang, T. Oldfield, M. Waldman. Journal of Molecular Graphics and Modeling, 21, 2003, 289-30.
  2. D. G. Allen, D.M. Coe, C.M. Cook, M. D. Dowle, C. D. Edlin, J. N.Hamblin, M.R Johnson, P.S. Jones, R. G. Knowles, M. K. Lindvall, C. J. Mitchell, A.J. Redgrave, N. Trivedi, P. Ward, Pyrazolo[3,4-b]pyridine compounds, and their use as phosphodiesterase inhibitors.  PCT Int. Appl.  (2004), 293.
  3. C. P. Mpamhanga, B. Chen, D. L. Ormsby, M. K. Lindvall, I. McLay,  A Retrospective Docking and Scoring Study of PDE4 Ligands and the Consequences of Consensus Scoring. (Under review).



P-3 : Calculating Biases Using Artificial Intelligence in Conjunction with Data Assimilation

Hamse Mussa; Cambridge University, Unilever Centre for Molecular Science Informatics, Cambridge, GB
David J. Lary, NASA, Goddard Space Flight Centre
Robert C. Glen, Cambridge University, Unilever Centre for Molecular Science Informatics

In chemometrics and other parameter estimation areas, models are developed for accurately analysing linear and non-linear multivariate data. In these areas, data are generally the sole constraint for fitting models that are supposed to give a mathematical representation of the underlying process(es) of the observations. In other words, the functional forms of the models are approximated solely from the data. It is therefore crucial that multivariate analysis (MVA) methods take into account any errors in the data.

MVA techniques deal well with random errors; however, tackling biases is a significant issue for them. Unfortunately, measurements and observations are prone to biases, whose detection is often a difficult task in its own right.

In this talk, we present a novel algorithm for predicting biases in observations. The method is based on artificial intelligence algorithms in conjunction with data assimilation [1], a mathematical scheme that blends the mathematical representation of the underlying process of the observation with the observed data, such that the resulting “mixture” is the “best” estimate of the underlying process that is consistent with both the observations and the model predictions (in this case, the MVA model predictions).

The new approach will allow us not only to detect and then remove biases from observations, but will also make it easy to calibrate instruments.

Measurements of ozone (O3) concentrations were used to test the performance of the proposed method. Results showing the performance of the new method are presented and discussed.
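
As a rough illustration of the "blending" step that data assimilation performs, the following Python sketch shows the textbook scalar case, in which a model prediction and an observation are combined with weights set by their error variances. It is an assumption-laden toy, not the algorithm presented in the talk: the function name and all numbers are hypothetical, and no bias estimation is attempted.

```python
from typing import Tuple


def assimilate(model_value: float, model_var: float,
               obs_value: float, obs_var: float) -> Tuple[float, float]:
    """Variance-weighted blend of a model prediction and an observation.

    Returns the blended estimate and its error variance. The less certain
    source (larger variance) receives the smaller weight.
    """
    gain = model_var / (model_var + obs_var)
    analysis = model_value + gain * (obs_value - model_value)
    analysis_var = (1.0 - gain) * model_var
    return analysis, analysis_var


# Made-up example: model and observation are equally uncertain,
# so the blend falls halfway between them.
estimate, variance = assimilate(10.0, 4.0, 12.0, 4.0)
```

In this simple picture, a difference between observation and blended estimate that persists with the same sign over many measurements is the kind of signal a bias-detection scheme would look for.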


  1. B. V. Khattatov, J. C. Gille, L. Lyjak, G. P. Brasseur, V. L. Dvortsov, A. E. Roche, and J. W. Waters, J. Geophys. Res., 104, 18,715 (1999).



P-5 : Storage and Processing of Chemical Information Directly from any Web Browser

Luc Patiny; Ecole Polytechnique Fédérale de Lausanne, Lausanne, CH
Damiane Banfi and Michal Krompiec, Ecole Polytechnique Fédérale de Lausanne

In industry, and even more so at universities, it is very difficult to find information such as NMR spectra or HPLC chromatograms that were acquired five years ago. In this presentation we describe a system we have developed for storing and processing all physical characteristics, chemical structures, chromatograms and spectra in a database. From a technical point of view, this project is based on:

on the server :

  • a java servlet
  • a SQL database (currently MySQL)
  • some shell scripts for the conversion of data coming from all the instruments

on the client :

We will first present the database diagram that we have designed in order to keep all the information in a way that is intuitive for chemists. We will then present the various ways to import chemical information into this database: directly from the web browser, using a home-made Java application (XMLCreator), or by importing an XML file. The latter is the best choice for straightforwardly importing the chemical information coming from the instruments (spectra and chromatograms).

We will then describe our choice of the format used to store the information and to ensure the durability of the project. For instance, all the chromatograms and spectra are converted into the ubiquitous JCAMP format. Queries can be run on all the information; for example, we can search for chemical structures containing a benzene ring, having a boiling point around 150°C and an NMR peak with a chemical shift around 3 ppm. Finally, we will show "Nemo", our JCAMP visualising applet, which allows all the spectra to be processed and analysed very efficiently. One of the main advantages is that there is only one user interface for any kind of experimental data.

The whole system can be used internally (on the intranet) to store "private" chemical information, but it is also used to build a freely accessible database on the internet in which anybody can store any kind of chemical information. In conclusion, in this project we push the features of JavaScript and Java to their limits in order to make outstanding possibilities accessible directly on any computer in the world.



P-7 : Incorporating the Flexibilities of Both the Ligand and the V82F/I84V Drug-Resistant Mutant HIV Protease Target During Docking: Applying the Relaxed Complex Method of Drug Design to HIV-1 Protease

Alex Perryman; University of California at San Diego, La Jolla, CA, US
Jung-Hsin Lin and J. Andrew McCammon, University of California at San Diego

Including the flexibilities of both the drug and the protein target can enhance the drug design process, and the inclusion of that flexibility could be critical when targeting a highly dynamic protein, such as HIV-1 protease. The results of applying our new Relaxed Complex Method of drug design (Lin, J.-H., Perryman, A.L., Schames, J.R., & McCammon, J.A. “Computational drug design accommodating receptor flexibility: the relaxed complex scheme.” J. Am. Chem. Soc. 124(20): 5632-5633 (2002); and Lin, J.-H., Perryman, A.L., Schames, J.R., & McCammon, J.A.  “The relaxed complex method: Accommodating receptor flexibility for drug design with an improved scoring scheme.”  Biopolymers. 68(1): 47-62 (2003)) to the HIV-1 protease system will be presented.  Considering the huge conformational changes that HIV protease experiences, and considering both the large size and the extensive flexibility displayed by the HIV-1 protease inhibitors that are currently used clinically, applying the Relaxed Complex method to this system was a formidable challenge. The algorithmic details involved such things as converting all of the snapshots from all atom, parm99-formatted restart files into united atom, parm94-formatted pdb files and then further converting those files into AutoDock3.0.5’s format in a fully automated fashion (while maintaining the same relative position of the grid points). Significant trial-and-error was then involved in optimizing the run parameters for AutoDock3.0.5’s Lamarckian Genetic Algorithm (Morris, G.M.; Goodsell, D.S.; Halliday, R.S.; Huey, R.; Hart, W.E.; Belew, R.K.; Olson, A.J. J. Comp. Chem. 19: 1639-1662 (1998)), in order to get reproducible results in an efficient manner.

In the Relaxed Complex experiments that were performed, the completely-flexible drug JE-2147 was docked to every tenth picosecond snapshot extracted from both the 22 ns wild type HIV-1 protease MD simulation as well as from the 22 ns V82F/I84V mutant HIV-1 protease MD simulation.  Those MD simulations were discussed in: Perryman A.L., Lin, J.-H., & McCammon, J.A. “HIV-1 protease molecular dynamics of a wild-type and of the V82F/I84V mutant: possible contributions to drug resistance and a potential new target site for drugs.”  Protein Sci.  13(4):1108-23 (2004). JE-2147 was the drug crystallized with the wild type HIV-1 protease in the 1KZK.pdb structure, which was the basis for the conventional MD simulation of the wild type HIV protease. The same set of optimized run parameters performed very robustly when docking JE-2147 against all 22 ns of both the wild type and of the V82F/I84V mutant of HIV-1 protease, even though the mutant was crystallized with Tipranavir (1D4S.pdb). When each Relaxed Complex experiment was repeated with the same set of optimized run parameters, the estimated free energy of binding that was obtained from docking against each particular snapshot had good agreement between the three independent trials that were performed on each of the two systems (see Figures 1 and 2 below).

That set of optimized run parameters is currently being utilized in Relaxed Complex experiments that involve the design and evaluation of new active site inhibitors that should hopefully be more effective against the V82F/I84V drug-resistant mutant ensemble of conformations.  Structural intuition and guidance from the literature and from the control experiments were used to design a series of over 20 new, slightly different compounds, which exploit the advice that the size, the flexibility, and the asymmetry of the P2/P2' side chains should be increased when trying to design an HIV protease inhibitor that will be more effective against the drug-resistant mutants.  Fourteen of those compounds are currently being screened in silico, but the results of screening the entire series will be presented.


Fig. 1:  The drug JE-2147 was docked to 2,200 different snapshots from the wild type HIV-1 protease MD trajectory, and the results of three independent trials were quite robust.  The results of targeting 100 of those conformations (every tenth snapshot from the second ns of the 22 ns wild type trajectory) are shown above.  Each circle/trial signifies the best of ten runs of docking to one particular snapshot of HIV protease, and each snapshot was targeted in 3 separate experiments.


Fig. 2:  The drug JE-2147 was docked in three independent trials to 2,200 snapshots of the V82F/I84V drug-resistant mutant HIV-1 protease in a reproducible manner, using the exact same optimized run parameters and procedures that were utilized when docking against the wild type’s snapshots (see Fig. 1).  The results of targeting 100 of those snapshots (every tenth conformation from the first ns of the 22 ns mutant trajectory) are shown above.

A.L.P. is a Howard Hughes Medical Institute Pre-doctoral Fellow. We are grateful for the generous funding provided by the Howard Hughes Medical Institute and the W.M. Keck Foundation. Additional funding was provided, in part, by grants to J.A.M. from NIH, NSF, NPACI/SDSC, NBCR, and by UCSD's new NSF Center for Theoretical Biological Physics.



P-9 : Making Real Molecules in Virtual Space

Gyorgy Pirok; ChemAxon, Budapest, HU
Nora Mate, Jeno Varga, Miklos Vargyas, Szilard Dorant, and Ferenc Csizmadia, ChemAxon

Most virtual reaction applications require the manual intervention of experienced chemists in the enumeration phase (selection of appropriate reactants, assignment of the corresponding reaction sites, removing unlikely products). To automate the synthesis process we have moved the expertise intensive stages from the compound library design phase to the reaction library design phase. ChemAxon is building a library of the most important preparative reactions, where each reaction definition contains a generic transformation scheme and additional chemo-, regio- and stereoselectivity rules to handle specific reactants selectively.

The key component of this technology is the Java-based Reactor software able to evaluate and enumerate these "smart" reactions. Its high performance and ability to predict synthetically feasible reaction products opens new possibilities for researchers. We present some virtual synthesis and biotransformation applications, which are able to enumerate entire combichem libraries, build a diverse molecular space of synthetically feasible virtual compounds from available chemicals and predict metabolic pathways.



P-11 : SAPPHIRE: Structure Aided Pharmacophore Implied Reagent Extraction – A method for in silico Screening

Narasinga Rao; Scynexis Inc, Research Triangle Park, NC, US

In the past decade, virtual screening methods have become an integral part of lead generation in drug discovery. There have been two primary approaches: one based on ligand activity, utilizing the key pharmacophore elements, and the other based on the receptor structure, using protein docking methods. Both approaches have proved to be invaluable tools for lead identification as well as lead optimization. However, each method has limitations, especially when it is necessary to incorporate both pharmacophore and receptor information into virtual screening. This work describes a composite approach that takes into account the pharmacophore information as well as the spatial constraints of the receptor pocket for identifying potential hits. The method uses a cascade of filters that selects compounds based on pharmacophore descriptors, a user-definable shape constraint, and receptor pocket information that can be tailored to focus hits on specific proteins of interest. The SAPPHIRE method can be particularly useful for prescreening compound libraries prior to subjecting them to high-throughput in vitro screens.



P-13 : Strategies for ACE2 Structure-Based Inhibitor Design

Monika Rella; University of Leeds, Leeds, GB
Thierry Langer and Richard Jackson, University of Leeds

Angiotensin-Converting Enzyme (ACE) is an important drug target for hypertension and heart disease. Recently, a unique human ACE homologue termed ACE2 has been identified, which has been linked to hypertension, heart and kidney disease. In addition, ACE2 was shown to function as SARS-Coronavirus receptor. This surprising role and its assumed counter-regulatory function to ACE make ACE2 an interesting new cardio-renal disease target. With the recently resolved ACE2 structure in complex with an inhibitor available, a structure-based drug design project has been undertaken to identify novel potent and selective inhibitors. Computational approaches involve combinatorial library design and docking as well as pharmacophore-based virtual screening of large compound databases.

Initially a small number of fragments was selected and evaluated via docking and later used for combinatorial library design considering synthetic accessibility by mimicking a chemical reaction. Most suitable R-groups were suggested for synthesis. In a complementary approach, a protein-based pharmacophore model was created manually comprising several chemical features such as hydrogen bonding, electrostatic and hydrophobic interactions aligned in 3D resembling specific drug-receptor interactions. Selectivity of the model was ensured by initial screening for ACE inhibitors and enrichment enhanced through repeated optimisation cycles. The final model was used to search 2.5 million compounds for matching features, proving a fast and efficient alternative to docking for initial screening. Hits were further evaluated and prioritised via docking and the most promising candidates proposed for purchase and biological testing.



P-15 : Flexible Smoothed-Bounded Distance Matrix-Based Similarity Searching of the MDDR database

Nicholas Rhodes; University of Sheffield, Sheffield, GB
David Clark, Nicholas Rhodes, and Peter Willett, University of Sheffield

This work extends the autocorrelation vector method originally described by Moreau and co-workers [1]. The method was applied first to similarity searching of 2-D structures and later to 3-D rigid similarity [2] and molecular surfaces [3]. Although progress has recently been made in the area of 3-D flexible similarity [e.g. 4], few efficient methods are available. Here, molecular flexibility is encoded using smoothed bounded distance matrices. Eight atomic properties are employed, and the vectors comprise 16 elements (the range from 0.0 to 20.8 Angstroms is divided into bins of 1.3 Angstroms). Two vectors are compared by computing (the square of) the Euclidean distance between them. The comparisons are very rapid, giving rise to short search times even for large databases. The poster describes the application of the method to searching the MDDR database with targets drawn from a number of activity classes.
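
The binning and comparison steps described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the method of the poster: it handles a single rigid conformation with one atomic property, whereas the poster encodes flexibility through smoothed bounded distance matrices and uses eight properties. The function names and the toy coordinates are hypothetical; only the bin width (1.3 Angstroms) and vector length (16) come from the abstract.

```python
import math
from typing import List, Sequence, Tuple

BIN_WIDTH = 1.3  # Angstroms, as stated in the abstract
N_BINS = 16      # 16 bins * 1.3 Angstroms = 20.8 Angstroms

Point = Tuple[float, float, float]


def autocorrelation_vector(coords: Sequence[Point], props: Sequence[float]) -> List[float]:
    """Sum property products p_i * p_j over atom pairs, binned by interatomic distance.

    Pairs farther apart than 20.8 Angstroms are ignored.
    """
    vec = [0.0] * N_BINS
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            k = int(math.dist(coords[i], coords[j]) // BIN_WIDTH)
            if k < N_BINS:
                vec[k] += props[i] * props[j]
    return vec


def squared_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Toy three-atom "molecule" with made-up coordinates and property values.
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 3.0, 0.0)]
props = [1.0, 2.0, 3.0]
vec = autocorrelation_vector(coords, props)
```

Because each molecule is reduced to a short fixed-length vector, comparing two molecules costs only 16 subtractions and multiplications per property, which is consistent with the short search times the abstract reports.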

  1. Moreau, G.; Broto, P. The autocorrelation of a topological structure: a new molecular descriptor. Nouv. J. Chim. 1980, 4, 359-360.
  2. Moreau, G.; Turpin, C. Use of similarity analysis to reduce large molecular libraries to smaller sets of representative compounds. Analysis 1996, 24, 17-21.
  3. Wagener, M.; Sadowski, J.; Gasteiger, J. Autocorrelation of molecular surface properties for modeling corticosteroid binding globulin and cytosolic Ah receptor activity by neural networks. J. Am. Chem. Soc. 1995, 117, 7769-7775.
  4. Raymond, J. W.; Willett, P. Similarity searching in databases of flexible 3D structures using smoothed bounded distance matrices. J. Chem. Inf. Comput. Sci. 2003, 43, 908-916.



P-17 : BRUTUS: A Fully Automated Rigid-Body Superposition Tool

Toni Ronkko; University of Kuopio, Kuopio, FI
Anu Tervo and Antti Poso, University of Kuopio

Drug research projects often have to be started with little knowledge of the possible target. Finding new lead molecules worthy of further research is therefore one of the first challenges such projects face. The purpose of this study was to develop a new virtual screening method for finding lead molecules that are biologically active yet structurally dissimilar. After all, structural analogs have usually already been considered by the pharmaceutical industry, and structurally dissimilar lead molecules are especially desirable in the early phases of the drug discovery process. The result of our work, BRUTUS, is a fully automated rigid-body superposition method for virtual screening of large molecular databases. In BRUTUS, molecular energy fields rather than molecular structures are investigated in order to find dissimilar yet biologically active compounds. In the course of a single molecular superposition, about 12,000 different alignments are evaluated before the 1-5 most prominent alignments are produced for further research. The large number of trial alignments makes it possible to find less obvious matches that are difficult to find with methods focusing solely on molecular structures. Despite the many trial alignments, only 0.2 seconds of computer time is needed per conformation. Hence, BRUTUS is a practical virtual screening method that is especially useful in the early phases of the drug discovery process.



P-19 : Modelling the Inhibition of P450 Enzymes

Gijs Schaftenaar; Radboud University Nijmegen, Nijmegen, NL

P450 enzymes are proteins that have an important role in removing xenobiotics from the human body. Other P450 enzymes are important in the biosynthesis and conversion of steroid hormones. The ability to predict whether a ligand will inhibit the activity of these enzymes will be instrumental in the design of drugs targeted at them. Quantum mechanical calculations on a model system of the active site of these proteins were performed in order to quantify the interaction between the active site and a model ligand. Mapping the potential energy surface with respect to some key internal variables of the active site-ligand complex will produce data that can be converted to analytical forms of this interaction. These will be used in a force-field type of description of the interaction between the ligand and the full protein.



P-21 : Treasure Island: Molecular 3D Shape-based Clustering with Neural Networks

Paul Selzer; Novartis Institutes for Biomedical Research, Basel, CH
Peter Ertl, Jörg Mühlbacher, and Stephen Jelfs, Novartis Institutes for Biomedical Research

Analyzing the relation between the structure of a molecule and its physicochemical and/or biological properties is a very powerful technique to identify molecular structural features of pharmacological importance. For this purpose a structural descriptor was applied that is calculated from the intramolecular atom distances in 3D space and thus describes the 3D shape of the molecule.

Transforming molecules into such molecular 3D-shape descriptors allows one to use three-dimensional structure information as input for the training of a neural network. Neural networks learn inductively about the relation between input (molecular structure) and output (physicochemical/biological property) by analyzing a set of examples, the so-called training set. After a network has been trained it is able to predict those properties for new molecules.

This versatile methodology was implemented as a cheminformatics web tool on the Novartis intranet, providing three main functionalities:

  • Diversity checking of a molecular data set
    • Mapping sets of molecules into a neural network provides quick feedback about the coverage of chemical space and the diversity or overlap of the different data sets.
  • Selection of Bioactive Molecules – “Cherry Picking”
    • Highlighting biological properties of the mapped compounds provides information about the overlap of biological and chemical space. Analyzing the close network neighborhood of active compounds indicates other compound candidates with a high probability of being active too.
  • Selecting a diverse and representative subset of a large compound collection

By mapping a large compound collection into the network and taking only one representative compound from each neuron, a representative subset of molecules can be created. This could be applied, for example, to reduce the size of a virtual combinatorial library with a minimum loss of chemical space coverage.



P-23 : Classification of Protein Kinases: Clustering Similarity Matrices Generated from Alignment and Novel Sequence-Based Descriptors

Suresh Singh; Vitae Pharmaceuticals, Fort Washington, PA, US
Ansuman Bagchi, Keith Schmidt, and Robert P. Sheridan, Merck Research Laboratories
Richard D. Hull, Axontologic Inc.

We present a classification, based on sequence features, of a set of 296 protein kinases from Steven Hanks’ protein kinase data set [1]. We generated classifications by clustering sequences given a cross-similarity matrix. Similarity matrices may be generated in a number of ways. The conventional method of calculating the similarity between two sequences is by alignment. For this we used BLAST2 [2] to generate the alignments and the BLOSUM62 [4] and GONNET [3] matrices to score them. We introduce a novel method for calculating similarities using sequence-based descriptors that do not require alignment. Given the descriptors, we use two different ways of calculating similarities: Dice [5] and a method based on singular-value decomposition called LaSSI [6] (Latent Semantic Structure Indexing). The clustering analysis shows that the resulting clusters correlate well with the functional class membership defined by Steven Hanks. In addition, our classification assigns Hanks-defined functional class membership to most of the unassigned other protein kinase (OPK) groups. We will present a comparison of our clustering-based classifications with Steven Hanks’ classification.
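
To make the alignment-free similarity concrete, here is a minimal Python sketch of a Dice coefficient computed on descriptor counts. The abstract does not specify the novel sequence-based descriptors, so overlapping dipeptide counts are used purely as a hypothetical stand-in; the function names and the toy sequences are also assumptions, and the count-based form of the Dice coefficient shown is one common variant.

```python
from collections import Counter


def kmer_counts(sequence: str, k: int = 2) -> Counter:
    """Count overlapping k-residue fragments (a stand-in descriptor, not the authors')."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def dice_similarity(a: Counter, b: Counter) -> float:
    """Dice coefficient on counts: 2 * shared / (total in a + total in b)."""
    shared = sum((a & b).values())  # Counter '&' keeps the minimum count per key
    total = sum(a.values()) + sum(b.values())
    return 2.0 * shared / total if total else 0.0


# Toy sequences, far shorter than real kinase domains.
sim = dice_similarity(kmer_counts("GAGA"), kmer_counts("GAGG"))
```

Computing this value for every pair of sequences yields the cross-similarity matrix that a clustering algorithm can then take as input, with no alignment step required.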


  1. Hardie, G. and Hanks, S. (1995). The protein kinase facts book, Vols I and II. Academic Press Inc., San Diego, CA 92101.
  2. Altschul, Stephen F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D. J. (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nuc. Acids Res. 25:3389-3402.
  3. Benner SA, Cohen MA, Gonnet GH. Amino acid substitution during functionally constrained divergent evolution of protein sequences. Protein Eng. 1994 Nov;7(11):1323-32.
  4. Henikoff, S. and Henikoff, J.G. (1992). Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. USA 89:10915-10919.
  5. Hull, R.D., Fluder, E.M., Singh, S.B., Nachbar, R.B., Kearsley, S.K., & Sheridan, R.P. (2001b).  Chemical Searches using Latent Semantic Structure Indexing (LaSSI). J. Med. Chem., 44, 1185-1191.
  6. Hull, R.D., Singh, S.B., Nachbar, R.B., Sheridan, R.P., Kearsley, S.K., & Fluder, E.M. (2001a). Latent Semantic Structure Indexing (LaSSI) for defining chemical similarity. J. Med. Chem., 44, 1177-1184.



P-25 : Distributed Search System CACTVS/SONORA: Search and Retrieval of Chemical Compounds and Associated Data from Very Large Databases

Markus Sitzmann; National Institutes of Health, Frederick, MD, US
Marc Nicklaus and Igor Filippov, National Institutes of Health
Wolf-Dietrich Ihlenfeldt, Xemistry GmbH

We present the distributed search system SONORA (Searches Optimized for Node-Operation for Rapid Answers), implemented on the basis of the chemical information system CACTVS [1]. SONORA can distribute any kind of search in a chemical structure database to a set of CACTVS clients running on a computer cluster. The measured speed-up for searching a database is nearly linear in the number of CPUs used. Currently, SONORA runs on our 96-CPU Beowulf-type parallel computer cluster, consisting of 48 dual AMD Athlon XP1900+ nodes. SONORA can make use of an arbitrary number of nodes and of both processors of each assigned node.

As a first application based on SONORA, we are implementing a search and display service providing access to a database of over 13 million unique small-molecule structures distributed by ChemNavigator [2]. All entries of this database are searchable by various criteria, similar to our “Enhanced NCI Database Browser”, e.g. molecular formula, full structure, substructure, and molecular similarity. Additionally, calculated properties (such as ADME-type properties) useful in the context of drug development are being included in all entries of the database.




P-27 : Surrogate Docking: High Quality Docking at High Throughput Speeds

Andrew Smellie; Arqule, Woburn, MA, US
Sukjoon Yoon and Anton Filikov, Arqule

A methodology has been developed that provides a user-controlled continuum for trading off docking quality against speed. Regular docking is performed on a selected fraction of the molecules to be docked to a target, and models are constructed that predict the likelihood of other molecules docking successfully. By ranking molecule sets according to the scores from the model, it will be shown that most of the molecules that dock well are ranked highly, thus giving high enrichment. It will be shown that most of the good docking molecules can be obtained by docking a fraction of the database ranked by the model, and that these molecules contain a high proportion of active compounds. Details will be presented on the effectiveness of different descriptor sets and different model-building methods. Examples will be described from docking studies of molecules from the NCI database on CDK2 and with known inhibitors of the Estrogen Receptor.



P-29 : ROBIA: A Reaction Prediction Program

Ingrid Socorro; Cambridge University, Unilever Centre for Molecular Science Informatics, Cambridge, GB
Jonathan M. Goodman, Cambridge University, Unilever Centre for Molecular Science Informatics
Keith T. Taylor, Elsevier MDL

We are developing a computer program, ROBIA, with the purpose of predicting and analysing organic reactivity. This interactive computer program predicts the products of organic reactions from the starting materials and the reaction conditions, based on the selected transformations within its database.  This mechanistic approach generates a large number of products, from which the most important are selected using filters and molecular modeling calculations. The procedure has been applied successfully to the biosynthesis of dolabriferol [1] as shown in the example below.


  1. Ciavatta, M. L.; Gavagnin M.; Puliti R.; Cimino G. Tetrahedron, 1996, 52, 12831.



P-31 : 3D Structure-Activity Relationships of Non-Steroidal Ligands in Complex with Androgen Receptor Ligand-Binding Domain

Annu Söderholm; Finnish IT Center for Science, Espoo, FI
Lehtovuori Pekka and Nyrönen Tommi, Finnish IT Center for Science

Androgen receptor (AR) is a member of the nuclear receptor superfamily and functions as a ligand-dependent transcription factor in the regulation of AR targeted gene expression. The transcriptional activation of ARs is regulated through agonist and antagonist binding to the ligand-binding domain (LBD), resulting in conformational changes and subsequent recruitment of co-regulators. The agonistic mechanism and the agonist-bound structure of AR LBD are known, unlike the antagonist-bound structure and the antagonistic mechanism.

In this study [1], we applied the Comparative Molecular Similarity Indices Analysis (CoMSIA) method to gain insight into the physicochemical properties contributing to the binding affinity of a set of non-steroidal AR ligands. We combined molecular docking with the 3D-QSAR analysis to identify their preferred binding modes within the AR ligand-binding pocket (LBP) and to generate the ligand alignment for the 3D-QSAR analysis.

The data for 70 AR ligands containing 67 non-steroids were obtained from the literature [2-6]. This panel represents a diverse set of compounds in terms of structure and function. The ligands were divided into a training set of 61 compounds and a test set of 9 compounds. Model validation was carried out by leave-one-out (LOO) and random groups (RG) cross-validation methods.

The CoMSIA model derived from the hydrophobic and hydrogen bond acceptor fields using five PLS components is stable and statistically significant, as indicated by internal validation (Q²(LOO) = 0.656, SDEP(LOO) = 0.576; Q²(RG10) = 0.612, SDEP(RG10) = 0.612; R² = 0.911, SEE = 0.293). The external validation using the test set indicates a model with good predictive power (pred-R² = 0.800, SEE = 0.367).

The interpretation of the model is compatible with the protein environment. The superposition is thus likely to represent the biologically active conformations of the non-steroidal ligands, and the results provide information on how the ligands bind and interact with the AR LBD.

  1. Söderholm AA, Lehtovuori PT, Nyrönen TH, J. Med. Chem. DOI: 10.1021/jm0495879.
  2. Dalton JT, Mukherjee A, Zhu Z, Kirkovsky L, Miller DD, Biochem. Biophys. Res. Commun. 1998, 244, 1-4.
  3. Kirkovsky L, Mukherjee A, Yin D, Dalton JT, Miller DD, J. Med. Chem. 2000, 43, 581-590.
  4. Van Dort ME, Robins DM, Wayburn BJ, J. Med. Chem. 2000, 43, 3344-3347.
  5. Van Dort ME, Jung YW, Bioorg. Med. Chem. Lett. 2001, 11, 1045-1047.
  6. Yin D, He Y, Perera MA, et al., Mol. Pharmacol. 2003, 63, 211-223.



P-33 : Open Content Databases and Open Source Libraries for Chemoinformatics

Christoph Steinbeck; Cologne University, Cologne, DE
Stefan Kuhn and Christian Hoppe, Cologne University
Egon Willighagen, Radboud University Nijmegen

Traditionally, scientists share information and knowledge to enable others to build upon their results. While software development in early computational chemistry adhered to this paradigm, there were many areas of academic software and database development in chemistry which adopted a closed source/closed content culture. In contrast to bioinformatics, for example, chemistry thus lacks those valuable open data collections that allow scientists all over the world to perform research based on community-collected data.

With this talk we want to emphasize the importance of building open data repositories in chemistry using open source software. We will exemplify how this can be done based on two projects run by our group:

The Chemistry Development Kit (CDK) is an open-source Java library for Structural Chemo- and Bioinformatics [1]. Its architecture and capabilities, as well as its development as an open-source project by a team of international collaborators from academic and industrial institutions, will be described. The CDK provides methods for many common tasks in molecular informatics, including 2D and 3D rendering of chemical structures, I/O routines, SMILES parsing and generation, ring searches, isomorphism checking, structure diagram generation, QSAR descriptor calculation, and much more. The CDK forms the basis of a number of applications [2-4], such as NMRShiftDB [5], the open web database for organic structures and their NMR data.

NMRShiftDB, which is available to the public and now holds more than 13,000 organic compounds and their NMR spectra, will serve as an example of an effort to create a community-built, open-content, open-submission database that grows through contributions from its users. We will describe how the quality of data entered by the users is ensured by combining automated checks with peer review by registered human reviewers. NMRShiftDB allows for (sub-)structure, (sub-)spectra and textual searches. It can further perform 13C-NMR spectrum predictions based on the HOSE code method and the database material.

Finally, a number of other Open Source projects developed by us and collaborating groups will be introduced to the audience in a short overview.

  1. Steinbeck, C., Han, Y. Q., Kuhn, S., Horlacher, O., Luttmann, E., and Willighagen, E., Journal of Chemical Information and Computer Sciences, 2003, 43, 493.
  2. Schomburg, I., Chang, A., Ebeling, C., Gremse, M., Heldt, C., Huhn, G., and Schomburg, D., Nucleic Acids Research, 2004, 32, D431.
  3. Steinbeck, C., Journal of Chemical Information & Computer Sciences, 2001, 41, 1500.
  4. Krause, S., Willighagen, E., and Steinbeck, C., Molecules, 2000, 5, 93.
  5. Steinbeck, C., Kuhn, S., and Krause, S., Journal of Chemical Information & Computer Sciences, 2003, 43, 1733.



P-35 : Modeling the Metabolism of Xenobiotics

Lothar Terfloth; Universitaet Erlangen-Nuernberg, Erlangen, DE

Inappropriate pharmacokinetic properties are often responsible for the attrition of a new drug in a late phase of its development process. For this reason, the prediction of an acceptable ADME-Tox (Absorption, Distribution, Metabolism, Excretion, and Toxicity) profile for a new compound at an early stage is a key step in the drug discovery process. Xenobiotics are oxidized, reduced, or hydrolyzed in the first phase of metabolism. Cytochrome P450 is involved in more than 90 percent of the oxidation reactions. The two major isoforms of cytochrome P450 that contribute to these oxidation reactions are 3A4, with about 50 percent, and 2D6, with about 25 percent. In the second phase, a conjugation reaction such as glucuronidation, acetylation, methylation, or glutathione conjugation follows. Here, we focus on the metabolism of xenobiotics by cytochrome P450.

X-ray crystal structures for the membrane-bound human Cytochrome P450 enzymes have become available only recently. Therefore, we applied methods from ligand-based drug design to pursue our studies. A knowledge-based approach using neural networks will be presented [1]. For the classification of substrates of Cytochrome P450 3A4 and 2D6, the program SONNIA [2] (Self-Organizing Neural Network for Information Analysis) has been used.

Furthermore, we report on the development of a reaction database related to the metabolism of xenobiotics by human Cytochrome P450 enzymes.

  1. J. Zupan, J. Gasteiger, Neural Networks in Chemistry and Drug Design (Wiley-VCH, Weinheim, ed. 2, 1999).
  2. SONNIA, available at



P-37 : BRUTUS: Rapid Optimization of Molecular Electrostatic Overlay - Evaluation of the Applicability of the Algorithm

Anu Tervo; University of Kuopio, Kuopio, FI
Toni Rönkkö and Antti Poso, University of Kuopio
Tommi H. Nyrönen, Finnish IT Centre for Science

BRUTUS is a fast molecular field-based superposition algorithm developed for chemical similarity searching. The properties of chemical compounds (e.g., possible biological activity against a certain target protein) depend fundamentally on the distribution of the electrons surrounding their atomic nuclei. Molecular field information that models the charge distribution of the compounds is therefore appropriate for virtual screening of chemical databases, especially when structurally dissimilar compounds with similar properties are of particular interest.

Here we present the evaluation results of BRUTUS. It was used for chemical similarity searching on the basis of the steric and electrostatic molecular fields of human immunodeficiency virus protease (HIV-1 PR) and cyclooxygenase-2 (COX-2) inhibitors. The search results of BRUTUS were structurally diverse and comparable in magnitude to the reference results obtained using Unity fingerprints with the Tanimoto coefficient as the similarity measure. The results suggest that BRUTUS is a fast algorithm that can be used successfully in field-based similarity searching of large molecular databases.



P-39 : The Use of Exclusion Volume in Feature Based Alignment Pharmacophore Models: Catalyst HipHopRefine

Samuel Toba; Accelrys, San Diego, CA, US
Al Maynard, Jon Sutter, and Marvin Waldman, Accelrys

This presentation provides an overview of the Catalyst HipHop and HipHopRefine pharmacophore generation and refinement algorithms. HipHop performs feature-based alignment of a collection of compounds and generates pharmacophore models. HipHop is used to match features, such as surface-accessible hydrophobes, surface-accessible hydrogen bond donors/acceptors, and charged/ionizable groups, against a set of active candidate molecules. HipHop does not incorporate any penalty for incompatible sterics, and the HipHopRefine algorithm has been designed as a post-processing technique, suited to HipHop pharmacophores, which adds excluded volumes as just such a penalty. Details of the detection, ranking and selection of locations for excluded volume addition to pharmacophores, based on a set of active and inactive molecules, are given, along with details of improved enrichments and elimination of false positives in database searches compared with the original pharmacophores.



P-41 : Diversity of Chemical Structure Libraries Characterized by the Distribution of Tanimoto Indices

Kurt Varmuza; Vienna University of Technology, Vienna, AT
Heinz Scsibrany, Vienna University of Technology

Chemical structures of organic compounds are represented by binary 2D-substructure descriptors. Similarity between two structures is characterized by the Tanimoto index calculated from 1365 substructures. The software SubMat has been developed for easy and automatic calculation of binary substructure descriptors for a set of molecular structures and a set of substructures. SubMat generates a text file with a line for each molecular structure that contains a string of 0's and 1's for absence or presence of the substructures. Computing time for 1000 molecular structures and 200 substructures is typically one second (Pentium IV, 2.6 GHz). SubMat can optionally be executed by calling it from another program. In this case a command file is used to transfer file names and parameters to SubMat. During execution, so-called semaphore files are used to communicate with the calling program. The contents of spectral databases (IR, MS), with up to about 100,000 structures, have been characterized by the Tanimoto indices of randomly selected structure pairs. The distribution of typically up to one million Tanimoto indices describes the structural diversity of a database. Shape and parameters of such distributions are discussed.
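
The calculation described above can be illustrated with a minimal Python sketch. It assumes descriptor strings of 0's and 1's, one per structure, as in the SubMat text output; the function names, the sampling scheme and the convention of returning 0.0 when neither structure contains any substructure are illustrative assumptions, not part of SubMat.

```python
import random


def tanimoto(a: str, b: str) -> float:
    """Tanimoto index of two equal-length strings of '0'/'1' characters."""
    if len(a) != len(b):
        raise ValueError("descriptor strings must have equal length")
    both = sum(x == "1" and y == "1" for x, y in zip(a, b))
    either = sum(x == "1" or y == "1" for x, y in zip(a, b))
    # Assumed convention: two all-zero descriptors get similarity 0.0.
    return both / either if either else 0.0


def sample_tanimoto(descriptors: list[str], n_pairs: int, seed: int = 0) -> list[float]:
    """Tanimoto indices of n_pairs randomly selected pairs of distinct structures."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_pairs):
        i, j = rng.sample(range(len(descriptors)), 2)
        values.append(tanimoto(descriptors[i], descriptors[j]))
    return values
```

A histogram of the values returned by `sample_tanimoto` would then give the kind of distribution used to characterize the diversity of a database.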



P-43 : Detection of Toxicity Indicating Structural Patterns

Modest von Korff; Actelion Ltd., Allschwil, CH
Thomas Sander, Actelion Ltd.

For several years, the early stages of drug discovery have been driven by multiobjective optimization. Besides a ligand’s binding affinity to the target protein, its ADMET features have increasingly come into the researchers’ focus. Where reliable high-throughput assays to evaluate ADMET properties are expensive or missing, there is a high demand for predictive in-silico techniques. Thus, computational methods delivering indicators of a compound’s toxicity potential are of high interest to medicinal chemists.

Based on the assumption that structurally similar compounds are also similar concerning their toxicity profile, we started a topological exploration of the RTECS database, which covers various toxicity classes. The IDDB database was used as reference for non-toxic compounds. The clustering and classification methods applied were Naive Bayesian Clustering, k-Nearest Neighbor Classification and Support Vector Machines. To find the optimum molecule representation we analyzed the behavior of three different descriptors. The underlying descriptors comprised a fragment-based chemical fingerprint, a topological walk-based chemical fingerprint and a topological pharmacophore end point descriptor. Each toxicity model based on one of these classification algorithms and one of these descriptors was trained and tested on independent datasets.

Furthermore, we created a toxicity alerting system for compounds that are outside of the known chemical space of toxic compounds. Considering that a compound’s toxicity is often caused by a certain chemical substructure, we shredded the RTECS database, yielding hundreds of thousands of substructure fragments. By introducing query features we retained the original substitution patterns within these fragments. The occurrence of each fragment was counted within all molecules that exhibit a certain class of toxicity and then normalized by the fragment’s natural occurrence found in the reference database. Fragments with both a statistically relevant overall frequency and a significantly higher occurrence in the toxic group of compounds were listed as potentially risky fragments. To predict the toxicity of an unknown molecule, we run a substructure search with all risky fragments, which, depending on the result, may give us an indication of potential toxic behavior.
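
The fragment selection described above can be sketched in Python as follows. This is only an illustration: the function name, the `min_count` and `min_ratio` thresholds and the add-one smoothing of the reference frequency are assumptions introduced here, and a simple frequency ratio stands in for whatever significance criterion was actually applied.

```python
def risky_fragments(
    toxic_counts: dict[str, int],
    n_toxic: int,
    ref_counts: dict[str, int],
    n_ref: int,
    min_count: int = 5,      # hypothetical threshold for overall frequency
    min_ratio: float = 3.0,  # hypothetical threshold for over-representation
) -> dict[str, float]:
    """Return fragments over-represented in the toxic set, with their ratio."""
    risky = {}
    for fragment, k_toxic in toxic_counts.items():
        k_ref = ref_counts.get(fragment, 0)
        # Skip fragments that are too rare overall to be statistically relevant.
        if k_toxic + k_ref < min_count:
            continue
        f_toxic = k_toxic / n_toxic
        # Add-one smoothing (an assumption) avoids division by zero when a
        # fragment never occurs in the reference database.
        f_ref = (k_ref + 1) / (n_ref + 1)
        ratio = f_toxic / f_ref
        if ratio >= min_ratio:
            risky[fragment] = ratio
    return risky
```

An unknown molecule would then be flagged if a substructure search finds any of the returned fragments in it.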

To visualize the chemical space of all available RTECS and IDDB molecules, we trained a self-organizing map (SOM) with the compounds of both databases. The map topology was square and toroidal, and the map contained 10,000 neurons. Finally we mapped the compounds of various toxicity classes onto the SOM and colored those compounds accordingly. The result shows the distribution of toxic compounds in the chemical descriptor space. Areas with a high density of toxic compounds indicate regions in the chemical space to be avoided. Mapping external compounds onto the ‘toxicity tinged’ SOM provides another toxicity measure, based purely on the toxicity profile of the nearest neighbors and their similarity distances to the test compound.

In summary, we found that a Support Vector Machine in combination with the Fragment Based Fingerprint is well suited to classifying compounds into various toxicity classes. SOMs also perform excellently in separating toxic from nontoxic substances. While these methods reliably detect toxicity risks for compounds that are similar to already known toxic compounds, our fragment-based approach also covers compounds that are dissimilar to anything known.



P-45 : Representing Structural Databases in a Self-Organising Map

Ron Wehrens; Radboud University Nijmegen, Nijmegen, NL
Rene de Gelder, Willem Melssen, and Lutgarde Buydens, Radboud University Nijmegen

We present a way to visualise large numbers of crystal structures, as represented by their simulated powder diffraction patterns, in a Kohonen feature map. Essential is the application of a recently introduced similarity criterion, the weighted cross-correlation. It will be shown that good results are obtained even if the network is trained with a small subset of the complete database. This makes it possible to construct the map, using common hardware, in a few hours. This two-dimensional visualisation has a number of important applications, such as fast and easy screening of a database, the selection of a representative set, and providing an overview of the contents of the database in terms of structural diversity of specific chemical classes of compounds.



P-47 : Drug Design Applications Based on COSMO-RS

Karin Wichmann; COSMOlogic GmbH & Co. KG, Leverkusen, DE
Andreas Klamt, COSMOlogic GmbH & Co. KG

Screening charge (σ) surfaces and screening charge distributions (σ-profiles) of molecules derived from quantum chemical COSMO (Conductor-like Screening Model) calculations offer a broadly applicable description of molecular interactions in liquid phases, which has already been used successfully in many applications in chemical engineering and in ADME property prediction. This approach is orthogonal to the force field based modelling methods commonly used in drug design and thus has the potential to complement the established methods.

Although σ-profiles don’t encode any structural information, they can be used to investigate receptor-ligand interactions: If the σ-profile of a receptor is known, the receptor can be considered as a pseudo-liquid and the chemical potential of a ligand molecule in such a pseudo-liquid receptor (PLR) can be calculated. Since the calculation of the chemical potential of the ligand in water is a standard task for COSMO-RS, the partition coefficient of the ligand between the PLR and an aqueous phase can easily be calculated. This partition coefficient can be used as a measure for the overall polarity suitability of drug and receptor and may be a useful number in the selection of promising drug candidates. Since the receptor σ-profile has to be calculated only once, large numbers of drug candidates can be screened. For the COSMO treatment of enzymes a procedure based on linear scaling semi-empirical calculations has been developed. AM1-COSMO calculations and subsequent BP/SVP single point calculations were performed for 161 tri-peptides. On the basis of bond order and atom types of the neighboring atoms, 39 asymmetric bond types were found, and dipole and quadrupole corrections were fitted. Thus, the original rms deviation between the AM1 and BP/SVP electrostatic potentials was reduced from 0.0112 e/Å to 0.0058 e/Å.

First applications of this novel approach focus on factor Xa and kinase receptor-inhibitor interactions and will be presented.



P-49 : Techniques for Location-Independent Chemoinformatics Teaching and Research

David Wild; Indiana University School of Informatics, Bloomington, IN, US
Gary Wiggins, Indiana University School of Informatics

We have recently developed a chemoinformatics teaching curriculum and research base at Indiana University that allows participation at multiple geographic locations and institutions, with an aim to engage students, lecturers and researchers in multiple locations and disciplines and to bolster the connectivity of chemoinformatics to the wider research community. In this presentation we will discuss the programs we have created, evaluate their effectiveness, and detail the distance learning technologies that we have found most effective.



P-51 : Similarity-Based Virtual Screening Using Data Fusion

Peter Willett; University of Sheffield, Sheffield, GB
Val Gillet, Jerome Hert and Martin Whittle, University of Sheffield

Data fusion (which is referred to as consensus scoring in the ligand-docking community) is a general technique for combining the results of multiple database searches in systems for virtual screening [1]. The basic assumption of the data-fusion approach is that the use of multiple computational tools will enable a more effective prioritisation of a set of compounds for biological testing than will the use of a single such tool. In similarity-based virtual screening, data fusion has typically involved matching a bioactive target structure against the database molecules using several different similarity measures, and then merging the various rankings. For example, one could use multiple representations (e.g., a 2D fingerprint, a set of four-point pharmacophores, and a set of topological indices) or multiple similarity coefficients (e.g., the Forbes, Tanimoto and Russell-Rao coefficients). An alternative approach, and the one considered here, involves the use of a single similarity measure but multiple target structures, which we refer to as group fusion.

The idea that one can enhance retrieval effectiveness by using multiple target structures in a similarity search is not a new one (see, e.g., [2, 3]). We have recently compared several ways of combining the information from such multiple structures, using 2D fingerprints in extended simulated virtual screening searches of the MDL Drug Data Report database [4]. This comparison demonstrates clearly the effectiveness of the group fusion approach: given some number (ten in our experiments) of known active target structures, match each of them against the database molecules and score each of these by the maximum of its similarities with the known actives. The similarity measure in these initial experiments was based on the Tanimoto Coefficient and Unity 2D fingerprints. More recent experiments have evaluated different types of similarity coefficient and 2D fingerprint for group fusion; these experiments suggest the general effectiveness of the Tanimoto Coefficient and Scitegic circular substructure descriptors. We have also demonstrated that group fusion is most effective when the set of actives that is being searched for is structurally heterogeneous, a situation that is difficult for conventional similarity searching and data fusion; group fusion, conversely, will add little to conventional similarity searching when the actives are structurally homogeneous [5, 6]. We are now developing a mathematical model of data fusion with the aim of being able to rationalise the effectiveness of different fusion rules.
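
The group fusion rule described above (score each database molecule by the maximum of its similarities to the known actives) can be sketched in a few lines of Python. The representation of a fingerprint as a set of on-bit indices and the function names are assumptions made for this illustration.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def group_fusion_rank(
    actives: list[set[int]], database: dict[str, set[int]]
) -> list[tuple[str, float]]:
    """Rank database molecules by their maximum similarity to any known active."""
    scores = {
        name: max(tanimoto(fp, ref) for ref in actives)
        for name, fp in database.items()
    }
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Replacing `max` with another aggregate (for example the mean) gives a different fusion rule operating on the same similarity lists.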

  1. Ginn, C. M. R. et al. (2000).  Perspect. Drug Discov. Design, 20, 1-16.
  2. Schuffenhauer, A. et al. (2003). J. Chem. Inf. Comput. Sci., 43, 391-405.
  3. Xue, L. et al. (2001). J. Chem. Inf. Comput. Sci., 41, 746-753.
  4. Hert, J. et al. (2004).  J. Chem. Inf. Comput. Sci., 44, 1177-1185.
  5. Hert, J. et al. (2004).  Org. Biomol. Chem., 2, 3256-3266.
  6. Whittle, M. et al. (2004). J. Chem. Inf. Comput. Sci., 44, 1840-1848.



P-53 : On the Use of Spectra as Molecular Descriptors in QSAR Research

Egon Willighagen; Radboud University Nijmegen, Nijmegen, NL
R. Wehrens and L.M.C. Buydens, Radboud University Nijmegen

Several papers have been published that describe the use of infrared and NMR spectra as molecular descriptors in quantitative structure-activity relationship (QSAR) research. This research further explores the use of spectra, applying them to several data sets with biochemical and physical properties. Results are compared against models based on conventional descriptors, such as topological, geometrical and electronic descriptors. Several modern modelling methods, such as Partial Least Squares (PLS), Support Vector Regression (SVR) and Classification and Regression Trees (CART), are used and compared.



P-55 : The Development of a Machine Learning Algorithm for Ligand-Based Virtual Screening

David Wood; University of Sheffield, Sheffield, GB
Peter Willett and Beining Chen, University of Sheffield
Xiaoqing Lewell, GlaxoSmithKline

Machine learning techniques for ligand-based virtual screening (VS) generally involve supplying an algorithm with a training set of examples of active and inactive compounds in a descriptor format. The algorithm then uses the information in the training set to develop a model of activity which can be used to classify compounds with unknown activity. Such algorithms can therefore be used to rapidly identify promising compounds from large virtual libraries or suppliers’ catalogues.

Kernel discrimination methods are a class of machine learning algorithms that classify unknown items by comparing the population density distributions of each of the training set classes over the descriptor space. Binary Kernel Discrimination (BKD) is an example of a kernel-based classification technique that takes multivariate binary data as input, a commonly used way of representing chemical compounds. BKD has only fairly recently been applied to VS [1]. Several comparative studies have indicated that it outperforms many commonly used ligand-based VS techniques [2], and that its performance is comparable to that of Support Vector Machines [3]. Further development of this technique could yield a useful tool for drug discovery.
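
As a rough illustration of comparing class densities over binary descriptors, the Python sketch below scores a query fingerprint by the ratio of its kernel density under the actives to that under the inactives. The kernel form is an assumption: it is the Aitchison-Aitken binary kernel, λ^(n−d)·(1−λ)^d for fingerprints of length n that differ in d bits, which the text above does not spell out. The function names and the default smoothing parameter are likewise hypothetical.

```python
from typing import Sequence

Fingerprint = Sequence[int]  # equal-length vectors of 0/1 values


def hamming(a: Fingerprint, b: Fingerprint) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return sum(x != y for x, y in zip(a, b))


def kernel(a: Fingerprint, b: Fingerprint, lam: float) -> float:
    """Assumed binary kernel: lam**(n - d) * (1 - lam)**d."""
    n, d = len(a), hamming(a, b)
    return lam ** (n - d) * (1.0 - lam) ** d


def bkd_score(
    query: Fingerprint,
    actives: Sequence[Fingerprint],
    inactives: Sequence[Fingerprint],
    lam: float = 0.75,  # hypothetical smoothing parameter, 0.5 < lam < 1
) -> float:
    """Ratio of summed kernel values over actives to that over inactives."""
    if not 0.5 < lam < 1.0:
        raise ValueError("lam must lie strictly between 0.5 and 1")
    numerator = sum(kernel(query, fp, lam) for fp in actives)
    denominator = sum(kernel(query, fp, lam) for fp in inactives)
    return numerator / denominator if denominator > 0 else float("inf")
```

Compounds are ranked by decreasing score. In practice the smoothing parameter would be optimised on the training set, and long fingerprints would call for log-space arithmetic to avoid underflow, which this sketch omits.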

A series of experiments were performed to develop BKD for VS. The performance of a range of commonly used fingerprint descriptors, when used in conjunction with BKD, was compared over 11 different activity classes. The fingerprints included BCI, Daylight, Unity and SciTegic’s Extended Connectivity fingerprints. The SciTegic fingerprint system was found consistently to demonstrate the best performance over many of the activity classes and so is the 2D fingerprint descriptor of choice for BKD. A further experiment demonstrated that the performance of the system is improved by increasing the number of inactive compounds in the training set, although the improvement rapidly tails off once sufficient inactive compounds have been included. Finally, alternative methods of optimising the model of activity in the algorithm’s training stage were considered and an improved variation to the original method is proposed. This poster will report in detail the experiments that have been conducted to develop the BKD method for VS.

  1. Harper, G., et al., Prediction of biological activity for high-throughput screening using binary kernel discrimination. J. Chem. Inf. Comp. Sci., 2001. 41: p. 1295-1300.
  2. Wilton, D.J., et al., Comparison of ranking methods for virtual screening in lead-discovery programs. J. Chem. Inf. Comput. Sci., 2003. 43: p. 469-474.
  3. Wilton, D., Willett, P., Delaney, J., Lawson, K., & Mullier, G, The Use of Binary Kernel Discrimination and Support Vector Machines for Virtual Screening in Pesticide-Discovery Programmes. (In preparation, 2004).



P-57 : Development of the Transition State Data Base

Toru Yamaguchi; Yamaguchi University, Ube, JP
Kenzi Hori, Yamaguchi University

We have been developing a data base containing information on transition states (TS), such as TS structures and activation energies, for various reactions. It is called the Transition States Data Base (TSDB), and it assists in developing synthesis routes of compounds by using TS information together with the KOSP program, a synthesis route design system developed by Funatsu and his coworkers. The TSDB is a system consisting of six programs, as follows.

  1. TSLB is the main library storing information of transition states.
  2. TS_Search assists in searching for transition states of reactions.
  3. TSDB_View manages data in TSLB.
  4. FIND_Tsinfo searches information in TSLB related to synthesis routes from KOSP. This data base is constructed with ChemFinder from CambridgeSoft.
  5. Reaction_View is used for viewing molecular structures, animations of molecular vibrations and geometry transformations along the intrinsic reaction coordinates. We are currently using the Jmol program for this purpose.
  6. Auto_PTOD calculates TS structures at the B3LYP/6-31G* level of theory from initial geometries of PM3 calculations stored in the TSLB.

In the present study, we will describe the TSDB in detail and demonstrate how the data base works for developing synthesis routes of compounds.



P-59 : Pharmacophore Hypotheses Derived from Protein Structure and Inhibitors: Methods & Binding Site Comparisons of CYP3A4

Litai Zhang; Bristol-Myers Squibb, Princeton, NJ, US

Cytochromes P450 (CYP 450) are the major enzymes involved in the oxidative metabolism of various drugs and other xenobiotics. A large number of pharmacologically active compounds are often rejected as new drug candidates due to either unsuitable pharmacokinetics or their interference with the metabolism of existing therapeutic agents. In many cases this is because the compounds are either low Km substrates or potent inhibitors of one or more CYP 450 isozymes, such as CYP 3A4, 2C9, 2D6, 2C19. The CYP 3A4 ranks among the most important drug metabolizing CYP isoforms present in human liver. Numerous inhibitory drug interactions of high clinical significance involving CYP 3A4 substrates have been described[1]. Historical precedents within BMS have indicated that CYP 3A4 has been particularly problematic.

Pharmacophore hypotheses derived from protein structure and inhibitors to elucidate the active site of CYP 3A4 can be used in the triage of HTS and to actively direct synthesis away from CYP 3A4 issues.



P-61 : Automatic Classification of Chemical Reactions without Identification of Reaction Centers

Qing-You Zhang; Universidade Nova de Lisboa, Caparica, PT
Joao Aires-de-Sousa, Universidade Nova de Lisboa

Automatic classification of chemical reactions is of high importance for the analysis of reaction databases, reaction retrieval, reaction prediction, and synthesis planning. After a period of near inactivity, this topic is re-emerging, particularly due to the current interest in metabolic reactions.

Well-documented methods for reaction classification have been based on a) physicochemical properties of bonds or atoms at the reaction center; [1,2] b) variation of physicochemical properties of one or more atoms attached to the reaction center; [3] c) degree of overlap of weighted fragment sets; [4] d) numerical codes of the neighborhoods of the reaction center. [5] These approaches require atom mapping and identification of the bonds broken or made in the reaction (the reaction center). Some also require ranking of the bonds involved in the reaction, and a scheme to compare reactions with a different number of bonds involved.

In the course of our QSAR studies with molecular maps of bonds (MOLMAPS), it was recognized that the difference between the map of the product bonds and the map of the reactant bonds could be interpreted as a map of the reaction. A MOLMAP is based on a Kohonen self-organizing map previously trained with bonds represented by a series of physicochemical properties calculated by PETRA. [6] The MOLMAP of a molecule is the pattern of neurons that are activated by the bonds existing in that molecule. Bonds far from the reaction center are unchanged during the reaction and activate the same position (neuron) of the MOLMAP both in the reactant map and in the product map. Therefore, the difference map (MOLMAP of the reaction) gets a zero value at that neuron. The pattern of neurons in the MOLMAP of the reaction with non-zero values relates to the bonds of the reactants that break or change, and to the bonds of the products that are made or changed in the reaction. The former lead to negative values, while the latter lead to positive values.
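
The difference map can be illustrated with a short Python sketch. It assumes that every bond has already been assigned to its winning neuron, given as a (row, column) position, by a previously trained Kohonen map; that training step is not shown. Counting the bonds that hit each neuron, rather than recording a binary activation, is also an assumption of this sketch, as are the function names.

```python
Grid = list[list[int]]


def molmap(bond_neurons: list[tuple[int, int]], size: int = 12) -> Grid:
    """Count how many bonds activate each neuron of a size x size map."""
    grid = [[0] * size for _ in range(size)]
    for row, col in bond_neurons:
        grid[row][col] += 1
    return grid


def reaction_molmap(
    reactant_bonds: list[tuple[int, int]],
    product_bonds: list[tuple[int, int]],
    size: int = 12,
) -> Grid:
    """Product map minus reactant map; unchanged bonds cancel to zero."""
    reactants = molmap(reactant_bonds, size)
    products = molmap(product_bonds, size)
    return [
        [p - r for p, r in zip(p_row, r_row)]
        for p_row, r_row in zip(products, reactants)
    ]
```

Flattening the resulting grid gives a fixed-length vector that can be passed to a classifier without any atom mapping.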

Following this approach, 543 photochemical reactions were encoded that involve two reactants and one product. They were manually assigned to eight classes – seven types of reactions (203 reactions) and a class with the remaining 340 reactions. The data set was divided into a training set consisting of 429 reactions and a test set with 114 reactions.

Classification of the reactions by machine learning on the basis of MOLMAPS of size 12x12 was investigated with an unsupervised method (Kohonen self-organizing map) and a supervised method (random forest). Kohonen SOMs achieved 97% of correct classifications for the training set and 87% for the test set. Random forests obtained 99% of correct predictions for the training set and 93% for the test set.

This work demonstrates that the difference between MOLMAP descriptors of products and reactants can be used to represent and classify reactions without assignment of reaction centers.

ACKNOWLEDGMENTS. The authors thank InfoChem GmbH (Munich, Germany) for sharing the data set of photochemical reactions. Z. Q. Y. acknowledges Fundação para a Ciência e Tecnologia (Lisbon, Portugal) for a post-doctoral grant under the POCTI program (SFRH / BPD / 14476 / 2003).


  1. L. Chen; J. Gasteiger; J. Am. Chem. Soc. 1997, 119, 4033-4042.
  2. O. Sacher. Classification of organic reactions by neural networks for the application in reaction prediction and synthesis design. Ph. D. Thesis, University of Erlangen-Nuremberg.
  3. H. Satoh; O. Sacher; T. Nakata; L. Chen; J. Gasteiger; K. Funatsu; J. Chem. Inf. Comput. Sci. 1998, 38, 210-219.
  4. T. E. Moock; D. L. Grier; W. D. Hounshell; G. Grethe; K. Cronin; J. G. Nourse; J. Theodosiau; Tetrahedron Comput. Methodol. 1988, 1, 117-128.



P-63 : Chiral QSPR Analysis of 13C NMR Properties in Chiral Solvents

Qing-You Zhang; Universidade Nova de Lisboa, Caparica, PT
Joao Aires-de-Sousa, Universidade Nova de Lisboa

The NMR chemical shifts of two opposite enantiomers are not necessarily the same in a chiral environment such as a chiral solvent. Similarly, a single enantiomer may exhibit different chemical shifts if the NMR spectrum is taken in two enantiomeric solvents. Kishi and co-workers [1-3] observed different 13C NMR chemical shifts for chiral alcohols in (S,S)- or (R,R)-BMBA-p-Me chiral solvents. A database was produced with those differences for the atoms adjacent to the chiral hydroxymethine center in 24 chiral alcohols. The prediction of such differences, and comparison with experimental values, can assist in the assignment of the absolute configuration.

In this poster we show how counterpropagation neural networks (CPG NNs) were trained and applied to estimate the difference between the chemical shifts of a given carbon atom in the two enantiomeric solvents. The neural networks receive as input a representation of the atom (an atomic chirality code) and give as output the difference of its chemical shift in the (R,R)-BMBA-p-Me and (S,S)-BMBA-p-Me solvent.

We have previously developed molecular chirality codes that represent the chirality of the whole molecule.[4,5] Introduction of an atomic chirality code is here necessary to account for local chirality around an atom, which is relevant for modeling atomic properties such as the NMR chemical shift. An atomic code means that every atom rather than the whole molecule has its own chirality code. For the generation of the molecular chirality codes, all the sets of 4 atoms are analyzed. The atomic version of the chirality code is computed in the same way, but only sets of 4 atoms that include a specific atom in the molecule are processed – to produce the atomic chirality code for that specific atom.
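
The relationship between the molecular and the atomic code can be sketched as a restriction on which 4-atom sets are enumerated. The snippet below is only an illustration under stated assumptions: atoms are plain integer indices, the function names are hypothetical, and the actual chirality contribution of each set is not computed.

```python
from itertools import combinations

def four_atom_sets(n_atoms):
    """All unordered sets of 4 atom indices (input to a molecular code)."""
    return list(combinations(range(n_atoms), 4))

def four_atom_sets_for_atom(n_atoms, atom):
    """Only the 4-atom sets that contain `atom` (input to its atomic code)."""
    others = [i for i in range(n_atoms) if i != atom]
    return [tuple(sorted((atom,) + trio)) for trio in combinations(others, 3)]

# Toy molecule with 6 atoms; atom index 2 is the atom of interest.
molecular = four_atom_sets(6)
atomic = four_atom_sets_for_atom(6, atom=2)
print(len(molecular), len(atomic))
```

Every set used for the atomic code is also one of the sets used for the molecular code; the atomic version simply discards the sets that do not involve the chosen atom.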

We used a dataset of 94 atoms and the corresponding chemical shift differences. The dataset was partitioned into a 74-object training set and a 20-object test set. Parameters of the atomic chirality codes were optimized for the training set, and CPG NNs were trained to predict the sign of the differences. Correct predictions were achieved for all cases in the test set. With the same procedure, the quantitative prediction of the chemical shift differences was investigated, and r2 values of 0.839 and 0.936 between the predicted and the experimental values were obtained for the training and test sets, respectively.

The results show that atomic chirality codes describe the chirality of an atom’s environment in a way that can be correlated with a physical property - its NMR behavior in a chiral solvent. This work is also a contribution to the assignment of absolute configuration from NMR data, particularly for its implementation in automatic systems.[6]

ACKNOWLEDGMENTS. Z. Q. Y. acknowledges Fundação para a Ciência e Tecnologia (Lisbon, Portugal) for a post-doctoral grant under the POCTI program (SFRH / BPD / 14476 / 2003).


  1. Kobayashi, Y.; Hayashi, N.; Kishi, Y. Toward the creation of NMR databases in chiral solvents: bidentate chiral NMR solvents for assignment of the absolute configuration of acyclic secondary alcohols. Org. Lett. 2002, 4, 411-414.
  2. Kobayashi, Y.; Hayashi, N.; Kishi, Y. Application of chiral bidentate NMR solvents for assignment of the absolute configuration of alcohols: scope and limitation. Tetrahedron Lett. 2003, 44, 7489-7491.
  3. Kobayashi, Y.; Czechtizky, W.; Kishi, Y. Complete stereochemistry of tetrafibricin. Org. Lett. 2003, 5, 93-96.
  4. Aires-de-Sousa, J.; Gasteiger, J. Prediction of enantiomeric selectivity in chromatography. application of conformation-dependent and conformation-independent descriptors of molecular chirality. J. Molec. Graph. Model. 2002, 20, 373-388.
  5. Aires-de-Sousa, J.; Gasteiger, J.; Gutman, I.; Vidović, D. Chirality codes and molecular structure. J. Chem. Inf. Comput. Sci. 2004, 44, 831-836.
  6. Zhang, Q. Y.; Carrera, G.; Gomes, M. J. S.; Aires-de-Sousa, J. Automatic assignment of absolute configuration from 1D NMR data. J. Org. Chem., 2005, in press.



P-65 : Weighted Reaction Searching – Using Focused Fingerprints for Discriminated Results

Tim Aitken; Accelrys, Cambridge, GB
Ian Buchan, Accelrys

Similarity searching has long been used within the Drug Discovery industry for retrieving chemically similar hits from compound databases, clustering and carrying out diversity studies. Several algorithms for fingerprint comparisons are commonly used and a wide variety of fingerprinting methods have been implemented in commercial and non-commercial database search systems. With the widespread use of Reaction databases, however, the traditional molecule-oriented approaches have not proven to be as useful and search methods tend to focus on substructure, exact structure and text searches.

A new approach is demonstrated using a hash fingerprinting algorithm weighted towards reaction centre transformation and discriminating reactants from products within a reaction database. This encapsulation of molecular, reaction and reaction centre information within a ‘zoned’ fingerprint can return unexpected hits from a database search, which traditional searches would fail to return.
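
One way to picture a 'zoned' fingerprint is as separate hashed bit sets for reactants, products and the reaction centre, compared zone by zone with a larger weight on the centre. The sketch below is not the algorithm presented in the poster: the zone names, the 64-bit zone width, the MD5-based hashing and the weights are all assumptions chosen for illustration.

```python
import hashlib

N_BITS = 64  # assumed width of each zone

def hash_bits(features, n_bits=N_BITS):
    """Hash string features (e.g. fragment labels) into a set of bit positions."""
    bits = set()
    for feature in features:
        digest = hashlib.md5(feature.encode("utf-8")).digest()
        bits.add(int.from_bytes(digest[:4], "big") % n_bits)
    return bits

def zoned_fingerprint(reactants, products, centre):
    """Keep reactant, product and reaction-centre bits in separate zones."""
    return {"reactants": hash_bits(reactants),
            "products": hash_bits(products),
            "centre": hash_bits(centre)}

def tanimoto(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Assumed weights: the reaction centre counts three times as much as other zones.
WEIGHTS = {"reactants": 1.0, "products": 1.0, "centre": 3.0}

def weighted_similarity(fp1, fp2, weights=WEIGHTS):
    """Weighted mean of the per-zone Tanimoto coefficients."""
    total = sum(weights.values())
    return sum(w * tanimoto(fp1[z], fp2[z]) for z, w in weights.items()) / total
```

With such a weighting, two reactions that share a transformation but differ in the rest of the molecule can still score higher than two reactions with similar molecules and unrelated transformations.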



P-67 : Prediction of Enzyme Specificity Based on Structural MNA Descriptors

Andrey Fomenko; Russian Academy of Medical Science, Moscow, RU
D.A. Filimonov, B. N. Sobolev, and V. V. Poroikov, Russian Academy of Medical Science

The EC (Enzyme Commission) classification of enzymes is based on the biochemical specificity of enzymes. Predicting a protein's position in the EC classification could give useful information, such as the type of chemical reaction and the preferred substrate for the protein. Many authors have investigated the relationship between the EC enzyme classification and protein similarity (Thornton et al., 1999; Orengo et al., 1999; Devos and Valencia, 2000; Todd et al., 2001). It has been concluded that the first three figures of the EC number can be predicted with about 95% accuracy for sequences with a sequence identity of about 30% (Todd et al., 2001). We introduce a new approach that predicts enzyme specificity with higher accuracy than any other method known to us in this area.

An amino acid sequence is usually represented as a string of characters. This model describes chemical structure only indirectly, as a function of the frequency of amino acid residues at each alignment position. We propose a new description of the amino acid sequence as a set of structural MNA descriptors. MNA (Multilevel Neighbourhoods of Atoms) descriptors were developed earlier (Filimonov et al., 1999) and are used successfully in the PASS program for predicting the biological activity spectra of pharmaceutical substances (Poroikov and Filimonov, 2005).

We performed a case study on a sample of serine proteases, a large and well-studied group of enzymes. In total we took 619 sequences from the ENZYME database (Bairoch, 2000), which belong to 57 different EC numbers. We calculated a set of MNA descriptors for each sequence in the data set, using MNA descriptors of levels one to twelve. We used B-statistics, described in an earlier paper (Borodina et al., 2004), to identify the group (EC number) to which a query sequence belongs. We calculated B-statistics for each sequence using leave-one-out cross-validation. To compare our approach with the traditional description of amino acid sequences, we calculated for each sequence the set of peptides with one to eight amino acid residues, and calculated B-statistics for these sets of peptides in the manner described above.
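
The peptide baseline and the leave-one-out loop can be sketched as follows. B-statistics is not reproduced here; as a loudly labelled stand-in, each held-out sequence simply receives the label of its nearest neighbour by Jaccard similarity of peptide sets. The function names and the toy sequences are hypothetical.

```python
def peptide_set(sequence, max_len=8):
    """All contiguous peptides of length 1..max_len found in `sequence`."""
    return {sequence[i:i + k]
            for k in range(1, max_len + 1)
            for i in range(len(sequence) - k + 1)}

def jaccard(a, b):
    return len(a & b) / len(a | b)

def leave_one_out_accuracy(sequences, labels, max_len=8):
    """Stand-in classifier (NOT B-statistics): nearest neighbour by Jaccard."""
    sets = [peptide_set(s, max_len) for s in sequences]
    correct = 0
    for i, query in enumerate(sets):
        # Hold out sequence i and look for its most similar remaining sequence.
        best = max((j for j in range(len(sets)) if j != i),
                   key=lambda j: jaccard(query, sets[j]))
        correct += labels[best] == labels[i]
    return correct / len(sequences)

# Toy data: two made-up groups of short sequences.
toy_sequences = ["AAAAGG", "AAAAGC", "TTTTCC", "TTTTCA"]
toy_labels = [1, 1, 2, 2]
print(leave_one_out_accuracy(toy_sequences, toy_labels))
```

Replacing `peptide_set` with a function that returns MNA descriptors of a given level would give the corresponding loop for the structural description.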

We obtained the best average prediction accuracy, 0.999, for the 9th, 10th and 11th levels of MNA descriptors. For peptides, the best average accuracy, 0.97, was obtained at length 3.

Thus our approach, based on structural MNA descriptors, can be applied to the prediction of protein specificity. Using this approach we achieve high-accuracy prediction of enzyme specificity up to the fourth figure of the EC number, which could not be done before.

This work is supported by the Russian Foundation for Basic Research, grant N 04-04-49390.

  1. Bairoch A. (2000) The ENZYME database in 2000. Nucleic Acids Res., 28:304-305.
  2. Borodina Y, Rudik A, Filimonov D, Kharchevnikova N, Dmitriev A, Blinova V, Poroikov V. (2004) A new statistical approach to predicting aromatic hydroxylation sites. Comparison with model-based approaches. J Chem Inf Comput Sci., 44,1998-2009.
  3. Devos D., Valencia A. (2000) Practical limits of functional prediction. Proteins: Structure, Function, and Genetics, 41, 98-107.
  4. Orengo C.A., Todd A.E., Thornton J.M. (1999) From protein structure to function. Current Opinion in Structural Biology, 9, 374-382.
  5. Poroikov V., Filimonov D. (2005). PASS: Prediction of Biological Activity Spectra for Substances. In: Predictive Toxicology. Ed. by Christoph Helma. N.Y.: Marcel Dekker, 459-478.
  6. Thornton J.M., Orengo C.A., Todd A.E., Pearl F.M. (1999) Protein folds, functions and evolution. J. Mol. Biol., 293, 333-342.
  7. Todd A.E., Orengo C.A., Thornton J.M. (2001) Evolution of function in protein superfamilies, from a structural perspective. J. Mol. Biol., 307, 1113-1143.



P-69 : Construction of a System Predicting Hydration Rates of Toxic Substrates in the Environmental Conditions

Yutaka Ikenaga; Yamaguchi University, Ube, JP
Kenzi Hori, Yamaguchi University

There is a growing need to investigate the decomposition of toxic substances released into nature in order to avoid environmental damage. However, it is impossible to measure decomposition rates for all such compounds because their number is enormous. Theoretical calculations can play an important role in predicting whether a toxic compound decomposes easily under environmental conditions, because calculated activation energies (Ea) are closely related to the decomposition rates of substrates. The calculated Ea values should be correlated with experimental ones and then used to predict whether a substrate will decompose under environmental conditions; for example, a reaction with Ea above 50 kcal mol-1 does not proceed in rivers, lakes or seas. To test this concept, we chose esters, for which measurement of decomposition rates is required by law in Japan. Many esters are toxic and widely used in industry. The mechanism of ester hydrolysis under acidic conditions was investigated at the B3LYP/6-31G* level of theory. The DFT calculations did not locate any of the tetrahedral intermediates that many textbooks adopt as key intermediates. We propose an alternative mechanism with an activation barrier of 31.0 kcal mol-1. We are also measuring Ea values for several esters in order to correlate them with the calculated ones. We will use this correlation to construct a system that predicts how easily esters decompose under environmental conditions.
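
To give a feel for how strongly the rate depends on the barrier, the following sketch converts a barrier into a first-order rate constant with an Eyring-type expression, k = (kB T / h) exp(-Ea / RT). This is a rough illustration, not the system described above: it assumes the calculated barrier can stand in for an activation free energy, a transmission coefficient of 1, a temperature of 25 °C and pseudo-first-order kinetics.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant, J/K
H = 6.62607015e-34    # Planck constant, J s
R = 1.987204e-3       # gas constant, kcal/(mol K)

def rate_constant(ea_kcal, temperature=298.15):
    """Rate constant (1/s); assumes Ea is used like an activation free energy."""
    return (K_B * temperature / H) * math.exp(-ea_kcal / (R * temperature))

def half_life_years(ea_kcal, temperature=298.15):
    """Half-life of a first-order decay with the rate constant above."""
    seconds = math.log(2) / rate_constant(ea_kcal, temperature)
    return seconds / (365.25 * 24 * 3600)

for ea in (20.0, 31.0, 50.0):
    print(f"Ea = {ea:4.1f} kcal/mol: k = {rate_constant(ea):.2e} 1/s, "
          f"half-life = {half_life_years(ea):.2e} years")
```

Under these assumptions each additional 1.4 kcal mol-1 or so lowers the rate by about a factor of ten at room temperature, which is why a barrier near 50 kcal mol-1 corresponds to a reaction that effectively never proceeds.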



P-71 : Structure-Based Design of Potential Novel Inhibitors of FGFR and VEGFR Tyrosine Kinase as Anti-Angiogenesis Agents

Naparat Kammasud; Mahidol University & Centre Universitaire, Orsay Cedex, FR
Isabelle André and David S.Grierson, Centre Universitaire
Opa Vajragupta, Mahidol University

Tumor angiogenesis is often the consequence of an angiogenic imbalance in which pro-angiogenic factors predominate over anti-angiogenic factors. Many growth factors and cytokines mediate cellular signaling through the activation of tyrosine kinases (TKs). In the area of cancer, receptor tyrosine kinases (RTKs) play an important role in the process of tumor growth, metastasis development, and angiogenesis.  Thus, the search for small molecules modulating the biological activity of such enzymes is of special relevance to develop new potential therapeutic agents.
Recently, two indolinones developed by SUGEN Inc., SU4984 and SU5402 (Figure 1), were reported to exhibit moderate activity against the Fibroblast Growth Factor Receptor-1 tyrosine kinase (IC50s of 10-20 µM); only SU5402 displayed selective activity [1].


Figure 1: Indolinone inhibitors of Fibroblast Growth Factor Receptor-1 tyrosine kinase.

These inhibitors were co-crystallized with FGFR-1 tyrosine kinase [1]. Based on this structural information, SU5402 was used as a starting point for the conception of new families of inhibitors that can potentially display enhanced potency and selectivity.

Some earlier reports [2] have suggested that modifications at position 6 of SU5402 derivatives could increase the potency as well as the selectivity against VEGFR-2 and FGFR-1 tyrosine kinases. We report here the synthesis of new 6-substituted oxindole derivatives that will allow us to further explore the Structure-Activity Relationship (SAR) at this position.

In addition, we have used the information available on these derivatives to help identify alternate scaffolds by means of molecular modeling.  In our strategy, we have generated focused “virtual” chemical libraries around the oxindole scaffold, varying e.g. the nature of the substituents at different chemically accessible positions (4-6, or 3’-5’) of the indolinone core, replacing the pyrrole ring by a new ring, or breaking the indolinone ring (Figure 1).  The virtual chemical libraries generated were then screened “in silico” using the FGFR-1 tyrosine kinase crystallographic structure in order to identify new small molecules that could potentially inhibit FGFR-1 and other highly homologous RTKs such as VEGFR-2. These studies were used as a guide in our chemistry efforts.
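
The enumeration step of such a focused virtual library can be sketched as a Cartesian product over the substituents allowed at each position. The position labels and substituent lists below are hypothetical placeholders, not the sets used in this work, and no structures are actually built or docked.

```python
from itertools import product

# Hypothetical substituent choices at three positions of the indolinone core.
SUBSTITUENTS = {
    "R4": ["H", "F"],
    "R5": ["H", "Cl", "OMe"],
    "R6": ["H", "Ph", "COOH", "NH2"],
}

def enumerate_library(substituents):
    """Yield one dict per virtual compound, mapping position -> substituent."""
    positions = sorted(substituents)
    for combo in product(*(substituents[p] for p in positions)):
        yield dict(zip(positions, combo))

library = list(enumerate_library(SUBSTITUENTS))
print(len(library), library[0])
```

Each entry would then be converted into a 3D structure and passed to the in silico screen against the FGFR-1 crystallographic structure.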

  1. M. Mohammadi et al., Science 276 (1997), 955-960.
  2. L. Sun et al., J. Med. Chem. 42 (1999) 5120-5130.

This work was partially supported by the Royal Golden Jubilee Ph.D. Program, Thailand.



P-73 : Linking the Real and Predictive Worlds: A Conceptual Model of Chemical Information

Chris Marshall; AstraZeneca, Macclesfield, GB

The purpose of prediction is to provide insight into the likely properties and behaviours of real world materials. But real world materials aren't always as accurately characterised as we would like.  As part of the merger of Astra and Zeneca we have developed a conceptual model of chemical information based on the idea of independent but related container, sample and compound entities.  The model is part of a project extending to cover the whole of the pharmaceutical Discovery process - the Discovery Information Model. In this paper I will show how these entities not only help to manage compound registration but also set the framework for bringing together virtual and real properties. By understanding the features of the entities and appropriately assigning them we are able to distinguish and relate experimental and predicted data in a way which takes into account our evolving understanding of a sample's chemical content.
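
A minimal sketch of how independent but related container, sample and compound entities might be represented is given below. The class and field names, the identifiers and the property values are all illustrative assumptions and are not taken from the Discovery Information Model itself.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Compound:
    """An idealised chemical structure; predicted (virtual) properties live here."""
    compound_id: str
    structure: str  # e.g. a SMILES string
    predicted: Dict[str, float] = field(default_factory=dict)

@dataclass
class Sample:
    """A real material; measured properties live here."""
    sample_id: str
    compound: Optional[Compound] = None  # current belief about its content
    measured: Dict[str, float] = field(default_factory=dict)

@dataclass
class Container:
    """A physical vessel holding a sample."""
    barcode: str
    sample: Sample

def measured_vs_predicted(sample, prop):
    """Return (measured, predicted) for a property; None where unavailable."""
    predicted = sample.compound.predicted.get(prop) if sample.compound else None
    return sample.measured.get(prop), predicted

# Invented example values, for illustration only.
aspirin = Compound("CMP-1", "CC(=O)Oc1ccccc1C(=O)O", {"logP": 1.2})
salicylic = Compound("CMP-2", "OC(=O)c1ccccc1O", {"logP": 2.3})
sample = Sample("SMP-7", aspirin, {"logP": 2.2})
vial = Container("BC-0001", sample)

before = measured_vs_predicted(vial.sample, "logP")
sample.compound = salicylic  # understanding of the sample's content is revised
after = measured_vs_predicted(vial.sample, "logP")
print(before, after)
```

Because the measurement belongs to the sample and the prediction to the compound, revising the assignment changes which prediction is compared with the experiment without touching the experimental record.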



P-75 : Calculation of Physicochemical Descriptors Based on a new Structure Representation

Jörg Marusczyk; Universitaet Erlangen-Nuernberg, Erlangen, DE
Thomas Kleinöder, Achim Herwig, and Johann Gasteiger, Universitaet Erlangen-Nuernberg

The handling of chemical structures, their input from and output to physical media, and the calculation of molecular descriptors for in silico predictions are the core of Chemoinformatics. Traditionally, software systems developed and used in the field of Chemoinformatics process and store chemical structures as connection tables. Such a representation, based on valence bond theory, entails certain problems: a compound may be written in different resonance structures which all denote the same molecule. Nitrobenzene, for example, can be found in databases in two different ionic forms and even with a pentavalent nitrogen. On the other hand, a connection table is always the same for different electronic states of a molecule, which have distinct physicochemical properties; for example, there is no way to specify a carbene in the singlet state.

In order to overcome some of these problems, we developed a structure representation based on the ideas of the Hückel molecular orbital theory, namely the sigma/pi separation [1]. The sigma framework of a molecule consists of two-center two-electron sigma systems, as in a traditional connection table. Pi electrons can constitute larger electron systems spanning more than two atoms and can also include lone pairs and radical electrons. This representation scheme is called RAMSES (Representation Architecture for Molecular Structures as Electron Systems) and is part of a comprehensive C++ toolkit library called MOSES (MOlecular Structure Encoding System) that was recently developed in our group.
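
The idea of describing a molecule as a list of electron systems can be illustrated with a nitro group. The sketch below is written in Python for brevity and does not reflect the actual RAMSES or MOSES data structures; the class name, the `kind` labels and the treatment of the remaining oxygen lone pairs as one-center systems are assumptions.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class ElectronSystem:
    kind: str                # "sigma", "pi" or "lone_pair" (labels assumed here)
    atoms: Tuple[str, ...]   # atoms the electrons are spread over
    electrons: int

def nitromethane():
    """CH3NO2 written as electron systems instead of a resonance structure."""
    bonds = [("C", "N"), ("N", "O1"), ("N", "O2"),
             ("C", "H1"), ("C", "H2"), ("C", "H3")]
    sigma = [ElectronSystem("sigma", pair, 2) for pair in bonds]
    # One pi system delocalised over O-N-O holds four electrons.
    pi = [ElectronSystem("pi", ("O1", "N", "O2"), 4)]
    # Assumption: the remaining in-plane oxygen lone pairs are kept separate.
    lone = [ElectronSystem("lone_pair", (o,), 2) for o in ("O1", "O1", "O2", "O2")]
    return sigma + pi + lone

systems = nitromethane()
total_electrons = sum(s.electrons for s in systems)
print(total_electrons)  # should equal the valence electron count of CH3NO2
```

Both ionic resonance forms and the pentavalent-nitrogen drawing of the nitro group correspond to this single description, since no formal charges or double-bond positions have to be chosen.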

Based on the RAMSES representation described above, we developed a new model for the calculation of atomic partial charges that makes use of the sigma/pi separation. The sigma charge distribution is quantified by a modified version of the well established Partial Equalization of Orbital Electronegativities (PEOE) [2]. For the pi charge distribution a modified Hückel calculation is used. Both calculation schemes were calibrated with charges from natural population analysis [3] of DFT wave functions for both uncharged and charged compounds. For a wide range of organic compounds a very good correlation can be found. Further, we extend the model for the charges to calculate the resonance energies of charged molecules. In future, we plan to use the combination of atomic partial charges and resonance energies in the field of reaction prediction and evaluation.

  1. S. Bauerschmidt, J. Gasteiger, J. Chem. Inf. Comput. Sci. 1997, 37, 705-714.
  2. J. Gasteiger, M. Marsili, Tetrahedron 1980, 36, 3219-3228.
  3. A. E. Reed, F. Weinhold, J. Chem. Phys. 1983, 78, 4066-4073.



P-77 : Drug Design, Chemoinformatics and Public Web Services with Very Large Databases

Marc Nicklaus; National Institutes of Health, Bethesda, MD, US
Markus Sitzmann and Igor V. Filippov, National Institutes of Health
Wolf-Dietrich Ihlenfeldt, Xemistry GmbH

We report on the newest versions of the tools and resources used in the drug design and in silico screening work of the CADD Group at LMC, CCR, NCI. Many of these resources are implemented in the form of web services, and most of these are made freely available to the public.  We present web-based search interfaces for databases with millions of compounds using a search engine operating in distributed mode across a Linux cluster. Many of these databases including multi-million collections of commercial screening samples, as well as data sets from various U.S. Government agencies, are being made publicly available. We present new automated tools for generating such web services as well as new calculable CACTVS hash code-based identifiers useful for rapid compound identification and database overlap analyses. We also briefly mention other chemoinformatics type services and tools available on our server at
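
The use of hash-based identifiers for database overlap analysis can be sketched with set operations. The identifier below is a truncated SHA-1 of a structure string and is only a stand-in: it is not the CACTVS hash code, and the input strings are assumed to be already canonicalised so that identical compounds give identical strings.

```python
import hashlib

def structure_id(canonical_structure):
    """Stand-in identifier: first 16 hex digits of SHA-1 (not a CACTVS hash)."""
    digest = hashlib.sha1(canonical_structure.encode("utf-8")).hexdigest()
    return digest[:16]

def overlap(db_a, db_b):
    """Return (shared, unique in A, unique in B) counts of distinct compounds."""
    ids_a = {structure_id(s) for s in db_a}
    ids_b = {structure_id(s) for s in db_b}
    return len(ids_a & ids_b), len(ids_a), len(ids_b)

# Toy databases of assumed-canonical SMILES strings.
vendor = ["CCO", "c1ccccc1", "CC(=O)O"]
agency = ["c1ccccc1", "CCN", "CCO", "CCO"]
print(overlap(vendor, agency))
```

Comparing fixed-length identifiers avoids any structure matching at query time, which is what makes the approach attractive for collections with millions of entries.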



P-79 : QSAR Analysis for Infinite Dilution Activity Coefficients of Organic Compounds Using a CODESSA PRO Software

Kaido Tämm; University of Tartu, Tartu, EE
Peeter Burk, University of Tartu

A quantitative structure-activity relationship (QSAR) study of the infinite dilution activity coefficients for a set of 38 organic compounds in ionic liquids, such as 1-methyl-3-ethylimidazolium bis((trifluoromethyl)sulfonyl)imide ([emim][N(Tf)2]), 1,2-dimethyl-3-ethylimidazolium bis((trifluoromethyl)sulfonyl)imide ([em2im][N(Tf)2]), and 4-methyl-N-butylpyridinium tetrafluoroborate ([bmpy][BF4]), provided general three-parameter QSAR models. The QSAR study was carried out using the CODESSA PRO program. Three orthogonal theoretical molecular descriptors correlate satisfactorily with the activity coefficients. The descriptors, such as the complementary information content, the fractional partial negative surface area and the count of hydrogen donor sites, directly describe the dilution mechanism in ionic liquids.



P-81 : A Neural Network Application in Multi-Target QSAR

Pierre-Jean L'Heureux; Universite de Montreal, Montreal, CA
Olivier Delalleau, Dumitru Erhan, Yoshua Bengio, and Shi Yi Yue, Universite de Montreal

Building a QSAR model for a new target for which little screening data is available is a daunting task. However, the new target may be part of a larger family for which plenty of screening data is available. We introduce a neural network architecture based on collaborative filtering that can use family information to produce a predictive model of an undersampled target. We show its performance on compound prioritization for an HTS campaign.



P-83 : The Quest for Bioisosteric Replacements

Jos Lommerse; NV Organon, Oss, NL
Markus Wagener, NV Organon

It is a major challenge to convert a compound resulting from lead finding activities into a successful drug. Whereas the initial lead compound may already bind with high affinity to the biological target, it will usually have some undesirable characteristics regarding oral bioavailability, metabolic stability, selectivity and/or toxicity.

One strategy to address these issues and to convert a lead compound into a development candidate is based on the concept of bioisosterism [1]: Structurally related compounds that both elicit the same biological activity are considered as bioisosters. A bioisosteric replacement transforms an active compound into another compound by exchanging a group of atoms with another, broadly similar group of atoms. The resulting new compound still has the original biological activity while improving on the undesirable characteristics.

We report a method that suggests potential bioisosteric replacements based on a topological pharmacophore description of the fragments. Based on that description, databases of R-groups, linkers and cores are searched for the most promising replacements. In order to focus the search on an improved ADMET profile, a number of search constraints (e.g. lipophilicity, flexibility, acidity/basicity) can be imposed. The method has been implemented as IBIS (Intranet BioIsoster Search) at Organon.
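
A toy version of such a search is sketched below. It is not the IBIS implementation: fragments are described by hypothetical counts of (feature, feature, topological distance) triplets, compared with a count-based Tanimoto coefficient, and filtered by a lipophilicity constraint. All descriptors and logP values are invented for illustration.

```python
from collections import Counter

def similarity(a, b):
    """Tanimoto similarity between two count-based descriptors."""
    shared = sum((a & b).values())   # element-wise minimum of counts
    total = sum((a | b).values())    # element-wise maximum of counts
    return shared / total if total else 1.0

def search(query, database, max_logp=None, top=3):
    """Rank fragments by similarity to `query`, skipping those above max_logp."""
    hits = []
    for name, (descriptor, logp) in database.items():
        if max_logp is not None and logp > max_logp:
            continue
        hits.append((similarity(query, descriptor), name))
    return sorted(hits, reverse=True)[:top]

# Invented descriptors: A = acceptor, D = donor, N = negative, H = hydrophobe.
carboxylic_acid = Counter({("A", "D", 2): 1, ("A", "A", 2): 1, ("N", "A", 1): 2})
fragments = {
    "tetrazole": (Counter({("A", "D", 2): 1, ("A", "A", 2): 2, ("N", "A", 1): 2}), 0.3),
    "methyl ester": (Counter({("A", "A", 2): 1}), 1.0),
    "phenyl": (Counter({("H", "H", 1): 3}), 2.1),
}
ranked = search(carboxylic_acid, fragments, max_logp=1.5)
print(ranked)
```

Further constraints such as flexibility or acidity/basicity could be added as extra filters in the same loop.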

The topological pharmacophore description has been validated using the BioSter database [2,3], a database that collects examples of bioisosteric compounds from the literature. Several thousand pairs of bioisosteric fragments have been extracted from that database using unbiased criteria. Comparison of the true pairs of bioisosteric R-groups, linkers and cores from the BioSter database with random pairs confirmed the validity of the approach.

Several examples will be given which show the type of suggestions achievable with IBIS underlining the usefulness of the approach.

  1. Patani GA, LaVoie EJ. Bioisosterism: A Rational Approach. Chem. Rev. 1996; 96: 3147-76.
  2. Ujvary I. BIOSTER- A database of Structurally Analogous Compounds. Pestic. Sci. 1997; 51: 92-5.
  3. The BIOSTER database is available from Accelrys Inc. at

Keywords: bioisosters, topological pharmacophore fingerprint, descriptor validation



P-85 : Similarity Searching Using Molecular Interaction Fields

Kirstin Moffat; University of Sheffield, Sheffield, GB
Val Gillet, University of Sheffield
Gianpaolo Bravi and Andrew Leach, GlaxoSmithKline

Similarity searching is a valuable tool for aiding in the identification of possible lead compounds in drug discovery. Methods based on 3D field descriptors can be of particular use, since, unlike 2D methods, they do not consider molecular frameworks, but instead consider characteristics such as the overall shape, electrostatics and hydrophobicity of the molecules. These methods can therefore give a more representative view of what is “seen” by the receptor to which a compound will bind.

Field-based similarity searching methods have generally been based on atom-centred fields; for example, the Field Based Similarity Searcher (FBSS) program calculates similarity based on steric, electrostatic or hydrophobic fields or any combination thereof (Thorner et al. (1996), Wild & Willett (1996)).  The fields are represented by atom-centred gaussians and a genetic algorithm (GA) is used to find an alignment that maximises the similarity between two molecules using the Carbo coefficient applied to the gaussian representations.
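
The Carbo coefficient is the overlap of two fields normalised by their self-overlaps. The sketch below evaluates it on fields sampled at the same grid points rather than on analytic gaussian integrals, which is a simplification; the function name is an assumption.

```python
import math

def carbo(field_a, field_b):
    """Carbo coefficient of two fields sampled at identical grid points."""
    overlap = sum(a * b for a, b in zip(field_a, field_b))
    norm = math.sqrt(sum(a * a for a in field_a) * sum(b * b for b in field_b))
    return overlap / norm if norm else 0.0

field = [0.0, -1.5, -3.0, -1.5, 0.0]
print(carbo(field, field))                       # identical fields
print(carbo(field, [2.0 * v for v in field]))    # same shape, doubled magnitude
print(carbo([1.0, 0.0], [0.0, 1.0]))             # no overlap
```

Because of the normalisation the coefficient responds to the shape of the fields and not to their overall magnitude, so a scaled copy of a field scores the same as the field itself.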

The aim of this work is to calculate the similarity between two molecules based on their molecular interaction fields (MIFs) rather than atom-centred fields. The rationale for the approach is that two molecules may exhibit similar interactions with a receptor even when the atoms that give rise to the interactions are in different locations in the active site.  Molecular interaction fields are calculated using the GRID program (Goodford, 1985) which places a molecule within a 3D grid and determines the interaction energy at each grid point between the molecule and a probe atom. The interaction energy at each grid point, xyz, is based on a number of components including the Lennard-Jones potential (E_lj), the electrostatic potential (E_el) and the hydrogen bonding potential (E_hb):

E_xyz = ΣE_lj + ΣE_el + ΣE_hb

Various probe atoms are available and more than one probe can be used at a time. The similarity between two molecules is then calculated from the grid representations using a methodology similar to that used in FBSS. The first step is to derive gaussian approximations of the MIFs. Points of minimum energy are identified in the grid and gaussians are centred at each of the minima. Each gaussian takes the form:

where h is the height of the gaussian, σ is the rate of decay of the gaussian and d_is is the distance between the centre of the gaussian and the point, i, at which the interaction energy is being calculated. A simplex method is then used to optimise the parameters of the gaussians. Finally, a genetic algorithm is used to find the alignment of two molecules that maximises the similarity between the gaussian approximations.

Many different possibilities exist for deriving the gaussian approximations, for example, the number of gaussians and the number of grid points used in the optimisation can be varied, and the grid resolution and the number and type of probes used to calculate the fields can also be varied. The effectiveness of the various representations at approximating the MIFs has been determined by calculating a derived grid from the Gaussian approximations and comparing it with the original grid. Finally, the effectiveness of the method in similarity searching has been determined by comparison with established similarity methods via enrichment plots.
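
The comparison of a derived grid with the original grid can be sketched as follows. The functional form h * exp(-σ * d²) is an assumption consistent with the parameters named in the text (height, rate of decay, distance), and the root-mean-square deviation used as the comparison measure is likewise an assumption; the toy grid is one-dimensional for brevity.

```python
import math

def gaussian_field(point, gaussians):
    """Sum of gaussians at `point`; each gaussian is (centre, h, sigma)."""
    total = 0.0
    for centre, h, sigma in gaussians:
        d2 = sum((p - c) ** 2 for p, c in zip(point, centre))
        total += h * math.exp(-sigma * d2)  # assumed functional form
    return total

def derived_grid(points, gaussians):
    """Field values regenerated from the gaussian approximation."""
    return [gaussian_field(p, gaussians) for p in points]

def rmsd(grid_a, grid_b):
    """Root-mean-square deviation between two grids of equal size."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(grid_a, grid_b)) / len(grid_a))

# Toy example: grid points along one axis, a single energy minimum at x = 1.
points = [(x * 0.5, 0.0, 0.0) for x in range(5)]
original = [-0.2, -1.1, -2.0, -1.3, -0.3]        # made-up interaction energies
approximation = [((1.0, 0.0, 0.0), -2.0, 2.0)]   # (centre, h, sigma)
print(rmsd(original, derived_grid(points, approximation)))
```

A simplex optimiser would adjust h, σ and the centres to reduce this deviation, and the same measure allows different numbers of gaussians or grid resolutions to be compared.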

  1. Goodford (1985). “A Computational-Procedure For Determining Energetically Favorable Binding-Sites On Biologically Important Macromolecules”. Journal of Medicinal Chemistry, 28, 849-857.
  2. Thorner, D.A., Wild, D.J., Willett, P., Wright, P.M. (1996). "Similarity Searching in Files of Three-Dimensional Chemical Structures:  Flexible Field-Based Searching of Molecular Electrostatic Potentials". Journal of Chemical Information and Computer Science, 36, 900-908.
  3. Wild, D.J., Willett, P. (1996).  "Similarity Searching in Files of Three-Dimensional Chemical Structures.  Alignment of Molecular Electrostatic Potential Fields with a Genetic Algorithm". Journal of Chemical Information and Computer Science, 36, 159-167.



P-87 : ChemXtreme: Harvesting Chemical Information From Internet Using Distributed Approach

Muthukumarasamy Karthikeyan; National Chemical Laboratory, Pune, IN
S Krishnan, National Chemical Laboratory

The Internet is a resource containing a large amount of unstructured chemical information. A wealth of scientific information, including experimental data, is available, but it requires data mining and analysis tools. We present ChemXtreme, Java-based software for harvesting chemical information from the Internet using a distributed approach. It provides a novel and secure method of harvesting chemical information from public resources using distributed systems. The technology uses a "searching the search engine" strategy, in which the URLs returned by search engines are analyzed 'word by word' for required patterns of chemical information, and the matches are transformed automatically into a structured format compatible with database operations. The query data from the server is encoded, encrypted and compressed and sent to all 'participating' active clients in the network with Internet connectivity. The data received from the clients after web search and analysis is decompressed, verified and added to the database for data mining and further analysis. Supplying a list of CAS-RNs or chemical names (preferably common names) of the substances, with selective keywords as patterns, returns more accurate and useful data from the URLs returned by the search engines. With a 2 Mbps connection, each client takes 2-3 sec per query to search and 6-7 sec per URL to analyse. More detailed results from case studies involving a search for about 100,000 common CAS-RNs in an MSDS context in a distributed environment will be presented. The biological activity data retrieved for a list of chemical substances can be used directly for QSAR or QSTR analysis in combination with descriptor generators and statistical tools.
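
The pattern-analysis step can be illustrated for CAS Registry Numbers, which carry a check digit that allows false matches to be rejected. The sketch below is written in Python rather than Java and is not taken from ChemXtreme; the function names and the sample text are assumptions, while the check-digit rule (digits weighted 1, 2, 3, ... from the right, summed modulo 10) is the standard CAS scheme.

```python
import re

# 2-7 digits, hyphen, 2 digits, hyphen, 1 check digit.
CAS_PATTERN = re.compile(r"\b(\d{2,7})-(\d{2})-(\d)\b")

def is_valid_cas(cas):
    """Check the format and the check digit of a CAS Registry Number."""
    match = CAS_PATTERN.fullmatch(cas)
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    checksum = sum(int(d) * w for w, d in enumerate(reversed(digits), start=1))
    return checksum % 10 == int(match.group(3))

def extract_cas_numbers(text):
    """Return all syntactically valid CAS numbers found in free text."""
    return [m.group(0) for m in CAS_PATTERN.finditer(text)
            if is_valid_cas(m.group(0))]

page = "MSDS: water (CAS 7732-18-5), ethanol 64-17-5, misprint 1234-56-7."
print(extract_cas_numbers(page))
```

Validating the check digit before storing a hit helps to keep dates, phone numbers and other hyphenated digit strings out of the structured database.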



Last updated 17 April, 2005
