EFFICIENT AND EFFECTIVE GENERATION, STORAGE, AND MANIPULATION OF FULLY FLEXIBLE PHARMACOPHORE MULTIPLETS FOR USE IN COMBINATORIAL LIBRARY DESIGN

Robert Clark1, E. Abrahamian1, P. Fox, Lars Nærum2, H. Thøgersen2; 1Tripos, Inc., St. Louis, MO, USA; 2Novo Nordisk A/S

Generalized formulations of the key interaction points in ligand binding to a specific protein - i.e., pharmacophore hypotheses - play a key role in 3D database searching, making it possible to identify lead compounds which can interact in the same way yet fall outside existing lead series or patent estates. Fully flexible 3D searching has proven particularly effective in this regard. Given this success, it seemed very reasonable to characterize ligands of pharmacological interest in terms of all possible pharmacophores they might present to a potential binding site. Unfortunately, even relatively small and rigid ligands can present a remarkably large number of pharmacophoric patterns, so it is necessary to decompose them into component feature multiplets of manageable size.

Hopes for this approach were buoyed by the successful application of 2D substructural fingerprints, based on small constituent fragments, in diversity analysis and library design. Several groups investigated the use of pharmacophore distance triplets in lead follow-up using tools available in the ChemX software (or in modified versions thereof), in the PDT module in SYBYL, or in analogous software suites developed in-house. The extensive literature available in this area (e.g., [1-6]) makes it clear that triplets of features do not capture information at a high enough level of complexity to be useful. This is perhaps not surprising, given that a pharmacophore query consisting of only three features is rarely specific enough to be useful in database searching.

Pharmacophore quartets have shown considerably more potential, but their behavior has proven somewhat difficult to characterize adequately. This is in part due to their very large size and in part to uncertainty about how pharmacophores presented by different conformations should be consolidated and how pharmacophore fingerprints from different molecules - or sets of molecules - can be meaningfully compared. This is particularly a problem as regards hydrogen bond donor and acceptor extension (site) points, which augment the information contained in the corresponding atomic features in critical ways not captured in "classical" pharmacophore quartet fingerprints.

We will describe a novel compression method that supports storage, manipulation and analysis of large databases of very large pharmacophore triplet and quartet fingerprints without putting undue strain on memory resources. The method is designed to operate fast enough to allow on-the-fly characterization of candidate combinatorial libraries in terms of suitability for follow-up of hits generated by high-throughput screening (HTS), yet is flexible enough to incorporate privileged substructures and extension points as well as atomic features.
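To give a feel for why quartet fingerprints are described as very large, the following Python sketch makes a rough count of the distinct multiplets a fingerprint would need to index. The six feature types and ten distance bins are assumptions chosen purely for illustration (the abstract does not state the settings used), and the helper name multiplet_count is hypothetical. The count ignores the triangle inequality, chirality, and symmetry-equivalent labelings, so it is an order-of-magnitude estimate only.

```python
from math import comb

def multiplet_count(n_feature_types: int, n_distance_bins: int, k: int) -> int:
    """Rough number of distinct k-point pharmacophore multiplets.

    Unordered choices of k feature types (repeats allowed) multiplied by
    one distance bin for each pairwise distance in the k-point figure.
    """
    type_combinations = comb(n_feature_types + k - 1, k)  # multisets of size k
    n_distances = k * (k - 1) // 2                        # pairwise distances
    return type_combinations * n_distance_bins ** n_distances

# Assumed settings, for illustration only
FEATURE_TYPES = 6
DISTANCE_BINS = 10

triplets = multiplet_count(FEATURE_TYPES, DISTANCE_BINS, 3)
quartets = multiplet_count(FEATURE_TYPES, DISTANCE_BINS, 4)
print(f"triplets: {triplets:,}  quartets: {quartets:,}")
```

Under these assumed settings, moving from three to four points raises the count from tens of thousands to over a hundred million, which is consistent with the storage concern raised above.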

1. A.C. Good and I.D. Kuntz; J. Comput.-Aided Mol. Design 1995, 9, 373-379.
2. X. Chen, A. Rusinko, and S.S. Young; J. Chem. Inf. Comput. Sci. 1998, 38, 1054-1062.
3. J.S. Mason, I. Morize, P.R. Menard, D.L. Cheney, C. Hulme and R.F. Labaudiniere; J. Med. Chem. 1999, 42, 3251-3264.
4. M.J. McGregor and S.M. Muskal; J. Chem. Inf. Comput. Sci. 1999, 39, 569-574.
5. H. Matter and T. Pötter; J. Chem. Inf. Comput. Sci. 1999, 39, 1211-1225.
6. J.S. Mason and B.R. Beno; J. Mol. Graphics Mod. 2000, 18, 438-451.
7. S.J. Cato, in: Pharmacophore Perception, Development, and Use in Drug Design (O.F. Güner, ed.). International University Line, La Jolla, 2000; pp. 107-125.
8. M.J. McGregor and S.M. Muskal; J. Chem. Inf. Comput. Sci. 2000, 40, 117-125.


COMPOUND SELECTION FOR VIRTUAL SCREENING APPLICATIONS

Paul Watson, Mike Hartshorn; Astex Technology, Cambridge CB4 0WE, UK

Virtual screening of chemical compounds using protein-ligand docking is increasingly used in the pharmaceutical industry to identify novel lead structures. It is now straightforward to use a docking program like GOLD to dock millions of compounds against a protein structure. The analysis of the results from such a docking run is less straightforward because of inadequacies in the docking methodology and the absence of accurate scoring functions for ranking compounds. This talk will address another important but less glamorous challenge - the problem of constructing, handling and querying compound collections prior to the docking process. Time spent on removing unnecessary compounds or providing focussed sets of compounds saves CPU time for docking jobs and, more importantly, saves analysis time after jobs have been run.

First we describe the construction of ATLAS (Astex Technology Library of Available Substances), a database of commercially available compounds from selected suppliers. ATLAS currently has 2.1 million unique compounds stored in an Oracle database together with calculated properties and information on the suppliers. The informatics associated with building ATLAS, with incrementally updating it, and with handling duplicate entries is discussed. ATLAS can be interactively ‘pre-screened’ via a web-based interface using Oracle data cartridge technology. The pre-screening interface can identify subsets of compounds via logical queries based on 1D information (e.g. restrictions on chemical supplier or molecular weight) and 2D information (e.g. inclusion or exclusion of chemical substructures). Examples of queries and their associated search times will be given.
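The kind of logical pre-screening query described above can be sketched in a few lines of Python. This is an in-memory illustration only, not the Oracle data cartridge implementation: the Compound record, the prescreen function, the supplier names, and the substructure labels are all hypothetical, and substructure matches are assumed to have been computed beforehand.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Compound:
    compound_id: str
    supplier: str
    mol_weight: float
    # Labels of substructures present in the compound (assumed precomputed)
    substructures: frozenset = field(default_factory=frozenset)

def prescreen(compounds, suppliers=None, max_mw=None, require=(), exclude=()):
    """Keep compounds that pass every 1D (supplier, MW) and 2D (substructure) filter."""
    selected = []
    for c in compounds:
        if suppliers is not None and c.supplier not in suppliers:
            continue  # 1D filter: supplier restriction
        if max_mw is not None and c.mol_weight > max_mw:
            continue  # 1D filter: molecular weight ceiling
        if not set(require) <= c.substructures:
            continue  # 2D filter: every required substructure must be present
        if set(exclude) & c.substructures:
            continue  # 2D filter: no excluded substructure may be present
        selected.append(c)
    return selected

# Invented placeholder records
library = [
    Compound("C1", "SupplierA", 250.3, frozenset({"amide"})),
    Compound("C2", "SupplierB", 480.6, frozenset({"amide", "nitro"})),
    Compound("C3", "SupplierA", 310.4, frozenset({"sulfonamide"})),
]
subset = prescreen(library, suppliers={"SupplierA", "SupplierB"}, max_mw=400,
                   require=["amide"], exclude=["nitro"])
print([c.compound_id for c in subset])
```

In a database setting the same conditions would be combined in a single query so that the filtering happens next to the data rather than in application code.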

Virtual libraries provide another interesting source of compounds for docking. We have developed a web-based reaction toolkit that allows the creation of virtual libraries using user-defined chemical reactions. Chemical transformations are submitted to a reaction registry and libraries are enumerated from lists of relevant monomers identified using the pre-screening interface. The enumeration is performed within Oracle via what might be termed a “reaction cartridge”. Output from the cartridge can then be seamlessly fed into the docking procedure or back into the pre-screening interface for further refinement. Examples of the reaction toolkit will be presented.
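Enumerating a virtual library from monomer lists amounts to taking the Cartesian product of the lists and applying the transformation to each combination. The sketch below shows only that combinatorial step, using a string template with symbolic group labels in place of a real reaction definition; the function enumerate_library, the template, and the monomer labels are hypothetical and are not the reaction cartridge described in the abstract.

```python
from itertools import product

def enumerate_library(template: str, monomer_lists: dict) -> list:
    """Fill each named site of `template` with every combination of monomers."""
    sites = sorted(monomer_lists)  # fixed site order for reproducible output
    products = []
    for combo in product(*(monomer_lists[s] for s in sites)):
        products.append(template.format(**dict(zip(sites, combo))))
    return products

# Hypothetical two-component coupling written with symbolic group labels
# (the strings are labels, not a real structure notation).
template = "{acid}-C(=O)N-{amine}"
monomers = {"acid": ["Ph", "Me"], "amine": ["Et", "iPr", "Bn"]}

virtual_library = enumerate_library(template, monomers)
print(len(virtual_library), virtual_library[0])
```

The library size is the product of the monomer list lengths, which is why pre-screening the monomers before enumeration pays off quickly.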


UNDERSTANDING CONTRADICTORY CLAIMS ABOUT LIBRARY DESIGN

Eric Martin, Narasinga Rao; Chiron Corp., Emeryville, CA, USA

There have been several commonly expressed but contradictory claims about combinatorial library design:

  • Selected structures should sample extremes/interior of property space,
  • Property spaces should be high/low dimensional,
  • Active compounds should/should not cluster in diversity space,
  • Algorithms should be optimized for focused/diverse libraries,
  • Computed designs are better/worse than medicinal chemists’ intuition, and
  • The best property spaces are substituent/product based.

These contradictions often lead to confusion and mistakes. How did so many contradictory opinions arise? Which are correct? They arise largely because different practitioners use significantly different approaches for solving significantly different problems. Misunderstandings often arise when we assume that what was appropriate for one approach generalizes to others. These substantially different approaches include the following:

  • Fragment vs. whole-molecule based approaches,
  • 2-D vs. single conformer 3-D vs. conformationally enumerated 3-D properties,
  • Congeneric series vs. combinatorial libraries vs. arbitrary compound collections,
  • Highly automated vs. interactive approaches, and
  • Lead finding vs. lead hopping vs. lead optimization.

The presentation uses several examples to clarify the origins of these misunderstandings, and concludes that each of these seemingly incommensurate claims is sometimes correct, but each for a different kind of library design problem.

One example will be from a structure-based library designed for a protease inhibitor. In this single-substituent library, 1351 candidates were docked using a template-forced distance geometry method. They were scored with a customized scoring function, derived using the interactive Magnet program, that captures the intuition of experienced medicinal chemists. It included terms for molecular mechanics energy, MW, rotatable bonds, buried surface area, and number of atoms in the P4 pocket. The docking results were divided into 4 bins: Best, Fair, Poor, and Bad. Compounds were chosen to maximize diversity while sampling the Best bin most heavily and the inferior bins progressively less. 139 compounds (10% of candidates) were purchased and tested. The results are summarized in Table I.

Table I. Statistics for selecting and testing compounds from each docking bin.

Set         Best   Fair   Poor    Bad     NA   Total
OK-acd        98    388    625    240     --    1351
Design        98    110     13      0     --     221
Screen        49     69      7      3     11     139
%Scrn        50%    18%     1%     1%     --     10%
Hits          15      5      1      0      0      21
Hit/scrn     31%     7%    14%     0%     0%     15%

 

Of the Best-bin compounds that were screened, 31% were hits, compared with only 7% of the Fair bin, 1 compound from the Poor bin, and no compounds from the Bad bin. This demonstrates that the docking did successfully predict the order of activity. However, the best hits were only twice as active as the initial lead, well short of the 10-fold activity increase required by the project. Because of the success of the docking-based predictions and the graded diversity sampling, it was clear after synthesizing and testing only 10% of the candidates that this approach was unlikely to achieve the goal, so the approach was “rationally terminated”. If the design had included only the focused docking component, it would remain unclear whether all types of reasonable candidates had been sampled. If it had included only the diversity component, 10% sampling would not have been sufficient to terminate the approach. This example illustrates a general principle: most library designs benefit from a combination of targeted and diversity components. It also shows that library design should not compete with medicinal chemistry intuition, but rather should explicitly incorporate it.
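The percentage rows of Table I can be reproduced directly from the count rows, which is a quick way to check the graded sampling and the per-bin hit rates. The short Python sketch below uses only the counts given in Table I; the variable and function names are chosen here for illustration.

```python
# Counts copied from Table I (NA: screened compounds with no docking bin).
candidates = {"Best": 98, "Fair": 388, "Poor": 625, "Bad": 240}
screened = {"Best": 49, "Fair": 69, "Poor": 7, "Bad": 3, "NA": 11}
hits = {"Best": 15, "Fair": 5, "Poor": 1, "Bad": 0, "NA": 0}

def percent(numerator: int, denominator: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)

# %Scrn row: fraction of each bin's candidates that was screened
sampling_rate = {b: percent(screened[b], candidates[b]) for b in candidates}
# Hit/scrn row: fraction of the screened compounds in each bin that were hits
hit_rate = {b: percent(hits[b], screened[b]) for b in screened}
overall_hit_rate = percent(sum(hits.values()), sum(screened.values()))

for b in screened:
    print(f"{b:5s} screened={screened[b]:3d} hits={hits[b]:2d} hit rate={hit_rate[b]}%")
print(f"overall hit rate={overall_hit_rate}%")
```

The Poor-bin figure of 14% rests on a single hit among seven screened compounds, so it carries far less weight than the Best and Fair rates.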

A second example addresses the debate between methods that sample the extremes of property space, such as D-optimal design, and methods that sample the interior of the space, such as grid-based methods or exclusion sphere methods. This disagreement arises from 2 sources: pure diversity designs vs. property-biased designs, and substituent designs vs. whole molecule designs. Distributions of similarities were studied for 4 Daylight fingerprint-based libraries: a 20-substituent D-optimal design, a 20-substituent ADME-property constrained D-optimal design, and the 8,000 enumerated products from the 3-site N-substituted glycine “peptoid” libraries built from those 2 designs. D-optimal design did indeed separate the substituents so that all similarities were less than 0.5, a very spread out “extreme” design. However, these are just fragments of the final product structures. Each product molecule has 57 neighbors in the 8000-member library sharing 2 of 3 substituents (and, of course, the template). Each product also has 1083 neighbors sharing 1 common substituent. In fact, in these “overly extreme” substituent-based D-optimal designs, every single member of the enumerated product libraries had at least one near neighbor more than 0.9 similar, and dozens with similarity above 0.8. Table 2 contains distributions of the pair-wise Tanimoto similarities for these libraries. There are no close similarities among the substituents, and a good range of distances among the enumerated products, with a few percent similar enough to expect similar activity, indicating widely spaced but continuous coverage of the property space.

Table 2. Similarity Distributions for 20-Substituent Trimer Library Designs.

               20 Substituents         8000 Enumerated Products
Similarity     D-optimal     ADME      D-optimal     ADME
0.9                   0%       0%             0%       0%
0.8                   0%       0%             1%       1%
0.7                   0%       0%             3%       2%
0.6                   0%       0%             5%       5%
0.5                   1%       1%             8%      10%
0.4                   3%       2%            16%      21%
0.3                   7%       7%            38%      38%
0.2                  16%      23%            27%      20%
0.1                  40%      29%             3%       3%
0                    33%      38%             0%       0%
Total               100%     100%           100%     100%
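The neighbor counts quoted for the 8000-member trimer library follow from simple combinatorics, assuming the same 20 substituents are available at each of the 3 sites (consistent with 20 x 20 x 20 = 8000 products). The Python sketch below reproduces them; the function name neighbors_sharing is chosen here for illustration.

```python
from math import comb

def neighbors_sharing(n_sites: int, n_substituents: int, n_shared: int) -> int:
    """Number of other products that share exactly `n_shared` substituent positions."""
    n_differing = n_sites - n_shared
    # Choose which positions are shared, then pick a different substituent
    # at every remaining position.
    return comb(n_sites, n_shared) * (n_substituents - 1) ** n_differing

library_size = 20 ** 3
share_two = neighbors_sharing(3, 20, 2)   # differ at one site only
share_one = neighbors_sharing(3, 20, 1)   # differ at two sites
share_none = neighbors_sharing(3, 20, 0)  # differ at all three sites

print(library_size, share_two, share_one, share_none)
```

Together with the product itself, the three neighbor classes account for the whole library, which is a useful consistency check on the counts.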

 

A thought experiment in Figure 1 clarifies a third common misunderstanding with important implications for descriptor validation studies. It arises from incorrectly assuming that similarity and diversity are simple “opposites”. Similarity is a 3-D property of 2 or more specific bioactive conformations intended to cluster activity for QSAR. Diversity is a “4-D” property of entire ensembles of conformations for a designed set of molecules meant to quantify pharmacophore coverage. Most diversity descriptor validation methods assume incorrectly that active compounds should cluster in 4-D diversity space. For rigid compounds, activity clustering should occur; proximal rigid compounds in diversity space should have similar biological profiles. Flexible compounds are different.

There are two possible property spaces for flexible compounds. In a 3-D “similarity” property space (fig. 1a) each point represents a single 3-D conformation, so each flexible compound is represented by a collection of points. In a 4-D “diversity” space (fig. 1b), each point represents the entire ensemble of a compound’s accessible 3-D conformations, and distance reflects the overlap of similar conformers between two compounds. In the former case, every active compound should “cluster” to the extent that one point (conformation) from each active molecule should be in an active region of space (fig. 1a). This is the basis of “active-analog” approaches to pharmacophore identification. However, most clustering or near neighbor based diversity descriptor validation studies explicitly or implicitly use the latter approach, where each (flexible) molecule is represented by a single point in diversity space. 2-D or single conformer 3-D descriptors ignore flexibility, so activity should not cluster except in special cases where either the bioactive conformation is used, or where similar structures are constrained to have similar conformational ensembles.

However, some diversity descriptors, such as molecular pharmacophore fingerprints, combine the properties of a compound’s whole ensemble of conformations into a single point in diversity space (see fig. 1b). In this very useful diversity approach, designs with widely spaced points have very different ensembles of conformations, and are therefore both diverse for efficient coverage, and orthogonal for efficiently decoding the active analogs. These are excellent diversity descriptors for library design. They do not generally exhibit activity clustering for any given target, because diversity space distances are dominated by the generally much larger number of inactive conformations (see fig. 1b). Pharmacophore fingerprints have fared poorly in cluster-based validation studies, not because they are poor diversity descriptors, but because of a poorly conceived diversity space validation hypothesis.

Figure 1. Two ways to represent 4 diverse but active compounds. Activity should cluster in the conformationally expanded 3-D “similarity space” where each point represents a single conformation (1a), but not in the 4-D “Diversity Space” (1b) where each point represents the whole ensemble of conformations. Hence these 4 compounds are both similar (share the active pharmacophore) and diverse (uniquely cover all of property space).
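The thought experiment can be made concrete with a toy calculation. In the Python sketch below, each conformer is represented as a set of pharmacophore bits, two compounds share one active conformation, and every other conformer is unique to its compound. All bit labels and conformer sets are invented for illustration, and the Tanimoto coefficient on sets of on-bits is used as a stand-in for whatever similarity measure a real study would apply.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient of two sets of on-bits."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Toy conformer ensembles. Bits "P1" and "P2" stand for the shared active
# pharmacophore; all other bits belong to inactive conformations.
compound_a = [{"P1", "P2"}, {"a1", "a2"}, {"a3", "a4"}, {"a5", "a6"}]
compound_b = [{"P1", "P2"}, {"b1", "b2"}, {"b3", "b4"}, {"b5", "b6"}]

# 3-D "similarity space": one point per conformer; take the best conformer pair.
best_conformer_match = max(tanimoto(x, y) for x in compound_a for y in compound_b)

# 4-D "diversity space": one point per compound, the union over its ensemble.
ensemble_a = set().union(*compound_a)
ensemble_b = set().union(*compound_b)
ensemble_similarity = tanimoto(ensemble_a, ensemble_b)

print(f"best conformer pair: {best_conformer_match:.2f}")
print(f"whole ensembles:     {ensemble_similarity:.2f}")
```

The two compounds coincide exactly in conformer space at the active conformation, yet their ensemble fingerprints are far apart because the inactive conformers dominate the comparison. That is the behavior the caption describes: similar and diverse at the same time.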

 


A LOOK AT ORGANIC SUBSTITUENTS FROM THE CHEMINFORMATICS POINT OF VIEW

Peter Ertl;  Novartis Pharma AG, CH-4002 Basel, Switzerland

The concept of organic substituents and their influence on molecular properties is one of the pillars of modern organic chemistry, as well as a basis of structure-activity analysis. The pioneering contributions of Hammett on the effect of substituents on reactivity, the reaction mechanism theory of Ingold, the QSAR concept of Hansch, and the ideas of Craig and Topliss may serve as examples. The current boom in combinatorial chemistry is also based on the concept of substituents / building blocks.

The present contribution addresses the concept of substituents from the cheminformatics point of view. The question of the total number of substituents in the known “organic chemistry space” is addressed, based on the results of substituent analysis in several large databases. Implications for diversity analysis and for the possible size of a hypothetical “drug universe” are discussed. Extracted substituents are organized into a large “substituent tree”. Methods to characterize substituents by various calculated properties are described and used to illustrate diversity in the substituent space. These properties include hydrophobicity, easy-to-calculate electronic parameters compatible with the Hammett sigma constants, substituent size, and hydrogen bonding power. The drug-likeness of substituents is explored, based on the analysis of a large collection of substituents extracted from common drugs.

Various possible applications of a large, representative database of drug-like substituents with calculated properties are discussed, including its use in structure-activity analysis, bioisosteric replacement, and rational drug design. As an example, an algorithm for the automatic design of molecules with desired physicochemical properties and pharmacophoric features, based on evolutionary optimization, will be described. The large database of characterized substituents also provides an excellent basis for the support of combinatorial chemistry.

The lecture will be enhanced by demonstrations of several web-based cheminformatics tools developed at Novartis. A program for substituent bioisosteric search, which allows automatic identification of substituents or spacers that are physicochemically compatible with a given target, will be presented, as well as a system for optimal selection of building blocks for the design of targeted combinatorial libraries.
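One simple way to picture a substituent bioisosteric search is as a nearest-neighbor lookup in a space of substituent properties. The Python sketch below ranks substituents by Euclidean distance in a two-descriptor space of Hammett sigma (para) and Hansch pi. It is an illustration under stated assumptions, not the Novartis tool: the function name is hypothetical, the descriptor values are approximate textbook values included only for illustration (check them against a primary tabulation before relying on them), and the tool described in the abstract uses its own calculated properties.

```python
from math import dist

# (Hammett sigma_para, Hansch pi): approximate textbook values, illustration only.
SUBSTITUENTS = {
    "H":    (0.00, 0.00),
    "CH3":  (-0.17, 0.56),
    "F":    (0.06, 0.14),
    "Cl":   (0.23, 0.71),
    "Br":   (0.23, 0.86),
    "CF3":  (0.54, 0.88),
    "OCH3": (-0.27, -0.02),
    "NO2":  (0.78, -0.28),
    "CN":   (0.66, -0.57),
}

def nearest_substituents(query: str, k: int = 3) -> list:
    """Rank the other substituents by distance to `query` in (sigma, pi) space."""
    q = SUBSTITUENTS[query]
    ranked = sorted(
        (dist(q, props), name)
        for name, props in SUBSTITUENTS.items()
        if name != query
    )
    return [name for _, name in ranked[:k]]

print(nearest_substituents("Cl"))
```

A practical search would use more descriptors (size, hydrogen bonding) and would scale each one so that no single property dominates the distance.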


Last updated 05 January, 2005
