'BASIS PRODUCTS' METHOD FOR RAPID PROPERTY CALCULATION AND FILTERING ON VIRTUAL COMBINATORIAL LIBRARIES
Shenghua Shi, Zhengwei Peng, Jaroslaw Kostrowicki, Genevieve Paderes, Atsuo Kuki; Pfizer Global Research & Development - La Jolla / Agouron Pharmaceuticals, Inc., La Jolla, CA 92137, USA
Design of combinatorial libraries with desired physical and/or pharmacological property profiles plays an important role in reducing attrition for accelerated drug discovery. However, virtual combinatorial libraries are often so large that direct structure enumeration and property calculation over the whole library are infeasible at the design stage. In the present work, a ‘basis products’ method is introduced for rapid property calculation and filtering on combinatorial libraries.
The 'basis products' are the products formed from all the reactants for one reaction component in combination with a particular set of simple complementary reactant partners. For a K-component reaction, $P = A_1 + A_2 + \dots + A_K$, the 'basis products' $P_i^*$ for component $i$ are defined as $P_i^* = A_i + \sum_{m \neq i} A_m^*$, where $A_m^*$, $m = 1, 2, \dots, K$, is a particular set of reactants which form a particular product $P^* = A_1^* + A_2^* + \dots + A_K^*$. There is a one-to-one correspondence between the reactants $A_i$ and the 'basis products' $P_i^*$. For an additive or approximately additive calculated property, $Q$, where a certain degree of mutual influence occurs between neighboring fragments, the 'basis products' method states that the property $Q(P)$ for a product $P$ in a combinatorial library can be calculated in terms of the property values of the corresponding 'basis products' and the particular product $P^*$ as $Q(P) = \sum_{i=1}^{K} Q(P_i^*) - (K-1)\,Q(P^*)$. With this method, instead of forming and calculating the property for $\prod_{i=1}^{K} N_i$ products, where $N_i$ is the number of reactants for component $i$, one only needs to form the structures and perform the property calculation for $\sum_{j=1}^{K} N_j$ 'basis products' $P_j^*$ and one special product $P^*$. The gain in efficiency is substantial: the work grows with the sum of the reactant counts rather than with their product.
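The combination rule can be checked on a toy example. The sketch below assumes a strictly additive property in which each reactant contributes a fixed increment, and it uses the rule Q(P) = sum over i of Q(P_i*) minus (K-1) Q(P*). The reactant names, increments, and the helpers `direct` and `via_basis_products` are hypothetical and only serve the illustration.

```python
from itertools import product
from math import prod

# Hypothetical per-reactant increments of an additive property (assumed values).
contrib = {
    "A1": {"a1": 1.2, "a2": 0.4, "a3": 2.0},   # reactants of component 1
    "A2": {"b1": 0.7, "b2": 1.5},              # reactants of component 2
    "A3": {"c1": 0.3, "c2": 0.9, "c3": 1.1},   # reactants of component 3
}
components = list(contrib)
K = len(components)

def direct(reactants):
    """Property computed directly on the enumerated product."""
    return sum(contrib[comp][r] for comp, r in zip(components, reactants))

# Reference partners A_m* (here simply the first reactant of each component).
ref = tuple(next(iter(contrib[comp])) for comp in components)
q_star = direct(ref)                            # Q(P*)

# One basis product per reactant: that reactant plus the reference partners.
basis = {}
for i, comp in enumerate(components):
    for r in contrib[comp]:
        partners = list(ref)
        partners[i] = r
        basis[(i, r)] = direct(partners)        # Q(P_i*)

def via_basis_products(reactants):
    """Property of a product assembled from basis-product values only."""
    return sum(basis[(i, r)] for i, r in enumerate(reactants)) - (K - 1) * q_star

all_products = list(product(*(contrib[comp] for comp in components)))
max_err = max(abs(direct(p) - via_basis_products(p)) for p in all_products)
n_products = prod(len(contrib[comp]) for comp in components)
print(f"{n_products} products from {len(basis)} basis products, max error {max_err:.2e}")
```

For a property that is only approximately additive, the two routes would differ by the non-additive coupling between fragments, so the error printed here would no longer be at rounding level.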
By virtue of the ‘basis products’ method for property calculations, a combinatorial filtering algorithm based on tree sorting is developed for efficiently selecting products with desired properties from a combinatorial library. Consider filtering the virtual products according to restrictions on M positively valued properties $Q(P) = (Q_1(P), Q_2(P), \dots, Q_M(P))$: [inequalities missing in source]. (Note that any property value can be converted to a positive one by a constant shift for filtering purposes.) Then the corresponding properties of the ‘basis products’ for component $j$, with $j = 1, \dots, K-1$, must satisfy the condition [equation missing in source], and for component $K$ the requirement is [equation missing in source]. To solve the M simultaneous equations above efficiently, a simple recursive sorting algorithm is used: at generation $m$, $m = 1, \dots, M$, all the compounds in a node have the same values (within the bin size, for real-valued properties) for properties 1 to $m-1$, and the value of property $m$ is used for sorting. After sorting, the compounds are grouped into bins, which serve as the nodes for the next generation. The tree sorting is carried out on the 'basis products' of each reaction component against the M properties used in filtering. To find the solution to the above equations, one then simply locates the matching bins in the M-th generation. The products satisfying the requirements on the properties are the combinations of the reactants corresponding to the ‘basis products’ in the located bins. The high efficiency of this filtering method stems from the fact that it does not require the calculation of any properties of the numerous virtual products and only searches through a limited number of bins of 'basis products' in a hierarchical way.
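The following is a minimal sketch of bin-based filtering, not the authors' implementation. It rests on assumptions that the abstract does not state: two components (K = 2), integer-valued properties so that each distinct value forms its own bin, range constraints q_min <= Q_l(P) <= q_max, and the additive combination Q(P) = Q(P_1*) + Q(P_2*) - Q(P*). All function names and data are hypothetical.

```python
from bisect import bisect_left, bisect_right
from itertools import product

def build_tree(items, depth=0):
    """Sort (reactant_id, props) pairs on property `depth`, bin them, recurse."""
    if depth == len(items[0][1]):
        return [rid for rid, _ in items]               # leaf bin: reactant ids
    bins = {}
    for rid, props in sorted(items, key=lambda it: it[1][depth]):
        bins.setdefault(props[depth], []).append((rid, props))
    keys = list(bins)                                  # already in sorted order
    return keys, [build_tree(bins[k], depth + 1) for k in keys]

def leaves(node, n_props, path=()):
    """Yield (property vector, reactant ids) for every last-generation bin."""
    if len(path) == n_props:
        yield path, node
        return
    keys, children = node
    for key, child in zip(keys, children):
        yield from leaves(child, n_props, path + (key,))

def collect(node, lo, hi, depth=0):
    """Descend only into bins whose key lies in [lo, hi] at each generation."""
    if depth == len(lo):
        yield from node
        return
    keys, children = node
    start, stop = bisect_left(keys, lo[depth]), bisect_right(keys, hi[depth])
    for child in children[start:stop]:
        yield from collect(child, lo, hi, depth + 1)

def filter_pairs(basis1, basis2, q_star, q_min, q_max):
    """Reactant pairs whose estimated product properties fall in range."""
    m = len(q_star)
    tree1, tree2 = build_tree(basis1), build_tree(basis2)
    hits = []
    for q1, ids1 in leaves(tree1, m):
        # Q(P) = Q(P1*) + Q(P2*) - Q(P*), rearranged into bounds on Q(P2*).
        lo = [q_min[l] - q1[l] + q_star[l] for l in range(m)]
        hi = [q_max[l] - q1[l] + q_star[l] for l in range(m)]
        hits.extend(product(ids1, collect(tree2, lo, hi)))
    return sorted(hits)

# Hypothetical basis-product properties: (reactant id, (property 1, property 2)).
basis1 = [("a1", (3, 5)), ("a2", (4, 7)), ("a3", (3, 5))]
basis2 = [("b1", (2, 4)), ("b2", (5, 6))]
q_star, q_min, q_max = (2, 3), (3, 6), (5, 8)
hits = filter_pairs(basis1, basis2, q_star, q_min, q_max)
print(hits)  # [('a1', 'b1'), ('a2', 'b1'), ('a3', 'b1')]
```

No product property is ever computed for a rejected combination: whole subtrees of the second component are skipped by the range lookup at each generation. For real-valued properties the keys would be bin indices rather than exact values, which makes the result approximate to within the bin size.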
The method is illustrated by calculating product logP values, van der Waals volumes, solvent-accessible surface areas, and other product properties. Good results are obtained in filtering for a number of important molecular properties in a virtual library of 1.5 billion products.
DEVELOPMENT OF PREDICTIVE ADME MODELS AND THEIR USE IN COMBINATORIAL LIBRARY PROFILING AND DESIGN
Robert D Brown, Eric A Jamois, Osman Guner; Accelrys Inc, San Diego, CA, USA
Following the introduction of combinatorial synthesis techniques into the drug discovery process, it was often found that larger numbers of hits were being generated in high-throughput screens than previously, but that these hits were not being successfully translated into medicinal chemistry leads. The design of such libraries had typically considered only the diversity of the reagents or products and had led to compounds that were too large, too highly featured and/or too hydrophobic, making them unsuitable for further development.
The design criteria for combinatorial libraries have therefore evolved from a simple consideration of diversity or similarity to a multi-component optimization involving simultaneous consideration of the coverage of chemical space, the suitability of the compounds for medicinal chemistry development, and the cost and efficiency of their synthesis. This suitability is often measured in terms of a molecule's "drug-likeness" or "lead-likeness", which is in turn based on ranges of key physical properties observed in sets of known leads or drugs (an example being the rules formulated by Lipinski).
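As an illustration of such a property-range filter, the sketch below applies the commonly quoted Lipinski thresholds (molecular weight at most 500, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors). It assumes the property values have already been computed elsewhere; the function names, the tolerance of one violation, and the candidate values are assumptions for the example and are not taken from the abstract.

```python
def lipinski_violations(mol_weight, logp, h_donors, h_acceptors):
    """Count how many of the four Rule-of-Five thresholds are exceeded."""
    return sum([
        mol_weight > 500,
        logp > 5,
        h_donors > 5,
        h_acceptors > 10,
    ])

def is_drug_like(props, max_violations=1):
    """Accept a compound with at most `max_violations` exceeded thresholds."""
    return lipinski_violations(**props) <= max_violations

# Hypothetical precomputed properties for two virtual library members.
candidates = {
    "cmpd_1": dict(mol_weight=342.4, logp=2.1, h_donors=2, h_acceptors=5),
    "cmpd_2": dict(mol_weight=687.9, logp=6.3, h_donors=4, h_acceptors=12),
}
passed = [name for name, props in candidates.items() if is_drug_like(props)]
print(passed)  # ['cmpd_1']
```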
Many drug candidates that successfully pass pre-clinical development ultimately fail in clinical trials due to pharmacokinetic or toxicity problems, and traditionally such properties have only been considered in the later stages of pre-clinical development. It is suggested that a consideration of ADME (absorption, distribution, metabolism and excretion) properties early in the drug discovery process might reduce the failure rate of compounds in later development. This in turn should improve the overall efficiency and reduce the cost of the drug discovery process. Such a consideration will also provide a more accurate assessment of a molecule's potential success as a lead for a medicinal chemistry series than simple property filters alone. Thus, one constraint on the design of combinatorial libraries for lead generation and early-stage lead optimization should be the selection of molecules with favorable ADME properties.
We have developed predictive computational models of a number of ADME properties, including aqueous solubility, human intestinal absorption, blood-brain barrier penetration and protein binding, that are sufficiently fast to be used in a high-throughput mode in library design. This talk will discuss the development and validation of these models. We will then discuss library design algorithms that we have developed, based on Monte Carlo, simulated annealing or genetic algorithm optimization, to allow the selection of subsets of a virtual combinatorial library. These selections can be made based on the simultaneous consideration of:
- Diversity, similarity or coverage of chemical space,
- Computed ADME or other properties of the virtual library members,
- The cost, availability and desirability of reagents, and
- Optionally, the need to design full or sparse arrays or mixtures.
Finally, we will present example library designs. We show that the application of the types of constraints described above can have a minimal impact on the overall diversity or similarity of a library. At the same time, the selected subset of the library should have significantly better ADME properties and a more favorable cost and ease of production.
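A multi-objective subset selection of this kind can be sketched with simulated annealing. The example below is only an illustration under stated assumptions: it picks individual products rather than full or sparse arrays, uses mean pairwise descriptor distance as the diversity term, and combines the terms with equal hypothetical weights. The data are random placeholders, and `score` and `anneal` are names invented for the sketch.

```python
import math
import random

def score(subset, descriptors, adme_penalty, cost, w_div=1.0, w_adme=1.0, w_cost=1.0):
    """Weighted objective: reward diversity, penalize poor ADME and high cost."""
    idx = sorted(subset)
    pairs = [(i, j) for a, i in enumerate(idx) for j in idx[a + 1:]]
    diversity = sum(math.dist(descriptors[i], descriptors[j]) for i, j in pairs) / len(pairs)
    adme = sum(adme_penalty[i] for i in idx) / len(idx)
    expense = sum(cost[i] for i in idx) / len(idx)
    return w_div * diversity - w_adme * adme - w_cost * expense

def anneal(n, k, objective, steps=2000, t0=1.0, seed=0):
    """Select k of n items by swapping one member per step (Metropolis rule)."""
    rng = random.Random(seed)
    current = set(rng.sample(range(n), k))
    cur_score = objective(current)
    best, best_score = set(current), cur_score
    for step in range(steps):
        temp = t0 * (1 - step / steps) + 1e-9          # linear cooling schedule
        out = rng.choice(sorted(current))
        inn = rng.choice([i for i in range(n) if i not in current])
        cand = (current - {out}) | {inn}
        cand_score = objective(cand)
        accept = cand_score >= cur_score or rng.random() < math.exp((cand_score - cur_score) / temp)
        if accept:
            current, cur_score = cand, cand_score
            if cur_score > best_score:
                best, best_score = set(current), cur_score
    return best, best_score

# Placeholder data for 40 virtual products (all values are made up).
data_rng = random.Random(1)
n, k = 40, 8
descriptors = [(data_rng.random(), data_rng.random()) for _ in range(n)]
adme_penalty = [data_rng.random() for _ in range(n)]   # higher = worse predicted ADME
reagent_cost = [data_rng.random() for _ in range(n)]   # higher = more expensive

def objective(subset):
    return score(subset, descriptors, adme_penalty, reagent_cost)

best, best_score = anneal(n, k, objective)
print(sorted(best), round(best_score, 3))
```

Changing the weights shifts the balance between the objectives. Designing full arrays would require moves that swap reagents rather than individual products.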
SELF-ORGANIZING NEURAL NETWORKS IN DRUG DESIGN
Lothar Terfloth, Johann Gasteiger; Computer-Chemie-Centrum, Universität Erlangen-Nürnberg, D-91052 Erlangen, Germany
Despite the almost complete elucidation of the sequence of the human genome, there is still a lack of knowledge with respect to both the coding regions and the three-dimensional structure of the proteins coded therein. Until now, most systems of pharmaceutical interest have therapeutic targets with an unknown three-dimensional structure. Therefore it is a task in rational drug design to find and optimize lead structures for receptors with an unknown three-dimensional structure. Neural networks can be applied to this task on the basis of structure-activity data.
Beyond the classification and prediction of biologically active compounds presented here, self-organizing neural networks cover a broad range of applications in drug design:
- analysis of high-throughput screening (HTS) and multi-dimensional data;
- lead discovery;
- comparison of compound libraries;
- analysis of the similarity and diversity of combinatorial libraries;
- design of targeted libraries for HTS.
The advantages of self-organizing neural networks are their scalability to the size of the dataset, their fairly rapid training, their almost instantaneous prediction, their retrainability with new information, and the interpretability of their weights if descriptors with physicochemical meaning are used. They are applicable to similarity perception, classification and clustering, and visualization of complex, multidimensional information. The application of the program SONNIA (Self-Organizing Neural Network for Information Analysis) to the classification of compounds with different biological activities will be shown on the basis of two examples:
- The prediction of an acceptable ADME-Tox (Absorption, Distribution, Metabolism, Excretion, and Toxicity) profile for a new compound at an early stage of the drug discovery process is important. We direct a special focus towards the investigation of the metabolism of xenobiotics by cytochrome P450. Xenobiotics are oxidized, reduced, or hydrolyzed in the first phase of metabolism. Cytochrome P450 is involved in more than 90 percent of all these oxidation reactions. The two major isoforms of cytochrome P450 contributing to these oxidation reactions are 3A4, with about 50 percent, and 2D6, with about 25 percent. A classification model for the cytochrome P450 isoforms 3A4 and 2D6, established with a self-organizing neural network on a dataset of 123 compounds, will be presented.
- In the second example, a dataset consisting of 299 biologically active compounds, namely 75 5-hydroxytryptamine 5-HT1a-receptor agonists, 75 histamine H2-receptor antagonists, 74 monoamine oxidase MAOA inhibitors, and 75 thrombin inhibitors, is investigated. Different structure representations have been used in order to establish a reliable and predictive classification model. Leave-one-out cross-validation resulted in 92-98 percent correct predictions.
In another experiment we were able to show that these 299 biologically active compounds, embedded in a set of 7848 structures of unknown activity, form distinct clusters.
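The way a self-organizing (Kohonen) network is used for classification can be sketched in a few lines. This is a generic, minimal illustration and not the SONNIA program: the grid size, decay schedules, random initialization, and the two-dimensional toy descriptors with classes "A" and "B" are all assumptions made for the example.

```python
import math
import random

class SOM:
    """Minimal rectangular Kohonen map with a Gaussian neighborhood."""

    def __init__(self, rows, cols, dim, seed=0):
        rng = random.Random(seed)
        self.coords = [(r, c) for r in range(rows) for c in range(cols)]
        self.weights = [[rng.random() for _ in range(dim)] for _ in self.coords]

    def bmu(self, x):
        """Index of the best-matching unit (neuron closest to x)."""
        return min(range(len(self.weights)), key=lambda j: math.dist(self.weights[j], x))

    def update(self, x, lr, sigma):
        """One training step: pull the winner and its grid neighbors towards x."""
        win = self.coords[self.bmu(x)]
        for j, pos in enumerate(self.coords):
            h = math.exp(-math.dist(pos, win) ** 2 / (2 * sigma ** 2))
            self.weights[j] = [w + lr * h * (xi - w) for w, xi in zip(self.weights[j], x)]

    def train(self, data, epochs=50, lr0=0.5, sigma0=1.5):
        for epoch in range(epochs):
            frac = 1 - epoch / epochs                  # linear decay of rate and radius
            for x in data:
                self.update(x, lr0 * frac, max(sigma0 * frac, 0.3))

def label_neurons(som, data, labels):
    """Give each neuron the majority class of the training samples it wins."""
    votes = {}
    for x, y in zip(data, labels):
        votes.setdefault(som.bmu(x), []).append(y)
    return {j: max(set(v), key=v.count) for j, v in votes.items()}

def classify(som, neuron_labels, x):
    """Class of the winning neuron, or None if that neuron won no training sample."""
    return neuron_labels.get(som.bmu(x))

# Toy descriptors (already scaled to [0, 1]) for two hypothetical activity classes.
data = [(0.1, 0.2), (0.2, 0.1), (0.2, 0.3), (0.8, 0.9), (0.9, 0.8), (0.7, 0.8)]
labels = ["A", "A", "A", "B", "B", "B"]

som = SOM(rows=3, cols=3, dim=2)
som.train(data)
neuron_labels = label_neurons(som, data, labels)
print(classify(som, neuron_labels, (0.15, 0.15)), classify(som, neuron_labels, (0.85, 0.85)))
```

Because every compound is mapped to a position on a two-dimensional grid, the same trained map can also be used to visualize how classes or whole libraries are distributed, which is the basis of the clustering and library-comparison applications listed above.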
1. J. Zupan, J. Gasteiger, Neural Networks in Chemistry and Drug Design, Second Edition, Wiley-VCH, Weinheim, 1999.