THE APPLICATION OF MULTIOBJECTIVE EVOLUTIONARY ALGORITHMS IN CHEMOINFORMATICS
Valerie Gillet; Department of Information Studies, University of Sheffield, Sheffield, UK
Evolutionary algorithms (EAs) have been used to optimise many difficult problems in chemoinformatics. Examples include de novo design, conformational analysis, protein-ligand docking and combinatorial library design. EAs are well suited to exploring the large search spaces that characterise these problems.
Most real-world problems, including many in the field of chemoinformatics, are multiobjective optimisation problems, and often the individual objectives are in conflict. For example, in combinatorial library design the ideal library may be one that has maximum diversity, minimum cost and drug-like physicochemical property profiles. The typical way in which multiobjective problems are handled in EAs (and in more traditional optimisation techniques) is to reduce several objectives to a single objective via a weighted-sum fitness function. For example, the SELECT program for combinatorial library design [1] is based on a GA where multiple objectives, such as diversity and drug-like physicochemical properties, are handled via a weighted-sum fitness function, in which each objective is multiplied by a user-defined weight and the weighted terms are summed.
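As an illustration of the weighted-sum approach, the following is a minimal Python sketch. The objective names, scores and weights are hypothetical and are not taken from SELECT; the only point is that a single weighted score yields a single "best" library, and that the answer depends on the weights chosen.

```python
# Hypothetical candidate libraries scored on three objectives, each scaled
# to [0, 1] so that larger is better (cost is expressed as "cheapness").
CANDIDATES = {
    "lib_a": {"diversity": 0.90, "cheapness": 0.20, "druglikeness": 0.60},
    "lib_b": {"diversity": 0.60, "cheapness": 0.70, "druglikeness": 0.70},
    "lib_c": {"diversity": 0.30, "cheapness": 0.95, "druglikeness": 0.50},
}


def weighted_sum(scores, weights):
    """Collapse several objective scores into one fitness value."""
    return sum(weights[name] * scores[name] for name in weights)


def best_library(candidates, weights):
    """Return the name of the candidate with the highest weighted-sum fitness."""
    return max(candidates, key=lambda name: weighted_sum(candidates[name], weights))


# Two hypothetical weight settings that express different priorities.
diversity_first = {"diversity": 0.8, "cheapness": 0.1, "druglikeness": 0.1}
cost_first = {"diversity": 0.1, "cheapness": 0.8, "druglikeness": 0.1}

print(best_library(CANDIDATES, diversity_first))  # lib_a
print(best_library(CANDIDATES, cost_first))       # lib_c
```

Each weight setting returns exactly one library, so exploring other compromises means choosing new weights and rerunning the optimisation.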
There are, however, several limitations associated with the weighted-sum approach. For example, it is not always easy to assign appropriate weights, especially when the objectives are of different types such as diversity and cost in the library design context; the setting of weights can result in regions of the search space being obscured; and the result is a single solution that represents one particular trade-off in the objectives, when usually a family of solutions exists each of which represents a different compromise in the objectives.
Multiobjective evolutionary algorithms (MOEAs) are a recent development in evolutionary computing in which multiple objectives are handled independently, without the need for summation or the assignment of relative weights. We have recently been exploring the potential of MOEAs for the optimisation of problems in chemoinformatics. MOEAs exploit the population nature of EAs to evolve a family of solutions in parallel, where each solution represents a different compromise in the objectives. The result is an entire family of equivalent solutions that represents the trade-off surface over all the objectives. Many of the limitations associated with the weighted-sum approach are thereby overcome, and the designer can make an informed choice from the full range of compromise solutions that are available.
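The notion of a family of equivalent compromise solutions can be made concrete with Pareto dominance. The sketch below is illustrative only: it assumes that all objectives are to be maximised, uses hypothetical scores, and applies one common Pareto ranking scheme (rank equals one plus the number of solutions that dominate the individual). The abstract does not state which ranking scheme is used.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly
    better on at least one (all objectives are maximised here)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    """Return the non-dominated points, preserving input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def pareto_ranks(points):
    """Rank each point as 1 + the number of points that dominate it."""
    return [1 + sum(dominates(q, p) for q in points) for p in points]


# Hypothetical (diversity, cheapness) scores for five candidate libraries.
population = [(0.9, 0.2), (0.6, 0.7), (0.3, 0.95), (0.5, 0.6), (0.2, 0.1)]

front = pareto_front(population)
ranks = pareto_ranks(population)
print(front)  # [(0.9, 0.2), (0.6, 0.7), (0.3, 0.95)]
print(ranks)  # [1, 1, 1, 2, 5]
```

The three rank-1 solutions are mutually non-dominated: none is better than another on both objectives, so no weights are needed to keep all of them in the population.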
Firstly, the multiobjective genetic algorithm (MOGA) will be used to illustrate the MOEA approach to multiobjective optimisation. Secondly, the application of multiobjective evolutionary algorithms to problems in chemoinformatics will be described including examples from different aspects of combinatorial library design [2,3].
[1] Gillet et al. Selecting combinatorial libraries to optimize diversity and physical properties. J. Chem. Inf. Comput. Sci., 39, 1999, 169-177.
[2] Gillet, V.J., Khatib, W., Willett, P., Fleming, P.J. and Green, D.V.S. Combinatorial library design using a multiobjective genetic algorithm. J. Chem. Inf. Comput. Sci., in press.
[3] Gillet, V.J., Willett, P., Fleming, P.J. and Green, D.V.S. Designing focused libraries using MoSELECT. J. Mol. Graphics Modell., in press.
REOPTIMIZATION OF MDL KEYS FOR USE IN DRUG DISCOVERY
Joseph L. Durant, Burton A. Leland, Douglas R. Henry, James G. Nourse; MDL Information Systems, Inc., San Leandro, CA, USA
The use of keysets based on a variety of different descriptors has an established place within the drug discovery workflow. Clustering and diversity analysis are two common applications that are normally implemented using keysets, but many others exist. Binary-coded 2D descriptors, or fingerprints, are a popular choice for use in such keysets; they typically perform well and are quickly calculated without requiring generation of 3D structures.
MDL has used both 166-bit and 960-bit keysets for a number of years. These keysets were originally designed and optimized for substructure searching, not for use in other workflows. Nevertheless, their performance in clustering and diversity analysis is on par with that of keysets based on feature trees, Daylight fingerprints or Tripos Holograms.
The technology underlying the MDL keysets is quite general, and it is straightforward to produce a wide variety of keysets. It is logical to ask whether a keyset can be constructed using this technology which has superior performance in drug-discovery applications.
We will present an overview of the underlying technology supporting the definition of features and the encoding of these features into keysets. This technology allows definition of features as combinations of sets of atom properties, bond properties and atomic neighborhoods at various topological separations. In addition, a variety of custom features can be encoded, which support both construction of the 166-bit keyset and encoding of Sgroup features. Sgroups are substructures with additional chemical information or data associated with them. Features, combined with an occurrence count, can then be used to set one or more bits in a keyset. In this way keysets can be constructed in which bits are set by unique features. Alternatively, keysets can combine multiple features into a single bit, producing a hashed keyset. A feature can also be encoded into a single bit or a series of bits. In this way a vast variety of keysets can be created.
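The two encoding styles described above can be sketched as follows. This is a toy illustration and not MDL's implementation: the feature names, the bit definitions, the count thresholds and the use of CRC32 as the hash are all assumptions made for the example.

```python
import zlib


def unique_keyset(feature_counts, bit_definitions):
    """Set one bit per (feature, minimum count) definition that the molecule meets."""
    bits = 0
    for position, (feature, min_count) in enumerate(bit_definitions):
        if feature_counts.get(feature, 0) >= min_count:
            bits |= 1 << position
    return bits


def hashed_keyset(feature_counts, n_bits, max_count=3):
    """Hash each (feature, count threshold) pair onto one of n_bits positions,
    so that several features may share a bit."""
    bits = 0
    for feature, count in feature_counts.items():
        for threshold in range(1, min(count, max_count) + 1):
            key = f"{feature}>={threshold}".encode()
            bits |= 1 << (zlib.crc32(key) % n_bits)
    return bits


# Hypothetical feature occurrence counts for one molecule.
molecule = {"aromatic_ring": 2, "carbonyl": 1, "N-C-C-O": 1}

# Hypothetical keyset: a feature may own a series of bits, one per count threshold.
definitions = [("aromatic_ring", 1), ("aromatic_ring", 2), ("carbonyl", 1),
               ("carbonyl", 2), ("halogen", 1)]

print(format(unique_keyset(molecule, definitions), "05b"))  # 00111
print(format(hashed_keyset(molecule, n_bits=16), "016b"))
```

In the unique keyset every bit has one meaning, which keeps the keys interpretable; the hashed keyset trades that interpretability for a fixed, usually smaller, length.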
A keyset containing all possible combinations of our set of defined features with occurrence counts of one or more has been constructed. Construction of smaller keysets by random selection of included bits has demonstrated the robustness of these 2D keysets in clustering drugs. In general, one finds only a few percent standard deviation in the clustering performance of populations of similarly sized keysets. Additionally, performance is seen to be quite insensitive to keyset size, especially for keysets larger than 1000 bits.
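The random-selection experiment can be sketched roughly as below. The abstract does not say how clustering performance was measured, so this example substitutes a much simpler stand-in score, the Tanimoto similarity between two fingerprints, and reports its mean and standard deviation over a population of random sub-keysets. The fingerprints, sizes and function names are hypothetical.

```python
import random
import statistics


def project(fingerprint, positions):
    """Keep only the chosen bit positions of a fingerprint (a set of on-bits)."""
    return fingerprint & positions


def tanimoto(a, b):
    """Tanimoto coefficient between two sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def score_spread(fp_a, fp_b, full_size, sub_size, n_keysets, seed=0):
    """Mean and standard deviation of the score over random sub-keysets."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_keysets):
        # A sub-keyset is a random selection of bit positions from the full keyset.
        positions = frozenset(rng.sample(range(full_size), sub_size))
        scores.append(tanimoto(project(fp_a, positions), project(fp_b, positions)))
    return statistics.mean(scores), statistics.stdev(scores)


# Two hypothetical, partly overlapping fingerprints over a 2000-bit full keyset.
rng = random.Random(42)
fp_a = frozenset(rng.sample(range(2000), 300))
fp_b = frozenset(rng.sample(sorted(fp_a), 200)) | frozenset(rng.sample(range(2000), 100))

mean, spread = score_spread(fp_a, fp_b, full_size=2000, sub_size=1000, n_keysets=50)
print(f"mean={mean:.3f} stdev={spread:.3f}")
```

Repeating the call with different values of sub_size gives a quick view of how sensitive the stand-in score is to keyset size.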
We have also examined a variety of strategies to construct keysets which are optimized for use in clustering and diversity applications for drug candidates. The performance and relative merits of these strategies will be discussed.
CLUSTERING OF DATABASES BY BROWSING AND BY MATHEMATICAL METHODS IN THE BIOLOGICAL SPACE
Alexander Kos¹, Dusan Toman, Vladimir V. Poroikov, Ulrich Jordis, Timo Knuuttila; ¹Akos GmbH, Riehen, Switzerland
Databases can be analysed and understood with the help of visualization. One needs (a) a set of understandable parameters, (b) a method to reduce large data sets, or matrices, and (c) a method to visualize the results. We will show a comparison between parameters in the chemical space, using the MDL mol keys, and the PASS (Prediction of Activity Spectra of Substances) parameters, which describe a biological space. Data reduction is done by clustering the dimensions (columns in a spreadsheet) and by clustering the records (rows) of a database. Numerical clustering is done with a fast tree-based implementation of self-organizing maps (an unsupervised neural network). The final analysis is done on the reduced dataset using miner3d.excel for "clustering by browsing". Using the MDDR (MDL Drug Data Report), we were able to find new compounds that proved to be active acetylcholinesterase inhibitors.
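The row and column clustering step can be sketched with a plain one-dimensional self-organizing map. This is not the fast tree-based implementation mentioned above, which the abstract does not describe; it is a simple substitute written in pure Python, and the data, parameter values and function names are hypothetical.

```python
import math
import random


def best_unit(units, row):
    """Index of the map unit whose weight vector is closest to the row."""
    return min(range(len(units)), key=lambda i: math.dist(units[i], row))


def train_som(rows, n_units, epochs=50, seed=0):
    """Train a one-dimensional self-organizing map on a list of numeric rows."""
    rng = random.Random(seed)
    # Initialise each unit at a randomly chosen data row.
    units = [list(rng.choice(rows)) for _ in range(n_units)]
    for epoch in range(epochs):
        frac = epoch / epochs
        rate = 0.5 * (1.0 - frac)                       # learning rate decays
        radius = max(n_units / 2 * (1.0 - frac), 0.5)   # neighbourhood shrinks
        for row in rng.sample(rows, len(rows)):         # visit rows in random order
            winner = best_unit(units, row)
            for i, unit in enumerate(units):
                influence = math.exp(-((i - winner) ** 2) / (2 * radius ** 2))
                for d in range(len(unit)):
                    unit[d] += rate * influence * (row[d] - unit[d])
    return units


# Hypothetical records (rows) described by three parameters scaled to [0, 1].
rows = [[0.10, 0.20, 0.10], [0.15, 0.25, 0.05], [0.90, 0.80, 0.95],
        [0.85, 0.90, 0.90], [0.50, 0.50, 0.50]]

units = train_som(rows, n_units=3)
row_clusters = [best_unit(units, row) for row in rows]

# Clustering the dimensions: transpose so that each column becomes a record.
columns = [list(col) for col in zip(*rows)]
column_units = train_som(columns, n_units=2)
column_clusters = [best_unit(column_units, col) for col in columns]

print(row_clusters, column_clusters)
```

Records or columns assigned to the same unit can be replaced by one representative, giving a reduced table that is small enough to browse visually.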