7th ICCS Abstracts - Red Poster Session


	Hosted by ZBH

P-2: Descriptors of Chemical Reactivity and Application to Mutagenicity Prediction

Joao Aires-de-Sousa; Universidade Nova de Lisboa, Caparica, PT
Qing-You Zhang; Universidade Nova de Lisboa

Mutagenicity is strongly related to chemical reactivity, namely to the ability of a compound to be metabolically activated and to react with DNA. [1] Chemical reactivity depends on the properties of chemical bonds, which determine how bonds break and rearrange in the presence of certain reactants, catalysts and conditions.

In this communication we present our studies of molecular reactivity descriptors (physicochemical properties of bonds) for predicting mutagenicity in Salmonella (Ames assay). These empirical descriptors are easily calculated from the molecular structure and can be generated quickly for large data sets of compounds.

To use information on several bond properties for an entire molecule while keeping its representation at a reasonable fixed length, all the bonds of a molecule are mapped onto a 2D self-organizing map of fixed size.

A self-organizing map (SOM) is first trained with a diverse set of bonds from different structures, each bond being described by seven bond properties calculated by PETRA [2]. All the bonds of a molecule are then submitted to the trained SOM, and the resulting pattern of activated neurons is interpreted as a map of the reactivity features of that molecule (MOLMAP): a fingerprint of the bonds available in that structure.
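The mapping step can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation and rests on assumptions of our own: the Kohonen update uses a Gaussian neighborhood with a linearly decaying learning rate and radius, the MOLMAP simply counts how many of a molecule's bonds activate each neuron (the published encoding may differ), and the seven-element bond vectors are random placeholders rather than PETRA output. The function names `train_som`, `winner` and `molmap` are hypothetical.

```python
import math
import random


def winner(weights, x):
    """Return (row, col) of the neuron whose weight vector is closest to x."""
    best, best_d = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if d < best_d:
                best, best_d = (r, c), d
    return best


def train_som(bonds, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Train a rows x cols Kohonen map on bond descriptor vectors."""
    rng = random.Random(seed)
    dim = len(bonds[0])
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    radius0 = max(rows, cols) / 2
    total, step = epochs * len(bonds), 0
    for _ in range(epochs):
        order = list(bonds)
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1 - frac)                    # linearly decaying learning rate
            radius = max(radius0 * (1 - frac), 0.5)  # shrinking neighborhood
            wr, wc = winner(weights, x)
            for r in range(rows):
                for c in range(cols):
                    h = math.exp(-((r - wr) ** 2 + (c - wc) ** 2) / (2 * radius ** 2))
                    w = weights[r][c]
                    for k in range(dim):
                        w[k] += lr * h * (x[k] - w[k])  # pull neuron toward the bond
            step += 1
    return weights


def molmap(weights, molecule_bonds):
    """Fixed-length fingerprint: number of the molecule's bonds won by each neuron."""
    cols = len(weights[0])
    fp = [0] * (len(weights) * cols)
    for x in molecule_bonds:
        r, c = winner(weights, x)
        fp[r * cols + c] += 1
    return fp


# Placeholder data: 7 random "bond properties" per bond (NOT real PETRA output).
rng = random.Random(42)
training_bonds = [[rng.random() for _ in range(7)] for _ in range(200)]
som = train_som(training_bonds, rows=5, cols=5)
small = molmap(som, training_bonds[:8])    # a molecule with 8 bonds
large = molmap(som, training_bonds[8:40])  # a molecule with 32 bonds
print(len(small), len(large))  # prints "25 25": one entry per neuron
```

Whatever the number of bonds in a molecule, its MOLMAP has one entry per neuron, which is what lets molecules of different sizes be passed to the same classifier.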

MOLMAP descriptors were generated for 548 compounds and complemented with 17 general molecular descriptors, such as the molecular weight, maximum charge, or ring strain energy. On this basis, a random forest was trained as a predictive model for mutagenicity. Learning in a random forest consists of training an ensemble of classification trees. [3] Each tree is grown with a random subset of descriptors and a random subset of objects, and the final prediction is obtained by majority voting. Random forests additionally associate a probability with every prediction and report the importance of each descriptor in the global model.
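The ensemble logic described above can be illustrated with a minimal pure-Python sketch. It is a simplification, not the model used in this work: each "tree" is a single-split stump, the descriptor subset is drawn once per tree as the text describes (the randomForest R package, for instance, draws candidate descriptors at every node and grows full trees), and descriptor importance is not computed. `fit_stump`, `fit_forest` and `predict` are hypothetical names; `predict` returns the majority class together with the fraction of trees voting for it, used here as the prediction's probability.

```python
import random
from collections import Counter


def fit_stump(X, y, features):
    """Simplified base learner: the one-split tree with the fewest training errors."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            left_lab = Counter(left).most_common(1)[0][0] if left else 0
            right_lab = Counter(right).most_common(1)[0][0] if right else 0
            errors = (sum(yi != left_lab for yi in left)
                      + sum(yi != right_lab for yi in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_lab, right_lab)
    _, f, t, left_lab, right_lab = best
    return lambda row: left_lab if row[f] <= t else right_lab


def fit_forest(X, y, n_trees=25, n_features=2, seed=0):
    """Grow n_trees stumps, each on a bootstrap sample and a random descriptor subset."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of objects
        feats = rng.sample(range(p), n_features)    # random subset of descriptors
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Majority vote; the winning vote fraction serves as the probability."""
    votes = Counter(tree(row) for tree in forest)
    label, count = votes.most_common(1)[0]
    return label, count / len(forest)


# Toy data: three descriptors, of which only descriptor 0 is informative.
rng = random.Random(7)
X = [[rng.random() for _ in range(3)] for _ in range(60)]
y = [int(row[0] > 0.5) for row in X]
forest = fit_forest(X, y, n_trees=25, n_features=2)
print(predict(forest, [0.95, 0.5, 0.5]))
```

An odd number of trees avoids ties in the binary (mutagenic / non-mutagenic) vote.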

We used data from the Berkeley Carcinogenic Potency Database [4], consisting of SMILES strings and the corresponding Ames test outcomes. [5] After excluding inorganic and organometallic compounds, salts, duplicates, and structures not accepted by PETRA 3.11 [2], the remaining 548 structures were partitioned into a training set of 445 objects and a test set of 103 objects. Correct predictions were obtained for 81-84% of the independent test set, and the internal cross-validation error on the training set (out-of-bag estimate) was 22%. These results compare well with the experimental interlaboratory reproducibility error of ca. 15% usually associated with the Ames assay. [6]
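The out-of-bag estimate mentioned above reuses the bootstrap: each training object is classified only by the trees whose bootstrap sample did not contain it, and the misclassified fraction is the error. A minimal sketch, assuming each tree is stored as a `(predict, in_bag)` pair; the function `oob_error` and the three hand-written toy trees are hypothetical and unrelated to the actual data set.

```python
from collections import Counter


def oob_error(forest, X, y):
    """Out-of-bag error estimate.

    forest: list of (predict, in_bag) pairs, where in_bag is the set of training
    indices in that tree's bootstrap sample. Each object is voted on only by
    the trees that never saw it; the misclassified fraction is returned.
    """
    wrong = counted = 0
    for i, (row, label) in enumerate(zip(X, y)):
        votes = Counter(predict(row) for predict, in_bag in forest if i not in in_bag)
        if not votes:
            continue  # object fell in every bootstrap sample: no OOB vote
        counted += 1
        if votes.most_common(1)[0][0] != label:
            wrong += 1
    return wrong / counted if counted else float("nan")


# Toy example with three hand-written "trees" over a single descriptor.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
forest = [
    (lambda row: int(row[0] >= 2), {0, 1}),
    (lambda row: int(row[0] >= 1), {2, 3}),
    (lambda row: 1, {0, 3}),
]
print(oob_error(forest, X, y))  # prints 0.25: only object 1 is misclassified
```

Because each object is left out of roughly a third of the bootstrap samples (the chance of never being drawn in n draws tends to 1/e), this internal estimate comes at no extra training cost and complements the error measured on the independent test set.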

Inspection of the results reveals that the MOLMAP descriptors do not simply encode structural fragments: the model shows some ability to make predictions for unknown functional groups based on the detection of reactive sites.

REFERENCES:

[1] For a review of QSAR methods for predicting mutagenicity, see: G. Patlewicz; R. Rodford; J. D. Walker. Environ. Toxicol. Chem. 2003, 22, 1885-1893.
[2] PETRA: http://www2.chemie.uni-erlangen.de/software/petra
[3] V. Svetnik; A. Liaw; C. Tong; J. C. Culberson; R. P. Sheridan; B. P. Feuston. J. Chem. Inf. Comput. Sci. 2003, 43, 1947-1958.
[4] Carcinogenic Potency Database: http://potency.berkeley.edu
[5] Downloaded from http://www.epa.gov/nheerl/dsstox (version 15Oct03).
[6] J. Kazius; R. McGuire; R. Bursi. J. Med. Chem. 2005, 48(1), 312-320.