The Calculation of Molecular Similarity: Principles and Practice
Peter Willett, University of Sheffield
For details, see the full paper in the Summer School issue of Molecular Informatics
Molecular descriptors Weighting schemes Similarity coefficients
[Figure: structures of morphine, codeine and heroin]
Banana Orange Basketball
1D properties: MW, logP, PSA, etc.
2D properties: fingerprints, topological indices, maximum common substructures
3D properties: molecular fields, shape
from all parts of the descriptor
sets of molecular descriptors
methods are very slow
substructures (or fragments)
molecular similarity (e.g., similarity searching, clustering and diversity analysis)
fragments to bits)
fragments to bits)
database
e.g., physicochemical property vectors
(e.g., cosine coefficient, Euclidean distance, Tversky index) but fingerprint/Tanimoto measures are the standard
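As a rough illustration of the coefficients named above, the sketch below computes the Tanimoto, cosine and Tversky coefficients for two binary fingerprints. It assumes each fingerprint is represented as a Python set of on-bit positions; the function names and the example bit sets are hypothetical and are not taken from the slides.

```python
import math

def tanimoto(a, b):
    """Tanimoto coefficient c / (|a| + |b| - c), where c is the number of shared bits."""
    common = len(a & b)
    union = len(a) + len(b) - common
    return common / union if union else 0.0

def cosine(a, b):
    """Cosine coefficient for binary fingerprints: c / sqrt(|a| * |b|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def tversky(a, b, alpha, beta):
    """Tversky index: c / (c + alpha * |a - b| + beta * |b - a|)."""
    common = len(a & b)
    denom = common + alpha * len(a - b) + beta * len(b - a)
    return common / denom if denom else 0.0

# Hypothetical fingerprints, given as sets of on-bit positions.
fp_a = {1, 4, 7, 9}
fp_b = {1, 4, 8, 9, 12}

sim_tanimoto = tanimoto(fp_a, fp_b)
sim_cosine = cosine(fp_a, fp_b)
sim_tversky = tversky(fp_a, fp_b, alpha=1.0, beta=1.0)
```

With alpha = beta = 1 the Tversky index reduces to the Tanimoto coefficient, which is one way to sanity-check an implementation.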
Morphine
Codeine: 0.99 similar
Heroin: 0.95 similar
Methadone: 0.20 similar
(Daylight fingerprints; Tanimoto similarities)
structure and measure the similarity
neighbours”) to the searcher
further searches, bioactivity testing or whatever
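The similarity-searching workflow outlined here (score every database molecule against a query, then return the nearest neighbours) can be sketched as follows. This is a minimal illustration, assuming fingerprints stored as sets of on-bit positions and a Tanimoto measure; the database contents, molecule names and the `similarity_search` helper are all hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient for two sets of on-bit positions."""
    common = len(a & b)
    union = len(a) + len(b) - common
    return common / union if union else 0.0

def similarity_search(query, database, top_k):
    """Rank database molecules by decreasing similarity to the query; return the top_k."""
    scored = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    # Sort by decreasing score; ties are broken by name so the order is reproducible.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]

# Hypothetical database of fingerprints.
database = {
    "mol_a": {1, 2, 3, 4},
    "mol_b": {1, 2, 3, 9},
    "mol_c": {7, 8},
    "mol_d": {1, 2, 3, 4, 5},
}
query = {1, 2, 3, 4}

hits = similarity_search(query, database, top_k=3)
for name, score in hits:
    print(f"{name}: {score:.2f}")
```

The returned list is what would be handed to the searcher for further searches or testing.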
[Figure: query structure with example database structures]
Compute similarities and then cluster molecules so that molecules in the same (or different) clusters are similar (or dissimilar) to each other
Range of clustering methods available, e.g., Jarvis-Patrick (non-hierarchical) or Ward’s (hierarchical) methods
Modern hardware/software enables clustering of files containing millions of molecules
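To make the clustering step concrete, here is a small sketch of one common formulation of Jarvis-Patrick clustering: two molecules are joined if each appears in the other's list of k nearest neighbours and the two lists share at least k_min members. The parameter names, the toy fingerprints and the use of Tanimoto similarity are assumptions for illustration; published variants differ in detail (for example, in whether a molecule counts as its own neighbour).

```python
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto coefficient for two sets of on-bit positions."""
    common = len(a & b)
    union = len(a) + len(b) - common
    return common / union if union else 0.0

def jarvis_patrick(fps, k, k_min):
    """Cluster molecules using mutual k-nearest-neighbour lists (illustrative variant)."""
    names = sorted(fps)
    # Build the k-nearest-neighbour set of each molecule (itself excluded).
    nn = {}
    for n in names:
        others = sorted((m for m in names if m != n),
                        key=lambda m: (-tanimoto(fps[n], fps[m]), m))
        nn[n] = set(others[:k])

    # Union-find structure to merge molecules that satisfy the joining rule.
    parent = {n: n for n in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        mutual = a in nn[b] and b in nn[a]
        if mutual and len(nn[a] & nn[b]) >= k_min:
            parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(clusters.values())

# Hypothetical fingerprints forming two obvious groups.
fps = {
    "a1": {1, 2, 3, 4}, "a2": {1, 2, 3, 5}, "a3": {1, 2, 4, 5},
    "b1": {10, 11, 12, 13}, "b2": {10, 11, 12, 14}, "b3": {10, 11, 13, 14},
}
clusters = jarvis_patrick(fps, k=2, k_min=1)
```

This version compares all pairs and so scales quadratically; it is meant only to show the joining rule, not how large files would be clustered in practice.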
molecule and the subset molecules
subset molecules
it against a database that contains other molecules having the same activity
actives towards the top of the ranking
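One simple way to quantify how well a search places actives towards the top of the ranking is the fraction of known actives found in the first n positions. The sketch below assumes a ranking given as a list of molecule names in decreasing similarity order and a set of known actives; the names, the cut-off and the `actives_in_top` helper are hypothetical, and other effectiveness measures could equally be used.

```python
def actives_in_top(ranking, actives, n):
    """Fraction of the known actives that appear in the top n positions of the ranking."""
    if not actives:
        return 0.0
    found = sum(1 for name in ranking[:n] if name in actives)
    return found / len(actives)

# Hypothetical ranking (most similar first) and set of known actives.
ranking = ["m1", "m2", "m3", "m4", "m5", "m6"]
actives = {"m1", "m3", "m6"}

recall_top3 = actives_in_top(ranking, actives, n=3)
print(f"Fraction of actives in top 3: {recall_top3:.2f}")
```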
compound (similarity fusion)
database in order of decreasing similarity
rank position
reference structures, types of fingerprint, biological activities etc.
used in docking studies. Cf “wisdom of crowds”
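The fusion idea can be illustrated with a rank-based combination: each similarity measure ranks the database in order of decreasing similarity, each molecule's rank positions are summed, and the fused ranking sorts by that sum. The sum-of-ranks rule, the score values and the function names below are assumptions chosen for illustration; the slides do not fix a particular fusion rule.

```python
def ranks(scores):
    """Map each molecule to its rank position (1 = most similar)."""
    ordered = sorted(scores, key=lambda name: (-scores[name], name))
    return {name: pos for pos, name in enumerate(ordered, start=1)}

def fuse_by_rank_sum(score_lists):
    """Fuse several rankings by summing rank positions; a lower total is better."""
    total = {}
    for scores in score_lists:
        for name, position in ranks(scores).items():
            total[name] = total.get(name, 0) + position
    return sorted(total, key=lambda name: (total[name], name))

# Hypothetical similarity scores for the same database from three different measures.
measure_1 = {"mol_a": 0.9, "mol_b": 0.7, "mol_c": 0.2}
measure_2 = {"mol_a": 0.6, "mol_b": 0.8, "mol_c": 0.1}
measure_3 = {"mol_a": 0.5, "mol_b": 0.4, "mol_c": 0.45}

fused = fuse_by_rank_sum([measure_1, measure_2, measure_3])
```

Working with rank positions rather than raw scores avoids having to put different similarity measures on a common scale.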
similarity coefficient
some types of weighting scheme
between similarity measures?
similarity searching?
similar molecule can come to market for ten years
[Figure: typical expert judgements; plot of the proportion of experts saying "similar" against similarity score]