SLIDE 1

Chemical Insights from a Random Forest Prediction of Molecular Quantum Properties

Beomchang Kang, Seoul National University
2019.11.8, 1st XAIENCE Conference

SLIDE 2

Fluorescent molecule

  • Bio-imaging
    ◦ Specification: cell organelles, proteins
    ◦ Observation: structure, dynamics
SLIDE 3

What makes a good fluorescent molecule?

  • High quantum yield in the visible range
  • Distinctive color
  • Low toxicity
  • Easy to synthesize (high synthetic accessibility)
SLIDE 4

Towards the discovery of novel and effective fluorescent molecules

  • Prediction of quantum properties for a given molecule
    ◦ High quantum yield
    ◦ Distinctive color
  • Searching the chemical space for molecules with the desired properties

SLIDE 5

Today, I focus on…

  • Prediction of
    ◦ Oscillator strength, to target high quantum yield
    ◦ Excitation energy
  • Gaining chemical insight from the Random Forest results
SLIDE 6

Excitation Energy

  • Energy difference between two electronic states
  • Corresponds to an electronic transition
  • Determines the color
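Since the excitation energy sets the color, it helps to see how an energy maps to a photon wavelength. A minimal sketch using the standard relation λ = hc/E with hc ≈ 1239.84 eV·nm; the function names and the 380 to 750 nm "visible" window are illustrative assumptions, not taken from the talk.

```python
# Photon energy and wavelength are related by E = h*c / λ.
HC_EV_NM = 1239.84198  # Planck constant times speed of light, in eV·nm


def excitation_energy_to_wavelength_nm(energy_ev: float) -> float:
    """Convert an excitation energy in eV to the matching photon wavelength in nm."""
    if energy_ev <= 0:
        raise ValueError("excitation energy must be positive")
    return HC_EV_NM / energy_ev


def in_visible_range(energy_ev: float, lo_nm: float = 380.0, hi_nm: float = 750.0) -> bool:
    """True if the wavelength lies inside an (assumed) visible window."""
    wavelength = excitation_energy_to_wavelength_nm(energy_ev)
    return lo_nm <= wavelength <= hi_nm


print(f"{excitation_energy_to_wavelength_nm(2.5):.1f} nm")  # about 496 nm
print(in_visible_range(5.0))  # about 248 nm, i.e. ultraviolet
```

For example, 2.5 eV corresponds to roughly 496 nm (blue-green), while 5.0 eV lies in the ultraviolet.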
SLIDE 7

Oscillator strength (OS)

  • Dimensionless quantity
  • Probability of absorption or emission of electromagnetic radiation
    ◦ in transitions between energy levels
  • To have a high OS:
    ◦ the orbital shapes of the two states must be different
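For reference, the standard textbook expression for the oscillator strength (atomic units, length gauge; not taken from the talk) ties OS to the excitation energy and the transition dipole moment:

```latex
f_{i \to f} = \frac{2}{3}\,\left(E_f - E_i\right)\,
  \left|\langle \Psi_i \,|\, \hat{\boldsymbol{\mu}} \,|\, \Psi_f \rangle\right|^{2},
\qquad
\hat{\boldsymbol{\mu}} = -\sum_{k} \hat{\mathbf{r}}_k
```

Here $E_f - E_i$ is the excitation energy in hartree and the sum runs over electrons. Because $\hat{\boldsymbol{\mu}}$ is odd under inversion, the integral vanishes when both states have the same parity in a centrosymmetric molecule; this is one concrete sense in which the two states must differ for the OS to be large.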

SLIDE 8

Methods

SLIDE 9

Prediction of molecular properties

Molecule → Predictor → Property

SLIDE 10

PubChemQC Database

  • Molecular quantum calculations
    ◦ DFT
    ◦ TD-DFT
  • Molecules from PubChem
    ◦ Compounds that have actually been synthesized
  • Molecular orbitals
  • Quantum properties
  • Classical properties
SLIDE 11

Data set for RF

  • Source: PubChemQC
  • Elements: H, B, C, N, O, F, P, S, Cl only
  • Neutral molecules only
  • 0.5 M (500,000) randomly selected compounds
  • Training : test split = 9 : 1
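The selection steps on this slide can be sketched with the standard library alone. The record format below (a set of element symbols plus a total charge per molecule) is a hypothetical stand-in for however PubChemQC entries are parsed, and the helper name and seed are assumptions; the talk does not say how the random split was implemented.

```python
import random

# Element whitelist from the data-set slide.
ALLOWED_ELEMENTS = {"H", "B", "C", "N", "O", "F", "P", "S", "Cl"}


def select_and_split(records, n_samples, train_fraction=0.9, seed=0):
    """Keep neutral molecules made only of allowed elements, draw a random
    sample of up to n_samples, and split it into training and test sets.

    Each record is assumed (hypothetically) to be a dict with
    "elements": set of element symbols and "charge": total formal charge.
    """
    eligible = [
        r for r in records
        if set(r["elements"]) <= ALLOWED_ELEMENTS and r["charge"] == 0
    ]
    rng = random.Random(seed)
    # random.sample returns items in random order, so slicing gives a random split.
    sample = rng.sample(eligible, min(n_samples, len(eligible)))
    n_train = round(train_fraction * len(sample))
    return sample[:n_train], sample[n_train:]


# Toy records; the talk used n_samples=500_000 on the real database.
records = [{"id": i, "elements": {"C", "H", "O"}, "charge": 0} for i in range(12)]
records.append({"id": 100, "elements": {"C", "H", "Br"}, "charge": 0})  # Br not allowed
records.append({"id": 101, "elements": {"C", "H", "N"}, "charge": 1})   # charged
train, test = select_and_split(records, n_samples=10)
print(len(train), len(test))
```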
SLIDE 12

Random Forest

  • Advantages
    ◦ Simple
    ◦ White-box
    ◦ Feature importance
  • From feature importance
    ◦ Chemical insight
  • Results to be compared with deep learning methods
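A minimal scikit-learn sketch of fitting a random-forest regressor on binary fingerprint vectors and reading off the impurity-based feature importances. The data is synthetic, and the hyperparameters (n_estimators=100, random_state=0) are assumptions; the talk does not state which implementation or settings were used.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for fingerprint data: 500 "molecules" x 64 binary bits.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 64))
# Synthetic target: depends mostly on bit 7, weakly on bit 21, plus noise.
y = 0.3 * X[:, 7] + 0.1 * X[:, 21] + rng.normal(0.0, 0.02, size=500)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)

# Impurity-based importances: one value per bit, normalized to sum to 1.
importances = model.feature_importances_
top = np.argsort(importances)[::-1][:5]
for bit in top:
    print(f"bit {bit}: importance {importances[bit]:.4f}")
```

Impurity-based importances can be biased, for example among correlated bits; `sklearn.inspection.permutation_importance` is a common cross-check.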
SLIDE 13

Extended-Connectivity Fingerprint (ECFP)

  • 2D molecule → identifiers (one per circular atom environment)
    ◦ Parameter: radius
  • Bit vector of ECFP
    ◦ Hashing
    ◦ Binary encoding (1 = fragment present, 0 = absent)
    ◦ Parameter: number of bits
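The hashing and folding step can be sketched in plain Python. This is a simplified illustration, not the full ECFP algorithm: it assumes the per-environment integer identifiers have already been computed, and folds them into an n_bits vector with a modulo, a common way to fold hashed identifiers (assumed here). In practice a toolkit such as RDKit computes the identifiers; its Morgan fingerprint with radius 3 is usually taken as the ECFP6 analogue. The sketch also shows why one bit can stand for several fragments: different identifiers can land on the same bit.

```python
def fold_identifiers(identifiers, n_bits=16384):
    """Fold integer environment identifiers into a binary fingerprint.

    Returns the bit vector and a map from each set bit to the identifiers
    folded onto it (more than one identifier means a hash collision).
    """
    bits = [0] * n_bits
    owners = {}
    for ident in identifiers:
        index = ident % n_bits  # fold a large hash into the vector length
        bits[index] = 1  # binary: presence only, counts are discarded
        owners.setdefault(index, []).append(ident)
    return bits, owners


# Hypothetical identifiers; 5 and 16389 collide when n_bits = 16384.
bits, owners = fold_identifiers([5, 16389, 42])
print(sum(bits), owners)  # 2 {5: [5, 16389], 42: [42]}
```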
SLIDE 14

Results & Discussion

SLIDE 15

RF result - Excitation Energy

  • RMSE = 0.4500 eV
  • Pearson r = 0.8689
SLIDE 16

RF result - Oscillator strength

  • RMSE = 0.066 (dimensionless)
  • Pearson r = 0.7300
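The two metrics reported for both targets can be computed as below; a plain-Python sketch with illustrative function names. RMSE is in the target's units (eV for excitation energy, dimensionless for OS), whereas Pearson r is unitless and unchanged by a positive linear rescaling of the predictions.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between paired observations and predictions."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


print(rmse([1, 2, 3], [1, 2, 5]))       # sqrt(4/3)
print(pearson_r([1, 2, 3], [2, 4, 6]))  # perfectly linear, r = 1
```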
SLIDE 17

0.5 M set (oscillator strength)

| Mean | Median | Std |
|---|---|---|
| 0.042 | 0.009 | 0.096 |
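These statistics help put the OS error in context. A constant model that always predicts the mean has an RMSE equal to the population standard deviation, so on data distributed like this set it would score roughly 0.096, against the RF's 0.066. The mean (0.042) sitting well above the median (0.009) also points to a right-skewed distribution in which most molecules have a very small OS. A small sketch that checks the baseline identity on made-up values (the helper name is illustrative):

```python
import math
import statistics


def mean_baseline_rmse(values):
    """RMSE of a constant predictor that always returns the mean of `values`."""
    mu = statistics.fmean(values)
    return math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))


# Made-up, right-skewed "OS-like" values for illustration only.
os_values = [0.0, 0.001, 0.002, 0.005, 0.009, 0.01, 0.02, 0.05, 0.2, 0.4]
baseline = mean_baseline_rmse(os_values)
print(f"mean={statistics.fmean(os_values):.4f}  median={statistics.median(os_values):.4f}")
print(f"baseline RMSE={baseline:.4f}  population std={statistics.pstdev(os_values):.4f}")
```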

SLIDE 18

Feature importance to Fragments

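| Bit | 1 | … | 6128 | 6129 | 6130 | … | 16384 |
|---|---|---|---|---|---|---|---|
| Feature importance | 0.xxx | … | 0.xxx | 0.022 | 0.xxx | … | 0.xxx |

Each bit corresponds to many fragments…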

SLIDE 19

Random Forest - Feature importance

  • Target: oscillator strength
  • Fingerprint: ECFP6
  • n_bit = 16384
  • Feature importance > 0.02

Example: bit number 6129, Cc1=cc=c(o1)c=C, oscillator strength 0.4690

| Bit number | Feature importance | # of fragments |
|---|---|---|
| 9352 | 0.0330 | 115 |
| 8017 | 0.0251 | 107 |
| 6192 | 0.0218 | 129 |
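A sketch of the two bookkeeping steps behind this slide: selecting bits whose importance exceeds 0.02, then collecting the distinct fragments that set each selected bit across molecules. The per-molecule `{bit: set of fragments}` dictionaries are a hypothetical format; with RDKit, the `bitInfo` argument of `AllChem.GetMorganFingerprintAsBitVect` records which atom and radius produced each bit, from which such fragments could be derived.

```python
from collections import defaultdict


def important_bits(importances, threshold=0.02):
    """Return (bit, importance) pairs with importance > threshold, highest first."""
    selected = [(bit, imp) for bit, imp in enumerate(importances) if imp > threshold]
    return sorted(selected, key=lambda pair: pair[1], reverse=True)


def fragments_per_bit(molecule_bit_info, bits):
    """Union, over molecules, of the fragments that set each of the given bits.

    molecule_bit_info: list of dicts {bit index: set of fragment strings}
    (a hypothetical per-molecule format).
    """
    found = defaultdict(set)
    for info in molecule_bit_info:
        for bit in bits:
            found[bit].update(info.get(bit, ()))
    return dict(found)


# Toy example: only bits 1 and 2 pass the 0.02 threshold (0.02 itself does not).
ranked = important_bits([0.01, 0.03, 0.025, 0.0, 0.02])
frags = fragments_per_bit(
    [{1: {"C=C"}, 2: {"CO"}}, {1: {"C=C", "c(c)o"}}],
    [bit for bit, _ in ranked],
)
print(ranked, frags)
```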

SLIDE 20

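Important Fragments

| Radius | Mean OS | # of molecules |
|---|---|---|
| 1 | 0.175 | 10590 |
| 3 | 0.175 | 4 |
| 2 | 0.342 | 9 |
| 3 | 0.211 | 11 |
| 1 | 0.207 | 6263 |
| 3 | 0.101 | 4 |

(Fragment structures were shown as images on the slide and are not recoverable.)

Selection criteria:
  • Number of molecules containing the fragment > 3
  • Feature importance > 0.02
  • ECFP6, 16384-bit vector
  • Mean OS > 0.1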
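The selection criteria listed above can be expressed as a short filter. A sketch under assumptions: each molecule is given as a (set of fragment keys, OS) pair, a hypothetical format, and the importance > 0.02 criterion is applied beforehand by passing only fragments of important bits as `candidates`.

```python
from collections import defaultdict
from statistics import fmean


def high_os_fragments(molecules, candidates=None, count_above=3, mean_os_above=0.1):
    """Return {fragment: (mean OS, number of molecules)} for fragments found in
    more than `count_above` molecules whose mean OS exceeds `mean_os_above`.

    molecules: iterable of (fragment_set, os_value) pairs (hypothetical format).
    candidates: optional set of fragments to consider (e.g. from important bits).
    """
    os_by_fragment = defaultdict(list)
    for fragments, os_value in molecules:
        for frag in fragments:
            if candidates is None or frag in candidates:
                os_by_fragment[frag].append(os_value)
    selected = {}
    for frag, values in os_by_fragment.items():
        mean_os = fmean(values)
        if len(values) > count_above and mean_os > mean_os_above:
            selected[frag] = (mean_os, len(values))
    return selected


# Toy data: "A" occurs in 4 molecules (kept), "B" in only 3 (dropped).
toy = [({"A", "B"}, 0.5), ({"A"}, 0.3), ({"A"}, 0.2), ({"A", "B"}, 0.4), ({"B"}, 0.0)]
print(high_os_fragments(toy))
```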
SLIDE 21

Fragment with high OS

  • C(=C)c(c)o
  • Radius = 2
  • 9 molecules
  • Mean OS = 0.342
SLIDE 22

Ethyl 5-ethenylfuran-2-carboxylate OS = 0.5230

SLIDE 23

5-ethenyl-3H-1,3-oxazole-2-thione OS = 0.4790

SLIDE 24

Ethyl 2-(5-ethenylfuran-2-yl)propanoate OS = 0.4730

SLIDE 25

Thank You!