Thomas Wood, NLP/data science consultant
Past projects
- Boehringer Ingelheim - pharma
- CV Library:
○ Predict industries/salaries from CV - word2vec + CNN
○ Predict search terms from CV - LSTM
- Forensic stylometry demo
○ Identifying author
- Chatbots
○ Intelligent home, etc.
○ Question answering about products
- Document clustering, classification, trend detection, sentiment analysis
- Cambridge Masters: anaphora resolution (e.g. the "it" in "it's raining")
Boehringer Ingelheim
- Before running a clinical trial, a pharma company writes a 200-page PDF called a protocol.
- I developed an ML model which extracts important data from the protocol: type of treatment, toxicity, number of subjects, etc.
Boehringer Ingelheim (2)
- The company has factories all over the world. Most medicines go through multiple facilities and countries before going to market.
- When a manufacturing defect occurs, it is written up in free text in the local language by a factory worker, e.g. "temperature deviation of 5 degrees due to crack in vial, probably occurring in transit".
- I ran unsupervised topic detection to identify the commonest problems in various categories of products from the unstructured text data.
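As a rough illustration of surfacing the commonest problems per product category, the sketch below counts how many reports in each category mention each term. This is plain frequency counting, not the topic detection model used in the project, and it assumes the reports are already in English. The function name, stop-word list and sample reports are all hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list; a real one would be much longer.
STOPWORDS = {"of", "in", "to", "due", "the", "a", "by", "and", "was", "on"}


def top_terms_by_category(reports, n=3):
    """reports: iterable of (category, free_text) pairs.

    Returns {category: [(term, number_of_reports_mentioning_it), ...]},
    sorted by count (descending) and then alphabetically.
    """
    counts = defaultdict(Counter)
    for category, text in reports:
        tokens = set(re.findall(r"[a-z]+", text.lower()))
        # Count each term once per report so one long report cannot dominate.
        counts[category].update(tokens - STOPWORDS)
    return {
        category: sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
        for category, c in counts.items()
    }


# Made-up sample reports, for illustration only.
SAMPLE_REPORTS = [
    ("vials", "temperature deviation of 5 degrees due to crack in vial"),
    ("vials", "crack in vial found on arrival"),
    ("vials", "hairline crack, probably occurring in transit"),
    ("tablets", "tablets discoloured, humidity in storage"),
    ("tablets", "humidity deviation in warehouse"),
]

if __name__ == "__main__":
    for category, terms in top_terms_by_category(SAMPLE_REPORTS).items():
        print(category, terms)
```

A real topic model would group related words (e.g. "crack", "broken", "shattered") into one theme, which simple counting cannot do.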
CV-Library
- Upload CV
- Goes through word2vec
- Recommends industry
- Use TensorFlow NMT to recommend search term
○ Repurposed a Vietnamese translator
- Trained on 12 million CVs
- Deployed on GCP
- 7% increase in signups - £££
When you upload a CV, it is converted to TXT and passed through a deep NN. Some fields which the candidate previously had to fill out then get autofilled. Result: more engagement, fewer dropouts. This was 2.5 years ago, before ELMo/BERT.
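The idea of turning CV text into word vectors and recommending an industry can be sketched as below. This is a simplified stand-in: it averages toy word vectors and picks the nearest industry by cosine similarity, whereas the project used word2vec with a CNN. The vectors, the industry names and the function names are hypothetical.

```python
import math

# Hypothetical 2-d "embeddings"; real word2vec vectors have hundreds of dimensions.
TOY_VECTORS = {
    "python": (0.9, 0.1), "software": (0.8, 0.2), "developer": (0.85, 0.15),
    "nurse": (0.1, 0.9), "patient": (0.2, 0.8), "ward": (0.15, 0.85),
}

# Hypothetical industry reference vectors (assumed, not learned here).
INDUSTRY_CENTROIDS = {"IT": (1.0, 0.0), "Healthcare": (0.0, 1.0)}


def embed(text):
    """Average the vectors of all known words; None if no word is known."""
    vectors = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vectors:
        return None
    return tuple(sum(dim) / len(vectors) for dim in zip(*vectors))


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def recommend_industry(cv_text):
    """Return the industry whose reference vector is closest to the CV vector."""
    vector = embed(cv_text)
    if vector is None:
        return None
    return max(INDUSTRY_CENTROIDS, key=lambda name: cosine(vector, INDUSTRY_CENTROIDS[name]))


if __name__ == "__main__":
    print(recommend_industry("Senior Python software developer"))
```

Averaging discards word order, which is one reason to put a CNN or LSTM on top of the word vectors instead.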
Chatbots: Artificial Solutions
- Built chatbots for mobile and web
- Shell, AT&T, IKEA, Samsung, HTC, Rightmove
- Integrated smart home with voice commands
○ turn on the coffee machine every Tuesday when I open the downstairs front door
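To show what a smart-home command of this kind has to be broken down into, here is a minimal sketch that maps the example utterance onto a structured rule using a single regular expression. It is not how the production chatbots worked; the `Rule` fields, the pattern and the function name are hypothetical, and a real system would need to handle far more phrasings.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    action: str          # "on" or "off"
    device: str          # e.g. "coffee machine"
    day: str             # e.g. "Tuesday"
    trigger_event: str   # "open" or "close"
    trigger_object: str  # e.g. "downstairs front door"


# Hypothetical pattern covering only one phrasing of the command.
PATTERN = re.compile(
    r"turn (?P<action>on|off) the (?P<device>.+?) "
    r"every (?P<day>\w+) "
    r"when I (?P<event>open|close) the (?P<object>.+)",
    re.IGNORECASE,
)


def parse_command(utterance: str) -> Optional[Rule]:
    """Return a Rule for a recognised utterance, or None otherwise."""
    match = PATTERN.fullmatch(utterance.strip())
    if match is None:
        return None
    return Rule(
        action=match.group("action").lower(),
        device=match.group("device"),
        day=match.group("day"),
        trigger_event=match.group("event").lower(),
        trigger_object=match.group("object"),
    )


if __name__ == "__main__":
    print(parse_command(
        "turn on the coffee machine every Tuesday when I open the downstairs front door"
    ))
```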
Forensic stylometry
- https://www.fastdatascience.com/author-prediction-demo
- Oxford University workshop on NLP every summer
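Author identification can be illustrated with a very small sketch: describe each text by how often it uses common function words, then attribute an unknown text to the author whose profile is closest. This is a common stylometric baseline, not necessarily the method behind the demo linked above; the word list and function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, very short function-word list; real lists contain many more words.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "it", "i", "but")


def profile(text):
    """Relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty text
    return [counts[word] / total for word in FUNCTION_WORDS]


def predict_author(unknown_text, known_texts):
    """known_texts: {author: sample_text}. Returns the closest author.

    Distance is the sum of absolute differences between profiles.
    """
    target = profile(unknown_text)

    def distance(author):
        return sum(abs(x - y) for x, y in zip(target, profile(known_texts[author])))

    return min(known_texts, key=distance)


if __name__ == "__main__":
    samples = {
        "A": "I think that it is fine but I doubt that it lasts",
        "B": "The history of the city and the origins of the empire",
    }
    print(predict_author("I know that it rains but I hope that it stops", samples))
```

With texts this short the result is only illustrative; function-word profiles need much longer samples to be reliable.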
Document analysis, trend detection
- Developed NLP pipeline for English and German at Pattern Science AG, near Frankfurt
- Used for document classification
- Trend detection
- Emerging topics
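One simple way to think about emerging topics is to compare how often each term appears in a recent batch of documents against an earlier batch. The sketch below does this with add-one smoothing. It is an assumed illustration, not the pipeline built at Pattern Science AG; the function name, thresholds and sample documents are hypothetical.

```python
import re
from collections import Counter


def _term_counts(docs):
    """Count lower-cased alphabetic tokens across a list of documents."""
    return Counter(t for doc in docs for t in re.findall(r"[a-z]+", doc.lower()))


def emerging_terms(earlier_docs, recent_docs, min_ratio=2.0, min_count=2):
    """Return [(term, ratio)] for terms whose smoothed relative frequency
    grew by at least min_ratio and which occur at least min_count times
    in the recent documents, sorted by ratio (descending)."""
    earlier = _term_counts(earlier_docs)
    recent = _term_counts(recent_docs)
    vocab_size = len(set(earlier) | set(recent))
    earlier_total = sum(earlier.values()) + vocab_size  # add-one smoothing
    recent_total = sum(recent.values()) + vocab_size

    results = []
    for term, count in recent.items():
        if count < min_count:
            continue
        ratio = ((count + 1) / recent_total) / ((earlier[term] + 1) / earlier_total)
        if ratio >= min_ratio:
            results.append((term, ratio))
    return sorted(results, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    earlier = ["bank loans and interest rates", "interest rates rise", "bank profits fall"]
    recent = ["bank adopts blockchain", "blockchain startups raise funds",
              "interest in blockchain grows"]
    print(emerging_terms(earlier, recent))
```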
Masters Cambridge
- Unsupervised learning for identifying pleonastic pronouns
○ It seemed that things would never get any better
○ It surprised me to hear him say that
- Download available