Data Quality Challenges
● ACM JDIQ EiC
● Open Knowledge Networks (Biomedicine)
● Data Science for Finance (DSfin)
Louiqa Raschid
Smith School of Business, Computer Science, and UMIACS
Technical Review of Data Quality
● Provenance, Cleaning, Annotation
● Data cleaning infrastructure and tools:
  ○ Robust first generation.
  ○ Big data and scalability.
  ○ Human-in-the-loop (HumInt).
● Process:
  ○ Fitness to task.
  ○ Understanding workflows.
First-generation methodologies and products
Scenarios
● Lung cancer data (primary) generated by clinicians:
  ○ Patient entity identification in clinical notes, e.g., JM, J.M., etc. (cleaning)
  ○ Scale: Barthel 4
  ○ Stages: P0T0, Stage4, etc. (annotation)
● Analytics over (secondary) sources:
  ○ Drug-induced liver injury (DILI): the phenotype includes elevated levels of liver enzymes, etc.
  ○ There are many causes of elevated liver enzymes, including transplants, some infants, etc. (fitness to task; HumInt)
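The patient entity identification step above (JM vs. J.M.) can be illustrated with a minimal Python sketch. It assumes mentions have already been extracted from the notes as short strings; the function names `normalize_initials` and `group_mentions` are hypothetical and not taken from any tool named in the slides.

```python
import re
from collections import defaultdict


def normalize_initials(mention: str) -> str:
    """Reduce an initials-style mention to a canonical key, e.g. 'J.M.' -> 'JM'."""
    # Drop everything that is not a letter (periods, spaces), then uppercase.
    return re.sub(r"[^A-Za-z]", "", mention).upper()


def group_mentions(mentions):
    """Group raw mentions that share the same canonical key."""
    groups = defaultdict(list)
    for mention in mentions:
        groups[normalize_initials(mention)].append(mention)
    return dict(groups)


# Illustrative input only; not real clinical data.
groups = group_mentions(["JM", "J.M.", "J. M.", "j.m", "AB"])
print(groups)
```

Matching keys only produce candidate groups: two different patients can share the same initials, so a human-in-the-loop check of each group would still be needed.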
Scenarios
● Privacy-preserving data mining:
  ○ Entity linkage in the de-identified space.
  ○ Different entries contribute hashed identifiers, but they may be missing a variety of fields. (provenance; fitness to task)
● iASiS SEMANTIC Data Cleaning / Annotation Pipeline
● Finding patterns in OKN: DILI Case Study
  ○ (Provenance; fitness to task; HumInt; annotation.)
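As a rough illustration of entity linkage over hashed identifiers with missing fields, the Python sketch below compares two de-identified records only on the fields both of them supply. The field names, the salt value, and the `min_shared` threshold are assumptions made for this example, not details from the presentation.

```python
import hashlib

SALT = "shared-secret"  # hypothetical; contributors would have to agree on it


def hash_field(value):
    """Hash a normalized field value; a missing value stays None."""
    if value is None:
        return None
    normalized = value.strip().lower()
    return hashlib.sha256((SALT + normalized).encode("utf-8")).hexdigest()


def deidentify(record):
    """Replace every field of a record with its salted hash."""
    return {field: hash_field(value) for field, value in record.items()}


def link_score(a, b, min_shared=2):
    """Fraction of agreeing hashes over fields present in both records.

    Returns None when fewer than min_shared fields are available,
    because there is too little evidence to decide either way.
    """
    shared = [f for f in a if f in b and a[f] is not None and b[f] is not None]
    if len(shared) < min_shared:
        return None
    agreeing = sum(a[f] == b[f] for f in shared)
    return agreeing / len(shared)


# Illustrative records only.
r1 = deidentify({"name": "Jane Doe", "dob": "1970-01-01", "zip": None})
r2 = deidentify({"name": " jane doe", "dob": "1970-01-01", "zip": "20742"})
r3 = deidentify({"name": "John Roe", "dob": "1970-01-01", "zip": "20742"})
r4 = deidentify({"name": "Jane Doe", "dob": None, "zip": None})
```

Exact hash comparison only tolerates the differences removed by normalization before hashing, so typos in the source values would break a match. Returning None for sparse pairs keeps "not enough fields" distinct from "fields disagree", which matters when judging fitness to task.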
iASiS
DILI Case Study
● Given a knowledge graph and a DILI phenotype (keywords) ...
● Create profiles, e.g., [Phenotype | Drug | Gene | Pathway]
● Rank the drugs most at risk for DILI.
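The profile-and-rank idea above can be sketched in Python over a toy graph. Everything here is an assumption for illustration: the edge lists, the placeholder names (`drug_A`, `gene_1`, `pathway_X`), and the scoring rule (count of matching profiles per drug) are not the method used in the case study.

```python
from collections import Counter

# Toy knowledge graph edges. All names are placeholders, not biomedical facts.
drug_gene = [("drug_A", "gene_1"), ("drug_A", "gene_2"),
             ("drug_B", "gene_2"), ("drug_C", "gene_3")]
gene_pathway = [("gene_1", "pathway_X"), ("gene_2", "pathway_X"),
                ("gene_2", "pathway_Y"), ("gene_3", "pathway_Z")]
pathway_phenotype = [("pathway_X", "elevated liver enzymes"),
                     ("pathway_Y", "elevated liver enzymes"),
                     ("pathway_Z", "rash")]


def build_profiles(keywords):
    """Collect (phenotype, drug, gene, pathway) paths matching the keywords."""
    wanted = {k.lower() for k in keywords}
    profiles = []
    for pathway, phenotype in pathway_phenotype:
        if phenotype.lower() not in wanted:
            continue
        for gene, gene_path in gene_pathway:
            if gene_path != pathway:
                continue
            for drug, drug_target in drug_gene:
                if drug_target == gene:
                    profiles.append((phenotype, drug, gene, pathway))
    return profiles


def rank_drugs(profiles):
    """Rank drugs by how many matching profiles they appear in (assumed score)."""
    counts = Counter(drug for _, drug, _, _ in profiles)
    return counts.most_common()


profiles = build_profiles(["elevated liver enzymes"])
ranking = rank_drugs(profiles)
print(ranking)
```

A raw path count is the simplest possible score. A real ranking would likely weight edges by provenance and have an expert review the top candidates.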
Tamr: Understanding Workflows
Lessons learned
● First-generation tools work well.
● The next generation needs to focus on processes, workflows, and HumInt.
● Scientists still spend huge amounts of time on cleaning. How can we fix this problem?
● Are Open Knowledge Networks a solution?
● An unexpected case study ...