Result Clustering for Keyword Search on Graphs Madhulika Mohanty Supervisor: Dr Maya Ramanath
● Common data formats across the Web ● Easily interpretable by machines → “Web of data”
LINKED DATA
● Collection of knowledge bases.
● All the knowledge bases are interlinked.
● Represented as RDF.
● RDF: Resource Description Framework
● Data model to represent structured data
● Triples: <subject> <predicate> <object>
● Examples: <Tom_Hanks> <ActedIn> <Cast_Away> and <Tom_Hanks> <ActedIn> <Forrest_Gump>
[Figure: Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/]
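The triple model above can be illustrated with a minimal sketch in plain Python. It stores the slide's two example triples as tuples and views them as a labeled graph; the helper names `objects_of` and `as_adjacency` are illustrative assumptions, not part of any RDF library.

```python
from collections import defaultdict

# The two example triples from the slide, as (subject, predicate, object) tuples.
triples = [
    ("Tom_Hanks", "ActedIn", "Cast_Away"),
    ("Tom_Hanks", "ActedIn", "Forrest_Gump"),
]


def objects_of(triples, subject, predicate):
    """Return every object o such that (subject, predicate, o) is in the data."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def as_adjacency(triples):
    """View the triples as a labeled directed graph: node -> [(edge label, node)]."""
    graph = defaultdict(list)
    for s, p, o in triples:
        graph[s].append((p, o))
    return dict(graph)


movies = objects_of(triples, "Tom_Hanks", "ActedIn")
graph = as_adjacency(triples)
```

Each triple becomes one labeled edge, which is why a collection of triples can be queried as a graph.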
Sample YAGO graph (source: http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/)
Querying graphs
● SPARQL queries
– Structured queries
– Structured results
– E.g., graph databases like Neo4j
● Natural language queries → SPARQL → structured results
● Relationship queries
– Unstructured text
Relationship queries
● Unstructured text, like Google.
● Answers are relationships among the queried entities.
● More popularly known as “Keyword Search”.
● Why Keyword Search?
– Makes graphs queryable by casual users.
– Finds interesting relationships, even surprise discoveries.
Jeff Weiner, Mark Zuckerberg
● I bet you know this.. [Figure: a relationship between Jeff Weiner and Mark Zuckerberg]
● Now that's interesting!! [Figure: another relationship between Jeff Weiner and Mark Zuckerberg]
Another interesting one.. [Figure: graph connecting Mausam, Nobel Prize winner Edwin G. Krebs, Bill Gates and the 14th Dalai Lama; edge labels: Doctorate, Faculty, Honorary Doctorate, Honorary Doctorate]
Movie dataset graph
[Figure: graph with type nodes Movie and Actor; year nodes 1994, 2000, 2006, 2011; movie nodes Forrest Gump, Larry Crowne, Cast Away, Casino Royale, The Girl with the Dragon Tattoo; actor nodes Tom Hanks, Robin Wright, Daniel Craig, Rooney Mara; edge labels ActedIn, InYear, IsA]
Searching for 'Hanks Wright'
● Results are trees.
● There should exist interconnection between all pairs of keyword nodes.
Keyword search on graph-structured data
Given a set of query keywords $Q = \{k_1, k_2, \ldots, k_n\}$ and a graph $G = (V, E)$, find the top-$K$ minimal answer trees $A_1, A_2, \ldots, A_K$, ordered by their relevance score.
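The problem statement can be made concrete with a small, hedged sketch. It is a naive enumeration (one breadth-first search per candidate root, then the union of the shortest paths to each keyword), not the search algorithm used in this work. Several things are assumptions for illustration: the graph is treated as undirected, the edge list is a small hand-picked subset of the movie graph, a keyword matches a node when it is a substring of the node label, and "relevance" is simply tree size (fewer edges ranks higher).

```python
from collections import deque

# Assumed subset of the movie graph: (subject, edge label, object).
EDGES = [
    ("Tom Hanks", "ActedIn", "Forrest Gump"),
    ("Robin Wright", "ActedIn", "Forrest Gump"),
    ("Tom Hanks", "ActedIn", "Cast Away"),
    ("Tom Hanks", "IsA", "Actor"),
    ("Robin Wright", "IsA", "Actor"),
    ("Forrest Gump", "IsA", "Movie"),
    ("Cast Away", "IsA", "Movie"),
    ("Forrest Gump", "InYear", "1994"),
    ("Cast Away", "InYear", "2000"),
]


def build_adjacency(edges):
    """Undirected adjacency sets; edge labels are ignored in this sketch."""
    adj = {}
    for s, _, o in edges:
        adj.setdefault(s, set()).add(o)
        adj.setdefault(o, set()).add(s)
    return adj


def bfs_parents(adj, root):
    """Breadth-first search from root; maps each reached node to its parent."""
    parents = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in sorted(adj[node]):  # sorted for deterministic output
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return parents


def path_edges(parents, node):
    """Edges on the BFS-tree path from node back to the root."""
    edges = []
    while parents[node] is not None:
        edges.append(tuple(sorted((node, parents[node]))))
        node = parents[node]
    return edges


def top_k_answer_trees(edges, keywords, k):
    """Return up to k answer trees (frozensets of edges), smallest first."""
    adj = build_adjacency(edges)
    found = set()
    for root in adj:
        parents = bfs_parents(adj, root)
        tree, hits = set(), set()
        for kw in keywords:
            matches = [n for n in parents if kw.lower() in n.lower()]
            if not matches:
                break  # this root cannot reach every keyword
            best = min(matches, key=lambda n: (len(path_edges(parents, n)), n))
            hits.add(best)
            tree.update(path_edges(parents, best))
        else:
            # Keep only minimal trees: the root must be a keyword node or a branching point.
            root_degree = sum(root in e for e in tree)
            if root in hits or root_degree >= 2:
                found.add(frozenset(tree))
    ranked = sorted(found, key=lambda t: (len(t), sorted(t)))
    return ranked[:k]


results = top_k_answer_trees(EDGES, ["Hanks", "Wright"], 2)
```

On this toy graph the two smallest trees for 'Hanks Wright' connect the actors through Forrest Gump and through the Actor type node, which already hints at why results with different contexts end up mixed in one ranked list.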
Research Areas
● Efficiency
● Ranking of results
● Quality of results
● User experience
Searching for 'Rekha Bachchan' 18 such results Different contexts
User experience
● All kinds of results shown.
● Multiple results of the same type, e.g., Amitabh and Rekha were co-actors in multiple movies.
– Most of them are ranked high.
– The user is forced to scroll through all of them before finding new answers.
● Results with different contexts.
– The user might completely miss some information.
● One way to deal with it
– Clustering similar results.
Result clustering
● Cluster similar results together.
● Rank the clusters.
● Show one representative per cluster (Highest Ranked Tree).
– User may click it and see all results.
● Advantages:
– Can be used with any existing Keyword Search algorithm.
– Provides user with a bird's eye view over the results.
– Easy to analyze interesting patterns.
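The presentation step described above (group, rank the clusters, show the highest-ranked tree of each) can be sketched as follows. This is a minimal illustration under stated assumptions: the tree ids, scores, and cluster labels are hypothetical placeholders, the clustering itself is taken as given, and clusters are ranked by the score of their best member, which is an assumption since the slides do not specify the cluster-ranking rule.

```python
from collections import defaultdict


def summarize_clusters(results, cluster_of):
    """Group scored results by cluster and pick a representative for each.

    results:    list of (tree_id, relevance_score), higher score = better.
    cluster_of: dict mapping tree_id -> cluster label (from any clustering method).
    """
    clusters = defaultdict(list)
    for tree_id, score in results:
        clusters[cluster_of[tree_id]].append((tree_id, score))

    summary = []
    for members in clusters.values():
        members.sort(key=lambda m: -m[1])  # best tree first
        summary.append({
            "representative": members[0][0],  # highest-ranked tree in the cluster
            "best_score": members[0][1],
            "members": [tree_id for tree_id, _ in members],
        })
    # Assumed cluster ranking: by the score of the representative.
    summary.sort(key=lambda c: -c["best_score"])
    return summary


# Hypothetical output of some keyword search algorithm.
results = [("t1", 0.9), ("t2", 0.8), ("t3", 0.7), ("t4", 0.6)]
cluster_of = {"t1": "co-actors", "t2": "co-actors", "t3": "other", "t4": "co-actors"}
summary = summarize_clusters(results, cluster_of)
```

Because the function only consumes scored results and cluster labels, it can sit on top of any existing keyword search algorithm, in line with the first advantage listed on the slide.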
Result clustering (contd.)
Isomorphism based
● Cluster isomorphic trees together.
● Two trees need to have exact same structure to be clustered together.
● Ends up generating too many clusters.
Tree Edit distance based
● Clustering based on tree-edit distance with a similarity threshold of 0.9
● Cannot differentiate different contexts like the “Amitabh Bachchan” and “Bol Bachchan” case.
Language Model (LM) based
● Agglomerative Complete Link Clustering
● Each tree represented as a LM.
● JS Divergence as similarity measure.
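A hedged sketch of the language-model variant follows. The slides do not say how a tree's language model is built or which stopping threshold is used, so this sketch assumes an unsmoothed unigram model over the tree's node and edge labels and an illustrative cutoff `max_divergence`. The example trees use hypothetical placeholder labels (`Movie_A`, `Movie_B`, `Movie_C`). Jensen-Shannon divergence is computed with base-2 logarithms, so it lies in [0, 1], and lower values mean more similar trees.

```python
import math
from collections import Counter
from itertools import combinations


def language_model(tokens):
    """Unsmoothed unigram model over a tree's labels (an assumption of this sketch)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence between two distributions, base-2 logs."""
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in set(p) | set(q)}

    def kl(a):
        return sum(pa * math.log2(pa / m[t]) for t, pa in a.items() if pa > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def complete_link_clusters(models, max_divergence):
    """Agglomerative complete-link clustering over tree language models.

    The distance between two clusters is the largest pairwise divergence between
    their members; merging stops once the closest pair exceeds max_divergence.
    """
    clusters = [[i] for i in range(len(models))]

    def link(a, b):
        return max(js_divergence(models[i], models[j]) for i in a for j in b)

    while len(clusters) > 1:
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda xy: link(clusters[xy[0]], clusters[xy[1]]),
        )
        if link(clusters[x], clusters[y]) > max_divergence:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # x < y, so index x is unaffected
    return clusters


# Hypothetical label bags for three result trees of the query 'Rekha Bachchan'.
trees = [
    ["Amitabh Bachchan", "ActedIn", "Movie_A", "ActedIn", "Rekha"],
    ["Amitabh Bachchan", "ActedIn", "Movie_B", "ActedIn", "Rekha"],
    ["Bol Bachchan", "IsA", "Movie", "IsA", "Movie_C", "ActedIn", "Rekha"],
]
models = [language_model(t) for t in trees]
clusters = complete_link_clusters(models, max_divergence=0.3)
```

With these toy inputs the two co-actor trees merge while the “Bol Bachchan” tree stays apart, because its label distribution differs even though it shares the keyword “Bachchan”.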
Clustering quality measure: user evaluation
● Dataset: IMDB
● User evaluations over 20 manually selected queries.
– Each query had 2 to 6 keywords.
● Users were not aware of the underlying technique.
● Users were asked to rate, on a scale of 1-5:
– How similar are the trees within a cluster?
– How dissimilar are the trees in different clusters?
Thank you