CS490W
Luo Si
Department of Computer Science, Purdue University
XML Data and Retrieval

XML and Retrieval: Outline
  Semi-Structured Data
    XML, Examples, Applications
  XML Search
    XQuery, XIRQL
  Text-Based XML Retrieval
    Vector-space model, INEX
Semi-Structured Data
XML has been used as the standard representation of semi-structured data.
eXtensible Markup Language (XML) is a W3C-recommended general-purpose markup language that supports a wide variety of applications.
  A framework for defining markup languages
  Open vocabulary for tags
  Each set of XML tags corresponds to a different application
  Facilitates the sharing of data across different information systems, particularly systems connected via the Internet
Examples: RSS, XHTML, MathML
Semi-Structured Data
Structure of XML
XML data is organized into documents, like unstructured data
There are structures (nodes/tags) within the documents
Each XML document is an ordered, labeled tree
Element nodes are labeled with
  Node name (e.g., chapter)
  Node attributes and their values (e.g., size=1000; time=01/01/2007)
Element nodes may have child nodes or data
Data (e.g., text strings) exists within leaf nodes
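The node labels listed above (name, attributes with values, children, data) can be inspected directly with a parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the sample chapter document and the helper name describe are assumptions made up for illustration, reusing the slide's example name and attribute values.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the node name "chapter" and the size/time attributes
# echo the slide's examples; the title and para content are invented.
doc = (
    '<chapter size="1000" time="01/01/2007">'
    "<title>Intro</title>"
    "<para>Some text</para>"
    "</chapter>"
)

root = ET.fromstring(doc)

def describe(node):
    """Return (name, attributes, child names, text data) for one element node."""
    text = (node.text or "").strip()
    return node.tag, dict(node.attrib), [child.tag for child in node], text

root_info = describe(root)      # an inner node: has attributes and children
leaf_info = describe(root[1])   # a leaf node: carries the text data

print(root_info)
print(leaf_info)
```

Children are returned in document order, which reflects the "ordered" part of the ordered, labeled tree; the text data shows up only on the leaf node.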
XML Example
<book id="ML_Tom">
  <title>Machine Learning</title>
  <author>
    <firstname>Tom</firstname>
    <surname>Mitchell</surname>
  </author>
  ...
  <p>Machine Learning Applications...</p>
  ...
</book>
Elements, Attributes/Values, Data (Text String)
XML Example
<book id="ML_Tom">
  <title>Machine Learning</title>
  <author>
    <firstname>Tom</firstname>
    <surname>Mitchell</surname>
  </author>
  ...
  <p>Machine Learning Applications...</p>
  ...
</book>
Elements, Attributes/Values, Data (Text String)
[Figure: tree view of the document; node labels include book, title, author, firstname, surname, chapter, and para]
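The tree view of the book example can be reproduced by walking the parsed document depth-first. This is a sketch only, again assuming Python's standard xml.etree.ElementTree; the helper name outline is hypothetical, and the input contains only the elements shown on the slide, with the elided parts ("...") left out rather than guessed.

```python
import xml.etree.ElementTree as ET

# Only the elements shown on the slide; the parts elided with "..." are omitted.
BOOK_XML = """
<book id="ML_Tom">
  <title>Machine Learning</title>
  <author>
    <firstname>Tom</firstname>
    <surname>Mitchell</surname>
  </author>
  <p>Machine Learning Applications...</p>
</book>
"""

def outline(node, depth=0):
    """Yield one indented line per element, in document (pre-order) order."""
    attrs = "".join(f' {key}="{value}"' for key, value in node.attrib.items())
    text = (node.text or "").strip()
    label = f"{node.tag}{attrs}" + (f": {text}" if text else "")
    yield "  " * depth + label
    for child in node:
        yield from outline(child, depth + 1)

lines = list(outline(ET.fromstring(BOOK_XML.strip())))
print("\n".join(lines))
```

Each output line corresponds to one element node; indentation marks depth in the tree, attributes appear next to the node name, and text data appears only on leaf nodes.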