Databases: Some Research
Opportunities For Latin America
Marcelo Arenas
Pontificia Universidad Católica de Chile
Goal Of This Talk
Present an interesting area of research
in databases
Has been identified as an important area
Has enough open problems for many
research projects
Needs theoretical and practical research
Has been the subject of some research
projects in Latin America
The Problem Of Sharing Data
Main challenges:
Data may reside at several different sites
Data may be stored in several
different ways
Schema level: name, employee_name,
emp_name, …
Format level: Relational databases, XML,
plain text, …
The Problem Of Sharing Data
[Figure: three data sources feeding a global database]
Table with column name: Peter, John
XML document:
<employee>
<name> John </name>
<name> Phil </name>
</employee>
Table with column emp_name: Peter, Ron
Data Exchange
Transform data structured under a source
schema into data structured under a
target schema
[Figure: source schema S, target schema T, and a mapping ∑ST from S to T]
Data Exchange
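Source schema: Emp(name)
Target schema: Worker(name)
Rule: Emp(X) → Worker(X)
Source data, Emp(name): Peter, John
[Figure: Worker(name) tables on the target side, one holding Peter and John, one holding Ron]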
Data Exchange: Main Challenges
Emp(name) to Worker(name): Emp(X) → Worker(X)
What is a good rule language?
Emp(name) to Worker(name, salary)?
Emp(name, phone) to Worker(name, salary)?
Data Exchange: Main Challenges
Rule language: precise semantics and
good expressive power
Source schema: Emp(name)
Target schema: Worker(name, salary)
Rule: Emp(X) → ∃Y Worker(X, Y)
How can we translate the source data?
Source data, Emp(name): Peter, John
Can we do this efficiently?
Data Exchange: Main Challenges
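Source schema: Emp(name)
Source data, Emp(name): Peter, John
Target schema: Worker(name, salary)
Rule: Emp(X) → ∃Y Worker(X, Y)
Candidate translations, Worker(name, salary):
(Peter, NULL1), (John, NULL2)
(Peter, 100K), (John, 120K)
(Peter, NULL), (John, NULL)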
What is a good translation?
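One candidate translation for the rule Emp(X) → ∃Y Worker(X, Y) gives every employee its own placeholder for the unknown salary. The sketch below is an illustration only, not the algorithm of any particular system: the function name `exchange` and the encoding of nulls as strings NULL1, NULL2, ... are assumptions made for this example.

```python
from itertools import count


def exchange(emp_names):
    """Apply Emp(X) -> exists Y. Worker(X, Y) to a list of source names.

    Each unknown salary becomes a fresh labeled null (NULL1, NULL2, ...),
    so distinct employees are not forced to share the same salary.
    The string encoding of nulls is an assumption of this sketch.
    """
    fresh = count(1)  # supplies the index of the next unused null
    return [(name, f"NULL{next(fresh)}") for name in emp_names]


worker = exchange(["Peter", "John"])
print(worker)  # [('Peter', 'NULL1'), ('John', 'NULL2')]
```

Using distinct nulls avoids committing to a concrete salary such as 100K, and avoids claiming that Peter and John earn the same amount, which a single shared NULL could suggest.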
Data Exchange: Main Challenges
Source schema: Emp(name)
Source data, Emp(name): Peter, John
Target schema: Worker(name, salary)
Rule: Emp(X) → ∃Y Worker(X, Y)
Target data, Worker(name, salary): (Peter, NULL1), (John, NULL2)
Does Peter have a salary?
What is the salary of Peter?
How do we answer target queries?
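One standard way to read these two questions is to return only answers that hold no matter which values the nulls stand for. The sketch below illustrates that reading on the Worker table with labeled nulls. The helper names and the string encoding of nulls are assumptions of this example, and it covers only these two simple queries.

```python
def is_null(value):
    # Labeled nulls are encoded as strings starting with "NULL"
    # (an assumption of this sketch).
    return isinstance(value, str) and value.startswith("NULL")


def has_salary(worker, name):
    """Boolean query: exists Y. Worker(name, Y).

    A null still witnesses that some salary exists.
    """
    return any(n == name for n, _ in worker)


def salary_of(worker, name):
    """Query returning the salaries of name.

    Answers containing a null are dropped, since the actual value is unknown.
    """
    return [s for n, s in worker if n == name and not is_null(s)]


worker = [("Peter", "NULL1"), ("John", "NULL2")]
print(has_salary(worker, "Peter"))  # True
print(salary_of(worker, "Peter"))   # []
```

Under this reading, "Does Peter have a salary?" is answered yes, while "What is the salary of Peter?" has no answer that can be returned with certainty.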
Data Exchange: Relational Databases
Data exchange has been extensively
studied in the relational world
IBM Almaden, UCSC and UofT
It has also been implemented: Clio (DB2)
Semantics of data exchange has been
precisely defined
Efficient algorithms for translating source
data and answering target queries have
been developed
Ongoing Work
XML data exchange
Metadata management
XML Data Exchange
Transform XML data structured under a
source schema into data structured under
a target schema.
[Figure: source schema S, target schema T, and a mapping ∑ST from S to T]
What is the difference?
XML document: Data is semi-structured
XML schema: Powerful schema language
XML query language: Navigational
capabilities
XML Document: Example
<company>
<employee>
<name> Peter Buneman </name>
</employee>
<employee>
<name>
<first> Ron </first>
<last> Fagin </last>
</name>
</employee>
</company>
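The two employee elements above store the name in different shapes, which is what makes the data semi-structured. The sketch below reads both shapes with Python's standard xml.etree.ElementTree module. The function name `employee_names` and the choice to join the parts of a nested name with a space are assumptions of this example.

```python
import xml.etree.ElementTree as ET

DOC = """
<company>
  <employee>
    <name> Peter Buneman </name>
  </employee>
  <employee>
    <name>
      <first> Ron </first>
      <last> Fagin </last>
    </name>
  </employee>
</company>
"""


def employee_names(xml_text):
    """Return one display name per <name> element, whatever its shape."""
    names = []
    for name in ET.fromstring(xml_text).iter("name"):
        if len(name) == 0:
            # <name> holds plain text only
            names.append(name.text.strip())
        else:
            # <name> holds child elements such as <first> and <last>
            names.append(" ".join(child.text.strip() for child in name))
    return names


print(employee_names(DOC))  # ['Peter Buneman', 'Ron Fagin']
```

A relational table with a single name column would not need the branch on the shape of the element. Handling such variation is part of what XML data exchange has to address.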
Data Exchange: Relational And XML
Relational schema, Emp(name): Peter, John
XML schema:
<employee>
<name> Peter </name>
<name> John </name>
</employee>
We can do the same for other data formats!
XML Data Exchange: Our Contribution
Ongoing project: U. Edinburgh, UofT and
PUC Chile
Results: Fundamental problems of XML
data exchange have been solved
XML Data Exchange: Our Contribution
[Figure: mapping from a source XML schema to a target XML schema]
Rule language: precise semantics and
good expressive power
Semantics of XML data exchange
has been precisely defined
Efficient algorithms for translating
source data have been developed
Efficient algorithms for answering target
queries have also been developed
What Else Has To Be Done?
Ongoing Work
XML data exchange
Metadata management
Metadata Management
Process of creating schema mappings is
time-consuming
We need tools to manage schema
mappings automatically
Metadata Management: Composition
[Figure: schemas S, T and U with mappings ∑ST from S to T, ∑TU from T to U, and ∑SU from S to U]
Composition: ∑SU = ∑ST ∘ ∑TU
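One way to picture the composition operator is to view each mapping as a binary relation between instances: a pair (I, K) belongs to ∑ST ∘ ∑TU when some intermediate instance J satisfies (I, J) in ∑ST and (J, K) in ∑TU. The sketch below shows only this set-level view on toy data. The instance labels S1, T1, U1, ... are hypothetical, and the sketch does not address how to express the composed mapping in a rule language.

```python
def compose(m_st, m_tu):
    """Relational composition of two mappings given as sets of instance pairs.

    (i, k) is in the result iff some j has (i, j) in m_st and (j, k) in m_tu.
    """
    return {(i, k) for (i, j) in m_st for (j2, k) in m_tu if j == j2}


# Toy mappings over hypothetical instance labels (not actual tables).
sigma_st = {("S1", "T1"), ("S1", "T2"), ("S2", "T2")}
sigma_tu = {("T1", "U1"), ("T2", "U2")}

sigma_su = compose(sigma_st, sigma_tu)
print(sorted(sigma_su))  # [('S1', 'U1'), ('S1', 'U2'), ('S2', 'U2')]
```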
Metadata Management: Inverse
[Figure: schemas S, T and U with mappings ∑ST from S to T and ∑UT from U to T]
Inverse: ∑TU = (∑UT)⁻¹
Composition: ∑SU = ∑ST ∘ ∑TU = ∑ST ∘ (∑UT)⁻¹
Metadata Management: More Operators
[Figure: schemas S, T, W and U with mappings ∑ST, ∑SW, ∑TU and ∑WU]
What do we do in this case?
Metadata Management For Data
Exchange Systems
General metadata management
framework was proposed by Bernstein
Based on generic schema-mapping
operators: Composition, Inverse, ...
Has been studied for the case of
relational databases
Microsoft, IBM Almaden and UCSC
Composition operator has been
extensively studied
Metadata Management For Data
Exchange Systems:
Our (proposed) Contribution
Starting project: IBM Almaden
and PUC Chile
Two main components:
Continue the study of the relational
metadata operators
Extend the framework to XML data
exchange systems
Thank You!
Another Interesting Area: RDF
What is RDF? A framework for
representing information in the
Web (W3C)
Graph data model
RDF: Example
[Figure: RDF graph with nodes John, Peter, Person, Employee, Company and Microsoft, and edges labeled rdf:type, rdf:sc and works_in]
RDF: Possible Applications
Web metadata
Automation of information processing
on the Web by agents
RDF Databases: Motivation
Large volumes of RDF data
Use of RDF data in ways unpredicted
when first designed
Need to design reliable tools to manage
RDF data
RDF Databases: Motivation
“Perhaps most interesting is the research
opportunities suggested by the term “semantic
Web.” While it may be unclear what the concept
truly entails, much of the recent work has
centered on “ontologies.” [...] The database
community should be looking for opportunities
to exploit these developments in future
database management systems.”
The Lowell Database Research Self-Assessment Meeting, May 2003
RDF Databases: Our Contribution
Foundations of RDF databases: U. Chile,
CWR and UofT
Querying RDF databases: U. Chile, CWR,
U. Talca and PUC Chile
Querying RDF Databases
[Figure: RDF graph with nodes John, Peter, Person, Employee, Company and Microsoft, and edges labeled rdf:type, rdf:sc and works_in]
Querying RDF Databases:
Our Contribution
SPARQL: A query language for RDF
Graph-matching query language
W3C Candidate Recommendation
6 April 2006
SPARQL: Example
[Figure: RDF graph with nodes John, Peter, Person, Employee, Company, Microsoft and john@puc.cl, and edges labeled rdf:type, rdf:sc, works_in and email]
Query: ?X :- (?X, works_in, Microsoft)
Answer: ?X = Peter
Query: ?X, ?Y :- (?X, rdf:type, Employee) OPTIONAL (?X, email, ?Y)
Answer: (?X = John, ?Y = john@puc.cl), (?X = Peter)
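The two queries on this slide can be read as graph matching: a triple pattern is matched against every triple of the graph, and OPTIONAL extends a match when it can and keeps it unchanged otherwise. The sketch below illustrates that reading in plain Python. It is not the SPARQL specification's algorithm. The helper names `match` and `optional` are hypothetical, and the exact set of triples in GRAPH is an assumption, since the figure on the slide is not fully recoverable.

```python
# Assumed triples (subject, predicate, object), using the labels on the slide.
GRAPH = [
    ("John", "rdf:type", "Employee"),
    ("John", "email", "john@puc.cl"),
    ("Peter", "rdf:type", "Employee"),
    ("Peter", "works_in", "Microsoft"),
    ("Microsoft", "rdf:type", "Company"),
    ("Employee", "rdf:sc", "Person"),
]


def match(pattern, graph):
    """Return the variable bindings under which the triple pattern occurs in graph."""
    results = []
    for triple in graph:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable must keep the same value everywhere in the pattern.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break  # a constant that does not match
        else:
            results.append(binding)
    return results


def optional(left, pattern, graph):
    """Extend each binding with matches of pattern, or keep it as is if none exist."""
    out = []
    for b in left:
        # Replace variables that are already bound by their values.
        bound = tuple(b.get(term, term) for term in pattern)
        extended = [{**b, **m} for m in match(bound, graph)]
        out.extend(extended or [b])
    return out


q1 = match(("?X", "works_in", "Microsoft"), GRAPH)
q2 = optional(match(("?X", "rdf:type", "Employee"), GRAPH), ("?X", "email", "?Y"), GRAPH)
print(q1)  # [{'?X': 'Peter'}]
print(q2)  # [{'?X': 'John', '?Y': 'john@puc.cl'}, {'?X': 'Peter'}]
```

In the second result, Peter appears with no binding for ?Y because the assumed graph has no email triple for him. This is the behavior OPTIONAL is meant to allow.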
SPARQL: Our Contribution
We consider a fragment of SPARQL which
encompasses all the main issues yet is simple
to formalize.
We provide a formal semantics for this
fragment.
We study the complexity of evaluating queries
and provide complexity bounds.
We propose some optimization techniques.
© 2006 Microsoft Corporation. All rights reserved.