Joining Ranked Input In Practice
Ihab F. Ilyas Purdue University
Joint work with
Walid G. Aref Ahmed K. Elmagarmid Purdue University Hewlett Packard
Joining Ranked Input In Practice Ihab F. Ilyas Purdue University - - PowerPoint PPT Presentation
Joining Ranked Input In Practice Ihab F. Ilyas Purdue University Joint work with Walid G. Aref Ahmed K. Elmagarmid Purdue University Hewlett Packard Motivation Almost every application that depends
Ihab F. Ilyas Purdue University
Joint work with
Walid G. Aref Ahmed K. Elmagarmid Purdue University Hewlett Packard
– Similarity queries on multiple features. – Query by multiple examples. – Joining multiple ordered streams. – Documents ranking on multiple keywords search. – Document search results from multiple search engines.
(Demo - ICDE2002)
– Build a database system prototype that has the capabilities to manipulate continuous media objects. – Allow for flexible querying of these objects. – Store media objects inside the database and present the results as
– Modify the different engine components to satisfy the new functionality requirements and the new data types.
– Medical video database application on top of the system.
“get 4 video shots that are most similar in color to a given query image”
“get 4 video shots that are most similar in color, texture and edge histogram to a given query image”
Color Histogram Edge Histogram Texture Tamura Query
Color Histogram Edge Histogram Texture Tamura Query
the top K total scores.
A D C B E A B E C D 1 2 3 4 5 5 4 3 2 1
SELECT Images.name FROM GlobalRank( Images, Color, Texture, QueryImage.jpg, ScoreFunction, 10) ;
– Implementation outside the SQL engine loose efforts of the query optimizer. – A chance to be shuffled with other operators in the plan for better performance (pushing down predicates and projections) – In brief, under the optimizer control !
RJOIN Sorted Input 1 Sorted Input 2 Sorted Input n …… <oid,score1,…> <oid,score2,…> <oid,scoren,…> <oid, f(score1, score2, …scoren), …> Sorted output
NRA-RJ NRA-RJ Sorted Input Sorted Input Sorted Input Input
D A B C NRA-RJ NRA-RJ Sorted Input Sorted Input Sorted Input Input
D A B C
Output stream has no specific grade/score attached not a valid input. Consider all the inputs together more context, faster termination. Less flexible. A multi-way
leaf-level of the query plan. Allow inputs to have ranges
Report output objects with a range from worst to best grade. NRA-RJ: Consider some of the inputs at each stage less context. NRA-RJ: More flexible, pipelined special join
well in the query plan.
If Queue.Top.WorstGrade > Maximum (Best Grades of all other
Return the tuple.
LOOP
Left.GetNext() Right.GetNext() Compute threshold. Update best and worst grades of objects in Queue. If Queue.Top.WorstGrade > Maximum (Best Grades of all other
break
End Loop Remove from Queue Return tuple.
A (10-10) B (4 - 4) C (3 - 3) D (2 - 2) E (1 - 1) A (5 - 5) D (4 - 4) C (3 - 3) B (2 - 2) E (1 - 1) A(15-15) D (7 - 7) C (6 - 6) E (5 - 5) B (4 - 4) A (3 - 3) A(15-15) A(15-22) D(7-22) B(4 – 8) D(4 – 8) C(6 – 6) B(4 – 7) D(4 – 7) C(6 – 6) B(6 – 6) D(6 – 6) C(6 – 6) A(15-21) D(7-13) C(12– 12) A(15-21)
Buffer Queue Buffer Queue 1 2
n
NRA RJ NRA RJ
Input A Input B Input C m m n Input A Input B Input C n/p m’ m’
C : 4 D: 3 A : 10 B : 5 E : 2 A : 1 B : 5 C : 4 D: 3 E : 2 A (10-14) B (10-10) C (4 – 9)
HD indexes.
STOP_AFTER and the NN operators.
the database.
NRA-RJ is sensitive to input
2 4 6 8 10 12 14 20 40 60 80 100 120
K CPU Time
O1 O2 O3 O4 O5 O6 1000 2000 3000 4000 5000 6000 20 40 60 80 100 120
K Buffer Size
O1 O2 O3 O4 O5 O6
K NRA-RJ NRA J* 5 3.692383 2.699707 4.881836 10 12.87939 9.37793 10.77881 15 17.28613 13.97217 15.12305 20 19.15967 15.3457 16.49658 25 20.65723 16.62451 17.71289 30 23.05957 19.27686 20.36523 35 23.83545 20.27148 21.39111 40 25.38232 21.78711 22.87549 45 26.87256 23.25537 25.11572 50 28.7417 25.71826 27.45361 55 29.88037 27.04443 28.93604 60 30.51514 27.9917 29.72705 65 31.16211 28.60742 30.37402 70 32.28662 30.07568 31.77979 80 33.29834 32.1123 33.78516
J* is less sensitive to input
100 200 300 400 500 600 20 40 60 80 100 120
K CPU Time
O1 O2 O3 O4 O5 O6 1000 2000 3000 4000 5000 6000 7000 8000 20 40 60 80 100 120
K Buffer Size
O1 O2 O3 O4 O5 O6 20 40 60 80 100 120 20 40 60 80 100 120
K Page Accesses
O1 O2 O3 O4 O5 O6
100 200 300 400 500 600 20 40 60 80 100 120
K CPU Time
NRA-RJ(1) NRA-RJ(2) J*(1) J*(2) 1000 2000 3000 4000 5000 6000 7000 8000 20 40 60 80 100 120
K Buffer Size
NRA-RJ(1) NRA-RJ(2) J*(1) J*(2) 20 40 60 80 100 120 140 20 40 60 80 100 120
K Page Access
NRA-RJ(1) NRA-RJ(2) J*(1) J*(2)
Choosing the input ordering is an important optimizer decision.
5 10 15 20 25 30 35 40 20 40 60 80 100 120
K Page Accesses
NRA-RJ NRA J* 500 1000 1500 2000 2500 20 40 60 80 100 120
K Buffer Size
NRA-RJ NRA J* 5 10 15 20 25 30 20 40 60 80 100 120
K CPU Time
NRA-RJ NRA J*
20 40 60 80 100 120 140 160 180 2 3 4 5 6
Pipeline length Page Accesses
NRA-RJ NRA J* 1000 2000 3000 4000 5000 6000 7000 8000 9000 2 3 4 5 6
Pipeline length Buffer Size
NRA-RJ NRA J* 50 100 150 200 250 300 350 2 3 4 5 6
Pipeline length CPU Time
NRA-RJ NRA J*