Getting the Most Out of Wikidata: Semantic Technology Usage in Wikipedia’s Knowledge Graph


slide-1
SLIDE 1

Markus Krötzsch: Wikidata Toolkit Kickoff

Getting the Most Out of Wikidata

Semantic Technology Usage in Wikipedia’s Knowledge Graph

Stas Malyshev, Markus Krötzsch, Larry González, Julius Gonsior, Adrian Bielefeldt

Wikimedia Foundation
TU Dresden

Presentation of paper published at the International Semantic Web Conference 2018
Download: https://iccl.inf.tu-dresden.de/web/Inproceedings3044/en

All slides CC-BY 3.0

slide-2
SLIDE 2
slide-3
SLIDE 3

Timeline of tropical cyclones

slide-4
SLIDE 4

Number of tropical cyclones per year

slide-5
SLIDE 5
slide-6
SLIDE 6

The Wikidata Query Service

www.wikidata.org Relational Database (MySQL)

Wiki Website

slide-7
SLIDE 7

The Wikidata Query Service

www.wikidata.org Relational Database (MySQL) Graph Database (BlazeGraph) query.wikidata.org

Load balancing/caching

Wiki Website Query Service

slide-8
SLIDE 8

The Wikidata Query Service

www.wikidata.org Relational Database (MySQL) Graph Database (BlazeGraph) query.wikidata.org

Linked data export Load balancing/caching Change monitoring

Wiki Website Query Service Live Synch.

slide-9
SLIDE 9

“Where are people born who travel to space?”

(Colour-coded by gender)

slide-10
SLIDE 10

“Which 19th century paintings show the moon?”

slide-11
SLIDE 11

“Which days of the week do disasters occur on?”

slide-12
SLIDE 12

“The free knowledge base that anyone can edit”

Entities: 50M
Statements: 570M
Labels: 260M
Descriptions: 1.5B
Links to Wikis: 65M

slide-13
SLIDE 13

“The free knowledge base that anyone can edit”

Editors: >23K

slide-14
SLIDE 14
slide-15
SLIDE 15
slide-16
SLIDE 16
slide-17
SLIDE 17

From Wikidata (rich graphs) to RDF (plain graphs)

Q80 Q4273323 [Erxleben et al., ISWC 2014]

“تيم بيرنرز لي”@ar “提姆·柏納-李”@zh “Тим Бернерс-Ли”@ru “טים ברנרס-לי”@he “Tim Berners-Lee”@en

label label

“Queen Elizabeth Prize for Engineering”@en

slide-18
SLIDE 18

From Wikidata (rich graphs) to RDF (plain graphs)

Q80 Q4273323 wds:Q80-...

p:P166 ps:P166

[Erxleben et al., ISWC 2014]

“تيم بيرنرز لي”@ar “提姆·柏納-李”@zh “Тим Бернерс-Ли”@ru “טים ברנרס-לי”@he “Tim Berners-Lee”@en

label label

“Queen Elizabeth Prize for Engineering”@en

slide-19
SLIDE 19

From Wikidata (rich graphs) to RDF (plain graphs)

Q80 Q4273323 wds:Q80-...

p:P166 ps:P166

“2013”^^xsd:gYear

pq:P585

Q214129

pq:P1706

[Erxleben et al., ISWC 2014] Q92743 Q3083180

pq:P1706 pq:P1706

Q62882

pq:P1706

“تيم بيرنرز لي”@ar “提姆·柏納-李”@zh “Тим Бернерс-Ли”@ru “טים ברנרס-לי”@he “Tim Berners-Lee”@en

label label

“Queen Elizabeth Prize for Engineering”@en

slide-20
SLIDE 20

From Wikidata (rich graphs) to RDF (plain graphs)

wdt:P166

Q80 Q4273323 wds:Q80-...

p:P166 ps:P166

“2013”^^xsd:gYear

pq:P585

Q214129

pq:P1706

[Erxleben et al., ISWC 2014] Q92743 Q3083180

pq:P1706 pq:P1706

Q62882

pq:P1706

“تيم بيرنرز لي”@ar “提姆·柏納-李”@zh “Тим Бернерс-Ли”@ru “טים ברנרס-לי”@he “Tim Berners-Lee”@en

label label

“Queen Elizabeth Prize for Engineering”@en
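The encoding drawn on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper `export_statement` is hypothetical and not part of any Wikidata tool, the `wd:` prefix on entity IDs is added here for readability, and the statement node name `wds:Q80-example` is a placeholder because the slide elides the real identifier as `wds:Q80-...`.

```python
# Minimal sketch of the reified RDF encoding shown on the slide.
# Triples are plain (subject, predicate, object) tuples of strings.

def export_statement(subject, prop, value, qualifiers, statement_node):
    """Return the triples for one Wikidata statement.

    `statement_node` is a placeholder name for the wds: node.
    """
    triples = [
        (subject, f"wdt:{prop}", value),         # direct "plain graph" edge
        (subject, f"p:{prop}", statement_node),  # entity -> statement node
        (statement_node, f"ps:{prop}", value),   # statement node -> main value
    ]
    for q_prop, q_value in qualifiers:
        # Qualifiers hang off the statement node, not off the entity.
        triples.append((statement_node, f"pq:{q_prop}", q_value))
    return triples


triples = export_statement(
    subject="wd:Q80",
    prop="P166",
    value="wd:Q4273323",
    qualifiers=[("P585", '"2013"^^xsd:gYear'), ("P1706", "wd:Q92743")],
    statement_node="wds:Q80-example",  # hypothetical; the real ID is elided on the slide
)
for triple in triples:
    print(" ".join(triple), ".")
```

The `wdt:` triple lets simple queries ignore qualifiers, while the `p:`/`ps:`/`pq:` path keeps the full statement available when the date or the co-recipients matter.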

slide-21
SLIDE 21

Wikidata RDF Exports

  • Weekly full dumps
    Currently 6.2 billion triples (42 GB Turtle gzip compressed)
    At https://dumps.wikimedia.org/wikidatawiki/entities/

  • Linked Data Exports
    Live data in many formats
    E.g., http://www.wikidata.org/wiki/Special:EntityData/Q42.nt
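The linked data export follows a simple URL pattern, which the sketch below reproduces. The helper name `entity_data_url` and the restriction to `Q`/`P` identifiers are assumptions made for this example; only the base URL and the `.nt` suffix come from the slide.

```python
# Sketch: build a linked-data export URL following the pattern on the slide.
BASE = "http://www.wikidata.org/wiki/Special:EntityData/"


def entity_data_url(entity_id, fmt="nt"):
    """Return the export URL for an entity ID such as 'Q42'.

    `fmt` is the file suffix selecting the serialisation ('nt' as on the slide).
    """
    # Assumption for this sketch: only item (Q...) and property (P...) IDs are accepted.
    if not (entity_id[:1] in ("Q", "P") and entity_id[1:].isdigit()):
        raise ValueError(f"unexpected entity ID: {entity_id!r}")
    return f"{BASE}{entity_id}.{fmt}"


print(entity_data_url("Q42"))
```

Fetching the resulting URL with any HTTP client returns the current data for that one entity, which suits small lookups better than downloading a full dump.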

slide-22
SLIDE 22

Wikidata SPARQL Query Service

Official query service since mid 2015
User interface at https://query.wikidata.org/
All the data (6.2B triples), live (latency <60s)
No limits (well, almost):
  • 60 sec timeout
  • No limit on result size (!)
  • No limit on parallel queries, but CPU-time budget per client
Extra SERVICEs in SPARQL (geo, Wikipedia API, labels, …)
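As a rough illustration of programmatic use, the sketch below builds a request URL for the query service without sending it. The endpoint path `/sparql`, the `format=json` parameter, and the example query are assumptions of this sketch rather than content from the slides; the query asks for awards received by Q80 and uses the label SERVICE mentioned above.

```python
from urllib.parse import urlencode

# Assumed machine-readable endpoint on the same host as the user interface.
ENDPOINT = "https://query.wikidata.org/sparql"

# Illustrative query: up to ten awards received by Q80, with English labels.
QUERY = """
SELECT ?award ?awardLabel WHERE {
  wd:Q80 wdt:P166 ?award .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 10
"""


def build_request_url(query, endpoint=ENDPOINT):
    """Return a GET URL that carries the query as a URL-encoded parameter."""
    return endpoint + "?" + urlencode({"query": query, "format": "json"})


url = build_request_url(QUERY)
print(url[:60])
```

A client that actually sends this request would reasonably set its own timeout a little above the server-side limit and identify itself with a descriptive User-Agent header, since the CPU-time budget is tracked per client.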

slide-23
SLIDE 23

A simple SPARQL query

slide-24
SLIDE 24

A simple SPARQL query

slide-25
SLIDE 25

An advanced SPARQL query

slide-26
SLIDE 26

“It’s too complicated!”

slide-27
SLIDE 27

“It’s not too complicated!”

SPARQL is widely used
>100M requests per month (3.8M per day) in 2018
It’s an API – most users are not in direct contact
The community offers tutorials, workshops and support services
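To put the daily figure in perspective, a quick back-of-the-envelope conversion is sketched below. The 30-day month is an assumption of this sketch, and the per-second value is only an average that says nothing about peaks.

```python
REQUESTS_PER_DAY = 3.8e6  # daily figure quoted on the slide

per_month = REQUESTS_PER_DAY * 30               # assuming a 30-day month
per_second = REQUESTS_PER_DAY / (24 * 60 * 60)  # average over one day

print(f"{per_month / 1e6:.0f}M requests per month")
print(f"{per_second:.0f} requests per second on average")
```

Under these assumptions the daily rate corresponds to roughly 114M requests per month, or about 44 requests per second on average.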

slide-28
SLIDE 28

“It does not scale!”

slide-29
SLIDE 29

“It does not scale!”

Excellent availability and performance
50% of queries answered in <40 ms (95% in <440 ms; 99% in <40 s)
Less than 0.05% of queries time out
Service has never been down so far

Affordable system setup:
Three commodity servers (+ three for geo-redundancy)
Standard Linux load balancing + standard HTTP cache
All software/customisations free & open source
See https://github.com/wikimedia/wikidata-query-rdf

slide-30
SLIDE 30

So what are those 100Ms of queries?

We looked at 481,716,280 queries logged during 24 weeks

Analysing SPARQL query activity is hard
Extreme influence of scripts and bots
Does not average out over time, each month looks rather different!
→ Classify major sources (bots) and isolate “organic” part of the traffic
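One simple way to see how strongly a few sources dominate a log is to measure the share of queries issued by the most active ones. The sketch below is not the authors' classification method; the record layout of `(source, query)` pairs and all source names are hypothetical.

```python
from collections import Counter


def top_source_share(log_records, k=3):
    """Return the fraction of queries issued by the k most active sources.

    `log_records` is a non-empty iterable of (source, query) pairs.
    """
    counts = Counter(source for source, _query in log_records)
    top = sum(n for _source, n in counts.most_common(k))
    return top / sum(counts.values())


# Toy log; all source names are made up for illustration.
log = [("bot-a", "q1")] * 5 + [("bot-b", "q2")] * 3 + [("browser", "q3")] * 2
print(top_source_share(log, k=2))  # 8 of the 10 toy queries come from the top two sources
```

If a handful of sources accounts for most of the log, aggregate statistics mainly describe those scripts, which is why separating them from the remaining traffic matters before drawing conclusions about human users.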

slide-31
SLIDE 31

Robotic and organic traffic

Robotic traffic
  • Dominates (60% of queries by top-3 bots)
  • Mostly data integration and data download
  • More uniform, shorter

Organic traffic
  • Much smaller volume (0.6% of all queries)
  • Browsers, mobile apps, miscellaneous
  • More diverse, longer

Path queries are very important
Reified statements in 4%–10% of queries

slide-32
SLIDE 32

See for yourself!

We have released complete, timestamped query logs
Anonymised to avoid user identification
With limited user agent information
Full dataset, no sample!
Currently 12 weeks in 2017 – more to come soon

https://kbs.inf.tu-dresden.de/WikidataSPARQL

slide-33
SLIDE 33

Conclusions

  • Semantic web technology – it works!
    Interactive analytics and query is affordable for dynamic knowledge graphs with >10^9 edges
    Usable for large, open communities without prior RDF/SPARQL experience

We want more applications & more research!

slide-34
SLIDE 34

Thanks

  • Denny Vrandecic, Lydia Pintscher, and the whole Wikimedia Deutschland e.V. team in Berlin who made Wikidata possible
  • Brad Bebee, Bryan Thompson, and all of the BlazeGraph team
  • Anyone contributing to RDF and SPARQL libraries
  • All who contributed to W3C standards used here, esp. SPARQL
  • The Wikidata community
  • TimBL ;-)
slide-35
SLIDE 35

Films with future heads of government

slide-36
SLIDE 36

Literature

Stanislav Malyshev, Markus Krötzsch, Larry González, Julius Gonsior, Adrian Bielefeldt: “Getting the Most out of Wikidata: Semantic Technology Usage in Wikipedia’s Knowledge Graph”. In Denny Vrandečić, et al., eds., Proceedings of the 17th International Semantic Web Conference (ISWC'18)

Adrian Bielefeldt, Julius Gonsior, Markus Krötzsch: “Practical Linked Data Access via SPARQL: The Case of Wikidata”. Proceedings of the WWW2018 Workshop on Linked Data on the Web (LDOW-18), CEUR Workshop

Fredo Erxleben, Michael Günther, Markus Krötzsch, Julian Mendez, Denny Vrandečić: “Introducing Wikidata to the Linked Data Web”. In Proceedings of the 13th International Semantic Web Conference (ISWC 2014)

slide-37
SLIDE 37

SPARQL Feature Distribution (2017/2018)

slide-38
SLIDE 38

Triples per query: organic (blue) / robotic (yellow)

slide-39
SLIDE 39

Languages of labels in organic queries

slide-40
SLIDE 40

SPARQL feature co-occurrence