INFO 4300 / CS4300 Information Retrieval slides adapted from Hinrich Schütze's, linked from http://informationretrieval.org/
IR 18/26: Finish Web Search Basics
Paul Ginsparg
Cornell University, Ithaca, NY
3 Nov 2009
Pamela Anderson
Christina Aguilera
letras de canciones
A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, S. Stata, A. Tomkins, and
Strongly connected component (SCC) in the center
Lots of pages that get linked to, but don’t link (OUT)
Lots of pages that link to other pages, but don’t get linked to (IN)
Tendrils, tubes, islands
# of in-links (in-degree) averages 8–15, not randomly distributed (Poissonian), instead a power law:
# pages with in-degree i is ∝ 1/i^α, α ≈ 2.1
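To make the contrast between a Poissonian and a power-law in-degree distribution concrete, here is a minimal Python sketch. It is an illustration, not something from the slides: the function names, the cutoff MAX_DEGREE, and the Poisson mean of 10 (picked from inside the 8–15 range above) are all assumptions, and the power law is not calibrated to reproduce that average.

```python
import math


def power_law_pmf(alpha, max_degree):
    """Return P(in-degree = i) for i = 1..max_degree, with P(i) proportional to 1 / i**alpha."""
    weights = [i ** -alpha for i in range(1, max_degree + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def poisson_pmf(mean, k):
    """Poisson probability of exactly k in-links, computed in log space to avoid overflow."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


ALPHA = 2.1           # exponent quoted on the slide
MAX_DEGREE = 100_000  # hypothetical cutoff so the distribution can be normalized
POISSON_MEAN = 10.0   # hypothetical mean, chosen inside the 8-15 range

pl = power_law_pmf(ALPHA, MAX_DEGREE)

# Compare the two distributions at a few in-degrees.
for i in (1, 10, 100, 1000):
    print(f"in-degree {i:>5}: power law {pl[i - 1]:.3e}, Poisson {poisson_pmf(POISSON_MEAN, i):.3e}")
```

Under these assumptions the Poisson probability collapses to essentially zero well before in-degree 1000, while the power law still assigns such pages a small but non-negligible probability. That heavy tail is what "not randomly distributed" refers to.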