CAIM: Cerca i Anàlisi d’Informació Massiva



SLIDE 1

CAIM: Cerca i Anàlisi d’Informació Massiva

FIB, Grau en Enginyeria Informàtica
Slides by Marta Arias, José Luis Balcázar, Ramon Ferrer-i-Cancho, Ricard Gavaldà

Department of Computer Science, UPC

Fall 2020
http://www.cs.upc.edu/~caim

SLIDE 2
0. Presentation

SLIDE 3

COVID 19

◮ Follow the instructions that FIB has sent you.
◮ Always sit in the same place.
◮ Write down your row and column so that you can remember them.

SLIDE 4

Instructors

◮ Ramon Ferrer-i-Cancho (lectures + exercises 10 & 20; lab 12)
  ◮ rferrericancho@cs.upc.edu
  ◮ Omega S124, 93 413 4028

◮ Ignasi Gómez (labs 11, 21 & 22)
  ◮ ignasi.gomez@upc.edu

◮ Javier Béjar (lab 13)
  ◮ bejar@cs.upc.edu
  ◮ Omega 204, 93 413 7879

SLIDE 5

Class Logistics

◮ Fridays, 12–14 (A6E01), 15–17 (A6E02)
  ◮ Theory and exercises. Exercises will often be proposed in advance.

◮ Thursdays, lab sessions
  ◮ Guided lab activities; expect about 2 additional hours of autonomous work per session, on average.
  ◮ Some lab sessions will end with handing in a short written report; these reports count towards the course evaluation.

SLIDE 6

Lab work - important rules

◮ Lab work is done in pairs. Exceptions require prior permission.
◮ This semester: keep the same partner for the whole semester (see the instructions on the Racó).
◮ Do not exchange anything beyond general ideas with other pairs; doing so will be considered plagiarism.

SLIDE 7

Exercises

◮ In class, we will solve only some of the proposed exercises.
◮ You are strongly encouraged to try to solve the rest on your own.
◮ Self-study: one or more small topics will not be explained in class. They will appear in the exam.

SLIDE 8

Evaluation

◮ Evaluation: as per the “Guia Docent” (course guide)
◮ Partial exam 1 (P1): November 5, 2020, 16:00-17:30 (during the partial-exam week)
◮ Partial exam 2 (P2): January 11, 2021, 15:00-18:00
◮ On the day of P2 you may instead choose to take a final exam (F) covering the whole course
◮ Grade: 40 % Lab + max(30 % P1 + 30 % P2, 60 % F)
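
As an illustration of the weighting above, here is a minimal sketch of the grade computation. The function name `course_grade`, the 0-10 grading scale, and the choice to count an exam that was not taken as 0 are assumptions made for this example; the “Guia Docent” remains the authoritative source.

```python
from typing import Optional


def course_grade(lab: float, p1: float,
                 p2: Optional[float] = None,
                 final: Optional[float] = None) -> float:
    """Apply 40 % Lab + max(30 % P1 + 30 % P2, 60 % F).

    Assumption: all marks are on a 0-10 scale, and an exam that was
    not taken (None) contributes 0 to its branch of the max.
    """
    p2_mark = 0.0 if p2 is None else p2
    f_mark = 0.0 if final is None else final
    # Work in percentage points and divide once at the end.
    exams = max(30 * p1 + 30 * p2_mark, 60 * f_mark)
    return (40 * lab + exams) / 100


# Student A takes both partial exams; student B swaps P2 for the final.
grade_a = course_grade(lab=8, p1=5, p2=7)
grade_b = course_grade(lab=8, p1=5, final=7)
print(grade_a, grade_b)
```

In this sketch, student B's weak P1 mark is dropped because the final-exam branch of the `max` is larger than the two-partials branch.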

SLIDE 9

Contents I

First half (until midterm):

◮ Core Information Retrieval:
  ◮ Introduction: Concept. The IR process
  ◮ Information Retrieval Models
  ◮ Indexing and Searching, Implementation
  ◮ Information Retrieval Evaluation, Feedback Models

◮ Web Search:
  ◮ Link analysis: PageRank
  ◮ Crawling the web
  ◮ Architecture of a Web search system

SLIDE 10

Contents II

Second half:

◮ The “Big Data” Slogan
  ◮ Architecture of large-scale web search systems
  ◮ The Map-Reduce paradigm
  ◮ Introduction to NoSQL databases
  ◮ The Apache ecosystem for web search

◮ Social Network Analysis:
  ◮ Characterization of real complex networks
  ◮ Communities, influence, information diffusion

◮ Clustering and Locality Sensitive Hashing
◮ Recommender Systems

SLIDE 11

Bibliography

◮ R. Baeza-Yates, B. Ribeiro-Neto: Modern Information Retrieval (2nd ed.). Addison Wesley, 2010.
◮ I.H. Witten, A. Moffat, T. Bell: Managing Gigabytes. Morgan Kaufmann, 1999.
◮ C.D. Manning, P. Raghavan, H. Schütze: Introduction to Information Retrieval. Cambridge, 2008.
◮ Z. Markov, D.T. Larose: Data Mining the Web. Wiley, 2007.
◮ M. Russell: Mining the Social Web: Analyzing Data from Facebook, Twitter, LinkedIn, and Other Social Media Sites. O’Reilly, 2011.
◮ ... There’s a whole web out there
