How to address Polo? Grammatically correct Prof. Chau Dr. Chau - - PowerPoint PPT Presentation




slide-1
SLIDE 1

http://poloclub.gatech.edu/cse6242


CSE6242 / CX4242: 


Data & Visual Analytics


Duen Horng (Polo) Chau


Associate Professor, College of Computing
 Associate Director, MS Analytics
 Machine Learning Area Leader, College of Computing
 Georgia Tech

slide-2
SLIDE 2

Google “Polo Chau” (only one in the world)

slide-3
SLIDE 3

How to address Polo?

Grammatically correct

  • Prof. Chau
  • Dr. Chau

Grammatically incorrect, but popular

  • Prof. Polo
  • Dr. Polo
slide-4
SLIDE 4

Course Registration

  • As of 3pm today
  • CSE 6242 A
      • 186/202 seats filled
      • 81/250 waitlist slots taken
  • CX 4242 A
      • 50/68 seats filled
      • 4/100 waitlist slots taken
  • CSE 6242 Q (distance-learning): 6 students

This classroom seats 300. If you are on the waitlist, please wait for seats to be released (some students typically “drop” after today).

slide-5
SLIDE 5

Course TAs

Be very very nice to them!

Office hours and locations (TBD) on course homepage


poloclub.gatech.edu/cse6242

Neetha Ravishankar, Jennifer Ma, Mansi Mathur, Arathi Arivayutham, Vineet Vinayak Pasupulety, Siddharth Gulati

slide-6
SLIDE 6

poloclub.gatech.edu

slide-8
SLIDE 8

We work with (really) large data.

slide-9
SLIDE 9

Internet

50 Billion Web Pages

www.worldwidewebsize.com www.opte.org
slide-10
SLIDE 10

Facebook

2 Billion Users

slide-11
SLIDE 11

Citation Network

www.scirus.com/press/html/feb_2006.html#2 Modified from well-formed.eigenfactor.org

250 Million Articles

slide-12
SLIDE 12

Many More

  • Twitter: Who-follows-whom (500 million users)
  • Who-buys-what (120 million users)
  • Cellphone network: Who-calls-whom (100 million users)
  • Protein-protein interactions: 200 million possible interactions in human genome

Sources: www.selectscience.net www.phonedog.com www.mediabistro.com www.practicalecommerce.com/
slide-13
SLIDE 13

“Big Data” Analyzed

DATA INSIGHTS

Graph                          Nodes          Edges
YahooWeb                       1.4 Billion    6 Billion
Symantec Machine-File Graph    1 Billion      37 Billion
Twitter                        104 Million    3.7 Billion
Phone call network             30 Million     260 Million
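
To get a feel for these sizes, here is a rough back-of-the-envelope sketch in Python. The node and edge counts come from the table above; the 8 bytes per edge (two 4-byte integer IDs) is an assumption for illustration, not a figure from the slides.

```python
# Node and edge counts copied from the table above.
graphs = {
    "YahooWeb": (1.4e9, 6e9),
    "Symantec Machine-File Graph": (1e9, 37e9),
    "Twitter": (104e6, 3.7e9),
    "Phone call network": (30e6, 260e6),
}


def edges_per_node(nodes, edges):
    """Average number of edges per node."""
    return edges / nodes


def edge_list_gigabytes(edges, bytes_per_edge=8):
    """Approximate size of a raw edge list in GB.

    Assumption: each edge is stored as two 4-byte integer IDs.
    """
    return edges * bytes_per_edge / 1e9


for name, (nodes, edges) in graphs.items():
    print(f"{name}: {edges_per_node(nodes, edges):.1f} edges/node, "
          f"~{edge_list_gigabytes(edges):.0f} GB as a raw edge list")
```

Under that storage assumption, the largest graph in the table would not fit in the memory of a typical laptop, which hints at why storage and scalability come up later in the course.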

We also work with small data. 
 Small data also needs love.

slide-14
SLIDE 14

7

slide-15
SLIDE 15

7 ± 2

Number of items an average human holds in working memory

George Miller, 1956

slide-16
SLIDE 16
slide-17
SLIDE 17

7

slide-18
SLIDE 18

Data Insights

slide-19
SLIDE 19

How to do that?

COMPUTATION + HUMAN INTUITION

slide-20
SLIDE 20

Or, to ride the AI wave…

ARTIFICIAL INTELLIGENCE + HUMAN INTELLIGENCE

slide-21
SLIDE 21

How to do that?

Both develop methods for making sense of network data

COMPUTATION
  • Automatic
  • Summarization, clustering, classification
  • >Millions of nodes

INTERACTIVE VIS
  • User-driven; iterative
  • Interaction, visualization
  • Thousands of nodes

slide-27
SLIDE 27

Our Approach for Big Data Analytics

Our research combines the 
 Best of Both Worlds

DATA MINING
  • Automatic
  • Summarization, clustering, classification
  • >Millions of items

HCI (Human-Computer Interaction)
  • User-driven; iterative
  • Interaction, visualization
  • Thousands of items
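
One way to picture this combination is a minimal Python sketch: an automatic step summarizes a large graph, and only a small subgraph is handed to a person for interactive exploration. The degree-based ranking, the function name `top_k_subgraph`, and the synthetic data are all assumptions chosen for illustration, not the method used in the research described here.

```python
import random
from collections import Counter


def top_k_subgraph(edges, k):
    """Automatic step: rank nodes by degree, keep the k highest,
    and return only the edges among them (small enough to draw)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    keep = {node for node, _ in degree.most_common(k)}
    return [(u, v) for u, v in edges if u in keep and v in keep]


# Synthetic stand-in for a large graph (hypothetical data).
random.seed(0)
n_nodes = 100_000
edges = [(random.randrange(n_nodes), random.randrange(n_nodes))
         for _ in range(500_000)]

# Human step (not shown): explore the reduced graph in a visualization tool.
small = top_k_subgraph(edges, k=1_000)
print(len(edges), "edges reduced to", len(small), "for interactive exploration")
```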

slide-28
SLIDE 28

Our mission & vision:

Scalable, interactive, usable
 tools for big data analytics

slide-29
SLIDE 29

“Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination.”

(Einstein might or might not have said this.)

slide-30
SLIDE 30

AI Interpretation & Protection Cyber Security Social Good & Health Large Graph Mining & Visualization

Polo Club of Data Science poloclub.github.io

slide-31
SLIDE 31

Logistics

  • Course homepage (all assignments, slides posted here): poloclub.gatech.edu/cse6242/
  • Discussion, Q&A, find teammates: Piazza (link available on canvas.gatech.edu)
  • Assignment submission: Canvas (use Piazza for discussion)

Make sure you’re at the right Piazza!
 (CSE-6242-O01, CSE-6242-OAN have their Piazza forums too)

slide-32
SLIDE 32

Course Homepage

For syllabus, HWs, projects, datasets, etc.

Google “cse6242”


poloclub.gatech.edu/cse6242/

slide-33
SLIDE 33

Join Piazza ASAP
 (via canvas.gatech.edu)

slide-34
SLIDE 34

Important to join Piazza because…

  • Polo will announce events related to this class and data science in general
      • Distinguished lectures
      • Seminars
      • Hackathons (free food, prizes)
      • Company recruitment events (free food, swag)

slide-35
SLIDE 35

Course Goals

slide-36
SLIDE 36

What is Data & Visual Analytics?

slide-37
SLIDE 37

What is Data & Visual Analytics?

No formal definition!

slide-38
SLIDE 38

What is Data & Visual Analytics?

No formal definition!

Polo’s definition: the interdisciplinary science of combining computation techniques and interactive visualization to transform and model data to aid discovery, decision making, etc.

slide-39
SLIDE 39

What are the “ingredients”?

slide-40
SLIDE 40

What are the “ingredients”?

Need to worry (a lot) about: storage, complex system design, scalability of algorithms, visualization techniques, interaction techniques, statistical tests, etc.

It wasn’t this complex before the big data era. Why?

slide-41
SLIDE 41

http://spanning.com/blog/choosing-between-storage-based-and-unlimited-storage-for-cloud-data-backup/

slide-42
SLIDE 42

What is big data? Why care?

Many businesses are based on big data.

  • Search engines: rank webpages, predict what you’re going to type
  • Advertisement: infer what you like, based on what your friends like; show relevant ads
  • E-commerce: recommend movies/products (e.g., Netflix, Amazon)
  • Health IT: patient records (EMR)
  • Finance

slide-43
SLIDE 43

Good news! Many jobs!

Most companies are looking for “data scientists”.

“The data scientist role is critical for organizations looking to extract insight from information assets for ‘big data’ initiatives and requires a broad combination of skills that may be fulfilled better as a team.”
 (Gartner, http://www.gartner.com/it-glossary/data-scientist)

Breadth of knowledge is important.
 This course helps you learn some important skills.

slide-44
SLIDE 44

Course Schedule
 (Analytics Building Blocks)

Collection, Cleaning, Integration, Visualization, Analysis, Presentation, Dissemination

slide-45
SLIDE 45

Building blocks. Not Rigid “Steps”.

Can skip some
Can go back (two-way street)

  • Data types inform visualization design
  • Data size informs choice of algorithms
  • Visualization motivates more data cleaning
  • Visualization challenges algorithm assumptions
    (e.g., user finds that results don’t make sense)

Collection, Cleaning, Integration, Visualization, Analysis, Presentation, Dissemination

slide-46
SLIDE 46

Course Goals

  • Learn visual and computation techniques and use them in complementary ways
  • Gain a breadth of knowledge
  • Learn practical know-how by working on real data & problems

slide-47
SLIDE 47

Grading

  • [50%] 4 homework assignments
      • End-to-end analysis
      • Techniques (computation and vis)
      • “Big data” tools, e.g., Hadoop, Spark, etc.
  • [50%] Group project (4 to 6 people)
  • [Bonus points] In-class pop quizzes
      • Each quiz is worth 1% of course grade
  • No exams
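
The grading breakdown can be read as a simple weighted sum. The Python sketch below is only an illustration: it assumes the four homeworks are summarized by a single average score and that each earned quiz adds one point on top, with no cap. The course policies on the website are the authoritative source.

```python
def course_grade(homework_pct, project_pct, quizzes_earned=0):
    """Combine the components listed on the grading slide.

    homework_pct, project_pct: scores from 0 to 100.
    quizzes_earned: number of bonus pop quizzes earned (1% each).
    Assumptions: homeworks are summarized by one average score,
    and bonus points are simply added on top without a cap.
    """
    return 0.5 * homework_pct + 0.5 * project_pct + 1.0 * quizzes_earned


# 0.5 * 90 + 0.5 * 80 + 2 bonus points = 87.0
print(course_grade(90, 80, quizzes_earned=2))
```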

slide-48
SLIDE 48

Policies


On website; we go through them now


 Grading, plagiarism, collaboration, late submission, and the “warning” about the difficulty of this course

slide-49
SLIDE 49

From Previous Classes…

  • Class projects turned into papers at top conferences (KDD, IUI, etc.)

  • Projects as portfolio pieces on CV
  • Increased job and internship opportunities
  • Former students sent me “thank you” notes
slide-50
SLIDE 50

IUI Full conference paper

slide-51
SLIDE 51

KDD Workshop paper

slide-52
SLIDE 52

IUI Poster paper

slide-53
SLIDE 53

“I feel like the concepts from your class are like a rite of passage for an aspiring data scientist. Assignments lead to a feelings of accomplishment and truly progressing in my area of passion.”

“I really get more intuition about how to deal with data with some powerful tools in HW3 [uses AWS]. That feeling is beyond description for me.”

“I would like to say thank you for your class! Thanks to the skills I got from the class and the project, I got the offer.”

slide-54
SLIDE 54

What Polo expects from you

  • Actively participate throughout the course!
  • Ask questions during class and on Piazza
  • Help out whenever you can, e.g., help answer questions on Piazza
  • Polo reserves the last few minutes of every class for Q&A

slide-55
SLIDE 55

FREE After-class Coffee ☕

  • After class, Polo randomly selects 5 students (+2 volunteers) for FREE after-class coffee
  • Polo’s treat. You can order coffee, tea, pastries, whatever you want
  • Very casual: you can ask me ANYTHING
  • Will try doing this at least once a week, starting next week!