The bottom line
• We are the data science people, but the world needs to know about it
Wrangling vs Analytics
• Wrangling: data processing that allows meaningful analysis to begin (extraction, integration, cleaning, querying, etc.; basically the SIGMOD/PODS CFP)
• Wrangling requires more effort than analytics (usually 50-80% of the total)
This is what we do
• But the world sees the end result
• The 80-20 rule: 20% of effort gets 80% of PR
• But we need to be better at it
• Some ammunition...
Data analysts’ favorite tools
[Figure: bar chart of share of respondents (0% to 70%) per tool, with each tool classed as a language, a data platform, or an analytics tool. Tools shown, in the order they appear: SQL, Excel, Python, R, MySQL, Python: numpy, scipy, scikit-learn, ggplot, Microsoft SQL Server, Tableau, JavaScript, Matplotlib (Python), Java, PostgreSQL, Oracle, D3, Homegrown analysis tools, Hive, Spark, Cloudera, Visual Basic/VBA, MongoDB, Apache Hadoop, SAS, C++, PowerPivot, Scala, SQLite, C, Pig, Amazon Redshift, Weka, HBase, Amazon Elastic MapReduce (EMR), Perl, SPSS, Teradata]
Future data analysts’ favorite tools
The world needs to know
• ... but it’s much more fun doing research than talking to the “real world”
• Still, we are not a small community, and we have people with different skills
• One example: we convinced our funders (EPSRC) that data management is an essential part of “big data”
• The more people get the message, the healthier our field is