From Open Data to Open Science
Geoffrey Boulton, University of Edinburgh & CODATA
Learn Workshop, University College, London, January 2016
Knowledge and understanding - the engines of material progress
depend on technologies that enable their accumulation and communication
1454 2002
Openness – the bedrock of science in the modern era
Henry Oldenburg
Scientific self correction
The Challenge: the “Data Storm” is undermining “self correction”
THEN AND NOW
A crisis of reproducibility and credibility?
Why such low levels of reproducibility?
- Misconduct/fraud
- Invalid reasoning
- Absent or inadequate data and/or metadata
19 Exabytes 280 Exabytes
Based on: http://www.martinhilbert.net/WorldOnfoCapacity.html 1 Exabyte = 10^18 bytes
The digital revolution
Global information storage capacity In optimally compressed bytes
Digital Storage
Analogue Storage
Explosion of the Digital revolution 1986 1993 2000 2007
2014 - 4000 Exabytes
http://www.wired.co.uk/news/archive/2014-01/15/1000-dollar-genome/viewgallery/331679
Data acquisition: Cost down – Flux up
Information: how much is crystallised into knowledge?
Reinventing reproducibility for the digital age How do we retain an essential principle?
The data providing the evidence for a published concept MUST be concurrently published, together with necessary metadata and computer code. To do otherwise is scientific MALPRACTICE
Ozone Levels Four key drivers of change for science
- Big data
- Semantically-linked data
- Open data
- Cost reduction
Micro-satellite
Looking at clouds
Pillars of the Digital Revolution
- Big Data: Volume, Velocity, Variety
- Linked Open Data: Many databases, Semantic Relations, Deeper meaning
Foundations : Openness
Machine analysis & learning Text and data mining
The opportunity: data from “simple” to complex systems from uncoupled to highly coupled behaviour
Uncoupled systems Simulating behaviour of highly coupled systems
Simulating system dynamics
Mapping a complex state Image of brain cells in a rat Emergent behaviour of a specific 6-component coupled system
- patterns not hitherto seen
- unsuspected relationship
- complex systems
e.g. complexity: dynamic evolution and system state
Scientific opportunities
Satellite observation Surface monitoring
The opportunity: data-modelling: iterative integration
Initial conditions Model forecast Model-data iteration - forecast correction
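The forecast-correction loop named above can be illustrated with a minimal sketch. It assumes a toy scalar model (multiply the state by a fixed `growth` factor) and a fixed correction `gain`; both names and the model itself are hypothetical and are not taken from the presentation. Real data assimilation schemes are far more elaborate.

```python
def assimilate(forecast, observation, gain):
    """Correct a forecast by moving it toward the observation.

    gain = 0 keeps the forecast unchanged; gain = 1 replaces it
    with the observation. (Hypothetical fixed-gain scheme.)
    """
    return forecast + gain * (observation - forecast)


def run_cycle(initial_state, observations, growth=1.1, gain=0.5):
    """Iterate: model forecast -> compare with data -> corrected state."""
    state = initial_state
    history = []
    for obs in observations:
        forecast = growth * state                # model forecast (toy model)
        state = assimilate(forecast, obs, gain)  # forecast correction
        history.append((forecast, state))
    return history
```

Each corrected state becomes the initial condition for the next forecast, which is the iterative integration of data and model that the slide describes.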
Linear regression Cluster analysis Dynamic/complex behaviour Complex systems No mathematical pipeline Simple relationships Classical statistics
System characterisations: from simple to complex
Glucose in type II diabetes Topological analysis
A barrier to openness? Analytic overload, e.g. the Global Earth Observation System of Systems
- What is the human role?
- Can we analyse & scrutinise what is in the black box?
- & who owns the box?
- What does it mean to be a researcher in a data intensive age?
A disconnect between machine analysis & human cognition?
Mathematics related discussions
Tim Gowers
- crowd-sourced mathematics
An unsolved problem posed on his blog.
32 days – 27 people – 800 substantive contributions
Emerging contributions rapidly developed or discarded
Problem solved!
“It's like driving a car whilst normal research is like pushing it”
What inhibits such processes?
- The criteria for credit and promotion – ALTMETRICS THE ANSWER?
New modes of technology-enabled creativity: e.g. crowd-sourcing
The Open Data Iceberg
- The Technical Challenge
- The Consent Challenge
- The Ecosystem Challenge
- The Funding Challenge
- The Support Challenge
- The Skills Challenge
- The Incentives Challenge
- The Mindset Challenge
Processes & Organisation People
motivation and ethos.
Developed from: Deetjen, U., E. T. Meyer and R. Schroeder (2015). OECD Digital Economy Papers, No. 246, OECD Publishing.
A National Infrastructure Technology
The “Science International” Accord: principles of open data
(www.icsu.org/science-international)
Responsibilities
- 1-2. Scientists
- 3. Research institutions & universities
- 4. Publishers
- 5. Funding agencies
- 6. Scholarly societies and academies
- 7. Libraries & repositories
- 8. Boundaries of openness
Enabling practices
- 9. Citation and provenance
- 10. Interoperability
- 11. Non-restrictive re-use
- 12. Linkability
Responsibilities: Scientists
- i. Publicly funded scientists have a responsibility to contribute to the public good through the creation and communication of new knowledge, of which associated data are intrinsic parts. They should make such data openly available to others as soon as possible after their production in ways that permit them to be re-used and re-purposed.
- ii. The data that provide evidence for published scientific claims should be made concurrently and publicly available in an intelligently open form. This should permit the logic of the link between data and claim to be rigorously scrutinised and the validity of the data to be tested by replication of experiments or observations. To the extent possible, data should be deposited in well-managed and trusted repositories with low access barriers.
African Open Data/Open Science Platform
Platform Forum Coordination Government Priority setting Funders Funding Incentives Capacity Building Training and Skills Infrastructure Roadmaps Flagship Co-Designed Data Intensive Projects International Standards Programmes Shared infrastructure investment; shared good practice; capacity building; system development
EMBL-EBI services
Labs around the world send us their data and we…
- Archive it
- Classify it
- Share it with other data providers
- Analyse, add value and integrate it
- …provide tools to help researchers use it
A collaborative enterprise
Disciplinary communities can lead the way
e.g. Elixir programme in life sciences/bio-informatics
Regional Platforms for Open Science
African Platform? Asian Platform? Australian Platform S. American Platform?
Shared investment in infrastructure; harvesting and circulating good ideas; spreading and supporting good practice; capacity building; promoting applications; linking to international programmes and standards.
Inputs Outputs Open access
- Administrative data (held by public authorities, e.g. prescription data)
- Public Sector Research data (e.g. Met Office weather data)
- Research Data (e.g. CERN, generated in universities)
- Research publications (i.e. papers in journals)
Open data
Open science
“science as a public enterprise”
Collecting the data
Doing research
Doing science openly
Researchers - Govt & Public sector - Businesses - Citizens - Citizen scientists
(communication/dialogue – joint production of knowledge)
Stakeholders
- Communication/dialogue must be audience-sensitive
- Is it – with all stakeholder groups?
Open Science
Data / Publications
Researchers Mono/Multi Inter Transdisciplinary Stakeholders Rigour Innovation Policy Solutions
Open Knowledge
- Institutional management and support
- National policies & e-infrastructure
- Open Research Data
Big Data Analytics
Knowledge Output
- EXPLOITING THE DATA REVOLUTION
Scientific inference
Institutional management & support
National policies & e-infrastructure
A national data-intensive system
International Research Data Collaboration
CODATA
- Policies & practice
- Frontiers of data science
- Capacity Building
WDS
- Data stewardship
- Data standards
RDA
- Interoperability
- 1. Maintaining “self-correction”
- 2. Open knowledge is creative & productive
“If you have an apple and I have an apple and we exchange these apples, then you and I will still each have one apple. But if you have an idea and I have an idea and we exchange these ideas, then each of us will have two ideas.” George Bernard Shaw
- 3. Open data enables semantic linking
Why openness & sharing?
- Openly collected science is already helping policy makers.
- AshTag app allows users to submit photos and locations of sightings to a team who will refer them on to the Forestry Commission, which is leading efforts to stop the disease's spread with the Department for Environment, Food and Rural Affairs (Defra).
Chalara spread: 1992-2012