Geoffrey Boulton University of Edinburgh & CODATA Learn Workshop - - PowerPoint PPT Presentation


From Open Data to Open Science Geoffrey Boulton University of Edinburgh & CODATA Learn Workshop University College, London January 2016 Knowledge and understanding - the engines of material progress depend on technologies that


slide-1
SLIDE 1

From Open Data to Open Science

Geoffrey Boulton

University of Edinburgh & CODATA

“Learn” Workshop

University College, London January 2016

slide-2
SLIDE 2

Knowledge and understanding - the engines of material progress

depend on technologies that enable their accumulation and communication

1454 2002

slide-3
SLIDE 3

Openness – the bedrock of science in the modern era

Henry Oldenburg

slide-4
SLIDE 4

Scientific self correction

slide-5
SLIDE 5

The Challenge: the “Data Storm” is undermining “self correction”

THEN AND NOW

slide-6
SLIDE 6

A crisis of reproducibility and credibility?

Why such low levels of reproducibility?

  • Misconduct/fraud
  • Invalid reasoning
  • Absent or inadequate data and/or metadata
slide-7
SLIDE 7

19 Exabytes 280 Exabytes

Based on: http://www.martinhilbert.net/WorldOnfoCapacity.html

1 Exabyte = 10^18 bytes

The digital revolution

Global information storage capacity In optimally compressed bytes

Digital Storage

Analogue Storage

Explosion of the Digital revolution 1986 1993 2000 2007

2014 - 4000 Exabytes

slide-8
SLIDE 8

http://www.wired.co.uk/news/archive/2014-01/15/1000-dollar-genome/viewgallery/331679

Data acquisition: Cost down – Flux up

slide-9
SLIDE 9

Information: how much is crystallised into knowledge?

slide-10
SLIDE 10

Reinventing reproducibility for the digital age How do we retain an essential principle?

The data providing the evidence for a published concept MUST be concurrently published, together with necessary metadata and computer code. To do otherwise is scientific MALPRACTICE

slide-11
SLIDE 11
slide-12
SLIDE 12

Ozone Levels

Four key drivers of change for science

  • Big data
  • Semantically-linked data
  • Open data
  • Cost reduction

Micro-satellite

Looking at clouds

slide-13
SLIDE 13

Pillars of the Digital Revolution

Big Data: Volume, Velocity, Variety

Linked Open Data: Many databases, Semantic Relations, Deeper meaning

Foundations : Openness

Machine analysis & learning Text and data mining
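
The "Linked Open Data" pillar on this slide can be illustrated with a minimal sketch. It assumes two small, entirely hypothetical datasets written as (subject, predicate, object) triples; the identifiers, predicate names and values are invented for illustration and do not come from the talk. Because both datasets use the same identifier for a region, a question can be answered across them once they are merged.

```python
# Two hypothetical datasets expressed as (subject, predicate, object) triples.
# All identifiers and values are invented for illustration.
weather = [
    ("station:42", "locatedIn", "region:north"),
    ("station:42", "meanTempC", 9.5),
]
health = [
    ("region:north", "asthmaRatePer1000", 61),
]


def objects(triples, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def follow(triples, start, path):
    """Follow a chain of predicates from a start node through the graph."""
    frontier = [start]
    for predicate in path:
        frontier = [o for node in frontier for o in objects(triples, node, predicate)]
    return frontier


# Merging is just concatenation because both sources share "region:north".
linked = weather + health
result = follow(linked, "station:42", ["locatedIn", "asthmaRatePer1000"])
print(result)
```

Neither dataset alone can relate the weather station to the health statistic; the shared identifier is what makes the link possible.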

slide-14
SLIDE 14

The opportunity: data from “simple” to complex systems from uncoupled to highly coupled behaviour

Uncoupled systems Simulating behaviour of highly coupled systems

slide-15
SLIDE 15

Simulating system dynamics

Mapping a complex state Image of brain cells in a rat Emergent behaviour of a specific 6-component coupled system

  • patterns not hitherto seen
  • unsuspected relationships
  • complex systems

e.g. complexity: dynamic evolution and system state

Scientific opportunities

slide-16
SLIDE 16

Satellite observation Surface monitoring

The opportunity: data-modelling: iterative integration

Initial conditions Model forecast Model-data iteration - forecast correction
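
The loop named on this slide (initial conditions, model forecast, correction against data) can be sketched as follows. This is a toy illustration only: the one-step growth model, the fixed gain and the numbers are assumptions made for the example, not the method used by any real forecasting system.

```python
def forecast(state, growth=1.05):
    """Hypothetical one-step model: the state grows by a fixed factor."""
    return state * growth


def correct(predicted, observed, gain=0.5):
    """Nudge the forecast toward the observation by a fixed gain in [0, 1]."""
    return predicted + gain * (observed - predicted)


def assimilate(initial_state, observations, growth=1.05, gain=0.5):
    """Alternate model forecast and data correction over a series of observations."""
    state = initial_state
    history = []
    for observed in observations:
        predicted = forecast(state, growth)
        state = correct(predicted, observed, gain)
        history.append((predicted, observed, state))
    return history


if __name__ == "__main__":
    # Illustrative numbers only, not real measurements.
    for predicted, observed, corrected in assimilate(10.0, [11.0, 11.8, 12.9]):
        print(f"forecast={predicted:.2f} observed={observed:.2f} corrected={corrected:.2f}")
```

A gain of 0 trusts the model entirely and a gain of 1 trusts the observation entirely; operational systems choose the weighting from estimated model and observation errors rather than fixing it by hand.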

slide-17
SLIDE 17

Linear regression Cluster analysis Dynamic/complex behaviour Complex systems No mathematical pipeline Simple relationships Classical statistics

System characterisations: from simple to complex

Glucose in type II diabetes Topological analysis

slide-18
SLIDE 18

A barrier to openness? - Analytic overload. E.g. - Global Earth Observation System of Systems

  • What is the human role?
  • Can we analyse & scrutinise what is in the black box? - & who owns the box?
  • What does it mean to be a researcher in a data intensive age? A disconnect between machine analysis & human cognition?

slide-19
SLIDE 19

Mathematics related discussions

Tim Gowers

  • crowd-sourced mathematics

An unsolved problem posed on his blog.

32 days – 27 people – 800 substantive contributions

Emerging contributions rapidly developed or discarded

Problem solved!

“It's like driving a car whilst normal research is like pushing it”

What inhibits such processes?

  • The criteria for credit and promotion – ALTMETRICS THE ANSWER?

New modes of technology-enabled creativity: e.g. crowd-sourcing

slide-20
SLIDE 20

The Open Data Iceberg

  • The Technical Challenge
  • The Consent Challenge
  • The Ecosystem Challenge
  • The Funding Challenge
  • The Support Challenge
  • The Skills Challenge
  • The Incentives Challenge
  • The Mindset Challenge

Processes & Organisation People

motivation and ethos.

Developed from: Deetjen, U., E. T. Meyer and R. Schroeder (2015). OECD Digital Economy Papers, No. 246, OECD Publishing.

A National Infrastructure Technology

slide-21
SLIDE 21

The “Science International” Accord: principles of open data

(www.icsu.org/science-international)

Responsibilities

  • 1-2. Scientists

  • 3. Research institutions & universities
  • 4. Publishers
  • 5. Funding agencies
  • 6. Scholarly societies and academies
  • 7. Libraries & repositories
  • 8. Boundaries of openness

Enabling practices

  • 9. Citation and provenance
  • 10. Interoperability
  • 11. Non-restrictive re-use
  • 12. Linkability
slide-22
SLIDE 22

Responsibilities

Scientists

i. Publicly funded scientists have a responsibility to contribute to the public good through the creation and communication of new knowledge, of which associated data are intrinsic parts. They should make such data openly available to others as soon as possible after their production in ways that permit them to be re-used and re-purposed.

ii. The data that provide evidence for published scientific claims should be made concurrently and publicly available in an intelligently open form. This should permit the logic of the link between data and claim to be rigorously scrutinised and the validity of the data to be tested by replication of experiments or observations. To the extent possible, data should be deposited in well-managed and trusted repositories with low access barriers.

slide-23
SLIDE 23

African Open Data/Open Science Platform

Platform Forum Coordination Government Priority setting Funders Funding Incentives Capacity Building Training and Skills Infrastructure Roadmaps Flagship Co-Designed Data Intensive Projects International Standards Programmes Shared infrastructure investment; shared good practice; capacity building; system development

slide-24
SLIDE 24

EMBL-EBI services

Labs around the world send us their data and we…

  • Archive it
  • Classify it
  • Share it with other data providers
  • Analyse, add value and integrate it
  • …provide tools to help researchers use it

A collaborative enterprise

Disciplinary communities can lead the way

e.g. Elixir programme in life sciences/bio-informatics

slide-25
SLIDE 25

Regional Platforms for Open Science

African Platform? Asian Platform? Australian Platform

Shared investment in infrastructure; harvesting and circulating good ideas; spreading and supporting good practice; capacity building; promoting applications; linking to international programmes and standards.

S. American Platform?

slide-26
SLIDE 26

Inputs Outputs Open access

Administrative data (held by public authorities e.g. prescription data) Public Sector Research data (e.g. Met Office weather data) Research Data (e.g. CERN, generated in universities) Research publications (i.e. papers in journals)

Open data

Open science

“science as a public enterprise”

Collecting the data Doing research

Doing science openly

Researchers - Govt & Public sector - Businesses - Citizens - Citizen scientists

(communication/dialogue – joint production of knowledge)

Stakeholders

  • Communication/dialogue must be audience-sensitive
  • Is it – with all stakeholder groups?
slide-27
SLIDE 27

Open Science

Data / Publications

Researchers Mono/Multi  Inter  Transdisciplinary  Stakeholders Rigour  Innovation  Policy Solutions

Open Knowledge

slide-28
SLIDE 28
  • Institutional management and support
  • National policies & e-infrastructure
  • Open Research Data

Big Data Analytics

Knowledge Output

EXPLOITING THE DATA REVOLUTION

Scientific inference

Institutional management & support

National policies & e-infrastructure

A national data-intensive system

slide-29
SLIDE 29

International Research Data Collaboration

CODATA

  • Policies & practice
  • Frontiers of data science
  • Capacity Building

WDS

  • Data stewardship
  • Data standards

RDA

  • Interoperability
slide-30
SLIDE 30
  • 1. Maintaining “self-correction”
  • 2. Open knowledge is creative & productive

“If you have an apple and I have an apple and we exchange these apples, then you and I will still each have one apple. But if you have an idea and I have an idea and we exchange these ideas, then each of us will have two ideas.”

George Bernard Shaw

  • 3. Open data enables semantic linking

Why openness & sharing?

slide-31
SLIDE 31
  • Openly collected science is already helping policy makers.
  • The AshTag app allows users to submit photos and locations of sightings to a team who will refer them on to the Forestry Commission, which is leading efforts to stop the disease's spread with the Department for Environment, Food and Rural Affairs (Defra).

Chalara spread: 1992-2012

Citizen Science