LUIGI BRIGUGLIO - BARI, NOVEMBER 11 2015 Presentation Topics - - PowerPoint PPT Presentation



slide-1
SLIDE 1

TRACKING DATASET TRANSFORMATIONS WITH HAPPI TOOLKIT

LUIGI BRIGUGLIO - BARI, NOVEMBER 11 2015

slide-2
SLIDE 2

EGI CF2015 – Tracking Dataset Transformations with HAPPI Toolkit

Presentation Topics

  • Premise: where everything starts
  • Digital Preservation: overview
  • Tracking dataset transformations: datamodel
  • HAPPI Toolkit: implementation
  • Practice on HAPPI Toolkit @ EGI FedCloud
  • Q&A
slide-3
SLIDE 3

Premise: where everything starts

  • The HAPPI Toolkit is part of the Data Preservation e-Infrastructure produced by the SCIDIP-ES project [http://www.scidip-es.eu]

  • This component, released under an open source license (Apache License v2.0) and available on SourceForge [http://goo.gl/yWPBkV], is an implementation of an authenticity model defined jointly by the APARSEN and SCIDIP-ES projects

  • This model describes how to trace and document transformations on any digital object during its whole life cycle, and it is based on the Open Provenance Model and PREMIS. These de facto standards improve interoperability among different digital archives and communities.

  • The description of transformations on a digital object is part of the “preservation metadata” (a.k.a. Preservation Description Information), which includes provenance, reference and integrity information, according to the Open Archival Information System (OAIS) standard, ISO 14721:2012.

#traceability #OAIS

slide-4
SLIDE 4

Premise: where everything starts

[Figure: Long-Term Digital Preservation Infrastructure use cases: archive setup, data access, archive evolution]

slide-5
SLIDE 5

Premise: where everything starts

ICT Earth Science Community Research

slide-6
SLIDE 6

Premise: where everything starts

  • APARSEN proposed a methodology for the management of the authenticity of Digital Resources (DR):
    – Formal authenticity model: to represent the DR life cycle and the management of authenticity evidence
    – Operational guidelines: to guide the process of instantiating the model in a specific environment
    – Case studies: carried out to tune the methodology and test its effectiveness in a set of heterogeneous environments

  • Cooperation between APARSEN (specifically La Sapienza University) and SCIDIP-ES (specifically Engineering) improved the model and produced its implementation: the HAPPI Toolkit

slide-7
SLIDE 7

Premise: where everything starts

HAPPI 1.5.0 instances run for validation in

slide-8
SLIDE 8

Presentation Topics

  • Premise: where everything starts
  • Digital Preservation: overview
  • Tracking dataset transformations: datamodel
  • HAPPI Toolkit: implementation
  • Practice on HAPPI Toolkit @ EGI FedCloud
  • Q&A
slide-9
SLIDE 9

Digital Preservation: overview

  • To promote standards for archiving (space) information, NASA has been involved in the CCSDS (Consultative Committee for Space Data Systems) and in the ISO TC (Technical Committee) and SC (Sub-Committee):
    – TC 20: Aircraft and Space Vehicles
    – SC 13: Space Data and Information Transfer Systems

  • Digital Preservation aims at ensuring that digital information remains accessible, understandable and usable over the long term

  • ISO 14721:2003 - Space data and information transfer systems - Open Archival Information System - Reference Model (OAIS RM)

  • ISO 14721:2012 introduced further details on Preservation Description Information and Authenticity

slide-10
SLIDE 10

Digital Preservation: overview

  • OAIS provides an Information Model based on the key concept of the Information Package

  • And a Functional Model

[Diagram: an Information Package (xIP) holds Content and Preservation Description. Producers submit a SIP (Submission) at Ingestion; Archival Storage keeps the AIP (Archival); Access delivers a DIP (Dissemination) to Consumers.]

slide-11
SLIDE 11

Digital Preservation: overview

Information Package

  • Content Information: content to preserve
  • Preservation Description Information: metadata for preservation
    – Reference
    – Fixity
    – Provenance
    – Context
    – Access Rights

further described by

  • Descriptive Information: metadata for retrieval

slide-12
SLIDE 12

Presentation Topics

  • Premise: where everything starts
  • Digital Preservation: overview
  • Tracking dataset transformations: datamodel
  • HAPPI Toolkit: implementation
  • Practice on HAPPI Toolkit @ EGI FedCloud
  • Q&A
slide-13
SLIDE 13

Tracking dataset transformations: datamodel

  • During its life cycle, data may undergo many transformations (incl. changes of custody)
  • Those transformations may affect the authenticity of the data; for this reason it is important that they are properly documented

  • Evidence of transformations will later be used for authenticity assessment

[Diagram labels: CREATION; KEEPING SYSTEM (twice); LTDP SYSTEM (three times); AGGREGATE]

slide-14
SLIDE 14

Tracking dataset transformations: datamodel

  • The datamodel of the HAPPI Toolkit is based on the Authenticity Model defined by APARSEN and SCIDIP-ES

  • Each Transformation is documented by a record, providing the user with «evidence» of the events that occurred

[Diagram: each Transformation is paired with an Evidence Record; the Evidence Records together form the Evidence History]

slide-15
SLIDE 15

Tracking dataset transformations: datamodel

  • HAPPI Toolkit is a software component that manages part of the preservation metadata defined in ISO 14721:2012, i.e. the OAIS Preservation Description Information (PDI)

  • This metadata is called EvidenceHistory and describes the evidence for the transformations that occurred on digital objects during their life cycle; in other words, it tracks transformations on digital objects

[Diagram labels: OAIS:PDI (Provenance, Reference, Context, Fixity, Rights); EvidenceHistory]

slide-16
SLIDE 16

Tracking dataset transformations: datamodel

[Diagram: a Transformation used a Representation of an Intellectual Entity; a new Representation of the Intellectual Entity is generatedBy the Transformation; the Transformation is controlledBy an Agent]
slide-17
SLIDE 17

Tracking dataset transformations: datamodel

Intellectual Entity (IE): a “coherent set of content that is described as a unit”, the goal of the preservation process being “to maintain usable versions of intellectual entities over time”.

Representation: a set of digital objects required to display, play, or otherwise make usable to a human a given version of an IE.

Transformation: a change that occurs in conjunction with an event in the IE life cycle and produces a new representation of the IE, thus potentially affecting its authenticity.

Agent: the actor (human, machine, or software) associated with a given transformation of an IE, and who bears responsibility for it.
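
The four concepts and the used / generatedBy / controlledBy relations of the datamodel can be pictured as a small linked structure. What follows is only a sketch under stated assumptions: the class `ProvenanceSketch` and everything inside it are hypothetical stand-ins written for illustration, not the HAPPI classes, and the intellectual entity is reduced to a plain string identifier.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in types for illustration; these are not the HAPPI classes. */
final class ProvenanceSketch {

    static final class Agent {
        final String name;
        Agent(String name) { this.name = name; }
    }

    static final class Representation {
        final String id;
        final String intellectualEntity; // the IE this representation is a version of
        Transformation generatedBy;      // null for the first representation

        Representation(String id, String intellectualEntity) {
            this.id = id;
            this.intellectualEntity = intellectualEntity;
        }
    }

    static final class Transformation {
        final String type;
        final Representation used;  // input representation
        final Agent controlledBy;   // actor who bears responsibility

        Transformation(String type, Representation used, Agent controlledBy) {
            this.type = type;
            this.used = used;
            this.controlledBy = controlledBy;
        }
    }

    /** Applies a transformation to the input and returns the new representation of the same IE. */
    static Representation transform(String type, Representation input, String newId, Agent agent) {
        Representation output = new Representation(newId, input.intellectualEntity);
        output.generatedBy = new Transformation(type, input, agent);
        return output;
    }

    /** Follows generatedBy and used links back to the first representation, newest step first. */
    static List<String> lineage(Representation r) {
        List<String> steps = new ArrayList<String>();
        for (Transformation t = r.generatedBy; t != null; t = t.used.generatedBy) {
            steps.add(t.type + " by " + t.controlledBy.name);
        }
        return steps;
    }

    public static void main(String[] args) {
        Agent archivist = new Agent("archivist");
        Representation original = new Representation("rep-0", "ie-1");
        Representation ingested = transform("INGESTION", original, "rep-1", archivist);
        Representation migrated = transform("MIGRATION", ingested, "rep-2", archivist);
        System.out.println(lineage(migrated));
    }
}
```

Each new representation keeps a link to the transformation that generated it, so the full chain of responsibility can be read back from the latest version alone.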

slide-18
SLIDE 18

Tracking dataset transformations: datamodel

Report

  • info
  • Fixity
  • SignificantProperties

Agent

  • ID+info
  • Type

Representation

  • ID+info
  • Format
  • Type

Transformation

  • ID+Info
  • Software
  • Type

Nodes & Edges
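
The Report of an evidence record carries a Fixity item. The slides do not say how HAPPI computes it, so the following is a minimal sketch assuming a SHA-256 digest over the bytes of a representation; the class `FixitySketch` and its method names are hypothetical, while `java.security.MessageDigest` is the standard JDK API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper; the digest algorithm is an assumption, not taken from HAPPI. */
final class FixitySketch {

    /** Returns the SHA-256 digest of the content as lowercase hexadecimal. */
    static String sha256Hex(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(content);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    /** True when both byte sequences produce the same digest. */
    static boolean sameContent(byte[] before, byte[] after) {
        return sha256Hex(before).equals(sha256Hex(after));
    }

    public static void main(String[] args) {
        byte[] data = "sample dataset".getBytes(StandardCharsets.UTF_8);
        System.out.println(sha256Hex(data));
    }
}
```

Storing such a digest with every evidence record lets a later authenticity assessment check whether a change of custody or an ingestion left the bits untouched.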

slide-19
SLIDE 19

Tracking dataset transformations: datamodel

  • To guarantee «interoperability» among communities and archives, the data model is based on:
    – OPM: Open Provenance Model – a formalism for modelling the life cycle of a digital object as a provenance graph (http://openprovenance.org/)
    – PREMIS: Data Dictionary for Preservation Metadata – a common dictionary in the preservation community for ensuring interoperability among repositories (http://www.loc.gov/standards/premis/index.html)

slide-20
SLIDE 20

Tracking dataset transformations: datamodel

slide-21
SLIDE 21

Tracking dataset transformations: datamodel

  • Some transformations change the intellectual entity and generate new one(s), e.g.
    – Extraction
    – Aggregation

[Diagram: extraction and aggregation steps along a time axis]
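
Extraction and aggregation differ from the other transformations in that their output is a new intellectual entity. A rough sketch of that idea follows, assuming that each derived entity simply remembers its source entities; `DerivationSketch` and its members are hypothetical names, not part of the HAPPI Toolkit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of intellectual entities derived by extraction or aggregation. */
final class DerivationSketch {

    static final class Entity {
        final String title;
        final String derivedBy;      // "EXTRACTION", "AGGREGATION", or null for an original
        final List<Entity> sources;  // entities this one was derived from

        Entity(String title, String derivedBy, List<Entity> sources) {
            this.title = title;
            this.derivedBy = derivedBy;
            this.sources = sources;
        }
    }

    static Entity original(String title) {
        return new Entity(title, null, Collections.<Entity>emptyList());
    }

    /** Extraction: one source entity yields a new entity. */
    static Entity extract(String title, Entity source) {
        return new Entity(title, "EXTRACTION", Collections.singletonList(source));
    }

    /** Aggregation: several source entities yield one new entity. */
    static Entity aggregate(String title, Entity... parts) {
        return new Entity(title, "AGGREGATION", new ArrayList<Entity>(Arrays.asList(parts)));
    }

    /** Titles of the original entities an entity ultimately derives from, in discovery order. */
    static Set<String> origins(Entity e) {
        Set<String> found = new LinkedHashSet<String>();
        collect(e, found);
        return found;
    }

    private static void collect(Entity e, Set<String> found) {
        if (e.sources.isEmpty()) {
            found.add(e.title);
            return;
        }
        for (Entity s : e.sources) {
            collect(s, found);
        }
    }
}
```

Walking the source links answers the question an assessor is likely to ask of an aggregated dataset: which original entities does it come from?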

slide-22
SLIDE 22

Presentation Topics

  • Premise: where everything starts
  • Digital Preservation: overview
  • Tracking dataset transformations: datamodel
  • HAPPI Toolkit: implementation
  • Practice on HAPPI Toolkit @ EGI FedCloud
  • Q&A
slide-23
SLIDE 23

HAPPI Toolkit: implementation

  • HAPPI (Handling Authenticity Provenance and Persistent Identifiers)
    – Manage Intellectual Entity
    – Capture Evidence Record Documentation (OPM 1.1 and PREMIS 2.2)
    – Store Intellectual Entity, Evidence Record/History in a scalable database
    – Search/Browse
    – Import/Export

[Diagram: the Archive Manager uses HAPPI to register Intellectual Entities, capture Evidence Records, import/export the Evidence History, and search & browse; Intellectual Entities and Evidence Records go to the Store]

slide-24
SLIDE 24

HAPPI Toolkit: implementation

  • Archive Manager can add specific significant properties, to support later authenticity assessment

  • Reference is applied to Intellectual Entities and evidence items (i.e. Agent, Transformation, Representation)
    – Organisation – who assigns the reference
    – Type – type of reference (e.g. URI, DOI)
    – Value – value of reference

  • Type of Transformations
    – AGGREGATION
    – CAPTURE
    – CHANGEOFCUSTODY
    – EXTRACTION
    – INGESTION
    – MIGRATION
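
The reference triple and the closed list of transformation types map naturally onto a value class and an enum. The sketch below assumes exactly that mapping; `ReferenceSketch`, its nested types and the `parse` helper are hypothetical and are not the toolkit's own `Reference` class.

```java
import java.util.Locale;

/** Hypothetical types mirroring the reference fields and transformation types listed above. */
final class ReferenceSketch {

    enum TransformationType {
        AGGREGATION, CAPTURE, CHANGEOFCUSTODY, EXTRACTION, INGESTION, MIGRATION;

        /** Accepts any letter case, e.g. "capture" or "Ingestion"; rejects unknown names. */
        static TransformationType parse(String text) {
            return valueOf(text.trim().toUpperCase(Locale.ROOT));
        }
    }

    static final class Reference {
        final String organisation; // who assigns the reference
        final String type;         // e.g. URI, DOI
        final String value;        // the identifier itself

        Reference(String organisation, String type, String value) {
            this.organisation = organisation;
            this.type = type;
            this.value = value;
        }

        @Override
        public String toString() {
            return type + ":" + value + " (assigned by " + organisation + ")";
        }
    }
}
```

Restricting the type to an enum means a misspelt transformation name fails immediately with an `IllegalArgumentException` instead of ending up in the evidence history.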

slide-25
SLIDE 25

HAPPI Toolkit: implementation

Step 1: Register the Intellectual Entity

  • title
  • creation date
  • reference
  • annotation

Step 2: Gather information into Evidence Records

  • transformation
  • who controls the transformation
  • result/input of transformation
  • report with annotation and specific properties

slide-26
SLIDE 26

HAPPI Toolkit: implementation

IntellectualEntityManager

+ addIntellectualEntity(ie)
+ getIntellectualEntity(label)
+ getAllIntellectualEntities()
+ getIntellectualEntitiesBy(from, to, keyword)

EvidenceHistoryManager

+ addEvidenceRecord(er, eh)
+ getEvidenceRecord(label, eh)
+ getAllEvidenceRecords(eh)
+ getEvidenceRecordHistory(label, eh)
+ getLastEvidenceRecords(eh)
+ importEvidenceHistory(eh, gxmlFile)
+ exportEvidenceHistory(eh, gxmlFile)

HAPPI-LOGIC-1.5.0

IEManager EHManager IntellectualEntity EvidenceHistory/Record
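
To make the intent of the evidence-history operations more concrete, here is an in-memory sketch that borrows three of the method names listed for EvidenceHistoryManager. It is not the HAPPI implementation: the class `InMemoryHistorySketch`, the `EvidenceEntry` type, the simplified signatures and the semantics chosen for `getEvidenceRecordHistory` and `getLastEvidenceRecords` are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory evidence history; the behaviour is assumed, not taken from HAPPI. */
final class InMemoryHistorySketch {

    static final class EvidenceEntry {
        final String label;
        final String transformationType;
        final String previousLabel; // null for the first record

        EvidenceEntry(String label, String transformationType, String previousLabel) {
            this.label = label;
            this.transformationType = transformationType;
            this.previousLabel = previousLabel;
        }
    }

    private final Map<String, EvidenceEntry> byLabel = new LinkedHashMap<String, EvidenceEntry>();

    void addEvidenceRecord(EvidenceEntry r) {
        if (byLabel.containsKey(r.label)) {
            throw new IllegalArgumentException("duplicate label: " + r.label);
        }
        if (r.previousLabel != null && !byLabel.containsKey(r.previousLabel)) {
            throw new IllegalArgumentException("unknown predecessor: " + r.previousLabel);
        }
        byLabel.put(r.label, r);
    }

    EvidenceEntry getEvidenceRecord(String label) {
        return byLabel.get(label);
    }

    /** Assumed meaning: labels of the chain of records leading to the given one, oldest first. */
    List<String> getEvidenceRecordHistory(String label) {
        List<String> chain = new ArrayList<String>();
        EvidenceEntry r = byLabel.get(label);
        while (r != null) {
            chain.add(0, r.label);
            r = (r.previousLabel == null) ? null : byLabel.get(r.previousLabel);
        }
        return chain;
    }

    /** Assumed meaning: labels of records that no other record names as its predecessor. */
    List<String> getLastEvidenceRecords() {
        List<String> last = new ArrayList<String>(byLabel.keySet());
        for (EvidenceEntry r : byLabel.values()) {
            if (r.previousLabel != null) {
                last.remove(r.previousLabel);
            }
        }
        return last;
    }
}
```

A map keyed by label keeps the sketch short; the toolkit itself stores its data in a graph database, where the predecessor link would be an edge rather than a field.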

slide-27
SLIDE 27

HAPPI Toolkit: implementation

GraphDB HAPPI-LOGIC

Neo4j

HAPPI-SERVER

slide-28
SLIDE 28

HAPPI Toolkit: implementation

Browse the History of Data: Timeline mode

slide-29
SLIDE 29

HAPPI Toolkit: implementation

Browse the History of Data: Graph mode

slide-30
SLIDE 30

HAPPI Toolkit: implementation

import java.util.Date;

// obtain the IntellectualEntityManager
IntellectualEntityManager iemanager =
    ManagerFactory.getInstance().getIntellectualEntityManager();

// create the reference (organisation, type, value)
Reference sampleRef = new Reference("Picktochart", "URI",
    "https://magic.piktochart.com/output/3098625-untitled-report");

/**
 * create the intellectual entity, which is composed of
 * reference, label, title, annotation and date of creation.
 */
IntellectualEntity ie1 = new IntellectualEntity(sampleRef, null,
    "HAPPI Infographics", "SCIDIP-ES HAPPI Infographics", new Date());

// add the intellectual entity through iemanager
iemanager.addIntellectualEntity(ie1);

http://sourceforge.net/p/digitalpreserve/code/HEAD/tree/SCIDIP-ES/software/toolkits/authenticity

slide-31
SLIDE 31

HAPPI Toolkit: implementation

// obtain the EvidenceHistoryManager
EvidenceHistoryManager ehmanager =
    ManagerFactory.getInstance().getEvidenceHistoryManager();

// get the evidence history of the intellectual entity
EvidenceHistory eh1 = ie1.getEvidenceHistory();

/**
 * create the first evidence record with sample data, by
 * using the buildSampleRecord utility method.
 */
EvidenceRecord er1 = buildSampleRecord("Luigi Briguglio", "capture", "origin", "er1", null);

// add the evidence record to its history
eh1.addEvidenceRecord(er1);
ehmanager.addEvidenceRecord(er1, eh1);

// add a second record to the history, after the first one
EvidenceRecord er2 = buildSampleRecord("Luigi Briguglio", "ingestion", "submitted", "er2", er1);
eh1.addEvidenceRecord(er2);
ehmanager.addEvidenceRecord(er2, eh1);

http://sourceforge.net/p/digitalpreserve/code/HEAD/tree/SCIDIP-ES/software/toolkits/authenticity

slide-32
SLIDE 32

Presentation Topics

  • Premise: where everything starts
  • Digital Preservation: overview
  • Tracking dataset transformations: datamodel
  • HAPPI Toolkit: implementation
  • Practice on HAPPI Toolkit @ EGI FedCloud
  • Q&A
slide-34
SLIDE 34

References

  • A Modular Infrastructure for the Management of Authenticity and Persistent Identifiers in Long Term Digital Preservation Repositories, in Int. J. of Knowledge and Learning, 2014, Vol. 9, No. 4 - http://www.inderscience.com/info/inarticle.php?artid=69535

  • Thesis - Analisi Progettazione e Sviluppo di un Prototipo per la Gestione della Provenienza nel Processo di Conservazione Digitale (Analysis, Design and Development of a Prototype for Provenance Management in the Digital Preservation Process), Tor Vergata Univ., October 2013

  • Modelling Data Value in Digital Preservation, in iPRES2013 Conference Proceedings, September 2013 - http://purl.pt/24107/1/iPres2013_PDF/Modelling%20Data%20Value%20in%20Digital%20Preservation.pdf

  • Preserving Authenticity Evidence to Assess Provenance and Integrity of Digital Resources, in ECLAP 2013 Conference Proceedings, LNCS issue no. 7990, April 2013 - http://link.springer.com/chapter/10.1007%2F978-3-642-40050-6_7

slide-35
SLIDE 35

Next Step: Extending the model

[Diagram: OAIS:PDI (Provenance, Reference, Context, Fixity, Rights) along a time axis]

  • add relationships to other digital objects, to document context
  • document rights
  • to better track evolution
slide-36
SLIDE 36

Questions?