A Case Study of Long-Running Business Processes: Digital Information Preservation



SLIDE 1

FORTH-ICS

A Case Study of Long-Running Business Processes: Digital Information Preservation

Yannis Tzitzikas

Assistant Professor, Department of Computer Science, University of Crete
Associate Researcher, Institute of Computer Science (FORTH-ICS)

First SSME Workshop and Summer School “The Business Process in the Science of Service” Heraklion, May 30-June 3, 2007

SLIDE 2

Yannis Tzitzikas, SSME'07 2

Outline

  • What is Digital Information Preservation?
  • Why is it important?
  • Aspects of Preservation
  • Preservation Approaches (Strategies)
  • The OAIS Reference Model
  • The CASPAR Project
  • On preserving the Intelligibility of Digital Objects

– Formalizing Intelligibility and Intelligibility Gaps
– Intelligibility-aware processes

  • Concluding Remarks and Directions for Further Research
SLIDE 3

What is Digital Information Preservation?

SLIDE 4

Phaistos disk (dated to 1700 BC)

We still cannot understand it (the meaning has not been preserved)

SLIDE 5

Egyptian Pyramids

We still don’t know how the pyramids were constructed. (the process has not been preserved)

SLIDE 6

Digital Objects

How can we be sure that in the future one would be able to understand this byte stream?

089097110110105115
&#89&#97&#110&#110&#105&#115
100110110000110111011011101110010111100111

It is “Yannis” in ASCII.

How will we preserve the meaning of digital objects?
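The point of this slide can be made concrete: the three byte streams carry the same six characters, but only for a reader who knows which representation is in use. The sketch below is illustrative only. It assumes the first stream is a run of three-digit decimal character codes, the second a run of HTML-style numeric character references, and the third a run of 7-bit codes written least-significant bit first. These readings are inferred from the streams themselves and are not stated on the slide; the function names are hypothetical.

```python
import re


def decode_decimal_triplets(stream: str) -> str:
    """Read the stream as consecutive three-digit decimal character codes."""
    return "".join(chr(int(stream[i:i + 3])) for i in range(0, len(stream), 3))


def decode_numeric_references(stream: str) -> str:
    """Read the stream as HTML-style numeric character references (&#NN)."""
    return "".join(chr(int(code)) for code in re.findall(r"&#(\d+)", stream))


def decode_7bit_lsb_first(stream: str) -> str:
    """Read the stream as 7-bit codes, each written least-significant bit first."""
    chunks = (stream[i:i + 7] for i in range(0, len(stream), 7))
    return "".join(chr(int(chunk[::-1], 2)) for chunk in chunks)


# The three streams exactly as they appear on the slide.
DECIMAL = "089097110110105115"
REFERENCES = "&#89&#97&#110&#110&#105&#115"
BITS = "100110110000110111011011101110010111100111"

for decoder, stream in [
    (decode_decimal_triplets, DECIMAL),
    (decode_numeric_references, REFERENCES),
    (decode_7bit_lsb_first, BITS),
]:
    print(decoder.__name__, "->", decoder(stream))
```

Each decoder plays the role of a piece of representation information: without knowing which one applies, the digits are just digits.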

SLIDE 7

Digital Objects: The need for preserving the process that created a digital object

How will we preserve the digital process?

[Figure labels: process, Storage]

  • How has this image been derived?
  • When and by whom was it taken?
  • How was the satellite image processed (by what algorithms and with what parameters)?

SLIDE 8

Digital Objects: The need for preserving everyday knowledge

I know UML, but what does this diagram specify?

  • Emprego: stipend, comeceData, termineData
  • Empregado: sobrenome, idade, aumenteIdade()
  • Empresa: nome, empregue(p), fogo(p), promova(p,inc)
  • association: emprego

If I knew Spanish, then I could read it as:

  • Employment: salary, startDate, endDate
  • Person: name, age, increaseAge()
  • Company: name, hire(p:Person), fire(p:Person), promote(p,incr)
  • association: employment

Plus everyday knowledge:

  • A person cannot start a job before his/her birth
  • A promotion cannot lower the salary of an employee

⇒ Now I can develop the system, or I can guess how the existing system operates.

SLIDE 9

“Everything flows, nothing stands still” [Heraclitus]

SLIDE 10

The need for tackling changes

Consider a tourist agency that keeps a web site where a large number of tourist brochures (for various destinations all over the world) are made available in electronic form. All the material is stored in a digital repository.

We need to tackle changes in software/hardware and in community knowledge.

Brochure: “Tour of Maribor with only:”

Metadata

  • Format: gif
  • City: Maribor
  • Country: Yugoslavia
  • Currency.type: Yugoslav dinars (YUM)
  • Currency.Value: 5

Notice that:

  • The flag is no longer valid
  • The country … “does not exist” any more
  • The currency is not valid
  • We may want to change the image format (e.g. .gif -> .png)

SLIDE 11

We need to tackle changes because … “everything flows, nothing stands still” [Heraclitus]

1977

Metadata

  • Format: gif
  • Type: Flag
  • Country: Yugoslavia

2006

Bosnia & Herzegovina, Croatia, FYROM, Montenegro, Serbia, Slovenia

SLIDE 12

Tackling changes

Brochure: “Tour of Maribor with only:”

1977

Metadata

  • Format: gif
  • City: Maribor
  • Country: Yugoslavia
  • Currency.type: Yugoslav dinars (YUM)
  • Currency.Value: 5

July 2006 (after format migration and knowledge update)

Metadata

  • Format: png
  • City: Maribor
  • Country: Slovenia
  • Currency.type: Slovenian Tolar
  • Currency.Value: 3.4
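The two steps named on this slide, format migration and knowledge update, can be sketched as operations on the metadata record. This is a minimal illustration that assumes the metadata is held as a plain dictionary; the function names and the dictionary layout are hypothetical, and the field values are the ones shown on the slide.

```python
def migrate_format(record: dict, new_format: str) -> dict:
    """Format migration: return a copy of the record with a new Format value.

    Only the metadata is touched here; converting the image bytes themselves
    would be done by a separate tool and is outside this sketch.
    """
    return {**record, "Format": new_format}


def update_knowledge(record: dict, changes: dict) -> dict:
    """Knowledge update: overwrite the fields whose real-world meaning changed."""
    return {**record, **changes}


record_1977 = {
    "Format": "gif",
    "City": "Maribor",
    "Country": "Yugoslavia",
    "Currency.type": "Yugoslav dinars (YUM)",
    "Currency.Value": 5,
}

record_2006 = update_knowledge(
    migrate_format(record_1977, "png"),
    {
        "Country": "Slovenia",
        "Currency.type": "Slovenian Tolar",
        "Currency.Value": 3.4,
    },
)

print(record_2006)
```

Both functions return new dictionaries, so the 1977 record stays available, which matters if the archive wants to keep the history of a record rather than only its latest state.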

SLIDE 13

Preservation of Digital Information

Why it is important?

  • The world produces around 2 exabytes (1 exabyte ≈ 2^60 bytes) of unique information per year,

– 90% of which is digital, with a 50% annual growth rate.

  • “Everything flows, nothing stands still” [Heraclitus]
  • Digital information has to be preserved not only against hardware and software technology changes, but also against changes in the knowledge of the community.
SLIDE 14

Aspects of Preservation

But what should we preserve?

  • For sure we have to preserve the bits of the digital objects

We should also try to preserve the information carried by the digital objects

– Their accessibility
– Their integrity
– Their authenticity
– Their provenance
– Their intelligibility (by human or artificial actors)

Preservation has been termed “interoperability with the future”

SLIDE 15

What are the current preservation approaches and initiatives?

SLIDE 16

Current preservation approaches

Approaches

  • Replication

– Keep multiple copies

  • Refreshing

– Copy data onto newer media or systems

  • Migration

– Replace digital objects of old formats with "equivalent" objects of new formats.

  • Emulation

– An emulator duplicates (provides an emulation of) the functions of one system with a different system, so that the second system behaves like (and appears to be) the first system.

Standards

– OAIS

  • (will be discussed next)

Ongoing EU Projects

– PLANETS

  • Objective: support humans in deciding what preservation policy (emulation, migration) to adopt, based on criteria like cost and loss of information.

– CASPAR

  • (will be discussed next)
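Replication and refreshing, the first two approaches listed on this slide, both come down to copying bits and checking that the copy is faithful. The sketch below is one possible way to do that with the Python standard library, using a SHA-256 digest as the fixity check. The slide does not prescribe any particular mechanism; the function names, the directory names and the sample file are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import Iterable, List


def sha256_of(path: Path) -> str:
    """Fixity value of a file: the SHA-256 digest of its bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def replicate(source: Path, target_dirs: Iterable[Path]) -> List[Path]:
    """Copy `source` into every target directory and verify each copy."""
    expected = sha256_of(source)
    copies = []
    for directory in target_dirs:
        directory.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(source, directory / source.name))
        if sha256_of(copy) != expected:
            raise OSError(f"{copy} differs from {source}")
        copies.append(copy)
    return copies


def demo() -> List[str]:
    """Replicate one small file to two 'media' and return all three digests."""
    with tempfile.TemporaryDirectory() as root:
        original = Path(root) / "brochure.gif"
        original.write_bytes(b"GIF89a example bytes")
        targets = [Path(root) / "disk_a", Path(root) / "disk_b"]
        copies = replicate(original, targets)
        return [sha256_of(p) for p in [original, *copies]]


digests = demo()
print("all copies identical:", len(set(digests)) == 1)
```

Refreshing is the same operation with a single, newer target medium. Neither approach addresses format obsolescence or intelligibility, which is why migration and emulation appear on the same list.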
SLIDE 17

OAIS: Open Archival Information System (ISO 14721:2003)

SLIDE 18

OAIS: Open Archival Information System

OAIS: An archive, consisting of an organization of people and systems, that has accepted the responsibility to preserve information and make it available for a Designated Community (OAIS 1.7.2)

– Development led by the Consultative Committee for Space Data Systems (CCSDS)

– Published in early 2003 as ISO 14721:2003

– Delivers two high-level models:

  • Information Model
  • Functional Model
SLIDE 19

OAIS Information Model: Kinds of Metadata

  • Representation Information

– objective: to take a collection of bits and convert it to something useful
– key notions: Structure, Semantics, Algorithms, ...

  • Preservation Description Information

– objective: to consider the origins and relevance of any digital information
– key notions: Provenance, Fixity, Reference and Context

  • Descriptive Information

– role: important for data management, discovery and access

SLIDE 20

OAIS Information Model

[Diagram: an Information Object is a Data Object interpreted using Representation Information; Representation Information may itself be interpreted using further Representation Information; a Data Object is a Physical Object or a Digital Object; a Digital Object is made of one or more (1+) Bit Sequences.]

SLIDE 21

OAIS Information Model: Kinds of Metadata

[Class diagram: OAIS Information Model. Classes: Information Object; Data Object, with Physical Object, Digital Object and Bit Sequence; Representation Information, with Structure Information, Semantic Information, Software Information and Algorithms Information. Association: interpretedUsing (multiplicities shown: 1..*, 0..*, 1..*).]
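The class diagram can be read as a small data model. The following sketch renders it with Python dataclasses. It is one reading of the flattened diagram, not the OAIS standard's normative model: the field names, the `kind` strings and the example values are assumptions, and the multiplicities are interpreted as "an Information Object has at least one piece of Representation Information, which may have zero or more of its own".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RepresentationInformation:
    """Information needed to interpret a data object or other RI."""
    name: str
    kind: str  # assumed labels: "structure", "semantic", "software", "algorithms"
    interpreted_using: List["RepresentationInformation"] = field(default_factory=list)


@dataclass
class DigitalObject:
    """A data object made of one or more bit sequences."""
    bit_sequences: List[bytes]


@dataclass
class InformationObject:
    """A data object together with the RI needed to interpret it."""
    data_object: DigitalObject
    interpreted_using: List[RepresentationInformation]

    def __post_init__(self) -> None:
        if not self.interpreted_using:
            raise ValueError("an Information Object needs Representation Information")


def representation_network(info: InformationObject) -> List[str]:
    """List every piece of RI reachable from the object, breadth first."""
    names: List[str] = []
    pending = list(info.interpreted_using)
    while pending:
        ri = pending.pop(0)
        if ri.name not in names:
            names.append(ri.name)
            pending.extend(ri.interpreted_using)
    return names


# Hypothetical example: an XML metadata record written in English.
unicode_spec = RepresentationInformation("Unicode specification", "structure")
xml_spec = RepresentationInformation("XML specification", "structure", [unicode_spec])
english = RepresentationInformation("English language", "semantic")
record = InformationObject(
    DigitalObject([b"<title>Tour of Maribor</title>"]),
    [xml_spec, english],
)

print(representation_network(record))
```

The recursion in `interpreted_using` is the important part: Representation Information is itself information, so it needs its own Representation Information until something already known to the community is reached.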

SLIDE 22

OAIS Functional Model

Functional Model of OAIS (6 entities):

  • Ingest
  • Archival Storage
  • Data Management
  • Administration
  • Preservation Planning
  • Access

Information packages:

  • SIP: Submission Information Package
  • AIP: Archival Information Package (e.g. format), which consists of

– IO (Information Object): Data Object + Representation Information
– PDI (Preservation Description Information): provenance, context, fixity

  • DIP: Dissemination Information Package

– is the version of the information package delivered to the Consumer in response to an access request. It may differ in form (e.g. TIFF to JPEG) or content (e.g. amount of metadata supplied) from that which resides in the archival store.

[Figure: OAIS functional model, showing Producer, Consumer and Management around the entities Ingest, Archival Storage, Data Management, Administration, Preservation Planning and Access, with flows labelled SIP, AIP, DIP, descriptive info., queries, result sets and orders.]

SLIDE 23

The CASPAR project

CASPAR: Cultural, Artistic and Scientific knowledge for Preservation, Access and Retrieval

  • Ongoing FP6 Integrated Project
  • Start: April 2006
  • Duration: 42 months
  • EU Funding: € 8 800 000
  • Total planned budget: € 16 000 000
SLIDE 24

CASPAR Objectives

Pioneering framework to support the end-to-end preservation “lifecycle” for scientific, artistic and cultural information based on existing and emerging standards

  • to establish the foundation methodology for covering all preservation aspects
  • to research, develop and integrate advanced components
  • to create the CASPAR framework
  • to demonstrate the validity of the CASPAR framework through testbeds

– Cultural (UNESCO)
– Contemporary Arts (CNRS, INA, IRCAM, UofLeeds, …)
– Scientific (European Space Agency, CCLRC)

SLIDE 25

“CASPARtners”

  • The partners of this project are:
  • Council for the Central Laboratory of the Research Councils - UK (Coordinator)
  • Foundation for Research and Technology - Hellas - GR
  • European Space Agency, ESRIN - IT
  • UNESCO
  • Centre National de la Recherche Scientifique - FR
  • Institut de Recherche et Coordination Acoustique/Musique - FR
  • Institut National de l’Audiovisuel - FR
  • Consiglio Nazionale delle Ricerche - IT
  • IBM Haifa Research Laboratory - IL
  • University of Leeds - UK
  • International Centre for Art and New Technologies - CZ
  • University of Glasgow - UK
  • Università di Urbino - IT
  • and 4 companies:
  • Advanced Computer Systems S.p.A. - IT
  • @semantics S.r.l. - IT
  • Metaware S.p.A. - IT
  • Engineering - Ingegneria Informatica S.p.A. - IT
SLIDE 26

The project has to tackle a number of problems and we are just in its first year. Hereafter we will focus on the notion of intelligibility of digital objects.

  • Y. Tzitzikas, “Dependency Management for the Preservation of Digital Information”, 18th International Conference on Database and Expert Systems Applications, DEXA’2007, Regensburg, Germany, September 2007
  • Y. Tzitzikas and G. Flouris, “Mind the (Intelligibility) Gap”, 11th European Conference on Research and Advanced Technology for Digital Libraries, ECDL’2007, Budapest, Hungary, September 2007

SLIDE 27

OAIS Information Model

Representation Information

  • According to OAIS, metadata are divided into various categories.
  • A very important one is that of Representation Information

– It aims at enabling the conversion of a collection of bits to something useful

[Class diagram: OAIS Information Model, as on slide 21]

SLIDE 28

Modules and Dependencies

In order to abstract from the various domain-specific and time-varying details, we introduce the general notions of Module and Dependency.

  • Module

– We adopt a very general definition. A module could be:

  • a software/hardware module
  • a knowledge model expressed explicitly and formally (e.g. an Ontology)
  • a knowledge model not expressed explicitly (e.g. GreekLanguage)

– (the only constraint is that modules need to have a unique identity)

  • Dependency

– A module t depends on t’, written t > t’, if t requires t’
– The meaning of a dependency t > t’:

  • t cannot function/be understood/managed without the existence of t’

Note: We model the RI requirements of OAIS as dependencies between modules.

SLIDE 29

Modules and Dependencies: Examples

[Dependency graph (a): README.txt, TEXT EDITOR, ENGLISH LANGUAGE, WINDOWS XP]

[Dependency graph (b): README.txt, TEXT EDITOR, ENGLISH2GREEK DICTIONARY, WINDOWS XP, GREEK LANGUAGE]

SLIDE 30

Modules and Dependencies: Examples

  • Scientific Data

[Dependency graph: FITS FILE, FITS STANDARD, PDF STANDARD, FITS JAVA s/w, JAVA VM, PDF s/w, FITS DICTIONARY, DICTIONARY SPECIFICATION, UNICODE SPECIFICATION, XML SPECIFICATION]
SLIDE 31

Modules and Dependencies: Examples

  • Performing Arts Data

[Dependency graph: MULTIMEDIA PERFORMANCE DATA (3D motion data files, 3D scene data files, motion to music mapping strategy), C3D, DirectX, MAX/MSP]

[Process diagram: Motion Capture and Processing, Motion Analysis and Recognition, Motion-Multimedia Mapping Strategy, Multimedia Generation, GUI (for monitor & control); data labels: Motions, 3D motion data, Mapping Parameters, Multimedia output]

SLIDE 32

Modules and Dependencies: Examples

  • Cultural Data

[Dependency graph: Metadata Record, CIDOC CRM CORE, CIDOC CRM STANDARD, RDF STANDARD, CRM CORE XML Schema, XML SPECIFICATION]

SLIDE 33

Modules and Dependencies: Examples

  • Semantic Web Data

[Dependency graph: namespaces ns1, ns2, ns3, ns4 and RDF/S]

SLIDE 34

Formalizing Modules and Dependencies

[Figure: example dependency graph over the set of modules T, with nodes t1, …, t8, tx, ty]

  • Objects: Obj = {o1, …, on}
  • Components: C = {t1, …, tk}
  • Modules: T = C ∪ Obj
  • Dependencies: a binary relation > over T (i.e. > ⊆ T × T)
  • Dependency graph: G = (T, >)

Notations

  • S: a subset of T
  • >+: the transitive closure of >
  • >*: the reflexive and transitive closure of >
  • Nr(t) = { t’ | t > t’ }
  • Nr+(t) = { t’ | t >+ t’ }
  • Nr*(t) = { t’ | t >* t’ }
  • Max(S): the maximal elements of S w.r.t. >
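The notations above map directly onto a few graph functions. In the sketch below the dependency relation is stored as an adjacency dictionary, so `graph[t]` is Nr(t). The edges are hypothetical: the slide's figure did not survive extraction, so they were chosen only to be consistent with the results the later slides state for tx and ty. The function names are likewise illustrative.

```python
from typing import Dict, Set

Graph = Dict[str, Set[str]]

# Hypothetical edges: t > t' is stored as t' in DEPENDS_ON[t].
DEPENDS_ON: Graph = {
    "tx": {"t1"},
    "ty": {"t3"},
    "t1": {"t2", "t3"},
    "t2": {"t4", "t5"},
    "t3": {"t6", "t7"},
    "t5": {"t8"},
}


def nr(graph: Graph, t: str) -> Set[str]:
    """Nr(t): the modules t depends on directly."""
    return set(graph.get(t, ()))


def nr_plus(graph: Graph, t: str) -> Set[str]:
    """Nr+(t): everything reachable from t in one or more steps."""
    seen: Set[str] = set()
    stack = list(nr(graph, t))
    while stack:
        module = stack.pop()
        if module not in seen:
            seen.add(module)
            stack.extend(nr(graph, module))
    return seen


def nr_star(graph: Graph, t: str) -> Set[str]:
    """Nr*(t): Nr+(t) together with t itself."""
    return nr_plus(graph, t) | {t}


def maximal(graph: Graph, subset: Set[str]) -> Set[str]:
    """Max(S): members of S that no other member of S depends on."""
    return {
        t for t in subset
        if not any(t in nr_plus(graph, other) for other in subset if other != t)
    }


print(sorted(nr(DEPENDS_ON, "t1")))
print(sorted(nr_plus(DEPENDS_ON, "ty")))
print(sorted(maximal(DEPENDS_ON, nr_plus(DEPENDS_ON, "tx"))))
```

Here Max is computed with respect to the transitive closure of >, which is the reading that makes the maximal elements the "top" requirements of a set.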
SLIDE 35

Formalizing Actor/Community knowledge

(in terms of modules and dependencies)

[Figure: the dependency graph over T, with the profile Tu marked as a subset of its nodes]

  • Each actor or community u can be characterized by a profile Tu that contains those modules that are assumed to be available/known to u.
  • Formalization: Tu ⊆ T

Examples

  • u is an artificial agent

– Tu may include the software/hardware modules available to it

  • u is a human

– Tu may include modules that correspond to implicit knowledge

Unique Module Assumption (UMA)

  • Each module is uniquely identified by its name and its required modules are always the same (more practical: different modules have different identities)

SLIDE 36

The notion of closure

(of modules and profiles)

  • Closure of a module t: C(t) = Nr*(t)
  • Closure of a set of modules S: C(S) = ∪ { C(t) | t ∈ S }
  • Required modules of t: C+(t) = C(t) - {t} = Nr+(t)
  • Closure of a profile Tu: C(Tu) = Nr*(Tu)
  • It is assumed that u knows C(Tu).

[Figure: the dependency graph over T, showing Tu, the closure of Tu, C+(tx) = C(tx) - {tx} (the modules required by tx) and C+(ty) = C(ty) - {ty}]

SLIDE 37

Intelligibility and Intelligibility Gap

  • Intelligibility

– Definition (dictionary)

  • 1. Capable of being understood: an intelligible set of directions.
  • 2. Capable of being apprehended by the intellect alone.
  • Intelligibility Gap

– Definition:

  • The smallest set of modules u needs to have in order to understand a module t.

– Notation

  • Gap(t,u): The intelligibility gap between a user u with profile Tu and a module t
SLIDE 38

Intelligibility and Intelligibility Gap (I)

  • u can understand t iff: C+(t) ⊆ C(Tu)
  • The intelligibility gap: Gap(t,u) = C+(t) - C(Tu)

[Figure: the requirements of ty lie inside the closure of Tu, so Gap(ty,u) = ∅]

[Figure: the requirements of tx extend beyond the closure of Tu, so Gap(tx,u) = {t1, t2, t4, t5}]
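The definition of the gap is a set difference between two closures, which can be sketched in a few lines. The graph and the profile below are hypothetical: the original figure is not recoverable, so the edges and Tu = {t3, t8} were chosen only so that the computed gaps match the two results stated on this slide. The function names are illustrative.

```python
from typing import Dict, Iterable, Set

Graph = Dict[str, Set[str]]

# Hypothetical dependency edges (t > t' stored as t' in DEPENDS_ON[t]).
DEPENDS_ON: Graph = {
    "tx": {"t1"},
    "ty": {"t3"},
    "t1": {"t2", "t3"},
    "t2": {"t4", "t5"},
    "t3": {"t6", "t7"},
    "t5": {"t8"},
}

PROFILE_U: Set[str] = {"t3", "t8"}  # hypothetical Tu


def closure(graph: Graph, modules: Iterable[str]) -> Set[str]:
    """C(S): the modules in S plus everything they require, transitively."""
    seen: Set[str] = set()
    stack = list(modules)
    while stack:
        module = stack.pop()
        if module not in seen:
            seen.add(module)
            stack.extend(graph.get(module, ()))
    return seen


def required(graph: Graph, t: str) -> Set[str]:
    """C+(t) = C(t) - {t}."""
    return closure(graph, [t]) - {t}


def gap(graph: Graph, t: str, profile: Set[str]) -> Set[str]:
    """Gap(t,u) = C+(t) - C(Tu)."""
    return required(graph, t) - closure(graph, profile)


def understands(graph: Graph, t: str, profile: Set[str]) -> bool:
    """u can understand t iff the gap is empty."""
    return not gap(graph, t, profile)


print(sorted(gap(DEPENDS_ON, "tx", PROFILE_U)))
print(understands(DEPENDS_ON, "ty", PROFILE_U))
```

The gap is exactly the set of modules that would have to be delivered to u, or added to the profile of u, before t becomes intelligible.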

SLIDE 39

Intelligibility and Intelligibility Gap (II)

  • u can understand t iff: C+(t) ⊆ C(Tu)
  • Due to UMA we can write: C+(t) ⊆ C(Tu) ⇔ max(C+(t)) ⊆ C(Tu)
  • In our example:

– max(C+(ty)) = {t3} ⊆ C(Tu)
– max(C+(tx)) = {t1} ⊄ C(Tu)

[Figure: the requirements of ty and of tx, each shown against the closure of Tu]

SLIDE 40

Converters

[Figure: dependency graphs over T with profile Tu, objects ox and oy, and a converter C]

We can capture emulation and migration by introducing converters (as a different kind of edge). Intelligibility gaps can be filled with converters, and finding the appropriate converters reduces to the problem of REACHABILITY in directed graphs.
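The reduction to reachability can be sketched with a breadth-first search over converter edges. The slide does not give a concrete formulation, so the following is an assumption-laden illustration: converter edges go from a source format to a target format, the search stops at the first format the user already knows, and every format and converter name (`gif2png` and so on) is hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical converter edges: format -> [(target format, converter name)].
CONVERTERS: Dict[str, List[Tuple[str, str]]] = {
    "GIF": [("PNG", "gif2png")],
    "PNG": [("JPG", "png2jpg")],
    "TIFF": [("JPG", "tiff2jpg")],
}


def find_converter_chain(
    converters: Dict[str, List[Tuple[str, str]]],
    source: str,
    known: Set[str],
) -> Optional[List[str]]:
    """Shortest chain of converters from `source` to any format in `known`.

    Returns [] if the source is already known and None if no chain exists.
    """
    if source in known:
        return []
    queue = deque([(source, [])])
    visited = {source}
    while queue:
        current, chain = queue.popleft()
        for target, converter in converters.get(current, []):
            if target in visited:
                continue
            new_chain = chain + [converter]
            if target in known:
                return new_chain
            visited.add(target)
            queue.append((target, new_chain))
    return None


print(find_converter_chain(CONVERTERS, "GIF", {"JPG"}))
print(find_converter_chain(CONVERTERS, "BMP", {"JPG"}))
```

Breadth-first search returns a shortest chain, which is a reasonable default when every conversion step risks some loss of information; a real system might instead weight edges by cost or fidelity.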

SLIDE 41

Example

[Figure: Mypage.html embeds yannis.jpg; dependency graphs Mypage.html > HTML and Mypage.html > HTML, JPG]

The extension of the filename gives us a hint about the type of the digital object, so we may write type(mypage.html) = HTML, and as mypage.html > HTML, we can in general assume that for every t it holds that t > type(t), if type(t) is known.

However, only if HTML is intelligible can we go further: we need to have an HTML parser. If we cannot understand HTML, then we cannot deduce the dependency mypage.html > JPG.

In general, type(o) = type(o) ∪ type(contents(o)). To compute contents(o) we need to be able to understand type(o).

SLIDE 42

Example (II)

  • So we may be unable to compute C(t)
  • We may be able to compute only one part of Nr(t).
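The mypage.html example can be sketched with the standard-library HTML parser: the type hinted by the file extension is always recorded, but dependencies hidden in the contents are found only when the type is intelligible to whoever does the analysis. The extension table, the function names and the sample page are hypothetical, and only `<img>` references are followed to keep the sketch short.

```python
from html.parser import HTMLParser
from pathlib import PurePosixPath
from typing import List, Optional, Set

# Hypothetical mapping from file extension to module (type) name.
TYPE_BY_EXTENSION = {".html": "HTML", ".jpg": "JPG", ".png": "PNG"}


def type_of(name: str) -> Optional[str]:
    """Guess the type of an object from its file extension, if known."""
    return TYPE_BY_EXTENSION.get(PurePosixPath(name).suffix.lower())


class ImageCollector(HTMLParser):
    """Collects the src attribute of every <img> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def discover_dependencies(name: str, content: str, intelligible: Set[str]) -> Set[str]:
    """Return the part of Nr(name) that can be computed with the given knowledge."""
    dependencies: Set[str] = set()
    own_type = type_of(name)
    if own_type is None:
        return dependencies
    dependencies.add(own_type)  # t > type(t)
    if own_type == "HTML" and "HTML" in intelligible:
        collector = ImageCollector()
        collector.feed(content)
        for source in collector.sources:
            embedded_type = type_of(source)
            if embedded_type:
                dependencies.add(embedded_type)
    return dependencies


PAGE = '<html><body><img src="yannis.jpg"></body></html>'

print(sorted(discover_dependencies("mypage.html", PAGE, {"HTML"})))
print(sorted(discover_dependencies("mypage.html", PAGE, set())))
```

The second call illustrates the slide's warning: without an HTML parser only part of Nr(t) is computed, so the closure C(t) derived from it would be incomplete.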
SLIDE 43

Preservation Information Systems

A Preservation Information System could adopt the following policies

  • Input Policy

– The input (e.g. data objects to be archived) should be intelligible by the system, i.e. intelligible wrt the profile of the system (say Tp)

  • Output Policy

– The output (e.g. returned answers) should be intelligible by the recipients, i.e. intelligible wrt the profile of the user (say Tu)

The notion of profile could be used as gnomon in these policies.

[Figure: input flows into the Preservation Information System p, and output flows from p to the user u]

SLIDE 44

Intelligibility-aware Interaction Schemes

Consider the classical query-and-answer interaction scheme between:

  • an information provider p and
  • an information consumer u

We will extend the query-and-answer interaction scheme with intelligibility-related concerns

(1) u → p: query(q)
(2) p → u: answer(A)

SLIDE 45

Intelligibility-aware Interaction Schemes

Case: p stores the dependency graphs and the profiles

[Figure: the Preservation Information System p stores the dependency graph over T together with the profiles (e.g. Tu) of its users u]

SLIDE 46

Intelligibility-aware Interaction Schemes

For Delivering Intelligible Answers > with a fixed number of messages

Scheme (I)

Answers are accompanied by their closure

(1) u → p: query(q)
(2) p → u: return(A, C(A))

Scheme (II)

u sends Tu with the query (or registers it), p returns answers accompanied by the intelligibility gap

(1) u → p: query(q, Tu)
(2) p → u: return(A, Gap(A,u))

Step 1 can be replaced by (1’) u → p: query(q, Max(Tu))

[Figure: user u sends query q to the Preservation Information System and receives answer A]
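The two fixed-message schemes differ only in what the provider attaches to the answer. The sketch below models the provider as a small class. Query evaluation is out of scope, so answers are looked up in a hypothetical table; the dependency edges and the profile are also hypothetical, and Gap(A,u) for a set of objects A is taken here to be (C(A) - A) - C(Tu), which is an assumption since the slides define the gap for a single module.

```python
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[str, Set[str]]


class Provider:
    """A preservation system p that stores the dependency graph."""

    def __init__(self, graph: Graph, answers: Dict[str, Set[str]]) -> None:
        self.graph = graph
        self.answers = answers  # hypothetical stand-in for query evaluation

    def closure(self, modules: Iterable[str]) -> Set[str]:
        """C(S): the modules plus everything they require, transitively."""
        seen: Set[str] = set()
        stack = list(modules)
        while stack:
            module = stack.pop()
            if module not in seen:
                seen.add(module)
                stack.extend(self.graph.get(module, ()))
        return seen

    def query_scheme_1(self, q: str) -> Tuple[Set[str], Set[str]]:
        """Scheme (I): return(A, C(A))."""
        answer = self.answers[q]
        return answer, self.closure(answer)

    def query_scheme_2(self, q: str, profile: Set[str]) -> Tuple[Set[str], Set[str]]:
        """Scheme (II): return(A, Gap(A,u)) for the profile sent with the query."""
        answer = self.answers[q]
        required = self.closure(answer) - answer
        return answer, required - self.closure(profile)


provider = Provider(
    graph={
        "tx": {"t1"}, "ty": {"t3"}, "t1": {"t2", "t3"},
        "t2": {"t4", "t5"}, "t3": {"t6", "t7"}, "t5": {"t8"},
    },
    answers={"q": {"tx", "ty"}},
)

answer_1, closure_1 = provider.query_scheme_1("q")
answer_2, gap_2 = provider.query_scheme_2("q", {"t3", "t8"})
print(len(closure_1), sorted(gap_2))
```

In this example scheme (I) ships ten modules and scheme (II) ships four, at the price of the user disclosing a profile to the provider.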

SLIDE 47

Intelligibility-aware Interaction Schemes

For Delivering Intelligible Answers > with a fixed number of messages

(Schemes (I) and (II) as on slide 46.)

  • May be expensive to compute and large in size
SLIDE 48

Intelligibility-aware Interaction Schemes

Case: p does not store user profiles

[Figure: the Preservation Information System p stores the dependency graph over T only; the profile of user u is not stored]
SLIDE 49

Intelligibility-aware Interaction Schemes

For Delivering Intelligible Answers > progressive method

Scheme (I’): Gradual identification and completion of the intelligibility gap

The provider does not know Tu. Answers are accompanied by their direct requirements.

(1) u → p: query(q)
(2) p → u: return(A, max(C(A)))   // ≡ return(A, Nr(A)) ≡ return(A, directReqOf(A))
(3) u: repeat
(4) u:   M := recmsg - Tu   // or M := recmsg - C(Tu)
(5) u:   if M ≠ ∅ then
(6) u → p:   getDirectReqsOf(M)
(7) p → u:   return(max(C(recmsg)))   // ≡ return(Nr(recmsg))
(8) u: until M = ∅

  • Fast and small in size
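The progressive scheme can be simulated in a single process by treating each call to the provider as one message exchange. The sketch below follows the variant that subtracts C(Tu) in step 4. It also remembers what has already been requested so that a cyclic graph cannot cause an endless loop, which is a small addition not present in the steps above. The graph, the profile closure and the function names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[str, Set[str]]

# Hypothetical dependency edges held by the provider p.
DEPENDS_ON: Graph = {
    "tx": {"t1"}, "ty": {"t3"}, "t1": {"t2", "t3"},
    "t2": {"t4", "t5"}, "t3": {"t6", "t7"}, "t5": {"t8"},
}

# Hypothetical C(Tu), known only to the user u.
PROFILE_CLOSURE: Set[str] = {"t3", "t6", "t7", "t8"}


def direct_requirements(graph: Graph, modules: Iterable[str]) -> Set[str]:
    """Provider side: the union of Nr(m) over the requested modules."""
    result: Set[str] = set()
    for module in modules:
        result |= graph.get(module, set())
    return result


def progressive_gap(
    graph: Graph, answer: Set[str], profile_closure: Set[str]
) -> Tuple[Set[str], int]:
    """User side of scheme (I'): returns (gap, number of extra round trips)."""
    received = direct_requirements(graph, answer)          # step (2)
    gap: Set[str] = set()
    round_trips = 0
    while True:
        missing = received - profile_closure - gap         # step (4)
        if not missing:                                    # step (8)
            return gap, round_trips
        gap |= missing
        received = direct_requirements(graph, missing)     # steps (6)-(7)
        round_trips += 1


print(progressive_gap(DEPENDS_ON, {"tx"}, PROFILE_CLOSURE))
print(progressive_gap(DEPENDS_ON, {"ty"}, PROFILE_CLOSURE))
```

Each message carries only direct requirements, so individual messages stay small, but the number of round trips grows with the depth of the unknown part of the graph.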
SLIDE 50

Intelligibility-aware Interaction Schemes

For Recording information

It is analogous to the previous case (we reverse the roles of p and u):

  • we ignore the query submission step
  • we consider that the user u is the preservation system that wants to ingest the set of objects A that user p sends to u.

[Figure: in the answering case, the Preservation Information System receives query q from user u and returns answer A; in the recording case, only the set of objects A is exchanged]

SLIDE 51

Preservation-related processes and intelligibility-related concerns

[Process diagrams, all centred on Intelligibility:

  • Input request: Select Profile, Identify Gap, Fill the Gap, Commit Update
  • Output request: Select Profile, Identify Gap, Fill the Gap, Deliver
  • Change Event: Identify Consequences/Gaps, Notify, Tackle Change]

  • Ingest and Archive

– Ensure the intelligibility by the system, adopt a self-describing (wrt a profile) packaging approach.

  • Disseminate

– Deliver intelligible information packages

  • Curate

– Identify risks of obsolescence, react to changes, select preservation policy to adopt

  • Clean

– Estimate what is worth preserving. Delete the rest

SLIDE 52

CASPAR Architecture

SLIDE 53

Summary and Concluding Remarks: Intelligibility of Digital Objects

  • Intelligibility is an important notion of preservation.
  • We formalized this notion on the basis of dependencies. The notion of dependency is ubiquitous, and dependency management is an important requirement that is the subject of research in several areas (old and newly emerged), from software engineering to ontology engineering.
  • A modern digital information preservation system should be generic, i.e. able to preserve heterogeneous digital objects which may have different interpretations of the notion of dependency.
  • Contribution

– Abstract notion of module and dependency
– The notion of DC Profile: gnomon for deciding

  • representation information adequacy (during input)
  • intelligibility (during output)

– Intelligibility Gap
– Intelligibility-aware processes

SLIDE 54

Intelligibility of Digital Objects: Next steps and Further Research

  • Future research

– Extend the theoretical framework with Converters (for capturing migration/evolution): they can be considered as a specialization of the notion of module.
– Study the effects of changes (on modules, dependencies) and notification services
– Study modules and dependencies of different granularity
– Study properties of dependency relations (transitivity, acyclicity, …)
– Relax the notion of identity (incorporate the notion of similarity and the notion of Diff)

  • For more see

– Y. Tzitzikas, “Dependency Management for the Preservation of Digital Information”, 18th International Conference on Database and Expert Systems Applications, DEXA’2007, Regensburg, Germany, September 2007
– Y. Tzitzikas and G. Flouris, “Mind the (Intelligibility) Gap”, 11th European Conference on Research and Advanced Technology for Digital Libraries, ECDL’2007, Budapest, Hungary, September 2007

  • Proof-of-concept prototype

– Based on Semantic Web Technologies.

SLIDE 55

Summary and Concluding Remarks: General

  • Digital preservation is an endless process which poses a number of challenging problems

Thanks for your attention