ARCHIVING & PRESERVING WEB CONTENT: THE INTERNET ARCHIVE - PowerPoint PPT Presentation



SLIDE 1
SLIDE 2

ARCHIVING & PRESERVING WEB CONTENT

SLIDE 3

THE INTERNET ARCHIVE

What? A non-profit digital library and archive
Where? San Francisco, CA
When? Who? Founded in 1996 by Brewster Kahle
How? Officially designated a library by the state of California in 2007

SLIDE 4

THE WAYBACK MACHINE

Online: https://archive.org/web/
The largest publicly available web archive in existence.
> 280 billion pages
> 100 million websites
> 150 languages
~ 1 billion URLs added per week
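
To make the idea of looking up an archived page concrete, here is a minimal Python sketch. It is not part of the original presentation. It assumes the publicly documented Wayback replay pattern (`https://web.archive.org/web/<timestamp>/<url>`) and the Wayback availability endpoint (`https://archive.org/wayback/available`); the function names are hypothetical, and no network request is made. The sample response mirrors the JSON shape that endpoint is understood to return.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoints, based on public Wayback Machine documentation.
WAYBACK_REPLAY = "https://web.archive.org/web"
AVAILABILITY_API = "https://archive.org/wayback/available"


def replay_url(original_url: str, timestamp: str) -> str:
    """Build a replay URL; timestamp is 1-14 digits of YYYYMMDDhhmmss."""
    if not (timestamp.isdigit() and 1 <= len(timestamp) <= 14):
        raise ValueError("timestamp must be 1-14 digits (YYYYMMDDhhmmss)")
    return f"{WAYBACK_REPLAY}/{timestamp}/{original_url}"


def availability_query(original_url: str, timestamp: Optional[str] = None) -> str:
    """Build (but do not send) a query asking for the closest capture."""
    params = {"url": original_url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_API}?{urlencode(params)}"


def closest_snapshot(response: dict) -> Optional[str]:
    """Return the closest capture's URL from a decoded JSON response, if any."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


# Illustrative response only; the values are made up for this sketch.
sample_response = {
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20060101000000/http://example.com/",
            "timestamp": "20060101000000",
            "status": "200",
        }
    }
}

print(replay_url("http://example.com/", "20060101"))
print(availability_query("example.com", "20060101"))
print(closest_snapshot(sample_response))
```

To run the lookup for real, the string returned by `availability_query` would be fetched with any HTTP client and the decoded JSON passed to `closest_snapshot`.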

SLIDE 5

WEB ARCHIVING

What is a web archive?
A collection of archived URLs grouped by theme, event, subject area, or web address. A web archive contains as much as possible from the original resources and documents their change over time. It is a priority to recreate the same experience a user would have had if they had visited the live site on the day it was archived.

SLIDE 6

THE LIFESPAN OF A WEBSITE

How long does a website last?

In general, a typical web page can be expected to last ~90-100 days before changing, moving, or disappearing completely.
> In 2013, our colleagues at Old Dominion University determined that over 10% of event-related content posted to social media platforms is lost after one year.
> In 2014, a study by UCLA determined that 7 in 10 scholarly articles that include citations with hyperlinks suffer from reference rot.

SLIDE 7

ARCHIVE-IT: A WEB ARCHIVING SERVICE

A web-based application launched in 2006 that allows users to create, manage, access, and store collections of web-based digital content.
> A fully hosted solution, including access and storage.
> A suite of tools for selecting, scoping, and cataloging.
> Provides the ability to capture content at 10 different frequencies.
> Archived web content includes: HTML, text, video, audio, social media, PDF, images, password-protected content, static databases, and newspapers.
> Browse archived content 24 hours after a capture is complete; full-text search is available within 7 days.
> Private access options are available.

SLIDE 8

HOW IS ARCHIVE-IT DIFFERENT THAN THE GENERAL/GLOBAL WAYBACK?

Archive-It:
> Focused collections
> Control over scope and frequency
> Technical support
> All content and metadata indexed for search
> Archived data shipped/downloaded
> Private access options
> Available 24 hours after capture
> Subscription service

General/Global Wayback:
> One collection
> Snapshot
> Automated
> Search and cataloging not available
> Shipping/download not available
> Public access only
> Access varies
> Absolutely free

SLIDE 9

WHAT OUR PARTNERS ARE COLLECTING...

SLIDE 10

ARCHIVE-IT USE CASES

Create a thematic/topical web archive on a specific subject or event
> Often related to traditional collecting activity around the same topical focus
> Capture spontaneous events
> Document different perspectives and social commentaries

Fulfill a mandate to capture/preserve evolving web history
> Construct a historical record of an institution or individual’s web/social media presence
> Support an electronic records system to meet records retention requirements
> Collect publications/documents that are no longer in print form

Closure crawls
> Document a public institution’s presence on the web before it changes or closes

SLIDE 11

UNIVERSITY OF ALBERTA: ALBERTA FLOODS JUNE 2013

Use Case: Archive web content before, during, and after the 2013 Alberta floods

> Personal and institutional blogs
> News articles
> Institutional websites

SLIDE 12

WILFRID LAURIER UNIVERSITY

Use Cases:
> Document the university’s social media presence
> Archive the university’s web presence in order to meet required records retention mandates

SLIDE 13

ACCESS TO COLLECTIONS

Partners:
> Can view through private web application with login/password

General Public:
> Can view from Archive-It’s website: http://www.archive-it.org/
> Search Archive-It data and metadata from institutional domains
> Landing Pages: branded pages that link back to Archive-It hosted data

SLIDE 14

EXAMPLES OF ORGANIZATIONS’ LANDING PAGES

> Library of Virginia
> University of Texas at Austin

SLIDE 15

PRIVATE ACCESS OPTIONS

> Entire account
> Individual collections
> Specific URLs
> IP address

SLIDE 16

STORAGE AND PRESERVATION

Storage:
> 2 copies (primary & backup) of archived data are stored at San Francisco data centers.
> A third copy is transferred to the General Archive.
> A copy of archived data can be shipped on a hard drive.
> Partners can always download their archived data from Internet Archive’s servers.

Preservation partnerships:
> 2008: LOCKSS
> 2013: DuraCloud
> 2017: Multiple in development...

SLIDE 17

DATA REPOSITORY

SLIDE 18

KEY ARCHIVE-IT FEATURES

> Different levels of access for account users
> Ten available capture frequencies (from twice daily to yearly)
> Browse collections by URL, search by full text and metadata
> Detailed post-crawl reports for analysis
> Quality Assurance (QA) tools
> Online Help Center and User Manual
> Web Archivists and technical support
> Hosting, access, and redundant storage

SLIDE 19

SUBSCRIPTION MODEL

> Annual, renewable subscription
> Subscription levels vary by the amount of data archived
> Factors include: type and number of sites, how large they are, and how frequently they are archived
> All subscriptions include hosting, access, and perpetual storage (primary and backup)
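
As a rough illustration of how the factors above interact, the following Python sketch estimates a yearly data volume from the number of sites, their size, and their capture frequency. It is an assumption-laden back-of-the-envelope model, not Archive-It's actual subscription or pricing formula: the `Seed` type, the field names, and the example figures are all hypothetical, and the estimate ignores effects such as deduplication between captures.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Seed:
    """One site to archive (hypothetical model for illustration only)."""
    name: str
    size_gb: float          # assumed average size of a single capture
    captures_per_year: int  # e.g. 12 for monthly, 52 for weekly


def yearly_data_gb(seeds: Iterable[Seed]) -> float:
    """Sum size x frequency over all seeds; ignores deduplication."""
    return sum(s.size_gb * s.captures_per_year for s in seeds)


# Made-up example: one larger site captured monthly, one small site weekly.
seeds = [
    Seed("university homepage", size_gb=2.0, captures_per_year=12),
    Seed("news blog", size_gb=0.5, captures_per_year=52),
]

print(yearly_data_gb(seeds))  # 2.0*12 + 0.5*52 = 50.0
```

The point of the sketch is only that frequency multiplies size: a small site captured weekly can contribute as much data as a larger site captured monthly.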

SLIDE 20

TIME COMMITMENTS

Staff dedicated to web archiving program
Source: NDSA, Web Archiving in the United States: A 2016 Survey
[Chart values: 58%, 13%, 5%, 5%, 19%]

SLIDE 21

THE WEB ARCHIVING LIFE CYCLE

http://www.archive-it.org/publications

SLIDE 22

COMPLIMENTARY TRIAL

Create a collection of up to 5 websites, archive content, and view the results!

SLIDE 23

ARCHIVE-IT WEB APPLICATION DEMO


SLIDE 24

LEARN MORE

> Check out our blog: www.archive-it.org/blog
> Follow us on Twitter: @archiveitorg
> Like us on Facebook: https://www.facebook.com/ArchiveIt
> Questions? ait@archive.org

THANK YOU!