CSV on the Web
Intro to W3C CSV on the Web Specifications DDI Metadata Workshop – Dagstuhl 2016
Gregg Kellogg
gregg@greggkellogg.net @gkellogg
1
https://gkellogg.github.com/ddi-csvw
CSV on the Web Intro to W3C CSV on the Web Specifications DDI - - PowerPoint PPT Presentation
CSV on the Web Intro to W3C CSV on the Web Specifications DDI Metadata Workshop Dagstuhl 2016 Gregg Kellogg gregg@greggkellogg.net https://gkellogg.github.com/ddi-csvw @gkellogg 1 CSV data is dumb Its a simple text format, data
Intro to W3C CSV on the Web Specifications DDI Metadata Workshop – Dagstuhl 2016
gregg@greggkellogg.net @gkellogg
1
https://gkellogg.github.com/ddi-csvw
meaning.
format: what does “09/10/2016” mean?
columns: Foreign Keys
Join results
2
Resources
conversion
3
interoperability with working with CSV, or similar formats.
www.w3.org/TR/tabular-data-model/
metadata/
TR/csv2json/
TR/csv2rdf/
4
id Table Group id Table notes transformations about URL cells datatype default Column lang name number
property URL required separator table text direction titles value URL virtual cells number primary key titles Row referenced rows source number table about URL column errors
Cell property URL row string value table text direction value value URL
5
notes foreign keys
url
tables columns rows table direction
rows table
lineTerminators, quoteChar, …
conversions.
6
locations
names)
7
foreignKeys columns @id @type
Schema
primaryKey rowTitles
8
url targetFormat scriptFormat titles source @id @type
Transformation Definition
name titles required suppressOutput virtual @id @type
Column Description
columnReference reference
Foreign Key Definition
resource schemaReference columnReference
Foreign Key Reference
array property link property URI template property column reference property
natural language property atomic property
Legend:
reference to an array of values of a specific category reference to a value of a specific category @language @base
Top-Level Properties
tables transformations tableDirection tableSchema dialect @context @id
Table Group
notes @type decimalChar groupChar pattern
Number Format
url transformations tableDirection tableSchema dialect notes
Table
@context @id @type suppressOutput null lang textDirection separator
default datatype
Inherited Properties
aboutUrl propertyUrl valueUrl required base format length minLength maxLength minimum maximum
Datatype Description
minInclusive maxInclusive minExclusive maxExclusive @id @type encoding lineTerminators quoteChar doubleQuote skipRows commentPrefix header
Dialect Description
headerRowCount skipBlankRows skipInitialSpace trim @id delimiter skipColumns
9
countryCode latitude longitude name AD 42.5 1.6 Andorra AE 23.4 53.8 United Arab Emirates AF 33.9 67.7 Afghanistan
countries.csv
countryRef year population AF 1960 9,616,353 AF 1961 9,799,379 AF 1961 9,989,846
country_slice.csv
10
embedded metadata.
compatibility.
located.
11
string, UAX35 number picture string
12
references (single and multiple columns)
13
countryCode latitude longitude name AD 42.5 1.6 Andorra AE 23.4 53.8 United Arab Emirates AF 33.9 67.7 Afghanistan
countries.csv
14
{ "tables": [{ "url": "http://example.org/countries.csv", "row": [{ "url": "http://example.org/countries.csv#row=2", "rownum": 1, "describes": [{ "countryCoe": "AD", "latitude": "42.5", "longitude": "1.6", "name": "Andorra" }] }, { "url": "http://example.org/countries.csv#row=3", "rownum": 2, "describes": [{ "countryCode": "AE", "latitude": "23.4", "longitude": "53.8", "name": "United Arab Emirates" }] }, { "url": "http://example.org/countries.csv#row=4", "rownum": 3, "describes": [{ "countryCode": "AF", "latitude": "33.9", "longitude": "67.7", "name": "Afghanistan" }] }] }] }
countries.json countries-standard.json
countryCode latitude longitude name AD 42.5 1.6 Andorra AE 23.4 53.8 United Arab Emirates AF 33.9 67.7 Afghanistan
15
[{ "countryCode": "AD", "latitude": "42.5", "longitude": "1.6", "name": "Andorra" }, { "countryCode": "AE", "latitude": "23.4", "longitude": "53.8", "name": "United Arab Emirates" }, { "countryCode": "AF", "latitude": "33.9", "longitude": "67.7", "name": "Afghanistan" }]
countries.csv countries.json countries-minimal.json
countryCode latitude longitude name AD 42.5 1.6 Andorra AE 23.4 53.8 United Arab Emirates AF 33.9 67.7 Afghanistan
16
@base <http://example.org/countries.csv> . @prefix csvw: <http://www.w3.org/ns/csvw#> . @prefix xsd: <http://www.w3.org/2001/XMLSchema#> . _:tg a csvw:TableGroup ; csvw:table [ a csvw:Table ; csvw:url <http://example.org/countries.csv> ; csvw:row [ a csvw:Row ; csvw:rownum "1"^^xsd:integer ; csvw:url <#row=2> ; csvw:describes _:t1r1 ], [ a csvw:Row ; csvw:rownum "2"^^xsd:integer ; csvw:url <#row=3> ; csvw:describes _:t1r2 ], [ a csvw:Row ; csvw:rownum "3"^^xsd:integer ; csvw:url <#row=4> ; csvw:describes _:t1r3 ] ] . _:t1r1 <#countryCode> "AD" ; <#latitude> "42.5" ; <#longitude> "1.6" ; <#name> "Andorra" . _:t1r2 <#countryCode> "AE" ; <#latitude> "23.4" ; <#longitude> "53.8" ; <#name> "United Arab Emirates" . _:t1r3 <#countryCode> "AF" ; <#latitude> "33.9" ; <#longitude> "67.7" ; <#name> "Afghanistan" .
countries.csv countries.json countries-standard.ttl
countryCode latitude longitude name AD 42.5 1.6 Andorra AE 23.4 53.8 United Arab Emirates AF 33.9 67.7 Afghanistan
17 @base <http://example.org/countries.csv> . _:t1r1 <#countryCode> "AD" ; <#latitude> "42.5" ; <#longitude> "1.6" ; <#name> "Andorra" . _:t1r2 <#countryCode> "AE" ; <#latitude> "23.4" ; <#longitude> "53.8" ; <#name> "United Arab Emirates" . _:t1r3 <#countryCode> "AF" ; <#latitude> "33.9" ; <#longitude> "67.7" ; <#name> "Afghanistan" .
countries.csv countries.json countries-minimal.ttl
18
@gkellogg gregg@greggkellogg.net http://greggkellogg.net/
https://gkellogg.github.com/ddi-csvw/
20
rel=“describedby”
21
type="application/ld+json" or type="application/json". {+url}-metadata.json csv-metadata.json
Context
namespace http:// www.w3.org/ns/csvw
@language
22
{ "@context": "http://www.w3.org/ns/csvw", ... } { "@context": [ "http://www.w3.org/ns/csvw", { "@base": "http://example.org/", "@language": "en-AU" } ], ... }
a tableSchema
transformations to other formats
23
{ "@context": "http://www.w3.org/ns/csvw", "@type": "TableGroup", "dialect": { "delimiter": "\t", "headerRowCount": 3 }, "notes": { "type": "Annotation", "target": "countries.csv#cell=2,6-*,7", "body": "…representative points.", "motivation": "commenting" }, "tables": [{ "url": "countries.csv" }, { "url": "country-groups.csv" }], "tableDirection": "ltr", “tableSchema": {}, "transformations": [{...}] }
to describe that format of referenced tables
24
{ "@context": "http://www.w3.org/ns/csvw", "@type": "Table", "url": "countries.csv", "dialect": {...}, "notes": {...}, "tableDirection": "..", "tableSchema": { "columns": [{ "titles": "country" },{ "titles": "country group" },{ "titles": "name (en)" },{ "titles": "name (fr)" },{ "titles": "name (de)" },{ "titles": "latitude" },{ "titles": "longitude" }] }, "transformations": {...} }
columns.
entries in another table.
content defines the title for the row.
25
{ "@context": "http://www.w3.org/ns/csvw", "url": "countries.csv", "tableSchema": { "columns": [{ "titles": "country" },{ "titles": "country group" },{ "name": "name_en", "titles": "name (en)", "lang": "en" },{ "name": "name_fr", "titles": "name (fr)", "lang": "fr" },{ "name": "name_de", "titles": "name (de)", "lang": "de" },{ "titles": "latitude", "datatype": "number" },{ "titles": "longitude", "datatype": "number" }], "foreignKeys": [{}], "primaryKey": "country", "rowTitles": ["name_en", "name_fr", "name_de"] } }
in URI templates.
MUST match the header from the CSV. Allows different forms for internationalization.
26
{ "titles": "country", "dc:description": "The ISO two-letter code for a country, in lowercase.", "datatype": { "base": "string", "minLength": "3", "maxLength": "128" }, "virtual": false }
Datatypes
27
{ "@context": "http://www.w3.org/ns/csvw", "url": "countries.csv", "tableSchema": { "aboutUrl": "http://example.org/country/{code}", "columns": [{ "titles": "country", "name": "code", "suppressOutput": true },{ "titles": "name (en)", "lang": "en", "propertyUrl": "schema:name" },{ "titles": "latitude", "datatype": "number", "aboutUrl": "http://example.org/country/{code}#geo", "propertyUrl": "schema:latitude" },{ "titles": "longitude", "datatype": "number", "aboutUrl": "http://example.org/country/{code}#geo", "propertyUrl": "schema:longitude" },{ "virtual": true, "propertyUrl": "rdf:type", "valueUrl": "schema:Country" },{ "virtual": true, "propertyUrl": "schema:geo", "valueUrl": "http://example.org/country/{code}#geo" },{ "virtual": true, "aboutUrl": "http://example.org/country/{code}#geo", "propertyUrl": "rdf:type", "valueUrl": "schema:GeoCoordinates" }] } }
names.
LD to associated with the associated model object.
is constrained.
28
{ "@context": "http://www.w3.org/ns/csvw", "@type": "Table", "url": "http://example.com/table.csv", "tableSchema": [...], "dc:title": [ {"@value": "The title of this Table", "@language": "en"} {"@value": "Der Titel dieser Tabelle", "@language": "de"} ], "dc:publisher": [{ "schema:name": "Example Municipality", "schema:url": {"@id": "http://example.org"} }], "schema:url": {"@id": "http://example.com/table.csv"} }
29
{ "encoding": "utf-8", "lineTerminators": ["\r\n", "\n"], "quoteChar": "\"", "doubleQuote": true, "skipRows": 0, "commentPrefix": "#", "header": true, "headerRowCount": 1, "delimiter": ",", "skipColumns": 0, "skipBlankRows": false, "skipInitialSpace": false, "trim": false }
template
script format
target format
transformation
30
{ "@context": "http://www.w3.org/ns/csvw", "url": "countries.csv", "transformations": [{ "targetFormat": "http://www.iana.org/ assignments/media-types/application/xml", "titles": "Simple XML version", "url": "xml-template.mustache", "scriptFormat": "https://mustache.github.io/", "source": "json" }] }
integer long int byte short nonNegativeInteger nonPositiveInteger negativeInteger positiveInteger unsignedLong unsignedInt unsignedShort unsignedByte anyURI base64Binary (binary) boolean date dateTime (datetime) dateTimeStamp double (number) decimal duration dayTimeDuration yearMonthDuration float gDay gMonth gMonthDay gYear gYearMonth hexBinary QName string time anyAtomicType (any) normalizedString token language Name xml json html NMTOKEN
From XML Schema Pt. 2
Datatype Aliases number double binary base64Binary datetime dateTime any anyAtomicType xml rdf:XMLLiteral html rdf:HTML json csvw:JSON
32
{ "titles": "name (en)", "datatype": { "base": "string", "minLength": "3", "maxLength": "128" } } { "titles": "country", "datatype": { "dc:title": "Country Code", "dc:description": "Country codes as specified in ISO 3166.", "base": "string", "format": "[a-z]{2}" } } { "titles": "latitude", "datatype": { "base": "number", "minimum": "-90", "maximum": "90" } }
33
{ "titles": "latitude", "datatype": { "base": "number", "minimum": "-90", "maximum": "90", "format": "#0.000000##" } } "datatype": { "base": "integer", "format": { "decimalChar": ",", "groupChar": " ", "pattern": "# ##0,0#" } } { "titles": "latitude", "datatype": { "base": "number", "minimum": "-90", "maximum": "90", "format": "#0.000000##" } }
34
"datatype": { "base": "boolean", "format": "Yes|No" }
35
"datatype": { "base": "date", "minimum": "2000-01-01", "format": "dd/MM/yyyy" }
similar conventions
data model, with any available serialization
but no champion emerged to define such a serialization.
provenance information from
using “minimal” mode.
36