Oct 13, 2011
 

With the release of DDI version 3 (DDI Lifecycle), an effort was made to allow reuse and linkages throughout the content model. This created a rich model that allows metadata items to be reused and harmonized through referencing. When using the DDI 3 Addin for Colectica Repository, all of these relationships between metadata items are indexed, which enables the rich interlinking of items seen on Colectica Web. With all of this relationship information, wouldn’t it be nice to execute arbitrary queries about the relationships? Colectica Repository already offers relationship and set based searching for registered metadata items, but a more powerful interface has now arrived.

Colectica Repository RDF Services

SPARQL is a query language for RDF data, standardized by the W3C. It allows searching based on the relationships and literal data stored in an RDF graph or store. Colectica Repository now offers a new Addin that lets you query DDI 3 as RDF through a SPARQL endpoint, both on Colectica Web and as a web service on the Repository! In addition, each DDI 3 item stored in the Colectica Repository can be downloaded in RDF as a Concise Bounded Description (CBD).
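A Concise Bounded Description, as defined in the W3C member submission, is built by taking every triple whose subject is the resource and then recursively adding the triples of any blank node that appears as an object. Here is a minimal sketch of that algorithm over an in-memory list of triples. The tuple representation and the `_:` prefix convention for blank nodes are assumptions for illustration, not how Colectica Repository stores RDF, and the reification statements that the full CBD definition also includes are left out.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def is_blank(term: str) -> bool:
    # Assumption: blank nodes are plain strings starting with "_:".
    return term.startswith("_:")


def concise_bounded_description(graph: List[Triple], resource: str) -> List[Triple]:
    """Return the CBD of `resource`: its own triples plus, recursively,
    the triples of every blank node reached as an object.
    Reified statements (part of the full CBD definition) are ignored."""
    result: List[Triple] = []
    visited: Set[str] = set()
    pending = [resource]
    while pending:
        subject = pending.pop()
        if subject in visited:
            continue  # guards against cycles between blank nodes
        visited.add(subject)
        for s, p, o in graph:
            if s == subject:
                result.append((s, p, o))
                if is_blank(o) and o not in visited:
                    pending.append(o)
    return result


if __name__ == "__main__":
    q6 = "http://data.colectica.com/item/us.colectica/ba540279-8bfc-461c-a07b-d25493c648a7/31"
    scheme = "http://data.colectica.com/item/us.colectica/e2648604-e1a5-4a75-8b0e-52a5c13dd89c/13"
    graph = [
        (q6, "urn:ddirdf:HasCodeDomain", "_:autos1"),
        (q6, "urn:ddirdf:HasCodeSet", scheme),
        ("_:autos1", "urn:ddirdf:HasCodeScheme", scheme),
        # Hypothetical triple about the code scheme itself: it is a named
        # resource, not a blank node, so it is not part of Q6's CBD.
        (scheme, "http://www.w3.org/2000/01/rdf-schema#label", "(hypothetical label)"),
    ]
    for triple in concise_bounded_description(graph, q6):
        print(triple)
```

Stopping at named resources is what keeps the description "bounded": the code scheme is linked by IRI, and a client can fetch its own CBD separately.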

How are the RDF Services implemented?

The RDF Services are a new optional component for Colectica Repository. Colectica Repository has many extension points built with the Microsoft Managed Extensibility Framework (MEF), which allows custom Addins to be created and deployed by dropping a new assembly into the Addins folder. The DDI 3 Addin uses the Item Format extension point, and many customers are already familiar with it. Another extension point is the Post Commit Hook. The RDF Services are implemented as a post commit hook: after an item is committed, they use MEF to find RDF serializers for the item and then store the RDF serialization of the DDI 3 in the Colectica Repository.
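To make that flow concrete, here is a minimal Python sketch of the pattern: serializers register for an item type, and a post commit hook looks up a matching serializer and stores its output keyed by the committed item. MEF is a .NET technology, so a plain dictionary registry stands in for MEF discovery here, and every class and function name below is hypothetical rather than part of the Colectica Repository API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Item:
    # Hypothetical stand-in for a committed repository item.
    agency: str
    identifier: str
    version: int
    item_type: str


Serializer = Callable[[Item], str]

# Hypothetical registry standing in for MEF discovery: item type -> serializer.
SERIALIZERS: Dict[str, Serializer] = {}


def rdf_serializer(item_type: str):
    """Decorator that registers a serializer, loosely analogous to an MEF export."""
    def register(func: Serializer) -> Serializer:
        SERIALIZERS[item_type] = func
        return func
    return register


@rdf_serializer("Question")
def serialize_question(item: Item) -> str:
    # Emits a single N-Triples statement typing the item (illustrative only).
    iri = f"http://data.colectica.com/item/{item.agency}/{item.identifier}/{item.version}"
    return f"<{iri}> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <urn:ddirdf:type:Question> ."


class RdfPostCommitHook:
    """Hypothetical hook: runs after items are committed and stores their RDF."""

    def __init__(self) -> None:
        self.rdf_store: Dict[str, str] = {}

    def on_commit(self, items: List[Item]) -> None:
        for item in items:
            serializer = SERIALIZERS.get(item.item_type)
            if serializer is None:
                continue  # no serializer registered for this type; nothing to store
            key = f"{item.agency}/{item.identifier}/{item.version}"
            self.rdf_store[key] = serializer(item)
```

In this sketch the hook itself knows nothing about DDI: supporting another item type means registering another serializer, which mirrors how dropping a new assembly into the Addins folder extends the Repository.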

RDF Examples

I will show some examples from the US 2010 Census sample DDI 3 file. The following is the RDF serialization of question 6 as a CBD; the question has a coded classification and question text in several languages.

@base <http://data.colectica.com/item/us.colectica/ba540279-8bfc-461c-a07b-d25493c648a7/31>.
 
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
@prefix ddi: <urn:ddirdf:>.
@prefix ddit: <urn:ddirdf:type:>.
 
_:autos1 a ddit:CodeDomain;
         ddi:HasCodeScheme <http://data.colectica.com/item/us.colectica/e2648604-e1a5-4a75-8b0e-52a5c13dd89c/13>;
         ddi:ResponseDomainBlankIsMissingValue false.

<http://data.colectica.com/item/us.colectica/ba540279-8bfc-461c-a07b-d25493c648a7/31> <http://purl.org/dc/elements/1.1/title> "Q6"@en-us;
    a ddit:Question;
    ddi:AgencyId "us.colectica"^^xsd:string;
    ddi:EstimatedTime "PT0S"^^xsd:dayTimeDuration;
    ddi:HasCodeDomain _:autos1;
    ddi:HasCodeSet <http://data.colectica.com/item/us.colectica/e2648604-e1a5-4a75-8b0e-52a5c13dd89c/13>;
    ddi:Id "ba540279-8bfc-461c-a07b-d25493c648a7"^^xsd:string;
    ddi:QuestionIntent "Asked since 1790. Census data about sex are important because many federal programs must differentiate between males and females for funding, implementing and evaluating their programs. For instance, laws promoting equal employment opportunity for women require census data on sex. Also, sociologists, economists, and other researchers who analyze social and economic trends use the data."@en-us;
    ddi:QuestionText "¿Cuál es el sexo de la Persona 1?"@es,
          "Jinsia ya Mtu wa 1 ni ipi?"@sw,
          "Quel est le sexe de la Personne 1 ?"@fr,
          "Seksi i Personit 1?"@sq,
          "Was ist das Geschlecht von Person 1?"@de,
          "What is Person {PersonCounter}'s sex?"@en-us;
    ddi:UserId "Colectica:UserAssignedId:Q6"^^xsd:string;
    ddi:Version 31 ;
    ddi:VersionDate "2011-02-09T10:11:14"^^xsd:dateTime;
    ddi:VersionRationale "Publishing study"@en-us.
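The multilingual question text above is simply several language-tagged literals on one predicate. As a rough sketch of how such values could be written out, the following Python function emits them as N-Triples lines, applying the string escapes N-Triples requires and appending each language tag. The function names and the dictionary input are assumptions for illustration, and the output assumes RDF 1.1 N-Triples, which allows UTF-8 text such as "¿Cuál" directly.

```python
from typing import Dict, List


def escape_literal(text: str) -> str:
    # N-Triples string escapes for backslash, double quote, and line breaks.
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def language_tagged_triples(subject: str, predicate: str,
                            texts: Dict[str, str]) -> List[str]:
    """Emit one N-Triples line per (language tag, text) pair, sorted by tag."""
    return [
        f'<{subject}> <{predicate}> "{escape_literal(text)}"@{lang} .'
        for lang, text in sorted(texts.items())
    ]


if __name__ == "__main__":
    q6 = "http://data.colectica.com/item/us.colectica/ba540279-8bfc-461c-a07b-d25493c648a7/31"
    for line in language_tagged_triples(q6, "urn:ddirdf:QuestionText", {
        "en-us": "What is Person {PersonCounter}'s sex?",
        "fr": "Quel est le sexe de la Personne 1 ?",
        "de": "Was ist das Geschlecht von Person 1?",
    }):
        print(line)
```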


Question 5 shows the use of multiple response domains.

@base <http://data.colectica.com/item/us.colectica/6fc6291a-b9c1-4698-83f8-3983c2ec8cb4/28>.
 
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
@prefix dc: <http://purl.org/dc/elements/1.1/>.
@prefix ddi: <urn:ddirdf:>.
@prefix ddit: <urn:ddirdf:type:>.

_:autos1 dc:description "Apellido"@es,
             "Last Name"@en-us,
             "Jina la Mwisho"@sw,
             "Mbiemri"@sq,
             "Nachname"@de,
             "Nom"@fr;
         dc:title "Apellido"@es,
             "Last Name"@en-us,
             "Jina la Mwisho"@sw,
             "Mbiemri"@sq,
             "Nachname"@de,
             "Nom"@fr;
         a ddit:TextDomain;
         ddi:ResponseDomainBlankIsMissingValue false.

_:autos2 dc:description "Emri"@sq,
             "Jina la kwanza"@sw,
             "First Name"@en-us,
             "Nombre"@es,
             "Prénom"@fr,
             "Vorname"@de;
         dc:title "Emri"@sq,
             "Jina la kwanza"@sw,
             "First Name"@en-us,
             "Nombre"@es,
             "Prénom"@fr,
             "Vorname"@de;
         a ddit:TextDomain;
         ddi:ResponseDomainBlankIsMissingValue false.

_:autos3 dc:description "Emri i dytë"@sq,
             "Herufi ya Kati"@sw,
             "Inicial"@es,
             "Initiale 2e prénom"@fr,
             "MI"@en-us,
             "Mittl.Init."@de;
         dc:title "Emri i dytë"@sq,
             "Herufi ya Kati"@sw,
             "Inicial"@es,
             "Initiale 2e prénom"@fr,
             "MI"@en-us,
             "Mittl.Init."@de;
         a ddit:TextDomain;
         ddi:ResponseDomainBlankIsMissingValue false.

<http://data.colectica.com/item/us.colectica/6fc6291a-b9c1-4698-83f8-3983c2ec8cb4/28> dc:title "Q5"@en-us;
    a ddit:Question;
    ddi:AgencyId "us.colectica"^^xsd:string;
    ddi:EstimatedTime "PT0S"^^xsd:dayTimeDuration;
    ddi:HasTextDomain _:autos1,
                   _:autos2,
                   _:autos3;
    ddi:Id "6fc6291a-b9c1-4698-83f8-3983c2ec8cb4"^^xsd:string;
    ddi:QuestionIntent "Listing the name of each person in the household helps the respondent to include all members, particularly in large households where a respondent may forget who was counted and who was not. Also, names are needed if additional information about an individual must be obtained to complete the census form. Federal law protects the confidentiality of personal information, including names."@en-us;
    ddi:QuestionText "¿Cuál es el nombre de la Persona 1?"@es,
                  "Jina la Mtu wa 1 ni lipi?"@sw,
                  "Quel est le nom de la Personne 1 ?"@fr,
                  "Si quhet Personi 1?"@sq,
                  "What is Person {PersonCounter}'s name?"@en-us,
                  "Wie lautet der Name von Person 1?"@de;
    ddi:UserId "Colectica:UserAssignedId:Q5"^^xsd:string;
    ddi:Version 28 ;
    ddi:VersionDate "2011-02-09T10:11:14"^^xsd:dateTime;
    ddi:VersionRationale "Publishing study"@en-us.
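Because each text domain carries its labels in several languages, a client displaying question 5 typically has to choose one label per domain. A minimal sketch of that choice follows, assuming the labels have already been read into a dictionary keyed by language tag. It prefers an exact tag match, then a match on the primary subtag (so `en` finds `en-us`), and finally a default language. The function name and this fallback order are my assumptions, not part of the RDF Services.

```python
from typing import Dict, Optional


def pick_label(labels: Dict[str, str], preferred: str,
               default: str = "en-us") -> Optional[str]:
    """Choose the label for `preferred`, falling back by primary subtag, then `default`."""
    wanted = preferred.lower()
    by_tag = {tag.lower(): text for tag, text in labels.items()}
    if wanted in by_tag:
        return by_tag[wanted]
    primary = wanted.split("-")[0]
    for tag in sorted(by_tag):
        if tag.split("-")[0] == primary:
            return by_tag[tag]
    return by_tag.get(default.lower())


if __name__ == "__main__":
    # Labels of the middle-initial text domain (_:autos3) from question 5.
    middle_initial = {
        "sq": "Emri i dytë", "sw": "Herufi ya Kati", "es": "Inicial",
        "fr": "Initiale 2e prénom", "en-us": "MI", "de": "Mittl.Init.",
    }
    print(pick_label(middle_initial, "de"))     # Mittl.Init.
    print(pick_label(middle_initial, "en-GB"))  # MI (primary subtag "en")
    print(pick_label(middle_initial, "ja"))     # MI (default en-us)
```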

SPARQL Examples

Let's take a look at some SPARQL queries that I can run across the DDI 3 RDF stored in the Colectica Repository. The first one looks for studies that I have created since January 2010.

PREFIX ddi: <urn:ddirdf:>
PREFIX ddit: <urn:ddirdf:type:>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
 
SELECT ?study
WHERE
{
    ?study a ddit:StudyUnit;
    dc:date ?creation_date;
    dc:creator <http://dan.smith.name/who#dan>.
    # xsd:dateTime literals need a "T" between the date and the time
    FILTER ( xsd:dateTime(?creation_date) >= "2010-01-01T00:00:00"^^xsd:dateTime )
}
ORDER BY ?study
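Queries like this can also be sent to a SPARQL endpoint over HTTP. The W3C SPARQL Protocol allows the query to be passed in the `query` parameter of a GET request, and an `Accept` header of `application/sparql-results+json` asks for JSON results. The sketch below only builds such a request with Python's standard library; the endpoint URL is a placeholder, since the actual address of the SPARQL service depends on your installation.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder address; substitute the SPARQL endpoint of your own installation.
ENDPOINT = "http://example.org/sparql"

QUERY = """PREFIX ddit: <urn:ddirdf:type:>
SELECT ?study WHERE { ?study a ddit:StudyUnit } ORDER BY ?study"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


if __name__ == "__main__":
    request = build_sparql_request(ENDPOINT, QUERY)
    print(request.get_method(), request.full_url)
    # To execute it against a live endpoint: urllib.request.urlopen(request).read()
```

GET keeps the request easy to cache and bookmark; very long queries may need the protocol's POST form instead.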

This second query counts how many datasets reference each variable, that is, how many times a variable has been reused or harmonized across datasets.

PREFIX ddi: <urn:ddirdf:>
PREFIX ddit: <urn:ddirdf:type:>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
 
SELECT ?variable (COUNT(?parent) AS ?c)
WHERE
{
    ?parent ddi:HasVariable ?variable .
    ?parent a ddit:Dataset
}
GROUP BY ?variable
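When JSON results are requested, the reuse counts come back in the W3C SPARQL Query Results JSON Format, where each row in `results.bindings` maps a variable name to an object with `type` and `value` keys. Here is a minimal sketch for turning such a response into a Python dictionary. The sample response in the code is invented to show the shape of the format and is not output from the census file.

```python
import json
from typing import Dict

# Illustrative response shape only; the IRIs and counts are invented.
SAMPLE_RESPONSE = """{
  "head": {"vars": ["variable", "c"]},
  "results": {"bindings": [
    {"variable": {"type": "uri", "value": "http://example.org/variable/1"},
     "c": {"type": "literal", "datatype": "http://www.w3.org/2001/XMLSchema#integer", "value": "3"}},
    {"variable": {"type": "uri", "value": "http://example.org/variable/2"},
     "c": {"type": "literal", "datatype": "http://www.w3.org/2001/XMLSchema#integer", "value": "1"}}
  ]}
}"""


def reuse_counts(response_text: str) -> Dict[str, int]:
    """Map each variable IRI to the number of datasets that reference it."""
    data = json.loads(response_text)
    counts: Dict[str, int] = {}
    for row in data["results"]["bindings"]:
        # Literal values are always strings in this format, so convert explicitly.
        counts[row["variable"]["value"]] = int(row["c"]["value"])
    return counts


if __name__ == "__main__":
    ranked = sorted(reuse_counts(SAMPLE_RESPONSE).items(),
                    key=lambda kv: kv[1], reverse=True)
    for variable, count in ranked:
        print(count, variable)
```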

Next Steps

Colectica Repository DDI 3 RDF Services is now available as a community technology preview to interested customers. Please keep the following in mind while using it:

  • There is not yet an official DDI RDF vocabulary. The namespace and names of elements may change in the future.
  • Several external vocabularies are also currently used. These include rdf, rdfs, dc, dcterms, owl, xsd, and foaf.
  • Per-item access control lists (ACLs) in the Colectica Repository are not yet enforced by the SPARQL interface. This means that all metadata items are readable by all users, so deploy accordingly.
  • SPARQL UPDATE is disabled to maintain consistency with the versioned metadata items in the Repository.
  • You can set the location of your Colectica Web installation in the RDF Services so that the generated item URLs resolve in a browser. This lets users see a web-based view of the information.
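The item URLs in the examples above follow the pattern `<base>/item/<agency>/<identifier>/<version>`. Assuming that pattern holds (it is inferred from the example URLs, not from documentation), here is a sketch of how a configured Colectica Web location could turn an item reference into a resolvable URL. The helper name is hypothetical.

```python
from urllib.parse import quote


def item_url(web_base: str, agency: str, identifier: str, version: int) -> str:
    """Build an item URL under a configurable Colectica Web location.

    Hypothetical helper; the path layout is inferred from the example URLs.
    """
    base = web_base.rstrip("/")  # tolerate a trailing slash in the setting
    return f"{base}/item/{quote(agency)}/{quote(identifier)}/{version}"


if __name__ == "__main__":
    print(item_url("http://data.colectica.com/", "us.colectica",
                   "ba540279-8bfc-461c-a07b-d25493c648a7", 31))
```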

Feedback is welcome!

Dan