Aug 29 2013

The changes from the DDI 3.2 public review have been entered into the source repository, and the final review of the changes is now taking place. A main focus of version 3.2 is consistency and usability, and the Technical Committee came up with a list of design and content guidelines to ensure this. This focus on consistency should allow users and developers to more quickly adopt DDI Lifecycle since all the content areas should now be programmatically usable in the same ways.

Check out an example report on the current DDI 3.2 development schemas

During our review of 3.2, we created a tool to point out items in the DDI schema set that do not conform to these consistency guidelines. The tool analyzes the schemas and creates an HTML report of items that should be addressed before release. It currently performs the following checks.

  • Validate schema set is DDI Lifecycle.
  • Check compilation of the schema as an XML Schema Set.
  • Versionables and Maintainables allowing inline or reference usage.
    • Versionables and Maintainables are in an xs:choice.
    • Versionables and Maintainables in an xs:choice contain two elements.
    • Versionables and Maintainables in an xs:choice contain a xxxReference.
  • FragmentInstance contains all Versionables and Maintainables.
  • Type of Object for references
    • Duplicate Element names detected for referenceable types.
    • Element names detected without a TypeOfObject defined.
  • Spell checking
    • Element names
    • Attribute names
    • XSD annotations/documentation
    • Breaking apart CamelCasedWords
    • Allows words to be added to dictionary
    • Uses en-US
    • Highlighting of misspellings in generated reports.
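
The inline-or-reference checks in the list above could be sketched roughly as follows. This is not the tool's actual code: the `ChoiceCheck` class, the sample schema fragment, and the rule that a reference element's name ends in `Reference` are assumptions made for illustration. As a simplification, the sketch inspects every xs:choice rather than only those that hold Versionables and Maintainables.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ChoiceCheck
{
    static readonly XNamespace Xs = "http://www.w3.org/2001/XMLSchema";

    // Returns one message per xs:choice that does not hold exactly two
    // elements, or that holds two but none named like a xxxReference.
    public static List<string> FindProblems(XDocument schema)
    {
        var problems = new List<string>();
        foreach (XElement choice in schema.Descendants(Xs + "choice"))
        {
            // An xs:element names its target with either ref= or name=.
            List<string> names = choice.Elements(Xs + "element")
                .Select(e => (string)e.Attribute("ref") ?? (string)e.Attribute("name") ?? "")
                .ToList();

            if (names.Count != 2)
                problems.Add("choice has " + names.Count + " element(s): " + string.Join(", ", names));
            else if (!names.Any(n => n.EndsWith("Reference", StringComparison.Ordinal)))
                problems.Add("choice has no xxxReference: " + string.Join(", ", names));
        }
        return problems;
    }

    public static void Main()
    {
        // Hypothetical fragment: the first choice conforms, the second does not.
        const string xsd = @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:complexType name='ExampleType'>
    <xs:sequence>
      <xs:choice>
        <xs:element ref='Concept'/>
        <xs:element ref='ConceptReference'/>
      </xs:choice>
      <xs:choice>
        <xs:element ref='Variable'/>
      </xs:choice>
    </xs:sequence>
  </xs:complexType>
</xs:schema>";

        foreach (string problem in FindProblems(XDocument.Parse(xsd)))
            Console.WriteLine(problem);
    }
}
```

Returning a list of messages rather than throwing on the first problem fits a report-style tool, where every nonconforming item should be listed in one pass.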

In addition to checking the structures in the schema, the tool also spell checks all elements, attributes, and inline documentation to make sure that the released DDI has a professional feel. You can see an example report on the current DDI 3.2 schemas’ progress towards the consistency goals!
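
To illustrate how CamelCased names can be broken apart before spell checking, here is a minimal sketch. It is an assumption about one reasonable approach, not the tool's implementation: the `CamelCaseSplitter` class is hypothetical, and a tiny in-memory word list stands in for a real en-US dictionary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class CamelCaseSplitter
{
    // Split before a capital that follows a lowercase letter or digit, and
    // before the last capital of an acronym that is followed by a lowercase letter.
    static readonly Regex Boundary = new Regex(
        @"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])");

    public static string[] Split(string name)
    {
        return Boundary.Split(name).Where(w => w.Length > 0).ToArray();
    }

    // Returns the words of a name that are not found in the dictionary.
    public static IEnumerable<string> Misspelled(string name, ISet<string> dictionary)
    {
        return Split(name).Where(w => !dictionary.Contains(w));
    }

    public static void Main()
    {
        // Stand-in word list; a real checker would load a full dictionary.
        var dictionary = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "type", "of", "object", "fragment", "instance"
        };

        foreach (string name in new[] { "TypeOfObject", "FragmentInstnace" })
        {
            Console.WriteLine(name + ": " + string.Join(", ", Misspelled(name, dictionary)));
        }
    }
}
```

The second alternative in the pattern keeps acronyms together, so a name such as `DDIInstance` is checked as `DDI` and `Instance` rather than letter by letter.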

We have licensed the tool as Open Source under the LGPL and the code is available for download and forking on GitHub.

There is also a release of the compiled tool on the releases page. Please email the DDI users list, send a tweet, or send us pull requests with any additional tests that you would like to see incorporated.

Mar 10 2012

I received a followup question to my post about registering 11179 items in the Colectica Repository. This question involves working with the Colectica SDK and its DDI model in conjunction with the Repository.

How do I connect to the Repository and retrieve a DdiInstance, such as the YourDdiInstance() method in your previous post?

First we will create the repository client. In this example we will use the built-in Active Directory authentication and send the credentials of the user running the program (the user who asked the question uses Active Directory authentication and roles). Notice that the username and password are not specified as they were in the example from my previous post.

// Create the web services client
var client = new WcfRepositoryClient("localhost", 19893);

If we know the item’s identification, we can retrieve the item. If not, we can perform a search on the repository. The basic GetItem has many variations with different processing options, retrieving item lists, and sets of relationships. The simple GetItem and GetLatestItem are shown below.

// Get an item by 11179 identifier
IVersionable item = client.GetItem(id, agency, version);

// Or get the latest version
item = client.GetLatestItem(id, agency);

DdiInstance instance = item as DdiInstance;

To make it extremely easy to work with DDI items in the Colectica Repository, we will wrap this client with additional methods using the DdiClient. This also avoids the type checking and casting if you want to access properties of the DdiInstance not present on IVersionable. Each DDI item type has a method similar to the one shown below.

// Wrap the web services client
DdiClient ddiClient = new DdiClient(client);

// Get the Ddi Instance
DdiInstance instance = ddiClient.GetDdiInstance(
  id, agency, version, ChildReferenceProcessing.Instantiate);

The client calls allow controlling how child items are populated. If we have an unpopulated DdiInstance, we can use a similar method call to fill it with data and find its children.

// An unpopulated item with its identification. Child items
// may come back from the client as unpopulated depending on
// the child processing that is selected. Here is an example of
// how to populate such an item with the client.

DdiInstance instance = new DdiInstance()
{
  Identifier = id,
  AgencyId = agency,
  Version = version,
  IsPopulated = false
};

// Populate the Ddi Instance
// (assumes the client exposes a PopulateItem method)
client.PopulateItem(instance,
  false, ChildReferenceProcessing.Instantiate);

// Or as shown in Update 1 of my other post, populate the entire
// item hierarchy from the Repository
// (assumes the SDK's visitor-style Accept method to run the populator)
GraphPopulator populator = new GraphPopulator(client);
instance.Accept(populator);

// Do something with the instance
foreach(StudyUnit study in instance.StudyUnits)
{
  // ...
}


Oct 13 2011

With the release of DDI version 3 (DDI Lifecycle), an effort was made to allow reuse and linkages throughout the content model. This created a rich model that allows for reuse and harmonization of metadata items through the use of referencing. When using the DDI 3 Addin for Colectica Repository, all of these relationships between metadata items are indexed and allow for the rich interlinkages of items as seen on Colectica Web. With all of this relationship information, wouldn’t it be nice to execute arbitrary queries about the relationships? Colectica Repository already offers relationship and set based searching for registered metadata items, but a more powerful interface has now arrived.

Colectica Repository RDF Services

SPARQL is a query language created for searching RDF data and is standardized by the W3C. It allows for searching based on the relationships and literal data stored in an RDF graph or store. Colectica Repository now offers a new Addin with the ability to query DDI 3 as RDF using a SPARQL endpoint on Colectica Web and from the Repository with a web service! In addition, each DDI 3 item stored in the Colectica Repository can be downloaded in RDF using a Concise Bounded Description.
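
For readers who have not called a SPARQL endpoint before, the sketch below shows how a query is typically sent over HTTP under the W3C SPARQL Protocol: the query text travels in a URL-encoded `query` parameter, and the `Accept` header asks for a result format. The endpoint address is a placeholder rather than a real Colectica URL, and the `SparqlRequest` helper is hypothetical.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;

public static class SparqlRequest
{
    // Build a GET request that carries a SPARQL query in the "query" parameter.
    public static HttpRequestMessage Build(string endpoint, string query)
    {
        string uri = endpoint + "?query=" + Uri.EscapeDataString(query);
        var request = new HttpRequestMessage(HttpMethod.Get, uri);

        // Ask for the standard JSON results format.
        request.Headers.Accept.Add(
            new MediaTypeWithQualityHeaderValue("application/sparql-results+json"));
        return request;
    }

    public static void Main()
    {
        // Placeholder endpoint; substitute the address of your own installation.
        HttpRequestMessage request = Build(
            "http://localhost/sparql",
            "SELECT ?s WHERE { ?s a <urn:ddirdf:type:Question> } LIMIT 10");

        Console.WriteLine(request.Method + " " + request.RequestUri.AbsoluteUri);
        // To execute it, pass the request to HttpClient.SendAsync.
    }
}
```

Building the request separately from sending it keeps the example runnable without a live server.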

How are the RDF Services implemented?

The RDF Services are a new optional component for Colectica Repository. Colectica Repository has many extension points created with the help of the Microsoft Managed Extensibility Framework (MEF). This allows custom Addins to be created and deployed by dropping a new assembly into the Addins folder. The DDI 3 Addin uses the Item Format extension point and many customers are already familiar with it. Another extension point is the Post Commit Hook. The RDF Services are implemented as a post commit hook that queries for RDF serializers for the item using MEF. They then store the RDF serialization of the DDI 3 item in the Colectica Repository.
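
The general MEF pattern described here, a hook that discovers serializers through composition, might look like the following sketch. Only the MEF types (`Export`, `ImportMany`, `AssemblyCatalog`, `CompositionContainer`) are real. The `IRdfSerializer` contract, the `RdfPostCommitHook` class, and their members are invented for illustration and are not the Repository's actual extension interfaces. On current .NET the code needs the `System.ComponentModel.Composition` package.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel.Composition;
using System.ComponentModel.Composition.Hosting;
using System.Linq;
using System.Reflection;

// Hypothetical contract for something that turns an item into RDF.
public interface IRdfSerializer
{
    bool CanSerialize(string itemType);
    string Serialize(string itemXml);
}

// MEF discovers this class because of the Export attribute.
[Export(typeof(IRdfSerializer))]
public class QuestionRdfSerializer : IRdfSerializer
{
    public bool CanSerialize(string itemType) { return itemType == "Question"; }
    public string Serialize(string itemXml) { return "# RDF for a question would be produced here"; }
}

public class RdfPostCommitHook
{
    // MEF fills this with every exported IRdfSerializer it can find.
    [ImportMany]
    public IEnumerable<IRdfSerializer> Serializers { get; set; }

    // Called after an item is committed; returns the RDF to store, or null.
    public string OnCommitted(string itemType, string itemXml)
    {
        IRdfSerializer serializer = Serializers.FirstOrDefault(s => s.CanSerialize(itemType));
        return serializer == null ? null : serializer.Serialize(itemXml);
    }
}

public static class Program
{
    public static void Main()
    {
        // A drop-in Addins folder would more likely be read with a DirectoryCatalog.
        var catalog = new AssemblyCatalog(Assembly.GetExecutingAssembly());
        using (var container = new CompositionContainer(catalog))
        {
            var hook = new RdfPostCommitHook();
            container.ComposeParts(hook);
            Console.WriteLine(hook.OnCommitted("Question", "<QuestionItem/>"));
        }
    }
}
```

Because the hook only depends on the contract, new serializers can be added by exporting another implementation without changing the hook itself.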

RDF Examples

I will show some examples from the US 2010 Census sample DDI 3 file. The following is the RDF serialization of question 6 as a CBD; the question has a coded classification and question text in several languages.

@base <>.
@prefix rdf: <>.
@prefix rdfs: <>.
@prefix xsd: <>.
@prefix ddi: <urn:ddirdf:>.
@prefix ddit: <urn:ddirdf:type:>.
_:autos1 a <ddit:CodeDomain>;
         <ddi:HasCodeScheme> <>;
         <ddi:ResponseDomainBlankIsMissingValue> false.

<> <> "Q6"@en-us;
    a <ddit:Question>;
    <ddi:AgencyId> "us.colectica"^^xsd:string;
    <ddi:EstimatedTime> "PT0S"^^xsd:dayTimeDuration;
    <ddi:HasCodeDomain> _:autos1;
    <ddi:HasCodeSet> <>;
    <ddi:Id> "ba540279-8bfc-461c-a07b-d25493c648a7"^^xsd:string;
    <ddi:QuestionIntent> "Asked since 1790. Census data about sex are important because many federal programs must differentiate between males and females for funding, implementing and evaluating their programs. For instance, laws promoting equal employment opportunity for women require census data on sex. Also, sociologists, economists, and other researchers who analyze social and economic trends use the data."@en-us;
    <ddi:QuestionText> "¿Cuál es el sexo de la Persona 1?"@es,
          "Jinsia ya Mtu wa 1 ni ipi?"@sw,
          "Quel est le sexe de la Personne 1 ?"@fr,
          "Seksi i Personit 1?"@sq,
          "Was ist das Geschlecht von Person 1?"@de,
          "What is Person {PersonCounter}'s sex?"@en-us;
    <ddi:UserId> "Colectica:UserAssignedId:Q6"^^xsd:string;
    <ddi:Version> 31 ;
    <ddi:VersionDate> "2011-02-09T10:11:14"^^xsd:dateTime;
    <ddi:VersionRationale> "Publishing study"@en-us.

Question 5 shows the use of multiple response domains.

@base <>.
@prefix rdf: <>.
@prefix rdfs: <>.
@prefix xsd: <>.
@prefix dc: <>.
@prefix ddi: <urn:ddirdf:>.
@prefix ddit: <urn:ddirdf:type:>.

_:autos1 <dc:description> "Apellido"@es,
           "First Name"@en-us,
           "Jina la Mwisho"@sw;
         <dc:title> "Apellido"@es,
           "First Name"@en-us,
           "Jina la Mwisho"@sw;
         a <ddit:TextDomain>;
         <ddi:ResponseDomainBlankIsMissingValue> false.

_:autos2 <dc:description> "Emri"@sq,
           "Jina la kwanza"@sw,
           "Last Name"@en-us;
         <dc:title> "Emri"@sq,
           "Jina la kwanza"@sw,
           "Last Name"@en-us;
         a <ddit:TextDomain>;
         <ddi:ResponseDomainBlankIsMissingValue> false.

_:autos3 <dc:description> "Emri i dytë"@sq,
           "Herufi ya Kati"@sw,
           "Initiale 2e prénom"@fr;
         <dc:title> "Emri i dytë"@sq,
           "Herufi ya Kati"@sw,
           "Initiale 2e prénom"@fr;
         a <ddit:TextDomain>;
         <ddi:ResponseDomainBlankIsMissingValue> false.

<> <dc:title> "Q5"@en-us;
    a <ddit:Question>;
    <ddi:AgencyId> "us.colectica"^^xsd:string;
    <ddi:EstimatedTime> "PT0S"^^xsd:dayTimeDuration;
    <ddi:HasTextDomain> _:autos1, _:autos2, _:autos3;
    <ddi:Id> "6fc6291a-b9c1-4698-83f8-3983c2ec8cb4"^^xsd:string;
    <ddi:QuestionIntent> "Listing the name of each person in the household helps the respondent to include all members, particularly in large households where a respondent may forget who was counted and who was not. Also, names are needed if additional information about an individual must be obtained to complete the census form. Federal law protects the confidentiality of personal information, including names."@en-us;
    <ddi:QuestionText> "¿Cuál es el nombre de la Persona 1?"@es,
                  "Jina la Mtu wa 1 ni lipi?"@sw,
                  "Quel est le nom de la Personne 1 ?"@fr,
                  "Si quhet Personi 1?"@sq,
                  "What is Person {PersonCounter}'s name?"@en-us,
                  "Wie lautet der Name von Person 1?"@de;
    <ddi:UserId> "Colectica:UserAssignedId:Q5"^^xsd:string;
    <ddi:Version> 28 ;
    <ddi:VersionDate> "2011-02-09T10:11:14"^^xsd:dateTime;
    <ddi:VersionRationale> "Publishing study"@en-us.

SPARQL Examples

Let’s take a look at some SPARQL queries that I can run across the DDI 3 RDF stored in the Colectica Repository. The first one looks for studies that I have created since January 2010.

PREFIX ddi: <urn:ddirdf:>
PREFIX ddit: <urn:ddirdf:type:>
PREFIX dc: <>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?study
WHERE {
    ?study a ddit:StudyUnit;
        dc:date ?creation_date;
        dc:creator <>.
    FILTER ( xsd:dateTime(?creation_date) > "2010-01-01T00:00:00"^^xsd:dateTime ) .
}
ORDER BY ?study

This second query will give us a count of how many times a variable has been reused/harmonized across datasets.

PREFIX ddi: <urn:ddirdf:>
PREFIX ddit: <urn:ddirdf:type:>
PREFIX dc: <>
SELECT ?variable (COUNT(?parent) AS ?c)
WHERE {
    ?parent ddi:HasVariable ?variable .
    ?parent a ddit:Dataset .
}
GROUP BY ?variable

Next Steps

Colectica Repository DDI 3 RDF Services is now available as a community technology preview to interested customers. There are several items to take note of while using it:

  • There is not yet an official DDI RDF vocabulary. The namespace and names of elements may change in the future.
  • Several external vocabularies are also currently used. These include rdf, rdfs, dc, dcterms, owl, xsd, and foaf.
  • Per-item ACLs on metadata items in the Colectica Repository are not yet implemented in the SPARQL interface. This means that all metadata items are readable by all users. Deploy accordingly.
  • SPARQL UPDATE is disabled to maintain consistency with the versioned metadata items in the Repository.
  • You can set the location of your Colectica Web installation in the RDF Services so that the generated URLs for items are resolvable in the browser. This allows users to see a nice web based view of the information.

Feedback is welcome!


May 27 2011

We were recently at UW-Madison to visit with Dr. Barry T Radler and the MIDUS longitudinal survey. We are working with them to document their series of longitudinal studies in DDI 3 to enable the generation of very detailed and cross linked codebooks. During our meeting Dr. Radler mentioned that the way the search results on Colectica Web were listed could be improved, as they were currently ordered based solely on information specific to each individual item.

Colectica Web offers faceted searching of DDI items, and even searching within arbitrary sets such as a specific study, instance, package, scheme, etc. The question is how to return the results with the most relevant DDI 3 item listed first. DDI 3 allows for massive reuse of items through its referencing mechanisms. For example, the same concept can be used to describe multiple questions or the same code scheme can be the representation of many different variables. Colectica Repository tracks all of this extra relationship and contextual information about DDI items, so we decided to use it in the search rankings.

Introducing DDI 3 metadata ranking

The search results, shown for Gender above, no longer rank on the information from the DDI 3 item alone; they also take into account how often an item is reused and harmonized across waves of this longitudinal study.
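
One way to picture this kind of ranking is to boost each item's text relevance by a function of its reuse count. The sketch below is purely illustrative: the formula, the `SearchHit` type, and the sample numbers are assumptions, and the actual ranking used by Colectica is not described in this post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class SearchHit
{
    public string Name { get; set; }
    public double TextScore { get; set; }   // relevance from the item's own text
    public int ReuseCount { get; set; }     // how many other items reference this one
}

public static class ReuseRanking
{
    // Boost the text score logarithmically so that heavily reused items rise
    // without letting reuse swamp a poor text match.
    public static double Score(SearchHit hit)
    {
        return hit.TextScore * (1.0 + Math.Log(1 + hit.ReuseCount));
    }

    public static List<SearchHit> Rank(IEnumerable<SearchHit> hits)
    {
        return hits.OrderByDescending(h => Score(h)).ToList();
    }

    public static void Main()
    {
        // Made-up hits for a search on "Gender".
        var hits = new[]
        {
            new SearchHit { Name = "Gender (used once)", TextScore = 1.0, ReuseCount = 0 },
            new SearchHit { Name = "Gender (used in every wave)", TextScore = 0.8, ReuseCount = 12 }
        };

        foreach (SearchHit hit in Rank(hits))
            Console.WriteLine(hit.Name + "  " + Score(hit).ToString("F2"));
    }
}
```

With these sample values the widely reused item outranks the slightly better text match, which is the behavior the post describes.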

This is also a great new feature for users of Colectica Designer. When a user opens the item picker to create a reference, their search results will also list the most reused items more prominently. This will help users find the items that already have the most influence and increase the comparability of their published research. Please let me know what you think of the new search rankings or if you have any ideas for how they can be even further refined.

May 22 2011

Colectica Repository can store any metadata items that conform to the ISO 11179 naming scheme for registered items. The DDI 3 Addin for Colectica Repository additionally allows for indexing of contextual and relationship information. Here is a brief code example showing how DDI 3 items can be registered in the Colectica Repository.

First we will create some DDI 3 based metadata using the Colectica SDK. If you don’t have the SDK you can create the DDI 3 by hand or using your favorite XML library.

// Create a DDI 3 Concept using the Colectica SDK
Concept concept = new Concept() { AgencyId = "" };
concept.ItemName["en-US"] = "Given Name";
concept.Description["en-US"] = @"A character-string (e.g. `Billy' and `Peter')
        given to people as a first name (or, in most Western countries, as a
        middle name), usually shortly after birth.";

// Create a DDI 3 Question using the Colectica SDK
Question q1 = new Question() { AgencyId = "" };
q1.QuestionText["en-US"] = "What is your first name?";
TextDomain domain = new TextDomain();
domain.Label["en-US"] = "First Name";

// Link the question and concept

Then we will create the repository client, using the supplied credentials.

// Create the web services client
var client = new WcfRepositoryClient(
    "username", "password", "localhost", 19893);

We can register any object made by the Colectica SDK using the built-in mappings.

// Register a 11179 administered item using
// the Repository Client helper functions
client.RegisterItem(concept, new CommitOptions());

Alternatively, we can access the web services layer and construct the proper SOAP payload.

// Register a 11179 administered item using the Web Services directly
Collection<Note> notes = new Collection<Note>();
string serialization = q1.GetXmlRepresentation(notes).OuterXml;
RepositoryItem ri = new RepositoryItem()
{
    CompositeId = q1.CompositeId,   // agency, id, and version
    Item = serialization,           // item's serialization
    ItemType = q1.ItemType,         // model defined item type identifier
    IsDepricated = false,
    IsPublished = q1.IsPublished,
    IsProvisional = false,          // only used in the local repository
    Notes = notes,                  // notes about the item being registered
    VersionDate = q1.VersionDate,
    VersionRationale = q1.VersionRationale,
    VersionResponsibility = q1.VersionResponsibility
};
client.RegisterItem(ri, new CommitOptions());

As you can see, we added both the DDI concept and the DDI question item to the repository. The Colectica SDK has methods to gather all items that are linked and create sets of items to be registered. It also has the ability to detect changed items automatically, so a program can quickly determine which items should have new versions registered after a user action.

Update 1: Registering items in a DDI instance

Here is more information about registering items in the Repository based on some followup questions.

When adding a concept to a question, the hierarchical relationship is established. Is this inline, or by reference at the XML level?

The DDI 3 standard allows for either including items inline or by reference in many locations. Colectica Repository will process and store items using either format. If it is a DDI 3 item, the Repository will additionally index the text and relationship information about the item using the DDI 3 Addin. Note that only the item being registered and its relationships are processed; each item must still be registered individually or in a batch operation.

Colectica Designer will always use the referencing mechanisms in DDI when interacting with the Repository. This is for speed of processing and to allow the easiest sharing and harmonization of items across multiple Studies and Instances. You can learn more about how Designer determines item boundaries by reading about Concise Bounded Descriptions.

How can I register all items in a DDI Instance? How can I update only the changed items in a DDI instance?

If you are using DDI with an XML library, you can use XPath queries to find all the items in your Instance to register.
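
As one hedged illustration of this approach, the sketch below selects candidate items by XPath using `System.Xml.Linq`. The expression is an assumption rather than a query taken from this post: it relies on items carrying explicit `isVersionable` or `isMaintainable` attributes, and the sample document is a simplified, namespace-free stand-in for a real DDI instance. The `ItemFinder` class is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class ItemFinder
{
    // Assumed marker attributes; adjust to how your documents flag items.
    public const string ItemsXPath =
        "//*[@isVersionable='true' or @isMaintainable='true']";

    public static List<XElement> FindItems(XDocument instance)
    {
        return instance.XPathSelectElements(ItemsXPath).ToList();
    }

    public static void Main()
    {
        // Simplified stand-in for a DDI instance document.
        const string xml = @"<DDIInstance isMaintainable='true'>
  <ConceptScheme isMaintainable='true'>
    <Concept isVersionable='true'><Label>Given Name</Label></Concept>
  </ConceptScheme>
</DDIInstance>";

        foreach (XElement item in FindItems(XDocument.Parse(xml)))
            Console.WriteLine(item.Name.LocalName);
    }
}
```

Matching on attributes with a wildcard element test avoids having to register every DDI namespace prefix just to locate the items.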


You can then loop over the XML nodes returned by the XPath and register the results. If you are using the Colectica SDK, you can find all items in an Instance as follows:

// obtain your DDI Instance in some fashion
DdiInstance instance = YourDdiInstance();

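// Find all items
// (assumes the SDK's visitor-style Accept method to run the gatherer)
ItemGatherer gatherer = new ItemGatherer();
instance.Accept(gatherer);
Collection<IVersionable> allItems = gatherer.Items;

// You can also find only the modified items
DirtyItemGatherer dirtyGatherer = new DirtyItemGatherer();
instance.Accept(dirtyGatherer);
Collection<IVersionable> changedItems = dirtyGatherer.DirtyItems;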

How do I export a whole instance to DDI3?

There are several ways to export a DDI instance from the Repository. One way is to use the Repository’s command line tools to write an XML document. Another is to programmatically export a DDI instance using the SDK:

// obtain your DDI Instance in some fashion.
// Only the identification is needed since we will populate the instance
DdiInstance instance = YourDdiInstance();

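// populate the entire class hierarchy from the Repository
// (assumes the SDK's visitor-style Accept method to run the populator)
GraphPopulator populator = new GraphPopulator(client);
instance.Accept(populator);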

// Create the XML document for the DDI Instance
DDIWorkflowSerializer serializer = new DDIWorkflowSerializer();
XmlDocument doc = serializer.Serialize(instance);

A third option is to use an XML library and construct the DDI Instance.

May 19 2011

Colectica Repository is used as both a registry and resolution service for various pieces of identified metadata. Both Colectica Designer and Web communicate with it to perform all of the neat tasks listed on their features pages. Users can make these same service calls to create their own applications and leverage all of this built-in functionality. By default, we supply SOAP 1.2 WS-* and net.tcp endpoints to communicate with the server remotely. These are the industry heavyweights in enterprise SOA.

Recently we had a client request to use the SOAP 1.1 WS-Basic profile. Due to the Repository’s decoupled design, we were able to add this very quickly. All of these endpoints use a secure transport channel such as SSL/TLS. The speed of adding new access methods got me thinking about what other types of endpoints and serializations might be useful. Adding both SPARQL and REST immediately came to mind.

Colectica Repository already has an excellent relationship and set based querying system. Adding a SPARQL endpoint would allow users to use a standardized query language to process those relationships and associated data. The RDF serialization would be a subset of the official DDI object model. When the DDI URN format is agreed upon I will look into this more. If you like this idea, tell us you would like to see feature ticket #1181 implemented.

REST services make it very convenient for users on various platforms to create access clients. Since all metadata stored in the Repository are identified consistently it should be simple to make a basic access model. Exposing some of the Repository’s more advanced functions would be a bit more challenging, but for simple resolution this would work well. REST could also make use of already existing HTTP caching, as published versions of the metadata do not change.
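
A basic access model along these lines might map the agency, id, and version of an item to a URL and mark published versions as cacheable. The sketch below is speculative, since no REST interface exists yet: the `/item/...` path layout, the `ItemResource` helper, and the specific `Cache-Control` values are all assumptions.

```csharp
using System;

public static class ItemResource
{
    // Hypothetical URL layout built from the agency, id, and version triple.
    public static string Path(string agency, Guid id, long version)
    {
        return "/item/" + Uri.EscapeDataString(agency) + "/" + id.ToString("D") + "/" + version;
    }

    // Published versions do not change, so caches may keep them for a long
    // time; anything else should be revalidated on each request.
    public static string CacheControl(bool isPublished)
    {
        return isPublished ? "public, max-age=31536000" : "no-cache";
    }

    public static void Main()
    {
        Guid id = Guid.Parse("ba540279-8bfc-461c-a07b-d25493c648a7");
        Console.WriteLine(Path("us.colectica", id, 31));
        Console.WriteLine("Cache-Control: " + CacheControl(true));
    }
}
```

Putting the version in the path gives every published version its own stable URL, which is what lets ordinary HTTP caches serve it without contacting the Repository.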

Aside from native DDI 3, JSON is an obvious candidate for the serialization format, but speed is always a concern. I have been looking at several new binary serialization formats:

  • Google Protocol Buffers: Protocol buffers are fast, simple, compact, and cross platform. I have seen benchmarks where they are faster than the net.tcp binary serialization we currently ship.
  • BSON: Binary JSON is another option and is very similar to the protocol buffers, but is not tied to a schema.

I’ve added REST support as feature ticket #1182; again, let us know if that interests you. The next version of Colectica Repository additionally supports SOAP 1.1. Are there any other ways that you would like to access the services?