Jan 09 2014
 

Colectica Repository and Portal have many settings that the system administrator can adjust. These include language sort orders, synonyms, item types to display, and "appears within" configuration. A developer creating Colectica Addins recently asked how the settings API could be used to store settings for custom extensions.

Each Repository setting consists of a key associated with an optional string value and an optional long integer value. Four web service calls interact with the Repository settings store.

// Removes the repository setting.
void RemoveRepositorySetting(string settingName);

// Sets the repository setting.
void SetRepositorySetting(RepositorySetting setting);

// Gets the repository setting.
RepositorySetting GetRepositorySetting(string settingName);

// Gets all of the repository settings.
Collection<RepositorySetting> GetRepositorySettings();

The RepositorySetting class looks like this:

public class RepositorySetting
{
    public string SettingName { get; set; }
    public string Value { get; set; }
    public long? LongValue { get; set; }
}

Our built-in settings use this key-value store, normally with a JSON representation of the settings stored as the value. Here is an example of how we could store synonyms for DDI item types using the settings API.

// A key defined somewhere in the class
string SingularItemTypeNamesKey = "Colectica:Portal:SingularItemTypeNames";

// Add some synonyms
Dictionary<Guid,string> names = new Dictionary<Guid,string>();
names.Add(DdiItemType.DdiInstance, "Project");

// Create the repository setting to store
RepositorySetting setting = new RepositorySetting();
setting.SettingName = SingularItemTypeNamesKey;
setting.Value = JsonConvert.SerializeObject(names);

// Create the web services client and set the setting
WcfRepositoryClient client = GetClient();
client.SetRepositorySetting(setting);

Similarly, here is an example of how you could retrieve the singular synonyms.

// Create the web services client and retrieve the setting
WcfRepositoryClient client = GetClient();
var setting = client.GetRepositorySetting(SingularItemTypeNamesKey);

if (setting == null || setting.Value == null)
{
    return new Dictionary<Guid, string>();
}

// If the setting exists, deserialize it
var results = JsonConvert.DeserializeObject<Dictionary<Guid, string>>(setting.Value);

Best Practices

Use a unique key name. We suggest a colon-separated key in the form Organization:Product:Description. For example, MyOrg:MyCustomProject:MySettingName.

Store many settings under a single key. For example, if you are storing translations, you could keep all of them under one key, with the data serialized as JSON or XML in the value. Our example used a dictionary. You could also use lists, a custom class, etc. This minimizes the number of web service calls made over the network.

Cache your settings once they are retrieved. If the settings will be used multiple times, cache them in your program so you do not have to call the web services repeatedly.
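
The caching advice can be sketched in a language-neutral way. The following is a minimal Python sketch, not part of the Colectica SDK: `CachedSettings`, `fake_fetch`, and the in-memory `store` are hypothetical names, and `fetch` stands in for a call such as `client.GetRepositorySetting(key).Value` from the C# examples above.

```python
import json
from typing import Callable, Dict, Optional


class CachedSettings:
    """Caches decoded JSON settings so each key is fetched at most once."""

    def __init__(self, fetch: Callable[[str], Optional[str]]):
        # fetch is a hypothetical stand-in for the web service call;
        # it returns the stored string value, or None when the key is unset.
        self._fetch = fetch
        self._cache: Dict[str, dict] = {}

    def get(self, key: str) -> dict:
        if key not in self._cache:
            raw = self._fetch(key)
            # A missing setting is treated as an empty dictionary.
            self._cache[key] = json.loads(raw) if raw is not None else {}
        return self._cache[key]

    def invalidate(self, key: str) -> None:
        # Call this after writing a new value so the next get() refetches.
        self._cache.pop(key, None)


# In-memory stand-in for the repository settings store (illustration only).
store = {
    "MyOrg:MyCustomProject:Translations": json.dumps({"en": "Project", "fr": "Projet"})
}
calls = []


def fake_fetch(key):
    calls.append(key)  # record every simulated network round trip
    return store.get(key)


settings = CachedSettings(fake_fetch)
first = settings.get("MyOrg:MyCustomProject:Translations")
second = settings.get("MyOrg:MyCustomProject:Translations")
```

Here both lookups return the same dictionary while `fake_fetch` is called only once. All translations live under one key, so a single round trip loads them all.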

Sep 18 2013
 

In an enterprise, there are often numerous data management systems that contain specific data for different domains. These data silos are often difficult to integrate when creating a holistic view of the data life cycle. This post will detail how to create a web services layer over existing databases that will expose DDI metadata. DDI is an open standard for documenting the data lifecycle. Using DDI, multiple data sources can be combined to create the ‘big picture’ view.

The Read Only View

The simplest way to expose DDI from an existing system is to create a Web Services facade. This facade will implement several functions that are needed to expose a data source as an ISO 11179 repository, a standard on which DDI is based. One option is to allow the existing system to perform all updates and management of its own data, while providing a read only view to other systems for integration. To accomplish this with DDI and Colectica, the following abilities should be present in the web services facade.

Viewing an Item

The most basic function of a repository is to retrieve an item. In Colectica, this will most likely be an item serialized as DDI 3. Given an ISO 11179 international registration data identifier (IRDI), the web service calls GetItem and GetItems will return a RepositoryItem object containing information about an Administered Item and its XML serialization.

Versioning

An ISO 11179 repository manages multiple versions of Administered Items. The web service call GetVersionHistory can list all versions of an item in a repository.

Relationships and Search

Searching for relationships between items is needed to efficiently browse items in a hierarchy. To enable relationship searching in a read-only view, the web services facade should implement GetRelationshipBySubject, GetTypedRelationships, and GetSet. To enable text-based searching, the facade should implement Search and SearchTypedSet.

Optimizations

Often when browsing, only basic information about an item is needed for display. This often includes the item type, its identity, and a basic label. Implementing GetRepositoryItemDescriptions to provide this basic information can speed up user interactions with the web services layer.

Summary

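These nine calls (GetItem, GetItems, GetVersionHistory, GetRelationshipBySubject, GetTypedRelationships, GetSet, Search, SearchTypedSet, and GetRepositoryItemDescriptions) are all that is needed to create a read-only view on top of an existing data management system. They also enable the creation of local checkouts of the items.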

  • If the system already manages items using the DDI standard, this is very straightforward.
  • If the system manages data in the DDI content model but not in a DDI serialization or versioning system, a translation layer may be required for serialization and identification beneath the web service facade.
  • If the data managed by the system is not part of the DDI content model, the data should most likely not be put behind a web services facade. It should instead be documented using the DDI standard. This includes describing variables, datasets, and concepts that describe the data.
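
The shape of such a read-only facade can be sketched as follows. This is a minimal in-memory Python illustration, not the Colectica web service contract: the method names mirror the nine calls named in this post, but every signature, the `Irdi` tuple, the `RepositoryItem` fields, and the choice that GetRelationshipBySubject returns the items a subject references are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# (agency, identifier, version): a simplified stand-in for an ISO 11179 IRDI.
Irdi = Tuple[str, str, int]


@dataclass(frozen=True)
class RepositoryItem:
    irdi: Irdi
    item_type: str
    label: str
    xml: str                        # DDI 3 serialization of the item
    children: Tuple[Irdi, ...] = ()  # items this item references


class InMemoryReadOnlyFacade:
    """Read-only view over a dict of items; all signatures are assumed."""

    def __init__(self, items: List[RepositoryItem]):
        self._items: Dict[Irdi, RepositoryItem] = {i.irdi: i for i in items}

    # Viewing an item
    def get_item(self, irdi):
        return self._items.get(irdi)

    def get_items(self, irdis):
        return [self._items[i] for i in irdis if i in self._items]

    # Versioning
    def get_version_history(self, agency, identifier):
        return sorted(i for i in self._items if i[:2] == (agency, identifier))

    # Relationships
    def get_relationship_by_subject(self, irdi):
        item = self._items.get(irdi)
        return list(item.children) if item else []

    def get_typed_relationships(self, irdi, item_type):
        return [c for c in self.get_relationship_by_subject(irdi)
                if c in self._items and self._items[c].item_type == item_type]

    def get_set(self, root):
        """All known items reachable from root by following references."""
        seen, stack = [], [root]
        while stack:
            current = stack.pop()
            if current in seen or current not in self._items:
                continue
            seen.append(current)
            stack.extend(self._items[current].children)
        return seen

    # Text search
    def search(self, text):
        return [i.irdi for i in self._items.values()
                if text.lower() in i.label.lower()]

    def search_typed_set(self, root, item_type, text):
        members = set(self.get_set(root))
        return [i for i in self.search(text)
                if i in members and self._items[i].item_type == item_type]

    # Optimization: lightweight descriptions without the XML payload
    def get_repository_item_descriptions(self, irdis):
        return [(i.irdi, i.item_type, i.label) for i in self.get_items(irdis)]


# Illustrative data: one study that references version 2 of a question.
q_v1 = RepositoryItem(("us.example", "q-1", 1), "Question", "Sex of person", "<Question/>")
q_v2 = RepositoryItem(("us.example", "q-1", 2), "Question", "Sex of person 1", "<Question/>")
study = RepositoryItem(("us.example", "study-1", 1), "StudyUnit", "Census sample",
                       "<StudyUnit/>", children=(q_v2.irdi,))
facade = InMemoryReadOnlyFacade([q_v1, q_v2, study])
```

In a real facade, each method would translate the request into queries against the existing system rather than a dictionary lookup.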

Oct 13 2011
 

With the release of DDI version 3 (DDI Lifecycle), an effort was made to allow reuse and linkages throughout the content model. This created a rich model that allows for reuse and harmonization of metadata items through the use of referencing. When using the DDI 3 Addin for Colectica Repository, all of these relationships between metadata items are indexed, which enables the rich interlinking of items seen on Colectica Web. With all of this relationship information, wouldn’t it be nice to execute arbitrary queries about the relationships? Colectica Repository already offers relationship- and set-based searching for registered metadata items, but a more powerful interface has now arrived.

Colectica Repository RDF Services

SPARQL is a query language created for searching RDF data and is standardized by the W3C. It allows for searching based on the relationships and literal data stored in an RDF graph or store. Colectica Repository now offers a new Addin with the ability to query DDI 3 as RDF using a SPARQL endpoint on Colectica Web and from the Repository with a web service! In addition, each DDI 3 item stored in the Colectica Repository can be downloaded in RDF using a Concise Bounded Description.

How are the RDF Services implemented?

RDF Services is a new optional component for Colectica Repository. Colectica Repository has many extension points created with the help of the Microsoft Managed Extensibility Framework (MEF). This allows custom Addins to be created and deployed by dropping a new assembly into the Addins folder. The DDI 3 Addin uses the Item Format extension point, which many customers are already familiar with. Another extension point is the Post Commit Hook. The RDF Services are implemented as a post commit hook: after an item is committed, they use MEF to query for RDF serializers for the item, and then store the RDF serialization of the DDI 3 item in the Colectica Repository.
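
The post commit hook pattern described above can be sketched roughly as follows. This is an illustrative Python sketch only: the real extension points are .NET interfaces discovered through MEF, and `RdfPostCommitHook`, `on_committed`, `question_serializer`, and the dictionary-based item are all hypothetical names.

```python
from typing import Callable, Dict, List, Optional

# A serializer returns RDF text for an item, or None if it does not handle it.
Serializer = Callable[[dict], Optional[str]]


class RdfPostCommitHook:
    def __init__(self, serializers: List[Serializer]):
        # Stand-in for the serializers that MEF would discover at runtime.
        self._serializers = serializers
        self.rdf_store: Dict[str, str] = {}

    def on_committed(self, item: dict) -> None:
        """Called after an item is committed; stores the first RDF result."""
        for serialize in self._serializers:
            rdf = serialize(item)
            if rdf is not None:
                self.rdf_store[item["id"]] = rdf
                return


def question_serializer(item):
    # Handles only questions; other item types are left to other serializers.
    if item["type"] != "Question":
        return None
    return f'<urn:example:{item["id"]}> a <urn:ddirdf:type:Question> .'


hook = RdfPostCommitHook([question_serializer])
hook.on_committed({"id": "q6", "type": "Question"})
hook.on_committed({"id": "c1", "type": "Concept"})  # no serializer, nothing stored
```

Because the hook only depends on the serializer signature, new item types can be supported by adding a serializer without changing the hook itself.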

RDF Examples

I will show some examples from the US 2010 Census sample DDI 3 file. The following is the RDF serialization of question 6 as a Concise Bounded Description (CBD). The question has a coded classification and question text in several languages.

@base <http://data.colectica.com/item/us.colectica/ba540279-8bfc-461c-a07b-d25493c648a7/31>.
 
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
@prefix ddi: <urn:ddirdf:>.
@prefix ddit: <urn:ddirdf:type:>.
 
_:autos1 a <ddit:CodeDomain>;
         <ddi:HasCodeScheme> <http://data.colectica.com/item/us.colectica/e2648604-e1a5-4a75-8b0e-52a5c13dd89c/13>;
         <ddi:ResponseDomainBlankIsMissingValue> false.

<http://data.colectica.com/item/us.colectica/ba540279-8bfc-461c-a07b-d25493c648a7/31> <http://purl.org/dc/elements/1.1/title> "Q6"@en-us;
    a <ddit:Question>;
    <ddi:AgencyId> "us.colectica"^^xsd:string;
    <ddi:EstimatedTime> "PT0S"^^xsd:dayTimeDuration;
    <ddi:HasCodeDomain> _:autos1;
    <ddi:HasCodeSet> <http://data.colectica.com/item/us.colectica/e2648604-e1a5-4a75-8b0e-52a5c13dd89c/13>;
    <ddi:Id> "ba540279-8bfc-461c-a07b-d25493c648a7"^^xsd:string;
    <ddi:QuestionIntent> "Asked since 1790. Census data about sex are important because many federal programs must differentiate between males and females for funding, implementing and evaluating their programs. For instance, laws promoting equal employment opportunity for women require census data on sex. Also, sociologists, economists, and other researchers who analyze social and economic trends use the data."@en-us;
    <ddi:QuestionText> "¿Cuál es el sexo de la Persona 1?"@es,
          "Jinsia ya Mtu wa 1 ni ipi?"@sw,
          "Quel est le sexe de la Personne 1 ?"@fr,
          "Seksi i Personit 1?"@sq,
          "Was ist das Geschlecht von Person 1?"@de,
          "What is Person {PersonCounter}'s sex?"@en-us;
    <ddi:UserId> "Colectica:UserAssignedId:Q6"^^xsd:string;
    <ddi:Version> 31 ;
    <ddi:VersionDate> "2011-02-09T10:11:14"^^xsd:dateTime;
    <ddi:VersionRationale> "Publishing study"@en-us.


Question 5 shows the use of multiple response domains.

@base <http://data.colectica.com/item/us.colectica/6fc6291a-b9c1-4698-83f8-3983c2ec8cb4/28>.
 
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
@prefix dc: <http://purl.org/dc/elements/1.1/>.
@prefix ddi: <urn:ddirdf:>.
@prefix ddit: <urn:ddirdf:type:>.

_:autos1 <dc:description> "Apellido"@es,
           "First Name"@en-us,
           "Jina la Mwisho"@sw,
           "Mbiemri"@sq,
           "Nachname"@de,
           "Nom"@fr;
         <dc:title> "Apellido"@es,
         "First Name"@en-us,
         "Jina la Mwisho"@sw,
         "Mbiemri"@sq,
         "Nachname"@de,
         "Nom"@fr;
         a <ddit:TextDomain>;
         <ddi:ResponseDomainBlankIsMissingValue> false.

_:autos2 <dc:description> "Emri"@sq,
           "Jina la kwanza"@sw,
           "Last Name"@en-us,
           "Nombre"@es,
           "Prénom"@fr,
           "Vorname"@de;
         <dc:title> "Emri"@sq,
         "Jina la kwanza"@sw,
         "Last Name"@en-us,
         "Nombre"@es,
         "Prénom"@fr,
         "Vorname"@de;
         a <ddit:TextDomain>;
         <ddi:ResponseDomainBlankIsMissingValue> false.

_:autos3 <dc:description> "Emri i dytë"@sq,
           "Herufi ya Kati"@sw,
           "Inicial"@es,
           "Initiale 2e prénom"@fr,
           "MI"@en-us,
           "Mittl.Init."@de;
         <dc:title> "Emri i dytë"@sq,
         "Herufi ya Kati"@sw,
         "Inicial"@es,
         "Initiale 2e prénom"@fr,
         "MI"@en-us,
         "Mittl.Init."@de;
         a <ddit:TextDomain>;
         <ddi:ResponseDomainBlankIsMissingValue> false.

<http://data.colectica.com/item/us.colectica/6fc6291a-b9c1-4698-83f8-3983c2ec8cb4/28> <dc:title> "Q5"@en-us;
    a <ddit:Question>;
    <ddi:AgencyId> "us.colectica"^^xsd:string;
    <ddi:EstimatedTime> "PT0S"^^xsd:dayTimeDuration;
    <ddi:HasTextDomain> _:autos1,
                   _:autos2,
                   _:autos3;
    <ddi:Id> "6fc6291a-b9c1-4698-83f8-3983c2ec8cb4"^^xsd:string;
    <ddi:QuestionIntent> "Listing the name of each person in the household helps the respondent to include all members, particularly in large households where a respondent may forget who was counted and who was not. Also, names are needed if additional information about an individual must be obtained to complete the census form. Federal law protects the confidentiality of personal information, including names."@en-us;
    <ddi:QuestionText> "¿Cuál es el nombre de la Persona 1?"@es,
                  "Jina la Mtu wa 1 ni lipi?"@sw,
                  "Quel est le nom de la Personne 1 ?"@fr,
                  "Si quhet Personi 1?"@sq,
                  "What is Person {PersonCounter}'s name?"@en-us,
                  "Wie lautet der Name von Person 1?"@de;
    <ddi:UserId> "Colectica:UserAssignedId:Q5"^^xsd:string;
    <ddi:Version> 28 ;
    <ddi:VersionDate> "2011-02-09T10:11:14"^^xsd:dateTime;
    <ddi:VersionRationale> "Publishing study"@en-us.
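
Both serializations above are Concise Bounded Descriptions: the triples whose subject is the item, plus, recursively, the triples of any blank nodes those triples point to (such as `_:autos1`). A minimal Python sketch of that idea follows. It ignores the reification part of the W3C definition, represents triples as plain string tuples, and uses made-up `ex:` identifiers and an illustrative `ddit:CodeScheme` type; it is not how the RDF Services are implemented.

```python
def concise_bounded_description(triples, subject):
    """Triples about `subject` plus, recursively, those of referenced blank nodes."""
    result, pending, seen = [], [subject], set()
    while pending:
        node = pending.pop()
        if node in seen:
            continue
        seen.add(node)
        for s, p, o in triples:
            if s == node:
                result.append((s, p, o))
                # Follow blank nodes only; named resources have their own CBD.
                if isinstance(o, str) and o.startswith("_:"):
                    pending.append(o)
    return result


graph = [
    ("ex:Q5", "rdf:type", "ddit:Question"),
    ("ex:Q5", "ddi:HasTextDomain", "_:autos1"),
    ("_:autos1", "rdf:type", "ddit:TextDomain"),
    ("ex:Q6", "rdf:type", "ddit:Question"),
    ("ex:Q6", "ddi:HasCodeSet", "ex:Codes"),
    ("ex:Codes", "rdf:type", "ddit:CodeScheme"),
]
cbd_q5 = concise_bounded_description(graph, "ex:Q5")
cbd_q6 = concise_bounded_description(graph, "ex:Q6")
```

The description of `ex:Q5` pulls in the blank node for its text domain, while the description of `ex:Q6` stops at the named code scheme, which is why question 6 above links to its code scheme by URL.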

SPARQL Examples

Let's take a look at some SPARQL queries that I can run across the DDI 3 RDF stored in the Colectica Repository. The first one looks for studies that I have created since January 2010.

PREFIX ddi: <urn:ddirdf:>
PREFIX ddit: <urn:ddirdf:type:>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?study
WHERE
{
    ?study a ddit:StudyUnit;
        dc:date ?creation_date;
        dc:creator <http://dan.smith.name/who#dan>.
    FILTER ( xsd:dateTime(?creation_date) > "2010-01-01T00:00:00"^^xsd:dateTime ) .
}
ORDER BY ?study

This second query will give us a count of how many times a variable has been reused/harmonized across datasets.

PREFIX ddi: <urn:ddirdf:>
PREFIX ddit: <urn:ddirdf:type:>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
 
SELECT ?variable (COUNT(?parent) AS ?c)
WHERE
{
    ?parent ddi:HasVariable ?variable .
    ?parent a ddit:Dataset
}
GROUP BY ?variable
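
To make the grouping explicit, here is a small Python sketch of what this second query computes, run over an in-memory list of triples rather than a SPARQL endpoint. The `ex:` identifiers and the string-tuple representation are illustrative assumptions.

```python
from collections import Counter


def variable_reuse_counts(triples):
    """Count, per variable, how many datasets reference it via ddi:HasVariable."""
    datasets = {s for s, p, o in triples
                if p == "rdf:type" and o == "ddit:Dataset"}
    return Counter(o for s, p, o in triples
                   if p == "ddi:HasVariable" and s in datasets)


triples = [
    ("ex:wave1", "rdf:type", "ddit:Dataset"),
    ("ex:wave2", "rdf:type", "ddit:Dataset"),
    ("ex:wave1", "ddi:HasVariable", "ex:sex"),
    ("ex:wave2", "ddi:HasVariable", "ex:sex"),
    ("ex:wave2", "ddi:HasVariable", "ex:income"),
    # Not a dataset, so this reference is not counted.
    ("ex:group", "ddi:HasVariable", "ex:sex"),
]
counts = variable_reuse_counts(triples)
```

A variable with a count above one is shared by several datasets, which is the reuse the query is meant to surface.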

Next Steps

Colectica Repository DDI 3 RDF Services is now available as a community technology preview to interested customers. There are several items to take note of while using it:

  • There is not yet an official DDI RDF vocabulary. The namespace and names of elements may change in the future.
  • Several external vocabularies are also currently used. These include rdf, rdfs, dc, dcterms, owl, xsd, and foaf.
  • Per-item ACLs on metadata items in the Colectica Repository are not yet implemented in the SPARQL interface. This means that all metadata items will be readable by all users. Deploy accordingly.
  • SPARQL UPDATE is disabled to maintain consistency with the versioned metadata items in the Repository.
  • You can set the location of your Colectica Web installation in the RDF Services so that the generated URLs for items are resolvable in the browser. This allows users to see a nice web based view of the information.

Feedback is welcome!

Dan

May 27 2011
 

We were recently at UW-Madison to visit with Dr. Barry T Radler and the MIDUS longitudinal survey. We are working with them to document their series of longitudinal studies in DDI 3 to enable the generation of very detailed and cross-linked codebooks. During our meeting, Dr. Radler mentioned that the way the search results on Colectica Web were listed could be improved, as they were currently ordered based solely on information specific to each individual item.

Colectica Web offers faceted searching of DDI items, and even searching within arbitrary sets such as a specific study, instance, package, scheme, etc. The question is how to return the results with the most relevant DDI 3 item listed first. DDI 3 allows for massive reuse of items through its referencing mechanisms. For example, the same concept can be used to describe multiple questions or the same code scheme can be the representation of many different variables. Colectica Repository tracks all of this extra relationship and contextual information about DDI items, so we decided to use it in the search rankings.

Introducing DDI 3 metadata ranking

The search results, for example for a search on Gender, now use not only the information from the DDI 3 item itself for ranking; they also take into account how often an item is reused and harmonized across waves of this longitudinal study.

This is also a great new feature for users of Colectica Designer. When a user opens the item picker to create a reference, their search results will also list the most reused items more prominently. This will help users find the items that already have the most influence and increase the comparability of their published research. Please let me know what you think of the new search rankings or if you have any ideas for how they can be even further refined.
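
One way to picture reuse-aware ranking is to boost an item's text relevance by how often it is referenced. The Python sketch below is purely illustrative: the scoring formula, the `ranked` function, and the sample scores are assumptions, not the ranking Colectica actually uses.

```python
import math


def ranked(results):
    """Sort (label, text_score, reuse_count) tuples, most relevant first."""
    def score(result):
        _, text_score, reuse_count = result
        # The logarithm dampens the boost so heavy reuse cannot swamp text relevance.
        return text_score * (1.0 + math.log1p(reuse_count))
    return sorted(results, key=score, reverse=True)


hits = [
    ("Gender (wave 1 only)", 0.9, 0),
    ("Gender (harmonized)", 0.8, 12),
    ("Gender identity", 0.5, 1),
]
order = [label for label, _, _ in ranked(hits)]
```

With these made-up numbers, the harmonized item moves ahead of the slightly better text match because it is reused across many waves.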

May 19 2011
 

Colectica Repository is used as both a registry and resolution service for various pieces of identified metadata. Both Colectica Designer and Web communicate with it to perform all of the neat tasks listed on their features pages. Users can communicate with these same service calls to create their own applications and leverage all of this built-in functionality. By default, we supply SOAP 1.2 WS-* and net.tcp endpoints to communicate with the server remotely. These are the industry heavyweights in enterprise SOA.

Recently we had a client request to use the SOAP 1.1 WS-Basic profile. Due to the Repository’s decoupled design, we were able to add this very quickly. All of these endpoints use a secure transport channel such as SSL/TLS. How quickly we could add a new access method got me thinking about what other types of endpoints and serializations might be useful. SPARQL and REST immediately came to mind.

Colectica Repository already has an excellent relationship and set based querying system. Adding a SPARQL endpoint would allow users to use a standardized query language to process those relationships and associated data. The RDF serialization would be a subset of the official DDI object model. When the DDI urn format is agreed upon I will look into this more. If you like this idea, tell us you would like to see feature ticket #1181 implemented.

REST services make it very convenient for users on various platforms to create access clients. Since all metadata stored in the Repository are identified consistently it should be simple to make a basic access model. Exposing some of the Repository’s more advanced functions would be a bit more challenging, but for simple resolution this would work well. REST could also make use of already existing HTTP caching, as published versions of the metadata do not change.
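
As a rough sketch of the caching point, a REST resolver could mark published versions as immutable so that HTTP caches never need to revalidate them. The Python below is hypothetical throughout: no such REST API exists yet, the path shape simply mirrors the item URLs of the form /item/agency/id/version, and the header values are one reasonable choice rather than a decided design.

```python
def item_path(agency, identifier, version):
    """Hypothetical resource path for one version of one item."""
    return f"/item/{agency}/{identifier}/{version}"


def cache_headers(is_published):
    """Cache-Control headers for a response carrying an item version."""
    if is_published:
        # A published version never changes, so caches may keep it for a year.
        return {"Cache-Control": "public, max-age=31536000, immutable"}
    # Unpublished items may still change, so caches must revalidate each time.
    return {"Cache-Control": "no-cache"}


path = item_path("us.colectica", "ba540279-8bfc-461c-a07b-d25493c648a7", 31)
published = cache_headers(True)
draft = cache_headers(False)
```

Because the version number is part of the path, a new version gets a new URL and cached copies of older versions remain valid.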

Aside from native DDI 3, JSON is an obvious candidate for the serialization format, but speed is always a concern. I have been looking at several new binary serialization formats:

  • Google Protocol Buffers: Protocol buffers are fast, simple, compact, and cross platform. I have seen benchmarks where they are faster than the net.tcp binary serialization we currently ship.
  • BSON: Binary JSON is another option and is very similar to the protocol buffers, but is not tied to a schema.

I’ve added REST support as feature ticket #1182; again, let us know if that interests you. The next version of Colectica Repository additionally supports SOAP 1.1. Are there any other ways that you would like to access the services?