Sep 18, 2013
 

In an enterprise, there are often numerous data management systems that each hold data for a specific domain. These data silos are often difficult to integrate when creating a holistic view of the data lifecycle. This post details how to create a web services layer over existing databases that exposes DDI metadata. DDI is an open standard for documenting the data lifecycle. Using DDI, multiple data sources can be combined to create the ‘big picture’ view.

The Read Only View

The simplest way to expose DDI from an existing system is to create a Web Services facade. This facade will implement several functions that are needed to expose a data source as an ISO 11179 repository, a standard on which DDI is based. One option is to allow the existing system to perform all updates and management of its own data, while providing a read only view to other systems for integration. To accomplish this with DDI and Colectica, the following abilities should be present in the web services facade.

Viewing an Item

The most basic function of a repository is to retrieve an item. In Colectica, this will most likely be an item serialized as DDI 3. Given an ISO 11179 international registration data identifier (IRDI), the web service calls GetItem and GetItems will return a RepositoryItem object containing information about an Administered Item and its XML serialization.
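
As a rough illustration of the retrieval calls described above, here is a minimal in-memory sketch. It is not Colectica's API: the `ReadOnlyFacade` class, the `Irdi` tuple of (agency, identifier, version), and the field names on `RepositoryItem` are all assumptions made for the example. A real facade would look items up in the existing system's database instead of a dictionary.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical identifier shape: (agency, item id, version).
Irdi = Tuple[str, str, int]


@dataclass(frozen=True)
class RepositoryItem:
    """Assumed shape: identification plus the item's XML serialization."""
    agency: str
    identifier: str
    version: int
    item_type: str
    xml: str  # e.g. the DDI 3 serialization of the item


class ReadOnlyFacade:
    """Read-only lookups over items owned by an existing system."""

    def __init__(self, items: List[RepositoryItem]) -> None:
        self._items: Dict[Irdi, RepositoryItem] = {
            (i.agency, i.identifier, i.version): i for i in items
        }

    def get_item(self, irdi: Irdi) -> Optional[RepositoryItem]:
        # Returns None when the identifier is unknown.
        return self._items.get(irdi)

    def get_items(self, irdis: List[Irdi]) -> List[RepositoryItem]:
        # Unknown identifiers are skipped rather than raising.
        return [self._items[k] for k in irdis if k in self._items]
```

The facade exposes no write methods at all, which is what keeps the existing system the sole owner of its data.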

Versioning

An ISO 11179 repository manages multiple versions of Administered Items. The web service call GetVersionHistory can list all versions of an item in a repository.
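
A version listing can be sketched in a few lines. The function below is hypothetical and assumes items are keyed by an (agency, identifier, version) tuple. It only shows the idea behind GetVersionHistory, not its actual signature or return type.

```python
from typing import Dict, List, Tuple

# Hypothetical key shape: (agency, item id, version) -> serialized item.
Store = Dict[Tuple[str, str, int], str]


def get_version_history(store: Store, agency: str, identifier: str) -> List[int]:
    """Return every stored version number of one item, oldest first."""
    return sorted(
        version
        for (item_agency, item_id, version) in store
        if item_agency == agency and item_id == identifier
    )
```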

Relationships and Search

Searching for relationships between items is needed to efficiently browse items in a hierarchy. To enable relationship searching in a read only view, the web services facade should implement GetRelationshipBySubject, GetTypedRelationships, and GetSet. To enable text based searching, the web services facade should implement Search and SearchTypedSet.
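
The sketch below shows one way these lookups could work over an in-memory list of references. Everything in it is an assumption made for illustration: the `RelationshipIndex` class, the method names, and the interpretation that a "set" is the root item plus everything reachable from it by following references. The post does not define the exact semantics of the real operations.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Item:
    identifier: str
    item_type: str
    label: str


@dataclass(frozen=True)
class Relationship:
    subject: str  # identifier of the referencing item
    target: str   # identifier of the referenced item


class RelationshipIndex:
    def __init__(self, items: List[Item], relationships: List[Relationship]) -> None:
        self._items: Dict[str, Item] = {i.identifier: i for i in items}
        self._relationships = list(relationships)

    def relationships_by_subject(self, subject: str) -> List[Relationship]:
        return [r for r in self._relationships if r.subject == subject]

    def typed_relationships(self, subject: str, item_type: str) -> List[Relationship]:
        # Keep only references whose target is a known item of the given type.
        return [
            r for r in self.relationships_by_subject(subject)
            if r.target in self._items
            and self._items[r.target].item_type == item_type
        ]

    def get_set(self, root: str) -> List[str]:
        # Breadth-first walk: the root plus everything reachable from it.
        seen: List[str] = []
        queue = [root]
        while queue:
            current = queue.pop(0)
            if current in seen:
                continue
            seen.append(current)
            queue.extend(r.target for r in self.relationships_by_subject(current))
        return seen

    def search(self, text: str) -> List[Item]:
        # Case-insensitive substring match on the label.
        needle = text.lower()
        return [i for i in self._items.values() if needle in i.label.lower()]
```

A production facade would push these queries down to the existing database rather than scanning lists in memory.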

Optimizations

Often when browsing, only basic information about an item is needed for display. This often includes the item type, its identity, and a basic label. Implementing GetRepositoryItemDescriptions to provide this basic information can speed up user interactions with the web services layer.
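
To make the optimization concrete, the following sketch returns only type, identity, and label, and never touches the XML payload. The `ItemDescription` class, the function name, and the dictionary layout of the backing store are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ItemDescription:
    """Lightweight view of an item: no XML serialization attached."""
    identifier: str
    item_type: str
    label: str


def get_item_descriptions(
    store: Dict[str, dict], identifiers: List[str]
) -> List[ItemDescription]:
    # Assumed store layout: identifier -> {"item_type", "label", "xml"}.
    # Only the small fields are copied; unknown identifiers are skipped.
    return [
        ItemDescription(i, store[i]["item_type"], store[i]["label"])
        for i in identifiers
        if i in store
    ]
```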

Summary

These 9 abilities encompass all that is needed to create a read only view on top of an existing data management system. These functions also enable creation of local checkouts of the items.

  • If the system already manages items using the DDI standard, this is very straightforward.
  • If the system manages data in the DDI content model but not in a DDI serialization or versioning system, a translation layer may be required for the serialization and identification beneath the web service facade.
  • If the data managed by the system is not part of the DDI content model, the data should most likely not be put behind a web services facade. It should instead be documented using the DDI standard. This includes describing variables, datasets, and concepts that describe the data.
Aug 29, 2013
 

The changes from the DDI 3.2 public review have been entered into the source repository, and the final review of the changes is now taking place. A main focus of version 3.2 is consistency and usability, and the Technical Committee came up with a list of design and content guidelines to ensure this. This focus on consistency should allow users and developers to adopt DDI Lifecycle more quickly, since all the content areas should now be programmatically usable in the same ways.

Check out an example report on the current DDI 3.2 development schemas

During our review of 3.2, we created a tool to point out items in the DDI schema set that do not conform to these consistency guidelines. The tool analyses the schemas and creates an HTML report of items that should be addressed before release. It currently performs the following checks.

  • Validate schema set is DDI Lifecycle.
  • Check compilation of the schema as an XML Schema Set.
  • Versionables and Maintainables allowing inline or reference usage.
    • Versionables and Maintainables are in an xs:choice.
    • Versionables and Maintainables in an xs:choice contain two elements.
    • Versionables and Maintainables in an xs:choice contain a xxxReference.
  • FragmentInstance contains all Versionables and Maintainables.
  • Type of Object for references
    • Duplicate Element names detected for referenceable types.
    • Element names detected without a TypeOfObject defined.
  • Spell checking
    • Element names
    • Attribute names
    • XSD annotations/documentation
    • Breaking apart CamelCasedWords
    • Allows words to be added to dictionary
    • Uses en-US
    • Highlighting of misspellings in generated reports.
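
As an illustration of the inline-or-reference checks listed above, here is a small sketch using Python's standard `xml.etree.ElementTree`. It is not the tool's own code. The sample schema is made up, and for brevity the sketch inspects every `xs:choice` rather than only those holding Versionables and Maintainables.

```python
import xml.etree.ElementTree as ET
from typing import List

XS = "{http://www.w3.org/2001/XMLSchema}"

# Made-up sample: the first choice conforms, the second does not.
SAMPLE = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="ExampleType">
    <xs:sequence>
      <xs:choice>
        <xs:element ref="Variable"/>
        <xs:element ref="VariableReference"/>
      </xs:choice>
      <xs:choice>
        <xs:element ref="Concept"/>
      </xs:choice>
    </xs:sequence>
  </xs:complexType>
</xs:schema>"""


def check_choice(choice: ET.Element) -> List[str]:
    """Return a list of problems found in one xs:choice (empty if none)."""
    problems: List[str] = []
    elements = choice.findall(f"{XS}element")
    names = [e.get("ref") or e.get("name") or "" for e in elements]
    if len(elements) != 2:
        problems.append(f"expected 2 elements, found {len(elements)}")
    if not any(n.endswith("Reference") for n in names):
        problems.append("no xxxReference element")
    return problems


def check_schema(xsd_text: str) -> List[List[str]]:
    """Check every xs:choice in document order."""
    root = ET.fromstring(xsd_text)
    return [check_choice(c) for c in root.iter(f"{XS}choice")]
```

Calling `check_schema(SAMPLE)` reports no problems for the first choice and two for the second.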

In addition to checking the structures in the schema, the tool also spell checks all elements, attributes, and inline documentation to make sure that the released DDI has a professional feel. You can see an example report on the current DDI 3.2 schemas' progress towards the consistency goals!
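
Spell checking schema names depends on first breaking CamelCased names into words. The sketch below shows one way to do that with a regular expression and then look each word up in a word set. The function names and the tiny dictionary in the usage note are assumptions for the example, not the tool's implementation.

```python
import re
from typing import List, Set

# Matches, in order of preference: an acronym followed by a capitalized
# word (e.g. "XML" in "XMLSchema"), a capitalized or lowercase word,
# a trailing acronym, or a run of digits.
_WORD = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")


def split_camel_case(name: str) -> List[str]:
    return _WORD.findall(name)


def misspelled(name: str, dictionary: Set[str]) -> List[str]:
    """Return the words in a CamelCased name that are not in the dictionary."""
    return [
        word for word in split_camel_case(name)
        if not word.isdigit() and word.lower() not in dictionary
    ]
```

For example, with a dictionary of `{"variable", "reference"}`, `misspelled("VariableRefrence", ...)` flags only `Refrence`. Project-specific terms such as `ddi` can be accepted by adding them to the set, mirroring the tool's option to add words to the dictionary.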

We have licensed the tool as Open Source under the LGPL and the code is available for download and forking on GitHub at https://github.com/DanSmith/DDISchemaCheck/.

There is also a release of the compiled tool on the releases page. Please email the DDI users list, send a tweet, or send us pull requests with any additional tests that you would like to see incorporated.