Understanding Vocabularies. Wait! What did you say?

In any data system, the semantic meaning of the data is as important as the structure of the data. In HealthVault we expose a very structured data set in the form of various data types, and the semantic meaning of the content in those data sets is dictated by vocabularies.

HealthVault Vocabulary is a big area, so I’m going to attempt to break it down into a separate series of posts. In this post I’m primarily going to focus on vocabularies in general.

Many of you might have heard of the term Semantic Web, or Web 3.0. So what’s this buzz about? Well, Web 1.0 was for humans to connect, Web 2.0 was for systems to connect to humans via rich internet applications, and Web 3.0 promises a web for systems: a web where programs can communicate and link to each other. What this implies is that for the Semantic Web to be successful, the data being put on it needs not only to be structured, but its content must also be expressed in such a way that computer programs can understand its meaning. This is only possible if everyone has a shared vocabulary or ontology, or a mechanism to relate to a new vocabulary.

To solve the ontology problem we could just sit down and invent one vocabulary which everyone will use henceforth and be done with it, right? Not quite. First, we won’t agree on a single vocabulary, and second, we can’t plan for future vocabularies. And the most important challenge is that the system which powers this vocabulary needs to agree with the architecture of the web, i.e. it must be decentralized and open!

The semantic web community is using a very powerful way to achieve this. They are using the same mechanism which powers resource discovery (for example, URL linking) to discover and understand vocabularies. Two candidates which make this possible are RDF (Resource Description Framework) and OWL (Web Ontology Language). I won’t describe these technologies in detail here but will keep that for some other day. However, the point of this note is to surface example ontologies or vocabularies that this community has successfully used or developed so far.
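To make the URL-based mechanism a little more concrete, here is a minimal Python sketch of the idea. It is not RDF tooling, just plain tuples. The two `example.org` namespaces and the term names are hypothetical and only there for illustration; the OWL namespace and its `equivalentClass` term are the published ones. Each statement is a (subject, predicate, object) triple of URLs, so one vocabulary can point at a term in another simply by linking to it.

```python
# Hypothetical vocabularies, each identified by its own URL namespace.
VOCAB_A = "http://example.org/vocab-a#"
VOCAB_B = "http://example.org/vocab-b#"
# The published OWL namespace.
OWL = "http://www.w3.org/2002/07/owl#"

# Every statement is a (subject, predicate, object) triple of URLs.
triples = {
    (VOCAB_A + "Weight", OWL + "equivalentClass", VOCAB_B + "BodyWeight"),
}


def equivalent_terms(term, statements):
    """Return the terms declared equivalent to `term`, in either direction."""
    found = set()
    for subject, predicate, obj in statements:
        if predicate != OWL + "equivalentClass":
            continue
        if subject == term:
            found.add(obj)
        elif obj == term:
            found.add(subject)
    return found
```

Because the terms are just URLs, nobody has to own a central registry: anyone can publish a new vocabulary and link its terms to existing ones.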

So how does this fit into healthcare? John Halamka outlines the vocabulary elements which an EHR can use in his post: http://geekdoctor.blogspot.com/2009/04/data-elements-of-ehr.html. He mentions preferred vocabularies and transports for some of the important EHR elements. In the following posts I will try to go deeper into this area.

So how does this fit with HealthVault? Well, HealthVault exposes all the vocabularies it uses: http://developer.healthvault.com/types/vocabs.aspx. We also let people annotate their data with any vocabulary they like. However, this leads to an interesting interoperability problem, so in the XSD schemas of our data types (http://developer.healthvault.com/types/types.aspx) we specify preferred vocabularies for some data elements. In the following posts I will provide more details on this.
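As a rough illustration of what a preferred vocabulary means in practice, here is a small Python sketch. It is not the HealthVault API: the `CodedValue` class, the `PREFERRED` table, the element name and the vocabulary names are all hypothetical. The idea is only that a value carries the vocabulary it was coded in, and a consumer can check that against the preference declared for the element.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedValue:
    """A value annotated with the vocabulary it was coded in (hypothetical shape)."""
    vocabulary: str
    code: str
    text: str


# Hypothetical table: data element -> vocabularies preferred for it.
PREFERRED = {"medication-name": {"preferred-vocab"}}


def uses_preferred_vocabulary(element: str, value: CodedValue) -> bool:
    """True if the value is coded in a vocabulary preferred for the element."""
    preferred = PREFERRED.get(element)
    if preferred is None:
        return True  # no preference declared for this element
    return value.vocabulary in preferred
```

Data coded in some other vocabulary is still accepted in this sketch; the check only tells a consuming application whether it can rely on the preferred codes or has to fall back to the display text or a mapping.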

As you can see from John’s post, there is no dearth of language systems for various medical or healthcare terms. However, there is a big gap in best practices on how one can denormalize various vocabularies when implementing systems which need to interoperate with other systems using different vocabularies. I tend to think that there are some lessons to be learned in this area from semantic web efforts, and also a need for a more structured effort to surface best practices. Maybe I’ll dig deeper into this area in one of the future posts.
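One common way to approach this kind of interoperability is a crosswalk through a shared concept: each (vocabulary, code) pair is mapped to a neutral concept identifier, and translation goes through that concept. The Python sketch below is only an assumption about how such a table might look. The vocabulary names, codes and concept identifiers are all made up, and it is not a recommended practice taken from any standard.

```python
from typing import Optional

# Hypothetical crosswalk: (vocabulary, code) -> shared concept identifier.
TO_CONCEPT = {
    ("vocab-a", "A100"): "concept:body-weight",
    ("vocab-b", "wt"): "concept:body-weight",
    ("vocab-a", "A200"): "concept:body-height",
}


def translate(vocabulary: str, code: str, target: str) -> Optional[str]:
    """Translate a code into the target vocabulary via the shared concept.

    Returns None when the source code is unknown or the target vocabulary
    has no code for the same concept.
    """
    concept = TO_CONCEPT.get((vocabulary, code))
    if concept is None:
        return None
    for (vocab, candidate), mapped in TO_CONCEPT.items():
        if vocab == target and mapped == concept:
            return candidate
    return None
```

The useful property is that adding a new vocabulary only requires mapping it to the shared concepts once, rather than to every other vocabulary pairwise. Returning `None` for an unmapped code keeps the gap visible instead of guessing.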

Next post: Recommended Vocabularies for Various Data Contexts.


9 thoughts on “Understanding Vocabularies. Wait! What did you say?”

  1. Great topic! Vocabularies are close to my heart.

    I think the idea of being able to add new HealthVault “Types” and share them across the ecosystem is really killer. Having said that, the current setup doesn’t allow or facilitate this idea. The health provider portals are completely segregated from the apps, so let’s say I’m an NYP patient and I want to start recording my pedometer readings: I log in to MyNYP.org, but there is no way for me to know that there is something called WalkMe to do exactly this. So essentially it’s silos of systems connected through a central database.

    The decentralized way of creating types on a centralized resource works great (think Wikipedia) but IMO the Semantic Web approach of people creating separate ontologies will never work – it leads to classic problems in mapping and integration.

    We have our own internal set of ‘Types’ and terminologies/vocabularies that we map against incoming HealthVault data. It’s easy since HealthVault sticks to a standard set of vocabularies. However, I could imagine a scenario with thousands of HealthVault applications with millions of types and vocabularies. So, for example, you don’t want things like this:

    Application 1 Type: Pedometer Reading

    Application 2 Type: Pedometer Count

    Although they represent the same information, differentiating such cases will become hard.

    I think you folks are cracking the vocabulary problems elegantly at each step. Would love to hear more on this topic.

