SNOMED Clinical Terms® Implementation Course (abbreviated one

SNOMED Clinical Terms®
Supporting Post-coordination with an
Expression Repository
IHTSDO Implementation SIG Webinar
by David Markwell
The Clinical Information Consultancy Ltd
david@clininfo.co.uk
www.cliniclue.com and www.clininfo.co.uk
© 2002-2010 The Clinical Information Consultancy Ltd (includes some material © 2002-2010 IHTSDO)
Overview
 Refresher about expressions
– What is an expression
– Pros and cons of post-coordination
 Issues with post-coordination
 A practical approach to storage issues
– Expression repository
 A practical approach to retrieval issues
– Expression link table
– Expression transitive closure
SNOMED CT Expressions
A ‘SNOMED CT expression’ is
 A collection of references to one or more
SNOMED CT concepts, used to express an
instance of a clinical idea
 Expressions can be used to represent:
– Instances of clinical information in electronic health
records
– Knowledge links in resources such as decision
support protocols and online reference materials
Expressions can be pre-coordinated or
post-coordinated
 Pre-coordinated expression
– A single ConceptId represents the required meaning
• Example
• 31978002
– (fracture of tibia)
 Post-coordinated expression
– A combination of ConceptIds represents a concept
• Example
• 31978002 : 272741003 = 7771000
– (fracture of tibia : laterality = left)
– In human readable form … “fracture of left tibia”
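To make the structure of the example concrete, the following is a minimal sketch of splitting a simple expression into its focus concept and its attribute/value refinements. It is an illustration only: the function name `parse_simple_expression` is hypothetical, and it assumes a single focus concept with ungrouped, un-nested refinements. A full implementation would follow the complete SNOMED CT grammar.

```python
import re


def parse_simple_expression(text):
    """Split 'focus : attribute = value, ...' into (focus, refinements).

    Assumption: one focus concept, no refinement groups, no nesting.
    Any '| term |' text is discarded so that only identifiers remain.
    """
    ids_only = re.sub(r"\|[^|]*\|", "", text)
    focus, _, refinement_text = ids_only.partition(":")
    refinements = []
    for pair in refinement_text.split(","):
        if pair.strip():
            attribute, _, value = pair.partition("=")
            refinements.append((attribute.strip(), value.strip()))
    return focus.strip(), refinements


pre_coordinated = parse_simple_expression("31978002")
post_coordinated = parse_simple_expression("31978002 : 272741003 = 7771000")
```

Here `pre_coordinated` has an empty refinement list, while `post_coordinated` carries one refinement (laterality = left) on the same focus concept.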
Expressions can exist in different forms
 Close-to-user form
– The concept or concepts selected by the user
• (or by a user-interface designer)
 Normal form
– The result of applying a set of logical rules that
transform different expressions with the same
meaning into a common comparable form
 Both these forms may include or exclude a
situation context wrapper
– If included this explicitly states the context of a
finding or procedure
Advantages of post-coordination
 Scope coverage and terminology size
– Coverage of scope to an adequate level of specificity does not
require every possible concept to exist
– Reduces the need for “combinatorial explosion” in concept
numbers to cover every eventuality
 Terminology maintenance
– The maintenance burden is related to terminology size
• Discussed further on next slide
 Structured data entry
– Ability to represent refined content is not dependent on specific
concept existing
– Expressions can be constructed in a consistent manner rather than
searching hundreds of similar terms for precisely the correct one
 Consistent retrieval
– Less dependency on modelling of individual concepts
• Discussed further on subsequent slide
Advantages of post-coordination
Terminology maintenance
 Maintenance burden is roughly proportional to
terminology size. This is due to:
– Requirements to add new content
• Accurate modelling
• Addition of synonymous terms
• Translation adding terms in other languages
– Errors such as
• Ambiguity
• Non-synonymous terms (within or between languages)
• Inconsistent modelling
– Enhancements of the SNOMED CT Concept Model
• Which require some concepts to be remodelled
Note:
This rule applies to Extensions as well as the International Release
Advantages of post-coordination
Consistent retrieval
 Use of post-coordination makes retrieval less
dependent on modelling of individual concepts
 For example:
– Accurate retrieval of the following pre-coordinated
expression is dependent on the accuracy and
specificity of the defined causative agent
• 91936005 | allergy to penicillin |
– In contrast, the following post-coordinated
expression explicitly identifies the substance
• 416098002 | drug allergy | : 246075003 | causative agent | = 372725003 | penicillin V |
Disadvantages of post-coordination
(Note: issues addressed in more detail on subsequent slides)
 Human readability, entry and display
– ‘Extreme post-coordination’ leads to loss of natural terms
 Data entry
– Requiring users to select several items to build a post-coordinated
expression may be a burden for entry of common composite data items
 Storage
– Post-coordinated expressions have variable length so it
may be difficult to efficiently represent them in a database
table
 Retrieval
– Performance may be impaired by
• Storage issues that may prevent optimal indexing
• Complexity of testing query predicates against post-coordinated
expressions
Pre & post-coordination with SNOMED CT
 SNOMED CT supports both pre and post-coordination
– No absolute boundaries between them
 SNOMED CT enables computation of equivalence and
subsumption between pre and post-coordinated
expressions that have the same meaning
Addressing post-coordination issues
Human-readability
‘Extreme post-coordination’ leads to loss of natural terms
Example
If “appendectomy” is only represented as:
71388002 | procedure | : { 260686004 | method | = 129304002 | excision action | , 405813007 | procedure site - Direct | = 66754008 | appendix structure | }
the word ‘appendectomy’ is not present
Thus this supports neither searching by nor displaying the term that
clinical users expect to see
 Clinical ideas that are associated with common names
(other than composites that could be derived from a
post-coordinated expression) need to be represented by
adding concepts to SNOMED CT
Addressing post-coordination issues
Concept model limits
 Clinical ideas that cannot be fully represented by a
post-coordinated expression due to limitations of the
SNOMED CT Concept Model should be represented by
either:
– Adding concepts to SNOMED CT; or
– Using information model constructs to link expressions together
or to other related data
 The choice between these approaches depends on the
information to be represented. Only ideas that fit within
the scope and editorial guidelines applied to SNOMED
CT should result in addition of new concepts
Addressing post-coordination issues
Data entry
 The ‘too many keystrokes’ problem
– Multiple selections to build a post-coordinated
expression may be a burden for entry of common
composite data items
– Options to address this issue will be the subject of another
Implementation SIG webinar
Addressing post-coordination issues
Storage
 Post-coordinated expressions have variable
length so it may be difficult to efficiently
represent them in a database table
– Using the SNOMED CT grammar
• The shortest expression is 6 characters in length
• The length of an expression is theoretically unlimited
• Real examples exist with over 300 characters (using ids
only) or over 1,000 characters (including the term text)
 One way to address this is by using an
‘Expression Repository’
– This is described in the next few slides
Expression Repository
General approach
 Users and/or user-interface designers have full
access to post-coordinated expressions as a way to
record information
 When a post-coordinated expression is entered, it
is looked up in a repository of expressions
 If the expression is found in the repository
– The unique identifier associated with that expression is
stored in the record
 If the expression is not found in the repository
– It is added to the repository with a new unique identifier
– The new unique identifier is stored in the record
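The look-up-or-add behaviour described above can be sketched in a few lines. This is an in-memory illustration rather than a definitive design: the class and method names are hypothetical, and a real repository would be a persistent, backed-up database table.

```python
import uuid


class ExpressionRepository:
    """Maps each distinct literal expression to one unique identifier."""

    def __init__(self):
        self._uid_by_expression = {}
        self._expression_by_uid = {}

    def get_or_add(self, expression):
        """Return the existing uid, or allocate one if the expression is new."""
        uid = self._uid_by_expression.get(expression)
        if uid is None:
            uid = str(uuid.uuid4())
            self._uid_by_expression[expression] = uid
            self._expression_by_uid[uid] = expression
        return uid

    def expression_for(self, uid):
        """Dereference a stored uid back to the literal expression."""
        return self._expression_by_uid[uid]


repository = ExpressionRepository()
first_uid = repository.get_or_add("31978002 : 272741003 = 7771000")
second_uid = repository.get_or_add("31978002 : 272741003 = 7771000")
```

Entering the same expression twice yields the same identifier, so the record only ever stores a fixed-length reference.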
Expression Repository
Nature of the expression looked up
 The record looked up in, or added to, the repository
is the post-coordinated expression as:
– entered by the user via a generalized user-interface; or
– specified by the designer of a data entry form (or protocol)
to represent a particular user selection
 The post-coordinated expression is not transformed
to normal form before look up
 The order of refinements in the expression may be
sorted
– To avoid creating repository entries for expressions
containing identical refinements simply due to the order in
which a refinement was applied
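One possible way to sort refinements before look up is sketched below. The function name `sort_refinements` is hypothetical, and the sketch assumes identifier-only expressions with ungrouped refinements. It also normalises spacing, which is an extra assumption beyond what is stated above. It does not transform the expression to normal form.

```python
def sort_refinements(expression):
    """Return the expression with its refinements in a stable sorted order.

    Assumption: ids only (no '| term |' text), no refinement groups,
    no nested expressions.
    """
    focus, separator, refinement_text = expression.partition(":")
    if not separator:
        # Pre-coordinated expression: nothing to sort.
        return focus.strip()
    refinements = sorted(
        " = ".join(part.strip() for part in pair.split("="))
        for pair in refinement_text.split(",")
        if pair.strip()
    )
    return focus.strip() + " : " + ", ".join(refinements)


site_first = sort_refinements("71388002 : 405813007 = 66754008, 260686004 = 129304002")
method_first = sort_refinements("71388002 : 260686004 = 129304002, 405813007 = 66754008")
```

Both inputs produce the same text, so they map to a single repository entry.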
Expression Repository
Management of the repository
 Expression repository management is a fully
automated process
 Every distinct expression entered is recorded in the
repository with a unique identifier
 Entries in the repository are never deleted
 The repository must be backed up and kept secure so
the recorded data is not compromised
– Note: This backup is mission critical and needs to be
treated as part of the overall record system
 No manual intervention or terminology expert
involvement in repository maintenance
– The repository is simply a technical artefact that provides
a reference link between the literal expression and the
stored unique identifier
Expression Repository
Supporting communication of
expressions
 Communications use the literal form of the
expression
– Not the unique repository identifier
 When a record that contains an expression is sent
– The unique identifier is looked up in the repository
– The associated expression is included in the
communication
 When a record including an expression is received
– The expression received is looked up in the repository
– If the expression is not found a new entry is made in the
repository
– The associated unique identifier is stored in the record
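The send and receive steps above might look like the following sketch. The record and message are modelled as plain dictionaries, and all function and key names other than `expressionUid` are hypothetical. Each system keeps its own repository, here a dictionary from literal expression to local identifier.

```python
import uuid


def lookup_or_add(repository, expression):
    """repository maps literal expression text to a local uid."""
    if expression not in repository:
        repository[expression] = str(uuid.uuid4())
    return repository[expression]


def outgoing_message(repository, record):
    """Replace the local uid with the literal expression before sending."""
    expression_by_uid = {uid: text for text, uid in repository.items()}
    message = {k: v for k, v in record.items() if k != "expressionUid"}
    message["expression"] = expression_by_uid[record["expressionUid"]]
    return message


def incoming_record(repository, message):
    """Store the receiver's own uid for the expression that was received."""
    record = {k: v for k, v in message.items() if k != "expression"}
    record["expressionUid"] = lookup_or_add(repository, message["expression"])
    return record


sender_repository, receiver_repository = {}, {}
sent_uid = lookup_or_add(sender_repository, "31978002 : 272741003 = 7771000")
message = outgoing_message(sender_repository, {"patientId": "p1", "expressionUid": sent_uid})
received = incoming_record(receiver_repository, message)
```

The two systems end up with different identifiers for the same expression, which is harmless because only the literal form crosses the boundary.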
Expression Repository
Sharing and merging of the expression repositories
 No need to share the expression repository to support
communication
– The literal form of the expression is communicated
– Where necessary the target expression repository allocates a
new identifier for a received expression
 Systems may share an expression repository subject to
– Performance and data integrity considerations
– Organisational boundaries and responsibilities
– System architecture and software application design
 Expression repositories may need to be merged to
support changes in organisational structure and system
architecture
– The method of allocating identifiers should ensure global
uniqueness to facilitate future mergers without compromising
data integrity
Expression Repository
Associations between terms and expressions
 The expression repository must not contain any
manually specified terms associated with a post-coordinated expression
– The meaning of a post-coordinated expression in the
repository is no more and no less than the meaning
represented by its constituent parts.
– Excluding manually entered terms avoids the risk of
deviation between such terms and the inherent meaning
of the expression
 The expression identifier is just a reference to the
full expression
– This is the main difference between an entry in the
expression repository and a concept definition
– This is the key to avoiding manual maintenance of the
repository
Expression Repository
An example schema
Column       Datatype         Notes
uid          UUID             Unique identifier of Expression in the repository. Created by standard UUID/GUID generation algorithm.
hash         String (40)      Hash from expression used as search key (not necessarily unique)
expression   String (varlen)  Text of the expression
createdTime  UtcTime          Time stamp of creation of this entry
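A sketch of this schema in SQLite follows, showing how a fixed-length hash can act as the indexed search key while the variable-length expression text is compared only to resolve hash collisions. SHA-1 is an assumption made here because its hexadecimal digest is 40 characters long; the source does not name the hash algorithm. The table name `Expressions` and the function names are also illustrative.

```python
import hashlib
import sqlite3
import uuid
from datetime import datetime, timezone


def open_repository(path=":memory:"):
    connection = sqlite3.connect(path)
    connection.execute(
        """CREATE TABLE IF NOT EXISTS Expressions (
               uid         TEXT PRIMARY KEY,
               hash        TEXT NOT NULL,
               expression  TEXT NOT NULL,
               createdTime TEXT NOT NULL)"""
    )
    # Index the fixed-length hash rather than the variable-length text.
    connection.execute(
        "CREATE INDEX IF NOT EXISTS ExpressionsHash ON Expressions (hash)"
    )
    return connection


def expression_hash(expression):
    # Assumption: SHA-1, giving a 40-character hexadecimal string.
    return hashlib.sha1(expression.encode("utf-8")).hexdigest()


def get_or_add(connection, expression):
    digest = expression_hash(expression)
    rows = connection.execute(
        "SELECT uid, expression FROM Expressions WHERE hash = ?", (digest,)
    ).fetchall()
    for uid, stored in rows:
        # The hash is not necessarily unique, so confirm the full text.
        if stored == expression:
            return uid
    uid = str(uuid.uuid4())
    created = datetime.now(timezone.utc).isoformat()
    connection.execute(
        "INSERT INTO Expressions (uid, hash, expression, createdTime) "
        "VALUES (?, ?, ?, ?)",
        (uid, digest, expression, created),
    )
    connection.commit()
    return uid


connection = open_repository()
uid = get_or_add(connection, "31978002 : 272741003 = 7771000")
```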
Clinical record entry table
Partial example schema
Column         Datatype      Notes
uid            UUID          Unique identifier of record entry. Created by standard UUID/GUID generation algorithm.
patientId      UUID/LocalId  Patient identifier. Links record entry to the patient.
expressionUid  UUID          Reference to row in the Expression Repository. Refers to the Expression that was used in this record entry.
…              …             Various other fields containing text, values, units, dates, times, provenance, etc.
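As an illustration of how a record entry refers to the repository, the sketch below creates cut-down versions of both tables in SQLite and joins them to recover the literal expressions recorded for a patient. The table name `RecordEntries`, the function name and the sample identifiers are hypothetical, and the other fields of a real record entry are omitted.

```python
import sqlite3
import uuid

connection = sqlite3.connect(":memory:")
connection.executescript("""
    CREATE TABLE Expressions (
        uid        TEXT PRIMARY KEY,
        expression TEXT NOT NULL);
    CREATE TABLE RecordEntries (
        uid           TEXT PRIMARY KEY,
        patientId     TEXT NOT NULL,
        expressionUid TEXT NOT NULL REFERENCES Expressions (uid));
    -- A fixed-length reference column indexes predictably.
    CREATE INDEX RecordEntriesExpression ON RecordEntries (expressionUid);
""")

expression_uid = str(uuid.uuid4())
connection.execute(
    "INSERT INTO Expressions VALUES (?, ?)",
    (expression_uid, "31978002 : 272741003 = 7771000"),
)
connection.execute(
    "INSERT INTO RecordEntries VALUES (?, ?, ?)",
    (str(uuid.uuid4()), "patient-1", expression_uid),
)


def expressions_for_patient(connection, patient_id):
    """Dereference each entry's expressionUid to its literal expression."""
    rows = connection.execute(
        """SELECT e.expression
           FROM RecordEntries r
           JOIN Expressions e ON e.uid = r.expressionUid
           WHERE r.patientId = ?""",
        (patient_id,),
    )
    return [row[0] for row in rows]
```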
Addressing post-coordination issues
Retrieval
 Performance may be impaired by
– Storage issues limiting use of indexing
• Indexing of variable length fields is less efficient
• Some databases limit the maximum field length for indexing
– Complexity of testing query predicates against post-coordinated expressions
• Normal form transforms are tractable but repeating them each
time a retrieval request is made is unlikely to meet performance
requirements
• Alternative approaches using Description Logic classification
computation are similarly less efficient than a simple search for
a set of codes
 Enhancements to an ‘Expression Repository’ are
one way to address this issue
– This is described in the next few slides
Normal form optimisation
Expression Linkage
 Expression Repository
– Each expression in a record is represented by an expressionUid
– This can be dereferenced in the Expression Repository
 Expression Links
– An Expression Links Table represents links between
Expressions using the UIDs of the linked Expressions
 Normal Form Expression Generation
– Each new expression added to the Expression Repository is
transformed to its Normal Form
– The Normal Form expression is also looked up in and if
necessary added to the Expression Repository
– A link is created between an Expression and its Normal Form
Expression
 Rapid access to Normal Forms
– It is simple and fast to lookup the Normal Form for an
Expression by querying using the Expression Links Table
Normal form optimisation
Expression Link Maintenance
 For each new SNOMED CT release
– The normal forms of all expressions affected by
modelling changes are re-computed
– The new Normal Forms are looked up in (or, if
necessary, added to) the Expression Repository
– New Links are added to the Expression Links Table
– Use of the state-valid approach (as adopted in the
SNOMED CT Release Format 2) allows these new
links to supersede the pre-existing links as they have
a more recent effectiveTime
 The resulting maintenance burden can be
managed automatically
Expression Link Table
An example schema
Column         Datatype  Notes
sourceUid      UUID      UID of an Expression
linkType       Integer   Enumeration of transformation types. Allows alternative transformation results for the same expression (e.g. with/without context)
effectiveTime  UtcTime   Time stamp of creation of this entry. If normal forms change due to enhancement of the Concept Model and/or modification of the Concept Definitions, a new row with the same sourceUid and linkType but a newer effectiveTime supersedes this.
targetUid      UUID      The UID of the Normal Form Expression.
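The superseding behaviour of effectiveTime can be sketched in SQLite as follows. Rows are only ever added, and the current Normal Form is the row with the most recent effectiveTime for a given sourceUid and linkType. The value `NORMAL_FORM = 1`, the function names and the sample identifiers and dates are assumptions made for illustration. Timestamps are stored as ISO 8601 UTC text so that text order matches time order.

```python
import sqlite3

NORMAL_FORM = 1  # hypothetical linkType value; the enumeration is not specified


def create_links_table(connection):
    connection.execute(
        """CREATE TABLE ExpressionLinks (
               sourceUid     TEXT NOT NULL,
               linkType      INTEGER NOT NULL,
               effectiveTime TEXT NOT NULL,
               targetUid     TEXT NOT NULL,
               PRIMARY KEY (sourceUid, linkType, effectiveTime))"""
    )


def add_link(connection, source_uid, link_type, effective_time, target_uid):
    # An older row is superseded by a newer one, never updated or deleted.
    connection.execute(
        "INSERT INTO ExpressionLinks VALUES (?, ?, ?, ?)",
        (source_uid, link_type, effective_time, target_uid),
    )


def current_target(connection, source_uid, link_type, as_of=None):
    """Return the targetUid in force now, or at the time given by as_of."""
    row = connection.execute(
        """SELECT targetUid FROM ExpressionLinks
           WHERE sourceUid = ? AND linkType = ?
             AND (? IS NULL OR effectiveTime <= ?)
           ORDER BY effectiveTime DESC
           LIMIT 1""",
        (source_uid, link_type, as_of, as_of),
    ).fetchone()
    return row[0] if row else None


connection = sqlite3.connect(":memory:")
create_links_table(connection)
add_link(connection, "uid-close-to-user", NORMAL_FORM,
         "2010-01-31T00:00:00Z", "uid-normal-form-a")
add_link(connection, "uid-close-to-user", NORMAL_FORM,
         "2010-07-31T00:00:00Z", "uid-normal-form-b")
```

Passing `as_of` gives the view that applied at an earlier release, which is the benefit of keeping superseded rows.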
Retrieval Optimization
Expression Repository Transitive Closure
 For added performance the Expression
Repository can be “auto-classified” using a
description logic classifier.
 The results of classification can be appended to a
transitive closure table
– This Expression Transitive Closure Table enables
instant subsumption testing using general purpose
SQL queries
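A sketch of subsumption testing against such a table is shown below, using SQLite. The table and column names, the function names and the sample rows are all hypothetical. In practice the closure rows would be produced by the classifier, not typed in by hand.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.executescript("""
    CREATE TABLE ExpressionTransitiveClosure (
        subtypeUid   TEXT NOT NULL,
        supertypeUid TEXT NOT NULL,
        PRIMARY KEY (subtypeUid, supertypeUid));
    CREATE INDEX ClosureBySupertype
        ON ExpressionTransitiveClosure (supertypeUid, subtypeUid);
    CREATE TABLE RecordEntries (
        uid TEXT PRIMARY KEY, patientId TEXT, expressionUid TEXT);
""")

# Placeholder rows standing in for classifier output.
connection.executemany(
    "INSERT INTO ExpressionTransitiveClosure VALUES (?, ?)",
    [("uid-fracture-left-tibia", "uid-fracture-tibia"),
     ("uid-fracture-left-tibia", "uid-fracture"),
     ("uid-fracture-tibia", "uid-fracture")],
)
connection.executemany(
    "INSERT INTO RecordEntries VALUES (?, ?, ?)",
    [("entry-1", "patient-1", "uid-fracture-left-tibia"),
     ("entry-2", "patient-2", "uid-fracture-tibia"),
     ("entry-3", "patient-3", "uid-asthma")],
)


def is_subsumed_by(connection, candidate_uid, supertype_uid):
    """True if the candidate is the supertype itself or one of its subtypes."""
    if candidate_uid == supertype_uid:
        return True
    row = connection.execute(
        "SELECT 1 FROM ExpressionTransitiveClosure "
        "WHERE subtypeUid = ? AND supertypeUid = ?",
        (candidate_uid, supertype_uid),
    ).fetchone()
    return row is not None


def entries_subsumed_by(connection, supertype_uid):
    """Record entries whose expression is the supertype or any subtype of it."""
    rows = connection.execute(
        """SELECT uid FROM RecordEntries
           WHERE expressionUid = ?
              OR expressionUid IN (SELECT subtypeUid
                                   FROM ExpressionTransitiveClosure
                                   WHERE supertypeUid = ?)
           ORDER BY uid""",
        (supertype_uid, supertype_uid),
    )
    return [row[0] for row in rows]
```

Each test is a single indexed look up, with no normal form transformation or classification at query time.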
Retrieval Optimization
Expression Repository Transitive Closure Maintenance
 For each new SNOMED CT release
– Expression Repository Transitive Closure (ERTC)
needs to be recomputed
– The computed ERTC can use an effectiveTime to
enable views of the subsumption relationships
between expressions after every release or change
 This process can be automated resulting in
minimization of the maintenance burden
Demonstration
 Demonstration of
– Expression Repository
– Expression Link Table
 Using
– CliniClue Xplore to
• Select expressions
• Generate Normal Forms
– Access database with following tables
• Expressions : The Expression Repository
• Journals : A simple record mockup
• ExpressionLinks : The ExpressionLinks to NormalForms
Summary
The value of post-coordination
 Attempts to avoid the use of post-coordination
by adding pre-coordinated concepts to meet
every requirement
– Increase the size of the terminology
– Introduce risks of errors from modelling
• In particular mismatches between descriptions and
associated concept definitions
– Create a growing maintenance burden
– The impact of these factors is probably even greater
if additions are made in Extensions as a result of
duplication of effort on different and potentially
divergent Extensions
Summary
Practical approaches to storage and retrieval of
post-coordinated expressions
 An Expression Repository
– Enables predictable, indexable storage of post-coordinated expressions
– Allows post-coordinated expressions to be entered,
stored, retrieved and communicated
 Adding an Expression Links Table
– Allows rapid access to Normal Forms
 Adding an Expression Transitive Closure
– Supports high performance subtype testing
 These tables can all be maintained by software
without manual input
Questions?