Feb 022010
 

In creating a process where SQL Server takes XML data that is stored in a table and populates a table, I have learned a few things on how to get SQL Server to more efficiently query XML data.

Setup:

Let’s first create a table to store the data:

CREATE TABLE dbo.XMLDataStore(
  link nvarchar(100) NULL,
  data xml NULL
) ON [PRIMARY]
GO

Let’s take a look at the XML that we are going to query (yes I know is is a large xml document, but I wanted to show the peformance in a real world situation, not a 3 node xml document that is normally demonstrated)

Now let’s insert it into the table:

INSERT INTO [XMLTutorial].[dbo].[XMLDataStore]
           ([link]
           ,[data])
     VALUES
           ('ABCDEFGHIJ'
           ,'<ns0:ORU_R01_231_GLO_DEF xmlns:fo="http://www.w3.org/1999/XSL/Format" xmlns:ns0="http://labratory/DB/2X">
               ...
            </ns0:ORU_R01_231_GLO_DEF>')
GO

The first query starts at the Observation record (all 64 records) and traverses the xml document and creates the necessary columns in 16 seconds.

select  Observation.ref.value('((../../Patient/PID_PatientIdentificationSegment/PID_5_PatientName/XPN_1_GivenName/text())[1])','nvarchar(100)') as FirstName,
    Observation.ref.value('((../../Patient/PID_PatientIdentificationSegment/PID_5_PatientName/XPN_0_FamilyLastName/XPN_0_0_FamilyName/text())[1])','nvarchar(100)') as LastName,
    Observation.ref.value('((../../Patient/PID_PatientIdentificationSegment/PID_7_DateTimeOfBirth/TS_0_TimeOfAnEvent/text())[1])','nvarchar(100)') as BirthDate,
    Observation.ref.value('((../../Patient/PID_PatientIdentificationSegment/PID_2_PatientId/CX_0_Id/text())[1])','nvarchar(100)') as InsuranceNumber,
    Observation.ref.value('((../OBR_ObservationRequestSegment/OBR_1_SetIdObr/text())[1])','nvarchar(100)')as [OBRID],
    Observation.ref.value('((../OBR_ObservationRequestSegment/OBR_7_ObservationDateTime/TS_0_TimeOfAnEvent/text())[1])','nvarchar(10)') as ObservationDate,
    Observation.ref.value('((../OBR_ObservationRequestSegment/OBR_4_UniversalServiceId/CE_1_Text/text())[1])','nvarchar(100)') as LabTestName,
    null as LabTestCode,
    Observation.ref.value('((./OBX_ObservationResultSegment/OBX_1_SetIdObx/text())[1])','nvarchar(100)') as [OBXID],
    Observation.ref.value('((./OBX_ObservationResultSegment/OBX_3_ObservationIdentifier/CE_4_AlternateText/text())[1])','nvarchar(100)') as LabResultName,
    Observation.ref.value('((./OBX_ObservationResultSegment/OBX_3_ObservationIdentifier/CE_0_Identifier/text())[1])','nvarchar(100)') as LabResultCode,
    Observation.ref.value('((./OBR_5_PriorityObr/text())[1])','nvarchar(100)') as [Priority],
    Observation.ref.value('((./OBX_ObservationResultSegment/OBX_14_DateTimeOfTheObservation/TS_0_TimeOfAnEvent/text())[1])','nvarchar(100)') as [ResultDate],
    Observation.ref.value('((./OBX_ObservationResultSegment/OBX_5_ObservationValue/CE_4_AlternateText/text())[1])','nvarchar(100)') as [ResultValue],
    Observation.ref.value('((./OBX_ObservationResultSegment/OBX_6_Units/CE_0_Identifier/text())[1])','nvarchar(100)') as [UnitOfMeasure],
    Observation.ref.value('((./OBX_ObservationResultSegment/OBX_7_ReferencesRange/text())[1])','nvarchar(100)') as [ReferenceRange]
from XMLDataStore x cross apply x.data.nodes('//Observation') Observation(ref)
where x.link='ABCDEFGHIJ'

QueryResults1

The first optimization step is instead of using the Observation node and deriving all of the other columns from that, you can further use CROSS APPLY to create separate nodes to extract data from coupled with not using wild cards and diving directly to the exact node I need to. I went to the OBX_ObservationResultSegment as the originating node, and then from that node (named Observation), I derived two other nodes to reference in the query; Patient and Request.

This time the query completed in 4 seconds:

WITH XMLNAMESPACES ('http://labratory/DB/2X' AS "ns0")
select  Patient.node.value('(PID_5_PatientName/XPN_1_GivenName/text())[1]','nvarchar(100)') as FirstName,
    Patient.node.value('(PID_5_PatientName/XPN_0_FamilyLastName/XPN_0_0_FamilyName/text())[1]','nvarchar(100)') as LastName,
    Patient.node.value('(PID_7_DateTimeOfBirth/TS_0_TimeOfAnEvent/text())[1]','nvarchar(100)') as BirthDate,
    Patient.node.value('(PID_2_PatientId/CX_0_Id/text())[1]','nvarchar(100)') as InsuranceNumber,
    Request.node.value('(OBR_1_SetIdObr/text())[1]','nvarchar(100)')as [OBRID],
    Request.node.value('(OBR_7_ObservationDateTime/TS_0_TimeOfAnEvent/text())[1]','nvarchar(10)') as ObservationDate,
    Request.node.value('(OBR_4_UniversalServiceId/CE_1_Text/text())[1]','nvarchar(100)') as LabTestName,
    null as LabTestCode,
    Observation.node.value('(OBX_1_SetIdObx/text())[1]','nvarchar(100)') as [OBXID],
    Observation.node.value('(OBX_3_ObservationIdentifier/CE_4_AlternateText/text())[1]','nvarchar(100)') as LabResultName,
    Observation.node.value('(OBX_3_ObservationIdentifier/CE_0_Identifier/text())[1]','nvarchar(100)') as LabResultCode,
    Observation.node.value('(OBR_5_PriorityObr/text())[1]','nvarchar(100)') as [Priority],
    Observation.node.value('(OBX_14_DateTimeOfTheObservation/TS_0_TimeOfAnEvent/text())[1]','nvarchar(100)') as [ResultDate],
    Observation.node.value('(OBX_5_ObservationValue/CE_4_AlternateText/text())[1]','nvarchar(100)') as [ResultValue],
    Observation.node.value('(OBX_6_Units/CE_0_Identifier/text())[1]','nvarchar(100)') as [UnitOfMeasure],
    Observation.node.value('(OBX_7_ReferencesRange/text())[1]','nvarchar(100)') as [ReferenceRange]
from XMLDataStore x
cross apply x.data.nodes('ns0:ORU_R01_231_GLO_DEF/CompleteOrder/Order/Observation/OBX_ObservationResultSegment') Observation(node)
cross apply Observation.node.nodes('../../../Patient/PID_PatientIdentificationSegment') Patient(node)
cross apply Observation.node.nodes('../../OBR_ObservationRequestSegment') Request(node)
where x.link='ABCDEFGHIJ'

QueryResults2

Never being satisfied, let’s add an index to the data. However, to add an xml index to the data, we need to create a clustered index on the table. If we simply try to add an xml index to the current table with this command:

CREATE PRIMARY XML INDEX PrimaryXMLIndex ON
dbo.XMLDataStore(data)
GO

We get the following error:

Msg 6332, Level 16, State 201, Line 1
Table 'dbo.XMLDataStore' needs to have a clustered primary key with less than 16 columns in it in order to create a primary XML index on it.

Not descriptive, so let’s create a new table:

CREATE TABLE dbo.OptimizedXMLDataStore(
  id INT IDENTITY PRIMARY KEY,
  link nvarchar(100) NOT NULL,
  data xml NOT NULL
) ON [PRIMARY]
GO

And creating the following indexes in the database:

CREATE PRIMARY XML INDEX PrimaryXMLIndex ON
dbo.OptimizedXMLDataStore(data)
GO
CREATE XML INDEX
XMLDataStore_XmlCol_PATH ON dbo.OptimizedXMLDataStore(data)
USING XML INDEX PrimaryXMLIndex FOR PATH
GO

Now that the indexes are created, let’s insert the data into this table:

INSERT INTO [XMLTutorial].[dbo].[OptimizedXMLDataStore]
           ([link]
           ,[data])
     VALUES
           ('ABCDEFGHIJ'
           ,'<ns0:ORU_R01_231_GLO_DEF xmlns:fo="http://www.w3.org/1999/XSL/Format" xmlns:ns0="http://labratory/DB/2X">
               ...
            </ns0:ORU_R01_231_GLO_DEF>')
GO

Now let’s run the same query (except pointing to the indexed table):

WITH XMLNAMESPACES ('http://labratory/DB/2X' AS "ns0")
select  Patient.node.value('(PID_5_PatientName/XPN_1_GivenName/text())[1]','nvarchar(100)') as FirstName,
    Patient.node.value('(PID_5_PatientName/XPN_0_FamilyLastName/XPN_0_0_FamilyName/text())[1]','nvarchar(100)') as LastName,
    Patient.node.value('(PID_7_DateTimeOfBirth/TS_0_TimeOfAnEvent/text())[1]','nvarchar(100)') as BirthDate,
    Patient.node.value('(PID_2_PatientId/CX_0_Id/text())[1]','nvarchar(100)') as InsuranceNumber,
    Request.node.value('(OBR_1_SetIdObr/text())[1]','nvarchar(100)')as [OBRID],
    Request.node.value('(OBR_7_ObservationDateTime/TS_0_TimeOfAnEvent/text())[1]','nvarchar(10)') as ObservationDate,
    Request.node.value('(OBR_4_UniversalServiceId/CE_1_Text/text())[1]','nvarchar(100)') as LabTestName,
    null as LabTestCode,
    Observation.node.value('(OBX_1_SetIdObx/text())[1]','nvarchar(100)') as [OBXID],
    Observation.node.value('(OBX_3_ObservationIdentifier/CE_4_AlternateText/text())[1]','nvarchar(100)') as LabResultName,
    Observation.node.value('(OBX_3_ObservationIdentifier/CE_0_Identifier/text())[1]','nvarchar(100)') as LabResultCode,
    Observation.node.value('(OBR_5_PriorityObr/text())[1]','nvarchar(100)') as [Priority],
    Observation.node.value('(OBX_14_DateTimeOfTheObservation/TS_0_TimeOfAnEvent/text())[1]','nvarchar(100)') as [ResultDate],
    Observation.node.value('(OBX_5_ObservationValue/CE_4_AlternateText/text())[1]','nvarchar(100)') as [ResultValue],
    Observation.node.value('(OBX_6_Units/CE_0_Identifier/text())[1]','nvarchar(100)') as [UnitOfMeasure],
    Observation.node.value('(OBX_7_ReferencesRange/text())[1]','nvarchar(100)') as [ReferenceRange]
from OptimizedXMLDataStore x
cross apply x.data.nodes('ns0:ORU_R01_231_GLO_DEF/CompleteOrder/Order/Observation/OBX_ObservationResultSegment') Observation(node)
cross apply Observation.node.nodes('../../../Patient/PID_PatientIdentificationSegment') Patient(node)
cross apply Observation.node.nodes('../../OBR_ObservationRequestSegment') Request(node)
where x.link='ABCDEFGHIJ'

The results came back in 0 seconds

QueryResults3

Things I did not do:

  1. Actually see Clark Kent (I think he was born before June of 1938, but it was the first time he was writtent about)
  2. Question why the “The Last Son of Krypton” was actually getting lab work done
  3. Imported schemas into the database