Feb 23, 2010
 

I have had to look up this query for getting a table's row count a few times, so I thought I would write it down here:

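USE [DatabaseName]
GO
-- index_id 0 = heap, 1 = clustered index, so each row is counted once.
-- SUM adds up the partitions if the table is partitioned.
SELECT SUM(st.row_count) AS [Row Count]
FROM sys.dm_db_partition_stats st
WHERE st.index_id < 2
  AND st.object_id = OBJECT_ID('TableName')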

Feb 10, 2010
 

I have had to explain how to use the BulkXML adapter a few times, so I decided to write a quick tutorial. It is also a tutorial for me to follow, since every time I do this I re-learn the same things.

First, we create the tables in the database along with their relationships. You can get a jump start by downloading this script:

You will have a table structure like this:

[Image: SampleTables]
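
If you do not have the script handy, a minimal sketch of the three tables might look like the following. The table and column names are inferred from the sql:relation, sql:field, and sql:relationship annotations in the mapping schema later in this post. The data types, lengths, IDENTITY keys, and constraint names are assumptions and may differ from the original script.

```sql
-- Hypothetical DDL: table and column names come from the mapping schema
-- annotations; data types, lengths, and constraint names are assumptions.
CREATE TABLE dbo.Header (
  fileid     INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_Header PRIMARY KEY,
  FileName   NVARCHAR(260) NULL,
  ImportDate NVARCHAR(50)  NULL   -- the schema types the Date element as xs:string
);
GO
CREATE TABLE dbo.Family (
  FamilyId  INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_Family PRIMARY KEY,
  HeaderId  INT NOT NULL CONSTRAINT FK_Family_Header REFERENCES dbo.Header (fileid),
  [Name]    NVARCHAR(100) NULL,
  Address   NVARCHAR(200) NULL,
  City      NVARCHAR(100) NULL,
  [State]   NVARCHAR(2)   NULL,
  Zip       NVARCHAR(10)  NULL
);
GO
CREATE TABLE dbo.Child (
  ChildId   INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_Child PRIMARY KEY,
  FamilyId  INT NOT NULL CONSTRAINT FK_Child_Family REFERENCES dbo.Family (FamilyId),
  ChildName NVARCHAR(100) NULL,
  Gender    NCHAR(1) NULL
);
GO
```

The IDENTITY keys are a guess based on the KeepIdentity setting used further down; if your keys are supplied in the XML instead, drop the IDENTITY property.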

and an XML file that looks like this:

<ns0:File xmlns:ns0="http://BulkXMLSample.Input">
  <FamilyRecord>
    <Name>Stott</Name>
    <Address>100 N 100 W</Address>
    <City>Logan</City>
    <State>UT</State>
    <Zip>84321</Zip>
    <Child><Name>Bob</Name><Sex>M</Sex></Child>
    <Child><Name>Susan</Name><Sex>F</Sex></Child>
    <Child><Name>Mary</Name><Sex>F</Sex></Child>
    <Child><Name>Jane</Name><Sex>F</Sex></Child>
    <Child><Name>Jeb</Name><Sex>M</Sex></Child>
  </FamilyRecord>
  <FamilyRecord>
    <Name>Dahl</Name>
    <Address>45 Polk Ave</Address>
    <City>Blaine</City>
    <State>MN</State>
    <Zip>54321</Zip>
    <Child><Name>Jason</Name><Sex>M</Sex></Child>
  </FamilyRecord>
  <FamilyRecord>
    <Name>Matthew</Name>
    <Address>232 Acadia Ave</Address>
    <City>St. Louis</City>
    <State>MO</State>
    <Zip>78223</Zip>
    <Child><Name>William</Name><Sex>M</Sex></Child>
    <Child><Name>Jennifer</Name><Sex>F</Sex>
    </Child>
  </FamilyRecord>
</ns0:File>

The next thing is to create the input and output schemas:

[Image: bulkoutputschema]

and the map:

[Image: bulkconversionmap]

Now we need to create the ‘mapping’ schema, which tells the adapter how the XML that BizTalk produces maps onto the database tables.

To speed up the creation of the mapping, download the following file and place it in the schema directory of Visual Studio (%InstallRoot%\Xml\Schemas):

Let’s open up the output schema using the XML Editor:

[Image: openwith]

Make the following changes to the output schema:

<?xml version="1.0" encoding="utf-16"?>
<xs:schema xmlns:b="http://schemas.microsoft.com/BizTalk/2003" xmlns="http://BulkXMLSample.Output" targetNamespace="http://BulkXMLSample.Output" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:sql="urn:schemas-microsoft-com:mapping-schema">
  <!--Add xmlns:sql="urn:schemas-microsoft-com:mapping-schema" to xs:schema element-->
  <xs:annotation>
    <xs:appinfo>
      <b:schemaInfo root_reference="File" xmlns:b="http://schemas.microsoft.com/BizTalk/2003" />
    </xs:appinfo>
  </xs:annotation>
  <xs:element name="File" sql:relation="Header" sql:key-fields="fileid">
    <!--We want to 'map' the element File to the table Header, since they are different we have to use the sql:relation-->
    <!--We also need to tell which is the primary key column-->
    <xs:complexType>
      <xs:sequence>
        <xs:element name="FileName" type="xs:string" />
        <xs:element name="Date" type="xs:string" />
        <xs:element maxOccurs="unbounded" name="FamilyRecord" sql:relation="Family" sql:key-fields="FamilyId">
          <!--Now we need to start defining how this and its parent are related-->
          <xs:annotation>
            <xs:appinfo>
              <sql:relationship parent="Header" parent-key="fileid" child-key="HeaderId" child="Family" />
              <!--The parent and parent-key are defined in the parent table and child and child-key are the columns in the current table-->
            </xs:appinfo>
          </xs:annotation>
          <xs:complexType>
            <xs:sequence>
              <xs:element name="Name" type="xs:string" />
              <xs:element name="Address" type="xs:string" />
              <xs:element name="City" type="xs:string" />
              <xs:element name="State" type="xs:string" />
              <xs:element name="Zip" type="xs:string" />
              <xs:element maxOccurs="unbounded" name="Child" sql:relation="Child" sql:key-fields="ChildId">
                <!--Added the sql:relation and sql:key-fields annotations-->
                <xs:annotation>
                  <xs:appinfo>
                    <sql:relationship parent="Family" parent-key="FamilyId" child="Child" child-key="FamilyId" />
                  </xs:appinfo>
                </xs:annotation>
                <!--I added the entire xs:annotation section-->
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="Name" type="xs:string" sql:field="ChildName" />
                    <!--Added the sql:field since it was different-->
                    <xs:element name="Sex" type="xs:string" sql:field="Gender"/>
                    <!--Added the sql:field since it was different-->
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

Now let’s save a copy of this output schema in the send handler folder:

[Image: bulkloadsendhandler]

The adapter works out which schema file to use from the message type (http://targetnamespace#rootnode): it skips the first 7 characters (the http:// prefix), replaces # with _, and appends .xsd. Our message type is http://BulkXMLSample.Output#File, so the adapter looks for BulkXMLSample.Output_File.xsd.

[Image: bulkschemalocation]
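
The naming rule is easy to check by hand. As a small sketch, the snippet below reproduces it with plain T-SQL string functions. It is only an illustration of the rule as described, not code from the adapter, and the variable name @messageType is made up.

```sql
-- Illustration only: mimics the naming rule with T-SQL string functions.
DECLARE @messageType nvarchar(400);
SET @messageType = N'http://BulkXMLSample.Output#File';

-- Start at character 8 (skipping the 7 characters of 'http://'),
-- swap # for _, and append .xsd.
SELECT REPLACE(SUBSTRING(@messageType, 8, LEN(@messageType)), N'#', N'_') + N'.xsd' AS SchemaFileName;
-- Returns BulkXMLSample.Output_File.xsd
```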

In the orchestration, the construct shape contains a message assignment shape with the following code:

xpath(OutputMsg.MessagePart,"/*[local-name()='File' and namespace-uri()='http://BulkXMLSample.Output']/*[local-name()='FileName' and namespace-uri()='']")=System.IO.Path.GetFileName(InputMsg(FILE.ReceivedFileName));

I deploy it and set the send port to point to the database:

[Image: bulksendportsettings]

Bind it and run a file through:

I look in the event log and see this:

Event Type:    Error
Event Source:    SQLBulkXML
Event Category:    None
Event ID:    0
Date:        2/9/2010
Time:        9:52:27 PM
User:        N/A
Computer:
Description:
The column ‘Date’ was defined in the schema, but does not exist in the database.
   at SQLXMLBULKLOADLib.SQLXMLBulkLoad4Class.Execute(String bstrSchemaFile, Object vDataFile)
   at StottIS.BulkLoad.BulkLoadXML.ThreadProc()
SQLXMLBulkLoad4Class error: The column ‘Date’ was defined in the schema, but does not exist in the database.
The Visual Basic Script to test this message is here:C:\Users\Administrator\AppData\Local\Temp\1e0e7b04-92e1-4f39-a591-2bdedef93a8f.vbs

So without having to run the data through BizTalk again, I can look at the data the adapter tried to insert and run the load manually with that script.

I have one of the elements mapped incorrectly to the column, so I modify the schema:

        <xs:element name="Date" type="xs:string" sql:field="ImportDate" />
        <!--I need to map the Date element into the ImportDate column, I use the sql:field attribute to do it-->
        <xs:element maxOccurs="unbounded" name="FamilyRecord" sql:relation="Family" sql:key-fields="FamilyId">

If you run the .vbs script and get this error:

<?xml version="1.0"?>
<Result State="FAILED">
 <Error>
  <HResult>0x80004005</HResult>
  <Description>
   <![CDATA[No data was provided for column '{Primary Key Column}' on table '{Table}', and this column cannot contain NULL values.]]>
  </Description>
  <Source>General operational error</Source>
  <Type>FATAL</Type>
 </Error>
</Result>

add this line to the .vbs file, so that SQL Server generates the identity values instead of expecting them in the XML:

objBL.KeepIdentity=FALSE

After that change, I ran the .vbs file and the load succeeded.

[Image: bulkdone]

I can open up the tables:

[Image: RelationalData]
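
A quick way to confirm the relationships were populated is to join the three tables back together. This is a sketch that assumes the table and column names implied by the mapping schema annotations (Header.fileid, Family.HeaderId, Child.FamilyId, and so on) and the dbo schema; adjust it to your actual tables.

```sql
-- Assumes the table/column names implied by the mapping schema annotations.
SELECT h.FileName,
       h.ImportDate,
       f.Name      AS FamilyName,
       f.City,
       f.State,
       c.ChildName,
       c.Gender
FROM dbo.Header h
JOIN dbo.Family f ON f.HeaderId = h.fileid      -- sql:relationship Header -> Family
JOIN dbo.Child  c ON c.FamilyId = f.FamilyId    -- sql:relationship Family -> Child
ORDER BY f.FamilyId, c.ChildId;
```

With the sample file above, each child row should come back next to its own family and the file it arrived in.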

Now that I have perfected the annotations to the mapping schema, I can run it through BizTalk without error.

The final schema used by the bulk load adapter is here:

Here is the sample data:

If you want to change the vbs script, here are the things you can change in the Bulk Load Object Model

Here are more definitions for the Bulk Load Annotations

Feb 02, 2010
 

While building a process in which SQL Server reads XML stored in one table and populates another, I learned a few things about getting SQL Server to query XML data more efficiently.

Setup:

Let’s first create a table to store the data:

CREATE TABLE dbo.XMLDataStore(
  link nvarchar(100) NULL,
  data xml NULL
) ON [PRIMARY]
GO

Let’s take a look at the XML that we are going to query (yes, I know it is a large XML document, but I wanted to show the performance in a real-world situation, not the 3-node XML document that is normally demonstrated):

Now let’s insert it into the table:

INSERT INTO [XMLTutorial].[dbo].[XMLDataStore]
           ([link]
           ,[data])
     VALUES
           ('ABCDEFGHIJ'
           ,'<ns0:ORU_R01_231_GLO_DEF xmlns:fo="http://www.w3.org/1999/XSL/Format" xmlns:ns0="http://labratory/DB/2X">
               ...
            </ns0:ORU_R01_231_GLO_DEF>')
GO

The first query starts at the Observation nodes (all 64 of them), walks up and down the XML document from each one, and produces the required columns in 16 seconds.

select  Observation.ref.value('((../../Patient/PID_PatientIdentificationSegment/PID_5_PatientName/XPN_1_GivenName/text())[1])','nvarchar(100)') as FirstName,
    Observation.ref.value('((../../Patient/PID_PatientIdentificationSegment/PID_5_PatientName/XPN_0_FamilyLastName/XPN_0_0_FamilyName/text())[1])','nvarchar(100)') as LastName,
    Observation.ref.value('((../../Patient/PID_PatientIdentificationSegment/PID_7_DateTimeOfBirth/TS_0_TimeOfAnEvent/text())[1])','nvarchar(100)') as BirthDate,
    Observation.ref.value('((../../Patient/PID_PatientIdentificationSegment/PID_2_PatientId/CX_0_Id/text())[1])','nvarchar(100)') as InsuranceNumber,
    Observation.ref.value('((../OBR_ObservationRequestSegment/OBR_1_SetIdObr/text())[1])','nvarchar(100)')as [OBRID],
    Observation.ref.value('((../OBR_ObservationRequestSegment/OBR_7_ObservationDateTime/TS_0_TimeOfAnEvent/text())[1])','nvarchar(10)') as ObservationDate,
    Observation.ref.value('((../OBR_ObservationRequestSegment/OBR_4_UniversalServiceId/CE_1_Text/text())[1])','nvarchar(100)') as LabTestName,
    null as LabTestCode,
    Observation.ref.value('((./OBX_ObservationResultSegment/OBX_1_SetIdObx/text())[1])','nvarchar(100)') as [OBXID],
    Observation.ref.value('((./OBX_ObservationResultSegment/OBX_3_ObservationIdentifier/CE_4_AlternateText/text())[1])','nvarchar(100)') as LabResultName,
    Observation.ref.value('((./OBX_ObservationResultSegment/OBX_3_ObservationIdentifier/CE_0_Identifier/text())[1])','nvarchar(100)') as LabResultCode,
    Observation.ref.value('((./OBR_5_PriorityObr/text())[1])','nvarchar(100)') as [Priority],
    Observation.ref.value('((./OBX_ObservationResultSegment/OBX_14_DateTimeOfTheObservation/TS_0_TimeOfAnEvent/text())[1])','nvarchar(100)') as [ResultDate],
    Observation.ref.value('((./OBX_ObservationResultSegment/OBX_5_ObservationValue/CE_4_AlternateText/text())[1])','nvarchar(100)') as [ResultValue],
    Observation.ref.value('((./OBX_ObservationResultSegment/OBX_6_Units/CE_0_Identifier/text())[1])','nvarchar(100)') as [UnitOfMeasure],
    Observation.ref.value('((./OBX_ObservationResultSegment/OBX_7_ReferencesRange/text())[1])','nvarchar(100)') as [ReferenceRange]
from XMLDataStore x cross apply x.data.nodes('//Observation') Observation(ref)
where x.link='ABCDEFGHIJ'

[Image: QueryResults1]

The first optimization is to stop deriving every column from the Observation node. Instead, use additional CROSS APPLY clauses to create separate node sets to extract data from, drop the wildcard (//), and give the exact path to the nodes you need. I used OBX_ObservationResultSegment as the originating node (aliased Observation), and from that node I derived two other node sets to reference in the query: Patient and Request.

This time the query completed in 4 seconds:

WITH XMLNAMESPACES ('http://labratory/DB/2X' AS "ns0")
select  Patient.node.value('(PID_5_PatientName/XPN_1_GivenName/text())[1]','nvarchar(100)') as FirstName,
    Patient.node.value('(PID_5_PatientName/XPN_0_FamilyLastName/XPN_0_0_FamilyName/text())[1]','nvarchar(100)') as LastName,
    Patient.node.value('(PID_7_DateTimeOfBirth/TS_0_TimeOfAnEvent/text())[1]','nvarchar(100)') as BirthDate,
    Patient.node.value('(PID_2_PatientId/CX_0_Id/text())[1]','nvarchar(100)') as InsuranceNumber,
    Request.node.value('(OBR_1_SetIdObr/text())[1]','nvarchar(100)')as [OBRID],
    Request.node.value('(OBR_7_ObservationDateTime/TS_0_TimeOfAnEvent/text())[1]','nvarchar(10)') as ObservationDate,
    Request.node.value('(OBR_4_UniversalServiceId/CE_1_Text/text())[1]','nvarchar(100)') as LabTestName,
    null as LabTestCode,
    Observation.node.value('(OBX_1_SetIdObx/text())[1]','nvarchar(100)') as [OBXID],
    Observation.node.value('(OBX_3_ObservationIdentifier/CE_4_AlternateText/text())[1]','nvarchar(100)') as LabResultName,
    Observation.node.value('(OBX_3_ObservationIdentifier/CE_0_Identifier/text())[1]','nvarchar(100)') as LabResultCode,
    Observation.node.value('(OBR_5_PriorityObr/text())[1]','nvarchar(100)') as [Priority],
    Observation.node.value('(OBX_14_DateTimeOfTheObservation/TS_0_TimeOfAnEvent/text())[1]','nvarchar(100)') as [ResultDate],
    Observation.node.value('(OBX_5_ObservationValue/CE_4_AlternateText/text())[1]','nvarchar(100)') as [ResultValue],
    Observation.node.value('(OBX_6_Units/CE_0_Identifier/text())[1]','nvarchar(100)') as [UnitOfMeasure],
    Observation.node.value('(OBX_7_ReferencesRange/text())[1]','nvarchar(100)') as [ReferenceRange]
from XMLDataStore x
cross apply x.data.nodes('ns0:ORU_R01_231_GLO_DEF/CompleteOrder/Order/Observation/OBX_ObservationResultSegment') Observation(node)
cross apply Observation.node.nodes('../../../Patient/PID_PatientIdentificationSegment') Patient(node)
cross apply Observation.node.nodes('../../OBR_ObservationRequestSegment') Request(node)
where x.link='ABCDEFGHIJ'

[Image: QueryResults2]
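
The same two query shapes are easier to compare on a toy document. The sketch below uses a made-up XML variable (the Orders, Order, Customer, and Item names are hypothetical and have nothing to do with the lab document). On a document this small both forms run instantly; it only shows the pattern, not the timing difference.

```sql
DECLARE @doc xml;
SET @doc = N'<Orders>
  <Order>
    <Customer><Name>Stott</Name></Customer>
    <Item><Sku>A1</Sku></Item>
    <Item><Sku>B2</Sku></Item>
  </Order>
</Orders>';

-- Shape of the first query: descendant search (//), then a parent-axis
-- walk repeated inside every value() call.
SELECT i.ref.value('(../Customer/Name/text())[1]', 'nvarchar(50)') AS CustomerName,
       i.ref.value('(Sku/text())[1]', 'nvarchar(20)')              AS Sku
FROM @doc.nodes('//Item') AS i(ref);

-- Shape of the rewritten query: exact path to the originating node, with the
-- walk to the related node done once per row in its own CROSS APPLY.
SELECT c.node.value('(Name/text())[1]', 'nvarchar(50)') AS CustomerName,
       i.node.value('(Sku/text())[1]', 'nvarchar(20)')  AS Sku
FROM @doc.nodes('/Orders/Order/Item') AS i(node)
CROSS APPLY i.node.nodes('../Customer') AS c(node);
```

Both statements return one row per Item with the customer name repeated alongside it.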

Never being satisfied, let’s add an index to the data. However, a primary XML index can only be created on a table that has a clustered primary key. If we simply try to add an XML index to the current table with this command:

CREATE PRIMARY XML INDEX PrimaryXMLIndex ON
dbo.XMLDataStore(data)
GO

We get the following error:

Msg 6332, Level 16, State 201, Line 1
Table 'dbo.XMLDataStore' needs to have a clustered primary key with less than 16 columns in it in order to create a primary XML index on it.

The current table has no primary key, so let’s create a new table that does:

CREATE TABLE dbo.OptimizedXMLDataStore(
  id INT IDENTITY PRIMARY KEY,
  link nvarchar(100) NOT NULL,
  data xml NOT NULL
) ON [PRIMARY]
GO
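
As an aside, if you would rather keep the original table, one alternative is to add a clustered primary key to it in place. This is a sketch of that option, not what I did here; the column name id and the constraint and index names are hypothetical.

```sql
-- Alternative sketch: add a clustered primary key to the existing table in place.
-- The column name and the constraint/index names are hypothetical.
ALTER TABLE dbo.XMLDataStore
  ADD id INT IDENTITY(1,1) NOT NULL
      CONSTRAINT PK_XMLDataStore PRIMARY KEY CLUSTERED;
GO
-- With the clustered primary key in place, the primary XML index can be created.
CREATE PRIMARY XML INDEX PrimaryXMLIndex_XMLDataStore
  ON dbo.XMLDataStore(data);
GO
```

The rest of this walkthrough continues with the new dbo.OptimizedXMLDataStore table.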

And create the following indexes on dbo.OptimizedXMLDataStore:

CREATE PRIMARY XML INDEX PrimaryXMLIndex ON
dbo.OptimizedXMLDataStore(data)
GO
CREATE XML INDEX
XMLDataStore_XmlCol_PATH ON dbo.OptimizedXMLDataStore(data)
USING XML INDEX PrimaryXMLIndex FOR PATH
GO
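
To double-check that both indexes exist, a query against the sys.xml_indexes catalog view is one option. This is a small sketch, assuming the table lives in the dbo schema of the current database.

```sql
-- Lists the XML indexes on the table; secondary_type_desc is NULL for the
-- primary XML index and PATH for the secondary index created above.
SELECT name, secondary_type_desc
FROM sys.xml_indexes
WHERE object_id = OBJECT_ID('dbo.OptimizedXMLDataStore');
```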

Now that the indexes are created, let’s insert the data into this table:

INSERT INTO [XMLTutorial].[dbo].[OptimizedXMLDataStore]
           ([link]
           ,[data])
     VALUES
           ('ABCDEFGHIJ'
           ,'<ns0:ORU_R01_231_GLO_DEF xmlns:fo="http://www.w3.org/1999/XSL/Format" xmlns:ns0="http://labratory/DB/2X">
               ...
            </ns0:ORU_R01_231_GLO_DEF>')
GO

Now let’s run the same query (except pointing to the indexed table):

WITH XMLNAMESPACES ('http://labratory/DB/2X' AS "ns0")
select  Patient.node.value('(PID_5_PatientName/XPN_1_GivenName/text())[1]','nvarchar(100)') as FirstName,
    Patient.node.value('(PID_5_PatientName/XPN_0_FamilyLastName/XPN_0_0_FamilyName/text())[1]','nvarchar(100)') as LastName,
    Patient.node.value('(PID_7_DateTimeOfBirth/TS_0_TimeOfAnEvent/text())[1]','nvarchar(100)') as BirthDate,
    Patient.node.value('(PID_2_PatientId/CX_0_Id/text())[1]','nvarchar(100)') as InsuranceNumber,
    Request.node.value('(OBR_1_SetIdObr/text())[1]','nvarchar(100)')as [OBRID],
    Request.node.value('(OBR_7_ObservationDateTime/TS_0_TimeOfAnEvent/text())[1]','nvarchar(10)') as ObservationDate,
    Request.node.value('(OBR_4_UniversalServiceId/CE_1_Text/text())[1]','nvarchar(100)') as LabTestName,
    null as LabTestCode,
    Observation.node.value('(OBX_1_SetIdObx/text())[1]','nvarchar(100)') as [OBXID],
    Observation.node.value('(OBX_3_ObservationIdentifier/CE_4_AlternateText/text())[1]','nvarchar(100)') as LabResultName,
    Observation.node.value('(OBX_3_ObservationIdentifier/CE_0_Identifier/text())[1]','nvarchar(100)') as LabResultCode,
    Observation.node.value('(OBR_5_PriorityObr/text())[1]','nvarchar(100)') as [Priority],
    Observation.node.value('(OBX_14_DateTimeOfTheObservation/TS_0_TimeOfAnEvent/text())[1]','nvarchar(100)') as [ResultDate],
    Observation.node.value('(OBX_5_ObservationValue/CE_4_AlternateText/text())[1]','nvarchar(100)') as [ResultValue],
    Observation.node.value('(OBX_6_Units/CE_0_Identifier/text())[1]','nvarchar(100)') as [UnitOfMeasure],
    Observation.node.value('(OBX_7_ReferencesRange/text())[1]','nvarchar(100)') as [ReferenceRange]
from OptimizedXMLDataStore x
cross apply x.data.nodes('ns0:ORU_R01_231_GLO_DEF/CompleteOrder/Order/Observation/OBX_ObservationResultSegment') Observation(node)
cross apply Observation.node.nodes('../../../Patient/PID_PatientIdentificationSegment') Patient(node)
cross apply Observation.node.nodes('../../OBR_ObservationRequestSegment') Request(node)
where x.link='ABCDEFGHIJ'

The results came back in 0 seconds

[Image: QueryResults3]
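
A figure of 0 seconds presumably just means the tool rounds to whole seconds. To see elapsed milliseconds, one option is SET STATISTICS TIME. The sketch below reuses the namespace and path from the query above, but the COUNT(*) is a simplified stand-in for the full column list, so swap in the real query to measure it.

```sql
SET STATISTICS TIME ON;

-- Simplified stand-in for the full query: counts the result segments only.
WITH XMLNAMESPACES ('http://labratory/DB/2X' AS "ns0")
SELECT COUNT(*) AS ObservationCount
FROM dbo.OptimizedXMLDataStore x
CROSS APPLY x.data.nodes('ns0:ORU_R01_231_GLO_DEF/CompleteOrder/Order/Observation/OBX_ObservationResultSegment') Observation(node)
WHERE x.link = 'ABCDEFGHIJ';

SET STATISTICS TIME OFF;
```

The CPU and elapsed times show up on the Messages tab rather than in the result grid.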

Things I did not do:

  1. Actually see Clark Kent (I think he was born before June of 1938, but that was the first time he was written about)
  2. Question why “The Last Son of Krypton” was actually getting lab work done
  3. Import schemas into the database