Friday, July 07, 2006

SQLXMLBulkLoad Rocks!

For reasons that will become obvious in September (sly grin), I've been working a lot lately on uploading data to SQL Server 2005. There are a variety of mechanisms you can use to do this, but all of them have their shortcomings under certain conditions. For example, using a series of INSERT statements is slow while bulk insert doesn't work with memo fields. While looking for something else in SQL Server Books Online, I happened across a topic on bulk XML loading. After playing with it for a while, it seemed pretty easy to do, so I did some testing on a relatively large file (365,741 records) containing memo fields (which means I couldn't use bulk insert). Using other mechanisms, it took more than two hours to load the data into a SQL Server table. Using bulk XML load, it took 11 minutes. That's a 90% improvement in speed. I love making things faster!

Using bulk XML load is easy: you instantiate SQLXMLBulkLoad.SQLXMLBulkload.4.0, set its ConnectionString property to an OLE DB connection string, set some other properties appropriately (for example, KeepNulls to .T. to insert null rather than default values for missing column values), and call Execute, passing it the name of a schema file and the name of the XML data file. Execute throws an error if there's something wrong with either file, and import errors are logged to a file whose name is in the ErrorLogFile property.

There's one slight wrinkle in working with VFP data: while the CURSORTOXML() function can create a schema file, it needs to be tweaked slightly to work with bulk XML load. Also, DateTime fields have to be flagged with the sql:datatype="dateTime" attribute (the latter was documented, the former took some trial and error to resolve).

Here's a generic program that will load the data from an open VFP cursor into an existing SQL Server table. You can download it from http://www.stonefield.com/articles/other/bulkxmlload.zip.

*==============================================================================
* Function: BulkXMLLoad
* Purpose: Performs a SQL Server bulk XML load
* Author: Doug Hennig
* Last revision: 07/06/2006
* Parameters: tcAlias - the alias of the cursor to export
* tcTable - the name of the table to import into
* tcDatabase - the database the table belongs to
* tcServer - the SQL Server name
* tcUserName - the user name for the connection (optional:
* if it isn't specified, Windows Integrated Security is
* used)
* tcPassword - the password for the connection (optional:
* if it isn't specified, Windows Integrated Security is
* used)
* Returns: an empty string if the bulk load succeeded or the text of
* an error message if it failed
* Environment in: the alias specified in tcAlias must be open
* the specified table and database must exist
* the specified server must be accessible
* there must be enough disk space for the XML files
* Environment out: if an empty string is returned, the data was imported into
* the specified table
*==============================================================================


lparameters tcAlias, ;
tcTable, ;
tcDatabase, ;
tcServer, ;
tcUserName, ;
tcPassword
local lnSelect, ;
lcSchema, ;
lcData, ;
lcReturn, ;
loException as Exception, ;
lcXSD, ;
loBulkLoad


* Create the XML data and schema files.


lnSelect = select()
select (tcAlias)
lcSchema = forceext(tcTable, 'xsd')
lcData = forceext(tcTable, 'xml')
try
cursortoxml(alias(), lcData, 1, 512 + 8, 0, lcSchema)
lcReturn = ''
catch to loException
lcReturn = loException.Message
endtry


* Convert the XSD into a format acceptable by SQL Server. Add the SQL
* namespace, convert the start and end tags to ,
* use the sql:datatype attribute for DateTime fields, and specify the table
* imported into with the sql:relation attribute.


if empty(lcReturn)
lcXSD = filetostr(lcSchema)
lcXSD = strtran(lcXSD, ':xml-msdata">', ;
':xml-msdata" xmlns:sql="urn:schemas-microsoft-com:mapping-schema">')
lcXSD = strtran(lcXSD, 'IsDataSet="true">', ;
'IsDataSet="true" sql:is-constant="1">')
lcXSD = strtran(lcXSD, '<xsd:choice maxOccurs="unbounded">', ;
'<xsd:sequence>')
lcXSD = strtran(lcXSD, '</xsd:choice>', ;
'</xsd:sequence>')
lcXSD = strtran(lcXSD, 'type="xsd:dateTime"', ;
'type="xsd:dateTime" sql:datatype="dateTime"')
lcXSD = strtran(lcXSD, 'minOccurs="0"', ;
'sql:relation="' + lower(tcTable) + '" minOccurs="0"')
strtofile(lcXSD, lcSchema)


* Instantiate the SQLXMLBulkLoad object, set its ConnectionString and other
* properties, and call Execute to perform the bulk import.


try
loBulkLoad = createobject('SQLXMLBulkLoad.SQLXMLBulkload.4.0')
lcConnString = 'Provider=SQLOLEDB.1;Initial Catalog=' + tcDatabase + ;
';Data Source=' + tcServer + ';Persist Security Info=False;'
if empty(tcUserName)
lcConnString = lcConnString + 'Integrated Security=SSPI'
else
lcConnString = lcConnString + 'User ID=' + tcUserName + ;
';Password=' + tcPassword
endif empty(tcUserName)
loBulkLoad.ConnectionString = lcConnString
*** Can set the ErrorLogFile property to the name of a file to write import
*** errors to
loBulkLoad.KeepNulls = .T.
loBulkLoad.Execute(lcSchema, lcData)
lcReturn = ''
catch to loException
lcReturn = loException.Message
endtry


* Clean up.


erase (lcSchema)
erase (lcData)
endif empty(lcReturn)
select (lnSelect)
return lcReturn

15 comments:

Anonymous said...

Doug,

Let me guess, you are creating a VFP to SQL upsizing wizzard that actually works!

I know, you won't tell me if I'm right, so I'll assume I am.

Bob Archer

Doug Hennig said...

Hey Bob, good to hear from you. That's a darn good guess!

tacShooter said...

Hey Doug, I'm having some trouble working with SQLXMLBulkLoad and was wondering if you could point me in the right direction. I'm just getting one error so far, "multiple base for derived type not supported". Can the bulk loader really not handle derived types with multiple bases or am I missing something. Thanks for your time.

Ken Ferguson

Doug Hennig said...

Hi Ken.

I'd check msdn.microsoft.com to see if there are any details about that error message.

Doug

tacShooter said...

Thanks Doug. I tried that to no avail and was making the rounds to every blog on the subject to see if anyone out there had run across this problem... Thanks anyway.

Ken Ferguson

Anonymous said...

Thanks for your contribution.One problem I could not solve is: fields that are empty but don´t accept null values. The SQLXMLBUlkLoad treat theses fields as NULL and prompt an error. Any ideas?

TIA
Moacyr Zalcman

Anonymous said...

1100 individual XML Files to load into SQL Server becomes a LOT easier. Thanks Doug.

Anonymous said...

Get error message Class definition SQLXMLBULKLOAD.SQLXMLBULKLOAD.4.0 is not found

Anonymous said...

Hi Dough,
i've tried the sample its working no error but nothing inserting to the SQL side :(

Doug Hennig said...

Did you try setting loBulkLoad.ErrorLogFile to the name of a file to log errors to?

Anonymous said...

No, how would be this

Anonymous said...

SQLXMLBulkLoad eats CR of CRLF in Memo fields when importing. Any ideas on a workaround?

Doug Hennig said...

Sorry, I don't.

Luis Santos said...

Hello Doug

I have also problems with NULL because my sql table don´t accept NULL.

Could you give me na ideia
Many thanks
Luis Santos

Doug Hennig said...

If you have empty or null values in your data but the fields don't accept null values, then it sounds like you have a data issue you need to resolve.