<captions>
<newrecord>
<tablename>SomeTableName</tablename>
<fieldname>SomeFieldName</fieldname>
<fieldcaption>Caption for Field</fieldcaption>
</newrecord>
...
</captions>
We want to use this XML to update the Stonefield Query data dictionary. Our first attempt used XMLTOCURSOR() to put the XML into a cursor, which we then process. It took somewhere around 30 seconds, which seems like forever because it happens every time the user starts Stonefield Query. XMLTOCURSOR() uses the XML DOM object, which gets dramatically slower as the XML gets larger.
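The XMLTOCURSOR() call itself is a one-liner in VFP; what it does internally, per the above, is build a DOM for the whole document and then walk it into rows. A rough sketch of that DOM-based approach, in Python rather than VFP purely for illustration (the element names come from the sample XML; the function name is mine):

```python
# Illustrative Python analogue of the DOM approach XMLTOCURSOR() takes:
# parse the entire document into a tree, then walk it record by record.
import xml.etree.ElementTree as ET

def dom_parse_captions(xml_text):
    """Parse the <captions> XML into (table.field, caption) rows."""
    root = ET.fromstring(xml_text)
    rows = []
    for rec in root.iter("newrecord"):
        table = rec.findtext("tablename")
        field = rec.findtext("fieldname")
        caption = rec.findtext("fieldcaption")
        rows.append((table + "." + field, caption))
    return rows
```

The convenience comes at the cost of materializing the whole tree in memory before the first row is produced, which is where large documents start to hurt.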
The next approach was to manually parse the XML using STREXTRACT(). STREXTRACT() is the perfect function for parsing XML because it was designed to find text between delimiters. If you know the XML structure, it's really easy to use. Here's the code we used:
create cursor __CAPTIONS (FIELDNAME C(119), CAPTION C(60))
for lnI = 1 to occurs('<tablename>', lcXML)
    lcTable   = strextract(lcXML, '<tablename>', '</tablename>', lnI)
    lcField   = strextract(lcXML, '<fieldname>', '</fieldname>', lnI)
    lcCaption = strextract(lcXML, '<fieldcaption>', '</fieldcaption>', lnI)
    insert into __CAPTIONS values (lcTable + '.' + lcField, lcCaption)
next lnI
Interestingly, this turned out to be way slower than XMLTOCURSOR() (in fact, I stopped it after a couple of minutes, so I don't know how long it would've taken).
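If STREXTRACT() locates the Nth occurrence by scanning from the start of the string on each call — which the timings later in this post suggest — then this loop does quadratic work on a large document. A Python analogue of the same loop, with an occurrence-finding helper standing in for STREXTRACT() (names and behavior are my approximation, not VFP's actual implementation):

```python
# Illustrative stand-in for STREXTRACT() with an occurrence argument:
# each call rescans the string from the beginning to find the Nth match,
# so calling it once per record makes the whole loop O(n^2).
def strextract(text, begin, end, occurrence=1):
    """Return the text between the Nth begin/end delimiter pair, or ''."""
    pos = -1
    for _ in range(occurrence):
        pos = text.find(begin, pos + 1)
        if pos == -1:
            return ""
    start = pos + len(begin)
    stop = text.find(end, start)
    return text[start:stop] if stop != -1 else ""

def parse_by_occurrence(xml_text):
    """Mimic the VFP loop: pull the Nth table/field/caption per pass."""
    rows = []
    for i in range(1, xml_text.count("<tablename>") + 1):
        table = strextract(xml_text, "<tablename>", "</tablename>", i)
        field = strextract(xml_text, "<fieldname>", "</fieldname>", i)
        caption = strextract(xml_text, "<fieldcaption>", "</fieldcaption>", i)
        rows.append((table + "." + field, caption))
    return rows
```

Each iteration rescans everything before the Nth record, so doubling the document roughly quadruples the time.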
It occurred to me while looking at the XML that if we could convert it into comma-delimited text, we could use APPEND FROM to suck it into the cursor. Here's the code for that:
#define ccCRLF chr(13) + chr(10)
lcXML = strtran(lcXML, '<tablename>')
lcXML = strtran(lcXML, '</tablename>' + ccCRLF + '<fieldname>', '.')
lcXML = strtran(lcXML, '</fieldname>' + ccCRLF + '<fieldcaption>', ',"')
lcXML = strtran(lcXML, '</fieldcaption>' + ccCRLF + '</newrecord>' + ;
    ccCRLF + '<newrecord>', '"')
lcXML = strtran(lcXML, '</fieldcaption>' + ccCRLF + '</newrecord>' + ;
    ccCRLF + '</vfpdata>', '"')
lcTemp = forcepath(sys(2015) + '.txt', sys(2023))
strtofile(substr(lcXML, 91), lcTemp)
create cursor __CAPTIONS (FIELDNAME C(119), CAPTION C(60))
append from (lcTemp) delimited
erase (lcTemp)
This took 0.7 seconds, 42 times faster than using XMLTOCURSOR().
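The same delimiter-swap idea can be sketched in Python for illustration: a handful of plain string replacements turn the XML into comma-delimited lines, which then split trivially. I use the `<captions>` root from the sample above (the VFP code closes on `</vfpdata>`), and the filtering of leftover header lines stands in for the `SUBSTR(lcXML, 91)` trick:

```python
# Turn the XML into comma-delimited rows with plain string replacements,
# mirroring the STRTRAN() sequence. Assumes captions contain no embedded
# double quotes, as the VFP delimited format does.
def xml_to_csv_rows(xml_text):
    crlf = "\r\n"
    s = xml_text
    s = s.replace("<tablename>", "")
    s = s.replace("</tablename>" + crlf + "<fieldname>", ".")
    s = s.replace("</fieldname>" + crlf + "<fieldcaption>", ',"')
    s = s.replace("</fieldcaption>" + crlf + "</newrecord>" +
                  crlf + "<newrecord>", '"')
    s = s.replace("</fieldcaption>" + crlf + "</newrecord>" +
                  crlf + "</captions>", '"')
    rows = []
    for line in s.splitlines():
        line = line.strip()
        if not line or line.startswith("<"):
            continue  # skip leftover header/tag lines
        name, caption = line.split(',"', 1)
        rows.append((name, caption.rstrip('"')))
    return rows
```

Because every replacement is a single linear pass over the string, the total cost stays linear in the document size — which is why this runs in under a second where the occurrence-based loop crawled.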
But still, I wondered why the STREXTRACT() approach was so slow; my experience is that the VFP text-handling functions are blindingly fast. I wondered if it had to do with the fourth parameter, the one that specifies the occurrence. I rewrote the code to this:
create cursor __CAPTIONS (FIELDNAME C(119), CAPTION C(60))
lnLines = alines(laLines, lcXML)
for lnI = 1 to lnLines
    lcLine = laLines[lnI]
    do case
        case '<tablename>' $ lcLine
            lcTable = strextract(lcLine, '<tablename>', '</tablename>')
        case '<fieldname>' $ lcLine
            lcField = strextract(lcLine, '<fieldname>', '</fieldname>')
        case '<fieldcaption>' $ lcLine
            lcCaption = strextract(lcLine, '<fieldcaption>', '</fieldcaption>')
            insert into __CAPTIONS values (lcTable + '.' + lcField, lcCaption)
    endcase
next lnI
This took 0.4 seconds. Wow! Nearly two orders of magnitude improvement over the initial attempt. It really does pay to take out the slow parts!
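The winning approach is a single pass: split the document into lines once, then do a short, delimiter-free extraction per line. For comparison with the earlier sketches, the same logic in Python (again illustrative only; the VFP version above is the real code):

```python
# One pass over the lines; each extraction scans only the current short
# line, so total work is linear in the size of the document.
def extract_between(line, begin, end):
    """Return the text between begin and end on a single line, or ''."""
    start = line.find(begin)
    if start == -1:
        return ""
    start += len(begin)
    stop = line.find(end, start)
    return line[start:stop] if stop != -1 else ""

def parse_line_by_line(xml_text):
    rows = []
    table = field = ""
    for line in xml_text.splitlines():
        if "<tablename>" in line:
            table = extract_between(line, "<tablename>", "</tablename>")
        elif "<fieldname>" in line:
            field = extract_between(line, "<fieldname>", "</fieldname>")
        elif "<fieldcaption>" in line:
            caption = extract_between(line, "<fieldcaption>", "</fieldcaption>")
            rows.append((table + "." + field, caption))
    return rows
```

No occurrence counting means no rescanning from the top of the string, which is the whole difference between minutes and fractions of a second.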
Doug,
Thank you for an excellent post. Your post 100% solved the problem I was facing.
We have a log file stored in SQL Server where the log can contain hundreds of thousands of records. To manage this mountain of data, we archive any data beyond 7 days. The data is brought from SQL Server to VFP in a cursor, translated into an XML string and then compressed and stored back in SQL Server. This process is very fast and usually only takes a matter of seconds to complete.
Now has come the time to decompress and extract the XML back to a VFP cursor to get at the archived data. However, when I ran the XMLTOCURSOR() function, I thought I was making a mistake or that the system had hung.
I went to my smallest archive entries first and noted that the time to translate them through XMLTOCURSOR was getting longer and longer as the XML string grew in size. I did not know why until I read your post.
Realizing now what is going on, I copied your code example, re-worked it to fit our needs here and then put it in to practice.
I am happy to report that a 9.5 MB XML file is translated using the STREXTRACT method you've outlined in about 10 seconds or less. The data comes through perfectly.
Final note: on these big XML files, XMLTOCURSOR() was complaining about control characters like CHR(6). This seemed odd to me because we use CURSORTOXML() to make the conversion in the first place, so all things being equal, I would expect the complaint to show up with CURSORTOXML() as well as XMLTOCURSOR().
Using the code you've inspired, the problem does NOT show up. I really DO hope someone in Redmond will provide a little help on the matter and get the XMLTOCURSOR() more efficient AND have it work as well as the CURSORTOXML().
Regardless -- again, thank you for your hard work and your post. Without it, I would have been struggling with this for a long time to come. As it was, I spent the better portion of the last 3 days on the issue.
Larry
Marietta, GA
You're welcome, Larry -- glad this was of help.
Hi Doug,
Sorry for asking a dumb question, but where do you get lcXML, and what does it contain? Thanks. NP
lcXML contains the XML shown in the first part of the entry.
Thanks Doug, how would I load that data into lcXML? NP.
As I mentioned, in this case it's a COM object that returns that XML. Otherwise, the XML could come from anywhere: a file, a call to a function, TEXT ... ENDTEXT statements, etc. IOW, the same way you'd get the content for any string (which is all XML really is).
ReplyDeleteHi,
If I have an XML document that contains a collection of tables, how can I make the code check only the specified table in lcXML? With the xml2cursor function, you load the XML and use oXML.Tables(arrayposition).ToCursor() to pick the table you want from the collection. How can I do the same in the algorithm you specified?
Thanks,
Nazha
Hi Nazha.
Change the STREXTRACT() statement to extract the table name following the table tag as well.
Doug
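That suggestion could be sketched as a small variation on the line-by-line loop: remember the current table name as you go and only keep rows for the table you care about. In Python for illustration (the tag names follow the earlier sample; the `wanted_table` parameter and function names are mine):

```python
# Line-by-line parse that keeps only rows belonging to one table.
# Assumes one tag per line, as in the sample XML earlier in the post.
def extract_between(line, begin, end):
    """Return the text between begin and end on a single line, or ''."""
    start = line.find(begin)
    if start == -1:
        return ""
    start += len(begin)
    stop = line.find(end, start)
    return line[start:stop] if stop != -1 else ""

def parse_one_table(xml_text, wanted_table):
    rows = []
    table = field = ""
    for line in xml_text.splitlines():
        if "<tablename>" in line:
            table = extract_between(line, "<tablename>", "</tablename>")
        elif "<fieldname>" in line:
            field = extract_between(line, "<fieldname>", "</fieldname>")
        elif "<fieldcaption>" in line and table == wanted_table:
            caption = extract_between(line, "<fieldcaption>", "</fieldcaption>")
            rows.append((table + "." + field, caption))
    return rows
```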