Thursday, December 13, 2007

Sorting in Natural Order

Jeff Atwood discussed "natural sorting" yesterday in his Coding Horror blog (a blog I highly recommend, by the way). This is something I've needed to do every now and then, so I decided it was time to write a function to do that. The code is shown below. Note that this isn't necessarily the most elegant solution; I used the "brute force" method, so I appreciate any comments about tightening it up.
* Function: NaturalSort
* Purpose: Sorts an array in natural rather than ASCII order
* Author: Doug Hennig
* Last Revision: 12/13/2007
* Parameters: taArray - the array to sort (passed by reference)
* tnColumn - the column to sort on (optional: if it isn't
* specified, column 1 is used)
* tlDescending - .T. to sort in descending order; if .F. or
* not specified, ascending order is used
* Returns: .T. if it succeeded or .F. (and an error is raised) if
* invalid parameters are passed or the specified column
* doesn't contain a homogeneous data type
* Environment in: none
* Environment out: none

lparameters taArray, ;
tnColumn, ;
local lnColumn, ;
lnRows, ;
lnCols, ;
lnOrder, ;
laArray[1], ;
lnI, ;
lcKey, ;
llInNumeric, ;
lcString, ;
lnJ, ;
lcChar, ;
llNumeric, ;
lcNumeric, ;
laClone[1], ;

* Define some constants.

#define cnLENGTH 20
&& the length to pad numeric sections to
&& Function argument value, type, or count is invalid
&& Data type mismatch

* Ensure taArray is an array and tlDescending is logical if it's specified.

if type('taArray', 1) <> 'A' or ;
(pcount() = 3 and vartype(tlDescending) <> 'L')
return .F.
endif type('taArray', 1) <> 'A' ...

* If the column to sort on wasn't specified, assume 1.

lnColumn = iif(pcount() = 2, tnColumn, 1)

* Figure out the size of the source array.

lnRows = alen(taArray, 1)
lnCols = alen(taArray, 2)

* Ensure the column to sort on is valid.

if vartype(lnColumn) <> 'N' or not between(lnColumn, 1, max(lnCols, 1))
return .F.
endif vartype(lnColumn) <> 'N' ...

* Figure out the order flag for ASORT().

lnOrder = iif(tlDescending, 1, 0)

* Get the data type of the first key value. If it isn't character, we don't
* have to do anything fancy; ASORT() will take care of it for us.

if vartype(taArray[1, lnColumn]) = 'C'

* Create an array we'll sort on.

dimension laArray[lnRows, 2]

* Go through each element we're sorting on.

for lnI = 1 to lnRows
lcKey = taArray[lnI, lnColumn]

* Bug out if the data type is different.

if vartype(lcKey) <> 'C'
return .F.
endif vartype(lcKey) <> 'C'

* Create a key that will sort properly by looking for numeric sections and
* left-padding them with zeros.

llInNumeric = .F.
lcString = ''
for lnJ = 1 to len(lcKey)
lcChar = substr(lcKey, lnJ, 1)
llNumeric = isdigit(lcChar) or ;
(lcChar = '.' and isdigit(substr(lcKey, lnJ + 1, 1)))
do case
case llNumeric and llInNumeric
&& if we have a digit and we're already in a numeric
&& section, add to the numeric part
lcNumeric = lcNumeric + lcChar
case llNumeric
&& if we have a digit and we're not in a numeric
&& section, flag that we are in such a section and add to
&& the numeric part
llInNumeric = .T.
lcNumeric = lcChar
case llInNumeric
&& we don't have a digit and we were in a numeric section
&& so pad the section and add it to our string
llInNumeric = .F.
lcString = lcString + padl(lcNumeric, cnLENGTH, '0') + ;
lcString = lcString + lcChar
next lnJ

* Finish the string if we were still processing a numeric section.

if llInNumeric
lcString = lcString + padl(lcNumeric, cnLENGTH, '0')
endif llInNumeric

* Store the new key and the original index in our sort array.

laArray[lnI, 1] = lcString
laArray[lnI, 2] = lnI
next lnI

* Now sort the array and reorder the values in the source array by cloning it
* and copying the values from the each row in the clone to the new row in the
* source array.

asort(laArray, 1, -1, lnOrder, 1)
acopy(taArray, laClone)
for lnI = 1 to lnRows
lnIndex = laArray[lnI, 2]
if lnCols > 0
for lnJ = 1 to lnCols
taArray[lnI, lnJ] = laClone[lnIndex, lnJ]
next lnJ
taArray[lnI] = laClone[lnIndex]
endif lnCols > 0
next lnI

* Just use ASORT to do the job.

asort(taArray, lnColumn, -1, lnOrder, 0)
endif vartype(taArray[1, lnColumn]) = 'C'
return .T.


Anonymous said...

Hi Doug,

Nice post, and thanks for pointing out the Coding Horror article. Until now, I just considered ASCII sort as a fact of life, but this really changed my outlook.
Also, I think it's worth pointing out that if you separated most of the code in that first main loop into its own function, you would have a function you could use in SQL-SELECT and index statements to provide natural sort in those scenarios.

Mike Potjer

Doug Hennig said...

Great suggestion -- thanks, Mike.

Paul Johnson said...

Hey Doug I'm looking for a way to do a natural sort within a Select-SQL statement. This sort issue is a big problem with libraries needing to have their shelf list be a natural sort order. It really becomes a problem if the library is using Library of Congress numbering system which starts with letters. DO you have or know where I can get a Select-SQL statement that can take care of the problem?