xsharp.eu • Best Practices: Seek() vs AScan()

Best Practices: Seek() vs AScan()

Posted: Tue May 25, 2021 6:17 pm
by OhioJoe
In another post the brilliant and always-helpful Wolfgang suggested that instead of using a two-dimensional array, we should use an array of objects.

Huh? (Wolfgang apparently forgot whom he was addressing)

To recap: I have a list of about 3000 US postal codes, which I Seek() or AScan() on every pass through a loop. Sometimes there are half a million iterations, depending on the size of the database that we're importing.

Here's the basic structure:
// code, city, state, county
{ { "43756","McConnelsville","OH","58" } }

Example:

WHILE !oJoeDBF:Eof
    // do lots of stuff

    // then look up the city, county and state (plus some other stuff) on the basis of zip (postal) code
    cZip := oJoeDBF:FIELDGET(#FZIP)

    IF oZipDBF:Seek( cZip, FALSE, FALSE )
        cCity   := oZipDBF:FIELDGET(#CITY)
        cState  := oZipDBF:FIELDGET(#CSTATE)
        cCounty := oZipDBF:FIELDGET(#CCOUNTY)
    ELSE
        cCity := cState := cCounty := Space(0)
    ENDIF
    oJoeDBF:FIELDPUT(#FCITY, cCity )
    oJoeDBF:FIELDPUT(#FSTATE, cState )
    oJoeDBF:FIELDPUT(#FCOUNTY, cCounty )
    oJoeDBF:Skip(1)
ENDDO

Simple enough. Here's the array method, which I don't often use because I've always thought it creates a slowdown while the OS scrambles to find contiguous memory.

WHILE !oJoeDBF:Eof

    cZip := oJoeDBF:FIELDGET(#FZIP)
    nElement := AScan( aZips, { |x| x[1] == cZip } )
    // by the way, Terry: aZips is already sorted by the first element
    IF nElement > 0
        cCity   := aZips[ nElement, 2 ]
        cState  := aZips[ nElement, 3 ]
        cCounty := aZips[ nElement, 4 ]
    ELSE
        cCity := cState := cCounty := Space(0)
    ENDIF
    // do the field replacement
    oJoeDBF:Skip(1)
ENDDO

My first question was to ask WHICH method I should choose. To which Wolfgang replied:

"an array lookup will be faster because memory is cheap these days, and a serial read of the database (SetOrder(0)) is really fast. It would be better to not build a two-dimensional array, but an array of objects."

So now two potential new techniques have been added. It would be fun to test and wonderful to know which works best. But first I'm hoping that someone can show me an example of

1. a "serial read" (do you mean Locate()?)
2. an array of objects, as it applies here.

Perhaps this has been explained in this forum before, so please just enlighten me with a link.

Remember: the aforementioned loop runs up to half a million times in one sequence.

Thank you, everyone.

Best Practices: Seek() vs AScan()

Posted: Tue May 25, 2021 6:48 pm
by wriedmann
Hi Joe,
please let me explain:
a serial read is the following code:


oServer:SetOrder( 0 )
oServer:GoTop()
while ! oServer:EOF
  // do something
  oServer:Skip()
enddo
Reading the database in its "natural" order is the fastest read available because it bypasses any index.
And using an object makes the code easier to read and more stable (for brevity I'm using EXPORT variables here instead of the better practice of PROTECT variables with ACCESS entities):


class clsZip
  export cCode as string
  export cCity as string
  export cState as string
  export cCounty as string
method Init( oDBF ) class clsZip
  cCode := oDBF:FieldGet( #Zip )
  cCity := oDBF:FieldGet( #City )
  cState := oDBF:FieldGet( #CState )
  cCounty := oDBF:FieldGet( #CCounty )
  return self
and to fill the array:


aZip := {}
oServer:SetOrder( 0 )
oServer:GoTop()
while ! oServer:EOF
    AAdd( aZip, clsZip{ oServer } )
    oServer:Skip()
enddo
and finally to search:


nElement := AScan( aZip, { |o| o:cCode == cZip } )
HTH

Wolfgang

Best Practices: Seek() vs AScan()

Posted: Tue May 25, 2021 8:24 pm
by OhioJoe
Thank you, Wolfgang.

At this moment I can't see how an array of objects could possibly be faster and less memory-intensive than a multidimensional array of simple character strings, but I will certainly try it!

Best Practices: Seek() vs AScan()

Posted: Tue May 25, 2021 10:54 pm
by Chris
Hi Joe,

A simple array of objects is not faster, but it provides a little better compile-time checking than multi-dim arrays, if you store array elements to local vars before using them. Especially if the object has, say, 10 or more properties, isn't it much easier and cleaner to manage an array of objects than an array of arrays?

But what is indeed faster, absolutely more robust, and provides full compile-time checking, is the strongly typed list (array) of objects that is available in .Net. Something like:


USING System.Collections.Generic

LOCAL aZip AS List<clsZip>
aZip := List<clsZip>{}
aZip:Add(clsZip{"city" , "zip number"})
aZip:Add(clsZip{"other city" , "zip no2"})
...
and then, in order to search for zip code cZip, you can use:


LOCAL oResult AS clsZip	
oResult := aZip:Find({oZip AS clsZip => oZip:cCode == cZip })
? oResult:cCode
Because everything in this code is strongly typed, the array uses the least possible memory, it's very fast, and it's a lot less prone to errors, because the compiler will tell you if you have done something wrong, like searching for a numeric instead of a string, etc.

And if absolute speed is very important, you can use a SortedList or a Dictionary instead of a simple List, which can provide a 10x or even 100x speedup when searching for an item!

Best Practices: Seek() vs AScan()

Posted: Wed May 26, 2021 3:57 am
by wriedmann
Hi Joe,
for sure an array of objects (in VO) is less memory-efficient than a two-dimensional array, but it is much easier to understand and to read, especially after several years, when you no longer remember whether the county is element 3 or 4.
As Chris wrote, in X# this is an entirely different matter, as you can use collections to be more efficient and to get more compile-time checks.
Whenever I touch VO code that uses two- or multidimensional arrays, I rewrite it to use objects, because it is much easier to read; that also makes the code more stable and easier to debug.
Wolfgang

Best Practices: Seek() vs AScan()

Posted: Wed May 26, 2021 5:54 am
by robert
Wolfgang,
wriedmann wrote:Hi Joe,
and finally to search:


nElement := AScan( aZip, { |o| o:cCode == cZip } )
This will do a late-bound ivar access on o, because o is not typed and is therefore treated as a USUAL.
So there is also no compile-time check whether the property or field cCode exists.
If you want early-bound code in VO, you could do this:


LOCAL oZip as clsZip
.
.
nElement := AScan( aZip, { |o| oZip := o, oZip:cCode == cZip } )

In X# you can do this

Code: Select all

nElement := AScan( aZip, { |o as clsZip| o:cCode == cZip } )

However, for a fast lookup in a 3000-element list I would not use an array or list of objects, but create a dictionary of strings and clsZip objects: Dictionary<String, clsZip>. If you want case-insensitive comparisons, pass StringComparer.OrdinalIgnoreCase to the constructor of the dictionary.
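
A minimal sketch of that approach, reusing the clsZip class from Wolfgang's earlier post (oServer and the fill loop mirror his array-fill example; only the container changes, and the variable names dZip/oZip are illustrative):

```xbase
USING System.Collections.Generic

LOCAL dZip AS Dictionary<STRING, clsZip>
LOCAL oZip AS clsZip
// case-insensitive keys, as suggested above
dZip := Dictionary<STRING, clsZip>{ System.StringComparer.OrdinalIgnoreCase }
oServer:SetOrder( 0 )
oServer:GoTop()
WHILE ! oServer:EOF
    oZip := clsZip{ oServer }
    dZip[ oZip:cCode ] := oZip    // indexer assignment: last entry wins if a code repeats
    oServer:Skip()
ENDDO
// inside the half-million-row import loop: one hashed lookup instead of a linear AScan()
IF dZip:TryGetValue( cZip, OUT oZip )
    cCity := oZip:cCity
ELSE
    cCity := Space(0)
ENDIF
```

TryGetValue() avoids the exception that the indexer would throw for a missing postal code, which matters when the imported data contains codes not present in the lookup table.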

The Ascan code will use an average of 1500 (1/2 n) comparisons. A Dictionary of this size will do a binary search so that will be square root of n, = Sqrt(3000) ~ 55 comparisons.

If you do this with a DBF, then you are also using a dictionary-like approach, because the index is ordered like a binary tree.
So I think in VO the DBF is the better solution, especially when the DBF is on an SSD.
Things would be different if there were only 100 or fewer elements. In that case having everything in memory certainly beats having to do disk access.

Robert

Best Practices: Seek() vs AScan()

Posted: Wed May 26, 2021 6:05 am
by wriedmann
Hi Robert,
thank you very much!
The trick of using a typed local variable looks really good!
Of course in X# a dictionary is the best option (but only if the keys are unique and do not contain any strange characters).
Regarding memory vs. disk access: most of the time my software is used on networks, and therefore I use global caching objects that are read on first access and then reused for the entire program run. That way I save all the frequent open/search/close processing.
For this purpose in VO I have a clsCachedServer class that checks on every use whether the data on disk has changed (timestamp).
I have to rewrite that class for X# to use an internal dictionary instead of an AScan() call.
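
Wolfgang's actual clsCachedServer is not shown in this thread; a hypothetical sketch of the timestamp check he describes might look like this (the class, method and variable names are invented for illustration, and LoadCache() is an assumed helper that does the serial read into the dictionary):

```xbase
USING System.Collections.Generic

CLASS clsCachedZip
    PROTECT cDbfFile AS STRING
    PROTECT dCache   AS Dictionary<STRING, clsZip>
    PROTECT dStamp   AS System.DateTime

METHOD CheckRefresh() CLASS clsCachedZip
    LOCAL dDisk AS System.DateTime
    // reload the internal dictionary only when the file on disk has changed
    dDisk := System.IO.File.GetLastWriteTime( cDbfFile )
    IF dDisk != dStamp
        SELF:LoadCache()      // serial read into dCache, as in the earlier posts
        dStamp := dDisk
    ENDIF
    RETURN NIL
```

On a network share this trades one file-metadata query per lookup batch against re-reading the whole table, which is the point of the caching object.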
Wolfgang

Best Practices: Seek() vs AScan()

Posted: Thu Jun 17, 2021 10:41 am
by mainhatten
robert wrote: The Ascan code will use an average of 1500 (1/2 n) comparisons. A Dictionary of this size will do a binary search so that will be square root of n, = Sqrt(3000) ~ 55 comparisons.
Hi Robert,

IIRC the .NET Dictionary, as a generalizable construct, was first based on Hashtable. Dim memory bells are ringing here about the .NET Dictionary implementation switching its store/search paradigm depending on the number of items in the table and the structure of the key: a hash-function call on 50 records with integer keys makes little or no sense, as hash collisions must also be guarded against.
Where did you find that a .NET Dictionary with 3000 items still uses a binary search pattern? Or did you trace/debug it?

curious
thomas