Best Practices: Seek() vs AScan()

This forum is meant for questions and discussions about the X# language and tools
OhioJoe
Posts: 131
Joined: Wed Nov 22, 2017 12:51 pm
Location: United States

Best Practices: Seek() vs AScan()

Post by OhioJoe »

In another post the brilliant and always-helpful Wolfgang suggested that instead of using a two-dimensional array, we should use an array of objects.

Huh? (Wolfgang apparently forgot whom he was addressing)

To recap: I have a list of about 3000 US postal codes, which I Seek() or AScan() with every loop. Sometimes there are half a million loops, depending on the size of the database that we're importing.

Here's the basic structure:
// code, city, state, county
{ { "43756","McConnelsville","OH","58" } }

Example:

WHILE !oJoeDBF:Eof
    // do lots of stuff

    // then look up the city, county and state (plus some other stuff) on the basis of zip (postal) code
    cZip := oJoeDBF:FIELDGET(#FZIP)

    IF oZipDBF:Seek( cZip, FALSE, FALSE )
        cCity   := oZipDBF:FIELDGET(#CITY)
        cState  := oZipDBF:FIELDGET(#CSTATE)
        cCounty := oZipDBF:FIELDGET(#CCOUNTY)
    ELSE
        cCity := cState := cCounty := Space(0)
    ENDIF
    oJoeDBF:FIELDPUT(#FCITY, cCity )
    oJoeDBF:FIELDPUT(#FSTATE, cState )
    oJoeDBF:FIELDPUT(#FCOUNTY, cCounty )
    oJoeDBF:Skip(1)
ENDDO

Simple enough. Here's the array method, which I don't often use because I've always thought it creates a slowdown while the OS scrambles to find contiguous memory.

WHILE !oJoeDBF:Eof

    cZip := oJoeDBF:FIELDGET(#FZIP)
    nElement := AScan( aZips, { |x| x[1] == cZip } )
    // by the way, Terry: aZips is already sorted by the first element
    IF nElement > 0
        cCity   := aZips[ nElement, 2 ]
        cState  := aZips[ nElement, 3 ]
        cCounty := aZips[ nElement, 4 ]
    ELSE
        cCity := cState := cCounty := Space(0)
    ENDIF
    // do the field replacement
    oJoeDBF:Skip(1)
ENDDO

My first question was to ask WHICH method I should choose. To which Wolfgang replied:

"an array lookup will be faster because memory is cheap these days, and a serial read of the database (SetOrder(0)) is really fast. It would be better to not build a two-dimensional array, but an array of objects."

So now two potential new techniques have been added. It would be fun to test and wonderful to know which works best. But first I'm hoping that someone can show me an example of

1. a "serial read" (do you mean Locate()?)
2. an array of objects, as it applies here.

Perhaps this has been explained in this forum before, so please just enlighten me with a link.

Remember: the aforementioned loop runs up to half a million times in one sequence.

Thank you, everyone.
Joe Curran
Ohio USA
wriedmann
Posts: 3644
Joined: Mon Nov 02, 2015 5:07 pm
Location: Italy

Best Practices: Seek() vs AScan()

Post by wriedmann »

Hi Joe,
please let me explain:
a serial read is the following code:

Code: Select all

oServer:SetOrder( 0 )
oServer:GoTop()
while ! oServer:EOF
  // do something
  oServer:Skip()
enddo
Reading the database in its "natural" order is the fastest read available because it excludes any index.
And using an object makes the code easier to read and more stable (here I'm using EXPORT variables instead of the better practice of PROTECT variables with ACCESS entities, simply to keep the example short):

Code: Select all

class clsZip
  export cZip as string
  export cCity as string
  export cState as string
  export cCounty as string

method Init( oDBF ) class clsZip
  cZip := oDBF:FieldGet( #Zip )
  cCity := oDBF:FieldGet( #City )
  cState := oDBF:FieldGet( #CState )
  cCounty := oDBF:FieldGet( #CCounty )
  return self
and to fill the array:

Code: Select all

aZIP := {}
oServer:SetOrder( 0 )
oServer:GoTop()
while ! oServer:EOF
  AAdd( aZIP, clsZip{ oServer } )
  oServer:Skip()
enddo
and finally to search:

Code: Select all

nElement := AScan( aZip, { |o| o:cZip == cZip } )
HTH

Wolfgang
Wolfgang Riedmann
Meran, South Tyrol, Italy
wolfgang@riedmann.it
https://www.riedmann.it - https://docs.xsharp.it
OhioJoe
Posts: 131
Joined: Wed Nov 22, 2017 12:51 pm
Location: United States

Best Practices: Seek() vs AScan()

Post by OhioJoe »

Thank you, Wolfgang.

At this moment I can't see how an array of objects could possibly be faster and less memory-intensive than a multidimensional array of simple character strings, but I will certainly try it!
Joe Curran
Ohio USA
Chris
Posts: 4562
Joined: Thu Oct 08, 2015 7:48 am
Location: Greece

Best Practices: Seek() vs AScan()

Post by Chris »

Hi Joe,

A simple array of objects is not faster, but it provides a little better compile-time checking than multi-dim arrays, if you store array elements to local vars before using them. Especially if the object has, say, 10 or more properties, isn't it much easier and cleaner to manage an array of objects than an array of arrays?

But what is indeed faster and absolutely most robust and provides full compile time checking, is using a strongly typed list (array) of objects that is available in .Net. Something like:

Code: Select all

USING System.Collections.Generic

LOCAL aZip AS List<clsZip>
aZip := List<clsZip>{}
aZip:Add(clsZip{"city" , "zip number"})
aZip:Add(clsZip{"other city" , "zip no2"})
...
and then, in order to search for zip code cZip, you can use:

Code: Select all

LOCAL oResult AS clsZip	
oResult := aZip:Find({oZip AS clsZip => oZip:cCode == cZip })
? oResult:cCode
Because everything is strongly typed in this code, the array uses the least possible memory, it's very fast, and it's a lot less prone to errors, because the compiler will tell you if you have done something wrong, like searching for a numeric instead of a string, etc.

And if absolute speed is very important, you can use a SortedList or a Dictionary instead of a simple List, which can provide a 10x or even 100x speedup when searching for an item!
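
As a rough sketch of that idea (untested, reusing the illustrative clsZip and values from above): a Dictionary hashes the key instead of comparing elements one by one, so a lookup takes roughly constant time regardless of list size.

Code: Select all

USING System
USING System.Collections.Generic

LOCAL dZip AS Dictionary<STRING, clsZip>
dZip := Dictionary<STRING, clsZip>{}
// key each entry by its zip code
dZip:Add("zip number", clsZip{"city" , "zip number"})
dZip:Add("zip no2", clsZip{"other city" , "zip no2"})

// hashed lookup instead of scanning the whole list
LOCAL oResult AS clsZip
IF dZip:TryGetValue(cZip, OUT oResult)
    ? oResult:cCode
ENDIF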
Chris Pyrgas

XSharp Development Team
chris(at)xsharp.eu
wriedmann
Posts: 3644
Joined: Mon Nov 02, 2015 5:07 pm
Location: Italy

Best Practices: Seek() vs AScan()

Post by wriedmann »

Hi Joe,
for sure an array of objects (in VO) is less memory efficient than a two-dimensional array, but it is much easier to understand and to read, especially after several years when you don't remember whether the county is element 3 or 4.
As Chris wrote, in X# this is an entirely different matter, as you can use collections to be more efficient and to have more compile-time checks.
Whenever I touch VO code that uses two- or multidimensional arrays, I rewrite that code to use objects, because it is much easier to read and therefore the code is also more stable and easier to debug.
Wolfgang
Wolfgang Riedmann
Meran, South Tyrol, Italy
wolfgang@riedmann.it
https://www.riedmann.it - https://docs.xsharp.it
robert
Posts: 4225
Joined: Fri Aug 21, 2015 10:57 am
Location: Netherlands

Best Practices: Seek() vs AScan()

Post by robert »

Wolfgang,
wriedmann wrote:Hi Joe,
and finally to search:

Code: Select all

nElement := AScan( aZip, { |o| o:cZip == cZip } )
This will do a late-bound ivar access on o, because o is not typed and is therefore treated as a USUAL.
So there is also no compile-time check whether the property or field cZip exists.
If you want early bound code in VO you could do this

Code: Select all

LOCAL oZip as clsZip
.
.
nElement := AScan( aZip, { |o| oZip := o, oZip:cZip == cZip } )

In X# you can do this

Code: Select all

nElement := AScan( aZip, { |o as clsZip|  o:cZip == cZip } )

However, for a fast lookup in a 3000-element list I would not use an array or list of objects, but create a dictionary of strings and clsZip objects: Dictionary<STRING, clsZip>. If you want case-insensitive comparisons, pass StringComparer.OrdinalIgnoreCase to the constructor of the dictionary.
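
A minimal sketch of that suggestion (untested; it assumes the clsZip class and the oServer zip-code table from Wolfgang's example):

Code: Select all

USING System
USING System.Collections.Generic

LOCAL dZip AS Dictionary<STRING, clsZip>
// the comparer makes key lookups case insensitive
dZip := Dictionary<STRING, clsZip>{ StringComparer.OrdinalIgnoreCase }
oServer:SetOrder( 0 )
oServer:GoTop()
while ! oServer:EOF
    dZip[ oServer:FieldGet( #Zip ) ] := clsZip{ oServer }
    oServer:Skip()
enddo

// later, in the import loop:
LOCAL oZip AS clsZip
IF dZip:TryGetValue( cZip, OUT oZip )
    // use oZip:cCity, oZip:cState, oZip:cCounty
ENDIF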

The Ascan code will use an average of 1500 (1/2 n) comparisons. A Dictionary of this size will do a binary search so that will be square root of n, = Sqrt(3000) ~ 55 comparisons.

If you do this with a DBF then you are also using a dictionary-like approach, because the index is also ordered like a binary tree.
So I think in VO the DBF is the better solution, especially when the DBF is on an SSD.
Things would be different if there are only 100 or less elements. In that case having everything in memory certainly beats having to do disk access.

Robert
XSharp Development Team
The Netherlands
robert@xsharp.eu
wriedmann
Posts: 3644
Joined: Mon Nov 02, 2015 5:07 pm
Location: Italy

Best Practices: Seek() vs AScan()

Post by wriedmann »

Hi Robert,
thank you very much!
The trick of using a typed local variable looks really good!
Of course in X# a dictionary is the best option (but only if the key is unique and does not contain any strange characters).
Regarding memory vs disk access: most of the time my software is used on networks, and therefore I'm using global caching objects that are read on first access and then reused for the entire program run. That way I save all the frequent open/search/close operations.
For this purpose in VO I have a clsCachedServer class that checks at every use if the data on the disk has been changed (timestamp).
I have to rewrite that class for X# to use an internal dictionary instead of an AScan call.
Wolfgang
Wolfgang Riedmann
Meran, South Tyrol, Italy
wolfgang@riedmann.it
https://www.riedmann.it - https://docs.xsharp.it
mainhatten
Posts: 200
Joined: Wed Oct 09, 2019 6:51 pm

Best Practices: Seek() vs AScan()

Post by mainhatten »

robert wrote: The Ascan code will use an average of 1500 (1/2 n) comparisons. A Dictionary of this size will do a binary search so that will be square root of n, = Sqrt(3000) ~ 55 comparisons.
Hi Robert,

IIRC the .NET Dictionary as a generalizable construct was first based on Hashtable. Dim memory bells are ringing here about the implementation of the .NET Dictionary switching its store/search paradigm depending on the number of items in the table and the structure of the key: a hash function call on 50 records with integer keys makes little or no sense, as hash collisions must also be guarded against.
Where did you find that the .NET Dictionary with 3000 items still uses a binary search pattern? Or did you trace/debug?

curious
thomas