Some test data for strings

This forum is meant for examples of X# code.

mainhatten
Posts: 200
Joined: Wed Oct 09, 2019 6:51 pm

Post by mainhatten

Hi,
as I am working on VFP GetWordNum()/GetWordCount() (perhaps to be reused or revisited with ALines()), I needed test data.
I remembered an old article by Steve Black,

https://github.com/StevenBlack/sbc.next ... n-VFP.html

and searched for the raw text, which (of course) had moved, but I found some Tolstoy texts elsewhere.
Four records (War and Peace, Anna Karenina, Resurrection and The Kreutzer Sonata) are tucked away in memo fields of a VFP table and zipped as an attachment.

I hope this is sufficient as a basis for "long string work"; some web sites deliver amounts similar to War and Peace...

Perhaps others need long texts as a basis as well; if so, the texts can be used straight from the memo fields, without FileToStr() or low-level file (LLF) calls.

regards

thomas
Attachment: LongTxt.zip (2.81 MiB)
Karl-Heinz
Posts: 774
Joined: Wed May 17, 2017 8:50 am

Post by Karl-Heinz

Hi Thomas,

I've found the 3.2 MB "war-and-peace.txt" and ran some tests to see how fast some of the X# string and array functions are. And they are very fast! Of course, AAdd() is the slowest approach, but keep in mind that 65007 elements must be added one at a time.

The required references are XSharp.Core.dll and XSharp.rt.dll.

https://github.com/fluentpython/example ... -peace.txt

Code:

FUNCTION ReadWarAndPeaceTxtFile() AS VOID
LOCAL t1, t2 AS FLOAT	
LOCAL cValue, cFile AS STRING
LOCAL h AS PTR 
LOCAL i, dwLines, dwHits AS DWORD
LOCAL a, b, aStrings AS ARRAY


// https://github.com/fluentpython/example-code/blob/master/attic/sequences/war-and-peace.txt 
// ------------------------------------------------
cFile := "D:\test\vfp\war-and-peace.txt"   // adjust to the location of your copy
// ------------------------------------------------

t1 := Seconds() 
cValue := MemoRead ( cFile  )
t2 := Seconds()

?  "MemoRead() seconds:" , t2-t1 // 0.03 secs 


t1 := Seconds() 
dwLines := MLCount( cValue ) // 65007 lines 
t2 := Seconds()

? "MLCount() seconds:" , t2-t1 , "number of lines:" , dwLines //  0.03 secs 
? "Last line:" , MemoLine( cValue , , dwLines )

aStrings := ArrayNew ( 5 )

aStrings[1] := "Russia"  // 775 occurrences
aStrings[2] := "Anna"    // 293 occurrences
aStrings[3] := "Czar"    // 4 occurrences
aStrings[4] := "windows" // 23 occurrences
aStrings[5] := "Pentium" // 0 occurrences 

FOR i := 1 UPTO ALen ( aStrings ) 

	t1 := Seconds()
	dwHits := Occurs( aStrings[i], cValue )
	t2 := Seconds()
	
	? 'Occurs ("' + aStrings[i] + '" , cValue) ' , "" , "Hits:" , dwHits , "" , "seconds: " , t2 -t1 
	
NEXT	

? 
IF ( h := FOpen( cFile )) !=  F_ERROR  
	
	cValue := ""
	
	t1 := Seconds()
	FRead( h, @cValue, 4000000 )   
	t2 := Seconds()	
	
	? "FRead() seconds:" ,t2 -t1  // 0.03 secs  
	? "content of last line:" , MemoLine( cValue , , dwLines ) 

// --------------------

	a := ArrayNew ( dwLines )   
	
	FRewind(h)

	t1 := Seconds() 
	FOR i:= 1 UPTO dwLines   
		a [i] := FGetS ( h ) 
	NEXT
	t2 := Seconds() 
	
	? "ArrayNew() seconds:" , t2-t1 , "Elements:" , ALen ( a )    // 0,33 secs
	? "content of last line:" , ATail ( a ) 		
	
// -----------------
    ?
    
	b := {}
	
	FRewind ( h )
	    
	t1 := Seconds()
	DO WHILE ! FEof ( h )	
		AAdd ( b, FGetS ( h ) ) 		
	ENDDO 	
	t2 := Seconds()
	
	?  "AAdd() seconds:" , t2-t1, "Elements:" , ALen ( b )  // 0,53 secs
	?  "content of last line:" , ATail(b)  
	 
	?
	
	FClose( h )
	
ENDIF 	
	
RETURN
regards
Karl-Heinz
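To look at the array cost separately from the file I/O, here is a minimal sketch in the same dialect. It assumes the same XSharp.Core.dll/XSharp.rt.dll references as above; the helper name CompareArrayFill is hypothetical, and it uses System.Diagnostics.Stopwatch instead of Seconds() for finer timer resolution. It fills the same number of elements once into a preallocated ARRAY and once via AAdd(), without reading the file:

```xsharp
USING System.Diagnostics

// Hypothetical helper (not part of the original test): compares filling
// a preallocated ARRAY by index with growing an empty ARRAY via AAdd().
// Returns {ALen(a), ALen(b)} so the caller can check both element counts.
FUNCTION CompareArrayFill(dwCount AS DWORD) AS ARRAY
    LOCAL a, b AS ARRAY
    LOCAL i AS DWORD
    LOCAL sw AS Stopwatch
    LOCAL nPrealloc AS INT64

    // 1) Allocate all elements up front, then assign by index
    sw := Stopwatch.StartNew()
    a := ArrayNew(dwCount)
    FOR i := 1 UPTO dwCount
        a[i] := "line"
    NEXT
    sw:Stop()
    nPrealloc := sw:ElapsedMilliseconds

    // 2) Start with an empty array and append one element per AAdd() call
    sw := Stopwatch.StartNew()
    b := {}
    FOR i := 1 UPTO dwCount
        AAdd(b, "line")
    NEXT
    sw:Stop()

    ? "ArrayNew() + assign ms:", nPrealloc, "Elements:", ALen(a)
    ? "AAdd() ms:", sw:ElapsedMilliseconds, "Elements:", ALen(b)
    RETURN {ALen(a), ALen(b)}
```

Calling CompareArrayFill(65007) uses the same element count as the text file, so its numbers can be set next to the file-based loops; the timings will of course depend on the machine and on the runtime referenced.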
SHirsch
Posts: 281
Joined: Tue Jan 30, 2018 8:23 am

Post by SHirsch

Hi Karl-Heinz,

I tested your program.
My times differ from yours:
ArrayNew() 1,33 sec
AAdd() 1,53 sec

Then I checked the built-in .NET method:

Code:

t1 := Seconds()
VAR x := System.IO.File.ReadAllLines(cFile)
t2 := Seconds()
?  "ReadAllLines seconds:" , t2-t1, "Elements:" , x:Length  // 0,02 secs
?  "content of last line:" , x[x:Length]
Attachment: KH-Test.jpg
Conclusion: Refactorings are seldom wasted time. :)

Stefan
Karl-Heinz
Posts: 774
Joined: Wed May 17, 2017 8:50 am

Post by Karl-Heinz

Hi Stefan

I added your ReadAllLines() test and the elapsed time is 0,02 secs. About your ArrayNew() and AAdd() results: I cleaned my eyeglasses, but my results are still 0,35 and 0,56 secs, always about 1 sec faster than yours. OK, let's wait and see what others report.

regards
Karl-Heinz
SHirsch
Posts: 281
Joined: Tue Jan 30, 2018 8:23 am

Post by SHirsch

Hi Karl-Heinz,

haha, I took an old test app that referenced the Vulcan DLLs. After changing it to XSharp I can now reproduce your results (0,31 and 0,52 and 0,02). :)

Stefan
Karl-Heinz
Posts: 774
Joined: Wed May 17, 2017 8:50 am

Post by Karl-Heinz

Hi Stefan,

Indeed, if the VN (Vulcan.NET) runtime is used instead, the ArrayNew() and AAdd() results are similar to those in your first post!

regards
Karl-Heinz
Chris
Posts: 4562
Joined: Thu Oct 08, 2015 7:48 am
Location: Greece

Post by Chris

Guys, I think it's also worth comparing the memory consumption of the two versions (X#/Vulcan runtime) :)
Chris Pyrgas

XSharp Development Team test
chris(at)xsharp.eu
Karl-Heinz
Posts: 774
Joined: Wed May 17, 2017 8:50 am

Post by Karl-Heinz

Hi Chris,

I found this on GitHub:

https://github.com/X-Sharp/XSharpPublic ... /Start.prg

and modified it a bit. Compared to X#, VN needs about 2.5 times as much memory for the same ARRAY.


VULCAN

Memory for 65007 element ARRAY 2.600.332 bytes
Memory for 65007 element ARRAY after assigning 65007 values 2.600.332 bytes
Memory for 65007 element ARRAY after assigning 65007 values with AAdd 2.602.480 bytes

Memory for 1000000 element ARRAY 40.000.056 bytes
Memory for 1000000 element ARRAY after assigning 1000000 values 40.000.056 bytes
Memory for 1000000 element ARRAY after assigning 1000000 values with AAdd 40.195.216 bytes

X#

Memory for 65007 element ARRAY 1.040.192 bytes
Memory for 65007 element ARRAY after assigning 65007 values 1.040.192 bytes
Memory for 65007 element ARRAY after assigning 65007 values with AAdd 1.048.636 bytes

Memory for 1000000 element ARRAY 16.000.080 bytes
Memory for 1000000 element ARRAY after assigning 1000000 values 16.000.080 bytes
Memory for 1000000 element ARRAY after assigning 1000000 values with AAdd 16.777.276 bytes
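Dividing the reported sizes by the element count (simple arithmetic on the numbers above, ignoring the few bytes of fixed overhead) gives the cost per element:

```latex
\frac{40\,000\,056\ \text{bytes}}{1\,000\,000} \approx 40\ \text{bytes per element (VN)}, \qquad
\frac{16\,000\,080\ \text{bytes}}{1\,000\,000} \approx 16\ \text{bytes per element (X\#)}, \qquad
\frac{40}{16} = 2.5
```

The 65007-element rows give the same figures: 2.600.332 / 65007 and 1.040.192 / 65007 are also about 40 and 16 bytes per element.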

Anyway, VN is history ;-)

regards
Karl-Heinz
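The modified Start.prg is not shown here, so the following is only a minimal sketch of one way to take such a measurement. It assumes System.GC.GetTotalMemory() with a forced collection before and after allocating the ARRAY; the helper name ArrayMemory is hypothetical, and the numbers it prints may differ from the ones above:

```xsharp
USING System

// Hypothetical helper: approximates the managed memory taken by an ARRAY
// of dwCount elements by comparing GC.GetTotalMemory() before and after.
FUNCTION ArrayMemory(dwCount AS DWORD) AS INT64
    LOCAL nBefore, nAfter AS INT64
    LOCAL a AS ARRAY

    nBefore := GC.GetTotalMemory(TRUE)   // TRUE: collect garbage before reading the value
    a := ArrayNew(dwCount)
    nAfter := GC.GetTotalMemory(TRUE)
    GC.KeepAlive(a)                      // keep the array reachable until after the second reading

    ? "Memory for", dwCount, "element ARRAY", nAfter - nBefore, "bytes"
    RETURN nAfter - nBefore
```

Running the same function once against the X# runtime and once against the Vulcan runtime is what makes the comparison meaningful; other allocations happening in between will add some noise.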
Chris
Posts: 4562
Joined: Thu Oct 08, 2015 7:48 am
Location: Greece

Post by Chris

Yeap :).

We had been discussing internally, years ago, how to implement USUALs, ARRAYs, FLOATs etc. in X# so that they are much more efficient than in Vulcan. It paid off :)
Chris Pyrgas
mainhatten
Posts: 200
Joined: Wed Oct 09, 2019 6:51 pm

Post by mainhatten

SHirsch wrote: haha, I took an old test app referencing Vulcan dlls. Changed this to XSharp an now I can copy your results (0,31 and 0,52 and 0,02). :)
Great! Comparing implementations/settings to find the better one is best done when everyone uses the same data.

I read up on the VFP 9 EULA and Redistrib.txt: the data files shipped with the VFP samples (Northwind, TastradeData and Data_for_samples) may be deployed. All of them are rehashes of a very similar data pool. They are needed if one tries to port the VFP samples (quite old; I must check whether a good example for CursorAdapter, buffering and update conflicts is among them), but they could, and should, also serve as a data basis for any other examples, as there are no problems with data protection laws.

Also, if an example or speed test needs a huge amount of data, it is easier to write a few lines of xBase that append slightly modified copies of the data to itself; everyone can run that on their own machine instead of downloading a few hundred MB.
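The same idea works for plain strings as well as tables. A minimal sketch, assuming the X# runtime functions StrTran() and NTrim() (the helper name BuildLongText and its marker-based modification are hypothetical): in each copy of the seed text, every occurrence of a marker word is tagged with the copy number, so the copies are not identical:

```xsharp
USING System.Text

// Hypothetical helper: concatenates nCopies slightly modified copies of cSeed.
// In copy i, every occurrence of cMarker is replaced by cMarker + NTrim(i).
FUNCTION BuildLongText(cSeed AS STRING, cMarker AS STRING, nCopies AS INT) AS STRING
    LOCAL sb AS StringBuilder
    LOCAL i AS INT

    sb := StringBuilder{}          // avoids re-copying the growing string on every append
    FOR i := 1 UPTO nCopies
        sb:Append(StrTran(cSeed, cMarker, cMarker + NTrim(i)))
    NEXT
    RETURN sb:ToString()
```

For example, BuildLongText("Anna met Anna.", "Anna", 2) returns "Anna1 met Anna1.Anna2 met Anna2."; feeding it one of the Tolstoy memos with a few hundred copies produces a multi-MB string locally.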

I did not want to spam VFP/MS data here, but as there seems to be no dedicated Clipper/VO/VN test data, building on the old MS data is IMHO better than always creating new data. Is there any interest in having common sample data?

If so, I'll install the samples (they are typically not installed on my dev VM), zip them and attach them to a post. All three are only a few MB IIRC, but it is nice to have tables where the relational links make sense, already filled with data, whenever I want to show something, from SQL syntax to a GUI example.

As long as the example lets you point it to the data directory, anyone can easily follow any example running on the same DB on their own hardware.

regards
thomas