xsharp.eu • StringBuilder performance

Posted: Thu Mar 12, 2020 4:05 pm
by wriedmann
Hi all interested people,
please see this code:

Code:

cBuffer := DateTime.Now:ToString()
foreach oTag as PlanTag in _oPlanTage
    	cBuffer := cBuffer + oTag:DebugString( 1 )
    	foreach oPosition as PlanPosition in _oPlanPositionen
    		cBuffer := cBuffer + oPosition:DebugString( 1 )
    	next
next
cBuffer := cBuffer + DateTime.Now:ToString()

In my application this code creates a text file with over 95,000 lines.
The code takes a lot of time (5 minutes 36 seconds) and uses an entire processor core.
A simple optimization makes it behave better:

Code:

cBuffer := DateTime.Now:ToString()
foreach oTag as PlanTag in _oPlanTage
	cBuffer := cBuffer + oTag:DebugString( 1 )
	cPosition := ""
	foreach oPosition as PlanPosition in _oPlanPositionen
		cPosition := cPosition + oPosition:DebugString( 1 )
	next
	cBuffer := cBuffer + cPosition
next
cBuffer := cBuffer + DateTime.Now:ToString()

The only change is an intermediate buffer: instead of adding every position substring directly to the main buffer, the positions of each tag are first collected in cPosition.
This reduces the time needed to about 4 seconds!!!
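Why does the intermediate buffer help so much? A rough cost model, sketched in Java rather than X# and resting on a simplifying assumption: every `a + b` allocates a new string and copies both operands, so appending to a buffer of length L copies L plus the new piece. Garbage-collection cost is ignored, and the 100 tags, 100 positions and 40 characters per line are hypothetical numbers, not the data from the post.

```java
class ConcatCost {
    // Model of cBuffer := cBuffer + x: the whole new string is copied every time.
    static long naiveCopies(int tags, int positions, int pieceLen) {
        long len = 0, copied = 0;
        for (int t = 0; t < tags; t++) {
            len += pieceLen;
            copied += len;                  // cBuffer + tag line
            for (int p = 0; p < positions; p++) {
                len += pieceLen;
                copied += len;              // cBuffer + position line
            }
        }
        return copied;
    }

    // Model of the second version: positions go into a small cPosition first,
    // so the large cBuffer is copied only twice per tag.
    static long chunkedCopies(int tags, int positions, int pieceLen) {
        long len = 0, copied = 0;
        for (int t = 0; t < tags; t++) {
            len += pieceLen;
            copied += len;                  // cBuffer + tag line
            long inner = 0;
            for (int p = 0; p < positions; p++) {
                inner += pieceLen;
                copied += inner;            // cPosition + position line
            }
            len += inner;
            copied += len;                  // cBuffer + cPosition
        }
        return copied;
    }

    public static void main(String[] args) {
        // Hypothetical shape: 100 tags x 100 positions, 40 characters per line
        System.out.println("naive:   " + naiveCopies(100, 100, 40));   // 2040402000
        System.out.println("chunked: " + chunkedCopies(100, 100, 40)); // 60604000
    }
}
```

Under this model the first version copies about 2.0 billion characters and the second about 61 million, roughly 34 times fewer, because the growing main buffer is no longer copied once per position line.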
But using the StringBuilder class makes the code faster still:

Code:

// StringBuilder lives in System.Text (USING System.Text at the top of the source file)
oSB := StringBuilder{}
oSB:AppendLine( DateTime.Now:ToString() )
foreach oTag as PlanTag in _oPlanTage
    	oSB:Append( oTag:DebugString( 1 ) )
    	foreach oPosition as PlanPosition in _oPlanPositionen
    		oSB:Append( oPosition:DebugString( 1 ) )
    	next
next
oSB:AppendLine( DateTime.Now:ToString() )
cBuffer := oSB:ToString() 

The code now takes only 2 seconds!
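The same pattern, sketched in Java for comparison (an assumption for illustration only: Java's `java.lang.StringBuilder` plays the role of .NET's `System.Text.StringBuilder` here). The helpers `tagLine` and `positionLine` are hypothetical stand-ins for `DebugString( 1 )`, and the timestamps are left out.

```java
class PlanReport {
    // Hypothetical stand-ins for oTag:DebugString( 1 ) and oPosition:DebugString( 1 )
    static String tagLine(int t) {
        return "Tag " + t + "\n";
    }

    static String positionLine(int t, int p) {
        return "  Position " + t + "." + p + "\n";
    }

    // One StringBuilder for the whole report: append() writes into its
    // internal buffer, and only toString() creates the final string.
    static String build(int tags, int positions) {
        StringBuilder sb = new StringBuilder();
        for (int t = 0; t < tags; t++) {
            sb.append(tagLine(t));
            for (int p = 0; p < positions; p++) {
                sb.append(positionLine(t, p));
            }
        }
        return sb.toString();
    }

    // The plain concatenation version, for comparison.
    static String buildNaive(int tags, int positions) {
        String buffer = "";
        for (int t = 0; t < tags; t++) {
            buffer = buffer + tagLine(t);
            for (int p = 0; p < positions; p++) {
                buffer = buffer + positionLine(t, p);
            }
        }
        return buffer;
    }

    public static void main(String[] args) {
        System.out.println(build(2, 2).equals(buildNaive(2, 2)));   // true
    }
}
```

Both versions produce identical text; the difference is that the builder does not copy the text it has already collected on every appended line.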
Wolfgang
P.S. In VO you can see similar differences, but no StringBuilder class is available there.

Posted: Thu Mar 12, 2020 4:31 pm
by Chris
Hi Wolfgang,

Very good sample!

Furthermore, if you know the (approximate) size of the final string in advance, specify it in the constructor of the StringBuilder object. This makes sure that its internal buffer is allocated only once (instead of dozens of times if you do not specify a starting size), which further improves performance.
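A Java sketch of this tip, assuming Java's `StringBuilder(int capacity)` constructor is a fair stand-in for the .NET `StringBuilder(Int32)` constructor. The hypothetical helper `growths` counts how often `capacity()` changes, as a proxy for enlargements of the internal buffer; the exact count for the default builder depends on the JDK's growth policy.

```java
class PresizeDemo {
    // Appends `count` copies of `piece` and counts how often capacity() changed,
    // i.e. how often the builder had to enlarge its internal buffer.
    static int growths(StringBuilder sb, String piece, int count) {
        int changes = 0;
        int cap = sb.capacity();
        for (int i = 0; i < count; i++) {
            sb.append(piece);
            if (sb.capacity() != cap) {
                changes++;
                cap = sb.capacity();
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        String piece = "0123456789";
        int count = 10_000;
        int expected = piece.length() * count;   // final size known in advance
        System.out.println("default:  " + growths(new StringBuilder(), piece, count));
        System.out.println("presized: " + growths(new StringBuilder(expected), piece, count));   // 0
    }
}
```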

Also, if you do this very often in your app, it's a good idea to always (re)use a single StringBuilder object instead of creating a new one every time. Just reset its length to zero after you are done with it (with oSB:Length := 0); this keeps the internal buffer intact, which prevents any further memory allocation when you generate new text in the StringBuilder. The only further memory allocation happens when converting it to a normal string.
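The reuse tip in the same Java sketch style: `setLength(0)` is assumed here as the Java counterpart of `oSB:Length := 0`; it discards the text but keeps the allocated capacity. The class `ReuseDemo` and its `render` method are hypothetical.

```java
class ReuseDemo {
    // One builder for the lifetime of the object, reset before each use.
    private final StringBuilder sb = new StringBuilder(1024);

    String render(String title, String[] lines) {
        sb.setLength(0);   // drop the old text, keep the allocated buffer
        sb.append(title).append('\n');
        for (String line : lines) {
            sb.append("- ").append(line).append('\n');
        }
        return sb.toString();   // copies the text into a new, independent String
    }

    int capacity() {
        return sb.capacity();
    }

    public static void main(String[] args) {
        ReuseDemo demo = new ReuseDemo();
        String first = demo.render("First", new String[] {"a", "b"});
        String second = demo.render("Second", new String[] {"c"});
        System.out.print(first + second);
        System.out.println("capacity: " + demo.capacity());   // still 1024
    }
}
```

Because `toString()` copies, strings returned earlier stay valid after the builder is reset and reused.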

Posted: Thu Mar 12, 2020 6:47 pm
by wriedmann
Hi Chris,
I had tried to build a StringBuilder class in VO, but unfortunately it was slower than a simple string concatenation as in the 2nd sample.
This is the relevant VO code:

Code:

class StringBuilder
protect _aElements			as array
	
declare method Append
declare method GetString
	
method Init() class StringBuilder
_aElements := {}
return self
	
method Append( cString as string ) as void pascal class StringBuilder
AAdd( _aElements, cString )
return
	
method GetString() as string pascal class StringBuilder
local ptrResult as byte ptr
local ptrTemp as byte ptr
local nLen as dword
local nI as dword
local nBufLen as dword 
local nTotalLen as dword 
local nIndex as dword
local cBuffer as string
local cResult as string
	
nLen := ALen( _aElements )
nTotalLen := 0   // start the total length at zero before summing
for nI := 1 upto nLen         
  cBuffer := _aElements[nI]
  nTotalLen := nTotalLen + SLen( cBuffer )
next
if nTotalLen == 0
  cResult := ""
else
  ptrResult := MemAlloc( nTotalLen )
  if ptrResult == null_ptr
    _Break( "memory allocation error - failed to allocate " + NTrim( nTotalLen ) + " bytes" )
  endif
  ptrTemp := ptrResult
  nIndex := 0
  for nI := 1 upto nLen         
    cBuffer := _aElements[nI]
    nBufLen := SLen( cBuffer )
    MemCopyString( ptrTemp, cBuffer, nBufLen )
    ptrTemp := ptrTemp + nBufLen
  next
  cResult := Mem2String( ptrResult, nTotalLen )
  MemFree( ptrResult ) 
endif
	
return cResult

I'm pretty sure this code can be improved, but after the first checks I decided not to put more time into this class.
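The idea behind this VO class (store the pieces, measure the total length, then copy everything once into a buffer of exactly that size) can be sketched in Java as follows. `PieceList` is a hypothetical name, and a presized `StringBuilder` stands in for the `MemAlloc`/`MemCopyString` pair.

```java
import java.util.ArrayList;
import java.util.List;

class PieceList {
    private final List<String> pieces = new ArrayList<>();

    // append only remembers the piece (like AAdd in the VO class)
    void append(String s) {
        pieces.add(s);
    }

    String getString() {
        int total = 0;
        for (String s : pieces) {
            total += s.length();                       // pass 1: measure
        }
        StringBuilder sb = new StringBuilder(total);   // one buffer of the exact size
        for (String s : pieces) {
            sb.append(s);                              // pass 2: copy each piece once
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        PieceList list = new PieceList();
        list.append("Hello, ");
        list.append("world");
        System.out.println(list.getString());   // Hello, world
    }
}
```

In Java, `String.join("", pieces)` does the same job in one call.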
Wolfgang

Posted: Thu Mar 12, 2020 8:18 pm
by Jamal
Hi Wolfgang,

While you are at it, I'm just wondering: if you create an X# or C# COM object that initializes a StringBuilder object as Chris suggested, and then use it in a similar fashion, what would the performance be beyond the initial COM object call?

Jamal

Posted: Fri Mar 13, 2020 5:45 am
by wriedmann
Hi Jamal,
I have not tested that, but in my experience (and I do a LOT of COM interaction between X# modules and VO applications) the COM interface is not very fast (and cannot be very fast because there is a lot of code and a lot of conversions involved).
Wolfgang

Posted: Fri Mar 13, 2020 8:30 am
by Serggio
You're welcome (see the attachment)

Posted: Sat Mar 14, 2020 8:42 am
by Karl-Heinz
wriedmann wrote:
"Hi Chris,
I had tried to build a StringBuilder class in VO, but unfortunately it was slower than a simple string concatenation as in the 2nd sample."

Hi Wolfgang,

I agree: even when I use only static memory, I see no speed advantage. Maybe I overlooked something, but when I compare the results of your StringBuilder with mine, the speed differences are not as large as I would expect.

Code:

CLASS StringbuilderMem INHERIT Vobject
PROTECT _ptrValue AS BYTE PTR
PROTECT _dwCurrentPos AS DWORD
PROTECT _dwStep := 2000 AS DWORD

DECLARE METHOD Append
DECLARE METHOD GetString

METHOD Append( cValue AS STRING ) AS VOID PASCAL CLASS StringbuilderMem
LOCAL dwLen AS DWORD
LOCAL dwNewLen AS DWORD

	dwLen := SLen( cValue )
	IF dwLen > 0
		IF _dwCurrentPos + dwLen > MemLen( _ptrValue )
//			? "MemRealloc", MemLen( _ptrValue ), dwLen, _dwCurrentPos
			// grow by _dwStep, but at least far enough for the new string;
			// otherwise a string longer than _dwStep would overrun the buffer
			dwNewLen := MemLen( _ptrValue ) + _dwStep
			IF dwNewLen < _dwCurrentPos + dwLen
				dwNewLen := _dwCurrentPos + dwLen + _dwStep
			ENDIF
			_ptrValue := MemRealloc( _ptrValue, dwNewLen )
		ENDIF
		// copy the new text directly behind the text already stored
		MemCopyString( PTR( _CAST, _ptrValue + _dwCurrentPos ), cValue, dwLen )
		_dwCurrentPos += dwLen
	ENDIF
	RETURN

METHOD Destroy() CLASS StringbuilderMem
	UnRegisterAxit( SELF )
	IF _ptrValue != NULL_PTR
		MemFree( _ptrValue )
		_ptrValue := NULL_PTR
	ENDIF
	RETURN NIL

METHOD GetString() AS STRING PASCAL CLASS StringbuilderMem
	IF _ptrValue == NULL_PTR .OR. _dwCurrentPos == 0
		RETURN NULL_STRING
	ENDIF
	RETURN Mem2String( _ptrValue, _dwCurrentPos )

METHOD Init( nCapacity ) CLASS StringbuilderMem
	// the initial capacity is also used as the growth step
	Default( @nCapacity, _dwStep )
	_ptrValue := MemAlloc( nCapacity )
	_dwStep := nCapacity
	RegisterAxit( SELF )
	RETURN SELF
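One possible factor, offered as an untested assumption rather than a measurement: this buffer grows by a fixed step, and if each `MemRealloc` has to move the block, the bytes copied grow roughly with the square of the final size. A small Java cost model compares that with doubling the capacity; `GrowthCost` and its methods are hypothetical, and whether the VO runtime can extend a block in place is not modeled.

```java
class GrowthCost {
    // Model assumption: every reallocation copies the whole old block.
    // Returns the bytes copied while growing from `initial` until `finalSize` fits.
    static long fixedStep(long initial, long step, long finalSize) {
        if (initial <= 0 || step <= 0) throw new IllegalArgumentException("initial and step must be > 0");
        long capacity = initial, copied = 0;
        while (capacity < finalSize) {
            copied += capacity;   // old block copied into the larger one
            capacity += step;     // grow by a constant amount
        }
        return copied;
    }

    static long doubling(long initial, long finalSize) {
        if (initial <= 0) throw new IllegalArgumentException("initial must be > 0");
        long capacity = initial, copied = 0;
        while (capacity < finalSize) {
            copied += capacity;
            capacity *= 2;        // grow geometrically
        }
        return copied;
    }

    public static void main(String[] args) {
        long target = 10_000_000;   // hypothetical 10 MB result
        System.out.println("fixed step: " + fixedStep(2000, 2000, target));   // 24995000000
        System.out.println("doubling:   " + doubling(2000, target));          // 16382000
    }
}
```

With a 2,000-byte step the model copies about 25 GB for a 10 MB result, against about 16 MB when doubling, which is why growable buffers usually grow geometrically.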


regards
Karl-Heinz

Posted: Sat Mar 14, 2020 3:50 pm
by ArneOrtlinghaus
I have also found that frequently repeated operations on strings of 1,000 characters or more get very expensive. Many years ago in VO, I wrote a class similar to StringBuilder that used the MemAlloc functions to avoid triggering the garbage collector, and the speed difference was huge. With X# the situation is very similar: dynamic memory can become expensive. Tests with a performance profiler show that much time is spent handling strings, even in fully strongly typed code.

Posted: Mon Mar 16, 2020 3:23 pm
by mainhatten
wriedmann wrote:
"I have not tested that, but in my experience (and I do a LOT of COM interaction between X# modules and VO applications) the COM interface is not very fast (and cannot be very fast because there is a lot of code and a lot of conversions involved)."

Hi Wolfgang,
My gut reaction is to follow my second programming mantra, "Chunky, not Chatty", when calling across layers, as such layers sometimes have real physical borders: in this case, the marshalling code. I am pretty certain that your first example, done across COM into a StringBuilder, would be slower, at least at first and for strings that are not really large. With the second example, which first concatenates lots of tiny strings into an intermediate buffer and then does one large append, the benefit of not burdening memory management with large discarded memory areas might be bigger, as the target string is in the multi-megabyte range.
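"Chunky, not Chatty" in a small Java sketch: the hypothetical `Boundary` class stands in for a costly layer such as a COM call, where each call pays an overhead regardless of its size. Both variants deliver the same text, but the chunky one crosses the boundary only once.

```java
import java.util.List;

class ChunkyVsChatty {
    // Hypothetical stand-in for an expensive layer (e.g. a COM call):
    // each call is assumed to cost a fixed overhead, independent of the data size.
    static final class Boundary {
        int calls = 0;
        final StringBuilder received = new StringBuilder();

        void send(String data) {
            calls++;
            received.append(data);
        }
    }

    // Chatty: one crossing per small piece
    static void chatty(Boundary b, List<String> pieces) {
        for (String p : pieces) {
            b.send(p);
        }
    }

    // Chunky: build the text locally, then cross once
    static void chunky(Boundary b, List<String> pieces) {
        StringBuilder local = new StringBuilder();
        for (String p : pieces) {
            local.append(p);
        }
        b.send(local.toString());
    }

    public static void main(String[] args) {
        List<String> pieces = List.of("line 1\n", "line 2\n", "line 3\n");
        Boundary chattyTarget = new Boundary();
        Boundary chunkyTarget = new Boundary();
        chatty(chattyTarget, pieces);
        chunky(chunkyTarget, pieces);
        System.out.println("chatty calls: " + chattyTarget.calls);   // 3
        System.out.println("chunky calls: " + chunkyTarget.calls);   // 1
    }
}
```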

In VFP we have similar issues, typically when string sizes rise above 10K and the memory allotment is set for a small VM. The typical response is similar to your second approach (as we have no StringBuilder type), often with the twist of collecting the small strings not into one string but into a small array of strings, which can then be concatenated in one line:

Code:

DIMENSION laTmp[7]   && the array must exist before it can be filled
laTmp = ""   && assigning to the array name sets all elements to 1 start value, which is nice in this context
for lnRun = 1 to 7
    *-- build laTmps
next
lcLargeString = lcLargeString + laTmp[1] + laTmp[2] + laTmp[3] + laTmp[4] + laTmp[5] + laTmp[6] + laTmp[7] 

This works because the slow part is not concatenating one or more strings, but releasing the previous variable, claiming new memory and assigning the whole right-hand side of the line. You can see this by measuring: as lcLargeString grows, appending strings of identical length gets slower.
But the easiest way (even if it goes against the "RAM is always faster" reflex) is to open a buffered low-level file and just append the new strings until the result is finished. If needed, loading the result once with FileToStr() for further processing is often faster than memcpy-ing it around in process space all the time, as all internal memory allotment and garbage collection is sidestepped until the final load.
The Unix-like approach makes sense here and is even easier to code and read. (I noticed that X# does not differentiate between buffered and unbuffered low-level file (LLF) functions, but as buffered was/is the VFP default behaviour, the X# LLF implementation probably defaults to buffered as well. I have already raised the question on GitHub.)

That was true on old hard disks, and SSDs have improved write throughput as well.
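A Java sketch of the file-based approach, as an illustration only: the pieces are appended to a buffered temporary file and the result is read back once at the end, with `Files.readString` (Java 11+) taking the role of VFP's `FileToStr()`. `FileAccumulator` is a hypothetical name, and nothing here claims this beats an in-memory `StringBuilder` in Java.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class FileAccumulator {
    static String build(List<String> pieces) {
        try {
            Path tmp = Files.createTempFile("report", ".txt");
            try {
                // buffered writer: small writes are collected in memory and flushed in blocks
                try (BufferedWriter w = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
                    for (String p : pieces) {
                        w.write(p);
                    }
                }
                // load the finished text once
                return Files.readString(tmp, StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(build(List.of("line 1\n", "line 2\n")));
    }
}
```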

regards
thomas