Zing with CapacityString by Seth Willits
02-11-06




In this week's tutorial we're going to write a CapacityString class, which can vastly improve string performance in certain situations. I admit this tutorial isn't going to be eye-catching, but I think for some of you it will be quite an eye-opener.



The Problem

Let's say that you're going to import some data from a file, process it, and output the results to a string. Since this can take quite a while, you'll want to display a progress dialog whose progress bar advances realistically, based on the current position in the file. Normally you'd have something that looks like this:

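dim bin as BinaryStream
dim s as String
dim i, length as Integer

bin = File.OpenAsBinaryFile
length = bin.Length

for i = 1 to length step 2048
  s = s + ProcessData( bin.Read(2048) )
  // <show progress>, based on i out of length
next

bin.Close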

There's nothing extraordinary about this code at all, but what if I told you that it could be made over 50 times faster? It's quite possible, and easy.

Note that every time through the loop we assign a new value to the string "s". Because "s + ..." produces a new string, REALbasic has to allocate a fresh block of memory large enough for the combined result and copy the old contents plus the new data into it. Allocating memory is relatively slow, and since everything read so far is copied again on every pass, the cost keeps growing as the string gets longer. The solution is to allocate enough memory up front so that it never has to be reallocated. Sounds easy, and it is, but REALbasic strings can't do this, so we'll do it ourselves using a MemoryBlock.
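To put a number on that copying cost, here is a small back-of-the-envelope model, written as a Python sketch rather than REALbasic. It assumes that each "s = s + chunk" copies the whole new string and that a preallocated buffer writes each byte exactly once. It counts bytes copied, not seconds, and ignores the cost of the allocations themselves.

```python
def bytes_copied_by_concatenation(total: int, chunk: int) -> int:
    """Model of s = s + chunk: each pass builds a new string, so every
    byte accumulated so far is copied again along with the new chunk."""
    copied = 0
    length = 0
    while length < total:
        length += min(chunk, total - length)
        copied += length  # the entire new string is written out
    return copied


def bytes_copied_with_preallocation(total: int, chunk: int) -> int:
    """Model of writing each chunk into a buffer sized up front:
    only the new chunk is written on each pass."""
    copied = 0
    length = 0
    while length < total:
        step = min(chunk, total - length)
        copied += step
        length += step
    return copied


if __name__ == "__main__":
    one_mb = 1024 * 1024
    print(bytes_copied_by_concatenation(one_mb, 2048))    # 268959744
    print(bytes_copied_with_preallocation(one_mb, 2048))  # 1048576
```

For a 1 MB file read in 2048-byte chunks, the concatenation model copies about 256 times as much data as the preallocated one, and the ratio keeps growing with the file size.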


The CapacityString Class

Create a new class called CapacityString and add three properties to it: mCapacity as Integer, mLength as Integer, and mData as MemoryBlock. mData is the chunk of memory we'll use to store the string. mCapacity caches the size of the MemoryBlock; it always holds the same value as mData.Size, but the fewer function calls we make, the faster the code will be. Finally, because the string inside the MemoryBlock will almost never fill the entire MemoryBlock, mLength stores the length of the string in bytes.

Sub Constructor(capacity as Integer)
  mCapacity = capacity
  mData = New MemoryBlock(mCapacity)
  // mLength starts at 0, so the string is initially empty
End Sub

Function Operator_Convert() As String
  return mData.StringValue(0, mLength)
End Function


The constructor initializes the mData MemoryBlock to have the capacity we want, and Operator_Convert is a handy method to return the string that is stored in the CapacityString.
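Picking the capacity up front matters because, as the methods that follow show, the class grows the MemoryBlock to exactly the size it needs whenever a string doesn't fit. The Python sketch below (a model, not REALbasic code) counts how many resizes a run of appends triggers under that exact-fit rule. The doubling option is not part of this article's class; it is a common alternative growth policy, included here only for comparison.

```python
def count_resizes(initial_capacity: int, chunk: int, appends: int,
                  doubling: bool = False) -> int:
    """Count how often the buffer must be resized while appending
    `appends` chunks of `chunk` bytes each."""
    capacity = initial_capacity
    length = 0
    resizes = 0
    for _ in range(appends):
        needed = length + chunk
        if capacity < needed:
            if doubling:
                # Alternative policy (not the article's): at least double.
                capacity = max(capacity * 2, needed)
            else:
                # The article's policy: grow to exactly what is needed.
                capacity = needed
            resizes += 1
        length = needed
    return resizes


if __name__ == "__main__":
    one_mb = 1024 * 1024
    print(count_resizes(one_mb, 2048, 512))            # 0: sized correctly
    print(count_resizes(0, 2048, 512))                 # 512: every append
    print(count_resizes(0, 2048, 512, doubling=True))  # 10
```

With the full capacity reserved, 512 appends never resize. Starting from zero, exact-fit growth resizes on every append, which reallocates just as often as plain string concatenation. So give the constructor a realistic size, such as the file's length.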

The SetString method below sets the string stored in the CapacityString. Each of the methods below first checks whether the string will actually fit inside the MemoryBlock. If it won't, the method grows the MemoryBlock right there (inline rather than in a separate helper method, since an extra method call would add overhead ;^) and then stores the string.

Sub SetString(s as String)
  dim slen as Integer = LenB(s)
  
  if mCapacity < slen then
    mCapacity = slen
    mData.Size = mCapacity
  end if
  mData.StringValue(0, slen) = s
  mLength = slen
End Sub


Sub AppendString(s as String)
  dim slen as Integer = LenB(s)
  
  if mCapacity < mLength + slen then
    mCapacity = mLength + slen
    mData.Size = mCapacity
  end if
  mData.StringValue(mLength, slen) = s
  mLength = mLength + slen
End Sub
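For readers who want to experiment outside REALbasic, here is a rough Python translation of the class so far. It is a sketch, not a port of MemoryBlock: a bytearray stands in for mData, the attribute names mirror mCapacity, mLength, and mData, and strings are handled as raw bytes, so LenB becomes len.

```python
class CapacityString:
    """Python sketch of CapacityString: a preallocated bytearray plus a
    separate count of how many bytes are actually in use."""

    def __init__(self, capacity: int) -> None:
        self._capacity = capacity         # mirrors mCapacity
        self._data = bytearray(capacity)  # mirrors mData (zero-filled)
        self._length = 0                  # mirrors mLength

    def value(self) -> bytes:
        # Like Operator_Convert: return only the used part of the buffer.
        return bytes(self._data[:self._length])

    def set_string(self, s: bytes) -> None:
        slen = len(s)
        if self._capacity < slen:
            # Grow to exactly the size needed, like mData.Size = mCapacity.
            self._data.extend(bytes(slen - self._capacity))
            self._capacity = slen
        self._data[0:slen] = s  # same-length slice: buffer size unchanged
        self._length = slen

    def append_string(self, s: bytes) -> None:
        slen = len(s)
        needed = self._length + slen
        if self._capacity < needed:
            self._data.extend(bytes(needed - self._capacity))
            self._capacity = needed
        self._data[self._length:needed] = s
        self._length = needed


if __name__ == "__main__":
    cs = CapacityString(16)
    cs.append_string(b"Hello, ")
    cs.append_string(b"world")
    print(cs.value())  # b'Hello, world'
```

As in the REALbasic version, the buffer only grows when a string doesn't fit, and then only to the exact size required.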


AppendString is similar to SetString but adds the string onto the end, so it's the equivalent of "s = s + ...". The InsertString method below has no direct equivalent for REALbasic's String type, because with Strings you have to use Mid, or Left and Right, to insert text in the middle. So this not only speeds things up, but gives us extra functionality. That's nice. :^)

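Sub InsertString(location as Integer, s as String)
  dim slen as Integer = LenB(s)
  
  // location is a 1-based byte position (like MidB); it must fall
  // within the string or just past its end
  if location < 1 or location > mLength + 1 then
    raise new OutOfBoundsException
  end if
  
  if mCapacity < mLength + slen then
    mCapacity = mLength + slen
    mData.Size = mCapacity
  end if
  
  // Convert to a 0-based byte offset for the MemoryBlock
  location = location - 1
  
  // Shift the tail right by slen bytes to open a gap, then write s into it
  mData.StringValue(location + slen, mLength - location) = mData.StringValue(location, mLength - location)
  mData.StringValue(location, slen) = s
  mLength = mLength + slen
End Sub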
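The InsertString steps (grow, convert the 1-based position, shift the tail, write the new bytes) can be tried out with this standalone Python sketch. It works on a bytearray plus a separate used length, like mData and mLength; the function name and the IndexError check are this sketch's own choices. Copying the tail into a separate bytes object before writing it back mirrors how the REALbasic version reads StringValue into a String first, so the overlapping source and destination can't clobber each other.

```python
def insert_string(buf: bytearray, length: int, location: int, s: bytes) -> int:
    """Insert s at 1-based byte position `location` within the first
    `length` bytes of buf, growing buf if needed. Returns the new length."""
    slen = len(s)
    if not 1 <= location <= length + 1:
        raise IndexError("location out of range")
    if len(buf) < length + slen:
        buf.extend(bytes(length + slen - len(buf)))  # grow to an exact fit
    pos = location - 1                               # 1-based -> 0-based
    tail = bytes(buf[pos:length])                    # copy the tail first
    buf[pos + slen:pos + slen + len(tail)] = tail    # shift it right
    buf[pos:pos + slen] = s                          # write into the gap
    return length + slen


if __name__ == "__main__":
    buf = bytearray(b"helloworld")
    n = insert_string(buf, len(buf), 6, b", ")
    print(bytes(buf[:n]))  # b'hello, world'
```

Because every slice assignment replaces a range with data of the same length, the bytearray only changes size in the explicit extend step, just as the MemoryBlock only changes size when mData.Size is set.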


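For a simple test of the class, put this code in a PushButton's Action event. It reads the same file twice, once into a String and once into a CapacityString, and times each pass: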

Sub Action()
  dim s as String
  dim cs as CapacityString
  dim bin as BinaryStream
  dim i, length as Integer
  dim time as Double
  dim file as FolderItem
  
  file = GetOpenFolderItem("")
  if file = nil then return
  
  
  ///////////////////////
  // Using a String
  ///////////////////////
  time = Microseconds
  
  bin = File.OpenAsBinaryFile
  length = bin.Length
  
  for i = 1 to length step 2048
    s = s + bin.Read(2048)
  next
  
  time = (Microseconds - time) / 1000000
  MsgBox "String: " + Format(time, "###.##") + " seconds"
  bin.Close
  
  
  
  ///////////////////////
  // Using a CapacityString
  ///////////////////////
  
  time = Microseconds
  
  bin = File.OpenAsBinaryFile
  length = bin.Length
  cs = New CapacityString(length)
  
  for i = 1 to length step 2048
    cs.AppendString bin.Read(2048)
  next
  
  time = (Microseconds - time) / 1000000
  MsgBox "CapacityString: " + Format(time, "###.##") + " seconds"
  bin.Close
  
End Sub
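The same comparison can be sketched in Python, reading from an in-memory stream instead of a file chosen with GetOpenFolderItem. This is an analogy, not a measurement of REALbasic: Python's string handling differs, and idiomatic Python would simply use b"".join or a bytearray. The function names are made up for this sketch, and it checks that both approaches produce identical data before reporting any times.

```python
import io
import time


def read_with_concatenation(stream: io.BufferedIOBase, chunk: int = 2048) -> bytes:
    s = b""
    while True:
        block = stream.read(chunk)
        if not block:
            break
        s = s + block  # builds a brand-new bytes object on every pass
    return s


def read_with_capacity(stream: io.BufferedIOBase, length: int,
                       chunk: int = 2048) -> bytes:
    buf = bytearray(length)  # allocate the expected size once
    used = 0
    while True:
        block = stream.read(chunk)
        if not block:
            break
        # Write in place; the bytearray grows only if `length` was too small.
        buf[used:used + len(block)] = block
        used += len(block)
    return bytes(buf[:used])


if __name__ == "__main__":
    data = bytes(range(256)) * 4096  # 1 MB of sample data
    runs = (("String-style concatenation", lambda st: read_with_concatenation(st)),
            ("Preallocated buffer", lambda st: read_with_capacity(st, len(data))))
    for name, reader in runs:
        start = time.perf_counter()
        result = reader(io.BytesIO(data))
        elapsed = time.perf_counter() - start
        assert result == data
        print(f"{name}: {elapsed:.3f} seconds")
```

Actual timings depend on the machine and interpreter, so no particular speedup should be read into this sketch.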


Finished

This isn't a completely "finished" class, since it doesn't handle every situation a string can run into (it ignores text encodings, for instance, and has no way to remove text), but it's a solid foundation for anyone wanting to take the idea even further. Download this project.