In this week's tutorial we're going to write a CapacityString class that can vastly improve string performance in certain situations. Now, I admit this tutorial isn't exactly going to be eye-catching, but I think for some of you it will be quite an eye-opener.
The Problem
Let's say that you're going to be importing some data from a file, processing it, and outputting the results to a string. Since this process can take quite a while, you're going to want to display a progress dialog with a progress bar that increments realistically based on the current position in the file. Normally you'd have something that looks like this:
bin = File.OpenAsBinaryFile
length = bin.Length
for i = 1 to length step 2048
  s = s + ProcessData( bin.Read(2048) )
  // <show progress>
next
There's nothing extraordinary about this code at all, but what if I told you that it could run more than 50 times faster? It's quite possible, and easy.
Note that every time through the loop we assign a new value to the string "s", and on each assignment REALbasic allocates a fresh block of memory to hold the longer string and copies the existing contents into it. Allocating memory is actually pretty slow, and the copying gets more expensive as the string grows, so doing it over and over again is very inefficient. The solution is simply to allocate enough memory up front so that it never has to be reallocated. Sounds easy, and it is, but REALbasic strings can't do this, so we need to do it ourselves using a MemoryBlock.
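As a rough sketch of the idea before wrapping it in a class, the loop from earlier could write each chunk straight into a MemoryBlock that is sized once up front. This assumes the output is never larger than the input file (true when the data is copied unchanged, as in the test further down). The names mb, offset and chunk are made up for illustration; bin, length, i and s are the variables from the loop above.
dim mb as New MemoryBlock(length) // one allocation, sized to the whole file
dim offset as Integer = 0 // number of bytes written so far
dim chunk as String
for i = 1 to length step 2048
  chunk = bin.Read(2048)
  mb.StringValue(offset, LenB(chunk)) = chunk // written in place, the result is never reallocated
  offset = offset + LenB(chunk)
next
s = mb.StringValue(0, offset) // build the final string once, at the end
The class below packages the same bookkeeping (a block, its capacity, and the number of bytes used) so it doesn't have to be repeated at every call site.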
The CapacityString Class
Create a new class called CapacityString and add three properties to it: mCapacity as Integer, mLength as Integer, and mData as MemoryBlock. mData is the chunk of memory that we're going to use to store the string. mCapacity caches the size of the MemoryBlock; it always holds the same value as mData.Size, but the fewer function calls we make, the faster the code will be. Because the string inside the MemoryBlock will almost never fill the whole MemoryBlock, we use mLength to store the size of the string in bytes.
Sub Constructor(capacity as Integer)
  mCapacity = capacity
  mData = New MemoryBlock(mCapacity)
End Sub

Function Operator_Convert() As String
  return mData.StringValue(0, mLength)
End Function
The constructor initializes the mData MemoryBlock to the capacity we want, and Operator_Convert is a handy method that returns the string stored in the CapacityString, so a CapacityString can be assigned directly to a String variable.
The SetString method below sets the string in the CapacityString. The first thing each of the methods below does is check whether the string will actually fit inside the MemoryBlock. If it doesn't, the method resizes the MemoryBlock (inline, since function calls would add overhead ;^) and then assigns the string.
Sub SetString(s as String)
  dim slen as Integer = LenB(s)
  if mCapacity < slen then
    mCapacity = slen
    mData.Size = mCapacity
  end if
  mData.StringValue(0, slen) = s
  mLength = slen
End Sub

Sub AppendString(s as String)
  dim slen as Integer = LenB(s)
  if mCapacity < mLength + slen then
    mCapacity = mLength + slen
    mData.Size = mCapacity
  end if
  mData.StringValue(mLength, slen) = s
  mLength = mLength + slen
End Sub
AppendString is similar to SetString but just adds the string onto the end; it is the equivalent of "s = s + ...". The InsertString method below has no direct equivalent in REALbasic's String type, because with Strings you have to use Mid, or Left and Right, to insert text in the middle. So this not only speeds things up, but gives us extra functionality. That's nice. :^)
Sub InsertString(location as Integer, s as String)
  dim slen as Integer = LenB(s)
  if mCapacity < mLength + slen then
    mCapacity = mLength + slen
    mData.Size = mCapacity
  end if
  // location is 1-based; convert it to a 0-based byte offset
  location = location - 1
  // shift the tail of the existing string right by slen bytes to open a gap
  mData.StringValue(location + slen, mLength - location) = mData.StringValue(location, mLength - location)
  // copy the new string into the gap
  mData.StringValue(location, slen) = s
  mLength = mLength + slen
End Sub
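To see how the three methods fit together, here is a small usage sketch. It assumes the class is built exactly as above; the capacity of 32 and the variable names are arbitrary choices for illustration.
dim cs as New CapacityString(32) // plenty of room for this example
cs.SetString "Hello World"
cs.AppendString "!"
cs.InsertString 6, "," // 1-based: inserts before the 6th byte (the space)
dim result as String = cs // Operator_Convert does the conversion
MsgBox result // shows "Hello, World!"
Keep in mind that the class measures everything with LenB, so the location passed to InsertString is a byte position. For plain ASCII text that is the same as the character position, but for multi-byte text it is not.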
For a simple test of the class, you can use this code:
Sub Action()
  dim s as String
  dim cs as CapacityString
  dim bin as BinaryStream
  dim i, length as Integer
  dim time as Double
  dim file as FolderItem
  file = GetOpenFolderItem("")
  if file = nil then return
  ///////////////////////
  // Using a String
  ///////////////////////
  time = Microseconds
  bin = File.OpenAsBinaryFile
  length = bin.Length
  for i = 1 to length step 2048
    s = s + bin.Read(2048)
  next
  time = (Microseconds - time) / 1000000
  MsgBox "String: " + Format(time, "###.##") + " seconds"
  bin.Close
  ///////////////////////
  // Using a CapacityString
  ///////////////////////
  time = Microseconds
  bin = File.OpenAsBinaryFile
  length = bin.Length
  cs = New CapacityString(length)
  for i = 1 to length step 2048
    cs.AppendString bin.Read(2048)
  next
  time = (Microseconds - time) / 1000000
  MsgBox "CapacityString: " + Format(time, "###.##") + " seconds"
  bin.Close
End Sub
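The timings only mean something if both loops produced the same data. One way to check is sketched here as a few lines that could go just before the End Sub of Action. The name result is made up for this example, and StrComp with mode 0 does a byte-for-byte comparison (a plain "=" between Strings is not case-sensitive, so it could hide differences).
dim result as String = cs // convert via Operator_Convert
if StrComp(s, result, 0) <> 0 then
  MsgBox "The String and CapacityString results differ!"
end if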
Finished
This isn't a completely "finished" class, as it doesn't take every string possibility into account, but it's a solid foundation for anyone wanting to take the idea even further.
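One possible direction: as written, each method grows the MemoryBlock to exactly the size it needs, so if the initial capacity turns out to be too small, every later append resizes the block again. A common remedy, which is not part of the class as presented, is to at least double the capacity whenever it runs out. A sketch of AppendString with that change, where needed is a name introduced just for this example:
Sub AppendString(s as String)
  dim slen as Integer = LenB(s)
  dim needed as Integer = mLength + slen
  if mCapacity < needed then
    // grow to double the old capacity, or to the needed size if that is larger
    mCapacity = mCapacity * 2
    if mCapacity < needed then mCapacity = needed
    mData.Size = mCapacity
  end if
  mData.StringValue(mLength, slen) = s
  mLength = needed
End Sub
The trade-off is memory: the block can end up nearly twice as large as the string it holds. Operator_Convert still returns only the first mLength bytes, so the extra space never shows up in the result.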