Using += for appending to a string has a big performance benefit

echuck · February 5, 2022, 12:33am

It might be easy to overlook that the new “+=” operator can be used to append strings to an existing string in addition to incrementing a numerical value. However, using the “+=” method provides a significant performance benefit, as I will illustrate with this posting.

The legacy method for appending a string to an existing string has been to code something like this:

$Text = $Text + "some new text...";

The new way to do this with Tinderbox 9.1 and later is as follows:

$Text += "some new text...";

However, while the two approaches shown above achieve the same result, the new += version seems to be much faster. This makes sense, because the new approach does not require Tinderbox to completely replace the contents of $Text (or any other attribute or variable) with the same text plus whatever is being appended. Instead, it merely needs to append the new text.
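To make that reasoning concrete, here is a rough Python model. It is not Tinderbox code and makes no claim about how Tinderbox is implemented internally. It simply assumes that the legacy form costs work proportional to the whole text on every call, while an in-place append costs work proportional only to the new record. The function names, the record length, and the "characters handled" cost measure are all assumptions made for illustration.

```python
def cost_rebuild(record_len: int, records: int) -> int:
    """Characters handled if every append re-reads the text and writes it all back."""
    total = 0
    length = 0
    for _ in range(records):
        length += record_len  # text length after this append
        total += length       # assumed cost: the whole text is handled again
    return total


def cost_append(record_len: int, records: int) -> int:
    """Characters handled if each append touches only the new record."""
    return record_len * records


if __name__ == "__main__":
    k, s = 80, 2000  # hypothetical: 80-character records, 2,000 appends
    print(cost_rebuild(k, s))  # 160080000
    print(cost_append(k, s))   # 160000
```

The model only shows how the two costs grow as the log gets longer. Real timings include other overhead that it ignores, so the ratio it produces should not be read as a prediction of actual run times.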

To test this out, I used the Logging tools that I provided as Toolbox v1b to compare the two approaches to appending new text. More information about the logging tools is available in the post Logging for fun and profit.

The logging tools employ a single function to perform all logging operations. When called, this function will append the provided text as a new log record to a specified log book note. For backwards compatibility with older versions of Tinderbox, the provided Toolbox v1b document uses the “$Text=$Text+string” approach, but in my own Tinderbox documents, I’m using the “+=” approach.

I have just finished a rather complex custom explode project where I use Action code to process a somewhat messy set of imported files into over 300 Tinderbox Prototypes that will later be used in further import operations. Since I use the logging tools for debugging, I was running this custom explode code with several log messages being recorded for each imported file. The net effect was that about 2,000 log records were being recorded into two different logbooks for one complete run of the custom explode. Realizing that this provided a real-world opportunity to test the relative performance of the two approaches to appending strings, I ran this code twice, first with the legacy approach, and subsequently with the “+=” approach. Each test case was started with fresh log books.

In the first case, the execution took 31 minutes. In the second, it took less than 5 minutes. The only difference was one changed line of code. Furthermore, since timestamps were included with the log records, I could see direct evidence that appends slowed down as the logbooks grew. With the legacy approach, it was clear even from just watching the progress that things slowed down toward the end of the run.

So, if you do a lot of appending of strings, and have found performance suffering, you might want to try modifying your code to use the new “+=” operator.

mwra · February 5, 2022, 11:11am

Thanks for that interesting insight. I’ve already been updating ‘x = x + …’ patterns to ‘x += …’, mainly for concision (readability), but this gives an added benefit. I’d infer -= yields a similar advantage, even if only noticeable at scale as in your project.

Here, the Explode itself appears to be the incidental context, correct? It’s not stated, so I’m assuming the ‘test’ code was running in the explode action or in something like the ‘Exploded Notes’ prototype’s OnAdd.

As to Explode itself, in one research project I quite regularly explode a note with c.3k–5k lines into per-line notes. Why? Essentially, to make a note per source attribute (list) value. Even using the default paragraph break marker, the Explode takes more than the blink of an eye, but there’s a lot to do so it’s no inconvenience: some things take time (lots of regex under the hood). I’m lucky as in my case the source $Text is literally one attribute value per line.

However, if I had a similarly big source and the demarcation needed a complex regex, my past solution has been to copy the text (assuming it is not RTF styled) into a tool like BBEdit and insert a simple string at the beginning of each putative split. Using a string like #####, which is unlikely to occur in the body text (but do test that!), I paste back to Tinderbox and use the simpler configuration to split on ##### (deleting the split marker). In a big explode, I’ve found that helps.
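The marker step can also be scripted rather than done by hand in an editor. Below is a minimal Python sketch, assuming plain (unstyled) text and a hypothetical split rule where each new item starts with a line like "Item 12:". The pattern, the marker constant, and the function name are illustrative only and not part of Tinderbox or BBEdit.

```python
import re

MARKER = "#####"  # assumed not to occur in the body text; the function checks this


def add_split_markers(text: str, start_pattern: str) -> str:
    """Insert MARKER on its own line before every line that matches start_pattern."""
    if MARKER in text:
        raise ValueError("marker already occurs in the text; pick another one")
    pattern = re.compile(start_pattern, flags=re.MULTILINE)
    # Prefix each match with the marker line, keeping the matched text itself.
    return pattern.sub(lambda m: MARKER + "\n" + m.group(0), text)


if __name__ == "__main__":
    sample = "Item 1: alpha\nmore alpha\nItem 2: beta\n"
    print(add_split_markers(sample, r"^Item \d+:"))
```

The result can then be pasted back and exploded on the simple ##### delimiter, leaving the complex pattern matching outside the Explode.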

But, for the general reader here, I’d stress that most users’ Explode use will be on a much smaller source and the result is near instant, so the nuances above are moot. IOW, don’t worry about changing a small doc with 100 small notes and don’t be put off using Explode.

Thanks for this interesting report.

echuck · February 5, 2022, 3:44pm

@mwra—Mark, thanks for the feedback.

I should clarify that the test I conducted really just involved a simple logging function. The “custom explode” operation was merely incidental, and only served to generate a lot of logging activity, hence a lot of text appending actions. This operation did not use the built-in Explode operations provided with Tinderbox.

I should also clarify that I was testing only the text append feature of the new “+=” operator. While the numerical increment (and decrement) variants of the += operator likely have some performance benefits, I would not expect there to be a noticeable difference, whereas text appending is a much more intensive operation.

The logging function I developed, LogRec( ), appends log records to existing text, so it offers an easy way to test the performance difference between the legacy method and the recently introduced “+=” method. Another benefit of logging for this test is that the log records can include timestamps to measure progress. I took advantage of something I was doing anyway, a custom explode operation, to generate a lot of logging activity.

To explain further, the “explode” operation I’m performing is truly a custom operation that does not utilize the built-in explode features of Tinderbox. While I’ve used Tinderbox’s explode operations, and these are useful in typical situations, my project requires a moderately complex procedure to translate another, highly specialized language into Tinderbox notes that can be analyzed, and eventually regenerated based on automated procedures and templates.

Due to the complexity of this customized export, I use logging to record the steps in the procedure and to convince myself that the input files are being properly translated into Tinderbox elements. It was this logging activity that provided a convenient opportunity to test the performance difference between the legacy method for appending text and the recently introduced += method. In other words, I was performing this custom explode anyway, so I took advantage of the opportunity to also test the performance impact on appending text, but only with the logging function. This really did involve changing just one line of code.

So, I agree that no one should conclude from this test that the Tinderbox Explode operations are inefficient, or likely to consume a lot of wall clock time. I should have probably referred to my operation as “translation” instead of “explode” so as to avoid confusion.

mwra · February 5, 2022, 4:41pm

Thanks for that detail. Most useful!

eastgate · February 6, 2022, 3:45pm

You would expect that, for sufficiently long texts,

$Text=$Text+ msg;

would run in time O(n), where n is the length of the text. We need to retrieve and format the text, and for a long text that can be a lot of work.

If the typical message has length k, then after s steps we have a text of length ks. So, the overall runtime will eventually be O(ks²).
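Spelling out the arithmetic: assume every message has the same length k, and that step i costs time proportional to the text length at that step, roughly ki characters. The constant c below is an assumed per-character cost, introduced only for this sketch.

```latex
\[
T(s) \;=\; \sum_{i=1}^{s} c\,k\,i
     \;=\; c\,k\,\frac{s(s+1)}{2}
     \;=\; O\!\left(k s^{2}\right)
\]
```

By contrast, if each append cost only time proportional to the new message, the total would be cks, which is O(ks).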

One thing that might make this faster is to accumulate the log messages in a string attribute rather than $Text. But that’s a lot less convenient!

echuck · February 7, 2022, 1:07am

@eastgate—Thanks for the additional explanation.

For the record, the logging I’m doing works fine with Tinderbox, especially with the += enhancement. It is convenient, and I don’t ever anticipate doing the sort of heavy-duty logging that a real syslog service is intended to handle. My use cases are mostly for debugging, and for keeping simple records of when certain operations were performed. I also use this for logging error conditions, but mostly in a debugging context.