How conservative should I be with agents?

I have a 2019 MBP with 64 GB of RAM. Right now I’m running 10.15.6, but plan on upgrading to Big Sur unless someone cautions me not to. I am running TB 8.9.1.

I have a TBX document with ~500 notes (data items) from which I have ziplinked quotes to create another 1,500 notes (data extracts). The file is at 20 MB right now. I want to run agents to explore the relationships among the 1,500 data extracts. Often this means creating agents that crawl all 1,500 notes and assign them to smart adornments. These adornment queries may be on various user-defined attributes or based on system attributes like Tags or Flags. A common agent might map, say, 250 notes across 86 smart adornments; another might map 400 notes across 5 adornments. Right now these agents/smart adornments do not have actions, but they could (more likely I’d do the $AgentAction bits in the attribute browser).

I can easily imagine having a hundred or so agents in this file. But I can also creatively work around this to avoid it. I’ve never been so query-based in my TBX files, so I’m not sure what’s asking too much of the software/system.

I’ve gotten into some troublesome loops with TB so far, where the app maxes out my CPU and crashes or becomes unresponsive, but so far I think it’s always been because there’s an error in my code. (Side question: how do I stop TB from trying to re-open, at launch, a file that crashed it? I get into a situation where launching TB automatically re-opens the file that was open when it became unresponsive, so it immediately becomes unresponsive again. The only thing that seems to work is to uninstall TB so the preferences reset, then reinstall, to get it to “forget” to open the crashed file.)

Your advice/perspective is most appreciated!

For any Macintosh application: launch the application while holding down the Shift key, and it will skip re-opening its previous documents.


My overall advice: assume that the computer can easily handle what you want to do, while keeping an eye out for signs that it cannot keep up. (One sign that it’s falling behind can be found in the Agents & Rules pane of the Tinderbox inspector: if the number of “pending tasks” keeps growing, you’re in trouble.)

More detailed advice: you’re likely going to be fine — even with lots of agents — provided your agents only need to look at each item once. If your query needs to compare this note with every other note, you’ll be fine for dozens or even hundreds of notes but you’ll hit a wall eventually.
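To make the distinction concrete, here is a minimal sketch in plain Python (not Tinderbox action code, and not how Tinderbox is implemented); the note data and field names are invented for illustration. A query that examines each note once does work proportional to the note count, while one that compares every note against every other note does work proportional to its square.

```python
# Hypothetical stand-in for 1,500 extract notes with a numeric attribute.
notes = [{"name": f"note {i}", "votes": i % 300} for i in range(1500)]

# "Look at each item once": one pass, 1,500 checks for 1,500 notes.
flagged = [n for n in notes if n["votes"] > 100]

# "Compare this note with every other note": 1,500 x 1,500 = 2.25 million
# checks -- this is the kind of query that eventually hits a wall.
duplicates = [
    (a["name"], b["name"])
    for a in notes
    for b in notes
    if a is not b and a["votes"] == b["votes"]
]

print(len(notes), "notes;", len(notes) ** 2, "pairwise comparisons")
```

With 1,500 notes the pairwise version already does over two million comparisons per agent update, which is why per-note queries stay fast far longer.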

Also, whenever you think about hundreds of agents, also think about the attribute browser.

Practical advice: as far as I know, only two users have ever hit the performance wall with Tinderbox agents other than by error. And one of those was just this week. Avoid premature optimization. Go ahead and try; if it blows up, you can scale things back.

Historical advice: In Tinderbox 5 and before, agents competed with you for the processor’s attention and could interrupt your work. Lots of agent-avoidance dates back to those bad old days.

Planning advice: I expect agent capacity to improve substantially in the coming months and years. Infrastructure work now underway will clean up the way agents do their work and keep them out of your hair. Apple Silicon, and current developments in application architecture, promise significant speed improvements.


This is exceptionally helpful, Mark, thank you!

I will avoid premature optimization and keep the attribute browser in mind. Right now I’m using the attribute browser to work out which agents to create, and then creating them to visually map out what they yield.

Beck

I think this repays unpacking for less experienced users. Compare these queries:

$Prototype=="Event"&$Text.contains("thing")
descendedFrom("A container")&$Text.contains("thing")
$Text.contains("thing")

The last is the type of query being talked about: one with no cheap scoping term. The cost matters especially when a query term uses regex-based operators like .contains — all the more so when, as in the last example, it is the only query term.

Why so? A query initially has to test the whole document, as we must assume any note might match. For a TBX with only 50 notes this doesn’t matter; with 5,000 notes it is a different story. So ‘scoping’ the query with an initial query term (as in the first two examples above) that can only match part of the document helps a lot.
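Here is a minimal sketch of that idea in plain Python rather than action code; the note data, counts, and field names are invented. It assumes, as argued above, that terms are tested left to right and a failed cheap term lets the engine skip the expensive one, just as Python’s `and` short-circuits.

```python
# Hypothetical document: 5,000 notes, 1 in 10 uses the "Event" prototype.
notes = [
    {"prototype": "Event" if i % 10 == 0 else "Note",
     "text": "something about a thing" if i % 2 == 0 else "other words"}
    for i in range(5000)
]

checks = 0  # counts how often the expensive text search actually runs

def text_contains(note, word):
    global checks
    checks += 1
    return word in note["text"]

# Unscoped -- like $Text.contains("thing"): the text search runs on
# every one of the 5,000 notes.
hits_unscoped = [n for n in notes if text_contains(n, "thing")]
print("unscoped text searches:", checks)  # 5000

# Scoped -- like $Prototype=="Event" & $Text.contains("thing"): the
# cheap prototype test fails for 90% of notes, so the text search only
# runs on the 500 notes that pass it.
checks = 0
hits_scoped = [n for n in notes
               if n["prototype"] == "Event" and text_contains(n, "thing")]
print("scoped text searches:", checks)  # 500
```

The expensive term does a tenth of the work once a selective first term filters the document.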

I think part of the confusion lies here, in the well-intentioned v2 Help:

  1. The fields within each record must be delimited with either commas (,) or tabs.

CSV and TSV are similar but different formats. A menu item stating “Import CSV” is effectively stating TSV can’t be used. I’d suggest either:

  • One item, “Import CSV or Tab-delimited”, with the delimiter filtered in the next dialogue.
  • An “Import” item with sub-items “CSV” and “Tab-delimited”.

Well-intentioned simplification here simply makes the non-technical user’s challenge harder. They likely understand that CSV != Tab-delimited, but not much more than that. It’s kinder (and less load on support) not to lump the two together at the import stage. Even more so if the dialog has no text explaining the delimiter detected, or asking the user to select the delimiter.

It’s fair to assume the user brings a well-formed CSV/TSV document to the start line. It is unreasonable, given this file is likely the wizard-generated output of another app, to assume they know the difference between the two formats.

Actually, the most expensive queries are those that involve collect_if, find, and related operators.

For example, the following query is quite fast:

$Votes > 100

This collects all notes with over 100 votes. But suppose we want to ask, “Which notes account for more than 10% of all votes?” One way to do this might be

$Votes > sum_if(all,true,$Votes)/10

That means we need to recalculate the total vote count afresh for every note we examine. With a dozen notes, that’s not too bad; with a thousand notes, you’re adding up a million numbers every time you update the agent.
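A minimal sketch of the cost, in plain Python rather than action code (the vote numbers are invented), along with the usual remedy: compute the invariant total once, outside the per-note test, rather than re-summing it for every note.

```python
# Hypothetical stand-in for $Votes across a dozen notes.
votes = [5, 8, 2, 40, 3, 7, 1, 60, 4, 6, 2, 12]

# Naive form, mirroring the query above: the whole list is re-summed
# inside the per-note test, so n notes cost n x n additions.
big_naive = [v for v in votes if v > sum(votes) / 10]

# Hoisted form: one pass to get the total, one pass to test each note,
# so n notes cost about 2n steps instead of n squared.
total = sum(votes)
big_hoisted = [v for v in votes if v > total / 10]

assert big_naive == big_hoisted
print(big_hoisted)  # the notes holding more than 10% of all votes
```

In Tinderbox terms one might similarly stash the document-wide total somewhere it is computed once (rather than inside every agent’s query), though how best to do that depends on the document.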


Regular expressions can also be slow, especially if you have lots of text. Even there, it matters a lot what you want to do.

$Text.contains("Albuquerque")

is not too bad, though if you have a million words in your document, that’s a fair amount of searching. Still, Tinderbox can save a lot of time by being smart; for example, if it finds a capital A and the next character is not an “l”, it can stop right there and skip ahead to the next capital A. If you look for

$Text.contains("A.*que")

(meaning: an A, followed by any number of characters, followed by “que”), Tinderbox might have to read to the end of the note for each capital A it finds. If your notes are long, that’s a lot of work.
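The two searches above can be tried in Python’s `re` module (a different regex engine than Tinderbox’s, so this is only an illustration of the general behavior; the sample text is invented):

```python
import re

# Invented sample text with repeated occurrences of both patterns.
text = "Notes about Albuquerque and its Antique shops. " * 100

# Literal search: a mismatch is rejected almost immediately, so the
# engine can skip ahead through the text quickly.
assert re.search(r"Albuquerque", text) is not None

# "A.*que": the greedy .* matches any run of characters (possibly
# none), so the engine may scan far ahead from each capital A it
# tries before finding, or ruling out, a following "que".
m = re.search(r"A.*que", text)
assert m is not None
print(len(m.group(0)))  # greedy .* stretches the match a long way
```

The longer the notes, the more text each candidate “A” can force the engine to read, which is why unanchored `.*` patterns get expensive on big documents.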