An aTbRef for AIs?

During an interesting Zoom call with @andreas, we explored the notion of writing for an AI. The AI’s stated preference was for consuming aTbRef as XML.

Prompt: Do you prefer to look at Tinderbox using the Tinderbox skill or by directly reading the XML?

Honestly, direct XML reading is faster and more reliable for most tasks:

  • No dependency on Tinderbox being open
  • Can search/parse the full document instantly
  • Better for bulk queries and exploration

The Tinderbox skill (JXA) is better when you need to write changes back to a live, open document — creating notes, modifying attributes, etc.

So: read via XML, write via the skill. What are you looking to do?

Prompt: Why is it quicker or more effective to use the XML?

A few practical reasons:

  1. Direct file access — I read the .tbx file with the Read or Grep tool instantly, no round-trips through osascript and the Tinderbox app.
  2. Bulk search — I can Grep across the entire XML for any attribute, note name, or value in one shot. JXA would require iterating note by note.
  3. No app dependency — the file can be read even if Tinderbox isn’t running or the document isn’t open.
  4. Full structure visible — the raw XML exposes everything: attributes, prototypes, links, agents, adornments — nothing is hidden behind an API.

The JXA skill is more of a necessity for writes, since you shouldn’t edit a live Tinderbox document’s XML directly while it’s open (you risk corruption or having changes overwritten when Tinderbox saves).
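The “bulk search” point can be sketched in a few lines of Python. This is a minimal illustration only: the sample markup below is a stand-in, not the real .tbx schema, and `find_values` is a hypothetical helper, not anything Claude actually ran:

```python
import xml.etree.ElementTree as ET

def find_values(xml_text, needle):
    """Return (tag, text) pairs for every element whose text contains needle."""
    root = ET.fromstring(xml_text)
    hits = []
    for elem in root.iter():  # walks the whole tree in one pass, grep-style
        if elem.text and needle in elem.text:
            hits.append((elem.tag, elem.text.strip()))
    return hits

# Tiny stand-in document (illustrative markup, not real .tbx structure):
sample = """<doc>
  <note><name>Agents</name><text>Agents run queries.</text></note>
  <note><name>Rules</name><text>Rules run actions.</text></note>
</doc>"""

hits = find_values(sample, "queries")
```

One parse, one traversal, every match found: the contrast is with JXA, where the same search would mean a round-trip to the app per note.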

Prompt: Could the XML be written more plainly, to aid you in reading the structure of the document?

Now I have a sense of it. The format is fairly readable but verbose — lots of attributes with many XML properties (editable, visibleInEditor, lines, type, default) that are mostly boilerplate for system attributes.

To answer your question: yes, it could be simplified for my purposes. A few approaches:

  1. Strip boilerplate attributes — for system <attrib> definitions, most of the XML attributes (editable, visibleInEditor, lines) are noise. Only Name, parent, type, and default are semantically interesting.
  2. Export a summary — Tinderbox can export note outlines or tab-delimited data that is far more compact than the full XML.
  3. Preprocess to a plain outline — a simple script could walk the XML and produce a flat or indented text representation of just notes, their key attributes, and their values, stripping the schema/definition boilerplate entirely.

How large is the file? That would tell me how much the verbosity is actually a problem in practice.
200,000 lines and ~19MB — that’s substantial. Reading it raw would consume a lot of context. So yes, simplification would genuinely help.

The most practical approach would be a Python script that parses the XML and outputs a compact representation — just note names, paths, key user-defined attributes, and text content, skipping all the system schema definitions. That could reduce it by an order of magnitude.

Would you like me to write such a script?

So we did. The plan was…


Plan: Tinderbox XML Simplifier (simplify_tbx.py)

A Python script to strip noise from the 19MB .tbx XML and produce a compact plain-text outline for LLM reading. Key decisions:

  • Keep: note names, hierarchy, proto, semantic attributes (URL, Rule, Edict, AgentQuery, AgentAction, IsPrototype), text content, resolved links
  • Strip: <rtfd> blobs, visual/positional attributes, <preferences>, <windows>, schema definitions
  • Output format: indented plain-text outline (not XML or JSON)
  • CLI options: --max-text, --no-text, --output
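The actual simplify_tbx.py isn’t reproduced here, but as a rough illustration a skeleton following the plan’s keep/strip decisions might look like the following. The `<item>`/`<attribute name="…">` element names are assumptions about the .tbx schema, not verified against the real file format:

```python
import xml.etree.ElementTree as ET

# Attributes the plan deems semantically interesting (keep-list)
KEEP = {"URL", "Rule", "Edict", "AgentQuery", "AgentAction", "IsPrototype"}

def walk(item, depth, out, max_text):
    # Assumed schema: each note is an <item>; each attribute is an
    # <attribute name="..."> child whose text is the value. The real
    # .tbx element names may differ.
    attrs = {a.get("name"): (a.text or "") for a in item.findall("attribute")}
    out.append("  " * depth + attrs.get("Name", "(untitled)"))
    for key in sorted(KEEP & attrs.keys()):
        out.append("  " * (depth + 1) + f"{key}: {attrs[key]}")
    text = attrs.get("Text", "")
    if text and max_text:
        out.append("  " * (depth + 1) + "| " + text[:max_text])  # truncate $Text
    for child in item.findall("item"):  # nested <item>s carry the hierarchy
        walk(child, depth + 1, out, max_text)

def simplify(root, max_text=200):
    """Emit an indented plain-text outline of the notes, skipping
    schema definitions, <rtfd> blobs and visual attributes entirely."""
    out = []
    for top in root.findall("item"):
        walk(top, 0, out, max_text)
    return "\n".join(out)
```

A real version would also need `argparse` handling for `--max-text`, `--no-text` and `--output`, plus `ET.parse(path)` to read the 19 MB file from disk (or `iterparse` if memory were a concern).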

The result: atbref-xml-ai.txt (2.6 MB).

Disclaimer: the txt file is collateral reference to the thread and not intended for use. Experiment with it (with AI) at your risk.

I’m not skilled enough yet to gauge the context size difference of the 2.6MB TXT file vs. the 19.2MB TBX file. It’s nice that the AI version is still readable to a human.

Given how an AI doesn’t ‘understand’ in a human sense, I suspect that it has thrown out info on features it hasn’t needed or been asked to use. So as a general resource it might be limited. But this might be an interesting approach for a (process) agent: a down-scaled reference for an agent with limited implementation flexibility.

A clear takeaway is that review and iteration are required for active use, not least as aTbRef reflects constant change in Tinderbox itself so the content and structure of the TBX is not settled.

Still, I think this is worth investigating in the context of AI+TB use, not least as smaller files mean a smaller context, which uses fewer tokens, which in aggregate means less cost, be it to our pockets or to the environment through building power plants and data centres.


Could you share ‘simplify_tbx.py’ with us?

JM

I’ve fixed the TXT download link in my last post.

I don’t see much point in sharing the TXT-creation Python script as, looking at the TXT, why would you want to? It strikes me the AI missed the point of the question, or the human (me!) didn’t phrase it accurately enough. The TXT only retains info on features the AI has used in the current session, but not those in the corpus it has not used yet. It retains an arbitrary amount of text, which presupposes all the important info in $Text is in the first few sentences. I know this not to be the case (as I write the TBX). So, I think more work is needed.

At this point, I don’t see any point in sharing the Python as it isn’t making the ‘right’ output. As I’ve not reviewed the code, I’ve no idea if it is safe for other use. The TXT is just text; code (untested/unreviewed) might do more. Also, bear in mind anyone with Claude Code can recreate it from the info up-thread. Is there a need to see the code for anything other than idle curiosity?

The point of the original post, perhaps being missed, was to review/discuss whether this example generates sensible output. The problem with genAI is the output, whilst tidy, is not always correct/useful.


One thing I’ve quickly spotted is the AI failed to grok that $Name is not required to be a unique value, and for content reasons in a document like aTbRef it is not. So the way it thinks it has stored links is not going to work. Bear in mind this was a speculative test: Claude hasn’t tried to use the TXT file as a direct replacement for aTbRef.

Admittedly, I didn’t make the last point to Claude. Nor did I inform it about $Name non-uniqueness. The AI/human communication gap here is that it can’t look at something and tell us which bit it doesn’t understand: something innate to a human when stuck. So we can’t ask the AI what it doesn’t understand other than by posing edge case problems … a bit like human beta testing of the app!
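One plausible fix for the non-unique $Name problem is to key links by full outline path rather than by name. A minimal sketch, assuming notes nest as `<item>` elements with a `Name` attribute child (an assumption about the .tbx schema, not a verified fact):

```python
import xml.etree.ElementTree as ET

def paths_by_name(root):
    """Map each note name to the list of full outline paths using it,
    so duplicate names stay distinguishable."""
    index = {}
    def walk(item, prefix):
        name = next((a.text for a in item.findall("attribute")
                     if a.get("name") == "Name"), "(untitled)")
        path = prefix + "/" + name      # path is unique even when name is not
        index.setdefault(name, []).append(path)
        for child in item.findall("item"):
            walk(child, path)
    for top in root.findall("item"):
        walk(top, "")
    return index

# Illustrative document with a deliberately duplicated name:
sample = """<tinderbox>
  <item><attribute name="Name">Links</attribute>
    <item><attribute name="Name">Actions</attribute></item>
  </item>
  <item><attribute name="Name">Actions</attribute></item>
</tinderbox>"""
index = paths_by_name(ET.fromstring(sample))
```

Here `index["Actions"]` holds two distinct paths, which is exactly the case a name-keyed link store would silently collapse.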

I feel the above is an interesting starting experiment. If nothing else it helps underscore our naïve presumptions as to the degree to which an AI ‘understands’ the problem. I don’t say that as a point of blame or shame: it’s just code, albeit very fast/clever code. The forthcoming challenge goes back to the initial provocation: how do we write (minimally) and meaningfully for a non-human reader such as an AI?

An interesting idea — thank you for that.
I tend toward ready-made solutions out of convenience and regularly fail because they don’t really fit properly. My experience working with AI feels sobering. I often spend more time correcting errors introduced by the AI than on the actual task itself. I find myself increasingly reducing Claude to little more than a convenient search engine, as my tolerance for frustration drops proportionally with the length of the conversation.

At the moment, Claude remembers questions from general topics I asked once more reliably than specific Tinderbox functions it repeatedly tries to implement incorrectly — even though it has documented those functions in Hints.
A love-hate relationship has developed between us.

Things I have found to help:

  1. Look at your readings in Hints with an eye to keeping them as short and as limited in scope as you can.

  2. It sounds like some tasks are recurring problems. Consider asking Claude about that topic after it has done the readings.

  3. Review CLAUDE.md and MEMORY.md. If these are too long, that can be a problem. If these contradict the readings, that can really be a problem.

  4. I’d be interested to know more about the specific friction points. Also, are you using Desktop or Claude Code CLI? And which model are you using?

Thank you for the hints. I use Claude Desktop and, after the meetup with Andreas Grimm, also Claude Code CLI on a trial basis. Currently with the Sonnet 4.6 model. I’m also a German speaker, and so I could tell that his inputs don’t differ significantly from mine. I was always wondering whether English inputs would be more comprehensible.
The problem is simply fundamental in nature and purely my personal experience with AI, outside of Tinderbox as well. Though the differences between vendors and the strengths and weaknesses of the respective models are obvious. I often get the impression that AI mostly suggests overly convoluted paths to the goal. It rarely surprises me. Sometimes I wonder if it’s trying to pull my leg, because the correct solution is so simple and logical — and when I point that out, the usual flattery follows.
I don’t want to be negative about AI here though. It just crossed my mind.

I find aTbRef for AI fascinating

AI is code. It doesn’t ‘think’, nor does it ‘know’ in a human sense. Tinderbox is not a common app, so why would the LLM, which only knows of things from before the model was trained (i.e. likely not much about Tinderbox) plus what we individually teach it locally on our own systems, use the methods we would? Thus the frequent recommendations to make sure you record your local learnings and to tell the AI to consult them each session (it doesn’t ‘remember’ between sessions, at least not in a human manner).

The LLM is not local and can’t ‘learn’ (though the next release of the model likely will have some learning). If the AI’s approach seems odd, it’s because its experience differs from ours. So it will know a lot about command line tools, grep etc., not so much about action code in Tinderbox. Ourselves, perhaps the opposite.

So if we treat it like ourselves, in terms of expertise, we are likely to be disappointed. But if we use its strengths and make sure that each session it reviews what it’s learned, we should see better results.

Of course, as learnings grow, so does the data and thus the context used. To me this implies (without knowing the answer) that if we make a minimal instruction set that we intentionally ensure covers all/most Tinderbox use, then re-learning is not needed, and that knowledge will (may?) eventually make its way into the LLMs themselves.

Many thanks for the hints. This also makes sense to me when I think about how LLMs work. I’m not a scientist. The theoretical approach is rather secondary to me. I think more in practical terms. Here is my problem — and which path leads to the expected solution. I enjoy reading scientific papers during quiet hours to broaden my perspective. But I don’t solve my problems with that. I only sharpen my mind with it.
So it is more than likely that my expectations of AI are excessive.
The thing is, I don’t want to adapt my problems to the AI by only giving it tasks it likes.
(A slightly exaggerated little joke)

Best regards

But, how about giving it tasks it has a reasonable chance of being able to address? If we can’t answer the latter, can we reasonably critique the result we are given? Where I see genAI helping is in automating work we know how to do, but doing it faster. Often, it only does it better with a human in the loop who understands the task at hand. genAI is not a magic lamp, even if it is sold as such.

It seems the challenge is to communicate about tasks it can’t yet do, as they aren’t ‘in’ the LLM as solutions, nor does the LLM have enough info to ‘guess’ the solution.

Part of the challenge is we have been given a working companion who is not human, doesn’t think/work like a human and can’t express what it doesn’t know (something a human generally can).


I completely agree with that statement.

You’ve taken a technical issue and framed it in an empathetic way, and you’ve described it well.
I sometimes wonder if I view Claude (as an example) differently than other software products. Not yet, at least, but will that change?