An aTbRef for AIs?

During an interesting Zoom call with @andreas, we explored the notion of writing for an AI. The AI’s stated preference was for consuming aTbRef as XML.

Prompt: Do you prefer to look at Tinderbox using the Tinderbox skill or by directly reading the XML?

Honestly, direct XML reading is faster and more reliable for most tasks:

  • No dependency on Tinderbox being open
  • Can search/parse the full document instantly
  • Better for bulk queries and exploration

The Tinderbox skill (JXA) is better when you need to write changes back to a live, open document — creating notes, modifying attributes, etc.

So: read via XML, write via the skill. What are you looking to do?

Prompt: Why is it quicker or more effective to use the XML?

A few practical reasons:

  1. Direct file access — I read the .tbx file with the Read or Grep tool instantly, no round-trips through osascript and the Tinderbox app.
  2. Bulk search — I can Grep across the entire XML for any attribute, note name, or value in one shot. JXA would require iterating note by note.
  3. No app dependency — the file can be read even if Tinderbox isn’t running or the document isn’t open.
  4. Full structure visible — the raw XML exposes everything: attributes, prototypes, links, agents, adornments — nothing is hidden behind an API.

The JXA skill is more of a necessity for writes, since you shouldn’t edit a live Tinderbox document’s XML directly while it’s open (you risk corruption or having changes overwritten when Tinderbox saves).
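The “bulk search” point can be sketched in a few lines of Python. This is a minimal illustration only: the sample markup below is a stand-in, not the real .tbx schema, and `find_values` is a hypothetical helper, not anything Claude actually ran:

```python
import xml.etree.ElementTree as ET

def find_values(xml_text, needle):
    """Return (tag, text) pairs for every element whose text contains needle."""
    root = ET.fromstring(xml_text)
    hits = []
    for elem in root.iter():  # walks the whole tree in one pass, grep-style
        if elem.text and needle in elem.text:
            hits.append((elem.tag, elem.text.strip()))
    return hits

# Tiny stand-in document (illustrative markup, not real .tbx structure):
sample = """<doc>
  <note><name>Agents</name><text>Agents run queries.</text></note>
  <note><name>Rules</name><text>Rules run actions.</text></note>
</doc>"""

hits = find_values(sample, "queries")
```

One parse, one traversal, every match found: the contrast is with JXA, where the same search would mean a round-trip to the app per note.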

Prompt: Could the XML be written more plainly, to aid you in reading the structure of the document?

Now I have a sense of it. The format is fairly readable but verbose — lots of attributes with many XML properties (editable, visibleInEditor, lines, type, default) that are mostly boilerplate for system attributes.

To answer your question: yes, it could be simplified for my purposes. A few approaches:

  1. Strip boilerplate attributes — for system <attrib> definitions, most of the XML attributes (editable, visibleInEditor, lines) are noise. Only Name, parent, type, and default are semantically interesting.
  2. Export a summary — Tinderbox can export note outlines or tab-delimited data that is far more compact than the full XML.
  3. Preprocess to a plain outline — a simple script could walk the XML and produce a flat or indented text representation of just notes, their key attributes, and their values, stripping the schema/definition boilerplate entirely.

How large is the file? That would tell me how much the verbosity is actually a problem in practice.
200,000 lines and ~19MB — that’s substantial. Reading it raw would consume a lot of context. So yes, simplification would genuinely help.

The most practical approach would be a Python script that parses the XML and outputs a compact representation — just note names, paths, key user-defined attributes, and text content, skipping all the system schema definitions. That could reduce it by an order of magnitude.

Would you like me to write such a script?

So we did. The plan was…


Plan: Tinderbox XML Simplifier (simplify_tbx.py)

A Python script to strip noise from the 19MB .tbx XML and produce a compact plain-text outline for LLM reading. Key decisions:

  • Keep: note names, hierarchy, proto, semantic attributes (URL, Rule, Edict, AgentQuery, AgentAction, IsPrototype), text content, resolved links
  • Strip: <rtfd> blobs, visual/positional attributes, <preferences>, <windows>, schema definitions
  • Output format: indented plain-text outline (not XML or JSON)
  • CLI options: --max-text, --no-text, --output
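The actual simplify_tbx.py isn’t reproduced here, but as a rough illustration a skeleton following the plan’s keep/strip decisions might look like the following. The `<item>`/`<attribute name="…">` element names are assumptions about the .tbx schema, not verified against the real file format:

```python
import xml.etree.ElementTree as ET

# Attributes the plan deems semantically interesting (keep-list)
KEEP = {"URL", "Rule", "Edict", "AgentQuery", "AgentAction", "IsPrototype"}

def walk(item, depth, out, max_text):
    # Assumed schema: each note is an <item>; each attribute is an
    # <attribute name="..."> child whose text is the value. The real
    # .tbx element names may differ.
    attrs = {a.get("name"): (a.text or "") for a in item.findall("attribute")}
    out.append("  " * depth + attrs.get("Name", "(untitled)"))
    for key in sorted(KEEP & attrs.keys()):
        out.append("  " * (depth + 1) + f"{key}: {attrs[key]}")
    text = attrs.get("Text", "")
    if text and max_text:
        out.append("  " * (depth + 1) + "| " + text[:max_text])  # truncate $Text
    for child in item.findall("item"):  # nested <item>s carry the hierarchy
        walk(child, depth + 1, out, max_text)

def simplify(root, max_text=200):
    """Emit an indented plain-text outline of the notes, skipping
    schema definitions, <rtfd> blobs and visual attributes entirely."""
    out = []
    for top in root.findall("item"):
        walk(top, 0, out, max_text)
    return "\n".join(out)
```

A real version would also need `argparse` handling for `--max-text`, `--no-text` and `--output`, plus `ET.parse(path)` to read the 19 MB file from disk (or `iterparse` if memory were a concern).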

The result: atbref-xml-ai.txt (2.6 MB).

Disclaimer: the txt file is collateral reference to the thread and not intended for use. Experiment with it (with AI) at your risk.

I’m not skilled enough yet to gauge the context size difference of the 2.6MB TXT file vs. the 19.2MB TBX file. It’s nice that the AI version is still readable to a human.

Given how an AI doesn’t ‘understand’ in a human sense, I suspect that it has thrown out info on features it hasn’t needed or been asked to use. So as a general resource it might be limited. But this might be an interesting approach for a (process) agent: a down-scaled reference for an agent with limited implementation flexibility.

A clear takeaway is that review and iteration are required for active use, not least as aTbRef reflects constant change in Tinderbox itself so the content and structure of the TBX is not settled.

Still, I think this is worth investigating in the context of AI+TB use, not least as smaller files mean a smaller context, which uses fewer tokens, which in aggregate means less cost, be it to our pockets or to the environment through building power plants and data centres.


Could you share ‘simplify_tbx.py’ with us?

JM

I’ve fixed the TXT download link in my last post.

I don’t see much point in sharing the TXT-creation Python script as, looking at the TXT, why would you want to? It strikes me the AI missed the point of the question, or the human (me!) didn’t phrase it accurately enough. The TXT only retains info on features the AI has used in the current session, but not those in the corpus it has not used yet. It retains an arbitrary amount of text, which presupposes all the important info in $Text is in the first few sentences. I know this not to be the case (as I write the TBX). So, I think more work is needed.

At this point, I don’t see any point in sharing the Python as it isn’t making the ‘right’ output. As I’ve not reviewed the code, I’ve no idea if it is safe for other use. The TXT is just text; code (untested/unreviewed) might do more. Also, bear in mind anyone with Claude Code can recreate it from the info up-thread. Is there a need to see the code for anything other than idle curiosity?

The point of the original post, perhaps being missed, was to review/discuss whether this example generates sensible output. The problem with genAI is the output, whilst tidy, is not always correct/useful.


One thing I’ve quickly spotted is the AI failed to grok that $Name is not required to be a unique value, and for content reasons in a document like aTbRef it is not. So the way it thinks it has stored links is not going to work. Bear in mind this was a speculative test: Claude hasn’t tried to use the TXT file as a direct replacement for aTbRef.

Admittedly, I didn’t make the last point to Claude. Nor did I inform it about $Name non-uniqueness. The AI/human communication gap here is that it can’t look at something and tell us which bit it doesn’t understand: something innate to a human when stuck. So we can’t ask the AI what it doesn’t understand other than by posing edge case problems … a bit like human beta testing of the app!
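One plausible fix for the non-unique $Name problem is to key links by full outline path rather than by name. A minimal sketch, assuming notes nest as `<item>` elements with a `Name` attribute child (an assumption about the .tbx schema, not a verified fact):

```python
import xml.etree.ElementTree as ET

def paths_by_name(root):
    """Map each note name to the list of full outline paths using it,
    so duplicate names stay distinguishable."""
    index = {}
    def walk(item, prefix):
        name = next((a.text for a in item.findall("attribute")
                     if a.get("name") == "Name"), "(untitled)")
        path = prefix + "/" + name      # path is unique even when name is not
        index.setdefault(name, []).append(path)
        for child in item.findall("item"):
            walk(child, path)
    for top in root.findall("item"):
        walk(top, "")
    return index

# Illustrative document with a deliberately duplicated name:
sample = """<tinderbox>
  <item><attribute name="Name">Links</attribute>
    <item><attribute name="Name">Actions</attribute></item>
  </item>
  <item><attribute name="Name">Actions</attribute></item>
</tinderbox>"""
index = paths_by_name(ET.fromstring(sample))
```

Here `index["Actions"]` holds two distinct paths, which is exactly the case a name-keyed link store would silently collapse.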

I feel the above is an interesting starting experiment. If nothing else it helps underscore our naïve presumptions as to the degree to which an AI ‘understands’ the problem. I don’t say that as a point of blame or shame: it’s just code, albeit very fast/clever code. The forthcoming challenge goes back to the initial provocation: how do we write (minimally) and meaningfully for a non-human reader such as an AI?

An interesting idea — thank you for that.
I tend toward ready-made solutions out of convenience and regularly fail because they don’t really fit properly. My experience working with AI feels sobering. I often spend more time correcting errors introduced by the AI than on the actual task itself. I find myself increasingly reducing Claude to little more than a convenient search engine, as my tolerance for frustration drops proportionally with the length of the conversation.

At the moment, Claude remembers questions from general topics I asked once more reliably than specific Tinderbox functions it repeatedly tries to implement incorrectly — even though it has documented those functions in Hints.
A love-hate relationship has developed between us.

Things I have found to help:

  1. Look at your readings in Hints with an eye to keeping them as short and as limited in scope as you can.

  2. It sounds like some tasks are recurring problems. Consider asking Claude about that topic after it has done the readings.

  3. Review CLAUDE.md and MEMORY.md. If these are too long, that can be a problem. If these contradict the readings, that can really be a problem.

  4. I’d be interested to know more about the specific friction points. Also, are you using Desktop or Claude Code CLI? And which model are you using?

Thank you for the hints. I use Claude Desktop and, after the meetup with Andreas Grimm, also Claude Code CLI on a trial basis. Currently with the Sonnet 4.6 model. I’m also a German speaker, and so I could tell that his inputs don’t differ significantly from mine. I was always wondering whether English inputs would be more comprehensible.
The problem is simply fundamental in nature and purely my personal experience with AI, outside of Tinderbox as well. Though the differences between vendors and the strengths and weaknesses of the respective models are obvious. I often get the impression that AI mostly suggests overly convoluted paths to the goal. It rarely surprises me. Sometimes I wonder if it’s trying to pull my leg, because the correct solution is so simple and logical — and when I point that out, the usual flattery follows.
I don’t want to be negative about AI here though. It just crossed my mind.

I find aTbRef for AI fascinating

AI is code. It doesn’t ‘think’, nor does it ‘know’ in a human sense. Tinderbox is not a common app, so why would the LLM, which only knows of things from before the model was trained (i.e. likely not much about Tinderbox) plus what we individually teach it locally on our own systems, use the methods we would? Thus the frequent recommendations to make sure you record your local learnings and to tell the AI to consult them each session (it doesn’t ‘remember’ between sessions, at least not in a human manner).

The LLM is not local and can’t ‘learn’ (though the next release of the model likely will have some learning). If the AI’s approach seems odd, it’s because its experience differs from ours. So it will know a lot about command line tools, grep etc., not so much about action code in Tinderbox. Ourselves, perhaps the opposite.

So if we treat it like ourselves, in terms of expertise, we are likely to be disappointed. But if we use its strengths and make sure that each session it reviews what it’s learned, we should see better results.

Of course, as learnings grow, so does the data and thus the context used. To me this implies (without knowing the answer) that if we make a minimal instruction set that we intentionally ensure covers all/most Tinderbox use, then re-learning is not needed, and that knowledge will (may?) eventually make its way into the LLMs themselves.

Many thanks for the hints. This also makes sense to me when I think about how LLMs work. I’m not a scientist. The theoretical approach is rather secondary to me. I think more in practical terms. Here is my problem — and which path leads to the expected solution. I enjoy reading scientific papers during quiet hours to broaden my perspective. But I don’t solve my problems with that. I only sharpen my mind with it.
So it is more than likely that my expectations of AI are excessive.
The thing is, I don’t want to adapt my problems to the AI by only giving it tasks it likes.
(A slightly exaggerated little joke)

Best regards

But, how about giving it tasks it has a reasonable chance of being able to address? If we can’t answer the latter, can we reasonably critique the result we are given? Where I see genAI helping is in automating work we know how to do, but doing it faster. Often, it only does it better with a human in the loop who understands the task at hand. genAI is not a magic lamp, even if it is sold as such.

It seems the challenge is to communicate about tasks it can’t yet do, as they aren’t ‘in’ the LLM as solutions, nor does the LLM have enough info to ‘guess’ the solution.

Part of the challenge is we have been given a working companion who is not human, doesn’t think/work like a human and can’t express what it doesn’t know (something a human generally can).


I completely agree with that statement.

You’ve taken a technical issue and framed it in an empathetic way, and you’ve described it well.
I sometimes wonder if I view Claude (as an example) differently than other software products. Not yet, at least, but will that change?