AI and Tinderbox

After a considerable time as a fairly vocal skeptic of LLMs, I have changed my mind. Large Language Models (LLMs) can be an extraordinary Tinderbox companion.

I’ve now had about two weeks’ experience with a new, experimental Tinderbox build that “talks” to Anthropic’s Claude Desktop ($20/mo); it should also be able to communicate with many other AI models. Some observations:

  1. Claude learns to use Tinderbox with remarkable facility. At first, it just guesses — and it has an unfortunate confidence that its guesses must be right! That’s annoying, but not very harmful. After I supplied Claude with a one-page “cheat sheet” to get started, it became quite good at routine Tinderbox manipulations.

  2. I spent about half an hour explaining Posters to Claude, and showed it the Mermaid example from the Poster demo. Claude got it almost immediately. (It still occasionally forgets to set $PosterTemplate or confuses it with $HTMLExportTemplate. Who doesn’t?)

  3. Claude is really good as a research assistant, finding excellent suggestions. I had it gather planned reading from my own very ill-sorted Book Notes and it built a nice list. I asked for further reading on a variety of topics, and its suggestions were remarkably good. (If Claude were making stuff up, it wouldn’t matter for this application as I’d catch it right away.)

  4. Claude was less good at discussing a tricky text. I asked it to read Emerson’s The American Scholar, and it went straight to Cliff’s Notes and to a term paper mill. Sigh. Pushing a little bit did help.

  5. A key element to all this is that I leave notes for Claude (in /Hints/AI/Claude/Readings) that I expect it to read before each session, and Claude leaves notes to itself (in /Hints/AI/Claude/Notes) for later reference.

  6. Claude is also quite good at boring work that sometimes comes up in research, like reformatting tables or finding syntax errors.

A key is to think of the AI as an undergraduate research assistant whom you don’t know very well. Claude is overconfident. It takes shortcuts. It’s not always honest, and it’s terrible at introspection. Claude is a shameless flatterer. Still, Claude is extraordinarily well read.

This experimental build is currently available to backstage users. I’d love to hear from other folks using AppleScript-based approaches to integrating AIs about what works and does not work, and what they wish they had known sooner.


Excellent development! I’m interested as an end user, not as a code smith, but I welcome this very much.
Also, my compliments for reevaluating your previous position! That’s a very good sign, and it enhances my confidence in Tinderbox development.

Dear eastgate,
I support the creation of an environment where AI can be used with Tinderbox.
I have a suggestion.
In addition to this, could you please add a way to connect to an AI running only on the local desktop, just like the DEVONthink4 function?
The paid Anthropic’s Claude Desktop is still too expensive for me.
I am using an LLM environment via LM Studio with DEVONthink4.
(The LLM helps me reread Bergson’s books.)
Yours, WAKAMATSU

My impression, not yet confirmed, is that this will work with both LM Studio and Ollama.


That’s not just unsettling – that’s astounding.

The thing that turned out to be a limitation in my implementation is how to design the tools so that the assistant can easily get an overview of the document (especially since one of the things it’s useful for is suggesting connections). The “discovery” tools that I have are all about the immediate context of a single note (get_siblings, get_children, get_links). But this means that it needs quite a few tool calls to get a sense of the context of a note, which can be inefficient, and can miss relevant notes that are more remote.

I was thinking that it would be good to have a bird’s eye view tool that can be called early in the conversation and that returns a list of note titles in their outline structure (like Export → as Outline). Even with only titles, that can fill up the context though – if I remember correctly, my current document with ca. 600 notes amounted to 5k tokens like this. But there could be a filter where you tell it which branch or subbranch of the document is relevant for the conversation, and then it only receives all the descendants of that point.
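The bird’s-eye tool described above could be sketched roughly like this. This is purely illustrative, assuming a simple in-memory note tree; the `Note` class, function names, and the sample document are hypothetical and not part of any actual Tinderbox build.

```python
# Hypothetical sketch of a bird's-eye "outline" tool: return note titles in
# their outline structure (like Export -> as Outline), optionally restricted
# to one branch so the result stays within a token budget.

class Note:
    def __init__(self, title, children=None):
        self.title = title
        self.children = children or []

def outline(note, depth=0, max_depth=None):
    """Yield indented titles for a note and all its descendants."""
    if max_depth is not None and depth > max_depth:
        return
    yield "  " * depth + note.title
    for child in note.children:
        yield from outline(child, depth + 1, max_depth)

def find_branch(root, title):
    """Locate the container whose subtree the conversation is about."""
    if root.title == title:
        return root
    for child in root.children:
        hit = find_branch(child, title)
        if hit:
            return hit
    return None

def birds_eye(root, branch_title=None, max_depth=None):
    """Outline the whole document, or only one branch if a filter is given."""
    start = find_branch(root, branch_title) if branch_title else root
    return "\n".join(outline(start, max_depth=max_depth)) if start else ""

# Example document
doc = Note("Document", [
    Note("Book Notes", [Note("Emerson"), Note("Bergson")]),
    Note("Drafts", [Note("Chapter 1")]),
])
print(birds_eye(doc, branch_title="Book Notes"))
```

The `max_depth` knob is another way to cap the context cost: for a large document, the assistant could first request only the top two levels, then drill into one branch.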

On the other hand, your clever setup with Claude having its own notes section makes me think that a simpler solution could work as well: Instruct it to always keep a note summarising what it knows about the broad structure of the document based on the interactions so far.

How have you been approaching this issue of how to best let Claude explore the contents of a document? If you could give a quick list of all the tools included in your version I’d be grateful as well, as I don’t have the backstage thing.

Information on backstage: Tinderbox: Tinderbox Backstage

You’re right about the utility of a batch get_notes tool. The current implementation fetches both information about the note and the identities of its parent, prototype, and children; directing the attention of Tinderbox to a specific container lets it get going. But sure, asking it to review all the notes in my million-word weblog is likely to lead to tears.

The current tool roster is

  • get_notes (with either a list of notes or a query)
  • set_value
  • create_note
  • create_link
  • get_document (information about the entire document)
  • do (perform an action)
  • evaluate (evaluate an expression)

Of course, the final two actions provide access to just about anything in Tinderbox.
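A roster like this maps naturally onto a name-to-handler dispatch table, which is roughly how MCP-style tool calls are routed. The sketch below is purely illustrative: the handlers record calls against an in-memory dict rather than talking to Tinderbox, and all names and signatures are assumptions, not the actual build’s API.

```python
# Illustrative sketch: a minimal dispatcher mirroring part of the tool
# roster above (create_note, set_value, create_link, get_notes). The real
# build would forward these calls to Tinderbox itself.

notes = {}  # title -> {"attributes": {...}, "links": [...]}

def create_note(title, container=None):
    notes[title] = {"attributes": {"Container": container}, "links": []}
    return title

def set_value(title, attribute, value):
    notes[title]["attributes"][attribute] = value

def create_link(source, dest, link_type="*untitled"):
    notes[source]["links"].append((dest, link_type))

def get_notes(titles):
    """Batch fetch: return the requested notes that exist."""
    return {t: notes[t] for t in titles if t in notes}

TOOLS = {
    "create_note": create_note,
    "set_value": set_value,
    "create_link": create_link,
    "get_notes": get_notes,
}

def dispatch(tool, **kwargs):
    """Route a tool call by name, as an MCP server would."""
    return TOOLS[tool](**kwargs)

dispatch("create_note", title="Reading List")
dispatch("set_value", title="Reading List", attribute="Color", value="red")
```

Generic `do` and `evaluate` handlers would slot into the same table; they are the escape hatch that makes the rest of the roster optional rather than exhaustive.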

These are not carefully thought out.


Happy to help test and verify if you like: the only AI I’m willing to use is entirely self hosted on my own hardware with zero external connectivity. I use Ollama (among other tools) for that and could try Tinderbox with it. Cheers!

Given your understood need for local-only AI, I strongly recommend watching the video for the 6 Jun 25 meet-up where physician Dr Andy Bland described his local-only set-up and the toolchain involved. In his case patient record confidentiality (HIPAA) was a concern/constraint.

The MCP connection aspect of AI work is very new. My understanding is that the MCP approach (i.e. AI+app integration) debuted in the Anthropic (Claude) community, so using Claude as the initial test context for Tinderbox makes sense (by way of managing expectations). The lack of a local/private implementation for Tinderbox+AI is just the state of the art at the moment, not a policy choice.

I’m working on a connection to ollama via Runebook’s TOME application. Not working quite yet, but the developer seems confident that we can sort it out.


I absolutely love this. Nothing is more powerful than a person willing to change his mind.

The integration of AI via MCP is 100% going to get me to upgrade (ironically I think I paid up through 10.0 – and it’s a budget thing combined with the fact that I can’t readily use a Mac for work…don’t take it personally!).

From a hypertext/computer science perspective, though, I’m REALLY curious to hear more from @eastgate about his experiences. There is something absolutely alien (and delightful) when you start letting LLMs access tools. As I recently quipped, studying LLMs themselves is a whole new line of research. And it feels more like biology than traditional computer science!

Coincidentally, I’ve stopped reading sci-fi novels…I find it boring compared to just reading the news. :slight_smile:

Well, with all due respect, A Memory Called Empire is really worth reading. :confused:

I’m writing a series at https://markBernstein.org. There’s a new book there, but it will take some time.


I was being perhaps a bit hyperbolic—I have read tons of sci-fi and am always up for a good recommendation. It’s just that I went back to fantasy for my escapism and I think I’ll be here for a while (Malazan Book of the Fallen plus Complete Wheel of Time is like 25,000 pages…).