AI and Tinderbox

After a considerable time as a fairly vocal skeptic of LLMs, I have changed my mind. Large Language Models (LLMs) can be an extraordinary Tinderbox companion.

I’ve now had about two weeks’ experience with a new, experimental Tinderbox build that “talks” to Anthropic’s Claude Desktop ($20/mo); it should also be able to communicate with many other AI models. Some observations:

  1. Claude learns to use Tinderbox with remarkable facility. At first, it just guesses — and it has an unfortunate confidence that its guesses must be right! That’s annoying, but not very harmful. After I supplied Claude with a one-page “cheat sheet” to get started, it became quite good at routine Tinderbox manipulations.

  2. I spent about half an hour explaining Posters to Claude, and showed it the Mermaid example from the Poster demo. Claude got it almost immediately. (It still occasionally forgets to set $PosterTemplate or confuses it with $HTMLExportTemplate. Who doesn’t?)

  3. Claude is really good as a research assistant, finding excellent suggestions. I had it gather planned reading from my own very ill-sorted Book Notes and it built a nice list. I asked for further reading on a variety of topics, and its suggestions were remarkably good. (If Claude were making stuff up, it wouldn’t matter for this application as I’d catch it right away.)

  4. Claude was less good at discussing a tricky text. I asked it to read Emerson’s The American Scholar, and it went straight to Cliff’s Notes and to a term paper mill. Sigh. Pushing a little bit did help.

  5. A key element to all this is that I leave notes for Claude (in /Hints/AI/Claude/Readings) that I expect it to read before each session, and Claude leaves notes to itself (in /Hints/AI/Claude/Notes) for later reference.

  6. Claude is also quite good at boring work that sometimes comes up in research, like reformatting tables or finding syntax errors.

A key is to think of the AI as an undergraduate research assistant whom you don’t know very well. Claude is overconfident. It takes shortcuts. It’s not always honest, and it’s terrible at introspection. Claude is a shameless flatterer. Still, Claude is extraordinarily well read.

This experimental build is currently available to backstage users. I’d love to hear from other folks using AppleScript-based approaches to integrating AIs about what works and does not work, and what they wish they had known sooner.


Excellent development! I’m interested as an end user, not as a code smith, but I welcome this very much.
Also, my compliments for reevaluating your previous position! That’s a very good sign, and it enhances my confidence in Tinderbox development.

Dear eastgate,
I support the creation of an environment where AI can be used with Tinderbox.
I have a suggestion.
In addition to this, could you please add a way to connect to an AI running only on the local desktop, just like the DEVONthink4 function?
The paid Anthropic’s Claude Desktop is still too expensive for me.
I am using an LLM environment via LM Studio with DEVONthink4.
(The LLM helps me reread Bergson’s books.)
Yours, WAKAMATSU

My impression, not yet confirmed, is that this will work with both LM Studio and Ollama.


That’s not just unsettling – that’s astounding.

The thing that turned out to be a limitation in my implementation is how to design the tools so that the assistant can easily get an overview of the document (especially since one of the things it’s useful for is suggesting connections). The “discovery” tools that I have are all about the immediate context of a single note (get_siblings, get_children, get_links). But this means that it needs quite a few tool calls to get a sense of the context of a note, which can be inefficient, and can miss relevant notes that are more remote.

I was thinking that it would be good to have a bird’s eye view tool that can be called early in the conversation and that returns a list of note titles in their outline structure (like Export → as Outline). Even with only titles, that can fill up the context though – if I remember correctly, my current document with ca. 600 notes amounted to 5k tokens like this. But there could be a filter where you tell it which branch or subbranch of the document is relevant for the conversation, and then it only receives all the descendants of that point.
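The bird’s-eye tool described above could be sketched roughly like this. This is purely illustrative, assuming a simple in-memory note tree; the `Note` class, function names, and the sample document are hypothetical and not part of any actual Tinderbox build.

```python
# Hypothetical sketch of a bird's-eye "outline" tool: return note titles in
# their outline structure (like Export -> as Outline), optionally restricted
# to one branch so the result stays within a token budget.

class Note:
    def __init__(self, title, children=None):
        self.title = title
        self.children = children or []

def outline(note, depth=0, max_depth=None):
    """Yield indented titles for a note and all its descendants."""
    if max_depth is not None and depth > max_depth:
        return
    yield "  " * depth + note.title
    for child in note.children:
        yield from outline(child, depth + 1, max_depth)

def find_branch(root, title):
    """Locate the container whose subtree the conversation is about."""
    if root.title == title:
        return root
    for child in root.children:
        hit = find_branch(child, title)
        if hit:
            return hit
    return None

def birds_eye(root, branch_title=None, max_depth=None):
    """Outline the whole document, or only one branch if a filter is given."""
    start = find_branch(root, branch_title) if branch_title else root
    return "\n".join(outline(start, max_depth=max_depth)) if start else ""

# Example document
doc = Note("Document", [
    Note("Book Notes", [Note("Emerson"), Note("Bergson")]),
    Note("Drafts", [Note("Chapter 1")]),
])
print(birds_eye(doc, branch_title="Book Notes"))
```

The `max_depth` knob is another way to cap the context cost: for a large document, the assistant could first request only the top two levels, then drill into one branch.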

On the other hand, your clever setup with Claude having its own notes section makes me think that a simpler solution could work as well: Instruct it to always keep a note summarising what it knows about the broad structure of the document based on the interactions so far.

How have you been approaching this issue of how to best let Claude explore the contents of a document? If you could give a quick list of all the tools included in your version I’d be grateful as well, as I don’t have the backstage thing.

Information on backstage: Tinderbox: Tinderbox Backstage

You’re right about the utility of a batch get_notes tool. The current implementation fetches both information about the note and the identities of its parent, prototype, and children; directing the attention of Tinderbox to a specific container lets it get going. But sure, asking it to review all the notes in my million-word weblog is likely to lead to tears.

The current tool roster is

  • get_notes (with either a list of notes or a query)
  • set_value
  • create_note
  • create_link
  • get_document (information about the entire document)
  • do (perform an action)
  • evaluate (evaluate an expression)

Of course, the final two actions provide access to just about anything in Tinderbox.
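A roster like this maps naturally onto a name-to-handler dispatch table, which is roughly how MCP-style tool calls are routed. The sketch below is purely illustrative: the handlers record calls against an in-memory dict rather than talking to Tinderbox, and all names and signatures are assumptions, not the actual build’s API.

```python
# Illustrative sketch: a minimal dispatcher mirroring part of the tool
# roster above (create_note, set_value, create_link, get_notes). The real
# build would forward these calls to Tinderbox itself.

notes = {}  # title -> {"attributes": {...}, "links": [...]}

def create_note(title, container=None):
    notes[title] = {"attributes": {"Container": container}, "links": []}
    return title

def set_value(title, attribute, value):
    notes[title]["attributes"][attribute] = value

def create_link(source, dest, link_type="*untitled"):
    notes[source]["links"].append((dest, link_type))

def get_notes(titles):
    """Batch fetch: return the requested notes that exist."""
    return {t: notes[t] for t in titles if t in notes}

TOOLS = {
    "create_note": create_note,
    "set_value": set_value,
    "create_link": create_link,
    "get_notes": get_notes,
}

def dispatch(tool, **kwargs):
    """Route a tool call by name, as an MCP server would."""
    return TOOLS[tool](**kwargs)

dispatch("create_note", title="Reading List")
dispatch("set_value", title="Reading List", attribute="Color", value="red")
```

Generic `do` and `evaluate` handlers would slot into the same table; they are the escape hatch that makes the rest of the roster optional rather than exhaustive.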

These are not carefully thought out.


Happy to help test and verify if you like: the only AI I’m willing to use is entirely self hosted on my own hardware with zero external connectivity. I use Ollama (among other tools) for that and could try Tinderbox with it. Cheers!

Given your understood need for local-only AI, I strongly recommend watching the video for the 6 Jun 25 meet-up where physician Dr Andy Bland described his local-only set-up and the toolchain involved. In his case patient record confidentiality (HIPAA) was a concern/constraint.

The MCP connection aspect of AI work is very new. My understanding is that the MCP approach (i.e. AI+app integration) debuted in the Anthropic (Claude) community, so using Claude as the initial test context for Tinderbox makes sense (by way of managing expectations). The lack of a local/private implementation for Tinderbox+AI is just the state of the art at the moment, not a policy choice.

I’m working on a connection to ollama via Runebook’s TOME application. Not working quite yet, but the developer seems confident that we can sort it out.


I absolutely love this. Nothing is more powerful than a person willing to change his mind.

The integration of AI via MCP is 100% going to get me to upgrade (ironically I think I paid up through 10.0 – and it’s a budget thing combined with the fact that I can’t readily use a Mac for work…don’t take it personally!).

From a hypertext/computer science perspective, though, I’m REALLY curious to hear more from @eastgate about his experiences. There is something absolutely alien (and delightful) when you start letting LLMs access tools. As I recently quipped, studying LLMs themselves is a whole new line of research. And it feels more like biology than traditional computer science!

Coincidentally, I’ve stopped reading sci-fi novels…I find it boring compared to just reading the news. :slight_smile:

Well, with all due respect, A Memory Called Empire is really worth reading. :confused:

I’m writing a series at https://markBernstein.org. There’s a new book there, but it will take some time.


I was being perhaps a bit hyperbolic—I have read tons of sci-fi and am always up for a good recommendation. It’s just that I went back to fantasy for my escapism and I think I’ll be here for a while (Malazan Book of the Fallen plus Complete Wheel of Time is like 25,000 pages…).