Turning aTbRef 11 into a NotebookLM Knowledge Base for Tinderbox Workflow Design

I wanted to share a small experiment: I converted aTbRef 11 into a NotebookLM-friendly Markdown knowledge base so I can ask questions about Tinderbox mechanisms without losing the structure and source references of the original reference file.

The basic approach was:

  1. Export/convert aTbRef 11 into 30 main Markdown files, roughly corresponding to the major sections of the reference (see the sketch after this list).
  2. Add 3 lightweight guide files:
    • 00_README_How_to_Ask.md
    • 00_Terminology_Index.md
    • 00_My_Tinderbox_Goals.md
  3. Upload all 33 files into NotebookLM as sources.
  4. Use NotebookLM as a retrieval and interpretation layer, not as a replacement for Tinderbox judgment.
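
For step 1, here is a minimal sketch of how such a split could be scripted. Treat it as illustrative only: the export file name and the assumption that each major section starts with a "# " heading are assumptions, and any converter that preserves section boundaries would do.

# Hypothetical sketch: split one large Markdown export of aTbRef 11 into
# per-section files. The input file name and the "# " heading convention
# are assumptions, not a description of how aTbRef itself is exported.
from pathlib import Path
import re

source = Path("aTbRef11_full_export.md")   # assumed name of the single export
out_dir = Path("atbref_sections")
out_dir.mkdir(exist_ok=True)

def flush(title, lines):
    """Write one accumulated section to its own Markdown file."""
    if title and lines:
        safe = re.sub(r"[^A-Za-z0-9]+", "_", title).strip("_")
        (out_dir / f"{safe}.md").write_text("".join(lines), encoding="utf-8")

title, buffer = None, []
for line in source.read_text(encoding="utf-8").splitlines(keepends=True):
    if line.startswith("# "):          # a new top-level section begins here
        flush(title, buffer)
        title, buffer = line[2:].strip(), [line]
    else:
        buffer.append(line)
flush(title, buffer)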

The important part was adding the guide files. Since aTbRef is a reference, not a tutorial, I wanted NotebookLM to answer in a way that stays close to the source. So I instructed it to cite the relevant aTbRef source file and section heading, distinguish between explicit aTbRef content and synthesized workflow advice, and avoid jumping too quickly into complex automation.

My personal use case is designing a gradual workflow for analyzing a large number of reading notes in Tinderbox: finding repeated themes, unresolved questions, potential projects, and highly connected notes. So I added some “workflow review rules” to guide NotebookLM’s answers:

  • prefer observational agents before action-taking agents
  • prefer manual Stamps for deliberate decisions
  • use safe $OnAdd initialization only where appropriate
  • scope $AgentQuery examples to a specific container such as /Notes
  • avoid recommending $Rule, $AgentAction, or complex $Edict unless there is a clear need
  • explain whether an action affects an alias or the original note
  • cite the source file and section heading for technical claims

For example, instead of asking NotebookLM “design my whole Tinderbox system,” I ask things like:

Please compare $OnAdd, $AgentAction, $Rule, and $Edict. Explain when each runs, what object it affects, and cite the aTbRef source file and section heading.

or:

I want to analyze 500 reading notes to find repeated themes, unresolved questions, potential projects, and highly connected notes. Design a minimal first-pass Tinderbox workflow. Do not jump into complex automation.

This has worked better than uploading one huge text file, because the Markdown sources preserve useful boundaries, and the extra guide files help NotebookLM answer in a more restrained, source-aware way.

In short, I’m using NotebookLM as a companion index and explanation layer for aTbRef: it helps me locate and connect relevant Tinderbox mechanisms, while Tinderbox itself remains the place where the actual structure, links, attributes, agents, and views live.

I’d be curious whether others have tried something similar with aTbRef, Tinderbox documents, or other large reference materials.

1 Like

I made a skill for Claude AI which knows the whole aTbRef 11. It takes some time and quite a few credits, but it is pretty handy: you get pretty much instant answers to your questions, and Claude can actually work within Tinderbox through the MCP connection.

Create a project in Claude. Split the original aTbRef 11 file into separate pieces (about 30 splits in my case), add all of them to the project files, and use the “skill creator” skill. Give it a prompt like “remember this complete reference, ..” and you’re good to go. It should work fine with the cheaper Sonnet model, as you probably do not have to use the expensive Opus model.

Hope this can give you some insight. (:

3 Likes

I think the problem is that most AIs tend to have trouble reading text files/PDFs with more than 50 to 70 pages or so. Splitting the files, or even inputting and processing them one by one, works much better, as far as I can tell.

1 Like

Currently we’re making an AI brute-force attack on aTbRef. The resource is 21 years of accumulated writing during which: the OS has changed, the app has been completely re-written, edge cases have emerged (and others have become irrelevant), automation has scaled significantly, external Web-related tech has changed, the author’s depth of knowledge has changed, etc. As importantly, the underlying structure is an outline: the app’s default is to export the outline as a static web site—at a time when the zeitgeist is for Markdown-native, wiki-derived tools.

Also, aTbRef was written intentionally as a hypertext, not as a series of self-contained articles: the reader follows links. It is not clear the AI really understands the latter method, so it can come to odd results. This is why I’m less interested in a brute-force attack than in understanding how to write better for an AI, as opposed to a human, since the two read (consume) text differently under all the faux anthropomorphism. The chicken-and-egg here is that new writing takes insight, and we all want results now! But aTbRef has been at this for 21+ years and this is but one more challenge.

Up to a point. However, this was written as a hypertext. A given topic is rarely—beyond the meaning of a label or button action—fully answered in one branch of the outline, so arbitrary slicing down into the outline cuts orthogonally across the axis of many answer trails. So we’re giving the AI more work.

We also know there is rarely one ‘right’ way to do [thing] in Tinderbox. Human-mediated discussions in places like this forum often end up not taking the user’s expected path to an outcome because the selected option better fits the wider context†. That context is generally missing in AI interactions. If I’m re-working a section of aTbRef‡, I’m drawing on 20+ years’ experience of the document and the changing environment in which the app has lived. My own experiments with AI give no sign it has such a rich picture. Yet if I have to pack all that nuance into a prompt I’d run out of tokens, and also: why keep a dog but then bark yourself?

None of this is to argue against this interesting project, and I’m genuinely thankful for the insights being shared. Whilst others ‘just’ get answers, I’m more focused on the general patterns and on achieving communication that improves all answers by the AI. I should add, I’m not worried about AI stepping on my toes. Indeed, I’m happy and amused that a decidedly human artefact is proving so core to a number of Tinderbox/AI experiments now.

Still, it is clear that NotebookLM is giving results. But, the researcher in me asks, how is the quality—the correctness/relevance/accuracy/efficiency—of the result being tested? Asking a tool to achieve an imagined outcome the user cannot fully describe means that any result may appear correct. In some cases any result is a result, but I think we should strive for better.

Perhaps the community could start to collect some tests to try. Clearly the AI, like anyone schooled this way, will learn for the exam and not the whole, but even so I suspect the exercise may help us in better writing—or structuring information—to assist the AI in order to better assist ourselves as the actual end-user.

Current note-taking tools (in the 21C) are surprisingly ill-informed about much prior informational work. The touchstones now are: scale, speed, wikis, wiki-linking, Markdown, and, bizarrely to my mind, zettelkasten (mainly due to Ahrens’ book§). Researching this I’m often left with a mental picture of a bunch of eager folk bolting a rocket engine to a big wheel, while behind them a race-tuned car sits under a tarp—if only they were a little more inquisitive, more progress might occur.

† The normal meaning of the word, not the AI context space.

‡. Such ‘small’ tasks are invariably the most complex and far-reaching changes that get made. An example is revision to tighten consistent use of terminology. Nice for the English-speaking user, it also has implications for those needing to use auto-translation, or for an AI, which we know cannot ‘read’ text in a human manner.

§. How to Take Smart Notes — Sönke Ahrens.

2 Likes

Thank you—this is a very helpful clarification, and I think it gets to the heart of the issue.

What struck me most in your comment is the distinction between making the material accessible to AI and making it genuinely intelligible to AI. Those are clearly not the same thing. Breaking aTbRef into Markdown sections may improve ingestibility, but if the original work was written as hypertext—where meaning often emerges across links, branches, and accumulated context—then such slicing may actually remove part of the logic the human reader would naturally reconstruct by following trails.

Your point about Tinderbox rarely having one single “right” way is especially important. In practice, good answers are often contextual, and experienced human guidance frequently depends on a rich sense of the document, the tool’s evolution, and the user’s broader goal. That sort of background judgment is exactly what current AI interactions tend to flatten.

So to me the more interesting question is not simply “how do we feed aTbRef to AI?” but “how should we write and structure reference material so that AI can use it without losing the very qualities that make it valuable to human readers?” That seems like the deeper problem—and probably the more durable one.

I also strongly agree with your point about evaluation. A plausible answer is not necessarily a correct, relevant, or efficient one, especially when the user cannot fully specify the desired outcome in advance. A shared set of test cases would be extremely useful, both for judging the quality of AI-assisted answers and for revealing what kinds of structure help or hinder them.

In that sense, this feels less like a one-off experiment and more like an opportunity to learn something important about documentation, knowledge design, and AI mediation.

1 Like

Music to my ears! I don’t claim to have the answers here, and others are ahead of me on AI experience. But, as you too point out, if we can better inform the AI of the concepts/rules in a manner it can ‘understand’, and in a manner not wasteful of tokens/context, then this has to be to our benefit.

The tech press are now reporting that Anthropic is A/B testing removing Claude Code from the ‘Pro’ subscription for new sign-ups (the cheapest non-free plan). This isn’t surprising. Investment gambling (VC) money is currently the buffer between the true cost of producing the service and what we pay: it cannot last. On a fixed income I expect soon to be priced out of the market. This is why finding a method to effectively (thriftily!) inform the AI is even more important: it will be of significant benefit to all but those with deep pockets.

1 Like

Yes — I think that is exactly the point.

The real challenge is not simply giving the AI more material, but giving it the right concepts and rules in a form it can use efficiently, without having to drown it in context every time. Otherwise the method only works while tokens are cheap.

And I think your cost point is crucial. If current pricing is still being softened by VC subsidy, then “efficiently informing the AI” is not just a technical nicety — it becomes a practical necessity. Without that, serious AI use may increasingly become something reserved for those who can afford large context windows, repeated retries, and generous subscriptions.

So yes, to my mind this is not only about improving answers. It is also about keeping advanced AI use viable for ordinary users. The more carefully and thriftily we can encode and present knowledge, the less dependent we are on brute-force context — and the less likely it is that useful AI becomes a deep-pocket privilege.

1 Like

I believe that this will work much more efficiently with finer-grained segmentation, so that the model can read a concise overview of what is available at the start of a session and know where to find more information as it is needed. There is no need, for example, to spend tokens on HTML Export until HTML Export is actually needed. You don’t need JSON operators unless you’re going to be parsing JSON.
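
A minimal sketch of what I mean, assuming the reference has already been split into per-section Markdown files (the directory layout and file naming are assumptions): build a small manifest the model can read at the start of a session, and only open the full section files when a topic actually comes up.

# Hypothetical sketch: build a compact manifest of the split section files so a
# session can start from this overview and load HTML Export, JSON operators,
# etc. only when they are actually needed. Directory layout is an assumption.
import json
from pathlib import Path

sections_dir = Path("atbref_sections")
manifest = []
for md_file in sorted(sections_dir.glob("*.md")):
    lines = md_file.read_text(encoding="utf-8").splitlines()
    # Use the first heading as the topic label, falling back to the file name.
    topic = next((l.lstrip("# ").strip() for l in lines if l.startswith("#")),
                 md_file.stem)
    manifest.append({"file": md_file.name, "topic": topic})

Path("atbref_manifest.json").write_text(json.dumps(manifest, indent=2),
                                        encoding="utf-8")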

I updated the skill quite a bit based on the discussion here, so I wanted to share a more detailed version.

The basic idea is still the same — the whole aTbRef 11 lives inside the skill as 30 source files, so Claude reads from the actual reference rather than guessing. But I added a proper lookup tool that finds any note by name in milliseconds and also follows the → cross-references, which was the thing Mark pointed out: a lot of the useful information in aTbRef only makes sense when you follow the links, not just read one note in isolation. There are about 6,800 of those cross-references in the corpus, so it actually matters (an implementation of @mwra’s input).
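
To make the cross-reference part concrete, here is a stripped-down sketch of the lookup idea. It is not the actual code inside the .skill, and it assumes the source files use "# Note Name" headings and mark cross-references with a "→" arrow; both conventions are assumptions on my part.

# Simplified sketch of the lookup: index every note by its heading, then
# resolve a name and follow its "→" cross-references one level deep.
import re
from pathlib import Path

def build_index(sections_dir="atbref_sections"):
    """Map lower-cased note titles to their body text across all section files."""
    index = {}
    for md_file in Path(sections_dir).glob("*.md"):
        title, body = None, []
        for line in md_file.read_text(encoding="utf-8").splitlines():
            if line.startswith("#"):
                if title:
                    index[title.lower()] = "\n".join(body)
                title, body = line.lstrip("# ").strip(), []
            else:
                body.append(line)
        if title:
            index[title.lower()] = "\n".join(body)
    return index

def lookup(index, name, depth=1):
    """Return the named note plus the notes its → cross-references point to."""
    note = index.get(name.lower(), "")
    results = {name: note}
    if depth > 0:
        for ref in re.findall(r"→\s*([^\n→]+)", note):
            ref = ref.strip()
            if ref.lower() in index and ref not in results:
                results.update(lookup(index, ref, depth - 1))
    return results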

I also added some workflow guidance after reading the discussion here — things like preferring observational agents before action-taking ones, always explaining whether something affects an alias or the original note (that one trips people up a lot), and not jumping straight to $Rule or $AgentAction when a Stamp would do. Basically trying to make Claude less eager to suggest the most powerful tool when a simpler one fits.

Honestly the reference layer works pretty well — syntax questions, operator signatures, attribute types, all of that. The judgment layer is another story and I think Mark is completely right that 21 years of accumulated context isn’t something you can just index your way into. But for day-to-day “what does this operator do” or “write me an OnAdd for this” kind of questions it’s quite handy, especially since Claude can also just make the changes directly in your document via MCP if you have that set up.

Runs fine on Sonnet, no need for Opus. Happy to share the .skill file if anyone wants to try it — and very much up for test cases if people want to stress-test it properly.

atbref-11_2.0.skill.zip (758.4 KB)

1 Like

Thanks for sharing this. I fully expect this will result in some indications of how the corpus might better be re-structured, or a new resource written. No small task, but a potentially interesting experiment (and a paper!). But that’s in the future. Busy with paper deadlines ATM, but I’ll watch this with interest.

1 Like

FWIW, I recently built an experimental MCP server for Tinderbox that includes reference information derived primarily from @mwra’s excellent aTbRef work.

I’ve been using the MCP server I built for a few weeks now with great success. Give it a whirl and let me know what you think.

1 Like

Nice! Can you tell us what it does that the official MCP server doesn’t? Besides the reference information.

I tried to faithfully implement Tinderbox’s built-in MCP tools, with one exception (as you noted): I added an additional tool that enables the AI to look up reference information.

But my ultimate goal was to build an MCP server for Tinderbox that Claude Cowork could use as detailed in my blog post.

1 Like

Excellent, thank you.

What’s the secret for getting Claude Cowork to see MCP?

Just installed your MCP server and will play with it. A little confusing for me was this instruction:

Add the following to your Claude Desktop configuration file:
{
  "mcpServers": {
    "tinderbox": {
      "command": "ruby",
      "args": ["/full/path/to/tbx-mcp/server.rb"]
    }
  }
}

Since the TBX native MCP server was already there, I replaced the entry with the new path and args.

I’ll come back to you and report my experience. At least, what I can say today: I love the concept and appreciate the amount of work you put into your tool.

1 Like

FWIW, and stating up front that I don’t know very much about this: I had to disable Tinderbox’s built-in MCP by turning off AI integration in the app menu. In that configuration Claude does not launch Tinderbox, but expects it to be running already.