Tinderbox and Large Language Models

Hi all, I’m starting this thread in response to interest from @satikusala about using Tinderbox in conjunction with large language models (LLMs). I hope others will share ideas here as well.

I don’t think I’ll be able to match what @webline has done in terms of providing a ready-to-go solution, but maybe others will be able to build on the outlines I’m providing below.

What LLMs are doing for me

I’m a nonfiction author. I wanted to create an assistant that can pick up some of the tasks that slow me down and create bottlenecks. I wanted the assistant to have access to my notes and sources and stay focused on those rather than its own “knowledge.” So I built a system based on retrieval-augmented generation (RAG), and here's what it is doing for me:

  • Finding similar notes. A stamp sends the note’s text to an LLM that has access to all my notes. The LLM performs a semantic similarity search and returns the top n results with a summary. The reply also sets up a way to link the found notes to the original note. (I also use Tinderbox’s own similar notes feature. So far there’s been some overlap but enough difference to keep me using both.)

  • Mapping notes. All my notes are in a vector database, organized by an LLM's view of their semantic similarities. I can retrieve from that database my notes' embeddings, which are numerical representations of each note's text. Once these numbers are in Tinderbox, they can serve as $Xpos and $Ypos, and I can see how similar notes are clustered together. There's an early example of this here, and a rough sketch of getting from embeddings to map coordinates follows this list.

  • Chat about my notes. I can have a conversation with an LLM about my notes entirely within Tinderbox. The chat is set up to use “memory” of previous turns in a given conversation (also stored in Tinderbox). Replies always include references to my own notes. But I can also set it up to have a conversation about any digital text. More on this below.

  • Summarization. A stamp collects all notes in a container or agent and sends them to an LLM with a request to summarize. The summary gets imported into the container’s $Text but it could go anywhere. Useful for summarizing all notes with a certain tag or all notes taken on a given source.

  • Tagging. A stamp collects a note’s $Text and all the tags currently in use in the document, then sends them to an LLM with a request for “zero-shot classification” (no training). The result is imported into the note’s $Tags for review.

  • Research via similarity search of full-text sources. Outside of Tinderbox, I can store any digital text in a vector database, broken down by paragraph. From within Tinderbox I can write a query that will retrieve results not just based on keywords but on semantic distance. For example, a query about ancient Greece and Rome will return paragraphs about Demosthenes even if the word “Greece” is not in the searched text. In other words, I can search for ideas and concepts rather than just keywords. And I can search really big texts. Within Tinderbox, I can import the search result’s metadata into attributes, organize the notes by the relevance score provided by the LLM, and filter out irrelevant results.
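On the mapping point above: embeddings typically have hundreds of dimensions, so they have to be squashed down to two numbers per note before they can serve as $Xpos and $Ypos. One simple way to do that is a dimensionality reduction such as PCA. Here's a bare-bones sketch of the idea, not my actual script, with placeholder file names:

    # Bare-bones sketch: collapse high-dimensional note embeddings to 2D so they
    # can be imported into Tinderbox as $Xpos/$Ypos. PCA is just one choice;
    # UMAP or t-SNE would work as well. File names are placeholders.
    import json
    import numpy as np
    from sklearn.decomposition import PCA

    # The embeddings and note names would come out of the vector database.
    embeddings = np.array(json.load(open("note_embeddings.json")))  # shape (n_notes, dims)
    note_names = json.load(open("note_names.json"))

    coords = PCA(n_components=2).fit_transform(embeddings)

    # Rescale to a range that looks reasonable on a Tinderbox map, then write out
    # something a stamp or import rule can pick up.
    coords = (coords - coords.min(axis=0)) / (coords.max(axis=0) - coords.min(axis=0)) * 40
    with open("map_coords.txt", "w") as f:
        for name, (x, y) in zip(note_names, coords):
            f.write(f"{name}\t{x:.2f}\t{y:.2f}\n")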

So what does this look like in practice? For me the biggest improvement to my workflow has been using chat to brainstorm. As I prepare to write a section, I describe what I plan to write and ask for notes that will support my ideas. I get a conversational summary that sometimes contains new connections and insights—all based on notes I already have. I can then go back and ask follow-up questions. Creating a plan for writing becomes a dialog.

One of my weaknesses as a writer is over-researching. Maybe it’s true you can never know too much, but you can have too many notes. For me, going from brainstorming to writing always included a tedious intermediate step of finding, reviewing, and gathering relevant notes, placing them in an outline, and then reviewing them again. All this has been made worse by my tendency to over-research and overthink.

Now I start a conversation. Even though I’m aware there’s no consciousness on the other side, the conversation keeps me in a creative thinking mode and away from the linearity of reading through notes and placing them in outlines. It presents my notes in a new way, giving me a fresh perspective. And it forces me to be clearer about what I want to say before the actual writing starts.

So that’s what I’m doing with LLMs and Tinderbox. The other part of the question is how, which I’ll get into in my next post!

Is anyone else integrating LLMs into their Tinderbox workflows? What does that look like?

7 Likes

Hopefully, one day, in a galaxy not too far away, we can see round 2, the other half (dark side) of preparing the file with an LLM vector database.

No pressure… I want to be respectful of your time, but this thread is full of awesomeness!!!

You are awesome…

Tom

1 Like

This sounds…err…reads really exciting. I'm keen to learn how you've achieved some of these workflows with RAG.

Thank you in advance for sharing!

Allen, can you share info on the tech you are using for the database and LLM? Your process is interesting and it would be helpful to know how to build it out for my own use cases.

(FWIW, I am thinking of creating a similar approach to support my research and writing, using Tana instead of Tinderbox. I've been exploring Tana off and on since its alpha days. Tana is coming out of beta in its first public release on 2/3/25. With its robust metadata modeling, command building, and AI integration, plus mobile support, it's an easier environment to build these models in than Tinderbox, I think.)

2 Likes

This is AWESOME!

So the next big question is, "How exactly does one set up a local LLM? What are the steps and considerations?" Any guidance would be much appreciated.

I am certainly no expert on this topic but have begun my research. I have read that a popular "out of the box" GUI solution is LMStudio.ai and a common CLI package that people are using is ollama.com. There are others, I am certain, but these are the two that seem to be bubbling up in the few articles I have read.

I have not installed either yet… read on…

I have just begun my research as well, but this is where I am at. I am new to this game. I got started down this path recently when DeepSeek was introduced and became interested in maybe trying it out… locally only, though.

It is my understanding once you choose a model, there are more choices depending on your hardware.

But the big question for me right now is: how do I sandbox it on a Mac?
I am aware of Docker for Windows (my understanding is that it is not available for the Mac), but this, for me, is where it gets complicated.

It is my understanding that if you install locally without sandboxing, some potential future rogue models "could" access your files, which is scary for me.

Thought I would share my thoughts on where I am in my journey.

Tom

1 Like

Great point! Me too. Looking forward to figuring this out together.

Just checked, Docker is supported for the Mac: Docker Desktop: The #1 Containerization Tool for Developers | Docker, but as noted in this video https://www.youtube.com/watch?v=7TR-FLWNVHY it can’t access the macOS GPUs, so we’re out of luck here.

A bit more on the how, with a disclaimer up front: I am not a programmer and not an LLM expert. I wandered into this knowing just enough Python to scrape by and have relied heavily on online tutorials and examples. My code is a mess and I’m constantly revising it. I’m happy to help others muddle through like I did but I can’t provide a ready-to-go solution.

When I started on this project, I learned about embeddings, which are numerical representations of text (or images, audio, etc.) designed to capture semantic meaning. They are fundamental to the functioning of LLMs. In a retrieval-augmented system, document embeddings are stored in a vector database, which is optimized for efficient similarity searches and distance calculations between vectors–allowing for search by meaning instead of just by keyword. (More here.)
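To make that concrete, here's a bare-bones sketch of an embedding comparison in Python. I'm using the sentence-transformers package here only because it keeps the example short; the model name and sentences are placeholders, and the same idea applies whichever embedding model you use:

    # Bare-bones sketch: turn two pieces of text into embeddings and measure how
    # close they are in meaning. The model name and sentences are just examples.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")

    a = model.encode("The assembly debated the city's grain supply.")
    b = model.encode("Lawmakers argued over food shortages.")

    # Cosine similarity: close to 1.0 = very similar meaning, near 0 = unrelated.
    print(util.cos_sim(a, b).item())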

The first thing I did was set up a Python virtual environment, isolating whatever mess I make from other Python installations. This is where my vector database, Python modules, and scripts live. I then wrote scripts in Python to handle data coming in from Tinderbox: a script to add text to the vector database, a script to retrieve documents from that database, another to retrieve and pass onto a generative LLM, and so on.

Within Tinderbox, I created stamps that:

  1. gather data

  2. format it for the LLM

  3. activate the virtual environment and trigger a script using runCommand()

  4. assign the response to a Tinderbox attribute

For some LLM tasks that require only a few Tinderbox attributes, it was easy enough to pass them on directly as arguments. The output from the script then gets assigned to a Tinderbox attribute. Something like this:

$someAttribute = runCommand("source path/to/env/bin/activate && python path/to/script.py arg1 arg2")
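On the script side, the minimal pattern is just: read the arguments, do the work, and print the result so runCommand() can capture it. A stripped-down sketch (placeholder names; the real scripts obviously do more):

    # script.py -- stripped-down sketch of the receiving end of runCommand().
    # Whatever is printed to stdout comes back into the Tinderbox attribute.
    import sys

    def process(note_text: str, task: str) -> str:
        # Placeholder for the real work: embed the text, query the vector
        # database, call the LLM, and so on.
        return f"[{task}] would run on {len(note_text)} characters of text"

    if __name__ == "__main__":
        note_text, task = sys.argv[1], sys.argv[2]
        print(process(note_text, task))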

If there is a lot of text and associated data I use a stamp that formats it as JSON lines using a Tinderbox template and exportedString(). The stamp then saves to file using runCommand(). The same stamp then triggers a Python script that reads the file and starts the requested process. In these cases the response is written to file, and the number “1” is sent back to $someBoolean in Tinderbox, which triggers an import rule (Mark helped me with this part here).
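Roughly, the file-based round trip looks like this (again a sketch with placeholder paths, and assuming each exported line carries at least a Name field):

    # Sketch of the file-based round trip: Tinderbox exports JSON lines, the
    # script processes each record, writes a response file, and prints "1" so
    # $someBoolean flips and the import rule fires. Paths are placeholders.
    import json

    records = []
    with open("export/notes.jsonl") as f:
        for line in f:
            if line.strip():
                records.append(json.loads(line))

    # Placeholder for the real work (summarize, tag, embed, ...).
    summary = f"Received {len(records)} notes: " + ", ".join(r["Name"] for r in records)

    with open("export/response.txt", "w") as f:
        f.write(summary)

    print("1")  # picked up by $someBoolean in Tinderbox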

That’s about all on the Tinderbox side. The rest is the script interacting with parts of the system. This includes:

  • Scripts to add Tinderbox info to my vector database.

  • Scripts to ask the vector database for a similarity search of a given text.

  • Scripts to ask for a similarity search and to pass the results to an LLM with my query.

The tools I’m using include:

  • Chroma as a vector database. Documentation is very clear and concise. Here's a bare-bones example of how to create a Chroma database and add documents to it, and there's a rough sketch along the same lines after this list.

  • Hugging Face transformers module. This Python module makes it easy to interact with Hugging Face LLMs.

  • I started with LangChain, another Python module that seemed promising in terms of managing the pipeline, but I've encountered too many limitations with it to continue using it.
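On the Chroma point above, the bare-bones usage is roughly: create a persistent collection, add documents with IDs (plus whatever metadata you want to pull back into Tinderbox attributes), and query by meaning. A sketch with placeholder texts; if you don't specify an embedding function, Chroma falls back to a small built-in embedding model:

    # Bare-bones Chroma sketch: create a persistent collection, add documents,
    # and run a semantic query. Texts, IDs, and metadata are placeholders.
    import chromadb

    client = chromadb.PersistentClient(path="./chroma_db")
    collection = client.get_or_create_collection("notes")

    collection.add(
        ids=["note-1", "note-2"],
        documents=[
            "Demosthenes describes the city's dependence on imported grain.",
            "Olive cultivation shaped landholding patterns in Attica.",
        ],
        metadatas=[{"path": "/Sources/Demosthenes"}, {"path": "/Sources/Attica"}],
    )

    # The query is matched by meaning, not keywords.
    results = collection.query(query_texts=["food shortages in the ancient world"], n_results=2)
    print(results["documents"][0])
    print(results["distances"][0])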

For LLMs, I’m in favor of using specialized local models from Hugging Face. You don’t need the state-of-the-art for every task.

For generative LLMs, the ones that chat with you, the big decision is whether to use one of the household names like ChatGPT, and access it through its API, or download a free LLM like Llama or DeepSeek. I started with ChatGPT, moved over to Llama, and am now playing with DeepSeek.

For local models, there are a crazy number of ways to interact with them. I’m using llama.cpp, a command-line interface that seems much faster than the other options, though not as user-friendly. I trigger the command line from inside a Python script. Other options include transformers pipeline (part of the transformers module); Ollama, which can be accessed through Python modules, R packages, or the command line; and LM Studio, which is great for experimentation and even customizes code for you. Unfortunately it requires Apple Silicon. There’s also the llama-cpp-python module that I haven’t tried.
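To give a flavor of the retrieval-to-generation handoff: the retrieved passages simply get pasted into the prompt along with the question. Here's a sketch using the Ollama Python package, though any of the interfaces above would work the same way; the model tag (which you'd need to have pulled first), the question, and the notes are placeholders:

    # Sketch of the retrieval-to-generation step: paste the retrieved passages
    # into the prompt and ask a local model to answer from them only.
    # Assumes the Ollama app is running and the model has been pulled.
    import ollama

    question = "What did my sources say about grain shortages in Athens?"
    retrieved = [
        "Note A: Demosthenes describes the city's dependence on imported grain.",
        "Note B: Shipments from the Black Sea were disrupted during the war.",
    ]

    prompt = (
        "Answer the question using only the notes below, and cite them by name.\n\n"
        + "\n\n".join(retrieved)
        + f"\n\nQuestion: {question}"
    )

    response = ollama.chat(
        model="llama3.1:8b",
        messages=[{"role": "user", "content": prompt}],
    )
    print(response["message"]["content"])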

Huge number of choices, vast possibilities. The more I learn, the more applications I find.

For those who are interested, what do you see LLMs doing for you?

5 Likes

Hi Tom, if you have an Apple Silicon Mac I definitely recommend checking out LM Studio. Ollama is also pretty easy to interact with, and you can download DeepSeek from that platform as well.

You can use Docker on a Mac but many Docker images aren’t set up for Apple Silicon chips yet. I played around with a Docker Ubuntu container on my Mac and found it difficult to manage.

Interested in hearing more about this concern about rogue models accessing your system. For now at least my understanding is that they only activate when called…

all I can say is…

Wow

It sounds like Allen is doing a lot of what I’d love Tinderbox to support. In spring 2023, I did a deep dive on this. I got very good at sending notes from Tinderbox to a Python script that sent them to ChatGPT and saved the response to a new note. I think the way I had it working was having a folder of prompts, or stamps, that I could then apply to notes.

However, I found it fragile, as my programming skills are limited. It would work for weeks, then stop working, and I’d have to go back and fix it. What I really wanted was to be able to have a more robust way of doing it so that I could chain prompts together, or give an LLM tools in Tinderbox, and have it access Tinderbox notes, tools, functions, attributes, etc. and interact with them, move things around, file them, make connections, etc. But, that was beyond my grasp. At some point last summer, I gave up on that because my Tinderbox programming wasn’t up to snuff. Also, frankly, the LLMs are really good at writing Python, so I started doing a lot of this sort of thing in Python with files directly, rather than trying to get my Tinderbox code working. I still think it would be incredibly useful to be able to apply a series of prompts to a note and have the LLM interact with the notes. But, it was always slightly beyond my grasp. I’m sure if I spent the time learning how to parse JSON in Tinderbox, it could be done. But, anyway, I gave up as I realized I was procrastinating from actually writing the book, and more to the point, some of the book writing was what I was trying to outsource.

On tools, I agree with Allen. LangChain is great; Ollama for running from the command line is my preferred route, but LM Studio is something I just came across and it's useful, while GPT4All has a RAG feature built into it. I am pretty sure there is something for Obsidian as well, though I've not looked into it.

I’ve certainly had LLMs interacting with my files because I spent a lot of time creating tools and hooking them into an LLM. But, this was me writing scripts that gave the LLM tools, and let it choose which script to run. Ollama, LM Studio, GPT4All, etc., don’t access your files per se. So, I’ve not bothered with things like Docker.

I've just done it with Homebrew, or in Ollama's case they have a Mac app.

All of them can run a smaller version of DeepSeek locally, for example. But, to really get good results, my 16 GB of RAM is a bit too little. More is better in the world of LLMs, in my experience.

1 Like

P.S. I just checked, and pointing GPT4All at an export folder from Tinderbox works well enough to chat with my files.

Thanks for the post Daniel, lots of great stuff in there.
Question: Have you compared GPT4All vs LM Studio?

Tom

I set up a simple curl call that I can post through runCommand and get direct responses back from ChatGPT. I have a boolean that also lets me maintain the history of the conversation. It is similar to what @detlef did but a little more streamlined to align with my understanding of coding. It works great. I can get summaries, answers, and all kinds of things right back into Tinderbox with one check of a boolean.

I am interested, however, in learning more about Allen's approach and getting it local, so 1) I can keep the data on my machine, and 2) learn the parameters and methods for training a model.

FWIW, I am thinking of creating a similar approach to support my research and writing, using Tana instead of Tinderbox. I've been exploring Tana off and on since its alpha days.

Thanks Paul for telling me about Tana. I took a quick look and will likely dive in further later. How are you using it? What does a research/writing workflow look like in Tana?

Glad to know about this app–very easy to use. The recommended model gave me some very weird answers but I’ll keep trying.

FWIW NotebookLM gives you a lot more control over the documents the model will consult to answer the question and lets you do some limited interaction with the docs within the platform.

Thanks again!

1 Like

If you can stomach sending data to Google, then NotebookLM has a much larger context window! About 1,000,000 tokens, or say a book. Claude Opus also has a lot. But, that costs money.

I have found it interesting to give NotebookLM a lot of (my) text, and see what it comes up with about a particular topic.

But, in the end, I'm finding coding à la John McPhee, then emergent structure, and lots and lots of rewriting, works much better for me.

I'm a Professor in my day job, and have given up on student essays because they all use the GPTs, and the results are so generic. It feels a bit like power tools and carpentry. You have to know how to use power tools, or you'll take off your own finger.

I think for those wanting to run LLMs locally, what truly matters is RAM. On my M1 with 16 GB of RAM, I'm lucky if I can get an LLM to output, say, 500 words, which is probably around 1,000 tokens, and it will take a bit of time. I can run 7B or 8B models… but not bigger. The big ones might be 600B parameters or more.

The way RAG works, or at least how I think about it, is that it creates a search engine for chunks of text related to your prompt and then sends those to the LLM, which tries to make sense of them. Problem is, the LLM is quite limited. The limiting factor is RAM. But, if you had much more RAM, you could run Llama 70B and get something like GPT-4. We'd probably need 96 GB or something to run that, or to run a model getting close to as good as GPT-4o.

If I had no marginal cost, I’d start to do interesting stuff.

What I'm intrigued by is having an LLM work with a Tinderbox document: make changes, add notes, expand on ideas, comment on it, etc. I'd love to give an LLM running locally hooks into the Tinderbox notebook, and let it start to do stuff. Or be able to use LLM prompts in rules or agents. I'm sure if I dug into it again, I could. But, anyway. What do I mean? Almost like Automator for prompts: break down a task into a dozen steps, and then ask the LLM to do each step.

What's a use case? Say, a historic archive of legal cases from rural Colombia that are handwritten and were in garbage bags. Use an LLM to do HTR/OCR on the handwriting. (Works well enough with Qwen2-VL on my computer, but even better with Qwen2-VL 72B, which requires a lot of GPU power.) Then use named entity recognition to extract names, places, and organizations, much as Tinderbox already does. Then use an LLM to summarize case files, build a catalogue, extract summaries of where each person or place appears, etc. To do all that, I find myself falling back to Python, the command line, and text files. Although I'd love to do it in Tinderbox.

But, anyway. I digress.

As far as models go, I suggest Llama 3.1 8B Instruct; it works well.

The DeepSeek models are interesting, but I find them less useful because they're apt to go and think when sometimes I just want an answer.

Of them all, for accessing from the command line and Python, I use Ollama. For the VL models to do HTR/OCR, reach out over email and I'll share the approach.

But, for replacing one of the online services, I think I’d suggest LM Studio.

GPT4All works well with the local docs, but I prefer LM Studio.

3 Likes

I recently found LM Studio. But I've done most of what I do from Python, and Ollama works well for that.

1 Like

Tana looks cool. Thanks for sharing.

1 Like