A bit more on the how, with a disclaimer up front: I am not a programmer and not an LLM expert. I wandered into this knowing just enough Python to scrape by and have relied heavily on online tutorials and examples. My code is a mess and I’m constantly revising it. I’m happy to help others muddle through like I did but I can’t provide a ready-to-go solution.
When I started on this project, I learned about embeddings, which are numerical representations of text (or images, audio, etc.) designed to capture semantic meaning. They are fundamental to the functioning of LLMs. In a retrieval-augmented system, document embeddings are stored in a vector database, which is optimized for efficient similarity searches and distance calculations between vectors, allowing for search by meaning instead of just by keyword. (More here.)
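To make that concrete, here's a minimal sketch (assuming the sentence-transformers package and the all-MiniLM-L6-v2 model I mention below; the documents and query are made up for illustration): embed a query and two documents, then compare them by cosine similarity.

```python
# Minimal sketch: turn text into embeddings and compare them by meaning.
# Assumes sentence-transformers is installed (pip install sentence-transformers).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # small, fast embedding model

docs = [
    "Notes on the French Revolution",
    "A recipe for sourdough bread",
]
query = "18th-century political upheaval in France"

doc_vecs = model.encode(docs)    # one vector per document
query_vec = model.encode(query)  # one vector for the query

# Cosine similarity: higher means closer in meaning, even without shared keywords.
scores = util.cos_sim(query_vec, doc_vecs)
print(scores)  # the first document should score noticeably higher
```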
The first thing I did was set up a Python virtual environment, isolating whatever mess I make from other Python installations. This is where my vector database, Python modules, and scripts live. I then wrote scripts in Python to handle data coming in from Tinderbox: a script to add text to the vector database, a script to retrieve documents from that database, another to retrieve and pass onto a generative LLM, and so on.
Within Tinderbox, I created stamps that:
- gather data
- format it for the LLM
- activate the virtual environment and trigger a script using runCommand()
- assign the response to a Tinderbox attribute
For some LLM tasks that require only a few Tinderbox attributes, it was easy enough to pass them on directly as arguments. The output from the script then gets assigned to a Tinderbox attribute. Something like this:
$someAttribute = runCommand("source path/to/env/bin/activate&&python path/to/script.py arg1 arg2")
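The receiving script can be very simple. Here's a hypothetical sketch (not my actual script; the argument names and the placeholder processing are made up): runCommand() captures whatever the script prints to stdout, and that string ends up in $someAttribute.

```python
# script.py -- hypothetical sketch of the script that runCommand() calls.
# Whatever this prints to stdout is what runCommand() returns to Tinderbox.
import sys

def main() -> None:
    arg1, arg2 = sys.argv[1], sys.argv[2]   # attributes passed in from the stamp
    result = f"Processed {arg1} and {arg2}"  # stand-in for the real LLM call
    print(result)                            # Tinderbox assigns this to $someAttribute

if __name__ == "__main__":
    main()
```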
If there is a lot of text and associated data, I use a stamp that formats it as JSON lines using a Tinderbox template and exportedString(). The stamp then saves the result to a file using runCommand(). The same stamp then triggers a Python script that reads the file and starts the requested process. In these cases the response is written to a file, and the number "1" is sent back to $someBoolean in Tinderbox, which triggers an import rule (Mark helped me with this part here).
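On the Python side, that file-based round trip might look roughly like this (a hypothetical sketch; the paths and the processing step are placeholders, not my actual files):

```python
# Hypothetical sketch of the file-based round trip.
# Paths and field handling are placeholders, not the actual setup.
import json
from pathlib import Path

IN_FILE = Path("~/tbx_export/notes.jsonl").expanduser()    # written by the Tinderbox stamp
OUT_FILE = Path("~/tbx_export/response.txt").expanduser()  # read back by the import rule

notes = []
with IN_FILE.open() as f:
    for line in f:               # one JSON object per line (JSON Lines)
        if line.strip():
            notes.append(json.loads(line))

# ... run whatever process was requested (embed, classify, query an LLM) ...
response = f"Received {len(notes)} notes."

OUT_FILE.write_text(response)
print(1)  # runCommand() returns this; Tinderbox assigns it to $someBoolean
```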
That’s about all on the Tinderbox side. The rest is the scripts interacting with other parts of the system. These include:
- Scripts to add Tinderbox info to my vector database.
- Scripts to ask the vector database for a similarity search of a given text.
- Scripts to ask for a similarity search and pass the results to an LLM along with my query (sketched below).
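That third kind of script might look roughly like this (a sketch, not my actual code; the collection name, path, and the generate() stand-in are placeholders, and it assumes the chromadb package):

```python
# Hypothetical sketch: similarity search in Chroma, then hand the hits to an LLM.
import chromadb

client = chromadb.PersistentClient(path="path/to/chroma_db")
collection = client.get_or_create_collection("tinderbox_notes")

query = "What did I write about project deadlines?"
hits = collection.query(query_texts=[query], n_results=5)
context = "\n\n".join(hits["documents"][0])  # top matching note texts

prompt = (
    "Answer the question using only the notes below.\n\n"
    f"Notes:\n{context}\n\n"
    f"Question: {query}\n"
)

# generate() is a stand-in for whichever model call you use
# (llama.cpp via subprocess, a transformers pipeline, an API client, etc.).
# answer = generate(prompt)
print(prompt)
```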
The tools I’m using include:
- Chroma as a vector database. Documentation is very clear and concise. Here’s a bare-bones example of how to create a Chroma database and add documents to it (a sketch follows below).
- The Hugging Face transformers module. This Python module makes it easy to interact with Hugging Face LLMs.
- I started with LangChain, another Python module that seemed promising in terms of managing the pipeline, but I’ve encountered too many limitations with it to continue using it.
For LLMs, I’m in favor of using specialized local models from Hugging Face. You don’t need the state-of-the-art for every task.
- For creating embeddings to feed into Chroma: all-MiniLM-L6-v2
- For performing classification tasks: bart-large-mnli or deberta-v3-base-zeroshot (a sketch follows below)
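Here is the bare-bones Chroma sketch promised above (paths, ids, and metadata are illustrative placeholders; assumes the chromadb package):

```python
# Bare-bones sketch: create a persistent Chroma database and add documents.
import chromadb

client = chromadb.PersistentClient(path="path/to/chroma_db")  # stored on disk
collection = client.get_or_create_collection("tinderbox_notes")

collection.add(
    ids=["note-001", "note-002"],  # must be unique
    documents=[
        "Text of the first Tinderbox note.",
        "Text of the second Tinderbox note.",
    ],
    metadatas=[{"name": "First note"}, {"name": "Second note"}],
)

print(collection.count())  # 2
```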
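And a sketch of a classification task using the transformers zero-shot pipeline with bart-large-mnli (the text and candidate labels are made up for illustration):

```python
# Sketch: zero-shot classification with the transformers pipeline.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

text = "Meeting moved to Thursday; need to update the project timeline."
labels = ["scheduling", "finance", "research idea", "personal"]

result = classifier(text, candidate_labels=labels)
print(result["labels"][0], result["scores"][0])  # best label and its score
```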
For generative LLMs, the ones that chat with you, the big decision is whether to use one of the household names like ChatGPT and access it through its API, or to download a free LLM like Llama or DeepSeek. I started with ChatGPT, moved over to Llama, and am now playing with DeepSeek.
For local models, there are a crazy number of ways to interact with them. I’m using llama.cpp, a command-line interface that seems much faster than the other options, though not as user-friendly. I trigger the command line from inside a Python script. Other options include transformers pipeline (part of the transformers module); Ollama, which can be accessed through Python modules, R packages, or the command line; and LM Studio, which is great for experimentation and even customizes code for you. Unfortunately it requires Apple Silicon. There’s also the llama-cpp-python module that I haven’t tried.
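Triggering llama.cpp from a Python script might look roughly like this (a hypothetical sketch; the binary name, model path, and flags depend on your llama.cpp build and version):

```python
# Hypothetical sketch: calling the llama.cpp CLI from Python via subprocess.
import subprocess

def generate(prompt: str) -> str:
    cmd = [
        "path/to/llama-cli",         # llama.cpp command-line binary
        "-m", "path/to/model.gguf",  # local model file
        "-p", prompt,                # the prompt
        "-n", "256",                 # max tokens to generate
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout

print(generate("Summarize retrieval-augmented generation in one sentence."))
```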
Huge number of choices, vast possibilities. The more I learn, the more applications I find.
For those who are interested, what do you see LLMs doing for you?