Help with Tinderbox and LLMs

Maybe I’m not very good at searching, but I haven’t been able to find much information on user experiences with the new Tinderbox 11 and its incorporation of AI capabilities.
If there is indeed such information and such experiences, could someone please share them or point me to some useful links?

I’m interested in finding out how good the AI is at finding information that lives in some form in Tinderbox. Tinderbox is already very good at finding information, but often, if you don’t remember the specific keywords you used in your documents, you won’t be able to retrieve the information you are looking for.

I’d like to know something as simple as whether with TB 11 you can now do what should be possible with RAG. That is, use a natural language prompt to extract information from the texts stored in Tinderbox; for instance, prompts of the sort “Please tell me which authors have made proposals that are compatible with X’s proposal in that the explanation of phenomenon/problem Y lies in Z and not in W”. Can you now do this in TB?

If so, how good is it at it?

If I understand it correctly, TB currently works only with Claude. I’m not a Claude subscriber, and I’m not too keen on paying another 20 bucks a month for yet another subscription. Is it possible to hook TB up to other LLMs like ChatGPT, or to local LLMs via Ollama or LM Studio?

Thanks in advance for your time.

JM

I have also just started using it and have achieved surprising results. It works perfectly with the free version of Claude. This is more than sufficient for initial trials. However, it is addictive due to the amazing results and can sometimes be perplexing.

and further to the last answer above…

Does use with Claude AI require a subscription? No, but it does require the Claude desktop app to be installed; there is no requirement to have a Claude subscription. Those trying the TB/Claude integration report good progress without one. Of course, if you make very heavy use of Claude, you may run out of free tokens. At that point you either have to wait for a period or subscribe; it is impossible to predict how quickly, if at all, your personal work will hit a token limit.

Summary: Claude integration requires use of the macOS Claude Desktop app and a Claude login/ID. However, a paid Claude subscription is not a pre-use requirement. Depending on the degree and complexity of use, some users may choose to take out a subscription rather than wait until free tokens become available again.

Does Tinderbox integrate with other AIs? Not at present, but wider support is planned, including use of local AI engines. The need for the latter is understood, so that sensitive info (e.g. patient data) does not get passed out onto the cloud, a.k.a. an unknown server in an unknown jurisdiction.

Why only Claude? I don’t want to use Claude but something else. Anthropic, Claude’s maker, invented the MCP bridging concept that allows the AI to communicate with apps, and with an app’s content via the app. The tech is new, under constant change, and lightly documented. Unlike Anthropic, Eastgate doesn’t have a vast staff and a multi-billion-dollar research budget for AI implementation, and it has to support all its customers, not just those wanting to use, or simply try out, AI.

An emergent challenge is that AIs don’t understand apps in the way humans do, so implementation now has to allow for two different types of use: an AI’s vs. a human’s. Results are counter-intuitive too. AI can do some tasks we think are hard, like making a poster, but fail at earlier-stage information exploration, where a human’s superior associative thinking tends to win out (AI is less sure-footed where information is missing, i.e. where we are looking for implicit, not explicit, links).

The best place to assess that is to read the posts of fellow users reporting here. Tinderbox doesn’t have the user base of Microsoft or Google, and engaged users tend to interact with the forum. Or, if they blog, their output will likely get linked to from here. Despite the AI snake-oil promises, use requires both some effort and human ‘adult supervision’ in terms of assessing whether results make sense. AI is not a “don’t make me think” machine.

Using AI simply as a way to employ Tinderbox automation without learning how the latter works, as hinted at by some recent threads, is also not yet a proven success. Bear in mind that if I don’t understand how the automation works, and Claude has a limited and non-human understanding, then just because I get an output doesn’t mean it is correct, or a complete answer, or even the process I assumed would happen.

Those reporting the most success appear to be users employing the AI to do tasks where they have some understanding of the process, and in subject areas where they are able to spot bogus results, be they incorrect or hallucinated.

Likely yes, but only if you are familiar enough with the source material to know if the answers given make sense. IOW, AI may get you an answer more quickly and with less effort, but it doesn’t really know if its answer is correct or pertinent. It will do better with bounded questions or processes like “make me a poster” than with “what are the main ideas in the subject I’m studying?”. I make this point as the AI companies are over-promising at present (shareholders demand profits), so we need to be attentive to actual vs. promised performance.

That said, the early reports of actual Tinderbox+Claude use are positive and encouraging; just don’t think we’ve suddenly discovered sci-fi-level general AI when planning one’s own use of the integration.

I also think it is early days to expect libraries of copy/paste prompts for various tasks. At this point you have to roll your sleeves up and just try. Otherwise, just wait a few months as the tech is improving rapidly.

Well, it’s only something like six weeks since the first test release. Experience is bound to be thin. There are some good discussions on this forum, and more on the backstage forum. I’ve written quite a bit at https://markBernstein.org/ .

Yes, that ought to be possible, if the AI can reasonably assess what is and isn’t “compatible with X’s proposal.” That proposition itself might be insoluble (cf. Gödel), or it might be easy.

I’ve chiefly done clerical tasks: “The name of a ‘book note’ is the book title, perhaps followed by the author’s name in parentheses, or preceded by the author’s name. The text might have a call number or other brief location memo, but nothing else. Please make a list of all my book notes, and store that list in a new note.”

I’ve also done things like “Look at the notes in the container named /Topics/RAG. What are some important recent papers that bear on these questions but that are not discussed there?”

At present, Claude is inclined at times to be sycophantic, lazy, and occasionally less than honest. It is also extraordinarily well read. It requires close supervision, but when supervised it can be extraordinarily helpful.

Thanks everyone for your answers. I’m definitely going to give it a try.

JM

So far I have used the Claude app alone (without Tinderbox) to parse a computer-generated data file (sample metadata in rows followed by data columns) and reshape it into all columns, duplicating the sample metadata into columns, so that each row of sample measurements carried the sample metadata from the header. This sort of data reshaping is tedious to do by hand and can be error-prone; previously I have written Python scripts to automate the process. My Python skills are learned as I go, so I would hit snags and have to figure out what the problem was, then fix it. Scripting the data reshaping took as much time as scripting the analysis, and each instrument would again require a new or modified script. Claude analysed a data file and wrote a script to parse and reshape it in less time than it would have taken to copy and paste one sample’s metadata by hand.
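For readers curious what such a reshaping script looks like, here is a minimal Python sketch. It assumes a hypothetical file layout, `key: value` metadata lines followed by a tab-separated header and data rows; a real instrument export will differ, so treat this as an illustration of the technique, not the actual Claude-written script:

```python
# Minimal sketch: flatten a file whose "key: value" sample-metadata
# lines precede tab-separated data columns, so each data row also
# carries the metadata as extra columns. Hypothetical format.
import csv
import io

def reshape(raw_text):
    meta = {}
    lines = raw_text.strip().splitlines()
    i = 0
    # Collect "key: value" metadata lines until the tab-separated data header.
    while i < len(lines) and ":" in lines[i] and "\t" not in lines[i]:
        key, _, value = lines[i].partition(":")
        meta[key.strip()] = value.strip()
        i += 1
    reader = csv.reader(io.StringIO("\n".join(lines[i:])), delimiter="\t")
    header = next(reader)
    out_header = list(meta.keys()) + header
    # Prepend the (repeated) metadata values to every measurement row.
    out_rows = [list(meta.values()) + row for row in reader]
    return out_header, out_rows

sample = """sample: A-17
operator: JM
time\tvalue
0\t1.2
1\t1.5"""

header, rows = reshape(sample)
# header -> ['sample', 'operator', 'time', 'value']
# rows   -> [['A-17', 'JM', '0', '1.2'], ['A-17', 'JM', '1', '1.5']]
```

All values come back as strings; a real script would convert numeric columns and handle each instrument’s quirks, which is exactly the per-instrument drudgery described above.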
When paired with Tinderbox, my trivial example so far has used Claude to clean up notes that I exploded from a table I copied from the internet. I also had it generate a simple LaTeX table. More important from my point of view was automating documentation of interactions with Claude to leverage learning from multiple sessions within the Tinderbox file. Most of the automation is 1) to provide documentation of the file manipulations for me to remember what was done, and 2) to summarize what Claude learned across multiple sessions in a form that it says will reduce my usage.
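For anyone unfamiliar with the target format, the kind of simple LaTeX table being generated looks like the following; this is a generic example, not the actual Claude output:

```latex
% A minimal tabular environment: left-aligned label column,
% two right-aligned numeric columns.
\begin{tabular}{lrr}
\hline
Sample & Time & Value \\
\hline
A-17   & 0    & 1.2   \\
A-17   & 1    & 1.5   \\
\hline
\end{tabular}
```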

In the realm of data analysis for scientific papers, there has been a push for what is called “literate programming”, where the intermediate steps taken to process the data and do the analysis are documented in prose along with the code that does the task. This isn’t a new idea; Donald Knuth wrote the TeX typesetting system using literate programming. It has taken a while to catch on in the areas of science where I work, because data sets were small enough that spreadsheets were good enough. I can see, at least for me, that documenting my thought process as I work with notes for writing academic papers would be a great help in writing more quickly.

Note: it’s worth mentioning, for those who can’t, won’t, or aren’t allowed to use the likes of Claude, that there is happily an app for doing data transforms of the type you describe without an AI. See https://www.easydatatransform.com. Tip: it’s another small software house, and of late the app has been in the artisanal software summer/winterfest sales.

I’m also used to triaging badly formed data, where seeing it, or at least samples of it, is a necessity. I’m depressed at the number of papers I review that state something along the lines of “we just took the output of X and put it into our algorithm Y”: that’s usually an immediate soft reject from me pending author clarification of the process. Else it’s garbage in, garbage out.

I too have started to use Claude to look at data and to review/fix it. I started on my own data, as I know whom to blame if I find mistakes, but it also helps if one has some idea of the data’s provenance, as that helps in figuring out the cause of errors as well as in just fixing the data table.

It’s frustrating that in the rush to have something shiny to show, many AI-involved demos rush to the shiny end point, whereas the real learning is in the middle: both the things that do and don’t work, and all the emergent scaffolding and praxis about recording a process for next use or reworking it for a different purpose. In fairness, the speed of change in AI work means it is perhaps evolving faster than we can turn out explanations, not least as some of those go out of date fast.

You raise an excellent point, Mark. I hadn’t thought of the need for verification of tool changes in the context of AI. But just like any other software tool where the code changes, maybe I need to include some tests that get run to validate the automated processes, similar to several of the Python test frameworks that run the test and compare to expected output. I noticed that the Claude desktop app showed two updates that needed a restart in one day, like any rapidly evolving code base.
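A regression check of that kind can be as simple as replaying inputs whose outputs were once verified by hand. A minimal Python sketch, where the `normalize_note` function is a hypothetical stand-in for whatever transform the automation performs:

```python
# Hypothetical regression check for an AI-assisted transform:
# re-run the transform on saved inputs and compare to outputs
# that were verified by hand in an earlier session.
def normalize_note(title):
    # Stand-in transform: collapse whitespace and title-case a note name.
    return " ".join(title.split()).title()

# Input -> expected output pairs, recorded from a hand-checked run.
EXPECTED = {
    "  the  tinderbox way ": "The Tinderbox Way",
    "claude desktop app": "Claude Desktop App",
}

def check_normalize_note():
    for raw, want in EXPECTED.items():
        assert normalize_note(raw) == want, (raw, want)

check_normalize_note()  # raises AssertionError if behavior drifts
```

Frameworks like pytest would discover and run such checks automatically; the point is only that a few recorded input/output pairs make silent drift in the tooling visible after an update.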

“Claude, how many fingers am I holding up?”