Help with Tinderbox and LLMs

Maybe I’m not very good at searching, but I haven’t been able to find much information on user experiences with the new Tinderbox 11 and its incorporation of AI capabilities.
If there is indeed such information and such experiences, could someone please share them or point me to some useful links?

I’m interested in finding out how good the AI is at finding information that lives in some form in Tinderbox. Tinderbox is already very good at finding information, but if you don’t remember the specific keywords you used in your documents, you often will not be able to retrieve the information you are looking for.

I’d like to know something as simple as whether, with TB 11, you can now do what should be possible with RAG. That is, use a natural language prompt to extract information from the texts stored in Tinderbox; for instance, prompts of the sort “Please tell me which authors have made proposals which are compatible with X’s proposal in that the explanation of phenomenon/problem Y lies in Z and not in W”. Can you now do this in TB?

If so, how good is it at it?
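For readers unfamiliar with the term, here is a minimal sketch of the retrieval step that RAG refers to: rank the stored notes against the question, then pass only the best matches to the LLM as context. Everything in it is an assumption for illustration. The note names and texts are made up, `vectorize`, `retrieve` and `build_prompt` are hypothetical helpers, and a word-count vector stands in for the embedding model a real system would use (which is what lets it match on meaning rather than exact keywords). It is not how Tinderbox or Claude do it.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Bag-of-words counts; a toy stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def retrieve(question, notes, k=2):
    """Return the names of the k notes most similar to the question."""
    q = vectorize(question)
    ranked = sorted(
        notes, key=lambda name: cosine(q, vectorize(notes[name])), reverse=True
    )
    return ranked[:k]

def build_prompt(question, notes, k=2):
    """Assemble the text that would be sent to the LLM: top notes, then the question."""
    chosen = retrieve(question, notes, k)
    context = "\n\n".join(f"[{name}]\n{notes[name]}" for name in chosen)
    return f"Answer using only these notes:\n\n{context}\n\nQuestion: {question}"

# Hypothetical notes, invented purely for this example.
notes = {
    "Smith 2019": "Smith argues that vowel reduction is explained by prosody, not by morphology.",
    "Jones 2021": "Jones proposes that word order follows from information structure.",
    "Lee 2020": "Lee explains vowel reduction through prosody and stress.",
}

question = "Which authors explain vowel reduction by prosody?"
print(build_prompt(question, notes))
```

The printed prompt contains only the notes judged relevant, which is the text an LLM would then be asked to reason over.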

If I understand correctly, TB currently works only with Claude. I’m not a subscriber to Claude and I’m not too keen on paying another 20 bucks a month for yet another subscription. Is it possible to hook TB to other LLMs like ChatGPT or to local LLMs via Ollama or LM Studio?

Thanks in advance for your time.

JM

I have also just started using it and have achieved surprising results. It works perfectly with the free version of Claude. This is more than sufficient for initial trials. However, it is addictive due to the amazing results and can sometimes be perplexing.

and further to the last answer above…

Does use with Claude AI require a subscription? No, but it does require the Claude desktop app to be installed; there is no requirement to have a Claude subscription. Those trying the TB/Claude integration report good progress without a subscription. Of course, if you make very heavy use of Claude, you may run out of free tokens. At that point you either have to wait for a period or subscribe. It is impossible to predict how quickly, if at all, your personal work will hit a token limit.

Summary. Claude integration requires use of the macOS Claude Desktop app, and a Claude login/ID. However, a paid Claude subscription is not a pre-use requirement. Depending on the degree/complexity of use, some users may choose to take a subscription rather than wait until free tokens become available again.

Does Tinderbox integrate with other AIs? Not at present, but wider support is planned, including use of local AI engines. The need for the latter is understood, so that sensitive info (e.g. patient data) does not get passed out onto the cloud, i.e. an unknown server in an unknown jurisdiction.

Why only Claude? I don’t want to use Claude but something else. Anthropic, Claude’s maker, invented MCP (Model Context Protocol), the bridging concept that allows the AI to communicate with apps and with the app’s content via the app. The tech is new, under constant change and lightly documented. Unlike Anthropic, Eastgate doesn’t have a vast staff and $Bn research budget for AI implementation and has to support all its customers, not just those wanting to use or simply try out AI.

An emergent challenge is that AIs don’t understand apps in the way humans do, so implementation now has to allow for two different types of use, an AI vs. a human. Results are counter-intuitive too. AI can do some tasks we think hard, like implementing a poster, but fail at more early-stage information exploration where a human’s superior associative thinking skill tends to win out (AI is less sure-footed where information is missing, i.e. where we are looking for implicit not explicit links).

The best place to assess that is to read the posts of fellow users reporting here. Tinderbox doesn’t have the user base of Microsoft or Google and engaged users tend to interact with the forum. Or, if they are blogging, their output will likely get linked to from here. Despite the AI snake-oil promises, use requires both some effort and human ‘adult supervision’ in terms of assessing whether results make sense. AI is not a “don’t make me think” machine.

Using AI simply as a way to employ Tinderbox automation without learning how the latter works, as hinted at by some recent threads, is not yet a proven success either. Bear in mind that if I don’t understand how the automation works, and Claude has a limited and non-human understanding, just because I get an output doesn’t mean it is either correct or a complete answer, or even the result of the process I assumed should happen.

Those reporting most success appear to be users employing the AI to do tasks where they have some understanding of the process and in subject areas where they are able to spot bogus results, be they incorrect or hallucinated.

Likely yes, but only if you are familiar enough with the source material to know if the answers given make sense. IOW, AI may get you an answer more quickly/with less effort, but it doesn’t really know if its answer is correct or pertinent. It will do better with bounded questions or processes like “make me a poster” than “what are the main ideas in the subject I’m studying”. I make this point as the AI companies are over-promising at present (shareholders demand profits) so we need to be attentive to actual vs promised performance.

That said, the early reports of actual Tinderbox+Claude use are positive and encouraging. Just don’t assume we’ve suddenly discovered sci-fi level General AI when planning your own use of the integration.

I also think it is too early to expect libraries of copy/paste prompts for various tasks. At this point you have to roll your sleeves up and just try. Otherwise, just wait a few months as the tech is improving rapidly.

Well, it’s only something like six weeks since the first test release. Experience is bound to be thin. There are some good discussions on this forum, and more on the backstage forum. I’ve written quite a bit at https://markBernstein.org/ .

Yes, that ought to be possible, if the AI can reasonably assess what is and isn’t “compatible with X’s proposal.” That proposition itself might be insoluble (cf. Gödel), or it might be easy.

I’ve chiefly done clerical tasks: “The name of a ‘book note’ is book title, perhaps followed by the author’s name in parentheses, or preceded by the author’s name. The text might have a call number or other brief location memo, but nothing else. Please make a list of all my book notes, and store that list in a new note.”

I’ve also done things like “Look at the notes in the container named /Topics/RAG. What are some important recent papers that bear on these questions but that are not discussed there?”

At present, Claude is inclined at times to be sycophantic, lazy, and occasionally less than honest. It is also extraordinarily well read. It requires close supervision, but when supervised it can be extraordinarily helpful.

Thanks everyone for your answers. I’m definitely going to give it a try.

JM