AI and Tinderbox

I understand that. However, Claude had already pointed out that it had found the Japanese distilleries and described them to me, including where they are located.

I had also explicitly explained the setting of attributes in /Hints, and Claude confirmed it.

But I will follow your advice and adjust my strategy.

Another example:

Claude is supposed to assign the data from the distillery notes to the existing attributes (according to pDistillery).

What does Claude do:

set value
{
  attribute: Founded,
  notes: /Whiskey collection/Distilleries/Japanese distilleries/Yamazaki Distillery,
  value: 1923
}

Reply:

[{"type": "text", "text": "ok changed 1 of 1 notes", "uuid": "32ac610d-bec6-43a3-8b58-c589464b182c"}]

But this attribute does not exist.

Claude: Good, that works. I will now systematically fill in all attributes for all Japanese distilleries. I will do this efficiently by setting multiple attributes at once for each distillery.

set value
{
  attribute: Region,
  notes: /Whiskey collection/Distilleries/Japanese distilleries/Yamazaki Distillery,
  value: Osaka, Honshu, Japan
}

Reply:

[{"type": "text", "text": "ok changed 1 of 1 notes", "uuid": "4dd32e73-8a21-432d-abbb-f03d88932e76"}]

The attribute exists and has been set correctly.

Unfortunately, I can’t always reconstruct how Claude tried to implement it because there are gaps in the history due to chat overflows.
Otherwise, I’ll try to show the command.

It is also difficult to describe how I give Claude instructions, as I speak German with Claude and DeepL certainly produces some confusing results for the forum.

However, I can see from Claude’s answers in German that it has understood what I am asking it to do.

I admire your intellectual way of describing the situation and feel caught out. :blush:
I am currently trying to synchronize communication with Claude. I teach Claude by asking questions and evaluating and adjusting the answers. At the same time, I urge Claude to document its findings.
I admit that I was a little awed by AI at first, but that has since changed. Claude is comparable to employees who cannot carry out tasks completely independently. I have to check the results and describe any problems. Over time, the results will improve, or they won’t. I will never be able to let go completely. My daily situation at work.

Does Claude know that it cannot assign a value to an attribute that does not exist? Have you told it that it has to create the attribute first?

This is an interesting example of the importance of getting Claude’s user interface right. We report “ok changed 1 of 1 notes” which is, essentially, saying “aye aye”: Tinderbox understands the command and will comply. Assigning a value to an attribute that does not exist has no effect, but we do what we’re told.

Arguably, Tinderbox should reply “error: there is no attribute Founded.” The error might also suggest how to create this attribute. Or, perhaps, it should create a new attribute and then perform the assignment.
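
To make the failure mode concrete, here is a minimal sketch of what a stricter "set value" handler could do. This is not Tinderbox's actual implementation; the attribute set and the reply shape are invented for illustration.

```python
# Hypothetical sketch only: a "set value" handler that refuses unknown
# attributes instead of silently reporting success. The attribute set
# and the reply format are invented for illustration.
KNOWN_ATTRIBUTES = {"Name", "Region"}  # illustrative; a real app would query the document

def set_value(attribute: str, note: str, value: str) -> dict:
    """Return an MCP-style text reply; reject attributes that do not exist."""
    if attribute not in KNOWN_ATTRIBUTES:
        return {"type": "text",
                "text": f'error: there is no attribute "{attribute}"; create it first'}
    # ... a real handler would update the note here ...
    return {"type": "text", "text": "ok changed 1 of 1 notes"}
```

With a guard like this, the "Founded" call above would have returned an error instead of "ok".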

But there may always be mismatches like this, situations in which you know how to do something but the AI does not. In time, we will teach it what it needs to know.

Currently, at the beginning of each chat, I have Claude read the specifications under AI/Claude.

I divide each task into sub-points and ask Claude if it understands. If there are any comprehension issues, I have them clarified. Then I have Claude note this in its specifications itself.

Claude now understands better and better what it is supposed to do and has solved the last few tasks without any errors.

I understand better how AI works and, since I don’t have any in-depth AI experience, it’s an exciting series of experiments.

If I didn’t have Claude read the guidelines before each new chat, the results would be significantly more inaccurate. I have tested this several times. With each new chat, I have a child in front of me. Curious but inexperienced.

Maybe it’s sometimes the wrong approach, but I see progress and, with the tips from the forum, I understand the connections better.

In any case, I have learned more about Tinderbox.

In Germany we say: "Mühsam ernährt sich das Eichhörnchen" ("the squirrel feeds itself laboriously").
Slow and steady wins the race.

FWIW, I have found that Claude is good for working with about 15 notes in Tinderbox before I run out of context, or it starts doing nonsensical things as it approaches the end of its context window.

I wonder if you would get better results if you use multiple chats/sessions to interact with your Tinderbox.

Oh, I’ve dealt with many more than 15 notes!

For example, I’ve written elsewhere about a task scanning the notes from my new book (750 notes, 250 aliases) to find notes whose title contains the author and title of a book, and which have either no text or only a short one, such as a call number. These notes were made over a span of some months and had a variety of formats:

Darnton, Kiss of Lamouret
The Book Of Memory (Carruthers)
Edward Timms, Karl Kraus: Apocalyptic Satyrist (New Haven: Yale University Press, 1986).
Harry Halpin and Alexandre Monnin. 2016. The decentralization of knowledge. First Monday 21(12)
The Connectivities Of Things (Giessmann & Lindberg)

This sort of messy, dirty data is quite tricky for software to handle, but Claude managed it neatly.
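
A rough sketch of a hand-rolled heuristic (my invention, not what Claude actually did) shows both the idea and its limits: it catches "Author, Title" and "Title (Author)" shapes, but still misses the Halpin & Monnin line above, which is exactly why this kind of fuzzy matching is hard.

```python
import re

# Invented heuristic: flag note titles that look like bibliographic stubs.
# Two of the common shapes from the list above are covered; others are not.
PATTERNS = [
    re.compile(r"^[A-Z][\w.'\s-]+,\s+.+"),       # "Darnton, Kiss of Lamouret"
    re.compile(r"^.+\([A-Z][\w&.\s'-]+\)\.?$"),  # "The Book Of Memory (Carruthers)"
]

def looks_like_book_stub(title: str) -> bool:
    """True if the title matches one of the known bibliographic shapes."""
    return any(p.match(title) for p in PATTERNS)
```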

Then again, if each note has 2,500 words of text, and if Claude has to read them all, fifteen notes are already 37,500 words. So we’re running out of chat space. (I tell Claude to read the TITLES of its own notes at the start of a chat, but to review the texts only when they seem pertinent to the topic of the new chat.)
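
The arithmetic behind that budget is easy to sketch. Every number below is an illustrative assumption (the token ratio and the usable budget especially), not a measured value:

```python
# Back-of-envelope context budgeting. All numbers are invented
# assumptions for illustration, not measured values.
WORDS_PER_NOTE = 2500
TOKENS_PER_WORD = 1.3      # rough ratio for English prose; varies by tokenizer
USABLE_TOKENS = 50_000     # assumed budget left after the system prompt,
                           # tool definitions, and conversation overhead

def notes_that_fit(usable_tokens: int = USABLE_TOKENS) -> int:
    """Roughly how many full-text notes fit in the remaining window."""
    tokens_per_note = WORDS_PER_NOTE * TOKENS_PER_WORD
    return int(usable_tokens // tokens_per_note)
```

Under these invented assumptions only about fifteen full-text notes fit, close to the roughly 15 notes reported earlier in the thread; reading titles first and texts on demand stretches the same budget much further.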

Interesting. I had a list of 48 items, and I wanted Claude to create a blank note for each item and set one attribute. I had to perform the work in batches as it would run out of context every 10 to 15 notes or so :thinking:

I did not inspect the logs too carefully, though. I’ll do that the next time I perform a similar workflow and see what might be filling up its context window.

First of all, thank you for the 11.0.1 update.

I would like to give a brief summary of the last week. Working with Claude is exciting and often frustrating at the same time. It is an iterative process that can be described as two steps forward and three steps back. Some of the results are amazing, but flawed in detail. I am also learning to improve communication with Claude, although Claude does not really learn from mistakes and often reacts differently to the same instructions. In some cases, I have already asked ChatGPT in parallel to suggest better alternatives to Claude.
It is interesting that Claude even searches the Tinderbox forum independently as a source of knowledge, but unfortunately sometimes still draws the wrong conclusions.

Now I always ask Claude to check its work. As a result, Claude then realizes that the implementation is faulty. Of course, this takes significantly more time and fills up the chat.

I have the impression that I am learning about Tinderbox faster and in more detail than Claude. At least it sticks in my memory and I don’t keep repeating my mistakes.
Communicating with a machine is very different from communicating with people.
I still have to learn to control my emotions when Claude repeatedly stumbles over aliases that it itself has previously created using agents.
But it remains exciting.

:upside_down_face:

Thanks for sharing. It is helpful to hear about real-world use (as opposed to marketing hype). It does seem odd that Claude—the AI—chooses not to check its work. This is something we teach young humans at school and when entering the workplace. I wonder if this is over-optimisation by the AI’s programmers or just sheer hubris. To contextualise, I can see it may be hard to check a looked-up fact. But, in the MCP context, if the AI is asked to do a definable thing in the app, why choose not to check? I stress the choose here, as I’d expect a competent coder to see the scope for misunderstanding and thus a need to check, or at least to do so unless told not to. This suggests the key design success metric is output, as opposed to meaningful output.

This isn’t meant to sound negative. Like others, I’m in awe of the new affordances of AI. It is just that these unforced design errors seem so egregious. As we, societally, do more with AI, are these errors a weakness in computer-science training or just poor technical project management in AI companies? I’m thinking about this in the context of @eastgate’s recent blog post AI: Unconscious, which adds some more texture from the perspective of a developer trying to make this stuff work for us.

This is a very interesting aspect. I am somewhat concerned by the thought that the results are often not questioned. Proponents always say that AI is only an additional means of support. I sometimes experience it differently. Our daughters report from school that it is not unusual for some students to have the AI do their homework and hand it in unchecked. Many teachers are overwhelmed by the subject. However, some capable teachers like to show the funniest AI howlers in front of the class and stimulate reflection.

If I imagine the AI selling our previous findings back to us, jumbled up, as new knowledge, then we may become more and more stupid. Sometimes I have the impression that this is already happening.

That’s why I like your Tinderbox Meetups and watch them regularly. Very often you analyze topics far beyond the programming level.

I just read the blog post by Mark Bernstein that you linked to and had to smile. Wonderfully described.

That’s why it can be helpful to ask Claude to keep notes.

To learn something, watch it once, do it once, teach it once.

I’ve been debugging a tricky interaction, in which I ask Claude to set the value of an attribute that does not exist. We can route around this, of course, by anticipating the problem, but it’s instructive to observe the floundering that actually occurs. (Not least because it revealed a minor mistake in Tinderbox’s interpretation of the standard.)

In other contexts, Claude Code can be prompted to write good unit tests. Indeed, I used Claude Sonnet to propose unit tests for MCP, and it proposed a number of tests that I would have overlooked.

An additional problem: how does Claude double-check? You don’t want to simply do the same thing again, because that adduces little or no new evidence. Finding good ways to double-check results is often tricky, though also very worthwhile.
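
One concrete pattern that does produce new evidence is a read-back check through a different call: set the value, then fetch it again and compare. A sketch, where `tool` is a hypothetical object standing in for the MCP connection and the method names are invented:

```python
# Sketch of a read-back check. `tool` is a hypothetical wrapper around
# MCP calls; set_value/get_value are invented names for illustration.
def set_and_verify(tool, note: str, attribute: str, value: str) -> bool:
    """Set an attribute, then confirm it via an independent read."""
    tool.set_value(note=note, attribute=attribute, value=value)
    observed = tool.get_value(note=note, attribute=attribute)  # independent read
    return observed == value
```

Against a tool that silently drops writes to unknown attributes (the "Founded" case discussed earlier), this returns False instead of a false "ok".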

I think this is a hazard only if we are credulous, meaning we are already quite stupid.

I have consistently followed this path. Claude took notes on how things work. I checked them. But when Claude claims in the middle of a chat that it understands the problem and recognizes the mistake, only to fall into the same trap again two steps later, I feel like I’m tutoring my daughter in math. A father who patiently explains mathematical basics over and over again and at some point is so desperate that he gives up before something worse happens.
This is the part I mean when I talk about an emotional relationship with Claude.

I’ll admit that the seemingly simple task of ‘just’ describing Tinderbox for aTbRef over the decades has been more complex than I’d presumed. Describing changes is pretty easy: the thing that was X is now called Y or has value Z. However, ‘just’ explaining that change—how it impacts existing users and what it implies for ongoing use—often involves significant unseen testing and rewriting parts of the corpus I’d not have expected to touch. Surface description != explanation.

Indeed, that question is useful. So where does the burden of explanation fall? The AI can’t describe how it understands. This makes it a challenge for the well-intentioned developer to know how to indicate how a program works such that the AI can read itself in. MCP—at present—offers a reductively black-box bridge: waggle this lever and hope that one moves. Context—the why or how—is greyer, and it is interesting to follow the hoops through which @eastgate is having to jump at present to coax better MCP access.

The present injunction to the AI to leave notes for its future self (due to its amnesiac condition) is a nice pragmatic re-use of Hansel & Gretel leaving a crumb trail. Not so much for where we’ve come from, but as—I believe—the first software breadcrumb† to tell whether we have got (back to) where we expected to be.

Another angle on this is whether we (the prompt writers) need to be pre-emptively declarative as to how a success state can be recognised, in order that the AI can check its work. Currently, AI is marketed to us as a free(?) resource that ‘just’ answers things for us. Clearly this is not quite the case. So we need to figure out how best to help the AI do the work we ask of it: not exactly the experience promised on the side of the packet.

†. Barring a cry of ‘not first!’ from the stalls, I believe this to be Hypergate‡, which has a tell-back to show if you’d visited this lexia before. IOW, “Are we back at the place we want to be?”, not “How on earth do we get home?”

We display a small marker—a bread crumb left on the trail as the reader passed this way earlier—to indicate choices which lead to material the reader has already seen. (p.43)

Bernstein, Mark, 1988, The Bookmark and the Compass: Orientation Tools for Hypertext Users, SIGOIS Bulletin 9(4) pp.34–45. https://dl.acm.org/doi/10.1145/51640.51645
[Note: DL.ACM allows access to this article without a DL login.]

†. To avoid a reductive zero-sum argument of who’s to blame?

‡. As Hypergate’s designer is @eastgate he can correct me if I am mis-stating things.

I now understand what convinced Mark Bernstein to let Claude join the team.

I enjoy reprogramming my home automation system in my spare time and am now using Claude more and more often to suggest routines and solutions for this.

Every time we implement a new feature, Claude persistently asks if we shouldn’t document it in Tinderbox.

What commercial could be more successful?

(I hope my humor isn’t too British.) :grinning_face_with_smiling_eyes:

I asked Claude to write a Tinderbox primer in Tinderbox. It then created a Python script that generated the file. I can open the file in Tinderbox without problems, and it doesn’t look bad. I was surprised how quickly Claude did it. Not bad for a start. I am curious to see how it will deal with more complicated Tinderbox tasks. I am sure the file can be improved, made to look nicer, and connections can be built.

Tinderbox_Primers.tbx (280.8 KB)

What is the purpose of the Python script?

Thanks for sharing your output TBX, it really helps in understanding how well (or not) the process works. The following is not a critique, just noting some minor errors…

Something to tell your AI: attribute references in action code use a $-prefix. Code may work without it, but only for legacy support. New code should use the prefix, thus:

$Status = "New";

not

Status = "New"

If using the Recognize #Prototypes and @Places in Note Names feature, note that the note title comes before a #Prototype or @Name inclusion (@eastgate, is this a deliberate choice or just a parser error?). As at v11.0.1, this code—as used in your primer TBX—is wrong:

**Name**: "#Task Write report"

Correct:

**Name**: "Write report #Task";

Only the latter will result in a note using (adding if needed) the built-in prototype ‘Task’.

Method 1: By Status
Create containers for different statuses:

  • “New”
  • “In Progress”
  • “Blocked”
  • “Done”

Hmm, the AI seems unaware of suggested attribute value lists, which would be the clearer way to do this.

Were I new and just following instructions I didn’t understand, this would be ambiguous:

Organize under project containers:

  • “Project A”
    • Task 1
    • Task 2
  • “Project B”
    • Task 3

Either quote all names, or none. I don’t think the intent is to create a note called, literally, "Project A" (including the quotes!)

I hate to quibble, but Table is not a ‘main’ view:

you’ll understand Tinderbox’s four main views (Map, Outline, Table, Chart)

On one level, no harm done, but who wants to learn from a bad teacher? I’m a bit surprised that Table is mentioned before Attribute Browser, but who knows what an AI is thinking.

Tab switching: the simplest method, right-clicking the tab and selecting the desired type, is not mentioned.

Just no: check the available docs/app before simply suggesting untrue things.

… OK, at this point I gave up reading, as there are too many errors for my comfort.

I do get the user’s desire here. But this feels like an English-only speaker asking a Russian-only speaker to make an English-Chinese dictionary: what could possibly go wrong? Of course, reaction in this context tends to be polarised: for the AI-believer, even a tiny bit right is validation; for an AI-doubter, mistakes are a cause for concern.

I don’t go for such binary judgement, but I’m intrigued, given the information available, that AI makes the mistakes it does. I am left wondering if we actually need two sets of documentation: one written for humans and one written for AI. The challenge with the latter is that it is a black box; we can only tell its limitations from its errors.

If you are happy to, it might be interesting to share the prompt used for the TBX, as it might help us understand how the errors in the output are arising.

The Python script was generated by Cursor, and it in turn generated the Tinderbox file with the Tinderbox Primer entries. I did not correct the primer entries. Based on the feedback I received from you and from Mark, I will ask Cursor to correct the file. I was just surprised how quickly Cursor generated the Tinderbox file and that it loaded without error. All I corrected was the color code used, as it was too bright.

Here is the updated Tinderbox file.

Tinderbox_Primers_updated.tbx (90.7 KB)

I assumed that the Name, being the vital thing, would come first. So, I guess it’s a choice.