Learning About Mistakes

eastgate · October 1, 2025, 4:16pm

A very useful contribution right now is to identify, whenever Claude makes a mistake while using Tinderbox:

What Claude thought would work
Why it didn’t
How you explained to Claude what it ought to do
What happened then

For example, someone recently reported a problem that arose when Claude tried to set the value of an attribute that did not exist. You can see why it might think that would work: either it assumed the attribute did exist, or it expected that the attribute would be created for it automatically.

This is easy enough to get past, but I think it exposes a general weakness that Tinderbox might remedy with improved tools, or even improved error handling.
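One form "improved error handling" could take is a tool that checks whether an attribute exists before writing to it, and returns a message telling Claude what to do instead. The sketch below is hypothetical: `KNOWN_ATTRIBUTES`, `set_attribute`, and the error wording are illustrative assumptions, not Tinderbox's actual tool interface.

```python
# Hypothetical sketch: refuse to set an undefined attribute, and explain
# the fix in the error so the caller (e.g. Claude) can recover on its own.

KNOWN_ATTRIBUTES = {"Name", "Text", "Color", "URL"}  # illustrative subset only


class UnknownAttributeError(Exception):
    """Raised when a caller tries to set an attribute that is not defined."""


def set_attribute(note: dict, attribute: str, value: str) -> None:
    """Set `attribute` on `note`, but only if the attribute is already defined."""
    if attribute not in KNOWN_ATTRIBUTES:
        raise UnknownAttributeError(
            f"Attribute '{attribute}' does not exist. "
            "Create it first as a user attribute, then set its value."
        )
    note[attribute] = value


note = {"Name": "Example"}
set_attribute(note, "URL", "https://example.com")  # succeeds: URL is known

try:
    set_attribute(note, "SourceURL", "https://example.com")  # not defined
except UnknownAttributeError as err:
    print(err)  # actionable message instead of a silent failure
```

The point of the design is that the error names the missing attribute and the next step, which gives Claude something concrete to try rather than leaving it to guess.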

Estomm · October 2, 2025, 10:07am

I asked Claude to search for the URLs of 69 references and insert them into the text of each note. It stumbled over duplicate names and spaces in path names. I later realised it actually knew how to use the ID, but that idea didn’t come to it until I suggested it.
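Why the ID helps can be shown with a toy model: two notes with the same name share a path, so a path lookup returns more than one match, while an ID lookup cannot. This sketch assumes every note carries a unique numeric ID; the `Note` class, the lookup functions, and the sample paths are invented for illustration.

```python
# Toy model: path lookups are ambiguous when names repeat; ID lookups are not.
from dataclasses import dataclass


@dataclass
class Note:
    note_id: int  # assumed unique for every note
    path: str     # can be shared by notes that have the same name


notes = [
    Note(101, "/References/Smith 2020"),
    Note(102, "/References/Smith 2020"),  # duplicate name, identical path
    Note(103, "/References/Jones 2019"),
]


def find_by_path(path: str) -> list[Note]:
    """Return every note whose path matches; may be more than one."""
    return [n for n in notes if n.path == path]


def find_by_id(note_id: int) -> Note:
    """Return the single note with this ID."""
    return next(n for n in notes if n.note_id == note_id)


print(len(find_by_path("/References/Smith 2020")))  # 2: ambiguous
print(find_by_id(102).path)  # exactly one note
```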


It still helps to know how to use TBX. Another example: I asked Claude to make 76 notes named Question_1 to Question_76. A few minutes later it was still only halfway done, so I killed the process. After restarting, I told it that it would be quicker to make a single note with the items separated by ###, then use Explode. Claude said it couldn’t execute Explode, but it did create the note, and I exploded it myself.
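The text for that single note can be generated rather than typed. This sketch assumes Explode is set to split on `###` and that each resulting note takes its name from the first line of its chunk; check the Explode settings in your own copy of Tinderbox before relying on it.

```python
# Build the text of one note that Explode can split into many notes.
# Assumption: Explode is configured to split on the delimiter "###".
DELIMITER = "###"


def explode_text(prefix: str, count: int) -> str:
    """Return one name per chunk (prefix_1 ... prefix_count),
    with the delimiter on its own line between chunks."""
    names = [f"{prefix}_{i}" for i in range(1, count + 1)]
    return f"\n{DELIMITER}\n".join(names)


text = explode_text("Question", 76)
print(text.splitlines()[:3])  # first two names and the delimiter between them
```

Paste the result into a single note and explode it; the work then happens inside Tinderbox in one step instead of 76 separate tool calls.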

eastgate · October 2, 2025, 1:51pm

That’s interesting. I think we might be able to expand the create_notes tool so Claude could do this in one tool call instead of 76.

Another approach would be an action: 1…76,each(x){create("/questions/"+x);}. But extending the create_note tool should be reasonably straightforward and might save people a little time.
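To make the "one call instead of 76" idea concrete, a batched request might carry every new note in a single payload. The request shape here (`tool`, `container`, `notes`) is a hypothetical illustration, not the actual schema of Tinderbox's `create_notes` tool.

```python
# Hypothetical sketch: describe all new notes in one batched request
# instead of issuing one tool call per note.
import json


def batch_create_request(container: str, prefix: str, count: int) -> dict:
    """Build a single request body listing every note to create."""
    return {
        "tool": "create_notes",  # batched tool; request shape is assumed
        "container": container,  # parent note that will hold the new notes
        "notes": [{"name": f"{prefix}_{i}"} for i in range(1, count + 1)],
    }


request = batch_create_request("/questions", "Question", 76)
payload = json.dumps(request)  # one payload, sent once
print(len(request["notes"]))  # number of notes in the single request
```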

That said, I think Claude Desktop spaces tool calls out by ~3 sec. Even if it's 5 sec, 76 calls should complete in about 6 minutes, which doesn't seem that bad.
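The estimate is simple multiplication: 76 calls at 3 seconds each is 228 seconds (about 3.8 minutes), and at 5 seconds each is 380 seconds (about 6.3 minutes). The helper below just restates that arithmetic; it assumes a fixed delay per call and ignores the time each call spends doing its own work.

```python
# Back-of-the-envelope time for sequential tool calls,
# assuming a fixed delay per call and nothing else.
def total_minutes(calls: int, seconds_per_call: float) -> float:
    """Total elapsed minutes for `calls` calls spaced `seconds_per_call` apart."""
    return calls * seconds_per_call / 60


for delay in (3, 5):
    print(f"{delay} s/call: {total_minutes(76, delay):.1f} min")
```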

Usit · October 8, 2025, 7:43am

Claude always falls into the trap of not knowing the difference between notes and aliases. It regularly includes the aliases generated by the agents in its analysis, producing nonsensical results.
I just can’t seem to train it out of that habit.

eastgate · October 8, 2025, 2:10pm

That's an interesting point, and one that is deeper than it might seem at first glance. In particular, a note is inside(X) if the note is a child of X, or if an alias of the note is a child of X. That's very useful for systems of agents, but it complicates the semantics.
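A toy model shows both halves of the problem: `inside()` is true through an alias, yet an analysis that counts every item in the outline will count the aliased note twice. The `Item` class, the `originals_only` filter, and the container names are assumptions for illustration, not Tinderbox's internals.

```python
# Toy model: each outline item is either an original note or an alias of one.
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)  # compare items by identity, like distinct outline objects
class Item:
    name: str
    parent: str                        # container the item sits in
    original: Optional["Item"] = None  # set only for aliases

    @property
    def is_alias(self) -> bool:
        return self.original is not None


def inside(note: Item, container: str, items: list[Item]) -> bool:
    """True if the note is a child of `container`, or an alias of it is."""
    if note.parent == container:
        return True
    return any(
        item.is_alias and item.original is note and item.parent == container
        for item in items
    )


def originals_only(items: list[Item]) -> list[Item]:
    """Drop aliases so each note is analysed once."""
    return [item for item in items if not item.is_alias]


a = Item("Draft A", "/Inbox")
b = Item("Draft B", "/Inbox")
alias_a = Item("Draft A", "/Agents/Drafts", original=a)  # gathered by an agent
items = [a, b, alias_a]

print(inside(a, "/Agents/Drafts", items))  # True, via its alias
print(inside(b, "/Agents/Drafts", items))  # False
print(len(items), len(originals_only(items)))  # aliases inflate the raw count
```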

I think Claude’s tools may not make the alias/original distinction sufficiently clear, and that can likely be improved promptly.

Later: b741 and subsequent releases will take steps to clarify to Claude the distinction between a note and one of its aliases.