Where are the limits of exploding?

hermeneia · February 7, 2025, 2:42pm

Last week I tried to drag and drop a MS Word document (about 280 pages, just text) into Tinderbox. Tinderbox crashed. No problem. DOCX has been and will stay a big problem. Fugeddaboudit.

So I converted the DOCX to Markdown and tried again. This time the file was accepted as one long note. Great.

Then I tried to explode this note so that each paragraph would become a new note, more than 7000 new notes altogether. Tinderbox froze. I had to kill the process.

Then I made a new note with plain text in Tinderbox, just Lorem ipsum, with more than 7000 paragraphs, to make sure that there was no non-Unicode gremlin in the note that could cause problems. Same outcome. Tinderbox froze. I waited several minutes. Nothing.

So it has to be the length of the note that makes problems.

Here comes my question: Where are the limits of exploding very long notes? What are your experiences? What works, what doesn’t? 1000 exploded notes? 2000? 3000? And are there other things I should beware of when I want to explode notes?

I should mention that I used an M1 Max MacBook Pro with 64 GB RAM. Tinderbox used just about 1 GB of RAM. The power of my computer is not the problem.

Thank you in advance for sharing your experience.

eastgate · February 7, 2025, 3:18pm

Send your source document and precisely how you want to explode it to tinderbox@eastgate.com, together with any crash logs.

I don’t think there are any limits. But explode is likely asymptotically O(n^2) for large numbers of notes, and 7000 notes is a very large Tinderbox document indeed.

My first suggestion would be to do this on a small chunk — maybe 1/10 or 1/25th of the whole document? Does that work OK? If so, try a first explode into 10 or 25 chunks. Then explode each chunk.
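
To see why chunking could help, here is a rough back-of-the-envelope sketch. It assumes the cost of an explode grows exactly with the square of the number of notes produced; that exponent is only the guess given above, not a measured figure, and the function names are made up for illustration.

```python
# Rough cost model, ASSUMING explode time grows exactly as n**2.
# The exponent is a guess, not a measurement of Tinderbox itself.

def explode_cost(notes: int) -> int:
    """Relative cost of exploding one note into `notes` pieces."""
    return notes ** 2


def two_stage_cost(total: int, chunks: int) -> int:
    """Explode into `chunks` large pieces first, then explode each piece.

    Assumes `total` divides evenly by `chunks`.
    """
    per_chunk = total // chunks
    return explode_cost(chunks) + chunks * explode_cost(per_chunk)


single = explode_cost(7200)
staged = two_stage_cost(7200, 10)
print(single, staged, round(single / staged, 1))
# prints: 51840000 5184100 10.0
```

Under that assumption, exploding in two stages with 10 chunks does roughly a tenth of the work of one big explode.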

hermeneia · February 7, 2025, 3:44pm

Thank you for your fast answer and your offer! I am not allowed to send you this document, it is not mine. I work for publishing houses …

But since I have tried it with plain text in a Tinderbox note, everyone can easily try it with Lorem ipsum text. Just get one paragraph of plain text in a note, select all, copy and paste it a few times, select all, copy and paste it a few times … until you have 1000 paragraphs. Copy and paste them seven times, voilà, a matter of seconds.

Explode it to paragraphs, nothing special, each paragraph a new note.
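
For anyone who would rather generate the test text than paste it by hand, a minimal sketch follows. The file name `explode_test.txt`, the function name, and the placeholder sentence are arbitrary choices for illustration; the resulting file still has to be dragged into Tinderbox manually.

```python
from pathlib import Path

# Placeholder paragraph; any plain text without line breaks would do.
PARAGRAPH = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."


def write_test_text(path: str, paragraphs: int = 7200) -> int:
    """Write `paragraphs` one-line paragraphs to `path` and return the count."""
    body = "\n".join([PARAGRAPH] * paragraphs) + "\n"
    Path(path).write_text(body, encoding="utf-8")
    return paragraphs


if __name__ == "__main__":
    # Hypothetical output name; adjust as needed.
    write_test_text("explode_test.txt")
```

Calling `write_test_text("small.txt", 700)` gives a smaller file for finding out where exploding starts to slow down.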

hermeneia · February 7, 2025, 3:49pm

Here is a file to test (about 8 MB).
7200explode.tbx (8.2 MB)

hermeneia · February 7, 2025, 3:52pm

And here is my system data while Tinderbox is working on this explode:

hermeneia · February 7, 2025, 4:21pm

Okay, after 35 minutes explode time Tinderbox uses about 5 GB RAM (still no exploded note):

mwra · February 7, 2025, 6:02pm

I’ve ‘broken’ Explode several times in the past, memorably an explode of 0,000s of items that IIRC took overnight to run. So, I’d echo the ‘go in small chunks’ approach. “But why waste my time splitting it?” we cry, “It’s the app’s job to do that”. Yes, and no. The underlying issue is how the process works. We don’t care, but enlightened self-interest suggests we should.

Explode uses a regular expression, even by default (splitting on line breaks, \n). We imagine the app reads the first segment we want, saves it out and reads on. Actually it reads the whole input and then does all the work on it. No big deal for small amounts of text. Toss in a big text and there’s a lot of work behind the curtain.

Being a glutton for punishment, I tried the test doc (a TBX with one massive^† note). The TBX opens OK on my M4 Max/64GB RAM MBPro. It took a good few seconds for the Explode dialog to appear. Why? Because the dialog, in theory, runs the process once, as it reports the number of notes detected … 7,200 discrete notes! Press ‘Explode’ and Tinderbox has gone for a good think. Although various things like Activity Monitor report the app as responsive, it just seems busy: “you want me to do all this and talk to you?”. Off to support. I’ll leave this running. I suspect it will complete, but how long it takes I’m unsure. The less powerful the Mac, the greater the likelihood that it is just all too much.

†. Tinderbox prefers small notes.

hermeneia · February 7, 2025, 6:18pm

After two hours and nineteen minutes I killed the process; by then Tinderbox had written 21 GB of data into RAM without producing a single new note. Maybe this is the problem: Tinderbox seems to try to build all the new notes at once. Maybe it would work if it made just one note after another?

I don’t need to explode large notes regularly, so this is not a problem for me. I just want to know what other users experienced with the limits of exploding.

And I may be wrong, but it seems to be a Tinderbox memory administration problem. The task is not difficult, the data is plain text, thousands of notes is within reach of Tinderbox as I understand it. Maybe it is just something in the internal preferences that prevents Tinderbox from doing this.

hermeneia · February 7, 2025, 6:30pm

A last idea on this: In the old days operating systems like ATARI TOS loaded apps and documents into RAM. That was good for speed, bad for stability; I cannot count how often I lost precious hours of work. But everything was superfast, as fast as today. Would it be possible to start Tinderbox in such a RAM mode just for time-critical tasks like this 7000-note explode to get it done? You see, I am not a programmer, just a user with tasks and ideas.

mwra · February 7, 2025, 7:03pm

I’d beg, politely, to differ. Like you, I’m not a (trained) coder. I’m essentially an information emergency plumber. I use code to get things done, but it’s not my interest, training or expertise. I note the latter because I still respect the skill of coding, one I don’t have.

My reason to demur is that we assume everyone is doing what we’re doing, but it is not so. We might counter that we don’t care. We might ask the app not to accept ‘too much’ info. Simple for us lay folk to assert but less obvious from the app’s side of the digital curtain.

Given the evident limit here, I’d suggest a possible workaround. This in terminal:

split -l 500 filename.txt

splits the file ‘filename.txt’ into N files of 500 lines each. Those can be dragged into Tinderbox and exploded individually. Yes, this isn’t an automated explode, but that is to avoid the situation above, where we throw in more info than the app can (quickly) manage. I’ve not tried the latter as I’m leaving my Mac chewing on your big test overnight (in UK) to see what results.

HTH

mwra · February 7, 2025, 7:35pm

I ran this and got 15 text files without extensions, then:

find . -type f  ! -name "*.*" -exec mv {} {}.txt \;

and now they are .txt files (to help with Tinderbox import). Here they are:
Archive.zip (30.3 KB)

I can’t test as Explode test still running. But if you unpack the ZIP and drag the items to Tinderbox, you should be able to explode them.
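
For those who prefer a single step, the split and the rename can be combined in a short script. This is only a sketch: the function name, the 500-line default, and the `-01.txt` naming scheme are assumptions chosen for illustration, not what `split` itself produces.

```python
from pathlib import Path


def split_lines(source: str, lines_per_file: int = 500) -> list[Path]:
    """Split `source` into numbered .txt files of at most `lines_per_file` lines."""
    src = Path(source)
    lines = src.read_text(encoding="utf-8").splitlines(keepends=True)
    parts = []
    for start in range(0, len(lines), lines_per_file):
        # e.g. filename.txt -> filename-01.txt, filename-02.txt, ...
        number = start // lines_per_file + 1
        part = src.with_name(f"{src.stem}-{number:02d}.txt")
        part.write_text("".join(lines[start:start + lines_per_file]), encoding="utf-8")
        parts.append(part)
    return parts
```

A 7,200-line file comes out as 15 files: 14 of 500 lines and a last one of 200. Note that this counts lines, not paragraphs, so a source with blank lines between paragraphs yields fewer notes per file.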

eastgate · February 7, 2025, 7:53pm

Keep in mind that Explode was originally designed for a community college writing exercise in which students took a big 5-paragraph article and exploded it into small chunks, which they could then rearrange.

Here’s another idea: is Explode really what we want here? Perhaps using the streaming interface would actually be more straightforward…

PaulWalters · February 7, 2025, 7:56pm

Perhaps a method would be to give the document to Claude (which is better than ChatGPT for this sort of thing), instructing Claude to break down the document into paragraphs, and asking it to read it back out as a .tsv or .csv file to be dragged into Tinderbox.
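
The same paragraph-per-row file could also be produced locally, without an LLM. The sketch below makes several assumptions: paragraphs are separated by blank lines (as in Markdown), the header names `Name` and `Text` are picked up as attributes on import, and the function name and note-naming scheme are invented for illustration. Check the import behaviour on a small file first.

```python
import re
from pathlib import Path


def paragraphs_to_tsv(source: str, target: str) -> int:
    """Write one tab-separated row per paragraph of `source`; return the row count."""
    text = Path(source).read_text(encoding="utf-8")
    # ASSUMPTION: paragraphs are separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    with open(target, "w", encoding="utf-8", newline="") as handle:
        # ASSUMPTION: the importing app maps these header names to attributes.
        handle.write("Name\tText\n")
        for number, paragraph in enumerate(paragraphs, start=1):
            # Collapse tabs and line breaks so each paragraph stays on one row.
            flat = " ".join(paragraph.split())
            handle.write(f"Paragraph {number:04d}\t{flat}\n")
    return len(paragraphs)
```

Because tabs and line breaks inside a paragraph are collapsed to single spaces, no quoting of fields is needed, at the price of losing any line breaks within a paragraph.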

hermeneia · February 7, 2025, 8:33pm

Thank you. Yes, there are possibilities to split documents outside of Tinderbox, and that is good.

I just wanted to know where the limits of the explode function in Tinderbox are. So the first task was five notes. Explode does a lot more, great.

With a combination of the Tinderbox split function and the find function with RegExp search, I managed to do splits with Keyboard Maestro macros. It works and it is pretty fast, though it requires defining the search manually.

hermeneia · February 7, 2025, 8:33pm

What is the streaming interface?

hermeneia · February 7, 2025, 8:39pm

For my texts ChatGPT or Claude are no option at all (servers are just the computers of other guys that you do not know). My texts are private. Maybe a locally running LLM (e.g., something in Ollama) could work.

mwra · February 7, 2025, 9:54pm

I believe it refers to this: Stream Processing and parsing

BTW, I force quit my explode at 54GB RAM used, as other processes (which I need running) were suffering. No idea how far through the process was by then!

satikusala · February 7, 2025, 10:44pm

Yes! I think this is the answer. Oliver, this is/will be a great topic for next week’s 5Cs Mastering Tinderbox lesson, but we can also discuss it here on the thread. I’ll try to do a write-up on this.

hermeneia · February 7, 2025, 10:49pm

Thank you for the link! I didn’t know this streaming function.

hermeneia · February 7, 2025, 10:53pm

Thank you, Michael, if you think this is useful, go for it! :o) As I said, I don’t need this all the time and just wanted to know what other users experienced with the explode function.