Where are the limits of exploding?

I’m not saying this is the solution, but I just tried the TXT files (Zip posted up-thread). Dragged them into Tinderbox:

Expand those per-source note containers, and, tada!

One note per paragraph of the source note—instantly!

I guess the next thing to try is the same with RTF, as I realise some folk want styled source text. Here’s the above result, as proof. Zipped (as 18MB → 700 kb): 7200explode-1.tbx.zip (683.7 KB)

Anyway, there’s no reason not to use Stream parsing, but for the original scenario it is possibly more work to get to the stage shown above. The experiment above is also a reminder that trying to do everything in one app sometimes pushes boundaries unnecessarily when simpler alternatives exist (as I myself keep re-learning).

A side aspect not mentioned, but which might be pertinent as the starting scenario was MS Word: do the styles matter, and were there embedded images? The text in the test TBX was plain text (the same Lorem ipsum repeated thousands of times), so it might not be a true replication of the starting scenario.


Thank you, Mark, good to know! And such intriguing attributes … :wink:

Now that I know that Tinderbox’s explode function was not built for this kind of task, I also think that the solution should be to get the document split before importing it into Tinderbox.

And yes, the styles matter in my texts; bold or italics or small caps are valuable information that must not get lost in the process. But that is no big problem if the DOCX comes to Tinderbox in much smaller chunks.


@Oliver, I believe we covered this in Week 1 or 2 of the 5Cs TBX 101 course: if you paste a Word doc into Obsidian, Obsidian will convert the Word headings and styles to markdown (e.g. heading 1, 2, 3, bold, italic, etc.). Then, per this thread, you have two options: 1) break the document up into smaller pieces, or 2) use a stream-parsing stamp.


Ah. It turns out on closer inspection that the TXT file import reported above is not as I expected (apologies, it was late here and I posted the result without deeper inspection). So: my split files each hold 500 repetitions of a single paragraph with the text:

Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim ven.

What the import has done is take the first sentence and a bit more as the note title:

Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident

and then parsed the rest into user attributes.

Why? I’m not sure. If I make a file with just one paragraph of the test text, it imports as a note named after the file (without the extension), with the contents of the file as the $Text.

Curiouser still was the result of making a file with two short paragraphs:

Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore.

Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore.

as this imports as one note with the $Text as per the source file contents.

This outcome suggests that the length of the first line (paragraph), or possibly of the source text overall, is affecting how the file import works. Bottom line: results using this method are inconsistent, as the parsing logic for when to generate multiple notes is as yet unclear. Pinging: @eastgate


It sounds like Tinderbox struggles with handling that many exploded notes at once. You might have better luck breaking it down in smaller chunks—maybe 500–1000 at a time. Also, checking if indexing or agents are slowing things down could help. Have you tried tweaking settings or testing with a smaller dataset first?