I have a note that is 5,000 words long. I’d like to split it up using a stamp into 5 notes, each being 1000 words long. Any suggestions on how to do that? Thanks so much.
For just one note, I’d do this by hand, using cut and paste.
For a note every month or so, I’d do this by hand-placing a unique delimiter character (say •) at suitable intervals, and then using Explode…
For notes like this all the time, I’d write a stamp. One easy approach will get things roughly correct:
1. Make a numeric local variable pos and set it to 1000. (pos counts characters, not words.)
2. If pos exceeds the length of the text, set it to the length of the text.
3. Look at the character at position pos. If it’s not a space or a carriage return, advance pos to the next space, carriage return, or the end of the text.
4. Grab the substring from 0 to pos; that’s the first chunk. Make a note, and save the chunk there.
5. Delete everything from 0 to pos.
6. If any text remains, set pos to 1000 and go back to step 2.
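To make those steps concrete outside Tinderbox, here is a rough Python sketch of the same loop. It is not Tinderbox action code: the function name split_by_chars is hypothetical, positions are 0-based to match “from 0 to pos”, and treating both "\n" and "\r" as a carriage return is an assumption about how the line breaks are stored.

```python
def split_by_chars(text: str, size: int = 1000) -> list[str]:
    """Hypothetical sketch: split text into chunks of roughly `size`
    characters, extending each chunk to the next space or line break."""
    if size < 1:
        raise ValueError("size must be at least 1")
    breaks = (" ", "\n", "\r")  # assumption: "\n" or "\r" marks a carriage return
    chunks = []
    while text:
        # Start at `size`, clamped to the length of what is left.
        pos = min(size, len(text))
        # If pos is mid-word, advance to the next break or the end of the text.
        while pos < len(text) and text[pos] not in breaks:
            pos += 1
        # Everything before pos becomes one chunk (one new note).
        chunks.append(text[:pos])
        # Delete that chunk and repeat on the remainder.
        text = text[pos:]
    return chunks
```

For example, split_by_chars("aaa bbb ccc", 5) gives ["aaa bbb", " ccc"]: the break character stays at the start of the next chunk, so nothing is lost when the chunks are joined back together.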
If you’re wondering whether the Tinderbox Explode feature has a character-count based option, it doesn’t. But this probably explains why…
Is that really what you need? If word #1,000 falls in the middle of a sentence, should that sentence be split across two notes? My question isn’t flippant, as it bears on how you decide to split the note.
Perhaps you’d do better to find the first sentence (or paragraph) break after the 1k point and store all the $Text before it in a new note. Then take the residue and repeat until the residue is less than 1,000 words.
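A Python sketch of that residue loop, for illustration only: it assumes a word is any run of non-whitespace (an approximation of Tinderbox’s word count) and that "\n" marks a paragraph break; split_at_paragraphs is a hypothetical name.

```python
import re

def split_at_paragraphs(text: str, words_per_chunk: int = 1000) -> list[str]:
    """Hypothetical sketch: cut at the first paragraph break after the
    words_per_chunk-th word, then repeat on the residue."""
    if words_per_chunk < 1:
        raise ValueError("words_per_chunk must be at least 1")
    chunks = []
    residue = text
    while residue.strip():  # stop once only whitespace is left
        words = list(re.finditer(r"\S+", residue))  # assumption: word = non-space run
        if len(words) <= words_per_chunk:
            chunks.append(residue)  # short enough to be the last note
            break
        # Character index just past the words_per_chunk-th word.
        end_of_nth = words[words_per_chunk - 1].end()
        brk = residue.find("\n", end_of_nth)  # first paragraph break after it
        if brk == -1:
            chunks.append(residue)  # no later break: keep the rest together
            break
        chunks.append(residue[:brk])
        residue = residue[brk + 1:]  # drop the break itself
    return chunks
```

With a 2-word limit, "a b c\nd e\nf g h\ni" becomes ["a b c", "d e", "f g h", "i"]: each chunk runs on to the end of the paragraph in which the limit falls.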
Tinderbox action code doesn’t have a method to iterate sentences within a string (such as $Text), but it can do so for paragraphs.
…and I see that, whilst I was typing, @eastgate has posted an entirely string-based solution, so I’ll stop there and just note that to get the character at pos you could use String.substr(startNum[, lengthNum]). There’s invariably more than one way to do most tests, but that might help get started. Per my comment about splitting sentences, if going the paragraph-based route, amend @eastgate’s test so that it looks only for a line break ("\n") or the end of the source text.
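Expressed as a small Python helper (illustrative only; advance_to_paragraph_end is a hypothetical name, and in Tinderbox the per-character check would be done with String.substr()), the amended test stops only at a line break or the end of the text, so spaces no longer end a chunk and a paragraph is never cut in half.

```python
def advance_to_paragraph_end(text: str, pos: int) -> int:
    """Hypothetical helper: return the index of the first line break at or
    after pos, or len(text) if there is none."""
    pos = min(max(pos, 0), len(text))
    # Check one character at a time, as a substr(pos, 1) test would;
    # only "\n" counts as a stopping point, not spaces.
    while pos < len(text) and text[pos] != "\n":
        pos += 1
    return pos
```

For instance, with text = "First para here.\nSecond one.", advance_to_paragraph_end(text, 5) returns 16, so text[:16] is the whole first paragraph.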
It’s worth noting that String.wordList() does something different from String.paragraphList(). The former returns the nouns in the source string; the latter returns a List object holding a string for each discrete paragraph of the source string.
On second thought, I quite like @mwra’s solution! Grab each paragraph and copy it to the destination note. When the destination note exceeds 1000 words, make a new destination note. That’s much cleaner and nicer than my approach.
Try:
var:string vSource = $Text;
var:number vChunk = 1000;
var:number vSourceSize = $Text.wordCount;
var:string vNewNote = "";
var:string vNewNotePath = "";
// First new note, titled from the first 3 words of the source text
vNewNote = $Path +"/Split notes/" + vSource.words(3).trim(punctuation);
vNewNotePath = create(vNewNote);
$DisplayedAttributes(vNewNotePath) = "WordCount";
if(vSourceSize < vChunk){
   // Short source: copy it whole into the single new note
   $Text(vNewNotePath) = vSource;
}else{
   vSource.paragraphList.each(aPara){
      // Word count of the note currently being filled
      vSourceSize = $Text(vNewNotePath).wordCount;
      if(vSourceSize > vChunk){
         // Current note is over the limit: start a new one titled from this paragraph
         vNewNote = $Path +"/Split notes/" + aPara.words(3).trim(punctuation);
         vNewNotePath = create(vNewNote);
         $DisplayedAttributes(vNewNotePath) = "WordCount";
         $Text(vNewNotePath) += aPara + "\n";
      }else{
         // Paragraphs arrive 'bare', so add the line break explicitly
         $Text(vNewNotePath) += aPara + "\n";
      }
   }
}
See my test TBX: chunktext1.tbx
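For anyone who wants to check the chunking logic outside Tinderbox, here is a rough Python mirror of the stamp above. It is an approximation, not a port: splitting on "\n" stands in for paragraphList (which may treat empty paragraphs differently), str.split() word counts stand in for wordCount, note creation and titles are left out, and chunk_paragraphs is a hypothetical name. Like the stamp, it checks the size of the current chunk before appending, so a chunk can overshoot the limit by up to one paragraph.

```python
def chunk_paragraphs(text: str, chunk: int = 1000) -> list[str]:
    """Hypothetical mirror of the stamp: append paragraphs to the current
    chunk, starting a new chunk once the current one exceeds `chunk` words."""
    if len(text.split()) < chunk:
        return [text]  # short source: one note holds everything
    chunks = [""]
    for para in text.split("\n"):            # stands in for $Text.paragraphList
        if len(chunks[-1].split()) > chunk:   # stands in for .wordCount
            chunks.append("")                 # i.e. create() a new note
        chunks[-1] += para + "\n"             # paragraphs come back 'bare'
    return chunks
```

With a 2-word limit, chunk_paragraphs("a b\nc d\ne f", 2) returns ["a b\nc d\n", "e f\n"]; the first chunk ends up with 4 words, which shows the check-before-append overshoot.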
Issues for the user to polish:
- Getting the desired title. Tinderbox action code can’t access the algorithm Explode uses to detect the first sentence. Using String.words() includes any contained or trailing punctuation.
- Tinderbox is not quite giving a clean iteration of paragraphs. See the /log note, which iterates the first 3 words of the paragraphs in the $Text of /Source. There are no hidden line breaks/control characters I can see (though just today I found ‘hidden’ (non-printing/non-displaying) soft hyphens in some text I’d pasted into Tinderbox, and they confused the app). In the demo file, new note 4 ought to start with an animal name†.
- The strings in .paragraphList are supplied ‘bare’, so the user has to join them with an explicit line break when concatenating them into text. Sadly, the resulting $Text doesn’t adopt the normal paragraph rules, so the larger spacing between paragraphs is lost. For now, select all the $Text of a new note and manually apply menu Format ▸ Style ▸ Reset Margins.
@eastgate I’m not sure I’ve correctly diagnosed the cause of the edge cases in the last two bullets above.
Another reason to use paragraphs rather than words is that action code has no way to iterate the words in a String; String.wordList() does something quite different.
† The eagle-eyed reader will be correct: in my quick-and-dirty animal names at the start of paragraphs, I omitted ‘G’ and ‘L’. Meh. I added those extra words because I was using cod Latin (‘Lorem ipsum’) text, where the same word sequences repeat too often, so I needed a better way to tell exactly where the breaks were occurring.
Thanks for this, @mwra. I took it for a spin and it works perfectly! What a timesaver. Thank you once again.
Although my brief test showed a few edge cases, I think the code does the greater part of the task. Anyway, apart from folk who just cut’n’paste from AI, one is always going to have to review the splits to see if the title or content needs a tweak.
I forgot to mention that the demo’s stamp hard-wires 1000 (words) as the chunk size at line #2:
var:number vChunk = 1000;
but you could easily put that value in a Displayed Attribute of a Config note (i.e. essentially as a global variable) if you do this a lot and want to use different chunk sizes each time you apply the stamp.