Extract text with particular formatting

Yes that should work well with bold!

However, I’m not getting bold mark-up.

Here is a sample TB document showing the problem on my end.

Sample no bold.tbx (83.3 KB)

All I did was…

  1. File > New (no special “starter” document or anything).
  2. Added one note.
  3. Bolded some text in it.
  4. Chose File > Built-In Templates > HTML.
  5. Turned on Text Pane Selector in the Window menu.

Here is your file for me:

A thought: I’m still on macOS 10.14.6. If the file’s style attribute settings are correct (they are in your text file), I wonder if this is an OS-related issue. I assume Tinderbox is parsing the RTF internally for use of a bold variant of a typeface and using that for bold sections. Perhaps the way Apple frameworks parse this differs between OS versions?

That’s possible. I’m running Big Sur 11.4.

Is the font in a Tinderbox document the same when opened on different machines? I assume this one is the default in Tinderbox 9?

Here’s your same doc’s Doc setting on my 10.14.6 system:

Same except my UK (locale-derived) date styling.

Opening the Fonts dialog via the ‘Text font’ button, I see:

Testing with your specimen export note and the same Fonts dialog, placing the $Text cursor in some bolded text shows it is using ‘Mercury SSm Bold’.

FWIW, italic text uses ‘Mercury SSm Book Italic’.

Anyway, I’ve reported this via a thread on the Backstage forum, referring back to this thread.

OK, great. FWIW, I opened the same document in Tinderbox 8.9.2, and the bold isn’t marked up there either in the HTML pane.

Meanwhile, italic seems to work reliably for simple styling and extraction without getting involved in manual text markup codes, etc…

Thank you for this. I see how to use regex to populate an attribute with “the juicy bit” (text between my special markers). But I’m only getting the first juicy bit. What if there are multiple juicy bits, each set off with the same markers?

I assume you are using String.contains() or String.icontains(); these only pass back the offset of the first match.

I’d suggest the AppleScript approach is probably better.

There is no simple way of asking Tinderbox to pass you all instances of sections of $Text between two markers. For a start, you actually mean between every odd- and even-numbered instance of the marks. IOW, text between markers 1 & 2, 3 & 4, etc., but not the text between markers 2 & 3.
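The pairing rule described above can be illustrated outside Tinderbox. This is a minimal Python sketch, not Tinderbox action code; the function name `excerpts_between_same_marker` and the sample strings are made up for illustration. It splits the text on the marker and keeps every second chunk, which corresponds to the text between markers 1 & 2, 3 & 4, and so on.

```python
def excerpts_between_same_marker(text: str, marker: str) -> list[str]:
    """Return the text between markers 1 & 2, 3 & 4, ... (hypothetical helper)."""
    chunks = text.split(marker)
    # chunks[0] precedes marker 1, chunks[1] lies between markers 1 and 2,
    # chunks[2] lies between markers 2 and 3 (not wanted), and so on.
    # An even chunk count means an odd number of markers, so the last
    # chunk follows an unpaired marker and is dropped.
    if len(chunks) % 2 == 0:
        chunks = chunks[:-1]
    return chunks[1::2]


print(excerpts_between_same_marker("a #one# b #two# c", "#"))
```

An unpaired trailing marker is simply ignored here; whether that is the right behaviour depends on how the notes were annotated.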

At this point I’d re-think my annotation strategy, at least for future annotation, as you’re essentially relying on a non-existent feature.

Or, perhaps not-yet. (I think this idea of extracting highlighted parts of notes makes sense.)

@gleick What if there are multiple juicy bits, each set off with the same markers?

It turns out that with runCommand and egrep Tinderbox can quite easily extract multiple instances of text between two markers.

Here is a demo file:

Excerpts demo.tbx (92.1 KB)

The action code in the stamps is like this:

// specify starting and ending markers -- must be different //
var startmarker="<i>";var endmarker="</i>";
// put marked up Tinderbox export string into a variable //
var str=exportedString(this,$HTMLExportTemplate);
// use regex in egrep to extract matches -- each will appear on a separate line //
$Text=$Text+"EXCERPTS \n"+runCommand("egrep -o " + "'"+ startmarker +".+"+endmarker+"'",str);
// remove starting and ending markers -- do longer one first //
$Text=$Text.replace(endmarker,"").replace(startmarker,"");

Select the container folder containing marked up notes and run a stamp. The excerpts from the children notes will be placed in the text of the container note. The stamps can also be run on an individual note, in which case the excerpts will be appended to the existing text of that note.

startmarker and endmarker can be changed as needed. In the demo file I’ve tried italics, bold, and arbitrary starting marker of && ending marker of &&&. I find it much easier to select text and hit command-i (or command-b) than insert special markers.

This is more concise than the AppleScript upthread. But it was much fussier to debug. Plus AppleScript can be placed in the Script Menu and easily reused with other Tinderbox files without messing around with copying stamps, etc. AppleScript deserves more respect around here. :grinning:

Both approaches require the built-in HTML template to be present (File > Built-in Templates > HTML).
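One caveat with a pattern such as `.+` between the markers: egrep returns the longest match, so two marked passages on the same line should come back as a single match running from the first start marker to the last end marker. The sketch below reproduces that behaviour in Python (it is not Tinderbox action code) and shows a non-greedy alternative. The function `extract_excerpts` and the sample line are invented for illustration.

```python
import re


def extract_excerpts(html: str, start: str, end: str, greedy: bool = False) -> list[str]:
    """Return the text found between start and end markers (hypothetical helper)."""
    # ".+" grabs as much as possible; ".+?" stops at the first end marker.
    quantifier = ".+" if greedy else ".+?"
    pattern = re.escape(start) + "(" + quantifier + ")" + re.escape(end)
    # "." does not match newlines by default, so matches stay within one line,
    # much like the line-by-line matching of egrep.
    return re.findall(pattern, html)


line = "plain <i>first</i> middle <i>second</i> tail"
print(extract_excerpts(line, "<i>", "</i>", greedy=True))
# ['first</i> middle <i>second']
print(extract_excerpts(line, "<i>", "</i>"))
# ['first', 'second']
```

If each marked passage sits on its own line, the greedy and non-greedy forms give the same result.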

And here is a cleaned up AppleScript that does the same thing, except it places the result on the clipboard instead of in a note:

-- specify the html tags that match the styling
set startingMarker to "<i>"
set endingMarker to "</i>"
-- for bold use <b> and </b>
-- can also use arbitrary markers unrelated to styling
-- startingMarker and endingMarker must be different from each other

-- get the Tinderbox export html
tell front document of application "Tinderbox 9"
	if not (exists selection 1) then error "Select the container and run again"
	tell selection 1 to set htmlStr to evaluate it with "exportedString(this,$HTMLExportTemplate)"
end tell

-- "chunk" the text to isolate the parts enclosed by the markers
set text item delimiters to {startingMarker, endingMarker}
set textItems to text items of htmlStr

-- gather the even items (those are the ones enclosed in the markers) into an AppleScript list
set extractedItems to {}
repeat with i from 2 to length of textItems by 2
	set end of extractedItems to item i of textItems
end repeat

-- convert the AppleScript list to text and place on clipboard
set text item delimiters to return
set theExcerpts to extractedItems as text
set the clipboard to theExcerpts
return theExcerpts -- to view in results panel

You are amazing. This is SOOOO cool. :slight_smile:

Problem: Collect all the special passages from multiple notes, where the special passages (no longer than a paragraph) have been set off by a marker, in this case #.

This seems to work:

I create an agent called Juicy Bits, which collects all notes containing the marker. This agent has the following Rule:

$MyList=collect(children,$Text.split("\n+"));
$MyList=$MyList.collect_if(x,x.contains("#(.+)#"),$1);
$Text=$MyList.format("\n")

So the Text of Juicy Bits displays all the special passages. (There might be some redundancy or cruft here. I’m a newbie finding my way.)
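For readers who want to see the logic of that rule spelled out, here is a rough Python analogue. It is only a sketch of the same three steps (split each note into paragraphs, keep the paragraphs that contain a marked passage, join the captured passages) and makes no claim about how Tinderbox evaluates the rule internally. The function name `juicy_bits` and the sample notes are hypothetical.

```python
import re


def juicy_bits(note_texts: list[str], marker: str = "#") -> str:
    """Collect one marked passage per paragraph from several notes (hypothetical helper)."""
    pattern = re.compile(re.escape(marker) + "(.+)" + re.escape(marker))

    # Step 1: split every note's text into paragraphs on runs of newlines.
    paragraphs = []
    for text in note_texts:
        paragraphs.extend(p for p in re.split(r"\n+", text) if p)

    # Step 2: keep only paragraphs with a marked passage, capturing its content.
    bits = []
    for paragraph in paragraphs:
        match = pattern.search(paragraph)
        if match:
            bits.append(match.group(1))

    # Step 3: join the captured passages, one per line.
    return "\n".join(bits)


notes = ["intro\n\n#first bit#\nmore", "x #second# y"]
print(juicy_bits(notes))
```

Like the rule, this captures one passage per paragraph. Because `(.+)` is greedy, a paragraph holding two marked passages would yield a single capture spanning both.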

I’m now convinced that there ought to be a way to collect highlighted passages. This will take a couple of weeks. (I’m not yet sure whether we really need to distinguish “red-highlighter” and “yellow-highlighter”: opinions on this are welcome.)

I vote for red and yellow at a minimum, as the two most commonly used colors.

Tom

That is exactly what the action code example and the AppleScript above do. Easy to use. Wonder if you’ve had a chance to try them?

I find it much easier to just apply a style via command-b or command-i via the keyboard rather than type in markers, but it will work either way (though the end marker should be different from the start marker).

@eastgate Will there be a way to apply colors to text as easily as selecting and typing command-b or command-i?

There are shortcuts for coloured highlights (Style menu), but I believe coloured text still requires the OS Fonts palette. It’s worth looking at the app’s existing shortcuts as many choices are already in use.

Colored text has shortcuts ⌘^1-⌘^5 (Format ▸ Style ▸ Red, etc)

Colored highlighting has shortcut ⌘⇧-Y for the common case (yellow) and can have user-assigned shortcuts if you like.

Thanks for that! I haven’t ventured much into those Format submenus, so I never reached highlighting and forgot it was there, I think partly because I once tried highlighting and found it illegible under a dark color scheme.

On my machine the built-in shortcuts for text color work well, now that I remember they are there.

I see there is now a $String.following, which grabs all occurrences.

Maybe $String.between would be what we are talking about here?

The latest backstage release has native support for extracting highlighted passages — not precisely the same as bold text, but pretty close. @gleick : would love to know what you’re working on, if you’re at a stage where you can talk about it.

Thank you! Highlighting seems more appropriate than bold, for this purpose.