Suggestions on Extracting Highlights + Bold Text + Italic Text from $Text of a note

This common task should be easy for me but I feel stuck.

Simply
I have the $Text of a long note which I read and annotated. Over multiple readings I have marked it up with highlights (2 colors), italic text and bold text.

Now I would like to create a summary using my Markup:
Highlights as the headings
Italics and bold as points under the headings

Extracting Highlights_bold_italic from Text.tbx (85.9 KB)

Tom

Have you looked at String.highlights?

Italicised and bolded text aren’t findable in the same manner. But as you only use 2 colours at present, you could easily adapt your method to use 2 of the 3 highlight colours you don’t currently use, and then you can extract the 4 different types of ‘highlight’ you are applying.

Even then, String.highlights returns a list of highlights of all colours or of a given colour. What you are presuming is a $Text stream output processor, which is a feature request I’ve not previously seen. IOW, the $Text is rendered for export (or the preview window) depending on where highlights are detected in $Text. At present, I don’t think that is a tractable approach in Tinderbox so, today, I’d try an approach more closely aligned to what queries can find.

IOW, I read that you want a putative stream processor to ‘read’ the $Text and, on encountering a highlight, set the export code stream to enclose the highlight with an HTML heading level (dependent on the highlight colour?). Then, if italic or bolded (or both?) runs are encountered, export those $Text substrings as bulleted items.

But what defines ‘under’? For this sort of logic to work, you need to address the edge cases. What happens if text occurs between the end of a highlighted substring and a bold/italic section? Your logic doesn’t account for that. Why are bold and italic both exporting to the same mark-up when they have been marked differently at source? And so on.

In terms of what you can do today, consider using auto-headings instead of (as well as?) coloured highlights.

HTH :slight_smile:

@TomD, as @wmra points out, the “rtf” layer can be a challenge to parse. Yes, it is visually pleasing, but it is difficult to operate against. There are a couple of approaches here. I think a more long-term and sustainable process would be to use markdown notations to block out your highlighted text. Tinderbox could then be instructed to easily parse these notations. You could also use the “highlighter” functions to color-code your notations, which can be easily reversed when you want to.

To help with the notations I would use Text Expander and a snippet that would take the copied text on the clipboard and replace it with the same text surrounded by the desired notation. If this does not make sense, I can explain later.
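
To illustrate why plain-text notations are easier to parse than styling, here is a minimal AppleScript sketch. It assumes highlights are wrapped in `==` and bold passages in `**`; those markers, the sample text and the handler name `extractBetween` are purely illustrative assumptions, not anything Tinderbox prescribes.

```applescript
-- Hypothetical sample: "==" marks a highlight, "**" marks bold (assumed notation)
set sampleText to "Intro ==first key idea== filler **a supporting point** more filler ==second key idea=="

set highlightList to extractBetween("==", sampleText)
set boldList to extractBetween("**", sampleText)
return {highlightList, boldList}

-- Return the pieces of sourceText enclosed by pairs of the same marker
on extractBetween(marker, sourceText)
	set savedDelims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to marker
	set chunks to text items of sourceText
	set AppleScript's text item delimiters to savedDelims
	set found to {}
	-- even-numbered chunks sit between an opening and a closing marker
	repeat with i from 2 to (count of chunks) by 2
		-- skip a final chunk that has an opening marker but no closing one
		if i < (count of chunks) then set end of found to item i of chunks
	end repeat
	return found
end extractBetween
```

Run in Script Editor, the sample returns `{{"first key idea", "second key idea"}, {"a supporting point"}}`. The same handler would work on a note's $Text once it has been fetched into an AppleScript string.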

Part of the challenge is the desire to map pretty-looking $Text to export using means that only appear in the visual render.

The starting point for export planning needs to be: can I identify ‘this’ style using action (i.e. query) code? If you can’t, the export plan is dead in the water. As we can’t identify a run of bolded text, the original plan fails, in the terms of the plan, at that point. That’s not to disparage the plan or the need, but simply to avoid wasting time trying things we can predict won’t work.

Question: Would there be a way to combine elements of String.highlights + this script from Sumner Gerard from here: Extract text with particular formatting - #8 by sumnerg

Tom

How long is your text?

How many of these texts do you have?

Hi Mark

I have hundreds of these texts. The $Text are not long, but my normal process for years in reviewing my reading has been to highlight with different colors in my initial review, usually yellow, red, green and blue. Then, when I re-review my note (sometime later), I will bold the text, and if I review the note a third time, I will italicize the text. For years this has worked well, giving me something like a cheap man’s version of a timeline: it shows in which review (initial, first or second) a passage made an impression on me and how important it felt.

Up to now, I have been extracting these summaries manually, but thought it was time to automate. I would think others use a mixture of highlighting and bold, italics as well during their reviews, which is what prompted me to ask the original question.

Tom

As Tinderbox has AppleScript support and AppleScript I believe can handle finding sections of styled text, it might be worth checking some AppleScript forums for suitable code.

However, the problem is that it seems necessary to retain the in-text order of the styled sections. If font size/face are consistent, then it should probably be possible to find and delete all non-styled text, leaving an original-ordered list of styled sections. These could be written into a new note with appropriate added mark-up if the report structure is not realisable via export code/HTML alone.

Tom, here’s a bare-bones example that may do close to what you want, or at least save some of the work.

tell front document of application "Tinderbox 9"
	if not (exists selection 1) then error "Select a note and run again"
	tell selection 1
		-- extract the highlighted strings from the note using TB action code
		set yellowHLStrings to evaluate it with "$Text.highlights(\"yellow\");"
		-- get html from which bold and italic markup strings will be extracted
		set htmlStr to evaluate it with "exportedString(this,$HTMLExportTemplate)"
	end tell
	set italicStrings to (my getStyledExtractsFrom:htmlStr startingWith:"<i>" endingWith:"</i>")
	set boldStrings to (my getStyledExtractsFrom:htmlStr startingWith:"<b>" endingWith:"</b>")
end tell

-- assemble an output string
set outputStr to yellowHLStrings & return & italicStrings & return & boldStrings

-- remove/replace markup as desired to create a simple "report"
set outputStr to my replaceThese:{"<p>", "</p>", "<i>", "</i>", "<b>", "</b>"} withThese:{"", "", return & "Italicised:", "", return & "Bolded:", ""} inText:outputStr

-- send to clipboard so can be pasted wherever
set the clipboard to outputStr
return outputStr -- to view in results panel

-- handlers
on getStyledExtractsFrom:fromText startingWith:startingMarker endingWith:endingMarker
	set text item delimiters to {startingMarker, endingMarker}
	-- "chunk" the text to isolate the parts enclosed by the markers
	set textItems to text items of fromText
	-- gather the even items (the enclosed bits) into an AppleScript list
	set extractedItems to {}
	repeat with i from 2 to length of textItems by 2
		set end of extractedItems to startingMarker & item i of textItems & endingMarker
	end repeat
	return extractedItems
end getStyledExtractsFrom:startingWith:endingWith:

on replaceThese:aListOfSearchValues withThese:aListOfReplacements inText:someText
	set {s, r, t} to {aListOfSearchValues as list, aListOfReplacements as list, someText as text}
	set tid to AppleScript's text item delimiters
	repeat with i from 1 to count s
		set text item delimiters to s's item i
		set t to t's text items
		set AppleScript's text item delimiters to r's item i
		set t to t as text
	end repeat
	set text item delimiters to tid
	return t
end replaceThese:withThese:inText:

I tried to use the generalized form of `$Text.highlights` (without specifying a color) but that doesn't seem to work. So you may need to add statements for colors other than yellow if you are using them.
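
As a rough sketch of what "adding statements for other colors" could look like, the loop below asks for each color in turn using the same `evaluate … with` call as the script above. The color names are assumptions taken from this thread and the variable names are only illustrative; I have not tested this beyond the pattern already shown, so treat it as a starting point.

```applescript
-- Colour names are assumptions taken from this thread; adjust to the ones you use
set highlightColours to {"yellow", "red", "green", "blue"}
set highlightReport to ""

tell front document of application "Tinderbox 9"
	if not (exists selection 1) then error "Select a note and run again"
	tell selection 1
		repeat with aColour in highlightColours
			set colourName to aColour as text
			-- same action-code call as in the script above, with the colour name swapped in
			set hlStrings to evaluate it with "$Text.highlights(\"" & colourName & "\");"
			if hlStrings is not "" then
				set highlightReport to highlightReport & colourName & ":" & return & hlStrings & return & return
			end if
		end repeat
	end tell
end tell

return highlightReport -- to view in results panel
```

Colors with no highlights in the selected note are simply skipped, so the list can safely include colors you only use occasionally.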

Thanks for that. I spent a couple of hours dusting off my rusty AppleScript this AM and mainly found things I couldn’t do. I like your clever way to get at the bold and italic strings.

Sumner, you and Mark x2, Michael and everyone else on this forum are my heroes!
Thank you for your great work and generosity.

Tom

Hi Sumner
Three questions about the included sample file.

TestingExtractingMetaFromText.tbx (82.6 KB)

  1. How do I add a new line separator between highlights, bold and italic formatting?
  2. I assume I would add a similar section for red, blue and green highlights?
  3. Is there a way to add sections to the script… Example below…

“Italic (top layer)”

“Bold Section” (2nd from top)

Red
Green
Yellow

“Highlights section” (at the bottom) lowest level

Thanks again for your assistance
Tom

#1. The ‘return’ in the code above is AppleScript inserting a line break into the text. So a return is like a \n in action code. If you need two such to give you enough separation I’d try:

...& return & return & ...

@TomD, this should be closer to what you want, organized so that it should be easy for you to tweak and reorder the output.

However, it doesn’t know that you probably intend for “Reaching out…” to come right after the italicized “develop a plan…”

It simply collects items for each “section” and then presents the sections in the order you specified.

-- settings for spacing in "report" - adjust as needed
set sectionSeparator to return & return & return -- lines between sections
set listDelimiter to return & return -- used to separate items in a list when more than one item in a section
set spaceAfterHeader to return -- spacing after each section label

tell front document of application "Tinderbox 9"
	if not (exists selection 1) then error "Select a note and run again"
	tell selection 1
		-- extract the highlighted strings from the note using TB action code
		set text item delimiters of AppleScript to ";" -- TB uses ; to separate list items
		set redHLStrings to text items of (evaluate it with "$Text.highlights(\"red\");")
		set greenHLStrings to text items of (evaluate it with "$Text.highlights(\"green\");")
		set yellowHLStrings to text items of (evaluate it with "$Text.highlights(\"yellow\");")
		-- get html from which bold and italic markup strings will be extracted
		set htmlStr to evaluate it with "exportedString(this,$HTMLExportTemplate)"
	end tell
end tell

set italicStrings to (my getStyledExtractsFrom:htmlStr startingWith:"<i>" endingWith:"</i>")
set boldStrings to (my getStyledExtractsFrom:htmlStr startingWith:"<b>" endingWith:"</b>")

-- assemble the output string "report"
set outputStr to ""
set text item delimiters to listDelimiter
if length of italicStrings > 0 then
	set outputStr to outputStr & "Italic:" & spaceAfterHeader -- delete this line to remove section label
	set outputStr to (outputStr & italicStrings as text) & sectionSeparator
end if
if length of boldStrings > 0 then
	set outputStr to outputStr & "Bold:" & spaceAfterHeader -- delete this line to remove section label
	set outputStr to (outputStr & boldStrings as text) & sectionSeparator
end if
if length of redHLStrings > 0 then
	set outputStr to outputStr & "Red:" & spaceAfterHeader -- delete this line to remove section label
	set outputStr to (outputStr & redHLStrings as text) & sectionSeparator
end if
if length of greenHLStrings > 0 then
	set outputStr to outputStr & "Green:" & spaceAfterHeader -- delete this line to remove section label
	set outputStr to (outputStr & greenHLStrings as text) & sectionSeparator
end if
if length of yellowHLStrings > 0 then
	set outputStr to outputStr & "Yellow:" & spaceAfterHeader -- delete this line to remove section label
	set outputStr to (outputStr & yellowHLStrings as text) & sectionSeparator
end if

-- remove/replace markup as desired
set outputStr to my replaceThese:{"<p>", "</p>", "<i>", "</i>", "<b>", "</b>"} withThese:{"", "", "", "", "", ""} inText:outputStr

set the clipboard to outputStr -- send to clipboard so can be pasted wherever
return outputStr -- to view in results panel

-- handlers
on getStyledExtractsFrom:fromText startingWith:startingMarker endingWith:endingMarker
	set text item delimiters to {startingMarker, endingMarker}
	-- "chunk" the text to isolate the parts enclosed by the markers
	set textItems to text items of fromText
	-- gather the even items (the enclosed bits) into an AppleScript list
	set extractedItems to {}
	repeat with i from 2 to length of textItems by 2
		set end of extractedItems to startingMarker & item i of textItems & endingMarker
	end repeat
	return extractedItems
end getStyledExtractsFrom:startingWith:endingWith:

on replaceThese:aListOfSearchValues withThese:aListOfReplacements inText:someText
	set {s, r, t} to {aListOfSearchValues as list, aListOfReplacements as list, someText as text}
	set tid to text item delimiters of AppleScript
	repeat with i from 1 to count s
		set text item delimiters to item i of s
		set t to text items of t
		set AppleScript's text item delimiters to item i of r
		set t to t as text
	end repeat
	set text item delimiters of AppleScript to tid
	return t
end replaceThese:withThese:inText:
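
If the original reading order of the bold and italic passages matters, one possible variation is to walk the exported HTML from start to finish instead of collecting each style separately. The handler below is only a sketch under assumptions: it expects `<b>` and `<i>` tags like those matched above, it does not handle a bold run nested inside an italic one (or vice versa), and the sample string and the name `styledRunsInOrder` are made up for illustration.

```applescript
-- Hypothetical sample standing in for the exported HTML of a note
set sampleHTML to "<p>Plain <b>first bold</b> text <i>an italic run</i> and <b>second bold</b>.</p>"
return styledRunsInOrder(sampleHTML)

-- Collect bold and italic runs in the order they occur in htmlStr
on styledRunsInOrder(htmlStr)
	set foundRuns to {}
	set remaining to htmlStr
	repeat
		set boldPos to offset of "<b>" in remaining
		set italicPos to offset of "<i>" in remaining
		if boldPos = 0 and italicPos = 0 then exit repeat
		-- pick whichever opening tag comes first in the remaining text
		if italicPos = 0 or (boldPos > 0 and boldPos < italicPos) then
			set {runLabel, closeTag, openPos} to {"Bold", "</b>", boldPos}
		else
			set {runLabel, closeTag, openPos} to {"Italic", "</i>", italicPos}
		end if
		-- drop everything up to and including the 3-character opening tag
		if openPos + 3 > (length of remaining) then exit repeat
		set remaining to text (openPos + 3) thru -1 of remaining
		set closePos to offset of closeTag in remaining
		if closePos = 0 then exit repeat -- unclosed tag: stop scanning
		if closePos > 1 then set end of foundRuns to runLabel & ": " & (text 1 thru (closePos - 1) of remaining)
		-- continue after the 4-character closing tag
		if closePos + 4 > (length of remaining) then exit repeat
		set remaining to text (closePos + 4) thru -1 of remaining
	end repeat
	return foundRuns
end styledRunsInOrder
```

For the sample string this returns `{"Bold: first bold", "Italic: an italic run", "Bold: second bold"}`. To use it on a real note, pass in the `htmlStr` obtained in the script above. Highlights are not covered, since they come from the separate `$Text.highlights` call rather than from the HTML.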

Many thanks Sumner for your help and expertise! That was awesome! Works like a charm.

Tom

You’ll have to show this to me next time we chat.


Instead of doing what I should be doing with TurboTax, I decided to try out the Shortcuts app and see if I could get similar output. It takes some getting used to. Comments, for example, seem to work best after rather than before the relevant actions. But not too bad.

Shortcuts is supposed to be the future of automation on the Mac (and iOS), while AppleScript … well, it refuses to die.

Anyway, Shortcuts can do pretty much the same thing with only a sprinkling of AppleScript one-liners mixed in with its native actions.

And shortcuts are easy to share. Clicking the link will import the shortcut into the Shortcuts app, where it can be run, “pinned” to the menu bar, or added to the Dock.

‘Extract Tinderbox markups highlights’ - Shortcut for Shortcuts app


Thx. Where is this link supposed to go? It is leading me here: Shortcuts on the App Store. This app only seems to be available for iOS and iPad.

@TomD Thanks for the call and explanation. Makes a lot of sense.

That statement was once true. But it no longer is. :grinning: Shortcuts is built into Monterey.

One of the hits from a search for “Mac Shortcuts app” is this.

Shortcuts On Mac: Apple Is Updating Automation On Mac

If you are running macOS 12.3 Monterey, then the link in my post should offer this: