Is there a way to automatically parse anchors and URLs from pasted HTML in $Text? When I look at the HTML I see the text, but not the URLs. The URLs are clearly there because you can click on them. I would like to be able to extract anchor and URL pairs and then use them as I see fit.
No, these don't exist as Tinderbox web links and are only in the RTF layer as part of the "Smart URL" detection process.
When Smart URLs first became possible, via the Apple frameworks used in v6+, the intention was that such links would be adopted as true Tinderbox (web) links, but there were issues that are as yet unresolved.
I suspect AppleScript (not tried, and my expertise is rather rusty) probably ought to be able to do this. From a quick Google, I found this: automator - In a Service, how to get a URL from rich text? - Ask Different. One solution also references TextSoap which does this (albeit via an internal AppleScript!).
I was thinking textutil in runCommand should easily convert $Text to HTML, from which the anchor and link could be extracted, something like this:
$MyString = runCommand("textutil -convert html -stdin -stdout",$Text)
Alas, that gives html, but drops the url and leaves just the anchor.
If you select the text in the note, copy to the clipboard and run the following AppleScript in Script Editor then you do get the full html, revealing the url.
set the clipboard to (the clipboard as «class RTF »)
set theHTML to do shell script "pbpaste -Prefer rtf | textutil -convert html -stdin -stdout"
-- the result is HTML from which the url and anchor can be extracted
Thanks! I tried this, but I am getting an error: error "Can't make some data into the expected type." number -1700 to item. Any idea what might be wrong?
I'm not sure. With the AppleScript, one necessary (unwanted) step, of course, is making sure to select the rich text with the link and then typing command-c to copy to the clipboard before running the script. In my simple test (a short note with a link formed by copy-pasting from the Eastgate site) the results were as expected: HTML like this:
<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01//EN\" \"http://www.w3.org/TR/html4/strict.dtd\">
<html>
<head>
<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">
<meta http-equiv=\"Content-Style-Type\" content=\"text/css\">
<title></title>
<meta name=\"Generator\" content=\"Cocoa HTML Writer\">
<meta name=\"CocoaVersion\" content=\"2022.3\">
<style type=\"text/css\">
p.p1 {margin: 0.0px 0.0px 0.0px 0.0px; font: 16.0px Helvetica}
span.s1 {font-kerning: none}
span.s2 {font-kerning: none; color: #fb5a08}
</style>
</head>
<body>
<p class=\"p1\"><span class=\"s1\">Whether you're plotting your next thriller or writing your dissertation, designing a course, managing a legal practice, coordinating a campaign or planning a season of orchestral concerts, <a href=\"http://www.eastgate.com/Tinderbox/updates/Tinderbox88.html\"><span class=\"s2\">Tinderbox 8.9</span></a> will be your personal information assistant</span></p>
</body>
</html>
You can see the anchor and url in that.
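Once HTML like the above is in hand, pulling out the anchor and URL pairs is mechanical. Here is a minimal sketch in Python using only the standard library. The names AnchorCollector and extract_links are made up for illustration, and the sample string is a shortened copy of the paragraph above with the \" escapes removed (those come from how Script Editor displays the result string). It does not handle nested or unclosed anchors.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect (anchor text, href) pairs from <a href="..."> elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.pairs = []      # finished (anchor, url) tuples
        self._href = None    # href of the <a> currently open, if any
        self._chunks = []    # text pieces seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._chunks = []

    def handle_data(self, data):
        # Only keep text while we are inside a link
        if self._href is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.pairs.append(("".join(self._chunks).strip(), self._href))
            self._href = None


def extract_links(html):
    """Return a list of (anchor, url) tuples found in the HTML string."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    return parser.pairs


# Shortened sample based on the HTML shown above
sample = ('<p class="p1"><span class="s1">planning a season of orchestral concerts, '
          '<a href="http://www.eastgate.com/Tinderbox/updates/Tinderbox88.html">'
          '<span class="s2">Tinderbox 8.9</span></a> will be your personal '
          'information assistant</span></p>')

for anchor, url in extract_links(sample):
    print(f"{anchor}::{url}")
```

The output uses the ANCHOR::URL form, one pair per line, so it could be pasted into a note and split from there.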
The action code above yields this html (unfortunately there is no url to grab; not sure why):
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta http-equiv="Content-Style-Type" content="text/css">
<title></title>
<meta name="Generator" content="Cocoa HTML Writer">
<meta name="CocoaVersion" content="2022.3">
<style type="text/css">
p.p1 {margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px 'Helvetica Light'}
p.p2 {margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px 'Helvetica Light'; min-height: 14.0px}
</style>
</head>
<body>
<p class="p1">Whether you're plotting your next thriller or writing your dissertation, designing a course, managing a legal practice, coordinating a campaign or planning a season of orchestral concerts, Tinderbox 8.9 will be your personal information assistant..</p>
<p class="p2"><br></p>
</body>
</html>
FWIW, experimenting with Tinderbox's AppleScript methods shows they do not expose the RTF layer of $Text, only the plain text. But the desired link info exists only in the RTF version of $Text. I imagine other AppleScript approaches might be able to drive the UI to essentially scrape the RTF from the $Text area of the text pane and use that.
It's tough working with RTF in plain AppleScript. Generally the only practical way (for all but the most expert) is to go through the clipboard.
FWIW, here's a quick and dirty way I found to extract URLs from a note using Automator. There are two separate images here. I selected the text in the note (top) and copied to the clipboard with command-c before running the workflow (bottom).
The shell script action has this:
osascript -e 'the clipboard as «class RTF »' | perl -ne 'print chr foreach unpack("C*", pack("H*",substr($_,11,-3)))' | textutil -stdin -stdout -convert html -format rtf
It seems that it should be possible to put the perl and textutil parts of this into the first argument of runCommand and pass the RTF via STDIN with $Text in the second argument. But I couldn't figure out how to escape the perl so that it wouldn't throw an error.
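For anyone puzzling over what the perl step does: as far as I can tell, osascript prints raw clipboard data as a literal of the form «data RTF 7B5C...», and the one-liner skips the 11 leading bytes and 3 trailing bytes, then turns the hex digits back into bytes. Here is a sketch of the same decoding in Python, which could live in its own file so that nothing needs escaping on the command line. The function name decode_osascript_data is made up, the sample literal is hand-built rather than captured from a real clipboard, and I have not tried calling this from runCommand.

```python
import re

# Matches the literal that osascript prints for raw data, for example
#   «data RTF 7B5C727466317D»
# that is: "«data ", a four-character type code, hex digits, then "»".
# \u00ab and \u00bb are the « and » characters.
DATA_LITERAL = re.compile("\u00abdata (.{4})([0-9A-Fa-f]*)\u00bb")


def decode_osascript_data(printed):
    """Return (type_code, raw_bytes) decoded from a «data ...» literal."""
    match = DATA_LITERAL.search(printed)
    if match is None:
        raise ValueError("no data literal found in the input")
    type_code, hex_digits = match.groups()
    return type_code, bytes.fromhex(hex_digits)


# Hand-built sample: the hex digits spell out {\rtf1}
printed = "\u00abdata RTF 7B5C727466317D\u00bb\n"
type_code, payload = decode_osascript_data(printed)
print(repr(type_code), payload)
```

Matching on the literal's shape rather than on fixed byte offsets is a design choice: it keeps working if there is extra whitespace around the literal.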
Thanks @sumnerg. One more question. Do you know if it would be possible to parse the result and create a key pair of the anchor and the URL, e.g. ANCHOR::URL?
The Automator action is quick to implement. But, alas, there seems to be no easy way to get the anchor text.
However, to my surprise, I managed to wrangle AppleScript to do the job by adapting scripts shared online.
Script here
-- adapted from https://www.macscripter.net/viewtopic.php?pid=182034, https://macscripter.net/viewtopic.php?id=46657
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions
-- get any rich texts off the clipboard
set pb to current application's NSPasteboard's generalPasteboard()
set theRichTexts to (pb's readObjectsForClasses:{current application's NSAttributedString} options:(missing value)) as list
if (count of theRichTexts) = 0 then
display dialog "No rich text found on the clipboard" buttons {"OK"} default button 1
error number -128
end if
set theRichText to (item 1 of theRichTexts)
-- get length so we can start from the end
set start to (theRichText's |length|()) - 1
-- make plain string copy to work on
set theString to theRichText's |string|()'s mutableCopy()
set output to return
repeat while start ≥ 0
set {aURL, theRange} to theRichText's attribute:(current application's NSLinkAttributeName) atIndex:start effectiveRange:(reference)
if aURL is not missing value then
-- get linked text
set anchorText to theString's substringWithRange:theRange
if aURL's |scheme|()'s isEqualToString:"mailto" then -- email address
set newLink to aURL's resourceSpecifier()
else if anchorText's containsString:"This Site" then -- resource specifier, remove //
set newLink to aURL's resourceSpecifier()'s substringFromIndex:2
else -- full URL
set newLink to aURL's absoluteString()
end if
set output to ((output & anchorText as text) & "::" & newLink as text) & return
end if
set start to (location of theRange) - 1 -- step to the character just before this range
end repeat
return output -- to view in Script Editor Result pane
And shorter, cleaned up script here
-- adapted fr https://www.macscripter.net/viewtopic.php?pid=182034, https://macscripter.net/viewtopic.php?id=46657
-- copy rich text to clipboard and run
use framework "Foundation"
-- get any rich texts off the clipboard
set pb to current application's NSPasteboard's generalPasteboard()
set theRichTexts to (pb's readObjectsForClasses:{current application's NSAttributedString} options:(missing value)) as list
set theRichText to (item 1 of theRichTexts)
set start to (theRichText's |length|()) - 1 -- will work from end backwards
set theString to theRichText's |string|()'s mutableCopy() -- plain string copy to work on
set output to ""
repeat while start ≥ 0
set {aLink, theRange} to theRichText's attribute:(current application's NSLinkAttributeName) atIndex:start effectiveRange:(reference)
if aLink is not missing value then
set anchorText to theString's substringWithRange:theRange
set urlText to aLink's absoluteString()
set output to (anchorText as text) & "::" & (urlText as text) & return & output
end if
set start to (location of theRange) - 1 -- step to the character just before this range
end repeat
return output -- to view in Script Editor Result pane
Here the output simply goes to the Result pane in the format suggested for copy-pasting. It could, of course, be delimited in other ways and automated to set values of attribute(s) in Tinderbox.
Ok, this works perfectly. Thank you for the education.
Now, to finish this off. Does anyone know how to trigger an AppleScript from TBX? I can use runCommand to copy $Text to the clipboard. I then want the AppleScript to run and paste the results back into $Text, or possibly into a new note (have the AppleScript create a new note). Then, I can run an explode to parse the results.
Ah, so much fun to have later.
If extracting links from multiple notes, selecting them and using File > Export > As Text > RTF > Selected Notes seems to be the way to go. Then run this script and choose the file that Tinderbox creates from the export.
Script here
-- adapted fr https://www.macscripter.net/viewtopic.php?pid=182034, https://macscripter.net/viewtopic.php?id=46657
use framework "Foundation"
use scripting additions
set thePath to POSIX path of (choose file)
set urlPath to current application's NSURL's fileURLWithPath:thePath
set {attString, theError} to current application's NSAttributedString's alloc()'s initWithURL:urlPath options:(missing value) documentAttributes:(missing value) |error|:(reference)
set start to (attString's |length|()) - 1 -- will work from end backwards
set theString to attString's |string|()'s mutableCopy() -- plain string copy to work on
set output to ""
repeat while start ≥ 0
set {aLink, theRange} to attString's attribute:(current application's NSLinkAttributeName) atIndex:start effectiveRange:(reference)
if aLink is not missing value then
set anchorText to theString's substringWithRange:theRange
set urlText to aLink's absoluteString()
set output to (anchorText as text) & "::" & (urlText as text) & return & output
end if
set start to (location of theRange) - 1 -- step to the character just before this range
end repeat
return output -- to view in Script Editor Result pane
I've noticed that the RTF "layer" (or whatever it is called) is somehow "disturbed" in any note where a "wikilink/ziplink/text link" is added. The colors (if any) of the text pasted from the web all shift when one is added and the external links in the text are no longer clickable, ... and of course these scripts can no longer find any hyperlinks. Not sure if that is expected behavior.
Why not enable the scripts menu on the OS menu-bar and use scripts that target the current TBX's selection? Essentially it is like using a stamp, albeit called from outside the app. That seems less hassle than tinkering with runCommand().
It is possible to trigger an AppleScript from within Tinderbox, and even pass an argument, via runCommand using osascript -e, from, say, a stamp. From my (very) old notes:
However, escaping AppleScript for the command line is a daunting task. No single quotes, for example. There are similar problems with the perl one-liner above (which I thought might make it possible to pass rich text to textutil for conversion to HTML, which could then be parsed within Tinderbox).
Now that Tinderbox has external scripting support, making it efficient to get values in and out (with the notable exception of rich text), I suggest just launching a script outside of Tinderbox in the usual ways (the run button in Script Editor, a menu pick after placing the script in the Script menu, or with a keyboard shortcut after placing the script in an Automator Service, a.k.a. Quick Action).
This is perfect!
Would be great if we could set $Text to rich text via AppleScript.
I'd like to e.g. get parts of a PDF as an attributed string and create a note from that.
@eastgate Would this be possible?
Set $Text to rich text and get rich text from $Text into a script.
And not have $Text "disturbed" by addition of a text link (per above) so that links can't be extracted.
I suspect it is complicated, per @eastgate :
Not actually an RTF writing space. But the internal format happens to be stored as RTFD
But it would be nice. Especially the ability to add text links in a note without the existing links "disappearing" from the built-in export to RTF.
We'll take a look!
@Pete This is going in the opposite direction from what you describe, but I have figured out how to get an attributed string from a Tinderbox note via AppleScript (thus retaining formatting and any embedded links). The results are easy to see in Script Debugger.
The script below retrieves the rtfd for the selected note from the tbx XML, decodes it, and converts the RTF part into an attributed string.
Perhaps going the other way (attributed string to Tinderbox) could be done by base64 encoding it in AppleScript and writing that to the rtfd. I suspect one would be living dangerously if one tried to write from a script directly to the XML of a document open in Tinderbox, though, as the script demonstrates, it is not too hard to read the rtfd from the XML file using XQuery. Perhaps @eastgate could consider exposing rtfd to AppleScript as an "attribute" whose value can be read and set.
-- select a Tinderbox note and run
use framework "Foundation"
use scripting additions
tell front document of application "Tinderbox 8"
set strTBX to read (its file as alias) as «class utf8»
set theIDs to my doXQuery("for $i in //item return string($i/@ID)", strTBX)
set encodedRtfds to my doXQuery("for $i in //item return string($i/rtfd)", strTBX)
tell selection 1 -- the selected note
set noteID to value of attribute "ID"
set encodedRtfd to my getValWithKey(noteID as text, theIDs, encodedRtfds) as text
end tell
end tell
set strRtf to my getStringRtfFromEncodedRtfd(encodedRtfd)
set attrStr to makeAttributedStringFromStringRtf(strRtf)
--~~~ handlers/subroutines ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
to doXQuery(strXQuery, strXML)
-- XQuery handler adapted from post by Rob Trew
set {xmlError, xqueryError} to {reference, reference} -- holders to report errors
set {docXML, xmlError} to (current application's NSXMLDocument's alloc()'s ¬
initWithXMLString:strXML options:0 |error|:xmlError) -- parse XML
if xmlError is not missing value then return (localizedDescription of xmlError) as string
set {xs, xqueryError} to (docXML's objectsForXQuery:strXQuery |error|:xqueryError) -- apply XQuery
if xqueryError is not missing value then return (localizedDescription of xqueryError) as string
return xs as list -- values retrieved by the XQuery over the XML
end doXQuery
to getStringRtfFromEncodedRtfd(encodedRtfd)
set decodedStr to do shell script " echo '" & encodedRtfd & "' | base64 -d"
-- Remove visible and invisible characters outside the surrounding {}
set startPos to offset of "{" in decodedStr
set endPos to offset of "}" in (reverse of characters of decodedStr as string)
set strRtf to text startPos thru -endPos of decodedStr
end getStringRtfFromEncodedRtfd
to makeAttributedStringFromStringRtf(strRtf)
set ca to current application
set s to ca's NSString's stringWithString:strRtf -- the string
set d to (s)'s dataUsingEncoding:(ca's NSUTF8StringEncoding) -- the data
set attStr to ca's NSAttributedString's alloc()'s initWithRTF:d documentAttributes:(missing value)
if attStr is missing value then error "String not recognized as RTF"
return attStr
end makeAttributedStringFromStringRtf
on getValWithKey(aKey, aKeysList, aValuesList)
set ca to current application
set theDict to ca's NSDictionary's dictionaryWithObjects:aValuesList forKeys:aKeysList
set theResult to theDict's objectForKey:aKey
set tempArray to ca's NSArray's arrayWithArray:{theResult}
return item 1 of (tempArray as list)
end getValWithKey
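The decoding step in the script above (base64-decode the rtfd payload, then keep only what sits between the first "{" and the last "}") can be sketched in Python as well. This is an illustration, not a tested reader for real Tinderbox files: it relies on the same assumptions as the AppleScript handler, namely that the payload is base64 and that the RTF is the outermost brace group. The name rtf_from_encoded_rtfd and the sample payload are made up.

```python
import base64


def rtf_from_encoded_rtfd(encoded):
    """Base64-decode the payload and keep only the outermost {...} group."""
    decoded = base64.b64decode(encoded).decode("utf-8", errors="replace")
    start = decoded.index("{")    # first opening brace
    end = decoded.rindex("}")     # last closing brace
    return decoded[start:end + 1]


# Round trip with a made-up payload: RTF wrapped in stray bytes
wrapped = b"\x00junk" + b"{\\rtf1\\ansi Hello}" + b"\x00trailer"
encoded = base64.b64encode(wrapped).decode("ascii")
print(rtf_from_encoded_rtfd(encoded))
```

If the decoded payload contains no braces at all, index() raises ValueError, which seems preferable to silently returning something that is not RTF.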
Apologies if I've misread the aim here. But I'd note that this seems to be going against the flow. Tinderbox (IIRC for v9) is now "adopting" links created/pasted in the RTFD layer of text. I note the RTFD aspect because, given that links are defined against the plain text layer†, it seems odd to use AppleScript to define a link in the RTFD layer when it could more effectively be defined in the plain text layer. Rich text style, necessarily, must be defined in the RTFD layer. But I see only downside in defining the link in the RTFD layer and then expecting Tinderbox to pick up the pieces.
†. Tinderbox stores links in a <links> linkbase discrete from the text, but link anchors are defined by character offsets in the plain text <text> of the note rather than the styled <rtfd> version. For those not aware, Tinderbox stores both plain and styled text.
[edits for clarity]
The aim is definitely misunderstood. This isn't about "defining" a link in rtfd or asking Tinderbox to "pick up the pieces." It's a demonstration that making the rtfd accessible to external scripts can be useful (at least as read-only, if setting the value of the rtfd turns out to be too problematic).
The aim is to get styled text in and out of Tinderbox via AppleScript without going through the clipboard, either manually (as in the first examples above) or through easily broken "GUI" scripting.
This script demonstrates one way to get styled text out (including any "smart" links that happen to be in the rtfd). It's not too hard. But it would be much easier if the rtfd could be read directly (like reading the value of an attribute) rather than having to resort to XQuery.
Going the other way, getting styled text into Tinderbox, as @Pete has requested above, may be more difficult. My thought was that perhaps a scripter could base64 encode and set the rtfd to that, but @eastgate would have to comment on that.
BTW, I do not believe it is entirely correct to say that native Tinderbox links are "defined in the plain text layer." In the XML of my current version (8.9.2) I see them defined in <links>, with the position of the anchors specified in terms of offsets (sstart and slen) in the plain text. That may seem like quibbling. But the distinction is hugely significant from a scripting point of view. Unlike smart links embedded in the rtfd, it's not that easy via a script to get text links out of Tinderbox anchored in their proper place in the text, though that has been demonstrated in the forum using R reading the XML.
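To make the offsets point concrete, here is a small Python sketch of how an anchor could be recovered from a start offset and a length once the plain text and a link record are in hand. Only the names sstart and slen come from the XML I looked at. Everything else is assumed for illustration: that sstart is zero-based and counted in characters, the sample text, and the shape of the link record.

```python
def anchor_text(plain_text, sstart, slen):
    """Return the slice of plain_text selected by a link's offsets.

    Assumes sstart is a zero-based character offset and slen a character count.
    """
    return plain_text[sstart:sstart + slen]


# Made-up note text and link record, for illustration only
text = "See Tinderbox 8.9 for details."
link = {"sstart": 4, "slen": 13,
        "url": "http://www.eastgate.com/Tinderbox/updates/Tinderbox88.html"}

pair = f"{anchor_text(text, link['sstart'], link['slen'])}::{link['url']}"
print(pair)
```

The slicing itself is trivial. The work for a script is in first gathering the plain text and the matching link records from two different places in the file.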
Anyway, would not making the rtfd more easily accessible to external scripts be a good idea, at least read-only?