Revisiting Devonthink Annotations to Tinderbox

Thanks, I’ll quit now, floating up to the top of the fishbowl on this one…out of my league. :wink: (For now).

As regards PDFs, the ‘PDF’ library of Mac’s Automator (one of Apple’s less-loved offspring of late) has an ‘Extract PDF Text layer’ feature:

I don’t doubt AppleScript can probably do similar via Finder (or Preview?). The above is taken from a service I made to export PDF text to text files. It worked fine across c.2k PDFs. BUT what you get with OCR-ed docs is the OCR output, which is often riddled with errors (I believe Adobe Acrobat and PDFPen Pro offer OCR text correction, and even then I think it is a manual edit process). Digitally native docs, i.e. those printed to PDF, should likely have better text fidelity.

OOOH – Automator, yes. There’s an Automator action to extract annotations

In case it helps, a Zip of my PDF text export OS service file is here: Extract Text from PDFs.workflow.zip (102.4 KB)

Unzip and place the file in your user account Library: ~/Library/Services/. I’d check the newly placed file has the same permissions as others in the folder (You: Read & Write, Staff/everyone: Read-only). Once in place, select a PDF(s), right-click the selection, scroll to the ‘Services’ sub-menu and select the service.

Hi, I wonder why you didn’t ask in the script thread? :slight_smile:

Because I’m not a programmer and I’m not really that sure what your question means. I’m stumbling around in the dark looking for light switches of understanding.

I wrote this script and would be happy if you’d asked. It’s impossible for me to estimate what users know about an app or AppleScript so I don’t try to do it anymore and instead include only what’s specific about the actual script. However the only way that I could learn what might be necessary to include in a script’s instructions is feedback. You weren’t able to use the script and I didn’t know about it until I read your post here. That’s not how forums should be used, I think :slight_smile:

Good news. @pete31 has taken the time to develop and provide a step-by-step on a new approach to generate a TSV file based on Summarise Highlights and then import it into Tinderbox.

I’ve tried it out on a large summarise file of mine and it works as described. Below an illustration of the notes once imported and coloured.

Ya!!! Got it working. Not sure why I had trouble the first time, but the second time I got it working in just a few minutes. :slight_smile: Thanks so much, @pete31. Too cool!

And more good news. @Pete has updated his AppleScript to include custom metadata from your DT3 files (I also learned to edit this file to add even more custom metadata…so very cool!). Here is the post for the updated AppleScript: Script: Create "Summarize Highlights" TSV for import into Tinderbox - #10 by satikusala - DEVONthink - DEVONtechnologies Community. It works great.

Check out this great thread from @PaulWalters. It is perfect for those people who don’t want to bother with AppleScript and are not concerned with pulling custom fields or note-type annotations into specific TBX notes. Paul, as he so often does, has made it really clear and easy.

An update on my side re. exporting Devonthink annotations to Tinderbox. Devonthink developer C. Grunenberg has just confirmed that the next release of Devonthink will allow the user to export Summarise Highlights directly into CSV format in addition to Markdown and RTF. I expect this will help simplify the export process to TB even further. See this link

Thanks for noting that. Can you or someone using this method post back here in this thread when the DEVONthink update occurs to confirm this issue is now essentially ‘fixed’? That will help later readers of the thread. :slight_smile:

The most recent version of DevonThink (v3.6.2) now supports export of annotations to a .tsv file (the DevonThink spreadsheet format). I’ve tested it this morning with the following observations:

  • creating the .tsv file is very fast and convenient (an option in the Summarise Highlights menu)
  • directly dragging the .tsv file to Tinderbox does not work well. TB does not recognise it as a spreadsheet and does not generate a note for each line
  • dragging the file to the Finder and then into Tinderbox works better. A new note is created for each line and the display attributes are populated and filled out.

It’s a step forward for sure and will help me in doing the highlighting in DT and text analyses in TB. I still see room for improvement on the TB side with the following elements:

  • Have TB recognise a DT .tsv as a spreadsheet and generate a new note for each line in the sheet, i.e. the same result as the drag from the Finder, which currently adds one more step. This should be low-hanging fruit!
  • Have TB process the headers (column names) of the file to populate the display attributes in a better way. This is likely a little more involved. As an example, the header of one of my .tsv highlight export files is currently fairly simple and shows something like this:

“Document#string” “Location#string” “Type#set{values:Highlight|Underline|StrikeOut}” “Annotation#text” “Name#text” “Link#url”

which is translated in TB into the following display attribute

image

I would suggest that TB on-import used the type statement (e.g. #string for Document) to define the display attribute type and set the name of the attribute to the name of the column e.g. Document for column 1.
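
To make the suggestion concrete, here is a minimal Python sketch of how an importer might split such header cells into a column name and a type hint. It is only an illustration: the "Name#type" pattern and the optional "{values:a|b|c}" suffix are inferred from the single header line quoted above, the quotes are assumed to be plain double quotes in the actual file, and the function names are hypothetical. It is not how Tinderbox or DEVONthink actually handle the file.

```python
import csv
import io
import re

# Assumed header pattern: "<Name>#<type>" with an optional "{values:a|b|c}" suffix.
# This is inferred from the one sample header quoted in the thread.
HEADER_RE = re.compile(
    r'^(?P<name>[^#]+)#(?P<type>\w+)(?:\{values:(?P<values>[^}]*)\})?$'
)


def parse_header_cell(cell):
    """Split one header cell into (name, type, allowed_values)."""
    text = cell.strip()
    match = HEADER_RE.match(text)
    if match is None:
        # No type hint found: treat the whole cell as a plain column name.
        return text, None, []
    values = match.group("values")
    return match.group("name"), match.group("type"), values.split("|") if values else []


def parse_header_line(line):
    """Parse a tab-delimited header line; csv.reader strips the double quotes."""
    reader = csv.reader(io.StringIO(line), delimiter="\t")
    return [parse_header_cell(cell) for cell in next(reader)]


# Sample header rebuilt from the post, with straight quotes and real tabs.
sample = (
    '"Document#string"\t"Location#string"\t'
    '"Type#set{values:Highlight|Underline|StrikeOut}"\t'
    '"Annotation#text"\t"Name#text"\t"Link#url"'
)

for name, kind, values in parse_header_line(sample):
    print(name, kind, values)
```

With something like this, the first column would yield the name Document and the type string, and the Type column would also carry its three allowed values.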

TSV is simply a standard extension for Tab-Separated Values data (i.e. Tab-delimited data), though more often this format is found with a ‘.txt’ extension. Tinderbox certainly recognises the ‘.tsv’ file extension - I can happily import Tab-delim data from files with ‘.tsv’ or ‘.txt’.

Do you mean dragging it onto the Finder’s Dock icon? The fact that it works afterwards would suggest that the action is setting file information that should have been set correctly by the originating app. Or, there is some bad/unexpected formatting in the data. Certainly, quotes around cell values are not a requirement of Tab-delim format (as the Tab already separates the cells, quotes aren’t delimiters as such).

It would be interesting to open the before & after (Finder) versions of one of these files in BBEdit and do a compare. Has anything in the file changed? If not, it would suggest the problem is missing (OS, under the hood) file metadata.
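
For anyone without BBEdit, the same content check can be sketched in a few lines of Python. This is only a rough equivalent: the function names are made up for illustration, and it compares file contents only, so it says nothing about Finder or OS-level metadata.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def first_difference(path_a, path_b):
    """Return the byte offset of the first difference, or None if identical."""
    a = Path(path_a).read_bytes()
    b = Path(path_b).read_bytes()
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One file is a prefix of the other; they diverge where the shorter ends.
        return min(len(a), len(b))
    return None
```

Calling first_difference() on the copy dragged from DEVONthink and the copy that went via the Finder (whatever those two files are called on your system) returns None if the bytes are identical, which would point towards metadata rather than content.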

Do you mean Tools → Summarize Highlights → as Sheet?

When I use this (DEVONthink Pro v3.6.2) nothing happens. No save dialog seen. Neither app Help nor the v3.6.2 PDF Manual provide any clear documentation as to this feature.

Are you setting the hash-delimited data type, or is it part of the default DEVONthink export? I can’t find anything in the DEVONthink Preferences or Help/Manual.

I like the idea of the possibility of letting Tinderbox know the desired attribute type for ingested data rows, but it would make sense to follow a standard format (if any) as CSV/TSV drag-drop import in Tinderbox can use files from any source and not just DEVONthink. If necessary it might make more sense to have a specific option for importing DEVONthink ‘sheet’ data (albeit not via drag drop as how would Tinderbox know the originator?).

Following up with some more details (others can confirm):

  1. Drag and drop the .tsv file from DT to TB does not work
  2. Drag and drop the very same file from the Finder to TB does work as expected with one note created per line in the sheet
  3. Yes I do mean the output of Tools → Summarize Highlights → as Sheet which works as expected. There is no save dialogue but a TSV file is created in the same group as the file with the highlights
  4. “Document#string” etc. refers to the first line of the TSV file with the names of the columns

So is DEVONthink storing this TSV internally within the DEVONthink app? If so, and dragging from DEVONthink != dragging from Finder, then the issue likely needs fixing in DEVONthink. A way to bottom this out is to try dragging from DEVONthink to some other app that supports drag-drop TSV import.

Not for me. I do see a new annotations folder item, but if dragged to Finder it is an RTF. Odd.

OK. If so, then I think what you are saying is “Yes, this is DEVONthink generated mark-up?”. IOW, it is adding the non-standard quotes and in-heading hash-marks?

You probably want grep to use Perl regular expressions. Unfortunately, the older version of grep ((BSD grep) 2.5.1-FreeBSD) that ships with macOS does not support them. However, you can get the latest version ((GNU grep) 3.6) by first installing Homebrew [1] and then installing GNU grep (brew install grep). Note that Homebrew installs it under the name ggrep by default, so the system grep is left untouched. Once done, you instruct GNU grep to use PCRE [2] by passing ‘-P’ on the command line [3]. Let me know if you have questions on this.

  1. https://brew.sh
  2. https://www.pcre.org
  3. GNU Grep 3.8
  4. The GNU Operating System and the Free Software Movement

BTW, Roger had walked me through this process. It is a pretty straightforward process once you get over using the terminal and the fear of breaking something. :slight_smile:

Sorry, I appreciate your info, but I should have deleted that post months ago. What I said there is useless.