DTP import in v7.2.0

(Mark Anderson) #1

De-merged from this thread:

Let’s give this its own thread.

Updates for v7.2.0
(Paul Walters) #2

Importing (dragging) PDFs into Tinderbox from DEVONthink or elsewhere – and the “gibberish” result.

There are numerous factors that affect what a “PDF” is – the standard is loosely followed by many apps.

  1. Is the PDF made from a scanner image – especially a document or book scan? Your results will vary wildly depending on the scanner, the source, the scanning program, etc. Unless the scanned image has been OCR’d, there is NO text in the file that either DEVONthink or Tinderbox can display. This is the most common source of gibberish. Tip: don’t drag documents from DEVONthink to Tinderbox that are not type “PDF + Text” in DEVONthink. If you have Pro Office, or some other PDF app, then OCR the document first.

  2. Same as #1 but the file is OCR’d. Again, every app does OCR differently, and in any case the resulting “text layer” in the PDF depends on the quality of the source image and the capability of the OCR software. If you are looking at an OCR’d document in DEVONthink (i.e., type is “PDF + Text”) and want to see how well the OCR was done, select the document and choose Convert > To Plain Text from DEVONthink’s contextual menu. What you see is what Tinderbox will import – as good or as bad as it might be.

Hint: don’t import the PDF to Tinderbox, import the Text file created by converting the PDF as described here.

The most common complaint about PDFs (other than the messes that Sierra created with PDFkit) is “my text is gibberish”. And the most common reason is “your scan is bad”.
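
A rough way to judge the plain-text conversion described above before importing it is to measure how much of it looks like ordinary words. The following is only a sketch of that idea, not anything DEVONthink or Tinderbox does; the function names and the 0.6 threshold are hypothetical and would need tuning for real documents.

```python
import string


def wordlike_ratio(text: str) -> float:
    """Fraction of whitespace-separated tokens that look like plain words or numbers."""
    tokens = text.split()
    if not tokens:
        return 0.0  # no text at all, e.g. a scan with no text layer

    def is_wordlike(token: str) -> bool:
        # Ignore punctuation at the edges, then require only letters or only digits.
        core = token.strip(string.punctuation)
        return core.isalpha() or core.isdigit()

    return sum(1 for t in tokens if is_wordlike(t)) / len(tokens)


def looks_like_gibberish(text: str, threshold: float = 0.6) -> bool:
    """Hypothetical cut-off: flag text where fewer than `threshold` of tokens are word-like."""
    return wordlike_ratio(text) < threshold
```

Running `looks_like_gibberish` over the text file produced by Convert > To Plain Text gives a quick hint as to whether the scan is worth importing or should be OCR’d again first.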

(Mark Anderson) #3

Amen to the notion that PDF isn’t always what we assume it to be. I think perhaps I should paraphrase this into my page on DT import (or a sub-page thereof)? If one expects something complex with internal variety (like PDF) to just work, then surprises are likely even for otherwise experienced folk. A note to suggest the cause isn’t necessarily a Tinderbox failure might be useful, seeing that there seem to be a fair number who use both Tinderbox and DEVONthink.

(eastgate) #4

We’re investigating the pdf import issue; it’s probably a side-effect of some improvements in the DEVONthink➛Tinderbox plumbing.

(Andreas Grimm) #5

Thank you @mwra for merging the first DTP-contributions in Updates for v7.2.0 to this new thread.

Some first impressions and experiences

New attributes

Drag and Drop works fine. The two new attributes SourceCreated and SourceModified are a delight to work with. Thanks.

dragged-dropped-eMails from DTP to TBX

Dragging and dropping eMails (stored in DEVONthink) into Tinderbox results in gibberish in both $Name and $Text.

And: The auto-populated $URL does not show the complete link back to the eMail stored in DTP, but always just: x-devonthink-item://true.
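
To spot notes affected by this, one could check whether a $URL value carries an actual item identifier after the scheme. This is a minimal sketch, assuming only that a usable link has something other than `true` (or nothing) after `x-devonthink-item://`; the function name is hypothetical and the check does not verify that the item exists in DEVONthink.

```python
PREFIX = "x-devonthink-item://"


def is_complete_item_link(url: str) -> bool:
    """Return True if the URL uses the item-link scheme and names some identifier."""
    url = url.strip()
    if not url.lower().startswith(PREFIX):
        return False
    identifier = url[len(PREFIX):]
    # "true"/"false" or an empty remainder means the real identifier was lost on import.
    return identifier.lower() not in ("", "true", "false")
```

A link that fails this check can be replaced with the one obtained from Edit > Copy Item Link in DEVONthink.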



As @PaulWalters points out: Problems with PDFs almost always derive from bad scans:

The most common complaint about PDFs (other than the messes that Sierra created with PDFkit) is “my text is gibberish”. And the most common reason is “your scan is bad”.

As far as I can see: All PDFs I dragged and dropped into TBX (v7.2) are gibberish (both $Name and $Text) only as long as I do not check $AutoFetch. As soon as I check $AutoFetch, both $Name and $Text are properly displayed.

Better, though, to follow @PaulWalters’ suggestion: convert the PDF to Plain Text --> then import this Plain Text file into TBX. Thanks, Paul.

Or even better: One converts PDF to RTF / RTFD whereby the latter nicely displays even tables etc. from the original PDF. Cool!

Meanwhile, @eastgate reports that they are investigating the PDF import issue. Thanks.

(Paul Walters) #6

In my (very limited) testing it appears that the $Text becomes the raw source (the same thing you would see in Mail if you choose View > Message > Raw Source).

I’ve tried converting .eml to “Rich Text” in DEVONthink, but usually end up with RTFD. With 7.2.0 I find that dragging RTFD from DEVONthink to Tinderbox results in a long hang that needs a force quit. So, that left me with using Edit > Copy Item Link in DEVONthink, pasting that clipboard to Tinderbox, adding the $AutoFetch attribute to the note’s KA, then activating auto fetch.

(Andreas Grimm) #7

Nice workaround, @PaulWalters, which I’ll adopt for the time being. Thanks.