Very weird bug or me misusing TBX?

I just came upon a very weird bug. If I paste a particular text into a note and then deselect that note, some of the text disappears from the note. I also tried this in a completely new TBX document. Am I not understanding something? I vaguely remember somebody once mentioning something like this… Maybe you can kindly clarify…

Try inserting this: In its aspect of being a dependently co-arisen (pratītya-samutpanna) existent, a conditioned dharma is said to be samskrta 一 ‘compounded’ ‘co-produced ’ ‘conditioned’. In its other aspect of being a causally productive force, it is also called a samskãra — ‘conditioning’ or ‘conditioning force’.

It becomes this:
In its aspect of being a dependently co-arisen (pratītya-samutpanna) existent, a conditioned dharma is said to be samskrta 一 ‘compounded’ ãra — ‘conditioning’ or ‘conditioning force’.

Odd. If I take your specimen text and paste it to a note’s $Text, de-/re-select, I get … the source text. I tried it pasting the web-copied text (⌘+V) or using Paste-and-match-style (⌘+⌥+⇧+V): same result. As I’ve admin access, I even used edit mode on your post, copied the raw text of your post … and still got the same result.

But this doesn’t mean you are mistaken. I think the pivot is “Try inserting this”. What are we inserting, and from where? Is this text you are typing in, or are you copy/pasting from another app or a web page?

It’s a guess at this point, but this has the hallmarks of copy-pasted text where non-printing, i.e. invisible, characters are coming along for the ride and confusing Tinderbox. This guess is reinforced by my test above. It doesn’t invalidate your report. Rather, it suggests I’m not actually testing what you, with good intent, thought you were giving me to test. A problem with non-printing characters is that some inter-app processes delete them (even if only because they don’t ‘see’ them at the copy phase) and others don’t.
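One way to check the gremlin theory outside Tinderbox is to scan the copied text for characters that Unicode classes as non-printing. The Python below is only a rough sketch: the function name `find_invisibles`, the choice of categories to flag, and the sample string are all my own assumptions for illustration, not the actual OCR text and not anything Tinderbox itself does.

```python
import unicodedata

# Unicode general categories flagged as "invisible" in this sketch
# (an assumption): control (Cc), format (Cf), private use (Co), unassigned (Cn).
SUSPECT_CATEGORIES = {"Cc", "Cf", "Co", "Cn"}
# Ordinary whitespace controls that are normally wanted in note text.
ALLOWED = {"\n", "\t"}

def find_invisibles(text):
    """Return (index, code point, name) for each suspect character in text."""
    hits = []
    for i, ch in enumerate(text):
        if ch in ALLOWED:
            continue
        if unicodedata.category(ch) in SUSPECT_CATEGORIES:
            # Control characters have no Unicode name, so supply a fallback.
            name = unicodedata.name(ch, "<unnamed>")
            hits.append((i, f"U+{ord(ch):04X}", name))
    return hits

# Made-up sample with a zero-width space and a NUL hidden inside it.
sample = "samskrta\u200b compounded\x00 conditioning"
for hit in find_invisibles(sample):
    print(hit)
```

If the list comes back empty for text that still misbehaves, the culprit is probably something this category set does not cover, and the guess needs revisiting.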

Could you post a small TBX with the offending text in a note? I suspect the full text is there, but some literal gremlin is confusing the text parser. Or, if the original is copied from a publicly accessible web page, the URL of that (and where to look in it) would help us help you.

HTH

Thank you very much for pointing me to the source of the problem ) I am sorry. You are right. I copied it from an OCRed PDF inside Devonthink (OCRed not by it), which can have these “invisible” characters. I was puzzled because it never happened before (or I didn’t notice). I just tried again and watched the clipboard buffer: it didn’t copy everything. But there is no problem with other parts. And the strange thing is that I somehow was able to copy it all into the buffer to put it in this post…

When I meet this sort of thing, I paste the text into something like BBEdit. OK, you lose formatting, but you can see/detect these ‘missing’ characters and purge them.

If you don’t need source formatting, paste-and-match-style often helps strip off unwanted ride-along garbage. The BBEdit approach above is generally the last resort.

The older the OCR the more likely the OCR-ed text is to have all sorts of unwanted artefacts.

I got this in BBEdit. Only BBEdit sees everything ) When I look into the Alfred buffer, it looks like it didn’t copy everything. But after pasting into BBEdit I got this. Actually, paste-and-match-style is currently the default behaviour for ⌘V in my Tinderbox. Anyway, it is exactly what you explained! Thank you very much for that!

If you use the BBEdit Character Inspector (menu Window ▸ Palettes ▸ Character Inspector) and select one of the upside-down question marks, it’ll tell you the Unicode character, which you can then look up in case it helps with cleaning the source, if you have such control. Otherwise, just delete characters that you don’t think should be there.
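If deleting by hand gets tedious, the purge could in principle be scripted before the text ever reaches Tinderbox. This is a minimal Python sketch under my own assumptions: `strip_invisibles` is a hypothetical helper, and the set of Unicode categories it drops is a guess at what counts as a gremlin in OCR output.

```python
import unicodedata

# Categories dropped in this sketch (an assumption): control (Cc),
# format (Cf), private use (Co) and unassigned (Cn).
DROP_CATEGORIES = {"Cc", "Cf", "Co", "Cn"}

def strip_invisibles(text, keep="\n\t"):
    """Return text without non-printing characters, except those listed in keep."""
    return "".join(
        ch for ch in text
        if ch in keep or unicodedata.category(ch) not in DROP_CATEGORIES
    )

# Accented letters such as the ī in "pratītya" are ordinary letters
# and pass through untouched; only the hidden characters are removed.
cleaned = strip_invisibles("prat\u012btya\u200b-samutpanna\x07")
print(cleaned)
```

One caution: the format category (Cf) also covers characters that are legitimate in some scripts, such as the zero-width joiner, so for text in those scripts it may be safer to remove Cf from the set and delete only the specific code points the Character Inspector reports.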

Thanks a lot!

