Exporting notes with smart quotes

(Galen Menzel) #1

Hi all,

I’m doing a very simple plain-text export in which the $HTMLExportTemplate's note is just ^text(plain)^. However, if I have smart quotes like ’ in the notes that are being exported, the quotes in the resulting files show up as an HTML entity such as &rsquo; rather than ’. I’m not able to find any option that would make this happen. What’s going on here?

I notice that the smart quotes render just fine when I use ^value($Text)^ rather than ^text(plain)^. Is there any difference at all between these two? For example, does $Text have a maximum length that does not apply to text(plain)?



(Mark Anderson) #2

Maximum length? Probably not. Actually, I think it’s a legacy issue. ^text(plain)^ goes back to the dawn of Tinderbox, c. 2001, when text handling was less capable (pre-Unicode).

For non-formatted/HTML export of $Text, I’d deprecate ^text(plain)^ in favour of the newer ^value($Text)^. The latter will place the source text in the output as UTF-8 Unicode, so it shouldn’t mangle non-low-ASCII characters like curly quotes.

TL;DR - in this case go with ^value($Text)^ to avoid any encoding.

Edit: urgh - I recommended the wrong outcome due to a copy/paste error. Fixed.

(eastgate) #3

Before HTML5, ISO-8859-1 was the default character set for HTML 4. That encoding was a superset of ASCII, and didn’t have typographic quotes. That meant that it was mandatory to encode curly quotes.

The default encoding in HTML5 is UTF-8, which does (of course) have typographic quotes.

We tried to bridge the gap in Tinderbox 6 by encoding characters that would, if not encoded, cause problems in HTML 4.01 Transitional. ^value($Text)^ will give you UTF-8, which is fine if you know you’re using HTML5, or if your page adopts the UTF-8 charset. This gives you a way to export to either new or old HTML formats.
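
For readers who want to see the two outcomes side by side, here is a minimal Python sketch of the general idea. It is not Tinderbox's actual implementation: the function name `encode_for_legacy_html` is hypothetical, and it emits numeric character references, whereas Tinderbox may well emit named entities instead. It only illustrates why the same curly quote can reach a file either as an ASCII-safe entity or as raw UTF-8 bytes.

```python
import html


def encode_for_legacy_html(text: str) -> str:
    """Hypothetical helper: replace every non-ASCII character with a
    numeric character reference so the result is pure ASCII."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# A note containing a curly apostrophe and curly double quotes.
note = "It\u2019s a \u201csmart\u201d quote"

# ASCII-safe form, roughly the situation described for ^text(plain)^.
legacy = encode_for_legacy_html(note)

# Raw UTF-8 bytes, roughly the situation described for ^value($Text)^.
modern = note.encode("utf-8")

print(legacy)
print(modern)

# Decoding the entities recovers the original text.
print(html.unescape(legacy) == note)
```

If a file has already been exported with entities in it, something like `html.unescape` can be used as a post-processing step to get readable plain text back, although switching the template to ^value($Text)^ avoids the problem at the source.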

(Mark Anderson) #4

I’ll update aTbRef accordingly and this nuance should probably go in the manual as well.

The edge case here is that the OP is using export templates (and ^export^ code) to export plain text. I think that, to the general non-tech user, this means the text viewed in TextEdit or similar will not show encodings. I think that’s the expectation. Quite how best to match that to a description of the status quo, I’m less sure.

(Galen Menzel) #5

Ah, I see. Still, it seems strange that character encoding depends on the template export code used rather than, say, an HTMLCharacterEncoding attribute.

Thanks for the clarification!