Japanese character blur

WAKAMATSU · October 18, 2021, 4:15am

Why do Japanese characters become garbled?

I would be very grateful for your advice.
I am investigating the cause of garbled characters when exporting in md format.
I added one note to Dr. Mark Anderson’s zettelcode-demo.tbx file.
[[A Tinderbox solution to organising a Zettelkasten? - #44 by Steve_Scott]]
Exporting as html displays the Japanese without problems.
However, when I export as md by changing “HTMLExportExtension”, only the file for the note “horse-馬” is not created in the Testcell folder.
If I remove the Japanese from the note name “horse-馬” and change it to “horse”, horse.md is created.

Here is the name of the note
(I’ve enclosed it in brackets for convenience of posting):

When I check the name [ horse-馬 ] in the Export pane, it appears as [ horse-＆#x99ac; ].
Because of the [ &# ] and [ ; ] around the character code, the character is not displayed in the markdown file.
I understand that this is why the name is garbled when I export to a markdown file.
It also seems that specifying utf-8 in the built-in templates does not help.
Question 01: How do I get the Japanese to display without the [ &# ] and [ ; ]?

Q02: What happens when I use the HTMLExportExtension to export a file?
What should I be aware of when exporting files with HTMLExportExtension?

This question will be sent as a separate thread.
Thx and regards, WAKAMATSU
P.S
X99AC 馬 is Unicode character number 39340, KanjiLiberal, Uma.
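
For anyone who wants to check the relationship between the entity string and the character outside Tinderbox, here is a minimal Python sketch. It only uses the standard `html` module; the variable names are illustrative and nothing here reflects how Tinderbox itself works internally.

```python
import html

# The string as it appears in the Export pane (entity form).
name_with_entity = "horse-&#x99ac;"

# html.unescape turns numeric character references back into characters.
decoded = html.unescape(name_with_entity)
print(decoded)                      # horse-馬

# The code point of 馬 in hexadecimal and in decimal.
print(hex(ord("馬")), ord("馬"))    # 0x99ac 39340
```

So the entity form and the plain character describe the same text; only the notation differs.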

WAKAMATSU · October 18, 2021, 4:40am

This is a report of the solution to the problem.
[[Exporting html title with unicode characters]]
Adopting Dr. Mark Anderson’s suggestion in the thread above, I used
^value($Name)^ and the “garbled” problem was solved.
Thank you for your concern.
Thx and regards, WAKAMATSU

P.S.

^value($Name)^

^text ^children(/Templates/HTML page/HTML item)

WAKAMATSU · October 18, 2021, 6:00am

Dear All,
I changed the template to use ^value($Name)^ and the exported name now contains the Japanese.
The “garbled” problem was solved.
But now I have another problem.
Zettels > Test cell > “Horse”
If I use only the Japanese 馬 in a note name,
I would normally expect to get “馬.md” in the Testcell folder.
Instead, the exported file has no note name in its file name;
worse still, it is just “.md”, which makes it a hidden file.
If I change the code to ^text(plain)^ I get the same hidden “.md” file,
and when I open the file I see only "<h2></h2>" exported, without the note name.
If I change it back to ^value($Name)^ and run the export again,
the file is still hidden, but it contains "<h2>馬</h2>" (馬 is “horse” in Japanese).
Now I am stuck between the two.
How can I get to the bottom of this?
I would be grateful for your advice.
Thx and regards, WAKAMATSU

mwra · October 18, 2021, 11:29am

This is likely to do with Japanese script being a DBCS (double-byte character set^†). I don’t think the problem is the input text, i.e. your use of Japanese characters, but how various systems process that script.

The question here is whether the breakage occurs within Tinderbox processes (e.g. ^export code^) or within the Markdown process.

The simplest tear-down to expose the problem is to strip out the extra Markdown step. Does the process work correctly when exporting HTML? If it does not, then there is an issue with Tinderbox’s export code process. As I understand it, Tinderbox has been fully Unicode capable since v6+, so writing in a double-byte language should not matter. But, as many of the deeper parts of software (i.e. OS frameworks, not just apps themselves) were likely written by coders using single-byte languages (e.g. English), some errors may still lurk.

If HTML export works, then add Markdown. Remember, Markdown is—for better or worse—just a shortcut for writing mark-up in text without it being too visually intrusive. That text then has to be processed to generate HTML. This offers an extra point at which double-byte content may be handled incorrectly.

Your first post above suggests all works until you add Markdown into the mix. But we should not rush to the assumption that Markdown is the villain here.

At this point I would:

make the smallest possible Tinderbox file (i.e. necessary prototypes, templates and only one or two actual notes to export to HTML). Also make the exporting notes as short as possible. We only need to see the error once per exported note.
copy that file and make necessary changes to implement Markdown.
post both those demos here.

By using the minimum amount of notes (and $text in those notes) needed to show the problem, it helps everyone as I’m aware we are working across a language barrier.

By following this method, I would expect to see an HTML-based TBX that does work and a version doing the same thing but including Markdown code and processing, and which fails.

Having small working and not-working files would be useful. Also, in my experience, just the act of making such small tests exposes the cause or likely cause of the failure.

The temptation is to work on a bigger file, because that was where the problem was spotted. I’m certainly guilty of that error (many issues I have encountered first showed up in the TBX for aTbRef which is much bigger/more complex than most TBXs; tearing down to a smaller test is the correct next action).

I hope that helps.

Also, the (translated) term ‘garbled’ does make sense in (translated) English. I appreciate the effort put into your posts here and the difficulty of explaining fine detail via translated text.

†. This refers to languages whose alphabet cannot be expressed within 1 byte (limit: 256 characters) and which require two bytes to encode each character in the set.
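
As a rough illustration of the footnote, the Python sketch below counts how many bytes the single character 馬 takes in a few common encodings. The choice of encodings is only an example, and it says nothing about which encoding any particular app uses internally. Note that in the legacy Japanese encodings and in UTF-16 the character takes two bytes, while in UTF-8 it takes three.

```python
ch = "馬"

# Number of bytes needed for one character in several encodings.
encodings = ("shift_jis", "euc_jp", "utf-16-be", "utf-8")
sizes = {enc: len(ch.encode(enc)) for enc in encodings}

for enc, size in sizes.items():
    print(f"{enc}: {size} bytes")
```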

WAKAMATSU · October 18, 2021, 1:25pm

Dear Dr.Mark Anderson,
Thank you very much for your helpful reply.
I am aware of the 2-byte code, but I am having the same problem
with the 2-byte code conversion under Tinderbox 9.
I tried to read your post accurately,
but one sentence is not clear to me:
[copy that file and make necessary changes to implement Markdown.]
What changes should I make to implement Markdown?
I changed HTMLExportExtension from html to md; that is all I could do.
In the final analysis, with either the html or the md extension,
I could not get a proper export file
for a note whose name contains only Japanese characters:
no file with the note’s name was exported at all.
I used your zettelcode-demo.tbx as an attachment.
The contents of the zip file are
zettelcode-demo.tbx & zettelcode-demo-md.tbx
htmlExport Folder & mdExport Folder.
Please take a look at the results in the following unnamed files.
htmlExport ＞ Zettels ＞ Testcell ＞.html file
mdExport ＞ Zettels ＞ Testcell ＞.md file
Respectfully, WAKAMATSU
zettelcode-demoExport.zip (94.2 KB)

mwra · October 18, 2021, 2:56pm

Ah, that does nothing except change the exported file’s extension from ‘.html’ to ‘.md’. That has no effect on the content exported. To generate export that turns Markdown code in $Text into HTML, you need to set up Tinderbox to use Markdown.

But some other issues first. I think I understand the ‘garbled’ text issue. You see HTML code for the note title ‘馬’ exported like this:

<h2>&#x99ac;</h2>

but you expected,

<h2>馬</h2>

But here nothing is broken, or garbled. All is in order. The &#x99ac; above is simply the character 馬 expressed as a web-safe HTML entity code.

This was needed much more in the early days of the web than now. The easy way to stop the above from happening is to set $HTMLEntities to false via the Document Inspector’s system tab. In the top box, type ‘HTMLEntities’ and press the Return key; the attribute should be selected. In the value box, change true to false and press the Apply button. Now no note in the TBX document will generate HTML entities when using ^title^ or ^text^ export codes.
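
To make the effect of that switch concrete, here is a small Python sketch of the idea. It is not Tinderbox’s actual implementation: the function name `export_title` and the parameter `html_entities` are hypothetical stand-ins for the behaviour of the title export with $HTMLEntities set to true or false, and real entity encoding may cover more characters than this.

```python
def export_title(name: str, html_entities: bool = True) -> str:
    """Hypothetical stand-in for a title export step.

    With html_entities on, every non-ASCII character is written as a
    hexadecimal HTML entity; with it off, the text passes through as-is.
    """
    if not html_entities:
        return name
    return "".join(
        ch if ord(ch) < 128 else f"&#x{ord(ch):x};"
        for ch in name
    )


print(export_title("horse-馬", html_entities=True))   # horse-&#x99ac;
print(export_title("horse-馬", html_entities=False))  # horse-馬
```

Both outputs render as the same text in a browser; only the second one is readable when the file is opened as plain text or Markdown.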

There are also some inconsistencies in your two test TBXs. I’ll address this later in a separate reply as I have a meeting just starting.

mwra · October 18, 2021, 6:56pm

Ok, I’ve edited the HTML TBX so $HTMLEntities is false at document level. I also rolled back changes to the two HTML export templates. The ‘Export’ pane for the ‘Zettels’ note, now looks like this.

Here is the TBX: zettelcode-demo-1.tbx (131.8 KB)

For the Markdown TBX, I fixed the HTML export templates and reset $HTMLEntities as in the HTML TBX. I also added the built-in Markdown prototype. The ‘Zettelnotes’ prototype now uses the Markdown prototype as its prototype. Yes, prototypes can do this! As a result, all your exporting zettel notes are configured for Markdown rendering/processing.

The export code now looks like this:

Here is the modified TBX: zettelcode-demo-md1.tbx (133.9 KB)

I think this resolves the problems. If not, please ask.

WAKAMATSU · October 20, 2021, 3:02am

Dear Dr. Mark Anderson,
Thank you very much for the example tbx.
I will try it out.
Respectfully, WAKAMATSU