Tinderbox Forum

German Umlaute in String Operations

Hi, I was experimenting with importing Markdown notes from Devonthink that contain links to specific items in Devonthink.

While working out the correct action statement for the explode action, I noticed that String.find() and String.substr() seem to calculate positions wrongly when the string contains German Umlaute.

Please see the example in the screenshot: the first example $Text contains no Umlaut, the second contains one. In the second example, the position is wrong.

I assume this is because of the additional byte an Umlaut needs.
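
The "additional byte" guess can be checked outside Tinderbox. The following is a minimal Python sketch, not Tinderbox action code, and it assumes the text is held as UTF-8 (the thread does not say how Tinderbox stores $Text). It only shows that a precomposed 'ü' is one character but two bytes:

```python
# Assumption: the note text is UTF-8 encoded; Tinderbox internals are not shown in this thread.
umlaut = "\u00fc"  # 'ü' as a single precomposed code point

char_count = len(umlaut)                  # length counted in characters (code points)
byte_count = len(umlaut.encode("utf-8"))  # length counted in UTF-8 bytes

print(char_count, byte_count)  # 1 2
```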

Is this a bug, or do I have to implement the correct calculation myself?

Thanks for your advice.

It does look as if an umlaut character is giving an off-by-one error. To replicate your report I took these steps (test file attached at end). To test I took this string from your screen-grab:

1. [2021-03-15_Post_dysfunktionales_Team](x-devonthink-item

I made that the $Text of a note and ran this stamp on it:

$MyNumber = $Text.find("x");
$MyString = $Text.substr($MyNumber,1);

For the above string, $MyNumber is 42 and $MyString is x. Now, in a new note, I amended a ‘u’ to a ‘ü’ (sorry if ungrammatical!), as in:

1. [2021-03-15_Post_dysfünktionales_Team](x-devonthink-item

Re-running the stamp on the revised $Text, $MyNumber is now 43 (wrong!) and $MyString is now -. So changing one character to an accented character causes .find() to mis-report, from the user’s perspective, the position of its string argument if the accented character comes before the matched string.

Another check moves the accent after the searched string (again, apologies for non-grammatical usage), undoing the ‘ü’ change but altering ‘o’ to ‘ö’ later in the string:

1. [2021-03-15_Pöst_dysfünktionales_Team](x-devonthink-item

Re-running the stamp on the revised $Text, $MyNumber returns to 42 (correct!) and $MyString is x. The latter reinforces the observation that only an accented character before the searched string shifts the reported position. Indeed, as in the test file, adding two accented characters before the string results in a $MyNumber of 44.
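
The 42, 43, 44 pattern is what one would expect if the position were counted in UTF-8 bytes rather than in characters. That is an assumption, not something confirmed here about Tinderbox's internals. A minimal Python sketch (not Tinderbox action code; the helper names `char_pos` and `byte_pos` are made up for illustration) reproduces the same numbers:

```python
# Hypothetical reproduction in Python; this is not Tinderbox code.
# Assumption: the faulty position is a UTF-8 byte offset, not a character offset.
plain = "1. [2021-03-15_Post_dysfunktionales_Team](x-devonthink-item"
one_umlaut = plain.replace("dysfunk", "dysf\u00fcnk")  # 'u' -> 'ü' before the "x"
two_umlauts = one_umlaut.replace("Post", "P\u00f6st")  # also 'o' -> 'ö' before the "x"

def char_pos(text: str, needle: str) -> int:
    """Position counted in characters, which is what a user expects."""
    return text.find(needle)

def byte_pos(text: str, needle: str) -> int:
    """Position counted in UTF-8 bytes, the suspected source of the error."""
    return text.encode("utf-8").find(needle.encode("utf-8"))

for sample in (plain, one_umlaut, two_umlauts):
    print(char_pos(sample, "x"), byte_pos(sample, "x"))
# 42 42
# 42 43
# 42 44

# Using the byte offset as a character index picks the character after "x".
print(one_umlaut[byte_pos(one_umlaut, "x")])  # -
```

Each two-byte character before the match adds one to the byte offset, while a character after the match leaves it unchanged.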

See: find-string-test.tbx (84.3 KB)

Whilst that can’t easily be fixed, we might route around it for the wider context. What custom Explode delimiter are you trying to define? A simple text file posted here might help us fellow users help you.

The problem lies in $Text.find(), which is handling Unicode incorrectly. I’ll get this fixed asap.


Thanks for your analysis! For my specific case, which is very simple, I have found a solution: there is only one Umlaut, so a simple if/then/else clause made everything work.
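
For text with an arbitrary number of Umlaute, the same correction could be generalized. The following is only a sketch of the logic in Python, not Tinderbox action code, and it assumes the reported position is a UTF-8 byte offset; the function name `byte_to_char_offset` is hypothetical:

```python
# Hypothetical workaround logic in Python; not Tinderbox action code.
# Assumption: the reported position is a UTF-8 byte offset.
def byte_to_char_offset(text: str, byte_offset: int) -> int:
    """Convert a UTF-8 byte offset into a character offset."""
    prefix = text.encode("utf-8")[:byte_offset]
    # Decoding the prefix and counting its characters removes the extra bytes.
    return len(prefix.decode("utf-8"))

sample = "1. [2021-03-15_Post_dysf\u00fcnktionales_Team](x-devonthink-item"
reported = sample.encode("utf-8").find(b"x")  # byte offset, as the bug would report it
corrected = byte_to_char_offset(sample, reported)

print(reported, corrected, sample[corrected])  # 43 42 x
```

The conversion relies on the byte offset falling on a character boundary, which holds when the offset comes from searching for an ASCII string such as "x".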


Good to know! Thanks for your confirmation!