Sorting Umlauts in the Attribute Browser

KaiS · July 14, 2018, 12:36pm

Hello again,

while cataloguing my library in Tinderbox I’ve noticed somewhat strange behaviour in the Attribute Browser when sorting notes:

Whenever I sort notes by an attribute and that attribute's value contains an umlaut (ä, ö, ü) or another special character, the note is sorted at the end of the entries for that particular letter.

Example:

I use attributes for author names. When sorting the notes by author name, the German author Erich Kästner is placed at the end of the list of authors beginning with “K”, instead of showing up right after the authors starting with “Kad” (in German, ä = ae).

Also, author names not beginning with a capital letter (for example, the Dutch author Margriet de Moor, whose surname begins with “de”) are placed at the end of the list, after the authors beginning with “Z”.
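Both symptoms are exactly what plain code point ordering produces. A minimal Python sketch, using a made-up list of surnames, since Python's default `sorted()` on `str` compares Unicode code points one character at a time:

```python
# Hypothetical author surnames, sorted purely by Unicode code point.
# 'A'-'Z' are U+0041-U+005A, 'a'-'z' are U+0061-U+007A, and 'ä' is U+00E4,
# so 'ä' ranks after every plain letter and lowercase ranks after uppercase.
authors = ["Kuhn", "Kästner", "Kafka", "Kadner", "Zweig", "de Moor"]

by_code_point = sorted(authors)
print(by_code_point)
# ['Kadner', 'Kafka', 'Kuhn', 'Kästner', 'Zweig', 'de Moor']
```

“Kästner” lands after “Kuhn” because its second character (U+00E4) outranks “u”, and “de Moor” lands after “Zweig” because “d” (U+0064) outranks “Z” (U+005A).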

Is there any way to fix this behavior?

mwra · July 14, 2018, 2:40pm

Sorting, like date formats, is more complex than imagined, as there’s more variation than we tend to assume. My understanding is that accented character sorting varies by country, so in v6+ Tinderbox uses the OS locale setting to determine it. Thus if I sort German names under, for example, my en-GB locale, I may get a different sort order for names with accents than someone using a German (or Austrian, Swiss, etc.) locale.
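To illustrate how much the convention matters, even within German: DIN 5007 describes a dictionary ordering in which ä sorts like a, and a phone-book ordering in which ä is expanded to ae (the rule expected in the opening post). The Python sketch below approximates both with hand-written sort keys. The function names are hypothetical, and real locale collation (as done by the OS or ICU) is considerably more involved.

```python
import unicodedata

# Simplified sort keys approximating two German conventions (DIN 5007).
# Illustrative only: this is NOT a real locale collation.

def din_5007_1_key(s: str) -> str:
    """Dictionary style: umlauts sort like their base letter (ä -> a)."""
    # NFD splits 'ä' into 'a' + U+0308 COMBINING DIAERESIS; drop the combining mark.
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()  # casefold also makes 'de Moor' sort among the D's

PHONEBOOK = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue",
                           "Ä": "Ae", "Ö": "Oe", "Ü": "Ue"})

def din_5007_2_key(s: str) -> str:
    """Phone-book style: umlauts are expanded (ä -> ae)."""
    # Normalize to NFC first so a decomposed 'a' + U+0308 is also caught.
    return unicodedata.normalize("NFC", s).translate(PHONEBOOK).casefold()

authors = ["Kuhn", "Kästner", "Kafka", "Kadner"]
dictionary_order = sorted(authors, key=din_5007_1_key)
phonebook_order = sorted(authors, key=din_5007_2_key)
print(dictionary_order)  # ['Kadner', 'Kafka', 'Kästner', 'Kuhn']
print(phonebook_order)   # ['Kadner', 'Kästner', 'Kafka', 'Kuhn']
```

The same four names come out in two different orders, which is why a single “correct” sort depends on which locale's rules are in force.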

You can set a specific locale for Tinderbox to use (i.e. different from the OS locale) via action code (see locale()). I’m not 100% sure how long that setting lasts. I think it is per session rather than per document. IOW, it affects all open docs but doesn’t persist past the current session. Again, @eastgate may be able to clarify.

If you think sorting is occurring in a fashion that doesn’t match the expected sort for your locale, please contact tech support with a sample document and the locale(s) you use as this may need finer testing to resolve the issue.

eastgate · July 14, 2018, 5:38pm

Most sorting in Tinderbox now uses the current system locale. I’m not entirely sure that this is true of the Attribute Browser; there are a few places where we continue to use code point sorting, the sorting generally favored by software engineers.

Case-sensitivity, on the other hand, is tricky. So is recognizing which name to sort by: my local library, for example, has a very tough time remembering how it shelves the work of Edward St. Aubyn, much less Emily St. John Mandel. And don’t get me started on G. E. M. de Ste. Croix! I’ve been tempted to introduce a canonical bibliographic sort, but I’m not entirely convinced that this would help many people all that much.

My way of coping with these situations is to use OnAdd to populate a default value that’s extracted from the relevant name. I create a user attribute $SortName, which I use for sorting, and the OnAdd rule takes a guess:

$SortName=$Name.lastWord

If you don’t want lowercase names to sort separately, use

$SortName=$Name.lastWord.upperCase

Finally, if Tinderbox’s idea of the name is wrong — if it thinks Martin’s family name was Junior — just replace the guess and enter whatever will get sorted in the right place.
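The same idea can be sketched outside Tinderbox. This is a hypothetical Python analogue of the OnAdd rule `$SortName=$Name.lastWord.upperCase`, with a hand-maintained override table standing in for manually corrected $SortName values; none of these names come from Tinderbox itself.

```python
# Hypothetical analogue of:  $SortName=$Name.lastWord.upperCase
# plus manual overrides, standing in for hand-edited $SortName values.

def guess_sort_name(name: str) -> str:
    """Guess the family name: the last whitespace-separated word, uppercased."""
    words = name.split()
    return words[-1].upper() if words else ""

# Manual corrections for names the guess gets wrong (the "Junior" case).
overrides = {"Martin Luther King Junior": "KING"}

def sort_name(name: str) -> str:
    return overrides.get(name, guess_sort_name(name))

notes = ["Margriet de Moor", "Erich Kästner", "Stefan Zweig",
         "Martin Luther King Junior"]
ordered = sorted(notes, key=sort_name)
print(ordered)
# ['Martin Luther King Junior', 'Erich Kästner', 'Margriet de Moor', 'Stefan Zweig']
```

Uppercasing and taking the last word fix the “de Moor” problem, but not the umlaut: “KÄSTNER” still sorts by code point, after “KING”. For that case one would type the expanded form (for example KAESTNER) into the sort attribute by hand, as suggested above.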

KaiS · July 15, 2018, 4:13pm

Unfortunately, switching the system locale from English to German didn’t change anything; the sort order in the Attribute Browser remains the same. The locale() action had no effect either. I’m also a bit reluctant to add a $SortName attribute, because I already have lots of custom attributes and I would only need it for the Attribute Browser.

mwra · July 15, 2018, 4:44pm

I wouldn’t worry in this regard. My documents are big and I often have 20-30 user attributes. They don’t pose a performance problem.

It looks like, as the opening of MB’s last post above suggests, the Attribute Browser may not currently (v7.5.4) be using locale-based sorting.

eastgate · July 15, 2018, 8:06pm

I’ve added an issue to check on the sort criterion; I think it’s quite possible that we’re not using localized sorting here.

Update: Yes, we’re not using localized sorting to sort categories in the attribute browser. (I assume that’s what we’re talking about here?). In principle, we ought to use localized sorting, both to get umlauts and accents right, and also for East Asian languages. In practice, this might be a performance bottleneck; it’s additional work for attribute browser, and attribute browser has its hands full. We’ll have to see!