Searching for, and copy/pasting, words with accents

Hello,

A simple question, I hope.

When searching Tinderbox, it will not find Bártok (with an accent) when I search for Bartok (without an accent). Is there a way to make the search less fussy?

Also, when copying and pasting text with accents (from outside Tinderbox), I seem to get odd extra characters and no accents, i.e. Ditta Pásztory comes up as Ditta Pásztory. This does not happen when pasting into the other apps I tried.

Many thanks for any advice,

Best wishes

Thomas

Lots of (unintended) ambiguity here. Also, we don’t know your OS or app version, and these could be contributory factors.

I can’t replicate the error, testing in v11.6.0†, on macOS 26.3.1, using en-GB (British English) locale. Firstly, the type of search:

  • Find (view pane).
    • Search term “Bártok” works (using default ‘Text’ match choice). Tested $Name and $Text separately as the target attribute.
    • Search term “Bártok” works (using ‘Regular Expression’ match choice). Tested $Name and $Text separately as the target attribute.
    • Query $MyString.icontains("Bártok") works (using ‘Tinderbox Expression’ match choice).
    • Query $MyString=="Bártok" works (using ‘Tinderbox Expression’ match choice).
  • Find (Text pane, $Text area).
    • Search term “Bártok” works (using ‘contains’ option).
  • Filter (view pane, some views only).
    • In outline, $Text.icontains("Bártok") works.
  • Agent query / query in action code. (If so what operator(s)?)
    • Query $Text.icontains("Bártok") works.
    • Query $Name.icontains("Bártok") works.

Also target attribute:

  • note title ($Name), or text ($Text), or other attributes?
    • =="Bártok" works in action code.
    • .icontains("Bártok") works in action code.

When copying/pasting, what is the source app/format? If the app copying to the clipboard provides incorrect or incomplete encoding info, the receiving app is forced to guess and may do so incorrectly. Ditta Pásztory definitely looks like the result of mistranslated encoding info. But without more info as to the source it is hard to tell.
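
As a rough illustration of how a wrong encoding guess produces "odd extra characters", here is a minimal Python sketch. It assumes, purely for illustration, that the text travelled as UTF-8 bytes and was decoded as Latin-1 or Mac Roman by the receiver; nothing in this thread establishes which encodings were actually involved.

```python
# Illustration only: the encodings really involved in the thread are unknown.
original = "Ditta Pásztory"

# The sender writes the text as UTF-8 bytes; "á" becomes two bytes.
raw = original.encode("utf-8")

# A receiver that wrongly guesses a single-byte encoding turns each of those
# two bytes into its own character: the accent vanishes, extra characters appear.
as_latin1 = raw.decode("latin-1")
as_mac_roman = raw.decode("mac_roman")

print(as_latin1)      # Ditta PÃ¡sztory
print(as_mac_roman)   # Ditta P√°sztory

# If the wrong guess is known, the damage can be reversed.
repaired = as_latin1.encode("latin-1").decode("utf-8")
print(repaired == original)  # True
```

The garbled string is one character longer than the original, which matches the "extra characters and no accents" symptom in general terms.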

Generally, the OS and app are (should be!) Unicode capable, but if you are not using a current OS or app version, that should be considered a possible factor. If I recall correctly, some diacriticals have several possible Unicode encodings (some correct, others similar looking‡), which can confuse transcription. This is especially true of digital material that pre-dates c.2000.
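
To make the "several possible encodings" point concrete, here is a small Python sketch using the standard `unicodedata` module. It shows one well-known case (precomposed versus combining-accent forms); whether that particular case plays any part in the problem reported here is not known.

```python
import unicodedata

composed = "\u00e1"      # "á" as a single precomposed code point
decomposed = "a\u0301"   # "a" followed by a combining acute accent

# Both render as "á", but they are different code point sequences.
print(composed == decomposed)           # False
print(len(composed), len(decomposed))   # 1 2

# Normalising both to the same form (NFC here) makes them compare equal.
nfc_a = unicodedata.normalize("NFC", composed)
nfc_b = unicodedata.normalize("NFC", decomposed)
print(nfc_a == nfc_b)                   # True
```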

The OS locale may also be an overlooked factor. For most people, their system locale and their working language align. But for some, especially academics, this is simply not possible. IIRC, at one point we had a scholar studying cuneiform, writing in English and German on a Swedish-locale Mac (presumably they were Swedish, which they used for all other general work). So, if different languages are part of your work, consider locale.

The English language (in its many flavours: BrE, AmE, etc.) is rather lackadaisical about diacritics. This is chauvinism, as English doesn’t natively use accents except for loan words, and over time those also lose their accents (résumé→resumé→resume). Unfortunately, English also happens to be the predominant language of coding, and assumptions tend to be based on experience/perspective … what could possibly go wrong?

†. Actually the current beta, but there are no changes since v11.6.0 release that should affect the above.

‡. For instance, the abbreviation for ‘foot’ is actually correctly the prime symbol (U+2032) but people often use the single straight quote (ASCII apostrophe) ' (U+0027) instead.

Dear Mark,

Many thanks. I will try to give a precise example.

If I use the search term “Bartok” (no accent) with the Find command (Command-F), it will not find a note with the name “Bártok” (with accent). I can see why one might want to be precise, but I would like to be able to find all the versions of Bartok, with and without accents.

Maybe there is a setting somewhere that I don’t know about. My Mac settings are Region: United Kingdom and Language: English (UK).

Regarding copying text with accents from an outside app into Tinderbox, I will see what I can do. The text comes from a note in the “thebrain” app. If I paste it into most other apps (including this forum) it seems to work fine, but the Tinderbox note has the extra characters added.

All the best

Thomas

Ah. I was perhaps testing the wrong thing: whether search/find correctly handles an accent supplied in the input string. Unicode support has made that issue go away (bar the encoding edge cases mentioned).

The issue here is you are assuming there is some automatic cross-mapping of ‘Bartok’ to ‘Bártok’, i.e. an ‘accent-insensitive’ search in the manner of case-(in)sensitive search. At the scale of Google or a large LLM, this cross-mapping might occur. At the OS level, the large mapping may not be used (space/cost?). Indeed, we see this in Apple’s TextEdit app:

The only mention of the target name uses the accented version and, sure enough, TextEdit offers zero matches for ‘Bartok’. The same happens in reverse: if I search for ‘Bártok’ I get zero matches to ‘Bartok’. Even if I add Bártok as a known spelling, that only affects the red misspelling markers in TextEdit and not search.

So there is no error or bug here. Frustratingly, the issue is simply that your assumption about accent/non-accent character mapping is false in this context. There is no right/wrong or blame here, just an unwanted outcome! I’d assume that, as the app is building off Apple frameworks, the latter is where your desired ‘fix’ is needed, if such is even possible. @eastgate may correct me here, as I can’t see inside the app.

We might be able to offer a search that ignores diacritics. Whether or not that’s wise is, of course, a matter of some debate.
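
For readers wondering what "a search that ignores diacritics" could look like in principle, here is a minimal, locale-blind Python sketch. The helper names `fold` and `loose_contains` are hypothetical, and this is not a description of how Tinderbox or Apple's frameworks implement (or would implement) such a search.

```python
import unicodedata

def fold(text: str) -> str:
    """Build a case- and diacritic-insensitive key (illustration only)."""
    # Decompose accented letters into base letter + combining mark...
    decomposed = unicodedata.normalize("NFD", text.casefold())
    # ...then drop the combining marks, keeping only the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def loose_contains(haystack: str, needle: str) -> bool:
    """True if needle occurs in haystack, ignoring case and diacritics."""
    return fold(needle) in fold(haystack)

notes = ["Bártok", "Bartok", "Ditta Pásztory"]
hits = [n for n in notes if loose_contains(n, "bartok")]
print(hits)  # ['Bártok', 'Bartok']
```

Folding both the query and the target means the match works in either direction: an accented query also finds unaccented text.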

Well, that would be great. My most straightforward reason for this is that it’s easier to type a word without diacritics on my keyboard, and so that would make the search easier and quicker. The second is that often the notes I’m searching mix up words with diacritics and without, so a fuzzier search is helpful. I’m sure that the code in Tinderbox would let me do this, but I’m not an intuitive code user, so it would be great to have the search built in.

The parallel of case-insensitive search seems like a good one.

I wonder what people using other-than-English in Tinderbox feel about this.

Many thanks for considering it.

Best

Thomas

In many languages, the question is vexed, and it often has a political dimension. I imagine the Académie française has strong opinions. The Ukrainian “i”, the Turkish “ı”, and Cyrillic Polish come to mind. If I’m not mistaken, Norwegian å vs aa used to have a political valence, which is why Aarhus University comes to be located in Århus.

This used to be very hard to do, because it requires a lot of localization just to understand what is a diacritic mark and what is a distinct letter. Apple now does a pretty good job of it; at least, I don’t see many complaints.
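
A short Python sketch of why "what is a diacritic mark and what is a distinct letter" is not purely mechanical. It applies a naive, locale-blind rule (decompose, then drop combining marks) to a few letters; the rule is an assumption chosen for illustration, not what Apple's frameworks do.

```python
import unicodedata

results = {}
for ch in ["é", "å", "ı", "ø"]:
    decomposed = unicodedata.normalize("NFD", ch)
    # Keep only the characters that are not combining marks.
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    results[ch] = base
    print(ch, "->", base)

# é -> e   (accent on a base letter)
# å -> a   (treated as "a" plus a ring, though it is a distinct letter in
#           several Scandinavian alphabets, and "Århus" would fold to
#           "Arhus", not "Aarhus")
# ı -> ı   (Turkish dotless i has no decomposition, so it is left alone)
# ø -> ø   (likewise no decomposition)
```

So a mechanical rule both over-folds (å) and under-folds (ı, ø) compared with what a speaker of a given language might expect, which is where localization comes in.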

A key question is, what would we call this in the search menu?

In terms of ease of typing letters with diacritics (accents), often overlooked is that, rather than remembering shortcuts or using the OS’ Keyboard Viewer, if you press and hold a letter key, a pop-up shows all the variants and you press the number for the one you want.

Here, I type a and hold the key down:

  a

I assume the options shown may vary depending on your OS locale and keyboard in use. FWIW, the above is UK (en-GB) locale and with an Apple Extended keyboard, UK variant. The OS also allows you to ‘install’ a number of languages and swap (I think this actually changes the OS locale) to aid typing in some other languages, though the physical keyboard doesn’t change.


Mapping from accented to accent-less characters seems much easier than the reverse. So, á is ‘A-acute’ and then getting to the ‘bare’ a is easy. In contrast, knowing all accented characters is more complex without a stored mapping. As noted above, the offering shown for ‘a’ might be only those considered pertinent in the current locale. Plus, some scientific fields re-purpose letters with a specific meaning and may have a discrete Unicode code point for that. Thus we may have multiple different numeric codes for characters that visually look the same but to the computer, in code, are not. As demonstrated up-thread, Apple’s spellchecker variant as used in TextEdit will not treat an accented and unaccented ‘a’ as the same in search. Apple’s spellcheckers for its AI may well have a more sophisticated character map.

As a user, I realise the natural reaction is “I don’t care about the detail, it should just work”. But how do we define ‘just work’? The more I learn about the underpinnings here, the more I’m amazed so many things even work at all. 🙂
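
The asymmetry described above can be sketched with Python's standard `unicodedata` module. This is only an illustration of the general point; the scan range and the "starts with U+0061" rule are simplifying assumptions.

```python
import unicodedata

# Forward direction: the character database already records that
# "á" is an "a" plus an acute accent.
name = unicodedata.name("á")
parts = unicodedata.decomposition("á")
print(name)    # LATIN SMALL LETTER A WITH ACUTE
print(parts)   # 0061 0301  (U+0061 "a", U+0301 combining acute)

# Reverse direction: there is no per-character shortcut, so a table has to be
# built by scanning. Here: every code point in the Basic Multilingual Plane
# whose canonical decomposition starts directly with "a" (U+0061).
variants = [
    chr(cp)
    for cp in range(0x10000)
    if unicodedata.decomposition(chr(cp)).startswith("0061 ")
]
print("á" in variants, "à" in variants, "a" in variants)  # True True False

# Look-alikes with separate code points: the micro sign and Greek mu.
micro, mu = "\u00b5", "\u03bc"
print(micro == mu)                                  # False
print(unicodedata.normalize("NFKC", micro) == mu)   # True
```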

Having pondered this some more, I’m inclined to make search diacritic-insensitive by default.

On the other question:

What is the source of the text you are pasting? This sounds like something is confused about its text encoding. This used to be common on the Web, though it’s less frequent nowadays.

Many thanks, got it!

I do use the long-press and then type a number thing, but it still interrupts the typing rhythm and is an awkward bit of navigation. Still, it is nice to see the list, and many thanks for sharing it.

Best

Thomas

Brilliant, thank you. I too have been pondering it. Making it the default sounds by far the easiest!

Best

Thomas

Re where the source was - it was from a ‘thought’ in “the brain” app. The thing is that pasting the same copied text into other apps seems to be fine, so I was curious about a way to get around it (other than copying into another app and then into Tinderbox, which seems to work).

Best

Thomas

That’s odd. Could you send me a test document from The Brain that has an example of something that doesn’t paste properly?

Well, I will try. Here is a link to a thought with an example of troublesome text. (Using thebrain’s new ‘share a thought’ function; maybe it won’t work.) When I look at the text that thebrain is storing in its folder, using TextEdit, the text looks fine, so it must be something added during copy and paste.

But, good news: I find that if I press command+control+shift+v to paste (i.e. without formatting) it works. So that is good for the moment.

Best

Thomas

I cannot copy from that web page.