Hi all!
I have a note named after an author (for example, F.W. Taylor). How can I make it link automatically to all notes that mention this author? The mention may not be the full name: it could be only the last name, or the surname in another language.
I did something like that earlier with the code from the tutorial "Tinderbox Training Video 51 - Linking With Tinderbox".
That code linked notes by matching the sensor attribute against the note name.
Here's the code:
First, beware of noisy data. In free text, we might have:
“F. W. Taylor wrote The Principles Of Scientific Management.
Frederick Winslow Taylor was born in 1856.
Professor Brandeis called on Fred Taylor, who had recently enrolled in his course.”
Second: there are new tools in Tinderbox 9 to help with this task, so a short wait might not be wasted.
$NLNames extracts names from text and should be helpful. In fact, you might use $NLNames much as you use $sensor, first checking that the destination exists, or perhaps that the destination is one of the names of interest.
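To make the "check that the destination exists" idea concrete, here is a minimal sketch in Python (not Tinderbox action code). It assumes you already have a list of extracted names, standing in for whatever $NLNames returns, and a hand-maintained table of name variants. The `ALIASES` table and both function names are hypothetical, introduced only for illustration.

```python
# Hypothetical alias table: canonical note name -> spellings seen in free text.
ALIASES = {
    "F. W. Taylor": ["F. W. Taylor", "Frederick Winslow Taylor", "Fred Taylor"],
}


def _normalise(name):
    """Collapse runs of whitespace and ignore case when comparing names."""
    return " ".join(name.split()).casefold()


def canonical_name(candidate, aliases=ALIASES):
    """Return the note name a candidate refers to, or None if unknown."""
    wanted = _normalise(candidate)
    for note_name, variants in aliases.items():
        if any(_normalise(v) == wanted for v in variants):
            return note_name
    return None


def link_targets(extracted_names, aliases=ALIASES):
    """Keep only names that map to an existing note, without duplicates."""
    targets = []
    for name in extracted_names:
        note = canonical_name(name, aliases)
        if note is not None and note not in targets:
            targets.append(note)
    return targets
```

Names with no entry in the table (such as "Professor Brandeis" here) are dropped rather than linked, which is the point of checking the destination first.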
Dear Mark, thank you for your response. I will wait for Tinderbox 9.
Unfortunately, $NLNames does not work with Russian text (at least for me). And with English it does not always work, for example:
@eastgate posted while I was drafting, but as this is done…
Firstly, I really concur with the point about messy data. Make sure that what you are matching is what you mean to match.
For instance, you might do well to run one or more agent queries and add a $Tags value, or a user attribute value of some sort, to those articles where a link is actually needed. More work? Yes, but the result is cleaner, so the extra work is arguably repaid by the time you would otherwise lose later following inappropriate links. Even more so if you are going to use further automation based on the existence of those links. Garbage in, garbage out. Computers struggle with nuances obvious to the human brain, and vice versa.
Below are some approaches you can use at present, but I agree with @eastgate that better is to come soon in v9.
OK, so you want note “F. W. Taylor” to link to (from?) the original of any note whose $Text includes the (exact) substring “F. W. Taylor”. Some thoughts:
Context of use. As you need to use a regex search on $Text, you don’t want this running in a rule or a full-on agent. For an action, I’d use a stamp (run once per selected note per application) or an edict. With an agent, I’d run it and then turn the priority down or delete it. Why delete the agent rather than turn it off? Turning it off retains all the aliases, and their rules and edicts may still run. Thus you may have 100s of aliases still pointlessly evaluating their rules/edicts.
Agent approach. Make an agent to query $Text.contains("F\. W\. Taylor"). The escapes are needed because a full stop is a regex special character that matches any character. Better would be to store “F. W. Taylor” in $MyString in the agent and make it visible as a Displayed Attribute. Then query $Text.contains($MyString(agent)). This makes re-use of the agent much easier. Then the agent action is linkFromOriginal(this, "cites"); you can choose the To/From direction as suits.
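To see why the escapes matter, here is a small sketch using Python's `re` module rather than Tinderbox action code. Regex dialects differ in detail, but an unescaped full stop is a wildcard in both. The helper name `mentions` and the sample strings are made up for illustration.

```python
import re


def mentions(text, name, literal=True):
    """Report whether `name` occurs in `text`.

    With literal=True the name is escaped, so each full stop only matches
    a real full stop. With literal=False the name is used as a raw regex,
    where "." matches any single character.
    """
    pattern = re.escape(name) if literal else name
    return re.search(pattern, text) is not None


name = "F. W. Taylor"
exact = "F. W. Taylor wrote The Principles Of Scientific Management."
lookalike = "See FX WY Taylor for details."

print(mentions(exact, name))                     # escaped pattern, real mention
print(mentions(lookalike, name))                 # escaped pattern, lookalike text
print(mentions(lookalike, name, literal=False))  # unescaped pattern, lookalike text
```

The unescaped pattern accepts the lookalike string because each "." happily matches "X" or "Y", which is exactly the kind of false link the escaped query avoids.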
Action approach. Here you have to encapsulate the query in the action. So for a note with $Name “F. W. Taylor”, it could be stamped like so:
Of the two approaches, the action stamp has more flexibility. I’d not use it in an edict or rule as you’d be running a lot more .contains() operations than needed.
Thank you very much Mark. I will try your approach until the new version comes out. After your clarification I will use the stamp.
If I may, I have a question to help me understand: in the construction $Name(that), does “that” mean everything that is written in this attribute? What does “this” mean for a given note? And why “aPath” and not “Path”: what is the “a” for?
I believe the Tinderbox processes feeding such tags leverage underlying Apple frameworks, so language support, or the depth/accuracy of NLP, can vary by language. English-language work (or study) has a significant advantage here, not least because there are bigger training corpora and it is where much of the research started (even if the research is done in non-English-speaking nations). A partial explanation might be this comment I saw:
highly-inflected languages (e.g. languages with a lot of grammatical cases, like Latin, Russian, or Finnish) may perform poorly without lemmatization (using the “dictionary form” of words, versus whatever inflected form is actually present in the text), especially for smaller text corpora
I wonder if the app needs some locale prompt to know that the content's language may differ from the OS or other locale setting. In other words, the app/OS thinks the text being processed is in language X, but the user knows it is Y.
Again some improvements (hopefully!) are on the way for v9.
My apologies. The term “that” is actually a designator that tells find() to get the value from the calling note (the one being stamped) and not from the note currently being tested by the query. I’ve written an explanation here.
I think that the situation with NLNames and Russian will also improve in Tinderbox 9, though I’m not sure how well trained the neural networks are in Russian. My experience is that they’re remarkably good at inflection, though of course names are also incredibly tricky.