Agent searching for text using boolean operators and regex

jfontana · July 14, 2021, 10:38am

I tried to make the title as descriptive as possible, but here’s the problem I have. It’s a bit strange.

So, I created an agent that in principle should gather all the notes containing the words ‘Drinka’ and ‘Persian’. If I construct the query as follows, it doesn’t yield any results.

$Text.contains("Drinka") & Text.contains("Persian");

If I construct it in any of these ways, though, I do get the expected results (there is a single note that contains both words):

$Text.contains("\bDrinka\b") & Text.contains("Persian");
$Text.contains("Drinka") & Text.contains("\bPersian\b");

I’m really puzzled. Why would indicating beginning/end of word with regular expressions make any difference? Why does one even need to use regex in this case if the strings in the query actually happen to match existing strings in the texts?

eastgate · July 14, 2021, 1:38pm

You don’t want the semicolon at the end of the query, but I don’t think that’s the problem.

To diagnose head scratchers like this, a good rule is: simplify the problem until it cannot possibly fail, and then test that.

So, I’d start with this query:

$Text.contains("Drinka")

Does it find your expected note? Does it find other notes that it ought to find?

Then, try $Text.contains("Persian"). If both look to be OK, try putting the query together:

$Text.contains("Drinka") & $Text.contains("Persian")
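The same narrowing-down can be rehearsed outside Tinderbox. Here is a minimal Python sketch, assuming that a regex search behaves roughly like .contains; the notes dictionary and the matching_notes helper are hypothetical stand-ins, not anything Tinderbox provides:

```python
import re

# Hypothetical stand-in for a Tinderbox document: note name -> note text.
notes = {
    "A": "But this is questioned by Drinka: Language Contact in Europe",
    "B": "Avestan and Old Persian Morphology",
    "C": "Source: Old Persian Morphology. But this is questioned by Drinka:",
}

def matching_notes(notes, *patterns):
    """Return the sorted names of notes whose text matches every pattern."""
    return sorted(
        name for name, text in notes.items()
        if all(re.search(p, text) for p in patterns)
    )

# Steps 1 and 2: test each term on its own.
drinka_only = matching_notes(notes, "Drinka")
persian_only = matching_notes(notes, "Persian")

# Step 3: only then combine the two terms.
both = matching_notes(notes, "Drinka", "Persian")

print(drinka_only, persian_only, both)
```

If either single-term result is already wrong, the problem lies in that term and not in the way the two are combined.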

jfontana · July 14, 2021, 4:37pm

Actually, I think I know what the problem was but I’m still a bit confused. The following is the only note that contains the words Persian and Drinka. I think the search didn’t work because there is a colon right after Drinka:

Source: Oktor Skjærvø_Avestan and Old Persian Morphology.pdf

But this is questioned by Drinka: Language Contact in Europe: The Periphrastic Perfect through History:

Now, since there is the colon, I can see why the string “Drinka” would not be detected with .contains (it would with .icontains) and how adding \b can help. What I don’t understand is why it works if \b is added at the beginning and end of “Persian” but not at the beginning and end of “Drinka”.

As for your comment about the semicolon, does that apply only to queries involving .contains? I’m asking because using a semicolon at the end of the query involving .icontains does seem to have some effect. The agent detects this note if I do:

$Text.icontains("drinka") & Text.icontains("persian");

but not if I do:

$Text.icontains("drinka") & Text.icontains("persian")

With .contains, it is totally the opposite and that’s one of the reasons my query didn’t work.

eastgate · July 14, 2021, 7:36pm

Queries should not end with a semicolon.

I made a new document. I added note A, and pasted your text into its text field.

I made an agent with the following query:

$Text.contains("Persia") & $Text.contains("Drinka")

The agent does indeed locate note A.
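For anyone who wants to check the matching logic independently, here is a rough sketch using Python's re module as an analogue. It assumes that .contains is a case-sensitive regex search and .icontains a case-insensitive one; the contains and icontains helpers below are illustrative, and Tinderbox's own regex engine may differ in details:

```python
import re

# Text of the note quoted earlier in the thread.
text = (
    "Source: Oktor Skjærvø_Avestan and Old Persian Morphology.pdf\n"
    "But this is questioned by Drinka: Language Contact in Europe: "
    "The Periphrastic Perfect through History:"
)

def contains(text, pattern):
    """Case-sensitive regex search (assumed analogue of .contains)."""
    return re.search(pattern, text) is not None

def icontains(text, pattern):
    """Case-insensitive regex search (assumed analogue of .icontains)."""
    return re.search(pattern, text, re.IGNORECASE) is not None

# A plain pattern matches anywhere in the text, so the colon after
# "Drinka" does not get in the way.
plain = contains(text, "Drinka") and contains(text, "Persian")

# \b matches between a word character and a non-word character,
# so it also matches between "a" and ":".
bounded = contains(text, r"\bDrinka\b") and contains(text, r"\bPersian\b")

# Letter case is what separates the two operators.
lower_cs = contains(text, "drinka")
lower_ci = icontains(text, "drinka") and icontains(text, "persian")

print(plain, bounded, lower_cs, lower_ci)
```

Under these assumptions the plain and word-boundary forms both find the note, and only the lowercase pattern needs the case-insensitive search.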