Tinderbox Forum

Wildcard search in query

Hi there, is there a way to conduct a wildcard search, e.g. $Capabilities.contains(*Password)? I want to search for a specific word, not a complete phrase.

Yes, both String.contains() and String.icontains() allow regular expression (regex) search. However, if testing List or Set type data you can only match whole list items (see more). So, we’ll assume $Capabilities is a String-type attribute and we want a case-insensitive match to the sub-string “password”.

$Capabilities.icontains("password")

Here regex are not needed as we have a literal string and .icontains() deals with case. We don’t need to worry about the other parts of the string. Only if you need to deal with placement of the term (e.g. at the end of the string, or in combination with other characters), or if a back-reference is needed, would you need a regex. This does the same as above but uses more CPU cycles to do so, so there is no gain:

$Capabilities.icontains(".*password.*")

The .* regex just means ‘zero or more characters of any type’ so would still match if the only characters in the target were “password”. In fact, I suspect when you use the first term that is how the app deals with it so my observation on CPU use may be moot! The main point re regex (contains, icontains, replace, etc.) is if you can use a more precise operator—not the case here—such as == or != then try to do so.
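For anyone who wants to check the point about the redundant wildcards outside Tinderbox, here is a minimal sketch in Python. It is only an analogy: Python's re module is not Tinderbox's regex engine, and the helper icontains() below is a hypothetical stand-in, not the app's operator. For patterns this basic the behaviour should be the same.

```python
import re

def icontains(text, pattern):
    """Hypothetical stand-in for a case-insensitive regex 'contains' test."""
    return re.search(pattern, text, re.IGNORECASE) is not None

sample = "Reset Password via email"

# A bare literal already matches anywhere inside the string.
plain = icontains(sample, "password")

# Padding with .* on both sides gives the same answer.
padded = icontains(sample, ".*password.*")

# .* can match zero characters, so a string that is only the term still matches.
exact = icontains("PASSWORD", ".*password.*")

# A string without the term fails either way.
miss = icontains("Reset passphrase", "password")

print(plain, padded, exact, miss)
```

The two patterns differ only in how much work the engine does, not in which strings they accept.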

If you’ve several real phrases to test, as in your “password” above, in an agent you can set $MyString as a Key Attribute in the agent and add your search term there. Then use the query:

$Capabilities.icontains($MyString(agent))

The search term is not quoted as we’re telling Tinderbox to replace our literal or regex match string with the string stored in the $MyString belonging to the agent. Change the agent’s $MyString, you change the query. Neat!

Does that help?

This is perfect. You had me up to the $MyString part. I don’t understand how to do that with an agent.

With an agent? The secret is the use of a designator, in this case ‘agent’. Let’s unwrap the layers. If you use the query:

$Capabilities.icontains($MyString)

instead of checking $Capabilities in each note against a string like “password”, the query reads the value of $MyString in each note it checks and uses that string value for the icontains() test. Cool, except now if you want to check 1,000 notes, you’d have to set the first test string in every note, then set the next test string in all those notes, etc. Way too much work. But if we use:

$Capabilities.icontains($MyString(agent))

for each note checked the value of $MyString in the agent is used as the test value. So, set it once in the agent and all notes get the same test: change the agent’s $MyString, you get a different test, etc.
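The idea of one shared test value can be modelled outside Tinderbox. The sketch below is Python, not action code, and everything in it is an assumption made for illustration: notes and the agent are plain dictionaries, and run_query() is a hypothetical stand-in for the agent query $Capabilities.icontains($MyString(agent)).

```python
import re

# Hypothetical stand-ins: the agent and each note are plain dicts.
agent = {"MyString": "password"}
notes = [
    {"Name": "A", "Capabilities": "Password reset"},
    {"Name": "B", "Capabilities": "Two-factor login"},
    {"Name": "C", "Capabilities": "Stores passwords"},
]

def icontains(text, pattern):
    """Case-insensitive regex 'contains' test (illustrative helper)."""
    return re.search(pattern, text, re.IGNORECASE) is not None

def run_query(notes, agent):
    # Every note is tested against the single value held by the agent.
    return [n["Name"] for n in notes
            if icontains(n["Capabilities"], agent["MyString"])]

first = run_query(notes, agent)

# Change the agent's value once and the same query gives a different result.
agent["MyString"] = "login"
second = run_query(notes, agent)

print(first, second)
```

Nothing is stored on the individual notes, which is the whole point of reading the value from the agent.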

How you set the agent’s $MyString value is up to you - there are so many ways: as a Key Attribute for the agent, via Get Info, QuickStamp, etc. Choose one you like.

It really is that simple.

Thanks. Finally getting it. One question. What if you wanted to search variations of a test, e.g. “Password” and/or “Possward”? Is there a way to do this with the above method, or do you need to create individual agents with the different test criteria you’re looking to test?

One approach is to query both possibilities

$Text.contains("password") | $Text.contains("lösenord")

Another approach might be to devise a regular expression that will match your typos:

$Text.icontains("p.ssw.*rd")

This will match “password” and “Possweerd” but not “putrid”.
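To see why, read the pattern piece by piece: a “p”, then any single character, then “ssw”, then zero or more characters, then “rd”. Here is a quick check in Python, assuming (as is true for patterns this simple) that Python's re module treats it the same way as Tinderbox's engine:

```python
import re

# Case-insensitive, to mirror .icontains()
pattern = re.compile(r"p.ssw.*rd", re.IGNORECASE)

words = ["password", "Possweerd", "putrid"]

# Map each candidate word to whether the pattern is found in it.
results = {w: pattern.search(w) is not None for w in words}

print(results)
```

“putrid” fails because nothing in it supplies the literal “ssw” that must follow the second character.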

Ya, that makes sense. I’ll play around with these ideas along with the string attribute idea Mark shared above. Thanks.

Hey there, sorry to ask, again, but I’m unable to get the wildcard search to work.

If I have the following in $Citation “(Wunderman Thompson, 2019)” and all I remember is “Wunderman” what would be the wildcard query to find notes with this citation?

I’ve tried combinations of $Citation.icontains(“Wun"); or $Citation.icontains(".**Wun.”); or $Citation.icontains(“Wun”);

Clearly, I’m missing something VERY simple.

You didn’t provide the context, but over here, in a small test file, an agent searching for

$Citation.icontains("Wun");

successfully finds the relevant note.

Smart quotes can cause failure sometimes. Make sure you use straight quotes.

Hmm, I tried that and it is just not working for me. $Citation is a string variable. When I enter the above, sadly nothing comes back.

Here is my agent:

image

Here is a list of notes that have the citation:

Again, I want to pull these notes from an agent using the wildcard.

The example query in the image does not contain a wildcard, does it?

Anyway, I cannot reproduce – but we here do not have the file. Perhaps if you sent it to @eastgate for review?

I too can’t replicate this based on the info supplied. IOW, the issue is likely an overlooked, undescribed aspect of the file in question. Thus, it would be useful to see a test TBX file that repeatably shows the error. The test case only needs two notes (a positive and a negative match) and an agent, as it seems likely some unstated element is at play.

Taking another tack, you could try:

$Citation.beginsWith("(Wun");

N.B. this operator only allows literal string matches, so no regex.
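The literal-versus-regex distinction matters for a term like "(Wun", because an opening parenthesis is a regex metacharacter. The Python sketch below shows the difference using Python's own str.startswith() and re module as stand-ins. How Tinderbox itself reacts to an unbalanced parenthesis in a regex is not tested here, so treat this as an illustration of the general principle only.

```python
import re

citation = "(Wunderman Thompson, 2019)"

# Literal prefix test: the parenthesis is just a character.
literal_hit = citation.startswith("(Wun")

# As a regex, "(Wun" opens a group that is never closed, so Python rejects it.
try:
    re.compile("(Wun")
    regex_ok = True
except re.error:
    regex_ok = False

# Escaping the parenthesis makes it a literal again inside a regex.
escaped_hit = re.search(re.escape("(Wun"), citation) is not None

# re.match() anchors at the start of the string, like a 'begins with' test.
anchored_hit = re.match(r"\(Wun", citation) is not None

print(literal_hit, regex_ok, escaped_hit, anchored_hit)
```

So if the term you remember contains punctuation, a literal operator avoids having to think about escaping at all.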

Got it, figured out the problem.

I had my $Citation attribute as a “set” not as a “string”. I created a test file as you suggested and it worked, which led me to figure out my error.

In my live file I changed $Citation from Set to String and it worked. :slight_smile:

I would love to figure out how to search using partial words and wildcards on sets, however. For example, looking for “Priv” to capture any item that has “Privacy” in the $Tags set.

Maybe on our next Saturday call we can go over wildcard searches, including using operators like *, .*, etc.

Thanks.

Michael

.icontains when used with lists or sets does not take regular expressions (i.e., “wildcards”). Just strings.

Set and list comparison requires literals, not regular expressions, to avoid confusing partial matches. If you want to use regular expressions with set and list comparison, coerce the set or list to a string:

Rule: $MyString = $MyList

AgentQuery: $MyList.contains("Frac.*s")

Shouldn’t that query be:

 $MyString.contains("Frac.*s")

I’m not sure but, could this not be done on the fly without needing to pre-store the string-ified list? IOW:

$MyList.format(";").contains("Frac.*s")

So if $MyList holds the values ‘Frantic’, ‘Fractal’ and ‘Frames’, the string tested by .contains() becomes “Frantic;Fractal;Frames”.

Indeed, using the test doc to check @satikusala’s problem, if I make $Citation into a Set and use a query $Citation.format(";").contains("Wun"), it works. Here is the test: Set-check.tbx (79.2 KB). The agent sets the tested note’s $Text simply to prove the target string being tested by the chained .contains() is a string. You don’t have to use a semi-colon for concatenation (i.e. the default list item delimiter), but if choosing a non-letter character don’t choose one with a regex meaning.
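One thing worth seeing in action is what “confusing partial matches” can look like once the list is joined into a single string. The following is a Python model, not Tinderbox code: the list tags, the helper list_contains() and the use of ";".join() in place of .format(";") are all assumptions made for illustration.

```python
import re

tags = ["Frantic", "Fractal", "Frames"]  # stand-in for a Set/List attribute

def list_contains(items, term):
    """Model of whole-item, literal matching on a list or set."""
    return term in items

joined = ";".join(tags)  # "Frantic;Fractal;Frames"

whole_item = list_contains(tags, "Fractal")  # a complete item matches
partial = list_contains(tags, "Frac")        # a fragment does not

# On the joined string the regex finds a match...
found = re.search("Frac.*s", joined)
spanning = found.group(0)

# ...but no single item matches the pattern by itself.
per_item = [t for t in tags if re.search("Frac.*s", t)]

# Excluding the delimiter keeps a match inside one item.
confined = re.search("Frac[^;]*s", joined)

print(whole_item, partial, spanning, per_item, confined)
```

The match on the joined string runs from “Fractal” across the delimiter into “Frames”, which may or may not be what you wanted. Swapping .* for a class that excludes the delimiter, as in the last test, is one way to stop a match crossing item boundaries.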

Expectation management: clearly, if one scales this to using the method on thousands of notes, each with large numbers of list items, and using a complex regex, performance may be less snappy. I’d assume users working at this scale would expect such an effect, but I still think it warrants noting for those as yet only working in small docs where the issue is moot.


Waaay tooo much fun guys. :slight_smile: Thanks. I’d like to add one more learning to this; Mark showed me this a couple of weeks ago.

You can use the designator “agent” so that the query will pull the search term from the agent’s $MyString. Works great. This is easier than messing with the query string. For those interested in playing with this, here is my test file WildcardSearch.tbx (133.4 KB).

Mark and Paul, I do have a question though.

I’m trying to learn more about regular expression search so I went here: https://www.acrobatfaq.com/tb_manual/tinderbo/append_1.html.

The Boost link referenced is broken: http://www.boost.org/libs/regex/doc/syntax_perl.html.

I can’t seem to find the correct replacement link. Can you point me in the right direction?

Also, a stupid question, but what if I wanted to read up more on the basic Tinderbox code syntax? What language is it? JavaScript? Perl? Others?

Tinderbox code is sui generis, with a generous hat tip to C.

I don’t recommend reading a dry manual on regular expressions. Rather, get an app such as Expressions or one of a number of sites you can find via a web search that lets you try out expressions with sample data. Best way to learn regular expressions is hands on. You’re familiar with the drill.


Further to the last, as regex comes in many ‘flavours’ of implementation, in tools like the above use the documentation for the Perl-style choice. I don’t believe there is a canonical reference as such. The basic regex forms seem common to most flavours. It’s only at the more exotic level (that few use) where differences of implementation come into play. Ergo, most general regex references will get you started.

After some years of use, I will say that if you can bear to do so, you get much more out of regex by taking time to see what they do and not simply copying and pasting sections of ‘magic’ code. Regex are very precise, and in a manner often more granular in execution than our assumptions in writing them. Don’t worry about the odd-looking syntax but just remember ‘garbage in, garbage out’. If it doesn’t work, esp. in a small localised test (i.e. no/less scope for edge cases), it likely means you haven’t yet found the edge case.

Very cool, and only $4.99. Just bought it and tried it out. :slight_smile: This will be a great way to learn.
