Tinderbox Forum

Cleaning up $Name of Imported filenames with extra (unwanted) characters

Hi
I have quite a few notes whose names need cleaning up: there is some white space and a stray character before the title begins. I have successfully identified my notes, but I have not used the replace command to change names before.

Here is what I am trying to do:
" • Name1" to Name1
" • Name2" to Name2

I have successfully used an agent to identify my notes but I am getting stuck on the Action code for the agent.

Here is what my Query to find these notes looks like:
Query:
$Name.icontains("^\s+•\s+\w+")

Action? I think I will need to use the replace command to remove the extra characters but this is where I am getting stuck.

Thoughts?
Thanks in advance
Tom

You want a back-reference to the $Name in the query, so amend it thus:

$Name.icontains("^\s+•\s+(\w+)")

The extra parentheses are the back-reference markers. Now the agent action is:

$MyString=$1;
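
For anyone who wants to check the capture outside Tinderbox, here is a minimal sketch in Python. It assumes Python's `re` module behaves like Tinderbox's regex engine for this pattern, which is an assumption and not something confirmed in this thread; `first_capture` is a made-up helper name.

```python
import re

# Python's re module stands in for Tinderbox's regex engine (an assumption).
# The parentheses around \w+ create the first capture group, the analogue of $1.
PATTERN = re.compile(r"^\s+•\s+(\w+)", re.IGNORECASE)

def first_capture(name):
    """Return the text captured by the first pair of parentheses, or None."""
    match = PATTERN.search(name)
    return match.group(1) if match else None

print(first_capture("\t•\tName1"))  # leading tab, bullet, tab, then the name
print(first_capture("Name2"))       # no bullet prefix, so no match
```

A name without the bullet prefix simply does not match, which mirrors the agent ignoring notes that fail the query.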

Thanks Mark. The explanation and code work perfectly. Learned something today :)

Comment: this is about the back-reference example given in aTbRef, shown below. $0 and $1 make sense, but $2 is confusing to me. I only mention this in case you want to update it or add another example for others.

 query:  `$Text.contains("email: (\w+)<(.+)>")`
 If the whole referenced $Text were:
 `Source email: John, on 24/03/2010`  

 …then the above query gives these back-references:

 $0:  `email: John`

 $1:  `John`

 $2:  `johndoe@example.com` 

As always, many thanks
Tom

[Edit: I’ve updated the page on back-references. I need to cross-refer things better, but the update is a first stab at a better aTbRef article, for now]

Agree, and I just spotted a typo as well (‘were’ => ‘where’). Grr. In the interim, for a query:

`$Text.contains("email: (\w+)<(.+)>")`

Back-reference $0 is whatever the whole pattern "email: (\w+)<(.+)>" matches, i.e. the whole of the matched (sub-)string of the source string (in this case, $Text). The other back-references, $1, $2, etc., are always a part of what is matched by $0.

So $1 returns the contents of the first pair of matched parentheses within the overall query expression, reading left to right.

Thus $1 is a string holding the text matched by (\w+), without the parentheses!

$2 is a string holding the text matched by (.+).

$3 returns nothing as there is no third pair of back-reference generating parentheses. However, Tinderbox does allow up to 9 back-references ($1 to $9) within a query string.
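
The numbering can be checked with a small Python sketch. Note the assumptions: Python's `re` stands in for Tinderbox's regex engine, and the sample text is hypothetical. The text quoted earlier in the thread has no angle-bracketed address, so one is assumed here to agree with the listed $2 value.

```python
import re

# Hypothetical source text: the <...> address is an assumption, chosen to be
# consistent with the $2 value listed in the thread.
text = "Source email: John<johndoe@example.com>, on 24/03/2010"

match = re.search(r"email: (\w+)<(.+)>", text)

whole = match.group(0)             # analogue of $0: the entire matched substring
first = match.group(1)             # analogue of $1: what (\w+) matched
second = match.group(2)            # analogue of $2: what (.+) matched
group_count = len(match.groups())  # number of capture groups in the pattern

print(whole, first, second, group_count)
```

Here both captures sit inside the substring returned for the whole match, and the pattern defines exactly two groups, so there is nothing for a third back-reference to hold.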

I need to check, but while back-references can be nested I’m not sure if they can overlap (i.e. one reference opens before another closes).
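
On nesting versus overlap: in regular expression syntax a closing parenthesis always closes the most recently opened group, so groups can nest but an overlap cannot actually be written. The Python sketch below shows how nested groups are numbered by the position of their opening parenthesis. That Tinderbox numbers nested groups the same way is an assumption worth confirming in a test file.

```python
import re

# Nested groups: numbering follows the order of the opening parentheses,
# so the outer group is 1 and the two inner groups are 2 and 3.
match = re.search(r"((\w+)@(\w+)\.com)", "johndoe@example.com")

outer = match.group(1)  # the whole address, from the outermost parentheses
user = match.group(2)   # first inner group, opened second
host = match.group(3)   # second inner group, opened third

print(outer, user, host)
```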

If in doubt about mapping $-numbers to the correct part of the query match, the answer (as with most action code conundrums) is to make a small test file so you can concentrate just on the problem at hand without confounding effects from the rest of your actual work file.

Back-references can be used with agent queries, find() queries, and .replace(findStr, replaceStr), where the back-references are set in findStr and used in replaceStr. Back-references can also be used in macros, both action code do() and export code ^do()^.
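
As a rough illustration of setting back-references in the find string and using them in the replacement string, here is the same idea in Python. This is a sketch under assumptions: Python spells the references `\1` and `\2` where Tinderbox uses $1 and $2, and the sample text (with its angle-bracketed address) is hypothetical.

```python
import re

# Hypothetical sample text; the <...> address is assumed for illustration.
text = "Source email: John<johndoe@example.com>, on 24/03/2010"

# The two groups are captured in the find pattern and reused in the replacement.
rewritten = re.sub(r"email: (\w+)<(.+)>", r"\1 (\2)", text)

print(rewritten)
```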

[Edit: I need to check on the find() context]

Hopefully that fills in a few blanks! :)

One other thing I was reminded of when setting up to test your problem is that when manually editing $Name (e.g. in the view or in the title box on the text pane) Tinderbox will generally strip leading/trailing whitespace.

Your example used a leading tab+bullet+tab string. I copied that from your post to a new file. But when I selected that sub-string in the text pane title box, in order to see what the whitespace was, the initial (whitespace) tab was deleted as soon as the box lost focus.

I had to set it back by using a stamp in order for your query to work. This:

$Name.icontains("^\s+•\s+\w+")

must have 1 or more whitespace characters before the bullet. If you are accidentally (as above) triggering removal of leading whitespace, consider this:

$Name.icontains("^\s*•\s+\w+")

… which allows zero or more whitespace characters before the bullet, but is otherwise the same. Regular expressions are infuriatingly precise, in ways that at first sight seem counter-intuitive: we mentally discard odd edge cases in a way the regex code cannot do.
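
The difference between the two quantifiers is easy to see in a quick Python sketch (again assuming Python's `re` matches Tinderbox's behaviour for these patterns). The sample names are made up.

```python
import re

strict = re.compile(r"^\s+•\s+\w+")   # needs at least one leading whitespace character
lenient = re.compile(r"^\s*•\s+\w+")  # leading whitespace is optional

with_tab = "\t•\tName1"  # leading tab still present
stripped = "•\tName1"    # as if the leading tab had been trimmed away

for label, name in (("with_tab", with_tab), ("stripped", stripped)):
    print(label, bool(strict.search(name)), bool(lenient.search(name)))
```

Only the lenient pattern still finds the name once the leading tab has been removed.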

Follow up question: $Name.icontains("^\s*•\s+(\w*+)") works perfectly for my Name1 in the above example but…what would work better would be just to strip out the " • " and keep the rest of the titles of varying length IOW…simply put…

" • This is the name of the complete title" -> “This is the name of the complete title”

IOW…I want to strip out the " • " from the beginning of the title and keep everything else.

Thoughts?

Thanks in advance.
Tom

OK try this, in a test file. Don’t use it on real data until you’re happy. Query:

$Name.icontains("^\s*•\s+(.+)$")

Action:

$Name = $1;

Some trivial tests in v8.2.1 suggest this works, but please test before trying it on your real data: the action replaces the existing $Name data and, if it goes wrong, you can’t undo it. So, make sure you work on a copy of your data.
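
To see why (.+)$ keeps the whole title where (\w+) keeps only the first word, here is a small Python sketch. It assumes Python's `re` behaves like Tinderbox's engine for these patterns; `cleaned` and the sample names are hypothetical. In the spirit of working on a copy, it builds a new list and does not change the original.

```python
import re

WORD_ONLY = re.compile(r"^\s*•\s+(\w+)")    # captures just the first word
WHOLE_TITLE = re.compile(r"^\s*•\s+(.+)$")  # captures everything after the bullet

def cleaned(name, pattern):
    """Return the first capture if the pattern matches, else the name unchanged."""
    match = pattern.search(name)
    return match.group(1) if match else name

original_names = [" • This is the name of the complete title", "Untouched note"]

# Build a new list so the original names stay intact for comparison.
new_names = [cleaned(name, WHOLE_TITLE) for name in original_names]

print(cleaned(original_names[0], WORD_ONLY))
print(new_names)
```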

Many Thanks Mark.

Worked perfectly. Thank you for reminding me of some regex details I had not put together on my own. Nice.

All the best, Tom