Regex beginning anchor character ignored/invalid?

I’m using the following expression which works to find a birthdate in $Text.
$Text.extract("Born: (.*)")
However, the following does not work, even when "Born: " is at the beginning of a line.
$Text.extract("^Born: (.*)")
The regex is certainly valid, so why would it not be working?

This works for me:

$MyString = $Text.extract("Born: .*");

I’m not sure what the extra parentheses are for in your example but they seem to be the issue.

See String.extract(regexStr[, caseInsensitiveBln]).

If you want what follows "Born: ", then use stream parsing:

$Text.skipTo("Born: ").captureLine("MyStringA"); 

See: String.skipTo(matchStr) and String.captureLine([targetAttributeStr]).
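For readers unfamiliar with stream parsing, the idea behind the skipTo/captureLine chain can be sketched outside Tinderbox. This is plain Python, not action code: the helpers skip_to and capture_line are hypothetical stand-ins, the sample text is made up, and returning an empty string when the marker is missing is an assumption rather than documented Tinderbox behaviour.

```python
def skip_to(stream: str, marker: str) -> str:
    """Hypothetical analog of skipTo: return the text after the first marker."""
    index = stream.find(marker)
    if index == -1:
        return ""  # assumption: nothing left to parse if the marker is absent
    return stream[index + len(marker):]


def capture_line(stream: str) -> str:
    """Hypothetical analog of captureLine: text up to the next line break."""
    return stream.split("\n", 1)[0]


# Made-up sample text; "Born: " is not on the first line.
text = "Name: Ada Lovelace\nBorn: 1815\nDied: 1852"
born = capture_line(skip_to(text, "Born: "))
print(born)
```

No regex anchors are involved, so it makes no difference where in the text the "Born: " line sits.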

A clarification: in a multi-line/paragraph string like $Text, .extract() tries matching line by line and then returns the remainder of that line.

If the whole line with the match were “Assumed Born: 1924”, i.e. not starting with "Born", the result of:

$MyString = $Text.extract("Born: .*");

…is still a match, but from the start of the match string to the end of the line. IOW, the “Assumed” part of the line is not captured.

In case you’ve multiple matches for that pattern, consider String.extractAll(regexStr[, caseInsensitiveBln]).
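As a rough illustration of both points (an unanchored match starts at the pattern, not at the start of the line, and there may be several matches), here is a sketch in Python's re module. It is an analogy only: the sample text is made up, and the thread does not confirm that .extractAll() returns its results in exactly this form.

```python
import re

# Made-up sample text with two lines that contain the pattern.
text = "Assumed Born: 1924\nNotes\nBorn: 1930"

# First match only: starts at "Born: ", so "Assumed " is left out,
# and .* stops at the end of that line.
first = re.search(r"Born: .*", text).group(0)

# Every match in the text, in order of appearance.
all_matches = re.findall(r"Born: .*", text)

print(first)
print(all_matches)
```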

The parentheses are regex syntax for a capture group. String.extract() returns the value of the first capture group. The expression returns the stuff after the prefix and can be converted immediately using date(). The ^ character should anchor the match to the beginning of the line, but when introduced it does not produce the expected result. I can’t tell what it produces, possibly the empty string.

Ahh. I think I figured it out. This isn’t grep. So if "Born: " appears at the beginning of a line, but not at the beginning of the $Text, "^Born: " won’t match. I do need to use some sort of streaming in this case.

I think a better approach might be

$Text.skipTo("Born: ").captureNumber("MyString")

Why didn’t $Text.extract("^Born: (.*)") work?

I have a theory, but why theorize when I can find out! Wait a few minutes for an update.

My theory was mistaken. It turns out that .extract assumes that ^ and $ match only the start and end of the complete string. So, your failing example would work if you fed it a line at a time using .eachLine( ){ }, but not for your multiline string.
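The same distinction exists in other regex engines, and can be seen in a minimal sketch using Python's re module. This is an analogy, not Tinderbox action code: it assumes only that ^ behaves the same way when no multi-line mode is in effect, and the sample text is made up.

```python
import re

# Made-up sample text: "Born: " starts the second line, not the whole string.
text = "Name: Ada Lovelace\nBorn: 1815\nDied: 1852"

# Default mode: ^ matches only at the very start of the complete string.
whole_string = re.search(r"^Born: (.*)", text)

# Multi-line mode: ^ also matches immediately after each line break.
per_line = re.search(r"^Born: (.*)", text, re.MULTILINE)

# Feeding the same pattern one line at a time also finds the line.
line_by_line = [m.group(1) for line in text.splitlines()
                if (m := re.search(r"^Born: (.*)", line))]

print(whole_string)       # no match in default mode
print(per_line.group(1))
print(line_by_line)
```

The line-by-line loop plays the role that .eachLine( ){ } would play in action code.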

I think your interpretation is more natural, and that will apply from b695 onward.

I think that enhancement would be more natural, but how would this new interpretation handle the possibility of multiple lines matching the pattern? Would it just pick the first one?

Any possibility of adding a second argument to String.extract that specifies the capture group number? I.e. String.extract(pattern, n) would return the nth capture group. Sometimes parentheses are needed for grouping things like alternate prefixes, and the real result is later in the string. 0 could return the entire match. This would be consistent with other regex implementations, such as those found in Ruby or Python.
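For comparison, this is how group numbering looks in Python's re module. It is only a sketch of the behaviour being requested, not an existing Tinderbox feature; the sample text and the alternate prefixes are made up.

```python
import re

# Made-up sample text using one of two alternate prefixes.
text = "Birth date: 1924"

# Group 1 exists only to group the alternatives; the wanted value is group 2.
m = re.search(r"(Born|Birth date): (.*)", text)

whole_match = m.group(0)  # group 0 is the entire match
prefix = m.group(1)       # whichever alternative matched
value = m.group(2)        # the text after the prefix

print(whole_match, prefix, value, sep=" | ")
```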

Did you look at String.extractAll(regexStr[, caseInsensitiveBln]), which handles that scenario?

As noted, as you really want the rest of the line after 'Born: ', stream parsing is the way to go.

If I recall, .extract() is really for when you have a target sub-string you can define via a pattern and want to copy it without the faff of doing a .contains() query and populating a back-reference. For instance, the approach used in the past (and still possible) is:

$Text.contains("Born: (.*)");
$MyString = $1;

where, for the earlier test text, the value of $MyString is the string “1924”. Note that the .* in the pattern only reads to the end of the matched line (i.e. $Text paragraph) and not to the very end of $Text.

.extract() and its sister operator .extractAll() were added in v9.1.0 alongside the new stream parsing operators (q.v. above) although not as part of that suite of tools. I assume the design use case is that .extract() is for simple matches where the regex pattern is the whole of the desired substring. For cases where the text follows a known (but not desired in the capture) string/pattern, stream parsing tools are the expected tools to use.

The reason multiple tools are available is the ever-new niche cases users bring up. Stream parsing might seem more complex at first sight. But, done correctly, it extracts