Because it wasn't my problem and I did not know what was being done next! Having the sentences as discrete list items allows more flexibility for the next stage, such as being able to output the result to $Text as line-spaced items.
Note that after the .replace() the last text begins with a leading space, which is "cleaned" away during the concatenation by the code vTexts+=aPart; which suits us here. If it did not, you might need to find a way to guard or re-create that space.
Aside, it was convenient that you didn't want the text enclosed in square brackets. These are regex characters and caused a problem with some of the approaches. One thing that did work was to replace [ with @ and ] with # so the brackets could be reinserted after the text was extracted. Of course, if your next stage is text in Markdown, you might not want to use # as a replacement marker. But as [ and ] differ in use, you ideally want a different replacement character for each to save having to do additional tests: IOW, whether the marker is at the beginning of the string or of a word in the string, OR at the end of a word or the end of the string ... all more work with potential edge cases.
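The round trip described above can be sketched in a few lines. This is Python, not Tinderbox action code, and is only an illustration of the idea: the helper names, the sample string and the choice of @ and # as markers are all assumptions. (In Python the brackets in the subject text would not actually upset the regex; the sketch just shows the mask, extract, unmask sequence.)

```python
import re

# Assumed placeholder characters; pick ones that never occur in your own text.
OPEN_MARK, CLOSE_MARK = "@", "#"

def mask_brackets(text):
    """Swap literal square brackets for placeholders before the extraction step."""
    return text.replace("[", OPEN_MARK).replace("]", CLOSE_MARK)

def unmask_brackets(text):
    """Put the square brackets back once extraction is finished."""
    return text.replace(OPEN_MARK, "[").replace(CLOSE_MARK, "]")

sample = "<p>[TLDR] Some more Text</p>"  # invented sample, not the poster's real $Text

masked = mask_brackets(sample)
parts = re.findall(r"<p>(.*?)</p>", masked)      # extract the paragraph bodies
restored = [unmask_brackets(p) for p in parts]   # reinsert the brackets
```

Using two distinct markers means the unmask step is a plain one-for-one swap, with no need to test where in the string each marker sits.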
FWIW, my first thought was to use stream parsing and then realised it was quicker to make the above solution (as I didn't have much free time at that moment). A stream approach would be a combo of .skipTo() to "consume" the stream (i.e. move the parser cursor forwards), then .captureTo() to save the desired text, then detect the next text start marker, etc.
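As a rough mental model of that cursor behaviour, here is a toy version in Python. It is a stand-in written for illustration only, not how Tinderbox implements stream parsing: the class, its method names and the sample text are all assumptions.

```python
class Stream:
    """Toy cursor over a string (an assumed model, not Tinderbox's implementation)."""

    def __init__(self, text):
        self.text = text
        self.pos = 0  # the parser cursor

    def skip_to(self, marker):
        """Move the cursor just past the next marker; return False if it is absent."""
        idx = self.text.find(marker, self.pos)
        if idx == -1:
            return False
        self.pos = idx + len(marker)
        return True

    def capture_to(self, marker):
        """Return the text between the cursor and the marker, then consume the marker."""
        idx = self.text.find(marker, self.pos)
        if idx == -1:
            return None
        captured = self.text[self.pos:idx]
        self.pos = idx + len(marker)
        return captured


# Invented sample with a line break between the two paragraphs.
stream = Stream("intro <p>First</p> filler\n<p>Second</p>")
stream.skip_to("<p>")
first = stream.capture_to("</p>")
stream.skip_to("<p>")
second = stream.capture_to("</p>")
```

Each call only ever moves the cursor forwards, so text that has been skipped or captured is never revisited.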
However, consider that Stream .capture-based operators pass their captured sub-string to an attribute nominated as an argument. So you'd need a different attribute to capture each matched substring. Using this with your source $Text:
$Text(/Test text).skipTo("<p>").captureTo("</p>",$MyList).skipTo("<p>").captureTo("</p>",$MyList).skipTo("<p>").captureTo("</p>",$MyList);
(N.B. the above works even if the source $Text has multiple paragraphs, i.e. the .skipTo() can skip past a line break.)
You might expect $MyList to hold 3 list items, but in fact there is only one: "[TLDR] Some more Text", i.e. the last recovered sub-string. Why? Each attribute write from .captureTo() to the same attribute (re-)sets the whole attribute value, even for a list or dictionary target.
Another problem is that the above only finds the first 3 sub-strings in the source HTML snippet. What if you used the code on a snippet with 5 embedded sub-strings? You'd recover only the first 3 and, because of the above, only item #3 would actually be saved.
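For contrast, the behaviour one might have wanted (keep skipping and capturing until no start marker is left, and append each capture rather than overwrite) looks like this as a Python sketch. It is not Tinderbox code; the function name and the sample snippet are invented for the illustration.

```python
def capture_all(text, start="<p>", end="</p>"):
    """Collect every substring found between start and end markers, in order."""
    found = []
    pos = 0
    while True:
        begin = text.find(start, pos)
        if begin == -1:
            break                      # no further start marker: finished
        begin += len(start)
        stop = text.find(end, begin)
        if stop == -1:
            break                      # unterminated final item: ignore it
        found.append(text[begin:stop])  # append, so earlier captures survive
        pos = stop + len(end)
    return found


# Invented snippet with 5 embedded sub-strings, one after a line break.
html = "<p>One</p><p>Two</p>\n<p>Three</p><p>Four</p><p>[TLDR] Some more Text</p>"
items = capture_all(html)
```

The loop handles any number of items, which is exactly what a fixed chain of three .skipTo()/.captureTo() pairs cannot do.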
Bear in mind that Stream Parsing, as it exists in v9.6.1, was originally conceived to do things like parsing mail headers or structured text, ideally where each target is preceded by a label and where each target will be saved to a discrete attribute. The test above, where we might want to recover an undefined number of substrings as a list, wasn't part of that design concept. So not a failure, just a scenario for which Stream Parsing was not designed.