If I understand correctly, the source problem here is that PDF Expert has no decent export configurations and you’re left using the default. You also want to use part of a sentence (up to a colon) as a title.
OK, so PDF Expert has let you down, but it’s worth saying that if you haven’t tried all the source app’s export methods, try those before fixing things further along in the process.
I made some test ‘source’ $Text:
Highlight [page 5]: First note. More stuff.
Highlight [page 51]: Another note. More stuff.
Highlight [page 501]: Different note. More stuff.
Highlight [page 5001]: Last note. More stuff.
I tried this stamp in Tinderbox (v7.5.6), without success:
$Text=$Text.replace("(Highlight \[page \d+\]: ","$1\n");
…though in fairness I’m also assuming $1 would change for each different page number. Indeed, I don’t know if this sort of regex-based use of .replace() on $Text was envisaged as a regular task. When testing this, my app tended to sit churning away, which indicates a lot of ‘thinking’ was going on, and I had to force quit.
So, I pasted my sample source text into BBEdit and did a regex find/replace on:
(Highlight \[page \d+\]):
This is just the sort of task for which such tools are intended. Anyway, the result is this $Text:
Highlight [page 5]. First note. More stuff.
Highlight [page 51]. Another note. More stuff.
Highlight [page 501]. Different note. More stuff.
Highlight [page 5001]. Last note. More stuff.
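For anyone without BBEdit to hand, the same colon-to-period substitution can be sketched in Python. This is only an illustration using the sample text above; the replacement string `\1.` is an assumption, as only the search pattern is shown here.

```python
import re

# Sample source text, as exported (title and note separated by a colon).
source = """Highlight [page 5]: First note. More stuff.
Highlight [page 51]: Another note. More stuff.
Highlight [page 501]: Different note. More stuff.
Highlight [page 5001]: Last note. More stuff."""

# Capture the "Highlight [page N]" prefix; the colon sits outside the group.
pattern = re.compile(r"(Highlight \[page \d+\]):")

# Assumed replacement: put back the captured prefix, then a period
# so the prefix becomes a sentence of its own.
result = pattern.sub(r"\1.", source)

print(result)
```

Because the capture group is re-evaluated for each match, the back-reference picks up the right page number on every line.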
Now we can Explode the $Text on paragraphs, using the first sentence as the title and omitting it from the note text.
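To make the intended outcome concrete, here is a rough Python sketch of what such a split produces. It is not how Tinderbox implements Explode; the function name `explode_paragraphs` and the choice to drop the title's trailing period are assumptions made for illustration.

```python
import re

def explode_paragraphs(text):
    """Split text into (title, body) pairs, one per non-empty paragraph.

    The first sentence of each paragraph becomes the title and is
    omitted from the body. Hypothetical helper, not Tinderbox code.
    """
    notes = []
    for para in text.splitlines():
        para = para.strip()
        if not para:
            continue
        # Split once, at the first whitespace that follows a period.
        parts = re.split(r"(?<=\.)\s+", para, maxsplit=1)
        title = parts[0].rstrip(".")  # assumption: drop the trailing period
        body = parts[1] if len(parts) > 1 else ""
        notes.append((title, body))
    return notes

cleaned = """Highlight [page 5]. First note. More stuff.
Highlight [page 51]. Another note. More stuff."""

notes = explode_paragraphs(cleaned)
for title, body in notes:
    print(title, "|", body)
```

This only works cleanly because the colon was changed to a period first; with the original export, the first sentence would run on into the note text.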
Whilst Tinderbox has some pretty nifty text tools, regex gets complicated quickly, and needing to use it with Explode invariably points back to poor data export upstream. I don’t have PDF Expert but it might be worth contacting the dev to ask for more sensible export options. Failing that, a competent text editor with regex support, like BBEdit (which has a free version) or Sublime Text, is your friend when dealing with poorly formatted source data.