Excellent! Thank you for the round trip and the background too. It does make sense, particularly since, if we know the behavior, we can handle it!
Now, with that said, and because you’ve given me the nature of the beast, I can think of a case where this exception will have exceptions… when the string itself has deliberate quotes in it. We lose the quotes inside the string via the .replace.
For now, this is good because I understand the behavior well enough to handle it. Thanks to your explanation, and your help with the action code, I think I can put a test in upstream to handle quotes that are actually meant to be inside the string.
I was hoping you’d miss that. You are quite right. Part of the issue is (my presumption here!) that as the TBX store format is XML (albeit with some RTFD / base64 content) there will be a lot of text parsing going on. I’ve assumed that’s one reason we can’t escape straight quotes.
Yet, whereas action code must use straight quotes for delimiters, typographic ‘curly’ quotes, i.e. “ ” ‘ ’, are fine in literal text strings. Then again, getting anything out of Excel that uses more than ASCII is an invitation to a lost afternoon (BBEdit, or a similar decent text editor, is your friend here).
I don’t really like to use the term ‘format’ for CSV, because it isn’t a format with a common spec. The flavours of CSV are endless, so I treat CSV (unless I made it for myself) as a ‘serving suggestion’. IOW, it might contain usable data … after some work. Thus I’ll often take (some of) a new CSV source and drop it into a new unsaved TBX just to see what comes in or doesn’t. That way early sub-par data imports are easily ditched, and the working TBX stays nice and clean, without a kitty litter of bizarre attribute names. If you have the choice, I find tab-delimited outshines CSV for lack of headaches.
In context, the separate discussion of word breaks in Chinese text reminds us how fragile our assumptions are about how text is/should be. This is where small tests are your friend. Once you’ve the data in Tinderbox, then you’ve a veritable Swiss Army knife of ways to work with and visualise your text.
Circling back to quotes, if taking data from a pipeline, consider whether quotes need to be straight or typographic and don’t coerce them (or let them be coerced by export code written in a previous, pre-Unicode era). For characters like parentheses, consider swapping them out for square brackets (or anything safe for your purposes). String.replace() is your friend, as long as you take care to use unique replacements for the opening and closing characters of quotes/brackets/etc. (easily overlooked by those unused to this text grunt work).
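For anyone who prefers to look before importing at all, here is a rough Python sketch of the same "see what comes in" check, done outside Tinderbox. The function name `peek` and the sample text are made up for illustration. It assumes a tab-delimited source and turns quote handling off, so that straight quotes come through as ordinary characters.

```python
import csv
import io

def peek(text, delimiter="\t", rows=3):
    """Return the header row and the first few data rows of delimited text.

    Hypothetical helper: quoting is disabled, so straight quotes in a
    field are kept as literal characters rather than parsed.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter,
                        quoting=csv.QUOTE_NONE)
    header = next(reader, [])
    # Take at most `rows` data rows without reading the whole source.
    sample = [row for _, row in zip(range(rows), reader)]
    return header, sample

header, sample = peek('Name\tNote\nA\t"quoted" text\n')
print(header)
print(sample)
```

If the header shows columns you don't recognise, that is the moment to clean the source rather than after the attributes already exist in a working document.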
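To illustrate the point about unique replacements, a small Python sketch (not Tinderbox action code; the function names are invented for this example). It swaps parentheses for square brackets and back again, and the round trip only works because the opening and closing characters get different stand-ins. It also assumes the original text has no square brackets of its own.

```python
# Opening and closing characters each get their own distinct replacement.
TO_SQUARE = str.maketrans({"(": "[", ")": "]"})
TO_ROUND = str.maketrans({"[": "(", "]": ")"})

def swap_parens(text):
    """Replace ( and ) with [ and ] respectively."""
    return text.translate(TO_SQUARE)

def restore_parens(text):
    """Reverse swap_parens; only safe if the source had no [ or ]."""
    return text.translate(TO_ROUND)

def swap_parens_lossy(text):
    """Counter-example: one stand-in for both characters."""
    return text.replace("(", "|").replace(")", "|")

original = "total (net) and (gross)"
safe = swap_parens(original)
print(safe)
print(restore_parens(safe) == original)
print(swap_parens_lossy(original))  # no way to tell open from close now
```

With the lossy version there is nothing left in the text to say which `|` was an opening bracket, so the change cannot be undone later.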
In an ideal world everything would ‘just’ import. Most often it does, but as your really interesting example shows, sometimes a bit of extra fiddling is needed to get all safely bedded in. And, if stuck, the forum is always here.
You can also use [String].trim(punctuation), which will remove whitespace and punctuation from the start and end of the string without changing the interior. I’ll look at an option that is limited to whitespace and quotes.
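The difference between replacing every quote and trimming only the ends can be sketched in Python. This is an illustration of the idea only, not how Tinderbox implements .replace or .trim, and the function names are hypothetical.

```python
def strip_all_quotes(text):
    """Remove every straight double quote, including deliberate ones."""
    return text.replace('"', "")

def strip_edge_quotes(text):
    """Remove whitespace and straight quotes from the ends only.

    Caveat: every matching character at an edge is removed, so a string
    that legitimately ends with a quoted word loses that quote too.
    """
    return text.strip(" \t\r\n\"'")

sample = '"He said "hello" to me"'
print(strip_all_quotes(sample))   # interior quotes are lost
print(strip_edge_quotes(sample))  # interior quotes survive
print(strip_edge_quotes('  "“curly” stays"  '))  # curly quotes untouched
```

Curly quotes are not in the set of characters being stripped, which matches the observation in this thread that typographic quotes pass through untouched.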
I can confirm – smart quotes in the text survive the .replace. So if there are smart quotes (curly quotes) in the texts, all works well.
I love BBEdit, btw. Was also a TextWrangler fan. As much as I hated to see TextWrangler go, it’s caused me to focus on BBEdit just as well.
There’s another handy little app for scrubbing text called TextSoap. I’ve had that for a long time and it’s just a delight to use. Much of it overlaps the BBEdit functions, but it’s worth mentioning because some people not accustomed to BBEdit would likely find TextSoap more approachable.
Actually, TextWrangler just grew up and became BBEdit without a licence applied. It’s not always obvious this is the case, which is why I mention it. Yay for free stuff, though I actually do have a licence as I use it so much, even though I don’t think I use many of the licence-only options. BBEdit, like Tinderbox, is one of the apps always open in my Dock.
Agree re TextSoap, more approachable for those wanting the safety of more UI whilst wrangling [sic] text.
Totally aside from post topic, but I had BBEdit licenses also and would still run to TextWrangler first! I knew they were the same. I knew BBEdit could do more. For some reason, TextWrangler was just the default for me. LOL