Parsing CSV/spreadsheet data getting drop down serialized data out of a column

Hi,

I have a spreadsheet output from a database that has recurring events. The dates can be built from a column that has Month and another that has the day. The idea is to have annual commemorations reappear annually, etc.

One column has complex entires in a comma delineated series… as an example COLUMN D entries might be like this:
“Fishing, Picnic, Camping”
“Camping, Picnic”
“Theater, Fishing”

Those entries came from an option picker - So there’s a known set of possible terms in the column.

Is there a way to parse that back out of the CSV note and into a similar structure in tinderbox?

Thank you. I’ve been a lurker for years, and I putz around with tinderbox for seven years now, but I’m finally trying to do some heavier lifting and hitting walls at every turn!

In an excel spreadsheet, I created a range of data shown below. You’ll notice that I used Find > Replace to replace the commas in the second column with semicolons. I’ll touch on why a bit later.

Screenshot of Microsoft Excel (4-27-20, 7-48-49 PM)

In a Tinderbox document, I created a set-type attribute OptionSet.

I copied the Excel range (A1:B4) and paste it into the Outline view of the Tinderbox document. This causes three notes to be created, named as per column A of the spreadsheet – the Name column becomes the content of the built-in Name attribute for each note. And the OptionSet column populates that attribute for each note.

The reason I wanted to use semicolons is because semicolons separate members of a set in a set-type attribute. If I used commas, then Tinderbox would interpret “Fishing,Picnic,Camping” as a single entry in the set.

This image illustrates the result.

I believe if you apply these principles to the real data you have, you’ll get headed where you want to go. The little dropdown box on the right side shows all the possible set values, that Tinderbox construed based on the input data.

Here’s the demo file.

Set Example.tbx (75.2 KB)

3 Likes

Nice! Thank you Paul!

1 Like

The above is absolutely the path I’d endorse. Knowing someone will arrive and say “but the data’s already imported?” I thought I’d deal with the latter case .

First, if you can, do the above (in Excel or some other tool) and re-import. Much better to import the data in correct form. No really, exporting structured data correctly at source is almost always the easiest method in the round. Didn’t make the data yourself,? Well, you can still fix it in Excel or whatever before importing to Tinderbox.

But, you say, it’s part of a larger import and it’s too hard to do over, is there still a method? Yes, in short you take a String of comma or comma+space delimited terms, replace the delimiters with a semi-colon and pass to your list (at Set or List type). Yuo only want to do this once, so I’d use a stamp. If adding to a List-type then you do want to ensure this is a one-time operation or you’ll just keep readding data.

The simple pattern is this:

$MySet = $MyString.replace(", *",";");

The comma-space-asterisk is a regex for ‘command and zero or more spaces’. To append the new data to existing data:

$MySet = $MySet+$MyString.replace(", *",";");

Where I’ve implied a Set above, you can use a list (i.e. replace my Set-type in the code with a $List-type). With a List, do take care to run the process once.

As I hope this shows, fixing the data pre-import is actually much easier that fiddling once imported. So, try the former approach first! :wink:

2 Likes

Thanks Mark. A little regex in BBEdit makes short shrift of this. The commas in a data field of a CSV are escaped and the CSV format makes it easy to capture that one column

2 Likes