My CRM has choked on test data, and more

I’ve got myself in a little fix, and I need help. A CRM I’m building progressed far enough (btw, thanks to @eastgate and @mwra) to import some data, but apparently I gave it too much. I imported about 4,000 rows, and the beachball spun for more than 24 hours. Force-quitting and reopening hangs again. Strangely, TBX backup documents that did not have any data fed to them also hang on open. The file has several active agents; I assume they fire on open against 4,000 notes and can’t recover.

Is there a way to open a TBX file with agents suspended or disabled so I can get in and turn them off before they fire?

Tinderbox 11.6.0, MacBook Pro M1, Tahoe 26.3.1.

  1. Make a backup of your file. Don’t touch it.
  2. Make copies of any backups you plan to open. Don’t touch them, either!
  3. Open Tinderbox while holding down the shift key. This tells Tinderbox not to reload open documents.
  4. Does that work? Good!
  5. Close Tinderbox. Make a working copy of one of your backups. Does that copy open?
  6. If possible, send me a copy.

Also, if possible, hang on to the dataset you tried to import. It’s quite possible this can be made to import in seconds, and that we can save someone else some trouble.

Thanks for your reply!

- The dataset is intact.

- The TBX document backups are fine.

- Tinderbox itself reopens fine.

- A duplicate of the stalled document stalls whenever I reopen it. A backup copy without the data import opens and seems free of any hangup. My almost-complete CRM with the attempted data ingestion (9.5 MB) is on Dropbox.

- Main containers: Connections, Connections Raw (import staging area for TSV data), Prospects (manually confirmed prospects), and Messages Raw.

- Two prototypes: ConnectionProspect and Message.

- Two agents: one maintains message counts and last-contact dates for notes in Prospects; the other uses isDuplicateName() to flag duplicate imports.

- The glitch: I imported about 4,000 rows of LinkedIn connection data as a TSV into Connections Raw. Tinderbox hung immediately on import. I let the beachball spin overnight before force-quitting. The document has not opened successfully since — every attempt to reopen it, including all backups containing the imported data, reproduces the hang.

The main error seems to have been giving TBX too much data at once. And I suppose I’d have been wise to turn the agents off.

Sometimes, a task that I believe runs in time proportional to the number of notes, O(n), accidentally winds up running in quadratic time, O(n²). When importing 4000 notes, for example, it’s important not to update the outline view after each import, because outline layout in the worst case might need to examine every note.

That doesn’t matter a lot of the time. Say processing a note takes a millisecond. Instead of taking 12 ms, handling a dozen quadratically takes 144 ms, and that’s still plenty fast. But for 4000 notes, we expect 4 sec, and it might end up taking 16,000,000 ms, roughly 4½ hours.
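To see the scale of the difference, here is a toy Python sketch (not Tinderbox action code, and the "relayout" is an abstraction of any step that must examine every existing note) counting the work done when a full relayout runs after every imported note versus once at the end:

```python
def import_notes(n, update_each_time):
    """Count abstract work operations for importing n notes."""
    ops = 0
    notes = []
    for i in range(n):
        notes.append(i)          # the import itself: constant work per note
        if update_each_time:
            ops += len(notes)    # relayout examines every note so far
    if not update_each_time:
        ops += len(notes)        # a single relayout at the end
    return ops

print(import_notes(4000, True))   # 8002000 operations: quadratic
print(import_notes(4000, False))  # 4000 operations: linear
```

Updating after every note does n(n+1)/2 note-examinations in total; deferring the update to the end does only n.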

That’s the sort of thing that is likely happening here. So, we’ll fire up the profiler and figure out what is happening.


A short-term hack is to chunk the task in one of two ways:

  • Use a separate Tinderbox document with little if any other content as the import context, then copy and paste the generated notes into your main TBX
  • Break the import dataset into several smaller sections and add one at a time.
  • Or both.

The first point means you don’t have to worry about other code (e.g. agents, rules) competing with the import task. In an ideal world we’d just press start and let everything run, but real-world data, especially data we didn’t make (here, for example, it came from LinkedIn), tends to throw up unexpected challenges.
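The second approach, splitting the dataset into smaller sections, is easy to script. A minimal Python sketch, assuming a UTF-8 TSV with a header row (the chunk size and the output file naming are arbitrary choices), which repeats the header in each part so every chunk imports on its own:

```python
import csv

def split_tsv(path, rows_per_chunk=500):
    """Split a TSV file into smaller files, each with the header row."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter="\t")
        header = next(reader)
        rows = list(reader)
    for i in range(0, len(rows), rows_per_chunk):
        out = f"{path}.part{i // rows_per_chunk + 1}.tsv"
        with open(out, "w", newline="", encoding="utf-8") as g:
            writer = csv.writer(g, delimiter="\t")
            writer.writerow(header)               # header in every chunk
            writer.writerows(rows[i : i + rows_per_chunk])
```

A 4,000-row export split at 500 rows per chunk gives eight files to import one at a time.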

This also raises the question of whether you need all the columns/rows. It is easy to ingest everything simply because that’s the data file we got at the start. But importing 30 columns when we only need 5 means it might be worth pre-processing the data. If doing that, I’d resist the lazy impulse: spreadsheets in this context generally add problems rather than remove them. Some ways to extract only the data needed for import:

  • AI. Flavour of the moment. You can give the data file (if it isn’t sensitive) to your favourite AI and ask for only columns 1–5, 11, and 14, or some such.
  • If you’re used to coding, use R or Python to manipulate the data.
  • Command line tools such as awk.
  • BBEdit with regex.
  • If you prefer a UI, consider EasyDataTransform†
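For the scripting route, a short Python sketch of the column-trimming step (the KEEP list here is an assumption based on the sample data in this thread; adjust it to match your own export):

```python
import csv

KEEP = ["Name", "Company", "Position", "Connected On"]  # assumed columns

def trim_columns(src, dst):
    """Copy a TSV, keeping only the columns named in KEEP."""
    with open(src, newline="", encoding="utf-8") as f, \
         open(dst, "w", newline="", encoding="utf-8") as g:
        reader = csv.DictReader(f, delimiter="\t")
        writer = csv.DictWriter(g, fieldnames=KEEP, delimiter="\t")
        writer.writeheader()
        for row in reader:
            # Missing columns become empty strings rather than errors.
            writer.writerow({k: row.get(k, "") for k in KEEP})
```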

Of course, a factor in the above is whether this data import is a process you’ll repeat (even with different data) or really a rare, once-in-a-lifetime event.

†. This is often on offer in the Artisanal Software Summer/WinterFest, if you can wait.

OK. I know what caused the trouble, and it’s easy to fix.

Agents were not at fault.

Beyond that, however, there’s another problem: I don’t think your data are importing as you expect. Could you post the first 10 records of your dataset here, anonymized? Then we can sort out the import, while I sort out the slowdown.

Eleven lines of fake data.

Name	FirstName	LastName	LinkedinURL	Company	Position	Connected On
Test Person1	Test	Person1	linkedin.com/in/testperson1	Fake Co 1	Job Title 1	1/19/26
Test Person2	Test	Person2	linkedin.com/in/testperson2	Fake Co 2	Job Title 2	1/21/26
Test Person3	Test	Person3	linkedin.com/in/testperson3	Fake Co 3	Job Title 3	1/22/26
Test Person4	Test	Person4	linkedin.com/in/testperson4	Fake Co 4	Job Title 4	1/25/26
Test Person5	Test	Person5	linkedin.com/in/testperson5	Fake Co 5	Job Title 5	1/28/26
Test Person6	Test	Person6	linkedin.com/in/testperson6	Fake Co 6	Job Title 6	2/1/26
Test Person7	Test	Person7	linkedin.com/in/testperson7	Fake Co 7	Job Title 7	2/3/26
Test Person8	Test	Person8	linkedin.com/in/testperson8	Fake Co 8	Job Title 8	2/5/26
Test Person9	Test	Person9	linkedin.com/in/testperson9	Fake Co 9	Job Title 9	2/8/26
Test Person10	Test	Person10	linkedin.com/in/testperson10	Fake Co 10	Job Title 10	2/11/26
Test Person11	Test	Person11	linkedin.com/in/testperson11	Fake Co 11	Job Title 11	2/14/26

Thanks for your reply, @mwra. Those are helpful suggestions.

- Chunking: That’s the plan. In fact, it was the plan even before I carelessly threw the whole bunch into the grinder.

- Whether I need all those columns: No, I don’t need all the fields that LinkedIn provides. I’ve made a Keyboard Maestro macro that trims out the junk.

- The data ingestion chore will be weekly. I’m not sure yet what app should do that. The leading candidate now is PanoramaX for stuff it can do that Excel can’t. Also, Claude can write its formulas and stuff with the help of documentation screenshots I’ve put into the Claudian project knowledge.

OK: the first ten records seem to import fine.

Something convinced Tinderbox that your data were not, in fact, tabular, and so it imported everything into one very long note. The likely culprits are:

  • missing fields, especially at the end of the record
  • confusing punctuation (quotation mark, apostrophes and such used to cause trouble but should not be a problem now)
  • bad text encoding (might be a problem with old data or data from the Web)

One good way to track this down is to try pasting the first 200 lines. Are they OK? How about the header line plus the next 200 lines? If that’s OK and not too slow, try 400 lines at a time. See whether you can narrow down the issue.
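If pasting by hand gets tedious, the same narrowing can be scripted. A quick Python sketch (my own, not from Tinderbox) that reports any line whose tab-separated field count differs from the header row’s, which catches the missing-field culprit directly:

```python
def find_bad_rows(path):
    """Return (line_number, field_count) for rows whose field count
    differs from the header row's."""
    bad = []
    with open(path, encoding="utf-8") as f:
        expected = len(f.readline().rstrip("\n").split("\t"))
        for lineno, line in enumerate(f, start=2):
            n = len(line.rstrip("\n").split("\t"))
            if n != expected:
                bad.append((lineno, n))
    return bad
```

An empty result means every row matches the header; otherwise you get the exact line numbers to inspect.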

Thank you, @eastgate! I’ll give it a try asap, though possibly not today.

As the specimen data was fake, it is worth noting that CSV and TSV tabular data can’t be assumed to behave the same.

Tip: friends don’t let friends use CSV if TSV is available. If making the data oneself, there is no excuse for not using TSV. Neither ‘format’ has a rigorous standard, but commas appear where not expected more often than tabs, so TSV lessens that issue. Similarly, friends don’t ask friends to review/clean tabular data in spreadsheets. The latter tend only to add further damage to the data. Pretty much any alternative (see upthread) is better.

Regardless, if the issue looks like a row with the wrong number of fields, a simple kludge is to add an extra last column with a header name (anything will do) and dummy data, e.g. ‘X’, in every row. This can help trick a confused parser into finding the right number of columns. The imported data for the dummy column is easily removed post-import by deleting the user attribute created for it.
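That kludge is easy to script too. A minimal Python sketch that appends an ‘X’ to the header and to every data row (the header name ‘X’ is arbitrary, per the suggestion above):

```python
def add_dummy_column(src, dst):
    """Append a dummy 'X' field to the header and every row of a TSV."""
    with open(src, encoding="utf-8") as f, \
         open(dst, "w", encoding="utf-8") as g:
        for line in f:
            if line.strip():                       # skip stray blank lines
                g.write(line.rstrip("\n") + "\tX\n")
```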

Tinderbox’s preference for TSV seems clear in aTbRef, so that’s what I’ve fed it (though some CSV may have crept through). But I still wondered why, which you’ve now explained. Thanks.
