Experiences importing HTML files

jfontana · January 5, 2021, 6:36pm

Hi again,

I’m trying to rescue a bunch of notes I had in an app I used a long time ago, before I started with TB. The app was called Notebook (by Circus Ponies). The format of those notes was a mixture of XML files and embedded RTF, so you would think it would not be difficult to recover the contents.

I’ve struggled for a long time trying to find a way of converting the Notebook documents into another format. Luckily I had exported all my documents in other formats. The most convenient format is HTML since I don’t have to deal with the complex structure of the folders and XML files in the OPML exports.

I thought it would be very simple to just create a TB document and drag all the HTML docs into it. I don’t understand why, but TB really struggles to import these HTML docs. I tried dragging 10 docs at a time, but TB crashed or just showed the beach ball for a long time. Then I tried dragging just two or three docs at a time. This was a bit better but by no means solved the problem.

I now have a TB document with just three HTML files imported. When I try to view the contents of a note by clicking its title in the outline, I get a beach ball that lasts for quite a while before I’m able to see the note.

Has anybody experienced similar problems? These are well-formed HTML documents. I have no problems opening them with any of the browsers I use. I don’t understand why it is so difficult for TB to handle them.

JM

eastgate · January 5, 2021, 6:43pm

First, zip up some of the files you’re trying to import and send them to tech support (tinderbox@eastgate.com). We can take a look and see what’s going wrong.

Possible workaround: open the files in your browser. Copy and paste.

rkaplan · January 5, 2021, 6:52pm

The new version of Notebooks is a semi-successor to Circus Ponies. It may be an interim step in exporting your data.

See here:

https://www.aquaminds.com/faqs

jfontana · January 5, 2021, 10:38pm

Hi Richard,

Thanks for the pointer. $30 is a steep price to pay for an app that I’m only going to use once. I had seen another app that was also able to import CPN docs, but in the end I decided not to purchase it because all it would allow me to do is replace CPN. It would not allow me to do what I need to do.

My problem is not so much reading the CPN docs as transforming them into a format that can be easily imported into TB. I have already exported my old CPN docs into different formats: HTML and OPML. OPML is easily imported into TB. The problem with CPN docs is that they are not structured in the same way as TB: CPN does not seem to distinguish between headings or titles and notes.

When I import the OPML doc I exported from CPN into TB, what I get is a single outline with a collection of titles where every title is the note itself. That requires a lot of work to convert into a usable format. With the HTML export, you have a single HTML file for every note, and that is a little better. It would make things a lot easier, because I just need to import the documents as independent notes and then organize them within TB.

The solution Mark Bernstein suggests would work but I have hundreds of notes and copying and pasting all of them into TB would take a lot of time. So, it looks like I’m stuck.

jfontana · January 5, 2021, 10:40pm

OK. I’ll do that. Thanks.

eastgate · January 5, 2021, 11:15pm

Patience! Let’s see what the problem might be.

satikusala · January 5, 2021, 11:21pm

Do you need the HTML formatting? Have you tried opening the HTML in a browser, copying it, and then paste-specialing it (⇧⌥⌘)? This will clear out all the non-visible junk, which is probably what is crashing the file.

eastgate · January 6, 2021, 5:09pm

It turns out that the HTML in the documents in question was…unusual, with thousands of font and style tags scattered throughout. The HTML importer was having a very hard time dealing with all the styles; it could get through eventually, but the resulting notes would be hard to work with.

I think we can find ways to clean up the HTML more-or-less automatically through other tools, and then import the cleaned-up documents.
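Eastgate doesn't name the tools it has in mind. As one possibility, here is a minimal Python sketch (standard library only) of that kind of pre-import cleanup. It assumes the styling lives in `<font>`/`<span>` wrappers, in `style`/`class`/`face`/`size`/`color` attributes, and in `<style>` blocks; the names `clean_html`, `clean_folder`, and the tag/attribute lists are hypothetical choices for illustration, not anything Tinderbox or CPN provides. It keeps the text and the structural tags (paragraphs, lists, bold, links) and writes cleaned copies to a separate folder, leaving the originals untouched.

```python
from html import escape
from html.parser import HTMLParser
from pathlib import Path

# Assumed styling markup in the CPN export (adjust after inspecting a file).
UNWRAP_TAGS = {"font", "span"}            # drop the tag, keep its text
DROP_TAGS = {"style", "script"}           # drop the tag and everything inside
DROP_ATTRS = {"style", "class", "face", "size", "color"}  # strip from all tags


class StyleStripper(HTMLParser):
    """Re-emits HTML without font/span wrappers and inline styling."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a <style> or <script> element

    def _attrs(self, attrs):
        # Rebuild the attribute string, leaving out styling attributes.
        return "".join(
            f' {name}="{escape(value or "", quote=True)}"'
            for name, value in attrs
            if name not in DROP_ATTRS
        )

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")  # keep <!DOCTYPE ...>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_TAGS:
            self.skip_depth += 1
        elif not self.skip_depth and tag not in UNWRAP_TAGS:
            self.out.append(f"<{tag}{self._attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>, re-emitted without styling.
        if not self.skip_depth and tag not in UNWRAP_TAGS | DROP_TAGS:
            self.out.append(f"<{tag}{self._attrs(attrs)}/>")

    def handle_endtag(self, tag):
        if tag in DROP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag not in UNWRAP_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def clean_html(html_text: str) -> str:
    """Return html_text with styling markup removed (comments are dropped too)."""
    parser = StyleStripper()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


def clean_folder(src_dir: str, dst_dir: str) -> int:
    """Write cleaned copies of every *.html file in src_dir to dst_dir."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(Path(src_dir).glob("*.html")):
        # Assumes UTF-8; check the export's charset if characters come out wrong.
        text = path.read_text(encoding="utf-8", errors="replace")
        (dst / path.name).write_text(clean_html(text), encoding="utf-8")
        count += 1
    return count
```

Usage would be something like `clean_folder("cpn_export", "cpn_clean")`, then dragging a few of the cleaned files into a test TB document before doing the whole batch.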

jfontana · January 6, 2021, 7:17pm

They were unusual indeed. Like the following multiplied by 1000:

This was most likely generated by the export engine in CPN.

Thanks for the help!

JM

JKF · January 12, 2021, 7:46am

Older versions of the Outline app used to import Circus Ponies.

The newest version has a free trial, and the article below links to instructions on how to downgrade to older versions. I have no idea whether this gives a working way forward, but it might be helpful.

https://help.outline.app/article/21-how-to-import-circus-ponies-notebooks-to-outline

jfontana · January 13, 2021, 5:51pm

Thanks Jenny!

In the end, what I did was export the notes as text. In the export, different sections of the notes have different indentations. This allowed me to “explode” each note within TB into separate notes. It was a bit involved and not free of problems, but at least all my data is in TB format and I can work from there.
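For anyone who would rather do the splitting before import, here is a minimal Python sketch of the same idea. It assumes the text export marks nesting with leading spaces or tabs (tabs expanded to 4 columns): `parse_indented` builds a tree from the indentation, and `to_opml` writes it as OPML, which, as noted earlier in the thread, TB imports easily. Both function names are hypothetical, and blank lines are simply skipped, so multi-paragraph note bodies would need extra handling.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str
    children: list[Node] = field(default_factory=list)


def parse_indented(text: str, tab_width: int = 4) -> list[Node]:
    """Turn an indented text export into a tree; deeper indent = child note."""
    root = Node("")
    stack = [(-1, root)]  # (indent width, node) for the current ancestry
    for raw in text.splitlines():
        if not raw.strip():
            continue  # blank lines are skipped (assumption)
        line = raw.expandtabs(tab_width)
        indent = len(line) - len(line.lstrip(" "))
        node = Node(line.strip())
        # Pop back up to the nearest ancestor that is less indented.
        while stack[-1][0] >= indent:
            stack.pop()
        stack[-1][1].children.append(node)
        stack.append((indent, node))
    return root.children


def to_opml(nodes: list[Node], title: str = "Imported notes") -> str:
    """Serialize the tree as nested OPML 2.0 <outline> elements."""
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")

    def add(parent: ET.Element, node: Node) -> None:
        el = ET.SubElement(parent, "outline", text=node.text)
        for child in node.children:
            add(el, child)

    for node in nodes:
        add(body, node)
    return ET.tostring(opml, encoding="unicode")
```

Writing the result to a `.opml` file and dragging it into TB should give one note per line, nested as in the export; whether that beats exploding inside TB depends on how consistent the export's indentation is.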