Fetch an RSS feed

I tried fetching an RSS feed and am getting errors. Thoughts on how to make this work?

Just had a play using the forum feed, and it looks like it downloads the feed and does a little formatting of it. Basically it does what I expected (actually slightly more: my memory told me it would give unformatted code).

Is the feed working (i.e. if you put it in your usual RSS reader, does it come up OK)?

Ah, perhaps I spoke too soon; after leaving it alone for a few minutes, it’s now giving an “Autofetch couldn’t retrieve a note” error.

But again, if I create an entirely new feed (MacSparky this time), then it does work as expected.

What feed?

I’ve tried multiple feeds and can’t seem to get any of them to work. I must be doing something wrong.

OK, I’m certainly not an expert in this (I read RSS elsewhere, and don’t need to slice-and-dice with Tinderbox), but putting the feed address in $URL and turning on $AutoFetch works nicely.

I’ll attach an example: temp rss test.tbx (259.8 KB)

This is just the forum feed, as MacSparky has far too many large images to be helpful here. This works for me on 8.9.2.

I see, I was doing it wrong. I was using the AutoFetch command and not the URL. Works now. Thanks.

Yes, $AutoFetch simply tells the current note to auto-fetch the resource stored in $URL.

Yup… get that now. Kinda feel stupid, but there you go. :slight_smile:

Not really, many things are non-obvious before the fact. :slight_smile:

As it is, I misremembered: $AutoFetch can pull from other resources as well as $URL (see here). $URL was the original use case, but it’s broadened over time.

Is there a way to have Tinderbox parse the RSS? Possibly divide each item into separate notes:

Here is my RSS:

This is what Iā€™m getting:

What would be the best way to parse this? I’d prefer to have the RSS importer divide everything up into discrete notes, or at least drop a delimiter that I could explode on. I’d love to get all the values populating attributes. Thoughts?

Each item in the feed is enclosed in <item></item> tags. I’d caution against casual use of Preview as a proxy viewer. Why? Because you’ll likely see some form of rendering and think Preview is failing, when in fact you are previewing data that Preview was never intended to display.
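For anyone wanting to pre-process a feed outside Tinderbox, here is a minimal Python sketch (standard library only) of the one-note-per-<item> idea. It assumes a well-formed RSS 2.0 feed, treats each <item> under <channel> as one record, and keeps a few core RSS 2.0 child elements as would-be attribute values. The sample feed and the `split_items` name are made up for illustration; this is not how Tinderbox itself parses feeds.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-item RSS 2.0 feed, for illustration only.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <description>Channel header, not an item</description>
    <item>
      <title>First post</title>
      <link>https://example.com/1</link>
      <description>Hello &amp; welcome</description>
      <pubDate>Mon, 01 Jan 2024 09:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/2</link>
    </item>
  </channel>
</rss>"""

# Core RSS 2.0 <item> children worth keeping; extend as needed.
FIELDS = ("title", "link", "description", "pubDate", "guid")


def split_items(feed_xml: str) -> list[dict[str, str]]:
    """Return one dict per <item>, keyed by RSS element name."""
    channel = ET.fromstring(feed_xml).find("channel")
    if channel is None:
        raise ValueError("no <channel> element; is this RSS 2.0?")
    records = []
    for item in channel.findall("item"):
        record = {}
        for field in FIELDS:
            text = item.findtext(field)
            if text is not None:  # skip elements this item doesn't have
                record[field] = text.strip()
        records.append(record)
    return records


if __name__ == "__main__":
    for record in split_items(SAMPLE_FEED):
        print(record)
```

Note the channel’s own <title> and <description> are ignored: only <item> children become records, which is exactly the header/item distinction a feed importer has to make.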

The RSS URL in the first screen-grab (in the <atom:link> element) returns a 404, so it is difficult to diagnose the content shown earlier.

The feed purports to follow the RSS v2.0 standard (see: RSS 2.0 Specification (RSS 2.0 at Harvard Law)) but has some non-standard Atom elements/attributes†. Note: Atom was essentially a branching of RSS, as different sub-groups of RSS creators wanted to allow different sorts of <item> contents.
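For anyone scripting against such a feed, the practical upshot of those Atom additions is namespaces: to an XML parser, <atom:link> is not a plain <link>. A minimal Python sketch, assuming the usual Atom namespace URI and a made-up sample channel; `self_link` is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

# Hypothetical RSS 2.0 channel carrying an Atom self-link.
SAMPLE = """<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Example feed</title>
    <atom:link href="https://example.com/feed.xml" rel="self" type="application/rss+xml"/>
  </channel>
</rss>"""


def self_link(feed_xml):
    """Return the href of the channel's <atom:link rel="self">, or None."""
    root = ET.fromstring(feed_xml)
    # The prefix map lets the path say "atom:link"; the parser itself
    # stores the tag as "{http://www.w3.org/2005/Atom}link".
    link = root.find("channel/atom:link[@rel='self']", {"atom": ATOM_NS})
    return None if link is None else link.get("href")


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print(root.find("channel/link"))  # None: a plain lookup misses the Atom element
    print(self_link(SAMPLE))
```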

Then again, the RSS format is quite permissive: like CSV, OPML and such, RSS tolerates custom additions as long as they follow general encoding/format guidelines. Conversely, it means that feeds found in the wild rarely follow the ‘standard’ accurately, so expecting an app’s RSS parser to cope with anything thrown at it is, in fairness, over-optimistic, especially when RSS import is a minor app feature and engineering for all edge cases is costly.

Anyway, to try and move this forward, I took aTbRef’s RSS feed‡, whose provenance I know :wink:, as a test input. I tried both AutoFetch import and pasting the RSS feed data into a Tinderbox note with the built-in ‘Code’ prototype set, to avoid all the ‘smart’ Apple automations fiddling with the data. To be fair to the app, it is not easy to tell when the user is employing $Text to hold ‘code’ text as opposed to general text, where affordances for visually nicer typography are generally a benison.

The TBX file I created is here: rss-parse.tbx (561.8 KB)

[Note: by the time some later readers use this doc, the source feed may have changed.]

Firstly, using AutoFetch. Here Tinderbox is processing the feed and sort of getting it right. It seems unable to correctly detect the <item> tag marking the start of each item in the feed. In addition, it is unable to un-encode typographic quotes and non-ASCII characters like Apple keyboard symbols. In fairness, these formats predate widespread Unicode support (or even UTF-8 use), so whilst this is not what I want, I’m not blaming the parser. Indeed, were I writing RSS today, I might try to add more encoding hints for any such parser.
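On the un-encoding point: feeds commonly escape the HTML inside <description>, so even after XML parsing you can be left with tags and numeric character references such as &#8217; (a right single quote). A minimal Python sketch, assuming a double-escaped description like the made-up one below, that strips tags crudely and then decodes the references with the standard library’s `html.unescape`; `plain_description` is a hypothetical name:

```python
import html
import re
import xml.etree.ElementTree as ET

# Hypothetical <item> whose description is escaped HTML containing
# numeric character references (a right single quote and the ⌘ symbol).
ITEM = ("<item><description>&lt;p&gt;It&amp;#8217;s &amp;#8984;-C "
        "&amp;amp; more&lt;/p&gt;</description></item>")


def plain_description(item_xml):
    # The XML parser undoes one layer of escaping, leaving
    # '<p>It&#8217;s &#8984;-C &amp; more</p>'.
    raw = ET.fromstring(item_xml).findtext("description", default="")
    # Crude tag strip (a sketch, not a full HTML parser); done before
    # unescaping so an escaped '&lt;' in the text is not mistaken for a tag.
    no_tags = re.sub(r"<[^>]+>", "", raw)
    # Decode character references such as &#8217; and &amp;.
    return html.unescape(no_tags).strip()


if __name__ == "__main__":
    print(plain_description(ITEM))  # It’s ⌘-C & more
```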

Note the above is Tinderbox parsing the source into $Text, it is not Preview mode.

Next, the RSS feed raw code pasted into a Tinderbox note with the ‘Code’ prototype set, here seen in ‘Preview’ mode:

Essentially there is the same error (same parsing invoked?), in that there is no clean detection of discrete <item> boundaries. It probably doesn’t help that all items are child elements of the <channel> element, which otherwise holds what might be considered the header of the feed. Still, <item>…</item> tags are not hard to detect as discrete elements.

So, can the user explode the code? Using these settings…

…I exploded the feed:

This does show one apparent bug/parse failure (note for @eastgate), whereby the line return at the end of each sentence/paragraph is incorrectly retained; this doesn’t happen with normal text (i.e. text not containing mark-up tags).
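Until that is addressed, one workaround is to pre-process the feed into plain text before exploding it. A minimal Python sketch, assuming a made-up `###` delimiter (any string that never occurs in the feed will do) and that Explode is then set to split on it. Each chunk puts the title first, on the assumption that the first line ends up as the new note’s name (check your Explode settings), and each description’s internal line breaks are collapsed, which sidesteps the stray line returns described above:

```python
import xml.etree.ElementTree as ET

DELIMITER = "###"  # hypothetical; pick a string absent from the feed

# Hypothetical two-item feed; note the line break inside the first description.
SAMPLE = """<rss version="2.0"><channel>
  <item><title>First post</title><link>https://example.com/1</link>
    <description>Line one.
Line two.</description></item>
  <item><title>Second post</title><link>https://example.com/2</link></item>
</channel></rss>"""


def to_explodable_text(feed_xml, delimiter=DELIMITER):
    """One chunk per <item>: title, link, then the description on one line."""
    channel = ET.fromstring(feed_xml).find("channel")
    if channel is None:
        raise ValueError("no <channel> element")
    chunks = []
    for item in channel.findall("item"):
        title = (item.findtext("title") or "Untitled").strip()
        link = (item.findtext("link") or "").strip()
        # split()/join collapses runs of whitespace, including line breaks.
        body = " ".join((item.findtext("description") or "").split())
        chunks.append("\n".join([title, link, body]))
    return f"\n{delimiter}\n".join(chunks)


if __name__ == "__main__":
    print(to_explodable_text(SAMPLE))
```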

Before committing even more time to researching this: it’s unclear what, exactly, you are expecting beyond ‘better than now’. :slight_smile: I sense you’d like some element values parsed out to attributes. Should $Text try to retain <description> formatting? If so, what standard(s) are you expecting to be handled?

As ‘diisgo’ is, it seems, a paid service, it may be that you can customise (or get them to improve/alter) their existing RSS templates, which might help you get a cleaner feed to work with in Tinderbox.

In areas like this, where the standards are in reality standards in name only (e.g. RSS), it is better to think about the interchange format rather than the source and sink apps. This echoes other threads about similarly loose standards like OPML or CSV (or even a user format based on a ‘standard’ like ‘semicolon-delimited data’). Many of these things are complete kludges, often due to app developers making a best effort against erstwhile technical limitations, and to a general inconsistency in adherence to the loose standards.

From experience, I’ve learned to look first at the interchange format, entirely separately from the source/target apps, and assess whether the format is actually usable with the desired combination. I’ll admit to having, in the past, innocently asked one app’s or the other’s developer to ‘just’ make things work, without realising I was likely asking them to kludge their own app to fit the failings of another. The fault lies not so much with the devs as with a failure to define and maintain standards.
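In that spirit, a quick inventory of what a feed actually contains can be scripted before blaming either app. A minimal Python sketch (standard library only; `inventory` and the sample feed are hypothetical) that checks the file is well-formed XML, reports the root element and declared version, counts <item>s, and lists any namespaced (i.e. extension) elements:

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical feed with one Atom extension element and two items.
SAMPLE = """<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Example feed</title>
    <atom:link href="https://example.com/feed.xml" rel="self"/>
    <item><title>One</title></item>
    <item><title>Two</title></item>
  </channel>
</rss>"""


def inventory(feed_xml):
    """Summarise what a feed actually contains."""
    try:
        root = ET.fromstring(feed_xml)
    except ET.ParseError as err:
        return {"well_formed": False, "error": str(err)}
    tags = Counter(el.tag for el in root.iter())
    return {
        "well_formed": True,
        "root": root.tag,
        "version": root.get("version"),
        "items": tags["item"],  # Counter returns 0 for absent keys
        # ElementTree writes namespaced tags as "{uri}local".
        "namespaced": sorted(t for t in tags if t.startswith("{")),
    }


if __name__ == "__main__":
    print(inventory(SAMPLE))
    print(inventory("<rss><channel></rss>"))  # mismatched tag
```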

This MacScripter thread, albeit from 2004, seems to offer up some ideas. I was thinking of AppleScript, or indeed AppleScript as part of an Apple Automator service. Given that we might need to parse a ‘diisgo’-created RSS feed as opposed to some other source, once all the edge cases have been worked out (with which the many different areas of expertise here can help), like as not you will have something that works for as long as ‘diisgo’ keeps going (most online ‘apps’ seem to last only a few years; few survive long-term), or until you move to some other source, in which case the script(s) can be tweaked for the new source’s understanding of the RSS standard.
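As a stand-in for the AppleScript/Automator idea (Python here, which is a swap of language, not something taken from that thread), here is a minimal sketch that turns each <item> into one row of tab-delimited text under a header row of attribute names. It assumes Tinderbox’s tab-delimited import maps header-row names to attributes, so check the import rules in aTbRef before relying on it; the column-to-element mapping and the `to_tsv` name are made-up examples.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: Tinderbox attribute name -> RSS <item> child element.
COLUMNS = {"Name": "title", "URL": "link", "Text": "description"}

# Hypothetical one-item feed; the description contains a tab and a newline.
SAMPLE = """<rss version="2.0"><channel>
  <item><title>First post</title><link>https://example.com/1</link>
    <description>Hello\t&amp; welcome\nagain</description></item>
</channel></rss>"""


def clean(value):
    """Collapse tabs and line breaks so each item stays on one row."""
    return " ".join((value or "").split())


def to_tsv(feed_xml):
    channel = ET.fromstring(feed_xml).find("channel")
    if channel is None:
        raise ValueError("no <channel> element")
    rows = ["\t".join(COLUMNS)]  # header row: attribute names
    for item in channel.findall("item"):
        rows.append("\t".join(clean(item.findtext(tag)) for tag in COLUMNS.values()))
    return "\n".join(rows) + "\n"


if __name__ == "__main__":
    print(to_tsv(SAMPLE), end="")
```

Adding a column is then just one more entry in `COLUMNS`, which is where per-source tweaks would live.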

†. On checking aTbRef’s RSS templates, I appear to use the <atom:link> element in the <channel> header. It was so long ago that the reason why quite escapes me.

‡. aTbRef’s home page offers three feed formats: RSS, Atom (an RSS variant) and JSON. The templates for creating them can be copied from the aTbRef source TBX (also available from the aTbRef site).
