Replacing Firefox live bookmarks


I use Firefox as my web browser and I use its live bookmarks feature to subscribe to RSS (and Atom) feeds across the internet. When Firefox 64 was released, Mozilla announced that they would remove the live bookmarks feature rather than continue to support it (apparently it used its own XML parser).

I realise that RSS (and Atom) feeds are dying a slow death on the internet (because content providers are slowly realising that feeds allow users to access content whilst side-stepping tracking cookies and advertising, which costs the providers revenue) but I feel that Mozilla could have put a little bit more effort into maintaining this feature. Mozilla should not care about how other websites publish their content: by removing the feature from Firefox they will contribute to the downfall of RSS (and Atom) feeds, as fewer people will use them. Anyway, Mozilla did produce an article about how to migrate your data to a different app or a Firefox add-on: “What happened to my live bookmarks?”

Personally, I decided that this wasn’t the path for me. I was already inconvenienced by the Firefox app on Android/iOS not supporting live bookmarks, so I decided to write a Python script to check my feeds and email me about new articles. This way I would always get notified no matter what device I was using (as long as I could access my emails). Additionally, the Python script keeps a record of which articles it has already emailed me about, so that it doesn’t email me about them all again just because it was run again (this means that the Python script can be run as part of a Cron job).
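The bookkeeping that makes this safe to run repeatedly from Cron boils down to “skip anything already recorded; otherwise notify, then record it immediately”. The following is a minimal sketch of that idea, not the script itself: the function “notify_new_posts” and its “send” callback are hypothetical stand-ins for the real email step, and it assumes the same JSON layout as the database shown further down (each feed URL maps to an “email” address and a list of “posts”).

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Iterable


def notify_new_posts(
    db_path: Path,
    feed: str,
    links: Iterable[str],
    send: Callable[[str, str], None],
) -> list[str]:
    """Call send(email, link) for each unseen link and record it as sent.

    Hypothetical helper: send() stands in for the real email-sending step.
    """
    data = json.loads(db_path.read_text(encoding="utf-8"))
    record = data[feed]
    sent = []
    for link in links:
        # Skip anything that was already emailed on a previous run ...
        if link in record["posts"]:
            continue
        send(record["email"], link)
        sent.append(link)
        # Save straight away, so that a crash part-way through a run does
        # not lose the record of the emails that were already sent ...
        record["posts"] = sorted(set(record["posts"]) | {link})
        db_path.write_text(
            json.dumps(data, ensure_ascii=False, indent=4, sort_keys=True),
            encoding="utf-8",
        )
    return sent


# Demonstration: two "Cron runs" over the same feed contents ...
outbox = []
with tempfile.TemporaryDirectory() as tmp:
    db = Path(tmp) / "rss_checker.json"
    feed_url = "https://example.com/feed.xml"
    db.write_text(
        json.dumps({feed_url: {"email": "you@example.com", "posts": []}}),
        encoding="utf-8",
    )
    links = ["https://example.com/a", "https://example.com/b"]
    first_run = notify_new_posts(db, feed_url, links, lambda addr, url: outbox.append((addr, url)))
    second_run = notify_new_posts(db, feed_url, links, lambda addr, url: outbox.append((addr, url)))
```

The second run finds both links already recorded, so it sends nothing; this is what allows the script to be scheduled as often as you like.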

#!/usr/bin/env python3

# Use the proper idiom in the main module ...
# NOTE: See https://docs.python.org/3.12/library/multiprocessing.html#the-spawn-and-forkserver-start-methods
if __name__ == "__main__":
    # Import standard modules ...
    import email
    import email.message
    import html
    import json
    import mimetypes
    import shutil
    import subprocess
    import sys
    import time

    # Import special modules ...
    try:
        import lxml
        import lxml.etree
    except ImportError:
        raise Exception("\"lxml\" is not installed; run \"pip install --user lxml\"") from None

    # Import my modules ...
    try:
        import pyguymer3
    except ImportError:
        raise Exception("\"pyguymer3\" is not installed; you need to have the Python module from https://github.com/Guymer/PyGuymer3 located somewhere in your $PYTHONPATH") from None

    # Check that "ssmtp" is installed ...
    if shutil.which("ssmtp") is None:
        raise Exception("\"ssmtp\" is not installed") from None

    # Define settings ...
    path = "/path/to/rss_checker.json"

    # Define function ...
    def construct_email(emailIn, feedTitleIn, postTitleIn, dateIn, linkIn, contentIn, thumbnailIn, sessIn, /):
        # Check inputs ...
        if feedTitleIn is None:
            print("WARNING: \"feedTitleIn\" is None")
            return False
        if postTitleIn is None:
            print("WARNING: \"postTitleIn\" is None")
            return False
        if dateIn is None:
            print("WARNING: \"dateIn\" is None")
            return False

        # Create email ...
        message = email.message.EmailMessage()
        message["To"] = emailIn
        message["Subject"] = f"New post in \"{feedTitleIn.text.strip()}\" feed"
        message["From"] = "you@example.com"

        # Create content ...
        contentOut = f"Post Title: {postTitleIn.text.strip()}\n"
        contentOut += f"Post Date: {dateIn.text.strip()}\n"
        contentOut += f"Post Link: {linkIn}\n"

        # Check if there is a (non-empty) article description ...
        if contentIn is not None and contentIn.text is not None:
            # Add the article description ...
            contentOut += f"Post Description:\n{html.unescape(contentIn.text.strip())}\n"

        # Set content ...
        message.set_content(contentOut)

        # Check if there is an article thumbnail ...
        if thumbnailIn is not None:
            # Obtain the thumbnail URL ...
            url = thumbnailIn.attrib.get("url", "ERROR")

            # Check that there is a thumbnail URL ...
            if url != "ERROR":
                # Download the thumbnail ...
                cont = pyguymer3.download_stream(sessIn, url)

                # Check that the thumbnail was downloaded ...
                if cont is not False:
                    # Determine MIME type (assuming JPEG if it cannot be guessed) ...
                    ftype = mimetypes.guess_type(url, strict = True)[0]
                    if ftype is None:
                        ftype = "image/jpeg"

                    # Create short-hands ...
                    maintype, subtype = ftype.split("/")

                    # Add the article thumbnail ...
                    message.add_attachment(
                        cont,
                        maintype = maintype,
                         subtype = subtype,
                        filename = f"thumbnail.{subtype}",
                    )

        # Return the answer ...
        return message

    # Load data file as JSON ...
    with open(path, "rt", encoding = "utf-8") as fObj:
        data = json.load(fObj)

    # Initialize counter and set limit ...
    n = 0                                                                       # [#]
    nlim = 30                                                                   # [#]

    # Start session ...
    with pyguymer3.start_session() as sess:
        # Loop over feeds ...
        for feed in data:
            print(f"Processing \"{feed}\" ...")

            # Download Atom/RSS (as a byte stream) ...
            src = pyguymer3.download_stream(sess, feed)
            if src is False:
                print("WARNING: Failed to download the Atom/RSS feed.")
                continue
            if len(src) == 0:
                print("WARNING: The Atom/RSS feed is empty.")
                continue

            # Parse Atom/RSS as XML with error handling ...
            # NOTE: Atom/RSS feeds have a habit of being illegal XML. For
            #       example:
            #           <title>Cinthie's 'Soul, Strings & Samples' Mini Mix</title>
            #       ... should be:
            #           <title>Cinthie&apos;s &apos;Soul, Strings &amp; Samples&apos; Mini Mix</title>
            #       ... therefore, I no longer use "xml.etree.ElementTree" but
            #       rather "lxml.etree" as it supports recovery of illegally
            #       specified characters.
            root = lxml.etree.fromstring(src, parser = lxml.etree.XMLParser(recover = True))
            if root is None:
                print("WARNING: The Atom/RSS feed could not be parsed.")
                continue

            # Determine the feed format ...
            if root.tag  == "{http://www.w3.org/2005/Atom}feed":
                print("  It is an Atom feed")

                # Loop over all entry tags in the feed ...
                for entry in root.findall("{http://www.w3.org/2005/Atom}entry"):
                    # Find the link to the article ...
                    post = entry.find("{http://www.w3.org/2005/Atom}id").text.strip()
                    if not post.startswith("http"):
                        post = entry.find("{http://www.w3.org/2005/Atom}link").get("href").strip()
                        if not post.startswith("http"):
                            raise Exception("cannot find a post that starts with \"http\"") from None

                    # Correct for common bugs ...
                    post = post.replace("www.FreeBSD.org", "www.freebsd.org")
                    post = post.replace("www.freebsd.org//", "www.freebsd.org/")

                    # Skip this article if it has already been emailed ...
                    if post in data[feed]["posts"]:
                        continue

                    # Construct email ...
                    inp = construct_email(
                        data[feed]["email"],
                        root.find("{http://www.w3.org/2005/Atom}title"),
                        entry.find("{http://www.w3.org/2005/Atom}title"),
                        entry.find("{http://www.w3.org/2005/Atom}updated"),
                        post,
                        entry.find("{http://www.w3.org/2005/Atom}summary"),
                        entry.find("{http://search.yahoo.com/mrss/}thumbnail"),
                        sess,
                    )
                    if inp is False:
                        continue

                    # Send email and increment counter ...
                    subprocess.run(
                        ["ssmtp", data[feed]["email"]],
                           check = True,
                        encoding = "utf-8",
                           input = inp.as_string(),
                         timeout = 60.0,
                    )
                    n += 1                                                      # [#]

                    print(f"  Sent email about \"{post}\"")

                    # Save article so that it is not sent again ...
                    data[feed]["posts"] = sorted(list(set(data[feed]["posts"] + [post])))
                    with open(path, "wt", encoding = "utf-8") as fObj:
                        json.dump(
                            data,
                            fObj,
                            ensure_ascii = False,
                                  indent = 4,
                               sort_keys = True,
                        )

                    # Stop sending emails or wait so that this script does not
                    # spam the server ...
                    if n >= nlim:
                        print("Finishing cleanly; sent too many emails.")
                        sys.exit()
                    time.sleep(2.0)
            elif root.tag  == "rss":
                print("  It is an RSS feed")

                # Loop over all item tags in the first channel tag of the feed ...
                for item in root.find("channel").findall("item"):
                    # Find the link to the article ...
                    post = item.find("link").text.strip()
                    if not post.startswith("http"):
                        raise Exception("cannot find a post that starts with \"http\"") from None

                    # Correct for common bugs ...
                    post = post.replace("www.FreeBSD.org", "www.freebsd.org")
                    post = post.replace("www.freebsd.org//", "www.freebsd.org/")

                    # Skip this article if it has already been emailed ...
                    if post in data[feed]["posts"]:
                        continue

                    # Construct email ...
                    inp = construct_email(
                        data[feed]["email"],
                        root.find("channel").find("title"),
                        item.find("title"),
                        item.find("pubDate"),
                        post,
                        item.find("description"),
                        item.find("{http://search.yahoo.com/mrss/}thumbnail"),
                        sess,
                    )
                    if inp is False:
                        continue

                    # Send email and increment counter ...
                    subprocess.run(
                        ["ssmtp", data[feed]["email"]],
                           check = True,
                        encoding = "utf-8",
                           input = inp.as_string(),
                         timeout = 60.0,
                    )
                    n += 1                                                      # [#]

                    print(f"  Sent email about \"{post}\"")

                    # Save article so that it is not sent again ...
                    data[feed]["posts"] = sorted(list(set(data[feed]["posts"] + [post])))
                    with open(path, "wt", encoding = "utf-8") as fObj:
                        json.dump(
                            data,
                            fObj,
                            ensure_ascii = False,
                                  indent = 4,
                               sort_keys = True,
                        )

                    # Stop sending emails or wait so that this script does not
                    # spam the server ...
                    if n >= nlim:
                        print("Finishing cleanly; sent too many emails.")
                        sys.exit()
                    time.sleep(2.0)
            else:
                raise Exception(f"\"{root.tag}\" is an unrecognized feed format") from None

You may also download “rss_checker.py” directly or view “rss_checker.py” on GitHub Gist (you may need to manually check out the “main” branch).
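The feed-format check in the script relies on namespaced tag names: the root element of an Atom document is reported as “{http://www.w3.org/2005/Atom}feed”, whereas RSS 2.0 has no namespace, so its root is simply “rss”. The sketch below reproduces that dispatch using only the standard library’s “xml.etree.ElementTree”; the function name “feed_format” and the two tiny feeds are made up for illustration. It also shows why the script uses “lxml” in recovery mode instead: the strict standard-library parser refuses the unescaped ampersand from the example in the script’s comments.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def feed_format(src: bytes) -> str:
    """Return "atom" or "rss", mirroring the script's root-tag check."""
    root = ET.fromstring(src)
    if root.tag == f"{ATOM}feed":
        return "atom"
    if root.tag == "rss":
        return "rss"
    raise ValueError(f"{root.tag!r} is an unrecognized feed format")


atom = b"""<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example</title>
  <entry><id>https://example.com/1</id><title>First</title></entry>
</feed>"""
rss = b"""<rss version="2.0"><channel>
  <title>Example</title>
  <item><link>https://example.com/1</link><title>First</title></item>
</channel></rss>"""

print(feed_format(atom))  # atom
print(feed_format(rss))   # rss

# Atom entries are found with the same "{namespace}tag" names as in the script ...
entry_ids = [e.find(f"{ATOM}id").text for e in ET.fromstring(atom).findall(f"{ATOM}entry")]

# The strict parser gives up on the unescaped "&" ...
try:
    ET.fromstring(b"<title>Cinthie's 'Soul, Strings & Samples' Mini Mix</title>")
    strict_ok = True
except ET.ParseError:
    strict_ok = False
```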

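You will see that the script has a few neat features, such as:

- it understands both Atom and RSS feeds;
- it parses feeds with “lxml” in recovery mode, so feeds that are not valid XML (for example, ones containing unescaped ampersands) can still be read;
- it attaches the article’s thumbnail (from a Media RSS “thumbnail” tag) to the email, if there is one;
- it updates its database immediately after every email, rather than at the end of the run; and
- it sends at most 30 emails per run and pauses for 2 seconds between emails, so that it does not spam the mail server.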

The JSON file that it uses as a database is shown below.

{
    "http://updating.kojevnikov.com/atom/ports": {
        "email" : "you@example.com",
        "posts" : [
            "http://updating.kojevnikov.com/entry/8e1ed6e2b38be47eab3e4d433073b80477fb9190d77192e6b62e411feb70d790",
            "http://updating.kojevnikov.com/entry/9b3c43c14a569550fef365b746a01f38d6c93d7bf98adfd02514dfa2a04ab3cf",
            "http://updating.kojevnikov.com/entry/75aaf4e5d7e7d88ebaa1786841cfda27f18c1e12517816f8b405ecbc07aa0d23",
            "http://updating.kojevnikov.com/entry/847d0f9a75de85d2207b02963a701ebfa56dbd7bb9bd86c384a5caf94da3f760",
            "http://updating.kojevnikov.com/entry/3c5d0cf2c3b838278980e3b1fbce4ac70171814e897211636ee273c59beb28ef",
            "http://updating.kojevnikov.com/entry/be1f0245edfaf5a53c6c49d889584dd637426d4109e6e19c84f3864212dbb196",
            "http://updating.kojevnikov.com/entry/9092af7b0b3ebe89eb374cef69debb511d6a3f6ff1e93bb6af98183275e1bdcc",
            "http://updating.kojevnikov.com/entry/13ad26498c2d4b2d9a3f94ccb58117f7ddfa65d8a38b950e805f079cc20de3f1",
            "http://updating.kojevnikov.com/entry/abb15a178d2dce63c2abc6995e7ebba1df1f80ba96bdc1cda1281d8fc171682b",
            "http://updating.kojevnikov.com/entry/d29c267b899c7f18381b8393d2a1e0a883fcb6dc4c5982ca60fdb131e8ee659c"
        ]
    },
    "https://what-if.xkcd.com/feed.atom": {
        "email" : "you@example.com",
        "posts" : [
            "https://what-if.xkcd.com/157/",
            "https://what-if.xkcd.com/156/",
            "https://what-if.xkcd.com/155/",
            "https://what-if.xkcd.com/154/",
            "https://what-if.xkcd.com/153/"
        ]
    },
    "https://xkcd.com/atom.xml": {
        "email" : "you@example.com",
        "posts" : [
            "https://xkcd.com/2089/",
            "https://xkcd.com/2088/",
            "https://xkcd.com/2087/",
            "https://xkcd.com/2086/"
        ]
    },
    "https://blog.xkcd.com/feed/atom/": {
        "email" : "you@example.com",
        "posts" : [
            "https://blog.xkcd.com/?p=847",
            "https://blog.xkcd.com/?p=840",
            "https://blog.xkcd.com/?p=823",
            "https://blog.xkcd.com/?p=805",
            "https://blog.xkcd.com/?p=801",
            "https://blog.xkcd.com/?p=797",
            "https://blog.xkcd.com/?p=774",
            "https://blog.xkcd.com/?p=728",
            "https://blog.xkcd.com/?p=768",
            "https://blog.xkcd.com/?p=746"
        ]
    },
    "https://bodhi.fedoraproject.org/rss/updates/?type=security": {
        "email" : "you@example.com",
        "posts" : [
            "https://bodhi.fedoraproject.org/updates/tinc-1.0.35-1.fc28",
            "https://bodhi.fedoraproject.org/updates/vcftools-0.1.16-1.fc28",
            "https://bodhi.fedoraproject.org/updates/vcftools-0.1.16-1.el7",
            "https://bodhi.fedoraproject.org/updates/leptonica-1.77.0-1.fc28%20mingw-leptonica-1.77.0-1.fc28",
            "https://bodhi.fedoraproject.org/updates/leptonica-1.77.0-1.fc29%20mingw-leptonica-1.77.0-1.fc29",
            "https://bodhi.fedoraproject.org/updates/mingw-podofo-0.9.6-5.fc29%20podofo-0.9.6-3.fc29",
            "https://bodhi.fedoraproject.org/updates/wordpress-5.0.2-1.fc29",
            "https://bodhi.fedoraproject.org/updates/wordpress-5.0.2-1.el7",
            "https://bodhi.fedoraproject.org/updates/wordpress-5.0.2-1.el6",
            "https://bodhi.fedoraproject.org/updates/wordpress-5.0.2-1.fc28",
            "https://bodhi.fedoraproject.org/updates/openjpeg2-2.3.0-10.fc29%20mingw-openjpeg2-2.3.0-6.fc29",
            "https://bodhi.fedoraproject.org/updates/openjpeg2-2.3.0-10.fc28%20mingw-openjpeg2-2.3.0-6.fc28",
            "https://bodhi.fedoraproject.org/updates/mingw-poppler-0.67.0-2.fc29",
            "https://bodhi.fedoraproject.org/updates/mingw-poppler-0.62.0-2.fc28",
            "https://bodhi.fedoraproject.org/updates/php-pear-1.10.7-2.fc29",
            "https://bodhi.fedoraproject.org/updates/php-pear-1.10.7-2.fc28",
            "https://bodhi.fedoraproject.org/updates/krb5-1.16.1-22.fc28",
            "https://bodhi.fedoraproject.org/updates/krb5-1.16.1-22.fc29",
            "https://bodhi.fedoraproject.org/updates/terminology-1.3.2-1.fc29",
            "https://bodhi.fedoraproject.org/updates/terminology-1.3.2-1.fc28"
        ]
    },
    "https://fedoramagazine.org/feed/": {
        "email" : "you@example.com",
        "posts" : [
            "https://fedoramagazine.org/best-2018-fedora-system-administrators/",
            "https://fedoramagazine.org/how-to-build-a-netboot-server-part-3/",
            "https://fedoramagazine.org/best-2018-articles-command-line/",
            "https://fedoramagazine.org/4-try-copr-december-2018/",
            "https://fedoramagazine.org/best-2018-articles-desktop-users/",
            "https://fedoramagazine.org/how-to-build-a-netboot-server-part-2/",
            "https://fedoramagazine.org/dash-dock-extenstion/",
            "https://fedoramagazine.org/fedora-classroom-containers-101-podman/",
            "https://fedoramagazine.org/secure-nfs-home-directories-kerberos/",
            "https://fedoramagazine.org/fedora-27-end-of-life/"
        ]
    },
    "https://vuxml.freebsd.org/freebsd/rss.xml": {
        "email" : "you@example.com",
        "posts" : [
            "https://www.vuxml.org/freebsd/70b774a8-05bc-11e9-87ad-001b217b3468.html",
            "https://www.vuxml.org/freebsd/b80f039d-579e-4b82-95ad-b534a709f220.html",
            "https://www.vuxml.org/freebsd/4f8665d0-0465-11e9-b77a-6cc21735f730.html",
            "https://www.vuxml.org/freebsd/fa6a4a69-03d1-11e9-be12-a4badb2f4699.html"
        ]
    },
    "https://www.freebsd.org/news/rss.xml": {
        "email" : "you@example.com",
        "posts" : [
            "https://www.FreeBSD.org/news/newsflash.html#event20181224:01",
            "https://www.FreeBSD.org/news/newsflash.html#event20181211:01",
            "https://www.FreeBSD.org/news/newsflash.html#event20181211:02",
            "https://www.FreeBSD.org/news/newsflash.html#event20181201:01",
            "https://www.FreeBSD.org/news/newsflash.html#event20181125:01",
            "https://www.FreeBSD.org/news/newsflash.html#event20181117:01",
            "https://www.FreeBSD.org/news/newsflash.html#event20181110:01",
            "https://www.FreeBSD.org/news/newsflash.html#event20181103:01",
            "https://www.FreeBSD.org/news/newsflash.html#event20181027:01",
            "https://www.FreeBSD.org/news/newsflash.html#event20181020:01"
        ]
    },
    "https://www.freebsd.org/security/rss.xml": {
        "email" : "you@example.com",
        "posts" : [
            "https://security.FreeBSD.org/advisories/FreeBSD-SA-18:15.bootpd.asc",
            "https://security.FreeBSD.org/advisories/FreeBSD-SA-18:14.bhyve.asc",
            "https://security.FreeBSD.org/advisories/FreeBSD-SA-18:13.nfs.asc",
            "https://security.FreeBSD.org/advisories/FreeBSD-SA-18:12.elf.asc",
            "https://security.FreeBSD.org/advisories/FreeBSD-SA-18:11.hostapd.asc",
            "https://security.FreeBSD.org/advisories/FreeBSD-SA-18:10.ip.asc",
            "https://security.FreeBSD.org/advisories/FreeBSD-SA-18:09.l1tf.asc",
            "https://security.FreeBSD.org/advisories/FreeBSD-SA-18:08.tcp.asc",
            "https://security.FreeBSD.org/advisories/FreeBSD-SA-18:07.lazyfpu.asc",
            "https://security.FreeBSD.org/advisories/FreeBSD-SA-18:06.debugreg.asc"
        ]
    }
}

You may also download “rss_checker.json” directly or view “rss_checker.json” on GitHub Gist (you may need to manually check out the “main” branch).

You can see that it is a simple dictionary (or associative array) where each key is the URL of an RSS (or Atom) feed and each value is itself a dictionary holding the email address to notify (“email”) and a list of the article URLs that have been emailed to date (“posts”). If you want to add a new RSS (or Atom) feed to the script then you just need to add a new key with your email address and an empty “posts” list, and run the Python script again.
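If you would rather script that step than edit the file by hand, a minimal sketch might look like the following. The “add_feed” function is hypothetical (it is not part of “rss_checker.py”); it writes the file with the same “json.dump” options that the script uses, so the file keeps its sorted, indented layout.

```python
import json
import tempfile
from pathlib import Path


def add_feed(db_path: Path, feed_url: str, email_address: str) -> bool:
    """Register a new feed in the database; return False if it already exists."""
    data = json.loads(db_path.read_text(encoding="utf-8"))
    if feed_url in data:
        # Leave existing feeds (and their history) untouched ...
        return False
    data[feed_url] = {"email": email_address, "posts": []}
    db_path.write_text(
        json.dumps(data, ensure_ascii=False, indent=4, sort_keys=True),
        encoding="utf-8",
    )
    return True


# Demonstration on a throw-away database file ...
with tempfile.TemporaryDirectory() as tmp:
    db = Path(tmp) / "rss_checker.json"
    db.write_text("{}", encoding="utf-8")
    added = add_feed(db, "https://xkcd.com/atom.xml", "you@example.com")
    added_again = add_feed(db, "https://xkcd.com/atom.xml", "you@example.com")
    saved = json.loads(db.read_text(encoding="utf-8"))
```

Bear in mind that, because a newly added feed has an empty “posts” list, the next run will treat every article currently in that feed as new and email you about each of them (subject to the script’s limit of 30 emails per run).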