
Tom Hodson

Maker, Baker, Programmer, Reformed Physicist, RSE @ ECMWF


Selfhosting: Miniflux and RSSHub

Like many nerdy, computery types, I like to subscribe to blogs and other content through RSS. RSS is crazy simple: you host a URL on a website with a list of posts, with titles/URLs/content encoded in XML (I know, I know, but it only has like 5 tags and is only nested one level deep). An RSS reader just checks a big list of those URLs every now and then and presents you with the latest things to show up.
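To show just how simple the format is, here is a made-up but valid minimal feed, parsed with nothing more than the Python standard library (the feed content is invented for the example):

```python
import xml.etree.ElementTree as ET

# A minimal RSS 2.0 document: a <channel> with a couple of <item>s.
FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>https://example.com</link>
    <description>An example feed</description>
    <item>
      <title>Hello, world</title>
      <link>https://example.com/hello</link>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/second</link>
    </item>
  </channel>
</rss>"""

channel = ET.fromstring(FEED).find("channel")
# Each post is just a title and a link, one level below <channel>.
posts = [(i.findtext("title"), i.findtext("link")) for i in channel.findall("item")]
print(posts[0])  # ('Hello, world', 'https://example.com/hello')
```

That's essentially all a reader has to do per feed, plus remembering which items it has already shown you.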

Incidentally, this is also how podcasts work, at least for now; Spotify is clearly trying to capture that ecosystem.

Anyway, I usually use theoldreader to read RSS feeds but lately they’ve implemented a premium version that you have to pay $3 a month for if you have more than 100 feeds (I have 99…).

Honestly, I use their service a lot so somehow $3 doesn’t seem so bad, but it spurred me to look into selfhosting.

Selfhosting seems to be all the rage these days. Probably in response to feeling locked into corporate megastructures, the aforementioned computery nerdy types have gone looking for ways to maintain their own anarchic web infrastructure. See e.g. the indieweb movement, Mastodon, and so on.

So I want to try out some self hosting. Let's start with an RSS reader: Miniflux seems well regarded. So I popped over there, grabbed a docker-compose.yml, ran docker compose up -d, and we seem to be off to the races.

OK, one nice thing about Miniflux compared to theoldreader: it seems to be much better at telling you when there's something wrong with your feeds. It told me about a few blogs it couldn't reach, notably Derek Lowe's excellent blog about chemical drug discovery.

That blog has an RSS feed, which loads perfectly fine in my browser but doesn't seem to work outside of that context, e.g. in Python:

```python
>>> import requests
>>> requests.get("https://blogs.sciencemag.org/pipeline/feed")
<Response [403]>
```

Playing around a bit more, adding in user agents, accepting cookies and following redirects, I eventually get back a page with a challenge that requires JS to run. This is the antithesis of how RSS should work!
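For reference, this is the kind of experiment I mean, sketched with just the standard library. The header values are only guesses at what a browser sends, not a known-good bypass, and the request is built but not sent:

```python
import urllib.request

# Pretend to be a normal browser. These values are illustrative guesses.
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
    "Accept": "application/rss+xml, application/xml;q=0.9, */*;q=0.8",
}
req = urllib.request.Request(
    "https://blogs.sciencemag.org/pipeline/feed", headers=headers
)

# urllib normalises header names; this is what would actually be sent.
print(req.get_header("User-agent"))
```

In my case none of this was enough: the server still wants JavaScript to run before it will hand over plain XML.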

OK, so to fix this I came upon RSSHub, which is a kind of RSS proxy: it parses sites that don't have RSS feeds and generates feeds for them. I saw that it has puppeteer support, so I'm hoping I can use it to bypass the anti-crawler tactics science.org is using.

Anyway, for now here is a docker-compose.yml for both Miniflux and RSSHub. What took me a while to figure out is that docker containers live in their own special network. So to subscribe to a selfhosted RSSHub feed from inside Miniflux you need to use a URL like "http://rsshub:1200/", where rsshub is the name of the service in the yaml file below.
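A small illustration of that substitution: only the scheme and host change, the route path stays the same. The helper name and the example route are mine, not part of either project:

```python
from urllib.parse import urlsplit, urlunsplit

def to_compose_internal(url: str, service: str = "rsshub", port: int = 1200) -> str:
    """Rewrite a public RSSHub route URL so Miniflux can reach it via
    Docker's internal DNS, where the compose service name resolves."""
    parts = urlsplit(url)
    return urlunsplit(("http", f"{service}:{port}", parts.path, parts.query, parts.fragment))

# e.g. a route copied from a public RSSHub instance:
print(to_compose_internal("https://rsshub.app/github/issue/DIYgod/RSSHub"))
# http://rsshub:1200/github/issue/DIYgod/RSSHub
```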

EDIT: I got it to work using puppeteer! For now the code is in a branch for which I’ll do a proper PR soon.

```yaml
version: '3'

services:
  miniflux:
    image: miniflux/miniflux:latest
    # build:
    #   context: .
    #   dockerfile: packaging/docker/alpine/Dockerfile
    container_name: miniflux
    restart: always
    healthcheck:
      test: ["CMD", "/usr/bin/miniflux", "-healthcheck", "auto"]
    ports:
      - "8889:8080"
    depends_on:
      - rsshub
      - db
    environment:
      - DATABASE_URL=postgres://miniflux:secret@db/miniflux?sslmode=disable
      - RUN_MIGRATIONS=1
      - CREATE_ADMIN=1
      - ADMIN_USERNAME=admin
      - ADMIN_PASSWORD=test123

  db:
    image: postgres:15
    environment:
      - POSTGRES_USER=miniflux
      - POSTGRES_PASSWORD=secret
    volumes:
      - miniflux-db:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "miniflux"]
      interval: 10s
      start_period: 30s

  rsshub:
    # two ways to enable puppeteer:
    # * comment out marked lines, then use this image instead: diygod/rsshub:chromium-bundled
    # * (consumes more disk space and memory) leave everything unchanged
    image: diygod/rsshub
    restart: always
    ports:
      - '1200:1200'
    environment:
      NODE_ENV: production
      CACHE_TYPE: redis
      REDIS_URL: 'redis://redis:6379/'
      PUPPETEER_WS_ENDPOINT: 'ws://browserless:3000'  # marked
    depends_on:
      - redis
      - browserless  # marked

  browserless:  # marked
    image: browserless/chrome  # marked
    restart: always  # marked
    ulimits:  # marked
      core:  # marked
        hard: 0  # marked
        soft: 0  # marked

  redis:
    image: redis:alpine
    restart: always
    volumes:
      - redis-data:/data

volumes:
  miniflux-db:
  redis-data:
```

## Backup RSS feed list

I put a small script in the repo to back up my feed list.

```shell
python -m venv ~/miniflux_python_env
source ~/miniflux_python_env/bin/activate
pip install pyyaml
```
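The script itself lives in the repo, but the core idea can be sketched against Miniflux's API: it exposes a `GET /v1/export` endpoint that returns your feeds as OPML, authenticated with an `X-Auth-Token` header. The host and token below are placeholders:

```python
import urllib.request

def build_export_request(base_url: str, api_token: str) -> urllib.request.Request:
    """Build (but don't send) a request for Miniflux's OPML export endpoint.
    base_url and api_token are placeholders for your own instance and key."""
    return urllib.request.Request(
        f"{base_url}/v1/export",
        headers={"X-Auth-Token": api_token},
    )

req = build_export_request("http://localhost:8889", "REPLACE_ME")
# Sending it would look like:
#   opml = urllib.request.urlopen(req).read()
#   open("feeds.opml", "wb").write(opml)
print(req.full_url)  # http://localhost:8889/v1/export
```

The resulting OPML file can be re-imported into Miniflux, or any other reader, if the database is ever lost.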

I’ve collected the code for the docker containers and config together into this repo.

## Backup everything to Google Drive

Use rclone.
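A rough sketch of what that looks like, assuming you've already run `rclone config` once to create a Google Drive remote (the remote name `gdrive` and the paths are placeholders):

```shell
# Dump the Postgres database out of the running container
# (service names match the docker-compose.yml above)...
docker compose exec -T db pg_dump -U miniflux miniflux > miniflux.sql

# ...then copy the dump up to Drive. Run this from cron for nightly backups.
rclone copy miniflux.sql gdrive:miniflux-backup/
```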