Selfhosting: Miniflux and RSSHub
Like many nerdy, computery types, I like to subscribe to blogs and other content through RSS. RSS is crazy simple: you host a URL on your website that serves a list of posts (titles, URLs, content) encoded in XML (I know, I know, but it only has about five tags and the posts are just a flat list). An RSS reader just checks a big list of those URLs every now and then and shows you whatever is new.
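To make "a list of posts in XML" concrete, here is a minimal sketch using only the standard library. The sample feed, `parse_items`, and `new_items` are made up for illustration. Note that the RSS 2.0 structure is `<rss>` → `<channel>` → a flat run of `<item>` elements. A real reader does the same thing on every poll: parse the items and show only the ones it hasn't seen yet.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written RSS 2.0 feed (hypothetical): one <channel> holding <item>s.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <link>https://example.com/</link>
    <item>
      <title>Second post</title>
      <link>https://example.com/2</link>
    </item>
    <item>
      <title>First post</title>
      <link>https://example.com/1</link>
    </item>
  </channel>
</rss>"""


def parse_items(xml_text: str) -> list[tuple[str, str]]:
    """Return (title, link) pairs for every <item>, in document order."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


def new_items(items: list[tuple[str, str]], seen_links: set[str]) -> list[tuple[str, str]]:
    """What a reader does on each poll: keep only items not shown before."""
    fresh = [(title, link) for title, link in items if link not in seen_links]
    seen_links.update(link for _, link in fresh)
    return fresh


seen: set[str] = set()
print(new_items(parse_items(SAMPLE_FEED), seen))  # first poll: both posts
print(new_items(parse_items(SAMPLE_FEED), seen))  # second poll: nothing new
```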
Incidentally, RSS is also how podcasts work, at least for now; Spotify is clearly trying to capture podcasting.
Anyway, I usually use theoldreader to read RSS feeds, but lately they've introduced a premium version that costs $3 a month if you have more than 100 feeds (I have 99…).
Honestly, I use their service a lot, so $3 somehow doesn't seem so bad, but it spurred me to look into selfhosting.
Selfhosting seems to be all the rage these days. Probably in response to feeling locked into corporate megastructures, the aforementioned nerdy, computery types have gone looking for ways to run their own anarchic web infrastructure. See, e.g., the IndieWeb movement, Mastodon, etc. etc.
So I want to try out some selfhosting. Let's start with an RSS reader. Miniflux seems well regarded, so I popped over there, grabbed a `docker-compose.yml`, ran `docker compose up -d`, and we seem to be off to the races.
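Right after `docker compose up -d` the containers may still be starting. Here is a small sketch that waits until Miniflux answers. It assumes Miniflux is published on host port 8889, as in the compose file in this post, and that its `/healthcheck` endpoint returns HTTP 200 once it's ready. `probe` and `wait_until_healthy` are hypothetical helper names.

```python
import time
import urllib.request

# Assumptions: Miniflux published on host port 8889 (this post's compose file)
# and GET /healthcheck returns HTTP 200 once the app is ready.
HEALTHCHECK_URL = "http://localhost:8889/healthcheck"


def probe(url: str = HEALTHCHECK_URL) -> bool:
    """One health check: True only for an HTTP 200 response."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError, refused connections, timeouts
        return False


def wait_until_healthy(url: str = HEALTHCHECK_URL,
                       timeout_s: float = 60.0, interval_s: float = 2.0) -> bool:
    """Poll `url` until it is healthy or `timeout_s` seconds have passed."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if probe(url):
            return True
        time.sleep(interval_s)
    return False
```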
OK, one nice thing about Miniflux compared to theoldreader is that it seems better at telling you when something is wrong with your feeds. It flagged a few blogs it couldn't reach, notably Derek Lowe's excellent blog about chemical drug discovery.
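The same error information is available through the Miniflux REST API, which is handy for scripting. This is a sketch that assumes Miniflux is on `localhost:8889` and that you've generated an API key in the web UI. It reads `GET /v1/feeds` and keeps feeds whose `parsing_error_count` is non-zero. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumptions: Miniflux at localhost:8889 (this post's compose file) and an
# API key created in the Miniflux web UI.
MINIFLUX_URL = "http://localhost:8889"


def fetch_feeds(api_key: str, base_url: str = MINIFLUX_URL) -> list[dict]:
    """GET /v1/feeds, authenticating with the X-Auth-Token header."""
    req = urllib.request.Request(f"{base_url}/v1/feeds",
                                 headers={"X-Auth-Token": api_key})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def broken_feeds(feeds: list[dict]) -> list[tuple[str, str]]:
    """Return (feed_url, error message) for every feed Miniflux failed to fetch or parse."""
    return [
        (f["feed_url"], f.get("parsing_error_message", ""))
        for f in feeds
        if f.get("parsing_error_count", 0) > 0
    ]
```

Usage: `for url, err in broken_feeds(fetch_feeds("YOUR_API_KEY")): print(url, err)`.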
Derek Lowe's blog does have an RSS feed, which loads perfectly fine in my browser but doesn't seem to work outside of that context, e.g. when fetched from Python.
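A quick way to see the difference from a script is to fetch the URL with plain `urllib` and check whether what comes back is actually a feed. Here is a sketch: `looks_like_feed` and `naive_fetch` are hypothetical names, and it only recognises RSS 2.0 (`<rss>`) and Atom (`<feed>`) roots. The feed URL isn't reproduced here, so pass your own.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

ATOM_FEED_TAG = "{http://www.w3.org/2005/Atom}feed"


def looks_like_feed(body: bytes) -> bool:
    """True if the body parses as XML whose root is <rss> or an Atom <feed>."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return False  # HTML error or challenge pages usually fail to parse as XML
    return root.tag in ("rss", ATOM_FEED_TAG)


def naive_fetch(url: str) -> tuple[int, bytes]:
    """Plain request with Python's default User-Agent, as a bare script would send."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        return err.code, err.read()  # keep the error page so it can be inspected
```

Usage: `status, body = naive_fetch(feed_url); print(status, looks_like_feed(body))`.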
Playing around a bit more (setting a User-Agent, accepting cookies, and following redirects), I eventually get back a page with a challenge that requires JavaScript to run. This is the antithesis of how RSS should work!
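Those tweaks look roughly like the following in `urllib`. This is a sketch, not the exact code I ran. The User-Agent string is an arbitrary browser-like example. Redirects need no extra setup because `build_opener` installs `HTTPRedirectHandler` by default. None of this can get past a JavaScript challenge, because `urllib` never executes JavaScript.

```python
import http.cookiejar
import urllib.request

# Arbitrary browser-like User-Agent (an assumption; any realistic one will do).
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0"


def browser_like_opener() -> urllib.request.OpenerDirector:
    """Opener that keeps cookies between requests and sends browser-ish headers.

    build_opener() also installs HTTPRedirectHandler, so redirects are followed.
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [
        ("User-Agent", BROWSER_UA),
        ("Accept", "application/rss+xml, application/xml;q=0.9, */*;q=0.8"),
    ]
    return opener
```

Usage: `body = browser_like_opener().open(feed_url, timeout=10).read()`.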
To work around the JS challenge, I came across RSSHub, a kind of RSS proxy: it parses sites that don't have RSS feeds and generates feeds for them. It has Puppeteer support, so I'm hoping I can use it to get past the anti-crawler tactics science.org is using.
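RSSHub itself is a Node.js project with per-site routes, and the snippet below is not its code. It only illustrates the core idea in Python: once a page has been scraped (by Puppeteer or anything else) into a list of posts, turning that list into an RSS 2.0 document is straightforward. `build_rss` and the item dictionary keys are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_rss(title: str, site_url: str, items: list[dict]) -> str:
    """Serialize scraped items (keys: title, link, published) into RSS 2.0."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = f"Generated feed for {site_url}"
    for it in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = it["title"]
        ET.SubElement(item, "link").text = it["link"]
        ET.SubElement(item, "guid").text = it["link"]  # link doubles as a stable id
        # RSS 2.0 uses RFC 822-style dates; format_datetime needs an aware datetime.
        ET.SubElement(item, "pubDate").text = format_datetime(it["published"])
    return ET.tostring(rss, encoding="unicode")


print(build_rss("Example", "https://example.com/", [
    {"title": "Post", "link": "https://example.com/p",
     "published": datetime(2024, 1, 1, tzinfo=timezone.utc)},
]))
```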
Anyway, for now, here is a `docker-compose.yml` for both Miniflux and RSSHub. What took me a while to figure out is that Docker Compose puts the containers on their own network, where each service is reachable by its name. So to subscribe Miniflux to a selfhosted RSSHub feed, you need to use something like "http://rsshub:1200/", where `rsshub` is the service's key in the YAML file below.
EDIT: I got it to work using Puppeteer! For now the code is in a branch; I'll open a proper PR soon.
```yaml
version: '3'

services:
  miniflux:
    image: miniflux/miniflux:latest
    # build:
    #   context: .
    #   dockerfile: packaging/docker/alpine/Dockerfile
    container_name: miniflux
    restart: always
    healthcheck:
      test: ["CMD", "/usr/bin/miniflux", "-healthcheck", "auto"]
    ports:
      - "8889:8080"
    depends_on:
      - rsshub
      - db

    environment:
      # "db" resolves to the postgres service below on the compose network
      - DATABASE_URL=postgres://miniflux:secret@db/miniflux?sslmode=disable
      - RUN_MIGRATIONS=1
      - CREATE_ADMIN=1
      # change these credentials (and the postgres password) before exposing this
      - ADMIN_USERNAME=admin
      - ADMIN_PASSWORD=test123
  db:
    image: postgres:15
    environment:
      - POSTGRES_USER=miniflux
      - POSTGRES_PASSWORD=secret
    volumes:
      - miniflux-db:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "miniflux"]
      interval: 10s
      start_period: 30s

  rsshub:
    # two ways to enable puppeteer:
    # * comment out marked lines, then use this image instead: diygod/rsshub:chromium-bundled
    # * (consumes more disk space and memory) leave everything unchanged
    image: diygod/rsshub
    restart: always
    ports:
      - '1200:1200'
    environment:
      NODE_ENV: production
      CACHE_TYPE: redis
      REDIS_URL: 'redis://redis:6379/'
      PUPPETEER_WS_ENDPOINT: 'ws://browserless:3000' # marked
    depends_on:
      - redis
      - browserless # marked

  browserless: # marked
    image: browserless/chrome # marked
    restart: always # marked
    ulimits: # marked
      core: # marked
        hard: 0 # marked
        soft: 0 # marked

  redis:
    image: redis:alpine
    restart: always
    volumes:
      - redis-data:/data

volumes:
  miniflux-db:
  redis-data:
```
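Subscribing can also be scripted through the Miniflux API. This is a sketch assuming Miniflux is on `localhost:8889` (per the compose file above), an API key from the web UI, and an existing category id. `/some/route` stands in for whatever RSSHub route you actually want. Because Miniflux fetches the feed from inside the compose network, the feed URL uses the service name `rsshub`, not `localhost`. `subscribe_request` is a hypothetical helper.

```python
import json
import urllib.request

# Assumptions: Miniflux on localhost:8889, an API key from the web UI,
# and an existing category with id 1.
MINIFLUX_URL = "http://localhost:8889"


def subscribe_request(api_key: str, feed_url: str, category_id: int,
                      base_url: str = MINIFLUX_URL) -> urllib.request.Request:
    """Build a POST /v1/feeds request asking Miniflux to subscribe to feed_url."""
    body = json.dumps({"feed_url": feed_url, "category_id": category_id}).encode()
    return urllib.request.Request(
        f"{base_url}/v1/feeds",
        data=body,
        headers={"X-Auth-Token": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder route; Miniflux resolves "rsshub" on the compose network.
req = subscribe_request("YOUR_API_KEY", "http://rsshub:1200/some/route", category_id=1)
# urllib.request.urlopen(req) would send it; the JSON reply carries the new feed's id.
```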
## Back up RSS feed list

I put a small script in the repo to back up the feed list.
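This is not the script from the repo. It's an illustrative sketch of the same idea that uses Miniflux's OPML export endpoint (`GET /v1/export`). It assumes Miniflux on `localhost:8889` and an API key from the web UI. `backup_path` and `export_opml` are hypothetical names, and the date-stamped filename keeps older backups from being overwritten.

```python
import datetime
import pathlib
import urllib.request

# Assumptions: Miniflux on localhost:8889 and an API key from the web UI.
MINIFLUX_URL = "http://localhost:8889"


def backup_path(directory: pathlib.Path, today: datetime.date) -> pathlib.Path:
    """Date-stamped file name so older backups are not overwritten."""
    return directory / f"miniflux-feeds-{today.isoformat()}.opml"


def export_opml(api_key: str, directory: pathlib.Path,
                base_url: str = MINIFLUX_URL) -> pathlib.Path:
    """Download all subscriptions as OPML via GET /v1/export and save them."""
    req = urllib.request.Request(f"{base_url}/v1/export",
                                 headers={"X-Auth-Token": api_key})
    with urllib.request.urlopen(req, timeout=30) as resp:
        opml = resp.read()
    directory.mkdir(parents=True, exist_ok=True)
    path = backup_path(directory, datetime.date.today())
    path.write_bytes(opml)
    return path
```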
I’ve collected the code for the docker containers and config together into this repo.
## Back up everything to Google Drive

Use rclone.
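Here is one way this could look, as a sketch under several assumptions. It expects an rclone remote named `gdrive` (hypothetical, created beforehand with `rclone config`) and a local `backups` directory next to `docker-compose.yml`, and it must be run from that directory. It dumps the Miniflux Postgres database from the compose `db` service and then uploads the directory. It uses `rclone copy` rather than `rclone sync` because `copy` does not delete anything on the remote.

```python
import pathlib
import subprocess

# Hypothetical names: an rclone remote "gdrive" (set up with `rclone config`)
# and a local "backups" directory next to docker-compose.yml.
REMOTE = "gdrive:miniflux-backups"
LOCAL_DIR = pathlib.Path("backups")


def pg_dump_cmd() -> list[str]:
    """Dump the Miniflux database from the compose `db` service.

    -T disables pseudo-TTY allocation so stdout can be redirected to a file.
    """
    return ["docker", "compose", "exec", "-T", "db",
            "pg_dump", "-U", "miniflux", "miniflux"]


def rclone_cmd(local_dir: pathlib.Path = LOCAL_DIR, remote: str = REMOTE) -> list[str]:
    """Upload the backup directory; `rclone copy` does not delete remote files."""
    return ["rclone", "copy", str(local_dir), remote]


def run_backup() -> None:
    """Dump the database, then push the backups directory to Google Drive."""
    LOCAL_DIR.mkdir(parents=True, exist_ok=True)
    with open(LOCAL_DIR / "miniflux.sql", "wb") as out:
        subprocess.run(pg_dump_cmd(), stdout=out, check=True)
    subprocess.run(rclone_cmd(), check=True)
```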