We have had a rough week migrating the Packet Pushers site. The problems are behind us and you should not fear our RSS Feed.
Here is some self-abuse with a postmortem and some learning about running WordPress at modest scale.
- The RSS feeds failed to migrate cleanly.
- We didn’t freeze the content between the data export and go-live. It should have been ok but it made debugging more difficult.
- Because of legacy, we now have two podcasting plugins for WordPress. We couldn’t test these properly until the site was live. Related: Reading XML isn’t hard, making sense of XML is hard.
- Our RSS provider had a caching problem that was hard to diagnose and get them to fix.
- Our Web developer wasn’t responsive in providing answers to reasonable questions.
Some Good Stuff
- Speed – SO MUCH FASTER. Roughly a 10x improvement in page load time (the old site took about 10 seconds). Home page load time is under 1 second!
- The site is fully SSL with Let’s Encrypt certificates
- All Packet Pushers RSS feeds are SSL-enabled with certificates signed by Let’s Encrypt
- Website is under our control and we can make changes as the business changes
- We separated the Ignition membership platform into a different WordPress instance to reduce complexity and load.
RSS Feeds and Trouble
WordPress generates RSS feeds for different things – site feed, categories, authors etc. An RSS feed for podcasting carries a range of extra metadata, and a plugin is used to generate that metadata. Each podcasting plugin developer seems to use a different method to do this. One plugin uses taxonomies and others use categories. Once you go down a path it’s very difficult to go back, as we have discovered. Today we have a plugin that publishes podcasts using taxonomies and another plugin to add metadata to the taxonomies created.
Got that ? Confused ?
RSS readers look for uniqueness to identify a new post. It seems that most RSS readers use the GUID in the XML data as the key since it’s a globally unique ID. (This may not always be true, RSS isn’t very good at many things.)
During the migration, it took some time and work to get the plugins working correctly and, likely, the GUIDs were changing as we worked out which settings work. I’ll remind you that we are stuck with two plugins because fixing a WordPress taxonomy is not a trivial task.
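To make the GUID problem concrete, here is a minimal Python sketch of how a reader might decide what counts as a new post. It is not how any particular podcast app works; the function names, the `make_feed` helper and the example.com URLs are all made up for illustration.

```python
import xml.etree.ElementTree as ET


def item_guids(feed_xml: str) -> list[str]:
    """Return the <guid> text of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    guids = []
    for item in root.iter("item"):
        guid = item.findtext("guid")
        if guid is not None:
            guids.append(guid.strip())
    return guids


def new_items(seen: set[str], feed_xml: str) -> list[str]:
    """GUIDs in the feed that this (hypothetical) reader has not seen yet."""
    return [g for g in item_guids(feed_xml) if g not in seen]


def make_feed(guids: list[str]) -> str:
    """Build a tiny RSS document for the demo (illustrative helper only)."""
    items = "".join(
        f"<item><title>Show</title><guid>{g}</guid></item>" for g in guids
    )
    return f'<rss version="2.0"><channel><title>Demo</title>{items}</channel></rss>'


# Same two episodes, but the second GUID changed during a migration.
before = make_feed(["https://example.com/?p=1", "https://example.com/?p=2"])
after = make_feed(["https://example.com/?p=1", "https://example.com/episode-2"])

seen = set(item_guids(before))
print(new_items(seen, after))  # the old episode shows up again as "new"
```

A reader that keys on the GUID treats the second episode as a brand new post even though nothing about the content changed, which is roughly what a shifting GUID would do to subscribers.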
At some point in this process our RSS feed provider had an issue. The feeds stopped refreshing and kept serving stale data that pointed to our previous RSS and metadata. Troubleshooting two problems stacked on top of each other is not easy.
Why Migrate At All
Speed. The old website took about 10 seconds to load. The provider lied to our faces, telling us there was nothing that they could do and that the only solution was to upgrade to a special service at a cost of some thousands per year.
Why Use a Managed Service ?
In 2014, the burden of operating the PP website was high and time spent maintaining WordPress was preventing us from making content. We had $dayjobs taking 50+ hours a week.
We chose to use a managed service called Rainmaker, largely based on their membership platform, which we had wanted to introduce since 2012.
Looking back it was a good-ish decision to outsource and we were able to pretend that running the website was a solved problem. We have built the business to the point where we can pay someone to run the infrastructure.
Like everything outsourced, it fails miserably and it’s just a matter of how long it takes. Rainmaker failed to deliver on 90% of what they promised and recently they gave up (like most startups) by selling themselves to WP Engine. Our experience of WP Engine is “a bunch of high school kids with too much VC money” and nothing has changed my opinion on that front. (Do not use WP Engine for WordPress, seriously. It will end in tears.)
Servers, Podcasting and RSS Feeds
We estimate that Packet Pushers has between 75K–100K unique subscribers across the podcast channels we operate. We deliberately do NOT collect user data so we don’t really have a more accurate picture. Which is ok as this still makes us one of the largest media businesses in the enterprise IT market.
Just this single podcast feed has ~22K subscribers. Each podcast app has a different method of checking for new content – Overcast makes its own copy of our feed to save battery life while most Android podcatchers scan our feeds on a regular basis.
The RSS feed can be a very large file. A feed with 50 items, including the XML metadata and content, is often 500K.
Incidentally, this is why we now run an excerpt-only feed to reduce the size of the RSS XML so your user experience is improved. Less battery and bandwidth on your mobile devices to check the feed.
Mobile devices with background polling enabled will hit the site regularly to check for new content. With thousands of users, this means your server is getting seriously hammered at regular intervals for unchanged XML data. At the time, the server was averaging 60-75% CPU usage on a fairly substantial configuration.
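A rough back-of-envelope sketch in Python shows why this adds up. The feed size, item count and subscriber number come from the figures above; the polling rate is purely an assumption (we don’t know how often each app checks), and the sketch pretends every subscriber downloads the full feed every time, which overstates things for apps like Overcast that poll centrally.

```python
FEED_BYTES = 500 * 1000   # "often 500K" for a 50-item feed
ITEMS = 50
SUBSCRIBERS = 22_000      # the single feed mentioned above
POLLS_PER_DAY = 24        # ASSUMPTION: one background check per hour


def bytes_per_item(feed_bytes: int, items: int) -> float:
    """Average size of one feed item, metadata and content included."""
    return feed_bytes / items


def daily_requests(subscribers: int, polls_per_day: int) -> int:
    """Feed requests per day if every subscriber polls at the same rate."""
    return subscribers * polls_per_day


def daily_transfer_gb(subscribers: int, polls_per_day: int, feed_bytes: int) -> float:
    """Gigabytes per day if every poll downloads the whole feed."""
    return daily_requests(subscribers, polls_per_day) * feed_bytes / 1e9


print(bytes_per_item(FEED_BYTES, ITEMS))                           # 10000.0
print(daily_requests(SUBSCRIBERS, POLLS_PER_DAY))                  # 528000
print(daily_transfer_gb(SUBSCRIBERS, POLLS_PER_DAY, FEED_BYTES))   # 264.0
print(round(daily_requests(SUBSCRIBERS, POLLS_PER_DAY) / 86_400, 1))  # 6.1 requests/sec
```

Change `POLLS_PER_DAY` to taste; the point is only that unchanged XML multiplied by thousands of pollers becomes real load.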
We spent some time tuning the caching plugin for WordPress, only to discover that nothing has changed since 2014: caching plugins still do nothing for RSS feeds. We added Cloudflare but got very little traffic reduction. Post-cutover this hasn’t really changed, and I have some future work to determine why caching isn’t working so well for us. I would expect more than 80% of the packetpushers.net site to be cached since nothing changes or is dynamic.
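As a starting point for that future work, here is a hedged Python sketch of how the response headers on a feed URL could be read to guess why a response isn’t being cached. `CF-Cache-Status` is the header Cloudflare adds to responses; the `diagnose_cache` function, its wording and the sample headers are made up for illustration and are not output from our servers.

```python
def diagnose_cache(headers: dict[str, str]) -> str:
    """Guess why a response was or wasn't cached, from its headers alone."""
    h = {k.lower(): v for k, v in headers.items()}
    status = h.get("cf-cache-status", "").upper()
    # Split "no-cache, max-age=0" into ["no-cache", "max-age=0"].
    parts = [p.strip() for p in h.get("cache-control", "").lower().split(",") if p.strip()]
    names = {p.split("=")[0] for p in parts}

    if status in ("HIT", "REVALIDATED"):
        return "cached at the edge"
    blockers = names & {"no-store", "no-cache", "private"}
    if blockers:
        return "origin forbids caching: " + ", ".join(sorted(blockers))
    if "max-age=0" in parts:
        return "origin forbids caching: max-age=0"
    if "set-cookie" in h:
        return "response sets a cookie"
    if status == "DYNAMIC":
        return "not eligible by default; needs an explicit cache rule"
    if status in ("MISS", "EXPIRED"):
        return "cacheable, but fetched from origin this time"
    return "inconclusive"


# Hypothetical header sets, for illustration only.
print(diagnose_cache({"CF-Cache-Status": "DYNAMIC", "Content-Type": "application/rss+xml"}))
print(diagnose_cache({"CF-Cache-Status": "BYPASS", "Cache-Control": "no-cache, must-revalidate, max-age=0"}))
print(diagnose_cache({"cf-cache-status": "HIT"}))
```

The function takes a plain dict so it can be fed headers from whatever HTTP client or log you already have; it makes no network calls itself.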
The solution is to use an RSS provider to handle RSS feeds, which reduces the inbound traffic and gives us some simple tools to diagnose problems and get insights into how people access our content. Good news – server CPU is now down to 20%. Of course, cutting over from the website to an RSS provider may have further impacted your RSS experience.
And we get this useful data from the RSS provider, derived from the User Agent in the HTTP request. Knowing just how dominant Apple Podcasts is means we focus our efforts on getting reviews on the Apple Podcasts app.
Note: requests to the RSS feed contain no personal data, just the same simple HTTP requests that any web server gathers. We can see that a request is made and what was requested. Packet Pushers has a policy of collecting minimum data to respect our audience and because it’s expensive to collect, store and analyse that data. Why bother with all of that ?
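For the curious, a tally by podcast app needs nothing more than the User Agent string. This is a minimal Python sketch of the idea, not our RSS provider’s actual method; the substrings in `APP_PATTERNS` and the sample User Agent strings are illustrative assumptions and would need checking against real logs.

```python
from collections import Counter

# ASSUMPTION: illustrative substrings only, matched case-insensitively.
# Order matters: the first matching app wins.
APP_PATTERNS = [
    ("Overcast", ("overcast",)),
    ("Pocket Casts", ("pocketcasts", "pocket casts")),
    ("Apple Podcasts", ("applecoremedia", "podcasts/")),
]


def classify(user_agent: str) -> str:
    """Map a User-Agent string to an app name, or "Other" if unknown."""
    ua = user_agent.lower()
    for app, needles in APP_PATTERNS:
        if any(n in ua for n in needles):
            return app
    return "Other"


def tally(user_agents: list[str]) -> Counter:
    """Count requests per app; nothing but the User-Agent is looked at."""
    return Counter(classify(ua) for ua in user_agents)


# Hypothetical sample of User-Agent strings.
samples = [
    "Podcasts/1410.53 CFNetwork/1121.2.2 Darwin/19.3.0",
    "AppleCoreMedia/1.0.0 (iPhone; U; CPU OS 13_3 like Mac OS X)",
    "Overcast/3.0 (+http://overcast.fm/; iOS podcast app)",
    "AntennaPod/1.8.1",
]
print(tally(samples).most_common())
```

No IP addresses, cookies or accounts are involved, which is in keeping with the minimum-data policy above.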
Self-hosted. Why not cloud ? Running self-hosted is simpler than using the cloud for now. Learning AWS/GCP/Azure to the level we needed for competent deployment and operations would take some time. Rainmaker is shutting down and we didn’t have time to wait.
Plus cloud is expensive, more expensive than a simple IaaS instance running at $10 per month.
Load Balanced WordPress ? Again, not possible in the time frame. Also, troubleshooting and operations are very difficult. Better to vertically scale WordPress, use a CDN, or shift load to a service as we have done. In particular, the load on the database is substantial and the database would be hard to share across a multi-instance setup. Cheaper to scale up the VM for now.
Other Managed Services : We aren’t large enough to engage a full corporate service from managed providers, which is what we would need because of our customisations that aim towards podcasting. If the company grows, then it’s a possibility.