Finding broken links on a Pelican blog

Earlier this month, while reading some old blog posts, I found several dead links. I decided to replace each one with a link to a suitable capture on the Internet Archive.

Following that, I thought it would be a good idea to check every link on the blog for broken ones, as automatically as possible.

At first I came across a plugin for Pelican, pelican-deadlinks. This plugin checks links and redirects broken ones to the Internet Archive at « compile » time. Here's the catch with this behavior: links are replaced in the HTML output of the blog, not in the source. I was quite uncomfortable with this, as I wanted to propagate the changes back to the source.

So I went back to searching for another tool and found riplink. This little tool, written in Go, takes a URL as argument, fetches all the links on the page, and prints those that return a 4xx or 5xx response code. E.g.:

```shell
~ $ riplink -url
~ $ riplink -url -verbose
200
200
200
```

As I only want to check blog posts¹, I use riplink together with find and a for loop against the local development server, which gives me:

```shell
~ $ for i in $(find output/blog/ -mindepth 4 -maxdepth 4 -type d) ; do riplink -url http://localhost:8000/${i#output/*} ; sleep 1 ; done
404
404
```
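The `${i#output/*}` expansion strips the leading `output/` from each directory found by find, so the remainder maps onto the local server's URL space. A minimal illustration, using a hypothetical path in the same layout:

```shell
# Hypothetical directory, mirroring Pelican's output layout:
i="output/blog/2018/05/some-post"
# "#" removes the shortest prefix matching the pattern "output/*";
# "*" matches the empty string here, so only "output/" is stripped.
echo "${i#output/*}"   # → blog/2018/05/some-post
```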

The last step is manual: for each broken link, I search the Internet Archive for the capture closest to the date of the post.
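This lookup could probably be semi-automated with the Wayback Machine's availability API, which reports the capture closest to a requested timestamp. A sketch, where the dead link and the date are hypothetical placeholders:

```shell
# Hypothetical dead link and post date (timestamp format: YYYYMMDD):
link="http://example.com/old-page"
ts="20150601"
# The availability endpoint returns JSON describing the closest capture;
# the URL built below can be fetched with e.g.: curl -s "$api"
api="https://archive.org/wayback/available?url=${link}&timestamp=${ts}"
echo "$api"
```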


  1. the source of my blog consists of one folder per post, each containing a post.markdown file