I use a custom 404 page on this site which emails me a notification whenever a bad request for a page is made. Over the last five days I’ve received in excess of 15,000 of these 404 emails. Oh dear!
This system has been pretty useful for spotting mistakes, both mine and other people’s. If I include a wrong link in a post, I soon start getting 404 emails and can quickly correct the issue. If someone else has a bad link to me, the email includes the referer (sic) field, so I can quickly trace the problem. Great stuff! It’s been working fine for ages.
Then, a few days ago, I noticed a lot (a few hundred) of 404 emails in the Gmail folder they are automatically shunted to. I glanced through them and saw that they looked like permalinks to my old blog URL, .../b2/archives/p/1234..., that had somehow been corrupted into .../journalized//p/1234/.... I also noticed that the requests were all coming from Yahoo’s search engine web crawler. I moved one back to my Gmail inbox and popped a star on it to remind me to look into it.
So here I am today looking into it. Until now I hadn’t realized there were more than a few hundred! It was only when I fired up Thunderbird to clear out my POP3 mailboxes that I saw some 20,000 emails waiting to download!
A fairly quick investigation revealed that my old b2 redirect script was still in place. But when I changed some code around and added some debug output to it, I got nothing: the requests weren’t even reaching the script. Aha! I vaguely remembered fiddling with redirects in my .htaccess file the other day. I quickly spotted the culprit and commented out the line. Yay! Instantly fixed.
I’d been trying to short-circuit the PHP redirect code with a quicker Apache redirect for the simplest case, using the following line:
Redirect Permanent /b2/archives https://journalized.zed1.com/

There are so many regular expression RedirectMatch lines in there that I forgot a plain Redirect line like this one keeps the rest of the requested URL and appends it to the target. You can even see where the extra slash came from: the target ends with a slash, and the leftover path starts with one!
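To make the mechanism concrete, here’s a small Python model of how a plain Redirect treats the requested path. It’s a simplified sketch, not Apache’s actual code: the function name simulate_redirect is hypothetical, and it assumes the URL-path only matches on a whole path-segment boundary and ignores query strings.

```python
from typing import Optional


def simulate_redirect(request_path: str, url_path: str, target: str) -> Optional[str]:
    """Simplified model of mod_alias `Redirect url_path target`.

    Returns the Location a client would be sent to, or None if the rule
    does not match. Assumption: the prefix must end on a path-segment
    boundary; query strings and other details are ignored.
    """
    prefix = url_path.rstrip("/")
    matches = request_path == url_path or request_path.startswith(prefix + "/")
    if not matches:
        return None
    # Everything after the matched URL-path is carried over verbatim.
    remainder = request_path[len(url_path):]
    return target + remainder


# An old b2 permalink: the leftover "/p/1234" is glued onto a target
# that already ends in "/", producing the double slash.
print(simulate_redirect("/b2/archives/p/1234", "/b2/archives", "https://journalized.zed1.com/"))
# → https://journalized.zed1.com//p/1234
```

A RedirectMatch rule, by contrast, only sends whatever its target URL spells out (plus any backreferences you put in it), which is why a file full of them made this behaviour easy to forget.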
Lesson learned: when making a change like this, don’t just check that it works; check that nothing else is broken!