I use a custom 404 page on this site which emails me a notification whenever a bad request for a page is made. Over the last five days I’ve received in excess of 15,000 of these 404 emails. Oh dear!
This system has been pretty useful for spotting mistakes, mine and other people’s. If I include a wrong link in a post, I soon start getting 404 emails, and I can quickly correct the issue. If someone else has a bad link to me, the email includes the referer (sic) field, so I can quickly trace the problem. Great stuff! It’s been working fine for ages.
Then, a few days ago, I noticed a lot (a few hundred) of 404 emails in the Gmail folder they are automatically shunted to. I glanced through them and noticed that they looked like permanent links to my old blog URL, .../b2/archives/p/1234..., that had somehow been corrupted into .../journalized//p/1234/... I also noticed that the requests were coming from Yahoo’s search engine web crawler. I moved one back to my Gmail inbox and popped a star on it to remind me to look into it.
So here I am today looking into it. I still hadn’t realized there were more than a few hundred! It was only when I fired up Thunderbird to clear out my POP3 mailboxes that I saw some 20,000 emails waiting to download!
A fairly quick investigation revealed that my old b2 redirect script was still in place. But when I changed some code around and added some debug output to it, I got nothing. Ah ha! I vaguely remembered fiddling with redirects in my .htaccess file the other day. I quickly spotted the culprit and commented out the line. Yay! Instantly fixed.
I’d been trying to short-circuit the PHP redirect code with the quicker Apache redirect for the simplest case, using the following line: Redirect Permanent /b2/archives https://journalized.zed1.com/
There are so many regular expression RedirectMatch lines in there that I forgot that a plain Redirect line appends the rest of the requested URL to the target when redirecting. You can even see where the extra slash came from: the target ends in one, and the leftover path starts with one!
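For anyone who wants to see the mechanics, here is a rough sketch in Python of how a prefix-style Redirect behaves. It is a simplified model I've made up for illustration, not Apache's actual code, and the function name simulate_redirect is hypothetical. It assumes only that the rule matches the request path on a whole-segment prefix and appends whatever is left over to the target.

```python
from typing import Optional


def simulate_redirect(url_path: str, target: str, request_path: str) -> Optional[str]:
    """Simplified model of a prefix-style Redirect rule (an illustration only).

    Returns the redirect location, or None if the rule does not apply.
    """
    prefix = url_path.rstrip("/")
    # The rule applies to the exact path, or to anything below it.
    if request_path == url_path or request_path.startswith(prefix + "/"):
        # Whatever follows the matched prefix is appended to the target as-is.
        return target + request_path[len(url_path):]
    return None


if __name__ == "__main__":
    rule = ("/b2/archives", "https://journalized.zed1.com/")
    for path in ("/b2/archives", "/b2/archives/p/1234", "/b2/other"):
        print(path, "->", simulate_redirect(*rule, path))
```

With the rule from my .htaccess, a request for /b2/archives/p/1234 leaves /p/1234 over, and gluing that onto a target that already ends in a slash gives the doubled slash that showed up in all those 404 emails.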
Lesson learned: When making a change like this don’t just check it works, check that the other stuff isn’t broken!
Short link to this post: https://z1.tl/l4
Don’cha hate when that happens? I did the exact same thing when I moved servers, so it was doing some strange 301-ing.
Chalk it up to the webmaster’s laziness. Or rather, the miscommunication between the brain and the dimwit fingers 😛
You should consider setting up an RSS feed for these notifications – I run feeds for all sorts of stuff from my site and it’s pretty handy and does not mess up your mail either! 🙂
Richard has a great idea … you could also probably hack together a database tool that would allow you to get some handle on the sources of the 404s … let some scripts do the work for you rather than using brainpower to follow the rabbit trail. 🙂
Hey Mike, just a quick note to say thanks loads for the great theme. I am using it for my internet Radio Show (podcast) FirstPersonShow.net
I’ve enjoyed having a poke around your website, and I’ll be back again to check out more later.
thanks,
Kevin
“celebrating the uncelebrated”
Would you share your 404 page? 🙂
Thanks.
Hi Eric,
I’ll look at packaging it up and distributing it here.
Mike
I call it htaccess hazard. This is the reason I am extremely cautious touching that thing.
However sometimes it is the best option.
u seem to be an expert when it comes to pc webs blogs n programming etc.
im just a beginner i mean i just started my blog in http://sparkling-spirit.blogspot.com/
u can stop by there ur comments will b more than welcomed
just one Q : I wanna have a calendar in my blog how can i add that?
thanx
O ok i just saw that the calendar u have is related to ur archive .
oops !!
neyway what bout the pic ? i want to have a pic next to my nick name each time i upload a post how can i do that?
Oh, you mean you don’t like it when everyone puts in a wrong url just so you can get e-mail? =)
I usually just go over my stats at the end of every week and see where everyone is going wrong and fix it from there.