The mystery of pages disappearing from the Google index

I work for a reasonably large publishing organisation. Over the years we have changed the format of the URLs for our articles. There are a large number of links to these articles on the Internet and we want to keep supporting them, so we have a couple of applications that redirect users from the old URLs to the new ones. We recently changed the URL format again, this time for the last time: we moved to a nice short URL format but continue to support the old formats. This was fine for a couple of months, but last week I noticed that none of our article content was indexed by Google anymore, so I started to investigate what the cause could be.

I logged on to Google Webmaster Tools and discovered a few things.

  1. One of the sites that we were redirecting to was distributing malware. There was a hidden iframe on their page that was going somewhere nasty, so we changed the redirect to point to another page on that site.
  2. There were some messages saying that there was a large amount of duplicate content on our site. Most of this was caused by the jsessionid URL parameter, so we used the webmaster tools to tell Google to ignore this parameter.
    We also canonicalized our URLs by adding <link rel="canonical" href="http://www.developerslog.org/newshorturl.html"> to each of our article pages, with the href set to that article's short URL.
  3. We are going to change the redirects from the old URL formats to 301 (permanent) redirects instead of 302 (temporary) ones. We are a Java shop and I did not find the implementation of the 301 redirect particularly intuitive. Most of the documentation out there for doing a redirect just says httpServletResponse.sendRedirect("http://www.developerslog.org/newshorturl.html"); which sends a 302 redirect. After a bit of searching I found how to do a 301 redirect:

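    // Send a 301 (Moved Permanently) instead of the 302 that sendRedirect() issues.
    httpServletResponse.setStatus(HttpServletResponse.SC_MOVED_PERMANENTLY);
    // The Location header carries the redirect target.
    httpServletResponse.setHeader("Location", "http://www.developerslog.org/");
    httpServletResponse.setHeader("Connection", "close");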
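
A quick way to confirm which status code an old URL really sends is to request it without following the redirect. The following is only a sketch using classes that ship with the JDK, not part of our servlet code. The names `RedirectProbe`, `probe` and `probeLocal` are made up for this example, and the throwaway local `HttpServer` merely stands in for one of our redirecting applications so that the example runs on its own.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;

final class RedirectProbe {

    /** Requests the URL without following redirects; returns "<status> <Location header>". */
    static String probe(String url) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setInstanceFollowRedirects(false); // we want to see the redirect itself
            try {
                return conn.getResponseCode() + " " + conn.getHeaderField("Location");
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Starts a local stand-in server that redirects with the given status, then probes it. */
    static String probeLocal(int status, String target) {
        try {
            // Port 0 lets the operating system pick any free port.
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange -> {
                exchange.getResponseHeaders().set("Location", target);
                exchange.sendResponseHeaders(status, -1); // -1 means no response body
                exchange.close();
            });
            server.start();
            try {
                int port = server.getAddress().getPort();
                return probe("http://127.0.0.1:" + port + "/old/article"); // hypothetical old path
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String target = "http://www.developerslog.org/newshorturl.html";
        System.out.println(probeLocal(301, target)); // permanent redirect
        System.out.println(probeLocal(302, target)); // temporary redirect
    }
}
```

Calling `probe` with one of the real old-format URLs instead of the local stand-in would show whether the live application answers with 301 or still with 302.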

I think the API for HttpServletResponse should have another method for sending a redirect that accepts the status code. Just my opinion, but something like this would be more intuitive:

    // The overload I would like to have:
    httpServletResponse.sendRedirect("http://www.example.com/newshorturl.html", HttpServletResponse.SC_MOVED_PERMANENTLY);

Now we just have to wait and see what happens.
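
On the duplicate content point, the canonical URL could also be derived in code rather than typed by hand. The sketch below is an assumption about how that might look, not what we actually deployed: `CanonicalUrls`, `stripJsessionId` and `canonicalLinkTag` are hypothetical names, and it assumes the session id shows up in the form servlet containers use when rewriting URLs, that is `;jsessionid=...` appended to the path.

```java
import java.util.regex.Pattern;

final class CanonicalUrls {

    // ";jsessionid=<value>", where the value ends at the next ';', '/', '?' or '#'.
    private static final Pattern JSESSIONID =
            Pattern.compile(";jsessionid=[^;/?#]*", Pattern.CASE_INSENSITIVE);

    /** Removes the session id path parameter so every visitor sees the same URL. */
    static String stripJsessionId(String url) {
        return JSESSIONID.matcher(url).replaceAll("");
    }

    /** Builds the canonical link element for an article page. */
    static String canonicalLinkTag(String url) {
        String href = stripJsessionId(url)
                .replace("&", "&amp;")    // minimal escaping for an HTML attribute value
                .replace("\"", "&quot;");
        return "<link rel=\"canonical\" href=\"" + href + "\">";
    }

    public static void main(String[] args) {
        String withSession =
                "http://www.developerslog.org/newshorturl.html;jsessionid=ABC123?page=2";
        System.out.println(stripJsessionId(withSession));
        System.out.println(canonicalLinkTag(withSession));
    }
}
```

If the session id were passed as an ordinary query parameter instead, the pattern would have to change accordingly.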


11 Responses to The mystery of pages disappearing from the Google index

  1. admin says:

    One thing that I did leave out is that we had some server issues and our site was very unstable for a few days. This does not seem so mysterious now. I found another person with a similar problem: their database crashed and half their pages disappeared from the Google index. Full story here

  2. Ingrid says:

    I truly knew about the majority of this, but with that in mind, I still assumed it was useful. Great blog!

  4. Hey mate! I really appreciate what you’re providing here. Keep working that way.

  5. Stitches says:

    This shows real expertise. Thanks for the answer.

  6. admin says:

    Everything above was just a guess at what the problem was. The downtime theory was heading in the right direction, but after we talked with Google we got the true causes. They are here

  7. Pingback: True cause of the pages dissappearing from the google index « Uncategorized « Developers Log
