Next is the robots.txt file, and this is very important. The robots.txt file can tell Google not to crawl certain pages, and it can also tell Google where the sitemaps are located on the site. The only requirements of a robots.txt file are the User-agent and Disallow lines. In the example we're looking at here for IBM, you can see those right here. For any robots.txt file, you know it's valid if you can see a User-agent line, with the agent string here, and then a Disallow line. You don't even have to have these individual Disallow entries down here; you just have to have a Disallow line in general.

What comes after Disallow is the heart of the robots.txt file, because Disallow indicates which parts of the site the client is instructing crawlers not to crawl. So here you can see that IBM is telling Google that they don't want it to crawl anything at this /link path, anything at /common/err..., etcetera.

So the first step you'll want to take is to go to your client's sitename.com/robots.txt. Again, we're looking at this example ibm.com/robots.txt URL to see whether they have that file uploaded. If they don't have this file uploaded, it can usually be auto-created by most CMS platforms, similar to the XML sitemaps.

Once you've got the file, you first want to check that the root domain of the site and any important pages are not listed under Disallow. So we can check here, and we see that ibm.com, which would appear as just a single slash, is not under Disallow. This is a pretty common mistake after a new site launch or refresh, as developers will sometimes put URLs there that they don't want the public to access until those pages are ready to go live. Removing the Disallow rule afterward is pretty easy to forget, but we're in good shape here.

After that, you'll want to review the rest of the content under Disallow and make sure that everything there looks like extraneous pages: for example, email content or third-party advertising, things that don't need to be crawled. Everything IBM has here under Disallow is presumably a part of their site that isn't important for the standard user to see in search results. If there are any important pages under Disallow that you think should be included in Google's index, you want to flag those to the client.
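If you'd rather not eyeball the file, the checks described above can be roughly sketched in Python with the standard library's urllib.robotparser. Everything specific here is an assumption for illustration: the sample robots.txt text, the www.example.com domain, the list of important paths, and the helper names find_blocked_pages and list_sitemaps are all hypothetical, and this is not IBM's actual file. Python's parser also won't match Google's crawler behavior in every edge case, so treat the result as a first pass.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only (not a real site's file).
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /email/
Disallow: /ads/
Sitemap: https://www.example.com/sitemap.xml
"""


def find_blocked_pages(robots_txt, site, important_paths, agent="Googlebot"):
    """Return the important URLs that this robots.txt text blocks for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Build absolute URLs; "/" stands for the root domain of the site.
    urls = [site.rstrip("/") + path for path in important_paths]
    return [url for url in urls if not parser.can_fetch(agent, url)]


def list_sitemaps(robots_txt):
    """Return the sitemap URLs declared in the robots.txt text (may be empty)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap lines are present (Python 3.8+).
    return parser.site_maps() or []


if __name__ == "__main__":
    # Assumed list of pages the client cares about; adjust per site.
    pages = ["/", "/products/", "/email/newsletter.html"]
    blocked = find_blocked_pages(SAMPLE_ROBOTS_TXT, "https://www.example.com", pages)
    print("Blocked important pages:", blocked)
    print("Sitemaps:", list_sitemaps(SAMPLE_ROBOTS_TXT))
```

Anything that comes back in the blocked list and that you think belongs in Google's index is what you'd flag to the client. To run this against a live site instead of a pasted string, RobotFileParser also has set_url and read methods for fetching the file.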