{"id":262,"date":"2019-05-23T13:41:29","date_gmt":"2019-05-23T08:11:29","guid":{"rendered":"https:\/\/newszii.com\/marketing\/?p=262"},"modified":"2019-05-23T17:41:39","modified_gmt":"2019-05-23T12:11:39","slug":"know-about-robots-txt-file","status":"publish","type":"post","link":"https:\/\/www.newszii.com\/marketing\/know-about-robots-txt-file\/","title":{"rendered":"What You Need To Know About Robots.Txt File"},"content":{"rendered":"<!-- date start--><p class=\"last-updated\" style=\"font-size:14px;\">Published on May 23rd, 2019<\/p><!-- date end--><p>Why should I have a robots.txt file? Isn&#8217;t it great when search engines visit my site frequently and index my content?<\/p>\n<h4>What? You Don&#8217;t Want All Of Your Online Content To Be Indexed?<\/h4>\n<p>Perhaps you want to avoid the risk of a duplicate <a href=\"https:\/\/newszii.com\/marketing\/website-copywriting-mistakes\/\" target=\"_blank\" rel=\"noopener\">content penalty<\/a>. Apart from this, your site might contain sensitive data that you do not want the world to see.<\/p>\n<p>In either case, you would prefer that <a href=\"https:\/\/www.newszii.com\/search-engine-optimization-in-2019\" target=\"_blank\" rel=\"noopener\">search engines<\/a> do not index those pages.<\/p>\n<p>Robots.txt is a simple text (not HTML) file placed on your web server that tells web crawlers like <a href=\"https:\/\/searchengineland.com\/heres-what-happened-when-i-followed-googlebot-for-3-months-308674\" target=\"_blank\" rel=\"nofollow noopener\">Googlebot<\/a> whether they may access a file.<\/p>\n<p>By default, search engines are greedy. They want to index as much high-quality information as they can, and will assume that they can crawl everything unless you tell them otherwise.
If you specify rules for all bots (*) and rules for a specific bot (like Googlebot), that bot follows its own rules and ignores the global\/default ones.<\/p>\n<p>In practice, <a href=\"https:\/\/support.google.com\/webmasters\/answer\/6062596?hl=en\" target=\"_blank\" rel=\"nofollow noopener\">robots.txt<\/a> files indicate whether certain user agents (web-crawling software) can or cannot crawl parts of a website. These crawl instructions are specified by \u201cdisallowing\u201d or \u201callowing\u201d the behavior of certain (or all) user agents.<\/p>\n<p>A robots.txt file lives at the root of your site. So, for the site www.example.com, the robots.txt file lives at www.example.com\/robots.txt. It is a plain text file that follows the Robots Exclusion Standard and consists of one or more rules. Each rule blocks (or allows) access for a given crawler to a specified file path on that website. However, before you create or edit robots.txt, <a href=\"https:\/\/support.google.com\/webmasters\/answer\/6062608?hl=en\" target=\"_blank\" rel=\"nofollow noopener\">you should know the limits of this URL blocking method<\/a>. At times, you might want to consider other mechanisms to ensure your URLs are not findable on the web.<\/p>\n<ol>\n<li>Robots.txt<\/li>\n<li>Creating Robots.txt File<\/li>\n<\/ol>\n<h4>1.
Robots.Txt<\/h4>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-289\" src=\"https:\/\/newszii.com\/marketing\/wp-content\/uploads\/2019\/05\/Robots.Txt_1.png\" alt=\" Robots.Txt\" width=\"1429\" height=\"750\" srcset=\"https:\/\/www.newszii.com\/marketing\/wp-content\/uploads\/2019\/05\/Robots.Txt_1.png 1429w, https:\/\/www.newszii.com\/marketing\/wp-content\/uploads\/2019\/05\/Robots.Txt_1-300x157.png 300w, https:\/\/www.newszii.com\/marketing\/wp-content\/uploads\/2019\/05\/Robots.Txt_1-768x403.png 768w, https:\/\/www.newszii.com\/marketing\/wp-content\/uploads\/2019\/05\/Robots.Txt_1-1024x537.png 1024w, https:\/\/www.newszii.com\/marketing\/wp-content\/uploads\/2019\/05\/Robots.Txt_1-810x425.png 810w, https:\/\/www.newszii.com\/marketing\/wp-content\/uploads\/2019\/05\/Robots.Txt_1-1140x598.png 1140w\" sizes=\"(max-width: 1429px) 100vw, 1429px\" \/><\/p>\n<p>Websites always store robots.txt at the root of the domain, because that is the only place search engine spiders look for it. They will not look anywhere else on the website, so you can\u2019t specify a different location for it. The file name is also always written in lower case: robots.txt.<\/p>\n<p>Robots.txt tells the <a href=\"https:\/\/moz.com\/beginners-guide-to-seo\/how-search-engines-operate\" target=\"_blank\" rel=\"nofollow noopener\">search engine spiders<\/a> which parts of the website to crawl and index and which parts to ignore. In that sense, it directs where the spiders go on your site.<\/p>\n<p>It works like a recommendation rather than an enforced rule: robots.txt recommends which parts of the site to index, and well-behaved spiders follow that recommendation.<\/p>\n<h4>2. Creating Robots.Txt File<\/h4>\n<p>How do you create a robots.txt file? First, you need to identify what should go in it. A robots.txt file is a list of instructions. One part of the robots.txt file is the User-agent line.
It tells the spiders reading the file which of them should pay attention to the instructions that follow. Often the User-agent is \u201c*\u201d, which means \u201call robots\u201d.<\/p>\n<p>The rules themselves follow the User-agent line. Remember that there should not be any blank lines within a group of instructions, because some parsers read a blank line as the end of the group. The instructions in robots.txt usually take one of these forms:<\/p>\n<ul>\n<li>Disallow: \/folder\/<\/li>\n<li>Disallow: \/file.htm<\/li>\n<\/ul>\n<p>Each line must carry only one instruction. Anything you put after \u201c#\u201d is completely ignored, because the spiders treat it as a comment. It is therefore advisable to write each comment on a separate line of its own.<\/p>\n<p>When creating your site\u2019s robots.txt file, be extra careful with your syntax and commands. If robots fail to recognize a command in your robots.txt, they may misinterpret it and conclude that you want them to stay away from <a href=\"https:\/\/help.woorank.com\/hc\/en-us\/articles\/360000236389-Make-Sure-Google-is-Indexing-Your-Site\" target=\"_blank\" rel=\"nofollow noopener\">indexing your website<\/a>. Incorrect syntax may even prevent your entire site from being indexed by the robots.<\/p>\n<p>Create the robots.txt file and check it two or three times before you upload it. This helps ensure your site is indexed properly and minimizes the chance of errors in commands and syntax.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Published on May 23rd, 2019. Why should I have a robots.txt file? Isn&#8217;t it great when search engines visit my site frequently and index my content? What? You Don&#8217;t Want All Of Your Online Content To Be Indexed? Perhaps you want to avoid the risk of a duplicate content penalty.
Apart from this, your site [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":290,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[21],"tags":[6,49,43,5],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v23.4 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>What You Need To Know About Robots.Txt File<\/title>\n<meta name=\"description\" content=\"The robots.txt file, also known as the robots exclusion protocol, is a text file webmasters. It tells search robots which pages you would like them not to visit.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.newszii.com\/marketing\/know-about-robots-txt-file\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"What You Need To Know About Robots.Txt File\" \/>\n<meta property=\"og:description\" content=\"The robots.txt file, also known as the robots exclusion protocol, is a text file webmasters. 
It tells search robots which pages you would like them not to visit.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.newszii.com\/marketing\/know-about-robots-txt-file\/\" \/>\n<meta property=\"og:site_name\" content=\"Newszii | Marketing\" \/>\n<meta property=\"article:published_time\" content=\"2019-05-23T08:11:29+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2019-05-23T12:11:39+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.newszii.com\/marketing\/wp-content\/uploads\/2019\/05\/Robots.Txt2_.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"628\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"newsziimarketing\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"newsziimarketing\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.newszii.com\/marketing\/know-about-robots-txt-file\/\",\"url\":\"https:\/\/www.newszii.com\/marketing\/know-about-robots-txt-file\/\",\"name\":\"What You Need To Know About Robots.Txt File\",\"isPartOf\":{\"@id\":\"https:\/\/www.newszii.com\/marketing\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.newszii.com\/marketing\/know-about-robots-txt-file\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.newszii.com\/marketing\/know-about-robots-txt-file\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.newszii.com\/marketing\/wp-content\/uploads\/2019\/05\/Robots.Txt2_.png\",\"datePublished\":\"2019-05-23T08:11:29+00:00\",\"dateModified\":\"2019-05-23T12:11:39+00:00\",\"author\":{\"@id\":\"https:\/\/www.newszii.com\/marketing\/#\/schema\/person\/3c8109aa24e93dbde4ecb31a18f57aec\"},\"description\":\"The robots.txt file, also known as the robots exclusion protocol, is a text file webmasters. 
It tells search robots which pages you would like them not to visit.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.newszii.com\/marketing\/know-about-robots-txt-file\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.newszii.com\/marketing\/know-about-robots-txt-file\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.newszii.com\/marketing\/know-about-robots-txt-file\/#primaryimage\",\"url\":\"https:\/\/www.newszii.com\/marketing\/wp-content\/uploads\/2019\/05\/Robots.Txt2_.png\",\"contentUrl\":\"https:\/\/www.newszii.com\/marketing\/wp-content\/uploads\/2019\/05\/Robots.Txt2_.png\",\"width\":1200,\"height\":628,\"caption\":\"Robots.Txt\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.newszii.com\/marketing\/know-about-robots-txt-file\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.newszii.com\/marketing\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"What You Need To Know About Robots.Txt File\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.newszii.com\/marketing\/#website\",\"url\":\"https:\/\/www.newszii.com\/marketing\/\",\"name\":\"Newszii | 
Marketing\",\"description\":\"\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.newszii.com\/marketing\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.newszii.com\/marketing\/#\/schema\/person\/3c8109aa24e93dbde4ecb31a18f57aec\",\"name\":\"newsziimarketing\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.newszii.com\/marketing\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/55b198d34e3307f2018457bea99c7951?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/55b198d34e3307f2018457bea99c7951?s=96&d=mm&r=g\",\"caption\":\"newsziimarketing\"},\"url\":\"https:\/\/www.newszii.com\/marketing\/author\/newsziimarketing\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"What You Need To Know About Robots.Txt File","description":"The robots.txt file, also known as the robots exclusion protocol, is a text file webmasters. It tells search robots which pages you would like them not to visit.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.newszii.com\/marketing\/know-about-robots-txt-file\/","og_locale":"en_US","og_type":"article","og_title":"What You Need To Know About Robots.Txt File","og_description":"The robots.txt file, also known as the robots exclusion protocol, is a text file webmasters. 
It tells search robots which pages you would like them not to visit.","og_url":"https:\/\/www.newszii.com\/marketing\/know-about-robots-txt-file\/","og_site_name":"Newszii | Marketing","article_published_time":"2019-05-23T08:11:29+00:00","article_modified_time":"2019-05-23T12:11:39+00:00","og_image":[{"width":1200,"height":628,"url":"https:\/\/www.newszii.com\/marketing\/wp-content\/uploads\/2019\/05\/Robots.Txt2_.png","type":"image\/png"}],"author":"newsziimarketing","twitter_card":"summary_large_image","twitter_misc":{"Written by":"newsziimarketing","Est. reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.newszii.com\/marketing\/know-about-robots-txt-file\/","url":"https:\/\/www.newszii.com\/marketing\/know-about-robots-txt-file\/","name":"What You Need To Know About Robots.Txt File","isPartOf":{"@id":"https:\/\/www.newszii.com\/marketing\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.newszii.com\/marketing\/know-about-robots-txt-file\/#primaryimage"},"image":{"@id":"https:\/\/www.newszii.com\/marketing\/know-about-robots-txt-file\/#primaryimage"},"thumbnailUrl":"https:\/\/www.newszii.com\/marketing\/wp-content\/uploads\/2019\/05\/Robots.Txt2_.png","datePublished":"2019-05-23T08:11:29+00:00","dateModified":"2019-05-23T12:11:39+00:00","author":{"@id":"https:\/\/www.newszii.com\/marketing\/#\/schema\/person\/3c8109aa24e93dbde4ecb31a18f57aec"},"description":"The robots.txt file, also known as the robots exclusion protocol, is a text file webmasters. 
It tells search robots which pages you would like them not to visit.","breadcrumb":{"@id":"https:\/\/www.newszii.com\/marketing\/know-about-robots-txt-file\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.newszii.com\/marketing\/know-about-robots-txt-file\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.newszii.com\/marketing\/know-about-robots-txt-file\/#primaryimage","url":"https:\/\/www.newszii.com\/marketing\/wp-content\/uploads\/2019\/05\/Robots.Txt2_.png","contentUrl":"https:\/\/www.newszii.com\/marketing\/wp-content\/uploads\/2019\/05\/Robots.Txt2_.png","width":1200,"height":628,"caption":"Robots.Txt"},{"@type":"BreadcrumbList","@id":"https:\/\/www.newszii.com\/marketing\/know-about-robots-txt-file\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.newszii.com\/marketing\/"},{"@type":"ListItem","position":2,"name":"What You Need To Know About Robots.Txt File"}]},{"@type":"WebSite","@id":"https:\/\/www.newszii.com\/marketing\/#website","url":"https:\/\/www.newszii.com\/marketing\/","name":"Newszii | 
Marketing","description":"","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.newszii.com\/marketing\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.newszii.com\/marketing\/#\/schema\/person\/3c8109aa24e93dbde4ecb31a18f57aec","name":"newsziimarketing","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.newszii.com\/marketing\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/55b198d34e3307f2018457bea99c7951?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/55b198d34e3307f2018457bea99c7951?s=96&d=mm&r=g","caption":"newsziimarketing"},"url":"https:\/\/www.newszii.com\/marketing\/author\/newsziimarketing\/"}]}},"_links":{"self":[{"href":"https:\/\/www.newszii.com\/marketing\/wp-json\/wp\/v2\/posts\/262"}],"collection":[{"href":"https:\/\/www.newszii.com\/marketing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newszii.com\/marketing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newszii.com\/marketing\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newszii.com\/marketing\/wp-json\/wp\/v2\/comments?post=262"}],"version-history":[{"count":8,"href":"https:\/\/www.newszii.com\/marketing\/wp-json\/wp\/v2\/posts\/262\/revisions"}],"predecessor-version":[{"id":327,"href":"https:\/\/www.newszii.com\/marketing\/wp-json\/wp\/v2\/posts\/262\/revisions\/327"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newszii.com\/marketing\/wp-json\/wp\/v2\/media\/290"}],"wp:attachment":[{"href":"https:\/\/www.newszii.com\/marketing\/wp-json\/wp\/v2\/media?parent=262"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newszii.com\/marketing\/wp-json\/wp\/v2\/categories?post=262"},{"taxonomy":"post_tag","embeddab
le":true,"href":"https:\/\/www.newszii.com\/marketing\/wp-json\/wp\/v2\/tags?post=262"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}