geminispace.info

gemini search engine
git clone https://git.clttr.info/geminispace.info.git

handling-robots.md (244B)


# robots.txt handling

GUS fetches robots.txt for each (sub)domain before crawling any of its content.

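
The ordering rule can be illustrated with a minimal Python sketch. It assumes robots.txt lives at `/robots.txt` on each host and that a (sub)domain is identified by the URL's host and port; `robots_url` and `fetch_order` are hypothetical helpers, not GUS's actual crawler code.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(url: str) -> str:
    """Return the robots.txt URL for the (sub)domain of `url`."""
    parts = urlsplit(url)
    # netloc keeps the full host name plus any explicit port, so
    # example.org and sub.example.org each get their own robots.txt.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def fetch_order(urls: list[str]) -> list[str]:
    """Return `urls` with each host's robots.txt placed before its first page."""
    seen_hosts: set[str] = set()
    order: list[str] = []
    for url in urls:
        host = urlsplit(url).netloc
        if host not in seen_hosts:
            # First page seen for this host: schedule its robots.txt first.
            seen_hosts.add(host)
            order.append(robots_url(url))
        order.append(url)
    return order
```

For example, `fetch_order(["gemini://example.org/a.gmi", "gemini://sub.example.org/b.gmi"])` schedules `gemini://example.org/robots.txt` before the first page and `gemini://sub.example.org/robots.txt` before the second.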
GUS honors robots.txt rules addressed to the following user-agents:
* `indexer`
* `*` (the wildcard that applies to all crawlers)
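
One way to apply these user-agents is Python's standard `urllib.robotparser`, sketched below under stated assumptions: the robots.txt content is hypothetical, and `may_crawl` checks the `indexer` group, which `RobotFileParser` uses when it is present and otherwise falls back to the `*` group. Whether GUS resolves the two groups exactly this way is an assumption.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt served by a capsule.
ROBOTS_TXT = """\
User-agent: indexer
Disallow: /private/

User-agent: *
Disallow: /
"""


def parse_robots(text: str) -> RobotFileParser:
    """Parse robots.txt content that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def may_crawl(parser: RobotFileParser, url: str) -> bool:
    # Assumption: ask as "indexer". RobotFileParser applies the "indexer"
    # group if one exists and falls back to the "*" group otherwise.
    return parser.can_fetch("indexer", url)
```

With the sample file, `indexer` may crawl everything except `/private/`, while any crawler without its own group is shut out by the `*` group.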

## robots.txt caching

Each fetched robots.txt is cached only for the duration of the current crawl, so the next crawl fetches it again.
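
Per-crawl caching can be sketched as a cache object that is created fresh at the start of every crawl and discarded afterwards. This is an illustrative Python sketch, not GUS's implementation: `CrawlRobotsCache`, `run_crawl`, and the injected `fetch` callable (returning the robots.txt body, or `None` if there is none) are hypothetical, and treating a missing robots.txt as "no restrictions" is an assumption.

```python
from typing import Callable, Optional
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

Fetch = Callable[[str], Optional[str]]  # hypothetical fetch function type


class CrawlRobotsCache:
    """robots.txt cache that lives only as long as one crawl."""

    def __init__(self, fetch: Fetch) -> None:
        self._fetch = fetch
        self._parsers: dict[str, RobotFileParser] = {}

    def parser_for(self, url: str) -> RobotFileParser:
        parts = urlsplit(url)
        host = parts.netloc
        if host not in self._parsers:
            robots = urlunsplit((parts.scheme, host, "/robots.txt", "", ""))
            body = self._fetch(robots)
            parser = RobotFileParser()
            # Assumption: a missing robots.txt means "no restrictions".
            parser.parse((body or "").splitlines())
            self._parsers[host] = parser
        return self._parsers[host]

    def can_crawl(self, url: str) -> bool:
        return self.parser_for(url).can_fetch("indexer", url)


def run_crawl(urls: list[str], fetch: Fetch) -> list[str]:
    """Return the URLs a single crawl may visit."""
    cache = CrawlRobotsCache(fetch)  # fresh cache: nothing survives between crawls
    return [url for url in urls if cache.can_crawl(url)]
```

Because `run_crawl` builds a new cache each time, a second crawl fetches every robots.txt again, while within a single crawl each host's file is fetched at most once.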