geminispace.info

gemini search engine
git clone https://git.clttr.info/geminispace.info.git

commit e691231ec872d4dd241a69cc418eba42c32e967f
parent 8520ec533ce63a745c5dbb1bafc5c23722244f94
Author: René Wagner <rwagner@rw-net.de>
Date:   Fri, 26 Feb 2021 18:52:51 +0100

gsi specific updates 2021-02-26

Diffstat:
M serve/templates/documentation/indexing.gmi | 2 +-
M serve/templates/news.gmi | 5 +++++
2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/serve/templates/documentation/indexing.gmi b/serve/templates/documentation/indexing.gmi
@@ -24,7 +24,7 @@ GUS currently tends to update its index a few times per month. The last updated
 To control crawling of your site, you can use a robots.txt file, Place it in your capsule's root directory such that a request for "robots.txt" will fetch it. It should be returned with a mimetype of `text/plain`.
-GUS obeys User-agent of "gus" and "*".
+GUS obeys User-agent of "indexer" and "*".
 ### How can I recognize GUS requests?
diff --git a/serve/templates/news.gmi b/serve/templates/news.gmi
@@ -3,6 +3,11 @@
 ## News
+### 2021-02-26
+I've made some adjustments on how GUS/geminispace.info uses robots.txt.
+Previously we tried to honor the settings for *, indexer and gus user-agents. That didn't work out well with the available python libraries for robots parsing and GUS ended up crawling files it wasn't intended tto.
+We now only use the settings for * and indexer, no special handling for GUS anymore. All indexers unite. ;)
+
 ### 2021-02-02
 The first fully unattended index update has happened last night. There are still some rough edges to be cleaned, but we are on the way to have up-to-date search results without manual intervention.
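
For illustration only, the robots.txt behaviour described in the news entry could be sketched with Python's standard `urllib.robotparser`. This is not the crawler code from this repository: the helper name `is_crawl_allowed`, the sample robots.txt and the paths are hypothetical. It also assumes that a path counts as crawlable only when both the "*" rules and the "indexer" rules allow it, which the commit does not spell out.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt of a capsule, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: indexer
Disallow: /drafts/
"""

# The two user-agents named in the news entry.
HONORED_AGENTS = ("*", "indexer")


def is_crawl_allowed(robots_txt: str, path: str) -> bool:
    """Return True if every honored user-agent may fetch `path`.

    Assumption: the rules for "*" and "indexer" are combined
    conservatively, so a disallow in either group blocks the path.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return all(parser.can_fetch(agent, path) for agent in HONORED_AGENTS)


if __name__ == "__main__":
    for path in ("/index.gmi", "/private/notes.gmi", "/drafts/post.gmi"):
        print(path, is_crawl_allowed(SAMPLE_ROBOTS_TXT, path))
```

Checking each user-agent separately sidesteps the question of how a parsing library merges a specific group with the "*" group. The cost is that a capsule cannot use an "indexer" section to re-allow something the "*" section forbids.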