commit e691231ec872d4dd241a69cc418eba42c32e967f
parent 8520ec533ce63a745c5dbb1bafc5c23722244f94
Author: René Wagner <rwagner@rw-net.de>
Date: Fri, 26 Feb 2021 18:52:51 +0100
gsi specific updates 2021-02-26
Diffstat:
2 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/serve/templates/documentation/indexing.gmi b/serve/templates/documentation/indexing.gmi
@@ -24,7 +24,7 @@ GUS currently tends to update its index a few times per month. The last updated
To control crawling of your site, you can use a robots.txt file. Place it in your capsule's root directory such that a request for "robots.txt" will fetch it. It should be returned with a MIME type of `text/plain`.
-GUS obeys User-agent of "gus" and "*".
+GUS obeys User-agent of "indexer" and "*".
### How can I recognize GUS requests?
diff --git a/serve/templates/news.gmi b/serve/templates/news.gmi
@@ -3,6 +3,11 @@
## News
+### 2021-02-26
+I've made some adjustments to how GUS/geminispace.info uses robots.txt.
+Previously we tried to honor the settings for the *, indexer and gus user-agents. That didn't work out well with the available Python libraries for robots.txt parsing, and GUS ended up crawling files it wasn't intended to.
+We now only use the settings for * and indexer, with no special handling for GUS anymore. All indexers unite. ;)
+
### 2021-02-02
The first fully unattended index update happened last night.
There are still some rough edges to be cleaned up, but we are on the way to having up-to-date search results without manual intervention.
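
The robots.txt handling described in the 2021-02-26 entry could be sketched roughly as follows. This is only an illustration using Python's standard `urllib.robotparser`, not necessarily the library or code GUS actually uses. The helper name `may_crawl` and the sample robots.txt content are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# Assumption: the crawler identifies itself only as "indexer";
# sections for any other named agent (e.g. "gus") are not consulted.
CRAWLER_AGENT = "indexer"


def may_crawl(robots_txt: str, path: str) -> bool:
    """Return True if `path` may be crawled under the given robots.txt text.

    Hypothetical helper: a section for "indexer" is used when present,
    otherwise the "*" section applies (standard robotparser behaviour).
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(CRAWLER_AGENT, path)


# Made-up sample capsule rules, for illustration only.
SAMPLE_ROBOTS = """\
User-agent: gus
Disallow: /

User-agent: *
Disallow: /tmp/
"""

if __name__ == "__main__":
    # The "gus" section is ignored, so only the "*" rules apply here.
    print(may_crawl(SAMPLE_ROBOTS, "/gemlog/post.gmi"))
    print(may_crawl(SAMPLE_ROBOTS, "/tmp/scratch.gmi"))
```

Note that with this parser a section naming "indexer" takes precedence over "*" rather than being merged with it, so a capsule that wants both sets of rules applied to indexers would need to repeat them in the "indexer" section.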