geminispace.info

gemini search engine
git clone https://git.clttr.info/geminispace.info.git

commit 602b408933a33087690a4e7c088905d5987b6664
parent 572a30280e1fb58002557c057f04ace03b64294e
Author: René Wagner <rwa@clttr.info>
Date:   Sat, 19 Mar 2022 18:55:14 +0100

news 2022-03-19

Diffstat:
M serve/templates/documentation/indexing.gmi | 5 +++--
M serve/templates/news.gmi | 7 +++++++
2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/serve/templates/documentation/indexing.gmi b/serve/templates/documentation/indexing.gmi
@@ -11,10 +11,11 @@ geminispace.info is a search engine for all content served over the Gemini Proto
 ### What does geminispace.info index?
 geminispace.info will only index content within Geminispace, and will neither follow nor index links out to other protocols, like Http or Gopher. We will only crawl outwards by following Gemini links found within `text/gemini` pages. If you return a `text/plain` mimetype for a page, Gemini links within it will not register with GUS (though the content of the `text/plain` page will itself get indexed).
+geminispace.info does not crawl capsules behind Onion links.
 Textual pages over 10 MB in size will not be indexed.
-Please note that GUS' indexing has provisions for manually excluding content from it, which maintainers will typically use to exclude pages and domains that cause issues with index relevance or crawl success. GUS ends up crawling weird protocol experiments, proofs of concepts, and whatever other bizarre bits of technical creativity folks put up in Geminispace, so it is a continual effort to keep the index healthy. Please don't take it personally if your content ends up excluded, and I promise we are continually working to make GUS indexing more resilient and scalable!
+Please note that there are provisions in place for manually excluding content from indexing, which maintainers will typically use to exclude pages and domains that cause issues with index relevance or crawl success. GUS ends up crawling weird protocol experiments, proofs of concepts, and whatever other bizarre bits of technical creativity folks put up in Geminispace, so it is a continual effort to keep the index healthy. Please don't take it personally if your content ends up excluded, and I promise we are continually working to make GUS indexing more resilient and scalable!
 Currently, especially content of the following types is excluded:
 - mirrors of large websites like Wikipedia or the Go-docs (it's just to much to add it to the index in the current state)
@@ -24,7 +25,7 @@ Currently, especially content of the following types is excluded:
 geminispace.info checks for specific return codes like 31 PERMANENT REDIRECT and will save this information. When your capsule served an permanent redirect for some sort of stuff, geminispace.info will not re-crawl this stuff for at least a week.
-### Controlling what GUS indexes with a robots.txt
+### Controlling what geminispace.info indexes with a robots.txt
 To control crawling of your site, you can use a robots.txt file, Place it in your capsule's root directory such that a request for "robots.txt" will fetch it. It should be returned with a mimetype of `text/plain`.
diff --git a/serve/templates/news.gmi b/serve/templates/news.gmi
@@ -2,6 +2,13 @@
 ## News
+### 2022-03-19 TLS config update
+geminispace.info allows now more variants of TLS ciphers which hopefully will allow us to crawl even more capsules.
+
+### 2022-03-05 monitoring
+geminispace.info is now monitored (and i will be alerted if something goes wrong) by shit.cx. Big thanks to Jon for providing this service.
+=> gemini://status.shit.cx shit.cx status monitoring.
+
 ### 2022-02-08 oopsie
 So the last refactor went...erm...upside down. We had a outage for a few hours because of this. I rolled the changes back and will do another attempt for a (hopefully successfull) refactor in the next days *fingers crossed*
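
The crawl rules stated in the indexing.gmi text touched by this commit (follow only Gemini links found in `text/gemini` pages, skip Onion capsules, skip textual pages over 10 MB) can be illustrated with a minimal Python sketch. This is not taken from the geminispace.info code base: the function names `should_index`, `resolve` and `crawlable_links` are hypothetical, "10 MB" is assumed to mean decimal megabytes, and the base URL is assumed to start with `gemini://`.

```python
from urllib.parse import urljoin, urlsplit

# Assumption: "10 MB" is read as decimal megabytes.
MAX_TEXT_BYTES = 10 * 1000 * 1000


def should_index(mime: str, size_bytes: int) -> bool:
    """Return False for textual pages above the size cap (hypothetical helper)."""
    return not (mime.startswith("text/") and size_bytes > MAX_TEXT_BYTES)


def resolve(base_url: str, target: str) -> str:
    """Resolve a link target against a gemini:// base URL (hypothetical helper)."""
    if urlsplit(target).scheme:
        return target  # already absolute, whatever the scheme
    # urljoin() does not resolve relative references for the gemini scheme,
    # so borrow the http scheme for the join and swap it back afterwards.
    joined = urljoin("http" + base_url[len("gemini"):], target)
    return "gemini" + joined[len("http"):]


def crawlable_links(base_url: str, mime: str, body: str) -> list[str]:
    """Collect the links a crawler following the stated rules would visit."""
    # Links are only followed from text/gemini pages; text/plain yields none.
    if mime.split(";")[0].strip() != "text/gemini":
        return []
    links = []
    preformatted = False
    for line in body.splitlines():
        if line.startswith("```"):
            preformatted = not preformatted  # toggle preformatted mode
            continue
        if preformatted or not line.startswith("=>"):
            continue
        parts = line[2:].split(maxsplit=1)  # URL, then optional label
        if not parts:
            continue
        url = resolve(base_url, parts[0])
        split = urlsplit(url)
        host = split.hostname or ""
        # Stay inside Geminispace and skip capsules behind Onion links.
        if split.scheme == "gemini" and not host.endswith(".onion"):
            links.append(url)
    return links
```

Calling `crawlable_links` with a `text/plain` mimetype returns an empty list even when the body contains `=>` lines, which mirrors the statement that such links "will not register" while the page content itself may still be indexed.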