geminispace.info

gemini search engine
git clone https://git.clttr.info/geminispace.info.git

commit 1a0503927169e35987d3d7b9408d02ae1de9e4f1
parent 96faff09c5cf1e17fb408503bee3ae544bdb425f
Author: René Wagner <rwa@clttr.info>
Date:   Thu, 25 Apr 2024 12:42:35 +0200

more excludes and a news

Diffstat:
M gus/excludes.py          | 6 +++++-
M gus/lib/search.py        | 2 +-
M serve/templates/news.gmi | 6 ++++++
3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/gus/excludes.py b/gus/excludes.py
@@ -205,7 +205,11 @@ EXCLUDED_URL_PREFIXES = [
     "gemini://gemini.thegonz.net/ski",
     "gemini://gemini.thegonz.net/gemski",
     "gemini://thegonz.net/",
-    "gemini://gemlog.stargrave.org/"
+    "gemini://gemlog.stargrave.org/",
+    # NOULIN
+    "gemini://gmi.noulin.net/stackoverflow/",
+    "gemini://gmi.noulin.net/gitRepositories/",
+    "gemini://gmi.noulin.net/man/",
 ]
 
 EXCLUDED_URL_PATHS = [

diff --git a/gus/lib/search.py b/gus/lib/search.py
@@ -77,7 +77,7 @@ class Index:
 
     def _rolling_writer(self):
         if not self._writer:
-            self._writer = self._index.writer(limitmb=1536, procs=3, multisegment=self._destructive)
+            self._writer = self._index.writer(limitmb=1024, procs=3, multisegment=self._destructive)
         return self._writer
 
     def add_document(self, document):

diff --git a/serve/templates/news.gmi b/serve/templates/news.gmi
@@ -2,6 +2,12 @@
 
 ## News
 
+### 2024-04-25 update troubles
+We had some trouble to update the index in the last days. We crawled over 1.7 Million(!) pages on a single capsule - usefull stuff like "node_modules" directories pushed to Git repos served over Gemini.
+When trying to include all those pages into our whoosh FTS index the VM simply froze - or sometimes the Python process was just oom-killed.
+
+We've removed the questionable pages and set up new excludes. Index is now up to date again.
+
 ### 2024-01-05 happy birthday :)
 On this very day three years ago "geminispace.info" was born and being announced an the long gone mailinglist a few days later. And still today, with some adjusted bits here and some additions there, geminispace.info is essentially still standing on the foundations that ~natpen build with GUS. Kudos to Natalie
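
The new entries in EXCLUDED_URL_PREFIXES only take effect if the crawler compares each candidate URL against the list. The commit does not show that check, so the following is a minimal sketch of how such prefix matching could work. The helper name is_excluded is hypothetical and the list is shortened to the entries touched by this commit; the actual GUS crawler may apply the list differently.

```python
# Hypothetical sketch: is_excluded() is an illustrative helper, not code
# taken from the GUS repository. Only the list entries come from the diff.
EXCLUDED_URL_PREFIXES = [
    "gemini://gemlog.stargrave.org/",
    # NOULIN
    "gemini://gmi.noulin.net/stackoverflow/",
    "gemini://gmi.noulin.net/gitRepositories/",
    "gemini://gmi.noulin.net/man/",
]


def is_excluded(url, prefixes=EXCLUDED_URL_PREFIXES):
    """Return True if the URL starts with any of the excluded prefixes."""
    # str.startswith accepts a tuple and tests every entry in one call.
    return url.startswith(tuple(prefixes))


if __name__ == "__main__":
    # Example URLs are made up for illustration.
    for candidate in (
        "gemini://gmi.noulin.net/man/ls.gmi",
        "gemini://gmi.noulin.net/index.gmi",
    ):
        print(candidate, is_excluded(candidate))
```

Matching on a plain string prefix keeps the check cheap per URL, and it drops a whole subtree (such as a mirrored Git repository) with a single entry while leaving the rest of the capsule crawlable.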