geminispace.info

gemini search engine
git clone https://git.clttr.info/geminispace.info.git

commit faf87301d736167b8b8e6f14cf4fcf2e0308b5cc
parent beb4b4e038d43d2c859bc7f6d23bc468469c0122
Author: René Wagner <rwa@clttr.info>
Date:   Sun,  4 Jun 2023 19:41:35 +0200

revert changes made for search.clttr.info

Diffstat:
M README.md                 |  6 +++---
M gus/lib/search.py         |  2 +-
M serve/templates/about.gmi | 14 +++++++-------
M serve/templates/news.gmi  | 22 ++++------------------
4 files changed, 15 insertions(+), 29 deletions(-)

diff --git a/README.md b/README.md
@@ -4,7 +4,7 @@
 ## Dependencies
 Install the following packages
-- Python (>3.5) including `distutils` and `python-dev` (aka headers)
+- Python (>= 3.11) including `distutils` and `python-dev` (aka headers)
 - [poetry](https://python-poetry.org)
 - gcc
 - OpenSSL
@@ -54,9 +54,9 @@ Now you'll have created `index.new` directory, rename it to `index`.
 2. Run `poetry run build_index`
 3. Run `systemctl --user restart gus`
-### Running the crawl & indexer in production with systemd
+### Running the crawl & indexer in production
-3. set up a cron job with the following params: `0 9 * * * <path to your working dir>/infra/update-index.sh <path to your working dir>`
+1. set up a cron job with the following params: `0 9 * * * <path to your working dir>/infra/update_index.sh <path to your working dir>`
 ## Running the test suite
diff --git a/gus/lib/search.py b/gus/lib/search.py
@@ -77,7 +77,7 @@ class Index:
     def _rolling_writer(self):
         if not self._writer:
-            self._writer = self._index.writer(limitmb=2048, procs=3, multisegment=self._destructive)
+            self._writer = self._index.writer(limitmb=1536, procs=3, multisegment=self._destructive)
         return self._writer
     def add_document(self, document):
diff --git a/serve/templates/about.gmi b/serve/templates/about.gmi
@@ -1,27 +1,27 @@
 {% include 'fragments/header.gmi' %}
-## About search.clttr.info
+## About geminispace.info
-search.clttr.info is a search engine for content served over the Gemini Protocol. It provides both a search interface, so you can look for content within Geminispace by keywords, content types, content sizes and more. It also provides data on the size and characteristics of Geminispace itself.
+geminispace.info is a search engine for content served over the Gemini Protocol. It provides both a search interface, so you can look for content within Geminispace by keywords, content types, content sizes and more. It also provides data on the size and characteristics of Geminispace itself.
-search.clttr.info is powered by GUS, an open-source crawler & search engine made by Natalie Pendragon and contributors. The source code is available at the following forges:
+geminispace.info is powered by GUS, an open-source crawler & search engine made by Natalie Pendragon and contributors. The source code is available at the following forges:
 => gemini://gmn.clttr.info/sources/geminispace.git/ browse the source on gemini
 => https://sr.ht/~rwa/geminispace.info source & mailinglist on sourcehut
 ### supporting geminispace.info
-search.clttr.info runs on small PC lely to this service.
-If you'd like to help keep search.clttr.info running feel free to donate a few bucks. Any donation will be directly used for server costs.
+geminispace.info runs on a VPS in a german datacenter.
+If you'd like to help keep geminispace.info running feel free to donate a few bucks. Any donation will be directly used for server costs.
 => https://liberapay.com/rwa/ support me on LiberaPay
-=> https://donate.stripe.com/9AQ7uT1kGbJn8jCaEE one time donation via Stripe
+=> https://donate.stripe.com/fZe3cr3386VnfzaaEF one time donation via Stripe
 In 2023 geminispace.info received donations worth 0f 105.92€ (transaction fees already excluded).
 In 2022 geminispace.info received donations worth of 82.78€.
 ### index updates
-Index updates run every three days.
+Index updates run every three days starting on the 1st of every month: 1, 4, 7, 10, 13, 16, 19, 22, 25, 28
 ### contact the admin(s)
diff --git a/serve/templates/news.gmi b/serve/templates/news.gmi
@@ -12,7 +12,7 @@ We now provide a list of URIs that are currently excluded from crawl & indexing.
 ### 2023-01-29 updated TLS certificate
 geminispace.info now uses an updated certificate that uses X.509 Version 3.
-I hope this improves compatibility with clients as the previously used X.590 v1 seems to move out of support in some omplementations.
+I hope this improves compatibility with clients as the previously used X.590 v1 seems to move out of support in some implementations.
 ### 2023-01-27 update delay
 We had some issues with the crawler stuck in an "infinite maze" that should have never been crawled. This is solved for the moment and the index is up to date again.
@@ -148,21 +148,11 @@ Seems due to the continued growth of gemini we are hitting the same problems Nat
 It took almost ten days the last reindex to complete as i triggered a complete index. This was necessary after the cleanup as there is currently no incremental cleanup of the search index implemented.
 The design of GUS - which clearly has never been meant to index such a huge number of capsules - and the slow VPS are doing no good currently to keep the index up to date. Unfortunately we are currently stuck with the VPS.
 Currently there is no progress to be reported on the coding site. I'm busy with various other things and late in the evening i can't bother to tackle some of the obvious tasks to improve GUS. If you are interesting in helping out improving GUS/geminispace.info feel free to comment on one of the issue or drop me a mail.
-=> https://src.clttr.info/rwa/geminispace.info/issues/ issues and todos of geminispace.org
+=> https://todo.sr.ht/~rwa/geminispace.info/ issues and todos of geminispace.org
 ### 2021-06-16
 I've made some manual cleanup of the base data the last days. This decreased the raw data size from over 3 GB to roughly 2 GB. Unfortunately a new mirror of godocs came online...another thing we need to exclude for the moment.
-### 2021-06-04
-geminispace.info runs rather stable the last weeks, but i added it to my external status monitor anyway:
-=> https://status.clttr.info/public external status monitor (web only currently :( )
-It will alert me if it goes down.
-
-No news on the coding site currently. Other projects occupy the time that i can currently devote to tech stuff.
-=> https://src.clttr.info/rwa/geminispace.info/issues/ issues and todos of geminispace.org
-
-We'll have a few days off, i'll get back to some coding after that.
-
 ### 2021-05-25
 geminispace.info is now aware of more than 1000 capsules.
 Unfortunately this data is somewhat misleading: some of the capsules may already be gone, but GUS lacks a mechanism for invaliding old data. I'll probably start with some manual cleanup the next days, so don't worry if numbers go down.
@@ -172,9 +162,8 @@ We are back on track with crawl and index, everything is up-to-date again.
 I had to add another news and a wikipedia mirror to the exclude list. The current implementation can't handle such a huge amount of information well.
 ### 2021-05-08
-Obviously this didn't work as expected. For whatever reason indexing fails repeatedly on one or another page with a mysterious sqlite error. It may to a few days till i find enough time to search for the cause of this error.
-If you are familiar with peewee and sqlite or have come across this issue earlier, let me know:
-=> https://src.clttr.info/rwa/geminispace.info/issues/21 Here's the issue related to this error on src.clttr.info
+Obviously this didn't work as expected. For whatever reason indexing fails repeatedly on one or another page with a mysterious sqlite error. It may take a few days till i find enough time to search for the cause of this error.
+If you are familiar with peewee and sqlite or have come across this issue earlier, let me know.
 ### 2021-05-05
 The index is currently a few days behind. It will hopefully catch up during the day.
@@ -215,6 +204,3 @@ geminispace.info has just been announced on the gemini mailing list.
 ### 2021-01-25
 geminispace.info is going public! Yeah! :)
-
-### 2021-01-18
-test drive of instance search.clttr.info started