gemini search engine

commit 09abe013da9735c2007595fdc835cf928d6798b7
parent fd3a662f11bf8d1c1a52332951984ada1487c507
Author: Natalie Pendragon <>
Date:   Sat, 29 Feb 2020 08:13:22 -0500


Diffstat: | 28+++++++++++++++++++++++++++-
1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/ b/
@@ -4,7 +4,9 @@
 Note that doing this currently requires you to perform a full crawl of
 Geminispace. With little content, and few people hacking on this, it's
 probably fine, but we should definitely keep tabs on this to ensure we're kind and
-respectful to content and server owners.
+respectful to content and server owners (I think the
+solution is that we need a way to create a mock index
+sooner than later).
 
 1. Get Python and [Poetry](
 2. Generate a local Geminispace index with `poetry run crawl`
@@ -22,3 +24,27 @@
 Please send patches to [~natpen/](mailto:~natpen/
 For an introduction to mailing list-based Git collaboration, see [this introduction](,
 as well as this guide to [mailing list etiquette](
+
+# Roadmap / TODOs
+
+- *general code cleanup*: most notably There are a lot
+  of hacks in there that I put in for expediency, but haven't
+  taken the time to address.
+- *improve the indexing*: currently, the url is prepended to
+  the page content, and everything is simply indexed with the
+  default indexer. I think a better solution would be to have
+  urls indexed with a url-specific indexer that doesn't do
+  things like, e.g., porter-stemming, which I assume the
+  default indexer is doing.
+- *extend the index to handle binary links in Geminispace*:
+  currently, there's a hack in the code to simply skip
+  anything that looks like a binary link. I think with the
+  above improvement to how indexing works, they could be
+  made very effectively searchable. Also in this vein,
+  binary links should be identified via their mime types
+  probably, instead of the suffix hack used now.
+- *add tests*: there aren't any yet!
+- *add functionality to create a mock index*: this would
+  be useful for local hacking on, so one does
+  not need to perform a real scrape of Geminispace to do
+  said hacking.
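
The roadmap item about identifying binary links by MIME type rather than by suffix could look roughly like the sketch below. It is not code from this repository: the function names `parse_mime_type` and `is_binary_response` are hypothetical, and treating every non-`text/*` type as binary is a simplifying assumption. It relies only on the Gemini response header format, where a `2x` success status is followed by a space and the MIME type.

```python
from typing import Optional


def parse_mime_type(header: str) -> Optional[str]:
    """Return the lowercased MIME type from a Gemini success header.

    Returns None when the header is not a 2x (success) response.
    Hypothetical helper for illustration only.
    """
    status, _, meta = header.strip().partition(" ")
    if len(status) != 2 or not status.isdigit() or not status.startswith("2"):
        return None
    # Drop parameters such as "; charset=utf-8".
    mime = meta.split(";", 1)[0].strip().lower()
    # An empty META on a success response defaults to text/gemini.
    return mime or "text/gemini"


def is_binary_response(header: str) -> bool:
    """Assumption: anything that is not text/* counts as binary."""
    mime = parse_mime_type(header)
    return mime is not None and not mime.startswith("text/")


if __name__ == "__main__":
    for line in ("20 text/gemini; charset=utf-8\r\n",
                 "20 image/png\r\n",
                 "51 Not found\r\n"):
        print(repr(line), is_binary_response(line))
```

One trade-off of this approach is that the crawler has to request a link before it can classify it, whereas a suffix check needs no network round trip.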