geminispace.info

gemini search engine
git clone https://git.clttr.info/geminispace.info.git

commit 5ff76ac64ef29928c15dce5d4f8d2a1ff1b53b18
parent 32d12c4c5e7396fb3ca01751acf9b37a8e5b1cd6
Author: Natalie Pendragon <natpen@natpen.net>
Date:   Fri, 15 May 2020 08:01:16 -0400

Update and reorder TODOs

Diffstat:
M README.md | 23 ++++++++++++++---------
1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/README.md b/README.md
@@ -29,15 +29,11 @@ as this guide to [mailing list etiquette](https://man.sr.ht/lists.sr.ht/etiquett
 - **log output of crawl**: I see some errors fly by, and it would
   be nice to be able to review later and
   investigate.
-- **add tests**: there aren't any yet!
-- **add functionality to create a mock index**: this would
-  be useful for local hacking on serve.py, so one does
-  not need to perform a real scrape of Geminispace to do
-  said hacking.
-- **exclude raw-text links**: I think there is a "raw-text block"
-  type of construct in the Gemini spec now, so I should probably
-  add a TODO to refactor the extract_gemini_links function to
-  exclude any links found within such a block.
+- **create non-destructive crawl**: it would be nice to be able to run
+  the crawl in a non-destructive way that retains its memory of
+  which sites it has already seen, and only adds new content to
+  the index.
+- **get crawl to run on a schedule with systemd**
 - **add more statistics**: this could go in the index statistics page,
   and, in addition to using the index itself, could also pull
   information from the jetforce logs.
@@ -50,3 +46,12 @@ as this guide to [mailing list etiquette](https://man.sr.ht/lists.sr.ht/etiquett
     - num non-trivial redirects (i.e., more than just
       removing/adding trailing slash)
     - num cross-domain redirects
+- **add tests**: there aren't any yet!
+- **add functionality to create a mock index**: this would
+  be useful for local hacking on serve.py, so one does
+  not need to perform a real scrape of Geminispace to do
+  said hacking.
+- **exclude raw-text links**: I think there is a "raw-text block"
+  type of construct in the Gemini spec now, so I should probably
+  add a TODO to refactor the extract_gemini_links function to
+  exclude any links found within such a block.
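
The "exclude raw-text links" TODO above could be approached roughly as follows. This is only a sketch, not the project's extract_gemini_links: the function name extract_links_outside_preformatted is hypothetical, and it assumes the gemtext conventions that a line starting with three backticks toggles a preformatted block and that a link line starts with "=>".

```python
from typing import List

# Three backticks open or close a preformatted ("raw-text") block in gemtext.
FENCE = "`" * 3


def extract_links_outside_preformatted(gemtext: str) -> List[str]:
    """Return link targets from gemtext, skipping preformatted blocks.

    Hypothetical helper for illustration; not the project's actual code.
    """
    links = []
    in_preformatted = False
    for line in gemtext.splitlines():
        if line.startswith(FENCE):
            # Toggle lines themselves never carry links.
            in_preformatted = not in_preformatted
            continue
        if in_preformatted:
            continue
        if line.startswith("=>"):
            # The URL is the first whitespace-delimited token after "=>".
            parts = line[2:].split(maxsplit=1)
            if parts:
                links.append(parts[0])
    return links
```

Calling it on a page whose only "=>" lines sit inside a preformatted block should return an empty list, which is the behavior the TODO asks for.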