commit 1e63d8b307a42230db0a7e3fe2b2db9abcf2b608
parent aa3fdeaefb1f80aa0838c2ea62b8e73f6e832d40
Author: Natalie Pendragon <natpen@natpen.net>
Date: Sun, 1 Nov 2020 11:05:07 -0500
Clean up todo list in README
Diffstat:
1 file changed, 7 insertions(+), 22 deletions(-)
diff --git a/README.md b/README.md
@@ -34,25 +34,10 @@ Now you'll have created `index.new` directory, rename it to `index`.
## Roadmap / TODOs
-- **log output of crawl**: I see some errors fly by, and it
- would be nice to be able to review later and investigate.
-- **get crawl to run on a schedule with systemd**
-- **add more statistics**: this could go in the index statistics
- page, and, in addition to using the index itself, could also
- pull information from the jetforce logs.
- - server uptime (from indexes)
- - num new servers per week/month (from indexes)
- - num GUS queries per day (from server logs)
- - most common queries (not sure about this one) (from server logs)
- - num cross-domain redirects
- - num domains with robots
-- **add tests**: there aren't any yet!
-- **add functionality to create a mock index**: this would
- be useful for local hacking on serve.py, so one does
- not need to perform a real scrape of Geminispace to do
- said hacking.
-- **exclude raw-text links**: I think there is a "raw-text block"
- type of construct in the Gemini spec now, so I should probably
- add a TODO to refactor the extract_gemini_links function to
- exclude any links found within such a block.
-- **track number of inbound links**
+- TODO: improve crawl and build_index automation
+- TODO: get crawl to run on a schedule with systemd
+- TODO: add some automated tests
+- TODO: add functionality to create a mock index
+- TODO: exclude raw-text blocks from indexed content
+- TODO: strip control characters from logged output like URLs
+- TODO: fix bug in calculation of backlinks (iirc the bug is visible on gemini.circumlunar.space)