commit 6396d9f1869471d00977b2146b854b1158d8e50f
parent 3df47bfd42a8ea5142333e0cf5fa652752a04363
Author: René Wagner <rwagner@rw-net.de>
Date: Thu, 28 Jan 2021 20:59:02 +0100
add systemd-units for automatic crawling
The template runs the crawler once a week, on Saturday afternoon.
If other launch times are wanted, adjust the OnCalendar setting
in gus-crawl.timer.
Diffstat:
3 files changed, 46 insertions(+), 10 deletions(-)
diff --git a/README.md b/README.md
@@ -3,8 +3,8 @@
## Dependencies
-1. Install python and poetry
-2. Run: "poetry install"
+1. Install python (>3.5) and [poetry](https://python-poetry.org)
+2. Run: `poetry install`
## Making an initial index
@@ -13,34 +13,45 @@ Make sure you have some gemini URLs for testing which are nicely
sandboxed to avoid indexing huge parts of the gemini space.
1. Create a "seed-requests.txt" file with your test gemini URLs
-2. Run: "poetry run crawl -d"
-3. Run: "poetry run build_index -d"
+2. Run: `poetry run crawl -d`
+3. Run: `poetry run build_index -d`
Now you'll have created `index.new` directory, rename it to `index`.
-## Running the frontend
+# Running the frontend
-1. Run: "poetry run serve"
+1. Run: `poetry run serve`
2. Navigate your gemini client to: "gemini://localhost/"
+## Automatic frontend startup with systemd
+
+1. Update `infra/gus.service` to match your needs (directory, user)
+2. Copy `infra/gus.service` to `/etc/systemd/system/`
+3. Run `systemctl enable gus` and `systemctl start gus`
# Updating the index
-1. Run: "poetry run crawl"
-2. Run: "poetry run build_index"
+1. Run: `poetry run crawl`
+2. Run: `poetry run build_index`
3. Restart frontend
+## Systemd units for scheduled crawling
+
+1. Update `infra/gus-crawl.service` to match your needs (directory, user)
+2. Update `infra/gus-crawl.timer` to match your needs (OnCalendar definition)
+3. Copy both files to `/etc/systemd/system/`
+4. Run `systemctl enable gus-crawl.timer` and `systemctl start gus-crawl.timer` to start the timer
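With the shipped `OnCalendar=Sat 18:00:00` and `RandomizedDelaySec=6000`, the crawl starts at some point within 100 minutes after 18:00 on Saturdays. A rough Python sketch of when the next trigger window opens (an illustration only, not systemd's actual scheduler):

```python
from datetime import datetime, timedelta

def next_saturday_1800(now: datetime) -> datetime:
    """Next point matching OnCalendar=Sat 18:00:00 (weekday: Mon=0 ... Sat=5)."""
    target = now.replace(hour=18, minute=0, second=0, microsecond=0)
    days_ahead = (5 - now.weekday()) % 7
    candidate = target + timedelta(days=days_ahead)
    if candidate <= now:                 # this week's slot already passed
        candidate += timedelta(days=7)
    return candidate

# RandomizedDelaySec=6000 then spreads the actual start over the
# following 100 minutes after this point.
print(next_saturday_1800(datetime(2021, 1, 28, 21, 0)))  # → 2021-01-30 18:00:00
```

`Persistent=true` additionally runs the job at the next boot if the machine was off when a scheduled elapse was missed.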
+
## Running test suite
-Run: "poetry run pytest"
+Run: `poetry run pytest`
## Roadmap / TODOs
- TODO: improve crawl and build_index automation
-- TODO: get crawl to run on a schedule with systemd
- TODO: add functionality to create a mock index
- TODO: exclude raw-text blocks from indexed content
- TODO: strip control characters from logged output like URLs
diff --git a/infra/gus-crawl.service b/infra/gus-crawl.service
@@ -0,0 +1,12 @@
+# /etc/systemd/system/gus-crawl.service
+
+[Unit]
+Description=Gemini Universal Search - Crawler
+
+[Service]
+User=gus
+Group=gus
+Type=oneshot
+WorkingDirectory=/home/gus/code/gus
+Environment="PYTHONUNBUFFERED=1"
+ExecStart=/home/gus/.poetry/bin/poetry run crawl
diff --git a/infra/gus-crawl.timer b/infra/gus-crawl.timer
@@ -0,0 +1,13 @@
+[Unit]
+Description=Gemini Universal Search - Crawler Timer
+ConditionVirtualization=!container
+
+[Timer]
+OnCalendar=Sat 18:00:00
+AccuracySec=1h
+Persistent=true
+RandomizedDelaySec=6000
+
+[Install]
+WantedBy=timers.target
+
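A different launch time can also be set via a drop-in override, leaving the installed unit file untouched. A sketch (the schedule shown is an example; `OnCalendar` is a list setting, so the empty assignment clears the shipped default before the new value is set):

```ini
# /etc/systemd/system/gus-crawl.timer.d/override.conf
# (can be created with: systemctl edit gus-crawl.timer)
[Timer]
OnCalendar=
OnCalendar=Sun 03:00:00
```

After adding the drop-in, run `systemctl daemon-reload` so the timer picks up the new schedule.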