geminispace.info

gemini search engine
git clone https://git.clttr.info/geminispace.info.git

commit 6396d9f1869471d00977b2146b854b1158d8e50f
parent 3df47bfd42a8ea5142333e0cf5fa652752a04363
Author: René Wagner <rwagner@rw-net.de>
Date:   Thu, 28 Jan 2021 20:59:02 +0100

add systemd-units for automatic crawling

The template runs the crawler once a week, on Saturday afternoon.
If a different schedule is wanted, the OnCalendar line in
gus-crawl.timer needs to be modified.
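
The schedule mentioned above lives in the [Timer] section of infra/gus-crawl.timer. As a hedged illustration of how to change it, a nightly run at 03:00 could look like this (the time is an arbitrary example, not part of this commit):

```ini
[Timer]
# Run every night at 03:00 instead of the shipped Sat 18:00:00
OnCalendar=*-*-* 03:00:00
AccuracySec=1h
Persistent=true
```

After editing the timer, `systemctl daemon-reload` makes systemd pick up the change.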

Diffstat:
M README.md              | 31 +++++++++++++++++++++----------
A infra/gus-crawl.service | 12 ++++++++++++
A infra/gus-crawl.timer   | 13 +++++++++++++
3 files changed, 46 insertions(+), 10 deletions(-)

diff --git a/README.md b/README.md
@@ -3,8 +3,8 @@
 ## Dependencies
 
-1. Install python and poetry
-2. Run: "poetry install"
+1. Install python (>3.5) and [poetry](https://python-poetry.org)
+2. Run: `poetry install`
 
 ## Making an initial index
@@ -13,34 +13,45 @@
 Make sure you have some gemini URLs for testing which are nicely
 sandboxed to avoid indexing huge parts of the gemini space.
 
 1. Create a "seed-requests.txt" file with you test gemini URLs
-2. Run: "poetry run crawl -d"
-3. Run: "poetry run build_index -d"
+2. Run: `poetry run crawl -d`
+3. Run: `poetry run build_index -d`
 
 Now you'll have created `index.new` directory, rename it to `index`.
 
-## Running the frontend
+# Running the frontend
 
-1. Run: "poetry run serve"
+1. Run: `poetry run serve`
 2. Navigate your gemini client to: "gemini://localhost/"
 
+## automatic frontend with systemd-unit
+
+1. update `infra/gus.service` to match your needs (directory, user)
+2. copy `infra/gus.service` to `/etc/systemd/system/`
+3. run `systemctl enable gus` and `systemctl start gus`
 
 # Updating the index
 
-1. Run: "poetry run crawl"
-2. Run: "poetry run build_index"
+1. Run: `poetry run crawl`
+2. Run: `poetry run build_index`
 3. Restart frontend
 
+## systemd-unit for crawling
+
+1. update `infra/gus-crawl.service` to match your needs (directory, user)
+2. update `infra/gus-crawl.timer` to match your needs (OnCalendar definition)
+3. copy both files to `/etc/systemd/system/`
+4. run `systemctl enable gus-crawl.timer` & `systemctl start gus-crawl.timer` to start the timer
+
 ## Running test suite
 
-Run: "poetry run pytest"
+Run: `poetry run pytest`
 
 ## Roadmap / TODOs
 
 - TODO: improve crawl and build_index automation
-- TODO: get crawl to run on a schedule with systemd
 - TODO: add functionality to create a mock index
 - TODO: exclude raw-text blocks from indexed content
 - TODO: strip control characters from logged output like URLs
diff --git a/infra/gus-crawl.service b/infra/gus-crawl.service
@@ -0,0 +1,12 @@
+# /etc/systemd/system/gus.service
+
+[Unit]
+Description=Gemini Universal Search - Crawler
+
+[Service]
+User=gus
+Group=gus
+Type=oneshot
+WorkingDirectory=/home/gus/code/gus
+Environment="PYTHONUNBUFFERED=1"
+ExecStart=/home/gus/.poetry/bin/poetry run crawl
diff --git a/infra/gus-crawl.timer b/infra/gus-crawl.timer
@@ -0,0 +1,13 @@
+[Unit]
+Description=Gemini Universal Search - Crawler Timer
+ConditionVirtualization=!container
+
+[Timer]
+OnCalendar=Sat 18:00:00
+AccuracySec=1h
+Persistent=true
+RandomizedDelaySec=6000
+
+[Install]
+WantedBy=timers.target
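
The new service uses Type=oneshot, and the README's update procedure still runs `poetry run build_index` by hand (the remaining "improve crawl and build_index automation" TODO). Since a oneshot service executes multiple ExecStart= lines in order, one possible extension, sketched here as a hypothetical follow-up rather than part of this commit, would chain both steps:

```ini
# Hypothetical variant of infra/gus-crawl.service (NOT in this commit):
# Type=oneshot runs each ExecStart= line sequentially, so the index
# rebuild would follow the crawl automatically.
[Service]
User=gus
Group=gus
Type=oneshot
WorkingDirectory=/home/gus/code/gus
Environment="PYTHONUNBUFFERED=1"
ExecStart=/home/gus/.poetry/bin/poetry run crawl
ExecStart=/home/gus/.poetry/bin/poetry run build_index
```

The frontend restart noted in the README would still be a separate step.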