commit 6396d9f1869471d00977b2146b854b1158d8e50f
parent 3df47bfd42a8ea5142333e0cf5fa652752a04363
Author: René Wagner <rwagner@rw-net.de>
Date: Thu, 28 Jan 2021 20:59:02 +0100
add systemd-units for automatic crawling
The template runs the crawler once a week, on Saturday afternoon.
If other launch times are wanted, adjust the OnCalendar setting
in gus-crawl.timer.
Diffstat:
3 files changed, 46 insertions(+), 10 deletions(-)
diff --git a/README.md b/README.md
@@ -3,8 +3,8 @@
## Dependencies
-1. Install python and poetry
-2. Run: "poetry install"
+1. Install python (>3.5) and [poetry](https://python-poetry.org)
+2. Run: `poetry install`
## Making an initial index
@@ -13,34 +13,45 @@ Make sure you have some gemini URLs for testing which are nicely
sandboxed to avoid indexing huge parts of the gemini space.
1. Create a "seed-requests.txt" file with your test gemini URLs
-2. Run: "poetry run crawl -d"
-3. Run: "poetry run build_index -d"
+2. Run: `poetry run crawl -d`
+3. Run: `poetry run build_index -d`
Now you'll have created `index.new` directory, rename it to `index`.
-## Running the frontend
+# Running the frontend
-1. Run: "poetry run serve"
+1. Run: `poetry run serve`
2. Navigate your gemini client to: "gemini://localhost/"
+## Automatic frontend startup with systemd
+
+1. Update `infra/gus.service` to match your needs (directory, user)
+2. Copy `infra/gus.service` to `/etc/systemd/system/`
+3. Run `systemctl enable gus` and `systemctl start gus`
# Updating the index
-1. Run: "poetry run crawl"
-2. Run: "poetry run build_index"
+1. Run: `poetry run crawl`
+2. Run: `poetry run build_index`
3. Restart frontend
+## Systemd units for scheduled crawling
+
+1. Update `infra/gus-crawl.service` to match your needs (directory, user)
+2. Update `infra/gus-crawl.timer` to match your needs (OnCalendar definition)
+3. Copy both files to `/etc/systemd/system/`
+4. Run `systemctl enable gus-crawl.timer` and `systemctl start gus-crawl.timer` to start the timer
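With the shipped `OnCalendar=Sat 18:00:00` and `RandomizedDelaySec=6000`, the crawl starts at some point within 100 minutes after 18:00 on Saturdays. A rough Python sketch of when the next trigger window opens (an illustration only, not systemd's actual scheduler):

```python
from datetime import datetime, timedelta

def next_saturday_1800(now: datetime) -> datetime:
    """Next point matching OnCalendar=Sat 18:00:00 (weekday: Mon=0 ... Sat=5)."""
    target = now.replace(hour=18, minute=0, second=0, microsecond=0)
    days_ahead = (5 - now.weekday()) % 7
    candidate = target + timedelta(days=days_ahead)
    if candidate <= now:                 # this week's slot already passed
        candidate += timedelta(days=7)
    return candidate

# RandomizedDelaySec=6000 then spreads the actual start over the
# following 100 minutes after this point.
print(next_saturday_1800(datetime(2021, 1, 28, 21, 0)))  # → 2021-01-30 18:00:00
```

`Persistent=true` additionally runs the job at the next boot if the machine was off when a scheduled elapse was missed.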
+
## Running test suite
-Run: "poetry run pytest"
+Run: `poetry run pytest`
## Roadmap / TODOs
- TODO: improve crawl and build_index automation
-- TODO: get crawl to run on a schedule with systemd
- TODO: add functionality to create a mock index
- TODO: exclude raw-text blocks from indexed content
- TODO: strip control characters from logged output like URLs
diff --git a/infra/gus-crawl.service b/infra/gus-crawl.service
@@ -0,0 +1,12 @@
+# /etc/systemd/system/gus-crawl.service
+
+[Unit]
+Description=Gemini Universal Search - Crawler
+
+[Service]
+User=gus
+Group=gus
+Type=oneshot
+WorkingDirectory=/home/gus/code/gus
+Environment="PYTHONUNBUFFERED=1"
+ExecStart=/home/gus/.poetry/bin/poetry run crawl
diff --git a/infra/gus-crawl.timer b/infra/gus-crawl.timer
@@ -0,0 +1,13 @@
+[Unit]
+Description=Gemini Universal Search - Crawler Timer
+ConditionVirtualization=!container
+
+[Timer]
+OnCalendar=Sat 18:00:00
+AccuracySec=1h
+Persistent=true
+RandomizedDelaySec=6000
+
+[Install]
+WantedBy=timers.target
+
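A different launch time can also be set via a drop-in override, leaving the installed unit file untouched. A sketch (the schedule shown is an example; `OnCalendar` is a list setting, so the empty assignment clears the shipped default before the new value is set):

```ini
# /etc/systemd/system/gus-crawl.timer.d/override.conf
# (can be created with: systemctl edit gus-crawl.timer)
[Timer]
OnCalendar=
OnCalendar=Sun 03:00:00
```

After adding the drop-in, run `systemctl daemon-reload` so the timer picks up the new schedule.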