From d8d6f13f7e108ea4094e3cc0205d8847f9b3c892 Mon Sep 17 00:00:00 2001
From: Olivier 'reivilibre
Date: Sat, 2 Jul 2022 22:55:18 +0100
Subject: [PATCH] Update the README a little bit

---
 README.md | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index 47e95e0..2b41a4e 100644
--- a/README.md
+++ b/README.md
@@ -26,11 +26,11 @@ If you need to fall back to a conventional search engine, this will eventually b
 
 *Crossed-out things are aspirational and not yet implemented.*
 
-- ~~Shareable 'rakepacks', so that anyone can run their own search instance without needing to rake (crawl) themselves~~
-  - ~~Dense encoding to minimise disk space usage; compressed with Zstd?~~
+- Shareable 'rakepacks', so that anyone can run their own search instance without needing to rake (crawl) themselves
+  - Dense encoding to minimise disk space usage; compressed with Zstd.
 - Raking (crawling) support for
   - HTML (including redirecting to Canonical URLs)
-    - ~~Language detection~~
+    - Language detection for when the metadata is absent.
   - Redirects
   - ~~Gemtext over Gemini~~
   - RSS, Atom and JSON feeds
@@ -43,9 +43,9 @@ If you need to fall back to a conventional search engine, this will eventually b
   - Article content extraction, to provide more weight to words found within the article content (based on a Rust version of Mozilla's *Readability* engine)
 - (Misc)
   - ~~Use of the Public Suffix List~~
-  - ~~Tagging URL patterns; e.g. to mark documentation as 'old'.~~
+  - Tagging URL patterns; e.g. to mark documentation as 'old'.
   - ~~Page duplicate content detection (e.g. to detect `/` and `/index.html`, or non-HTTPS and HTTPS, or non-`www` and `www`...)~~
-- ~~Language detection for pages that don't have that metadata available.~~
+
 
 ## Limitations
 
@@ -62,11 +62,17 @@ If you need to fall back to a conventional search engine, this will eventually b
 
 *Not written yet.*
 
+The stages of the QuickPeep pipeline are briefly described in [an introductory blog post][qp_intro_blog].
+
+[qp_intro_blog]: https://o.librepush.net/blog/2022-07-02-quickpeep-small-scale-web-search-engine
+
 
 ## Development and Running
 
 *Not written yet.*
 
+Some hints may be obtained from the introductory blog post mentioned in the 'Architecture' section, but it's probably quite difficult to follow right now.
+
 
 ### Helper scripts
 