Update the README a little bit
All checks were successful
ci/woodpecker/push/check Pipeline was successful
ci/woodpecker/push/manual Pipeline was successful
ci/woodpecker/push/release Pipeline was successful
ci/woodpecker/tag/check Pipeline was successful
ci/woodpecker/tag/manual Pipeline was successful
ci/woodpecker/tag/release Pipeline was successful
parent 09f70ad8ce
commit d8d6f13f7e
16
README.md
@@ -26,11 +26,11 @@ If you need to fall back to a conventional search engine, this will eventually b
 
 *Crossed-out things are aspirational and not yet implemented.*
 
-- ~~Shareable 'rakepacks', so that anyone can run their own search instance without needing to rake (crawl) themselves~~
-- ~~Dense encoding to minimise disk space usage; compressed with Zstd?~~
+- Shareable 'rakepacks', so that anyone can run their own search instance without needing to rake (crawl) themselves
+- Dense encoding to minimise disk space usage; compressed with Zstd.
 - Raking (crawling) support for
 - HTML (including redirecting to Canonical URLs)
-- ~~Language detection~~
+- Language detection for when the metadata is absent.
 - Redirects
 - ~~Gemtext over Gemini~~
 - RSS, Atom and JSON feeds
@@ -43,9 +43,9 @@ If you need to fall back to a conventional search engine, this will eventually b
 - Article content extraction, to provide more weight to words found within the article content (based on a Rust version of Mozilla's *Readability* engine)
 - (Misc)
 - ~~Use of the Public Suffix List~~
-- ~~Tagging URL patterns; e.g. to mark documentation as 'old'.~~
+- Tagging URL patterns; e.g. to mark documentation as 'old'.
 - ~~Page duplicate content detection (e.g. to detect `/` and `/index.html`, or non-HTTPS and HTTPS, or non-`www` and `www`...)~~
-- ~~Language detection for pages that don't have that metadata available.~~
+
 
 
 ## Limitations
@@ -62,11 +62,17 @@ If you need to fall back to a conventional search engine, this will eventually b
 
 *Not written yet.*
 
+The stages of the QuickPeep pipeline are briefly described in [an introductory blog post][qp_intro_blog].
+
+[qp_intro_blog]: https://o.librepush.net/blog/2022-07-02-quickpeep-small-scale-web-search-engine
+
 
 ## Development and Running
 
 *Not written yet.*
 
+Some hints may be obtained from the introductory blog post mentioned in the 'Architecture' section, but it's probably quite difficult to follow right now.
+
 
 ### Helper scripts
 