Commit Graph

225 Commits (main)

Author SHA1 Message Date
Olivier 'reivilibre' 9866da2d16 Load documents and index them! 2022-03-24 23:45:36 +00:00
Olivier 'reivilibre' 5418afe8dd Read rake pack records in the indexer 2022-03-24 23:37:11 +00:00
Olivier 'reivilibre' 4f4c3e36d1 Prepare indexer 2022-03-24 23:19:32 +00:00
Olivier 'reivilibre' 7fa302b2c2 Support flush() for Tantivy 2022-03-24 22:57:55 +00:00
Olivier 'reivilibre' 139fe380bc Cargo fmt 2022-03-24 22:57:13 +00:00
Olivier 'reivilibre' f43424de94 Add an open function for the Tantivy Backend 2022-03-24 22:55:21 +00:00
Olivier 'reivilibre' 73154e7e34 Move the indexer around 2022-03-24 19:50:20 +00:00
Olivier 'reivilibre' 7aa5521c5d Think a bit about how indexers will fit together (continuous-integration/drone: the build failed) 2022-03-23 20:57:51 +00:00
Olivier 'reivilibre' 1773ba4f44 Start fleshing out the indexer 2022-03-23 20:11:12 +00:00
Olivier 'reivilibre' 0060ec0764 Add seed sorting tool in order to approach first proof of concept (continuous-integration/drone: the build failed) 2022-03-22 23:54:28 +00:00
Olivier 'reivilibre' 528e0bbf43 Cargo fix and fmt 2022-03-22 23:23:31 +00:00
Olivier 'reivilibre' 753d03327a Fix weeds slipping in as seeds 2022-03-22 23:23:17 +00:00
Olivier 'reivilibre' be84c0e1cc Add DB inspection tool 2022-03-22 23:23:15 +00:00
Olivier 'reivilibre' db9fe77c16 Re-apply seeds and weeds at import time, to on-hold URLs 2022-03-22 20:01:26 +00:00
Olivier 'reivilibre' 2f5131e690 Don't enqueue references if they're weeds 2022-03-22 19:56:10 +00:00
Olivier 'reivilibre' 641c575660 Allow importing 'weeds' as opposed to 'seeds' 2022-03-22 19:52:50 +00:00
Olivier 'reivilibre' 05ebfc8998 Add process metrics (continuous-integration/drone: the build failed) 2022-03-21 20:19:59 +00:00
Olivier 'reivilibre' 2d35298a2e Add some metrics for emitted packs 2022-03-21 19:56:29 +00:00
Olivier 'reivilibre' 806192fab5 Shut down faster (don't wait for crawl delays) 2022-03-21 19:39:31 +00:00
Olivier 'reivilibre' a60ace0482 Fix bug where Ctrl+C wouldn't hang up the emitters 2022-03-21 19:38:03 +00:00
Olivier 'reivilibre' 649dec7fa9 Add tool for viewing what's on hold 2022-03-21 19:33:07 +00:00
Olivier 'reivilibre' fcc1f517af Support pages where we can't extract the article 2022-03-21 19:25:55 +00:00
Olivier 'reivilibre' e0fb714f7a Some bugfixes that get the raker mostly going 2022-03-21 19:24:20 +00:00
Olivier 'reivilibre' 51d5b9208b Put URLs on hold rather than the queue if they are not allowed 2022-03-21 19:16:48 +00:00
Olivier 'reivilibre' 9ef4fef858 Shut down gently on SIGINT or SIGTERM (supposedly) 2022-03-21 19:11:07 +00:00
Olivier 'reivilibre' 71c22daf0d Emit rakepacks from the raker 2022-03-21 19:07:35 +00:00
Olivier 'reivilibre' f60031a462 Improve domain acquisition and shutdown logic 2022-03-20 23:01:56 +00:00
Olivier 'reivilibre' 06b3c54b81 Remove active domain after all pages are raked (continuous-integration/drone: the build failed) 2022-03-20 22:24:35 +00:00
Olivier 'reivilibre' 6d109632a3 Sort through TODO items 2022-03-20 22:23:12 +00:00
Olivier 'reivilibre' 120702ce0e Don't forget to commit after R/W operations (continuous-integration/drone: the build failed) 2022-03-20 22:01:57 +00:00
Olivier 'reivilibre' f6efc7a4e5 Fix seed finder 2022-03-20 21:57:04 +00:00
Olivier 'reivilibre' 173b8a4de1 Comment out tagging code from the rake seeder 2022-03-20 21:51:33 +00:00
Olivier 'reivilibre' 179f04b2dd Import the seeds and show stats 2022-03-20 21:46:50 +00:00
Olivier 'reivilibre' abf814550a Import seeds (theoretically) 2022-03-20 21:41:32 +00:00
Olivier 'reivilibre' 8df430c7f1 Load and parse seeds 2022-03-20 20:50:31 +00:00
Olivier 'reivilibre' 39aa4eb9b7 Add seed file parser 2022-03-20 20:29:32 +00:00
Olivier 'reivilibre' 5e61386a83 Use ArcIntern and CompactString as needed 2022-03-20 15:49:00 +00:00
Olivier 'reivilibre' fc90ea4e1f Set migration version to something a little bit more intuitive 2022-03-20 15:44:53 +00:00
Olivier 'reivilibre' 6bdc505394 Rename qp-seeds to qp-seedrake 2022-03-20 15:42:06 +00:00
Olivier 'reivilibre' 5be6cade11 STASH notes about seeds (continuous-integration/drone: the build failed) 2022-03-20 15:20:13 +00:00
Olivier 'reivilibre' c3ccd64d5f Add way of periodically tracking database metrics 2022-03-20 14:24:30 +00:00
Olivier 'reivilibre' e651a953f6 STASH to calculate datastore metrics (continuous-integration/drone: the build failed) 2022-03-20 13:26:34 +00:00
Olivier 'reivilibre' 410a4e962b Ignore the workbench directory (continuous-integration/drone: the build failed) 2022-03-20 12:24:52 +00:00
Olivier 'reivilibre' f9aac34104 Add support for Prometheus metrics 2022-03-20 12:24:35 +00:00
Olivier 'reivilibre' a907817831 Use mold linker for faster compilation 2022-03-20 12:24:19 +00:00
Olivier 'reivilibre' 4f85aebd38 Theoretically allow graceful stop 2022-03-20 06:33:39 +00:00
Olivier 'reivilibre' 085020b80d Get ever closer to a raker being usable (continuous-integration/drone: the build failed) 2022-03-20 00:08:37 +00:00
Olivier 'reivilibre' ea4f2d1332 Some partial progress towards raking pages 2022-03-19 22:57:36 +00:00
Olivier 'reivilibre' 5bab279cc2 STASH (continuous-integration/drone: the build failed) 2022-03-19 15:39:59 +00:00
Olivier 'reivilibre' ab0b1e84ee STASH work on Raking (continuous-integration/drone: the build failed) 2022-03-19 21:04:12 +00:00