Update READMEs

This commit is contained in:
Olivier 'reivilibre' 2021-06-26 16:41:47 +01:00
parent 1a6f1a7001
commit 63a59357a8
2 changed files with 19 additions and 98 deletions

datman/README.md Normal file

@@ -0,0 +1,12 @@
# datman: DATa MANager
Datman is a tool to make it easier to use Yama for backups.
Features:
* Chunk-based deduplication (sketched below)
* (optional) Compression using Zstd and a specifiable dictionary
* (optional) Encryption
* Ability to back up to remote machines over SSH
See the documentation for more information.
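As a sketch of what chunk-based deduplication means here (illustrative only, not datman's actual chunker): a rolling hash over a sliding window picks content-defined boundaries, so identical runs of data produce identical chunks that are stored only once, even when surrounding bytes shift.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

const WINDOW: usize = 64;          // rolling-hash window size
const MASK: u32 = (1 << 12) - 1;   // boundary pattern: ~4 KiB average chunks

/// Cut `data` into content-defined chunks: a boundary is declared wherever
/// the rolling sum of the last WINDOW bytes matches the mask pattern.
fn split_into_chunks(data: &[u8]) -> Vec<&[u8]> {
    let mut chunks = Vec::new();
    let (mut start, mut sum) = (0usize, 0u32);
    for i in 0..data.len() {
        sum = sum.wrapping_add(data[i] as u32);
        if i >= WINDOW {
            sum = sum.wrapping_sub(data[i - WINDOW] as u32);
        }
        // Enforce a minimum chunk size of one full window, then cut on the pattern.
        if i + 1 - start >= WINDOW && (sum & MASK) == MASK {
            chunks.push(&data[start..=i]);
            start = i + 1;
        }
    }
    if start < data.len() {
        chunks.push(&data[start..]);
    }
    chunks
}

fn main() {
    // Deterministic pseudo-random bytes (tiny LCG), duplicated once so the
    // second copy deduplicates away almost entirely.
    let mut x: u64 = 0x2545_F491_4F6C_DD1D;
    let mut half = Vec::with_capacity(500_000);
    for _ in 0..500_000 {
        x = x.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
        half.push((x >> 56) as u8);
    }
    let data = [half.clone(), half].concat();

    let mut seen: HashSet<u64> = HashSet::new();
    let mut stored = 0usize;
    for chunk in split_into_chunks(&data) {
        let mut h = DefaultHasher::new(); // stand-in for a cryptographic content hash
        chunk.hash(&mut h);
        if seen.insert(h.finish()) {
            stored += chunk.len(); // only previously-unseen chunks are stored
        }
    }
    println!("stored {} of {} bytes", stored, data.len());
}
```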

README.md

@@ -1,109 +1,18 @@
# 山 (yama): deduplicated heap repository
Note: this README is not yet updated to reality…
Yama is a system for storing files and directory trees in 'piles'. The data stored is deduplicated (by using content-defined chunking) and can be compressed and encrypted, too.
NOT YET ~~Yama also permits storing to piles on remote computers, using SSH.~~

```
yama [-w|--with [user@host:]path] [--with-encrypted true|false]
```
Yama is intended for use as a storage mechanism for backups. Datman is a tool to make it easier to use Yama for backups.

The documentation is currently the best source of information about Yama; see the `docs` directory.

Yama can be used as a library for your own programs; further information about this is yet to be provided, but the API documentation (Rustdocs) may be useful.

## Other, unpolished, notes

## Backup Profiles

## Remotes

In `yama.toml`, you can configure remotes:
```toml
[remote.bob]
encrypted = true
host = "bobmachine.xyz"
user = "bob"
path = "/home/bob/yama"
```
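For illustration, a table like this can be deserialised with the `serde` and `toml` crates; the struct and field names below mirror the example above but are not yama's actual types:

```rust
use serde::Deserialize;
use std::collections::HashMap;

#[derive(Debug, Deserialize)]
struct Remote {
    encrypted: bool,
    host: String,
    user: String,
    path: String,
}

#[derive(Debug, Deserialize)]
struct Config {
    // One entry per `[remote.<name>]` table.
    remote: HashMap<String, Remote>,
}

fn main() -> Result<(), toml::de::Error> {
    let text = r#"
        [remote.bob]
        encrypted = true
        host = "bobmachine.xyz"
        user = "bob"
        path = "/home/bob/yama"
    "#;
    let config: Config = toml::from_str(text)?;
    println!("bob's pile lives at {}", config.remote["bob"].path);
    Ok(())
}
```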
## Subcommands
### `check`: Check repository for consistency
Verifies that the full repository satisfies the following consistency constraints:
- all chunks have the correct hash
- all pointers have a valid structure, recursively
Usage: `yama check [--gc]`
The total amount of space occupied, and the amount occupied by unused chunks, are reported.
If `--gc` is specified, unused chunks will be removed.
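A minimal sketch of the unused-chunk accounting, assuming pointers flattened to plain lists of chunk IDs (the real pointer structure is recursive) and illustrative types throughout; this is not yama's actual code:

```rust
use std::collections::{HashMap, HashSet};

type ChunkId = u64;

/// Report space held by chunks no pointer references; with `remove` set
/// (the `--gc` case), drop them from the store.
fn gc(chunks: &mut HashMap<ChunkId, Vec<u8>>, pointers: &[Vec<ChunkId>], remove: bool) {
    // Mark phase: every chunk referenced by any pointer is live.
    let live: HashSet<ChunkId> = pointers.iter().flatten().copied().collect();
    let unused: usize = chunks
        .iter()
        .filter(|(id, _)| !live.contains(*id))
        .map(|(_, data)| data.len())
        .sum();
    println!("{} bytes occupied by unused chunks", unused);
    if remove {
        // Sweep phase: retain only reachable chunks.
        chunks.retain(|id, _| live.contains(id));
    }
}
```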
### `lsp`: List tree pointers
Usage: `yama lsp`
### `rmp`: Remove tree pointers
Usage: `yama rmp pointer/path [--force]`
If `--force` is not specified and the pointer is depended upon by another, then deletion is aborted with an error.
### `store`: Store tree into repository
Usage: `yama store [--dry-run] [ssh://user@host]/path/to/dir pointer/path [--exclusions path/to/exclusions.txt] [--differential pointer/parent]`
The pointer must not already exist; it will be created. If `--differential` is specified with an existing parent pointer, then the directory listing is stored as a differential list against the parent.
The intention of this is to reduce the size of the directory list.
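A sketch of the idea, with illustrative types (a real listing holds full metadata, not just a content hash): only entries that differ from the parent, plus explicit deletions, are stored.

```rust
use std::collections::BTreeMap;

/// Maps path -> a content hash (standing in for full file metadata).
type Listing = BTreeMap<String, u64>;

enum DiffEntry {
    Upsert(u64), // file added or changed since the parent
    Delete,      // file present in the parent but gone now
}

/// Compute the differential listing of `current` against `parent`.
fn differential(parent: &Listing, current: &Listing) -> BTreeMap<String, DiffEntry> {
    let mut diff = BTreeMap::new();
    for (path, meta) in current {
        if parent.get(path) != Some(meta) {
            diff.insert(path.clone(), DiffEntry::Upsert(*meta));
        }
    }
    for path in parent.keys() {
        if !current.contains_key(path) {
            diff.insert(path.clone(), DiffEntry::Delete);
        }
    }
    diff // usually far smaller than `current` for an incremental backup
}
```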
#### Exclusion lists
Exclusion lists use much the same format as `.gitignore`: one glob per line, naming files to exclude, relative to the tree root.
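As a sketch of how such globs can be matched in Rust, here using the `globset` crate (yama's actual matcher may differ):

```rust
use globset::{Glob, GlobSetBuilder};

fn main() -> Result<(), globset::Error> {
    // One glob per line, as in an exclusion file.
    let exclusions = "target\n*.tmp\n.cache\n";
    let mut builder = GlobSetBuilder::new();
    for line in exclusions.lines().filter(|l| !l.trim().is_empty()) {
        builder.add(Glob::new(line.trim())?);
    }
    let set = builder.build()?;
    // Paths are matched relative to the tree root.
    assert!(set.is_match("scratch.tmp"));  // excluded
    assert!(!set.is_match("src/main.rs")); // backed up
    Ok(())
}
```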
### `extract`: Extract file(s) from repository
Usage: `yama extract [--dry-run] pointer/path[:path] [ssh://user@host]/path/to/local/dir[/]`
If no path is specified, the root (`/`) is extracted. A trailing slash means the file will be extracted as a child of the specified directory.
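For illustration, with hypothetical pointer and directory names:

```
yama extract mybackup /tmp/restore        # no path given: the whole tree rooted at / is extracted
yama extract mybackup:/etc /tmp/restore/  # trailing slash: extracted as a child, i.e. /tmp/restore/etc
```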
### `remote`: Run operations on a remote repository
Usage: `yama remote ssh://user@host/path/to/repo <subcommand>`
#### remote `store`: Store local tree into remote repository
Usage is identical to `yama store`, except that the store path must be local.
#### remote `extract`: Extract remote repository into local tree
Usage is identical to `yama extract`, except that the target path must be local.
### `slave`: Remote-controlled yama
Communicates over stdin/stdout to perform the requested operations; used when a yama command involves SSH.
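As an illustrative sketch of the general approach (not yama's actual wire format): length-prefixed frames are the usual way to pass structured messages over a subprocess's stdin/stdout.

```rust
use std::io::{self, Read, Write};

/// Write one frame: a 4-byte big-endian length prefix, then the payload.
fn write_frame(out: &mut impl Write, payload: &[u8]) -> io::Result<()> {
    out.write_all(&(payload.len() as u32).to_be_bytes())?;
    out.write_all(payload)?;
    out.flush()
}

/// Read one frame: the length prefix tells us exactly how much to read next.
fn read_frame(input: &mut impl Read) -> io::Result<Vec<u8>> {
    let mut len = [0u8; 4];
    input.read_exact(&mut len)?;
    let mut payload = vec![0u8; u32::from_be_bytes(len) as usize];
    input.read_exact(&mut payload)?;
    Ok(payload)
}
```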
## Repository Storage Details
Pointers are stored in `pointers.lmdb` and chunks are stored in `chunks.lmdb`.
It is expected that exclusion files will be kept in the same directory as the repository if they are to be used on a recurring basis.
Chunks are compressed with `zstd`. A compression dictionary must first be trained and placed at `<repo root>/zstd.dict`.
**This dictionary file must not be lost or altered after chunks have been made using it. Doing so will void the integrity of the entire repository.**
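As a hedged illustration (not yama's actual code), compressing and restoring one chunk with such a dictionary might look like this using the `zstd` crate's bulk API:

```rust
use std::fs;
use zstd::bulk::{Compressor, Decompressor};

fn main() -> std::io::Result<()> {
    let dict = fs::read("zstd.dict")?; // must stay byte-identical forever
    let chunk = b"example chunk contents".to_vec();

    // Compress with the trained dictionary at level 3.
    let compressed = Compressor::with_dictionary(3, &dict)?.compress(&chunk)?;
    // Decompression needs the very same dictionary bytes.
    let restored = Decompressor::with_dictionary(&dict)?
        .decompress(&compressed, chunk.len())?;
    assert_eq!(restored, chunk);
    Ok(())
}
```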
Chunks are hashed with BLAKE256; a chunk's xxHash is also calculated before it is deduplicated away. (A detected collision aborts the backup. This is expected never to happen, but we cannot be certain of that.)
## Remote Protocol Details
* Compression is performed on the host where the data resides.
* Only required chunks are compressed and transferred across the SSH connection.
* There needs to be some mechanism to offer, decline and accept chunks without buffers overflowing and bringing hosts down; one possible shape is sketched below.
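This mechanism is not yet specified; as a purely illustrative sketch (all names hypothetical), a message shape like the following, with a cap on unanswered offers, would bound the buffering on both sides:

```rust
type ChunkId = [u8; 32];

enum Message {
    Offer(ChunkId),         // sender: "I have this chunk; do you want it?"
    Accept(ChunkId),        // receiver: "send it"
    Decline(ChunkId),       // receiver: "already have it, skip it"
    Data(ChunkId, Vec<u8>), // sender: the compressed chunk body
}

// Crude flow control: at most this many offers may await Accept/Decline
// at any one time, so neither side buffers without limit.
const MAX_IN_FLIGHT: usize = 32;
```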
## Processor Details
## Other notes
### Training a Zstd Dictionary
`zstd --train FILEs -o zstd.dict`