mdbx: add the MDBX_NOSTICKYTHREADS mode in place of MDBX_NOTLS.

Леонид Юрьев (Leonid Yuriev) 2024-04-02 00:22:09 +03:00
parent 1727b697a0
commit e56c73b4e6
14 changed files with 283 additions and 198 deletions


@ -11,8 +11,7 @@ For the same reason ~~Github~~ is blacklisted forever.
So currently most of the links are broken due to noted malicious ~~Github~~ sabotage.
- [Replace SRW-lock on Windows to allow shrink DB with `MDBX_NOTLS` option](https://libmdbx.dqdkfa.ru/dead-github/issues/210).
- [More flexible support of asynchronous runtime/framework(s)](https://libmdbx.dqdkfa.ru/dead-github/issues/200).
- [Replace SRW-lock on Windows to allow shrink DB with `MDBX_NOSTICKYTHREADS` option](https://libmdbx.dqdkfa.ru/dead-github/issues/210).
- [Migration guide from LMDB to MDBX](https://libmdbx.dqdkfa.ru/dead-github/issues/199).
- [Support for RAW devices](https://libmdbx.dqdkfa.ru/dead-github/issues/124).
- [Support MessagePack for Keys & Values](https://libmdbx.dqdkfa.ru/dead-github/issues/115).
@ -22,6 +21,7 @@ So currently most of the links are broken due to noted malicious ~~Github~~ sabo
Done
----
- [More flexible support of asynchronous runtime/framework(s)](https://libmdbx.dqdkfa.ru/dead-github/issues/200).
- [Move most of `mdbx_chk` functional to the library API](https://libmdbx.dqdkfa.ru/dead-github/issues/204).
- [Simple careful mode for working with corrupted DB](https://libmdbx.dqdkfa.ru/dead-github/issues/223).
- [Engage an "overlapped I/O" on Windows](https://libmdbx.dqdkfa.ru/dead-github/issues/224).


@ -190,18 +190,20 @@ readers without writer" case.
## One thread - One transaction
A thread can only use one transaction at a time, plus any nested
read-write transactions in the non-writemap mode. Each transaction
belongs to one thread. The \ref MDBX_NOTLS flag changes this for read-only
transactions. See below.
A thread can only use one transaction at a time, plus any nested
read-write transactions in the non-writemap mode. Each transaction
belongs to one thread. The \ref MDBX_NOSTICKYTHREADS flag changes this,
see below.
Do not start more than one transaction for a one thread. If you think
about this, it's really strange to do something with two data snapshots
at once, which may be different. MDBX checks and preventing this by
returning corresponding error code (\ref MDBX_TXN_OVERLAPPING, \ref MDBX_BAD_RSLOT,
\ref MDBX_BUSY) unless you using \ref MDBX_NOTLS option on the environment.
Nonetheless, with the `MDBX_NOTLS` option, you must know exactly what you
are doing, otherwise you will get deadlocks or reading an alien data.
Do not start more than one transaction per thread. If you think
about it, working with two data snapshots at once, which may differ,
is really strange. MDBX checks for this and prevents it by
returning a corresponding error code (\ref MDBX_TXN_OVERLAPPING,
\ref MDBX_BAD_RSLOT, \ref MDBX_BUSY) unless you use the
\ref MDBX_NOSTICKYTHREADS option on the environment.
Nonetheless, with the `MDBX_NOSTICKYTHREADS` option, you must know
exactly what you are doing, otherwise you will get deadlocks or read
alien data.
## Do not open twice


@ -129,20 +129,23 @@ no open MDBX-instance(s) during fork(), or at least close it immediately after
necessary) in a child process would be both extreme complicated and so
fragile.
Do not start more than one transaction for a one thread. If you think about
this, it's really strange to do something with two data snapshots at once,
which may be different. MDBX checks and preventing this by returning
corresponding error code (\ref MDBX_TXN_OVERLAPPING, \ref MDBX_BAD_RSLOT, \ref MDBX_BUSY)
unless you using \ref MDBX_NOTLS option on the environment. Nonetheless, with the
\ref MDBX_NOTLS option, you must know exactly what you are doing, otherwise you
will get deadlocks or reading an alien data.
Do not start more than one transaction per thread. If you think
about it, working with two data snapshots at once, which may differ,
is really strange. MDBX checks for this and prevents it by
returning a corresponding error code (\ref MDBX_TXN_OVERLAPPING,
\ref MDBX_BAD_RSLOT, \ref MDBX_BUSY) unless you use the
\ref MDBX_NOSTICKYTHREADS option on the environment. Nonetheless,
with the \ref MDBX_NOSTICKYTHREADS option, you must know exactly what
you are doing, otherwise you will get deadlocks or read alien
data.
Also note that a transaction is tied to one thread by default using Thread
Local Storage. If you want to pass read-only transactions across threads,
you can use the \ref MDBX_NOTLS option on the environment. Nevertheless, a write
transaction entirely should only be used in one thread from start to finish.
MDBX checks this in a reasonable manner and return the \ref MDBX_THREAD_MISMATCH
error in rules violation.
Also note that a transaction is tied to one thread by default using
Thread Local Storage. If you want to pass transactions across threads,
you can use the \ref MDBX_NOSTICKYTHREADS option on the environment.
Nevertheless, a write transaction must be committed or aborted in the
same thread in which it was started. MDBX checks this in a reasonable
manner and returns the \ref MDBX_THREAD_MISMATCH error when the rules
are violated.
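The rule that a write transaction must be finished in the thread that started it mirrors how OS mutexes behave: the comment added to mdbx.h in this commit attributes the restriction to synchronization primitives that only their owner may release. The following is a minimal standalone sketch of that OS-level behavior, not MDBX code. It assumes a POSIX system, and every `demo_*` name is hypothetical. An error-checking pthread mutex is locked in one thread, and a second thread's attempt to unlock it is rejected.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Error-checking mutex standing in for a writer lock (hypothetical name). */
static pthread_mutex_t demo_writer_lock;

/* Runs in a second thread and tries to release a lock it does not own. */
static void *demo_foreign_unlock(void *result) {
  *(int *)result = pthread_mutex_unlock(&demo_writer_lock);
  return NULL;
}

/* Locks in the calling thread, then lets another thread attempt the unlock.
 * Returns the code the foreign thread got from pthread_mutex_unlock(),
 * or -1 if the demo itself could not be set up. */
int demo_unlock_from_other_thread(void) {
  pthread_mutexattr_t attr;
  pthread_t other;
  int foreign_rc = -1;

  if (pthread_mutexattr_init(&attr) != 0)
    return -1;
  if (pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK) != 0 ||
      pthread_mutex_init(&demo_writer_lock, &attr) != 0) {
    pthread_mutexattr_destroy(&attr);
    return -1;
  }
  pthread_mutexattr_destroy(&attr);

  pthread_mutex_lock(&demo_writer_lock); /* "start the write transaction" */
  if (pthread_create(&other, NULL, demo_foreign_unlock, &foreign_rc) == 0)
    pthread_join(other, NULL);
  pthread_mutex_unlock(&demo_writer_lock); /* the owner releases it */
  pthread_mutex_destroy(&demo_writer_lock);
  return foreign_rc;
}
```

With an error-checking mutex, POSIX specifies `EPERM` for an unlock attempted by a thread that does not own the mutex. With a default mutex the same mistake is undefined behavior, which is why a library cannot simply let another thread release the writer lock.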
## Transactions, rollbacks etc

mdbx.h

@ -1207,28 +1207,80 @@ enum MDBX_env_flags_t {
*/
MDBX_WRITEMAP = UINT32_C(0x80000),
/** Tie reader locktable slots to read-only transactions
* instead of to threads.
/** Decouples transactions from threads as much as possible.
*
* Don't use Thread-Local Storage, instead tie reader locktable slots to
* \ref MDBX_txn objects instead of to threads. So, \ref mdbx_txn_reset()
* keeps the slot reserved for the \ref MDBX_txn object. A thread may use
* parallel read-only transactions. And a read-only transaction may span
* threads if you synchronizes its use.
* This option is intended for applications that multiplex many user-level
* lightweight threads of execution over individual operating-system threads,
* as the GoLang and Rust runtimes do. Such applications are also advised to
* serialize write transactions in a single operating-system thread, since
* MDBX's write lock uses the underlying system synchronization primitives and
* knows nothing about user-level threads and/or the runtime's lightweight
* threads. At the very least, each write transaction must be finished
* strictly in the same operating-system thread in which it was started.
*
* Applications that multiplex many user threads over individual OS threads
* need this option. Such an application must also serialize the write
* transactions in an OS thread, since MDBX's write locking is unaware of
* the user threads.
* \note Starting with version v0.13, the `MDBX_NOSTICKYTHREADS` option
* completely replaces the \ref MDBX_NOTLS option.
*
* \note Regardless to `MDBX_NOTLS` flag a write transaction entirely should
* always be used in one thread from start to finish. MDBX checks this in a
* reasonable manner and return the \ref MDBX_THREAD_MISMATCH error in rules
* violation.
* With `MDBX_NOSTICKYTHREADS`, transactions are no longer associated with
* the threads that created them. Therefore the API functions do not check
* that a transaction matches the current thread. Most functions that work
* with transactions and cursors can then be called from any thread.
* However, it also becomes impossible to detect errors of simultaneous use
* of transactions and/or cursors from different threads.
*
* This flag affects only at environment opening but can't be changed after.
* Using `MDBX_NOSTICKYTHREADS` also narrows the possibilities for resizing
* the DB, since it is no longer possible to track the threads working with
* the DB and to suspend them while the DB mapping is removed from memory. In
* particular, for this reason on Windows the DB file cannot be shrunk until
* the DB is closed by the last process working with it, or until the DB is
* subsequently opened in read-write mode.
*
* \warning Regardless of \ref MDBX_NOSTICKYTHREADS and \ref MDBX_NOTLS, API
* objects must not be used simultaneously from different threads! Taking all
* measures to rule out simultaneous use of API objects from different threads
* is entirely your responsibility!
*
* \warning Write transactions can be finished only in the same thread in
* which they were started. This restriction follows from the requirement of
* most operating systems that an acquired synchronization primitive (mutex,
* semaphore, critical section) be released only by the thread that acquired
* it.
*
* \warning Creating a cursor in the context of a transaction, binding a
* cursor to a transaction, unbinding a cursor from a transaction, and closing
* a cursor bound to a transaction are operations that use both the cursor
* itself and the corresponding transaction. Likewise, committing or aborting
* a transaction is an operation that uses both the transaction itself and all
* cursors bound to it. To avoid corruption of internal data structures,
* unpredictable behavior, destruction of the DB, and data loss, never allow
* any cursor or transaction to be used simultaneously from different threads.
*
* With `MDBX_NOSTICKYTHREADS`, read transactions stop using TLS (Thread Local
* Storage), and the MVCC-snapshot lock slots in the reader table are bound
* only to transactions. Termination of any threads does not release
* MVCC-snapshot locks until the transactions are explicitly finished, or
* until the corresponding process as a whole terminates.
*
* For write transactions, no check is made that the current thread matches
* the thread that created the transaction. However, committing or aborting
* write transactions must be done strictly in the thread that started the
* transaction, since these operations involve acquiring and releasing
* synchronization primitives (mutexes, critical sections), which most
* operating systems require to be released only by the thread that acquired
* the resource.
*
* This flag takes effect when the environment is opened and cannot be changed
* afterwards.
*/
MDBX_NOTLS = UINT32_C(0x200000),
MDBX_NOSTICKYTHREADS = UINT32_C(0x200000),
#ifndef _MSC_VER /* avoid madness MSVC */
/** \deprecated Please use \ref MDBX_NOSTICKYTHREADS instead. */
MDBX_NOTLS MDBX_DEPRECATED = MDBX_NOSTICKYTHREADS,
#endif /* avoid madness MSVC */
/** Don't do readahead.
*
@ -2121,11 +2173,12 @@ enum MDBX_option_t {
* track readers in the environment. The default is about 100 for 4K
* system page size. Starting a read-only transaction normally ties a lock
* table slot to the current thread until the environment closes or the thread
* exits. If \ref MDBX_NOTLS is in use, \ref mdbx_txn_begin() instead ties the
* slot to the \ref MDBX_txn object until it or the \ref MDBX_env object is
* destroyed. This option may only set after \ref mdbx_env_create() and before
* \ref mdbx_env_open(), and has an effect only when the database is opened by
* the first process interacts with the database.
* exits. If \ref MDBX_NOSTICKYTHREADS is in use, \ref mdbx_txn_begin()
* instead ties the slot to the \ref MDBX_txn object until it or the \ref
* MDBX_env object is destroyed. This option may only be set after \ref
* mdbx_env_create() and before \ref mdbx_env_open(), and has an effect only
* when the database is opened by the first process that interacts with the
* database.
*
* \see mdbx_env_set_maxreaders() \see mdbx_env_get_maxreaders() */
MDBX_opt_max_readers,
@ -2389,7 +2442,7 @@ LIBMDBX_API int mdbx_env_get_option(const MDBX_env *env,
*
* Flags set by mdbx_env_set_flags() are also used:
* - \ref MDBX_ENV_DEFAULTS, \ref MDBX_NOSUBDIR, \ref MDBX_RDONLY,
* \ref MDBX_EXCLUSIVE, \ref MDBX_WRITEMAP, \ref MDBX_NOTLS,
* \ref MDBX_EXCLUSIVE, \ref MDBX_WRITEMAP, \ref MDBX_NOSTICKYTHREADS,
* \ref MDBX_NORDAHEAD, \ref MDBX_NOMEMINIT, \ref MDBX_COALESCE,
* \ref MDBX_LIFORECLAIM. See \ref env_flags section.
*
@ -3385,7 +3438,7 @@ LIBMDBX_API int mdbx_env_get_fd(const MDBX_env *env, mdbx_filehandle_t *fd);
* 2) Temporary close memory mapped is required to change
* geometry, but there read transaction(s) is running
* and no corresponding thread(s) could be suspended
* since the \ref MDBX_NOTLS mode is used.
* since the \ref MDBX_NOSTICKYTHREADS mode is used.
* \retval MDBX_EACCESS The environment opened in read-only.
* \retval MDBX_MAP_FULL Specified size smaller than the space already
* consumed by the environment.
@ -3504,11 +3557,11 @@ mdbx_limits_txnsize_max(intptr_t pagesize);
* track readers in the environment. The default is about 100 for 4K system
* page size. Starting a read-only transaction normally ties a lock table slot
* to the current thread until the environment closes or the thread exits. If
* \ref MDBX_NOTLS is in use, \ref mdbx_txn_begin() instead ties the slot to the
* \ref MDBX_txn object until it or the \ref MDBX_env object is destroyed.
* This function may only be called after \ref mdbx_env_create() and before
* \ref mdbx_env_open(), and has an effect only when the database is opened by
* the first process interacts with the database.
* \ref MDBX_NOSTICKYTHREADS is in use, \ref mdbx_txn_begin() instead ties the
* slot to the \ref MDBX_txn object until it or the \ref MDBX_env object is
* destroyed. This function may only be called after \ref mdbx_env_create() and
* before \ref mdbx_env_open(), and has an effect only when the database is
* opened by the first process that interacts with the database.
* \see mdbx_env_get_maxreaders()
*
* \param [in] env An environment handle returned
@ -3702,8 +3755,8 @@ mdbx_env_get_userctx(const MDBX_env *env);
* \see mdbx_txn_begin()
*
* \note A transaction and its cursors must only be used by a single thread,
* and a thread may only have a single transaction at a time. If \ref MDBX_NOTLS
* is in use, this does not apply to read-only transactions.
* and a thread may only have a single transaction at a time unless
* the \ref MDBX_NOSTICKYTHREADS is used.
*
* \note Cursors may not span transactions.
*
@ -3764,8 +3817,8 @@ LIBMDBX_API int mdbx_txn_begin_ex(MDBX_env *env, MDBX_txn *parent,
* \see mdbx_txn_begin_ex()
*
* \note A transaction and its cursors must only be used by a single thread,
* and a thread may only have a single transaction at a time. If \ref MDBX_NOTLS
* is in use, this does not apply to read-only transactions.
* and a thread may only have a single transaction at a time unless
* the \ref MDBX_NOSTICKYTHREADS is used.
*
* \note Cursors may not span transactions.
*
@ -4140,10 +4193,11 @@ LIBMDBX_API int mdbx_txn_break(MDBX_txn *txn);
* Abort the read-only transaction like \ref mdbx_txn_abort(), but keep the
* transaction handle. Therefore \ref mdbx_txn_renew() may reuse the handle.
* This saves allocation overhead if the process will start a new read-only
* transaction soon, and also locking overhead if \ref MDBX_NOTLS is in use. The
* reader table lock is released, but the table slot stays tied to its thread
* or \ref MDBX_txn. Use \ref mdbx_txn_abort() to discard a reset handle, and to
* free its lock table slot if \ref MDBX_NOTLS is in use.
* transaction soon, and also locking overhead if \ref MDBX_NOSTICKYTHREADS is
* in use. The reader table lock is released, but the table slot stays tied to
* its thread or \ref MDBX_txn. Use \ref mdbx_txn_abort() to discard a reset
* handle, and to free its lock table slot if \ref MDBX_NOSTICKYTHREADS is in
* use.
*
* Cursors opened within the transaction must not be used again after this
* call, except with \ref mdbx_cursor_renew() and \ref mdbx_cursor_close().
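The reset/abort wording in the hunk above describes two separate pieces of state: the pinned snapshot, and the reservation of the reader-table slot. A toy model of that description, under stated assumptions, might look as follows. The struct, its fields, and the `demo_*` functions are hypothetical and do not reflect MDBX's real reader-table layout. The sketch only encodes what the comment says: reset releases the snapshot but keeps the slot, and abort frees the slot only when `MDBX_NOSTICKYTHREADS` is in use.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy reader-table slot; the field names are illustrative, not MDBX's. */
struct demo_slot {
  bool bound;        /* slot is reserved for a thread or a txn handle */
  uint64_t snapshot; /* pinned MVCC snapshot id, 0 = nothing pinned */
};

/* Like a read-only begin/renew: reserve the slot and pin a snapshot. */
void demo_begin(struct demo_slot *slot, uint64_t txnid) {
  slot->bound = true;
  slot->snapshot = txnid;
}

/* Like reset: the snapshot is released, the slot stays reserved. */
void demo_reset(struct demo_slot *slot) { slot->snapshot = 0; }

/* Like abort: the snapshot is released; the slot itself is freed only in
 * the no-sticky-threads mode, otherwise it stays tied to the thread. */
void demo_abort(struct demo_slot *slot, bool nosticky) {
  slot->snapshot = 0;
  if (nosticky)
    slot->bound = false;
}
```

In this model a slot in the default mode survives an abort and goes away only when its thread exits or the environment closes, which is the lifetime described in the `MDBX_opt_max_readers` comment.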


@ -3679,8 +3679,8 @@ public:
/// \brief Operate options.
struct LIBMDBX_API_TYPE operate_options {
/// \copydoc MDBX_NOTLS
bool orphan_read_transactions{false};
/// \copydoc MDBX_NOSTICKYTHREADS
bool no_sticky_threads{false};
/// \brief Allows nested transactions at the cost of disabling
/// \ref MDBX_WRITEMAP and increased overhead.
bool nested_write_transactions{false};


@ -21,7 +21,7 @@ N | MASK | ENV | TXN | DB | PUT | DBI | NOD
18|0004 0000|NOMETASYNC |TXN_NOMETASYNC|CREATE |APPENDDUP | | | | |
19|0008 0000|WRITEMAP |<= | |MULTIPLE | | | | <= |
20|0010 0000|UTTERLY | | | | | | | <= |
21|0020 0000|NOTLS |<= | | | | | | |
21|0020 0000|NOSTICKYTHR|<= | | | | | | |
22|0040 0000|EXCLUSIVE | | | | | | | |
23|0080 0000|NORDAHEAD | | | | | | | |
24|0100 0000|NOMEMINIT |TXN_PREPARE | | | | | | |


@ -1580,7 +1580,7 @@ __cold int rthc_register(MDBX_env *const env) {
rthc_limit *= 2;
}
if ((env->me_flags & MDBX_NOTLS) == 0) {
if ((env->me_flags & MDBX_NOSTICKYTHREADS) == 0) {
rc = thread_key_create(&env->me_txkey);
if (unlikely(rc != MDBX_SUCCESS))
goto bailout;
@ -3275,7 +3275,7 @@ enum {
#define TXN_END_UPDATE 0x10 /* update env state (DBIs) */
#define TXN_END_FREE 0x20 /* free txn unless it is MDBX_env.me_txn0 */
#define TXN_END_EOTDONE 0x40 /* txn's cursors already closed */
#define TXN_END_SLOT 0x80 /* release any reader slot if MDBX_NOTLS */
#define TXN_END_SLOT 0x80 /* release any reader slot if NOSTICKYTHREADS */
static int txn_end(MDBX_txn *txn, const unsigned mode);
static __always_inline pgr_t page_get_inline(const uint16_t ILL,
@ -6562,10 +6562,13 @@ __cold static int dxb_resize(MDBX_env *const env, const pgno_t used_pgno,
size_bytes == env->me_dxb_mmap.filesize)
goto bailout;
/* With MDBX_NOSTICKYTHREADS any threads may work with transactions and we
* have no information about which ones. So there is no way to perform
* remap actions that require suspending the threads working with the DB. */
if ((env->me_flags & MDBX_NOSTICKYTHREADS) == 0) {
#if defined(_WIN32) || defined(_WIN64)
if ((env->me_flags & MDBX_NOTLS) == 0 &&
((size_bytes < env->me_dxb_mmap.current && mode > implicit_grow) ||
limit_bytes != env->me_dxb_mmap.limit)) {
if ((size_bytes < env->me_dxb_mmap.current && mode > implicit_grow) ||
limit_bytes != env->me_dxb_mmap.limit) {
/* 1) Windows allows only extending a read-write section, but not a
* corresponding mapped view. Therefore in other cases we must suspend
* the local threads for safe remap.
@ -6589,8 +6592,7 @@ __cold static int dxb_resize(MDBX_env *const env, const pgno_t used_pgno,
}
#else /* Windows */
MDBX_lockinfo *const lck = env->me_lck_mmap.lck;
if (mode == explicit_resize && limit_bytes != env->me_dxb_mmap.limit &&
!(env->me_flags & MDBX_NOTLS)) {
if (mode == explicit_resize && limit_bytes != env->me_dxb_mmap.limit) {
mresize_flags |= MDBX_MRESIZE_MAY_UNMAP | MDBX_MRESIZE_MAY_MOVE;
if (lck) {
int err = osal_rdt_lock(env) /* lock readers table until remap done */;
@ -6616,6 +6618,7 @@ __cold static int dxb_resize(MDBX_env *const env, const pgno_t used_pgno,
}
}
#endif /* ! Windows */
}
const pgno_t aligned_munlock_pgno =
(mresize_flags & (MDBX_MRESIZE_MAY_UNMAP | MDBX_MRESIZE_MAY_MOVE))
@ -8616,26 +8619,30 @@ static int meta_sync(const MDBX_env *env, const meta_ptr_t head) {
return rc;
}
static __inline bool env_txn0_owned(const MDBX_env *env) {
return (env->me_flags & MDBX_NOSTICKYTHREADS)
? (env->me_txn0->mt_owner != 0)
: (env->me_txn0->mt_owner == osal_thread_self());
}
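The helper above changes what "the caller owns the write transaction" means once threads are no longer sticky. A standalone sketch of the same decision, assuming that an owner value of 0 means no write transaction is running, could read as follows. The `demo_env` struct and `demo_txn0_owned` are hypothetical stand-ins, not the library's types.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the environment and the basal write
 * transaction's owner field; 0 is assumed to mean "no owner". */
struct demo_env {
  bool nosticky;        /* models the MDBX_NOSTICKYTHREADS flag */
  uintptr_t txn0_owner; /* models env->me_txn0->mt_owner */
};

/* Sticky mode: only the recorded owner thread counts as the owner.
 * No-sticky mode: any caller counts while some owner is recorded. */
bool demo_txn0_owned(const struct demo_env *env, uintptr_t self) {
  return env->nosticky ? (env->txn0_owner != 0) : (env->txn0_owner == self);
}
```

One consequence visible in the model: in no-sticky mode the check cannot tell the real owner from any other thread, so callers such as `env_sync()` appear to rely on the application to serialize access, as the warnings in mdbx.h require.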
__cold static int env_sync(MDBX_env *env, bool force, bool nonblock) {
bool locked = false;
if (unlikely(env->me_flags & MDBX_RDONLY))
return MDBX_EACCESS;
const bool txn0_owned = env_txn0_owned(env);
bool should_unlock = false;
int rc = MDBX_RESULT_TRUE /* means "nothing to sync" */;
retry:;
unsigned flags = env->me_flags & ~(MDBX_NOMETASYNC | MDBX_SHRINK_ALLOWED);
if (unlikely((flags & (MDBX_RDONLY | MDBX_FATAL_ERROR | MDBX_ENV_ACTIVE)) !=
if (unlikely((flags & (MDBX_FATAL_ERROR | MDBX_ENV_ACTIVE)) !=
MDBX_ENV_ACTIVE)) {
rc = MDBX_EACCESS;
if (!(flags & MDBX_ENV_ACTIVE))
rc = MDBX_EPERM;
if (flags & MDBX_FATAL_ERROR)
rc = MDBX_PANIC;
rc = (flags & MDBX_FATAL_ERROR) ? MDBX_PANIC : MDBX_EPERM;
goto bailout;
}
const bool inside_txn =
(!locked && env->me_txn0->mt_owner == osal_thread_self());
const meta_troika_t troika =
(inside_txn | locked) ? env->me_txn0->tw.troika : meta_tap(env);
(txn0_owned | should_unlock) ? env->me_txn0->tw.troika : meta_tap(env);
const meta_ptr_t head = meta_recent(env, &troika);
const uint64_t unsynced_pages =
atomic_load64(&env->me_lck->mti_unsynced_pages, mo_Relaxed);
@ -8646,7 +8653,7 @@ retry:;
goto bailout;
}
if (locked && (env->me_flags & MDBX_WRITEMAP) &&
if (should_unlock && (env->me_flags & MDBX_WRITEMAP) &&
unlikely(head.ptr_c->mm_geo.next >
bytes2pgno(env, env->me_dxb_mmap.current))) {
@ -8676,8 +8683,8 @@ retry:;
osal_monotime() - eoos_timestamp >= autosync_period))
flags &= MDBX_WRITEMAP /* clear flags for full steady sync */;
if (!inside_txn) {
if (!locked) {
if (!txn0_owned) {
if (!should_unlock) {
#if MDBX_ENABLE_PGOP_STAT
unsigned wops = 0;
#endif /* MDBX_ENABLE_PGOP_STAT */
@ -8723,7 +8730,7 @@ retry:;
if (unlikely(err != MDBX_SUCCESS))
return err;
locked = true;
should_unlock = true;
#if MDBX_ENABLE_PGOP_STAT
env->me_lck->mti_pgop_stat.wops.weak += wops;
#endif /* MDBX_ENABLE_PGOP_STAT */
@ -8737,8 +8744,8 @@ retry:;
flags |= MDBX_SHRINK_ALLOWED;
}
eASSERT(env, inside_txn || locked);
eASSERT(env, !inside_txn || (flags & MDBX_SHRINK_ALLOWED) == 0);
eASSERT(env, txn0_owned || should_unlock);
eASSERT(env, !txn0_owned || (flags & MDBX_SHRINK_ALLOWED) == 0);
if (!head.is_steady && unlikely(env->me_stuck_meta >= 0) &&
troika.recent != (uint8_t)env->me_stuck_meta) {
@ -8765,7 +8772,7 @@ retry:;
rc = meta_sync(env, head);
bailout:
if (locked)
if (should_unlock)
osal_txn_unlock(env);
return rc;
}
@ -8854,7 +8861,7 @@ static void txn_valgrind(MDBX_env *env, MDBX_txn *txn) {
if (env->me_pid != osal_getpid()) {
/* resurrect after fork */
return;
} else if (env->me_txn0 && env->me_txn0->mt_owner == osal_thread_self()) {
} else if (env->me_txn && env_txn0_owned(env)) {
/* inside write-txn */
last = meta_recent(env, &env->me_txn0->tw.troika).ptr_v->mm_geo.next;
} else if (env->me_flags & MDBX_RDONLY) {
@ -8950,7 +8957,7 @@ static bind_rslot_result bind_rslot(MDBX_env *env, const uintptr_t tid) {
safe64_reset(&result.rslot->mr_txnid, true);
if (slot == nreaders)
env->me_lck->mti_numreaders.weak = (uint32_t)++nreaders;
result.rslot->mr_tid.weak = (env->me_flags & MDBX_NOTLS) ? 0 : tid;
result.rslot->mr_tid.weak = (env->me_flags & MDBX_NOSTICKYTHREADS) ? 0 : tid;
atomic_store32(&result.rslot->mr_pid, env->me_pid, mo_AcquireRelease);
osal_rdt_unlock(env);
@ -8970,12 +8977,12 @@ __cold int mdbx_thread_register(const MDBX_env *env) {
return (env->me_flags & MDBX_EXCLUSIVE) ? MDBX_EINVAL : MDBX_EPERM;
if (unlikely((env->me_flags & MDBX_ENV_TXKEY) == 0)) {
eASSERT(env, !env->me_lck_mmap.lck || (env->me_flags & MDBX_NOTLS));
return MDBX_EINVAL /* MDBX_NOTLS mode */;
eASSERT(env, env->me_flags & MDBX_NOSTICKYTHREADS);
return MDBX_EINVAL /* MDBX_NOSTICKYTHREADS mode */;
}
eASSERT(env, (env->me_flags & (MDBX_NOTLS | MDBX_ENV_TXKEY |
MDBX_EXCLUSIVE)) == MDBX_ENV_TXKEY);
eASSERT(env, (env->me_flags & (MDBX_NOSTICKYTHREADS | MDBX_ENV_TXKEY)) ==
MDBX_ENV_TXKEY);
MDBX_reader *r = thread_rthc_get(env->me_txkey);
if (unlikely(r != NULL)) {
eASSERT(env, r->mr_pid.weak == env->me_pid);
@ -8986,7 +8993,7 @@ __cold int mdbx_thread_register(const MDBX_env *env) {
}
const uintptr_t tid = osal_thread_self();
if (env->me_txn0 && unlikely(env->me_txn0->mt_owner == tid) && env->me_txn)
if (env->me_txn && unlikely(env->me_txn0->mt_owner == tid))
return MDBX_TXN_OVERLAPPING;
return bind_rslot((MDBX_env *)env, tid).err;
}
@ -9000,12 +9007,12 @@ __cold int mdbx_thread_unregister(const MDBX_env *env) {
return MDBX_RESULT_TRUE;
if (unlikely((env->me_flags & MDBX_ENV_TXKEY) == 0)) {
eASSERT(env, !env->me_lck_mmap.lck || (env->me_flags & MDBX_NOTLS));
return MDBX_RESULT_TRUE /* MDBX_NOTLS mode */;
eASSERT(env, env->me_flags & MDBX_NOSTICKYTHREADS);
return MDBX_RESULT_TRUE /* MDBX_NOSTICKYTHREADS mode */;
}
eASSERT(env, (env->me_flags & (MDBX_NOTLS | MDBX_ENV_TXKEY |
MDBX_EXCLUSIVE)) == MDBX_ENV_TXKEY);
eASSERT(env, (env->me_flags & (MDBX_NOSTICKYTHREADS | MDBX_ENV_TXKEY)) ==
MDBX_ENV_TXKEY);
MDBX_reader *r = thread_rthc_get(env->me_txkey);
if (unlikely(r == NULL))
return MDBX_RESULT_TRUE /* not registered */;
@ -9220,7 +9227,7 @@ static bool check_meta_coherency(const MDBX_env *env,
}
/* Common code for mdbx_txn_begin() and mdbx_txn_renew(). */
static int txn_renew(MDBX_txn *txn, const unsigned flags) {
static int txn_renew(MDBX_txn *txn, unsigned flags) {
MDBX_env *env = txn->mt_env;
int rc;
@ -9245,14 +9252,15 @@ static int txn_renew(MDBX_txn *txn, const unsigned flags) {
0);
const uintptr_t tid = osal_thread_self();
flags |= env->me_flags & (MDBX_NOSTICKYTHREADS | MDBX_WRITEMAP);
if (flags & MDBX_TXN_RDONLY) {
eASSERT(env, (flags & ~(MDBX_TXN_RO_BEGIN_FLAGS | MDBX_WRITEMAP)) == 0);
txn->mt_flags =
MDBX_TXN_RDONLY | (env->me_flags & (MDBX_NOTLS | MDBX_WRITEMAP));
eASSERT(env, (flags & ~(MDBX_TXN_RO_BEGIN_FLAGS | MDBX_WRITEMAP |
MDBX_NOSTICKYTHREADS)) == 0);
txn->mt_flags = flags;
MDBX_reader *r = txn->to.reader;
STATIC_ASSERT(sizeof(uintptr_t) <= sizeof(r->mr_tid));
if (likely(env->me_flags & MDBX_ENV_TXKEY)) {
eASSERT(env, !(env->me_flags & MDBX_NOTLS));
eASSERT(env, !(env->me_flags & MDBX_NOSTICKYTHREADS));
r = thread_rthc_get(env->me_txkey);
if (likely(r)) {
if (unlikely(!r->mr_pid.weak) &&
@ -9265,7 +9273,8 @@ static int txn_renew(MDBX_txn *txn, const unsigned flags) {
}
}
} else {
eASSERT(env, !env->me_lck_mmap.lck || (env->me_flags & MDBX_NOTLS));
eASSERT(env,
!env->me_lck_mmap.lck || (env->me_flags & MDBX_NOSTICKYTHREADS));
}
if (likely(r)) {
@ -9313,9 +9322,9 @@ static int txn_renew(MDBX_txn *txn, const unsigned flags) {
mo_Relaxed);
safe64_write(&r->mr_txnid, head.txnid);
eASSERT(env, r->mr_pid.weak == osal_getpid());
eASSERT(env,
r->mr_tid.weak ==
((env->me_flags & MDBX_NOTLS) ? 0 : osal_thread_self()));
eASSERT(env, r->mr_tid.weak == ((env->me_flags & MDBX_NOSTICKYTHREADS)
? 0
: osal_thread_self()));
eASSERT(env, r->mr_txnid.weak == head.txnid ||
(r->mr_txnid.weak >= SAFE64_INVALID_THRESHOLD &&
head.txnid < env->me_lck->mti_oldest_reader.weak));
@ -9374,12 +9383,12 @@ static int txn_renew(MDBX_txn *txn, const unsigned flags) {
tASSERT(txn, db_check_flags(txn->mt_dbs[MAIN_DBI].md_flags));
} else {
eASSERT(env, (flags & ~(MDBX_TXN_RW_BEGIN_FLAGS | MDBX_TXN_SPILLS |
MDBX_WRITEMAP)) == 0);
MDBX_WRITEMAP | MDBX_NOSTICKYTHREADS)) == 0);
if (unlikely(txn->mt_owner == tid ||
/* not recovery mode */ env->me_stuck_meta >= 0))
return MDBX_BUSY;
MDBX_lockinfo *const lck = env->me_lck_mmap.lck;
if (lck && (env->me_flags & MDBX_NOTLS) == 0 &&
if (lck && (env->me_flags & MDBX_NOSTICKYTHREADS) == 0 &&
(mdbx_static.flags & MDBX_DBG_LEGACY_OVERLAP) == 0) {
const size_t snap_nreaders =
atomic_load32(&lck->mti_numreaders, mo_AcquireRelease);
@ -9639,7 +9648,8 @@ static int txn_renew(MDBX_txn *txn, const unsigned flags) {
* since Wine don't support section extending,
* i.e. in both cases unmap+map are required. */
used_bytes < env->me_dbgeo.upper && env->me_dbgeo.grow)) &&
/* avoid recursive use SRW */ (txn->mt_flags & MDBX_NOTLS) == 0) {
/* avoid recursive use SRW */ (txn->mt_flags &
MDBX_NOSTICKYTHREADS) == 0) {
txn->mt_flags |= MDBX_SHRINK_ALLOWED;
osal_srwlock_AcquireShared(&env->me_remap_guard);
}
@ -9673,15 +9683,13 @@ static __always_inline int check_txn(const MDBX_txn *txn, int bad_bits) {
return MDBX_BAD_TXN;
tASSERT(txn, (txn->mt_flags & MDBX_TXN_FINISHED) ||
(txn->mt_flags & MDBX_NOTLS) ==
((txn->mt_flags & MDBX_TXN_RDONLY)
? txn->mt_env->me_flags & MDBX_NOTLS
: 0));
(txn->mt_flags & MDBX_NOSTICKYTHREADS) ==
(txn->mt_env->me_flags & MDBX_NOSTICKYTHREADS));
#if MDBX_TXN_CHECKOWNER
STATIC_ASSERT(MDBX_NOTLS > MDBX_TXN_FINISHED + MDBX_TXN_RDONLY);
if (unlikely(txn->mt_owner != osal_thread_self()) &&
(txn->mt_flags & (MDBX_NOTLS | MDBX_TXN_FINISHED | MDBX_TXN_RDONLY)) <
(MDBX_TXN_FINISHED | MDBX_TXN_RDONLY))
STATIC_ASSERT((long)MDBX_NOSTICKYTHREADS > (long)MDBX_TXN_FINISHED);
if ((txn->mt_flags & (MDBX_NOSTICKYTHREADS | MDBX_TXN_FINISHED)) <
MDBX_TXN_FINISHED &&
unlikely(txn->mt_owner != osal_thread_self()))
return txn->mt_owner ? MDBX_THREAD_MISMATCH : MDBX_BAD_TXN;
#endif /* MDBX_TXN_CHECKOWNER */
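The rewritten owner check folds two conditions into one unsigned comparison. A small sketch of the trick, with assumed bit values, is shown here. `0x200000` is the flag value from mdbx.h in this diff, while `DEMO_TXN_FINISHED` is a placeholder chosen only to be the smaller of the two bits, which is what the `STATIC_ASSERT` above requires. Masking the flags with both bits and comparing against the smaller one is true only when neither bit is set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Demo bit values. 0x200000 comes from the diff; DEMO_TXN_FINISHED is an
 * assumed placeholder that only needs to be the smaller of the two. */
#define DEMO_TXN_FINISHED UINT32_C(0x01)
#define DEMO_NOSTICKYTHREADS UINT32_C(0x200000)

/* True when the owner-thread comparison should run: the transaction is
 * neither finished nor created in the no-sticky-threads mode. */
bool demo_owner_check_applies(uint32_t txn_flags) {
  return (txn_flags & (DEMO_NOSTICKYTHREADS | DEMO_TXN_FINISHED)) <
         DEMO_TXN_FINISHED;
}
```

The masked value can only be zero, the finished bit, the no-sticky bit, or both, and only zero is smaller than the finished bit, so one compare replaces two separate flag tests.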
@ -9762,7 +9770,6 @@ int mdbx_txn_begin_ex(MDBX_env *env, MDBX_txn *parent, MDBX_txn_flags_t flags,
~flags)) /* write txn in RDONLY env */
return MDBX_EACCESS;
flags |= env->me_flags & MDBX_WRITEMAP;
MDBX_txn *txn = nullptr;
if (parent) {
/* Nested transactions: Max 1 child, write txns only, no writemap */
@ -9781,10 +9788,11 @@ int mdbx_txn_begin_ex(MDBX_env *env, MDBX_txn *parent, MDBX_txn_flags_t flags,
}
tASSERT(parent, audit_ex(parent, 0, false) == 0);
flags |= parent->mt_flags & (MDBX_TXN_RW_BEGIN_FLAGS | MDBX_TXN_SPILLS);
flags |= parent->mt_flags & (MDBX_TXN_RW_BEGIN_FLAGS | MDBX_TXN_SPILLS |
MDBX_NOSTICKYTHREADS | MDBX_WRITEMAP);
} else if (flags & MDBX_TXN_RDONLY) {
if (env->me_txn0 &&
unlikely(env->me_txn0->mt_owner == osal_thread_self()) && env->me_txn &&
if ((env->me_flags & MDBX_NOSTICKYTHREADS) == 0 && env->me_txn &&
unlikely(env->me_txn0->mt_owner == osal_thread_self()) &&
(mdbx_static.flags & MDBX_DBG_LEGACY_OVERLAP) == 0)
return MDBX_TXN_OVERLAPPING;
} else {
@ -9967,12 +9975,13 @@ int mdbx_txn_begin_ex(MDBX_env *env, MDBX_txn *parent, MDBX_txn_flags_t flags,
eASSERT(env, txn->mt_flags == (MDBX_TXN_RDONLY | MDBX_TXN_FINISHED));
else if (flags & MDBX_TXN_RDONLY)
eASSERT(env, (txn->mt_flags &
~(MDBX_NOTLS | MDBX_TXN_RDONLY | MDBX_WRITEMAP |
~(MDBX_NOSTICKYTHREADS | MDBX_TXN_RDONLY | MDBX_WRITEMAP |
/* Win32: SRWL flag */ MDBX_SHRINK_ALLOWED)) == 0);
else {
eASSERT(env, (txn->mt_flags &
~(MDBX_WRITEMAP | MDBX_SHRINK_ALLOWED | MDBX_NOMETASYNC |
MDBX_SAFE_NOSYNC | MDBX_TXN_SPILLS)) == 0);
eASSERT(env,
(txn->mt_flags &
~(MDBX_NOSTICKYTHREADS | MDBX_WRITEMAP | MDBX_SHRINK_ALLOWED |
MDBX_NOMETASYNC | MDBX_SAFE_NOSYNC | MDBX_TXN_SPILLS)) == 0);
assert(!txn->tw.spilled.list && !txn->tw.spilled.least_removed);
}
txn->mt_signature = MDBX_MT_SIGNATURE;
@ -10409,6 +10418,13 @@ int mdbx_txn_abort(MDBX_txn *txn) {
if (unlikely(rc != MDBX_SUCCESS))
return rc;
if ((txn->mt_flags & (MDBX_TXN_RDONLY | MDBX_NOSTICKYTHREADS)) ==
MDBX_NOSTICKYTHREADS &&
unlikely(txn->mt_owner != osal_thread_self())) {
mdbx_txn_break(txn);
return MDBX_THREAD_MISMATCH;
}
return txn_abort(txn);
}
@ -12093,6 +12109,12 @@ int mdbx_txn_commit_ex(MDBX_txn *txn, MDBX_commit_latency *latency) {
if (unlikely(txn->mt_flags & MDBX_TXN_RDONLY))
goto done;
if ((txn->mt_flags & MDBX_NOSTICKYTHREADS) &&
unlikely(txn->mt_owner != osal_thread_self())) {
rc = MDBX_THREAD_MISMATCH;
goto fail;
}
if (txn->mt_child) {
rc = mdbx_txn_commit_ex(txn->mt_child, NULL);
tASSERT(txn, txn->mt_child == NULL);
@ -13757,9 +13779,9 @@ __cold int mdbx_env_set_geometry(MDBX_env *env, intptr_t size_lower,
if (unlikely(rc != MDBX_SUCCESS))
return rc;
const bool need_lock =
!env->me_txn0 || env->me_txn0->mt_owner != osal_thread_self();
const bool inside_txn = !need_lock && env->me_txn;
const bool txn0_owned = env->me_txn0 && env_txn0_owned(env);
const bool inside_txn = txn0_owned && env->me_txn;
bool should_unlock = false;
#if MDBX_DEBUG
if (growth_step < 0) {
@ -13770,13 +13792,12 @@ __cold int mdbx_env_set_geometry(MDBX_env *env, intptr_t size_lower,
#endif /* MDBX_DEBUG */
intptr_t reasonable_maxsize = 0;
bool should_unlock = false;
if (env->me_map) {
/* env already mapped */
if (unlikely(env->me_flags & MDBX_RDONLY))
return MDBX_EACCESS;
if (need_lock) {
if (!txn0_owned) {
int err = osal_txn_lock(env, false);
if (unlikely(err != MDBX_SUCCESS))
return err;
@ -16024,6 +16045,9 @@ __cold int mdbx_env_close_ex(MDBX_env *env, bool dont_sync) {
#endif /* Windows */
}
if (env->me_txn0 && env->me_txn0->mt_owner == osal_thread_self())
osal_txn_unlock(env);
eASSERT(env, env->me_signature.weak == 0);
rc = env_close(env, false) ? MDBX_PANIC : rc;
ENSURE(env, osal_fastmutex_destroy(&env->me_dbi_lock) == MDBX_SUCCESS);
@ -22997,8 +23021,8 @@ __cold int mdbx_env_set_flags(MDBX_env *env, MDBX_env_flags_t flags,
   if (unlikely(env->me_flags & MDBX_RDONLY))
     return MDBX_EACCESS;
-  const bool lock_needed = (env->me_flags & MDBX_ENV_ACTIVE) &&
-                           env->me_txn0->mt_owner != osal_thread_self();
+  const bool lock_needed =
+      (env->me_flags & MDBX_ENV_ACTIVE) && !env_txn0_owned(env);
   bool should_unlock = false;
   if (lock_needed) {
     rc = osal_txn_lock(env, false);
@ -23233,8 +23257,7 @@ __cold int mdbx_env_stat_ex(const MDBX_env *env, const MDBX_txn *txn,
   if (unlikely(err != MDBX_SUCCESS))
     return err;
-  if (env->me_txn0 && env->me_txn0->mt_owner == osal_thread_self() &&
-      env->me_txn)
+  if (env->me_txn && env_txn0_owned(env))
     /* inside write-txn */
     return stat_acc(env->me_txn, dest, bytes);
@ -26209,7 +26232,7 @@ __cold int mdbx_env_set_option(MDBX_env *env, const MDBX_option_t option,
     return err;
   const bool lock_needed = ((env->me_flags & MDBX_ENV_ACTIVE) && env->me_txn0 &&
-                            env->me_txn0->mt_owner != osal_thread_self());
+                            !env_txn0_owned(env));
   bool should_unlock = false;
   switch (option) {
   case MDBX_opt_sync_bytes:
@ -26324,7 +26347,7 @@ __cold int mdbx_env_set_option(MDBX_env *env, const MDBX_option_t option,
       return MDBX_EACCESS;
     value = osal_16dot16_to_monotime((uint32_t)value);
     if (value != env->me_options.gc_time_limit) {
-      if (env->me_txn && env->me_txn0->mt_owner != osal_thread_self())
+      if (env->me_txn && lock_needed)
         return MDBX_EPERM;
       env->me_options.gc_time_limit = value;
       if (!env->me_options.flags.non_auto.rp_augment_limit)
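
The owner check added to `mdbx_txn_abort` in the first hunk above uses a mask-and-compare idiom: `(flags & (A | B)) == B` holds only when bit `B` is set and bit `A` is clear. The following is a minimal standalone sketch of that idiom, not code from the commit. The `DEMO_*` constants and the `demo_*` functions are hypothetical names invented for illustration, and the bit values are arbitrary rather than the real ones from `mdbx.h`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for MDBX_TXN_RDONLY and MDBX_NOSTICKYTHREADS.
 * The bit positions are arbitrary and do NOT match mdbx.h. */
enum {
  DEMO_TXN_RDONLY = 1u << 0,
  DEMO_NOSTICKYTHREADS = 1u << 1
};

/* True only when DEMO_NOSTICKYTHREADS is set and DEMO_TXN_RDONLY is clear,
 * which is the same shape as the condition guarding the thread-owner
 * comparison in the mdbx_txn_abort() hunk. */
static bool demo_is_nosticky_write_txn(unsigned flags) {
  return (flags & (DEMO_TXN_RDONLY | DEMO_NOSTICKYTHREADS)) ==
         DEMO_NOSTICKYTHREADS;
}

/* Walks all four combinations of the two bits. */
static void demo_self_check(void) {
  assert(demo_is_nosticky_write_txn(DEMO_NOSTICKYTHREADS));
  assert(!demo_is_nosticky_write_txn(DEMO_TXN_RDONLY | DEMO_NOSTICKYTHREADS));
  assert(!demo_is_nosticky_write_txn(DEMO_TXN_RDONLY));
  assert(!demo_is_nosticky_write_txn(0u));
}
```

A single comparison replaces the longer `(flags & B) && !(flags & A)` form, so the two flags are tested with one mask and one equality check.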


@ -842,8 +842,9 @@ MDBX_INTERNAL_FUNC int osal_ipclock_destroy(osal_ipclock_t *ipc);
  * read transactions started by the same thread need no further locking to
  * proceed.
  *
- * If MDBX_NOTLS is set, the slot address is not saved in thread-specific data.
- * No reader table is used if the database is on a read-only filesystem.
+ * If MDBX_NOSTICKYTHREADS is set, the slot address is not saved in
+ * thread-specific data. No reader table is used if the database is on a
+ * read-only filesystem.
  *
  * Since the database uses multi-version concurrency control, readers don't
  * actually need any locking. This table is used to keep track of which
@ -1786,8 +1787,8 @@ log2n_powerof2(size_t value_uintptr) {
    MDBX_NOMEMINIT | MDBX_COALESCE | MDBX_PAGEPERTURB | MDBX_ACCEDE | \
    MDBX_VALIDATION)
 #define ENV_CHANGELESS_FLAGS \
-  (MDBX_NOSUBDIR | MDBX_RDONLY | MDBX_WRITEMAP | MDBX_NOTLS | MDBX_NORDAHEAD | \
-   MDBX_LIFORECLAIM | MDBX_EXCLUSIVE)
+  (MDBX_NOSUBDIR | MDBX_RDONLY | MDBX_WRITEMAP | MDBX_NOSTICKYTHREADS | \
+   MDBX_NORDAHEAD | MDBX_LIFORECLAIM | MDBX_EXCLUSIVE)
 #define ENV_USABLE_FLAGS (ENV_CHANGEABLE_FLAGS | ENV_CHANGELESS_FLAGS)
 #if !defined(__cplusplus) || CONSTEXPR_ENUM_FLAGS_OPERATIONS


@ -326,7 +326,7 @@ static int suspend_and_append(mdbx_handle_array_t **array,
 MDBX_INTERNAL_FUNC int
 osal_suspend_threads_before_remap(MDBX_env *env, mdbx_handle_array_t **array) {
-  eASSERT(env, (env->me_flags & MDBX_NOTLS) == 0);
+  eASSERT(env, (env->me_flags & MDBX_NOSTICKYTHREADS) == 0);
   const uintptr_t CurrentTid = GetCurrentThreadId();
   int rc;
   if (env->me_lck_mmap.lck) {


@ -1216,8 +1216,8 @@ env::operate_parameters::make_flags(bool accede, bool use_subdirectory) const {
     flags |= MDBX_NOSUBDIR;
   if (options.exclusive)
     flags |= MDBX_EXCLUSIVE;
-  if (options.orphan_read_transactions)
-    flags |= MDBX_NOTLS;
+  if (options.no_sticky_threads)
+    flags |= MDBX_NOSTICKYTHREADS;
   if (options.disable_readahead)
     flags |= MDBX_NORDAHEAD;
   if (options.disable_clear_memory)
@ -1275,8 +1275,9 @@ env::reclaiming_options::reclaiming_options(MDBX_env_flags_t flags) noexcept
       coalesce((flags & MDBX_COALESCE) ? true : false) {}
 env::operate_options::operate_options(MDBX_env_flags_t flags) noexcept
-    : orphan_read_transactions(
-          ((flags & (MDBX_NOTLS | MDBX_EXCLUSIVE)) == MDBX_NOTLS) ? true
+    : no_sticky_threads(((flags & (MDBX_NOSTICKYTHREADS | MDBX_EXCLUSIVE)) ==
+                         MDBX_NOSTICKYTHREADS)
+                            ? true
                             : false),
       nested_write_transactions((flags & (MDBX_WRITEMAP | MDBX_RDONLY)) ? false
                                                                         : true),
@ -1831,8 +1832,8 @@ __cold ::std::ostream &operator<<(::std::ostream &out,
   static const char comma[] = ", ";
   const char *delimiter = "";
   out << "{";
-  if (it.orphan_read_transactions) {
-    out << delimiter << "orphan_read_transactions";
+  if (it.no_sticky_threads) {
+    out << delimiter << "no_sticky_threads";
     delimiter = comma;
   }
   if (it.nested_write_transactions) {


@ -378,7 +378,8 @@ const struct option_verb mode_bits[] = {
     {"nosync-safe", unsigned(MDBX_SAFE_NOSYNC)},
     {"nometasync", unsigned(MDBX_NOMETASYNC)},
     {"writemap", unsigned(MDBX_WRITEMAP)},
-    {"notls", unsigned(MDBX_NOTLS)},
+    {"nostickythreads", unsigned(MDBX_NOSTICKYTHREADS)},
+    {"no-sticky-threads", unsigned(MDBX_NOSTICKYTHREADS)},
     {"nordahead", unsigned(MDBX_NORDAHEAD)},
     {"nomeminit", unsigned(MDBX_NOMEMINIT)},
     {"lifo", unsigned(MDBX_LIFORECLAIM)},


@ -385,9 +385,9 @@ else
 fi
 if [ "$EXTRA" != "no" ]; then
-  options=(writemap lifo notls perturb nomeminit nordahead)
+  options=(writemap lifo nostickythreads perturb nomeminit nordahead)
 else
-  options=(writemap lifo notls)
+  options=(writemap lifo nostickythreads)
 fi
 syncmodes=("" ,+nosync-safe ,+nosync-utterly ,+nometasync)
 function join { local IFS="$1"; shift; echo "$*"; }


@ -106,7 +106,7 @@ MDBX_NORETURN void usage(void) {
       " writemap == MDBX_WRITEMAP\n"
       " nosync-utterly == MDBX_UTTERLY_NOSYNC\n"
       " perturb == MDBX_PAGEPERTURB\n"
-      " notls == MDBX_NOTLS\n"
+      " nostickythreads== MDBX_NOSTICKYTHREADS\n"
       " nordahead == MDBX_NORDAHEAD\n"
       " nomeminit == MDBX_NOMEMINIT\n"
       " --random-writemap[=YES|no] Toggle MDBX_WRITEMAP randomly\n"


@ -351,7 +351,7 @@ else
 fi
 syncmodes=("" ,+nosync-safe ,+nosync-utterly)
-options=(writemap lifo notls perturb)
+options=(writemap lifo nostickythreads perturb)
 function join { local IFS="$1"; shift; echo "$*"; }