This reverts two commits:
0bb8e418a41c6f583ca9d705b400e37e2308a534
"Fix postgres schema after dropping old tables (#16730)"
and
51e4e35653f98c3f61222fbdbdb1dcb8864f7fca
"Add a Postgres `REPLICA IDENTITY` to tables that do not have an implicit one. This should allow use of Postgres logical replication. (take 2, now with no added deadlocks!) (#16658)"
and also amends the changelog.
* Add `ALTER TABLE ... REPLICA IDENTITY ...` for individual tables
We can't combine them into one file as it makes it likely to hit a deadlock
if Synapse is running, as it only takes one other transaction to access two
tables in a different order to the schema delta.
* Add notes
* Newsfile
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
* Re-introduce REPLICA IDENTITY test
---------
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
poetry-core 1.8.x includes a fix which properly moves the generated
synapse_rust.abi3.so file to the synapse directory when using an
editable install.
Without this change developers are left with a confusing experience
of the synapse.synapse_rust module not being found after installation.
Implement MSC3860 to follow redirects for federated media downloads.
Note that the Client-Server API doesn't support this (yet) since the media
repository in Synapse doesn't have a way of supporting redirects.
* Describe `insert_client_ip`
* Pull out client_ips and MAU tracking to BaseAuth
* Define HAS_AUTHLIB once in tests
sick of copypasting
* Track ips and token usage when delegating auth
* Test that we track MAU and user_ips
* Don't track `__oidc_admin`
pip was using a vendored setuptools that was incompatible with
Python 3.12. Upgrading cibuildwheels to a version with a newer
version of pip (and thus a newer version of setuptools) fixes
the issue.
Keeping track of a lower bound of stream ID where we've deleted everything below makes the queries much faster. Otherwise, every time we scan for rows to delete we'd re-scan across all the rows that have previously been deleted (until the next table VACUUM).
If a worker reconnects to Redis we send out the current positions of all our streams. However, if we're also trying to send out a backlog of RDATA at the same time then we can end up sending a `POSITION` with the current token *before* we've sent all the RDATA before the current token.
This doesn't cause actual bugs as the receiving servers see the POSITION, fetch the relevant rows from the DB, and then ignore the old RDATA as they come in. However, this is inefficient so it'd be better if we didn't send out-of-order positions.
* Fix the CI query that did not detect all cases of missing primary keys
* Add more missing REPLICA IDENTITY entries
* Newsfile
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
---------
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
* Add Postgres replica identities to tables that don't have an implicit one
Fixes #16224
* Newsfile
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
* Move the delta to version 83 as we missed the boat for 82
* Add a test that all tables have a REPLICA IDENTITY
* Extend the test to include when indices are deleted
* isort
* black
* Fully qualify `oid` as it is a 'hidden attribute' in Postgres 11
* Update tests/storage/test_database.py
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
* Add missed tables
---------
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
If simple_{insert,upsert,update}_many_txn is called without any data
to modify then return instead of executing the query.
This matches the behavior of simple_{select,delete}_many_txn.
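A minimal sketch of the early-return pattern described above, using the standard-library `sqlite3` module rather than Synapse's database layer. The `insert_many` helper and the `kv` table are hypothetical names chosen for illustration.

```python
import sqlite3
from typing import Sequence, Tuple


def insert_many(conn: sqlite3.Connection, rows: Sequence[Tuple[str, int]]) -> int:
    """Insert rows into the (hypothetical) kv table; return how many were sent."""
    # Nothing to modify: return before building or executing any SQL.
    if not rows:
        return 0
    conn.executemany("INSERT INTO kv (k, v) VALUES (?, ?)", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k TEXT, v INTEGER)")
```

Calling `insert_many(conn, [])` touches no tables, which is the behaviour the select and delete helpers are said to already have.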
Fetch information needed for push rule evaluation in parallel.
Ideally this would use query pipelining, but this is not
available in psycopg2.
Due to the database thread pool this may result in little
to no parallelization.
Previously only Twisted's EPollReactor was compatible with the
reactor timing metric, notably not working when asyncio was used.
After this change, the following configurations support the reactor
timing metric:
* poll, epoll, or select reactors
* asyncio reactor with a poll, epoll, select, /dev/poll, or kqueue event loop.
The event persistence code used to handle multiple rooms
at a time, but was simplified to only ever be called with a
single room at a time (different rooms are now handled in
parallel). The code is still generic to multiple rooms causing
a lot of work that is unnecessary (e.g. unnecessary loops, and
partitioning data by room).
This strips out the ability to handle multiple rooms at once, greatly
simplifying the code.
Just to standardize on the normal helpers, it might also have
a slight perf improvement on PostgreSQL which will now use
`ANY (?)` instead of `IN (?, ?, ...)`.
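To illustrate the difference between the two clause shapes, here is a small sketch. The `in_list_clause` helper and its `supports_any` flag are hypothetical and are not Synapse's actual helper; they only show how the SQL text and bind arguments differ.

```python
from typing import Any, List, Sequence, Tuple


def in_list_clause(
    column: str, values: Sequence[Any], supports_any: bool
) -> Tuple[str, List[Any]]:
    """Build a membership clause plus its bind arguments (hypothetical helper)."""
    if supports_any:
        # One placeholder, bound to a single array argument.
        return f"{column} = ANY(?)", [list(values)]
    # One placeholder per value.
    placeholders = ", ".join("?" for _ in values)
    return f"{column} IN ({placeholders})", list(values)
```

With the `ANY` form the statement text stays the same regardless of how many values are passed, which is one plausible reason for the slight improvement mentioned above.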
* complement: enable dirty runs
* Add changelog
* Set a low connpool limit when running in Complement
Dirty runs can cause many containers to be running concurrently,
which seems to easily exhaust resources on the host. The increased
speedup from dirty runs also seems to use more db connections on
workers, which are misconfigured currently to have
`SUM(workers * cp_max) > max_connections`, causing
```
FATAL: sorry, too many clients already
```
which results in tests failing.
* Try p=2 concurrency to restrict slowness of servers which causes partial state join tests to flake
* Debug logging
* Only run flakey tests
* Only adjust connection pool limits in worker mode
* Move cp vars to somewhere where they get executed in CI
* Move cp values back to where they actually work
* Debug logging
* Try p=1 to see if this makes worker mode happier
* Remove debug logging
This is mostly useful for federated rooms where some users
would get stuck in the invite or knock state when the room
was purged from their homeserver.
This adds a module API which allows a module to update a user's
presence state/status message. This is useful for controlling presence
from an external system.
To fully control presence from the module the presence.enabled config
parameter gains a new state of "untracked" which disables internal tracking
of presence changes via user actions, etc. Only updates from the module will
be persisted and sent down sync properly.
Twisted trunk makes a change to the `TLSMemoryBIOFactory` where
the underlying protocol is changed from `TLSMemoryBIOProtocol` to
`BufferingTLSTransport` to improve performance of TLS code (see
https://github.com/twisted/twisted/issues/11989).
In order to properly hook this code up in tests we need to pass the test
reactor's clock into `TLSMemoryBIOFactory` to avoid the global (trial)
reactor being used by default.
Twisted does something similar internally for tests:
157cd8e659/src/twisted/web/test/test_agent.py (L871-L874)
This reverts commit 5fe76b9434e22bb752c252dd9c66c3c2bfb90dfc.
I think I had this accidentally committed on my local develop branch, and
so it accidentally got merged into upstream develop.
This should re-land with corrections in #16504.
* Fix bug where a new writer advances their token too quickly
When starting a new writer (for e.g. persisting events), the
`MultiWriterIdGenerator` doesn't have a minimum token for it as there
are no rows matching that new writer in the DB.
This results in the first stream ID it acquired being announced as
persisted *before* it actually finishes persisting, if another writer
gets and persists a subsequent stream ID. This is due to the logic of
setting the minimum persisted position to the minimum known position
across all writers, and the new writer starts off not being considered.
* Fix sending out POSITIONs when our token advances without update
Broke in #14820
* For replication HTTP requests, only wait for minimal position
This could happen if the last rows in the account data stream were inserted into `account_data`. After a restart the max account ID would be calculated without looking at the `account_data` table, and so have an old ID.
If using the script remotely, there's no particularly convincing reason
to disable certificate verification, as this makes the connection
interceptible.
If on the other hand, the script is used locally (the most common use
case), you can simply target the HTTP listener and avoid TLS altogether.
This is what the script already attempts to do if passed a homeserver
configuration YAML file.
This splits things into two queries, but most of the time we won't have
new event backwards extremities so this shouldn't actually add an extra
RTT for the majority of cases.
Note this removes the check for events with no prev events, but that was
part of MSC2716 work that has since been removed.
Synapse was incorrectly implemented with a knock_state_events
property on some APIs (instead of knock_room_state). This was
correct in Synapse 1.70.0, but *both* fields were sent to also be
compatible with Synapse versions expecting the wrong field.
Enough time has passed that only the correct field needs to be
included/handled.
This converts the media servlet URLs in the same way as
(most) of the rest of Synapse. This will give more flexibility
in the versions each endpoint exists under.
This avoids calling cursor_to_dict and then immediately
unpacking the values in the dict for other users. By not
creating the intermediate dictionary we can avoid allocating
the dictionary and strings for the keys, which should generally
be more performant.
Additionally this improves type hints by avoiding Dict[str, Any]
dictionaries coming out of the database layer.
Co-authored-by: David Robertson <davidr@element.io>
Co-authored-by: Patrick Cloke <patrickc@matrix.org>
Co-authored-by: Erik Johnston <erik@matrix.org>
Assert that the return type of callables wrapped in @cached
and @cachedList are cachable (aka immutable).
This is because if a worker reaches ~100% CPU then everything starts
lagging and we hit the log line a lot. When it is logged at error level we invoke Sentry
and that has a lot of overhead, which then puts even more pressure on
the worker.
There are no known bugs in the message retention code, but
it is possible that there still exist race conditions. Additional
fixes will be made as reported.
This allows maturin >= 0.15 to build the properly named
shared library object.
For now the old configuration is also kept to allow for
older maturin installs to be used.
Also add restore of purge/shutdown rooms after a synapse restart.
Co-authored-by: Eric Eastwood <erice@matrix.org>
Co-authored-by: Erik Johnston <erikj@matrix.org>
Refresh tokens were not correctly moved to the rehydrated
device (similar to how the access token is currently handled).
This resulted in invalid refresh tokens after rehydration.
Adds both the List-Unsubscribe (RFC2369) and List-Unsubscribe-Post (RFC8058)
headers to push notification emails, which together should:
* Show an "Unsubscribe" link in the MUA UI when viewing Synapse notification emails.
* Enable "one-click" unsubscribe (the user never leaves their MUA, which automatically
makes a POST request to the specified endpoint).
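As a rough sketch of what the two headers look like on a message, using the standard-library `email` package. The `add_unsubscribe_headers` function and the example URL are illustrative assumptions, not Synapse's mailer code.

```python
from email.message import EmailMessage


def add_unsubscribe_headers(msg: EmailMessage, unsubscribe_url: str) -> None:
    # RFC 2369: a URL the mail client can present as an "Unsubscribe" link.
    msg["List-Unsubscribe"] = f"<{unsubscribe_url}>"
    # RFC 8058: signals that a POST to that URL unsubscribes in one click.
    msg["List-Unsubscribe-Post"] = "List-Unsubscribe=One-Click"


msg = EmailMessage()
msg["Subject"] = "New messages"
add_unsubscribe_headers(msg, "https://example.com/unsubscribe?token=abc")
```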
Using the new `TaskScheduler` meant that we'd create lots of new
metrics (due to adding task ID to the desc of background process),
resulting in requests for metrics taking an increasing amount of CPU.
* Allow user_id to be optional for room deletion
* Add module API method to delete a room
* Newsfile
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
* Don't worry about the case block=True && requester_user_id is None
---------
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
Add a (long) timeout to when a "busy" device is considered not online.
This does *not* match MSC3026, but is a reasonable thing for an
implementation to do.
Expands tests for the (unstable) busy presence with multiple devices.
Tracks presence on an individual per-device basis and combines
the per-device state into a per-user state. This should help in
situations where a user has multiple devices with conflicting status
(e.g. one is syncing with unavailable and one is syncing with online).
The tie-breaking is done by priority:
BUSY > ONLINE > UNAVAILABLE > OFFLINE
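The tie-breaking above can be sketched as follows. The state strings and the `combine_device_states` name are assumptions for illustration; treating a user with no devices as offline is also an assumption of this sketch.

```python
from typing import Iterable

# Highest priority first, following the ordering above.
PRIORITY = ["busy", "online", "unavailable", "offline"]


def combine_device_states(device_states: Iterable[str]) -> str:
    """Collapse per-device presence states into a single per-user state."""
    states = set(device_states)
    for state in PRIORITY:
        if state in states:
            return state
    # No devices at all: assume offline (an assumption of this sketch).
    return "offline"
```

For example, one device syncing with `unavailable` and another with `online` yields `online` for the user.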
* Fix rare bug that broke looping calls
We can't interact with the reactor from the main thread via looping
call.
Introduced in v1.90.0 / #15791.
* Newsfile
Refactoring to use both the user ID & the device ID when tracking
the currently syncing users in the presence handler.
This is done both locally and over replication. Note that the device
ID is discarded but will be used in a future change.
Refactoring to pass the device ID (in addition to the user ID) through
the presence handler (specifically the `user_syncing`, `set_state`,
and `bump_presence_active_time` methods and their replication
versions).
Simplify some of the presence code by reducing duplicated code between
worker & non-worker modes.
The main change is to push some of the logic from `user_syncing` into
`set_state`. This is done by passing whether the user is setting the presence
via a `/sync` with a new `is_sync` flag to `set_state`. If this is `true` some
additional logic is performed:
* Don't override `busy` presence.
* Update the `last_user_sync_ts`.
* Never update the status message.
* Properly update retry_last_ts when hitting the maximum retry interval
This was broken in 1.87 when the maximum retry interval got changed from
almost infinite to a week (and made configurable).
Fixes #16101
Signed-off-by: Nicolas Werner <nicolas.werner@hotmail.de>
* Add changelog
* Change fix + add test
* Add comment
---------
Signed-off-by: Nicolas Werner <nicolas.werner@hotmail.de>
Co-authored-by: Mathieu Velten <mathieuv@matrix.org>
This should only be called on HomeServer objects which are configured
to run background tasks, which is automatically (and properly) done via
the call to setup().
If we don't have all the auth events in a room then not all state events will have a chain cover index. Even so, we can still use the chain cover index on the events that do have it, rather than bailing and using the slower functions.
This situation should not arise for newly persisted rooms, as we check we have the full auth chain for each event, but can happen for existing rooms.
c.f. #15245
We were seeing serialization errors when taking out multiple read locks.
The transactions were retried, so this isn't causing any failures.
Introduced in #15782.
* Fix the method signature of `run_db_interaction` on the module API
* Newsfile
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
---------
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
Misc. clean-ups to:
* Use keyword arguments.
* Return early (reducing indentation) of some functions.
* Removing duplicated / unused code.
* Use wrap_as_background_process.
* Add a module API function to provide `call_later`
* Newsfile
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
* Add comments
* Update version number
---------
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
* Add a cache invalidation clean-up task
* Run the cache invalidation stream clean-up on the background worker
* Tune down
* call_later is in millis!
* Newsfile
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
* fixup! Add a cache invalidation clean-up task
* Update synapse/storage/databases/main/cache.py
Co-authored-by: Eric Eastwood <erice@element.io>
* Update synapse/storage/databases/main/cache.py
Co-authored-by: Eric Eastwood <erice@element.io>
* MILLISEC -> MS
* Expand on comment
* Move and tweak comment about Postgres
* Use `wrap_as_background_process`
---------
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
Co-authored-by: Eric Eastwood <erice@element.io>
For now this maintains compatibility with old Synapses by falling back
to using transaction semantics on a per-access-token basis. A future version
of Synapse will drop support for this.
Adds three new configuration variables:
* destination_min_retry_interval is identical to before (10 minutes).
* destination_retry_multiplier is now 2 instead of 5, so the maximum value will
be reached more slowly.
* destination_max_retry_interval is one day instead of (essentially) infinity.
Capping this will cause destinations to continue to be retried sometimes instead
of being lost forever. The previous value was 2 ^ 62 milliseconds.
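A minimal sketch of how the three values interact, assuming a simple multiply-then-cap backoff. The constant and function names are hypothetical, and Synapse's real retry logic may differ in details such as jitter.

```python
MIN_RETRY_INTERVAL_MS = 10 * 60 * 1000       # 10 minutes
RETRY_MULTIPLIER = 2
MAX_RETRY_INTERVAL_MS = 24 * 60 * 60 * 1000  # one day


def next_retry_interval(current_ms: int) -> int:
    """Return the next backoff interval in milliseconds."""
    if current_ms <= 0:
        # First failure: start from the minimum interval.
        return MIN_RETRY_INTERVAL_MS
    # Grow geometrically, but never beyond the cap.
    return min(current_ms * RETRY_MULTIPLIER, MAX_RETRY_INTERVAL_MS)
```

Because the interval is capped, a destination that stays down keeps being retried at the maximum interval rather than drifting towards an effectively infinite wait.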
The location of the redacts field changes in room version 11. Ensure
it is copied to the *new* location for *old* room versions for
forwards-compatibility with clients.
Note that copying it to the *old* location for the *new* room version
was previously handled.
* Updates the rule ID.
* Use `event_property_is` instead of `event_match`.
This updates the implementation of MSC3958 to match the latest
text from the MSC.
The un_partial_stated_event_stream_sequence and
application_services_txn_id_seq were never properly configured
in the portdb script, resulting in an error on start-up.
Signed-off-by: Nicolas Werner <n.werner@famedly.com>
Co-authored-by: Nicolas Werner <n.werner@famedly.com>
Co-authored-by: Nicolas Werner <89468146+nico-famedly@users.noreply.github.com>
Co-authored-by: Hubert Chathi <hubert@uhoreg.ca>
And fix a bug in the implementation of the updated redaction
format (MSC2174) where the top-level redacts field was not
properly added for backwards-compatibility.
* Revert "Stop writing to column `user_id` of tables `profiles` and `user_filters` (#15787)"
This reverts commit f25b0f88081bb436bef914983cff7087b54eba5f.
* newsfragment
Allow configuring the set of workers to proxy outbound federation traffic through (`outbound_federation_restricted_to`).
This is useful when you have a worker setup with `federation_sender` instances responsible for sending outbound federation requests and want to make sure *all* outbound federation traffic goes through those instances. Before this change, the generic workers would still contact federation themselves for things like profile lookups, backfill, etc. This PR allows you to set more strict access controls/firewall for all workers and only allow the `federation_sender`'s to contact the outside world.
Make it more obvious which Python version runs on a given Linux distribution so when we end up dropping support for a given Python version, we can more easily find the reference to the Python version and remove any references for the distribution. We don't want to be running tests or building packages on a distribution that no longer has a supported Python version.
This way, we can avoid another situation like when we dropped support for Python 3.7 but forgot to drop the Debian Buster references everywhere (https://github.com/matrix-org/synapse/pull/15893)
Previously, if you just followed the instructions per the docs, you just ran into an error:
```sh
$ poetry run synapse_worker --config-path homeserver_generic_worker1.yaml
Missing mandatory `server_name` config option.
```
**Before:**
```
Error retrieving alias
```
**After:**
```
Error retrieving alias #foo:bar -> 401 Unauthorized
```
*Spawning from creating the [manual testing strategy for the outbound federation proxy](https://github.com/matrix-org/synapse/pull/15773).*
Unix socket support for `federation` and `client` Listeners has existed now for a little while (since [1.81.0](https://github.com/matrix-org/synapse/pull/15353)), but there was one last hold out before it could be complete: HTTP Replication communication. This should finish it up. The Listeners would have always worked, but would have had no way to be talked to/at.
---------
Co-authored-by: Eric Eastwood <madlittlemods@gmail.com>
Co-authored-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
Co-authored-by: Eric Eastwood <erice@element.io>
A lot of the functions have the same name in this space like `store_file`,
and we also do it multiple times for different reasons (main media repo,
other storage providers, thumbnails, etc) so it's good to differentiate
them so your head doesn't explode.
Follow-up to https://github.com/matrix-org/synapse/pull/15850
Tracing instrumentation to media `/upload` code paths to investigate https://github.com/matrix-org/synapse/issues/15841
Allow configuring the set of workers to proxy outbound federation traffic through (`outbound_federation_restricted_to`).
This is useful when you have a worker setup with `federation_sender` instances responsible for sending outbound federation requests and want to make sure *all* outbound federation traffic goes through those instances. Before this change, the generic workers would still contact federation themselves for things like profile lookups, backfill, etc. This PR allows you to set more strict access controls/firewall for all workers and only allow the `federation_sender`'s to contact the outside world.
The original code is from @erikjohnston's branches which I've gotten in-shape to merge.
Image.ANTIALIAS is not defined in current pillow releases. Since ANTIALIAS was just using LANCZOS anyways, this is just a cosmetic change, but makes synapse work with most recent pillow releases.
Signed-off-by: Giovanni Harting <539@idlegandalf.com>
Old device entries for the same user were being removed in individual
SQL commands, making the batch take way longer than necessary.
This combines the commands into a single one with an IN/ANY clause.
Example of log entry before the change, regularly observed with
"log_min_duration_statement = 10000" in PostgreSQL's config:
LOG: duration: 42538.282 ms statement:
DELETE FROM device_lists_stream
WHERE user_id = '@someone' AND device_id = 'someid1'
AND stream_id < 123456789
;
DELETE FROM device_lists_stream
WHERE user_id = '@someone' AND device_id = 'someid2'
AND stream_id < 123456789
;
[repeated for each device ID of that user, potentially a lot...]
With the patch applied on my instance for the past couple of days, I
no longer notice overly long statements of that particular kind.
Signed-off-by: pacien <pacien.trangirard@pacien.net>
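The batching idea can be sketched with the standard-library `sqlite3` module. The table and column names are taken from the log excerpt above; the `delete_old_device_entries` function is a hypothetical stand-in, not the patched Synapse code.

```python
import sqlite3
from typing import Sequence


def delete_old_device_entries(
    conn: sqlite3.Connection, user_id: str, device_ids: Sequence[str], stream_id: int
) -> int:
    """Delete old entries for many devices of one user in a single statement."""
    if not device_ids:
        return 0
    placeholders = ", ".join("?" for _ in device_ids)
    cur = conn.execute(
        "DELETE FROM device_lists_stream "
        f"WHERE user_id = ? AND device_id IN ({placeholders}) AND stream_id < ?",
        [user_id, *device_ids, stream_id],
    )
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE device_lists_stream (user_id TEXT, device_id TEXT, stream_id INTEGER)"
)
conn.executemany(
    "INSERT INTO device_lists_stream VALUES (?, ?, ?)",
    [("@someone", "someid1", 5), ("@someone", "someid2", 6), ("@someone", "someid3", 200)],
)
deleted = delete_old_device_entries(
    conn, "@someone", ["someid1", "someid2", "someid3"], 100
)
```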
* Fix use of config override directory in `devenv up`
`--config-directory` is for the generate config script; `-c` is for usage
* Add homeserver config override directory to gitignore
* Newsfile
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
---------
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
If you leave a room and forget it, then rejoin it, the room would be
missing from the next initial sync.
Fixes #13262
Signed-off-by: Nicolas Werner <n.werner@famedly.com>
The port DB script would try and run database background tasks, which
could fail if the data they acted on was in the process of being ported.
These exceptions were non-fatal.
Fixes #15789
We now only block the client to backfill when we see a large gap in the events (more than 2 events missing in a row according to `depth`), more than 3 single-event holes, or not enough messages to fill the response. Otherwise, we return the messages directly to the client and backfill in the background for eventual consistency's sake.
Fix https://github.com/matrix-org/synapse/issues/15696
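The blocking conditions described above can be sketched as a small predicate over event depths. This is a literal reading of the description, not the actual handler code: the function name and thresholds-as-constants are hypothetical, and how a run of exactly 2 missing events is treated is an assumption (here it counts as neither a large gap nor a single-event hole).

```python
from typing import List

LARGE_GAP_MISSING = 2     # more than this many missing in a row is a "large gap"
MAX_SINGLE_HOLES = 3      # more than this many single-event holes triggers a block


def should_block_on_backfill(depths: List[int], limit: int) -> bool:
    """Decide whether to backfill before responding (illustrative sketch)."""
    # Not enough messages to fill the response.
    if len(depths) < limit:
        return True
    ordered = sorted(depths, reverse=True)
    single_holes = 0
    for newer, older in zip(ordered, ordered[1:]):
        missing = newer - older - 1
        if missing > LARGE_GAP_MISSING:
            return True
        if missing == 1:
            single_holes += 1
    return single_holes > MAX_SINGLE_HOLES
```

When the predicate is false, the response can be returned immediately and any backfill left to run in the background.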
* Check required power levels earlier in createRoom handler.
- If a server was configured to reject the creation of rooms with E2EE
enabled (by specifying an unattainably high power level for
"m.room.encryption" in default_power_level_content_override), the 403
error was not being triggered until after the room was created and
before the "m.room.power_levels" was sent. This allowed a user to
access the partially-configured room and complete the setup of E2EE
and power levels manually.
- This change causes the power level overrides to be checked earlier and
the request to be rejected before the user gains access to the room.
- A new `_validate_room_config` method is added to contain checks that
should be run before a room is created.
- The new test case confirms that a user request is rejected by the new
validation method.
Signed-off-by: Grant McLean <grant@catalyst.net.nz>
* Add a changelog file.
* Formatting fix for black.
* Remove unneeded line from test.
---------
Signed-off-by: Grant McLean <grant@catalyst.net.nz>
There appears to be a race where you can end up with entries in
`event_push_summary` with both a `NULL` and `main` thread ID.
Fixes #15736
Introduced in #15597
See https://github.com/matrix-org/synapse/pull/14095#discussion_r990335492
This is useful because when we see that a relevant event is an `outlier` or `soft-failed`, that's a good indicator explaining why it's unexpectedly not showing up. `filter_events_for_client` is used in `/sync`, `/messages`, `/context` which are all common end-to-end assertion touch points (also notifications, relations).
Implements stable support for MSC3882; this involves updating Synapse's support to
match what the MSC / the spec says.
Continue to support the unstable version to allow clients to transition.
Application services providing a "user" property (instead of "username") for
the /register endpoint was never specified. Deprecate this very old
fallback.
Fix https://github.com/matrix-org/synapse/issues/15662
This manifests as purple lines that show up on all time series panels
that you can hover and see what version was deployed.
Also added a new "Deployed Synapse versions over time" panel
where the color block changes with each version. And mixed this
color block into the "Up" time series panel.
To get the Grafana dashboard JSON to copy here: use the **Share** icon at the top -> **Export** -> check the **Export for sharing externally** option -> **View JSON** or **Save to file**
The stubs have some issues so this has some generous casts
and ignores in it, but it is better than not having stubs.
Note that, confusingly, Element is a function which creates
_Element instances (and similarly for Comment).
* Fully qualified docker image names for the main Dockerfile and Complement related.
* Fully qualified docker image names for Dockerfiles associated with building Debian release artifacts.
This one is harder and is separate from the other commit in case it wasn't correct or was unwanted. I decided to
do the expansion on the docker images in the Dockerfile itself, instead of the various source places that build
which distribution that is selected, as it would have been more invasive with the scripts breaking up the string
for tagging and such. This one is untested.
* Changelog
* Update docker/Dockerfile-workers
* Update docker/complement/Dockerfile
---------
Co-authored-by: reivilibre <olivier@librepush.net>
Fix #15667
- Reiterate the importance of getting Rust installed and set up before attempting to install the Python dependencies.
- Mention the importance of confirming that `poetry install` completed successfully and include a typical error that the user might see if it did not.
- Expand on "Now edit homeserver.yaml" to give examples of things likely to need changing and to link to the relevant sections of the Synapse server documentation.
Updates the database schema to require a thread_id (by adding a
constraint that the column is non-null) for event_push_actions,
event_push_actions_staging, and event_push_actions_summary.
For PostgreSQL we add the constraint as NOT VALID, then
VALIDATE the constraint in a background job to avoid locking
the table during an upgrade.
Each table is updated as a separate schema delta to avoid
deadlocks between them.
For SQLite we simply rebuild the table & copy the data.
* Fix #15669: always populate instance map even if it was empty
* Fix some tests
* Fix more tests
* Newsfile
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
* CI fix: don't forget to update apt repository sources before installing olddeps deps
* Add test testing the backwards compatibility
---------
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
The cached decorators always return a Deferred, which was not
properly propagated. It was close enough when wrapping coroutines,
but failed if a bare function was wrapped.
```
2023-05-21 09:30:09,288 - synapse.logging.opentracing - 940 - ERROR - POST-1 - @trace may not have wrapped StateStorageController.get_state_for_groups correctly! The function is not async but returned a coroutine
```
Tracing instrumentation for these functions originally introduced in https://github.com/matrix-org/synapse/pull/15610
This moves the deactivated user check to the method which
all login types call.
Additionally updates the application service tests to be more
realistic by removing invalid tests and fixing server names.
All the information needed is already in the `instance_map`, so
use that instead of passing the hostname / IP & port manually
for each replication request.
This consolidates logic for future improvements of using e.g.
UNIX sockets for workers.
Fix https://github.com/matrix-org/synapse/issues/15618
### Before
```
2023-05-17 22:51:36-0500 [-] 2023-05-17 22:51:36,889 - synapse.server - 338 - INFO - sentinel - Finished setting up.
```
### After
```
2023-05-19 18:16:20-0500 [-] synapse.server - 338 - INFO - sentinel - Finished setting up.
```
### Dev notes
The `Twisted.Logger` controls the `2023-05-19 18:16:20-0500 [-]` prefix, see : [`twisted/twisted` -> `src/twisted/logger/_format.py#L362-L374`](34b161e66b/src/twisted/logger/_format.py (L362-L374))
And we delegate our logs to the Twisted Logger for the tests which puts it in `_trial_temp/test.log`
The event_fields property in filters should use the proper
escape rules, namely backslashes can be escaped with
an additional backslash.
This adds tests (adapted from matrix-js-sdk) and implements
the logic to properly split the event_fields strings.
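A minimal sketch of splitting a dotted field path with backslash escapes. The `split_field` name is hypothetical, and leaving a backslash untouched when it precedes any character other than a dot or another backslash is an assumption of this sketch rather than something stated above.

```python
from typing import List


def split_field(field: str) -> List[str]:
    """Split on unescaped dots; a backslash escapes a dot or another backslash."""
    parts: List[str] = []
    current: List[str] = []
    i = 0
    while i < len(field):
        ch = field[i]
        if ch == "\\" and i + 1 < len(field) and field[i + 1] in ".\\":
            # Escaped dot or backslash: keep the escaped character literally.
            current.append(field[i + 1])
            i += 2
        elif ch == ".":
            # Unescaped dot: end of the current path component.
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    return parts
```

So `content.m\.relates_to` splits into `content` and `m.relates_to`, while a doubled backslash before a dot yields a literal backslash followed by a real separator.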
...to try to control memory usage. `HomeServerConfig`s hold on to
many Jinja2 objects, which come out to over 0.5 MiB per config.
Over the course of a full test run, the cache grows to ~360 entries.
Limit it to 8 entries.
Part of #15622.
Signed-off-by: Sean Quah <seanq@matrix.org>
Instrument `state` and `state_group` storage related things (tracing) so it's a little more clear where these database transactions are coming from as there is a lot of wires crossing in these functions.
Part of `/messages` performance investigation: https://github.com/matrix-org/synapse/issues/13356
R30v2 has been out since 2021-07-19 (https://github.com/matrix-org/synapse/pull/10332)
and we started collecting stats on 2021-08-16. Since it's been over a year now
(almost 2 years), this is enough grace period for us to now rip it out.
Synapse will no longer send (or respond to) the unstable flags
for faster joins. These were only available behind a configuration
flag and handled in parallel with the stable flags.
This change fixes two memory leaks during `trial` test runs.
Garbage collection is disabled during each test case and a gen-0 GC is
run at the end of each test. However, when the gen-0 GC is run, the
`TestCase` object usually still holds references to the `HomeServer`
used during the test. As a result, the `HomeServer` gets promoted to
gen-1 and then never garbage collected.
Fix this by periodically running full GCs.
Additionally, fix `HomeServer`s leaking after tests that touch inbound
federation due to `FederationRateLimiter`s adding themselves to a global
set, by turning the set into a `WeakSet`.
Resolves #15622.
Signed-off-by: Sean Quah <seanq@matrix.org>
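The `WeakSet` part of the fix above can be illustrated with a small sketch. `RateLimiter` here is a hypothetical stand-in class, not Synapse's `FederationRateLimiter`; the point is only that a global `WeakSet` does not keep its members alive.

```python
import gc
import weakref


class RateLimiter:
    """Stand-in for an object that registers itself in a global collection."""


# Unlike a plain set, a WeakSet holds only weak references to its members.
_all_limiters = weakref.WeakSet()


def make_limiter() -> RateLimiter:
    limiter = RateLimiter()
    _all_limiters.add(limiter)
    return limiter


limiter = make_limiter()
count_while_alive = len(_all_limiters)

# Drop the only strong reference, then collect.
del limiter
gc.collect()
count_after_release = len(_all_limiters)
```

With a plain `set`, the second count would stay at one and everything reachable from the limiter would be kept alive for the rest of the test run.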
If the previous read marker is pointing to an event that no longer exists
(e.g. due to retention) then assume that the newly given read marker
is newer.
To track changes in MSC2666:
- The change from `/mutual_rooms/{user_id}` to `/mutual_rooms?user_id={user_id}`.
- The addition of `next_batch_token` (and logic).
- Unstable flag now being `uk.half-shot.msc2666.query_mutual_rooms`.
- The error code when your own user is requested.
The second argument of `ConfigError` is a path, passed as an optional
`Iterable[str]` and not a `str`. If a string is passed directly,
Synapse unhelpfully emits "Error in configuration at
a.p.p._.s.e.r.v.i.c.e._.c.o.n.f.i.g._.f.i.l.e.s'" when the config
option has the wrong data type.
Signed-off-by: Sean Quah <seanq@matrix.org>
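The odd error text quoted above follows from a string being an iterable of its own characters. A short sketch, where `format_config_path` is a hypothetical helper that joins path components the way an error message might:

```python
from typing import Iterable, Optional


def format_config_path(path: Optional[Iterable[str]]) -> str:
    """Join path components with dots (hypothetical helper)."""
    return ".".join(path) if path else ""


# A bare string is iterated character by character, so every character
# is treated as a separate path component.
wrong = format_config_path("app_service_config_files")

# Wrapping the name in a tuple gives a single component, as intended.
right = format_config_path(("app_service_config_files",))
```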
There are two situations which were previously not properly checked:
1. If the requested URL was replaced with an oEmbed URL, then the
oEmbed URL was not checked against url_preview_url_blacklist.
2. Follow-up URLs (either via autodiscovery of oEmbed or to pre-cache
images) were not checked against url_preview_url_blacklist.
We use the oldest Python version because later Python versions can include some overloads which don't work in the older versions which we still support.
We're using Python 3.8 instead of 3.7 (our actual minimum supported version) because 3.7 reaches EOL in a matter of weeks, so we can avoid the extra effort. And in any case, minimum Python 3.8 support is better than winging it on Python 3.11.
* Usage that is compatible with Python 3.8 and 3.11
> Since Python 3.10, instead of passing value and tb, an exception object can
be passed as the first argument. If value and tb are provided, the first
argument is ignored in order to provide backwards compatibility.
>
> -- https://docs.python.org/3/library/traceback.html
* Add changelog
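A minimal sketch of the version-compatible call described in the quote above. `format_error` is a hypothetical helper name; it uses the three-argument form of `traceback.format_exception`.

```python
import traceback


def format_error(exc: BaseException) -> str:
    # The three-argument form works on Python 3.8 through 3.11. Per the quoted
    # docs, from 3.10 the first argument is ignored when value and tb are given.
    return "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))


try:
    raise ValueError("boom")
except ValueError as e:
    formatted = format_error(e)
```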
Fix the following `mypy` errors when running `mypy` with Python 3.7:
```
synapse/storage/controllers/stats.py:58: error: "Counter" is not subscriptable, use "typing.Counter" instead [misc]
tests/test_state.py:267: error: "dict" is not subscriptable, use "typing.Dict" instead [misc]
```
Part of https://github.com/matrix-org/synapse/issues/15603
In Python 3.9, the `typing` aliases are deprecated and the builtin types are subscriptable (generics) by default, https://peps.python.org/pep-0585/#implementation
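For illustration, a small sketch of annotations that type-check on the older Python versions mentioned above. The variable names are made up for this example.

```python
import collections
from typing import Counter, Dict

# `typing.Counter` and `typing.Dict` can be subscripted on Python versions
# before 3.9, where `collections.Counter[str]` and `dict[str, int]` cannot.
word_counts: Counter[str] = collections.Counter()
sizes: Dict[str, int] = {}

word_counts.update(["a", "b", "a"])
sizes["a"] = word_counts["a"]
```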
* Usage that is compatible with Python 3.8 and 3.11
> Since Python 3.10, instead of passing value and tb, an exception object can
be passed as the first argument. If value and tb are provided, the first
argument is ignored in order to provide backwards compatibility.
>
> -- https://docs.python.org/3/library/traceback.html
* Add changelog
Fix:
```
tests/test_state.py:267: error: "dict" is not subscriptable, use "typing.Dict" instead [misc]
```
In Python 3.9, the `typing` aliases are deprecated and the builtin types are subscriptable (generics) by default,
https://peps.python.org/pep-0585/#implementation
MSC3389 proposes protecting the relation type & parent event ID
from redaction. This keeps the relation information intact after
redaction which helps with some UX flaws (e.g. deleting an
event causes it to no longer be in a thread, which is confusing).
Adds logging for key server requests which include a key ID.
This is technically in violation of the 1.6 spec, but is the only
way to remain backwards compatible with earlier versions of
Synapse (and possibly other homeservers) which *did* include
the key ID.
I found the error in the **Before** really vague and obtuse and didn't realize port `5432` corresponded to the Postgres port until searching the codebase. It says to check the logs but that wasn't my first instinct. It's just more obvious if we just print the full thing which gives context of the error type and the traceback to the relevant area of code.
#### Before
```
$ poetry run python -m synapse.app.homeserver -c homeserver.yaml
**********************************************************************************
Error during initialisation:
connection to server at "localhost" (::1), port 5432 failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?
connection to server at "localhost" (127.0.0.1), port 5432 failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?
There may be more information in the logs.
**********************************************************************************
```
#### After
```sh
$ poetry run python -m synapse.app.homeserver -c homeserver.yaml
**********************************************************************************
Error during initialisation:
Traceback (most recent call last):
  File "/home/eric/Documents/github/element/synapse/synapse/app/homeserver.py", line 352, in setup
    hs.setup()
  File "/home/eric/Documents/github/element/synapse/synapse/server.py", line 337, in setup
    self.datastores = Databases(self.DATASTORE_CLASS, self)
  File "/home/eric/Documents/github/element/synapse/synapse/storage/databases/__init__.py", line 65, in __init__
    with make_conn(database_config, engine, "startup") as db_conn:
  File "/home/eric/Documents/github/element/synapse/synapse/storage/database.py", line 161, in make_conn
    native_db_conn = engine.module.connect(**db_params)
  File "/home/eric/.cache/pypoetry/virtualenvs/matrix-synapse-xCtC9ulO-py3.10/lib/python3.10/site-packages/psycopg2/__init__.py", line 122, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: connection to server at "localhost" (::1), port 5432 failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?
connection to server at "localhost" (127.0.0.1), port 5432 failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?
There may be more information in the logs.
**********************************************************************************
```
* Add SSL options to redis config
* fix lint issues
* Add documentation and changelog file
* add missing . at the end of the changelog
* Move client context factory to new file
* Rename ssl to tls and fix typo
* fix lint issues
* Added when redis attributes were added
* Add master to the instance_map as part of Complement, have ReplicationEndpoint look at instance_map for master.
* Fix typo in drive by.
* Remove unnecessary worker_replication_* bits from unit tests and add master to instance_map(hopefully in the right place)
* Several updates:
1. Switch from master to main for naming the main process in the instance_map. Add useful constants for easier adjustment of names in the future.
2. Add backwards compatibility for worker_replication_* to allow time to transition to new style. Make sure to prioritize declaring main directly on the instance_map.
3. Clean up old comments/commented out code.
4. Adjust unit tests to match with new code.
5. Adjust Complement setup infrastructure to only add main to the instance_map if workers are used and remove now unused options from the worker.yaml template.
* Initial Docs upload
* Changelog
* Missed some commented out code that can go now
* Remove TODO comment that no longer holds true.
* Fix links in docs
* More docs
* Remove debug logging
* Apply suggestions from code review
Co-authored-by: reivilibre <olivier@librepush.net>
* Apply suggestions from code review
Co-authored-by: reivilibre <olivier@librepush.net>
* Update version to latest, include completeish before/after examples in upgrade notes.
* Fix up and docs too
---------
Co-authored-by: reivilibre <olivier@librepush.net>
Separate out a HTTP client for replication in preparation for
also supporting using UNIX sockets. The major difference from
the base class is that this does not use treq to handle HTTP
requests.
This stops media (and thumbnails) from being accessed from the
listed domains. It does not delete any already locally cached media,
but will prevent accessing it.
Note that admin APIs are unaffected by this change.
m.push_rules, like m.fully_read, is a special account data type that cannot
be set using the normal /account_data endpoint. Return an error instead
of allowing data that will not be used to be stored.
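A minimal sketch of the check described above, assuming a simple set of reserved types. `AccountDataError` and `validate_account_data_type` are hypothetical names, not Synapse's own.

```python
# Account data types that have their own dedicated APIs (per the text above).
RESERVED_ACCOUNT_DATA_TYPES = frozenset({"m.fully_read", "m.push_rules"})


class AccountDataError(Exception):
    """Hypothetical error returned to the client for a rejected request."""


def validate_account_data_type(account_data_type: str) -> None:
    # Reject types that would be stored but never used.
    if account_data_type in RESERVED_ACCOUNT_DATA_TYPES:
        raise AccountDataError(
            f"Cannot set {account_data_type} through this API."
        )
```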
MSC3984 proxies /keys/query requests to appservices, but servers
can also request devices / keys from the /user/devices endpoint.
The formats are close enough that we can "proxy" that /user/devices to
appservices (by calling /keys/query) and then change the format of the
returned data before returning it over federation.
Behind a configuration flag this adds + to the list of allowed
characters in Matrix IDs. The main feature this enables is
using full E.164 phone numbers as Matrix IDs.
Add an `is_mine_server_name` method, similar to `is_mine_id`.
Ideally we would use this consistently, instead of sometimes comparing
against `hs.hostname` and other times reaching into
`hs.config.server.server_name`.
Also fix a bug in the tests where `hs.hostname` would sometimes differ
from `hs.config.server.server_name`.
Signed-off-by: Sean Quah <seanq@matrix.org>
A dont_notify action is a no-op (and coalesce is undefined). These are
both considered no-ops by the spec, per MSC3987 and the predefined
push rules were updated to remove dont_notify from the list of actions.
It seems that YouTube Short previews do not work in some
regions, but the oEmbed information for those areas is still
valid.
This causes YouTube Shorts to always use (only) the oEmbed
endpoint which is a minor regression for regions where the URL
preview was already working -- some of the additional video
metadata is lost. It is not likely that clients are using this today
and it is more beneficial to have a limited preview working everywhere
than unused metadata in the Open Graph response.
Enforce that we use index scans (rather than seq scans), which we also do for state queries. The reason to enforce this is that we can't get PostgreSQL to correctly understand that the distribution of `stream_ordering` depends on `highlight`, and so it always defaults (on matrix.org) to sequential scans.
#15514 introduced a regression where Synapse would encounter
`PartialDownloadError`s when fetching OpenID metadata for certain
providers on startup. Due to #8088, this prevents Synapse from starting
entirely.
Revert the change while we decide what to do about the regression.
Updates the database schema to require a thread_id (by adding a
constraint that the column is non-null) for event_push_actions,
event_push_actions_staging, and event_push_actions_summary.
For PostgreSQL we add the constraint as NOT VALID, then
VALIDATE the constraint in a background job to avoid locking
the table during an upgrade.
For SQLite we simply rebuild the table & copy the data.
Pushers tend to make many connections to the same HTTP host
(e.g. a new event comes in, causes events to be pushed, and then
the homeserver connects to the same host many times). Due to this
the per-host HTTP connection pool size was increased, but this does
not make sense for other SimpleHttpClients.
Add a parameter for the connection pool and override it for pushers
(making a separate SimpleHttpClient for pushers with the increased
configuration).
This returns the HTTP connection pool settings to the default Twisted
ones for non-pusher HTTP clients.
Adds an optional keyword argument to the /relations API which
will recurse a limited number of event relationships.
This will cause the API to return not just the events related to the
parent event, but also events related to those related to the parent
event, etc.
This is disabled by default behind an experimental configuration
flag and is currently implemented using prefixed parameters.
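The depth-limited recursion over relations described above could look roughly like the sketch below. It works on a hypothetical in-memory mapping of event ID to directly related event IDs, whereas the real implementation queries the database.

```python
from collections import deque
from typing import Dict, List


def related_events(
    relations: Dict[str, List[str]], parent: str, max_depth: int
) -> List[str]:
    """Breadth-first walk over relations, stopping after max_depth hops."""
    found: List[str] = []
    seen = {parent}
    queue = deque([(parent, 0)])
    while queue:
        event_id, depth = queue.popleft()
        if depth == max_depth:
            continue
        for child in relations.get(event_id, []):
            if child not in seen:
                seen.add(child)
                found.append(child)
                queue.append((child, depth + 1))
    return found


relations = {"$parent": ["$a", "$b"], "$a": ["$c"], "$c": ["$d"]}
direct = related_events(relations, "$parent", 1)
recursed = related_events(relations, "$parent", 2)
```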
MSC3983 provides a way to request multiple OTKs at once from appservices,
this extends this concept to the Client-Server API.
Note that this will likely be split out into a separate MSC, but is currently part of
MSC3983.
Cleans-up the schema delta files:
* Removes no-op functions.
* Adds missing type hints to function parameters.
* Fixes any issues with type hints.
This also renames one (very old) schema delta to avoid a conflict
that mypy complains about.
* Docs: Add Nginx loadbalancing example with sticky mxid for workers
Add example nginx configuration snippet that
* does load balancing for workers
* respects mxid part of the token
* from both url parameter and auth header
* and handles since parameter
Thanks to @olmari for pushing me to write this and testing the configs
Signed-off-by: Tatu Wikman <tatu.wikman@gmail.com>
* Add changelog entry
Signed-off-by: Tatu Wikman <tatu.wikman@gmail.com>
* Update codeblock formatter
Co-authored-by: Dirk Klimpel <5740567+dklimpel@users.noreply.github.com>
* Remove indirectly related nginx-config
Signed-off-by: Sami Olmari <sami@olmari.fi>
* Proper definition of action how to target username for worker
Signed-off-by: Sami Olmari <sami@olmari.fi>
* Change "nginx" to general "reverse proxy" as it's concept now.
Signed-off-by: Sami Olmari <sami@olmari.fi>
* Wording in better English
Co-authored-by: Tatu Wikman <tatu.wikman@gmail.com>
* rename changelog entry to have correct extension
---------
Signed-off-by: Tatu Wikman <tatu.wikman@gmail.com>
Signed-off-by: Sami Olmari <sami@olmari.fi>
Co-authored-by: Dirk Klimpel <5740567+dklimpel@users.noreply.github.com>
Co-authored-by: Sami Olmari <sami@olmari.fi>
Co-authored-by: Sami Olmari <sami+github@olmari.fi>
It can be useful to always return the fallback key when attempting to
claim keys. This adds an unstable endpoint for `/keys/claim` which
always returns fallback keys in addition to one-time-keys.
The fallback key(s) are not marked as "used" unless there are no
corresponding OTKs.
This is currently defined in MSC3983 (although likely to be split out
to a separate MSC). The endpoint shape may change or be requested
differently (i.e. a keyword parameter on the current endpoint), but the
core logic should be reasonable.
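The core logic above (always return the fallback key, but only mark it used when no OTK is available) can be sketched as follows. `DeviceKeys` and `claim` are hypothetical, in-memory stand-ins for the real storage layer.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DeviceKeys:
    """Hypothetical key store for one device and one algorithm."""

    one_time_keys: List[str] = field(default_factory=list)
    fallback_key: Optional[str] = None
    fallback_used: bool = False


def claim(keys: DeviceKeys) -> Dict[str, Optional[str]]:
    otk = keys.one_time_keys.pop(0) if keys.one_time_keys else None
    # Only mark the fallback as used when it is the sole key handed out.
    if otk is None and keys.fallback_key is not None:
        keys.fallback_used = True
    return {"one_time_key": otk, "fallback_key": keys.fallback_key}


keys = DeviceKeys(one_time_keys=["otk1"], fallback_key="fallback1")
first = claim(keys)
used_after_first = keys.fallback_used
second = claim(keys)
used_after_second = keys.fallback_used
```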
Before this change:
* `PerspectivesKeyFetcher` and `ServerKeyFetcher` write to `server_keys_json`.
* `PerspectivesKeyFetcher` also writes to `server_signature_keys`.
* `StoreKeyFetcher` reads from `server_signature_keys`.
After this change:
* `PerspectivesKeyFetcher` and `ServerKeyFetcher` write to `server_keys_json`.
* `PerspectivesKeyFetcher` also writes to `server_signature_keys`.
* `StoreKeyFetcher` reads from `server_keys_json`.
This results in `StoreKeyFetcher` now using the results from `ServerKeyFetcher`
in addition to those from `PerspectivesKeyFetcher`, i.e. keys which are directly
fetched from a server will now be pulled from the database instead of refetched.
An additional minor change is included to avoid creating a `PerspectivesKeyFetcher`
(and checking it) if no `trusted_key_servers` are configured.
The overall impact of this should be better usage of cached results:
* If a server has no trusted key servers configured then it should reduce how often keys
are fetched.
* if a server's trusted key server does not have a requested server's keys cached then it
should reduce how often keys are directly fetched.
These two lines:
```
config_obj = HomeServerConfig()
config_obj.parse_config_dict(config, "", "")
```
are called many times with the exact same value for `config`.
As the test suite is CPU-bound and a non-negligible amount of time is spent in
`parse_config_dict`, this saves ~5% on the overall runtime of the Trial
test suite (tested with both `-j2` and `-j12` on a 12t CPU).
This is sadly rather limited, as the cache cannot be shared between
processes (it contains at least jinja2.Template and RLock objects which
aren't pickleable), and Trial tends to run close tests in different
processes.
* Switch InstanceLocationConfig to a pydantic BaseModel, apply Strict* types and add a few helper methods(that will make more sense in follow up work).
Co-authored-by: David Robertson <davidr@element.io>
* More precise type for LoggingTransaction.execute
* Add an annotation for stream_ordering_month_ago
This would have spotted the error that was fixed in "Add comma missing from #15382. (#15429)"
c.f. #15264
The two changes are:
1. Add indexes so that the select / deletes don't do sequential scans
2. Don't repeatedly call `SELECT count(*)` each iteration, as that's slow
The registration fallback is broken and unspecced. This removes it
since there is no plan to spec it.
Note that this does not modify the login fallback code.
* Change `store_server_verify_keys` to take a `Mapping[(str, str), FKR]`
This is because we already can't handle duplicate keys — leads to cardinality violation
* Newsfile
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
---------
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
This moves `redacts` from being a top-level property to
a `content` property in a new room version.
MSC2176 (which was previously implemented) states to not
`redact` this property.
* raise a ConfigError on an invalid app_service_config_files
* changelog
* Move config check to read_config
* Add test
* Ensure list also contains strings
* Trust dtolnay/rust-toolchain
The author is a big deal in the Rust world and I'm happy to trust them.
I'm also bored of the dependabot updates tbh.
* Changelog
This change fixes a rare bug where initial /syncs would fail with a
`KeyError` under the following circumstances:
1. A user fast joins a remote room.
2. The user is kicked from the room before the room's full state has
been synced.
3. A second local user fast joins the room.
4. Events are backfilled into the room with a higher topological
ordering than the original user's leave. They are assigned a
negative stream ordering. It's not clear how backfill happened here,
since it is expected to be equivalent to syncing the full state.
5. The second local user leaves the room before the room's full state
has been synced. The homeserver does not complete the sync.
6. The original user performs an initial /sync with lazy_load_members
enabled.
* Because they were kicked from the room, the room is included in
the /sync response even though the include_leave option is not
specified.
* To populate the room's timeline, `_load_filtered_recents` /
`get_recent_events_for_room` fetches events with a lower stream
ordering than the leave event and picks the ones with the highest
topological orderings (which are most recent). This captures the
backfilled events after the leave, since they have a negative
stream ordering. These events are filtered out of the timeline,
since the user was not in the room at the time and cannot view
them. The sync code ends up with an empty timeline for the room
that notably does not include the user's leave event.
This seems buggy, but at least we don't disclose events the user
isn't allowed to see.
* Normally, `compute_state_delta` would fetch the state at the
start and end of the room's timeline to generate the sync
response. Since the timeline is empty, it fetches the state at
`min(now, last event in the room)`, which corresponds with the
second user's leave. The state during the entirety of the second
user's membership does not include the membership for the first
user because of partial state.
This part is also questionable, since we are fetching state from
outside the bounds of the user's membership.
* `compute_state_delta` then tries and fails to find the user's
membership in the auth events of timeline events. Because there
is no timeline event whose auth events are expected to contain
the user's membership, a `KeyError` is raised.
Also contains a drive-by fix for a separate unlikely race condition.
Signed-off-by: Sean Quah <seanq@matrix.org>
This uses the specced /_matrix/app/v1/... paths instead of the
"legacy" paths. If the homeserver receives an error it will retry
using the legacy path.
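The retry behaviour above can be sketched as below. `RequestError`, `query_appservice` and the injected `send` callable are hypothetical, and the sketch assumes the legacy path is the same endpoint without the `/_matrix/app/v1` prefix.

```python
from typing import Callable, List


class RequestError(Exception):
    """Hypothetical error raised when the appservice rejects a path."""


def query_appservice(send: Callable[[str], str], endpoint: str) -> str:
    try:
        return send(f"/_matrix/app/v1{endpoint}")
    except RequestError:
        # Assumption: the legacy path is the endpoint without the prefix.
        return send(endpoint)


calls: List[str] = []


def fake_send(path: str) -> str:
    """Pretend to be an appservice that only knows the legacy paths."""
    calls.append(path)
    if path.startswith("/_matrix/app/v1"):
        raise RequestError("unrecognised path")
    return "ok"


result = query_appservice(fake_send, "/users/@alice:example.org")
```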
* Add IReactorUNIX to ISynapseReactor type hint.
* Create listen_unix().
Two options, 'path' to the file and 'mode' of permissions(not umask, recommend 666 as default as
nginx/other reverse proxies write to it and it's setup as user www-data)
For the moment, leave the option to always create a PID lockfile turned on by default
* Create UnixListenerConfig and wire it up.
Rename ListenerConfig to TCPListenerConfig, then Union them together into ListenerConfig.
This spidered around a bit, but I think I got it all. Metrics and manhole have been placed
behind a conditional in case of accidentally putting them onto a unix socket.
Use new helpers to get if a listener is configured for TLS, and to help create a site tag
for logging.
There are 2 TODO things in parse_listener_def() to finish up at a later point.
* Refactor SynapseRequest to handle logging correctly when using a unix socket.
This prevents an exception when an IP address can not be retrieved for a request.
* Make the 'Synapse now listening on Unix socket' log line a little prettier.
* No silent failures on generic workers when trying to use a unix socket with metrics or manhole.
* Inline variables in app/_base.py
* Update docstring for listen_unix() to remove reference to a hardcoded permission of 0o666 and add a few comments saying where the default IS declared.
* Disallow both a unix socket and a ip/port combo on the same listener resource
* Linting
* Changelog
* review: simplify how listen_unix returns(and get rid of a type: ignore)
* review: fix typo from ConfigError in app/homeserver.py
* review: roll conditional for http_options.tag into get_site_tag() helper(and add docstring)
* review: enhance the conditionals for checking if a port or path is valid, remove a TODO line
* review: Try updating comment in get_client_ip_if_available to clarify what is being retrieved and why
* Pretty up how 'Synapse now listening on Unix Socket' looks by decoding the byte string.
* review: In parse_listener_def(), raise ConfigError if neither socket_path nor port is declared(and fix a typo)
* Revert "Fix registering a device on an account with lots of devices (#15348)"
This reverts commit f0d8f66eaaacfa75bed65bc5d0c602fbc5339c85.
* Revert "Delete stale non-e2e devices for users, take 3 (#15183)"
This reverts commit 78cdb72cd6b0e007c314d9fed9f629dfc5b937a6.
Clean-up from adding the thread_id column, which was initially
null but backfilled with values. It is desirable to require it to now
be non-null.
In addition to altering this column to be non-null, we clean up
obsolete background jobs, indexes, and just-in-time updating
code.
If enabled, for users which are exclusively owned by an application
service then the appservice will be queried for devices in addition
to any information stored in the Synapse database.
Previously, we would spin in a tight loop until
`update_state_for_partial_state_event` stopped raising
`FederationPullAttemptBackoffError`s. Replace the spinloop with a wait
until the backoff period has expired.
Signed-off-by: Sean Quah <seanq@matrix.org>
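A minimal sketch of replacing a spinloop with a wait, as described above. `BackoffError` and `run_with_backoff` are hypothetical, and the sleep function is injected so the sketch stays synchronous and testable; the real code runs on Twisted and differs in detail.

```python
from typing import Callable, List, Union


class BackoffError(Exception):
    """Hypothetical stand-in carrying the remaining backoff in milliseconds."""

    def __init__(self, retry_after_ms: int):
        super().__init__(f"retry after {retry_after_ms}ms")
        self.retry_after_ms = retry_after_ms


def run_with_backoff(
    attempt: Callable[[], str], sleep: Callable[[float], None]
) -> str:
    while True:
        try:
            return attempt()
        except BackoffError as e:
            # Wait out the backoff instead of retrying immediately.
            sleep(e.retry_after_ms / 1000)


slept: List[float] = []
outcomes: List[Union[Exception, str]] = [BackoffError(1500), "done"]


def attempt() -> str:
    outcome = outcomes.pop(0)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


result = run_with_backoff(attempt, slept.append)
```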
This should help reduce the number of devices that e.g. simple bots which repeatedly log in rack up.
We only delete non-e2e devices as they should be safe to delete, whereas if we delete e2e devices for a user we may accidentally break their ability to receive e2e keys for a message.
* Fix joining rooms you have been unbanned from
Since forever synapse did not allow you to join a room after you have
been unbanned from it over federation. This was not actually because of
the unban event not federating. Synapse simply used outdated state to
validate the join transition. This skips the validation if we are not in
the room and for that reason won't have the current room state.
Fixes #1563
Signed-off-by: Nicolas Werner <nicolas.werner@hotmail.de>
* Add changelog
Signed-off-by: Nicolas Werner <nicolas.werner@hotmail.de>
* Update changelog.d/15323.bugfix
---------
Signed-off-by: Nicolas Werner <nicolas.werner@hotmail.de>
Experimental support for MSC3983 is behind a configuration flag.
If enabled, for users which are exclusively owned by an application
service then the appservice will be queried for one-time keys *if*
there are none uploaded to Synapse.
This makes it so that we rely on the `device_id` to delete pushers on logout,
instead of relying on the `access_token_id`. This ensures we're not removing
pushers on token refresh, and prepares for a world without access token IDs
(also known as the OIDC).
This actually runs the `set_device_id_for_pushers` background update, which
was forgotten in #13831.
Note that for backwards compatibility it still deletes pushers based on the
`access_token` until the background update finishes.
Invalid mentions data received over the Client-Server API should
be rejected with a 400 error. This will hopefully stop clients from
sending invalid data, although does not help with data received
over federation.
* Add `event_stream_ordering` column to membership state tables
Specifically this adds the column to `current_state_events`,
`local_current_membership` and `room_memberships`. Each of these tables
is regularly joined with the `events` table to get the stream ordering
and denormalising this into each table will yield significant query
performance improvements once used.
* Make denormalised `event_stream_ordering` columns foreign keys
* Add comment in schema file explaining new denormalised columns
* Add triggers to enforce consistency of `event_stream_ordering` columns
* Re-order purge room tables to account for foreign keys
* Bump schema version to 75
Co-authored-by: David Robertson <david.m.robertson1@gmail.com>
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
Additionally:
* Consistently use `freeze()` in test
---------
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: 6543 <6543@obermui.de>
* Have replication clients remove _INT_STREAM_POS
Suppose worker A makes an internal http request to worker B. B may
make changes that A later learns about over replication. We want A's
request to block until it has seen those changes—mainly to ensure A's
caches are invalidated promptly. This helps provide read-after-write
consistency, eliminating entire categories of races and test flakes.
To implement this, B includes a top-level field `_INT_STREAM_POS` in its
response JSON. Roughly speaking, the field's value tells A what to wait
for. But we weren't removing that internal field before A's request
completed!
Introduced in https://github.com/matrix-org/synapse/pull/14820.
Fixes #15308.
* Changelog
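To illustrate the `_INT_STREAM_POS` fix above: the requesting side should strip the internal field before handing the response on. The sketch below is an assumption-laden simplification; `handle_response`, the `wait_for` callback and the shape of the field's value are all hypothetical.

```python
from typing import Any, Callable, Dict, List

_STREAM_POSITION_KEY = "_INT_STREAM_POS"


def handle_response(
    response: Dict[str, Any], wait_for: Callable[[Any], None]
) -> Dict[str, Any]:
    """Strip the internal field, wait on it if present, and return the rest."""
    result = dict(response)
    position = result.pop(_STREAM_POSITION_KEY, None)
    if position is not None:
        wait_for(position)
    return result


waited: List[Any] = []
cleaned = handle_response(
    {"_INT_STREAM_POS": {"events": 5}, "ok": True}, waited.append
)
```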
When a room is deleted in Synapse we remove the event forward
extremities in the room, so if (say a bot) tries to send a message into
the room we error out due to not being able to calculate prev events for
the new event *before* we check if the sender is in the room.
Fixes #8094
With Redis commands do not need to be re-issued by the main
process (they fan-out to all processes at once) and thus it is no
longer necessary to worry about them reflecting recursively forever.
* Scaffolding for background process to refresh profiles
* Add scaffolding for background process to refresh profiles for a given server
* Implement the code to select servers to refresh from
* Ensure we don't build up multiple looping calls
* Make `get_profile` able to respect backoffs
* Add logic for refreshing users
* When backing off, schedule a refresh when the backoff is over
* Wake up the background processes when we receive an interesting state event
* Add tests
* Newsfile
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
* Add comment about 1<<62
---------
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
* Remove special-case method for new memberships only, use more generic method
* Only collect profiles from state events in public rooms
* Add a table to track stale remote user profiles
* Add store methods to set and delete rows in this new table
* Mark remote profiles as stale when a member state event comes in to a private room
* Newsfile
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
* Simplify by removing Optionality of `event_id`
* Replace names and avatars with None if they're set to dodgy things
I think this makes more sense anyway.
* Move schema delta to 74 (I missed the boat?)
* Turns out these can be None after all
---------
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
It is not necessary to reach out to the database to check some
parameters if the auto-join rooms are not configured, or (in some cases)
if auto-create rooms is not configured.
* Tweak docstring and type hint
* Flip logic and provide better name
* Separate decision from action
* Track a set of strings, not EventBases
* Require explicit boolean options from callers
* Add explicit option for partial state rooms
* Changelog
* Rename param
When pushing events in partial state rooms down incremental /sync, we
try to find the `m.room.member` state event for their senders by digging
through their auth events, so that we can present the membership to the
client. Events usually have a membership event in their auth events,
with the exception of the `m.room.create` event and a user's first join
into the room.
When implementing #13477, we took the case of a user's first join into
account, but forgot to handle the `m.room.create` case. This change
fixes that.
Signed-off-by: Sean Quah <seanq@matrix.org>
This removes the experimental configuration option and
always escapes the push rule condition keys.
Also escapes any (experimental) push rule condition keys
in the base rules which contain dot in a field name.
Enables MSC3925 support by default, which:
* Includes the full edit event in the bundled aggregations of an
edited event.
* Stops modifying the original event's content to return the new
content from the edit event.
This is a backwards-incompatible change that is considered to be
"correct" by the spec.
AbstractStreamIdTracker (now) has only a single sub-class: AbstractStreamIdGenerator.
Combine them to simplify some code and remove any direct references to
AbstractStreamIdTracker.
This replaces the specific `is_user_mention` push rule condition
used in MSC3952 with the generic `exact_event_property_contains`
push rule condition from MSC3966.
It turns out that no clients rely on server-side aggregation of `m.annotation`
relationships: it's just not very useful as currently implemented.
It's also non-trivial to calculate.
I want to remove it from MSC2677, so to keep the implementation in line, let's
remove it here.
Internally the push rules module uses a `pattern_type` property for `event_match`
conditions (and `related_event_match`) to mark the condition as matching the
current user's Matrix ID or localpart.
This is leaky to the Client-Server API where a user can successfully set a condition
which provides `pattern_type` instead of `pattern` (note that there's no benefit to
doing this -- the user can just use their own Matrix ID or localpart instead). When
serializing back to the client the `pattern_type` property is converted into a proper
`pattern`.
The following changes are made to avoid this:
* Separate the `KnownCondition::EventMatch` enum value into `EventMatch`
and `EventMatchType`, each with their own expected properties. (Note that a
similar change is made for `RelatedEventMatch`.)
* Make it such that the `pattern_type` variants serialize to the same condition kind,
but cannot be deserialized (since they're only provided by base rules).
* As a final tweak, convert `user_id` vs. `user_localpart` values into an enum.
* Add documentation for caching in a module
* Changelog
* Formatting
* Wrap lines at a length that mdbook is happier with
* Typo fix
Co-authored-by: Erik Johnston <erik@matrix.org>
* Link to recent version of the API
In the longer term I'd like to see us generate markdown with Sphinx.
* Refer to public `cached` decorator
* Mark caching as being added in 1.74
Some of the underlying infrastructure was added in 1.69, but the
public-facing `cached` decorator was only added in 1.74. It is the
latter that I think we should be advertising.
* Update docs/modules/writing_a_module.md
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
---------
Co-authored-by: David Robertson <davidr@element.io>
Co-authored-by: Erik Johnston <erik@matrix.org>
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
* Admin api to delete event report
* lint + tests
* newsfile
* Apply suggestions from code review
Co-authored-by: David Robertson <david.m.robertson1@gmail.com>
* revert changes - move to WorkerStore
* update unit test
* Note that timestamp is in milliseconds
---------
Co-authored-by: David Robertson <david.m.robertson1@gmail.com>
* Removes the `v1` directory from `test.rest.media.v1`.
* Moves the non-REST code from `synapse.rest.media.v1` to `synapse.media`.
* Flatten the `v1` directory from `synapse.rest.media`, but leave compatibility
with 3rd party media repositories and spam checkers.
* Fix a long-standing bug where non-ASCII characters in search terms,
including accented letters, would not match characters in a different
case.
* Fix a long-standing bug where search terms using combining accents
would not match display names using precomposed accents and vice
versa.
To fully take effect, the user directory must be rebuilt after this
change.
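As a rough illustration of the two fixes above, the sketch below uses only Python's standard `unicodedata` module. The helper name `normalize_for_search` is hypothetical and is not Synapse's actual implementation; it only shows why normalising and case-folding both the search term and the display name lets the comparisons succeed.

```python
import unicodedata


def normalize_for_search(text: str) -> str:
    """Hypothetical helper: normalise, then case-fold, so equivalent strings compare equal."""
    # NFKC composes "e" + U+0301 (combining acute) into the precomposed "é".
    composed = unicodedata.normalize("NFKC", text)
    # casefold() also handles non-ASCII case pairs such as "É" / "é".
    return composed.casefold()


precomposed = "\u00c9ric"  # "Éric" with a precomposed capital E-acute
combining = "e\u0301ric"   # "éric" spelled with a combining accent
```

The two strings differ byte for byte, but compare equal once both have passed through the helper.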
Fixes #14630.
Signed-off-by: Sean Quah <seanq@matrix.org>
Previously if an autodiscovered oEmbed request failed (e.g. the
oEmbed endpoint is down or does not exist) then the entire URL
preview would fail. Instead we now return everything we can, even
if this additional request fails.
Ideally we would replace this with parsing of the Accept header
or something else, but for now just make Synapse spec compliant
by ignoring the unspecced parameter.
It does not seem that this is ever sent by a client, and even if it is
there's a reasonable fallback.
* Change `create_room` return type
* Don't return room alias from /createRoom
* Update other callsites
* Fix up mypy complaints
It looks like new_room_user_id is None iff new_room_id is None. It's a
shame we haven't expressed this in a way that mypy can understand.
* Changelog
* Upper-bound frozendict dependency
This is an ugly kludge to solve
https://github.com/matrix-org/synapse/issues/15109. It is not the most
friendly thing to do for downstream packagers (apologies), but we are a)
running low on time at the moment, and b) seeking to remove frozendict
anyway.
* Changelog
* Update database_maintenance_tools.md
Included a blog post by Jackson Chen, which DID work when I followed it to perform Matrix Synapse maintenance, versus the 2020 blog post by Victor Berger, which DID NOT work when performing maintenance.
* Update database_maintenance_tools.md
* Rephrasing
* Sort BOOLEAN_COLUMNS and APPEND_ONLY_TABLES
So I can see if a given table is present in logarithmic time, rather
than linear.
* Teach portdb about `un_partial_stated_event_streams`
* Comments comments comments
* Changelog
Previously, when creating a join event in /make_join, we would decide
whether to include additional fields to satisfy restricted room checks
based on the current state of the room. Then, when building the event,
we would capture the forward extremities of the room to use as prev
events.
This is subject to race conditions. For example, when leaving and
rejoining a room, the following sequence of events leads to a misleading
403 response:
1. /make_join reads the current state of the room and sees that the user
is still in the room. It decides to omit the field required for
restricted room joins.
2. The leave event is persisted and the room's forward extremities are
updated.
3. /make_join builds the event, using the post-leave forward extremities.
The event then fails the restricted room checks.
To mitigate the race, we move the read of the forward extremities closer
to the read of the current state. Ideally, we would compute the state
based off the chosen prev events, but that can involve state resolution,
which is expensive.
Signed-off-by: Sean Quah <seanq@matrix.org>
* Update mypy and mypy-zope
* Remove unused ignores
These used to suppress
```
synapse/storage/engines/__init__.py:28: error: "__new__" must return a
class instance (got "NoReturn") [misc]
```
and
```
synapse/http/matrixfederationclient.py:1270: error: "BaseException" has no attribute "reasons" [attr-defined]
```
(note that we check `hasattr(e, "reasons")` above)
* Avoid empty body warnings, sometimes by marking methods as abstract
E.g.
```
tests/handlers/test_register.py:58: error: Missing return statement [empty-body]
tests/handlers/test_register.py:108: error: Missing return statement [empty-body]
```
* Suppress false positive about `JaegerConfig`
Complaint was
```
synapse/logging/opentracing.py:450: error: Function "Type[Config]" could always be true in boolean context [truthy-function]
```
* Fix not calling `is_state()`
Oops!
```
tests/rest/client/test_third_party_rules.py:428: error: Function "Callable[[], bool]" could always be true in boolean context [truthy-function]
```
* Suppress false positives from ParamSpecs
````
synapse/logging/opentracing.py:971: error: Argument 2 to "_custom_sync_async_decorator" has incompatible type "Callable[[Arg(Callable[P, R], 'func'), **P], _GeneratorContextManager[None]]"; expected "Callable[[Callable[P, R], **P], _GeneratorContextManager[None]]" [arg-type]
synapse/logging/opentracing.py:1017: error: Argument 2 to "_custom_sync_async_decorator" has incompatible type "Callable[[Arg(Callable[P, R], 'func'), **P], _GeneratorContextManager[None]]"; expected "Callable[[Callable[P, R], **P], _GeneratorContextManager[None]]" [arg-type]
````
* Drive-by improvement to `wrapping_logic` annotation
* Workaround false "unreachable" positives
See https://github.com/Shoobx/mypy-zope/issues/91
```
tests/http/test_proxyagent.py:626: error: Statement is unreachable [unreachable]
tests/http/test_proxyagent.py:762: error: Statement is unreachable [unreachable]
tests/http/test_proxyagent.py:826: error: Statement is unreachable [unreachable]
tests/http/test_proxyagent.py:838: error: Statement is unreachable [unreachable]
tests/http/test_proxyagent.py:845: error: Statement is unreachable [unreachable]
tests/http/federation/test_matrix_federation_agent.py:151: error: Statement is unreachable [unreachable]
tests/http/federation/test_matrix_federation_agent.py:452: error: Statement is unreachable [unreachable]
tests/logging/test_remote_handler.py:60: error: Statement is unreachable [unreachable]
tests/logging/test_remote_handler.py:93: error: Statement is unreachable [unreachable]
tests/logging/test_remote_handler.py:127: error: Statement is unreachable [unreachable]
tests/logging/test_remote_handler.py:152: error: Statement is unreachable [unreachable]
```
* Changelog
* Tweak DBAPI2 Protocol to be accepted by mypy 1.0
Some extra context in:
- https://github.com/matrix-org/python-canonicaljson/pull/57
- https://github.com/python/mypy/issues/6002
- https://mypy.readthedocs.io/en/latest/common_issues.html#covariant-subtyping-of-mutable-protocol-members-is-rejected
* Pull in updated canonicaljson lib
so the protocol check just works
* Improve comments in opentracing
I tried to work around the ignores but found it too much trouble.
I think the corresponding issue is
https://github.com/python/mypy/issues/12909. The mypy repo has a PR
claiming to fix this (https://github.com/python/mypy/pull/14677) which
might mean this gets resolved soon?
* Better annotation for INTERACTIVE_AUTH_CHECKERS
* Drive-by AUTH_TYPE annotation, to remove an ignore
This replaces the specific `is_room_mention` push rule condition
used in MSC3952 with the generic `exact_event_match` push rule
condition from MSC3758.
No functionality changes due to this.
Previously we would give up upon receiving a 404 from the first server,
instead of trying the rest of the servers in the list.
Signed-off-by: Sean Quah <seanq@matrix.org>
* Fix order of partial state tables when purging
`partial_state_rooms` has an FK on `events` pointing to the join event we
get from `/send_join`, so we must delete from that table before deleting
from `events`.
**NB:** It would be nice to cancel any resync processes for the room
being purged. We do not do this at present. To do so reliably we'd need
an internal HTTP "replication" endpoint, because the worker doing the
resync process may be different to that handling the purge request.
The first time the resync process tries to write data after the deletion
it will fail because we have deleted necessary data e.g. auth
events. AFAICS it will not retry the resync, so the only downside to
not cancelling the resync is a scary-looking traceback.
(This is presumably extremely race-sensitive.)
* Changelog
* admist(?) -> between
* Warn about a race
* Fix typo, thanks Sean
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
---------
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
...when lazy loading of members is not enabled. It's weird to notify
a client that another user's device list has changed when the client
doesn't think that they share a room.
Note that when a room is un-partial stated, device list updates are
emitted for every member in that room over /sync.
Signed-off-by: Sean Quah <seanq@matrix.org>
Fixes #12801.
Complement tests are at
https://github.com/matrix-org/complement/pull/567.
Avoid blocking on full state when handling a subsequent join into a
partial state room.
Also always perform a remote join into partial state rooms, since we do
not know whether the joining user has been banned and want to avoid
leaking history to banned users.
Signed-off-by: Mathieu Velten <mathieuv@matrix.org>
Co-authored-by: Sean Quah <seanq@matrix.org>
Co-authored-by: David Robertson <davidr@element.io>
It's important that collections returned from `@cached` methods are not
modified, otherwise future retrievals from the cache will return the
modified collection.
This applies to the return values from `@cached` methods and the values
inside the dictionaries returned by `@cachedList` methods. It's not
necessary for the dictionaries returned by `@cachedList` methods
themselves to be read-only.
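The hazard can be reproduced with any memoising decorator. The sketch below uses `functools.lru_cache` from the standard library as a stand-in for Synapse's `@cached` (an assumption made purely for illustration; the function names and room IDs are hypothetical).

```python
from functools import lru_cache
from typing import FrozenSet, List


@lru_cache(maxsize=None)
def get_rooms_mutable(user_id: str) -> List[str]:
    # Returns a list: every caller receives the *same* object from the cache.
    return ["!a:example.org", "!b:example.org"]


@lru_cache(maxsize=None)
def get_rooms_readonly(user_id: str) -> FrozenSet[str]:
    # Returns an immutable collection, so callers cannot alter the cached value.
    return frozenset({"!a:example.org", "!b:example.org"})


first = get_rooms_mutable("@alice:example.org")
first.append("!oops:example.org")  # mutates the cached list in place
second = get_rooms_mutable("@alice:example.org")  # now contains "!oops:example.org"
```

Returning a read-only type such as `frozenset` or a tuple turns the accidental mutation into an immediate error instead of a silently corrupted cache entry.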
Signed-off-by: Sean Quah <seanq@matrix.org>
Co-authored-by: David Robertson <davidr@element.io>
This specifies to search for an exact value match, instead of
string globbing. It only works across non-compound JSON values
(null, boolean, integer, and strings).
The per-room account data is no longer unconditionally
fetched, even if all rooms will be filtered out.
Global account data will not be fetched if it will all be
filtered out.
The previous version of the code could mutate a cached value,
but only if the input requested all devices of a user *and* a specific
device.
To avoid this nonsensical situation we no longer fetch a specific
device ID if all of a user's devices are returned.
* -> None for test methods
* A first batch of type fixes
* Introduce common parent test case
* Fixup that big test method
* tests.module_api passes mypy
* Changelog
This disambiguates keys which attempt to match fields
with a dot in them (e.g. m.relates_to).
Disabled by default behind an experimental configuration flag.
This PR just clarifies in the SRV DNS delegation document that there are
still cases a user may have to serve files from `.well-known` endpoints,
and this may not be a valid case for using SRV delegation. This has
caused some confusion in a few cases.
Signed-off-by: William Kray <github@williamkray.com>
* Skip testing PyPy wheels
One of the test builds on #15015 failed to install a pp38-* wheel
because it didn't have access to the openssl headers to build
`cryptography` from source. We don't run CI against PyPy so I'm going to
be a meanie and skip testing the wheels. (And I've no idea why 3.8 was
special in the first place, either.)
* Hack the name of the wheel so cibw can test it
I hate hate hate hate hate hate hate hate hate this
* Changelog
* Apply suggestions from code review
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
---------
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
* Fix MediaStorage type hint
* Typecheck tests.rest.media.v1.test_media_storage
* Changelog
* Remove assert and make the comment succinct
* Fix syntax for olddeps
* Tweak http types in Synapse
AFAICS these are correct, and they make mypy happier on tests.http.
* Type hints for test_proxyagent
* type hints for test_srv_resolver
* test_matrix_federation_agent
* tests.http.server._base
* tests.http.__init__
* tests.http.test_additional_resource
* tests.http.test_client
* tests.http.test_endpoint
* tests.http.test_matrixfederationclient
* tests.http.test_servlet
* tests.http.test_simple_client
* tests.http.test_site
* One fixup in tests.server
* Untyped defs
* Changelog
* Fixup syntax for Python 3.7
* Fix olddeps syntax
* Use a twisted IPv4 addr for dummy_address
* Fix typo, thanks Sean
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
* Remove redundant `Optional`
---------
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
This adds an `event_stream_ordering` column to `current_state_events`,
`local_current_membership` and `room_memberships`. Each of these tables
is regularly joined with the `events` table to get the stream ordering
and denormalising this into each table will yield significant query
performance improvements once used. Includes a background job to
populate these values from the `events` table.
Same idea as https://github.com/matrix-org/synapse/pull/13703.
Signed off by Nick @ Beeper (@fizzadar).
* Make tests.federation pass mypy
* Untyped defs in tests.federation.transport
* test methods return None
* Remaining type hints in tests.federation
* Changelog
* Avoid an unnecessary type-ignore
* Accept a Sequence of events in synapse.appservice
This avoids some casts/ignores in the tests I'm about to fixup. It seems
that `List[Mock]` is not a subtype of `List[EventBase]`, but
`Sequence[Mock]` is a subtype of `Sequence[EventBase]`. So presumably
`Mock` is considered a subtype of anything, much like `Any`.
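The general typing rule behind the `List` versus `Sequence` difference can be shown without `Mock`. In this sketch, `EventBase` and `FakeEvent` are hypothetical stand-ins rather than Synapse's classes, and the comments describe what a type checker such as mypy is expected to report; how `Mock` itself is typed is not covered here.

```python
from typing import List, Sequence


class EventBase:
    """Stand-in for an event type (hypothetical, not Synapse's class)."""


class FakeEvent(EventBase):
    """A subclass, as might be used in tests."""


def count_events_list(events: List[EventBase]) -> int:
    # List is invariant: a type checker rejects List[FakeEvent] here, because
    # the function could legally append a plain EventBase to the caller's list.
    return len(events)


def count_events_seq(events: Sequence[EventBase]) -> int:
    # Sequence is read-only and covariant: Sequence[FakeEvent] is accepted.
    return len(events)


fakes: List[FakeEvent] = [FakeEvent(), FakeEvent()]
count = count_events_seq(fakes)  # type-checks
# count_events_list(fakes)       # runs fine, but mypy reports an incompatible argument type
```

Accepting `Sequence` in a signature that only reads its argument is therefore the less restrictive choice for callers.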
* make tests.appservice.test_scheduler pass mypy
* Extra hints in tests.appservice.test_scheduler
* Extra hints in tests.appservice.test_api
* Extra hints in tests.appservice.test_appservice
* Disallow untyped defs
* Changelog
Co-authored-by: Brad Murray <brad@beeper.com>
Co-authored-by: Nick Barrett <nick@beeper.com>
Copy the suppress_edits push rule from Beeper to implement MSC3958.
9415a1284b/rust/src/push/base_rules.rs (L98-L114)
Ensure that the list of servers in a partial state room always contains
the server we joined off.
Also refactor `get_partial_state_servers_at_join` to return `None` when
the given room is no longer partial stated, to explicitly indicate when
the room has partial state. Otherwise it's not clear whether an empty
list means that the room has full state, or the room is partial stated,
but the server we joined off told us that there are no servers in the
room.
Signed-off-by: Sean Quah <seanq@matrix.org>
Since pyo3-log is initialized very early in the Python start-up
it caches the state of the loggers before they're fully initialized
(and thus are essentially disabled). Whenever we reload the
logging configuration we now also tell pyo3-log to discard
any cached logging configuration it has; it will refetch the
current logging configuration from Python at the next point
it logs.
This fixes Rust log lines not appearing in the homeserver logs.
If a sync request does not need to calculate per-room entries &
is not generating presence & is not generating device list data
(e.g. during initial sync) avoid the expensive calculation of room
specific data.
This is a micro-optimisation for clients syncing simply to receive
to-device information.
This expands the previous optimisation from being only for initial
sync to being for all sync requests.
It also inverts some of the logic to be inclusive instead of exclusive.
The `parse_enum` helper pulls an enum value from the query string
(by delegating down to the parse_string helper with values generated
from the enum).
This is used to pull out "f" and "b" in most places and then we thread
the resulting Direction enum throughout more code.
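A minimal sketch of the delegation described above. The signatures here are assumptions for illustration (they take a plain mapping rather than a request object) and are not Synapse's real helpers; only the names `parse_enum`, `parse_string` and `Direction` come from the text.

```python
from enum import Enum
from typing import List, Mapping, Optional, Type, TypeVar

E = TypeVar("E", bound=Enum)


class Direction(Enum):
    FORWARDS = "f"
    BACKWARDS = "b"


def parse_string(
    args: Mapping[str, str],
    name: str,
    allowed_values: List[str],
    default: Optional[str] = None,
) -> Optional[str]:
    """Hypothetical string parser: look up `name` and validate it."""
    value = args.get(name, default)
    if value is not None and value not in allowed_values:
        raise ValueError(f"Query parameter {name!r} must be one of {allowed_values}")
    return value


def parse_enum(
    args: Mapping[str, str],
    name: str,
    enum_cls: Type[E],
    default: Optional[E] = None,
) -> Optional[E]:
    """Hypothetical enum parser: delegate to parse_string with the enum's values."""
    allowed = [member.value for member in enum_cls]
    raw = parse_string(args, name, allowed, default.value if default else None)
    return enum_cls(raw) if raw is not None else None
```

For example, `parse_enum({"dir": "b"}, "dir", Direction)` yields `Direction.BACKWARDS`, which can then be passed around instead of the bare string.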
The previous assumption was that the stream_id column was unique
(for a room ID, receipt type, user ID tuple), but this turned out to be
incorrect.
Now find the max stream ID, then map this back to a database-specific
row identifier and delete other rows which match the (room ID, receipt type,
user ID) tuple, but *not* the row ID.
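The three steps can be sketched against an in-memory SQLite database, where `rowid` serves as the database-specific row identifier. The table name and columns below are simplified assumptions, not the real schema; on PostgreSQL a system column such as `ctid` could play the same role.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE receipts "
    "(room_id TEXT, receipt_type TEXT, user_id TEXT, stream_id INTEGER)"
)
conn.executemany(
    "INSERT INTO receipts VALUES (?, ?, ?, ?)",
    [
        ("!r:hs", "m.read", "@u:hs", 5),
        ("!r:hs", "m.read", "@u:hs", 7),
        ("!r:hs", "m.read", "@u:hs", 7),  # same stream_id twice: not unique
        ("!other:hs", "m.read", "@u:hs", 3),
    ],
)

key = ("!r:hs", "m.read", "@u:hs")
where = "room_id = ? AND receipt_type = ? AND user_id = ?"

# 1. Find the max stream ID for the (room ID, receipt type, user ID) tuple.
(max_stream_id,) = conn.execute(
    f"SELECT MAX(stream_id) FROM receipts WHERE {where}", key
).fetchone()

# 2. Map it back to exactly one row identifier.
(keep_rowid,) = conn.execute(
    f"SELECT rowid FROM receipts WHERE {where} AND stream_id = ? LIMIT 1",
    key + (max_stream_id,),
).fetchone()

# 3. Delete every other row for the tuple, keeping only that row.
conn.execute(
    f"DELETE FROM receipts WHERE {where} AND rowid != ?", key + (keep_rowid,)
)

remaining = conn.execute(
    "SELECT room_id, stream_id FROM receipts ORDER BY room_id"
).fetchall()
```

Filtering on the row identifier rather than on `stream_id` is what makes the delete safe when two rows share the maximum stream ID.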
`run_in_background` calls re-use the current logging context. When they
are not awaited, they can complete after the current logging context has
been marked as finished, which leads to log spam. Use
`run_as_background_process` instead.
Fixes one of the instances of #13090.
Signed-off-by: Sean Quah <seanq@matrix.org>
#14910 fixed the regression introduced by #13873 where sqlite database
migrations would no longer run inside a transaction. However, it
committed the transaction before Synapse updated its bookkeeping of
which migrations have been run, which means that migrations may be run
again after they have completed successfully.
Leave the transaction open at the end of `executescript`, to restore the
old, correct behaviour. Also make the PostgreSQL behaviour consistent
with SQLite.
Fixes #14909.
Signed-off-by: Sean Quah <seanq@matrix.org>
* Better test for bad values in power levels events
The previous test only checked that Synapse didn't raise an exception,
but didn't check that we had correctly interpreted the value of the
dodgy power level.
It also conflated two things: bad room notification levels, and bad user
levels. There _is_ logic for converting the latter to integers, but we
should test it separately.
* Check we ignore types that don't convert to int
* Handle `None` values in `notifications.room`
* Changelog
* Also test that bad values are rejected by event auth
* Docstring
* linter scripttttttttt
* Test boolean values in PL content
* Reject boolean power levels
* Changelog
* Prefer `type(x) is int` to `isinstance(x, int)`
This covered all additional instances I could see where `x` was
user-controlled.
The remaining cases are
```
$ rg -s 'isinstance.*[^_]int'
tests/replication/_base.py
576: if isinstance(obj, int):
synapse/util/caches/stream_change_cache.py
136: assert isinstance(stream_pos, int)
214: assert isinstance(stream_pos, int)
246: assert isinstance(stream_pos, int)
267: assert isinstance(stream_pos, int)
synapse/replication/tcp/external_cache.py
133: if isinstance(result, int):
synapse/metrics/__init__.py
100: if isinstance(calls, (int, float)):
synapse/handlers/appservice.py
262: assert isinstance(new_token, int)
synapse/config/_util.py
62: if isinstance(p, int):
```
which cover metrics, logic related to `jsonschema`, and replication and
data streams. AFAICS these are all internal to Synapse
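The reason for preferring `type(x) is int` when rejecting boolean power levels is that `bool` is a subclass of `int` in Python. A small sketch (the helper name `is_strict_int` is hypothetical):

```python
def is_strict_int(value: object) -> bool:
    # bool is a subclass of int, so isinstance(True, int) is True.
    # Comparing the exact type rejects booleans (and any other int subclass).
    return type(value) is int


accepts_bool_by_isinstance = isinstance(True, int)
accepts_bool_by_type = is_strict_int(True)
```

With the exact-type check, a user-controlled `true` in event content is no longer treated as the integer 1.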
* Changelog
* Better test for bad values in power levels events
The previous test only checked that Synapse didn't raise an exception,
but didn't check that we had correctly interpreted the value of the
dodgy power level.
It also conflated two things: bad room notification levels, and bad user
levels. There _is_ logic for converting the latter to integers, but we
should test it separately.
* Check we ignore types that don't convert to int
* Handle `None` values in `notifications.room`
* Changelog
* Also test that bad values are rejected by event auth
* Docstring
* linter scripttttttttt
MSC3952 defines push rules which search for mentions in a list of
Matrix IDs in the event body, instead of searching the entire event
body for display name / local part.
This is implemented behind an experimental configuration flag and
does not yet implement the backwards compatibility pieces of the MSC.
The `/relations` endpoint did not properly handle "live tokens"
(i.e. sync tokens). To do this properly we abstract the code that
`/messages` has and re-use it.
* Batch look-ups to see if rooms are partial stated.
* Fix issues found in linting.
* Fix typo.
* Apply suggestions from code review
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
* Clarify comments.
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
* Also improve the cache size while we're at it
* is_partial_state_rooms -> is_partial_state_room_batched
* Run `black`
* Improve annotation for `simple_select_many_batch`
* Fix is_partial_state_room_batched impl
* Okay, _actually_ fix impl
* Update description.
* Update synapse/storage/databases/main/room.py
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
* Run black.
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
Co-authored-by: David Robertson <davidr@element.io>
On startup, the `_device_list_id_gen` stream id generator is initialized
using the maximum stream id seen in a list of tables. When we started
populating the `device_list_remote_pending` table in #13913, we forgot
to add it to the aforementioned list of tables, so the stream id
generator can hand out old stream ids after a restart. The end result is
that Synapse can fail to handle device list update EDUs after a restart
when a partial state join is in progress.
Add the `device_list_remote_pending` table to the list of tables to
consider when initializing the `_device_list_id_gen` stream id generator.
Signed-off-by: Sean Quah <seanq@matrix.org>
Destination was being used incorrectly (a single destination instead
of a list of destinations was being passed).
This also updates some of the types in the area to not use Collection[str],
which is a footgun.
* Bump the client-side timeout for /state
to give faster joins resyncs the chance to complete for large rooms.
We have seen this fare poorly (~90s for Matrix HQ's /state) in testing,
causing the resync to advance to another HS who hasn't seen our join yet.
* Changelog
* Milliseconds!!!!
#13873 introduced a regression which causes sqlite database migrations
to no longer run inside a transaction. Wrap them in a transaction again,
to avoid database corruption when migrations are interrupted.
Fixes #14909.
Signed-off-by: Sean Quah <seanq@matrix.org>
* Request partial joins by default
This is a little sloppy, but we are trying to gain confidence in faster
joins in the upcoming RC.
Admins can still opt out by adding the following to their Synapse
config:
```yaml
experimental:
faster_joins: false
```
We may revert this change before the release proper, depending on how
testing in the wild goes.
* Changelog
* Try to fix the backfill test failures
* Upgrade notes
* Postgres compat?
* Allow `AbstractSet` in `StrCollection`
Or else frozensets are excluded. This will be useful in an upcoming
commit where I plan to change a function that accepts `List[str]` to
accept `StrCollection` instead.
* `rooms_to_exclude` -> `rooms_to_exclude_globally`
I am about to make use of this exclusion mechanism to exclude rooms for
a specific user and a specific sync. This rename helps to clarify the
distinction between the global config and the rooms to exclude for a
specific sync.
* Better function names for internal sync methods
* Track a list of excluded rooms on SyncResultBuilder
I plan to feed a list of partially stated rooms for this sync to ignore
* Exclude partial state rooms during eager sync
using the mechanism established in the previous commit
* Track un-partial-state stream in sync tokens
So that we can work out which rooms have become fully-stated during a
given sync period.
* Fix mutation of `@cached` return value
This was fouling up a complement test added alongside this PR.
Excluding a room would mean the set of forgotten rooms in the cache
would be extended. This means that room could be erroneously considered
forgotten in the future.
Introduced in #12310, Synapse 1.57.0. I don't think this had any
user-visible side effects (until now).
* SyncResultBuilder: track rooms to force as newly joined
Similar plan as before. We've omitted rooms from certain sync responses;
now we establish the mechanism to reintroduce them into future syncs.
* Read new field, to present rooms as newly joined
* Force un-partial-stated rooms to be newly-joined
for eager incremental syncs only, provided they're still fully stated
* Notify user stream listeners to wake up long polling syncs
* Changelog
* Typo fix
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
* Unnecessary list cast
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
* Rephrase comment
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
* Another comment
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
* Fixup merge(?)
* Poke notifier when receiving un-partial-stated msg over replication
* Fixup merge whoops
Thanks MV :)
Co-authored-by: Mathieu Velen <mathieuv@matrix.org>
Co-authored-by: Mathieu Velten <mathieuv@matrix.org>
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
* Faster joins: Update room stats and user directory on workers when done
When finishing a partial state join to a room, we update the current
state of the room without persisting additional events. Workers receive
notice of the current state update over replication, but neglect to wake
the room stats and user directory updaters, which then get incidentally
triggered the next time an event is persisted or an unrelated event
persister sends out a stream position update.
We wake the room stats and user directory updaters at the appropriate
time in this commit.
Part of #12814 and #12815.
Signed-off-by: Sean Quah <seanq@matrix.org>
* fixup comment
Signed-off-by: Sean Quah <seanq@matrix.org>
* Enable Complement tests for Faster Remote Room Joins on worker-mode
* (dangerous) Add an override to allow Complement to use FRRJ under workers
* Newsfile
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
* Fix race where we didn't send out replication notification
* MORE HACKS
* Fix get_un_partial_stated_rooms_token to take instance_name
* Fix bad merge
* Remove warning
* Correctly advance un_partial_stated_room_stream
* Fix merge
* Add another notify_replication
* Fixups
* Create a separate ReplicationNotifier
* Fix test
* Fix portdb
* Create a separate ReplicationNotifier
* Fix test
* Fix portdb
* Fix presence test
* Newsfile
* Apply suggestions from code review
* Update changelog.d/14752.misc
Co-authored-by: Erik Johnston <erik@matrix.org>
* lint
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
Co-authored-by: Erik Johnston <erik@matrix.org>
* Avoid clearing out forward extremities when doing a second remote join
When joining a restricted room where the local homeserver does not have
a user able to issue invites, we perform a second remote join. We want
to avoid clearing out forward extremities in this case because the
forward extremities we have are up to date and clearing out forward
extremities creates a window in which the room can get bricked if
Synapse crashes.
Signed-off-by: Sean Quah <seanq@matrix.org>
* Do a full join when doing a second remote join into a full state room
We cannot persist a partial state join event into a joined full state
room, so we perform a full state join for such rooms instead. As a
future optimization, we could always perform a partial state join and
compute or retrieve the full state ourselves if necessary.
Signed-off-by: Sean Quah <seanq@matrix.org>
* Add lock around partial state flag for rooms
Signed-off-by: Sean Quah <seanq@matrix.org>
* Preserve partial state info when doing a second partial state join
Signed-off-by: Sean Quah <seanq@matrix.org>
* Add newsfile
* Add a TODO(faster_joins) marker
Signed-off-by: Sean Quah <seanq@matrix.org>
Now that we wait for stream positions whenever we do a HTTP replication
hit, we need to be less brutal in the case where we do time out (as we
have bugs around this).
Currently, we will try to start a new partial state sync every time we
perform a remote join, which is undesirable if there is already one
running for a given room.
We intend to perform remote joins whenever additional local users wish
to join a partial state room, so let's ensure that we do not start more
than one concurrent partial state sync for any given room.
------------------------------------------------------------------------
There is a race condition where the homeserver leaves a room and later
rejoins while the partial state sync from the previous membership is
still running. There is no guarantee that the previous partial state
sync will process the latest join, so we restart it if needed.
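One way to picture the bookkeeping described above is a pair of sets keyed by room ID. This is a simplified sketch with hypothetical names, not Synapse's implementation; in particular it restarts the sync whenever another join arrived while one was running, which is a coarser rule than "if needed".

```python
from typing import List, Set


class PartialStateSyncTracker:
    """Hypothetical bookkeeping; names are not taken from Synapse."""

    def __init__(self) -> None:
        self._active: Set[str] = set()
        self._restart_wanted: Set[str] = set()
        self.started: List[str] = []  # record of syncs actually started

    def request_sync(self, room_id: str) -> bool:
        """Start a sync unless one is running; return True if one was started."""
        if room_id in self._active:
            # A newer join arrived while the old sync was still running.
            self._restart_wanted.add(room_id)
            return False
        self._active.add(room_id)
        self.started.append(room_id)
        return True

    def sync_finished(self, room_id: str) -> None:
        """Mark the sync as done and restart it if a join arrived meanwhile."""
        self._active.discard(room_id)
        if room_id in self._restart_wanted:
            self._restart_wanted.discard(room_id)
            self.request_sync(room_id)
```

A second `request_sync` for the same room returns `False` while the first is running, and the restart happens only once the first sync reports that it has finished.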
Signed-off-by: Sean Quah <seanq@matrix.org>
* Change Documentation to have v10 as default room version
* Change Default Room version to 10
* Add changelog entry for default room version swap
* Add changelog entry for v10 default room version in docs
* Clarify doc changelog entry
Co-authored-by: David Robertson <david.m.robertson1@gmail.com>
* Improve Documentation changes.
Co-authored-by: David Robertson <david.m.robertson1@gmail.com>
* Update Changelog entry to have correct format
Co-authored-by: David Robertson <david.m.robertson1@gmail.com>
* Update Spec Version to 1.5
* Only need 1 changelog.
* Fix test.
* Update "Changed in" line
Co-authored-by: David Robertson <david.m.robertson1@gmail.com>
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: Patrick Cloke <patrickc@matrix.org>
* Upgrade to new lockfile format
Now requires poetry >= 1.2.2 to read and poetry >= 1.3.0 to write.
Cheat sheet:
```
poetry --version
poetry show > scratch/before
pipx upgrade poetry
poetry --version
poetry show > scratch/after
diff scratch/{before,after} && echo "no change!"
```
* Use Poetry 1.3.2 when reading or writing lockfile
* Remove unneeded(?) poetry dep for cibuildwheel
* Update docs
* Remove redundant call to setup-python
* Remove outdated comments related to Poetry 1.x
* Remove outdated docs line
was fixed in #13082
* Minor improvements to poetry cheat sheet
* Invoke setup-python-poetry with explicit version
Not sure about this. It's hardcoding versions everywhere.
* Changelog
* Check the lockfile is version 2.0
Might one day incorporate other checks like #14742
* Typo fixes, thanks Sean
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
Serving partial join responses is no longer experimental. They will only be served under the stable identifier if the undocumented config flag experimental.msc3706_enabled is set to true.
Synapse continues to request a partial join only if the undocumented config flag experimental.faster_joins is set to true; this setting remains present and unaffected.
We were incorrectly checking if the *local* token had been advanced, rather than the token for the remote instance.
In practice, I don't think this has caused any bugs due to where we use `wait_for_stream_position`, as critically we don't use it on instances that also write to the given streams (and so the local token will lag behind all remote tokens).
When the local homeserver is already joined to a room and wants to
perform another remote join, we may find it useful to do a non-partial
state join if we already have the full state for the room.
Signed-off-by: Sean Quah <seanq@matrix.org>
* Also use stable name in SendJoinResponse struct
follow-up to #14832
* Changelog
* Fix a rename I missed
* Run black
* Update synapse/federation/federation_client.py
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
* Use new query param when requesting a partial join
* Read new query param when serving partial join
* Provide new field names when serving partial joins
* Read new field names from partial join response
* Changelog
When there are many synchronous requests waiting on a
`_PerHostRatelimiter`, each request will be started recursively just
after the previous request has completed. Under the right conditions,
this leads to stack exhaustion.
A common way for requests to become synchronous is when the remote
client disconnects early, because the homeserver is overloaded and slow
to respond.
Avoid stack exhaustion under these conditions by deferring subsequent
requests until the next reactor tick.
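The difference between starting the next request recursively and deferring it to the next tick can be sketched without Twisted. `FakeReactor` and `Ratelimiter` below are hypothetical stand-ins, not the real reactor or `_PerHostRatelimiter`; they only model requests that complete synchronously.

```python
from collections import deque
from typing import Callable, Deque, List


class FakeReactor:
    """Minimal stand-in for an event loop; not Twisted's reactor."""

    def __init__(self) -> None:
        self._pending: Deque[Callable[[], None]] = deque()

    def call_later(self, callback: Callable[[], None]) -> None:
        self._pending.append(callback)

    def run(self) -> None:
        # Each callback runs on a fresh, shallow stack.
        while self._pending:
            self._pending.popleft()()


class Ratelimiter:
    """Hypothetical queue that starts one waiting request at a time."""

    def __init__(self, reactor: FakeReactor, defer_next: bool) -> None:
        self._reactor = reactor
        self._defer_next = defer_next
        self._waiting: Deque[int] = deque()
        self.completed: List[int] = []

    def enqueue(self, request_id: int) -> None:
        self._waiting.append(request_id)

    def start_next(self) -> None:
        if not self._waiting:
            return
        # The request completes synchronously, e.g. the client has gone away.
        self.completed.append(self._waiting.popleft())
        if self._defer_next:
            # Fixed behaviour: schedule the next request for the next tick.
            self._reactor.call_later(self.start_next)
        else:
            # Old behaviour: recurse, adding a stack frame per waiting request.
            self.start_next()


def process(count: int, defer_next: bool) -> int:
    reactor = FakeReactor()
    limiter = Ratelimiter(reactor, defer_next)
    for request_id in range(count):
        limiter.enqueue(request_id)
    reactor.call_later(limiter.start_next)
    reactor.run()
    return len(limiter.completed)
```

With `defer_next=False` and a queue deeper than the interpreter's recursion limit, `process` raises `RecursionError`; with `defer_next=True` the stack depth stays constant regardless of queue length.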
Fixes #14480.
Signed-off-by: Sean Quah <seanq@matrix.org>
Two parts to this:
* Bundle the whole of the replacement with any edited events. This is backwards-compatible so I haven't put it behind a flag.
* Optionally, inhibit server-side replacement of edited events. This has scope to break things, so it is currently disabled by default.
2023-01-10 16:31:28 +00:00
885 changed files with 67318 additions and 31433 deletions
This directory contains symlinks to the latest dump of the postgres full schema. This is useful to have, as it allows IDEs to understand our schema and provide autocomplete, linters, inspections, etc.
In particular, the DataGrip functionality in IntelliJ's products seems to only consider files called `*.sql` when defining a schema from DDL; `*.sql.postgres` will be ignored. To get around this we symlink those files to ones ending in `.sql`. We've chosen to ignore the `.sql.sqlite` schema dumps here, as they're not intended for production use (and are much quicker to test against).
## Example

## Caveats
- Doesn't include temporary tables created ad-hoc by Synapse.
- Postgres only. IDEs will likely be confused by SQLite-specific queries.
- Will not include migrations created after the latest schema dump.
- Symlinks might confuse checkouts on Windows systems.
## Instructions
### Jetbrains IDEs with DataGrip plugin
- View -> Tool Windows -> Database
- `+` Icon -> DDL Data Source
- Pick a name, e.g. `Synapse schema dump`
- Under sources, click `+`.
- Add an entry with Path pointing to this directory, and dialect set to PostgreSQL.
- OK, and OK.
- IDE should now be aware of the schema.
- Try control-clicking on a table name in a bit of SQL e.g. in `_get_forgotten_rooms_for_user_txn`.
This would create five generic workers with a unique `worker_name` field in each file and listening on ports 8081-8085.
Customise the script to your needs.
Customise the script to your needs. Note that `worker_pid_file` is required if `worker_daemonize` is `true`. Uncomment and/or modify the line if needed.
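As a rough illustration of the result, here is a Python sketch that builds equivalent worker settings in memory. It is not the script itself: the `generic_worker<N>` naming, the PID file path and the listener layout are assumptions made for the example.

```python
from typing import Any, Dict, List


def build_worker_configs(
    count: int = 5, first_port: int = 8081, daemonize: bool = False
) -> List[Dict[str, Any]]:
    """Build one settings mapping per worker, each with a unique name and port."""
    configs: List[Dict[str, Any]] = []
    for index in range(count):
        name = f"generic_worker{index + 1}"  # assumed naming scheme
        listener = {
            "type": "http",
            "port": first_port + index,
            "resources": [{"names": ["client", "federation"]}],
        }
        config: Dict[str, Any] = {
            "worker_app": "synapse.app.generic_worker",
            "worker_name": name,
            "worker_listeners": [listener],
            "worker_daemonize": daemonize,
        }
        if daemonize:
            # worker_pid_file is required whenever worker_daemonize is true.
            config["worker_pid_file"] = f"/var/run/{name}.pid"  # assumed path
        configs.append(config)
    return configs
```

Calling `build_worker_configs()` yields five mappings with ports 8081 to 8085; passing `daemonize=True` also fills in `worker_pid_file` for each worker.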
@ -8,7 +8,9 @@ It also prints out the example lines for Synapse main configuration file.
Remember to route necessary endpoints directly to a worker associated with it.
If you run the script as-is, it will create workers with the replication listener starting from port 8034 and another, regular http listener starting from 8044. If you don't need all of the stream writers listed in the script, just remove them from the ```STREAM_WRITERS``` array.
Hint: Note that `worker_pid_file` is required if `worker_daemonize` is `true`. Uncomment and/or modify the line if needed.
- `medium` - string. Kind of third-party ID, either `email` or `msisdn`.
- `address` - string. Value of third-party ID.
belonging to a user.
- `external_ids` - array, optional. Allow setting the identifier of the external identity
provider for SSO (Single sign-on). Details in the configuration manual under the
sections [sso](../usage/configuration/config_documentation.md#sso) and [oidc_providers](../usage/configuration/config_documentation.md#oidc_providers).
- `auth_provider` - string. ID of the external identity provider. Value of `idp_id`
in the homeserver configuration. Note that no error is raised if the provided
value is not in the homeserver configuration.
- `external_id` - string, user ID in the external identity provider.
- `avatar_url` - string, optional, must be a
- `displayname` - **string**, optional. If set to an empty string (`""`), the user's display name
- `deactivated` - bool, optional. If unspecified, deactivation state will be left
unchanged on existing accounts and set to `false` for new accounts.
A user cannot be erased by deactivating with this API. For details on
deactivating users see [Deactivate Account](#deactivate-account).
- `user_type` - string or null, optional. If provided, the user type will be
adjusted. If `null` given, the user type will be cleared. Other
allowed options are: `bot` and `support`.
If set to an empty string (`""`), the user's avatar is removed.
- `threepids` - **array**, optional. If provided, the user's third-party IDs (email, msisdn) are
entirely replaced with the given list. Each item in the array is an object with the following
fields:
- `medium` - **string**, required. The type of third-party ID, either `email` or `msisdn` (phone number).
- `address` - **string**, required. The third-party ID itself, e.g. `alice@example.com` for `email` or
`447470274584` (for a phone number with country code "44") and `19254857364` (for a phone number
with country code "1") for `msisdn`.
Note: If a threepid is removed from a user via this option, Synapse will also attempt to remove
that threepid from any identity servers it is aware has a binding for it.
- `external_ids` - **array**, optional. Allow setting the identifier of the external identity
provider for SSO (Single sign-on). More details are in the configuration manual under the
sections [sso](../usage/configuration/config_documentation.md#sso) and [oidc_providers](../usage/configuration/config_documentation.md#oidc_providers).
- `auth_provider` - **string**, required. The unique, internal ID of the external identity provider.
The same as `idp_id` from the homeserver configuration. Note that no error is raised if the
provided value is not in the homeserver configuration.
- `external_id` - **string**, required. An identifier for the user in the external identity provider.
When the user logs in to the identity provider, this must be the unique ID that they map to.
- `admin` - **bool**, optional, defaults to `false`. Whether the user is a homeserver administrator,
granting them access to the Admin API, among other things.
- `deactivated` - **bool**, optional. If unspecified, deactivation state will be left unchanged.
If the user already exists then optional parameters default to the current value.
Note: the `password` field must also be set if both of the following are true:
- `deactivated` is set to `false` and the user was previously deactivated (you are reactivating this user)
- Users are allowed to set their password on this homeserver (both `password_config.enabled` and
`password_config.localdb_enabled` config options are set to `true`).
Users' passwords are wiped upon account deactivation, hence the need to set a new one here.
In order to re-activate an account `deactivated` must be set to `false`. If
users do not login via single-sign-on, a new `password` must be provided.
Note: a user cannot be erased with this API. For more details on
deactivating and erasing users see [Deactivate Account](#deactivate-account).
- `locked` - **bool**, optional. If unspecified, locked state will be left unchanged.
- `user_type` - **string** or null, optional. If not provided, the user type will
not be changed. If `null` is given, the user type will be cleared.
Other allowed options are: `bot` and `support`.
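The body parameters above can be assembled as in the following sketch. It only prepares a request and does not send it. The helper name `build_user_request`, the example homeserver URL, the provider ID and the `PUT /_synapse/admin/v2/users/<user_id>` target are assumptions for illustration; check them against your deployment.

```python
import json
from typing import Any
from urllib.parse import quote
from urllib.request import Request

ALLOWED_MEDIA = ("email", "msisdn")


def build_user_request(
    base_url: str, user_id: str, access_token: str, **fields: Any
) -> Request:
    """Prepare (but do not send) a create-or-modify request for one user."""
    for threepid in fields.get("threepids") or []:
        if threepid.get("medium") not in ALLOWED_MEDIA:
            raise ValueError(f"unsupported medium: {threepid.get('medium')!r}")
    for external in fields.get("external_ids") or []:
        if "auth_provider" not in external or "external_id" not in external:
            raise ValueError("external_ids entries need auth_provider and external_id")
    # Only the fields that were passed are sent; omitted ones keep their
    # current value on an existing user.
    url = f"{base_url}/_synapse/admin/v2/users/{quote(user_id, safe='')}"
    return Request(
        url,
        data=json.dumps(fields).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


request = build_user_request(
    "https://homeserver.example",  # hypothetical homeserver
    "@alice:example.com",
    "<admin_access_token>",
    displayname="Alice",
    threepids=[{"medium": "email", "address": "alice@example.com"}],
    external_ids=[{"auth_provider": "oidc-example", "external_id": "alice-123"}],
    user_type=None,  # serialised as null, which clears the user type
)
```

Using keyword arguments keeps "not provided" distinct from an explicit `null`, which matters for `user_type`.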
## List Accounts
@ -172,7 +186,8 @@ A response body like the following is returned:
"shadow_banned": 0,
"displayname": "<UserOne>",
"avatar_url": null,
"creation_ts": 1560432668000
"creation_ts": 1560432668000,
"locked": false
}, {
"name": "<user_id2>",
"is_guest": 0,
@ -183,7 +198,8 @@ A response body like the following is returned:
"shadow_banned": 0,
"displayname": "<UserTwo>",
"avatar_url": "<avatar_url>",
"creation_ts": 1561550621000
"creation_ts": 1561550621000,
"locked": false
}
],
"next_token": "100",
@ -206,7 +222,9 @@ The following parameters should be set in the URL:
- `name` - Is optional and filters to only return users with user ID localparts
**or** displaynames that contain this value.
- `guests` - string representing a bool - Is optional and if `false` will **exclude** guest users.
Defaults to `true` to include guest users.
Defaults to `true` to include guest users. This parameter is not supported when MSC3861 is enabled. [See #15582](https://github.com/matrix-org/synapse/pull/15582)
- `admins` - Optional flag to filter admins. If `true`, only admins are queried. If `false`, admins are excluded from
the query. When the flag is absent (the default), **both** admins and non-admins are included in the search results.
- `deactivated` - string representing a bool - Is optional and if `true` will **include** deactivated users.
Defaults to `false` to exclude deactivated users.
- `limit` - string representing a positive integer - Is optional but is used for pagination,
@ -228,9 +246,15 @@ The following parameters should be set in the URL:
- `displayname` - Users are ordered alphabetically by `displayname`.
- `avatar_url` - Users are ordered alphabetically by avatar URL.
- `creation_ts` - Users are ordered by when the user was created in ms.
- `last_seen_ts` - Users are ordered by when the user was last seen in ms.
- `dir` - Direction of user order. Either `f` for forwards or `b` for backwards.
Setting this value to `b` will reverse the above sort order. Defaults to `f`.
- `not_user_type` - Exclude certain user types, such as bot users, from the request.
Can be provided multiple times. Possible values are `bot`, `support` or "empty string".
"empty string" here means to exclude users without a type.
- `locked` - string representing a bool - Is optional and if `true` will **include** locked users.
Defaults to `false` to exclude locked users. Note: Introduced in v1.93.
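A query string using these parameters could be built as in the sketch below. The helper name `build_list_users_query` is hypothetical, and `order_by` is assumed to be the name of the sort parameter whose values are listed above. Booleans are sent as the strings `true` and `false`, and `not_user_type` is repeated once per excluded type.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlencode


def build_list_users_query(
    *,
    guests: Optional[bool] = None,
    admins: Optional[bool] = None,
    deactivated: Optional[bool] = None,
    locked: Optional[bool] = None,
    order_by: Optional[str] = None,
    direction: Optional[str] = None,
    not_user_type: Optional[List[str]] = None,
) -> str:
    """Encode list-accounts filters; parameters left as None are omitted."""
    params: List[Tuple[str, str]] = []
    flags = (
        ("guests", guests),
        ("admins", admins),
        ("deactivated", deactivated),
        ("locked", locked),
    )
    for key, flag in flags:
        if flag is not None:
            params.append((key, "true" if flag else "false"))
    if order_by is not None:
        params.append(("order_by", order_by))
    if direction is not None:
        if direction not in ("f", "b"):
            raise ValueError("dir must be 'f' or 'b'")
        params.append(("dir", direction))
    # An empty string excludes users without a type.
    for user_type in not_user_type or []:
        params.append(("not_user_type", user_type))
    return urlencode(params)


query = build_list_users_query(
    admins=False,
    locked=True,
    order_by="creation_ts",
    direction="b",
    not_user_type=["bot", ""],
)
```

Omitting `admins` entirely, rather than passing `False`, is what includes both admins and non-admins.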
Caution. The database only has indexes on the columns `name` and `creation_ts`.
This means that if a different sort order is used (`is_guest`, `admin`,
@ -255,10 +279,12 @@ The following fields are returned in the JSON response body:
- `displayname` - string - The user's display name if they have set one.
- `avatar_url` - string - The user's avatar URL if they have set one.
- `creation_ts` - integer - The user's creation timestamp in ms.
- `last_seen_ts` - integer - The user's last activity timestamp in ms.
- `locked` - bool - Status if that user has been marked as locked. Note: Introduced in v1.93.
- `next_token`: string representing a positive integer - Indication for pagination. See above.
- `total` - integer - Total number of users.
*Added in Synapse 1.93:* the `locked` query parameter and response field.
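Pagination with `next_token` can be consumed with a small loop such as this sketch. `iter_users` and `fetch_page` are hypothetical names. The sketch assumes the user objects sit under a `users` key, that the token is passed back as the starting point of the next request, and that `next_token` is absent on the last page.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iter_users(
    fetch_page: Callable[[Optional[str]], Dict[str, Any]]
) -> Iterator[Dict[str, Any]]:
    """Yield every user, following next_token until it is no longer returned.

    fetch_page receives the token of the page to load (None for the first
    page) and returns the decoded JSON response body.
    """
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        yield from page.get("users", [])
        token = page.get("next_token")
        if token is None:
            break
```

Keeping the HTTP call behind `fetch_page` means the loop can be exercised with canned responses before it is pointed at a real homeserver.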
## Query current sessions for a user
@ -373,6 +399,8 @@ The following actions are **NOT** performed. The list may be incomplete.
## Reset password
**Note:** This API is disabled when MSC3861 is enabled. [See #15582](https://github.com/matrix-org/synapse/pull/15582)
Changes the password of another user. This will automatically log the user out of all their devices.
The api is:
@ -396,6 +424,8 @@ The parameter `logout_devices` is optional and defaults to `true`.
## Get whether a user is a server administrator or not
**Note:** This API is disabled when MSC3861 is enabled. [See #15582](https://github.com/matrix-org/synapse/pull/15582)
The api is:
```
@ -413,6 +443,8 @@ A response body like the following is returned:
## Change whether a user is a server administrator or not
**Note:** This API is disabled when MSC3861 is enabled. [See #15582](https://github.com/matrix-org/synapse/pull/15582)
Note that you cannot demote yourself.
The api is:
@ -586,6 +618,16 @@ A response body like the following is returned:
"quarantined_by": null,
"safe_from_quarantine": false,
"upload_name": "test2.png"
},
{
"created_ts": 300400,
"last_access_ts": 300700,
"media_id": "BzYNLRUgGHphBkdKGbzXwbjX",
"media_length": 1337,
"media_type": "application/octet-stream",
"quarantined_by": null,
"safe_from_quarantine": false,
"upload_name": null
}
],
"next_token": 3,
@ -647,16 +689,17 @@ The following fields are returned in the JSON response body:
- `media` - An array of objects, each containing information about a media.
Media objects contain the following fields:
- `created_ts` - integer - Timestamp when the content was uploaded in ms.
- `last_access_ts` - integer - Timestamp when the content was last accessed in ms.
- `last_access_ts` - integer or null - Timestamp when the content was last accessed in ms.
Null if the media has not been accessed yet.
- `media_id` - string - The id used to refer to the media. Details about the format
are documented under
[media repository](../media_repository.md).
- `media_length` - integer - Length of the media in bytes.
- `media_type` - string - The MIME-type of the media.
- `quarantined_by` - string - The user ID that initiated the quarantine request
for this media.
- `quarantined_by` - string or null - The user ID that initiated the quarantine request
for this media. Null if not quarantined.
- `safe_from_quarantine` - bool - Status if this media is safe from quarantining.
- `upload_name` - string - The name the media was uploaded with.
- `upload_name` - string or null - The name the media was uploaded with. Null if not provided during upload.
- `next_token`: integer - Indication for pagination. See above.
- `total` - integer - Total number of media.
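Because several media fields may be null, client code benefits from typing them explicitly. The following is a sketch of a hypothetical `MediaInfo` container; it is not part of Synapse, and the field names simply mirror the list above.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class MediaInfo:
    created_ts: int
    last_access_ts: Optional[int]  # None if never accessed
    media_id: str
    media_length: int
    media_type: str
    quarantined_by: Optional[str]  # None if not quarantined
    safe_from_quarantine: bool
    upload_name: Optional[str]  # None if no name was given on upload

    @classmethod
    def from_json(cls, item: Dict[str, Any]) -> "MediaInfo":
        """Build from one decoded media object, ignoring unknown keys."""
        return cls(**{f.name: item[f.name] for f in fields(cls)})

    @property
    def is_quarantined(self) -> bool:
        return self.quarantined_by is not None

    @property
    def label(self) -> str:
        """Fall back to the media ID when no upload name exists."""
        return self.upload_name or self.media_id
```

Testing with `is not None` rather than truthiness keeps a legitimate `0` timestamp from being mistaken for a missing value.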
@ -706,6 +749,8 @@ delete largest/smallest or newest/oldest files first.
## Login as a user
**Note:** This API is disabled when MSC3861 is enabled. [See #15582](https://github.com/matrix-org/synapse/pull/15582)
Get an access token that can be used to authenticate as that user. Useful for
when admins wish to do actions on behalf of a user.
@ -718,7 +763,8 @@ POST /_synapse/admin/v1/users/<user_id>/login
An optional `valid_until_ms` field can be specified in the request body as an
integer timestamp that specifies when the token should expire. By default tokens
do not expire.
do not expire. Note that this API does not allow a user to log in as themselves
(to create more tokens).
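A request with an expiry could be prepared as in this sketch, which builds the request without sending it. `build_login_request` and its arguments are hypothetical, `valid_until_ms` is assumed to be a Unix timestamp in milliseconds, and the self-login check only mirrors the restriction described above on the client side.

```python
import json
import time
from typing import Any, Dict, Optional
from urllib.parse import quote
from urllib.request import Request


def build_login_request(
    base_url: str,
    admin_token: str,
    admin_user_id: str,
    target_user_id: str,
    valid_for_ms: Optional[int] = None,
    now_ms: Optional[int] = None,
) -> Request:
    """Prepare a login-as-user request, optionally with an expiring token."""
    if target_user_id == admin_user_id:
        raise ValueError("an admin cannot use this API to log in as themselves")
    body: Dict[str, Any] = {}
    if valid_for_ms is not None:
        current = int(time.time() * 1000) if now_ms is None else now_ms
        # valid_until_ms is an absolute timestamp, not a duration.
        body["valid_until_ms"] = current + valid_for_ms
    url = (
        f"{base_url}/_synapse/admin/v1/users/"
        f"{quote(target_user_id, safe='')}/login"
    )
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {admin_token}",
            "Content-Type": "application/json",
        },
    )
```

Leaving `valid_for_ms` unset sends an empty JSON object, so the resulting token does not expire.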
A response body like the following is returned:
@ -738,6 +784,43 @@ Note: The token will expire if the *admin* user calls `/logout/all` from any
of their devices, but the token will *not* expire if the target user does the
same.
## Allow replacing master cross-signing key without User-Interactive Auth
This endpoint is not intended for server administrator usage;
we describe it here for completeness.
This API temporarily permits a user to replace their master cross-signing key
- Fix a bug which could cause the background database update hander for event labels to get stuck in a loop raising exceptions. ([\#6407](https://github.com/matrix-org/synapse/issues/6407))
- Fix a bug which could cause the background database update handler for event labels to get stuck in a loop raising exceptions. ([\#6407](https://github.com/matrix-org/synapse/issues/6407))
Synapse 1.6.0rc1 (2019-11-20)
@ -191,7 +191,7 @@ Bugfixes
- Appservice requests will no longer contain a double slash prefix when the appservice url provided ends in a slash. ([\#6306](https://github.com/matrix-org/synapse/issues/6306))
- Fix the `hidden` field in the `devices` table for SQLite versions prior to 3.23.0. ([\#6313](https://github.com/matrix-org/synapse/issues/6313))
- Fix bug which casued rejected events to be persisted with the wrong room state. ([\#6320](https://github.com/matrix-org/synapse/issues/6320))
- Fix bug which caused rejected events to be persisted with the wrong room state. ([\#6320](https://github.com/matrix-org/synapse/issues/6320))
- Fix bug where `rc_login` ratelimiting would prematurely kick in. ([\#6335](https://github.com/matrix-org/synapse/issues/6335))
- Prevent the server taking a long time to start up when guest registration is enabled. ([\#6338](https://github.com/matrix-org/synapse/issues/6338))
- Fix bug where upgrading a guest account to a full user would fail when account validity is enabled. ([\#6359](https://github.com/matrix-org/synapse/issues/6359))
@ -232,7 +232,7 @@ Internal Changes
- Add some documentation about worker replication. ([\#6305](https://github.com/matrix-org/synapse/issues/6305))
- Move admin endpoints into separate files. Contributed by Awesome Technologies Innovationslabor GmbH. ([\#6308](https://github.com/matrix-org/synapse/issues/6308))
- Document the use of `lint.sh` for code style enforcement & extend it to run on specified paths only. ([\#6312](https://github.com/matrix-org/synapse/issues/6312))
- Add optional python dependencies and dependant binary libraries to snapcraft packaging. ([\#6317](https://github.com/matrix-org/synapse/issues/6317))
- Add optional python dependencies and dependent binary libraries to snapcraft packaging. ([\#6317](https://github.com/matrix-org/synapse/issues/6317))
- Remove the dependency on psutil and replace functionality with the stdlib `resource` module. ([\#6318](https://github.com/matrix-org/synapse/issues/6318), [\#6336](https://github.com/matrix-org/synapse/issues/6336))
- Improve documentation for EventContext fields. ([\#6319](https://github.com/matrix-org/synapse/issues/6319))
- Add some checks that we aren't using state from rejected events. ([\#6330](https://github.com/matrix-org/synapse/issues/6330))
@ -653,7 +653,7 @@ Internal Changes
- Return 502 not 500 when failing to reach any remote server. ([\#5810](https://github.com/matrix-org/synapse/issues/5810))
- Reduce global pauses in the events stream caused by expensive state resolution during persistence. ([\#5826](https://github.com/matrix-org/synapse/issues/5826))
- Add a lower bound to well-known lookup cache time to avoid repeated lookups. ([\#5836](https://github.com/matrix-org/synapse/issues/5836))
- Whitelist history visbility sytests in worker mode tests. ([\#5843](https://github.com/matrix-org/synapse/issues/5843))
- Whitelist history visibility sytests in worker mode tests. ([\#5843](https://github.com/matrix-org/synapse/issues/5843))
Synapse 1.2.1 (2019-07-26)
@ -817,7 +817,7 @@ See the [upgrade notes](docs/upgrade.md#upgrading-to-v110) for more details.
Features
--------
- Added possibilty to disable local password authentication. Contributed by Daniel Hoffend. ([\#5092](https://github.com/matrix-org/synapse/issues/5092))
- Added possibility to disable local password authentication. Contributed by Daniel Hoffend. ([\#5092](https://github.com/matrix-org/synapse/issues/5092))
- Add monthly active users to phonehome stats. ([\#5252](https://github.com/matrix-org/synapse/issues/5252))
- Allow expired user to trigger renewal email sending manually. ([\#5363](https://github.com/matrix-org/synapse/issues/5363))
- Statistics on forward extremities per room are now exposed via Prometheus. ([\#5384](https://github.com/matrix-org/synapse/issues/5384), [\#5458](https://github.com/matrix-org/synapse/issues/5458), [\#5461](https://github.com/matrix-org/synapse/issues/5461))
@ -850,7 +850,7 @@ Bugfixes
- Fix bug where clients could tight loop calling `/sync` for a period. ([\#5507](https://github.com/matrix-org/synapse/issues/5507))
- Fix bug with `jinja2` preventing Synapse from starting. Users who had this problem should now simply need to run `pip install matrix-synapse`. ([\#5514](https://github.com/matrix-org/synapse/issues/5514))
- Fix a regression where homeservers on private IP addresses were incorrectly blacklisted. ([\#5523](https://github.com/matrix-org/synapse/issues/5523))
- Fixed m.login.jwt using unregistred user_id and added pyjwt>=1.6.4 as jwt conditional dependencies. Contributed by Pau Rodriguez-Estivill. ([\#5555](https://github.com/matrix-org/synapse/issues/5555), [\#5586](https://github.com/matrix-org/synapse/issues/5586))
- Fixed m.login.jwt using unregistered user_id and added pyjwt>=1.6.4 as jwt conditional dependencies. Contributed by Pau Rodriguez-Estivill. ([\#5555](https://github.com/matrix-org/synapse/issues/5555), [\#5586](https://github.com/matrix-org/synapse/issues/5586))
- Fix a bug that would cause invited users to receive several emails for a single 3PID invite in case the inviter is rate limited. ([\#5576](https://github.com/matrix-org/synapse/issues/5576))
- Type hints for `RegistrationStore`. ([\#8615](https://github.com/matrix-org/synapse/issues/8615))
- Change schema to support access tokens belonging to one user but granting access to another. ([\#8616](https://github.com/matrix-org/synapse/issues/8616))
- Fix a bug which caused the logging system to report errors, if `DEBUG` was enabled and no `context` filter was applied. ([\#8278](https://github.com/matrix-org/synapse/issues/8278))
- Fix edge case where push could get delayed for a user until a later event was pushed. ([\#8287](https://github.com/matrix-org/synapse/issues/8287))
- Fix fetching malformed events from remote servers. ([\#8324](https://github.com/matrix-org/synapse/issues/8324))
- Fix `UnboundLocalError` from occuring when appservices send a malformed register request. ([\#8329](https://github.com/matrix-org/synapse/issues/8329))
- Fix `UnboundLocalError` from occurring when appservices send a malformed register request. ([\#8329](https://github.com/matrix-org/synapse/issues/8329))
- Don't send push notifications to expired user accounts. ([\#8353](https://github.com/matrix-org/synapse/issues/8353))
- Fix a regression in v1.19.0 with reactivating users through the admin API. ([\#8362](https://github.com/matrix-org/synapse/issues/8362))
- Fix a bug where during device registration the length of the device name wasn't limited. ([\#8364](https://github.com/matrix-org/synapse/issues/8364))
@ -815,7 +815,7 @@ Bugfixes
- Fix a bug introduced in Synapse v1.7.2 which caused inaccurate membership counts in the room directory. ([\#7977](https://github.com/matrix-org/synapse/issues/7977))
- Fix a long standing bug: 'Duplicate key value violates unique constraint "event_relations_id"' when message retention is configured. ([\#7978](https://github.com/matrix-org/synapse/issues/7978))
- Fix "no create event in auth events" when trying to reject invitation after inviter leaves. Bug introduced in Synapse v1.10.0. ([\#7980](https://github.com/matrix-org/synapse/issues/7980))
- Fix various comments and minor discrepencies in server notices code. ([\#7996](https://github.com/matrix-org/synapse/issues/7996))
- Fix various comments and minor discrepancies in server notices code. ([\#7996](https://github.com/matrix-org/synapse/issues/7996))
- Fix a long standing bug where HTTP HEAD requests resulted in a 400 error. ([\#7999](https://github.com/matrix-org/synapse/issues/7999))
- Fix a long-standing bug which caused two copies of some log lines to be written when synctl was used along with a MemoryHandler logger. ([\#8011](https://github.com/matrix-org/synapse/issues/8011), [\#8012](https://github.com/matrix-org/synapse/issues/8012))
@ -1460,7 +1460,7 @@ Bugfixes
- Transfer alias mappings on room upgrade. ([\#6946](https://github.com/matrix-org/synapse/issues/6946))
- Ensure that a user interactive authentication session is tied to a single request. ([\#7068](https://github.com/matrix-org/synapse/issues/7068), [\#7455](https://github.com/matrix-org/synapse/issues/7455))
- Fix a bug in the federation API which could cause occasional "Failed to get PDU" errors. ([\#7089](https://github.com/matrix-org/synapse/issues/7089))
- Return the proper error (`M_BAD_ALIAS`) when a non-existant canonical alias is provided. ([\#7109](https://github.com/matrix-org/synapse/issues/7109))
- Return the proper error (`M_BAD_ALIAS`) when a non-existent canonical alias is provided. ([\#7109](https://github.com/matrix-org/synapse/issues/7109))
- Fix a bug which meant that groups updates were not correctly replicated between workers. ([\#7117](https://github.com/matrix-org/synapse/issues/7117))
- Fix starting workers when federation sending not split out. ([\#7133](https://github.com/matrix-org/synapse/issues/7133))
- Ensure `is_verified` is a boolean in responses to `GET /_matrix/client/r0/room_keys/keys`. Also warn the user if they forgot the `version` query param. ([\#7150](https://github.com/matrix-org/synapse/issues/7150))
@ -1482,7 +1482,7 @@ Bugfixes
- Fix bad error handling that would cause Synapse to crash if it's provided with a YAML configuration file that's either empty or doesn't parse into a key-value map. ([\#7341](https://github.com/matrix-org/synapse/issues/7341))
- Fix incorrect metrics reporting for `renew_attestations` background task. ([\#7344](https://github.com/matrix-org/synapse/issues/7344))
- Prevent non-federating rooms from appearing in responses to federated `POST /publicRoom` requests when a filter was included. ([\#7367](https://github.com/matrix-org/synapse/issues/7367))
- Fix a bug which would cause the room durectory to be incorrectly populated if Synapse was upgraded directly from v1.2.1 or earlier to v1.4.0 or later. Note that this fix does not apply retrospectively; see the [upgrade notes](docs/upgrade.md#upgrading-to-v1130) for more information. ([\#7387](https://github.com/matrix-org/synapse/issues/7387))
- Fix a bug which would cause the room directory to be incorrectly populated if Synapse was upgraded directly from v1.2.1 or earlier to v1.4.0 or later. Note that this fix does not apply retrospectively; see the [upgrade notes](docs/upgrade.md#upgrading-to-v1130) for more information. ([\#7387](https://github.com/matrix-org/synapse/issues/7387))
- Fix bug in `EventContext.deserialize`. ([\#7393](https://github.com/matrix-org/synapse/issues/7393))
@ -1638,7 +1638,7 @@ Security advisory
-----------------
Synapse may be vulnerable to request-smuggling attacks when it is used with a
reverse-proxy. The vulnerabilties are fixed in Twisted 20.3.0, and are
reverse-proxy. The vulnerabilities are fixed in Twisted 20.3.0, and are
- Refactoring work in preparation for changing the event redaction algorithm. ([\#6874](https://github.com/matrix-org/synapse/issues/6874), [\#6875](https://github.com/matrix-org/synapse/issues/6875), [\#6983](https://github.com/matrix-org/synapse/issues/6983), [\#7003](https://github.com/matrix-org/synapse/issues/7003))
- Improve performance of v2 state resolution for large rooms. ([\#6952](https://github.com/matrix-org/synapse/issues/6952), [\#7095](https://github.com/matrix-org/synapse/issues/7095))
- Reduce time spent doing GC, by freezing objects on startup. ([\#6953](https://github.com/matrix-org/synapse/issues/6953))
- Minor perfermance fixes to `get_auth_chain_ids`. ([\#6954](https://github.com/matrix-org/synapse/issues/6954))
- Minor performance fixes to `get_auth_chain_ids`. ([\#6954](https://github.com/matrix-org/synapse/issues/6954))
- Don't record remote cross-signing keys in the `devices` table. ([\#6956](https://github.com/matrix-org/synapse/issues/6956))
- Use flake8-comprehensions to enforce good hygiene of list/set/dict comprehensions. ([\#6957](https://github.com/matrix-org/synapse/issues/6957))
- Allow URL-encoded User IDs on `/_synapse/admin/v2/users/<user_id>[/admin]` endpoints. Thanks to @NHAS for reporting. ([\#6825](https://github.com/matrix-org/synapse/issues/6825))
- Fix Synapse refusing to start if `federation_certificate_verification_whitelist` option is blank. ([\#6849](https://github.com/matrix-org/synapse/issues/6849))
- Fix errors from logging in the purge jobs related to the message retention policies support. ([\#6945](https://github.com/matrix-org/synapse/issues/6945))
- Return a 404 instead of 200 for querying information of a non-existant user through the admin API. ([\#6901](https://github.com/matrix-org/synapse/issues/6901))
- Return a 404 instead of 200 for querying information of a non-existent user through the admin API. ([\#6901](https://github.com/matrix-org/synapse/issues/6901))
Updates to the Docker image
@ -1889,7 +1889,7 @@ Bugfixes
Synapse 1.10.0rc4 (2020-02-11)
==============================
This release candidate was built incorrectly and is superceded by 1.10.0rc5.
This release candidate was built incorrectly and is superseded by 1.10.0rc5.
- Fix spurious errors in logs when deleting a non-existant pusher. ([\#9121](https://github.com/matrix-org/synapse/issues/9121))
- Fix spurious errors in logs when deleting a non-existent pusher. ([\#9121](https://github.com/matrix-org/synapse/issues/9121))
- Fix a long-standing bug where Synapse would return a 500 error when a thumbnail did not exist (and auto-generation of thumbnails was not enabled). ([\#9163](https://github.com/matrix-org/synapse/issues/9163))
- Fix a long-standing bug where an internal server error was raised when attempting to preview an HTML document in an unknown character encoding. ([\#9164](https://github.com/matrix-org/synapse/issues/9164))
- Fix a long-standing bug where invalid data could cause errors when calculating the presentable room name for push. ([\#9165](https://github.com/matrix-org/synapse/issues/9165))
@ -2522,7 +2522,7 @@ Bugfixes
- Fix a long-standing bug where a `m.image` event without a `url` would cause errors on push. ([\#8965](https://github.com/matrix-org/synapse/issues/8965))
- Fix a small bug in v2 state resolution algorithm, which could also cause performance issues for rooms with large numbers of power levels. ([\#8971](https://github.com/matrix-org/synapse/issues/8971))
- Add validation to the `sendToDevice` API to raise a missing parameters error instead of a 500 error. ([\#8975](https://github.com/matrix-org/synapse/issues/8975))
- Add validation of group IDs to raise a 400 error instead of a 500 eror. ([\#8977](https://github.com/matrix-org/synapse/issues/8977))
- Add validation of group IDs to raise a 400 error instead of a 500 error. ([\#8977](https://github.com/matrix-org/synapse/issues/8977))
@ -22,6 +22,9 @@ on Windows is not officially supported.
The code of Synapse is written in Python 3. To do pretty much anything, you'll need [a recent version of Python 3](https://www.python.org/downloads/). Your Python also needs support for [virtual environments](https://docs.python.org/3/library/venv.html). This is usually built-in, but some Linux distributions like Debian and Ubuntu split it out into its own package. Running `sudo apt install python3-venv` should be enough.
A recent version of the Rust compiler is needed to build the native modules. The
easiest way of installing the latest version is to use [rustup](https://rustup.rs/).
Synapse can connect to PostgreSQL via the [psycopg2](https://pypi.org/project/psycopg2/) Python library. Building this library from source requires access to PostgreSQL's C header files. On Debian or Ubuntu Linux, these can be installed with `sudo apt install libpq-dev`.
Synapse has an optional, improved user search with better Unicode support. For that you need the development package of `libicu`. On Debian or Ubuntu Linux, this can be installed with `sudo apt install libicu-dev`.
@ -30,9 +33,6 @@ The source code of Synapse is hosted on GitHub. You will also need [a recent ver
For some tests, you will need [a recent version of Docker](https://docs.docker.com/get-docker/).
A recent version of the Rust compiler is needed to build the native modules. The
easiest way of installing the latest version is to use [rustup](https://rustup.rs/).
# 3. Get the source.
@ -53,6 +53,11 @@ can find many good git tutorials on the web.
# 4. Install the dependencies
Before installing the Python dependencies, make sure you have installed a recent version
of Rust (see the "What do I need?" section above). The easiest way of installing the
latest version is to use [rustup](https://rustup.rs/).
Synapse uses the [poetry](https://python-poetry.org/) project to manage its dependencies
and development environment. Once you have installed Python 3 and added the
source, you should install `poetry`.
@ -61,13 +66,13 @@ Of their installation methods, we recommend
```shell
pip install --user pipx
pipx install poetry
pipx install poetry==1.5.1 # Problems with Poetry 1.6, see https://github.com/matrix-org/synapse/issues/16147
```
but see poetry's [installation instructions](https://python-poetry.org/docs/#installation)
for other installation methods.
Synapse requires Poetry version 1.2.0 or later.
Developing Synapse requires Poetry version 1.3.2 or later.
Next, open a terminal and install dependencies as follows:
@ -76,8 +81,39 @@ cd path/where/you/have/cloned/the/repository
poetry install --extras all
```
This will install the runtime and developer dependencies for the project.
This will install the runtime and developer dependencies for the project. Be sure to check
that the `poetry install` step completed cleanly.
## Running Synapse via poetry
To start a local instance of Synapse in the locked poetry environment, create a config file:
```sh
cp docs/sample_config.yaml homeserver.yaml
cp docs/sample_log_config.yaml log_config.yaml
```
Now edit `homeserver.yaml`. Things you might want to change include:
- Setting a `server_name`
- Adjusting paths to be correct for your system, such as pointing `log_config` at the log config you just copied
- Using a [PostgreSQL database instead of SQLite](https://matrix-org.github.io/synapse/latest/usage/configuration/config_documentation.html#database)
- Adding a [`registration_shared_secret`](https://matrix-org.github.io/synapse/latest/usage/configuration/config_documentation.html#registration_shared_secret) so you can use the [`register_new_matrix_user` command](https://matrix-org.github.io/synapse/latest/setup/installation.html#registering-a-user).
And then run Synapse with the following command:
```sh
poetry run python -m synapse.app.homeserver -c homeserver.yaml
```
Boolean columns require special treatment, since SQLite treats booleans the
same as integers.
There are three separate aspects to this:
* Any new boolean column must be added to the `BOOLEAN_COLUMNS` list in
`synapse/_scripts/synapse_port_db.py`. This tells the port script to cast
the integer value from SQLite to a boolean before writing the value to the
postgres database.
* Before SQLite 3.23, `TRUE` and `FALSE` were not recognised as constants by
SQLite, and the `IS [NOT] TRUE`/`IS [NOT] FALSE` operators were not
supported. This makes it necessary to avoid using `TRUE` and `FALSE`
constants in SQL commands.
For example, to insert a `TRUE` value into the database, write:
```python
txn.execute("INSERT INTO tbl(col) VALUES (?)", (True, ))
```
* Default values for new boolean columns present a particular
difficulty. Generally it is best to create separate schema files for
Postgres and SQLite. For example:
```sql
# in 00delta.sql.postgres:
ALTER TABLE tbl ADD COLUMN col BOOLEAN DEFAULT FALSE;
```
```sql
# in 00delta.sql.sqlite:
ALTER TABLE tbl ADD COLUMN col BOOLEAN DEFAULT 0;
```
Note that there is a particularly insidious failure mode here: the Postgres
flavour will be accepted by SQLite 3.22, but will give a column whose
default value is the **string** `"FALSE"` - which, when cast back to a boolean
in Python, evaluates to `True`.
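The failure mode above can be reproduced in plain Python. This is a small illustration rather than Synapse code: any non-empty string is truthy, so a default stored as the string `"FALSE"` reads back as `True`, whereas the integer default `0` reads back as `False`. The variable names are made up for the example.

```python
# Values as they might come back from SQLite for the new column's default.
string_default = "FALSE"  # what the Postgres-flavoured delta yields on SQLite 3.22
integer_default = 0       # what the SQLite-flavoured delta (`DEFAULT 0`) yields

# bool() on a non-empty string is always True, whatever the string says.
string_as_bool = bool(string_default)
integer_as_bool = bool(integer_default)

print(string_as_bool, integer_as_bool)
```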
## `event_id` global uniqueness
@ -216,3 +245,160 @@ version `3`, that can only happen with a hash collision, which we basically hope
will never happen (SHA256 has a massive big key space).
## Worked examples of gradual migrations
Some migrations need to be performed gradually. A prime example of this is anything
which would need to do a large table scan — including adding columns, indices or
`NOT NULL` constraints to non-empty tables — such a migration should be done as a
background update where possible, at least on Postgres.
We can afford to be more relaxed about SQLite databases since they are usually
used on smaller deployments and SQLite does not support the same concurrent
DDL operations as Postgres.
We also typically insist on having at least one Synapse version's worth of
backwards compatibility, so that administrators can roll back Synapse if an upgrade
did not go smoothly.
This sometimes results in having to plan a migration across multiple versions
of Synapse.
This section includes an example and may include more in the future.
### Transforming a column into another one, with `NOT NULL` constraints
This example illustrates how you would introduce a new column, write data into it
based on data from an old column and then drop the old column.
We are aiming for semantic equivalence to:
```sql
ALTER TABLE mytable ADD COLUMN new_column INTEGER;
UPDATE mytable SET new_column = old_column * 100;
ALTER TABLE mytable ALTER COLUMN new_column SET NOT NULL;
ALTER TABLE mytable DROP COLUMN old_column;
```
#### Synapse version `N`
```python
SCHEMA_VERSION = S
SCHEMA_COMPAT_VERSION = ... # unimportant at this stage
```
**Invariants:**
1. `old_column` is read by Synapse and written to by Synapse.
#### Synapse version `N + 1`
```python
SCHEMA_VERSION = S + 1
SCHEMA_COMPAT_VERSION = ... # unimportant at this stage
```
**Changes:**
1.
```sql
ALTER TABLE mytable ADD COLUMN new_column INTEGER;
```
**Invariants:**
1. `old_column` is read by Synapse and written to by Synapse.
2. `new_column` is written to by Synapse.
**Notes:**
1. `new_column` can't have a `NOT NULL NOT VALID` constraint yet, because the previous Synapse version did not write to the new column (since we haven't bumped the `SCHEMA_COMPAT_VERSION` yet, we still need to be compatible with the previous version).
#### Synapse version `N + 2`
```python
SCHEMA_VERSION = S + 2
SCHEMA_COMPAT_VERSION = S + 1 # this signals that we can't roll back to a time before new_column existed
```
**Changes:**
1. On Postgres, add a `NOT VALID` constraint to ensure new rows are compliant. *SQLite does not have such a construct, but it would be unnecessary anyway since there is no way to concurrently perform this migration on SQLite.*
```sql
ALTER TABLE mytable ADD CONSTRAINT new_column_not_null CHECK (new_column IS NOT NULL) NOT VALID;
```
2. Start a background update to perform migration: it should gradually run e.g.
```sql
UPDATE mytable SET new_column = old_column * 100 WHERE 0 < mytable_id AND mytable_id <= 5;
```
This background update is technically pointless on SQLite, but you must schedule it anyway so that the `portdb` script to migrate to Postgres still works.
3. Upon completion of the background update, you should run `VALIDATE CONSTRAINT` on Postgres to turn the `NOT VALID` constraint into a valid one.
```sql
ALTER TABLE mytable VALIDATE CONSTRAINT new_column_not_null;
```
This will take some time but does **NOT** hold an exclusive lock over the table.
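The gradual `UPDATE` from step 2 can be sketched as a loop over ID ranges. This is a minimal illustration using Python's built-in `sqlite3` module and an in-memory table; it is not Synapse's background update machinery, and the helper names `run_batch` and `migrate_all` and the batch size are assumptions made for the example.

```python
import sqlite3


def run_batch(conn, last_id, batch_size):
    """Migrate rows with last_id < mytable_id <= last_id + batch_size.

    Returns the upper bound of the range that was just processed.
    """
    upper = last_id + batch_size
    with conn:  # each batch is its own short transaction
        conn.execute(
            "UPDATE mytable SET new_column = old_column * 100 "
            "WHERE ? < mytable_id AND mytable_id <= ?",
            (last_id, upper),
        )
    return upper


def migrate_all(conn, batch_size=5):
    """Run batches until every existing row has been covered."""
    (max_id,) = conn.execute(
        "SELECT COALESCE(MAX(mytable_id), 0) FROM mytable"
    ).fetchone()
    last_id = 0
    batches = 0
    while last_id < max_id:
        last_id = run_batch(conn, last_id, batch_size)
        batches += 1
    return batches


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable ("
    "mytable_id INTEGER PRIMARY KEY, old_column INTEGER, new_column INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable (mytable_id, old_column) VALUES (?, ?)",
    [(i, i) for i in range(1, 13)],
)
conn.commit()

batches = migrate_all(conn, batch_size=5)
remaining = conn.execute(
    "SELECT COUNT(*) FROM mytable WHERE new_column IS NULL"
).fetchone()[0]
```

Keeping each batch in its own transaction means the work can be interrupted and resumed from the last processed ID, which is the property a background update needs.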
**Invariants:**
1. `old_column` is read by Synapse and written to by Synapse.
2. `new_column` is written to by Synapse and new rows always have a non-`NULL` value in this field.
**Notes:**
1. If you wish, you can convert the `CHECK (new_column IS NOT NULL)` to a `NOT NULL` constraint free of charge in Postgres by adding the `NOT NULL` constraint and then dropping the `CHECK` constraint, because Postgres can statically verify that the `NOT NULL` constraint is implied by the `CHECK` constraint without performing a table scan.
2. It might be tempting to make version `N + 2` redundant by moving the background update to `N + 1` and delaying adding the `NOT NULL` constraint to `N + 3`, but that would mean the constraint would always be validated in the foreground in `N + 3`. Whereas if the `N + 2` step is kept, the migration in `N + 3` would be fast in the happy case.
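The `SCHEMA_COMPAT_VERSION` bump in this version can be read as a rollback guard. The sketch below assumes a simple rule, namely that a Synapse release can only run against a database whose recorded compatibility version is no greater than that release's own `SCHEMA_VERSION`. The function name `can_run_against` and the value chosen for `S` are hypothetical and only serve the illustration.

```python
def can_run_against(code_schema_version: int, db_compat_version: int) -> bool:
    """Assumed rule: the code must be at least as new as the DB's compat version."""
    return db_compat_version <= code_schema_version


S = 70  # arbitrary stand-in for the schema version of Synapse version N

# After upgrading to N + 2, the database records a compat version of S + 1.
db_compat_version = S + 1

rollback_to_n_plus_1 = can_run_against(S + 1, db_compat_version)  # allowed
rollback_to_n = can_run_against(S, db_compat_version)             # refused
```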
#### Synapse version `N + 3`
```python
SCHEMA_VERSION = S + 3
SCHEMA_COMPAT_VERSION = S + 1 # we can't roll back to a time before new_column existed
```
**Changes:**
1. (Postgres) Update the table to populate values of `new_column` in case the background update had not completed. Additionally, `VALIDATE CONSTRAINT` to make the check fully valid.
```sql
-- you ideally want an index on `new_column` or e.g. `(new_column) WHERE new_column IS NULL` first, or perhaps you can find a way to skip this if the `NOT NULL` constraint has already been validated.
UPDATE mytable SET new_column = old_column * 100 WHERE new_column IS NULL;
-- this is a no-op if it already ran as part of the background update
ALTER TABLE mytable VALIDATE CONSTRAINT new_column_not_null;
```
2. (SQLite) Recreate the table by precisely following [the 12-step procedure for SQLite table schema changes](https://www.sqlite.org/lang_altertable.html#otheralter).
During this table rewrite, you should recreate `new_column` as `NOT NULL` and populate any outstanding `NULL` values at the same time.
Unfortunately, you can't drop `old_column` yet because it must be present for compatibility with the Postgres schema, as needed by `portdb`.
(Otherwise you could do this all in one go with SQLite!)
**Invariants:**
1. `old_column` is written to by Synapse (but no longer read by Synapse!).
2. `new_column` is read by Synapse and written to by Synapse. Moreover, all rows have a non-`NULL` value in this field, as guaranteed by a schema constraint.
**Notes:**
1. We can't drop `old_column` yet, or even stop writing to it, because that would break a rollback to the previous version of Synapse.
2. Application code can now rely on `new_column` being populated. The remaining steps are only motivated by the wish to clean up old columns.
#### Synapse version `N + 4`
```python
SCHEMA_VERSION = S + 4
SCHEMA_COMPAT_VERSION = S + 3 # we can't roll back to a time before new_column was entirely non-NULL
```
**Invariants:**
1. `old_column` exists but is not written to or read from by Synapse.
2. `new_column` is read by Synapse and written to by Synapse. Moreover, all rows have a non-`NULL` value in this field, as guaranteed by a schema constraint.
**Notes:**
1. We can't drop `old_column` yet because that would break a rollback to the previous version of Synapse. \
**TODO:** It may be possible to relax this and drop the column straight away as long as the previous version of Synapse detected a rollback occurred and stopped attempting to write to the column. This could possibly be done by checking whether the database's schema compatibility version was `S + 3`.
#### Synapse version `N + 5`
```python
SCHEMA_VERSION = S + 5
SCHEMA_COMPAT_VERSION = S + 4 # we can't roll back to a time before old_column was no longer being touched
```
This is a quick cheat sheet for developers on how to use [`poetry`](https://python-poetry.org/).
# Installing
See the [contributing guide](contributing_guide.md#4-install-the-dependencies).
Developers should use Poetry 1.3.2 or higher. If you encounter problems related
to poetry, please [double-check your poetry version](#check-the-version-of-poetry-with-poetry---version).
# Background
Synapse uses a variety of third-party Python packages to function as a homeserver.
@ -123,7 +130,7 @@ context of poetry's venv, without having to run `poetry shell` beforehand.
## ...reset my venv to the locked environment?
```shell
poetry install --all-extras --sync
```
## ...delete everything and start over from scratch?
@ -183,7 +190,6 @@ Either:
- manually update `pyproject.toml`; then `poetry lock --no-update`; or else
- `poetry add packagename`. See `poetry add --help`; note the `--dev`,
`--extras` and `--optional` flags in particular.
- **NB**: this specifies the new package with a version given by a "caret bound". This won't get forced to its lowest version in the old deps CI job: see [this TODO](https://github.com/matrix-org/synapse/blob/4e1374373857f2f7a911a31c50476342d9070681/.ci/scripts/test_old_deps.sh#L35-L39).
Include the updated `pyproject.toml` and `poetry.lock` files in your commit.
@ -196,7 +202,7 @@ poetry remove packagename
```
ought to do the trick. Alternatively, manually update `pyproject.toml` and
`poetry lock --no-update`. Include the updated `pyproject.toml` and `poetry.lock`
files in your commit.
## ...update the version range for an existing dependency?
@ -240,9 +246,6 @@ poetry export --extras all
Be wary of bugs in `poetry export` and `pip install -r requirements.txt`.
Note: `poetry export` will be made a plugin in Poetry 1.2. Additional config may
be required.
## ...build a test wheel?
I usually use
@ -255,12 +258,28 @@ because [`build`](https://github.com/pypa/build) is a standardish tool which
doesn't require poetry. (It's what we use in CI too). However, you could try
`poetry build` too.
## ...handle a Dependabot pull request?
Synapse uses Dependabot to keep the `poetry.lock` and `Cargo.lock` files
up to date with the latest releases of our dependencies. The changelog check is
omitted for Dependabot PRs; the release script will include them in the
changelog.
When reviewing a dependabot PR, ensure that:
* the lockfile changes look reasonable;
* the upstream changelog file (linked in the description) doesn't include any
breaking changes;
* continuous integration passes.
In particular, any updates to the type hints (usually packages which start with `types-`)
should be safe to merge if linting passes.
# Troubleshooting
## Check the version of poetry with `poetry --version`.
The minimum version of poetry supported by Synapse is 1.3.2.
It can also be useful to check the version of `poetry-core` in use. If you've
installed `poetry` with `pipx`, try `pipx runpip poetry list | grep
). So any erroneous invite should be ignored by fully-joined
homeservers and resolved by the resync for partially-joined homeservers.
In more generality, there are two problems we're worrying about here:
- We might create an event that is valid under our partial state, only to later
find out that is actually invalid according to the full state.
- Or: we might refuse to create an event that is invalid under our partial
state, even though it would be perfectly valid under the full state.
However, we expect such problems to be unlikely in practice, because:
- We trust that the room has sensible power levels, e.g. that bad actors with
high power levels are demoted before their ban.
- We trust that the resident server provides us up-to-date power levels, join
rules, etc.
- State changes in rooms are relatively infrequent, and the resync period is
relatively quick.
#### Sending out the event over federation
**TODO:** needs prose fleshing out.
Normally: send out in a fed txn to all HSes in the room.
We only know that some HSes were in the room at some point. Wat do.
Send it out to the list of servers from the first join.
**TODO** what do we do here if we have full state?
If the prev event was created by us, we can risk sending it to the wrong HS. (Motivation: privacy concern of the content. Not such a big deal for a public room or an encrypted room. But non-encrypted invite-only...)
But don't want to send out sensitive data in other HS's events in this way.
Suppose we discover after resync that we shouldn't have sent out one of our events (not a prev_event) to a target HS. Not much we can do.
What about if we didn't send them an event but shouldn't've?
E.g. what if someone joined from a new HS shortly after you did? We wouldn't talk to them.
Could imagine sending out the "Missed" events after the resync but... painful to work out what they should have seen if they joined/left.
Instead, just send them the latest event (if they're still in the room after resync) and let them backfill.(?)
- Don't do this currently.
- If anyone who has received our messages sends a message to a HS we missed, they can backfill our messages
- Gap: rooms which are infrequently used and take a long time to resync.
(Rich was surprised we didn't just create it locally. Answer: to try and avoid
a join which then gets rejected after resync.)
We don't know for sure that any join we create would be accepted.
E.g. the joined user might have been banned; the join rules might have changed in a way that we didn't realise... some way in which the partial state was mistaken.
Instead, do another partial make-join/send-join handshake to confirm that the join works.
- Probably going to get a bunch of duplicate state events and auth events.... but the point of partial joins is that these should be small. Many are already persisted = good.
- What if the second send_join response includes a different list of resident HSes? Could ignore it.
- Could even have a special flag that says "just make me a join", i.e. don't bother giving me state or servers in room. Deffo want the auth chain tho.
- SQ: wrt device lists it's a lot safer to ignore it!!!!!
- What if the state at the second join is inconsistent with what we have? Ignore it?
</details>
### Leaving (and kicks and bans) after a partial join
**NB.** Not yet implemented.
<details>
When you're fully joined to a room, to have `U` leave a room their homeserver
needs to
- create a new leave event for `U` which will be accepted by other homeservers,
and
- send that event out to the homeservers in the federation.
Generally speaking, streams are a series of notifications that something in Synapse's database has changed that the application might need to respond to.
For example:
- The events stream reports new events (PDUs) that Synapse creates, or that Synapse accepts from another homeserver.
- The account data stream reports changes to users' [account data](https://spec.matrix.org/v1.7/client-server-api/#client-config).
- The to-device stream reports when a device has a new [to-device message](https://spec.matrix.org/v1.7/client-server-api/#send-to-device-messaging).
It is very helpful to understand the streams mechanism when working on any part of Synapse that needs to respond to changes—especially if those changes are made by different workers.
To that end, let's describe streams formally, paraphrasing from the docstring of [`AbstractStreamIdGenerator`](
A stream is an append-only log `T1, T2, ..., Tn, ...` of facts[^1] which grows over time.
Only "writers" can add facts to a stream, and there may be multiple writers.
Each fact has an ID, called its "stream ID".
Readers should only process facts in ascending stream ID order.
Roughly speaking, each stream is backed by a database table.
It should have a `stream_id` (or similar) bigint column holding stream IDs, plus additional columns as necessary to describe the fact.
Typically, a fact is expressed with a single row in its backing table.[^2]
Within a stream, no two facts may have the same stream_id.
> _Aside_. Some additional notes on streams' backing tables.
>
> 1. Rich would like to [ditch the backing tables](https://github.com/matrix-org/synapse/issues/13456).
> 2. The backing tables may have other uses.
> For example, the events table backs the events stream, and is read when processing new events.
> But old rows are read from the table all the time, whenever Synapse needs to look up some facts about an event.
> 3. Rich suspects that sometimes the stream is backed by multiple tables, so the stream proper is the union of those tables.
Stream writers can "reserve" a stream ID, and then later mark it as having been completed.
Stream writers need to track the completion of each stream fact.
In the happy case, completion means a fact has been written to the stream table.
But unhappy cases (e.g. transaction rollback due to an error) also count as completion.
Once completed, the rows written with that stream ID are fixed, and no new rows
will be inserted with that ID.
### Current stream ID
For any given stream reader (including writers themselves), we may define a per-writer current stream ID:
> A current stream ID _for a writer W_ is the largest stream ID such that
> all transactions added by W with equal or smaller ID have completed.
Similarly, there is a "linear" notion of current stream ID:
> A "linear" current stream ID is the largest stream ID such that
> all facts (added by any writer) with equal or smaller ID have completed.
Because different stream readers A and B learn about new facts at different times, A and B may disagree about current stream IDs.
Put differently: we should think of stream readers as being independent of each other, proceeding through a stream of facts at different rates.
The above definition does not give a unique current stream ID; in fact there can
be a range of current stream IDs. Synapse uses both the minimum and maximum IDs
for different purposes. Most often the maximum is used, as it's generally
beneficial for workers to advance their IDs as soon as possible. However, the
minimum is used in situations where e.g. another worker is going to wait until
the stream advances past a position.
**NB.** For both senses of "current", note that if a writer opens a transaction that never completes, the current stream ID will never advance beyond that writer's last written stream ID.
For single-writer streams, the per-writer current ID and the linear current ID are the same.
Both senses of current ID are monotonic, but they may "skip" or jump over IDs because facts complete out of order.
_Example_.
Consider a single-writer stream which is initially at ID 1.
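| Action | Current stream ID | Notes |
|------------|---|-------------------------------------------------|
| Reserve 2 | 1 | |
| Reserve 3 | 1 | |
| Complete 3 | 1 | current ID unchanged, waiting for 2 to complete |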
| Complete 2 | 3 | current ID jumps from 1 -> 3 |
| Reserve 4 | 3 | |
| Reserve 5 | 3 | |
| Reserve 6 | 3 | |
| Complete 5 | 3 | |
| Complete 4 | 5 | current ID jumps 3->5, even though 6 is pending |
| Complete 6 | 6 | |
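The same sequence can be replayed in code. The class below is a minimal sketch of a single-writer tracker written for this example; it is not the real `AbstractStreamIdGenerator`, and the class and method names are hypothetical. It applies the definition directly: the current ID is one less than the smallest ID still pending, or the highest reserved ID if nothing is pending.

```python
class SingleWriterTracker:
    """Toy model of reserve/complete for one writer (illustration only)."""

    def __init__(self, current):
        self.current = current
        self._max_reserved = current
        self._pending = set()

    def reserve(self):
        # Hand out the next stream ID and remember that it is in flight.
        self._max_reserved += 1
        self._pending.add(self._max_reserved)
        return self._max_reserved

    def complete(self, stream_id):
        # Completion covers both success and rollback.
        self._pending.remove(stream_id)
        if self._pending:
            # Everything below the smallest pending ID has completed.
            self.current = max(self.current, min(self._pending) - 1)
        else:
            self.current = self._max_reserved
        return self.current


tracker = SingleWriterTracker(current=1)
tracker.reserve()  # 2
tracker.reserve()  # 3
history = [tracker.complete(3), tracker.complete(2)]
for _ in range(3):
    tracker.reserve()  # 4, 5, 6
history += [tracker.complete(5), tracker.complete(4), tracker.complete(6)]

print(history)  # [1, 3, 3, 5, 6]
```

The current ID never decreases, but it skips from 1 to 3 and from 3 to 5 because completions arrive out of order.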
### Multi-writer streams
There are two ways to view a multi-writer stream.
1. Treat it as a collection of distinct single-writer streams, one
for each writer.
2. Treat it as a single stream.
The single stream (option 2) is conceptually simpler, and easier to represent (a single stream id).
However, it requires each reader to know about the entire set of writers, to ensure that readers don't erroneously advance their current stream position too early and miss a fact from an unknown writer.
In contrast, multiple parallel streams (option 1) are more complex, requiring more state to represent (map from writer to stream id).
The payoff for doing so is that readers can "peek" ahead to facts that completed on one writer no matter the state of the others, reducing latency.
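The latency difference between the two views can be illustrated with a small sketch. It assumes a reader that holds a map from writer to position, and it takes the minimum of those positions as a conservative single-stream position; the function names, writer names and sample facts are all made up for the example.

```python
from typing import Dict, List, Tuple

Fact = Tuple[str, int]  # (writer name, stream ID)


def visible_per_writer(facts: List[Fact], positions: Dict[str, int]) -> List[Fact]:
    """Option 1: a fact is visible once its own writer has passed it."""
    return [(w, sid) for (w, sid) in facts if sid <= positions[w]]


def visible_linear(facts: List[Fact], positions: Dict[str, int]) -> List[Fact]:
    """Option 2: a fact is visible once every known writer has passed it."""
    linear_position = min(positions.values())
    return [(w, sid) for (w, sid) in facts if sid <= linear_position]


# Writer B has reserved ID 3 but not completed it, so B is stuck at 2.
facts = [("A", 1), ("B", 2), ("A", 4), ("A", 5)]
positions = {"A": 5, "B": 2}

per_writer_view = visible_per_writer(facts, positions)
linear_view = visible_linear(facts, positions)
```

Here the per-writer view already exposes A's facts 4 and 5, while the linear view has to wait for B to complete ID 3.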
Note that a multi-writer stream can be viewed in both ways.
For example, the events stream is treated as multiple single-writer streams (option 1) by the sync handler, so that events are sent to clients as soon as possible.
But the background process that works through events treats them as a single linear stream.
Another useful example is the cache invalidation stream.
The facts this stream holds are instructions to "you should now invalidate these cache entries".
We only ever treat this as multiple single-writer streams as there is no important ordering between cache invalidations.
(Invalidations are self-contained facts; and the invalidations commute/are idempotent).
### Writing to streams
Writers need to track:
- their current position (i.e. their own per-writer stream ID).
- their facts currently awaiting completion.
At startup,
- the current position of that writer can be found by querying the database (which suggests that facts need to be written to the database atomically, in a transaction); and
- there are no facts awaiting completion.
To reserve a stream ID, call [`nextval`](https://www.postgresql.org/docs/current/functions-sequence.html) on the appropriate postgres sequence.
To write a fact to the stream: insert the appropriate rows to the appropriate backing table.
To complete a fact, first remove it from your map of facts currently awaiting completion.
Then, if no earlier fact is awaiting completion, the writer can advance its current position in that stream.
Upon doing so it should emit an `RDATA` message[^3], once for every fact between the old and the new stream ID.
### Subscribing to streams
Readers need to track the current position of every writer.
At startup, they can find this by contacting each writer with a `REPLICATE` message,
requesting that all writers reply describing their current position in their streams.
Writers reply with a `POSITION` message.
To learn about new facts, readers should listen for `RDATA` messages and process them to respond to the new fact.
The `RDATA` itself is not a self-contained representation of the fact;
readers will have to query the stream tables for the full details.
Readers must also advance their record of the writer's current position for that stream.
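A reader's bookkeeping can be sketched as follows. This is a simplified model and not Synapse's replication code: the class name, the method names and the shape of the messages (a writer name plus a stream ID) are assumptions, and the table lookup is reduced to recording which facts were seen.

```python
class StreamReader:
    """Toy reader tracking one position per writer (illustration only)."""

    def __init__(self):
        self.positions = {}   # writer name -> last known stream ID
        self.processed = []   # facts handled so far, in arrival order

    def on_position(self, writer, stream_id):
        # Reply to our startup request: record where the writer currently is.
        self.positions[writer] = stream_id

    def on_rdata(self, writer, stream_id):
        # A real reader would query the stream's backing table here to get
        # the full details of the fact; this sketch only records it.
        self.processed.append((writer, stream_id))
        # Advance our record of that writer's position, never moving backwards.
        self.positions[writer] = max(self.positions.get(writer, 0), stream_id)


reader = StreamReader()
reader.on_position("writer_a", 10)
reader.on_position("writer_b", 7)
reader.on_rdata("writer_a", 11)
reader.on_rdata("writer_b", 8)
reader.on_rdata("writer_a", 12)
```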
# Summary
In a nutshell: we have an append-only log with a "buffer/scratchpad" at the end where we have to wait for the sequence to be linear and contiguous.
---
[^1]: we use the word _fact_ here for two reasons.
Firstly, the word "event" is already heavily overloaded (PDUs, EDUs, account data, ...) and we don't need to make that worse.
Secondly, "fact" emphasises that the things we append to a stream cannot change after the fact.
[^2]: A fact might be expressed with 0 rows, e.g. if we opened a transaction to persist an event, but failed and rolled the transaction back before marking the fact as completed.
In principle a fact might be expressed with 2 or more rows; if so, each of those rows should share the fact's stream ID.
[^3]: This communication used to happen directly with the writers [over TCP](../../tcp_replication.md);
nowadays it's done via Redis's Pubsub.