Commit Graph

316 Commits

Author SHA1 Message Date
Jon Chambers 647a2aea64 Cache a reference to the OS management bean to avoid repeated lookups. 2020-08-10 12:56:37 -04:00
Jon Chambers 58e58ce51c Remove a candidate metric provider. 2020-08-10 11:03:20 -04:00
Ehren Kret 4b7e48d3ec
Override default ingestion URI for SignalFx (#131) 2020-08-07 15:29:42 -05:00
Ehren Kret 0e074d3a5a Copy SignalFxMeterRegistry into a new class to get better logging 2020-08-07 16:01:56 -04:00
Ehren Kret ea00224e7f
Add support for reporting metrics to signalfx (#129) 2020-08-07 11:10:31 -05:00
Jon Chambers 38293efe75 Keep a running count of the number of open websockets. 2020-08-06 16:07:34 -04:00
Jon Chambers 3286c5e174 Disable Redis persistence for tests. 2020-08-06 11:22:51 -04:00
Jon Chambers e0f8a28f38 Close connections before closing the whole cluster client. 2020-08-06 11:22:31 -04:00
Jon Chambers bf1b00b163 Drop a spurious RedisClusterClient. 2020-08-06 11:22:31 -04:00
Ehren Kret 4fa3a136ad
Remove arbitrary SMS and add a NANPA message service (#123)
* Remove arbitrary SMS code

This code has run its course and is no longer needed for now.

* Add elements to sample config that were left out

* Add a messaging service for NANPA

* Fixup sample config capitalization
2020-08-05 13:35:11 -05:00
Jon Chambers 178a6bd66e Log the top-level exception name and message when crawling badness happens. 2020-08-05 11:23:16 -04:00
Ehren Kret 57e1339230
Further restrict user agent pattern matching (#120)
* Further restrict user agent pattern matching

* Add static qualifier to method
2020-08-04 12:58:16 -05:00
Jon Chambers 4144423227 Publish percentiles for Micrometer distributions/timers. 2020-08-04 10:58:59 -04:00
Jon Chambers 4d03514142 Add a command for clearing the messages cache cluster. 2020-08-04 10:58:41 -04:00
Jon Chambers 0bc5566976 Mirror delete-after-persist operations to the clustered message cache. 2020-08-04 10:58:41 -04:00
Jon Chambers 925567add5 Actually "plug in" the reglock counter. 2020-08-03 15:43:33 -04:00
Jon Chambers ad97731d46 Reduce the maximum number of versions in play to 1,000. 2020-08-03 15:42:15 -04:00
Jon Chambers 40684a93a2 Restrict user-agent version matching to a more confined space. 2020-08-03 15:42:15 -04:00
Jon Chambers f3b644ceb8 Update the push latency manager to use UUIDs and a Redis cluster. 2020-08-03 15:36:02 -04:00
Jon Chambers 901ba6e87f Added a push latency manager. 2020-08-03 15:36:02 -04:00
Jon Chambers 76389bd584 Clear would-be-persisted messages from the cache cluster, but don't store them to the database. 2020-07-30 19:14:39 -04:00
Jon Chambers 7bf8650d59 Un-manage FaultTolerantRedisCluster so it shuts down at JVM shutdown instead of Jetty shutdown. 2020-07-30 18:37:38 -04:00
Ehren Kret 7cb24dd96d Add environment tag to datadog metric reporting 2020-07-30 18:04:16 -04:00
Ehren Kret dee040318a Add the host tag to datadog metric reporting 2020-07-30 18:04:16 -04:00
Jon Chambers baf563e46d Temporarily disarm the actual persisting part of the message persister. 2020-07-30 17:12:37 -04:00
Jon Chambers e10246f10b Use Dropwizard timers/histograms for persister metrics. 2020-07-30 14:27:06 -04:00
Jon Chambers a9dfd88671 Start the clustered message persister at application startup. 2020-07-30 12:32:35 -04:00
Jon Chambers beac73b6c8 Add a cluster-capable message persister 2020-07-30 11:39:14 -04:00
Jon Chambers f9f93c77e2 Use UUIDs instead of phone numbers as account identifiers in clustered message cache 2020-07-30 11:39:14 -04:00
Jon Chambers 6fc1b4c6c0 Add a cluster-backed message cache. 2020-07-30 11:39:14 -04:00
Jon Chambers 639898ec07 Expand Experiment to deal with async suppliers and Optionals. 2020-07-30 11:39:14 -04:00
Jon Chambers 3d3790fdbc Add binary execution methods to ClusterLuaScript. 2020-07-30 11:39:14 -04:00
Jon Chambers 69c8968cb0 Add byte-array-based methods to FaultTolerantRedisCluster. 2020-07-30 11:39:14 -04:00
Jon Chambers aa25fc7901 Fix UsernamesManager metric/logger names. 2020-07-29 11:00:29 -04:00
Jon Chambers 4aba493ee2 Fix the key used for database crawler workers. 2020-07-29 10:58:06 -04:00
Jon Chambers b9cfac5934 Introduce additional metric aggregators. 2020-07-28 15:11:51 -04:00
Brian Acton f8e97fcc32 revise 12 hour active user fudge to 8 hours for better continuity of data from a month ago 2020-07-28 11:09:41 -07:00
Jon Chambers 7f8f2641f6 Simplify registration lock counting by avoiding inactive accounts. 2020-07-28 11:48:20 -04:00
Jon Chambers 022dbb606f Count registration lock versions when crawling the account database. 2020-07-28 11:48:20 -04:00
Jon Chambers fea72b190d Record message content size as a dimensioned distribution. 2020-07-28 11:47:56 -04:00
Jon Chambers eea073f882 Decommission the old cache. 2020-07-28 10:29:28 -04:00
Jon Chambers fc1d88f5bb Read exclusively from the cache cluster. 2020-07-27 15:11:40 -04:00
Jon Chambers acbe410e0b Remove a metric aggregator. 2020-07-27 12:50:49 -04:00
Ehren Kret 89bafea61f Move SMS strings to configuration 2020-07-27 11:23:21 -05:00
Jon Chambers ec072fd639 Update telemetry dependencies and resolve dependency conflicts. 2020-07-24 18:59:35 -04:00
Jon Chambers 67b03076d7 Add a missing dependency. 2020-07-24 17:21:56 -04:00
Jon Chambers 33a0c4a9ae Use first party metric aggregator libraries where possible. 2020-07-24 17:21:56 -04:00
Jon Chambers 4cc5999f05 Configure additional metric aggregators. 2020-07-23 13:31:19 -04:00
Jon Chambers 403aa5fd3e Add dependencies for additional metric aggregators. 2020-07-23 13:31:19 -04:00
Jon Chambers 0fbf31ec98 Clear each cluster node individually. 2020-07-22 11:12:21 -04:00
Jon Chambers db9b7ca447 Fix slot assignment when building a cluster for tests. 2020-07-22 11:04:10 -04:00
Jon Chambers eecc71c77f
Revert batch message storage. (#95) 2020-07-20 16:28:32 -04:00
Jon Chambers 5f898a9071 Measure inserted message batch size. 2020-07-20 10:30:29 -04:00
Jon Chambers a08f21336a Be explicit about transaction management. 2020-07-20 10:30:29 -04:00
Jon Chambers 215125de26 Update tests. 2020-07-20 10:30:29 -04:00
Jon Chambers dfa94eac41 Store messages in batches. 2020-07-20 10:30:29 -04:00
Jon Chambers 247d869a5c De-randomize message tests to minimize flakiness. 2020-07-14 18:46:39 -04:00
Ehren Kret b9b6e1818f Rename SenderIdSelector to SenderIdSupplier per code review discussion 2020-07-14 10:53:48 -05:00
Ehren Kret a7968ccc3c Address code review comments 2020-07-14 10:53:48 -05:00
Ehren Kret b7e0e5a356 Create a strategy class to decide which sender id to use
The rules around selecting sender ids can get complicated with some
countries not supporting it and others requiring pre-registration that
may result in having a different sender id for that country than
others. This strategy class handles the logic of dealing with this
expanded configuration and applying the appropriate sender id or none
when it's not appropriate to do so at all.
2020-07-14 10:53:48 -05:00
Brian Acton e3aecb2aa9 apply a 12 hour fudge to daily user counting to account for last seen timestamp fuzzing 2020-07-09 17:43:12 -07:00
Jon Chambers 116ab83b95 Include a PushType header when sending APNs notifications. 2020-07-09 16:12:20 -04:00
Jon Chambers c5d0d4acd0 Revert "Move rate limiter logic to Lua scripts"
This reverts commit b585c6676d.
2020-07-09 12:30:25 -04:00
Jon Chambers 06190286ec Remove temporary circuit breaker suppression. 2020-07-07 16:33:05 -04:00
Jon Chambers 3bca856e87 Remove a pair of spurious SET calls in the rate limiter script. 2020-07-07 16:33:05 -04:00
Jon Chambers b3a778b89a Temporarily catch and log all script execution exceptions to avoid opening the breaker. 2020-07-07 15:17:25 -04:00
Jon Chambers dcb11f7606 Log errors from experiments. 2020-07-07 15:17:25 -04:00
Jon Chambers 933ce42d5a Test rate limiters against a real cluster. 2020-07-07 15:17:25 -04:00
Ehren Kret 6c1ba957bd Ensure the default alphaId configuration is an empty list rather than null 2020-07-07 10:17:40 -05:00
Ehren Kret e021286eee Add configuration by country for sending from alpha IDs 2020-07-07 10:17:40 -05:00
Ehren Kret 0ee7a66033
Keep trying ports until you get one lower than 55535 (#83)
* Keep trying ports until you get one lower than 55535

* Rename method and change to do...while

* Limit attempts to 11,000 to find an open redis cluster port
2020-07-07 10:12:31 -05:00
Jon Chambers 42c797ee97 Set the default log level for tests to WARN. 2020-07-07 11:05:39 -04:00
Jon Chambers b585c6676d
Move rate limiter logic to Lua scripts 2020-07-06 10:10:13 -04:00
Jon Chambers f5ddb0f1f8 Test ClusterLuaScript against a real Redis cluster. 2020-07-02 18:58:30 -04:00
Jon Chambers ef97f9e738 Revert "Temporarily suspend execution of the "unlock" script."
This reverts commit 6aecd8d44a.
2020-07-02 18:58:30 -04:00
Jon Chambers 26a03b55de Un-reinvent the clustered script execution wheel. 2020-07-02 18:58:30 -04:00
Jon Chambers b93a16abae Honor the step size set in the micrometer config. 2020-07-02 11:40:41 -04:00
Jon Chambers ff2783d434 Fixed a goof where we were mirroring a write to the wrong key in the new cache cluster. 2020-07-02 11:40:27 -04:00
Ehren Kret 25a5a8db68
Set avatar to null on Account when request is false (#78) 2020-06-29 15:53:31 -05:00
Jon Chambers a68d91b54c Resolve some test flakiness by adding a deterministic "wait" mechanism. (SERVER-86) 2020-06-29 12:24:25 -04:00
Jon Chambers 88ec3a5751 Add a counter for dead letter events. 2020-06-26 09:00:11 -04:00
Jon Chambers 734dc2e37a Don't block the Redis instance when clearing the cache. 2020-06-19 10:52:18 -04:00
Jon Chambers 6aecd8d44a Temporarily suspend execution of the "unlock" script. 2020-06-17 22:27:02 -04:00
Jon Chambers bbf5e1fa78 Use the UA string from websocket upgrade requests if available. 2020-06-17 15:40:18 -04:00
Jon Chambers 7454e55693 Write synchronously to the cache cluster. 2020-06-17 15:38:56 -04:00
Jon Chambers c745fe7778 Fix a poorly-mirrored cache delete operation. 2020-06-17 15:35:46 -04:00
Jon Chambers 6adcebb247 Return to just using counters instead of timers for measuring experiment outcomes. 2020-06-17 15:34:02 -04:00
Jon Chambers 38f9b8f3dd Make write operations in `AccountDatabaseCrawlerCache` synchronous. 2020-06-17 10:05:43 -04:00
Jon Chambers 7faf143a97 Subdivide the account database crawler cache experiment and add logging to track down lingering disagreements. 2020-06-17 09:23:40 -04:00
Jon Chambers 17cfd4924c Fixed a poorly-mirrored write operation to the new cluster. 2020-06-16 16:46:41 -04:00
Jon Chambers a0bebca1e6 Extend Experiment to report more detail when results don't match. 2020-06-16 16:46:41 -04:00
Jon Chambers 75cbfa2898 Mirror unlock-via-script calls to the cache cluster. 2020-06-16 16:46:41 -04:00
Jon Chambers 58a8ed1588 Add a cluster-friendly version of LuaScript. 2020-06-16 16:46:41 -04:00
Jon Chambers e032f8df59 Add a command for clearing the cache cluster. 2020-06-16 16:46:41 -04:00
Jon Chambers b16e37d80a Record a histogram of incoming message list sizes. 2020-06-12 14:43:50 -04:00
Jon Chambers c17cc07b73 Instrument BlockingThreadPoolExecutor. 2020-06-12 14:43:50 -04:00
Jon Chambers 6f767a72a7 Add a timer for the private sendMessage method. 2020-06-12 14:43:50 -04:00
Jon Chambers 11196436e9 Time rate limiter validation calls. 2020-06-12 14:43:50 -04:00
Jon Chambers 9afc433db4 Record exceptions associated with server responses. 2020-06-11 22:08:07 -04:00
Jon Chambers f701e3d834 Record distributions of timer values; stop recording error causes. 2020-06-11 11:50:36 -04:00
Jon Chambers 4c623ca3c5 Compare Redis reads using Lettuce's synchronous path. 2020-06-11 11:50:36 -04:00
Jon Chambers 0671f05c05 Introduce experiment comparison methods for suppliers. 2020-06-11 11:50:36 -04:00
Jon Chambers 0713da7393 Record experiment results with a timer instead of a counter. 2020-06-11 11:50:36 -04:00
Jon Chambers 05955d0483 Check for null header values before trying to iterate through them. 2020-06-09 15:45:32 -04:00
Jon Chambers 28c765bd9a Add an in-app-context test for websocket metrics. 2020-06-09 15:45:32 -04:00
Ehren Kret 8287317be7 Add account device ID to the prekey rate limiter
This limits prekey fetching per device on an account instead of on an
account level.
2020-06-09 10:20:10 -07:00
Jon Chambers ec858b2d4c
Set a timeout for Redis cluster operations and shut down the cluster as part of service shutdown 2020-06-07 18:27:57 -04:00
Jon Chambers 47ece983d2 Added a Redis cluster health check. 2020-06-07 18:27:11 -04:00
Jon Chambers 52310b5dd9 Compare results of reads from old and new Redis caches. 2020-06-07 18:27:11 -04:00
Jon Chambers c2a4a2778e Introduce the Experiment class to compare results from parallel systems. 2020-06-07 18:27:11 -04:00
Jon Chambers 1db5977e80 Mirror username deletes unconditionally. 2020-06-07 18:27:11 -04:00
Jon Chambers 1b5dc0e434 Fixed a potential issue where locks could get out of sync between Redis instances. 2020-06-07 18:27:11 -04:00
Moxie Marlinspike f07f02d866 Deliver upgrade link to stale clients 2020-06-06 18:20:55 -07:00
Jon Chambers 1388103919 Mirror writes to the cache cluster. 2020-06-06 20:37:48 -04:00
Jon Chambers fe1054d58a Introduce a Lettuce-based fault-tolerant Redis cluster accessor. 2020-06-06 20:37:48 -04:00
Jon Chambers ba6ac778fc Update to Pushy v0.14.1. 2020-06-05 12:21:56 -04:00
Jon Chambers 228ffcbfce Differentiate between websocket and "boring" HTTP traffic. 2020-05-28 12:52:49 -04:00
Jon Chambers f18ab9e5cc Measure traffic from websockets. 2020-05-28 12:52:49 -04:00
Jon Chambers 06c82ee87d Celebrate the diversity of UA strings when generating tags for metrics. 2020-05-27 19:35:42 -04:00
Jon Chambers 9ba5ee8043 Move UA tag extraction into its own utility class. 2020-05-27 19:35:42 -04:00
Ehren Kret eede4e50ca
Use hashed UUID to spread last seen updates over a full day (#40) 2020-05-26 13:38:52 -07:00
Jon Chambers aa10f63d9f Add the timestamp using the `add` method. 2020-05-22 17:39:25 -04:00
Jon Chambers a25af36e32 Include timestamps in all server-to-client websocket messages. 2020-05-22 15:13:39 -04:00
Jon Chambers 817f057927 Inject timestamps into responses. 2020-05-22 15:13:39 -04:00
Jon Chambers a13c44d81a Capture request-level metrics (path, status, client platform/version). 2020-05-20 17:48:19 -04:00
Jon Chambers 45ad8f8ffb Add the Wavefront/Micrometer reporter as a dependency and configure a registry. 2020-05-20 17:46:07 -04:00
Ehren Kret 7da9e88c0b Add hashKey to RemoteConfig
This allows the percentages for different entries in remote config to
be aligned so one remote config can be a subset of another.
2020-05-13 11:08:22 -07:00
Jon Chambers 1c73c91133 Report the number of days until the CDS CA cert expires as a metric so we can set an alarm. 2020-05-12 12:57:11 -04:00
Jon Chambers b1d11d4f69 Use APNs signing keys instead of expiring certificates. 2020-05-12 12:48:28 -04:00
Jon Chambers 001a9310c3
Support device transfers (SERVER-41, SERVER-42) (#32)
This change introduces a `transfer` device capability and account creation argument in support of the iOS device transfer effort.
2020-05-12 12:23:18 -04:00
Moxie Marlinspike 8ffadfa1f1 Add payment addresses on account attributes update 2020-05-07 09:52:38 -07:00
Jon Chambers 50d7929e76 Drop the GCM `RECEIPT` message type (unused). 2020-05-04 17:51:54 -04:00
Jon Chambers 10840b22c5 Don't let one unregistered device block receipt for others. 2020-05-04 17:51:25 -04:00
Jon Chambers acfbab5915 Update to Pushy v0.13.11. 2020-05-04 17:50:35 -04:00
Ehren Kret 48c324fe86 Use a static sequence of randomness in tests
The RemoteConfigControllerTest#testMath unit test would occassionally
fail because randomness doesn't necessarily group into expected ranges
over a finite trial count. This changes the test to use a predefined
PRNG sequence instead of one that varies with each test so that the
test will no long randomly fail.
2020-04-29 17:31:43 -07:00
Ehren Kret 0c495e7e72 Workaround lack of internal retry on transaction rollback
The get endpoint for key fetching can fail if the transaction cannot
complete because of simultaneous modification. Clients currently
receive 500 from this and retry if it happens, but this test case runs
into it without retrying and then complains that not all the threads
completed successfully. This workaround adds some retry attempts.
2020-04-29 17:10:13 -07:00
Ehren Kret 50ccfee201 Allow remote config to send non-boolean values
This version of remote config allows non-boolean values to be returned
to clients but unfortunately limits the configuration to only one
value or another. There is no way to configure more than two values
for the same key with this setup.
2020-04-29 10:51:10 -07:00
Moxie Marlinspike fa739c9594 Bump zkgroups to 0.7.0 2020-04-28 08:58:57 -07:00
Moxie Marlinspike 95f0ce1816 Support for advertising payment addresses on profile 2020-04-22 12:32:53 -07:00
Moxie Marlinspike a32c8fabed Temporarily move GV2 capability from allMatch to anyMatch 2020-04-20 13:42:36 -07:00
Moxie Marlinspike 6a11501184 Bump zkgroups to 0.6.0 2020-04-20 13:41:54 -07:00
Moxie Marlinspike c03fd4645d Bump zkgroups to 0.5.0 2020-04-09 20:36:34 -07:00
Moxie Marlinspike b76c7a4824 Update zkgroups to 0.4.2 2020-04-09 11:21:58 -07:00
Moxie Marlinspike 1408ac77f9 Make storageCapable a boolean result rather than an auth token 2020-04-09 10:19:49 -07:00
Ehren Kret 7e97d10ae1 Fix account dropping new style registration locks 2020-04-06 09:27:23 -07:00
Ehren Kret 56b134facd Change attachment key from long to base64 of 15 bytes 2020-04-02 10:20:42 -07:00
Ehren Kret 41286650cc Create attachments V3 endpoint for CDN2 on GCP
In preparation for resumable uploads, this creates a separate
attachment authorization endpoint that creates a signed URL for
accessing GCP Storage through Signal's CDN2. This should allow Signal
clients to do byte-level resume of media uploads.
2020-04-02 10:20:42 -07:00
Moxie Marlinspike 3c8e7c6c10 Add storage capability and return KBS creds on rereg w/ storage set 2020-03-27 10:45:48 -07:00
Moxie Marlinspike 4f64513c83 Break out redis pubsub into dedicated cluster 2020-03-16 17:44:42 -07:00
Moxie Marlinspike 350f5ccb3c Account for fronted regions 2020-03-14 19:07:42 -07:00