At the moment the first AJAX call is triggered as soon as the page
loads. We then look at its response time to work out how long to wait
until making the next call.
This isn’t great because:
- stuff is unlikely to have changed straight away, so it’s a waste of a
call
- while the DOM is being updated it seems to introduce a delay in
clicks on links, which is either more pronounced or more noticeable
when it’s happening straight away, making the UI feel less snappy
I chose a value of 2 seconds as a rough proxy for the minimum time we’d
expect to see a notification go from created to delivered. Median
time-to-delivered was 2.9 seconds when we analysed it for
https://github.com/alphagov/notifications-admin/pull/2974#discussion_r286101286
By default our AJAX calls were 2 seconds. Then they were 5 seconds
because someone reckoned 2 seconds was putting too much load on the
system. Then we made them 10 seconds while we were having an incident.
Then we made them 20 seconds for the heaviest pages, but back to 5
seconds or 2 seconds for the rest of the pages.
This is not a good situation because:
- it slows all services down equally, no matter how much traffic they
have, or which features they have switched on
- it slows everything down by the same amount, no matter how much load
the platform is under
- the values are set based on our worst performance, until we manually
remember to switch them back
- we spend time during incidents deploying changes to slow down the
dashboard refresh time because it’s a nothing-to-lose change that
might relieve some symptoms, when we could be spending time digging
into the underlying cause
This pull request makes the Javascript smarter about how long it waits
until it makes another AJAX call. It bases the delay on how long the
server takes to respond (as a proxy for how much load the server is
under).
It’s based on the square root of the response time, so is more sensitive
to slow downs early on, and less sensitive to slow downs later on. This
helps us give a more pronounced difference in delay between an AJAX call
that is fast (for example the page for a single notification) and one
that is slow (for example a dashboard for a service with lots of
traffic).
*Some examples of what this would mean for various pages*
Page | Response time | Wait until next AJAX call
---|---|---
Check a reply to address | 130ms | 1,850ms
Brand new service dashboard | 229ms | 2,783ms
HM Passport Office dashboard | 634ms | 5,294ms
NHS Coronavirus Service dashboard | 779ms | 5,977ms
_Example of the kind of slowness we’ve seen during an incident_ | 6,000ms | 18,364ms
GOV.UK email dashboard | `HTTP 504` | 😬
Includes:
- make 'remove team member' link, on edit member
permissions page, destructive
- convert missed links on /features pages
- convert missed links on /using-notify/guidance and sub pages
- give links in browse-lists back their size and
weight (needed for lists of live and trial
services on Platform Admin)
- give links on Platform Admin inbound numbers
page back their size and weight
- update links in JS tests
Includes:
- make 'remove team member' link, on edit member
permissions page, destructive
- convert missed links on /features pages
- convert missed links on /using-notify/guidance and sub pages
- give links in browse-lists back their size and
weight (needed for lists of live and trial
services on Platform Admin)
- give links on Platform Admin inbound numbers
page back their size and weight
- update links in JS tests