Commit Graph

54 Commits

Author SHA1 Message Date
Rebecca Law
28e78780d0 Added more logging for provider tasks. 2018-03-26 09:31:52 +01:00
Rebecca Law
cd2d85f2a3 Updates after code review.
- Remove print
- Update exception message.
2018-03-19 14:08:38 +00:00
Rebecca Law
0dc50190b2 Throw an exception whenever we updated a notification to technical failure.
If this is happening we want to know about it.
2018-03-16 17:18:44 +00:00
Richard Chapman
d855b4e4ec Removed statsd from the api and use the statsd in the utils library.
The statsd code was added to the utils library a while ago, uses the
statsd from the util library and therefore consolidates the code into
once place.
2018-02-06 09:52:15 +00:00
Leo Hemsted
28d5f9b87f flake8 - remove unused imports and ensure they're always at the top of the file 2017-11-28 14:28:01 +00:00
Richard Chapman
cc4d022213 Adding extra logging to celery tasks ans gunicorn, specifically log on SIGTERM and SIGINIT so that we can track better when an app restarts and why it restarts e.g. when it restarts after another signal. 2017-10-12 11:39:21 +01:00
Leo Hemsted
6c61a3fc2a Revert celery4
Revert the following three pull requests:
https://github.com/alphagov/notifications-api/pull/1085
https://github.com/alphagov/notifications-api/pull/1086
https://github.com/alphagov/notifications-api/pull/1088

celery 4.0.2 looked promising, however, on staging under mild load
(5/sec api calls) the performance was actually worse than 3.1.25
2017-07-19 15:17:19 +01:00
Martyn Inglis
786adb5d71 Move Queuenames in with the celery code, revamp config to allow move to celery 4.x 2017-07-12 12:01:52 +01:00
Martyn Inglis
4768f0b9fd Change retries policy.
Before we had a long back off, now we have more, but shorter backoffs.

- PREVIOUS
When we had an error talking to a provider we retried quickly and if we still got errors we backed off more and more. Maximum attempts was 5, max delay 4hours. This was to allow us time to ship a build if that was required.

- NOW
Backing off 48 times of 5 minutes each. This gives us the same total backoff, but many more tries in that period.

- WHY
Having the long back off meant messages could be delayed 4 hours. This was happening more and more, as PaaS deploys can place things into the "inflight" state in SQS. The inflight state MUST have an expiry time LONGER than the maximum retry back off. This meant that messages would be delayed 4 hours, even when there was no app error.

By doing this we can reduce this delay to 5 minutes. Whilst still giving us time to fix issues.
2017-05-25 11:12:40 +01:00
Martyn Inglis
2591d3a1df This massive set of changes uses the new queue names object throughout the app and tests.
Lots of changes, all changing the line of code that puts things into queues, and the code that tests that.
2017-05-25 10:51:49 +01:00
Imdad Ahad
b5d4acb758 Make message more accurate 2017-04-27 16:58:00 +01:00
Leo Hemsted
0136e1e32d fix invalid logging
the first argument to ANY logger.____ function is ALWAYS cast to a
string and used as a format argument for ALL remaining arguments
using %s formatting. even `logger.exception`, which just logs as
normal and then appends the stack trace.

so we shouldn't be passing `e` into logger.exception - just
`logger.exception('something went wrong!')`

also de-duplicated a test
2016-12-19 17:13:10 +00:00
Leo Hemsted
a2c3d265de remove unused former send_sms_to_provider and send_sms_to_email functions
they were superceded by deliver_sms and deliver_email in the same file 3 wks ago
2016-10-13 15:53:01 +01:00
Leo Hemsted
a095aa41f3 don't retry task if InvalidEmailError
just record it as a technical error - retrying wont fix a bad email
2016-10-13 15:27:47 +01:00
Martyn Inglis
376f8355cb Updated clients to have a more robust error handling
- fire text and omg much more similar. Ready to be combined.
- Error handling now for JSON valid responses
2016-09-22 17:18:05 +01:00
Martyn Inglis
59ab5da5d3 Handle errors where the notification isn't found when executing the tasks
- Thows a NoResultFound sqlalchemy exception
- Which causes a retry. This means we give it a few goes (5, max 5 hours)  for the notification to appear.
- Should never happen, only if we get some task overlaps that are unusual that leads to tasks executed in an overlapping nature.
2016-09-22 09:52:23 +01:00
Martyn Inglis
2f36e0dbcf Refactor to make the new send_to_provider methods take a notification not a notification ID.
- Driven by the fact we won't know the type in the API call
- hence we need to load notification earlier , so pass it not the id through to the send task to avoid loading it twice.
2016-09-22 09:16:58 +01:00
Martyn Inglis
991cc884b4 Refactoring the send to provider code out of the tasks folder
- building a rest endpoint to call that code to compliment the existing task based approach.
2016-09-21 13:27:32 +01:00
Martyn Inglis
bd06b18684 Refactor of celery tasks.
- Aim to move the code that contacts providers into it's own module.
- Celery tasks now call this module to send to provider
- No exceptions caught in the new module. Celery tasks now use any exception to trigger a retry.
- tests moved about - new test directory for the new class, all tests from celery test module moved, excepting the retry logic.
2016-09-20 17:24:28 +01:00
Leo Hemsted
4f0c5fdd9e Merge pull request #623 from alphagov/test-fixes
Test fixes
2016-08-30 10:38:53 +01:00
minglis
287d32c037 Merge pull request #638 from alphagov/remove-contested-writes
Remove contested writes
2016-08-30 08:56:45 +01:00
Chris Hill-Scott
f7391da350 Don’t prefix text messages is sender name is set
Implements:
- [x] https://github.com/alphagov/notifications-utils/pull/66
2016-08-26 14:45:01 +01:00
Martyn Inglis
893164ae40 Removed contented updates the notifications stats table
- As before this is now driven from the notifications history table

- Removed from updates and create
- Signatures changes to removed unused params hits many files
- Also potential issue around rate limiting - we used to get the number sent per day from the stats table - which was a single row lookup, now we have to count this. This applies to EVERY API CALL. Probably not a good thing and should be addressed urgently.
2016-08-25 11:55:38 +01:00
Martyn Inglis
708f566c24 Removed updates to the provider stats table
- again these new come from the notifications history table
- We update this when we sent a notification, so removed from celery tasks
- tests removed also
2016-08-25 10:33:12 +01:00
Leo Hemsted
26d7675baa pep8 fixes
no idea why the build/local pep8s weren't picking them up before.

also excluded import order pep8
2016-08-23 12:05:47 +01:00
minglis
7fab5ada38 Merge pull request #596 from alphagov/extend-timeout-on-db-retries
Improve logging and extend time periods associated with retrying
2016-08-17 09:24:49 +01:00
Martyn Inglis
67a4ee7d51 In event of a task retry ensure the log message is identifier as such
- "RETRY" prefixes the messages

In event of the retry attempts completing without successfully completing the task identify message as such

- "RETRY FAILED" prefixes the messages

Applies to the send_sms|send_email and send_sms_to_provider|send_email_to_provider tasks

These are there to try and ensure we can alert on these events so that we know if we have started retrying messages

Retry messages also contain notification ids to aid debugging.
2016-08-10 08:53:15 +01:00
Leo Hemsted
e88624ed4a attempt to pull logos from the admin app's static images directory
(this is configured by a config value)
2016-08-09 16:03:04 +01:00
Leo Hemsted
ebb8947290 look at service's organisation for branding to pass through to email renderer 2016-08-09 16:03:04 +01:00
minglis
d67e43a6d0 Merge pull request #583 from alphagov/stats-db-updates
Stats db updates
2016-08-08 11:48:01 +01:00
Martyn Inglis
fe54fa9f73 Final pass through existing statsd endpoints to ensure they match new naming strategy.
Updates accordingly.
2016-08-08 11:23:58 +01:00
Martyn Inglis
f223446f73 Refactor statsd logging
Removed all existing statsd logging and replaced with:

- statsd decorator. Infers the stat name from the decorated function call. Delegates statsd call to statsd client. Calls incr and timing for each decorated method. This is applied to all tasks and all dao methods that touch the notifications/notification_history tables

- statsd client changed to prefix all stats with "notification.api."

- Relies on https://github.com/alphagov/notifications-utils/pull/61 for request logging. Once integrated we pass the statsd client to the logger, allowing us to statsd all API calls. This passes in the start time and the method to be called (NOT the url) onto the global flask object. We then construct statsd counters and timers in the following way

	notifications.api.POST.notifications.send_notification.200

This should allow us to aggregate to the level of

	- API or ADMIN
	- POST or GET etc
	- modules
	- methods
	- status codes

Finally we count the callbacks received from 3rd parties to mapped status.
2016-08-05 10:44:43 +01:00
Leo Hemsted
143cfb526c change update_provider_stats to use billable_units
updated tests etc, and removed some old tests that are no longer relevant
2016-08-03 17:22:20 +01:00
Leo Hemsted
527a5c4eaa calculate billable units when sending an sms
don't calculate it if we're in research mode
* added tests to prove this
* removed last code referring to content_char_count
2016-08-03 16:46:05 +01:00
Adam Shimali
cfd42a8a05 Statsd timings were being passed a timedelta instead for float for
milliseconds.

Bugfix for https://www.pivotaltracker.com/story/show/126852733
2016-07-22 12:13:24 +01:00
Chris Hill-Scott
824085ead8 Bring in changes to template and CSV processing
Functional changes:
- adds the blue bar

Performance changes
- faster CSV processing

Depends on:

- [ ] https://github.com/alphagov/notifications-utils/pull/47
- [ ] https://github.com/alphagov/notifications-utils/pull/48

Also brings in some breaking changes, which do not affect utils (apart
from a weird import).
2016-07-07 15:44:30 +01:00
Leo Hemsted
96513442fe notifications from test api keys dont send for real - a la research_mode
also added a bunch more tests around api keys to help ensure we're behaving in at least a consistent way
2016-07-05 16:29:53 +01:00
Leo Hemsted
242be97bfa remove reply_to_addresses from task kwargs
also added a test for aws_ses.send_email to prove it handles reply_to_address correctly
2016-07-04 17:29:41 +01:00
Leo Hemsted
f5e14f043d dont pass reply-to-addresses about
dont send reply_to_addresses around from process_job and send_email -
take it from the service in send_email_to_provider. also clean up
the kwarg in aws_ses.send_email to more accurately reflect what we
might pass in
2016-07-04 15:04:43 +01:00
Adam Shimali
b32f0ab2cd Resolve conflicts 2016-07-01 16:53:12 +01:00
Adam Shimali
c29dd23702 Add sms sender to service to be used in sms templates
in place of default numeric short code.

If not present default short code is used.
2016-07-01 15:27:54 +01:00
Rebecca Law
3aff22cf87 Improve the logging message 2016-07-01 14:42:40 +01:00
Rebecca Law
f52755742c Split send_email task into one task to create the notification and one to send it to the provider.
Is there is an exception the task will go to the retry queue.
2016-07-01 14:14:28 +01:00
Rebecca Law
3f11447bc8 A small refactor to use the SMS_TYPE and EMAIL_TYPE in code rather that 'sms' or 'email' 2016-06-30 15:41:51 +01:00
Rebecca Law
25db1bce74 Use the notification types enum for the notifications.notification_type.
Reuse EMAIL_TYPE in template_types and notification_types.
2016-06-29 11:50:54 +01:00
Rebecca Law
8a0211b3eb Only send to the provider if the notification has a created status.
If the notification ends up in the retry queue and the delivery app is restarted the notification will get sent twice.
This is because when the app is restarted another message will be in the retry queue as message available which is a
duplicate of the one in the queue that is a  message in flight.

This should complete https://www.pivotaltracker.com/story/show/121844427
2016-06-27 14:47:20 +01:00
Rebecca Law
fcb5ca9ef4 The send_sms task will created the notification with a status = created.
The encrypted_notification for the send_sms_to_provider task has been made optional.
2016-06-22 13:32:27 +01:00
Rebecca Law
3d3bff25a8 [WIP] updating notification to start in the created status 2016-06-21 15:51:30 +01:00
Martyn Inglis
51c6d57a86 Changed the delay period. Now waits
10 seconds, 1 minute, 5 minutes, 1 hour and 4 hours.

Total elapsed wait is max 5 hours 6 minutes and 10 seconds.

Changed visibility window of SQS to be 4 hours 10 seconds, longer the max retry period.
2016-06-17 16:32:56 +01:00
Martyn Inglis
f143aade71 Added new tests/code to ensure failure stats recorded 2016-06-13 16:40:46 +01:00