The assumption was that S3 would throw an exception if the object was uploaded twice. That's not the case the default behaviour is that if a file already exists it will be overwritten. So it is completely safe to run the task from the alert.
It can also mean that we don't need to wait 4hours 15 minutes. Shall I decease the amount of time before restarting the task?
When we upload a CSV for a job, we add the sender_id as metadata to the file that is uploaded on S3.
There is more than one place where we process rows from that CSV.
- process_job
- scheduled_job
- check_for_missing_rows_in_completed_jobs
- check_job_status
All of these places need to use the sender_id, now the sender_id is always read from the file metadata.
In a subsequent PR we can remove the optional sender_id parameter from process_job task.
The group by for the query was wrong which would result in 2 rows with different totals but the same unique key, so the second row would update the first row. Meaning we had incorrect numbers for the billing data.
Because some of the data had null for the sent_by column, the select would turn the Null --> dvla, but that same function was not used in the group by. So any time we had missing sent_by data we would end up with 2 rows where one would overwrite the other.
This is only necessary because there is currently a job that is old, but had 1 row created a couple days later. So now there is 1 notifications for the job where the rest have been purged.
Sometimes a job finishes but has missed a row in the middle. It is a mystery why this is happening, it could be that the task to save the notifications has been dropped.
So until we solve the missing let's find missing rows and process them.
A new scheduled task has been added to find any "finished" jobs that do not have enough notifications created. If there are missing notifications the job processes those rows for the job.
Adding the new task to beat schedule will be done in the next commit.
A unique key constraint has been added to Notifications to ensure that the row is not added twice. Any index or constraint can affect performance, but this unique constraint should not affect it enough for us to notice.
Also use this metadata to decide whether preview pages need
overlay or not. So far we have always added overlay when validation
has failed. Now we will only show it when validation failed due to
content being outside of printable area.
We want to pass the `request_id` to Celery tasks if the task is called
from an HTTP request, so that we can add the `request_id` to the logs.
This change overwrites `apply_async` to add the `request_id` to the
kwargs if available. When we call the task, we then add the `request_id`
to g on Flask's application context.
Tasks called from `send_task` won't have a `request_id` for now, and
this change only affects tasks called from HTTP requests (not from other
tasks or from Celery Beat).
we don't use it since we wrote our own provider stubs for performance
tests.
this removes it from the api - it's still in the DB and will be
retrieved by queries, but is set to disabled on prod
The nightly job to delete email notifications was failing because it was
timing out (`psycopg2.errors.QueryCanceled: canceling statement due to statement timeout`).
This adds a query limit to the query which inserts or updates
notification history so that it only updates a maximum of 10000 rows at
a time.
We have a table for datetime dimension that we no longer use and
I believe can be dropped.
First step is to remove the model and release the change. The next
step will be to then add a database migration to drop the actual
table. I believe we need to do it in this order and it can't
be done as a single PR.
All our endpoint should perform a check that the params are valid - this is an easy whay to check that and is standard for our endpoints.
I reverted the query to just filter by job id.
When a service asks for branding I often go and:
- set the branding on the organisation as well
- set the branding on all the organisation’s services
The latter can be quite time consuming, but it does save effort since
existing services from the same organisation won’t have to request
branding. It also improves the consistency of the communications that
users are receiving.
This commit automates that process by applying the branding update to
any services belonging to the organisation, if those services don’t
already have their own custom branding set up.
Now we consistently use the created_at date, so we can always get the right file location and name.
The previous updates to this code were trying to solve the problem if a pdf being created at 17:29, but not ready to upload until 17:31 after the antivirus and validation check.
But in those cases we would have trouble finding the file.
When we cancel a job, we need to check if all notifications are
already in the database. So far, we were querying for all
notification objects in the database and counting them in
admin app, which runs into pagination problems for large jobs,
and could time out for very large jobs.