# notifications-api/app/aws/s3.py

import os
import botocore.exceptions
from boto3 import Session, client
from flask import current_app

# S3 key template: service-<service_id>-notify/<upload_id>.csv
FILE_LOCATION_STRUCTURE = 'service-{}-notify/{}.csv'

# Fallback credentials, read from the environment at import time
default_access_key = os.environ.get('AWS_ACCESS_KEY_ID')
default_secret_key = os.environ.get('AWS_SECRET_ACCESS_KEY')
default_region = os.environ.get('AWS_REGION')


def get_s3_file(bucket_name, file_location, access_key=default_access_key, secret_key=default_secret_key, region=default_region):
    # Download the object and return its body decoded as UTF-8
    s3_file = get_s3_object(bucket_name, file_location, access_key, secret_key, region)
    return s3_file.get()['Body'].read().decode('utf-8')


def get_s3_object(bucket_name, file_location, access_key=default_access_key, secret_key=default_secret_key, region=default_region):
    # Build an s3.Object resource; no network call happens until it is used
    session = Session(aws_access_key_id=access_key, aws_secret_access_key=secret_key, region_name=region)
    s3 = session.resource('s3')
    return s3.Object(bucket_name, file_location)
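
# Usage sketch (the bucket and key below are made-up examples, not real
# config values):
#
#     body = get_s3_file('example-bucket', 'service-abc-notify/def.csv')
#     rows = body.splitlines()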


def file_exists(bucket_name, file_location, access_key=default_access_key, secret_key=default_secret_key, region=default_region):
    try:
        # Accessing .metadata issues a HEAD request; a missing object raises ClientError
        get_s3_object(bucket_name, file_location, access_key, secret_key, region).metadata
        return True
    except botocore.exceptions.ClientError as e:
        if e.response['ResponseMetadata']['HTTPStatusCode'] == 404:
            return False
        raise
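
# Usage sketch (hypothetical names; note the existence check itself costs
# one HEAD request):
#
#     if file_exists('example-bucket', 'service-abc-notify/def.csv'):
#         contents = get_s3_file('example-bucket', 'service-abc-notify/def.csv')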


def get_job_location(service_id, job_id):
    # Returns (bucket, key, access key, secret key, region) for a job CSV,
    # in the positional order expected by get_s3_object and remove_s3_object
    return (
        current_app.config['CSV_UPLOAD_BUCKET_NAME'],
        FILE_LOCATION_STRUCTURE.format(service_id, job_id),
        current_app.config['CSV_UPLOAD_ACCESS_KEY'],
        current_app.config['CSV_UPLOAD_SECRET_KEY'],
        current_app.config['CSV_UPLOAD_REGION'],
    )


def get_contact_list_location(service_id, contact_list_id):
    # Same shape as get_job_location, but for the contact-list bucket
    return (
        current_app.config['CONTACT_LIST_BUCKET_NAME'],
        FILE_LOCATION_STRUCTURE.format(service_id, contact_list_id),
        current_app.config['CONTACT_LIST_ACCESS_KEY'],
        current_app.config['CONTACT_LIST_SECRET_KEY'],
        current_app.config['CONTACT_LIST_REGION'],
    )


def get_job_and_metadata_from_s3(service_id, job_id):
    obj = get_s3_object(*get_job_location(service_id, job_id))
    # Call get() once and reuse the response; each get() is a separate S3 request
    response = obj.get()
    return response['Body'].read().decode('utf-8'), response['Metadata']


def get_job_from_s3(service_id, job_id):
    obj = get_s3_object(*get_job_location(service_id, job_id))
    return obj.get()['Body'].read().decode('utf-8')


def get_job_metadata_from_s3(service_id, job_id):
    obj = get_s3_object(*get_job_location(service_id, job_id))
    return obj.get()['Metadata']
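
# Usage sketch: these helpers read current_app.config, so they must run
# inside a Flask application context. `app`, `service_id` and `job_id`
# below are placeholders:
#
#     with app.app_context():
#         csv_body, metadata = get_job_and_metadata_from_s3(service_id, job_id)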


def remove_job_from_s3(service_id, job_id):
    return remove_s3_object(*get_job_location(service_id, job_id))


def remove_contact_list_from_s3(service_id, contact_list_id):
    return remove_s3_object(*get_contact_list_location(service_id, contact_list_id))


def remove_s3_object(bucket_name, object_key, access_key, secret_key, region):
    obj = get_s3_object(bucket_name, object_key, access_key, secret_key, region)
    return obj.delete()


def get_list_of_files_by_suffix(
    bucket_name,
    subfolder='',
    suffix='',
    last_modified=None,
    access_key=default_access_key,
    secret_key=default_secret_key,
    region=default_region,
):
    # Yield keys under `subfolder` whose names end with `suffix`
    # (case-insensitive), optionally restricted to objects modified at or
    # after `last_modified`. list_objects_v2 is paginated, so this also
    # works for buckets holding more than 1000 objects.
    s3_client = client('s3', region, aws_access_key_id=access_key, aws_secret_access_key=secret_key)
    paginator = s3_client.get_paginator('list_objects_v2')
    page_iterator = paginator.paginate(
        Bucket=bucket_name,
        Prefix=subfolder
    )
    for page in page_iterator:
        for obj in page.get('Contents', []):
            key = obj['Key']
            if key.lower().endswith(suffix.lower()):
                if not last_modified or obj['LastModified'] >= last_modified:
                    yield key
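
# Usage sketch ('example-bucket' is illustrative). S3 returns timezone-aware
# LastModified values, so any last_modified cutoff must be timezone-aware too:
#
#     from datetime import datetime, timedelta, timezone
#
#     cutoff = datetime.now(timezone.utc) - timedelta(days=7)
#     for key in get_list_of_files_by_suffix('example-bucket', suffix='.csv', last_modified=cutoff):
#         print(key)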