punycode encode emails before sending

Amazon SES only accepts domains encoded in punycode, an encoding that
converts Unicode into an ASCII form that browsers and mail servers
recognise.
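
As a rough illustration of what that encoding looks like, the sketch
below uses Python's built-in `idna` codec. The helper name `to_punycode`
and the example hostnames are assumptions made for illustration, not
part of this change.

```python
# Illustrative only: to_punycode and the hostnames are hypothetical.
def to_punycode(hostname):
    # Python's built-in 'idna' codec converts each non-ASCII label into
    # an ASCII "xn--" label and leaves plain ASCII labels untouched.
    return hostname.encode('idna').decode('ascii')


print(to_punycode('bücher.example'))  # xn--bcher-kva.example
print(to_punycode('example.com'))     # example.com
```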

We currently pass email addresses through exactly as we store them (in
full Unicode), so any rogue characters make SES reject the request and
we put the email in technical-failure. Most of these appear to be typos
and rogue control characters, but there is a small chance that it could
be a real domain (eg https://🅂𝖍𝐤ₛᵖ𝒓.ⓜ𝕠𝒃𝓲/🆆🆃🅵/).

We should punycode-encode the to and reply-to email addresses to make
sure that they can always be sent. The chance that anyone actually uses
a Unicode domain name for their email is probably pretty low, but we
should allow it for completeness.
Leo Hemsted
2019-08-12 13:53:22 +01:00
parent 4056e335cc
commit 09d6c60ff1
2 changed files with 40 additions and 14 deletions

@@ -83,7 +83,7 @@ class AwsSesClient(EmailClient):
response = self._client.send_email(
Source=source,
Destination={
-'ToAddresses': to_addresses,
+'ToAddresses': [punycode_encode_email(addr) for addr in to_addresses],
'CcAddresses': [],
'BccAddresses': []
},
@@ -93,7 +93,7 @@ class AwsSesClient(EmailClient):
},
'Body': body
},
-ReplyToAddresses=reply_to_addresses
+ReplyToAddresses=[punycode_encode_email(addr) for addr in reply_to_addresses]
)
except botocore.exceptions.ClientError as e:
self.statsd_client.incr("clients.ses.error")
@@ -116,3 +116,9 @@ class AwsSesClient(EmailClient):
self.statsd_client.timing("clients.ses.request-time", elapsed_time)
self.statsd_client.incr("clients.ses.success")
return response['MessageId']
+def punycode_encode_email(email_address):
+    # only the hostname should ever be punycode encoded.
+    local, hostname = email_address.split('@')
+    return '{}@{}'.format(local, hostname.encode('idna').decode('utf-8'))
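
A possible hardening of the helper, sketched here as an assumption rather
than as part of this commit: the name `punycode_encode_email_defensive`
is hypothetical. It splits on the last `@` only, so a quoted local part
that itself contains `@` does not fail during tuple unpacking, and it
raises a clearer error when there is no `@` at all.

```python
def punycode_encode_email_defensive(email_address):
    # Hypothetical variant of punycode_encode_email, not in this commit.
    # rpartition splits on the LAST '@', so '"a@b"@example.com' yields
    # local='"a@b"' and hostname='example.com'.
    local, sep, hostname = email_address.rpartition('@')
    if not sep:
        raise ValueError('email address has no @: {!r}'.format(email_address))
    # only the hostname should ever be punycode encoded.
    return '{}@{}'.format(local, hostname.encode('idna').decode('ascii'))


print(punycode_encode_email_defensive('test@bücher.example'))
# test@xn--bcher-kva.example
```

ASCII-only addresses pass through unchanged, so applying the helper to
every recipient should be safe for existing traffic.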