If a user chooses a very common password, an attacker could guess it in
fewer attempts than it takes to trigger the lockout, circumventing it.
CESG recommend blacklisting the most common passwords:
> …enforcing the requirement for complex character sets in passwords is
> not recommended. Instead, concentrate efforts on technical controls,
> especially:
>
> - defending against automated guessing attacks by either using account
> lockout, throttling, or protective monitoring
> - blacklisting the most common password choices
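
For illustration, a blacklist check can be as simple as an exact-match lookup against a file with one password per line. This is a minimal sketch, not the check this commit adds; the `is_blacklisted` helper and the `common-passwords.txt` filename are hypothetical:

```shell
# is_blacklisted PASSWORD LIST_FILE (hypothetical helper)
# Succeeds (exit status 0) if PASSWORD appears as a whole line in LIST_FILE.
is_blacklisted() {
  # -F: treat PASSWORD as a fixed string, not a regex
  # -x: match whole lines only, so "password" doesn't match "password1"
  # -q: print nothing, just set the exit status
  # --: stop option parsing, in case PASSWORD starts with "-"
  grep -Fxq -- "$1" "$2"
}

# Usage:
# if is_blacklisted "$candidate" common-passwords.txt; then
#   echo "Please choose a less common password" >&2
# fi
```

Note that `grep` also exits non-zero if the list file can’t be read, so a real check should treat that case as an error rather than as "not blacklisted".
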
How I made this list:
- went to the OWASP repository of security lists:
https://github.com/danielmiessler/SecLists
- downloaded `10k_most_common.txt`, `twitter-banned.txt` and
`500-worst-passwords.txt`
- filtered out any under 8 characters:
```
# Delete lines of 0 to 7 characters ({0,7} is the portable spelling of GNU's {,7})
sed -E '/^.{0,7}$/d' passwords-twitter.txt > passwords-combined.txt
sed -E '/^.{0,7}$/d' passwords-500.txt >> passwords-combined.txt
sed -E '/^.{0,7}$/d' passwords.txt >> passwords-combined.txt
```
- filtered out any duplicates:
```
# Keep only the first occurrence of each line, preserving the original order
awk '!seen[$0]++' passwords-combined.txt > passwords-combined-deduped.txt
```
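
The two steps above could also be done in a single `awk` pass over the downloaded files. This is a sketch of an alternative, not the command actually used to build this list; the `combine_password_lists` helper is hypothetical:

```shell
# combine_password_lists OUTPUT INPUT... (hypothetical helper)
# Keeps lines of at least 8 characters from the INPUT files, in order,
# drops repeats (keeping the first occurrence) and writes the result to OUTPUT.
combine_password_lists() {
  local out=$1
  shift
  # seen[$0]++ evaluates to 0 (false) the first time a line appears,
  # so !seen[$0]++ is true only for its first occurrence
  awk 'length($0) >= 8 && !seen[$0]++' "$@" > "$out"
}

# Usage:
# combine_password_lists passwords-combined-deduped.txt \
#   passwords-twitter.txt passwords-500.txt passwords.txt
```
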
We require users to export their spreadsheets as CSV files before
uploading them. But this seems like the sort of thing a computer should
be able to do.
So this commit adds a wrapper class which:
- takes the uploaded file
- returns it as-is if it’s already in a normalised format, or otherwise
  reads it using pyexcel[1]
- gives the data back in CSV format
This allows us to accept `.csv`, `.xlsx`, `.xls` (97 and 95), `.ods`,
`.xlsm` and `.tsv` files. We can upload the resulting CSV as normal,
and check it for errors as before.
Testing
---
To test this I’ve added a selection of common spreadsheet files as test
data. They all contain the same data, so the tests check that the
resulting CSV output is identical for each.
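
That check boils down to comparing every converted output with one reference. A shell sketch, assuming each file’s CSV output has already been written to disk; the `all_identical` helper and the file names in the usage line are hypothetical:

```shell
# all_identical FILE... (hypothetical helper)
# Succeeds if every FILE is byte-for-byte identical to the first one.
all_identical() {
  local first=$1 f
  shift
  for f in "$@"; do
    # cmp -s compares silently; a non-zero exit status means the files differ
    cmp -s "$first" "$f" || return 1
  done
}

# Usage:
# all_identical expected.csv from-xlsx.csv from-ods.csv from-tsv.csv
```
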
UI changes
---
This commit doesn’t change the UI, apart from giving a different error
message if a user uploads a file type that we still don’t understand.
I intend to make the UI changes in a separate pull request, in order to fulfil
https://www.pivotaltracker.com/story/show/119371637
> If a user tries to save a template containing something like
> ((name,date)) we should give a validation error.
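This is because a comma in a placeholder causes havoc with the column
headers in CSV files: the header row is itself comma-separated, so a
placeholder like ((name,date)) can end up split across two columns.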
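
A minimal shell sketch of that validation, assuming placeholders are always written as double parentheses with no nested parentheses inside; the `has_comma_placeholder` helper is hypothetical:

```shell
# has_comma_placeholder TEMPLATE (hypothetical helper)
# Succeeds if TEMPLATE contains a ((placeholder)) whose name includes a comma,
# such as ((name,date)), which would break the CSV column headers.
has_comma_placeholder() {
  # \(\( and \)\) match literal double parentheses; [^()]* stops the match
  # from running past the end of one placeholder into the next
  printf '%s\n' "$1" | grep -Eq '\(\([^()]*,[^()]*\)\)'
}
```
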
https://www.pivotaltracker.com/story/show/117043389
At the moment the file contents are not persisted, only checked in
memory.
The first and last three records are shown if all are valid.
If there are invalid rows, they are reported and the user is
prompted to go back and fix the uploaded file.
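
The preview of the first and last three records could look like this for a CSV file with a header row. A sketch only: it assumes one record per line (no quoted fields containing newlines) and a trailing newline at the end of the file, and the `preview_rows` helper is hypothetical:

```shell
# preview_rows FILE (hypothetical helper)
# Prints the header row, then the first three and last three data rows,
# or every data row if there are six or fewer.
preview_rows() {
  local file=$1 rows
  # Count data rows, excluding the header
  rows=$(( $(wc -l < "$file") - 1 ))
  head -n 1 "$file"
  if [ "$rows" -le 6 ]; then
    tail -n +2 "$file"
  else
    tail -n +2 "$file" | head -n 3
    echo '...'
    tail -n 3 "$file"
  fi
}
```
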
Storing the upload result (i.e. the validation of the file) in the
session will be removed in the next story, which is about persisting
the file for later processing.