From bce064fa0cc735b8b9613a38da4885bbbfaeb9b5 Mon Sep 17 00:00:00 2001
From: John Skiles Skinner
Date: Mon, 22 Jul 2024 15:31:01 -0700
Subject: [PATCH 1/4] Start troubleshooting section, add errors seen in sandboxing

---
 docs/all.md | 34 ++++++++++++++++++++++++++++++----
 1 file changed, 30 insertions(+), 4 deletions(-)

diff --git a/docs/all.md b/docs/all.md
index 0c9aa6af7..2543a1515 100644
--- a/docs/all.md
+++ b/docs/all.md
@@ -1327,11 +1327,12 @@ Seven (7) days by default. Each service can be set with a custom policy via `Ser
 Data cleanup is controlled by several tasks in the `nightly_tasks.py` file, kicked off by Celery Beat.
+# Troubleshooting
-# Debug messages not being sent
+## Debug messages not being sent
-## Getting the file location and tracing what happens
+### Getting the file location and tracing what happens
 Ask the user to provide the csv file name. Either the csv file they uploaded, or the one that is autogenerated when they do a one-off send and is visible in the UI
@@ -1340,7 +1341,7 @@ Starting with the admin logs, search for this file name. When you find it, the
 In the api logs, search by job_id. Either you will see evidence of the job failing and retrying over and over (in which case search for a stack trace using timestamp), or you will ultimately get to a log line that links the job_id to a message_id. In this case, now search by message_id. You should be able to find the actual result from AWS, either success or failure, with hopefully some helpful info.
-## Viewing the csv file
+### Viewing the csv file
 If you need to view the questionable csv file on production, run the following command:
@@ -1355,7 +1356,7 @@ locally, just do:
 poetry run flask command download-csv-file-by-name -f
 ```
-## Debug steps
+### Debug steps
 1. Either send a message and capture the csv file name, or get a csv file name from a user
 2. Using the log tool at logs.fr.cloud.gov, use filters to limit what you're searching on (cf.app is 'notify-admin-production' for example) and then search with the csv file name in double quotes over the relevant time period (last 5 minutes if you just sent a message, or else whatever time the user sent at)
@@ -1363,3 +1364,28 @@ poetry run flask command download-csv-file-by-name -f
Date: Mon, 22 Jul 2024 15:41:38 -0700
Subject: [PATCH 2/4] Notes about push command, URLs for Sandbox

---
 docs/all.md         | 7 +++++--
 terraform/README.md | 2 ++
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/docs/all.md b/docs/all.md
index 2543a1515..45d85bbf7 100644
--- a/docs/all.md
+++ b/docs/all.md
@@ -449,7 +449,10 @@ If this is the first time you have used Terraform in this repository, you will f
 ```
 cf push --vars-file deploy-config/sandbox.yml --var NEW_RELIC_LICENSE_KEY=$NEW_RELIC_LICENSE_KEY
 ```
-
+ The real `push` command has more var arguments than the single one above. Get their values from a Notify team member.
+1. Visit the URL of the app you just deployed
+ * Admin https://notify-sandbox.app.cloud.gov/
+ * API https://notify-api-sandbox.app.cloud.gov/
 # Database management
@@ -1385,7 +1388,7 @@ After pushing the Admin app, you might see this in the logs
 {"name": "app", "levelname": "ERROR", "message": "API unknown failed with status 503 message Request failed", "pathname": "/home/vcap/app/app/__init__.py", ...
 ```
-This indicates that the Admin and API apps are unable to talk to each other because of either a missing route or a missing network policy. The apps require [container-to-container networking](https://cloud.gov/docs/management/container-to-container/) to communicate. List `cf network-policies` and compare the output to other deployed envrionments. If you find a policy is missing, you might have to create a network policy with something like:
+This indicates that the Admin and API apps are unable to talk to each other because of either a missing route or a missing network policy. The apps require [container-to-container networking](https://cloud.gov/docs/management/container-to-container/) to communicate. List `cf network-policies` and compare the output to our other deployed envs. If you find a policy is missing, you might have to create a network policy with something like:
 ```
 cf add-network-policy notify-admin-sandbox notify-api-sandbox --protocol tcp --port 61443
 ```
diff --git a/terraform/README.md b/terraform/README.md
index f3e619137..bbb63424a 100644
--- a/terraform/README.md
+++ b/terraform/README.md
@@ -134,6 +134,8 @@ These steps assume shared [Terraform state credentials](#terraform-state-credent
 This command *will deploy your changes* to the cloud. This is a healthy part of testing your code in the sandbox, or if you are creating a new environment (a new directory). **Do not** apply in environments that people are relying upon.
+ If you need to go on to deploy application code on top of the resources you just instantiated, you will [use `cf push`](https://github.com/GSA/notifications-api/blob/main/docs/all.md#deploying-to-the-sandbox)
+
 1. Remove the space deployer service instance when you are done manually running Terraform.
 ```bash
 # and have the same values as used above.
From 2ac0ae73ce82ba5f1486278a064f0a28019fbbc4 Mon Sep 17 00:00:00 2001
From: John Skiles Skinner
Date: Mon, 22 Jul 2024 15:47:42 -0700
Subject: [PATCH 3/4] Update table of contents

---
 docs/all.md | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/docs/all.md b/docs/all.md
index 45d85bbf7..3de8a5e45 100644
--- a/docs/all.md
+++ b/docs/all.md
@@ -60,9 +60,13 @@
 - [Data Storage Policies \& Procedures](#data-storage-policies--procedures)
 - [Potential PII Locations](#potential-pii-locations)
 - [Data Retention Policy](#data-retention-policy)
-- [Debug messages not being sent](#debug-messages-not-being-sent)
- - [Getting the file location and tracing what happens](#getting-the-file-location-and-tracing-what-happens)
- - [Viewing the csv file](#viewing-the-csv-file)
+- [Troubleshooting]
+ - [Debug messages not being sent](#debug-messages-not-being-sent)
+ - [Getting the file location and tracing what happens](#getting-the-file-location-and-tracing-what-happens)
+ - [Viewing the csv file](#viewing-the-csv-file)
+ - [Deployment / app push problems](#deployment--app-push-problems)
+ - [Routes cannot be mapped to destinations in different spaces](#routes-cannot-be-mapped-to-destinations-in-different-spaces)
+ - [API request failed](#api-request-failed)
 # Infrastructure overview
From be360cd82bbb135709d8fa92661d5b7e8406af66 Mon Sep 17 00:00:00 2001
From: John Skiles Skinner
Date: Mon, 22 Jul 2024 15:49:19 -0700
Subject: [PATCH 4/4] Complete link to Troubleshooting section

---
 docs/all.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/all.md b/docs/all.md
index 3de8a5e45..5f9be2a30 100644
--- a/docs/all.md
+++ b/docs/all.md
@@ -60,7 +60,7 @@
 - [Data Storage Policies \& Procedures](#data-storage-policies--procedures)
 - [Potential PII Locations](#potential-pii-locations)
 - [Data Retention Policy](#data-retention-policy)
-- [Troubleshooting]
+- [Troubleshooting](#troubleshooting)
 - [Debug messages not being sent](#debug-messages-not-being-sent)
 - [Getting the file location and tracing what happens](#getting-the-file-location-and-tracing-what-happens)
 - [Viewing the csv file](#viewing-the-csv-file)
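
The "API request failed" entry added in patch 2 quotes a JSON log line with `"levelname": "ERROR"` and a message containing `failed with status 503`. As a rough sketch only, assuming the Admin app logs have been saved as one JSON object per line, a short Python helper could pick out that symptom before anyone goes looking at routes or network policies. The function name `find_api_503_errors` and the sample lines are hypothetical and not part of the repository; only the error message text comes from the patch.

```python
import json


def find_api_503_errors(log_lines):
    """Return parsed log records that look like the Admin-to-API 503 failure.

    Hypothetical helper: assumes each log line is a JSON object with
    "levelname" and "message" keys, as in the line quoted in the patch.
    """
    matches = []
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON at all
        if not isinstance(record, dict):
            continue  # skip JSON values that are not objects
        message = str(record.get("message", ""))
        if record.get("levelname") == "ERROR" and "failed with status 503" in message:
            matches.append(record)
    return matches


# Hypothetical sample input; the first message mirrors the one in the patch.
sample_lines = [
    '{"name": "app", "levelname": "ERROR", "message": "API unknown failed with status 503 message Request failed"}',
    '{"name": "app", "levelname": "INFO", "message": "request completed"}',
    "plain text line that is not JSON",
]

hits = find_api_503_errors(sample_lines)
for record in hits:
    print(record["message"])
```

If this finds matches right after a push, the patch's advice applies: list `cf network-policies` and compare against the other deployed environments.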