Start troubleshooting section, add errors seen in sandboxing

John Skiles Skinner
2024-07-22 15:31:01 -07:00
parent cfb9c608ca
commit bce064fa0c

@@ -1327,11 +1327,12 @@ Seven (7) days by default. Each service can be set with a custom policy via `Ser
Data cleanup is controlled by several tasks in the `nightly_tasks.py` file, kicked off by Celery Beat.
# Troubleshooting
## Debug messages not being sent
### Getting the file location and tracing what happens
Ask the user to provide the csv file name: either the csv file they uploaded, or the one that is autogenerated when they do a one-off send and is visible in the UI.
@@ -1340,7 +1341,7 @@ Starting with the admin logs, search for this file name. When you find it, the
In the api logs, search by job_id. Either you will see evidence of the job failing and retrying over and over (in which case, search for a stack trace using the timestamp), or you will ultimately get to a log line that links the job_id to a message_id. In that case, switch to searching by message_id. You should be able to find the actual result from AWS, either success or failure, hopefully with some helpful info.
### Viewing the csv file
If you need to view the questionable csv file on production, run the following command:
@@ -1355,7 +1356,7 @@ locally, just do:
```
poetry run flask command download-csv-file-by-name -f <file location in admin logs>
```
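The production invocation itself is elided from this diff, but for illustration, a flask command is typically kicked off against a deployed cloud.gov app as a one-off task. A rough sketch only; the app name and invocation below are assumptions, not taken from this repo:

```
# Hypothetical: run the flask command as a one-off task on the production api app
cf run-task notify-api-production --command "flask command download-csv-file-by-name -f <file location in admin logs>" --name download-csv
```

Task output lands in the app's log stream, which matches the note in the debug steps below about searching notify-api-production after running the command.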
### Debug steps
1. Either send a message and capture the csv file name, or get a csv file name from a user
2. Using the log tool at logs.fr.cloud.gov, use filters to limit what you're searching on (cf.app is 'notify-admin-production', for example) and then search with the csv file name in double quotes over the relevant time period (the last 5 minutes if you just sent a message, or else whatever time the user sent at); see the example query after this list
@@ -1363,3 +1364,28 @@ poetry run flask command download-csv-file-by-name -f <file location in admin lo
4. To get the csv file contents, you can run the command above. This command currently prints to the notify-api log, so after you run the command, you need to search in notify-api-production for the last 5 minutes with the logs sorted by timestamp. The contents of the csv file unfortunately appear on separate lines, so it's very important to sort by time.
5. If you want to see where the message actually failed, search with cf.app is notify-api-production using the job_id that you saved in step #3. If you get far enough, you might see that one of the log lines has a message_id. If you see it, you can switch and search on that, which should tell you what happened in AWS (success or failure).
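For instance, a search combining the app filter with the quoted csv file name might look like the following. This is illustrative only: `<csv file name>` is a placeholder, and the exact syntax depends on the log tool's query language.

```
cf.app: "notify-admin-production" AND "<csv file name>"
```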
## Deployment / app push problems
### Routes cannot be mapped to destinations in different spaces
During `cf push` you may see:
```
For application 'notify-api-sandbox': Routes cannot be mapped to destinations in different spaces
```
:ghost: This indicates a ghost route squatting on a name your app needs. In the cloud.gov web interface, check for incomplete deployments; they might be holding on to a route. Delete them. Also check the list of routes (from the CloudFoundry icon in the left sidebar) for routes without an associated app. If any look like a route your app would need to create, delete them too.
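The same hunt can be done from the CLI. A rough sketch using standard CloudFoundry commands; cloud.gov's shared `app.cloud.gov` domain and the hostname below are illustrative:

```
# List routes in the current space; rows with an empty apps column are orphaned
cf routes
# Delete an orphaned route so your app can claim it
cf delete-route app.cloud.gov --hostname notify-api-sandbox
```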
### API request failed
After pushing the Admin app, you might see this in the logs:
```
{"name": "app", "levelname": "ERROR", "message": "API unknown failed with status 503 message Request failed", "pathname": "/home/vcap/app/app/__init__.py", ...
```
This indicates that the Admin and API apps are unable to talk to each other because of either a missing route or a missing network policy. The apps require [container-to-container networking](https://cloud.gov/docs/management/container-to-container/) to communicate. Run `cf network-policies` and compare the output to other deployed environments. If a policy is missing, create one with something like:
```
cf add-network-policy notify-admin-sandbox notify-api-sandbox --protocol tcp --port 61443
```
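Afterwards, `cf network-policies` should list the new policy, with output roughly like this (illustrative):

```
source                 destination          protocol   ports
notify-admin-sandbox   notify-api-sandbox   tcp        61443
```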