Add comprehensive Gitea Actions troubleshooting documentation
Some checks failed
Tests / Setup and Checkout (push) Failing after 1m43s
Tests / Backend Setup (Python 3.13 + uv + Environment) (push) Has been skipped
Tests / Frontend Setup (Node.js 24 + Yarn Berry + Build) (push) Has been skipped
Tests / Backend Tests (Python 3.13 + uv) (push) Has been skipped
Tests / Frontend Tests (TypeScript + Vue + Yarn Berry) (push) Has been skipped

- Documents the critical 'jobs waiting forever' issue and solution
- Root cause: Docker syntax in runs-on labels causes immediate job cancellation
- Includes diagnosis steps, SQL queries, and test procedures
- References multi-Pi runner infrastructure and lessons learned

Signed-off-by: Cliff Hill <xlorep@darkhelm.org>
2025-10-25 17:45:17 -04:00
parent 08a413af39
commit 253e3a7eb0
2 changed files with 155 additions and 2 deletions


@@ -194,6 +194,8 @@ npm run dev
---
## Documentation
- **[Gitea Actions Troubleshooting](docs/GITEA_ACTIONS_TROUBLESHOOTING.md)** - Solutions for CI/CD pipeline issues, including the critical "jobs waiting forever" problem
See the `backend/` and `frontend/` folders for more details.


@@ -0,0 +1,151 @@
# Gitea Actions Troubleshooting Guide
This document collects solutions to common issues with the Gitea Actions CI/CD pipeline.
## Critical Issue: Jobs Stuck in "Waiting" State Forever
### Symptoms
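- Workflow runs are created, but their jobs show "Waiting" indefinitely
- Runners are online and healthy
- No tasks appear in the `action_task` database table
- Runs show a 0-second duration (`updated` equals `created` in `action_run`)
- `action_run_job.status` is 5, which Gitea's `models/actions/status.go` defines as `StatusWaiting` (cancelled is 3), so the database agrees with the "Waiting" shown in the UI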
### Root Cause
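**Docker syntax in `runs-on` labels** stops Gitea Actions from dispatching jobs. The runners advertise bare label names such as `ubuntu-latest`, so a job requesting `ubuntu-latest:docker://ubuntu:22.04` most likely matches no runner and is never turned into a task.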
### Problem Syntax (BROKEN)
```yaml
jobs:
  setup:
    runs-on: ubuntu-latest:docker://ubuntu:22.04
  backend:
    runs-on: python-latest:docker://python:3.13-slim
  frontend:
    runs-on: node-latest:docker://node:20-bookworm-slim
```
### Solution Syntax (WORKING)
```yaml
jobs:
  setup:
    runs-on: ubuntu-latest
  backend:
    runs-on: python-latest
  frontend:
    runs-on: node-latest
```
### Why This Works
The runners are configured with Docker images in their labels:
```bash
GITEA_RUNNER_LABELS=ubuntu-latest:docker://ubuntu:22.04,node-latest:docker://node:20-bookworm-slim,python-latest:docker://python:3.13-slim
```
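A job only needs to request the label name. Gitea matches `runs-on` against the label names the runners advertise, and the runner then starts the Docker image mapped to that name in its own configuration, so jobs still run in the intended containers.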
### Diagnosis Steps
1. **Check if new runs are created:**
```sql
SELECT id, status, title FROM action_run ORDER BY id DESC LIMIT 3;
```
2. **Check job status and duration:**
```sql
SELECT arj.id, arj.job_id, arj.status, ar.created, ar.updated, (ar.updated - ar.created) as duration_seconds
FROM action_run_job arj
JOIN action_run ar ON arj.run_id = ar.id
WHERE ar.id = (SELECT MAX(id) FROM action_run);
```
3. **Check if tasks are created:**
```sql
SELECT * FROM action_task ORDER BY id DESC LIMIT 5;
```
4. **Verify runners are online:**
```sql
SELECT id, name, last_online, agent_labels FROM action_runner WHERE last_online > (EXTRACT(epoch FROM NOW()) - 300)::bigint;
```
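The point of query 4 is to compare the labels a job requests with the labels the online runners advertise. The sketch below does that comparison in Python on rows shaped like that query's output. It assumes `agent_labels` arrives as JSON text and that a runner qualifies only if every requested label equals one of its label names; this approximates Gitea's matching rather than reproducing it. The runner names in the example are hypothetical.

```python
import json


def label_name(label: str) -> str:
    """Reduce a label to its name: "ubuntu-latest:docker://ubuntu:22.04" -> "ubuntu-latest"."""
    return label.split(":", 1)[0]


def matching_runners(runs_on: list[str], runners: dict[str, list[str]]) -> list[str]:
    """Return the runners whose label names include every label the job requests."""
    matches = []
    for runner, agent_labels in runners.items():
        names = {label_name(label) for label in agent_labels}
        # Requested labels are compared verbatim, so a runs-on value that still
        # contains ":docker://..." cannot equal any runner label name.
        if all(label in names for label in runs_on):
            matches.append(runner)
    return matches


if __name__ == "__main__":
    # Hypothetical rows from query 4: (name, agent_labels as JSON text).
    rows = [
        ("urtzul-runner-1", '["ubuntu-latest", "node-latest", "python-latest"]'),
        ("zhokq-runner-1", '["ubuntu-latest", "node-latest", "python-latest"]'),
    ]
    runners = {name: json.loads(labels) for name, labels in rows}
    print(matching_runners(["python-latest"], runners))
    print(matching_runners(["python-latest:docker://python:3.13-slim"], runners))
```

An empty result for a job's `runs-on` value means no online runner can ever claim it.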
### Key Indicators
- **Duration = 0 seconds** → The run was never updated after creation; no runner picked up its jobs
- **Empty `action_task` table** → Jobs were never converted to executable tasks
- **Status 5 jobs with status 7 dependents** → The setup job is waiting (5) for a runner, and the jobs that depend on it are blocked (7)
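Gitea's source (`models/actions/status.go`) defines the numeric status codes stored in these tables as shown below; check the enum against your Gitea version before relying on it. The `diagnose` function is a hypothetical triage helper that applies the indicators above to one run's job statuses and its number of `action_task` rows.

```python
from enum import IntEnum


class Status(IntEnum):
    """Action run/job status codes, per Gitea's models/actions/status.go."""

    UNKNOWN = 0
    SUCCESS = 1
    FAILURE = 2
    CANCELLED = 3
    SKIPPED = 4
    WAITING = 5
    RUNNING = 6
    BLOCKED = 7


def diagnose(job_statuses: list[int], task_count: int) -> str:
    """Hypothetical triage helper for a single run.

    job_statuses: action_run_job.status values for the run.
    task_count: number of action_task rows created for those jobs.
    """
    statuses = {Status(code) for code in job_statuses}
    if task_count == 0 and Status.WAITING in statuses:
        # A waiting job with no task means no runner ever claimed it,
        # which is what an unmatched runs-on label looks like.
        return "not dispatched: check runs-on labels against runner labels"
    if Status.CANCELLED in statuses:
        return "cancelled"
    return "dispatched"


if __name__ == "__main__":
    # Example: setup job waiting, four dependents blocked, no tasks.
    print([Status(code).name for code in (5, 7)])
    print(diagnose([5, 7, 7, 7, 7], task_count=0))
```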
### Test Procedure
Create a minimal test workflow to isolate issues:
```yaml
# .gitea/workflows/test-simple.yml
name: Simple Test
on: push
jobs:
  test:
    name: Simple Test
    runs-on: ubuntu-latest
    steps:
      - name: Echo
        run: echo "Hello World"
```
If this works but your main workflow doesn't, the issue is likely syntax-related.
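Because the problem comes down to what follows `runs-on:`, a quick check can catch it before pushing. This Python sketch scans workflow files line by line for `runs-on` values (inline or as a block list) that contain `docker://`. It is a rough text check, not a YAML parser, and the script name in the usage line below is hypothetical.

```python
import sys
from pathlib import Path


def find_docker_runs_on(text: str) -> list[int]:
    """Return 1-based line numbers where a runs-on value contains "docker://"."""
    hits = []
    in_block = False  # True while reading a block list under a bare "runs-on:"
    block_indent = 0
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.lstrip()
        indent = len(line) - len(stripped)
        if stripped.startswith("runs-on:"):
            value = stripped[len("runs-on:"):].strip()
            if "docker://" in value:
                hits.append(lineno)
            in_block = value == ""
            block_indent = indent
        elif in_block and stripped.startswith("-") and indent >= block_indent:
            # List item belonging to the runs-on block.
            if "docker://" in stripped:
                hits.append(lineno)
        elif stripped:
            # Any other non-blank line ends the runs-on block.
            in_block = False
    return hits


def main(paths: list[str]) -> int:
    """Print one warning per offending line; return the number of warnings."""
    count = 0
    for path in paths:
        for lineno in find_docker_runs_on(Path(path).read_text()):
            print(f"{path}:{lineno}: runs-on contains docker://; use the bare label name")
            count += 1
    return count


if __name__ == "__main__":
    if main(sys.argv[1:]):
        sys.exit(1)
```

Usage: `python check_runs_on.py .gitea/workflows/*.yml`. It exits with status 1 when it finds a match, so it could run as a pre-commit hook. Step-level `uses: docker://...` references are deliberately ignored, since only `runs-on` is involved here.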
## Other Common Issues
### Cache/UI Synchronization Problems
If the UI shows a different status than the database:
1. Restart Gitea: `docker compose restart server`
2. Clear the browser cache
3. Compare the UI status with `action_run_job.status` using the diagnosis queries above
### Stuck Runs from Previous Sessions
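Clean up stuck runs (back up the database first, and only do this when no jobs are genuinely in progress):
```sql
-- Cancel jobs and runs stuck in a pending state.
-- Status codes follow Gitea's models/actions/status.go:
-- 3 = cancelled, 5 = waiting, 6 = running, 7 = blocked.
UPDATE action_run_job SET status = 3 WHERE status IN (5, 6, 7);
UPDATE action_run SET status = 3 WHERE status IN (5, 6, 7);
```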
### Runner Registration Issues
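If runners report "unregistered runner" errors:
1. Delete the runner registrations: `DELETE FROM action_runner;`
2. Restart all runner containers
3. Let them re-register with fresh state (this assumes each runner container registers itself on startup; if its `.runner` credentials file is persisted in a volume, remove that file first)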
## Infrastructure Overview
### Current Setup
- **Gitea Server**: Docker container with PostgreSQL backend
- **Runners**: 8 Raspberry Pi runners across 4 servers
  - pi-desktop: Pi 400 4GB (2 runners)
  - kankali: Pi with local Gitea (2 runners)
  - urtzul: Pi 4B 8GB (2 runners)
  - zhokq: Pi 4B 8GB (2 runners)
### Runner Configuration
Each runner supports multiple Docker environments:
- `ubuntu-latest` → `ubuntu:22.04`
- `python-latest` → `python:3.13-slim`
- `node-latest` → `node:20-bookworm-slim`
- `ubuntu-act` → `catthehacker/ubuntu:act-latest`
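The table above and the `GITEA_RUNNER_LABELS` value shown earlier describe the same mapping, although that earlier value lists only three of these four labels. If you keep the mapping in one place, a short Python sketch can generate the environment value from it. The `runner_labels_env` helper is hypothetical, and whether every runner should expose `ubuntu-act` is a deployment choice this guide does not settle.

```python
# Name -> Docker image, copied from the runner configuration table above.
RUNNER_IMAGES = {
    "ubuntu-latest": "ubuntu:22.04",
    "python-latest": "python:3.13-slim",
    "node-latest": "node:20-bookworm-slim",
    "ubuntu-act": "catthehacker/ubuntu:act-latest",
}


def runner_labels_env(images: dict[str, str]) -> str:
    """Build a comma-separated GITEA_RUNNER_LABELS value of "name:docker://image" entries."""
    return ",".join(f"{name}:docker://{image}" for name, image in images.items())


if __name__ == "__main__":
    print(f"GITEA_RUNNER_LABELS={runner_labels_env(RUNNER_IMAGES)}")
```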
### Workflow Design
Multi-stage pipeline with artifact passing:
1. **Setup**: Checkout code, create artifacts
2. **Parallel Setup**: Backend (Python/uv) + Frontend (Node.js/Yarn)
3. **Parallel Tests**: Backend tests + Frontend tests
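The stage structure follows from each job's `needs:` list. This Python sketch uses the standard library's `graphlib.TopologicalSorter` to group jobs into stages whose members can run in parallel. The job IDs and dependency edges are assumptions that mirror the three stages listed above, not the project's actual workflow file.

```python
from graphlib import TopologicalSorter

# Hypothetical job IDs and `needs:` edges (job -> jobs it waits for).
NEEDS = {
    "setup": set(),
    "backend-setup": {"setup"},
    "frontend-setup": {"setup"},
    "backend-tests": {"backend-setup"},
    "frontend-tests": {"frontend-setup"},
}


def stages(needs: dict[str, set[str]]) -> list[list[str]]:
    """Group jobs into stages; every job in a stage has all its needs satisfied."""
    sorter = TopologicalSorter(needs)
    sorter.prepare()
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        result.append(ready)
        # Mark the whole stage finished so the next stage becomes ready.
        sorter.done(*ready)
    return result


if __name__ == "__main__":
    for number, jobs in enumerate(stages(NEEDS), start=1):
        print(number, jobs)
```

If `setup` never leaves the waiting state, nothing after the first stage becomes ready, which is why its dependents stay blocked (status 7) in the database.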
## Lessons Learned
1. **Gitea Actions syntax is stricter than GitHub Actions**
2. **Runner labels must match exactly** - no Docker syntax in workflow files
3. **Database debugging is essential** - UI can show cached/incorrect status
4. **Unmatched `runs-on` labels fail silently**: jobs wait indefinitely instead of reporting an error
5. **Empty action_task table** is the key indicator of dispatch failure
---
*Last updated: October 25, 2025*
*Issue resolved after extensive database-level debugging and syntax isolation*