Add comprehensive Gitea Actions troubleshooting documentation
Some checks failed
Tests / Setup and Checkout (push) Failing after 1m43s
Tests / Backend Setup (Python 3.13 + uv + Environment) (push) Has been skipped
Tests / Frontend Setup (Node.js 24 + Yarn Berry + Build) (push) Has been skipped
Tests / Backend Tests (Python 3.13 + uv) (push) Has been skipped
Tests / Frontend Tests (TypeScript + Vue + Yarn Berry) (push) Has been skipped
- Documents the critical 'jobs waiting forever' issue and solution
- Root cause: Docker syntax in runs-on labels causes immediate job cancellation
- Includes diagnosis steps, SQL queries, and test procedures
- References multi-Pi runner infrastructure and lessons learned

Signed-off-by: Cliff Hill <xlorep@darkhelm.org>
@@ -194,6 +194,8 @@ npm run dev
---

See the `backend/` and `frontend/` folders for more details.

## Documentation

Project documentation and troubleshooting guides:
- **[Gitea Actions Troubleshooting](docs/GITEA_ACTIONS_TROUBLESHOOTING.md)** - Solutions for CI/CD pipeline issues, including the critical "jobs waiting forever" problem

See the `backend/` and `frontend/` folders for more details.

151
docs/GITEA_ACTIONS_TROUBLESHOOTING.md
Normal file
@@ -0,0 +1,151 @@
# Gitea Actions Troubleshooting Guide

This document contains solutions to common issues with the Gitea Actions CI/CD pipeline.

## Critical Issue: Jobs Stuck in "Waiting" State Forever

### Symptoms

- Workflows are created, but their jobs show "Waiting" indefinitely
- Runners are online and healthy
- No tasks appear in the `action_task` database table
- Jobs never start, so the run shows a 0-second duration
- The UI shows "Waiting" and the database shows status 5, which is `Waiting` (not cancelled) in Gitea's `Status` enum; cancelled is 3

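### Root Cause

**Docker syntax in `runs-on` labels** stops Gitea Actions from dispatching jobs. A workflow's `runs-on` value must be the bare label name; the `:docker://image` suffix belongs only in the runner's label configuration. A value such as `ubuntu-latest:docker://ubuntu:22.04` matches no runner label, so no runner ever picks the job up.
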
### Problem Syntax (BROKEN)

```yaml
jobs:
  setup:
    runs-on: ubuntu-latest:docker://ubuntu:22.04
  backend:
    runs-on: python-latest:docker://python:3.13-slim
  frontend:
    runs-on: node-latest:docker://node:20-bookworm-slim
```

### Solution Syntax (WORKING)

```yaml
jobs:
  setup:
    runs-on: ubuntu-latest
  backend:
    runs-on: python-latest
  frontend:
    runs-on: node-latest
```

### Why This Works

The runners are configured with Docker images in their labels, each in the form `name:docker://image`:

```bash
GITEA_RUNNER_LABELS=ubuntu-latest:docker://ubuntu:22.04,node-latest:docker://node:20-bookworm-slim,python-latest:docker://python:3.13-slim
```

The workflow names only the label, and the runner maps that label to its image. Jobs still run in the correct Docker containers, and Gitea can match and dispatch them.

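The matching rule can be illustrated with a short Python sketch. This is a simplified model written for this guide, not Gitea's or act_runner's actual code: `parse_runner_labels` and `can_dispatch` are hypothetical helper names, and the only rule modelled is that `runs-on` must equal the part of a label before the first colon.

```python
def parse_runner_labels(raw: str) -> dict[str, str]:
    """Map each label name to its backend spec, e.g. 'docker://ubuntu:22.04'."""
    labels = {}
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the first colon only: the image reference contains colons too.
        name, _, spec = entry.partition(":")
        labels[name] = spec
    return labels


def can_dispatch(runs_on: str, labels: dict[str, str]) -> bool:
    """Simplified rule: a job is dispatchable only if runs-on equals a label name."""
    return runs_on in labels


# Same value as the GITEA_RUNNER_LABELS setting shown above.
RUNNER_LABELS = (
    "ubuntu-latest:docker://ubuntu:22.04,"
    "node-latest:docker://node:20-bookworm-slim,"
    "python-latest:docker://python:3.13-slim"
)

labels = parse_runner_labels(RUNNER_LABELS)
print(can_dispatch("ubuntu-latest", labels))                        # True
print(can_dispatch("ubuntu-latest:docker://ubuntu:22.04", labels))  # False
```

Under this model the broken workflow fails because the whole string `ubuntu-latest:docker://ubuntu:22.04` is treated as a label name, and no runner advertises a label with that name.
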
### Diagnosis Steps

1. **Check if new runs are created:**
   ```sql
   SELECT id, status, title FROM action_run ORDER BY id DESC LIMIT 3;
   ```

2. **Check job status and duration:**
   ```sql
   SELECT arj.id, arj.job_id, arj.status, ar.created, ar.updated, (ar.updated - ar.created) AS duration_seconds
   FROM action_run_job arj
   JOIN action_run ar ON arj.run_id = ar.id
   WHERE ar.id = (SELECT MAX(id) FROM action_run);
   ```

3. **Check if tasks are created:**
   ```sql
   SELECT * FROM action_task ORDER BY id DESC LIMIT 5;
   ```

4. **Verify runners are online:**
   ```sql
   SELECT id, name, last_online, agent_labels FROM action_runner WHERE last_online > (EXTRACT(epoch FROM NOW()) - 300)::bigint;
   ```

### Key Indicators

- **Duration = 0 seconds** → The job never started, which points to a dispatch problem such as a `runs-on` mismatch
- **Empty `action_task` table** → Jobs were never converted to executable tasks
- **Status 5 jobs with status 7 dependents** → The setup job is waiting for a runner and the jobs that need it are blocked (in Gitea's `Status` enum, 5 is `Waiting` and 7 is `Blocked`)

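These indicators can be combined into one check. The sketch below is illustrative only: it assumes the status values from Gitea's `Status` enum (5 is `Waiting`, 7 is `Blocked`), and `JobRow`, `looks_like_dispatch_failure`, and the sample rows are hypothetical names and data shaped like the output of diagnosis query 2.

```python
from dataclasses import dataclass

# Status values assumed from Gitea's actions Status enum.
WAITING, BLOCKED = 5, 7


@dataclass
class JobRow:
    job_id: str
    status: int
    duration_seconds: int


def looks_like_dispatch_failure(jobs: list[JobRow], task_count: int) -> bool:
    """True when jobs exist, no task was created, and nothing ever ran."""
    if not jobs or task_count > 0:
        return False
    never_ran = all(j.duration_seconds == 0 for j in jobs)
    stuck = all(j.status in (WAITING, BLOCKED) for j in jobs)
    has_waiting = any(j.status == WAITING for j in jobs)
    return never_ran and stuck and has_waiting


# Hypothetical rows: one waiting setup job and two dependents blocked on it.
rows = [
    JobRow("setup", WAITING, 0),
    JobRow("backend", BLOCKED, 0),
    JobRow("frontend", BLOCKED, 0),
]
print(looks_like_dispatch_failure(rows, task_count=0))  # True
```

A `True` result only says the pattern matches. It does not prove that `runs-on` is the cause, since a run with no online runners would look the same.
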
### Test Procedure

Create a minimal test workflow to isolate issues:

```yaml
# .gitea/workflows/test-simple.yml
name: Simple Test
on: push
jobs:
  test:
    name: Simple Test
    runs-on: ubuntu-latest
    steps:
      - name: Echo
        run: echo "Hello World"
```

If this works but your main workflow doesn't, the issue is likely syntax-related.

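To narrow down a syntax problem in the main workflow, a small script can flag `runs-on` values that embed a backend spec. This is a rough sketch, not a YAML parser: `find_bad_runs_on` is a hypothetical helper, it only handles single-line `runs-on:` values (not lists or matrix expressions), and a trailing comment on the same line would be read as part of the value.

```python
import re

# Matches a single-line `runs-on:` value; list and matrix forms are not handled.
RUNS_ON = re.compile(r"^[ \t]*runs-on:[ \t]*(?P<value>\S.*?)[ \t]*$", re.MULTILINE)


def find_bad_runs_on(workflow_text: str) -> list[str]:
    """Return runs-on values that embed a backend spec such as docker://."""
    values = [m.group("value").strip("'\"") for m in RUNS_ON.finditer(workflow_text)]
    return [v for v in values if "://" in v]


# Sample workflow text: one valid label and one with Docker syntax.
sample = """\
jobs:
  setup:
    runs-on: ubuntu-latest
  backend:
    runs-on: python-latest:docker://python:3.13-slim
"""
print(find_bad_runs_on(sample))
```

For the sample above, only the `backend` job's value is reported. In practice the text would come from reading each file under `.gitea/workflows/`.
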
## Other Common Issues

### Cache/UI Synchronization Problems

If the UI shows a different status than the database:
1. Restart Gitea: `docker compose restart server`
2. Clear the browser cache
3. Compare the database and UI statuses again

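### Stuck Runs from Previous Sessions

Clean up stuck runs by marking them cancelled. In Gitea's `Status` enum, 5 is `Waiting`, 6 is `Running`, 7 is `Blocked`, and 3 is `Cancelled`. Values 1 and 2 are `Success` and `Failure`, so they must not be matched here:

```sql
-- Cancel stuck waiting, running, and blocked jobs
UPDATE action_run_job SET status = 3 WHERE status IN (5, 6, 7);
UPDATE action_run SET status = 3 WHERE status IN (5, 6, 7);
```
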
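Because this cleanup rewrites rows in bulk, it can help to preview how many rows would change before applying it. The sketch below shows the idea against an in-memory SQLite table that stands in for the real PostgreSQL database. `cancel_stuck` is a hypothetical helper, the table has only the columns needed here, and the status values are assumed from Gitea's `Status` enum (3 `Cancelled`, 5 `Waiting`, 6 `Running`, 7 `Blocked`).

```python
import sqlite3

# Status values assumed from Gitea's actions Status enum.
CANCELLED, WAITING, RUNNING, BLOCKED = 3, 5, 6, 7
STUCK = (WAITING, RUNNING, BLOCKED)


def cancel_stuck(conn: sqlite3.Connection, table: str, dry_run: bool = True) -> int:
    """Count rows in a stuck state and, unless dry_run, mark them cancelled."""
    if table not in ("action_run", "action_run_job"):
        raise ValueError(f"unexpected table: {table}")
    placeholders = ",".join("?" * len(STUCK))
    count = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE status IN ({placeholders})", STUCK
    ).fetchone()[0]
    if not dry_run:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                f"UPDATE {table} SET status = ? WHERE status IN ({placeholders})",
                (CANCELLED, *STUCK),
            )
    return count


# In-memory stand-in for the real table, with hypothetical sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE action_run_job (id INTEGER PRIMARY KEY, status INTEGER)")
conn.executemany("INSERT INTO action_run_job (status) VALUES (?)", [(1,), (5,), (6,), (2,)])
conn.commit()

print(cancel_stuck(conn, "action_run_job"))                 # preview only
print(cancel_stuck(conn, "action_run_job", dry_run=False))  # applies the update
print(conn.execute("SELECT status FROM action_run_job ORDER BY id").fetchall())
```

Both calls report two affected rows, and the final query shows that the finished jobs (status 1 and 2) are left untouched while the waiting and running ones become 3.
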
### Runner Registration Issues

If runners show "unregistered runner" errors:
1. Delete runner registrations: `DELETE FROM action_runner;`
2. Restart all runner containers
3. Let them auto-register with fresh state

## Infrastructure Overview

### Current Setup

- **Gitea Server**: Docker container with PostgreSQL backend
- **Runners**: 8 Raspberry Pi runners across 4 servers
  - pi-desktop: Pi 400 4GB (2 runners)
  - kankali: Pi with local Gitea (2 runners)
  - urtzul: Pi 4B 8GB (2 runners)
  - zhokq: Pi 4B 8GB (2 runners)

### Runner Configuration

Each runner supports multiple Docker environments:
- `ubuntu-latest` → `ubuntu:22.04`
- `python-latest` → `python:3.13-slim`
- `node-latest` → `node:20-bookworm-slim`
- `ubuntu-act` → `catthehacker/ubuntu:act-latest`

### Workflow Design

Multi-stage pipeline with artifact passing:
1. **Setup**: Checkout code, create artifacts
2. **Parallel Setup**: Backend (Python/uv) + Frontend (Node.js/Yarn)
3. **Parallel Tests**: Backend tests + Frontend tests

## Lessons Learned

1. **Gitea Actions syntax is stricter than GitHub Actions**
2. **Runner labels must match exactly** - no Docker syntax in workflow files
3. **Database debugging is essential** - UI can show cached/incorrect status
4. **Dispatch failures are silent** - a job whose `runs-on` label matches no runner waits indefinitely with no error message
5. **Empty `action_task` table** is the key indicator of dispatch failure

---

*Last updated: October 25, 2025*
*Issue resolved after extensive database-level debugging and syntax isolation*