- Documents the critical 'jobs waiting forever' issue and solution
- Root cause: Docker syntax in runs-on labels causes immediate job cancellation
- Includes diagnosis steps, SQL queries, and test procedures
- References multi-Pi runner infrastructure and lessons learned
Signed-off-by: Cliff Hill <xlorep@darkhelm.org>
Gitea Actions Troubleshooting Guide
This document contains solutions to common issues with the Gitea Actions CI/CD pipeline.
Critical Issue: Jobs Stuck in "Waiting" State Forever
Symptoms
- Workflows are created but jobs show "Waiting" indefinitely
- Runners are online and healthy
- No tasks appear in the action_task database table
- Jobs get cancelled immediately (0-second duration)
- UI shows "Waiting" but database shows status 5 (cancelled)
Root Cause
Docker syntax in runs-on labels causes Gitea Actions to immediately cancel jobs.
Problem Syntax (BROKEN)
jobs:
  setup:
    runs-on: ubuntu-latest:docker://ubuntu:22.04
  backend:
    runs-on: python-latest:docker://python:3.13-slim
  frontend:
    runs-on: node-latest:docker://node:20-bookworm-slim
Solution Syntax (WORKING)
jobs:
  setup:
    runs-on: ubuntu-latest
  backend:
    runs-on: python-latest
  frontend:
    runs-on: node-latest
Why This Works
The runners are configured with Docker images in their labels:
GITEA_RUNNER_LABELS=ubuntu-latest:docker://ubuntu:22.04,node-latest:docker://node:20-bookworm-slim,python-latest:docker://python:3.13-slim
So jobs still run in the correct Docker containers, but Gitea can properly parse and dispatch them.
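To make the label-to-image mapping concrete, here is a minimal Python sketch that splits a GITEA_RUNNER_LABELS value of the form name:docker://image into label names and images. It is an illustration only, not the runner's actual parser, and `parse_runner_labels` is a hypothetical helper name. The point is that the workflow's runs-on value should be just the name on the left of the first colon.

```python
def parse_runner_labels(raw: str) -> dict[str, str]:
    """Map each label name to its Docker image (illustrative, not the runner's parser)."""
    mapping = {}
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Split only on the first colon: "name:docker://image:tag".
        name, _, target = entry.partition(":")
        # Drop the scheme so only the image reference remains.
        mapping[name] = target.removeprefix("docker://")
    return mapping


labels = parse_runner_labels(
    "ubuntu-latest:docker://ubuntu:22.04,"
    "node-latest:docker://node:20-bookworm-slim,"
    "python-latest:docker://python:3.13-slim"
)
# A job with "runs-on: python-latest" would run in this image.
print(labels["python-latest"])
```

The keys of the resulting mapping (ubuntu-latest, node-latest, python-latest) are the only values that belong in runs-on.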
Diagnosis Steps
- Check if new runs are created:
SELECT id, status, title FROM action_run ORDER BY id DESC LIMIT 3;
- Check job status and duration:
SELECT arj.id, arj.job_id, arj.status, ar.created, ar.updated, (ar.updated - ar.created) as duration_seconds
FROM action_run_job arj
JOIN action_run ar ON arj.run_id = ar.id
WHERE ar.id = (SELECT MAX(id) FROM action_run);
- Check if tasks are created:
SELECT * FROM action_task ORDER BY id DESC LIMIT 5;
- Verify runners are online:
SELECT id, name, last_online, agent_labels FROM action_runner WHERE last_online > (EXTRACT(epoch FROM NOW()) - 300)::bigint;
Key Indicators
- Duration = 0 seconds → Immediate cancellation due to syntax issue
- Empty action_task table → Jobs never converted to executable tasks
- Status 5 jobs with Status 7 dependents → Setup job cancelled, others skipped
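The three indicators above can be combined into a small check once the query results are in hand. The following Python sketch assumes the created and updated columns are integer Unix timestamps and uses the status values 5 and 7 exactly as this guide reports them. `JobRow` and `dispatch_failure_indicators` are hypothetical names, and the sample timestamps are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class JobRow:
    status: int   # numeric status as stored in action_run_job
    created: int  # assumed Unix timestamp of the run's creation
    updated: int  # assumed Unix timestamp of the run's last update


def dispatch_failure_indicators(jobs: list[JobRow], task_count: int) -> list[str]:
    """Return which of the guide's key indicators are present."""
    found = []
    # Indicator 1: every job finished with a zero-second duration.
    if jobs and all(job.updated - job.created == 0 for job in jobs):
        found.append("zero-second duration")
    # Indicator 2: no rows in action_task, so nothing was dispatched.
    if task_count == 0:
        found.append("empty action_task table")
    # Indicator 3: a status 5 job alongside status 7 dependents.
    statuses = {job.status for job in jobs}
    if 5 in statuses and 7 in statuses:
        found.append("status 5 job with status 7 dependents")
    return found


rows = [JobRow(status=5, created=1000, updated=1000),
        JobRow(status=7, created=1000, updated=1000)]
for indicator in dispatch_failure_indicators(rows, task_count=0):
    print(indicator)
```

If all three indicators appear together, the runs-on syntax described above is the first thing to check.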
Test Procedure
Create a minimal test workflow to isolate issues:
# .gitea/workflows/test-simple.yml
name: Simple Test
on: push
jobs:
  test:
    name: Simple Test
    runs-on: ubuntu-latest
    steps:
      - name: Echo
        run: echo "Hello World"
If this works but your main workflow doesn't, the issue is likely syntax-related.
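To narrow down a syntax-related failure in the main workflow, it can help to scan the file for runs-on values that still carry Docker syntax. The sketch below is a minimal, line-based check in Python that uses only the standard library. It handles only the single-line runs-on: value form, not YAML lists or expressions, and `find_docker_runs_on` is a hypothetical helper name.

```python
import re

# Matches a single-value "runs-on:" line and captures the trimmed value.
RUNS_ON = re.compile(r"^\s*runs-on:\s*(?P<value>\S.*?)\s*$")


def find_docker_runs_on(workflow_text: str) -> list[tuple[int, str]]:
    """Return (line_number, value) for each runs-on value containing 'docker://'."""
    hits = []
    for number, line in enumerate(workflow_text.splitlines(), start=1):
        match = RUNS_ON.match(line)
        if match and "docker://" in match.group("value"):
            hits.append((number, match.group("value")))
    return hits


sample = """\
jobs:
  setup:
    runs-on: ubuntu-latest:docker://ubuntu:22.04
  backend:
    runs-on: python-latest
"""
for number, value in find_docker_runs_on(sample):
    print(f"line {number}: {value}")
```

For the sample above, only the setup job is reported. The backend job already uses a bare label name.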
Other Common Issues
Cache/UI Synchronization Problems
If the UI shows a different status than the database:
- Restart Gitea: docker compose restart server
- Clear the browser cache
- Compare the database status with the UI status to confirm the discrepancy is gone
Stuck Runs from Previous Sessions
Clean up stuck runs:
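-- Clear stuck pending jobs
-- The numeric status values come from Gitea's internal status enum; confirm
-- what 1, 2, and 5 mean in your Gitea version before running these statements,
-- because each UPDATE rewrites every matching row.
UPDATE action_run_job SET status = 5 WHERE status IN (1, 2);
UPDATE action_run SET status = 5 WHERE status IN (1, 2);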
Runner Registration Issues
If runners show "unregistered runner" errors:
- Delete runner registrations: DELETE FROM action_runner;
- Restart all runner containers
- Let them auto-register with fresh state
Infrastructure Overview
Current Setup
- Gitea Server: Docker container with PostgreSQL backend
- Runners: 8 Raspberry Pi runners across 4 servers
  - pi-desktop: Pi 400 4GB (2 runners)
  - kankali: Pi with local Gitea (2 runners)
  - urtzul: Pi 4B 8GB (2 runners)
  - zhokq: Pi 4B 8GB (2 runners)
Runner Configuration
Each runner supports multiple Docker environments:
- ubuntu-latest → ubuntu:22.04
- python-latest → python:3.13-slim
- node-latest → node:20-bookworm-slim
- ubuntu-act → catthehacker/ubuntu:act-latest
Workflow Design
Multi-stage pipeline with artifact passing:
- Setup: Checkout code, create artifacts
- Parallel Setup: Backend (Python/uv) + Frontend (Node.js/Yarn)
- Parallel Tests: Backend tests + Frontend tests
Lessons Learned
- Gitea Actions syntax is stricter than GitHub Actions
- Runner labels must match exactly - no Docker syntax in workflow files
- Database debugging is essential - UI can show cached/incorrect status
- Job cancellation happens immediately for syntax errors
- Empty action_task table is the key indicator of dispatch failure
Last updated: October 25, 2025
Issue resolved after extensive database-level debugging and syntax isolation.