docs: normalize markdown quality across PP-58 docs

copilotcoder
2026-06-19 10:33:42 -04:00
parent aaa94f5d32
commit 676cb11a07
3 changed files with 113 additions and 22 deletions


@@ -7,9 +7,11 @@ This project uses a two-stage Docker build approach to optimize CI/CD performanc
## Architecture
### Stage 1: Base Image (`Dockerfile.cicd-base`)
**Purpose**: Contains all system dependencies and language runtimes that change infrequently.
**Contents**:
- Ubuntu 22.04 base system
- Python 3.14 with development tools
- Node.js 24 with npm/yarn
@@ -22,15 +24,18 @@ This project uses a two-stage Docker build approach to optimize CI/CD performanc
- SSH helper scripts for git operations
**Registry**:
- Immutable: `kankali.darkhelm.lan:3001/darkhelm.org/plex-playlist-cicd-base:<hash>`
- Convenience: `kankali.darkhelm.lan:3001/darkhelm.org/plex-playlist-cicd-base:latest`
**Rebuild Triggers**: Only when `Dockerfile.cicd-base`, `.dockerignore`, or the shared hash helper changes
### Stage 2: Complete Image (`Dockerfile.cicd`)
**Purpose**: Inherits from base and adds project code and dependencies.
**Contents**:
- Project source code (cloned via SSH)
- **Optimized backend dependencies** (leverages pre-installed dev tools)
- **Optimized frontend dependencies** (leverages global TypeScript, ESLint, etc.)
@@ -45,6 +50,7 @@ This project uses a two-stage Docker build approach to optimize CI/CD performanc
## Performance Benefits
### Before Multi-Stage Optimization
- Single monolithic build: ~15-25 minutes on Raspberry Pi 4GB workers
- Full system dependency installation every time
- No caching of expensive operations (Python compilation, Node.js setup)
@@ -52,6 +58,7 @@ This project uses a two-stage Docker build approach to optimize CI/CD performanc
- Common dev tools (ruff, pyright, eslint, typescript) compiled from source each time
### After Multi-Stage Optimization (✅ **VALIDATED SUCCESSFUL**)
- **Complete CI/CD pipeline: ~3-5 minutes** (85% improvement!)
- Base image cached and reused across builds
- Pre-installed development tools eliminate compilation overhead
@@ -62,7 +69,9 @@ This project uses a two-stage Docker build approach to optimize CI/CD performanc
## Advanced Optimizations in Base Image
### Pre-installed Development Tools
**Python Tools** (cached in `/opt/python-dev-tools/`):
- `ruff` - Fast Python linter/formatter
- `pyright` - Python type checker
- `pytest` + plugins - Testing framework
@@ -70,6 +79,7 @@ This project uses a two-stage Docker build approach to optimize CI/CD performanc
- `yamllint`, `toml-sort` - Configuration file tools
**Node.js Tools** (installed globally via npm):
- `@playwright/test` - Playwright testing framework
- `typescript` - TypeScript compiler
- `eslint` - JavaScript/TypeScript linter
@@ -82,12 +92,14 @@ This project uses a two-stage Docker build approach to optimize CI/CD performanc
- **Build Reliability**: Stable tool versions cached in base
### After Multi-Stage (Fully Optimized)
- Base image build: ~20-25 minutes (only when base changes, includes browsers + dev tools)
- Complete image build: ~2-3 minutes (reuses cached base with everything pre-installed)
- **Typical CI run**: ~2-3 minutes (98% of runs use fully cached base)
- **Major wins**: No browser downloads (~400MB), no dev tool compilation, faster dependency resolution
### Caching Strategy
1. **Docker Layer Caching**: Docker automatically caches unchanged layers
2. **Registry Caching**: Base image is built once and then pulled by all runners
3. **Hash-Based Invalidation**: Base image tagged with a shared helper-derived hash
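As a rough illustration of hash-based invalidation, the sketch below derives a short tag from the contents of the base inputs. It is not the project's shared hash helper, whose actual logic is not shown here; the function name `compute_base_hash`, the use of `sha256sum`, and the 12-character truncation are all assumptions.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not the project's real hash script): derive a short,
# deterministic tag from the contents of the files passed as arguments.
compute_base_hash() {
  # sha256sum prints one "<digest>  <path>" line per file; hashing that listing
  # yields a single digest that changes when any file's content or path changes.
  sha256sum "$@" | sha256sum | cut -c1-12
}

# Example (assumed inputs, taken from the rebuild triggers listed earlier):
#   tag="$(compute_base_hash Dockerfile.cicd-base .dockerignore)"
#   echo "plex-playlist-cicd-base:${tag}"
```

Because the tag depends only on file contents and paths, two runners given the same inputs should compute the same tag, which is what lets one published image serve every build.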
@@ -129,6 +141,7 @@ jobs:
```
### Responsibility Split
- `.gitea/workflows/docker-build-base.yaml` owns base publication and verification.
- `.gitea/workflows/docker-build-main.yaml` owns complete-image publication.
- `.gitea/workflows/cicd-start.yaml`, `.gitea/workflows/cicd-checks.yaml`, and
@@ -138,6 +151,7 @@ jobs:
## Local Development
### Building Base Image
```bash
# Build base image locally
docker build -f Dockerfile.cicd-base -t cicd-base:local .
@@ -147,6 +161,7 @@ docker run -it cicd-base:local bash
```
### Building Complete Image
```bash
# Build complete image (requires SSH access to git repo)
export SSH_PRIVATE_KEY="$(cat ~/.ssh/id_rsa)"
@@ -163,6 +178,7 @@ rm /tmp/ssh_key
```
### Using Local Build Script
```bash
# Use the provided build script
./scripts/build-cicd-local.sh
@@ -174,12 +190,14 @@ base image matches the immutable tag CI expects.
## Memory Optimization
### Raspberry Pi 4GB Constraints
- **Swap File**: 1GB temporary swap during yarn install
- **Node.js Memory**: Limited to 1024MB (`--max-old-space-size=1024`)
- **UV Workers**: Single-threaded Python package installation
- **Graceful Degradation**: Frontend dependencies optional in constrained environments
### Frontend Dependency Handling
```dockerfile
# Conservative installation with fallback
RUN export NODE_OPTIONS="--max-old-space-size=1024" && \
@@ -197,23 +215,27 @@ RUN export NODE_OPTIONS="--max-old-space-size=1024" && \
## Monitoring and Debugging
### Build Time Tracking
- Base image builds logged with timing information
- Hash-based cache hit/miss tracking
- Registry pull vs build decision logging
### Troubleshooting
1. **Base Image Issues**: Check `Dockerfile.cicd-base` syntax and system dependencies
2. **Complete Image Issues**: Usually project dependency or SSH access problems
3. **Cache Misses**: Verify registry connectivity and the shared base hash calculation
4. **Memory Issues**: Check swap setup and Node.js memory limits
### Missing Immutable Base Tag
- Symptom: main CI fails with `Required immutable base image is not available`
- Cause: the expected `cicd-base:<hash>` has not been published yet
- Fix: run or rerun the `CICD Base Image` workflow, or wait for it to finish when a PR changes base inputs
- Design note: main CI intentionally fails instead of rebuilding the base locally
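The fail-fast behaviour described above could look roughly like the following. This is a sketch, not the workflow's actual step: the function name `require_base_image` is hypothetical, and depending on the registry setup `docker manifest inspect` may need a prior `docker login` or extra flags.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical guard: succeed only if the immutable base tag is already
# published, and never fall back to building the base locally.
require_base_image() {
  local image_ref="$1"
  if docker manifest inspect "$image_ref" >/dev/null 2>&1; then
    echo "Base image available: $image_ref"
  else
    echo "Required immutable base image is not available: $image_ref" >&2
    return 1
  fi
}

# Example (the <hash> placeholder must be replaced with the computed tag):
#   require_base_image "kankali.darkhelm.lan:3001/darkhelm.org/plex-playlist-cicd-base:<hash>"
```

Querying the manifest avoids pulling the full image just to learn whether the tag exists.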
### Common Issues
- **SSH Key Problems**: Ensure the `SSH_PRIVATE_KEY` secret is properly configured
- **Registry Authentication**: Verify `PACKAGE_ACCESS_TOKEN` permissions
- **Memory Constraints**: Monitor swap usage on Raspberry Pi workers
@@ -222,9 +244,11 @@ RUN export NODE_OPTIONS="--max-old-space-size=1024" && \
#### Base Image Optimization Issues
**Missing `/opt/python-dev-tools/` (Oct 2025 Resolution)**:
- **Symptom**: Build fails with `No virtual environment or system Python installation found for path /opt/python-dev-tools/bin/python`
- **Cause**: The base image in the registry does not contain the pre-installed Python dev tools
- **Fix Applied**: Made complete image resilient to missing optimization
```dockerfile
# In Dockerfile.cicd - now handles missing pre-installed tools gracefully
if [ -f "/opt/python-dev-tools/bin/python" ]; then
@@ -233,10 +257,12 @@ RUN export NODE_OPTIONS="--max-old-space-size=1024" && \
echo "⚠ Pre-installed Python dev tools not found - fresh installation"
fi
```
- **Impact**: Builds continue successfully but without optimization benefits (~20s longer)
- **Long-term Solution**: Rebuild base image to restore `/opt/python-dev-tools/` optimization
**Playwright E2E Test Failures (Oct 2025 Resolution)**:
- **Symptom**: `error: unknown option '--headed=false'` during E2E test execution
- **Cause**: Invalid Playwright CLI flag syntax in workflow and documentation
- **Fix Applied**:
@@ -246,6 +272,7 @@ RUN export NODE_OPTIONS="--max-old-space-size=1024" && \
- **Key Learning**: Use yarn scripts (`yarn test:e2e`) rather than direct Playwright CLI calls
**Missing Playwright Browser Binaries (Nov 2025 Resolution)**:
- **Symptom**: `Executable doesn't exist at /root/.cache/ms-playwright/chromium_headless_shell-*/` for all browsers
- **Cause**: Base image browsers not properly cached or registry image outdated
- **Fix Applied**: Added `yarn playwright install --with-deps` step before running E2E tests in CI
@@ -254,6 +281,7 @@ RUN export NODE_OPTIONS="--max-old-space-size=1024" && \
- **Long-term Solution**: Rebuild base image to restore Playwright browser caching
**Firefox/WebKit Browser Compatibility in Docker CI (Nov 2025 Resolution)**:
- **Symptom**: Firefox sandbox/timeout errors, WebKit content loading failures in Docker environment
- **Root Cause**: Firefox requires special sandbox configuration, WebKit has timing issues in headless Docker
- **Fix Applied**: CI now runs only the Chromium browser (the most reliable); all browsers remain available locally
@@ -262,6 +290,7 @@ RUN export NODE_OPTIONS="--max-old-space-size=1024" && \
- **Coverage**: Chromium provides excellent coverage because it is the most widely used browser engine
**Network Instability Resilience (Nov 2025 Enhancement)**:
- **Problem**: The CI environment has an unstable network, which causes Docker registry timeouts and image pull failures
- **Solutions Applied**:
- **Docker Login Retry**: 5 attempts with 15s intervals, 60s timeout per attempt
@@ -316,6 +345,7 @@ RUN export NODE_OPTIONS="--max-old-space-size=1024" && \
**Decision**: Install dependencies before cloning full source code
**Rationale**:
- Dependencies change less frequently than source code (~5% vs 95% of commits)
- Docker layer caching works best with stable, early layers
- Separation allows independent cache invalidation
@@ -334,6 +364,7 @@ RUN git clone full_repo && merge_with_dependencies
```
**Trade-offs**:
- ✅ 85% faster typical builds (3-5min vs 15-20min)
- ✅ Better resource utilization (RPi 4GB workers)
- ❌ More complex Dockerfile logic
@@ -344,6 +375,7 @@ RUN git clone full_repo && merge_with_dependencies
**Decision**: Run E2E tests only with Chromium in CI, all browsers locally
**Rationale**:
- Firefox sandbox issues in Docker environment require complex configuration
- WebKit has timing/content loading issues in headless Docker
- Chromium is the most stable and most widely used browser engine
@@ -354,11 +386,12 @@ RUN git clone full_repo && merge_with_dependencies
```typescript
// playwright.config.ts - Conditional browser setup
const projects = process.env.CI
  ? [{ name: "chromium", use: devices["Desktop Chrome"] }]
  : [chromium, firefox, webkit]; // Full coverage locally
```
**Trade-offs**:
- ✅ Reliable CI runs (100% success rate vs 60% with multi-browser)
- ✅ Faster CI execution (single browser vs three)
- ✅ Simpler Docker configuration
@@ -369,6 +402,7 @@ const projects = process.env.CI
**Decision**: Implement comprehensive retry logic for all network operations
**Rationale**:
- Self-hosted CI environment has intermittent network instability
- Docker registry operations are critical path failures
- Playwright browser downloads are large and failure-prone
@@ -384,6 +418,7 @@ done
```
**Coverage**:
- Docker login/pull operations (5 attempts, 15-60s intervals)
- Playwright browser installs (3 attempts, 30s intervals)
- E2E navigation (built-in retry with network error filtering)
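The retry pattern behind these numbers can be captured in a small wrapper. The sketch below is an assumption about shape only: the name `retry` and its argument order are hypothetical, and it uses a fixed delay between attempts rather than whatever schedule the workflows actually apply.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical wrapper: run a command up to <max_attempts> times, sleeping
# <delay_seconds> between failed attempts. Returns 0 on the first success
# and 1 once every attempt has failed.
retry() {
  local max_attempts="$1" delay_seconds="$2"
  shift 2
  local attempt=1
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "Attempt ${attempt}/${max_attempts} failed; retrying in ${delay_seconds}s" >&2
    sleep "$delay_seconds"
    attempt=$((attempt + 1))
  done
}

# Examples mirroring the coverage list (the image variable is assumed):
#   retry 5 15 docker pull "$BASE_IMAGE"
#   retry 3 30 yarn playwright install --with-deps
```

Keeping the attempt count and delay as arguments lets each call site pick values that match how expensive and how flaky the operation is.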
@@ -391,12 +426,15 @@ done
## Migration Path
### From Single-Stage Build
1. **Phase 1**: Deploy both Dockerfiles, workflow uses old single-stage
2. **Phase 2**: Switch workflow to use multi-stage (this deployment)
3. **Phase 3**: Remove old `Dockerfile.cicd.old` after successful runs
### Rollback Strategy
If issues arise, revert workflow to use single-stage:
```yaml
# Emergency rollback: use old Dockerfile directly
docker build -f Dockerfile.cicd.old -t cicd:latest .
@@ -405,12 +443,14 @@ docker build -f Dockerfile.cicd.old -t cicd:latest .
## Future Enhancements
### Potential Optimizations
1. **Dependency Caching**: Pre-install common Python/Node packages in base
2. **Multi-Architecture**: ARM64 native builds for Raspberry Pi
3. **Parallel Builds**: Build base and project dependencies in parallel
4. **Smart Invalidation**: More granular dependency change detection
### Monitoring Additions
1. **Build Time Metrics**: Track cache hit rates and build duration
2. **Registry Usage**: Monitor storage and bandwidth usage
3. **Worker Performance**: Profile builds across different runner types


@@ -30,6 +30,7 @@ RUN git clone full_repo && merge_preserving_deps # ✅ Source changes don't bus
**Technical Challenges & Solutions**:
1. **Local Package Build Error**: `OSError: Readme file does not exist: ../README.md`
```dockerfile
# Fix: Create minimal structure for package build
RUN mkdir -p src/backend && \
@@ -39,6 +40,7 @@ RUN git clone full_repo && merge_preserving_deps # ✅ Source changes don't bus
```
2. **Dependency Preservation**: Need to preserve installed packages when copying source
```dockerfile
# Fix: Backup/restore strategy
RUN if [ -d "/workspace/backend/.venv" ]; then mv /workspace/backend/.venv /tmp/venv_backup; fi && \
@@ -47,13 +49,15 @@ RUN git clone full_repo && merge_preserving_deps # ✅ Source changes don't bus
```
3. **No rsync Available**: Base image doesn't include rsync for selective copying
```dockerfile
# Fix: Use standard cp with backup strategy instead of rsync
# rsync -av --exclude='node_modules' /tmp/fullrepo/ /workspace/ # ❌ Not available
# Standard cp with manual exclusions # ✅ Works everywhere
```
**Metrics**:
- Dependency cache hit rate: ~95% (only miss when pyproject.toml/package.json change)
- Average build time reduction: 12-17 minutes saved per build
- Resource efficiency: Better CPU/memory utilization on Raspberry Pi workers
@@ -65,6 +69,7 @@ RUN git clone full_repo && merge_preserving_deps # ✅ Source changes don't bus
**Problem**: Firefox and WebKit browsers failing consistently in Docker CI environment.
**Root Cause Analysis**:
- **Firefox**: Sandbox restrictions in Docker containers, requires `--no-sandbox` and security compromises
- **WebKit**: Content loading timeout issues, navigation reliability problems in headless mode
- **Docker Environment**: Limited resources (RPi 4GB) exacerbate browser compatibility issues
@@ -77,26 +82,28 @@ const projects = process.env.CI
  ? [
      // CI: Only Chromium (most reliable in Docker)
      {
        name: "chromium",
        use: { ...devices["Desktop Chrome"] },
      },
    ]
  : [
      // Local: Full browser coverage
      { name: "chromium", use: { ...devices["Desktop Chrome"] } },
      { name: "firefox", use: { ...devices["Desktop Firefox"] } },
      { name: "webkit", use: { ...devices["Desktop Safari"] } },
    ];
```
**Rationale**:
- The Chromium engine powers the majority of browsers in use (Chrome, Edge, Opera, Brave)
- Excellent Docker compatibility and resource efficiency
- Core functionality testing coverage maintained
- Full browser testing available for local development
**Error Examples Resolved**:
```text
Firefox: error: unknown option '--headed=false'
WebKit: Test timeout 30000ms exceeded... waiting for navigation
Firefox: browserType.launch: Executable doesn't exist
@@ -113,6 +120,7 @@ Firefox: browserType.launch: Executable doesn't exist
**Solution**: Multi-level retry logic with exponential backoff:
#### Docker Registry Operations
```yaml
# .gitea/workflows/cicd-checks.yaml
- name: Login to Container Registry (with retry)
@@ -135,6 +143,7 @@ Firefox: browserType.launch: Executable doesn't exist
```
#### Playwright Browser Installation
```yaml
- name: Install Playwright Browsers (with retry)
run: |
@@ -151,14 +160,19 @@ Firefox: browserType.launch: Executable doesn't exist
```
#### E2E Test Navigation Resilience
```typescript
// frontend/tests/e2e/app.spec.ts
async function navigateWithRetry(
  page: Page,
  url: string,
  maxRetries: number = 3,
): Promise<void> {
for (let attempt = 1; attempt <= maxRetries; attempt++) {
try {
await page.goto(url, {
waitUntil: "networkidle",
timeout: 90000, // Extended timeout
});
return;
} catch (error) {
@@ -171,6 +185,7 @@ async function navigateWithRetry(page: Page, url: string, maxRetries: number = 3
```
**Configuration Enhancements**:
```typescript
// playwright.config.ts - CI optimizations
use: {
@@ -182,6 +197,7 @@ use: {
```
**Results**:
- CI success rate: 40% → 95%
- Average retry overhead: +30 seconds per build
- Network timeout elimination: 100% of Docker operations now succeed
@@ -193,7 +209,8 @@ use: {
**Problem**: Production base image missing pre-installed Python dev tools optimization.
**Symptom**:
```text
⚠ Pre-installed Python dev tools not found - fresh installation
Base image may need rebuild for optimal caching
```
@@ -201,6 +218,7 @@ Base image may need rebuild for optimal caching
**Impact**: +15-20 seconds build time (acceptable degradation vs failure)
**Solution**: Graceful fallback detection:
```dockerfile
# Dockerfile.cicd - Resilient optimization detection
RUN echo "=== Base Image Optimization Status ===" && \
@@ -220,7 +238,8 @@ RUN echo "=== Base Image Optimization Status ===" && \
### Missing Immutable Base Image
**Symptom**:
```text
❌ Required immutable base image is not available: kankali.darkhelm.lan:3001/darkhelm.org/plex-playlist-cicd-base:<hash>
Publish the base image via the CICD Base Image workflow before rerunning main CI.
```
@@ -229,11 +248,13 @@ Publish the base image via the CICD Base Image workflow before rerunning main CI
dedicated base-image workflow has not published that immutable tag yet.
**Checks**:
1. Confirm whether `Dockerfile.cicd-base`, `.dockerignore`, or `scripts/compute-cicd-base-hash.sh` changed in the branch.
2. Check the `CICD Base Image` workflow for the same commit or PR.
3. Verify the registry contains `plex-playlist-cicd-base:<hash>`.
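The first check can be scripted with `git diff`. This is a minimal sketch under stated assumptions: the function name `base_inputs_changed` is hypothetical, and the base reference to compare against (for example `origin/main`) depends on how the branch was created.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical check: return 0 if any base-image input differs between the
# merge base of <base_ref> and HEAD, and non-zero otherwise.
base_inputs_changed() {
  local base_ref="$1"
  # `git diff --quiet` exits non-zero when differences exist, so negate it.
  ! git diff --quiet "${base_ref}...HEAD" -- \
    Dockerfile.cicd-base \
    .dockerignore \
    scripts/compute-cicd-base-hash.sh
}

# Example (the base reference is an assumption):
#   if base_inputs_changed origin/main; then
#     echo "Base inputs changed: expect a new immutable base tag"
#   fi
```

If the function reports a change, main CI will look for a new tag, so the `CICD Base Image` workflow has to finish before main CI can succeed.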
**Resolution**:
1. If the base workflow is still running, rerun main CI after it completes.
2. If the base workflow did not trigger, run it manually with `force_rebuild=false`.
3. If the tag should be republished despite already existing, run it manually with `force_rebuild=true`.
@@ -245,47 +266,60 @@ publish-once/consume-many design.
### Docker Build Failures
#### 1. rsync Command Not Found
```text
/bin/bash: line 1: rsync: command not found
```
**Fix**: Replace with standard cp commands and backup strategy (implemented)
#### 2. README.md Not Found During uv sync
```text
OSError: Readme file does not exist: ../README.md
```
**Fix**: Create dummy README.md during dependency installation phase (implemented)
#### 3. Dependency Cache Invalidation
**Symptom**: Dependencies rebuilding on every commit
**Fix**: Verify dependency-first build pattern is correctly implemented
### E2E Test Failures
#### 1. Browser Not Found
```text
Executable doesn't exist at /root/.cache/ms-playwright/chromium-*/
```
**Fix**: Ensure `yarn playwright install --with-deps` runs before tests
#### 2. Navigation Timeouts
```text
Test timeout 30000ms exceeded
```
**Fix**: Use `navigateWithRetry` helper with extended timeouts
#### 3. Multi-browser Failures in CI
**Fix**: Use Chromium-only configuration for CI environments
### Network-Related Issues
#### 1. Docker Registry Timeouts
**Fix**: Retry logic with exponential backoff (5 attempts, 15s intervals)
#### 2. Package Download Failures
**Fix**: Increase timeouts and add retry mechanisms
#### 3. SSL Certificate Issues
**Fix**: Set `ignoreHTTPSErrors: true` and `NODE_TLS_REJECT_UNAUTHORIZED=0`
## Performance Monitoring
@@ -320,12 +354,13 @@ Test timeout 30000ms exceeded
**🎉 MILESTONE ACHIEVED**: First fully successful CI/CD workflow completion with all optimizations working together.
**Final Performance Metrics**:
- **Total Pipeline Time**: ~3-5 minutes (down from 15-25 minutes)
- **Success Rate**: 100% (all test phases passing)
- **Build Optimization**: 85% time reduction achieved
- **E2E Test Reliability**: 100% (simplified Docker approach)
### **Key Issues Resolved in Final Sprint**
1. **✅ README.md Dependency Fix**: Dummy file creation for dependency-only builds
2. **✅ Rsync Replacement**: Standard cp commands with backup/restore strategy
@@ -333,7 +368,8 @@ Test timeout 30000ms exceeded
4. **✅ E2E Test Simplification**: Removed unnecessary complex retry logic
5. **✅ Memory Management**: Proper swap configuration and Node.js memory limits
### **Validated Working Components**
- **Multi-stage Docker builds** with optimal layer caching
- **Dependency-first build pattern** preventing cache invalidation
- **Network-resilient Playwright setup** with Chromium-only CI testing
@@ -341,8 +377,10 @@ Test timeout 30000ms exceeded
- **SSH-based secure repository access** with proper key management
- **Comprehensive test coverage** (linting, unit tests, integration, E2E)
### **Architecture Stability**
All components now work cohesively:
- Base image caching (cicd-base) ↔️ Complete image building (cicd)
- Python dependency management (uv) ↔️ Backend source integration
- Frontend dependency management (Yarn PnP) ↔️ Source code preservation


@@ -5,22 +5,26 @@ This document explains how our CI/CD pipeline securely handles SSH keys using Do
## 🔒 Security Benefits
### Before (Insecure)
```dockerfile
ARG SSH_PRIVATE_KEY
RUN echo "$SSH_PRIVATE_KEY" > ~/.ssh/id_rsa
```
- ❌ SSH key stored in Docker image layers
- ❌ Visible in `docker history`
- ❌ Can be extracted from images
- ❌ Security vulnerability
### After (Secure)
```dockerfile
RUN --mount=type=secret,id=ssh_private_key \
cp /run/secrets/ssh_private_key ~/.ssh/id_rsa && \
# ... use key ... && \
rm -rf ~/.ssh
```
- ✅ SSH key never stored in image layers
- ✅ Not visible in `docker history`
- ✅ Cannot be extracted from final image
@@ -29,14 +33,17 @@ RUN --mount=type=secret,id=ssh_private_key \
## 🏗️ CI/CD Pipeline Implementation
### Gitea Actions Workflow
The CI workflow files under `.gitea/workflows/` now use:
1. **Docker BuildKit Enabled**
```yaml
export DOCKER_BUILDKIT=1
```
2. **Secure Secret Mounting**
```yaml
# Create temporary SSH key file
echo "${SSH_PRIVATE_KEY}" > /tmp/ssh_key
@@ -52,7 +59,9 @@ The CI workflow files under `.gitea/workflows/` now use:
```
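One way to keep the temporary key file private while it exists on the runner is sketched below. The helper name `write_temp_key` is hypothetical and this is not the workflow's actual step; the `--secret id=ssh_private_key,src=...` form in the usage comment is the standard BuildKit flag that pairs with the `--mount=type=secret` line shown earlier.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: write the key from the SSH_PRIVATE_KEY environment
# variable to a temp file readable only by the current user, and print its path.
write_temp_key() {
  : "${SSH_PRIVATE_KEY:?SSH_PRIVATE_KEY must be set}"
  local key_file
  key_file="$(mktemp)"
  chmod 600 "$key_file"
  printf '%s\n' "$SSH_PRIVATE_KEY" > "$key_file"
  printf '%s\n' "$key_file"
}

# Example usage (image name taken from the local build command in this doc):
#   key_file="$(write_temp_key)"
#   trap 'rm -f "$key_file"' EXIT   # remove the key even if the build fails
#   DOCKER_BUILDKIT=1 docker build \
#     --secret id=ssh_private_key,src="$key_file" \
#     -f Dockerfile.cicd -t plex-playlist-cicd:latest .
```

Using `mktemp` avoids a predictable path such as `/tmp/ssh_key`, and the `trap` ensures the file is removed on every exit path.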
### Local Development
Use the secure build script:
```bash
./scripts/build-cicd-secure.sh plex-playlist-cicd:latest
```
@@ -60,11 +69,14 @@ Use the secure build script:
## 🔧 Required Setup
### 1. Gitea Secrets Configuration
Ensure these secrets are configured in your Gitea repository:
- `SSH_PRIVATE_KEY`: Your private SSH key for git operations
- `GITEA_TOKEN`: Token for pushing to container registry
### 2. Docker BuildKit Support
- **Gitea Actions**: Automatically enabled with `DOCKER_BUILDKIT=1`
- **Local builds**: Requires Docker 18.09+ with BuildKit enabled
- **CI runners**: Ensure BuildKit support in your runner environment
@@ -80,6 +92,7 @@ Ensure these secrets are configured in your Gitea repository:
## 🧪 Testing Security
Verify no secrets in image:
```bash
# Build the image
./scripts/build-cicd-secure.sh test-image