CI/CD Pipeline Best Practices and Implementation

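CI/CD (continuous integration and continuous delivery/deployment) is a core practice of modern software development. This article walks through best practices for CI/CD pipelines and shows how to implement them with GitLab CI and GitHub Actions.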

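CI/CD Fundamentals

1. Continuous Integration (CI)

Core practices

  • Commit code frequently (several times a day)
  • Automated builds
  • Automated tests
  • Fast feedback

Benefits

  • Problems surface early
  • Lower integration risk
  • Higher code quality
  • Faster delivery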

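2. Continuous Delivery/Deployment (CD)

Continuous delivery

  • Code is built and tested automatically
  • Deployment to production is triggered manually
  • The team keeps the release decision

Continuous deployment

  • Code is built, tested, and deployed automatically
  • Every change that passes the pipeline is released to production without manual steps
  • Requires thorough test coverage and monitoring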

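Pipeline Design

1. Pipeline stages

Code commit
    ↓
Static checks (lint, format)
    ↓
Unit tests
    ↓
Build and package
    ↓
Integration tests
    ↓
Security scan
    ↓
Publish artifacts
    ↓
Deploy to test environment
    ↓
Acceptance tests
    ↓
Deploy to staging (pre-production)
    ↓
Deploy to production
    ↓
Monitoring and verification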

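2. Design principles

Fast feedback

  • Run the fastest stages first
  • Stop the pipeline at the first failure
  • Run independent jobs in parallel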
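
The fail-fast idea can be illustrated outside any CI system. The following is a minimal sketch, assuming each stage is passed as a hypothetical "name:command" string; `run_pipeline` is an illustrative helper, not part of GitLab CI or GitHub Actions. It runs the stages in the order given (fastest first) and stops at the first failure, so slower stages never start.

```shell
#!/bin/sh
# Fail-fast sketch: run stages in order and stop at the first failure.
# Each argument is a hypothetical "name:command" pair; order them from
# fastest to slowest so cheap checks report problems first.
run_pipeline() {
  for stage in "$@"; do
    name=${stage%%:*}   # text before the first colon
    cmd=${stage#*:}     # text after the first colon
    echo "==> $name"
    if ! sh -c "$cmd"; then
      echo "stage '$name' failed, skipping remaining stages" >&2
      return 1
    fi
  done
  echo "pipeline passed"
}

# Example: lint passes, unit-test fails, so build never runs.
run_pipeline "lint:true" "unit-test:false" "build:echo building" || echo "pipeline stopped early"
```

Real CI systems apply the same rule per stage: a failed job prevents later stages from starting unless the job is explicitly allowed to fail.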

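Safe and reliable

  • Automated test coverage
  • Integrated security scanning
  • Manual approval gates

Traceable

  • Version control
  • Artifact management
  • Audit logs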

GitLab CI Implementation

1. Basic configuration

# .gitlab-ci.yml
stages:
  - validate
  - build
  - test
  - security
  - deploy

variables:
  DOCKER_REGISTRY: registry.gitlab.com
  IMAGE_NAME: $DOCKER_REGISTRY/$CI_PROJECT_PATH
  DOCKER_DRIVER: overlay2
  DOCKER_TLS_CERTDIR: ""

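# Global defaults
default:
  image: docker:24
  services:
    - docker:24-dind
  before_script:
    # Jobs that override `image` (node, trivy, kubectl) still inherit this script,
    # so log in only where the Docker CLI exists. --password-stdin keeps the
    # password out of the process list.
    - |
      if command -v docker >/dev/null 2>&1; then
        echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
      fi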

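# Cache configuration
# `npm ci` deletes node_modules before installing, so cache npm's download
# cache instead and point npm at it through npm_config_cache.
.npm_cache: &npm_cache
  variables:
    npm_config_cache: "$CI_PROJECT_DIR/.npm"
  cache:
    key: ${CI_COMMIT_REF_SLUG}
    paths:
      - .npm/
    policy: pull-push

# Job template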
.build_template: &build_definition
  stage: build
  script:
    - docker build -t $IMAGE_NAME:$CI_COMMIT_SHA .
    - docker push $IMAGE_NAME:$CI_COMMIT_SHA

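2. Complete pipeline example

# Relies on the variables, default settings, and &npm_cache anchor from the basic configuration above.
stages: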
  - validate
  - build
  - test
  - security
  - package
  - deploy

# Lint and format checks
lint:
  stage: validate
  image: node:18
  <<: *npm_cache
  script:
    - npm ci
    - npm run lint
    - npm run format:check
  only:
    - merge_requests
    - main

# Unit tests
unit-test:
  stage: test
  # Use the matrix variable so each parallel job runs on its own Node.js version.
  image: node:${NODE_VERSION}
  <<: *npm_cache
  script:
    - npm ci
    - npm run test:unit -- --coverage
  coverage: '/All files[^|]*\|[^|]*\s+([\d\.]+)/'
  artifacts:
    reports:
      coverage_report:
        coverage_format: cobertura
        path: coverage/cobertura-coverage.xml
      junit: junit.xml
    paths:
      - coverage/
    expire_in: 1 week
  parallel:
    matrix:
      - NODE_VERSION: ["16", "18", "20"]

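# Build the application image
build:
  stage: build
  image: docker:24
  services:
    - docker:24-dind
  script:
    # Pull the previous image so --cache-from has layers to reuse; ignore the error on the first build.
    - docker pull $IMAGE_NAME:latest || true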
    - docker build 
      --build-arg NODE_ENV=production
      --cache-from $IMAGE_NAME:latest
      -t $IMAGE_NAME:$CI_COMMIT_SHA
      -t $IMAGE_NAME:$CI_COMMIT_REF_SLUG
      -f Dockerfile.prod .
    - docker push $IMAGE_NAME:$CI_COMMIT_SHA
    - docker push $IMAGE_NAME:$CI_COMMIT_REF_SLUG
  only:
    - main
    - tags

# Integration tests
integration-test:
  stage: test
  image: docker/compose:latest
  services:
    - docker:24-dind
  script:
    - docker-compose -f docker-compose.test.yml up --abort-on-container-exit
  artifacts:
    when: always
    reports:
      junit: test-results/integration.xml

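# Security scanning
security-scan:
  stage: security
  image:
    name: aquasec/trivy:latest
    # The image's entrypoint is the trivy binary; clear it so GitLab can run a shell.
    entrypoint: [""]
  script:
    # Image scan, written in GitLab's container scanning report format
    - trivy image --exit-code 1 --severity HIGH,CRITICAL
      --format template --template "@/contrib/gitlab.tpl"
      -o gl-container-scanning-report.json $IMAGE_NAME:$CI_COMMIT_SHA
    # Source tree scan
    - trivy filesystem --scanners vuln,secret,config
      --severity HIGH,CRITICAL .
  artifacts:
    # Upload the report even when the scan exits non-zero.
    when: always
    reports:
      container_scanning: gl-container-scanning-report.json
  allow_failure: true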

# SAST scanning
sast:
  stage: security
  # semgrep-agent is deprecated; the semgrep CLI writes GitLab's SAST report format directly.
  image: semgrep/semgrep:latest
  script:
    - semgrep scan
      --config=auto
      --config=p/security-audit
      --config=p/owasp-top-ten
      --gitlab-sast --output=gl-sast-report.json
  artifacts:
    reports:
      sast: gl-sast-report.json
  allow_failure: true

# Push release images
push-image:
  stage: package
  image: docker:24
  services:
    - docker:24-dind
  script:
    - docker pull $IMAGE_NAME:$CI_COMMIT_SHA
    - docker tag $IMAGE_NAME:$CI_COMMIT_SHA $IMAGE_NAME:latest
    - docker tag $IMAGE_NAME:$CI_COMMIT_SHA $IMAGE_NAME:$CI_COMMIT_TAG
    - docker push $IMAGE_NAME:latest
    - docker push $IMAGE_NAME:$CI_COMMIT_TAG
  only:
    - tags

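# Deploy to the development environment
deploy-dev:
  stage: deploy
  # The script calls both kubectl and helm, but bitnami/kubectl ships kubectl only;
  # dtzar/helm-kubectl is one image that bundles both.
  image: dtzar/helm-kubectl:latest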
  environment:
    name: development
    url: https://dev.example.com
  script:
    - kubectl config use-context dev
    - helm upgrade --install myapp ./helm-chart
      --namespace dev
      --set image.tag=$CI_COMMIT_SHA
      --set environment=dev
      --wait
      --timeout 5m
  only:
    - main

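# Deploy to the staging environment
deploy-staging:
  stage: deploy
  # Needs an image with both kubectl and helm (bitnami/kubectl has kubectl only).
  image: dtzar/helm-kubectl:latest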
  environment:
    name: staging
    url: https://staging.example.com
  script:
    - kubectl config use-context staging
    - helm upgrade --install myapp ./helm-chart
      --namespace staging
      --set image.tag=$CI_COMMIT_SHA
      --set environment=staging
      --wait
      --timeout 10m
  only:
    - main
  when: manual

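# Deploy to the production environment
deploy-production:
  stage: deploy
  # Needs an image with both kubectl and helm (bitnami/kubectl has kubectl only).
  image: dtzar/helm-kubectl:latest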
  environment:
    name: production
    url: https://example.com
  script:
    - kubectl config use-context production
    - helm upgrade --install myapp ./helm-chart
      --namespace production
      --set image.tag=$CI_COMMIT_TAG
      --set environment=production
      --wait
      --timeout 15m
  only:
    - tags
  when: manual
  needs:
    - job: push-image
    - job: security-scan

3. Advanced features

Dynamic pipelines

workflow:
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH == "main"
    - if: $CI_COMMIT_TAG

# Conditional job
database-migration:
  script:
    - npm run migrate
  rules:
    - changes:
        - migrations/*
      when: always
    - when: never

# Parent-child pipeline
trigger-child:
  trigger:
    include:
      - local: '/microservices/service-a/.gitlab-ci.yml'
    strategy: depend

Matrix builds

# Test across multiple environments
test:
  parallel:
    matrix:
      - PROVIDER: [aws, gcp, azure]
        STACK: [cfn, terraform, pulumi]
  script:
    - echo "Testing $PROVIDER with $STACK"
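
A matrix entry expands to the Cartesian product of its variable lists, so the configuration above yields 3 × 3 = 9 jobs. The sketch below reproduces that expansion in plain shell to make the job count explicit; `expand_matrix` is a hypothetical helper used only for illustration, and the printed names follow the "job: [value, value]" pattern GitLab uses for matrix jobs.

```shell
#!/bin/sh
# Sketch of how a parallel:matrix entry expands: one job per combination
# of PROVIDER and STACK (the Cartesian product of the two lists).
expand_matrix() {
  for provider in aws gcp azure; do
    for stack in cfn terraform pulumi; do
      echo "test: [$provider, $stack]"
    done
  done
}

expand_matrix
echo "total jobs: $(expand_matrix | wc -l | tr -d ' ')"
```

Because the job count is the product of the list lengths, adding one more value to either list adds a whole row of jobs; it is worth checking runner capacity before widening a matrix.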

GitHub Actions Implementation

1. Basic workflow

# .github/workflows/ci.yml
name: CI/CD Pipeline

on:
  push:
    branches: [ main, develop ]
    tags: [ 'v*' ]
  pull_request:
    branches: [ main ]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  # Lint and format checks
  lint:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4
    
    - name: Setup Node.js
      uses: actions/setup-node@v4
      with:
        node-version: '18'
        cache: 'npm'
    
    - name: Install dependencies
      run: npm ci
    
    - name: Run ESLint
      run: npm run lint
    
    - name: Check formatting
      run: npm run format:check

  # Unit tests
  unit-test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        node-version: [16, 18, 20]
    steps:
    - uses: actions/checkout@v4
    
    - name: Setup Node.js ${{ matrix.node-version }}
      uses: actions/setup-node@v4
      with:
        node-version: ${{ matrix.node-version }}
        cache: 'npm'
    
    - name: Install dependencies
      run: npm ci
    
    - name: Run tests
      run: npm run test:unit -- --coverage
    
    - name: Upload coverage
      uses: codecov/codecov-action@v3
      with:
        files: ./coverage/lcov.info

  # Build the image
  build:
    runs-on: ubuntu-latest
    needs: [lint, unit-test]
    permissions:
      contents: read
      packages: write
    outputs:
      image-tag: ${{ steps.meta.outputs.tags }}
    steps:
    - uses: actions/checkout@v4
    
    - name: Set up Docker Buildx
      uses: docker/setup-buildx-action@v3
    
    - name: Login to Container Registry
      uses: docker/login-action@v3
      with:
        registry: ${{ env.REGISTRY }}
        username: ${{ github.actor }}
        password: ${{ secrets.GITHUB_TOKEN }}
    
    - name: Extract metadata
      id: meta
      uses: docker/metadata-action@v5
      with:
        images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
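        # The raw entry tags the image with the full commit SHA, which later jobs use to pull it.
        tags: |
          type=raw,value=${{ github.sha }}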
          type=ref,event=branch
          type=ref,event=pr
          type=semver,pattern={{version}}
          type=semver,pattern={{major}}.{{minor}}
          type=sha,prefix={{branch}}-
    
    - name: Build and push
      uses: docker/build-push-action@v5
      with:
        context: .
        push: true
        tags: ${{ steps.meta.outputs.tags }}
        labels: ${{ steps.meta.outputs.labels }}
        cache-from: type=gha
        cache-to: type=gha,mode=max
        platforms: linux/amd64,linux/arm64

  # Security scanning
  security-scan:
    runs-on: ubuntu-latest
    needs: build
    permissions:
      contents: read
      security-events: write
    steps:
    - name: Run Trivy vulnerability scanner
      uses: aquasecurity/trivy-action@master
      with:
        image-ref: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
        format: 'sarif'
        output: 'trivy-results.sarif'
    
    - name: Upload Trivy scan results
      uses: github/codeql-action/upload-sarif@v3
      with:
        sarif_file: 'trivy-results.sarif'

  # Integration tests
  integration-test:
    runs-on: ubuntu-latest
    needs: build
    services:
      postgres:
        image: postgres:15
        env:
          POSTGRES_PASSWORD: postgres
          # Create the "test" database that DATABASE_URL in the test step points at.
          POSTGRES_DB: test
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
        ports:
          - 5432:5432
      redis:
        image: redis:7
        ports:
          - 6379:6379
    steps:
    - uses: actions/checkout@v4
    
    - name: Run integration tests
      run: |
        docker run --network host \
          -e DATABASE_URL=postgresql://postgres:postgres@localhost:5432/test \
          -e REDIS_URL=redis://localhost:6379 \
          ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }} \
          npm run test:integration

  # Deploy to the development environment
  deploy-dev:
    runs-on: ubuntu-latest
    needs: [build, integration-test]
    if: github.ref == 'refs/heads/main'
    environment:
      name: development
      url: https://dev.example.com
    steps:
    - uses: actions/checkout@v4
    
    - name: Setup kubectl
      uses: azure/setup-kubectl@v3
    
    - name: Configure AWS credentials
      uses: aws-actions/configure-aws-credentials@v4
      with:
        aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
        aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        aws-region: us-east-1
    
    - name: Update kubeconfig
      run: aws eks update-kubeconfig --name dev-cluster
    
    - name: Deploy to EKS
      run: |
        helm upgrade --install myapp ./helm-chart \
          --namespace dev \
          --set image.tag=${{ github.sha }} \
          --set environment=dev \
          --wait

  # Deploy to the production environment
  deploy-production:
    runs-on: ubuntu-latest
    needs: [security-scan, integration-test]
    if: startsWith(github.ref, 'refs/tags/v')
    environment:
      name: production
      url: https://example.com
    steps:
    - uses: actions/checkout@v4
    
    - name: Deploy to production
      run: |
        echo "Deploying ${{ github.ref_name }} to production"
        # Production deployment steps go here

2. Reusable workflows

# .github/workflows/reusable-build.yml
name: Reusable Build

on:
  workflow_call:
    inputs:
      node-version:
        required: true
        type: string
      environment:
        required: true
        type: string
    secrets:
      registry-token:
        required: true
    outputs:
      image-tag:
        description: "Built image tag"
        value: ${{ jobs.build.outputs.tag }}

jobs:
  build:
    runs-on: ubuntu-latest
    outputs:
      tag: ${{ steps.meta.outputs.tags }}
    steps:
    - uses: actions/checkout@v4
    
    - name: Setup Node.js
      uses: actions/setup-node@v4
      with:
        node-version: ${{ inputs.node-version }}
    
    - name: Build
      run: |
        npm ci
        npm run build
    
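    - name: Push to registry
      id: meta
      run: |
        echo "Pushing to ${{ inputs.environment }}"
        # Placeholder tag format: set the value that the job's `tag` output reads from steps.meta.
        echo "tags=${{ inputs.environment }}-${{ github.sha }}" >> "$GITHUB_OUTPUT"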

Calling the reusable workflow

# .github/workflows/main.yml
name: Main CI

on:
  push:
    branches: [main]

jobs:
  build-dev:
    uses: ./.github/workflows/reusable-build.yml
    with:
      node-version: '18'
      environment: 'development'
    secrets:
      registry-token: ${{ secrets.REGISTRY_TOKEN }}

Artifact Management

1. Docker image management

Image tagging strategy

Branch builds:
- main: latest, main-{sha}
- feature/*: feature-{branch}-{sha}
- PR: pr-{number}-{sha}

Tag builds:
- v1.2.3: 1.2.3, 1.2, 1
- v1.2.3-rc1: 1.2.3-rc1
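
The strategy above can be expressed as a small function. This is a minimal sketch, not a replacement for a tool such as docker/metadata-action: `image_tags` and its three arguments (ref kind, ref name, short commit SHA) are hypothetical names chosen for illustration, and it assumes that pre-release tags get only their exact version.

```shell
#!/bin/sh
# Sketch of the tag strategy above. Prints one tag per line.
# Usage: image_tags <kind> <name> <short-sha>
#   kind is branch, pr, or tag; name is the branch name, PR number, or Git tag.
image_tags() {
  kind=$1 name=$2 sha=$3
  case "$kind" in
    branch)
      if [ "$name" = "main" ]; then
        printf '%s\n' "latest" "main-$sha"
      else
        # Image tags cannot contain "/", so feature/login becomes feature-login.
        printf '%s-%s\n' "$(printf '%s' "$name" | tr '/' '-')" "$sha"
      fi
      ;;
    pr)
      printf 'pr-%s-%s\n' "$name" "$sha"
      ;;
    tag)
      version=${name#v}   # v1.2.3 -> 1.2.3
      case "$version" in
        *[!0-9.]*)
          # Pre-release such as 1.2.3-rc1: publish the exact version only.
          printf '%s\n' "$version"
          ;;
        *.*.*)
          major=${version%%.*}
          rest=${version#*.}
          minor=${rest%%.*}
          printf '%s\n' "$version" "$major.$minor" "$major"
          ;;
        *)
          printf '%s\n' "$version"
          ;;
      esac
      ;;
    *)
      echo "unknown kind: $kind" >&2
      return 1
      ;;
  esac
}

image_tags branch main abc1234
image_tags tag v1.2.3 abc1234
```

Keeping pre-releases off the floating `1.2` and `1` tags means consumers who pin to a minor or major version never pick up a release candidate by accident.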

Image cleanup

# .github/workflows/cleanup.yml
name: Cleanup Old Images

on:
  schedule:
    - cron: '0 0 * * 0'  # Every Sunday at 00:00 UTC

jobs:
  cleanup:
    runs-on: ubuntu-latest
    steps:
    - name: Delete old container images
      uses: snok/container-retention-policy@v2
      with:
        image-names: myapp
        cut-off: 30 days ago UTC
        account-type: org
        org-name: myorg
        keep-at-least: 10
        token: ${{ secrets.GITHUB_TOKEN }}

2. Helm chart management

# Chart.yaml
apiVersion: v2
name: myapp
description: A Helm chart for myapp
type: application
version: 1.0.0
appVersion: "1.0.0"
dependencies:
  - name: postgresql
    version: 12.x.x
    repository: https://charts.bitnami.com/bitnami
    condition: postgresql.enabled

Publishing the chart

# .github/workflows/release-chart.yml
name: Release Chart

on:
  push:
    branches: [main]
    paths:
      - 'helm-chart/**'

jobs:
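  release:
    runs-on: ubuntu-latest
    # chart-releaser creates GitHub releases and pushes the index to the gh-pages branch.
    permissions:
      contents: write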
    steps:
    - uses: actions/checkout@v4
      with:
        fetch-depth: 0
    
    - name: Configure Git
      run: |
        git config user.name "$GITHUB_ACTOR"
        git config user.email "$GITHUB_ACTOR@users.noreply.github.com"
    
    - name: Install Helm
      uses: azure/setup-helm@v3
    
    - name: Run chart-releaser
      uses: helm/chart-releaser-action@v1.6.0
      with:
        charts_dir: helm-chart
      env:
        CR_TOKEN: "${{ secrets.GITHUB_TOKEN }}"

Security Best Practices

1. Secret management

GitLab CI

# Use CI/CD variables
script:
  - echo "$PRODUCTION_KEY" > key.pem
  - chmod 600 key.pem

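# Use HashiCorp Vault
# CI_JOB_JWT was removed in GitLab 17.0, so request an ID token instead.
# The aud URL is a placeholder; VAULT_ADDR is assumed to be set as a CI/CD variable.
vault-secrets:
  image: hashicorp/vault:latest
  id_tokens:
    VAULT_ID_TOKEN:
      aud: https://vault.example.com
  script:
    - export VAULT_TOKEN=$(vault write -field=token auth/jwt/login role=ci jwt=$VAULT_ID_TOKEN)
    # Keep the secret out of the job log; with KV v2 mounted at secret/, the CLI adds data/ itself.
    - export DB_PASSWORD=$(vault kv get -field=password secret/db)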
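
Writing a key from a CI/CD variable to disk, as the `key.pem` snippet above does, deserves some care: between `echo` and `chmod` the file briefly has default permissions, and it stays on disk after the job unless it is removed. The following is a minimal sketch of one way to tighten this, assuming the key arrives in `PRODUCTION_KEY`; `write_secret_file` is a hypothetical helper and the fallback value exists only so the script runs outside CI.

```shell
#!/bin/sh
# Write a secret from an environment variable to a file that is private
# from the moment it is created. write_secret_file is a hypothetical helper.
write_secret_file() {
  # $1: secret value, $2: destination path
  (
    umask 077                      # files created in this subshell get mode 600
    printf '%s\n' "$1" > "$2"
  )
}

# Demo value only; in CI the variable is injected by the runner.
PRODUCTION_KEY=${PRODUCTION_KEY:-dummy-key-material}
work_dir=$(mktemp -d)
key_file="$work_dir/key.pem"
trap 'rm -rf "$work_dir"' EXIT     # remove the key when the script exits
write_secret_file "$PRODUCTION_KEY" "$key_file"
ls -l "$key_file" | cut -c1-10     # permission bits of the key file
```

The subshell keeps the restrictive umask from leaking into the rest of the job, and the `trap` removes the key even when a later command fails.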

GitHub Actions

# Use secrets
- name: Deploy
  env:
    API_KEY: ${{ secrets.API_KEY }}
    DB_PASSWORD: ${{ secrets.DB_PASSWORD }}
  run: |
    echo "Deploying with API key"

# Use OIDC (the job must also declare `permissions: id-token: write`)
- name: Configure AWS Credentials
  uses: aws-actions/configure-aws-credentials@v4
  with:
    role-to-assume: arn:aws:iam::123456789012:role/my-role
    aws-region: us-east-1

2. Security scanning

SAST scanning

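sast:
  stage: security
  # semgrep-agent is deprecated; the semgrep CLI writes GitLab's SAST report format directly.
  image: semgrep/semgrep:latest
  script:
    - semgrep scan
      --config=auto
      --config=p/security-audit
      --config=p/owasp-top-ten
      --config=p/cwe-top-25
      --gitlab-sast --output=gl-sast-report.json
  artifacts:
    reports:
      sast: gl-sast-report.json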

Dependency scanning

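dependency-scanning:
  stage: security
  image: node:18
  script:
    - npm audit --audit-level=moderate --json > npm-audit-report.json
  artifacts:
    # npm audit's JSON is not GitLab's dependency scanning report schema,
    # so keep it as a plain artifact and upload it even when the audit fails.
    when: always
    paths:
      - npm-audit-report.json
  allow_failure: true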

Monitoring and Metrics

1. DORA metrics

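# Record deployment events (for deployment frequency)
deploy-metrics:
  stage: .post
  script:
    - |
      # GitLab has no predefined duration variable, so derive the elapsed time from
      # CI_PIPELINE_CREATED_AT (ISO 8601 in UTC, e.g. 2022-01-31T16:47:55Z).
      created=$(echo "$CI_PIPELINE_CREATED_AT" | sed 's/T/ /; s/Z$//')
      duration=$(( $(date -u +%s) - $(date -u -d "$created" +%s) ))
      curl -X POST https://metrics.example.com/dora \
        -H "Content-Type: application/json" \
        -d '{
          "event": "deployment",
          "timestamp": "'$(date -Iseconds)'",
          "environment": "'"$CI_ENVIRONMENT_NAME"'",
          "commit_sha": "'"$CI_COMMIT_SHA"'",
          "duration": "'"$duration"'"
        }'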
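
On the receiving side, deployment frequency is a count of deployment events per time window. The sketch below assumes the events were stored as one ISO 8601 timestamp per line, which is a hypothetical storage format since the article does not say how the metrics endpoint persists events. `deployments_per_day` is an illustrative helper that groups by the date part of each timestamp as written, without converting time zones.

```shell
#!/bin/sh
# Count deployments per calendar day from ISO 8601 timestamps on stdin.
# Output: "<date> <count>", sorted by date.
deployments_per_day() {
  cut -c1-10 | sort | uniq -c | awk '{ print $2, $1 }'
}

# Illustrative sample data, not real measurements.
printf '%s\n' \
  "2024-03-01T09:15:00+00:00" \
  "2024-03-01T16:40:00+00:00" \
  "2024-03-02T11:05:00+00:00" | deployments_per_day
```

The same grouping works for weekly or monthly windows by changing how much of the timestamp is kept before counting.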

2. Pipeline performance monitoring

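# Push metrics to the Prometheus Pushgateway
pipeline-metrics:
  stage: .post
  script:
    - |
      # No predefined duration variables exist; compute elapsed seconds from the
      # ISO 8601 UTC start timestamps that GitLab provides.
      now=$(date -u +%s)
      to_epoch() { date -u -d "$(echo "$1" | sed 's/T/ /; s/Z$//')" +%s; }
      pipeline_seconds=$(( now - $(to_epoch "$CI_PIPELINE_CREATED_AT") ))
      job_seconds=$(( now - $(to_epoch "$CI_JOB_STARTED_AT") ))
      cat <<EOF | curl --data-binary @- http://pushgateway:9091/metrics/job/gitlab-ci
      # HELP gitlab_ci_pipeline_duration_seconds Pipeline duration
      # TYPE gitlab_ci_pipeline_duration_seconds gauge
      gitlab_ci_pipeline_duration_seconds{pipeline_id="$CI_PIPELINE_ID"} $pipeline_seconds
      # HELP gitlab_ci_job_duration_seconds Job duration
      # TYPE gitlab_ci_job_duration_seconds gauge
      gitlab_ci_job_duration_seconds{job_name="$CI_JOB_NAME"} $job_seconds
      EOF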

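Summary

Best practices for CI/CD pipelines include:

  1. Fast feedback: run the fastest stages first and stop at the first failure
  2. Comprehensive testing: unit tests, integration tests, and security scans
  3. Artifact management: version control, image management, and dependency management
  4. Security hardening: secret management, security scanning, and access control
  5. Observability: logs, metrics, and traces
  6. Continuous improvement: measure, analyze, and refine the process

A well-designed CI/CD pipeline makes software delivery fast, secure, and reliable.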