Code documentation

This page holds the user documentation of gcip.

For the code documentation, please proceed to the README.md within the gcip folder.

Configuring your project to use gcip

Your Gitlab project needs these two files:

MyProject
├ .gitlab-ci.py
└ .gitlab-ci.yml

The .gitlab-ci.yml is the file you already know. Its only task is to render and trigger the child pipeline created with the Gitlab CI Python Library. The latter is written into .gitlab-ci.py. Let's have a look at this project's .gitlab-ci.yml to see how this file should look:

---
generate-pipeline:
  stage: build
  image: python:3.11-slim
  script:
    - pip install pipenv
    - pipenv install --system
    - python .gitlab-ci.py
  artifacts:
    paths:
      - generated-config.yml

run-pipeline:
  stage: deploy
  needs:
    - generate-pipeline
  trigger:
    include:
      - artifact: generated-config.yml
        job: generate-pipeline
    strategy: depend

Your gcip pipeline code then goes into the file named .gitlab-ci.py. The following chapters show how to create the pipeline code.

As an alternative to installing gcip in a Python container, you can also use the official Docker image released for every tag. The first job would then look like this:

---
generate-pipeline:
  stage: build
  image: thomass/gcip:1.0.0
  script: /usr/src/app/docker/gcip.sh
  artifacts:
    paths:
      - generated-config.yml

Hints regarding the following examples

All the code examples in the following chapters are written so that they can also be run with pytest. For instance, a code example could look like the following:

import gcip
from tests import conftest


def test():
    pipeline = gcip.Pipeline()
    pipeline.add_children(gcip.Job(stage="print_date", script="date"))

    conftest.check(pipeline.render())

To transform this pytest into a valid .gitlab-ci.py file, you have to:

  • Omit the import from tests import conftest.

  • Put your pipeline code directly into the Python script and not within the def test(): function.

  • Instead of checking the rendered pipeline with conftest.check(pipeline.render()), write the generated YAML file (generated-config.yml, as referenced in the .gitlab-ci.yml above) with pipeline.write_yaml().

The real .gitlab-ci.py code derived from the example would look like the following:

import gcip

pipeline = gcip.Pipeline()
pipeline.add_children(gcip.Job(stage="print_date", script="date"))

pipeline.write_yaml()

Create a pipeline with one job

Input:

import gcip
from tests import conftest


def test():
    pipeline = gcip.Pipeline()
    pipeline.add_children(gcip.Job(stage="print_date", script="date"))

    conftest.check(pipeline.render())

Remember: As stated in the hints regarding the examples, your real pipeline code must end with pipeline.write_yaml() instead of conftest.check(pipeline.render())!

Output:

stages:
- print_date
print-date:
  stage: print_date
  script:
  - date

Pipeline context manager

You can produce the same output as above with the pipeline's context manager.

import gcip

with gcip.Pipeline() as pipe:
  pipe.add_children(gcip.Job(stage="print_date", script="date"))

Configure jobs

Jobs can be configured by calling the following methods:

Input:

import gcip
from tests import conftest


def test():
    pipeline = gcip.Pipeline()

    job = gcip.Job(stage="print_date", script="date")
    job.set_image("docker/image:example")
    job.prepend_scripts("./before-script.sh")
    job.append_scripts("./after-script.sh")
    job.add_variables(USER="Max Power", URL="https://example.com")
    job.add_tags("test", "europe")
    job.artifacts.add_paths("binaries/", ".config")
    job.append_rules(gcip.Rule(if_statement="$MY_VARIABLE_IS_PRESENT"))

    pipeline.add_children(job)

    conftest.check(pipeline.render())

The prepend_scripts, append_scripts and all add_* methods accept an arbitrary number of positional arguments. That means you can prepend/append/add a single script/variable/tag/… or several of them at once.
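
For example, assuming the job object from the input above, you could pass several values in one call or unpack an existing Python list into the positional arguments (a minimal sketch; the script and tag values are only illustrative):

job.prepend_scripts("./install-dependencies.sh", "./login.sh")

# unpack an existing list into positional arguments
extra_tags = ["test", "europe"]
job.add_tags(*extra_tags)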

Output:

stages:
- print_date
print-date:
  image:
    name: docker/image:example
  stage: print_date
  script:
  - ./before-script.sh
  - date
  - ./after-script.sh
  variables:
    USER: Max Power
    URL: https://example.com
  rules:
  - if: $MY_VARIABLE_IS_PRESENT
  artifacts:
    name: ci_job_name-my-awsome-feature-branch
    paths:
    - binaries
    - .config
  tags:
  - test
  - europe

Bundling jobs as sequence

You can bundle jobs to a sequence to apply a common configuration for all jobs included. A job sequence has the same configuration methods as shown in the previous example for jobs.

Input:

import gcip
from tests import conftest


def test():
    sequence = gcip.Sequence()

    job1 = gcip.Job(stage="job1", script="script1.sh")
    job1.prepend_scripts("from-job-1.sh")

    sequence.add_children(
        job1,
        gcip.Job(stage="job2", script="script2.sh"),
    )

    sequence.prepend_scripts("from-sequence.sh")

    pipeline = gcip.Pipeline()
    pipeline.add_children(sequence)

    conftest.check(pipeline.render())

As you can see in the output, jobs can have their own configuration (job1.prepend_scripts(...)) as well as a common configuration from their sequence (sequence.prepend_scripts(...)).

Output:

stages:
- job1
- job2
job1:
  stage: job1
  script:
  - from-sequence.sh
  - from-job-1.sh
  - script1.sh
job2:
  stage: job2
  script:
  - from-sequence.sh
  - script2.sh

Stacking sequences

Input:

import gcip
from tests import conftest


def test():
    sequence_a = gcip.Sequence()
    sequence_a.add_children(gcip.Job(stage="job1", script="script1.sh"))
    sequence_a.prepend_scripts("from-sequence-a.sh")

    sequence_b = gcip.Sequence()
    sequence_b.add_children(sequence_a)
    sequence_b.add_children(gcip.Job(stage="job2", script="script2.sh"))
    sequence_b.prepend_scripts("from-sequence-b.sh")

    pipeline = gcip.Pipeline()
    pipeline.add_children(sequence_b)

    conftest.check(pipeline.render())

Output:

stages:
- job1
- job2
job1:
  stage: job1
  script:
  - from-sequence-b.sh
  - from-sequence-a.sh
  - script1.sh
job2:
  stage: job2
  script:
  - from-sequence-b.sh
  - script2.sh

Pipelines are sequences

Pipelines are an extended version of sequences and have all their abilities (plus pipeline specific abilities), like their configuration options and stacking other sequences.

Input:

import gcip
from tests import conftest


def test():
    sequence_a = gcip.Sequence()
    sequence_a.add_children(gcip.Job(stage="job1", script="script1.sh"))
    sequence_a.prepend_scripts("from-sequence.sh")

    pipeline = gcip.Pipeline()
    pipeline.add_children(sequence_a)
    pipeline.add_children(gcip.Job(stage="job2", script="script2.sh"))
    pipeline.prepend_scripts("from-pipeline.sh")

    conftest.check(pipeline.render())

Output:

stages:
- job1
- job2
job1:
  stage: job1
  script:
  - from-pipeline.sh
  - from-sequence.sh
  - script1.sh
job2:
  stage: job2
  script:
  - from-pipeline.sh
  - script2.sh

Stages allow reuse of jobs and sequences

Assume you want to reuse a parameterized job. The following code shows an incorrect example:

import pytest

from gcip import Job, JobNameConflictError, Pipeline


def job_for(environment: str) -> Job:
    return Job(stage="do_something", script=f"./do-something-on.sh {environment}")


def test():
    pipeline = Pipeline()
    for env in ["development", "test"]:
        pipeline.add_children(job_for(env))

    with pytest.raises(JobNameConflictError):
        pipeline.render()

Rendering this pipeline leads to an error:

JobNameConflictError: Two jobs have the same name 'do-something' when rendering the pipeline.
Please fix this by providing a different name and/or stage when adding those jobs to their sequences/pipeline.

This is because both jobs were added with an identical name to the pipeline. The second job would overwrite the first one.

When adding jobs or sequences to a sequence, the .add_children() method accepts a stage parameter, which you should use to modify the names of the jobs added. The value of stage will be appended to the jobs' name and stage. This only applies to the jobs (and sequences) being added, not to the jobs (and sequences) already contained in the sequence.

Reuse jobs

Input:

import gcip
from tests import conftest


def job_for(environment: str) -> gcip.Job:
    return gcip.Job(stage="do_something", script=f"./do-something-on.sh {environment}")


def test():
    pipeline = gcip.Pipeline()
    for env in ["development", "test"]:
        pipeline.add_children(job_for(env), stage=env)

    conftest.check(pipeline.render())

Note that we added both jobs with a different stage to the sequence. Thus the output correctly contains one job per environment:

Output:

stages:
- do_something_development
- do_something_test
development-do-something:
  stage: do_something_development
  script:
  - ./do-something-on.sh development
test-do-something:
  stage: do_something_test
  script:
  - ./do-something-on.sh test

Reuse sequences

Namespacing is much more useful for reusing sequences. You can define a whole Gitlab CI pipeline within a sequence and reuse that sequence per environment. You simply repeat that sequence in a loop over all environments. Namespacing ensures that all jobs of the sequence are populated per environment.

Input:

import gcip
from tests import conftest


def environment_pipeline(environment: str) -> gcip.Sequence:
    sequence = gcip.Sequence()
    sequence.add_children(
        gcip.Job(stage="job1", script=f"job-1-on-{environment}"),
        gcip.Job(stage="job2", script=f"job-2-on-{environment}"),
    )
    return sequence


def test():
    pipeline = gcip.Pipeline()
    for env in ["development", "test"]:
        pipeline.add_children(environment_pipeline(env), stage=env)

    conftest.check(pipeline.render())

Output:

stages:
- job1_development
- job2_development
- job1_test
- job2_test
development-job1:
  stage: job1_development
  script:
  - job-1-on-development
development-job2:
  stage: job2_development
  script:
  - job-2-on-development
test-job1:
  stage: job1_test
  script:
  - job-1-on-test
test-job2:
  stage: job2_test
  script:
  - job-2-on-test

Parallelization - name, stage

As you may have noticed in the previous examples, all jobs have a distinct stage and thus run in sequence. This is because stage always extends both the job's name and its stage. This applies to all stage parameters, whether of the constructor of a Job object or of the .add_*() methods of a sequence.

So when adding jobs to a sequence (either directly or contained in another sequence), the goal is to extend just the name of the jobs but not their stage, such that jobs with equal stages run in parallel.

This is possible by setting equal values for the stage parameter but providing different values for the name parameter when creating jobs or adding them to sequences. The value of the name parameter will extend only the name of a job but not its stage.

name parameter when creating jobs

Input:

import gcip
from tests import conftest


def test():
    pipeline = gcip.Pipeline()
    pipeline.add_children(
        gcip.Job(name="job1", stage="single-stage", script="date"),
        gcip.Job(name="job2", stage="single-stage", script="date"),
    )

    conftest.check(pipeline.render())

Output:

stages:
- single_stage
job1-single-stage:
  stage: single_stage
  script:
  - date
job2-single-stage:
  stage: single_stage
  script:
  - date

This time we have chosen an equal value for stage, so that the stages of both jobs are set equally. To avoid that the name values of both jobs are also equal (and the second job overwrites the first one), we have additionally provided the name parameter, whose value will be appended to the name of the jobs. Both jobs will run in parallel within the same stage.

First you might wonder why there is no parameter that extends only the stage of a job. When thinking of sequences, the stage parameter extends both the name and the stage of a job, while the name parameter extends just the name of a job. Extending means that their values are appended to the current values of name or stage of a job. However, there is no need to extend just the stage of a job, such that two jobs have distinct stages but equal names. Equal names would mean that the latter job overwrites all other jobs with the same name, as a job in Gitlab CI must have a unique name. It is only useful to extend both values, such that two jobs are distinct and run in different stages, or to extend only the name of jobs, such that two jobs are distinct but run in parallel in the same stage. To keep the concept of only the name and stage parameters consistent, the same applies to jobs.

Second you might wonder why we haven't omitted the stage parameter when creating the jobs. This would be possible. But as explained in the previous paragraph, when creating jobs we cannot set just the stage value. Omitting the stage parameter means we set no value for stage at all. By default, Gitlab CI jobs without a stage value are placed in the test stage. To define a stage other than test, we used the stage parameter. Yes, that means the job's name will also include the value of the stage. But this design decision keeps the concept of name and stage much clearer than additionally providing a pure stage parameter for jobs while sequences have no such (useless) stage parameter (because it makes no sense to extend only the stage but not the name of a job).

Sorry - that was a lot of theory - but simply keep these rules in mind when creating jobs (see the sketch after this list):

  • Set different values for just the stage parameter when creating distinct jobs which will run in sequence (separate stages).

  • Set different values for just the name parameter when creating distinct jobs which will run in parallel (equal stage).

  • Set different values for the name parameters but equal values for the stage parameters when creating distinct jobs which will run in parallel (equal stage) but defining the name of the stage.

  • Setting different values for both parameters is unnecessary and leads to the same result as the first case: distinct jobs which run in sequence (separate stages).
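
A minimal sketch of these rules, using only the gcip constructs shown above (the concrete script, job and stage names are illustrative assumptions):

import gcip

pipeline = gcip.Pipeline()
pipeline.add_children(
    # different stage values only: distinct jobs running in sequence
    gcip.Job(stage="build", script="./build.sh"),
    gcip.Job(stage="test", script="./test.sh"),
    # different name values combined with an equal stage value: distinct
    # jobs running in parallel within the same stage
    gcip.Job(name="linux", stage="package", script="./package-linux.sh"),
    gcip.Job(name="windows", stage="package", script="./package-windows.sh"),
)

pipeline.write_yaml()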

name parameter when adding jobs (and sequences) to sequences

Let's take the sequence example from the chapter Stages allow reuse of jobs and sequences, and instead of using the stage parameter when adding the sequence several times to the pipeline, we now use the name parameter.

Input:

import gcip
from tests import conftest


def environment_pipeline(environment: str) -> gcip.Sequence:
    sequence = gcip.Sequence()
    sequence.add_children(
        gcip.Job(stage="job1", script=f"job-1-on-{environment}"),
        gcip.Job(stage="job2", script=f"job-2-on-{environment}"),
    )
    return sequence


def test():
    pipeline = gcip.Pipeline()
    for env in ["development", "test"]:
        pipeline.add_children(environment_pipeline(env), name=env)

    conftest.check(pipeline.render())

Now the environments run in parallel, because only the job names are populated per environment, not the stage names.

Output:

stages:
- job1
- job2
development-job1:
  stage: job1
  script:
  - job-1-on-development
development-job2:
  stage: job2
  script:
  - job-2-on-development
test-job1:
  stage: job1
  script:
  - job-1-on-test
test-job2:
  stage: job2
  script:
  - job-2-on-test

You can also mix the usage of stage and name. This makes sense when adding lots of jobs where groups of jobs should run sequentially but jobs within a group should run in parallel. Here is an example:

Input:

import gcip
from tests import conftest


def job_for(service: str) -> gcip.Job:
    return gcip.Job(stage="update_service", script=f"./update-service.sh {service}")


def test():
    pipeline = gcip.Pipeline()
    for env in ["development", "test"]:
        for service in ["service1", "service2"]:
            pipeline.add_children(job_for(f"{service}_{env}"), stage=env, name=service)

    conftest.check(pipeline.render())

As output we get two services updated in parallel but in consecutive stages.

Output:

stages:
- update_service_development
- update_service_test
service1-development-update-service:
  stage: update_service_development
  script:
  - ./update-service.sh service1_development
service2-development-update-service:
  stage: update_service_development
  script:
  - ./update-service.sh service2_development
service1-test-update-service:
  stage: update_service_test
  script:
  - ./update-service.sh service1_test
service2-test-update-service:
  stage: update_service_test
  script:
  - ./update-service.sh service2_test

Batteries included

Until here you have learned everything about the logical functionality of gcip. But gcip also contains a library of predefined assets you can use for building your pipelines. Those assets are contained in modules named by their type: scripts, jobs, sequences, and rules.

The following sub chapters provide an example of one asset out of every module.

scripts

Input:

import gcip
from gcip.addons.gitlab.scripts import clone_repository
from tests import conftest


def test():
    pipeline = gcip.Pipeline()
    pipeline.add_children(
        gcip.Job(stage="print_date", script=clone_repository("path/to/group"))
    )

    conftest.check(pipeline.render())

Output:

stages:
- print_date
print-date:
  stage: print_date
  script:
  - git clone --branch main --single-branch https://gitlab-ci-token:${CI_JOB_TOKEN}@${CI_SERVER_HOST}/path/to/group.git

jobs

Input:

import gcip
from gcip.addons.python.jobs.linter import Flake8
from tests import conftest


def test():
    pipeline = gcip.Pipeline()
    pipeline.add_children(Flake8())

    conftest.check(pipeline.render())

Output:

stages:
- lint
flake8-lint:
  stage: lint
  script:
  - pip3 install --upgrade flake8
  - flake8

sequences

Input:

import gcip
from gcip.addons.aws.sequences.cdk import DiffDeploy
from tests import conftest


def test():
    pipeline = gcip.Pipeline()
    sequence = DiffDeploy(stacks=["my-cdk-stack"])
    sequence.deploy_job.toolkit_stack_name = "cdk-toolkit"
    pipeline.add_children(sequence)

    conftest.check(pipeline.render())

Output:

stages:
- diff
- deploy
cdk-diff:
  stage: diff
  script:
  - cdk diff my-cdk-stack
cdk-deploy:
  needs:
  - job: cdk-diff
    artifacts: true
  stage: deploy
  script:
  - pip3 install gcip
  - python3 -m gcip.addons.aws.tools.wait_for_cloudformation_stack_ready --stack-names
    'my-cdk-stack'
  - cdk deploy --require-approval 'never' --strict --toolkit-stack-name cdk-toolkit
    my-cdk-stack

rules

Input:

import gcip
from gcip.lib import rules
from tests import conftest


def test():
    job = gcip.Job(stage="print_date", script="date")
    job.append_rules(
        rules.on_merge_request_events().never(),
        rules.on_master(),
    )

    pipeline = gcip.Pipeline()
    pipeline.add_children(job)

    conftest.check(pipeline.render())

Output:

stages:
- print_date
print-date:
  stage: print_date
  script:
  - date
  rules:
  - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    when: never
  - if: $CI_COMMIT_BRANCH == "master"

Do more with Python

Note
Please note regarding the current version of gcip: currently not all functionality of Gitlab CI is provided by gcip. The following section describes that you don't need all of Gitlab CI's functionality, as you can cover some of it in Python. But some functionality must be part of gcip, like configuring caching or artifacts, which isn't implemented yet.

Until here you have learned everything about the functionality of gcip. That is, to sum it up:

  • Creating jobs.

  • Organizing job hierarchies with sequences.

  • Configuring jobs directly or at hierarchy level over sequences.

  • Namespacing and parallelization.

  • Predefined assets.

With the few functionalities of gcip and the capabilities of Python, there is nothing missing to create every pipeline you can imagine. Gitlab CI provides many more constructs you may miss here, but most of them are clunky workarounds necessitated by the limited logic capabilities of Gitlab CI's domain-specific "language". You don't need them when you can design your pipelines in Python. Here are a few examples:

  • You don’t need templates (the extends keyword or YAML anchors), because you can reuse jobs and sequences.

  • You don’t need before_script, after_script or global configurations, because you can do configurations at an arbitrary level in the sequences hierarchy. All configurations will finally be populated down to the jobs.

  • You don't have to keep struggling with rules at pipeline and job level. In gcip you can configure rules at an arbitrary level in the sequences hierarchy.

Furthermore you can leverage all the power of a programming language to dynamically design your pipelines. Here are some ideas (see the sketch after this list):

  • Bundle jobs in sequences and use loops to populate the sequences over a list of environments.

  • Use if-then-else expressions to create jobs within job sequences depending on environment information or requirements.

  • Access information from outside your pipeline script and use it for decision making inside your pipeline script.
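
A minimal sketch combining these ideas, using only the gcip constructs introduced above (the environment list, script names and the branch check are illustrative assumptions):

import os

import gcip

environments = ["development", "test", "production"]

pipeline = gcip.Pipeline()
for env in environments:
    sequence = gcip.Sequence()
    sequence.add_children(gcip.Job(stage="deploy", script=f"./deploy.sh {env}"))

    # if-then-else: only production gets an additional smoke test job
    if env == "production":
        sequence.add_children(gcip.Job(stage="smoke_test", script="./smoke-test.sh"))

    pipeline.add_children(sequence, stage=env)

# decision making based on information from outside the pipeline script
if os.getenv("CI_COMMIT_BRANCH") == "main":
    pipeline.add_children(gcip.Job(stage="notify", script="./notify-release.sh"))

pipeline.write_yaml()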

Beyond the basics

This chapter covers further abilities of GCIP which are best read after the basics.

TriggerJobs

Besides normal jobs, with GCIP you can define TriggerJobs, which either trigger another project's pipeline or a child pipeline.

Here is an example of triggering another project's pipeline:

Input:

from gcip import Pipeline, TriggerJob, TriggerStrategy
from tests import conftest


def test():
    pipeline = Pipeline()
    pipeline.add_children(
        TriggerJob(
            stage="trigger-banana",
            project="myteam/banana",
            branch="test",
            strategy=TriggerStrategy.DEPEND,
        )
    )

    conftest.check(pipeline.render())

Output:

stages:
- trigger_banana
trigger-banana:
  trigger:
    project: myteam/banana
    branch: test
    strategy: depend
  stage: trigger_banana

Here is an example of triggering a child pipeline:

Input:

from gcip import (
    IncludeLocal,
    Pipeline,
    TriggerJob,
    TriggerStrategy,
)
from tests import conftest


def test():
    pipeline = Pipeline()
    pipeline.add_children(
        TriggerJob(
            stage="trigger-subpipe",
            includes=IncludeLocal("./my-subpipe.yml"),
            strategy=TriggerStrategy.DEPEND,
        )
    )

    conftest.check(pipeline.render())

Output:

stages:
- trigger_subpipe
trigger-subpipe:
  trigger:
    include:
    - local: ./my-subpipe.yml
    strategy: depend
  stage: trigger_subpipe

PagesJob for Gitlab Pages

For creating Gitlab Pages you need:

  1. Content under the repository path ./public

  2. The special job gcip.PagesJob which deploys those artifacts to Gitlab Pages.

The first condition could be fulfilled by either having static content in the repository under the ./public path or having one or more jobs generating that content under the artifacts path ./public.

The module gcip.addons.gitlab.jobs.pages contains predefined jobs that generate HTML content from different sources and store it under the artifacts path ./public. Here is an example of how to generate and deploy Gitlab Pages:

Input:

from gcip import PagesJob, Pipeline
from gcip.addons.gitlab.jobs.pages import AsciiDoctor
from tests import conftest


def test():
    pipeline = Pipeline()
    pipeline.add_children(
        AsciiDoctor(source="docs/index.adoc", out_file="/index.html"),
        PagesJob(),
    )

    conftest.check(pipeline.render())

Output:

stages:
- build
- pages
asciidoctor-pages-build:
  image:
    name: ruby:3-alpine
  stage: build
  script:
  - gem install asciidoctor
  - asciidoctor docs/index.adoc -o public/index.html
  artifacts:
    name: ci_job_name-my-awsome-feature-branch
    paths:
    - public
pages:
  image:
    name: busybox:latest
  stage: pages
  script:
  - echo 'Publishing Gitlab Pages'
  artifacts:
    name: ci_job_name-my-awsome-feature-branch
    paths:
    - public

Prefill variables in manual pipelines

One may ask how to use prefilled variables that can be configured for manually started pipelines, as described in the official documentation. As the gcip pipeline is started as a child pipeline, its pipeline code only becomes available once it is rendered by the parent pipeline, and the parent pipeline is the .gitlab-ci.yml file. Thus it is not possible to define prefilled variables within the gcip pipeline itself, because the rendered pipeline script is not yet available when the Gitlab CI GUI evaluates the (parent) pipeline.

The way to go is to define the prefill variables in the parent pipeline and then pass them to the gcip child pipeline. Here is an example:

---
variables:
  MY_PREFILLED_VARIABLE:
    value: "This value is not good enough."
    description: "Please provide a better value."

generate-pipeline:
  stage: build
  image: thomass/gcip:latest
  script: /usr/src/app/docker/gcip.sh
  artifacts:
    paths:
      - generated-config.yml

run-pipeline:
  stage: deploy
  needs:
    - generate-pipeline
  trigger:
    include:
      - artifact: generated-config.yml
        job: generate-pipeline
    strategy: depend
  variables:
    GCIP_MY_PREFILLED_VARIABLE: $MY_PREFILLED_VARIABLE

This is the same parent pipeline from the chapter Configuring your project to use gcip but with prefilled variables.

Please note that the variable defined at pipeline level is passed to the gcip child pipeline under a different variable name. This is necessary because of a bug in Gitlab (issue 213729), where variables are not correctly passed to child pipelines. In this example we simply prepended 'GCIP_' to the variable passed to the child pipeline. You can access this variable either in the jobs generated by your gcip script, or directly within your gcip script at pipeline generation time. The latter is just a matter of Python code and could look like the following:

# this is our Python gcip code
import os
...
MY_PREFILLED_VARIABLE = os.getenv('GCIP_MY_PREFILLED_VARIABLE')
...

String Job / Sequence modifications together

Every modification method of Job and Sequence returns the appropriate Job / Sequence object. Thus you can string multiple modification methods together. Here is an example of a chained job configuration.

Input:

from gcip import Job, Pipeline, Rule
from tests import conftest


def test():
    pipeline = Pipeline()
    job = (
        Job(stage="print_date", script="date")
        .set_image("docker/image:example")
        .prepend_scripts("./before-script.sh")
        .append_scripts("./after-script.sh")
        .add_variables(USER="Max Power", URL="https://example.com")
        .add_tags("test", "europe")
        .append_rules(Rule(if_statement="$MY_VARIABLE_IS_PRESENT"))
    )
    job.artifacts.add_paths("binaries/", ".config")
    pipeline.add_children(job)

    conftest.check(pipeline.render())

Output:

stages:
- print_date
print-date:
  image:
    name: docker/image:example
  stage: print_date
  script:
  - ./before-script.sh
  - date
  - ./after-script.sh
  variables:
    USER: Max Power
    URL: https://example.com
  rules:
  - if: $MY_VARIABLE_IS_PRESENT
  artifacts:
    name: ci_job_name-my-awsome-feature-branch
    paths:
    - binaries
    - .config
  tags:
  - test
  - europe

The same works with sequences.

Find (and modify) Jobs by their attributes

With Sequence.find_jobs() you get a powerful tool. But you should use it with care, because it can bring much confusion into your pipeline code. This is because it introduces a third way of modifying jobs. Until here you have learned how to create and modify jobs directly and how to modify job attributes indirectly by setting those attributes on sequences; both are very structured approaches. Now you learn how to search for jobs depending on their current attributes and modify them. This is an unstructured approach, because you don't control exactly which jobs you modify. You only care about the state of some jobs before and after your modification.

Let’s get into action, as this tool might be really helpful for you.

Imagine you have a really huge pipeline script and, for example, include a lot of jobs from sequences you haven't created on your own. Now there are a couple of jobs that use the Docker image foo/bar:latest. However, for all jobs which are tagged with prd, you want to change the Docker image tag to stable. Be aware that the following example is far simpler than the scenario described, but it reflects the scenario and is simple enough to understand the mechanics.

Input:

from gcip import Job, JobFilter, Pipeline
from tests import conftest


def test():
    dev_job = Job(stage="build-dev", script="do_something development")
    prd_job = Job(stage="build-prd", script="do_something production")

    dev_job.set_image("foo/bar:latest")
    prd_job.set_image("foo/bar:latest")

    dev_job.add_tags("dev")
    prd_job.add_tags("prd")

    pipeline = Pipeline()
    pipeline.add_children(dev_job, prd_job)

    # Imagine the upper pipeline is far more complex than what you see.
    # Then the interesting part starts here:

    filter = JobFilter(image="foo/bar:.*", tags="prd")

    for job in pipeline.find_jobs(filter):
        job.set_image("foo/bar:stable")

    conftest.check(pipeline.render())

Output:

stages:
- build_dev
- build_prd
build-dev:
  image:
    name: foo/bar:latest
  stage: build_dev
  script:
  - do_something development
  tags:
  - dev
build-prd:
  image:
    name: foo/bar:stable
  stage: build_prd
  script:
  - do_something production
  tags:
  - prd

To Sequence.find_jobs() you can pass a JobFilter, and it returns all jobs that match the filter conditions. On the jobs returned you can change whatever the Job class allows. The JobFilter allows filtering for all attributes a Job typically has. For most of the job filter's parameters you can pass regular expressions for pattern matching of job attributes. In the example above we were looking for foo/bar:.* images, i.e. the foo/bar image with any image tag.

The jobs returned by Sequence.find_jobs() must match all attributes of the JobFilter (logical conjunction / AND). You can also pass multiple JobFilter objects to the Sequence.find_jobs() method. The jobs returned then must match at least one of those filters (logical disjunction / OR).
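
A short sketch of combining several filters, assuming they are passed to Sequence.find_jobs() as positional arguments like in the example above (the filter values are illustrative):

# each filter's own conditions must all match (AND) ...
filter_prd = JobFilter(image="foo/bar:.*", tags="prd")
filter_canary = JobFilter(tags="canary")

# ... while a job only needs to match one of the passed filters (OR)
for job in pipeline.find_jobs(filter_prd, filter_canary):
    job.set_image("foo/bar:stable")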

Be aware that the result of Sequence.find_jobs() depends on the current state of the sequence you are calling this method on. If you had searched for the prd_job before adding it to the pipeline, Sequence.find_jobs() would have returned nothing. The method also only looks downward from the sequence you are calling it on. A rule of thumb is to apply modifications based on Sequence.find_jobs() only at the very end of your code and only on the pipeline sequence. However, gcip implements this feature at the sequence level to allow special cases where you just want to search for jobs within a child sequence.

The Sequence.find_jobs() method has some traps you can step into. They are mainly related to attributes inherited from sequences. By default, the Sequence.find_jobs() method only looks for attributes set on the jobs themselves and NOT for attributes the jobs would inherit from their sequences. Imagine the following YAML output of a gcip pipeline.

Output:

stages:
- build_dev
- build_prd
dev-build:
  image:
    name: foo/bar:latest
  stage: build_dev
  script:
  - do_something development
  tags:
  - dev
prd-build:
  image:
    name: foo/bar:latest
  stage: build_prd
  script:
  - do_something development
  tags:
  - prd

The output of this pipeline is essentially the same as in the previous example, but without the job modifications. Now imagine you want to make the same modifications as before…

filter = JobFilter(image="foo/bar:.*", tags="prd")
for job in pipeline.find_jobs(filter):
    job.set_image("foo/bar:stable")

…but the output remains the same. This can happen if the attributes you are filtering for are not set directly on the job but are inherited from its sequences:

Input:

from gcip import Job, JobFilter, Pipeline
from gcip.core.sequence import Sequence
from tests import conftest


def test():
    job = Job(stage="build", script="do_something development")
    job.set_image("foo/bar:latest")

    dev_sequence = Sequence().add_children(job, stage="dev")
    prd_sequence = Sequence().add_children(job, stage="prd")

    dev_sequence.add_tags("dev")
    prd_sequence.add_tags("prd")

    pipeline = Pipeline()
    pipeline.add_children(dev_sequence, prd_sequence)

    # The following filter returns no jobs, as the tags are attributes
    # of the sequences and `find_jobs()` is set up to not look for
    # inherited attributes.

    filter = JobFilter(image="foo/bar:.*", tags="prd")

    for job in pipeline.find_jobs(filter):
        job.set_image("foo/bar:stable")

    conftest.check(pipeline.render())

What you might want to do is pass include_sequence_attributes=True to the Sequence.find_jobs() method:

Input:

from gcip import Job, JobFilter, Pipeline
from gcip.core.sequence import Sequence
from tests import conftest


def test():
    job = Job(stage="build", script="do_something development")
    job.set_image("foo/bar:latest")

    dev_sequence = Sequence().add_children(job, stage="dev")
    prd_sequence = Sequence().add_children(job, stage="prd")

    dev_sequence.add_tags("dev")
    prd_sequence.add_tags("prd")

    pipeline = Pipeline()
    pipeline.add_children(dev_sequence, prd_sequence)

    filter = JobFilter(image="foo/bar:.*", tags="prd")

    for job in pipeline.find_jobs(filter, include_sequence_attributes=True):
        job.set_image("foo/bar:stable")

    conftest.check(pipeline.render())

But beware! In the output YAML both jobs become modified and have the image foo/bar:stable set:

Output:

stages:
- build_dev
- build_prd
dev-build:
  image:
    name: foo/bar:stable
  stage: build_dev
  script:
  - do_something development
  tags:
  - dev
prd-build:
  image:
    name: foo/bar:stable
  stage: build_prd
  script:
  - do_something development
  tags:
  - prd

This is because Sequence.find_jobs() found the job within the prd_sequence, but the same job object was added to both sequences. Thus you modify just one job, but the change affects both rendered instances of this job in the output YAML file.

You should now know both working modes of the Sequence.find_jobs() method and their limitations and drawbacks:

  • include_sequence_attributes=False (default) - will just return jobs whose matching attributes are directly set on that job.

  • include_sequence_attributes=True - will return all jobs whose attributes match, including attributes inherited from sequences. If a job is included in multiple sequences and the matching attribute is inherited from just one sequence, changes on that job will affect all rendered instances of that job in all sequences.

Author

GCIP was created by Thomas Steinbach in 2020.

Thanks to initial contributions from Daniel von Eßen.

Licence

The content of this repository is licensed under the Apache 2.0 license.

Copyright DB Systel GmbH