A Python library for creating dynamic pipelines for Gitlab CI.
The Gitlab CI Python Library is called gcip for short.
- Code documentation
- Configuring your project to use gcip
- Hints regarding the following examples
- Create a pipeline with one job
- Configure jobs
- Bundling jobs as sequence
- Stacking sequences
- Pipelines are sequences
- Stages allow reuse of jobs and sequences
- Reuse sequences
- Parallelization - name, stage
- Batteries included
- Do more with Python
- Beyond the basics
- Author
- Licence
Code documentation
This page holds the user documentation of gcip.
For the code documentation please proceed to the README.md within the gcip folder.
Configuring your project to use gcip
Your Gitlab project needs these two files:
MyProject
├ .gitlab-ci.py
└ .gitlab-ci.yml
The .gitlab-ci.yml is the file you already know. Its only task is to render and trigger the child pipeline created with the Gitlab CI Python Library. The latter is written into the .gitlab-ci.py. Let's have a look at this project's .gitlab-ci.yml to see how this file should look:
---
generate-pipeline:
  stage: build
  image: python:3.11-slim
  script:
    - pip install pipenv
    - pipenv install --system
    - python .gitlab-ci.py
  artifacts:
    paths:
      - generated-config.yml
run-pipeline:
  stage: deploy
  needs:
    - generate-pipeline
  trigger:
    include:
      - artifact: generated-config.yml
        job: generate-pipeline
    strategy: depend
Your gcip pipeline code then goes into the file named .gitlab-ci.py. The following chapters show how to create the pipeline code.
As an alternative to installing gcip in a Python container, you can also use the official Docker image released for every tag. The first job would then look like this:
---
generate-pipeline:
  stage: build
  image: thomass/gcip:1.0.0
  script: /usr/src/app/docker/gcip.sh
  artifacts:
    paths:
      - generated-config.yml
Hints regarding the following examples
All code examples in the following chapters are written so that they can also be run with pytest. For instance, a code example could look like the following:
import gcip
from tests import conftest

def test():
    pipeline = gcip.Pipeline()
    pipeline.add_children(gcip.Job(stage="print_date", script="date"))
    conftest.check(pipeline.render())
To transform this pytest into a valid .gitlab-ci.py file you have to:
- Omit the import from tests import conftest.
- Put your pipeline code plainly into the Python script and not within the def test(): method.
- Instead of testing the rendered pipeline with conftest.check(pipeline.render()), write the generated-pipeline.yml with pipeline.write_yaml().
The real .gitlab-ci.py code derived from the example would look like the following:
import gcip
pipeline = gcip.Pipeline()
pipeline.add_children(gcip.Job(stage="print_date", script="date"))
pipeline.write_yaml()
Create a pipeline with one job
Input:
import gcip
from tests import conftest

def test():
    pipeline = gcip.Pipeline()
    pipeline.add_children(gcip.Job(stage="print_date", script="date"))
    conftest.check(pipeline.render())
Remember: As stated in the hints regarding the examples, your real pipeline code must end with pipeline.write_yaml() instead of conftest.check(pipeline.render())!
Output:
stages:
  - print_date
print-date:
  stage: print_date
  script:
    - date
Pipeline context manager
You can produce the same output as above with the context manager of the pipeline.
import gcip

with gcip.Pipeline() as pipe:
    pipe.add_children(gcip.Job(stage="print_date", script="date"))
Configure jobs
Jobs can be configured by calling the following methods:
Input:
import gcip
from tests import conftest

def test():
    pipeline = gcip.Pipeline()
    job = gcip.Job(stage="print_date", script="date")
    job.set_image("docker/image:example")
    job.prepend_scripts("./before-script.sh")
    job.append_scripts("./after-script.sh")
    job.add_variables(USER="Max Power", URL="https://example.com")
    job.add_tags("test", "europe")
    job.artifacts.add_paths("binaries/", ".config")
    job.append_rules(gcip.Rule(if_statement="$MY_VARIABLE_IS_PRESENT"))
    pipeline.add_children(job)
    conftest.check(pipeline.render())
The prepend_scripts, append_scripts and all add_* methods accept an arbitrary number of positional arguments. That means you can prepend/append/add a single script/variable/tag/… or a list of them.
Output:
stages:
  - print_date
print-date:
  image:
    name: docker/image:example
  stage: print_date
  script:
    - ./before-script.sh
    - date
    - ./after-script.sh
  variables:
    USER: Max Power
    URL: https://example.com
  rules:
    - if: $MY_VARIABLE_IS_PRESENT
  artifacts:
    name: ci_job_name-my-awsome-feature-branch
    paths:
      - binaries
      - .config
  tags:
    - test
    - europe
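As noted above, these configuration methods take any number of positional arguments. A minimal sketch (the script names and tags are made up for illustration):

import gcip

job = gcip.Job(stage="print_date", script="date")
# a single value ...
job.add_tags("test")
# ... several values at once ...
job.prepend_scripts("./install-dependencies.sh", "./login.sh")
# ... or an existing list, unpacked with Python's * operator
common_scripts = ["./set-environment.sh", "./print-debug-info.sh"]
job.append_scripts(*common_scripts)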
Bundling jobs as sequence
You can bundle jobs into a sequence to apply a common configuration to all included jobs. A sequence has the same configuration methods as shown in the previous example for jobs.
Input:
import gcip
from tests import conftest

def test():
    sequence = gcip.Sequence()
    job1 = gcip.Job(stage="job1", script="script1.sh")
    job1.prepend_scripts("from-job-1.sh")
    sequence.add_children(
        job1,
        gcip.Job(stage="job2", script="script2.sh"),
    )
    sequence.prepend_scripts("from-sequence.sh")
    pipeline = gcip.Pipeline()
    pipeline.add_children(sequence)
    conftest.check(pipeline.render())
As you will see in the output, jobs can have their own configuration (job1.prepend_scripts(...)) as well as a common configuration from their sequence (sequence.prepend_scripts(...)).
Output:
stages:
  - job1
  - job2
job1:
  stage: job1
  script:
    - from-sequence.sh
    - from-job-1.sh
    - script1.sh
job2:
  stage: job2
  script:
    - from-sequence.sh
    - script2.sh
Stacking sequences
Input:
import gcip
from tests import conftest

def test():
    sequence_a = gcip.Sequence()
    sequence_a.add_children(gcip.Job(stage="job1", script="script1.sh"))
    sequence_a.prepend_scripts("from-sequence-a.sh")
    sequence_b = gcip.Sequence()
    sequence_b.add_children(sequence_a)
    sequence_b.add_children(gcip.Job(stage="job2", script="script2.sh"))
    sequence_b.prepend_scripts("from-sequence-b.sh")
    pipeline = gcip.Pipeline()
    pipeline.add_children(sequence_b)
    conftest.check(pipeline.render())
Output:
stages:
  - job1
  - job2
job1:
  stage: job1
  script:
    - from-sequence-b.sh
    - from-sequence-a.sh
    - script1.sh
job2:
  stage: job2
  script:
    - from-sequence-b.sh
    - script2.sh
Pipelines are sequences
Pipelines are an extended version of sequences and have all their abilities (plus pipeline-specific ones), such as the configuration options and the stacking of other sequences.
Input:
import gcip
from tests import conftest

def test():
    sequence_a = gcip.Sequence()
    sequence_a.add_children(gcip.Job(stage="job1", script="script1.sh"))
    sequence_a.prepend_scripts("from-sequence.sh")
    pipeline = gcip.Pipeline()
    pipeline.add_children(sequence_a)
    pipeline.add_children(gcip.Job(stage="job2", script="script2.sh"))
    pipeline.prepend_scripts("from-pipeline.sh")
    conftest.check(pipeline.render())
Output:
stages:
  - job1
  - job2
job1:
  stage: job1
  script:
    - from-pipeline.sh
    - from-sequence.sh
    - script1.sh
job2:
  stage: job2
  script:
    - from-pipeline.sh
    - script2.sh
Stages allow reuse of jobs and sequences
Assume you want to reuse a parameterized job. The following code shows an incorrect example:
import pytest

from gcip import Job, JobNameConflictError, Pipeline

def job_for(environment: str) -> Job:
    return Job(stage="do_something", script=f"./do-something-on.sh {environment}")

def test():
    pipeline = Pipeline()
    for env in ["development", "test"]:
        pipeline.add_children(job_for(env))
    with pytest.raises(JobNameConflictError):
        pipeline.render()
Rendering this pipeline leads to an error:
JobNameConflictError: Two jobs have the same name 'do-something' when rendering the pipeline.
Please fix this by providing a different name and/or stage when adding those jobs to their sequences/pipeline.
This is because both jobs were added to the pipeline with an identical name; the second job would overwrite the first one.
When adding jobs or sequences to a sequence, the .add_children() method accepts a stage parameter, which you should use to modify the name of the jobs added. The value of stage will be appended to the jobs' name and stage. This only applies to the jobs (sequences) added but not to the jobs (and sequences) already contained in the sequence.
Reuse jobs
Input:
import gcip
from tests import conftest

def job_for(environment: str) -> gcip.Job:
    return gcip.Job(stage="do_something", script=f"./do-something-on.sh {environment}")

def test():
    pipeline = gcip.Pipeline()
    for env in ["development", "test"]:
        pipeline.add_children(job_for(env), stage=env)
    conftest.check(pipeline.render())
Note that we added both jobs with a different stage to the sequence. Thus in the output we correctly get one job per environment:
Output:
stages:
  - do_something_development
  - do_something_test
development-do-something:
  stage: do_something_development
  script:
    - ./do-something-on.sh development
test-do-something:
  stage: do_something_test
  script:
    - ./do-something-on.sh test
Reuse sequences
Namespacing is much more useful for reusing sequences. You can define a whole Gitlab CI pipeline within a sequence and reuse that sequence per environment: you simply repeat that sequence in a loop for all environments. Namespacing ensures that all jobs of the sequence are populated per environment.
Input:
import gcip
from tests import conftest

def environment_pipeline(environment: str) -> gcip.Sequence:
    sequence = gcip.Sequence()
    sequence.add_children(
        gcip.Job(stage="job1", script=f"job-1-on-{environment}"),
        gcip.Job(stage="job2", script=f"job-2-on-{environment}"),
    )
    return sequence

def test():
    pipeline = gcip.Pipeline()
    for env in ["development", "test"]:
        pipeline.add_children(environment_pipeline(env), stage=env)
    conftest.check(pipeline.render())
Output:
stages:
  - job1_development
  - job2_development
  - job1_test
  - job2_test
development-job1:
  stage: job1_development
  script:
    - job-1-on-development
development-job2:
  stage: job2_development
  script:
    - job-2-on-development
test-job1:
  stage: job1_test
  script:
    - job-1-on-test
test-job2:
  stage: job2_test
  script:
    - job-2-on-test
Parallelization - name, stage
As you may have noticed in the previous examples, all jobs have a distinct stage and thus run in sequence. This is because stage will always extend the job's name and stage. This applies to all stage parameters, whether of the constructor of a Job object or of the .add_*() methods of a sequence.
So when adding jobs to a sequence (either directly or contained in a sequence itself), the goal is to extend just the name of the jobs but not their stage, such that jobs with equal stages run in parallel. This is possible by setting equal values for the stage parameter but providing different values for the name parameter when creating jobs or adding them to sequences. The value of the name parameter will extend only the name of a job but not its stage.
name parameter when creating jobs
Input:
import gcip
from tests import conftest

def test():
    pipeline = gcip.Pipeline()
    pipeline.add_children(
        gcip.Job(name="job1", stage="single-stage", script="date"),
        gcip.Job(name="job2", stage="single-stage", script="date"),
    )
    conftest.check(pipeline.render())
Output:
stages:
  - single_stage
job1-single-stage:
  stage: single_stage
  script:
    - date
job2-single-stage:
  stage: single_stage
  script:
    - date
This time we have chosen an equal value for stage, such that the stages of both jobs will be set equally. To avoid that also the name values of both jobs are equal (and the second job overwrites the first one), we have also provided the name parameter, whose value will be appended to the name of the jobs. Both jobs will run in parallel within the same stage.
First you might wonder why there is no parameter that extends just the stage of a job. When thinking of sequences, the stage parameter will extend both the name and the stage of a job, and the name parameter will just extend the name of a job. Extend means their values will be appended to the current name or stage of a job. However, there is no need to extend just the stage of a job, such that two jobs have distinct stages but equal names. Equal names mean that the latter job will overwrite all other jobs with the same name, as a job in Gitlab CI must have a unique name. It is only useful to extend both values, such that two jobs are different and run in different stages, or to extend only the name of jobs, such that two jobs are different but run in the same stage in parallel. To keep a consistent concept of only the name and stage parameters, this applies to jobs as well.
Second you might wonder why we haven't omitted the stage parameter when creating the jobs. This would be possible. But because of the explanation in the previous paragraph, there is no other way to set the stage value when creating jobs. Omitting the stage parameter means we will not set any value for stage, and by default Gitlab CI jobs without a stage value are placed in the test stage. To define a stage other than test, we used the stage parameter. Yes, that means the job's name will also include the value of the stage. But this design decision keeps the concept of name and stage much clearer than additionally providing a stage-only parameter for jobs while sequences have no such (useless) parameter (because it makes no sense to extend the stage but not the name of a job).
Sorry, that was a lot of theory. Simply keep in mind when creating jobs:
- Set different values for just the stage parameter when creating distinct jobs which will run in sequence (separate stages).
- Set different values for just the name parameter when creating distinct jobs which will run in parallel (equal stage).
- Set different values for the name parameters but equal values for the stage parameters when creating distinct jobs which will run in parallel (equal stage) while also defining the name of that stage.
- Setting different values for both parameters is nonsense and will lead to the first result: distinct jobs which will run in sequence.
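A compact sketch of these rules (the stages, names and scripts are made up for illustration):

import gcip

pipeline = gcip.Pipeline()
pipeline.add_children(
    # different stage values: the jobs run in sequence (separate stages)
    gcip.Job(stage="build", script="./build.sh"),
    gcip.Job(stage="test", script="./test.sh"),
    # equal stage but different name values: the jobs run in parallel within the same stage
    gcip.Job(name="deploy-eu", stage="deploy", script="./deploy.sh eu"),
    gcip.Job(name="deploy-us", stage="deploy", script="./deploy.sh us"),
)
pipeline.write_yaml()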
name parameter when adding jobs (and sequences) to sequences
Let's take the sequence example from the chapter Stages allow reuse of jobs and sequences. Instead of using the stage parameter when adding the sequence several times to the pipeline, we now use the name parameter.
Input:
import gcip
from tests import conftest

def environment_pipeline(environment: str) -> gcip.Sequence:
    sequence = gcip.Sequence()
    sequence.add_children(
        gcip.Job(stage="job1", script=f"job-1-on-{environment}"),
        gcip.Job(stage="job2", script=f"job-2-on-{environment}"),
    )
    return sequence

def test():
    pipeline = gcip.Pipeline()
    for env in ["development", "test"]:
        pipeline.add_children(environment_pipeline(env), name=env)
    conftest.check(pipeline.render())
Now the environments run in parallel, because just the job names are populated per environment but not the stage names.
Output:
stages:
  - job1
  - job2
development-job1:
  stage: job1
  script:
    - job-1-on-development
development-job2:
  stage: job2
  script:
    - job-2-on-development
test-job1:
  stage: job1
  script:
    - job-1-on-test
test-job2:
  stage: job2
  script:
    - job-2-on-test
You can also mix the usage of stage and name. This makes sense when adding lots of jobs where groups of jobs should run sequentially but jobs within a group should run in parallel. Here is an example:
Input:
import gcip
from tests import conftest

def job_for(service: str) -> gcip.Job:
    return gcip.Job(stage="update_service", script=f"./update-service.sh {service}")

def test():
    pipeline = gcip.Pipeline()
    for env in ["development", "test"]:
        for service in ["service1", "service2"]:
            pipeline.add_children(job_for(f"{service}_{env}"), stage=env, name=service)
    conftest.check(pipeline.render())
As output we get two services updated in parallel but in consecutive stages.
Output:
stages:
  - update_service_development
  - update_service_test
service1-development-update-service:
  stage: update_service_development
  script:
    - ./update-service.sh service1_development
service2-development-update-service:
  stage: update_service_development
  script:
    - ./update-service.sh service2_development
service1-test-update-service:
  stage: update_service_test
  script:
    - ./update-service.sh service1_test
service2-test-update-service:
  stage: update_service_test
  script:
    - ./update-service.sh service2_test
Batteries included
Until here you have learned everything about the logical functionality of gcip. But gcip also contains a library of predefined assets you can use for building your pipelines. Those assets are contained in modules named by their type. The following sub chapters provide an example for one asset out of every module.
scripts
Input:
import gcip
from gcip.addons.gitlab.scripts import clone_repository
from tests import conftest

def test():
    pipeline = gcip.Pipeline()
    pipeline.add_children(
        gcip.Job(stage="print_date", script=clone_repository("path/to/group"))
    )
    conftest.check(pipeline.render())
Output:
stages:
  - print_date
print-date:
  stage: print_date
  script:
    - git clone --branch main --single-branch https://gitlab-ci-token:${CI_JOB_TOKEN}@${CI_SERVER_HOST}/path/to/group.git
jobs
Input:
import gcip
from gcip.addons.python.jobs.linter import Flake8
from tests import conftest

def test():
    pipeline = gcip.Pipeline()
    pipeline.add_children(Flake8())
    conftest.check(pipeline.render())
Output:
stages:
  - lint
flake8-lint:
  stage: lint
  script:
    - pip3 install --upgrade flake8
    - flake8
sequences
Input:
import gcip
from gcip.addons.aws.sequences.cdk import DiffDeploy
from tests import conftest

def test():
    pipeline = gcip.Pipeline()
    sequence = DiffDeploy(stacks=["my-cdk-stack"])
    sequence.deploy_job.toolkit_stack_name = "cdk-toolkit"
    pipeline.add_children(sequence)
    conftest.check(pipeline.render())
Output:
stages:
  - diff
  - deploy
cdk-diff:
  stage: diff
  script:
    - cdk diff my-cdk-stack
cdk-deploy:
  needs:
    - job: cdk-diff
      artifacts: true
  stage: deploy
  script:
    - pip3 install gcip
    - python3 -m gcip.addons.aws.tools.wait_for_cloudformation_stack_ready --stack-names
      'my-cdk-stack'
    - cdk deploy --require-approval 'never' --strict --toolkit-stack-name cdk-toolkit
      my-cdk-stack
rules
Input:
import gcip
from gcip.lib import rules
from tests import conftest

def test():
    job = gcip.Job(stage="print_date", script="date")
    job.append_rules(
        rules.on_merge_request_events().never(),
        rules.on_master(),
    )
    pipeline = gcip.Pipeline()
    pipeline.add_children(job)
    conftest.check(pipeline.render())
Output:
stages:
  - print_date
print-date:
  stage: print_date
  script:
    - date
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
      when: never
    - if: $CI_COMMIT_BRANCH == "master"
Do more with Python
Note: Please note regarding the current version of gcip: currently not all functionality of Gitlab CI is provided by gcip. The following section describes that you don't need all the functionality of Gitlab CI, as you can cover some of it in Python. But some functionality must be part of gcip, like configuring caching or artifacts, which isn't implemented yet.
Until here you have learned everything about the functionality of gcip. That is, to sum it up:
- Creating jobs.
- Organizing job hierarchies with sequences.
- Configuring jobs directly or at hierarchy level over sequences.
- Namespacing and parallelization.
- Predefined assets.
With these few functionalities of gcip and the capabilities of Python, there is nothing missing to create every pipeline you can imagine. Gitlab CI provides many more constructs you may miss here, but most of them are clunky workarounds caused by the limited logic capabilities of the domain-specific script "language" of Gitlab CI. You don't need them when you can design your pipelines in Python. Here are a few examples:
- You don't need templates (the extends keyword or YAML anchors), because you can reuse jobs and sequences, as shown in the sketch after this list.
- You don't need before_script, after_script or global configurations, because you can do configurations at an arbitrary level in the sequence hierarchy. All configurations will finally be populated down to the jobs.
- You don't have to keep struggling with rules at pipeline and job level. In gcip you can configure rules at an arbitrary level in the sequence hierarchy.
Furthermore you can leverage all the power of a programming language to dynamically design your pipelines. Here are some ideas (see the sketch after this list):
- Bundle jobs in sequences and use loops to populate the sequences over a list of environments.
- Use if-then-else expressions to create jobs within job sequences depending on environment information or requirements.
- Access information from outside your pipeline script and use it for decision making inside your pipeline script.
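A small sketch of those ideas, assuming a hypothetical deploy script and using Gitlab's predefined CI_COMMIT_BRANCH variable at pipeline generation time:

import os

import gcip

pipeline = gcip.Pipeline()
for env in ["development", "test", "production"]:
    # only generate the production job when the parent pipeline runs on the main branch
    if env == "production" and os.getenv("CI_COMMIT_BRANCH") != "main":
        continue
    pipeline.add_children(gcip.Job(name=env, stage="deploy", script=f"./deploy.sh {env}"))
pipeline.write_yaml()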
Beyond the basics
This chapter covers further abilities of GCIP which are best read after the basics.
TriggerJobs
Besides normal jobs, GCIP lets you define TriggerJobs, which either run another project's pipeline or a child pipeline.
Here is an example of triggering another project's pipeline:
Input:
from gcip import Pipeline, TriggerJob, TriggerStrategy
from tests import conftest

def test():
    pipeline = Pipeline()
    pipeline.add_children(
        TriggerJob(
            stage="trigger-banana",
            project="myteam/banana",
            branch="test",
            strategy=TriggerStrategy.DEPEND,
        )
    )
    conftest.check(pipeline.render())
Output:
stages:
  - trigger_banana
trigger-banana:
  trigger:
    project: myteam/banana
    branch: test
    strategy: depend
  stage: trigger_banana
Here is an example of triggering a child pipeline:
Input:
from gcip import (
    IncludeLocal,
    Pipeline,
    TriggerJob,
    TriggerStrategy,
)
from tests import conftest

def test():
    pipeline = Pipeline()
    pipeline.add_children(
        TriggerJob(
            stage="trigger-subpipe",
            includes=IncludeLocal("./my-subpipe.yml"),
            strategy=TriggerStrategy.DEPEND,
        )
    )
    conftest.check(pipeline.render())
Output:
stages:
  - trigger_subpipe
trigger-subpipe:
  trigger:
    include:
      - local: ./my-subpipe.yml
    strategy: depend
  stage: trigger_subpipe
PagesJob for Gitlab Pages
For creating Gitlab Pages you need:
- Content under the repository path ./public
- The special job gcip.PagesJob which deploys those artifacts to Gitlab Pages.
The first condition could be fulfilled either by having static content in the repository under the ./public path or by having one or more jobs generate that content under the artifacts path ./public.
The module gcip.addons.gitlab.jobs.pages contains predefined jobs that generate html content from different sources and store it under the artifacts path ./public. Here is an example of how to generate and deploy Gitlab Pages:
Input:
from gcip import PagesJob, Pipeline
from gcip.addons.gitlab.jobs.pages import AsciiDoctor
from tests import conftest

def test():
    pipeline = Pipeline()
    pipeline.add_children(
        AsciiDoctor(source="docs/index.adoc", out_file="/index.html"),
        PagesJob(),
    )
    conftest.check(pipeline.render())
Output:
stages:
  - build
  - pages
asciidoctor-pages-build:
  image:
    name: ruby:3-alpine
  stage: build
  script:
    - gem install asciidoctor
    - asciidoctor docs/index.adoc -o public/index.html
  artifacts:
    name: ci_job_name-my-awsome-feature-branch
    paths:
      - public
pages:
  image:
    name: busybox:latest
  stage: pages
  script:
    - echo 'Publishing Gitlab Pages'
  artifacts:
    name: ci_job_name-my-awsome-feature-branch
    paths:
      - public
Prefill variables in manual pipelines
One may ask how to use prefilled variables that can be configured for manually started pipelines, as described in the official documentation. As the gcip pipeline is started as a child pipeline, its pipeline code only becomes available once rendered by the parent pipeline, the parent pipeline being the .gitlab-ci.yml file. Thus it is not possible to define prefilled variables within the gcip pipeline itself, because the rendered pipeline script is not yet available when the Gitlab CI GUI evaluates the (parent) pipeline.
The way to go is to define the prefill variables in the parent pipeline and then pass them to the gcip child pipeline. Here is an example:
---
variables:
  MY_PREFILLED_VARIABLE:
    value: "This value is not good enough."
    description: "Please provide a better value."
generate-pipeline:
  stage: build
  image: thomass/gcip:latest
  script: /usr/src/app/docker/gcip.sh
  artifacts:
    paths:
      - generated-config.yml
run-pipeline:
  stage: deploy
  needs:
    - generate-pipeline
  trigger:
    include:
      - artifact: generated-config.yml
        job: generate-pipeline
    strategy: depend
  variables:
    GCIP_MY_PREFILLED_VARIABLE: $MY_PREFILLED_VARIABLE
This is the same parent pipeline from the chapter Configuring your project to use gcip but with prefilled variables.
Please note that the variable defined at pipeline level is passed to the gcip child pipeline under a different variable name. This is necessary because of a bug in Gitlab (issue 213729), where variables are not correctly passed to child pipelines. In this example we simply prepended 'GCIP_' to the variable passed to the child pipeline. You can access this variable either in the jobs generated by your gcip script or directly within your gcip script at pipeline generation time. The latter is just a matter of Python code and could look like the following:
# this is our Python gcip code
import os
...
MY_PREFILLED_VARIABLE = os.getenv('GCIP_MY_PREFILLED_VARIABLE')
...
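For example (a sketch with a made-up job), you could hand the value over to a generated job so that it is also available at job runtime:

# this is our Python gcip code
import os

import gcip

MY_PREFILLED_VARIABLE = os.getenv("GCIP_MY_PREFILLED_VARIABLE", "")

pipeline = gcip.Pipeline()
job = gcip.Job(stage="print_variable", script="echo $MY_PREFILLED_VARIABLE")
# pass the value read at generation time on to the rendered job
job.add_variables(MY_PREFILLED_VARIABLE=MY_PREFILLED_VARIABLE)
pipeline.add_children(job)
pipeline.write_yaml()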
String Job / Sequence modifications together
Every modification method of Job and Sequence returns the appropriate Job / Sequence object. Thus you can string multiple modification methods together. Here is an example for the job configuration.
Input:
from gcip import Job, Pipeline, Rule
from tests import conftest

def test():
    pipeline = Pipeline()
    # all modification methods return the job itself, so the calls can be chained
    job = (
        Job(stage="print_date", script="date")
        .set_image("docker/image:example")
        .prepend_scripts("./before-script.sh")
        .append_scripts("./after-script.sh")
        .add_variables(USER="Max Power", URL="https://example.com")
        .add_tags("test", "europe")
        .append_rules(Rule(if_statement="$MY_VARIABLE_IS_PRESENT"))
    )
    job.artifacts.add_paths("binaries/", ".config")
    pipeline.add_children(job)
    conftest.check(pipeline.render())
Output:
stages:
  - print_date
print-date:
  image:
    name: docker/image:example
  stage: print_date
  script:
    - ./before-script.sh
    - date
    - ./after-script.sh
  variables:
    USER: Max Power
    URL: https://example.com
  rules:
    - if: $MY_VARIABLE_IS_PRESENT
  artifacts:
    name: ci_job_name-my-awsome-feature-branch
    paths:
      - binaries
      - .config
  tags:
    - test
    - europe
The same works with sequences.
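A minimal sketch of the same chaining on a sequence (the scripts and tags are made up for illustration):

import gcip

pipeline = gcip.Pipeline()
# add_children() and the modification methods all return the sequence itself
sequence = (
    gcip.Sequence()
    .add_children(
        gcip.Job(stage="job1", script="script1.sh"),
        gcip.Job(stage="job2", script="script2.sh"),
    )
    .prepend_scripts("./before-script.sh")
    .add_tags("docker")
)
pipeline.add_children(sequence)
pipeline.write_yaml()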
Find (and modify) Jobs by their attributes
Sequence.find_jobs() gives you a powerful tool. But you should use it with care, because it can bring much confusion into your pipeline code. This is because it introduces a third way of modifying jobs. Until here you have learned how to create and modify jobs directly and how to indirectly modify job attributes by setting those attributes on sequences, which are both very structured approaches. Now you learn how to search for jobs depending on their current attributes and modify them. This is an unstructured approach, because you don't take care of exactly which jobs you modify. You only care about the state of some jobs before and after your modification.
Let's get into action, as this tool might be really helpful for you.
Imagine you have a really huge pipeline script which, for example, includes a lot of jobs from sequences you haven't created on your own. Now there are a couple of jobs that use the Docker image foo/bar:latest. However, for all jobs tagged with prd, you want to change the Docker image tag to stable.
Be aware that the following example is far simpler than the scenario described. But it reflects the scenario and is simple enough to understand the mechanics.
Input:
from gcip import Job, JobFilter, Pipeline
from tests import conftest

def test():
    dev_job = Job(stage="build-dev", script="do_something development")
    prd_job = Job(stage="build-prd", script="do_something production")
    dev_job.set_image("foo/bar:latest")
    prd_job.set_image("foo/bar:latest")
    dev_job.add_tags("dev")
    prd_job.add_tags("prd")
    pipeline = Pipeline()
    pipeline.add_children(dev_job, prd_job)
    # Imagine the upper pipeline is far more complex than what you see.
    # Then the interesting part starts here:
    filter = JobFilter(image="foo/bar:.*", tags="prd")
    for job in pipeline.find_jobs(filter):
        job.set_image("foo/bar:stable")
    conftest.check(pipeline.render())
Output:
stages:
  - build_dev
  - build_prd
build-dev:
  image:
    name: foo/bar:latest
  stage: build_dev
  script:
    - do_something development
  tags:
    - dev
build-prd:
  image:
    name: foo/bar:stable
  stage: build_prd
  script:
    - do_something production
  tags:
    - prd
To Sequence.find_jobs() you can pass a JobFilter to return all jobs that match the filter conditions. On the jobs returned you can change whatever the Job class allows. The JobFilter allows filtering for all attributes a Job typically has. For most of the job filter parameters you can pass regular expressions for pattern matching of job attributes. In the example above we were looking for foo/bar:.* images with any image tag.
The jobs returned by Sequence.find_jobs() must match all attributes of the JobFilter (logical conjunction / AND). You can pass multiple JobFilters to the Sequence.find_jobs() method; the jobs returned must then match at least one of those filters (logical disjunction / OR).
Be aware that the result of Sequence.find_jobs() depends on the current state of the sequence you are calling this method on. If you had searched for the prd_job before adding it to the pipeline, then Sequence.find_jobs() would have returned nothing. The method also only looks downward from the sequence you are calling it on. A rule of thumb is to apply modifications with Sequence.find_jobs() only at the very end of your code and only on the pipeline sequence. However, gcip implements this feature at the sequence level to allow special cases where you just want to search for jobs within a child sequence.
The Sequence.find_jobs() method has some traps you can step into. They are mainly related to attributes inherited from sequences. By default the Sequence.find_jobs() method only looks for attributes set on the jobs themselves and NOT for attributes the jobs would inherit from their sequences. Imagine the following yaml output of a gcip pipeline.
Output:
stages:
  - build_dev
  - build_prd
dev-build:
  image:
    name: foo/bar:latest
  stage: build_dev
  script:
    - do_something development
  tags:
    - dev
prd-build:
  image:
    name: foo/bar:latest
  stage: build_prd
  script:
    - do_something development
  tags:
    - prd
The output of this pipeline is the same as in the previous example but without job modifications. Now imagine you want to make the same modifications as before…
filter = JobFilter(image="foo/bar:.*", tags="prd")
for job in pipeline.find_jobs(filter):
    job.set_image("foo/bar:stable")
…but the output remains the same. This can happen if the attributes you are filtering for are not directly set on the job but inherited from its sequences:
Input:
from gcip import Job, JobFilter, Pipeline
from gcip.core.sequence import Sequence
from tests import conftest

def test():
    job = Job(stage="build", script="do_something development")
    job.set_image("foo/bar:latest")
    dev_sequence = Sequence().add_children(job, stage="dev")
    prd_sequence = Sequence().add_children(job, stage="prd")
    dev_sequence.add_tags("dev")
    prd_sequence.add_tags("prd")
    pipeline = Pipeline()
    pipeline.add_children(dev_sequence, prd_sequence)
    # The following filter returns no jobs, as the tags are attributes
    # of the sequences and `find_jobs()` is setup to not look for
    # inherited attributes.
    filter = JobFilter(image="foo/bar:.*", tags="prd")
    for job in pipeline.find_jobs(filter):
        job.set_image("foo/bar:stable")
    conftest.check(pipeline.render())
What you might want to do is add include_sequence_attributes=True to the Sequence.find_jobs() call:
Input:
from gcip import Job, JobFilter, Pipeline
from gcip.core.sequence import Sequence
from tests import conftest

def test():
    job = Job(stage="build", script="do_something development")
    job.set_image("foo/bar:latest")
    dev_sequence = Sequence().add_children(job, stage="dev")
    prd_sequence = Sequence().add_children(job, stage="prd")
    dev_sequence.add_tags("dev")
    prd_sequence.add_tags("prd")
    pipeline = Pipeline()
    pipeline.add_children(dev_sequence, prd_sequence)
    filter = JobFilter(image="foo/bar:.*", tags="prd")
    for job in pipeline.find_jobs(filter, include_sequence_attributes=True):
        job.set_image("foo/bar:stable")
    conftest.check(pipeline.render())
But beware! In the output yaml both jobs become modified and have the image foo/bar:stable set:
Output:
stages:
  - build_dev
  - build_prd
dev-build:
  image:
    name: foo/bar:stable
  stage: build_dev
  script:
    - do_something development
  tags:
    - dev
prd-build:
  image:
    name: foo/bar:stable
  stage: build_prd
  script:
    - do_something development
  tags:
    - prd
This is because Sequence.find_jobs() did find the job within the prd_sequence, but the same job was added to both sequences. Thus you modify just one job, but the change affects both rendered instances of this job in the output yaml file.
You should now know both working modes of the Sequence.find_jobs() method and their limitations or drawbacks:
- include_sequence_attributes=False (default) will just return jobs whose matching attributes are directly set on that job.
- include_sequence_attributes=True will return all jobs whose attributes match, including attributes inherited from sequences. If a job is included in multiple sequences and the matching attribute is inherited from just one sequence, changes on that job will affect all rendered instances of that job in all sequences.
Author
GCIP was created by Thomas Steinbach in 2020.
Thanks to initial contributions from Daniel von Eßen.
Licence
The content of this repository is licensed under the Apache 2.0 license.
Copyright DB Systel GmbH