This documentation describes good practices and conventions for creating your own library based on gcip.
It can be seen as a continuation of the user documentation and should be read after it.
Inheritance for custom Jobs and Sequences
The problem
When creating your custom Job subclass, you typically have two bad options for passing parameters to the superclass:
- Repeating the parent class's __init__ parameters and forwarding them to the parent's __init__ method.
- Collecting arbitrary **kwargs and unpacking them into the parent's __init__ method.
Both ways are shown in the following simplified example:
```python
class Job:
    def __init__(self, script: str, name: str, ...):
        ...

class BuildX(Job):
    def __init__(self, script: str, name: str, custom_param1: str, custom_param2: str):
        super().__init__(script=script, name=name, ...)

class DeployY(Job):
    def __init__(self, custom_param1: str, custom_param2: str, **kwargs):
        super().__init__(**kwargs)
```
The first option is bad because:

- your Job's __init__ method is bloated with all parameters from the parent class.
- you have to update the signature of your Job's __init__ method every time the parent class changes.
- you have to deal with naming conflicts between your Job's __init__ parameters and those of the parent class.
- the problems get worse if you inherit from multiple classes in a chain.
The second option is bad because:

- neither you nor your IDE knows which arguments are consumed by the parent class...
- ...which is error prone because of typos.
- the problems get worse if you inherit from multiple classes in a chain.
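The fragility of the **kwargs approach can be demonstrated with a small, self-contained sketch (the Job and DeployY classes below are simplified stand-ins, not the real gcip classes):

```python
class Job:  # simplified stand-in, not the real gcip Job
    def __init__(self, script: str, name: str):
        self.script = script
        self.name = name

class DeployY(Job):
    def __init__(self, custom_param1: str, **kwargs):
        self.custom_param1 = custom_param1
        super().__init__(**kwargs)

# A typo in a forwarded argument ('scirpt' instead of 'script') is not
# caught at the call site; it only surfaces inside the parent __init__:
try:
    DeployY(custom_param1="x", scirpt="echo hi", name="deploy")
except TypeError as exc:
    print(exc)
```

The mistyped argument is forwarded silently and fails far away from the call site, which makes such errors hard to trace.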
Another problem is passing __init__ parameters to Jobs within a Sequence:
```python
class CompleteTask(Sequence):
    def __init__(
        self,
        custom_build_param1: str,
        custom_build_param2: str,
        custom_deploy_param1: str,
        custom_deploy_param2: str,
    ):
        super().__init__()
        self.add_children(
            BuildX(custom_param1=custom_build_param1, custom_param2=custom_build_param2),
            DeployY(custom_param1=custom_deploy_param1, custom_param2=custom_deploy_param2),
        )
```
As this little example shows, you either collect all required job parameters and deal with naming conflicts, or you collect a dictionary of parameters for every job and unpack it into the jobs' __init__ methods. The problems remain the same as with subclassing Jobs.
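The dictionary variant mentioned above can be sketched as follows (the Sequence, BuildX and DeployY classes here are simplified stand-ins so the sketch is runnable in isolation). It avoids naming conflicts, but the expected keys are no longer visible in the signature:

```python
class Sequence:  # simplified stand-in for gcip's Sequence
    def __init__(self) -> None:
        self._children: list = []

    def add_children(self, *jobs: object) -> None:
        self._children.extend(jobs)

class BuildX:
    def __init__(self, custom_param1: str, custom_param2: str) -> None:
        self.params = (custom_param1, custom_param2)

class DeployY:
    def __init__(self, custom_param1: str, custom_param2: str) -> None:
        self.params = (custom_param1, custom_param2)

class CompleteTask(Sequence):
    # One kwargs dict per child job: no name clashes in __init__, but
    # callers cannot see which keys each job expects.
    def __init__(self, build_kwargs: dict, deploy_kwargs: dict) -> None:
        super().__init__()
        self.add_children(BuildX(**build_kwargs), DeployY(**deploy_kwargs))

task = CompleteTask(
    build_kwargs={"custom_param1": "a", "custom_param2": "b"},
    deploy_kwargs={"custom_param1": "c", "custom_param2": "d"},
)
```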
The solution
The solution to these problems is a set of good practice patterns you should follow when creating your own jobs and sequences by inheritance.
Jobs
The good practice for subclasses of Jobs is not to pass through all __init__ parameters to the parent job class, but to provide only newly introduced parameters. When users of your job want to modify the superclass job's fields, they can do so with the setters and getters of the job object.
The Job class provides setters for all fields but the name, stage and script fields. Those fields cannot and must not be changed after the Job object is created. The name and stage fields are the only fields you should pass through to the parent Job class.
The script field typically depends on the configuration of your subclass - and if not now, then maybe in the future. So the script should be created as late as possible. That means your subclass should override the render() method, create the final script there, apply it to self, and then call the render() method of the superclass.
```python
@dataclass(kw_only=True)
class MyJob(Job):
    """
    Documentation
    """

    parameter1: str
    parameter2: Optional[str] = None
    jobName: InitVar[str] = "myjob"
    jobStage: InitVar[str] = "build"

    def __post_init__(self, jobName: str, jobStage: str) -> None:
        super().__init__(script="", name=jobName, stage=jobStage)
        self.set_image("custom/image")
        self.add_variables(SPECIAL_ENV="some_value")

    def render(self) -> Dict[str, Any]:
        if self.parameter2:
            self.parameter2 = self.parameter2.strip()
        self._scripts = [f"do-something --with {self.parameter1} --and {self.parameter2}"]
        return super().render()
```
The example above shows the following:

- For simplicity we advise using @dataclass for your Job class. Users can then change the parameters of your job without you having to write explicit setters and getters.
- The jobName and jobStage are passed through to the parent Job class. They are only available during initialization.
- All modifications to the initialized job object can be done within the __post_init__ method.
- All parameters that contribute to the final script of your Job, as well as the script itself, are modified and created in the render() method - and never in the __post_init__ method.
Have a look at a real world example, the docker.Build job.
A note on subclassing these dataclasses themselves: if you want to create a subclass of your custom job, you may want to hide parameters from the superclass and determine their values in the subclass. Here is how you can do it:
```python
@dataclass(kw_only=True)
class A:
    a: str
    b: str

@dataclass(kw_only=True)
class B(A):
    a: str = field(init=False)  # <-- this field is hidden in objects of B
    c: str

    def __post_init__(self) -> None:
        self.a = f"extended-{self.c}"  # <-- the value of 'a' is calculated within B

test = B(b="foo", c="bar")
test.a  # "extended-bar"
test.b  # "foo"
test.c  # "bar"
```
Have a look at a real world example, the codecommit.MirrorToCodecommit job.
Sequences
The approach when subclassing sequences differs from subclassing jobs.
This is because you cannot overwrite the render()
method of the jobs within your sequence.
Thus jobs within your sequence cannot be modified by the sequence after initialization
(despite from the modification methods the Sequence class itself).
That means your sequences should only provide configuration parameters in the __init__ method and should
not expose fields that can be modified after initialization.
To overcome the problem of too many __init__ parameters when having multiple subclasses in a chain, we suggest the following practice: besides the __init__ parameters your sequence has, provide the same parameters within a dataclass:
```python
@dataclass(kw_only=True)
class MySequenceOpts:
    param1: str
    param2: str

class MySequence(Sequence):
    def __init__(
        self,
        param1: str,
        param2: str,
    ):
        super().__init__()
        self.jobx = MyJob(a=param1, b="hello")
        self.joby = MyJob(a=param2, b="world")
```
When you or someone else then subclasses your sequence, the 'opts' dataclass can be used to bundle and pass through all superclass parameters as follows:
```python
class UberSequence(MySequence):
    def __init__(
        self,
        paramA: str,
        paramB: str,
        my_sequence_opts: MySequenceOpts,
    ):
        super().__init__(**my_sequence_opts.__dict__)
        ...
```
The same applies when composing sequences:
```python
class SuperSequence(Sequence):
    def __init__(
        self,
        paramA: str,
        my_sequence_opts: MySequenceOpts,
    ):
        super().__init__()
        self.my_sequence = MySequence(**my_sequence_opts.__dict__)
```