Modify the execution environment

All executable units in the JIP system are attached to a specific Profile that can be modified from within the tool implementation as well as at submission or execution time.

The Profile

The jip.profiles module documentation covers the properties that you can set and modify. Here, we will go over some of them to explain how and in which order a certain profile is applied to a job at execution time.

The Job name

Job profiles of a tool can be accessed from within the tools setup or validate block. This is important because the properties of the profile are applied as submission time properties. That means they have to be set when the tool is submitted to your compute cluster.

You can set the job name using either the name() function or the job attribute. For example:

>>> from jip import *
>>> @tool()
... class MyTool(object):
...     def setup(self):
...         self.name("Custom-Name")
...         # this is equivalent
...         self.job.name = "Custom-Name"
...     def get_command(self):
...         return "true"
>>> p = Pipeline()
>>> node = p.run('MyTool')
>>> jobs = create_jobs(p)
>>> assert jobs[0].name == 'Custom-Name'

Your tool, and therefor the job that is be created from the tool will have a given name, in this case “Custom-Name”.

The same technique works for pipelines, but they will alter the pipeline name not the names of the embedded jobs within the pipeline:

>>> from jip import *
>>> @pipeline()
... class MyPipeline(object):
...     def setup(self):
...         self.name("The-Pipeline")
...
...     def pipeline(self):
...         p = Pipeline()
...         p.run("MyTool")
...         return p
...
>>> p = Pipeline()
>>> node = p.run('MyPipeline')
>>> jobs = create_jobs(p)
>>> assert jobs[0].pipeline == 'The-Pipeline'
>>> assert jobs[0].name == 'Custom-Name'

If you run “MyPipeline”, the pipeline name will be set to “The-Pipeline”, but the jobs’ name will still be “Custom-Name”. You can however change the job names when you construct the pipeline. For this, you can create a custom profile that will be applied to the job. For example:

>>> @pipeline()
... class MyPipeline(object):
...     def setup(self):
...         self.name("The-Pipeline")
...
...     def pipeline(self):
...         p = Pipeline()
...         p.job("The-Tool").run("MyTool")
...         return p
...
>>> p = Pipeline()
>>> node = p.run('MyPipeline')
>>> jobs = create_jobs(p)
>>> assert jobs[0].pipeline == 'The-Pipeline'
>>> assert jobs[0].name == 'The-Tool'

Here, we used p.job("The-Tool") to create a custom environment with a specified name. Then we run a tool in that environment.

Please keep in mind that multiplexing alters jobs names. If a pipeline contains two jobs with the same name, their names will be suffixed with a counter starting at ‘0’ and applied in the order the nodes where added to the pipeline. For example:

>>> @tool()
... class MyTool(object):
...     """My tool
...     usage:
...         mytool <input>
...     """
...     def setup(self):
...         self.name("MyName")
...
...     def get_command(self):
...         return "true"
...
>>> @pipeline()
... class MyPipeline(object):
...     def setup(self):
...         self.name("The-Pipeline")
...
...     def pipeline(self):
...         p = Pipeline()
...         p.run("MyTool", input=['A', 'B'])
...         return p
>>> p = Pipeline()
>>> node = p.run('MyPipeline')
>>> jobs = create_jobs(p, validate=False)
>>> assert jobs[0].pipeline == 'The-Pipeline'
>>> assert jobs[0].name == 'MyName.0'
>>> assert jobs[1].pipeline == 'The-Pipeline'
>>> assert jobs[1].name == 'MyName.1'

In this example, we call MyTool with two input values A and B. That causes the multiplexing to kick in and results in two jobs are created: MyTool.0 and MyTool.1.

With multiplexing in place it can be useful not to apply fixed names or other environment properties to your nodes, but use templates to customize, for example, your job names according to the tools options. Take the example from above. We can use the jobs input option to construct more meaningful job names:

>>> @tool()
... class MyTool(object):
...     """My tool
...     usage:
...         mytool <input>
...     """
...     def setup(self):
...         self.name("${input|name}")
...
...     def get_command(self):
...         return "true"

>>> @pipeline()
... class MyPipeline(object):
...     def setup(self):
...         self.name("The-Pipeline")
...
...     def pipeline(self):
...         p = Pipeline()
...         p.run("MyTool", input=['A', 'B'])
...         return p
>>> p = Pipeline()
>>> node = p.run('MyPipeline')
>>> jobs = create_jobs(p, validate=False)
>>> assert jobs[0].pipeline == 'The-Pipeline'
>>> assert jobs[0].name == 'A'
>>> assert jobs[1].pipeline == 'The-Pipeline'
>>> assert jobs[1].name == 'B'

In this example, the created jobs will have the names ‘A’ and ‘B’.

Note

In order to assign the node names, we make use of the name template filter. We do so because JIP operates on absolute paths of input and output files and we only want to use the base name if the input file here.

Custom profiles in pipelines

We have seen before that we can use the job() function to create a custom profile and then run a job with that profile. In fact, you can use this to create a set of profiles in your pipeline and then run different jobs with different profiles. For example, assume that you have a few tools that need more CPU’s and, in your environment, they have to be submitted to a specific queue. Other jobs should just run with a default profile. You could do something like this in a JIP script (you can call the same functions on a pipeline directly).

slow = job(threads=8, queue="slow_queue", time="8h")
fast = job(threads=1, queue="fast_queue", time="1h")

europe = slow.run('predict_weather', location='Europe')
america = slow.run('predict_weather', location='America')

stats = fast.run('weather_stats', predictions=[europe, america])

In this example, we create two job profiles, one for slow, multi-threaded jobs, and one for fast jobs. We can then run tools, here predict_weather and weather_stats using these dedicated profiles. The profiles themselves are again callable. That means you can further customize them. Say you want to assign names to the jobs and set a higher priority to one of them:

europe = slow("EU", priority=20).run('predict_weather', location='Europe')
america = slow("USA", priority=10).run('predict_weather', location='America')
Fork me on GitHub