Basic pipeline building blocks.
This module provides the basic building blocks of a JIP pipeline and a way to search for and find them at run-time. The basic building blocks are instances of Tool. The JIP library comes with two sub-classes that can be used to create tool implementations: one that is initialized with a docstring and operates on script blocks, and one that is initialized with a decorated class.
In addition to the Tool implementations, this module provides the Scanner class, which is used to find tool implementations either from disk or from an arbitrary Python module. This class is supposed to be used as a singleton, and a configured instance is available in the main jip module, exposed as jip.scanner. The Scanner class itself is configured either through jip.configuration or through environment variables. The Scanner documentation covers both the environment variables and the configuration properties that can be used.
Decorate functions and classes and convert them to tools.
The @jip.tool decorator turns classes and functions into valid JIP tools. The simplest way to use this decorator is to annotate a Python function that returns a string. This string is then interpreted as a JIP script template. The function's docstring is used, as in JIP scripts, to parse command line options and the tool's input and output parameters. For example:
from jip import tool

@tool()
def mytool():
    '''
    Send a greeting

    usage:
        mytool <name>
    '''
    return 'echo "hello ${name}"'
This creates a single bash-interpreted script and exposes a tool, mytool, to the JIP environment. You can use the decorator's arguments to further customize the tool specification, e.g. to specify a different name. If you want to use a different interpreter, return a tuple where the first element is the interpreter name and the second is the script template.
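The return convention described above (a plain string is a bash template, a tuple is an interpreter name followed by a template) can be illustrated without JIP installed. This is only a minimal sketch: `normalize_script` is a hypothetical helper introduced here, not part of the JIP API, and the decorator's real handling may differ.

```python
def normalize_script(result, default_interpreter="bash"):
    """Return (interpreter, template) for a tool function's return value.

    Hypothetical helper that mirrors the documented convention; it is not
    part of JIP itself.
    """
    if isinstance(result, tuple):
        interpreter, template = result
        return interpreter, template
    return default_interpreter, result


def greet():
    # A plain string is treated as a bash template.
    return 'echo "hello ${name}"'


def greet_perl():
    # A tuple selects a different interpreter explicitly.
    return "perl", 'print "hello ${name}\\n";'
```

Both functions would be decorated in real code; they are left undecorated here so that the sketch runs on its own.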
This is a decorator that can be used to mark single Python functions as tools. The function will be wrapped in a PythonTool instance, and the function must accept a single parameter, self, to access the tool's options.
This is a decorator that can be used to mark single Python functions as pipelines.
The base class for all implementations of executable units.
This class provides all the building blocks to integrate new tool implementations that can be executed, submitted, and integrated in pipelines to construct more complex setups.
A Tool in a JIP setup is considered to be a container for the execution's metadata, i.e. the options and files that are needed for the actual run. The main function of the Tool class is its get_command() function, which returns a tuple (interpreter, command), where the interpreter is a string like “bash” or “perl”, or even a path to an interpreter executable that will be used to execute the command. The command itself is the string representation of the content of a script that will be passed to the interpreter at execution time. Please note that the command part returned by get_command() is supposed to be fully rendered; it will not be modified any further. The JIP default tool classes that are used, for example, to provide scripts to the system are already integrated with the jip.templates system, but you can easily use the rendering function directly to create more dynamic commands that adapt easily to changes in the configuration of a tool.
The class exposes a name and a path to a source file as properties. Both are optional and can be omitted in order to implement anonymous tools. In addition to this metadata, the tool's __init__() function allows you to provide an options_source. This object is used to create the jip.options.Options that cover the runtime configuration of a tool. The options are initialized lazily on first access using the options_source provided at initialization time. This object can be either a string or an instance of argparse.ArgumentParser. Both styles of providing tool options are described in the jip.options module.
Initialize a tool instance. If no options_source is given, the class docstring is used as the options source.
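The lazy initialization of options and the docstring fallback can be sketched as follows. `LazyOptionsTool` and its `_build` helper are hypothetical stand-ins written for this illustration; the real class creates a jip.options.Options instance instead of the plain dictionary used here.

```python
import argparse


class LazyOptionsTool:
    """Greet someone.

    usage:
        greet <name>
    """

    def __init__(self, options_source=None):
        # Fall back to the class docstring when no source is given.
        if options_source is None:
            options_source = self.__doc__
        self.options_source = options_source
        self._options = None
        self.build_count = 0

    @property
    def options(self):
        # Build the options once, on first access.
        if self._options is None:
            self.build_count += 1
            self._options = self._build(self.options_source)
        return self._options

    @staticmethod
    def _build(source):
        # Hypothetical stand-in for creating jip.options.Options.
        if isinstance(source, argparse.ArgumentParser):
            return {"kind": "argparse", "source": source}
        return {"kind": "docstring", "source": source}


tool = LazyOptionsTool()
```

Nothing is parsed when `tool` is created; the first read of `tool.options` triggers the build, and later reads reuse the same object.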
Delegates to the check name function of the options.
Parameters: option_name – the name of the option
The cleanup method removes all output files for this tool.
Clones this instance of the tool and returns the clone. If the optional counter is provided, the name of the cloned tool will be updated using .counter as a suffix.
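As a rough illustration of the naming rule, the sketch below clones a small object and appends the counter to the clone's name. `NamedTool` is a hypothetical stand-in, and the use of a shallow copy is an assumption made for brevity; it does not show how JIP copies options.

```python
import copy


class NamedTool:
    """Hypothetical stand-in for a Tool with a name (not a JIP class)."""

    def __init__(self, name):
        self.name = name

    def clone(self, counter=None):
        cloned = copy.copy(self)
        if counter is not None:
            # Append ".<counter>" to the clone's name only.
            cloned.name = "%s.%d" % (self.name, counter)
        return cloned


original = NamedTool("align")
second = original.clone(counter=2)
```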
Check a given option value using the check pattern or function and raise a ValidationError if the pattern does not match or the function returns False.
For list values, note that if check is a pattern, all values are checked independently. If check is a function, the list is passed on as is when the option takes list values; otherwise, the check function is called for each value independently.
Note also that you should not use this function to check for file existence. Use the check_file() function on the option or on the tool instead. check_file checks for incoming dependencies in pipelines, in which case the file does not exist _yet_ but it will be created by a parent job.
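The rules for patterns, functions, and list values can be summarized in a few lines. The following is a sketch under stated assumptions: `check_values` and its `takes_list` flag are hypothetical names, and the local `ValidationError` only stands in for the JIP exception.

```python
import re


class ValidationError(Exception):
    """Stand-in for JIP's ValidationError, defined here for the sketch."""


def check_values(values, check, takes_list=False):
    """Validate option values against a regex pattern or a function."""
    if isinstance(check, str):
        # Pattern: every value is checked independently.
        ok = all(re.match(check, str(v)) for v in values)
    elif takes_list:
        # Function on a list option: the list is passed on as is.
        ok = check(values) is not False
    else:
        # Function on a single-value option: called once per value.
        ok = all(check(v) is not False for v in values)
    if not ok:
        raise ValidationError("check failed for %r" % (values,))
```

For example, `check_values(["1", "22"], r"^\d+$")` passes silently, while a list containing a non-numeric string raises the error.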
Return a tuple of (interpreter, template), where the interpreter is the name of the interpreter that will be used to run the filled template and the template is a string that will be rendered.
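A subclass that builds its command directly might look roughly like the sketch below, which follows the (interpreter, command) order given in the class description. `EchoTool` is a hypothetical class, and Python's `string.Template` is used here only as a stand-in for the jip.templates rendering function.

```python
from string import Template


class EchoTool:
    """Hypothetical stand-in for a Tool subclass (not a JIP class)."""

    def __init__(self, name):
        self.name = name

    def get_command(self):
        # Render the template here; the caller will not modify it further.
        script = Template('echo "hello ${name}"').substitute(name=self.name)
        return "bash", script


interpreter, command = EchoTool("Joe").get_command()
```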
Yields a list of all input files for the options of this tool. Only TYPE_INPUT options are considered whose values are strings. If a source for the option is not None, it has to be equal to this tool.
Returns: list of file names
Yields a list of all output files for the options of this tool. Only TYPE_OUTPUT options are considered whose values are strings. If a source for the option is not None, it has to be equal to this tool.
If sticky is set to False, all options marked with the sticky flag are ignored.
Parameters: sticky (boolean) – by default all output option values are returned; if this is set to False, only non-sticky output options are yielded
Returns: list of file names
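The filtering rules for output files (option type, string values, source, and the sticky flag) can be sketched with a simplified option record. The `Option` tuple and the constant values below are assumptions made for this example; JIP's own option class is richer.

```python
from collections import namedtuple

# Hypothetical, simplified option record; JIP's real Option class is richer.
Option = namedtuple("Option", "name option_type values sticky source")

TYPE_INPUT, TYPE_OUTPUT = "input", "output"


def get_output_files(options, tool, sticky=True):
    """Yield string values of output options that belong to `tool`."""
    for opt in options:
        if opt.option_type != TYPE_OUTPUT:
            continue
        if opt.source is not None and opt.source != tool:
            continue
        if opt.sticky and not sticky:
            continue
        for value in opt.values:
            if isinstance(value, str):
                yield value


tool = object()
options = [
    Option("input", TYPE_INPUT, ["reads.fq"], False, tool),
    Option("output", TYPE_OUTPUT, ["out.bam"], False, tool),
    Option("index", TYPE_OUTPUT, ["out.bai"], True, None),
]
```

Calling `get_output_files(options, tool, sticky=False)` skips the sticky `index` option and yields only the regular output file.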
Return help for this tool. By default this delegates to the options help.
Initialization method that can be implemented to initialize the tool instance and, for example, add options. init is called once for the tool instance and the logic within the init is not allowed to rely on any values set or applied to the tool.
Raises Exception: in case of a critical error
The default implementation returns True if the tool has output files and all output files exist.
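The default rule is easy to express as a standalone function. This sketch takes the list of output files as an argument, which is an assumption for illustration; the real method reads them from the tool itself.

```python
import os
import tempfile


def is_done(output_files):
    """Return True if there is at least one output file and all exist."""
    files = list(output_files)
    return len(files) > 0 and all(os.path.exists(f) for f in files)


with tempfile.TemporaryDirectory() as tmp:
    existing = os.path.join(tmp, "result.txt")
    open(existing, "w").close()
    missing = os.path.join(tmp, "missing.txt")
    done_states = (
        is_done([]),
        is_done([existing]),
        is_done([existing, missing]),
    )
```

Note that a tool without any output files is never considered done under this rule.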
Parses the given arguments. An exception is raised if an error occurs during argument parsing.
Parameters: args (list of strings) – the argument list
Create and return the pipeline that will run this tool.
Setup method that can be implemented to manipulate tool options before rendering and validation. Note that options here might still contain template strings. You are also allowed to set option values to template strings.
Raises Exception: in case of a critical error
The default implementation validates all options that belong to this tool and checks that all options that are of TYPE_INPUT reference existing files.
The method raises a ValidationError in case an option could not be validated or an input file does not exist.
Quickly raise a validation error with a custom message.
This function simply raises a ValidationError. You can use it in a custom validation implementation to quickly fail the validation.
Raises ValidationError: always
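A custom validation that fails fast could be structured like the sketch below. Everything in it is hypothetical: `ThreadedTool` is not a JIP class, the local `ValidationError` stands in for the JIP exception, and the `message, *args` signature is an assumption because the parameter list is not shown here.

```python
class ValidationError(Exception):
    """Stand-in for JIP's ValidationError, defined here for the sketch."""


class ThreadedTool:
    """Hypothetical tool with one option (not a JIP class)."""

    def __init__(self, threads):
        self.threads = threads

    def validation_error(self, message, *args):
        # Always raises; formats the message with the given arguments.
        raise ValidationError(message % args)

    def validate(self):
        if self.threads < 1:
            self.validation_error("threads must be >= 1, got %d", self.threads)
```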
Returns a dictionary that maps the option names to the option values.
Access this tool's jip.options.Options instance.
The tool's options are the main way to interact with and configure a tool instance, either from outside or from within a pipeline.
Path to the tool's source file.
An extension of the tool class that is initialized with a docstring and operates on Blocks that can be loaded from a script file or from a string.
If specified as initializer parameters, both the validation and the pipeline block will be handled with special care. Pipeline blocks currently can only be embedded Python blocks. Therefore the interpreter has to be ‘python’. Validation blocks where the interpreter is ‘python’ will be converted to embedded Python blocks. This allows the validation process to modify the tool and its arguments during validation.
An extension of the tool class that is initialized with a decorated class to simplify the process of implementing Tools in Python.
Extends block and runs the content as embedded Python.
Execute this block as an embedded Python script.
The terminate function on a Python block does nothing. A Python block cannot be terminated directly.
Utility functions that are exposed in template blocks and template functions
The block utilities store a reference to the local and global environment, to the current tool and to the current pipeline.
Create a bash job that executes a bash command.
This is a fast way to build pipelines that execute shell commands. The function wraps the given command string in the bash tool that is defined with input, output, and outfile. Input and output default to stdin and stdout. Note that you can access your local context within the command string. Take for example the following pipeline script:
name = "Joe"
bash("echo 'Hello ${name}'")
This will work as expected: the command template can access local variables. Please keep in mind that the tool's context takes precedence over the script context. For example:
input="myfile.txt"
bash("wc -l ${input}")
In this example, the command wc -l will be rendered and wait for input on stdin. The bash command has an input option, and that takes precedence over the globally defined input variable. This is true for input, output, and outfile, even if they are not explicitly set. You can, however, access variables defined in the global context using _ctx:
input="myfile.txt"
bash("wc -l ${_ctx.input}")
This will indeed render and execute wc -l myfile.txt.
Returns: a new pipeline node that represents the bash job
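The precedence rules from the three examples above can be reproduced with a small lookup function. This is a sketch, not JIP's template engine: `render` is a hypothetical helper, and an empty string is assumed to represent an unset input or output (stdin or stdout).

```python
import re


def render(template, tool_options, script_context):
    """Resolve ${name} from tool options first, then the script context.

    ${_ctx.name} always reads from the script context. Hypothetical
    stand-in for JIP's template rendering.
    """
    def lookup(match):
        key = match.group(1)
        if key.startswith("_ctx."):
            return str(script_context.get(key[len("_ctx."):], ""))
        if key in tool_options:
            return str(tool_options[key])
        return str(script_context.get(key, ""))

    return re.sub(r"\$\{([\w.]+)\}", lookup, template)


# The bash tool always defines input, output, and outfile; an empty string
# stands for "not set" (stdin/stdout) in this sketch.
bash_options = {"input": "", "output": "", "outfile": ""}
script = {"name": "Joe", "input": "myfile.txt"}

greeting = render("echo 'Hello ${name}'", bash_options, script)
shadowed = render("wc -l ${input}", bash_options, script)
explicit = render("wc -l ${_ctx.input}", bash_options, script)
```

Here `shadowed` ends up without a file name because the tool's own `input` option wins, while `explicit` picks up `myfile.txt` from the script context.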
Checks for the existence of a file referenced by an option.
Please note that this does not take a file name, but the name of an option. This function is preferred over a simple check using os.path.exists() because it also checks for job dependencies. This is important because a mandatory file might not yet exist within the context of a pipeline, but it will be created at runtime in a previous step.
Parameters: name – the option's name
Returns: True if the file exists or the file is created by another job that will run before this option's job is executed.
Return type: boolean
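The difference to a plain os.path.exists() call can be shown in a reduced form. Unlike the real function, which takes an option name, this sketch takes the resolved path directly; `produced_by_upstream` is a hypothetical collection of file names that parent jobs declare as outputs.

```python
import os


def check_file(path, produced_by_upstream=()):
    """Return True if `path` exists or an earlier job will create it.

    `produced_by_upstream` is a hypothetical collection of file names that
    parent jobs declare as outputs; JIP derives this from the pipeline.
    """
    return os.path.exists(path) or path in produced_by_upstream


upstream_outputs = {"aligned.bam"}
```

With these definitions, `check_file("aligned.bam", upstream_outputs)` succeeds even though the file has not been written yet.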
Creates and returns a new Job.
The job instance can be used to customize the execution environment for the next job. For example:
job("Test", threads=2).run('mytool', ...)
This is a typical usage in a pipeline context, where a new job environment is created and then applied to a new ‘mytool’ pipeline node.
Returns: a new job instance
Set the runtime name of a pipeline. The runtime name of the pipeline is stored in the database and is used as a general identifier for a pipeline run.
Note that this sets the name of the pipeline if used in a pipeline context; otherwise it sets the name of the tool/job. Within a pipeline context, the name of a job can be changed using job():
job("my job").run(...)
or after the node was created:
myrun = run(...)
myrun.job.name = "my job"
Parameters: name (string) – the name of the pipeline
Searches for a tool with the specified name and adds it as a new Node to the current pipeline. All specified keyword arguments are passed as option values to the tool.
Delegates to the pipeline's jip.pipelines.Pipeline.run() method.
Returns: a new node that executes the specified tool and is added to the current pipeline
Set an option's value.
Quickly raise a validation error with a custom message.
This function simply raises a ValidationError. You can use it in a custom validation implementation to quickly fail the validation.
Raises ValidationError: always
This class holds a script/tool cache. The cache is organized into dicts: the script_cache stores name->instance pairs that point from the name of a tool to its cached instance. The find implementations return clones of the instances in the cache.
Add a folder to the list of folders that are scanned for tools.
Param: path to the folder that will be added to the search path
Add a module or a Python file to the list of modules that are scanned for tools.
Param: path to the module that will be added to the search path
Finds a tool by its name or file name.
If the given name points to an existing file, the file is loaded as a script tool and returned. Otherwise, a default search is triggered, optionally including the specified path.
Returns: a new instance of the tool
Return type: Tool
Raises ToolNotFoundException: if the tool could not be found
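The lookup order and the clone-on-return behavior can be sketched with a reduced scanner. `MiniScanner` and its `register` method are hypothetical, tools are represented as plain dictionaries, and the local `ToolNotFoundException` only stands in for the JIP exception of the same name.

```python
import copy
import os


class ToolNotFoundException(Exception):
    """Stand-in for JIP's exception of the same name."""


class MiniScanner:
    """Hypothetical, reduced scanner (not the JIP Scanner class)."""

    def __init__(self):
        self.cache = {}  # tool name -> cached instance

    def register(self, name, tool):
        self.cache[name] = tool

    def find(self, name):
        if os.path.isfile(name):
            # A real scanner would load the file as a script tool here.
            return {"name": name, "source": "file"}
        if name not in self.cache:
            raise ToolNotFoundException("No tool named '%s' found" % name)
        # Hand out a clone so callers never mutate the cached instance.
        return copy.deepcopy(self.cache[name])


scanner = MiniScanner()
scanner.register("mytool", {"name": "mytool", "source": "module"})
found = scanner.find("mytool")
found["name"] = "changed"
```

Because `find` returns a copy, changing `found` leaves the cached entry untouched, so a later lookup still sees the original name.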
Searches for scripts and Python modules in the configured locations and returns a dictionary of the detected instances.
Parameters: path – optional path value to define a folder to scan
Returns: dict of tools
Scan files for JIP tools. This function detects files with the .jip extension in the default search locations.
Parameters: parent – optional parent folder
Returns: list of found files
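Detecting files by their .jip extension could be done along the lines of the sketch below. It is an illustration only: the standalone `scan_files` function is hypothetical, the default search locations are not modeled, and whether JIP descends into subfolders is an assumption.

```python
import os
import tempfile


def scan_files(parent):
    """Return the sorted paths of all *.jip files below `parent`."""
    found = []
    for root, _dirs, files in os.walk(parent):
        for filename in files:
            if filename.endswith(".jip"):
                found.append(os.path.join(root, filename))
    return sorted(found)


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for relative in ("a.jip", "notes.txt", os.path.join("sub", "b.jip")):
        open(os.path.join(tmp, relative), "w").close()
    names = [os.path.relpath(p, tmp) for p in scan_files(tmp)]
```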
Loads the python modules specified in the JIP configuration. This will register any functions and classes decorated with one of the JIP decorators.