jip.tools

Basic pipeline building blocks.

This module provides the basic building blocks of a JIP pipeline and a way to search for and find them at run-time. The basic building blocks are instances of Tool. The JIP library comes with two sub-classes that can be used to create tool implementations:

ScriptTool
This sub-class of Tool integrates file- or script-based tool implementations, which can be served from stand-alone script files.
PythonTool
In contrast to the script tool, this Tool extension allows you to create Tool instances from other, possibly unrelated, python classes. The easiest way to use it is the jip.tools.tool decorator, which allows you to take arbitrary python classes and make them jip tools.

In addition to the Tool implementations, this module provides the Scanner class, which is used to find tool implementations either from disk or from an arbitrary python module. This class is supposed to be used as a singleton, and a configured instance is available in the main jip module, exposed as jip.scanner. The scanner itself is configured either through jip.configuration or through environment variables. The Scanner documentation covers both the environment variables and the configuration properties that can be used.

Decorators

class jip.tools.tool(name=None, inputs=None, outputs=None, argparse='register', get_command='get_command', validate='validate', setup='setup', init='init', run='run', pipeline='pipeline', is_done='is_done', cleanup='cleanup', help='help', add_outputs=None, check_files=None, ensure=None, pytool=False, force_pipeline=False)

Decorate functions and classes and convert them to tools.

The @jip.tool decorator turns classes and functions into valid JIP tools. The simplest way to use this decorator is to annotate a python function that returns a string. This string is then interpreted as a JIP script template. The function's docstring is used, similar to JIP scripts, to parse command line options and tool input and output parameters. For example:

@tool()
def mytool():
    '''
    Send a greeting

    usage:
        mytool <name>
    '''
    return 'echo "hello ${name}"'

This creates a single bash-interpreted script and exposes a tool, mytool, to the JIP environment. You can use the decorator's arguments to further customize the tool specification, e.g. to specify a different name. If you want to use a different interpreter, you can return a tuple where the first element is the interpreter name and the second is the script template.
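The return convention described above can be illustrated without JIP itself. The helper below is hypothetical (normalize_command is not part of jip.tools) and uses only plain Python; it sketches how a bare string could be treated as a bash template while a tuple names its interpreter explicitly.

```python
def normalize_command(result, default_interpreter="bash"):
    """Return an (interpreter, template) pair for a decorated function's result.

    Hypothetical helper, not part of jip.tools: a bare string is treated as
    a template for the default interpreter, a 2-tuple names its interpreter.
    """
    if isinstance(result, str):
        return default_interpreter, result
    interpreter, template = result
    return interpreter, template


# A plain string is paired with the default interpreter.
plain = normalize_command('echo "hello ${name}"')
# A tuple selects a different interpreter.
explicit = normalize_command(("perl", 'print "hello ${name}";'))
print(plain)
print(explicit)
```

The template strings are left untouched here; rendering ${name} is a separate step.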

Parameters:
  • name – specify a tool name. If no name is specified, the name of the decorated function or class is used as the tool name
  • inputs – specify a list of option names that are treated as input options
  • outputs – specify a list of option names that are treated as output options
  • argparse – specify the name of the function or a function reference that takes an ArgumentParser instance and populates it. This takes precedence over the docstring if the function exists.
  • get_command – name of the function or a function reference that implements the tool's get_command function
  • validate – name of the function or a function reference that implements the tool's validate function
  • setup – name of the function or a function reference that implements the tool's setup function
  • init – name of the function or a function reference that implements the tool's init function
  • run – name of the function or a function reference that implements the tool's run function
  • pipeline – name of the function or a function reference that implements the tool's pipeline function
  • is_done – name of the function or a function reference that implements the tool's is_done function
  • cleanup – name of the function or a function reference that implements the tool's cleanup function
  • help – name of the function or a function reference that implements the tool's help function
  • add_outputs – takes a list of values to add hidden output options
  • check_files – takes a list of option names that will be passed through file checks on validation
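Many of the parameters above accept either a function name or a function reference. A minimal sketch of how such a value could be resolved follows; resolve_hook and the Greeter class are hypothetical names used only for illustration and do not describe the decorator's actual internals.

```python
def resolve_hook(target, hook):
    """Resolve a hook given either as an attribute name or as a callable.

    Hypothetical helper: returns None when a named attribute does not exist,
    so that a caller could fall back to a default implementation.
    """
    if callable(hook):
        return hook
    return getattr(target, hook, None)


class Greeter(object):
    def validate(self):
        return "validated"


def custom_setup():
    return "custom setup"


greeter = Greeter()
by_name = resolve_hook(greeter, "validate")        # bound method found by name
by_reference = resolve_hook(greeter, custom_setup)  # reference used as is
missing = resolve_hook(greeter, "cleanup")          # unknown name gives None
print(by_name(), by_reference(), missing)
```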
class jip.tools.pytool(*args, **kwargs)

This is a decorator that can be used to mark single python functions as tools. The function will be wrapped in a PythonTool instance and must accept a single parameter, self, to access the tool's options.

class jip.tools.pipeline(*args, **kwargs)

This is a decorator that can be used to mark single python functions as pipelines.

Tool classes

class jip.tools.Tool(options_source=None, name=None)

The base class for all implementations of executable units.

This class provides all the building blocks to integrate new tool implementations that can be executed, submitted, and integrated in pipelines to construct more complex setups.

A Tool in a JIP setup is considered to be a container for the execution's meta-data, i.e. the options and files that are needed for the actual run. The main function of the Tool class is its get_command() function, which returns a tuple (interpreter, command), where the interpreter is a string like “bash” or “perl”, or even a path to some interpreter executable, that will be used to execute the command. The command itself is the string representation of the content of a script that will be passed to the interpreter at execution time. Please note that the command part returned by get_command() is supposed to be fully rendered; it will not be modified any further. The JIP default tool classes that are used, for example, to provide scripts to the system are already integrated with the jip.templates system, but you can easily use the rendering function directly to create more dynamic commands that adapt easily to changes in the configuration of a tool.
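To make the (interpreter, command) contract concrete, here is a minimal sketch of a runner that hands a fully rendered command to an interpreter on standard input. run_command is a hypothetical function written for this illustration with the Python 3 standard library; JIP's own execution logic may differ.

```python
import subprocess
import sys


def run_command(interpreter, command):
    """Feed a fully rendered command to an interpreter on stdin.

    Hypothetical runner for illustration only.
    """
    process = subprocess.run(
        [interpreter],
        input=command,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return process.stdout


# The interpreter may be a name such as "bash" or a path to an executable;
# the running Python interpreter is used here to keep the sketch portable.
output = run_command(sys.executable, "print(1 + 1)")
print(output)
```

Because the command is passed on unchanged, any template variables have to be resolved before this step.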

The class exposes a name and a path to a source file as properties. Both are optional and can be omitted in order to implement anonymous tools. In addition to this meta-data, the tool's __init__() function allows you to provide an options_source. This object is used to create the jip.options.Options that cover the runtime configuration of a tool. The options are initialized lazily on first access using the options_source provided at initialization time. This object can be either a string or an instance of argparse.ArgumentParser. Both styles of providing tool options are described in the jip.options module.
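The lazy initialization mentioned above can be sketched with a property that builds its value on first access. LazyOptionsHolder is a hypothetical stand-in, and the dictionary it builds is only a placeholder for a real jip.options.Options instance.

```python
import argparse


class LazyOptionsHolder(object):
    """Hypothetical stand-in showing lazy creation from an options source."""

    def __init__(self, options_source=None):
        self._options_source = options_source
        self._options = None
        self.build_count = 0  # only here to make the laziness observable

    @property
    def options(self):
        # Build the options on first access, then reuse the cached result.
        if self._options is None:
            self._options = self._build(self._options_source)
        return self._options

    def _build(self, source):
        self.build_count += 1
        if isinstance(source, argparse.ArgumentParser):
            kind = "argparse"
        else:
            kind = "docstring"
        return {"kind": kind, "source": source}


holder = LazyOptionsHolder("usage:\n    mytool <name>")
count_before = holder.build_count  # nothing has been built yet
first = holder.options
second = holder.options
print(count_before, holder.build_count, first["kind"])
```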

__init__(options_source=None, name=None)

Initialize a tool instance. If no options_source is given, the class docstring is used as the options source.

Parameters:
  • options_source – either a string or an argparse.ArgumentParser instance; defaults to the class docstring
  • name – the name of this tool
check_file(option_name)

Delegates to the options check name function

Parameters:option_name – the name of the option
cleanup()

The cleanup method removes all output files for this tool

clone(counter=None)

Clones this instance of the tool and returns the clone. If the optional counter is provided, the name of the cloned tool will be updated using .counter as a suffix.
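A minimal sketch of the naming behaviour follows. NamedTool is a hypothetical class, and the use of copy.deepcopy is an assumption made for this illustration, not a statement about how Tool.clone is implemented.

```python
import copy


class NamedTool(object):
    """Hypothetical minimal tool with a name and some option values."""

    def __init__(self, name, options=None):
        self.name = name
        self.options = dict(options or {})

    def clone(self, counter=None):
        cloned = copy.deepcopy(self)
        if counter is not None:
            # Append the counter to the name as a ".counter" suffix.
            cloned.name = "%s.%s" % (self.name, counter)
        return cloned


original = NamedTool("mytool", {"name": "Joe"})
numbered = original.clone(counter=2)
numbered.options["name"] = "Ann"  # the original's options stay unchanged
print(numbered.name, original.options["name"])
```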

ensure(option_name, check, message=None)

Check a given option value using the check pattern or function and raise a ValidationError if the pattern does not match or the function returns False.

For list values, note that if check is a pattern, all values are checked independently. If check is a function, the list is passed on as is if the option takes list values; otherwise, the check function is called for each value independently.

Note also that you should not use this function to check for file existence. Use the check_file() function on the option or on the tool instead. check_file checks for incoming dependencies in pipelines, in which case the file does not exist _yet_ but it will be created by a parent job.

Parameters:
  • option_name – the name of the option to check
  • check – either a string that is interpreted as a regexp pattern or a function that takes the option's value as a single parameter and returns True if the value is valid
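The rules for patterns, functions, and list values can be sketched as follows. ensure_value and CheckFailed are hypothetical names, the takes_list flag stands in for the option's own knowledge of whether it takes list values, and the use of re.match (anchored at the start of the string) is an assumption.

```python
import re


class CheckFailed(Exception):
    """Stand-in for jip.tools.ValidationError in this sketch."""


def ensure_value(value, check, takes_list=False, message=None):
    """Check a value against a regexp pattern or a predicate function."""
    values = value if isinstance(value, (list, tuple)) else [value]
    if callable(check):
        # A list-valued option is handed to the function as a whole;
        # otherwise each value is checked on its own.
        results = [check(value)] if takes_list else [check(v) for v in values]
    else:
        # Patterns are always applied to every value independently.
        results = [re.match(check, str(v)) is not None for v in values]
    if not all(results):
        raise CheckFailed(message or "invalid value: %r" % (value,))


ensure_value("sample_01", r"^sample_\d+$")
ensure_value(["a.txt", "b.txt"], lambda files: len(files) == 2, takes_list=True)

error_text = None
try:
    ensure_value(["a.txt", "b.csv"], r".*\.txt$", message="expected .txt files")
except CheckFailed as err:
    error_text = str(err)
print(error_text)
```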
get_command()

Return a tuple of (interpreter, template) where the interpreter is the name of an interpreter that will be used to run the filled template and the template is a string that will be rendered.

get_input_files()

Yields a list of all input files for the options of this tool. Only TYPE_INPUT options are considered whose values are strings. If a source for the option is not None, it has to be equal to this tool.

Returns:list of file names
get_output_files(sticky=True)

Yields a list of all output files for the options of this tool. Only TYPE_OUTPUT options are considered whose values are strings. If a source for the option is not None, it has to be equal to this tool.

If sticky is set to False, all options marked with the sticky flag are ignored

Parameters:sticky (boolean) – by default all output option values are returned; if this is set to False, only non-sticky output options are yielded
Returns:list of file names
help()

Return help for this tool. By default this delegates to the options help.

init()

Initialization method that can be implemented to initialize the tool instance and, for example, add options. init is called once for the tool instance and the logic within the init is not allowed to rely on any values set or applied to the tool.

Raises Exception:
 in case of a critical error
is_done()

The default implementation returns True if the tool has output files and all output files exist.
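The rule stated above can be expressed in a few lines. outputs_done is a hypothetical helper that takes the output file names directly instead of reading them from a tool, and it uses only the Python 3 standard library.

```python
import os
import tempfile


def outputs_done(output_files):
    """Return True only if there is at least one output file and all exist."""
    files = list(output_files)
    return len(files) > 0 and all(os.path.exists(path) for path in files)


with tempfile.TemporaryDirectory() as folder:
    present = os.path.join(folder, "result.txt")
    missing = os.path.join(folder, "missing.txt")
    with open(present, "w") as handle:
        handle.write("done\n")
    all_present = outputs_done([present])           # every output exists
    one_missing = outputs_done([present, missing])  # one output is missing
    no_outputs = outputs_done([])                   # no outputs at all
print(all_present, one_missing, no_outputs)
```

Note that a tool without any output files is never reported as done under this rule.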

parse_args(args)

Parses the given arguments. An exception is raised if an error occurs during argument parsing.

Parameters:args (list of strings) – the argument list
pipeline()

Create and return the pipeline that will run this tool

setup()

Setup method that can be implemented to manipulate tool options before rendering and validation. Note that options here might still contain template strings. You are also allowed to set option values to template strings.

Raises Exception:
 in case of a critical error
validate()

The default implementation validates all options that belong to this tool and checks that all options that are of TYPE_INPUT reference existing files.

The method raises a ValidationError in case an option could not be validated or an input file does not exist.

validation_error(message, *args)

Quickly raise a validation error with a custom message.

This function simply raises a ValidationError. You can use it in a custom validation implementation to quickly fail the validation

Parameters:
  • message – the message
  • args – argument interpolated into the message
Raises ValidationError:
 always
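A minimal sketch of this helper follows, assuming printf-style interpolation of args into the message. SketchValidationError is a stand-in that mirrors the (source, message) signature of jip.tools.ValidationError, and the free function takes the source explicitly because there is no tool instance here.

```python
class SketchValidationError(Exception):
    """Stand-in for jip.tools.ValidationError: carries a source and a message."""

    def __init__(self, source, message):
        super(SketchValidationError, self).__init__(message)
        self.source = source
        self.message = message


def validation_error(source, message, *args):
    """Always raise, interpolating args into the message printf-style."""
    raise SketchValidationError(source, message % args)


caught = None
try:
    validation_error("mytool", "Option %s must be at least %d", "threads", 1)
except SketchValidationError as err:
    caught = err
print(caught.source, caught.message)
```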

args

Returns a dictionary from the option names to the option values

options

Access this tool's jip.options.Options instance.

The tools options are the main way to interact with and configure a tool instance either from outside or from within a pipeline.

path = None

path to the tool's source file

class jip.tools.ScriptTool(docstring, command_block=None, setup_block=None, init_block=None, validation_block=None, pipeline_block=None)

An extension of the tool class that is initialized with a docstring and operates on Blocks that can be loaded from a script file or from a string.

If specified as initializer parameters, both the validation and the pipeline block are handled with special care. Pipeline blocks can currently only be embedded python blocks, so their interpreter has to be ‘python’. Validation blocks where the interpreter is ‘python’ will be converted to embedded python blocks. This allows the validation process to modify the tool and its arguments during validation.

class jip.tools.PythonTool(cls, decorator, add_outputs=None)

An extension of the tool class that is initialized with a decorated class to simplify the process of implementing Tools in python.

Blocks and Block utilities

class jip.tools.PythonBlock(content=None, lineno=0)

Extends block and runs the content as embedded python

run(tool, stdin=None, stdout=None)

Execute this block as an embedded python script

terminate()

The terminate function on a python block does nothing. A Python block cannot be terminated directly.

class jip.tools.PythonBlockUtils(tool, local_env)

Utility functions that are exposed in template blocks and template functions

The block utilities store a reference to the local and global environment, to the current tool and to the current pipeline.

bash(command, **kwargs)

Create a bash job that executes a bash command.

This is a fast way to build pipelines that execute shell commands. The function wraps the given command string in the bash tool that is defined with input, output, and outfile. Input and output default to stdin and stdout. Note that you can access your local context within the command string. Take for example the following pipeline script:

name = "Joe"
bash("echo 'Hello ${name}'")

This will work as expected. The command template can access local variables. Please keep in mind that the tools context takes precedence over the script context. That means that:

input="myfile.txt"
bash("wc -l ${input}")

In this example, the command wc -l will be rendered and wait for input on stdin. The bash command has an input option, and that takes precedence over the globally defined input variable. This is true for input, output, and outfile, even if they are not explicitly set. You can, however, access variables defined in the global context using _ctx:

input="myfile.txt"
bash("wc -l ${_ctx.input}")

will indeed render and execute wc -l myfile.txt.
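The precedence rule can be illustrated with a small stand-alone renderer. This is not JIP's template engine: render is a hypothetical function, and the choices to represent unset options as None and to render them as an empty string are assumptions made for this sketch.

```python
import re
from types import SimpleNamespace


def render(template, tool_options, script_context):
    """Resolve ${name} placeholders; tool options shadow script variables."""
    scope = dict(script_context)
    scope.update(tool_options)  # the tool's own options take precedence
    scope["_ctx"] = SimpleNamespace(**script_context)

    def substitute(match):
        parts = match.group(1).split(".")
        value = scope[parts[0]]
        for attribute in parts[1:]:
            value = getattr(value, attribute)
        return "" if value is None else str(value)

    return re.sub(r"\$\{([\w.]+)\}", substitute, template)


# The bash tool always has input, output, and outfile options, set or not.
bash_options = {"input": None, "output": None, "outfile": None}
script = {"input": "myfile.txt"}

shadowed = render("wc -l ${input}", bash_options, script)
explicit = render("wc -l ${_ctx.input}", bash_options, script)
print(repr(shadowed))
print(repr(explicit))
```

In the first call the unset input option hides the script variable of the same name; in the second, _ctx reaches past the tool's options to the script context.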

Parameters:
  • command (string) – the bash command to execute
  • kwargs – arguments passed into the context used to render the bash command. input, output, and outfile are passed as options to the bash tool that is used to run the command
Returns:a new pipeline node that represents the bash job
Return type:jip.pipelines.Node

check_file(name)

Checks for the existence of a file referenced by an option.

Please note that this does not take a file name, but the name of an option. This function is preferred over a simple check using os.path.exists() because it also checks for job dependencies. This is important because a mandatory file might not yet exist within the context of a pipeline, but it will be created at runtime in a previous step.

Parameters:name – the option's name
Returns:True if the file exists or the file is created by another job that will run before this option's job is executed.
Return type:boolean
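The difference from a plain os.path.exists() check can be sketched as follows. file_available is a hypothetical helper; the list of pending outputs stands in for the job dependencies that JIP tracks in a pipeline.

```python
import os


def file_available(path, pending_outputs=()):
    """True if path exists now or an upstream job is expected to create it."""
    return os.path.exists(path) or path in set(pending_outputs)


upstream_outputs = ["aligned.bam"]  # hypothetical output of a parent job
available = file_available("aligned.bam", upstream_outputs)
unavailable = file_available("no_such_file_for_this_sketch.bam", upstream_outputs)
print(available, unavailable)
```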
job(*args, **kwargs)

Creates and returns a new Job.

The job instance can be used to customize the execution environment for the next job. For example:

job("Test", threads=2).run('mytool', ...)

This is a typical usage in a pipeline context, where a new job environment is created and then applied to a new ‘mytool’ pipeline node.

Parameters:
  • args – job arguments
  • kwargs – job keyword arguments
Returns:a new job instance
Return type:jip.pipelines.Job

name(name)

Set the runtime name of a pipeline. The runtime name of the pipeline is stored in the database and is used as a general identifier for a pipeline run.

Note that this sets the name of the pipeline if used in a pipeline context; otherwise it sets the name of the tool/job. Within a pipeline context, the name of a job can be changed using job():

job("my job").run(...)

or after the node was created:

myrun = run(...)
myrun.job.name = "my job"

Parameters:name (string) – the name of the pipeline
run(_name, **kwargs)

Searches for a tool with the specified name and adds it as a new Node to the current pipeline. All specified keyword arguments are passed as option values to the tool.

Delegates to the pipelines jip.pipelines.Pipeline.run() method.

Parameters:
  • _name (string) – the name of the tool
  • kwargs – additional arguments passed to the tool as options
Returns:a new node that executes the specified tool and is added to the current pipeline
Return type:jip.pipelines.Node

set(name, value)

Set an option's value.

Parameters:
  • name (string) – the option's name
  • value – the new value
validation_error(message, *args)

Quickly raise a validation error with a custom message.

This function simply raises a ValidationError. You can use it in a custom validation implementation to quickly fail the validation

Parameters:
  • message – the message
  • args – argument interpolated into the message
Raises ValidationError:
 always

Tool Scanner

class jip.tools.Scanner(jip_path=None, jip_modules=None)

This class holds a script/tool cache. The cache is organized into dicts: the script_cache stores name->instance pairs pointing from the name of the tool to its cached instance. The find implementations return clones of the instances in the cache.
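Returning clones protects the cached instances from changes made by callers. The sketch below shows the idea with a hypothetical CloneCache class and plain dictionaries in place of tool instances; copying with copy.deepcopy is an assumption of this illustration.

```python
import copy


class CloneCache(object):
    """Hypothetical name -> instance cache that never hands out its originals."""

    def __init__(self):
        self.script_cache = {}

    def register(self, name, instance):
        self.script_cache[name] = instance

    def find(self, name):
        # Hand out a copy so that callers cannot modify the cached instance.
        return copy.deepcopy(self.script_cache[name])


cache = CloneCache()
cache.register("mytool", {"name": "mytool", "options": {"threads": 1}})

first = cache.find("mytool")
first["options"]["threads"] = 8  # changing a clone leaves the cache untouched
second = cache.find("mytool")
print(second["options"]["threads"])
```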

add_folder(path)

Add a folder to the list of folders that are scanned for tools.

Parameters:path – path to the folder that will be added to the search path
add_module(path)

Add a module or a python file to the list of modules that are scanned for tools.

Parameters:path – path to the module that will be added to the search path
find(name, path=None, is_pipeline=False)

Finds a tool by its name or file name.

If the given name points to an existing file, the file is loaded as a script tool and returned. Otherwise, a default search is triggered, optionally including the specified path.

Returns:a new instance of the tool
Return type:Tool
Raises ToolNotFoundException:
 if the tool could not be found
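The lookup order described for find can be sketched as follows. locate and SketchToolNotFound are hypothetical names, the dictionary stands in for the scanner's cache, and the sketch only reports where a tool would come from instead of loading it.

```python
import os
import tempfile


class SketchToolNotFound(Exception):
    """Stand-in for jip.tools.ToolNotFoundException."""


def locate(name, known_tools):
    """Return ("file", path) for an existing file, else ("cache", tool)."""
    if os.path.isfile(name):
        return "file", os.path.abspath(name)
    if name in known_tools:
        return "cache", known_tools[name]
    raise SketchToolNotFound("No tool named '%s' found" % name)


known = {"mytool": "<cached mytool>"}

with tempfile.TemporaryDirectory() as folder:
    script_path = os.path.join(folder, "hello.jip")
    open(script_path, "w").close()
    by_file = locate(script_path, known)  # an existing file wins
by_name = locate("mytool", known)         # otherwise search by name

not_found = False
try:
    locate("no_such_tool_for_this_sketch", known)
except SketchToolNotFound:
    not_found = True
print(by_file[0], by_name[0], not_found)
```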
scan(path=None)

Searches for scripts and python modules in the configured locations and returns a dictionary of the detected instances

Parameters:path – optional path value to define a folder to scan
Returns:dict of tools
scan_files(parent=None)

Scan files for jip tools. This function detects files with the .jip extension in the default search locations.

Parameters:parent – optional parent folder
Returns:list of found files
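Detecting files by extension can be sketched with the standard library. scan_jip_files is a hypothetical helper that inspects a single folder without descending into sub-folders; whether the real scanner recurses is not stated here, so treat that as an assumption.

```python
import os
import tempfile


def scan_jip_files(folder):
    """Return the sorted paths of all *.jip files directly inside folder."""
    return sorted(
        os.path.join(folder, entry)
        for entry in os.listdir(folder)
        if entry.endswith(".jip") and os.path.isfile(os.path.join(folder, entry))
    )


with tempfile.TemporaryDirectory() as folder:
    for filename in ("align.jip", "count.jip", "notes.txt"):
        open(os.path.join(folder, filename), "w").close()
    found = [os.path.basename(path) for path in scan_jip_files(folder)]
print(found)
```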
scan_modules()

Loads the python modules specified in the JIP configuration. This will register any functions and classes decorated with one of the JIP decorators.

Exceptions

exception jip.tools.ValidationError(source, message)

Exception raised in validation steps. The exception carries the source tool and a message.

exception jip.tools.ToolNotFoundException

Raised in case a tool is not found by the scanner
