.. _tools_and_pipelines:

Tools and Pipelines
===================

The essential parts of the JIP system are *tools* and *pipelines*. Tools represent the smallest unit in the system and allow you to implement independent executable blocks. Pipelines are *directed acyclic graphs* that consist of a set of nodes representing tool executions and edges representing the dependencies between these executions.

.. _jip_tools:

Tools
-----
In JIP, *tools* are small executable units that carry meta information to describe the actual execution and its options as well as a way to validate and update the tool's state.

.. figure:: _static/single_tool_def.png
    :align: center
    :alt: tool structure

    A single tool consists of the following essential parts. The tool's ``options`` are divided into ``Inputs``, ``Outputs``, and ``Options``. The tool itself consists of an optional validation block and an execution block. In addition, a tool has a ``Job`` association that covers the basic execution environment.

The simplest form of a tool consists of the following parts:

Options
    Options are a way to express the tool's input and output capabilities and other options. ``Inputs`` are usually files or data streams that are read by the tool. ``Outputs``, as the name suggests, cover files and data streams created by a tool. Other ``Options`` can also be defined. Please note that :ref:`input and output ` options are treated specially when a tool is executed.

Execution block
    A single tool contains one execution block that either executes a command script or creates and returns a pipeline. Command scripts are, by default, implemented in ``bash``, but you can switch the interpreter and write the command in any interpreted language. Alternatively, an execution block can create a *pipeline*, which is then incorporated into the overall execution graph.

Init block
    A tool instance can provide an ``init`` block that will be called once, when the tool is loaded.
    ``init`` implementations are not allowed to act on option values, but can be used to set up and initialize the tool instance itself. Use this block, for example, to add :ref:`dynamic options ` to the tool instance. Please note that ``init`` blocks have to be implemented in Python and there is currently no way to change the interpreter for those blocks.

Setup block
    A tool instance can provide a ``setup`` block that is called before the options are finalized and rendered. The tool options are already set when this block is called, and you can use it, for example, to implement some logic on the option values. When this block is executed, the options are not yet rendered. That means you are allowed to set option values to template strings.

Validation block
    In addition to the actual execution, a *tool* implementation can extend its default validation. By default, the system ensures that all specified input files exist. You can add more checks in the validation block. Please note that validation blocks have to be implemented in Python and there is currently no way to change the interpreter for those blocks. The validation block is also the place to modify the tool's job environment in case you don't want to set parameters from the command line at execution or submission time.

JIP currently supports two ways to implement tools and pipelines: JIP :ref:`scripts ` and Python :ref:`modules with decorators `.

.. _jip_tool_scripts:

Scripts
^^^^^^^
One way to implement your tools and pipelines is using JIP *scripts*. The system can be executed as an interpreter, hence you can start your scripts with ``#!/usr/bin/env jip``, make them executable, and run them directly. The interpreter detects ``--`` in the command line and uses it to separate arguments. Everything after the ``--`` is passed as an argument to the JIP interpreter rather than your tool.
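The ``--`` convention described above can be pictured with a few lines of plain Python. This is only a sketch of the idea, not JIP's actual argument handling; the function name ``split_jip_args`` is hypothetical and introduced purely for illustration.

```python
def split_jip_args(argv):
    """Split an argument list at the first '--'.

    Returns a tuple (tool_args, interpreter_args). Hypothetical helper,
    not part of the JIP API.
    """
    if "--" in argv:
        idx = argv.index("--")
        # everything before '--' belongs to the tool,
        # everything after it to the interpreter
        return argv[:idx], argv[idx + 1:]
    return list(argv), []


tool_args, jip_args = split_jip_args(["-i", "in.txt", "--", "--dry"])
print(tool_args)  # ['-i', 'in.txt']
print(jip_args)   # ['--dry']
```

Without a ``--`` in the list, all arguments are treated as tool arguments in this sketch.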
Basic JIP scripts can be used to implement both tools and pipelines, and they provide a way to define the JIP options directly in the script. A script usually contains the following blocks:

Documentation, help and options
    A JIP script starts with a documentation and help block that also contains the option definitions. We use the great `docopt `_ library to parse your option definitions.

Blocks for ``init``, ``setup``, ``validate`` and execution
    You can open a block in a JIP script using ``#%begin `` and close it with ``#%end``. Nested blocks are currently not supported.

Documentation, help, and options
********************************
An essential part of any script, independent of the context, is documentation and command line options. Unfortunately, this is often neglected and you end up with a set of script files that you understand when you first write and use them, but if you have to come back to those *things* after some time, you are often lost. The easiest way out is to document both your script and the command line options that it takes in a meaningful way. The downside of this is that your initial small script that consists of just a few lines of code will get filled with a lot of code responsible for parsing your command line options. The *docopt* library tries to tackle the problem and is able to parse option definitions that are given in a *POSIX* compliant way. JIP makes heavy use of this library and allows you to specify the option definition in a *POSIX* style and then to extract the available meta-information. Here is one of the simplest scripts you can write::

    #!/usr/bin/env jip
    # Send greetings
    #
    # usage:
    #     greeting

    echo "Hello ${name}"

Make the script executable with ``chmod +x greetings.jip`` and run it::

    $> ./greetings.jip Joe
    Hello Joe

You can see that you have access to the parsed options directly in your script. In addition, the ``-h|--help`` option is in place and will print the documentation.
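To get a feeling for how an option value ends up inside the script body, the following sketch uses ``string.Template`` from the Python standard library as a stand-in. JIP itself renders scripts with jinja2, so this is an analogy under that assumption and not the real rendering code; the ``options`` dictionary is made up for the example.

```python
from string import Template

# script body with one option placeholder and one shell variable
script = 'echo "Hello ${name}" from $HOME'
options = {"name": "Joe"}  # hypothetical parsed option values

# safe_substitute fills in known names and leaves unknown ones
# (here the shell variable $HOME) untouched
rendered = Template(script).safe_substitute(options)
print(rendered)  # echo "Hello Joe" from $HOME
```

Leaving unknown names alone is what lets ordinary shell variables pass through to the interpreter unchanged.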
We decided to use a slightly modified version of the `docopt `_ library and to force you to write documentation, at least for your options. It might sound harsh and it is a hard constraint, but in order to write reusable tools, you have to provide some sort of definition of your tool's options. It turns out that writing options is rather straightforward: you get documentation for your tools, and the JIP system can extract the information about your tool's options. Here is a larger example where we actually define different kinds of options::

    #!/usr/bin/env jip
    # Wow, accessing arguments without parsing them is great!
    #
    # Usage:
    #     my_tool -i ... [-o ] [-b]
    #
    # Inputs:
    #     -i, --input    List of input files
    #
    # Outputs:
    #     -o, --output   The output file
    #                    [default: stdout]
    #
    # Options:
    #     -b, --boolean  A boolean flag

    echo "INPUT: ${input}"
    echo "OUTPUT: ${output}"
    echo "BOOLEAN? ${boolean|arg("yes")|else("no")}"

A single JIP tool always has a set of options (see :class:`~jip.options.Options` for the underlying API). The options are divided into three groups:

``Inputs``
    Input options usually are options that take a file or a list of files. These files, if specified, have to be present at execution time.

``Outputs``
    Output options are all options that define files that are created by a tool run. These are of particular importance when it comes to job failures and cleanups. In addition, you might not always be able to expose all your outputs through the command line interface. For example, your tool might just take a prefix and then create a set of files based on the specified prefix. These cases can be handled using :ref:`dynamic options `.

``Options``
    All options that are not ``Inputs`` or ``Outputs`` fall into this group.

.. note::

    You have to indicate the ability of a tool to read from ``stdin`` or write to ``stdout`` explicitly. For this, set the option's default value to ``stdin`` or ``stdout`` respectively.
When options are used to build pipelines, it is important to indicate a tool's default input and output options. This is done using the definition order. In case you have more than one input or output option, the first one in the list is marked as the default input/output. Options that accept streams always take precedence and are always defined as the default options for input or output.

Execution blocks
****************
A JIP script must contain exactly one non-empty execution block. There are two types of execution blocks:

command block (``#%begin command []``)
    Command blocks execute their block content with a specified interpreter. The block content is a JIP template and you have access to the full context. The command block takes a single argument, which defines the interpreter that will be used to run the block's content. The default interpreter is *bash*.

pipeline block (``#%begin pipeline``)
    Pipeline blocks are written in *python* and allow you to define a pipeline graph that will then be expanded and executed.

All execution blocks can be explicitly opened with ``#%begin command`` or ``#%begin pipeline`` and can be closed by ``#%end``. If no block is opened explicitly, a *bash* command block is created implicitly.

Init blocks
************
A script or tool definition can specify an ``init`` block in order to create more options that are registered with the tool. Please note that the init blocks are evaluated once, just after the tool is created. That means that the option values are not yet set and you cannot implement any logical decisions based on the option values. You can, however, use the init block to add more options to a tool. For example::

    #%begin init
    add_output('output', '${input|name|ext}.out')
    #%end

Here we add a new output option and set its value to a template that uses the tool's ``input`` option. This is valid because the option's value will be evaluated later, when the input option is set.
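To make the template ``${input|name|ext}.out`` more concrete, here is a rough sketch of the value it would be expected to produce once ``input`` is known, written with ``os.path`` helpers only. It assumes that the ``name`` filter behaves like ``os.path.basename`` and that ``ext`` strips the rightmost extension; the function ``default_output`` is hypothetical and not part of JIP.

```python
import os.path


def default_output(input_path):
    """Approximate '${input|name|ext}.out' for a given input path."""
    name = os.path.basename(input_path)  # roughly the 'name' filter
    stem, _ = os.path.splitext(name)     # roughly the 'ext' filter
    return stem + ".out"


print(default_output("/data/run1/reads.txt"))  # reads.out
```

The point of the template is the same as in this sketch: the output name is derived from the input, but only evaluated once the input value is available.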
Setup blocks
************
Setup blocks are executed before the option values are rendered and can be used to change options based on their values. Because template strings are not yet rendered, you can set template strings as values. For example::

    #%begin setup
    if options['threads'].get(int) > 1:
        options['parallel_mode'] = True
    #%end

Validation blocks
*****************
In addition to the command execution or pipeline definition, a script can contain a single ``validate`` block::

    #%begin validate
    # check a file
    check_file('input')

    import datetime
    day = datetime.date.today().strftime("%A")
    if day == 'Monday':
        validation_error("I don't like Mondays")
    #%end

All validation blocks are written in *python* and the :ref:`context ` exposes a set of helper functions to perform checks on files and raise arbitrary validation errors. See :ref:`Validation ` for more about tool validation.

The execution environment
*************************
A tool implementation carries its own job environment. The options that you can modify on a per-tool basis are covered in the :py:class:`~jip.profiles.Profile` class. Job profiles can also be applied *outside* of the tool implementation, when you submit or execute the tool or pipeline. Please note that specifying the job options outside of the tool implementation is the preferred way. This enhances portability and flexibility and allows you as a user of a tool to modify its execution environment without touching the tool implementation. The documentation contains an :ref:`example ` that covers how you can modify the job's environment both in the tool implementation and on the command line.

.. _jip_tool_modules:

Modules
^^^^^^^
In addition to JIP scripts, tools and pipelines can also be implemented in Python modules directly, using the JIP API and the available :ref:`decorators `. Tools can be loaded from Python modules directly. Here is how you could implement a simple `hello world` example as a Python function.
Create a Python module `hello_world.py` and add the following content::

    #!/usr/bin/env python
    from jip import *

    @pytool()
    def hello_world():
        """Prints hello world in a python module"""
        print "Hello world"

All we have to do here is to decorate a function with the :py:class:`jip.tools.pytool` decorator exported in the `jip` package. This allows us to treat a single Python function as a tool implementation. In order to integrate the module, we have to either configure the :ref:`jip_modules ` jip configuration or export the :envvar:`JIP_MODULES` environment variable. For example::

    $> JIP_MODULES=hello_world.py jip tools

Implementing tools in Python modules allows you to group and organize your tools using standard Python modules, but you are no longer able to have them exposed as single commands to your shell. You have to use the :ref:`jip run ` command to execute a tool implemented in Python modules. To run the "hello world" example, try the following::

    $> JIP_MODULES=hello_world.py jip run hello_world

If you use Python modules to organize your tools, you might encounter situations where it would be much easier just to execute a single line of bash rather than implementing the full execution in Python. The latter can be quite tricky and a lot of things from the Python standard library might get involved. There is, however, a simpler way where you can use a Python function (or class, see :ref:`decorators `) to create an interpreted script. For this purpose, jip contains the :py:class:`jip.tools.tool` decorator. You can decorate a function with ``@tool()`` and return a template string that is then treated in the same way jip script content would be interpreted. Your function can either return a single string, which will be interpreted using bash, or a tuple where you specify first the interpreter and then the actual script template.
Please take a look at the following examples::

    @tool()
    def hello_world():
        return "echo 'hello world'"

    @tool()
    def hello_perl():
        return "perl", """
        use strict;
        print "Hello World\n"
        """

There are more :ref:`decorators ` that you can use to annotate functions and classes to create pipelines and tools.

.. _validation:

Tool validation and pre-processing
**********************************
*Validation* is an essential step in all pipeline executions. You want to fail as early as possible and to make sure all mandatory options are set. JIP tools and pipelines come with a default validation mechanism that is triggered while building pipelines and before the execution. By default, all ``input`` options of a tool or pipeline are validated and it is ensured that the referenced file exists or that the file will be created by another tool in a pipeline setup. In addition, all mandatory options are checked and an error is raised if a mandatory option is not set.

You can also customize the process of validation. In JIP scripts, you can add a ``validate`` block like this::

    #%begin validate
    ...
    #%end

Within the validate block, which is implemented in Python, you have full access to `the scripts' context `, for example, to use the :py:meth:`~jip.tools.PythonBlockUtils.check_file` function. If you want to fail your validation manually, you have to raise a :py:exc:`~jip.tools.ValidationError`. The easiest way to do this is via the Python context's :py:meth:`~jip.tools.PythonBlockUtils.validation_error` function. Specify an error message and the exception will be raised. For example::

    #%begin validate
    ...
    if day == "Monday":
        validation_error("I refuse to work on Mondays")
    ...
    #%end

Since the validation blocks run before the actual execution or submission of the pipeline, you can also use the validation block as a general pre-processor for your tool. This can be handy in various circumstances, but keep in mind that the idea is **not** to do the tool's job while validating it.
Keep your validation methods small and fast to speed up pipeline generation.

.. _dynamic_options:

Within your ``init`` and ``setup`` blocks, you are allowed to modify the tool options. One common pattern is to add additional `hidden` output options. Assume, for example, you have a simple tool that takes a prefix parameter and a count and then creates a number of files::

    #!/usr/bin/env jip
    # Touch a number of files with a common prefix
    #
    # usage:
    #   touch -p -c

    #%begin command
    for x in {1..${c}}; do
        touch ${p}_$x
    done

The tool will do the right job, but the files generated by the tool (``_``) will not be registered as output files. That means they cannot be handled in case of a failure or restart, and the tool cannot easily be wired up within a pipeline setup as no outputs are defined. On the other hand, we also cannot specify the output option within the script's header directly. The values of the output file option depend on what will be specified for the ``prefix`` and ``count`` options. The way around the problem is to use the ``init`` and ``setup`` blocks, register the output option dynamically, and then update its value based on the configured options::

    #!/usr/bin/env jip
    # Touch a number of files with a common prefix
    #
    # usage:
    #   touch --prefix --count

    #%begin init
    add_output('output')
    #%end

    #%begin setup
    options['output'].set(["%s_%s" % (prefix, i) for i in range(1, count.get(int) + 1)])
    #%end

    #%begin command
    for x in ${output}; do
        touch $x
    done

What happens here is that we register a new ``output`` option using the context's :py:meth:`~jip.tools.PythonBlockUtils.add_output` function, pre-calculate the names of the files, and set them as values. Note that you can pass converter functions like ``str``, ``int``, or ``float`` to the option's :py:meth:`jip.options.Option.get` method to convert the value. In fact, now that we have the option specified, we can also use it in the `command` block and replace the bash sequence generation.
This way, there is only one place where the names of the output files are generated. That means only one place exists where we have to look for bugs or to change things.

.. note::

    You can use the validation block for pre-processing, but keep in mind that the validation block will be called **more than once**. That means you have to be careful to implement your pre-processing in a way that it can be executed multiple times and is not too time consuming.

.. _decorators:

Decorators
^^^^^^^^^^
The :py:mod:`jip.tools` module provides a set of decorators that can be applied to functions and classes in order to transform the decorated instance into a jip tool or pipeline. The following decorators are available:

:class:`@tool `
    Apply this to classes and functions that return a string (for functions) or implement a ``get_command`` method that returns a string (for classes). The returned string is interpreted as a jip script template. The function can also return a tuple (``interpreter``, ``template``) to indicate an interpreter other than ``bash``.

:class:`@pytool `
    Apply this to functions or classes. Decorated functions are executed as jip tools, decorated classes are expected to implement a ``run`` method that is then executed as a tool.

:class:`@pipeline `
    Apply this to functions or classes. Functions must return a :class:`jip.pipelines.Pipeline` instance or a pipeline script. Classes must implement a ``pipeline`` function that returns a pipeline instance or a pipeline script.

Function annotation is the simplest and also the most limited way to implement a JIP tool. You do not have a way to customize the tool validation. That said, implementing jip tools as Python functions is straightforward and easy to do::

    @pytool()
    def greetings():
        print "Greetings fellow Pythoniast"

In this case the tool execution itself is implemented in Python.
Alternatively, you can also use the ``@tool`` annotation and return a template string or a tuple to specify the interpreter and the template string::

    @tool()
    def greetings():
        return "bash", "echo 'Greetings bash user'"

In case you use ``@tool``, you can access the tool's :py:attr:`jip.tools.Tool.options` as in any JIP script from :ref:`the context `. On the other hand, if you use the `@pytool` decorator and implement a Python function that is executed as a tool directly, you can access the tool instance as a parameter::

    @pytool()
    def greeting(self):
        """
        usage:
            greeting
        """
        assert isinstance(self, jip.tools.Tool)
        print "Greetings", self.options['name'].get()

Here, ``self`` is the actual tool instance created by the decorator and populated with the options.

An alternative approach, which is also suitable for more complex tools, is to implement the tool not as a function but as a class. This enables you to add more than just the ``run`` or ``get_command`` functions: you can also provide a ``validate`` implementation and even customize other parts of the tool implementation. Here is the Python implementation of the greetings tool::

    @pytool()
    class greetings(object):
        """
        usage:
            greetings
        """
        def validate(self):
            if self.options['name'] == 'Joe':
                self.validation_error("Sorry Joe, I don't like your shoes.")

        def run(self):
            # we are not a tool instance
            assert isinstance(self, greetings)
            # but we can access it
            assert hasattr(self, 'tool_instance')
            # and we have the helpers directly available
            assert hasattr(self, 'args')
            assert hasattr(self, 'options')
            assert hasattr(self, 'check_file')
            assert hasattr(self, 'ensure')
            assert hasattr(self, 'validation_error')
            print "Greetings", self.args['name']

As you can see from the example above, you can override most of the functions provided by the tool implementation. If you use a class-based approach, a few helper functions and variables are injected into your custom class.
You always have access to:

args
    the option values in a read-only dictionary

options
    the :class:`tool options `

check_file
    the options :py:meth:`~jip.options.Options.check_file` function to quickly check file parameters

validation_error
    access to the tool's :py:meth:`~jip.tools.Tool.validation_error` function to be able to raise errors quickly

Please take a look at the documentation of the :class:`@tool ` decorator. There are options you can pass to the decorator to customize how your class is converted to a tool and change, for example, the names of the functions that are used to map between your implementation and the :class:`~jip.tools.Tool` class.

JIP Pipelines
-------------

.. _pipeline_operators:

Node operators
^^^^^^^^^^^^^^
Pipeline nodes support a set of operators that simplify some operations on the nodes and the graph structure. The following operators are supported by pipeline :py:class:`~jip.pipelines.Node` instances:

``|``
    The *or* or *pipe* operator behaves similarly to the pipe in your bash shell. The default output of the left side's node (see :py:meth:`jip.options.Option.get_default_output`) is connected to the default input of the right side's node. A new edge is added to the pipeline graph, making the right side dependent on the left side, and, if both nodes support streaming, a stream link is established.

``>``
    The *greater than* operator can be used to **set the output** option of the left side to the right side value. The right hand side can be a string representing a file name, another node, or another option. If the right side is another node or another node's option, a dependency edge will also be created.

``<``
    The *less than* operator can be used to **set the input** option of the left side to the right side value. The right hand side can be a string representing a file name, another node, or another option. If the right side is another node or another node's option, a dependency edge will also be created.
``>>``
    The *right shift* operator creates a dependency between the left side and the right side, making the **left side executed before** the right side.

``<<``
    The *left shift* operator creates a dependency between the left side and the right side, making the **right side executed before** the left side.

``+``
    The *plus* operator creates a group of jobs. All operations on the group node are now delegated to all members of the group.

``-``
    The *minus* operator creates a group of sequentially executed jobs that are sent as a single job to the compute cluster.

.. _tool_io:

Inputs, Outputs, and Options
----------------------------
The previous chapter already explained how you can define tool options and use them in your tool implementations. The options are divided into *input*, *output* and *general* options. All options can be used to create links (dependencies) between tool executions in a pipeline context, but *input* and *output* options are treated specially.

Input options are automatically validated for each tool. The system will raise an error if you set an input option to a non-existing file.

Output options and files that are referenced as outputs of tools are used to detect the state of a tool and its execution, especially when something goes wrong. The first indicator for a job's state, say *running* or *failed*, is the job database. Alternatively, if the job is no longer marked as running on your compute cluster, the output files of a job are checked and the job is marked as completed if all outputs exist. That would mean that a job that failed in the middle of its run might leave files on disk and might be marked as completed accidentally. To prevent this, a JIP run **deletes all output of failed jobs** automatically. If you submit your jobs through the ``jip submit`` command line tool or run them with ``jip run``, you can prevent deletion of files using the ``--keep`` flag. In general, you are encouraged not to use *keep* though.
If a job fails, its output is removed, which allows you to fix the problem and restart your job without worrying about orphaned files.

.. _stream_dispatching:

The stream dispatcher
---------------------
If your *tool* implementation can handle streamed input and output, the JIP pipeline system allows dynamic stream dispatching.

.. figure:: _static/stream_dispatch.png

    Dispatch the output of the ``Producer`` to two ``Consumers`` and an output file. All three nodes on the right side will receive the same content. This will also wrap all jobs into a single job group that is executed in parallel.

The dispatcher will automatically delegate content from a *producer* node to a number of *consumers*. A valid consumer is either a *file* or another tool that accepts input from the ``stdin`` stream. This allows you to construct parallel running pipelines very similar to what you can do with the bash ``tee`` command. For example::

    $> echo "Hello World" | (tee producer_out.txt | (tee >(wc -w) | wc -l))

Here, the ``echo`` command is the *producer* whose output is piped to the ``producer_out.txt`` file as well as to a word and a line count.

To build the same pipeline in JIP, you have a couple of options. We can start with a rough, one-to-one translation::

    #!/usr/bin/env jip
    #%begin pipeline
    (bash('echo "Hello World"') > 'producer_out.txt') | (bash('wc -l') + bash('wc -w'))

This gives the same result. Try to run it, or push it through a *dry* run (use ``jip run --dry`` or ``./myscript.jip -- --dry``) to see the pipeline structure. The hierarchy contains all three jobs, but only a single job will be sent to and executed on your compute cluster. In this example, we use the pipeline :ref:`node operators ` to delegate the output from our *producer* to the output file and then further to a *group* of two jobs, the word count and the line count.
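As an aside, the fan-out idea behind the dispatcher can be imitated in plain Python. The sketch below is not how JIP implements stream dispatching: it runs sequentially in a single process, the function ``dispatch`` and the consumer callbacks are hypothetical names, and an in-memory buffer stands in for ``producer_out.txt``. It only shows that every consumer receives the same content from the producer.

```python
import io


def dispatch(producer, consumers):
    """Hand every chunk from the producer to each consumer in turn."""
    for chunk in producer:
        for consume in consumers:
            consume(chunk)


counts = {"words": 0, "lines": 0}
out_file = io.StringIO()  # stands in for producer_out.txt


def count_words(chunk):
    counts["words"] += len(chunk.split())


def count_lines(chunk):
    counts["lines"] += chunk.count("\n")


# the producer yields the same text that 'echo "Hello World"' would emit
dispatch(["Hello World\n"], [out_file.write, count_words, count_lines])
print(counts)  # {'words': 2, 'lines': 1}
```

In a real JIP pipeline the consumers are separate jobs connected by streams, so they can run in parallel rather than one after another.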
A variation of the pipeline script above would be to explicitly specify the producer's output::

    bash('echo "Hello World"', output='producer_out.txt') | (bash('wc -l') + bash('wc -w'))

Both variations are similar in nature and so are the jobs. However, neither of them necessarily improves the readability or maintainability of the script. They do the job, but you might not consider the script *nice*. An alternative implementation of the same pipeline might look like this::

    #!/usr/bin/env jip
    #%begin pipeline
    producer = bash('echo "Hello World"', output='producer_out.txt')
    word_count = bash("wc -w", input=producer)
    line_count = bash("wc -l", input=producer)

    producer | (word_count + line_count)

Granted, this is no longer a single line. But the goal is also not to use the least number of keystrokes (if you are interested in that, start playing `vimgolf `_). The script above allows more flexibility and you will be able to update the pipeline faster.

The key line with respect to the streaming dispatcher is the last line of the script. This line enables the stream dispatching. If you remove it, your pipeline will still work, but the producer and the consumer jobs will no longer run in parallel. Without the last line, the producer is executed first and its output is written to `producer_out.txt`. Then the two consumer jobs execute (potentially in parallel) and operate on the output file.

If you decide that you don't need the `producer_out.txt` file, you can simply remove it from the producer definition. In that case you will end up again with a pipeline structure that executes a single job, and all data will be streamed. In this case you don't even need the last line, because the streaming dependency is implicit.

.. note::

    Another nice feature of the last version of the pipeline is that *auto-naming* kicks in and your pipeline jobs will be named according to the variable names you used in your script::

        ####################
        |  Job hierarchy   |
        ####################
        producer
        ├─word_count
        └─line_count
        ####################

.. _templates:

The Template system
===================

JIP uses `jinja2 `_ as its template system, and all jip scripts are passed through the jinja2 engine. There are just a few things we changed and added to the context. Most importantly, we use the `${}` notation to identify variables. This provides a slightly "nicer" integration with bash and feels a little bit more native. In addition, we configured *jinja2* not to replace any unknown variable, which allows you to use bash environment variables without any problems.

.. _template_filters:

Template Filters
----------------
Template filters can be a very powerful tool to simplify processing of user input and to reduce the number of ``if/else`` statements in templates. For example:

.. code-block:: bash

    # get the parent folder name of a file
    # and prefix it with '1_'
    parent=${myfile|parent|name|pre('1_')}
    # get the base name of a file and remove the file extension
    file_name=${myfile|name|ext}
    # print the boolean option '-e, --enable' as -e=yes if the
    # option is true and specified by the user
    some_tools ${enable|arg(suffix='=yes')}
    # say 'output' can be stdout, redirect to a file only if
    # the user specified a file name, otherwise nothing
    # will be put into the template, hence output goes to
    # stdout
    ... ${output|arg(">")}
    # translate an option -i, --input one to one into the template
    # if it was specified. This yields: mytool -i input.txt
    mytool ${input|arg}

The following filters are currently available:

**arg**
    The argument filter applies to options that have a value specified and whose value is not False. The *arg* filter without any arguments prefixes the option's value with its original short/long option name.
    You can specify a prefix or a suffix to change this behaviour or to change the option name. For example, ``${output|arg}`` will return ``-o outfile``, assuming that the output option has a short form of `-o` and the value was set to `outfile`. You can change the prefix by specifying the first argument, for example, ``${output|arg(">")}`` will print ``>outfile``. Suffixes can also be specified, for example, ``${output|arg(suffix=";")}``.

**ext**
    The extension filter cuts away the file name extension and can also be applied multiple times. Assume your `output` option is set to `my.file.txt`. Using ``${output|ext}`` prints ``my.file`` while ``${output|ext|ext}`` prints ``my``. The ``ext`` filter cuts away the rightmost extension by default. You can, however, set the ``all`` option to ``True``. This will cause all file extensions to be removed. For example, `my.file.txt` passed through ``${output|ext(all=True)}`` will print ``my``.

**suf**
    Takes a single argument and adds it as a suffix to the option value.

**pre**
    Takes a single argument and adds it as a prefix to the option value.

**name**
    Returns the basename of a file.

**abs**
    Returns the absolute path of a file. If no argument is specified and the rendered value is an option instance, the absolute path is calculated relative to the tool job's working directory. Otherwise the current working directory is used as a base. You can specify a base folder as an optional argument to the filter.

**parent**
    Returns the name of the parent directory of a given file path.

**re**
    Takes two arguments for search and replace. The search argument can be a regular expression.

**else**
    Takes a single argument and outputs it if the passed-in value is either a file stream or evaluates to False.

.. note::

    All input and output file paths are translated to absolute paths in JIP. In order just to get the name of a file, use the ``name`` filter.
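The path-related filters map closely onto functions from Python's ``os.path`` module. The following sketch re-implements ``ext``, ``name``, ``parent``, ``pre``, and ``suf`` as ordinary functions so that the examples from the descriptions above can be checked by hand. These are illustrative stand-ins written for this section, not the filter implementations shipped with JIP, and they assume that ``parent`` yields the path of the parent folder.

```python
import os.path


def ext(value, all=False):
    """Cut the rightmost extension, or every extension if all=True."""
    # 'all' mirrors the keyword used in the template syntax above
    head, tail = os.path.split(value)
    stem = tail.split(".")[0] if all else os.path.splitext(tail)[0]
    return os.path.join(head, stem)


def name(value):
    """Base name of a file path."""
    return os.path.basename(value)


def parent(value):
    """Path of the parent folder."""
    return os.path.dirname(value)


def pre(value, prefix):
    return prefix + value


def suf(value, suffix):
    return value + suffix


print(ext("my.file.txt"))            # my.file
print(ext(ext("my.file.txt")))       # my
print(ext("my.file.txt", all=True))  # my
# equivalent of ${myfile|parent|name|pre('1_')}
print(pre(name(parent("/data/run1/reads.txt")), "1_"))  # 1_run1
```

Chaining the functions from the inside out corresponds to reading a filter chain from left to right in a template.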
The JIP `repository contains an example `_ that demonstrates the usage of the
filters::

    #!/usr/bin/env jip
    # Template filter examples
    #
    # usage:
    #     template_vars.jip -i [-o ] [-b]
    #
    # Options:
    #     -i, --input     A single input file
    #     -o, --output    Output file
    #                     [default: stdout]
    #     -b, --boolean   A boolean option

    echo "========================================="
    echo "Raw values are printed as they are, except"
    echo "stream and boolean options."
    echo ""
    echo "RAW INPUT : ${input}"
    echo "RAW OUTPUT : ${output}"
    echo "RAW BOOLEAN : ${boolean}"
    echo "========================================="

    echo "========================================="
    echo "The 'arg' filter without any argument"
    echo "prefixes the value with its option if"
    echo "the value is not a stream and it evaluates to"
    echo "true."
    echo ""
    echo "RAW INPUT : ${input|arg}"
    echo "RAW OUTPUT : ${output|arg}"
    echo "RAW BOOLEAN : ${boolean|arg}"
    echo "========================================="

    echo "========================================="
    echo "The 'arg' filter with arguments can be"
    echo "used to add custom prefixes and suffixes"
    echo "if the value is not a stream and evaluates"
    echo "to true."
    echo ""
    echo "RAW INPUT : ${input|arg('--prefix ', ';suffix')}"
    echo "RAW OUTPUT : ${output|arg('>')}"
    echo "RAW BOOLEAN : ${boolean|arg('--yes')}"
    echo "========================================="

    echo "========================================="
    echo "The 'pre' and 'suf' filters can also be"
    echo "used to add a prefix or a suffix."
    echo ""
    echo "RAW INPUT : ${input|pre('--prefix ')|suf(';suffix')}"
    echo "RAW OUTPUT : ${output|pre('>')}"
    echo "RAW BOOLEAN : ${boolean|suf('yes')}"
    echo "========================================="

    echo "========================================="
    echo "The 'name' filter returns the base name"
    echo "of a file or directory"
    echo ""
    echo "RAW INPUT : ${input|name}"
    echo "RAW OUTPUT : ${output|name}"
    echo "RAW BOOLEAN : ${boolean|name}"
    echo "========================================="

    echo "========================================="
    echo "The 'parent' filter returns the path to"
    echo "the parent folder of a file or directory"
    echo ""
    echo "RAW INPUT : ${input|parent}"
    echo "RAW OUTPUT : ${output|parent}"
    echo "RAW BOOLEAN : ${boolean|parent}"
    echo "========================================="

    echo "========================================="
    echo "The 'ext' filter cuts away the last file"
    echo "extension. By default, the extension is"
    echo "detected by '.', but you can specify a"
    echo "custom split character"
    echo ""
    echo "RAW INPUT : ${input|ext}"
    echo "RAW OUTPUT : ${output|ext('_')}"
    echo "RAW BOOLEAN : ${boolean|ext}"
    echo "========================================="

    echo "========================================="
    echo "The 'else' filter can be used to insert a"
    echo "string in case the value evaluates to "
    echo "a stream or false."
    echo ""
    echo "RAW INPUT : ${input|else('-')}"
    echo "RAW OUTPUT : ${output|else('default')}"
    echo "RAW BOOLEAN : ${boolean|else('--no')}"
    echo "========================================="

    echo "========================================="
    echo "The 're' filter can be used for search"
    echo "and replace on the value. Regular"
    echo "expressions are supported."
    echo ""
    echo "RAW INPUT : ${input|re('setup', 'replaced')}"
    echo "RAW OUTPUT : ${output|re('.py$', '.txt')}"
    echo "RAW BOOLEAN : ${boolean|re('no', 'effect')}"
    echo "========================================="

Option translation
^^^^^^^^^^^^^^^^^^

The template context offers access to the ``options``, which can be used for a
quick one-to-one translation of your input parameters into a command template.
For example:

.. code-block:: bash

    #!/usr/bin/env jip
    # The GEM Indexer tool
    #
    # Usage:
    #     gem_index -i [-o ] [-t ] [--no-hash]
    #
    # Options:
    #     --help              Show this help message
    #     -o, --output-dir    The folder where the output GEM index is created
    #     -t, --threads       The number of execution threads [default: 1]
    #     --no-hash           Do not produce the hash file [default: false]
    #
    # Inputs:
    #     -i, --input         The fasta file for the genome

    gemtools index ${options()}

Here, all specified options will be rendered after ``gemtools index``. This
only applies to non-hidden options that have a long or a short name. That
means, if you want dynamically created options to be rendered, you have to set
the ``long`` or ``short`` flags and make them non-hidden::

    add_output("output", short='-o', hidden=False)

.. _python_context:

The script context
------------------

Within a jip script, within template blocks, and in Python blocks like
*validate*, *setup*, *init*, or *pipeline*, a set of functions is exposed to
simplify tasks that have to be done quite often, for example, checking for the
existence of files. The following functions and variables are available
without any additional import statements:

* **tool** holds a reference to the current tool or pipeline
* **args** is a read-only dictionary of the option values
* **opts** holds a reference to the tool/pipeline
  :py:class:`jip.options.Options` instance. This can be used like a dictionary
  to access the raw options. Note that you will not get the values directly
  but an instance of :py:class:`jip.options.Option`. If you want to get the
  value, try ``opts['output'].get()``.
* **_ctx** is a named tuple that allows read-only access to the current script
  context.
* **__file__** contains the path to the script file
* **pwd** is a string with the current working directory
* **basename** Python's :py:func:`os.path.basename`
* **dirname** Python's :py:func:`os.path.dirname`
* **abspath** Python's :py:func:`os.path.abspath`
* **exists** Python's :py:func:`os.path.exists`. Please note that you might
  want to take a look at the :py:meth:`~jip.tools.PythonBlockUtils.check_file`
  function exposed in the context or :py:meth:`jip.options.Option.check_file`.
  Both check for the existence of a file, but if the tool is used in a
  pipeline, the check only happens when the option is not passed in as a
  dependency. In that case the file might simply not exist yet because the job
  that the option depends on has not been executed yet.
* **r** is an alias for the :py:meth:`~jip.templates.render_template` function

In addition, the following functions are available:

.. automethod:: jip.tools.PythonBlockUtils.check_file
    :noindex:

.. automethod:: jip.tools.PythonBlockUtils.validation_error
    :noindex:

.. automethod:: jip.tools.PythonBlockUtils.run
    :noindex:

.. automethod:: jip.tools.PythonBlockUtils.bash
    :noindex:

.. automethod:: jip.tools.PythonBlockUtils.job
    :noindex:

.. automethod:: jip.tools.PythonBlockUtils.name
    :noindex:

.. automethod:: jip.tools.PythonBlockUtils.set
    :noindex:

.. automethod:: jip.options.Options.add_output
    :noindex:

.. automethod:: jip.options.Options.add_input
    :noindex:

.. automethod:: jip.options.Options.add_option
    :noindex:

.. automethod:: jip.templates.render_template
    :noindex:

..
_injected_functions:

Injected functions
^^^^^^^^^^^^^^^^^^

If you use a class-based approach and the :ref:`decorators ` to implement your
tools, the following functions and attributes are injected into your class if
they do not conflict with a local function or attribute:

options
    Reference to your tool's :py:class:`~jip.options.Options` instance

opts
    An alias for ``options``

args
    Read-only dictionary of the option values

ensure
    Helper function that simplifies raising validation errors

check_file
    The ``check_file`` helper to check for the existence of files referenced
    by an option

validation_error
    Quickly raises a validation error

name
    A function to set the run-time name of your tool or pipeline

add_output
    Add an output option

add_input
    Add an input option

add_option
    Add a general option

render_template
    Render a template string

r
    An alias for ``render_template``

In addition, all tool options are injected as class attributes as long as they
do not conflict with an existing property. This allows you to quickly access
the functions and properties in your class-based implementations. For example:

.. code-block:: python

    @tool('bwa_index')
    class BwaIndex():
        """\
        Run the BWA indexer on a given reference genome

        Usage:
            bwa_index -r

        Inputs:
            -r, --reference  The reference
        """
        def init(self):
            # add_output is injected by the decorator
            self.add_output('output', '${reference}.bwt')

        def get_command(self):
            return 'bwa index ${reference}'

Here we access the ``reference`` option and the ``add_output`` function as
class attributes directly.
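The "inject unless it conflicts" rule described above can be illustrated with a few lines of plain Python. This is a minimal sketch of the general idea only, not the code the JIP decorators actually run: ``inject_helpers``, ``render``, and the ``Demo`` class are hypothetical names invented for this example.

```python
def inject_helpers(cls, helpers):
    """Copy each helper onto cls unless cls already defines that name."""
    for attr, value in helpers.items():
        if not hasattr(cls, attr):
            setattr(cls, attr, value)
    return cls


def render(self, template):
    # Placeholder for a real template renderer: plain string substitution.
    return template.replace("${reference}", self.reference)


class Demo:
    reference = "genome.fa"

    def name(self):
        # Defined locally, so an injected ``name`` must not replace it.
        return "local-name"


inject_helpers(Demo, {"r": render, "name": lambda self: "injected-name"})

demo = Demo()
print(demo.r("bwa index ${reference}"))  # bwa index genome.fa
print(demo.name())                       # local-name
```

Checking with ``hasattr`` before ``setattr`` means a class keeps full control over any name it defines itself, which is why a locally defined function or attribute always wins over an injected one.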