.. _setup: Setup and configuration ======================= Dependencies and requirements ----------------------------- JIP does have a bunch dependencies that should be installed with the system automatically. What is needed are the following libraries: * `SQLAlchemy `_ is used for the job database integration * `Jinja2 `_ is the template system that is used. **Note** that instead of the default ``{{`` ``}}`` separators, by default, JIP templates use ``${`` ``}``. You can change the variable open and close strings in the :ref:`JIP configuration `. * ``argparse`` is used for argument parsing. This is part of the Python standard library since version 2.7, but will be installed as a dependency for older python versions. .. note:: The JIP job database uses an SQlite back-end, which is part of the Python standard library since version 2.5, but needs to be enabled when Python is compiled. Most of the bundled Python installations come with support for sqlite, but if you compiled your own version of Python, make sure you have sqlite support. You can check if your Python installation supports sqlite with the following command:: $> python -c 'import sqlite3' If the command above does not raise any exception, you have sqlite support. The current implementation uses the job database for simple communication. That requires a way to lock the database file and have it in a location accessible from all nodes in your compute cluster. The default location is ``$HOME/.jip/jobs.db``, but you can change the path in the JIP configuration. For more complex pipeline implementations the SQlite back-end could generate concurrency issues, especially on network shared file systems where the file locking mecanism could not work properly or could not even be available. In order to resolve this problem, the JIP database module also supports a MySQL back-end. In order to use it `mysql-python `_ must be installed beforehand. Installation ------------ The JIP system is mainly implemented in Python and can be installed from either the JIP `GitHub repository `_ or directly through pypi. Install from GitHub ^^^^^^^^^^^^^^^^^^^ In order to install JIP from the Github repository, you have to check out the code and run the install script:: $~> git clone https://github.com/thasso/pyjip $~> cd jip $~/jip> python setup.py install This will install JIP system wide, but you need to have administrative privileges to do so. If you do not have root permissions or you do not want to install JIP system wide, you can append the `--user` option to the install command:: $~/jip> python setup.py install --user This will install into you home folder. .. note:: You might have to update your `PATH` variable to include `$HOME/.local/bin`. This is the default install location for `--user` mode. Install from pypi ^^^^^^^^^^^^^^^^^ The pypi Python repository contains tools and libraries written in Python and provides an easy way to install such packages. An easy way to install a package from pypi is using the `pip` package manager. You can install JIP system wide with `pip`:: $~> pip install jip Alternatively, you can also install the JIP into your home folder with `pip` by appending the `--user` option:: $~> pip install jip --user .. _jip_configuration: Configuration and setup ----------------------- After installation, you should have the `jip` command line tool available to interact with the system. This command can be seen as the master and control command to work with jip tools, pipelines, and jobs from the command line. JIP comes with a set of :ref:`JIP command line utilities `. Almost all of the commands at hand will work out of the box, but some might need a little bit of configuration before you can use them on your system. The *JIP configuration* is stored in two location on your system: `$INSTALL_DIR/jip.json`: You can put a global configuration just next to the `jip` executable. This configuration file will always be loaded and evaluated for all calls to the command line utilities. In case you use the Python API directly, you might have to specify the path to the global configuration file explicitly. To do this, set ``jip.configuration.install_path`` to the absolute path the directory that contains the ``jip.json`` **before** you make any other calls to the JIP API. `$JIP_CONFIG`: You can point to a custom configuration using the :envvar:`JIP_CONFIG` environment variable. If the file exists, it is loaded after the global configuration but before the configuration in the current users home directory. `$HOME/.jip/jip.json`: In order to provide user-level configuration, you can create a `.jip` folder in your :envvar:`$HOME` directory and put a `jip.json` configuration file there. This file will be evaluated automatically for both the calls to the command line utilities as well as any calls done using the Python API directly. The file will extend and overwrite any global configuration. Here is an example of a JIP configuration file:: { "db": "sqlite:///home/thasso/.jip/jobs.db", "jip_path": "", "jip_modules": [], "cluster": "jip.cluster.Slurm", "profiles": { "default": { "queue": "project", "time": "3h" } }, "templates":{ "variable_open": "${", "variable_close": "}" } } The configuration can contain the following entries that are used by the JIP API: `db` Database location. The path or URL to connect to the JIP database. The JIP database is used to store runtime information about jobs submitted to a compute cluster. By default, :command:`jip` puts the database into `$HOME/.jip/jobs.db` and uses an embedded sqlite database. This setting can be overwritten at runtime using the :envvar:`JIP_DB` environment variable. Valid URLs for the connection string are:: # SQlite paths rel/path/to/dir/jip.db /abs/path/to/dir/jip.db sqlite:///rel/path/to/dir/jip.db sqlite:////abs/path/to/dir/jip.db # MySQL URLs mysql://user:password@host/jip mysql:///jip (uses the user MySQL configuration in ~/.my.cnf) For MySQL databases, :command:`jip` assumes that the specified database already exists. No database creation operation is performed. `jip_path` Colon separated path or locations for jip tools. You can put a colon separated list of folder here. All folders in this list will be searched for tools. You can add paths at runtime using the :envvar:`JIP_PATH` environment variable. `jip_modules` List of Python modules. Put a list of module names here to specify locations of JIP tools that are implemented in a Python module. For examples:: ... "jip_modules":["my.tools"] ... With this configuration, JIP will load the `my.tools` Python module to search for tools. Please note that `my.tools` module must be available on your :envvar:`PYTHONPATH`. You can add module dynamically to the list using the :envvar:`JIP_MODULES` environment variable. `cluster` name of a class that implements :py:class:`jip.cluster.Cluster`. When used in a cluster environment, the specified class is used to interact with your grid system on the lower level. See :ref:`the cluster configuration documentation ` and the :py:mod:`jip.cluster` module for more information about supported cluster engines and how you can configure them. `profiles` list of profiles that can be used to configure jobs on a cluster `templates` .. _config_templates: configure parts of the template system. Currently, you can change the separator strings that are used to access the templates variables. For examples, if you want to switch back to the jinja2 defaults, add the following configuration block:: "templates":{ "variable_open": "{{", "variable_close": "}}" } In addition, other configuration blocks, which are interpreted by specific module, can be specified. For example, the different cluster implementations can ask for specific configuration blocks. .. _cluster_config: Cluster Configuration ^^^^^^^^^^^^^^^^^^^^^ The ``cluster`` configuration is loaded from your JIP configuration file. The following base configurations are available. Please refer to the implementation documentation for details of the configuration parameters for each grid connectors. Grid engines ************ JIP ships with connector implementations for the following grid systems: For a :class:`~jip.cluster.Slurm` cluster:: { "cluster": "jip.cluster.Slurm" } For a :class:`PBS/Torque ` cluster:: { "cluster": "jip.cluster.PBS" } For a :class:`Gridengine/SGE/OGE ` cluster:: { "cluster": "jip.cluster.SGE", "sge" : { "threads_pe": "threads" } } Please note that for SGE, in order to submit multi-threaded jobs, you have to specify the parallel environment that is configured for threaded jobs. For a :class:`Platform LSF or Openlava ` cluster:: { "cluster": "jip.cluster.LSF" } Local scheduler *************** If you don't have access to a compute grid or if you want to use JIP on your local machine to schedule jobs & run them in the background, JIP comes with local scheduler implementations. For this to work, you have to configure JIP to connect to a server process using the :class:`JIP local scheduler connector ` in your JIP configuration:: { "cluster": "jip.grids.JIP", "jip_grid": { "port": 5556 } } In addition you have to start the *JIP server* and keep it running:: $> jip server This will start a server process that will take care of accepting jobs and executing them in the background. .. note:: The JIP server uses PyZMQ for message passing and you have to make sure that the library is installed before you can start the server. You can install PyZMQ with pip:: $> pip install pyzmq