# Using HTCondor

This tutorial demonstrates how to submit jobs to the HTCondor infrastructure
at INM7.

### Meet Bob and Alice...

![Bob and Alice](src/alice_bob.png)

Bob and Alice are new PhD students with stressful lives: they have to understand
complex science, develop analysis methods, write code, interpret & write up the
results, and survive the whims of their PhD supervisors. With all of this overhead,
they do not care much about the technical details of how to efficiently distribute
and run their computations on the available computational infrastructure. Instead, ...

... Bob just logs into one of the available compute nodes and starts a gigantic
analysis. This analysis keeps the node so busy that no one else can use it, and
yet it takes weeks to complete. He gets stressed about how long it takes, and
everyone else is annoyed because they can't work while his computations are running.

... Alice lets **HTCondor manage her computational jobs**, and in
**only a few days** all of her analyses are computed, because Condor fitted
everything exactly where computational resources were available and ran many
jobs in parallel, **without interfering** with the work that others are doing on
the cluster.

Alice can compute a lot more, way faster, and with far less effort than Bob by
using HTCondor. If you want to be like Alice, this tutorial is for you.

### 1000-foot overview

Simply put, HTCondor is software that schedules and runs computing tasks on computers.
Beyond the initial *job submission*, Alice can relax and have HTCondor do all the work
of fairly distributing, running, and monitoring her jobs for her.
The basic workflow looks like this:

Alice submits a *job* to HTCondor. HTCondor chooses when and where to run the job
based on the job's requirements and the available worker machines. It monitors
the job's progress and notifies Alice upon completion.
Practically, this means Alice submits her job(s) from a submit machine, the login
node (or head node) of a compute cluster, to a *queue*.
HTCondor manages this queue and schedules the jobs to run on the available compute
nodes of the cluster. Thus, Alice does not run her analysis on the compute nodes
themselves -- HTCondor handles this exclusively. Because HTCondor knows basic properties
of each compute node's resources (e.g., operating system, available memory, or current
CPU load), it matches a job to a resource based on the *requirements* of the job and
can thus -- just like in a game of Tetris -- run a job as soon as it fits in somewhere
on the available resources. This makes job execution faster and more efficient
than any human-made spreadsheet, and Alice is happy that she does not need to care
when or where to start her jobs.
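
As a small preview (submit files are covered in detail in the next sections),
the requirements and resource requests of a single job could be expressed in a
submit file roughly like this sketch; all file names and values here are made
up for illustration:

```
# sketch of a minimal HTCondor submit file; names and values are illustrative
executable      = /usr/bin/python3
arguments       = code/process.py -i input/sub-01/sub-01_data_file -o output/sub-01/sub-01_processed
request_cpus    = 1
request_memory  = 2G
log             = sub-01.log
output          = sub-01.out
error           = sub-01.err
queue
```

The `request_*` lines are what HTCondor matches against the resources of the
available compute nodes.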
From the login node, Alice can view the state of her jobs or troubleshoot problems,
should they occur. Beyond managing and monitoring job execution, Condor keeps *logs* of the jobs,
*error* files, and *output* files that provide valuable information.

![HTCondor workflow](src/condor_cycle.svg)

### Where to start
The key to using Condor is to break down an analysis into smaller, loosely coupled
computing tasks that can be distributed across a compute cluster.
Often, this is easily possible, and helps to make analyses faster and more suitable
for a scientific compute cluster. For example, the large computation
"preprocess 200 subjects" can be split into 200 times the job "preprocess 1 subject".
All it takes is a single *submit file* that queues these 200 jobs, and Condor
will take care of running them as efficiently as possible, as much in parallel as
possible, and as fairly towards other users of a shared computing resource as possible.

Imagine you have an analysis script (``process.py``) that processes input data
(``sub-*_data_file``) from 20 subjects, and saves the results in 20 independent
files.

For your analysis, you build an intuitively and consistently structured/named
directory structure, such as this:

```
home
└── user
    ├── documents
    ├── projects
    │   └── myanalysis
    │       ├── code
    │       │   └── process.py
    │       ├── input
    │       │   ├── sub-01
    │       │   │   └── sub-01_data_file
    │       │   ├── sub-02
    │       │   │   └── sub-02_data_file
    │       │   ├── [...]
    │       │   └── sub-20
    │       │       └── sub-20_data_file
    │       └── output
    └── scratch
```

For your analysis, you want to process the ``data_file``s of all subjects and
save them into ``output``.
In order to compute such an analysis efficiently with HTCondor, you
have to complete the following steps:

1. Prepare the script to constitute an ideally sized job
2. Create a submit file defining all the jobs you need to run
3. Submit the jobs to HTCondor


### 1. Prepare an ideally sized job

An ideal job is as small as feasible, and independent: smaller jobs run faster,
and since independent jobs can be executed in parallel, you can save a lot of time.
Moreover, HTCondor matches jobs to compute resources.
As there are more compute nodes with small and mid-sized resources, and fewer with
large resources for single jobs, a smaller job has a higher chance of finding a
free slot.

The largest possible job in the above scenario is a single job that processes all
20 subjects, for example by hardcoding all of the files in the
script to process them one after the other. **Such a job is not ideal.**

A better approach is to define a smaller, yet still feasible job size:
20 single subject analyses.

How do you do this without adjusting the same script 20 times, once for each
subject's file? A sensible approach is to provide command line arguments to the
script that define an input file and an output file name, allowing subject-wise
computations without modifying the script.
This ensures that the script flexibly processes any subject with command line
calls such as

```
python3 code/process.py -i input/sub-01/sub-01_data_file -o output/sub-01/sub-01_processed
```
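
The real ``process.py`` is not shown in this tutorial, but its command line
interface could be sketched with Python's standard ``argparse`` module like
this; the processing step itself is only indicated by a comment:

```python
# Hypothetical sketch of process.py's command line interface using argparse.
import argparse

def parse_args(argv=None):
    parser = argparse.ArgumentParser(
        description="Process a single subject's data file")
    parser.add_argument("-i", "--input", required=True,
                        help="input data file for one subject")
    parser.add_argument("-o", "--output", required=True,
                        help="file name for the processed result")
    # argv=None makes argparse fall back to sys.argv when run as a script
    return parser.parse_args(argv)

args = parse_args(["-i", "input/sub-01/sub-01_data_file",
                   "-o", "output/sub-01/sub-01_processed"])
# at this point args.input and args.output hold the two paths;
# the actual analysis would read args.input and write to args.output
```

Because both paths come from the command line, the same script serves all 20
subjects unchanged.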

The above command line call is a single job, processing only subject 1. But
as directory structures and file naming are consistent, a simple for-loop over
subjects can define all 20 jobs at once. Such a for-loop can be written in any
language; below is an example in ``bash``:

```
for dir in input/sub-*; do
    sub=$(basename "$dir")
    echo "python3 code/process.py -i input/${sub}/${sub}_data_file -o output/${sub}/${sub}_processed"
done
```

Running this loop in a terminal would print:

```
python3 code/process.py -i input/sub-01/sub-01_data_file -o output/sub-01/sub-01_processed
python3 code/process.py -i input/sub-02/sub-02_data_file -o output/sub-02/sub-02_processed
python3 code/process.py -i input/sub-03/sub-03_data_file -o output/sub-03/sub-03_processed
python3 code/process.py -i input/sub-04/sub-04_data_file -o output/sub-04/sub-04_processed
python3 code/process.py -i input/sub-05/sub-05_data_file -o output/sub-05/sub-05_processed
[...]
python3 code/process.py -i input/sub-20/sub-20_data_file -o output/sub-20/sub-20_processed
```

Yay! 20 jobs. If you now put them into a submit file and queue them, HTCondor will
distribute these 20 jobs to the available compute resources. At best, all 20 of your
jobs will run at the same time, using only 1/20th of the time of a single, large
analysis.
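
As a glimpse ahead, such a set of jobs could be expressed in a single submit
file; this sketch assumes a text file ``subjects.txt`` listing ``sub-01``
through ``sub-20``, one per line, and the exact syntax is explained in the
later sections:

```
# sketch: one submit file queuing one job per subject
executable      = /usr/bin/python3
arguments       = code/process.py -i input/$(sub)/$(sub)_data_file -o output/$(sub)/$(sub)_processed
log             = logs/$(sub).log
output          = logs/$(sub).out
error           = logs/$(sub).err
queue sub from subjects.txt
```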

How do you do this? The next sections will get you there:

- The section [01_running_a_job](01_running_a_job.md) is a guide to understanding
  a *submit file*.
- The section [02_quickstart](02_quickstart.md) will let you write a
  tiny script for a single job, write a submit file for this job, submit the job
  to HTCondor, and interact with the monitoring aspects of the submission.
- Lastly, the section [03_define_jobs](03_define_jobs.md) will return to this
  example and show you how to submit all 20 jobs at once.

### Practical advice on how to prepare small-sized jobs

-  Identify which parts of your analysis are a bottleneck, and which parts can
   be broken down into smaller parts. A top-level analysis that aggregates statistics
   over all subjects cannot be broken down into smaller computations, but the
   lower-level analyses that compute these statistics per subject can easily be
   broken into subject-based computations.
-  Instead of having one large script perform all computations across all subjects,
   prepare many scripts that each compute an independent part of your analysis if
   possible.
-  In your programming language of choice, use tools that allow you to specify
   any varying input to your script (data files, output names, options, ...) via
   the command line.
-  "Small" jobs run independently of one another. This involves defining all
   input files for each job and ensuring that output files do not overwrite
   those of other jobs. For simulations, this involves deciding what constitutes
   one job (e.g., should a job run one simulation, or multiple simulations that
   share the same inputs?). For data processing, breaking a task down may involve
   splitting data into smaller files.
-  For large data, consider whether breaking the data up into smaller files is
   feasible. For large amounts of simulations, consider whether they can be run
   in parallel, maybe by creating an input table of parameters in which each row
   corresponds to a job.
-  As shown above, have a way to identify files by some naming convention. The
   BIDS standard for example works well!
-  When a computation or input data really can't be broken down into smaller
   pieces, or into pieces that run in parallel, the analysis can be run as a single,
   large job. However, a large job takes more time to run, it will
   likely take longer until an available compute resource for this job is
   found, and large jobs are penalized more heavily by HTCondor's ranking/
   prioritization system.
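
The parameter-table idea from the list above can be sketched in a few lines of
Python; the script name ``simulate.py`` and the parameter names are made up
for illustration, and the table would normally live in a file such as
``params.csv`` rather than inline:

```python
# Hypothetical sketch: derive one job command line per row of a parameter table.
import csv
import io

param_table = """seed,n_iterations
1,1000
2,1000
3,5000
"""

def commands_from_table(table_text):
    """Return one command line per row of a CSV parameter table."""
    commands = []
    for row in csv.DictReader(io.StringIO(table_text)):
        commands.append(
            f"python3 code/simulate.py --seed {row['seed']}"
            f" --iterations {row['n_iterations']}"
        )
    return commands

for cmd in commands_from_table(param_table):
    print(cmd)
```

Each printed line is one independent job, exactly like the 20 subject-wise
commands above.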


### Further Reading

Please see [the HTCondor User Manual](https://research.cs.wisc.edu/htcondor/manual/)
for more information about HTCondor concepts and further examples. The following
publication may be helpful if you need a high-level overview in a scientific
format: Erickson RA, Fienen MN, McCalla SG, et al. Wrangling distributed computing for
high-throughput environmental science: An introduction to HTCondor.
[doi:10.1371/journal.pcbi.1006468](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006468)