Lesson 6: Array jobs

Many researchers find that they often need to run the same task multiple times with different input parameters. Whilst this could be done by submitting lots of individual jobs, with one job script file per task, a more efficient and robust way is to use an array job. Because the whole array is submitted as a single job, this method also lets you stay within the maximum jobs per user limit and administer the submission process more effectively.

Prepare a job script called array.sh containing the following:

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=1G
#$ -t 1-30         # Run 30 tasks

# Sleep for 60 seconds to give you a chance to run qstat
sleep 60

# Display the task id of the current job
echo "This is task number ${SGE_TASK_ID} of job ${JOB_ID}"

1) Submit the job

[USERNAME@login-01 ~]$ qsub array.sh
Your job-array 4723192.1-30:1 ("array.sh") has been submitted

Check the status of your running job with qstat, which (if you are quick enough) should display thirty items, each with the same job number. These are called tasks, and each task runs independently.

As each task completes, it will create an individual file containing the output generated by the task. Each file will be named in the format array.sh.oJOBID.TASKNUMBER.

2) Examine the content of one of the output files

In our example, the job id was 4723192. Executing ls array.sh.o4723192.* will show the list of output files generated by the 30 tasks. In this example we examine the output of task 13:

[USERNAME@login-01 ~]$ cat array.sh.o4723192.13
This is task number 13 of job 4723192
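With many tasks, it can be handy to check that every task produced an output file. The following is a minimal sketch, not part of the original lesson: it uses a temporary directory and a handful of simulated output files (named in the array.sh.oJOBID.TASKNUMBER format described above) so that it runs anywhere. The variable names outdir and missing, and the choice of 5 tasks with task 3 absent, are assumptions made for illustration.

```shell
# Job id taken from the example above; the directory is a stand-in
# so the sketch can run outside the cluster.
job_id=4723192
outdir=$(mktemp -d)

# Simulate output files for tasks 1-5, leaving task 3 out on purpose
for t in 1 2 4 5; do
    echo "This is task number $t of job $job_id" > "$outdir/array.sh.o${job_id}.$t"
done

# Collect the task numbers that have no output file
missing=""
for t in $(seq 1 5); do
    [ -e "$outdir/array.sh.o${job_id}.$t" ] || missing="$missing $t"
done

echo "Tasks without an output file:${missing:- none}"
```

On a real run you would point the loop at the directory the job was submitted from and use the full task range of your array.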

3) Change the step size

By default, an array job uses an increment of 1 to step through the task range provided. However, -t accepts any range of task ids, with an optional step size given after a colon. In the following example, we use a step size of 10:

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=1G
#$ -t 10-100:10    # Run 10 tasks, using a step size of 10

# Sleep for 60 seconds to give you a chance to run qstat
sleep 60

# Display the task id of the current job
echo "This is task number ${SGE_TASK_ID} of job ${JOB_ID}"

Try this for yourself with different task values and step sizes.
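Before submitting, you may want to preview which task ids a given range and step size will produce. The sketch below does this locally with seq, using stand-in variables first, last and step (names chosen here for illustration) that mirror the -t 10-100:10 request above. Inside a running task, Grid Engine exposes the same values as SGE_TASK_FIRST, SGE_TASK_LAST and SGE_TASK_STEPSIZE.

```shell
# Stand-in values mirroring "-t 10-100:10"
first=10
last=100
step=10

# seq takes FIRST INCREMENT LAST and prints one id per line
task_ids=$(seq "$first" "$step" "$last")
task_count=$(printf '%s\n' "$task_ids" | wc -l | tr -d ' ')

# Unquoted on purpose so the ids print on a single line
echo "Task ids:" $task_ids
echo "Number of tasks: $task_count"
```

Changing the three values lets you confirm the number of tasks before choosing a -t range.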

4) Task concurrency

Task concurrency (-tc N) is the number of array tasks allowed to run at the same time. In the following example, we are limiting our array to only run 3 tasks at a time:

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=1G
#$ -t 1-30         # Run 30 tasks
#$ -tc 3           # Allow only 3 running tasks at a time

# Sleep for 60 seconds to give you a chance to run qstat
sleep 60

# Display the task id of the current job
echo "This is task number ${SGE_TASK_ID} of job ${JOB_ID}"

After submitting the job, you can use the qstat command to observe that no more than 3 tasks from the array are running at any time.

5) Deleting tasks

Because there is just one job id, all of the tasks in the array can be deleted at once with the qdel command by supplying the job id:

USERNAME@login-01:~$ qsub array.sh
Your job-array 4723274.1-10:1 ("array.sh") has been submitted
USERNAME@login-01:~$ qdel 4723274
USERNAME has registered the job-array task 4723274.1 for deletion
USERNAME has registered the job-array task 4723274.2 for deletion
USERNAME has registered the job-array task 4723274.3 for deletion
USERNAME has registered the job-array task 4723274.4 for deletion
USERNAME has registered the job-array task 4723274.5 for deletion
USERNAME has registered the job-array task 4723274.6 for deletion
USERNAME has registered the job-array task 4723274.7 for deletion
USERNAME has registered the job-array task 4723274.8 for deletion
USERNAME has registered the job-array task 4723274.9 for deletion
USERNAME has registered the job-array task 4723274.10 for deletion

Note that we can also delete specific tasks by providing a task id, or range of tasks. In the example below, we delete any queued or running tasks in the range 6-8.

USERNAME@login-01:~$ qsub array.sh
Your job-array 4723281.1-10:1 ("array.sh") has been submitted
USERNAME@login-01:~$ qdel 4723281 -t 6-8
USERNAME has registered the job-array task 4723281.6 for deletion
USERNAME has registered the job-array task 4723281.7 for deletion
USERNAME has registered the job-array task 4723281.8 for deletion

This provides a very flexible way of administering tasks.

6) Process a list of files using an array job

A common requirement is to process a large number of files, or provide a filename as input to an application.

In this example, we want to rotate all of the images in the current directory. The images are suffixed with .png, so first we generate a list of files using the single-column option of the ls command:

$ ls -1 *.png > list_of_files.txt
$ wc -l list_of_files.txt
35 list_of_files.txt

At this stage it's best to inspect the content of this file using cat to ensure everything is correct. Since there are 35 files, we now have the -t value (1-35) to supply in the array script.
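Rather than typing the count by hand, the task range can be derived from the list itself. This is a minimal sketch under stated assumptions: it builds a three-line stand-in list in a temporary directory so it runs anywhere, and it only prints the qsub command instead of submitting. The names workdir, num_tasks and task_range are hypothetical.

```shell
# Stand-in list of files so the sketch runs outside the cluster
workdir=$(mktemp -d)
printf '%s\n' a.png b.png c.png > "$workdir/list_of_files.txt"

# Count the lines; tr strips the padding some wc implementations add
num_tasks=$(wc -l < "$workdir/list_of_files.txt" | tr -d ' ')
task_range="1-${num_tasks}"

# Print the command rather than running it
echo "qsub -t ${task_range} array.sh"
```

Supplying -t on the qsub command line in this way avoids editing the script each time the number of files changes.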

To process a list of files as an array, we add a special line to our script. It looks rather complicated but it is essentially the same line each time you use it:

INPUT_FILE=$(sed -n "${SGE_TASK_ID}p" list_of_files.txt)

This will assign the corresponding line from list_of_files.txt to a variable called INPUT_FILE. For the first task, this is the 1st line of our list of files, and therefore the name of our first file. For the nth task, it is the nth line, and so the nth file, until the array job is complete.
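To see the mapping from task number to filename without submitting anything, you can set SGE_TASK_ID by hand. The sketch below assumes a three-line stand-in list with made-up filenames (alpha.png, beta.png, gamma.png) in a temporary directory; on the cluster, the scheduler sets SGE_TASK_ID for each task.

```shell
# Stand-in list with made-up filenames
demo_dir=$(mktemp -d)
printf '%s\n' alpha.png beta.png gamma.png > "$demo_dir/list_of_files.txt"

# Pretend to be tasks 1, 2 and 3 in turn
for SGE_TASK_ID in 1 2 3; do
    # sed -n suppresses normal output; "Np" prints only line N
    INPUT_FILE=$(sed -n "${SGE_TASK_ID}p" "$demo_dir/list_of_files.txt")
    echo "Task ${SGE_TASK_ID} -> ${INPUT_FILE}"
done
```

Each task reads the whole list but keeps only its own line, so no two tasks process the same file.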

Here is our array job to rotate the files listed in list_of_files.txt. If you aren't running it in the same directory, you would need to ensure the list of files contains the correct relative or absolute path to find the files.

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=1G
#$ -t 1-35

# Load imagemagick to allow image rotation with the convert command
module load imagemagick

# Assign the nth line to INPUT_FILE
INPUT_FILE=$(sed -n "${SGE_TASK_ID}p" list_of_files.txt)

# Add an echo line for debugging (optional)
echo "Rotating ${INPUT_FILE}"

# Rotate the image and rename the output
# (quoting keeps filenames containing spaces intact)
convert "${INPUT_FILE}" -rotate 90 "${INPUT_FILE%.png}_rotated.png"
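If the -t range is larger than the number of lines in the list, the sed line yields an empty string and the task would run with no filename. One way to guard against this is sketched below. The helper get_input_file is hypothetical, not part of Grid Engine or the original lesson, and the stand-in list and filenames exist only so the sketch runs outside the scheduler. It also shows how ${INPUT_FILE%.png} strips the .png suffix to build the output name.

```shell
# Hypothetical helper: print line TASK_ID of LIST,
# or fail if that line is empty or missing.
get_input_file() {
    task_id=$1
    list=$2
    line=$(sed -n "${task_id}p" "$list")
    if [ -z "$line" ]; then
        echo "No entry for task ${task_id} in ${list}" >&2
        return 1
    fi
    printf '%s\n' "$line"
}

# Stand-in list so the sketch runs outside the scheduler
demo_dir=$(mktemp -d)
printf '%s\n' first.png second.png > "$demo_dir/list_of_files.txt"

# Task 2 has a matching line, so the output name can be derived
if INPUT_FILE=$(get_input_file 2 "$demo_dir/list_of_files.txt"); then
    OUTPUT_FILE="${INPUT_FILE%.png}_rotated.png"
    echo "Would rotate ${INPUT_FILE} into ${OUTPUT_FILE}"
fi

# Task 3 has no matching line, so the helper returns non-zero
if get_input_file 3 "$demo_dir/list_of_files.txt"; then
    guard_status=0
else
    guard_status=1
fi
```

In a job script, the same check could be followed by exit 1 so that a task with no input fails visibly instead of calling convert with an empty argument.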

Additional array job documentation can be found here.