SLURM Guide

This document provides an overview of SLURM and discusses how to use various SLURM commands. The first set of commands covers job submission, while the second set covers using SLURM to get information about your current or past jobs.

Table of Contents

Using SLURM to Submit Jobs

Additional SLURM commands

Using SLURM to Submit Jobs

In general, there are two ways to submit a job: you can either construct a job submission script or use a command-line approach.

1. The Submission Script

Create a SLURM script called example.sl using a text editor.

The script contains job submission options (the #SBATCH lines) followed by the actual application command (myexe). The application command needs to be on your PATH.

In this example, you would enter the following into your example.sl script (note that each #SBATCH option below starts with two '-' characters, not one):

#!/bin/bash

#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=01:00:00
#SBATCH --mem=100m

myexe

The job submission options are flags that tell SLURM what your job needs (e.g., resources) in order to run. Here the options request one task (--ntasks=1), one CPU for that task (--cpus-per-task=1), 100 MB of memory (--mem=100m), and a one-hour time limit (--time=01:00:00). The single "task" being run is the myexe command. To get your job dispatched sooner, request resources (e.g., CPUs, memory, time limit) that approximate as closely as possible what your job actually uses at runtime rather than over-requesting. If you need help understanding your job's resource requirements, email research@unc.edu.

Many SLURM job submission options offer both a double-hyphen and a single-hyphen syntax for requesting resources. You can use either syntax in your job submission. Some of the more common options are listed in the table below.


Option               Description
--ntasks, -n         number of tasks
--cpus-per-task, -c  number of CPUs per task
--time, -t           time limit
--nodes, -N          number of nodes
--partition, -p      job partition

Additional useful SLURM job submission options include getting email notifications about your job, requesting a quality of service, and more. To learn more about the many different job submission options, refer to the man page for the sbatch command:

man sbatch
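
As an illustration of two of these options (the values here are only placeholders; replace <your_email_address> with your email address and <qos_name> with a quality of service available to you), the following #SBATCH lines would request email notification when the job ends or fails and a specific quality of service:

#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=<your_email_address>
#SBATCH --qos=<qos_name>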

To submit your job using your job submission script use the sbatch command:

sbatch example.sl

2. The Command-line Method

The equivalent command-line method of the above job submission would be

sbatch --ntasks=1 --cpus-per-task=1 --time=01:00:00 --mem=100m --wrap="myexe"

Note that every job you submit is assigned a unique job ID number by SLURM, which is displayed immediately upon submission. The job ID is useful for some of the other SLURM commands discussed below.
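
For example, a successful submission prints a single confirmation line similar to the following, where the number is the job ID assigned to your job (the value shown here is only a placeholder):

Submitted batch job 123456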

Additional SLURM commands

SLURM provides a variety of commands to get information about your jobs, account, etc.

Note that in the commands discussed below you'll need to replace

  • <onyen> with your actual ONYEN
  • <jobID> with the SLURM job ID number
  • <partition_name> with the name of the SLURM partition
  • <PI_group_name> with your SLURM PI group name.

1. The squeue command

This command can be used to see the status of your current jobs:

squeue -u <onyen>

The first column in the squeue output displays the job's ID number, the second column shows the partition associated with the job, etc.
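
With the default output format, the column headers look similar to the following (the exact spacing and set of columns can vary with the SLURM version and site configuration):

JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)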

For more information on the squeue command:

man squeue

2. The scancel command

This command can be used to cancel your submitted job that has a job ID number of jobID:

scancel <jobID>
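
If you want to cancel all of your queued and running jobs at once, scancel also accepts your ONYEN as a user filter:

scancel -u <onyen>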

For more information on the scancel command:

man scancel

3. The sacct command

This command can be used to check the details of your completed job that has a job ID number of jobID:

sacct -j <jobID> --format=User,JobID,MaxRSS,Start,End,Elapsed

Here the items listed for --format are the specific fields we are interested in retrieving for the job. Usually the MaxRSS field is of particular interest since it shows the maximum amount of RAM used by your job, so this value can be used to get a sense of your job's memory requirements.
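
If you don't have the job ID handy, sacct can also report on all of your jobs that started after a given date (the date below is only an example) using the same format fields:

sacct -u <onyen> --starttime=2025-01-01 --format=User,JobID,MaxRSS,Start,End,Elapsed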

For more information on the sacct command:

man sacct

4. The scontrol command

This command can be used to check the details of SLURM partitions:

scontrol show partition

If you want the output compressed you can use the --oneliner flag:

scontrol --oneliner show partition

If you want the output only for a specific partition you can specify the partition's name in the command:

scontrol show partition <partition_name>

The scontrol command can also be used to get information about a current job:

scontrol show jobid <jobID>
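
The output of scontrol show jobid is a long list of Field=Value pairs, so it can be convenient to filter it; for example, to see just the line containing the job's time limit:

scontrol show jobid <jobID> | grep -i timelimit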

For more information on the scontrol command:

man scontrol

5. The sinfo command

This command can be used to get information about SLURM partitions and nodes.

To get general information about SLURM partitions:

sinfo

You can also use the sinfo command to view available features:

sinfo -o "%20N %5D %10c %10m %25f %10G"
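
In the format string above, %N lists the node names, %D the number of nodes, %c the CPUs per node, %m the memory per node, %f the node features, and %G the generic resources (the numbers set the column widths). To restrict the same output to a single partition, add the -p flag:

sinfo -p <partition_name> -o "%20N %5D %10c %10m %25f %10G"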

For more information on the sinfo command:

man sinfo

6. The sacctmgr command

This command can be used to check SLURM account information.

  • To see the current limit settings by quality of service (qos):
sacctmgr show qos format=name%15,mintres,grptres,maxtres%20,maxtrespernode,maxtrespu%20,maxjobs,mintres,MaxSubmitJobsPerUser

Here the items listed for format are the specific fields we are interested in retrieving and we've also included formatting options (e.g., %15) to control how the output for a particular field is displayed.

  • To see your designated PI account and current user-level limits:
sacctmgr show association where user=<onyen> format=Account%40,GrpTRES%40
  • To see the current limit settings applied to groups:
sacctmgr show account withassoc where name=<PI_group_name>

The PI group name associated with your account can be found in the Account field of the sacctmgr command directly above.

Note: System administrators occasionally tweak settings on an as-needed basis.

For more information on the sacctmgr command:

man sacctmgr

7. The seff command

This command can be used to get some useful information on a completed job, such as your job's CPU utilization and memory utilization.

seff <jobID>

The seff command is a good way to evaluate your job's CPU and memory runtime requirements. The seff command returns inaccurate information for currently running jobs.

8. The sstat command

This is another command that can be used to get information on a running job.

sstat -j <jobID>
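
By default sstat prints a large number of fields; as with sacct, you can limit the output with --format (the fields below are just one reasonable selection, and for a batch job you may need to query the batch step, e.g. <jobID>.batch):

sstat -j <jobID> --format=JobID,AveCPU,MaxRSS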

 

Last Update 2/5/2025 5:48:44 AM