DOGWOOD SLURM EXAMPLES

This guide provides enough basic information to run straightforward jobs on Dogwood. Please use the table of contents to access more detailed information, or email research@unc.edu with questions.

Table of Contents

Method 1: The Submission Script

Method 2: Inline Submission

Example: Submitting mvapich2_2.3rc1 without mpirun

Interactive Debugging Example

Other SLURM Information

These are just examples to give you an idea of how to submit jobs on Dogwood for some commonly used applications. You’ll need to specify SBATCH options as appropriate for your job and application.

To connect to <onyen>@dogwood.unc.edu, see: Getting Logged on

Notable Directories

Your home directory is: /nas/longleaf/home/<onyen>. Your scratch space is: /21dayscratch/scr/o/n/<onyen>.
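For example, many users create a working directory in their scratch space and submit jobs from there. A minimal sketch (the directory name my_first_job is just a placeholder; substitute your own onyen in the path shown above):

$ cd /21dayscratch/scr/o/n/<onyen>
$ mkdir -p my_first_job
$ cd my_first_job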

Dogwood uses SLURM to schedule and submit jobs. Below are the most common methods to submit jobs to Dogwood. Always submit your compute jobs via SLURM. Never run compute jobs from the $ prompt on the login node (the node where you are logged in).

Method 1: The Submission Script

Create a bash script using your favorite editor. If you don’t have a favorite editor, use nano (for now).

nano example.sh

The script contains job submission options followed by application commands. Please enter the following into your script:

#!/bin/bash
#SBATCH --job-name=first_slurm_job
#SBATCH -N 2                      # number of nodes
#SBATCH -p 528_queue              # partition (queue)
#SBATCH --ntasks-per-node=44      # MPI tasks per node
#SBATCH --time=5:00:00            # format days-hh:mm:ss

mpirun my_parallel_MPI_job

Save your file and exit nano. Submit your job using the sbatch command:

sbatch example.sh

You have created a script, example.sh, that asks for 2 nodes, each running 44 tasks, for up to 5 hours. It names the job “first_slurm_job” and runs the MPI executable my_parallel_MPI_job using the mpirun command.
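When the job is accepted, sbatch prints the job ID it assigned (the number below is just a placeholder), and by default SLURM writes the job’s output to slurm-<jobid>.out in the submission directory. You can then follow the job with squeue:

$ sbatch example.sh
Submitted batch job 1234567
$ squeue -u <onyen>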

Method 2: Inline Submission

If you would like to submit your job at the command line without creating a script, please try the following:

$ sbatch -t 120 -p 528_queue -N 2 --ntasks-per-node=32 -o hello.out.%j --wrap="mpirun my_parallel_MPI_job"

This requests 32 tasks running on each of two nodes. We left out the --job-name option, used the shorthand -t option for time (specified in minutes), and specified our own output file, where the SLURM job ID will be substituted for the “%j” in the file name. Note: Open MPI throws a spurious error on this (probably due to a bug), but you can work around it by adding -oversubscribe after mpirun; it will then do the right thing, which you can verify by also adding --report-bindings. An example is shown below.
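For example, with an Open MPI build, the inline submission above with that workaround would look roughly like this (a sketch; my_parallel_MPI_job stands in for your own executable):

$ sbatch -t 120 -p 528_queue -N 2 --ntasks-per-node=32 -o hello.out.%j --wrap="mpirun -oversubscribe --report-bindings my_parallel_MPI_job"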

If you see the following error message, then you used the sbatch command without the --wrap option or a valid shell script. Use the submission-script method or the inline submission method shown above.

/var/spool/slurmd/job1535767/slurm_script: line 445: /var/spool/slurmd/bin/util/arch.sh: No such file or directory

Example: Submitting mvapich2_2.3rc1 without mpirun

The build of mvapich2_2.3rc1 does not include an mpirun command, but you can still run these MPI jobs with one small modification: use srun and specify the number of processes, as follows.

#!/bin/bash
#SBATCH --job-name=first_slurm_job
#SBATCH -N 2
#SBATCH -p 528_queue
#SBATCH --ntasks-per-node=44
#SBATCH --time=5:00:00  # format days-hh:mm:ss

srun -n $SLURM_NPROCS my_parallel_MPI_job
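As a side note, on most SLURM installations srun also picks the task count up from the allocation itself, and $SLURM_NTASKS is the current name for the legacy $SLURM_NPROCS variable. A sketch, assuming the same #SBATCH header as above:

srun my_parallel_MPI_job
# or, equivalently:
srun -n $SLURM_NTASKS my_parallel_MPI_job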

Interactive Debugging Example

$ module add matlab
$ srun -n 1 -p debug_queue --mem=5g --x11=first matlab

This requests 1 task in the debug_queue partition with 5 GB of memory and X11 forwarding, then launches MATLAB interactively.
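If you only need an interactive shell on a compute node rather than launching an application directly, a common pattern is srun with --pty. A minimal sketch (adjust the partition, memory, and time to your needs):

$ srun -p debug_queue -N 1 -n 1 --mem=5g -t 1:00:00 --pty /bin/bash

Type exit to end the session and release the allocation.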

Other SLURM Information

  • Partition (Queue) Information

     sinfo
    
    squeue
    squeue -u <onyen>
    squeue -u <onyen> -l
    

    Dogwood Partitions and User Limits

  • Job details

     scontrol show jobid <job_id_number>
    
  • Cancel Job

     scancel <job_id_number>
    
  • Details of a Completed Job. Note that -j below has a single hyphen ‘-’, and --format has two hyphens ‘--’.

     sacct -j <jobid> --format=JobID,JobName,Partition,ReqMem,MaxRSS,NTasks,AllocCPUS,Elapsed,State
    
    scontrol show job <jobid>
    

See man sacct and man scontrol for details.

 

Last Update 11/21/2024 1:36:46 AM