Managing Workloads
A key concept in HPC/HTC is managing workloads, specifically the length of time a given job holds the resources that have been allocated to it.
Individual SLURM jobs should, by design, never run for only a short amount of time. This is especially true when submitting a high volume of jobs.
An ideal minimum runtime for a given SLURM job is one hour or more, particularly for GPU inference jobs. We have seen cases where the time needed to schedule a job, acquire resources, and load a model into GPU memory is orders of magnitude greater than the execution time of a single inference task. It is therefore imperative to batch up inference tasks so that meaningful work is accomplished after expending the time and effort to acquire resources and load models into GPU memory.
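As a minimal sketch of this batching pattern (the `run_inference` function and the input names are hypothetical placeholders for your actual inference program and data), a single job can amortize the scheduling and model-load cost over many inputs:

```shell
#!/bin/bash
#SBATCH --job-name=batched-inference   # one SLURM job for many inference tasks
#SBATCH --time=01:00:00                # target a runtime of an hour or more
#SBATCH --gres=gpu:1

# Hypothetical stand-in for an inference call; in practice this would be
# your program, which loads the model into GPU memory once and reuses it.
run_inference() {
    echo "inference result for $1"
}

# Process a whole batch of inputs inside a single allocation, so the
# scheduling and model-load overhead is paid once, not once per input.
for input in sample_001 sample_002 sample_003; do
    run_inference "$input"
done
```

The key point is structural: the expensive setup happens once at the top of the job, and the loop keeps the GPU busy with useful work for the remainder of the allocation.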
Submitting short-running jobs (e.g., ones that complete in seconds or less) to the SLURM job scheduler can be inefficient for a number of reasons:
Scheduling overhead: scheduling latency (the time SLURM takes to queue, allocate resources, and start a job) is often much longer than the runtime of short jobs. If your job only runs for, say, 1–10 seconds but SLURM spends roughly the same amount of time scheduling it, you waste most of your time in overhead rather than computation.
Queue congestion: a large number of short-running jobs can flood the scheduler, making it harder for SLURM to efficiently manage job priorities and resources. This can delay not only your own jobs but also those of other users, harming overall cluster performance.
Accounting and logging overhead: SLURM logs information about every job submission and completion. Thousands of short jobs generate large log files and accounting entries, which can strain the SLURM controller (slurmctld) and database (slurmdbd).
I/O and file system bottlenecks: many short-running jobs often independently read input files and write output files. This can overwhelm the shared filesystem with frequent small I/O operations, leading to contention and degraded performance.
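To see whether scheduling overhead dominates, compare a job's queue wait against its runtime. The timestamps below are hypothetical examples; on a real cluster they can be obtained from `sacct -j <jobid> --format=Submit,Start,Elapsed`:

```shell
#!/bin/bash
# Hypothetical example timestamps for one short job (GNU date syntax).
submit="2026-03-06T10:00:00"
start="2026-03-06T10:00:08"

# Seconds the job waited in the queue before it started running.
wait_s=$(( $(date -d "$start" +%s) - $(date -d "$submit" +%s) ))

runtime_s=5   # a short job that ran for only 5 seconds

echo "queue wait: ${wait_s}s, runtime: ${runtime_s}s"
if [ "$wait_s" -ge "$runtime_s" ]; then
    echo "overhead exceeds useful work: batch more tasks per job"
fi
```

When the wait routinely exceeds the runtime, as in this example, most of the wall-clock cost is overhead rather than computation, which is the signal to batch tasks together.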
Some alternatives to submitting a high volume of short-running jobs include:
Job packing: combine multiple small tasks into a single job script that runs them sequentially or in parallel.
Task launchers: use SLURM tools like srun within a single SLURM allocation to execute many short tasks without re-queuing.
Longer-term allocations: request your allocation for a longer time, then run your short jobs interactively.
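The job-packing pattern above can be sketched as a single batch script that runs several short tasks in parallel inside one allocation. The `short_task` function here is a placeholder for your real program; in an actual SLURM script each task could instead be launched with `srun --ntasks=1` so the scheduler places it on an allocated slot:

```shell
#!/bin/bash
#SBATCH --job-name=packed-tasks   # one allocation holding many short tasks
#SBATCH --ntasks=4
#SBATCH --time=01:00:00

# Placeholder for a short-running task; a real script would invoke your
# program here (optionally via srun within the allocation).
short_task() {
    echo "task $1 done"
}

# Run four short tasks in parallel inside the single allocation using
# shell background jobs, then wait for all of them before the job exits.
for i in 1 2 3 4; do
    short_task "$i" &
done
wait
```

Packing this way means the scheduler handles one job submission instead of four, which avoids the per-job scheduling, accounting, and logging overhead described above.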
Last Update 3/6/2026 11:37:04 PM