GETTING STARTED ON DOGWOOD

Table of Contents

Introduction

System Information

Getting an Account

Logging In

Directory Spaces

Mass Storage

Development and Application Environment: Modules

Job Submission

Additional Help

Introduction

  • The Dogwood cluster is a Linux-based computing system available to researchers across the campus. It is intended for large-way, distributed-memory, multi-node parallel jobs, and it has a fast switching fabric (InfiniBand EDR interconnect) for this purpose. Here, a large-way job means a job that spans more than a single node (more than 44 cores).

System Information

Getting an Account

Follow the steps listed on the Request a Cluster Account page and select Dogwood Cluster under subscription type. You will receive an email notification once your account has been created.

Logging In

Linux:

Linux users can use ssh from within their Terminal application to connect to Dogwood.

If you wish to enable X11 forwarding, use the “-X” ssh option. Be sure to use your UNC ONYEN and password for the login:

ssh -X <onyen>@dogwood.unc.edu

Windows:

Windows users should download MobaXterm (Home Edition). Then use the Session icon to create a Dogwood SSH session using dogwood.unc.edu for “Remote host” and your ONYEN for the “username” (Port should be left at 22).

Mac:

Mac users can use ssh from within their Terminal application to connect to Dogwood. Be sure to use your UNC ONYEN and password for the login:

ssh -X <onyen>@dogwood.unc.edu

To enable X11 forwarding, Mac users need to download, install, and run XQuartz on their local machine in addition to using the “-X” ssh option. In many instances, X11 forwarding only works properly if Mac users use the Terminal application that comes with XQuartz instead of the default Mac terminal application.

A successful login takes you to “login node” resources that have been set aside for user access. The login node is where you edit your code, execute basic UNIX commands, and submit your jobs to the SLURM job scheduler.

DO NOT RUN YOUR CODE OR RESEARCH APPLICATIONS DIRECTLY ON THE LOGIN NODE. THESE MUST BE SUBMITTED TO SLURM!

Directory Spaces

NAS home space

Your home directory will be in /nas/longleaf/home/. Your home directory has a 50 GB soft limit and a 75 GB hard limit. Note that the Dogwood and Longleaf clusters share the same home file space. If you use both clusters, we strongly recommend creating a longleaf and/or dogwood subdirectory under your home directory to keep the files separated as needed.

/users storage

Your primary storage directory is: /users///

This storage is provided by the same hardware as /proj.

  • High capacity storage
  • OK to compute against, however as IO increases, consider copying or moving to /work for processing
  • OK to hold inactive data sets like a near-line archive
  • If a meaningful amount of cold data accrues, it can be packaged and MOVED to cloud archive, providing more working space for your warm data
  • /users is not intended to be used for team-oriented shared storage, like /proj; it is intended to be your personal storage location. Think of it as a capacity expansion to your home directory. In this context, please note that /work is NOT intended to be a personal storage location; /work is for data actively being processed with high IO requirements. Please move any data that is not actively being computed upon from /work to /users
  • 10 TB quota

Work/Scratch Space

Your scratch directory will be in

/21dayscratch/scr/o/n/onyen

(the “o/n/” are the first two letters of your onyen). This is the scratch space for working with large files. The following apply to the scratch space:

  • Scratch space uses standard UNIX permissions to control access to files and directories. By default other users in your group (graduate students, faculty, employees) have read access to your scratch directory. You can easily remove this read permission with the “chmod” command.
  • A file-cleaning policy is enforced on scratch space: any file not used or modified in the last 21 days will be deleted.
  • Scratch space is a shared, temporary work space. Please note that scratch space is not backed up and is, therefore, not intended for permanent data storage. See the “Mass Storage” section below about how to store permanent data.
  • Note it is a violation of research computing policy to use artificial means, such as the “touch” command, to maintain unused files in the scratch directory beyond their natural lifetime. Violators will be warned and repeat violators are subject to loss of privileges and access. This is a shared resource, please be courteous to other users.
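The permission and cleanup points above can be sketched in shell. This is a minimal sketch, not an official tool: the function names are hypothetical, and the 21-day test only approximates the cleanup policy, whose exact criteria are not stated here.

```shell
# Remove all group and other access from a directory;
# the owner's permissions are left unchanged.
make_private() {
    chmod go-rwx "$1"
}

# List regular files under a directory that were neither accessed nor
# modified in the last N days (default 21). find counts whole 24-hour
# periods, so "+20" matches files that are 21 or more days old.
list_stale_files() {
    local dir="$1" days="${2:-21}"
    find "$dir" -type f -atime +"$((days - 1))" -mtime +"$((days - 1))"
}
```

For example, running list_stale_files on your scratch directory shows which files are candidates for deletion, so that anything you still need can be copied to permanent storage first.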

What follows are suggested “best practices” to keep in mind when using scratch space on the Dogwood cluster:

  • Try to avoid using “ls -l” and use “ls” with no options instead.
  • Never have a large number of files (>1000) in a single directory.
  • Avoid submitting jobs in a way that will access the same file(s) at the same point(s) in time.
  • Limit the number of processes performing parallel I/O work or other highly intensive I/O jobs.
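To check the guideline on directory size, a small helper can report directories that hold too many entries. This is a sketch with a hypothetical function name. It assumes a find that supports -mindepth and -maxdepth (GNU and BSD find both do) and file names without embedded newlines.

```shell
# Print "<count> <directory>" for every directory under ROOT that holds
# more than LIMIT entries (default 1000).
report_crowded_dirs() {
    local root="$1" limit="${2:-1000}" dir count
    find "$root" -type d | while IFS= read -r dir; do
        # Count only the direct children of this directory.
        count=$(find "$dir" -mindepth 1 -maxdepth 1 | wc -l | tr -d ' ')
        if [ "$count" -gt "$limit" ]; then
            echo "$count $dir"
        fi
    done
}
```

Walking a large tree is itself metadata-intensive, so a check like this is best run occasionally and on a specific subdirectory rather than on the whole scratch space.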

Mass Storage

Formerly, cluster users were provided with access to personal Mass Storage space and could easily use 1 TB of space there. Mass storage space is no longer provided to new cluster accounts.

These legacy mass storage directories were named with the convention /ms/home///. Mass storage was intended for long-term storage and archiving of files; it is by nature a very slow file system and is now read-only.

If you still have data and files in your legacy mass storage space, you will need to copy them to an appropriate directory before you can use them when running jobs (e.g., your home directory, /pine space, or /proj space).

If you are part of a department, group, or lab that needs shared read access to legacy mass storage space, please send your request to research@unc.edu.

Development and Application Environment: Modules

The environment on Dogwood is managed as modules. The basic module commands are

module [ add | avail | help | list | load | unload | show ]

When you first log in you should run

module list

And the response should be

1) null

To add a module for this session only, use “module add [application]” where “[application]” is the name given on the output of the “module avail” command.

To add a module every time you log in, use “module save”. This does not change your current session, only later logins.

Please refer to the Help document on modules for further information.
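The module commands described above can be combined into a short session. This is a sketch only: module_session is a hypothetical helper, and the module name passed to it must be one that “module avail” actually lists on Dogwood.

```shell
# Run a typical module session for the module named in "$1".
# Returns 1 if the "module" command is not available, for example
# when this is run somewhere other than the cluster.
module_session() {
    if ! type module >/dev/null 2>&1; then
        echo "module command not found; run this on the cluster" >&2
        return 1
    fi
    module avail        # list the modules that can be added
    module add "$1"     # add the module for the current session only
    module list         # confirm what is now loaded
}
```

Following the text above, running “module save” afterwards would keep the added module for later logins.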

This page describes the various MPI modules available on Dogwood.

Job Submission

Once you have decided what software you need to use, added those packages to your environment using modules, and you have successfully compiled your serial or parallel code, you can then submit your jobs to run on Dogwood. We use the Slurm workload manager software to schedule and manage jobs that are submitted to run on Dogwood.

To submit a job, use the SLURM sbatch command. SLURM submits jobs to the job partitions you specify.
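A submission typically consists of a batch script that is passed to sbatch. The sketch below writes such a script. The partition name “my_partition”, the program “./my_mpi_program”, and the helper name are placeholders rather than real Dogwood values. The choice of srun as the launcher is also an assumption, and the site's MPI documentation may call for a different one. The 44 tasks per node follow the per-node core count mentioned in the Introduction.

```shell
# Write a minimal Slurm batch script to the path given in "$1".
# "my_partition" and "./my_mpi_program" are placeholders to replace.
write_job_script() {
    cat > "$1" <<'EOF'
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --partition=my_partition
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=44
#SBATCH --time=01:00:00
#SBATCH --output=example-%j.out

srun ./my_mpi_program
EOF
}
```

On the login node, this could be used as “write_job_script example_job.sh” followed by “sbatch example_job.sh”, once the placeholders have been replaced.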

A short description of the partitions available to users in the Dogwood cluster can be found here.

Monitoring and Controlling Jobs:

You can check the status of your submitted SLURM jobs with the command “squeue -u ” followed by your onyen. (Without the “-u” option, squeue shows jobs from all users; providing your onyen limits the output to your own jobs.) The output of that command includes a Job ID, the state of your job (e.g. pending or running), the partition to which you submitted the job, the job name, and other information. See “man squeue” for more information on using this command. If you need to kill/end a running job, use the “scancel” command:

scancel [JobID]

Where JobID is the SLURM job ID displayed with the “squeue” command.

Finally, you can provide an output file to SLURM (“-o filename” in the sbatch command). For regular jobs, if you don’t provide a name the default file name is “slurm-%j.out”, where the “%j” is replaced by the SLURM job ID.
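The monitoring steps above can be wrapped in two small helpers. This is a sketch with hypothetical function names. It assumes that your cluster login name in $USER is your onyen, and that the job was submitted from the current directory without “-o”, so its output file is in this directory.

```shell
# Default output file name for a regular job: the pattern
# "slurm-%j.out" with %j replaced by the job ID given in "$1".
default_output_file() {
    echo "slurm-$1.out"
}

# Show only my jobs, then the last lines of one job's default output file.
check_job() {
    squeue -u "$USER"
    tail -n 20 "$(default_output_file "$1")"
}
```

For example, “check_job 12345” would list your jobs and then show the end of slurm-12345.out.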

Note: Jobs running outside the SLURM partitions will be killed. The login privileges of users who repeatedly run jobs outside of the SLURM partitions will be suspended.

Additional Help

Be sure to check the Research Computing home page for information about other resources available to you.

We encourage you to attend a training session on “Using Dogwood” and other related topics. Please refer to the Research Computing Training site for further information.

If you have any questions, please feel free to call 962-HELP, email research@unc.edu, or submit an Online Web Ticket.

 

Last Update 7/15/2024 7:56:02 PM