GETTING STARTED ON DOGWOOD
The Dogwood cluster is a Linux-based computing system available to researchers across campus. This cluster is intended to run large-way, distributed-memory, multi-node, parallel jobs; by “large-way” we mean jobs that span more than one node (i.e., more than 44 cores). The system has a fast switching fabric (InfiniBand EDR interconnect) for this purpose.
For technical specifications, see the Dogwood cheat sheet document.
Follow the steps listed on Request a Cluster Account page and select Dogwood Cluster under subscription type. You will receive an email notification once your account has been created.
Linux users can use ssh from within their Terminal application to connect to Dogwood.
If you wish to enable X11 forwarding, use the “-X” ssh option. Be sure to use your UNC ONYEN and password for the login:
ssh -X <onyen>@dogwood.unc.edu
Windows users should download MobaXterm (Home Edition). Then use the Session icon to create a Dogwood SSH session using dogwood.unc.edu for “Remote host” and your ONYEN for the “username” (Port should be left at 22).
Mac users can use ssh from within their Terminal application to connect to Dogwood. Be sure to use your UNC ONYEN and password for the login:
ssh -X <onyen>@dogwood.unc.edu
To enable X11 forwarding, Mac users will need to download, install, and run XQuartz on their local machine in addition to using the “-X” ssh option. Furthermore, in many instances, for X11 forwarding to work properly Mac users need to use the Terminal application that comes with XQuartz instead of the default Mac terminal application.
A successful login takes you to “login node” resources that have been set aside for user access. The login node is where you will edit your code, execute basic UNIX commands, and submit your jobs to the SLURM job scheduler.
DO NOT RUN YOUR CODE OR RESEARCH APPLICATIONS DIRECTLY ON THE LOGIN NODE. THESE MUST BE SUBMITTED TO SLURM!
NAS home space
Your home directory will be in /nas/longleaf/home/. Your home directory has a 50 GB soft limit and a 75 GB hard limit. Note that the Dogwood and Longleaf clusters share the same home file space. Thus, if you use both clusters, we strongly recommend creating a longleaf and/or dogwood subdirectory under your home directory to keep the files separated as needed.
Your scratch directory will be in
(the “o/n/” are the first two letters of your onyen). This is the scratch space for working with large files. The following apply to the scratch space:
- Scratch space uses standard UNIX permissions to control access to files and directories. By default other users in your group (graduate students, faculty, employees) have read access to your scratch directory. You can easily remove this read permission with the “chmod” command.
- A policy has been established for cleaning out files: any file not used or modified in the last 21 days will be deleted.
- Scratch space is a shared, temporary work space. Please note that scratch space is not backed up and is, therefore, not intended for permanent data storage. See the “Mass Storage” section below about how to store permanent data.
- Note it is a violation of research computing policy to use artificial means, such as the “touch” command, to maintain unused files in the scratch directory beyond their natural lifetime. Violators will be warned and repeat violators are subject to loss of privileges and access. This is a shared resource, please be courteous to other users.
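The group-read removal mentioned in the first bullet above is a single “chmod” command. A minimal sketch using a stand-in directory (on the cluster, substitute your actual scratch directory path):

```shell
# Stand-in for your scratch directory; on Dogwood, use its real path.
mkdir -p my_scratch_demo

# Keep full owner access; revoke read/write/execute from group and others.
chmod u+rwx,go-rwx my_scratch_demo

# Verify: the mode should now be drwx------ (700).
ls -ld my_scratch_demo
```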
What follows are suggested “best practices” to keep in mind when using scratch space on the Dogwood cluster:
- Try to avoid using “ls -l”; use “ls” with no options instead (“ls -l” forces metadata lookups for every file, which are slow on parallel file systems).
- Never have a large number of files (>1000) in a single directory.
- Avoid submitting jobs in a way that will access the same file(s) at the same point(s) in time.
- Limit the number of processes performing parallel I/O work or other highly intensive I/O jobs.
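One way to follow the “never more than 1000 files per directory” practice is to shard output files into subdirectory buckets. A sketch with illustrative names and a small file count:

```shell
# Shard many output files into subdirectories instead of one flat directory.
# 20 files and 4 buckets are stand-ins; scale both up for real workloads.
mkdir -p out
for i in $(seq 1 20); do
    sub=$(( i % 4 ))                  # assign each file to one of 4 buckets
    mkdir -p "out/bucket_$sub"
    touch "out/bucket_$sub/result_$i.dat"
done
ls out                                # bucket_0 bucket_1 bucket_2 bucket_3
```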
Formerly, cluster users were provided with access to personal Mass Storage space and could easily use 1 TB of space there. Mass storage space is no longer provided to new cluster accounts.
These legacy mass storage directories were named with the convention /ms/home/
If you still have data and files in your legacy mass storage space, you will need to copy them to an appropriate directory before you can use them when running jobs (e.g., your home directory, /pine space, or /proj space).
If you are part of a department, group, or lab that needs shared read access to legacy mass storage space, please send your request to email@example.com.
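Moving legacy files out of mass storage is an ordinary recursive copy. A sketch using stand-in directories (on the cluster, the source would be your /ms/home path and the destination your home, /pine, or /proj directory):

```shell
# Stand-in source and destination directories for illustration only.
mkdir -p ms_home_demo proj_demo
echo "old results" > ms_home_demo/results.txt

# -a copies recursively while preserving permissions and timestamps.
cp -a ms_home_demo/. proj_demo/

ls proj_demo    # results.txt is now in the destination
```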
The environment on Dogwood is managed with modules. The basic module commands are
module [ add | avail | help | list | load | unload | show ]
When you first log in you should run
And the response should be
To add a module for this session only, use “module add [application]” where “[application]” is the name given on the output of the “module avail” command.
To add a module for every time you login, use “module save”. This does not change your current session, only later logins.
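A typical workflow combining the per-session and saved-default commands looks like the following sketch. The module name here is a hypothetical example (use a real name from “module avail”), and the snippet is guarded so it also runs on machines without environment modules:

```shell
if command -v module >/dev/null 2>&1; then
    module avail                        # list software available on the cluster
    module add openmpi_3.0.0/gcc_6.3.0  # hypothetical name; pick one from "module avail"
    module list                         # confirm what is loaded in this session
    module save                         # load the same set at future logins
    where=cluster
else
    where=local                         # no module command on this machine
fi
echo "$where"
```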
Please refer to the Help document on modules for further information.
This page describes the various MPI modules available on Dogwood.
Once you have decided what software you need to use, added those packages to your environment using modules, and you have successfully compiled your serial or parallel code, you can then submit your jobs to run on Dogwood. We use the Slurm workload manager software to schedule and manage jobs that are submitted to run on Dogwood.
To submit a job to run, you will need to use the SLURM “sbatch” command as shown below. SLURM submits jobs to particular job partitions you specify.
A short description of the partitions available to users in the Dogwood cluster can be found here.
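As a sketch, an sbatch job script might look like the following. The partition name, node and task counts, time limit, and program name are all illustrative assumptions; substitute values from the partition list referenced above:

```shell
# Write a minimal multi-node MPI job script (all values are examples).
cat > myjob.sbatch <<'EOF'
#!/bin/bash
#SBATCH --job-name=my_mpi_job
#SBATCH --partition=example_queue   # hypothetical; use a real Dogwood partition
#SBATCH --nodes=2                   # Dogwood targets jobs spanning >1 node
#SBATCH --ntasks-per-node=44        # 44 cores per node
#SBATCH --time=02:00:00
#SBATCH -o myjob-%j.out             # %j expands to the SLURM job ID

mpirun ./my_program                 # assumes an MPI executable ./my_program
EOF

# On the Dogwood login node you would then submit it with:
#   sbatch myjob.sbatch
```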
You can check the status of your submitted SLURM jobs with the command “squeue -u <onyen>” (note that “squeue” with no options shows jobs from all users; provide your onyen to show only your jobs). The output of that command will include a Job ID, the state of your job (e.g., pending or running), the partition to which you submitted the job, the job name, and other information. See “man squeue” for more information on using this command.
If you need to kill/end a running job, use the “scancel <JobID>” command, where JobID is the SLURM job ID displayed with the “squeue” command.
Finally, you can provide an output file to SLURM (“-o filename” in the sbatch command). For regular jobs, if you don’t provide a name the default file name is “slurm-%j.out”, where the “%j” is replaced by the SLURM job ID.
Note: jobs running outside the SLURM partitions will be killed. The login privileges of users who repeatedly run jobs outside of the SLURM partitions will be suspended.
Be sure to check the Research Computing home page for information about other resources available to you.
We encourage you to attend a training session on “Using Dogwood” and other related topics. Please refer to the Research Computing Training site for further information.
If you have any questions, please feel free either to call 962-HELP, email firstname.lastname@example.org, or submit an Online Web Ticket.
Last Update 11/29/2023 5:31:25 PM