Now that you are connected to the head node, familiarize yourself with the cluster structure by running the following set of commands. If you are connected using DCV, please open a Terminal window using the icon on the left.
SLURM from SchedMD is one of the batch schedulers that you can use in AWS ParallelCluster. For an overview of the SLURM commands, see the SLURM Quick Start User Guide.
`sinfo` shows both the instances we currently have running and those that are not running (think of this as a queue limit). Initially we’ll see all the nodes in state `idle~`, which means no instances are running. When we submit a job, we’ll see some instances go into state `alloc`, meaning they’re completely allocated, or `mix`, meaning some but not all cores are allocated. After the job completes, the instance stays around for a few minutes (the default cooldown is 10 minutes) in state `idle%`. This can be confusing, so we’ve tried to simplify it in the table below:

| State | Description |
|---|---|
| `idle~` | Instance is not running but can launch when a job is submitted. |
| `idle%` | Instance is running and will shut down after `ScaledownIdletime` (default 10 minutes). |
| `mix` | Instance is partially allocated. |
| `alloc` | Instance is completely allocated. |
Run the following commands to see the current state of the nodes and the job queue:

```
sinfo
squeue
```

`squeue` shows the jobs currently in the scheduling queue; it will be empty until you submit a job.
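For reference, on a cluster with no jobs running, the `sinfo` output looks something like the following; the queue name and node names here are only illustrative and will reflect your own cluster configuration:

```
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
compute*     up   infinite     10  idle~ compute-dy-c5-[1-10]
```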
Environment Modules are a fairly standard tool in HPC used to dynamically change your environment variables (`PATH`, `LD_LIBRARY_PATH`, etc.). The cluster comes with the `intelmpi` and `openmpi` modules pre-installed. These MPI versions are compiled with support for the high-speed interconnect EFA. List the available modules with:

```
module av
```
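If Environment Modules are new to you, the following standard subcommands cover most day-to-day use; `intelmpi` is just the module name used as an example on this cluster:

```
# List all modules available on the system
module avail

# Load a module into the current shell environment
module load intelmpi

# Show which modules are currently loaded
module list

# Remove one module, or clear everything that is loaded
module unload intelmpi
module purge
```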
Before loading anything, check which `mpirun` is available, just to see the effect of the module load:

```
mpirun -V
```
The Intel MPI library can then be loaded, and the command rerun to see the loaded library:
```
module load intelmpi
mpirun -V
```
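To see more explicitly what the module changed, you can compare the `mpirun` found on your `PATH` before and after loading. This is just an optional sanity check using standard commands; the exact paths and versions reported will depend on your ParallelCluster release:

```
# Path and version of the default MPI (no modules loaded)
module purge
which mpirun && mpirun -V

# Load Intel MPI and check again -- the path and reported version should change
module load intelmpi
which mpirun && mpirun -V
```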
Next, look at the filesystems on the cluster. To see the NFS exports from the head node, run:

```
showmount -e localhost
```
Then check the mounted filesystems:

```
df -h
```
You’ll see a line like:

```
/dev/nvme1n1     50G   24K   47G   1% /shared
```

This is the shared EBS filesystem, mounted at `/shared`.
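As a quick optional check that `/shared` really is visible across the cluster, you could write a file from the head node and read it back from a compute node. The file name below is arbitrary, and the `srun` step will launch an instance if none are running, which can take a few minutes:

```
# Create a small file on the shared filesystem from the head node
echo "hello from $(hostname)" > /shared/hello.txt

# Read it back from a compute node; SLURM starts a node if needed
srun -N 1 cat /shared/hello.txt
```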
In the next section, you will copy the input files needed to run a job on the cluster.