Healthtech cluster

Getting an account

Contact Gabriel Renaud, and we will then contact Peter Wad Sackett.

First time login

  • Do you have a DTU account? If not, contact Gabriel.
  • Have you changed your DTU password since August 2021? If not, change it at password.dtu.dk
  • If you can log in to DTU webmail https://mail.dtu.dk, your credentials should be in order.
  • If you cannot log in to DTU webmail, you should set up MFA (2-factor authentication). We refer to DTU's general instructions on how to do that.
  • At this stage, try to log in using your DTU username and password:
ssh -XC  <username>@login.healthtech.dtu.dk
Example: ssh -XC gabre@login.healthtech.dtu.dk
  • You will be prompted for the password (you cannot see the characters you type) and the MFA code.
  • If you cannot log in at this stage using your DTU username, password and MFA, then you are probably not enrolled in Microsoft Azure. Enroll here. Use your DTU email as your username. You will be taken to the DTU login procedure for verification of your identity.
    After the enrollment, try to log in again.
  • Log out again using Ctrl-D or by typing "logout".

Subsequent logins

  • Log in using:
ssh -XC  <username>@login.healthtech.dtu.dk
  • Enter your DTU password and the MFA code when prompted.
  • You should see:
<username>@login ~$ 

This is the login node. Do not run anything there.

  • Then select a node from 01 to 14:
ssh -XC  <username>@nodeX

where X is 01, 02, 03, ..., 14
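Example: ssh -XC gabre@node07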

  • If this is your first time logging in, read the file:
/home/README.txt

very carefully while on a node. It will tell you where the different directories are.

  • Check if the node is busy using:
htop

You will see the load on each CPU at the top and the memory usage below it.
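
If you prefer a quick, non-interactive check instead of htop, the standard uptime and free commands show the load average and the amount of free memory:

uptime
free -g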

Nodes

NodeID     CPUs  GHz (7z)  MIPS (7z)  Memory (MB)
node01       24   2.67GHz       3197       64,304
node02       16   2.40GHz       2898       64,402
node03       24   3.07GHz       3713       96,656
node04       24   2.93GHz       3551      193,424
node05       24   3.33GHz       3773      193,424
node06       24   2.40GHz       3017       64,400
node07       64   2.70GHz       3463      515,942
node08       20   2.30GHz       3456      128,796
node09       16   2.53GHz       2991      120,853
node10       16   2.53GHz       2985      120,853
node11       40   2.60GHz       3872      257,847
node12       16   2.80GHz       3458       24,081
node13       16   2.53GHz       3036       24,081
node14       16   2.43GHz       3002      128,913
compute01   128   3.53GHz       3671    1,056,315
compute02   192   2.62GHz       2917    1,031,526
compute03   128   3.20GHz       2449      515,453


Running things in parallel versus interactively

You can log in to node07, node10, node12 and compute01 directly. The rest are non-interactive, meaning you need to use SLURM (see below). On the interactive nodes, you can use parallel. First, write a file with the commands, let's call it "list_cmds". Use full paths, not relative paths.

Then pick an interactive server on which to run the commands and write:

   cat list_cmds | parallel 
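
For example, a minimal "list_cmds" could contain the random number generator from the sbatch example further down, once per line (in practice each line would typically be a different command or input file, always with full paths):

   /home/ctools/bin/python3 /home/projects/MAAG/FileDescriptors/random_int_generator.py --min 1000000000000 --max 100000000000000 1000
   /home/ctools/bin/python3 /home/projects/MAAG/FileDescriptors/random_int_generator.py --min 1000000000000 --max 100000000000000 1000

You can also combine parallel with nice and limit the number of simultaneous jobs, for example:

   cat list_cmds | nice -19 parallel -j 8

Here -j 8 runs at most 8 commands at a time; by default parallel starts one job per CPU core.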

General considerations

You can ssh to the login node, and from the login node to the other nodes, without 2-factor authentication or a password by using ssh keys. To set up the ssh keys and avoid having to type your password+2FA, find instructions here: Sshnopassword.

Do not forget to nice -19 your commands.
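
For example, prefix the command you want to run:

    nice -19 <your command>

This runs it at the lowest CPU priority, so it does not slow down other users on the node.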


Using SLURM

cancel your jobs

For a specific job:

    scancel JOBID

For all your jobs:


    scancel -u gabre


get info on the nodes

To get info on the nodes:

    sinfo -N -l


submit jobs using sbatch

We will launch 100 jobs that generate random numbers and check whether they are prime. First make sure that you have a directory called ~/slurm_out/ (see below) and then run:

     for i in `seq 100`; do
         echo "/home/ctools/bin/python3 /home/projects/MAAG/FileDescriptors/random_int_generator.py --min 1000000000000 --max 100000000000000 1000 | /home/ctools/bin/python3 /home/projects/MAAG/FileDescriptors/prime_checker.py"
     done | xargs -I CMD sbatch \
         --job-name=gabriel_job \
         --mem=2G \
         --time=00:10:00 \
         --cpus-per-task=1 \
         --output=$HOME/slurm_out/slurm_output_%j.out \
         --error=$HOME/slurm_out/slurm_error_%j.err \
         --wrap="CMD"
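
If ~/slurm_out/ does not exist yet, create it before submitting:

     mkdir -p ~/slurm_out

Once the jobs finish, the output of each job ends up in ~/slurm_out/slurm_output_<jobid>.out and its error messages in ~/slurm_out/slurm_error_<jobid>.err, since %j is replaced by the job ID.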

checking jobs

Either use:

     squeue 

or:


     watch -n1 "squeue"

to refresh every second. You can limit the output to one user using -u [username].
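
For example, to watch only your own jobs (using gabre, the username from the examples above):

     watch -n1 "squeue -u gabre"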