Healthtech cluster

Getting an account

Contact Gabriel Renaud, who will then contact Peter Wad Sackett.

First time login

  • Do you have a DTU account? If not, contact Gabriel.
  • Have you changed your DTU password since August 2021? If not, change it at password.dtu.dk
  • If you can log in to DTU webmail https://mail.dtu.dk, your credentials should be in order.
  • If you cannot log in to DTU webmail, you need to set up MFA (two-factor authentication). Refer to DTU's general instructions on how to do that.
  • At this stage, try to log in using your DTU username and password:
ssh -XC  <username>@login.healthtech.dtu.dk
Example: ssh -XC gabre@login.healthtech.dtu.dk
  • You will be prompted for the password (the characters you type are not shown) and for the MFA code.
  • If you cannot log in at this stage using your DTU username, password and MFA code, you are probably not enrolled in Microsoft Azure. Enroll here. Use your DTU email as the username; you will be taken through the DTU login procedure to verify your identity.
    After the enrollment, try to log in again.
  • Log out again with Ctrl-D or by typing "logout".

Subsequent logins

  • Log in using:
ssh -XC  <username>@login.healthtech.dtu.dk
  • Enter your DTU password and the MFA code when prompted
  • You should see:
<username>@login ~$ 

This is the login node; do not run anything on it.

  • Then select a node from 01 to 14:
ssh -XC  <username>@nodeX

where X is 01, 02, 03, ..., 14
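If you find the two-step login tedious, you can jump through the login node in one command with -J (ProxyJump). This is a convenience sketch, not part of the official instructions; it assumes your OpenSSH client is version 7.3 or newer, and you may be prompted for credentials twice unless ssh keys are set up (see General considerations below):

ssh -XC -J <username>@login.healthtech.dtu.dk <username>@node05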

  • If this is your first time logging in, read
/home/README.txt

very carefully while on a node. It will tell you where the different directories are.

  • Check whether the node is busy using:
htop

You will see the load on each CPU at the top and the memory usage below.
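If you only want a quick snapshot instead of the interactive htop view, standard tools give the same information (run these on the node, and compare the load average against the node's CPU count in the table below):

uptime    # load averages over the last 1, 5 and 15 minutes
free -g   # total, used and free memory in GiB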

Nodes

NodeID      CPUs   GHz (7z)   MIPS (7z)   Memory (MB)
node01        24      2.67        3197         64,304
node02        16      2.40        2898         64,402
node03        24      3.07        3713         96,656
node04        24      2.93        3551        193,424
node05        24      3.33        3773        193,424
node06        24      2.40        3017         64,400
node07        64      2.70        3463        515,942
node08        20      2.30        3456        128,796
node09        16      2.53        2991        120,853
node10        16      2.53        2985        120,853
node11        40      2.60        3872        257,847
node12        16      2.80        3458         24,081
node13        16      2.53        3036         24,081
node14        16      2.43        3002        128,913
compute01    128      3.53        3671      1,056,315
compute02    192      2.62        2917      1,031,526
compute03    128      3.20        2449        515,453
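Once passwordless ssh keys are in place (see General considerations below), a small loop from the login node can survey the load on all nodes before you pick one. A minimal sketch:

for n in node{01..14}; do
    echo "== $n =="
    ssh "$n" uptime
done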


Running things in parallel

There is no queuing system on the cluster, but one can be simulated using GNU parallel. First, write the commands to a file, say "list_cmds", using full paths only (no relative paths). Then pick a host server to launch the commands from; I pick node10 because it is among the slowest (keeping the faster nodes for computations). From there you run:

   cat list_cmds | parallel --slf list_serv_1_14minus10

The file list_serv_1_14minus10 lists the nodes to run on (node10 is excluded because it is the launch host):

   node01
   node02
   node03
   node04
   node05
   node06
   node07
   node08
   node09
   node11
   node12
   node13
   node14
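For illustration, list_cmds could look like the following, one command per line with full paths (the script and data file names are hypothetical placeholders). GNU parallel then distributes the lines across the servers in the list, by default running as many jobs on each server as it has CPU cores:

   nice -19 /home/people/<username>/scripts/analyse.sh /home/projects/<project>/sample01.fq
   nice -19 /home/people/<username>/scripts/analyse.sh /home/projects/<project>/sample02.fq
   nice -19 /home/people/<username>/scripts/analyse.sh /home/projects/<project>/sample03.fq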

General considerations

You can ssh to the login node and, once on the login node, to the other nodes without two-factor authentication or a password by using ssh keys. To set up the keys and avoid typing your password and MFA code every time, find the instructions here: Sshnopassword.
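The Sshnopassword page is the authoritative guide; in outline, the standard OpenSSH steps look like this (a sketch, assuming home directories are shared across the nodes so one key works everywhere):

# On your own machine: create a key pair if you do not already have one
ssh-keygen -t ed25519
# Install the public key on the cluster (asks for password + MFA one last time)
ssh-copy-id <username>@login.healthtech.dtu.dk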

Do not forget to run your commands under nice -19 (the lowest priority), so that you do not slow down other users on the node.
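For example (the script path and PID below are placeholders):

nice -19 /full/path/to/my_job.sh     # start a job at the lowest priority (niceness 19)
renice -n 19 -p <PID>                # lower the priority of an already-running process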