Why Linux

Linux is the most common platform for scientific computing.

Distributions of Linux


Linux shells

Shells

  • A shell translates commands to OS instructions.

  • Most commonly used shells include bash, csh, tcsh, zsh, etc.

  • Sometimes a script or a command does not run simply because it’s written for another shell.

  • We mostly use bash shell commands in this class.

  • Determine the current shell:

    echo $SHELL
    ## /bin/bash
  • List available shells:

    cat /etc/shells
    ## /bin/sh
    ## /bin/bash
    ## /usr/bin/sh
    ## /usr/bin/bash
  • Change to another shell:

    exec bash -l

    The -l option indicates it should be a login shell.

  • Change your login shell permanently:

    chsh -s /bin/bash userid

    Then log out and log in.
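Note that `$SHELL` holds the login shell and does not change when another shell is started from it. A minimal sketch of how a script might check which shell is actually interpreting it, assuming only that bash sets `BASH_VERSION` and zsh sets `ZSH_VERSION` (the variable name `running_shell` is made up for this illustration):

```shell
# $SHELL is the login shell; it stays the same even after `exec zsh` or `bash`.
# BASH_VERSION / ZSH_VERSION are set by the shell that is running this code.
if [ -n "${BASH_VERSION:-}" ]; then
  running_shell="bash"
elif [ -n "${ZSH_VERSION:-}" ]; then
  running_shell="zsh"
else
  running_shell="other"
fi
echo "login shell: ${SHELL:-unset}; running shell: $running_shell"
```

A check like this at the top of a script can help explain why a command written for one shell fails in another.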

Bash completion

Bash provides the following standard completions for Linux users by default. Far fewer typing errors and much less time!

  • Pathname completion.

  • Filename completion.

  • Variablename completion: echo $[TAB][TAB].

  • Username completion: cd ~[TAB][TAB].

  • Hostname completion: ssh hwachou@[TAB][TAB].

  • It can also be customized to auto-complete other stuff such as options and command’s arguments. Google bash completion for more information.

Work with text files

View/peek text files

  • cat prints the contents of a file:

    cat runSim.R
    ## ## parsing command arguments
    ## for (arg in commandArgs(TRUE)) {
    ##   eval(parse(text=arg))
    ## }
    ## 
    ## ## check if a given integer is prime
    ## isPrime = function(n) {
    ##   if (n <= 3) {
    ##     return (TRUE)
    ##   }
    ##   if (any((n %% 2:floor(sqrt(n))) == 0)) {
    ##     return (FALSE)
    ##   }
    ##   return (TRUE)
    ## }
    ## 
    ## ## estimate mean only using observation with prime indices
    ## estMeanPrimes = function (x) {
    ##   n = length(x)
    ##   ind = sapply(1:n, isPrime)
    ##   return (mean(x[ind]))
    ## }
    ## 
    ## # simulate data
    ## x = rnorm(n)
    ## 
    ## # estimate mean
    ## estMeanPrimes(x)

  • head prints the first 10 lines of a file:

    head runSim.R
    ## ## parsing command arguments
    ## for (arg in commandArgs(TRUE)) {
    ##   eval(parse(text=arg))
    ## }
    ## 
    ## ## check if a given integer is prime
    ## isPrime = function(n) {
    ##   if (n <= 3) {
    ##     return (TRUE)
    ##   }

    head -l prints the first \(l\) lines of a file:

    head -15 runSim.R
    ## ## parsing command arguments
    ## for (arg in commandArgs(TRUE)) {
    ##   eval(parse(text=arg))
    ## }
    ## 
    ## ## check if a given integer is prime
    ## isPrime = function(n) {
    ##   if (n <= 3) {
    ##     return (TRUE)
    ##   }
    ##   if (any((n %% 2:floor(sqrt(n))) == 0)) {
    ##     return (FALSE)
    ##   }
    ##   return (TRUE)
    ## }
  • tail prints the last 10 lines of a file:

    tail runSim.R
    ##   n = length(x)
    ##   ind = sapply(1:n, isPrime)
    ##   return (mean(x[ind]))
    ## }
    ## 
    ## # simulate data
    ## x = rnorm(n)
    ## 
    ## # estimate mean
    ## estMeanPrimes(x)

    tail -l prints the last \(l\) lines of a file:

    tail -15 runSim.R
    ##   return (TRUE)
    ## }
    ## 
    ## ## estimate mean only using observation with prime indices
    ## estMeanPrimes = function (x) {
    ##   n = length(x)
    ##   ind = sapply(1:n, isPrime)
    ##   return (mean(x[ind]))
    ## }
    ## 
    ## # simulate data
    ## x = rnorm(n)
    ## 
    ## # estimate mean
    ## estMeanPrimes(x)

  • Questions:
    • How to see the 2nd line of the file and nothing else?
    • What about the penultimate (2nd to last) line?
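One possible approach to the two questions, sketched on a throwaway file rather than runSim.R (the file contents and variable names here are made up for illustration), is to chain head and tail with a pipe:

```shell
# Build a small throwaway file standing in for a real text file.
demo_file="$(mktemp)"
printf 'line1\nline2\nline3\nline4\nline5\n' > "$demo_file"

# 2nd line: keep the first two lines, then keep the last of those.
second_line="$(head -n 2 "$demo_file" | tail -n 1)"

# Penultimate line: keep the last two lines, then keep the first of those.
penultimate_line="$(tail -n 2 "$demo_file" | head -n 1)"

echo "$second_line"
echo "$penultimate_line"
rm -f "$demo_file"
```

Other answers exist, e.g., using sed or awk to address a line by its number.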

less is more; more is less

  • more browses a text file screen by screen (only downwards). Scroll down one page (paging) by pressing the spacebar; exit by pressing the q key.

  • less is also a pager, but has more functionalities, e.g., scroll upwards and downwards through the input.

  • less doesn’t need to read the whole file, i.e., it loads files faster than more.

grep

grep prints lines that match an expression:

  • Show lines that contain string CentOS:

    # quotes not necessary if not a regular expression
    grep 'CentOS' linux.Rmd
    ## - RHEL/CentOS is popular on servers.
    ## - The teaching server for this class runs CentOS 7.
    ## - Show lines that contain string `CentOS`:
    ##     grep 'CentOS' linux.Rmd
    ##     grep 'CentOS' *.Rmd
    ##     grep -n 'CentOS' linux.Rmd
    ## - Replace `CentOS` by `RHEL` in a text file:
    ##     sed 's/CentOS/RHEL/' linux.Rmd | grep RHEL
  • Search multiple text files:

    grep 'CentOS' *.Rmd
    ## - RHEL/CentOS is popular on servers.
    ## - The teaching server for this class runs CentOS 7.
    ## - Show lines that contain string `CentOS`:
    ##     grep 'CentOS' linux.Rmd
    ##     grep 'CentOS' *.Rmd
    ##     grep -n 'CentOS' linux.Rmd
    ## - Replace `CentOS` by `RHEL` in a text file:
    ##     sed 's/CentOS/RHEL/' linux.Rmd | grep RHEL
  • Show matching line numbers:

    grep -n 'CentOS' linux.Rmd
    ## 31:- RHEL/CentOS is popular on servers.
    ## 33:- The teaching server for this class runs CentOS 7.
    ## 294:- Show lines that contain string `CentOS`:
    ## 297:    grep 'CentOS' linux.Rmd
    ## 302:    grep 'CentOS' *.Rmd
    ## 307:    grep -n 'CentOS' linux.Rmd
    ## 324:- Replace `CentOS` by `RHEL` in a text file:
    ## 326:    sed 's/CentOS/RHEL/' linux.Rmd | grep RHEL
  • Find all files in current directory with .png extension:

    # escape the dot so that it matches a literal "." rather than any character
    ls | grep '\.png$'
    ## key_authentication_1.png
    ## key_authentication_2.png
    ## linux_directory_structure.png
    ## linux_filepermission_oct.png
    ## linux_filepermission.png
    ## Richard_Stallman_2013.png
    ## screenshot_top.png
  • Find all directories in the current directory:

    ls -al | grep '^d'
    ## drwxrwxr-x. 2 hwachou hwachou    4096 Jan 15 17:50 .
    ## drwxrwxr-x. 6 hwachou hwachou     102 Jan 15 04:02 ..
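In a regular expression an unescaped `.` matches any single character, which matters when searching for file extensions. A small sketch on made-up file names (not files from the teaching server) showing the difference between a loose and a strict pattern:

```shell
# Three hypothetical names; only the first really has a .png extension.
names='plot.png
notes_png
data.csv'

# '.png$' : any character followed by "png" at the end of the line.
loose="$(printf '%s\n' "$names" | grep -c '.png$')"

# '\.png$' : a literal dot followed by "png" at the end of the line.
strict="$(printf '%s\n' "$names" | grep -c '\.png$')"

echo "loose pattern matched $loose names; strict pattern matched $strict"
```

Here `grep -c` counts matching lines instead of printing them.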

sed

  • sed is a stream editor.

  • Replace CentOS by RHEL in a text file:

    sed 's/CentOS/RHEL/' linux.Rmd | grep RHEL
    ## - RHEL/RHEL is popular on servers.
    ## - The teaching server for this class runs RHEL 7.
    ## - Show lines that contain string `RHEL`:
    ##     grep 'RHEL' linux.Rmd
    ##     grep 'RHEL' *.Rmd
    ##     grep -n 'RHEL' linux.Rmd
    ## - Replace `RHEL` by `RHEL` in a text file:
    ##     sed 's/RHEL/RHEL/' linux.Rmd | grep RHEL
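The output above shows that `s/CentOS/RHEL/` replaces only the first match on each line. A minimal sketch on a made-up sentence, contrasting it with the `g` (global) flag that replaces every match on the line:

```shell
# A hypothetical line containing the search string twice.
line='RHEL/CentOS is popular; CentOS 7 runs on the server.'

# Without g: only the first occurrence on the line is replaced.
first_only="$(printf '%s\n' "$line" | sed 's/CentOS/RHEL/')"

# With g: every occurrence on the line is replaced.
all_matches="$(printf '%s\n' "$line" | sed 's/CentOS/RHEL/g')"

echo "$first_only"
echo "$all_matches"
```

In both cases sed writes the result to standard output and leaves the input untouched.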

awk

  • awk is a filter and report writer.

  • First let’s display the first lines of the file /etc/passwd:

    head /etc/passwd
    ## root:x:0:0:root:/root:/bin/bash
    ## bin:x:1:1:bin:/bin:/sbin/nologin
    ## daemon:x:2:2:daemon:/sbin:/sbin/nologin
    ## adm:x:3:4:adm:/var/adm:/sbin/nologin
    ## lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
    ## sync:x:5:0:sync:/sbin:/bin/sync
    ## shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
    ## halt:x:7:0:halt:/sbin:/sbin/halt
    ## mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
    ## operator:x:11:0:operator:/root:/sbin/nologin

    Each line contains the fields (1) user name, (2) password, (3) user ID, (4) group ID, (5) user ID info, (6) home directory, and (7) command shell, separated by :.

  • Print sorted list of login names:

    awk -F: '{ print $1 }' /etc/passwd | sort | head -10
    ## adm
    ## anorthrup
    ## bhsiao
    ## bin
    ## biona001
    ## brendon.chau
    ## bryanmkevan
    ## chrony
    ## daemon
    ## dbus
  • Print the number of lines in a file (NR stands for Number of Records, and by default each line is one record):

    awk 'END { print NR }' /etc/passwd
    ## 69

    or

    wc -l /etc/passwd
    ## 69 /etc/passwd

    or (not displaying file name)

    wc -l < /etc/passwd
    ## 69
  • Print the entries of users with UID in the range 1000-1035:

    awk -F: '{if ($3 >= 1000 && $3 <= 1035) print}' /etc/passwd
    ## huazhou:x:1000:1001::/home/huazhou:/bin/bash
    ## hwachou:x:1001:1003::/home/hwachou:/bin/bash
    ## juhkim111:x:1002:1004::/home/juhkim111:/bin/bash
    ## huijun.an:x:1003:1005::/home/huijun.an:/bin/bash
    ## edenaxe:x:1004:1006::/home/edenaxe:/bin/bash
    ## seancampeau:x:1005:1007::/home/seancampeau:/bin/bash
    ## brendon.chau:x:1006:1008::/home/brendon.chau:/bin/bash
    ## elviscuihan:x:1007:1009::/home/elviscuihan:/bin/bash
    ## qedo:x:1008:1010::/home/qedo:/bin/bash
    ## fangyw1995:x:1009:1011::/home/fangyw1995:/bin/bash
    ## suneeta.godbole:x:1010:1012::/home/suneeta.godbole:/bin/bash
    ## yunh:x:1011:1013::/home/yunh:/bin/bash
    ## bhsiao:x:1012:1014::/home/bhsiao:/bin/bash
    ## hujuehao:x:1013:1015::/home/hujuehao:/bin/bash
    ## lucymjimenez:x:1014:1016::/home/lucymjimenez:/bin/bash
    ## yoonjun05:x:1015:1017::/home/yoonjun05:/bin/bash
    ## shelleyjung:x:1016:1018::/home/shelleyjung:/bin/bash
    ## bryanmkevan:x:1017:1019::/home/bryanmkevan:/bin/bash
    ## sdkwok2:x:1018:1020::/home/sdkwok2:/bin/bash
    ## dereklee20:x:1019:1021::/home/dereklee20:/bin/bash
    ## liuweijie12345678:x:1020:1022::/home/liuweijie12345678:/bin/bash
    ## peterljw:x:1021:1023::/home/peterljw:/bin/bash
    ## kristenmae:x:1022:1024::/home/kristenmae:/bin/bash
    ## menrge666:x:1023:1025::/home/menrge666:/bin/bash
    ## anorthrup:x:1024:1026::/home/anorthrup:/bin/bash
    ## ethanpark26:x:1025:1027::/home/ethanpark26:/bin/bash
    ## shryu94:x:1026:1028::/home/shryu94:/bin/bash
    ## elliewky:x:1027:1029::/home/elliewky:/bin/bash
    ## sijiawang0729:x:1028:1030::/home/sijiawang0729:/bin/bash
    ## xiayu960112:x:1029:1031::/home/xiayu960112:/bin/bash
    ## haowenxu930622:x:1030:1032::/home/haowenxu930622:/bin/bash
    ## zfy917:x:1032:1034::/home/zfy917:/bin/bash
    ## suedez:x:1033:1035::/home/suedez:/bin/bash
    ## wenyan1996:x:1034:1036::/home/wenyan1996:/bin/bash
    ## dmorrison01:x:1035:1037::/home/dmorrison01:/bin/bash
  • Print login names and login shells in comma-separated format:

    awk -F: '{OFS = ","} {print $1, $7}' /etc/passwd
    ## root,/bin/bash
    ## bin,/sbin/nologin
    ## daemon,/sbin/nologin
    ## adm,/sbin/nologin
    ## lp,/sbin/nologin
    ## sync,/bin/sync
    ## shutdown,/sbin/shutdown
    ## halt,/sbin/halt
    ## mail,/sbin/nologin
    ## operator,/sbin/nologin
    ## games,/sbin/nologin
    ## ftp,/sbin/nologin
    ## nobody,/sbin/nologin
    ## systemd-network,/sbin/nologin
    ## dbus,/sbin/nologin
    ## polkitd,/sbin/nologin
    ## ntp,/sbin/nologin
    ## sshd,/sbin/nologin
    ## postfix,/sbin/nologin
    ## chrony,/sbin/nologin
    ## huazhou,/bin/bash
    ## mongodb,/sbin/nologin
    ## tss,/sbin/nologin
    ## rstudio-server,/bin/bash
    ## shiny,/bin/sh
    ## saslauth,/sbin/nologin
    ## hwachou,/bin/bash
    ## juhkim111,/bin/bash
    ## huijun.an,/bin/bash
    ## edenaxe,/bin/bash
    ## seancampeau,/bin/bash
    ## brendon.chau,/bin/bash
    ## elviscuihan,/bin/bash
    ## qedo,/bin/bash
    ## fangyw1995,/bin/bash
    ## suneeta.godbole,/bin/bash
    ## yunh,/bin/bash
    ## bhsiao,/bin/bash
    ## hujuehao,/bin/bash
    ## lucymjimenez,/bin/bash
    ## yoonjun05,/bin/bash
    ## shelleyjung,/bin/bash
    ## bryanmkevan,/bin/bash
    ## sdkwok2,/bin/bash
    ## dereklee20,/bin/bash
    ## liuweijie12345678,/bin/bash
    ## peterljw,/bin/bash
    ## kristenmae,/bin/bash
    ## menrge666,/bin/bash
    ## anorthrup,/bin/bash
    ## ethanpark26,/bin/bash
    ## shryu94,/bin/bash
    ## elliewky,/bin/bash
    ## sijiawang0729,/bin/bash
    ## xiayu960112,/bin/bash
    ## haowenxu930622,/bin/bash
    ## zfy917,/bin/bash
    ## suedez,/bin/bash
    ## wenyan1996,/bin/bash
    ## dmorrison01,/bin/bash
    ## biona001,/bin/bash
    ## gaoxueyao,/bin/bash
    ## jian.he,/bin/bash
    ## zyshi,/bin/bash
    ## wudiyangabc,/bin/bash
    ## kaversoniano,/bin/bash
    ## edwardmjyu,/bin/bash
    ## lizhang1122,/bin/bash
    ## ryhwang,/bin/bash
  • Print login names and mark those with UID >= 1000 as vip:

    awk -F: -v status="" '{OFS = ","} 
    {if ($3 >= 1000) status="vip"; else status="regular"} 
    {print $1, status}' /etc/passwd
    ## root,regular
    ## bin,regular
    ## daemon,regular
    ## adm,regular
    ## lp,regular
    ## sync,regular
    ## shutdown,regular
    ## halt,regular
    ## mail,regular
    ## operator,regular
    ## games,regular
    ## ftp,regular
    ## nobody,regular
    ## systemd-network,regular
    ## dbus,regular
    ## polkitd,regular
    ## ntp,regular
    ## sshd,regular
    ## postfix,regular
    ## chrony,regular
    ## huazhou,vip
    ## mongodb,regular
    ## tss,regular
    ## rstudio-server,regular
    ## shiny,regular
    ## saslauth,regular
    ## hwachou,vip
    ## juhkim111,vip
    ## huijun.an,vip
    ## edenaxe,vip
    ## seancampeau,vip
    ## brendon.chau,vip
    ## elviscuihan,vip
    ## qedo,vip
    ## fangyw1995,vip
    ## suneeta.godbole,vip
    ## yunh,vip
    ## bhsiao,vip
    ## hujuehao,vip
    ## lucymjimenez,vip
    ## yoonjun05,vip
    ## shelleyjung,vip
    ## bryanmkevan,vip
    ## sdkwok2,vip
    ## dereklee20,vip
    ## liuweijie12345678,vip
    ## peterljw,vip
    ## kristenmae,vip
    ## menrge666,vip
    ## anorthrup,vip
    ## ethanpark26,vip
    ## shryu94,vip
    ## elliewky,vip
    ## sijiawang0729,vip
    ## xiayu960112,vip
    ## haowenxu930622,vip
    ## zfy917,vip
    ## suedez,vip
    ## wenyan1996,vip
    ## dmorrison01,vip
    ## biona001,vip
    ## gaoxueyao,vip
    ## jian.he,vip
    ## zyshi,vip
    ## wudiyangabc,vip
    ## kaversoniano,vip
    ## edwardmjyu,vip
    ## lizhang1122,vip
    ## ryhwang,vip
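The vip example can also be written more compactly. The following is a sketch, not the form used in class: it sets the separators once in a BEGIN block and uses awk's conditional expression, and it runs on a three-line made-up sample (the user alice is hypothetical) instead of the real /etc/passwd:

```shell
# A tiny stand-in for /etc/passwd with the same colon-separated layout.
passwd_sample="$(mktemp)"
cat > "$passwd_sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
alice:x:1000:1001::/home/alice:/bin/bash
EOF

# BEGIN runs once before any input is read: set input (FS) and output (OFS) separators.
# The main block runs per line: print the login name and a label based on the UID ($3).
awk_out="$(awk 'BEGIN { FS = ":"; OFS = "," }
                { print $1, ($3 >= 1000 ? "vip" : "regular") }' "$passwd_sample")"

echo "$awk_out"
rm -f "$passwd_sample"
```

Setting OFS in BEGIN avoids reassigning it for every input line.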

Piping and redirection

  • | sends output from one command as input of another command.

  • > directs output from one command to a file.

  • >> appends output from one command to a file.

  • < reads input from a file.

  • Combinations of shell commands (grep, sed, awk, …), piping and redirection, and regular expressions allow us to pre-process and reformat huge text files efficiently.

  • See HW1.
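The four operators above can be seen together in a short sketch. It works on a temporary file, and the variable names are chosen only for this illustration:

```shell
out_file="$(mktemp)"

echo "first"  >  "$out_file"   # > creates the file or overwrites its contents
echo "second" >> "$out_file"   # >> appends to the end of the file

# < feeds the file to the command's standard input, so wc prints no file name.
line_count="$(wc -l < "$out_file" | tr -d ' ')"

# | chains commands: upper-case everything, then keep the first line.
upper_first="$(cat "$out_file" | tr 'a-z' 'A-Z' | head -n 1)"

echo "lines: $line_count, first line in upper case: $upper_first"
rm -f "$out_file"
```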

Text editors

Source: Editor War on Wikipedia.

Emacs

  • Emacs is a powerful text editor with extensive support for many languages including R, \(\LaTeX\), python, and C/C++; however it’s not installed by default on many Linux distributions.

  • Basic survival commands:
    • emacs filename to open a file with emacs.
    • CTRL-x CTRL-f to open an existing or new file.
    • CTRL-x CTRL-s to save.
    • CTRL-x CTRL-w to save as.
    • CTRL-x CTRL-c to quit.
  • Google emacs cheatsheet

C-<key> means hold the control key, and press <key>.
M-<key> means press the Esc key once, and press <key>.

Vi

  • Vi is ubiquitous (POSIX standard). Learn at least its basics; otherwise you can edit nothing on some clusters.

  • Basic survival commands:
    • vi filename to start editing a file.
    • vi is a modal editor: insert mode and normal mode. Pressing i switches from the normal mode to insert mode. Pressing ESC switches from the insert mode to normal mode.
    • :x<Return> quits vi and saves changes.
    • :q!<Return> quits vi without saving latest changes.
    • :w<Return> saves changes.
    • :wq<Return> quits vi and saves changes.
  • Google vi cheatsheet

IDE (Integrated Development Environment)

  • Statisticians write a lot of code. It is critical to adopt a good IDE that goes beyond code editing: syntax highlighting, executing code within the editor, debugging, profiling, version control, etc.

  • RStudio, Eclipse, Emacs, Matlab, Visual Studio, etc.

Processes

Processes

  • OS runs processes on behalf of user.

  • Each process has a process ID (PID), username (UID), parent process ID (PPID), time and date the process started (STIME), time running (TIME), etc.

    ps
    ##   PID TTY          TIME CMD
    ## 18125 ?        00:00:07 rsession
    ## 18301 ?        00:00:00 sshd
    ## 20674 ?        00:00:00 R
    ## 20763 ?        00:00:00 sh
    ## 20764 ?        00:00:00 ps
  • All currently running processes:

    ps -eaf
    ## UID        PID  PPID  C STIME TTY          TIME CMD
    ## root         1     0  0 Jan11 ?        00:00:10 /usr/lib/systemd/systemd --system --deserialize 15
    ## root         2     0  0 Jan11 ?        00:00:00 [kthreadd]
    ## root         3     2  0 Jan11 ?        00:00:00 [ksoftirqd/0]
    ## root         5     2  0 Jan11 ?        00:00:00 [kworker/0:0H]
    ## root         6     2  0 Jan11 ?        00:00:01 [kworker/u8:0]
    ## root         7     2  0 Jan11 ?        00:00:00 [migration/0]
    ## root         8     2  0 Jan11 ?        00:00:00 [rcu_bh]
    ## root         9     2  0 Jan11 ?        00:00:21 [rcu_sched]
    ## root        10     2  0 Jan11 ?        00:00:00 [lru-add-drain]
    ## root        11     2  0 Jan11 ?        00:00:02 [watchdog/0]
    ## root        12     2  0 Jan11 ?        00:00:02 [watchdog/1]
    ## root        13     2  0 Jan11 ?        00:00:00 [migration/1]
    ## root        14     2  0 Jan11 ?        00:00:00 [ksoftirqd/1]
    ## root        16     2  0 Jan11 ?        00:00:00 [kworker/1:0H]
    ## root        17     2  0 Jan11 ?        00:00:02 [watchdog/2]
    ## root        18     2  0 Jan11 ?        00:00:01 [migration/2]
    ## root        19     2  0 Jan11 ?        00:00:00 [ksoftirqd/2]
    ## root        21     2  0 Jan11 ?        00:00:00 [kworker/2:0H]
    ## root        22     2  0 Jan11 ?        00:00:01 [watchdog/3]
    ## root        23     2  0 Jan11 ?        00:00:01 [migration/3]
    ## root        24     2  0 Jan11 ?        00:00:00 [ksoftirqd/3]
    ## root        26     2  0 Jan11 ?        00:00:00 [kworker/3:0H]
    ## root        28     2  0 Jan11 ?        00:00:00 [kdevtmpfs]
    ## root        29     2  0 Jan11 ?        00:00:00 [netns]
    ## root        30     2  0 Jan11 ?        00:00:00 [khungtaskd]
    ## root        31     2  0 Jan11 ?        00:00:00 [writeback]
    ## root        32     2  0 Jan11 ?        00:00:00 [kintegrityd]
    ## root        33     2  0 Jan11 ?        00:00:00 [bioset]
    ## root        34     2  0 Jan11 ?        00:00:00 [bioset]
    ## root        35     2  0 Jan11 ?        00:00:00 [bioset]
    ## root        36     2  0 Jan11 ?        00:00:00 [kblockd]
    ## root        37     2  0 Jan11 ?        00:00:00 [md]
    ## root        38     2  0 Jan11 ?        00:00:00 [edac-poller]
    ## root        39     2  0 Jan11 ?        00:00:00 [watchdogd]
    ## root        46     2  0 Jan11 ?        00:00:00 [kswapd0]
    ## root        47     2  0 Jan11 ?        00:00:00 [ksmd]
    ## root        48     2  0 Jan11 ?        00:00:02 [khugepaged]
    ## root        49     2  0 Jan11 ?        00:00:00 [crypto]
    ## root        57     2  0 Jan11 ?        00:00:00 [kthrotld]
    ## root        59     2  0 Jan11 ?        00:00:00 [kmpath_rdacd]
    ## root        60     2  0 Jan11 ?        00:00:00 [kaluad]
    ## root        61     2  0 Jan11 ?        00:00:00 [kpsmoused]
    ## root        63     2  0 Jan11 ?        00:00:00 [ipv6_addrconf]
    ## root        64     2  0 Jan11 ?        00:00:07 [kworker/1:1]
    ## root        77     2  0 Jan11 ?        00:00:00 [deferwq]
    ## root       111     2  0 Jan11 ?        00:00:14 [kauditd]
    ## root      1445     2  0 Jan11 ?        00:00:00 [virtscsi-scan]
    ## root      1447     2  0 Jan11 ?        00:00:00 [scsi_eh_0]
    ## root      1448     2  0 Jan11 ?        00:00:00 [scsi_tmf_0]
    ## root      1455     2  0 Jan11 ?        00:00:00 [kworker/u8:2]
    ## root      1515     2  0 Jan11 ?        00:00:00 [bioset]
    ## root      1517     2  0 Jan11 ?        00:00:00 [xfsalloc]
    ## root      1519     2  0 Jan11 ?        00:00:00 [xfs_mru_cache]
    ## root      1525     2  0 Jan11 ?        00:00:00 [xfs-buf/sda1]
    ## root      1528     2  0 Jan11 ?        00:00:00 [xfs-data/sda1]
    ## root      1529     2  0 Jan11 ?        00:00:00 [xfs-conv/sda1]
    ## root      1535     2  0 Jan11 ?        00:00:00 [xfs-cil/sda1]
    ## root      1536     2  0 Jan11 ?        00:00:00 [xfs-reclaim/sda]
    ## root      1537     2  0 Jan11 ?        00:00:00 [xfs-log/sda1]
    ## root      1538     2  0 Jan11 ?        00:00:00 [xfs-eofblocks/s]
    ## root      1539     2  0 Jan11 ?        00:01:37 [xfsaild/sda1]
    ## root      1540     2  0 Jan11 ?        00:00:01 [kworker/0:1H]
    ## root      1582     2  0 Jan11 ?        00:00:00 [kworker/1:1H]
    ## root      1593     1  0 Jan11 ?        00:01:18 /usr/lib/systemd/systemd-journald
    ## root      2014     1  0 Jan11 ?        00:00:21 /sbin/auditd
    ## root      2533     2  0 Jan11 ?        00:00:00 [nfit]
    ## dbus      2603     1  0 Jan11 ?        00:00:01 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
    ## root      3095     1  0 Jan11 ?        00:00:01 /usr/lib/systemd/systemd-logind
    ## root      3137     1  0 Jan11 ?        00:00:00 /opt/shiny-server/ext/node/bin/shiny-server /opt/shiny-server/lib/main.js
    ## polkitd   3141     1  0 Jan11 ?        00:00:00 /usr/lib/polkit-1/polkitd --no-debug
    ## root      3151     1  0 Jan11 ?        00:00:00 /usr/sbin/acpid
    ## root      3155     1  0 Jan11 ?        00:00:01 /usr/sbin/crond -n
    ## root      3189     1  0 Jan11 tty1     00:00:00 /sbin/agetty --noclear tty1 linux
    ## root      3199     1  0 Jan11 ttyS0    00:00:00 /sbin/agetty --keep-baud 115200 38400 9600 ttyS0 vt220
    ## chrony    3203     1  0 Jan11 ?        00:00:00 /usr/sbin/chronyd
    ## rstudio+  3261     1  0 Jan11 ?        00:02:06 /usr/lib/rstudio-server/bin/rserver
    ## root      3269     1  0 Jan11 ?        00:00:00 /usr/bin/python -Es /usr/sbin/firewalld --nofork --nopid
    ## root      3283     1  0 Jan11 ?        00:00:07 /usr/sbin/NetworkManager --no-daemon
    ## root      3593  3283  0 Jan11 ?        00:00:00 /sbin/dhclient -d -q -sf /usr/libexec/nm-dhcp-helper -pf /var/run/dhclient-eth0.pid -lf /var/lib/NetworkManager/dhclient-41b5db38-b54b-449d-8471-1311ba0d5b71-eth0.lease -cf /var/lib/NetworkManager/dhclient-eth0.conf eth0
    ## root      3840     1  0 Jan11 ?        00:00:00 /usr/sbin/cupsd -f
    ## root      3842     1  0 Jan11 ?        00:00:46 /usr/bin/python2 -Es /usr/sbin/tuned -l -P
    ## root      3843     1  0 Jan11 ?        00:01:05 /usr/sbin/rsyslogd -n
    ## root      4120     1  0 Jan11 ?        00:00:12 /usr/sbin/sshd -D
    ## root      4122     1  0 Jan11 ?        00:00:19 /usr/bin/python /usr/bin/google_network_daemon
    ## root      4126     1  0 Jan11 ?        00:00:09 /usr/bin/python /usr/bin/google_clock_skew_daemon
    ## root      4128     1  0 Jan11 ?        00:00:31 /usr/bin/python /usr/bin/google_accounts_daemon
    ## root      4131     1  0 Jan11 ?        00:00:02 /usr/libexec/postfix/master -w
    ## postfix   4142  4131  0 Jan11 ?        00:00:00 qmgr -l -t unix -u
    ## root      4168     2  0 Jan11 ?        00:00:00 [kworker/3:1H]
    ## root      4174     2  0 Jan11 ?        00:00:00 [kworker/2:1H]
    ## root      5763     2  0 Jan12 ?        00:00:02 [kworker/3:0]
    ## root     11279     1  0 07:37 ?        00:00:00 /usr/lib/systemd/systemd-udevd
    ## root     15035     2  0 12:04 ?        00:00:00 [kworker/3:1]
    ## root     15673     2  0 13:10 ?        00:00:00 [kworker/0:1]
    ## root     18070     2  0 17:15 ?        00:00:00 [kworker/1:2]
    ## hwachou  18125  3261  0 17:21 ?        00:00:07 /usr/lib/rstudio-server/bin/rsession -u hwachou --launcher-token D66448C1
    ## root     18283     2  0 17:31 ?        00:00:00 [kworker/2:2]
    ## root     18296  4120  0 17:33 ?        00:00:00 sshd: hwachou [priv]
    ## hwachou  18301 18296  0 17:33 ?        00:00:00 sshd: hwachou@pts/0
    ## hwachou  18302 18301  0 17:33 pts/0    00:00:00 -bash
    ## postfix  19009  4131  0 17:40 ?        00:00:00 pickup -l -t unix -u
    ## root     19304     2  0 17:42 ?        00:00:00 [kworker/2:0]
    ## hwachou  19463 18302  0 17:42 pts/0    00:00:00 top
    ## root     20151     2  0 17:47 ?        00:00:00 [kworker/2:1]
    ## hwachou  20674 18125 45 17:51 ?        00:00:00 /usr/lib64/R/bin/exec/R --slave --no-save --no-restore -e rmarkdown::render('/home/hwachou/Hua-Zhou.github.io/teaching/biostatm280-2019winter/slides/02-linux/linux.Rmd',~+~~+~encoding~+~=~+~'UTF-8');
    ## root     20688     2  0 Jan11 ?        00:00:04 [kworker/0:0]
    ## hwachou  20765 20674  0 17:51 ?        00:00:00 sh -c 'bash'  -c 'ps -eaf' 2>&1
    ## hwachou  20766 20765  0 17:51 ?        00:00:00 ps -eaf
  • All Python processes:

    ps -eaf | grep python
    ## root      3269     1  0 Jan11 ?        00:00:00 /usr/bin/python -Es /usr/sbin/firewalld --nofork --nopid
    ## root      3842     1  0 Jan11 ?        00:00:46 /usr/bin/python2 -Es /usr/sbin/tuned -l -P
    ## root      4122     1  0 Jan11 ?        00:00:19 /usr/bin/python /usr/bin/google_network_daemon
    ## root      4126     1  0 Jan11 ?        00:00:09 /usr/bin/python /usr/bin/google_clock_skew_daemon
    ## root      4128     1  0 Jan11 ?        00:00:31 /usr/bin/python /usr/bin/google_accounts_daemon
    ## hwachou  20767 20674  0 17:51 ?        00:00:00 sh -c 'bash'  -c 'ps -eaf | grep python' 2>&1
    ## hwachou  20768 20767  0 17:51 ?        00:00:00 bash -c ps -eaf | grep python
    ## hwachou  20770 20768  0 17:51 ?        00:00:00 grep python
  • Process with PID=1:

    ps -fp 1
    ## UID        PID  PPID  C STIME TTY          TIME CMD
    ## root         1     0  0 Jan11 ?        00:00:10 /usr/lib/systemd/systemd --system --deserialize 15
  • All processes owned by a user:

    ps -fu hwachou
    ## UID        PID  PPID  C STIME TTY          TIME CMD
    ## hwachou  18125  3261  0 17:21 ?        00:00:07 /usr/lib/rstudio-server/bin/rsession -u hwachou --launcher-token D66448C1
    ## hwachou  18301 18296  0 17:33 ?        00:00:00 sshd: hwachou@pts/0
    ## hwachou  18302 18301  0 17:33 pts/0    00:00:00 -bash
    ## hwachou  19463 18302  0 17:42 pts/0    00:00:00 top
    ## hwachou  20674 18125 46 17:51 ?        00:00:00 /usr/lib64/R/bin/exec/R --slave --no-save --no-restore -e rmarkdown::render('/home/hwachou/Hua-Zhou.github.io/teaching/biostatm280-2019winter/slides/02-linux/linux.Rmd',~+~~+~encoding~+~=~+~'UTF-8');
    ## hwachou  20773 20674  0 17:51 ?        00:00:00 sh -c 'bash'  -c 'ps -fu hwachou' 2>&1
    ## hwachou  20774 20773  0 17:51 ?        00:00:00 ps -fu hwachou

Kill processes

  • Kill process with PID=1001:

    kill 1001
  • Kill all R processes.

    killall -r R
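To see the life cycle of a process without touching any real job, here is a small sketch that starts a harmless background `sleep`, checks that it is alive, and kills it by PID (the variable names are made up for this illustration):

```shell
# Start a harmless background process; $! holds the PID of the last background job.
sleep 60 &
sleeper_pid=$!

# kill -0 sends no signal; it only tests whether the process exists.
if kill -0 "$sleeper_pid" 2>/dev/null; then
  was_running="yes"
else
  was_running="no"
fi

# Plain kill sends SIGTERM (signal 15), asking the process to terminate.
kill "$sleeper_pid"

# wait reports the exit status; 128 + 15 = 143 means "terminated by SIGTERM".
exit_status=0
wait "$sleeper_pid" 2>/dev/null || exit_status=$?

echo "was running: $was_running, exit status after kill: $exit_status"
```

If a process ignores SIGTERM, `kill -9` (SIGKILL) is the usual last resort, since it cannot be caught or ignored.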

top

  • top prints realtime process information (very useful).

    top

Secure shell (SSH)

SSH

SSH (secure shell) is the dominant cryptographic network protocol for secure network connection via an insecure network.

  • On Linux or Mac, access the teaching server by

    ssh username@server.biostat-m280.info
  • Windows machines need the PuTTY program (free).

Use keys over password

  • Key authentication is more secure than password authentication. Most passwords are weak.

  • A script or a program may need to systematically SSH into other machines.

  • Log into multiple machines using the same key.

  • Seamless use of many services: Git, AWS or Google cloud service, parallel computing on multiple hosts, etc.

  • Many servers only allow key authentication and do not accept password authentication.

Key authentication


  • Public key. Put on the machine(s) you want to log in.

  • Private key. Put on your own computer. Consider this as the actual key in your pocket; never give to others.

  • Messages from the server to your computer are encrypted with your public key. They can only be decrypted using your private key.

  • Messages from your computer to the server are signed with your private key (digital signatures) and can be verified by anyone who has your public key (authentication).

Steps to generate keys

  • On Linux or Mac, to generate a key pair:

    ssh-keygen -t rsa -f ~/.ssh/[KEY_FILENAME] -C [USERNAME]
    • [KEY_FILENAME] is the name that you want to use for your SSH key files. For example, a filename of id_rsa generates a private key file named id_rsa and a public key file named id_rsa.pub.

    • [USERNAME] is the user for whom you will apply this SSH key.

    • Use an (optional) passphrase different from your password.

  • Set correct permissions on the .ssh folder and key files

    chmod 400 ~/.ssh/[KEY_FILENAME]

  • Append the public key to the ~/.ssh/authorized_keys file of any Linux machine we want to SSH to, e.g.,

    ssh-copy-id -i ~/.ssh/[KEY_FILENAME] [USERNAME]@server.biostat-m280.info
  • Test your new key.

    ssh -i ~/.ssh/[KEY_FILENAME] [USERNAME]@server.biostat-m280.info
  • Now you don’t need a password each time you connect from your machine to the teaching server.


  • If you set a passphrase when generating keys, you’ll be prompted for the passphrase each time the private key is used. Avoid repeatedly entering the passphrase by using ssh-agent on Linux/Mac or Pageant on Windows.

  • The same key pair can be used between any two machines. We don’t need to regenerate keys for each new connection.

  • For Windows users, the private key generated by ssh-keygen cannot be directly used by PuTTY; use PuTTYgen for conversion. Then let PuTTY use the converted private key. Read tutorial.
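The permission step in the list above only shows the private key. A sketch of commonly used modes for the folder and both key files follows; it runs on a temporary directory with empty placeholder files, so the directory, the file names, and the variable names are stand-ins and no real key is touched:

```shell
# Hypothetical stand-in for ~/.ssh so that real keys are never modified.
demo_ssh_dir="$(mktemp -d)"
touch "$demo_ssh_dir/id_rsa" "$demo_ssh_dir/id_rsa.pub"   # empty placeholders, not real keys

chmod 700 "$demo_ssh_dir"             # folder: accessible by the owner only
chmod 400 "$demo_ssh_dir/id_rsa"      # private key: readable by the owner only
chmod 644 "$demo_ssh_dir/id_rsa.pub"  # public key: may be readable by everyone

# ls -l shows the mode in the first 10 characters of each line.
dir_mode="$(ls -ld "$demo_ssh_dir" | cut -c1-10)"
key_mode="$(ls -l "$demo_ssh_dir/id_rsa" | cut -c1-10)"
pub_mode="$(ls -l "$demo_ssh_dir/id_rsa.pub" | cut -c1-10)"

echo "$dir_mode $key_mode $pub_mode"
rm -rf "$demo_ssh_dir"
```

SSH clients typically refuse to use a private key that other users can read, which is why this step matters.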

Transfer files between machines

  • scp securely transfers files between machines using SSH.

    ## copy file from local to remote
    scp [LOCALFILE] [USERNAME]@server.biostat-m280.info:/[PATHTOFOLDER]
    ## copy file from remote to local
    scp [USERNAME]@server.biostat-m280.info:/[PATHTOFILE] [PATHTOLOCALFOLDER]
  • sftp is FTP via SSH.

  • GUIs for Windows (WinSCP) or Mac (Cyberduck).

  • (My preferred way) Use a version control system to sync project files between different machines and systems.

Line breaks in text files

  • Windows uses a pair of CR and LF for line breaks.

  • Linux/Unix uses an LF character only.

  • MacOS X also uses a single LF character. But old Mac OS used a single CR character for line breaks.

  • If transferred in binary mode (bit by bit) between OSs, a text file could look a mess.

  • Most transfer programs automatically switch to text mode when transferring text files and perform conversion of line breaks between different OSs; but I used to run into problems using WinSCP. Sometimes you have to tell WinSCP explicitly a text file is being transferred.
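When a file does arrive with Windows line breaks, the extra CR characters can be removed on the Linux side. A minimal sketch using `tr` on a made-up two-line file (utilities such as dos2unix do the same job where they are installed):

```shell
# Create a small file with Windows-style CR+LF line breaks.
crlf_file="$(mktemp)"
printf 'a,b\r\n1,2\r\n' > "$crlf_file"

# Count CR characters: tr -cd '\r' deletes everything except CR.
cr_before="$(tr -cd '\r' < "$crlf_file" | wc -c | tr -d ' ')"

# Delete every CR, leaving Unix-style LF line breaks.
lf_file="$(mktemp)"
tr -d '\r' < "$crlf_file" > "$lf_file"
cr_after="$(tr -cd '\r' < "$lf_file" | wc -c | tr -d ' ')"

echo "CR characters before: $cr_before, after: $cr_after"
rm -f "$crlf_file" "$lf_file"
```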

Run R in Linux

Interactive mode

  • Start R in the interactive mode by typing R in shell.

  • Then run R script by

    source("script.R")

Batch mode

  • Demo script meanEst.R implements a (terrible) estimator of the mean, which averages only the observations with prime indices: \[ {\widehat \mu}_n = \frac{\sum_{i=1}^n x_i 1_{i \text{ is prime}}}{\sum_{i=1}^n 1_{i \text{ is prime}}}. \]

    ## ## check if a given integer is prime
    ## isPrime = function(n) {
    ##   if (n <= 3) {
    ##     return (TRUE)
    ##   }
    ##   if (any((n %% 2:floor(sqrt(n))) == 0)) {
    ##     return (FALSE)
    ##   }
    ##   return (TRUE)
    ## }
    ## 
    ## ## estimate mean only using observation with prime indices
    ## estMeanPrimes = function (x) {
    ##   n = length(x)
    ##   ind = sapply(1:n, isPrime)
    ##   return (mean(x[ind]))
    ## }
    ## 
    ## print(estMeanPrimes(rnorm(100000)))

  • To run your R code non-interactively aka in batch mode, we have at least two options:

    # default output to meanEst.Rout
    R CMD BATCH meanEst.R

    or

    # output to stdout
    Rscript meanEst.R
  • Batch calls are typically automated using a scripting language, e.g., Python, Perl, or a shell script.

Pass arguments to R scripts

  • Specify arguments in R CMD BATCH:

    R CMD BATCH '--args mu=1 sig=2 kap=3' script.R
  • Specify arguments in Rscript:

    Rscript script.R mu=1 sig=2 kap=3
  • Parse command line arguments using magic formula

    for (arg in commandArgs(TRUE)) {
      eval(parse(text=arg))
    }

    in R script. After calling the above code, all command line arguments will be available in the global namespace.


  • To understand the magic formula commandArgs, run R by:

    R --args mu=1 sig=2 kap=3

    and then issue commands in R

    commandArgs()
    commandArgs(TRUE)

  • Understand the magic formula parse and eval:

    rm(list=ls())
    print(x)
    ## Error in print(x): object 'x' not found
    parse(text="x=3")
    ## expression(x = 3)
    eval(parse(text="x=3"))
    print(x)
    ## [1] 3

  • runSim.R has components: (1) method implementation, (2) data generator with unspecified parameter n, (3) estimation based on generated data, and (4) command argument parser.
    ## ## parsing command arguments
    ## for (arg in commandArgs(TRUE)) {
    ##   eval(parse(text=arg))
    ## }
    ## 
    ## ## check if a given integer is prime
    ## isPrime = function(n) {
    ##   if (n <= 3) {
    ##     return (TRUE)
    ##   }
    ##   if (any((n %% 2:floor(sqrt(n))) == 0)) {
    ##     return (FALSE)
    ##   }
    ##   return (TRUE)
    ## }
    ## 
    ## ## estimate mean only using observation with prime indices
    ## estMeanPrimes = function (x) {
    ##   n = length(x)
    ##   ind = sapply(1:n, isPrime)
    ##   return (mean(x[ind]))
    ## }
    ## 
    ## # simulate data
    ## x = rnorm(n)
    ## 
    ## # estimate mean
    ## estMeanPrimes(x)

  • Call runSim.R with sample size n=100:

    R CMD BATCH '--args n=100' runSim.R

    or

    Rscript runSim.R n=100
    ## [1] 0.1553339

Run long jobs

  • Many statistical computing tasks take long: simulation, MCMC, etc.

  • nohup command in Linux runs program(s) immune to hangups and writes output to nohup.out by default. Logging out will not kill the process; we can log in later to check status and results.

  • nohup is POSIX standard thus available on Linux and MacOS.

  • Run runSim.R in the background and write output to nohup.out:

    nohup Rscript runSim.R n=100 &
    ## [1] -0.2407291
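It is often convenient to send the output of each job to its own log file instead of the shared nohup.out. The sketch below shows that pattern with a short stand-in command in place of Rscript; the job itself and the variable names are hypothetical:

```shell
log_file="$(mktemp)"

# Stand-in for a long-running job. stdout and stderr both go to the log file,
# stdin comes from /dev/null, and the trailing & puts the job in the background.
nohup sh -c 'echo "job started"; sleep 1; echo "job finished"' \
  > "$log_file" 2>&1 < /dev/null &
nohup_pid=$!

# In real use we would log out here; for this demo we simply wait for the job.
wait "$nohup_pid"

last_line="$(tail -n 1 "$log_file")"
echo "last line of log: $last_line"
rm -f "$log_file"
```

Keeping the PID, as in `nohup_pid`, makes it easy to check on the job later with ps or to stop it with kill.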

screen

  • screen is another popular utility, but not installed by default.

  • Typical workflow using screen.

    1. Access remote server using ssh.

    2. Start jobs in batch mode.

    3. Detach jobs.

    4. Exit from server, wait for jobs to finish.

    5. Access remote server using ssh.

    6. Re-attach jobs, check on progress, get results, etc.

Use R to call R

R in conjunction with nohup or screen can be used to orchestrate a large simulation study.

  • It can be more elegant, transparent, and robust to parallelize jobs corresponding to different scenarios (e.g., different generative models) outside of the code used to do statistical computation.

  • We consider a simulation study in R but the same approach could be used with code written in Julia, Matlab, Python, etc.

  • Python in many ways makes a better glue.


  • Suppose we have
    • runSim.R which runs a simulation based on command line argument n.
    • A large collection of n values that we want to use in our simulation study.
    • Access to a server with 128 cores.
  • Option 1: manually call runSim.R for each setting.

  • Option 2: automate calls using R and nohup, as in autoSim.R.


  • cat autoSim.R
    ## # autoSim.R
    ## 
    ## nVals <- seq(100, 1000, by=100)
    ## for (n in nVals) {
    ##   oFile <- paste("n", n, ".txt", sep="")
    ##   sysCall <- paste("nohup Rscript runSim.R n=", n, " > ", oFile, sep="")
    ##   system(sysCall)
    ##   print(paste("sysCall=", sysCall, sep=""))
    ## }

  • Rscript autoSim.R
    ## [1] "sysCall=nohup Rscript runSim.R n=100 > n100.txt"
    ## [1] "sysCall=nohup Rscript runSim.R n=200 > n200.txt"
    ## [1] "sysCall=nohup Rscript runSim.R n=300 > n300.txt"
    ## [1] "sysCall=nohup Rscript runSim.R n=400 > n400.txt"
    ## [1] "sysCall=nohup Rscript runSim.R n=500 > n500.txt"
    ## [1] "sysCall=nohup Rscript runSim.R n=600 > n600.txt"
    ## [1] "sysCall=nohup Rscript runSim.R n=700 > n700.txt"
    ## [1] "sysCall=nohup Rscript runSim.R n=800 > n800.txt"
    ## [1] "sysCall=nohup Rscript runSim.R n=900 > n900.txt"
    ## [1] "sysCall=nohup Rscript runSim.R n=1000 > n1000.txt"
  • Now we just need to write a script to collect results from the output files.
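A collection script could look like the following sketch. It assumes each output file holds a single line of the form `[1] value`, as printed by runSim.R; to stay self-contained it first creates two fake output files in a temporary folder, and the numbers in them are made up rather than real simulation results:

```shell
work_dir="$(mktemp -d)"

# Fake outputs standing in for the files written by the Rscript calls.
printf '[1] 0.15\n'  > "$work_dir/n100.txt"
printf '[1] -0.24\n' > "$work_dir/n200.txt"

# Collect one row per sample size into a CSV file.
results="$work_dir/results.csv"
echo "n,estimate" > "$results"
for n in 100 200; do
  # The estimate is the second whitespace-separated field on the "[1] value" line.
  est="$(awk '/^\[1\]/ { print $2 }' "$work_dir/n${n}.txt")"
  echo "$n,$est" >> "$results"
done

collected="$(cat "$results")"
echo "$collected"
rm -rf "$work_dir"
```

The resulting CSV can then be read back into R for tables or plots.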