Linux is the most common platform for scientific computing.
Open source and community support.
Things break; when they break using Linux, it’s easy to fix.
Cost: it’s free!
Debian/Ubuntu is a popular choice for personal computers.
RHEL/CentOS is popular on servers.
The teaching server for this class runs CentOS 7.
macOS is built on Darwin, a Unix system descended from BSD (not from Linux), and it is POSIX compliant. Most shell commands we review here apply to the macOS terminal as well. Windows/DOS, unfortunately, is a totally different breed.
Show distribution/version on Linux:
cat /etc/*-release
## CentOS Linux release 7.6.1810 (Core)
## NAME="CentOS Linux"
## VERSION="7 (Core)"
## ID="centos"
## ID_LIKE="rhel fedora"
## VERSION_ID="7"
## PRETTY_NAME="CentOS Linux 7 (Core)"
## ANSI_COLOR="0;31"
## CPE_NAME="cpe:/o:centos:centos:7"
## HOME_URL="https://www.centos.org/"
## BUG_REPORT_URL="https://bugs.centos.org/"
##
## CENTOS_MANTISBT_PROJECT="CentOS-7"
## CENTOS_MANTISBT_PROJECT_VERSION="7"
## REDHAT_SUPPORT_PRODUCT="centos"
## REDHAT_SUPPORT_PRODUCT_VERSION="7"
##
## CentOS Linux release 7.6.1810 (Core)
## CentOS Linux release 7.6.1810 (Core)
Show distribution/version on MacOS:
sw_vers -productVersion
or
system_profiler SPSoftwareDataType
A shell translates commands to OS instructions.
Most commonly used shells include bash, csh, tcsh, zsh, etc.
Sometimes a script or a command does not run simply because it’s written for another shell.
We mostly use bash shell commands in this class.
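One common guard against running a script under the wrong shell is a shebang line, which names the interpreter the script was written for. The sketch below is only an illustration: the file name hello.sh and the scratch directory are made up, and it assumes bash is installed and on the PATH.

```shell
# Create a throwaway script whose first line (the shebang) requests bash.
workdir=$(mktemp -d)
cat > "$workdir/hello.sh" <<'EOF'
#!/usr/bin/env bash
# BASH_VERSION is set only when bash is the interpreter.
echo "running under bash ${BASH_VERSION:-(not bash)}"
EOF

# Make it executable and run it directly; the kernel reads the shebang
# and starts bash, whatever shell the command was typed in.
chmod +x "$workdir/hello.sh"
shebang_out=$("$workdir/hello.sh")
echo "$shebang_out"
```

Without a shebang, the script would be interpreted by whichever shell launches it, which is how bash-only syntax ends up failing under csh or a plain sh.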
Determine the current shell:
echo $SHELL
## /bin/bash
List available shells:
cat /etc/shells
## /bin/sh
## /bin/bash
## /usr/bin/sh
## /usr/bin/bash
Change to another shell:
exec bash -l
The -l option indicates it should be a login shell.
Change your login shell permanently:
chsh -s /bin/bash userid
Then log out and log in.
Bash provides the following standard completions by default. They save time and prevent many typing errors!
Pathname completion.
Filename completion.
Variablename completion: echo $[TAB][TAB].
Username completion: cd ~[TAB][TAB].
Hostname completion: ssh hwachou@[TAB][TAB].
It can also be customized to auto-complete other stuff such as options and command’s arguments. Google bash completion for more information.
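As a rough sketch of what such customization builds on, bash's compgen builtin can show which words from a list would complete a given prefix. The command name myservice and its word list are hypothetical, chosen only for illustration.

```shell
# compgen is a bash builtin, so run it through bash explicitly.
# -W supplies the candidate words; the argument after -- is the
# prefix typed so far.
matches=$(bash -c 'compgen -W "start stop restart status" -- st')
echo "$matches"
```

To offer the same words after a real command in an interactive session, a line such as `complete -W "start stop restart status" myservice` could go in ~/.bashrc; pressing TAB after `myservice st` would then propose the matching words.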
cat prints the contents of a file:
cat runSim.R
## ## parsing command arguments
## for (arg in commandArgs(TRUE)) {
## eval(parse(text=arg))
## }
##
## ## check if a given integer is prime
## isPrime = function(n) {
## if (n <= 3) {
## return (TRUE)
## }
## if (any((n %% 2:floor(sqrt(n))) == 0)) {
## return (FALSE)
## }
## return (TRUE)
## }
##
## ## estimate mean only using observation with prime indices
## estMeanPrimes = function (x) {
## n = length(x)
## ind = sapply(1:n, isPrime)
## return (mean(x[ind]))
## }
##
## # simulate data
## x = rnorm(n)
##
## # estimate mean
## estMeanPrimes(x)
head prints the first 10 lines of a file:
head runSim.R
## ## parsing command arguments
## for (arg in commandArgs(TRUE)) {
## eval(parse(text=arg))
## }
##
## ## check if a given integer is prime
## isPrime = function(n) {
## if (n <= 3) {
## return (TRUE)
## }
head -l (shorthand for head -n l) prints the first \(l\) lines of a file:
head -15 runSim.R
## ## parsing command arguments
## for (arg in commandArgs(TRUE)) {
## eval(parse(text=arg))
## }
##
## ## check if a given integer is prime
## isPrime = function(n) {
## if (n <= 3) {
## return (TRUE)
## }
## if (any((n %% 2:floor(sqrt(n))) == 0)) {
## return (FALSE)
## }
## return (TRUE)
## }
tail prints the last 10 lines of a file:
tail runSim.R
## n = length(x)
## ind = sapply(1:n, isPrime)
## return (mean(x[ind]))
## }
##
## # simulate data
## x = rnorm(n)
##
## # estimate mean
## estMeanPrimes(x)
tail -l (shorthand for tail -n l) prints the last \(l\) lines of a file:
tail -15 runSim.R
## return (TRUE)
## }
##
## ## estimate mean only using observation with prime indices
## estMeanPrimes = function (x) {
## n = length(x)
## ind = sapply(1:n, isPrime)
## return (mean(x[ind]))
## }
##
## # simulate data
## x = rnorm(n)
##
## # estimate mean
## estMeanPrimes(x)
less is more; more is less
more browses a text file screen by screen (only downwards). Scroll down one page (paging) by pressing the spacebar; exit by pressing the q key.
less is also a pager, but offers more functionality, e.g., scrolling both upwards and downwards through the input.
less doesn’t need to read the whole file before displaying it, so it opens large files faster than more.
grep
grep prints lines that match an expression:
Show lines that contain string CentOS:
# quotes not necessary if not a regular expression
grep 'CentOS' linux.Rmd
## - RHEL/CentOS is popular on servers.
## - The teaching server for this class runs CentOS 7.
## - Show lines that contain string `CentOS`:
## grep 'CentOS' linux.Rmd
## grep 'CentOS' *.Rmd
## grep -n 'CentOS' linux.Rmd
## - Replace `CentOS` by `RHEL` in a text file:
## sed 's/CentOS/RHEL/' linux.Rmd | grep RHEL
Search multiple text files:
grep 'CentOS' *.Rmd
## - RHEL/CentOS is popular on servers.
## - The teaching server for this class runs CentOS 7.
## - Show lines that contain string `CentOS`:
## grep 'CentOS' linux.Rmd
## grep 'CentOS' *.Rmd
## grep -n 'CentOS' linux.Rmd
## - Replace `CentOS` by `RHEL` in a text file:
## sed 's/CentOS/RHEL/' linux.Rmd | grep RHEL
Show matching line numbers:
grep -n 'CentOS' linux.Rmd
## 31:- RHEL/CentOS is popular on servers.
## 33:- The teaching server for this class runs CentOS 7.
## 294:- Show lines that contain string `CentOS`:
## 297: grep 'CentOS' linux.Rmd
## 302: grep 'CentOS' *.Rmd
## 307: grep -n 'CentOS' linux.Rmd
## 324:- Replace `CentOS` by `RHEL` in a text file:
## 326: sed 's/CentOS/RHEL/' linux.Rmd | grep RHEL
Find all files in current directory with .png extension:
# escape the dot so that it matches a literal '.' rather than any character
ls | grep '\.png$'
## key_authentication_1.png
## key_authentication_2.png
## linux_directory_structure.png
## linux_filepermission_oct.png
## linux_filepermission.png
## Richard_Stallman_2013.png
## screenshot_top.png
Find all directories in the current directory:
ls -al | grep '^d'
## drwxrwxr-x. 2 hwachou hwachou 4096 Jan 15 17:50 .
## drwxrwxr-x. 6 hwachou hwachou 102 Jan 15 04:02 ..
sed
sed is a stream editor.
Replace CentOS by RHEL in a text file:
sed 's/CentOS/RHEL/' linux.Rmd | grep RHEL
## - RHEL/RHEL is popular on servers.
## - The teaching server for this class runs RHEL 7.
## - Show lines that contain string `RHEL`:
## grep 'RHEL' linux.Rmd
## grep 'RHEL' *.Rmd
## grep -n 'RHEL' linux.Rmd
## - Replace `RHEL` by `RHEL` in a text file:
## sed 's/RHEL/RHEL/' linux.Rmd | grep RHEL
awk
awk is a filter and report writer.
First let’s display the first lines of the file /etc/passwd:
head /etc/passwd
## root:x:0:0:root:/root:/bin/bash
## bin:x:1:1:bin:/bin:/sbin/nologin
## daemon:x:2:2:daemon:/sbin:/sbin/nologin
## adm:x:3:4:adm:/var/adm:/sbin/nologin
## lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
## sync:x:5:0:sync:/sbin:/bin/sync
## shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
## halt:x:7:0:halt:/sbin:/sbin/halt
## mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
## operator:x:11:0:operator:/root:/sbin/nologin
Each line contains the fields (1) user name, (2) password, (3) user ID, (4) group ID, (5) user ID info, (6) home directory, and (7) command shell, separated by :.
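The field numbering can be tried on a small sample before touching the real file. In this sketch the three entries are invented and only follow the /etc/passwd layout; setting the separators in a BEGIN block is one conventional style, not the only one.

```shell
# A three-line sample in /etc/passwd layout (the users are invented).
sample=$(mktemp)
cat > "$sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
daemon:x:2:2:daemon:/sbin:/sbin/nologin
alice:x:1001:1003::/home/alice:/bin/zsh
EOF

# BEGIN runs once before any input: set input and output separators.
# The pattern $3 >= 1000 keeps accounts with UID of at least 1000;
# for those, print user name (field 1), UID (field 3), shell (field 7).
regular_users=$(awk 'BEGIN { FS = ":"; OFS = "," } $3 >= 1000 { print $1, $3, $7 }' "$sample")
echo "$regular_users"
rm -f "$sample"
```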
Print sorted list of login names:
awk -F: '{ print $1 }' /etc/passwd | sort | head -10
## adm
## anorthrup
## bhsiao
## bin
## biona001
## brendon.chau
## bryanmkevan
## chrony
## daemon
## dbus
Print number of lines in a file; NR holds the number of records (lines) read so far:
awk 'END { print NR }' /etc/passwd
## 69
or
wc -l /etc/passwd
## 69 /etc/passwd
or (not displaying file name)
wc -l < /etc/passwd
## 69
Print passwd entries of users with UID in range 1000-1035:
awk -F: '{if ($3 >= 1000 && $3 <= 1035) print}' /etc/passwd
## huazhou:x:1000:1001::/home/huazhou:/bin/bash
## hwachou:x:1001:1003::/home/hwachou:/bin/bash
## juhkim111:x:1002:1004::/home/juhkim111:/bin/bash
## huijun.an:x:1003:1005::/home/huijun.an:/bin/bash
## edenaxe:x:1004:1006::/home/edenaxe:/bin/bash
## seancampeau:x:1005:1007::/home/seancampeau:/bin/bash
## brendon.chau:x:1006:1008::/home/brendon.chau:/bin/bash
## elviscuihan:x:1007:1009::/home/elviscuihan:/bin/bash
## qedo:x:1008:1010::/home/qedo:/bin/bash
## fangyw1995:x:1009:1011::/home/fangyw1995:/bin/bash
## suneeta.godbole:x:1010:1012::/home/suneeta.godbole:/bin/bash
## yunh:x:1011:1013::/home/yunh:/bin/bash
## bhsiao:x:1012:1014::/home/bhsiao:/bin/bash
## hujuehao:x:1013:1015::/home/hujuehao:/bin/bash
## lucymjimenez:x:1014:1016::/home/lucymjimenez:/bin/bash
## yoonjun05:x:1015:1017::/home/yoonjun05:/bin/bash
## shelleyjung:x:1016:1018::/home/shelleyjung:/bin/bash
## bryanmkevan:x:1017:1019::/home/bryanmkevan:/bin/bash
## sdkwok2:x:1018:1020::/home/sdkwok2:/bin/bash
## dereklee20:x:1019:1021::/home/dereklee20:/bin/bash
## liuweijie12345678:x:1020:1022::/home/liuweijie12345678:/bin/bash
## peterljw:x:1021:1023::/home/peterljw:/bin/bash
## kristenmae:x:1022:1024::/home/kristenmae:/bin/bash
## menrge666:x:1023:1025::/home/menrge666:/bin/bash
## anorthrup:x:1024:1026::/home/anorthrup:/bin/bash
## ethanpark26:x:1025:1027::/home/ethanpark26:/bin/bash
## shryu94:x:1026:1028::/home/shryu94:/bin/bash
## elliewky:x:1027:1029::/home/elliewky:/bin/bash
## sijiawang0729:x:1028:1030::/home/sijiawang0729:/bin/bash
## xiayu960112:x:1029:1031::/home/xiayu960112:/bin/bash
## haowenxu930622:x:1030:1032::/home/haowenxu930622:/bin/bash
## zfy917:x:1032:1034::/home/zfy917:/bin/bash
## suedez:x:1033:1035::/home/suedez:/bin/bash
## wenyan1996:x:1034:1036::/home/wenyan1996:/bin/bash
## dmorrison01:x:1035:1037::/home/dmorrison01:/bin/bash
Print login names and login shells in comma-separated format:
awk -F: '{OFS = ","} {print $1, $7}' /etc/passwd
## root,/bin/bash
## bin,/sbin/nologin
## daemon,/sbin/nologin
## adm,/sbin/nologin
## lp,/sbin/nologin
## sync,/bin/sync
## shutdown,/sbin/shutdown
## halt,/sbin/halt
## mail,/sbin/nologin
## operator,/sbin/nologin
## games,/sbin/nologin
## ftp,/sbin/nologin
## nobody,/sbin/nologin
## systemd-network,/sbin/nologin
## dbus,/sbin/nologin
## polkitd,/sbin/nologin
## ntp,/sbin/nologin
## sshd,/sbin/nologin
## postfix,/sbin/nologin
## chrony,/sbin/nologin
## huazhou,/bin/bash
## mongodb,/sbin/nologin
## tss,/sbin/nologin
## rstudio-server,/bin/bash
## shiny,/bin/sh
## saslauth,/sbin/nologin
## hwachou,/bin/bash
## juhkim111,/bin/bash
## huijun.an,/bin/bash
## edenaxe,/bin/bash
## seancampeau,/bin/bash
## brendon.chau,/bin/bash
## elviscuihan,/bin/bash
## qedo,/bin/bash
## fangyw1995,/bin/bash
## suneeta.godbole,/bin/bash
## yunh,/bin/bash
## bhsiao,/bin/bash
## hujuehao,/bin/bash
## lucymjimenez,/bin/bash
## yoonjun05,/bin/bash
## shelleyjung,/bin/bash
## bryanmkevan,/bin/bash
## sdkwok2,/bin/bash
## dereklee20,/bin/bash
## liuweijie12345678,/bin/bash
## peterljw,/bin/bash
## kristenmae,/bin/bash
## menrge666,/bin/bash
## anorthrup,/bin/bash
## ethanpark26,/bin/bash
## shryu94,/bin/bash
## elliewky,/bin/bash
## sijiawang0729,/bin/bash
## xiayu960112,/bin/bash
## haowenxu930622,/bin/bash
## zfy917,/bin/bash
## suedez,/bin/bash
## wenyan1996,/bin/bash
## dmorrison01,/bin/bash
## biona001,/bin/bash
## gaoxueyao,/bin/bash
## jian.he,/bin/bash
## zyshi,/bin/bash
## wudiyangabc,/bin/bash
## kaversoniano,/bin/bash
## edwardmjyu,/bin/bash
## lizhang1122,/bin/bash
## ryhwang,/bin/bash
Print login names and indicate those with UID>=1000 as vip:
awk -F: -v status="" '{OFS = ","}
{if ($3 >= 1000) status="vip"; else status="regular"}
{print $1, status}' /etc/passwd
## root,regular
## bin,regular
## daemon,regular
## adm,regular
## lp,regular
## sync,regular
## shutdown,regular
## halt,regular
## mail,regular
## operator,regular
## games,regular
## ftp,regular
## nobody,regular
## systemd-network,regular
## dbus,regular
## polkitd,regular
## ntp,regular
## sshd,regular
## postfix,regular
## chrony,regular
## huazhou,vip
## mongodb,regular
## tss,regular
## rstudio-server,regular
## shiny,regular
## saslauth,regular
## hwachou,vip
## juhkim111,vip
## huijun.an,vip
## edenaxe,vip
## seancampeau,vip
## brendon.chau,vip
## elviscuihan,vip
## qedo,vip
## fangyw1995,vip
## suneeta.godbole,vip
## yunh,vip
## bhsiao,vip
## hujuehao,vip
## lucymjimenez,vip
## yoonjun05,vip
## shelleyjung,vip
## bryanmkevan,vip
## sdkwok2,vip
## dereklee20,vip
## liuweijie12345678,vip
## peterljw,vip
## kristenmae,vip
## menrge666,vip
## anorthrup,vip
## ethanpark26,vip
## shryu94,vip
## elliewky,vip
## sijiawang0729,vip
## xiayu960112,vip
## haowenxu930622,vip
## zfy917,vip
## suedez,vip
## wenyan1996,vip
## dmorrison01,vip
## biona001,vip
## gaoxueyao,vip
## jian.he,vip
## zyshi,vip
## wudiyangabc,vip
## kaversoniano,vip
## edwardmjyu,vip
## lizhang1122,vip
## ryhwang,vip
| sends output from one command as input of another command.
> directs output from one command to a file.
>> appends output from one command to a file.
< reads input from a file.
Combinations of shell commands (grep, sed, awk, …), piping and redirection, and regular expressions allow us to pre-process and reformat huge text files efficiently.
See HW1.
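A small end-to-end sketch may help tie the four operators together. The log file and its contents are invented for illustration; the commands are the ones reviewed above.

```shell
# Build a small text file to work on (contents are invented).
log=$(mktemp)
out=$(mktemp)
cat > "$log" <<'EOF'
host1 CentOS 7 ok
host2 Ubuntu 18.04 ok
host3 CentOS 7 failed
host4 CentOS 6 ok
EOF

# | chains the tools: grep keeps CentOS lines, sed renames every match
# on a line (g flag), awk keeps the first three columns, sort orders
# the rows. > writes the final result to a file.
grep 'CentOS' "$log" | sed 's/CentOS/RHEL/g' | awk '{ print $1, $2, $3 }' | sort > "$out"

# >> appends one more line; < feeds the file to wc as standard input.
echo "host5 RHEL 8" >> "$out"
n_lines=$(wc -l < "$out" | tr -d ' ')
first_line=$(head -n 1 "$out")
echo "lines: $n_lines, first: $first_line"
rm -f "$log" "$out"
```

Note that sed without the g flag replaces only the first match on each line, which is enough here but not in general.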
Emacs is a powerful text editor with extensive support for many languages including R, \(\LaTeX\), Python, and C/C++; however, it’s not installed by default on many Linux distributions.
emacs filename to open a file with emacs.
CTRL-x CTRL-f to open an existing or new file.
CTRL-x CTRL-s to save.
CTRL-x CTRL-w to save as.
CTRL-x CTRL-c to quit.
Google emacs cheatsheet
C-<key> means hold the control key, and press <key>.
M-<key> means press the Esc key once, and press <key>.
Vi is ubiquitous (POSIX standard). Learn at least its basics; otherwise you can edit nothing on some clusters.
vi filename to start editing a file.
vi is a modal editor: insert mode and normal mode. Pressing i switches from the normal mode to insert mode. Pressing ESC switches from the insert mode to normal mode.
:x<Return> quits vi and saves changes.
:q!<Return> quits vi without saving latest changes.
:w<Return> saves changes.
:wq<Return> quits vi and saves changes.
Google vi cheatsheet
Statisticians write a lot of code. It is critical to adopt a good IDE that goes beyond code editing: syntax highlighting, executing code within the editor, debugging, profiling, version control, etc.
RStudio, Eclipse, Emacs, Matlab, Visual Studio, etc.
The OS runs processes on behalf of users.
Each process has a process ID (PID), username (UID), parent process ID (PPID), time and date the process started (STIME), time running (TIME), etc.
ps
## PID TTY TIME CMD
## 18125 ? 00:00:07 rsession
## 18301 ? 00:00:00 sshd
## 20674 ? 00:00:00 R
## 20763 ? 00:00:00 sh
## 20764 ? 00:00:00 ps
All current running processes:
ps -eaf
## UID PID PPID C STIME TTY TIME CMD
## root 1 0 0 Jan11 ? 00:00:10 /usr/lib/systemd/systemd --system --deserialize 15
## root 2 0 0 Jan11 ? 00:00:00 [kthreadd]
## root 3 2 0 Jan11 ? 00:00:00 [ksoftirqd/0]
## root 5 2 0 Jan11 ? 00:00:00 [kworker/0:0H]
## root 6 2 0 Jan11 ? 00:00:01 [kworker/u8:0]
## root 7 2 0 Jan11 ? 00:00:00 [migration/0]
## root 8 2 0 Jan11 ? 00:00:00 [rcu_bh]
## root 9 2 0 Jan11 ? 00:00:21 [rcu_sched]
## root 10 2 0 Jan11 ? 00:00:00 [lru-add-drain]
## root 11 2 0 Jan11 ? 00:00:02 [watchdog/0]
## root 12 2 0 Jan11 ? 00:00:02 [watchdog/1]
## root 13 2 0 Jan11 ? 00:00:00 [migration/1]
## root 14 2 0 Jan11 ? 00:00:00 [ksoftirqd/1]
## root 16 2 0 Jan11 ? 00:00:00 [kworker/1:0H]
## root 17 2 0 Jan11 ? 00:00:02 [watchdog/2]
## root 18 2 0 Jan11 ? 00:00:01 [migration/2]
## root 19 2 0 Jan11 ? 00:00:00 [ksoftirqd/2]
## root 21 2 0 Jan11 ? 00:00:00 [kworker/2:0H]
## root 22 2 0 Jan11 ? 00:00:01 [watchdog/3]
## root 23 2 0 Jan11 ? 00:00:01 [migration/3]
## root 24 2 0 Jan11 ? 00:00:00 [ksoftirqd/3]
## root 26 2 0 Jan11 ? 00:00:00 [kworker/3:0H]
## root 28 2 0 Jan11 ? 00:00:00 [kdevtmpfs]
## root 29 2 0 Jan11 ? 00:00:00 [netns]
## root 30 2 0 Jan11 ? 00:00:00 [khungtaskd]
## root 31 2 0 Jan11 ? 00:00:00 [writeback]
## root 32 2 0 Jan11 ? 00:00:00 [kintegrityd]
## root 33 2 0 Jan11 ? 00:00:00 [bioset]
## root 34 2 0 Jan11 ? 00:00:00 [bioset]
## root 35 2 0 Jan11 ? 00:00:00 [bioset]
## root 36 2 0 Jan11 ? 00:00:00 [kblockd]
## root 37 2 0 Jan11 ? 00:00:00 [md]
## root 38 2 0 Jan11 ? 00:00:00 [edac-poller]
## root 39 2 0 Jan11 ? 00:00:00 [watchdogd]
## root 46 2 0 Jan11 ? 00:00:00 [kswapd0]
## root 47 2 0 Jan11 ? 00:00:00 [ksmd]
## root 48 2 0 Jan11 ? 00:00:02 [khugepaged]
## root 49 2 0 Jan11 ? 00:00:00 [crypto]
## root 57 2 0 Jan11 ? 00:00:00 [kthrotld]
## root 59 2 0 Jan11 ? 00:00:00 [kmpath_rdacd]
## root 60 2 0 Jan11 ? 00:00:00 [kaluad]
## root 61 2 0 Jan11 ? 00:00:00 [kpsmoused]
## root 63 2 0 Jan11 ? 00:00:00 [ipv6_addrconf]
## root 64 2 0 Jan11 ? 00:00:07 [kworker/1:1]
## root 77 2 0 Jan11 ? 00:00:00 [deferwq]
## root 111 2 0 Jan11 ? 00:00:14 [kauditd]
## root 1445 2 0 Jan11 ? 00:00:00 [virtscsi-scan]
## root 1447 2 0 Jan11 ? 00:00:00 [scsi_eh_0]
## root 1448 2 0 Jan11 ? 00:00:00 [scsi_tmf_0]
## root 1455 2 0 Jan11 ? 00:00:00 [kworker/u8:2]
## root 1515 2 0 Jan11 ? 00:00:00 [bioset]
## root 1517 2 0 Jan11 ? 00:00:00 [xfsalloc]
## root 1519 2 0 Jan11 ? 00:00:00 [xfs_mru_cache]
## root 1525 2 0 Jan11 ? 00:00:00 [xfs-buf/sda1]
## root 1528 2 0 Jan11 ? 00:00:00 [xfs-data/sda1]
## root 1529 2 0 Jan11 ? 00:00:00 [xfs-conv/sda1]
## root 1535 2 0 Jan11 ? 00:00:00 [xfs-cil/sda1]
## root 1536 2 0 Jan11 ? 00:00:00 [xfs-reclaim/sda]
## root 1537 2 0 Jan11 ? 00:00:00 [xfs-log/sda1]
## root 1538 2 0 Jan11 ? 00:00:00 [xfs-eofblocks/s]
## root 1539 2 0 Jan11 ? 00:01:37 [xfsaild/sda1]
## root 1540 2 0 Jan11 ? 00:00:01 [kworker/0:1H]
## root 1582 2 0 Jan11 ? 00:00:00 [kworker/1:1H]
## root 1593 1 0 Jan11 ? 00:01:18 /usr/lib/systemd/systemd-journald
## root 2014 1 0 Jan11 ? 00:00:21 /sbin/auditd
## root 2533 2 0 Jan11 ? 00:00:00 [nfit]
## dbus 2603 1 0 Jan11 ? 00:00:01 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
## root 3095 1 0 Jan11 ? 00:00:01 /usr/lib/systemd/systemd-logind
## root 3137 1 0 Jan11 ? 00:00:00 /opt/shiny-server/ext/node/bin/shiny-server /opt/shiny-server/lib/main.js
## polkitd 3141 1 0 Jan11 ? 00:00:00 /usr/lib/polkit-1/polkitd --no-debug
## root 3151 1 0 Jan11 ? 00:00:00 /usr/sbin/acpid
## root 3155 1 0 Jan11 ? 00:00:01 /usr/sbin/crond -n
## root 3189 1 0 Jan11 tty1 00:00:00 /sbin/agetty --noclear tty1 linux
## root 3199 1 0 Jan11 ttyS0 00:00:00 /sbin/agetty --keep-baud 115200 38400 9600 ttyS0 vt220
## chrony 3203 1 0 Jan11 ? 00:00:00 /usr/sbin/chronyd
## rstudio+ 3261 1 0 Jan11 ? 00:02:06 /usr/lib/rstudio-server/bin/rserver
## root 3269 1 0 Jan11 ? 00:00:00 /usr/bin/python -Es /usr/sbin/firewalld --nofork --nopid
## root 3283 1 0 Jan11 ? 00:00:07 /usr/sbin/NetworkManager --no-daemon
## root 3593 3283 0 Jan11 ? 00:00:00 /sbin/dhclient -d -q -sf /usr/libexec/nm-dhcp-helper -pf /var/run/dhclient-eth0.pid -lf /var/lib/NetworkManager/dhclient-41b5db38-b54b-449d-8471-1311ba0d5b71-eth0.lease -cf /var/lib/NetworkManager/dhclient-eth0.conf eth0
## root 3840 1 0 Jan11 ? 00:00:00 /usr/sbin/cupsd -f
## root 3842 1 0 Jan11 ? 00:00:46 /usr/bin/python2 -Es /usr/sbin/tuned -l -P
## root 3843 1 0 Jan11 ? 00:01:05 /usr/sbin/rsyslogd -n
## root 4120 1 0 Jan11 ? 00:00:12 /usr/sbin/sshd -D
## root 4122 1 0 Jan11 ? 00:00:19 /usr/bin/python /usr/bin/google_network_daemon
## root 4126 1 0 Jan11 ? 00:00:09 /usr/bin/python /usr/bin/google_clock_skew_daemon
## root 4128 1 0 Jan11 ? 00:00:31 /usr/bin/python /usr/bin/google_accounts_daemon
## root 4131 1 0 Jan11 ? 00:00:02 /usr/libexec/postfix/master -w
## postfix 4142 4131 0 Jan11 ? 00:00:00 qmgr -l -t unix -u
## root 4168 2 0 Jan11 ? 00:00:00 [kworker/3:1H]
## root 4174 2 0 Jan11 ? 00:00:00 [kworker/2:1H]
## root 5763 2 0 Jan12 ? 00:00:02 [kworker/3:0]
## root 11279 1 0 07:37 ? 00:00:00 /usr/lib/systemd/systemd-udevd
## root 15035 2 0 12:04 ? 00:00:00 [kworker/3:1]
## root 15673 2 0 13:10 ? 00:00:00 [kworker/0:1]
## root 18070 2 0 17:15 ? 00:00:00 [kworker/1:2]
## hwachou 18125 3261 0 17:21 ? 00:00:07 /usr/lib/rstudio-server/bin/rsession -u hwachou --launcher-token D66448C1
## root 18283 2 0 17:31 ? 00:00:00 [kworker/2:2]
## root 18296 4120 0 17:33 ? 00:00:00 sshd: hwachou [priv]
## hwachou 18301 18296 0 17:33 ? 00:00:00 sshd: hwachou@pts/0
## hwachou 18302 18301 0 17:33 pts/0 00:00:00 -bash
## postfix 19009 4131 0 17:40 ? 00:00:00 pickup -l -t unix -u
## root 19304 2 0 17:42 ? 00:00:00 [kworker/2:0]
## hwachou 19463 18302 0 17:42 pts/0 00:00:00 top
## root 20151 2 0 17:47 ? 00:00:00 [kworker/2:1]
## hwachou 20674 18125 45 17:51 ? 00:00:00 /usr/lib64/R/bin/exec/R --slave --no-save --no-restore -e rmarkdown::render('/home/hwachou/Hua-Zhou.github.io/teaching/biostatm280-2019winter/slides/02-linux/linux.Rmd',~+~~+~encoding~+~=~+~'UTF-8');
## root 20688 2 0 Jan11 ? 00:00:04 [kworker/0:0]
## hwachou 20765 20674 0 17:51 ? 00:00:00 sh -c 'bash' -c 'ps -eaf' 2>&1
## hwachou 20766 20765 0 17:51 ? 00:00:00 ps -eaf
All Python processes:
ps -eaf | grep python
## root 3269 1 0 Jan11 ? 00:00:00 /usr/bin/python -Es /usr/sbin/firewalld --nofork --nopid
## root 3842 1 0 Jan11 ? 00:00:46 /usr/bin/python2 -Es /usr/sbin/tuned -l -P
## root 4122 1 0 Jan11 ? 00:00:19 /usr/bin/python /usr/bin/google_network_daemon
## root 4126 1 0 Jan11 ? 00:00:09 /usr/bin/python /usr/bin/google_clock_skew_daemon
## root 4128 1 0 Jan11 ? 00:00:31 /usr/bin/python /usr/bin/google_accounts_daemon
## hwachou 20767 20674 0 17:51 ? 00:00:00 sh -c 'bash' -c 'ps -eaf | grep python' 2>&1
## hwachou 20768 20767 0 17:51 ? 00:00:00 bash -c ps -eaf | grep python
## hwachou 20770 20768 0 17:51 ? 00:00:00 grep python
Process with PID=1:
ps -fp 1
## UID PID PPID C STIME TTY TIME CMD
## root 1 0 0 Jan11 ? 00:00:10 /usr/lib/systemd/systemd --system --deserialize 15
All processes owned by a user:
ps -fu hwachou
## UID PID PPID C STIME TTY TIME CMD
## hwachou 18125 3261 0 17:21 ? 00:00:07 /usr/lib/rstudio-server/bin/rsession -u hwachou --launcher-token D66448C1
## hwachou 18301 18296 0 17:33 ? 00:00:00 sshd: hwachou@pts/0
## hwachou 18302 18301 0 17:33 pts/0 00:00:00 -bash
## hwachou 19463 18302 0 17:42 pts/0 00:00:00 top
## hwachou 20674 18125 46 17:51 ? 00:00:00 /usr/lib64/R/bin/exec/R --slave --no-save --no-restore -e rmarkdown::render('/home/hwachou/Hua-Zhou.github.io/teaching/biostatm280-2019winter/slides/02-linux/linux.Rmd',~+~~+~encoding~+~=~+~'UTF-8');
## hwachou 20773 20674 0 17:51 ? 00:00:00 sh -c 'bash' -c 'ps -fu hwachou' 2>&1
## hwachou 20774 20773 0 17:51 ? 00:00:00 ps -fu hwachou
Kill process with PID=1001:
kill 1001
Kill all R processes:
killall -r R
top
top prints realtime process information (very useful).
top
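The process commands above can be rehearsed safely on a job we start ourselves. This is a minimal sketch: sleep stands in for a real long-running program, and the choice of ps output columns is ours.

```shell
# Start a harmless long-running job in the background; $! is its PID.
sleep 300 &
pid=$!

# Show just this process: -p selects by PID, -o chooses the columns.
ps -p "$pid" -o pid,ppid,user,comm

# kill sends SIGTERM by default, which asks the process to terminate.
kill "$pid"

# wait reaps the job; its status is non-zero because a signal ended it.
status=0
wait "$pid" 2>/dev/null || status=$?

# kill -0 sends no signal; it only tests whether the PID still exists.
if kill -0 "$pid" 2>/dev/null; then
  still_running=yes
else
  still_running=no
fi
echo "exit status $status, still running: $still_running"
```

Checking the PID with ps before calling kill is a cheap way to avoid terminating the wrong process.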
SSH (secure shell) is the dominant cryptographic network protocol for secure network connection via an insecure network.
On Linux or Mac, access the teaching server by
ssh username@server.biostat-m280.info
Windows machines need the PuTTY program (free).
Key authentication is more secure than password. Most passwords are weak.
Script or a program may need to systematically SSH into other machines.
Log into multiple machines using the same key.
Seamless use of many services: Git, AWS or Google cloud service, parallel computing on multiple hosts, etc.
Many servers only allow key authentication and do not accept password authentication.
Public key. Put on the machine(s) you want to log in.
Private key. Put on your own computer. Consider this as the actual key in your pocket; never give to others.
Messages from the server to your computer are encrypted with your public key. They can only be decrypted using your private key.
Messages from your computer to the server are signed with your private key (digital signatures) and can be verified by anyone who has your public key (authentication).
On Linux or Mac, to generate a key pair:
ssh-keygen -t rsa -f ~/.ssh/[KEY_FILENAME] -C [USERNAME]
[KEY_FILENAME] is the name that you want to use for your SSH key files. For example, a filename of id_rsa generates a private key file named id_rsa and a public key file named id_rsa.pub.
[USERNAME] is the user for whom you will apply this SSH key.
Use an (optional) passphrase different from your password.
Set correct permissions on the .ssh folder and key files
chmod 400 ~/.ssh/[KEY_FILENAME]
Append the public key to the ~/.ssh/authorized_keys file of any Linux machine we want to SSH to, e.g.,
ssh-copy-id -i ~/.ssh/[KEY_FILENAME] [USERNAME]@server.biostat-m280.info
Test your new key.
ssh -i ~/.ssh/[KEY_FILENAME] [USERNAME]@server.biostat-m280.info
Now you don’t need a password each time you connect from your machine to the teaching server.
If you set a passphrase when generating keys, you’ll be prompted for the passphrase each time the private key is used. Avoid repeatedly entering the passphrase by using ssh-agent on Linux/Mac or Pageant on Windows.
Same key pair can be used between any two machines. We don’t need to regenerate keys for each new connection.
For Windows users, the private key generated by ssh-keygen cannot be directly used by PuTTY; use PuTTYgen for conversion. Then let PuTTY use the converted private key. Read tutorial.
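On Linux or Mac, one way to avoid typing the -i option and the full host name on every connection is a host entry in the OpenSSH client configuration. The sketch below writes such an entry to a scratch file for inspection only; the alias m280 and the key file name id_rsa_m280 are hypothetical, and [USERNAME] is left as a placeholder.

```shell
# Write the entry to a scratch file; in real use it would be appended
# to ~/.ssh/config. The alias and key file name are made up.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
Host m280
    HostName server.biostat-m280.info
    User [USERNAME]
    IdentityFile ~/.ssh/id_rsa_m280
    IdentitiesOnly yes
EOF

# ssh rejects a config file that other users can write; keep it private.
chmod 600 "$cfg"
cat "$cfg"
```

With an entry like this in ~/.ssh/config, `ssh m280` would select the named key and user automatically, and scp would accept the same alias.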
scp securely transfers files between machines using SSH.
## copy file from local to remote
scp [LOCALFILE] [USERNAME]@server.biostat-m280.info:/[PATHTOFOLDER]
## copy file from remote to local
scp [USERNAME]@server.biostat-m280.info:/[PATHTOFILE] [PATHTOLOCALFOLDER]
sftp is an FTP-like file transfer program that runs over SSH.
GUIs for Windows (WinSCP) or Mac (Cyberduck).
(My preferred way) Use a version control system to sync project files between different machines and systems.
Windows uses a pair of CR and LF for line breaks.
Linux/Unix uses an LF character only.
MacOS X also uses a single LF character. But old Mac OS used a single CR character for line breaks.
If transferred in binary mode (bit by bit) between OSs, a text file could look a mess.
Most transfer programs automatically switch to text mode when transferring text files and perform conversion of line breaks between different OSs; but I used to run into problems using WinSCP. Sometimes you have to tell WinSCP explicitly a text file is being transferred.
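When a transferred text file does look odd, the line endings can be checked and repaired from the shell. This is a minimal sketch on a made-up two-line file; tr is only one of several tools that can do the conversion.

```shell
# Create a two-line file with Windows (CR LF) line endings.
dos_file=$(mktemp)
unix_file=$(mktemp)
printf 'alpha\r\nbeta\r\n' > "$dos_file"

# Count lines that end in a carriage return. The CR character is
# produced with printf so the pattern works in any POSIX shell.
cr=$(printf '\r')
n_crlf=$(grep -c "$cr\$" "$dos_file" || true)

# tr -d deletes every CR, leaving Unix (LF only) line endings.
tr -d '\r' < "$dos_file" > "$unix_file"
n_left=$(grep -c "$cr\$" "$unix_file" || true)
echo "CR LF lines before: $n_crlf, after: $n_left"
rm -f "$dos_file" "$unix_file"
```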
Start R in the interactive mode by typing R in shell.
Then run R script by
source("script.R")
Demo script meanEst.R implements a (terrible) estimator of the mean that averages only the observations with prime indices: \[
{\widehat \mu}_n = \frac{\sum_{i=1}^n x_i 1_{i \text{ is prime}}}{\sum_{i=1}^n 1_{i \text{ is prime}}}.
\]
## ## check if a given integer is prime
## isPrime = function(n) {
## if (n <= 3) {
## return (TRUE)
## }
## if (any((n %% 2:floor(sqrt(n))) == 0)) {
## return (FALSE)
## }
## return (TRUE)
## }
##
## ## estimate mean only using observation with prime indices
## estMeanPrimes = function (x) {
## n = length(x)
## ind = sapply(1:n, isPrime)
## return (mean(x[ind]))
## }
##
## print(estMeanPrimes(rnorm(100000)))
To run your R code non-interactively aka in batch mode, we have at least two options:
# default output to meanEst.Rout
R CMD BATCH meanEst.R
or
# output to stdout
Rscript meanEst.R
Batch calls are typically automated using a scripting language, e.g., Python, Perl, or shell script.
Specify arguments in R CMD BATCH:
R CMD BATCH '--args mu=1 sig=2 kap=3' script.R
Specify arguments in Rscript:
Rscript script.R mu=1 sig=2 kap=3
Parse command line arguments using magic formula
for (arg in commandArgs(TRUE)) {
eval(parse(text=arg))
}
in R script. After calling the above code, all command line arguments will be available in the global namespace.
To understand the magic formula commandArgs, run R by:
R '--args mu=1 sig=2 kap=3'
and then issue commands in R
commandArgs()
commandArgs(TRUE)
Understand the magic formula parse and eval:
rm(list=ls())
print(x)
## Error in print(x): object 'x' not found
parse(text="x=3")
## expression(x = 3)
eval(parse(text="x=3"))
print(x)
## [1] 3
runSim.R has components: (1) method implementation, (2) data generator with unspecified parameter n, (3) estimation based on generated data, and (4) command argument parser.
## ## parsing command arguments
## for (arg in commandArgs(TRUE)) {
## eval(parse(text=arg))
## }
##
## ## check if a given integer is prime
## isPrime = function(n) {
## if (n <= 3) {
## return (TRUE)
## }
## if (any((n %% 2:floor(sqrt(n))) == 0)) {
## return (FALSE)
## }
## return (TRUE)
## }
##
## ## estimate mean only using observation with prime indices
## estMeanPrimes = function (x) {
## n = length(x)
## ind = sapply(1:n, isPrime)
## return (mean(x[ind]))
## }
##
## # simulate data
## x = rnorm(n)
##
## # estimate mean
## estMeanPrimes(x)
Call runSim.R with sample size n=100:
R CMD BATCH '--args n=100' runSim.R
or
Rscript runSim.R n=100
## [1] 0.1553339
Many statistical computing tasks take long: simulation, MCMC, etc.
nohup command in Linux runs program(s) immune to hangups and writes output to nohup.out by default. Logging out will not kill the process; we can log in later to check status and results.
nohup is POSIX standard thus available on Linux and MacOS.
Run runSim.R in the background and write output to nohup.out:
nohup Rscript runSim.R n=100 &
## [1] -0.2407291
screen is another popular utility, but not installed by default.
Typical workflow using screen.
Access remote server using ssh.
Start jobs in batch mode.
Detach jobs.
Exit from server, wait for jobs to finish.
Access remote server using ssh.
Re-attach jobs, check on progress, get results, etc.
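Where screen is not available, a similar start-leave-return pattern can be sketched with nohup alone. Here a short sleep stands in for a long simulation, and the log file location is our own choice rather than anything the course prescribes.

```shell
# Stand-in for a long simulation (a real call might be
# Rscript runSim.R n=100).
job='sleep 1; echo "simulation finished"'
job_log=$(mktemp)

# nohup shields the job from the hangup signal sent at logout;
# > and 2>&1 collect its output in our own log instead of nohup.out;
# & puts the job in the background so the prompt returns at once.
nohup sh -c "$job" > "$job_log" 2>&1 &
job_pid=$!
echo "started job with PID $job_pid"

# After logging back in, one would check the job by its PID or read
# the log. Here we simply wait, since the job runs in this same shell.
wait "$job_pid"
result=$(tail -n 1 "$job_log")
echo "$result"
```

Recording the PID and the log location at launch time makes it easy to find the job again in a later session.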
R in conjunction with nohup or screen can be used to orchestrate a large simulation study.
It can be more elegant, transparent, and robust to parallelize jobs corresponding to different scenarios (e.g., different generative models) outside of the code used to do statistical computation.
We consider a simulation study in R but the same approach could be used with code written in Julia, Matlab, Python, etc.
Python in many ways makes a better glue.
runSim.R, which runs a simulation based on the command line argument n.
A set of n values that we want to use in our simulation study.
Option 1: manually call runSim.R for each setting.
Option 2: automate calls using R and nohup. autoSim.R
cat autoSim.R
## # autoSim.R
##
## nVals <- seq(100, 1000, by=100)
## for (n in nVals) {
## oFile <- paste("n", n, ".txt", sep="")
## sysCall <- paste("nohup Rscript runSim.R n=", n, " > ", oFile, sep="")
## system(sysCall)
## print(paste("sysCall=", sysCall, sep=""))
## }
Rscript autoSim.R
## [1] "sysCall=nohup Rscript runSim.R n=100 > n100.txt"
## [1] "sysCall=nohup Rscript runSim.R n=200 > n200.txt"
## [1] "sysCall=nohup Rscript runSim.R n=300 > n300.txt"
## [1] "sysCall=nohup Rscript runSim.R n=400 > n400.txt"
## [1] "sysCall=nohup Rscript runSim.R n=500 > n500.txt"
## [1] "sysCall=nohup Rscript runSim.R n=600 > n600.txt"
## [1] "sysCall=nohup Rscript runSim.R n=700 > n700.txt"
## [1] "sysCall=nohup Rscript runSim.R n=800 > n800.txt"
## [1] "sysCall=nohup Rscript runSim.R n=900 > n900.txt"
## [1] "sysCall=nohup Rscript runSim.R n=1000 > n1000.txt"
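The same orchestration can also be sketched as a plain shell loop. This is a dry run that only prints the commands it would launch; the trailing & and the 2>&1 redirection are our additions and are not part of autoSim.R.

```shell
# Sample sizes 100, 200, ..., 1000 (seq takes first, step, last).
launched=0
for n in $(seq 100 100 1000); do
  cmd="nohup Rscript runSim.R n=$n > n$n.txt 2>&1 &"
  # Dry run: print the command. To launch the jobs for real, replace
  # the echo line with:  eval "$cmd"
  echo "$cmd"
  launched=$((launched + 1))
done
echo "prepared $launched jobs"
```

With the trailing &, the settings would run concurrently, whereas autoSim.R as listed runs them one after another because R's system() waits for each call to finish by default. Concurrent launches make sense only when the machine has enough cores and memory for all of them.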