Linux is the most common platform for scientific computing.
Open source and community support.
Things break; when they break using Linux, it’s easy to fix.
Cost: it’s free!
Debian/Ubuntu is a popular choice for personal computers.
RHEL/CentOS is popular on servers.
The teaching server for this class runs CentOS 7.
MacOS is derived from Unix, not Linux: its Darwin core combines the XNU kernel with BSD components. It is POSIX compliant, so most shell commands we review here apply to the MacOS terminal as well. Windows/DOS, unfortunately, is a totally different breed.
Show distribution/version on Linux:
cat /etc/*-release
## CentOS Linux release 7.6.1810 (Core)
## NAME="CentOS Linux"
## VERSION="7 (Core)"
## ID="centos"
## ID_LIKE="rhel fedora"
## VERSION_ID="7"
## PRETTY_NAME="CentOS Linux 7 (Core)"
## ANSI_COLOR="0;31"
## CPE_NAME="cpe:/o:centos:centos:7"
## HOME_URL="https://www.centos.org/"
## BUG_REPORT_URL="https://bugs.centos.org/"
##
## CENTOS_MANTISBT_PROJECT="CentOS-7"
## CENTOS_MANTISBT_PROJECT_VERSION="7"
## REDHAT_SUPPORT_PRODUCT="centos"
## REDHAT_SUPPORT_PRODUCT_VERSION="7"
##
## CentOS Linux release 7.6.1810 (Core)
## CentOS Linux release 7.6.1810 (Core)
Show distribution/version on MacOS:
sw_vers -productVersion
or
system_profiler SPSoftwareDataType
A shell translates commands to OS instructions.
Most commonly used shells include bash, csh, tcsh, zsh, etc.
Sometimes a script or a command does not run simply because it’s written for another shell.
We mostly use bash shell commands in this class.
Determine your login shell (echo $0 shows the shell that is currently running):
echo $SHELL
## /bin/bash
List available shells:
cat /etc/shells
## /bin/sh
## /bin/bash
## /usr/bin/sh
## /usr/bin/bash
Change to another shell:
exec bash -l
The -l option indicates it should be a login shell.
Change your login shell permanently:
chsh -s /bin/bash userid
Then log out and log in.
Bash provides the following standard completions by default, which saves typing time and avoids typing errors:
Pathname completion.
Filename completion.
Variablename completion: echo $[TAB][TAB].
Username completion: cd ~[TAB][TAB].
Hostname completion: ssh hwachou@[TAB][TAB].
It can also be customized to auto-complete other things, such as command options and arguments. Google bash completion for more information.
cat prints the contents of a file:
cat runSim.R
## ## parsing command arguments
## for (arg in commandArgs(TRUE)) {
## eval(parse(text=arg))
## }
##
## ## check if a given integer is prime
## isPrime = function(n) {
## if (n <= 3) {
## return (TRUE)
## }
## if (any((n %% 2:floor(sqrt(n))) == 0)) {
## return (FALSE)
## }
## return (TRUE)
## }
##
## ## estimate mean only using observation with prime indices
## estMeanPrimes = function (x) {
## n = length(x)
## ind = sapply(1:n, isPrime)
## return (mean(x[ind]))
## }
##
## # simulate data
## x = rnorm(n)
##
## # estimate mean
## estMeanPrimes(x)
head prints the first 10 lines of a file:
head runSim.R
## ## parsing command arguments
## for (arg in commandArgs(TRUE)) {
## eval(parse(text=arg))
## }
##
## ## check if a given integer is prime
## isPrime = function(n) {
## if (n <= 3) {
## return (TRUE)
## }
head -l prints the first \(l\) lines of a file (equivalent to the POSIX form head -n l):
head -15 runSim.R
## ## parsing command arguments
## for (arg in commandArgs(TRUE)) {
## eval(parse(text=arg))
## }
##
## ## check if a given integer is prime
## isPrime = function(n) {
## if (n <= 3) {
## return (TRUE)
## }
## if (any((n %% 2:floor(sqrt(n))) == 0)) {
## return (FALSE)
## }
## return (TRUE)
## }
tail prints the last 10 lines of a file:
tail runSim.R
## n = length(x)
## ind = sapply(1:n, isPrime)
## return (mean(x[ind]))
## }
##
## # simulate data
## x = rnorm(n)
##
## # estimate mean
## estMeanPrimes(x)
tail -l prints the last \(l\) lines of a file (equivalent to the POSIX form tail -n l):
tail -15 runSim.R
## return (TRUE)
## }
##
## ## estimate mean only using observation with prime indices
## estMeanPrimes = function (x) {
## n = length(x)
## ind = sapply(1:n, isPrime)
## return (mean(x[ind]))
## }
##
## # simulate data
## x = rnorm(n)
##
## # estimate mean
## estMeanPrimes(x)
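Chaining head and tail extracts a range of lines from the middle of a file. The following is a minimal sketch, assuming a throwaway 30-line file created with mktemp; it stands in for runSim.R and is not part of the course materials.

```shell
# Create a temporary 30-line file; line k contains the text "line k".
tmpfile=$(mktemp)
seq 1 30 | sed 's/^/line /' > "$tmpfile"

# head -n 15 keeps lines 1-15; tail -n 5 then keeps the last 5 of those,
# i.e., lines 11-15 of the original file.
middle=$(head -n 15 "$tmpfile" | tail -n 5)
echo "$middle"

rm -f "$tmpfile"
```

The same pattern works on any text file: head -n b file | tail -n k prints the k lines ending at line b.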
less is more; more is less
more browses a text file screen by screen (only downwards). Scroll down one page (paging) by pressing the spacebar; exit by pressing the q key.
less is also a pager, but has more functionality, e.g., scrolling upwards and downwards through the input.
less doesn’t need to read the whole file before displaying it, i.e., it starts faster than more on large files.
grep
grep prints lines that match an expression:
Show lines that contain string CentOS:
# quotes not necessary if not a regular expression
grep 'CentOS' linux.Rmd
## - RHEL/CentOS is popular on servers.
## - The teaching server for this class runs CentOS 7.
## - Show lines that contain string `CentOS`:
## grep 'CentOS' linux.Rmd
## grep 'CentOS' *.Rmd
## grep -n 'CentOS' linux.Rmd
## - Replace `CentOS` by `RHEL` in a text file:
## sed 's/CentOS/RHEL/' linux.Rmd | grep RHEL
Search multiple text files:
grep 'CentOS' *.Rmd
## - RHEL/CentOS is popular on servers.
## - The teaching server for this class runs CentOS 7.
## - Show lines that contain string `CentOS`:
## grep 'CentOS' linux.Rmd
## grep 'CentOS' *.Rmd
## grep -n 'CentOS' linux.Rmd
## - Replace `CentOS` by `RHEL` in a text file:
## sed 's/CentOS/RHEL/' linux.Rmd | grep RHEL
Show matching line numbers:
grep -n 'CentOS' linux.Rmd
## 31:- RHEL/CentOS is popular on servers.
## 33:- The teaching server for this class runs CentOS 7.
## 294:- Show lines that contain string `CentOS`:
## 297: grep 'CentOS' linux.Rmd
## 302: grep 'CentOS' *.Rmd
## 307: grep -n 'CentOS' linux.Rmd
## 324:- Replace `CentOS` by `RHEL` in a text file:
## 326: sed 's/CentOS/RHEL/' linux.Rmd | grep RHEL
Find all files in current directory with .png extension (the backslash makes the dot literal; an unescaped . matches any character):
ls | grep '\.png$'
## key_authentication_1.png
## key_authentication_2.png
## linux_directory_structure.png
## linux_filepermission_oct.png
## linux_filepermission.png
## Richard_Stallman_2013.png
## screenshot_top.png
Find all directories in the current directory:
ls -al | grep '^d'
## drwxrwxr-x. 2 hwachou hwachou 4096 Jan 15 17:50 .
## drwxrwxr-x. 6 hwachou hwachou 102 Jan 15 04:02 ..
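A few more grep options are worth knowing: -c counts matching lines, -i ignores case, -v inverts the match, and -E enables extended regular expressions. The sketch below runs them on a hypothetical four-line sample file (made up for illustration, standing in for linux.Rmd).

```shell
# Hypothetical sample file standing in for linux.Rmd.
tmpfile=$(mktemp)
printf '%s\n' \
  'RHEL/CentOS is popular on servers.' \
  'The teaching server runs CentOS 7.' \
  'Debian/Ubuntu is popular on personal computers.' \
  'centos in lower case.' > "$tmpfile"

n_match=$(grep -c 'CentOS' "$tmpfile")           # count lines containing CentOS
n_nocase=$(grep -ci 'centos' "$tmpfile")         # -i ignores case
n_other=$(grep -cv 'CentOS' "$tmpfile")          # -v keeps lines that do NOT match
n_either=$(grep -cE 'CentOS|Ubuntu' "$tmpfile")  # -E allows alternation with |

echo "$n_match $n_nocase $n_other $n_either"
rm -f "$tmpfile"
```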
sed
sed is a stream editor.
Replace CentOS by RHEL in a text file:
sed 's/CentOS/RHEL/' linux.Rmd | grep RHEL
## - RHEL/RHEL is popular on servers.
## - The teaching server for this class runs RHEL 7.
## - Show lines that contain string `RHEL`:
## grep 'RHEL' linux.Rmd
## grep 'RHEL' *.Rmd
## grep -n 'RHEL' linux.Rmd
## - Replace `RHEL` by `RHEL` in a text file:
## sed 's/RHEL/RHEL/' linux.Rmd | grep RHEL
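Note that s/old/new/ replaces only the first match on each line; the g flag replaces every match. A small sketch on a made-up sentence (the text in line is an assumption for illustration, not taken from linux.Rmd):

```shell
# A line with two occurrences of CentOS.
line='RHEL/CentOS is popular; CentOS 7 runs on the teaching server.'

# Without a flag, only the first occurrence on the line is replaced.
first_only=$(echo "$line" | sed 's/CentOS/RHEL/')

# With the g (global) flag, every occurrence on the line is replaced.
all=$(echo "$line" | sed 's/CentOS/RHEL/g')

echo "$first_only"
echo "$all"
```

In both cases sed writes the edited text to standard output and leaves its input unchanged.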
awk
awk is a filter and report writer.
First let’s display first lines of the file /etc/passwd:
head /etc/passwd
## root:x:0:0:root:/root:/bin/bash
## bin:x:1:1:bin:/bin:/sbin/nologin
## daemon:x:2:2:daemon:/sbin:/sbin/nologin
## adm:x:3:4:adm:/var/adm:/sbin/nologin
## lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
## sync:x:5:0:sync:/sbin:/bin/sync
## shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
## halt:x:7:0:halt:/sbin:/sbin/halt
## mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
## operator:x:11:0:operator:/root:/sbin/nologin
Each line contains fields (1) user name, (2) password, (3) user ID, (4) group ID, (5) user ID info, (6) home directory, and (7) command shell, separated by :.
Print sorted list of login names:
awk -F: '{ print $1 }' /etc/passwd | sort | head -10
## adm
## anorthrup
## bhsiao
## bin
## biona001
## brendon.chau
## bryanmkevan
## chrony
## daemon
## dbus
Print number of lines in a file (NR stands for Number of Records, the count of input records read so far):
awk 'END { print NR }' /etc/passwd
## 69
or
wc -l /etc/passwd
## 69 /etc/passwd
or (not displaying file name)
wc -l < /etc/passwd
## 69
Print the /etc/passwd entries of users with UID in range 1000-1035:
awk -F: '{if ($3 >= 1000 && $3 <= 1035) print}' /etc/passwd
## huazhou:x:1000:1001::/home/huazhou:/bin/bash
## hwachou:x:1001:1003::/home/hwachou:/bin/bash
## juhkim111:x:1002:1004::/home/juhkim111:/bin/bash
## huijun.an:x:1003:1005::/home/huijun.an:/bin/bash
## edenaxe:x:1004:1006::/home/edenaxe:/bin/bash
## seancampeau:x:1005:1007::/home/seancampeau:/bin/bash
## brendon.chau:x:1006:1008::/home/brendon.chau:/bin/bash
## elviscuihan:x:1007:1009::/home/elviscuihan:/bin/bash
## qedo:x:1008:1010::/home/qedo:/bin/bash
## fangyw1995:x:1009:1011::/home/fangyw1995:/bin/bash
## suneeta.godbole:x:1010:1012::/home/suneeta.godbole:/bin/bash
## yunh:x:1011:1013::/home/yunh:/bin/bash
## bhsiao:x:1012:1014::/home/bhsiao:/bin/bash
## hujuehao:x:1013:1015::/home/hujuehao:/bin/bash
## lucymjimenez:x:1014:1016::/home/lucymjimenez:/bin/bash
## yoonjun05:x:1015:1017::/home/yoonjun05:/bin/bash
## shelleyjung:x:1016:1018::/home/shelleyjung:/bin/bash
## bryanmkevan:x:1017:1019::/home/bryanmkevan:/bin/bash
## sdkwok2:x:1018:1020::/home/sdkwok2:/bin/bash
## dereklee20:x:1019:1021::/home/dereklee20:/bin/bash
## liuweijie12345678:x:1020:1022::/home/liuweijie12345678:/bin/bash
## peterljw:x:1021:1023::/home/peterljw:/bin/bash
## kristenmae:x:1022:1024::/home/kristenmae:/bin/bash
## menrge666:x:1023:1025::/home/menrge666:/bin/bash
## anorthrup:x:1024:1026::/home/anorthrup:/bin/bash
## ethanpark26:x:1025:1027::/home/ethanpark26:/bin/bash
## shryu94:x:1026:1028::/home/shryu94:/bin/bash
## elliewky:x:1027:1029::/home/elliewky:/bin/bash
## sijiawang0729:x:1028:1030::/home/sijiawang0729:/bin/bash
## xiayu960112:x:1029:1031::/home/xiayu960112:/bin/bash
## haowenxu930622:x:1030:1032::/home/haowenxu930622:/bin/bash
## zfy917:x:1032:1034::/home/zfy917:/bin/bash
## suedez:x:1033:1035::/home/suedez:/bin/bash
## wenyan1996:x:1034:1036::/home/wenyan1996:/bin/bash
## dmorrison01:x:1035:1037::/home/dmorrison01:/bin/bash
Print login names and login shells in comma-separated format:
awk -F: '{OFS = ","} {print $1, $7}' /etc/passwd
## root,/bin/bash
## bin,/sbin/nologin
## daemon,/sbin/nologin
## adm,/sbin/nologin
## lp,/sbin/nologin
## sync,/bin/sync
## shutdown,/sbin/shutdown
## halt,/sbin/halt
## mail,/sbin/nologin
## operator,/sbin/nologin
## games,/sbin/nologin
## ftp,/sbin/nologin
## nobody,/sbin/nologin
## systemd-network,/sbin/nologin
## dbus,/sbin/nologin
## polkitd,/sbin/nologin
## ntp,/sbin/nologin
## sshd,/sbin/nologin
## postfix,/sbin/nologin
## chrony,/sbin/nologin
## huazhou,/bin/bash
## mongodb,/sbin/nologin
## tss,/sbin/nologin
## rstudio-server,/bin/bash
## shiny,/bin/sh
## saslauth,/sbin/nologin
## hwachou,/bin/bash
## juhkim111,/bin/bash
## huijun.an,/bin/bash
## edenaxe,/bin/bash
## seancampeau,/bin/bash
## brendon.chau,/bin/bash
## elviscuihan,/bin/bash
## qedo,/bin/bash
## fangyw1995,/bin/bash
## suneeta.godbole,/bin/bash
## yunh,/bin/bash
## bhsiao,/bin/bash
## hujuehao,/bin/bash
## lucymjimenez,/bin/bash
## yoonjun05,/bin/bash
## shelleyjung,/bin/bash
## bryanmkevan,/bin/bash
## sdkwok2,/bin/bash
## dereklee20,/bin/bash
## liuweijie12345678,/bin/bash
## peterljw,/bin/bash
## kristenmae,/bin/bash
## menrge666,/bin/bash
## anorthrup,/bin/bash
## ethanpark26,/bin/bash
## shryu94,/bin/bash
## elliewky,/bin/bash
## sijiawang0729,/bin/bash
## xiayu960112,/bin/bash
## haowenxu930622,/bin/bash
## zfy917,/bin/bash
## suedez,/bin/bash
## wenyan1996,/bin/bash
## dmorrison01,/bin/bash
## biona001,/bin/bash
## gaoxueyao,/bin/bash
## jian.he,/bin/bash
## zyshi,/bin/bash
## wudiyangabc,/bin/bash
## kaversoniano,/bin/bash
## edwardmjyu,/bin/bash
## lizhang1122,/bin/bash
## ryhwang,/bin/bash
Print login names and indicate those with UID >= 1000 as vip:
awk -F: -v status="" '{OFS = ","}
{if ($3 >= 1000) status="vip"; else status="regular"}
{print $1, status}' /etc/passwd
## root,regular
## bin,regular
## daemon,regular
## adm,regular
## lp,regular
## sync,regular
## shutdown,regular
## halt,regular
## mail,regular
## operator,regular
## games,regular
## ftp,regular
## nobody,regular
## systemd-network,regular
## dbus,regular
## polkitd,regular
## ntp,regular
## sshd,regular
## postfix,regular
## chrony,regular
## huazhou,vip
## mongodb,regular
## tss,regular
## rstudio-server,regular
## shiny,regular
## saslauth,regular
## hwachou,vip
## juhkim111,vip
## huijun.an,vip
## edenaxe,vip
## seancampeau,vip
## brendon.chau,vip
## elviscuihan,vip
## qedo,vip
## fangyw1995,vip
## suneeta.godbole,vip
## yunh,vip
## bhsiao,vip
## hujuehao,vip
## lucymjimenez,vip
## yoonjun05,vip
## shelleyjung,vip
## bryanmkevan,vip
## sdkwok2,vip
## dereklee20,vip
## liuweijie12345678,vip
## peterljw,vip
## kristenmae,vip
## menrge666,vip
## anorthrup,vip
## ethanpark26,vip
## shryu94,vip
## elliewky,vip
## sijiawang0729,vip
## xiayu960112,vip
## haowenxu930622,vip
## zfy917,vip
## suedez,vip
## wenyan1996,vip
## dmorrison01,vip
## biona001,vip
## gaoxueyao,vip
## jian.he,vip
## zyshi,vip
## wudiyangabc,vip
## kaversoniano,vip
## edwardmjyu,vip
## lizhang1122,vip
## ryhwang,vip
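awk can also aggregate. The sketch below counts users per login shell with an associative array, and sets OFS in a BEGIN block so that it is assigned once before any input is read. It runs on a hypothetical five-line sample in /etc/passwd format (the users alice and bob are made up), so the result does not depend on the machine.

```shell
# Hypothetical sample in /etc/passwd format (alice and bob are made-up users).
passwd_sample=$(mktemp)
cat > "$passwd_sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
alice:x:1000:1001::/home/alice:/bin/bash
bob:x:1001:1002::/home/bob:/bin/bash
EOF

# BEGIN runs once before the first line; count[] is keyed by field 7 (the shell).
# END runs after the last line. The for-in order is unspecified, so sort the output.
shell_counts=$(awk -F: 'BEGIN { OFS = "," }
  { count[$7]++ }
  END { for (s in count) print s, count[s] }' "$passwd_sample" | sort)

echo "$shell_counts"
rm -f "$passwd_sample"
```

Replacing "$passwd_sample" with /etc/passwd gives the same summary for a real system.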
| sends output from one command as input of another command.
> directs output from one command to a file.
>> appends output from one command to a file.
< reads input from a file.
Combinations of shell commands (grep, sed, awk, …), piping and redirection, and regular expressions allow us to pre-process and reformat huge text files efficiently.
See HW1.
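The four operators can be seen together in a short sketch. It works inside a temporary directory created with mktemp -d, and the file name log.txt is an arbitrary choice for illustration.

```shell
workdir=$(mktemp -d)
log="$workdir/log.txt"

echo "first line" > "$log"      # > creates the file, or overwrites it if it exists
echo "second line" >> "$log"    # >> appends to the file
echo "third line" >> "$log"

# < feeds the file to wc on standard input, so no file name is printed.
# tr strips the padding that some wc implementations add.
n_lines=$(wc -l < "$log" | tr -d ' ')

# | connects commands: keep lines containing "line", then take the last one.
last=$(grep 'line' "$log" | tail -n 1)

echo "$n_lines lines; last: $last"
rm -rf "$workdir"
```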
Emacs is a powerful text editor with extensive support for many languages including R, \(\LaTeX\), python, and C/C++; however it’s not installed by default on many Linux distributions.
emacs filename to open a file with emacs.
CTRL-x CTRL-f to open an existing or new file.
CTRL-x CTRL-s to save.
CTRL-x CTRL-w to save as.
CTRL-x CTRL-c to quit.
Google emacs cheatsheet
C-<key> means hold the control key, and press <key>.
M-<key> means press the Esc key once, and press <key>.
Vi is ubiquitous (POSIX standard). Learn at least its basics; otherwise you can edit nothing on some clusters.
vi filename to start editing a file.
vi is a modal editor with an insert mode and a normal mode. Pressing i switches from normal mode to insert mode. Pressing ESC switches from insert mode to normal mode.
:x<Return> quits vi and saves changes.
:q!<Return> quits vi without saving latest changes.
:w<Return> saves changes.
:wq<Return> quits vi and saves changes.
Google vi cheatsheet
Statisticians write a lot of code. It is critical to adopt a good IDE that goes beyond code editing: syntax highlighting, executing code within the editor, debugging, profiling, version control, etc.
RStudio, Eclipse, Emacs, Matlab, Visual Studio, etc.
The OS runs processes on behalf of users.
Each process has a process ID (PID), user ID (UID), parent process ID (PPID), the time and date the process started (STIME), cumulative CPU time used (TIME), etc.
ps
## PID TTY TIME CMD
## 18125 ? 00:00:07 rsession
## 18301 ? 00:00:00 sshd
## 20674 ? 00:00:00 R
## 20763 ? 00:00:00 sh
## 20764 ? 00:00:00 ps
All currently running processes:
ps -eaf
## UID PID PPID C STIME TTY TIME CMD
## root 1 0 0 Jan11 ? 00:00:10 /usr/lib/systemd/systemd --system --deserialize 15
## root 2 0 0 Jan11 ? 00:00:00 [kthreadd]
## root 3 2 0 Jan11 ? 00:00:00 [ksoftirqd/0]
## root 5 2 0 Jan11 ? 00:00:00 [kworker/0:0H]
## root 6 2 0 Jan11 ? 00:00:01 [kworker/u8:0]
## root 7 2 0 Jan11 ? 00:00:00 [migration/0]
## root 8 2 0 Jan11 ? 00:00:00 [rcu_bh]
## root 9 2 0 Jan11 ? 00:00:21 [rcu_sched]
## root 10 2 0 Jan11 ? 00:00:00 [lru-add-drain]
## root 11 2 0 Jan11 ? 00:00:02 [watchdog/0]
## root 12 2 0 Jan11 ? 00:00:02 [watchdog/1]
## root 13 2 0 Jan11 ? 00:00:00 [migration/1]
## root 14 2 0 Jan11 ? 00:00:00 [ksoftirqd/1]
## root 16 2 0 Jan11 ? 00:00:00 [kworker/1:0H]
## root 17 2 0 Jan11 ? 00:00:02 [watchdog/2]
## root 18 2 0 Jan11 ? 00:00:01 [migration/2]
## root 19 2 0 Jan11 ? 00:00:00 [ksoftirqd/2]
## root 21 2 0 Jan11 ? 00:00:00 [kworker/2:0H]
## root 22 2 0 Jan11 ? 00:00:01 [watchdog/3]
## root 23 2 0 Jan11 ? 00:00:01 [migration/3]
## root 24 2 0 Jan11 ? 00:00:00 [ksoftirqd/3]
## root 26 2 0 Jan11 ? 00:00:00 [kworker/3:0H]
## root 28 2 0 Jan11 ? 00:00:00 [kdevtmpfs]
## root 29 2 0 Jan11 ? 00:00:00 [netns]
## root 30 2 0 Jan11 ? 00:00:00 [khungtaskd]
## root 31 2 0 Jan11 ? 00:00:00 [writeback]
## root 32 2 0 Jan11 ? 00:00:00 [kintegrityd]
## root 33 2 0 Jan11 ? 00:00:00 [bioset]
## root 34 2 0 Jan11 ? 00:00:00 [bioset]
## root 35 2 0 Jan11 ? 00:00:00 [bioset]
## root 36 2 0 Jan11 ? 00:00:00 [kblockd]
## root 37 2 0 Jan11 ? 00:00:00 [md]
## root 38 2 0 Jan11 ? 00:00:00 [edac-poller]
## root 39 2 0 Jan11 ? 00:00:00 [watchdogd]
## root 46 2 0 Jan11 ? 00:00:00 [kswapd0]
## root 47 2 0 Jan11 ? 00:00:00 [ksmd]
## root 48 2 0 Jan11 ? 00:00:02 [khugepaged]
## root 49 2 0 Jan11 ? 00:00:00 [crypto]
## root 57 2 0 Jan11 ? 00:00:00 [kthrotld]
## root 59 2 0 Jan11 ? 00:00:00 [kmpath_rdacd]
## root 60 2 0 Jan11 ? 00:00:00 [kaluad]
## root 61 2 0 Jan11 ? 00:00:00 [kpsmoused]
## root 63 2 0 Jan11 ? 00:00:00 [ipv6_addrconf]
## root 64 2 0 Jan11 ? 00:00:07 [kworker/1:1]
## root 77 2 0 Jan11 ? 00:00:00 [deferwq]
## root 111 2 0 Jan11 ? 00:00:14 [kauditd]
## root 1445 2 0 Jan11 ? 00:00:00 [virtscsi-scan]
## root 1447 2 0 Jan11 ? 00:00:00 [scsi_eh_0]
## root 1448 2 0 Jan11 ? 00:00:00 [scsi_tmf_0]
## root 1455 2 0 Jan11 ? 00:00:00 [kworker/u8:2]
## root 1515 2 0 Jan11 ? 00:00:00 [bioset]
## root 1517 2 0 Jan11 ? 00:00:00 [xfsalloc]
## root 1519 2 0 Jan11 ? 00:00:00 [xfs_mru_cache]
## root 1525 2 0 Jan11 ? 00:00:00 [xfs-buf/sda1]
## root 1528 2 0 Jan11 ? 00:00:00 [xfs-data/sda1]
## root 1529 2 0 Jan11 ? 00:00:00 [xfs-conv/sda1]
## root 1535 2 0 Jan11 ? 00:00:00 [xfs-cil/sda1]
## root 1536 2 0 Jan11 ? 00:00:00 [xfs-reclaim/sda]
## root 1537 2 0 Jan11 ? 00:00:00 [xfs-log/sda1]
## root 1538 2 0 Jan11 ? 00:00:00 [xfs-eofblocks/s]
## root 1539 2 0 Jan11 ? 00:01:37 [xfsaild/sda1]
## root 1540 2 0 Jan11 ? 00:00:01 [kworker/0:1H]
## root 1582 2 0 Jan11 ? 00:00:00 [kworker/1:1H]
## root 1593 1 0 Jan11 ? 00:01:18 /usr/lib/systemd/systemd-journald
## root 2014 1 0 Jan11 ? 00:00:21 /sbin/auditd
## root 2533 2 0 Jan11 ? 00:00:00 [nfit]
## dbus 2603 1 0 Jan11 ? 00:00:01 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
## root 3095 1 0 Jan11 ? 00:00:01 /usr/lib/systemd/systemd-logind
## root 3137 1 0 Jan11 ? 00:00:00 /opt/shiny-server/ext/node/bin/shiny-server /opt/shiny-server/lib/main.js
## polkitd 3141 1 0 Jan11 ? 00:00:00 /usr/lib/polkit-1/polkitd --no-debug
## root 3151 1 0 Jan11 ? 00:00:00 /usr/sbin/acpid
## root 3155 1 0 Jan11 ? 00:00:01 /usr/sbin/crond -n
## root 3189 1 0 Jan11 tty1 00:00:00 /sbin/agetty --noclear tty1 linux
## root 3199 1 0 Jan11 ttyS0 00:00:00 /sbin/agetty --keep-baud 115200 38400 9600 ttyS0 vt220
## chrony 3203 1 0 Jan11 ? 00:00:00 /usr/sbin/chronyd
## rstudio+ 3261 1 0 Jan11 ? 00:02:06 /usr/lib/rstudio-server/bin/rserver
## root 3269 1 0 Jan11 ? 00:00:00 /usr/bin/python -Es /usr/sbin/firewalld --nofork --nopid
## root 3283 1 0 Jan11 ? 00:00:07 /usr/sbin/NetworkManager --no-daemon
## root 3593 3283 0 Jan11 ? 00:00:00 /sbin/dhclient -d -q -sf /usr/libexec/nm-dhcp-helper -pf /var/run/dhclient-eth0.pid -lf /var/lib/NetworkManager/dhclient-41b5db38-b54b-449d-8471-1311ba0d5b71-eth0.lease -cf /var/lib/NetworkManager/dhclient-eth0.conf eth0
## root 3840 1 0 Jan11 ? 00:00:00 /usr/sbin/cupsd -f
## root 3842 1 0 Jan11 ? 00:00:46 /usr/bin/python2 -Es /usr/sbin/tuned -l -P
## root 3843 1 0 Jan11 ? 00:01:05 /usr/sbin/rsyslogd -n
## root 4120 1 0 Jan11 ? 00:00:12 /usr/sbin/sshd -D
## root 4122 1 0 Jan11 ? 00:00:19 /usr/bin/python /usr/bin/google_network_daemon
## root 4126 1 0 Jan11 ? 00:00:09 /usr/bin/python /usr/bin/google_clock_skew_daemon
## root 4128 1 0 Jan11 ? 00:00:31 /usr/bin/python /usr/bin/google_accounts_daemon
## root 4131 1 0 Jan11 ? 00:00:02 /usr/libexec/postfix/master -w
## postfix 4142 4131 0 Jan11 ? 00:00:00 qmgr -l -t unix -u
## root 4168 2 0 Jan11 ? 00:00:00 [kworker/3:1H]
## root 4174 2 0 Jan11 ? 00:00:00 [kworker/2:1H]
## root 5763 2 0 Jan12 ? 00:00:02 [kworker/3:0]
## root 11279 1 0 07:37 ? 00:00:00 /usr/lib/systemd/systemd-udevd
## root 15035 2 0 12:04 ? 00:00:00 [kworker/3:1]
## root 15673 2 0 13:10 ? 00:00:00 [kworker/0:1]
## root 18070 2 0 17:15 ? 00:00:00 [kworker/1:2]
## hwachou 18125 3261 0 17:21 ? 00:00:07 /usr/lib/rstudio-server/bin/rsession -u hwachou --launcher-token D66448C1
## root 18283 2 0 17:31 ? 00:00:00 [kworker/2:2]
## root 18296 4120 0 17:33 ? 00:00:00 sshd: hwachou [priv]
## hwachou 18301 18296 0 17:33 ? 00:00:00 sshd: hwachou@pts/0
## hwachou 18302 18301 0 17:33 pts/0 00:00:00 -bash
## postfix 19009 4131 0 17:40 ? 00:00:00 pickup -l -t unix -u
## root 19304 2 0 17:42 ? 00:00:00 [kworker/2:0]
## hwachou 19463 18302 0 17:42 pts/0 00:00:00 top
## root 20151 2 0 17:47 ? 00:00:00 [kworker/2:1]
## hwachou 20674 18125 45 17:51 ? 00:00:00 /usr/lib64/R/bin/exec/R --slave --no-save --no-restore -e rmarkdown::render('/home/hwachou/Hua-Zhou.github.io/teaching/biostatm280-2019winter/slides/02-linux/linux.Rmd',~+~~+~encoding~+~=~+~'UTF-8');
## root 20688 2 0 Jan11 ? 00:00:04 [kworker/0:0]
## hwachou 20765 20674 0 17:51 ? 00:00:00 sh -c 'bash' -c 'ps -eaf' 2>&1
## hwachou 20766 20765 0 17:51 ? 00:00:00 ps -eaf
All Python processes:
ps -eaf | grep python
## root 3269 1 0 Jan11 ? 00:00:00 /usr/bin/python -Es /usr/sbin/firewalld --nofork --nopid
## root 3842 1 0 Jan11 ? 00:00:46 /usr/bin/python2 -Es /usr/sbin/tuned -l -P
## root 4122 1 0 Jan11 ? 00:00:19 /usr/bin/python /usr/bin/google_network_daemon
## root 4126 1 0 Jan11 ? 00:00:09 /usr/bin/python /usr/bin/google_clock_skew_daemon
## root 4128 1 0 Jan11 ? 00:00:31 /usr/bin/python /usr/bin/google_accounts_daemon
## hwachou 20767 20674 0 17:51 ? 00:00:00 sh -c 'bash' -c 'ps -eaf | grep python' 2>&1
## hwachou 20768 20767 0 17:51 ? 00:00:00 bash -c ps -eaf | grep python
## hwachou 20770 20768 0 17:51 ? 00:00:00 grep python
Process with PID=1:
ps -fp 1
## UID PID PPID C STIME TTY TIME CMD
## root 1 0 0 Jan11 ? 00:00:10 /usr/lib/systemd/systemd --system --deserialize 15
All processes owned by a user:
ps -fu hwachou
## UID PID PPID C STIME TTY TIME CMD
## hwachou 18125 3261 0 17:21 ? 00:00:07 /usr/lib/rstudio-server/bin/rsession -u hwachou --launcher-token D66448C1
## hwachou 18301 18296 0 17:33 ? 00:00:00 sshd: hwachou@pts/0
## hwachou 18302 18301 0 17:33 pts/0 00:00:00 -bash
## hwachou 19463 18302 0 17:42 pts/0 00:00:00 top
## hwachou 20674 18125 46 17:51 ? 00:00:00 /usr/lib64/R/bin/exec/R --slave --no-save --no-restore -e rmarkdown::render('/home/hwachou/Hua-Zhou.github.io/teaching/biostatm280-2019winter/slides/02-linux/linux.Rmd',~+~~+~encoding~+~=~+~'UTF-8');
## hwachou 20773 20674 0 17:51 ? 00:00:00 sh -c 'bash' -c 'ps -fu hwachou' 2>&1
## hwachou 20774 20773 0 17:51 ? 00:00:00 ps -fu hwachou
Kill process with PID=1001:
kill 1001
Kill all R processes.
killall -r R
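To see what kill does without touching a real job, the sketch below starts a harmless sleep process in the background (a stand-in for a long-running computation), terminates it, and confirms that it is gone.

```shell
sleep 300 &           # stand-in for a long-running job
pid=$!                # $! holds the PID of the most recent background job

kill "$pid"           # sends SIGTERM (signal 15), the default signal

# wait returns 128 + signal number for a signalled child: 128 + 15 = 143.
status=0
wait "$pid" 2>/dev/null || status=$?

# kill -0 sends no signal; it only tests whether the process still exists.
if kill -0 "$pid" 2>/dev/null; then
  state="running"
else
  state="terminated"
fi
echo "PID $pid: $state (wait status $status)"
```

If a process ignores SIGTERM, kill -9 sends SIGKILL, which cannot be caught; use it as a last resort because the process gets no chance to clean up.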
top
top prints realtime process information (very useful).
top
SSH (secure shell) is the dominant cryptographic network protocol for secure network connection via an insecure network.
On Linux or Mac, access the teaching server by
ssh username@server.biostat-m280.info
Windows machines need the PuTTY program (free).
Key authentication is more secure than password. Most passwords are weak.
A script or program may need to SSH into other machines systematically.
Log into multiple machines using the same key.
Seamless use of many services: Git, AWS or Google cloud service, parallel computing on multiple hosts, etc.
Many servers only allow key authentication and do not accept password authentication.
Public key. Put on the machine(s) you want to log in.
Private key. Put on your own computer. Consider this as the actual key in your pocket; never give to others.
Messages from the server to your computer are encrypted with your public key. They can only be decrypted using your private key.
Messages from your computer to the server are signed with your private key (digital signatures) and can be verified by anyone who has your public key (authentication).
On Linux or Mac, to generate a key pair:
ssh-keygen -t rsa -f ~/.ssh/[KEY_FILENAME] -C [USERNAME]
[KEY_FILENAME] is the name that you want to use for your SSH key files. For example, a filename of id_rsa generates a private key file named id_rsa and a public key file named id_rsa.pub.
[USERNAME] is the user for whom you will apply this SSH key.
Use an (optional) passphrase different from your password.
Set correct permissions on the .ssh folder and key files:
chmod 400 ~/.ssh/[KEY_FILENAME]
Append the public key to the ~/.ssh/authorized_keys file of any Linux machine we want to SSH to, e.g.,
ssh-copy-id -i ~/.ssh/[KEY_FILENAME] [USERNAME]@server.biostat-m280.info
Test your new key.
ssh -i ~/.ssh/[KEY_FILENAME] [USERNAME]@server.biostat-m280.info
Now you don’t need a password each time you connect from your machine to the teaching server.
If you set a passphrase when generating keys, you’ll be prompted for the passphrase each time the private key is used. Avoid repeatedly entering the passphrase by using ssh-agent on Linux/Mac or Pageant on Windows.
The same key pair can be used between any two machines. We don’t need to regenerate keys for each new connection.
For Windows users, the private key generated by ssh-keygen cannot be directly used by PuTTY; use PuTTYgen for conversion. Then let PuTTY use the converted private key. Read tutorial.
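The permission step in the key setup matters because OpenSSH refuses to use a private key file that other users can read. The sketch below shows one reasonable set of permissions, applied to a throwaway directory that stands in for ~/.ssh so that no real keys are touched; the file names id_demo and id_demo.pub are hypothetical.

```shell
# Throwaway directory standing in for ~/.ssh, so real keys are not touched.
demo_ssh=$(mktemp -d)
touch "$demo_ssh/id_demo" "$demo_ssh/id_demo.pub"   # hypothetical empty key files

chmod 700 "$demo_ssh"               # directory: only the owner may read, write, enter
chmod 400 "$demo_ssh/id_demo"       # private key: owner read-only
chmod 644 "$demo_ssh/id_demo.pub"   # public key: readable by everyone is fine

# The first 10 characters of ls -l output are the file type and permission bits.
dir_perm=$(ls -ld "$demo_ssh" | cut -c1-10)
key_perm=$(ls -l "$demo_ssh/id_demo" | cut -c1-10)
echo "$dir_perm $key_perm"

rm -rf "$demo_ssh"
```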
scp securely transfers files between machines using SSH.
## copy file from local to remote
scp [LOCALFILE] [USERNAME]@server.biostat-m280.info:/[PATHTOFOLDER]
## copy file from remote to local
scp [USERNAME]@server.biostat-m280.info:/[PATHTOFILE] [PATHTOLOCALFOLDER]
sftp is an FTP-like file transfer program that runs over SSH.
GUIs for Windows (WinSCP) or Mac (Cyberduck).
(My preferred way) Use a version control system to sync project files between different machines and systems.
Windows uses a pair of CR and LF for line breaks.
Linux/Unix uses an LF character only.
MacOS X also uses a single LF character. But old Mac OS used a single CR character for line breaks.
If transferred in binary mode (bit by bit) between OSs, a text file could look like a mess.
Most transfer programs automatically switch to text mode when transferring text files and perform conversion of line breaks between different OSs; but I used to run into problems using WinSCP. Sometimes you have to tell WinSCP explicitly a text file is being transferred.
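When a file does arrive with Windows line breaks, the carriage returns can be stripped on the Linux side. A minimal sketch using tr, on a two-line file created for the purpose (the file names windows.txt and unix.txt are arbitrary):

```shell
workdir=$(mktemp -d)

# Two lines, each ending in CR LF as on Windows.
printf 'a,b\r\n1,2\r\n' > "$workdir/windows.txt"

# tr -d '\r' deletes every carriage return, leaving Unix-style LF line breaks.
tr -d '\r' < "$workdir/windows.txt" > "$workdir/unix.txt"

# Each line loses one byte (the CR), so the file shrinks by two bytes.
bytes_before=$(wc -c < "$workdir/windows.txt" | tr -d ' ')
bytes_after=$(wc -c < "$workdir/unix.txt" | tr -d ' ')
echo "$bytes_before bytes before, $bytes_after bytes after"

rm -rf "$workdir"
```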
Start R in the interactive mode by typing R in shell.
Then run R script by
source("script.R")
Demo script meanEst.R implements a (terrible) estimator of the mean that averages only the observations with prime indices \[
{\widehat \mu}_n = \frac{\sum_{i=1}^n x_i 1_{i \text{ is prime}}}{\sum_{i=1}^n 1_{i \text{ is prime}}}.
\]
## ## check if a given integer is prime
## isPrime = function(n) {
## if (n <= 3) {
## return (TRUE)
## }
## if (any((n %% 2:floor(sqrt(n))) == 0)) {
## return (FALSE)
## }
## return (TRUE)
## }
##
## ## estimate mean only using observation with prime indices
## estMeanPrimes = function (x) {
## n = length(x)
## ind = sapply(1:n, isPrime)
## return (mean(x[ind]))
## }
##
## print(estMeanPrimes(rnorm(100000)))
To run your R code non-interactively aka in batch mode, we have at least two options:
# default output to meanEst.Rout
R CMD BATCH meanEst.R
or
# output to stdout
Rscript meanEst.R
Batch calls are typically automated using a scripting language, e.g., Python, Perl, or shell script.
Specify arguments in R CMD BATCH:
R CMD BATCH '--args mu=1 sig=2 kap=3' script.R
Specify arguments in Rscript:
Rscript script.R mu=1 sig=2 kap=3
Parse command line arguments using magic formula
for (arg in commandArgs(TRUE)) {
eval(parse(text=arg))
}
in R script. After calling the above code, all command line arguments will be available in the global namespace.
To understand the magic formula commandArgs, run R by:
R '--args mu=1 sig=2 kap=3'
and then issue commands in R
commandArgs()
commandArgs(TRUE)
Understand the magic formula parse and eval:
rm(list=ls())
print(x)
## Error in print(x): object 'x' not found
parse(text="x=3")
## expression(x = 3)
eval(parse(text="x=3"))
print(x)
## [1] 3
runSim.R has components: (1) method implementation, (2) data generator with unspecified parameter n, (3) estimation based on generated data, and (4) command argument parser.
## ## parsing command arguments
## for (arg in commandArgs(TRUE)) {
## eval(parse(text=arg))
## }
##
## ## check if a given integer is prime
## isPrime = function(n) {
## if (n <= 3) {
## return (TRUE)
## }
## if (any((n %% 2:floor(sqrt(n))) == 0)) {
## return (FALSE)
## }
## return (TRUE)
## }
##
## ## estimate mean only using observation with prime indices
## estMeanPrimes = function (x) {
## n = length(x)
## ind = sapply(1:n, isPrime)
## return (mean(x[ind]))
## }
##
## # simulate data
## x = rnorm(n)
##
## # estimate mean
## estMeanPrimes(x)
Call runSim.R with sample size n=100:
R CMD BATCH '--args n=100' runSim.R
or
Rscript runSim.R n=100
## [1] 0.1553339
Many statistical computing tasks take long: simulation, MCMC, etc.
nohup command in Linux runs program(s) immune to hangups and writes output to nohup.out by default. Logging out will not kill the process; we can log in later to check status and results.
nohup is POSIX standard thus available on Linux and MacOS.
Run runSim.R in the background and write output to nohup.out:
nohup Rscript runSim.R n=100 &
## [1] -0.2407291
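In practice it helps to name the log file explicitly and to record the job's PID. The sketch below does both, using a short sh -c command as a stand-in for the R job so that it runs anywhere; the log name job.log is an arbitrary choice.

```shell
workdir=$(mktemp -d)

# Stand-in job: sleeps briefly, then prints a result. In real use, replace the
# sh -c '...' part with a command such as: Rscript runSim.R n=100
# stdout and stderr both go to job.log; stdin is detached from the terminal.
nohup sh -c 'sleep 1; echo "job finished"' > "$workdir/job.log" 2>&1 < /dev/null &
job_pid=$!

echo "started background job with PID $job_pid"

# Here we wait for the job; in real use, log out and check the log later.
wait "$job_pid"
result=$(tail -n 1 "$workdir/job.log")
echo "$result"

rm -rf "$workdir"
```

Because the output is redirected explicitly, nohup does not create nohup.out in this case.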
screen is another popular utility, but not installed by default.
Typical workflow using screen:
Access remote server using ssh.
Start jobs in batch mode.
Detach jobs.
Exit from server, wait for jobs to finish.
Access remote server using ssh.
Re-attach jobs, check on progress, get results, etc.
R in conjunction with nohup or screen can be used to orchestrate a large simulation study.
It can be more elegant, transparent, and robust to parallelize jobs corresponding to different scenarios (e.g., different generative models) outside of the code used to do statistical computation.
We consider a simulation study in R but the same approach could be used with code written in Julia, Matlab, Python, etc.
Python in many ways makes a better glue.
runSim.R, which runs a simulation based on command line argument n.
n values that we want to use in our simulation study.
Option 1: manually call runSim.R for each setting.
Option 2: automate calls using R and nohup. autoSim.R
cat autoSim.R
## # autoSim.R
##
## nVals <- seq(100, 1000, by=100)
## for (n in nVals) {
## oFile <- paste("n", n, ".txt", sep="")
## sysCall <- paste("nohup Rscript runSim.R n=", n, " > ", oFile, sep="")
## system(sysCall)
## print(paste("sysCall=", sysCall, sep=""))
## }
Rscript autoSim.R
## [1] "sysCall=nohup Rscript runSim.R n=100 > n100.txt"
## [1] "sysCall=nohup Rscript runSim.R n=200 > n200.txt"
## [1] "sysCall=nohup Rscript runSim.R n=300 > n300.txt"
## [1] "sysCall=nohup Rscript runSim.R n=400 > n400.txt"
## [1] "sysCall=nohup Rscript runSim.R n=500 > n500.txt"
## [1] "sysCall=nohup Rscript runSim.R n=600 > n600.txt"
## [1] "sysCall=nohup Rscript runSim.R n=700 > n700.txt"
## [1] "sysCall=nohup Rscript runSim.R n=800 > n800.txt"
## [1] "sysCall=nohup Rscript runSim.R n=900 > n900.txt"
## [1] "sysCall=nohup Rscript runSim.R n=1000 > n1000.txt"
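The same orchestration can be sketched directly in the shell. Note that the system() calls in autoSim.R run one after another, because the command string has no trailing &; the loop below instead launches each job in the background. This is an illustrative alternative, not the course's script. The DRY_RUN switch is an assumption added here so the loop can be tried without R installed: by default it only prints the commands.

```shell
# DRY_RUN=1 (the default here) only prints each command. Set DRY_RUN=0 to
# launch the jobs, which requires Rscript and runSim.R in the current directory.
DRY_RUN=${DRY_RUN:-1}
launched=0

for n in $(seq 100 100 1000); do
  out="n$n.txt"
  if [ "$DRY_RUN" -eq 1 ]; then
    echo "nohup Rscript runSim.R n=$n > $out 2>&1 &"
  else
    # The trailing & starts each job in the background, so the jobs run concurrently.
    nohup Rscript runSim.R "n=$n" > "$out" 2>&1 &
  fi
  launched=$((launched + 1))
done
echo "prepared $launched jobs"
```

Launching all jobs at once is only sensible when the machine has enough cores and memory for them; otherwise run them in smaller batches.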