In [1]:
versioninfo()
Julia Version 1.1.0
Commit 80516ca202 (2019-01-21 21:24 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin14.5.0)
  CPU: Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
  JULIA_EDITOR = code

Computer arithmetic

Units of computer storage

  • bit = binary digit (term coined by statistician John Tukey).
  • byte = 8 bits.
  • KB = kilobyte = $10^3$ bytes.
  • MB = megabyte = $10^6$ bytes.
  • GB = gigabyte = $10^9$ bytes. Typical RAM size.
  • TB = terabyte = $10^{12}$ bytes. Typical hard drive size. Size of the data NYSE generates per trading session.
  • PB = petabyte = $10^{15}$ bytes.
  • EB = exabyte = $10^{18}$ bytes. Size of all healthcare data in 2011 was ~150 EB.
  • ZB = zettabyte = $10^{21}$ bytes.

The Julia function Base.summarysize shows the amount of memory (in bytes) used by an object. For the 100×100 Float64 matrix below, that is $100 \times 100 \times 8 = 80000$ bytes of data plus 40 bytes of array metadata.

In [2]:
x = rand(100, 100)
Base.summarysize(x)
Out[2]:
80040

The varinfo() function prints all variables in the workspace and their sizes.

In [3]:
varinfo() # similar to Matlab whos()
Out[3]:
name size summary
Base Module
Core Module
Main Module
x 78.164 KiB 100×100 Array{Float64,2}

Storage of Characters

  • Plain text files are stored in the form of characters: .jl, .r, .c, .cpp, .ipynb, .html, .tex, ...
  • ASCII (American Standard Code for Information Interchange): 7 bits, only $2^7=128$ characters.
In [4]:
# integers 0, 1, ..., 127 and corresponding ascii character
[0:127 Char.(0:127)]
Out[4]:
128×2 Array{Any,2}:
   0  '\0'  
   1  '\x01'
   2  '\x02'
   3  '\x03'
   4  '\x04'
   5  '\x05'
   6  '\x06'
   7  '\a'  
   8  '\b'  
   9  '\t'  
  10  '\n'  
  11  '\v'  
  12  '\f'  
   ⋮        
 116  't'   
 117  'u'   
 118  'v'   
 119  'w'   
 120  'x'   
 121  'y'   
 122  'z'   
 123  '{'   
 124  '|'   
 125  '}'   
 126  '~'   
 127  '\x7f'
  • Extended ASCII: 8 bits, $2^8=256$ characters.
In [5]:
# integers 128, 129, ..., 255 and corresponding extended ascii character
# show(stdout, "text/plain", [128:255 Char.(128:255)])
[128:255 Char.(128:255)]
Out[5]:
128×2 Array{Any,2}:
 128  '\u80'
 129  '\u81'
 130  '\u82'
 131  '\u83'
 132  '\u84'
 133  '\u85'
 134  '\u86'
 135  '\u87'
 136  '\u88'
 137  '\u89'
 138  '\u8a'
 139  '\u8b'
 140  '\u8c'
   ⋮        
 244  'ô'   
 245  'õ'   
 246  'ö'   
 247  '÷'   
 248  'ø'   
 249  'ù'   
 250  'ú'   
 251  'û'   
 252  'ü'   
 253  'ý'   
 254  'þ'   
 255  'ÿ'   
  • Unicode: UTF-8, UTF-16 and UTF-32 support many more characters including foreign characters; the first 128 code points coincide with ASCII.

  • UTF-8 is currently the dominant character encoding on the internet.

  • Julia supports the full range of UTF-8 characters. You can type many Unicode math symbols by typing the backslashed LaTeX symbol name followed by tab.
In [6]:
# \beta-<tab>
β = 0.0
# \beta-<tab>-\hat-<tab>
β̂ = 0.0
Out[6]:
0.0
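Since UTF-8 is a variable-length encoding, ASCII characters occupy a single byte while other Unicode characters take 2 to 4 bytes. A minimal check (a sketch, not part of the original notes; expected values in the comment):

for c in "aβ∀"
    # 'a' (code point 97) takes 1 byte, 'β' (946) takes 2, '∀' (8704) takes 3
    println(c, '\t', codepoint(c), '\t', ncodeunits(c), " byte(s) in UTF-8")
end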

Integers: fixed-point number system

  • Fixed-point number system is a computer model for integers $\mathbb{Z}$.

  • The number of bits and method of representing negative numbers vary from system to system.

    • The integer type in R has $M=32$ bits, regardless of machine word size (see the .Machine$integer.max output below).
    • Matlab has (u)int8, (u)int16, (u)int32, (u)int64.
  • Julia has even more integer types. Using Tom Breloff's Plots.jl and GraphRecipes.jl packages, we can visualize the type tree under Integer.

    • Storage for a Signed or Unsigned integer can be $M = 8, 16, 32, 64$ or 128 bits.
    • The GraphRecipes.jl package has a convenience function for plotting the type hierarchy.
In [7]:
using GraphRecipes, Plots

#pyplot(size=(800, 600))
gr(size=(600, 400))
theme(:default)

plot(Integer, method=:tree, fontsize=4)
Out[7]:
[Plot: type tree under Integer. Integer <: Real <: Number <: Any; the subtypes of Integer are Bool, OffsetInteger, Signed (BigInt, Int8, Int16, Int32, Int64, Int128), and Unsigned (UInt8, UInt16, UInt32, UInt64, UInt128).]

Signed integers

  • First bit indicates sign.

    • 0 for nonnegative numbers
    • 1 for negative numbers
  • Two's complement representation for negative numbers (checked in the sketch after this list). To negate an integer:

    • all bits are flipped
    • 1 is added to the result
    • the leading (sign) bit of every negative number is then 1
    • the two's complement representation of a negative Int64 x has the same bit pattern as the unsigned integer 2^64 + x.
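The flip-and-add-one rule is easy to check with Julia's bitwise NOT operator ~ (a minimal sketch, not part of the original notes):

# two's complement negation: flip all bits, then add 1
x = 18
@show ~x + 1        # -18
@show ~x + 1 == -x; # true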
In [8]:
@show typeof(18)
@show bitstring(18)
@show bitstring(-18)
@show bitstring(UInt64(Int128(2)^64 - 18)) == bitstring(-18)
@show bitstring(2 * 18) # shift bits of 18
@show bitstring(2 * -18); # shift bits of -18
typeof(18) = Int64
bitstring(18) = "0000000000000000000000000000000000000000000000000000000000010010"
bitstring(-18) = "1111111111111111111111111111111111111111111111111111111111101110"
bitstring(UInt64(Int128(2) ^ 64 - 18)) == bitstring(-18) = true
bitstring(2 * 18) = "0000000000000000000000000000000000000000000000000000000000100100"
bitstring(2 * -18) = "1111111111111111111111111111111111111111111111111111111111011100"
  • Two's complement representation respects modular arithmetic nicely.
    Addition of any two signed integers is just bitwise addition, possibly modulo $2^M$.

  • Range of representable integers by $M$-bit signed integer is $[-2^{M-1},2^{M-1}-1]$.
    • Julia functions typemin(T) and typemax(T) give the lowest and highest representable numbers of a type T, respectively.
In [9]:
typemin(Int64), typemax(Int64)
Out[9]:
(-9223372036854775808, 9223372036854775807)
In [10]:
for T in [Int8, Int16, Int32, Int64, Int128]
    println(T, '\t', typemin(T), '\t', typemax(T))
end
Int8	-128	127
Int16	-32768	32767
Int32	-2147483648	2147483647
Int64	-9223372036854775808	9223372036854775807
Int128	-170141183460469231731687303715884105728	170141183460469231731687303715884105727

Unsigned integers

  • For unsigned integers, the range is $[0,2^M-1]$.
In [11]:
for t in [UInt8, UInt16, UInt32, UInt64, UInt128]
    println(t, '\t', typemin(t), '\t', typemax(t))
end
UInt8	0	255
UInt16	0	65535
UInt32	0	4294967295
UInt64	0	18446744073709551615
UInt128	0	340282366920938463463374607431768211455

BigInt

The Julia BigInt type provides arbitrary-precision integers.

In [12]:
@show typemax(Int128)
@show typemax(Int128) + 1 # modular arithmetic!
@show BigInt(typemax(Int128)) + 1;
typemax(Int128) = 170141183460469231731687303715884105727
typemax(Int128) + 1 = -170141183460469231731687303715884105728
BigInt(typemax(Int128)) + 1 = 170141183460469231731687303715884105728

Overflow and underflow for integer arithmetic

R reports NA for integer overflow and underflow.
Julia outputs the result according to modular arithmetic.

In [13]:
@show typemax(Int32)
@show typemax(Int32) + Int32(1); # modular arithmetic!
typemax(Int32) = 2147483647
typemax(Int32) + Int32(1) = -2147483648
In [14]:
using RCall

R"""
.Machine$integer.max
"""
Out[14]:
RObject{IntSxp}
[1] 2147483647
In [15]:
R"""
M <- 32
big <- 2^(M-1) - 1
as.integer(big)
"""
Out[15]:
RObject{IntSxp}
[1] 2147483647
In [16]:
R"""
as.integer(big+1)
"""
┌ Warning: RCall.jl: Warning: NAs introduced by coercion to integer range
└ @ RCall /Users/huazhou/.julia/packages/RCall/ffM0W/src/io.jl:113
Out[16]:
RObject{IntSxp}
[1] NA

Real numbers: floating-number system

The floating-point number system is a computer model for the real numbers $\mathbb{R}$.

  • Most computer systems adopt the IEEE 754 standard, established in 1985, for floating-point arithmetic.
    For the history, see an interview with William Kahan.

  • In scientific notation, a real number is represented as $$\pm d_0.d_1d_2 \cdots d_p \times b^e.$$ On a computer, the base is $b=2$ and the digits $d_i$ are 0 or 1.

  • Normalized vs denormalized numbers. For example, decimal number 18 is $$ +1.0010 \times 2^4 \quad (\text{normalized})$$ or, equivalently, $$ +0.1001 \times 2^5 \quad (\text{denormalized}).$$

  • In the floating-number system, computer stores

    • sign bit
    • the fraction (or mantissa, or significand) of the normalized representation
    • the actual exponent plus a bias
In [17]:
using GraphRecipes, Plots

#pyplot(size=(800, 600))
gr(size=(600, 400))
theme(:default)

plot(AbstractFloat, method=:tree, fontsize=4)
Out[17]:
[Plot: type tree under AbstractFloat. AbstractFloat <: Real <: Number <: Any; the subtypes are BigFloat, Float16, Float32, Float64.]

Double precision (Float64)

  • Double precision (64 bits = 8 bytes) numbers are the dominant data type in scientific computing.

  • In Julia, Float64 is the type for double precision numbers.

  • First bit is sign bit.

  • 52 fraction (significand) bits are stored; with the implicit leading 1 bit, the precision is $p=53$ bits.

  • 11 exponent bits: $e_{\max}=1023$, $e_{\min}=-1022$, bias=1023.

  • $e_{\text{min}}-1$ and $e_{\text{max}}+1$ are reserved for special numbers.

  • range of magnitude: $10^{\pm 308}$ in decimal because $\log_{10} (2^{1023}) \approx 308$.

  • precision: about 15 to 16 significant decimal digits, because $\log_{10} (2^{52}) \approx 15.7$.

In [18]:
println("Double precision:")
@show bitstring(Float64(18)) # 18 in double precision
@show bitstring(Float64(-18)); # -18 in double precision
Double precision:
bitstring(Float64(18)) = "0100000000110010000000000000000000000000000000000000000000000000"
bitstring(Float64(-18)) = "1100000000110010000000000000000000000000000000000000000000000000"
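We can slice the three fields out of the bit pattern and check them against Julia's built-ins (a sketch, not part of the original notes; field widths as listed above):

# decompose Float64(18) = +1.0010 x 2^4 into sign, exponent, and fraction
x    = Float64(18)
bits = bitstring(x)
@show bits[1]                                  # sign bit: '0'
@show parse(Int, bits[2:12]; base = 2) - 1023  # biased exponent minus 1023: 4
@show bits[13:64]                              # 52 stored fraction bits
@show exponent(x)                              # 4, agrees with the field above
@show significand(x);                          # 1.125 = 1.0010 in binary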

Single precision (Float32)

  • In Julia, Float32 is the type for single precision numbers.

  • First bit is sign bit.

  • 23 fraction (significand) bits are stored; with the implicit leading 1 bit, the precision is $p=24$ bits.

  • 8 exponent bits: $e_{\max}=127$, $e_{\min}=-126$, bias=127.

  • $e_{\text{min}}-1$ and $e_{\text{max}}+1$ are reserved for special numbers.

  • range of magnitude: $10^{\pm 38}$ in decimal because $\log_{10} (2^{127}) \approx 38$.

  • precision: about 7 significant decimal digits, because $\log_{10} (2^{23}) \approx 6.9$.

In [19]:
println("Single precision:")
@show bitstring(Float32(18.0)) # 18 in single precision
@show bitstring(Float32(-18.0)); # -18 in single precision
Single precision:
bitstring(Float32(18.0)) = "01000001100100000000000000000000"
bitstring(Float32(-18.0)) = "11000001100100000000000000000000"

Half precision (Float16)

  • In Julia, Float16 is the type for half precision numbers.

  • First bit is sign bit.

  • 10 fraction (significand) bits are stored; with the implicit leading 1 bit, the precision is $p=11$ bits.

  • 5 exponent bits: $e_{\max}=15$, $e_{\min}=-14$, bias=15.

  • $e_{\text{min}}-1$ and $e_{\text{max}}+1$ are reserved for special numbers.

  • range of magnitude: roughly $10^{\pm 4}$ in decimal because $\log_{10} (2^{15}) \approx 4.5$.

  • precision: about 3 significant decimal digits, because $\log_{10} (2^{10}) \approx 3.0$.

In [20]:
println("Half precision:")
@show bitstring(Float16(18)) # 18 in half precision
@show bitstring(Float16(-18)); # -18 in half precision
Half precision:
bitstring(Float16(18)) = "0100110010000000"
bitstring(Float16(-18)) = "1100110010000000"

Special floating-point numbers.

  • Exponent $e_{\max}+1$ plus a zero mantissa means $\pm \infty$.
In [21]:
@show bitstring(Inf) # Inf in double precision
@show bitstring(-Inf); # -Inf in double precision
bitstring(Inf) = "0111111111110000000000000000000000000000000000000000000000000000"
bitstring(-Inf) = "1111111111110000000000000000000000000000000000000000000000000000"
  • Exponent $e_{\max}+1$ plus a nonzero mantissa means NaN. NaN could be produced from 0 / 0, 0 * Inf, ...

  • NaN is not equal to anything, not even itself: NaN == NaN evaluates to false even when the bit patterns agree. Test whether a number is NaN with the isnan function.

In [22]:
@show bitstring(0 / 0) # NaN
@show bitstring(0 * Inf); # NaN
bitstring(0 / 0) = "1111111111111000000000000000000000000000000000000000000000000000"
bitstring(0 * Inf) = "1111111111111000000000000000000000000000000000000000000000000000"
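Because NaN compares unequal to everything, plain equality tests cannot detect it; isnan is the reliable check (a quick sketch, not part of the original notes):

@show NaN == NaN          # false
@show NaN != NaN          # true
@show isnan(0 / 0)        # true
@show isequal(NaN, NaN);  # true: isequal treats all NaNs as equivalent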
  • Exponent $e_{\min}-1$ with a zero mantissa represents the real number 0.
In [23]:
@show bitstring(0.0); # 0 in double precision
bitstring(0.0) = "0000000000000000000000000000000000000000000000000000000000000000"
  • Exponent $e_{\min}-1$ with a nonzero mantissa are for numbers less than $b^{e_{\min}}$.
    Numbers are denormalized in the range $(0,b^{e_{\min}})$ -- graceful underflow.
In [24]:
@show nextfloat(0.0) # next representable number
@show bitstring(nextfloat(0.0)); # denormalized
nextfloat(0.0) = 5.0e-324
bitstring(nextfloat(0.0)) = "0000000000000000000000000000000000000000000000000000000000000001"

Rounding

  • Rounding is necessary whenever a number has more than $p$ significand bits. Most computer systems use the default IEEE 754 round to nearest mode (also called ties to even mode). Julia offers several rounding modes, the default being RoundNearest. For example, the number 0.1 in decimal system cannot be represented accurately as a floating point number: $$ 0.1 = 1.10011001... \times 2^{-4} $$
In [25]:
@show bitstring(0.1f0) # single precision Float32, 1001 gets rounded to 101(0)
@show bitstring(0.1);  # double precision Float64
bitstring(0.1f0) = "00111101110011001100110011001101"
bitstring(0.1) = "0011111110111001100110011001100110011001100110011001100110011010"
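Printing the stored value with more digits makes the rounding visible (a sketch, not part of the original notes): the double closest to 0.1 is slightly larger than 0.1.

@show big(0.1)               # exact value of the stored Float64
@show Rational{BigInt}(0.1); # the same value as an exact fraction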

Summary

  • Single precision: range $\pm 10^{\pm 38}$ with precision up to 7 decimal digits.

  • Double precision: range $\pm 10^{\pm 308}$ with precision up to 16 decimal digits.

  • Floating-point numbers do not occur uniformly over the real number line; each binade $[2^e, 2^{e+1})$ contains the same number of representable numbers.

  • Machine epsilons are the spacings of numbers around 1: $$\epsilon_{\min}=b^{-p} \text{ (spacing just below 1)}, \quad \epsilon_{\max} = b^{1-p} \text{ (spacing just above 1)},$$ where $p$ is the precision in bits; eps(Float64) returns $\epsilon_{\max} = 2^{-52}$.

In [26]:
@show eps(Float32)  # machine epsilon for a floating point type
@show eps(Float64)  # same as eps()
# eps(x) is the spacing after x
@show eps(100.0)
@show eps(0.0)
# nextfloat(x) and prevfloat(x) give the neighbors of x
@show x = 1.25f0
@show prevfloat(x), x, nextfloat(x)
@show bitstring(prevfloat(x)), bitstring(x), bitstring(nextfloat(x));
eps(Float32) = 1.1920929f-7
eps(Float64) = 2.220446049250313e-16
eps(100.0) = 1.4210854715202004e-14
eps(0.0) = 5.0e-324
x = 1.25f0 = 1.25f0
(prevfloat(x), x, nextfloat(x)) = (1.2499999f0, 1.25f0, 1.2500001f0)
(bitstring(prevfloat(x)), bitstring(x), bitstring(nextfloat(x))) = ("00111111100111111111111111111111", "00111111101000000000000000000000", "00111111101000000000000000000001")
  • In R, the variable .Machine contains numerical characteristics of the machine.
In [27]:
R"""
.Machine
"""
Out[27]:
RObject{VecSxp}
$double.eps
[1] 2.220446e-16

$double.neg.eps
[1] 1.110223e-16

$double.xmin
[1] 2.225074e-308

$double.xmax
[1] 1.797693e+308

$double.base
[1] 2

$double.digits
[1] 53

$double.rounding
[1] 5

$double.guard
[1] 0

$double.ulp.digits
[1] -52

$double.neg.ulp.digits
[1] -53

$double.exponent
[1] 11

$double.min.exp
[1] -1022

$double.max.exp
[1] 1024

$integer.max
[1] 2147483647

$sizeof.long
[1] 8

$sizeof.longlong
[1] 8

$sizeof.longdouble
[1] 16

$sizeof.pointer
[1] 8

  • Julia provides Float16 (half precision), Float32 (single precision), Float64 (double precision), and BigFloat (arbitrary precision).

Overflow and underflow of floating-point number

  • For double precision, the range is $\pm 10^{\pm 308}$. In most situations, underflow (magnitude of result is less than $10^{-308}$) is preferred over overflow (magnitude of result is larger than $10^{308}$). Overflow produces $\pm \infty$. Underflow yields zeros or denormalized numbers.

  • E.g., the logit link function is $$p = \frac{\exp (x^T \beta)}{1 + \exp (x^T \beta)} = \frac{1}{1+\exp(- x^T \beta)}.$$ The former expression can easily lead to Inf / Inf = NaN, while the latter expression underflows gracefully (see the sketch after the next code cell).

  • The floatmin and floatmax functions give the smallest positive normalized number and the largest finite number representable by a type, respectively.

In [28]:
for T in [Float16, Float32, Float64]
    println(T, '\t', floatmin(T), '\t', floatmax(T), '\t', typemin(T), 
        '\t', typemax(T), '\t', eps(T))
end
Float16	6.104e-5	6.55e4	-Inf	Inf	0.000977
Float32	1.1754944e-38	3.4028235e38	-Inf	Inf	1.1920929e-7
Float64	2.2250738585072014e-308	1.7976931348623157e308	-Inf	Inf	2.220446049250313e-16
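Returning to the logit example above, the two algebraically equivalent forms behave very differently at an extreme linear predictor (a sketch, not part of the original notes):

# naive vs. numerically stable evaluation of the logistic function
logistic_naive(η)  = exp(η) / (1 + exp(η))  # overflows for large η
logistic_stable(η) = 1 / (1 + exp(-η))      # underflows gracefully

@show logistic_naive(800.0)    # Inf / Inf = NaN
@show logistic_stable(800.0);  # 1.0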
  • BigFloat in Julia offers arbitrary precision.
In [29]:
@show precision(BigFloat)
@show floatmin(BigFloat)
@show floatmax(BigFloat);
precision(BigFloat) = 256
floatmin(BigFloat) = 8.50969131174083613912978790962048280567755996982969624908264897850135431080301e-1388255822130839284
floatmax(BigFloat) = 5.875653789111587590936911998878442589938516392745498308333779606469323584389875e+1388255822130839282
In [30]:
@show BigFloat(π); # default precision for BigFloat is 256 bits
# set precision to 1024 bits
setprecision(BigFloat, 1024) do 
    @show BigFloat(π)
end;
BigFloat(π) = 3.141592653589793238462643383279502884197169399375105820974944592307816406286198
BigFloat(π) = 3.141592653589793238462643383279502884197169399375105820974944592307816406286208998628034825342117067982148086513282306647093844609550582231725359408128481117450284102701938521105559644622948954930381964428810975665933446128475648233786783165271201909145648566923460348610454326648213393607260249141273724586997

Catastrophic cancellation

  • Scenario 1: Addition or subtraction of two numbers of widely different magnitudes: $a+b$ or $a-b$ where $a \gg b$ or $a \ll b$. We lose the precision of the number of smaller magnitude. Consider $$\begin{eqnarray*} a &=& x.xxx ... \times 2^{30} \\ b &=& y.yyy... \times 2^{-30} \end{eqnarray*}$$ What happens when the computer calculates $a+b$? We get $a+b=a$!
In [31]:
@show a = 2.0^30
@show b = 2.0^-30
@show a + b == a
a = 2.0 ^ 30 = 1.073741824e9
b = 2.0 ^ -30 = 9.313225746154785e-10
a + b == a = true
Out[31]:
true
  • Scenario 2: Subtraction of two nearly equal numbers eliminates significant digits: $a-b$ where $a \approx b$. Consider $$\begin{eqnarray*} a &=& x.xxxxxxxxxx1ssss \\ b &=& x.xxxxxxxxxx0tttt \end{eqnarray*}$$ The result is $1.vvvvu...u$ where the $u$'s are unassigned digits; the leading matching digits have cancelled.
In [32]:
a = 1.2345678f0 # rounding
@show bitstring(a) # rounding
b = 1.2345677f0
@show bitstring(b)
@show a - b # correct result should be 1e-7
bitstring(a) = "00111111100111100000011001010001"
bitstring(b) = "00111111100111100000011001010000"
a - b = 1.1920929f-7
Out[32]:
1.1920929f-7
  • Implications for numerical computation (see the sketch after this list)
    • Rule 1: add small numbers together before adding larger ones
    • Rule 2: add numbers of like magnitude together (pairing). When all numbers are of the same sign and similar magnitude, add them in pairs so that at each stage the summands are of similar magnitude
    • Rule 3: avoid subtraction of two numbers that are nearly equal
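Rules 1 and 2 show up in a simple sum (a sketch, not part of the original notes): accumulating 10^7 copies of the Float32 value 0.1 strictly left to right drifts once the running sum dwarfs each summand, while Julia's built-in sum, which reduces in pairs, stays much closer to the exact 1.0e6.

x = fill(0.1f0, 10_000_000)
@show foldl(+, x; init = 0.0f0) # strictly left-to-right; noticeably off
@show sum(x);                   # pairwise summation; much closer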

Algebraic laws

Floating-point numbers may violate many algebraic laws we are familiar with, such as the associative and distributive laws; the sketch below gives a two-line example. See Homework 1 problems.
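A two-line illustration (a sketch, not part of the original notes):

a, b, c = 1e16, -1e16, 1.0
@show (a + b) + c  # 1.0
@show a + (b + c); # 0.0, because b + c rounds back to b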

Further reading