bit = binary + digit (coined by the statistician John Tukey). byte = 8 bits. The Julia function Base.summarysize shows the amount of memory (in bytes) used by an object.
x = rand(100, 100)
Base.summarysize(x)
The whos() function prints all variables in the workspace and their sizes.
whos()
Source code and document files (.jl, .r, .c, .cpp, .ipynb, .html, .tex, ...) are plain text files, stored as sequences of characters.
# integers 0, 1, ..., 127 and the corresponding ASCII characters
# show(STDOUT, "text/plain", [0:127 Char.(0:127)])
[0:127 Char.(0:127)]
# integers 128, 129, ..., 255 and the corresponding extended ASCII characters
# show(STDOUT, "text/plain", [128:255 Char.(128:255)])
[128:255 Char.(128:255)]
Unicode encodings (UTF-8, UTF-16, and UTF-32) support many more characters, including foreign language characters; the first 128 code points coincide with ASCII.
UTF-8 is currently the dominant character encoding on the internet.
# \beta-<tab>
β = 0.0
# \beta-<tab>-\hat-<tab>
β̂ = 0.0
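UTF-8 is a variable-length encoding, which we can see with a quick illustrative check: sizeof counts the bytes in a string, while length counts characters.
# ASCII characters occupy 1 byte in UTF-8; other characters take 2-4 bytes
@show sizeof("a") # 1 byte
@show sizeof("β") # 2 bytes
@show length("aβ"), sizeof("aβ"); # 2 characters, 3 bytes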
The fixed-point number system is a computer model for the integers $\mathbb{Z}$.
The number of bits and the method of representing negative numbers vary from system to system.
The integer type in R has $M = 32$ or 64 bits. Other systems provide (u)int8, (u)int16, (u)int32, and (u)int64 types. In Julia, a Signed or Unsigned integer can be $M = 8, 16, 32, 64$, or 128 bits; using the Plots.jl and PlotRecipes.jl packages, we can visualize the type tree under Integer.
# make a list of a type T and its supertypes
T = Integer
sups = [T]
sup = T
while sup != Any
    sup = supertype(sup)
    unshift!(sups, sup)
end
# recursively build a graph of subtypes of T
n = length(sups)
nodes, source, destiny = copy(sups), collect(1:n-1), collect(2:n)
function add_subs!(T, supidx)
    for sub in subtypes(T)
        push!(nodes, sub)
        subidx = length(nodes)
        push!(source, supidx)
        push!(destiny, subidx)
        add_subs!(sub, subidx)
    end
end
add_subs!(T, n)
names = map(string, nodes)
using PlotRecipes
#pyplot(alpha=0.5, size=(800, 500))
gr(alpha=0.5, size=(800, 500))
graphplot(source, destiny, names=names, method=:tree)
The first bit indicates the sign: 0 for nonnegative numbers, 1 for negative numbers. Negative numbers use the two's complement representation: a negative integer x is stored with the same bit pattern as the unsigned integer $2^M + x$.
@show typeof(18)
@show bits(18)
@show bits(-18)
@show bits(UInt64(Int128(2)^64 - 18)) == bits(-18)
@show bits(2 * 18) # shift bits of 18
@show bits(2 * -18); # shift bits of -18
typemin(T) and typemax(T) give the lowest and highest representable numbers of a type T, respectively.
for T in [Int8, Int16, Int32, Int64, Int128]
    println(T, '\t', typemin(T), '\t', typemax(T))
end
for T in [UInt8, UInt16, UInt32, UInt64, UInt128]
    println(T, '\t', typemin(T), '\t', typemax(T))
end
BigInt
The Julia BigInt type is arbitrary precision.
@show typemax(Int128)
@show typemax(Int128) + 1 # modular arithmetic!
@show BigInt(typemax(Int128)) + 1;
R reports NA
for integer overflow and underflow.
Julia outputs the result according to modular arithmetic.
@show typemax(Int32)
@show typemax(Int32) + Int32(1); # modular arithmetic!
using RCall
R"""
.Machine$integer.max
"""
R"""
M <- 32
big <- 2^(M-1) - 1
as.integer(big)
"""
R"""
as.integer(big+1)
"""
The floating-point number system is a computer model for the real numbers $\mathbb{R}$.
Most computer systems adopt the IEEE 754 standard, established in 1985, for floating-point arithmetic.
For the history, see an interview with William Kahan.
In the scientific notation, a real number is represented as $$\pm d_0.d_1d_2 \cdots d_p \times b^e.$$ In computer, the base is $b=2$ and the digits $d_i$ are 0 or 1.
Normalized vs denormalized numbers. For example, decimal number 18 is $$ +1.0010 \times 2^4 \quad (\text{normalized})$$ or, equivalently, $$ +0.1001 \times 2^5 \quad (\text{denormalized}).$$
In the floating-point number system, the computer stores a sign bit, an exponent, and a significand (mantissa).
Double precision (64 bit = 8 bytes): Float64 is the type for double precision numbers.
Single precision (32 bit = 4 bytes): Float32 is the type for single precision numbers.
Half precision (16 bit = 2 bytes): Float16 is the type for half precision numbers.
println("Half precision:")
@show bits(Float16(18)) # 18 in half precision
@show bits(Float16(-18)) # -18 in half precision
println("Single precision:")
@show bits(Float32(18)) # 18 in single precision
@show bits(Float32(-18)) # -18 in single precision
println("Double precision:")
@show bits(Float64(18)) # 18 in double precision
@show bits(Float64(-18)) # -18 in double precision
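As a sketch of how these stored bits combine, Julia's built-in signbit, exponent, and significand recover the three components, so that x = significand(x) × 2^exponent(x):
# decompose a double precision number: x = significand * 2^exponent,
# where 1 ≤ |significand| < 2 for a normalized number
x = -18.0
@show signbit(x) # true: the sign bit is set
@show exponent(x) # 4
@show significand(x) # -1.125, i.e., -(1.0010)₂
@show significand(x) * 2.0^exponent(x) == x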
@show Float32(π) # SP number displays 7 decimal digits
@show Float64(π) # DP number displays 15 decimal digits
Special values: Inf, -Inf, and NaN (not a number). NaN could be produced from 0 / 0, 0 * Inf, etc. Note NaN ≠ NaN: NaN compares unequal to everything, including itself, and more than one bit pattern represents NaN.
@show bits(Inf) # Inf in double precision
@show bits(-Inf) # -Inf in double precision
@show bits(0 / 0) # NaN
@show bits(0 * Inf) # NaN
@show bits(0.0) # 0 in double precision
@show nextfloat(0.0) # next representable number
@show bits(nextfloat(0.0)); # denormalized
Rounding is necessary when a real number has more significand bits than the format can store. The default IEEE 754 mode is round to nearest (ties to even), called RoundNearest in Julia. For example, the number 0.1 in the decimal system cannot be represented exactly as a binary floating point number:
$$ 0.1 = 1.10011001\ldots \times 2^{-4} $$
@show bits(0.1f0) # single precision Float32
@show bits(0.1); # double precision Float64
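One consequence of this rounding (a small illustration): sums of 0.1 drift away from their exact decimal value.
@show 0.1 + 0.2 == 0.3 # false
@show 0.1 + 0.2 # 0.30000000000000004
@show foldl(+, fill(0.1, 10)); # 0.9999999999999999, not 1.0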
@show eps(Float32) # machine epsilon for a floating point type
@show eps(Float64) # same as eps()
# eps(x) is the spacing after x
@show eps(100.0)
@show eps(0.0)
# nextfloat(x) and prevfloat(x) give the neighbors of x
@show x = 1.25f0
@show prevfloat(x), x, nextfloat(x)
@show bits(prevfloat(x)), bits(x), bits(nextfloat(x));
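To see what eps() means operationally (an illustrative check): it is the gap between 1.0 and the next representable Float64, so adding half that gap to 1.0 rounds straight back to 1.0.
@show nextfloat(1.0) - 1.0 == eps() # true: eps() is the spacing after 1.0
@show 1.0 + eps() == 1.0 # false: 1.0 + eps() is the next float
@show 1.0 + eps() / 2 == 1.0; # true: rounds back to 1.0 under RoundNearest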
In R, the variable .Machine contains numerical characteristics of the machine.
R"""
.Machine
"""
Julia has Float16 (half precision), Float32 (single precision), Float64 (double precision), and BigFloat (arbitrary precision) floating point types.
# make a list of a type T and its supertypes
T = AbstractFloat
sups = [T]
sup = T
while sup != Any
    sup = supertype(sup)
    unshift!(sups, sup)
end
n = length(sups)
nodes, source, destiny = copy(sups), collect(1:n-1), collect(2:n)
add_subs!(T, n)
names = map(string, nodes)
graphplot(source, destiny, names=names, method=:tree)
For double precision, the range is $\pm 10^{\pm 308}$. In most situations, underflow is preferred over overflow. Overflow causes crashes. Underflow yields zeros or denormalized numbers.
E.g., the logit link function is
$$p = \frac{\exp (x^T \beta)}{1 + \exp (x^T \beta)} = \frac{1}{1+\exp(- x^T \beta)}.$$
The former expression can easily lead to Inf / Inf = NaN
, while the latter expression leads to graceful underflow.
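A quick numerical check of this claim (a sketch, with η standing in for $x^T \beta$):
# the two mathematically equivalent expressions behave very differently
η = 800.0 # a large linear predictor
@show exp(η) / (1 + exp(η)) # NaN: exp(800) overflows to Inf, and Inf / Inf = NaN
@show 1 / (1 + exp(-η)); # 1.0: exp(-800) underflows gracefully to 0.0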
for T in [Float16, Float32, Float64]
    println(T, '\t', typemin(T), '\t', typemax(T), '\t', realmin(T),
        '\t', realmax(T), '\t', eps(T))
end
BigFloat in Julia offers arbitrary precision.
@show precision(BigFloat), realmin(BigFloat), realmax(BigFloat);
@show BigFloat(π); # default precision for BigFloat is 256 bits
# set precision to 1024 bits
setprecision(BigFloat, 1024) do
    @show BigFloat(π)
end;
a = 1.0 * 2.0^30 # a large number
b = 1.0 * 2.0^-30 # a small number
a + b == a # true: b is smaller than half the spacing eps(a), so it is lost in the addition
a = 1.2345678f0 # rounding
@show bits(a) # rounding
b = 1.2345677f0
@show bits(b)
@show a - b # correct result should be 1e-7
@show bits(a - b);
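Catastrophic cancellation also afflicts textbook formulas. A classic example (a sketch): when $b^2 \gg 4ac$, the quadratic formula computes the smaller root by subtracting two nearly equal numbers, while the algebraically equivalent form $c / (a x_1)$ avoids the cancellation.
# roots of a*x^2 + b*x + c = 0 with b^2 >> 4ac
a, b, c = 1.0, 1e8, 1.0
x1 = (-b - sqrt(b^2 - 4a*c)) / 2a # larger-magnitude root: no cancellation
x2 = (-b + sqrt(b^2 - 4a*c)) / 2a # smaller root: catastrophic cancellation
@show x2 # noticeably off from the true root ≈ -1e-8
@show c / (a * x1); # stable alternative, accurate to machine precision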
Floating-point numbers may violate many algebraic laws we are familiar with, such as the associative and distributive laws. See Homework 1 problems.
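For instance (a one-line illustration), addition fails associativity:
@show (0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3); # false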
Textbook treatment, e.g., Chapter II.2 of Computational Statistics by James Gentle (2010).
What Every Computer Scientist Should Know About Floating-Point Arithmetic by David Goldberg (1991).