A bit = binary + digit (a term coined by statistician John Tukey). A byte = 8 bits. The Julia function Base.summarysize shows the amount of memory (in bytes) used by an object.
x = rand(100, 100)
Base.summarysize(x)
Plain text files, such as .jl, .r, .c, .cpp, .ipynb, .html, .tex, ..., are stored as sequences of characters. ASCII uses 7 bits per character and encodes 128 characters.
# integers 0, 1, ..., 127 and the corresponding ASCII characters
[0:127 Char.(0:127)]
# integers 128, 129, ..., 255 and the corresponding extended ASCII characters
[128:255 Char.(128:255)]
Unicode encodings (UTF-8, UTF-16, and UTF-32) support many more characters, including non-English characters; the first 128 code points coincide with ASCII. UTF-8 is currently the dominant character encoding on the internet.
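As a quick illustration (a minimal sketch; the string "aβ" is an arbitrary example), an ASCII character occupies one byte in UTF-8 while β occupies two:
# UTF-8 is a variable-length encoding: ASCII characters take 1 byte, 'β' takes 2
s = "aβ"
@show length(s) # 2 characters
@show sizeof(s) # 3 bytes in total
@show Vector{UInt8}(s); # raw UTF-8 bytes: 0x61, 0xce, 0xb2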
# \beta-<tab>
β = 0.0
# \beta-<tab>-\hat-<tab>
β̂ = 0.0
The fixed-point number system $\mathbb{I}$ is a computer model for the integers $\mathbb{Z}$.
The number of bits and method of representing negative numbers vary from system to system.
The integer type in R has $M = 32$ or 64 bits. Julia provides signed and unsigned integer types with $M = 8, 16, 32, 64$, or 128 bits: (U)Int8, (U)Int16, (U)Int32, (U)Int64, and (U)Int128. Using the Plots.jl and PlotRecipes.jl packages, we can visualize the type tree under Integer.
using PlotRecipes
pyplot(alpha=0.5, size=(800, 500))
# make a list of a type T and its supertypes
T = Integer
sups = [T]
sup = T
while sup != Any
    sup = supertype(sup)
    unshift!(sups, sup) # prepend, so the list runs from Any down to T
end
sups
# recursively build a graph of subtypes of T
n = length(sups)
nodes, source, destiny = copy(sups), collect(1:n-1), collect(2:n)
function add_subs!(T, supidx)
    for sub in subtypes(T)
        push!(nodes, sub)
        subidx = length(nodes)
        push!(source, supidx)
        push!(destiny, subidx)
        add_subs!(sub, subidx)
    end
end
add_subs!(T, n)
names = map(string, nodes)
graphplot(source, destiny, names=names, method=:tree)
The first bit indicates the sign: 0 for nonnegative numbers and 1 for negative numbers. Negative numbers are stored in the two's complement representation: to negate $x$, flip all its bits and add 1.
@show bits(Int8(18))
@show bits(Int8(-18));
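We can verify the flip-bits-and-add-one rule directly (a minimal sketch, reusing the value 18 from above):
# two's complement: -x equals (~x) + 1 in M-bit integer arithmetic
x = Int8(18)
@show ~x + Int8(1) # -18
@show reinterpret(UInt8, Int8(-18)); # 0xee = 238 = 256 - 18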
typemin(T) and typemax(T) give the lowest and highest representable numbers of a type T, respectively.
for t in [Int8 Int16 Int32 Int64 Int128]
    println(t, '\t', typemin(t), '\t', typemax(t))
end
for t in [UInt8 UInt16 UInt32 UInt64 UInt128]
    println(t, '\t', typemin(t), '\t', typemax(t))
end
BigInt
Julia's BigInt type offers arbitrary precision.
@show typemax(Int128)
@show typemax(Int128) + 1 # modular arithmetic!
@show BigInt(typemax(Int128)) + 1
R reports NA for integer overflow and underflow. Julia outputs the result according to modular arithmetic.
@show typemax(Int32) + Int32(1) # modular arithmetic!
using RCall
R"""
.Machine$integer.max
"""
R"""
M <- 32
big <- 2^(M-1) - 1
as.integer(big)
"""
R"""
as.integer(big+1)
"""
The floating-point number system is a computer model for the real numbers $\mathbb{R}$. Most computer systems adopt the IEEE 754 standard, established in 1985, for floating-point arithmetic. For the history, see an interview with William Kahan.
In scientific notation, a real number is represented as $$\pm d_0.d_1d_2 \cdots d_p \times b^e.$$ In a computer, the base is $b=2$ and the digits $d_i$ are 0 or 1.
Normalized vs denormalized numbers. For example, decimal number 18 is $$ +1.0010 \times 2^4 \quad (\text{normalized})$$ or, equivalently, $$ +0.1001 \times 2^5 \quad (\text{denormalized}).$$
In the floating-point number system, the computer stores the sign, the exponent, and the mantissa (significand) of a number.
Single precision (32 bits = 4 bytes): Float32 is the type for single precision numbers.
Double precision (64 bits = 8 bytes): Float64 is the type for double precision numbers.
@show bits(Float32(18)) # 18 in single precision
@show bits(Float32(-18)) # -18 in single precision
@show bits(Float64(18)) # 18 in double precision
@show bits(Float64(-18)) # -18 in double precision
@show Float32(π) # SP number displays 7 decimal digits
@show Float64(π) # DP number displays 15 decimal digits
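To make the stored fields explicit, the following sketch decodes Float32(18) by hand (IEEE 754 single precision: 1 sign bit, 8 exponent bits with bias 127, 23 fraction bits):
# decode the IEEE 754 fields of Float32(18) = +1.0010 x 2^4
u = reinterpret(UInt32, Float32(18))
@show u >> 31 # sign bit: 0 (nonnegative)
@show Int((u >> 23) & 0xff) - 127 # unbiased exponent: 4
@show u & 0x007fffff; # fraction field: 0x00100000, i.e. bits 001 then zeros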
NaN (not a number) can be produced by 0 / 0, 0 * Inf, and similar indeterminate operations. In general NaN ≠ NaN bitwise, since more than one bit pattern represents NaN.
@show bits(Inf32) # Inf in single precision
@show bits(-Inf32) # -Inf in single precision
@show bits(Float32(0) / Float32(0)) # NaN
@show bits(Inf32 / Inf32) # NaN
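Note also that NaN compares unequal to everything under ==, including itself; use isnan to test for it:
@show NaN == NaN # false
@show isnan(NaN) # true
@show isequal(NaN, NaN); # true: isequal treats all NaNs as equal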
@show bits(Float32(0)) # 0 in single precision
@show nextfloat(Float32(0)) # next representable number
@show bits(nextfloat(Float32(0))) # denormalized
Rounding is necessary whenever a real number is not exactly representable; Julia uses the IEEE 754 default rounding mode RoundNearest. For example, the number
$$ 0.1 = 1.10011001\ldots \times 2^{-4} $$
has an infinitely repeating binary expansion and must be rounded.
@show bits(0.1f0) # single precision Float32
@show bits(0.1); # double precision Float64
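A familiar consequence of this rounding (a minimal sketch):
# 0.1, 0.2, and 0.3 are all rounded on input, so the exact identity fails
@show 0.1 + 0.2 == 0.3 # false
@show 0.1 + 0.2; # 0.30000000000000004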
@show eps(Float32) # machine epsilon for a floating point type
@show eps(Float64) # same as eps()
# eps(x) is the spacing after x
@show eps(100.0)
@show eps(0.0)
# nextfloat(x) and prevfloat(x) give the neighbors of x
x = 1.25f0
@show prevfloat(x), x, nextfloat(x)
@show bits(prevfloat(x)), bits(x), bits(nextfloat(x));
In R, the variable .Machine contains numerical characteristics of the machine.
R"""
.Machine
"""
Julia provides the floating-point types Float16 (half precision), Float32 (single precision), Float64 (double precision), and BigFloat (arbitrary precision). We can again visualize the type tree, now under AbstractFloat.
# make a list of a type T and its supertypes
T = AbstractFloat
sups = [T]
sup = T
while sup != Any
    sup = supertype(sup)
    unshift!(sups, sup)
end
n = length(sups)
nodes, source, destiny = copy(sups), collect(1:n-1), collect(2:n)
add_subs!(T, n) # reuse the recursive helper defined above
names = map(string, nodes)
graphplot(source, destiny, names=names, method=:tree)
For double precision, the range is $\pm 10^{\pm 308}$. In most situations, underflow is preferred over overflow. Overflow causes crashes. Underflow yields zeros or denormalized numbers.
E.g., under the logit link, the success probability is
$$p = \frac{\exp (x^T \beta)}{1 + \exp (x^T \beta)} = \frac{1}{1+\exp(- x^T \beta)}.$$
The former expression can easily lead to Inf / Inf = NaN, while the latter leads to graceful underflow.
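A minimal sketch contrasting the two expressions (800 is an arbitrary large value of the linear predictor $x^T \beta$):
# exp(800) overflows to Inf in double precision, so the first form yields NaN
η = 800.0
@show exp(η) / (1 + exp(η)) # Inf / Inf = NaN
@show 1 / (1 + exp(-η)); # 1.0, the correct limit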
for t in [Float16 Float32 Float64]
    println(t, '\t', typemin(t), '\t', typemax(t), '\t', realmin(t), '\t', realmax(t), '\t', eps(t))
end
BigFloat in Julia offers arbitrary precision.
@show precision(BigFloat), realmin(BigFloat), realmax(BigFloat);
@show BigFloat(π); # default precision for BigFloat is 256 bits
# set precision to 1024 bits
setprecision(BigFloat, 1024) do
    @show BigFloat(π)
end;
Adding a number far smaller than the spacing eps(a) to a leaves a unchanged:
a = 1.0 * 2.0^30
b = 1.0 * 2.0^-30
a + b == a # true: b is below half the spacing between floats near a
Subtracting two nearly equal numbers cancels their common leading digits (catastrophic cancellation):
a = 1.2345678f0 # the literal is rounded to the nearest Float32
@show bits(a)
b = 1.2345677f0
@show bits(b)
@show a - b # correct result should be 1e-7
@show bits(a - b);
Floating-point numbers may violate many algebraic laws we are familiar with, such as the associative and distributive laws (see the sketch below and the Homework 1 problems).
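For instance, associativity of addition can fail (a minimal sketch):
# the grouping changes which rounding errors occur
@show (0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3) # false
@show (0.1 + 0.2) + 0.3 # 0.6000000000000001
@show 0.1 + (0.2 + 0.3); # 0.6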
Textbook treatment, e.g., Chapter II.2 of Computational Statistics by James Gentle (2010).
What every computer scientist should know about floating-point arithmetic by David Goldberg (1991).