versioninfo()
bit = binary digit (coined by statistician John Tukey). byte = 8 bits.
The Julia function Base.summarysize shows the amount of memory (in bytes) used by an object.
x = rand(100, 100)
Base.summarysize(x)
The varinfo() function prints all variables in the workspace and their sizes.
varinfo() # similar to Matlab whos()
Plain text files such as .jl, .r, .c, .cpp, .ipynb, .html, .tex, ... are stored as sequences of characters.
# integers 0, 1, ..., 127 and the corresponding ASCII characters
[0:127 Char.(0:127)]
# integers 128, 129, ..., 255 and the corresponding extended ASCII characters
# show(stdout, "text/plain", [128:255 Char.(128:255)])
[128:255 Char.(128:255)]
Unicode encodings (UTF-8, UTF-16, and UTF-32) support many more characters, including non-English characters; the first 128 code points coincide with ASCII.
UTF-8 is the current dominant character encoding on the internet.
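For a quick check, UTF-8 is a variable-length encoding: ASCII characters occupy one byte, while other characters occupy two to four bytes.
@show codeunits("a")   # 1 byte, identical to ASCII
@show codeunits("β")   # 2 bytes
@show length("aβ")     # 2 characters
@show sizeof("aβ");    # 3 bytes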
# \beta-<tab>
β = 0.0
# \beta-<tab>-\hat-<tab>
β̂ = 0.0
The fixed-point number system is a computer model for integers $\mathbb{Z}$.
The number of bits and method of representing negative numbers vary from system to system.
The integer type in R has $M=32$ or 64 bits, determined by the machine word size. Common fixed-width integer types are (u)int8, (u)int16, (u)int32, and (u)int64. Julia has even more integer types. Using Tom Breloff's Plots.jl and GraphRecipes.jl packages, we can visualize the type tree under Integer.
A Signed or Unsigned integer can be $M = 8, 16, 32, 64$, or 128 bits.
using GraphRecipes, Plots
#pyplot(size=(800, 600))
gr(size=(600, 400))
theme(:default)
plot(Integer, method=:tree, fontsize=4)
The first bit indicates the sign: 0 for nonnegative numbers, 1 for negative numbers.
Negative numbers are stored in the two's complement representation: a negative 64-bit integer x has the same bit pattern as the unsigned integer 2^64 + x.
@show typeof(18)
@show bitstring(18)
@show bitstring(-18)
@show bitstring(UInt64(Int128(2)^64 - 18)) == bitstring(-18)
@show bitstring(2 * 18) # shift bits of 18
@show bitstring(2 * -18); # shift bits of -18
typemin(T) and typemax(T) give the lowest and highest representable numbers of a type T, respectively.
typemin(Int64), typemax(Int64)
for T in [Int8, Int16, Int32, Int64, Int128]
println(T, '\t', typemin(T), '\t', typemax(T))
end
for t in [UInt8, UInt16, UInt32, UInt64, UInt128]
println(t, '\t', typemin(t), '\t', typemax(t))
end
BigInt
The Julia BigInt type provides arbitrary precision integers.
@show typemax(Int128)
@show typemax(Int128) + 1 # modular arithmetic!
@show BigInt(typemax(Int128)) + 1;
R reports NA for integer overflow and underflow. Julia outputs the result according to modular arithmetic.
@show typemax(Int32)
@show typemax(Int32) + Int32(1); # modular arithmetic!
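If silent wraparound is undesirable, the checked arithmetic functions in Julia's Base.Checked module throw an OverflowError instead; a minimal sketch:
using Base.Checked: checked_add
@show checked_add(Int32(1), Int32(2)) # ordinary addition
try
    checked_add(typemax(Int32), Int32(1)) # overflows
catch err
    @show err # OverflowError
end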
using RCall
R"""
.Machine$integer.max
"""
R"""
M <- 32
big <- 2^(M-1) - 1
as.integer(big)
"""
R"""
as.integer(big+1)
"""
Floating-point number system is a computer model for real numbers.
Most computer systems adopt the IEEE 754 standard, established in 1985, for floating-point arithmetics.
For the history, see an interview with William Kahan.
In the scientific notation, a real number is represented as $$\pm d_0.d_1d_2 \cdots d_p \times b^e.$$ In computer, the base is $b=2$ and the digits $d_i$ are 0 or 1.
Normalized vs denormalized numbers. For example, decimal number 18 is $$ +1.0010 \times 2^4 \quad (\text{normalized})$$ or, equivalently, $$ +0.1001 \times 2^5 \quad (\text{denormalized}).$$
In the floating-point number system, the computer stores the sign bit, the significand (fraction) of the normalized representation, and the actual exponent plus a bias.
using GraphRecipes, Plots
#pyplot(size=(800, 600))
gr(size=(600, 400))
theme(:default)
plot(AbstractFloat, method=:tree, fontsize=4)
Double precision (64 bits = 8 bytes) numbers are the dominant data type in scientific computing.
In Julia, Float64
is the type for double precision numbers.
First bit is sign bit.
$p=52$ significant bits.
11 exponent bits: $e_{\max}=1023$, $e_{\min}=-1022$, bias=1023.
$e_{\text{min}}-1$ and $e_{\text{max}}+1$ are reserved for special numbers.
range of magnitude: $10^{\pm 308}$ in decimal because $\log_{10} (2^{1023}) \approx 308$.
precision: $\log_{10}(2^{52}) \approx 16$ decimal digits.
println("Double precision:")
@show bitstring(Float64(18)) # 18 in double precision
@show bitstring(Float64(-18)); # -18 in double precision
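To make the layout concrete, here is a small sketch (the helper names sgn, expo, and frac are ours) that decodes the bit string of Float64(18) into its sign, exponent, and fraction fields and reconstructs the value:
s = bitstring(Float64(18))
sgn  = s[1]                                  # sign bit
expo = parse(Int, s[2:12], base = 2) - 1023  # stored exponent minus the bias
frac = parse(Int, s[13:64], base = 2) / 2^52 # fractional part of the significand
@show sgn, expo, frac
@show (1 + frac) * 2.0^expo; # recovers 18.0 = +1.125 × 2^4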
In Julia, Float32
is the type for single precision numbers.
First bit is sign bit.
$p=23$ significant bits.
8 exponent bits: $e_{\max}=127$, $e_{\min}=-126$, bias=127.
$e_{\text{min}}-1$ and $e_{\text{max}}+1$ are reserved for special numbers.
range of magnitude: $10^{\pm 38}$ in decimal because $\log_{10} (2^{127}) \approx 38$.
precision: $\log_{10}(2^{23}) \approx 7$ decimal digits.
println("Single precision:")
@show bitstring(Float32(18.0)) # 18 in single precision
@show bitstring(Float32(-18.0)); # -18 in single precision
In Julia, Float16
is the type for half precision numbers.
First bit is sign bit.
$p=10$ significant bits.
5 exponent bits: $e_{\max}=15$, $e_{\min}=-14$, bias=15.
$e_{\text{min}}-1$ and $e_{\text{max}}+1$ are reserved for special numbers.
range of magnitude: $10^{\pm 4}$ in decimal because $\log_{10} (2^{15}) \approx 4$.
precision: $\log_{10}(2^{10}) \approx 3$ decimal digits.
println("Half precision:")
@show bitstring(Float16(18)) # 18 in half precision
@show bitstring(Float16(-18)); # -18 in half precision
@show bitstring(Inf) # Inf in double precision
@show bitstring(-Inf); # -Inf in double precision
Exponent $e_{\max}+1$ with a nonzero mantissa means NaN. NaN can be produced from 0 / 0, 0 * Inf, ...
In general NaN ≠ NaN bitwise. Test whether a number is NaN with the isnan function.
@show bitstring(0 / 0) # NaN
@show bitstring(0 * Inf); # NaN
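NaN also compares unequal to itself under IEEE semantics; a quick check:
@show NaN == NaN    # false
@show isnan(NaN)    # true
@show isnan(0 / 0); # true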
@show bitstring(0.0); # 0 in double precision
@show nextfloat(0.0) # next representable number
@show bitstring(nextfloat(0.0)); # denormalized
Rounding is necessary whenever a number has more significand bits than $p$; the default IEEE 754 mode, which Julia also uses, is round to nearest (RoundNearest). For example, the number 0.1 in the decimal system cannot be represented exactly as a binary floating point number:
$$ 0.1 = 1.10011001... \times 2^{-4} $$
@show bitstring(0.1f0) # single precision Float32, 1001 gets rounded to 101(0)
@show bitstring(0.1); # double precision Float64
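A familiar consequence of this rounding is that decimal identities need not hold exactly; a quick check:
@show 0.1 + 0.2 == 0.3           # false: both sides carry rounding error
@show 0.1 + 0.2 - 0.3            # about 5.6e-17
@show isapprox(0.1 + 0.2, 0.3);  # true: compare with a tolerance instead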
Single precision: range $\pm 10^{\pm 38}$ with precision up to 7 decimal digits.
Double precision: range $\pm 10^{\pm 308}$ with precision up to 16 decimal digits.
Floating-point numbers do not occur uniformly over the real number line; each order of magnitude (each power of 2) contains the same number of representable numbers.
Machine epsilons are the spacings of numbers around 1: $$\epsilon_{\min}=b^{-p}, \quad \epsilon_{\max} = b^{1-p}.$$
@show eps(Float32) # machine epsilon for a floating point type
@show eps(Float64) # same as eps()
# eps(x) is the spacing after x
@show eps(100.0)
@show eps(0.0)
# nextfloat(x) and prevfloat(x) give the neighbors of x
@show x = 1.25f0
@show prevfloat(x), x, nextfloat(x)
@show bitstring(prevfloat(x)), bitstring(x), bitstring(nextfloat(x));
In R, .Machine contains numerical characteristics of the machine.
R"""
.Machine
"""
Julia provides Float16 (half precision), Float32 (single precision), Float64 (double precision), and BigFloat (arbitrary precision).
For double precision, the range is $\pm 10^{\pm 308}$. In most situations, underflow (magnitude of result less than $10^{-308}$) is preferred over overflow (magnitude of result larger than $10^{308}$). Overflow produces $\pm \infty$. Underflow yields zeros or denormalized numbers.
E.g., the logit link function is
$$p = \frac{\exp (x^T \beta)}{1 + \exp (x^T \beta)} = \frac{1}{1+\exp(- x^T \beta)}.$$
The former expression can easily lead to Inf / Inf = NaN
, while the latter expression leads to graceful underflow.
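A minimal sketch of the difference, using an arbitrary large linear predictor xtb = 1000.0 for illustration:
xtb = 1000.0                      # a large value of x'β
@show exp(xtb) / (1 + exp(xtb))   # overflow: Inf / Inf = NaN
@show 1 / (1 + exp(-xtb));        # exp(-1000) underflows gracefully to 0, giving 1.0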
The floatmin and floatmax functions give the smallest positive normalized and largest finite numbers representable by a floating point type.
for T in [Float16, Float32, Float64]
println(T, '\t', floatmin(T), '\t', floatmax(T), '\t', typemin(T),
'\t', typemax(T), '\t', eps(T))
end
BigFloat in Julia offers arbitrary precision.
@show precision(BigFloat)
@show floatmin(BigFloat)
@show floatmax(BigFloat);
@show BigFloat(π); # default precision for BigFloat is 256 bits
# set precision to 1024 bits
setprecision(BigFloat, 1024) do
@show BigFloat(π)
end;
@show a = 2.0^30
@show b = 2.0^-30
@show a + b == a # true: b is smaller than half the spacing eps(a) around a
a = 1.2345678f0 # rounding
@show bitstring(a) # rounding
b = 1.2345677f0
@show bitstring(b)
@show a - b # correct result should be 1e-7
Floating-point numbers may violate many algebraic laws we are familiar with, such as the associative and distributive laws. See Homework 1 problems.
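For instance, a quick check that addition of Float64 numbers is not associative:
@show (0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3) # false
@show (0.1 + 0.2) + 0.3
@show 0.1 + (0.2 + 0.3);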
Textbook treatment, e.g., Chapter II.2 of Computational Statistics by James Gentle (2010).
What every computer scientist should know about floating-point arithmetic by David Goldberg (1991).