BIOSTAT M280: Statistical Computing

  • Tue/Thu 1pm-2:20pm @ CHS 33-105A
  • Instructor: Dr. Hua Zhou, huazhou@ucla.edu
  • Multi-listed as BIOSTAT M280, BIOMATH M280, and STAT M230
  • For biostat students, this course satisfies the requirement of BIOSTAT 257 in the new curriculum. Ask Ms. Roxy Naranjo (rlnaranjo@ph.ucla.edu) for the paperwork.

What is statistics?

  • Statistics, the science of data analysis, is the applied mathematics of the 21st century.

  • People (scientists, government, health professionals, companies) collect data in order to answer certain questions. The statistician's job is to help them extract knowledge and insights from data.

  • Must-read for (bio)statistics students:

  • If existing software tools readily solve the problem, all the better.

  • Often statisticians need to implement their own methods, test new algorithms, or tailor classical methods to new types of data (big, streaming).

  • This entails at least two essential skills: programming and fundamental knowledge of algorithms.

What is this course about?

  • Not a course on statistical packages. It does not answer questions such as "How do I fit a linear mixed model in R, Julia, SAS, SPSS, or Stata?"

  • Not a pure programming course, although programming is important and we do homework in Julia.
    The new BIOSTAT 203A (Data Management) in fall quarter focuses on programming in R and SAS.

  • Not a course on data science. The new course BIOSTAT 203B (Introduction to Data Science) in winter quarter focuses on some software tools for data scientists.

  • This course focuses on algorithms, mostly those in numerical linear algebra and numerical optimization.

  • To quote James Gentle:

    The form of a mathematical expression and the way the expression should be evaluated in actual practice may be quite different.

  • For a common numerical task in statistics, say solving the least squares problem $$ \widehat \beta = ({\bf X}^T {\bf X})^{-1} {\bf X}^T {\bf y}, $$ we need to know which methods/algorithms are available and what their advantages and disadvantages are. You will fail this course if you use

    inv(X'X) * X' * y
    

    Using X \ y in Julia/Matlab (or lm(y ~ X) in R) is correct but not the purpose of this course. We want to understand what the computer is doing when we call X \ y.
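
As a rough illustration of the point above, the following Julia sketch computes the same least squares estimate four ways: the textbook formula, the backslash operator, an explicit QR factorization, and a Cholesky factorization of the normal equations. The data are simulated, and the seed, the dimensions n and p, and all variable names are assumptions made only for this example. On a small, well-conditioned problem like this one the four answers should agree closely; the differences in cost and accuracy only show up on larger or ill-conditioned problems.

```julia
using LinearAlgebra, Random

Random.seed!(280)          # arbitrary seed, only for reproducibility
n, p = 100, 5              # hypothetical problem size
X = randn(n, p)            # simulated design matrix
y = randn(n)               # simulated response

# (1) Textbook formula: forms X'X and inverts it explicitly.
beta_inv = inv(X'X) * X' * y

# (2) Backslash: for a rectangular dense X, Julia solves the
#     least squares problem through a pivoted QR factorization.
beta_bs = X \ y

# (3) Explicit QR factorization of X, then a least squares solve.
beta_qr = qr(X) \ y

# (4) Cholesky factorization of the normal equations X'X * beta = X'y.
beta_chol = cholesky(Symmetric(X'X)) \ (X'y)

# Forming X'X squares the condition number of the problem,
# which is one reason methods (1) and (4) can lose accuracy.
cond_X   = cond(X)
cond_XtX = cond(X'X)
println("cond(X) = ", cond_X, ", cond(X'X) = ", cond_XtX)
```

Methods (2) and (3) work on X directly, while (1) and (4) work on X'X. Method (4) is cheaper than QR when n is much larger than p, but it inherits the squared condition number; method (1) does the same work as (4) plus an unnecessary explicit inverse.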

Course logistics