Jan 9, 2018
This course (Biostat M280) is used as a placeholder for Biostat 203B: Introduction to Data Science, which is pending approval.
Statistics, the science of data analysis, is the applied mathematics of the 21st century.
Data is increasing in volume, velocity, and variety.
The table below classifies data sets by size, following Huber (1994, 1996).
Data Size | Bytes | Storage Mode |
---|---|---|
tiny | \(10^2\) | piece of paper |
small | \(10^4\) | a few pieces of paper |
medium | \(10^6\) (MB) | a floppy disk |
large | \(10^8\) | hard disk |
huge | \(10^9\) (GB) | hard disk(s) |
massive | \(10^{12}\) (TB) | hard disk(s); RAID storage |
Source: IBM.
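To make the size classes in the table concrete, here is a minimal Python sketch that maps a byte count to one of the labels. The nominal sizes are copied from the table; the function name `classify_size` and the cut-off rule (pick the largest class whose nominal size does not exceed the byte count) are assumptions made for illustration, since the table gives only representative orders of magnitude and not class boundaries.

```python
# Nominal sizes in bytes, copied from the data size table.
SIZE_CLASSES = [
    ("tiny", 10**2),
    ("small", 10**4),
    ("medium", 10**6),    # MB
    ("large", 10**8),
    ("huge", 10**9),      # GB
    ("massive", 10**12),  # TB
]


def classify_size(n_bytes: int) -> str:
    """Return the label of the largest class whose nominal size is <= n_bytes.

    Assumption: the table lists representative sizes only, so this cut-off
    rule is one possible reading. Anything below 10**2 bytes is "tiny".
    """
    if n_bytes < 0:
        raise ValueError("n_bytes must be non-negative")
    label = SIZE_CLASSES[0][0]
    for name, nominal in SIZE_CLASSES:
        if n_bytes >= nominal:
            label = name
    return label
```

Under this rule, `classify_size(5 * 10**6)` (a 5 MB file) returns `"medium"`, and `classify_size(3 * 10**12)` (3 TB) returns `"massive"`.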
This course introduces some computing skills and software tools for handling potentially big public health data.
Read the syllabus for a tentative list of topics and course logistics.
Huber, P. J. (1994). Huge data sets. In COMPSTAT 1994 (Vienna) (pp. 3–13). Heidelberg: Physica.
Huber, P. J. (1996). Massive data sets workshop: The morning after. In Massive data sets: Proceedings of a workshop (pp. 169–184). Washington: National Academy Press.