Jan 9, 2018
\(\DeclareMathOperator*{\argmin}{arg\,min}\)
This course (Biostat M280) is used as a placeholder for Biostat 203B: Introduction to Data Science, which is pending approval.
Statistics, the science of data analysis, is the applied mathematics of the 21st century.
Data is increasing in volume, velocity, and variety.
| Data Size | Bytes | Storage Mode |
|---|---|---|
| tiny | \(10^2\) | piece of paper |
| small | \(10^4\) | a few pieces of paper |
| medium | \(10^6\) (MB) | a floppy disk |
| large | \(10^8\) | hard disk |
| huge | \(10^9\) (GB) | hard disk(s) |
| massive | \(10^{12}\) (TB) | hard disk(s); RAID storage |
Source: IBM.
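To make the size classes in the table concrete, here is a minimal Python sketch that maps a byte count to one of the labels. The function name `classify_size` and the rule it uses are assumptions for illustration, not part of the original classification: each nominal size is treated as a lower bound, and anything below \(10^2\) bytes is still labeled "tiny".

```python
# Size classes copied from the table above: (label, nominal size in bytes).
SIZE_CLASSES = [
    ("tiny", 10**2),
    ("small", 10**4),
    ("medium", 10**6),
    ("large", 10**8),
    ("huge", 10**9),
    ("massive", 10**12),
]


def classify_size(n_bytes: int) -> str:
    """Return the label of the largest class whose nominal size is <= n_bytes.

    Assumption (hypothetical rule, not from the table): nominal sizes act as
    lower bounds, and inputs smaller than 10**2 bytes fall back to "tiny".
    """
    if n_bytes < 0:
        raise ValueError("n_bytes must be non-negative")
    label = SIZE_CLASSES[0][0]  # default for very small inputs
    for name, size in SIZE_CLASSES:
        if n_bytes >= size:
            label = name
    return label


if __name__ == "__main__":
    # A 5 MB file and a 3 TB archive, as rough examples.
    for n in (5 * 10**6, 3 * 10**12):
        print(n, classify_size(n))
```

Under this rule a 5 MB file is classed as "medium" and a 3 TB archive as "massive"; a different boundary convention (for example, rounding to the nearest power of ten) would shift some cases.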
This course introduces some computing skills and software tools for handling potentially big public health data.
Read the syllabus for a tentative list of topics and course logistics.
Huber, P. J. (1994). Huge data sets. In COMPSTAT 1994 (Vienna) (pp. 3–13). Heidelberg: Physica.
Huber, P. J. (1996). Massive data sets workshop: The morning after. In Massive data sets: Proceedings of a workshop (pp. 169–184). Washington: National Academy Press.