Statistics and data science

Classification of data sets by Huber (1994); Huber (1996)

Data Size Bytes Storage Mode
tiny \(10^2\) piece of paper
small \(10^4\) a few pieces of paper
medium \(10^6\) (MB) a floppy disk
large \(10^8\) hard disk
huge \(10^9\) (GB) hard disk(s)
massive \(10^{12}\) (TB) hard disk(s); RAID storage

Four V’s of big data

Source: IBM.

Course desciption

References

Huber, P. J. (1994). Huge data sets. In COMPSTAT 1994 (Vienna) (pp. 3–13). Heidelberg: Physica.

Huber, P. J. (1996). Massive data sets workshop: The morning after. In Massive data sets: Proceedings of a workshop (pp. 169–184). Washington: National Academy Press.