## Sunday, July 20, 2014

### Mike's Big Data, Data Mining, and Analytics Tutorial

I've been looking for a good tutorial covering "Statistics," "Big Data," "Data Science," "Data Mining," and "Analytics" for a long time. I've found plenty of piecemeal information on these subjects on the Internet, but I haven't seen anyone put together a good centralized tutorial for it.

This tutorial is meant to give people who are interested in these topics a starting point. It collects best practices from what I've learned over the past 11 years or so (including introductory functions, statistics, trigonometry, pre-calculus, calculus, differential equations, linear algebra, intermediate and advanced applied statistics, data mining, machine learning, and analytics).

This tutorial is generally an "applied" tutorial (as opposed to a mathematical/theoretical statistics tutorial) and aims to help people become better at understanding statistics and performing analyses. Misuse of statistics is pervasive almost everywhere, and the only effective tool to fight it is knowledge.

Below is a list of topics that are either documented or currently in progress:
• Introductory Statistics
• Introduction To Data Classification
• Descriptive Statistics

| Scale of Data | Location | Dispersion | Shape |
|---|---|---|---|
| Nominal or Higher ("Qualitative Data") | Mode | Range (N) | N/A |
| Ordinal or Higher ("Qualitative/Quantitative Data") | Median | Range (O), Quantiles, Inter-Quartile Range, Five-Number Summary | N/A |
| Continuous (Interval/Ratio) ("Quantitative Data") | Weighted Average/Mean, Harmonic Mean, Geometric Mean | | Skewness, Kurtosis |

• Regression Models (fitting one or more models to a continuous dependent variable)
• Ordinal Models (fitting one or more models to an ordinal dependent variable)
• Data Scale Reduction: Ordinal or Multinomial/Polychotomous/Polytomous?
• Utilizing statistical methods involving ranks
• Classification Models (fitting one or more models to a qualitative/nominal dependent variable)
• Understanding the Null Classification Model
• Understanding Basic Logistic Regression
• Defining Big Data
• Selected Topics in Probability
• Selected Prerequisite Topics
• Selected Statistical Programming Topics
• An Overview of Statistical Software/Programming Tools
• MVPStats
• R
• SPSS
• Python
• Microsoft Excel
• The R Tutorial
• The SQL Tutorial
• Regression Models in SQL (fitting one or more models to a continuous dependent variable)
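The Descriptive Statistics table pairs each scale of data with the measures that make sense for it. As a rough illustration, here is a minimal sketch using only Python's standard library (Python 3.8+ for `statistics.quantiles`, `fmean`, and `geometric_mean`). All function names are hypothetical helpers for this example, and the `order` argument for ordinal data is an assumption: you supply the categories from lowest to highest. Quartiles use the "inclusive" method of `statistics.quantiles`; other quartile conventions give slightly different values. Skewness and kurtosis use the simple population moment definitions, while many packages report bias-adjusted sample versions, so their numbers may differ. Range (N) and Range (O) are not covered here.

```python
import statistics


# Nominal scale: only counting makes sense, so the mode is the location measure.
def nominal_mode(labels):
    return statistics.mode(labels)


# Ordinal scale: categories are ordered but distances between them are not meaningful.
# `order` (an assumption of this sketch) lists the categories from lowest to highest.
def ordinal_median(labels, order):
    rank = {label: i for i, label in enumerate(order)}
    # median_low always returns an observed rank, so the result is a real category
    return order[statistics.median_low(rank[x] for x in labels)]


# Five-number summary (min, Q1, median, Q3, max) using "inclusive" quartiles.
def five_number_summary(values):
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


# Inter-quartile range: Q3 - Q1 from the same quartile convention.
def iqr(values):
    _, q1, _, q3, _ = five_number_summary(values)
    return q3 - q1


# Continuous (interval/ratio) scale: weighted mean = sum(w * x) / sum(w).
# statistics.harmonic_mean and statistics.geometric_mean cover the other two means.
def weighted_mean(values, weights):
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# k-th central moment using the population (divide-by-n) definition.
def _central_moment(values, k):
    mean = statistics.fmean(values)
    return sum((x - mean) ** k for x in values) / len(values)


# Shape: moment-based skewness g1 = m3 / m2**1.5.
def skewness(values):
    return _central_moment(values, 3) / _central_moment(values, 2) ** 1.5


# Shape: moment-based excess kurtosis g2 = m4 / m2**2 - 3 (0 for a normal distribution).
def excess_kurtosis(values):
    return _central_moment(values, 4) / _central_moment(values, 2) ** 2 - 3


if __name__ == "__main__":
    data = [2, 4, 4, 4, 5, 5, 7, 9]
    print(nominal_mode(["red", "blue", "red", "green"]))  # red
    print(ordinal_median(["low", "high", "medium", "medium"],
                         ["low", "medium", "high"]))      # medium
    print(five_number_summary(data))                      # (2, 4.0, 4.5, 5.5, 9)
    print(iqr(data))                                      # 1.5
    print(weighted_mean([1, 2, 3], [3, 1, 1]))            # 1.6
    print(statistics.harmonic_mean([1, 2, 4]))
    print(statistics.geometric_mean([1, 2, 4]))
    print(skewness(data), excess_kurtosis(data))          # 0.65625 -0.21875
```

Note how each helper only uses operations that are valid for its scale: the nominal helper only counts, the ordinal helper only compares ranks, and the means and moments are reserved for continuous data.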