**Perspectives on Data Science:**
- *Computational view:* Data is a large sequence of numbers that needs to be processed by algorithms.
- *Statistical view:* Data comes from a random process. We want to understand that process in order to make predictions and describe its driving factors.
**Statistics vs. Probability:**
- *Probability:* We assume we know the parameters of the distributions that generated the data. It is about understanding the likelihood of different outcomes, based on these parameters.
- *Data:* The realizations of the data generation process, combined with random noise.
- *Statistics:* We observe the data and try to find the parameters of the distributions that reflect the unknown data generation process.
![[statistics-probability.png|center|500]]
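The two directions can be illustrated with a small simulation. This is only a sketch: the normal distribution, the parameter values, and the sample size are assumptions chosen for illustration, not part of the notes above.

```python
import random
import statistics

# Probability direction: the parameters are assumed known (illustrative values).
TRUE_MEAN = 5.0
TRUE_STD = 2.0

rng = random.Random(0)  # fixed seed so the run is reproducible

# Data: realizations drawn from the data generation process.
data = [rng.gauss(TRUE_MEAN, TRUE_STD) for _ in range(10_000)]

# Statistics direction: pretend the parameters are unknown
# and estimate them from the observed data alone.
estimated_mean = statistics.fmean(data)
estimated_std = statistics.stdev(data)

print(f"estimated mean: {estimated_mean:.3f}")
print(f"estimated std:  {estimated_std:.3f}")
```

With a sample this large, the estimates should land close to the assumed parameters; with fewer observations they would typically be noisier.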
**Statistical Modeling:**
Some processes are deterministic but too complex to be understood in full. Statistical modeling takes such a complicated process and describes it as a simple process plus random noise. Good modeling tries to explain as much as possible and to minimize the remaining unexplained noise.
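A minimal sketch of the "simple process plus random noise" idea, assuming a hypothetical process that is well approximated by a straight line. The line, the noise level, and the least-squares fit are illustrative choices, not a prescribed method.

```python
import random

rng = random.Random(1)  # fixed seed so the run is reproducible

# Hypothetical observations: a linear trend plus everything we do not model,
# represented here as Gaussian noise.
xs = [i / 10 for i in range(200)]
ys = [1.0 + 3.0 * x + rng.gauss(0.0, 0.5) for x in xs]

# Simple process: y ≈ intercept + slope * x, fitted by ordinary least squares.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Whatever the simple process does not explain is treated as noise.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
total_var = sum((y - mean_y) ** 2 for y in ys) / n
noise_var = sum(r ** 2 for r in residuals) / n
explained_share = 1 - noise_var / total_var

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"share of variance explained: {explained_share:.3f}")
```

The share of variance explained is one way to quantify how much of the process the model captures; a better model leaves a smaller residual variance.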
**Modeling Assumptions:**
- Identify [[Random Variable|random variables]] in the process to be modeled.
- Assign probability [[Discrete Distributions.canvas|distributions]] to the random variables.
- Make assumptions about [[Independence of Random Variables]].
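The three steps above can be sketched for a toy example of repeated coin flips. The Bernoulli distribution, the value of `p`, and the helper names `bernoulli_pmf` and `joint_probability` are hypothetical, chosen only to make the assumptions concrete.

```python
import math

# Step 1: random variables. Each flip X_i is 1 (heads) or 0 (tails).
# Step 2: distribution. Assume each X_i follows Bernoulli(p).
def bernoulli_pmf(x: int, p: float) -> float:
    """Probability that a single flip equals x under Bernoulli(p)."""
    return p if x == 1 else 1.0 - p

# Step 3: independence. If the flips are assumed independent,
# the joint probability is the product of the individual probabilities.
def joint_probability(outcomes: list[int], p: float) -> float:
    return math.prod(bernoulli_pmf(x, p) for x in outcomes)

flips = [1, 0, 1, 1]
fair = joint_probability(flips, p=0.5)
biased = joint_probability(flips, p=0.7)
print(fair, biased)
```

Without the independence assumption, the joint probability could not be written as a simple product, and the dependence between flips would have to be modeled explicitly.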