**Perspectives on Data Science:**
- *Computational view:* Data is a large sequence of numbers that needs to be processed by algorithms.
- *Statistical view:* Data comes from a random process. We want to understand that process in order to make predictions and describe its driving factors.

**Statistics vs. Probability:**
- *Probability:* We assume we know the parameters of the distributions that generate the data, and we reason about how likely different outcomes are given those parameters.
- *Data:* The realizations of the data generation process, combined with random noise.
- *Statistics:* We observe the data and try to infer the parameters of the distributions that describe the unknown data generation process.

![[statistics-probability.png|center|500]]

**Statistical Modeling:**
Some processes are deterministic but too complex to be understood in full. Statistical modeling represents such a complicated process as a simple process plus random noise. A good model explains as much as possible and minimizes the remaining unexplained noise.

**Modeling Assumptions:**
- Identify the [[Random Variable|random variables]] in the process to be modeled.
- Assign probability [[Discrete Distributions.canvas|distributions]] to the r.v.'s.
- Make assumptions about [[Independence of Random Variables]].
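
**Example (sketch):**
The two directions described above, probability (parameters known, data generated) and statistics (data observed, parameters estimated), can be illustrated with a small simulation. This is only a minimal sketch under assumptions that are not part of the note: the "simple process" is taken to be a straight line, the noise is assumed to be independent Gaussian, and the names `generate_data` and `fit_line` as well as all parameter values are hypothetical.

```python
import random
import statistics


def generate_data(n, slope, intercept, noise_sd, seed=0):
    """Probability direction: the parameters are known and data are simulated.

    Assumed model (illustrative only): y = intercept + slope * x + noise,
    with independent Gaussian noise of standard deviation noise_sd.
    """
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [intercept + slope * x + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Statistics direction: estimate the parameters from observed data.

    Uses ordinary least squares for slope and intercept; the spread of the
    residuals serves as an estimate of the unexplained noise.
    """
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    noise_sd = statistics.pstdev(residuals)
    return slope, intercept, noise_sd


if __name__ == "__main__":
    # Hypothetical "true" parameters, chosen only for this illustration.
    xs, ys = generate_data(n=5000, slope=2.0, intercept=1.0, noise_sd=0.5)
    slope_hat, intercept_hat, noise_hat = fit_line(xs, ys)
    print(f"estimated slope:     {slope_hat:.3f}")
    print(f"estimated intercept: {intercept_hat:.3f}")
    print(f"estimated noise sd:  {noise_hat:.3f}")
```

With enough data, the estimates should land close to the parameters used for generation, and the residual spread plays the role of the "unexplained remaining noise" that a good model tries to keep small.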