**Randomized Controlled Trial (RCT):**
1. Individuals ("units") are randomly assigned to either a treatment group or a control group.
2. The treatment group receives the intervention being studied, while the control group receives no treatment or a placebo treatment.
3. The difference in outcomes between the two groups is then used to estimate the treatment effect.

**Stratified Split:**
- With random assignment, the [[Law of Large Numbers|LLN]] ensures that attributes are distributed evenly across groups as the sample size $n \to \infty$.
- When the sample size is limited, we can instead enforce an equal distribution of an attribute across both groups, so that the attribute cannot distort the estimated treatment effect.
- This matters especially for subgroup analysis: small subgroups are hard to evaluate if they are not equally represented in both groups.

**Variables of Interest:**

| Variable | Description | Example |
| -------------------- | ----------------------------------------------------------------------------------- | ------- |
| Treatment Variable | The independent variable, which is hypothesized to have an *effect on the outcome*. | You are a smoker or not. |
| Outcome Variable | The dependent variable, which *is affected by the treatment* variable and can be observed. | You have lung cancer or not. |
| Confounding Variable | This variable *has an influence on both* the treatment and the outcome variable. | Age influences the treatment variable, as smokers tend to be older. It also influences the outcome variable, as older people tend to be more affected by lung cancer. |
| Common descendant | This variable *is influenced by both* the treatment and the outcome variable. | Whether a patient is coughing (a binary variable) is directly affected both by being a smoker and by having lung cancer. |

![[experimental-design.jpeg|center|450]]

**Dealing with Confounding Variables:**
- *Stratification:* By balancing the confounding variable in both groups, the effect of the confounder can be neutralized (e.g. when age is a confounder, we make sure to have the same age distribution in both the treatment and the control group).
- *Control variables:* In a multivariate regression, we add the confounder (e.g. age) as an additional independent variable. Including it removes the confounder's contribution from the slope coefficient of the treatment variable, which typically shrinks as a result. We then check whether the coefficient of the treatment variable is still significant.

**Dealing with Common Descendant:**
- If we include a common descendant $B$ in the [[Multivariate Linear Regression]], it wrongly reduces the slope coefficient of the treatment variable, because $B$ correlates with, but does not cause, the outcome variable $Y$. Thus we should make sure not to include such variables.
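
The two regression effects described above can be checked on simulated data. The following is a minimal sketch, not a definitive implementation: it assumes continuous variables instead of the binary smoking example, a true treatment effect of 1.0, and arbitrarily chosen coefficients for the confounder and the common descendant. The helper `ols_slopes` and all variable names are hypothetical.

```python
import numpy as np


def ols_slopes(y, *regressors):
    """Fit y on the regressors by least squares; return slopes without the intercept."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]


rng = np.random.default_rng(0)
n = 100_000
true_effect = 1.0  # assumed causal effect of the treatment on the outcome

# Case 1: confounder. Age drives both the treatment and the outcome.
age = rng.normal(size=n)
treatment = 0.8 * age + rng.normal(size=n)
outcome = true_effect * treatment + 1.5 * age + rng.normal(size=n)

naive = ols_slopes(outcome, treatment)[0]          # confounder omitted
adjusted = ols_slopes(outcome, treatment, age)[0]  # confounder as control variable

# Case 2: common descendant B. Treatment and outcome both drive B.
treatment2 = rng.normal(size=n)
outcome2 = true_effect * treatment2 + rng.normal(size=n)
b = treatment2 + outcome2 + rng.normal(size=n)

without_b = ols_slopes(outcome2, treatment2)[0]  # B correctly left out
with_b = ols_slopes(outcome2, treatment2, b)[0]  # B wrongly included

print(f"confounder omitted:  {naive:.2f}")
print(f"confounder included: {adjusted:.2f}")
print(f"descendant omitted:  {without_b:.2f}")
print(f"descendant included: {with_b:.2f}")
```

Under these assumptions, the slope with the confounder omitted overstates the effect, and adding age brings it back close to 1.0. Including $B$ pushes the treatment slope toward zero even though the true effect is 1.0, which is the bias the last bullet warns about.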