##### December 2, 2019

## Lecturer: Paul Williamson

## Descriptive Statistics

Variance = mean of the squared errors (deviations from the mean).

Standard Deviation = square root of the variance.
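A minimal sketch of both quantities, assuming the population form (sum of squared deviations divided by n) with an optional switch to the sample form (divide by n - 1). The function names and data values are illustrative and not from the lecture.

```python
import math

def variance(values, sample=False):
    """Sum of squared deviations from the mean, divided by n (or n - 1 for a sample)."""
    n = len(values)
    mean = sum(values) / n
    squared_errors = [(x - mean) ** 2 for x in values]
    return sum(squared_errors) / (n - 1 if sample else n)

def standard_deviation(values, sample=False):
    """Square root of the variance."""
    return math.sqrt(variance(values, sample))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values only
print(variance(data))            # 4.0
print(standard_deviation(data))  # 2.0
```

The standard deviation is in the same units as the data, which is why it is usually the one reported.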

## Principles of Table Design

- never present raw output
- rates are usually better than counts
- but show the number of persons that corresponds to 100% (the base count)
- make tables interesting:
  - concatenate
  - compare subgroups

- shading scheme must be unambiguous for ranked categories
- only use pie charts in graph multiples
- be imaginative

- maximize data:ink
- provide a context
- table or graph – not both
- captions: above tables, below figures
- attribute contents

## Measurement Error & Missing Data

- Systematic bias.
- Random error.

Either impute plausible values or delete the cases with missing values.

Reweighting. ‘Post-stratification’.
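A rough sketch of post-stratification, assuming a single stratifying variable and known population shares. Each respondent in a group is weighted by population share divided by sample share. The group names, counts, and means below are hypothetical.

```python
def post_stratification_weights(sample_counts, population_shares):
    """Weight per group = population share / sample share."""
    n = sum(sample_counts.values())
    return {
        group: population_shares[group] / (count / n)
        for group, count in sample_counts.items()
    }

# Hypothetical survey: under-40s are under-represented in the sample.
sample_counts = {"under_40": 30, "40_plus": 70}
population_shares = {"under_40": 0.5, "40_plus": 0.5}
group_means = {"under_40": 10.0, "40_plus": 20.0}  # mean outcome per group

weights = post_stratification_weights(sample_counts, population_shares)

n = sum(sample_counts.values())
unweighted_mean = sum(sample_counts[g] * group_means[g] for g in sample_counts) / n

weighted_total = sum(sample_counts[g] * weights[g] for g in sample_counts)
weighted_mean = (
    sum(sample_counts[g] * weights[g] * group_means[g] for g in sample_counts)
    / weighted_total
)

print(unweighted_mean)          # 17.0
print(round(weighted_mean, 6))  # 15.0
```

The weighted estimate moves towards what a sample with the population's group mix would have given.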

‘Conditional Independence Assumption’.

## Inferential Statistics and χ-Squared

Null Hypothesis.

P-Values.
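A small sketch of a χ-squared test of independence on a 2×2 table, assuming the usual Pearson statistic (sum of (observed − expected)² / expected). The p-value helper is only valid for one degree of freedom, which is what a 2×2 table has. The counts are made up for illustration.

```python
import math

def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent (the null hypothesis).
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

def p_value_df1(stat):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))

table = [[30, 20], [20, 30]]  # hypothetical counts
stat = chi_squared(table)
p = p_value_df1(stat)
print(stat)         # 4.0
print(round(p, 4))  # 0.0455
```

The p-value is the probability of a statistic at least this large if the null hypothesis were true. It says nothing by itself about the size of the difference.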

Amrhein et al. (2019) Scientists rise up against statistical significance.

- Statistical Significance is an arbitrary threshold.
- Statistical Significance is not equivalent to ‘Importance’.
- ‘Effect size’ is more important than ‘p-value’.
- Publish all findings (even if not statistically significant).

## Confidence Intervals

The Amazing Central Limit Theorem. (Trivia: see also Alan Turing and the Central Limit Theorem.)

Standard Error.
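A minimal sketch of the standard error of a mean and the confidence interval built from it, assuming a normal approximation (justified by the Central Limit Theorem for reasonably large samples; a t-distribution would be the more careful choice for a sample this small). The data values are illustrative.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(values, level=0.95):
    """Return (lower, upper, standard_error) for the sample mean."""
    n = len(values)
    m = mean(values)
    se = stdev(values) / math.sqrt(n)          # sample SD / sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return m - z * se, m + z * se, se

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values only
lower, upper, se = mean_confidence_interval(data)
print(round(se, 3))
print(round(lower, 2), round(upper, 2))
```

Quadrupling the sample size halves the standard error, so the interval narrows with the square root of n.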

## Correlation and Regression

Best line minimizes sum of squared errors.
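A short sketch of fitting that line by ordinary least squares for one predictor, using the closed-form slope and intercept. The function names and data points are illustrative and not from the lecture.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def sum_squared_errors(xs, ys, slope, intercept):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]  # illustrative data
ys = [2, 4, 5, 4, 5]
slope, intercept = fit_line(xs, ys)
sse = sum_squared_errors(xs, ys, slope, intercept)
print(round(slope, 3), round(intercept, 3))  # 0.6 2.2
print(round(sse, 3))                         # 2.4
```

Any other slope or intercept gives a larger sum of squared errors on the same points.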

Visualizing correlation:

- for continuous, scatter plot
- for categorical, stacked percentage bar charts

## Logistic Regression

Mood, C. (2009). Logistic Regression: Why We Cannot Do What We Think We Can Do, and What We Can Do About It vs. Kuha, J. (2018). On Group Comparisons With Logistic Regression Models

## Model Diagnostics

- Meets Regression Assumptions?
- Statistically robust?
- Best model?

## Dangers of Area-Level Data Analysis

- Ecological Fallacy
- Modifiable Areal Unit Problem
- Scale Effects
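As an illustration of the ecological fallacy, the sketch below uses invented data for three areas. Across area averages the two variables are perfectly positively correlated, while within every area the individual-level relationship is perfectly negative. The area names and values are hypothetical and chosen only to make the contrast stark.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical individuals grouped by area: (x values, y values).
areas = {
    "A": ([1, 2, 3], [3, 2, 1]),
    "B": ([4, 5, 6], [6, 5, 4]),
    "C": ([7, 8, 9], [9, 8, 7]),
}

# Area-level analysis: correlate the area means.
area_mean_x = [sum(xs) / len(xs) for xs, _ in areas.values()]
area_mean_y = [sum(ys) / len(ys) for _, ys in areas.values()]
area_level_r = pearson(area_mean_x, area_mean_y)

# Individual-level analysis within each area.
within_area_r = {name: pearson(xs, ys) for name, (xs, ys) in areas.items()}

print(round(area_level_r, 3))                             # 1.0
print({k: round(v, 3) for k, v in within_area_r.items()})  # -1.0 in every area
```

Inferring the individual-level relationship from the area-level correlation would get the sign wrong here.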