# Course: Social Survey Analysis

Tags: R, Social Surveys, Statistics, Regression, ENVS450

## Descriptive Statistics

Variance = mean of squared deviations from the mean (sum of squared deviations divided by n, or by n − 1 for a sample)

Standard Deviation = square root of variance
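A minimal sketch of both measures in plain Python. Variance here is the sum of squared deviations from the mean divided by n (population) or n − 1 (sample). The `ages` values and the function names are made up for illustration.

```python
import math

def variance(values, sample=True):
    """Sum of squared deviations from the mean, divided by n - 1 (sample) or n (population)."""
    n = len(values)
    mean = sum(values) / n
    sum_sq = sum((x - mean) ** 2 for x in values)
    return sum_sq / (n - 1 if sample else n)

def std_dev(values, sample=True):
    """Standard deviation is the square root of the variance."""
    return math.sqrt(variance(values, sample))

ages = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5

pop_var = variance(ages, sample=False)   # 32 / 8 = 4.0
pop_sd = std_dev(ages, sample=False)     # sqrt(4.0) = 2.0
sample_var = variance(ages)              # 32 / 7
```

The sample version (n − 1) is usually the one wanted for survey data, since respondents are a sample rather than the whole population.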

## Principles of Table Design

• never present raw output
• rates are usually better than counts
• but show the number of persons that equals 100% (the base)
• make tables interesting: concatenate, compare subgroups
• shading scheme must be unambiguous for ranked categories
• only use pie charts in graph multiples
• be imaginative
• maximize data:ink
• provide a context
• table or graph – not both
• captions: above tables, below figures
• attribute contents
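As an illustration of "rates rather than counts, but show the base", here is a small sketch that turns raw counts into percentages and keeps the number of persons that equals 100%. The tenure categories, counts and the `percentage_row` helper are hypothetical.

```python
def percentage_row(counts):
    """Convert raw counts to percentages and append the base (N = 100%)."""
    base = sum(counts.values())
    row = {label: round(100 * n / base, 1) for label, n in counts.items()}
    row["Base (N = 100%)"] = base
    return row

# Hypothetical counts of respondents by housing tenure
tenure_counts = {"Owner": 120, "Private renter": 50, "Social renter": 30}
row = percentage_row(tenure_counts)

for label, value in row.items():
    print(f"{label:<18}{value:>8}")
```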

## Measurement Error & Missing Data

• Systematic bias.
• Random error.

Either impute plausible values or delete cases with missing values.

Reweighting. ‘Post-stratification’.
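Post-stratification reweights respondents so that the weighted sample matches known population shares for each group. A minimal sketch, assuming a single stratifying variable (age band) and made-up sample counts and population shares:

```python
def post_stratification_weights(sample_counts, population_shares):
    """Weight for each group = population share / sample share."""
    n = sum(sample_counts.values())
    return {
        group: population_shares[group] / (count / n)
        for group, count in sample_counts.items()
    }

# Hypothetical survey: young people under-represented, older people over-represented
sample_counts = {"16-34": 20, "35-64": 50, "65+": 30}
population_shares = {"16-34": 0.30, "35-64": 0.50, "65+": 0.20}

weights = post_stratification_weights(sample_counts, population_shares)

# After weighting, each group's share of the sample equals its population share
weighted_counts = {g: sample_counts[g] * weights[g] for g in sample_counts}
```

Under-represented groups get weights above 1 and over-represented groups get weights below 1. The weighted total stays equal to the original sample size.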

‘Conditional Independence Assumption’.

## Inferential Statistics and χ-Squared

Karl Popper, Falsification

Null Hypothesis.

P-Values.
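To make the null hypothesis and p-value concrete, here is a sketch of Pearson's χ² test of independence on a 2×2 table, written in plain Python. The table is invented, and the p-value helper only covers the 1-degree-of-freedom case (which is what a 2×2 table has).

```python
import math

def chi_squared(table):
    """Pearson chi-squared statistic: sum of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent (the null hypothesis)
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

def p_value_df1(stat):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))

# Hypothetical 2x2 table: rows = group A / group B, columns = yes / no
table = [[30, 20],
         [20, 30]]

stat = chi_squared(table)
p = p_value_df1(stat)
print(f"chi-squared = {stat:.2f}, p = {p:.4f}")
```

The p-value is the probability of a statistic at least this large if the null hypothesis of independence were true; it says nothing about how large or important the association is.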

Amrhein et al. (2019) Scientists rise up against statistical significance.

• Statistical Significance is an arbitrary threshold.
• Statistical Significance is not equivalent to ‘Importance’.
• ‘Effect size’ is more important than ‘p-value’.
• Publish all findings (even if not statistically significant).

Standard Error.
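The standard error of a sample mean is the sample standard deviation divided by the square root of the sample size. A short sketch with made-up scores; the 1.96 multiplier for an approximate 95% interval assumes a roughly normal sampling distribution.

```python
import math

def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    n = len(values)
    mean = sum(values) / n
    sample_var = sum((x - mean) ** 2 for x in values) / (n - 1)
    return math.sqrt(sample_var / n)

scores = [4, 8, 6, 5, 3, 7, 9, 6]  # hypothetical survey scores
mean = sum(scores) / len(scores)
se = standard_error(scores)

# Approximate 95% confidence interval for the mean (normal approximation)
ci_low, ci_high = mean - 1.96 * se, mean + 1.96 * se
print(f"mean = {mean:.2f}, SE = {se:.3f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```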

## Correlation and Regression

The best-fitting line minimizes the sum of squared errors (residuals).
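A minimal sketch of fitting that line by ordinary least squares for one predictor, using the closed-form slope and intercept. The data points and function names are invented for illustration.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sum_squared_errors(xs, ys, intercept, slope):
    """Sum of squared vertical distances between observations and the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]  # hypothetical predictor
ys = [2, 4, 5, 4, 5]  # hypothetical outcome

intercept, slope = fit_line(xs, ys)
best_sse = sum_squared_errors(xs, ys, intercept, slope)

# Any other line gives a larger sum of squared errors
other_sse = sum_squared_errors(xs, ys, intercept, slope + 0.1)
```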

Visualizing correlation:

• for continuous, scatter plot
• for categorical, stacked percentage bar charts

## Model Diagnostics

• Meets Regression Assumptions?
• Statistically robust?
• Best model?

## Dangers of Area-Level Data Analysis

• Ecological Fallacy
• Modifiable Areal Unit Problem
• Scale Effects