---
title: "Case Study: Comparison of R packages related to predictions, marginal means and effects"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Case Study: Comparison of R packages related to predictions, marginal means and effects}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: bibliography.bib
---

```{r set-options, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  dev = "png",
  out.width = "100%",
  dpi = 300,
  message = FALSE,
  warning = FALSE,
  package.startup.message = FALSE
)

pkgs <- c("emmeans", "marginaleffects", "ggeffects")
if (!all(insight::check_if_installed(pkgs, quietly = TRUE))) {
  knitr::opts_chunk$set(eval = FALSE)
}
if (getRversion() < "4.1.0") {
  knitr::opts_chunk$set(eval = FALSE)
}
```

This vignette compares the **modelbased** package to other common packages that can be used to compute adjusted predictions, marginal means, marginal effects, or contrasts and pairwise comparisons.

**modelbased** is built on top of what are probably the two most popular R packages for extracting marginal means and effects, namely **emmeans** [@russell2024emmeans] and **marginaleffects** [@arel2024interpret]. You therefore obtain the same results from **modelbased** as from either of these two packages. This vignette shows how to replicate results across the different packages.

```{r}
# to create data grids, we use `insight::get_datagrid()`
library(insight)

# the four packages we compare
library(modelbased)
library(emmeans)
library(marginaleffects)
library(ggeffects)
```

The design of **modelbased** is built around the following three questions:

1. Predictor of Interest: Which variable's effect on the outcome do you want to analyze? This is specified with the `by`, `contrast`, or `trend` arguments, depending on whether you estimate means, contrasts, or slopes.

2. Evaluation Points: At which specific values should the predictor be evaluated? This can also be defined in the `by` argument; in addition, the `length` and `range` arguments are available, which are especially useful for continuous predictors. For more refined control over the evaluation points, see the [data grids](https://easystats.github.io/insight/reference/get_datagrid.html) documentation.

3. Target Population: What population should the inferences generalize to? The `estimate` argument controls this by defining whether predictions are for a typical individual, an average of the sample, or an average of a broader population.
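
The three questions can be sketched in code as follows. This is a minimal illustration, not part of the comparison itself: the model `m_sketch` is a hypothetical name used only here, and the sketch assumes that `length` is passed on to `insight::get_datagrid()` and that `"average"` is the spelling of the option that averages over the sample.

```{r}
library(modelbased)

# hypothetical model, used only for this sketch
m_sketch <- lm(Petal.Length ~ Sepal.Length + Species, data = iris)

# 1. Predictor of interest: estimated means for each level of Species
estimate_means(m_sketch, by = "Species")

# 2. Evaluation points: five equally spaced values of Sepal.Length
#    (assumes `length` is forwarded to the data grid)
estimate_means(m_sketch, by = "Sepal.Length", length = 5)

# 3. Target population: average over the observed sample instead of
#    predicting for a "typical" observation (assumed option name)
estimate_means(m_sketch, by = "Species", estimate = "average")
```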

# Estimated marginal means

We start with the default `estimate` option (`"typical"`), which gives the same results as estimating marginal means with the **emmeans** package.

## Categorical predictors

```{r}
# a very simple model
data(iris)
model <- lm(Petal.Length ~ Species, data = iris)

# modelbased
estimate_means(model, by = "Species")

# emmeans
emmeans(model, "Species")

# marginaleffects
avg_predictions(model, by = "Species")

# ggeffects
predict_response(model, "Species")
```
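
As a cross-check that needs none of the four packages: in a linear model with a single categorical predictor, the predicted value for each level equals the observed group mean. The sketch below uses base R only; `m_check` and `nd` are names introduced here for illustration.

```{r}
# same model as above, stored under a separate (illustrative) name
m_check <- lm(Petal.Length ~ Species, data = iris)

# observed mean of Petal.Length within each species
group_means <- tapply(iris$Petal.Length, iris$Species, mean)
group_means

# model-based predictions at each level of Species
nd <- data.frame(Species = levels(iris$Species))
model_means <- predict(m_check, newdata = nd)

# both agree up to numerical tolerance
all.equal(unname(model_means), as.vector(group_means))
```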

## Continuous predictors

```{r}
# a very simple model
data(iris)
model <- lm(Petal.Length ~ Sepal.Length, data = iris)

# create a range of representative values
grid <- get_datagrid(model, by = "Sepal.Length")
grid

# modelbased - by default, creates a range of 10 values from the
# minimum to the maximum for numeric focal predictors
estimate_means(model, by = "Sepal.Length")

# emmeans
emmeans(
  model,
  "Sepal.Length",
  at = list(Sepal.Length = grid$Sepal.Length)
)

# marginaleffects
avg_predictions(
  model,
  by = "Sepal.Length",
  newdata = data.frame(Sepal.Length = grid$Sepal.Length)
)

# ggeffects
predict_response(model, "Sepal.Length [4.3:7.9 by=0.4]")
```
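
The `[4.3:7.9 by=0.4]` specification in the **ggeffects** call follows from the grid itself: ten equally spaced values between the minimum (4.3) and maximum (7.9) of `Sepal.Length` are 0.4 apart. A base R sketch of the same evaluation points and predictions, using the illustrative names `m_cont` and `sl`:

```{r}
# same model as above, stored under a separate (illustrative) name
m_cont <- lm(Petal.Length ~ Sepal.Length, data = iris)

# ten equally spaced values from the minimum to the maximum
sl <- seq(min(iris$Sepal.Length), max(iris$Sepal.Length), length.out = 10)
sl

# distance between neighbouring values: (7.9 - 4.3) / 9
diff(sl)

# predictions with 95% confidence intervals at these values
predict(m_cont, newdata = data.frame(Sepal.Length = sl), interval = "confidence")
```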

## Interaction between continuous and categorical

```{r}
# a simple model with an interaction term
data(iris)
model <- lm(Petal.Length ~ Sepal.Length * Species, data = iris)

# create a range of representative values
grid <- get_datagrid(
  model,
  by = c("Species", "Sepal.Length"),
  range = "grid",
  preserve_range = FALSE
)
grid

# modelbased
estimate_means(model, by = c("Species", "Sepal.Length"), range = "grid")

# alternative notation - for "Sepal.Length", we want mean and +/- SD
estimate_means(model, by = c("Species", "Sepal.Length = [meansd]"))

# we could also pass a data grid to the `newdata` argument...
estimate_means(model, by = c("Species", "Sepal.Length"), newdata = grid)

# emmeans
emmeans(
  model,
  c("Species", "Sepal.Length"),
  at = lapply(grid, unique)
)

# marginaleffects
avg_predictions(
  model,
  by = c("Species", "Sepal.Length"),
  newdata = grid
)

# ggeffects
predict_response(model, c("Species", "Sepal.Length"))
```
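
For orientation, a comparable grid can be built by hand in base R. The sketch assumes, as the `[meansd]` notation above suggests, that `Sepal.Length` is evaluated at its mean and at one standard deviation below and above it. The packages may round these values, so small differences from the output above are possible. All object names are illustrative.

```{r}
# same model as above, stored under a separate (illustrative) name
m_int <- lm(Petal.Length ~ Sepal.Length * Species, data = iris)

# mean minus one SD, mean, and mean plus one SD of Sepal.Length
sl_values <- mean(iris$Sepal.Length) + c(-1, 0, 1) * sd(iris$Sepal.Length)

# all combinations of species and the three Sepal.Length values
nd_int <- expand.grid(
  Species = levels(iris$Species),
  Sepal.Length = sl_values
)

# predicted Petal.Length for each of the nine combinations
nd_int$Predicted <- predict(m_int, newdata = nd_int)
nd_int
```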

# References