% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/doc-data-masking.R
\name{args_data_masking}
\alias{args_data_masking}
\title{Argument type: data-masking}
\description{
This page describes the \verb{<data-masking>} argument modifier which
indicates that the argument uses tidy evaluation with \strong{data masking}.
If you've never heard of tidy evaluation before, start with
\code{vignette("programming", package = "dplyr")}.
}
\section{Key terms}{
The primary motivation for tidy evaluation in tidyverse packages is that it
provides \strong{data masking}, which blurs the distinction between two types of
variables:
\itemize{
\item \strong{env-variables} are "programming" variables and live in an environment.
They are usually created with \verb{<-}. Env-variables can be any type of R
object.
\item \strong{data-variables} are "statistical" variables and live in a data frame.
They usually come from data files (e.g. \code{.csv}, \code{.xls}), or are created by
manipulating existing variables. Because data-variables are columns of a
data frame, they must be vectors.
}
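
As a minimal sketch of this distinction (the names \code{df} and \code{cutoff} are
hypothetical, chosen only for illustration), the call below mixes both kinds
of variable in a single expression:

\if{html}{\out{<div class="sourceCode">}}\preformatted{library(dplyr)

df <- data.frame(x = c(1, 5, 10))  # `df` itself is an env-variable
cutoff <- 4                        # env-variable, created with <-

# Inside filter(), `x` is a data-variable (a column of `df`) and
# `cutoff` is an env-variable found in the calling environment.
res <- filter(df, x > cutoff)
}\if{html}{\out{</div>}}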
}

\section{General usage}{
Data masking allows you to refer to variables in the "current" data frame
(usually supplied in the \code{.data} argument) without any other prefix.
For example, it lets you type \code{filter(diamonds, x == 0 & y == 0 & z == 0)}
instead of \code{diamonds[diamonds$x == 0 & diamonds$y == 0 & diamonds$z == 0, ]}.
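
The two spellings can be compared on a small data frame. This is only a
sketch: \code{pts} is a hypothetical stand-in for \code{diamonds}, used so that the
example does not depend on ggplot2.

\if{html}{\out{<div class="sourceCode">}}\preformatted{library(dplyr)

# Hypothetical data: only the first row has x, y and z all equal to 0
pts <- data.frame(x = c(0, 1, 0), y = c(0, 0, 2), z = c(0, 3, 0))

# With data masking, columns are referred to by bare name
masked <- filter(pts, x == 0 & y == 0 & z == 0)

# Without data masking, every column needs the `pts$` prefix
base <- pts[pts$x == 0 & pts$y == 0 & pts$z == 0, ]
}\if{html}{\out{</div>}}

Both calls select the same single row here.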
}

\section{Indirection}{
The main challenge of data masking arises when you introduce some
indirection, i.e. when, instead of typing the name of a variable directly,
you want to supply it via a function argument or a character vector.

There are two main cases:
\itemize{
\item If you want the user to supply the variable (or function of variables)
in a function argument, embrace the argument, e.g. \code{filter(df, {{ var }})}.

\if{html}{\out{<div class="sourceCode">}}\preformatted{dist_summary <- function(df, var) \{
  df \%>\%
    summarise(n = n(), min = min(\{\{ var \}\}), max = max(\{\{ var \}\}))
\}
mtcars \%>\% dist_summary(mpg)
mtcars \%>\% group_by(cyl) \%>\% dist_summary(mpg)
}\if{html}{\out{</div>}}
\item If you have the column name as a character vector, use the \code{.data}
pronoun, e.g. \code{summarise(df, mean = mean(.data[[var]]))}.

\if{html}{\out{<div class="sourceCode">}}\preformatted{for (var in names(mtcars)) \{
  mtcars \%>\% count(.data[[var]]) \%>\% print()
\}

lapply(names(mtcars), function(var) mtcars \%>\% count(.data[[var]]))
}\if{html}{\out{</div>}}

(Note that the contents of \code{[[}, e.g. \code{var} above, are never evaluated
in the data environment, so you don't need to worry about a data-variable
called \code{var} causing problems.)
}
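
The note about \code{[[} can be illustrated with a small sketch. The data frame
below is hypothetical and deliberately contains a column that is itself
called \code{var}:

\if{html}{\out{<div class="sourceCode">}}\preformatted{library(dplyr)

# Hypothetical data with a data-variable named `var`
df <- data.frame(var = c("a", "b"), x = c(1, 2))

# Env-variable holding the name of the column to summarise
var <- "x"

# The `var` inside [[ is looked up in the environment, not in `df`,
# so this averages the column `x` rather than the column `var`.
res <- summarise(df, mean = mean(.data[[var]]))
}\if{html}{\out{</div>}}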
}

\section{Dot-dot-dot (...)}{
When this modifier is applied to \code{...}, there is one other useful technique
that solves the problem of creating a new variable with a name supplied by
the user. Use the interpolation syntax from the glue package:
\code{"{var}" := expression}. (Note the use of \verb{:=} instead of \code{=} to
enable this syntax.)

\if{html}{\out{<div class="sourceCode">}}\preformatted{var_name <- "l100km"
mtcars \%>\% mutate("\{var_name\}" := 235 / mpg)
}\if{html}{\out{</div>}}

Note that \code{...} automatically provides indirection, so you can use it as is
(i.e. without embracing) inside a function:

\if{html}{\out{<div class="sourceCode">}}\preformatted{grouped_mean <- function(df, var, ...) \{
  df \%>\%
    group_by(...) \%>\%
    summarise(mean = mean(\{\{ var \}\}))
\}
}\if{html}{\out{</div>}}
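
As a usage sketch, the function above can be called with any number of
grouping variables. The definition is repeated so that the example stands on
its own; the result names \code{by_cyl} and \code{by_cyl_am} are only illustrative.

\if{html}{\out{<div class="sourceCode">}}\preformatted{library(dplyr)

grouped_mean <- function(df, var, ...) \{
  df \%>\%
    group_by(...) \%>\%
    summarise(mean = mean(\{\{ var \}\}))
\}

# One grouping variable, passed through `...` as is
by_cyl <- grouped_mean(mtcars, mpg, cyl)

# Several grouping variables work the same way
by_cyl_am <- grouped_mean(mtcars, mpg, cyl, am)
}\if{html}{\out{</div>}}

Here \code{mpg} is embraced because it arrives in the named argument \code{var},
whereas \code{cyl} and \code{am} are forwarded through \code{...} without embracing.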
}

\seealso{
\itemize{
\item \ifelse{html}{\link[=topic-data-mask]{What is data-masking and why do I need \{\{?}}{\link[=topic-data-mask]{What is data-masking and why do I need curly-curly?}}.
\item \ifelse{html}{\link[=topic-data-mask-programming]{Data mask programming patterns}}{\link[=topic-data-mask-programming]{Data mask programming patterns}}.
}
}
\keyword{internal}
