The function indToPair takes a dataset of observations (such as individuals in an infectious
disease outbreak) and transforms it into a dataset of pairs.
indToPair(
indData,
indIDVar,
separator = "_",
dateVar = NULL,
units = c("mins", "hours", "days", "weeks"),
ordered = FALSE
)
indData: An individual-level dataframe.
indIDVar: The name (in quotes) of the column with the individual ID.
separator: The character used to separate the individual IDs when creating the pairID.
dateVar: The name (in quotes) of the column with the dates that the individuals are observed (optional unless ordered = TRUE). This column must be a date or date-time (POSIXt) object. If supplied, the time difference between individuals will be calculated in the units specified.
units: The units for the time difference, only necessary if dateVar is supplied. Must be one of "mins", "hours", "days", "weeks".
ordered: A logical indicating if a set of ordered pairs should be returned (<dateVar>.1 before <dateVar>.2 or <dateVar>.1 = <dateVar>.2). If FALSE, a dataframe of all pairs will be returned.
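To make the units choices concrete, the sketch below expresses one time gap in each of the four accepted units using base R's difftime, which accepts the same unit names. Whether indToPair calls difftime internally is an assumption here, and the two timestamps are made up for illustration.

```r
# Two hypothetical observation times, 7.5 days apart
t1 <- as.POSIXct("2020-01-01 00:00:00", tz = "UTC")
t2 <- as.POSIXct("2020-01-08 12:00:00", tz = "UTC")

# The same gap expressed in each unit accepted by the units argument
gapMins  <- as.numeric(difftime(t2, t1, units = "mins"))
gapHours <- as.numeric(difftime(t2, t1, units = "hours"))
gapDays  <- as.numeric(difftime(t2, t1, units = "days"))
gapWeeks <- as.numeric(difftime(t2, t1, units = "weeks"))

print(c(mins = gapMins, hours = gapHours, days = gapDays, weeks = gapWeeks))
```

Converting with as.numeric drops the difftime class, so the result is a plain number in the chosen unit.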
A dataframe of either all possible pairs of individuals (ordered = FALSE) or ordered
pairs of individuals (ordered = TRUE). The dataframe will have all of the original variables
with suffixes ".1" and ".2" corresponding to the original values of <indIDVar>.1 and <indIDVar>.2.
Added to the dataframe will be a column called pairID, which is <indIDVar>.1 and <indIDVar>.2
separated by separator.
If dateVar is provided, the dataframe will also include the variable <dateVar>.Diff, giving
the time difference in dateVar between <indIDVar>.1 and <indIDVar>.2 in the units specified.
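The shape of the returned dataframe can be sketched in base R. This is an illustration of the structure described above, not the function's actual implementation. The toy data and its column names are hypothetical, and two points are assumptions: that self-pairs are dropped, and that the difference is computed as the second date minus the first.

```r
# Hypothetical individual-level data
indData <- data.frame(
  individualID  = c("A", "B", "C"),
  infectionDate = as.Date(c("2020-01-01", "2020-01-03", "2020-01-03")),
  stringsAsFactors = FALSE
)

# Every combination of row indices, dropping self-pairs (assumption)
idx <- expand.grid(i = seq_len(nrow(indData)), j = seq_len(nrow(indData)))
idx <- idx[idx$i != idx$j, ]

# Copy the original variables for each member of the pair, suffixed .1 and .2
first  <- indData[idx$i, ]
second <- indData[idx$j, ]
names(first)  <- paste0(names(first), ".1")
names(second) <- paste0(names(second), ".2")
rownames(first)  <- NULL
rownames(second) <- NULL
pairs <- cbind(first, second)

# pairID joins the two IDs with the separator
pairs$pairID <- paste(pairs$individualID.1, pairs$individualID.2, sep = "_")

# <dateVar>.Diff in days; the sign convention (.2 minus .1) is an assumption
pairs$infectionDate.Diff <- as.numeric(
  difftime(pairs$infectionDate.2, pairs$infectionDate.1, units = "days")
)

print(pairs)
```

With three individuals this gives six rows, because each pair appears in both directions (for example A_B and B_A).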
The function requires an ID column, indIDVar, to identify the individual observations.
The resulting pair-level dataframe will have a pairID column which combines the individual IDs
for that pair.
The function can either output all possible pairs (ordered = FALSE) or only ordered pairs
(ordered = TRUE), where the order is determined by a date variable (dateVar).
If ordered = TRUE, then dateVar must be provided; if ordered = FALSE,
it is optional. In both cases, if dateVar is provided, the output will include the time
difference between the individuals in the pair in the units
specified ("mins", "hours", "days", "weeks").
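The difference between the two modes can be sketched as a filter on the time difference. This is a base R illustration under stated assumptions, not the package's code: the toy data is hypothetical, self-pairs are assumed to be excluded, and a pair is kept as ordered when the first date is on or before the second.

```r
# Hypothetical data: B and C share the same date
ids   <- c("A", "B", "C")
dates <- as.Date(c("2020-01-01", "2020-01-03", "2020-01-03"))

# All pairs in both directions, without self-pairs (assumption)
idx <- expand.grid(i = seq_along(ids), j = seq_along(ids))
idx <- idx[idx$i != idx$j, ]

allPairs <- data.frame(
  pairID   = paste(ids[idx$i], ids[idx$j], sep = "_"),
  diffDays = as.numeric(difftime(dates[idx$j], dates[idx$i], units = "days")),
  stringsAsFactors = FALSE
)

# Ordered pairs: first date before or equal to the second date
orderedPairs <- allPairs[allPairs$diffDays >= 0, ]

print(allPairs$pairID)
print(orderedPairs$pairID)
```

Under these assumptions, pairs with tied dates (B_C and C_B) survive in both directions, while B_A and C_A are dropped because A was observed first.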
## Create a dataset of all pairs with no date variable
pairU <- indToPair(indData = indData, indIDVar = "individualID")

## Create a dataset of all pairs with a date variable
pairUD <- indToPair(indData = indData, indIDVar = "individualID",
                    dateVar = "infectionDate", units = "days")

## Create a dataset of ordered pairs
pairO <- indToPair(indData = indData, indIDVar = "individualID",
                   dateVar = "infectionDate", units = "days", ordered = TRUE)