The function performNB
Calculates the posterior probabilities of a dichotomous class variable given a set of covariates using Bayes' rule.
performNB(training, prediction, obsIDVar, goldStdVar, covariates, l = 1)
training
- The training dataset name.
prediction
- The prediction dataset name.
obsIDVar
- The variable name (in quotes) of the observation ID variable.
goldStdVar
- The variable name (in quotes) of the outcome in the training dataset (needs to be a logical variable with value TRUE for observations with the outcome of interest).
covariates
- A character vector containing the covariate variable names. All covariates need to be categorical factor variables.
l
- Laplace smoothing parameter that is added to each cell (a value of 0 indicates no smoothing).
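The effect of the smoothing parameter can be illustrated with a minimal base R sketch. It assumes that smoothing means adding l to every cell of the covariate-by-outcome table before normalising within each outcome class; the internals of performNB may differ. The helper smoothedCondProb and the toy vectors are hypothetical and not part of the package.

```r
# Hypothetical toy data (not from the package)
outcome <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)
x <- factor(c("a", "a", "b", "a", "b", "b", "c", "c"),
            levels = c("a", "b", "c"))

# Hypothetical helper: P(level | outcome) with l added to every cell
smoothedCondProb <- function(x, outcome, l = 1) {
  counts <- table(x, outcome) + l   # rows: levels, columns: FALSE/TRUE
  # Divide each column by its total so each class sums to 1
  sweep(counts, 2, colSums(counts), "/")
}

p0 <- smoothedCondProb(x, outcome, l = 0)  # no smoothing
p1 <- smoothedCondProb(x, outcome, l = 1)  # add 1 to each cell

p0["c", "TRUE"]  # level "c" never occurs with TRUE, so this is 0
p1["c", "TRUE"]  # with l = 1 this becomes (0 + 1) / (3 + 3) = 1/6
```

Without smoothing, a level that never occurs with the outcome in the training data forces the predicted probability to exactly 0, whatever the other covariates say; a positive l avoids this.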
List containing two dataframes:
probabilities
- a dataframe combining training and prediction with predicted probabilities for the prediction dataframe. Column names:
<obsIDVar>
- the observation ID with the name specified
p
- the probability that <goldStdVar> = TRUE for observations in the prediction dataset.
estimates
- a dataframe with the effect estimates derived from the training dataset.
Column names:
level
- the covariate name and level
est
- the log odds ratio for this covariate and level
se
- the standard error of the log odds ratio
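For readers unfamiliar with the quantities in estimates, the sketch below computes a log odds ratio and its standard error for a single binary covariate using the standard textbook 2x2 formulas, with l added to each cell. This is an illustration under that assumption, not necessarily the calculation performNB performs. The helper logOddsRatio and the toy vectors are hypothetical.

```r
# Hypothetical helper: textbook log odds ratio and standard error
# from a 2x2 table, with the smoothing value l added to each cell
logOddsRatio <- function(exposed, outcome, l = 1) {
  n11 <- sum(exposed & outcome) + l     # level present, outcome TRUE
  n10 <- sum(exposed & !outcome) + l    # level present, outcome FALSE
  n01 <- sum(!exposed & outcome) + l    # level absent, outcome TRUE
  n00 <- sum(!exposed & !outcome) + l   # level absent, outcome FALSE
  c(est = log((n11 * n00) / (n10 * n01)),
    se  = sqrt(1 / n11 + 1 / n10 + 1 / n01 + 1 / n00))
}

# Hypothetical toy data (not from the package)
exposed <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)
outcome <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE)

res <- logOddsRatio(exposed, outcome, l = 1)
res
```

A positive est means the level is more common among observations with the outcome than among those without it; se reflects how many observations support that estimate.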
The main purpose of this function is to be used by nbProbabilities to estimate the relative transmission probability between individuals in an infectious disease outbreak. However, it can be used more generally to estimate the probability of any dichotomous outcome given a set of categorical covariates.
The function needs a training dataset with the outcome variable (goldStdVar), which is TRUE for those who have the value of interest and FALSE for those who do not. The probability of having the outcome (<goldStdVar> = TRUE) is then predicted for each observation in the prediction dataset.
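The train-then-predict workflow described above can be sketched in base R as a small naive Bayes calculation. This is an illustrative sketch only: it assumes the class prior is the observed proportion in the training data and that l is added to each cell before normalising, which may not match the internals of performNB. The function nbSketch and the toy data frames are hypothetical.

```r
# Hypothetical helper: naive Bayes posterior P(outcome = TRUE | covariates)
nbSketch <- function(training, prediction, goldStdVar, covariates, l = 1) {
  y <- training[[goldStdVar]]
  # Start from the log prior of each class
  logT <- rep(log(mean(y)), nrow(prediction))
  logF <- rep(log(mean(!y)), nrow(prediction))
  for (v in covariates) {
    lev <- levels(training[[v]])
    # Smoothed P(level | class): add l to every cell, then normalise
    nT <- table(training[[v]][y]) + l
    nF <- table(training[[v]][!y]) + l
    pT <- setNames(as.numeric(nT / sum(nT)), lev)
    pF <- setNames(as.numeric(nF / sum(nF)), lev)
    # Add the log-likelihood of each row's observed level
    obs <- as.character(prediction[[v]])
    logT <- logT + log(pT[obs])
    logF <- logF + log(pF[obs])
  }
  # Bayes' rule: normalise over the two classes
  unname(1 / (1 + exp(logF - logT)))
}

# Hypothetical toy data (not from the package)
train <- data.frame(
  sick  = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE),
  fever = factor(c("yes", "yes", "no", "no", "no", "yes")),
  cough = factor(c("yes", "no", "yes", "no", "no", "no"))
)
new <- data.frame(
  fever = factor(c("yes", "no"), levels = c("no", "yes")),
  cough = factor(c("yes", "no"), levels = c("no", "yes"))
)

p <- nbSketch(train, new, "sick", c("fever", "cough"), l = 1)
p  # approximately 0.818 and 0.25
```

Working on the log scale and normalising at the end keeps the calculation stable when many covariates are multiplied together.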
## Use the iris dataset and predict whether a flower is of the species "virginica".
data(iris)
irisNew <- iris
## Creating an id variable
irisNew$id <- seq_len(nrow(irisNew))
## Creating logical variable indicating if the flower is of the species virginica
irisNew$spVirginica <- irisNew$Species == "virginica"
## Creating categorical/factor versions of the covariates
irisNew$Sepal.Length.Cat <- factor(cut(irisNew$Sepal.Length, c(0, 5, 6, 7, Inf)),
                                   labels = c("<=5.0", "5.1-6.0", "6.1-7.0", "7.1+"))
irisNew$Sepal.Width.Cat <- factor(cut(irisNew$Sepal.Width, c(0, 2.5, 3, 3.5, Inf)),
                                  labels = c("<=2.5", "2.6-3.0", "3.1-3.5", "3.6+"))
irisNew$Petal.Length.Cat <- factor(cut(irisNew$Petal.Length, c(0, 2, 4, 6, Inf)),
                                   labels = c("<=2.0", "2.1-4.0", "4.1-6.0", "6.1+"))
irisNew$Petal.Width.Cat <- factor(cut(irisNew$Petal.Width, c(0, 1, 2, Inf)),
                                  labels = c("<=1.0", "1.1-2.0", "2.1+"))
## Using NB to predict if the species is virginica
## (training and predicting on same dataset)
pred <- performNB(irisNew, irisNew, obsIDVar = "id",
                  goldStdVar = "spVirginica",
                  covariates = c("Sepal.Length.Cat", "Sepal.Width.Cat",
                                 "Petal.Length.Cat", "Petal.Width.Cat"), l = 1)
irisResults <- merge(irisNew, pred$probabilities, by = "id")
tapply(irisResults$p, irisResults$Species, summary)
#> $setosa
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
#> 0.00003 0.00006 0.00010 0.00010 0.00012 0.00018      50
#>
#> $versicolor
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
#> 0.00018 0.03570 0.43028 0.43323 0.76622 0.85702      50
#>
#> $virginica
#>   Min. 1st Qu. Median   Mean 3rd Qu.   Max.   NA's
#> 0.1180  0.8570 0.9920 0.8474  0.9956 0.9999     50
#>