
Simulate many datasets under candidate validation designs
Source: R/simulate_validatedData.R
Usage
simulate_validatedData(
n_datasets,
design_type = c("BySpecies", "FixedPercent"),
scenarios = NULL,
nsites = 100,
nspecies = 8,
nvisits = 3,
psi = runif(nspecies, 0.1, 0.9),
lambda = abs(rnorm(nspecies, 0, 5)),
theta = t(apply(diag(18, nrow = nspecies) + 2, 1, function(x) {
nimble::rdirch(alpha = x)
})),
confirmable_limits = NULL,
scen_expand = TRUE,
scen_df = NULL,
save_datasets = FALSE,
save_masked_datasets = FALSE,
directory = tempdir()
)
Arguments
- n_datasets
The number of datasets to simulate. Each simulated dataset is subjected to every candidate validation design.
- design_type
Character string: either "BySpecies" for a stratified-by-species design or "FixedPercent" for a fixed-effort design (see Oram et al., in review, for details on each).
- scenarios
If design_type = "BySpecies", the scenarios argument must be a list in which each entry holds the candidate levels of validation effort for a particular autoID label. If design_type = "FixedPercent", the scenarios argument must be a vector in which each entry is a candidate percentage of calls to be sampled from the first visit to each site. See the vignette for an example.
- nsites
number of sites in each dataset
- nspecies
size of the species assemblage
- nvisits
the number of visits to each site. Note that these simulations assume a balanced design.
- psi
a vector of length nspecies with the assumed occurrence probabilities for each species
- lambda
a vector of length nspecies with the assumed relative activity levels for each species. Species must appear in the same order as in psi.
- theta
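an nspecies by nspecies matrix of (mis)classification probabilities, where entry [k, k'] is the probability that a recording of species k is assigned autoID label k'. Each row of this matrix must sum to 1. See the vignette for an example.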
- confirmable_limits
A numeric vector containing the lower and upper bounds on the site-visit probabilities that a recording can be validated ("confirmed").
- scen_expand
logical. If design_type = "BySpecies", whether simulate_validatedData should expand the list of scenarios itself. If TRUE (the default), scenarios must be a list; if FALSE, simulate_validatedData expects a user-supplied data frame through the scen_df argument.
- scen_df
If scen_expand = FALSE, a user-supplied data frame in which each row corresponds to a validation scenario and each column to a species. Default value is NULL.
- save_datasets
logical. If TRUE, the datasets without any masking of true species labels (i.e., corresponding to complete validation of all recordings) will be saved. Default value is FALSE.
- save_masked_datasets
logical. If TRUE, the masked datasets (i.e., the simulated datasets with partial validation according to the simulation scenario) will be saved. This means that there will be n_datasets x nrow(scenarios_dataframe) datasets saved: one for each dataset under each validation scenario. Default value is FALSE.
- directory
character. Required if save_datasets = TRUE or save_masked_datasets = TRUE. This is where the datasets will be saved. By default, a temporary directory will be used. This must be changed if access to saved datasets is desired after the end of the R session, as tempdir() is cleared at the end of the session.
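The defaults for psi, lambda, and theta are random draws, so it can help to build these inputs explicitly when a design should be reproducible. The sketch below does this in base R. It assumes, as the default value of theta suggests, that each row of theta is a single Dirichlet draw; here the draw is made by normalizing independent gamma variates rather than with nimble::rdirch. The helper rdirichlet_row is hypothetical, and the expand.grid call assumes that expanding a BySpecies scenarios list means crossing every effort level of every autoID label; simulate_validatedData performs its own expansion when scen_expand = TRUE.

```r
set.seed(1)  # make the random inputs reproducible

nspecies <- 3

# Occurrence probabilities and relative activity levels, drawn the same
# way as the defaults shown in the Usage section
psi <- runif(nspecies, 0.1, 0.9)
lambda <- abs(rnorm(nspecies, 0, 5))

# Hypothetical helper: one Dirichlet(alpha) draw, obtained by normalizing
# independent Gamma(alpha_i, 1) variates (an alternative to nimble::rdirch)
rdirichlet_row <- function(alpha) {
  g <- rgamma(length(alpha), shape = alpha, rate = 1)
  g / sum(g)
}

# Concentration parameters: 20 on the diagonal and 2 elsewhere, so correct
# classifications are the most likely outcome (mirrors diag(18, nrow = nspecies) + 2)
alpha <- diag(18, nrow = nspecies) + 2
theta <- t(apply(alpha, 1, rdirichlet_row))

# Each row of theta is a probability vector and must sum to 1
stopifnot(all(abs(rowSums(theta) - 1) < 1e-12))

# BySpecies design: candidate effort levels for each autoID label
val_scenarios <- list(spp1 = c(0.75, 0.5), spp2 = 0.5, spp3 = c(0.25, 0.5))

# Assumption: one row per combination of effort levels and one column per
# species; a data frame of this shape could be supplied as scen_df with
# scen_expand = FALSE
scen_df <- expand.grid(val_scenarios)
nrow(scen_df)  # 2 * 1 * 2 = 4 candidate scenarios
```

Crossing the effort levels makes the number of scenarios grow multiplicatively with the number of species, which is worth checking before setting save_masked_datasets = TRUE, since one file is written per dataset and scenario.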
Value
A list containing three elements:
- full_datasets: A list of length n_datasets with the unmasked datasets (i.e., full validation of all recordings). If save_datasets = TRUE, these are also saved individually in directory as dataset_n.rds, where n is the dataset number.
- zeros: A list of length n_datasets containing all of the site-visits where no recordings of a given true species and autoID label combination were observed. For example, if in dataset 10 there were no calls from species 1 classified as species 3 on visit 4 to site 156, then the 10th entry of this list contains a row with site = 156, visit = 4, true_spp = 1, id_spp = 3, and count = 0. These zeros are needed for housekeeping in the model-fitting process. If save_datasets = TRUE, the zeros for each dataset are saved individually in directory as zeros_in_dataset_n.rds, where n is the dataset number.
- masked_dfs: A nested list containing each dataset masked under each scenario, indexed first by scenario and then by dataset: masked_dfs[[9]][[27]] contains dataset 27 under validation scenario 9. If save_masked_datasets = TRUE, each dataset/scenario combination is saved individually in directory as dataset_n_masked_under_scenario_s.rds, where n is the dataset number and s is the scenario number.
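To make the role of the zeros element concrete, the sketch below builds the zero-count rows for a tiny set of made-up counts in base R: it enumerates every site, visit, true species, and autoID label combination and keeps the ones with no observed recordings. The column names site, visit, true_spp, id_spp, and count follow the example in the description above; the counts are toy values, and this is not necessarily how simulate_validatedData computes the zeros internally.

```r
# Toy counts: 2 sites, 2 visits, 2 species; only nonzero cells are recorded
observed <- data.frame(
  site     = c(1, 1, 2),
  visit    = c(1, 2, 1),
  true_spp = c(1, 1, 2),
  id_spp   = c(1, 2, 2),
  count    = c(4, 1, 3)
)

# Every possible site-visit / true species / autoID label combination
all_cells <- expand.grid(site = c(1, 2), visit = c(1, 2),
                         true_spp = c(1, 2), id_spp = c(1, 2))

# Left join on the shared key columns: unobserved cells get NA, then 0
filled <- merge(all_cells, observed, all.x = TRUE)
filled$count[is.na(filled$count)] <- 0

# Keep only the zero-count rows, analogous to one entry of the zeros list
zeros <- filled[filled$count == 0, ]
nrow(zeros)  # 16 cells in total, 3 observed, so 13 zeros
```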
Examples
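# Two-species assemblage: occurrence probabilities and relative activity levels
psi <- c(0.3, 0.6)
lambda <- c(11, 2)
nspecies <- length(psi)
nsites <- 30
nvisits <- 5
# Classification probabilities: one row per true species, each row sums to 1
test_theta1 <- matrix(c(0.9, 0.1, 0.15, 0.85), byrow = TRUE, nrow = 2)
# BySpecies design: candidate effort levels for autoID labels 1 and 2
val_scenarios <- list(spp1 = c(.75, .5), spp2 = .5)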
fake_data <- simulate_validatedData(
n_datasets = 5,
design_type = "BySpecies",
scenarios = val_scenarios,
nsites = nsites,
nvisits = nvisits,
nspecies = nspecies,
psi = psi,
lambda = lambda,
theta = test_theta1,
save_datasets = FALSE,
save_masked_datasets = FALSE,
directory = tempdir()
)