Title: | Automates the Creation of New Statistical Analysis Projects |
---|---|
Description: | Provides functions to automatically build a directory structure for a new R project. Using this structure, 'ProjectTemplate' automates data loading, preprocessing, library importing and unit testing. |
Authors: | Aleksandar Blagotic [ctb], Diego Valle-Jones [ctb], Jeffrey Breen [ctb], Joakim Lundborg [ctb], John Myles White [aut, cph], Josh Bode [ctb], Kenton White [ctb, cre], Kirill Mueller [ctb], Matteo Redaelli [ctb], Noah Lorang [ctb], Patrick Schalk [ctb], Dominik Schneider [ctb], Gerold Hepp [ctb], Zunaira Jamil [ctb], Glen Falk [ctb] |
Maintainer: | Kenton White <[email protected]> |
License: | GPL-3 | file LICENSE |
Version: | 0.11.0 |
Built: | 2024-10-29 05:17:47 UTC |
Source: | https://github.com/kentonwhite/projecttemplate |
This function will associate an extension with a custom reader function.
.add.extension(extension, reader)
extension |
The extension of the new data file. |
reader |
The function to use when reading the data file. It should
accept three arguments: |
No value is returned; this function is called for its side effects.
This interface should not be considered as stable and is likely to be replaced by a different mechanism in a forthcoming version of this package.
## Not run: .add.extension('foo', foo.reader)
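As an illustration, a custom reader might look like the sketch below. The argument names `data.file`, `filename` and `variable.name` are an assumption (the text above only states that three arguments are expected), and `foo.reader` is a hypothetical function that simply reads the file as lines of text.

```r
# Hypothetical reader for '.foo' files: stores the file's lines in a global
# variable named after the data file. The three argument names are assumed.
foo.reader <- function(data.file, filename, variable.name) {
  assign(variable.name, readLines(filename), envir = globalenv())
}

# Register the reader only if ProjectTemplate is installed.
if (requireNamespace("ProjectTemplate", quietly = TRUE)) {
  ProjectTemplate::.add.extension("foo", foo.reader)
}
```

Because the interface is described as unstable, a reader like this may need to be adapted for later versions of the package.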
Enables project specific configuration to be added to the global config object. The
allowable format is key value pairs which are appended to the end of the config
object, which is accessible from the global environment.
add.config(..., apply.override = FALSE)
... |
A series of key-value pairs containing the configuration. The key is the
name that gets added to the config object. These can be overridden at load
time through the |
apply.override |
A boolean indicating whether overrides should be applied. This
can be used to add a setting disregarding arguments to |
Once defined, the value can be accessed from any ProjectTemplate script by referencing config$my_project_var.
library('ProjectTemplate')
## Not run: 
add.config(
  keep_bigdata=TRUE,  # Whether to keep the big data file in memory
  parse=7             # number of fields to parse
)
if (config$keep_bigdata) ...
## End(Not run)
This function will store a copy of the named data set in the cache directory. This cached copy of the data set will then be given precedence at load time when calling load.project. Cached data sets are stored as .RData or optionally as .qs files.
cache(variable = NULL, CODE = NULL, depends = NULL, ...)
variable |
A character string containing the name of the variable to be saved. If the CODE parameter is defined, it is evaluated and saved, otherwise the variable with that name in the global environment is used. |
CODE |
A sequence of R statements enclosed in |
depends |
A character vector of other global environment objects that the CODE depends upon. Caching will be forced if those objects have changed since the last caching. |
... |
Additional arguments passed on to |
Usually you will want to cache datasets during munging. This can be the raw
data just loaded, or it can be the result of further processing during munge. Either
way, it can take a while to cache large variables, so cache will only cache when it
needs to.
The clear.cache("variable") command can be run to flush individual items from the cache.
Calling cache() with no arguments returns the current status of the cache.
No value is returned; this function is called for its side effects.
library('ProjectTemplate')
## Not run: 
create.project('tmp-project')
setwd('tmp-project')
dataset1 <- 1:5
cache('dataset1')
setwd('..')
unlink('tmp-project', recursive = TRUE)  # recursive = TRUE is needed to delete a directory
## End(Not run)
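The CODE and depends arguments described above can be combined as in the following sketch. `cache_demo` and its `project_dir` argument are hypothetical names introduced for illustration; the helper assumes that `project_dir` already contains a ProjectTemplate project and that the package is installed.

```r
# Hypothetical helper: caches a raw object and a derived object inside an
# existing ProjectTemplate project located at `project_dir`.
cache_demo <- function(project_dir) {
  old_wd <- setwd(project_dir)
  on.exit(setwd(old_wd), add = TRUE)

  # cache() looks up variables in the global environment.
  assign("dataset1", 1:5, envir = globalenv())
  ProjectTemplate::cache("dataset1")

  # Evaluate CODE and cache the result under the name 'doubled'; listing
  # 'dataset1' in `depends` forces re-caching when that object changes.
  ProjectTemplate::cache("doubled", depends = "dataset1", CODE = {
    dataset1 * 2
  })

  # With no arguments, cache() reports the current status of the cache.
  ProjectTemplate::cache()
}
```

Wrapping the calls in a function keeps the working-directory change local, since `on.exit()` restores the previous directory even if a call fails.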
This function will cache all of the data sets that were loaded by the load.project function in a binary format that is easier to load quickly. This is particularly useful for data sets that you've modified during a slow munging process that does not need to be repeated.
cache.project()
No value is returned; this function is called for its side effects.
create.project, load.project, get.project, show.project
library('ProjectTemplate')
## Not run: 
load.project()
cache.project()
## End(Not run)
This function removes specific (or all by default) named objects from the global environment. If used within a ProjectTemplate project, then any variables defined in config$sticky_variables will remain.
clear(..., keep = c(), force = FALSE)
... |
A sequence of character strings of the objects to
be removed from the global environment. If none given, then all items except
those in |
keep |
A character vector of variables that should remain in the global environment |
force |
If |
The variables kept and removed are reported.
library('ProjectTemplate')
## Not run: 
clear("x", "y", "z")
clear(keep="a")
clear()
## End(Not run)
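The selection rule described above (remove everything except the kept and sticky variables) can be sketched in base R on a private environment. This is only an illustration of the idea, not the package's implementation; `clear_sketch`, its `sticky` argument and the variable names are hypothetical, and the handling of `force` is an assumption based on the sticky_variables description.

```r
# Sketch of the selection rule on a private environment: remove every
# object except those named in `keep` and, unless forced, in `sticky`.
clear_sketch <- function(env, keep = character(), sticky = character(),
                         force = FALSE) {
  protected <- if (force) keep else union(keep, sticky)
  to_remove <- setdiff(ls(env), protected)
  rm(list = to_remove, envir = env)
  invisible(to_remove)
}

demo_env <- new.env()
assign("a", 1, envir = demo_env)
assign("big_data", 2, envir = demo_env)
assign("x", 3, envir = demo_env)

# Only 'x' is removed: 'a' is kept explicitly and 'big_data' is sticky.
removed <- clear_sketch(demo_env, keep = "a", sticky = "big_data")
```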
This function removes specific (or all by default) named data sets from the cache directory. This will force that data to be read in from the data directory the next time load.project is called.
clear.cache(...)
... |
A sequence of character strings of the variables to be removed from the cache. If none given, then all items in the cache will be removed. |
Success or failure is reported.
library('ProjectTemplate')
## Not run: 
clear.cache("x", "y", "z")
## End(Not run)
This function will create all of the scaffolding for a new project. It will set up all of the relevant directories and their initial contents. For those who only want the minimal functionality, the template argument can be set to minimal to create a subset of ProjectTemplate's default directories. For those who want to dump all of ProjectTemplate's functionality into a directory for extensive customization, the dump argument can be set to TRUE.
create.project(
  project.name = "new-project",
  template = "full",
  dump = FALSE,
  merge.strategy = c("require.empty", "allow.non.conflict"),
  rstudio.project = FALSE
)
project.name |
A character vector containing the name for this new project. Must be a valid directory name for your file system. |
template |
A character vector containing the name of the template to
use for this project. By default a |
dump |
A boolean value indicating whether the entire functionality of ProjectTemplate should be written out to flat files in the current project. |
merge.strategy |
What should happen if the target directory exists and
is not empty?
If |
rstudio.project |
A boolean value indicating whether the project should
also be an 'RStudio Project'. Defaults to |
If the target directory does not exist, it is created. Otherwise, it can only contain files and directories allowed by the merge strategy.
No value is returned; this function is called for its side effects.
load.project, get.project, cache.project, show.project
library('ProjectTemplate')
## Not run: create.project('MyProject')
This function writes a skeleton directory structure for creating your own custom templates.
create.template(target, source = "minimal")
target |
Name of the new template. It is created under the directory
specified by |
source |
Name of an existing template to copy, defaults to the built in 'minimal' template. |
This function will return all of the information that ProjectTemplate has about the current project. This information is gathered when load.project is called. At present, ProjectTemplate keeps a record of the project's configuration settings, all packages that were loaded automatically and all of the data sets that were loaded automatically. The information about autoloaded data sets is used by the cache.project function.
get.project()
In previous releases this information has been available through the global variable project.info. Using this variable is now deprecated and will result in a warning.
A named list.
create.project, load.project, cache.project, show.project
library('ProjectTemplate')
## Not run: 
load.project()
get.project()
## End(Not run)
This function produces a data.frame of all data files in the project, with metadata on whether and how each file will be loaded by load.project.
list.data(...)
... |
Named arguments to override configuration from
|
The returned data.frame contains the following variables, with one observation per file in data/:
filename |
Character variable containing the filename relative
to data/ directory. |
varname |
Character variable containing the name of the variable into which the file will be imported. * |
is_ignored |
Logical variable that indicates whether the file
is ignored through the data_ignore option in the configuration. |
is_directory |
Logical variable that indicates whether the file is a directory. |
is_cached |
Logical variable that indicates whether the file is
already available in the cache/ directory. |
cached_only |
Logical variable that indicates whether the
variable is only available in the cache/ directory. This occurs
when calling the cache function with a code fragment in a munge script.
|
reader |
Character variable containing the name of the reader
function that will be used to load the data. Contains a
character(0) if no suitable reader was found.
|
* Note that some readers return more than one variable, usually with the listed variable name as prefix. This is true, for example, for the xls.reader and xlsx.reader.
A data.frame listing the available data, with relevant metadata.
load.project, show.project, project.config
library('ProjectTemplate')
## Not run: list.data()
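Since the result is an ordinary data.frame, it can be filtered with base R. The sketch below uses a hand-built stand-in with hypothetical rows, because a real call to list.data() requires a ProjectTemplate project; only the column names are taken from the description above.

```r
# Hypothetical stand-in for the result of list.data(); the rows are made up,
# the column names follow the documented variables.
files <- data.frame(
  filename     = c("sales.csv", "notes.txt", "archive/"),
  varname      = c("sales", "notes", "archive"),
  is_ignored   = c(FALSE, TRUE, FALSE),
  is_directory = c(FALSE, FALSE, TRUE),
  is_cached    = c(FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Files that are neither ignored, nor directories, nor already cached.
pending <- subset(files, !is_ignored & !is_directory & !is_cached)
pending$filename
```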
This function automatically loads all of the data and packages used by the project from which it is called. The behavior can be controlled by adjusting the project.config configuration.
load.project(...)
... |
Named arguments to override configuration from |
... can take an argument override.config or a single named list for backward compatibility. This cannot be mixed with the new style override. When a named argument override.config is present it takes precedence over the other options. If any of the provided arguments is unnamed an error is raised.
No value is returned; this function is called for its side effects.
create.project, get.project, cache.project, show.project, project.config
library('ProjectTemplate')
## Not run: load.project()
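Named arguments make it possible to override individual configuration options for a single call. In the sketch below, `load_raw_project` and `project_dir` are hypothetical names; the overridden options `munging` and `cache_loading` are taken from the project.config option list, and the helper assumes that `project_dir` contains a ProjectTemplate project.

```r
# Hypothetical helper: loads the project in `project_dir` while skipping the
# munge scripts and the cache for this call only.
load_raw_project <- function(project_dir) {
  old_wd <- setwd(project_dir)
  on.exit(setwd(old_wd), add = TRUE)
  ProjectTemplate::load.project(munging = FALSE, cache_loading = FALSE)
}
```

The settings stored in the project's configuration file are left unchanged; only this invocation uses the overridden values.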
This function automatically performs all necessary steps to migrate an existing project so that it is compatible with this version of ProjectTemplate.
migrate.project()
No value is returned; this function is called for its side effects.
library('ProjectTemplate')
## Not run: migrate.project()
This function updates a skeleton project to the current version of ProjectTemplate.
migrate.template(template)
template |
Name of the template to upgrade. |
Every ProjectTemplate project has a configuration file found at config/global.dcf that contains various options that can be tweaked to control runtime behavior. The valid options are shown below, and must be encoded using the DCF format.
project.config()
Calling the project.config() function will display the current project configuration.
The options that can be configured in config/global.dcf are shown below:
data_loading |
This can be set to TRUE or FALSE. If data_loading is on, the system will load data from both the cache and data directories with cache taking precedence in the case of name conflict. |
data_loading_header |
This can be set to TRUE or FALSE. If data_loading_header is on, the system will load text data files, such as CSV, TSV, or XLSX, treating the first row as header. |
data_ignore |
A comma separated list of files to be ignored when importing
from the data/ directory. Regular expressions can be used but should be delimited
(on both sides) by / . Note that filenames and filepaths should never begin with
a / , entire directories under data/ can be ignored by adding a trailing / . |
cache_loading |
This can be set to TRUE or FALSE. If cache_loading is on, the system will load data from the cache directory before any attempt to load from the data directory. |
recursive_loading |
This can be set to TRUE or FALSE. If recursive_loading is on, the system will load data from the data directory and all its sub directories recursively. |
munging |
This can be set to TRUE or FALSE. If munging is on, the system will execute the files in the munge directory sequentially using the order implied by the sort() function. If munging is FALSE, none of the files in the munge directory will be executed. |
logging |
This can be set to TRUE or FALSE. If logging is on, a logger object using the log4r package is automatically created when you run load.project(). This logger will write to the logs directory. |
logging_level |
The value of logging_level is passed to a logger object using the log4r package during logging when you run load.project(). |
load_libraries |
This can be set to TRUE or FALSE. If load_libraries is on, the system will load all of the R packages listed in the libraries field described below. |
libraries |
This is a comma separated list of all the R packages that the user wants to automatically load when load.project() is called. These packages must already be installed before calling load.project(). |
as_factors |
This can be set to TRUE or FALSE. If as_factors is on, the system will convert every character vector into a factor when creating data frames; most importantly, this automatic conversion occurs when reading in data automatically. If FALSE, character vectors will remain character vectors. |
tables_type |
This is the format for default tables. Values can be 'tibble' (default), 'data_table', or 'data_frame'. |
attach_internal_libraries |
This can be set to TRUE or FALSE. If attach_internal_libraries is on, then every time a new package is loaded into memory during load.project(), a warning will be displayed to report that this has happened. |
cache_loaded_data |
This can be set to TRUE or FALSE. If cache_loaded_data is on, then data loaded from the data directory during load.project() will be automatically cached (so it won't need to be reloaded next time load.project() is called). |
sticky_variables |
This is a comma separated list of any project-specific variables that should remain in the global environment after a clear() command. This can be used to clear the global environment, but keep any large datasets in place so they are not unnecessarily re-generated during load.project(). Note that this will be overridden if the force=TRUE parameter is passed to clear(). |
underscore_variables |
This can be set to TRUE to use
underscores ('_') in variable names or FALSE to replace underscores
('_') with dots ('.'). The default is TRUE . When migrating old
projects, underscore_variables is set to FALSE . |
cache_file_format |
The default file format for cached data is 'RData'. This can be set to 'qs' in order to benefit from the quick serialization of R objects provided by qs. |
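To make the DCF encoding concrete, the following sketch writes a small configuration fragment to a temporary file and reads it back with base R's `read.dcf()`. The file location and the chosen values are illustrative assumptions; a real project keeps this file at config/global.dcf and reads it through the package.

```r
# Write a small DCF fragment: one "key: value" pair per line.
cfg_file <- tempfile(fileext = ".dcf")
writeLines(c(
  "data_loading: TRUE",
  "munging: TRUE",
  "libraries: reshape2, plyr",
  "cache_file_format: RData"
), cfg_file)

# read.dcf() returns a character matrix with one column per key.
cfg <- as.list(read.dcf(cfg_file)[1, ])

# Comma separated options such as 'libraries' can be split into a vector.
libs <- strsplit(cfg$libraries, "\\s*,\\s*")[[1]]
```

Note that every value is read as a character string, so `cfg$data_loading` is `"TRUE"` rather than a logical.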
If config/global.dcf is missing some items (for example, because it was created under an old version of ProjectTemplate), then the following configuration is used for any missing items during load.project():
data_loading | TRUE |
data_loading_header | TRUE |
data_ignore | |
cache_loading | TRUE |
recursive_loading | FALSE |
munging | TRUE |
logging | FALSE |
logging_level | INFO |
load_libraries | FALSE |
libraries | reshape2, plyr, tidyverse, stringr, lubridate |
as_factors | FALSE |
tables_type | tibble |
attach_internal_libraries | TRUE |
cache_loaded_data | FALSE |
sticky_variables | NONE |
underscore_variables | FALSE |
cache_file_format | RData |
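The fallback behavior for missing items can be pictured as merging the loaded settings over a list of defaults. The sketch below uses base R's `modifyList()` on a shortened, hypothetical set of options; it illustrates the idea and is not the package's actual implementation.

```r
# A few of the fallback values listed above (shortened for illustration).
defaults <- list(
  data_loading      = "TRUE",
  munging           = "TRUE",
  logging           = "FALSE",
  cache_file_format = "RData"
)

# Hypothetical settings read from an older config file that lacks some keys.
loaded <- list(munging = "FALSE")

# Values present in `loaded` win; everything else falls back to `defaults`.
config_sketch <- modifyList(defaults, loaded)
```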
When a new project is created using create.project(), the following values are pre-populated:
version | 0.11.0 |
data_loading | TRUE |
data_loading_header | TRUE |
data_ignore | |
cache_loading | TRUE |
recursive_loading | FALSE |
munging | TRUE |
logging | FALSE |
logging_level | INFO |
load_libraries | FALSE |
libraries | reshape2, plyr, tidyverse, stringr, lubridate |
as_factors | FALSE |
tables_type | tibble |
attach_internal_libraries | FALSE |
cache_loaded_data | TRUE |
sticky_variables | NONE |
underscore_variables | TRUE |
cache_file_format | RData |
The current project configuration is displayed.
This function will clear the global environment and reload a project. This is useful when you've updated your data sets or changed your preprocessing scripts. Any variables listed in the sticky_variables configuration parameter in project.config will remain both in memory and (if present) in the cache by default. If the reset parameter is TRUE, then all variables are cleared from both the global environment and the cache.
reload.project(..., reset = FALSE)
... |
Optional parameters passed to |
reset |
A boolean value, which if set |
No value is returned; this function is called for its side effects.
library('ProjectTemplate')
## Not run: 
load.project()
reload.project()
## End(Not run)
This function will require the given package. If the package is not installed, it will stop execution and print a message telling the user which package to install and which function caused the error.
require.package(package.name, attach = TRUE)
package.name |
A character vector containing the package name. Must be a valid package name installed on the system. |
attach |
Should the package be attached to the search path (as with
|
The function .require.package is called by internal code. It will attach the package to the search path (with a warning) only if the compatibility configuration attach_internal_libraries is set to TRUE. Normally, packages used for loading data are not needed on the search path, but not loading them might break existing code. In a forthcoming version this compatibility setting will be removed, and no packages will be attached to the search path by internal code.
No value is returned; this function is called for its side effects.
library('ProjectTemplate')
## Not run: require.package('PackageName')
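The check-then-stop pattern described above can be sketched in base R as follows. `require_or_stop` and its `caller` argument are hypothetical names, and the message wording is made up; this is an illustration of the pattern rather than the package's own code.

```r
# Stop with an informative message if a package is not installed.
require_or_stop <- function(package.name, caller = "this function") {
  if (!requireNamespace(package.name, quietly = TRUE)) {
    stop(
      sprintf("%s requires package '%s'. Please install it.",
              caller, package.name),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# 'stats' ships with R, so this call succeeds silently.
require_or_stop("stats", caller = "my.reader")
```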
This function will run each of the analyses in the src directory in separate processes. At present, this is done serially, but future versions of this function will provide a means of running the analyses in parallel.
run.project()
No value is returned; this function is called for its side effects.
library('ProjectTemplate')
## Not run: run.project()
This function will show the user all of the information that ProjectTemplate has about the current project. This information is gathered when load.project is called. At present, ProjectTemplate keeps a record of the project's configuration settings, all packages that were loaded automatically and all of the data sets that were loaded automatically. The information about autoloaded data sets is used by the cache.project function.
show.project()
No value is returned; this function is called for its side effects.
create.project, load.project, get.project, cache.project
library('ProjectTemplate')
## Not run: 
load.project()
show.project()
## End(Not run)
This function will parse all of the functions defined in files inside of the lib directory and will generate a trivial unit test for each function. The resulting tests are stored in the file tests/autogenerated.R. Every test is expected to fail by default, so you should edit them before calling test.project.
stub.tests()
No value is returned; this function is called for its side effects.
library('ProjectTemplate')
## Not run: stub.tests()
This function will run all of the testthat style unit tests for the current project that are defined inside of the tests directory. The tests will be run in the order defined by the filenames for the tests: it is recommended that each test filename begin with a number specifying its position in the sequence.
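When numbering files, zero-padding matters, because filenames are compared as text rather than as numbers. The sketch below assumes a `sort()`-style ordering of filenames, as described for munge scripts in the configuration options; the filenames themselves are hypothetical.

```r
# Hypothetical test filenames, with and without zero-padded prefixes.
unpadded <- c("1-setup.R", "2-models.R", "10-report.R")
padded   <- c("01-setup.R", "02-models.R", "10-report.R")

# Text comparison puts "10-report.R" ahead of "2-models.R".
sorted_unpadded <- sort(unpadded)

# With zero-padding the text order matches the intended numeric order.
sorted_padded <- sort(padded)
```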
test.project()
No value is returned; this function is called for its side effects.
library('ProjectTemplate')
## Not run: 
load.project()
test.project()
## End(Not run)
This function will read a DCF file and translate the resulting data frame into a list. The DCF format is used throughout ProjectTemplate for configuration settings and ad hoc file format specifications.
translate.dcf(filename)
filename |
A character vector specifying the DCF file to be translated. |
The contents of the DCF file are stored as character strings. If the content is placed between backtick characters (`), then the content is evaluated as R code and the result is returned as a string.
Returns a list containing the entries from the DCF file.
library('ProjectTemplate')
## Not run: translate.dcf(file.path('config', 'global.dcf'))
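The backtick rule for DCF values can be illustrated with a small base R sketch. `eval_backticks` is a hypothetical helper written for this example; it mimics the described behavior for a single value and is not the code used by translate.dcf.

```r
# Evaluate a value as R code if it is wrapped in backticks; otherwise
# return it unchanged. The result is always a character string.
eval_backticks <- function(value) {
  if (grepl("^`.*`$", value)) {
    code <- substr(value, 2, nchar(value) - 1)
    as.character(eval(parse(text = code)))
  } else {
    value
  }
}

evaluated <- eval_backticks("`1 + 1`")  # code between backticks is evaluated
literal   <- eval_backticks("TRUE")     # plain values stay as strings
```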