The datscience (data analysis and science) R package contains functions that are frequently needed when preparing data for publication.

The overall goal is to improve the data analysis workflow and to help with the formatting challenges I encountered while preparing submissions to scientific journals, for example getting statistics from R into MS Word in the right format.

### Installation

You can install the latest released version of datscience directly from GitHub with:

# Normal installation
install.packages("devtools")
devtools::install_github("Buedenbender/datscience")

My recommendation is to use the pacman package manager instead, as it installs the latest version from GitHub and loads it directly.

# Recommendation: pacman
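
A minimal sketch of the pacman route, assuming pacman's `p_load_gh()` function, which installs a package from GitHub if needed and then loads it. The exact call the author uses may differ.

```r
# Assumption: pacman::p_load_gh() is used for the GitHub install-and-load step.
# Install pacman itself first if it is not available yet.
if (!requireNamespace("pacman", quietly = TRUE)) install.packages("pacman")

# Install datscience from GitHub (if missing) and attach it in one call.
pacman::p_load_gh("Buedenbender/datscience")
```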

#### Installation Troubleshoot

Some users might encounter the error (System Error 267 @win/processx.c:1040), which is caused by special characters (e.g., ö or é) in the user name and therefore in the directory paths. In this case, try installing datscience with the remotes package in standalone mode from a fresh R session with no packages loaded (see below; for reference, see this Stack Overflow post).

Sys.setenv(R_REMOTES_STANDALONE="true")
remotes::install_github("Buedenbender/datscience")

### A Teaser of datscience Functionality: flex_table1()

R offers the power to conduct almost any analysis one can imagine, but I often struggled to transfer the analysis or its results from an R session into MS Word.

Example problem: get a nicely formatted sociodemographic Table 1 (in accordance with the APA 7th edition publication manual) directly into a Word file (*.docx).

As of March 2022, the new function datscience::flex_table1() makes creating the sociodemographic Table 1 (including the statistical comparisons of subsamples) basically a piece of cake. Take a look at the new article vignette("flex_table1"). For the example, we took the popular iris data set and added a simulated categorical variable called Color that contains either “Blue” or “Orange”. We supply the function with a formula that determines which variables are included in the table. Here we include the two metric variables Sepal.Length and Sepal.Width as well as the simulated Color, with Species as the grouping variable.
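
The construction of the simulated data set is not shown, so here is a minimal base R sketch of how `iris_sim` could be built. The random assignment, the equal probabilities, and the seed are assumptions made for illustration, not the author's actual simulation.

```r
# Assumption: Color is assigned completely at random; the seed is arbitrary
# and only makes the illustration reproducible.
set.seed(42)

iris_sim <- iris
iris_sim$Color <- factor(
  sample(c("Blue", "Orange"), size = nrow(iris_sim), replace = TRUE),
  levels = c("Blue", "Orange")
)

# Cross-tabulate the simulated variable against the grouping variable.
table(iris_sim$Species, iris_sim$Color)
```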

str_formula <- "~ Sepal.Length + Sepal.Width + Color | Species"
flex_table1(str_formula, data = iris_sim, overall = "Overall") # |>
# save_flextable("Table1.docx")

Uncomment the pipe operator |> above and the line following the call to flex_table1() to directly save this nicely formatted tabular comparison as a Word (.docx) document.

### Further Examples of datscience Functionality

Below are just a few examples of the package's functionality.

#### The apa_corrTable() Function

The datscience::apa_corrTable() function displays correlations with marked significance and adds descriptive statistics to the table, see below:

Screenshot of “CorrelationTable_iris.docx”
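
To make the idea of "marked significance" concrete, here is a minimal base R sketch for a single pair of variables. The helper `cor_with_stars()` is hypothetical and not part of datscience, and the star thresholds (p < .001, p < .01, p < .05) follow the common convention, which is assumed here rather than taken from the package.

```r
# Hypothetical helper (not a datscience function): Pearson correlation of two
# numeric vectors, rounded to two decimals and suffixed with significance stars.
cor_with_stars <- function(x, y) {
  test <- cor.test(x, y) # Pearson correlation including its p-value

  # Assumed thresholds: *** p < .001, ** p < .01, * p < .05
  stars <- as.character(cut(
    test$p.value,
    breaks = c(-Inf, 0.001, 0.01, 0.05, Inf),
    labels = c("***", "**", "*", ""),
    right = FALSE
  ))

  paste0(formatC(unname(test$estimate), format = "f", digits = 2), stars)
}

cor_with_stars(iris$Sepal.Length, iris$Petal.Length)
#> [1] "0.87***"
cor_with_stars(iris$Sepal.Length, iris$Sepal.Width)
#> [1] "-0.12"
```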

apa_corrTable() revolves around three other useful functions from this package and works in four steps.

1. Creates the correlation table by calling datscience::corstars() [1].

datscience::corstars(iris[1:4])
#>              Sepal.Length Sepal.Width Petal.Length
#> Sepal.Length
#> Sepal.Width     -0.12
#> Petal.Length     0.87***    -0.43***
#> Petal.Width      0.82***    -0.37***      0.96***

2. Appends the desired summary statistics to the flextable.

3. Formats the flextable::flextable() object in APA 7th style by calling the format_flextable() function. To illustrate the function, we use it here to display the first 5 rows of the iris data set.

datscience::format_flextable(
  flextable::flextable(head(iris, 5)),
  table_caption = c("Table 2", "Illustrating Functionality of format_flextable()")
)

4. Saves the result with the datscience::save_flextable() function. This safely writes the flextable object to a Word (.docx) file, i.e., it prevents existing files from being overwritten by serializing the file names.
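
The overwrite protection in step 4 can be pictured with a small base R sketch. The helper `next_free_path()` and its numbering scheme (`_001`, `_002`, ...) are hypothetical and only illustrate the idea of serialized file names; save_flextable() may name files differently.

```r
# Hypothetical helper (not the datscience implementation): return `path` if it
# is unused, otherwise the first numbered variant that does not exist yet.
next_free_path <- function(path) {
  if (!file.exists(path)) {
    return(path)
  }
  stem <- tools::file_path_sans_ext(path)
  ext <- tools::file_ext(path)
  suffix <- if (nzchar(ext)) paste0(".", ext) else ""

  i <- 1
  repeat {
    candidate <- sprintf("%s_%03d%s", stem, i, suffix)
    if (!file.exists(candidate)) {
      return(candidate)
    }
    i <- i + 1
  }
}

# Usage: work in a fresh temporary directory so no real file is touched.
demo_dir <- tempfile("tables_")
dir.create(demo_dir)
first_path <- file.path(demo_dir, "Table1.docx")

file.create(first_path) # pretend Table1.docx was already written
second_path <- next_free_path(first_path)
basename(second_path)
```

Returning a new path instead of writing the file keeps the naming logic separate from the export itself, so it could sit in front of any function that writes to disk.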

#### The format_flextable() Function

One of the most frequently used functions in the package is datscience::format_flextable(), which takes a flextable object and applies the APA 7th edition theme to it. It also provides a workaround for adding an APA-ready table caption and a note.

Note: The formatting code (theme) of the format_flextable() function was inspired by the blog post by Rémi Thériault.

The flextable package is very versatile and was exactly what I was looking for to get nicely formatted tables directly from R (RStudio) into Word. The datscience::format_flextable() function inherits this flexibility: it just applies the repetitive formatting needed to convert a flextable into a “publication ready” APA-formatted table.

One example of this flexibility is printing the factor loadings from a principal component analysis (PCA, psych::principal()).

# Creation of an example principal component analysis
library(psych) # principal(), fa.sort()
library(dplyr) # mutate(), across(), if_else(), bind_cols()

pc <- principal(Harman74.cor$cov, 4, rotate = "varimax")

# Sort the loadings, blank out values below 0.3, and append item-level statistics
pc_loadings <- pc$loadings |>
  fa.sort() |>
  round(3) |>
  unclass() |>
  as.data.frame() |>
  mutate(across(
    everything(),
    ~ if_else((. < 0.3), "", as.character(.))
  )) |>
  bind_cols(
    Communality = pc$communality, Uniqueness = pc$uniquenesses,
    Complexity = pc$complexity
  ) |>
  mutate(across(where(is.numeric), ~ round(.x, 2))) |>
  tibble::rownames_to_column("items")