Why your code doesn’t reproduce: Lessons learned from Meta-Psychology

Lucija Batinovic & Rickard Carlsson
Meta-Psychology
November 16, 2023

Agenda

Meta-Psychology computational reproducibility assessment
Common errors in computational reproducibility
Best practices for reproducible analyses
Discussion

Introduction to reproducibility

Transparency is a way out of the replication crisis

Yet, transparency only allows assessing reliability of research findings
Much focus has been given to replication (i.e., same analysis on new data) or robustness (i.e., new analyses same data)
Less attention to the most basic:
- keeping data, analyses and reporting free from errors
Computational reproducibility checks are still rare

Meta-Psychology

Meta-Psychology pioneering in this area (#1 on TOP FACTOR)
Meta-Psychology is a community run journal
No APC (Diamond OA)
Publishes anything “meta” in psychology:
- reviews, meta-analyses, commentaries, replications.
This talk is our 5 years experience with computational reproducibility at MP

Computational Reproducibility Check

Overview

47 articles reproduced since journal’s inception in 2017
papers reproduced in 2022: 13
- papers reproduced without any errors: 2 (with one being simulation study that required running code with fewer iterations)
- errors in code: 8 (e.g., missing library calls, problems knitting reports, container-emulating packages, deprecated functions)
- errors in reported results: 6 (e.g., wrong sample size, mean, plot output discrepancy, copy paste error, package update changed output)

Reproducibility Checklist

Common Errors in Computational Reproducibility

Most common errors we encountered happened because of:

Renaming files
Functions that were not working as they should
- Deprecated functions
Hard-coding file paths
Older software versions

Functions Not Working

Common reasons why a function may not run properly:

Incorrect input arguments or parameters
Undefined or out-of-scope variables
Missing or outdated dependencies

# Example function call without loading library
my_function()

`Error in my_function() : could not find function "my_function"`

Deprecated functions

mydata <- read.csv("myfile.csv", header=TRUE)

`Warning message: In read.csv("myfile.csv", header = TRUE) :
   'read.csv' is deprecated: use 'read.csv2' instead`

Improper syntax or formatting errors
Logical errors or incorrect algorithmic design

Hard-Coding File Paths

hard-coding the absolute path to the file will cause issues when running the code in a different environment

# Hard-coded file path
data_file <- "C:/Users/username/Documents/data.csv"

# Load data from file
data <- read.csv(data_file)

relative paths allow reproducing code across computers and operating systems

# load data from a directory in the project 
data_file <- "data/data.csv"

Software Versions

Not stating which version of the software was used can cause troubles while reproducing the results in different environments

Error: package 'dplyr' requires R version 4.0.0 or later

on the other hand, some packages may stop being maintained and newer versions of software can cause issues with functions

Warning: package 'plyr' was built under R version 3.6.3

Reasons for non-reproducible Results

Most common reasons for non-reproducible results

copy-paste errors
wrong rounding
simulation based analyses

Setting Seed for Simulations

bootstrapping confidence intervals
simulating datasets

head(rnorm(100, mean = 0, sd = 1))

[1] -0.86943091 -1.02729501  0.15032230 -1.48502696 -0.15172372  0.07539084

set.seed(1234)
head(rnorm(100, mean = 0, sd = 1))

[1] -1.2070657  0.2774292  1.0844412 -2.3456977  0.4291247  0.5060559

Wrong Rounding/Copy-Paste Errors

data <- iris
mean(iris$Sepal.Width)

[1] 3.057333

round(mean(iris$Sepal.Width))

[1] 3

round(mean(iris$Sepal.Width), 2)

[1] 3.06

data <- iris
# Hard-coded result
cat("The mean value is 5.")

The mean value is 5.

# Extracted from code
cat(paste0("The mean value is ", round(mean(iris$Sepal.Length), 2), "."))

The mean value is 5.84.

Best practices

Importance of structure

Psych-DS
creating projects allows using relative paths
projects can be easily sent to and managed by other people
hierarchical project structure makes the analysis process clearer for the third party

my_project/
├── README.txt
├── codebook.txt
├── data/
│   ├── raw/
│   │   ├── data_file_1.csv
│   │   ├── data_file_2.csv
│   │   └── ...
│   ├── processed/
│   │   ├── data_file_1_cleaned.csv
│   │   ├── data_file_2_cleaned.csv
│   │   └── ...
│   └── ...
├── docs/
│   ├── report.pdf
│   ├── manuscript.Rmd
│   └── ...
├── src/
│   ├── data_cleaning.R
│   ├── data_analysis.R
│   └── ...
└── ...

File naming conventions

consistency: keep capitalization consistent, as well as use of punctuation, avoid special characters
clarity: names should be descriptive and clear, even if it means it will be longer
chronology: names should indicate the correct order of steps

RMarkdown/Quarto

compiling reports that contain the code and results in one document allows a simpler workflow and minimizes errors caused by copy-pasting
papaja package in R

README and codebook files

README

Computing environment; package version and time necessary to compute (show reproducibility checklists)
Project description: A brief description of the project and its purpose.
Installation instructions: Step-by-step instructions on how to install and set up the project. This should include any dependencies or prerequisites required to run the project

Codebook

Overview of all variables: name and description of each variable
codebook

library(codebook)
new_codebook_rmd(codebook)

Containerization

Docker
Codeocean
R packages (e.g., renv)

Point-and-click alternative

open source: JASP or JAMOVI
jamovi files can be saved with analyses and edited data in a transparent way *Note: although .omv (jamovi) files include datasets, always provide raw .tsv/.csv files separately

Power of AI

with the rise of ChatGPT and LLM-based apps accessible to public, there is no reason why provided materials cannot be well documented

Examples of use that can save time and promote reproducibility (ideas provided by ChatGPT):

Number	Task	Description
1	Drafting Detailed Documentation	ChatGPT can assist in writing comprehensive documentation that covers all aspects of the code including installation, usage, and troubleshooting.
2	Generating README Files	It can create README files that provide a clear overview of the project, setup instructions, and usage guidelines.
3	Preparing Installation Guides	ChatGPT can write detailed installation guides to ensure users can recreate the necessary environment to run the code.
4	Automating Responses to Common Issues	It can draft standard responses for common issues or questions for inclusion in an FAQ section.
5	Creating Example Code Snippets	ChatGPT can generate code snippets to demonstrate the usage of different functions or classes.
6	Writing Test Scripts	It can assist in writing test scripts to verify the functionality of different parts of the code.
7	Developing Code Annotations	ChatGPT can help annotate code, providing explanations for complex sections and clarifying the logic behind certain decisions.
8	Assisting with Dependency Lists	It can help compile and explain a list of dependencies, ensuring all necessary packages are included.
9	Formatting and Linting Code	ChatGPT can offer guidance on code formatting and linting to ensure the code adheres to standard practices.
10	Creating Data Preprocessing Scripts	For projects involving data, it can help write scripts for data cleaning and preprocessing.
11	Drafting Contribution Guidelines	ChatGPT can formulate guidelines for how others can contribute to the project.
12	Version Control Guidance	It can provide tips on using version control systems effectively for tracking changes and ensuring reproducibility.
13	Generating Virtual Environment Setup Instructions	ChatGPT can guide the creation of instructions for setting up virtual environments.
14	Assisting with Continuous Integration Setup	It can offer insights into setting up CI/CD pipelines for automated testing and consistency across environments.
15	Advising on Licensing Issues	ChatGPT can provide information on different software licenses to help authors choose the right one for their code.