Past Workshops

The SCS tries to offer a number of workshops each year, with some workshops offered every semester. Let us know if you have a suggestion for a workshop that you would like to see offered at UConn!

Workshops on Basic Statistical Methods and Their Interpretations

Exploratory Data Analysis and Visualization in R

Exploratory Data Analysis (EDA) is a critical first step in data analysis. Here are some reasons why we should use EDA:
  • Detecting mistakes and cleaning the data.
  • Shedding light on the preliminary selection of appropriate analysis methods.
  • Exploring relationships among predictors and outcome variables.

This workshop will introduce several useful exploratory data analysis methods and visualization tools in R. Participants can apply these methods and tools using an insurance claim dataset we provide. No experience with coding or the R language is required.
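
As a small taste of the material, here is a minimal sketch of typical EDA steps in R. It uses the mpg dataset bundled with ggplot2 rather than the workshop's insurance claim data, which is not reproduced here.

  # First-pass EDA on a built-in dataset (a stand-in for the workshop's data)
  library(ggplot2)

  data(mpg)     # fuel economy data shipped with ggplot2
  str(mpg)      # variable types: a first check for data entry mistakes
  summary(mpg)  # ranges, quartiles, and missing values

  # Visualize the relationship between a predictor and an outcome
  ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
    geom_point() +
    labs(x = "Engine displacement (L)", y = "Highway miles per gallon")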

Power Analysis

Power analysis is a statistical method for determining whether a sample size is sufficiently large to detect a treatment effect at a given significance level. This workshop will introduce the fundamentals of power analysis and cover three different designs: the independent two-sample t-test, one-way ANOVA, and multiple regression. For the power analyses during the workshop, the G*Power software will be introduced.
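
For readers who prefer to script such calculations, the same designs can be explored with base R's power functions. The effect sizes below are made-up illustrations, and the workshop itself demonstrates G*Power rather than R.

  # Sample size per group for an independent two-sample t-test:
  # detect a mean difference of 0.5 SD at alpha = 0.05 with 80% power.
  power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80,
               type = "two.sample")

  # Power of a one-way ANOVA with 4 groups of 20 subjects each,
  # assuming between-group variance 1 and within-group variance 3.
  power.anova.test(groups = 4, n = 20, between.var = 1, within.var = 3)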

 

A Math-Free Workshop for Power Calculation

Sample size calculation plays an important role in study design and grant proposal preparation. An overview of power analysis will be given, and two case studies will be presented to demonstrate the magic of power calculation for a survival model and a linear mixed effects model using SAS and R. A small simulation-based sketch in R follows the outline below.

Outline:

  1. Basic elements of power analysis
  2. Design of a survival study based on the Log-Rank test
  3. The linear mixed effects model: a path to designing a longitudinal study
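
To give a flavor of item 2, the sketch below estimates the power of a log-rank test by simulation in R. The exponential event rates, censoring time, and sample size are illustrative assumptions, not the workshop's case study.

  library(survival)

  # Estimate power of the log-rank test by simulating many trials
  power_logrank <- function(n_per_arm, hazard_ratio, n_sim = 1000) {
    set.seed(123)
    rejections <- replicate(n_sim, {
      # Exponential event times; the treatment arm has a lower hazard
      time <- c(rexp(n_per_arm, rate = 0.10),
                rexp(n_per_arm, rate = 0.10 * hazard_ratio))
      group <- rep(c("control", "treatment"), each = n_per_arm)
      status <- as.numeric(time <= 24)  # administrative censoring at t = 24
      time <- pmin(time, 24)
      fit <- survdiff(Surv(time, status) ~ group)
      pchisq(fit$chisq, df = 1, lower.tail = FALSE) < 0.05
    })
    mean(rejections)  # proportion of simulated trials that reject H0
  }

  power_logrank(n_per_arm = 100, hazard_ratio = 0.6)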

Multiplicity Adjustment

Controlling the probability of falsely rejecting the null hypothesis is critical when there are multiple, simultaneous hypotheses. The most common approach is to control the family-wise error rate (FWER), which keeps the probability of falsely rejecting at least one of the hypotheses within a desired level.

As the number of hypotheses to be tested grows large, the Bonferroni correction becomes too conservative and lacks power. This led to the introduction of the False Discovery Rate (FDR), defined as the expected proportion of falsely rejected hypotheses among all rejected hypotheses.

Several modern stepwise methods controlling the FDR have been proposed to increase power in the presence of many hypotheses. We give an overview of classical and modern multiplicity adjustment methods, as well as how to run the procedures in R.
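
As a minimal preview, base R's p.adjust() already implements several of these procedures; the raw p-values below are made up for illustration.

  p_raw <- c(0.001, 0.008, 0.012, 0.041, 0.049, 0.320, 0.610)

  p.adjust(p_raw, method = "bonferroni")  # classical FWER control; conservative
  p.adjust(p_raw, method = "holm")        # stepwise FWER control; more powerful
  p.adjust(p_raw, method = "BH")          # Benjamini-Hochberg; controls FDR

  # Reject hypotheses whose BH-adjusted p-value is below the desired FDR level
  which(p.adjust(p_raw, method = "BH") < 0.05)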

Modern Statistical Methods for Testing Multiple Hypotheses

If the brain cells of a dead fish are tested, can we conclude whether it is dead? While many assume that running multiple tests better supports a claim, it turns out that doing so can instead lead to false conclusions; hence the name "multiplicity problem". An overview of classical and modern multiplicity adjustment methods for controlling the Family-wise Error Rate (FWER) and the False Discovery Rate (FDR) will be given. Software details in R will be demonstrated, and real examples and applications will be presented; a sketch of one gatekeeping procedure follows the outline below. Don't revive a dead fish!

Outline:

  1. What is “multiplicity”?
  2. When does “multiplicity” arise?
    1. (Example 1) Can dead fish be alive?
    2. (Example 2) Can one drug cure multiple diseases?
  3. How can it be handled?
    1. Procedures (Brace yourself for namedropping!)
    2. Dead fish revisited
    3. Drug test revisited
  4. How else? (Modern method)
    1. Parallel Gate Keeping
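
As a hedged illustration of item 4, the function below sketches a simple Bonferroni-based parallel gatekeeping rule for two families of hypotheses. It is a toy version for building intuition, not necessarily the exact procedure presented in the workshop.

  # Parallel gatekeeping: the secondary family may only be tested at the
  # fraction of alpha "released" by rejected primary hypotheses.
  parallel_gatekeeping <- function(p_primary, p_secondary, alpha = 0.05) {
    m1 <- length(p_primary)
    reject_primary <- p_primary < alpha / m1          # Bonferroni in family 1
    alpha_passed <- alpha * sum(reject_primary) / m1  # alpha carried forward
    reject_secondary <- p_secondary < alpha_passed / length(p_secondary)
    list(primary = reject_primary, secondary = reject_secondary)
  }

  # Both primary hypotheses are rejected, so the full alpha passes to family 2
  parallel_gatekeeping(p_primary = c(0.010, 0.020),
                       p_secondary = c(0.030, 0.200))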

Workshops on the Design of Experiments

Incorporating Statistics into Research Grants

This presentation will cover the statistical components of research grants typically required for successful applications. Topics will include a review of study types and their statistical characteristics, formulation of specific aims and hypotheses, development of a statistical plan for your research grant, a review of sample size and power, and practical advice on how to justify your sample size.

This will be a non-technical session geared towards research scientists who prepare grants and applied statisticians involved in collaborative studies.

Workshops on Specialized Statistical Methods

A Practical Introduction to Structural Equation Modeling in R

Structural equation modeling, also known as SEM, is increasingly being used in research, particularly in the social sciences. SEM allows for the estimation of complex relationships between measured variables and latent constructs. In this workshop, we will introduce the components of the SEM framework. We will also go over a practical implementation of SEM in R, using the lavaan package. Specifically, we will focus on factor and mediation analyses.

Outline:

  1. Latent variable and the basic elements of an SEM
  2. Practical demonstration of Factor Analysis in R using lavaan
  3. Extension of Factor Analysis – Mediation
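
As a preview of item 2, here is a minimal confirmatory factor analysis in lavaan using the package's built-in HolzingerSwineford1939 dataset; the workshop's own example may differ.

  library(lavaan)

  # Three latent abilities, each measured by three observed items
  model <- '
    visual  =~ x1 + x2 + x3
    textual =~ x4 + x5 + x6
    speed   =~ x7 + x8 + x9
  '
  fit <- cfa(model, data = HolzingerSwineford1939)
  summary(fit, fit.measures = TRUE, standardized = TRUE)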

Missing Data in Surveys

Missing data are frequently encountered in surveys. This workshop will provide a brief overview of survey data and introduce the types of missing data mechanisms. Several statistical methods for analyzing missing data will be discussed, and illustrative case studies using SPSS will be presented.

Analysis of Patient-Reported Outcomes

Patient-reported outcomes are often relevant in studying a variety of diseases and outcomes that cannot be assessed adequately without a patient's evaluation, and whose key questions require patients' input on the impact of a disease or a treatment.

To be useful to patients, researchers, and decision makers, a patient-reported outcome (PRO) measure must undergo a validation process demonstrating that it measures what it is intended to measure, accurately and reliably.

In this workshop, after a presentation of some key elements of the development of a PRO measure, the core topics of validity and reliability of a PRO measure will be discussed. Exploratory and confirmatory factor analyses, techniques for understanding the underlying structure of a PRO measure, will be described. Mediation modeling will be presented as a way to identify and explain the mechanism underlying an observed relationship between an independent variable and a dependent variable via the inclusion of a third variable, called the mediator.

Also discussed will be item response theory and, time permitting, longitudinal analysis. Illustrations will be provided mainly through real-life and simulated examples.
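
For readers who want to experiment beforehand, the sketch below fits a simple mediation model in R with lavaan on simulated data; the variable names and effect sizes are illustrative assumptions rather than a real PRO example.

  library(lavaan)

  set.seed(1)
  X <- rnorm(200)                      # independent variable
  M <- 0.5 * X + rnorm(200)            # mediator influenced by X
  Y <- 0.4 * M + 0.2 * X + rnorm(200)  # outcome influenced by M and X
  dat <- data.frame(X, M, Y)

  model <- '
    M ~ a * X              # path from predictor to mediator
    Y ~ b * M + c * X      # paths to the outcome
    indirect := a * b      # indirect (mediated) effect
    total    := c + a * b  # total effect
  '
  fit <- sem(model, data = dat)
  summary(fit)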

Workshops on Multivariate Analyses and Dimension Reduction

An Overview of Repeated Measures Analysis

Repeated measures analysis is widely used in many fields, and care in accounting for the covariance structure is needed when analyzing such data. In this workshop, we present an overview of repeated measures analysis. Specifically, we cover basic concepts of various types of repeated measures data with examples. We also demonstrate sample size calculation for repeated measures data. Finally, a detailed analysis of repeated measures data from an SCS project is carried out.
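
As a minimal illustration of the kind of model involved, the sketch below fits a linear mixed model in R with lme4 on its built-in sleepstudy data (the SCS project data are not public); the random intercept and slope account for the within-subject covariance structure.

  library(lme4)

  # Reaction time measured repeatedly on each subject over 10 days;
  # each subject gets their own baseline and time trend.
  fit <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
  summary(fit)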

A Practical Introduction to Variable Selection in R

Variable selection, also known as feature screening, is receiving much attention in many research areas, especially for large 'omics data sets. This workshop will introduce why we should do variable selection and cover some basic variable selection methods, including stepwise, forward, and backward regression. The least absolute shrinkage and selection operator (LASSO) will also be covered as a widely used variable selection method. Furthermore, this workshop will include the elastic net method, which combines ridge regression and the LASSO. All the methods will be implemented in R.

Outline:

  1. Why use variable selection?
  2. Stepwise, forward, and backward regression
  3. The LASSO method
  4. The Elastic Net in R
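
As a preview of items 3 and 4, here is a minimal LASSO and elastic net fit with the glmnet package on simulated data; the dimensions and effect sizes are illustrative assumptions.

  library(glmnet)

  set.seed(42)
  n <- 100; p <- 50
  x <- matrix(rnorm(n * p), n, p)
  y <- drop(x[, 1:5] %*% rep(2, 5)) + rnorm(n)  # only 5 predictors matter

  lasso <- cv.glmnet(x, y, alpha = 1)    # alpha = 1: LASSO penalty
  enet  <- cv.glmnet(x, y, alpha = 0.5)  # 0 < alpha < 1: elastic net

  # Nonzero coefficients at the cross-validated lambda
  coef(lasso, s = "lambda.min")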

Variable Selection with Demos in R

The standard linear model is commonly used to describe the relationship between a response and a set of variables (predictors). It is often the case that some or many of the variables used in a linear model are in fact not associated with the response.

Including such irrelevant variables leads to unnecessary complexity in the resulting model, making it more difficult to interpret. In this workshop, we will cover two types of variable selection approaches, subset selection and shrinkage, which can yield better prediction accuracy and model interpretability.

Various examples with demos in R will be provided to give a concrete idea of when and how one should apply each method.
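
For a concrete starting point, the sketch below runs AIC-guided backward elimination and forward selection with base R's step() on the built-in mtcars data; it is an illustrative stand-in for the workshop's demos.

  full <- lm(mpg ~ ., data = mtcars)  # model with every available predictor
  null <- lm(mpg ~ 1, data = mtcars)  # intercept-only model

  # Backward elimination from the full model, guided by AIC
  backward <- step(full, direction = "backward", trace = 0)

  # Forward selection from the null model over the full scope
  forward <- step(null, direction = "forward",
                  scope = formula(full), trace = 0)

  summary(backward)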

Model Selection and Dimension Reduction

In many fields, recent developments have led to an explosion in the number of measurements that can be collected in various settings. While at first exciting, having so many predictors can lead to serious high-dimensional problems. With so many predictors, how does a researcher identify the most important ones? If there are too many predictors (p > n), how can the analysis be carried out?

These high-dimensional problems that arise from modern data sets have called for a major expansion of the classical statistical toolbox for analyzing data.

This workshop will cover the biggest issues in performing high-dimensional analysis and will provide the tools needed to be successful. Techniques including factor analysis, to reduce the dimensionality of the predictors (dimension reduction), and the LASSO, to select the best predictors from those available (model selection), will be covered, with corresponding examples and accompanying R scripts that can be run along with the presentation.
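
As a small illustration of the dimension reduction side, the sketch below uses base R's factanal() to reduce six simulated variables to a single factor; the data-generating loadings are assumptions made for this example. (For the model selection side, see the glmnet LASSO sketch earlier in this list.)

  set.seed(7)
  f <- rnorm(200)  # one unobserved common factor
  X <- sapply(1:6, function(j) 0.8 * f + rnorm(200, sd = 0.6))
  colnames(X) <- paste0("v", 1:6)

  fa <- factanal(X, factors = 1, scores = "regression")
  print(fa$loadings)  # all six variables load on the single factor
  head(fa$scores)     # factor scores: 6 columns reduced to 1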

Workshops on Data Management, Manipulation, and Visualization

Data Visualizations with R Shiny

Shiny is an R package that makes it easy to build interactive web apps from R. Shiny combines the computational power of R with the interactivity of the modern web. This workshop will introduce the basics of R Shiny, including the basic structure of a shiny app and how to make interactive visualizations.
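
Here is a minimal, self-contained Shiny app of the kind built in the workshop: a slider interactively controls a histogram.

  library(shiny)

  ui <- fluidPage(
    sliderInput("n", "Number of observations:",
                min = 10, max = 500, value = 100),
    plotOutput("hist")
  )

  server <- function(input, output) {
    output$hist <- renderPlot({
      hist(rnorm(input$n), main = "Sample from N(0, 1)", xlab = "value")
    })
  }

  shinyApp(ui, server)  # launches the app in a browser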

Perfect the Imperfect Data – How to Deal with Missing Data in Practice

No dataset is perfect; because of imperfect study designs or data collection processes, missing data are often inevitable. To help researchers handle missing data properly, this workshop will introduce the causes and consequences of missing data, methods for analyzing it, and suggestions for preventing it. Case studies in SPSS will be presented.

Outline:

  1. Types of Missing Data
  2. Consequence of Missing Data
  3. Analysis of Missing Data
  4. Preventing Missing Data
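
As a hedged companion to item 3, the sketch below runs a standard multiple imputation analysis in R with the mice package and its built-in nhanes data; the workshop itself demonstrates SPSS.

  library(mice)

  md.pattern(nhanes)  # inspect which variables are missing together
  imp <- mice(nhanes, m = 5, printFlag = FALSE)  # create 5 imputed datasets
  fit <- with(imp, lm(chl ~ age + bmi))          # analyze each imputation
  pool(fit)           # combine the results with Rubin's rules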

Analysis of Missing Data

Missing data are frequently encountered in all types of datasets. This workshop will provide a brief overview of missing data mechanisms and introduce several statistical methods for analyzing missing data. Illustrative case studies analyzing missing data with software such as SPSS will be presented.
