Past Workshops

The SCS tries to offer a number of workshops each year, with some workshops offered every semester. Let us know if you have a suggestion for a workshop that you would like to see offered at UConn!

Workshops on Basic Statistical Methods and Their Interpretations

Exploratory Data Analysis and Visualization in R

Exploratory Data Analysis (EDA) is a critical first step in data analysis. Here are some reasons why we should use EDA:
  • Detecting mistakes and cleaning the data.
  • Shedding light on the preliminary selection of appropriate analysis methods.
  • Exploring relationships among predictors and outcome variables.

This workshop will introduce several useful exploratory data analysis methods and visualization tools in R. Participants can apply these methods and tools using an insurance claim dataset we provide. No experience with coding or the R language is required.
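
As a small taste of the material, here is a minimal sketch of typical EDA steps in R. It uses the mpg dataset bundled with ggplot2 rather than the workshop's insurance claim data, which is not reproduced here.

  # First-pass EDA on a built-in dataset (a stand-in for the workshop's data)
  library(ggplot2)

  data(mpg)     # fuel economy data shipped with ggplot2
  str(mpg)      # variable types: a first check for data entry mistakes
  summary(mpg)  # ranges, quartiles, and missing values

  # Visualize the relationship between a predictor and an outcome
  ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
    geom_point() +
    labs(x = "Engine displacement (L)", y = "Highway miles per gallon")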

Power Analysis

Power analysis is a statistical method for determining whether a sample size is sufficiently large to detect a treatment effect at a given significance level. This workshop will introduce the fundamentals of power analysis and cover three different designs: the independent two-sample t-test, one-way ANOVA, and multiple regression. For the power analyses during the workshop, the G*Power software will be introduced.
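
For readers who prefer to script such calculations, the same designs can be explored with base R's power functions. The effect sizes below are made-up illustrations, and the workshop itself demonstrates G*Power rather than R.

  # Sample size per group for an independent two-sample t-test:
  # detect a mean difference of 0.5 SD at alpha = 0.05 with 80% power.
  power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80,
               type = "two.sample")

  # Power of a one-way ANOVA with 4 groups of 20 subjects each,
  # assuming between-group variance 1 and within-group variance 3.
  power.anova.test(groups = 4, n = 20, between.var = 1, within.var = 3)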

 

A Math-Free Workshop for Power Calculation

Sample size calculation plays an important role in study design and grant proposal preparation. An overview of power analysis will be given, and two case studies will be presented to demonstrate the magic of power calculation for a survival model and a linear mixed effects model using SAS and R. A small simulation-based sketch in R follows the outline below.

Outline:

  1. Basic elements of power analysis
  2. Design of a survival study based on the Log-Rank test
  3. The linear mixed effects model: a path to designing a longitudinal study
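
To give a flavor of item 2, the sketch below estimates the power of a log-rank test by simulation in R. The exponential event rates, censoring time, and sample size are illustrative assumptions, not the workshop's case study.

  library(survival)

  # Estimate power of the log-rank test by simulating many trials
  power_logrank <- function(n_per_arm, hazard_ratio, n_sim = 1000) {
    set.seed(123)
    rejections <- replicate(n_sim, {
      # Exponential event times; the treatment arm has a lower hazard
      time <- c(rexp(n_per_arm, rate = 0.10),
                rexp(n_per_arm, rate = 0.10 * hazard_ratio))
      group <- rep(c("control", "treatment"), each = n_per_arm)
      status <- as.numeric(time <= 24)  # administrative censoring at t = 24
      time <- pmin(time, 24)
      fit <- survdiff(Surv(time, status) ~ group)
      pchisq(fit$chisq, df = 1, lower.tail = FALSE) < 0.05
    })
    mean(rejections)  # proportion of simulated trials that reject H0
  }

  power_logrank(n_per_arm = 100, hazard_ratio = 0.6)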

Multiplicity Adjustment

Controlling the probability of falsely rejecting the null hypothesis is critical when there are multiple, simultaneous hypotheses. The most common approach is to control the family-wise error rate (FWER), which keeps the probability of falsely rejecting at least one of the hypotheses within a desired level.

As the number of hypotheses to be tested grows large, the Bonferroni correction becomes too conservative and lacks power. This led to the introduction of the False Discovery Rate (FDR), defined as the expected proportion of falsely rejected hypotheses among all rejected hypotheses.

Several modern stepwise methods controlling the FDR have been proposed to increase power in the presence of many hypotheses. We give an overview of classical and modern multiplicity adjustment methods, as well as how to run the procedures in R.
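
As a minimal preview, base R's p.adjust() already implements several of these procedures; the raw p-values below are made up for illustration.

  p_raw <- c(0.001, 0.008, 0.012, 0.041, 0.049, 0.320, 0.610)

  p.adjust(p_raw, method = "bonferroni")  # classical FWER control; conservative
  p.adjust(p_raw, method = "holm")        # stepwise FWER control; more powerful
  p.adjust(p_raw, method = "BH")          # Benjamini-Hochberg; controls FDR

  # Reject hypotheses whose BH-adjusted p-value is below the desired FDR level
  which(p.adjust(p_raw, method = "BH") < 0.05)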

Modern Statistical Methods for Testing Multiple Hypotheses

If the brain cells of a dead fish are tested, can we conclude whether it is dead? While many assume that running multiple tests better supports a claim, it turns out that doing so can instead lead to false conclusions; hence the name "multiplicity problem". An overview of classical and modern multiplicity adjustment methods for controlling the Family-wise Error Rate (FWER) and the False Discovery Rate (FDR) will be given. Software details in R will be demonstrated, and real examples and applications will be presented; a sketch of one gatekeeping procedure follows the outline below. Don't revive a dead fish!

Outline:

  1. What is “multiplicity”?
  2. When does “multiplicity” arise?
    1. (Example 1) Can dead fish be alive?
    2. (Example 2) Can one drug cure multiple diseases?
  3. How can it be handled?
    1. Procedures (Brace yourself for namedropping!)
    2. Dead fish revisited
    3. Drug test revisited
  4. How else? (Modern method)
    1. Parallel Gate Keeping
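
As a hedged illustration of item 4, the function below sketches a simple Bonferroni-based parallel gatekeeping rule for two families of hypotheses. It is a toy version for building intuition, not necessarily the exact procedure presented in the workshop.

  # Parallel gatekeeping: the secondary family may only be tested at the
  # fraction of alpha "released" by rejected primary hypotheses.
  parallel_gatekeeping <- function(p_primary, p_secondary, alpha = 0.05) {
    m1 <- length(p_primary)
    reject_primary <- p_primary < alpha / m1          # Bonferroni in family 1
    alpha_passed <- alpha * sum(reject_primary) / m1  # alpha carried forward
    reject_secondary <- p_secondary < alpha_passed / length(p_secondary)
    list(primary = reject_primary, secondary = reject_secondary)
  }

  # Both primary hypotheses are rejected, so the full alpha passes to family 2
  parallel_gatekeeping(p_primary = c(0.010, 0.020),
                       p_secondary = c(0.030, 0.200))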

Workshops on the Design of Experiments

Incorporating Statistics into Research Grants

This presentation will cover the statistical components of research grants typically required for successful applications. Topics will include a review of study types and their statistical characteristics, formulation of specific aims and hypotheses, development of a statistical plan for your research grant, a review of sample size and power, and practical advice on how to justify your sample size.

This will be a non-technical session geared towards research scientists who prepare grants and applied statisticians involved in collaborative studies.

Workshops on Specialized Statistical Methods

A Practical Introduction to Structural Equation Modeling in R

Structural equation modeling, also known as SEM, is increasingly being used in research, particularly in the social sciences. SEM allows for the estimation of complex relationships between measured variables and latent constructs. In this workshop, we will introduce the components of the SEM framework. We will also go over a practical implementation of SEM in R, using the lavaan package. Specifically, we will focus on factor and mediation analyses.

Outline:

  1. Latent variable and the basic elements of an SEM
  2. Practical demonstration of Factor Analysis in R using lavaan
  3. Extension of Factor Analysis – Mediation
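
As a preview of item 2, here is a minimal confirmatory factor analysis in lavaan using the package's built-in HolzingerSwineford1939 dataset; the workshop's own example may differ.

  library(lavaan)

  # Three latent abilities, each measured by three observed items
  model <- '
    visual  =~ x1 + x2 + x3
    textual =~ x4 + x5 + x6
    speed   =~ x7 + x8 + x9
  '
  fit <- cfa(model, data = HolzingerSwineford1939)
  summary(fit, fit.measures = TRUE, standardized = TRUE)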

Missing Data in Surveys

Missing data are frequently encountered in surveys. This workshop will provide a brief overview of survey data and introduce the types of missing data mechanisms. Several statistical methods for analyzing missing data will be discussed, and illustrative case studies using SPSS will be presented.

Analysis of Patient-Reported Outcomes

Patient-reported outcomes are often relevant in studying a variety of diseases and outcomes that cannot be assessed adequately without a patient's evaluation, and whose key questions require patients' input on the impact of a disease or a treatment.

To be useful to patients, researchers, and decision makers, a patient-reported outcome (PRO) measure must undergo a validation process demonstrating that it measures what it is intended to measure, accurately and reliably.

In this workshop, after a presentation of some key elements of the development of a PRO measure, the core topics of validity and reliability of a PRO measure will be discussed. Exploratory and confirmatory factor analyses, techniques for understanding the underlying structure of a PRO measure, will be described. Mediation modeling will be presented as a way to identify and explain the mechanism underlying an observed relationship between an independent variable and a dependent variable via the inclusion of a third variable, called the mediator.

Also discussed will be item response theory and, time permitting, longitudinal analysis. Illustrations will be provided mainly through real-life and simulated examples.
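
For readers who want to experiment beforehand, the sketch below fits a simple mediation model in R with lavaan on simulated data; the variable names and effect sizes are illustrative assumptions rather than a real PRO example.

  library(lavaan)

  set.seed(1)
  X <- rnorm(200)                      # independent variable
  M <- 0.5 * X + rnorm(200)            # mediator influenced by X
  Y <- 0.4 * M + 0.2 * X + rnorm(200)  # outcome influenced by M and X
  dat <- data.frame(X, M, Y)

  model <- '
    M ~ a * X              # path from predictor to mediator
    Y ~ b * M + c * X      # paths to the outcome
    indirect := a * b      # indirect (mediated) effect
    total    := c + a * b  # total effect
  '
  fit <- sem(model, data = dat)
  summary(fit)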

Workshops on Multivariate Analyses and Dimension Reduction

An Overview of Repeated Measures Analysis

Repeated measures analysis is widely used in many fields, and care in accounting for the covariance structure is needed when analyzing such data. In this workshop, we present an overview of repeated measures analysis. Specifically, we cover basic concepts of various types of repeated measures data with examples. We also demonstrate sample size calculation for repeated measures data. Finally, a detailed analysis of repeated measures data from an SCS project is carried out.
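
As a minimal illustration of the kind of model involved, the sketch below fits a linear mixed model in R with lme4 on its built-in sleepstudy data (the SCS project data are not public); the random intercept and slope account for the within-subject covariance structure.

  library(lme4)

  # Reaction time measured repeatedly on each subject over 10 days;
  # each subject gets their own baseline and time trend.
  fit <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
  summary(fit)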

A Practical Introduction to Variable Selection in R

Variable selection, also known as feature screening, is receiving much attention in many research areas, especially for large 'omics data sets. This workshop will introduce why we should do variable selection and cover some basic variable selection methods, including stepwise, forward, and backward regression. The least absolute shrinkage and selection operator (LASSO) will also be covered as a widely used variable selection method. Furthermore, this workshop will include the elastic net method, which combines ridge regression and the LASSO. All the methods will be implemented in R.

Outline:

  1. Why use variable selection?
  2. Stepwise, forward, and backward regression
  3. The LASSO method
  4. The Elastic Net in R
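
As a preview of items 3 and 4, here is a minimal LASSO and elastic net fit with the glmnet package on simulated data; the dimensions and effect sizes are illustrative assumptions.

  library(glmnet)

  set.seed(42)
  n <- 100; p <- 50
  x <- matrix(rnorm(n * p), n, p)
  y <- drop(x[, 1:5] %*% rep(2, 5)) + rnorm(n)  # only 5 predictors matter

  lasso <- cv.glmnet(x, y, alpha = 1)    # alpha = 1: LASSO penalty
  enet  <- cv.glmnet(x, y, alpha = 0.5)  # 0 < alpha < 1: elastic net

  # Nonzero coefficients at the cross-validated lambda
  coef(lasso, s = "lambda.min")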

Variable Selection with Demos in R

The standard linear model is commonly used to describe the relationship between a response and a set of variables (predictors). It is often the case that some or many of the variables used in a linear model are in fact not associated with the response.

Including such irrelevant variables leads to unnecessary complexity in the resulting model, making it more difficult to interpret. In this workshop, we will cover two types of variable selection approaches, subset selection and shrinkage, which can yield better prediction accuracy and model interpretability.

Various examples with demos in R will be provided to give a concrete idea of when and how one should apply each method.
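
For a concrete starting point, the sketch below runs AIC-guided backward elimination and forward selection with base R's step() on the built-in mtcars data; it is an illustrative stand-in for the workshop's demos.

  full <- lm(mpg ~ ., data = mtcars)  # model with every available predictor
  null <- lm(mpg ~ 1, data = mtcars)  # intercept-only model

  # Backward elimination from the full model, guided by AIC
  backward <- step(full, direction = "backward", trace = 0)

  # Forward selection from the null model over the full scope
  forward <- step(null, direction = "forward",
                  scope = formula(full), trace = 0)

  summary(backward)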

Model Selection and Dimension Reduction

In many fields, recent developments have led to an explosion in the number of measurements that can be collected in various settings. While at first exciting, having so many predictors can lead to serious high-dimensional problems. With so many predictors, how does a researcher identify the most important ones? If there are too many predictors (p > n), how can the analysis be carried out?

These high-dimensional problems that arise from modern data sets have called for a major expansion of the classical statistical toolbox for analyzing data.

This workshop will cover the biggest issues in performing high-dimensional analysis and will provide the tools needed to be successful. Techniques including factor analysis, to reduce the dimensionality of the predictors (dimension reduction), and the LASSO, to select the best predictors from those available (model selection), will be covered, with corresponding examples and accompanying R scripts that can be run along with the presentation.
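
As a small illustration of the dimension reduction side, the sketch below uses base R's factanal() to reduce six simulated variables to a single factor; the data-generating loadings are assumptions made for this example. (For the model selection side, see the glmnet LASSO sketch earlier in this list.)

  set.seed(7)
  f <- rnorm(200)  # one unobserved common factor
  X <- sapply(1:6, function(j) 0.8 * f + rnorm(200, sd = 0.6))
  colnames(X) <- paste0("v", 1:6)

  fa <- factanal(X, factors = 1, scores = "regression")
  print(fa$loadings)  # all six variables load on the single factor
  head(fa$scores)     # factor scores: 6 columns reduced to 1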

Workshops on Data Management, Manipulation, and Visualization

Data Visualizations with R Shiny

Shiny is an R package that makes it easy to build interactive web apps from R. Shiny combines the computational power of R with the interactivity of the modern web. This workshop will introduce the basics of R Shiny, including the basic structure of a shiny app and how to make interactive visualizations.
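
Here is a minimal, self-contained Shiny app of the kind built in the workshop: a slider interactively controls a histogram.

  library(shiny)

  ui <- fluidPage(
    sliderInput("n", "Number of observations:",
                min = 10, max = 500, value = 100),
    plotOutput("hist")
  )

  server <- function(input, output) {
    output$hist <- renderPlot({
      hist(rnorm(input$n), main = "Sample from N(0, 1)", xlab = "value")
    })
  }

  shinyApp(ui, server)  # launches the app in a browser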

Perfect the Imperfect Data – How to Deal with Missing Data in Practice

No dataset is perfect; because of imperfect study designs or data collection processes, missing data are often inevitable. To help researchers handle missing data properly, this workshop will introduce the causes and consequences of missing data, methods for analyzing it, and suggestions for preventing it. Case studies in SPSS will be presented.

Outline:

  1. Types of Missing Data
  2. Consequence of Missing Data
  3. Analysis of Missing Data
  4. Preventing Missing Data
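
As a hedged companion to item 3, the sketch below runs a standard multiple imputation analysis in R with the mice package and its built-in nhanes data; the workshop itself demonstrates SPSS.

  library(mice)

  md.pattern(nhanes)  # inspect which variables are missing together
  imp <- mice(nhanes, m = 5, printFlag = FALSE)  # create 5 imputed datasets
  fit <- with(imp, lm(chl ~ age + bmi))          # analyze each imputation
  pool(fit)           # combine the results with Rubin's rules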

Analysis of Missing Data

Missing data are frequently encountered in all types of datasets. This workshop will provide a brief overview of missing data mechanisms and introduce several statistical methods for analyzing missing data. Illustrative case studies analyzing missing data with software such as SPSS will be presented.
