Skip to contents

Purpose

This vignette demonstrates ContrastsModeratedDEqMS, a count-dependent variance shrinkage method inspired by the DEqMS approach (Zhu et al., 2020). It is a drop-in alternative to ContrastsModerated:

  • ContrastsModerated (standard): applies a single global prior variance to all proteins — every protein is shrunk toward the same value.
  • ContrastsModeratedDEqMS (count-dependent): the prior variance depends on the number of peptides/PSMs used to quantify each protein. Proteins quantified from many peptides get less shrinkage (their variance is already well estimated); proteins from few peptides get more.

This matters in proteomics because quantification depth varies hugely across proteins. A protein quantified from 20 peptides has a much better-estimated variance than one from 2 peptides, and they should not receive the same prior.

The user-facing API is identical to ContrastsModerated: same downstream methods (get_contrasts, get_Plotter, to_wide, merge_contrasts_results).

How it works

  1. Fit LOESS: log(sigma^2) ~ log2(peptide_count) — captures the count-dependent variance trend.
  2. Estimate prior df (d0): from the residual variance around the LOESS curve, using the trigammaInverse approach (same math as limma/squeezeVarRob).
  3. Count-specific prior variance: s0^2(count) varies per protein, derived from the fitted LOESS curve.
  4. Posterior variance: Bayesian shrinkage combining observed and prior: var.post = (d0 * s0^2(count) + df * sigma^2) / (d0 + df)
  5. Moderated t-statistics and p-values are recomputed with the posterior variance.

Data preparation

We use simulated protein-level data. The simulation includes a nr_peptides column — the number of peptides per protein — which is exactly the count we need.

library(prolfqua)
library(dplyr)
library(ggplot2)

istar <- sim_lfq_data_protein_config(Nprot = 200, weight_missing = 0.3)
lfqdata <- LFQData$new(istar$data, istar$config)
lfqdata$remove_small_intensities()

lt <- lfqdata$get_Transformer()
transformed <- lt$log2()$robscale()$lfq
transformed$rename_response("transformedIntensity")

Define contrasts

contr_spec <- c(
  "AvsCtrl" = "group_A - group_Ctrl",
  "BvsCtrl" = "group_B - group_Ctrl",
  "AvsB"    = "group_A - group_B"
)

Build the count table

ContrastsModeratedDEqMS requires a data.frame mapping each protein to its peptide count. This is typically extracted from the input data. The constructor automatically aggregates to one value per protein (taking the max).

count_df <- istar$data |>
  select(all_of(c(istar$config$hierarchy_keys_depth(), "nr_peptides"))) |>
  distinct()

# Inspect the distribution of peptide counts
count_per_prot <- count_df |>
  group_by(protein_Id) |>
  summarise(nr_peptides = max(nr_peptides, na.rm = TRUE))

ggplot(count_per_prot, aes(x = nr_peptides)) +
  geom_histogram(binwidth = 1, fill = "steelblue", color = "white") +
  labs(x = "Number of peptides per protein", y = "Count",
       title = "Distribution of peptide counts") +
  theme_minimal()

Standard moderation pipeline

mod_lm <- build_model(transformed, strategy_lm("transformedIntensity ~ group_"))
contr_lm <- prolfqua::Contrasts$new(mod_lm, contr_spec)
contr_moderated <- prolfqua::ContrastsModerated$new(contr_lm)
res_moderated <- contr_moderated$get_contrasts()

DEqMS moderation pipeline

The only difference: wrap with ContrastsModeratedDEqMS instead of ContrastsModerated, and supply count_df + count_column.

contr_deqms <- ContrastsModeratedDEqMS$new(
  contr_lm,
  count_df = count_df,
  count_column = "nr_peptides"
)
res_deqms <- contr_deqms$get_contrasts()

Diagnostic: variance vs. peptide count

This is the core diagnostic plot for the DEqMS approach. We expect log(sigma^2) to decrease with increasing peptide count, and the LOESS curve captures this trend.

# Get the unmoderated results with sigma and count
contr_raw <- contr_lm$get_contrasts()
contr_raw <- inner_join(
  contr_raw,
  count_per_prot,
  by = "protein_Id"
)

# Plot for one contrast
one_contrast <- contr_raw |> filter(contrast == levels(factor(contrast))[1])

ggplot(one_contrast, aes(x = log2(nr_peptides), y = log(sigma^2))) +
  geom_point(alpha = 0.4, size = 2) +
  geom_smooth(method = "loess", span = 0.75, color = "steelblue", se = TRUE) +
  labs(x = "log2(peptide count)",
       y = "log(residual variance)",
       title = paste("Variance vs. peptide count —", one_contrast$contrast[1])) +
  theme_minimal()
Residual variance decreases with peptide count. The LOESS curve (blue) defines the count-specific prior variance.

Residual variance decreases with peptide count. The LOESS curve (blue) defines the count-specific prior variance.

Comparing results

Fold changes are identical

Both decorators wrap the same underlying Contrasts object, so fold changes are identical.

merged <- inner_join(
  select(res_deqms, protein_Id, contrast, diff_deqms = diff),
  select(res_moderated, protein_Id, contrast, diff_moderated = diff),
  by = c("protein_Id", "contrast")
)

cor(merged$diff_deqms, merged$diff_moderated, use = "complete.obs")
## [1] 1

P-values differ (count-dependent shrinkage)

The DEqMS approach produces different p-values because the prior variance is protein-specific rather than global.

merged_p <- inner_join(
  select(res_deqms, protein_Id, contrast, p_deqms = p.value),
  select(res_moderated, protein_Id, contrast, p_moderated = p.value),
  by = c("protein_Id", "contrast")
)

cor(-log10(merged_p$p_deqms), -log10(merged_p$p_moderated), use = "complete.obs")
## [1] 0.9964816
ggplot(merged_p, aes(x = -log10(p_moderated), y = -log10(p_deqms))) +
  geom_point(alpha = 0.3, size = 1.5) +
  geom_abline(slope = 1, intercept = 0, color = "red", linetype = "dashed") +
  labs(x = "-log10(p) standard moderated",
       y = "-log10(p) DEqMS moderated") +
  theme_minimal()
P-values: DEqMS vs. standard moderated. Highly correlated but not identical — DEqMS adjusts for peptide count.

P-values: DEqMS vs. standard moderated. Highly correlated but not identical — DEqMS adjusts for peptide count.

Volcano plots

pl_moderated <- contr_moderated$get_Plotter()
pl_deqms <- contr_deqms$get_Plotter()

gridExtra::grid.arrange(
  pl_moderated$volcano()$FDR + ggtitle("Standard moderated"),
  pl_deqms$volcano()$FDR + ggtitle("DEqMS moderated"),
  ncol = 1
)
Volcano plots from both moderation approaches.

Volcano plots from both moderation approaches.

Merging with missing value imputation

ContrastsModeratedDEqMS works seamlessly with ContrastsMissing and merge_contrasts_results.

mC <- ContrastsMissing$new(lfqdata = transformed, contrasts = contr_spec)
merged_result <- merge_contrasts_results(prefer = contr_deqms, add = mC)

plotter <- merged_result$merged$get_Plotter()
plotter$volcano()$FDR

Wide format export

wide <- contr_deqms$to_wide()
head(wide)
## # A tibble: 6 × 13
##   protein_Id  diff.AvsB diff.AvsCtrl diff.BvsCtrl p.value.AvsB p.value.AvsCtrl
##   <chr>           <dbl>        <dbl>        <dbl>        <dbl>           <dbl>
## 1 0EfVhX~7697    0.272       0.143      -0.128      0.00000380        0.00575 
## 2 0lCgkE~2864   -0.0325     -0.0330     -0.000541   0.540             0.506   
## 3 0m5WN4~1168    0.0182      0.163       0.145      0.664             0.000510
## 4 0vgFfT~5498    0.135       0.0782     -0.0569     0.0472            0.202   
## 5 0YSKpy~7538    0.0182     -0.00621    -0.0244     0.700             0.896   
## 6 0ZHbvZ~4125    0.0873      0.0471     -0.0402     0.0505            0.283   
## # ℹ 7 more variables: p.value.BvsCtrl <dbl>, FDR.AvsB <dbl>, FDR.AvsCtrl <dbl>,
## #   FDR.BvsCtrl <dbl>, statistic.AvsB <dbl>, statistic.AvsCtrl <dbl>,
## #   statistic.BvsCtrl <dbl>

Inspecting all columns

Use all = TRUE to see both the original and moderated columns side by side.

res_all <- contr_deqms$get_contrasts(all = TRUE)
res_all |>
  select(protein_Id, contrast, sigma, diff, statistic,
         moderated.var.prior, moderated.df.prior,
         moderated.var.post, moderated.statistic, moderated.p.value) |>
  head(10)
## # A tibble: 10 × 10
##    protein_Id  contrast  sigma     diff statistic moderated.var.prior
##    <chr>       <chr>     <dbl>    <dbl>     <dbl>               <dbl>
##  1 0EfVhX~7697 AvsB     0.0659  0.272       5.83              0.00453
##  2 0lCgkE~2864 AvsB     0.0531 -0.0325     -0.749             0.00459
##  3 0m5WN4~1168 AvsB     0.0367  0.0182      0.700             0.00444
##  4 0vgFfT~5498 AvsB     0.0860  0.135       2.06              0.00715
##  5 0YSKpy~7538 AvsB     0.0632  0.0182      0.406             0.00453
##  6 0ZHbvZ~4125 AvsB     0.0490  0.0873      2.52              0.00424
##  7 1HZ2jt~5236 AvsB     0.0871  0.0666      1.00              0.00468
##  8 3QYop0~6494 AvsB     0.0477  0.168       4.30              0.00482
##  9 3Zhyy7~9324 AvsB     0.0604  0.137       2.97              0.00469
## 10 4E74Ry~7621 AvsB     0.0515  0.00604     0.166             0.00439
## # ℹ 4 more variables: moderated.df.prior <dbl>, moderated.var.post <dbl>,
## #   moderated.statistic <dbl>, moderated.p.value <dbl>

Note that moderated.var.prior varies per protein (count-dependent), unlike in ContrastsModerated where it is constant.

Session Info

## R version 4.5.2 (2025-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.3 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
##  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
##  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
## [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
## 
## time zone: UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] ggplot2_4.0.2  dplyr_1.2.0    prolfqua_1.5.0
## 
## loaded via a namespace (and not attached):
##  [1] gtable_0.3.6        xfun_0.56           bslib_0.10.0       
##  [4] htmlwidgets_1.6.4   ggrepel_0.9.7       lattice_0.22-7     
##  [7] vctrs_0.7.1         tools_4.5.2         generics_0.1.4     
## [10] tibble_3.3.1        pkgconfig_2.0.3     Matrix_1.7-4       
## [13] pheatmap_1.0.13     data.table_1.18.2.1 RColorBrewer_1.1-3 
## [16] S7_0.2.1            desc_1.4.3          lifecycle_1.0.5    
## [19] compiler_4.5.2      farver_2.1.2        textshaping_1.0.4  
## [22] progress_1.2.3      statmod_1.5.1       htmltools_0.5.9    
## [25] sass_0.4.10         yaml_2.3.12         lazyeval_0.2.2     
## [28] plotly_4.12.0       pillar_1.11.1       pkgdown_2.2.0      
## [31] crayon_1.5.3        jquerylib_0.1.4     tidyr_1.3.2        
## [34] MASS_7.3-65         cachem_1.1.0        limma_3.66.0       
## [37] nlme_3.1-168        tidyselect_1.2.1    digest_0.6.39      
## [40] stringi_1.8.7       purrr_1.2.1         labeling_0.4.3     
## [43] forcats_1.0.1       splines_4.5.2       fastmap_1.2.0      
## [46] grid_4.5.2          cli_3.6.5           magrittr_2.0.4     
## [49] utf8_1.2.6          withr_3.0.2         prettyunits_1.2.0  
## [52] scales_1.4.0        rmarkdown_2.30      httr_1.4.8         
## [55] otel_0.2.0          gridExtra_2.3       ragg_1.5.0         
## [58] hms_1.1.4           evaluate_1.0.5      knitr_1.51         
## [61] UpSetR_1.4.0        viridisLite_0.4.3   mgcv_1.9-3         
## [64] rlang_1.1.7         Rcpp_1.1.1          glue_1.8.0         
## [67] jsonlite_2.0.0      R6_2.6.1            plyr_1.8.9         
## [70] systemfonts_1.3.1   fs_1.6.6