MSstatsBioNet 1.1.7
Run this code below to install MSstatsBioNet from bioconductor
if (!require("BiocManager", quietly = TRUE)) {
install.packages("BiocManager")
}
BiocManager::install("MSstatsBioNet")
The MSstatsBioNet
package is a member of the MSstats
family of packages.
It contains a set of functions for interpretation of mass spectrometry (MS)
statistical analysis results in the context of protein-protein interaction
networks. The package is designed to be used in conjunction with the
MSstats
package.
We will be taking a subset of the dataset found in this paper.
input = data.table::fread(system.file(
"extdata/msstats.csv",
package = "MSstatsBioNet"
))
library(MSstatsConvert)
msstats_imported = FragPipetoMSstatsFormat(input, use_log_file = FALSE)
#> INFO [2025-09-12 18:49:03] ** Raw data from FragPipe imported successfully.
#> INFO [2025-09-12 18:49:03] ** Using annotation extracted from quantification data.
#> INFO [2025-09-12 18:49:03] ** Run labels were standardized to remove symbols such as '.' or '%'.
#> INFO [2025-09-12 18:49:03] ** The following options are used:
#> - Features will be defined by the columns: PeptideSequence, PrecursorCharge, FragmentIon, ProductCharge
#> - Shared peptides will be removed.
#> - Proteins with single feature will not be removed.
#> - Features with less than 3 measurements across runs will be removed.
#> INFO [2025-09-12 18:49:03] ** Features with all missing measurements across runs are removed.
#> INFO [2025-09-12 18:49:03] ** Shared peptides are removed.
#> INFO [2025-09-12 18:49:03] ** Multiple measurements in a feature and a run are summarized by summaryforMultipleRows: max
#> INFO [2025-09-12 18:49:03] ** Features with one or two measurements across runs are removed.
#> INFO [2025-09-12 18:49:03] ** Run annotation merged with quantification data.
#> INFO [2025-09-12 18:49:03] ** Features with one or two measurements across runs are removed.
#> INFO [2025-09-12 18:49:03] ** Fractionation handled.
#> INFO [2025-09-12 18:49:03] ** Updated quantification data to make balanced design. Missing values are marked by NA
#> INFO [2025-09-12 18:49:03] ** Finished preprocessing. The dataset is ready to be processed by the dataProcess function.
head(msstats_imported)
#> ProteinName PeptideSequence PrecursorCharge FragmentIon ProductCharge
#> 1 P05023 AVAGDASESALLK 2 b12 1
#> 2 P05023 AVAGDASESALLK 2 b12 1
#> 3 P05023 AVAGDASESALLK 2 b12 1
#> 4 P05023 AVAGDASESALLK 2 b12 1
#> 5 P05023 AVAGDASESALLK 2 b12 1
#> 6 P05023 AVAGDASESALLK 2 b12 1
#> IsotopeLabelType Condition BioReplicate
#> 1 L NAT 1
#> 2 L T 1
#> 3 L NAT 2
#> 4 L T 2
#> 5 L NAT 3
#> 6 L T 3
#> Run Fraction Intensity
#> 1 CPTAC_CCRCC_W_JHU_20190112_LUMOS_C3L-00418_NAT 1 69915.91
#> 2 CPTAC_CCRCC_W_JHU_20190112_LUMOS_C3L-00418_T 1 101726.53
#> 3 CPTAC_CCRCC_W_JHU_20190112_LUMOS_C3L-00606_NAT 1 36215.18
#> 4 CPTAC_CCRCC_W_JHU_20190112_LUMOS_C3L-00606_T 1 24927.36
#> 5 CPTAC_CCRCC_W_JHU_20190112_LUMOS_C3L-01882_NAT 1 NA
#> 6 CPTAC_CCRCC_W_JHU_20190112_LUMOS_C3L-01882_T 1 NA
We will first convert the input data to a format that can be processed by
MSstats. In this example, the MetamorpheusToMSstatsFormat
function is used
to convert the input data from Metamorpheus to MSstats format. The MSstats
format will contain the necessary information for downstream analysis, such
as peptide information, abundance values, run ID, and experimental annotation
information.
library(MSstats)
#>
#> Attaching package: 'MSstats'
#> The following object is masked from 'package:grDevices':
#>
#> savePlot
QuantData <- dataProcess(msstats_imported, use_log_file = FALSE)
#> INFO [2025-09-12 18:49:05] ** Log2 intensities under cutoff = 9.2885 were considered as censored missing values.
#> INFO [2025-09-12 18:49:05] ** Log2 intensities = NA were considered as censored missing values.
#> INFO [2025-09-12 18:49:05] ** Use all features that the dataset originally has.
#> INFO [2025-09-12 18:49:05]
#> # proteins: 10
#> # peptides per protein: 1-8
#> # features per peptide: 7-12
#> INFO [2025-09-12 18:49:05]
#> NAT T
#> # runs 5 5
#> # bioreplicates 5 5
#> # tech. replicates 1 1
#> INFO [2025-09-12 18:49:05] Some features are completely missing in at least one condition:
#> ELEAEIQQLR_2_b5_1,
#> ELEAEIQQLR_2_b8_1,
#> NLEAVETLGSTSTIC(UniMod:4)SDK_3_b13_2,
#> NLEAVETLGSTSTIC(UniMod:4)SDK_3_b3_1,
#> NLEAVETLGSTSTIC(UniMod:4)SDK_3_b4_1 ...
#> INFO [2025-09-12 18:49:05] == Start the summarization per subplot...
#>
|
| | 0%
|
|======= | 10%
|
|============== | 20%
|
|===================== | 30%
|
|============================ | 40%
|
|=================================== | 50%
|
|========================================== | 60%
|
|================================================= | 70%
|
|======================================================== | 80%
|
|=============================================================== | 90%
|
|======================================================================| 100%
#> INFO [2025-09-12 18:49:06] == Summarization is done.
model <- groupComparison(
contrast.matrix = "pairwise",
data = QuantData,
use_log_file = FALSE
)
#> INFO [2025-09-12 18:49:06] == Start to test and get inference in whole plot ...
#>
|
| | 0%
|
|======= | 10%
|
|============== | 20%
|
|===================== | 30%
|
|============================ | 40%
|
|=================================== | 50%
|
|========================================== | 60%
|
|================================================= | 70%
|
|======================================================== | 80%
|
|=============================================================== | 90%
|
|======================================================================| 100%
#> INFO [2025-09-12 18:49:06] == Comparisons for all proteins are done.
head(model$ComparisonResult)
#> Protein Label log2FC SE Tvalue DF pvalue adj.pvalue
#> 1 O00217 NAT vs T 2.0285031 0.4364177 4.648077 4 0.0096753522 0.013821932
#> 2 O00330 NAT vs T 1.3000941 0.1320659 9.844285 3 0.0022285171 0.004457034
#> 3 O60313 NAT vs T 0.9299641 0.2387992 3.894336 4 0.0176257617 0.019584180
#> 4 O60879 NAT vs T -1.9484511 0.1695323 -11.493098 4 0.0003271868 0.001635934
#> 5 O75306 NAT vs T 2.4745040 0.3530857 7.008225 4 0.0021825029 0.004457034
#> 6 P05023 NAT vs T 1.8391155 0.2122058 8.666660 4 0.0009753219 0.003251073
#> issue MissingPercentage ImputationPercentage
#> 1 NA 0.14166667 0.14166667
#> 2 NA 0.20833333 0.10833333
#> 3 NA 0.29756098 0.29756098
#> 4 NA 0.09583333 0.09583333
#> 5 NA 0.16111111 0.16111111
#> 6 NA 0.11702128 0.11702128
Next, we will preprocess the data using the dataProcess
function and perform
a statistical analysis using the groupComparison
function. The output of
groupComparison
is a table containing a list of differentially abundant
proteins with their associated p-values and log fold changes.
First, we need to convert the group comparison results to a format that can be
processed by INDRA. The getSubnetworkFromIndra
function requires a table
containing HGNC IDs. We can use the annotateProteinInfoFromIndra
function
to obtain these mappings.
In the below example, we convert uniprot IDs to their corresponding Hgnc IDs. We can also extract other information, such as hgnc gene name and protein function.
library(MSstatsBioNet)
annotated_df = annotateProteinInfoFromIndra(model$ComparisonResult, "Uniprot")
head(annotated_df)
#> Protein Label log2FC SE Tvalue DF pvalue adj.pvalue
#> 1 O00217 NAT vs T 2.0285031 0.4364177 4.648077 4 0.0096753522 0.013821932
#> 2 O00330 NAT vs T 1.3000941 0.1320659 9.844285 3 0.0022285171 0.004457034
#> 3 O60313 NAT vs T 0.9299641 0.2387992 3.894336 4 0.0176257617 0.019584180
#> 4 O60879 NAT vs T -1.9484511 0.1695323 -11.493098 4 0.0003271868 0.001635934
#> 5 O75306 NAT vs T 2.4745040 0.3530857 7.008225 4 0.0021825029 0.004457034
#> 6 P05023 NAT vs T 1.8391155 0.2122058 8.666660 4 0.0009753219 0.003251073
#> issue MissingPercentage ImputationPercentage GlobalProtein UniprotId HgncId
#> 1 NA 0.14166667 0.14166667 O00217 O00217 7715
#> 2 NA 0.20833333 0.10833333 O00330 O00330 21350
#> 3 NA 0.29756098 0.29756098 O60313 O60313 8140
#> 4 NA 0.09583333 0.09583333 O60879 O60879 2877
#> 5 NA 0.16111111 0.16111111 O75306 O75306 7708
#> 6 NA 0.11702128 0.11702128 P05023 P05023 799
#> HgncName IsTranscriptionFactor IsKinase IsPhosphatase
#> 1 NDUFS8 FALSE FALSE FALSE
#> 2 PDHX FALSE FALSE FALSE
#> 3 OPA1 FALSE FALSE FALSE
#> 4 DIAPH2 FALSE FALSE FALSE
#> 5 NDUFS2 FALSE FALSE FALSE
#> 6 ATP1A1 FALSE FALSE FALSE
The package provides a function getSubnetworkFromIndra
that retrieves a
subnetwork of proteins from the INDRA database based on differential abundance
analysis results.
subnetwork <- getSubnetworkFromIndra(
annotated_df,
pvalueCutoff = 0.05,
statement_types = c("Complex", "IncreaseAmount", "DecreaseAmount", "Inhibition", "Activation", "Phosphorylation")
)
#> Warning in getSubnetworkFromIndra(annotated_df, pvalueCutoff = 0.05, statement_types = c("Complex", : NOTICE: This function includes third-party software components
#> that are licensed under the BSD 2-Clause License. Please ensure to
#> include the third-party licensing agreements if redistributing this
#> package or utilizing the results based on this package.
#> See the LICENSE file for more details.
head(subnetwork$nodes)
#> id logFC adj.pvalue hgncName Site
#> 1 O00217 2.0285031 0.013821932 NDUFS8 <NA>
#> 3 O60313 0.9299641 0.019584180 OPA1 <NA>
#> 5 O75306 2.4745040 0.004457034 NDUFS2 <NA>
#> 6 P05023 1.8391155 0.003251073 ATP1A1 <NA>
#> 7 P05067 0.7360012 0.020306662 APP <NA>
#> 8 P05090 0.5683951 0.013715050 APOD <NA>
head(subnetwork$edges)
#> source target interaction evidenceCount paperCount
#> 1 O00217 O75306 Complex 3 3
#> 2 O75306 P05067 Complex 1 1
#> 3 P05067 P05362 Complex 11 1
#> 4 P05067 P05090 Complex 1 1
#> 5 P05067 O60313 DecreaseAmount 5 2
#> 6 O00217 P08574 Complex 2 1
#> evidenceLink
#> 1 https://db.indra.bio/statements/from_agents?subject=7715@HGNC&object=7708@HGNC&format=html
#> 2 https://db.indra.bio/statements/from_agents?subject=7708@HGNC&object=620@HGNC&format=html
#> 3 https://db.indra.bio/statements/from_agents?subject=620@HGNC&object=5344@HGNC&format=html
#> 4 https://db.indra.bio/statements/from_agents?subject=620@HGNC&object=612@HGNC&format=html
#> 5 https://db.indra.bio/statements/from_agents?subject=620@HGNC&object=8140@HGNC&format=html
#> 6 https://db.indra.bio/statements/from_agents?subject=7715@HGNC&object=2579@HGNC&format=html
#> sourceCounts site
#> 1 {"signor": 1} <NA>
#> 2 {"biogrid": 1} <NA>
#> 3 {"sparser": 10, "reach": 1} <NA>
#> 4 {"sparser": 1} <NA>
#> 5 {"reach": 2} <NA>
#> 6 {"biogrid": 2} <NA>
This package is distributed under the Artistic-2.0 license. However, its dependencies may have different licenses. In this example, getSubnetworkFromIndra depends on INDRA, which is distributed under the BSD 2-Clause license. Furthermore, INDRA’s knowledge sources may have different licenses for commercial applications. Please refer to the INDRA README for more information on its knowledge sources and their associated licenses.
The function previewNetworkInBrowser
then takes the output of
getSubnetworkFromIndra
and visualizes the subnetwork. The function requires
an internet browser to view the subnetwork
previewNetworkInBrowser(subnetwork$nodes, subnetwork$edges, displayLabelType = "hgncName")
#> Network visualization exported to: /tmp/RtmpmCb3aj/file1697c5122b434e.html
In the network diagram displayed using CytoscapeJS, you should see three arrows connecting two nodes, APP and APOD. These arrows represent the interactions between these two proteins, notably Activation, IncreaseAmount (e.g. via transcriptional regulation), and forming a protein complex.
sessionInfo()
#> R version 4.5.1 Patched (2025-08-23 r88802)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.22-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] MSstatsBioNet_1.1.7 MSstats_4.17.1 MSstatsConvert_1.19.1
#> [4] BiocStyle_2.37.1
#>
#> loaded via a namespace (and not attached):
#> [1] tidyselect_1.2.1 viridisLite_0.4.2 IRdisplay_1.1
#> [4] dplyr_1.1.4 farver_2.1.2 S7_0.2.0
#> [7] bitops_1.0-9 fastmap_1.2.0 lazyeval_0.2.2
#> [10] RCurl_1.98-1.17 base64url_1.4 XML_3.99-0.19
#> [13] digest_0.6.37 lifecycle_1.0.4 survival_3.8-3
#> [16] statmod_1.5.0 r2r_0.1.2 magrittr_2.0.4
#> [19] compiler_4.5.1 rlang_1.1.6 sass_0.4.10
#> [22] tools_4.5.1 yaml_2.3.10 data.table_1.17.8
#> [25] knitr_1.50 htmlwidgets_1.6.4 curl_7.0.0
#> [28] marray_1.87.0 repr_1.1.7 RColorBrewer_1.1-3
#> [31] KernSmooth_2.23-26 pbdZMQ_0.3-14 purrr_1.1.0
#> [34] BiocGenerics_0.55.1 stats4_4.5.1 grid_4.5.1
#> [37] preprocessCore_1.71.2 caTools_1.18.3 log4r_0.4.4
#> [40] ggplot2_4.0.0 scales_1.4.0 gtools_3.9.5
#> [43] MASS_7.3-65 dichromat_2.0-0.1 cli_3.6.5
#> [46] rmarkdown_2.29 crayon_1.5.3 reformulas_0.4.1
#> [49] generics_0.1.4 httr_1.4.7 minqa_1.2.8
#> [52] cachem_1.1.0 splines_4.5.1 parallel_4.5.1
#> [55] BiocManager_1.30.26 base64enc_0.1-3 vctrs_0.6.5
#> [58] boot_1.3-32 Matrix_1.7-4 jsonlite_2.0.0
#> [61] bookdown_0.44 ggrepel_0.9.6 limma_3.65.4
#> [64] plotly_4.11.0 tidyr_1.3.1 jquerylib_0.1.4
#> [67] glue_1.8.0 nloptr_2.2.1 RJSONIO_2.0.0
#> [70] stringi_1.8.7 gtable_0.3.6 lme4_1.1-37
#> [73] tibble_3.3.0 pillar_1.11.0 htmltools_0.5.8.1
#> [76] gplots_3.2.0 RCy3_2.29.1 graph_1.87.0
#> [79] IRkernel_1.3.2 R6_2.6.1 Rdpack_2.6.4
#> [82] evaluate_1.0.5 lattice_0.22-7 rbibutils_2.3
#> [85] backports_1.5.0 bslib_0.9.0 Rcpp_1.1.0
#> [88] uuid_1.2-1 nlme_3.1-168 checkmate_2.3.3
#> [91] xfun_0.53 fs_1.6.6 pkgconfig_2.0.3