--- title: "BioGSP: Spectral Graph Wavelet Transform for Spatial Data (New Workflow)" author: "Yuzhou" date: "`r Sys.Date()`" output: html_document: toc: true toc_float: false code_folding: show theme: default self_contained: true mathjax: null vignette: > %\VignetteIndexEntry{BioGSP: Spectral Graph Wavelet Transform for Spatial Data} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE, fig.width = 5, fig.height = 3.5, dpi = 72, dev = 'png') ``` # Introduction BioGSP provides **Spectral Graph Wavelet Transform (SGWT)** analysis for spatial biological data. This package enables multi-scale analysis of spatial patterns using graph signal processing techniques. ## Key Capabilities - **Object-oriented SGWT workflow** with clear separation of concerns - **Multi-scale spatial analysis** using spectral graph wavelets - **Flexible data input** with custom column naming - **Energy-normalized similarity analysis** with comprehensive metrics - **Multiple wavelet kernels** (Mexican Hat, Meyer, Heat) - **Comprehensive visualization** tools - **Detailed inverse transform results** (low-pass, band-pass approximations) ## New Workflow Overview The new workflow consists of five main steps: 1. **`initSGWT()`**: Initialize SGWT object with data and parameters 2. **`checkKband()`**: Analyze k-band limited property of signals (optional) 3. **`runSpecGraph()`**: Build spectral graph structure 4. **`runSGWT()`**: Perform forward and inverse SGWT transforms 5. **`runSGCC()`**: Calculate energy-normalized weighted similarity # Setup ```{r load-libraries} # Load required packages library(BioGSP) library(ggplot2) library(patchwork) library(viridis) # Set random seed for reproducibility set.seed(123) ``` # New SGWT Workflow Demonstration ## Generate Example Pattern ```{r generate-pattern} # Create a spatial pattern with concentric circles demo_pattern <- simulate_multiscale( grid_size = 20, # Moderate size for fast computation n_centers = 1, # Single center pattern Ra_seq = 5, # Inner circle radius n_steps = 1, # Single step (one pattern) outer_start = 10, # Outer ring radius seed = 123 )[[1]] # Display pattern structure cat("Generated pattern dimensions:", nrow(demo_pattern), "x", ncol(demo_pattern), "\n") cat("Column names:", paste(colnames(demo_pattern), collapse = ", "), "\n") cat("Signals available:", sum(demo_pattern$signal_1), "inner pixels,", sum(demo_pattern$signal_2), "outer pixels\n") ``` ## Visualize Input Pattern ```{r visualize-pattern, fig.width=7, fig.height=3} # Single combined visualization (one plot for this section) demo_pattern$pattern_type <- ifelse( demo_pattern$signal_1 == 1, "Inner Circle", ifelse(demo_pattern$signal_2 == 1, "Outer Ring", "Background") ) p_input <- ggplot(demo_pattern, aes(X, Y, fill = pattern_type)) + geom_tile() + scale_fill_manual( values = c("Background" = "white", "Inner Circle" = "#e31a1c", "Outer Ring" = "#1f78b4"), name = "Pattern" ) + coord_fixed() + theme_void() + ggtitle("Input Pattern (Signal 1 + Signal 2)") print(p_input) ``` ## Step 1: Initialize SGWT Object ```{r init-sgwt} # Initialize SGWT object with custom column names SG <- initSGWT( data.in = demo_pattern, x_col = "X", # Custom X coordinate column name y_col = "Y", # Custom Y coordinate column name signals = c("signal_1", "signal_2"), # Analyze both signals J = 2, # Number of wavelet scales scaling_factor = 1, # Scale progression factor kernel_type = "mexican_hat" ) # Display initialized object print(SG) ``` ## Step 2: Build Spectral Graph ```{r build-graph} # Build spectral graph structure SG <- runSpecGraph(SG, k = 12, laplacian_type = "normalized", length_eigenvalue = 200, verbose = TRUE) # Check updated object cat("Graph construction completed!\n") cat("Adjacency matrix dimensions:", dim(SG$Graph$adjacency_matrix), "\n") cat("Number of eigenvalues computed:", length(SG$Graph$eigenvalues), "\n") ``` ## Visualize Fourier Modes (Eigenvectors) ```{r fourier-modes, fig.width=7, fig.height=5} # Visualize Fourier modes (eigenvectors) - 5 low and 5 high frequency modes fourier_modes <- plot_FM(SG, mode_type = "both", n_modes = 2, ncol = 2) cat("Fourier Modes Visualization:\n") cat("- Low-frequency modes (2-6): Smooth spatial patterns, excluding DC component\n") cat("- High-frequency modes (96-100): Fine-detailed spatial patterns\n") cat("- Each mode shows its eigenvalue (λ) indicating frequency content\n") ``` ## Step 3: Run SGWT Analysis ```{r run-sgwt} # Perform SGWT forward and inverse transforms SG <- runSGWT(SG, verbose = TRUE) # Display final object with all results print(SG) ``` ## Visualize SGWT Decomposition ```{r plot-decomposition, fig.width=7, fig.height=5} # Plot SGWT decomposition results for signal_1 decomp_plots <- plot_sgwt_decomposition( SG = SG, signal_name = "signal_1", plot_scales = 1, # One plot for this section ncol = 1 ) print(decomp_plots) ``` ## Energy Analysis ```{r energy-analysis} # Analyze energy distribution across scales for signal_1 energy_analysis <- sgwt_energy_analysis(SG, "signal_1") print(energy_analysis) # Create energy distribution plot energy_plot <- ggplot(energy_analysis, aes(x = scale, y = energy_ratio)) + geom_bar(stat = "identity", fill = "steelblue", alpha = 0.7) + geom_text(aes(label = paste0(round(energy_ratio * 100, 1), "%")), vjust = -0.5, size = 3) + labs(title = "Energy Distribution Across SGWT Scales (Signal 1)", x = "Scale", y = "Energy Ratio") + theme_minimal() + theme(axis.text.x = element_text(angle = 45, hjust = 1)) print(energy_plot) ``` # Advanced Features Demonstration ## Different Wavelet Kernels ```{r kernel-comparison} # Compare different kernel types kernels <- c("mexican_hat", "meyer", "heat") kernel_results <- list() for (kn in kernels) { # Initialize with different kernel SG_temp <- initSGWT( data.in = demo_pattern, x_col = "X", y_col = "Y", signals = "signal_1", J = 3, # Reduced for faster computation kernel_type = kn ) # Run full workflow SG_temp <- runSpecGraph(SG_temp, k = 12, laplacian_type = "normalized", length_eigenvalue = 100,verbose = FALSE) SG_temp <- runSGWT(SG_temp, verbose = FALSE) kernel_results[[kn]] <- SG_temp } # Compare reconstruction errors comparison_df <- data.frame( Kernel = kernels, RMSE = sapply(kernel_results, function(x) x$Inverse$signal_1$reconstruction_error), stringsAsFactors = FALSE ) print("Kernel Performance Comparison:") print(comparison_df) # Plot comparison ggplot(comparison_df, aes(x = Kernel, y = RMSE, fill = Kernel)) + geom_bar(stat = "identity", alpha = 0.7) + geom_text(aes(label = round(RMSE, 6)), vjust = -0.5) + scale_fill_viridis_d() + labs(title = "SGWT Reconstruction Error by Kernel Type", x = "Kernel Type", y = "RMSE") + theme_minimal() ``` ## Step 4: Pattern Similarity Analysis ```{r similarity-analysis} # Calculate similarity between signal_1 and signal_2 in same object similarity_within <- runSGCC("signal_1", "signal_2", SG = SG, return_parts = TRUE) cat("Pattern Similarity Analysis (within same graph):\n") cat(sprintf("Overall similarity: %.4f\n", similarity_within$S)) cat(sprintf("Low-frequency similarity: %.4f\n", similarity_within$c_low)) cat(sprintf("Non-low-frequency similarity: %.4f\n", similarity_within$c_nonlow)) cat(sprintf("Energy weights - Low: %.3f, Non-low: %.3f\n", similarity_within$w_low, similarity_within$w_NL)) # Generate a second pattern for cross-comparison pattern_2 <- simulate_multiscale( grid_size = 40, n_centers = 1, Ra_seq = 8, # Different inner radius n_steps = 1, # Single step outer_start = 15, # Different outer radius seed = 456 )[[1]] # Create second SGWT object SG2 <- initSGWT(pattern_2, x_col = "X", y_col = "Y", signals = "signal_1", J = 4) SG2 <- runSpecGraph(SG2, k = 12, laplacian_type = "normalized",length_eigenvalue = 100, verbose = FALSE) SG2 <- runSGWT(SG2, verbose = FALSE) # Note: Cross-object similarity comparison removed due to different eigenvalue counts # between SG (200 eigenvalues) and SG2 (100 eigenvalues) causing dimension mismatch # Visualize both patterns for comparison (single faceted plot for this section) pattern_compare <- rbind( data.frame( X = demo_pattern$X, Y = demo_pattern$Y, signal_1 = demo_pattern$signal_1, Pattern = "Pattern A (R_in=5, R_out=10)" ), data.frame( X = pattern_2$X, Y = pattern_2$Y, signal_1 = pattern_2$signal_1, Pattern = "Pattern B (R_in=8, R_out=15)" ) ) pattern_compare$Signal <- ifelse(pattern_compare$signal_1 == 1, "Signal", "Background") ggplot(pattern_compare, aes(X, Y, fill = Signal)) + geom_tile() + scale_fill_manual(values = c("Background" = "white", "Signal" = "#377eb8")) + coord_fixed() + facet_wrap(~ Pattern) + theme_void() + theme(legend.position = "none") + ggtitle("Pattern Comparison (signal_1)") ``` ## Low-frequency Only Analysis ```{r low-frequency-analysis} # Compare using only low-frequency components similarity_low <- runSGCC("signal_1", "signal_2", SG = SG, low_only = TRUE, return_parts = TRUE) cat("Low-frequency Only Similarity Analysis:\n") cat(sprintf("Low-frequency similarity: %.4f\n", similarity_low$c_low)) cat(sprintf("Overall score (same as low-freq): %.4f\n", similarity_low$S)) cat("Note: Non-low-frequency components are ignored (c_nonlow = NA)\n") # Compare with full analysis cat("\nComparison:\n") cat(sprintf("Full analysis similarity: %.4f\n", similarity_within$S)) cat(sprintf("Low-only similarity: %.4f\n", similarity_low$S)) cat(sprintf("Difference: %.4f\n", abs(similarity_within$S - similarity_low$S))) ``` ## Multiple Signal Analysis ```{r multiple-signals} # Analyze energy distribution for both signals energy_signal1 <- sgwt_energy_analysis(SG, "signal_1") energy_signal2 <- sgwt_energy_analysis(SG, "signal_2") # Combine for comparison energy_comparison <- rbind(energy_signal1, energy_signal2) # Plot comparison ggplot(energy_comparison, aes(x = scale, y = energy_ratio, fill = signal)) + geom_bar(stat = "identity", position = "dodge", alpha = 0.7) + geom_text(aes(label = paste0(round(energy_ratio * 100, 1), "%")), position = position_dodge(width = 0.9), vjust = -0.5, size = 3) + scale_fill_viridis_d() + labs(title = "Energy Distribution Comparison: Signal 1 vs Signal 2", x = "Scale", y = "Energy Ratio", fill = "Signal") + theme_minimal() + theme(axis.text.x = element_text(angle = 45, hjust = 1)) ``` ## Similarity Space Visualization with Multiple Patterns ```{r similarity-visualization, fig.width=7, fig.height=6} # Generate 9 paired patterns using simulate_multiscale cat("Generating 9 paired patterns for similarity analysis...\n") patterns_9 <- simulate_multiscale( grid_size = 40, n_centers = 1, Ra_seq = c(3, 6, 9), # Inner circle radii n_steps = 3, # 3 shrinkage steps outer_start = 20, # Fixed starting outer radius seed = 123 ) cat("Generated", length(patterns_9), "patterns\n") cat("Pattern names:", names(patterns_9), "\n") # Create SGWT objects for all 9 patterns and compute similarities similarity_results <- list() sgwt_objects <- list() cat("\nProcessing patterns and computing SGWT analysis...\n") # Process each pattern for (i in seq_along(patterns_9)) { pattern_name <- names(patterns_9)[i] pattern_data <- patterns_9[[i]] cat("Processing", pattern_name, "...\n") # Create SGWT object SG_temp <- initSGWT( data.in = pattern_data, x_col = "X", y_col = "Y", signals = c("signal_1", "signal_2"), J = 4, kernel_type = "heat" ) # Build graph and run SGWT SG_temp <- runSpecGraph(SG_temp, k = 12, laplacian_type = "normalized", length_eigenvalue = 50, verbose = FALSE) SG_temp <- runSGWT(SG_temp, verbose = FALSE) sgwt_objects[[pattern_name]] <- SG_temp # Compute within-pattern similarity (signal_1 vs signal_2) sim_within <- runSGCC("signal_1", "signal_2", SG = SG_temp, return_parts = TRUE) similarity_results[[pattern_name]] <- sim_within } # Print similarity results summary cat("\nSimilarity Results Summary (signal_1 vs signal_2 within each pattern):\n") similarity_df <- data.frame( Pattern = names(similarity_results), S = sapply(similarity_results, function(x) x$S), c_low = sapply(similarity_results, function(x) x$c_low), c_nonlow = sapply(similarity_results, function(x) x$c_nonlow), w_low = sapply(similarity_results, function(x) x$w_low), w_NL = sapply(similarity_results, function(x) x$w_NL), stringsAsFactors = FALSE ) # Extract Ra and Step values for better labeling similarity_df$Ra <- as.numeric(gsub(".*Ra_([0-9.]+)_Step.*", "\\1", similarity_df$Pattern)) similarity_df$Step <- as.numeric(gsub(".*Step_([0-9]+)$", "\\1", similarity_df$Pattern)) similarity_df$Label <- paste0("Ra=", similarity_df$Ra, ",Step=", similarity_df$Step) # Create similarity space visualization similarity_plot <- visualize_similarity_xy( similarity_results, point_size = 4, point_color = "steelblue", add_diagonal = TRUE, add_axes_lines = TRUE, title = "SGWT Similarity Space: 9 Paired Patterns (signal_1 vs signal_2)", show_labels = TRUE ) print(similarity_plot) # Analysis of patterns cat("\nPattern Analysis:\n") cat("Ra (Inner Radius) Effect:\n") for (ra in unique(similarity_df$Ra)) { subset_data <- similarity_df[similarity_df$Ra == ra, ] cat(sprintf(" Ra=%g: Mean c_low=%.3f, Mean c_nonlow=%.3f, Mean S=%.3f\n", ra, mean(subset_data$c_low), mean(subset_data$c_nonlow), mean(subset_data$S))) } cat("\nStep (Shrinkage Step) Effect:\n") for (step in unique(similarity_df$Step)) { subset_data <- similarity_df[similarity_df$Step == step, ] cat(sprintf(" Step=%g: Mean c_low=%.3f, Mean c_nonlow=%.3f, Mean S=%.3f\n", step, mean(subset_data$c_low), mean(subset_data$c_nonlow), mean(subset_data$S))) } cat("\nSimilarity Space Interpretation:\n") cat("- Each point represents similarity between signal_1 (inner circle) and signal_2 (outer ring)\n") cat("- Color indicates inner radius (Ra): different colors for different inner radii\n") cat("- Shape indicates shrinkage step: circle/triangle/square for different steps\n") cat("- Position shows frequency-domain similarity characteristics\n") ``` # Advanced Workflow Features ## Custom Parameters ```{r custom-parameters} # Demonstrate advanced parameter customization SG_advanced <- initSGWT( data.in = demo_pattern, x_col = "X", y_col = "Y", signals = "signal_1", J = 6, # More scales for finer analysis scaling_factor = 1.5, # Closer scales kernel_type = "heat" # Heat kernel ) SG_advanced <- runSpecGraph(SG_advanced, k = 15, laplacian_type = "randomwalk",length_eigenvalue = 30, verbose = FALSE) SG_advanced <- runSGWT(SG_advanced, verbose = FALSE) cat("Advanced Parameters Results:\n") cat("Number of scales:", length(SG_advanced$Parameters$scales), "\n") cat("Scales:", paste(round(SG_advanced$Parameters$scales, 4), collapse = ", "), "\n") cat("Reconstruction error:", round(SG_advanced$Inverse$signal_1$reconstruction_error, 6), "\n") ``` # Summary ## Key Takeaways 1. **New Object-Oriented Workflow**: Clear separation of initialization, graph building, analysis, and similarity computation 2. **Flexible Input**: Supports custom column naming for coordinates and signals 3. **Multiple Kernels**: Choose from Mexican Hat, Meyer, or Heat kernel families 4. **Comprehensive Similarity**: Energy-normalized weighted similarity with detailed diagnostics 5. **Advanced Analysis**: Low-frequency only analysis, cross-object comparison, multiple signal support ## New Workflow Benefits ```r # Complete workflow in 5 clear steps: SG <- initSGWT(data, signals = c("signal1", "signal2")) # 1. Initialize kband <- checkKband(SG, signals = c("signal1", "signal2")) # 2. Check k-band (optional) SG <- runSpecGraph(SG, k = 12, laplacian_type = "normalized") # 3. Build graph SG <- runSGWT(SG) # 4. Run SGWT similarity <- runSGCC("signal1", "signal2", SG = SG) # 5. Compare patterns ``` ## Parameter Guidelines - **Small datasets (<200 points)**: k=8-12, J=3-4 - **Medium datasets (200-1000 points)**: k=12-20, J=4-6 - **Large datasets (>1000 points)**: k=15-25, J=5-8 ## Next Steps - Explore your own spatial datasets using the demonstrated workflow - Experiment with different kernel types and parameters - Use comprehensive similarity analysis for comparative studies - Refer to function documentation for detailed parameter specifications --- *BioGSP Package - Biological Graph Signal Processing for Spatial Data Analysis*