Task 1: provide a unified representation of single-cell data
Challenges:
- Hundreds of scRNA-seq software tools 
 
- Most R and Bioconductor packages define their own class
 - Some extend SummarizedExperiment, some ExpressionSet
 - Most packages don’t fully exploit the potential of SummarizedExperiment (e.g., assay does not have to be a matrix)
 
Proposed solutions:
- Create a class for developers to extend: SingleCellExperiment
 
Useful Bioconductor packages and other resources:
Task 2: scale-up of existing tools / implementation of tools to handle large-scale datasets
Challenges:
- Most existing tools scale only to thousands of cells.
 - 10X Genomics released a dataset of 1.3 million cells!
 - Main problem: the data do not fit in memory!
 
Proposed solutions:
- HDF5 files + "chunk operations"
 - Simple algorithms + approximate, scalable methods
 - Provide API to perform common operations independent of data representation (in memory vs. on disk)
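
As a rough illustration of the last two points, here is a minimal Python sketch of a "chunk operation" written against a backend-independent interface. All names (`ChunkedMatrix`, `InMemoryMatrix`, `OnDiskMatrix`, `column_sums`) are hypothetical and are not part of any Bioconductor package. A plain CSV file stands in for HDF5 so the sketch needs only the standard library; a real implementation would presumably read HDF5 chunks instead.

```python
import csv
import os
import tempfile
from typing import Iterator, List, Protocol

Row = List[float]


class ChunkedMatrix(Protocol):
    """Hypothetical interface: any backend that can yield blocks of rows."""

    def iter_row_chunks(self, chunk_size: int) -> Iterator[List[Row]]:
        ...


class InMemoryMatrix:
    """Backend that holds the whole cells-by-genes matrix in RAM."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows

    def iter_row_chunks(self, chunk_size: int) -> Iterator[List[Row]]:
        for start in range(0, len(self.rows), chunk_size):
            yield self.rows[start:start + chunk_size]


class OnDiskMatrix:
    """Backend that streams rows from a CSV file (assumed stand-in for HDF5)."""

    def __init__(self, path: str) -> None:
        self.path = path

    def iter_row_chunks(self, chunk_size: int) -> Iterator[List[Row]]:
        chunk: List[Row] = []
        with open(self.path, newline="") as handle:
            for record in csv.reader(handle):
                chunk.append([float(value) for value in record])
                if len(chunk) == chunk_size:
                    yield chunk
                    chunk = []
        if chunk:  # final, possibly shorter, chunk
            yield chunk


def column_sums(matrix: ChunkedMatrix, chunk_size: int = 2) -> Row:
    """Per-gene totals, keeping only one chunk of cells in memory at a time."""
    totals: Row = []
    for chunk in matrix.iter_row_chunks(chunk_size):
        for row in chunk:
            if not totals:
                totals = [0.0] * len(row)
            for j, value in enumerate(row):
                totals[j] += value
    return totals


# Toy data: 5 cells (rows) by 3 genes (columns).
counts = [[1.0, 0.0, 2.0], [0.0, 3.0, 1.0], [4.0, 0.0, 0.0],
          [0.0, 1.0, 1.0], [2.0, 2.0, 0.0]]

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "counts.csv")
    with open(csv_path, "w", newline="") as handle:
        csv.writer(handle).writerows(counts)

    # The same function runs unchanged on both representations.
    in_memory_totals = column_sums(InMemoryMatrix(counts))
    on_disk_totals = column_sums(OnDiskMatrix(csv_path))
```

The point of the design is that `column_sums` never asks where the data live: it only sees chunks, so peak memory is bounded by the chunk size rather than by the number of cells.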
 
Useful Bioconductor packages and other resources:
Interested in contributing? Join the Slack channel:
https://community-bioc.slack.com
Discussion points
- Benchmark (canonical datasets)
 - Splatter (simulations of scRNA-seq)
 - What to do next?
 - BigDataAlgorithms: define the scope and the functionality we want
 
- Prior art in astronomy, etc?
 
- Visualization?
 - Multi assay?
 
- People are running single-cell assays that generate multiple types of data (e.g., RNA expression and methylation) from each single cell.
 - Can store each assay in a SingleCellExperiment and then put inside a MultiAssayExperiment to link up the row and column metadata.
 
- Multiple samples: a list of SingleCellExperiments vs. one giant joined SingleCellExperiment. Can we learn from flowSet?