
An Interactive Gene Set Enrichment Interface: The Gene Set Matrix
The first of the two questions was more important to us. We therefore used a recently described algorithm (GSA), which has improved sensitivity and specificity over "classical" GSEA, to assess the differences. We also used a more stringent significance criterion (False Discovery Rate [FDR] < 10%, compared with the suggested FDR < 25% cutoff in the GSEA software tool).
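The effect of the stricter cutoff can be illustrated with a minimal sketch. It assumes that each gene set already has an FDR value from the enrichment analysis; the function name `significant_gene_sets`, the gene set names, and the FDR values are hypothetical and serve only as illustration. The sketch does not reproduce how GSA estimates the FDR.

```python
def significant_gene_sets(fdr_by_gene_set, fdr_cutoff=0.10):
    """Return the names of gene sets whose FDR is below the cutoff.

    fdr_by_gene_set: dict mapping a gene set name to its FDR (0.0 to 1.0),
    assumed to have been computed beforehand by the enrichment tool.
    """
    return sorted(
        name for name, fdr in fdr_by_gene_set.items() if fdr < fdr_cutoff
    )


# Hypothetical FDR values for one cluster-versus-cluster comparison.
toy_fdr = {"SET_A": 0.03, "SET_B": 0.12, "SET_C": 0.24, "SET_D": 0.40}

strict = significant_gene_sets(toy_fdr, fdr_cutoff=0.10)   # cutoff used here
lenient = significant_gene_sets(toy_fdr, fdr_cutoff=0.25)  # GSEA suggestion
print(strict)
print(lenient)
```

With these toy values, only one gene set passes the 10% criterion, whereas three would pass the 25% cutoff.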
We compared each computationally defined cluster with every other cluster. The results can be plotted as a matrix of pairwise values, where each number indicates the number of gene sets found to be significantly differentially regulated between two sample clusters.
Each comparison has two aspects:
first, the gene sets identified as up-regulated in the upper sample cluster compared with the lower sample cluster; second, the gene sets found to be enriched in the lower sample cluster compared with the upper sample cluster.
We then use the Gene Set Analysis Matrix (GSA Matrix) as an interface that gives easy access to the gene sets found to be enriched between two clusters.
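The pairwise matrix described here can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation behind the interface. It assumes that the clusters are listed in their order along the diagonal, and that the significant gene sets for each direction of a comparison are already available. The function name `build_gsa_matrix`, the cluster labels, and the gene set names are hypothetical.

```python
from itertools import combinations


def build_gsa_matrix(clusters, enriched_sets):
    """Build a square matrix of gene set counts for all cluster pairs.

    clusters: list of cluster labels in their order along the diagonal.
    enriched_sets: dict mapping (a, b) to the set of gene set names that
        are significantly up-regulated in cluster a compared with cluster b.
    Cell [i][j] above the diagonal holds the count for the upper cluster,
    cell [j][i] below the diagonal holds the count for the lower cluster.
    """
    n = len(clusters)
    index = {label: i for i, label in enumerate(clusters)}
    matrix = [[0] * n for _ in range(n)]
    for upper, lower in combinations(clusters, 2):
        i, j = index[upper], index[lower]
        matrix[i][j] = len(enriched_sets.get((upper, lower), ()))
        matrix[j][i] = len(enriched_sets.get((lower, upper), ()))
    return matrix


# Hypothetical results for three clusters (toy data only).
toy_clusters = ["C1", "C2", "C3"]
toy_enriched = {
    ("C1", "C2"): {"SET_A", "SET_B"},
    ("C2", "C1"): {"SET_C"},
    ("C2", "C3"): {"SET_D"},
    ("C3", "C1"): {"SET_E", "SET_F", "SET_G"},
}

gsa_matrix = build_gsa_matrix(toy_clusters, toy_enriched)
for row in gsa_matrix:
    print(row)
```

Keeping the full sets of gene set names in `enriched_sets`, rather than only the counts, means the same structure can supply the detailed list behind each tile.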
In this part of the study we compared each computationally defined NMF cluster with every other cluster. We address two questions with this approach:
1. Are there actual biologically meaningful differences between the computationally defined sNMF clusters?
This question is answered by the number of gene sets enriched in one sample cluster versus the other.
2. What is the biological basis of these differences?
This question is answered by the identity and biological implication of a differentially enriched gene set, which can tell us which predefined molecular pathways or downstream transcription factor targets are enriched in one cluster versus the other.
In this part of the study we defined "biologically meaningful differences" as a significant differential regulation ("enrichment") of gene sets. Gene sets are either based on experimental evidence or were curated to represent "textbook knowledge" (e.g. the Krebs cycle). Gene sets are a powerful tool for analyzing microarray datasets against externally validated biological knowledge, thus avoiding circular reasoning in our study.
Gene sets were taken from two public databases, MSigDB2 and the Stanford Synthetic Gene Collection (~3000 gene sets altogether).
The light grey block at the upper intersection of two clusters represents the number of gene sets up-regulated in the upper cluster; the light grey block at the lower intersection of the two clusters represents the number of gene sets up-regulated in the other cluster compared with the first.
Figure 1 illustrates this display method. Clicking on a comparison tile returns a detailed list of the pathways enriched in the upper and the lower cluster on the diagonal. The first column in each comparison block lists the gene sets enriched in the upper cluster; the second column lists those enriched in the lower cluster. Clicking on a cluster on the diagonal returns a web page that displays a TreeMap representation of the sparse NMF cluster consensus summary statistics.
Figure 1: Gene Set Matrix Explanation (panels A, B, C)