Fill in the Gene and Tissue values and click the (Re)Draw Plot! button. It may take a few seconds for the plot and table to appear.



Word Clouds of most common enriched GO terms

GO Term Enrichment





Heatmap of developing retina. Rows are genes, columns are age (in days) of retina or organoid.





Gene Expression Levels across Developing Fetal and Organoid Retina (days)

Mouse retina Single Cell Gene Expression Statistics by User Selected Gene(s)
Please be patient with this section, as calculating the density plots for all genes across all cell types can take several seconds.




Heatmap
Table Statistics

ND = Not Detected

tSNE Clustering of bulk human RNA-seq samples
Each point is an RNA-sequenced tissue sample. Similar tissues cluster together.

The perplexity may be viewed as a knob that sets the number of effective nearest neighbors. It is comparable with the number of nearest neighbors k that is employed in many manifold learners.
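The "effective nearest neighbors" reading of perplexity can be made concrete. In t-SNE, each point gets a Gaussian neighborhood whose width is tuned until the perplexity of its neighbor distribution (2 raised to its entropy in bits) equals the chosen value; if a point has k equally close neighbors and the rest are far away, the perplexity comes out close to k. The plain-Python sketch below illustrates only that calibration step. It is not the code that produced this plot, and the function names are hypothetical.

```python
import math
from typing import List


def neighbor_probs(dists_sq: List[float], sigma: float) -> List[float]:
    """Gaussian conditional probabilities p(j|i) from squared distances to the other points."""
    d_min = min(dists_sq)  # shift by the nearest distance so the largest weight is exp(0) = 1
    weights = [math.exp(-(d - d_min) / (2.0 * sigma * sigma)) for d in dists_sq]
    total = sum(weights)
    return [w / total for w in weights]


def perplexity(probs: List[float]) -> float:
    """Perplexity = 2 ** H, where H is the Shannon entropy in bits."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2.0 ** entropy


def calibrate_sigma(dists_sq: List[float], target: float,
                    tol: float = 1e-5, max_iter: int = 200) -> float:
    """Binary-search (on a log scale) the kernel width that gives the target perplexity."""
    lo, hi = 1e-10, 1e10
    sigma = 1.0
    for _ in range(max_iter):
        sigma = math.sqrt(lo * hi)  # geometric midpoint: sigma can span many orders of magnitude
        perp = perplexity(neighbor_probs(dists_sq, sigma))
        if abs(perp - target) < tol:
            break
        if perp > target:
            hi = sigma  # too many effective neighbors: narrow the kernel
        else:
            lo = sigma  # too few effective neighbors: widen the kernel
    return sigma
```

For example, with five neighbors at distance 0 and 95 far away, `perplexity(neighbor_probs([0.0] * 5 + [100.0] * 95, 1.0))` is about 5, matching the five "effective" neighbors.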

Please be patient, as over 1200 interactive data points are being loaded.



Visualization of single cell mouse retina sample expression patterning (tSNE or UMAP based)
Each point is an individual cell, with dimensionality reduction and cell labelling done by the publishing scientists. Similar cell types cluster together. Darker points are cells that express detectable levels of the user-selected gene. The user can also select the minimum expression level of the gene. For most genes, expression levels range from 0 to 5 per cell.
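The "darker if detectable" rule can be written down as a tiny function. This is a minimal sketch, not the app's plotting code: the function name and the two opacity values are hypothetical, and we assume "detectable" means an expression value above zero that also meets the user-selected minimum.

```python
from typing import Dict


def cell_opacity(expression: Dict[str, float], min_expression: float = 0.0,
                 dark: float = 0.9, light: float = 0.1) -> Dict[str, float]:
    """Map each cell ID to an opacity: dark if the selected gene is detected
    (above zero) at or above the user-selected minimum, light otherwise."""
    return {
        cell: dark if value > 0 and value >= min_expression else light
        for cell, value in expression.items()
    }
```

Raising `min_expression` moves weakly expressing cells into the light background group without changing the cell coordinates.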



Retina Network:

Table of pair-wise gene connection strength
Table of gene connectivities within module
Table of enriched GO terms
RPE Network

Table of pair-wise gene connection strength
Table of gene connectivities within module
Table of enriched GO terms

References

Bulk Tissue Gene (or transcript (tx)) Raw Count Matrices

Rows are genes, columns are samples, values are raw counts as calculated by Salmon.

Bulk Tissue Gene (or transcript (tx)) Expression Matrices

Rows are genes, columns are samples, values are length-scaled Transcripts Per Million (TPM) as calculated by tximport.
$$X_i = \frac{\text{count of reads mapped to gene } i \times 10^{3}}{\text{length of gene } i \text{ in bp}}$$
$$TPM_i = \frac{X_i}{\sum_j X_j} \times 10^{6}$$
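The two TPM formulas translate directly into code. This is a plain transcription for illustration, assuming per-gene read counts and gene lengths in bp as inputs; it is not the tximport implementation, and the function name is hypothetical.

```python
from typing import Dict


def tpm_from_counts(counts: Dict[str, float], lengths_bp: Dict[str, float]) -> Dict[str, float]:
    """Transcripts Per Million from raw counts and gene lengths (bp)."""
    # X_i: reads per kilobase of gene length
    x = {gene: counts[gene] * 1e3 / lengths_bp[gene] for gene in counts}
    total = sum(x.values())
    # Rescale so the values for one sample sum to one million
    return {gene: xi / total * 1e6 for gene, xi in x.items()}
```

For example, counts of 100, 200, and 300 on genes of 1,000, 2,000, and 500 bp give X values of 100, 100, and 600, and therefore TPMs of 125,000, 125,000, and 750,000. Because of the length normalization, the shortest gene gets the largest share even though it has the most reads by only a factor of three.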

De novo transcriptome data

Everything

All of the data and code for this entire web application can be retrieved by following the simple directions here.

Missing anything?

If there's some data you want available for easy download, let me know.


eyeIntegration v1.05

Mission

The human eye has several specialized tissues which direct, capture, and pre-process information to provide vision. RNA-seq gene expression analyses have been used extensively, for example, to profile specific eye tissues and, in large consortium studies like the GTEx project, to study tissue-specific gene expression patterning.

However, there has not been an integrated study of the expression patterning of multiple eye tissues together with other human body tissues.

We have collated healthy human RNA-seq datasets that were publicly available as of January 12th, 2017 and January 1st, 2019, along with a substantial subset of the GTEx project RNA-seq datasets, and processed them all with a consistent bioinformatic workflow. We use this fully integrated dataset to probe the relatedness and biological processes across the cornea, retina, RPE (choroid), and the rest of the human tissues with differential expression, clustering, and GO term enrichment tools. We also leverage our large collection of retina and RPE (choroid) tissues to build the first human weighted gene correlation networks and use them to highlight known biological pathways and eye disease gene enrichment.

Basic Statistics

We make these data, analyses, and visualizations available here with a powerful interactive web application.

Attribution

This project was conceived and implemented by David McGaughey in OGVFB / NEI / NIH. The retina and RPE gene networks along with their accompanying web pages were constructed by John Bryan.

The 2019 automated pipeline datasets were built by Vinay Swamy.

Our analysis of the data in eyeIntegration has been published in Human Molecular Genetics. The manuscript is available here. If you use this resource in your research, we would appreciate a citation.

We also strongly encourage citation of the publications behind the datasets used in this resource. A full list can be found here.

Source Code

The source code and data for this web application are available here.

Problems?

First, check the FAQ by clicking on Information in the header above, then on FAQs.

Other issues can be reported in two ways: email or the GitLab Issue Tracker.




2020-02-14 | v1.05

Updated DNTx to v01. Removed v00 as we have made SUBSTANTIAL improvements to the precision and reliability of the results. We do not recommend v00 be used.

2020-01-31 | v1.04

Added raw counts to the data download, as there were a handful of requests from users. Also changed the data repo link from GitLab to GitHub.

2019-06-09 | v1.04

Big update, which addresses (I hope) the comments from the IOVS reviewers. More GTEx samples added per tissue. New GTEx tissues added (bladder, bone marrow, cervix uteri, fallopian tube, ovary, prostate, testis, uterus, and vagina). Ratnapriya et al. AMD retina dataset added (MGS 1 == normal; 2, 3, and 4 are increasing severity of AMD). Modified lengthScaledTPM scores to adjust for tissue design and use mapping rate as a covariate with limma's removeBatchEffect() function. The differential expression test now uses mapping rate as a covariate. On the UI side, we now use a fixed (consistent) color scheme for tissues in the box plots. Updated the summary stats on the loading page with the new tissues and numbers. Also showing GTEx count differences from 2017 to 2019.

2019-04-26 | v1.03

Added the prototype ocular de novo transcriptomes Vinay Swamy has built as a database option for the pan-tissue visualizations and the data tables. We also make the de novo gene models (GTF) and sequence (fasta) available for download in the Data -> Data Download section. Again, this is version 00 prototype data and will change in the future. Depending on how much the project develops, we will expand eyeIntegration to show more information on the de novo transcript models or move parts of this project out into a new web site.

2019-03-06 | v1.02

Tweaked Retina Stem Cell / Organoid sample labelling. Added temporal heatmap for retina fetal and organoid time points. Modified bulk RNA-seq heatmap to use ComplexHeatmap, which handles long column names better. Optional row and column clustering for the heatmaps.

2019-01-16 | v1.01

Fixed some tissue mislabels in EiaD 2019, removed unused sub-tissue, added eye sub-tissue vs human body tissue differential tests and GO term enrichments. Updated the workflow and 2017 to 2019 tables on the main loading page.

2019-01-16 | v1.00

Version 1.0! We introduce a huge set of updates, including a new 2019 dataset with 224 new eye samples, four new eye tissue categories, non-protein coding quantification, heatmap visualization, custom user shortcuts, quick gene information links, and easy data downloads. We will soon have a bioRxiv preprint describing the new automated pipeline that underlies the 2019 dataset. For users wanting to compare previous work done on eyeIntegration, the 2017 dataset is available as a versioned option.

2018-10-12 | v0.73

Now using heatmaps for SC RNA-seq data.

2018-10-04 | v0.72

Engineering changes to make the site more responsive.

2018-09-28 | v0.70

Major changes! Can now select transcript level gene expression, and Clark et al. bioRxiv 2018 mouse retina time series scRNA-seq data added!

2018-08-21 | v0.63

Updated manuscript link to Human Molecular Genetics advance print and tweaked boxplot plotting size logic

2018-01-09 | v0.62

Table data added for single cell data

2017-11-09 | v0.61

More granular single cell plots and tables in the Gene Expression section

2017-09-29 | v0.60

Mouse retina single cell data now available in the Gene Expression and 2D Clustering sections! Also the Pan-Tissue Expression section now has metadata for each sample point on mouseover!

2017-05-23 | v0.51

Now the user can export data from any table

2017-05-17 | v0.5

SQLite used on the backend for the largest data files to reduce initialization time and memory usage. FAQ section added

2017-04-13 | v0.4

Added full network edge tables for retina and RPE

2017-04-08 | v0.3

Network visualizations changed (gene names are nodes, layouts cleaned up a little), boxplot re-plot button more visible, on load now defaults to the info | overview page
We will soon have a pre-print describing the new 2019 automated data analysis workflow. The code-base for the build can be found here.
First you pick the dataset (2017 or 2019) [2]. Then you can tweak the 'Genes' [3] and 'Tissues' [4] by clicking in them and starting to type (allowed values will auto-fill). You can also delete values by clicking on them and hitting the 'delete' key on your keyboard. You can tweak the display of the box plots a bit by changing the 'Number of columns' field [5]; a higher number squeezes more plots side by side. When you are done tweaking those parameters, click the big blue '(Re)Draw Plot!' button [6] and wait a few seconds.

If you mouse over a data point [8], you will get metadata about that particular sample.



Each gene gets its own box. The y-axis is length-scaled TPM (log2 transformed). The x-axis is samples, colored by tissue. The right panels are tables: [9] has external links to gene information, and [10] has the absolute TPM values and the rank of the gene in each tissue (a lower rank means higher expression).



To give a rough sense of how highly expressed the gene is in the tissue, the decile of expression is given in [10]; 10 is the highest decile of expression and 1 is the lowest.
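The rank and decile columns can be sketched as follows. This assumes ranks are computed within one tissue on TPM values (rank 1 is the most highly expressed gene) and that deciles split the ranks into ten equal-sized bins; the app's exact binning and tie handling are not stated here, so treat this as a hypothetical illustration.

```python
import math
from typing import Dict, Tuple


def rank_and_decile(tpm: Dict[str, float]) -> Dict[str, Tuple[int, int]]:
    """Rank genes within one tissue (1 = highest TPM) and assign a decile
    from 10 (top 10% of ranks) down to 1 (bottom 10%)."""
    ordered = sorted(tpm, key=tpm.get, reverse=True)
    n = len(ordered)
    result = {}
    for rank, gene in enumerate(ordered, start=1):
        # rank * 10 / n falls in (0, 1] for the top tenth, (1, 2] for the next, ...
        decile = 11 - math.ceil(rank * 10 / n)
        result[gene] = (rank, decile)
    return result
```

With 20 genes, ranks 1 and 2 land in decile 10, ranks 3 and 4 in decile 9, and ranks 19 and 20 in decile 1.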



This is a very simple differential expression test. When you click this radio button [1], the view changes to show expression *relative* to the reference tissues [2]. By default, the reference is all of the tissues; in the screenshot, Retina - Organoid and Retina - Stem Cell Line are the baseline expression samples. The data table on the right [3] then displays the log2 fold change, average expression, and p-value (from a t-test) of the differential expression.
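For a single gene, the first numbers in that table can be approximated as below. This sketch assumes log2(TPM + 1) values and a Welch (unequal variance) t-statistic; the pseudocount, the t-test variant, and the function name are all assumptions, not the app's documented method. The p-value would then come from a t distribution, for example via `scipy.stats.ttest_ind(..., equal_var=False)`, which is left out to keep the sketch dependency-free.

```python
import math
import statistics
from typing import Sequence, Tuple


def simple_de(target: Sequence[float], reference: Sequence[float],
              pseudocount: float = 1.0) -> Tuple[float, float, float]:
    """Compare one gene's TPM values in the selected tissue samples against the
    reference samples. Returns (log2 fold change, average log2 expression, Welch t).
    Each group needs at least two samples for the variance."""
    t_log = [math.log2(v + pseudocount) for v in target]
    r_log = [math.log2(v + pseudocount) for v in reference]
    log2fc = statistics.mean(t_log) - statistics.mean(r_log)
    ave_expr = statistics.mean(t_log + r_log)
    # Welch's standard error: each group keeps its own sample variance
    se = math.sqrt(statistics.variance(t_log) / len(t_log)
                   + statistics.variance(r_log) / len(r_log))
    return log2fc, ave_expr, log2fc / se
```

For example, target TPMs of 15, 31, and 63 against reference TPMs of 3, 7, and 15 are 4, 5, 6 versus 2, 3, 4 on the log2(TPM + 1) scale, giving a log2 fold change of 2 and an average expression of 4.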



This produces a 2D visualization, with each gene as a row and each tissue as a column. More yellow is more expressed. It is an efficient way to display the expression of many genes and tissues.
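Behind a heatmap like this is a genes-by-tissues matrix. A minimal sketch of building one, assuming each cell holds the mean log2(TPM + 1) of the samples in that tissue (the app's exact summary and color scaling are not stated here); the function name and the toy gene names are hypothetical.

```python
import math
import statistics
from collections import defaultdict
from typing import List, Tuple


def heatmap_matrix(samples: List[Tuple[str, str, float]], pseudocount: float = 1.0):
    """samples: (gene, tissue, TPM) for every sample. Returns row labels (genes),
    column labels (tissues), and the matrix of mean log2(TPM + pseudocount).
    Gene/tissue combinations with no samples are NaN."""
    values = defaultdict(list)
    for gene, tissue, tpm in samples:
        values[(gene, tissue)].append(math.log2(tpm + pseudocount))
    genes = sorted({g for g, _ in values})
    tissues = sorted({t for _, t in values})
    matrix = [
        [statistics.mean(values[(g, t)]) if (g, t) in values else float("nan")
         for t in tissues]
        for g in genes
    ]
    return genes, tissues, matrix
```

A plotting library then maps each matrix value to a color, with higher values drawn more yellow.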



If you use the Pan-Tissue Boxplot feature a lot, you may find it frustrating to have to enter your favorite genes and tissues each time. We have added the ability to use a custom URL to load the genes and tissues of your choice. Previously you had to build this link yourself, but now there's a handy button [7] you can click that will create a link that re-creates your current parameters. One downside is that the web app continually reads the dataset from the URL, which makes it impossible for you to change it within the app. Instead, simply reload the web page with different custom parameters in the URL.
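A shortcut link is just the selections encoded as URL query parameters. The sketch below uses Python's standard `urllib.parse`; the parameter names (`dataset`, `genes`, `tissues`), the comma separator, and the base URL are hypothetical, so use the button [7] to get the real link format for this app.

```python
from typing import List, Tuple
from urllib.parse import parse_qs, urlencode, urlparse


def build_shortcut_url(base_url: str, dataset: str,
                       genes: List[str], tissues: List[str]) -> str:
    """Encode the selected dataset, genes, and tissues as URL query parameters.
    Hypothetical parameter names; urlencode escapes spaces and commas."""
    query = urlencode({
        "dataset": dataset,
        "genes": ",".join(genes),
        "tissues": ",".join(tissues),
    })
    return f"{base_url}?{query}"


def read_shortcut_url(url: str) -> Tuple[str, List[str], List[str]]:
    """Recover the selections from a shortcut URL, as an app would on page load."""
    params = parse_qs(urlparse(url).query)
    return (params["dataset"][0],
            params["genes"][0].split(","),
            params["tissues"][0].split(","))
```

Because the app re-reads the parameters on every load, changing the selection means changing the URL itself, which is the downside described above.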



This is short for differential expression. We have pre-calculated 55+ differential expression tests. All eye tissue - origin pairs were compared to each other. We also have a synthetic human body set, made up of equal numbers of GTEx tissues (see the manuscript, above, for more details). The word cloud shows up to the 75 most common words in the enriched GO terms for the selected comparison. The table shows the actual GO terms. You can search for the comparison of your choice.
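The word cloud amounts to counting words across the enriched GO term names. A minimal sketch, assuming words are lowercased, split on non-alphanumeric characters, and filtered against a small stop-word list; the tokenization, the stop words, and the function name are our assumptions, not the app's documented settings.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical stop-word list: short connecting words that carry no biology
STOPWORDS = {"of", "the", "and", "to", "in", "by", "via", "for", "or"}


def top_words(go_terms: Iterable[str], n: int = 75) -> List[Tuple[str, int]]:
    """Return up to n (word, count) pairs, most common first."""
    counts = Counter()
    for term in go_terms:
        words = re.findall(r"[a-z0-9]+", term.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)
```

The counts then set the font size of each word in the cloud.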
These are the values taken from the limma differential expression topTable() summary table. The following has been taken from the limma manual and edited to match parameters we used (https://www.bioconductor.org/packages/devel/bioc/vignettes/limma/inst/doc/usersguide.pdf):

A number of summary statistics are presented by topTable() for the top genes and the selected contrast. The logFC column gives the value of the contrast. Usually this represents a log2-fold change between two or more experimental conditions although sometimes it represents a log2-expression level. The AveExpr column gives the average log2-expression level for that gene across all the arrays and channels in the experiment. Column t is the moderated t-statistic. Column P.Value is the associated p-value and adj.P.Value is the p-value adjusted for multiple testing (False Discovery Rate corrected).

The B-statistic (lods or B) is the log-odds that the gene is differentially expressed. Suppose for example that B = 1.5. The odds of differential expression is exp(1.5)=4.48, i.e., about four and a half to one. The probability that the gene is differentially expressed is 4.48/(1+4.48)=0.82, i.e., the probability is about 82% that this gene is differentially expressed. A B-statistic of zero corresponds to a 50-50 chance that the gene is differentially expressed. The B-statistic is automatically adjusted for multiple testing by assuming that 1% of the genes, or some other percentage specified by the user in the call to eBayes(), are expected to be differentially expressed. The p-values and B-statistics will normally rank genes in the same order. In fact, if the data contains no missing values or quality weights, then the order will be precisely the same.
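The conversion from B to a probability in the worked example above is the logistic function, p = exp(B) / (1 + exp(B)). A one-line check (the function name is hypothetical):

```python
import math


def b_to_probability(b: float) -> float:
    """Probability of differential expression from the log-odds B."""
    odds = math.exp(b)
    return odds / (1.0 + odds)
```

`b_to_probability(1.5)` gives about 0.82 and `b_to_probability(0.0)` gives 0.5, matching the examples above.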
The Macosko data is a single-cell (~45,000 cells) RNA-seq dataset of P14 C57BL/6 mouse retina from Macosko and McCarroll's field-defining paper. The cluster / cell type assignments are taken from here. The Clark data is a mouse retina single-cell RNA-seq time series dataset of over 100,000 cells. Their pre-publication manuscript is on bioRxiv. Data was pulled from here.

To efficiently display a huge amount of information, expression across many individual cells is averaged by cell type, age (if available), and gene. You can select the Macosko or Clark dataset [1], then one gene [2] to plot. The gene expression is displayed as a heatmap, with each row being a retina cell type (as assigned by the respective authors) and each column [4] being a time point, arranged from youngest to oldest. More yellow is higher expression [5].
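The averaging step can be sketched as a group-by over cells. This assumes one record per cell (cell type, age in days, expression of the selected gene) and numeric ages, so sorting puts the columns in youngest-to-oldest order. It also computes the percentage of cells with expression above zero, which is our assumed definition of "detectable". Function names and toy values are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def summarize_cells(cells: Iterable[Tuple[str, int, float]]) -> Dict[Tuple[str, int], Tuple[float, float]]:
    """Group cells by (cell type, age) and return, for each group,
    (mean expression, percent of cells with expression > 0)."""
    groups = defaultdict(list)
    for cell_type, age, value in cells:
        groups[(cell_type, age)].append(value)
    return {
        key: (mean(vals), 100.0 * sum(v > 0 for v in vals) / len(vals))
        for key, vals in groups.items()
    }


def heatmap_columns(summary: Dict[Tuple[str, int], Tuple[float, float]]) -> List[int]:
    """Time points for the heatmap columns, youngest to oldest."""
    return sorted({age for _, age in summary})
```

Averaging first means the heatmap has one value per cell type and time point, instead of tens of thousands of individual cells.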




You can add the rank of expression (or the rank of the percentage of cells with detectable expression of the selected gene) with this radio button.



This shows the t-SNE tissue clustering for the bulk human eye tissues along with the GTEx dataset. Hovering the mouse over each data point will show the metadata. Changing the perplexity demonstrates how low values artificially create sub-groups, while higher values (above 30 or so) largely recapitulate tissue type. It also demonstrates that the tissue clustering is stable at higher perplexities.



Each data point is a single cell from the Macosko and McCarroll or the Clark and Blackshaw dataset [2]. Dimensionality reduction was done with the t-SNE (Macosko) or UMAP (Clark) algorithm. Cluster assignments were taken from the respective papers. While we ran t-SNE on the Macosko data ourselves, the Clark authors provided the UMAP coordinates. The Clark dataset was generated across multiple time points during development, so you can select time points of interest [4]. Only one gene can be selected at a time [3], as it is very computationally expensive to plot many points. Points (cells) expressing the gene of interest are plotted in a darker color [arrow]. Hovering over each point [6] will show the metadata for the cell.



This is a weighted gene expression correlation network. The gene expression information for all retina or all RPE tissues is used to identify gene pairs whose expression is correlated with each other. All of the pair-wise correlations are assessed to build a network of interactions.
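As a rough illustration, a weighted correlation network can be built by correlating every gene pair across samples and raising the absolute correlation to a soft-thresholding power, as in WGCNA. The power of 6, the unsigned network, and the function names are assumptions for this sketch; the parameters actually used for the retina and RPE networks are not given here, and a full WGCNA analysis adds further steps (such as topological overlap and module detection) that are not shown.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length expression vectors (non-constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def weighted_adjacency(expr: Dict[str, List[float]], power: int = 6) -> Dict[Tuple[str, str], float]:
    """expr: gene -> expression across the same samples.
    Returns |correlation| ** power for every gene pair, stored once per pair."""
    return {
        (g1, g2): abs(pearson(expr[g1], expr[g2])) ** power
        for g1, g2 in combinations(sorted(expr), 2)
    }
```

Raising the correlation to a power keeps strong correlations near their original strength while shrinking weak ones toward zero: 0.8 becomes about 0.26, but 0.3 becomes about 0.0007.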
We imagine the most common use is to search for your gene of interest (GOI). Simply type your GOI into the search box [1]. If it is not in the network, then the name will not appear. After selecting the GOI, the network will reload to display the module the gene is in, as well as several of the most correlated partners. You can adjust the number of displayed correlated genes by changing the K-nearest genes panel [2]. Hovering over a gene name in the network will display GO terms for the gene [3].
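Picking the displayed partners is a top-K selection over the gene's edges. A minimal sketch, assuming edge weights are stored once per unordered gene pair (a hypothetical layout, not the app's internal format); a gene with no edges returns an empty list, much as a gene outside the network does not appear in the search box.

```python
import heapq
from typing import Dict, List, Tuple


def k_nearest_genes(adjacency: Dict[Tuple[str, str], float], gene: str, k: int) -> List[str]:
    """Return the k partners of `gene` with the largest edge weights."""
    partners = []
    for (g1, g2), weight in adjacency.items():
        if gene == g1:
            partners.append((weight, g2))
        elif gene == g2:
            partners.append((weight, g1))
    # Ties in weight fall back to comparing gene names, so the result is deterministic
    return [partner for _, partner in heapq.nlargest(k, partners)]
```

Increasing K in the panel simply asks for more entries from the same ranked list.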

Unfortunately, we have no way of knowing which gene in a connected pair affects the other. Since the network algorithms use correlations, the gene to gene interactions have no directional information.
The count plot [1] simply shows the number of genes in each module. A gene can only be in one module. The pair-wise gene connection strength [2] shows the strongest gene partners for the selected gene. If a module search is selected, then this table shows all gene to gene edge connection strengths (higher is stronger) in the module. The gene connectivity table [3] shows the kWithin metric for each gene in the module, which denotes how connected (and important) the gene is across the module. The GO term table [4] shows the significant GO terms for the genes in the module. This allows you to get a sense of the function of the module.
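Both summaries follow from the module assignments and the edge weights. In WGCNA terms, kWithin is the sum of a gene's connection strengths to the other genes in its own module. The sketch below computes it, plus the module sizes shown in the count plot, assuming weights are stored once per gene pair; module names and data are toy values and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Tuple


def module_sizes(modules: Dict[str, str]) -> Counter:
    """Number of genes per module (each gene belongs to exactly one module)."""
    return Counter(modules.values())


def k_within(adjacency: Dict[Tuple[str, str], float], modules: Dict[str, str]) -> Dict[str, float]:
    """kWithin: sum of a gene's edge weights to the other genes in its own module."""
    k = {gene: 0.0 for gene in modules}
    for (g1, g2), weight in adjacency.items():
        if g1 in modules and g2 in modules and modules[g1] == modules[g2]:
            k[g1] += weight
            k[g2] += weight
    return k
```

Edges to genes in other modules do not count toward kWithin, which is why a gene with strong extramodular links can still have a low value.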




The edge table allows you to search for a gene and returns all significantly (edge weight > 0.01) correlated genes ACROSS the entire network. Using the 'Connections to show' radio button, you can control whether only extramodular connections, only intramodular connections, or both are included in the table.
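The edge table filter can be sketched as below, assuming edges are stored once per gene pair with their strength and every gene has a module label. The 0.01 cutoff is the one quoted above; the `connections` values mirror the radio button choices, but they and the function name are hypothetical.

```python
from typing import Dict, List, Tuple


def edges_for_gene(edges: Dict[Tuple[str, str], float], modules: Dict[str, str],
                   gene: str, threshold: float = 0.01,
                   connections: str = "both") -> List[Tuple[str, float, str]]:
    """Return (partner, strength, 'intramodular' or 'extramodular') rows for
    every edge of `gene` above the threshold, strongest first."""
    if connections not in {"intra", "extra", "both"}:
        raise ValueError("connections must be 'intra', 'extra', or 'both'")
    rows = []
    for (g1, g2), strength in edges.items():
        if gene not in (g1, g2) or strength <= threshold:
            continue
        partner = g2 if g1 == gene else g1
        kind = "intramodular" if modules.get(gene) == modules.get(partner) else "extramodular"
        if connections == "intra" and kind != "intramodular":
            continue
        if connections == "extra" and kind != "extramodular":
            continue
        rows.append((partner, strength, kind))
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Unlike the module view, this search is not limited to one module, so it can surface partners that sit in other modules.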
The data table shows, for each gene and tissue set the user selects, the most important metadata for each sample.
This data table set shows the data used to make the Mouse Single Cell Retina Expression plots in Gene Expression.