The blood-brain barrier (BBB) is a monolayer of endothelial cells that line brain capillaries. Protein assemblies, arranged in a complex architecture, seal the junctions between the endothelial cells to prevent passive diffusion of solutes between blood and brain. Moreover, the transendothelial transport of solutes is also heavily restricted. The BBB protects the brain by blocking the entry of harmful substances from blood and shielding the brain from peripheral fluctuations in hormones, fatty acids, and electrolytes. In doing so, the BBB maintains a highly regulated environment in the brain, which is critical for optimal neuronal function. Such a well-guarded BBB architecture also poses a formidable barrier to the delivery of drugs and contrast agents to the brain for the diagnosis and treatment of various neurological diseases.
In addition to functioning as a formidable barrier, the BBB serves as a major conduit for the delivery of crucial nutrients and growth factors needed for the upkeep of brain physiology. Moreover, the BBB aids in the clearance of metabolites from the brain. To achieve these distinct functions, the BBB endothelium is highly specialized in handling material transport and cellular signaling. The high-fidelity cellular apparatus of the BBB endothelium senses changes in the brain and plasma compartments, exchanges signals with other members of the neurovascular unit, and is capable of promptly adjusting the material and information transfer between blood and brain compartments.
Owing to these unique functions, any functional or structural impairment of the BBB could lead to disastrous pathophysiological consequences in the brain. BBB dysfunction is implicated in several brain disorders, including Alzheimer’s disease, Parkinson’s disease, cerebral amyloid angiopathy, stroke, and vascular dementia. Hence, the research community has been actively investigating the cerebrovascular contributions to neurological diseases, with major emphasis on the BBB. On the other hand, monumental efforts are being invested in discovering methods to transiently disrupt the BBB to improve drug delivery to the brain. The success of these efforts is heavily dependent upon the availability of reliable in vitro as well as in vivo BBB models.
The human cerebrovascular endothelial cells (hCMEC/D3) described in the current work serve as one such in vitro model that can be easily cultured and manipulated in the lab. The hCMEC/D3 cell monolayers are widely used in BBB research. The barrier properties and the expression of several classes of receptors, transporters, and enzymes in hCMEC/D3 cells have been validated in previous publications. However, a comprehensive genomic landscape of hCMEC/D3 cells, which is required for investigating molecular mechanisms using systems biology approaches, is not currently available. The current database attempts to fill this important information gap.
The web portal is divided into coding, noncoding, and pathway sections. Coding and noncoding RNAs are extracted from the RNA-Seq data, whereas the microRNAs are obtained from microRNA-Seq data. The coding section consists of gene expression, alternative splicing, and SNVs; the noncoding section displays circular RNAs and lincRNAs. Expression of the various genes involved in KEGG pathways can be searched using gene name (symbol), pathway ID, or pathway name.
Note that the query fields are pre-populated to provide a search string reference. Table queries are implemented as grep string matches, and multiple queries can be run simultaneously. For instance, a user can run an expression query: FOXA1 FOXC1.
Similarly, the pathways can be searched as Alzheimer ECM to yield both the HSA05010 (Alzheimer’s disease) and HSA04512 (ECM) pathway representations. In the pre-populated Pathway query field, bile secretion is quoted so that it is searched as a single string. Pathway names that contain an apostrophe, such as "Alzheimer's disease", likewise require double quotes.
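The query behavior described above can be sketched in R as follows. This is only an illustration of space-separated terms, quoted phrases, and substring matching; the toy table, the helper names parse_query and search_pathways, and the case-insensitive matching are assumptions, not the portal's actual implementation.

```r
# Toy pathway table (assumed for illustration; not the portal's data structure)
pathways <- data.frame(
  id = c("HSA05010", "HSA04512", "HSA04976"),
  name = c("Alzheimer's disease", "ECM-receptor interaction", "Bile secretion"),
  stringsAsFactors = FALSE
)

# Split a query on whitespace while keeping double-quoted phrases together.
# Only the double quote is treated as a quote character, so apostrophes survive.
parse_query <- function(query) {
  terms <- scan(text = query, what = "character", quote = "\"", quiet = TRUE)
  terms[nzchar(terms)]
}

# Return every row whose ID or name contains at least one of the terms
# (case-insensitive substring match, assumed here for convenience).
search_pathways <- function(query, table) {
  terms <- tolower(parse_query(query))
  hit <- Reduce(`|`, lapply(terms, function(term) {
    grepl(term, tolower(table$name), fixed = TRUE) |
      grepl(term, tolower(table$id), fixed = TRUE)
  }))
  table[hit, ]
}

two_terms <- search_pathways("Alzheimer ECM", pathways)
one_phrase <- search_pathways("\"bile secretion\"", pathways)
```

Here the unquoted query matches two pathways, while the quoted phrase is treated as a single search string.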
The expression of an individual gene can be searched by gene symbol; if multiple genes are searched, they must be separated by a space. When a gene name is entered, both raw and normalized gene expression counts are returned for the eight replicates. The raw gene expression table provides the chromosome, gene name, expression counts, and start and stop positions obtained using the mapRseq workflow. Normalized data for the same gene are displayed below the raw counts table. The raw gene counts were normalized by the CQN method, which corrects for gene length, total sequencing depth, and GC content, with some offset parameters. An NA for a gene of interest in the normalized section means that the gene is considered not expressed, i.e., the median raw count across the samples is less than 32.
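The NA rule can be illustrated with a minimal R sketch. The gene names, the count values, and the helper mask_unexpressed are hypothetical; only the threshold (median raw count below 32) comes from the description above.

```r
# Toy raw counts: 2 genes x 8 replicates (values invented for illustration)
raw <- rbind(
  GENE_A = c(40, 55, 38, 61, 47, 52, 44, 58),
  GENE_B = c(3, 0, 12, 7, 35, 1, 9, 4)
)

# Toy normalized values for the same genes (also invented)
normalized <- rbind(GENE_A = rep(1.5, 8), GENE_B = rep(-3.2, 8))

# Set normalized values to NA for genes whose median raw count is below the cutoff
mask_unexpressed <- function(normalized, raw, min_median = 32) {
  expressed <- apply(raw, 1, median) >= min_median
  normalized[!expressed, ] <- NA
  normalized
}

masked <- mask_unexpressed(normalized, raw)
```

GENE_B has one replicate above 32, but its median is far below the cutoff, so its normalized row is reported as NA.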
If a gene of interest has multiple transcripts, both the supporting read counts and the Bayesian-determined ratios (phi) will be displayed. Higher phi ratios (Bayesian probabilities) represent the more predominant isoforms. These values were obtained with the MISO tool.
The MISO documentation nicely illustrates how the junction reads are interpreted to identify the exon skip event differentiating the two isoforms. In its example, the read depths for exons 1 and 3 are both 4, and these reads cannot be confidently assigned to either isoform. The read depth of the junction-spanning evidence for the exon skip (0,1) is 3, and it is assigned to isoform 2. Conversely, the supporting read evidence for the exon retention is also 3, and it is assigned to isoform 1. Although there are 9 supporting reads, those 9 reads constitute a read depth of 3 covering the retained exon. The phi ratios of the two isoforms would then be 3:3 (0.5 for each). For further details, please consult the MISO manuscript.
We have provided URL links to ENSEMBL gene models that were found to have multiple isoforms in the replicated hCMEC/D3 sequencing data. Enabling the transcript view table will provide relevant isoform information about the end user's gene of interest. Additionally, we have provided ENSEMBL transcript links within the splicing query result. Please note that some identified transcript links may no longer be supported by the current ENSEMBL build version. The generated transcript IDs are based upon the provided hg19.gff3 file, built in 2012.
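The arithmetic of this example can be written out in R. This is a deliberately simplified sketch: MISO itself estimates the ratios by Bayesian inference, and the helper phi_from_depths is a hypothetical name used only to show how the 3:3 evidence becomes 0.5 for each isoform.

```r
# Isoform-specific read depths from the example above
depths <- c(isoform1 = 3,  # evidence for the retained exon
            isoform2 = 3)  # junction-spanning evidence for the exon skip

# Simplified ratio: each isoform's share of the isoform-specific evidence
phi_from_depths <- function(depths) depths / sum(depths)

phi <- phi_from_depths(depths)
```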
Expressed Single Nucleotide Variants (SNV):
This section provides the variants that are present in the RNA sequencing data obtained from 8 replicates of hCMEC/D3 monolayers. When a gene is queried, the chromosome, chromosomal position, reference allele, and alternate allele found in the sample at that particular nucleotide position are displayed. Each variant is also given a high, medium, or low confidence score based on its presence in multiple samples.
Functional annotation of the SNV (exonic, 5' UTR, 3' UTR):
Exonic annotations are further displayed as synonymous or non-synonymous. Amino acid changes in the translated protein are provided for each variant. Information on the conservation among species and the segmental duplication annotations are obtained from the UCSC genome browser. The presence of the SNV in the dbSNP, 1000 Genomes, or ESP6500 (Exome Variant Server) databases is reported. This helps differentiate previously reported variants from novel variants that may be specific to the hCMEC/D3 cell line. The variant is also assessed for deleteriousness using AVsift, which reports the predicted impact of the variant on the protein.
MicroRNAs can be searched by mature microRNA name or gene name. If a gene name is entered, any microRNA that can bind to the gene will be listed. Moreover, the number of genes that the microRNA could target will also be listed. In addition, normalized data are displayed. Raw reads were normalized to a million: each microRNA raw read count was divided by the total number of microRNA reads in the sample and scaled to a million to arrive at the normalized reads for each sample.
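As a minimal sketch of this per-million normalization in R (the microRNA names, the counts, and the helper normalize_rpm are invented for illustration):

```r
# Toy raw microRNA counts: 3 microRNAs x 2 replicates (values invented)
counts <- matrix(
  c(500, 300, 200,
    50, 25, 25),
  nrow = 3,
  dimnames = list(c("miR_a", "miR_b", "miR_c"), c("rep_1", "rep_2"))
)

# Divide each count by its sample's total microRNA reads, then scale to a million
normalize_rpm <- function(counts) {
  sweep(counts, 2, colSums(counts), "/") * 1e6
}

rpm <- normalize_rpm(counts)
```

After normalization each sample sums to one million, so replicates sequenced to different depths become directly comparable.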
There were only 11 circular RNAs found. Hence, the complete table with the raw counts was provided without a search option.
LincRNA (long intergenic non-coding RNAs):
LincRNAs are reported for all eight replicates, along with annotation of the closest gene name and the distance to it. The length and start/stop information of a particular lincRNA can be searched in the context of any gene of interest by providing the gene name. Both the normalized and raw counts are displayed. The raw values for each lincRNA were normalized to a million and corrected for the lincRNA length to obtain the normalized reads (CPM).
Pathways can be searched by constituent gene name (symbol), pathway ID, or pathway name. With a gene name as the search term, all the associated pathways are retrieved. The retrieved information consists of a pathway outline on which the gene expression is overlaid. Blue boxes represent low gene expression, whereas red boxes represent high expression. The pathway search can also be performed by pathway ID or by KEGG pathway name, in which case the matching pathways (for example, those related to insulin signaling/trafficking) will be displayed.
There are several components to the pathview representations of the pathways that need to be understood. First, we've constructed each pathway expression matrix, with the n genes in the pathway by the 8 replicated hCMEC/D3 cultures. For HSA05010 (Alzheimer’s disease), that expression matrix comprises 168 genes. We observe little biological variation among the replicates represented in the gene-nodes, since most of the gene-nodes appear as one solid color.
In addition, we observe that there are not 168 nodes representing all the genes. Nodes such as CALM1 represent 6 transcriptional variations of calmodulin (6 gene models, not isoforms). Of the several options that pathview provides to represent multi-gene nodes, we have chosen to use the mean, which is also the default option in the software. In the example of CALM1, the node is represented for each sample by the mean of the six genes.
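The effect of summarizing a multi-gene node by its mean can be sketched in R. The six row names and all values below are invented; the sketch only mirrors what a mean summary does for a node such as CALM1, and is not taken from pathview's internals.

```r
# Toy expression block for one node: 6 gene models x 8 replicates (values invented)
calm_models <- matrix(
  rep(c(5, 6, 7, 8, 9, 10), times = 8),
  nrow = 6,
  dimnames = list(paste0("model_", 1:6), paste0("rep_", 1:8))
)

# One value per replicate: the mean over the six gene models mapped to the node
node_value <- colMeans(calm_models)
```

The node is then drawn with eight values, one per replicate, each being the mean of the six underlying genes.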
It is widely believed that blood-brain barrier disruption is associated with Alzheimer’s pathology and is also implicated in the impaired clearance of amyloid-β proteins. The hCMEC/D3 cell cultures would thereby represent the homeostatic blood-brain barrier, prior to the onset of Alzheimer’s disease. In this pathway diagram, the 25th percentile is approximately 4 CPM (as normalized by CQN) and the 75th percentile is 8 CPM; hence, we can presume that the pathway is highly expressed in hCMEC/D3 cells, since, with the CQN values shifted by approximately 7, this range translates to raw reads of 2,048 to 32,768, i.e., 2^(CPM+7). We observed that receptors such as LRP1, GRIN1, and CACNA1C, shaded in blue, are all down-regulated with respect to the rest of the pathway. Similarly, we see that caspases 9 and 12 (as well as caspase 8), which are associated with apoptosis, are also down-regulated. In addition, IDE and MME, which are implicated in the degradation of amyloid-β proteins, are down-regulated. In contrast, if we examine the pathway describing malaria pathogenesis, which does not primarily involve the blood-brain barrier, particularly in the cell lines, we observe that raw read counts range from 32 to 4,096. The quantiles of this pathway expression matrix are skewed, and the higher expression levels are driven by some rather ubiquitous genes (TGFB1, VCAM1, PECAM1, ICAM1, MET, etc.). The expression does not support the relevance of the blood-brain barrier in this pathway.
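The conversion between the normalized scale and approximate raw reads used in this comparison can be checked with a short R sketch. The shift of 7 is the approximate offset stated above, and the function names are hypothetical.

```r
# Approximate raw reads from a CQN-normalized value: raw = 2^(CPM + shift)
cqn_to_raw <- function(cqn, shift = 7) 2^(cqn + shift)

# Inverse: approximate CQN-normalized value from raw reads
raw_to_cqn <- function(raw, shift = 7) log2(raw) - shift

alzheimer_raw <- cqn_to_raw(c(4, 8))      # 2048 and 32768 raw reads
malaria_cqn <- raw_to_cqn(c(32, 4096))    # -2 and 5 on the normalized scale
```

On the same scale, the malaria pathway therefore spans roughly -2 to 5, well below the 4 to 8 range of the Alzheimer’s disease pathway.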
In addition, the default pathview representation normalizes each pathway to the range (-1,1). We have chosen to represent the quantiles instead, in order to facilitate comparison among pathways, as we've illustrated above for malaria and Alzheimer’s disease. Normalized pathways could hinder the ability to contrast between pathways. Conversely, if the contrast gradient represented the 25th (-2) and 75th (6) percentiles of the entire expression matrix (23,346 genes), the expression differences within a relevant pathway (such as Alzheimer’s disease) would be lost (the entire pathway would be red).
Finally, a limitation of the pathview package is the inability to label the color key legend. The low and high limits are provided to the pathview package and represented as the extremes on the color key legend. The middle value of the color key legend does not necessarily represent the median (50th percentile) of the pathway expression matrix. Rather, it is simply the mid-point between the two color key extremes. Rasterizing the image with our own color legend is a possibility that will be explored in future analyses posted on the BBBomic website.
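library(RColorBrewer); library(pathview)
# dat: pathway expression matrix (genes x 8 replicates); hsa: KEGG pathway ID (e.g. "05010")
pal2 <- brewer.pal(7, "Spectral")[c(7, 4, 1)]  # low (blue), mid (yellow), high (red) key colors
pv.out <- pathview(gene.data = dat, pathway.id = hsa, species = "hsa", out.suffix = "BBB", keys.align = "y", kegg.native = TRUE, match.data = TRUE, multi.state = TRUE, same.layer = FALSE,
  low = list(gene = pal2[1], cpd = pal2[1]), mid = list(gene = pal2[2], cpd = pal2[1]), high = list(gene = pal2[3], cpd = pal2[1]),
  # Color limits: 25th and 75th percentiles of the per-gene means, rounded to one significant digit
  limit = list(gene = signif(quantile(rowMeans(dat), probs = c(.25, .75)), 1), cpd = 1), node.sum = "mean", bins = list(gene = 8, cpd = 8))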
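To see why the middle of the color key need not coincide with the median, the limit calculation from the call above can be run on a small example. The per-gene means below are invented; only the quantile and signif steps mirror the original call.

```r
# Toy per-gene mean expression values for one pathway (invented for illustration)
gene_means <- c(1.2, 3.9, 4.4, 5.1, 7.6, 8.3, 9.7)

# Same limit calculation as in the pathview call: 25th and 75th percentiles,
# rounded to one significant digit
limits <- signif(quantile(gene_means, probs = c(.25, .75)), 1)

# The middle of the color key is the mid-point of the two limits,
# which is not the median of the data
key_mid <- mean(limits)
data_median <- median(gene_means)
```

In this toy case the limits are 4 and 8, so the key mid-point is 6, while the median of the values is 5.1.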
All of the data sources searchable through this site are available for direct download.