Description
There is growing recognition that mammalian cells produce many thousands of large intergenic transcripts. However, the functional significance of these transcripts has been particularly controversial. While there are some well-characterized examples, the vast majority (>95%) show little evidence of evolutionary conservation and have been suggested to represent transcriptional noise. Here, we report a new approach to identifying large non-coding RNAs (ncRNAs) that uses chromatin-state maps to discover discrete transcriptional units lying between known protein-coding loci. Our approach identified ~1600 large multi-exonic RNAs across four mouse cell types. In sharp contrast to previous collections, these large intervening ncRNAs (lincRNAs) exhibit strong purifying selection in their genomic loci, exonic sequences, and promoter regions, with greater than 95% showing clear evolutionary conservation. We also developed a novel functional genomics approach that assigns putative functions to each lincRNA, revealing a diverse range of roles for lincRNAs in processes from ES cell pluripotency to cell proliferation. We obtained independent functional validation of the predictions for over 100 lincRNAs using cell-based assays. In particular, we demonstrate that specific lincRNAs are transcriptionally regulated by key transcription factors in these processes, such as p53, NFκB, Sox2, Oct4, and Nanog. Together, these results define a unique collection of functional lincRNAs that are highly conserved and implicated in diverse biological processes.