RIdeogram: drawing SVG graphics to visualize and map genome-wide data on idiograms

Zhaodong Hao

2020-01-20

Introduction

RIdeogram is an R package that draws SVG (Scalable Vector Graphics) images to visualize and map genome-wide data on idiograms.

Citation

If you use this package in a published paper, please cite this paper:

Hao Z, Lv D, Ge Y, Shi J, Weijers D, Yu G, Chen J. 2020. RIdeogram: drawing SVG graphics to visualize and map genome-wide data on the idiograms. PeerJ Comput. Sci. 6:e251 http://doi.org/10.7717/peerj-cs.251

Usage and Examples

This is a simple package with only three functions: ideogram, convertSVG and GFFex.

First, you need to load the package after you have installed it.

require(RIdeogram)
#> Loading required package: RIdeogram

Then, you need to load the data from the RIdeogram package.

data(human_karyotype, package = "RIdeogram")
data(gene_density, package = "RIdeogram")
data(Random_RNAs_500, package = "RIdeogram")

You can use the function “head()” to see the data format.

head(human_karyotype)
#>   Chr Start       End  CE_start    CE_end
#> 1   1     0 248956422 122026459 124932724
#> 2   2     0 242193529  92188145  94090557
#> 3   3     0 198295559  90772458  93655574
#> 4   4     0 190214555  49712061  51743951
#> 5   5     0 181538259  46485900  50059807
#> 6   6     0 170805979  58553888  59829934

Specifically, the ‘karyotype’ file contains the karyotype information and has five columns (or three, see below). The first column is the chromosome ID, the second and third columns are the start and end positions of each chromosome, and the fourth and fifth columns are the start and end positions of its centromere.
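The column layout described above can be checked before plotting. The following is a minimal sketch in base R, not part of RIdeogram: the helper name check_karyotype is hypothetical, and the rule that a centromere must lie inside its chromosome is an assumption made for illustration. The toy table reuses the first two rows of the output shown above.

```r
# Hypothetical helper (not part of RIdeogram): sanity-check a karyotype table.
check_karyotype <- function(karyotype) {
  n <- ncol(karyotype)
  # The text describes five columns, or three when centromeres are omitted.
  if (!(n %in% c(3, 5))) {
    stop("karyotype must have 3 or 5 columns, found ", n)
  }
  start <- karyotype[[2]]
  end <- karyotype[[3]]
  if (any(start > end)) {
    stop("chromosome start must not exceed chromosome end")
  }
  if (n == 5) {
    ce_start <- karyotype[[4]]
    ce_end <- karyotype[[5]]
    # Assumption: a centromere lies inside its chromosome.
    if (any(ce_start < start | ce_end > end | ce_start > ce_end)) {
      stop("centromere coordinates fall outside the chromosome")
    }
  }
  invisible(TRUE)
}

# First two rows of the human_karyotype output shown above.
toy_karyotype <- data.frame(
  Chr = c("1", "2"),
  Start = c(0, 0),
  End = c(248956422, 242193529),
  CE_start = c(122026459, 92188145),
  CE_end = c(124932724, 94090557),
  stringsAsFactors = FALSE
)

karyotype_ok <- check_karyotype(toy_karyotype)
```

A table with four columns, for example, would stop with an error instead of reaching the drawing step.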

head(gene_density)
#>   Chr   Start     End Value
#> 1   1       1 1000000    65
#> 2   1 1000001 2000000    76
#> 3   1 2000001 3000000    35
#> 4   1 3000001 4000000    30
#> 5   1 4000001 5000000    10
#> 6   1 5000001 6000000    10

The ‘mydata’ file contains the heatmap information and has four columns. The first column is the chromosome ID, the second and third columns are the start and end positions of each window on that chromosome, and the fourth column is a characteristic value for the window, such as the number of genes.

head(Random_RNAs_500)
#>    Type    Shape Chr    Start      End  color
#> 1  tRNA   circle   6 69204486 69204568 6a3d9a
#> 2  rRNA      box   3 68882967 68883091 33a02c
#> 3  rRNA      box   5 55777469 55777587 33a02c
#> 4  rRNA      box  21 25202207 25202315 33a02c
#> 5 miRNA triangle   1 86357632 86357687 ff7f00
#> 6 miRNA triangle  11 74399237 74399333 ff7f00

The ‘mydata_interval’ file contains the label information and has six columns. The first column is the label type; the second column is the shape of the label, with three available options (box, triangle and circle); the third column is the chromosome ID; the fourth and fifth columns are the start and end positions of the label on the chromosome; and the sixth column is the color of the label.
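As a hedged illustration of these six columns, the sketch below flags rows of a label table that do not follow the description. The helper check_labels is hypothetical and not part of RIdeogram. The color test assumes the six-digit hexadecimal form without a leading "#" that appears in the sample output. The third toy row is deliberately broken to show what gets reported.

```r
# Hypothetical helper (not part of RIdeogram): report rows of a label table
# that do not match the six-column layout described above.
check_labels <- function(labels) {
  stopifnot(ncol(labels) == 6)
  allowed_shapes <- c("box", "triangle", "circle")
  report <- data.frame(
    row = seq_len(nrow(labels)),
    bad_shape = !(labels[[2]] %in% allowed_shapes),
    # Assumption: colors are six hex digits with no leading "#",
    # as in the sample output.
    bad_color = !grepl("^[0-9A-Fa-f]{6}$", labels[[6]]),
    bad_range = labels[[4]] > labels[[5]]
  )
  report[report$bad_shape | report$bad_color | report$bad_range, ]
}

# Rows 1 and 2 come from the output above; row 3 is altered on purpose
# (unsupported shape, color written with "#").
toy_labels <- data.frame(
  Type = c("tRNA", "rRNA", "miRNA"),
  Shape = c("circle", "box", "star"),
  Chr = c(6, 3, 1),
  Start = c(69204486, 68882967, 86357632),
  End = c(69204568, 68883091, 86357687),
  color = c("6a3d9a", "33a02c", "#ff7f00"),
  stringsAsFactors = FALSE
)

problems <- check_labels(toy_labels)
print(problems)
```

Only the third row is reported, with both bad_shape and bad_color set to TRUE.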

Alternatively, you can load your own data with the function read.table, for example:

human_karyotype <- read.table("karyotype.txt", sep = "\t", header = TRUE, stringsAsFactors = FALSE)
gene_density <- read.table("data_1.txt", sep = "\t", header = TRUE, stringsAsFactors = FALSE)
Random_RNAs_500 <- read.table("data_2.txt", sep = "\t", header = TRUE, stringsAsFactors = FALSE)

The “karyotype.txt” file contains karyotype information; the “data_1.txt” file contains heatmap data; the “data_2.txt” file contains track label data.
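To make the expected file layout concrete, the sketch below writes a small tab-separated karyotype file to a temporary location and reads it back with the same read.table arguments. The temporary path stands in for a real file such as “karyotype.txt”; the two rows are taken from the human_karyotype output shown earlier.

```r
# Two rows from the human_karyotype output shown earlier.
toy_karyotype <- data.frame(
  Chr = c(1, 2),
  Start = c(0, 0),
  End = c(248956422, 242193529),
  CE_start = c(122026459, 92188145),
  CE_end = c(124932724, 94090557)
)

# A temporary file stands in for a user-supplied "karyotype.txt".
path <- tempfile(fileext = ".txt")
write.table(toy_karyotype, path, sep = "\t", quote = FALSE, row.names = FALSE)

# Tab-separated, with a header line holding the column names.
karyotype_in <- read.table(path, sep = "\t", header = TRUE,
                           stringsAsFactors = FALSE)
str(karyotype_in)
```

The file therefore needs one header line and tab-separated columns in the order described above.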

In addition, we provide a simple function, GFFex, for extracting heatmap information (such as gene density) from a GFF file. First, you need to download the GFF file of a genome, for example the human genome annotation file from GENCODE (ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_32/gencode.v32.annotation.gff3.gz). Then, you need to prepare a karyotype file in the same format as the one described above. The only thing to watch is that the chromosome IDs in the first column of the karyotype file must match those in the GFF file (in this case chr1, chr2, …). Next, you can run the following code:

gene_density <- GFFex(input = "gencode.v32.annotation.gff3.gz", karyotype = "human_karyotype.txt", feature = "gene", window = 1000000)

You can use the argument “feature” (default value is “gene”) to select the feature you want to extract from the GFF file and the argument “window” (default value is “1000000”) to set the window size.
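To illustrate what a windowed feature count looks like, here is a minimal base R sketch that bins feature start positions into fixed-size windows and returns a table in the four-column heatmap format. It is not the implementation of GFFex: the function count_features is hypothetical, the feature positions are invented, and assigning each feature to a window by its start coordinate is an assumption (the text does not say how GFFex treats features that span a window boundary).

```r
# Hypothetical helper (not GFFex itself): count features per fixed-size
# window on one chromosome.
count_features <- function(feature_start, chr, chr_end, window = 1000000) {
  starts <- seq(1, chr_end, by = window)
  ends <- pmin(starts + window - 1, chr_end)  # last window may be shorter
  # Assumption: a feature belongs to the window containing its start.
  idx <- (feature_start - 1) %/% window + 1
  counts <- tabulate(idx, nbins = length(starts))
  data.frame(Chr = chr, Start = starts, End = ends, Value = counts,
             stringsAsFactors = FALSE)
}

# Invented feature start positions on a 2.5 Mb toy chromosome.
toy_density <- count_features(
  feature_start = c(10, 500000, 1000000, 1000001, 2400000),
  chr = "chr1",
  chr_end = 2500000
)
print(toy_density)
# Value column: 3, 1, 1
```

A smaller window gives a finer heatmap at the cost of more rows, which is the trade-off the “window” argument controls.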

Now, you can visualize this information using the ideogram function.

Basic usage

ideogram(karyotype, overlaid = NULL, label = NULL, label_type = NULL, synteny = NULL, colorset1, colorset2, width, Lx, Ly, output = "chromosome.svg")
convertSVG(svg, device, width, height, dpi)

Now, let’s begin.

First, we draw an idiogram with no mapping data.

ideogram(karyotype = human_karyotype)
convertSVG("chromosome.svg", device = "png")

Then, you will find an SVG file and a PNG file in your working directory.
