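I already know the basics of iTOL and wrote some of them down earlier in "A summary of drawing phylogenetic (evolutionary) trees". I used it again recently and noticed a few more details while tuning figures, so I am recording them here for reference.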
1. Register
You can use iTOL without registering, but logging in with an account lets you save your trees on the iTOL website.
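2. Remove branch lengths
Showing branch lengths is ideal, since they help you judge how distinct individual accessions and populations are. Still, most trees in published papers are drawn without branch lengths. That is understandable: with many samples, a tree with branch lengths does not display well.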
[Figures: the tree before and after removing branch lengths]
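The branch lengths can also be removed from the tree file itself before uploading. The following is only a minimal sketch of that idea, not an iTOL feature: it assumes a Newick string whose leaf names are unquoted and contain no colons, and the example tree is made up.

```python
import re

# Matches ":<number>" as used for branch lengths in Newick, e.g. ":0.0123" or ":1e-05".
BRANCH_LENGTH = re.compile(r":[0-9]+(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")


def strip_branch_lengths(newick: str) -> str:
    """Return the Newick string with every ':length' removed.

    Assumes leaf names are unquoted and contain no ':' characters.
    """
    return BRANCH_LENGTH.sub("", newick)


# Made-up example tree; only the topology is kept.
tree = "((A:0.1,B:0.2):0.05,(C:0.3,D:1e-05):0.07);"
print(strip_branch_lengths(tree))  # ((A,B),(C,D));
```

Keep the original file as well, since the unrooted-tree figures in section 4 are usually drawn with branch lengths.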
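3. Add a legend
After annotating the tree with group information, you have to set up the figure legend yourself. Otherwise you can only export the figure and add a legend in other software, or explain the colors in text.
For example, here I annotate three rings: the innermost with range, the outer two with strip (if these are unfamiliar, see my earlier post or the iTOL example annotations). The two types are configured slightly differently.
For the range annotation, simply turn on the legend. It can then be dragged with the mouse to any position.
For strip annotations, the legend has to be defined manually. If several annotation datasets are loaded, check which dataset is selected, because the settings apply to that dataset. As before, the legend can be moved freely.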
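4. Coloring an unrooted tree
With many accessions, we often use an unrooted tree, usually with branch lengths shown. This is simple: hide the labels and color the tree through annotation.
Drag and drop the annotation file onto the tree to color it.
Because the branches are so dense, the colors are not very visible. A small adjustment helps: reduce the line width a little.
You can also try some advanced adjustments, such as removing the scale bar or rotating the figure.
Because the labels (sample names) are hidden in the unrooted tree, there is no range annotation, only strip-type annotation, so the legend has to be set up manually (see above). Alternatively, skip the legend and explain the colors in the figure caption.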
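5. Export the figure
When exporting to formats such as PDF or PNG, which carry width, height, and position information, it is best to choose Full image, which automatically sizes the output to fit your figure. With Screen, the export is exactly what the web page shows, which makes the dimensions hard to control.
For PNG, also set the resolution, and make it as high as is practical.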
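Updates
iTOL is a great tool for polishing figures: nearly every detail can be adjusted, and the results look good. I tried ggtree but did not get results I liked. I suspect many of the trees in top journals are made with iTOL. It still takes effort to learn, so I am noting the key points here for reference.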
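6. Add a background to an unrooted tree
Use Manual annotation. It offers many options, including fixed shapes and freehand drawing, so it is worth experimenting.
Line color, background color, transparency, and so on can all be set, which leaves a lot of room for customization. I tried one example (figure not reproduced here).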
7. Other details
Rotation, branch lengths, labels, de-emphasizing lines, and so on.
Removing the tree scale box, displaying leaf nodes, and so on.
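8. Export text
The clustering result can be exported as text in tree order, which makes downstream analysis of the groups much easier. It is a very practical feature: no more checking samples against the figure one by one.
If your accessions cluster cleanly, find the root node of each cluster and color its whole clade. The exported text will then contain the group information.
If the clustering is not clean, so that the root nodes are hard to find or the clusters do not match the groups you expected, you can color the whole tree as one unit and export it. You then only need to find the boundary samples of each group in the exported list, rather than comparing samples one by one.
For example:
In this example the tree falls roughly into three clusters. Find the root node of each cluster and right-click it to color it.
If you cannot find a node for each cluster, coloring the entire tree also works (it must be colored, otherwise no text, or only part of it, is exported).
Once the clades are selected, choose the option at the top right to export the annotated text:
The output follows the clustering order of the tree exactly, so you can go straight on to downstream group analysis.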
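When the whole tree is exported as one block, the boundary samples can be turned into group labels with a few lines of Python. This is a rough sketch under assumptions: the exported text is taken to have been reduced to one leaf ID per line in tree order, the first sample of each group has been identified by eye, and the IDs s7, s30, and s101 as well as the group names are made up for illustration.

```python
def assign_groups(ordered_ids, group_starts):
    """Label each ID with the group whose start sample most recently appeared.

    ordered_ids  : leaf IDs in the order exported from the tree
    group_starts : dict mapping the first sample of each group to a group name
    """
    groups = {}
    current = None
    for sample in ordered_ids:
        if sample in group_starts:
            current = group_starts[sample]
        if current is None:
            raise ValueError(f"{sample} appears before any group start")
        groups[sample] = current
    return groups


# Hypothetical export order and boundary samples.
ordered = ["s54", "s212", "s219", "s7", "s30", "s101"]
starts = {"s54": "group1", "s7": "group2", "s101": "group3"}
for sample, group in assign_groups(ordered, starts).items():
    print(sample, group, sep="\t")
```

The resulting table can be used directly as the sample-to-group mapping for the annotation files described below.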
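Commonly used annotations
Multi-level classification often calls for several annotation layers, for example different kinds of grouping in population genetics (variety type: cultivar, landrace, wild; region: north or south; ecotype; and so on). Each kind of grouping can be encoded with color, shape, and the like, and shown on nodes, branches, labels (sample names), strips, shaded ranges, and so on.
The annotations I use most often are:
- Range annotation on labels
The annotation can target an internal node, or assign a group to every sample (recommended). The annotation file looks like this (fields must be tab-separated to match SEPARATOR TAB):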
TREE_COLORS
SEPARATOR TAB
DATA
I148 range #eeffee group1
I110 range #ddddff group2
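For many samples, a file like the one above is easier to generate than to type. The following is a minimal sketch, assuming you already have a sample-to-group mapping and one color per group; the function name is hypothetical, and the two rows reuse the values shown above.

```python
def tree_colors_ranges(sample_to_group, group_to_color):
    """Build a tab-separated TREE_COLORS file with one range line per sample."""
    lines = ["TREE_COLORS", "SEPARATOR TAB", "DATA"]
    for sample, group in sample_to_group.items():
        # Fields: ID, type, color, label
        lines.append("\t".join([sample, "range", group_to_color[group], group]))
    return "\n".join(lines) + "\n"


samples = {"I148": "group1", "I110": "group2"}
colors = {"group1": "#eeffee", "group2": "#ddddff"}

with open("tree_colors_range.txt", "w") as handle:
    handle.write(tree_colors_ranges(samples, colors))
```

The output file name is arbitrary. The file can then be dragged onto the tree like any other annotation file.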
- Color strip
Strip classification, with the branches colored to match the strip. Annotation file tol_color_strip.txt:
DATASET_COLORSTRIP
#lines starting with a hash are comments and ignored during parsing
#select the separator which is used to delimit the data below (TAB,SPACE or COMMA).This separator must be used throughout this file (except in the SEPARATOR line, which uses space).
#SEPARATOR TAB
SEPARATOR SPACE
#SEPARATOR COMMA
#label is used in the legend table (can be changed later)
DATASET_LABEL color_strip1
#dataset color (can be changed later)
COLOR #ff0000
#optional settings
#all other optional settings can be set or changed later in the web interface (under 'Datasets' tab)
COLOR_BRANCHES 1
#maximum width
STRIP_WIDTH 25
#left margin, used to increase/decrease the spacing to the next dataset. Can be negative, causing datasets to overlap.
MARGIN 0
#border width; if set above 0, a black border of specified width (in pixels) will be drawn around the color strip
BORDER_WIDTH 1
BORDER_COLOR #000
#show internal values; if set, values associated to internal nodes will be displayed even if these nodes are not collapsed. It could cause overlapping in the dataset display.
SHOW_INTERNAL 0
#In colored strip charts, each ID is associated to a color. Color can be specified in hexadecimal, RGB or RGBA notation
#Internal tree nodes can be specified using IDs directly, or using the 'last common ancestor' method described in iTOL help pages
#Actual data follows after the "DATA" keyword
DATA
#ID1 value1
#ID2 value2
160232 #caf390 COL#caf390
13773 #404c05 COL#404c05
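Strip classification only, without coloring the branches. Annotation file tol_color_strip2.txt (the header lines before the optional settings are omitted here; the setting that matters is COLOR_BRANCHES 0):
#optional settings
#all other optional settings can be set or changed later in the web interface (under 'Datasets' tab)
COLOR_BRANCHES 0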
#maximum width
STRIP_WIDTH 25
#left margin, used to increase/decrease the spacing to the next dataset. Can be negative, causing datasets to overlap.
MARGIN 0
#border width; if set above 0, a black border of specified width (in pixels) will be drawn around the color strip
BORDER_WIDTH 1
BORDER_COLOR #000
#show internal values; if set, values associated to internal nodes will be displayed even if these nodes are not collapsed. It could cause overlapping in the dataset display.
SHOW_INTERNAL 0
#In colored strip charts, each ID is associated to a color. Color can be specified in hexadecimal, RGB or RGBA notation
#Internal tree nodes can be specified using IDs directly, or using the 'last common ancestor' method described in iTOL help pages
#Actual data follows after the "DATA" keyword
DATA
#ID1 value1
#ID2 value2
160232 #caf390 COL#caf390
13773 #404c05 COL#404c05
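The strip legend that section 3 sets up by hand can, as far as I can tell, also be written into the annotation file with the LEGEND_TITLE, LEGEND_SHAPES, LEGEND_COLORS, and LEGEND_LABELS fields (the same fields that appear in the symbol template further down). The sketch below generates such a file. The function name and the group names groupA and groupB are made up, and the sketch uses TAB as separator so that labels may contain spaces.

```python
def color_strip(sample_to_group, group_to_color, label="color_strip1", color_branches=False):
    """Build a tab-separated DATASET_COLORSTRIP file including a legend."""
    sep = "\t"
    groups = list(group_to_color)
    lines = [
        "DATASET_COLORSTRIP",
        "SEPARATOR TAB",
        sep.join(["DATASET_LABEL", label]),
        sep.join(["COLOR", "#ff0000"]),
        # 1 colors the branches like the strip, 0 colors the strip only
        sep.join(["COLOR_BRANCHES", "1" if color_branches else "0"]),
        sep.join(["LEGEND_TITLE", label]),
        sep.join(["LEGEND_SHAPES"] + ["1"] * len(groups)),  # 1 = square
        sep.join(["LEGEND_COLORS"] + [group_to_color[g] for g in groups]),
        sep.join(["LEGEND_LABELS"] + groups),
        "DATA",
    ]
    for sample, group in sample_to_group.items():
        # Fields: ID, color, label
        lines.append(sep.join([sample, group_to_color[group], group]))
    return "\n".join(lines) + "\n"


samples = {"160232": "groupA", "13773": "groupB"}
colors = {"groupA": "#caf390", "groupB": "#404c05"}
print(color_strip(samples, colors), end="")
```

Calling the function with color_branches=True gives the first variant (branches colored like the strip), and the default gives the strip-only variant.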
- Grouping by branch or label
Branch annotation file:
TREE_COLORS
SEPARATOR SPACE
DATA
s54 clade #377EB8 normal 2
s212 clade #377EB8 normal 2
s219 clade #377EB8 normal 2
......
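Label annotation file (the type is label instead of clade; the last two fields are then the font style and the size factor):
TREE_COLORS
SEPARATOR SPACE
DATA
s54 label #377EB8 normal 2
s212 label #377EB8 normal 2
s219 label #377EB8 normal 2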
......
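With this you can classify samples by label, either by choosing "at tips" or by setting the shift to a negative value.
You can also put shapes on the tips to make the groups more distinct.
This approach is not ideal, though: with many samples the display gets cluttered. It is better to annotate the nodes directly with a symbol dataset.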
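Branch and label colors can go into the same TREE_COLORS file, so both can be generated from one sample-to-color table. This is a minimal sketch: the function name and the default width and size values are my own choices, and the three samples are the ones used above.

```python
def tree_colors_clade_and_label(sample_to_color, width=2, label_size=1):
    """Build a space-separated TREE_COLORS file coloring each branch and its label."""
    lines = ["TREE_COLORS", "SEPARATOR SPACE", "DATA"]
    for sample, color in sample_to_color.items():
        # clade: ID, type, color, line style (normal or dashed), width factor
        lines.append(f"{sample} clade {color} normal {width}")
        # label: ID, type, color, font style, size factor
        lines.append(f"{sample} label {color} normal {label_size}")
    return "\n".join(lines) + "\n"


colors = {"s54": "#377EB8", "s212": "#377EB8", "s219": "#377EB8"}
print(tree_colors_clade_and_label(colors), end="")
```

Sample IDs must not contain spaces when SEPARATOR SPACE is used; switch to TAB otherwise.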
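- Symbol annotation on (terminal) nodes
Annotating terminal nodes is the most common way to improve an unrooted tree. Unlike circular or rectangular trees, an unrooted tree cannot take an unlimited number of extra layers for multiple annotations, so in practice groups can only be distinguished through branches and nodes.
The iTOL example_data annotation files do not include a template for this. I searched the official site for a long time before finding an example: https://itol.embl.de/help/dataset_symbols_template.txt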
DATASET_SYMBOL
#Symbol datasets allow the display of various symbols on the branches of the tree. For each node, one or more symbols can be defined.
#Each symbol's color, size and position along the branch can be specified.
#lines starting with a hash are comments and ignored during parsing
#=================================================================#
# MANDATORY SETTINGS #
#=================================================================#
#select the separator which is used to delimit the data below (TAB,SPACE or COMMA).This separator must be used throughout this file.
#SEPARATOR TAB
#SEPARATOR SPACE
SEPARATOR COMMA
#label is used in the legend table (can be changed later)
DATASET_LABEL,example symbols
#dataset color (can be changed later)
COLOR,#ffff00
#=================================================================#
# OPTIONAL SETTINGS #
#=================================================================#
#=================================================================#
# all other optional settings can be set or changed later #
# in the web interface (under 'Datasets' tab) #
#=================================================================#
#Each dataset can have a legend, which is defined using LEGEND_XXX fields below
#For each row in the legend, there should be one shape, color and label.
#Optionally, you can define an exact legend position using LEGEND_POSITION_X and LEGEND_POSITION_Y. To use automatic legend positioning, do NOT define these values
#Optionally, shape scaling can be present (LEGEND_SHAPE_SCALES). For each shape, you can define a scaling factor between 0 and 1.
#Optionally, shapes can be inverted (LEGEND_SHAPE_INVERT). When inverted, shape border will be drawn using the selected color, and the fill color will be white.
#Shape should be a number between 1 and 6, or any protein domain shape definition.
#1: square
#2: circle
#3: star
#4: right pointing triangle
#5: left pointing triangle
#6: checkmark
#LEGEND_TITLE,Dataset legend
#LEGEND_POSITION_X,100
#LEGEND_POSITION_Y,100
#LEGEND_SHAPES,1,2,3
#LEGEND_COLORS,#ff0000,#00ff00,#0000ff
#LEGEND_LABELS,value1,value2,value3
#LEGEND_SHAPE_SCALES,1,1,0.5
#LEGEND_SHAPE_INVERT,0,0,0
#largest symbol will be displayed with this size, others will be proportionally smaller.
MAXIMUM_SIZE,50
#symbols can be filled with solid color, or a gradient
#GRADIENT_FILL,1
#Internal tree nodes can be specified using IDs directly, or using the 'last common ancestor' method described in iTOL help pages
#=================================================================#
# Actual data follows after the "DATA" keyword #
#=================================================================#
#the following fields are required for each node:
#ID,symbol,size,color,fill,position,label
#symbol should be a number between 1 and 6:
#1: rectangle
#2: circle
#3: star
#4: right pointing triangle
#5: left pointing triangle
#6: checkmark
#size can be any number. Maximum size in the dataset will be displayed using MAXIMUM_SIZE, while others will be proportionally smaller
#color can be in hexadecimal, RGB or RGBA notation. If RGB or RGBA are used, dataset SEPARATOR cannot be comma.
#fill can be 1 or 0. If set to 0, only the outline of the symbol will be displayed.
#position is a number between 0 and 1 and defines the position of the symbol on the branch (for example, position 0 is exactly at the start of node branch, position 0.5 is in the middle, and position 1 is at the end)
DATA
#Examples
#internal node will have a red filled circle in the middle of the branch
#9606|184922,2,10,#ff0000,1,0.5
#node 100379 will have a blue star outline at the start of the branch, half the size of the circle defined above (size is 5 compared to 10 above)
#100379,3,5,#0000ff,0,0
#node 100379 will also have a filled green rectangle in the middle of the branch, same size as the circle defined above (size is 10)
#100379,1,10,#00ff00,1,0.5
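The header settings are explained by the comments above. The data fields are: #ID,symbol,size,color,fill,position,label. For example, in 100379,1,10,#00ff00,1,0.5 the first column is the node or sample, the second the shape, the third the size (size differences only show when the values differ; if all sizes are the same, adjust them with symbol size after loading the file), the fourth the color, the fifth the fill, and the sixth the position (0 is the start of the branch, 0.5 the middle, 1 the end). The data examples in the template use only these six columns.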
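A symbol dataset for tip nodes can be generated the same way. The following is a minimal sketch, assuming one shape and one color per group; the function name, group names, and chosen styles are made up, and only header fields that appear in the template above are used.

```python
def symbol_dataset(sample_to_group, group_style, size=10, position=1, label="group symbols"):
    """Build a comma-separated DATASET_SYMBOL file with one filled symbol per sample.

    group_style maps a group name to (shape, color); shape is the numeric code
    from the template (1 rectangle, 2 circle, 3 star, ...).
    """
    lines = [
        "DATASET_SYMBOL",
        "SEPARATOR COMMA",
        f"DATASET_LABEL,{label}",
        "COLOR,#ffff00",
        "MAXIMUM_SIZE,50",
        "DATA",
    ]
    for sample, group in sample_to_group.items():
        shape, color = group_style[group]
        # Fields: ID, symbol, size, color, fill (1 = filled), position (1 = end of branch)
        lines.append(f"{sample},{shape},{size},{color},1,{position}")
    return "\n".join(lines) + "\n"


samples = {"s54": "group1", "s212": "group2"}
styles = {"group1": (2, "#377EB8"), "group2": (3, "#ff0000")}
print(symbol_dataset(samples, styles), end="")
```

Because the separator is a comma, colors must be hexadecimal here, as the template notes for RGB and RGBA values.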