sed - A Non-interactive Text Editor
Lee E. McMahon
Bell Laboratories Murray Hill, New Jersey 07974
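Translated by: 寒蝉退士
Translator's statement: the translator gives no warranty for this translation, holds no rights in it, and accepts no responsibility or obligation for it. Original: http://cm.bell-labs.com/7thEdMan/vol2/sed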
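Abstract
sed is a non-interactive context editor that runs on the UNIX® operating system. sed is designed to be especially useful in three cases:
- 1) To edit files too large for comfortable interactive editing.
- 2) To edit a file of any size when the sequence of editing commands is too complicated to be comfortably typed in interactive mode.
- 3) To perform multiple 'global' editing functions efficiently in one pass through the input.
This memorandum constitutes a manual for users of sed.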
August 15, 1978
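Contents
- Introduction
- 1. Overall Operation
- 1.1. Command-line Flags
- 1.2. Order of Application of Editing Commands
- 1.3. Pattern Space
- 1.4. Examples
- 2. Addresses: Selecting Lines for Editing
- 2.1. Line-number Addresses
- 2.2. Context Addresses
- 2.3. Number of Addresses
- Examples
- 3. Functions
- 3.1. Whole-line Oriented Functions
- 3.2. Substitute Function
- 3.3. Input-output Functions
- 3.4. Multiple Input-line Functions
- 3.5. Hold and Get Functions
- 3.6. Flow-of-Control Functions
- 3.7. Miscellaneous Functions
- Reference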
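Introduction
sed is a non-interactive context editor designed to be especially useful in three cases:
1) To edit files too large for comfortable interactive editing.
2) To edit a file of any size when the sequence of editing commands is too complicated to be comfortably typed in interactive mode.
3) To perform multiple 'global' editing functions efficiently in one pass through the input.
Since only a few lines of the input reside in memory at one time, and no temporary files are used, the effective size of file that can be edited is limited only by the requirement that the input and output fit simultaneously into available secondary storage.
Complicated editing scripts can be created separately and given to sed as a command file. For complex edits, this saves considerable typing and its attendant errors. sed running from a command file is much more efficient than any interactive editor known to the author, even if that editor can be driven by a pre-written script.
The principal losses compared to an interactive editor are the lack of relative addressing (because of the line-at-a-time operation) and the lack of immediate verification that a command has done what was intended.
sed is a lineal descendant of the UNIX editor ed. Because of the differences between interactive and non-interactive operation, considerable changes have been made between ed and sed; even confirmed users of ed will frequently be surprised (and probably chagrined) if they rashly use sed without reading Sections 2 and 3 of this document. The most striking family resemblance between the two editors is in the class of patterns ('regular expressions') they recognize; the code for matching patterns is copied almost verbatim from the code for ed, and the description of regular expressions in Section 2 is copied almost verbatim from the UNIX Programmer's Manual[1]. (Both code and description were written by Dennis M. Ritchie.)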
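1. Overall Operation
By default sed copies the standard input to the standard output, perhaps performing one or more editing commands on each line before writing it to the output. This behavior may be modified by flags on the command line; see Section 1.1 below.
The general format of an editing command is:
[address1,address2][function][arguments]
One or both addresses may be omitted; the format of addresses is given in Section 2. Any number of blanks or tabs may separate the addresses from the function. The function must be present; the available commands are discussed in Section 3. The arguments may be required or optional, according to which function is given; they are discussed in Section 3 under each individual function.
Tab characters and spaces at the beginning of command lines are ignored.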
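1.1. Command-line Flags
Three flags are recognized on the command line:
- -n: tells sed not to copy all lines, but only those specified by p functions or by p flags after s functions (see Section 3.3).
- -e: tells sed to take the next argument as an editing command.
- -f: tells sed to take the next argument as a file name; the file should contain editing commands, one to a line.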
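The three flags can be tried out as follows. This is only a sketch, assuming a POSIX shell and a reasonably standard sed; the variable names and the temporary script file are invented for the illustration.

```shell
# The five-line sample text used throughout this manual.
poem='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.'

# -e: the next argument is an editing command; every line is still copied.
all=$(printf '%s\n' "$poem" | sed -e 's/Khan/KHAN/')

# -n: no default copy; only the lines named by p reach the output.
only=$(printf '%s\n' "$poem" | sed -n -e '/Khan/p')

# -f: the next argument names a file holding one editing command per line.
script=$(mktemp)
printf '%s\n' 's/Khan/KHAN/' '/river/d' > "$script"
from_file=$(printf '%s\n' "$poem" | sed -f "$script")
rm -f "$script"

printf '%s\n' "$only"
```

Here the -n run writes a single line, while the -f run copies four of the five lines because the script deletes the one containing 'river'.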
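1.2. Order of Application of Editing Commands
Before any editing is done (in fact, before any input file is even opened), all the editing commands are compiled into a form which will be moderately efficient during the execution phase (when the commands are actually applied to lines of the input file). The commands are compiled in the order in which they are encountered; this is generally the order in which they will be attempted at execution time. The commands are applied one at a time; the input to each command is the output of all preceding commands.
The default linear order of application of editing commands can be changed by the flow-of-control commands t and b (see Section 3). Even when the order of application is changed by these commands, it is still true that the input line to any command is the output of any previously applied command.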
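A small sketch of what "the input to each command is the output of all preceding commands" means in practice, assuming a POSIX shell; the input string and variable names are made up for the illustration.

```shell
# s/a/b/ turns "aaa" into "baa"; s/b/c/ then sees "baa" and yields "caa".
chained=$(printf 'aaa\n' | sed -e 's/a/b/' -e 's/b/c/')

# In the opposite order s/b/c/ finds nothing to change, so only s/a/b/ acts.
reversed=$(printf 'aaa\n' | sed -e 's/b/c/' -e 's/a/b/')

printf '%s %s\n' "$chained" "$reversed"
```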
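1.3. Pattern Space
The range of pattern matches is called the pattern space. Ordinarily, the pattern space is one line of the input text, but more than one line can be read into the pattern space by using the N command (see Section 3.4).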
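1.4. Examples
Examples are scattered throughout the text. Except where otherwise noted, the examples all assume the following input text:
In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
(In no case is the output of the sed commands to be considered an improvement on Coleridge.)
Example:
The command
2q
will quit after copying the first two lines of the input. The output will be:
In Xanadu did Kubla Khan
A stately pleasure dome decree: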
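The 2q example can be reproduced from a shell roughly as follows (a sketch, assuming a POSIX shell; the variable names are not part of the manual).

```shell
# The sample text assumed by the examples in this manual.
poem='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.'

# 2q: copy lines until line 2 has been written, then quit.
first_two=$(printf '%s\n' "$poem" | sed 2q)
printf '%s\n' "$first_two"
```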
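2. Addresses: Selecting Lines for Editing
Lines in the input file(s) to which editing commands are to be applied can be selected by addresses. Addresses may be either line numbers or context addresses.
The application of a group of commands can be controlled by one address (or address pair) by grouping the commands with curly braces ('{ }') (see Section 3.6).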
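2.1. Line-number Addresses
A line number is a decimal integer. As each line is read from the input, a line-number counter is incremented; a line-number address matches (selects) the input line which causes the internal counter to equal the address line number. The counter runs cumulatively through multiple input files; it is not reset when a new input file is opened.
As a special case, the character $ matches the last line of the last input file.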
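The following sketch shows a numeric address, the $ address, and the cumulative counter across two input files. It assumes a POSIX shell; the temporary directory and the file names first and second are invented for the illustration.

```shell
poem='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.'

# A numeric address selects the line with that number.
third=$(printf '%s\n' "$poem" | sed -n '3p')

# $ selects the last line of the input.
last=$(printf '%s\n' "$poem" | sed -n '$p')

# The counter is not reset between files: line 3 is the first line
# of the second file when the first file has two lines.
dir=$(mktemp -d)
printf '%s\n' 'one' 'two' > "$dir/first"
printf '%s\n' 'three' 'four' > "$dir/second"
across=$(sed -n '3p' "$dir/first" "$dir/second")
rm -rf "$dir"

printf '%s\n' "$third" "$last" "$across"
```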
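2.2. Context Addresses
A context address is a pattern ('regular expression') enclosed in slashes ('/'). The regular expressions recognized by sed are constructed as follows:
- 1) An ordinary character (not one of those discussed below) is a regular expression, and matches that character.
- 2) A circumflex '^' at the beginning of a regular expression matches the null character at the beginning of a line.
- 3) A dollar sign '$' at the end of a regular expression matches the null character at the end of a line.
- 4) The characters '\n' match an embedded newline character, but not the newline at the end of the pattern space.
- 5) A period '.' matches any character except the terminal newline of the pattern space.
- 6) A regular expression followed by an asterisk '*' matches any number (including 0) of adjacent occurrences of the regular expression it follows.
- 7) A string of characters in square brackets '[ ]' matches any character in the string, and no others. If, however, the first character of the string is a circumflex '^', the regular expression matches any character except the characters in the string and the terminal newline of the pattern space.
- 8) A concatenation of regular expressions is a regular expression which matches the concatenation of the strings matched by the components of the regular expression.
- 9) A regular expression between the sequences '\(' and '\)' is identical in effect to the unadorned regular expression, but has side effects which are described under the s command below and in rule 10 immediately below.
- 10) The expression '\d' means the same string of characters matched by an expression enclosed in '\(' and '\)' earlier in the same pattern. Here d is a single digit; the string specified is that beginning with the dth occurrence of '\(' counting from the left. For example, the expression '^\(.*\)\1' matches a line beginning with two repeated occurrences of the same string.
- 11) The null regular expression standing alone (that is, '//') is equivalent to the last regular expression compiled.
To use one of the special characters (^ $ . * [ ] \ /) as a literal (to match an occurrence of itself in the input), precede the special character by a backslash '\'.
For a context address to 'match' the input requires that the whole pattern within the address match some portion of the pattern space.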
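2.3. Number of Addresses
The commands in the next section can have 0, 1, or 2 addresses. Under each command the maximum number of allowed addresses is given. A command with more addresses than the maximum allowed is considered an error.
If a command has no addresses, it is applied to every line in the input.
If a command has one address, it is applied to all lines which match that address.
If a command has two addresses, it is applied to the first line which matches the first address, and to all subsequent lines up to (and including) the first subsequent line which matches the second address. Then an attempt is made on subsequent lines to match the first address again, and the process is repeated.
Two addresses are separated by a comma.
Examples:
/an/ matches lines 1, 3, 4 in our sample text
/an.*an/ matches line 1
/^an/ matches no lines
/./ matches all lines
/\./ matches line 5
/r*an/ matches lines 1, 3, 4 (number = zero!)
/\(an\).*\1/ matches line 1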
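The address examples above can be checked mechanically. The sketch below, assuming a POSIX shell, uses the = function (Section 3.7) to print the numbers of the selected lines; the helper name matching_lines is invented for this illustration.

```shell
poem='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.'

# Print, on one line, the numbers of the lines selected by address $1.
matching_lines() {
    # The unquoted substitution splits the numbers into positional
    # parameters; "$*" then joins them with single spaces.
    set -- $(printf '%s\n' "$poem" | sed -n "$1=")
    echo "$*"
}

an=$(matching_lines '/an/')
an_an=$(matching_lines '/an.*an/')
anchored=$(matching_lines '/^an/')
any=$(matching_lines '/./')
period=$(matching_lines '/\./')
star=$(matching_lines '/r*an/')
backref=$(matching_lines '/\(an\).*\1/')

printf '%s\n' "$an" "$an_an" "$any" "$period" "$star" "$backref"
```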
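3. Functions
All functions are named by a single character. In the following summary, the maximum number of allowable addresses is given enclosed in parentheses, then the single-character function name, possible arguments enclosed in angle brackets (< >), an expanded English translation of the single-character name, and finally a description of what each function does. The angle brackets around the arguments are not part of the argument, and should not be typed in actual editing commands.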
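3.1. Whole-line Oriented Functions
(2)d -- delete lines
The d function deletes from the file (does not write to the output) all those lines matched by its address(es).
It also has the side effect that no further commands are attempted on the deleted line; as soon as the d function is executed, a new line is read from the input, and the list of editing commands is restarted from the beginning on the new line.
(2)n -- next line
The n function reads the next line from the input, replacing the current line. The current line is written to the output if it should be. The list of editing commands is continued following the n command.
(1)a\
<text> -- append lines
The a function causes the argument <text> to be written to the output after the line matched by its address. The a command is inherently multi-line; a must appear at the end of a line, and <text> may contain any number of lines. To preserve the one-command-to-a-line fiction, the interior newlines must be hidden by a backslash character ('\') immediately preceding the newline. The <text> argument is terminated by the first unhidden newline (the first one not immediately preceded by a backslash).
Once an a function is successfully executed, <text> will be written to the output regardless of what later commands do to the line which triggered it. The triggering line may be deleted entirely; <text> will still be written to the output.
The <text> is not scanned for address matches, and no editing commands are attempted on it. It does not cause any change in the line-number counter.
(1)i\
<text> -- insert lines
The i function behaves identically to the a function, except that <text> is written to the output before the matched line. All other comments about the a function apply to the i function as well.
(2)c\
<text> -- change lines
The c function deletes the lines selected by its address(es), and replaces them with the lines in <text>. Like a and i, c must be followed by a newline hidden by a backslash; and interior newlines in <text> must be hidden by backslashes.
The c command may have two addresses, and therefore select a range of lines. If it does, all the lines in the range are deleted, but only one copy of <text> is written to the output, not one copy per line deleted. As with a and i, <text> is not scanned for address matches, and no editing commands are attempted on it. It does not change the line-number counter.
After a line has been deleted by a c function, no further commands are attempted on it.
If text is appended after a line by a or r functions, and the line is subsequently changed by a c function, the text inserted by the c function will be placed before the text of the a or r functions. (The r function is described in Section 3.3.)
Note: Within the text put in the output by these functions, leading blanks and tabs will disappear, as always in sed commands. To get leading blanks and tabs into the output, precede the first desired blank or tab by a backslash; the backslash will not appear in the output.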
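Example:
The list of editing commands:
n
a\
XXXX
d
applied to our standard input, produces:
In Xanadu did Kubla Khan
XXXX
Where Alph, the sacred river, ran
XXXX
Down to a sunless sea.
In this particular case, the same effect would be produced by either of the two following command lists. The first:
n
i\
XXXX
d
and the second:
n
c\
XXXX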
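The three equivalent command lists can be compared directly. This is a sketch assuming a POSIX shell; it also assumes a sed that, like the one described here and like GNU sed in its default mode, writes the current line when n finds no further input.

```shell
poem='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.'

# n writes the odd line and reads the even one; a queues XXXX; d drops
# the even line, after which the queued text is written.
appended=$(printf '%s\n' "$poem" | sed 'n
a\
XXXX
d')

# i writes XXXX at once, before the even line is deleted.
inserted=$(printf '%s\n' "$poem" | sed 'n
i\
XXXX
d')

# c deletes the even line and writes XXXX in its place.
changed=$(printf '%s\n' "$poem" | sed 'n
c\
XXXX')

printf '%s\n' "$appended"
```

Each even-numbered line is replaced by XXXX in all three variants; they differ only in when the text is written relative to the deletion.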
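3.2. Substitute Function
One very important function changes parts of lines selected by a context search within the line.
(2)s<pattern><replacement><flags> -- substitute
The s function replaces part of a line (selected by <pattern>) with <replacement>. It can best be read:
Substitute for <pattern>, <replacement>
The <pattern> argument contains a pattern, exactly like the patterns in addresses (see Section 2.2). The only difference between <pattern> and a context address is that the context address must be delimited by slash ('/') characters; <pattern> may be delimited by any character other than space or newline.
By default, only the first string matched by <pattern> is replaced, but see the g flag below.
The <replacement> argument begins immediately after the second delimiting character of <pattern>, and must be followed immediately by another instance of the delimiting character. (Thus there are exactly three instances of the delimiting character.) The <replacement> is not a pattern, and the characters which are special in patterns do not have special meaning in <replacement>. Instead, other characters are special:
- & is replaced by the string matched by <pattern>.
- \d (where d is a single digit) is replaced by the dth substring matched by parts of <pattern> enclosed in '\(' and '\)'. If nested substrings occur in <pattern>, the dth is determined by counting opening delimiters ('\('). As in patterns, special characters may be made literal by preceding them with a backslash ('\').
The <flags> argument may contain the following flags:
g -- substitute <replacement> for all (non-overlapping) instances of <pattern> in the line. After a successful substitution, the scan for the next instance of <pattern> begins just after the end of the inserted characters; characters put into the line from <replacement> are not rescanned.
p -- print the line if a successful replacement was done. The p flag causes the line to be written to the output if and only if a substitution was actually made by the s function. Notice that if several s functions, each followed by a p flag, successfully substitute in the same input line, multiple copies of the line will be written to the output: one for each successful substitution.
w <filename> -- write the line to a file if a successful replacement was done. The w flag causes lines which are actually substituted by the s function to be written to the file named by <filename>. If <filename> exists before sed is run, it is overwritten; if not, it is created.
A single space must separate w and <filename>.
The possibilities of multiple, somewhat different copies of one input line being written are the same as for p.
A maximum of 10 different file names may be mentioned after w flags and w functions (see below), combined.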
Examples:
The following command, applied to our standard input,
s/to/by/w changes
produces, on the standard output:
In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless by man
Down by a sunless sea.
and, in the file 'changes':
Through caverns measureless by man
Down by a sunless sea.
If the no-copy option (-n) is in effect, the command:
s/[.,;?:]/*P&*/gp
produces:
A stately pleasure dome decree*P:*
Where Alph*P,* the sacred river*P,* ran
Down to a sunless sea*P.*
Finally, to illustrate the effect of the g flag, the command:
/X/s/an/AN/p
produces (assuming no-copy mode):
In XANadu did Kubla Khan
and the command:
/X/s/an/AN/gp
produces:
In XANadu did Kubla KhAN
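A few further substitutions, sketched under the assumption of a POSIX shell, show & and \d in the replacement, an alternative delimiter, and the fact that g does not rescan inserted text. The one-line inputs and the variable names are made up for the illustration.

```shell
poem='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.'

# \1 and \2 stand for the first and second \( \) groups of the pattern.
swapped=$(printf 'Kubla Khan\n' | sed 's/\(Kubla\) \(Khan\)/\2, \1/')

# & stands for the whole matched string; -n plus the p flag writes only
# the lines on which a substitution was made.
marked=$(printf '%s\n' "$poem" | sed -n 's/[.,;?:]/*P&*/gp')

# Any character other than space or newline may delimit the arguments.
moved=$(printf '/usr/lib\n' | sed 's,/usr,/opt,')

# With g, text coming from the replacement is not scanned again, so each
# of the three original a's is doubled exactly once.
doubled=$(printf 'aaa\n' | sed 's/a/aa/g')

printf '%s\n' "$swapped" "$moved" "$doubled"
```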
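3.3. Input-output Functions
(2)p -- print
The print function writes the addressed lines to the standard output file. They are written at the time the p function is encountered, regardless of what succeeding editing commands may do to the lines.
(2)w <filename> -- write on <filename>
The write function writes the addressed lines to the file named by <filename>. If the file previously existed, it is overwritten; if not, it is created. The lines are written exactly as they exist when the write function is encountered for each line, regardless of what subsequent editing commands may do to them. Exactly one space must separate the w and <filename>. A maximum of 10 different files may be mentioned in write functions and w flags after s functions, combined.
(1)r <filename> -- read the contents of a file
The read function reads the contents of <filename>, and appends them after the line matched by the address. The file is read and appended regardless of what subsequent editing commands do to the line which matched its address. If r and a functions are executed on the same line, the text from the a functions and the r functions is written to the output in the order in which the functions are executed. Exactly one space must separate the r and <filename>. If a file mentioned by an r function cannot be opened, it is considered a null file, not an error, and no diagnostic is given.
Note: Since there is a limit to the number of files that can be opened simultaneously, care should be taken that no more than 10 (different) files are mentioned in w functions or flags; that number is reduced by one if any r functions are present. (Only one read file is open at one time.)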
Example
Assume that the file 'note1' has the following contents:
Note: Kubla Khan (more properly Kublai Khan;
1216-1294) was the grandson and most eminent
successor of Genghiz (Chingiz) Khan, and founder
of the Mongol dynasty in China.
Then the following command:
/Kubla/r note1
produces:
In Xanadu did Kubla Khan
Note: Kubla Khan (more properly Kublai Khan;
1216-1294) was the grandson and most eminent
successor of Genghiz (Chingiz) Khan, and founder
of the Mongol dynasty in China.
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
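The r and w functions can be exercised as follows. This is a sketch assuming a POSIX shell; it works in a temporary directory, uses a shortened one-line stand-in for the note1 file above, and the file name no-such-file is deliberately nonexistent.

```shell
poem='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.'

dir=$(mktemp -d)
# A one-line stand-in for the note1 file of the example.
printf '%s\n' 'Note: Kubla Khan was the grandson of Genghiz Khan.' > "$dir/note1"

# r: the file contents are appended after each line matching the address.
with_note=$(printf '%s\n' "$poem" | sed "/Kubla/r $dir/note1")

# w: the addressed lines are written to the named file.
printf '%s\n' "$poem" | sed -n "/to/w $dir/changes"
written=$(cat "$dir/changes")

# An unreadable r file counts as empty: no diagnostic, input copied as is.
quiet=$(printf '%s\n' "$poem" | sed "1r $dir/no-such-file" 2>&1)

rm -rf "$dir"
printf '%s\n' "$with_note"
```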
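3.4. Multiple Input-line Functions
Three functions, all spelled with capital letters, deal specially with pattern spaces containing embedded newlines; they are intended principally to provide pattern matches across lines in the input.
(2)N -- Next line
The next input line is appended to the current line in the pattern space; the two input lines are separated by an embedded newline. Pattern matches may extend across the embedded newline(s).
(2)D -- Delete first part of the pattern space
Delete up to and including the first newline character in the current pattern space. If the pattern space becomes empty (the only newline was the terminal newline), read another line from the input. In any case, begin the list of editing commands again from its beginning.
(2)P -- Print first part of the pattern space
Print up to and including the first newline in the pattern space.
The P and D functions are equivalent to their lower-case counterparts if there are no embedded newlines in the pattern space.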
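Two small sketches of the multi-line functions, assuming a POSIX shell. The first joins lines in pairs with N; the second is the well-known idiom that drops adjacent duplicate lines with N, P and D. The ' | ' separator and the a/b/c input are made up for the illustration.

```shell
poem='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.'

# On every line but the last, N appends the next input line; the embedded
# newline is then visible to the pattern as \n.
joined=$(printf '%s\n' "$poem" | sed '$!N
s/\n/ | /')

# Keep two lines in the pattern space. If they differ, P prints the first;
# D then removes the first and restarts the commands on what remains.
deduped=$(printf '%s\n' a a b b b c | sed '$!N
/^\(.*\)\n\1$/!P
D')

printf '%s\n' "$joined" "$deduped"
```

The $! in front of N avoids relying on what a particular sed does when N is executed on the last input line.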
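3.5. Hold and Get Functions
Four functions save and retrieve part of the input for possible later use.
(2)h -- hold pattern space
The h function copies the contents of the pattern space into a hold area (destroying the previous contents of the hold area).
(2)H -- Hold pattern space
The H function appends the contents of the pattern space to the contents of the hold area; the former and new contents are separated by a newline.
(2)g -- get contents of hold area
The g function copies the contents of the hold area into the pattern space (destroying the previous contents of the pattern space).
(2)G -- Get contents of hold area
The G function appends the contents of the hold area to the contents of the pattern space; the former and new contents are separated by a newline.
(2)x -- exchange
The exchange command interchanges the contents of the pattern space and the hold area.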
Example
The commands
1h
1s/ did.*//
1x
G
s/\n/ :/
applied to our standard example, produce:
In Xanadu did Kubla Khan :In Xanadu
A stately pleasure dome decree: :In Xanadu
Where Alph, the sacred river, ran :In Xanadu
Through caverns measureless to man :In Xanadu
Down to a sunless sea. :In Xanadu
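The hold-area example can be run as written, and the same functions give a compact way to reverse the order of the lines. This is a sketch assuming a POSIX shell; the reversal script is a commonly seen idiom, not part of this manual.

```shell
poem='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.'

# Line 1: save it, cut it down to "In Xanadu", swap so that the short form
# sits in the hold area. Every line: append the hold area and turn the
# separating newline into " :".
tagged=$(printf '%s\n' "$poem" | sed '1h
1s/ did.*//
1x
G
s/\n/ :/')

# Reversal: put everything seen so far after the current line (G), save
# the result (h), and print only once the last line has been reached.
reversed=$(printf '%s\n' "$poem" | sed -n '1!G
h
$p')

printf '%s\n' "$tagged"
```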
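3.6. Flow-of-Control Functions
These functions do no editing on the input lines, but control the application of functions to the lines selected by the address part.
(2)! -- Don't
The Don't command causes the next command (written on the same line) to be applied to all and only those input lines not selected by the address part.
(2){ -- Grouping
The grouping command '{' causes the next set of commands to be applied (or not applied) as a block to the input lines selected by the addresses of the grouping command. The first of the commands under control of the grouping may appear on the same line as the '{' or on the next line.
The group of commands is terminated by a matching '}' standing on a line by itself.
Groups can be nested.
(0):<label> -- place a label
The label function marks a place in the list of editing commands which may be referred to by b and t functions. The <label> may be any sequence of eight or fewer characters; if two different colon functions have identical labels, a compile-time diagnostic will be generated, and no execution attempted.
(2)b<label> -- branch to label
The branch function causes the sequence of editing commands being applied to the current input line to be restarted immediately after the place where a colon function with the same <label> was encountered. If no colon function with the same label can be found after all the editing commands have been compiled, a compile-time diagnostic is produced, and no execution is attempted.
A b function with no <label> is taken to be a branch to the end of the list of editing commands; whatever should be done with the current input line is done, and another input line is read; the list of editing commands is restarted from the beginning on the new line.
(2)t<label> -- test substitutions
The t function tests whether any successful substitutions have been made on the current input line; if so, it branches to <label>; if not, it does nothing. The flag which indicates that a successful substitution has been executed is reset by:
- 1) reading a new input line, or
- 2) executing a t function.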
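The flow-of-control functions can be sketched as follows, assuming a POSIX shell. The label name loop, the input baaaad and the '> ' prefix are invented for the illustration.

```shell
poem='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.'

# !: apply p to exactly the lines NOT selected by /an/.
not_an=$(printf '%s\n' "$poem" | sed -n '/an/!p')

# { }: one address controls a block of commands.
grouped=$(printf '%s\n' "$poem" | sed -n '/Alph/{
s/river/RIVER/
p
}')

# : and t: repeat the substitution for as long as it keeps succeeding.
collapsed=$(printf 'baaaad\n' | sed ':loop
s/aa/a/
t loop')

# b with no label: skip the remaining commands for the selected lines.
skipped=$(printf '%s\n' "$poem" | sed '/^A/b
s/^/> /')

printf '%s\n' "$not_an" "$grouped" "$collapsed"
```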
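3.7. Miscellaneous Functions
(1)= -- equals
The = function writes to the standard output the line number of the line matched by its address.
(1)q -- quit
The q function causes the current line to be written to the output (if it should be), any appended or read text to be written, and execution to be terminated.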
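A brief sketch of = and q, assuming a POSIX shell; the variable names are made up for the illustration.

```shell
poem='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.'

# $= writes the number of the last line, i.e. the line count.
count=$(printf '%s\n' "$poem" | sed -n '$=')

# With a context address, = reports where the match occurred.
where=$(printf '%s\n' "$poem" | sed -n '/Alph/=')

# q writes the current line and stops; later input is never read.
upto=$(printf '%s\n' "$poem" | sed '/river/q')

printf '%s\n' "$count" "$where" "$upto"
```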
Reference
[1] Ken Thompson and Dennis M. Ritchie, The UNIX Programmer’s Manual. Bell Laboratories, 1978.
Posted: 2006-06-27; last modified: 2006-07-04 20:57
Comment by lgfang (2006-06-28 11:20:13):
The original is a man page and cannot be read directly, so I formatted it with Emacs:
SED -- A Non-interactive Text Editor
Lee E. McMahon
Context search Editing
Sed is a non-interactive context editor that runs on the UNIX
operating system. Sed is designed to be especially useful in
three cases:
1) To edit files too large for comfortable interactive editing;
2) To edit any size file when the sequence of editing commands is
too complicated to be comfortably typed in interactive mode.
3) To perform multiple `global' editing functions efficiently in
one pass through the input.
This memorandum constitutes a manual for users of sed.
Introduction
Sed is a non-interactive context editor designed to be
especially useful in three cases:
1) To edit files too large for comfortable interactive editing;
2) To edit any size file when the sequence of editing commands is
too complicated to be comfortably typed in interactive mode;
3) To perform multiple `global' editing functions efficiently in
one pass through the input.
Since only a few lines of the input reside in core at one
time, and no temporary files are used, the effective size of
file that can be edited is limited only by the requirement
that the input and output fit simultaneously into available
secondary storage.
Complicated editing scripts can be created separately and
given to sed as a command file. For complex edits, this
saves considerable typing, and its attendant errors. Sed
running from a command file is much more efficient than any
interactive editor known to the author, even if that editor
can be driven by a pre-written script.
The principal loss of functions compared to an interactive
editor are lack of relative addressing (because of the
line-at-a-time operation), and lack of immediate
verification that a command has done what was intended.
Sed is a lineal descendant of the UNIX editor, ed. Because
of the differences between interactive and non-interactive
operation, considerable changes have been made between ed
and sed; even confirmed users of ed will frequently be
surprised (and probably chagrined), if they rashly use sed
without reading Sections 2 and 3 of this document. The most
striking family resemblance between the two editors is in
the class of patterns (`regular expressions') they
recognize; the code for matching patterns is copied almost
verbatim from the code for ed, and the description of
regular expressions in Section 2 is copied almost verbatim
from the UNIX Programmer's Manual[1]. (Both code and
description were written by Dennis M. Ritchie.)
1. Overall Operation
Sed by default copies the standard input to the standard
output, perhaps performing one or more editing commands on
each line before writing it to the output. This behavior
may be modified by flags on the command line; see Section
1.1 below.
The general format of an editing command is:
[address1,address2][function][arguments]
One or both addresses may be omitted; the format of addresses is
given in Section 2. Any number of blanks or tabs may separate
the addresses from the function. The function must be present;
the available commands are discussed in Section 3. The arguments
may be required or optional, according to which function is
given; again, they are discussed in Section 3 under each
individual function.
Tab characters and spaces at the beginning of lines are
ignored.
1.1. Command-line Flags
Three flags are recognized on the command line:
-n:
tells sed not to copy all lines, but only those
specified by p functions or p flags after s
functions (see Section 3.3);
-e:
tells sed to take the next argument as an editing
command;
-f:
tells sed to take the next argument as a file
name; the file should contain editing commands,
one to a line.
1.2. Order of Application of Editing Commands
Before any editing is done (in fact, before any input file
is even opened), all the editing commands are compiled into
a form which will be moderately efficient during the
execution phase (when the commands are actually applied to
lines of the input file). The commands are compiled in the
order in which they are encountered; this is generally the
order in which they will be attempted at execution time.
The commands are applied one at a time; the input to each
command is the output of all preceding commands.
The default linear order of application of editing commands
can be changed by the flow-of-control commands, t and b (see
Section 3). Even when the order of application is changed
by these commands, it is still true that the input line to
any command is the output of any previously applied command.
1.3. Pattern-space
The range of pattern matches is called the pattern space.
Ordinarily, the pattern space is one line of the input text,
but more than one line can be read into the pattern space by
using the N command (Section 3.4).
1.4. Examples
Examples are scattered throughout the text. Except where
otherwise noted, the examples all assume the following input
text:
In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
(In no case is the output of the sed commands to be
considered an improvement on Coleridge.)
Example:
The command
2q
will quit after copying the first two lines of the input.
The output will be:
In Xanadu did Kubla Khan
A stately pleasure dome decree:
2. ADDRESSES: Selecting lines for editing
Lines in the input file(s) to which editing commands are to
be applied can be selected by addresses. Addresses may be
either line numbers or context addresses.
The application of a group of commands can be controlled by
one address (or address-pair) by grouping the commands with
curly braces (`{ }')(Sec. 3.6.).
2.1. Line-number Addresses
A line number is a decimal integer. As each line is read
from the input, a line-number counter is incremented; a
line-number address matches (selects) the input line which
causes the internal counter to equal the address
line-number. The counter runs cumulatively through multiple
input files; it is not reset when a new input file is
opened.
As a special case, the character $ matches the last line of
the last input file.
2.2. Context Addresses
A context address is a pattern (`regular expression')
enclosed in slashes (`/'). The regular expressions
recognized by sed are constructed as follows:
1) An ordinary character (not one of those discussed below) is a
regular expression, and matches that character.
2) A circumflex `^' at the beginning of a regular expression
matches the null character at the beginning of a line.
3) A dollar-sign `$' at the end of a regular expression matches
the null character at the end of a line.
4) The characters `\n' match an imbedded newline character, but
not the newline at the end of the pattern space.
5) A period `.' matches any character except the terminal newline
of the pattern space.
6) A regular expression followed by an asterisk `*' matches any
number (including 0) of adjacent occurrences of the
regular expression it follows.
7) A string of characters in square brackets `[ ]' matches any
character in the string, and no others. If, however, the
first character of the string is circumflex `^', the regular
expression matches any character except the characters in
the string and the terminal newline of the pattern space.
8) A concatenation of regular expressions is a regular expression
which matches the concatenation of strings matched by
the components of the regular expression.
9) A regular expression between the sequences `\(' and `\)' is
identical in effect to the unadorned regular expression,
but has side-effects which are described under the s
command below and specification 10) immediately below.
10) The expression `\d' means the same string of characters
matched by an expression enclosed in `\(' and `\)' earlier
in the same pattern. Here d is a single digit; the string
specified is that beginning with the dth occurrence of `\('
counting from the left. For example, the expression
`^\(.*\)\1' matches a line beginning with two repeated
occurrences of the same string.
11) The null regular expression standing alone (e.g., `//') is
equivalent to the last regular expression compiled.
To use one of the special characters (^ $ . * [ ] \ /) as a
literal (to match an occurrence of itself in the input),
precede the special character by a backslash `\'.
For a context address to `match' the input requires that the
whole pattern within the address match some portion of the
pattern space.
2.3. Number of Addresses
The commands in the next section can have 0, 1, or 2
addresses. Under each command the maximum number of allowed
addresses is given. For a command to have more addresses
than the maximum allowed is considered an error.
If a command has no addresses, it is applied to every line
in the input.
If a command has one address, it is applied to all lines
which match that address.
If a command has two addresses, it is applied to the first
line which matches the first address, and to all subsequent
lines until (and including) the first subsequent line which
matches the second address. Then an attempt is made on
subsequent lines to again match the first address, and the
process is repeated.
Two addresses are separated by a comma.
Examples:
/an/ matches lines 1, 3, 4 in our sample text
/an.*an/ matches line 1
/^an/ matches no lines
/./ matches all lines
/\./ matches line 5
/r*an/ matches lines 1, 3, 4 (number = zero!)
/\(an\).*\1/ matches line 1
3. FUNCTIONS
All functions are named by a single character. In the
following summary, the maximum number of allowable addresses
is given enclosed in parentheses, then the single character
function name, possible arguments enclosed in angles (< >),
an expanded English translation of the single-character
name, and finally a description of what each function does.
The angles around the arguments are not part of the
argument, and should not be typed in actual editing
commands.
3.1. Whole-line Oriented Functions
(2)d -- delete lines The d function deletes from the
file (does not write to the output) all those
lines matched by its address(es). It also
has the side effect that no further commands
are attempted on the corpse of a deleted line;
as soon as the d function is executed, a new line
is read from the input, and the list of
editing commands is re-started from the
beginning on the new line.
(2)n -- next line The n function reads the next line
from the input, replacing the current line.
The current line is written to the output if it
should be. The list of editing commands is
continued following the n command.
(1)a\
<text> -- append lines
The a function causes the argument <text> to be
written to the output after the line matched by
its address. The a command is inherently
multi-line; a must appear at the end of a line,
and <text> may contain any number of lines. To
preserve the one-command-to-a-line fiction, the
interior newlines must be hidden by a backslash
character (`\') immediately preceding the newline.
The <text> argument is terminated by the first
unhidden newline (the first one not immediately
preceded by backslash). Once an a function is
successfully executed, <text> will be written to
the output regardless of what later commands do to
the line which triggered it. The triggering line
may be deleted entirely; <text> will still be
written to the output. The <text> is not scanned
for address matches, and no editing commands are
attempted on it. It does not cause any change in
the line-number counter.
(1)i\
<text> -- insert lines
The i function behaves identically to the a
function, except that <text> is written to the
output before the matched line. All other
comments about the a function apply to the i
function as well.
(2)c\
<text> -- change lines
The c function deletes the lines selected by its
address(es), and replaces them with the lines in
<text>. Like a and i, c must be followed by a
newline hidden by a backslash; and interior new
lines in <text> must be hidden by backslashes.
The c command may have two addresses, and
therefore select a range of lines. If it does,
all the lines in the range are deleted, but only
one copy of <text> is written to the output, not
one copy per line deleted. As with a and i,
<text> is not scanned for address matches, and no
editing commands are attempted on it. It does not
change the line-number counter. After a line has
been deleted by a c function, no further commands
are attempted on the corpse. If text is appended
after a line by a or r functions, and the line is
subsequently changed, the text inserted by the c
function will be placed before the text of the a
or r functions. (The r function is described in
Section 3.3.)
Note: Within the text put in the output by these functions,
leading blanks and tabs will disappear, as always in sed
commands. To get leading blanks and tabs into the output,
precede the first desired blank or tab by a backslash; the
backslash will not appear in the output.
Example:
The list of editing commands:
n
a\
XXXX
d
applied to our standard input, produces:
In Xanadu did Kubla Khan
XXXX
Where Alph, the sacred river, ran
XXXX
Down to a sunless sea.
In this particular case, the same effect would be produced
by either of the two following command lists:
n        n
i\       c\
XXXX     XXXX
d
3.2. Substitute Function
One very important function changes parts of lines selected
by a context search within the line.
(2)s<pattern><replacement><flags> -- substitute The s
function replaces part of a line
(selected by <pattern>) with <replacement>. It
can best be read:
Substitute for <pattern>, <replacement>
The <pattern> argument contains a pattern, exactly
like the patterns in addresses (see 2.2 above).
The only difference between <pattern> and a
context address is that the context address must
be delimited by slash (`/') characters; <pattern>
may be delimited by any character other than space
or newline. By default, only the first string
matched by <pattern> is replaced, but see the g
flag below. The <replacement> argument begins
immediately after the second delimiting character
of <pattern>, and must be followed immediately by
another instance of the delimiting character.
(Thus there are exactly three instances of the
delimiting character.) The <replacement> is not a
pattern, and the characters which are special in
patterns do not have special meaning in
<replacement>. Instead, other characters are
special:
& is replaced by the string matched
by <pattern>
\d (where d is a single digit) is replaced by
the dth substring matched by
parts of <pattern> enclosed in `\('
and `\)'. If nested substrings occur
in <pattern>, the dth is determined by
counting opening delimiters (`\('). As
in patterns, special characters may be
made literal by preceding them with
backslash (`\').
The <flags> argument may contain the following
flags:
g -- substitute <replacement> for all
(non-overlapping) instances of
<pattern> in the line. After a
successful substitution, the scan for
the next instance of <pattern> begins
just after the end of the inserted
characters; characters put into the line
from <replacement> are not rescanned.
p -- print the line if a successful
replacement was done. The p flag
causes the line to be written to the
output if and only if a substitution was
actually made by the s function.
Notice that if several s
functions, each followed by a p
flag, successfully substitute in the
same input line, multiple copies of
the line will be written to the
output: one for each successful
substitution.
w <filename> -- write the line to a file if a
successful replacement was done. The w
flag causes lines which are actually
substituted by the s function to be
written to a file named by <filename>.
If <filename> exists before sed is run,
it is overwritten; if not, it is
created. A single space must separate
w and <filename>. The
possibilities of multiple, somewhat
different copies of one input line
being written are the same as for p. A
maximum of 10 different file names may
be mentioned after w flags and w
functions (see below), combined.
Examples:
The following command, applied to our standard input,
s/to/by/w changes
produces, on the standard output:
In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless by man
Down by a sunless sea.
and, on the file `changes':
Through caverns measureless by man
Down by a sunless sea.
If the nocopy option is in effect, the command:
s/[.,;?:]/*P&*/gp
produces:
A stately pleasure dome decree*P:*
Where Alph*P,* the sacred river*P,* ran
Down to a sunless sea*P.*
Finally, to illustrate the effect of the g flag, the command:
/X/s/an/AN/p
produces (assuming nocopy mode):
In XANadu did Kubla Khan
and the command:
/X/s/an/AN/gp
produces:
In XANadu did Kubla KhAN
3.3. Input-output Functions
(2)p -- print The print function writes the addressed
lines to the standard output file. They are
written at the time the p function is
encountered, regardless of what succeeding editing
commands may do to the lines.
(2)w <filename> -- write on <filename> The write
function writes the addressed lines to the file
named by <filename>. If the file previously
existed, it is overwritten; if not, it is
created. The lines are written exactly as they
exist when the write function is encountered
for each line, regardless of what subsequent
editing commands may do to them. Exactly one
space must separate the w and <filename>. A
maximum of ten different files may be
mentioned in write functions and w flags
after s functions, combined.
(1)r <filename> -- read the contents of a file The read
function reads the contents of <filename>, and
appends them after the line matched by the
address. The file is read and appended
regardless of what subsequent editing commands
do to the line which matched its address. If
r and a functions are executed on the same line,
the text from the a functions and the r functions
is written to the output in the order that
the functions are executed. Exactly one
space must separate the r and <filename>. If a
file mentioned by a r function cannot be opened,
it is considered a null file, not an error, and no
diagnostic is given.
NOTE: Since there is a limit to the number of files that can
be opened simultaneously, care should be taken that no more
than ten files be mentioned in w functions or flags; that
number is reduced by one if any r functions are present.
(Only one read file is open at one time.)
Examples
Assume that the file `note1' has the following contents:
Note: Kubla Khan (more properly Kublai Khan;
1216-1294) was the grandson and most eminent
successor of Genghiz (Chingiz) Khan, and founder
of the Mongol dynasty in China.
Then the following command:
/Kubla/r note1
produces:
In Xanadu did Kubla Khan
Note: Kubla Khan (more properly Kublai Khan;
1216-1294) was the grandson and most eminent
successor of Genghiz (Chingiz) Khan, and founder
of the Mongol dynasty in China.
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
3.4. Multiple Input-line Functions
Three functions, all spelled with capital letters, deal
specially with pattern spaces containing imbedded newlines;
they are intended principally to provide pattern matches
across lines in the input.
(2)N -- Next line The next input line is appended to
the current line in the pattern space; the two
input lines are separated by an imbedded
newline. Pattern matches may extend across the
imbedded newline(s).
(2)D -- Delete first part of the pattern space Delete
up to and including the first newline character
in the current pattern space. If the pattern
space becomes empty (the only newline was the
terminal newline), read another line from the
input. In any case, begin the list of editing
commands again from its beginning.
(2)P -- Print first part of the pattern space Print up
to and including the first newline in the
pattern space.
The P and D functions are equivalent to their lower-case
counterparts if there are no imbedded newlines in the pattern
space.
3.5. Hold and Get Functions
Four functions save and retrieve part of the input for
possible later use.
(2)h -- hold pattern space The h function copies the
contents of the pattern space into a hold area
(destroying the previous contents of the hold area).
(2)H -- Hold pattern space The H function appends the
contents of the pattern space to the contents of the
hold area; the former and new contents are
separated by a newline.
(2)g -- get contents of hold area The g function copies the
contents of the hold area into the pattern space
(destroying the previous contents of the pattern
space).
(2)G -- Get contents of hold area The G function appends the
contents of the hold area to the contents of the
pattern space; the former and new contents are
separated by a newline.
(2)x -- exchange The exchange command interchanges the
contents of the pattern space and the hold area.
Example
The commands
1h
1s/ did.*//
1x
G
s/\n/ :/
applied to our standard example, produce:
In Xanadu did Kubla Khan :In Xanadu
A stately pleasure dome decree: :In Xanadu
Where Alph, the sacred river, ran :In Xanadu
Through caverns measureless to man :In Xanadu
Down to a sunless sea. :In Xanadu
3.6. Flow-of-Control Functions
These functions do no editing on the input lines, but
control the application of functions to the lines selected
by the address part.
(2)! -- Don't The Don't command causes the next command
(written on the same line), to be applied to
all and only those input lines not selected by the
address part.
(2){ -- Grouping The grouping command `{' causes the
next set of commands to be applied (or not
applied) as a block to the input lines selected by
the addresses of the grouping command. The first
of the commands under control of the grouping may
appear on the same line as the `{' or on the next
line.
The group of commands is terminated by a matching `}'
standing on a line by itself.
Groups can be nested.
(0):<label> -- place a label The label function marks a place in
the list of editing commands which may be referred to by b
and t functions. The <label> may be any sequence of
eight or fewer characters; if two different colon
functions have identical labels, a compile time
diagnostic will be generated, and no execution attempted.
(2)b<label> -- branch to label The branch function causes the
sequence of editing commands being applied to the current
input line to be restarted immediately after the place
where a colon function with the same <label> was
encountered. If no colon function with the same label
can be found after all the editing commands have been
compiled, a compile time diagnostic is produced, and
no execution is attempted. A b function with no <label>
is taken to be a branch to the end of the list of editing
commands; whatever should be done with the current input
line is done, and another input line is read; the list
of editing commands is restarted from the beginning on the
new line.
(2)t<label> -- test substitutions The t function tests whether
any successful substitutions have been made on the current
input line; if so, it branches to <label>; if not, it
does nothing. The flag which indicates that a successful
substitution has been executed is reset by:
1) reading a new input line, or
2) executing a t function.
3.7. Miscellaneous Functions
(1)= -- equals The = function writes to the standard
output the line number of the line matched by
its address.
(1)q -- quit The q function causes the current line to
be written to the output (if it should be),
any appended or read text to be written, and
execution to be terminated.
Reference
[1] Ken Thompson and Dennis M. Ritchie, The UNIX
Programmer's Manual. Bell Laboratories, 1978.