Text file formats in Vim: fileformat

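http://hi.baidu.com/houhou1999/blog/item/4983d41b680825c5ac6e75cd.html

While browsing the Vim Tips Wiki today I found a good article about Vim's fileformat option. If you develop in an environment where unix and dos coexist, you will certainly run into this problem: for example, the annoying ^M at the end of lines, which can also stop some scripts from running correctly.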

The article is long, so here is a summary. The original is at http://vim.wikia.com/wiki/VimTip1585

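fileformat is all about line terminators, the characters that end each line and are not shown on screen. For historical reasons, dos, unix, and mac each adopted a different line terminator. (When will they ever be unified?) Supposedly these terminators even trace back to antique typewriters, which reminds me of the story about rockets and the width of a horse's rear end.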

unix LF only (each line ends with an LF character).
dos CRLF (each line ends with two characters, CR then LF).
mac CR only (each line ends with a CR character).
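
To make the three formats concrete, here is a small Python sketch (not part of the original article) that writes the same lines with each terminator so the resulting bytes can be compared. The names TERMINATORS and encode_lines are made up for this illustration.

```python
# Line terminators for the three formats listed above.
TERMINATORS = {
    "unix": b"\n",    # LF only
    "dos": b"\r\n",   # CR then LF
    "mac": b"\r",     # CR only
}


def encode_lines(lines, fileformat):
    """Join text lines into bytes, ending every line with the format's terminator."""
    eol = TERMINATORS[fileformat]
    return b"".join(line.encode("ascii") + eol for line in lines)


if __name__ == "__main__":
    # Print the raw bytes of the same two lines in each format.
    for name in ("unix", "dos", "mac"):
        print(name, encode_lines(["one", "two"], name))
```

The text of the lines is identical in all three cases; only the bytes between them differ.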

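I am glad I have not yet had to write code in all three environments at once. Use :set ff to see the fileformat of the current file, and :set ffs to see which formats Vim is set to try.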

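Reading a unix-format file as dos causes no trouble. Reading a dos-format file as unix leaves a ^M at the end of each line, which can break some scripts and is unpleasant to look at. Vim detects the file format automatically, choosing among the formats listed in the 'fileformats' (ffs) option. Its default depends on the platform: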

ffs=unix,dos Unix based systems
ffs=dos,unix Windows and DOS systems
ffs=mac,unix,dos Mac OS 9 systems

Beyond the platform default, Vim also has its own preference when choosing between dos and unix:

  • If all lines in the file end with CRLF, the dos file format will be applied, meaning that each CRLF is removed when reading the lines into a buffer, and the buffer 'ff' option will be dos.
  • If one or more lines end with LF only, the unix file format will be applied, meaning that each LF is removed (but each CR will be present in the buffer, and will display as ^M ), and the buffer 'ff' option will be unix.
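
The two rules above can be sketched in Python as follows. This is only an illustration of the quoted rules, not Vim's actual implementation: the function name detect_fileformat is hypothetical, only the unix and dos formats are considered, and falling back to unix for data without any LF is an arbitrary choice made for this sketch.

```python
def detect_fileformat(data: bytes) -> str:
    """Return 'dos' or 'unix' following the two rules quoted above."""
    # Split on LF. The last piece is whatever follows the final LF
    # (empty when the data ends with a line terminator), so it is
    # not an LF-terminated line and is left out of the check.
    pieces = data.split(b"\n")
    terminated = pieces[:-1]
    # dos only if there is at least one line and every line ends with CR LF.
    if terminated and all(piece.endswith(b"\r") for piece in terminated):
        return "dos"
    # One or more LF-only lines (or no LF at all, by assumption): unix.
    return "unix"
```

A single LF-only line is enough to make the whole file count as unix, which is why a mostly-dos file with a few stray unix lines shows ^M almost everywhere.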

When you view a file in Vim that mixes LF-only and CRLF line endings, you will see a lot of ^M. The following steps make the format uniform.

Convert from dos/unix to unix

To convert the current file from any mixture of CRLF/LF-only line endings, so all lines end with LF only:

:update             Save any changes.
:e ++ff=dos         Edit file again, using dos file format ('fileformats' is ignored).[A 1]
:setlocal ff=unix   This buffer will use LF-only line endings when written.[A 2]
:w                  Write buffer using unix (LF-only) line endings.

In the above, replacing :setlocal ff=unix with :setlocal ff=mac would write the file with mac (CR-only) line endings. Or, if it was a mac file to start with, you would use :e ++ff=mac to read the file correctly, so you could convert the line endings to unix or dos.
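
For readers who want to see what this read-then-write sequence does to the bytes, here is a rough Python sketch. It only mimics the effect described above and is not how Vim works internally; the names EOL, read_as_dos, write_as, and convert are invented for the example.

```python
import re

# Terminator written for each target format.
EOL = {"unix": b"\n", "dos": b"\r\n", "mac": b"\r"}


def read_as_dos(data: bytes) -> list:
    """Split into lines, treating both CRLF and LF-only as terminators."""
    lines = re.split(rb"\r?\n", data)
    # A terminator at the very end of the data leaves an empty last piece.
    if lines and lines[-1] == b"":
        lines.pop()
    return lines


def write_as(lines, fileformat: str) -> bytes:
    """Join lines, ending every line with the terminator of the given format."""
    return b"".join(line + EOL[fileformat] for line in lines)


def convert(data: bytes, fileformat: str = "unix") -> bytes:
    """Read a CRLF/LF mixture and write it back with uniform line endings."""
    return write_as(read_as_dos(data), fileformat)
```

Passing "mac" or "dos" instead of "unix" only changes the terminator used when writing; the reading step stays the same.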

Convert from dos/unix to dos

To convert the current file from any mixture of CRLF/LF-only line endings, so all lines end with CRLF only:

:update         Save any changes.
:e ++ff=dos     Edit file again, using dos file format ('fileformats' is ignored).[A 1]
:w              Write buffer using dos (CRLF) line endings.

Notes A
  1. The :e command reads the current file again, using the ++ff=dos option so the read will omit all CRLF and LF-only line terminators (dos file format). Each ^M at the end of a line should disappear. Some older versions of Vim do not perform this step correctly and the ^M endings are not removed; upgrade Vim to fix. See :help :e
  2. Use :setlocal (or :setl ) to avoid changing the global default.

Of course, sometimes the ^M characters are still not completely gone. The last resort is to replace them with a regular expression.

First ensure you have read the file with the appropriate file format. For example, use :e ++ff=dos to remove all CRLF and LF-only line terminators, or use :e ++ff=mac if the file uses CR as a line terminator.

After reading with the correct file format, the buffer may still contain unwanted CR characters. You can search for these with /\r (slash starts a search; backslash r represents CR when searching; press Enter to search).

To delete ^M at line endings, and replace it with a space everywhere else (the c flag will prompt to confirm that you want each replacement, and the e flag prevents an error message if the string is not found):

:%s/\r\+$//e
:%s/\r/ /gce
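
As a rough picture of what these two substitutions do, here is a Python sketch that applies the same two steps to a list of buffer lines. It is an illustration only: the name clean_cr is made up, the lines are assumed to have been split already (so they contain no LF), and there is no per-match confirmation like the c flag provides.

```python
def clean_cr(lines):
    """Drop CRs at the end of each line, then turn remaining CRs into spaces."""
    cleaned = []
    for line in lines:
        # Step 1, like :%s/\r\+$//e : remove one or more CRs at the end of the line.
        line = line.rstrip("\r")
        # Step 2, like :%s/\r/ /ge : replace every CR that is left with a space.
        line = line.replace("\r", " ")
        cleaned.append(line)
    return cleaned
```

The order matters: if the second step ran first, trailing CRs would become trailing spaces instead of being removed.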


To process, say, all *.txt files in the current directory:

vim *.txt
:set hidden
:bufdo %s/\r\+$//e
:bufdo %s/\r/ /ge
:xa
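
The same batch cleanup could be approximated outside Vim with a short Python script. This is a sketch under stated assumptions rather than an equivalent of the commands above: clean_file and clean_directory are hypothetical names, and every file is treated as unix format (split on LF), whereas Vim detects the format of each file separately.

```python
from pathlib import Path


def clean_file(path: Path) -> None:
    """Rewrite one file with trailing CRs removed and other CRs replaced by spaces."""
    data = path.read_bytes()
    # Assumption: LF is the line terminator, so split on LF only.
    lines = data.split(b"\n")
    # Remove CRs at line ends, then replace any remaining CR with a space.
    lines = [line.rstrip(b"\r").replace(b"\r", b" ") for line in lines]
    path.write_bytes(b"\n".join(lines))


def clean_directory(directory: str = ".", pattern: str = "*.txt") -> None:
    """Apply clean_file to every file in the directory that matches the pattern."""
    for path in sorted(Path(directory).glob(pattern)):
        clean_file(path)
```

Because the files are overwritten in place, it is worth trying this on a copy of the directory first.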


To delete every ^M , regardless of where they occur in a line (this is not a good idea if two lines were separated only by a CR because the command joins the lines together):

:%s/\r//g


To replace every CR with LF (when searching, \r matches CR, but when replacing, \r inserts LF; this is not a good idea if LF occurs at the end of a line, because an extra blank line will be created):

:%s/\r/\r/g
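
The caveats attached to the last two commands are easier to see on a tiny example. The Python sketch below imitates both commands on a list of buffer lines; delete_cr and cr_to_line_break are names invented for this illustration.

```python
def delete_cr(lines):
    # Like :%s/\r//g : remove every CR. Text that was separated
    # only by a CR ends up joined together on one line.
    return [line.replace("\r", "") for line in lines]


def cr_to_line_break(lines):
    # Like :%s/\r/\r/g : every CR becomes a line break. A CR at the
    # end of a line therefore produces an extra empty line.
    result = []
    for line in lines:
        result.extend(line.split("\r"))
    return result
```

With the buffer line "one\rtwo", the first function yields the single line "onetwo", while the second yields two lines. With "three\r", the second function yields "three" followed by an empty line, which is the extra blank line the text warns about.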


The rest of the article deals with mac files. Few people will ever run into this, so read it yourself if you need it.

If a file uses CR line terminators, it should be read as mac (using :e ++ff=mac ). After doing that, you may see unwanted ^J (LF) characters. In a mac buffer, all CR characters will have been removed because CR is the line terminator, and searching for \r will find unwanted LF characters. Use these commands to remove ^J from the start of all lines, and to replace all other ^J with a line break:

:%s/^\r//e
:%s/\r/\r/ge
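
Here is a Python sketch of the situation just described: the data is split on CR only (as a mac read would do), leftover LF characters stay inside the lines, and the two cleanup steps are then applied. The function names read_as_mac and clean_lf are hypothetical, and the code only imitates the described effect.

```python
def read_as_mac(data: bytes) -> list:
    """Split on CR only, so any LF stays inside the lines."""
    lines = data.split(b"\r")
    # A CR at the very end of the data leaves an empty last piece.
    if lines and lines[-1] == b"":
        lines.pop()
    return lines


def clean_lf(lines):
    """Remove an LF at the start of a line, split the line at any other LF."""
    result = []
    for line in lines:
        # Like :%s/^\r//e : drop one LF at the start of the line.
        if line.startswith(b"\n"):
            line = line[1:]
        # Like :%s/\r/\r/ge : every remaining LF becomes a line break.
        result.extend(line.split(b"\n"))
    return result
```

For example, the bytes a CR LF b CR c LF d CR are read as the three lines "a", LF + "b", and "c" + LF + "d", and after the cleanup they become the four lines a, b, c, d.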



CR is carriage return (return cursor to left margin), which is Ctrl-M or ^M or hex 0D.

LF is linefeed (move cursor down), which is Ctrl-J or ^J or hex 0A. Sometimes, LF is written as NL (newline).

Mac OS version 9 and earlier use mac line endings, while Mac OS X and later use unix line endings.
