----------------------------------------------------------------------------
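---- This article is an original work by andkylee; please respect the author's effort when reposting;
---- reposts must credit the original source: http://blog.csdn.net/andkylee
---- Keywords: ASE bcp syntax common files storage structure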
----------------------------------------------------------------------------
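This article covers three aspects of using the bcp utility:
1. the command and its parameters;
2. problems encountered and how to solve them;
3. the structure of the files that bcp produces.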
----------------------------------------------------------------------------------------------------------
Part 1: bcp command parameters
bcp ( version 11.0.x )
Function : Copies a database table to or from an operating system file in a user-specified format.
Syntax :
bcp [[database_name.]owner.]table_name {in | out}
datafile
[-m maxerrors] [-f formatfile] [-e errfile]
[-F firstrow] [-L lastrow] [-b batchsize]
[-n] [-c] [-t field_terminator] [-r row_terminator]
[-U username] [-P password] [-I sqlini_file]
[-S server] [-a display_charset]
[-q datafile_charset] [-z language] [-v]
[-A packet_size] [-J client_charset]
[-T text_or_image_size] [-E] [-N] [-X]
[-y sybase_dir]
Parameters :
database_name – is optional if the table being copied is in your default database. Otherwise, specify a database name.
owner – is optional if you or the Database Owner own the table being copied. If you do not specify an owner, bcp first looks for a table of that name owned by you. Then it looks for one owned by the Database Owner. If another user owns the table, you must specify the owner’s name or the command fails.
table_name – is the name of the database table to copy. The table name cannot be a Transact-SQL reserved word.
in | out – is the direction of the copy. in indicates a copy from a file into the database table; out indicates a copy to a file from the database table.
datafile – is the full path name of an operating system file. The path name can be from 1–255 characters in length.
-m max_errors – is the maximum number of nonfatal errors permitted before bcp aborts the copy. bcp discards each row that it cannot insert (due to a data conversion error, or an attempt to insert a null value into a column that does not allow them), counting each rejected row as one error. If you do not include this option, bcp uses a default value of 10.
-f format_file – is the full path name of a file with stored responses from a previous use of bcp on the same table. After you answer the bcp format questions, bcp asks if you want to save your answers in a format file; creation of the format file is optional. The default file name is bcp.fmt. The bcp program can refer to a format file when copying data, so that you do not have to duplicate your previous format responses interactively. Use this option only when you previously created a format file that you want to use now for a copy in or out. If this option is not used, bcp queries you for format information interactively.
-e errfile – is the full path name of an error file where bcp stores any rows that it was unable to transfer from the file to the database. Error messages from the bcp program appear on your terminal. bcp creates an error file only when you specify this option. If you specify this option, and bcp does not encounter any nonfatal errors, it does not create the error file.
-F firstrow – is the number of the first row to copy (default is the first row).
-L lastrow – is the number of the last row to copy (default is the last row).
-b batchsize – is the number of rows per serial batch of data copied (the default is to copy all the rows in one batch). Each batch is a transaction that is committed at the end of the batch. Batching applies only when bulk copying in; it has no effect on bulk copying out.
-n – performs the copy operation using native (operating system) formats. This option does not prompt for each field. Files in native data format are not human-readable.
-c – performs the copy operation with char datatype as the default. Use this format if you are sharing data between platforms. This option does not prompt for each field; it uses char as the default storage type, no prefixes, \t (tab) as the default field terminator, and \n (newline) as the default row terminator.
-t field_terminator – specifies the default field terminator.
-r row_terminator – specifies the default row terminator.
Note :
When specifying terminators from the command line with the -t or -r option, escape characters that have special significance to the Windows NT Command Prompt shell. Either place a backslash in front of the special character or enclose it in quotes. This is not necessary when bcp prompts you (interactive mode).
-U username – specifies a SQL Server login name. If you do not specify username, bcp uses the value of the USERNAME environment variable (the current user’s operating system login name, not the Sybase user name and login).
-P password – specifies a SQL Server password. If you do not specify -P password, bcp prompts for a password. If your password is NULL, place the -P flag at the end of the command line by itself.
-I sqlini_file – specifies the name and location of the interfaces file (sql.ini) to search when connecting to SQL Server. If you do not specify -I, bcp looks for a file named sql.ini in the ini subdirectory of your Sybase release directory.
-S server – specifies the name of the SQL Server to connect to. If you specify -S with no argument, bcp uses the server specified by the DSQUERY environment variable.
-a display_charset – runs bcp from a terminal where the character set differs from that of the machine on which bcp is running. (See the System Administration Guide for more information about changing character sets.) -a in conjunction with -J specifies the character set translation file (.xlt file) required for the conversion. Use -a without -J only if the client character set is the same as the default character set.
-q datafile_charset – runs bcp to copy character data to or from a file system that uses a character set different from the client character set. -q in conjunction with -J specifies the character set translation file (.xlt file) required for the conversion. In Japanese language environments, the -q flag translates Hankaku Katakana (half-width characters) into Zenkaku Katakana (full-width characters). Use with the argument “zenkaku” and with the -J flag to indicate the client’s Japanese character set (sjis or eucjis). The zenkaku.xlt file was designed to translate only from terminal display to SQL Server, not from SQL Server to the terminal.
Note :
The ascii_7 character set is compatible with all character sets. If either the SQL Server’s or client’s character set is set to ascii_7, any 7-bit ASCII character is allowed to pass between client and server unaltered. Other characters produce conversion errors. Character set conversion issues are covered more thoroughly in the System Administration Guide.
-z language – is the official name of an alternate language that the server uses to display bcp prompts and messages. Without the -z flag, bcp uses the server’s default language. You can add languages to a SQL Server during installation or add them afterward with the langinstall utility or the stored procedure sp_addlanguage.
-v – displays the version number of bcp and a copyright message and returns to the operating system.
-A packet_size – specifies the network packet size to use for this bcp session. For example:
bcp -A 2048
sets the packet size to 2048 bytes for this bcp session. size must be between the values of the default network packet size and max network packet size configuration parameters, one-third the size of the additional network memory configuration parameter, and a multiple of 512. To improve the performance of large bulk copy operations, use network packet sizes that are larger than the default.
-J client_charset – specifies the character set to use on the client. bcp uses a filter to convert input between client_charset and the SQL Server character set.
-J client_charset requests that SQL Server convert to and from client_charset, the character set used on the client. -J with no argument sets character set conversion to NULL. No conversion takes place. Use this parameter if the client and server use the same character set. The default may not necessarily be the character set that the client is using. See the System Administration Guide for more information about character sets and the associated flags.
-T text_or_image_size – specifies in bytes the maximum length of text or image data that SQL Server sends. The default is 32K. If a text or image field is larger than the value of -T or the default, bcp does not send the overflow.
-E – explicitly specifies the value of a table’s IDENTITY column. By default, when you bulk copy data into a table with an IDENTITY column, the host file must contain a placeholder for the IDENTITY column (a value of 0 is recommended). The server assigns the row a unique, sequential IDENTITY column value, as bcp inserts each row into the table. If the number of inserted rows exceeds the maximum possible IDENTITY column value, SQL Server returns an error message. To use an explicit IDENTITY column value from the host file for each row, specify the -E flag when copying data into a table. The -E option has no effect on bulk copying out.
-N – skips the IDENTITY column. Use this option when you copy data in, if your host data file does not include a placeholder for the IDENTITY column values, or when you copy data out and you do not want to include the IDENTITY column information in the host file.
-X – when connecting to the server, bcp initiates the login with client-side password encryption. bcp (the client) specifies to the server that password encryption is desired. The server sends back an encryption key, which bcp uses to encrypt your password, and the server uses the key to authenticate your password when it arrives. If bcp crashes, the system creates a core file that contains your password. If you did not use the encryption option, the password appears in plain text in the file. If you used the encryption option, your password is not readable.
-y sybase_dir – specifies a Sybase directory other than the default Sybase release directory.
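Part 2: Problems encountered and their solutions
After exporting a table with bcp out and loading it back with bcp in, the following message appeared under isql: CSLIB Message: - L0/O0/S0/N24/1/0: cs_convert: cslib user api layer: common library error: The conversion/operation was stopped due to a syntax error in the source field.
A search on Google and the forums turned up two things to check:
1. whether the source and target tables have the same structure;
2. whether the data in the source table contains characters identical to the delimiter.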
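The first point was not the problem, because the new table was created from the old table's DDL as exported with pb. Inspecting the exported data file finally showed that the data in one column contained the same character as the default delimiter (tab). That was the cause, and finding it was a relief, because it explains why data exported with -c in the past sometimes loaded successfully and sometimes loaded only partially or not at all. The delimiter was badly chosen, or, put another way, the data itself contained delimiter characters.
The -t parameter sets the field terminator and -r sets the row terminator. The default field terminator is tab (\t) and the default row terminator is newline (\n, ASCII 10). Examining several data files showed that on Windows, files exported with -c use carriage return plus line feed (\r\n) as the row terminator.
The problem above occurs with -c. If you use -n on the bcp command line, this error does not appear, provided you also use -n when loading.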
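One way to spot this kind of delimiter collision before loading is to count the fields in every row of the -c data file. The sketch below is only an illustration: the function name find_bad_rows and the sample data are made up, and it assumes the file fits in memory and that you know how many columns the table has.

```python
def find_bad_rows(data: bytes, expected_fields: int,
                  field_sep: bytes = b"\t", row_sep: bytes = b"\n"):
    """Return (row_number, field_count) for rows with an unexpected field count."""
    bad = []
    rows = data.split(row_sep)
    # A trailing row terminator leaves one empty chunk at the end; drop it.
    if rows and rows[-1] == b"":
        rows.pop()
    for number, row in enumerate(rows, start=1):
        count = row.count(field_sep) + 1
        if count != expected_fields:
            bad.append((number, count))
    return bad


# Hypothetical three-column export; row 2 has a tab inside its second column.
sample = b"1\talice\t10\r\n2\tbob\tsmith\t20\r\n3\tcarol\t30\r\n"
print(find_bad_rows(sample, expected_fields=3, row_sep=b"\r\n"))  # [(2, 4)]
```

A row terminator embedded in the data shows up the same way, as two consecutive rows with too few fields. If rows are reported, re-exporting with a -t (and -r) value that cannot occur in the data is the usual way out.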
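A data file exported with -n is not human-readable, and changing data in it is not easy (unless you understand the file structure, which is analyzed in Part 3). So if such a file fails to load, the data is hard to repair. Someone posted a request for help online: the source table's data had been deleted after a bcp -c export, the table had a text column whose data contained soft returns, hard returns, tabs and other special characters, and the file could no longer be loaded. They did not know what to do.
Which of -n and -c should you choose? In my view: if the table contains only numbers, dates and times, or strings without special characters (that is, the character data is fairly fixed in length and has no special characters), either will do. If you want to edit the exported data by hand, export with -c, because -n output is hard to modify. If the table has text or image columns, it is best to export with -n. text and image columns can sometimes be loaded correctly with -c as well, provided they contain no special characters. If you store Word documents in an image column, the soft and hard returns inside them will make bcp in fail, not to mention the other special characters. When exporting or loading long text columns, remember to add -T to specify explicitly how many bytes of a text or image column may be transferred (for example -T1000000, in bytes), so that the data is not truncated at the default 32K. Also check whether the client and server character sets match; -J can be used to force a particular one.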
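Three additional questions:
Q: How can bcp export only the data that meets a certain condition?
A: The bcp utility does not support exporting by condition. The -F and -L parameters exist, but the row range they select follows the internal storage order, which is hard to control. One approach is to create a view over the rows of the table that satisfy the condition and then bcp out that view.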
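Q: A row fails during loading. How do I find out exactly which row cannot be loaded?
A: Loading uses the -b parameter, which sets the number of rows committed or rolled back at a time; by default the whole load is treated as one transaction. The obvious approach is -b1, committing one row at a time so that the rows that fail show up. The trouble is that with 100M rows the failing row is still hard to pinpoint. Normally you would use something like -b1000 to load faster, since fast bcp operations are not logged anyway. The error message then says that a row somewhere between rows 2000 and 3000 failed to load, which still does not pinpoint the bad row. I think a better way is to use -F and -L. These two parameters set the range of rows to load or export, and here they can be used to locate the error. The method resembles binary search: narrow the range step by step until the row that cannot be loaded is found.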
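The -F/-L bisection can be sketched as follows. This is a minimal illustration, not a finished tool: load_ok is a hypothetical callback that would run something like bcp ... in datafile -F f -L l against a scratch table and report whether the whole range loaded cleanly; here it is replaced by a stand-in so the search logic can be run on its own.

```python
from typing import Callable


def find_first_bad_row(first: int, last: int,
                       load_ok: Callable[[int, int], bool]) -> int:
    """Binary-search the row range [first, last] for the first row that fails.

    Assumes at least one bad row exists somewhere in [first, last].
    """
    while first < last:
        mid = (first + last) // 2
        if load_ok(first, mid):
            first = mid + 1   # lower half is clean, so the bad row is above mid
        else:
            last = mid        # the first bad row is in the lower half
    return first


# Stand-in for a real bcp run: pretend row 2417 is the only bad row.
def fake_load_ok(f: int, l: int) -> bool:
    return not (f <= 2417 <= l)


print(find_first_bad_row(2000, 3000, fake_load_ok))  # 2417
```

For a batch of about 1000 rows this needs roughly ten trial loads instead of up to a thousand single-row loads.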
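Q: Loading data exported with bcp -n fails with: Negative length-prefix found in BCP data-file. bcp copy in failed.
A: The message means that a length prefix in the bcp data file is negative, so the copy fails. This can happen when the structure of the new table being loaded differs from that of the old table the data was exported from. During the load, the data file is read according to the new table's structure, so bytes that are read as the length prefix of a character field may actually be data from some column of the old table, and these can come out negative (a length can never be negative).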
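The effect is easy to reproduce in isolation. The sketch below assumes the prefix is a 2-byte little-endian signed integer, which is an inference from the native dump in Part 3 (where 0F 00 stands for 15), not something taken from Sybase documentation; read_length_prefix is a name made up for this illustration.

```python
import struct


def read_length_prefix(buf: bytes, offset: int = 0) -> int:
    """Read a 2-byte little-endian signed length prefix (assumed layout)."""
    (length,) = struct.unpack_from("<h", buf, offset)
    return length


print(read_length_prefix(bytes.fromhex("0F00")))  # 15, a genuine prefix
print(read_length_prefix(bytes.fromhex("C0B3")))  # -19520, two bytes of text misread as a length
```

Any two data bytes whose second byte is 0x80 or above, which is typical of double-byte Chinese text, turn into a negative length under this reading.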
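Part 3: The internal structure of data files exported by bcp
Viewing a file exported with -c in a binary or hex editor such as UE shows that, apart from the field and row terminators, almost everything is printable characters.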
31 38 36 32 30 30 30 30 30 30 30 30 30 30 34 09
31 09 31 38 36 32 30 30 30 30 30 30 30 30 30 30
34 09 35 32 33 09 34 31 09 20 09 4D 61 72 20 20
31 20 32 30 30 39 20 20 39 3A 34 37 3A 31 35 3A
33 32 36 50 4D 09 4D 61 72 20 20 31 20 32 30 30
39 20 20 39 3A 35 30 3A 30 33 3A 32 35 30 50 4D
09 31 CC EC 09 4D 61 72 20 20 32 20 32 30 30 39
20 31 31 3A 35 39 3A 35 39 3A 30 30 30 50 4D 09
09 30 46 32 30 30 30 30 31 09 30 46 32 30 30 30
30 31 09 30 46 32 30 30 30 30 31 09 C3 F1 CA C2
D2 BB C9 F3 B0 B8 BC FE 09 66 A1 A2 67 67 67 CB
DF 66 66 66 C0 CD B6 AF D5 F9 D2 E9 A1 A2 C8 CB
CA C2 D5 F9 D2 E9 20 09 CA D5 B0 B8 C9 F3 B2 E9
09 39 0D 0A 31 38 36 32 30 30 30 30 30 30 30 30
30 30 34 09 32 09
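The 09 bytes (marked in red in the original post) are tab characters separating the fields.
A data file exported with -n has the following structure: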
0A 0F 00 00 00 AD 88 16 39 20 10 05 03 00 00 00
01 0F 00 31 39 30 38 30 30 30 30 30 30 30 30 30
30 30 03 00 32 32 31 02 00 34 31 01 00 20 08 A1
8E 00 00 E8 D4 05 01 08 A1 8E 00 00 E8 D4 05 01
01 00 20 08 A1 8E 00 00 E8 D4 05 01 00 00 08 00
30 46 36 42 30 30 33 35 08 00 30 46 36 42 30 30
33 35 08 00 30 46 36 42 30 30 33 35 24 00 28 32
30 30 30 29 C0 B3 D0 CC B3 F5 D7 D6 B5 DA 30 30
30 30 31 BA C5 09 B5 C8 B4 FD C1 A2 B0 B8 B5 C7
BC C7 01 00 39
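If the data of a table with text or image columns cannot be loaded, you have to analyze the structure above to rescue your poor data. I have finished that analysis and have a basic grasp of the internal structure of bcp output. In the dump above, 0F 00 (red in the original post) is a field length of 15, and the data that follows it (light blue in the original post) is 190800000000000 (313930383030303030303030303030). The 08 (red) is a length of 8, and the A18E0000E8D40501 that follows it (light blue) is the date 1999-12-21 15:53:18.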
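The reading above can be checked with a few lines of code. The sketch below is based on assumptions drawn from the dump rather than on Sybase documentation: character fields carry a 2-byte little-endian length prefix, datetime fields carry a 1-byte length prefix, and a datetime is stored as a 4-byte count of days since 1900-01-01 followed by a 4-byte count of 1/300-second ticks since midnight. The column order is also assumed, and the helper names are made up.

```python
import struct
from datetime import datetime, timedelta

SYBASE_EPOCH = datetime(1900, 1, 1)


def read_varchar(buf: bytes, pos: int):
    """Read a field with a 2-byte little-endian length prefix (assumed layout)."""
    (length,) = struct.unpack_from("<H", buf, pos)
    start = pos + 2
    return buf[start:start + length], start + length


def read_datetime(buf: bytes, pos: int):
    """Read a datetime with a 1-byte length prefix (assumed layout)."""
    length = buf[pos]
    if length == 0:
        return None, pos + 1          # zero length is taken to mean NULL
    days, ticks = struct.unpack_from("<iI", buf, pos + 1)
    value = SYBASE_EPOCH + timedelta(days=days, seconds=ticks / 300)
    return value, pos + 1 + length


# Bytes copied from the -n dump above, from the first "0F 00" through the
# first datetime value.
fragment = bytes.fromhex(
    "0F 00 31 39 30 38 30 30 30 30 30 30 30 30 30 30 30 "
    "03 00 32 32 31 "
    "02 00 34 31 "
    "01 00 20 "
    "08 A1 8E 00 00 E8 D4 05 01"
)

pos = 0
texts = []
for _ in range(4):
    raw, pos = read_varchar(fragment, pos)
    texts.append(raw.decode("ascii"))
stamp, pos = read_datetime(fragment, pos)
print(texts, stamp)
# ['190800000000000', '221', '41', ' '] 1999-12-21 15:53:18
```

A1 8E 00 00 is 36513 days after 1900-01-01 and E8 D4 05 01 is 17159400 ticks, that is 57198 seconds, which gives the date and time stated above.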
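One more note: some time ago I wrote a program that extracts data from Sybase data and log device files. Building on this study of bcp -n output, a program could be written, if needed, to extract data from the non-readable files that bcp exports. You may think such a program is unnecessary, but in a case like the one mentioned in this article, where the source table's data has been deleted and bcp cannot load the file, it could be of some use.
From this analysis of the internal structure, I believe the load can sometimes work even when the new table's structure is not exactly the same as the old one. As long as the new table's structure can fully hold the old table's data, the load should succeed. I will test this when I have time. This is only my personal opinion.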