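Translator's Preface
This is my first attempt at translating technical documentation, so it surely contains errors and awkward passages. Please point them out and I will fix them right away.
I think reading English documentation involves three stages: (1) understanding the English itself, (2) understanding what the text is describing, and (3) putting it into practice. This translation aims to reach the second stage.
The translation is about 90% complete, but the main features are all covered clearly, which should be enough for most use cases.
Once you understand this document, you should not need to read much other Sqoop material.
Note: passages marked in red are ones I am unsure about or want to flag.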
Sqoop is a collection of related tools. To use Sqoop, you specify the tool you want to use and the arguments that control the tool.
If Sqoop is compiled from its own source, you can run Sqoop without a formal installation process by running the bin/sqoop program. Users of a packaged deployment of Sqoop (such as an RPM shipped with Apache Bigtop) will see this program installed as /usr/bin/sqoop. The remainder of this documentation will refer to this program as sqoop.
Translator's note: in other words, Sqoop is invoked as bin/sqoop (source build) or /usr/bin/sqoop (packaged install).
For example:
$ sqoop tool-name [tool-arguments]
Note: The $ character represents the shell prompt; it is not part of the command you type.
Translator's note on the next sentence: "ships with" means "comes bundled with", and "type" means "enter at the keyboard".
Sqoop ships with a help tool. To display a list of all available tools, type the following command:
$ sqoop help
usage: sqoop COMMAND [ARGS]

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  version            Display version information

See 'sqoop help COMMAND' for information on a specific command.
You can display help for a specific tool by entering: sqoop help (tool-name); for example, sqoop help import.
You can also add the --help argument to any command: sqoop import --help.
In addition to typing the sqoop (toolname) syntax, you can use alias scripts that specify the sqoop-(toolname) syntax. For example, the scripts sqoop-import, sqoop-export, etc. each select a specific tool.
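To make the relationship between the two syntaxes concrete, here is a minimal sketch that maps an alias-style invocation to the equivalent sqoop (toolname) form. It only illustrates the equivalence described above; the function name alias_to_tool_invocation is hypothetical and this is not how the alias scripts themselves are written.

```shell
# Hypothetical helper: print the "sqoop <tool>" form of an alias-style call.
alias_to_tool_invocation() {
  script="$1"              # e.g. sqoop-import
  shift                    # the remaining arguments belong to the tool
  tool="${script#sqoop-}"  # strip the "sqoop-" prefix to get the tool name
  echo "sqoop $tool $*"
}

alias_to_tool_invocation sqoop-import --connect jdbc:mysql://localhost/db
```

The call at the end prints the sqoop import form of the same command.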
You invoke Sqoop through the program launch capability provided by Hadoop. The sqoop command-line program is a wrapper which runs the bin/hadoop script shipped with Hadoop. If you have multiple installations of Hadoop present on your machine, you can select the Hadoop installation by setting the $HADOOP_COMMON_HOME and $HADOOP_MAPRED_HOME environment variables.
For example:
$ HADOOP_COMMON_HOME=/path/to/some/hadoop \
  HADOOP_MAPRED_HOME=/path/to/some/hadoop-mapreduce \
  sqoop import --arguments...
or:
$ export HADOOP_COMMON_HOME=/some/path/to/hadoop
$ export HADOOP_MAPRED_HOME=/some/path/to/hadoop-mapreduce
$ sqoop import --arguments...
If either of these variables is not set, Sqoop will fall back to $HADOOP_HOME. If it is not set either, Sqoop will use the default installation locations for Apache Bigtop, /usr/lib/hadoop and /usr/lib/hadoop-mapreduce, respectively.
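The fallback order just described can be sketched as a small shell function. This only illustrates the lookup order stated above and is not Sqoop's actual startup script; the function name resolve_hadoop_dirs is hypothetical.

```shell
# Hypothetical helper: print the Hadoop directories chosen by the
# fallback order described in the text.
resolve_hadoop_dirs() {
  # 1. the specific variable, 2. $HADOOP_HOME, 3. the Apache Bigtop default
  common="${HADOOP_COMMON_HOME:-${HADOOP_HOME:-/usr/lib/hadoop}}"
  mapred="${HADOOP_MAPRED_HOME:-${HADOOP_HOME:-/usr/lib/hadoop-mapreduce}}"
  echo "common=$common"
  echo "mapred=$mapred"
}

resolve_hadoop_dirs
```

Note that the ${VAR:-default} form treats an empty variable the same as an unset one, which is an assumption of this sketch.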
The active Hadoop configuration is loaded from $HADOOP_HOME/conf/, unless the $HADOOP_CONF_DIR environment variable is set.
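The same rule for the configuration directory, written as a one-line sketch. The function name resolve_hadoop_conf_dir is hypothetical, and the sketch assumes $HADOOP_HOME is set whenever $HADOOP_CONF_DIR is not.

```shell
# Hypothetical helper: print the directory the Hadoop configuration
# would be loaded from under the rule described in the text.
resolve_hadoop_conf_dir() {
  # $HADOOP_CONF_DIR wins when set; otherwise use $HADOOP_HOME/conf/
  echo "${HADOOP_CONF_DIR:-${HADOOP_HOME}/conf/}"
}
```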
To control the operation of each Sqoop tool, you use generic and specific arguments.
For example:
$ sqoop help import
usage: sqoop import [GENERIC-ARGS] [TOOL-ARGS]

Common arguments:
   --connect <jdbc-uri>            Specify JDBC connect string
   --connect-manager <jdbc-uri>    Specify connection manager class to use
   --driver <class-name>           Manually specify JDBC driver class to use
   --hadoop-mapred-home <dir>+     Override $HADOOP_MAPRED_HOME
   --help                          Print usage instructions
   -P                              Read password from console
   --password <password>           Set authentication password
   --username <username>           Set authentication username
   --verbose                       Print more information while working
   --hadoop-home <dir>+            Deprecated. Override $HADOOP_HOME

[...]

Generic Hadoop command-line arguments:
(must preceed any tool-specific arguments)
Generic options supported are
-conf <configuration file>                      specify an application configuration file
-D <property=value>                             use value for given property
-fs <local|namenode:port>                       specify a namenode
-jt <local|jobtracker:port>                     specify a job tracker
-files <comma separated list of files>          specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>         specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

Translator's note: the generic arguments are Hadoop's own options (see the Hadoop command documentation for details) and can be used with any Sqoop tool.
You must supply the generic arguments -conf, -D, and so on after the tool name but before any tool-specific arguments (such as --connect). Note that generic Hadoop arguments are preceded by a single dash character (-), whereas tool-specific arguments start with two dashes (--), unless they are single character arguments such as -P.
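The ordering rule can be illustrated with a small sketch that assembles a command line in the required order: tool name first, then generic Hadoop arguments, then tool-specific arguments. The function name build_sqoop_command and the property value example_import are hypothetical; the function only prints the command and does not run Sqoop.

```shell
# Hypothetical helper: assemble a Sqoop command line in the required order.
build_sqoop_command() {
  tool="$1"      # tool name, e.g. import
  generic="$2"   # generic Hadoop arguments (single dash), e.g. -D key=value
  specific="$3"  # tool-specific arguments (double dash, or -P)
  echo "sqoop $tool $generic $specific"
}

build_sqoop_command import \
  "-D mapred.job.name=example_import" \
  "--connect jdbc:mysql://localhost/db --username foo --table TEST"
```

Keeping the two groups in separate parameters makes it harder to place a -D option after --connect by accident.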
The -conf, -D, -fs and -jt arguments control the configuration and Hadoop server settings. For example, -D mapred.job.name=<job_name> can be used to set the name of the MR job that Sqoop launches. If it is not specified, the name defaults to the jar name for the job, which is derived from the name of the table used.
Translator's note: for the exact effect of each of these options, consult the Hadoop documentation.
The -files, -libjars, and -archives arguments are not typically used with Sqoop, but they are included as part of Hadoop's internal argument-parsing system.
When using Sqoop, the command line options that do not change from invocation to invocation can be put in an options file for convenience. An options file is a text file where each line identifies an option in the order that it appears otherwise on the command line. Option files allow specifying a single option on multiple lines by using the back-slash character at the end of intermediate lines. Also supported are comments within option files that begin with the hash character. Comments must be specified on a new line and may not be mixed with option text. All comments and empty lines are ignored when option files are expanded. Unless options appear as quoted strings, any leading or trailing spaces are ignored. Quoted strings if used must not extend beyond the line on which they are specified.
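The rules in the paragraph above can be sketched as a small expansion routine. This is an illustration under stated assumptions, not Sqoop's own parser: the function name expand_options_file is hypothetical, it relies on awk being available, and it deliberately leaves out quoted-string handling.

```shell
# Hypothetical helper: print the options in an options file, one per line.
# Covers comments, empty lines, whitespace trimming and back-slash
# continuation; quoted strings are NOT handled in this sketch.
expand_options_file() {
  awk '
    # Ignore leading and trailing whitespace on every line.
    { sub(/^[ \t]+/, ""); sub(/[ \t]+$/, "") }
    # Skip empty lines and comment lines (unless inside a continuation).
    pending == "" && ($0 == "" || /^#/) { next }
    # A trailing back-slash means the option continues on the next line.
    /\\$/ { sub(/\\$/, ""); pending = pending $0; next }
    # Otherwise emit the (possibly joined) option.
    { print pending $0; pending = "" }
  ' "$1"
}
```

For instance, running expand_options_file on a file path of your choice prints each option on its own line, in the order they would appear on the command line.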
Option files can be specified anywhere in the command line as long as the options within them follow the otherwise prescribed rules of options ordering. For instance, regardless of where the options are loaded from, they must follow the ordering such that generic options appear first, tool specific options next, finally followed by options that are intended to be passed to child programs.
To specify an options file, simply create an options file in a convenient location and pass it to the command line via the --options-file argument.
Whenever an options file is specified, it is expanded on the command line before the tool is invoked. You can specify more than one options file within the same invocation if needed.
For example, the following Sqoop invocation for import can be specified alternatively as shown below:
$ sqoop import --connect jdbc:mysql://localhost/db --username foo --table TEST
$ sqoop --options-file /users/homer/work/import.txt --table TEST
where the options file /users/homer/work/import.txt contains the following:
import
--connect
jdbc:mysql://localhost/db
--username
foo
The options file can have empty lines and comments for readability purposes. So the above example would work exactly the same if the options file /users/homer/work/import.txt contained the following:
#
# Options file for Sqoop import
#

# Specifies the tool being invoked
import

# Connect parameter and value
--connect
jdbc:mysql://localhost/db

# Username parameter and value
--username
foo

#
# Remaining options should be specified in the command line.
#
The following sections will describe each tool’s operation. The tools are listed in the most likely order you will find them useful.