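Importing MySQL data into Hive with a shell script

Use sqoop to import the data in the us_order table into Hive. The Hive database is exam_ods and the table is ods_us_order. The data is imported partition by partition according to the date in order_date, and the whole procedure is wrapped in a script.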


#!/bin/bash

# Drop the existing Hive table first so the import starts from a clean state
/usr/local/hive/bin/hive -e "drop table if exists exam_ods.ods_us_order" > /dev/null 2>&1

# Collect the distinct order dates; -s keeps the column header out of the captured result
ALL_DATE=$(/usr/bin/mysql -h mypc01 -uroot -p123456 -s -e 'select date(order_date) from sz.us_order group by date(order_date)')

# Variables used below
SQOOP_HOME=/usr/local/sqoop
MYSQL_CONNECT=jdbc:mysql://mypc01:3306/sz
MYSQL_USERNAME=root
MYSQL_PWD=123456

# Run one sqoop import per date, each into its own dt partition
for ONE_DAY in ${ALL_DATE}
do
    "${SQOOP_HOME}/bin/sqoop" import \
        --connect "${MYSQL_CONNECT}" \
        --username "${MYSQL_USERNAME}" \
        --password "${MYSQL_PWD}" \
        --table 'us_order' \
        --where "date(order_date)='${ONE_DAY}'" \
        --hive-import \
        --hive-overwrite \
        --hive-table 'exam_ods.ods_us_order' \
        --fields-terminated-by ',' \
        --hive-partition-key 'dt' \
        --hive-partition-value "${ONE_DAY}" \
        --num-mappers 3
done
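
The loop in the script passes every whitespace-separated word of ALL_DATE to sqoop as a partition value, so a stray word such as a column header would become a partition too. The following is a minimal sketch of a format check that could run before the import. The function name is_iso_date and the sample value are assumptions made for illustration, not part of the original script, and the check only looks at the shape YYYY-MM-DD, not at calendar validity.

```shell
# Hypothetical helper: succeeds only if the argument has the shape YYYY-MM-DD.
is_iso_date() {
    case "$1" in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Sample value standing in for the mysql output, including a stray header word.
ALL_DATE="date(order_date) 2019-07-14 2019-07-15 2019-07-16"

VALID_DATES=""
for ONE_DAY in ${ALL_DATE}
do
    if is_iso_date "${ONE_DAY}"; then
        # Append with a separating space once the list is non-empty.
        VALID_DATES="${VALID_DATES}${VALID_DATES:+ }${ONE_DAY}"
    else
        echo "skipping unexpected value: ${ONE_DAY}" >&2
    fi
done

echo "partitions to import: ${VALID_DATES}"
```

In the real script, the sqoop call would sit where the value is appended to VALID_DATES.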

Notes

  • hive -e executes a SQL query string passed on the command line. Other commonly used hive command-line options are listed below.
usage: hive
 -d,--define <key=value>          Variable subsitution to apply to hive
                                  commands. e.g. -d A=B or --define A=B
    --database <databasename>     Specify the database to use
 -e <quoted-query-string>         SQL from command line
 -f <filename>                    SQL from files
 -H,--help                        Print help information
    --hiveconf <property=value>   Use value for given property
    --hivevar <key=value>         Variable subsitution to apply to hive
                                  commands. e.g. --hivevar A=B
 -i <filename>                    Initialization SQL file
 -S,--silent                      Silent mode in interactive shell
 -v,--verbose                     Verbose mode (echo executed SQL to the
                                  console)
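  • > /dev/null 2>&1 redirects standard output to /dev/null and then points standard error at the same place, so neither normal output nor error messages appear on the console.
  • /usr/bin/mysql -h mypc01 -uroot -p123456 -s -e 'select date(order_date) from sz.us_order group by date(order_date)' connects to the MySQL server on host mypc01 and executes one SQL statement.
  • The help output shows that -h is followed by the host name of the MySQL server and -e by the SQL statement; -s means silent output.
    mysql has further options that can be used from the Linux command line, for example: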
# mysql --help
Usage: mysql [OPTIONS] [database]
 -?, --help          Display this help and exit.
 -I, --help          Synonym for -?
 --auto-rehash       Enable automatic rehashing. One doesn't need to use
                     'rehash' to get table and field completion, but startup
                     and reconnecting may take a longer time. Disable with
                     --disable-auto-rehash.
                     (Defaults to on; use --skip-auto-rehash to disable.)
 -A, --no-auto-rehash
                     No automatic rehashing. One has to use 'rehash' to get
                     table and field completion. This gives a quicker start of
                     mysql and disables rehashing on reconnect.
 --auto-vertical-output
                     Automatically switch to vertical output mode if the
                     result is wider than the terminal width.
 -B, --batch         Don't use history file. Disable interactive behavior.
                     (Enables --silent.)
 --bind-address=name IP address to bind to.
 --binary-as-hex     Print binary data as hex
 --character-sets-dir=name
                     Directory for character set files.
 --column-type-info  Display column type information.
 -c, --comments      Preserve comments. Send comments to the server. The
                     default is --skip-comments (discard comments), enable
                     with --comments.
 -C, --compress      Use compression in server/client protocol.
 -#, --debug[=#]     This is a non-debug version. Catch this and exit.
 --debug-check       This is a non-debug version. Catch this and exit.
 -T, --debug-info    This is a non-debug version. Catch this and exit.
 -D, --database=name Database to use.
 --default-character-set=name
                     Set the default character set.
 --delimiter=name    Delimiter to be used.
 --enable-cleartext-plugin
                     Enable/disable the clear text authentication plugin.
 -e, --execute=name  Execute command and quit. (Disables --force and history
                     file.)
 -E, --vertical      Print the output of a query (rows) vertically.
 -f, --force         Continue even if we get an SQL error.
 --histignore=name   A colon-separated list of patterns to keep statements
                     from getting logged into syslog and mysql history.
 -G, --named-commands
                     Enable named commands. Named commands mean this program's
                     internal commands; see mysql> help . When enabled, the
                     named commands can be used from any line of the query,
                     otherwise only from the first line, before an enter.
                     Disable with --disable-named-commands. This option is
                     disabled by default.
 -i, --ignore-spaces Ignore space after function names.
 --init-command=name SQL Command to execute when connecting to MySQL server.
                     Will automatically be re-executed when reconnecting.
 --local-infile      Enable/disable LOAD DATA LOCAL INFILE.
 -b, --no-beep       Turn off beep on error.
 -h, --host=name     Connect to host.
 -H, --html          Produce HTML output.
 -X, --xml           Produce XML output.
 --line-numbers      Write line numbers for errors.
                     (Defaults to on; use --skip-line-numbers to disable.)
 -L, --skip-line-numbers
                     Don't write line number for errors.
 -n, --unbuffered    Flush buffer after each query.
 --column-names      Write column names in results.
                     (Defaults to on; use --skip-column-names to disable.)
 -N, --skip-column-names
                     Don't write column names in results.
 --sigint-ignore     Ignore SIGINT (CTRL-C).
 -o, --one-database  Ignore statements except those that occur while the
                     default database is the one named at the command line.
 --pager[=name]      Pager to use to display results. If you don't supply an
                     option, the default pager is taken from your ENV variable
                     PAGER. Valid pagers are less, more, cat [> filename],
                     etc. See interactive help (\h) also. This option does not
                     work in batch mode. Disable with --disable-pager. This
                     option is disabled by default.
 -p, --password[=name]
                     Password to use when connecting to server. If password is
                     not given it's asked from the tty.
 -P, --port=#        Port number to use for connection or 0 for default to, in
                     order of preference, my.cnf, $MYSQL_TCP_PORT,
                     /etc/services, built-in default (3306).
 --prompt=name       Set the mysql prompt to this value.
 --protocol=name     The protocol to use for connection (tcp, socket, pipe,
                     memory).
 -q, --quick         Don't cache result, print it row by row. This may slow
                     down the server if the output is suspended. Doesn't use
                     history file.
 -r, --raw           Write fields without conversion. Used with --batch.
 --reconnect         Reconnect if the connection is lost. Disable with
                     --disable-reconnect. This option is enabled by default.
                     (Defaults to on; use --skip-reconnect to disable.)
 -s, --silent        Be more silent. Print results with a tab as separator,
                     each row on new line.
 -S, --socket=name   The socket file to use for connection.
 --ssl-mode=name     SSL connection mode.
 --ssl               Deprecated. Use --ssl-mode instead.
                     (Defaults to on; use --skip-ssl to disable.)
 --ssl-verify-server-cert
                     Deprecated. Use --ssl-mode=VERIFY_IDENTITY instead.
 --ssl-ca=name       CA file in PEM format.
 --ssl-capath=name   CA directory.
 --ssl-cert=name     X509 cert in PEM format.
 --ssl-cipher=name   SSL cipher to use.
 --ssl-key=name      X509 key in PEM format.
 --ssl-crl=name      Certificate revocation list.
 --ssl-crlpath=name  Certificate revocation list path.
 --tls-version=name  TLS version to use, permitted values are: TLSv1, TLSv1.1,
                     TLSv1.2
 --server-public-key-path=name
                     File path to the server public RSA key in PEM format.
 --get-server-public-key
                     Get server public key
 -t, --table         Output in table format.
 --tee=name          Append everything into outfile. See interactive help (\h)
                     also. Does not work in batch mode. Disable with
                     --disable-tee. This option is disabled by default.
 -u, --user=name     User for login if not current user.
 -U, --safe-updates  Only allow UPDATE and DELETE that uses keys.
 -U, --i-am-a-dummy  Synonym for option --safe-updates, -U.
 -v, --verbose       Write more. (-v -v -v gives the table output format).
 -V, --version       Output version information and exit.
 -w, --wait          Wait and retry if connection is down.
 --connect-timeout=# Number of seconds before connection timeout.
 --max-allowed-packet=#
                     The maximum packet length to send to or receive from
                     server.
 --net-buffer-length=#
                     The buffer size for TCP/IP and socket communication.
 --select-limit=#    Automatic limit for SELECT when using --safe-updates.
 --max-join-size=#   Automatic limit for rows in a join when using
                     --safe-updates.
 --secure-auth       Refuse client connecting to server if it uses old
                     (pre-4.1.1) protocol. Deprecated. Always TRUE
 --server-arg=name   Send embedded server this as a parameter.
 --show-warnings     Show warnings after every statement.
 -j, --syslog        Log filtered interactive commands to syslog. Filtering of
                     commands depends on the patterns supplied via histignore
                     option besides the default patterns.
 --plugin-dir=name   Directory for client-side plugins.
 --default-auth=name Default authentication client-side plugin to use.
 --binary-mode       By default, ASCII '\0' is disallowed and '\r\n' is
                     translated to '\n'. This switch turns off both features,
                     and also turns off parsing of all clientcommands except
                     \C and DELIMITER, in non-interactive mode (for input
                     piped to mysql or loaded using the 'source' command).
                     This is necessary when processing output from mysqlbinlog
                     that may contain blobs.
 --connect-expired-password
                     Notify the server that this client is prepared to handle
                     expired password sandbox mode.
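
The > /dev/null 2>&1 redirection mentioned in the notes depends on the order of its two parts. The sketch below illustrates this with a hypothetical function, noisy, that writes one line to each stream; it is not part of the original script.

```shell
# Hypothetical helper that writes one line to stdout and one to stderr.
noisy() {
    echo "to stdout"
    echo "to stderr" >&2
}

# stdout goes to /dev/null first, then stderr follows it: nothing is captured.
both_discarded=$(noisy > /dev/null 2>&1)

# Reversed order: stderr is pointed at the current stdout (the capture) first,
# and only afterwards is stdout sent to /dev/null, so the stderr line survives.
order_matters=$(noisy 2>&1 > /dev/null)

echo "both discarded: '${both_discarded}'"
echo "2>&1 first:     '${order_matters}'"
```

This is why the script writes the redirections in the order > /dev/null 2>&1.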

example

[root@mypc01 openresty]# /usr/bin/mysql -h mypc01 -uroot -p123456 -e 'select date(order_date) from sz.us_order group by date(order_date)'

+------------------+
| date(order_date) |
+------------------+
| 2019-07-14       |
| 2019-07-15       |
| 2019-07-16       |
+------------------+

How can the query result be assigned to a shell variable? Wrap the command in backticks (command substitution):

ALL_DATE=`/usr/bin/mysql -h mypc01 -uroot -p123456  -e 'select date(order_date) from sz.us_order group by date(order_date)'`
echo $ALL_DATE

The output is:

date(order_date) 2019-07-14 2019-07-15 2019-07-16

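To drop the first item (the column header), add -s, as the script does. The -N (--skip-column-names) option listed in the help output above also suppresses it.

The $(...) form of command substitution works as well: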

ALL_DATE=$(/usr/bin/mysql -h mypc01 -uroot -p123456  -e 'select date(order_date) from sz.us_order group by date(order_date)')
echo $ALL_DATE
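
The behaviour above can be reproduced without a database. This sketch assumes a hypothetical function, fake_mysql, that prints the same header and dates as the example query. It shows that both substitution forms capture the same text, why the unquoted echo prints everything on one line, and one alternative way of dropping the header with tail.

```shell
# Hypothetical stand-in for the mysql call: prints a header line and three dates.
fake_mysql() {
    printf '%s\n' 'date(order_date)' '2019-07-14' '2019-07-15' '2019-07-16'
}

# Both forms of command substitution capture the same text.
WITH_BACKTICKS=`fake_mysql`
WITH_DOLLAR=$(fake_mysql)

# An unquoted expansion is split on whitespace, so echo joins the lines with spaces.
FLAT=$(echo $WITH_DOLLAR)

# tail -n +2 starts printing at line 2, which drops the header line.
NO_HEADER=$(fake_mysql | tail -n +2)

echo "$FLAT"
echo "$NO_HEADER"
```

Quoting the variable, as in echo "$NO_HEADER", keeps one date per line.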

Next comes the shell loop. A simple loop of the same shape, by analogy:

for i in $(seq 1 10)
do
echo $i
done
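
The same for ... in form iterates over the dates once they are held in a variable. This is a minimal sketch with a hard-coded sample value standing in for the mysql result; the COUNT variable is added here only to make the iterations visible.

```shell
# Sample value standing in for the mysql -s output: one date per line.
ALL_DATE='2019-07-14
2019-07-15
2019-07-16'

COUNT=0
# The unquoted expansion is split on whitespace, giving one iteration per date.
for ONE_DAY in ${ALL_DATE}
do
    COUNT=$((COUNT + 1))
    echo "iteration ${COUNT}: dt=${ONE_DAY}"
done
```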
