The following table declaration creates an external table that can read all the data files for this comma-delimited data in /data/stocks:

CREATE EXTERNAL TABLE IF NOT EXISTS stocks (
  exchange STRING,
  symbol STRING,
  ymd STRING,
  price_open FLOAT,
  price_high FLOAT,
  price_low FLOAT,
  price_close FLOAT,
  volume INT,
  price_adj_close FLOAT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/stocks';

Now load the data into the stocks table of the economy database (note that data can be loaded from either a directory or an individual file):

hive (economy)> load data local inpath '/home/landen/下载/infochimps_dataset_4777_download_16185/NASDAQ/NASDAQ_daily_prices_character' overwrite into table stocks;
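Since LOAD DATA also accepts a single file, here is a minimal sketch of the file form; the filename NASDAQ_daily_prices_A.csv is illustrative, not taken from the listing above:

-- Hypothetical single-file load; without OVERWRITE the file is simply
-- added alongside whatever is already in the table's directory.
LOAD DATA LOCAL INPATH '/home/landen/下载/infochimps_dataset_4777_download_16185/NASDAQ/NASDAQ_daily_prices_A.csv'
INTO TABLE stocks;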
It's conventional practice to specify a path that is a directory, rather than an individual file.
Hive will copy all the files in the directory, which gives you the flexibility of organizing
the data into multiple files and changing the file naming convention, without requiring a change
to your Hive scripts. Either way, the files will be copied to the appropriate location for the table
and the names will be the same.
If the LOCAL keyword is used, the path is assumed to be in the local filesystem and the data is copied
into the final location. If the LOCAL keyword is omitted, the path is assumed to be in the distributed
filesystem; in that case the data is moved from its source path to the final location rather than copied.
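For comparison, a sketch of the non-LOCAL form, assuming the files were first staged at a hypothetical HDFS path /user/landen/staging/NASDAQ:

-- LOAD without LOCAL: the path is an HDFS path (hypothetical here),
-- and Hive moves the files into the table's location instead of copying them.
LOAD DATA INPATH '/user/landen/staging/NASDAQ'
OVERWRITE INTO TABLE stocks;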
Notice:
@1. If you specify the OVERWRITE keyword, any data already present in the target directory will be
deleted first. Without the keyword, the new files are simply added to the target directory. However, if files
already exist in the target directory that match the filenames being loaded, the old files are overwritten.
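A quick illustration of the two modes side by side (the staging path is hypothetical):

-- Append mode: new files land next to the existing ones.
LOAD DATA LOCAL INPATH '/home/landen/staging/nyse_prices' INTO TABLE stocks;

-- Overwrite mode: the table directory is cleared before the new files arrive.
LOAD DATA LOCAL INPATH '/home/landen/staging/nyse_prices' OVERWRITE INTO TABLE stocks;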
@2. Hive does not verify that the data you are loading matches the schema for the table. However, it
will verify that the file format matches the table definition. For example, if the table was created with SEQUENCEFILE
storage, the loaded files must be sequence files.
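To make that format check concrete, here is a sketch of a table declared with SEQUENCEFILE storage; the table name and trimmed column list are hypothetical. Loading plain comma-delimited text files into it would fail the check:

-- Hypothetical table stored as sequence files; LOAD DATA into it
-- only accepts files that are already in SequenceFile format.
CREATE TABLE IF NOT EXISTS stocks_seq (
  exchange STRING,
  symbol STRING,
  ymd STRING,
  price_close FLOAT)
STORED AS SEQUENCEFILE;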
hive (economy)> select count(*) from stocks where symbol = 'BBND';
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201303271617_0002, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201303271617_0002
Kill Command = /home/landen/UntarFile/hadoop-1.0.4/libexec/../bin/hadoop job -kill job_201303271617_0002
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
2013-03-27 18:54:23,829 Stage-1 map = 0%, reduce = 0%
2013-03-27 18:54:33,043 Stage-1 map = 28%, reduce = 0%
2013-03-27 18:54:36,236 Stage-1 map = 41%, reduce = 0%
2013-03-27 18:54:39,244 Stage-1 map = 57%, reduce = 0%
2013-03-27 18:54:42,252 Stage-1 map = 90%, reduce = 0%
2013-03-27 18:54:45,264 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 16.08 sec
2013-03-27 18:54:46,268 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 16.08 sec
2013-03-27 18:54:47,273 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 16.08 sec
2013-03-27 18:54:48,278 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 16.08 sec
2013-03-27 18:54:49,283 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 16.08 sec
2013-03-27 18:54:50,287 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 16.08 sec
2013-03-27 18:54:51,291 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 16.08 sec
2013-03-27 18:54:52,295 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 16.08 sec
2013-03-27 18:54:53,299 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 16.08 sec
2013-03-27 18:54:54,304 Stage-1 map = 100%, reduce = 17%, Cumulative CPU 16.08 sec
2013-03-27 18:54:55,308 Stage-1 map = 100%, reduce = 17%, Cumulative CPU 16.08 sec
2013-03-27 18:54:56,313 Stage-1 map = 100%, reduce = 17%, Cumulative CPU 16.08 sec
2013-03-27 18:54:57,318 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 19.1 sec
2013-03-27 18:54:58,323 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 19.1 sec
2013-03-27 18:54:59,329 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 19.1 sec
2013-03-27 18:55:00,334 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 19.1 sec
2013-03-27 18:55:01,339 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 19.1 sec
2013-03-27 18:55:02,344 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 19.1 sec
2013-03-27 18:55:03,350 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 19.1 sec
MapReduce Total cumulative CPU time: 19 seconds 100 msec
Ended Job = job_201303271617_0002
MapReduce Jobs Launched:
Job 0: Map: 2  Reduce: 1  Cumulative CPU: 19.1 sec  HDFS Read: 481098497  HDFS Write: 4  SUCCESS
Total MapReduce CPU Time Spent: 19 seconds 100 msec
OK
731
Time taken: 49.812 seconds
hive (economy)>
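The job log above prints the settings Hive exposes for controlling reducer count. A minimal sketch of using one of them before rerunning the query; the value 2 is illustrative:

-- Pin a fixed number of reducers for the next query (value is illustrative);
-- alternatively, tune hive.exec.reducers.bytes.per.reducer and let Hive decide.
set mapred.reduce.tasks=2;
select count(*) from stocks where symbol = 'BBND';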