create table

Creating a table

CREATE TABLE page_view(viewTime INT, userid BIGINT,
                page_url STRING, referrer_url STRING,
                ip STRING COMMENT 'IP Address of the User')
COMMENT 'This is the page view table'
PARTITIONED BY(dt STRING, country STRING)
STORED AS SEQUENCEFILE;



In this example, the columns of the table are specified with their corresponding types. Comments can be attached at both the column level and the table level. Additionally, the PARTITIONED BY clause defines the partitioning columns, which are distinct from the data columns and are not actually stored with the data. When a table is specified in this way, the data in the files is assumed to be delimited with ASCII 001 (ctrl-A) as the field delimiter and newline as the row delimiter.
SequenceFile: a container file format on HDFS, used to pack many small files together so that they are stored as one unit ...
MapFile: a sorted SequenceFile with an index.
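Because the partitioning columns are not stored in the data files, their values are supplied when a partition is created and can then be used as filters. The following is a minimal sketch, assuming the page_view table above already exists; the partition values '2008-06-08' and 'US' are made up for illustration.

```sql
-- Register an empty partition; dt and country are given here,
-- not read from the data files (the values are hypothetical).
ALTER TABLE page_view ADD PARTITION (dt='2008-06-08', country='US');

-- List the partitions that the table currently has.
SHOW PARTITIONS page_view;

-- Filtering on the partitioning columns lets Hive read only the
-- matching partition instead of the whole table.
SELECT page_url, ip
FROM page_view
WHERE dt = '2008-06-08' AND country = 'US';
```

Note that dt and country can be referenced in queries like ordinary columns, even though they do not appear in the column list of the table.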

It is also a good idea to bucket the tables on certain columns so that efficient sampling queries can be executed against the data set. If bucketing is absent, random sampling can still be done on the table but it is not efficient as the query has to scan all the data. The following example illustrates the case of the page_view table that is bucketed on the userid column:

CREATE TABLE page_view(viewTime INT, userid BIGINT,
                page_url STRING, referrer_url STRING,
                ip STRING COMMENT 'IP Address of the User')
COMMENT 'This is the page view table'
PARTITIONED BY(dt STRING, country STRING)
CLUSTERED BY(userid) SORTED BY(viewTime) INTO 32 BUCKETS
ROW FORMAT DELIMITED
        FIELDS TERMINATED BY '\001'
        COLLECTION ITEMS TERMINATED BY '\002'
        MAP KEYS TERMINATED BY '\003'
STORED AS SEQUENCEFILE;
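To show what an efficient sampling query on the bucketed table might look like, here is a short sketch using the TABLESAMPLE clause. It assumes the table was actually populated with bucketing in effect; the bucket number 3 and the partition values are arbitrary choices for illustration.

```sql
-- Read roughly 1/32 of the rows by sampling one of the 32 buckets.
-- Sampling ON userid matches the CLUSTERED BY column, so Hive can
-- read a single bucket rather than scan all the data.
SELECT pv.userid, pv.page_url
FROM page_view TABLESAMPLE(BUCKET 3 OUT OF 32 ON userid) pv
WHERE pv.dt = '2008-06-08' AND pv.country = 'US';
```

If the sampling column differs from the clustering column (for example ON rand()), the query still runs but falls back to scanning all the data.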
