Spark SQL
IOE
SQL: schema + file
select ... from xxx where ...
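
A minimal sketch of the "schema + file" idea in plain Python (not Spark): the file holds untyped text, the schema gives each column a name and a type, and a query is then a projection plus a filter over the typed rows. The schema, the sample lines, and the helper names read_with_schema and select are made up for illustration.

```python
import io

# Hypothetical schema: (column name, type converter) in file column order.
SCHEMA = [("deptno", int), ("dname", str), ("loc", str)]

# Stand-in for a tab-delimited text file; the rows are illustrative sample data.
RAW_FILE = io.StringIO(
    "10\tACCOUNTING\tNEW YORK\n"
    "20\tRESEARCH\tDALLAS\n"
    "30\tSALES\tCHICAGO\n"
)


def read_with_schema(lines, schema, sep="\t"):
    """Yield one dict per line, casting each field with the schema's type."""
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        yield {name: cast(value) for (name, cast), value in zip(schema, fields)}


def select(rows, columns, where):
    """Rough equivalent of: select <columns> from rows where <predicate>."""
    return [tuple(row[c] for c in columns) for row in rows if where(row)]


rows = list(read_with_schema(RAW_FILE, SCHEMA))
result = select(rows, ["dname", "loc"], where=lambda r: r["deptno"] >= 20)
print(result)
```

Without the schema the file is just lines of text; with it, the same bytes can be queried like a table.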
SQL on Hadoop
Hive
Impala
Presto
Shark
Drill
Phoenix
Spark SQL
Hive on Spark
MapReduce
Tez
Spark
Spark API
SQL
DataFrame/Dataset
start-thriftserver.sh
Spark SQL is not about SQL
Spark SQL is about more than SQL
===>
ETL: DataSource API
V1
V2
Frontend
Catalyst: the core of Spark SQL
Backend
create table dept(
deptno int, dname string, loc string
) row format delimited fields terminated by '\t';
load data local inpath '/home/hadoop/data/dept.txt' overwrite into table dept;
select e.empno,e.ename,d.dname from emp e join dept d on e.deptno=d.deptno;
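
What the join above computes can be sketched in plain Python as a hash join: build a lookup table on the smaller side (dept), then probe it with each emp row. This is only one possible strategy (similar in spirit to a map-side or broadcast join); which strategy the engine actually picks depends on table sizes and configuration. The sample rows are illustrative, and the emp columns are assumed from the query since the notes do not give the emp schema.

```python
# Illustrative rows; the emp columns (empno, ename, deptno) are assumed from the query.
dept = [
    {"deptno": 10, "dname": "ACCOUNTING", "loc": "NEW YORK"},
    {"deptno": 20, "dname": "RESEARCH", "loc": "DALLAS"},
]
emp = [
    {"empno": 7369, "ename": "SMITH", "deptno": 20},
    {"empno": 7782, "ename": "CLARK", "deptno": 10},
    {"empno": 7999, "ename": "NOBODY", "deptno": 99},  # no matching dept row
]


def hash_join(left, right, key):
    """Inner join: build a hash table on `right`, then probe it with `left`."""
    table = {}
    for r in right:
        table.setdefault(r[key], []).append(r)
    for l in left:
        for r in table.get(l[key], []):
            yield l, r


# select e.empno, e.ename, d.dname from emp e join dept d on e.deptno = d.deptno
joined = [
    (e["empno"], e["ename"], d["dname"])
    for e, d in hash_join(emp, dept, "deptno")
]
print(joined)
```

The emp row with deptno 99 drops out because an inner join keeps only rows that have a match on both sides.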
create table rpgone_test(key string, value string);
explain extended select a.key*(5+6), b.value
from ruoze_test a join ruoze_test b
on a.key=b.key and a.key>10;
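
When a statement like this goes through Spark SQL, explain extended prints the parsed, analyzed, and optimized logical plans plus the physical plan. One rewrite to look for in the optimized plan is constant folding: (5+6) no longer needs to be evaluated per row because it becomes the literal 11. The toy sketch below shows the idea on a tiny expression tree in plain Python; it is not Catalyst's implementation, and the tuple encoding and the name fold_constants are made up for illustration.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_constants(expr):
    """Recursively replace operator nodes whose operands are both literals."""
    if not isinstance(expr, tuple):
        return expr  # a literal (int) or a column reference (str)
    op, left, right = expr
    left, right = fold_constants(left), fold_constants(right)
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)  # evaluate once, at planning time
    return (op, left, right)


# a.key * (5 + 6), written as a nested (operator, left, right) tuple.
parsed = ("*", "a.key", ("+", 5, 6))
optimized = fold_constants(parsed)
print(optimized)
```

The same plan is also a good place to check where the a.key>10 predicate ends up: an optimizer will usually try to apply such a filter before the join rather than after it.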
The simplest way to deal with big data: ignore it
What is the difference between thriftserver and spark-sql or spark-shell?