1. Avro primitive data types
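Type     Description                                    Schema example
null     The absence of a value                         "null"
boolean  A binary value                                 "boolean"
int      32-bit signed integer                          "int"
long     64-bit signed integer                          "long"
float    32-bit single-precision floating-point number  "float"
double   64-bit double-precision floating-point number  "double"
bytes    Byte array (sequence of 8-bit unsigned bytes)  "bytes"
string   Unicode character string                       "string"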
Note: a primitive type can also be written in a more verbose form that names it with the type attribute, e.g. {"type": "null"}.
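The table lists logical types only. In Avro's binary encoding, int and long values are written with variable-length zig-zag coding, so numbers close to zero (positive or negative) take fewer bytes. The following is a minimal, self-contained Java sketch of that encoding; the class name AvroZigZag and its methods are hypothetical illustrations, not part of the Avro library.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper (not an Avro library class): a sketch of the zig-zag
// variable-length encoding that Avro uses for int and long values.
class AvroZigZag {
    // Zig-zag maps signed to unsigned (0->0, -1->1, 1->2, -2->3, ...),
    // then 7 bits are emitted per byte, low bits first; every byte except
    // the last has its high bit set.
    static byte[] encodeLong(long value) {
        long n = (value << 1) ^ (value >> 63);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((n & ~0x7FL) != 0) {
            out.write((int) ((n & 0x7F) | 0x80));
            n >>>= 7;
        }
        out.write((int) n);
        return out.toByteArray();
    }

    // Reverse of encodeLong: collect the 7-bit groups, then undo zig-zag.
    static long decodeLong(byte[] bytes) {
        long n = 0;
        int shift = 0;
        for (byte b : bytes) {
            n |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) break;
            shift += 7;
        }
        return (n >>> 1) ^ -(n & 1);
    }

    // Format bytes as space-separated lowercase hex, e.g. "80 01".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        long[] samples = {0, -1, 1, -2, 2, -64, 64};
        for (long v : samples) {
            System.out.println(v + " -> " + hex(encodeLong(v)));
        }
    }
}
```

Running main prints one line per sample; for instance 1 becomes 02, -64 becomes 7f, and 64 needs two bytes, 80 01.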
2. Avro complex data types

array
An ordered collection of objects. All objects in a particular array must have the same schema.
Schema example:
{
  "type": "array",
  "items": "long"
}

map
An unordered collection of key-value pairs. Keys must be strings and values may be any type, although within a particular map all values must have the same schema.
Schema example:
{
  "type": "map",
  "values": "string"
}

record
A collection of named fields of any type.
Schema example:
{
  "type": "record",
  "name": "WeatherRecord",
  "doc": "A weather reading.",
  "fields": [
    {"name": "year", "type": "int"},
    {"name": "temperature", "type": "int"},
    {"name": "stationId", "type": "string"}
  ]
}

enum
A set of named values.
Schema example:
{
  "type": "enum",
  "name": "Cutlery",
  "doc": "An eating utensil.",
  "symbols": ["KNIFE", "FORK", "SPOON"]
}

fixed
A fixed number of 8-bit unsigned bytes.
Schema example:
{
  "type": "fixed",
  "name": "Md5Hash",
  "size": 16
}

union
A union of schemas. A union is represented by a JSON array, where each element in the array is a schema. Data represented by a union must match one of the schemas in the union.
Schema example:
[
  "null",
  "string",
  {"type": "map", "values": "string"}
]
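To make two of these types more concrete, here is a hedged, hand-written Java sketch of how the Avro binary encoding lays out an array and a union. An array is written as blocks, each an item count followed by the items, and ends with a zero count; a union value is written as the zero-based index of the matching branch followed by the value in that branch's encoding. The class AvroComplexSketch is hypothetical and covers only the "null" and "string" branches of the union example above, not its map branch.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical encoder (not the Avro library API) sketching the binary
// layout of an Avro array and an Avro union.
class AvroComplexSketch {
    // Zig-zag varint used for int/long, and for counts, lengths and union indexes.
    static void writeLong(ByteArrayOutputStream out, long value) {
        long n = (value << 1) ^ (value >> 63);
        while ((n & ~0x7FL) != 0) {
            out.write((int) ((n & 0x7F) | 0x80));
            n >>>= 7;
        }
        out.write((int) n);
    }

    // string: byte length as a long, then the UTF-8 bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // {"type": "array", "items": "long"}: a single block (count, then items),
    // followed by a zero count that terminates the array.
    static byte[] encodeLongArray(List<Long> items) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!items.isEmpty()) {
            writeLong(out, items.size());
            for (long item : items) writeLong(out, item);
        }
        writeLong(out, 0);
        return out.toByteArray();
    }

    // Union ["null", "string", {"type": "map", "values": "string"}]:
    // branch index first; only index 0 (null) and 1 (string) are handled here.
    static byte[] encodeNullOrString(String value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (value == null) {
            writeLong(out, 0); // the null value itself occupies zero bytes
        } else {
            writeLong(out, 1);
            writeString(out, value);
        }
        return out.toByteArray();
    }

    // Format bytes as space-separated lowercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(encodeLongArray(Arrays.asList(3L, 27L))));
        System.out.println(hex(encodeNullOrString(null)));
        System.out.println(hex(encodeNullOrString("a")));
    }
}
```

Running main prints 04 06 36 00, then 00, then 02 02 61, which match the worked examples in the Avro specification.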
[img]http://dl2.iteye.com/upload/attachment/0112/2122/4515e63a-8306-3af6-8546-ee0a80ff062d.png[/img]
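As the figure above shows, a program can pack many small local files into one large file stored in HDFS, with each small file becoming one Avro record. The program is shown in the code below: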
// Writing an Avro data file
import org.apache.avro.Schema;

public class AVRO_WRITE {
    // Field names of the SmallFilesTest record
    public static final String FIELD_CONTENTS = "contents";
    public static final String FIELD_FILENAME = "filename";

    // Record schema: a string field for the file name and a bytes field for the file contents
    public static final String SCHEMA_JSON = "{\"type\": \"record\",\"name\": \"SmallFilesTest\", "
            + "\"fields\": ["
            + "{\"name\":\""
            + FIELD_FILENAME
            + "\",\"type\":\"string\"},"
            + "{\"name\":\""
            + FIELD_CONTENTS
            + "\", \"type\":\"bytes\"}]}";

    public static final Schema SCHEMA = new Schema.Parser().parse(SCHEMA_JSON);

    // ...
}
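The listing stops once the schema has been parsed. As a hedged illustration of what each packed small file turns into, the sketch below hand-encodes the body of one SmallFilesTest record according to the Avro binary encoding: a record is simply its fields in schema order with no names or separators, and both string and bytes are written as a zig-zag varint length followed by the raw bytes. The class SmallFileRecordSketch and the sample file name are hypothetical. The sketch does not produce an Avro container file (no header, embedded schema, or sync markers) and does not write to HDFS; with the Avro Java library those parts are normally handled by DataFileWriter together with GenericDatumWriter.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch (not the author's program and not the Avro API):
// the binary body of one record with schema
// {"filename": string, "contents": bytes}.
class SmallFileRecordSketch {
    // Zig-zag varint used by Avro for long values such as lengths.
    static void writeLong(ByteArrayOutputStream out, long value) {
        long n = (value << 1) ^ (value >> 63);
        while ((n & ~0x7FL) != 0) {
            out.write((int) ((n & 0x7F) | 0x80));
            n >>>= 7;
        }
        out.write((int) n);
    }

    // bytes (and string, once converted to UTF-8): length, then the raw bytes.
    static void writeBytes(ByteArrayOutputStream out, byte[] data) {
        writeLong(out, data.length);
        out.write(data, 0, data.length);
    }

    // Fields in schema order: "filename" (string) first, then "contents" (bytes).
    static byte[] encodeRecord(String filename, byte[] contents) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeBytes(out, filename.getBytes(StandardCharsets.UTF_8));
        writeBytes(out, contents);
        return out.toByteArray();
    }

    // Format bytes as space-separated lowercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] record = encodeRecord("a.txt", "hi".getBytes(StandardCharsets.UTF_8));
        System.out.println(hex(record));
    }
}
```

For a file named a.txt containing "hi", main prints 0a 61 2e 74 78 74 04 68 69: the length 5 (0a) and the name bytes, then the length 2 (04) and the contents.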