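Class not found after adding fastjson
During development, "class not found" or "method not found" errors are usually caused by JAR conflicts.
Try the following:
Rebuild the project.
Clean the project.
Re-declare the dependency. Put the newly added dependency last so that it is not overridden.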
<dependencies>
    <dependency>
        <groupId>cn.itcast</groupId>
        <artifactId>itcast_shop_common</artifactId>
        <version>1.0-SNAPSHOT</version>
    </dependency>
    <dependency>
        <groupId>com.alibaba</groupId>
        <artifactId>fastjson</artifactId>
        <version>1.2.58</version>
    </dependency>
</dependencies>
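The local repository already contains a dependency (log4j, for example), but IDEA ignores it and insists on downloading it again.
The Aliyun mirror is essentially one big private repository (Nexus).
Aliyun -> c3p0, repository ID: ali-repo
Private repository -> c3p0, repository ID: my-repo
Suppose the local copy was downloaded from ali-repo. The _remote.repositories file stored next to the artifact then records that the dependency came from ali-repo.
If we later change the repository ID to hello, Maven downloads the artifact again even though the URL has not changed, because Maven wants to be sure the JAR is the one we asked for and not an earlier copy.
If you use my dependencies and they keep failing to download, try changing the mirror in Maven's settings.xml to the following: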
<mirror>
    <id>central</id>
    <name>aliyun maven</name>
    <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
    <mirrorOf>central</mirrorOf>
</mirror>
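Canal high availability:
Set up the real-time project.
Develop the Canal client program: fetch data from the Canal server and send it to Kafka.
Develop the Flink real-time ETL program.
BaseETL: defines the workflow for the later business logic
Sync dimension data to Redis
Clickstream log development.
Log parsing tool: LogParser.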
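The later business logic widens the order detail records. The widened records need dimension data such as shop and goods category, which the order details themselves do not contain.
Suppose a new order arrives. As soon as we receive it, we need its dimension data. Querying MySQL for every order is too slow and puts heavy load on MySQL, so we keep the dimension data in Redis to speed up lookups. As a consequence, the data in Redis has to be kept consistent with the data in MySQL.
Sync strategy for the data in Redis:
Dimension tables to sync: itcast_goods, itcast_goods_cats, itcast_shops, itcast_org, itcast_shop_cats.
Requirement: write a program that syncs the dimension data in MySQL to Redis.
To make the data easier to work with, define entity classes for the dimensions, for example a goods dimension class, a shop class, and so on.
Get the connections first: MySQL connection and Redis connection.
Write the SQL query that reads the dimension data from MySQL (all rows).
Wrap each row in an entity class for later processing.
Convert the entity object to JSON so it can be stored in Redis.
Store the data in Redis:
Redis is a key-value database, e.g. hello -> world.
For example: key: itcast_goods, value (a hash): (goodsId, json)
Close the connections.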
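The storage layout described in the steps above (one Redis hash per dimension table, one field per row ID, JSON as the value) can be sketched without a Redis server. This is only an illustration under stated assumptions: `DimHashSketch`, its in-memory map, and the hand-rolled `goodsJson` helper are hypothetical stand-ins, while the real program uses Jedis and FastJson.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// In-memory stand-in for Redis hashes: key -> (field -> value).
// Hypothetical helper for illustration only; the real program talks to Redis through Jedis.
final class DimHashSketch {
    private final Map<String, Map<String, String>> store = new LinkedHashMap<>();

    // Same shape as Redis "HSET key field value".
    void hset(String key, String field, String value) {
        store.computeIfAbsent(key, k -> new LinkedHashMap<>()).put(field, value);
    }

    // Same shape as Redis "HGET key field"; returns null when the field is absent.
    String hget(String key, String field) {
        Map<String, String> hash = store.get(key);
        return hash == null ? null : hash.get(field);
    }

    // Hand-rolled JSON for one goods row (assumption: the real code uses FastJson instead).
    static String goodsJson(long goodsId, String goodsName, long shopId) {
        return "{\"goodsId\":" + goodsId
                + ",\"goodsName\":\"" + goodsName + "\""
                + ",\"shopId\":" + shopId + "}";
    }

    public static void main(String[] args) {
        DimHashSketch redis = new DimHashSketch();
        // key = dimension table, field = row ID, value = row as JSON
        redis.hset("itcast_goods", "3", goodsJson(3L, "Mate30", 123L));
        System.out.println(redis.hget("itcast_goods", "3"));
    }
}
```

Keeping one hash per table means a later lookup needs only the table's key and the row ID, which is what the order-widening step has at hand.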
Make sure Redis is installed. If it is not, see 实时数仓Day03\资料\2.安装redis
Install Redis on Linux first.
Install the Redis GUI tool:
Tool path: 实时数仓Day03\资料\2.安装redis\redisdesktopmanager_0.9.99.rar
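Import the Redis connection utility into the project.
Path: 实时数仓Day03\资料\初始化实时ETL配置文件\1.redis连接池工具类\RedisUtil.scala
Put this class in the util package of the etl module.
Then set the Redis address in the application.conf configuration file: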
# Redis configuration
redis.server.ip="node2"
redis.server.port=6379
package cn.itcast.shop.realtime.etl.bean

import com.alibaba.fastjson.JSON
import com.alibaba.fastjson.serializer.SerializerFeature

import scala.beans.BeanProperty

/**
 * Dimension classes: one case class per dimension (goods, shop, ...).
 */
// Goods dimension
case class DimGoodsDBEntity(
  @BeanProperty goodsId: Long,     // goods ID
  @BeanProperty goodsName: String, // goods name
  @BeanProperty goodsCatId: Int,   // goods category ID
  @BeanProperty shopId: Long,      // shop ID
  @BeanProperty shopPrice: Double  // goods price
)

// Goods category dimension
case class DimGoodsCatDBEntity(
  @BeanProperty catId: String = "",     // category ID
  @BeanProperty parentId: String = "",  // parent category ID
  @BeanProperty catName: String = "",   // category name
  @BeanProperty cat_level: String = ""  // category level
)

// Shop dimension
case class DimShopsDBEntity(
  @BeanProperty shopId: Int = 0,          // shop ID
  @BeanProperty areaId: Int = 0,          // ID of the area the shop belongs to
  @BeanProperty shopName: String = "",    // shop name
  @BeanProperty shopCompany: String = ""  // company name
)

// Organization dimension
case class DimOrgDBEntity(
  @BeanProperty orgId: Int = 0,        // organization ID
  @BeanProperty parentId: Int = 0,     // parent organization ID
  @BeanProperty orgName: String = "",  // organization name
  @BeanProperty orgLevel: Int = 0      // organization level
)

// Shop goods category dimension
case class DimShopCatDBEntity(
  @BeanProperty catId: String = "",     // category ID
  @BeanProperty parentId: String = "",  // parent category ID
  @BeanProperty catName: String = "",   // category name
  @BeanProperty catSort: String = ""    // category sort order
)

object DimEntity {
  def main(args: Array[String]): Unit = {
    val goodsDBEntity: DimGoodsDBEntity = DimGoodsDBEntity(
      3L,
      "华为手机Mate30",
      1,
      123L,
      3000.00
    )
    // Convert the goods object to a JSON string with FastJson.
    val json: String = JSON.toJSONString(goodsDBEntity, SerializerFeature.DisableCircularReferenceDetect)
    println(json)
  }
}
JSON.toJSONString(goodsDBEntity, SerializerFeature.DisableCircularReferenceDetect)
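When FastJson converts a Scala object to a JSON string, one extra option is needed: SerializerFeature.DisableCircularReferenceDetect, which turns off circular reference detection.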
# Redis key names
redis.key.goods="itcast_shop:dim_goods"
redis.key.goods_cats="itcast_shop:goods_cats"
redis.key.shops="itcast_shop:shops"
redis.key.org="itcast_shop:org"
redis.key.shop_cats="itcast_shop:shop_cats"
val `redis.key.goods`: String = config.getString("redis.key.goods")
val `redis.key.goods_cats`: String = config.getString("redis.key.goods_cats")
val `redis.key.shops`: String = config.getString("redis.key.shops")
val `redis.key.org`: String = config.getString("redis.key.org")
val `redis.key.shop_cats`: String = config.getString("redis.key.shop_cats")
package cn.itcast.shop.realtime.etl.dataloader

import java.sql.{Connection, DriverManager, ResultSet, Statement}

import cn.itcast.shop.realtime.etl.bean.{DimGoodsCatDBEntity, DimGoodsDBEntity, DimOrgDBEntity, DimShopCatDBEntity, DimShopsDBEntity}
import cn.itcast.shop.realtime.etl.util.{GlobalConfigUtil, RedisUtil}
import com.alibaba.fastjson.JSON
import com.alibaba.fastjson.serializer.SerializerFeature
import redis.clients.jedis.Jedis

/**
 * Offline (batch) sync program for dimension data
 */
object DimensionDataLoader {
  def main(args: Array[String]): Unit = {
    //1. Get the connections: MySQL and Redis
    val jedis: Jedis = RedisUtil.getJedis()
    // Register the MySQL JDBC driver
    Class.forName("com.mysql.jdbc.Driver")
    // Open the MySQL connection
    val connection: Connection = DriverManager.getConnection(
      s"jdbc:mysql://${GlobalConfigUtil.`mysql.server.ip`}:${GlobalConfigUtil.`mysql.server.port`}/${GlobalConfigUtil.`mysql.server.database`}",
      GlobalConfigUtil.`mysql.server.username`,
      GlobalConfigUtil.`mysql.server.password`
    )
    // Create the statement
    val statement: Statement = connection.createStatement()

    // Load the goods dimension
    loadDimGoodsData(jedis, statement)
    // Load the goods category dimension
    loadDimGoodsCatsData(jedis, statement)
    // Load the shop dimension
    loadDimShopsData(jedis, statement)
    // Load the organization dimension
    loadDimOrgData(jedis, statement)
    // Load the shop goods category dimension
    loadDimShopCatsData(jedis, statement)

    //6. Close the connections
    statement.close()
    connection.close()
    jedis.close()
    // Exit with a status code; only 0 means the program finished normally
    System.exit(0)
  }
  /**
   * Load the goods dimension data.
   *
   * @param jedis     Redis connection
   * @param statement MySQL statement
   */
  private def loadDimGoodsData(jedis: Jedis, statement: Statement): Unit = {
    //2. Query all goods dimension rows from MySQL
    val sql =
      """
        |select
        |goodsId,
        |goodsName,
        |goodsCatId,
        |shopId,
        |shopPrice
        |from
        |itcast_goods
        |""".stripMargin
    //3. Wrap each row in an entity class for later processing.
    val resultSet: ResultSet = statement.executeQuery(sql)
    while (resultSet.next()) {
      // Read the columns of the current row
      val goodsId: String = resultSet.getString("goodsId")
      val goodsName: String = resultSet.getString("goodsName")
      val goodsCatId: String = resultSet.getString("goodsCatId")
      val shopId: String = resultSet.getString("shopId")
      val shopPrice: String = resultSet.getString("shopPrice")
      // Build the entity object
      val goodsDBEntity: DimGoodsDBEntity = DimGoodsDBEntity(
        goodsId.toLong,
        goodsName,
        goodsCatId.toInt,
        shopId.toLong,
        shopPrice.toDouble
      )
      //4. Convert the entity to JSON so it can be stored in Redis.
      val json: String = JSON.toJSONString(goodsDBEntity, SerializerFeature.DisableCircularReferenceDetect)
      println(json)
      //5. Store the data in Redis as a hash:
      //   key: the goods key, field: goodsId, value: json
      jedis.hset(GlobalConfigUtil.`redis.key.goods`, goodsId, json)
    }
  }
  /**
   * Load the goods category dimension data.
   *
   * @param jedis     Redis connection
   * @param statement MySQL statement
   */
  def loadDimGoodsCatsData(jedis: Jedis, statement: Statement): Unit = {
    //2. Query all goods category rows from MySQL
    val sql =
      """
        |select
        |catId,
        |parentId,
        |catName,
        |cat_level
        |from
        |itcast_goods_cats
        |""".stripMargin
    //3. Wrap each row in an entity class for later processing.
    val resultSet: ResultSet = statement.executeQuery(sql)
    while (resultSet.next()) {
      // Read the columns of the current row
      val catId: String = resultSet.getString("catId")
      val parentId: String = resultSet.getString("parentId")
      val catName: String = resultSet.getString("catName")
      val cat_level: String = resultSet.getString("cat_level")
      // Build the entity object
      val entity: DimGoodsCatDBEntity = DimGoodsCatDBEntity(
        catId,
        parentId,
        catName,
        cat_level
      )
      //4. Convert the entity to JSON so it can be stored in Redis.
      val json: String = JSON.toJSONString(entity, SerializerFeature.DisableCircularReferenceDetect)
      println(json)
      //5. Store the data in Redis as a hash:
      //   key: the goods category key, field: catId, value: json
      jedis.hset(GlobalConfigUtil.`redis.key.goods_cats`, catId, json)
    }
  }
  /**
   * Load the shop dimension data.
   *
   * @param jedis     Redis connection
   * @param statement MySQL statement
   */
  def loadDimShopsData(jedis: Jedis, statement: Statement): Unit = {
    //2. Query all shop rows from MySQL
    val sql =
      """
        |select
        |shopId,
        |areaId,
        |shopName,
        |shopCompany
        |from
        |itcast_shops
        |""".stripMargin
    //3. Wrap each row in an entity class for later processing.
    val resultSet: ResultSet = statement.executeQuery(sql)
    while (resultSet.next()) {
      // Read the columns of the current row
      val shopId: String = resultSet.getString("shopId")
      val areaId: String = resultSet.getString("areaId")
      val shopName: String = resultSet.getString("shopName")
      val shopCompany: String = resultSet.getString("shopCompany")
      // Build the entity object
      val entity: DimShopsDBEntity = DimShopsDBEntity(
        shopId.toInt,
        areaId.toInt,
        shopName,
        shopCompany
      )
      //4. Convert the entity to JSON so it can be stored in Redis.
      val json: String = JSON.toJSONString(entity, SerializerFeature.DisableCircularReferenceDetect)
      println(json)
      //5. Store the data in Redis as a hash:
      //   key: the shop key, field: shopId, value: json
      jedis.hset(GlobalConfigUtil.`redis.key.shops`, shopId, json)
    }
  }
  /**
   * Load the organization dimension data.
   *
   * @param jedis     Redis connection
   * @param statement MySQL statement
   */
  def loadDimOrgData(jedis: Jedis, statement: Statement): Unit = {
    //2. Query all organization rows from MySQL
    val sql =
      """
        |select
        |orgId,
        |parentId,
        |orgName,
        |orgLevel
        |from
        |itcast_org
        |""".stripMargin
    //3. Wrap each row in an entity class for later processing.
    val resultSet: ResultSet = statement.executeQuery(sql)
    while (resultSet.next()) {
      // Read the columns of the current row
      val orgId: String = resultSet.getString("orgId")
      val parentId: String = resultSet.getString("parentId")
      val orgName: String = resultSet.getString("orgName")
      val orgLevel: String = resultSet.getString("orgLevel")
      // Build the entity object
      val entity: DimOrgDBEntity = DimOrgDBEntity(
        orgId.toInt,
        parentId.toInt,
        orgName,
        orgLevel.toInt
      )
      //4. Convert the entity to JSON so it can be stored in Redis.
      val json: String = JSON.toJSONString(entity, SerializerFeature.DisableCircularReferenceDetect)
      println(json)
      //5. Store the data in Redis as a hash:
      //   key: the organization key, field: orgId, value: json
      jedis.hset(GlobalConfigUtil.`redis.key.org`, orgId, json)
    }
  }
  /**
   * Load the shop goods category dimension data.
   *
   * @param jedis     Redis connection
   * @param statement MySQL statement
   */
  def loadDimShopCatsData(jedis: Jedis, statement: Statement): Unit = {
    //2. Query all shop goods category rows from MySQL
    val sql =
      """
        |select
        |catId,
        |parentId,
        |catName,
        |catSort
        |from
        |itcast_shop_cats
        |""".stripMargin
    //3. Wrap each row in an entity class for later processing.
    val resultSet: ResultSet = statement.executeQuery(sql)
    while (resultSet.next()) {
      // Read the columns of the current row
      val catId: String = resultSet.getString("catId")
      val parentId: String = resultSet.getString("parentId")
      val catName: String = resultSet.getString("catName")
      val catSort: String = resultSet.getString("catSort")
      // Build the entity object
      val entity: DimShopCatDBEntity = DimShopCatDBEntity(
        catId,
        parentId,
        catName,
        catSort
      )
      //4. Convert the entity to JSON so it can be stored in Redis.
      val json: String = JSON.toJSONString(entity, SerializerFeature.DisableCircularReferenceDetect)
      println(json)
      //5. Store the data in Redis as a hash:
      //   key: the shop goods category key, field: catId, value: json
      jedis.hset(GlobalConfigUtil.`redis.key.shop_cats`, catId, json)
    }
  }
}
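So far we have synced the dimension data to Redis in one batch. If that data changes afterwards, for example a goods row is renamed, Redis has to stay consistent with MySQL, so we also need real-time synchronization.
The sync strategy:
Insert in MySQL: add the entry in Redis.
Update in MySQL: write the entry to Redis again (HSET on the same field overwrites the old value).
Delete in MySQL: delete the entry in Redis.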
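The three rules above reduce to two operations on a Redis hash: insert and update both write the field, delete removes it. The sketch below models one hash as a plain `HashMap` to show that mapping. `DimSyncSketch` and its `apply` method are hypothetical names for illustration, and the lowercase event type strings are an assumption about what the change events carry.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: one Redis hash modeled as field -> JSON.
final class DimSyncSketch {
    final Map<String, String> hash = new HashMap<>();

    // Apply one change event to the hash.
    void apply(String eventType, String id, String json) {
        switch (eventType.toLowerCase(Locale.ROOT)) {
            case "insert":
            case "update":
                // Equivalent of HSET: writing an existing field overwrites it.
                hash.put(id, json);
                break;
            case "delete":
                // Equivalent of HDEL.
                hash.remove(id);
                break;
            default:
                // Other event types are ignored.
                break;
        }
    }

    public static void main(String[] args) {
        DimSyncSketch goods = new DimSyncSketch();
        goods.apply("INSERT", "3", "{\"goodsName\":\"old\"}");
        goods.apply("UPDATE", "3", "{\"goodsName\":\"new\"}");
        System.out.println(goods.hash.get("3"));
        goods.apply("DELETE", "3", null);
        System.out.println(goods.hash.containsKey("3"));
    }
}
```

Because an update is handled as a plain overwrite, the sink does not need to read the old value first.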
package cn.itcast.shop.realtime.etl.process

import cn.itcast.shop.bean.RowData
import cn.itcast.shop.realtime.etl.bean.{DimGoodsCatDBEntity, DimGoodsDBEntity, DimOrgDBEntity, DimShopCatDBEntity, DimShopsDBEntity}
import cn.itcast.shop.realtime.etl.process.base.MySQLBaseETL
import cn.itcast.shop.realtime.etl.util.{GlobalConfigUtil, RedisUtil}
import com.alibaba.fastjson.JSON
import com.alibaba.fastjson.serializer.SerializerFeature
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction
import org.apache.flink.streaming.api.scala.{DataStream, StreamExecutionEnvironment}
import redis.clients.jedis.Jedis

/**
 * Real-time ETL program that syncs dimension data to Redis
 */
class SyncDimDataETL(env: StreamExecutionEnvironment) extends MySQLBaseETL(env) {
  /**
   * All later ETL jobs put their logic in the process method.
   */
  override def process(): Unit = {
    // Get the data source
    val sourceStream: DataStream[RowData] = getDataSource()

    //1. Filter the stream: keep only rows from dimension tables.
    val filterStream: DataStream[RowData] = sourceStream.filter(rowData => {
      // Check whether the current table is one of the dimension tables we need.
      // There are several tables, so pattern matching keeps this readable.
      rowData.getTableName match {
        case "itcast_goods" => true
        case "itcast_goods_cats" => true
        case "itcast_shops" => true
        case "itcast_org" => true
        case "itcast_shop_cats" => true
        // No match
        case _ => false
      }
    })
    filterStream.print()

    //2. Write the data to Redis:
    filterStream.addSink(new RichSinkFunction[RowData] {
      var jedis: Jedis = _

      // open is called once per parallel sink instance when it starts.
      // It is the usual place to create connections and do other initialization.
      // For example, with a parallelism of 8, open is called 8 times.
      override def open(parameters: Configuration): Unit = {
        // Get the Redis connection in open
        jedis = RedisUtil.getJedis()
        println("Redis connection initialized")
      }
      override def invoke(rowData: RowData): Unit = {
        // 1. Determine the operation type.
        // Lowercase the event type and pattern match on it.
        rowData.getEventType.toLowerCase match {
          case "delete" => {
            println("Delete: " + rowData)
            deleteData(rowData)
          }
          case "insert" => {
            println("Insert: " + rowData)
            insertData(rowData)
          }
          case "update" => {
            // An update is written the same way as an insert: HSET overwrites the field.
            println("Update: " + rowData)
            insertData(rowData)
          }
          case _ => // Nothing matched, so invoke does nothing.
        }
      }
      /**
       * Delete data from Redis.
       * @param rowData the changed row
       */
      def deleteData(rowData: RowData): Unit = {
        // The data is stored as hashes, so deletion uses jedis.hdel.
        // If the change came from the goods table, delete the field from the goods key, and so on.
        rowData.getTableName match {
          case "itcast_goods" => jedis.hdel(GlobalConfigUtil.`redis.key.goods`, rowData.getColumns.get("goodsId"))
          case "itcast_goods_cats" => jedis.hdel(GlobalConfigUtil.`redis.key.goods_cats`, rowData.getColumns.get("catId"))
          case "itcast_shops" => jedis.hdel(GlobalConfigUtil.`redis.key.shops`, rowData.getColumns.get("shopId"))
          case "itcast_org" => jedis.hdel(GlobalConfigUtil.`redis.key.org`, rowData.getColumns.get("orgId"))
          case "itcast_shop_cats" => jedis.hdel(GlobalConfigUtil.`redis.key.shop_cats`, rowData.getColumns.get("catId"))
          // No match
          case _ =>
        }
      }
      /**
       * Insert (or overwrite) data in Redis.
       * @param rowData the changed row
       */
      def insertData(rowData: RowData): Unit = {
        rowData.getTableName match {
          case "itcast_goods" => {
            // Read the columns
            val goodsId: String = rowData.getColumns.get("goodsId")
            val goodsName: String = rowData.getColumns.get("goodsName")
            val goodsCatId: String = rowData.getColumns.get("goodsCatId")
            val shopId: String = rowData.getColumns.get("shopId")
            val shopPrice: String = rowData.getColumns.get("shopPrice")
            // Build the entity object
            val entity: DimGoodsDBEntity = DimGoodsDBEntity(
              goodsId.toLong,
              goodsName,
              goodsCatId.toInt,
              shopId.toLong,
              shopPrice.toDouble
            )
            // Convert the object to JSON
            val json: String = JSON.toJSONString(entity, SerializerFeature.DisableCircularReferenceDetect)
            // Save the JSON to Redis.
            jedis.hset(GlobalConfigUtil.`redis.key.goods`, goodsId, json)
          }
          case "itcast_goods_cats" => {
            val catId: String = rowData.getColumns.get("catId")
            val parentId: String = rowData.getColumns.get("parentId")
            val catName: String = rowData.getColumns.get("catName")
            val cat_level: String = rowData.getColumns.get("cat_level")
            // Build the entity object
            val entity: DimGoodsCatDBEntity = DimGoodsCatDBEntity(
              catId,
              parentId,
              catName,
              cat_level
            )
            val json: String = JSON.toJSONString(entity, SerializerFeature.DisableCircularReferenceDetect)
            jedis.hset(GlobalConfigUtil.`redis.key.goods_cats`, catId, json)
          }
          case "itcast_shops" => {
            val shopId: String = rowData.getColumns.get("shopId")
            val areaId: String = rowData.getColumns.get("areaId")
            val shopName: String = rowData.getColumns.get("shopName")
            val shopCompany: String = rowData.getColumns.get("shopCompany")
            // Build the entity object
            val entity: DimShopsDBEntity = DimShopsDBEntity(
              shopId.toInt,
              areaId.toInt,
              shopName,
              shopCompany
            )
            // Convert the entity to JSON and store it in Redis.
            val json: String = JSON.toJSONString(entity, SerializerFeature.DisableCircularReferenceDetect)
            println(json)
            jedis.hset(GlobalConfigUtil.`redis.key.shops`, shopId, json)
          }
          case "itcast_org" => {
            val orgId: String = rowData.getColumns.get("orgId")
            val parentId: String = rowData.getColumns.get("parentId")
            val orgName: String = rowData.getColumns.get("orgName")
            val orgLevel: String = rowData.getColumns.get("orgLevel")
            // Build the entity object
            val entity: DimOrgDBEntity = DimOrgDBEntity(
              orgId.toInt,
              parentId.toInt,
              orgName,
              orgLevel.toInt
            )
            // Convert the entity to JSON and store it in Redis.
            val json: String = JSON.toJSONString(entity, SerializerFeature.DisableCircularReferenceDetect)
            println(json)
            jedis.hset(GlobalConfigUtil.`redis.key.org`, orgId, json)
          }
          case "itcast_shop_cats" => {
            val catId: String = rowData.getColumns.get("catId")
            val parentId: String = rowData.getColumns.get("parentId")
            val catName: String = rowData.getColumns.get("catName")
            val catSort: String = rowData.getColumns.get("catSort")
            // Build the entity object
            val entity: DimShopCatDBEntity = DimShopCatDBEntity(
              catId,
              parentId,
              catName,
              catSort
            )
            // Convert the entity to JSON and store it in Redis.
            val json: String = JSON.toJSONString(entity, SerializerFeature.DisableCircularReferenceDetect)
            println(json)
            jedis.hset(GlobalConfigUtil.`redis.key.shop_cats`, catId, json)
          }
          // No match
          case _ =>
        }
      }
      override def close(): Unit = {
        // Release the Redis connection obtained in open()
        if (jedis != null) jedis.close()
        super.close()
      }
    })
  }
}
To test, insert, update, and delete dimension rows in MySQL and check that the data in Redis changes accordingly.
Requirements:
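How the logs are processed:
LogParser is a log parsing tool that can parse Apache HTTPD and NGINX access logs. Given a log format, it converts each log line into an object. This project uses LogParser for log parsing.
GitHub: https://github.com/nielsbasjes/logparser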
2001-980:91c0:1:8d31:a232:25e5:85d 222.68.172.190 - [05/Sep/2010:11:27:50 +0200] "GET /images/my.jpg HTTP/1.1" 404 23617 "http://www.angularjs.cn/A00n" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_4; nl-nl) AppleWebKit/533.17.8 (KHTML, like Gecko) Version/5.0.1 Safari/533.17.8" "jquery-ui-theme=Eggplant; BuI=SomeThing; Apache=127.0.0.1.1351111543699529" "beijingshi"
Reference: https://httpd.apache.org/docs/current/mod/mod_log_config.html
%u %h %l %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" "%{Cookie}i" "%{Addr}i"
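To make the correspondence between the format string and the sample line concrete, here is a rough sketch that splits the sample with a plain regular expression, one capture group per directive in the same order. This is not how LogParser works internally and is not a replacement for it. The class name `AccessLogRegexSketch` and the field names in `NAMES` are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: regex-based split of one access log line.
final class AccessLogRegexSketch {
    // One capture group per directive, in the same order as the format string:
    // %u %h %l %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" "%{Cookie}i" "%{Addr}i"
    private static final Pattern LINE = Pattern.compile(
            "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+)"
            + " \"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\"$");

    // Illustrative field names, not LogParser's own field names.
    private static final String[] NAMES = {
            "user", "host", "logname", "time", "request", "status", "bytes",
            "referer", "userAgent", "cookie", "addr"};

    // The sample line from the text.
    static final String SAMPLE =
            "2001-980:91c0:1:8d31:a232:25e5:85d 222.68.172.190 - [05/Sep/2010:11:27:50 +0200] "
            + "\"GET /images/my.jpg HTTP/1.1\" 404 23617 \"http://www.angularjs.cn/A00n\" "
            + "\"Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_4; nl-nl) AppleWebKit/533.17.8 "
            + "(KHTML, like Gecko) Version/5.0.1 Safari/533.17.8\" "
            + "\"jquery-ui-theme=Eggplant; BuI=SomeThing; Apache=127.0.0.1.1351111543699529\" "
            + "\"beijingshi\"";

    // Returns the fields in format order, or an empty map if the line does not match.
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = LINE.matcher(line);
        if (m.matches()) {
            for (int i = 0; i < NAMES.length; i++) {
                fields.put(NAMES[i], m.group(i + 1));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        parse(SAMPLE).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

A hand-written regex like this breaks as soon as the format changes, which is one reason to let LogParser derive the parsing from the format string instead.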
Create an entity class to hold the parsed data. Note that every field to be populated needs a setter method.
/**
 * Click log wrapper object used for testing
 */
public class MyClickLog {
    // Client IP
    private String userClientIP;

    public void setUserClientIP(String clientIP) {
        this.userClientIP = clientIP;
    }

    @Override
    public String toString() {
        return "MyClickLog{" +
                "userClientIP='" + userClientIP + '\'' +
                '}';
    }
}
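Why the setter matters: the parse target is registered by setter name as a string, so the library presumably has to locate and call that method at runtime. The sketch below shows the general idea with plain Java reflection. It is an assumption about the mechanism and not LogParser's actual code; `SetterLookupSketch`, `Bean`, and `callSetter` are hypothetical names.

```java
import java.lang.reflect.Method;

// Hypothetical sketch: calling a setter that is known only by its name.
final class SetterLookupSketch {
    // A bean shaped like MyClickLog: one private field and a public setter.
    public static final class Bean {
        private String userClientIP;

        public void setUserClientIP(String ip) {
            this.userClientIP = ip;
        }

        public String getUserClientIP() {
            return userClientIP;
        }
    }

    // Look up a public method that takes one String argument and invoke it.
    static void callSetter(Object target, String setterName, String value) {
        try {
            Method setter = target.getClass().getMethod(setterName, String.class);
            setter.invoke(target, value);
        } catch (ReflectiveOperationException e) {
            // A missing or inaccessible setter ends up here.
            throw new IllegalStateException("No usable setter named " + setterName, e);
        }
    }

    public static void main(String[] args) {
        Bean bean = new Bean();
        callSetter(bean, "setUserClientIP", "222.68.172.190");
        System.out.println(bean.getUserClientIP());
    }
}
```

If the setter were missing, the lookup would fail, which matches the note above that the entity class must provide one.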
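Parsing data with LogParser:
Create the LogParser parser object (the entity class's Class object, the log format string).
Specify the parse targets:
Tell LogParser which field should receive which value.
Start parsing and get the result object.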
import nl.basjes.parse.httpdlog.HttpdLoglineParser;

// A hand-written log parsing test.
// The wrapper class name LogParserDemo is an assumption; the original notes show only the method body.
public class LogParserDemo {
    public static void main(String[] args) throws Exception {
        //1. The entity class (MyClickLog) is created first, see above.
        //2. Define the parsing rule (the log format)
        String myFormatStr = "%u %h %l %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{Cookie}i\" \"%{Addr}i\"";
        //3. Define the input data
        String logStr = "2001-980:91c0:1:8d31:a232:25e5:85d 222.68.172.190 - [05/Sep/2010:11:27:50 +0200] \"GET /images/my.jpg HTTP/1.1\" 404 23617 \"http://www.angularjs.cn/A00n\" \"Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_4; nl-nl) AppleWebKit/533.17.8 (KHTML, like Gecko) Version/5.0.1 Safari/533.17.8\" \"jquery-ui-theme=Eggplant; BuI=SomeThing; Apache=127.0.0.1.1351111543699529\" \"beijingshi\"";
        //4. Create the LogParser parser object
        HttpdLoglineParser<MyClickLog> loglineParser = new HttpdLoglineParser<>(MyClickLog.class, myFormatStr);
        // Before parsing, register the parse targets, i.e. tell LogParser
        // which parsed value goes into which field of the object.
        // This mapping has to be given by hand; without it LogParser cannot
        // match the parsed values to the fields.
        loglineParser.addParseTarget("setUserClientIP", "IP:connection.client.host");
        //5. Parse the line
        MyClickLog myClickLog = loglineParser.parse(logStr);
        // Test output
        System.out.println(myClickLog);
    }
}
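Fields whose names start with _ are usually member variables.
private[this]:
private: a private variable
[this]: the member can only be accessed from within the current instance (object-private), not from other instances of the same class.
Case classes: the constructor parameters of a case class are val by default and cannot be changed once set. Putting var in front makes a parameter mutable, and combining it with @BeanProperty also generates a setter method.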
package cn.itcast.shop.realtime.etl.bean

import com.alibaba.fastjson.JSON
import nl.basjes.parse.httpdlog.HttpdLoglineParser

import scala.beans.BeanProperty

class ClickLogBean {
  // User ID
  private[this] var _connectionClientUser: String = _
  def setConnectionClientUser(value: String): Unit = { _connectionClientUser = value }
  def getConnectionClientUser = { _connectionClientUser }

  // IP address
  private[this] var _ip: String = _
  def setIp(value: String): Unit = { _ip = value }
  def getIp = { _ip }

  // Request time
  private[this] var _requestTime: String = _
  def setRequestTime(value: String): Unit = { _requestTime = value }
  def getRequestTime = { _requestTime }

  // Request method
  private[this] var _method: String = _
  def setMethod(value: String) = { _method = value }
  def getMethod = { _method }

  // Requested resource
  private[this] var _resolution: String = _
  def setResolution(value: String) = { _resolution = value }
  def getResolution = { _resolution }

  // Request protocol
  private[this] var _requestProtocol: String = _
  def setRequestProtocol(value: String): Unit = { _requestProtocol = value }
  def getRequestProtocol = { _requestProtocol }

  // Response status code
  private[this] var _responseStatus: Int = _
  def setRequestStatus(value: Int): Unit = { _responseStatus = value }
  def getRequestStatus = { _responseStatus }

  // Response body size in bytes
  private[this] var _responseBodyBytes: String = _
  def setResponseBodyBytes(value: String): Unit = { _responseBodyBytes = value }
  def getResponseBodyBytes = { _responseBodyBytes }

  // Referer URL of the visitor
  private[this] var _referer: String = _
  def setReferer(value: String): Unit = { _referer = value }
  def getReferer = { _referer }

  // Client user agent
  private[this] var _userAgent: String = _
  def setUserAgent(value: String): Unit = { _userAgent = value }
  def getUserAgent = { _userAgent }

  // Domain of the referring page: HTTP.HOST:request.referer.host
  private[this] var _referDomain: String = _
  def setReferDomain(value: String): Unit = { _referDomain = value }
  def getReferDomain = { _referDomain }
}
object ClickLogBean {
  // Log format used to parse the clickstream log
  val getLogFormat: String = "%u %h %l %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\""

  // Parse a log line into an object
  def apply(parser: HttpdLoglineParser[ClickLogBean], clickLog: String): ClickLogBean = {
    val clickLogBean = new ClickLogBean
    parser.parse(clickLogBean, clickLog)
    clickLogBean
  }

  // Create the clickstream log parser
  def createClickLogParser() = {
    val parser = new HttpdLoglineParser[ClickLogBean](classOf[ClickLogBean], getLogFormat)
    parser.addTypeRemapping("request.firstline.uri.query.g", "HTTP.URI")
    parser.addTypeRemapping("request.firstline.uri.query.r", "HTTP.URI")
    parser.addParseTarget("setConnectionClientUser", "STRING:connection.client.user")
    parser.addParseTarget("setIp", "IP:connection.client.host")
    parser.addParseTarget("setRequestTime", "TIME.STAMP:request.receive.time")
    parser.addParseTarget("setMethod", "HTTP.METHOD:request.firstline.method")
    parser.addParseTarget("setResolution", "HTTP.URI:request.firstline.uri")
    parser.addParseTarget("setRequestProtocol", "HTTP.PROTOCOL_VERSION:request.firstline.protocol")
    parser.addParseTarget("setResponseBodyBytes", "BYTES:response.body.bytes")
    parser.addParseTarget("setReferer", "HTTP.URI:request.referer")
    parser.addParseTarget("setUserAgent", "HTTP.USERAGENT:request.user-agent")
    parser.addParseTarget("setReferDomain", "HTTP.HOST:request.referer.host")
    // Return the configured parser
    parser
  }

  def main(args: Array[String]): Unit = {
    val logline = "2001:980:91c0:1:8d31:a232:25e5:85d 222.68.172.190 - [05/Sep/2010:11:27:50 +0200] \"GET /images/my.jpg HTTP/1.1\" 404 23617 \"http://www.angularjs.cn/A00n\" \"Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_4; nl-nl) AppleWebKit/533.17.8 (KHTML, like Gecko) Version/5.0.1 Safari/533.17.8\""
    val record = new ClickLogBean()
    val parser = createClickLogParser()
    parser.parse(record, logline)
    println(record.getConnectionClientUser)
    println(record.getIp)
    println(record.getRequestTime)
    println(record.getMethod)
    println(record.getResolution)
    println(record.getRequestProtocol)
    println(record.getResponseBodyBytes)
    println(record.getReferer)
    println(record.getUserAgent)
    println(record.getReferDomain)
  }
}
case class ClickLogWideBean(
  @BeanProperty uid: String,               // user ID
  @BeanProperty ip: String,                // IP address
  @BeanProperty requestTime: String,       // request time
  @BeanProperty requestMethod: String,     // request method
  @BeanProperty requestUrl: String,        // request URL
  @BeanProperty requestProtocol: String,   // request protocol
  @BeanProperty responseStatus: Int,       // response status code
  @BeanProperty responseBodyBytes: String, // response body size in bytes
  @BeanProperty referrer: String,          // referer URL of the visitor
  @BeanProperty userAgent: String,         // client user agent
  @BeanProperty referDomain: String,       // domain of the referring page: HTTP.HOST:request.referer.host
  @BeanProperty var province: String,      // province the IP maps to
  @BeanProperty var city: String,          // city the IP maps to
  @BeanProperty var timestamp: Long        // timestamp
)

object ClickLogWideBean {
  def apply(clickLogBean: ClickLogBean): ClickLogWideBean = {
    val bean: ClickLogWideBean = ClickLogWideBean(
      clickLogBean.getConnectionClientUser,
      clickLogBean.getIp,
      clickLogBean.getRequestTime,
      //DateUtil.datetime2date(clickLogBean.getRequestTime),
      clickLogBean.getMethod,
      clickLogBean.getResolution,
      clickLogBean.getRequestProtocol,
      clickLogBean.getRequestStatus,
      clickLogBean.getResponseBodyBytes,
      clickLogBean.getReferer,
      clickLogBean.getUserAgent,
      clickLogBean.getReferDomain,
      // province, city and timestamp start empty and are filled in later
      "",
      "",
      0)
    bean
  }
}
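The timestamp field above starts at 0 and the request time is kept as a string. As an illustration of how the two could be connected later, the sketch below converts an Apache-style time string such as the one in the sample log line into epoch milliseconds with java.time. It assumes the request time arrives in the `dd/MMM/yyyy:HH:mm:ss Z` layout. `RequestTimeSketch` is a hypothetical helper and not the project's DateUtil.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical sketch: Apache access-log time string -> epoch milliseconds.
final class RequestTimeSketch {
    // Assumed layout, e.g. "05/Sep/2010:11:27:50 +0200".
    // Locale.ENGLISH is needed so that month names like "Sep" parse on any machine.
    private static final DateTimeFormatter APACHE_TIME =
            DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);

    static long toEpochMillis(String requestTime) {
        return OffsetDateTime.parse(requestTime, APACHE_TIME).toInstant().toEpochMilli();
    }

    public static void main(String[] args) {
        System.out.println(toEpochMillis("05/Sep/2010:11:27:50 +0200"));
    }
}
```

The offset in the string is honored, so the same instant written with a different UTC offset yields the same value.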