Technical approach:
How should we implement this?
In ZooKeeper's zoo.cfg, move the AdminServer off its default port (8080) to avoid a port conflict:
admin.serverPort=8001
Then restart the broker: bin/kafka-server-start.sh config/server.properties
bin/kafka-topics.sh --create --zookeeper 127.0.0.1:2181 --replication-factor 1 --partitions 1 --topic recommender
bin/kafka-console-producer.sh --broker-list 127.0.0.1:9092 --topic recommender
bin/kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9092 --topic recommender --from-beginning
spark-streaming-kafka-0-10
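The Kafka integration comes from the spark-streaming-kafka-0-10 module. A minimal sbt sketch — the version numbers below are assumptions; align them with your cluster's Spark and Scala versions:

```scala
// build.sbt — hypothetical versions, match your Spark/Scala installation
val sparkVersion = "2.4.8"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-streaming-kafka-0-10" % sparkVersion
)
```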
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

// Kafka connection parameters
val kafkaParam = Map(
  "bootstrap.servers" -> "SERVER_IP:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "recommender",
  "auto.offset.reset" -> "latest"
)

// Create a direct DStream from Kafka
val kafkaStream = KafkaUtils.createDirectStream[String, String](
  ssc,
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.Subscribe[String, String](Array(config("kafka.topic")), kafkaParam)
)
// Convert the raw data UID|MID|SCORE|TIMESTAMP into a rating stream
// e.g. 1|31|4.5|
val ratingStream = kafkaStream.map { msg =>
  val attr = msg.value().split("\\|")
  (attr(0).toInt, attr(1).toInt, attr(2).toDouble, attr(3).toInt)
}
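The split-and-convert step above can be sketched as a standalone function (a hypothetical helper, not part of the original job), which makes the expected UID|MID|SCORE|TIMESTAMP layout easy to test in isolation:

```scala
// Hypothetical standalone version of the parsing done inside kafkaStream.map.
// Note: storing the epoch-second timestamp as Int overflows after 2038.
object RatingParser {
  // "UID|MID|SCORE|TIMESTAMP" -> (uid, mid, score, timestamp)
  def parse(line: String): (Int, Int, Double, Int) = {
    val attr = line.split("\\|")
    (attr(0).toInt, attr(1).toInt, attr(2).toDouble, attr(3).toInt)
  }
}
```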
(1) Fix: edit the Kafka config file so that listeners binds the internal IP and advertised.listeners exposes the external IP.
(2) Restart; it works.
Producing a record in Kafka now yields the recommended movies in MongoDB.
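The listener fix amounts to something like the following in config/server.properties — the IPs are placeholders; advertised.listeners is the address the broker hands back to external clients:

```properties
# server.properties — placeholder IPs
listeners=PLAINTEXT://INTERNAL_IP:9092
advertised.listeners=PLAINTEXT://EXTERNAL_IP:9092
```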
After the user rates a movie on the frontend, the click event fires and the backend writes a tracking-log entry to a local file with log4j.
log4j.rootLogger=INFO, file, stdout
# write to stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %5p --- [%50t] %-80c(line:%5L) : %m%n
# write to file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.Append=true
log4j.appender.file.Threshold=INFO
log4j.appender.file.File=F:/demoparent/business/src/main/log/agent.txt
log4j.appender.file.MaxFileSize=1024KB
log4j.appender.file.MaxBackupIndex=1
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %5p --- [%50t] %-80c(line:%6L) : %m%n
// Tracking log
import org.apache.log4j.Logger;
// Key code
Logger log = Logger.getLogger(MovieController.class.getName());
log.info(MOVIE_RATING_PREFIX + ":" + uid + "|" + mid + "|" + score + "|" + System.currentTimeMillis() / 1000);
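The logged message concatenates to "PREFIX:uid|mid|score|timestamp", which is exactly the shape the streaming job later splits on "|". A sketch of that formatting — the prefix constant's value is an assumption, not taken from the original code:

```scala
// Hypothetical: build the tracking-log payload the same way the Java code does
object RatingLog {
  val MOVIE_RATING_PREFIX = "MOVIE_RATING_PREFIX" // assumed constant value
  def format(uid: Int, mid: Int, score: Double, epochSeconds: Long): String =
    MOVIE_RATING_PREFIX + ":" + uid + "|" + mid + "|" + score + "|" + epochSeconds
}
```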
# additionally ship the log to the server via syslog
log4j.appender.syslog=org.apache.log4j.net.SyslogAppender
log4j.appender.syslog.SyslogHost=SERVER_IP
log4j.appender.syslog.Threshold=INFO
log4j.appender.syslog.layout=org.apache.log4j.PatternLayout
log4j.appender.syslog.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %5p --- [%20t] %-130c:(line:%4L) : %m%n
# log-kafka.properties
agent.sources = exectail
agent.channels = memoryChannel
agent.sinks = kafkasink
agent.sources.exectail.type = exec
agent.sources.exectail.command = tail -f /project/logs/agent.log
agent.sources.exectail.interceptors = i1
agent.sources.exectail.interceptors.i1.type = regex_filter
agent.sources.exectail.interceptors.i1.regex = .+MOVIE_RATING_PREFIX.+
agent.sources.exectail.channels = memoryChannel
agent.sinks.kafkasink.type = org.apache.flume.sink.kafka.KafkaSink
agent.sinks.kafkasink.kafka.topic = log
agent.sinks.kafkasink.kafka.bootstrap.servers = SERVER_IP:9092
agent.sinks.kafkasink.kafka.producer.acks = 1
agent.sinks.kafkasink.kafka.flumeBatchSize = 20
agent.sinks.kafkasink.channel = memoryChannel
agent.channels.memoryChannel.type = memory
agent.channels.memoryChannel.capacity = 10000
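The regex_filter interceptor only forwards lines matching .+MOVIE_RATING_PREFIX.+, so ordinary log lines never reach Kafka. A quick way to sanity-check the pattern against a sample line (the sample lines below are made up):

```scala
// Check candidate log lines against the same pattern configured in
// agent.sources.exectail.interceptors.i1.regex
object RegexCheck {
  val pattern = ".+MOVIE_RATING_PREFIX.+"
  def matches(line: String): Boolean = line.matches(pattern)
}
```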
ratingStream.foreachRDD{
  rdds => rdds.foreach{
    case (uid, mid, score, timestamp) => {
      println("rating data coming! >>>>>>>>>>>>>>>>")
      println(uid + ", mid:" + mid)
      // 1. Get the user's K most recent ratings from Redis, as Array[(mid, score)]
      val userRecentlyRatings = getUserRecentlyRating( MAX_USER_RATINGS_NUM, uid, ConnHelper.jedis )
      println("user's most recent K ratings: " + userRecentlyRatings)
      // 2. From the similarity matrix, take the N movies most similar to the current movie as candidates, Array[mid]
      val candidateMovies = getTopSimMovies( MAX_SIM_MOVIES_NUM, mid, uid, simMovieMatrixBroadCast.value )
      println("N most similar movies: " + candidateMovies)
      // 3. For each candidate, compute a recommendation priority, giving the user's real-time recommendation list, Array[(mid, score)]
      val streamRecs = computeMovieScores( candidateMovies, userRecentlyRatings, simMovieMatrixBroadCast.value )
      println("real-time recommendations for this user: " + streamRecs)
      // 4. Save the recommendations to MongoDB
      saveDataToMongoDB( uid, streamRecs )
    }
  }
}
def computeMovieScores(candidateMovies: Array[Int],
                       userRecentlyRatings: Array[(Int, Double)],
                       simMovies: scala.collection.Map[Int, scala.collection.immutable.Map[Int, Double]]): Array[(Int, Double)] = {
  // ArrayBuffer holding the base score of every candidate movie
  val scores = scala.collection.mutable.ArrayBuffer[(Int, Double)]()
  // HashMaps counting the boost/penalty factors per candidate movie
  val increMap = scala.collection.mutable.HashMap[Int, Int]()
  val decreMap = scala.collection.mutable.HashMap[Int, Int]()

  for (candidateMovie <- candidateMovies; userRecentlyRating <- userRecentlyRatings) {
    // similarity between the candidate and a recently rated movie
    val simScore = getMoviesSimScore( candidateMovie, userRecentlyRating._1, simMovies )
    if (simScore > 0.7) {
      // base recommendation score of the candidate
      scores += ( (candidateMovie, simScore * userRecentlyRating._2) )
      if (userRecentlyRating._2 > 3) {
        increMap(candidateMovie) = increMap.getOrElse(candidateMovie, 0) + 1
      } else {
        decreMap(candidateMovie) = decreMap.getOrElse(candidateMovie, 0) + 1
      }
    }
  }

  // group by candidate mid and apply the scoring formula
  // (note: Scala mutable maps provide getOrElse, not Java's getOrDefault)
  scores.groupBy(_._1).map{
    // after groupBy the data looks like Map( mid -> ArrayBuffer[(mid, score)] )
    case (mid, scoreList) =>
      ( mid, scoreList.map(_._2).sum / scoreList.length + log(increMap.getOrElse(mid, 1)) - log(decreMap.getOrElse(mid, 1)) )
  }.toArray.sortWith(_._2 > _._2)
}
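The per-movie formula in the groupBy step is avg(sim × rating) + ln(increCount) − ln(decreCount), where the counts default to 1 (so ln(1) = 0 contributes nothing). A tiny worked sketch of just that formula — the numbers are made up:

```scala
import scala.math.log

object ScoreFormula {
  // base: the sim*rating products that survived the 0.7 similarity cutoff;
  // incre/decre: counts of high/low recent ratings (default 1 when absent)
  def recommendScore(base: Seq[Double], incre: Int, decre: Int): Double =
    base.sum / base.length + log(incre) - log(decre)
}
```

For example, two surviving products 4.0 and 3.0 with two high ratings and no low ones give 3.5 + ln(2).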
# bring up the containers
cd /docker
docker-compose up -d
docker-compose ps
# confirm MongoDB is listening
netstat -lanp | grep "27017"
# start Redis, ZooKeeper, Kafka, and the Flume agent
bin/redis-server etc/redis.conf
./zkServer.sh start
bin/kafka-server-start.sh config/server.properties
bin/flume-ng agent -c ./conf/ -f ./conf/log-kafka.properties -n agent
After a rating is submitted on the frontend it is written to the log file; Flume tails the log file without issue, Kafka receives from Flume without issue, and Spark Streaming processes each incoming record, computes the recommendations, and stores them in MongoDB.
This was written in a hurry; if you would like the frontend and backend code, leave a comment and I will clean it up and push it to GitHub.
The frontend design was not done in detail; I will polish the pages later. As an undergraduate I had integrated an admin system, but there was no time to redo it now. In short, I reproduced the old system in a bit over a week, as a review exercise.
During development I ran into many problems — library versions, the server's internal vs. external network, Docker containers, and the design of the collaborative filtering algorithm — but it was a good refresher on Vue and Spring Boot.