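As we all know, Apache Hadoop pulls in a large number of dependencies, and nobody wants to spend hours on project configuration. Ideally a project works out of the box, so that you can start coding an Apache Hadoop MapReduce job right away. Wouldn't it be convenient to have an archetype that generates an Apache Hadoop MapReduce project in seconds?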
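This archetype is intended for the following development environment:
- Java Development Kit 8; both OpenJDK and Oracle JDK are compatible.
- Apache Hadoop v2.7.1; compatibility with other versions has not been verified.
- Apache Maven.

To install the archetype:
- Download the *.pom and *.jar files.
- Place the *.jar and *.pom under $LOCAL_REPO/io/github/dragon1573/hadoop-quickstart-archetype/1.0-mapr271-jdk8/, where $LOCAL_REPO is the Apache Maven local repository, which defaults to ~/.m2/repository after installation.
- Edit $LOCAL_REPO/archetype-catalog.xml and add the downloaded Maven Archetype to the catalog, strictly following the format below.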
<archetype-catalog xsi:schemaLocation="http://maven.apache.org/plugins/maven-archetype-plugin/archetype-catalog/1.0.0 http://maven.apache.org/xsd/archetype-catalog-1.0.0.xsd" xmlns="http://maven.apache.org/plugins/maven-archetype-plugin/archetype-catalog/1.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <archetypes>
    <archetype>
      <groupId>io.github.dragon1573</groupId>
      <artifactId>hadoop-quickstart-archetype</artifactId>
      <version>1.0-mapr271-jdk8</version>
      <description>Immediately generate an Apache Hadoop MapReduce Job</description>
      <repository>https://maven.pkg.github.com/Dragon1573/Maven-Hadoop</repository>
    </archetype>
  </archetypes>
</archetype-catalog>
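
The target directory for the *.jar and *.pom follows Maven's standard local repository layout: the dots in the groupId become directory separators, followed by the artifactId and the version. The sketch below only illustrates that mapping; the class name LocalRepoLayout and the method artifactDir are hypothetical and are not part of Maven or of this archetype.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class LocalRepoLayout {

    /** Resolves the directory that holds an artifact inside a Maven local repository. */
    static Path artifactDir(Path localRepo, String groupId, String artifactId, String version) {
        Path dir = localRepo;
        // Each dot-separated segment of the groupId becomes one directory level.
        for (String segment : groupId.split("\\.")) {
            dir = dir.resolve(segment);
        }
        return dir.resolve(artifactId).resolve(version);
    }

    public static void main(String[] args) {
        // Assumption: the local repository is at its default location, ~/.m2/repository.
        Path localRepo = Paths.get(System.getProperty("user.home"), ".m2", "repository");
        Path dir = artifactDir(localRepo,
                "io.github.dragon1573", "hadoop-quickstart-archetype", "1.0-mapr271-jdk8");
        System.out.println(dir);
    }
}
```

If your local repository has been moved through settings.xml, pass that location instead of the default.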
$ mvn -DarchetypeCatalog=local archetype:generate
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------< org.apache.maven:standalone-pom >-------------------
[INFO] Building Maven Stub Project (No POM) 1
[INFO] --------------------------------[ pom ]---------------------------------
[INFO]
[INFO] >>> maven-archetype-plugin:3.1.2:generate (default-cli) > generate-sources @ standalone-pom >>>
[INFO]
[INFO] <<< maven-archetype-plugin:3.1.2:generate (default-cli) < generate-sources @ standalone-pom <<<
[INFO]
[INFO]
[INFO] --- maven-archetype-plugin:3.1.2:generate (default-cli) @ standalone-pom ---
[INFO] Generating project in Interactive mode
[INFO] No archetype defined. Using maven-archetype-quickstart (org.apache.maven.archetypes:maven-archetype-quickstart:1.0)
Choose archetype:
1: local -> io.github.dragon1573:hadoop-quickstart-archetype (Immediately generate an Apache Hadoop MapReduce Job)
Choose a number or apply filter (format: [groupId:]artifactId, case sensitive contains): : 1
Define value for property 'groupId': com.example
Define value for property 'artifactId': hadoop-quickstart
Define value for property 'version' 1.0-SNAPSHOT: : 1.0
Define value for property 'package' com.example: : main
Confirm properties configuration:
groupId: com.example
artifactId: hadoop-quickstart
version: 1.0
package: main
Y: : Y
[INFO] ----------------------------------------------------------------------------
[INFO] Using following parameters for creating project from Archetype: hadoop-quickstart-archetype:1.0-mapr271-jdk8
[INFO] ----------------------------------------------------------------------------
[INFO] Parameter: groupId, Value: com.example
[INFO] Parameter: artifactId, Value: hadoop-quickstart
[INFO] Parameter: version, Value: 1.0
[INFO] Parameter: package, Value: main
[INFO] Parameter: packageInPathFormat, Value: main
[INFO] Parameter: package, Value: main
[INFO] Parameter: groupId, Value: com.example
[INFO] Parameter: artifactId, Value: hadoop-quickstart
[INFO] Parameter: version, Value: 1.0
[WARNING] Don't override file D:\Program_Files_(x64)\Git\Repos\hadoop-quickstart\.idea\codeStyles\codeStyleConfig.xml
[WARNING] Don't override file D:\Program_Files_(x64)\Git\Repos\hadoop-quickstart\.idea\codeStyles\Project.xml
[WARNING] Don't override file D:\Program_Files_(x64)\Git\Repos\hadoop-quickstart\.idea\inspectionProfiles\Project_Default.xml
[WARNING] Don't override file D:\Program_Files_(x64)\Git\Repos\hadoop-quickstart\.idea\copyright\Apache_v2_0.xml
[INFO] Project created from Archetype in dir: D:\Program_Files_(x64)\Git\Repos\hadoop-quickstart
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 35.209 s
[INFO] Finished at: 2020-03-28T15:26:57+08:00
[INFO] ------------------------------------------------------------------------
$ cd hadoop-quickstart/
legen@Legend1949 MINGW64 /Repos/hadoop-quickstart
$ tree
.
├── LICENSE
├── README.md
├── pom.xml
└── src
    └── main
        ├── java
        │   ├── main
        │   │   └── DailyAccessCount.java
        │   └── mapreduce
        │       ├── MyMapper.java
        │       └── MyReducer.java
        └── resources
            └── user_login.txt
6 directories, 7 files
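The archetype ships with a simple Apache Hadoop MapReduce project that counts visits by date. The task is described as follows:
- The goal of this project is to count the total number of visits on each calendar day of 2016.
- The source file src/main/resources/user_login.txt provides user names and access dates.
- In essence, the task is to compute, for each calendar day, the accumulated number of visits across all users.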
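
To make the task concrete, here is a minimal plain-Java sketch of the counting logic with no Hadoop dependency. It is not the code generated by the archetype: the class DailyAccessCountSketch, its methods, and the sample records are made up for illustration, and it assumes each input line has the form name,date, which may differ from the actual layout of user_login.txt.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DailyAccessCountSketch {

    /** "Map" step: extract the date key from one record (assumed format: name,date). */
    static String mapToDate(String line) {
        String[] fields = line.split(",");
        return fields[1].trim();
    }

    /** "Reduce" step: add one visit per record to the running total of its date. */
    static Map<String, Integer> countByDay(List<String> lines) {
        // TreeMap keeps the dates sorted, similar to the sorted keys a reducer receives.
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            counts.merge(mapToDate(line), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample records, not taken from user_login.txt.
        List<String> sample = Arrays.asList(
                "alice,2016-01-01",
                "bob,2016-01-01",
                "alice,2016-01-02");
        countByDay(sample).forEach((day, total) -> System.out.println(day + "\t" + total));
    }
}
```

In a real MapReduce job the mapper would emit a (date, 1) pair for every record and the reducer would sum the values for each date; the sketch folds both phases into a single in-memory loop.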
- In the directory that contains pom.xml, run the following command to package the project into a *.jar (example).
$ mvn package
[INFO] Scanning for projects...
[INFO]
[INFO] -------------------< com.example:hadoop-quickstart >--------------------
[INFO] Building hadoop-quickstart 1.0
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ hadoop-quickstart ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 1 resource
[INFO]
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ hadoop-quickstart ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 3 source files to D:\Program_Files_(x64)\Git\Repos\hadoop-quickstart\target\classes
[INFO]
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ hadoop-quickstart ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory D:\Program_Files_(x64)\Git\Repos\hadoop-quickstart\src\test\resources
[INFO]
[INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ hadoop-quickstart ---
[INFO] No sources to compile
[INFO]
[INFO] --- maven-surefire-plugin:2.12.4:test (default-test) @ hadoop-quickstart ---
[INFO] No tests to run.
[INFO]
[INFO] --- maven-jar-plugin:2.4:jar (default-jar) @ hadoop-quickstart ---
[INFO] Building jar: D:\Program_Files_(x64)\Git\Repos\hadoop-quickstart\target\hadoop-quickstart-1.0.jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 6.744 s
[INFO] Finished at: 2020-03-28T15:44:42+08:00
[INFO] ------------------------------------------------------------------------
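- Upload src/main/resources/user_login.txt to the Apache Hadoop HDFS distributed file system; the directory containing the file is referred to as $INPUT_DIR.
- Run hadoop jar target/hadoop-quickstart-1.0.jar main.DailyAccessCount $INPUT_DIR $OUTPUT_DIR to submit the MapReduce job package to the Apache Hadoop cluster, where $OUTPUT_DIR is the HDFS directory that receives the results once the job finishes.
- Run hdfs dfs -cat "$OUTPUT_DIR/*" | head -n 15 to view the first 15 sorted results.

If you run into any problems while using this archetype, feel free to report them in the comments or at Issues · Dragon1573/Maven-Hadoop; I will do my best to fix them.
Thank you for downloading, installing, and using this archetype!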