Big Data Notes: Hadoop

1. Prerequisites

1.1 Linux essentials

  1. Characteristics of the Linux kernel; installing Linux in VMware
  2. CentOS: stable
  3. Commonly used directories: /bin, /usr, /etc
  4. Xshell: use Xshell to operate CentOS remotely
  5. Text editing on Linux: vi/vim

1.2 Common commands

  1. Help: man
  2. Directories: mkdir, rmdir, mv, ls, rm -rf, cd
  3. Files: touch/vi, cat, cp, rm, more, grep
  4. Search: which, whereis, find
  5. Time: date, date -s
  6. User and group management: useradd…, groupadd…
  7. Processes: ps -ef, kill -9 <pid>, pkill -p <pid> or pkill -f <process name>
  8. Network: netstat -aux
  9. Disk: df
  10. Compression and extraction: zip, unzip, tar
  • tar -zcvf: compress
  • tar -zxvf: extract
  11. Software: yum
    • yum list
    • yum install
    • yum remove
    • rpm -ivh, rpm -evh: good to know
  12. Upload and download (lrzsz): rz, sz
  13. Scheduled tasks: crontab -e
    • fields: minute, hour, day of month, month, day of week
    • crontab -l: list
    • crontab -r: remove

1.3 Shell scripts

  1. Variables:

    x=value   # assign (no spaces around =)
    $x        # reference

  2. Arithmetic:

    $[3+6]    # or the equivalent $((3+6))

  3. Conditionals:

    if [ condition ]; then
        ...
    fi

  4. Loops:

    for ((i=0; i<10; i++))   # or: for x in list
    do
        ...
    done

    while [ condition ]
    do
        ...
    done

  5. Functions:

    function fun(){ ...; }   # define
    fun                      # call

2. Hadoop setup on Windows

  1. Unzip the Hadoop archive

  2. Set HADOOP_HOME

  3. Add to PATH: the bin and sbin directories

  4. Test:

    hadoop version

3. Building a Hadoop cluster on Linux

Cluster members:

Host     HDFS                          YARN
master   namenode, secondarynamenode   resourcemanager
slave1   datanode                      nodemanager
slave2   datanode                      nodemanager

3.1 Install JDK 8 and Hadoop 3.2.1

  1. Upload the archives and unpack them (into /usr)

  2. Set environment variables (/etc/profile)

    export JAVA_HOME=/usr/jdk8
    export HADOOP_HOME=/usr/hadoop321
    export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

  3. Reload the profile

    . /etc/profile

  4. Test:

    hadoop version

3.2 HDFS configuration

  1. core-site.xml

    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>

  2. hdfs-site.xml

    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.http.address</name>
        <value>0.0.0.0:5700</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///root/hadoop/dfs/namenode</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///root/hadoop/dfs/datanode</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>

  3. Format the NameNode

    hdfs namenode -format

  4. start-dfs.sh, stop-dfs.sh

    # users that run the HDFS daemons
    HDFS_NAMENODE_USER=root
    HDFS_DATANODE_USER=root
    HDFS_SECONDARYNAMENODE_USER=root

  5. hadoop-env.sh

    export JAVA_HOME=/usr/jdk8

3.3 Cluster member configuration

  1. Map hostnames to IP addresses (/etc/hosts)

    192.168.85.129 master
    192.168.85.130 slave1
    192.168.85.131 slave2

  2. Configure the worker nodes (/usr/hadoop321/etc/hadoop/workers)

    slave1

    slave2

  3. Set the replication factor (number of data nodes) (hdfs-site.xml)

    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>

3.4 YARN configuration

  1. yarn-site.xml

    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master:8088</value>
    </property>
    <property>
        <name>yarn.application.classpath</name>
        <value>/usr/hadoop321/etc/hadoop:/usr/hadoop321/share/hadoop/common/lib/*:/usr/hadoop321/share/hadoop/common/*:/usr/hadoop321/share/hadoop/hdfs:/usr/hadoop321/share/hadoop/hdfs/lib/*:/usr/hadoop321/share/hadoop/hdfs/*:/usr/hadoop321/share/hadoop/mapreduce/lib/*:/usr/hadoop321/share/hadoop/mapreduce/*:/usr/hadoop321/share/hadoop/yarn:/usr/hadoop321/share/hadoop/yarn/lib/*:/usr/hadoop321/share/hadoop/yarn/*</value>
    </property>

  2. mapred-site.xml

    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

  3. start-yarn.sh, stop-yarn.sh

    YARN_RESOURCEMANAGER_USER=root
    YARN_NODEMANAGER_USER=root

3.5 CentOS cloning

  1. Change the hostname

    hostnamectl set-hostname <hostname>

  2. Delete the files under /tmp so the cloned DataNodes show up in the web UI (note: the firewall must be off)

    systemctl stop firewalld      # stop the running firewall
    systemctl disable firewalld   # do not start it at boot

3.6 Passwordless SSH from master to the slaves

  1. Generate a key pair in root's home directory:

    ssh-keygen

  2. Copy authorized_keys to the slaves

cat id_rsa.pub >> authorized_keys

Use scp to copy it into each slave's .ssh directory:

scp authorized_keys root@slave1:/root/.ssh

3.7 Start the Hadoop cluster

  1. Start everything on master

    start-all.sh

  2. Check the processes

    jps

  3. View the cluster nodes

    hdfs dfsadmin -report

4. MapReduce examples

4.1 Word count (getting started)

/**
 * English word count
 */
public class WordCounter {

    // split each line into words
    public static class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

        public static Text text = new Text();
        public static IntWritable intWritable = new IntWritable(1);

        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String v = value.toString();
            String[] words = v.split(" ");
            for (String word : words) {
                text.set(word);
                context.write(text, intWritable);
            }
        }
    }

    // sum the counts per word
    public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int count = 0;
            for (IntWritable value : values) {
                count += value.get();
            }
            context.write(key, new IntWritable(count));
        }
    }

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        try {
            // the job
            Job job = Job.getInstance(conf);
            job.setJobName("firstJob");
            job.setJarByClass(WordCounter.class);
            // mapper and reducer
            job.setMapperClass(MyMapper.class);
            job.setReducerClass(MyReducer.class);
            // output types
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            // input and output directories
            FileInputFormat.setInputPaths(job, "data6");
            FileOutputFormat.setOutputPath(job, new Path("dTemp"));
            // run and close
            job.waitForCompletion(true);
            job.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
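
One practical note when re-running this job: FileOutputFormat refuses to start if the output directory already exists. A minimal sketch of deleting it first inside the main method above, using the same FileSystem pattern that FriendClient in 7.4 uses:

    // Sketch: remove an existing output directory before submitting the job
    Path out = new Path("dTemp");
    FileSystem fs = out.getFileSystem(conf);
    if (fs.exists(out)) {
        fs.delete(out, true);   // recursive delete
    }
    FileOutputFormat.setOutputPath(job, out);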

4.2 Chinese word count with IK segmentation (ik)

/**
 * Chinese word count
 */
public class CNWordCounter {

    // segment each line into Chinese words with IK
    public static class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

        public static Text text = new Text();
        public static IntWritable intWritable = new IntWritable(1);

        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            byte[] bytes = value.toString().getBytes();
            ByteArrayInputStream bis = new ByteArrayInputStream(bytes);
            InputStreamReader isReader = new InputStreamReader(bis);
            IKSegmenter ikSegmenter = new IKSegmenter(isReader, true);

            Lexeme lexeme = null;
            while ((lexeme = ikSegmenter.next()) != null) {
                String word = lexeme.getLexemeText();
                text.set(word);
                context.write(text, intWritable);
            }
        }
    }

    // sum the counts per word
    public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        public static Text text = new Text();
        public static List<Record> list = new ArrayList<Record>();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int count = 0;
            for (IntWritable value : values) {
                count += value.get();
            }
//            context.write(key,new IntWritable(count));
            Record record = new Record(key.toString(), count);
            list.add(record);
        }

        // sort by count before writing the final output
        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            Collections.sort(list);
            Collections.reverse(list);
            for (Record record : list) {
                text.set(record.getWord());
                context.write(text, new IntWritable(record.getCount()));
            }
        }
    }

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        try {
            Job job = Job.getInstance(conf);
            job.setJobName("secondJob");
            job.setJarByClass(CNWordCounter.class);
            // mapper and reducer
            job.setMapperClass(MyMapper.class);
            job.setReducerClass(MyReducer.class);
            // output types
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            // input and output directories
            FileInputFormat.setInputPaths(job, "/test99/data2");
            FileOutputFormat.setOutputPath(job, new Path("/test99/out"));
            // run and close
            job.waitForCompletion(true);
            job.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
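
The Record class used by the reducer above (distinct from the join Record in 4.7) is not included in these notes. A minimal sketch that matches how it is called here — a (word, count) constructor, getWord()/getCount(), and natural ordering by count so Collections.sort plus Collections.reverse yield a descending frequency list. The names mirror the calls above; everything else is an assumption:

    // Sketch: word/count pair ordered by count (the original class is not in the notes)
    class Record implements Comparable<Record> {

        private String word;   // the segmented word
        private int count;     // its frequency

        public Record(String word, int count) {
            this.word = word;
            this.count = count;
        }

        public String getWord() { return word; }
        public int getCount() { return count; }

        @Override
        public int compareTo(Record o) {
            // ascending by count; the reducer reverses the list to get descending order
            return Integer.compare(this.count, o.count);
        }
    }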

4.3 Data cleaning (dedup, drop empty, drop invalid)

/**
 * Data cleaning: drop empty, duplicate, and invalid records
 */
public class DataClear {

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        try {
            Job job = Job.getInstance(conf);
            job.setJobName("clearJob");
            job.setJarByClass(DataClear.class);
            // mapper only, no reducer needed
            job.setMapperClass(RemoveReplyMapper.class);

            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(NullWritable.class);
            // input and output directories
            FileInputFormat.setInputPaths(job, "data4");
            FileOutputFormat.setOutputPath(job, new Path("out"));
            job.waitForCompletion(true);
            job.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

/**
 * Mapper that drops records containing empty fields
 */
class RemoveNullMapper extends Mapper<LongWritable, Text, Text, NullWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String v = value.toString();
        boolean flag = isEmpty(v);
        // write the record only if it is not empty
        if (!flag) {
            context.write(value, NullWritable.get());
        }
    }

    // a record counts as empty if any of its fields is empty
    private boolean isEmpty(String v) {
        String[] split = v.split("  ");
        for (String field : split) {
            if (field == null || field.equals("  ") || field.equals("")) {
                return true;
            }
        }
        return false;
    }
}

/**
 * Mapper that drops duplicates, using a Set
 */
class RemoveReplyMapper extends Mapper<LongWritable, Text, Text, NullWritable> {

    public static Set<String> names = new HashSet<>();

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String v = value.toString();
        boolean flag = isRely(v);
        // write the record only if it is not a duplicate
        if (!flag) {
            context.write(value, NullWritable.get());
        }
    }

    // a record counts as a duplicate if its name field has been seen before
    private boolean isRely(String v) {
        String[] split = v.split("  ");
        String name = split[0];
        // already seen: duplicate
        if (names.contains(name)) {
            return true;
        }
        // first occurrence
        names.add(name);
        return false;
    }
}

/**
 * Mapper that drops invalid records
 */
class RemoveIllegalMapper extends Mapper<LongWritable, Text, Text, NullWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String v = value.toString();
        boolean flag = isIllegal(v);
        // write the record only if it is valid
        if (!flag) {
            context.write(value, NullWritable.get());
        }
    }

    // a record is invalid if any score field is > 100 or < 0
    private boolean isIllegal(String v) {
        String[] split = v.split("\\s+");
        for (int i = 1; i < split.length; i++) {
            int score = Integer.parseInt(split[i]);
            if (score > 100 || score < 0) {
                return true;
            }
        }
        return false;
    }
}
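
The driver above wires in only RemoveReplyMapper; the other two rules are applied the same way by swapping the mapper class (or by running the cleanings as separate jobs):

    // Sketch: choose the cleaning rule by choosing the mapper class
    job.setMapperClass(RemoveNullMapper.class);       // drop records with empty fields
    // job.setMapperClass(RemoveReplyMapper.class);   // drop duplicate names
    // job.setMapperClass(RemoveIllegalMapper.class); // drop out-of-range scores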

4.4 Serialization and deserialization (Writable)

/**
 * Phone bill Writable: base charge and data charge
 */
public class PhoneWritable implements Writable {

    private String num;
    private Double base;
    private Double flow;

    public PhoneWritable() {
    }

    @Override
    public String toString() {
        return "PhoneWritable{" +
                "base=" + base +
                ", flow=" + flow +
                '}';
    }

    public PhoneWritable(Double base, Double flow) {
        this.base = base;
        this.flow = flow;
    }

    public String getNum() {
        return num;
    }

    public void setNum(String num) {
        this.num = num;
    }

    public Double getBase() {
        return base;
    }

    public void setBase(Double base) {
        this.base = base;
    }

    public Double getFlow() {
        return flow;
    }

    public void setFlow(Double flow) {
        this.flow = flow;
    }

    // serialize
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeDouble(base);
        out.writeDouble(flow);
    }

    // deserialize: read the fields in the same order they were written
    @Override
    public void readFields(DataInput in) throws IOException {
        this.base = in.readDouble();
        this.flow = in.readDouble();
    }
}
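
Note that write() only serializes base and flow, so the phone number has to travel as the key. A hedged sketch of wiring this class into a job as the map output value type (MyMapper, MyReducer and the paths are placeholders, not part of the original notes):

    // Sketch: PhoneWritable as the map output value type
    Job job = Job.getInstance(new Configuration(), "phoneJob");
    job.setJarByClass(PhoneWritable.class);
    job.setMapperClass(MyMapper.class);              // placeholder: emits <Text phoneNum, PhoneWritable bill>
    job.setReducerClass(MyReducer.class);            // placeholder: sums base and flow per number
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(PhoneWritable.class); // the custom Writable is used here
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(PhoneWritable.class);
    FileInputFormat.setInputPaths(job, "data");
    FileOutputFormat.setOutputPath(job, new Path("out"));
    job.waitForCompletion(true);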

4.5 Sorting data (WritableComparable)

@Data
@NoArgsConstructor
@AllArgsConstructor
public class SortRecord implements WritableComparable<SortRecord> {

    private String key;
    private Integer value;

    @Override
    public String toString() {
        return key + "  " + value;
    }

    @Override
    public int compareTo(SortRecord o) {
        // descending by value (Integer.compare avoids integer overflow)
        return Integer.compare(o.getValue(), this.getValue());
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(key);
        out.writeInt(value);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        this.key = in.readUTF();
        this.value = in.readInt();
    }
}
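
The ordering defined by compareTo only takes effect when SortRecord is used as the map output key (the shuffle sorts keys), i.e. via job.setMapOutputKeyClass(SortRecord.class). A quick round-trip sketch of write()/readFields(), not from the original notes, just to illustrate the contract:

    // Sketch: serialize a SortRecord and read it back
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    SortRecord written = new SortRecord("hadoop", 3);
    written.write(new DataOutputStream(bos));

    SortRecord read = new SortRecord();
    read.readFields(new DataInputStream(new ByteArrayInputStream(bos.toByteArray())));
    System.out.println(read);   // prints: hadoop  3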

4.6 Data compression (map and reduce output)

// compress intermediate map output
conf.setBoolean("mapreduce.map.output.compress",true);
conf.setClass("mapreduce.map.output.compress.codec", BZip2Codec.class, CompressionCodec.class);
// compress the final (reduce) output
FileOutputFormat.setCompressOutput(job,true);
FileOutputFormat.setOutputCompressorClass(job,BZip2Codec.class);
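
One ordering detail worth keeping in mind: Job.getInstance(conf) takes a copy of the Configuration, so the two map-side settings must be applied before the job is created (or set through job.getConfiguration()). A minimal sketch:

    // Sketch: set map-side compression on the Configuration before creating the Job
    Configuration conf = new Configuration();
    conf.setBoolean("mapreduce.map.output.compress", true);
    conf.setClass("mapreduce.map.output.compress.codec", BZip2Codec.class, CompressionCodec.class);

    Job job = Job.getInstance(conf, "compressedJob");
    // the final-output settings go on the Job itself
    FileOutputFormat.setCompressOutput(job, true);
    FileOutputFormat.setOutputCompressorClass(job, BZip2Codec.class);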

4.7 Join queries (map join and reduce join)

  1. Reduce-side join
class JoinMapper extends Mapper<LongWritable, Text, Text, Record> {

    Record record = new Record();
    Text text = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        InputSplit inputSplit = context.getInputSplit();
        FileSplit fileSplit = (FileSplit) inputSplit;
        String name = fileSplit.getPath().getName();
        String[] split = value.toString().split("\\s+");
        String pid = null;
        if (name.startsWith("order")) {
            // order file: orderid, pid, num
            pid = split[1];
            record.setOrderid(split[0]);
            record.setPid(split[1]);
            record.setNum(Integer.parseInt(split[2]));
            record.setPname("");
        } else {
            // product file: pid, pname
            pid = split[0];
            record.setOrderid("");
            record.setPid(split[0]);
            record.setPname(split[1]);
            record.setNum(0);
        }
        text.set(pid);
        context.write(text, record);
    }
}

class JoinReducer extends Reducer<Text, Record, Text, NullWritable> {

    Text text = new Text();

    @Override
    protected void reduce(Text key, Iterable<Record> values, Context context) throws IOException, InterruptedException {
        List<Record> list = new ArrayList<>();
        Record pd = new Record();
        for (Record record : values) {
            if (StringUtils.isEmpty(record.getPname())) {
                Record record1 = new Record();
                // order record: copy it out, because MapReduce reuses the value object
                try {
                    BeanUtils.copyProperties(record1, record);
                } catch (Exception e) {
                    e.printStackTrace();
                }
                list.add(record1);
            } else {
                // product record: remember the product name
                pd.setPname(record.getPname());
            }
        }
        for (Record re : list) {
            String res = re.getOrderid() + " " + pd.getPname() + " " + re.getNum();
            text.set(res);
            context.write(text, NullWritable.get());
        }
    }
}
public class ReduceJoin {

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        try {
            Job job = Job.getInstance(conf, "reduceJoin");
            job.setJarByClass(ReduceJoin.class);
            job.setMapperClass(JoinMapper.class);
            job.setReducerClass(JoinReducer.class);

            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(Record.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(NullWritable.class);

            FileInputFormat.setInputPaths(job, "data");
            FileOutputFormat.setOutputPath(job, new Path("out"));
            job.waitForCompletion(true);
            job.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
  2. Map-side join

    • Custom Writable class

      @Data
      @NoArgsConstructor
      @AllArgsConstructor
      public class Record implements Writable {

          private String orderid;
          private String pid;
          private String pname;
          private Integer num;

          @Override
          public void write(DataOutput out) throws IOException {
              out.writeUTF(orderid);
              out.writeUTF(pid);
              out.writeUTF(pname);
              out.writeInt(num);
          }

          @Override
          public void readFields(DataInput in) throws IOException {
              orderid = in.readUTF();
              pid = in.readUTF();
              pname = in.readUTF();
              num = in.readInt();
          }
      }
      
    • The map task

      class MyMapper extends Mapper<LongWritable, Text, Text, NullWritable> {

          // product info cache: id -> name
          Map<String, String> map = new HashMap<>();

          @Override
          protected void setup(Context context) throws IOException, InterruptedException {
              // load the small product table once per map task
              FileInputStream fileInputStream = new FileInputStream("data/pd.txt");
              BufferedReader reader = new BufferedReader(new InputStreamReader(fileInputStream));
              String str = null;
              while ((str = reader.readLine()) != null) {
                  String[] split = str.split("\\s+");
                  map.put(split[0], split[1]);
              }
          }

          @Override
          protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
              InputSplit inputSplit = context.getInputSplit();
              FileSplit fileSplit = (FileSplit) inputSplit;
              String name = fileSplit.getPath().getName();
              if (name.startsWith("order")) {
                  // join each order line with the cached product name
                  String[] split = value.toString().split("\\s+");
                  String res = split[0] + " " + map.get(split[1]) + " " + split[2];
                  context.write(new Text(res), NullWritable.get());
              }
          }
      }

      public class MapJoin {

          public static void main(String[] args) {
              Configuration conf = new Configuration();
              try {
                  Job job = Job.getInstance(conf, "mapJoin");
                  job.setMapperClass(MyMapper.class);
                  job.setOutputKeyClass(Text.class);
                  job.setOutputValueClass(NullWritable.class);
                  FileInputFormat.setInputPaths(job, "data");
                  FileOutputFormat.setOutputPath(job, new Path("out"));
                  job.waitForCompletion(true);
                  job.close();
              } catch (Exception e) {
                  e.printStackTrace();
              }
          }
      }
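
    • Note on the product file

      The setup() above reads pd.txt from a local path, which assumes the file is present on every node that runs a map task. On a real cluster the usual alternative is the distributed cache; a hedged sketch (the HDFS path is a placeholder):

      // Sketch: ship the small product table with the job via the distributed cache
      job.addCacheFile(new Path("/test99/pd.txt").toUri());

      // in setup(), the localized copy is then typically readable by its file name:
      // BufferedReader reader = new BufferedReader(new FileReader("pd.txt"));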
      
      

4.8 Custom partitioner

  1. The partitioner class

    class MyPartition extends Partitioner<Text, Record> {

        @Override
        public int getPartition(Text text, Record record, int numPartitions) {
            // partition numbers must lie in [0, numReduceTasks)
            String key = text.toString();
            switch (key) {
                case "01":
                    return 0;
                case "02":
                    return 1;
                case "03":
                    return 2;
            }
            return 0;
        }
    }

  2. Register it on the job

    job.setPartitionerClass(MyPartition.class);
    job.setNumReduceTasks(3);
    

5. Hadoop optimization

  1. Use higher-performance machines
  2. Pre-process before the map phase: merge small files into larger ones
  3. Map phase: use a combiner for local aggregation (see the sketch after this list)
  4. Reduce phase: tune the reduce-side buffer parameters
  5. Data skew:
    • custom partitioner
    • map-side join
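
For the word count in 4.1, local aggregation is a one-line change, because summing counts is associative and commutative, so the reducer can double as the combiner. A minimal sketch, applied to the job from 4.1:

    // Sketch: run the reducer as a combiner on each map task to shrink shuffle traffic
    job.setCombinerClass(WordCounter.MyReducer.class);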

6. Using ZooKeeper

  1. Download and unpack the archive (/usr)

  2. Configure environment variables (/etc/profile)

    export ZK_HOME=/usr/zk
    export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZK_HOME/bin

  3. Configure the runtime parameters (/usr/zk/conf/zoo.cfg)

    dataDir=/root/zk/data
    dataLogDir=/root/zk/log

  4. Start the ZooKeeper server

    zkServer.sh start

  5. Open a ZooKeeper client

    zkCli.sh

  6. Stop the ZooKeeper server

    zkServer.sh stop

7. Advanced MapReduce cases

7.1 Running multiple MR jobs sequentially

public static void main(String[] args) {
    Mr1.execMr1();
    Mr2.execMr2();
}
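
Mr1 and Mr2 themselves are not shown in the notes; the pattern works because each exec method blocks on waitForCompletion(true), so the second job only starts after the first finishes. A hedged sketch of one of them (mapper, reducer, and path names are placeholders):

    // Sketch: one step of the sequence; Mr2 would read "step1Out" as its input
    public class Mr1 {
        public static void execMr1() {
            try {
                Job job = Job.getInstance(new Configuration(), "mr1");
                job.setJarByClass(Mr1.class);
                job.setMapperClass(Step1Mapper.class);     // placeholder mapper
                job.setReducerClass(Step1Reducer.class);   // placeholder reducer
                job.setOutputKeyClass(Text.class);
                job.setOutputValueClass(IntWritable.class);
                FileInputFormat.setInputPaths(job, "data");
                FileOutputFormat.setOutputPath(job, new Path("step1Out"));
                job.waitForCompletion(true);               // block until the job finishes
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }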

7.2 Chained MapReduce (ChainMapper/ChainReducer)

// map chain: MyMapper1 then MyMapper2 run inside the map task
ChainMapper.addMapper(job,MyMapper1.class,LongWritable.class,Text.class,Text.class,IntWritable.class,cfg);
ChainMapper.addMapper(job,MyMapper2.class,Text.class,IntWritable.class,Text.class,IntWritable.class,cfg);
// the single reducer
ChainReducer.setReducer(job,MyReducer.class,Text.class,IntWritable.class,Text.class,IntWritable.class,cfg);
// a mapper that runs after the reducer is added through ChainReducer
ChainReducer.addMapper(job,MyMapper3.class,Text.class,IntWritable.class,Text.class,IntWritable.class,cfg);
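
These calls only configure an existing job; a hedged sketch of the surrounding driver for context (the driver class name, job name, and paths are assumptions):

    // Sketch: the chain runs MyMapper1 -> MyMapper2 -> MyReducer -> MyMapper3 in one job
    Configuration cfg = new Configuration();
    Job job = Job.getInstance(cfg, "chainJob");
    job.setJarByClass(ChainClient.class);        // placeholder driver class

    ChainMapper.addMapper(job, MyMapper1.class, LongWritable.class, Text.class, Text.class, IntWritable.class, cfg);
    ChainMapper.addMapper(job, MyMapper2.class, Text.class, IntWritable.class, Text.class, IntWritable.class, cfg);
    ChainReducer.setReducer(job, MyReducer.class, Text.class, IntWritable.class, Text.class, IntWritable.class, cfg);
    ChainReducer.addMapper(job, MyMapper3.class, Text.class, IntWritable.class, Text.class, IntWritable.class, cfg);

    // output types of the last element in the chain (MyMapper3)
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.setInputPaths(job, "data");
    FileOutputFormat.setOutputPath(job, new Path("out"));
    job.waitForCompletion(true);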

7.3 Temperature analysis

  1. The weather class

    /**
     * Weather record: year, month, day, temperature
     */
    @Data@NoArgsConstructor@AllArgsConstructor
    public class TianQi implements WritableComparable<TianQi> {

        private Integer year;
        private Integer month;
        private Integer day;
        private Integer wd;

        @Override
        public String toString() {
            return year+"\t"+month+"\t"+day+"\t"+wd+"c";
        }

        @Override
        public int compareTo(TianQi o) {
            // ascending by year, then month; descending by temperature; then ascending by day
            int yAsc = Integer.compare(this.getYear(), o.getYear());
            if (yAsc == 0) {
                int mAsc = Integer.compare(this.getMonth(), o.getMonth());
                if (mAsc == 0) {
                    int wdDesc = Integer.compare(o.getWd(), this.getWd());
                    if (wdDesc == 0) {
                        int dAsc = Integer.compare(this.getDay(), o.getDay());
                        return dAsc;
                    }
                    return wdDesc;
                }
                return mAsc;
            }
            return yAsc;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(year);
            out.writeInt(month);
            out.writeInt(day);
            out.writeInt(wd);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            year = in.readInt();
            month = in.readInt();
            day = in.readInt();
            wd = in.readInt();
        }
    }
    
    
  2. The grouping comparator

    /**
     * Group reducer input by year and month
     */
    public class TianQiGroupComparator extends WritableComparator {

        public TianQiGroupComparator() {
            // true: create TianQi instances so compare() receives deserialized objects
            super(TianQi.class, true);
        }

        @Override
        public int compare(WritableComparable a, WritableComparable b) {
            TianQi aa = (TianQi) a;
            TianQi bb = (TianQi) b;
            int y = aa.getYear() - bb.getYear();
            if (y == 0) {
                return aa.getMonth() - bb.getMonth();
            }
            return y;
        }
    }
    
    
  3. The MR program

    /**
     * Find the two hottest days of each month
     */
    public class TianQiClient {

        /**
         * Parse each input line into a TianQi object
         */
        public static class TianQiMapper extends Mapper<LongWritable, Text, TianQi, NullWritable> {

            @Override
            protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
                String[] split = value.toString().split("\\s+");
                String time = split[0] + " " + split[1];
                int wd = Integer.parseInt(split[2].substring(0, split[2].length() - 1));
                SimpleDateFormat simpleDateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
                try {
                    Date date = simpleDateFormat.parse(time);
                    Calendar calendar = Calendar.getInstance();
                    calendar.setTime(date);
                    int year = calendar.get(Calendar.YEAR);
                    int month = calendar.get(Calendar.MONTH) + 1;
                    int day = calendar.get(Calendar.DAY_OF_MONTH);
                    TianQi tianQi = new TianQi(year, month, day, wd);
                    context.write(tianQi, NullWritable.get());
                } catch (ParseException e) {
                    e.printStackTrace();
                }
            }
        }

        public static class TianQiPartitioner extends Partitioner<TianQi, NullWritable> {

            @Override
            public int getPartition(TianQi tianQi, NullWritable nullWritable, int numPartitions) {
                // spread the data across reducers by year
                return (tianQi.getYear() & Integer.MAX_VALUE) % numPartitions;
            }
        }

        /**
         * Pick the two hottest days of each year/month group
         */
        public static class TianQiReducer extends Reducer<TianQi, NullWritable, TianQi, NullWritable> {

            @Override
            protected void reduce(TianQi key, Iterable<NullWritable> values, Context context) throws IOException, InterruptedException {
                int flag = 0;
                int day = 0;
                for (NullWritable nullWritable : values) {
                    // the first record of the group is the hottest day
                    if (flag == 0) {
                        context.write(key, NullWritable.get());
                        flag++;
                        // remember its day
                        day = key.getDay();
                    }
                    // the next record from a different day is the second hottest
                    if (key.getDay() != day) {
                        context.write(key, NullWritable.get());
                        break;
                    }
                }
            }
        }

        public static void main(String[] args) {
            Configuration cfg = new Configuration();
            try {
                Job job = Job.getInstance(cfg, "tianqi");
                job.setMapperClass(TianQiMapper.class);
                job.setReducerClass(TianQiReducer.class);

                job.setMapOutputKeyClass(TianQi.class);
                job.setMapOutputValueClass(NullWritable.class);
                job.setOutputKeyClass(TianQi.class);
                job.setOutputValueClass(NullWritable.class);

                FileInputFormat.setInputPaths(job, "data2");
                FileOutputFormat.setOutputPath(job, new Path("out"));

                job.setPartitionerClass(TianQiPartitioner.class);
                job.setNumReduceTasks(3);
                job.setGroupingComparatorClass(TianQiGroupComparator.class);

                job.waitForCompletion(true);
                job.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
    
    

7.4 Friend recommendation

/**
 * Friend recommendation: suggest potential (friend-of-friend) friends
 */
public class FriendClient {

    /**
     * Tag direct friend pairs with 0 and indirect (friend-of-friend) pairs with 1
     */
    public static class FriendMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String[] split = value.toString().split(":");
            String left = split[0];
            String[] rights = split[1].split("\\s+");
            for (int i = 0; i < rights.length; i++) {
                // direct friend pair
                context.write(new Text(unit(left, rights[i])), new IntWritable(0));
                for (int j = i + 1; j < rights.length; j++) {
                    // indirect pair: the two share a common friend
                    context.write(new Text(unit(rights[i], rights[j])), new IntWritable(1));
                }
            }
        }
    }

    public static class FriendReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int count = 0;
            for (IntWritable value : values) {
                // skip pairs that are already direct friends
                if (value.get() == 0) {
                    return;
                }
                count++;
            }
            // count = number of common friends for this pair
            context.write(key, new IntWritable(count));
        }
    }

    // normalize the pair so A:B and B:A become the same key
    private static String unit(String left, String right) {
        return left.compareTo(right) > 0 ? left + ":" + right : right + ":" + left;
    }

    public static void main(String[] args) {
        Configuration cfg = new Configuration();
        try {
            Job job = Job.getInstance(cfg, "fried");
            job.setMapperClass(FriendMapper.class);
            job.setReducerClass(FriendReducer.class);

            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(IntWritable.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            FileInputFormat.setInputPaths(job, "data4");
            // delete the output directory if it already exists
            Path out = new Path("out");
            FileSystem fs = out.getFileSystem(cfg);
            if (fs.exists(out)) {
                fs.delete(out, true);
            }
            FileOutputFormat.setOutputPath(job, out);
            job.waitForCompletion(true);
            job.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

8. Deploying a high-availability Hadoop cluster

Cluster members:

Host     HDFS                  YARN
master   namenode              resourcemanager
slave1   namenode, datanode    resourcemanager, nodemanager
slave2   datanode              nodemanager

8.1 Install ZooKeeper on master

8.2 Edit core-site.xml

<property>
      <name>fs.defaultFS</name>
      <value>hdfs://cluster</value>
</property>
<property>
      <name>hadoop.tmp.dir</name>
      <value>file:/root/hadoop/tmp</value>
</property>
<property>
      <name>ha.zookeeper.quorum</name>
      <value>master:2181</value>
</property>

8.3 Edit hdfs-site.xml

<configuration>

<property>
      <name>dfs.nameservices</name>
      <value>cluster</value>
</property>

<property>
      <name>dfs.ha.namenodes.cluster</name>
      <value>master,slave1</value>
</property>

<property>
      <name>dfs.namenode.rpc-address.cluster.master</name>
      <value>master:9000</value>
</property>

<property>
      <name>dfs.namenode.http-address.cluster.master</name>
      <value>master:50070</value>
</property>

<property>
      <name>dfs.namenode.rpc-address.cluster.slave1</name>
      <value>slave1:9000</value>
</property>

<property>
      <name>dfs.namenode.http-address.cluster.slave1</name>
      <value>slave1:50070</value>
</property>

<property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://master:8485;slave1:8485;slave2:8485/cluster</value>
</property>

<property>
      <name>dfs.journalnode.edits.dir</name>
      <value>/root/hadoop/journalData</value>
</property>

<property>
      <name>dfs.ha.automatic-failover.enabled</name>
      <value>true</value>
</property>

<property>
      <name>dfs.client.failover.proxy.provider.myNameService1</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

<property>
      <name>dfs.ha.fencing.methods</name>
      <value>
              sshfence
              shell(/bin/true)
      </value>
</property>

<property>
      <name>dfs.ha.fencing.ssh.private-key-files</name>
      <value>/root/.ssh/id_rsa</value>
</property>

<property>
      <name>dfs.ha.fencing.ssh.connect-timeout</name>
      <value>30000</value>
</property>

<property>
      <name>dfs.replication</name>
      <value>2</value>
</property>

<property>
      <name>dfs.namenode.name.dir</name>
      <value>/root/hadoop/dfs/namenode</value>
</property>

<property>
      <name>dfs.datanode.data.dir</name>
      <value>/root/hadoop/dfs/datanode</value>
</property>

<property>
      <name>dfs.webhdfs.enabled</name>
      <value>true</value>
</property>

<property>
      <name>dfs.permissions</name>
      <value>false</value>
</property>

<property>
      <name>dfs.client.failover.proxy.provider.cluster</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

</configuration>

8.4 Edit mapred-site.xml

<configuration>

        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>

        <property>
                <name>mapreduce.jobhistory.address</name>
                <value>master:10020</value>
        </property>

        <property>
                <name>mapreduce.jobhistory.webapp.address</name>
                <value>master:19888</value>
        </property>

        <property>
                <name>mapreduce.application.classpath</name>
                <value>/usr/hadoop321/share/hadoop/mapreduce/*,/usr/hadoop321/share/hadoop/mapreduce/lib/*</value>
        </property>

</configuration>

8.5 Edit yarn-site.xml

<configuration>

        <property>
                <name>yarn.resourcemanager.ha.enabled</name>
                <value>true</value>
        </property>

        <property>
                <name>yarn.resourcemanager.cluster-id</name>
                <value>yrc</value>
        </property>

        <property>
                <name>yarn.resourcemanager.ha.rm-ids</name>
                <value>rm1,rm2</value>
        </property>

        <property>
                <name>yarn.resourcemanager.hostname.rm1</name>
                <value>master</value>
        </property>

        <property>
                <name>yarn.resourcemanager.hostname.rm2</name>
                <value>slave1</value>
        </property>

        <property>
                <name>yarn.resourcemanager.webapp.address.rm1</name>
                <value>master:8088</value>
        </property>

        <property>
                <name>yarn.resourcemanager.webapp.address.rm2</name>
                <value>slave1:8088</value>
        </property>

        <property>
                <name>yarn.resourcemanager.zk-address</name>
                <value>master:2181</value>
        </property>

        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>

        <property>
                <name>yarn.application.classpath</name>
                <value>/usr/hadoop321/etc/hadoop:/usr/hadoop321/share/hadoop/common/lib/*:/usr/hadoop321/share/hadoop/common/*:/usr/hadoop321/share/hadoop/hdfs:/usr/hadoop321/share/hadoop/hdfs/lib/*:/usr/hadoop321/share/hadoop/hdfs/*:/usr/hadoop321/share/hadoop/mapreduce/lib/*:/usr/hadoop321/share/hadoop/mapreduce/*:/usr/hadoop321/share/hadoop/yarn:/usr/hadoop321/share/hadoop/yarn/lib/*:/usr/hadoop321/share/hadoop/yarn/*</value>
        </property>

</configuration>

8.6 Edit hadoop-env.sh

export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_ZKFC_USER=root
export HDFS_JOURNALNODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root

8.7 Set up mutual passwordless SSH between all machines

  1. Generate a key pair on each machine

    ssh-keygen

  2. Append every machine's public key to authorized_keys

    cat id_rsa.pub >> authorized_keys

    cat id_rsa.pub.s1 >> authorized_keys

    cat id_rsa.pub.s2 >> authorized_keys

  3. Send the file to every machine

    scp authorized_keys root@slave1:/root/.ssh

    scp authorized_keys root@slave2:/root/.ssh

8.8 Startup

  1. Start a journalnode on all three machines

    hdfs --daemon start journalnode

  2. On master:

    hdfs namenode -format                   # format the NameNode
    zkServer.sh start                       # start ZooKeeper
    hdfs zkfc -formatZK                     # initialize the HA state in ZooKeeper
    scp -r /root/hadoop root@slave1:/root   # copy the metadata to the second NameNode
    start-all.sh                            # start all services

  3. Check the processes with jps and open the NameNode web UI in a browser.
