TF (term frequency): the number of times a word occurs in a document divided by the total number of words in that document.
IDF (inverse document frequency): log(total number of documents / number of documents containing the word). The log dampens the score so that very common words such as 的 and 了 do not rate highly.
TF-IDF indicates how important a word is within the index (the document collection); note that the class names below abbreviate it as IF/DF.
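As a minimal standalone sketch of the formula (the class name TfIdfFormula is illustrative, not part of the jobs below; Math.log is the natural logarithm, matching what DFReducer uses later):
public class TfIdfFormula {
// tf  = occurrences of the word in the document / total words in the document
// idf = log(total documents in the index / documents containing the word)
public static double score(long occurrences, long wordsInDoc, long totalDocs, long docsWithWord) {
double tf = occurrences * 1.0 / wordsInDoc;
double idf = Math.log(totalDocs * 1.0 / docsWithWord);
return tf * idf;
}
}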
Input data: id \t document text
3823891101582094 我爱中国
3823891201582094 北京天安门广场现在有升国旗
3823891301582094 武汉天气好热
part-r-00000, part-r-00001 (id, word, occurrences of the word in the document, total words in the document)
3823891101582094 我 1 4
3823890210294392 爱 1 4
part-r-00002 (word, number of documents containing the word)
0 6
0.03 1
part-r-00003 (total number of documents)
123564
The next job (DFJob) consumes part-r-00002:
0 6
0.03 1
part-r-00003 (distributed as a cache file)
123564
Its output (word, number of documents containing it, IDF):
0 66 274.0
0.03 11 1645.0
0.03元 11 1645.0
Final output: id, word, TF, IDF, TF-IDF
3823953501983192 0.88斤 0.045454545454545456 9.575816471867984 0.4352643850849084
3823926323040930 0.9斤 0.05263157894736842 9.575816471867984 0.5039903406246308
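The last column is the product of the two before it; checking the first row: TF = 1/22 = 0.045454545454545456, and 0.045454545454545456 × 9.575816471867984 = 0.4352643850849084.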
/**
* @author yzz
* @time 2019/6/1 14:03
* @E-mail [email protected]
* @since 0.0.1
*/
public class IFWordCountJob extends Base {
public static void client() throws Exception {
Configuration conf = new Configuration();
conf.set("mapreduce.app-submission.coress-paltform", "true");
conf.set("mapreduce.framework.name", "local");
Job job = Job.getInstance(conf);
job.setJobName("IFWordCountJob");
job.setJarByClass(IFWordCountJob.class);
Path in = new Path(getConfig(conf, "search.in"));
Path out = new Path(getConfig(conf, "search.out"));
clean(out, conf);
addInPath(job, conf, in);
FileOutputFormat.setOutputPath(job, out);
//map
//custom partitioner routes the statistics records to dedicated reducers
job.setPartitionerClass(IFWordCountPartitioner.class);
job.setNumReduceTasks(4);
job.setMapperClass(IFWordCountMapper.class);
job.setMapOutputKeyClass(IFWordCountVO.class);
job.setMapOutputValueClass(NullWritable.class);
//the reducer just sums counts, so it can double as a combiner
job.setCombinerClass(IFWordCountReducer.class);
//reduce
job.setReducerClass(IFWordCountReducer.class);
job.setOutputKeyClass(IFWordCountVO.class);
job.setOutputValueClass(NullWritable.class);
//submit
boolean complete = job.waitForCompletion(true);
Path documentCount = new Path(getConfig(conf, "df.document.count"));
Path wordCountInAllFile = new Path(getConfig(conf, "df.temp.in"), "WordFileCountInIndex.txt");
if (complete) {
//update the total document count and the per-word document counts
readAndWrite(new Path(out, "part-r-00003"), documentCount, conf);
copyFile(new Path(out, "part-r-00000"), new Path(getConfig(conf, "search.if"), job.getJobID().toString() + "_" + "part-r-00000"), conf);
copyFile(new Path(out, "part-r-00001"), new Path(getConfig(conf, "search.if"), job.getJobID().toString() + "_" + "part-r-00001"), conf);
copyFile(new Path(out, "part-r-00002"), wordCountInAllFile, conf);
}
}
}
/**
* @author yzz
* @time 2019/6/1 14:22
* @E-mail [email protected]
* @since 0.0.1
*/
public class IFWordCountMapper extends Mapper<LongWritable, Text, IFWordCountVO, NullWritable> {
IFWordCountVO wordCountVO = new IFWordCountVO();
/**
* 3823890264861035 我约了吃饭哦
*
* @param key
* @param value
* @param context
* @throws IOException
* @throws InterruptedException
*/
@Override
protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
//per-document statistics: emit one type-2 record per input line (one document per line)
documentStatics(context);
//split the line into document id and text
String[] split = StringUtils.split(value.toString(), '\t');
wordCountVO.setId(split[0]);
String content = split[1];
//tokenize the text with IK Analyzer (smart mode)
IKSegmenter ikSegmenter = new IKSegmenter(new StringReader(content), true);
Lexeme ikWord = null;
List<String> list = new ArrayList<>();
//collect the tokens, then emit one type-3 (document-frequency) record per
//DISTINCT word, so a word repeated inside one document is only counted once
while ((ikWord = ikSegmenter.next()) != null) {
list.add(ikWord.getLexemeText());
}
for (String word : new HashSet<>(list)) {
wordCountVO.setId("");
wordCountVO.setWord(word);
wordCountVO.setCount(1);
wordCountVO.setType(3);
context.write(wordCountVO, NullWritable.get());
}
//emit one type-1 (term-frequency) record per token occurrence; the reducer sums them per (id, word)
for (String word : list) {
wordCountVO.setId(split[0]);
wordCountVO.setWord(word);
wordCountVO.setType(1);
wordCountVO.setWordCount(list.size());
context.write(wordCountVO, NullWritable.get());
}
}
public void documentStatics(Context context) throws IOException, InterruptedException {
wordCountVO.setWord("");
wordCountVO.setId("");
wordCountVO.setCount(1);
wordCountVO.setType(2);
context.write(wordCountVO, NullWritable.get());
}
}
/**
* @author yzz
* @time 2019/6/1 14:23
* @E-mail [email protected]
* @since 0.0.1
*/
public class IFWordCountReducer extends Reducer<IFWordCountVO, NullWritable, IFWordCountVO, NullWritable> {
/**
* 2222 q 1
* 2222 q 1
* 2222 q 1
* 2
* @param key
* @param values
* @param context
* @throws IOException
* @throws InterruptedException
*/
@Override
protected void reduce(IFWordCountVO key, Iterable<NullWritable> values, Context context) throws IOException, InterruptedException {
int sum = 0;
//Hadoop re-deserializes the key for each value during iteration, so
//key.getCount() is the current record's count; this matters because the
//class also runs as a combiner and incoming counts may already exceed 1
for (NullWritable n : values) {
sum += key.getCount();
}
key.setCount(sum);
context.write(key, NullWritable.get());
}
}
/**
* @author yzz
* @time 2019/6/1 14:13
* @E-mail [email protected]
* @since 0.0.1
*/
public class IFWordCountPartitioner extends HashPartitioner<IFWordCountVO, NullWritable> {
@Override
public int getPartition(IFWordCountVO key, NullWritable value, int numReduceTasks) {
//type 2 (total document count) -> partition 3 (part-r-00003)
if (key.getType() == 2) {
return 3;
}
//type 3 (per-word document frequency) -> partition 2 (part-r-00002)
if (key.getType() == 3) {
return 2;
}
//per-document word records spread over the remaining partitions (0 and 1)
return (key.getId().hashCode() & Integer.MAX_VALUE) % (numReduceTasks - 2);
}
}
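With the four reduce tasks configured in the job, this sends the per-document word records to partitions 0 and 1 (part-r-00000/part-r-00001), the per-word document counts to partition 2 (part-r-00002), and the total document count to partition 3 (part-r-00003), matching the file layout described at the top.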
/**
* @author yzz
* @time 2019/6/1 14:14
* @E-mail [email protected]
* @since 0.0.1
*/
public class IFWordCountVO implements WritableComparable<IFWordCountVO> {
private String id;
private String word;
private int count = 1;
private int wordCount = 0;
int type = 1;//1: per-document word record, 2: total document count, 3: per-word document frequency
@Override
public void write(DataOutput out) throws IOException {
out.writeUTF(id);
out.writeUTF(word);
out.writeInt(count);
out.writeInt(type);
out.writeInt(wordCount);
}
@Override
public void readFields(DataInput in) throws IOException {
id = in.readUTF();
word = in.readUTF();
count = in.readInt();
type = in.readInt();
wordCount = in.readInt();
}
@Override
public int compareTo(IFWordCountVO o) {
int c = id.compareTo(o.getId());
if (c == 0) {
return word.compareTo(o.getWord());
}
return c;
}
public String getId() {
return id;
}
public void setId(String id) {
this.id = id;
}
public String getWord() {
return word;
}
public void setWord(String word) {
this.word = word;
}
public int getCount() {
return count;
}
public void setCount(int count) {
this.count = count;
}
public int getType() {
return type;
}
public void setType(int type) {
this.type = type;
}
@Override
public String toString() {
if (type == 2) {
//total document count (part-r-00003)
return String.valueOf(count);
}
if (type == 3) {
//document frequency (part-r-00002): word \t count
return word + '\t' + count;
}
//per-document record (part-r-00000/00001): id \t word \t count \t words in document
return id + "\t" + word + '\t' + count + '\t' + wordCount;
}
public int getWordCount() {
return wordCount;
}
public void setWordCount(int wordCount) {
this.wordCount = wordCount;
}
}
/**
* @author yzz
* @time 2019/6/1 18:02
* @E-mail [email protected]
* @since 0.0.1
*/
public class DFJob extends Base {
/**
* @throws IOException
* @throws ClassNotFoundException
* @throws InterruptedException
*/
public static void client() {
try {
Configuration conf = new Configuration(true);
conf.set("mapreduce.app-submission.coress-paltform", "true");
conf.set("mapreduce.framework.name", "local");
Job job = Job.getInstance(conf);
job.setJobName("DFJob");
job.setJarByClass(DFJob.class);
//cache file holding the total document count, read back in DFReducer#setup
job.addCacheFile(new Path(getConfig(conf, "df.document.count")).toUri());
Path in = new Path(getConfig(conf, "df.temp.in"));
Path out = new Path(getConfig(conf, "df.temp.out"));
clean(out, conf);
addInPath(job,conf,in);
FileOutputFormat.setOutputPath(job, out);
//map
job.setMapperClass(DFMapper.class);
job.setMapOutputKeyClass(DFVO.class);
job.setMapOutputValueClass(NullWritable.class);
//reduce
job.setReducerClass(DFReducer.class);
job.setOutputKeyClass(DFVO.class);
job.setOutputValueClass(NullWritable.class);
job.setNumReduceTasks(1);
boolean b = job.waitForCompletion(true);
}catch (Exception e){
e.printStackTrace();
}
}
}
/**
* @author yzz
* @time 2019/6/1 21:13
* @E-mail [email protected]
* @since 0.0.1
*/
public class DFMapper extends Mapper<LongWritable, Text, DFVO, NullWritable> {
DFVO dfvo = new DFVO();
@Override
protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
//input line from part-r-00002 (copied as WordFileCountInIndex.txt): word \t document count
String[] split = StringUtils.split(value.toString(), '\t');
dfvo.setWord(split[0]);
dfvo.setDocumentCountInIndex(Long.valueOf(split[1]));
context.write(dfvo, NullWritable.get());
}
}
/**
* @author yzz
* @time 2019/6/1 21:16
* @E-mail [email protected]
* @since 0.0.1
*/
public class DFReducer extends Reducer<DFVO, NullWritable, DFVO, NullWritable> {
private long documentCount;
@Override
protected void setup(Context context) throws IOException, InterruptedException {
//read the total document count from the distributed cache file
URI[] cacheFiles = context.getCacheFiles();
Path path = new Path(cacheFiles[0]);
FileSystem fileSystem = path.getFileSystem(context.getConfiguration());
FSDataInputStream open = fileSystem.open(path);
documentCount = Long.valueOf(open.readLine());
//close explicitly: HDFS errors out on unclosed streams (see the note at the end)
open.close();
}
@Override
protected void reduce(DFVO key, Iterable<NullWritable> values, Context context) throws IOException, InterruptedException {
long num = 0;
//sum the per-file counts for this word
for (NullWritable n : values) {
num += key.getDocumentCountInIndex();
}
key.setDocumentCountInIndex(num);
//IDF = log(total documents / documents containing the word)
key.setDf(Math.log(documentCount * 1.0 / num));
context.write(key, NullWritable.get());
}
}
/**
* @author yzz
* @time 2019/6/1 19:40
* @E-mail [email protected]
* @since 0.0.1
*/
public class DFVO implements WritableComparable<DFVO> {
private String word;
private long documentCountInIndex;
private double df;
@Override
public int compareTo(DFVO o) {
return word.compareTo(o.getWord());
}
@Override
public void write(DataOutput out) throws IOException {
out.writeUTF(word);
out.writeLong(documentCountInIndex);
out.writeDouble(df);
}
@Override
public void readFields(DataInput in) throws IOException {
word = in.readUTF();
documentCountInIndex = in.readLong();
df = in.readDouble();//bug fix: write() uses writeDouble, so readLong would corrupt the stream
}
public String getWord() {
return word;
}
public void setWord(String word) {
this.word = word;
}
public long getDocumentCountInIndex() {
return documentCountInIndex;
}
public void setDocumentCountInIndex(long documentCountInIndex) {
this.documentCountInIndex = documentCountInIndex;
}
@Override
public String toString() {
return word + '\t' + documentCountInIndex + '\t' + df;
}
public double getDf() {
return df;
}
public void setDf(double df) {
this.df = df;
}
}
/**
* @author yzz
* @time 2019/6/1 14:03
* @E-mail [email protected]
* @since 0.0.1
*/
public class IFDFJob extends Base {
public static void client() throws Exception {
Configuration conf = new Configuration();
conf.set("mapreduce.app-submission.coress-paltform", "true");
conf.set("mapreduce.framework.name", "local");
Job job = Job.getInstance(conf);
job.setJobName("IFDFJob");
job.setJarByClass(IFDFJob.class);
Path in1 = new Path(getConfig(conf, "search.if"));
Path in3 = new Path(getConfig(conf, "df.temp.out"));
Path out = new Path("/search/end");
clean(out, conf);
FileInputFormat.addInputPath(job, in1);
FileInputFormat.addInputPath(job, in3);
FileOutputFormat.setOutputPath(job, out);
//map
job.setMapperClass(IFDFMapper.class);
job.setMapOutputKeyClass(IFDFVO.class);
job.setMapOutputValueClass(NullWritable.class);
//reduce
//group by word only: the reducer then sees the IDF record and all TF records
//for one word in a single reduce() call
job.setGroupingComparatorClass(IFDFGroupingComparator.class);
job.setReducerClass(IFDFReducer.class);
job.setOutputKeyClass(IFDFVO.class);
job.setOutputValueClass(NullWritable.class);
//submit
boolean complete = job.waitForCompletion(true);
if (complete) {
//swap df.temp.in / df.temp.out in the shared config for the next batch
reWriteConfig(conf, getConfigMap(conf));
}
}
}
/**
* @author yzz
* @time 2019/6/1 21:13
* @E-mail [email protected]
* @since 0.0.1
*/
public class IFDFMapper extends Mapper<LongWritable, Text, IFDFVO, NullWritable> {
IFDFVO ifdfvo = new IFDFVO();
@Override
protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
String[] split = StringUtils.split(value.toString(), '\t');
if (split.length > 3) {
//TF record from the first job: id \t word \t occurrences \t words in document
ifdfvo.setId(split[0]);
ifdfvo.setWord(split[1]);
int i1 = Integer.valueOf(split[2]);
int i2 = Integer.valueOf(split[3]);
ifdfvo.setIfc(i1 * 1.0 / i2);
ifdfvo.setType(1);
} else {
//IDF record from DFJob: word \t documents containing it \t IDF
ifdfvo.setId("");
ifdfvo.setWord(split[0]);
ifdfvo.setDfc(Double.valueOf(split[2]));
ifdfvo.setType(2);
}
context.write(ifdfvo, NullWritable.get());
}
}
/**
* @author yzz
* @time 2019/6/1 21:16
* @E-mail [email protected]
* @since 0.0.1
*/
public class IFDFReducer extends Reducer<IFDFVO, NullWritable, IFDFVO, NullWritable> {
@Override
protected void reduce(IFDFVO key, Iterable<NullWritable> values, Context context) throws IOException, InterruptedException {
double dfc = 0.0;
//all records in one group share a word; the sort order guarantees the single
//IDF record (type 2) arrives before the TF records (type 1), so dfc is set
//before any product is written
for (NullWritable n : values) {
if (1 == key.getType()) {
key.setIfdf(key.getIfc() * dfc);
key.setDfc(dfc);
context.write(key, NullWritable.get());
} else {
dfc = key.getDfc();
}
}
}
}
/**
* @author yzz
* @time 2019/6/2 13:48
* @E-mail [email protected]
* @since 0.0.1
*/
public class IFDFGroupingComparator extends WritableComparator {
public IFDFGroupingComparator() {
super(IFDFVO.class,true);
}
@Override
public int compare(WritableComparable a, WritableComparable b) {
IFDFVO ifdfvo1 = (IFDFVO) a;
IFDFVO ifdfvo2 = (IFDFVO) b;
return ifdfvo1.getWord().compareTo(ifdfvo2.getWord());
}
}
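Grouping only by word means one reduce() call receives the IDF record and every TF record for that word, while IFDFVO.compareTo sorts the type-2 record first; that ordering is what lets IFDFReducer capture dfc before writing any products.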
/**
* @author yzz
* @time 2019/6/2 1:50
* @E-mail [email protected]
* @since 0.0.1
*/
public class IFDFVO implements WritableComparable<IFDFVO> {
private String id;
private String word;
private double ifc;
private double dfc;
private double ifdf;
private int type;//1: TF record, 2: IDF record
@Override
public int compareTo(IFDFVO o) {
int c1 = word.compareTo(o.getWord());
if (c1 == 0) {
//same word: higher type first, so the IDF record (type 2) sorts ahead of TF records
return o.getType() - type;
}
return c1;
}
@Override
public void write(DataOutput out) throws IOException {
out.writeUTF(id);
out.writeUTF(word);
out.writeDouble(ifdf);
out.writeInt(type);
out.writeDouble(ifc);
out.writeDouble(dfc);
}
@Override
public void readFields(DataInput in) throws IOException {
id = in.readUTF();
word = in.readUTF();
ifdf = in.readDouble();
type = in.readInt();
ifc = in.readDouble();
dfc = in.readDouble();
}
public String getWord() {
return word;
}
public void setWord(String word) {
this.word = word;
}
public double getIfdf() {
return ifdf;
}
public void setIfdf(double ifdf) {
this.ifdf = ifdf;
}
public int getType() {
return type;
}
public void setType(int type) {
this.type = type;
}
public String getId() {
return id;
}
public void setId(String id) {
this.id = id;
}
public double getIfc() {
return ifc;
}
public void setIfc(double ifc) {
this.ifc = ifc;
}
public double getDfc() {
return dfc;
}
public void setDfc(double dfc) {
this.dfc = dfc;
}
@Override
public String toString() {
return id + '\t' + word + '\t' + ifc + '\t' + dfc + '\t' + ifdf;
}
}
/**
* @author yzz
* @time 2019/6/1 18:53
* @E-mail [email protected]
* @since 0.0.1
*/
public class Base {
public static final String CONFIG_PATH = "/search/conf/config.txt";
protected static void clean(Path out, Configuration conf) throws IOException {
if (out.getFileSystem(conf).exists(out)) {
out.getFileSystem(conf).delete(out, true);
}
}
protected static Map<String, String> getConfigMap(Configuration configuration) throws IOException {
Path path = new Path(CONFIG_PATH);
FileSystem fileSystem = path.getFileSystem(configuration);
FSDataInputStream open = fileSystem.open(path);
String str;
Map<String, String> map = new HashMap<>();
while (null != (str = open.readLine())) {
String[] split = str.split("=");
map.put(split[0], split[1]);
}
open.close();
return map;
}
protected static String getConfig(Configuration configuration, String key) throws IOException {
FSDataInputStream open = null;
try {
Path path = new Path(CONFIG_PATH);
FileSystem fileSystem = path.getFileSystem(configuration);
open = fileSystem.open(path);
String str;
while (null != (str = open.readLine())) {
String[] split = str.split("=");
if (key.equals(split[0])) return split[1];
}
} catch (Exception e) {
e.printStackTrace();
} finally {
//guard against NPE when open() itself failed
if (open != null) open.close();
}
return null;
}
/**
* df.temp.in=/search/temp
* df.temp.out=/search/temp0
*
* @param configuration
* @param map
* @throws IOException
*/
protected static void reWriteConfig(Configuration configuration, Map<String, String> map) throws IOException {
Path path = new Path(CONFIG_PATH);
FileSystem fileSystem = path.getFileSystem(configuration);
FSDataOutputStream outputStream = fileSystem.create(path, true);
//swap df.temp.in and df.temp.out so the next batch reads this run's output
String in = map.get("df.temp.in");
String out = map.get("df.temp.out");
map.put("df.temp.in", out);
map.put("df.temp.out", in);
for (Map.Entry<String, String> en : map.entrySet()) {
String content = en.getKey() + "=" + en.getValue() + '\n';
outputStream.write(content.getBytes());
}
outputStream.flush();
IOUtils.closeStream(outputStream);
}
public static void addInPath(Job job, Configuration conf, Path path) throws IOException {
FileSystem fileSystem = path.getFileSystem(conf);
RemoteIterator<LocatedFileStatus> files = fileSystem.listFiles(path, true);
while (files.hasNext()) {
LocatedFileStatus next = files.next();
//skip zero-length files such as _SUCCESS markers and empty part files
if (next.getLen() > 0) {
FileInputFormat.addInputPath(job, next.getPath());
}
}
}
public static int readInt(FSDataInputStream open) {
try {
String a = open.readLine();
open.close();
return Integer.valueOf(a);
} catch (Exception e) {
e.printStackTrace();
return 0;
}
}
public static void copyFile(Path out, Path target, Configuration conf) throws IOException {
FileSystem fileSystem = out.getFileSystem(conf);
//this FileUtil.copy overload overwrites the target, so pre-creating an empty file is unnecessary
FileUtil.copy(fileSystem, out, fileSystem, target, false, conf);
}
public static void readAndWrite(Path out, Path target, Configuration conf) throws IOException {
FileSystem fileSystem = out.getFileSystem(conf);
if (!fileSystem.exists(target)) {
fileSystem.createNewFile(target);
}
//new total = count from this job's output + previously stored total
FSDataInputStream open = fileSystem.open(out);
FSDataInputStream open1 = fileSystem.open(target);
int c1 = readInt(open);//readInt closes the stream it is given
int c2 = readInt(open1);
FSDataOutputStream outputStream = fileSystem.create(target, true);
outputStream.writeBytes(String.valueOf(c1 + c2));
outputStream.flush();
outputStream.close();
}
}
/**
* @author yzz
* @time 2019/6/1 15:39
* @E-mail [email protected]
* @since 0.0.1
*/
public class client {
public static void main(String[] args) {
try {
IFWordCountJob.client();
DFJob.client();
IFDFJob.client();
} catch (Exception e) {
e.printStackTrace();
}
}
}
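Note that the three jobs must run in this order: DFJob reads the word/document counts that IFWordCountJob copies out, and IFDFJob joins the per-document output of the first job with the IDF output of the second.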
The pom.xml dependencies:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>${hadoop.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>${hadoop.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>${hadoop.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>${hadoop.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
    <version>${hadoop.version}</version>
</dependency>
<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.12</version>
</dependency>
<dependency>
    <groupId>com.janeluo</groupId>
    <artifactId>ikanalyzer</artifactId>
    <version>2012_u6</version>
    <scope>compile</scope>
</dependency>
Note: every stream that is opened must be closed. HDFS protects itself against unclosed streams, and leaving one open makes the program fail with an error. After a batch of documents is uploaded to HDFS and run through the three MapReduce jobs, the TF-IDF scores of all documents in the system are updated.
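A minimal sketch of that rule using try-with-resources, so the HDFS stream is closed even when reading fails (the helper name readFirstLine is illustrative, not part of the code above):
public static String readFirstLine(Configuration conf, Path path) throws IOException {
FileSystem fs = path.getFileSystem(conf);
//the stream is closed automatically at the end of the try block,
//even if readLine throws, so HDFS never sees an unclosed stream
try (FSDataInputStream in = fs.open(path)) {
return in.readLine();
}
}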