MapReduce Design Patterns-chapter 5

CHAPTER 5: Join Patterns

A Refresher on Joins

INNER JOIN

With this type of join, records from tables A and B that contain identical values for a given foreign key f are brought together, such that the result table contains all the columns of both A and B. Records whose value of f appears in A but not in B, or vice versa, are not represented in the result of the join.

OUTER JOIN

Unlike an inner join, an outer join also keeps records whose foreign key is not present in both tables.
In a left outer join, the unmatched records in the "left" table are kept in the final table, with null values in the columns of the right table that did not match on the foreign key. Unmatched records present in the right table are discarded. A right outer join is the same as a left outer join, except that the right table's records are kept and the left table's columns are null where appropriate. A full outer join contains all unmatched records from both tables, effectively a combination of a left and a right outer join.

ANTIJOIN

An antijoin is a full outer join minus the inner join: only records whose foreign key appears in exactly one of the two tables are included in the result.

CARTESIAN PRODUCT

A Cartesian product or cross product takes each record from a table and matches it up with every record from another table. If table X contains n records and table Y contains m records, the cross product of X and Y, denoted X × Y, contains n × m records.
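
As a quick worked illustration (made-up records, not from the book), suppose A and B join on the foreign key f:

A: (f=1, x=a), (f=2, x=b)        B: (f=2, y=c), (f=3, y=d)

Inner join:        (2, b, c)
Left outer join:   (1, a, null), (2, b, c)
Right outer join:  (2, b, c), (3, null, d)
Full outer join:   (1, a, null), (2, b, c), (3, null, d)
Antijoin:          (1, a, null), (3, null, d)
Cartesian product: (1,a,2,c), (1,a,3,d), (2,b,2,c), (2,b,3,d), i.e. 2 x 2 = 4 records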

Reduce Side Join

In the following example, two mapper classes are created: one for the user data and one for the comments. Each mapper class outputs the user ID as the foreign key, and the entire record as the value along with a single character to flag which record came from what set. The reducer then copies all values for each group in memory, keeping track of which record came from what data set. The records are then joined together and output.

Problem: Given a set of user information and a list of users' comments, enrich each comment with the information about the user who created the comment.

Driver Code
...
// Use MultipleInputs to set which input uses what mapper
// This will keep parsing of each data set separate from a logical standpoint
// The first two elements of the args array are the two inputs
MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class,
        UserJoinMapper.class);
MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class,
        CommentJoinMapper.class);
job.getConfiguration().set("join.type", args[2]);
...

Map Code
public static class UserJoinMapper extends Mapper<Object, Text, Text, Text> {
    private Text outkey = new Text();
    private Text outvalue = new Text();
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // Parse the input string into a nice map
        Map<String, String> parsed =
                MRDPUtils.transformXmlToMap(value.toString());
        String userId = parsed.get("Id");
        
        // The foreign join key is the user ID
        outkey.set(userId);
        // Flag this record for the reducer and then output
        outvalue.set("A" + value.toString());
        context.write(outkey, outvalue);
    }
}

public static class CommentJoinMapper extends
        Mapper<Object, Text, Text, Text> {
    private Text outkey = new Text();
    private Text outvalue = new Text();
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        Map<String, String> parsed =
                MRDPUtils.transformXmlToMap(value.toString());
        // The foreign join key is the user ID
        outkey.set(parsed.get("UserId"));
        // Flag this record for the reducer and then output
        outvalue.set("B" + value.toString());
        context.write(outkey, outvalue);
    }
}
public static class UserJoinReducer extends Reducer<Text, Text, Text, Text> {
    private static final Text EMPTY_TEXT = new Text("");
    private ArrayList<Text> listA = new ArrayList<Text>();
    private ArrayList<Text> listB = new ArrayList<Text>();
    private String joinType = null;
    public void setup(Context context) {
        // Get the type of join from our configuration
        joinType = context.getConfiguration().get("join.type");
    }
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Clear our lists
        listA.clear();
        listB.clear();
        // Iterate through all our values, binning each record based on what
        // it was tagged with.  Make sure to remove the tag!
        for (Text tmp : values) {
            if (tmp.charAt(0) == 'A') {
                listA.add(new Text(tmp.toString().substring(1)));
            } else if (tmp.charAt(0) == 'B') {
                listB.add(new Text(tmp.toString().substring(1)));
            }
        }
        // Execute our join logic now that the lists are filled
        executeJoinLogic(context);
    }
    private void executeJoinLogic(Context context)
            throws IOException, InterruptedException {
        ...
        if (joinType.equalsIgnoreCase("inner")) {
            // If both lists are not empty, join A with B
            if (!listA.isEmpty() && !listB.isEmpty()) {
                for (Text A : listA) {
                    for (Text B : listB) {
                        context.write(A, B);
                    }
                }
            }
        } else if (joinType.equalsIgnoreCase("leftouter")) {
            // For each entry in A,
            for (Text A : listA) {
                // If list B is not empty, join A and B
                if (!listB.isEmpty()) {
                    for (Text B : listB) {
                        context.write(A, B);
                    }
                } else {
                    // Else, output A by itself
                    context.write(A, EMPTY_TEXT);
                }
            }
        } else if (joinType.equalsIgnoreCase("rightouter")) {
            // For each entry in B,
            for (Text B : listB) {
                // If list A is not empty, join A and B
                if (!listA.isEmpty()) {
                    for (Text A : listA) {
                        context.write(A, B);
                    }
                } else {
                    // Else, output B by itself
                    context.write(EMPTY_TEXT, B);
                }
            }
        } else if (joinType.equalsIgnoreCase("fullouter")) {
            // If list A is not empty
            if (!listA.isEmpty()) {
                // For each entry in A
                for (Text A : listA) {
                    // If list B is not empty, join A with B
                    if (!listB.isEmpty()) {
                        for (Text B : listB) {
                            context.write(A, B);
                        }
                    } else {
                        // Else, output A by itself
                        context.write(A, EMPTY_TEXT);
                    }
                }
            } else {
                // If list A is empty, just output B
                for (Text B : listB) {
                    context.write(EMPTY_TEXT, B);
                }
            }
        } else if (joinType.equalsIgnoreCase("anti")) {
            // If exactly one of the two lists is empty
            if (listA.isEmpty() ^ listB.isEmpty()) {
                // Output whichever list is populated, with empty values on the
                // other side; the XOR check above guarantees that exactly one
                // of these two loops actually emits anything
                for (Text A : listA) {
                    context.write(A, EMPTY_TEXT);
                }
                for (Text B : listB) {
                    context.write(EMPTY_TEXT, B);
                }
            }
        }
    }
}

Reduce Side Join with Bloom Filter

Say we are only interested in enriching comments with reputable users, i.e., users with a reputation greater than 1,500. The user mapper can simply drop any user record whose reputation is 1,500 or less, while the comment mapper consults a Bloom filter trained on the IDs of reputable users and only emits comments that might match, cutting down the data shuffled to the reducers.
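
A minimal sketch of the comment-side mapper under that approach. Assumptions not in the original listing: the class name, that a Bloom filter of reputable user IDs was serialized with its Writable methods into a single file already placed in the DistributedCache, and that BloomFilter and Key come from org.apache.hadoop.util.bloom; MRDPUtils.transformXmlToMap is reused from the listings above.

public static class BloomFilteredCommentJoinMapper extends
        Mapper<Object, Text, Text, Text> {
    private BloomFilter bfilter = new BloomFilter();
    private Text outkey = new Text();
    private Text outvalue = new Text();

    public void setup(Context context) throws IOException {
        // Deserialize the Bloom filter of reputable user IDs from the cache
        Path[] files = DistributedCache.getLocalCacheFiles(
                context.getConfiguration());
        DataInputStream in = new DataInputStream(
                new FileInputStream(files[0].toString()));
        bfilter.readFields(in);
        in.close();
    }

    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        Map<String, String> parsed =
                MRDPUtils.transformXmlToMap(value.toString());
        String userId = parsed.get("UserId");
        // Only emit the comment if its user ID might be a reputable user
        if (bfilter.membershipTest(new Key(userId.getBytes()))) {
            outkey.set(userId);
            outvalue.set("B" + value.toString());
            context.write(outkey, outvalue);
        }
    }
}

False positives let a few comments from non-reputable users through, but the reduce-side join simply fails to match and discards them; false negatives cannot occur, so no valid joins are lost.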

Replicated Join

All the data sets except the very large one are essentially read into memory during the setup phase of each map task, which is limited by the JVM heap. If you can live within this limitation, you get a drastic benefit because there is no reduce phase at all, and therefore no shuffling or sorting. The join is done entirely in the map phase, with the very large data set being the input for the MapReduce job.
This pattern applies only to an inner join or a left outer join, with the large input data set being the "left" part of the operation, and only when all of the data sets except the large one fit into the main memory of each map task.
The mapper is responsible for reading all files from the distributed cache during the setup phase and storing them into in-memory lookup tables. After this setup phase completes, the mapper processes each record and joins it with all the data stored in-memory. 
Problem: Given a small set of user information and a large set of comments, enrich the comments with user information data.
public static class ReplicatedJoinMapper extends
        Mapper<Object, Text, Text, Text> {
    private static final Text EMPTY_TEXT = new Text("");
    private HashMap<String, String> userIdToInfo = new HashMap<String, String>();
    private Text outvalue = new Text();
    private String joinType = null;
    public void setup(Context context) throws IOException,
            InterruptedException {
        Path[] files =
                DistributedCache.getLocalCacheFiles(context.getConfiguration());
        // Read all files in the DistributedCache
        for (Path p : files) {
            BufferedReader rdr = new BufferedReader(
                    new InputStreamReader(
                            new GZIPInputStream(new FileInputStream(
                                    new File(p.toString())))));
            String line = null;
            // For each record in the user file
            while ((line = rdr.readLine()) != null) {
            // Get the user ID for this record
                Map<String, String> parsed = transformXmlToMap(line);
                String userId = parsed.get("Id");
                // Map the user ID to the record
                userIdToInfo.put(userId, line);
            }
        }
        // Get the join type from the configuration
        joinType = context.getConfiguration().get("join.type");
    }
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        Map<String, String> parsed = transformXmlToMap(value.toString());
        String userId = parsed.get("UserId");
        String userInformation = userIdToInfo.get(userId);
        // If the user information is not null, then output
        if (userInformation != null) {
            outvalue.set(userInformation);
            context.write(value, outvalue);
        } else if (joinType.equalsIgnoreCase("leftouter")) {
            // If we are doing a left outer join,
            // output the record with an empty value
            context.write(value, EMPTY_TEXT);
        }
    }
}
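
The mapper above assumes the small user data set is already in the DistributedCache. A minimal driver sketch under that assumption; the class name ReplicatedJoinDriver, the argument order (user directory, comment directory, join type, output directory), and the gzipped file name are hypothetical:

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "Replicated Join");
    job.setJarByClass(ReplicatedJoinDriver.class);
    job.getConfiguration().set("join.type", args[2]);

    job.setMapperClass(ReplicatedJoinMapper.class);
    job.setNumReduceTasks(0); // map-only job: no shuffle or sort

    TextInputFormat.addInputPath(job, new Path(args[1]));   // large comment data set
    TextOutputFormat.setOutputPath(job, new Path(args[3])); // output directory

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);

    // Ship the small, gzipped user data set to every map task
    // (file name is hypothetical)
    DistributedCache.addCacheFile(
            new Path(args[0] + "/users.xml.gz").toUri(), job.getConfiguration());

    System.exit(job.waitForCompletion(true) ? 0 : 1);
}

Because there is no reduce phase, the mapper's output is written directly to HDFS as soon as each record is joined.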

Composite Join

All the data sets must first be sorted by foreign key, partitioned by foreign key, and read in a very particular manner in order to use this type of join.

Problem: Given two large formatted data sets of user information and comments, enrich the comments with user information data.

Driver Code
public static void main(String[] args) throws Exception {
    Path userPath = new Path(args[0]);
    Path commentPath = new Path(args[1]);
    Path outputDir = new Path(args[2]);
    String joinType = args[3];
    JobConf conf = new JobConf("CompositeJoin");
    conf.setJarByClass(CompositeJoinDriver.class);
    conf.setMapperClass(CompositeMapper.class);
    conf.setNumReduceTasks(0);
    // Set the input format class to a CompositeInputFormat class.
    // The CompositeInputFormat will parse all of our input files and output
    // records to our mapper.
    conf.setInputFormat(CompositeInputFormat.class);
    // The composite input format join expression will set how the records
    // are going to be read in, and in what input format.
    conf.set("mapred.join.expr", CompositeInputFormat.compose(joinType,
            KeyValueTextInputFormat.class, userPath, commentPath));
    TextOutputFormat.setOutputPath(conf, outputDir);
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);
    RunningJob job = JobClient.runJob(conf);
    while (!job.isComplete()) {
        Thread.sleep(1000);
    }
    System.exit(job.isSuccessful() ? 0 : 1);
}
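
For reference, CompositeInputFormat.compose builds the mapred.join.expr string from the join type, the input format class, and the input paths. With an inner join and illustrative paths, the resulting expression looks roughly like the following (the paths are made up and the exact formatting can vary by Hadoop version):

inner(tbl(org.apache.hadoop.mapred.KeyValueTextInputFormat,"/data/users"),
      tbl(org.apache.hadoop.mapred.KeyValueTextInputFormat,"/data/comments"))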

Map Code
public static class CompositeMapper extends MapReduceBase implements
        Mapper<Text, TupleWritable, Text, Text> {
    public void map(Text key, TupleWritable value,
            OutputCollector<Text, Text> output,
            Reporter reporter) throws IOException {
        // Get the first two elements in the tuple and output them
        output.collect((Text) value.get(0), (Text) value.get(1));
    }
}

Cartesian Product

Most use cases for a Cartesian product involve some sort of pair-wise similarity analysis, such as comparing documents or media to one another.
Applicability
Use a Cartesian product when:
• You want to analyze relationships between all pairs of individual records.
• You’ve exhausted all other means to solve this problem.
• You have no time constraints on execution time.
Problem: Given a groomed data set of StackOverflow comments, find pairs of comments that are similar based on the number of like words between each pair.

The input format's getSplits method partitions the input data; each resulting split (a byte-oriented view of the data) is wrapped in a RecordReader (a record-oriented view), which the framework iterates over, calling map once for every record.

Input format code:
public static class CartesianInputFormat extends FileInputFormat {
    public static final String LEFT_INPUT_FORMAT = "cart.left.inputformat";
    public static final String LEFT_INPUT_PATH = "cart.left.path";
    public static final String RIGHT_INPUT_FORMAT = "cart.right.inputformat";
    public static final String RIGHT_INPUT_PATH = "cart.right.path";
    public static void setLeftInputInfo(JobConf job,
            Class inputFormat, String inputPath) {
        job.set(LEFT_INPUT_FORMAT, inputFormat.getCanonicalName());
        job.set(LEFT_INPUT_PATH, inputPath);
    }
    public static void setRightInputInfo(JobConf job,
            Class inputFormat, String inputPath) {
        job.set(RIGHT_INPUT_FORMAT, inputFormat.getCanonicalName());
        job.set(RIGHT_INPUT_PATH, inputPath);
    }
    public InputSplit[] getSplits(JobConf conf, int numSplits)
            throws IOException {
        try {
            // Get the input splits from both the left and right data sets,
            // using the configured input format for each side
            InputSplit[] leftSplits = getInputSplits(conf,
                    conf.get(LEFT_INPUT_FORMAT), conf.get(LEFT_INPUT_PATH),
                    numSplits);
            InputSplit[] rightSplits = getInputSplits(conf,
                    conf.get(RIGHT_INPUT_FORMAT), conf.get(RIGHT_INPUT_PATH),
                    numSplits);
            // Create our CompositeInputSplits, size equal to
            // left.length * right.length
            CompositeInputSplit[] returnSplits =
                    new CompositeInputSplit[leftSplits.length *
                        rightSplits.length];
            int i = 0;
            // For each of the left input splits
            for (InputSplit left : leftSplits) {
                // For each of the right input splits
                for (InputSplit right : rightSplits) {
                    // Create a new composite input split composing the two
                    returnSplits[i] = new CompositeInputSplit(2);
                    returnSplits[i].add(left);
                    returnSplits[i].add(right);
                    ++i;
                }
            }
            // Return the composite splits
            LOG.info("Total splits to process: " + returnSplits.length);
            return returnSplits;
        } catch (ClassNotFoundException e) {
            // getInputSplits loads the input format class reflectively;
            // surface that checked failure as an IOException
            throw new IOException(e);
        }
    }
    public RecordReader getRecordReader(InputSplit split, JobConf conf,
            Reporter reporter) throws IOException {
            // Create a new instance of the Cartesian record reader
            return new CartesianRecordReader((CompositeInputSplit) split,
                    conf, reporter);
    }
    private InputSplit[] getInputSplits(JobConf conf,
            String inputFormatClass, String inputPath, int numSplits)
            throws ClassNotFoundException, IOException {
        // Create a new instance of the input format
        FileInputFormat inputFormat = (FileInputFormat) ReflectionUtils
                .newInstance(Class.forName(inputFormatClass), conf);
        // Set the input path for this data set
        inputFormat.setInputPaths(conf, inputPath);
        // Use the input format's own logic to split the files under
        // inputPath into numSplits splits
        return inputFormat.getSplits(conf, numSplits);
    }
}

Driver Code:
public static void main(String[] args) throws IOException,
        InterruptedException, ClassNotFoundException {
    // Configure the join type
    JobConf conf = new JobConf("Cartesian Product");
    conf.setJarByClass(CartesianProduct.class);
    conf.setMapperClass(CartesianMapper.class);
    conf.setNumReduceTasks(0);
    conf.setInputFormat(CartesianInputFormat.class);
    // Configure the input format: both the left and right inputs read the
    // same comment data set, producing a self-Cartesian product
    CartesianInputFormat.setLeftInputInfo(conf, TextInputFormat.class, args[0]);
    CartesianInputFormat.setRightInputInfo(conf, TextInputFormat.class, args[0]);
    TextOutputFormat.setOutputPath(conf, new Path(args[1]));
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);
    RunningJob job = JobClient.runJob(conf);
    while (!job.isComplete()) {
        Thread.sleep(1000);
    }
    System.exit(job.isSuccessful() ? 0 : 1);
}

Record reader code.
During task setup, getRecordReader is called by the framework to return the CartesianRecordReader. The constructor of this class creates two separate record reader objects, one for each split.
The first call to next reads the first record from the left data set for the mapper input key, and the first record from the right data set as the mapper input value. This key/value pair is then given to the mapper for processing by the framework.

Subsequent calls to next continue reading records from the right record reader, allowing the mapper to process them, until the right reader reports that it has no more. At that point, a flag is set and the do-while loop comes back around, reading the next record from the left data set. The right record reader is then re-created, and the process continues.

This process continues until the left record reader returns false, indicating there are no more key/value pairs. At that point, the record reader has delivered the full Cartesian product of the two input splits to the map task.
public static class CartesianRecordReader<K1, V1, K2, V2> implements
        RecordReader<Text, Text> {
    // Record readers to get key value pairs
    private RecordReader leftRR = null, rightRR = null;
    // Store configuration to re-create the right record reader
    private FileInputFormat rightFIF;
    private JobConf rightConf;
    private InputSplit rightIS;
    private Reporter rightReporter;
    // Helper variables
    private K1 lkey;
    private V1 lvalue;
    private K2 rkey;
    private V2 rvalue;
    private boolean goToNextLeft = true, alldone = false;
    public CartesianRecordReader(CompositeInputSplit split, JobConf conf,
            Reporter reporter) throws IOException {
        this.rightConf = conf;
        this.rightIS = split.get(1);
        this.rightReporter = reporter;
        try {
            // Create left record reader
            FileInputFormat leftFIF = (FileInputFormat) ReflectionUtils
                    .newInstance(Class.forName(conf
                            .get(CartesianInputFormat.LEFT_INPUT_FORMAT)), conf);
            leftRR = leftFIF.getRecordReader(split.get(0), conf, reporter);
            // Create right record reader
            rightFIF = (FileInputFormat) ReflectionUtils.newInstance(Class
                    .forName(conf
                            .get(CartesianInputFormat.RIGHT_INPUT_FORMAT)), conf);
            rightRR = rightFIF.getRecordReader(rightIS, rightConf, rightReporter);
        } catch (ClassNotFoundException e) {
            // The configured input format class could not be loaded
            throw new IOException(e);
        }

        // Create key value pairs for parsing
        lkey = (K1) this.leftRR.createKey();
        lvalue = (V1) this.leftRR.createValue();
        rkey = (K2) this.rightRR.createKey();
        rvalue = (V2) this.rightRR.createValue();
    }
    public boolean next(Text key, Text value) throws IOException {
        do {
            // If we are to go to the next left key/value pair
            if (goToNextLeft) {
                // Read the next key value pair, false means no more pairs
                if (!leftRR.next(lkey, lvalue)) {
                    // If no more, then this task is nearly finished
                    alldone = true;
                    break;
                } else {
                    // If we aren't done, set the value to the key and set
                    // our flags
                    key.set(lvalue.toString());
                    goToNextLeft = alldone = false;
                    // Reset the right record reader
                    this.rightRR = this.rightFIF.getRecordReader(
                            this.rightIS, this.rightConf,
                            this.rightReporter);
                }
            }
            // Read the next key value pair from the right data set
            if (rightRR.next(rkey, rvalue)) {
                // If success, set the value
                value.set(rvalue.toString());
            } else {
                // Otherwise, this right data set is complete
                // and we should go to the next left pair
                goToNextLeft = true;
            }
            // This loop will continue if we finished reading key/value
            // pairs from the right data set
        } while (goToNextLeft);
        // Return true if a key/value pair was read, false otherwise
        return !alldone;
    }
}
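
The listing above shows only next; the old-API RecordReader interface also requires createKey, createValue, getPos, close, and getProgress, which the original listing omits. One reasonable way to fill them in (a sketch, not from the original) is to delegate to the underlying readers inside CartesianRecordReader:

    public Text createKey() {
        return new Text();
    }

    public Text createValue() {
        return new Text();
    }

    public long getPos() throws IOException {
        // Report the position of the left reader
        return leftRR.getPos();
    }

    public void close() throws IOException {
        leftRR.close();
        rightRR.close();
    }

    public float getProgress() throws IOException {
        // Progress of the outer (left) loop is a reasonable approximation
        return leftRR.getProgress();
    }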

Map Code:
public static class CartesianMapper extends MapReduceBase implements
        Mapper<Text, Text, Text, Text> {
    private Text outkey = new Text();
    public void map(Text key, Text value,
            OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        // If the two comments are not equal
        if (!key.toString().equals(value.toString())) {
            String[] leftTokens = key.toString().split("\\s");
            String[] rightTokens = value.toString().split("\\s");
            HashSet<String> leftSet = new HashSet<String>(
                    Arrays.asList(leftTokens));
            HashSet<String> rightSet = new HashSet<String>(
                    Arrays.asList(rightTokens));
            int sameWordCount = 0;
            StringBuilder words = new StringBuilder();
            for (String s : leftSet) {
                if (rightSet.contains(s)) {
                    words.append(s + ",");
                    ++sameWordCount;
                }
            }
            // If there are at least three words, output
            if (sameWordCount > 2) {
                outkey.set(words + "\t" + key);
                output.collect(outkey, value);
            }
        }
    }
}


