Hadoop Installation and Startup
Run hadoop namenode -format to format HDFS (the NameNode).
Run start-all.sh to start Hadoop.
Run jps to list the running Java processes:
2642 DataNode
3386 Jps
2538 NameNode
2860 JobTracker
2769 SecondaryNameNode
2982 TaskTracker
In a browser on the Linux machine:
Open hadoop:50070 to check whether the NameNode started successfully.
Open hadoop:50060 to check whether the TaskTracker started successfully.
Open hadoop:50030 to check whether the JobTracker started successfully.
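If the web interfaces come up, HDFS should also be reachable from code. Below is a minimal connectivity check, a sketch only: it assumes the NameNode RPC address hdfs://192.168.1.104:9000 that is used later in these notes (substitute your own fs.default.name), and the class name HdfsCheck is made up for illustration.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCheck {
    public static void main(String[] args) throws Exception {
        // Assumed NameNode address; adjust to your cluster's fs.default.name
        FileSystem fs = FileSystem.get(new URI("hdfs://192.168.1.104:9000"), new Configuration());
        // List the HDFS root directory; if this prints without errors, the NameNode is up
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
        fs.close();
    }
}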
Hadoop Source Code
If the Eclipse project reports the error "The method ***** of type AuthenticationFilter must override a superclass method", apply the following setting:
In Eclipse, go to Window -> Preferences -> Java -> Compiler (or "Configure project specific settings" on the project), change the compiler compliance level from Java 1.5 (5.0) to 1.6 (6.0), and answer "yes" when asked to rebuild the project.
HDFS Shell Operations
hadoop fs -ls /       list the contents of the HDFS root directory
hadoop fs -lsr /      recursively list the contents of the HDFS root directory
hadoop fs -mkdir /d1  create the directory d1 on HDFS
hadoop fs -put        upload a local file to HDFS
hadoop fs -get        download a file from HDFS to the local filesystem
hadoop fs -text       print the contents of an HDFS file as text
hadoop fs -rm         delete a file from HDFS
hadoop fs -rmr        recursively delete a directory from HDFS
hadoop fs -touchz     create an empty file on HDFS
Using Java to Operate on HDFS
File upload, download, and delete
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
import org.apache.hadoop.io.IOUtils;

public class FileOperator {
    // Address of HDFS
    public static final String HDFS_PATH = "hdfs://192.168.1.104:9000";

    public static void OutputFile(String path) throws Exception {
        // Register the hdfs:// URL handler so java.net.URL can read from HDFS
        URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
        final URL url = new URL(HDFS_PATH + path);
        final InputStream in = url.openStream();
        IOUtils.copyBytes(in, System.out, 1024, true);
    }

    public static void main(String[] args) throws Exception {
        final FileSystem fileSystem = FileSystem.get(new URI(HDFS_PATH), new Configuration());
        // Create a directory
        fileSystem.mkdirs(new Path("/dir2"));
        // Create a file
        final FSDataOutputStream out = fileSystem.create(new Path("/dir2/file"));
        // Upload a local file
        final FileInputStream in = new FileInputStream("C:/log.txt");
        IOUtils.copyBytes(in, out, 1024, true);
        OutputFile("/dir2/file");
        // Download the file
        final FSDataInputStream input = fileSystem.open(new Path(HDFS_PATH + "/dir2/file"));
        final FileOutputStream output = new FileOutputStream("C:/file.txt");
        IOUtils.copyBytes(input, output, 1024, true);
        // Delete the file when the FileSystem is closed (deleteOnExit)
        final boolean deleteFile = fileSystem.deleteOnExit(new Path("/dir2/file"));
    }
}
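The FileOperator example covers mkdir, upload, download, and delete. The listing and recursive-delete commands from the shell section (-ls, -lsr, -rmr) can be done through the same FileSystem API; the sketch below is illustrative only, it assumes the same HDFS_PATH as above, and the class DirOperator and the helper listRecursively are made up for this illustration, not part of Hadoop.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DirOperator {
    public static final String HDFS_PATH = "hdfs://192.168.1.104:9000";

    // Rough equivalent of "hadoop fs -lsr <path>": list a directory tree recursively
    static void listRecursively(FileSystem fs, Path path) throws Exception {
        for (FileStatus status : fs.listStatus(path)) {
            System.out.println(status.getPath());
            if (status.isDir()) {
                listRecursively(fs, status.getPath());
            }
        }
    }

    public static void main(String[] args) throws Exception {
        final FileSystem fs = FileSystem.get(new URI(HDFS_PATH), new Configuration());
        listRecursively(fs, new Path("/"));   // hadoop fs -lsr /
        fs.delete(new Path("/dir2"), true);   // hadoop fs -rmr /dir2 (recursive delete)
        fs.close();
    }
}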
RPC Communication
Overview
1. RPC is a client/server invocation model used to call methods across Java processes, i.e. to invoke methods on objects living in another Java process. One side is the server, the other is the client. The server provides the object for the client to call; the called method actually executes on the server side.
2. RPC is the foundation the Hadoop framework runs on; Hadoop is built on top of RPC.
3. The RPC source code is in org.apache.hadoop.ipc.
4. Hadoop RPC is used throughout Hadoop; all communication among Client, DataNode and NameNode relies on it.
5. For example, when we operate on HDFS we use the FileSystem class, which internally holds a DFSClient object that talks to the NameNode. At runtime, DFSClient creates a local proxy for the NameNode and works against that proxy; the proxy remotely invokes the NameNode's methods over the network and returns their results.
Server side:
public class MyServer {
    // Bind address for the RPC server (host or IP only, without an hdfs:// scheme)
    public static final String SERVER_ADDRESS = "192.168.1.104";
    public static final int SERVER_PORT = 9000;

    public static void main(String[] args) throws IOException {
        // Build an RPC server: the first argument is the object instance whose methods
        // clients will call; the next two are the address and port to listen on.
        final Server server = RPC.getServer(new MyBiz(), SERVER_ADDRESS, SERVER_PORT, new Configuration());
        server.start();
    }
}
Client side:
public class MyClient {
    public static void main(String[] args) throws IOException {
        // Build a client-side proxy object that implements the given protocol; calls on the
        // proxy go to the server at the given address. The protocol must be an interface
        // that extends VersionedProtocol. RPC.getProxy can be used instead of waitForProxy.
        final MyBizable proxy = (MyBizable) RPC.waitForProxy(
                MyBizable.class,
                MyBizable.VERSION,
                new InetSocketAddress(MyServer.SERVER_ADDRESS, MyServer.SERVER_PORT),
                new Configuration());
        final String result = proxy.hello("wanjun");
        System.out.println("Result of the client call: " + result);
        RPC.stopProxy(proxy);
    }
}
The server-side methods that the client accesses through the proxy object:
import org.apache.hadoop.ipc.VersionedProtocol;
public interface MyBizable extends VersionedProtocol {
    public static final long VERSION = 4324523423L;
    public String hello(String name);
}

public class MyBiz implements MyBizable {
    public String hello(String name) {
        return name;
    }

    @Override
    public long getProtocolVersion(String protocol, long clientVersion) throws IOException {
        return VERSION;
    }
}
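The client comment above mentions that RPC.getProxy can be used instead of RPC.waitForProxy. A minimal sketch of that variant follows; the class name MyClientGetProxy is made up for illustration, and unlike waitForProxy, getProxy fails immediately if the server is not reachable.

import java.io.IOException;
import java.net.InetSocketAddress;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.ipc.RPC;

public class MyClientGetProxy {
    public static void main(String[] args) throws IOException {
        // Obtain the proxy with getProxy instead of waitForProxy
        final MyBizable proxy = (MyBizable) RPC.getProxy(
                MyBizable.class,
                MyBizable.VERSION,
                new InetSocketAddress(MyServer.SERVER_ADDRESS, MyServer.SERVER_PORT),
                new Configuration());
        try {
            // MyBiz.hello simply echoes its argument, so this prints "wanjun"
            System.out.println(proxy.hello("wanjun"));
        } finally {
            RPC.stopProxy(proxy);
        }
    }
}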
HDFS File Read/Write Flow
Write Flow
Client
To upload a file to HDFS, a client normally calls DistributedFileSystem.create, which is implemented as follows:
public FSDataOutputStream create(Path f, FsPermission permission,
boolean overwrite,
int bufferSize, short replication, long blockSize,
Progressable progress) throws IOException {
return new FSDataOutputStream
(dfs.create(getPathName(f), permission,
overwrite, replication, blockSize, progress, bufferSize),
statistics);
}
This ultimately produces an FSDataOutputStream used to write data into the newly created file. Its member variable dfs is of type DFSClient, whose create function is:
public OutputStream create(String src,
FsPermission permission,
boolean overwrite,
short replication,
long blockSize,
Progressable progress,
int buffersize
) throws IOException {
checkOpen();
if (permission == null) {
permission = FsPermission.getDefault();
}
FsPermission masked = permission.applyUMask(FsPermission.getUMask(conf));
OutputStream result = new DFSOutputStream(src, masked,
overwrite, replication, blockSize, progress, buffersize,
conf.getInt("io.bytes.per.checksum", 512));
leasechecker.put(src, result);
return result;
}
Here a DFSOutputStream is constructed. In its constructor, an RPC call (see RPC.getServer in NameNode.java's initialize method and RPC.getProxy in DFSClient.java's createRPCNamenode method) invokes the NameNode's create method to create the file. The constructor also does one other important thing: it calls streamer.start(), starting the pipeline that will be used to write data; we will look at this closely as the data is written.
DFSOutputStream(String src, FsPermission masked, boolean overwrite,
short replication, long blockSize, Progressable progress,
int buffersize, int bytesPerChecksum) throws IOException {
this(src, blockSize, progress, bytesPerChecksum);
computePacketChunkSize(writePacketSize, bytesPerChecksum);
try {
namenode.create(
src, masked, clientName, overwrite, replication, blockSize);
} catch(RemoteException re) {
throw re.unwrapRemoteException(AccessControlException.class,
QuotaExceededException.class);
}
streamer.start();
}
NameNode
NameNode's create function calls namesystem.startFile, which in turn calls startFileInternal, implemented as follows:
private synchronized void startFileInternal(String src,
PermissionStatus permissions,
String holder,
String clientMachine,
boolean overwrite,
boolean append,
short replication,
long blockSize
) throws IOException {
......
//Create a new file in the under-construction state, with no data blocks attached to it yet
long genstamp = nextGenerationStamp();
INodeFileUnderConstruction newNode = dir.addFile(src, permissions,
replication, blockSize, holder, clientMachine, clientNode, genstamp);
......
}
Client
Now the client writes data into the newly created file, normally through FSDataOutputStream's write function, which eventually calls DFSOutputStream's writeChunk function.
By design, HDFS writes block data through a pipeline: the data is split into packets, and if three replicas are required they are written to DataNode 1, 2 and 3 as follows:
First, packet 1 is written to DataNode 1.
DataNode 1 then forwards packet 1 to DataNode 2, while the client writes packet 2 to DataNode 1.
DataNode 2 then forwards packet 1 to DataNode 3; meanwhile the client writes packet 3 to DataNode 1, and DataNode 1 forwards packet 2 to DataNode 2.
Packets are passed down the pipeline in this way until all the data has been written and replicated. The writeChunk function:
protected synchronized void writeChunk(byte[] b, int offset, int len, byte[] checksum) throws IOException {
//Create a packet and write the data into it
currentPacket = new Packet(packetSize, chunksPerPacket,
bytesCurBlock);
currentPacket.writeChecksum(checksum, 0, cklen);
currentPacket.writeData(b, offset, len);
currentPacket.numChunks++;
bytesCurBlock += len;
//If the packet is full, put it on the queue, ready to be sent
if (currentPacket.numChunks == currentPacket.maxChunks ||
bytesCurBlock == blockSize) {
......
dataQueue.addLast(currentPacket);
//Wake up the transfer thread (the DataStreamer) waiting on dataQueue
dataQueue.notifyAll();
currentPacket = null;
......
}
}
DataStreamer's run function is as follows:
public void run() {
while (!closed && clientRunning) {
Packet one = null;
synchronized (dataQueue) {
//Wait while there is no packet in the queue
while ((!closed && !hasError && clientRunning
&& dataQueue.size() == 0) || doSleep) {
try {
dataQueue.wait(1000);
} catch (InterruptedException e) {
}
doSleep = false;
}
try {
//Take the first packet in the queue
one = dataQueue.getFirst();
long offsetInBlock = one.offsetInBlock;
//Have the NameNode allocate a block, and create an output stream pointing at that block
if (blockStream == null) {
nodes = nextBlockOutputStream(src);
response = new ResponseProcessor(nodes);
response.start();
}
ByteBuffer buf = one.getBuffer();
//Move the packet from dataQueue to ackQueue to wait for acknowledgement
dataQueue.removeFirst();
dataQueue.notifyAll();
synchronized (ackQueue) {
ackQueue.addLast(one);
ackQueue.notifyAll();
}
//Write the data into the block on the DataNode through the stream created above
blockStream.write(buf.array(), buf.position(), buf.remaining());
if (one.lastPacketInBlock) {
blockStream.writeInt(0); //marks the end of this block
}
blockStream.flush();
} catch (Throwable e) {
}
}
......
}
An important function here is nextBlockOutputStream, implemented as follows:
private DatanodeInfo[] nextBlockOutputStream(String client) throws IOException {
LocatedBlock lb = null;
boolean retry = false;
DatanodeInfo[] nodes;
int count = conf.getInt("dfs.client.block.write.retries", 3);
boolean success;
do {
......
//The NameNode allocates the DataNodes and the block for the file
lb = locateFollowingBlock(startTime);
block = lb.getBlock();
nodes = lb.getLocations();
//Create the output stream to the DataNodes
success = createBlockOutputStream(nodes, clientName, false);
......
} while (retry && --count >= 0);
return nodes;
}
locateFollowingBlock calls namenode.addBlock(src, clientName) via RPC.
NameNode
NameNode's addBlock function is implemented as follows:
public LocatedBlock addBlock(String src,
String clientName) throws IOException {
LocatedBlock locatedBlock = namesystem.getAdditionalBlock(src, clientName);
return locatedBlock;
}
FSNamesystem's getAdditionalBlock is implemented as follows:
public LocatedBlock getAdditionalBlock(String src,
String clientName
) throws IOException {
long fileLength, blockSize;
int replication;
DatanodeDescriptor clientNode = null;
Block newBlock = null;
......
//Choose the DataNodes for the new block
DatanodeDescriptor targets[] = replicator.chooseTarget(replication,
clientNode,
null,
blockSize);
......
//Get the INodes of every component of the file path; the last one is the INode of the newly added file, which is in the under-construction state
INode[] pathINodes = dir.getExistingPathINodes(src);
int inodesLen = pathINodes.length;
INodeFileUnderConstruction pendingFile = (INodeFileUnderConstruction)
pathINodes[inodesLen - 1];
//Allocate a block for the file and record which DataNodes it will be written to
newBlock = allocateBlock(src, pathINodes);
pendingFile.setTargets(targets);
......
return new LocatedBlock(newBlock, targets, fileLength);
}
Client
After the DataNodes and the block have been allocated, createBlockOutputStream starts writing data.
private boolean createBlockOutputStream(DatanodeInfo[] nodes, String client,
boolean recoveryFlag) {
//Create a socket and connect to the first DataNode
InetSocketAddress target = NetUtils.createSocketAddr(nodes[0].getName());
s = socketFactory.createSocket();
int timeoutValue = 3000 * nodes.length + socketTimeout;
s.connect(target, timeoutValue);
s.setSoTimeout(timeoutValue);
s.setSendBufferSize(DEFAULT_DATA_SOCKET_SIZE);
long writeTimeout = HdfsConstants.WRITE_TIMEOUT_EXTENSION * nodes.length +
datanodeWriteTimeout;
DataOutputStream out = new DataOutputStream(
new BufferedOutputStream(NetUtils.getOutputStream(s, writeTimeout),
DataNode.SMALL_BUFFER_SIZE));
blockReplyStream = new DataInputStream(NetUtils.getInputStream(s));
//Write the operation header
out.writeShort( DataTransferProtocol.DATA_TRANSFER_VERSION );
out.write( DataTransferProtocol.OP_WRITE_BLOCK );
out.writeLong( block.getBlockId() );
out.writeLong( block.getGenerationStamp() );
out.writeInt( nodes.length );
out.writeBoolean( recoveryFlag );
Text.writeString( out, client );
out.writeBoolean(false);
out.writeInt( nodes.length - 1 );
//Note that this loop starts at 1, not 0: the information of the DataNodes other than the first one is sent to the first DataNode, which uses it to forward the data to the other two
for (int i = 1; i < nodes.length; i++) {
nodes[i].write(out);
}
checksum.writeHeader( out );
out.flush();
firstBadLink = Text.readString(blockReplyStream);
if (firstBadLink.length() != 0) {
throw new IOException("Bad connect ack with firstBadLink " + firstBadLink);
}
blockStream = out;
}
After the client has created the output stream in DataStreamer's run function, it calls blockStream.write to write the data to the DataNode.
DataNode
In the DataNode's DataXceiver, receiving the DataTransferProtocol.OP_WRITE_BLOCK opcode leads to a call to writeBlock:
private void writeBlock(DataInputStream in) throws IOException {
DatanodeInfo srcDataNode = null;
//Read the header
Block block = new Block(in.readLong(),
dataXceiverServer.estimateBlockSize, in.readLong());
int pipelineSize = in.readInt(); // num of datanodes in entire pipeline
boolean isRecovery = in.readBoolean(); // is this part of recovery?
String client = Text.readString(in); // working on behalf of this client
boolean hasSrcDataNode = in.readBoolean(); // is src node info present
if (hasSrcDataNode) {
srcDataNode = new DatanodeInfo();
srcDataNode.readFields(in);
}
int numTargets = in.readInt();
if (numTargets < 0) {
throw new IOException("Mislabelled incoming datastream.");
}
//Read the list of remaining DataNodes: the first DataNode receives the information of the second and third DataNodes, the second DataNode receives the information of the third, and so on
DatanodeInfo targets[] = new DatanodeInfo[numTargets];
for (int i = 0; i < targets.length; i++) {
DatanodeInfo tmp = new DatanodeInfo();
tmp.readFields(in);
targets[i] = tmp;
}
DataOutputStream mirrorOut = null; // stream to next target
DataInputStream mirrorIn = null; // reply from next target
DataOutputStream replyOut = null; // stream to prev target
Socket mirrorSock = null; // socket to next target
BlockReceiver blockReceiver = null; // responsible for data handling
String mirrorNode = null; // the name:port of next target
String firstBadLink = ""; // first datanode that failed in connection setup
try {
//Create a BlockReceiver. Its member DataInputStream in reads data from the client or the previous DataNode, its DataOutputStream mirrorOut writes data to the next DataNode, and its OutputStream out writes the data to local storage.
blockReceiver = new BlockReceiver(block, in,
s.getRemoteSocketAddress().toString(),
s.getLocalSocketAddress().toString(),
isRecovery, client, srcDataNode, datanode);
// get a connection back to the previous target
replyOut = new DataOutputStream(
NetUtils.getOutputStream(s, datanode.socketWriteTimeout));
//If this is not the last DataNode, open a socket connection to the next one
if (targets.length > 0) {
InetSocketAddress mirrorTarget = null;
// Connect to backup machine
mirrorNode = targets[0].getName();
mirrorTarget = NetUtils.createSocketAddr(mirrorNode);
mirrorSock = datanode.newSocket();
int timeoutValue = numTargets * datanode.socketTimeout;
int writeTimeout = datanode.socketWriteTimeout +
(HdfsConstants.WRITE_TIMEOUT_EXTENSION * numTargets);
mirrorSock.connect(mirrorTarget, timeoutValue);
mirrorSock.setSoTimeout(timeoutValue);
mirrorSock.setSendBufferSize(DEFAULT_DATA_SOCKET_SIZE);
//Create the stream that writes data to the next DataNode
mirrorOut = new DataOutputStream(
new BufferedOutputStream(
NetUtils.getOutputStream(mirrorSock, writeTimeout),
SMALL_BUFFER_SIZE));
mirrorIn = new DataInputStream(NetUtils.getInputStream(mirrorSock));
mirrorOut.writeShort( DataTransferProtocol.DATA_TRANSFER_VERSION );
mirrorOut.write( DataTransferProtocol.OP_WRITE_BLOCK );
mirrorOut.writeLong( block.getBlockId() );
mirrorOut.writeLong( block.getGenerationStamp() );
mirrorOut.writeInt( pipelineSize );
mirrorOut.writeBoolean( isRecovery );
Text.writeString( mirrorOut, client );
mirrorOut.writeBoolean(hasSrcDataNode);
if (hasSrcDataNode) { // pass src node information
srcDataNode.write(mirrorOut);
}
mirrorOut.writeInt( targets.length - 1 );
//Again the loop starts at 1: the information of all DataNodes after the next one is sent to the next DataNode
for ( int i = 1; i < targets.length; i++ ) {
targets[i].write( mirrorOut );
}
blockReceiver.writeChecksumHeader(mirrorOut);
mirrorOut.flush();
}
//Receive the block using the BlockReceiver
String mirrorAddr = (mirrorSock == null) ? null : mirrorNode;
blockReceiver.receiveBlock(mirrorOut, mirrorIn, replyOut,
mirrorAddr, null, targets.length);
......
} finally {
// close all opened streams
IOUtils.closeStream(mirrorOut);
IOUtils.closeStream(mirrorIn);
IOUtils.closeStream(replyOut);
IOUtils.closeSocket(mirrorSock);
IOUtils.closeStream(blockReceiver);
}
}
An important piece of logic in BlockReceiver's receiveBlock function:
void receiveBlock(
DataOutputStream mirrOut, // output to next datanode
DataInputStream mirrIn, // input from next datanode
DataOutputStream replyOut, // output to previous datanode
String mirrAddr, BlockTransferThrottler throttlerArg,
int numTargets) throws IOException {
......
//Keep receiving packets until the end of the block
while (receivePacket() > 0) {}
if (mirrorOut != null) {
try {
mirrorOut.writeInt(0); // mark the end of the block
mirrorOut.flush();
} catch (IOException e) {
handleMirrorOutError(e);
}
}
......
}
BlockReceiver's receivePacket function is as follows:
private int receivePacket() throws IOException {
//Receive one packet from the client or from the previous node
int payloadLen = readNextPacket();
buf.mark();
//read the header
buf.getInt(); // packet length
offsetInBlock = buf.getLong(); // get offset of packet in block
long seqno = buf.getLong(); // get seqno
boolean lastPacketInBlock = (buf.get() != 0);
int endOfHeader = buf.position();
buf.reset();
setBlockPosition(offsetInBlock);
//Forward the packet to the next DataNode
if (mirrorOut != null) {
try {
mirrorOut.write(buf.array(), buf.position(), buf.remaining());
mirrorOut.flush();
} catch (IOException e) {
handleMirrorOutError(e);
}
}
buf.position(endOfHeader);
int len = buf.getInt();
offsetInBlock += len;
int checksumLen = ((len + bytesPerChecksum - 1)/bytesPerChecksum)*
checksumSize;
int checksumOff = buf.position();
int dataOff = checksumOff + checksumLen;
byte pktBuf[] = buf.array();
buf.position(buf.limit()); // move to the end of the data.
......
//Write the data into the local block
out.write(pktBuf, dataOff, len);
/// flush entire packet before sending ack
flush();
// put in queue for pending acks
if (responder != null) {
((PacketResponder)responder.getRunnable()).enqueue(seqno,
lastPacketInBlock);
}
return payloadLen;
}
Hadoop in Action
Installing the Eclipse Hadoop Plugin
Mapper and Reducer
The map and reduce functions are implemented by the user; together they define the job itself.
map function: takes a key-value pair and produces a set of intermediate key-value pairs. The MapReduce framework passes all intermediate values that share the same key to a single reduce call.
reduce function: takes a key and the set of values associated with it, and merges those values into a smaller set of values (usually zero or one value).
By default the map input is one record per line, with each record passed in as the value; the map output is a stream of key-value pairs.
Example
// Count patent citations
public class PatentStat extends Configured implements Tool {

    // The first type parameter of Mapper must match the type of map's first argument, or the job will fail!
    public static class Map extends Mapper<LongWritable, Text, Text, Text> {
        // The map method receives the offset of the line in the file as its key, hence LongWritable
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Convert the plain-text input line to a String
            String line = value.toString();
            // Process each line; fields are separated by commas.
            /*
             * First approach: use a StringTokenizer
             * StringTokenizer tokenizerLine = new StringTokenizer(line, ",");
             * String citing = tokenizerLine.nextToken().trim();
             * String cited = tokenizerLine.nextToken().trim();
             * Text citingText = new Text(citing);
             * Text citedText = new Text(cited);
             * context.write(new Text(citedText), new Text(citingText));
             */
            // Second approach: use split
            String[] citing = line.split(",");
            context.write(new Text(citing[1]), new Text(citing[0]));
        }
    }

    // The reducer copies the input key to the output key and joins all
    // citing patent numbers for that key into a comma-separated list
    public static class Reduce extends Reducer<Text, Text, Text, Text> {
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            String csv = "";
            for (Text value : values) {
                if (csv.length() > 0) {
                    csv += ",";
                }
                csv += value.toString();
            }
            context.write(key, new Text(csv));
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "Patent Statistic");
        job.setJarByClass(PatentStat.class);

        // Set the input and output paths
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        // Set the Mapper and Reducer classes
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);

        // Set the output key/value types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // This line matters when submitting to a remote JobTracker:
        // conf.set("mapred.job.tracker", "192.168.1.2:9001");
        if (args.length != 2) {
            System.err.println("Usage: PatentStat <in> <out>");
            System.exit(2);
        }
        int res = ToolRunner.run(new Configuration(), new PatentStat(), args);
    }
}
Using Streaming with Scripts
The mapper and reducer read user data from standard input, process it line by line, and send the results to standard output. The Streaming utility creates the MapReduce job, submits it to the TaskTrackers, and monitors the job while it runs.
When an executable or a script is used as the mapper, each mapper task launches it as a separate process when the task initializes. While the mapper task runs, it splits its input into lines and feeds each line to the standard input of the process; at the same time it collects the process's standard output and turns each output line into a key/value pair, which becomes the mapper's output. By default, the part of a line before the first tab is the key and the part after it (excluding the tab) is the value; if there is no tab, the whole line is the key and the value is null.
The reducer works the same way.
Hadoop Streaming Usage
Usage: $HADOOP_HOME/bin/hadoop jar
$HADOOP_HOME/contrib/streaming/hadoop-*-streaming.jar [options]
options:
(1) -input: path of the input files
(2) -output: path of the output directory
(3) -mapper: the user-supplied mapper program, either an executable or a script
(4) -reducer: the user-supplied reducer program, either an executable or a script
(5) -file: ship a file with the job, e.g. a file the mapper or reducer needs, such as a configuration file or a dictionary
(6) -partitioner: a user-defined partitioner program
(7) -combiner: a user-defined combiner program (must be written in Java)
(8) -D: set job properties (previously -jobconf)
Example:
Mapper.py
#!/usr/bin/env python
import sys

# maps words to their counts
word2count = {}

# input comes from STDIN (standard input)
for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()
    # split the line into words while removing any empty strings
    words = filter(lambda word: word, line.split())
    # increase counters
    for word in words:
        # write the results to STDOUT (standard output);
        # what we output here will be the input for the
        # Reduce step, i.e. the input for reducer.py
        #
        # tab-delimited; the trivial word count is 1
        print '%s\t%s' % (word, 1)
Reducer.py
#!/usr/bin/env python
from operator import itemgetter
import sys

# maps words to their counts
word2count = {}

# input comes from STDIN
for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()

    # parse the input we got from mapper.py
    word, count = line.split()
    # convert count (currently a string) to int
    try:
        count = int(count)
        word2count[word] = word2count.get(word, 0) + count
    except ValueError:
        # count was not a number, so silently
        # ignore/discard this line
        pass

# sort the words lexicographically;
#
# this step is NOT required, we just do it so that our
# final output will look more like the official Hadoop
# word count examples
sorted_word2count = sorted(word2count.items(), key=itemgetter(0))

# write the results to STDOUT (standard output)
for word, count in sorted_word2count:
    print '%s\t%s' % (word, count)
Run:
[root@Hadoop streaming]# hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-streaming-1.1.2.jar \
-mapper /home/hadoop/streaming/mapper.py \
-reducer /home/hadoop/streaming/reducer.py \
-input /mapreduce/wordcount/input/* \
-output /python-output \
-jobconf mapred.reduce.tasks=1
Note: the -input and -output paths here are paths on the server side (HDFS paths), not local paths.
Implementing a Join with the DataJoin Package
public class DataJoin extends Configured implements Tool {

    // The mapper's main job is to package each record so that it goes to the same reducer as
    // other records with the same group key; DataJoinMapperBase does all of the packaging work.
    public static class MapClass extends DataJoinMapperBase {

        // Called when the task starts, to generate the tag for this input source.
        // Here the file name itself is used as the tag; the tag is stored in inputTag.
        protected Text generateInputTag(String inputFile) {
            // String datasource = inputFile.split("-")[0];
            return new Text(inputFile);
        }

        // Extract the group key, i.e. the field the tables are joined on
        protected Text generateGroupKey(TaggedMapOutput aRecord) {
            String line = ((Text) aRecord.getData()).toString();
            String[] tokens = line.split(",");
            String groupKey = tokens[0];
            return new Text(groupKey);
        }

        // Tag the map output
        protected TaggedMapOutput generateTaggedMapOutput(Object value) {
            TaggedWritable retv = new TaggedWritable((Text) value);
            retv.setTag(this.inputTag);
            return retv;
        }
    }

    // Our subclass only implements combine, which filters out unwanted combinations to obtain the
    // desired join (inner join, left join, and so on) and formats the result
    // (field order, de-duplication, etc.).
    public static class Reduce extends DataJoinReducerBase {

        /* For a join of two data sources, tags has length at most 2 and values has the same length as tags.
         * For example:
         *   tags:   {"Customers", "Orders"}
         *   values: {"3,Jose Madrize, 13344409898", "A-1, 13400.00, 2014-12-12"}
         */
        protected TaggedMapOutput combine(Object[] tags, Object[] values) {
            if (tags.length < 2) return null; // this makes it an inner join
            String joinedStr = "";
            for (int i = 0; i < values.length; i++) {
                if (i > 0) joinedStr += ",";
                TaggedWritable tw = (TaggedWritable) values[i];
                String line = ((Text) tw.getData()).toString();
                // Split the record in two at the first comma and drop the group key
                String[] tokens = line.split(",", 2);
                joinedStr += tokens[1];
            }
            TaggedWritable retv = new TaggedWritable(new Text(joinedStr));
            retv.setTag((Text) tags[0]);
            return retv;
        }
    }

    // TaggedMapOutput is an abstract type that wraps a tag together with the record data.
    // It is used here as the output value type of DataJoinMapperBase, so it must implement
    // Writable, hence the two serialization methods below.
    public static class TaggedWritable extends TaggedMapOutput {

        private Writable data;

        public TaggedWritable() {
            this.tag = new Text("");
            this.data = new Text("");
        }

        public TaggedWritable(Writable data) {
            this.tag = new Text("");
            this.data = data;
        }

        public Writable getData() {
            return data;
        }

        public void setData(Writable data) {
            this.data = data;
        }

        public void write(DataOutput out) throws IOException {
            this.tag.write(out);
            this.data.write(out);
        }

        public void readFields(DataInput in) throws IOException {
            this.tag.readFields(in);
            this.data.readFields(in);
        }
    }

    public int run(String[] args) throws Exception {
        Configuration conf = new Configuration();
        JobConf job = new JobConf(conf, DataJoin.class);

        Path in = new Path(args[0]);
        Path out = new Path(args[1]);
        FileInputFormat.setInputPaths(job, in);
        FileOutputFormat.setOutputPath(job, out);

        job.setJobName("DataJoin");
        job.setMapperClass(MapClass.class);
        job.setReducerClass(Reduce.class);

        job.setInputFormat(TextInputFormat.class);
        job.setOutputFormat(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(TaggedWritable.class);
        job.set("mapred.textoutputformat.separator", ",");

        JobClient.runJob(job);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new DataJoin(), args);
        System.exit(res);
    }
}