2,000 High-Frequency Specialized Words in Software Development, with Selected Example Sentences (Part 1)

superword is an English-word analysis tool implemented in Java. It mainly studies patterns of transformation between English words that sound alike or look alike, prefix and suffix patterns, similarity relationships between words, and so on.
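
superword's actual implementation is not reproduced in this article; as a minimal illustration of the "similarity between words" analysis mentioned above, here is a self-contained Java sketch that pairs up words within a small Levenshtein edit distance. The class name, the sample word list, and the distance threshold of 2 are assumptions made for this example only, not superword's real API.

    import java.util.*;

    // Illustrative sketch only: group words that look alike by edit distance.
    // superword's real analysis (sound-alike rules, prefix/suffix rules) is richer.
    public class WordSimilaritySketch {

        // Classic dynamic-programming Levenshtein distance between two words.
        static int editDistance(String a, String b) {
            int[][] d = new int[a.length() + 1][b.length() + 1];
            for (int i = 0; i <= a.length(); i++) d[i][0] = i;
            for (int j = 0; j <= b.length(); j++) d[0][j] = j;
            for (int i = 1; i <= a.length(); i++) {
                for (int j = 1; j <= b.length(); j++) {
                    int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                    d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                       d[i - 1][j - 1] + cost);
                }
            }
            return d[a.length()][b.length()];
        }

        // Report pairs of words that differ by at most two edits.
        public static void main(String[] args) {
            List<String> words = Arrays.asList("hadoop", "hbase", "hdfs", "http", "html");
            for (int i = 0; i < words.size(); i++) {
                for (int j = i + 1; j < words.size(); j++) {
                    int dist = editDistance(words.get(i), words.get(j));
                    if (dist <= 2) {
                        System.out.printf("%s ~ %s (distance %d)%n",
                                words.get(i), words.get(j), dist);
                    }
                }
            }
        }
    }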

1. Matching texts for the word hadoop:

 

    1. Subash D'Souza is a professional software developer with strong expertise in crunching big data using Hadoop/HBase with Hive/Pig. Apache Flume Distributed Log Collection for Hadoop

 

    1. The growing pains you’re facing have been faced by other people and there are systems to handle it: Hadoop and HBase. Monitoring with Ganglia

 

    1. BigTable, Hadoop, SimpleDB, and memcached restrict object lookups by key for exactly these reasons. CouchDB The Definitive Guide

 

    1. The success of MR led to a wide range of third-party implementations (notably open-source Hadoop [15]), and a number of hybrid systems that combine parallel DBMSs with MR, offered by vendors like Aster, Cloudera, Greenplum, and Vertica. Dremel - Interactive Analysis of WebScale Datasets

 

    1. As such, Hadoop has similar limitations as MapReduce when developing multi-stage pipelines. FlumeJava - Easy, Efficient Data-Parallel Pipelines

 

    1. Edward Capriolo is a member of the Apache Software Foundation and a committer for the Hadoop-Hive project. Cassandra High Performance Cookbook

 

    1. This sort of dependency is similar in principle to the compile-time dependency you might have on any API you are programming, like what you might see if you are writing MapReduce jobs for execution on a Hadoop cluster or writing enterprise integration code with Apache Camel. Gradle Beyond The Basics

 

    1. At Facebook, Hadoop has traditionally been used in conjunction with Hive for storage and analysis of large data sets. Apache Hadoop Goes Realtime at Facebook

 

    1. Chapter 12, Integrating Hadoop In this chapter, written by Jeremy Hanna, we put Cassandra in a larger context and see how to integrate it with the popular implementation of Google’s Map/Reduce algorithm, Hadoop. Cassandra The Definitive Guide

 

    1. Outside Google, projects such as Apache Hadoop have been gaining popularity rapidly. Tenzing - A SQL Implementation On The MapReduce Framework

 

 

2. Matching texts for the word http:

 

    1. To have full digital access to this book and others on similar topics from O’Reilly and other publishers, sign up for free at http://my.safaribooksonline.com. Getting Started with CouchDB

 

    1. That’s a promise! So make sure to visit the book’s website at http://books.couchdb.org/relax to keep up-to-date. CouchDB The Definitive Guide

 

    1. Downloading the example code for this book You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. Cassandra High Performance Cookbook

 

    1. Ceph is licensed under the LGPL and is available at http://ceph.sourceforge.net/ Ceph A Scalable, High-Performance Distributed File System

 

    1. For information on translations, please e-mail [email protected], or visit http://www.apress.com. Beginning CouchDB

 

    1. For more information about our books, courses, conferences, and news, see our website at http://www.oreilly.com. Scaling CouchDB

 

    1. It allows you to send data over HTTP in JSON to index, search, and manage your Elasticsearch cluster. Elasticsearch in Action MEAP V07 cHAPTER 1

 

    1. A request to Elasticsearch consists of the same parts as any HTTP request. Elasticsearch The Definitive Guide

 

    1. Finding Out More If you’d like to find out more about Cassandra, and to get the latest updates, visit this book’s companion website at http://www.cassandraguide.com. Cassandra The Definitive Guide

 

    1. It discusses ActiveMQ URIs and all the transport connectors for both client-to-broker and broker-to-broker communications including TCP, NIO, STOMP, failover, SSL, HTTP/S, and much more. ActiveMQ in Action

 

 

3. Matching texts for the word api:

 

    1. An HTTP-based REST API makes communicating with the database easier, because so many modern environments are capable of talking HTTP. Getting Started with CouchDB

 

    1. The CouchDB API is designed to provide a convenient but thin wrapper around the database core. CouchDB The Definitive Guide

 

    1. In addition to this, the book you are holding will guide you through ElasticSearch Java API, showing how to use it, not only when it comes to CRUD operations but also when it comes to cluster and indices maintenance and manipulation. Mastering ElasticSearch

 

    1. The CLI is a helpful interactive facade on top of the Cassandra API. Cassandra High Performance Cookbook

 

    1. Another breakthrough came in April 2006 when it was announced that CouchDB would be solely accessible via an HTTP-based RESTful API. Beginning CouchDB

 

    1. Since CouchDB uses an HTTP API, standard HTTP load balancing software or hardware can be used. Scaling CouchDB

 

    1. Even though Elasticsearch is written in Java, there's more than just a Java API that lets you work with it. Elasticsearch in Action MEAP V07 cHAPTER 1

 

    1. More information about the Java clients can be found in the Java API section of the Guide. Elasticsearch The Definitive Guide

 

    1. We present an overview of what’s different about Cassandra’s model for querying and updating data, and then get to work using the API. Cassandra The Definitive Guide

 

    1. Heterogeneous application integration—The ActiveMQ broker is written using the Java language, so naturally a Java client API is provided. ActiveMQ in Action

 

 

4. Matching texts for the word xml:

 

    1. So what do you do if your data contains a newline character, like some XML? Using a sequence file can solve this problem because it uses non-printable characters for delimiters. Apache Flume Distributed Log Collection for Hadoop

 

    1. Shows A show converts a single document into another format, usually HTML, although you can output the document in any format, including JSON or XML, to suit your. Getting Started with CouchDB

 

    1. An XML-format dump of the entire cluster state can be requested by a remote poller from any single node in the cluster on port 8649 Monitoring with Ganglia

 

    1. The design doc fields show and list contain functions used to transform raw JSON into HTML, XML, or other Content-Types. CouchDB The Definitive Guide

 

    1. Many commercial DBMSs support storage of nested data using XML (e.g., [19]) Dremel - Interactive Analysis of WebScale Datasets

 

    1. In the XML world he is known as the developer of DB Generator for the Apache Cocoon project, the open source projects DBPrism and DBPrism CMS, the Lucene-Oracle integration by using Oracle JVM Directory implementation and in the Restlet.org project the Oracle XDB Restlet Adapter, an alternative to write native REST web services inside the database-resident JVM. Mastering ElasticSearch

 

    1. You can use the preceding build.xml with the Free-Form project type. Cassandra High Performance Cookbook

 

    1. It is simple to map data from an existing database or XML to a JSON structure. Beginning CouchDB

 

    1. This might include reducing or reorganizing joins, throwing out resource-intensive features such as XML processing within a stored procedure, and so forth. Cassandra The Definitive Guide

 

    1. ActiveMQ also supports the use of Apache Camel within the broker’s XML configuration file. ActiveMQ in Action

 

 

5. Matching texts for the word solr:

 

    1. SOLr: a rich Set of Interfaces to Lucene The SOLR project, http://lucene.apache.org/solr/, is a stand-alone, enterprise-grade search service built on top of Lucene. Pro Hadoop

 

    1. He is also a co-founder of the solr.pl site which publishes information and tutorials about Solr and Lucene library and is the co-author of the ElasticSearch Server book published by Packt Publishing. Mastering ElasticSearch

 

    1. Solr is a popular frontend that provides a web service for Lucene as well as caching warming and other advanced capabilities. Cassandra High Performance Cookbook

 

    1. The name is a unique label for the module itself, and is often the same as the project name (like solr-core, spring-web, or junit) Gradle Beyond The Basics

 

    1. If you've heard about Lucene before, you've probably also heard about Apache Solr, which is also an open-source, distributed search engine based on it. Elasticsearch in Action MEAP V07 cHAPTER 1

 

    1. In this chapter we'll be using Apache Lucene-based Apache Solr and ElasticIndex for indexing purposes. Hadoop MapReduce Cookbook

 

    1. If the reducer decides that all of the queue-ids for a message-id create a complete message path, then the message is serialized and queued for the SolrOutputFormat. Hadoop:The Definitive Guide 3rd Edition

 

    1. Common choices are the Apache Lucene-based solutions, such as Lucene itself, or Solr, a high-performance enterprise search server.‡ Similar to the indexing solutions, there are a few possible approaches: HBase:The Definitive Guide

 

    1. Comprehensive information on Apache Solr 3 with examples and tips so you can focus on the important parts. Apache Kafka

 

    1. Integration with Apache Lucene/Solr is available to store external indexes and perform fast global searches. Cassandra The Definitive Guide

 

 

6. Matching texts for the word hbase:

 

    1. Subash D'Souza is a professional software developer with strong expertise in crunching big data using Hadoop/HBase with Hive/Pig. Apache Flume Distributed Log Collection for Hadoop

 

    1. The growing pains you’re facing have been faced by other people and there are systems to handle it: Hadoop and HBase. Monitoring with Ganglia

 

    1. The subsystems should transfer data from different data sources to Hadoop-compatible data storage systems such as HDFS and HBase. Hadoop Operations and Cluster Management Cookbook

 

    1. Hoya is a Java-tool designed to leverage YARN to spin up dynamic HBase clusters[21] on demand. Apache Hadoop YARN Yet Another Resource Negotiator

 

    1. Installing HBase HBase is a highly scalable NoSQL data store that supports columnar-style data storage. Hadoop MapReduce Cookbook

 

    1. A single master node runs the master HDFS, MapReduce, and HBase daemons. Hadoop in Practice

 

    1. Given our high confidence in the reliability and scalability of HDFS, we began to explore Hadoop and HBase for such applications. Apache Hadoop Goes Realtime at Facebook

 

    1. ZooKeeper is often a major component in other Hadoop-related projects, such as HBase and Katta. Hadoop in Action

 

    1. Graph databases such as Neo4J and the set of databases derived at least in part from the design of Google’s Bigtable database (such as MongoDB, HBase, Hypertable, and Redis) all are focused slightly less on Availability and more on ensuring Consistency and Partition Tolerance. Cassandra The Definitive Guide

 

    1. Another member of the broader Hadoop family is HBase, an open-source implementation of another Google technology. Hadoop Beginners Guide

 

 

7. Matching texts for the word hdfs:

 

    1. After that we’ll deep-dive into the details on many of the more useful Flume components, including the very important File Channel for persistence of in-flight data records and the HDFS Sink for buffering and writing data into HDFS, the Hadoop Distributed File System. Apache Flume Distributed Log Collection for Hadoop

 

    1. This file is organized into different contexts: jvm, rpc, hdfs, mapred, and hbase. Monitoring with Ganglia

 

    1. As Hadoop’s fault tolerance improved, persistent HDFS clusters became the norm. Apache Hadoop YARN Yet Another Resource Negotiator

 

    1. Chapter 11, Hadoop and Cassandra: Hadoop is a distributed file system, HDFS that provides high throughput and redundant storage and MapReduce, a software framework for distributed processing of large data sets on compute clusters. Cassandra High Performance Cookbook

 

    1. Chapter 2, Advanced HDFS, introduces a set of advanced HDFS operations that would be useful when performing large-scale data processing with Hadoop MapReduce as well as with non-MapReduce use cases. Hadoop MapReduce Cookbook

 

    1. After you’re used to thinking in MapReduce, the next challenge is typically related to the logistics of working with Hadoop, such as how to move data in and out of HDFS, and effective and efficient ways to work with data in Hadoop. Hadoop in Practice

 

    1. As such, there has been less emphasis on making Hadoop performant for random access workloads by providing low latency access to HDFS. Apache Hadoop Goes Realtime at Facebook

 

    1. An appendix contains a listing of HDFS commands along with their descriptions and usage. Hadoop in Action

 

    1. Because Hadoop has some unfamiliar terminology, here are some useful definitions: HDFS. Cassandra The Definitive Guide

 

    1. Parts of Hadoop The top-level Hadoop project has many component subprojects, several of which we'll discuss in this book, but the two main ones are Hadoop Distributed File System (HDFS) and MapReduce. Hadoop Beginners Guide

 

 

8. Matching texts for the word mysql:

 

    1. He has worked with Perl/ PHP/Python, primarily for coding and MySQL/Oracle as the backend, for several years prior to moving into Hadoop fulltime. Apache Flume Distributed Log Collection for Hadoop

 

    1. Formerly as a technical writer, he spent time building both the documentation system and writing content for MySQL and the MySQL groups within Sun and then Oracle. Getting Started with CouchDB

 

    1. You should have PHP support for MySQL installed on your gweb server before attempting to configure this support. Monitoring with Ganglia

 

    1. We've moved a large and critical application suite from MySQL to F1 F1 - The Fault-Tolerant Distributed RDBMS

 

    1. FlumeJava’s parallel collections abstract away the details of how data is represented, including whether the data is represented as an in-memory data structure, as one or more files, or as an external storage service such as a MySql database or a Bigtable [5] FlumeJava - Easy, Efficient Data-Parallel Pipelines

 

    1. About Packt Publishing Packt, pronounced 'packed', published its first book "Mastering phpMyAdmin for Effective MySQL Management" in April 2004 and subsequently continued to specialize in publishing highly focused books on specific technologies and solutions. Cassandra High Performance Cookbook

 

    1. The whole process involved running Map/Reduce[7] jobs against the MySQL data files, indexing them and then storing the reverse-index in Cassandra. Cassandra A Decentralized Structured Storage System

 

    1. For several decades now, relational database management systems have reigned supreme in application and database development, with the likes of Oracle, SQL Server, and MySQL being used in every type of software application imaginable. Beginning CouchDB

 

    1. Once you start to scale many other data stores (MySQL, Bigtable), some nodes need to be set up as masters in order to organize other nodes, which are set up as slaves. Cassandra The Definitive Guide

 

    1. Here’s an example of an ActiveMQ configuration for the JDBC message store using the MySQL database: ActiveMQ in Action

 

 

9. Matching texts for the word apache:

 

    1. Then I was introduced to Apache CouchDB, and I rediscovered the passion I had when developing applications on the Electron. Getting Started with CouchDB

 

    1. Apache CouchDB is one of a new breed of database management systems. CouchDB The Definitive Guide

 

    1. Nevertheless, I hope that by reading this book you'll be able to easily get into all the details about ElasticSearch and underlying Apache Lucene, and I also hope that it will let you get the desired knowledge easier and faster. Mastering ElasticSearch

 

    1. Over 150 recipes to design and optimize large-scale Apache Cassandra deployments. Cassandra High Performance Cookbook

 

    1. Apache CouchDB is an exciting database management system that is steadily improving with each new version. Beginning CouchDB

 

    1. Instead, we will focus on installing and configuring the Apache HTTP Server as a load balancer. Scaling CouchDB

 

    1. It's open-source, distributed and it's built on top of Apache Lucene, an open-source search engine library, which allows you to implement search functionality in your own Java application. Elasticsearch in Action MEAP V07 cHAPTER 1

 

    1. It did not come out from any of the Apache releases yet. Namenode High Availability

 

    1. Cassandra was accepted into the Apache Incubator, and by the time it graduated in March 2010, it had become a true open source success story, with committers from Rackspace, Digg, Twitter, and other companies that wouldn’t have written their own database from scratch, but together built something important. Cassandra The Definitive Guide

 

    1. Chapter 11 presents advanced features provided by ActiveMQ such as wildcards and composite destinations, advisory messages, virtual topics, some info about ActiveMQ plug-ins, and an introduction to message routing with Apache Camel. ActiveMQ in Action

 

 

10. Matching texts for the word gradle:

 

    1. Gradle has already amassed a large audience, and we’ve seen tremendous growth over the last two years, but we’re getting ready for a still larger influx of end-users. Gradle in Action

 

    1. The conference is visited by 1000 Java developers and he got the chance to educate some of them about the greatness of Gradle and Groovy. Gradle Effective Implementation Guide

 

    1. The build tool you are going to use in this book is Gradle (http://gradle.org) Pro Spring MVC

 

    1. His current topics of research center around project automation: build tools (Maven, Leiningen, Gradle), distributed version control (Git), Continuous Integration (Jenkins) and Quality Metrics (Sonar) Jenkins The Definitive Guide

 

    1. Chances are you might be using it already or have switched to a more modern system such as Maven or Gradle. Sonar Code Quality Testing Essentials

 

    1. The first thing that is worth mentioning is that the current versions of Spring Security use Gradle as the build tool. Pro Spring Security

 

    1. Gradle Beyond the Basics, the image of a Belgian shepherd dog, and related trade dress are trademarks of O’Reilly Media, Inc. Gradle Beyond The Basics

 

    1. It is straightforward to use another build tool, such as Gradle or Ant with Ivy, as long as you use the same set of dependencies defined here.) For building MapReduce jobs you only need to have the hadoop-core dependency, which contains all the Hadoop classes. Hadoop:The Definitive Guide 3rd Edition

 

    1. Additionally, you will need to have Maven or Gradle installed to pull in the dependencies required to work with the available sample code. Getting Started with Google Guava

 

    1. Building and Testing with Gradle, the image of a bush wren, and related trade dress are trademarks of O’Reilly Media, Inc. Building and Testing with Gradle

 

 

11. Matching texts for the word url:

 

    1. Depending on your platform and installation type, your browser may have already been opened with the correct link; if not, you can access Futon, the built-in administration interface to CouchDB by going to the URL: http://127.0.0.1:5984/_utils/index.html Futon is a web-based interface to the main functionality in CouchDB and provides support for editing the configuration information, creating databases, documents, design documents (and therefore views, shows and lists) and starting and requesting replication. Getting Started with CouchDB

 

    1. To use it, invoke the URL along with key/value pairs that define events. Monitoring with Ganglia

 

    1. It is followed by the URL that specifies your CouchDB IP address and port. CouchDB The Definitive Guide

 

    1. ElasticSearch assumes that data is sent in the URL, or as the request body as JSON document (http://en.wikipedia.org/wiki/JSON) Mastering ElasticSearch

 

    1. Open your favorite web browser, and navigate to the URL http://127.0.0.1:5984/_utils. Beginning CouchDB

 

    1. Send an If-None-Match HTTP header containing the Etag’s value with subsequent requests to the same URL (Etags are only valid for a given URL) Scaling CouchDB

 

    1. This method is often referred to as a query string search, since we pass the search as a URL query string parameter: Elasticsearch The Definitive Guide

 

    1. Next, go to the GitHub page hosting the project you’re interested in and find the project’s Git URL. Cassandra The Definitive Guide

 

    1. If you don’t have the Sun J2SE installed and you’re using Linux, Solaris, or Windows, download and install it from the following URL: http://www.oracle.com/technetwork/java/javase/downloads/ index.html. ActiveMQ in Action

 

    1. It provides aggregate statistics, such as the number of unique visitors per day and the page views per URL per day, as well as site-tracking reports, such as the percentage of users that made a purchase, given that they earlier viewed a specific page. Bigtable - A Distributed Storage System for Structured Data

 

 

12. Matching texts for the word schema:

 

    1. Just like in a database that requires a schema, there are some conventions and field types that you should consider including in your documents. Getting Started with CouchDB

 

    1. We’ll show you how CouchDB’s schema-free document model is a better fit for common applications, how the built-in query engine is a powerful way to use and process your data, and how CouchDB’s design lends itself to modularization and scalability. CouchDB The Definitive Guide

 

    1. Each document may have a different set of fields; there is no schema or imposed structure. Mastering ElasticSearch

 

    1. Chapter 6, Schema Design: The Cassandra data model is designed for storing large amounts of data across many nodes. Cassandra High Performance Cookbook

 

    1. Although the system supports the notion of multiple tables all deployments have only one table in their schema. Cassandra A Decentralized Structured Storage System

 

    1. Although the strict schema-based structure these databases adhere to is required in some web applications, such as transaction systems, it is not a good fit for many modern web projects, such as blogs, wikis, and discussion forums, which by their nature are a better fit to a document-oriented database. Beginning CouchDB

 

    1. CouchDB is a schema-less database, giving you much flexibility in designing the document boundaries for your data. Scaling CouchDB

 

    1. Every type has its own mapping or schema definition, which defines the data structure for documents of that type, much like the columns in a database table. Elasticsearch The Definitive Guide

 

    1. It is highly available and offers a schema-free data model. Cassandra The Definitive Guide

 

    1. The JDBC message store uses a schema consisting of three tables. ActiveMQ in Action

 

 

13. Matching texts for the word metadata:

 

    1. The HEAD method returns metadata about the document or other object being accessed within the HTTP headers returned. Getting Started with CouchDB

 

    1. The actual result is JSON-encoded and contains a little more metadata: CouchDB The Definitive Guide

 

    1. Recipes cover how the CLI is used to make changes to the metadata such as key spaces, column families, and cache settings. Cassandra High Performance Cookbook

 

    1. Ceph maximizes the separation between data and metadata management by replacing allocation tables with a pseudo-random data distribution function (CRUSH) designed for heterogeneous and dynamic clusters of unreliable object storage devices (OSDs) Ceph A Scalable, High-Performance Distributed File System

 

    1. The metadata about the ranges a node is responsible is cached locally at each node and in a fault-tolerant manner inside Zookeeper - this way a node that crashes and comes back up knows what ranges it was responsible for. Cassandra A Decentralized Structured Storage System

 

    1. In addition to data fields, each document includes metadata that is maintained by the CouchDB server itself, such as a revision number and more. Beginning CouchDB

 

    1. A document can be a blog post with all its metadata, a user with all its metadata, or any other type of data you plan to search for. Elasticsearch in Action MEAP V07 cHAPTER 1

 

    1. As storage clusters grow to thousands of devices or more, consistent management of data placement, failure detection, and failure recovery places an increasingly large burden on client, controller, or metadata directory nodes, limiting scalability. RADOS A Scalable, Reliable Storage Service for Petabyte-scale Storage Clusters.

 

    1. The following list shows only the commands related to metadata and configuration; there are other commands for getting and setting values that we explore later. Cassandra The Definitive Guide

 

    1. The headers provide metadata about the message used by both clients and JMS providers. ActiveMQ in Action

 

 

14. Matching texts for the word mongodb:

 

    1. Other sinks are available on the Internet that can be used to write data to MongoDB, Cassandra, RabbitMQ, Redis, and just about any other data store you can think of. Apache Flume Distributed Log Collection for Hadoop

 

    1. Like MongoDB itself, this book is very straightforward and approachable. MongoDB The Definitive Guide 2nd Edition

 

    1. Hadoop was a logical choice since it a widely used system, but we hope that users of MongoDB’s MapReduce and other MapReduce implementations will be able to extrapolate the examples in this text to their particular system of choice. MapReduce Design Patterns

 

    1. The goal of Drill is to create an interactive analysis platform for Big Data using a standard SQL-supporting relational database management system (RDBMS), Hadoop, and other NoSQL implementations (including Cassandra and MongoDB) Professional Hadoop Solutions

 

    1. Exporting data from HDFS into MongoDB This recipe will use the MongoOutputFormat class to load data from an HDFS instance into a MongoDB collection. Hadoop RealWorld Solutions Cookbook

 

    1. You can do that with Elasticsearch, and it works especially well with document-oriented stores like MongoDB, because data is represented in Elasticsearch as documents, too. Elasticsearch in Action MEAP V07 cHAPTER 1

 

    1. MongoDB also provides auto-sharding capabilities to manage failover and node balancing. Cassandra The Definitive Guide

 

    1. The format of the row key is an account ID (which is a MongoDB ObjectId) and a content item ID (with a MD5 hash of the URL of the content item being tracked) separated by an underscore which we split on to provide that data in the result set. Programming Hive

 

    1. That MongoDB also based its query language on JSON lent a high degree of comfort and harmony to the usage of this new database. MongoDB in Action

 

    1. These systems can also be grouped with the NoSQL databases (MongoDB [27], to name one of many): both offer higher performance and scale better than traditional databases, but provide weaker semantics. Percolator - Large-scale Incremental Processing Using Distributed Transactions and Notifications

 

 

15. Matching texts for the word jvm:

 

    1. The downside of using this channel is that an agent failure (hardware problem, power outage, JVM crash, Flume restart, and so on) results in loss of data. Apache Flume Distributed Log Collection for Hadoop

 

    1. Java Virtual Machine Metrics The sFlow Java Virtual Machine (JVM) metrics (Table 8-3) are based on the metrics exposed through the Java Management Extensions (JMX) interface, ensuring consistency with existing JMX-based monitoring systems. Monitoring with Ganglia

 

    1. Groovy is a language for the Java Virtual Machine (JVM), such as Java and Scala. Gradle Effective Implementation Guide

 

    1. In the XML world he is known as the developer of DB Generator for the Apache Cocoon project, the open source projects DBPrism and DBPrism CMS, the Lucene-Oracle integration by using Oracle JVM Directory implementation and in the Restlet.org project the Oracle XDB Restlet Adapter, an alternative to write native REST web services inside the database-resident JVM. Mastering ElasticSearch

 

    1. In addition to JVM internals, applications can maintain their own counters and provide operations that the user can trigger remotely. Cassandra High Performance Cookbook

 

    1. The File constructor will create an absolute path out of the supplied parameter, but the constructor argument will be considered as if it is relative to the directory in which the JVM started up.1 This directory may change if you are invoking Gradle directly, through the wrapper, through an IDE, or through integration with a Continuous Integration server. Gradle Beyond The Basics

 

    1. It would work, but would need a lot of RAM to hold copies of essentially the same data, and would create many more data structures that the JVM would have to spend time garbage collecting. Elasticsearch The Definitive Guide

 

    1. We pass the file locations to the JVM by pasting the following code at the bottom of this file. Cassandra The Definitive Guide

 

    1. The SSL cipher suites for the ActiveMQ SSL transport are provided by the JVM. ActiveMQ in Action

 

    1. Groovy is a dynamic language of the JVM, similar in many respects to Java, but with some important differences. Building and Testing with Gradle

 

 

16. Matching texts for the word plugin:

 

    1. While not covered in the Flume documentation, it is common to enable JMX in your Flume JVM (http://bit.ly/javajmx) and use the Nagios JMX plugin (http://bit.ly/nagiosjmx) to alert on performance abnormalities in your Flume agents. Apache Flume Distributed Log Collection for Hadoop

 

    1. Collectively, these enable Nagios to query metric information from gweb instead of relying on remote execution systems such as Nagios Service Check Acceptor (NSCA) and Nagios Remote Plugin Executor (NRPE) Monitoring with Ganglia

 

    1. We’ll explore the use of the Java plugin for standardized and nonconventional use cases and examine productivity tools for fast development turnaround. Gradle in Action

 

    1. We will see how we can decouple the definition and usage of a custom task and plugin into separate source files. Gradle Effective Implementation Guide

 

    1. Finally, we will take a deep look at ElasticSearch extensions by developing a custom river plugin for data indexing and a custom analysis plugin for data analysis during query and index time. Mastering ElasticSearch

 

    1. Using the Cassandra maven plugin With maven, it is simple to create a software project that has Cassandra support built in. Cassandra High Performance Cookbook

 

    1. The base build from which we will derive FileCollection examples apply plugin: 'java' Gradle Beyond The Basics

 

    1. BookKeeper Journal Manager (BKJM) is NameNode plugin implementation, involves BK client to read/write to/from BK cluster. Namenode High Availability

 

    1. The Consumer is executed using the maven-exec-plugin (http://mng.bz/bf7g) by passing it some system properties as arguments using the exec.args property. ActiveMQ in Action

 

    1. In Gradle, a plugin may provide configurable build actions by introducing one or more tasks, but it is fundamentally a means of extending the build DSL to include the domain encompassed by the plug-in. Building and Testing with Gradle

 

 

17. Matching texts for the word sql:

 

    1. He also has experience optimizing SQL queries for database interactions. Apache Flume Distributed Log Collection for Hadoop

 

    1. By working with the grain and letting CouchDB do this for us, we save ourselves a tremendous amount of CPU cycles that would otherwise have been spent serializing object graphs from SQL, converting them into domain objects, and using those objects to do application-level validation. CouchDB The Definitive Guide

 

    1. Second, Dremel provides a high-level, SQL-like language to express ad hoc queries. Dremel - Interactive Analysis of WebScale Datasets

 

    1. Using analogies from the SQL world, index is something similar to a database. Mastering ElasticSearch

 

    1. Her interests and experiences lie across multiple tiers, from web frontend GUI to middleware, from middleware to backend SQL RDBMS and NoSQL data storage. Cassandra High Performance Cookbook

 

    1. He introduced the concept of JavaScript frameworks and Ajax to the company’s web division in 2008, developing a PL/SQL and JavaScript framework that is now used in every Core web product. Beginning CouchDB

 

    1. You can deploy a search engine on top of a relational database, to create indexes and speed up the SQL queries. Elasticsearch in Action MEAP V07 cHAPTER 1

 

    1. However, our core data model is not restricted to our particular RPC framework; we also trace activities such as SMTP sessions in Gmail, HTTP requests from the outside world, and outbound queries to SQL servers. Dapper a Large-Scale Distributed Systems Tracing Infrastructure

 

    1. It is much like the INSERT keyword in SQL except that, if the document already exists, then the new document would replace the old. Elasticsearch The Definitive Guide

 

    1. An important one is the Structured Query Language (SQL), which is feature-rich and uses a simple, declarative syntax. Cassandra The Definitive Guide

 

 

18. Matching texts for the word implementations:

 

    1. Some custom implementations are covered should your needs necessitate it. Apache Flume Distributed Log Collection for Hadoop

 

    1. Implementations In addition to gmetric, there are several other metric-generating utilities that are available. Monitoring with Ganglia

 

    1. This header is often useful in web development when there are known errors in client implementations that a server might want to prepare the response for. CouchDB The Definitive Guide

 

    1. The success of MR led to a wide range of third-party implementations (notably open-source Hadoop [15]), and a number of hybrid systems that combine parallel DBMSs with MR, offered by vendors like Aster, Cloudera, Greenplum, and Vertica. Dremel - Interactive Analysis of WebScale Datasets

 

    1. We learned how to choose the right Lucene Directory implementation and which of the available implementations is the right choice in our environment. Mastering ElasticSearch

 

    1. Other JVM implementations and ports exist for other operating systems. Cassandra High Performance Cookbook

 

    1. This behavior is explicitly captured by the readdirplus [31] extension, which returns lstat results with directory entries (as some OS-specific implementations of getdir already do) Ceph A Scalable, High-Performance Distributed File System

 

    1. Hence we do not need to deal with or handle the concurrency issues that exist in B-Tree based database implementations. Cassandra A Decentralized Structured Storage System

 

    1. The strategy is outlined in the common abstract class, allowing different implementations of an algorithm (different strategies for accomplishing the same work) Cassandra The Definitive Guide

 

    1. The JMS API minimizes the amount of enterprise messaging knowledge a Java programmer is required to possess in order to develop complex messaging applications, while still maintaining a certain amount of portability across JMS provider implementations. ActiveMQ in Action

 

 

19. Matching texts for the word osgi:

 

    1. Preventing any kind of dependencies between the modules allows the new module systems, such as OSGi, to dynamically reload module B without reloading module A so long as the appropriate framework hooks are in place and all communications between the modules A and B happen via the core-api module. Scala in Depth

 

    1. Gradle provides support for Java, Groovy, Scala, Web, and OSGi projects, out of the box. Gradle Effective Implementation Guide

 

    1. However, SpringSource also hosts its own Maven repository and provides Spring libraries in the form of Enterprise Bundle Repositories (EBRs), which are OSGi compatible. Pro Spring 3

 

    1. Imagine being able to replace the PDF parser with a commercially licensed alternative, upgrade the plain text parser for a version with the latest bug fixes, and add a new parser for a custom file format without having to restart or even reload any other components of the system! This is what the OSGi framework makes possible. Tika in Action

 

    1. Chapter 4 says more about this topic, but for now, you just need to know that Spring Batch can run anywhere the Spring Framework can run: in its own Java process, in a web container, in an application, or even in an Open Services Gateway initiative (OSGi) container. Spring Batch in Action

 

    1. However, I’d noticed the enterprise world’s interest in OSGi, so I too was interested. Spring Dynamic Modules in Action

 

    1. The application server is easy to use (ZIP installer and NetBeans/ Eclipse/IntelliJ integration), lightweight (downloads starting at 37 MB, small disk/ memory footprint), and modular (OSGi-based, containers start on demand) Java EE 7 Essentials

 

    1. This provides a series of architectural advantages, such as an OSGi-based component model and being able to run JRockit Mission Control both as a standalone application and inside the Eclipse IDE. Oracle JRockit The Definitive Guide

 

    1. Chapter 17 presents the main principles behind the Open Services Gateway initiative (OSGi) and demonstrates how its approach to modularity and its service registry can be utilized in Spring Integration applications. Spring Integration in Action

 

    1. Because add-ons are OSGi components, we spend time detailing enough of OSGi to be dangerous, and then we dive right in and create three add-ons: a Norwegian language addon, a Roo wrapper add-on to expose a non-OSGi JAR to the Roo system, and a “Simple” add-on to provide jQuery support. Spring Roo in Action

 

 

20. Matching texts for the word dependencies:

 

    1. To install CouchDB for the first time, including any dependencies: Getting Started with CouchDB

 

    1. For other versions of Mac OS X, the dependencies might vary. Monitoring with Ganglia

 

    1. The appeal of this approach is that CouchDB can serve the whole application without dependencies on a complex environment such as might be maintained on a production web server. CouchDB The Definitive Guide

 

    1. Its core comes as a single file of Java library with no dependencies, and allows you to index documents and search them with its out of the box full text search capabilities. Mastering ElasticSearch

 

    1. As a result, you do not have to run a list of targets sequentially; the dist target will run its dependents such as compile and init and their dependencies in proper order. Cassandra High Performance Cookbook

 

    1. Even if your system’s package manager does not include CouchDB itself in its repository, it’s quite likely that it does feature some of Couch’s dependencies (Erlang, ICU, Spidermonkey, libcurl, and so on) Beginning CouchDB

 

    1. Dapper traces were used to assess the total query cost, and prompted an effort to redesign operations in order to minimize load on their system dependencies. Dapper a Large-Scale Distributed Systems Tracing Infrastructure

 

    1. The use of a separate building ensures both that the backup will survive building damage, and that the backups introduce no cyclic dependencies in the system; a GFS cell in the same building potentially might rely on the Chubby cell for electing its master. Chubby - The lock service for loosely-coupled distributed systems

 

    1. Because Ivy takes care of all the dependencies, it’s easy to build Cassandra once you have the source. Cassandra The Definitive Guide

 

    1. Loosely coupled architectures, on the other hand, exhibit fewer dependencies, making them better at handling unforeseen changes. ActiveMQ in Action

 

 

21. Matching texts for the word runtime:

 

    1. You will need to include the flume-ng-sdk JAR in the classpath of your Java application at runtime to use Flume's Log4J Appender. Apache Flume Distributed Log Collection for Hadoop

 

    1. Megastore emphasizes cost-transparent APIs with runtime costs that match application developers' intuitions. Megastore - Providing Scalable, Highly Available Storage for Interactive Services

 

    1. So rather than reinvent a new type of modular interface, why not just reuse a tried and true interface? Of course, the fact that gmond is built on top of the Apache Portability Runtime (APR) libraries made the Apache way of implementing a modular interface an obvious fit. Monitoring with Ganglia

 

    1. We use CouchDB’s JavaScript runtime’s built-in toJSON function in our equality test, which is better than testing for raw equality. CouchDB The Definitive Guide

 

    1. Please remember that the field data cache is not built during indexing, but can be rebuilt while querying and because of that we can change filtering during runtime by updating the fieldata section using the mappings API. Mastering ElasticSearch

 

    1. Of course, it would be much nicer if you could define the URL and title you want to add to the database at runtime. Beginning CouchDB

 

    1. Their implementations are connected to the Sawzall runtime and interact with system internals, managing low-level data formats and distributed computations. Interpreting the Data Parallel Analysis with Sawzall

 

    1. Dapper penetration can be assessed in two dimensions: the fraction of production processes that can generate Dapper traces (i.e., those that are linked with Dapperinstrumented runtime libraries) and the fraction of production machines running Dapper’s trace collection daemon. Dapper a Large-Scale Distributed Systems Tracing Infrastructure

 

    1. It’s often useful to contextualize events at runtime against other events that might be related in order to infer some conclusion to support business decision making. Cassandra The Definitive Guide

 

    1. The Apache Geronimo application server is a fully certified Java EE 5 runtime that uses many leading open source projects for various aspects of the application server. ActiveMQ in Action

 

 

22. Matching texts for the word jenkins:

 

    1. This chapter discusses the installation and configuration procedures needed to run Gradle on Jenkins, an open-source CI server. Gradle in Action

 

    1. Over 80 recipes to maintain, secure, communicate, test, build, and improve the software development process with Jenkins. Jenkins Continuous Integration Cookbook

 

    1. Then, we are going to see how the continuous integration servers Jenkins, JetBrains TeamCity, and Atlassian Bamboo support Gradle. Gradle Effective Implementation Guide

 

    1. Allows you to run SBT build actions from Jenkins and lets you configure SBT using Jenkins. Scala in Action

 

    1. Seven years ago, I wrote the first line of code that started this whole project that is now known as Jenkins, and was originally called Hudson. Jenkins The Definitive Guide

 

    1. There are two implementations you can choose from: Jenkins and Murmur. Hadoop in Practice

 

    1. The encoding of the region names for the on-disk directories is also different: they use a Jenkins hash to encode the region name. HBase:The Definitive Guide

 

    1. His current topics of research center around project automation, including: build tools (Gradle, Leiningen, Maven, Ant), distributed version control (Git, Mercurial), testing frameworks (Geb, Spock, JUnit, TestNG, Mockito), continuous integration (Jenkins, Hudson, Bamboo) and code quality metrics (Sonar, CodeNarc, PMD) Building and Testing with Gradle

 

    1. Jenkins Tool used for OpenStack development to run jobs automatically. OpenStack Operations Guide

 

    1. The first book to include detailed screenshots and recipes for using Jenkins continuous integration server (formerly known as Hudson) Selenium 2 Testing Tools Beginner's Guide

 

 

23. Matching texts for the word couchdb:

 

    1. Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of O’Reilly Media, Inc., Getting Started with CouchDB, the cover image of a hawk’s bill sea turtle, and related trade dress are trademarks of O’Reilly Media, Inc. Getting Started with CouchDB

 

    1. CouchDB: The Definitive Guide, the image of a Pomeranian dog, and related trade dress are trademarks of O’Reilly Media, Inc. CouchDB The Definitive Guide

 

    1. For example, CouchDB uses a combination of counter and content hash. NoSQL Distilled

 

    1. Apache CouchDB is an exciting database management system that is steadily improving with each new version. Beginning CouchDB

 

    1. Scaling CouchDB, the image of a chough, and related trade dress are trademarks of O’Reilly Media, Inc. Scaling CouchDB

 

    1. In fact, some document stores (like CouchDB) have made this a key point of their value proposition: documents can be held in a multimaster database that automatically replicates concurrently accessed, contended state across instances without undue interference from the user. Graph Databases

 

    1. CouchDB is a document-oriented database—a distributed keyvalue store that encodes values in JSON. SQL Antipatterns

 

    1. This is not entirely surprising, as its creator, Damien Katz, worked on Lotus Notes at IBM before deciding to embark on this project, with the idea that it would be a database “of the Web.” The documents stored by CouchDB do not need to share the same schema, and query capabilities are made available via views, which are constructed as JavaScript functions. Cassandra The Definitive Guide

 

    1. For example, CouchDB (which is generally considered as MongoDB’s biggest “competitor”) doesn’t support dynamic queries. The Definitive Guide to MongoDB

 

    1. As of this writing, the only well-known document database apart from MongoDB is Apache’s CouchDB. MongoDB in Action

 

 

24. Matching texts for the word cpu:

 

    1. By working with the grain and letting CouchDB do this for us, we save ourselves a tremendous amount of CPU cycles that would otherwise have been spent serializing object graphs from SQL, converting them into domain objects, and using those objects to do application-level validation. CouchDB The Definitive Guide

 

    1. You are only limited by resources such as memory, CPU time, and hard disk space. Cassandra High Performance Cookbook

 

    1. More recent distributed file systems have adopted architectures based on object-based storage, in which conventional hard disks are replaced with intelligent object storage devices (OSDs) which combine a CPU, network. Ceph A Scalable, High-Performance Distributed File System

 

    1. The salient feature of Scuttlebutt is that it has very efficient CPU utilization and very efficient utilization of the gossip channel. Cassandra A Decentralized Structured Storage System

 

    1. This could be through added memory, a faster CPU, or larger hard drives. Scaling CouchDB

 

    1. Ideal maximum throughput: • In practice, network bandwidth or cpu limits the total capacity in bytes. BookKeeper

 

    1. OSDs constructed from commodity components combine a CPU, network interface, and local cache with an underlying disk or RAID, and replace the convention block-based storage interface with one based on named, variable-length objects. RADOS A Scalable, Reliable Storage Service for Petabyte-scale Storage Clusters.

 

    1. But with big enough from values, the sorting process can become very heavy indeed, using vast amounts of CPU, memory and bandwidth. Elasticsearch The Definitive Guide

 

    1. The least disruptive thing (for the programming staff, anyway) is to vertically scale by adding CPU, adding memory, and getting faster hard drives, but if you continue to be successful and add more customers, at some point (perhaps into the tens of millions of rows), you’ll likely have to start thinking about how you can add more machines. Cassandra The Definitive Guide

 

    1. It’s also advisable to reduce the CPU load per connection. ActiveMQ in Action

 

 

25. Matching texts for the word bytes:

 

    1. The body is an array of bytes that contains the actual payload. Apache Flume Distributed Log Collection for Hadoop

 

    1. For CPU stats, for example, the value represents a percentage, and network stats interpret the value as a raw number of bytes. Monitoring with Ganglia

 

    1. The Content-Length header simply tells us how many bytes the response body has. CouchDB The Definitive Guide

 

    1. The number of segments an index will contain is proportional to the logarithm of the next size in bytes. Mastering ElasticSearch

 

    1. Storing large objects The row data stored in Cassandra is typically smaller in size, between a few bytes to a few thousand bytes. Cassandra High Performance Cookbook

 

    1. File and directory metadata in Ceph is very small, consisting almost entirely of directory entries (file names) and inodes (80 bytes) Ceph A Scalable, High-Performance Distributed File System

 

    1. Ideal maximum throughput: • In practice, network bandwidth or cpu limits the total capacity in bytes. BookKeeper

 

    1. This sorts by directly comparing the bytes, validating that the input can be parsed as US-ASCII. Cassandra The Definitive Guide

 

    1. Efficiency—Since this connector uses the OpenWire protocol to convert messages to a stream of bytes (and back), it’s very efficient in terms of network usage and performance. ActiveMQ in Action

 

    1. The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes. Bigtable - A Distributed Storage System for Structured Data

 

 

26. Matching texts for the word lucene:

 

    1. Nevertheless, I hope that by reading this book you'll be able to easily get into all the details about ElasticSearch and underlying Apache Lucene, and I also hope that it will let you get the desired knowledge easier and faster. Mastering ElasticSearch

 

    1. His background includes a wide range of programming languages (including Java and C#), search engines (including Lucene), databases, and NoSQL distributed data stores. Cassandra High Performance Cookbook

 

    1. It's open-source, distributed and it's built on top of Apache Lucene, an open-source search engine library, which allows you to implement search functionality in your own Java application. Elasticsearch in Action MEAP V07 cHAPTER 1

 

    1. In this chapter we'll be using Apache Lucene-based Apache Solr and ElasticIndex for indexing purposes. Hadoop MapReduce Cookbook

 

    1. Your data might be XML files sitting behind a number of FTP servers, text log files sitting on a central web server, or Lucene indexes1 in HDFS. Hadoop in Practice

 

    1. Elasticsearch uses Lucene internally for all of its indexing and search, but it. Elasticsearch The Definitive Guide

 

    1. Does not work well with other Lucene constructs, like FieldCache (by doc info) elasticsearch. The Road to a Distributed, (Near) Real Time, Search Engine

 

    1. Hadoop started out as a subproject of Nutch , which in turn was a subproject of Apache Lucene. Hadoop in Action

 

    1. Integration with Apache Lucene/Solr is available to store external indexes and perform fast global searches. Cassandra The Definitive Guide

 

    1. Doug started work on the implementations of these Google systems, and Hadoop was soon born, firstly as a subproject of Lucene and soon was its own top-level project within the Apache open source foundation. Hadoop Beginners Guide

 

 

27. Matching texts for the word redis:

 

    1. Other sinks are available on the Internet that can be used to write data to MongoDB, Cassandra, RabbitMQ, Redis, and just about any other data store you can think of. Apache Flume Distributed Log Collection for Hadoop

 

    1. Redis is good at queue operations, and occasionally you can use memcached for this purpose, too. High Performance MySQL 3rd Edition

 

    1. Twitter is built atop several open source protocols, primarily HTTP, Thrift, Memcached, MySQL, and Redis. Netty in Action MEAP V08

 

    1. In a realworld application you’d include the logic for increasing the user points, say on a Redis database. RabbitMQ in Action

 

    1. This example is a basic means for writing to a number of Redis instances in parallel from MapReduce. MapReduce Design Patterns

 

    1. For example, Riak allows you to add metadata to aggregates for indexing and interaggregate links, Redis allows you to break down the aggregate into lists or sets. NoSQL Distilled

 

    1. Key-value stores You can use applications such as Memcached or Redis for external caching. bk-admin-guide-cloud-latest

 

    1. There is also a recipe that shows how to use Redis to join data from an external data store. Hadoop RealWorld Solutions Cookbook

 

    1. Graph databases such as Neo4J and the set of databases derived at least in part from the design of Google’s Bigtable database (such as MongoDB, HBase, Hypertable, and Redis) all are focused slightly less on Availability and more on ensuring Consistency and Partition Tolerance. Cassandra The Definitive Guide

 

    1. Redis Cookbook, the image of the mouse opossum, and related trade dress are trademarks of O’Reilly Media, Inc. Redis Cookbook

 

 

28. Matching texts for the word html:

 

    1. I won't cover this here, but if you need it, there is an example in the Flume User Guide (http://flume.apache.org/FlumeUserGuide.html) Apache Flume Distributed Log Collection for Hadoop

 

    1. Depending on your platform and installation type, your browser may have already been opened with the correct link; if not, you can access Futon, the built-in administration interface to CouchDB by going to the URL: http://127.0.0.1:5984/_utils/index.html Futon is a web-based interface to the main functionality in CouchDB and provides support for editing the configuration information, creating databases, documents, design documents (and therefore views, shows and lists) and starting and requesting replication. Getting Started with CouchDB

 

    1. The source and target members of our replication request are actually links (like in HTML) and so far we’ve seen links relative to the server we’re working on (hence local) CouchDB The Definitive Guide

 

    1. One of the examples of character mapper is HTML tags removal process. Mastering ElasticSearch

 

    1. He dabbled with web development in the late 1990s and was an expert in HTML, CSS, and JavaScript by the time he finished school. Beginning CouchDB

 

    1. Change into the log directory and generate the HTML and graph reports using the tsung_stats.pl script package with Tsung: Scaling CouchDB

 

    1. In this book, we'll stick with the popular formats: JSON for HTML communication and YAML for configuration. Elasticsearch in Action MEAP V07 cHAPTER 1

 

    1. This contains a snippet of text from the about field with the matching words wrapped in  HTML tags: Elasticsearch The Definitive Guide

 

    1. If you’re interested, you can read the complete Avro specification at http://avro.apache.org/docs/current/spec.html, though it is definitely not required to work with Cassandra. Cassandra The Definitive Guide

 

    1. If you don’t have the Sun J2SE installed and you’re using Linux, Solaris, or Windows, download and install it from the following URL: http://www.oracle.com/technetwork/java/javase/downloads/ index.html. ActiveMQ in Action

 

 

29. Matching texts for the word metrics:

 

    1. This mode is good for things like metrics where gaps can easily be tolerated, as new data is just a second away. Apache Flume Distributed Log Collection for Hadoop

 

    1. As you can see from the help page, the first version of ganglia allowed you to query and sort by 21 different system metrics right out of the box. Monitoring with Ganglia

 

    1. Using the JDepend plugin To get quality metrics for our code base, we can use JDepend. Gradle Effective Implementation Guide

 

    1. We expose various system level metrics to Ganglia and this has helped us understand the behavior of the system when subject to our production workload. Cassandra A Decentralized Structured Storage System

 

    1. Every key in the response contains the same metrics: current, count, mean, max, min, stddev, and description. Beginning CouchDB

 

    1. This makes it easy for the owners of shared services to rank their users in terms of various metrics (e.g., inbound network load, outbound network load, or total time spent servicing requests) Dapper a Large-Scale Distributed Systems Tracing Infrastructure

 

    1. Hector directly supports JMX by exposing metrics such as bad connections, available connections, idle connections, and more. Cassandra The Definitive Guide

 

    1. As with any other infrastructure software, it’s important for developers and administrators to be able to monitor broker metrics during runtime and notice any suspicious behavior that could possibly impact messaging clients. ActiveMQ in Action

 

    1. We log various facts and metrics for each of these queries. PowerDrill - Processing a Trillion Cells per Mouse Click

 

    1. His current topics of research center around project automation, including: build tools (Gradle, Leiningen, Maven, Ant), distributed version control (Git, Mercurial), testing frameworks (Geb, Spock, JUnit, TestNG, Mockito), continuous integration (Jenkins, Hudson, Bamboo) and code quality metrics (Sonar, CodeNarc, PMD) Building and Testing with Gradle

 

 

30、单词 cassandra 的匹配文本:

 

    1. Other sinks are available on the Internet that can be used to write data to MongoDB, Cassandra, RabbitMQ, Redis, and just about any other data store you can think of. Apache Flume Distributed Log Collection for Hadoop

 

    1. Because of high performance, a small codebase, and compact resulting data, there is a wide adoption of Avro not only in the Hadoop community, but also by many other NoSQL implementations (including Cassandra) Professional Hadoop Solutions

 

    1. Over 150 recipes to design and optimize large-scale Apache Cassandra deployments. Cassandra High Performance Cookbook

 

    1. Cassandra aims to run on top of an infrastructure of hundreds of nodes (possibly spread across different data centers) Cassandra A Decentralized Structured Storage System

 

    1. The company looked at various options, including RDBMS, in-memory counters, and Cassandra, before settling on HBase. HBase in Action

 

    1. He closely works with Big Data technologies, such as Hadoop and Cassandra daily. Hadoop MapReduce Cookbook

 

    1. Custom IndexReader and IndexWriter to work on top of Cassandra. elasticsearch. The Road to a Distributed, (Near) Real Time, Search Engine

 

    1. Cassandra: The Definitive Guide, the image of a Paradise flycatcher, and related trade dress are trademarks of O’Reilly Media, Inc. Cassandra The Definitive Guide

 

    1. Get the best out of Cassandra using this efficient recipe bank. Hadoop Beginners Guide

 

    1. Programming Hive introduces Hive, an essential tool in the Hadoop ecosystem that provides an SQL (Structured Query Language) dialect for querying data stored in the Hadoop Distributed Filesystem (HDFS), other filesystems that integrate with Hadoop, such as MapR-FS and Amazon’s S3 and databases like HBase (the Hadoop database) and Cassandra. Programming Hive

 

 

31、单词 mapper 的匹配文本:

 

    1. Usually, each mapper is assigned a single block of a file as input (unless you have used certain compression codecs) Apache Flume Distributed Log Collection for Hadoop

 

    1. One of the examples of character mapper is HTML tags removal process. Mastering ElasticSearch

 

    1. The Map phase then invokes a user-defined function, the Mapper, on each element, independently and in parallel. FlumeJava - Easy, Efficient Data-Parallel Pipelines

 

    1. The mapper takes a column and breaks it into tokens (individual words) using StringTokenizer, a class that splits strings on common tokens such as spaces and columns. Cassandra High Performance Cookbook

 

    1. The code has three parts—mapper, reducer, and the main program. Hadoop MapReduce Cookbook

 

    1. The mapper splits the line into distinct words, and outputs each word (the key) along with the word's originating filename (the value) Hadoop in Practice

 

    1. Each phase is defined by a data processing function, and these functions are called mapper and reducer, respectively. Hadoop in Action

 

    1. Readers familiar with MapReduce programs will notice how familiar this mapper looks. Cassandra The Definitive Guide

 

    1. If small enough, the secondary table is pulled into the memory of each mapper / reducer process for inmemory lookups, which typically is the fastest method for joining. Tenzing - A SQL Implementation On The MapReduce Framework

 

    1. Note that each mapper simply outputs a series of individual key/value outputs; these are combined into a key and list of values in the shuffle method. Hadoop Beginners Guide
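
The quotes above describe how a mapper tokenizes its input and emits key/value pairs that are then grouped by key in the shuffle. As a minimal word-count style sketch against Hadoop's org.apache.hadoop.mapreduce API (the class name here is illustrative, not from any of the cited books):

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Minimal word-count style mapper: emits (word, 1) for every token in the input line.
public class TokenCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokenizer = new StringTokenizer(value.toString());
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, ONE);   // key/value pairs are grouped by key in the shuffle
        }
    }
}
```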

 

 

32、单词 filesystem 的匹配文本:

 

    1. The problem with HDFS and streaming data/logs HDFS isn't a real filesystem, at least not in the traditional sense, and many of the things we take for granted with normal filesystems don't apply here, for example being able to mount it. Apache Flume Distributed Log Collection for Hadoop

 

    1. Change to the actual mount point of a SAN filesystem. Monitoring with Ganglia

 

    1. For single documents, the request then gets passed to the database module where the data for the document is looked up on the filesystem and returned all the way up again. CouchDB The Definitive Guide

 

    1. Our access to the cluster was too brief to debug and characterize the middling performance from the 2.1.0 filesystem. Apache Hadoop YARN Yet Another Resource Negotiator

 

    1. In case of ElasticSearch it is usually better to use the new IO-based system store instead of the simple filesystem store. Mastering ElasticSearch

 

    1. Most of the software will be installed to various locations throughout the Linux filesystem, so where exactly you download and compile the applications is not important. Beginning CouchDB

 

    1. For example, a call to fileTree() might scan the filesystem at the time it is called, producing an immutable list that the build can then manipulate. Gradle Beyond The Basics

 

    1. At startup time, the HDFS NameNode reads filesystem metadata from a file called the fsimage file. Apache Hadoop Goes Realtime at Facebook

 

    1. This might mean optimizing the channels the database uses to write to the underlying filesystem. Cassandra The Definitive Guide

 

    1. The colon at the beginning of the project name indicates the root of the project tree, in a similar way that a forward slash indicates the root directory in a Unix filesystem, or a backslash indicates the root directory on Windows. Building and Testing with Gradle
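
Several quotes above treat HDFS as a filesystem abstraction rather than a traditional mountable filesystem. As a hedged illustration, the sketch below lists a directory through Hadoop's FileSystem API; it assumes the HDFS client configuration is on the classpath, and the path is a placeholder.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Lists the entries under an HDFS directory through the FileSystem abstraction.
public class ListHdfsDir {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();          // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);
        for (FileStatus status : fs.listStatus(new Path("/user/demo"))) {   // placeholder path
            System.out.println(status.getPath() + "\t" + status.getLen());
        }
        fs.close();
    }
}
```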

 

 

33、单词 json 的匹配文本:

 

    1. The format is also self-describing using JSON, making for a good long-term data storage format, as your data format may evolve over time. Apache Flume Distributed Log Collection for Hadoop

 

    1. In many cases, your application can use the JSON object structure directly without having to serialize your objects or data into the fixed format required by the database engine. Getting Started with CouchDB

 

    1. Case Study Greg Borenstein, a friend and coworker, built a small library for converting Songbird playlists to JSON objects and decided to store these in CouchDB as part of a backup application. CouchDB The Definitive Guide

 

    1. From the client point of view, document is a JSON object (see more about JSON format at http://en.wikipedia.org/wiki/JSON) Mastering ElasticSearch

 

    1. Chapter 9, Coding and Internals: This chapter covers programming recipes that go beyond the typical application programmer interface, including building Cassandra from source, creating custom types for use with Cassandra, and modifying tools like the JSON export tools. Cassandra High Performance Cookbook

 

    1. In the second part of this book, you will learn about Futon, CouchDB’s excellent web-based admin interface, and then be introduced to the JSON format, which is used to describe the data in CouchDB. Beginning CouchDB

 

    1. Replication can be run from Futon, CouchDB’s web administration console, or by sending a POST request to _replicate containing a JSON object with replication parameters. Scaling CouchDB

 

    1. It allows you to send data over HTTP in JSON to index, search, and manage your Elasticsearch cluster. Elasticsearch in Action MEAP V07 cHAPTER 1

 

    1. Any optional query string parameters, eg ?pretty will pretty-print the JSON response to make it easier to read. Elasticsearch The Definitive Guide

 

    1. It also has scripts for converting SSTables (the datafiles) to JSON and back. Cassandra The Definitive Guide
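
One of the quotes above notes that CouchDB replication can be triggered by POSTing a JSON object to _replicate. A minimal sketch using only the JDK's HttpURLConnection follows; the host and database names are placeholders, and both databases are assumed to exist.

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// POSTs a small JSON body to CouchDB's _replicate endpoint (host and database names are placeholders).
public class TriggerReplication {
    public static void main(String[] args) throws Exception {
        String json = "{\"source\":\"albums\",\"target\":\"albums-replica\"}";

        HttpURLConnection conn = (HttpURLConnection)
                new URL("http://127.0.0.1:5984/_replicate").openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(json.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("HTTP status: " + conn.getResponseCode());
        conn.disconnect();
    }
}
```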

 

 

34、单词 annotations 的匹配文本:

 

    1. Code annotations accompany many of the code listings and highlight important concepts. Gradle in Action

 

    1. Gradle scans the test classpath for all class files and checks if they have specific JUnit or TestNG annotations. Gradle Effective Implementation Guide

 

    1. To eliminate programmer annotations that help identify dependencies, we exploit the structure of MapReduce computations. Incoop MapReduce for Incremental Computations

 

    1. A plug-in that creates WSDL files from an annotated Java web service implementation should not include its own code for scanning for annotations and generating content in a specialized XML vocabulary, but should instead declare a dependency on an existing library to do that work, and provide a mechanism for that library to be fetched from an online repository if it is not already present on the build system. Gradle Beyond The Basics

 

    1. In commands and source code, we make extensive use of bold text; and annotations draw your attention to the important aspects of listings. HBase in Action

 

    1. Our instrumentation is restricted to a low enough level in the software stack that even largescale distributed systems like Google web search could be traced without additional annotations. Dapper a Large-Scale Distributed Systems Tracing Infrastructure

 

    1. Code annotations accompany many of the listings, highlighting important concepts. Hadoop in Action

 

    1. Kundera (Java ORM) Kundera is an object-relational mapping (ORM) implementation for Cassandra written using Java annotations. Cassandra The Definitive Guide

 

    1. Components of Hadoop that wish to produce metrics implement the MetricsSource interface or use a set of simple Java annotations; those that wish to receive and process metric data implement the MetricsSink interface. Hadoop Operations

 

    1. This facilitates compiled tests being able to reference JUnit or TestNG annotations or base classes. Building and Testing with Gradle
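
As a small illustration of the "Java annotations" the quotes above keep referring to, the sketch below declares a runtime-visible annotation and reads it back via reflection; the Metric annotation name is purely illustrative and not taken from any of the cited projects.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;

// Declares a simple runtime-visible annotation and reads it back via reflection.
public class AnnotationDemo {

    @Retention(RetentionPolicy.RUNTIME)
    @interface Metric {                       // illustrative name, not a real framework annotation
        String value();
    }

    @Metric("requests.count")
    public void countRequests() { }

    public static void main(String[] args) {
        for (Method m : AnnotationDemo.class.getDeclaredMethods()) {
            Metric metric = m.getAnnotation(Metric.class);
            if (metric != null) {
                System.out.println(m.getName() + " -> " + metric.value());
            }
        }
    }
}
```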

 

 

35、单词 servlet 的匹配文本:

 

    1. An example of such an exception type is the ServletException. Core Java Volume I Fundamentals 9th Edition

 

    1. Some of the additional modules that are available from this repository include modules for monitoring an Apache Web Server, MySQL database, and Xen virtual machine, as well as Tomcat and Jetty servlet monitoring through JMX. Monitoring with Ganglia

 

    1. As the name Servlet may already indicate, it serves a client request and constructs the response. Gradle in Action

 

    1. Finally, we will add a web project with a Java servlet class that uses classes from the services project. Gradle Effective Implementation Guide

 

    1. In a servlet or JavaServer Page (JSP) that you write in a SPNEGO-protected web application, you can get the user information passed in the HTTP Kerberos SPNEGO simply by calling getUserPrincipal() and getRemoteUser() on the HttpRequest object in the same way that you would get the principal information if the user used username/password authentication, or even digital certificate authentication. Professional Hadoop Solutions

 

    1. What’s interesting with the implementation of this servlet is that it redirects the actual file download to one of the DataNodes that contains the first block of the file. Hadoop in Practice

 

    1. Finally, coprocessors can be chained, very similar to what the Java Servlet API does with request filters. HBase:The Definitive Guide

 

    1. Similarly, as servlet authors need not implement the low-level details of socket I/O, event handling loops, and complex thread coordination, MapReduce developers program to a well-defined, simple interface and the “container” does the heavy lifting. Hadoop Operations

 

    1. The first type of application server implements the Java Servlet specification (http://mng.bz/cmMj) and is known as a web container. ActiveMQ in Action

 

    1. Most socket-based web services (for example, HTTP daemons, servlet engines, and application servers) are multithreaded. Concurrent Programming in Java Design Principles and Pattern
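
In the spirit of the SPNEGO quote above, which calls getRemoteUser() on the request, here is a minimal servlet sketch against the standard javax.servlet API; the class name and response text are illustrative.

```java
import java.io.IOException;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Responds to GET requests with the authenticated user name (if any).
public class WhoAmIServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        resp.setContentType("text/plain");
        String user = req.getRemoteUser();          // null when the request is unauthenticated
        resp.getWriter().println("remote user: " + user);
    }
}
```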

 

 

36、单词 jdbc 的匹配文本:

 

    1. Using the JDBC channel is a bad idea as it would introduce a bottleneck and single point of failure instead of what should be designed as a highly distributed system. Apache Flume Distributed Log Collection for Hadoop

 

    1. I don’t want to bore you with repetitive details of a class that uses JDBC, so I won’t discuss the code in detail. Gradle in Action

 

    1. Instead of moving data between clusters, Sqoop was designed to move data from and into relational databases using a JDBC driver to connect. Hadoop RealWorld Solutions Cookbook

 

    1. Alternatively, you can place the database in a directory outside of your development directory and edit the JDBC URL to point to the correct path, but this step is left as an exercise for the reader. Gradle Beyond The Basics

 

    1. Discussion MapReduce contains DBInputFormat and DBOutputFormat classes, which can be used to read and write data from databases via JDBC. Hadoop in Practice

 

    1. Conceptually this is analogous to using JDBC to execute SQL queries. Hadoop in Action

 

    1. Instead of just abstracting the database interactions from the developer’s point of view, the way JDBC does, an entirely different mechanism is used. Cassandra The Definitive Guide

 

    1. You can of course use any RDBMS for which a JDBC driver is available, but if you do so, you'll need to modify the aspects of this chapter that require direct interaction with the database server. Hadoop Beginners Guide

 

    1. For even greater performance, Sqoop supports database-specific plug-ins that use native features of the RDBMS rather than incurring the overhead of JDBC. Hadoop Operations

 

    1. For example, ActiveMQ offers its own style of ultra-fast message persistence via KahaDB, but also supports standard JDBC-accessible databases. ActiveMQ in Action
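
For reference, a minimal plain-JDBC query sketch: the connection URL, credentials, and table are placeholders, and the matching driver is assumed to be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Runs a parameterized query over JDBC; the URL, credentials, and table name are placeholders.
public class JdbcQueryDemo {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:mysql://localhost:3306/demo";
        try (Connection conn = DriverManager.getConnection(url, "demo", "secret");
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT id, name FROM users WHERE id > ?")) {
            ps.setLong(1, 100L);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getLong("id") + "\t" + rs.getString("name"));
                }
            }
        }
    }
}
```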

 

 

37、单词 parser 的匹配文本:

 

    1. If there is an error message in a log file that indicates that Jetty could not start its web server, there is a nonvalidating XML parser in the classpath ahead of the validating XML parser that Hadoop Core supplies. Pro Hadoop

 

    1. A typical example of a transitive dependency would be the XML parser library Xerces that requires the XML APIs library to function correctly. Gradle in Action

 

    1. In the dialog window with task types, we select JUnit Parser from the Tests section. Gradle Effective Implementation Guide

 

    1. Lucene query language Some of the query types provided by ElasticSearch support Apache Lucene query parser syntax. Mastering ElasticSearch

 

    1. Since the XML is pretty simple, we usually avoid using a full-blown XML parser and just parse it with some string operations in our Java code. MapReduce Design Patterns

 

    1. This is a perfect job for a Copy task with a filter attached to it, but that filter has a lot of work to do! It has to be a full-featured Markdown parser and HTML renderer. Gradle Beyond The Basics

 

    1. The XML element in Text form has been supplied, so you’ll need to use an XML parser to extract content from the XML. Hadoop in Practice

 

    1. It just calls the parser’s parse() method, which parses the fields of interest from a line of input, checks whether. Hadoop:The Definitive Guide 3rd Edition

 

    1. Besides having added components like a crawler and a parser, a web search engine differs from a basic document search engine in terms of scale. Hadoop in Action

 

    1. Hive processes the above SQL query in a series of phases: (1) The parser transforms the query into an Abstract Syntax Tree. HadoopDB An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads
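
Several quotes above mention pulling fields out of XML with a parser. The sketch below uses the JDK's built-in DOM parser; the input file and element name are placeholders.

```java
import java.io.File;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Parses a small XML file with the JDK's DOM parser and prints the text of each <title> element.
public class XmlParseDemo {
    public static void main(String[] args) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new File("books.xml"));   // placeholder input file
        NodeList titles = doc.getElementsByTagName("title");
        for (int i = 0; i < titles.getLength(); i++) {
            System.out.println(titles.item(i).getTextContent());
        }
    }
}
```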

 

 

38、单词 activemq 的匹配文本:

 

    1. Right now there are many different pieces of software designed specifically for task queues (ActiveMQ, RabbitMQ, Gearman, Amazon SQS, and others), but there are also ad hoc methods of creating task queues in situations where queues aren’t expected. Redis in Action

 

    1. We selected ActiveMQ as a potential system for the message queue. Building LinkedIn’s Real-time Activity Data1

 

    1. Setting up an ActiveMQ server and using JMS with Spring are discussed in the following sections. Pro Spring 3

 

    1. ActiveMQ will be supported for many more years to come! Next Generation Open Source Messaging with Apollo

 

    1. Options like ActiveMQ, ZeroMQ, and Apache Qpid all providing different open source approaches to message queuing. RabbitMQ in Action

 

    1. We’ll need to install the Sun Java Runtime, get the ActiveMQ package, and configure it. Managing Infrastructure with Puppet

 

    1. Puppet queue uses the ActiveMQ middleware service to handle message passing and queuing. Pro Puppet

 

    1. Kafka can be compared with Scribe or Flume as it is useful for processing activity stream data; but from the architecture perspective, it is closer to traditional messaging systems such as ActiveMQ or RabbitMQ. Apache Kafka

 

    1. ActiveMQ and RabbitMQ don’t seem to have an easy way to batch messages and we assume that it used a batch size of 1 Kafka a Distributed Messaging System for Log Processing

 

    1. ActiveMQ was developed to provide an open source alternative: its central theme is to put the power in the hands of developers. ActiveMQ in Action

 

 

39、单词 jms 的匹配文本:

 

    1. Performance test of different server types including web (HTTP and HTTPS), SOAP, database, LDAP, JMS, mail, and native commands or shell scripts. Performance Testing With JMeter 2.9

 

    1. This specification a JMS provider that implements both point-to-point messaging as well as. Java EE 7 Specification

 

    1. In addition, several JMS interfaces implement Autocloseable and thus are automatically closed after use. Java EE 7 Essentials

 

    1. Since it is not a required component of Profile, Web Profile products are not required to include an implementation of JMS, nor do they have to support other JMS-related. Java EE 7 Web Profile Specification

 

    1. At first, it seems like a thoughtful and elegant design to ensure overall durability of the database, and appears unproblematic because it is familiar from many distributed computing paradigms, such as Java Message Service (JMS) Cassandra The Definitive Guide

 

    1. By far, the most commonly used appenders write log events to disk, but appenders for outputting log events to the console, sending data to syslog, or even to JMS exist. Hadoop Operations

 

    1. You can do this using an underlying messaging protocol, such as JMS, or independently of the transport protocol using the design of the Web Service. Java Performance Tuning 2nd Edition

 

    1. Although ActiveMQ implements the JMS specification, which has not moved in many years, ActiveMQ provides many features beyond this spec and it has not stopped innovating. ActiveMQ in Action

 

    1. However, for asynchronous interactions, say, a Web service or JMS client, the rate at which messages are injected into the system, injection rate, may be the more important criteria. Java Performance

 

    1. For different approaches, see for example the Aleph, JMS, and Ninja packages, accessible via links from the online supplement. Concurrent Programming in Java Design Principles and Pattern
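
A minimal JMS producer sketch, assuming a local ActiveMQ broker on the default port; the queue name is a placeholder, and the code uses the standard javax.jms API plus ActiveMQ's connection factory.

```java
import javax.jms.Connection;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;

import org.apache.activemq.ActiveMQConnectionFactory;

// Sends one text message to a queue on a local ActiveMQ broker.
public class JmsProducerDemo {
    public static void main(String[] args) throws Exception {
        ActiveMQConnectionFactory factory =
                new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        connection.start();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Queue queue = session.createQueue("demo.queue");          // placeholder queue name
            MessageProducer producer = session.createProducer(queue);
            TextMessage message = session.createTextMessage("hello from JMS");
            producer.send(message);
        } finally {
            connection.close();
        }
    }
}
```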

 

 

40、单词 tika 的匹配文本:

 

    1. Apache Tika, a project in itself extends capabilities of Apache Solr to run on top of different types of files. Scaling Big Data with Hadoop and Solr Chapter-02 Understanding Solr

 

    1. Chris began to advocate for Tika as a standalone project in 2006 Tika in Action

 

    1. Tika makes a variety of metadata available about documents; this example just used two. Solr 3 Enterprise Search Server

 

    1. To do that, we either need to parse the data in some external application or setup Solr to use Apache Tika. Solr 3.1 Cookbook

 

    1. The fetch function makes use of the Apache Tika library (http://tika.apache.org/) to fetch and extract the contents of the document: Storm Real-Time Processing Cookbook

 

    1. With Solr this is easy because it integrates with Apache Tika project that supports most popular document formats. Solr in action

 

    1. It also does not provide any document filters, although Lucene has a sister project at Apache, Tika, which handles document filtering very well. Lucene in Action 2nd Edition

 

    1. To do that, we either need to parse the data in some external application or set up Solr to use Apache Tika. Apache Solr 4 Cookbook
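
The quotes above describe Tika extracting the contents of arbitrary documents. A minimal sketch with Tika's facade class follows; the input file is a placeholder.

```java
import java.io.File;

import org.apache.tika.Tika;

// Extracts plain text from an arbitrary document (PDF, Word, HTML, ...) with the Tika facade.
public class TikaExtractDemo {
    public static void main(String[] args) throws Exception {
        Tika tika = new Tika();
        String text = tika.parseToString(new File("report.pdf"));   // placeholder input file
        System.out.println(text);
    }
}
```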

 

 

41、单词 configuring 的匹配文本:

 

    1. So, let us start with the first step, downloading and configuring Flume. Apache Flume Distributed Log Collection for Hadoop

 

    1. Configuring and Building CouchDB Unless you have specific requirements, configure will probably work everything out for you and you can simply run: Getting Started with CouchDB

 

    1. Configuring CouchApp with .couchapprc If you don’t want to have to put the full URL (potentially including authentication parameters) of your database onto the command line each time you push, you can use the .couchapprc file to store deployment settings. CouchDB The Definitive Guide

 

    1. Now let's look at the possibilities of configuring the behavior of query rewriting. Mastering ElasticSearch

 

    1. However, there are a few subtle things to keep in mind when configuring them. Cassandra High Performance Cookbook

 

    1. Configuring CouchDB When you first build or install CouchDB on your system, there is zero configuration required to start creating CouchDB databases. Beginning CouchDB

 

    1. Instead, we will focus on installing and configuring the Apache HTTP Server as a load balancer. Scaling CouchDB

 

    1. Additionally, configuring analyzers at node level requires changing the config file on every node and restarting every node which becomes a maintenance nightmare. Elasticsearch The Definitive Guide

 

    1. Chapter 6, Configuring Cassandra This chapter shows you how to specify partitioners, replica placement strategies, and snitches. Cassandra The Definitive Guide

 

    1. In part 2, you’ll learn about configuring various aspects of ActiveMQ for connectivity, message persistence, and security. ActiveMQ in Action

 

 

42、单词 namespace 的匹配文本:

 

    1. To add properties, you’re required to use the ext namespace. Gradle in Action

 

    1. Part of this long iteration phase was due to a slow realization that Spanner should do more than tackle the problem of a globally-replicated namespace, and should also focus on database features that Bigtable was missing. Spanner - Google’s Globally-Distributed Database

 

    1. The Ceph file system has three main components: the client, each instance of which exposes a near-POSIX file system interface to a host or process; a cluster of OSDs, which collectively stores all data and metadata; and a metadata server cluster, which manages the namespace (file names and directories) while coordinating security, consistency and coherence (see Figure 1) Ceph A Scalable, High-Performance Distributed File System

 

    1. The basics: Planet-scale structured storage Next generation of Bigtable stack Provides a single, location-agnostic namespace Manual and access-based data placement. Storage Architecture and Challenges

 

    1. In a different scenario, you might decide to “namespace” the task names by prefixing them with lb or liquibase to keep them from colliding with tasks from other plug-ins, but for our purposes here we can keep the task names short and simple. Gradle Beyond The Basics

 

    1. Actually, in Elasticsearch, our data is stored and indexed in shards, while an index is just a logical namespace which groups together one or more shards. Elasticsearch The Definitive Guide

 

    1. The keyspace is essentially just a logical namespace to hold column families and certain configuration properties. Cassandra The Definitive Guide

 

    1. The following listing demonstrates how to configure ActiveMQ using its custom Spring schema via an XML namespace. ActiveMQ in Action

 

    1. Chubby provides a namespace that consists of directories and small files. Bigtable - A Distributed Storage System for Structured Data

 

    1. Gradle brings in everything from the Ant namespace into the Gradle namespace such that using a core Ant task is as easy as calling ant. Building and Testing with Gradle

 

 

43、单词 www 的匹配文本:

 

    1. Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub. Apache Flume Distributed Log Collection for Hadoop

 

    1. Find us on Facebook: http://facebook.com/oreilly Follow us on Twitter: http://twitter.com/oreillymedia Watch us on YouTube: http://www.youtube.com/oreillymedia. Getting Started with CouchDB

 

    1. Dear reader, refer to http://www.elasticsearchserverbook.com from time to time, where you'll be able to find the newest errata dedicated to the book and additional articles extending it. Mastering ElasticSearch

 

    1. Downloading the example code for this book You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. Cassandra High Performance Cookbook

 

    1. For information on translations, please e-mail [email protected], or visit http://www.apress.com. Beginning CouchDB

 

    1. For more information about our books, courses, conferences, and news, see our website at http://www.oreilly.com. Scaling CouchDB

 

    1. Typically, you'll download the package from http://www.elasticsearch.org/download/, which is the best fit for your environment. Elasticsearch in Action MEAP V07 cHAPTER 1

 

    1. Finding Out More If you’d like to find out more about Cassandra, and to get the latest updates, visit this book’s companion website at http://www.cassandraguide.com. Cassandra The Definitive Guide

 

    1. If you don’t have the Sun J2SE installed and you’re using Linux, Solaris, or Windows, download and install it from the following URL: http://www.oracle.com/technetwork/java/javase/downloads/index.html. ActiveMQ in Action

 

    1. Irrelevant details were elided to keep the example short.) The call to Apply performs an atomic mutation to the Webtable: it adds one anchor to www.cnn.com and deletes a different anchor. Bigtable - A Distributed Storage System for Structured Data

 

 

44、单词 jrockit 的匹配文本:

 

    1. I remember quite clearly the first time I met the JRockit team. Oracle JRockit The Definitive Guide

 

    1. Work with the JRockit Mission Control 3.1/4.0 tools suite to debug or profile your Java applications. Java 7 Concurrency Cookbook

 

 

45、单词 linux 的匹配文本:

 

    1. You'd be hard pressed to find such a company using the bleeding-edge Linux kernel rather than something like Red Hat Enterprise Linux (RHEL), CentOS, Ubuntu LTS, or any of the other distributions that aim for stability and compatibility. Apache Flume Distributed Log Collection for Hadoop

 

    1. Install using the native packages for your chosen Linux platform. Getting Started with CouchDB

 

    1. Debian GNU/Linux You can install the CouchDB package by running: CouchDB The Definitive Guide

 

    1. In this second case, called EC2, Amazon provides server instances and of course they can be used for installing and running the ElasticSearch cluster (among many other things as those are normal Linux machines) Mastering ElasticSearch

 

    1. He has experience as a developer as well as a Linux and network administrator and enjoys the rich world of Open Source software. Cassandra High Performance Cookbook

 

    1. A user space approach, aside from providing greater flexibility and easier implementation, also avoids cumbersome interaction with the Linux VFS and page cache, both of which were designed for a different interface and workload. Ceph A Scalable, High-Performance Distributed File System

 

    1. Next, I provide you with a step-by-step guide to installing CouchDB on various distributions of the Linux operating system as well as on a Mac, before guiding you through the creation of your first database. Beginning CouchDB

 

    1. Managing scalability in object storage systems for HPC Linux clusters. RADOS A Scalable, Reliable Storage Service for Petabyte-scale Storage Clusters.

 

    1. On Linux, GZip extraction utilities should be preinstalled; on Windows, you’ll need to get a program such as WinZip, which is commercial, or something like 7-Zip, which is freeware. Cassandra The Definitive Guide

 

    1. If you don’t have the Sun J2SE installed and you’re using Linux, Solaris, or Windows, download and install it from the following URL: http://www.oracle.com/technetwork/java/javase/downloads/index.html. ActiveMQ in Action

 

 

46、单词 jpa 的匹配文本:

 

    1. This technique is helpful for defining Scala classes that work with the Java Persistence API (JPA) Scala in Depth

 

    1. Hibernate implements the Java Persistence API (JPA), and by using the JPA annotation Entity you’re specifying Hibernate to persist the object to the database. Scala in Action

 

    1. The support for Hibernate, MyBatis, JDO, and JPA makes already simple APIs even simpler, thus easing the burden on developers. Pro Spring 3

 

    1. This work can be done by hand. The Java Persistence Architecture (JPA) allows you to annotate a Java object and use JPA to handle the serialization and deserialization automatically. Cassandra High Performance Cookbook

 

    1. Often these objects are backed by a relational database of some type, and you manage them through object relational mapping tools such as Hibernate, JPA, or JDO. Solr 3 Enterprise Search Server

 

    1. This causes problems when using JPA annotations in Scala code, for example, as discussed in [JPAScala] Programming Scala

 

    1. JPA/JDO You can use, for example, DataNucleus to put a JPA/JDO access layer on top of HBase. HBase:The Definitive Guide

 

    1. The Java Persistence API (JPA) is defined as JSR 338, and the complete specification can be downloaded. Java EE 7 Essentials

 

    1. Note: The JPA specification is a separate specification referenced in the EJB 3.0 specification.) Session beans usually implement the core business logic of an enterprise application and represent the interactive session. Java Performance

 

    1. Those of you that are new to JPA can rest assured that you were not forgotten. Pro JPA2
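
To make the "annotate a Java object with @Entity" quotes above concrete, here is a minimal JPA entity sketch using the standard javax.persistence annotations; the class and field names are illustrative.

```java
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;

// A minimal JPA entity: the provider (Hibernate, EclipseLink, ...) maps it to a table.
@Entity
public class Album {

    @Id
    @GeneratedValue
    private Long id;

    private String title;

    protected Album() { }                 // JPA requires a no-arg constructor

    public Album(String title) {
        this.title = title;
    }

    public Long getId()      { return id; }
    public String getTitle() { return title; }
}
```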

 

 

47、单词 rabbitmq 的匹配文本:

 

    1. Other sinks are available on the Internet that can be used to write data to MongoDB, Cassandra, RabbitMQ, Redis, and just about any other data store you can think of. Apache Flume Distributed Log Collection for Hadoop

 

    1. RabbitMQ and Gearman21 can be very helpful for some purposes, too. High Performance MySQL 3rd Edition

 

    1. This is usually implemented with RabbitMQ today, but could be any AMPQ message queue (such as Apache Qpid), or Zero MQ. bk-compute-adminguide-grizzly

 

    1. The easiest way to create a failure-tolerant message queue is probably a combination of RabbitMQ and Hovercraft. CouchDB The Definitive Guide

 

    1. This is currently implemented with RabbitMQ today, but theoretically could be any AMPQ message queue supported by the Python ampqlib and carrot libraries. Deploying OpenStack

 

    1. Usually implemented with RabbitMQ, but could be any AMPQ message queue, such as Apache Qpid or Zero MQ. bk-admin-guide-cloud-latest

 

    1. An OpenStack highavailability configuration uses existing native Pacemaker RAs (such as those managing MySQL databases or virtual IP addresses), existing third-party RAs (such as for RabbitMQ), and native OpenStack RAs (such as those managing the OpenStack Identity and Image Services) high-availability-guide

 

    1. ActiveMQ and RabbitMQ don’t seem to have an easy way to batch messages and we assume that it used a batch size of 1 Kafka a Distributed Messaging System for Log Processing

 

    1. There are lots of different queues you could use: Amazon’s SQS, RabbitMQ, or even a MongoDB capped collection (although make sure it’s on a separate server than the cluster it’s protecting) Scaling MongoDB

 

    1. A dedicated VM was created for each service (nova-scheduler, rabbitmq, database etc) OpenStack Operations Guide

 

 

48、单词 concurrency 的匹配文本:

 

    1. By reducing latency for a given system (and that is true not only for storage systems), you affect concurrency and throughput capabilities. CouchDB The Definitive Guide

 

    1. Like MapReduce, Dryad handles the details of communication, partitioning, placement, concurrency and fault tolerance. FlumeJava - Easy, Efficient Data-Parallel Pipelines

 

    1. This architecture is designed to control resource utilization in a high concurrency environment. Cassandra High Performance Cookbook

 

    1. In addition to the actual data persistence component, the system needs to have the following characteristics; scalable and robust solutions for load balancing, membership and failure detection, failure recovery, replica synchronization, overload handling, state transfer, concurrency and job scheduling, request marshalling, request routing, system monitoring and alarming, and configuration management. Cassandra A Decentralized Structured Storage System

 

    1. Add to this the fact that CouchDB is developed in Erlang OTP, a fault-tolerant programming language that offers excellent concurrency features, and you know that your CouchDB database will scale well without a loss of reliability and availability. Beginning CouchDB

 

    1. Transaction, distribution, and concurrency boundaries are also used in domain-driven design when defining Aggregates. Scaling CouchDB

 

    1. Pessimistic concurrency control, widely used by relational databases, assumes that conflicting changes are likely to happen and so blocks access to a resource in order to prevent conflicts. Elasticsearch The Definitive Guide

 

    1. Cassandra, however, is different: its concurrency model is based on SEDA, so a single operation may start with one thread, which then hands off the work to another thread, which may hand it off to other threads. Cassandra The Definitive Guide

 

    1. A note on multithreading in JMS applications The JMS spec specifically defines concurrency for various objects in the JMS API and requires that only a few objects support concurrent access. ActiveMQ in Action

 

    1. As a result, concurrency control over rows can be implemented very efficiently. Bigtable - A Distributed Storage System for Structured Data
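
The pessimistic-versus-optimistic distinction quoted above can be illustrated in plain Java: a synchronized method locks up front, while a compare-and-set loop retries only when another thread actually interfered. This is a generic sketch, not code from any of the cited systems.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Contrasts pessimistic locking (synchronized) with optimistic compare-and-set retries.
public class CounterDemo {

    private int lockedCount;                         // guarded by the intrinsic lock on "this"
    private final AtomicInteger casCount = new AtomicInteger();

    // Pessimistic: every caller takes the lock, even when there is no contention.
    public synchronized void incrementLocked() {
        lockedCount++;
    }

    // Optimistic: read, compute, and retry only if another thread changed the value meanwhile.
    public void incrementOptimistic() {
        int current;
        do {
            current = casCount.get();
        } while (!casCount.compareAndSet(current, current + 1));
    }
}
```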

 

 

49、单词 frameworks 的匹配文本:

 

    1. Designs that incorporate single points of failure are generally to be avoided in large application frameworks where it can be safely assumed, given the sheer amount of hardware involved, that some percentage of nodes are always going to be on the fritz. Monitoring with Ganglia

 

    1. We simulate a single resource manager and two scheduler frameworks, one handling batch jobs and one handling service jobs. Omega - flexible, scalable schedulers for large compute clusters

 

    1. A common pattern in most languages, frameworks, and databases is using types to distinguish between subsets of your data. CouchDB The Definitive Guide

 

    1. Explore the versatility of Spring Python by integrating it with frameworks, libraries, and tools. Mastering ElasticSearch

 

    1. Our books and publications share the experiences of your fellow IT professionals in adapting and customizing today's systems, applications, and frameworks. Cassandra High Performance Cookbook

 

    1. He introduced the concept of JavaScript frameworks and Ajax to the company’s web division in 2008, developing a PL/SQL and JavaScript framework that is now used in every Core web product. Beginning CouchDB

 

    1. We plan to instrument non-RPC communication frameworks as they evolve and find a user base. Dapper a Large-Scale Distributed Systems Tracing Infrastructure

 

    1. And if you’re an application developer, you’ll no doubt be familiar with the many object-relational mapping (ORM) frameworks that have sprung up in recent years to help ease the difficulty in mapping application objects to a relational model. Cassandra The Definitive Guide

 

    1. These powerful but complex frameworks are designed for software engineers implementing complex parallel algorithms. Tenzing - A SQL Implementation On The MapReduce Framework

 

    1. The first is the simple testing of Java classes with existing test frameworks like JUnit and TestNG. Building and Testing with Gradle

 

 

50、单词 subclass 的匹配文本:

 

    1. If there are multiple requests to create the same vertex in the same superstep, then by default the system just picks one arbitrarily, but users with special needs may specify a better conflict resolution policy by defining an appropriate handler method in their Vertex subclass. Pregel  -  A System for Large-Scale Graph Processing

 

    1. A TreeMap is a subclass of Map that sorts on key. MapReduce Design Patterns

 

    1. No methods are required for implementation; however, in order for the class to function at Hive runtime as a UDF, your subclass must override evaluate() Hadoop RealWorld Solutions Cookbook

 

    1. That Coordinate is translated into a Point, a subclass of Geometry, using the same GeometryFactory class you used earlier. HBase in Action

 

    1. To use the TwoDArrayWritable type as the value type of a reducer's input, you need to specify the type of the stored values by creating a subclass of  the TwoDArrayWritable type similar to the ArrayWritable type. Hadoop MapReduce Cookbook

 

    1. The big question is, does Avro play nicely with MapReduce? Avro comes with some mapper and reducer classes that you can subclass to work with Avro. Hadoop in Practice

 

    1. This is the approach that GenericWritable takes, and you have to subclass it to specify which types to support. Hadoop:The Definitive Guide 3rd Edition

 

    1. Who wants to worry about how files are divided into splits ? In creating your own InputFormat class you should subclass the FileInputFormat class, which takes care of file splitting. Hadoop in Action

 

    1. For our own mapper implementations, we will subclass this base class and override the specified method as follows: Hadoop Beginners Guide

 

    1. Hadoop MapReduce (specifically jobs that use input formats that subclass FileInputFormat) wind up with a map task for each block of the file(s) processed as part of the job.1 This means the file block size can significantly impact the efficiency of a MapReduce job for better or worse. Hadoop Operations
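
One quote above notes that a Hive UDF works by subclassing a base class and supplying an evaluate() method. A minimal sketch against Hive's legacy UDF base class follows; the function itself (upper-casing text) is illustrative.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// A trivial Hive UDF: subclasses UDF and provides an evaluate() method that upper-cases its input.
public class UpperCaseUDF extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().toUpperCase());
    }
}
```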

 

 

51、单词 boolean 的匹配文本:

 

    1. The edge value has type void (edges carry no information), and the messages are boolean. Pregel  -  A System for Large-Scale Graph Processing

 

    1. The rereduce is a boolean value that is used to indicate when the reduce function is being called by itself (i.e., in rereduce mode) Getting Started with CouchDB

 

    1. In particular, boolean values can be set using yes, true, and on for a positive value and their opposites, no, false, and off for a negative value. Monitoring with Ganglia

 

    1. A query can also contain Boolean operators that connect terms to each other forming clauses. Mastering ElasticSearch

 

    1. Boolean: This is a logical data type having one of two values: true or false. Beginning CouchDB

 

    1. Within a special construct, called a when statement, one defines a quantifier, a variable, and a boolean condition using the variable. Interpreting the Data Parallel Analysis with Sawzall

 

    1. We are planning to add simple features supported by commercial search engines like boolean operators. The anatomy of a large-scale hypertextual Web search engine

 

    1. A key is the name of a field or property, and a value can be a string, a number, a boolean, another object, an array of values, or some other specialized type such as a string representing a date or an object. Elasticsearch The Definitive Guide

 

    1. These are number, Unicode string, boolean, array, object, and null. Cassandra The Definitive Guide

 

    1. Methods are provided for working with many primitive Java types for header values including Boolean, byte, short, int, long, float, double, and also the String object type. ActiveMQ in Action
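
To illustrate "Boolean operators that connect terms to each other forming clauses", here is a small sketch with Lucene's classic (pre-5.x) BooleanQuery API; the field and term values are placeholders.

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

// Builds "content MUST contain 'hadoop' AND MUST NOT contain 'deprecated'" from two term clauses.
public class BooleanQueryDemo {
    public static void main(String[] args) {
        BooleanQuery query = new BooleanQuery();
        query.add(new TermQuery(new Term("content", "hadoop")), BooleanClause.Occur.MUST);
        query.add(new TermQuery(new Term("content", "deprecated")), BooleanClause.Occur.MUST_NOT);
        System.out.println(query);   // prints +content:hadoop -content:deprecated
    }
}
```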

 

 

52、单词 permissions 的匹配文本:

 

    1. Instead, create a CouchDB user (couchdb) and set the ownership and permissions of the CouchDB database and support files to the configured user. Getting Started with CouchDB

 

    1. Does the Apache error log contain any errors? Look for errors about file permissions, missing PHP modules, and the like. Monitoring with Ganglia

 

    1. The scheduler can see the entire state of the cell and has complete freedom to lay claim to any available cluster resources provided it has the appropriate permissions and priority – even ones that another scheduler has already acquired. Omega - flexible, scalable schedulers for large compute clusters

 

    1. The Gradle wrapper also takes care of unpacking the distribution and setting the appropriate permissions to execute the batch file. Gradle in Action

 

    1. Ensure the cassandra user has ownership of this directory and the directory has appropriate file access permissions: Cassandra High Performance Cookbook

 

    1. I will now show how to change to the source subdirectory inside this new directory and change the permissions of a few files to make sure they are ready for the build process: Beginning CouchDB

 

    1. To allow the files in different directories to be served from different Chubby masters, we do not expose operations that can move files from one directory to another, we do not maintain directory modified times, and we avoid path-dependent permission semantics (that is, access to a file is controlled by the permissions on the file itself rather than on directories on the path leading to the file) Chubby - The lock service for loosely-coupled distributed systems

 

    1. Finally, create a couple of directories for Cassandra to store its data and logs, and give them the proper permissions, as shown here: Cassandra The Definitive Guide

 

    1. Change the permissions on the file to be accessible only by the owner. Hadoop Beginners Guide

 

    1. For example, does user X have the necessary permissions to execute program Y on system Z? Such privileges are often referred to as access control lists (ACLs) and determine who or what can access a given resource to perform a given operation. ActiveMQ in Action
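
The "accessible only by the owner" advice above corresponds to chmod 600; a minimal Java sketch with the java.nio.file POSIX permission API follows (it assumes a POSIX filesystem, and the path is a placeholder).

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// Restricts a file to owner read/write only (the equivalent of chmod 600 on POSIX systems).
public class OwnerOnlyDemo {
    public static void main(String[] args) throws Exception {
        Path keyFile = Paths.get("conf/cluster.key");            // placeholder path
        Set<PosixFilePermission> ownerOnly = PosixFilePermissions.fromString("rw-------");
        Files.setPosixFilePermissions(keyFile, ownerOnly);
    }
}
```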

 

 

53、单词 roo 的匹配文本:

 

    1. A special thanks goes out to Raymond De Roo, Brian Wohlgemuth, David Calafrancesco, Tera Doty, Jay Rubin, Bill Catlan, Anthony Howe, Mark O’Neal, George Montgomery, George Barber, and the myriad other people who patiently listened to me gripe about things, let me bounce ideas off them to see whether an outsider could understand what I was trying to say, or just managed to bring a smile to my face when I needed it most. High Performance MySQL 3rd Edition

 

    1. In each new version, more features are being added, such as Groovy scripting language support, Spring Roo support, and SpringSource tcServer (an application server with paid support offered by SpringSource that was built on top of the Tomcat server) support. Pro Spring 3

 

    1. Ruby on Rails has been a particularly rich source of such ideas—Spring Roo and Grails are essentially Rails clones, bringing Rails-like development to Java and Groovy, respectively. Spring in Practice

 

    1. Spring Roo and GWT If you are new to using GWT or just want a rapid way to create GWT applications, Spring Roo provides scaffolding support for GWT and simplifies integrating Spring and GWT. Spring Security 3.1

 

    1. Later, we will build some advanced data-driven tests using different data sources in JUnit and a data-driven test in Ruby using Roo, and Python. Selenium Testing Tools Cookbook

 

    1. Our pain is your gain, and that includes all of our hard work with code that was written the night before, identifying bugs for the Roo team to fix, and working with the fantastic community of readers we have in Manning’s MEAP program, aligned as well with completing the manuscript around the time of the Roo 1.2.1 release. Spring Roo in Action

 

    1. The emphasis on practical development is also evident in the chapter on Spring Roo, the rapid application development (RAD) solution from the Spring team. Spring Data Modern Data Access for Enterprise Java

 

 

54、单词 apis 的匹配文本:

 

    1. Megastore emphasizes cost-transparent APIs with runtime costs that match application developers' intuitions. Megastore - Providing Scalable, Highly Available Storage for Interactive Services

 

    1. First, one of the major components of gmond is the APR library, a cross-platform interface intended to provide a set of APIs to common platform functionality in a common and predictable manner. Monitoring with Ganglia

 

    1. In this section, we’ll take a look at the function APIs, and talk about how functions in a design document are related within applications. CouchDB The Definitive Guide

 

    1. In addition to that, it explains the batch APIs available in ElasticSearch and shows how to use filters to optimize your queries. Mastering ElasticSearch

 

    1. The goal of FlumeJava is to allow a programmer to express his or her data-parallel computation in a clear, modular way, while simultaneously executing it with performance approaching that of the best possible hand-optimized programs written directly against MapReduce APIs. FlumeJava - Easy, Efficient Data-Parallel Pipelines

 

    1. Many web services and feed providers are making their APIs available with the option to return a response in JSON format. Beginning CouchDB

 

    1. The resulting system also includes code to collect traces, tools to visualize them, and libraries and APIs (Application Programming Interfaces) to analyze large collections of traces. Dapper a Large-Scale Distributed Systems Tracing Infrastructure

 

    1. In this chapter we will discuss the APIs that we use to create, retrieve, update. Elasticsearch The Definitive Guide

 

    1. Although they are often not stored directly in their “native” format, developers can work with the documents using APIs as if they were. Cassandra The Definitive Guide

 

    1. Although ActiveMQ is written in Java, APIs for many languages other than Java are provided, including C/C++, .NET, Perl, PHP, Python, Ruby, and many more. ActiveMQ in Action

 

 

55、单词 asynchronous 的匹配文本:

 

    1. This new asynchronous design leaves the input thread blissfully unaware of any downstream problem. Apache Flume Distributed Log Collection for Hadoop

 

    1. Use this in cases where you don’t trust user input or where you need to trigger an asynchronous job. CouchDB The Definitive Guide

 

    1. The merge process is asynchronous and in general, should not interfere with indexing and searching, from a Lucene point of view. Mastering ElasticSearch

 

    1. With consistency thus relaxed, clients can buffer small writes and submit only large, asynchronous writes to OSDs; the only latency seen by applications will be due to clients which fill their caches waiting for data to flush to disk. Ceph A Scalable, High-Performance Distributed File System

 

    1. The system can be configured to perform either synchronous or asynchronous writes. Cassandra A Decentralized Structured Storage System

 

    1. You could then leverage these languages to output the data into JSON, which can be returned to JavaScript as the response to an asynchronous HTTP request. Beginning CouchDB

 

    1. When computation is deferred or made asynchronous, most Google developers use a common control flow library to construct callbacks and schedule them in a thread pool or other executor. Dapper a Large-Scale Distributed Systems Tracing Infrastructure

 

    1. Ironically, such programmers are usually wrong, especially when they use locks in a distributed system; few consider the effects of independent machine failures on locks in a system with asynchronous communications. Chubby - The lock service for loosely-coupled distributed systems

 

    1. Framed Transport was added to Thrift in order to support asynchronous servers. Cassandra The Definitive Guide

 

    1. Apache ActiveMQ is one such solution, providing the ability for applications to communicate in an asynchronous, loosely coupled manner. ActiveMQ in Action

 

 

56、单词 mvc 的匹配文本:

 

    1. This decouples the sources of data, the services, from the visual consumers of the data, facilitating use of the Model- View-Controller (MVC) pattern. Thinking in Java 4th Edition

 

    1. Figure 3.4 illustrates the MVC architecture pattern in the context of a Java web application. Gradle in Action

 

    1. Think of Application as a controller of the MVC model that handles all the requests. Scala in Action

 

    1. In fact, event-based architectures help to enforce the separation of concerns and modularity that MVC advocates. Testable JavaScript

 

    1. It’s far from complete, but it should feel familiar to anyone who has worked with any of the popular frameworks, such as Spring MVC. Functional Programming Patterns in Scala and Clojure

 

    1. The use of the Active Record pattern as a model in MVC is a good example of a leaky abstraction. SQL Antipatterns

 

    1. Rails was so much easier to use than Spring MVC, Hibernate, and the other “streamlined” Java web frameworks. Beginning Scala

 

    1. It accomplishes this flexibility by using a Model View Controller (MVC) design, which allows advanced features such as form validation against an XML schema document, and in-page data refreshes. Cassandra The Definitive Guide

 

    1. Swing is based on the Model View Controller (MVC) architectural framework pattern. Java 7 Recipes A Problem-Solution Approach

 

    1. Finally, Listing 6-25 shows the last part of our MVC architecture, the JSP page to render the results. Pro JPA2

 

 

57、单词 google 的匹配文本:

 

    1. Creating mountains of data has become very easy, thanks to Facebook, Twitter, Amazon, digital cameras and camera phones, YouTube, Google, and just about anything else you can think of connected to the Internet. Apache Flume Distributed Log Collection for Hadoop

 

    1. HBase is an Apache-licensed open source system modeled after Google’s Bigtable. Monitoring with Ganglia

 

    1. We used some simple JavaScript and Google Groups to allow painless commenting. CouchDB The Definitive Guide

 

    1. For example, this is what Google shows us when we enter elasticsaerch instead of elasticsearch: Mastering ElasticSearch

 

    1. Similarly, the Google File System [7] is optimized for very large files and a workload consisting largely of reads and file appends. Ceph A Scalable, High-Performance Distributed File System

 

    1. The Google File System (GFS)[9] is another distributed file system built for hosting the state of Google’s internal applications. Cassandra A Decentralized Structured Storage System

 

    1. Google use an implementation of map/reduce in its web index. Beginning CouchDB

 

    1. Before Chubby was deployed, most distributed systems at Google used ad hoc methods for primary election (when work could be duplicated without harm), or required operator intervention (when correctness was essential) Chubby - The lock service for loosely-coupled distributed systems

 

    1. Chapter 12, Integrating Hadoop In this chapter, written by Jeremy Hanna, we put Cassandra in a larger context and see how to integrate it with the popular implementation of Google’s Map/Reduce algorithm, Hadoop. Cassandra The Definitive Guide

 

    1. Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products. Bigtable - A Distributed Storage System for Structured Data

 

 

58、单词 ruby 的匹配文本:

 

    1. By implementing a similar module with Perl, Ruby, or PHP, gmond would instantly gain the ability to be extended by implementing gmond metric modules in those languages as well. Monitoring with Ganglia

 

    1. In later chapters, we’ll focus on using CouchDB from server-side languages such as Ruby and Python. CouchDB The Definitive Guide

 

    1. Even though it wasn’t part of the core Maven project, a similar approach was proposed by the project Maven Polyglot that allows you to write your build definition logic, which is the project object model (POM) file, in Groovy, Ruby, Scala, or Clojure. Gradle in Action

 

    1. For example, generating Ruby binding may require ruby and ruby-devel packages to be installed. Cassandra High Performance Cookbook

 

    1. You will also see how to work with CouchDB from your Python and Ruby applications. Beginning CouchDB

 

    1. The results, shown in Table 1, demonstrate Sawzall is significantly faster than Python, Ruby, or Perl, at least for these microbenchmarks. Interpreting the Data Parallel Analysis with Sawzall

 

    1. If you’re familiar with the methodMissing and propertyMissing facilities of Groovy or Ruby’s method_miss ing, you’ve already got the idea. Gradle Beyond The Basics

 

    1. But because of the interface used to access Cassandra, you can use Cassandra from a wide variety of languages, including C#, Scala, Python, and Ruby. Cassandra The Definitive Guide

 

    1. We will also show you a few programs in Ruby, but these are usually only to demonstrate language independence, and you don't need to be a Ruby expert. Hadoop Beginners Guide

 

    1. Although ActiveMQ is written in Java, APIs for many languages other than Java are provided, including C/C++, .NET, Perl, PHP, Python, Ruby, and many more. ActiveMQ in Action

 

 

59、单词 sqoop 的匹配文本:

 

    1. A typical Hadoop-based Big Data platform includes the Hadoop Distributed File System (HDFS), the parallel computing framework (MapReduce), common utilities, a column-oriented data storage table (HBase), high-level data management systems (Pig and Hive), a Big Data analytics library (Mahout), a distributed coordination system (ZooKeeper), a workflow management module (Oozie), data transfer modules such as Sqoop, data aggregation modules such as Flume, and data serialization modules such as Avro. Hadoop Operations and Cluster Management Cookbook

 

    1. It’s been four years since, via a post to the Apache JIRA, the first version of Sqoop was released to the world as an addition to Hadoop. Apache Sqoop Cookbook

 

    1. Sqoop is a connectivity tool for moving data between relational databases and data warehouses and Hadoop. Professional Hadoop Solutions

 

    1. Sqoop is an Apache project that is part of the broader Hadoop ecosphere. Hadoop RealWorld Solutions Cookbook

 

    1. Finally, we’ll cover how to use Sqoop for database ingress and egress activities, and we’ll look at how to ingress and egress data in HBase. Hadoop in Practice

 

    1. Sqoop A tool for efficient bulk transfer of data between structured data stores (such as relational databases) and HDFS. Hadoop:The Definitive Guide 3rd Edition

 

    1. HBase - Hadoop database for random read/write access; Hive - SQL-like queries and tables on large data sets; Pig - Dataflow language and compiler; Oozie - Workflow for interdependent Hadoop jobs; Sqoop - Integrates databases and data warehouses with Hadoop. HBase:The Definitive Guide

 

    1. Wouldn't it be much better to use an existing tool for the purpose? There is indeed such a tool that we will use throughout the rest of this chapter, Sqoop. Hadoop Beginners Guide

 

    1. Apache Sqoop Not only does Hadoop not want to replace your database, it wants to be friends with it. Hadoop Operations

 

 

60、单词 innodb 的匹配文本:

 

    1. The innodb options help in dealing with the small, primary key size restriction of MySQL. Web Crawling and Data Mining with Apache Nutch

 

    1. We cover SHOW INNODB STATUS in particular detail, because it provides deep insight into the operations of the InnoDB transactional storage engine. High Performance MySQL 3rd Edition

 

    1. They span the spectrum from simple in-memory key/value stores to complicated SQL-speaking MySQL/InnoDB derivatives. MongoDB The Definitive Guide 2nd Edition

 

    1. InnoDB - The MySQL engine once supported by Riak, but now deprecated. A Little Riak Book

 

    1. Transaction logging One compromise between speed and durability can be seen in MySQL’s InnoDB. MongoDB in Action

 

    1. Chapter 11, Transactions Lists the main issues required to support transactions, and uses InnoDB to illustrate the typical architecture used to provide that support. Understanding MySQL Internals

 

 

61. Matching text for the word caching:

 

    1. Cache-control The cache control HTTP response header provides a suggestion for client caching mechanisms on how to treat the returned information. Getting Started with CouchDB

 

    1. In a high-availability deployment with proper caching, this means that a given show function will be called only once for any particular document, and the CouchDB server may not even be contacted for subsequent requests. CouchDB The Definitive Guide

 

    1. Using Cassandra for distributed caching Cassandra has several variables that can be configured for each keyspace and column family that drastically change the profile of how it operates. Cassandra High Performance Cookbook

 

    1. Ceph could allow consistency to be further relaxed by caching metadata longer, much like earlier versions of NFS, which typically cache for 30 seconds. Ceph A Scalable, High-Performance Distributed File System

 

    1. In order to make the searches fast Cassandra provides certain hooks for intelligent caching of data. Cassandra A Decentralized Structured Storage System

 

    1. Cache documents and query results using memcached or another caching system. Scaling CouchDB

 

    1. More significantly, objects in PAST are immutable, facilitating cryptographic protection and simplifying consistency and caching, but limiting the system’s usefulness as a general storage service. RADOS A Scalable, Reliable Storage Service for Petabyte-scale Storage Clusters.

 

    1. Older filters will never be reused so caching is disabled by default. Elasticsearch The Definitive Guide

 

    1. But there are a number of things, including memory settings, data storage, hardware choices, caching, and buffer sizes, that you can tune to squeeze out even more performance. Cassandra The Definitive Guide

 

    1. Chapter 5 discusses message persistence in ActiveMQ; how messages are stored for queues and topics, the four styles of message stores available, and message caching. ActiveMQ in Action

 

 

62. Matching text for the word scheduler:

 

    1. Extending gmond with gmetric The gmetric utility, which is distributed with the monitoring core, as well as being available in multiple native variants for various programming languages, allows us to submit metrics values without having to rely on either the scheduler present in gmond, or the constraints of the native shared library or Python modules on which it depends. Monitoring with Ganglia

 

    1. Abstract Increasing scale and the need for rapid response to changing requirements are hard to meet with current monolithic cluster scheduler architectures. Omega - flexible, scalable schedulers for large compute clusters

 

    1. You could use this feature to synchronize database servers within a cluster or between data centers using a job scheduler such as cron, or you could use it to synchronize data with your laptop for offline work as you travel. CouchDB The Definitive Guide

 

    1. Each job consists of a set of tasks, and is mapped by the scheduler to a set of available machines within a cluster. MapReduce - Simplied Data Processing on Large Clusters

 

    1. The concurrent merge scheduler This is a merge scheduler that will use multiple threads in order to perform segments merging. Mastering ElasticSearch

 

    1. Somewhat akin to FlumeJava’s executor, the Cascading evaluator breaks the dataflow graph into pieces, and, if possible, runs those in parallel, using the underlying Hadoop job scheduler. FlumeJava - Easy, Efficient Data-Parallel Pipelines

 

    1. Normally, users will have access to an enterprise-wide task scheduler or configuration management system. Cassandra High Performance Cookbook

 

    1. This provides our low-level disk scheduler with longer I/O queues and a corresponding increase in scheduling efficiency. Ceph A Scalable, High-Performance Distributed File System

 

    1. We also restrict the Dapper daemon to the lowest possible priority in the kernel scheduler in case CPU contention arises within a heavily-loaded host machine. Dapper a Large-Scale Distributed Systems Tracing Infrastructure

 

    1. Cron is a well-known job scheduler on Unix systems, and it uses an expression string to denote when a job should be scheduled. ActiveMQ in Action
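
The quotes above describe several kinds of schedulers, from cron to the Hadoop job scheduler. As a minimal Java sketch of periodic scheduling (not the scheduler of any one quoted system; the task and intervals are illustrative assumptions), the standard ScheduledExecutorService can run a job at a fixed rate:

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class SchedulerSketch {
        public static void main(String[] args) {
            // One scheduling thread; the task runs every 60 seconds after a 10-second initial delay.
            ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
            scheduler.scheduleAtFixedRate(
                    () -> System.out.println("collecting metrics..."),
                    10, 60, TimeUnit.SECONDS);
        }
    }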

 

 

63. Matching text for the word initialization:

 

    1. Multiple parameters can be passed to the module’s initialization function by including one or more param sections. Monitoring with Ganglia

 

    1. Depending on how recently the partition was compacted, this should result in efficient, low-CPU initialization of a new node. CouchDB The Definitive Guide

 

    1. GRADLE’S BUILD LIFECYCLE PHASES Whenever you execute a Gradle build, three distinct lifecycle phases are run: initialization, configuration, and execution. Gradle in Action

 

    1. A recent problem we experienced was a bug in machine initialization code that caused processor caches to be disabled: computations on affected machines slowed down by over a factor of one hundred. MapReduce - Simplied Data Processing on Large Clusters

 

    1. Then we have the constructor, which is only used for super class initialization; so we can skip discussing it. Mastering ElasticSearch

 

    1. If, for some reason, the server is not running and you get an error message when you try the previous line, it may be an issue with the initialization script. Beginning CouchDB

 

    1. The next declaration has a more complicated initialization expression that uses some built-in functions to extract the cardinal number of the minute of the week from the time value. Interpreting the Data Parallel Analysis with Sawzall

 

    1. During the initialization phase, Gradle starts up and locates the build files it must process. Gradle Beyond The Basics

 

    1. Plugins are loaded during initialization, and no plugins are included by default. Elasticsearch in Action MEAP V07 cHAPTER 1

 

    1. Every time Gradle executes a build, it runs through three lifecycle phases: initialization, configuration, and execution. Building and Testing with Gradle

 

 

64. Matching text for the word config:

 

    1. You aren’t averse to the idea of maintaining a config file on all of your hosts. Monitoring with Ganglia

 

    1. The powerful iterator API allows for flexibility to filter and aggregate rows on the fly, as well as output raw transformations for an easy way to make Atom feeds, HTML lists, CSV files, config files, or even just modified JSON. CouchDB The Definitive Guide

 

    1. Many of these Gradle plugins require a configuration file located in a directory config/ that defines the rules for the analysis. Gradle in Action

 

    1. Suppose we want to work with a directory named config, in our build script. Gradle Effective Implementation Guide

 

    1. If CouchDB is installed on your local computer, you can open the Futon configuration utility by pointing your web browser to the address http://127.0.0.1:5984/_utils/config.html. Beginning CouchDB

 

    1. You’ll connect to the Flume Master UI, and select the config menu from the top, as highlighted in figure 2.7 Hadoop in Practice

 

    1. Additionally, configuring analyzers at node level requires changing the config file on every node and restarting every node which becomes a maintenance nightmare. Elasticsearch The Definitive Guide

 

    1. But this YAML file is intended only for users upgrading their config files from XML to YAML. Cassandra The Definitive Guide

 

    1. An embedded broker using an external configuration file can be achieved using the brokerConfig transport option and by specifying the URI for the activemq.xml file. ActiveMQ in Action

 

    1. The nested configurations via config and fileset directly correspond to the Checkstyle Ant equivalents as if this was being configured with Ant’s XML syntax. Building and Testing with Gradle

 

 

65. Matching text for the word classpath:

 

    1. Note that if the directory is specified, then it is always included first in the classpath. Apache Flume Distributed Log Collection for Hadoop

 

    1. You tell the compiler about it by referencing the library’s JAR file in the classpath. Gradle in Action

 

    1. Next, we must add the output directory with the compiled interface class file, to the compileClasspath property of the main source set. Gradle Effective Implementation Guide

 

    1. Also, ensure that the test.classpath is included in the test target. Cassandra High Performance Cookbook

 

    1. The Java compiler has a command-line switch for specifying the classpath, and that switch must be provided with an operating-specific string. Gradle Beyond The Basics

 

    1. For example, the following statements in pig-env.sh will tell Pig the version of Hadoop used by the cluster is Pig’s classpath: Hadoop in Action

 

    1. This probably means that you started Cassandra directly from within the bin directory, and it therefore sets up its Java classpath incorrectly and can’t find the CliMain file to start the client. Cassandra The Definitive Guide

 

    1. Time for action – setting up the classpath To compile any Hadoop-related code, we will need to refer to the standard Hadoop-bundled classes. Hadoop Beginners Guide

 

    1. In order to run the clients using the HTTP transport protocol, one dependency must be added to the classpath. ActiveMQ in Action

 

    1. This directory is automatically compiled and added to the build classpath. Building and Testing with Gradle
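
Several of the quotes above deal with how a compiler or build tool assembles the classpath. As a minimal Java sketch, assuming you only want to inspect what actually ended up on the runtime classpath of a JVM, the java.class.path system property lists every entry:

    public class ClasspathSketch {
        public static void main(String[] args) {
            // Entries are separated by the platform-specific path separator (':' or ';').
            String classpath = System.getProperty("java.class.path");
            for (String entry : classpath.split(System.getProperty("path.separator"))) {
                System.out.println(entry);
            }
        }
    }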

 

 

66. Matching text for the word superclass:

 

    1. The existing class is called the superclass, base class, or parent class. Core Java Volume I Fundamentals 9th Edition

 

    1. Many of the methods used here come from the superclass AbstractCopyTask, as shown in figure 4.9 Gradle in Action

 

    1. Any class that implements an interface must provide an implementation for each method defined by the interface or inherit the implementation from a superclass. Java 8 in Action Lambdas, Streams and Functional-style Programming

 

    1. This implementation first leverages the superclass (FileInputFormat) to get splits with locations calculated to ensure data locality. Professional Hadoop Solutions

 

    1. The basic idea in Java is that we can implement a generic class by using an appropriate superclass, such as Object. Data Structures and Algorithm Analysis in Java 3rd Edition

 

    1. Similarly, all classes inheriting from a superclass inherit the validation behavior as well. Java EE 7 Essentials

 

    1. The fingerprint is obtained by ordering the descriptions of the class, superclass, interfaces, field types, and method signatures in a canonical way, and then applying the so-called Secure Hash Algorithm (SHA) to that data. Core Java Volume II Advanced Features 9th Edition

 

    1. All constructors must invoke a superclass constructor, explicitly or implicitly, and a subclass would have no accessible superclass constructor to invoke. Effective Java 2nd Edition

 

    1. So the synchronized modifier is not automatically inherited when subclasses override superclass methods, and methods in interfaces cannot be declared as synchronized. Concurrent Programming in Java Design Principles and Pattern

 

    1. All arrays, regardless of the type of data they hold, have Object as their superclass. Java 7 Recipes A Problem-Solution Approach
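
The Effective Java quote above notes that every constructor invokes a superclass constructor, explicitly or implicitly. A minimal sketch of that rule (the class names are illustrative, not taken from any of the quoted books):

    class Vehicle {                       // superclass (also called base or parent class)
        private final int wheels;
        Vehicle(int wheels) { this.wheels = wheels; }
        int getWheels() { return wheels; }
    }

    class Car extends Vehicle {           // subclass
        Car() {
            super(4);                     // explicit call to the superclass constructor
        }
    }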

 

 

67. Matching text for the word plugins:

 

    1. The community is quite strong and there are many plugins for other applications available. Apache Flume Distributed Log Collection for Hadoop

 

    1. Operators denote criticality The operators specified in the Nagios definitions for the Ganglia plugins always indicate the “critical” state. Monitoring with Ganglia

 

    1. He has been a long term Gradle contributor and is the author of several popular Gradle plugins. Gradle in Action

 

    1. We can write tasks, apply plugins, and write build logic using the Gradle build language. Gradle Effective Implementation Guide

 

    1. Chapter 9, Developing ElasticSearch plugins, covers ElasticSearch plugins development by showing and deeply describing how to write your own river and language plugin. Mastering ElasticSearch

 

    1. Plugins are loaded during initialization, and no plugins are included by default. Elasticsearch in Action MEAP V07 cHAPTER 1

 

    1. Facebook Insights provides developers and website owners with access to real-time analytics related to Facebook activity across websites with social plugins, Facebook Pages, and Facebook Ads. Apache Hadoop Goes Realtime at Facebook

 

    1. Flume interceptors are plugins that can manipulate and modify events before they pass from the source to the channel. Hadoop Beginners Guide

 

    1. The element provides the plugins element for declaring plug-ins. ActiveMQ in Action

 

    1. Gradle’s built-in tasks and plugins are enough for many users to script their builds without any custom code, but not always. Building and Testing with Gradle

 

 

68. Matching text for the word mahout:

 

    1. Mahout in Action aims to be accessible, to unearth the interesting nuggets of insight for the enthusiast, and to save the practitioner time in getting work done. Mahout in Action

 

    1. Apache Lucene along with Mahout and Solr were developed to address these needs. Scaling Big Data with Hadoop and Solr Chapter-02 Understanding Solr

 

    1. The Taste project (a recommendation engine) has become a part of Mahout and is included in the 0.1 release. Pro Hadoop

 

    1. A typical Hadoop-based Big Data platform includes the Hadoop Distributed File System (HDFS), the parallel computing framework (MapReduce), common utilities, a column-oriented data storage table (HBase), high-level data management systems (Pig and Hive), a Big Data analytics library (Mahout), a distributed coordination system (ZooKeeper), a workflow management module (Oozie), data transfer modules such as Sqoop, data aggregation modules such as Flume, and data serialization modules such as Avro. Hadoop Operations and Cluster Management Cookbook

 

    1. Mahout — This is a machine-learning and data-mining library that provides MapReduce implementations for popular algorithms used for clustering, regression testing, and statistical modeling. Professional Hadoop Solutions

 

    1. This book covers unloading/loading to and from HDFS, graph analytics with Giraph, batch data analysis using Hive, Pig, and MapReduce, machine-learning approaches with Mahout, debugging and troubleshooting MapReduce jobs, and columnar storage and retrieval of structured data using Apache Accumulo. Hadoop RealWorld Solutions Cookbook

 

    1. Chapter 9, Mass Text Data Processing, explains how to use Hadoop and Mahout to process large text datasets, and how to perform data preprocessing and loading operations using Hadoop. Hadoop MapReduce Cookbook

 

    1. And more thanks go to Josh Patterson, who reviewed my Mahout chapter. Hadoop in Practice

 

    1. Apache Mahout (http://mahout.apache.org/) is a project to build machine-learning libraries (such as classification and clustering algorithms) that run on Hadoop. Hadoop:The Definitive Guide 3rd Edition

 

    1. The Cloudera distribution is available at http://www.cloudera.com/hadoop and contains a large number of Apache products, from Hadoop itself, Hive, Pig, and HBase through tools such as Sqoop and Flume, to other lesser-known products such as Mahout and Whir. Hadoop Beginners Guide

 

 

69. Matching text for the word enum:

 

    1. The ordinal method yields the position of an enumerated constant in the enum declaration, counting. Core Java Volume I Fundamentals 9th Edition

 

    1. Each action command is mapped to an enum called CommandLineInput. Gradle in Action

 

    1. The other values of AnalyzerScope enum are GLOBAL and INDEX. Mastering ElasticSearch

 

    1. Enum has a name, an optional namespace, document, and alias, and contains a list of symbols (valid JSON strings) Professional Hadoop Solutions

 

    1. In the CounterExample job, an Enum for tracking the count of each type of bad record was defined: Hadoop RealWorld Solutions Cookbook

 

    1. This method returns a ReturnCode, which is an enum defined as a part of the Filter interface. HBase in Action

 

    1. The set of counters in an enum will form a group of counters. Hadoop MapReduce Cookbook

 

    1. The counter enum you’ll use to set the number of hops when you’ve reached the target node. Hadoop in Practice

 

    1. ObjectWritable is a general-purpose wrapper for the following: Java primitives, String, enum, Writable, null, or arrays of any of these types. Hadoop:The Definitive Guide 3rd Edition

 

    1. The second form uses a Java enum to specify counter names, which forces you to have them defined at compile time, but it also allows for type checking. Hadoop in Action
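
The Hadoop quotes above describe defining counters with a Java enum so that counter names are fixed at compile time and type checked. A minimal sketch of an enum-backed counter in a mapper, assuming the newer org.apache.hadoop.mapreduce API; the enum and class names are illustrative:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class CountingMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

        // Counter names defined as an enum: checked by the compiler, grouped by enum type.
        enum BadRecords { EMPTY, MALFORMED }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            if (value.getLength() == 0) {
                context.getCounter(BadRecords.EMPTY).increment(1);
                return;
            }
            context.write(value, new LongWritable(1));
        }
    }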

 

 

70. Matching text for the word proofreaders:

 

    1. Special thanks go to Ivan Kirkpatrick and Clint Combs, my technical proofreaders, for their in-depth feedback on the text and the code. Scala in Action

 

    1. We hope your other projects aren’t as exciting as ours! Thanks also to our technical editor Mark Henry Ryan and our technical proofreaders Jerry Kuch and Kristine Kuch. HBase in Action

 

    1. These will be cleaned up during production of the book by copyeditors and proofreaders. Elasticsearch in Action MEAP V07 cHAPTER 1

 

    1. To our technical proofreaders, Neale Upstone and Doug Warren: your careful attention to the configuration and code has been greatly appreciated. Spring Integration in Action

 

    1. We wish to thank our production team of Mary Piergies; maestro Troy Mott and his band of merry editors: Ben Berg, Tara McGoldrick, and Bob Herbstman; our talented proofreaders: Katie Tennant and Alyson Brener; and others behind the scenes whom we are not able to name. Spring Roo in Action

 

 

71. Matching text for the word copyeditors:

 

    1. These will be cleaned up during production of the book by copyeditors and proofreaders. Elasticsearch in Action MEAP V07 cHAPTER 1

 

 

72. Matching text for the word iterator:

 

    1. It can inspect and modify the values of out-edges using methods supplied by the out-edge iterator. Pregel  -  A System for Large-Scale Graph Processing

 

    1. The powerful iterator API allows for flexibility to filter and aggregate rows on the fly, as well as output raw transformations for an easy way to make Atom feeds, HTML lists, CSV files, config files, or even just modified JSON. CouchDB The Definitive Guide

 

    1. The intermediate values are supplied to the user’s reduce function via an iterator. MapReduce - Simplied Data Processing on Large Clusters

 

    1. Each Reducer invocation is passed a key and an iterator over all the values associated with that key, and emits zero. FlumeJava - Easy, Efficient Data-Parallel Pipelines

 

    1. It is worth noting that by making the values available as an iterator, Hadoop gives the programmer a chance to process the data without storing them in memory. Hadoop MapReduce Cookbook

 

    1. The MapReduce framework reuses iterator value objects supplied to combine. Hadoop in Practice

 

    1. In addition, the reduce() method no longer accepts the list of values as an Iterator but as an Iterable, which is easier to iterate through using Java’s foreach syntax. Hadoop in Action

 

    1. The Reducer interface uses an Iterator object instead of an Iterable object; this was changed as the latter works with the Java for each syntax and makes for cleaner code. Hadoop Beginners Guide

 

    1. Percolator provides a low-level Get/Set/iterator API rather than a high-level SQL interface, so we created indexes and did all the ‘query planning’ by hand. Percolator - Large-scale Incremental Processing Using Distributed Transactions and Notifications

 

    1. Sometime you just need an iterator, and other times you’d like to express a moderately complex conditional without resorting to major build file surgery. Building and Testing with Gradle
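
Several quotes above contrast the old Iterator-based reduce signature with the newer API, where values arrive as an Iterable and can be consumed with the for-each syntax. A minimal sketch of such a reducer (a plain sum; the class name is illustrative), keeping in mind the quoted warning that the framework may reuse the value objects it hands you:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {   // for-each over the Iterable of values
                sum += value.get();              // copy the primitive, not the reused object
            }
            context.write(key, new IntWritable(sum));
        }
    }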

 

 

73. Matching text for the word username:

 

    1. This will prompt you for a username and password that will be given administration rights. Getting Started with CouchDB

 

    1. When a user successfully authenticates, a hash is generated from the username and a secret key and is stored in a cookie and made available to the rest of gweb. Monitoring with Ganglia

 

    1. You can even do things like pass the username into the list using a query parameter (but it’s not recommended, as you’ll ruin cache efficiency) CouchDB The Definitive Guide

 

    1. It requires you to fill out your email address, name, and password, as well as a username and domain. Gradle in Action

 

    1. It is not a good idea to store the username and password as plain text in the build file; this is because anyone can read our password, if stored in plain text. Gradle Effective Implementation Guide

 

    1. Each of your testing clients should use the same username for running tests. Scaling CouchDB

 

    1. We assign values to the url, username, password, changeLog, and command properties through a very standard assignment syntax. Gradle Beyond The Basics

 

    1. For example, if you have a User column family and you want to find users in a particular city, you might create a second column family called UserCity that stores user data with the city as keys (instead of the username) and that has columns named for the users who live in that city. Cassandra The Definitive Guide

 

    1. Obviously, for authentication purposes, each user must have a username and a password. ActiveMQ in Action

 

    1. Because most of the properties have defaults (some of which, like username and password, are obviously specific to the build), each invocation of the task has very little to configure. Building and Testing with Gradle

 

 

74. Matching text for the word jdk:

 

    1. As a prerequisite, make sure you’ve already installed the JDK with a version of 1.5 or higher. Gradle in Action

 

    1. Suppose we have both OpenJDK and Oracle Java installed in /usr/openjdk and / usr/jdk respectively. Hadoop Operations and Cluster Management Cookbook

 

    1. What you need for this book In order to work with Gradle and the code samples in the book, we need at least a Java Development Kit (JDK 1.5 or higher), Gradle, and a good text editor. Gradle Effective Implementation Guide

 

    1. In order to check what Java processes are running and what their virtual machines identifiers are, one can just run a jps command which is provided with most JDK distributions. Mastering ElasticSearch

 

    1. JAVA_HOME must be set to point to a JDK, not to a JRE. Hadoop MapReduce Cookbook

 

    1. If a current JDK isn’t in the system path you’ll want to come here to configure your JAVA_HOME. Hadoop in Practice

 

    1. You can download the latest JDK for other operating systems from Sun at http://java.sun.com/javase/downloads/index.jsp. Hadoop in Action

 

    1. Ivy requires Ant, and building from source requires the complete JDK, version 1.6.0_20 or better, not just the JRE. Cassandra The Definitive Guide

 

    1. Time for action – checking the prerequisites Hadoop is written in Java, so you will need a recent Java Development Kit (JDK) installed on the Ubuntu host. Hadoop Beginners Guide

 

    1. Here’s an example of this using the keytool that comes with the JDK: ActiveMQ in Action

 

 

75. Matching text for the word timestamp:

 

    1. The headers are key/value pairs that can be used to make routing decisions or carry other structured information (such as the timestamp of the event or hostname of the server where the event originated) Apache Flume Distributed Log Collection for Hadoop

 

    1. Allowed options are now (uses current system time), UNIX timestamp, or any other well-formed date, as supported by PHP’s strtotime function. Monitoring with Ganglia

 

    1. Instead of a string, we are going to use an array, where individual members are part of a timestamp in decreasing significance. CouchDB The Definitive Guide

 

    1. This method allows you to set your own value for this timestamp. Mastering ElasticSearch

 

    1. The CLI uses microseconds since epoch time as the value of timestamp when setting columns. Cassandra High Performance Cookbook

 

    1. When users were interested in individual machines, they were also interested in a specific service, so we eventually combined the two into a composite index which allows for efficient lookup by service name, host machine, and timestamp, in that order. Dapper a Large-Scale Distributed Systems Tracing Infrastructure

 

    1. Instead, break a big reindex down into smaller jobs by filtering on a date or timestamp field: Elasticsearch The Definitive Guide

 

    1. Putting this all together, we have the basic Cassandra data structures: the column, which is a name/value pair (and a client-supplied timestamp of when it was last updated), and a column family, which is a container for rows that have similar, but not identical, column sets. Cassandra The Definitive Guide

 

    1. A few properties for the timestamp plug-in can be configured, as shown in table 11.5 ActiveMQ in Action

 

    1. The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes. Bigtable - A Distributed Storage System for Structured Data
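
The Cassandra quotes above mention client-supplied timestamps expressed as microseconds since the Unix epoch. A minimal Java sketch of producing such a value, assuming millisecond clock resolution is acceptable (System.currentTimeMillis() cannot provide true microsecond precision):

    import java.util.concurrent.TimeUnit;

    public class TimestampSketch {
        public static void main(String[] args) {
            long millis = System.currentTimeMillis();
            // Microseconds since the epoch; the last three digits are always zero
            // because the underlying clock source is milliseconds.
            long micros = TimeUnit.MILLISECONDS.toMicros(millis);
            System.out.println(micros);
        }
    }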

 

 

76. Matching text for the word tcp:

 

    1. Syslog has an older UDP transport as well as a newer TCP protocol that can handle data larger than a single UDP packet can transmit (about 64k), as well as deal with network-related congestion events that might require the data to be retransmitted. Apache Flume Distributed Log Collection for Hadoop

 

    1. It includes interactive query functionality and may be polled by external monitoring systems via a simple text protocol on TCP port 8652 Monitoring with Ganglia

 

    1. This is curl telling us that it is going to establish a TCP connection to the CouchDB server we specified in our request URI. CouchDB The Definitive Guide

 

    1. This recipe shows how to interpret what the following TCP connection graphs are showing. Cassandra High Performance Cookbook

 

    1. In all tests, clients, OSDs, and MDSs are user processes running on a dual-processor Linux cluster with SCSI disks and communicating using TCP. Ceph A Scalable, High-Performance Distributed File System

 

    1. All system control messages rely on UDP based messaging while the application related messages for replication and request routing relies on TCP. Cassandra A Decentralized Structured Storage System

 

    1. The port attribute indicates the TCP/IP port number to use. Scaling CouchDB

 

    1. A failure on the TCP socket results in a limited number of reconnect attempts before a failure is reported to the monitor cluster. RADOS A Scalable, Reliable Storage Service for Petabyte-scale Storage Clusters.

 

    1. It is accessible over a TCP/IP interface and is available from a variety of languages, including Java and web services. Cassandra The Definitive Guide

 

    1. It discusses ActiveMQ URIs and all the transport connectors for both client-to-broker and broker-to-broker communications including TCP, NIO, STOMP, failover, SSL, HTTP/S, and much more. ActiveMQ in Action
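
The curl example quoted above boils down to opening a TCP connection to a server on a known port. A minimal Java sketch of the same step using java.net.Socket; the host, port (5984, CouchDB's default, as seen elsewhere in these quotes), and timeout are assumptions for illustration:

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.net.Socket;

    public class TcpProbe {
        public static void main(String[] args) throws IOException {
            try (Socket socket = new Socket()) {
                // Establish a plain TCP connection with a 3-second timeout.
                socket.connect(new InetSocketAddress("127.0.0.1", 5984), 3000);
                System.out.println("connected: " + socket.getRemoteSocketAddress());
            }
        }
    }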

 

 

77. Matching text for the word tuple:

 

    1. In the Pregel implementation of this algorithm the vertex value is a tuple of two values: a flag indicating which set the vertex is in (L or R), and the name of its matched vertex once known. Pregel  -  A System for Large-Scale Graph Processing

 

    1. In addition, the values flowing through a Cascading pipeline are special untyped “tuple” values, and Cascading operations focus on transforms over tuples; in contrast, a FlumeJava pipeline computes over arbitrary Java objects using arbitrary Java computations. FlumeJava - Easy, Efficient Data-Parallel Pipelines

 

    1. A column is stored as a tuple of name, value, and timestamp. Cassandra High Performance Cookbook

 

    1. The IsUseragentBot class extends the abstract class FilterFunc, which allows us to override the exec(Tuple t) method. Hadoop RealWorld Solutions Cookbook

 

    1. A compiletime directive in Sawzall, proto, somewhat analogous to C’s #include directive, imports the DDL for a protocol buffer from a file and defines the Sawzall tuple type that describes the layout. Interpreting the Data Parallel Analysis with Sawzall

 

    1. This self-describing coordinate tuple was originally introduced to the Java ecosystem by the Maven repository format. Gradle Beyond The Basics

 

    1. It works in a way similar to this technique and captures all of the content between a start and end tag, supplying it as a single byte array field in a Pig tuple. Hadoop in Practice

 

    1. A tuple is just like a row of data in a database table, with multiple fields in a particular order. Hadoop:The Definitive Guide 3rd Edition

 

    1. A field in a tuple or a value in a map can be null or any atomic or complex type. Hadoop in Action

 

    1. The MapReduce jobs consist of DAGs of relational operators (such as filter, select (project), join, aggregation) that operate as iterators: each operator forwards a data tuple to the next operator after processing it. HadoopDB An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads

 

 

78. Matching text for the word screenshot:

 

    1. The screenshot in Figure 10-10 shows the word cloud generated from a MapReduce view of CouchDB’s archived tweets. CouchDB The Definitive Guide

 

    1. Figure 3.2 shows a screenshot of the user interface rendered in a browser. Gradle in Action

 

    1. We have only one task, which is listed in the task tree, as shown in the following screenshot: Gradle Effective Implementation Guide

 

    1. After that Paramedic shows us two primary shards, as given in the following screenshot: Mastering ElasticSearch

 

    1. The following screenshot shows the current status of the HDFS installation including the number of nodes, total storage, storage taken by each node. Hadoop MapReduce Cookbook

 

    1. User annotations are not shown in this screenshot, but they may be selectively included in the global time line on a spanby-span basis. Dapper a Large-Scale Distributed Systems Tracing Infrastructure

 

    1. Figure 13.12 Screenshot of the counter in JobTracker’s job summary page. Hadoop in Practice

 

    1. Figure 8-2 shows a screenshot of the keyspace and column family configuration information that the web. Cassandra The Definitive Guide

 

    1. As shown in the preceding screenshot, click on the Create bucket button and enter a name for the new bucket. Hadoop Beginners Guide

 

    1. Though the screenshot in figure 8.1 shows the project structure for the jms-webapp-local application, the directory structure for the other instances of the application is only slightly different. ActiveMQ in Action

 

 

79. Matching text for the word scalability:

 

    1. CouchDB doesn’t attempt to solve your scalability problems with any single solution, but instead provides you with a simple and flexible system that can be molded and adapted to your needs. Getting Started with CouchDB

 

    1. We’ll show you how CouchDB’s schema-free document model is a better fit for common applications, how the built-in query engine is a powerful way to use and process your data, and how CouchDB’s design lends itself to modularization and scalability. CouchDB The Definitive Guide

 

    1. The experience has made him a learned individual in technologies like optimization, high-availability, and scalability. Cassandra High Performance Cookbook

 

    1. Abstract We have developed Ceph, a distributed file system that provides excellent performance, reliability, and scalability. Ceph A Scalable, High-Performance Distributed File System

 

    1. The way Cassandra manages the persistent state in the face of these failures drives the reliability and scalability of the software systems relying on this service. Cassandra A Decentralized Structured Storage System

 

    1. You should test the scalability of your system before it is deployed. Scaling CouchDB

 

    1. Our implementation offers excellent performance, reliability, and scalability while providing clients with the illusion of a single logical object store. RADOS A Scalable, Reliable Storage Service for Petabyte-scale Storage Clusters.

 

    1. We will also discuss how best to model your data to take advantage of the horizontal scalability of Elasticsearch, and how to configure. Elasticsearch The Definitive Guide

 

    1. We encounter scalability problems when our relational applications become successful and usage goes up. Cassandra The Definitive Guide

 

    1. Part 4 discusses advanced features in ActiveMQ such as high availability, scalability, many advanced broker and client features, performance tuning, and administration of ActiveMQ. ActiveMQ in Action

 

 

80. Matching text for the word constructors:

 

    1. The memory management is performed explicitly by constructors, assignment operators, and destructors. Core Java Volume I Fundamentals 9th Edition

 

    1. Constructors All code that creates and launches a MapReduce job into a Hadoop cluster creates a JobConf object. Pro Hadoop

 

    1. In contrast, restructuring is usually required in XQuery and object-oriented query languages, e.g., using nested for-loops and constructors. Dremel - Interactive Analysis of WebScale Datasets

 

    1. Making an HBaseAdmin instance explicitly requires a Configuration instance, a detail hidden from you by the default HTable and HTablePool constructors. HBase in Action

 

    1. A node contains the data and links to the previous and next nodes, along with appropriate constructors. Data Structures and Algorithm Analysis in Java 3rd Edition

 

    1. The first part of the implementation is straightforward: there are two Text instance variables, first and second, and associated constructors, getters, and setters. Hadoop:The Definitive Guide 3rd Edition

 

    1. It expects one or a list of Put objects that, in turn, are created with one of these constructors: HBase:The Definitive Guide

 

    1. If you want buffering and the data input methods for a file, you need to use the following rather monstrous sequence of constructors: Core Java Volume II Advanced Features 9th Edition

 

    1. It consists of nothing but C# properties for each of those three items, plus constructors. Cassandra The Definitive Guide

 

    1. Also, constructors cannot be qualified as synchronized (although block synchronization can be used within constructors) Concurrent Programming in Java Design Principles and Pattern
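
The Hadoop: The Definitive Guide quote above describes a pair type with two Text instance variables plus the associated constructors, getters, and setters. A minimal sketch of that shape (not the book's exact code; the class name is illustrative):

    import org.apache.hadoop.io.Text;

    public class TextPairSketch {
        private Text first;
        private Text second;

        // The no-argument constructor delegates to the two-argument one.
        public TextPairSketch() {
            this(new Text(), new Text());
        }

        public TextPairSketch(Text first, Text second) {
            this.first = first;
            this.second = second;
        }

        public Text getFirst() { return first; }
        public Text getSecond() { return second; }

        public void set(Text first, Text second) {
            this.first = first;
            this.second = second;
        }
    }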

 

 

81. Matching text for the word dataset:

 

    1. The vertex-centric approach is reminiscent of MapReduce in that users focus on a local action, processing each item independently, and the system composes these actions to lift computation to a large dataset. Pregel  -  A System for Large-Scale Graph Processing

 

    1. She runs an MR job that cranks through the input data and produces a dataset containing the new signals, stored in billions of records in the distributed file system. Dremel - Interactive Analysis of WebScale Datasets

 

    1. Routing Maybe routing will be handy in our case? The obvious gain from using routing is the possibility to create effective queries that return only books from our base dataset or data that belongs to our business collaborate (because routing allows us to hit only a part of our index) Mastering ElasticSearch

 

    1. The uncompressed dataset is tens of terabytes, which is small compared to many NoSQL instances, but was large enough to cause difficulties with sharded MySQL. Spanner - Google’s Globally-Distributed Database

 

    1. Elasticsearch is designed to work well with a small dataset and to scale to big. Elasticsearch The Definitive Guide

 

    1. Recently written data should be available quickly, but the entire dataset will also be periodically scanned in order to perform timebased rollups. Apache Hadoop Goes Realtime at Facebook

 

    1. Doing queries across the entire dataset was tedious and could take hours to run. Cassandra The Definitive Guide

 

    1. Limited support (no joins) for UPDATE and DELETE is implemented by applying the update or delete criteria on the data to create a new dataset. Tenzing - A SQL Implementation On The MapReduce Framework

 

    1. These enable a highly interactive Web UI where it is common that a single mouse click leads to processing a trillion values in the underlying dataset. PowerDrill - Processing a Trillion Cells per Mouse Click

 

    1. Unlike Percolator, ElasTraS offers limited transactional semantics (Sinfonia-like mini-transactions) when dynamically partitioning the dataset and has no support for structuring computation. Percolator - Large-scale Incremental Processing Using Distributed Transactions and Notifications

 

 

82. Matching text for the word daemon:

 

    1. All of these run inside a daemon called an agent. Apache Flume Distributed Log Collection for Hadoop

 

    1. To collect the dendrite updates, you deployed a single instance of a daemon process, called an axon, Monitoring with Ganglia

 

    1. This comes as a daemon for Twisted, a popular and high-performance event-driven network programming framework for Python. CouchDB The Definitive Guide

 

    1. The previous code uses the EsExecutors class, which is an ElasticSearch way of running daemon threads inside the ElasticSearch nodes. Mastering ElasticSearch

 

    1. TrueTime is implemented by a set of time master machines per datacenter and a timeslave daemon per machine. Spanner - Google’s Globally-Distributed Database

 

    1. After startup, Cassandra detaches from the console and runs as a daemon. Cassandra High Performance Cookbook

 

    1. Sending data and running queries by sending JSON objects over HTTP makes it easy for someone to extend anything, from a syslog daemon like rsyslog to a connecting framework like ManifoldCF, to interact with Elasticsearch. Elasticsearch in Action MEAP V07 cHAPTER 1

 

    1. Dapper penetration can be assessed in two dimensions: the fraction of production processes that can generate Dapper traces (i.e., those that are linked with Dapper-instrumented runtime libraries) and the fraction of production machines running Dapper’s trace collection daemon. Dapper a Large-Scale Distributed Systems Tracing Infrastructure

 

    1. Add -d if you want to run it in the background as a daemon. Elasticsearch The Definitive Guide

 

    1. JConsole showing the peak thread count for a Cassandra daemon. Cassandra The Definitive Guide
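
Several quotes above describe background daemon processes and daemon threads (for example, running ElasticSearch work on daemon threads). A minimal Java sketch of the thread-level idea: a thread marked as a daemon does not keep the JVM alive once all user threads finish (the task itself is an illustrative assumption):

    public class DaemonSketch {
        public static void main(String[] args) {
            Thread heartbeat = new Thread(() -> {
                while (true) {
                    System.out.println("heartbeat");
                    try {
                        Thread.sleep(1000);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            });
            heartbeat.setDaemon(true);   // must be set before start()
            heartbeat.start();
            // main exits immediately; the daemon thread is stopped when the JVM shuts down.
        }
    }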

 

 

83. Matching text for the word topology:

 

    1. Programs are expressed as a sequence of iterations, in each of which a vertex can receive messages sent in the previous iteration, send messages to other vertices, and modify its own state and that of its outgoing edges or mutate graph topology. Pregel  -  A System for Large-Scale Graph Processing

 

    1. This topology is adequate for most environments, but in some cases it is desirable to specify a few specific listeners rather than allowing every node to receive (and thereby waste CPU cycles to process) metrics from every other node. Monitoring with Ganglia

 

    1. It has the advantage that it’s simple, which helps in understanding topology and diagnosing failures. CouchDB The Definitive Guide

 

    1. Figure 11 shows the execution times for each query as a function of the server topology. Dremel - Interactive Analysis of WebScale Datasets

 

    1. Our network topology is simple enough that “distances” can be accurately estimated from IP addresses. GFS - The Google File System

 

    1. This node is responsible for managing the cluster state and process of assigning shards to nodes in reaction of changes in cluster topology. Mastering ElasticSearch

 

    1. Instead, it uses an internal process called Gossip to communicate changes about the topology of the Ring between nodes. Cassandra High Performance Cookbook

 

    1. Having covered each of the Hadoop daemons, we depict the topology of one typical Hadoop cluster in figure 2.3 Hadoop in Action

 

    1. The new node will not immediately accept requests so that it has time to learn the topology of the ring and accept data that it may also be responsible for. Cassandra The Definitive Guide

 

    1. Figure 4.5 An example of a complex network of brokers topology. ActiveMQ in Action

 

 

84. Matching text for the word partitioning:

 

    1. You could use multi-master, master/slave, partitioning, sharding, write-through caches, and all sorts of other complex techniques. CouchDB The Definitive Guide

 

    1. The run-time system takes care of the details of partitioning the input data, scheduling the program’s execution across a set of machines, handling machine failures, and managing the required inter-machine communication. MapReduce - Simplied Data Processing on Large Clusters

 

    1. Like MapReduce, Dryad handles the details of communication, partitioning, placement, concurrency and fault tolerance. FlumeJava - Easy, Efficient Data-Parallel Pipelines

 

    1. Ceph utilizes a novel metadata cluster architecture based on Dynamic Subtree Partitioning [30] that adaptively and intelligently distributes responsibility for managing the file system directory hierarchy among tens or even hundreds of MDSs. Ceph A Scalable, High-Performance Distributed File System

 

    1. Describing the details of each of the solutions is beyond the scope of this paper, so we will focus on the core distributed systems techniques used in Cassandra: partitioning, replication, membership, failure handling and scaling. Cassandra A Decentralized Structured Storage System

 

    1. An alternative to automatic partitioning is to manually partition your documents into different databases by type of document. Scaling CouchDB

 

    1. Partitioning your documents into different containers or shards, which can be stored on a single node or on multiple nodes. Elasticsearch The Definitive Guide

 

    1. Partitioning is intended to enable large Chubby cells with little communication between the partitions. Chubby - The lock service for loosely-coupled distributed systems

 

    1. And even if you can write such a huge check, you still need to carefully choose partitioning keys to the point where you can never entirely ignore the limitation. Cassandra The Definitive Guide

 

    1. Finally we’ll examine traffic partitioning, which will balance scaling and performance, but will add more complexity to your ActiveMQ application. ActiveMQ in Action
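
In the MapReduce quotes above, partitioning refers to how the framework divides intermediate data among reducers and machines. A minimal sketch of a custom Hadoop Partitioner that reproduces the usual hash-based scheme (Hadoop already ships a HashPartitioner; this version is purely illustrative):

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    public class KeyHashPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text key, IntWritable value, int numPartitions) {
            // Mask the sign bit so the result is always a valid partition index.
            return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }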

 

 

85. Matching text for the word urls:

 

    1. The structure for the URLs has been standardized, and you should be able to both look at a URL that you are using to understand what it does, and to construct one to access the information that you want. Getting Started with CouchDB

 

    1. As of Ganglia 3.3, hostnames are converted to lowercase, RRD files are created with lowercase filenames, and lowercase hostnames should be used in URLs to access the metrics. Monitoring with Ganglia

 

    1. We’ll also briefly describe capabilities available on CouchDB’s roadmap, like replication filters, update handlers for parsing non-JSON input, and a rewrite handler for making application URLs more palatable. CouchDB The Definitive Guide

 

    1. A document can have multiple Names, which are different URLs by which the document can be referenced. Dremel - Interactive Analysis of WebScale Datasets

 

    1. Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "What we would like to do is, use the BM25 similarity model for the name field and the contents field." Mastering ElasticSearch

 

    1. This means you can manage the database by simply visiting URLs in your web browser, using command-line tools such as curl or, more importantly, via any programming language that supports HTTP requests. Beginning CouchDB

 

    1. The following ProxyPassReverse directives instructs Apache to adjust the URLs in the HTTP response headers to match that of the proxy server, instead of the reverse proxied server. Scaling CouchDB

 

    1. Indicates new terms, URLs, email addresses, filenames, and file extensions. Cassandra The Definitive Guide

 

    1. README.txt—A file containing some URLs to documentation to get new users started with ActiveMQ. ActiveMQ in Action

 

    1. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving) Bigtable - A Distributed Storage System for Structured Data

 

 

86. Matching text for the word jmx:

 

    1. Several Flume components report metrics to the JMX platform MBean server. Apache Flume Distributed Log Collection for Hadoop

 

    1. Java Virtual Machine Metrics The sFlow Java Virtual Machine (JVM) metrics (Table 8-3) are based on the metrics exposed through the Java Management Extensions (JMX) interface, ensuring consistency with existing JMX-based monitoring systems. Monitoring with Ganglia

 

    1. This allows for the use of JMX-aware applications to collect information about the state of the servers. Pro Hadoop

 

    1. In this chapter, we will first introduce the management framework, Java Management Extension (JMX) for system monitoring. Hadoop Operations and Cluster Management Cookbook

 

    1. The nodetool program communicates with the JMX port to confirm that the server is alive. Cassandra High Performance Cookbook

 

    1. The collection mechanism for metrics is decoupled from the component that receives the updates, and there are various pluggable outputs, including local files, Ganglia, and JMX. Hadoop:The Definitive Guide 3rd Edition

 

    1. HBase already had a number of metrics exported through JMX. Apache Hadoop Goes Realtime at Facebook

 

    1. Cassandra has a rich Java Management Extensions (JMX) interface baked in, which we put to use to monitor all of these and more. Cassandra The Definitive Guide

 

    1. Notably, the new metrics subsystem (referred to as metrics2) supports sending metrics to multiple plug-ins, filtering of metrics in various ways, and more complete support for JMX. Hadoop Operations

 

    1. It shows how to configure ActiveMQ for JMX monitoring and demonstrates this using JConsole. ActiveMQ in Action
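
The quotes above describe components publishing metrics through JMX and tools such as JConsole reading them. A minimal Java sketch that reads one attribute from the platform MBean server inside the same JVM; java.lang:type=Memory and its HeapMemoryUsage attribute are standard platform MBeans, while everything else here is illustrative:

    import java.lang.management.ManagementFactory;
    import javax.management.MBeanServer;
    import javax.management.ObjectName;

    public class JmxSketch {
        public static void main(String[] args) throws Exception {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName memory = new ObjectName("java.lang:type=Memory");
            // Returned as a CompositeData value describing current heap usage.
            Object heapUsage = server.getAttribute(memory, "HeapMemoryUsage");
            System.out.println(heapUsage);
        }
    }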

 

 

87. Matching text for the word unix:

 

    1. The Ganglia projects uses the autotools build system to detect the tools available on most Linux and UNIX-like environments and build the binaries. Monitoring with Ganglia

 

    1. The first few appendixes showed you how to install CouchDB packages for Unix-like, Mac OS X, and Windows operating systems. CouchDB The Definitive Guide

 

    1. We will discuss more about it in the Avoid swapping on Unix-like systems section in Chapter 6, Fighting with Fire. Mastering ElasticSearch

 

    1. The defaults for a Unix/Linux system limit the number of open files. Cassandra High Performance Cookbook

 

    1. It does not assume any knowledge in relation to CouchDB, although some experience with UNIX commands and basic JavaScript skills are beneficial. Beginning CouchDB

 

    1. The input files argument accepts standard Unix shell file-name-matching metacharacters to identify the files to be processed. Interpreting the Data Parallel Analysis with Sawzall

 

    1. You can check if it's there by using the env command on UNIX-like systems and the set command on Windows. Elasticsearch in Action MEAP V07 cHAPTER 1

 

    1. Chubby exports a file system interface similar to, but simpler than that of UNIX [22] Chubby - The lock service for loosely-coupled distributed systems

 

    1. The clock is stored as a long (representing the milliseconds since the Unix epoch), so we could wrap this in a new java.util.Date object if we wanted to. Cassandra The Definitive Guide

 

    1. Click on the link to the 5.4.1 release and you’ll find both tarball and zip formats available (the tarball is for Linux and Unix; the zip is for Windows) ActiveMQ in Action

 

 

88. Matching text for the word packt:

 

    1. Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. Apache Flume Distributed Log Collection for Hadoop

 

    1. Little did I know, when I was first asked by Packt Publishing whether I would be interested in writing a book about HBase administration on September 2011, how much work and stress (but also a lot of fun) it was going to be. HBase Administration Cookbook

 

    1. Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Hadoop Operations and Cluster Management Cookbook

 

    1. However, Packt Publishing cannot guarantee the accuracy of this information. Gradle Effective Implementation Guide

 

    1. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book. Mastering ElasticSearch

 

    1. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book. Cassandra High Performance Cookbook

 

    1. I would like to commend the authors of this book for a job well done, and would like to thank Packt Publishing for the opportunity to assist in the editing of this publication. Hadoop RealWorld Solutions Cookbook

 

    1. Thanks to the editorial staff at Packt, for providing me the opportunity to write this book and for providing feedback and guidance throughout the process. Hadoop MapReduce Cookbook

 

    1. Here, you can access, read and search across Packt's entire library of books. Getting Started with Google Guava

 

    1. Thank you, Packt Publishing for selecting me as one of the technical reviewers for this wonderful book. Hadoop Beginners Guide

 

 

89. Matching text for the word ids:

 

    1. We haven’t even considered complexities like postal addresses (or the fact that there might be more than one), and my contact record doesn’t include additional details like my Skype IDs, instant messaging accounts, or that I have a few more addresses than those listed above. Getting Started with CouchDB

 

    1. The bigger picture we’re working on is like an app server, but different in one crucial regard: rather than let the developer do whatever he wants (loop a list of DocIDs and make queries, make queries based on the results of other queries, etc.), we’re defining “safe” transformations, such as view, show, list, and update. CouchDB The Definitive Guide

 

    1. A Document has a required integer DocId and optional Links, containing a list of Forward and Backward entries holding DocIds of other web pages. Dremel - Interactive Analysis of WebScale Datasets

 

    1. The columns are sorted chronologically, using unique IDs (UUIDs) for column names to avoid clashes. Cassandra High Performance Cookbook

 

    1. This unique ID can be assigned by the user or application, or it can use a universally unique identifier (UUID)—a random number generated by CouchDB that greatly reduces the chance of duplicate IDs being used. Beginning CouchDB

 

    1. CouchDB will perform best with document IDs that are mostly monotonic (in simpler terms, mostly sequential) Scaling CouchDB

 

    1. A trace context is a small and easily copyable container of span attributes such as trace and span ids. Dapper a Large-Scale Distributed Systems Tracing Infrastructure

 

    1. Each shard returns the doc IDs and sort values of all of the docs in its priority queue to the coordinating node, Node 1, which merges these values into its own priority queue to produce a globally sorted list of results. Elasticsearch The Definitive Guide

 

    1. It is still a common design requirement to store IDs related to other entities in your tables, but operations such as cascading deletes are not available. Cassandra The Definitive Guide

 

    1. Its speed is the result of the combination of a fast transactional journal comprised of data log files, the highly optimized indexing of message IDs, and inmemory message caching. ActiveMQ in Action
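
Several quotes above mention universally unique identifiers (UUIDs) used as document IDs or as Cassandra column names. A minimal Java sketch generating a random (version 4) UUID with the standard library; the time-based (version 1) UUIDs mentioned for Cassandra column names require a separate library:

    import java.util.UUID;

    public class IdSketch {
        public static void main(String[] args) {
            UUID id = UUID.randomUUID();          // random, version 4 UUID
            System.out.println(id);               // printed in the canonical 8-4-4-4-12 form
        }
    }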

 

 

90. Matching text for the word aop:

 

    1. If you are familiar with the AOP Spring Framework, it is similar to a MethodInterceptor. Apache Flume Distributed Log Collection for Hadoop

 

    1. Because of this, mixins have part of the flavor of aspect-oriented programming (AOP), and aspects are often suggested to solve the mixin problem. Thinking in Java 4th Edition

 

    1. However, the good news for Spring and AOP developers is that starting from version 2.0, Spring provides much tighter integration with AspectJ. Pro Spring 3

 

    1. In this chapter, we will cover dependency injection and AOP. Pro Spring MVC

 

    1. The main points you will see throughout the different projects is the use of DI, XML namespace-based configuration, and AOP, which as you will see in the next section, are the pillars upon which Spring is built on. Pro Spring Security

 

    1. Advising the Build Graph In the early 2000s, the paradigm of aspect-oriented programming (AOP) became a marginally popular way to manage the complexity of enterprise Java software. Gradle Beyond The Basics

 

    1. You can think of observer coprocessors as analogous to triggers from a relational database or to advice from aspect-oriented programming (AOP) HBase in Action

 

    1. The DTD bean and property mechanism can’t hide complex bean creation, which is a shortcoming in configuring advanced features like aspect-oriented programming (AOP) and security. Spring Batch in Action

 

    1. The goal of AOP is to enable systemic changes of the same kind across many modules, while avoiding copying and pasting the same code over and over into each location. Programming Scala

 

    1. Using retry advice on a data access layer via Spring’s AOP (aspect-oriented programming) support is one way to implement this functionality. Spring Data Modern Data Access for Enterprise Java

 

 

91. Matching text for the word serialization:

 

    1. Tiering Flume agents is covered using Avro serialization, as well as using the Flume command line as a standalone Avro client for testing and importing data manually. Apache Flume Distributed Log Collection for Hadoop

 

    1. MapReduce, however, is essentially functional, so expressing a graph algorithm as a chained MapReduce requires passing the entire state of the graph from one stage to the next—in general requiring much more communication and associated serialization overhead. Pregel  -  A System for Large-Scale Graph Processing

 

    1. Every individual host in the grid is an active participant, and together they cooperate, organically distributing the workload while avoiding serialization and single points of failure. Monitoring with Ganglia

 

    1. The extra serialization involved in using super columns and extra space used makes them less efficient. Cassandra High Performance Cookbook

 

    1. This allows Ceph to leverage the intelligence present in OSDs to distribute the complexity surrounding data access, update serialization, replication and reliability, failure detection, and recovery. Ceph A Scalable, High-Performance Distributed File System

 

    1. Elasticsearch uses JSON (or JavaScript Object Notation ) as the serialization format for documents. Elasticsearch The Definitive Guide

 

    1. For example, it uses two different JSON serialization libraries, the Google collections project, and several Apache Commons libraries. Cassandra The Definitive Guide

 

    1. However, this proved to be inefficient because of the serialization and deserialization costs associated with translating to and from Sawzall’s native type system. Tenzing - A SQL Implementation On The MapReduce Framework

 

    1. The main purpose of this interface is to provide mechanisms for the serialization and deserialization of data as it is passed across the network or read and written from the disk. Hadoop Beginners Guide

 

    1. From a client perspective, the NIO transport connector is practically the same as the standard TCP connector, in terms of its use of TCP as the underlying network protocol and OpenWire as the message serialization protocol. ActiveMQ in Action
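
The Hadoop Beginners Guide quote above refers to an interface whose purpose is serializing and deserializing data as it crosses the network or is written to disk, i.e. Hadoop's Writable. A minimal sketch of a custom Writable (the field and class names are illustrative):

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.Writable;

    public class TemperatureWritable implements Writable {
        private int temperature;

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(temperature);     // serialization: write fields to the stream
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            temperature = in.readInt();    // deserialization: read fields back in the same order
        }

        public int get() { return temperature; }
        public void set(int temperature) { this.temperature = temperature; }
    }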

 

 

92. Matching text for the word namenode:

 

    1. Pacemaker ships with an IPaddr resource agent to manage VIP, so we only need to create our own namenode resource agent: HBase Administration Cookbook

 

    1. They will experience substantial idle cycles waiting for replies from the namenode. HDFS scalability the limits to growth

 

    1. The file system protocol is hdfs, the host to contact for services is NamenodeHost, and the port to connect to is 8020, which is the default port for HDFS. Pro Hadoop

 

    1. Format the HDFS filesystem from NameNode with the following command: hadoop namenode -format. Hadoop Operations and Cluster Management Cookbook

 

    1. Replace NAMENODE_ HOSTNAME and PORT with the relevant values corresponding to the NameNode of your HDFS cluster. Hadoop MapReduce Cookbook

 

    1. Data compatibility concerns persistent data and metadata formats, such as the format in which the HDFS namenode stores its persistent data. Hadoop:The Definitive Guide 3rd Edition

 

    1. Writing variations also exist for the different Hadoop components such as NameNode (name node, namenode, and NameNode), DataNode, JobTracker, and TaskTracker. Hadoop in Action

 

    1. It often runs on the same server as the namenode in smaller clusters. Cassandra The Definitive Guide

 

    1. Namenode 1 Stores filesystem metadata, stores file to block map, and provides a global picture of the filesystem. Hadoop Operations

 

    1. It runs the namenode and jobtracker daemons, but it also has Hive installed on it. Programming Hive
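
The quotes above mention pointing clients at the NameNode RPC address (port 8020 by default). A minimal client-side sketch, assuming a reachable cluster and using a placeholder hostname, could look like this:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Sketch of a client talking to the filesystem metadata service: the
    // namenode host below is a placeholder; 8020 is the default HDFS RPC
    // port mentioned in the quotes above.
    public class ListHdfsRoot {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://namenode-host:8020");
            FileSystem fs = FileSystem.get(conf);
            for (FileStatus status : fs.listStatus(new Path("/"))) {
                System.out.println(status.getPath());  // paths come from namenode metadata
            }
            fs.close();
        }
    }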

 

 

93. Matching texts for the word uri:

 

    1. This is curl telling us that it is going to establish a TCP connection to the CouchDB server we specified in our request URI. CouchDB The Definitive Guide

 

    1. Configure the NameNode RPC and HTTP URI for namenode1 by adding the following into the file: Hadoop Operations and Cluster Management Cookbook

 

    1. We can pass a URL or URI instance as an argument. Gradle Effective Implementation Guide

 

    1. The simplest way is to add a routing URI parameter when indexing a document. Mastering ElasticSearch

 

    1. What the server does with your request depends on the URI you are issuing the request to, the request method you are using (GET, POST, PUT, DELETE), and the data you send along with your request. Beginning CouchDB

 

    1. The subsequent rewrite rule then proxies all requests to URIs starting with /api to the equivalent URI on http://couch-master.example.com:5984 (again, only if the previous rewrite condition has been met): Scaling CouchDB

 

    1. You can identify the current filesystem of the FileSystem object using the getUri() function as follows. Hadoop MapReduce Cookbook

 

    1. The file:/ URI denotes a path on the local filesystem, and the hdfs:/ URI denotes a path in HDFS. Hadoop in Practice

 

    1. As we’ll see shortly, most setups don’t need to specify the scheme://authority part of the URI. Hadoop in Action

 

    1. Without going too deep into discussing URIs, let’s briefly summarize the URI structure. ActiveMQ in Action
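
Tying together the getUri() and file:/ versus hdfs:/ examples above, a small sketch (the HDFS authority is a placeholder) shows how the URI scheme selects the concrete filesystem:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    // The URI scheme decides which concrete FileSystem is returned:
    // file:/// maps to the local filesystem, hdfs:// to HDFS.
    public class UriSchemes {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            FileSystem local = FileSystem.get(URI.create("file:///"), conf);
            System.out.println(local.getUri());   // prints file:///

            FileSystem hdfs = FileSystem.get(URI.create("hdfs://namenode-host:8020/"), conf);
            System.out.println(hdfs.getUri());    // identifies the HDFS filesystem
        }
    }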

 

 

94. Matching texts for the word jsp:

 

    1. Check the job status via the MapReduce admin page—http://master1:50030/jobtracker.jsp: HBase Administration Cookbook

 

    1. To see it, visit http://bugs.sun.com/bugdatabase/index.jsp and enter the bug identification number 4252539 Core Java Volume I Fundamentals 9th Edition

 

    1. The response of a Servlet is rendered by the view component— the Java Server Page (JSP) Gradle in Action

 

    1. After the teragen job finishes, we can check the node storage space usage by opening the URL http://master:50070/dfsnodelist.jsp?whatNodes=LIVE. Hadoop Operations and Cluster Management Cookbook

 

    1. The MapReduce administration JSP enables you to view both the overall state of the cluster and details of the particular job execution. Professional Hadoop Solutions

 

    1. In CDH you can access the JobTracker UI at http://localhost:50030/jobtracker.jsp. Hadoop in Practice

 

    1. Adding the preceding fragment to web.xml will display the /error-404.jsp page to a client attempting to access a nonexistent resource. Java EE 7 Essentials

 

    1. Web technologies such as JavaServer Faces (JSF) and JavaServer Pages (JSP) rely heavily on the JavaBeans component model. Core Java Volume II Advanced Features 9th Edition

 

    1. You can download the latest JDK for other operating systems from Sun at http://java.sun.com/javase/downloads/index.jsp. Hadoop in Action

 

    1. Application frameworks such as Java Server Pages (JSP) and Java Server Faces (JSF) are built upon servlet technology. Java 7 Recipes A Problem-Solution Approach

 

 

95. Matching texts for the word dynamically:

 

    1. Metric modules are dynamically loadable shared object files that extend the available metrics gmond is able to collect. Monitoring with Ganglia

 

    1. Most programming environments solve this problem with templates; for example, documents that look like HTML but have portions of their content filled out dynamically. CouchDB The Definitive Guide

 

    1. Changing the analyzer during indexing The next thing worth mentioning when it comes to handling multilingual data is the possibility of dynamically changing the analyzer during indexing. Mastering ElasticSearch

 

    1. Figure 2: Ceph dynamically maps subtrees of the directory hierarchy to metadata servers based on the current workload. Ceph A Scalable, High-Performance Distributed File System

 

    1. This requires, the ability to dynamically partition the data over the set of nodes (i.e., storage hosts) in the cluster. Cassandra A Decentralized Structured Storage System

 

    1. This alleviates the performance issues associated with generating a view dynamically each and every time it is run, especially on databases that store millions of records of data. Beginning CouchDB

 

    1. The number of replica shards can be changed dynamically on a live cluster, allowing us to scale up or down as demand requires. Elasticsearch The Definitive Guide

 

    1. Robust data structures; an efficient, small binary format for RPC calls; easy integration with dynamically typed languages such as Python, Ruby, Smalltalk, Perl, PHP, and Objective-C. Avro has certain advantages that Thrift doesn’t, in particular the fact that static code generation is not required to use RPC for your application, though you can use it for. Cassandra The Definitive Guide

 

    1. Each class will dynamically send and receive to/from the topics (an example is provided next) ActiveMQ in Action

 

    1. Finally, Bigtable schema parameters let clients dynamically control whether to serve data out of memory or from disk. Bigtable - A Distributed Storage System for Structured Data

 

 

96. Matching texts for the word cached:

 

    1. This means that they are largely restricted to handling GET requests, but it also means they can be cached by HTTP proxies like Varnish. CouchDB The Definitive Guide

 

    1. What's more, cached filters don't require too much memory (it only carries the information about which documents match the filter) and can be easily reused by consecutive queries run against the same filter to greatly increase query performance. Mastering ElasticSearch

 

    1. Remember that all the columns of a row must be cached when using the row cache. Cassandra High Performance Cookbook

 

    1. By default, if a readdir is immediately followed by one or more stats, the briefly cached information is returned; otherwise it is discarded. Ceph A Scalable, High-Performance Distributed File System

 

    1. The metadata about the ranges a node is responsible is cached locally at each node and in a fault-tolerant manner inside Zookeeper - this way a node that crashes and comes back up knows what ranges it was responsible for. Cassandra A Decentralized Structured Storage System

 

    1. Assuming you have cached your documents, then some percentage of these requests will result in cache hits. Scaling CouchDB

 

    1. These cached filters can be reused very efficiently for subsequent requests. Elasticsearch The Definitive Guide

 

    1. A notification of a conflicting lock request would tell a client to finish using data associated with the lock: it would finish pending operations, flush modifications to a home location, discard cached data, and release. Chubby - The lock service for loosely-coupled distributed systems

 

    1. If you use a fraction, you’re indicating a percentage of keys to cache, and an integer value indicates an absolute number of keys whose locations will be cached. Cassandra The Definitive Guide

 

    1. ActiveMQ supports both of these types of message delivery and can also be configured to support message recovery, an in-between state where messages are cached in memory. ActiveMQ in Action

 

 

97. Matching texts for the word avro:

 

    1. Tiering Flume agents is covered using Avro serialization, as well as using the Flume command line as a standalone Avro client for testing and importing data manually. Apache Flume Distributed Log Collection for Hadoop

 

    1. A typical Hadoop-based Big Data platform includes the Hadoop Distributed File System (HDFS), the parallel computing framework (MapReduce), common utilities, a column-oriented data storage table (HBase), high-level data management systems (Pig and Hive), a Big Data analytics library (Mahout), a distributed coordination system (ZooKeeper), a workflow management module (Oozie), data transfer modules such as Sqoop, data aggregation modules such as Flume, and data serialization modules such as Avro. Hadoop Operations and Cluster Management Cookbook

 

    1. You also learn about Apache Avro — a Hadoop framework for data serialization, which can be tightly integrated with Hadoop-based storage. Professional Hadoop Solutions

 

    1. The OutputFormat uses Avro and more JARs are required to run the job. Cassandra High Performance Cookbook

 

    1. It shows how to use different serialization libraries, including Avro, Thrift, and Protocol Buffers. Hadoop RealWorld Solutions Cookbook

 

    1. The focus of chapter 3 is on ways to work with data stored in different formats, such as XML and JSON, which paves the way to a broader examination of data formats such as Thrift and Avro that work best with big data and Hadoop. Hadoop in Practice

 

    1. Avro A serialization system for efficient, cross-language RPC and persistent data storage. Hadoop:The Definitive Guide 3rd Edition

 

    1. This directory includes the Thrift and Avro RPC libraries for interacting with Cassandra. Cassandra The Definitive Guide

 

    1. They are Protocol Buffers (created by Google and hosted at http://code.google.com/p/protobuf), Thrift (originally created by Facebook and now an Apache project at http://thrift.apache.org), and Avro (created by Doug Cutting, the original creator of Hadoop) Hadoop Beginners Guide

 

    1. Some of the more common sources are an Avro RPC source to which applications can speak directly, syslog, a raw netcat-style source, and a source that executes a command and reads from its standard output. Hadoop Operations
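
A minimal sketch of Avro serialization with the generic API, which needs no code generation; the record schema and field values are invented for the example.

    import java.io.ByteArrayOutputStream;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.BinaryEncoder;
    import org.apache.avro.io.EncoderFactory;

    // Serializes a single record into Avro's compact binary format using the
    // generic API. The schema below is made up for the example.
    public class AvroSketch {
        public static void main(String[] args) throws Exception {
            String schemaJson = "{\"type\":\"record\",\"name\":\"LogEvent\","
                    + "\"fields\":[{\"name\":\"host\",\"type\":\"string\"},"
                    + "{\"name\":\"bytes\",\"type\":\"long\"}]}";
            Schema schema = new Schema.Parser().parse(schemaJson);

            GenericRecord event = new GenericData.Record(schema);
            event.put("host", "web01");
            event.put("bytes", 1024L);

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
            new GenericDatumWriter<GenericRecord>(schema).write(event, encoder);
            encoder.flush();

            System.out.println("Serialized " + out.size() + " bytes");
        }
    }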

 

 

98. Matching texts for the word oozie:

 

    1. A typical Hadoop-based Big Data platform includes the Hadoop Distributed File System (HDFS), the parallel computing framework (MapReduce), common utilities, a column-oriented data storage table (HBase), high-level data management systems (Pig and Hive), a Big Data analytics library (Mahout), a distributed coordination system (ZooKeeper), a workflow management module (Oozie), data transfer modules such as Sqoop, data aggregation modules such as Flume, and data serialization modules such as Avro. Hadoop Operations and Cluster Management Cookbook

 

    1. It is tested at scale, rest of ecosystem projects like Pig, Hive, Oozie, etc. Apache Hadoop YARN Yet Another Resource Negotiator

 

    1. Oozie, an open source Apache project, has functionality for building workflows and coordinating job running. MapReduce Design Patterns

 

    1. Why do you need an additional orchestration layer on top of MapReduce, and how does Oozie fit the bill? Professional Hadoop Solutions

 

    1. Apache Oozie is a workflow system for Hadoop MapReduce computations. Hadoop MapReduce Cookbook

 

    1. For this book you’ll also be working with Oozie, HBase, and other projects, but you’ll find instructions for these technologies in their respective sections. Hadoop in Practice

 

    1. Oozie A service for running and scheduling workflows of Hadoop jobs (including MapReduce, Pig, Hive, and Sqoop jobs) Hadoop:The Definitive Guide 3rd Edition

 

    1. HBase Hadoop database for random read/write access Hive SQL-like queries and tables on large data sets Pig Dataflow language and compiler Oozie Workflow for interdependent Hadoop jobs Sqoop Integrates databases and data warehouses with Hadoop. HBase:The Definitive Guide

 

    1. Oozie We have said many times that Hadoop clusters do not live in a vacuum and need to integrate with other systems and into broader workflows. Hadoop Beginners Guide

 

    1. Apache Oozie It’s not uncommon for large production clusters to run many coordinated MapReduce jobs in a workflow. Hadoop Operations

 

 

99. Matching texts for the word writable:

 

    1. The key on each record will always be a LongWritable containing the current timestamp, or, if the timestamp event header is set, that value will be used instead. Apache Flume Distributed Log Collection for Hadoop

 

    1. In this case, the filesystem is both readable and writable. Hadoop Operations and Cluster Management Cookbook

 

    1. To avoid permission issues, carry out the installation in user-writable directories. Cassandra High Performance Cookbook

 

    1. In the mapper, we use the ThriftWritable class of Elephant Bird to wrap the WeblogRecord object. Hadoop RealWorld Solutions Cookbook

 

    1. Choosing appropriate Hadoop data types Hadoop uses the Writable interface based classes as the data types for the MapReduce computations. Hadoop MapReduce Cookbook

 

    1. The DBInputFormat class requires a bean representation of the table being imported, which implements both the Writable and DBWritable interfaces. Hadoop in Practice

 

    1. You may notice we use special classes such as LongWritable , IntWritable , and Text instead of the more familiar Long, Integer, and String classes of Java. Hadoop in Action

 

    1. That is, a distributed database designer must choose to make the system either always readable or always writable. Cassandra The Definitive Guide

 

    1. Take a look at the signatures of our mapper and reducer classes: the WordCountMapper class gives IntWritable and Text as input and gives Text and IntWritable as output. Hadoop Beginners Guide

 

    1. The version 1.0.0 lineage of Hadoop releases has a large number of group writable files in the tarball. Hadoop Operations
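
Putting the LongWritable, Text, and IntWritable types from the quotes above into context, a word-count-style mapper signature looks like the sketch below; the class name echoes the WordCountMapper mentioned above, while the body is invented.

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // The mapper signature uses Writable types instead of Long/String/Integer:
    // input key/value are LongWritable/Text, output key/value are Text/IntWritable.
    public class WordCountMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {

        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            for (String token : line.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);   // emit (word, 1)
                }
            }
        }
    }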

 

 

100. Matching texts for the word nutch:

 

    1. For example, it supports benchmarks for Nutch (a text indexing software package), PageRank (the PageRank algorithm), the Mahout machine learning algorithms, and Hive queries. Hadoop Operations and Cluster Management Cookbook

 

    1. Shortly after the release of the paper, a free and open source software pioneer by the name of Doug Cutting started working on a MapReduce implementation to solve scalability in another project he was working on called Nutch, an effort to build an open source search engine. MapReduce Design Patterns

 

    1. In 2002, while developing a project called Nutch (a search engine project focused on crawling, indexing, and searching Internet web pages), Doug Cutting and Mike Cafarella were struggling with a solution for processing a vast amount of information. Professional Hadoop Solutions

 

    1. From there, Hadoop was extracted out of Nutch and eventually became an Apache top-level project. HBase in Action

 

    1. The Apache Hadoop project was started as a support project for the Apache Nutch search engine, before spawning off as a separate top-level project. Hadoop MapReduce Cookbook

 

    1. My team was making discoveries similar to those that Doug Cutting and others at Nutch had made several years earlier regarding how to efficiently store and manage terabytes of crawled and analyzed data. Hadoop in Practice

 

    1. They’d devised systems to solve precisely the problems we were having with Nutch. Hadoop:The Definitive Guide 3rd Edition

 

    1. It’s not perfect, but it’s ready for other people to play with and examine.” Mike had been working with Doug Cutting on Nutch, an open source search engine. HBase:The Definitive Guide

 

    1. Hadoop started out as a subproject of Nutch , which in turn was a subproject of Apache Lucene. Hadoop in Action

 

    1. Thanks, Doug At the same time, Doug Cutting was working on the Nutch open source web search engine. Hadoop Beginners Guide

 

 

101. Matching texts for the word subclasses:

 

    1. To enable this optimization the user subclasses the Combiner class, overriding a virtual Combine() method. Pregel  -  A System for Large-Scale Graph Processing

 

    1. FlumeJava includes subclasses of DoFn, e.g., MapFn and FilterFn, that provide simpler interfaces in special cases. FlumeJava - Easy, Efficient Data-Parallel Pipelines

 

    1. The most common input formats are subclasses of FileInputFormat, with the Hadoop default being TextInputFormat. MapReduce Design Patterns

 

    1. These two objects are subclasses of DataInputStream and DataOutputStream from the Java I/O package, which means that they support standard I/O operations. Professional Hadoop Solutions

 

    1. Pig has an abstract class called LoadFunc that has a number of methods that must be provided by concrete subclasses, as seen in figure 11.4 Hadoop in Practice

 

    1. Subclasses need only populate data in HDFS (perhaps by copying from a local file), run a MapReduce job, and confirm the output is as expected. Hadoop:The Definitive Guide 3rd Edition

 

    1. There are specific subclasses, depending on where the host is used, in other words, on the master, region server, and so on. HBase:The Definitive Guide

 

    1. This is an abstract class for interfacing with the filesystem, and there are different concrete subclasses for handling HDFS and the local filesystem. Hadoop in Action

 

    1. In most cases, the implementation of a MapReduce job will provide job-specific subclasses of the Mapper and Reducer base classes found in this package. Hadoop Beginners Guide

 

    1. Every instance of class Object (and its subclasses) possesses a lock that is obtained on entry to a synchronized method and automatically released upon exit. Concurrent Programming in Java Design Principles and Pattern

 

 

102. Matching texts for the word operand:

 

    1. There are actually two forms of these operators; you’ve just seen the postfix form of the operator that is placed after the operand. Core Java Volume I Fundamentals 9th Edition

 

    1. AND: It means that the given two terms (left and right operand) need to match in order for the clause to be matched. Mastering ElasticSearch

 

    1. An example is division when the second operand is zero or sqrt when its argument is negative. Java 8 in Action Lambdas, Streams and Functional-style Programming

 

    1. At an assignment to a field: if the type of the left-hand operand is a raw type, then a compile-time unchecked warning occurs if erasure changes the field's type. The Java Language Specification Java SE 7 Edition

 

    1. The bytecode has been annotated with the hexadecimal values corresponding to the opcodes and operand data. Oracle JRockit The Definitive Guide

 

    1. It is known as the functional approach because methods return the result of applying a function to their operand without modifying it. Effective Java 2nd Edition

 

    1. The assumption is that the first four bits of the command represent an operator, while the next 16 bits are an operand. Java 7 New Features Cookbook

 

    1. Some operators operate on only one operand; these are called unary operators. Java in a Nutshell 5th Edition

 

    1. Specifically, the conditional-And operator, &&, evaluates its right side only if the result of its left operand is true. Java Performance Tuning 2nd Edition

 

    1. Many of the constraints on bytecodes can be checked statically, such as the operand of an “ldc” bytecode must be a valid constant pool index whose type. Java Performance
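
A small Java illustration of the operand-related points above: the postfix form placed after its operand, a unary operator with a single operand, and && evaluating its right operand only when the left one is true.

    // Illustration of operands and operators mentioned in the quotes above.
    public class OperandDemo {
        public static void main(String[] args) {
            int i = 5;
            int postfix = i++;          // postfix form: operator after the operand; postfix == 5, i == 6
            int unary = -i;             // unary minus has a single operand; unary == -6

            int[] empty = new int[0];
            // The right operand is evaluated only if the left one is true,
            // so the array access below is never reached.
            boolean safe = empty.length > 0 && empty[0] == 42;

            System.out.println(postfix + " " + unary + " " + safe);
        }
    }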

 

 

103. Matching texts for the word php:

 

    1. He has worked with Perl/ PHP/Python, primarily for coding and MySQL/Oracle as the backend, for several years prior to moving into Hadoop fulltime. Apache Flume Distributed Log Collection for Hadoop

 

    1. Before I move on, however, I should mention that gweb is a PHP program, which most people run under the Apache web server (although any web server with PHP or FastCGI support should do the trick) Monitoring with Ganglia

 

    1. If you are not familiar with JavaScript but have used C or any other C-like language such as Java, PHP, or C#, this should look familiar. CouchDB The Definitive Guide

 

    1. He is also a well-known speaker who delivers talks on Drupal, Open Source, PHP, Moodle, and so on, along with organizing and delivering TechTalks in Drupal meetups and Drupal Camps in Mumbai, India. Mastering ElasticSearch

 

    1. Generating Thrift bindings for other languages (C++, PHP, and others) Cassandra High Performance Cookbook

 

    1. Later in this book you will use other tools to interact with the CouchDB database, including CouchDB’s very own Futon web-based administration interface, the JavaScript client-side web development language, and the PHP, Ruby, and Python programming languages. Beginning CouchDB

 

    1. Zach is now a developer at Elasticsearch and maintains the PHP. Elasticsearch The Definitive Guide

 

    1. Recall that PHP was originally designed to work within static HTML content. Hadoop in Action

 

    1. To generate Thrift bindings for other languages, pass it to the --gen switch (for example, thrift --gen php) Cassandra The Definitive Guide

 

    1. Although ActiveMQ is written in Java, APIs for many languages other than Java are provided, including C/C++, .NET, Perl, PHP, Python, Ruby, and many more. ActiveMQ in Action

 

 

104. Matching texts for the word applet:

 

    1. We describe how to package programs in JAR files, and how to deliver applications over the Internet with the Java Web Start and applet mechanisms. Core Java Volume I Fundamentals 9th Edition

 

    1. A Macintosh applet or application will have a different GUI than that of a Windows or Unix system. Java Network Programming and Distributed Computing

 

    1. Also, the decision to compile Java to JVM bytecode (a virtual machine code that soon every browser supported) meant that it became the language of choice for internet applet programs (do you remember applets?) Java 8 in Action Lambdas, Streams and Functional-style Programming

 

    1. The use of applet containers providing 1.1 APIs is outside the scope of this specification. Java EE 7 Specification

 

    1. In fact, whenever the class loader of an applet reads a JAR file, it reads and decompresses data from the network. Core Java Volume II Advanced Features 9th Edition

 

    1. The title of the window is specified using the java_applet_title parameter. Java 7 New Features Cookbook

 

    1. As of Java 1.2, any Java code, whether it is an applet, a servlet, a JavaBeans component, or a complete Java application, can be run with restricted permissions that prevent it from doing harm to the host system. Java in a Nutshell 5th Edition

 

    1. This also allows you to place the thread’s task in a subclass of some other class, such as Applet or HTTPServlet. Java Network Programming 4th Edition

 

    1. As is typical of GUI-based programs, ParticleApplet uses several auxiliary classes that do most of the work. Concurrent Programming in Java Design Principles and Pattern

 

    1. The following code is that of a Java applet class. Java 7 Recipes A Problem-Solution Approach
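
A minimal applet class of the kind the last quote refers to; the content is invented, and the Applet API is obsolete in current Java, but it matches the era of the books cited.

    import java.applet.Applet;
    import java.awt.Graphics;

    // Minimal applet: the browser (or appletviewer) instantiates the class
    // and calls its lifecycle methods; paint() draws into the applet's area.
    public class HelloApplet extends Applet {
        @Override
        public void paint(Graphics g) {
            g.drawString("Hello from an applet", 20, 20);
        }
    }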

 

 

105. Matching texts for the word sharding:

 

    1. We’ve mitigated this problem somewhat by setting up the Ganglia web UI on each server to redirect requests to the appropriate server for each category of metrics, so end users see a unified system that hides the sharding. Monitoring with Ganglia

 

    1. You could use multi-master, master/slave, partitioning, sharding, write-through caches, and all sorts of other complex techniques. CouchDB The Definitive Guide

 

    1. Those Lucene indices are called shards and the process of this spreading is called sharding. Mastering ElasticSearch

 

    1. The MySQL sharding scheme assigned each customer and all related data to a fixed shard. Spanner - Google’s Globally-Distributed Database

 

    1. Working with secondary indexes The primary ordering and sharding is done by the row key. Cassandra High Performance Cookbook

 

    1. As your data keeps growing, an important feature is the ability to split your data across multiple servers, also known as sharding. Elasticsearch in Action MEAP V07 cHAPTER 1

 

    1. Because it doesn’t have any sharding capabilities, you need to be careful not to overwhelm the database from which you are reading by running too many mappers. Hadoop:The Definitive Guide 3rd Edition

 

    1. This was true of many tools and techniques for scaling data processing, including caching (for example, memcached), replication, sharding, and, of course, MapReduce/ Hadoop. Hadoop in Action

 

    1. Sharding and shared-nothing architecture If you can’t split it, you can’t scale it. Cassandra The Definitive Guide

 

    1. A better and actually very common approach is to start by sharding (i.e., distributing) the data quasi randomly across the machines. PowerDrill - Processing a Trillion Cells per Mouse Click

 

 

106. Matching texts for the word optimized:

 

    1. In large grids with thousands of nodes per cluster, or carefully optimized HPC grids, in which every CPU cycle spent on something other than the problem is a wasted cycle, “normal” compute nodes are often configured as deaf in order to minimize the overhead associated with aggregating cluster state. Monitoring with Ganglia

 

    1. An index is a data structure optimized for quick search and retrieval. CouchDB The Definitive Guide

 

    1. The algorithm addresses a general case when a query may reference repeated fields; a simpler optimized version is used for flat-relational queries, i.e., those referencing only required and optional fields. Dremel - Interactive Analysis of WebScale Datasets

 

    1. Performance considerations The previous query execution is already optimized by ElasticSearch internals, by using caching mechanism. Mastering ElasticSearch

 

    1. Because Cassandra handles replication, disk systems can be optimized for more performance versus redundancy. Cassandra High Performance Cookbook

 

    1. This strategy provides the best of both worlds: streaming updates to disk in an efficient (sequential) fashion, and a vastly reduced re-write workload, allowing the long-term on-disk storage layout to be optimized for future read access. Ceph A Scalable, High-Performance Distributed File System

 

    1. In some highly optimized services even small monitoring overheads are easily noticeable, and might compel the deployment teams to turn the tracing system off. Dapper a Large-Scale Distributed Systems Tracing Infrastructure

 

    1. The node handling the request can no longer make use of the optimized bulk format to parse just the metadata line in order to decide which shard should receive the request. Elasticsearch The Definitive Guide

 

    1. This model is optimized for reading data, as it allows data to be read from any slave. Cassandra The Definitive Guide

 

    1. Its speed is the result of the combination of a fast transactional journal comprised of data log files, the highly optimized indexing of message IDs, and inmemory message caching. ActiveMQ in Action

 

 

107. Matching texts for the word transactional:

 

    1. Flume only provides transactional guarantees for each channel in each individual agent. Apache Flume Distributed Log Collection for Hadoop

 

    1. Operations within an account are transactional and consistent: a user who sends or labels a message is guaranteed to observe the change despite possible failover to another replica. Megastore - Providing Scalable, Highly Available Storage for Interactive Services

 

    1. Accountants don’t use erasers” is a quote from Pat Helland, a senior architect of transactional systems who worked at Microsoft and Amazon. CouchDB The Definitive Guide

 

    1. Third, F1 requires strong transactional semantics, which made using other NoSQL systems impractical. Spanner - Google’s Globally-Distributed Database

 

    1. In Chapter 3, Application Programmer Interface we should know how to use transactional style locking using Cages. Cassandra High Performance Cookbook

 

    1. However there have been some applications that have asked for transactional mainly for the purpose of maintaining secondary indices. Cassandra A Decentralized Structured Storage System

 

    1. Within a single CouchDB node, an update of a single document is transactional. Scaling CouchDB

 

    1. It’s transactional, supporting JTA/JTS, two-phase commit, deadlock detection, and MVCC. Cassandra The Definitive Guide

 

    1. This is a file-based message store that combines a transactional journal, for reliable message storage and recovery, with good performance and scalability. ActiveMQ in Action

 

    1. For example, in a transactional web-indexing system the programmer can make assumptions like: the hash of the contents of a document is always consistent with the table that indexes duplicates. Percolator - Large-scale Incremental Processing Using Distributed Transactions and Notifications

 

 

108. Matching texts for the word amazon:

 

    1. Creating mountains of data has become very easy, thanks to Facebook, Twitter, Amazon, digital cameras and camera phones, YouTube, Google, and just about anything else you can think of connected to the Internet. Apache Flume Distributed Log Collection for Hadoop

 

    1. This issue is also relevant when monitoring dynamic environments such as cloud resources (for example, Amazon EC2) Monitoring with Ganglia

 

    1. Werner Vogels, Amazon CTO and Vice President If availability is a priority, we can let clients write data to one node of the database without waiting for other nodes to come into agreement. CouchDB The Definitive Guide

 

    1. Amazon EC2 discovery The Amazon store, in addition to selling goods, has a few popular services, such as selling storage or computing power in the pay-as-you-go model. Mastering ElasticSearch

 

    1. Dynamo[6] is a storage system that is used by Amazon to store and retrieve user shopping carts. Cassandra A Decentralized Structured Storage System

 

    1. For example, Amazon has a tool called Elastic Load Balancing and Rackspace provides a service called Rackspace Cloud Load Balancers (in beta as of this writing) Scaling CouchDB

 

    1. The exponential growth of data first presented challenges to cutting-edge businesses such as Google, Yahoo, Amazon, and Microsoft. Hadoop in Action

 

    1. This original version of Cassandra was written primarily by an ex-employee from Amazon and one from Microsoft. Cassandra The Definitive Guide

 

    1. Before joining Improve Digital, he spent time at Amazon.co.uk, where he led several software development teams building systems that process Amazon catalog data for every item worldwide. Hadoop Beginners Guide

 

    1. When a user makes a purchase on Amazon, there are quite a few separate stages through which that order must travel including order placement, invoice creation, payment processing, order fulfillment, shipping, and more. ActiveMQ in Action

 

 

109. Matching texts for the word ssl:

 

    1. We suggest you read up on VPNs and setting up CouchDB behind an HTTP proxy (like Apache httpd’s mod_proxy, nginx, or varnish) that will handle SSL for you. CouchDB The Definitive Guide

 

    1. For a while, Kerberos implementations had the option of using Kerberized SSL (KSSL) for HTTP authentication — and it is still an option in many distributions. Professional Hadoop Solutions

 

    1. If SSL mutual authentication is used as the authentication protocol, the client can manage its own authentication context, and need not depend on references to the login session state. Java EE 7 Specification

 

    1. Unfortunately, CouchDB does not support SSL at the time of writing. Beginning CouchDB

 

    1. Using SSL over HTTP is transparent to application programmers; simply use URLs that start with https. Core Java Volume II Advanced Features 9th Edition

 

    1. Chapter 9, Database, Security, and System Enhancements: Various database improvements such as the introduction of the new RowSetFactory class are illustrated along with how to take advantage of new SSL support. Java 7 New Features Cookbook

 

    1. In Java 1.4 and later, you can use https:// with the URL class as previously shown to securely download documents from web servers that support SSL. Java in a Nutshell 5th Edition

 

    1. It discusses ActiveMQ URIs and all the transport connectors for both client-to-broker and broker-to-broker communications including TCP, NIO, STOMP, failover, SSL, HTTP/S, and much more. ActiveMQ in Action

 

    1. A similar technique can be used to disable the SSLv2Hello pseudo-protocol described above, which is not recognized by some SSL implementations. Fundamental Networking in Java

 

    1. There are a great number of events that can be handled using a WebViewClient such as a malformed URL or a failed SSL handshake. Java 7 Recipes A Problem-Solution Approach
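
As one quote notes, since Java 1.4 an https:// URL is enough to download a document over SSL; here is a short sketch with a placeholder host.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.net.URLConnection;

    // Fetches a document over SSL simply by using an https:// URL;
    // the JSSE provider handles the TLS handshake transparently.
    public class HttpsFetch {
        public static void main(String[] args) throws Exception {
            URL url = new URL("https://www.example.com/");   // example host
            URLConnection conn = url.openConnection();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream()))) {
                System.out.println(in.readLine());   // print the first line of the response
            }
        }
    }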

 

 

110. Matching texts for the word ejb:

 

    1. Currently, he is developing web applications using the latest Java technologies on the market (EJB 3.0, CDI, Spring, JSF, Struts, Pro Java 7 NIO.2

 

    1. Similar access to databases is provided to JSPs, JSF applications, and servlets by the Web Container, and to enterprise beans by the EJB Container. Java EE 7 Specification

 

    1. Transactional annotation brings transactional semantics to POJOs (plain old Java objects), outside of an EJB (Enterprise JavaBean) Java EE 7 Essentials

 

    1. We’d like to address a common confusion before going any further: The JavaBeans that we discuss in this chapter have little in common with Enterprise JavaBeans (EJB) Core Java Volume II Advanced Features 9th Edition

 

    1. He has written numerous publications and has also written the EJB 3.1 Cookbook. Java 7 New Features Cookbook

 

    1. Particular care should be taken when determining applicable requirements based on the presence of EJB Lite in the Web Profile. Java EE 7 Web Profile Specification

 

    1. However, in their book J2EE Performance Testing with BEA WebLogic Server (Expert Press), Peter Zadrozny, Philip Aston, and Ted Osborne state that there is no real difference in performance between these two options, as long as the beans are explicitly removed from the container when the session expires.[2] The beans are removed automatically by binding the session to a session listener that calls ejbRemove( ) when the session terminates. Java Performance Tuning 2nd Edition

 

    1. Technologies such as those discussed in chapter 2 (COM, CORBA, DCE, and EJB) using RPC are considered to be tightly coupled. ActiveMQ in Action

 

    1. A special thanks goes to Rahul Biswas for providing content relating to EJB and Java persistence and also for his willingness to review multiple drafts and provide valuable feedback. Java Performance

 

    1. The Enterprise JavaBean (EJB) 3.0 model used interfaces for interacting with methods that performed database work. Java 7 Recipes A Problem-Solution Approach
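
A minimal EJB 3.x stateless session bean, illustrating the annotation-driven model contrasted with the older interface-based approach in the last quote; the bean name and method are invented.

    import javax.ejb.Stateless;

    // A minimal EJB 3.x stateless session bean: a plain class plus one
    // annotation. The business logic is a placeholder.
    @Stateless
    public class GreetingBean {
        public String greet(String name) {
            return "Hello, " + name;
        }
    }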

 

 

111. Matching texts for the word manning:

 

    1. Manning Early Access (MEAP) readers who posted comments in the Author Online forum. Mahout in Action

 

    1. Copyright 2014 Manning Publications For more information on this and other Manning titles go to. Netty in Action MEAP V08

 

    1. Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Gradle in Action

 

    1. Without his recommendation, Manning would not have come knocking on our inboxes, and we would never have written a book together. RabbitMQ in Action

 

    1. Toward the end of 2010, the idea of writing HBase in Action for Manning came up. HBase in Action

 

    1. I’m also indebted to Cynthia Kane, my development editor at Manning, who coached me through writing this book and provided invaluable feedback on my work. Hadoop in Practice

 

    1. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine. Hadoop in Action

 

    1. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps. ActiveMQ in Action

 

    1. Please post comments or corrections to the Author Online forum at http://www.manning-sandbox.com/forum.jspa?forumID=451 Lucene in Action 2nd Edition

 

    1. The purchase of MongoDB in Action includes free access to a private forum run by Manning Publications where you can make comments about the book, ask technical questions, and receive help from the author and other users. MongoDB in Action

 

 

112. Matching texts for the word querying:

 

    1. They could be built quickly, without having to worry about drivers, languages, or indeed many of the complexities of querying and retrieving information. Getting Started with CouchDB

 

    1. These ports are used by gweb, which is usually installed on the same host as gmetad, so unless you’re using some of the advanced integration features, such as Nagios integration, or have custom scripts querying gmetad, you shouldn’t need any firewall ACLs for gmetad. Monitoring with Ganglia

 

    1. In the blog app, we use group_level reduce queries to compute the count of comments both on a per-post and total basis, achieved by querying the same view index with different methods. CouchDB The Definitive Guide

 

    1. Extend your knowledge on ElasticSearch, and querying and data handling, along with its internal workings. Mastering ElasticSearch

 

    1. Hive is a data warehouse infrastructure built on top of Hadoop that provides tools to enable easy data summarization, ad hoc querying, and analysis of large data sets stored in Hadoop files. Cassandra High Performance Cookbook

 

    1. Although it might be suitable and fast for general-purpose querying, it starts to become more cumbersome when you try to perform aggregation and reporting. Beginning CouchDB

 

    1. Don’t use the include_docs parameter (or set it to false) when querying views. Scaling CouchDB

 

    1. Query time search-as-you-type; Index time optimizations; Ngrams for partial matching; Index time search-as-you-type; Preparing the index; Querying the field; Edge n-grams and postcodes; Ngrams for compound words. Elasticsearch The Definitive Guide

 

    1. We present an overview of what’s different about Cassandra’s model for querying and updating data, and then get to work using the API. Cassandra The Definitive Guide

 

    1. The Geronimo deployer is a command-line tool for querying and installing Geronimo plug-ins. ActiveMQ in Action

 

 

113. Matching texts for the word junit:

 

    1. By the end of this chapter, you’ll write tests with JUnit, TestNG, and Spock and execute them as part of the build lifecycle. Gradle in Action

 

    1. Let's write a simple JUnit test for a sample Java class. Gradle Effective Implementation Guide

 

    1. The code For this chapter, we created the Maven project (http://maven.apache.org) and every aspect we described is connected with a particular JUnit (http://junit.org/) test in that project. Mastering ElasticSearch

 

    1. From the JUnit installation, copy the junit.jar into your library path. Cassandra High Performance Cookbook

 

    1. MRUnit is built on top of the popular JUnit testing framework. Hadoop RealWorld Solutions Cookbook

 

    1. The name is a unique label for the module itself, and is often the same as the project name (like solr-core, spring-web, or junit) Gradle Beyond The Basics

 

    1. One of the powerful features of JUnit and other test frameworks is that when tests fail, the failure message includes details on the cause of the failure. Hadoop in Practice

 

    1. For running unit tests we use junit, as well as a couple of helper libraries: hamcrest-all provides useful matchers for writing test assertions, and mrunit is used for writing MapReduce tests. Hadoop:The Definitive Guide 3rd Edition

 

    1. MRUnit takes the unit test model of its namesakes such as JUnit and DBUnit and provides a framework to help write and execute tests that can help improve the quality of your code. Hadoop Beginners Guide

 

    1. This facilitates compiled tests being able to reference JUnit or TestNG annotations or base classes. Building and Testing with Gradle
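
A minimal JUnit 4 test of the kind referred to above, assuming JUnit 4 on the classpath and Java 8 or later.

    import static org.junit.Assert.assertEquals;
    import org.junit.Test;

    // The test runner discovers the @Test method and reports any assertion
    // failure with details about the expected and actual values.
    public class StringJoinTest {
        @Test
        public void joinConcatenatesWithSeparator() {
            assertEquals("a-b-c", String.join("-", "a", "b", "c"));
        }
    }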

 

 

114. Matching texts for the word workflow:

 

    1. Let’s examine the workflow of the Songbird backup application, first as a user backing up from a single computer, and then using Songbird to synchronize playlists between multiple computers. CouchDB The Definitive Guide

 

    1. Figure 7.13 demonstrates the page workflow you’ll need to script. Gradle in Action

 

    1. Still worse, instead of losing a single workflow, a JobTracker failure caused an outage that would lose all the running jobs in a cluster and require users to manually recover their workflows. Apache Hadoop YARN Yet Another Resource Negotiator

 

    1. This is an easy way to begin sketching out and prototyping code whose final form you do not yet know, which is a typical workflow in the development of new forms of build automation. Gradle Beyond The Basics

 

    1. Often times we require multiple MapReduce applications to be executed in a workflow-like manner to achieve our objective. Hadoop MapReduce Cookbook

 

    1. Space considerations do not allow us to demonstrate every feature therein, but a typical user workflow is shown in Figure 6 Dapper a Large-Scale Distributed Systems Tracing Infrastructure

 

    1. Within a few months we’d built and deployed a MapReduce application encompassing a number of MapReduce jobs, woven together with our own MapReduce workflow management system onto a small cluster of 18 nodes. Hadoop in Practice

 

    1. The output of a workflow is some subset of the input pages along with a set of index terms. Hadoop in Action

 

    1. Looking forward, Imagini hopes to simplify their workflow once Hadoop Streaming becomes available with Cassandra. Cassandra The Definitive Guide

 

    1. Once again, using any of the multiple interfaces (web console, CLI, or API), a Hadoop workflow is defined with attributes such as the number of Hadoop hosts required and the location of the source data. Hadoop Beginners Guide

 

 

115. Matching texts for the word gui:

 

    1. There are two ways to create/modify views: one is via the web GUI, and the other by programmatically defining views using JSON. Creating views using the GUI. Monitoring with Ganglia

 

    1. The NameNode web GUI is not known to work correctly with Hadoop 0.19.0 under the virtual cluster. Pro Hadoop

 

    1. The use case starts out as a plain Java application without a GUI, simply controlled through console input. Gradle in Action

 

    1. Sometimes if a GUI web browser is not available, we can use a command line based web browser such as elinks and lynx. Hadoop Operations and Cluster Management Cookbook

 

    1. We will write our first Gradle script and learn about the command-line and GUI features of Gradle. Gradle Effective Implementation Guide

 

    1. Her interests and experiences lie across multiple tiers, from web frontend GUI to middleware, from middleware to backend SQL RDBMS and NoSQL data storage. Cassandra High Performance Cookbook

 

    1. It functions much like a GUI-based FTP client in managing remote storage. Hadoop in Action

 

    1. A GUI will start and load the JConsole tool that ships with Java. Cassandra The Definitive Guide

 

    1. In addition to concurrency constructs, this example illustrates a few of the issues encountered when using threads with any GUI-based program. Concurrent Programming in Java Design Principles and Pattern

 

    1. Gradle must drive the initiation of the build, say from the command line, IDE, or GUI, but can be either the subordinate or dominant player in the actual XML and Gradle build script flow. Building and Testing with Gradle

 

 

116. Matching texts for the word faceting:

 

    1. It consists of different UI templates, request/response handlers, and different faceting provided by Solr. Scaling Big Data with Hadoop and Solr Chapter-02 Understanding Solr

 

    1. You should also know how to send queries to get the documents you are interested in, how to narrow down the results of your queries by using filtering, and how to calculate statistics for your data with the use of the faceting/ aggregation mechanism. Mastering ElasticSearch

 

    1. Solr supports complex search criteria, faceting, result highlighting, query-completion, query spellcheck, relevancy tuning, and more. Solr 3 Enterprise Search Server

 

    1. Chapter 6, Using Faceting Mechanism: This chapter will show you the beautiful world of the Apache Solr faceting mechanism, including tasks like getting the number of documents with the same field value, matching the same query, matching given range values, and so on. Solr 3.1 Cookbook

 

    1. Looking ahead, in part 2, we'll cover Elasticsearch's core functionality when it comes to dealing with data: indexing, searching, analysis, and faceting. Elasticsearch in Action MEAP V07 cHAPTER 1

 

    1. Once the user performs a query, the results can be further categorized using Solr’s faceting support to show features of the documents in the result set. Solr in action

 

    1. You will learn how to deal with the problems in Solr configuration and setup, how to handle common querying problems, how to fine-tune Solr instances, how to set up and use SolrCloud, how to use faceting and grouping, fight common problems, and many more things. Apache Solr 4 Cookbook

 

 

117. Matching texts for the word gmond:

 

    1. Look at each instance of gmond within the same network broadcast domain (since reachability is based on multicast packets), and find the udp_recv_channel block in gmond.conf. Apache Flume Distributed Log Collection for Hadoop

 

    1. We will install Ganglia Monitoring Daemon (Gmond) on each node in the cluster, which will gather the server and HBase metrics of that node. HBase Administration Cookbook

 

    1. As depicted in Figure 1-1, Ganglia is architecturally composed of three daemons: gmond, gmetad, and gweb. Monitoring with Ganglia

 

    1. Ganglia is composed of two servers: the gmetad server, which provides historical data and collects current data, and the gmond server, which collects and serves current statistics. Pro Hadoop

 

    1. Start the gmond daemon with the following command on the master node: sudo service gmond start. Hadoop Operations and Cluster Management Cookbook

 

    1. The process that feeds metrics to your statistics system, such as collectd and gmond, probably already runs on the storage node. bk-admin-guide-cloud-latest

 

    1. The Ganglia monitoring daemon (gmond) should be running on every worker node in the cluster. Hadoop RealWorld Solutions Cookbook

 

    1. Multicast Versus Unicast While the default communication method between monitoring daemons (gmond) is UDP multicast messages, you may encounter environments where multicast is either not possible or a limiting factor. HBase:The Definitive Guide

 

    1. It works by running a small monitoring daemon on each host called gmond that collects metrics locally. Hadoop Operations

 

 

118. Matching texts for the word endpoint:

 

    1. The syslog TCP source As previously mentioned, the syslog TCP source provides an endpoint for messages over TCP, allowing for a larger payload size and TCP retry semantics that should be used for any reliable inter-server communications. Apache Flume Distributed Log Collection for Hadoop

 

    1. A maximal matching is one to which no additional edge can be added without sharing an endpoint. Pregel  -  A System for Large-Scale Graph Processing

 

    1. Bulk Document Inserts The fastest mode for importing data into CouchDB via HTTP is the _bulk_docs endpoint. CouchDB The Definitive Guide

 

    1. It’s responsible for accepting HTTP requests, executing a CRUD operation mapped to a URL endpoint, and forwarding the request to a JSP. Gradle in Action

 

    1. MultiGet The MultiGet operation is available via the _mget endpoint and allows fetching several documents using a single request. Mastering ElasticSearch

 

    1. As soon as a natural endpoint finds data, it is returned to the client. Cassandra High Performance Cookbook

 

    1. Let’s look at how to write a MapReduce job to pull data from an HTTP endpoint: Hadoop in Practice

 

    1. You can see that we’re still using index megacorp and type employee, but instead of specifying a document ID, we now use the _search endpoint. Elasticsearch The Definitive Guide

 

    1. Set the environment variable EC2_URL to the service endpoint of a different region if you choose to. Hadoop in Action

 

    1. When a server node is started, it registers itself with the gossiper to receive endpoint state information. Cassandra The Definitive Guide

 

 

119. Matching texts for the word tuples:

 

    1. In particular, Java provides poor support for simple anonymous functions and heterogeneous tuples, which leads to verbosity and some loss of static type safety. FlumeJava - Easy, Efficient Data-Parallel Pipelines

 

    1. Finally, we used the FLATTEN operator to unnest the Tuples in the DataBags emitted from the UDF. Hadoop RealWorld Solutions Cookbook

 

    1. Compound types include arrays, maps (an overloaded term in this paper), and tuples. Interpreting the Data Parallel Analysis with Sawzall

 

    1. What is Hadoop? The role of the programmer is to define map and reduce functions, where the map function outputs key/value tuples, which are processed by reduce functions to produce the final output. Hadoop in Practice

 

    1. The result of the LOAD operator, and indeed any operator in Pig Latin, is a relation, which is just a set of tuples. Hadoop:The Definitive Guide 3rd Edition

 

    1. To support this diversity of data, Pig supports complex data types, such as bags   and tuples   that can be nested to form fairly sophisticated data structures. Hadoop in Action

 

    1. It was new, with strange new vocabulary and terms such as “tuples”—familiar words used in a new and different manner. Cassandra The Definitive Guide

 

    1. Tenzing currently has support for creating functions in Sawzall that take tables (vector of tuples) as input and emit tables as output. Tenzing - A SQL Implementation On The MapReduce Framework

 

    1. Bigtable presents a multi-dimensional sorted map to users: keys are (row, column, timestamp) tuples. Percolator - Large-scale Incremental Processing Using Distributed Transactions and Notifications

 

    1. The bottom, colored segment of the bars represents the time taken by HadoopDB when hand-coded and the rest of the bar indicates the additional overhead as a result of the SMS planner (e.g., SMS jobs need to serialize tuples retrieved from the underlying database and deserialize them before further processing in Hadoop) HadoopDB An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads

 

 

120. Matching texts for the word standalone:

 

    1. Tiering Flume agents is covered using Avro serialization, as well as using the Flume command line as a standalone Avro client for testing and importing data manually. Apache Flume Distributed Log Collection for Hadoop

 

    1. The installed CouchDB can be run both standalone and also as a service. Getting Started with CouchDB

 

    1. Primarily, we discuss two ways in which to extend Ganglia, including the development and deployment of additional metric modules, and the use of a standalone utility called gmetric. Monitoring with Ganglia

 

    1. The goal here is to provide a way to build standalone apps that can also be easily indexed by search engines and used via screen readers. CouchDB The Definitive Guide

 

    1. Standalone writes are implemented as read-write transactions; non-snapshot standalone reads are implemented as read-only transactions. Spanner - Google’s Globally-Distributed Database

 

    1. And it packages up all of this functionality into a standalone server that your. Elasticsearch The Definitive Guide

 

    1. What’s in This Book? This book is designed with the chapters acting, to a reasonable extent, as standalone guides. Cassandra The Definitive Guide

 

    1. There is also an API to directly execute queries on the pool, and a standalone binary which does not need any server side components, but rather launches its own MapReduce jobs. Tenzing - A SQL Implementation On The MapReduce Framework

 

    1. Figure 2.2 Standalone terminals and applications connecting to a mainframe using many protocols. ActiveMQ in Action

 

    1. Though many of these features look Maven-like in their naming and function, these are behaviors desired by nearly any Java project, even standalone single-JAR Java applications. Building and Testing with Gradle

 

 

121. Matching texts for the word ssh:

 

    1. It is best to ensure that all nodes are set up the same way using a common shell script, executed via a parallel SSH client or configuration management tool. Monitoring with Ganglia

 

    1. To establish a secure SSH connection between your computer and GitHub, you’ll need to generate SSH keys and add the public key to your GitHub account. Gradle in Action

 

    1. The output tells us that the local machine has the following services running: ftp, ssh, smtp, rpcbind (service for remote procedure calls), and jpp (service for Java packaging) Hadoop Operations and Cluster Management Cookbook

 

    1. The repository can be accessible via the file system, HTTP, SSH, or other protocols. Gradle Effective Implementation Guide

 

    1. The recipe Connecting with JConsole over a SOCKS Proxy shows how to use JConsole with a host only reachable by SSH. Cassandra High Performance Cookbook

 

    1. We will do this using public key authentication, but you could instead use ssh-agent or rsh. Scaling CouchDB

 

    1. If you are using Windows machines, install Cygwin and SSH server in each machine. Hadoop MapReduce Cookbook

 

    1. When you start Hadoop it’ll launch NameNode and JobTracker on the local host from which you issued the start command, and then SSH to all the nodes in this file to launch the SecondaryNameNode. Hadoop in Practice

 

    1. If a Hadoop cluster is already available to you, you can skip the next section on how to set up Secure Shell (SSH) channels between nodes. Hadoop in Action

 

    1. Time for action – setting up SSH Carry out the following steps to set up SSH: Hadoop Beginners Guide

 

 

122. Matching texts for the word kafka:

 

    1. This paper discusses the design and engineering problems we encountered in moving LinkedIn’s data pipeline from a batch-oriented file aggregation mechanism to a real-time publish-subscribe system called Kafka. Building LinkedIn’s Real-time Activity Data1

 

    1. Apache Kafka) and supports all three types of the Trident spout transaction. Storm Blueprints Patterns for Distributed Real-time Computation

 

    1. Getting ready Before we get started, we need to install a message broker that more readily supports the transactional logic inherent in Storm's Trident, specifically Apache Kafka. Storm Real-Time Processing Cookbook

 

    1. Set up Apache Kafka clusters and develop custom message producers and consumers using practical, hands-on examples. Apache Kafka

 

    1. We introduce Kafka, a distributed messaging system that we developed for collecting and delivering high volumes of log data with low latency. Kafka a Distributed Messaging System for Log Processing
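
A minimal producer sketch for the kind of log publishing described above, using the newer kafka-clients Java API rather than the original Scala client from the paper; the broker address and topic name are placeholders.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    // Publishes one log line to a Kafka topic using the Java client API.
    public class LogProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");

            try (Producer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("activity-events",
                        "web01", "GET /index.html 200"));
            }
        }
    }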

 

 

123. Matching texts for the word rmi:

 

    1. How to write server-side Java applications for the WWW, and distributed computing technologies including remote method invocation (RMI). Java Network Programming and Distributed Computing

 

    1. Applications that use RMI typically have trouble running on more secure networks. Cassandra High Performance Cookbook

 

    1. The RMI-IIOP subsystem is composed of APIs that allow for the use of RMI-style programming that is independent of the underlying protocol, as well as an.

 

    1. I felt like we’d need to precisely control how things like connections, timeouts and buffers are handled, and RMI gives you little control over those. Hadoop:The Definitive Guide 3rd Edition

 

    1. Communication between the client and the middle tier can occur through HTTP (when you use a web browser as the client) or another mechanism such as remote method invocation (RMI, see Chapter 11) Core Java Volume II Advanced Features 9th Edition

 

    1. Throwable implements Serializable so exceptions from remote method invocation (RMI) can be passed from server to client. Effective Java 2nd Edition

 

    1. Notable omissions are the RMI and IDL tools that are specific to enterprise programming and which are documented in Java Enterprise in a Nutshell (O’Reilly) Java in a Nutshell 5th Edition

 

    1. In addition, Java makes heavy use of nonstandard custom schemes such as rmi, jar, jndi, and doc for various purposes. Java Network Programming 4th Edition

 

    1. Such non-JMS APIs could consist of utilizing the CORBA IIOP protocol or some other native protocol beyond Java RMI. ActiveMQ in Action

 

    1. By convention, this notion is further restricted to constructs affecting a single JVM, as opposed to distributed programming, for example using remote method invocation (RMI), that involves multiple JVMs residing on multiple computer systems. Concurrent Programming in Java Design Principles and Pattern
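
To ground the RMI terminology used above, the following sketch defines a remote interface and looks up a stub from an RMI registry. GreetingService, the binding name "greeting", and the registry host/port are hypothetical; a server is assumed to have already exported and bound an implementation.

import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;

// A remote interface: methods callable across JVMs must declare RemoteException.
interface GreetingService extends Remote {
    String greet(String name) throws RemoteException;
}

public class RmiLookupSketch {
    public static void main(String[] args) throws Exception {
        // Assumes a server has bound an implementation under the name "greeting".
        Registry registry = LocateRegistry.getRegistry("localhost", 1099);
        GreetingService service = (GreetingService) registry.lookup("greeting");
        System.out.println(service.greet("world"));   // the call travels over the network
    }
}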

 

 

124、单词 bytecode 的匹配文本:

 

    1. The Java compiler does this by generating bytecode instructions which have nothing to do with a particular computer architecture. Core Java Volume I Fundamentals 9th Edition

 

    1. Additionally, bytecode swap technologies like JRebel can be configured to perform hot deployment for class file changes. Gradle in Action

 

    1. As long as the code is compiled into bytecode, we are fine. Gradle Effective Implementation Guide

 

    1. For example, if you need to hack an old, unsupported vendor JAR at the bytecode level, or otherwise have a highly custom. Gradle Beyond The Basics

 

    1. The reason it was discouraged is that older compilers would generate bytecode using the StringBuffer class, whose synchronization slowed down its execution. Hadoop in Practice

 

    1. However, the bytecode format used in the class files is well documented, and it is an easy matter for someone with experience in assembly programming and a hex editor to manually produce a class file containing valid but unsafe instructions for the Java virtual machine. Core Java Volume II Advanced Features 9th Edition

 

    1. Besides being more readable and concise than using multiple catch blocks, the generated bytecode is also smaller and does not result in the generation of duplicate code. Java 7 New Features Cookbook

 

    1. To use the UDF inside Hive, compile the Java code and package the UDF bytecode class file into a JAR file. Programming Hive

 

    1. Here is one possible execution trace, showing only the reads and writes to variable n that would result from the putfields and getfields in compiled bytecode. Concurrent Programming in Java Design Principles and Pattern

 

    1. Platform independent; works by bytecode instrumentation using ObjectWeb ASM. BTrace Java Platform Observability by Bytecode Instrumentation

 

 

125、单词 datagram 的匹配文本:

 

    1. Your operating system and network aren’t hostile to multicast and/or User Datagram Protocol (UDP) Monitoring with Ganglia

 

    1. Each datagram is treated as a discrete unit, unrelated to any other previously sent packet—there are no "connections" between machines at the network layer. Java Network Programming and Distributed Computing

 

    1. Common bulk command sends documents by HTTP protocol and UDP bulk sends these using connectionless datagram protocol. Mastering ElasticSearch

 

    1. Note too that the receive( ) method of the DatagramSocket class in Java 2 now calls the checkAccept( ) rather than the checkConnect( ) method. Java Security 2nd Edition

 

    1. In this book, we cover only the Transmission Control Protocol (TCP).The Java platform also supports the User Datagram Protocol (UDP), which can be used to send packets (also called datagrams) with much less overhead than that of TCP. Core Java Volume II Advanced Features 9th Edition

 

    1. If, in addition, you don’t need a guarantee that your data will get there or that the packets of data will arrive in the order you sent them, you may be interested in the DatagramSocket and DatagramPacket classes: Java in a Nutshell 5th Edition

 

    1. Chapter 12, UDP, introduces the User Datagram Protocol (UDP) and the associated DatagramPacket and DatagramSocket classes that provide fast, unreliable communication. Java Network Programming 4th Edition

 

    1. User Datagram Protocol (UDP) along with TCP make up the core of internet protocols. ActiveMQ in Action

 

    1. A ‘datagram’ is a single transmission which may be delivered zero or more times. Fundamental Networking in Java

 

    1. Solution Make use of datagram multicasting using the DatagramChannel class. Java 7 Recipes A Problem-Solution Approach
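
Since several of the excerpts mention DatagramSocket and DatagramPacket, here is a minimal sketch of sending a single UDP datagram from Java; the destination localhost:9876 and the payload are placeholder values.

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

public class DatagramSketch {
    public static void main(String[] args) throws Exception {
        byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);
        // Each datagram is self-contained: the destination address and port travel with the packet.
        DatagramPacket packet = new DatagramPacket(
                payload, payload.length, InetAddress.getByName("localhost"), 9876);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(packet);   // fire-and-forget: no connection, no delivery guarantee
        }
    }
}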

 

 

126、单词 hostname 的匹配文本:

 

    1. The headers are key/value pairs that can be used to make routing decisions or carry other structured information (such as the timestamp of the event or hostname of the server where the event originated) Apache Flume Distributed Log Collection for Hadoop

 

    1. The UI allowed for filtering by hostname and could limit the total number of hosts displayed. Monitoring with Ganglia

 

    1. It tells the server the hostname that came with the request. CouchDB The Definitive Guide

 

    1. For a joining node to receive data from other nodes, do not include its hostname or IP in its seed list. Cassandra High Performance Cookbook

 

    1. All client hosts must be able to resolve the hostname of the machine running the tests. Scaling CouchDB

 

    1. The protocol, hostname and port of any node in the cluster. Elasticsearch The Definitive Guide

 

    1. We can set two servers to be seeds just by indicating the IP address or hostname of this node and then adding our second instance: Cassandra The Definitive Guide

 

    1. The /etc/hosts file must also contain an entry for the real IP address and the hostname for a proper configuration. ActiveMQ in Action

 

    1. For example, in Webtable, pages in the same domain are grouped together into contiguous rows by reversing the hostname components of the URLs. Bigtable - A Distributed Storage System for Structured Data

 

    1. See Task Types for a description of the most important builtin task types.) The task declares properties (i.e., hostname, database, sql, etc.) in conventional Groovy idiom. Building and Testing with Gradle
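
A small sketch of working with hostnames from Java, for reference; example.org is used purely as an illustrative name to resolve.

import java.net.InetAddress;

public class HostnameSketch {
    public static void main(String[] args) throws Exception {
        InetAddress local = InetAddress.getLocalHost();
        // Hostname and IP address of the machine running this JVM.
        System.out.println(local.getHostName() + " / " + local.getHostAddress());
        // Forward resolution of a name (an illustrative one) to an address.
        System.out.println(InetAddress.getByName("example.org").getHostAddress());
    }
}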

 

 

127、单词 clojure 的匹配文本:

 

    1. I’ve written this book for experienced Java programmers who are interested in learning how to manage and make use of concurrency on the JVM, using languages such as Java, Clojure, Groovy, JRuby, and Scala. Programming Concurrency on the JVM

 

    1. Even though it wasn’t part of the core Maven project, a similar approach was proposed by the project Maven Polyglot that allows you to write your build definition logic, which is the project object model (POM) file, in Groovy, Ruby, Scala, or Clojure. Gradle in Action

 

    1. You may have heard of the term closure and may be wondering whether lambdas meet the definition of a closure (not to be confused with the Clojure programming language) Java 8 in Action Lambdas, Streams and Functional-style Programming

 

    1. One route is a transactional memory system, such as the one that comes with the Clojure language. NoSQL Distilled

 

    1. Jython is an implementation of Python on the JVM, and there are entirely unique languages like Clojure and Scala as well. HBase in Action

 

    1. You can write MapReduce jobs in Clojure while accessing HBase tables. HBase:The Definitive Guide

 

    1. Seven Languages in Seven Weeks In this book you’ll get a hands-on tour of Clojure, SQL Antipatterns

 

    1. He envisages himself using functional languages in projects and the decision to learn Clojure (a little bit of JRuby, Scala, F#, and Dart lately) influences his current self-learning activities. Java 7 New Features Cookbook

 

    1. All are JVM (Java Virtual Machine) libraries that can be used from programming languages like Java, Clojure, Scala, JRuby, Groovy, and Jython, as opposed to tools with their own languages, like Hive and Pig. Programming Hive

 

    1. In addition to backing pure Java apps, the Java driver also forms the basis for the drivers powering JVM languages like Scala, Clojure, and JRuby. MongoDB in Action

 

 

128、单词 descriptor 的匹配文本:

 

    1. You’ll need to learn about Mojos (Maven’s internal extension API), how to provide a plugin descriptor (again in XML), and about specific annotations to provide the data needed in your extension implementation. Gradle in Action

 

    1. Ivy fully supports Maven descriptor files and repositories; it also adds some extra functionality. Gradle Effective Implementation Guide

 

    1. If you look closely at the assembly plugin configuration, you'll notice that we specify the assembly descriptor, called release.xml, in the assembly directory. Mastering ElasticSearch

 

    1. Each Maven coordinate, whether it corresponds to any code, source, or JavaDocs at all, always describes an XML descriptor file called a POM (Project Object Model) Gradle Beyond The Basics

 

    1. This behavior is similar to the fsync system call in POSIX that commits buffered data for a file descriptor. Hadoop:The Definitive Guide 3rd Edition

 

    1. Open() opens a named file or directory to produce a handle, analogous to a UNIX file descriptor. Chubby - The lock service for loosely-coupled distributed systems

 

    1. The first row has the column descriptor “CITED” from the original file. Hadoop in Action

 

    1. The database descriptor logs are a good thing, as this indicates that the operation loaded your schemas. Cassandra The Definitive Guide

 

    1. Truncating the file doesn’t work because the JVM will continue to write to the open file descriptor until the daemon is restarted. Hadoop Operations

 

    1. Instead of processing messages one at a time, MDBs can process multiple messages at the same time because the EJB container can create as many instances of the MDBs as are allowed by the EJB’s deployment descriptor. ActiveMQ in Action

 

 

129、单词 lifecycle 的匹配文本:

 

    1. We’ll touch on how these objects map to the corresponding classes in the Gradle API, Gradle’s build lifecycle, the incremental build feature, and the mechanics of registering lifecycle hooks. Gradle in Action

 

    1. We notice that only the LIFECYCLE, WARN, QUIET, and ERROR log levels are shown if we don't add any extra command-line options. Gradle Effective Implementation Guide

 

    1. The ApplicationMaster is the “head” of a job, managing all lifecycle aspects including dynamically increasing and decreasing resources consumption, managing the flow of execution (e.g., running reducers against the output of maps), handling faults and computation skew, and performing other local optimizations. Apache Hadoop YARN Yet Another Resource Negotiator

 

    1. The life cycle of Java object and garbage collections In order to see how garbage collector works, let's get through the lifecycle of a sample Java object. Mastering ElasticSearch

 

    1. Cassandra is a distributed system, and distributed deletes have a complex lifecycle of their own. Cassandra High Performance Cookbook

 

    1. We will cover four discrete areas of Gradle functionality: file operations, custom Gradle plug-ins, build lifecycle hooks, and dependency management. Gradle Beyond The Basics

 

    1. States The ZooKeeper object transitions through different states in its lifecycle (see Figure 14-3) Hadoop:The Definitive Guide 3rd Edition

 

    1. Common issues, such as the data lifecycle, that need to be considered regardless of technology. Hadoop Beginners Guide

 

    1. This class is used to configure the broker and manage its entire lifecycle. ActiveMQ in Action

 

    1. In Maven, a plug-in is a means of extending the tool with a particular, fine-grained action to be performed during your build, often associated with one or more phases in Maven’s elaborate lifecycle model. Building and Testing with Gradle

 

 

130、单词 optimizations 的匹配文本:

 

    1. In this section we discuss the optimizations and innovations that make Paxos practical for our system. Megastore - Providing Scalable, Highly Available Storage for Interactive Services

 

    1. Its premise is zero configuration, heavy optimizations, and a beer theme. CouchDB The Definitive Guide

 

    1. A number of optimizations in our system are therefore targeted at reducing the amount of data sent across the network: the locality optimization allows us to read data from local disks, and writing a single copy of the intermediate data to local disk saves network bandwidth. MapReduce - Simplied Data Processing on Large Clusters

 

    1. Similarly, the optimizer does not modify any user code as part of its optimizations. FlumeJava - Easy, Efficient Data-Parallel Pipelines

 

    1. Many optimizations can be put in place to not affect incoming read requests. Cassandra A Decentralized Structured Storage System

 

    1. Performance tuning of large scale systems can sometimes result in counterintuitive optimizations. Scaling CouchDB

 

    1. Query time search-as-you-type; Index time optimizations; Ngrams for partial matching; Index time search-as-you-type; Preparing the index; Querying the field; Edge n-grams and postcodes; Ngrams for compound words. Elasticsearch The Definitive Guide

 

    1. This eliminates the need for any sophisticated schema design and allows Amazon to put effort toward low-latency and high-throughput performance optimizations and their other primary goals. Cassandra The Definitive Guide

 

    1. It presents some general tuning techniques covering such topics as persistent versus nonpersistent messages, transactions, embedded brokers, tuning the wire level protocol, tuning the TCP transport, and some optimizations for message producers and message consumers. ActiveMQ in Action

 

    1. Locality groups do not support CPUcache-level optimizations, such as those described by Ailamaki [2] Bigtable - A Distributed Storage System for Structured Data

 

 

131、单词 repositories 的匹配文本:

 

    1. You’ll find that some RPM-based distributions ship with Ganglia packages in the base repositories, and others require you to use special-purpose package repositories, such as the Red Hat project’s EPEL (Extra Packages for Enterprise Linux) repository. Monitoring with Ganglia

 

    1. You’ll find additional references to source code repositories that either take some examples from the book to the next level or demonstrate the use of Gradle in contexts not covered in the book. Gradle in Action

 

    1. We will also see how we can use repositories with dependencies in our build scripts. Gradle Effective Implementation Guide

 

    1. That's right; you don't need to manually download dependencies if they are available in one of the available Maven repositories. Mastering ElasticSearch

 

    1. Licensing issues prevents distribution of the Oracle JVM in most RPM repositories. Cassandra High Performance Cookbook

 

    1. Examples include telephone call records, network logs, and web document repositories. Interpreting the Data Parallel Analysis with Sawzall

 

    1. You may prefer this scheme over repositories like Maven Central on principle. Gradle Beyond The Basics

 

    1. The Dapper team also needs to control the total size of data written to its central repositories, though, and thus we incorporate a second round of sampling for that purpose. Dapper a Large-Scale Distributed Systems Tracing Infrastructure

 

    1. Nexus provides many benefits, one of which is a proxy to remote Maven repositories with a local cache of all downloaded artifacts. ActiveMQ in Action

 

    1. It also offers a bridge, in the form of a dependency management definition graph, to Maven and Ivy artifact repositories without demanding remote repositories. Building and Testing with Gradle

 

 

132、单词 reducers 的匹配文本:

 

    1. It assumes that adding more workers results in an idealized linear speedup (modulo dependencies between mappers and reducers), up to the point where map activities and all reduce activities respectively run in parallel. Omega - flexible, scalable schedulers for large compute clusters

 

    1. When a job finishes, final results will reside in multiple files, depending on the number of reducers used in the job. Hadoop Operations and Cluster Management Cookbook

 

    1. The ApplicationMaster is the “head” of a job, managing all lifecycle aspects including dynamically increasing and decreasing resources consumption, managing the flow of execution (e.g., running reducers against the output of maps), handling faults and computation skew, and performing other local optimizations. Apache Hadoop YARN Yet Another Resource Negotiator

 

    1. For Reducers that first combine all the values with a given key using an associative, commutative operation, a separate userdefined Combiner function can be specified to perform partial combining of values associated with a given key during the Map phase. FlumeJava - Easy, Efficient Data-Parallel Pipelines

 

    1. This is useful when retrieving the output of a MapReduce computation with multiple reducers where each reducer produces a part of the output. Hadoop MapReduce Cookbook

 

    1. Your reducers need to be able to generate a line for each word in your input, so your map output key should be each word in the input files so that MapReduce can join them all together. Hadoop in Practice

 

    1. Under the MapReduce model, the data processing primitives are called mappers  and reducers. Hadoop in Action

 

    1. Using Hadoop Streaming, they use PHP to MapReduce over their data and output directly to Cassandra via Thrift in their reducers. Cassandra The Definitive Guide

 

    1. Hadoop provides a standard specification (that is, interface) for the map and reduce functions, and implementations of these are often referred to as mappers and reducers. Hadoop Beginners Guide

 

    1. Reducers, as we’ll see later, operate on the intermediate key-value data we output from the mapper. Hadoop Operations
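
As a concrete example of a reducer in the sense used above, this is a minimal word-count reducer written against the newer org.apache.hadoop.mapreduce API; the class name WordCountReducer is arbitrary.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sums the counts emitted by the mappers for each word key.
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));
    }
}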

 

 

133、单词 dao 的匹配文本:

 

    1. Because all the necessary steps are implemented, each calculator can implement the overloaded calculate(s: DAO, productId: String) method. Scala in Action

 

    1. Consider the case where you have a data access object (DAO) component that performs data operations against a PostgreSQL database and you want to upgrade to Oracle. Pro Spring 3

 

    1. For several use cases, it can even eliminate the need to write our own implementation of a data access object (DAO) or repository. Pro Spring MVC

 

    1. For example, if you want to ensure that a particular Data Access Object (DAO) in your application is called only by an administrator user, you can’t do that with web security alone. Pro Spring Security

 

    1. The model, DAO, and command-line implementations look similar to what you’ve seen for the users table. HBase in Action

 

    1. It depends on a favorites service to pull back a list of favorite movies and a movie DAO to fetch details about individual movies: Functional Programming Patterns in Scala and Clojure

 

    1. The relationship between a model and a DAO like Active Record should. SQL Antipatterns

 

    1. So you could create a DAO with a get method as shown here: Cassandra The Definitive Guide

 

    1. Use a dirty flag where supported by the EJB server or in a BMP or DAO implementation to avoid writing unchanged EJBs to the database. Java Performance Tuning 2nd Edition

 

    1. It should be simple to convert one to the other automatically.” This is a thought we’ve probably all had at one point or another while writing yet another Data Access Object (DAO) to convert Java Database Connectivity (JDBC) result sets into something object-oriented. Pro JPA2
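
To make the DAO pattern referenced above concrete, here is a minimal sketch of a data access interface that hides the storage technology behind plain Java methods; UserDao and User are hypothetical names, not taken from any of the quoted books. A JDBC, Cassandra, or HBase implementation could then be swapped in without touching the callers.

import java.util.List;
import java.util.Optional;

// The DAO hides the storage technology (JDBC, Cassandra, HBase, ...) behind an interface.
interface UserDao {
    Optional<User> findById(String id);
    List<User> findAll();
    void save(User user);
    void delete(String id);
}

// Minimal domain object used by the DAO above.
class User {
    final String id;
    final String name;
    User(String id, String name) { this.id = id; this.name = name; }
}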

 

 

134、单词 znode 的匹配文本:

 

    1. ZooKeeper provides commands to access or update Znode from its command-line tool; type help for more information. HBase Administration Cookbook

 

    1. This command will create a znode in ZooKeeper where the automatic failover system will store the data. Hadoop Operations and Cluster Management Cookbook

 

    1. Outputs information that ZooKeeper has about the HBase install; shows the parent znode. HBase in Action

 

    1. When the master notices that a regionserver is no longer reachable, usually because the servers’s znode has expired in ZooKeeper, it splits the dead regionserver’s commit log by region. Hadoop:The Definitive Guide 3rd Edition

 

    1. Because of the distributed nature of these events, the servers are using ZooKeeper to track specific states in a dedicated znode. HBase:The Definitive Guide

 

    1. The main differences between ZooKeeper and standard filesystems are that every znode can have data associated with it and znodes are limited to the amount of data that they can have. Apache Kafka

 

    1. ZooKeeper holds a zNode with the physical address of the Primary AvatarNode for a given cluster. Apache Hadoop Goes Realtime at Facebook

 

    1. The absence of data often conveys important information about a znode. ZooKeeper Distributed Process Coordination
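
The sketch below exercises the basic znode operations described above through the ZooKeeper Java client; the ensemble address localhost:2181, the path /config-demo, and the stored value are assumed placeholders. Production code would register a watcher and wait for the connection event before issuing requests.

import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZnodeSketch {
    public static void main(String[] args) throws Exception {
        // Connect to a ZooKeeper ensemble (placeholder address); no default watcher is registered here.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, null);
        // A znode behaves like a path in a small filesystem, but it can also carry data.
        zk.create("/config-demo", "v1".getBytes(StandardCharsets.UTF_8),
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        byte[] data = zk.getData("/config-demo", false, null);
        System.out.println(new String(data, StandardCharsets.UTF_8));
        zk.close();
    }
}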

 

 

135、单词 netty 的匹配文本:

 

    1. Under the covers, Akka uses JBoss Netty and Google Protocol Buffers libraries to make the remoting happen seamlessly. Programming Concurrency on the JVM

 

    1. Netty provides performance and scalability, leaving you free to focus on what really interests you - your unique application! Netty in Action MEAP V08

 

    1. While the sort benchmark would typically benefit from improvements to the HDFS data path, both benchmarks perform better on YARN primarily due to significant improvements in the MapReduce runtime itself: map-side sort improvements, a reduce client that pipelines and batches transfers of map output, and a server-side shuffle based on Netty [3] Apache Hadoop YARN Yet Another Resource Negotiator

 

    1. Think of the Google protocol buffer as XML but smaller and faster, and Netty as a non-blocking I/O (NIO) implementation, which allows Akka to efficiently use threads for I/O operations. Scala in Action

 

    1. MapReduce 2 uses Netty, which by default allows up to twice as many threads as there are processors.) Hadoop:The Definitive Guide 3rd Edition

 

 

136、单词 localhost 的匹配文本:

 

    1. In this case, we’ve used the localhost address and the default CouchDB port number, 5984 Getting Started with CouchDB

 

    1. By default, it tries to talk to the gmond running on localhost, but you can specify another running gmond by specifying -i. Monitoring with Ganglia

 

    1. Some common network configurations cause the replication test to fail when accessed via the localhost address. CouchDB The Definitive Guide

 

    1. Starting the application with this configuration will expose the URL http://localhost:9090/todo. Gradle in Action

 

    1. Gradle will keep running, and at the end, we see that the application is running at http://localhost:8080/web. Gradle Effective Implementation Guide

 

    1. If you’re set on using localhost, disabling IPv6 will also fix this issue. Beginning CouchDB

 

    1. If you would prefer to use cURL, first create the catalog-a database: curl -X PUT http://localhost:5984/catalog-a. Scaling CouchDB

 

    1. If you didn't install Elasticsearch on your local machine, you should be able to reach it by replacing “localhost” with the IP address of the remote machine. Elasticsearch in Action MEAP V07 cHAPTER 1

 

    1. That’s because this cluster of one node at localhost is set up for you by default. Cassandra The Definitive Guide

 

    1. Note that all the artifacts are being downloaded from the localhost instead of from a remote Maven repository. ActiveMQ in Action
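
For completeness, a tiny sketch of calling a service bound on localhost from Java; port 5984 (CouchDB's default) is only an example, and any locally running HTTP service could stand in.

import java.net.HttpURLConnection;
import java.net.URL;

public class LocalhostSketch {
    public static void main(String[] args) throws Exception {
        // 5984 is CouchDB's default port; adjust for whatever is listening locally.
        URL url = new URL("http://localhost:5984/");
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("GET");
        System.out.println("HTTP " + connection.getResponseCode());
        connection.disconnect();
    }
}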

 

 

137、单词 iteration 的匹配文本:

 

    1. Programs are expressed as a sequence of iterations, in each of which a vertex can receive messages sent in the previous iteration, send messages to other vertices, and modify its own state and that of its outgoing edges or mutate graph topology. Pregel  -  A System for Large-Scale Graph Processing

 

    1. Metrics submitted using a “positive” slope generate RRDs that are created as COUNTERs; however, this mechanism is not ideal for situations involving incrementing values that submit on each iteration (i.e., Apache httpd page serving counts without log-scraping) Monitoring with Ganglia

 

    1. We’ll build the first iteration of the example application without using show or list, because writing Ajax queries against the JSON API is a better way to teach CouchDB as a database. CouchDB The Definitive Guide

 

    1. Groovy automatically exposes an implicit variable named it to indicate the loop iteration index. Gradle in Action

 

    1. Part of this long iteration phase was due to a slow realization that Spanner should do more than tackle the problem of a globally-replicated namespace, and should also focus on database features that Bigtable was missing. Spanner - Google’s Globally-Distributed Database

 

    1. This will occur, of course, for every iteration of the loop, so each previous owner’s mileage will be added to the total. Beginning CouchDB

 

    1. For example, the most recent iteration of the eBay architecture is mostly Java, but the search engine is in C++ Cassandra The Definitive Guide

 

    1. For the third iteration, we did extensive experiments with two major styles of execution: LLVM based native code generation with row major block based intermediate data and column major vector based processing with columnar intermediate storage. Tenzing - A SQL Implementation On The MapReduce Framework

 

    1. If we know that each partition is sorted and contains the same key range, simple iteration logic can do the required matching. Hadoop Beginners Guide

 

    1. The iteration is stopped once no chunk with more rows than a given threshold, e.g., 50’000, exists. PowerDrill - Processing a Trillion Cells per Mouse Click

 

 

138、单词 pom 的匹配文本:

 

    1. The binary package contains the same Maven POM file as the source archive so you can always recompile the code even if you start with the binary distribution. Apache Flume Distributed Log Collection for Hadoop

 

    1. Even though it wasn’t part of the core Maven project, a similar approach was proposed by the project Maven Polyglot that allows you to write your build definition logic, which is the project object model (POM) file, in Groovy, Ruby, Scala, or Clojure. Gradle in Action

 

    1. Uploading to a Maven repository If we want to upload to a Maven repository, we must create a Maven POM (Project Object Model) file. Gradle Effective Implementation Guide

 

    1. If you would like the code examples to work with ElasticSearch 1.0.0Beta version, you should modify ElasticSearch version in the pom.xml file. Mastering ElasticSearch

 

    1. A pom file contains information on the project, including dependencies and plugin configuration information. Cassandra High Performance Cookbook

 

    1. Each Maven coordinate, whether it corresponds to any code, source, or JavaDocs at all, always describes an XML descriptor file called a POM (Project Object Model) Gradle Beyond The Basics

 

    1. The following is the relevant part of the Maven Project Object Model (POM): Hadoop:The Definitive Guide 3rd Edition

 

    1. Building with Maven The original authors of Cassandra apparently didn’t care much for Maven, so the early releases did not include any Maven POM file. Cassandra The Definitive Guide

 

    1. Using Maven, you’ll need to add the following dependency to the pom.xml file: ActiveMQ in Action

 

    1. The Maven POM and Gradle Build With the big picture established, let’s take a piecewise approach to comparing Maven and Gradle. Building and Testing with Gradle

 

 

139、单词 ldap 的匹配文本:

 

    1. We rely on the web server to provide authentication, so any Apache authentication system (htpasswd, LDAP, etc.) is supported. Monitoring with Ganglia

 

    1. The authentication system is pluggable, so you can integrate with existing services to authenticate users to CouchDB using an HTTP layer, using LDAP integration, or through other means. CouchDB The Definitive Guide

 

    1. Can you tie in your organization’s Lightweight Directory Access Protocol (LDAP) directory and user groups to Hadoop’s permissions structure? Professional Hadoop Solutions

 

    1. The classes themselves typically interact with the operating system, using system calls to authenticate the user via Solaris' NIS or NIS+, the Windows NT login service, an LDAP server, or whatever other authentication system is available on the platform. Java Security 2nd Edition

 

    1. The chapter also has a brief introduction to hierarchical databases (such as LDAP directories) and the JNDI API. Core Java Volume II Advanced Features 9th Edition

 

    1. A CertStore, however, may represent a large public database (in the form of an LDAP server, for examle) of untrusted certificates. Java in a Nutshell 5th Edition

 

    1. Providing Your Own Authentication You can provide your own method of authenticating to Cassandra if you want to make special requirements, such as a Kerberos ticket or encryption, or if you want to store passwords in a different location, such as an LDAP directory. Cassandra The Definitive Guide

 

    1. Consider the table Employee with fields emp id, ldap user, name, dept id, and salary. Tenzing - A SQL Implementation On The MapReduce Framework

 

    1. The KDC contains a database of principals and their keys, very much like /etc/passwd and some KDC implementations (including MIT Kerberos) support storing this data in centralized systems like LDAP. Hadoop Operations

 

    1. Fortunately, ActiveMQ comes with implementations of some modules that can authenticate users using properties files, LDAP, and SSL certificates, which will be enough for many use cases. ActiveMQ in Action
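
A minimal sketch of connecting to an LDAP directory through JNDI, the standard Java API for directory access; the provider URL ldap://localhost:389 and the base DN dc=example,dc=com are placeholder values for a real directory server.

import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

public class LdapSketch {
    public static void main(String[] args) throws Exception {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        // Server URL and base DN are placeholders for a real directory.
        env.put(Context.PROVIDER_URL, "ldap://localhost:389/dc=example,dc=com");
        DirContext ctx = new InitialDirContext(env);   // anonymous bind in this sketch
        System.out.println(ctx.getNameInNamespace());
        ctx.close();
    }
}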

 

 

140、单词 nosql 的匹配文本:

 

    1. NoSQL design space: we partition the datastore and replicate each partition separately, providing full ACID semantics within partitions, but only limited consistency guarantees across them. Megastore - Providing Scalable, Highly Available Storage for Interactive Services

 

    1. The uncompressed dataset is tens of terabytes, which is small compared to many NoSQL instances, but was large enough to cause difficulties with sharded MySQL. Spanner - Google’s Globally-Distributed Database

 

    1. Can you have a scalable database without going NoSQL?  Yes. F1 - The Fault-Tolerant Distributed RDBMS

 

    1. His background includes a wide range of programming languages (including Java and C#), search engines (including Lucene), databases, and NoSQL distributed data stores. Cassandra High Performance Cookbook

 

    1. Or, you can index data from your NoSQL data-store, to add search capabilities there. Elasticsearch in Action MEAP V07 cHAPTER 1

 

    1. It has become the de factostandard for exchanging data in the NoSQL world. Elasticsearch The Definitive Guide

 

    1. Apache Hadoop is a NoSQL applications framework that runs on distributed clusters. Hadoop in Action

 

    1. It’s a picture that makes some NoSQL solutions seem perhaps less radical and less scary than we may have thought at first, and more like a natural expression and encapsulation of some of the work that was already being done to manage very large databases. Cassandra The Definitive Guide

 

    1. He has worked with flat files, indexed files, hierarchical databases, network databases, relational databases, NoSQL databases, Hadoop, and related technologies. Hadoop Beginners Guide

 

    1. These systems can also be grouped with the NoSQL databases (MongoDB [27], to name one of many): both offer higher performance and scale better than traditional databases, but provide weaker semantics. Percolator - Large-scale Incremental Processing Using Distributed Transactions and Notifications

 

 

141、单词 udp 的匹配文本:

 

    1. Syslog has an older UDP transport as well as a newer TCP protocol that can handle data larger than a single UDP packet can transmit (about 64k), as well as deal with network-related congestion events that might require the data to be retransmitted. Apache Flume Distributed Log Collection for Hadoop

 

    1. Your operating system and network aren’t hostile to multicast and/or User Datagram Protocol (UDP) Monitoring with Ganglia

 

    1. The UDP port for reporting is referred to as PORT, and for simplicity, the multicast port will be identical to PORT. Pro Hadoop

 

    1. The second and third way allows us to send many documents using the bulk API and the UDP bulk API. Mastering ElasticSearch

 

    1. Getting ready Syslog is a simple text-based protocol designed to transfer log messages over UDP. Cassandra High Performance Cookbook

 

    1. All system control messages rely on UDP based messaging while the application related messages for replication and request routing relies on TCP. Cassandra A Decentralized Structured Storage System

 

    1. Flume also supports TCP/UDP data sources that can receive logs from syslog. Hadoop in Practice

 

    1. We were forced to send KeepAlive RPCs via UDP rather than TCP; UDP has no congestion avoidance mechanisms, so we would prefer to use UDP only when high-level timebounds must be met. Chubby - The lock service for loosely-coupled distributed systems

 

    1. Flume NG also supports a sequence generator source (mostly for testing) as well as both TCP and UDP variants of a source that reads syslogd data. Hadoop Beginners Guide

 

    1. The same applies to appropriate ActiveMQ connectors, so a UDP connector can offer some performance advantages over the TCP connector, but it’s still not often used because of the unreliability it introduces (as explained in more detail later) ActiveMQ in Action

 

 

142、单词 rpc 的匹配文本:

 

    1. This file is organized into different contexts: jvm, rpc, hdfs, mapred, and hbase. Monitoring with Ganglia

 

    1. Thrift's code generation takes the interface file as input and generates network RPC clients in many languages. Cassandra High Performance Cookbook

 

    1. What has not changed: latency of an RPC; disk drive throughput and seek latency. Storage Architecture and Challenges

 

    1. True application-level transparency, possibly our most challenging design goal, was achieved by restricting Dapper’s core tracing instrumentation to a small corpus of ubiquitous threading, control flow, and RPC library code. Dapper a Large-Scale Distributed Systems Tracing Infrastructure

 

    1. We omit details that are covered elsewhere in the literature, such as the details of a consensus protocol or an RPC system. Chubby - The lock service for loosely-coupled distributed systems

 

    1. We enhanced the Hadoop RPC software to automatically determine the version of the software running on the server that it is communicating with, and then talk the appropriate protocol while talking to that server. Apache Hadoop Goes Realtime at Facebook

 

    1. This directory includes the Thrift and Avro RPC libraries for interacting with Cassandra. Cassandra The Definitive Guide

 

    1. Loose coupling is commonly introduced into an architecture to mitigate the classic tight coupling of Remote Procedure Calls (RPC) ActiveMQ in Action

 

    1. Using a scan reduces the number of RPCs executed by the benchmark since a single RPC fetches a large sequence of values from a tablet server. Bigtable - A Distributed Storage System for Structured Data

 

    1. To save RPC overhead (at the cost of increasing transaction latency) each Percolator worker batches timestamp requests across transactions by maintaining only one pending RPC to the oracle. Percolator - Large-scale Incremental Processing Using Distributed Transactions and Notifications

 

 

143、单词 dsl 的匹配文本:

 

    1. A build tool that carries on this idea is Gant, a DSL on top of Ant written in Groovy. Gradle in Action

 

    1. Build logic is described with a powerful DSL and empowers developers to create reusable and maintainable build logic. Gradle Effective Implementation Guide

 

    1. Chapter 2, Power User Query DSL, describes how Apache Lucene scoring works, why ElasticSearch rewrites queries, and how query rescore mechanism works. Mastering ElasticSearch

 

    1. Although entire books could be written about every DSL for Hadoop, this section gives you a quick “taste” of some of them to show how this growing area of Hadoop can lower the barriers to the learning curve of Hadoop for its users. Professional Hadoop Solutions

 

    1. You can do many of the same things through the Gradle DSL as described previously, but you might prefer in some cases to drop back to direct manipulation. Gradle Beyond The Basics

 

    1. HBase-DSL This project gives you dedicated classes that help when formulating queries against an HBase cluster. HBase:The Definitive Guide

 

    1. Query DSL; Structure of a query clause; Combining multiple clauses; Queries and Filters; Most important queries and filters; Validating queries; Understanding queries (table-of-contents headings in which the term appears). Elasticsearch The Definitive Guide

 

    1. Camel can use either a Java-based domain-specific language (DSL), or Scala DSL, or an XML-based DSL to define routes. ActiveMQ in Action

 

    1. A Clojure DSL for Cascading that provides additional functionality inspired by Datalog for data processing and query abstractions. Programming Hive

 

    1. Gradle provides a way of modeling a build system using a DSL with convention over configuration, yet allowing an enterprise to define its own model. Building and Testing with Gradle

 

 

144、单词 acl 的匹配文本:

 

    1. Grant full privileges to admin users by uncommenting the following line in the kadm5.acl file: HBase Administration Cookbook

 

    1. You will also discover how to view the access control list (ACL) of a file and how to set Unix permissions on a file. Pro Java 7 NIO.2

 

    1. The default attribute defines the default policy for the entire ACL. Monitoring with Ganglia

 

    1. We will show you how to configure ACL for authorization and Kerberos for authentication, configure NameNode HA, recover from a failed NameNode, and so on. Hadoop Operations and Cluster Management Cookbook

 

    1. You can refresh the service-level authorization for the NameNode and JobTracker without restarting the Hadoop daemons using the -refreshServiceAcl command switch on the dfsadmin (for NameNode) and mradmin (for JobTracker) executables. Professional Hadoop Solutions

 

    1. The format for an ACL is a comma-separated list of usernames, followed by whitespace, followed by a comma-separated list of group names. Hadoop:The Definitive Guide 3rd Edition

 

    1. After the agent has started, modifying the ACL file has no further effect. Oracle JRockit The Definitive Guide

 

    1. Each node has various meta-data, including three names of access control lists (ACLs) used to control reading, writing and changing the ACL names for the node. Chubby - The lock service for loosely-coupled distributed systems

 

    1. Notice that the acl view is not supported and a unix and posix view are supported. Java 7 New Features Cookbook

 

    1. The queue has an access control list (ACL) associated with it that defines which users and groups may submit jobs, but also which users may administer the queue. Hadoop Operations

 

 

145、单词 optimizer 的匹配文本:

 

    1. What happens if the virtual machine loads another subclass that overrides an inlined method? Then the optimizer must undo the inlining. Core Java Volume I Fundamentals 9th Edition

 

    1. The FlumeJava optimizer transforms a user-constructed, modular FlumeJava execution plan into one that can be executed efficiently. FlumeJava - Easy, Efficient Data-Parallel Pipelines

 

    1. These analytics are running in batch and are taking several minutes already, so what does a minute or two more really matter? In some cases, the query plan optimizer in Pig or Hive will be better at optimizing your code than you are! In a small fraction of situations, the extra few minutes added by Pig or Hive will matter, in which case you should use Java MapReduce. MapReduce Design Patterns

 

    1. A default optimizer is a cost-based optimizer, but additional custom optimizers can be introduced, based on the open APIs provided by Drill. Professional Hadoop Solutions

 

    1. The compiler generates a logical plan that is then optimized through a simple rule based optimizer. Hive A Petabyte Scale Data Warehouse Using Hadoop

 

    1. Aurora’s efficiency comes from a carefully designed run-time system and a query optimizer, rather than Sawzall’s brute force parallel style. Interpreting the Data Parallel Analysis with Sawzall

 

    1. Hive currently uses a rule-based query optimizer for determining how to execute a query, but it’s likely that in the future a cost-based optimizer will be added. Hadoop:The Definitive Guide 3rd Edition

 

    1. The Tenzing optimizer applies some basic rule and cost based optimizations to create an optimal execution plan. Tenzing - A SQL Implementation On The MapReduce Framework

 

    1. Hive also provides a “hint” mechanism to tell the query optimizer which table should be streamed: Programming Hive

 

    1. Parallel databases use an optimizer tailored for distributed workloads that turn SQL commands into a query plan whose execution is divided equally among multiple nodes. HadoopDB An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads

 

 

146、单词 mappings 的匹配文本:

 

    1. The last thing is the character mappings, which is used before tokenizer and is responsible for processing text before any analysis is done. Mastering ElasticSearch

 

    1. A tablet is similar to Bigtable’s tablet abstraction, in that it implements a bag of the following mappings: Spanner - Google’s Globally-Distributed Database

 

    1. Moreover, when dealing with CSV or tab-separated data, a slightly misaligned INT or FLOAT type mapping in your Hive table declaration, where the data has a STRING, could lead to NULL mappings per row. Hadoop RealWorld Solutions Cookbook

 

    1. If we provide no artifact or Ivy mappings (as in Example 4-21), Gradle assumes that the Ivy repo uses the default Maven structure. Gradle Beyond The Basics

 

    1. Datanodes must send block reports to both namenodes because the block mappings are stored in a namenode’s memory, and not on disk. Hadoop:The Definitive Guide 3rd Edition

 

    1. Mapping; Core simple field types; Viewing the mapping; Customizing field mappings; Updating a mapping; Testing the mapping; Complex core field types; Mapping for inner objects (table-of-contents headings in which the term appears). Elasticsearch The Definitive Guide

 

    1. To resolve this problem, we chose to group name entries into batches so that a single lookup would return and cache the name mappings for a large number (typically 100) of related processes within a job. Chubby - The lock service for loosely-coupled distributed systems

 

    1. Is it certain that we’ve done anything but move the problem here? Of course, with some systems, such as those that make extensive use of document exchange, as with services or XML-based applications, there are not always clear mappings to a relational database. Cassandra The Definitive Guide

 

    1. For any Java Map object, its contents are a set of mappings from a given key of a specified type to a related value of a potentially different type. Hadoop Beginners Guide

 

    1. One of the most common types of scripts is one that uses a CSV file of machine to rack mappings. Hadoop Operations

 

 

147、单词 runnable 的匹配文本:

 

    1. Place the code for the task into the run method of a class that implements the Runnable interface. Core Java Volume I Fundamentals 9th Edition

 

    1. Because Ganglia is an open source project, it is possible to compile a runnable binary executable of the gmond agent on virtually any platform with a C compiler. Monitoring with Ganglia

 

    1. In Java, independently executable tasks must implement interface Runnable and define a run method. A Java ForkJoin Framework

 

    1. A class that we will use to periodically check a given URL address; we will develop it as a Runnable interface implementation, because we will use thread executors to run it. Mastering ElasticSearch

 

    1. In the case of parallel files upload, for every file in the directory, a new runnable class is created and is submitted to the executor. Professional Hadoop Solutions

 

    1. Also, the scheduler employs a work stealing algorithm that allows some task scheduling flexibility to prevent nodes from running idle when runnable tasks are waiting. Incoop MapReduce for Incremental Computations

 

    1. In the old API this is possible for mappers by writing a MapRunnable, but no equivalent exists for reducers. Hadoop:The Definitive Guide 3rd Edition

 

    1. You can see the count for each possible thread state, including new, runnable, blocked, and so on. HBase:The Definitive Guide

 

    1. The CPU load average in Linux is the average number of processes in a runnable state on a machine, over a rolling window of time. Hadoop Operations

 

    1. The classes that can support Runnable typically have nothing in common except that they contain a run method. Concurrent Programming in Java Design Principles and Pattern
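
A minimal example of the Runnable idiom discussed above: the task lives in the Runnable, and the Thread is only the vehicle that executes it. The thread name worker-1 is arbitrary.

public class RunnableSketch {
    public static void main(String[] args) throws InterruptedException {
        // The unit of work is the Runnable, not the Thread.
        Runnable task = () -> System.out.println("running in " + Thread.currentThread().getName());
        Thread worker = new Thread(task, "worker-1");
        worker.start();
        worker.join();   // wait for the task to finish
    }
}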

 

 

148、单词 init 的匹配文本:

 

    1. If not, you can always start or stop using the init scripts. Getting Started with CouchDB

 

    1. The metric_init callback function is the first of three functions that must be defined and implemented in every gmond metric module. Monitoring with Ganglia

 

    1. SysV/BSD-Style Systems Depending on your operating system, the couchdb daemon script could be installed into a directory called init.d (for SysV-style systems) or rc.d (for BSD-style systems) under. CouchDB The Definitive Guide

 

    1. Target named init that used task mkdir to create directory build. Gradle in Action

 

    1. We create a new Git repository in the current project directory, with the init command in Git: Gradle Effective Implementation Guide

 

    1. As a result, you do not have to run a list of targets sequentially; the dist target will run its dependents such as compile and init and their dependencies in proper order. Cassandra High Performance Cookbook

 

    1. Finally, you'll have your init script as a symbolic link to the “elasticsearch” service wrapper script: Elasticsearch in Action MEAP V07 cHAPTER 1

 

    1. You’ll need sudo access for your user to run this command (it launches the Hadoop services via init.d scripts): Hadoop in Practice

 

    1. You can run it in a command prompt by executing ec2-init.bat. Hadoop in Action

 

    1. The server starts up with a call to the initServer method of this class, upon which the server registers the SEDA verb handlers, makes some determinations about its state (such as whether it was bootstrapped or not, and what its partitioner is), and registers itself as an MBean with the JMX server. Cassandra The Definitive Guide

 

 

149、单词 multicast 的匹配文本:

 

    1. Look at each instance of gmond within the same network broadcast domain (since reachability is based on multicast packets), and find the udp_recv_channel block in gmond.conf. Apache Flume Distributed Log Collection for Hadoop

 

    1. Instead of the default UDP multicast communication method we use between Gmonds, we have configured it to use unicast messages. HBase Administration Cookbook

 

    1. Ganglia’s use of a lightweight multicast listen/announce protocol was influenced by Brent Chun’s early work building a scalable execution environment for clusters. Monitoring with Ganglia

 

    1. The UDP port for reporting is referred to as PORT, and for simplicity, the multicast port will be identical to PORT. Pro Hadoop

 

    1. Hadoop supports network communication through both unicast (with normal IP addresses, which is the one we use here) and multicast, which uses multicast addresses, such as 239.2.11.71 Hadoop Operations and Cluster Management Cookbook

 

    1. The bootstrap process: when the ElasticSearch node starts, it uses multicast (or unicast, if configured) to find the other nodes in the same cluster (the key here is the cluster name defined in the configuration) and connect to them. Mastering ElasticSearch

 

    1. It actively monitors the host for changes, which it will announce using uni- or multicast network messages. HBase:The Definitive Guide

 

    1. More sophisticated distributed object systems (such as JINI) allow for “multicast” lookup of remote objects that might be on a number of different servers. Core Java Volume II Advanced Features 9th Edition

 

    1. It uses Kestrel as a loosely coupled, reliable message queue that picks a server at random to write to, so there is no cross-server communication (no clustering, no multicast, etc.) Cassandra The Definitive Guide

 

    1. Dynamic networks using such protocols as multicast and discovery will also be covered. ActiveMQ in Action
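
To illustrate multicast from Java, the sketch below joins a multicast group and waits for a single packet. The group address 239.2.11.71 echoes the example quoted above; the port 8649 is an assumed value.

import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.MulticastSocket;
import java.nio.charset.StandardCharsets;

public class MulticastSketch {
    public static void main(String[] args) throws Exception {
        InetAddress group = InetAddress.getByName("239.2.11.71");   // example multicast group
        try (MulticastSocket socket = new MulticastSocket(8649)) {  // assumed port
            socket.joinGroup(group);                     // start listening to the group
            byte[] buffer = new byte[1024];
            DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
            socket.receive(packet);                      // blocks until any group member sends
            System.out.println(new String(packet.getData(), 0, packet.getLength(), StandardCharsets.UTF_8));
            socket.leaveGroup(group);
        }
    }
}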

 

 

150、单词 singleton 的匹配文本:

 

    1. After type erasure there is only one Singleton class, and only one singleInstance field. Core Java Volume I Fundamentals 9th Edition

 

    1. Implementing the JSONRiverModule class The JSONRiverModule class is responsible for binding the river class and telling ElasticSearch that it should be a singleton. Mastering ElasticSearch

 

    1. In many scenarios, it is often cleaner and more readable than using blank placeholder values or static singleton instances for output. Hadoop RealWorld Solutions Cookbook

 

    1. Another difference with String is that Text is mutable (like all Writable implementations in Hadoop, except NullWritable, which is a singleton) Hadoop:The Definitive Guide 3rd Edition

 

    1. Remember to add a readResolve method to all typesafe enumerations in your legacy code and to all classes that follow the singleton design pattern. Core Java Volume II Advanced Features 9th Edition

 

    1. When get is called, we could always return the same instance (singleton) or a new instance with each invocation. Getting Started with Google Guava

 

    1. In the case of a singleton, getInstance takes no parameters and returns the sole instance. Effective Java 2nd Edition

 

    1. AntiEntropyService implements the Singleton pattern and defines the static Differencer class as well, which is used to compare two trees; if it finds any differences, it launches a repair for the ranges that don’t agree. Cassandra The Definitive Guide

 

    1. NullWritable: This is a singleton object representation of a null value. Hadoop Beginners Guide

 

    1. As described in the Design Patterns book, a Singleton class intentionally supports only one instance. Concurrent Programming in Java Design Principles and Pattern
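
For reference, a compact enum-based singleton in Java, an approach popularized by Effective Java: the JVM guarantees a single instance, even across serialization. The class name and field are illustrative only.

// Enum-based singleton: exactly one INSTANCE exists per JVM.
public enum Configuration {
    INSTANCE;

    private String clusterName = "test-cluster";   // illustrative state

    public String getClusterName() { return clusterName; }
    public void setClusterName(String clusterName) { this.clusterName = clusterName; }
}

// Usage: Configuration.INSTANCE.getClusterName();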

 

 

151、单词 optionally 的匹配文本:

 

    1. Remember if you increase this value you may also have to increase your Java heap space using the -Xmx and optionally the -Xms parameters. Apache Flume Distributed Log Collection for Hadoop

 

    1. CouchDB can write the JSON document directly, simplifying the writing/update process, while allowing you to optionally enforce a structure on your JSON documents within the database itself if you need it. Getting Started with CouchDB

 

    1. The addresses may be specified as IP addresses or DNS hostnames and may optionally be suffixed by a colon followed by the port number where the gmond tcp_accept_channel is to be found. Monitoring with Ganglia

 

    1. An MSCR operation has M input channels (each performing a map operation) and R output channels (each optionally performing a shuffle, an optional combine, and a reduce) FlumeJava - Easy, Efficient Data-Parallel Pipelines

 

    1. Each item in this array represents an individual view, and each of these views contains a map and, optionally, a reduce function. Beginning CouchDB

 

    1. Optionally, you could have checked “Continuous” to trigger continuous replication. Scaling CouchDB

 

    1. Optionally, the original values may also be stored unchanged so that they can be retrieved later. Elasticsearch The Definitive Guide

 

    1. Optionally, the client may provide a content generation number to allow the client to simulate compare-and-swap on a file; the contents are changed only if the generation number is current. Chubby - The lock service for loosely-coupled distributed systems

 

    1. For information on how to obtain (and optionally build) the NMS project, please refer to the NMS project site. ActiveMQ in Action

 

    1. Optionally, an SSTable can be completely mapped into memory, which allows us to perform lookups and scans without touching disk. Bigtable - A Distributed Storage System for Structured Data

 

 

152、单词 failover 的匹配文本:

 

    1. Failover options are also covered to create a more robust data pipeline. Apache Flume Distributed Log Collection for Hadoop

 

    1. The master can support fast ACID transactions but risks downtime or data loss during failover to a slave. Megastore - Providing Scalable, Highly Available Storage for Interactive Services

 

    1. In this way, every node in a cluster is capable of acting as the primary node or reporting node for the gmetad aggregator in case of a failover situation. Monitoring with Ganglia

 

    1. Then we’ll cover how to add redundant hot-failover twin nodes, so you don’t have to worry about losing machines. CouchDB The Definitive Guide

 

    1. You will now need to set up a failover system so that in the event of the primary server crashing, requests are redirected to the secondary server. Beginning CouchDB

 

    1. This setup typically involves having two or more load balancers sharing the same IP address, with one configured as a failover. Scaling CouchDB

 

    1. A cluster is a group of nodes with the same cluster.name that are working together to share data and to provide failover and scale, although a single node can form a cluster all by itself. Elasticsearch The Definitive Guide

 

    1. A new process, the FailOver Controller (ZKFC), is responsible for monitoring and failover. Namenode High Availability

 

    1. Now you have the problem of data replication and consistency during regular usage and in failover scenarios. Cassandra The Definitive Guide

 

    1. It discusses ActiveMQ URIs and all the transport connectors for both client-to-broker and broker-to-broker communications including TCP, NIO, STOMP, failover, SSL, HTTP/S, and much more. ActiveMQ in Action

 

 

153、单词 parameterized 的匹配文本:

 

    1. However, we expect that most readers will use this knowledge primarily for help with troubleshooting and to satisfy their curiosity about the inner workings of the parameterized collection classes. Core Java Volume I Fundamentals 9th Edition

 

    1. The Parameterized Trigger plugin extends the functionality of chaining build jobs with the ability to declare parameters for the triggered job. Gradle in Action

 

    1. Instantiate the MapDriver class parameterized exactly as the mapper under test. Professional Hadoop Solutions

 

    1. Then we set the parameterized Writable types for the key-value pairs out of the mapper. Hadoop RealWorld Solutions Cookbook

 

    1. The Plugin interface is type-parameterized because plug-ins can theoretically be applied to any kind of Gradle object. Gradle Beyond The Basics

 

    1. We are in the process of deploying an adaptive sampling scheme that is parameterized not by a uniform sampling probability, but by a desired rate of sampled traces per unit time. Dapper a Large-Scale Distributed Systems Tracing Infrastructure

 

    1. The context objects are used for emitting key-value pairs, and so they are parameterized by the output types so that the signature of the write() method is: Hadoop:The Definitive Guide 3rd Edition

 

    1. This will change the how the job is parameterized in the future. HBase:The Definitive Guide

 

    1. When you write a reusable script, it’s generally parameterized such that you can vary its operation for each run. Hadoop in Action

 

    1. The get command is using the parameterized execute method, as are other sibling commands for insert and delete (not shown in the example) Cassandra The Definitive Guide
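
结合上面的例句，下面用一小段 Java 代码（仅作示意）说明“参数化”类型的含义：集合被参数化为特定的元素类型后，类型错误在编译期即可被发现：

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class ParameterizedDemo {
        public static void main(String[] args) {
            // List 被参数化为只能存放 String
            List<String> names = new ArrayList<>();
            names.add("cassandra");
            // names.add(42);        // 编译错误：类型不匹配
            Map<String, Integer> wordCount = new HashMap<>();
            wordCount.put("hadoop", 3);
            System.out.println(names + " " + wordCount);
        }
    }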

 

 

154、单词 serialized 的匹配文本:

 

    1. As soon as UDP packets on port 1234 are received, gmond-debug will attempt to decode it and print a serialized version of the information contained therein. Monitoring with Ganglia

 

    1. Updates are serialized, so CouchDB will never attempt to write documents faster than your disk can spin, and it also means that two mutually conflicting writes can’t be written at the same time. CouchDB The Definitive Guide

 

    1. Internally, super columns must be completely serialized and de-serialized to be accessed. Cassandra High Performance Cookbook

 

    1. As the columns for a given key are being serialized and written out to disk we generate indices at every 256K chunk boundary. Cassandra A Decentralized Structured Storage System

 

    1. But when you’re working with big data volumes you’ll need serialization formats with compact serialized forms that natively support partitioning and have schema evolution features. Hadoop in Practice

 

    1. When an object has been serialized into JSON it is. Elasticsearch The Definitive Guide

 

    1. In the standard Mapper model , the output key/value pairs are serialized and written to disk,1 prepared to be shuffled. Hadoop in Action

 

    1. Message streaming is Cassandra’s optimized way of sending sections of SSTable files from one node to another; all other communication between nodes occurs via serialized messages. Cassandra The Definitive Guide

 

    1. In some cases, we also use a sorted disk based serialized implementation for the bigger tables to conserve memory. Tenzing - A SQL Implementation On The MapReduce Framework

 

    1. Messages must be serialized in and out of a byte sequence to be sent over the wire using what’s known as a wire protocol. ActiveMQ in Action
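
下面给出一段最小化的 Java 序列化示例（仅作示意，类名为假设），演示把对象序列化成字节序列再反序列化回来，这正是例句中 serialized 一词描述的过程：

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;
    import java.io.Serializable;

    public class SerializedDemo {
        static class Message implements Serializable {
            private static final long serialVersionUID = 1L;
            String body = "hello";
        }

        public static void main(String[] args) throws Exception {
            // 序列化：对象 -> 字节序列
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(new Message());
            }
            // 反序列化：字节序列 -> 对象
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                Message m = (Message) in.readObject();
                System.out.println(m.body);
            }
        }
    }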

 

 

155、单词 combiner 的匹配文本:

 

    1. To enable this optimization the user subclasses the Combiner class, overriding a virtual Combine() method. Pregel  -  A System for Large-Scale Graph Processing

 

    1. In this job configuration file, we specified the job name, the mapper class, the reducer class, the combiner class, the input format, and output format for the job. Hadoop Operations and Cluster Management Cookbook

 

    1. We allow the user to specify an optional Combiner function that does partial merging of this data before it is sent over the network. MapReduce - Simplied Data Processing on Large Clusters

 

    1. For Reducers that first combine all the values with a given key using an associative, commutative operation, a separate userdefined Combiner function can be specified to perform partial combining of values associated with a given key during the Map phase. FlumeJava - Easy, Efficient Data-Parallel Pipelines

 

    1. We will demonstrate the application of a combiner to optimize data transfer overhead between the map and reduce stages. Hadoop RealWorld Solutions Cookbook

 

    1. To optimize such scenarios, Hadoop supports a special function called combiner. Hadoop MapReduce Cookbook

 

    1. Summary The simplest ways to cut down on shuffle and sort times is to use a combiner, and to compress your map outputs. Hadoop in Practice

 

    1. Why do we need to sum the count? We’ve chosen this route because it will be easier for us later if we choose to add a combiner  to enhance performance. Hadoop in Action

 

    1. Apart from the combiner…maybe There is one additional, and optional, step that we omitted previously. Hadoop Beginners Guide

 

    1. Unlike Hadoop’s Combiner, Hive serializes partial aggregates into strings instead of maintaining them in their natural binary representation. HadoopDB An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads
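
下面以经典的单词计数作业为例（代码为示意草稿，输入/输出路径由命令行参数给出），演示例句中提到的在 Hadoop 作业里指定 combiner，让 map 端先做局部求和以减少 shuffle 的数据量；由于求和满足结合律和交换律，这里可以直接复用 reducer 作为 combiner：

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCountWithCombiner {
        public static class TokenizerMapper
                extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            protected void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        public static class IntSumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCountWithCombiner.class);
            job.setMapperClass(TokenizerMapper.class);
            // combiner 在 map 端对同一个 key 的计数先做局部求和，减少 shuffle 传输的数据量
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }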

 

 

156、单词 generics 的匹配文本:

 

    1. This sounds scary, but if you think about it, the behavior is simply as it was before generics were added to Java. Core Java Volume I Fundamentals 9th Edition

 

    1. Higher level clients such as Hector bring more abstraction and take more advantage of language features such as Java's generics. Cassandra High Performance Cookbook

 

    1. Because the entire reason to have generics is to generate compiler. Data Structures and Algorithm Analysis in Java 3rd Edition

 

    1. After all, why can’t the types be determined from a combination of the mapper and the reducer? The answer has to do with a limitation in Java generics: type erasure means that the type information isn’t always present at runtime, so Hadoop has to be given it explicitly. Hadoop:The Definitive Guide 3rd Edition

 

    1. Here we are creating ConcurrentMap with String keys and Book objects for the values (specified by the generics on the ConcurrentMap declaration) Getting Started with Google Guava

 

    1. The most significant set of changes was the addition of generics, enum types, annotations, autoboxing, and the for-each loop in Java 5 Effective Java 2nd Edition

 

    1. When generics were introduced in Java 1.5, it became easier to write code to address a number of similar problems. Java 7 New Features Cookbook

 

    1. Although the use of Java generics can make this look a little opaque at first, there is actually not that much going on. Hadoop Beginners Guide

 

    1. Internally, Hadoop uses many of the features introduced with Java 6, such as generics and concurrency utilities. Hadoop Operations

 

    1. Note that Java syntax conventions for generics are followed for the collection types. Programming Hive
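
下面用一小段 Java 代码（仅作示意）说明泛型的基本用法，以及例句中提到的类型擦除带来的限制：

    public class GenericsDemo {
        // 泛型方法：返回数组中的第一个元素，类型参数 T 由调用处推断
        static <T> T first(T[] items) {
            return items[0];
        }

        public static void main(String[] args) {
            String s = first(new String[]{"hadoop", "hbase"});
            Integer i = first(new Integer[]{1, 2, 3});
            System.out.println(s + " " + i);
            // 由于类型擦除，运行期无法区分 List<String> 和 List<Integer>，
            // 这也是例句中 Hadoop 需要显式指定输出类型的原因之一
        }
    }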

 

 

157、单词 compaction 的匹配文本:

 

    1. Status Displays a list of the running background tasks on the server, including view index building, compaction, and replication. Getting Started with CouchDB

 

    1. This is the number of Stores in the RegionServer that have been targeted for compaction. Monitoring with Ganglia

 

    1. Cassandra has numerous counters and the ability to trigger operations such as clearing the Key Cache or disabling compaction over JMX. Cassandra High Performance Cookbook

 

    1. This process is very similar to the compaction process that happens in the Bigtable system. Cassandra A Decentralized Structured Storage System

 

    1. It guides you through the process of creating a database and working with data; covers more complex topics such as views and map/reduce; and explores advanced database concepts such as replication, compaction, and deployment. Beginning CouchDB

 

    1. Run database compaction, view compaction, and view cleanup when the database is not under heavy load. Scaling CouchDB

 

    1. Read performance is correlated with the number of files in a region and thus critically hinges on a well-tuned compaction algorithm. Apache Hadoop Goes Realtime at Facebook

 

    1. During a major compaction, the server initiates a TreeRequest/ TreeReponse conversation to exchange Merkle trees with neighboring nodes. Cassandra The Definitive Guide

 

    1. HBase regularly rewrites its data files (during a major compaction), which “fixes” block distribution and data locality anyway. Hadoop Operations

 

    1. This minor compaction process has two goals: it shrinks the memory usage of the tablet server, and it reduces the amount of data that has to be read from the commit log during recovery if this server dies. Bigtable - A Distributed Storage System for Structured Data

 

 

158、单词 datasets 的匹配文本:

 

    1. From a practical point of view, B-trees, therefore, guarantee an access time of less than 10 ms even for extremely large datasets. CouchDB The Definitive Guide

 

    1. This paper describes a system called Dremel1 that supports interactive analysis of very large datasets over shared clusters of commodity machines. Dremel - Interactive Analysis of WebScale Datasets

 

    1. As HDFS scaled, more compute clusters could be allocated on top of it, creating a virtuous cycle of increased user density over more datasets, leading to new insights. Apache Hadoop YARN Yet Another Resource Negotiator

 

    1. With its distributed storage and compute capabilities, Hadoop is fundamentally an enabling technology for working with huge datasets. Hadoop in Practice

 

    1. These use cases would have workloads that are challenging to scale because of very high write throughput, massive datasets, unpredictable growth, or other patterns that may be difficult or suboptimal in a sharded RDBMS environment. Apache Hadoop Goes Realtime at Facebook

 

    1. Before explaining how to use the contrib package, let’s go through all the major steps in a repartitioned sort-merge join of our toy datasets. Hadoop in Action

 

    1. In addition, Tenzing has metadata discovery mechanisms built-in to simplify importing datasets into Tenzing. Tenzing - A SQL Implementation On The MapReduce Framework

 

    1. This data locality optimization is a huge reason behind Hadoop's ability to efficiently process such large datasets. Hadoop Beginners Guide

 

    1. Highly tuned and performant systems have evolved that provide users with the possibility of answering ad hoc queries over large datasets in an interactive manner. PowerDrill - Processing a Trillion Cells per Mouse Click

 

    1. This trend is beginning to reverse itself, however, as the enormous datasets like those Percolator is intended to process become far too large for a single shared-memory machine to handle. Percolator - Large-scale Incremental Processing Using Distributed Transactions and Notifications

 

 

159、单词 daemons 的匹配文本:

 

    1. It consisted of a federation of worker daemons (agents) configured from a centralized master (or masters) via Zookeeper (a federated configuration and coordination system) Apache Flume Distributed Log Collection for Hadoop

 

    1. As depicted in Figure 1-1, Ganglia is architecturally composed of three daemons: gmond, gmetad, and gweb. Monitoring with Ganglia

 

    1. Mac OS X has a service management framework called launchd that can be used to start, stop, or manage system daemons. CouchDB The Definitive Guide

 

    1. Along the way, we’ll touch on topics like compiler daemons and joint compilation. Gradle in Action

 

    1. For example, for a small to medium-sized cluster with up to a hundred slave nodes, the NameNode, JobTracker, and SecondaryNameNode daemons can be put on the same master machine. Hadoop Operations and Cluster Management Cookbook

 

    1. Daemons apply a variant of Marzullo’s algorithm [27] to detect and reject liars, and synchronize the local machine clocks to the nonliars. Spanner - Google’s Globally-Distributed Database

 

    1. For users seeking real-time data, the Dapper user interface is capable of communicating directly with Dapper daemons on each production machine. Dapper a Large-Scale Distributed Systems Tracing Infrastructure

 

    1. A reasonable question may be, why not split the Hadoop daemons onto separate hosts? If you were to do this, you would lose out on data locality (the ability to read from local disk), which is a key distributed system property of. Hadoop in Practice

 

    1. These daemons have specific roles; some exist only on one server, some exist across multiple servers. Hadoop in Action

 

    1. Daemons There are three daemons that make up a standard HDFS cluster, each of which serves a distinct role, shown in Table 2-1 Hadoop Operations

 

 

160、单词 foo 的匹配文本:

 

    1. Here is how you might send a file named foo.log into the Flume agent we previously configured: Apache Flume Distributed Log Collection for Hadoop

 

    1. If there are parameters in the query string (in this case corresponding to ?foo=bar), they will be parsed and available as a JSON object at req.query. CouchDB The Definitive Guide

 

    1. Loading sample data for the following keywords: foo, bar, baz. Cassandra High Performance Cookbook

 

    1. The command reads the /data/foo.txt file and prints it to the screen, just like the cat command in Unix system. Hadoop MapReduce Cookbook

 

    1. If application owners choose to augment the trace with their own annotations (like the “foo” annotation in the figure), these are also recorded with the rest of the span data. Dapper a Large-Scale Distributed Systems Tracing Infrastructure

 

    1. This will copy the /foo directory (and its contents) from the first cluster to the /bar directory on the second cluster, so the second cluster ends up with the directory structure /bar/foo. Hadoop:The Definitive Guide 3rd Edition

 

    1. The exact value "Foo" is not the same as the exact value "foo" Elasticsearch The Definitive Guide

 

    1. The second component (foo) is the name of a Chubby cell; it is resolved to one or more Chubby servers via DNS lookup. Chubby - The lock service for loosely-coupled distributed systems

 

    1. Users of HDFS are, in many cases, unaware of the block boundaries of files, so the fact that block two of file foo was accessed last week is of little value. Hadoop Operations

 

    1. Now you can send some messages to the TEST.FOO destination. ActiveMQ in Action

 

 

161、单词 filename 的匹配文本:

 

    1. Path and filename Each time Flume starts a new file at hdfs.path in HDFS to write data into, the filename is composed of hdfs.filePrefix, a period character, the epoch timestamp the file was started, and optionally, a file suffix specified by the hdfs.fileSuffix property (if set) Apache Flume Distributed Log Collection for Hadoop

 

    1. Until the file is saved with a filename, you can’t edit the contents, only change the contents of the unsaved document. Getting Started with CouchDB

 

    1. The filename must contain _report.json in it to be considered by the web UI. Monitoring with Ganglia

 

    1. After assembling the JAR file, you’ll notice that the version number has been added to the JAR filename. Gradle in Action

 

    1. Next, we define a new file collection with a filter that contains all files that have the filename extension .txt. Gradle Effective Implementation Guide

 

    1. You can delete attachments using the small delete icon to the left of the attachment filename. Beginning CouchDB

 

    1. To rename files using regular expressions, we can simply provide a source regex and a destination filename. Gradle Beyond The Basics

 

    1. If you want to change the location of the logs directory, just find the following entry in the same log4j.properties file and choose a different filename: Cassandra The Definitive Guide

 

    1. BrokerXmlConfig is the filename which is assumed to be on the classpath unless a URL is specified. ActiveMQ in Action

 

    1. Notice not only that the Java code has been compiled and its class file placed in a directory we didn’t have to configure, but also that test report files have been generated (ready to receive actual unit test results when we write tests later), and a JAR has been built using the project directory name as the filename. Building and Testing with Gradle

 

 

162、单词 scripting 的匹配文本:

 

    1. The advantages and disadvantages of each are, for the most part, the same advantages and disadvantages of the C/C++ languages versus the Python scripting language. Monitoring with Ganglia

 

    1. We think this is a viable model for many document-oriented applications, and it makes a great way to teach CouchDB, because we can easily assume that all of you have a browser at hand without having to ensure that you’re familiar with a particular server-side scripting language. CouchDB The Definitive Guide

 

    1. Conditional modifications using scripting Sometimes it is convenient to add some additional logic when modifying a document and that's why ElasticSearch allows us to use scripting along with the update API. Mastering ElasticSearch

 

    1. The next recipe, Scripting a multiple instance installation does this process with a single script. Cassandra High Performance Cookbook

 

    1. He also maintains a blog on his web site at http://www.joelennon.ie, where he has often posted tutorials about topics ranging from Oracle database development to Windows batch scripting. Beginning CouchDB

 

    1. The query language, Sawzall, operates at about the level of a type-safe scripting language. Interpreting the Data Parallel Analysis with Sawzall

 

    1. Scripting – when you need more (TODO): Scripting in Elasticsearch; default scripting language mvel; other scripting languages; script fields; Query DSL sorting; Scripting and facets. Elasticsearch The Definitive Guide

 

    1. Building from Source Cassandra uses Apache Ant for its build scripting language and the Ivy plug-in for dependency management. Cassandra The Definitive Guide

 

    1. The CLI provides more power such as complex scripting and is used mostly by power users. Tenzing - A SQL Implementation On The MapReduce Framework

 

    1. We’ll start by exploring the STOMP (Streaming Text Orientated Messaging Protocol) protocol, which due to its simplicity plays an important role in messaging for scripting languages. ActiveMQ in Action

 

 

163、单词 mac 的匹配文本:

 

    1. You can also find a ready-to-use installation, CouchDBX, that does not require the command-line process of HomeBrew or MacPorts. Getting Started with CouchDB

 

    1. Refer to the following instructions, which work for the latest Mac OS X Lion. Monitoring with Ganglia

 

    1. Homebrew Homebrew is a recent addition to the software management tools on Mac OS X. CouchDB The Definitive Guide

 

    1. Because he didn’t go through the motions of initially setting up Gradle, he’s also concerned about potential differences between setting up Gradle on his Windows machine versus installing it on a Mac. Gradle in Action

 

    1. On a Microsoft Windows operating system, the semicolon (;) is used as a path separator, and in Linux or Mac OS X operating systems, the colon (:) is used. Gradle Effective Implementation Guide

 

    1. Next, I provide you with a step-by-step guide to installing CouchDB on various distributions of the Linux operating system as well as on a Mac, before guiding you through the creation of your first database. Beginning CouchDB

 

    1. In fact, MacBook Pro seems to be the laptop of choice among Hadoop developers, as they’re ubiquitous in Hadoop conferences and user group meetings. Hadoop in Action

 

    1. The scheme Cassandra uses is a version one UUID, which means that it is generated based on conflating the computer’s MAC address and the number of 100-nanosecond intervals since the beginning of the Gregorian calendar. Cassandra The Definitive Guide

 

    1. Windows and Mac OS X are popular choices for developers. Hadoop Beginners Guide

 

    1. This client runs on Mac OS X and speaks many different protocols, including XMPP. ActiveMQ in Action

 

 

164、单词 browsers 的匹配文本:

 

    1. It is intended for any mobile browsers supported by the jQueryMobile toolkit. Monitoring with Ganglia

 

    1. Since it is extremely useful to be able to test CouchDB from a browser, CouchDB sends a text/plain content type, so all browsers will display the JSON as text. CouchDB The Definitive Guide

 

    1. You need a tool that automates bringing up the browser, manipulates the data object model (DOM) of the web page, and supports running these tests against different browsers. Gradle in Action

 

    1. Some modern web browsers include support for native JSON encoding and decoding. Beginning CouchDB

 

    1. If that application is a web application, then it has clients connecting to it through web browsers. Scaling CouchDB

 

    1. This server is important both for easing the transition from DNS names to Chubby names, and to accommodate existing applications that cannot be converted easily, such as browsers. Chubby - The lock service for loosely-coupled distributed systems

 

    1. Note that the meta headers are intended for consumption by browsers and crawlers and not visible when the page is rendered. Hadoop in Action

 

    1. The Web UI, with easier-to-use features such as query & table browsers and syntax highlighting, is geared towards novice and intermediate users. Tenzing - A SQL Implementation On The MapReduce Framework

 

    1. This kind of URI is used by web browsers to identify websites. ActiveMQ in Action

 

    1. Geb and EasyB Geb can be thought of as the furthest reaches of test automation on the JVM with Groovy-based control of web browsers for testing what is primarily thought of as a task for Selenium/WebDriver. Building and Testing with Gradle

 

 

165、单词 ec2 的匹配文本:

 

    1. This issue is also relevant when monitoring dynamic environments such as cloud resources (for example, Amazon EC2) Monitoring with Ganglia

 

    1. We will explain steps to register, connect, and start VM instances on EC2 Hadoop Operations and Cluster Management Cookbook

 

    1. Another class of related technologies comes from the world of cloud infrastructures such as EC2, Azure, Eucalyptus and VMWare offerings. Apache Hadoop YARN Yet Another Resource Negotiator

 

    1. Amazon EC2 discovery The Amazon store, in addition to selling goods has a few popular services as selling storage or computing power in the pay-as-you-go model. Mastering ElasticSearch

 

    1. This chapter discusses how to use Amazon Elastic MapReduce (EMR), the hosted Hadoop infrastructure, to execute traditional MapReduce computations as well as Pig and Hive computations on the Amazon EC2 cloud infrastructure. Hadoop MapReduce Cookbook

 

    1. In this section, we look at running Hadoop on Amazon EC2, which is a great way to try out your own Hadoop cluster on a low-commitment trial basis. Hadoop:The Definitive Guide 3rd Edition

 

    1. The EC2 service provides compute capacity for running Hadoop nodes. Hadoop in Action

 

    1. Since we also explore Amazon Web Services in this book, you can run all the examples on EC2 instances, and we will look at some other more Hadoop-specific uses of AWS throughout the book. Hadoop Beginners Guide

 

    1. At the time of this writing, Amazon Web Services, a popular infrastructure as a service cloud provider, had a large power outage that impacted a large swath of EC2 users. Hadoop Operations

 

    1. All experiments we run in this paper are on Amazon’s EC2 cloud offering; however our techniques are applicable to non-virtualized cluster computing grid deployments as well. HadoopDB An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads

 

 

166、单词 lambda 的匹配文本:

 

    1. The second argument is lambda and it controls a feature of the factorizer called regularization. Mahout in Action

 

    1. The distribution property (which can take the value ll or spl) • The lambda property (which can take the value df or tff) Mastering ElasticSearch

 

    1. By the end of this first part, you’ll have a full understanding of what lambda expressions are, and you’ll be able to write code that’s both concise and flexible enough to easily adapt to changing requirements. Java 8 in Action Lambdas, Streams and Functional-style Programming

 

    1. Maybe on a forum for programming language enthusiasts such as Lambda the Ultimate, or maybe in more pedestrian quarters: Reddit, or the like. Programming in Scala 2nd Edition

 

    1. Also often called anonymous functions, function literals, lambda functions, or just lambdas. Functional Programming in Scala

 

    1. We’ve got a list of people who asked to be notified when our new restaurant, The Lambda Bar and Grille, opens, and we’d like to send them an invitation to a grand-opening party. Functional Programming Patterns in Scala and Clojure

 

    1. This is similar to a lambda and like constructs used in many other languages. Programming Scala

 

    1. If this is the kind of discussion that is interesting to you, please check out Lambda the Ultimate at. Beginning Scala

 

    1. Item 21: Use function objects to represent strategies Some languages support function pointers, delegates, lambda expressions, or similar facilities that allow programs to store and transmit the ability to invoke a particular function. Effective Java 2nd Edition

 

    1. A lambda expression is a block of code with parameters. Java SE 8 for the Really Impatient
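
下面是一段 Java 8 lambda 表达式的小示例（仅作示意），对应例句中“带参数的代码块”这一说法：

    import java.util.Arrays;
    import java.util.List;

    public class LambdaDemo {
        public static void main(String[] args) {
            List<String> words = Arrays.asList("hadoop", "hbase", "hive");
            // lambda 表达式作为行为参数传给 forEach
            words.forEach(w -> System.out.println(w.toUpperCase()));
            // 与等价的匿名内部类相比要简洁得多
            Runnable task = () -> System.out.println("done");
            task.run();
        }
    }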

 

 

167、单词 grained 的匹配文本:

 

    1. CGMgraph [8] is similar in concept, providing a number of parallel graph algorithms using the Coarse Grained Multicomputer (CGM) model based on MPI. Pregel  -  A System for Large-Scale Graph Processing

 

    1. In the first, coarse-grained conflict detection, a scheduler’s placement choice would be rejected if any changes had been made to the target machine since the local copy of cell state was synchronized at the beginning of the transaction. Omega - flexible, scalable schedulers for large compute clusters

 

    1. When a read arrives, it only needs to be checked against the fine-grained safe time for key ranges with which the read conflicts. Spanner - Google’s Globally-Distributed Database

 

    1. This will allow scalable access to small amounts of data, and may facilitate fine-grained OSD load balancing using a mechanism similar to D-SPTF [15] Ceph A Scalable, High-Performance Distributed File System

 

    1. Since the data flow from the input records to the Sawzall program is so well structured, it was easy to adapt it to provide fine-grained access control to individual fields within records. Interpreting the Data Parallel Analysis with Sawzall

 

    1. This facilitates fine-grained balancing in the presence of transient load imbalance, much like D-SPTF [16] RADOS A Scalable, Reliable Storage Service for Petabyte-scale Storage Clusters.

 

    1. By querying individual fields, you have more flexibility, power and fine-grained control over which results are considered to be most relevant. Elasticsearch The Definitive Guide

 

    1. Abstract We describe our experiences with the Chubby lock service, which is intended to provide coarse-grained locking as well as reliable (though low-volume) storage for a loosely-coupled distributed system. Chubby - The lock service for loosely-coupled distributed systems

 

    1. You should now be ready to perform routine maintenance, understand your Cassandra cluster better, and know how to perform some general and fine-grained tuning tasks to keep Cassandra healthy. Cassandra The Definitive Guide

 

    1. We’ll cover the authorization plug-in, which in conjunction with the authentication plug-ins allows us to define a fine-grained security access to broker clients. ActiveMQ in Action

 

 

168、单词 append 的匹配文本:

 

    1. The source configuration also has a parameter called channels (plural) that is the name of the channel/channels the source will append events to, in this case c1 Apache Flume Distributed Log Collection for Hadoop

 

    1. Since the second request includes a completely new version of the document, CouchDB can simply append it to the database without having to wait for the read request to finish. CouchDB The Definitive Guide

 

    1. If the value is not NULL, which is determined by looking at its definition level, we synchronize the record being assembled to the record structure of the current reader in the method MoveToLevel, and append the field value to the record. Dremel - Interactive Analysis of WebScale Datasets

 

    1. With that background, we now describe how the client, master, and chunkservers interact to implement data mutations, atomic record append, and snapshot. GFS - The Google File System

 

    1. If there is a text in the term (its length is higher than zero), we use a StringBuilder object to reverse the text, we clear the term buffer (by calling setEmpty() on the attribute), and we append the reversed text to the already emptied term buffer (we do that by calling the append() method of the attribute) Mastering ElasticSearch

 

    1. While super columns look like an attractive option, it is almost always better to append the column and the super column together with a delimiter between them. Cassandra High Performance Cookbook

 

    1. Right from the very beginning, CouchDB was designed to be schema-free and indexable, using a combination of append-only storage and atomic updates. Beginning CouchDB

 

    1. No reads or seeks of any kind are required for writing a value to Cassandra because all writes are append operations. Cassandra The Definitive Guide

 

    1. Essentially, this acts as a batch mode APPEND style INSERT. Tenzing - A SQL Implementation On The MapReduce Framework

 

    1. In this case, Gradle has overloaded << to append a code block to the list of actions a task performs. Building and Testing with Gradle
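
下面用一段简短的 Java 代码（仅作示意，文件名 demo.log 为假设值）演示“追加写”这一概念：只在文件末尾顺序添加内容而不覆盖已有数据，这正是例句中 commit log、SSTable 等追加式存储采用的写入方式：

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class AppendDemo {
        public static void main(String[] args) throws IOException {
            Path log = Paths.get("demo.log");   // 示例文件名
            // 以追加方式写入：不存在则创建，存在则在末尾追加
            Files.write(log, "new event\n".getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);

            // StringBuilder 的 append 也是同一个词的常见用法
            StringBuilder sb = new StringBuilder();
            sb.append("key").append('=').append(42);
            System.out.println(sb);
        }
    }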

 

 

169、单词 jit 的匹配文本:

 

    1. Two forces can lead to race conditions—the Just-in-Time (JIT) compiler optimization and the Java Memory Model. Programming Concurrency on the JVM

 

    1. On the JVM, there are optimizations in the JIT to attempt to avoid locks when they aren’t necessary. Scala in Depth

 

    1. While some attempts have been made to increase the performance of Java software, such as just-in-time (JIT) compilers that convert Java bytecode to native code, developers and users will find that performance is not as fast as comparable C++ code, and that a greater amount of memory is consumed. Java Network Programming and Distributed Computing

 

    1. Translating Java Virtual Machine code at load-time or during execution into the native instruction set of the host CPU (sometimes referred to as just-in-time, or JIT, code generation) The Java Virtual Machine Specification Java SE 7 Edition

 

    1. An especially important one involves the operation of the loader and what is called a just-in-time (JIT) compiler. Thinking in Java 4th Edition

 

    1. We start by discussing the Java bytecode format, and how a JIT compiler works, making a case for the power of adaptive runtimes. Oracle JRockit The Definitive Guide

 

    1. Of particular note is a VM technology called just-in-time (JIT) compilation whereby Java byte codes are converted on the fly into native platform machine language, boosting execution speed for code that is run repeatedly. Java in a Nutshell 5th Edition

 

    1. This code was then compiled using Sawzall’s just-in-time (JIT) compiler. Tenzing - A SQL Implementation On The MapReduce Framework

 

    1. The default is with a JIT, and this is the mode used for all measurements in the book. Java Performance Tuning 2nd Edition

 

    1. Chapter 7 presents a step-bystep approach to tuning the HotSpot VM covering choices such as which JIT compiler to use, which garbage collector to use, and how to size Java heaps, and also provides an indication when the Java application itself may require some rework to meet the performance goals set forth by application stakeholders. Java Performance

 

 

170、单词 jndi 的匹配文本:

 

    1. The JNDI API has two parts: an application-level interface used by the application components to access naming and directory services and a service provider interface to attach a provider of a naming and directory service. Java EE 7 Specification

 

    1. This uses the Java Naming and Directory Interface (JNDI) registry to look up the required details. HBase:The Definitive Guide

 

    1. The mappedName attribute specifies the JNDI name of the JMS destination from which the bean will consume the message. Java EE 7 Essentials

 

    1. The chapter also has a brief introduction to hierarchical databases (such as LDAP directories) and the JNDI API. Core Java Volume II Advanced Features 9th Edition

 

    1. Additions to the platform included the Java Naming and Directory Interface (JNDI) and the Java Sound APIs, which were previously available as extensions to the platform. Java in a Nutshell 5th Edition

 

    1. In addition, Java makes heavy use of nonstandard custom schemes such as rmi, jar, jndi, and doc for various purposes. Java Network Programming 4th Edition

 

    1. For example, EJBHome objects and other JNDI lookups are often costly, but need to be performed regularly. Java Performance Tuning 2nd Edition

 

    1. It also discusses the client-side JNDI support provided by ActiveMQ. ActiveMQ in Action

 

    1. It is important to make a distinction between the bean instances that are pooled and/or cached versus the bean references that the client obtains though resource injection, EJBHome.create, or JNDI (Java Naming and Directory Interface) lookup. Java Performance

 

    1. The preferred way to obtain a database connection is to use a DataSource when running on an application server or to have access to a Java Naming and Directory Interface (JNDI) service. Java 7 Recipes A Problem-Solution Approach
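
下面是一段通过 JNDI 查找对象的 Java 示意代码（仅为草稿：工厂类、地址和查找名称以 ActiveMQ 的默认约定为例，实际取决于所使用的 JNDI 提供者）：

    import java.util.Hashtable;
    import javax.naming.Context;
    import javax.naming.InitialContext;
    import javax.naming.NamingException;

    public class JndiLookupDemo {
        public static void main(String[] args) {
            Hashtable<String, String> env = new Hashtable<>();
            // 工厂类与 provider 地址均为示例值
            env.put(Context.INITIAL_CONTEXT_FACTORY,
                    "org.apache.activemq.jndi.ActiveMQInitialContextFactory");
            env.put(Context.PROVIDER_URL, "tcp://localhost:61616");
            try {
                Context ctx = new InitialContext(env);
                Object connectionFactory = ctx.lookup("ConnectionFactory");
                System.out.println("looked up: " + connectionFactory);
                ctx.close();
            } catch (NamingException e) {
                e.printStackTrace();
            }
        }
    }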

 

 

171、单词 enumerated 的匹配文本:

 

    1. You can define your own enumerated type whenever such a situation arises. Core Java Volume I Fundamentals 9th Edition

 

    1. Enumerated types An apparently small addition in Java SE5 is the enum keyword, which makes your life much easier when you need to group together and use a set of enumerated types. Thinking in Java 4th Edition

 

    1. However, suppose you maintain legacy code that contains an enumerated type such as Click here to view code imageClick here to view code image. Core Java Volume II Advanced Features 9th Edition

 

    1. How do you query the database for an enumerated list of values that are currently allowed in the status column? SQL Antipatterns

 

    1. If the constants are best viewed as members of an enumerated type, you should export them with an enum type. Effective Java 2nd Edition

 

    1. The three most significant new language features are generic types, enumerated types, and annotations, which are covered in a new chapter of their own. Java in a Nutshell 5th Edition

 

    1. Avro allows the definition of hierarchical data structures; so, for example, we can create a record that contains an array, an enumerated type, and a subrecord. Hadoop Beginners Guide

 

    1. With this option, the matching documents are first enumerated into an internal bit set and then that bit set is used to match documents. Lucene in Action 2nd Edition

 

    1. The sessions bound to a session context can be enumerated as follows: Fundamental Networking in Java

 

    1. The FetchType enumerated type defines the values for this element, which can be either EAGER or LAZY. Pro JPA2
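
下面用 Java 的 enum（仅作示意）演示“枚举类型”的含义：把一组固定的常量定义成同一类型的成员：

    public class EnumDemo {
        enum Status { NEW, RUNNING, FAILED, FINISHED }

        public static void main(String[] args) {
            Status s = Status.RUNNING;
            switch (s) {
                case RUNNING:
                    System.out.println("job is running");
                    break;
                default:
                    System.out.println("state: " + s);
            }
            // 遍历枚举类型的全部成员
            for (Status each : Status.values()) {
                System.out.println(each.ordinal() + " -> " + each);
            }
        }
    }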

 

 

172、单词 benchmarks 的匹配文本:

 

    1. You can run the benchmarks yourself by changing to the bench/ directory of CouchDB’s trunk and running ./runner.sh while CouchDB is running on port 5984 CouchDB The Definitive Guide

 

    1. We can also use Hadoop benchmarks to identify configuration problems and use it as reference for performance tuning. Hadoop Operations and Cluster Management Cookbook

 

    1. In this section we present a few micro-benchmarks to illustrate the bottlenecks inherent in the GFS architecture and implementation, and also some numbers from real clusters in use at Google. GFS - The Google File System

 

    1. While the sort benchmark would typically benefit from improvements to the HDFS data path, both benchmarks perform better on YARN primarily due to significant improvements in the MapReduce runtime itself: map-side sort improvements, a reduce client that pipelines and batches transfers of map output, and a server-side shuffle based on Netty [3] Apache Hadoop YARN Yet Another Resource Negotiator

 

    1. These benchmarks performed different computational tasks, including analyzing ads logs (Ads Logs), extracting and joining data about websites from various sources (SiteData and IndexStats), and computing usage statistics from logs dumped by internal build tools (Build Logs) FlumeJava - Easy, Efficient Data-Parallel Pipelines

 

    1. Benchmarking HDFS Running benchmarks is a good way to verify whether your HDFS cluster is set up properly and performs as expected. Hadoop MapReduce Cookbook

 

    1. It contains benchmarks for Avro, Protocol Buffers, and Thrift, along with a number of other frameworks. Hadoop in Practice

 

    1. In this paper, we describe the architecture and implementation of Tenzing, and present benchmarks of typical analytical queries. Tenzing - A SQL Implementation On The MapReduce Framework

 

    1. Our load balancing algorithm attempts to deal with this imbalance, but cannot do a perfect job for two main reasons: rebalancing is throttled to reduce the number of tablet movements (a tablet is unavailable for a short time, typically less than one second, when it is moved), and the load generated by our benchmarks shifts around as the benchmark progresses. Bigtable - A Distributed Storage System for Structured Data

 

    1. Through the performance benchmarks in the following sections, we show that using an efficient database storage layer cuts down on data processing time especially on tasks that require complex query processing over structured data such as joins. HadoopDB An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads
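
下面是一个极简的计时草稿（仅作示意；严肃的微基准测试应使用 JMH 之类的框架），说明 benchmark 一词所指的测量过程：

    public class MicroBenchmarkDemo {
        public static void main(String[] args) {
            long[] data = new long[1_000_000];
            // 先做几轮预热，减小 JIT 编译对测量结果的干扰
            for (int warmup = 0; warmup < 5; warmup++) {
                fill(data);
            }
            long start = System.nanoTime();
            fill(data);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("fill took about " + elapsedMs + " ms");
        }

        private static void fill(long[] data) {
            for (int i = 0; i < data.length; i++) {
                data[i] = i * 31L;
            }
        }
    }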

 

 

173、单词 bigtable 的匹配文本:

 

    1. Storing and querying hierarchical data is straightforward in key-value stores like Bigtable. Megastore - Providing Scalable, Highly Available Storage for Interactive Services

 

    1. HBase is an Apache-licensed open source system modeled after Google’s Bigtable. Monitoring with Ganglia

 

    1. Even though many projects happily use Bigtable [9], we have also consistently received complaints from users that Bigtable can be difficult to use for some kinds of applications: those that have complex, evolving schemas, or those that want strong consistency in the presence of wide-area replication. Spanner - Google’s Globally-Distributed Database

 

    1. FlumeJava’s parallel collections abstract away the details of how data is represented, including whether the data is represented as an in-memory data structure, as one or more files, or as an external storage service such as a MySql database or a Bigtable [5] FlumeJava - Easy, Efficient Data-Parallel Pipelines

 

    1. Bigtable[4] provides both structure and data distribution but relies on a distributed file system for its durability. Cassandra A Decentralized Structured Storage System

 

    1. For example, front-ends may handle many different request types, or a storage system such as Bigtable [8] may be most efficient when shared across multiple applications. Dapper a Large-Scale Distributed Systems Tracing Infrastructure

 

    1. In addition, both GFS and Bigtable use Chubby as a well-known and available location to store a small amount of meta-data; in effect they use Chubby as the root of their distributed data structures. Chubby - The lock service for loosely-coupled distributed systems

 

    1. Once you start to scale many other data stores (MySQL, Bigtable), some nodes need to be set up as masters in order to organize other nodes, which are set up as slaves. Cassandra The Definitive Guide

 

    1. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving) Bigtable - A Distributed Storage System for Structured Data

 

    1. Distributed storage systems like Bigtable [9] can scale to the size of our repository but don’t provide tools to help programmers maintain data invariants in the face of concurrent updates. Percolator - Large-scale Incremental Processing Using Distributed Transactions and Notifications

 

 

174、单词 myisam 的匹配文本:

 

    1. Changing the value of AUTOCOMMIT has no effect on nontransactional tables, such as MyISAM or Memory tables, which have no notion of committing or rolling back changes. High Performance MySQL 3rd Edition

 

    1. MySQL’s MyISAM storage engine, querying a table is most efficient. SQL Antipatterns

 

    1. You might think this missing support constitutes a deal breaker; however, many people forget that one of the most popular table types in MySQL (MYISAM) doesn’t support transactions, either. The Definitive Guide to MongoDB

 

    1. Although at the moment MySQL supports a number of page and row-level locking storage engines, the core architecture has a strong MyISAM heritage. Understanding MySQL Internals

 

 

175、单词 wiki 的匹配文本:

 

    1. Sample configurations for other web servers such as Nginx and Lighttpd are available on the gweb wiki. Monitoring with Ganglia

 

    1. To secure against these errors, or to handle non-existing documents in a custom way (e.g., a wiki. CouchDB The Definitive Guide

 

    1. How do you know what’s out there, you may ask? Gradle provides a curated list of available community plugins on a wiki page: http://wiki.gradle.org/display/GRADLE/Plugins. Gradle in Action

 

    1. From the client point of view, document is a JSON object (see more about JSON format at http://en.wikipedia.org/wiki/JSON) Mastering ElasticSearch

 

    1. The API for list functions is volatile at present, so check the CouchDB wiki for the syntax to use for these functions. Beginning CouchDB

 

    1. The main wiki is located at http://wiki.apache.org/hadoop/, and contains useful presentations, setup instructions, and troubleshooting instructions. Hadoop in Practice

 

    1. Embedded Pig programs is a fairly advanced topic and you can find more details at http://wiki.apache.org/pig/ EmbeddedPig. Hadoop in Action

 

    1. According to the project wiki, Cassandra has been used to create a variety of applications, including a windowed time-series store, an inverted index for document searching, and a distributed job priority queue. Cassandra The Definitive Guide

 

    1. If transforms are of interest, they are well documented on the Hive wiki. Hadoop Beginners Guide

 

    1. This is easy enough because a sample is provided in the JBoss wiki. ActiveMQ in Action

 

 

176、单词 analytics 的匹配文本:

 

    1. For over a decade has worked for several startup companies in Silicon Valley and Raleigh, North Carolina, in the area of search and analytics. Apache Flume Distributed Log Collection for Hadoop

 

    1. No one (cluster) size fits all: automatic cluster sizing for data-intensive analytics. Omega - flexible, scalable schedulers for large compute clusters

 

    1. Finally we've touched the surface of search analytics and we've seen what software we can use to measure and observe. Mastering ElasticSearch

 

    1. If you are working with an existing system, then you can take a look at log files and analytics data to get a sense of how your application is used. Scaling CouchDB

 

    1. Elasticsearch is a distributed scalable real-time search and analytics engine. Elasticsearch The Definitive Guide

 

    1. Two main applications that fall into this category are Realtime Analytics [11] and MySQL backups. Apache Hadoop Goes Realtime at Facebook

 

    1. These companies include: • Twitter is using Cassandra for analytics. Cassandra The Definitive Guide

 

    1. It is possible to create a fully functional SQL engine on top of the MapReduce framework, with extensions that go beyond SQL into deep analytics. Tenzing - A SQL Implementation On The MapReduce Framework

 

    1. For example, when you need real-time analytics about your enterprise: an application used by a retail store might want to send a message to request more inventory. ActiveMQ in Action

 

    1. It records various information about the request in Google Analytics, such as a user identifier and information about the page being fetched. Bigtable - A Distributed Storage System for Structured Data

 

 

177、单词 executable 的匹配文本:

 

    1. Because Ganglia is an open source project, it is possible to compile a runnable binary executable of the gmond agent on virtually any platform with a C compiler. Monitoring with Ganglia

 

    1. Optimally, the automated tasks should be executable from the command line, which allows you to run the build from any machine you want, whenever you want. Gradle in Action

 

    1. Running Java applications If we want to execute a Java executable from a Gradle build, we have several options. Gradle Effective Implementation Guide

 

    1. Create an executable script /root/sync.sh that uses the rsync command: Cassandra High Performance Cookbook

 

    1. A protocol compiler takes this language and generates executable code to manipulate the protocol buffers. Interpreting the Data Parallel Analysis with Sawzall

 

    1. The fact that the Gradle build file is executable Groovy code gives you nearly unlimited flexibility in deciding on its origin. Gradle Beyond The Basics

 

    1. Let the data remain where it is and move the executable code to its hosting machine. Hadoop in Action

 

    1. The Ant tasks for Thrift generation directly ran the Thrift executable (as you would on the command line) Cassandra The Definitive Guide

 

    1. After the usual routine of making the mapper executable and running the job we produced, data showing 29 different UFO shapes were reported. Hadoop Beginners Guide

 

    1. Brew will install and build any dependencies, then place the Gradle executable in your path. Building and Testing with Gradle

 

 

178、单词 elasticsearch 的匹配文本:

 

    1. Its vibrant and growing user community includes large companies like Facebook and Instagram as well as popular open-source projects such as Infinispan, HornetQ, Vert.x, Apache Cassandra and Elasticsearch, all of which have employed its powerful network abstractions in their core code. Netty in Action MEAP V08

 

    1. For example, we would run a query lucene NOT elasticsearch to match documents that contain lucene term, but not elasticsearch term in the document. Mastering ElasticSearch

 

    1. In order to achieve this, the recipe integrates with an open source product call Elastic Search, which is a general-use, clustered search engine with a RESTful API (http://www.elasticsearch.org/) Storm Real-Time Processing Cookbook

 

    1. This is an intermediate level book designed for anyone writing applications using, or responsible for managing Elasticsearch in a production environment. Elasticsearch in Action MEAP V07 cHAPTER 1

 

    1. ElasticSearch for indexing and searching ElasticSearch (http://www.elasticsearch.org/) is an Apache 2.0 licensed open source search solution built on top of Apache Lucene. Hadoop MapReduce Cookbook

 

    1. Scripting – when you need more (TODO): Scripting in Elasticsearch; default scripting language mvel; other scripting languages; script fields; Query DSL sorting; Scripting and facets. Elasticsearch The Definitive Guide

 

 

179、单词 comparator 的匹配文本:

 

    1. The sample MapReduce job performs range joins, and uses custom comparator and partitioner classes. Pro Hadoop

 

    1. The only control a developer has is how the keys are sorted and grouped by specifying a custom Comparator object. MapReduce Design Patterns

 

    1. The group key is used by the partitioner and the group comparator. Professional Hadoop Solutions

 

    1. Create a column family named cars specifying the comparator as LongType: Cassandra High Performance Cookbook

 

    1. Finally, write the code to set up a normal MapReduce job, but tell the MapReduce framework to use our own partitioner and comparator classes: Hadoop RealWorld Solutions Cookbook

 

    1. It uses the same comparison operators and comparator types as RowFilter. HBase in Action

 

    1. Finally, the grouping RawComparator is responsible for determining the group boundaries across the sorted records. Hadoop in Practice

 

    1. WritableComparator is a general-purpose implementation of RawComparator for WritableComparable classes. Hadoop:The Definitive Guide 3rd Edition

 

    1. KeyComparator Compares two KeyValue keys, i.e., what is returned by the getKey() method, in their raw, byte array format. HBase:The Definitive Guide

 

    1. Second, a column family has two attributes: a name and a comparator. Cassandra The Definitive Guide
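
下面用一段 Java 代码（仅作示意）演示自定义 Comparator 的写法，思路与例句中 MapReduce 里自定义排序比较器类似：

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.List;

    public class ComparatorDemo {
        public static void main(String[] args) {
            List<String> words = new ArrayList<>(
                    Arrays.asList("hbase", "cassandra", "hive"));
            // 先按长度、再按字典序比较
            Comparator<String> byLengthThenAlpha = new Comparator<String>() {
                @Override
                public int compare(String a, String b) {
                    int byLen = Integer.compare(a.length(), b.length());
                    return byLen != 0 ? byLen : a.compareTo(b);
                }
            };
            Collections.sort(words, byLengthThenAlpha);
            System.out.println(words);   // [hive, hbase, cassandra]
        }
    }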

 

 

180、单词 meta 的匹配文本:

 

    1. All of the data and meta-information generated by CouchDB and internal elements such as the configuration information are all generated as JSON. Getting Started with CouchDB

 

    1. You should see meta-information about the installation, your JVM, and the operating system. Gradle in Action

 

    1. Client module dependency module() A dependency on an external module where the artifacts are stored in a repository but the meta information about the module is in the build file. Gradle Effective Implementation Guide

 

    1. The information about keyspaces and column families is meta-information that is stored and replicated across all nodes in the cluster. Cassandra High Performance Cookbook

 

    1. The headers usually provide information such as the time of the response, information about the server, the type of content contained in the response body, and more meta-information. Beginning CouchDB

 

    1. This directory will automatically be included in the jar file under its META-INF directory. Gradle Beyond The Basics

 

    1. In addition, both GFS and Bigtable use Chubby as a well-known and available location to store a small amount of meta-data; in effect they use Chubby as the root of their distributed data structures. Chubby - The lock service for loosely-coupled distributed systems

 

    1. In Microsoft SQL Server, two meta-databases are maintained: master and tempdb. Cassandra The Definitive Guide

 

    1. ColumnIO files have headers which contain meta information about the data, including the low and high values for each column. Tenzing - A SQL Implementation On The MapReduce Framework

 

    1. The journal directory—Contains the data files for the data logs, and a data-control file that holds some meta information. ActiveMQ in Action

 

 

181、单词 cpus 的匹配文本:

 

    1. This includes metrics such as the OS type or the number of CPUs installed in the system. Monitoring with Ganglia

 

    1. The system scales to thousands of CPUs and petabytes of data, and has thousands of users at Google. Dremel - Interactive Analysis of WebScale Datasets

 

    1. The programs were executed on a weekend afternoon, when the CPUs, disks, and network were mostly idle. MapReduce - Simplied Data Processing on Large Clusters

 

    1. Another example is a big cluster with various hardware components, different CPUs, RAM amount, or disk space. Mastering ElasticSearch

 

    1. This is especially useful under high load where CPUs tend to drift more. Cassandra High Performance Cookbook

 

    1. While memory gets cheaper, CPUs get faster, and hard drives get larger every year, at any given moment there’s an upward limit to vertical scaling. Scaling CouchDB

 

    1. Since GFS is a storage system, its CPUs are often lightly loaded, and the free computing cycles can be used to run Workqueue jobs. Interpreting the Data Parallel Analysis with Sawzall

 

    1. Cassandra, like many databases, is particularly dependent on the speed of the hard disk and the speed of the CPUs (it’s best to have four or eight cores, to take advantage of Cassandra’s highly concurrent construction) Cassandra The Definitive Guide

 

    1. This bandwidth is enough to saturate the tablet server CPUs because of overheads in our networking stack, SSTable parsing, and Bigtable code, and is also almost enough to saturate the network links used in our system. Bigtable - A Distributed Storage System for Structured Data

 

    1. CPUs have become so much faster than disks that a few CPUs in a shared-memory machine can drive enough disk heads to service required loads without the complexity of distributed transactions: the top TPC-E benchmark results today are achieved on large shared-memory machines connected to a SAN. Percolator - Large-scale Incremental Processing Using Distributed Transactions and Notifications
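
下面一小段 Java 代码（仅作示意）演示如何查询可用 CPU 核数，并据此确定工作线程数，这是例句中多核并发场景的常见做法：

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class CpuDemo {
        public static void main(String[] args) {
            // 当前 JVM 可用的 CPU 核数
            int cpus = Runtime.getRuntime().availableProcessors();
            System.out.println("available CPUs: " + cpus);
            ExecutorService pool = Executors.newFixedThreadPool(cpus);
            for (int i = 0; i < cpus; i++) {
                final int id = i;
                pool.submit(() -> System.out.println(
                        "worker " + id + " on " + Thread.currentThread().getName()));
            }
            pool.shutdown();
        }
    }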

 

 

182、单词 dom 的匹配文本:

 

    1. If the user’s experience is impacted the most by DOM rendering in the browser, and MySQL contributes only a tiny fraction of the total response time, then optimizing queries further can never help the user experience appreciably. High Performance MySQL 3rd Edition

 

    1. You need a tool that automates bringing up the browser, manipulates the data object model (DOM) of the web page, and supports running these tests against different browsers. Gradle in Action

 

    1. The first thing to do is map the Story DOM with the STORIES table in the database using the table method defined in the Squeryl Schema class. Scala in Action

 

    1. Produce/consume JSON text in a streaming fashion (similar to StAX API for XML) • Build a Java object model for JSON text (similar to DOM API for XML) Java EE 7 Essentials

 

    1. Tree parsers, such as the Document Object Model (DOM) parser, that read an XML document into a tree structure. Core Java Volume II Advanced Features 9th Edition

 

    1. A DOM parser reads an XML document and converts it into a tree of nodes that represent the full content of the document. Java in a Nutshell 5th Edition

 

    1. These are logically defined processing units and may be accomplished within a single step (e.g., SAX) or may be distinct steps (e.g., DOM) Java Performance

 

    1. For those familiar with the XML DOM interface the code presented below should be straight-forward. Lucene in Action 2nd Edition

 

    1. How It Works JavaFX has the capability to apply CSS styles onto the Scene graph and its nodes very much like browsers applying CSS styles onto elements in an HTML document object model (DOM) Java 7 Recipes A Problem-Solution Approach

 

    1. It has been developed using JavaScript so that it can interact with DOM (Document Object Model) using native JavaScript calls. Selenium 2 Testing Tools Beginner's Guide
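
下面用 Java 自带的 DOM 解析器（仅作示意，XML 内容为假设的示例数据）演示把整个 XML 文档读成内存中的节点树再遍历：

    import java.io.ByteArrayInputStream;
    import java.nio.charset.StandardCharsets;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.NodeList;

    public class DomDemo {
        public static void main(String[] args) throws Exception {
            String xml = "<books><book>Hadoop</book><book>Cassandra</book></books>";
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DOM 解析：一次性把文档读成节点树
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList books = doc.getElementsByTagName("book");
            for (int i = 0; i < books.getLength(); i++) {
                System.out.println(books.item(i).getTextContent());
            }
        }
    }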

 

 

183、单词 wildcard 的匹配文本:

 

    1. How do you allow one call and disallow the other? The Java language designers invented an ingenious new concept, the wildcard type, to solve this problem. Core Java Volume I Fundamentals 9th Edition

 

    1. Please note by default these wildcard characters can't be used as the first character in a term because of the performance reasons. Mastering ElasticSearch

 

    1. Since we do not need to know the exact type T, we can use a wildcard. Data Structures and Algorithm Analysis in Java 3rd Edition

 

    1. In bash and other shells it’s normal to affix the * wildcard to filesystem commands, and for the shell to expand that prior to running a program. Hadoop in Practice

 

    1. Rather than having to enumerate each file and directory to specify the input, it is convenient to use wildcard characters to match multiple files with a single expression, an operation that is known as globbing. Hadoop:The Definitive Guide 3rd Edition

 

    1. Fieldnames can be specified with wildcards—any field which matches the wildcard pattern will be included in the search. Elasticsearch The Definitive Guide

 

    1. Methods that take a Builder instance would typically constrain the builder’s type parameter using a bounded wildcard type (Item 28) Effective Java 2nd Edition

 

    1. ActiveMQ uses a special notation to denote a wildcard subscription; we’ll describe that in the next section. ActiveMQ in Action

 

    1. Receiving limited or directed broadcasts is simply a matter of binding a datagram socket to the wildcard address and executing a receive operation. Fundamental Networking in Java

 

    1. In order to add all the JAR files within a specified directory, use the wildcard character (*) after the directory containing the files. Java 7 Recipes A Problem-Solution Approach
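
例句中的 wildcard 既指文件名、查询中的通配符，也指 Java 泛型中的通配符类型；下面以后者为例给出一段小示例（仅作示意）：

    import java.util.Arrays;
    import java.util.List;

    public class WildcardDemo {
        // 上界通配符：可接收 List<Integer>、List<Double> 等任意 Number 子类型的列表
        static double sum(List<? extends Number> numbers) {
            double total = 0;
            for (Number n : numbers) {
                total += n.doubleValue();
            }
            return total;
        }

        public static void main(String[] args) {
            List<Integer> ints = Arrays.asList(1, 2, 3);
            List<Double> doubles = Arrays.asList(1.5, 2.5);
            System.out.println(sum(ints) + " " + sum(doubles));   // 6.0 4.0
        }
    }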

 

 

184、单词 kerberos 的匹配文本:

 

    1. We will show you how to configure ACL for authorization and Kerberos for authentication, configure NameNode HA, recover from a failed NameNode, and so on. Hadoop Operations and Cluster Management Cookbook

 

    1. You learn how Hadoop provides authentication, with a focus on the details of Hadoop’s use of Kerberos. Professional Hadoop Solutions

 

    1. Hadoop security – integrating with Kerberos Hadoop by default runs without security. Hadoop MapReduce Cookbook

 

    1. Hadoop can be configured to run with Kerberos, a network authentication protocol, which requires Hadoop daemons to authenticate clients, both user and other Hadoop components. Hadoop in Practice

 

    1. Note, however, that the latest versions of Hadoop support Kerberos authentication, which removes these restrictions; see “Security” on page 325.) Despite these limitations, it is worthwhile having permissions enabled (as it is by default; see the dfs.permissions property), to avoid accidental modification or deletion of substantial parts of the filesystem, either by users or by automated tools or programs. Hadoop:The Definitive Guide 3rd Edition

 

    1. Different credential administrations are possible; the initial implementation will use Kerberos. The Hadoop Distributed File System

 

    1. Providing Your Own Authentication You can provide your own method of authenticating to Cassandra if you want to make special requirements, such as a Kerberos ticket or encryption, or if you want to store passwords in a different location, such as an LDAP directory. Cassandra The Definitive Guide

 

    1. To avoid reliance on simple user IDs, the developers need to learn the user identity from somewhere, and the Kerberos system was chosen with which to integrate. Hadoop Beginners Guide

 

    1. Yahoo!, one of the major contributors to Apache Hadoop, invested in adding full Kerberos support to core Hadoop. Hadoop Operations

 

    1. Some common forms of authentication include plain-text passwords, one-time password devices, smart cards, or Kerberos, just to name a few. ActiveMQ in Action
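
As a rough illustration of what the Hadoop excerpts describe, here is a minimal client-side login sketch using Hadoop's UserGroupInformation API. It assumes a cluster already configured for Kerberos; the principal and keytab path are placeholders.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.security.UserGroupInformation;

    public class KerberosLoginExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Tell the Hadoop client that the cluster expects Kerberos rather than "simple" auth.
            conf.set("hadoop.security.authentication", "kerberos");
            UserGroupInformation.setConfiguration(conf);

            // Log in from a keytab instead of an interactive kinit (placeholder values).
            UserGroupInformation.loginUserFromKeytab(
                    "etl-user@EXAMPLE.COM", "/etc/security/keytabs/etl-user.keytab");

            System.out.println("Logged in as: " + UserGroupInformation.getCurrentUser());
        }
    }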

 

 

185. Matched text for the word aws:

 

    1. Words that you see on the screen, in menus or dialog boxes for example, appear in the text like this: " Verify the startup from AWS Management Console" HBase Administration Cookbook

 

    1. Amazon Amazon, http://aws.amazon.com, provides a large set of cloud computing services: Pro Hadoop

 

    1. And, we will end this book by introducing steps to build a Hadoop cluster using the AWS cloud. Hadoop Operations and Cluster Management Cookbook

 

    1. For the past two years, he has participated in the design and implementation of several Hadoop and Amazon Web Services (AWS) based implementations. Professional Hadoop Solutions

 

    1. The biggest (and oldest) player right now in the cloud infrastructure space is Amazon Web Services (AWS) HBase in Action

 

    1. Check the output of the computation in the output data bucket using the AWS S3 console. Hadoop MapReduce Cookbook

 

    1. Deepak Singh, open data advocate and bizdev manager of the Amazon AWS cloud. Hadoop:The Definitive Guide 3rd Edition

 

    1. Transferring data into and out of an AWS Hadoop cloud. Hadoop in Action

 

    1. Since we also explore Amazon Web Services in this book, you can run all the examples on EC2 instances, and we will look at some other more Hadoop-specific uses of AWS throughout the book. Hadoop Beginners Guide

 

    1. When you set up your AWS account, make sure that all your Amazon EC2 instances, key pairs, security groups, and EMR jobflows are located in the same region to avoid cross-region transfer costs. Programming Hive

 

 

186. Matched text for the word backend:

 

    1. He has worked with Perl/ PHP/Python, primarily for coding and MySQL/Oracle as the backend, for several years prior to moving into Hadoop fulltime. Apache Flume Distributed Log Collection for Hadoop

 

    1. It is also possible to use a different backend for events, which can be useful if you need to scale up to hundreds or thousands of events without incurring the processing penalty associated with JSON parsing. Monitoring with Ganglia

 

    1. Towards characterizing cloud backend workloads: insights from Google compute clusters. Omega - flexible, scalable schedulers for large compute clusters

 

    1. Jason Davies swapped out the backend of the Ely Service website with CouchDB, without changing anything visible to the user. CouchDB The Definitive Guide

 

    1. Her interests and experiences lie across multiple tiers, from web frontend GUI to middleware, from middleware to backend SQL RDBMS and NoSQL data storage. Cassandra High Performance Cookbook

 

    1. Cassandra is now deployed as the backend storage system for multiple services within Facebook. Cassandra A Decentralized Structured Storage System

 

    1. The max=4 parameters indicate the maximum number of connections that the proxy will allow to the backend server. Scaling CouchDB

 

    1. Understanding: Ads Review queries fan out across many types of systems, including BigTable, the aforementioned database, a multi-dimensional indexing service, and various other C++ and Java backend services. Dapper a Large-Scale Distributed Systems Tracing Infrastructure

 

    1. Keith Thornhill works as a Senior Software Engineer at Raptr and saw a need to take their storage and analytics backend to the next level. Cassandra The Definitive Guide

 

    1. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving) Bigtable - A Distributed Storage System for Structured Data

 

 

187. Matched text for the word jmeter:

 

    1. Learn how to test web applications using Apache JMeter with practical, hands-on examples. Performance Testing With JMeter 2.9

 

    1. JMeter JMeter is a Java application that can load another application and measure its performance. High Performance MySQL 3rd Edition

 

    1. This is great for plotting the history of the standard result types, such as JUnit, FindBugs, JMeter, and NCSS. Jenkins Continuous Integration Cookbook

 

    1. You can even run JMeter as a proxy, and then manipulate your application in an ordinary browser to prepare an initial version of your test script. Jenkins The Definitive Guide

 

    1. Using SolrMeter you can quickly set this scenario up, and then use another tool like JMeter that drives your front-end search user interface to ensure your application meets your expected SLA when Solr is under load. Solr 3 Enterprise Search Server

 

    1. For web applications, there are several open source stress and load testing tools available, including Grinder, JMeter, and Gatling.6 When testing load-balanced web applications, we should ensure our test clients are distributed across different IP addresses so that requests are balanced across the cluster. Graph Databases

 

    1. JMeter (http://jakarta.apache.org/jmeter/), the Open Source load testing tool from the Apache Software Foundation, is one of the most popular tools in this area. Java Performance

 

 

188. Matched text for the word predicate:

 

    1. Using MultiGet to limit round trips and overhead MultiGet should be used as an alternative to multiple get operations when each get operation uses the same SlicePredicate. Cassandra High Performance Cookbook

 

    1. A predicate, def(), can be used to test if a value is defined; it returns true for a defined value and false for an undefined value. Interpreting the Data Parallel Analysis with Sawzall

 

    1. Inner joins compare all tuples in relations L and R, and produce a result if a join predicate is satisfied. Hadoop in Practice

 

    1. It’s actually an equijoin because the join predicate is equality.) The result’s fields are made up of all the fields of all the input relations. Hadoop:The Definitive Guide 3rd Edition

 

    1. Following the input, the “filter” operator produces a value when its predicate evaluates to true. Hadoop in Action

 

    1. The slice predicate is a container object that allows us to specify the range of columns that we want to read between a start and a finish. Cassandra The Definitive Guide

 

    1. If a predicate is a constant or a constant range (e.g., BETWEEN) and the source data is an indexed source (e.g., Bigtable), the compiler will push down the condition to an index range scan on the underlying source. Tenzing - A SQL Implementation On The MapReduce Framework

 

    1. Sometimes, we want to only include particular columns or to apply a predicate to reduce the number of selected items. Hadoop Beginners Guide

 

    1. The value of the predicate is the value returned by the closure. Building and Testing with Gradle

 

    1. The first structured data task evaluates a simple selection predicate on the pageRank attribute from the Rankings table. HadoopDB An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads
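
Across these excerpts a predicate is simply a boolean-valued test applied to each record. Here is a minimal Java 8 sketch of the idea, not tied to any of the systems quoted above.

    import java.util.Arrays;
    import java.util.List;
    import java.util.function.Predicate;
    import java.util.stream.Collectors;

    public class PredicateExample {
        public static void main(String[] args) {
            // The predicate returns true only for values it wants to keep.
            Predicate<Integer> inRange = n -> n >= 10 && n < 20;

            List<Integer> values = Arrays.asList(3, 12, 19, 25);
            List<Integer> selected = values.stream()
                    .filter(inRange)              // keep only elements satisfying the predicate
                    .collect(Collectors.toList());

            System.out.println(selected);         // [12, 19]
        }
    }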

 

 

189. Matched text for the word erlang:

 

    1. One camp of developers believes actors should really be used for interprocess communication, like in Erlang. Programming Concurrency on the JVM

 

    1. Building from source requires a suitable build environment, some libraries, and prerequisites (such as Erlang) Getting Started with CouchDB

 

    1. Erlang is one such language, and it includes safe mechanisms for one task to communicate with another. Thinking in Java 4th Edition

 

    1. CouchDB is written in Erlang, but you don’t need to know anything about Erlang to use CouchDB. CouchDB The Definitive Guide

 

    1. This will drag along a number of packages with it, mainly erlang ones. Deploying OpenStack

 

    1. Add to this the fact that CouchDB is developed in Erlang OTP, a fault-tolerant programming language that offers excellent concurrency features, and you know that your CouchDB database will scale well without a loss of reliability and availability. Beginning CouchDB

 

    1. CouchDB (due to its Erlang underpinnings) can take advantage of multiple CPUs. Scaling CouchDB

 

    1. RabbitMQ is built using the OTP (Open Telecom Platform) and implemented in the Erlang language and runtime environment. Pro Puppet

 

    1. Preparing RabbitMQ for Pacemaker high availability In order for Pacemaker monitoring to function properly, you must ensure that RabbitMQ’s .erlang.cookie files are identical on all nodes, regardless of whether DRBD is mounted there or not. high-availability-guide

 

    1. Client: Riak offers three primary ways of interacting with it: via a JSON over HTTP interface; drivers for Erlang, Python, Java, PHP, JavaScript, and Ruby; and, finally, a Protocol Buffers client interface. Cassandra The Definitive Guide

 

 

190. Matched text for the word parsed:

 

    1. Once all the configurations have been parsed you see this message, which shows everything that was configured. Apache Flume Distributed Log Collection for Hadoop

 

    1. The configuration file is parsed using libconfuse, a third-party API for configuration files. Monitoring with Ganglia

 

    1. You get back a JSON string, that, if parsed into a native object or data structure of your programming language, gives you access to the welcome string and version information. CouchDB The Definitive Guide

 

    1. The task sonarRunner will give you concrete information about the parsed report files: Gradle in Action

 

    1. CouchDB then sends a response to the application in the form of JavaScript Object Notation (JSON), which can easily be parsed by most modern programming languages. Beginning CouchDB

 

    1. Parsing a complex dataset with Hadoop Datasets we parsed so far were simple, where each data item was contained in a single line. Hadoop MapReduce Cookbook

 

    1. Every Web page has an associated ID number called a docID which is assigned whenever a new URL is parsed out of a Web page. The anatomy of a large-scale hypertextual Web search engine

 

    1. The HPROF option you used will create a text file that can be easily parsed. Hadoop in Practice

 

    1. At the reducer, the list of values for each key are parsed. Hadoop in Action

 

    1. This sorts by directly comparing the bytes, validating that the input can be parsed as US-ASCII. Cassandra The Definitive Guide
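
As a small illustration of parsing JSON text into an object model, here is a sketch using the JSON-P API (javax.json). The CouchDB-style response body is hard-coded and purely illustrative.

    import java.io.StringReader;
    import javax.json.Json;
    import javax.json.JsonObject;
    import javax.json.JsonReader;

    public class ParseJsonExample {
        public static void main(String[] args) {
            // A CouchDB-style welcome response (hard-coded here for illustration).
            String body = "{\"couchdb\":\"Welcome\",\"version\":\"1.6.1\"}";

            // Parse the JSON text into an object model that can be queried.
            try (JsonReader reader = Json.createReader(new StringReader(body))) {
                JsonObject parsed = reader.readObject();
                System.out.println(parsed.getString("couchdb"));   // Welcome
                System.out.println(parsed.getString("version"));   // 1.6.1
            }
        }
    }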

 

 

191. Matched text for the word nagios:

 

    1. Chapter 7, Monitoring Flume, discusses various options available to monitor Flume both internally and externally including Monit, Nagios, Ganglia, and custom hooks. Apache Flume Distributed Log Collection for Hadoop

 

    1. Chapter 5, Monitoring and Diagnosis: In this chapter, we will describe how to monitor and diagnose HBase cluster with Ganglia, OpenTSDB, Nagios, and other tools. HBase Administration Cookbook

 

    1. As described in Chapter 7, various Nagios integration features may be set in gweb’s conf.php. Monitoring with Ganglia

 

    1. Nagios: A Monitoring and Alert Generation Framework Nagios  (http://www.nagios.org) provides a flexible customizable framework for collecting data about the state of a complex system and triggering various levels of alerts based on the collected data. Pro Hadoop

 

    1. Chapter 6, Monitoring a Hadoop Cluster, explains how to monitor a Hadoop cluster with various tools, such as Ganglia and Nagios. Hadoop Operations and Cluster Management Cookbook

 

    1. Nagios is the de facto standard Network Monitoring System (NMS) Cassandra High Performance Cookbook

 

    1. There are many fine tools that you can use for this, ranging from sar,1 the built-in Linux utility that collects and reports on system activity, to more sophisticated tools such as Nagios and Ganglia. Hadoop in Practice

 

    1. Many third-party monitoring and alerting systems (such as Nagios or Hyperic) can query MBeans, making JMX the natural way to monitor your Hadoop cluster from an existing monitoring system. Hadoop:The Definitive Guide 3rd Edition

 

    1. Nagios Nagios is a very commonly used support tool for gaining qualitative data regarding cluster status. HBase:The Definitive Guide

 

    1. Nagios, for example, is open source, free, and rather straightforward. Cassandra The Definitive Guide

 

 

192. Matched text for the word src:

 

    1. The binary distribution archive has bin in the name and the source archive is marked with src. Apache Flume Distributed Log Collection for Hadoop

 

    1. Source and destination directories are configured by attributes srcdir and destdir; compile Java source files located in directory src and put class files into directory dest. Gradle in Action

 

    1. The following example shows a simple Copy task called simpleCopy with a single source directory src/xml and a destination directory definitions: Gradle Effective Implementation Guide

 

    1. You can see that the code is placed in the src folder (the code is in the main folder and the unit tests are located in the test folder) Mastering ElasticSearch

 

    1. For this job we're only interested in retrieving the column qualifier src from the column family cf for each key. Hadoop RealWorld Solutions Cookbook

 

    1. This will create a new directory called otp_src_R13B01 under /usr/local/erlang. Beginning CouchDB

 

    1. Printing out the files property of goodPoems (or otherwise inspecting the contents of the collection) shows that it contains all of the .txt files in the src/main/resources directory, but not the file whose name starts with shelley. Gradle Beyond The Basics

 

    1. Compile the source by running the ant build command from the CHAPTER_6_SRC folder. Hadoop MapReduce Cookbook

 

    1. In the src directory, you’ll find the source code for Hadoop itself. Hadoop Operations

 

    1. When custom task logic outgrows the build file, we can migrate it to the buildSrc directory at the project root. Building and Testing with Gradle

 

 

193. Matched text for the word css:

 

    1. The raw images, JavaScript, CSS, and HTML assets needed by Sofa are stored in the _attachments field, which is interesting in that by default it shows only the stubs, rather than the full content of the files. CouchDB The Definitive Guide

 

    1. A WAR (web application archive) file is used to bundle web components, compiled classes, and other resource files like deployment descriptors, HTML, JavaScript, and CSS files. Gradle in Action

 

    1. He dabbled with web development in the late 1990s and was an expert in HTML, CSS, and JavaScript by the time he finished school. Beginning CouchDB

 

    1. Next, create the same document within the catalog-b database, but with a slightly different title (the comma is missing after “CSS”): Scaling CouchDB

 

    1. In this code, the page defines the structure using … and CSS (not shown here) Java EE 7 Essentials

 

    1. JavaDocs now also use a CSS stylesheet to simplify changes to the appearance of the pages. Java 7 New Features Cookbook

 

    1. It has produced several HTML standards, as well as a variety of others such as HTTP, PICS, XML, CSS, MathML, and more. Java Network Programming 4th Edition

 

    1. It should be noted that delays are meant to be present between successive page requests but not when requesting content (CSS, JavaScript, images) considered part of a page request. Java Performance

 

    1. Solution Create an application to dynamically customized border regions using JavaFX’s CSS styling API. Java 7 Recipes A Problem-Solution Approach

 

    1. You can use CSS to change the visual appearance of your application. Java SE 8 for the Really Impatient
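
For the JavaFX usage mentioned above, here is a minimal sketch of attaching a stylesheet to a scene graph. The app.css resource and the title style class are assumptions made for illustration.

    import javafx.application.Application;
    import javafx.scene.Scene;
    import javafx.scene.control.Label;
    import javafx.scene.layout.StackPane;
    import javafx.stage.Stage;

    public class CssExample extends Application {
        @Override
        public void start(Stage stage) {
            Label label = new Label("Hello, CSS");
            label.getStyleClass().add("title");   // matched by a .title rule in app.css

            Scene scene = new Scene(new StackPane(label), 300, 120);
            // Attach a stylesheet to the scene, much like a browser applies CSS to the DOM.
            scene.getStylesheets().add(getClass().getResource("app.css").toExternalForm());

            stage.setScene(scene);
            stage.show();
        }

        public static void main(String[] args) {
            launch(args);
        }
    }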

 

 

194. Matched text for the word endpoints:

 

    1. Our support is notable for its scale: declaring a queue automatically creates an inbox on each entity group, giving us millions of endpoints. Megastore - Providing Scalable, Highly Available Storage for Interactive Services

 

    1. In addition to less portability and JVM commitment, using the REST API you can connect to different major ElasticSearch versions if you are aware of the differences in REST endpoints and response returned by ElasticSearch. Mastering ElasticSearch

 

    1. Because Oozie also has a web console that supports Kerberos HTTP SPNEGO authentication, it should also be configured to use the same HTTP principal and keytab file used for other HTTP endpoints. Professional Hadoop Solutions

 

    1. It is low latency because the operation is acknowledgment to the client by only one of the natural endpoints for the data. Cassandra High Performance Cookbook

 

    1. BigCouch introduces some new API endpoints that are needed to manage its clustering features. Scaling CouchDB

 

    1. Endpoints allow you to extend the cluster’s capabilities, exposing new operations to client applications. HBase in Action

 

    1. Endpoints are dynamic extensions to the RPC protocol, adding callable remote procedures. HBase:The Definitive Guide

 

    1. Range The Range class allows us to create a specific interval or span of values with defined endpoints, and works with Comparable types. Getting Started with Google Guava

 

    1. This strategy only overrides the calculateNaturalEndpoints method from the abstract parent implementation. Cassandra The Definitive Guide

 

    1. Its sequencing with respect to other datagrams between the same two endpoints is not guaranteed. Fundamental Networking in Java

 

 

195. Matched text for the word mappers:

 

    1. It assumes that adding more workers results in an idealized linear speedup (modulo dependencies between mappers and reducers), up to the point where map activities and all reduce activities respectively run in parallel. Omega - flexible, scalable schedulers for large compute clusters

 

    1. This command will initiate a MapReduce job with a number of mappers to run the copy task in parallel. Hadoop Operations and Cluster Management Cookbook

 

    1. Analysis is done by the analyzer, which is built of tokenizer and zero or more filters, and can also have zero or more character mappers. Mastering ElasticSearch

 

    1. Most MapReduces have a single (possibly sharded) input source and a single Mapper, but in general a single MapReduce can have multiple input sources and associated Mappers. FlumeJava - Easy, Efficient Data-Parallel Pipelines

 

    1. Hive uses multiple mappers to fetch all the contents of the column family in parallel. Cassandra High Performance Cookbook

 

    1. But, Hadoop runs the mappers in multiple nodes, and often in a machine different than the machine running the job's client. Hadoop MapReduce Cookbook

 

    1. The MapReduce framework will take care of calling your reducer once per unique key outputted by the mappers, along with a list of document IDs. Hadoop in Practice

 

    1. Under the MapReduce model, the data processing primitives are called mappers  and reducers. Hadoop in Action

 

    1. Hadoop provides a standard specification (that is, interface) for the map and reduce functions, and implementations of these are often referred to as mappers and reducers. Hadoop Beginners Guide

 

    1. This is exactly how Hadoop Streaming - an API that permits non-Java mappers and reducers to be written and executed in Hadoop MapReduce - works; a user supplied script or program is created and executed under the control of the child JVM. Hadoop Operations
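
As a concrete example of a mapper, here is a minimal word-count-style sketch against the org.apache.hadoop.mapreduce API; the whitespace tokenization rule is arbitrary.

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // A mapper turns each input record (here, one line of text) into intermediate key/value pairs.
    public class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();   // reused to avoid allocating per record

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            for (String token : line.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);   // the framework routes these pairs to reducers
                }
            }
        }
    }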

 

 

196. Matched text for the word rdbms:

 

    1. In a typical relational database management system (RDBMS) solution, for example, reliability and consistency, in particular, in a distributed system can start to be a problem. Getting Started with CouchDB

 

    1. CouchDB differs from others by accepting eventual consistency, as opposed to putting absolute consistency ahead of raw availability, like RDBMS or Paxos. CouchDB The Definitive Guide

 

    1. Her interests and experiences lie across multiple tiers, from web frontend GUI to middleware, from middleware to backend SQL RDBMS and NoSQL data storage. Cassandra High Performance Cookbook

 

    1. Most developers with years of development experience working with RDBMS’s find this a very useful feature to have. Cassandra A Decentralized Structured Storage System

 

    1. This entity storage system exposes certain RDBMS functionality on top of an underlying BigTable. Dapper a Large-Scale Distributed Systems Tracing Infrastructure

 

    1. These use cases would have workloads that are challenging to scale because of very high write throughput, massive datasets, unpredictable growth, or other patterns that may be difficult or suboptimal in a sharded RDBMS environment. Apache Hadoop Goes Realtime at Facebook

 

    1. This technique is similar in spirit to vertical partitioning and column-oriented databases in the relational database management system (RDBMS) world. Hadoop in Action

 

    1. RDBMS: The Awesome and the Not-So-Much There are many reasons that the relational database has become so overwhelmingly popular over the last four decades. Cassandra The Definitive Guide

 

    1. Explore how we can move data from RDBMS into HDFS and Hive. Hadoop Beginners Guide

 

    1. Databases satisfy many of the requirements of an incremental system: a RDBMS can make many independent and concurrent changes to a large corpus and provides a flexible language for expressing computation (SQL) Percolator - Large-scale Incremental Processing Using Distributed Transactions and Notifications

 

 

197. Matched text for the word instantiate:

 

    1. It is also a good practice to instantiate member variables, or thread local variables, for any key or value that would otherwise be created in the body of the map() method. Pro Hadoop

 

    1. The settings file declares the configuration required to instantiate the project’s hierarchy. Gradle in Action

 

    1. Because we were interested in the latest data in order to instantiate the river and start indexing we've used the following command: Mastering ElasticSearch

 

    1. The header (see Table 2-1) contains key and value class names that are used by the reader to instantiate those classes, the version number, and compression information. Professional Hadoop Solutions

 

    1. After choosing the correct transports and other connection settings, users can instantiate a Cassandra.Client instance. Cassandra High Performance Cookbook

 

    1. In the mapper, we instantiate both ThriftWritable and WeblogRecord instances: Hadoop RealWorld Solutions Cookbook

 

    1. All Writable implementations must have a default constructor so that the MapReduce framework can instantiate them, then populate their fields by calling readFields() Hadoop:The Definitive Guide 3rd Edition

 

    1. The full request to create the index and instantiate the token filter and analyzer looks like this: Elasticsearch The Definitive Guide

 

    1. You instantiate a Job object by passing a JobConf object to its constructor. Hadoop in Action

 

    1. The BrokerFactory can instantiate the BrokerService class using the standard ActiveMQ XML configuration file as shown. ActiveMQ in Action
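
The Writable excerpt above hints at why frameworks insist on a no-argument constructor: they receive only a class name from configuration and must instantiate it reflectively. Here is a minimal sketch of that pattern; the class name used is just a placeholder.

    public class InstantiateExample {
        public static void main(String[] args) throws Exception {
            // Frameworks typically read a class name from configuration at runtime...
            String className = "java.util.ArrayList";   // placeholder class name

            // ...and instantiate it through its no-argument constructor via reflection.
            Class<?> clazz = Class.forName(className);
            Object instance = clazz.getDeclaredConstructor().newInstance();

            System.out.println("Instantiated: " + instance.getClass().getName());
        }
    }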

 

 

198. Matched text for the word facebook:

 

    1. Creating mountains of data has become very easy, thanks to Facebook, Twitter, Amazon, digital cameras and camera phones, YouTube, Google, and just about anything else you can think of connected to the Internet. Apache Flume Distributed Log Collection for Hadoop

 

    1. Find us on Facebook: http://facebook.com/oreilly Follow us on Twitter: http://twitter.com/oreillymedia Watch us on YouTube: http://www.youtube.com/oreillymedia. Getting Started with CouchDB

 

    1. In contrast, Flickr.com at its core is like Facebook with users and data for users, but in Flickr’s case, the data that grows fastest is images uploaded by users. CouchDB The Definitive Guide

 

    1. There are strict operational requirements on Facebook’s platform in terms of performance, reliability and efficiency, and to support continuous growth the platform needs to be highly scalable. Cassandra A Decentralized Structured Storage System

 

    1. In the past two years, Joe has written more than a dozen articles and tutorials for the IBM developerWorks web site, including articles on Adobe Flex, Facebook applications, ExtJS, Adobe AIR, IBM DB2, and more. Beginning CouchDB

 

    1. This has evolved from traditional use cases, such as combining OLTP and log files, to more sophisticated uses, such as using Hadoop for data warehousing (exemplified by Facebook) and the field of data science, which studies and makes new discoveries about data. Hadoop in Practice

 

    1. At Facebook, Hadoop has traditionally been used in conjunction with Hive for storage and analysis of large data sets. Apache Hadoop Goes Realtime at Facebook

 

    1. People upload videos, take pictures on their cell phones, text friends, update their Facebook status, leave comments around the web, click on ads, and so forth. Hadoop in Action

 

    1. Though as of this writing it has not yet reached a 1.0 release, Cassandra is being used in production by some of the biggest properties on the Web, including Facebook, Twitter, Cisco, Rackspace, Digg, Cloudkick, Reddit, and more. Cassandra The Definitive Guide

 

    1. From emails to Facebook posts, from purchase histories to web links, there are large data sets growing everywhere. Hadoop Beginners Guide

 

 

199. Matched text for the word udf:

 

    1. Conceptually, such user defined functions (UDFs) can be combined with SQL queries, but the experience reported in the comparison paper indicates that UDF support is either buggy (in DBMS-X) or missing (in Vertica) MapReduce a flexible data processing tool

 

    1. Basically, you would use Pig to build and group the records, then a UDF to make sense of the data. MapReduce Design Patterns

 

    1. This is something that can be provided by a custom UDF. Professional Hadoop Solutions

 

    1. Several of the Hive examples will demonstrate how to properly implement and use a custom function (UDF) for reuse in different analytics. Hadoop RealWorld Solutions Cookbook

 

    1. Since we’re looking to cover more advanced techniques, we’ll look at how you can write a custom Hive UDF to geolocate your logs. Hadoop in Practice

 

    1. A Filter UDF Let’s demonstrate by writing a filter function for filtering out weather records that do not have a temperature quality reading of satisfactory (or better) Hadoop:The Definitive Guide 3rd Edition

 

    1. Since storage handlers work transparently for the higher-level layers in Hive, you can also use any user-defined function (UDF) supplied with Hive—or your own custom functions. HBase:The Definitive Guide

 

    1. You should also consider contributing your UDF back to PiggyBank to benefit others in the Pig community. Hadoop in Action

 

    1. Time for action – adding a new User Defined Function (UDF) Let us show how to create and invoke some custom Java code via a new UDF. Hadoop Beginners Guide

 

    1. Unfortunately, [23] found that in practice, it was difficult to implement such a UDF inside the parallel databases. HadoopDB An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads
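
As a concrete example, here is a minimal Hive UDF sketch using the classic org.apache.hadoop.hive.ql.exec.UDF base class that several of the books above describe; the function name and the registration commands in the comment are illustrative.

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Registered in Hive with something like:
    //   ADD JAR my-udfs.jar;
    //   CREATE TEMPORARY FUNCTION to_upper AS 'ToUpperUDF';
    public class ToUpperUDF extends UDF {
        // Hive discovers evaluate() by reflection and calls it once per row.
        public Text evaluate(Text input) {
            if (input == null) {
                return null;
            }
            return new Text(input.toString().toUpperCase());
        }
    }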

 

 

200. Matched text for the word xpath:

 

    1. The only new element we have used here is the XPath Extractor post processor. Performance Testing With JMeter 2.9

 

    1. The second, and the one that we will explore in this recipe, is the ability to design custom bug discovery rules for Java source code using Xpath. Jenkins Continuous Integration Cookbook

 

    1. You can use various forms of XML schema definitions (DTDs, XML Schema, RelaxNG) to check document formats, run queries with XPath and XQuery, and perform transformations with XSLT. NoSQL Distilled

 

    1. Cassandra does not have a standardized data access language such as SQL or XPATH. Cassandra High Performance Cookbook

 

    1. Tsung allows matching using a limited subset of JSONPath (XPath for JSON), using the jsonpath attribute. Scaling CouchDB

 

    1. This is a lot like how XPath lets us select an arbitrary element from a deeply nested XML structure, or how a CSS selector lets us do the same with HTML. Functional Programming Patterns in Scala and Clojure

 

    1. The XPath language makes it simple to access tree nodes. Core Java Volume II Advanced Features 9th Edition

 

    1. XPath is a language for referring to specific nodes in an XML document. Java in a Nutshell 5th Edition

 

    1. Use of XML-friendly query mechanisms such as XPath and XQuery. Cassandra The Definitive Guide

 

    1. The details for xpath don’t concern us here, but note that we pass string literals to the xpath function and use FROM src LIMIT 1 to specify the required FROM clause and to limit the output. Programming Hive
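
Here is a minimal sketch of selecting nodes with the javax.xml.xpath API; the file name and the /catalog/book/title expression are made up for illustration.

    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.xpath.XPath;
    import javax.xml.xpath.XPathConstants;
    import javax.xml.xpath.XPathFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.NodeList;

    public class XPathExample {
        public static void main(String[] args) throws Exception {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse("books.xml");                  // hypothetical input file

            XPath xpath = XPathFactory.newInstance().newXPath();
            // An XPath expression refers to specific nodes in the document tree.
            NodeList titles = (NodeList) xpath.evaluate(
                    "/catalog/book/title", doc, XPathConstants.NODESET);

            for (int i = 0; i < titles.getLength(); i++) {
                System.out.println(titles.item(i).getTextContent());
            }
        }
    }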

 

 

201. Matched text for the word errata:

 

    1. If you find any errata, please report them by visiting http://www. Apache Flume Distributed Log Collection for Hadoop

 

    1. Once your errata are verified, your submission will be accepted and the errata will be uploaded on our website, or added to any list of existing errata, under the Errata section of that title. Hadoop Operations and Cluster Management Cookbook

 

    1. If you find any errata, please report them by visiting http://www.packtpub. Gradle Effective Implementation Guide

 

    1. Dear reader, refer to http://www.elasticsearchserverbook.com from time to time, where you'll be able to find the newest errata dedicated to the book and additional articles extending it. Mastering ElasticSearch

 

    1. Errata Although we have taken every care to ensure the accuracy of our content, mistakes do happen. Cassandra High Performance Cookbook

 

    1. Any existing errata can be viewed by selecting your title from http://www.packtpub.com/support. Hadoop RealWorld Solutions Cookbook

 

    1. If you find any errata, please report them by visiting http://www.packtpub.com/support, selecting your book, clicking on the errata submission form link, and entering the details of your errata. Hadoop MapReduce Cookbook

 

    1. We have a web page for this book, where we list errata, examples, and any. Elasticsearch The Definitive Guide

 

    1. We have a web page for this book, where we list errata, examples, and any additional information. Cassandra The Definitive Guide

 

    1. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the errata submission form link, and entering the details of your errata. Hadoop Beginners Guide

 

 

202. Matched text for the word recommender:

 

    1. For better or worse, I instead prototyped a simple recommender for my friend’s startup, from scratch. Mahout in Action

 

    1. The highlighted code inside the recommend() method configures a default recommender before calling methods to perform recommendation. Sonar Code Quality Testing Essentials

 

    1. Recommender systems are used by many companies including Amazon, LinkedIn, and Facebook. Hadoop RealWorld Solutions Cookbook

 

    1. Recommender systems, which are also known as collaborative filtering (CF) systems, are the computer equivalent of you asking your friends for a restaurant recommendation. Hadoop in Practice
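
Below is a rough sketch of a user-based recommender in the spirit of the Mahout "Taste" API referenced above. The ratings file, neighborhood size, and user ID are placeholders, and class names may differ between Mahout versions.

    import java.io.File;
    import java.util.List;
    import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
    import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
    import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
    import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
    import org.apache.mahout.cf.taste.recommender.RecommendedItem;
    import org.apache.mahout.cf.taste.recommender.Recommender;
    import org.apache.mahout.cf.taste.similarity.UserSimilarity;

    public class RecommenderExample {
        public static void main(String[] args) throws Exception {
            // CSV of userID,itemID,preference rows (placeholder file name).
            DataModel model = new FileDataModel(new File("ratings.csv"));

            UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
            UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, model);
            Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);

            // Top three recommendations for user 1.
            List<RecommendedItem> items = recommender.recommend(1, 3);
            for (RecommendedItem item : items) {
                System.out.println(item.getItemID() + " -> " + item.getValue());
            }
        }
    }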

 

 

203. Matched text for the word amqp:

 

    1. A cloud controller communicates with the internal object store via HTTP (Hyper Text Transfer Protocol), but it communicates with a scheduler, network controller, and volume controller via AMQP (Advanced Message Queue Protocol) bk-compute-adminguide-grizzly

 

    1. Protocol: AMQP 1.0 AMQP is a binary wire protocol which was. Next Generation Open Source Messaging with Apollo

 

    1. We go from basic concepts up to seeing how to map those concepts in AMQP (the protocol used by RabbitMQ) RabbitMQ in Action

 

    1. The core framework is large, and dozens of portfolio projects extend that core, covering things like security, web flow, SOAP web services (REST web services are part of the core), enterprise integration, batch processing, mobile, various flavors of social (Facebook, LinkedIn, Twitter, GitHub, and so on), various flavors of NoSQL (MongoDB, Neo4j, Riak, and so on), BlazeDS/Flex, AMQP/Rabbit, and many more. Spring in Practice

 

    1. The cloud controller communicates with the internal object store using HTTP, but it communicates with the scheduler, network controller, and volume controller using AMQP (advanced message queueing protocol) bk-admin-guide-cloud-latest

 

    1. This leaves the door open for supporting other messaging technologies, such as Advanced Message Queuing Protocol (AMQP) Spring Batch in Action

 

    1. Two additional plugins for RabbitMQ need to be installed to provide Stomp protocol support and AMQP protocol support. Pro Puppet

 

    1. Highly available RabbitMQ RabbitMQ is the default AMQP server used by many OpenStack services. high-availability-guide

 

    1. In this chapter, we’ll look at how to send and receive messages in Spring using both the Java Message Service (JMS) and the Advanced Message Queuing Protocol (AMQP) Spring in Action 4th Edition

 

    1. The choice of RabbitMQ over other AMQP compatible options that are gaining support in OpenStack, such as ZeroMQ and Qpid is due to its ease of use with Ubuntu and significant testing in production. OpenStack Operations Guide
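
As a small illustration of the protocol in use, here is a minimal publish sketch with the RabbitMQ Java client (RabbitMQ being the AMQP broker several excerpts mention). The host and queue names are placeholders.

    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;

    public class AmqpPublishExample {
        public static void main(String[] args) throws Exception {
            ConnectionFactory factory = new ConnectionFactory();
            factory.setHost("localhost");                 // placeholder broker host

            Connection connection = factory.newConnection();
            Channel channel = connection.createChannel();

            // Declare a simple queue and publish to it through the default exchange.
            channel.queueDeclare("tasks", false, false, false, null);
            channel.basicPublish("", "tasks", null, "hello over AMQP".getBytes("UTF-8"));
            System.out.println("Message sent");

            channel.close();
            connection.close();
        }
    }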

 

 

204. Matched text for the word optimizing:

 

    1. He also has experience optimizing SQL queries for database interactions. Apache Flume Distributed Log Collection for Hadoop

 

    1. Ganglia’s protocols were carefully designed, optimizing at every opportunity to reduce overhead and achieve high performance. Monitoring with Ganglia

 

    1. What about CSS? Requirements for optimizing CSS are in many cases no different from JavaScript. Gradle in Action

 

    1. The routing is a very powerful mechanism for optimizing a cluster. Mastering ElasticSearch

 

    1. Comparing the two MapReduce columns and the Sawzall column shows the importance of optimizing. FlumeJava - Easy, Efficient Data-Parallel Pipelines

 

    1. Geolocation: adding geolocation to your documents, geolocation-aware search, sorting by distance, geo-shapes, and optimizing geo-queries. Elasticsearch The Definitive Guide

 

    1. This might mean optimizing the channels the database uses to write to the underlying filesystem. Cassandra The Definitive Guide

 

    1. Since joins are one of the most important aspects of our system, we have spent considerable time on implementing different types of joins and optimizing them. Tenzing - A SQL Implementation On The MapReduce Framework

 

    1. In an ideal world, consumers would always be running as fast as the fastest producer, which neatly brings us to the next section on optimizing message consumers. ActiveMQ in Action

 

    1. This is advantageous when further optimizing the memory footprint, see Section 3 PowerDrill - Processing a Trillion Cells per Mouse Click

 

 

205. Matched text for the word jspa:

 

    1. Please post comments or corrections to the Author Online forum: http://www.manning-sandbox.com/forum.jspa?forumID=828 Solr in action

 

    1. Please post comments or corrections to the Author Online forum at http://www.manning-sandbox.com/forum.jspa?forumID=451 Lucene in Action 2nd Edition

 

 

206. Matched text for the word workloads:

 

    1. In HPC clusters, CPUs can offload data parallel workloads to the accelerators. Monitoring with Ganglia

 

    1. These are the sort of workloads experienced by the majority of web applications. CouchDB The Definitive Guide

 

    1. A cluster typically hosts a multitude of distributed applications that share resources, have widely varying workloads, and run on machines with different hardware parameters. Dremel - Interactive Analysis of WebScale Datasets

 

    1. The master can easily keep up with this rate, and therefore is not a bottleneck for these workloads. GFS - The Google File System

 

    1. Tuning Memtables for write-heavy workloads Cassandra is designed so that all disk write operations are serial. Cassandra High Performance Cookbook

 

    1. A dynamic distributed metadata cluster provides extremely efficient metadata management and seamlessly adapts to a wide range of general purpose and scientific computing file system workloads. Ceph A Scalable, High-Performance Distributed File System

 

    1. This boundary is configurable, but we have found 256K to work well for us in our production workloads. Cassandra A Decentralized Structured Storage System

 

    1. This may be because our Java workloads tend to be closer to the end user; these sorts of applications often handle a wider mix of requests and consequently have comparatively complex control paths. Dapper a Large-Scale Distributed Systems Tracing Infrastructure

 

    1. Although we have considered the statistical properties of such a distribution and demonstrated the effect of load variance on performance for certain workloads, the interaction of workload, PG distribution, and replication can be complex. RADOS A Scalable, Reliable Storage Service for Petabyte-scale Storage Clusters.

 

    1. In fact, the ability to handle application workloads that require high performance at significant write volumes with many concurrent client threads is one of the primary features of Cassandra. Cassandra The Definitive Guide

 

 

207. Matched text for the word sharded:

 

    1. Scaling” is an imprecise word, but in this chapter we’ll be dealing with the aspect of putting together a partitioned or sharded cluster that will have to grow at an increasing rate over time from day one. CouchDB The Definitive Guide

 

    1. This interleaving of tables to form directories is significant because it allows clients to describe the locality relationships that exist between multiple tables, which is necessary for good performance in a sharded, distributed database. Spanner - Google’s Globally-Distributed Database

 

    1. Most MapReduces have a single (possibly sharded) input source and a single Mapper, but in general a single MapReduce can have multiple input sources and associated Mappers. FlumeJava - Easy, Efficient Data-Parallel Pipelines

 

    1. The value of individual records is often multiplied by the way they are partitioned, sharded, or sorted. MapReduce Design Patterns

 

    1. Sharded cache —  Every CPU is dedicated to a specific portion of data. Professional Hadoop Solutions

 

    1. Next-generation cluster-level file system; automatically sharded metadata layer; data typically written using Reed-Solomon (1.5x); client-driven replication, encoding and replication; metadata space has enabled availability analyses. Storage Architecture and Challenges

 

    1. Fortunately, Elasticsearch is sharded by default, as you'll come to understand, making it easy to spread your data in a cluster of multiple instances. Elasticsearch in Action MEAP V07 cHAPTER 1

 

    1. And if it does offer scalability, does it imply specific steps to do so? The easiest solution would be to add one machine at a time, while sharded setups (especially those not supporting virtual shards) sometimes require for each shard to be increased simultaneously because each partition needs to be equally powerful. HBase:The Definitive Guide

 

    1. These use cases would have workloads that are challenging to scale because of very high write throughput, massive datasets, unpredictable growth, or other patterns that may be difficult or suboptimal in a sharded RDBMS environment. Apache Hadoop Goes Realtime at Facebook

 

    1. All this means that no application code has to handle these logistics; your application code communicates with a sharded cluster just as it speaks to a single node. MongoDB in Action
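
Here is a tiny, library-independent sketch of the core idea behind sharding: routing a record key to one of N partitions by hashing. Real systems layer virtual nodes, rebalancing, and replication on top of this.

    public class ShardRouter {
        private final int shardCount;

        public ShardRouter(int shardCount) {
            this.shardCount = shardCount;
        }

        // Map a record key to a shard index in [0, shardCount).
        public int shardFor(String key) {
            int hash = key.hashCode() & Integer.MAX_VALUE;   // mask keeps the hash non-negative
            return hash % shardCount;
        }

        public static void main(String[] args) {
            ShardRouter router = new ShardRouter(4);
            for (String key : new String[] {"user:1", "user:2", "user:3"}) {
                System.out.println(key + " -> shard " + router.shardFor(key));
            }
        }
    }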

 

 

208. Matched text for the word latin:

 

    1. Pig uses Pig Latin as the programming language, which is extensible and easy to use. Hadoop Operations and Cluster Management Cookbook

 

    1. A Pig Latin program combines high-level declarative operators similar to those in SQL, together with named intermediate variables representing edges in the dataflow graph between operators. FlumeJava - Easy, Efficient Data-Parallel Pipelines

 

    1. Pig operations are written in Pig Latin and take a more imperative form. Hadoop RealWorld Solutions Cookbook

 

    1. Pig provides Pig Latin, an easy-to-use high-level query language similar to SQL. Incoop MapReduce for Incremental Computations

 

    1. This section describes how to use Pig Latin script to find countries with 2000$ GNI sorted by the same criterion from the HDR dataset. Hadoop MapReduce Cookbook

 

    1. It uses its own language called PigLatin to model and operate on data. Hadoop in Practice

 

    1. The language used to express data flows, called Pig Latin. Hadoop:The Definitive Guide 3rd Edition

 

    1. Computing similar documents efficiently, using a simple Pig Latin script. Hadoop in Action

 

    1. Included in the platform is a high-level language called Pig Latin and a compiler that translates programs written in Pig Latin into sequences of MapReduce jobs. Cassandra The Definitive Guide

 

    1. The overlap between the projects is that whereas Hive looks to present a familiar SQL-like interface to data, Pig uses a language called Pig Latin that specifies dataflow pipelines. Hadoop Beginners Guide

 

 

209. Matched text for the word dns:

 

    1. Finally, if you want a reverse DNS lookup of the hostname to be used instead of IP as a value, set the useIP property to false. Apache Flume Distributed Log Collection for Hadoop

 

    1. The addresses may be specified as IP addresses or DNS hostnames and may optionally be suffixed by a colon followed by the port number where the gmond tcp_accept_channel is to be found. Monitoring with Ganglia

 

    1. Crawlers have different states: DNS lookup, connecting to host, send request, receiving response. The anatomy of a large-scale hypertextual Web search engine short

 

    1. This should allow the cluster to continue functioning even with DNS issues since Cassandra communicates exclusively by IP address. Cassandra High Performance Cookbook

 

    1. A major performance stress is DNS lookup so each crawler maintains a DNS cache. The anatomy of a large-scale hypertextual Web search engine

 

    1. It then updates the DNS tables, replacing the IP address of the failed replica with that of the new one. Chubby - The lock service for loosely-coupled distributed systems

 

    1. Unfortunately, changing only the hostname is not sufficient as the DataNodes cache the DNS entry.) You’ll also have to run the backup node as a NameNode by executing bin/start-dfs.sh on it. Hadoop in Action

 

    1. Round-Robin DNS Another option is to create a record in DNS that represents the set of servers in the cluster. Cassandra The Definitive Guide

 

    1. Add the host's DNS name or IP address to the slaves file on the node from which you usually run commands such as slaves.sh or cluster start/stop scripts. Hadoop Beginners Guide

 

    1. A DNS-based load-balancing system selects a cluster by accounting for the user’s geographic proximity to each physical cluster. Web search for a planet the google cluster architecture
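
Here is a minimal sketch of forward and reverse DNS lookups with java.net.InetAddress; the host name and IP address are placeholders.

    import java.net.InetAddress;

    public class DnsLookupExample {
        public static void main(String[] args) throws Exception {
            // Forward lookup: host name to IP addresses.
            for (InetAddress address : InetAddress.getAllByName("www.example.com")) {
                System.out.println(address.getHostAddress());
            }

            // Reverse lookup: IP address back to a host name
            // (falls back to the IP itself if no PTR record exists).
            InetAddress byIp = InetAddress.getByName("93.184.216.34");
            System.out.println(byIp.getCanonicalHostName());
        }
    }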

 

 

210. Matched text for the word parsers:

 

    1. The Saxon JAR was in the classpath before the Jetty JAR, so it was being used to deliver the XML parsers, and the parser was not validating. Pro Hadoop

 

    1. During query time, if you use one of the provided query parsers, your query will be analyzed. Mastering ElasticSearch

 

    1. Most XML parsers operate on whole XML documents, so if a large XML document is made up of multiple input splits, it is a challenge to parse these individually. Hadoop:The Definitive Guide 3rd Edition

 

    1. Tree parsers, such as the Document Object Model (DOM) parser, that read an XML document into a tree structure. Core Java Volume II Advanced Features 9th Edition

 

    1. Nutch has parsers for HTML, a web crawler, a link-graph database, and other extra components necessary for a web search engine. Hadoop in Action

 

    1. The parsers I use to extract API information from the source files and javadoc files are created using the Antlr parser generator developed by Terence Parr. Java in a Nutshell 5th Edition

 

    1. Hadoop also provides mechanisms through its InputFormat and OutputFormat classes that provide implementations of common file formats and likewise remove the need of having to write file parsers for any but custom file types. Hadoop Beginners Guide

 

    1. In this section the performance aspects related to the use of various parsers and serializers are discussed starting with a brief look at the steps involved in a typical XML processing cycle. Java Performance

 

    1. Tika is actually a framework that hosts plugin parsers for each supported document type. Lucene in Action 2nd Edition

 

    1. The indentation of the actual output was changed to fit the page.) For those not familiar with parsers and tokenizers, this can look overwhelming. Programming Hive
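
To contrast with the tree-building DOM approach quoted earlier, here is a minimal streaming (SAX) parser sketch; the file name and element name are illustrative.

    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.Attributes;
    import org.xml.sax.helpers.DefaultHandler;

    public class SaxExample {
        public static void main(String[] args) throws Exception {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();

            // A SAX parser streams the document and fires callbacks; it never builds a full tree.
            parser.parse("books.xml", new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes attributes) {
                    if ("title".equals(qName)) {
                        System.out.println("Found a <title> element");
                    }
                }
            });
        }
    }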

 

 

211. Matched text for the word ajax:

 

    1. It is easy to take the effects of the modern websites for granted, but the functionality of pop-up lists during searches, customization, and the in-page experience (traditionally referred to as AJAX) of a dynamic website. Getting Started with CouchDB

 

    1. An interesting point to note is the large number of HTTP POST operations, which is typical of a Web 2.0 site where a large proportion of the traffic is AJAX requests from JavaScript applications running in client browsers. Monitoring with Ganglia

 

    1. And most recently, he has been specializing in Web 2.0, Ajax, portals, and cloud computing. Pro Hadoop

 

    1. We’ll build the first iteration of the example application without using show or list, because writing Ajax queries against the JSON API is a better way to teach CouchDB as a database. CouchDB The Definitive Guide

 

    1. Instead of submitting the data to the server for each operation and rerendering the page, you change the functionality to exchange data via asynchronous JavaScript calls (AJAX) in the background. Gradle in Action

 

    1. He introduced the concept of JavaScript frameworks and Ajax to the company’s web division in 2008, developing a PL/SQL and JavaScript framework that is now used in every Core web product. Beginning CouchDB

 

    1. New features introduced in version 2 of the JSF specification, such as composite components and Ajax, are only exposed to page authors using facelets. Java EE 7 Essentials

 

    1. ActiveMQ could be used in the Web environment through its REST and Ajax APIs. ActiveMQ in Action

 

    1. If a page has multiple Ajax requests, it is important to measure the overall performance of the page by combining all associated requests. Java Performance

 

    1. Its most visible part is an interactive Web UI making heavy use of AJAX with the help of the Google Web Toolkit [16] PowerDrill - Processing a Trillion Cells per Mouse Click

 

 

212. Matched text for the word emr:

 

    1. If you’re comfortable with the Amazon Elastic MapReduce (EMR) service, Mahout algorithms can be run as follows: Mahout in Action

 

    1. An alternative method of running a MapReduce with the Amazon cloud is to use Amazon Elastic MapReduce (EMR) Hadoop Operations and Cluster Management Cookbook

 

    1. You learn about Elastic MapReduce (EMR) and additional AWS services (such as S3, CloudWatch, Simple Workflow, and so on) that can be used to supplement Hadoop’s functionality. Professional Hadoop Solutions

 

    1. This chapter discusses how to use Amazon Elastic MapReduce (EMR), the hosted Hadoop infrastructure, to execute traditional MapReduce computations as well as Pig and Hive computations on the Amazon EC2 cloud infrastructure. Hadoop MapReduce Cookbook

 

    1. The trade-off is that you lose some control over how the cluster works and you have to pay extra for this EMR service. Hadoop in Action

 

    1. A dual approach In this book we will not be limiting our discussion to EMR or Hadoop hosted on Amazon EC2; we will be discussing both the building and the management of local Hadoop clusters (on Ubuntu Linux) in addition to showing how to push the processing into the cloud via EMR. Hadoop Beginners Guide

 

    1. Rather than allowing customers to rent machines for any type of process (like Amazon’s Elastic Cloud Computing [EC2] service and other cloud services), EMR allows users to rent virtual Hadoop clusters. Programming Pig

 

    1. If you already use Amazon Web Services, the fastest path to setting up Hive for learning is to run a Hive-configured job flow on Amazon Elastic MapReduce (EMR) Programming Hive

 

 

213. Matched text for the word configurable:

 

    1. Configuration An interface into the configuration of your CouchDB installation that allows you to edit the different configurable parameters. Getting Started with CouchDB

 

    1. On a configurable schedule, snmp2ganglia polls each host for the OIDs associated with it and delivers those values to Ganglia as metrics. Monitoring with Ganglia

 

    1. Because of the way reverse HTTP proxies work, this automatically allows configurable security, encryption, load distribution, compression, and, of course, aggressive caching of your database resources. CouchDB The Definitive Guide

 

    1. In this simple example, we will develop a river that will be able to write last modified date of a given website and will check and update that data periodically and the period will be configurable. Mastering ElasticSearch

 

    1. Column families have several configurable parameters that are specified in a with clause and separated by and: Cassandra High Performance Cookbook

 

    1. This boundary is configurable, but we have found 256K to work well for us in our production workloads. Cassandra A Decentralized Structured Storage System

 

    1. A lot of what we've looked at here is configurable: from the node name to the Gateway settings. Elasticsearch in Action MEAP V07 cHAPTER 1

 

    1. The default sort order is not configurable; by default, records are returned in the order in which they are written. Cassandra The Definitive Guide

 

    1. This is all configurable and will be discussed later in chapter 4 ActiveMQ in Action

 

    1. Internally, each SSTable contains a sequence of blocks (typically each block is 64KB in size, but this is configurable) Bigtable - A Distributed Storage System for Structured Data

 

 

214. Matched text for the word partitioner:

 

    1. Each map task in Hadoop is broken into the following phases: record reader, mapper, combiner, and partitioner. MapReduce Design Patterns

 

    1. Using the row key of data, the partitioner calculates a token. Cassandra High Performance Cookbook

 

    1. Finally, write the code to set up a normal MapReduce job, but tell the MapReduce framework to use our own partitioner and comparator classes: Hadoop RealWorld Solutions Cookbook

 

    1. Hadoop uses HashPartitioner as the default Partitioner implementation to calculate the distribution of the intermediate data to the reducers. Hadoop MapReduce Cookbook

 

    1. The partitioner's job is to logically funnel map outputs to the. Hadoop in Practice

 

    1. There are overloaded template factory methods for setting a combiner, partitioner, record reader, or record writer. Hadoop:The Definitive Guide 3rd Edition

 

    1. Hadoop enforces this strategy by use of the HashPartitioner class. Hadoop in Action

 

    1. Depending on your security constraints and partitioner, it’s fine to run multiple keyspaces on the same cluster. Cassandra The Definitive Guide

 

    1. This functionality is provided by the HashPartitioner class within the org. Hadoop Beginners Guide

 

    1. Further, each key is assigned to a partition using a component called the partitioner. Hadoop Operations
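
As a concrete example, here is a minimal custom partitioner sketch against the org.apache.hadoop.mapreduce API, in the spirit of the hash-based HashPartitioner the excerpts describe; the first-letter bucketing rule is arbitrary.

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    // Routes each intermediate key to a reducer; plugged in with job.setPartitionerClass(...).
    public class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {

        @Override
        public int getPartition(Text key, IntWritable value, int numPartitions) {
            String k = key.toString();
            if (k.isEmpty()) {
                return 0;
            }
            // Bucket keys by their lower-cased first character (a char is always non-negative).
            return Character.toLowerCase(k.charAt(0)) % numPartitions;
        }
    }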

 

 

215. Matched text for the word ubuntu:

 

    1. You'd be hard pressed to find such a company using the bleeding-edge Linux kernel rather than something like Red Hat Enterprise Linux (RHEL), CentOS, Ubuntu LTS, or any of the other distributions that aim for stability and compatibility. Apache Flume Distributed Log Collection for Hadoop

 

    1. For example, if you use Ubuntu 9.10 (Karmic) or later, then CouchDB comes pre-installed. Getting Started with CouchDB

 

    1. Installing The rrdcached package can be installed on Debian-based distributions (Debian, Ubuntu, Linux, Mint, etc.) by using apt: Monitoring with Ganglia

 

    1. Be sure to read through the Ubuntu-specific system documentation that can be found under /usr/share/couchdb. CouchDB The Definitive Guide

 

    1. Assume that the box is based on the Linux distribution Ubuntu. Gradle in Action

 

    1. Binary installers are available for Ubuntu, Fedora, CentOS, FreeBSD, and Mac OS X systems through each system’s individual package manager. Beginning CouchDB

 

    1. We’ll be using Ubuntu, but these instructions should be easily adaptable to your operating system. Scaling CouchDB

 

    1. You can simplify your deployment of Hadoop by using their free Hadoop distribution, in either RPM or Ubuntu/Debian packages. Hadoop in Action

 

    1. If you’re on a Linux distribution such as Ubuntu, it couldn’t be easier to get Git. Cassandra The Definitive Guide

 

    1. We assume you have a recent distribution of Ubuntu, but as long as you have command-line Linux familiarity any modern distribution will suffice. Hadoop Beginners Guide

 

 

216. Matched text for the word frontend:

 

    1. For example, a cluster that is used for backend batch processing and not serving frontend requests. HBase Administration Cookbook

 

    1. If it seems like I’m reading your mind, it’s because the Ganglia authors are engineers like you, who designed Ganglia’s visualization UI, gweb, from their own notion of the ideal data visualization frontend. Monitoring with Ganglia

 

    1. This choice of replica sites was made to cope with outages due to potential major natural disasters, and also the choice of their frontend sites. Spanner - Google’s Globally-Distributed Database

 

    1. Her interests and experiences lie across multiple tiers, from web frontend GUI to middleware, from middleware to backend SQL RDBMS and NoSQL data storage. Cassandra High Performance Cookbook

 

    1. You will also need the Ganglia meta daemon (gmetad) running on at least one node, and another node running the Ganglia web frontend. Hadoop RealWorld Solutions Cookbook

 

    1. The frontend to the exception monitoring service then provides links from specific exception reports to their respective distributed traces. Dapper a Large-Scale Distributed Systems Tracing Infrastructure

 

    1. B.6 HDFS proxy The HDFS proxy is a component in the Hadoop contrib that provides a web app proxy frontend to HDFS. Hadoop in Practice

 

    1. Counters At StumbleUpon, the first production feature deployed on HBase was keeping counters for the stumbleupon.com frontend. Hadoop:The Definitive Guide 3rd Edition

 

    1. As soon as you have to retain information about your users, products, sessions, orders, and so on, you are typically going to use some storage backend providing a persistence layer for the frontend application server. HBase:The Definitive Guide

 

    1. Client: FlockDB uses the Thrift 0.2 client, and Twitter has also written a Ruby frontend that offers a richer interface. Cassandra The Definitive Guide

 

 

217、单词 txt 的匹配文本:

 

    1. Assume you want to determine all files within a directory and write their names to a new file named allfiles.txt. Gradle in Action

 

    1. For each record, numWords is incremented by the number of terms in input.txtField returned by the CountWords function. Dremel - Interactive Analysis of WebScale Datasets

 

    1. In the following example we have the filterFiles task, where we define a file collection of two files with the names INSTALL.txt and README. Gradle Effective Implementation Guide

 

    1. Printing out the files property of goodPoems (or otherwise inspecting the contents of the collection) shows that it contains all of the .txt files in the src/main/resources directory, but not the file whose name starts with shelley. Gradle Beyond The Basics

 

    1. Reads the stocks.txt file from the local filesystem and writes the Avro. Hadoop in Practice

 

    1. Create some text file on your local filesystem called example.txt. Hadoop in Action

 

    1. You might want to check out the README.txt file, as it will have the list of dependencies required to run the tool. Cassandra The Definitive Guide

 

    1. Place the file on HDFS: $ hadoop fs -put states.txt states.txt. Hadoop Beginners Guide

 

    1. README.txt—A file containing some URLs to documentation to get new users started with ActiveMQ. ActiveMQ in Action

 

    1. In this case, the copyFiles task will copy any files with the .xml, .properties, or .txt extensions from the resources directory to the target directory. Building and Testing with Gradle

 

 

218、单词 schemas 的匹配文本:

 

    1. Structure and Consistency There are no constraints or schemas for a document database, but that doesn’t mean that you can ignore aspects like the simplicity and consistency of the format. Getting Started with CouchDB

 

    1. Even though many projects happily use Bigtable [9], we have also consistently received complaints from users that Bigtable can be difficult to use for some kinds of applications: those that have complex, evolving schemas, or those that want strong consistency in the presence of wide-area replication. Spanner - Google’s Globally-Distributed Database

 

    1. There is nothing in the standard DSL to describe the servers to which a deployment might be targeted, the database schemas associated with the application, or the operations exposed by an underlying source control tool. Gradle Beyond The Basics

 

    1. It is unclear to us how effective schemas are in achieving transparency in practice. Dapper a Large-Scale Distributed Systems Tracing Infrastructure

 

    1. To do this you need to set the job configuration with the schema information for both schemas. Hadoop in Practice

 

    1. Finally, you can look at the data as generic database relations with well-defined schemas, in a simple comma-separated format. Hadoop in Action

 

    1. One often-lauded feature of relational database systems is the rich schemas they afford. Cassandra The Definitive Guide

 

    1. The metadata server provides an API to store and fetch metadata such as table names and schemas, and pointers to the underlying data. Tenzing - A SQL Implementation On The MapReduce Framework

 

    1. Unlike traditional relational databases that require structured data with well-defined schemas, MapReduce and Hadoop work best on semi-structured or unstructured data. Hadoop Beginners Guide

 

    1. Clients can control the locality of their data through careful choices in their schemas. Bigtable - A Distributed Storage System for Structured Data

 

 

219、单词 javadoc 的匹配文本:

 

    1. Finally, we cover the marvelous javadoc tool that formats your code comments as a set of hyperlinked web pages. Core Java Volume I Fundamentals 9th Edition

 

    1. Its main purpose is to provide automation for typical tasks needed in Java projects, such as compiling source files to classes, running unit tests, packaging JAR files, and creating Javadoc documentation. Gradle in Action

 

    1. The task javadoc is used to generate Javadoc documentation, and is in the Documentation tasks section. Gradle Effective Implementation Guide

 

    1. A modifier is a keyword such as public or static.) The name of each annotation is preceded by an @ symbol, similar to Javadoc comments. Core Java Volume II Advanced Features 9th Edition

 

    1. Not coincidentally, they are also the elements for which the Javadoc utility generates documentation in its default mode of operation. Effective Java 2nd Edition

 

    1. An example of heap pollution Some methods should not be marked as safe, as illustrated with the following code adapted from the javadoc description of the @SafeVarargs annotation (http://download. Java 7 New Features Cookbook

 

    1. However, take a look at what we have done here and browse through the Javadoc for these classes. Hadoop Beginners Guide

 

    1. For more information on the advanced features of metrics2, see the Javadoc. Hadoop Operations

 

    1. But if you want more in-depth information, see the Javadoc for the JMS Message type. ActiveMQ in Action

 

    1. Use Javadoc to place comments before any of the classes, method, and fields that you want to document. Java 7 Recipes A Problem-Solution Approach
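
To tie the quotes above together: a Javadoc comment is a /** ... */ block placed immediately before the class, method, or field it documents, and the javadoc tool (or a build tool's javadoc task) turns those comments into hyperlinked HTML pages. A minimal sketch follows; the WordCounter class and its members are invented purely for illustration.

    /**
     * Counts how often a word occurs in a piece of text.
     */
    public class WordCounter {

        /**
         * Returns the number of times {@code word} appears in {@code text}.
         *
         * @param text the text to scan, must not be null
         * @param word the word to count, must not be null or empty
         * @return the number of occurrences, zero if the word is absent
         */
        public int count(String text, String word) {
            int occurrences = 0;
            int index = text.indexOf(word);
            while (index != -1) {
                occurrences++;
                index = text.indexOf(word, index + word.length());
            }
            return occurrences;
        }
    }

Running "javadoc WordCounter.java" on this file would generate the documentation pages described above; in a Gradle build the same comments feed the javadoc task mentioned in the Gradle quotes.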

 

 

220、单词 aspectj 的匹配文本:

 

    1. However, the good news for Spring and AOP developers is that starting from version 2.0, Spring provides much tighter integration with AspectJ. Pro Spring 3

 

    1. For example, it’s possible to use the Spring Framework and AspectJ to enable dependency injection in our domain objects. Pro Spring MVC

 

    1. The Spring Framework also supports aspect-oriented programming with both a simpler approach called Spring AOP and the more powerful AspectJ approach. Spring in Practice

 

    1. The most popular, perhaps, is AspectJ which requires a special compilation process. Pro Spring Security

 

    1. In AspectJ parlance, each execution point is called a join point, and a particular query over join points is a pointcut. Programming Scala

 

    1. A dedicated project named Equinox Aspects allows you to use LTW and AspectJ within the Equinox container. Spring Dynamic Modules in Action

 

    1. Spring supports two implementations of AOP: one using the AOP Alliance project (essentially wrapping objects as just described) and the other using AspectJ. Spring Integration in Action

 

    1. Reports on the language AspectJ include some examples from this book expressed in an aspect-oriented fashion. Concurrent Programming in Java Design Principles and Pattern

 

    1. The more advanced mapping mode leverages AspectJ to provide a live, connected representation of the graph elements bound to the domain objects. Spring Data Modern Data Access for Enterprise Java

 

    1. Don’t worry if you don’t know AspectJ or the details of how AspectJ pointcut expressions are written. Spring in Action 4th Edition

 

 

221、单词 keystore 的匹配文本:

 

    1. Constructing this object is beyond the scope of this chapter but in general requires a certificate and secret key stored in a keystore object. Netty in Action MEAP V08

 

    1. If the certificate is generated by a well-known trusted authority, their root certificates are most likely already in the default keystore (cacerts) Jenkins Continuous Integration Cookbook

 

    1. For example, the properties related to SSL and the keystore are only used if you are using LDAP over SSL. Professional Hadoop Solutions

 

    1. The certificates themselves are held in a location (usually a file) called the keystore. Java Security 2nd Edition

 

    1. In most Java environments, a keystore is used to store the private keys, and a truststore to store the trusted certificates. Oracle JRockit The Definitive Guide

 

    1. Here is how Alice creates a keystore, alice.certs, and generates a key pair with alias alice: Core Java Volume II Advanced Features 9th Edition

 

    1. This is accomplished from the command prompt by executing the following command: keytool -genkey -keystore mySrvKeystore -keyalg RSA. Java 7 New Features Cookbook

 

    1. Assuming that a keystore exists on your system (see the keytool documentation in Chapter 8), you can get one with code like the following: Java in a Nutshell 5th Edition

 

    1. Enter key password for (RETURN if same as keystore password): Java Network Programming 4th Edition

 

    1. Without creating and denoting the proper keystore and truststore, you can expect to see the following exceptions: ActiveMQ in Action
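
Several of the quotes describe a keystore as a password-protected file holding private keys and certificates, typically created with keytool. The following is a minimal sketch of opening such a file from Java; the file name mySrvKeystore echoes the keytool command quoted above, and the password is an assumption for the example.

    import java.io.FileInputStream;
    import java.security.KeyStore;
    import java.security.cert.Certificate;

    public class KeystoreDemo {
        public static void main(String[] args) throws Exception {
            char[] password = "changeit".toCharArray();   // assumed keystore password
            KeyStore keystore = KeyStore.getInstance("JKS");
            // Load the file created earlier with: keytool -genkey -keystore mySrvKeystore -keyalg RSA
            try (FileInputStream in = new FileInputStream("mySrvKeystore")) {
                keystore.load(in, password);
            }
            // List each alias and the type of certificate stored under it.
            java.util.Enumeration<String> aliases = keystore.aliases();
            while (aliases.hasMoreElements()) {
                String alias = aliases.nextElement();
                Certificate cert = keystore.getCertificate(alias);
                System.out.println(alias + " -> " + (cert == null ? "no certificate" : cert.getType()));
            }
        }
    }

A truststore, mentioned in the JRockit and ActiveMQ quotes, is loaded the same way; the difference lies only in what it holds (trusted certificates rather than private keys).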

 

 

222、单词 repl 的匹配文本:

 

    1. Here’s how to create that user account, which we’ll call repl: High Performance MySQL 3rd Edition

 

    1. PRIMARY" is the state of the member and "testReplSet" is an identifier for this set. MongoDB The Definitive Guide 2nd Edition

 

    1. There you typed node() at the REPL and got back test@mrhyde as a result. RabbitMQ in Action

 

    1. First, load the ring and a new ringbuilder object in a Python REPL: bk-admin-guide-cloud-latest

 

    1. Before you begin, you should have the Scala interpreter (REPL) running and ready. Functional Programming in Scala

 

    1. I start up a Clojure REPL and use it to poke around the problem API. Functional Programming Patterns in Scala and Clojure

 

    1. You can run them interactively at a REPL (read-eval-print loop) command line. Beginning Scala

 

    1. The --replSet option tells the instance the name of the replica set it is joining, as well as the name of at least one other member of the set. The Definitive Guide to MongoDB

 

    1. You get what’s called a “read-eval-print” loop, or REPL, in the world of Lisp, Scala, and so on. Java SE 8 for the Really Impatient

 

    1. Alternatively, a nice approach is to use Ruby’s interactive REPL, irb. MongoDB in Action

 

 

223、单词 mbean 的匹配文本:

 

    1. Several Flume components report metrics to the JMX platform MBean server. Apache Flume Distributed Log Collection for Hadoop

 

    1. This is readily visible only via the Ganglia reporting framework or through the JMX MBean interface. Pro Hadoop

 

    1. An Mbean with a randomly chosen number will be below, which you need to expand again. Cassandra High Performance Cookbook

 

    1. In order to find out how many documents you've indexed, you would look at the SolrIndexSearch Mbean. Solr 3 Enterprise Search Server

 

    1. FSDatasetMBean (Datanode): Datanode storage metrics, such as capacity and free storage space. Hadoop:The Definitive Guide 3rd Edition

 

    1. You do have to know the exact names of the MBean and the attribute or operation you want to get. HBase:The Definitive Guide

 

    1. MBeans and the platform MBean server will be discussed in the next chapter. Oracle JRockit The Definitive Guide

 

    1. CompactionManager implements an MBean interface so it can be introspected. Cassandra The Definitive Guide

 

    1. ActiveMQ can use an existing MBean server or even create its own if necessary. ActiveMQ in Action

 

    1. Java 6’s JConsole can also generate a heap dump using its HotSpotDiagnostics MBean. Java Performance
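
As the quotes note, components expose metrics by registering MBeans with a JMX MBean server, which tools such as JConsole can then browse. Below is a minimal, self-contained sketch using the platform MBean server; the Counter bean and the object name "demo:type=Counter" are invented for the example.

    import java.lang.management.ManagementFactory;
    import javax.management.MBeanServer;
    import javax.management.ObjectName;

    // By JMX convention the management interface must be named <ClassName>MBean.
    interface CounterMBean {
        long getCount();
        void reset();
    }

    class Counter implements CounterMBean {
        private long count;
        public synchronized void increment() { count++; }
        public synchronized long getCount() { return count; }
        public synchronized void reset() { count = 0; }
    }

    public class MBeanDemo {
        public static void main(String[] args) throws Exception {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            Counter counter = new Counter();
            server.registerMBean(counter, new ObjectName("demo:type=Counter"));
            counter.increment();
            System.out.println("Registered; current count = " + counter.getCount());
            Thread.sleep(60000);   // keep the JVM alive so JConsole can attach and inspect the MBean
        }
    }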

 

 

224、单词 bootstrap 的匹配文本:

 

    1. This means that a new user can easily bootstrap your application on an empty database. CouchDB The Definitive Guide

 

    1. You’ll learn how to apply Gradle’s standard plugins to bootstrap, configure, and run your application. Gradle in Action

 

    1. A component of the RM negotiates for the container to spawn this bootstrap process. Apache Hadoop YARN Yet Another Resource Negotiator

 

    1. Seed nodes do not auto bootstrap At least one seed node needs to be specified during initial deployment. Cassandra High Performance Cookbook

 

    1. In the bootstrap case, when a node needs to join a cluster, it reads its configuration file which contains a list of a few contact points within the cluster. Cassandra A Decentralized Structured Storage System

 

    1. The first thing you need to do is bootstrap the source code. Beginning CouchDB

 

    1. The examples of Bootstrap actions include providing custom configuration for Hadoop, installing of any dependent software, distributing a common dataset, and so on. Hadoop MapReduce Cookbook

 

    1. The first thing you need to do is download and install the “bootstrap” RPM, which will update your local Yum configuration to include Cloudera’s remote Yum repository: Hadoop in Practice

 

    1. Seed nodes will ignore the AutoBootstrap setting because it is assumed that they will be the first nodes in the cluster. Cassandra The Definitive Guide

 

    1. Confirm that the Proceed with no Bootstrap Actions radio button is selected and click on the Continue button. Hadoop Beginners Guide

 

 

225、单词 declarative 的匹配文本:

 

    1. This chapter explores Gradle’s declarative support for dependency management, version conflict resolution strategies, and the inner workings of its cache. Gradle in Action

 

    1. Declarative builds and convention over configuration Gradle uses a Domain Specific Language (DSL) based on Groovy to declare builds. Gradle Effective Implementation Guide

 

    1. A Pig Latin program combines high-level declarative operators similar to those in SQL, together with named intermediate variables representing edges in the dataflow graph between operators. FlumeJava - Easy, Efficient Data-Parallel Pipelines

 

    1. Hive lets you build analytics and manage data using a declarative, SQL-like dialect known as HiveQL. Hadoop RealWorld Solutions Cookbook

 

    1. It allows you to write concise, declarative builds in an idiomatic language backed by rich, domain-specific functionality. Gradle Beyond The Basics

 

    1. On the other hand, a declarative language such as SQL is sometimes easier for analysts who prefer to just state the results one expects from a program. Hadoop in Action

 

    1. An important one is the Structured Query Language (SQL), which is feature-rich and uses a simple, declarative syntax. Cassandra The Definitive Guide

 

    1. Declarative query languages built on top of the core frameworks with intermediate to advanced optimizations. Tenzing - A SQL Implementation On The MapReduce Framework

 

    1. Remember that the major difference is that Pig is an imperative language (it defines how the process will be executed), while Hive is more declarative (defines the desired results but not how they will be produced) Hadoop Beginners Guide

 

    1. Thus you can express the activities of your build in a high-level, declarative language that is meaningful in your business and technology domain. Building and Testing with Gradle

 

 

226、单词 nio 的匹配文本:

 

    1. Java’s NIO API (non-blocking I/O), which is used for I/O operations such as processing a file, has been updated to take advantage of the Streams API. Java 8 in Action Lambdas, Streams and Functional-style Programming

 

    1. You can achieve greater server throughput by using features of the java.nio package. Core Java Volume II Advanced Features 9th Edition

 

    1. This chapter is concerned with the management of paths as represented by the java.nio. Java 7 New Features Cookbook

 

    1. The java.io, java.net, and java.nio APIs for performing I/O operations are discussed later in this chapter.) Unfortunately, the interrupt() method does not wake up a thread blocked in an I/O method of the java.io package. Java in a Nutshell 5th Edition

 

    1. Indeed, the java.nio package that provides this support is not part of any current or planned Java ME profiles, though it is found in Android. Java Network Programming 4th Edition

 

    1. From 1.4, native ByteBuffers (available with the java.nio packages) allow you to pass data to native libraries without necessarily passing the data through the JNI, which can be a significant gain. Java Performance Tuning 2nd Edition

 

    1. It discusses ActiveMQ URIs and all the transport connectors for both client-to-broker and broker-to-broker communications including TCP, NIO, STOMP, failover, SSL, HTTP/S, and much more. ActiveMQ in Action

 

    1. Java NIO frameworks also tend to simplify the programming of client-server type applications. Java Performance

 

    1. The advantage of the new NIO.2 is that it is asynchronous in nature. Java 7 Recipes A Problem-Solution Approach

 

    1. That class has methods to encode an array of bytes or a NIO ByteBuffer. Java SE 8 for the Really Impatient
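
The quotes above refer both to the original java.nio buffers and channels and to the NIO.2 file API (java.nio.file) introduced in Java 7. A small sketch combining the two, assuming a local file named example.txt exists:

    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;
    import java.util.List;

    public class NioDemo {
        public static void main(String[] args) throws Exception {
            Path path = Paths.get("example.txt");   // assumed input file

            // NIO.2: read the whole file as a list of lines.
            List<String> lines = Files.readAllLines(path, StandardCharsets.UTF_8);
            System.out.println("Read " + lines.size() + " lines");

            // Classic NIO: append bytes through a channel and a ByteBuffer.
            ByteBuffer buffer = ByteBuffer.wrap("appended line\n".getBytes(StandardCharsets.UTF_8));
            try (FileChannel channel = FileChannel.open(path,
                    StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
                channel.write(buffer);
            }
        }
    }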

 

 

227、单词 mongo 的匹配文本:

 

    1. The mongo shell is a crucial tool for using MongoDB and is used extensively throughout the rest of the text. MongoDB The Definitive Guide 2nd Edition

 

    1. To connect to the already running MongoDB server, create a Mongo client class with a hostname and port number: Scala in Action

 

    1. It’s generally applied to a number of recent nonrelational databases such as Cassandra, Mongo, Neo4J, and Riak. NoSQL Distilled

 

    1. To run the MongoDB server, simply go to the bin directory of the directory you just unpacked and run the file ./mongod. Pro Spring Security

 

    1. Exporting data from HDFS into MongoDB This recipe will use the MongoOutputFormat class to load data from an HDFS instance into a MongoDB collection. Hadoop RealWorld Solutions Cookbook

 

    1. Quick Start If you’re looking to set up a cluster to play around with as fast as possible, you can set one up in a minute or two using the mongo-snippets repository on Github. Scaling MongoDB

 

    1. The mongo application allows you to use the database shell; this shell enables you to accomplish practically anything you’d want to do with MongoDB. The Definitive Guide to MongoDB

 

    1. The mongo executable loads the shell and connects to a specified mongod process. MongoDB in Action

 

    1. This support is mainly implemented as XML namespace and support classes for Spring JavaConfig and allows us to easily set up access to a Mongo database, an embedded Neo4j instance, and the like. Spring Data Modern Data Access for Enterprise Java

 

    1. Suppose that you need to authenticate against users in a non-relational database such as Mongo or Neo4j. Spring in Action 4th Edition
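
Echoing the Scala in Action quote about creating a Mongo client with a hostname and port, here is a minimal sketch using the MongoDB Java driver of that era; the database name "test" and collection name "books" are arbitrary, and a mongod process is assumed to be running locally on the default port 27017.

    import com.mongodb.BasicDBObject;
    import com.mongodb.DB;
    import com.mongodb.DBCollection;
    import com.mongodb.MongoClient;

    public class MongoDemo {
        public static void main(String[] args) throws Exception {
            // Connect to the locally running mongod process.
            MongoClient mongo = new MongoClient("localhost", 27017);
            DB db = mongo.getDB("test");                      // hypothetical database name
            DBCollection books = db.getCollection("books");   // hypothetical collection name

            books.insert(new BasicDBObject("title", "MongoDB in Action"));
            System.out.println("documents in collection: " + books.count());

            mongo.close();
        }
    }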

 

 

228、单词 akka 的匹配文本:

 

    1. But we’re not forced to use Clojure, because there are ways to use STM directly in Java, thanks to nice tools like Multiverse and Akka. Programming Concurrency on the JVM

 

    1. The chapter dives into issues of designing actor-based systems and finishes with a demonstration of how the Akka actors library provides best practices by default. Scala in Depth

 

    1. At the end of the chapter you will build a large distributed and scalable application using Akka. Scala in Action

 

    1. Scala—For Scala there’s an AMQP library that can be used directly with the AKKA Scalability framework. RabbitMQ in Action

 

    1. They’re a big part of the actor model, a model for concurrent and distributed programming, that’s used by Scala’s Akka library and by Erlang, another functional language. Functional Programming Patterns in Scala and Clojure

 

    1. Akka A project to implement a platform for building fault-tolerant, distributed applications based on REST, Actors, etc. Programming Scala

 

 

229、单词 instantiated 的匹配文本:

 

    1. The only information the partitioner has is the key, the value, the number of partitions, and whatever data was made available to it when it was instantiated. Pro Hadoop

 

    1. Every task is backed by an actual task object that’s instantiated for you during Gradle’s configuration phase. Gradle in Action

 

    1. A new instance of a mapper is instantiated in a separate JVM instance for each map task that makes up part of the total job input. Professional Hadoop Solutions

 

    1. The withInput() method is called to provide input to the Mapper class that the MapDriver class was instantiated with. Hadoop RealWorld Solutions Cookbook

 

    1. Next, a Sawzall interpreter is instantiated for each piece of data. Interpreting the Data Parallel Analysis with Sawzall

 

    1. One UniqueId class is instantiated by a tsd process for each kind of UID stored in the table. HBase in Action

 

    1. That is, the same pipe assembly can be “instantiated” many times into new Flow, in the same application, without fear of any interference between them. Hadoop:The Definitive Guide 3rd Edition

 

    1. The setup() method of ImportMapper overrides the method called once when the class is instantiated by the framework. HBase:The Definitive Guide

 

    1. Users submit code in the form of MapReduce jobs to the cluster, which is instantiated and executed on worker nodes within the cluster. Hadoop Operations

 

    1. Then the publisher is instantiated and used to send a simple message 100 times. ActiveMQ in Action

 

 

230、单词 classifier 的匹配文本:

 

    1. You will first learn how to build and train a classifier model by “teaching” an algorithm with a series of examples. Mahout in Action

 

    1. For example, let’s say you want to extend your version management functionality by two more tasks: one that increments the major version of the project and another to do the same work for the minor version classifier. Gradle in Action

 

    1. They implemented a logistic regression classifier on top of Spark similar to the example in Section 6.1, but they used a distributed reduceByKey to sum the gradient vectors in parallel. Resilient Distributed Datasets - A Fault-Tolerant Abstraction for In-Memory Cluster Computing

 

    1. To set the filename of the archive, we use any of these properties: baseName, appendix, version, classifier, and extension. Gradle Effective Implementation Guide

 

    1. In this recipe, we will use a naive Bayes classifier from Apache Mahout to determine if a set of terms found in a movie review mean the movie had a negative or positive reception. Hadoop RealWorld Solutions Cookbook

 

    1. In addition, modules may be qualified by their classifier and type. Gradle Beyond The Basics

 

    1. Classification using Naive Bayes Classifier A classifier assigns inputs into one of the N classes based on some properties (features) of inputs. Hadoop MapReduce Cookbook

 

    1. First we look at creating an external node classifier, and we provide some simple examples of these for you to model your own on; then we demonstrate the use of the LDAP node classifier. Pro Puppet

 

    1. You’ll use recommenders to recommend movies similar to movies that users have already rated; you’ll write a classifier that can filter out spam emails; and, finally, we’ll look at how you can use clustering to discover structure in your data. Hadoop in Practice

 

    1. In ES2, this labeling is accomplished using a rule-driven classifier that uses a number of page features extracted during local analysis. Hadoop in Action

 

 

231、单词 stateful 的匹配文本:

 

    1. Pregel is also different because it implements a stateful model where long-lived processes compute, communicate, and modify local state, rather than a dataflow model where any process computes solely on input data and produces output data input by other processes. Pregel  -  A System for Large-Scale Graph Processing

 

    1. Since useful HTTPbased applications will most likely require a stateful model, this presents an interesting challenge. Java Network Programming and Distributed Computing

 

    1. What should reversing the stream of all prime numbers do? It should return the largest prime number, which mathematics tells us doesn’t exist.) We call these operations stateful operations. Java 8 in Action Lambdas, Streams and Functional-style Programming

 

    1. The problem is that the majority of implementations of actors are stateful (compared to the in-memory combiner pattern, described in Chapter 4) Professional Hadoop Solutions

 

    1. HBase’s Increment command is an example of such a stateful operation. HBase in Action

 

    1. Filters could be enhanced to become stateful, and therefore make decisions across row boundaries. HBase:The Definitive Guide

 

    1. An option has been added to disable passivation of stateful session beans. Java EE 7 Essentials

 

    1. Graphs and MapReduce – a match made somewhere The main reason graphs don't look like many other MapReduce problems is due to the stateful nature of graph processing, which can be seen in the path-based relationship between elements and often between the large number of nodes processed together for a single algorithm. Hadoop Beginners Guide

 

    1. Systems such as Linux-HA work well for stateless services such as static content serving, but for a stateful system such as the namenode, they’re insufficient. Hadoop Operations

 

    1. Stateful Almost all the UDFs are stateful by default; a UDF that is not stateful is rand() because it returns a different value for each invocation. Programming Hive

 

 

232、单词 invocations 的匹配文本:

 

    1. MapReduce implementations should strive to reduce startup latency by using techniques like worker processes that are reused across different invocations; MapReduce a flexible data processing tool

 

    1. Subsequent invocations of the gradle command will now reuse the daemon process. Gradle in Action

 

    1. The Map invocations are distributed across multiple machines by automatically partitioning the input data. MapReduce - Simplied Data Processing on Large Clusters

 

    1. But our input file hasn't changed between the task invocations, so the task doesn't have to be executed. Gradle Effective Implementation Guide

 

    1. Although, technically, these APIs can be used to write synchronous invocations, this must be done on top of fully asynchronous APIs. Professional Hadoop Solutions

 

    1. Then we again form chunks from the aggregate result of all the Combine invocations and recursively apply the Combine function to these new chunks. Incoop MapReduce for Incremental Computations

 

    1. If rereduce is true, however, the keys argument will be null, and the values argument will be an array of the results produced by the previous invocations of the reduce function. Beginning CouchDB

 

    1. The two invocations of the mapper will therefore be given the following input: Hadoop Beginners Guide

 

    1. Communication is performed through direct method invocations of the broker object. ActiveMQ in Action

 

    1. Command-line Java can be a hassle, but this task tries to take the hassle away and integrate command-line Java invocations into your build. Building and Testing with Gradle

 

 

233、单词 asynchronously 的匹配文本:

 

    1. Optimistic Replication Any member of a homogeneous replica group can accept mutations [23], which are asynchronously propagated through the group. Megastore - Providing Scalable, Highly Available Storage for Interactive Services

 

    1. Our message passing model allows us to amortize latency by delivering messages asynchronously in batches. Pregel  -  A System for Large-Scale Graph Processing

 

    1. The blocks in each stripe are prefetched asynchronously; the read-ahead cache typically achieves hit rates of 95% Dremel - Interactive Analysis of WebScale Datasets

 

    1. Object locks are similarly used to mask latency for large writes by acquiring locks and flushing data asynchronously. Ceph A Scalable, High-Performance Distributed File System

 

    1. Writes to local disk are the most expensive operation in Dapper’s runtime library, but their visible overhead is much reduced since each disk write coalesces multiple log file write operations and executes asynchronously with respect to the traced application. Dapper a Large-Scale Distributed Systems Tracing Infrastructure

 

    1. When each DataNode has completed writing the block locally, the blocks are moved from temporary storage into permanent storage, and each DataNode asynchronously notifies the NameNode of their block storage (steps A, B, and C) Hadoop in Practice

 

    1. Remember that these changes are forwarded to the replica shards asynchronously and there is no guarantee that they will arrive in the same order that they were sent. Elasticsearch The Definitive Guide

 

    1. It can break down in some cases, such as programs that process records asynchronously (say, by spawning threads) or buffer them to process in chunks, but it usually works. Hadoop in Action

 

    1. Unlike the commit log, data is written to this file asynchronously. Cassandra The Definitive Guide

 

    1. The rest of the steps in the process are handled asynchronously. ActiveMQ in Action

 

 

234、单词 jsf 的匹配文本:

 

    1. Currently, he is developing web applications using the latest Java technologies on the market (EJB 3.0, CDI, Spring, JSF, Struts, Pro Java 7 NIO.2

 

    1. For example, JEE components, web frameworks like JSF, and mobile technologies such as Android. Gradle in Action

 

    1. For example, Java Server Pages (JSP) and Java Server Faces (JSF) both have their own EL, and their syntaxes are different. Pro Spring 3

 

    1. This could be a JSP or JSF page, but it could also be a PDF or XML representation of a page. Pro Spring MVC

 

    1. (Simple HTML pages can also be used to provide a more limited user interface for Java EE applications.) Servlets, JSP pages, JSF applications, filters, and web event listeners typically execute in a web container and may respond to HTTP requests from web clients. Java EE 7 Specification

 

    1. JavaServer Faces (JSF) introduces pass-through attributes and elements that allow near-total control over the user experience of each individual element in the view. Java EE 7 Essentials

 

    1. Web technologies such as JavaServer Faces (JSF) and JavaServer Pages (JSP) rely heavily on the JavaBeans component model. Core Java Volume II Advanced Features 9th Edition

 

    1. Not all of them are discussed here—JSF and Jersey, just to name two. Java Performance

 

    1. Application frameworks such as Java Server Pages (JSP) and Java Server Faces (JSF) are built upon servlet technology. Java 7 Recipes A Problem-Solution Approach

 

    1. They are the foundation for technologies such as JavaServer Pages (JSP) and the backbone of web frameworks such as JavaServer Faces (JSF) Pro JPA2

 

 

235、单词 multithreaded 的匹配文本:

 

    1. Multithreaded [The] benefits of multithreading are better interactive responsiveness and real-time behavior. Core Java Volume I Fundamentals 9th Edition

 

    1. Due to the multithreaded nature of the web browser and web server, the read workload is likely to be random access. Monitoring with Ganglia

 

    1. This particular constraint is also why the map tasks may be multithreaded, while the reduce tasks are explicitly only single-threaded. Pro Hadoop

 

    1. However, the main bottleneck will be multithreaded access, which has poor performance. Mastering ElasticSearch

 

    1. The stress tool is a multithreaded load tester specifically for Cassandra. Cassandra High Performance Cookbook

 

    1. We expect you to understand basic concepts of concurrent programs, both multithreaded and concurrent processes. HBase in Action

 

    1. Most developers will use the multithreaded library, zookeeper_mt, as it supports both the synchronous and asynchronous APIs. Hadoop:The Definitive Guide 3rd Edition

 

    1. The NameNode is a multithreaded system and processes requests simultaneously from multiple clients. The Hadoop Distributed File System

 

    1. They allow you to prevent race conditions in multithreaded or shared-nothing application server design. HBase:The Definitive Guide

 

    1. Most socket-based web services (for example, HTTP daemons, servlet engines, and application servers) are multithreaded. Concurrent Programming in Java Design Principles and Pattern

 

 

236、单词 pointcut 的匹配文本:

 

    1. Pointcuts: A pointcut is a collection of joinpoints that you use to define when advice should be executed. Pro Spring 3

 

    1. The advice is associated with a pointcut expression and runs at any join point matching the pointcut. Pro Spring MVC

 

    1. In the next listing you supplement the breaker definition from recipe 14.1 with the Spring AOP interceptor bean (representing advice in general AOP parlance), a pointcut, and a Spring AOP advisor (representing an aspect in general AOP parlance) Spring in Practice

 

    1. Instead, it uses aspect-oriented programming to declare a pointcut at a method or set of methods, with an advice that performs checks for role membership when the pointcut matches. Spring Security 3.1

 

    1. One very good thing about the XML configuration for securing methods is that it can be applied to more than one method of more than one class at the same time using AspectJ pointcut expressions—the same way you would in the rest of Spring Security’s suite of products. Pro Spring Security

 

    1. In AspectJ parlance, each execution point is called a join point, and a particular query over join points is a pointcut. Programming Scala

 

    1. The method should have the following signature (pointcut expression language): Spring Integration in Action

 

    1. In both cases, the pointcut-ref attribute refers to a pointcut named embark. Spring in Action 4th Edition
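
To make the terminology in these quotes concrete, the sketch below is an annotation-style aspect: the @Pointcut expression is the query over join points (here, any public method of any class in a hypothetical com.example.service package), and the @Before advice runs whenever that pointcut matches. The package name and log message are assumptions for the example.

    import org.aspectj.lang.JoinPoint;
    import org.aspectj.lang.annotation.Aspect;
    import org.aspectj.lang.annotation.Before;
    import org.aspectj.lang.annotation.Pointcut;

    @Aspect
    public class LoggingAspect {

        // The pointcut: selects the execution join points of matching methods.
        @Pointcut("execution(public * com.example.service.*.*(..))")
        public void serviceMethods() {}

        // The advice: runs before every join point matched by the pointcut above.
        @Before("serviceMethods()")
        public void logEntry(JoinPoint joinPoint) {
            System.out.println("Entering " + joinPoint.getSignature());
        }
    }

With plain AspectJ this aspect is woven at compile or load time; with Spring AOP it would additionally be registered as a bean with aspect auto-proxying enabled.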

 

 

237、单词 util 的匹配文本:

 

    1. The value you passed is used in a call to java.util. Apache Flume Distributed Log Collection for Hadoop

 

    1. TableMapReduceUtil is a useful utility class for writing a MapReduce program over HBase. HBase Administration Cookbook

 

    1. The method defines one parameter of type java.util.Map called options. Gradle in Action

 

    1. Note the pmdMain, pmdTest, and pmdUtil tasks that are executed. Gradle Effective Implementation Guide

 

    1. This recipe uses the helper functions in the ByteBufferUtil class to convert byte data into strings. Cassandra High Performance Cookbook

 

    1. The Mongo Hadoop Adaptor also provides a convenient MongoConfigUtil class to help set up the job to connect to MongoDB as if it were a filesystem. Hadoop RealWorld Solutions Cookbook

 

    1. We will configure the TableMapper and the TableReducer using the utility methods given in the TableMapReduceUtil class. Hadoop MapReduce Cookbook

 

    1. MapFile can be thought of as a persistent form of java.util.Map (although it doesn’t implement this interface), which is able to grow beyond the size of a Map that is kept in memory. Hadoop:The Definitive Guide 3rd Edition

 

    1. The clock is stored as a long (representing the milliseconds since the Unix epoch), so we could wrap this in a new java.util.Date object if we wanted to. Cassandra The Definitive Guide

 

    1. The java.util.Map interface is the parent of commonly used classes such as HashMap and (through some library backward reengineering) even the original Hashtable. Hadoop Beginners Guide
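
Since several of the quotes revolve around java.util.Map and its common implementations, a tiny word-count sketch shows the typical usage; the sample sentence is invented.

    import java.util.HashMap;
    import java.util.Map;

    public class UtilDemo {
        public static void main(String[] args) {
            String sentence = "to be or not to be";   // purely illustrative input
            Map<String, Integer> counts = new HashMap<String, Integer>();
            for (String word : sentence.split(" ")) {
                Integer current = counts.get(word);
                counts.put(word, current == null ? 1 : current + 1);
            }
            System.out.println(counts);   // e.g. {not=1, be=2, or=1, to=2}
        }
    }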

 

 

238、单词 reusable 的匹配文本:

 

    1. A dependency is generally an external, reusable library in the form of a JAR file (for example, Log4J for logging support) Gradle in Action

 

    1. Build logic is described with a powerful DSL and empowers developers to create reusable and maintainable build logic. Gradle Effective Implementation Guide

 

    1. Generally the default, unbounded filter cache is fine with the soft type and proper queries that are paying attention to make the caches reusable. Mastering ElasticSearch

 

    1. We took a look at other design patterns books that have been written in the past as inspiration, particularly Design Patterns: Elements of Reusable Object-Oriented Software, by Gamma et al. MapReduce Design Patterns

 

    1. You can see that those methods are reusable, and after they are created, the generation of a dynamic Workflow becomes a pretty trivial task. Professional Hadoop Solutions

 

    1. Not only that, but you’ll build it in a way that’s reusable for other kinds of queries. HBase in Action

 

    1. For each group in the regular expression, set the appropriate column in the array—the row is a reusable ArrayList that was created in. Hadoop in Practice

 

    1. As another benefit, UDFs tend to be more reusable than the libraries developed for writing MapReduce programs. Hadoop:The Definitive Guide 3rd Edition

 

    1. When you write a reusable script, it’s generally parameterized such that you can vary its operation for each run. Hadoop in Action

 

    1. This model of having reusable mapper and reducer implementations is one thing to remember, especially in combination with the fact that often the best starting point for a new MapReduce job implementation is an existing one. Hadoop Beginners Guide

 

 

239、单词 reilly 的匹配文本:

 

    1. Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of O’Reilly Media, Inc., Getting Started with CouchDB, the cover image of a hawk’s bill sea turtle, and related trade dress are trademarks of O’Reilly Media, Inc. Getting Started with CouchDB

 

    1. Monitoring with Ganglia, the image of a Porpita pacifica, and related trade dress are trademarks of O’Reilly Media, Inc. Monitoring with Ganglia

 

    1. CouchDB: The Definitive Guide, the image of a Pomeranian dog, and related trade dress are trademarks of O’Reilly Media, Inc. CouchDB The Definitive Guide

 

    1. O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Scaling CouchDB

 

    1. Gradle Beyond the Basics, the image of a Belgian shepherd dog, and related trade dress are trademarks of O’Reilly Media, Inc. Gradle Beyond The Basics

 

    1. Hadoop: The Definitive Guide, the image of an elephant, and related trade dress are trademarks of O’Reilly Media, Inc. Hadoop:The Definitive Guide 3rd Edition

 

    1. O’Reilly books may be purchased for educational, business, or sales promotional use. Elasticsearch The Definitive Guide

 

    1. Cassandra: The Definitive Guide, the image of a Paradise flycatcher, and related trade dress are trademarks of O’Reilly Media, Inc. Cassandra The Definitive Guide

 

    1. Hadoop Operations, the cover image of a spotted cavy, and related trade dress are trademarks of O’Reilly Media, Inc. Hadoop Operations

 

    1. Building and Testing with Gradle, the image of a bush wren, and related trade dress are trademarks of O’Reilly Media, Inc. Building and Testing with Gradle

 

 

240、单词 spel 的匹配文本:

 

    1. Spring Expression Language (SpEL) Expression Language (EL) is a technology to allow an application to manipulate Java objects at runtime. Pro Spring 3

 

    1. The value can either be a hardcoded value or a SpEL expression. Pro Spring MVC

 

    1. Up to this point, all the expressions you’ve seen use the Spring Expression Language (SpEL) which is new to Spring 3.0 Spring in Practice

 

    1. Fortunately, Spring Security can leverage Spring Expression Language (SpEL) to determine whether a user has authorization. Spring Security 3.1

 

    1. SpEL makes it possible to use programming expressions inside bean definitions and other parts of the Spring portfolio—for example, in some annotations you will see later, which allows you to manipulate objects at runtime. Pro Spring Security

 

    1. Next, you make the configuration more flexible by leveraging the Spring Expression Language (SpEL) to avoid hardcoding certain settings in Spring configuration files. Spring Batch in Action

 

    1. If all the information is to be found in the message itself, and all you need is to write a logical expression that’s computed against the payload or the header values, you can use the Spring 3.0 Expression Language (SpEL) directly. Spring Integration in Action

 

    1. A control bus can execute Groovy scripts or Spring Expression Language (SpEL) expressions, allowing you to manipulate the state of components inside the application. Spring Data Modern Data Access for Enterprise Java

 

    1. Let’s start with a look at property placeholders, the simpler of the two, and then dig into the more powerful SpEL. Spring in Action 4th Edition
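
To ground the SpEL quotes, here is a minimal sketch that evaluates expressions programmatically with Spring's SpelExpressionParser; the expressions themselves are arbitrary examples.

    import org.springframework.expression.Expression;
    import org.springframework.expression.ExpressionParser;
    import org.springframework.expression.spel.standard.SpelExpressionParser;
    import org.springframework.expression.spel.support.StandardEvaluationContext;

    public class SpelDemo {
        public static void main(String[] args) {
            ExpressionParser parser = new SpelExpressionParser();

            // Evaluate a self-contained expression.
            Expression exp = parser.parseExpression("'Hello'.concat(' SpEL').toUpperCase()");
            System.out.println(exp.getValue(String.class));   // HELLO SPEL

            // Evaluate an expression against a root object (here, a plain String).
            StandardEvaluationContext context = new StandardEvaluationContext("superword");
            Expression len = parser.parseExpression("length() > 5");
            System.out.println(len.getValue(context, Boolean.class));   // true
        }
    }

In Spring configuration the same expressions appear inside #{...} placeholders, which is what the quotes about bean definitions and security annotations refer to.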

 

 

241、单词 predefined 的匹配文本:

 

    1. Writing a Pregel program involves subclassing the predefined Vertex class (see Figure 3) Pregel  -  A System for Large-Scale Graph Processing

 

    1. In contrast, CouchDB uses predefined map and reduce functions in a style known as MapReduce. CouchDB The Definitive Guide

 

    1. The actual building of your software usually follows predefined and ordered steps. Gradle in Action

 

    1. Users can add support for a new input type by providing an implementation of a simple reader interface, though most users just use one of a small number of predefined input types. MapReduce - Simplied Data Processing on Large Clusters

 

    1. Adding Ivy repositories An Ivy repository has a customizable l
