(http://hypertable.com/documentation/developer_guide/)
This tutorial shows you how to import a search engine query log into Hypertable, storing the data in tables with different primary keys, and how to issue queries against those tables. You'll need to download the data from http://hypertable.googlecode.com/files/query-log.tsv.gz:
$ mkdir -p hql_tutorial
$ cd hql_tutorial
$ wget http://cdn.hypertable.com/pub/query-log.tsv.gz
The next step is to make sure Hypertable is properly installed (see Installation) and then launch the service. Once you have Hypertable up and running, fire up an interactive session:
$ /opt/hypertable/current/bin/ht shell
Welcome to the hypertable command interpreter.
For information about Hypertable, visit http://www.hypertable.org/
Type 'help' for a list of commands, or 'help shell' for a
list of shell meta commands.
hypertable>
Type"help" to display the list of valid HQL commands:
输入”help”得到合法的HQL命令列表:
hypertable> help
USE ................ Sets the current namespace
CREATE NAMESPACE ... Creates a new namespace
DROP NAMESPACE ..... Removes a namespace
EXISTS TABLE ....... Check if table exists
CREATE TABLE ....... Creates a table
DELETE ............. Deletes all or part of a row from a table
DESCRIBE TABLE ..... Displays a table's schema
DROP TABLE ......... Removes a table
RENAME TABLE ....... Renames a table
DUMP TABLE ......... Create efficient backup file
ALTER TABLE ........ Add/remove column family from existing table
INSERT ............. Inserts data into a table
LOAD DATA INFILE ... Loads data from a TSV input file into a table
SELECT ............. Selects (and display) cells from a table
SHOW CREATE TABLE .. Displays CREATE TABLE command used to create table
SHOW TABLES ........ Displays only the list of tables in the current namespace
GET LISTING ........ Displays the list of tables and namespace in the current namespace
Statements must be terminated with ';'. For more information on a specific statement, type 'help <statement>', where <statement> is from the preceding list.
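For example, to see the full syntax of a particular statement, you might issue something like the following (a hypothetical session line; any statement name from the list above works the same way):

hypertable> help select;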
USE
First, open the root namespace. For an explanation of namespaces see Namespaces (http://hypertable.com/documentation/#namespaces). To open the root namespace issue the following HQL command:
hypertable> use"/";
CREATE NAMESPACE
Now create a namespace, Tutorial, within which we'll create our tables:
hypertable> create namespace "Tutorial";
hypertable> use Tutorial;
CREATE TABLE
Now that we have created and opened the Tutorial namespace we can create tables within it. In this tutorial we will be loading data into, and querying data from, two separate tables. The first table, QueryLogByUserID, will be indexed by the fields UserID+QueryTime and the second table, QueryLogByTimestamp, will be indexed by the fields QueryTime+UserID. Notice that any identifier that contains non-alphanumeric characters (e.g. '-') must be surrounded by quotes.
hypertable> CREATE TABLE QueryLogByUserID ( Query, ItemRank, ClickURL );
hypertable> CREATE TABLE QueryLogByTimestamp ( Query, ItemRank, ClickURL );
See the HQL Documentation: CREATE TABLE for complete syntax.
SHOW TABLES
Show all of the tables that exist in the current namespace:
hypertable> show tables;
QueryLogByUserID
QueryLogByTimestamp
SHOW CREATE TABLE
Now, issue the SHOW CREATE TABLE command to make sure you got everything right. We didn't have to include the field called 'row' because we'll use that in our LOAD DATA INFILE command later:
hypertable> show create table QueryLogByUserID;
CREATE TABLE QueryLogByUserID (
Query,
ItemRank,
ClickURL,
ACCESS GROUP default (Query, ItemRank, ClickURL)
);
And, notice that, by default, a single ACCESS GROUP named "default" is created. Access groups are a way to physically group columns together on disk. See the CREATE TABLE documentation for a more detailed description of access groups.
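For example, if most of your queries only read the ClickURL column, a hypothetical variant of the table above could place that column in its own access group so those queries touch less data on disk. This is a sketch, not part of the tutorial transcript, and the table name QueryLogByUserID2 is made up to avoid clashing with the table created above:

hypertable> CREATE TABLE QueryLogByUserID2 (
  Query,
  ItemRank,
  ClickURL,
  ACCESS GROUP default (Query, ItemRank),
  ACCESS GROUP urls (ClickURL)
);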
LOAD DATA INFILE
Now, let's load some data using the MySQL-like TAB delimited format (TSV). For that, we assume you have the example data in the file query-log.tsv.gz. This file includes an initial header line indicating the format of each line in the file by listing tab delimited column names. To inspect this file we first quit out of the Hypertable command line interpreter and then use the zcat program (requires gzip package) to display the contents of query-log.tsv.gz:
hypertable> quit
$ zcat query-log.tsv.gz
#QueryTime UserID Query ItemRank ClickURL
2008-11-13 00:01:30 2289203 kitchen counter in new orleans 10 http://www.era.com
2008-11-13 00:01:30 2289203 kitchen counter in new orleans 4 http://www.superpages.com
2008-11-13 00:01:30 2289203 kitchen counter in new orleans 5 http://www.superpages.com
2008-11-13 00:01:31 1958633 beads amethyst gemstone 1 http://www.gemsbiz.com
2008-11-13 00:01:31 3496052 chat
2008-11-13 00:01:33 892003 photo example quarter doubled die coin 5 http://www.coinresource.com
2008-11-13 00:01:33 892003 photo example quarter doubled die coin 5 http://www.coinresource.com
2008-11-13 00:01:35 2251112 radio stations in buffalo 1 http://www.ontheradio.net
2008-11-13 00:01:37 1274922 fafsa renewal 1 http://www.fafsa.ed.gov
2008-11-13 00:01:37 1274922 fafsa renewal 1 http://www.fafsa.ed.gov
2008-11-13 00:01:37 441978 find phone numbers 1 http://www.anywho.com
2008-11-13 00:01:37 441978 find phone numbers 3 http://www.411.com
...
Now let's load the data file query-log.tsv.gz into the table QueryLogByUserID. The row key is formulated by zero-padding the UserID field out to nine digits and concatenating the QueryTime field. The QueryTime field is used as the internal cell timestamp. To load the file, first jump back into the Hypertable command line interpreter and then issue the LOAD DATA INFILE command shown below.
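For example, under this row key scheme the first data line shown above (UserID 2289203, QueryTime 2008-11-13 00:01:30) ends up under a row key of the following form. This is an illustration of the key layout, matching the keys that appear in the SELECT output below, not actual shell output:

002289203 2008-11-13 00:01:30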
$ /opt/hypertable/current/bin/ht shell
...
hypertable> use Tutorial;
hypertable> LOAD DATA INFILE ROW_KEY_COLUMN="%09UserID"+QueryTime TIMESTAMP_COLUMN=QueryTime "query-log.tsv.gz" INTO TABLE QueryLogByUserID;
Loading 7,464,729 bytes of input data...
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Load complete.
Elapsed time: 9.84 s
Avg value size: 15.42 bytes
Avg key size: 29.00 bytes
Throughput: 4478149.39 bytes/s (764375.74 bytes/s)
Total cells: 992525
Throughput: 100822.73 cells/s
Resends: 0
A quick inspection of the table shows:
hypertable> select * from QueryLogByUserID limit 8;
000000036 2008-11-13 10:30:46 Query helena ga
000000036 2008-11-13 10:31:34 Query helena ga
000000036 2008-11-13 10:45:23 Query checeron s
000000036 2008-11-13 10:46:07 Query cheveron gas station
000000036 2008-11-13 10:46:34 Query cheveron gas station richmond virginia
000000036 2008-11-13 10:48:56 Query cheveron glenside road richmond virginia
000000036 2008-11-13 10:49:05 Query chevron glenside road richmond virginia
000000036 2008-11-13 10:49:05 ItemRank 1
000000036 2008-11-13 10:49:05 ClickURL http://yp.yahoo.com
000000053 2008-11-13 15:18:21 Query mapquest
000000053 2008-11-13 15:18:21 ItemRank 1
000000053 2008-11-13 15:18:21 ClickURL http://www.mapquest.com
Elapsed time: 0.01 s
Avg value size: 18.08 bytes
Avg key size: 30.00 bytes
Throughput: 43501.21 bytes/s
Total cells: 12
Throughput: 904.70 cells/s
Now let's load the data file query-log.tsv.gz into the table QueryLogByTimestamp. The row key is formulated by concatenating the QueryTime field with the nine digit, zero-padded UserID field. The QueryTime field is used as the internal cell timestamp.
hypertable> LOAD DATA INFILE ROW_KEY_COLUMN=QueryTime+"%09UserID" TIMESTAMP_COLUMN=QueryTime "query-log.tsv.gz" INTO TABLE QueryLogByTimestamp;
Loading 7,464,729 bytes of input data...
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Load complete.
Elapsed time: 10.18 s
Avg value size: 15.42 bytes
Avg key size: 29.00 bytes
Throughput: 4330913.20 bytes/s (739243.98 bytes/s)
Total cells: 992525
Throughput: 97507.80 cells/s
Resends: 0
And a quick inspection of the table shows:
hypertable> select * from QueryLogByTimestamp limit 4;
2008-11-13 00:01:30 002289203 Query kitchen counter in new orleans
2008-11-13 00:01:30 002289203 ItemRank 5
2008-11-13 00:01:30 002289203 ClickURL http://www.superpages.com
2008-11-13 00:01:31 001958633 Query beads amethyst gemstone
2008-11-13 00:01:31 001958633 ItemRank 1
2008-11-13 00:01:31 001958633 ClickURL http://www.gemsbiz.com
2008-11-13 00:01:31 003496052 Query chat
2008-11-13 00:01:33 000892003 Query photo example quarter doubled die coin
2008-11-13 00:01:33 000892003 ItemRank 5
2008-11-13 00:01:33 000892003 ClickURL http://www.coinresource.com
Elapsed time: 0.00 s
Avg value size: 18.11 bytes
Avg key size: 30.00 bytes
Throughput: 287150.49 bytes/s
Total cells: 19
Throughput: 5969.21 cells/s
See the HQL Documentation: LOAD DATA INFILE for complete syntax.
SELECT
Let's start by examining the QueryLogByUserID table. To select all of the data for user ID 003269359 we need to use the starts with operator =^. Remember that the row key is the concatenation of the user ID and the timestamp which is why we need to use the starts with operator.
hypertable> select * from QueryLogByUserID where row =^ '003269359';
003269359 2008-11-13 04:36:34 Query binibining pilipinas 2008 winners
003269359 2008-11-13 04:36:34 ItemRank 5
003269359 2008-11-13 04:36:34 ClickURL http://www.missosology.org
003269359 2008-11-13 04:37:34 Query pawee's kiss and tell
003269359 2008-11-13 04:37:34 ItemRank 3
003269359 2008-11-13 04:37:34 ClickURL http://www.missosology.org
003269359 2008-11-13 05:07:10 Query rn jobs in 91405
003269359 2008-11-13 05:07:10 ItemRank 9
003269359 2008-11-13 05:07:10 ClickURL http://91405.jobs.com
003269359 2008-11-13 05:20:22 Query rn jobs in 91405
...
003269359 2008-11-13 09:42:49 Query wound ostomy rn training
003269359 2008-11-13 09:42:49 ItemRank 11
003269359 2008-11-13 09:42:49 ClickURL http://www.wocn.org
003269359 2008-11-13 09:46:50 Query pych nurse in encino tarzana hospital
003269359 2008-11-13 09:47:18 Query encino tarzana hospital
003269359 2008-11-13 09:47:18 ItemRank 2
003269359 2008-11-13 09:47:18 ClickURL http://www.encino-tarzana.com
003269359 2008-11-13 09:52:42 Query encino tarzana hospital
003269359 2008-11-13 09:53:08 Query alhambra hospital
003269359 2008-11-13 09:53:08 ItemRank 1
003269359 2008-11-13 09:53:08 ClickURL http://www.alhambrahospital.com
Elapsed time: 0.01 s
Avg value size: 19.24 bytes
Avg key size: 30.00 bytes
Throughput: 2001847.79 bytes/s
Total cells: 352
Throughput: 40651.35 cells/s
The result set was fairly large (352 cells), so let's now try selecting just the queries that were issued by the user with ID 003269359 during the hour of 5am. To do this we need to add a TIMESTAMP predicate. Each cell has an internal timestamp and the TIMESTAMP predicate can be used to filter the results based on this timestamp.
hypertable> select * from QueryLogByUserID where row =^ '003269359' AND "2008-11-13 05:00:00" <= TIMESTAMP < "2008-11-13 06:00:00";
003269359 2008-11-13 05:07:10 Query rn jobs in 91405
003269359 2008-11-13 05:07:10 ItemRank 9
003269359 2008-11-13 05:07:10 ClickURL http://91405.jobs.com
003269359 2008-11-13 05:20:22 Query rn jobs in 91405
003269359 2008-11-13 05:20:22 ItemRank 16
003269359 2008-11-13 05:20:22 ClickURL http://www.careerbuilder.com
003269359 2008-11-13 05:34:02 Query usc university hospital
003269359 2008-11-13 05:34:02 ItemRank 1
003269359 2008-11-13 05:34:02 ClickURL http://www.uscuh.com
003269359 2008-11-13 05:37:01 Query rn jobs in san fernando valley
003269359 2008-11-13 05:37:01 ItemRank 7
003269359 2008-11-13 05:37:01 ClickURL http://www.medhunters.com
003269359 2008-11-13 05:46:22 Query northridge hospital
003269359 2008-11-13 05:46:22 ItemRank 2
003269359 2008-11-13 05:46:22 ClickURL http://northridgehospital.org
003269359 2008-11-13 05:53:34 Query valley presbyterian hospital
003269359 2008-11-13 05:53:34 ItemRank 4
003269359 2008-11-13 05:53:34 ClickURL http://www.hospital-data.com
003269359 2008-11-13 05:55:36 Query valley presbyterian hospital website
003269359 2008-11-13 05:55:36 ItemRank 1
003269359 2008-11-13 05:55:36 ClickURL http://www.valleypres.org
003269359 2008-11-13 05:59:24 Query mission community hospital
003269359 2008-11-13 05:59:24 ItemRank 1
003269359 2008-11-13 05:59:24 ClickURL http://www.mchonline.org
Elapsed time: 0.00 s
Avg value size: 18.50 bytes
Avg key size: 30.00 bytes
Throughput: 2602086.44 bytes/s
Total cells: 36
Throughput: 53651.27 cells/s
Keep in mind that the internal cell timestamp is different than the one embedded in the row key. In this example, they both represent the same time. By specifying the TIMESTAMP_COLUMN option to LOAD DATA INFILE, we extracted the QueryTime field to be used as the internal cell timestamp. If we hadn't supplied that option, the system would have auto-assigned a timestamp. To display the internal cell timestamp, add the DISPLAY_TIMESTAMPS option:
hypertable> select * from QueryLogByUserID limit 5 DISPLAY_TIMESTAMPS;
2008-11-13 10:30:46.000000000 000000036 2008-11-13 10:30:46 Query helena ga
2008-11-13 10:31:34.000000000 000000036 2008-11-13 10:31:34 Query helena ga
2008-11-13 10:45:23.000000000 000000036 2008-11-13 10:45:23 Query checeron s
2008-11-13 10:46:07.000000000 000000036 2008-11-13 10:46:07 Query cheveron gas station
2008-11-13 10:46:34.000000000 000000036 2008-11-13 10:46:34 Query cheveron gas station richmond virginia
Elapsed time: 0.00 s
Avg value size: 17.20 bytes
Avg key size: 30.00 bytes
Throughput: 207563.76 bytes/s
Total cells: 5
Throughput: 4397.54 cells/s
There is no index for the internal cell timestamps, so if we don't include a row =^ expression in our predicate, the system will do a full table scan. This is why we imported the data into a second table QueryLogByTimestamp. This table includes the timestamp as the row key prefix which allows us to efficiently query data over a time interval.
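For contrast, a query like the following sketch, which carries only a TIMESTAMP predicate and no row prefix, would have to scan the entire QueryLogByUserID table. It is shown for illustration only and is not part of the tutorial transcript:

hypertable> select * from QueryLogByUserID where "2008-11-14 03:00:00" <= TIMESTAMP < "2008-11-14 04:00:00";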
The following query selects all query log data for November 14th, 2008:
hypertable> select * from QueryLogByTimestamp WHERE ROW =^ '2008-11-14';
2008-11-14 00:00:00 001040178 Query noodle tools
2008-11-14 00:00:00 001040178 ItemRank 1
2008-11-14 00:00:00 001040178 ClickURL http://www.noodletools.com
2008-11-14 00:00:01 000264655 Query games.myspace.com
2008-11-14 00:00:01 000264655 ItemRank 1
2008-11-14 00:00:01 000264655 ClickURL http://games.myspace.com
2008-11-14 00:00:01 000527424 Query franklinville schools new jersey
2008-11-14 00:00:01 000527424 ItemRank 1
2008-11-14 00:00:01 000527424 ClickURL http://www.greatschools.net
2008-11-14 00:00:01 000632400 Query lack of eye contact symptom of...
...
2008-11-14 06:02:33 003676354 Query baby 20showers
2008-11-14 06:02:35 003378030 Query task and responsibility matrix
2008-11-14 06:02:35 003378030 ItemRank 2
2008-11-14 06:02:35 003378030 ClickURL http://im.ncsu.edu
2008-11-14 06:02:36 004578101 Query jcpenneys
2008-11-14 06:02:37 005120734 Query ebay
2008-11-14 06:02:40 000957500 Query buccal fat size of ping pong ball
Elapsed time: 2.37 s
Avg value size: 15.36 bytes
Avg key size: 30.00 bytes
Throughput: 1709616.45 bytes/s
Total cells: 89412
Throughput: 37689.18 cells/s
And to select all query log data for November 14th, 2008 during the hour of 3 am:
hypertable> select * from QueryLogByTimestamp WHERE ROW =^ '2008-11-14 03';
2008-11-14 03:00:00 002512415 Query ny times
2008-11-14 03:00:00 002512415 ItemRank 1
2008-11-14 03:00:00 002512415 ClickURL http://www.nytimes.com
2008-11-14 03:00:00 005294906 Query kickmeto.fosi
2008-11-14 03:00:00 005459226 Query http://www.dickdyertoyota.com
2008-11-14 03:00:02 000637292 Query days of our lives
2008-11-14 03:00:02 000637292 ItemRank 3
2008-11-14 03:00:02 000637292 ClickURL http://www.nbc.com
2008-11-14 03:00:03 002675105 Query ghetto superstar lyrics
...
2008-11-14 03:59:52 002874080 ClickURL http://www.paintball-discounters.com
2008-11-14 03:59:53 004292772 Query drop down menu
2008-11-14 03:59:55 005656539 Query to buy indian hair to make wigs in new york
2008-11-14 03:59:55 005656539 ItemRank 1
2008-11-14 03:59:55 005656539 ClickURL http://query.nytimes.com
2008-11-14 03:59:58 004318586 Query myspace .com
Elapsed time: 0.17 s
Avg value size: 15.37 bytes
Avg key size: 30.00 bytes
Throughput: 2267099.06 bytes/s
Total cells: 8305
Throughput: 49967.51 cells/s
And finally, to select all query log data for November 14th, 2008 during the minute of 3:45 am:
hypertable> select * from QueryLogByTimestamp WHERE ROW =^ '2008-11-14 03:45';
2008-11-14 03:45:00 003895650 Query ks lottery.
2008-11-14 03:45:00 003895650 ItemRank 2
2008-11-14 03:45:00 003895650 ClickURL http://www.lotterypost.com
2008-11-14 03:45:00 005036796 Query http://www.glasgowdailytimes 10-20-2005
2008-11-14 03:45:01 002863052 Query map quest
2008-11-14 03:45:01 005514285 Query john bermeo
2008-11-14 03:45:02 002394176 Query http://www.eggseye.com
2008-11-14 03:45:02 003454227 Query hawaiian weddig band
2008-11-14 03:45:03 001006089 Query brokers hiring loan officers in indiana
2008-11-14 03:45:06 000844720 Query latest design microsoft freeware
...
2008-11-14 03:45:55 003920469 ItemRank 3
2008-11-14 03:45:55 003920469 ClickURL http://www.pennyblood.com
2008-11-14 03:45:56 002729906 Query tryaold
2008-11-14 03:45:56 003919348 Query feathered draped fox fur mandalas
2008-11-14 03:45:56 003919348 ItemRank 8
2008-11-14 03:45:56 003919348 ClickURL http://www.greatdreams.com
2008-11-14 03:45:56 004803968 Query -
Elapsed time: 0.02 s
Avg value size: 15.71 bytes
Avg key size: 30.00 bytes
Throughput: 305030.80 bytes/s
Total cells: 130
Throughput: 6673.51 cells/s
See the HQL Documentation: SELECT for complete syntax.
ALTER TABLE
The ALTER TABLE command can be used to add and/or remove columns from a table. The following command will add a 'Notes' column in a new access group called 'extra' and will drop column 'ItemRank'.
hypertable> ALTER TABLE QueryLogByUserID ADD(Notes, ACCESS GROUP extra(Notes)) DROP(ItemRank);
To verify the change, issue the SHOW CREATE TABLE command:
hypertable> show create table QueryLogByUserID;
CREATE TABLE QueryLogByUserID (
Query,
ClickURL,
Notes,
ACCESS GROUP default (Query, ClickURL),
ACCESS GROUP extra (Notes)
)
And to verify that the column no longer exists, issue the same SELECT statement we issued above (NOTE: the data for the column still exists in the file system, it will get lazily garbage collected).
hypertable> select * from QueryLogByUserID limit 8;
000000036 2008-11-13 10:30:46 Query helena ga
000000036 2008-11-13 10:31:34 Query helena ga
000000036 2008-11-13 10:45:23 Query checeron s
000000036 2008-11-13 10:46:07 Query cheveron gas station
000000036 2008-11-13 10:46:34 Query cheveron gas station richmond virginia
000000036 2008-11-13 10:48:56 Query cheveron glenside road richmond virginia
000000036 2008-11-13 10:49:05 Query chevron glenside road richmond virginia
000000036 2008-11-13 10:49:05 ClickURL http://yp.yahoo.com
000000053 2008-11-13 15:18:21 Query mapquest
000000053 2008-11-13 15:18:21 ClickURL http://www.mapquest.com
Elapsed time: 0.00 s
Avg value size: 21.50 bytes
Avg key size: 30.00 bytes
Throughput: 140595.14 bytes/s
Total cells: 10
Throughput: 2730.00 cells/s
See HQL Documentation: ALTER TABLE for complete syntax.
INSERT & DELETE
Now let's augment the QueryLogByUserID table by adding some information in the Notes column for a few of the queries:
hypertable> INSERT INTO QueryLogByUserID VALUES
("000019058 2008-11-13 07:24:43", "Notes", "animals"),
("000019058 2008-11-13 07:57:16", "Notes", "food"),
("000019058 2008-11-13 07:59:36", "Notes", "gardening");
Elapsed time: 0.01 s
Avg value size: 6.67 bytes
Total cells: 3
Throughput: 298.36 cells/s
Resends: 0
Notice the new data by querying the affected row:
hypertable> select * from QueryLogByUserID where row =^ '000019058';
000019058 2008-11-13 07:24:43 Query tigers
000019058 2008-11-13 07:24:43 Notes animals
000019058 2008-11-13 07:57:16 Query bell peppers
000019058 2008-11-13 07:57:16 Notes food
000019058 2008-11-13 07:58:24 Query bell peppers
000019058 2008-11-13 07:58:24 ClickURL http://agalternatives.aers.psu.edu
000019058 2008-11-13 07:59:36 Query growing bell peppers
000019058 2008-11-13 07:59:36 Query growing bell peppers
000019058 2008-11-13 07:59:36 ClickURL http://www.farm-garden.com
000019058 2008-11-13 07:59:36 ClickURL http://www.organicgardentips.com
000019058 2008-11-13 07:59:36 Notes gardening
000019058 2008-11-13 12:31:02 Query tracfone
000019058 2008-11-13 12:31:02 ClickURL http://www.tracfone.com
Elapsed time: 0.00 s
Avg value size: 16.38 bytes
Avg key size: 30.00 bytes
Throughput: 162271.26 bytes/s
Total cells: 13
Throughput: 3498.39 cells/s
Now try deleting one of the notes we just added:
hypertable> delete Notes from QueryLogByUserID where ROW = "000019058 2008-11-13 07:24:43";
Elapsed time: 0.00 s
Total cells: 1
Throughput: 256.41 cells/s
Resends: 0
And verify that the cell was, indeed, deleted:
hypertable> select * from QueryLogByUserID where row =^ '000019058';
000019058 2008-11-13 07:24:43 Query tigers
000019058 2008-11-13 07:57:16 Query bell peppers
000019058 2008-11-13 07:57:16 Notes food
000019058 2008-11-13 07:58:24 Query bell peppers
000019058 2008-11-13 07:58:24 ClickURL http://agalternatives.aers.psu.edu
000019058 2008-11-13 07:59:36 Query growing bell peppers
000019058 2008-11-13 07:59:36 Query growing bell peppers
000019058 2008-11-13 07:59:36 ClickURL http://www.farm-garden.com
000019058 2008-11-13 07:59:36 ClickURL http://www.organicgardentips.com
000019058 2008-11-13 07:59:36 Notes gardening
000019058 2008-11-13 12:31:02 Query tracfone
000019058 2008-11-13 12:31:02 ClickURL http://www.tracfone.com
Elapsed time: 0.00 s
Avg value size: 16.38 bytes
Avg key size: 30.00 bytes
Throughput: 162271.26 bytes/s
Total cells: 12
Throughput: 3498.39 cells/s
See the HQL Documentation: INSERT and the HQL Documentation: DELETE for complete syntax.
DROP TABLE
The DROP TABLE command is used to remove tables from the system. The IF EXISTS option prevents the system from throwing an error if the table does not exist:
hypertable> drop table IF EXISTS foo;
Let's remove one of the example tables:
hypertable> drop table QueryLogByUserID;
hypertable> show tables;
QueryLogByTimestamp
Then let's remove the other:
hypertable> drop table QueryLogByTimestamp;
hypertable> show tables;
GET LISTING & DROP NAMESPACE
Now, we want to get rid of the Tutorial namespace and verify that we have:
hypertable> use"/";
hypertable> getlisting;
Tutorial (namespace)
sys (namespace)
hypertable> drop namespace Tutorial;
hypertable> get listing;
sys (namespace)
The sys namespace is used by the Hypertable system and should not be used to contain user tables.
Note that a namespace must be empty (i.e. must not contain any sub-namespaces or tables) before you can drop it. In this case, since we had already dropped the QueryLogByUserID and QueryLogByTimestamp tables, we could go ahead and drop the Tutorial namespace.
Hadoop MapReduce
In order to run this example, Hadoop needs to be installed and HDFS and the MapReduce framework need to be up and running. Hypertable builds against Cloudera's CDH3 distribution of Hadoop. See CDH3 Installation (https://ccp.cloudera.com/display/CDHDOC/CDH3+Installation) for instructions on how to get Hadoop up and running.
Hypertable ships with a jar file, hypertable-x.x.x.x.jar (where x.x.x.x is the hypertable release level, e.g., 0.9.5.5), that contains Hadoop InputFormat and OutputFormat classes that allow MapReduce programs to directly read from and write to tables in Hypertable. In this section, we walk you through an example MapReduce program, WikipediaWordCount, that tokenizes articles in a table called wikipedia that has been loaded with a Wikipedia dump. It reads the article column, tokenizes it, and populates the word column of the same table. Each unique word in the article turns into a qualified column and the value is the number of times the word appears in the article.
First, exit the Hypertable command line interpreter and download the Wikipedia dump, for example:
$ wget http://cdn.hypertable.com/pub/wikipedia.tsv.gz
Next, jump back into the Hypertable command line interpreter and create the wikipedia table by executing the HQL commands shown below.
CREATE NAMESPACE test;
USE test;
DROP TABLE IF EXISTS wikipedia;
CREATE TABLE wikipedia (
title,
id,
username,
article,
word
);
Now load the compressed Wikipedia dump file directly into the wikipedia table by issuing the following HQL commands:
hypertable> LOAD DATA INFILE "wikipedia.tsv.gz" INTO TABLE wikipedia;
Loading 638,058,135 bytes of input data...
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Load complete.
Elapsed time: 78.28 s
Avg value size: 1709.59 bytes
Avg key size: 24.39 bytes
Throughput: 25226728.63 bytes/s (8151017.58 bytes/s)
Total cells: 1138847
Throughput: 14548.46 cells/s
Resends: 8328
Example
In this example, we'll be running the WikipediaWordCount program, which is included in the hypertable-X.X.X.X-examples.jar file that ships with the binary package installation. The following is a link to the source code for this program.
WikipediaWordCount.java (https://github.com/hypertable/hypertable/blob/master/examples/java/org/hypertable/examples/hadoop/mapreduce/WikipediaWordCount.java)
To get an idea of what the data looks like, try the following select:
hypertable> select * from wikipedia where row =^ "Addington";
Addington, Buckinghamshire title Addington, Buckinghamshire
Addington, Buckinghamshire id 377201
Addington, Buckinghamshire username Roleplayer
Addington, Buckinghamshire article {{infobox UK place \n|country = England\n|latitude=51.95095\n|longitude=-0.92177\n|official_name= Addington\n|population = 145 ...
Now exit from the Hypertable command line interpreter and run the WikipediaWordCount MapReduce program:
hypertable> quit
$ hadoop jar /opt/hypertable/current/lib/java/hypertable-*-examples.jar \
org.hypertable.examples.WikipediaWordCount \
--namespace=test --columns=article
To verify that it worked, jump back into the Hypertable command line interpreter and try selecting for the word column:
$ /opt/hypertable/current/bin/ht shell
hypertable> select word from wikipedia where row =^ "Addington";
...
Addington, Buckinghamshire word:A 1
Addington, Buckinghamshire word:Abbey 1
Addington, Buckinghamshire word:Abbotts 1
Addington, Buckinghamshire word:According 1
Addington, Buckinghamshire word:Addington 6
Addington, Buckinghamshire word:Adstock 1
Addington, Buckinghamshire word:Aston 1
Addington, Buckinghamshire word:Aylesbury 3
Addington, Buckinghamshire word:BUCKINGHAM 1
Addington, Buckinghamshire word:Bayeux 2
Addington, Buckinghamshire word:Bene 1
Addington, Buckinghamshire word:Bishop 1
...
Hadoop Streaming MapReduce
In order to run this example, Hadoop needs to be installed and HDFS and the MapReduce framework need to be up and running. Hypertable builds against Cloudera's CDH3 distribution of Hadoop. See CDH3 Installation (https://ccp.cloudera.com/display/CDHDOC/CDH3+Installation) for instructions on how to get Hadoop up and running.
In this example, we'll be running a Hadoop Streaming MapReduce job that uses a Bash script as the mapper and a Bash script as the reducer. Like the example in the previous section, the programs operate on a table called wikipedia that has been loaded with a Wikipedia dump.
First, exit the Hypertable command line interpreter and download the Wikipedia dump, for example:
$ wget http://cdn.hypertable.com/pub/wikipedia.tsv.gz
Next, jump back into the Hypertable command line interpreter and create the wikipedia table by executing the HQL commands shown below.
CREATE NAMESPACE test;
USE test;
DROP TABLE IF EXISTS wikipedia;
CREATE TABLE wikipedia (
title,
id,
username,
article,
word
);
Now load the compressed Wikipedia dump file directly into the wikipedia table by issuing the following HQL commands:
hypertable> LOAD DATA INFILE "wikipedia.tsv.gz" INTO TABLE wikipedia;
Loading 638,058,135 bytes of input data...
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Load complete.
Elapsed time: 78.28 s
Avg value size: 1709.59 bytes
Avg key size: 24.39 bytes
Throughput: 25226728.63 bytes/s (8151017.58 bytes/s)
Total cells: 1138847
Throughput: 14548.46 cells/s
Resends: 8328
The mapper script (tokenize-article.sh) and the reducer script (reduce-word-counts.sh) are shown below.
Example
The following script, tokenize-article.sh, will be used as the mapper script.
#!/usr/bin/env bash
# Input fields are separated by a literal tab (TSV)
IFS=$'\t'
read name column article
while [ $? == 0 ] ; do
  if [ "$column" == "article" ] ; then
    # Strip punctuation
    stripped_article=`echo $article | awk 'BEGIN { FS="\t" } { print $NF }' | tr "\!\"#\$&'()*+,-./:;<=>?@[\\\\]^_\{|}~" " " | tr -s " "` ;
    # Split article into words
    echo $stripped_article | awk -v name="$name" 'BEGIN { article=name; FS=" "; } { for (i=1; i<=NF; i++) printf "%s\tword:%s\t1\n", article, $i; }' ;
  fi
  # Read another line
  read name column article
done
exit 0
The following script, reduce-word-counts.sh, will be used as the reducer script.
#!/usr/bin/env bash
last_article=
last_word=
let total=0
# Input fields are separated by a literal tab (TSV)
IFS=$'\t'
read article word count
while [ $? == 0 ] ; do
  if [ "$article" == "$last_article" ] && [ "$word" == "$last_word" ] ; then
    let total=$count+total
  else
    if [ "$last_word" != "" ] ; then
      # Emit tab-separated row, column, value
      echo -e "$last_article\t$last_word\t$total"
    fi
    let total=$count
    last_word=$word
    last_article=$article
  fi
  read article word count
done
if [ $total -gt 0 ] ; then
  echo -e "$last_article\t$last_word\t$total"
fi
exit 0
To populate the word column of the wikipedia table by tokenizing the article column using the above mapper and reducer scripts, issue the following command:
hypertable> quit
$ hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u*.jar \
-libjars /opt/hypertable/current/lib/java/hypertable-*.jar,/opt/hypertable/current/lib/java/libthrift-*.jar \
-Dhypertable.mapreduce.namespace=test \
-Dhypertable.mapreduce.input.table=wikipedia \
-Dhypertable.mapreduce.output.table=wikipedia \
-mapper /home/doug/tokenize-article.sh \
-combiner /home/doug/reduce-word-counts.sh \
-reducer /home/doug/reduce-word-counts.sh \
-file /home/doug/tokenize-article.sh \
-file /home/doug/reduce-word-counts.sh \
-inputformat org.hypertable.hadoop.mapred.TextTableInputFormat \
-outputformat org.hypertable.hadoop.mapred.TextTableOutputFormat \
-input wikipedia -output wikipedia
The following table lists the job configuration properties that are used to specify, among other things, the input table, output table, and scan specification. These properties can be supplied to a streaming MapReduce job with -Dproperty=value arguments.
Input/Output Configuration Properties

Property | Description | Example Value
hypertable.mapreduce.namespace | Namespace for both input and output table | /test
hypertable.mapreduce.input.namespace | Namespace for input table | /test/input
hypertable.mapreduce.input.table | Input table name | wikipedia
hypertable.mapreduce.input.scan_spec.columns | Comma separated list of input columns | id,title
hypertable.mapreduce.input.scan_spec.options | Input WHERE clause options | MAX_VERSIONS 1 KEYS_ONLY
hypertable.mapreduce.input.scan_spec.row_interval | Input row interval | Dog <= ROW < Kitchen
hypertable.mapreduce.input.scan_spec.timestamp_interval | Timestamp filter | TIMESTAMP >= 2011-11-21
hypertable.mapreduce.input.include_timestamps | Emit integer timestamp as the 1st field (nanoseconds since epoch) | TRUE
hypertable.mapreduce.output.namespace | Namespace containing output table | /test/output
hypertable.mapreduce.output.table | Output table name | wikipedia
hypertable.mapreduce.output.mutator_flags | flags parameter passed to mutator constructor (1 = NO_LOG_SYNC) | 1
To run a MapReduce job over a subset of columns from the input table, specify a comma separated list of columns in the hypertable.mapreduce.input.scan_spec.columns Hadoop configuration property. For example,
$ hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u*.jar \
-libjars /opt/hypertable/current/lib/java/hypertable-*.jar,/opt/hypertable/current/lib/java/libthrift-*.jar \
-Dhypertable.mapreduce.namespace=test \
-Dhypertable.mapreduce.input.table=wikipedia \
-Dhypertable.mapreduce.input.scan_spec.columns="id,title" \
-mapper /bin/cat -reducer /bin/cat \
-inputformat org.hypertable.hadoop.mapred.TextTableInputFormat \
-input wikipedia -output wikipedia2
To filter the input table with a timestamp predicate, specify the timestamp predicate in the hypertable.mapreduce.input.scan_spec.timestamp_interval Hadoop configuration property. The timestamp predicate is specified using the same format as the timestamp predicate in the WHERE clause of the SELECT statement, as illustrated in the following examples:
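For instance, the following predicate strings are illustrative of that format (the dates are the example values used elsewhere in this section):

TIMESTAMP >= 2011-11-21
2010-08-01 <= TIMESTAMP <= 2010-08-09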
To preserve the timestamps from the input table, set the hypertable.mapreduce.input.include_timestamps Hadoop configuration property to true. This will cause the TextTableInputFormat class to produce an additional field (field 0) that represents the timestamp as nanoseconds since the epoch. The following example illustrates how to pass a timestamp predicate into a Hadoop Streaming MapReduce program.
$ hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u*.jar \
-libjars /opt/hypertable/current/lib/java/hypertable-*.jar,/opt/hypertable/current/lib/java/libthrift-*.jar \
-Dhypertable.mapreduce.namespace=test \
-Dhypertable.mapreduce.input.table=wikipedia \
-Dhypertable.mapreduce.output.table=wikipedia2 \
-Dhypertable.mapreduce.input.scan_spec.columns="id,title" \
-Dhypertable.mapreduce.input.scan_spec.timestamp_interval="2010-08-01 <= TIMESTAMP <= 2010-08-09" \
-Dhypertable.mapreduce.input.include_timestamps=true \
-mapper /bin/cat -reducer /bin/cat \
-inputformat org.hypertable.hadoop.mapred.TextTableInputFormat \
-outputformat org.hypertable.hadoop.mapred.TextTableOutputFormat \
-input wikipedia -output wikipedia2
To restrict the MapReduce to a specific row interval of the input table, a row range can be specified with the hypertable.mapreduce.input.scan_spec.row_interval Hadoop configuration property. The row interval predicate is specified using the same format as the row interval in the WHERE clause of the SELECT statement, as illustrated in the following examples:
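For instance, the following are illustrative row interval predicates in that format (the row keys are examples only, taken from the table and command in this section):

Dog <= ROW < Kitchen
Dog <= ROW <= Kitchen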
The following example illustrates how a row interval is passed into a Hadoop Streaming MapReduce program.
$ hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u*.jar \
-libjars /opt/hypertable/current/lib/java/hypertable-*.jar,/opt/hypertable/current/lib/java/libthrift-*.jar \
-Dhypertable.mapreduce.namespace=test \
-Dhypertable.mapreduce.input.table=wikipedia \
-Dhypertable.mapreduce.output.table=wikipedia2 \
-Dhypertable.mapreduce.input.scan_spec.columns="id,title" \
-Dhypertable.mapreduce.input.scan_spec.row_interval="Dog <= ROW <= Kitchen" \
-mapper /bin/cat -reducer /bin/cat \
-inputformat org.hypertable.hadoop.mapred.TextTableInputFormat \
-outputformat org.hypertable.hadoop.mapred.TextTableOutputFormat \
-input wikipedia -output wikipedia2
A subset of the WHERE clause options of the HQL SELECT statement can be specified by supplying the options with the hypertable.mapreduce.input.scan_spec.options Hadoop configuration property. The supported options include MAX_VERSIONS, KEYS_ONLY, and CELL_LIMIT, as used in the example below.
The following example illustrates how to pass options to a Hadoop Streaming MapReduce program.
$ hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u*.jar \
-libjars /opt/hypertable/current/lib/java/hypertable-*.jar,/opt/hypertable/current/lib/java/libthrift-*.jar \
-Dhypertable.mapreduce.namespace=test \
-Dhypertable.mapreduce.input.table=wikipedia \
-Dhypertable.mapreduce.output.table=wikipedia2 \
-Dhypertable.mapreduce.input.scan_spec.options="MAX_VERSIONS 1 KEYS_ONLY CELL_LIMIT 2" \
-mapper /bin/cat -reducer /bin/cat \
-inputformat org.hypertable.hadoop.mapred.TextTableInputFormat \
-outputformat org.hypertable.hadoop.mapred.TextTableOutputFormat \
-input wikipedia -output wikipedia2
Regular Expression Filtering
Hypertable supports filtering of data using regular expression matching on the row key, column qualifiers and value. Hypertable uses RE2 for regular expression matching; the complete supported syntax can be found in the RE2 Syntax document.
Example
In this example we'll use a DMOZ dataset which contains title, description and a bunch of topic tags for a set of URLs. The domain components of the URL have been reversed so that URLs from the same domain sort together. In the schema, the row key is a URL and the title, description and topic are column families. Here's a small sample from the dataset:
com.awn.www Title Animation World Network
com.awn.www Description Provides information resources to the international animation community. Features include searchable database archives, monthly magazine, web animation guide, the Animation Village, discussion forums and other useful resources.
com.awn.www Topic:Arts
com.awn.www Topic:Animation
Exit the hypertable shell and download the dataset, which is in .tsv.gz format and can be loaded directly into Hypertable without unzipping:
hypertable> quit
$ wget http://cdn.hypertable.com/pub/dmoz.tsv.gz
Jump back into the hypertable shell and create the dmoz table as follows:
$ /opt/hypertable/current/bin/ht shell
hypertable> USE "/";
hypertable> CREATE TABLE dmoz (Description, Title, Topic, ACCESS GROUP topic (Topic));
hypertable> LOAD DATA INFILE "dmoz.tsv.gz" INTO TABLE dmoz;
Loading 412,265,627 bytes of input data...
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Load complete.
Elapsed time: 242.26 s
Avg value size: 15.09 bytes
Avg key size: 24.76 bytes
Throughput: 6511233.28 bytes/s (1701740.27 bytes/s)
Total cells: 39589037
Throughput: 163414.69 cells/s
Resends: 144786
In the following queries we limit the number of rows returned to 2 for brevity. Suppose you want a subset of the URLs from the domain inria.fr where the first component of the domain doesn't start with the letter 'a'; you could run:
hypertable> SELECT Title FROM dmoz WHERE ROW REGEXP "fr\.inria\.[^a]" LIMIT 2 REVS=1 KEYS_ONLY;
fr.inria.caml
fr.inria.caml/pub/docs/oreilly-book
To look at all topics which start with write (case insensitive):
hypertable> SELECT Topic:/(?i)^write/ FROM dmoz LIMIT 2;
13.141.244.204/writ_den Topic:Writers_Resources
ac.sms.www/clubpage.asp?club=CL001003004000301311 Topic:Writers_Resources
The next example shows how to query for data where the description contains the word game followed by either foosball or halo:
hypertable> SELECT CELLS Description FROM dmoz WHERE VALUE REGEXP "(?i:game.*(foosball|halo)\s)" LIMIT 2 REVS=1;
com.armchairempire.www/Previews/PCGames/planetside.htm Description Preview by Mr. Nash. "So, on the one side the game is sounding pretty snazzy, on the other it sounds sort of like Halo at its core."
com.digitaldestroyers.www Description Video game fans in Spartanburg, South Carolina who like to get together and compete for bragging rights. Also compete with other Halo / Xbox fan clubs.
Atomic Counters
Column families can optionally act as atomic counters by supplying the COUNTER option in the column specification of the CREATE TABLE command. Counter columns are accessed using the same methods as other columns. However, to modify the counter, the value must be formatted specially, as described in the following table.
Value Format | Description
['+'] n | Increment the counter by n
'-' n | Decrement the counter by n
'=' n | Reset the counter to n
Example
In this example we create a table of counters called counts that contains a single column family url that acts as an atomic counter for urls. By convention, the row key is the URL with the domain name reversed (so that URLs from the same domain sort next to each other) and the column qualifier is the hour in which the "hit" occurred. The table is created with the following HQL:
hypertable> use"/";
hypertable> createtable counts ( url COUNTER );
Let's say we've accumulated url "hit" occurrences in the following .tsv file:
#row column value
org.hypertable.www/ url:2010-10-26_09 +1
org.hypertable.www/ url:2010-10-26_09 +1
org.hypertable.www/download.html url:2010-10-26_09 +1
org.hypertable.www/documentation.html url:2010-10-26_09 +1
org.hypertable.www/download.html url:2010-10-26_09 +1
org.hypertable.www/about.html url:2010-10-26_09 +1
org.hypertable.www/ url:2010-10-26_09 +1
org.hypertable.www/ url:2010-10-26_10 +1
org.hypertable.www/about.html url:2010-10-26_10 +1
org.hypertable.www/ url:2010-10-26_10 +1
org.hypertable.www/download.html url:2010-10-26_10 +1
org.hypertable.www/download.html url:2010-10-26_10 +1
org.hypertable.www/documentation.html url:2010-10-26_10 +1
org.hypertable.www/ url:2010-10-26_10 +1
If we were to load this file with LOAD DATA INFILE into the counts table, a subsequent select would yield the following output:
hypertable> select * from counts;
org.hypertable.www/ url:2010-10-26_09 3
org.hypertable.www/ url:2010-10-26_10 3
org.hypertable.www/about.html url:2010-10-26_09 1
org.hypertable.www/about.html url:2010-10-26_10 1
org.hypertable.www/documentation.html url:2010-10-26_09 1
org.hypertable.www/documentation.html url:2010-10-26_10 1
org.hypertable.www/download.html url:2010-10-26_09 2
org.hypertable.www/download.html url:2010-10-26_10 2
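Counter cells can also be decremented or reset by writing values in the '-' n and '=' n formats from the table above. The following INSERT is a hypothetical illustration against the counts table; the values are made up for this example and are not part of the tutorial transcript:

hypertable> INSERT INTO counts VALUES
("org.hypertable.www/", "url:2010-10-26_09", "-1"),
("org.hypertable.www/about.html", "url:2010-10-26_09", "=5");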
Group Commit
Updates are carried out by the RangeServers through the following steps:
1. Write the update to the commit log (in the DFS)
2. Sync the commit log (in the DFS)
3. Populate in-memory data structure with the update
Under high concurrency, step #2 can become a bottleneck. Distributed filesystems such as HDFS can typically handle only a small number of sync operations per second. The Group Commit feature solves this problem by delaying updates, grouping them together, and carrying them out in a batch on some regular interval.
A table can be configured to use group commit by supplying the GROUP_COMMIT_INTERVAL option in the CREATE TABLE statement. The GROUP_COMMIT_INTERVAL option tells the system that updates to this table should be carried out with group commit and also specifies the commit interval in milliseconds. The interval is constrained by the value of the config property Hypertable.RangeServer.CommitInterval, which acts as a lower bound (default is 50 ms). The value specified for GROUP_COMMIT_INTERVAL will get rounded up to the nearest multiple of this property value; for example, with the default 50 ms commit interval, a GROUP_COMMIT_INTERVAL of 120 would be rounded up to 150. The following is an example CREATE TABLE statement that creates a table counts set up for group commit operation.
Example
hypertable> CREATE TABLE counts (
url,
domain
) GROUP_COMMIT_INTERVAL=100;