Overlapping Experiment Infrastructure:
More, Better, Faster Experimentation
Diane Tang, Ashish Agarwal, Deirdre O’Brien, Mike Meyer
Google, Inc.
Mountain View, CA
[diane,agarwal,deirdre,mmm]@google.com
ABSTRACT
At Google, experimentation is practically a mantra; we evaluate
almost every change that potentially affects what our users experience. Such changes include not only obvious user-visible changes such as modifications to a user interface, but also more subtle changes such as different machine learning algorithms that might affect ranking or content selection. Our insatiable appetite for experimentation has led us to tackle the problems of how to run more experiments, how to run experiments that produce better decisions, and how to run them faster. In this paper, we describe Google's overlapping experiment infrastructure that is a key component to solving these problems. In addition, because an experiment infrastructure alone is insufficient, we also discuss the associated tools and educational processes required to use it effectively. We conclude by
describing trends that show the success of this overall experimental
environment. While the paper specifically describes the experiment
system and experimental processes we have in place at Google, we
believe they can be generalized and applied by any entity interested
in using experimentation to improve search engines and other web
applications.
Categories and Subject Descriptors
G.3 [Probability and Statistics]: Experimental Design—controlled
experiments, randomized experiments, A/B testing; I.2.6 [Learning]:
[real-time, automation, causality]
General Terms
Measurement, Design, Experimentation, Human Factors, Performance
Keywords
Controlled experiments, A/B testing, Website Testing, MultiVariable Testing
1. INTRODUCTION
Google is a data-driven company, which means that decision-makers in the company want empirical data to drive decisions about
whether a change should be launched to users. This data is most
commonly gathered by running live traffic experiments. In the context of the web, an experiment consists of a representative segment
of traffic (i.e., incoming requests) and a change in what is served
to that segment of traffic relative to a control. Both user-visible
changes (e.g., changing the background color of the top ads) and
non-visible changes, such as testing a new algorithm for predicting
the clickthrough rate (CTR) of ads, can be tested via experimentation.
One challenge for supporting this data-driven methodology is
keeping up with the rate of innovation. We want to be able to experiment with as many ideas as possible; limiting our rate of change
by the number of simultaneous experiments we can run is simply
not acceptable. We use experiments to test out new features and
to explore the space around existing features. For the latter, experiments are used to learn about user response and optimize what
is already running. Imagine that the way we determine what to
show on a search results page is parameterized, both in terms of
presentation and algorithms. Experiments explore this parameter
space by setting different values for the parameters, and we can use
the measured impact (with regards to user experience, revenue, and
other metrics) to determine where to move in this space to achieve
a better result.
While evaluating user response to UI changes is the typical use of
experiments, it is worth noting that experimentation is also needed
for testing algorithm changes. For example, suppose some team
wants to test a new machine learning algorithm for predicting the
CTR of ads, or even to test variations of an existing algorithm (e.g.,
by adjusting the learning or shrinkage rate) [1, 10]. While offline
evaluation can be useful for narrowing down the space of options to
test, ultimately these options must be tested on live traffic in order
to evaluate how well a particular parameterization of an algorithm
works in practice (changes may impact user behavior and alter the
traffic pattern itself, which would not be caught in offline evaluation). Thus, the evaluation of such machine learning algorithms is
limited as much by having the space to experiment as by thinking
of alternatives to try.
The design goals for our experiment infrastructure are therefore:
more, better, faster.
More: We need scalability to run more experiments simultaneously. However, we also need flexibility: different experiments
need different configurations and different sizes to be able to
measure statistically significant effects. Some experiments only
need to change a subset of traffic, say Japanese traffic only, and
need to be sized appropriately. Other experiments may change
all traffic and produce a large change in metrics, and so can be
run on less traffic.
Better: Invalid experiments should not be allowed to run on live traffic. Valid but bad experiments (e.g., buggy or unintentionally producing really poor results) should be caught quickly and
disabled. Standardized metrics should be easily available for
all experiments so that experiment comparisons are fair: two
experimenters should use the same filters to remove robot traffic [7] when calculating a metric such as CTR.
Faster: It should be easy and quick to set up an experiment;
easy enough that a non-engineer can do so without writing any
code. Metrics should be available quickly so that experiments
can be evaluated quickly. Simple iterations should be quick to
do. Ideally, the system should not just support experiments, but
also controlled ramp-ups, i.e., gradually ramping up a change
to all traffic in a systematic and well-understood way.
To meet these design goals, we need not only an experiment infrastructure for running more experiments, but also tools and educational processes to support better and faster experimentation.
For the experiment infrastructure, the obvious solutions are either to have a single layer of experiments or to have multi-factorial
experiment design. A single layer means that every query is in at
most one experiment, which is easy to use and flexible, but simply
insufficiently scalable. Multi-factorial experimental design is common in the statistical literature [3], where each parameter (factor) can
be experimented on independently; each experimental value for a
parameter overlaps with every other experiment value for all of the
other parameters. Effectively, each query would be in N experiments simultaneously, where N equals the number of parameters.
While this approach is backed by years of research and practice,
it is impractical in Google’s system where we have thousands of
parameters that cannot necessarily be varied independently. One
simple example is a two parameter system, one for the background
color of a web page and another for the text color. While “blue”
may be a valid value for both, if both parameters are blue at the
same time, then the page will be unreadable.
The solution we propose in this paper is to partition the parameters into subsets, and each subset contains parameters that cannot
be varied independently of each other. A subset is associated with
a layer that contains experiments, and traffic diversion into experiments in different layers is orthogonal. Each query here would be
in N experiments, where N equals the number of layers. While this
solution may not sound novel in retrospect, we can find no published papers with this solution.
In this paper, we discuss this layered overlapping experiment infrastructure as well as the associated tools and processes we have
put into place to support more, better, and faster experimentation,
and show results that support how well we have met those goals.
Note that while this paper is specific to Google web search, the
more general problem of supporting massive experimentation applies to any company or entity that wants to gather empirical data
to evaluate changes. The responses and data gathered may differ,
but the approach outlined in this paper should be generalizable.
2. RELATED WORK
Work related to experimentation falls roughly in three areas. The
first area is the copious statistical literature on multi-factorial experiments, which we briefly discuss in Section 4.
The second area is the growing body of work on how to run web
experiments. Kohavi et al. wrote an outstanding paper in KDD that provides both a survey and a tutorial on how to run controlled experiments for the web [7]. There are several follow-up papers that describe various pitfalls when running and analyzing these experiments [4, 9]. Krieger has a presentation on how to run A-B tests [8]
and Google WebSite Optimizer helps web site designers run their own A-B tests [5].
Figure 1: A sample flow of a query through multiple binaries. Information (and time) flows from left to right.
In general, these papers focus more on how
to design and evaluate web-based experiments. The first Kohavi
paper is perhaps the most relevant, since several design considerations about how to build an infrastructure for running experiments are discussed, including how to divert traffic into experiments, consistency across queries, interactions, etc. However, none of these
papers really addresses the issues involved in scaling an experiment
infrastructure and the overall experimentation environment to support running more experiments more quickly and more robustly.
The final area of related work is on interleaved experiments [6],
which is focused on a specific type of experiment design (intra-query) used for evaluating ranking changes. This type of experiment may easily be run within the experimental environment that
we describe in this paper.
3. BACKGROUND
Before we discuss experimentation at Google, we first describe
the environment, since it presents both opportunities and constraints for the design of our infrastructure. Note that in this paper, when
we refer to Google or Google's serving infrastructure, we are referring to the web search infrastructure only, and not Google Apps,
Android, Chrome, etc.
At a high level, users interact with Google by sending requests
for web pages via their browser. For search results pages, the request comes into Google's serving infrastructure and may hit multiple binaries (i.e., programs running on a server machine) before returning the results page to the user. For example, there may be one binary that is responsible for determining which organic search results are most relevant to the query, another for determining which ads are most relevant to the query, and a third for taking both organic and ad results and formatting the resulting web page to return to the user (see Figure 1). On the one hand, this modularization allows us to improve latency (non-dependent processes can run in parallel), clearly separate organic results from ads so that one cannot influence the other, and innovate faster (each binary can evolve separately and is a smaller piece to test, allowing for a quicker release cycle). On the other hand, this modularization can require more careful design if every request is allowed to be in at most one experiment. Possible problems include starvation (upstream binaries can starve downstream binaries by allocating all requests to be
in experiments prior to the requests being sent downstream) and
bias (e.g., an upstream binary may run experiments on all English
traffic, leaving downstream binaries with non-English traffic).
Each binary has an associated binary push and data push. The
binary push is when new code (bug fixes, performance enhancements, new features, etc.) is incorporated and rolled out to handle
live serving; it happens periodically (e.g., weekly). The data push
happens more frequently (i.e., on demand or every few hours), and
involves pushing updated data to the associated binary. One type
of data included in a data push has to do with the default values for
parameters that configure how the binary runs. For example, the binary that controls how results are presented may have a parameter that determines the background color of the top ads block. As another example, a binary that predicts the CTR of ads might have a parameter that controls the rate of learning or shrinkage, i.e., a parameter that controls the step size the algorithm takes during each
iteration, impacting both the convergence and local optima chosen.
Binaries may have several hundred parameters. New features will
likely add one or more parameters: in the simplest case, a single
parameter to turn on or off the new feature, and in more complex
cases, there may be parameters that govern how the new feature is
formatted, numeric thresholds to determine when the new feature
is shown, etc. Having separate binary and data pushes means that if
we can find the right split, we can take advantage of having both a
slow and a fast path for making changes to serving (slow for code,
fast for new values for parameters).
An experiment in web search diverts some subset of the incoming queries to an alternate processing path and potentially changes what is served to the user. A control experiment diverts some subset of incoming queries, but does not change what is served to the
user. We use the data push to specify and configure experiments.
Thus, in the data push, there is a file (as discussed above) for specifying the default values for the parameters for a binary. There
is another file for experiments that specifies how to change what
is served to the user by providing alternate values for parameters.
Experiments need only specify the parameters that they change;
for all other parameters, the default values are used. For example, one simple experiment would change only one parameter, the
background color of the top ads, from yellow (the default value) to
pink.
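To make the two data files concrete, here is a minimal Python sketch; the parameter names, values, and experiment id are illustrative assumptions (the paper does not specify Google's actual file format). It shows how an experiment lists only the parameters it overrides, with all others falling back to the defaults.

```python
# Hypothetical sketch of the two kinds of data-push content described above;
# parameter names, values, and the experiment id are illustrative only.

DEFAULT_PARAMS = {
    "top_ads_background_color": "yellow",
    "ads_ctr_learning_rate": 0.05,
}

EXPERIMENTS = [
    # A simple experiment: change only the top ads background color to pink.
    {"id": 1234, "params": {"top_ads_background_color": "pink"}},
]

def resolve_params(experiment):
    """Start from the binary's defaults and apply the experiment's overrides, if any."""
    params = dict(DEFAULT_PARAMS)
    if experiment is not None:
        params.update(experiment["params"])
    return params

print(resolve_params(EXPERIMENTS[0]))  # background color becomes pink, learning rate stays default
```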
In addition to specifying how serving is changed via alternate parameter values, experiments must also specify what subset of traffic
is diverted. One easy way to do experiment diversion is random
traffic, which is effectively flipping a coin on every incoming query.
One issue with random traffic experiment diversion is that if the
experiment is a user-visible change (e.g., changing the background
color), the queries from a single user may pop in and pop out of
the experiment (e.g., toggle between yellow and pink), which can
be disorienting. Thus, a common mechanism used in web experimentation is to use the cookie as the basis of diversion; cookies
are used by web sites to track unique users. In reality, cookies are
machine/browser specific and easily cleared; thus, while a cookie
does not correspond to a user, it can be used to provide a consistent user experience over successive queries. For experiment diversion, we do not divert on individual cookies, but rather on a cookie
mod: given a numeric representation of a cookie, take that number
modulo 1000, and all cookies whose mod equals 42, for example,
would be grouped together for experiment diversion. Assuming
cookie assignment is random, any cookie mod should be equivalent to any other cookie mod. Cookie mods are also easy to specify
in an experiment configuration and make it easy to detect conflicts:
experiment 1 may use cookie mods 1 and 2, while experiment 2
may use cookie mods 3 and 4. Those two experiments would be
the same size and, in theory, have comparable traffic.
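As a rough illustration of cookie-mod diversion, here is a minimal Python sketch; the hash function is an assumption (the text only says a numeric representation of the cookie is taken modulo 1000), and the claimed mods are illustrative.

```python
import hashlib

def cookie_mod(cookie: str, buckets: int = 1000) -> int:
    """Map a cookie to one of 1000 mods via a hash; stands in for 'numeric representation % 1000'."""
    digest = hashlib.sha256(cookie.encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets

def is_diverted(cookie: str, claimed_mods: set[int]) -> bool:
    """A request is diverted to an experiment if its cookie mod is one the experiment claims."""
    return cookie_mod(cookie) in claimed_mods

# Experiment 1 claims mods {1, 2}; experiment 2 claims mods {3, 4}.
# The sets are disjoint, so no cookie is ever in both experiments.
print(is_diverted("some-cookie-value", {1, 2}))
```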
Configuring experiments in data files makes experiments easy
and fast to create: the data files are human readable and easy to
edit, they are not code so that experiments can be created by non-engineers, and they are pushed more frequently than code, allowing for a “fast path” for experiment creation involving existing parameters only.
Prior to developing our overlapping experiment infrastructure,
we used a basic single layer infrastructure. In such an infrastructure, each query is in at most one experiment. Cookie-mod-based
experiments were diverted first, followed by random-traffic based
experiments. Upstream binaries got “first dibs” on a query, and
if the upstream binaries were running enough experiments, then
downstream binaries could be starved for traffic to run experiments
on. While there were several issues (including having to solve the
starvation and bias issues mentioned above), this single layer infrastructure did meet a few of our design goals: it was easy to use and reasonably flexible. However, given Google's data-driven culture, the single layer approach is not sufficiently scalable: we cannot run enough experiments fast enough.
4. OVERLAPPING EXPERIMENT INFRASTRUCTURE
In this section, we describe the overlapping experiment infrastructure, which tries to keep the advantages of the single layer system (ease of use, speed) while increasing scalability, flexibility and
robustness. We also enable the gradual ramping-up of launches in
a controlled, well-defined fashion.
The obvious statistical solution is a multi-factorial system, where
each factor corresponds to a changeable parameter in the system.
Effectively, a request would be in N simultaneous experiments,
where each experiment would modify a different parameter and
N equals the number of parameters. Multi-factorial experiments
are backed by copious theory and practice [3]. However, a multi-factorial system is simply not feasible in our complex environment,
since not all parameters are independent and not all values that we
may want to test for a parameter work with the values for another
parameter (e.g., pink text color on a pink background). In other
words, Google has to always serve a readable, working web page.
Given this constraint, our main idea is to partition parameters
into N subsets. Each subset is associated with a layer of experiments. Each request would be in at most N experiments simultaneously (one experiment per layer). Each experiment can only
modify parameters associated with its layer (i.e., in that subset),
and the same parameter cannot be associated with multiple layers.
The obvious question is how to partition the parameters. First,
we can leverage the modularization into multiple binaries: parameters from different binaries can be in different subsets (which solves the starvation and bias issues mentioned above). However, all parameters for a given binary do not need to be in a single subset: we can further partition the parameters within a binary either by examination (i.e., understanding which parameters cannot be varied
independently of one another) or by examining past experiments
(i.e., empirically seeing which parameters were modified together
in previous experiments). Looking at Figure 1, we could have one
or more layers each for the web server, the server for organic search
results, and the server for ads.
In fact, the system that we designed is more flexible than simply partitioning the parameters into subsets that are then associated with layers. To explain the flexibility, we introduce several definitions. Working within the space of incoming traffic and the system
parameters, we have three key concepts:
A domain is a segmentation of traffic.
A layer corresponds to a subset of the system parameters.
An experiment is a segmentation of traffic where zero or more
system parameters can be given alternate values that change
how the incoming request is processed.
We can nest domains and layers. Domains contain layers. Layers contain experiments, and can also contain domains. Nesting a domain within a layer allows for the subset of parameters associated with the layer to be partitioned further within that nested domain.
Figure 2: A diagram of (a) a basic overlapping set-up with three layers, (b) a set-up with both non-overlapping and overlapping domains, (c) a set-up with non-overlapping, overlapping, and launch domains, and (d) a complex set-up with multiple domains. An incoming request would correspond to a vertical slice through the configuration; i.e., an incoming request in (b) would either be in a single experiment (in the non-overlapping domain) or in at most three experiments, one each for the UI layer, search results layer, and ads results layer.
To get us started, we have the default domain and layer that contain
both all traffic and all parameters. Within the default domain and
layer, we could, for example:
Simply segment the parameters into three layers (Figure 2a).
In this case, each request would be in at most three experiments
simultaneously, one for each layer. Each experiment could only
modify the parameters corresponding to that layer.
First segment traffic into two domains. One domain could be a
domain with a single layer (the non-overlapping domain), and
the other domain would be the overlapping domain with three
layers (Figure 2b). In this case, each request would first be
assigned to either the non-overlapping or overlapping domain.
If the request was in the non-overlapping domain, then the request would be in at most one experiment (and could change
any parameter in the entire space of parameters). If the request
was in the overlapping domain, then the request would be in
at most three experiments, one for each layer, and each experiment could only use the parameters corresponding to that layer.
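A minimal sketch of the Figure 2(b) arrangement, using illustrative Python structures (the parameter names and the concrete configuration format are assumptions, not the actual system):

```python
# Non-overlapping domain: a single layer that may use any parameter in the system.
NON_OVERLAPPING_DOMAIN = {
    "layers": [
        {"name": "all parameters",
         "params": {"ui_background_color", "results_ranking_variant", "ads_ctr_learning_rate"}},
    ],
}

# Overlapping domain: three layers, each owning a disjoint subset of the parameters.
OVERLAPPING_DOMAIN = {
    "layers": [
        {"name": "UI layer",             "params": {"ui_background_color"}},
        {"name": "search results layer", "params": {"results_ranking_variant"}},
        {"name": "ads results layer",    "params": {"ads_ctr_learning_rate"}},
    ],
}

def max_experiments_per_request(domain) -> int:
    """A request assigned to a domain is in at most one experiment per layer of that domain."""
    return len(domain["layers"])

print(max_experiments_per_request(NON_OVERLAPPING_DOMAIN))  # 1
print(max_experiments_per_request(OVERLAPPING_DOMAIN))      # 3
```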
While this nesting may seem complex, it affords several advantages. First, having a non-overlapping domain allows us to run
experiments that really need to change a wide swath of parameters
that might not normally be used together. Next, it allows us to have
different partitionings of parameters; one could imagine three domains, one non-overlapping, one overlapping with one partitioning
of the parameters, and a third overlapping domain with a different
parameter partitioning. Finally, the nesting allows us to more efficiently use space, depending on which partitionings are most commonly used, and which cross-layer parameter experiments are most
commonly needed. Note that it is easy to move currently unused
parameters from one layer to another layer, as long as one checks
to make sure that the parameters can safely overlap with the parameters in the original layer assignment. (Sociologically, we have observed that if layers have semantically meaningful names, e.g., the “Ad Results Layer” and the “Search Results Layer”, engineers tend to be reluctant to move flags in ways that violate that semantic meaning. Meaningful names can help with robustness by making it more obvious when an experiment configuration is incorrect, but they can also limit the flexibility that engineers will take advantage of.) Also note that to ensure that the experiments in different layers are independently diverted, for cookie-mod based experiments, instead of mod = f(cookie) % 1000, we use mod = f(cookie, layer) % 1000. While this nesting
complexity does increase flexibility, there is a cost to changing the
configuration, especially of domains: changing how traffic is allocated to domains changes what traffic is available to experiments.
For example, if we change the non-overlapping domain from 10%
of cookie mods to 15%, the additional 5% of cookie mods comes
from the overlapping domain, and cookies that were seeing experiments from each layer in the overlapping domain are now seeing experiments from the non-overlapping domain.
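The layer-aware hashing mentioned above can be sketched as follows; the specific hash function is an assumption, the point being only that including the layer id makes the mods in different layers (approximately) independent.

```python
import hashlib

def layer_mod(cookie: str, layer_id: str, buckets: int = 1000) -> int:
    """Sketch of mod = f(cookie, layer) % 1000: hashing the layer id together with the
    cookie decorrelates the mod assignments across layers, so experiments in different
    layers overlap (approximately) orthogonally."""
    digest = hashlib.sha256(f"{layer_id}:{cookie}".encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets

# The same cookie generally lands in different mods in different layers.
print(layer_mod("some-cookie", "ui_layer"))
print(layer_mod("some-cookie", "ads_layer"))
```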
An additional concept is that of launch layers. Launch layers
differ from the experiment layers discussed up to this point in several key ways:
Launch layers are always contained within the default domain
(i.e., they run over all traffic).
Launch layers are a separate partitioning of the parameters, i.e.,
a parameter can be in at most one launch layer and at most one
“normal” layer (within a domain) simultaneously.
In order to make this overlap of parameters between launch
and normal layers work, experiments within launch layers have
slightly different semantics. Specifically, experiments in launch
layers provide an alternative default value for parameters. In
other words, if no experiments in the normal experiment layers override a parameter, then in the launch layer experiment,
the alternate default value specified is used and the launch layer
experiment behaves just like a normal experiment. However,
if an experiment in the normal experiment layer does override
this parameter, then that experiment overrides the parameter’s
default value, regardless of whether that value is specified as
the system default value or in the launch layer experiment.
Examples of launch layers are shown in Figure 2c,d. Defining
launch layers in this way allows us to gradually roll out changes to
all users without interfering with existing experiments and to keep
track of these roll-outs in a standardized way. The general usage of
launch layers is to create a new launch layer for each launched feature and to delete that layer when the feature is fully rolled out (and the new parameter values are rolled into the defaults). Finally, because experiments in launch layers are generally larger, they can be
used to test for interactions between features. While in theory we
can test for interactions in the normal experiment layers (assuming
that we either set up the experiments manually if the parameters
are in the same layer or look at the intersection if the parameters
are in different layers), because experiments are smaller in the normal layers, the intersection is smaller and therefore interactions are
harder to detect.
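The parameter-resolution precedence described above can be summarized in a small sketch (a simplification with hypothetical parameter names; the real configuration system is not spelled out in the paper):

```python
def effective_value(param, system_defaults, launch_overrides, normal_overrides):
    """An experiment in a normal layer wins over a launch-layer experiment, which in
    turn supplies an alternate default that wins over the system default."""
    if param in normal_overrides:       # normal-layer experiment overrides everything
        return normal_overrides[param]
    if param in launch_overrides:       # launch-layer experiment: alternate default value
        return launch_overrides[param]
    return system_defaults[param]       # otherwise, the system default

defaults = {"top_ads_background_color": "yellow"}
launch   = {"top_ads_background_color": "blue"}   # feature being ramped up to all traffic
print(effective_value("top_ads_background_color", defaults, launch, {}))            # "blue"
print(effective_value("top_ads_background_color", defaults, launch,
                      {"top_ads_background_color": "pink"}))                        # "pink"
```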
Recall that both experiments and domains operate on a segment
of traffic (we call this traffic the “diverted” traffic). Diversion types
and conditions are two concepts that we use in order to determine
what that diverted segment of traffic is.
We have already described two diversion types earlier in Section 3, namely cookie mods and random traffic. Also discussed above is how cookie mod diversion changes with layers to also incorporate the layer id (mod = f(cookie, layer) % 1000) to ensure
orthogonality between layers. Two other diversion types that we
support are user-id mods and cookie-day mods. User-id mods are
like cookie mods, except that we use the signed-in user id instead
of the cookie. For cookie-day mods, we take the mod of the cookie
combined with the day, so that for a given day, a cookie is in an
experiment or not, but the set of cookies in an experiment changes
from day-to-day. In all cases, there is no way to configure an experiment so that a specific cookie or user gets diverted to it. Similarly,
analyses always use aggregates over groups of queries, cookies,
or users. Also note that while we currently support four diversion
types, we could support other diversion types, e.g., by hashing the
query string.
The main reasons for different diversion types are to ensure consistency across successive queries and to potentially uncover any
learning effects over time. Given these reasons, we divert traffic
by diversion type in a particular order: user id, cookie, cookie-day,
and finally random traffic. Once an event meets the criteria for a
particular experiment in one diversion type, it is not considered by
the remaining diversion types (see Figure 3). While this order ensures maximal consistency, one downside is, for example, that a
1% random traffic experiment gets fewer requests than a 1% cookie
mod experiment in the same layer. At the extreme, we can see the
same starvation effect that we used to see between upstream and
downstream binaries. In practice, layers tend to have a predominant diversion type, and experiments and controls must have the
same diversion type. The main impact is that different diversion
types require different experiment sizes (see Section 5.2.1).
After selecting a subset of traffic by diversion type, conditions
provide better utilization of this traffic by only assigning specific
events to an experiment or domain. For example, an experiment
that only changes what gets served in queries coming from Japan
may include a “Japan” condition. We support conditions based on
country, language, browser, etc. With conditions, an experiment
that only needs Japanese traffic can use the same cookie mod as
another experiment that only needs English traffic, for example.
Another use of conditions is to canary new code (the code itself is
pushed via a binary push), i.e., test new code on a small amount of
traffic and make sure that the new code is not buggy and works as
expected before running it on more traffic (the canary is checked
for bugs via error logs from the binaries and real-time monitoring
of metrics). To support this use case, we provide conditions based
on machine or datacenter to further restrict traffic to an experiment.
While canary experiments do not replace rigorous testing, they are a useful supplement, since they limit the potential damage while subjecting the new code to the variety of requests in live traffic that is hard to duplicate in a test.
Conditions are specified directly in the experiment (or domain)
configuration, allowing us to do conflict detection based on the data
files at experiment creation time. As mentioned in the diversion
type section, once an event meets the diversion-type criteria for
an experiment in one diversion type, it is not considered by the
remaining diversion types even if it does not meet the conditions
for assignment to an experiment in the earlier diversion type. The
importance of this is perhaps best explained by example. If we
take all traffic corresponding to a particular cookie mod, we have
an unbiased diversion. But consider the case of two experiments
on a given cookie mod – one conditioned on Japanese traffic, the
other conditioned on English traffic. The rest of the traffic (traffic
in languages other than Japanese and English) for the same cookie
mod will not be assigned to any experiment in the cookie diversion
type. To avoid biased traffic in subsequent diversion types, it is
important that this otherwise available traffic (events meeting the
diversion-type criteria but not meeting any experiment conditions)
not be assigned to experiments in subsequent diversion types. We
avoid this by tagging this unassigned traffic with a biased id.
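The ordering and biased-id behavior can be summarized in a small sketch; the request and experiment representation is an illustrative assumption (in the real system this logic lives in the shared diversion library and operates per layer).

```python
DIVERSION_ORDER = ["user_id_mod", "cookie_mod", "cookie_day_mod", "random"]

def assign(request, experiments_by_type):
    """experiments_by_type maps a diversion type to experiment configs, each exposing
    two illustrative predicates: matches_diversion (e.g., the request's mod is claimed)
    and matches_conditions (e.g., country == Japan)."""
    for dtype in DIVERSION_ORDER:
        claimed = [e for e in experiments_by_type.get(dtype, [])
                   if e["matches_diversion"](request)]
        if not claimed:
            continue                 # nothing in this type claims the request; try the next type
        for exp in claimed:
            if exp["matches_conditions"](request):
                return exp["id"]     # assigned to this experiment
        # Claimed by the diversion type but no conditions matched: tag as biased and
        # stop, so later diversion types do not receive this (now skewed) traffic.
        return "biased"
    return None                      # untouched traffic
```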
Figure 3 shows the logic for determining which domains, layers, and experiments a request is diverted into. All of this logic is
implemented in a shared library compiled into the binaries, so that
any changes (e.g., new types of conditions, new diversion types,
etc.) are incorporated into all binaries during their regular binary
pushes. Given the complexity of the implementation, a shared library allows for a consistent implementation across binaries and
means that new functionality automatically gets shared.
Given this infrastructure, the process of evaluating and launching
a typical feature might be something like:
Implement the new feature in the appropriate binary (including
code review, binary push, setting the default values, etc. as per
standard engineering practices).
Create a canary experiment (pushed via a data push) to ensure
that the feature is working properly. If not, then more code may
need to be written.
Create an experiment or set of experiments (pushed via a data
push) to evaluate the feature. Note that configuring experiments involves specifying the diversion type and associated diversion
parameters (e.g., cookie mods), conditions, and the affected
system parameters.
Evaluate the metrics from the experiment. Depending on the
results, additional iteration may be required, either by modifying or creating new experiments, or even potentially by adding
new code to change the feature more fundamentally.
If the feature is deemed launchable, go through the launch process: create a new launch layer and launch layer experiment, gradually ramp up the launch layer experiment, and then finally delete the launch layer and change the default values of the relevant parameters to the values set in the launch layer experiment.
5. TOOLS & PROCESSES
While having the overlapping infrastructure is necessary to be
able to scale to running more experiments faster and evaluate more
changes concurrently, the infrastructure by itself is not sufficient.
We also need tools, research, and educational processes to support
the faster rate of experimentation. In this section, we discuss several key tools and processes, and how they have helped us scale.
5.1 Tools
Data File Checks: One advantage of data files is that they can be
automatically checked for errors, which leads to fewer broken experiments being run. We have automated checks for syntax errors
(are all the required fields there and parseable), consistency and
constraints errors (i.e., uniqueness of id’s, whether the experiment
is in the right layer given the parameters used, whether the layer has
enough traffic to support the experiment, traffic constraint checks:
is the experiment asking for traffic already claimed by another experiment, etc.; note that these checks can get tricky as the set of
possible diversion conditions grows), and basic experiment design
checks (does the experiment have a control, is the control in the same layer as the experiment, does the control divert on the same set of traffic as the experiment, etc.).
Figure 3: Logic flow for determining which domains, layers, and experiments a query request is in.
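As one illustration of what such an automated check might look like, here is a minimal sketch of a traffic-conflict check; the configuration schema (mods, layer, diversion type, a single country condition) is an assumption for the example, not the real format.

```python
from itertools import combinations

def check_traffic_conflicts(experiments):
    """Flag pairs of experiments that claim the same mod in the same layer and diversion
    type unless a (single, illustrative) country condition makes them disjoint."""
    claims = {}  # (layer, diversion_type, mod) -> experiments claiming that slot
    for exp in experiments:
        for mod in exp["mods"]:
            claims.setdefault((exp["layer"], exp["diversion_type"], mod), []).append(exp)

    errors = []
    for (layer, dtype, mod), claimants in claims.items():
        for a, b in combinations(claimants, 2):
            disjoint = a.get("country") and b.get("country") and a["country"] != b["country"]
            if not disjoint:
                errors.append(f"experiments {a['id']} and {b['id']} both claim "
                              f"mod {mod} in layer {layer} ({dtype})")
    return errors
```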
Real-time Monitoring: We use real-time monitoring to capture basic metrics (e.g., CTR) as quickly as possible in order to determine
if there is something unexpected happening. Experimenters can set
the expected range of values for the monitored metrics (there are
default ranges as well), and if the metrics are outside the expected
range, then an automated alert is fired. Experimenters can then adjust the expected ranges, turn off their experiment, or adjust the
parameter values for their experiment. While real-time monitoring
does not replace careful testing and reviewing, it does allow experimenters to be aggressive about testing potential changes, since
mistakes and unexpected impacts are caught quickly.
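A minimal sketch of the kind of range check described above (the metric names, ranges, and alerting mechanism are illustrative assumptions):

```python
def check_metric(metric_name, observed_value, expected_ranges, default_range=(0.0, 1.0)):
    """Compare a freshly computed metric against the experimenter-supplied expected range
    (falling back to a default range) and return an alert message if it falls outside."""
    low, high = expected_ranges.get(metric_name, default_range)
    if not (low <= observed_value <= high):
        return f"ALERT: {metric_name}={observed_value:.4f} outside expected range [{low}, {high}]"
    return None

print(check_metric("ctr", 0.002, {"ctr": (0.01, 0.05)}))  # fires an alert
```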
5.2 Experiment Design & Sizing
Experiment design and sizing go beyond the basic checks performed on data files (e.g., that every experiment must have a control
that uses the same diversion conditions).
5.2.1 Sizing
As Kohavi mentions [7], experiments should be sized to have
enough statistical power to detect as small a metric change as considered interesting or actionable. In this section, we discuss both
how to size experiments and the dependency on experiment set-up,
as well as an associated experiment sizing tool.
Define the effective size of an experiment as
N = (1/queries_control + 1/queries_experiment)^(-1).
In practice we are interested in the individual terms queries_control and queries_experiment, but it is through N that these affect the variance of the relative metric estimates. To determine N correctly, we need to know:
Which metric(s) the experimenter cares about,
For each metric, what change the experimenter would like to detect (θ), e.g., the experimenter wants to be able to detect a 2% change in click through rate,
For each metric, the standard error for a one unit (i.e., N = 1) sample (s). Thus the standard error for an experiment of size N is s/√N.
Kohavi assumes that the experiment and control are the same size, i.e., queries_experiment = queries_control = 2N, and so queries_experiment must be greater than or equal to 16(s/θ)^2 to meet the detection requirement. The number 16 is determined both by the desired confidence level (1 − α, often 95%) and the desired statistical power (1 − β, often 80%).
One advantage of our overlapping set-up is that we can create a large control in each layer that can be shared among multiple experiments. If the shared control is much larger than the experiment (1/queries_control + 1/queries_experiment ≈ 1/queries_experiment), then we can use queries_experiment = N rather than 2N, leading to a smaller experiment size of queries_experiment = N ≥ 10.5(s/θ)^2 while gaining statistical power (1 − β) of 90% [2].
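The sizing arithmetic above can be captured in a few lines; the numbers in the usage example are purely illustrative, and the constants 16 and 10.5 are the ones quoted in the text for the stated confidence and power levels.

```python
import math

def required_experiment_queries(s, theta, shared_control=True):
    """s: standard error for a one-unit (N = 1) sample; theta: smallest change to detect.
    With equal experiment and control, queries_experiment >= 16 * (s / theta)**2;
    with a much larger shared control, queries_experiment ~= 10.5 * (s / theta)**2."""
    factor = 10.5 if shared_control else 16.0
    return math.ceil(factor * (s / theta) ** 2)

# Illustrative numbers only: detect a change of theta given a per-unit standard error s.
print(required_experiment_queries(s=3.0, theta=0.02, shared_control=True))
print(required_experiment_queries(s=3.0, theta=0.02, shared_control=False))
```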
The bigger issue we encountered in sizing experiments is how to estimate s, the standard error, especially since we use many ratio metrics, y/z (e.g., coverage, the percentage of queries on which we show ads (queries with an ad / total queries)). The problem arises when the unit of analysis is different than the experimental unit. For example, for coverage, the unit of analysis is a query, but for cookie-mod experiments, the experimental unit is a cookie (a sequence of queries), and we cannot assume that queries from the same user or cookie are independent. Our approach is to calculate s0, the standard error per experimental unit, and then write s in terms of s0; e.g., here s0 would be the standard error per cookie mod, and s = s0 √(avg queries per cookie mod). For ratio metrics, we calculate s0 using the delta method [11].
Figure 4: Slope for calculating s for coverage by diversion type.
Figure 4 shows the standard error against 1/√N for the coverage metric in different experiments—both cookie-mod and random traffic experiments. The slope of the line gives s. While the axis labels are elided for confidentiality, it is apparent that the slope of the cookie line is much steeper than the slope of the query line, i.e., to measure the same change in coverage with the same precision, a cookie-mod experiment will need to be larger than the corresponding random traffic experiment.
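For a ratio metric, the delta-method calculation of s0 can be sketched as follows; this is the textbook delta-method approximation [11] applied to per-cookie totals, not Google's exact implementation.

```python
import math
import numpy as np

def delta_method_s0(y, z):
    """Per-experimental-unit standard error s0 of a ratio metric sum(y)/sum(z)
    (e.g., coverage), where y[i], z[i] are the numerator and denominator totals
    for cookie i. The standard error over n cookies is then s0 / sqrt(n)."""
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    r = y.sum() / z.sum()                 # the ratio estimate
    cov = np.cov(y, z)                    # per-cookie sample variances and covariance
    var_unit = (cov[0, 0] - 2 * r * cov[0, 1] + r * r * cov[1, 1]) / (z.mean() ** 2)
    return math.sqrt(var_unit)

# s, the per-query standard error used for sizing, then follows as in the text:
# s = s0 * sqrt(average queries per cookie mod).
```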
Since s differs by both metric and by diversion type, rather than
having experimenters calculate these values individually, we provide a sizing tool. Experimenters specify the metric(s) and the
change they want to be able to detect, the diversion type (e.g.,
cookie-mod vs. random traffic), and what segment of traffic they
are diverting on (e.g., the conditions, such as Japanese traffic only).
This tool then tells the experimenter how much traffic their experiment will need to detect the desired change with statistical significance. Experimenters can easily explore the trade-offs with regards
to what size change can be detected with what amount of traffic.
With a single canonical tool, we gain confidence that experiments
will be sized properly before they are run.
To gather data for our sizing tool, we constantly run a set of uniformity trials, i.e., we run many controls or A vs. A experiments,
varying both experiment size and diversion type. We can use the
results to empirically measure the natural variance of our metrics
and test the accuracy of our calculated confidence intervals.
5.2.2 Triggering, Logging, & Counter-factuals
As a reminder, diversion refers to the segment of traffic in an experiment. However, an experiment may not actually change serving
on all diverted requests. Instead, the experiment may trigger only
on a subset of the diverted requests. For example, an experiment
that is testing when to show weather information on a query may
get all traffic diverted to it, but only show the weather information
on a subset of those queries; that subset is called the trigger set.
Often, we cannot divert only on the trigger set since determining which requests would trigger the change may require additional processing at runtime; this need for additional processing is why triggers cannot be implemented as conditions (the information is available too late in the control flow). Thus, it is important to log both the factual (when the experiment triggered) and the
counter-factual (when the experiment would have triggered). The
counter-factual is logged in the control. For example, in the experiment mentioned above, the factual (when the weather information
is shown) is logged in the experiment, while the counter-factual
is logged in the control, i.e., when the weather information would
have been shown on this query (but was not since this is the control). This logging is important for both sizing the experiment and
analyzing the experiment, since including the unchanged requests
dilutes the measured impact of the experiment. Restricting to the
trigger set allows experimenters to measure their impact more accurately. In addition, by focusing on the larger effect in the trigger set, the amount of traffic needed is reduced, since the effective size of the experiment depends on the inverse of the square of the effect that we aim to detect (1/θ^2).
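A minimal sketch of factual vs. counter-factual logging for the weather example (the trigger predicate and log format are illustrative assumptions):

```python
def process_query(query, in_experiment, would_show_weather):
    """would_show_weather(query) is the trigger condition, evaluated in both arms.
    The experiment arm logs the factual; the control arm logs the counter-factual."""
    entry = {"query": query, "arm": "experiment" if in_experiment else "control"}
    if would_show_weather(query):
        if in_experiment:
            entry["weather_shown"] = True              # factual: the feature actually fired
        else:
            entry["weather_would_have_shown"] = True   # counter-factual: it would have fired
    return entry

# Analysis can then restrict both arms to the trigger set: experiment rows with
# weather_shown and control rows with weather_would_have_shown.
print(process_query("weather tokyo", True, lambda q: "weather" in q))
```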
5.2.3 Pre- & post-periods
A pre-period is a period of time prior to starting the experiment
where the same traffic (i.e., the same cookie mods) is diverted into
the experiment but no changes are made to serving. A post-period
is the same thing, but after the experiment. Both periods are akin to
comparing a control to a control but using the traffic that is actually
diverted to the experiment. Pre-periods are useful for ensuring that
the traffic diverted into an experiment really is comparable to its
control and does not have any issues, for example with uncaught
spam or robots. Post-periods are useful for determining if there are
any learned effects from running the experiment. These techniques
only apply to user-id mod and cookie mod experiments.
5.3 Fast Analytics
While the infrastructure and tools mentioned so far enable many
simultaneous experiments and expedite running an experiment, the
actual experimentation process will not be substantially faster unless experiment analysis is also addressed. A full discussion of
experimental analysis tools is beyond the scope of this paper, but
we briefly discuss the main design goals here.
The primary goal of the analysis tool is to provide accurate values for the suite of metrics that experimenters examine to evaluate their experiment. At Google, rather than combining multiple metrics into a single objective function, we examine a suite of metrics
to more fully understand how the user experience might change
(e.g., how quickly the user can parse the page, how clicks might
move around, etc.). Note that live traffic experiments can only measure what happens and not why these changes happen.
Beyond accuracy and completeness, other key design goals for
an experiment analysis tool include:
Correctly computed and displayed confidence intervals: experimenters need to understand whether the experiment simply has
not received enough traffic (confidence intervals are too wide)
or whether the observed changes are statistically significant.
We have researched a number of ways of calculating accurate
confidence intervals and while a full discussion is beyond the
scope of this paper, we note that we have considered both delta
method approaches (as mentioned previously) and an alternate
empirical method for computing confidence intervals: carve the
experiment up into smaller subsets and estimate the variance
from those subsets (a minimal sketch of this subset-based approach appears after this list). Also note that care must be taken in looking at multiple metrics and experiments, since if enough are examined, some value will randomly be shown as significant.
A good UI: it needs to be easy to use and easy to understand.
Graphs are useful; even simple graphs like sparklines can help
visualize if the aggregate change is consistent over time periods. The UI should also point out when improper comparisons are made (e.g., comparing experiments across layers), and
make it easy to change which experiments are compared, the
time period, etc.
Support for slicing: aggregate numbers can often be misleading, as a change may not be due to the metric actually changing
(e.g., CTR changing), but may rather be due to a mix shift (e.g.,
more commercial queries). These Simpson's paradoxes are important to spot and understand, as Kohavi mentions [4].
Extensibility: it must be easy to add custom metrics and slicings. Especially for new features, the existing suite of metrics
and slicings may be insufficient.
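A minimal sketch of the subset-based (empirical) confidence interval mentioned in the first bullet above, simplified to a mean metric; ratio metrics would need per-bucket numerators and denominators, and the bucket count here is an arbitrary choice.

```python
import math
import numpy as np

def bucketed_confidence_interval(values, num_buckets=20, z=1.96):
    """Carve the experiment's events into buckets, compute the metric per bucket,
    and use the spread across buckets to estimate a confidence interval for the mean."""
    buckets = np.array_split(np.asarray(values, dtype=float), num_buckets)
    bucket_means = np.array([b.mean() for b in buckets])
    estimate = bucket_means.mean()
    stderr = bucket_means.std(ddof=1) / math.sqrt(num_buckets)
    return estimate, (estimate - z * stderr, estimate + z * stderr)
```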
Having a single tool to provide accurate metrics for experiments
means that we have a single consistent implementation, with agreed
upon filters (e.g., to remove potential robot traffic or spam), so that
different teams that measure CTR know that their values are comparable. A single tool is also more efficient, since the computation is done once and presented to the users, rather than each experimenter running their own computations.
5.4 Education
While the overlapping infrastructure and accompanying tools and
experiment design address the technical requirements to enable more, better, faster experimentation, we also need to address the people side. Education is equally necessary to facilitate robust experimentation. At Google, two processes have helped to ensure that experiments are well-designed and that the results of an experiment are
understood and disseminated.
5.4.1 Experiment Council
The first process is something that we call experiment council,
which consists of a group of engineers who review a light-weight
checklist that experimenters fill out prior to running their experiment. Questions include:
basic experiment characterization (e.g., what does the experiment test, what are the hypotheses),
experiment set-up (e.g., which experiment parameters are varied, what each experiment or set of experiments tests, which
layer),
experiment diversion and triggering (e.g., what diversion type
and which conditions to use for diversion, what proportion of
diverted traffic triggers the experiment),
experiment analysis (e.g., which metrics are of interest, how big
of a change the experimenter would like to detect),
experiment sizing and duration (to ensure that, given the affected traffic, the experiment has sufficient statistical power to
detect the desired metric changes),
experiment design (e.g., whether a pre- or post-period is warranted, whether counter-factual logging is correctly done, etc.).
First-time experimenters learn about proper experiment design and
sizing as well as the technical details behind implementing the
experiment. Repeat experimenters find the checklist light-weight
enough to still be useful. Moreover, the process is a useful way for
disseminating updated best practices with regards to experimentation (e.g., pointers to new tools to facilitate experimentation, new
metrics that may be useful, etc.). The checklist is hosted on a web
application, which is useful both for archival and educational purposes: new experimenters can read past checklists to understand the issues.
5.4.2 Interpreting the Data
The other process we put in place is a forum where experimenters
bring their experiment results to discuss with experts. The goal of
the discussion is to:
Ensure that the experiment results are valid. There are times,
even with experiment council, where something in the actual
implementation goes awry, or something unexpected happens.
In those cases, the discussion is as much a debugging session
as anything else. Having experts familiar with the entire stack
of binaries, logging, experiment infrastructure, metrics, and analytical tools is key.
Given valid results, make sure that the metrics being looked at
are a complete set with regards to understanding what is happening. Other ways of slicing the data or metric variations may be suggested to gain a more complete understanding of the impact of an experiment. Some experiments are complex enough
that experimenters come multiple times with follow-ups.
Given a full set of results, discuss and agree on whether overall
the experiment is a positive or negative user experience, so that
decision-makers can use this data (combined with other strategic or tactical information) to determine whether to launch this
change, suggest possible refinements, or give up.
The discussion forum is useful for experimenters to better learn
how to interpret experiment results. Repeat experimenters generally do not make the same mistakes and can anticipate what analysis needs to be done to gain a full understanding. The discussion
forum is also open, so that future experimenters can attend in order
to learn in advance of running their experiment. Experiments are
also documented so that we have a knowledge repository.
6. RESULTS
We first deployed our overlapping experiment infrastructure in
March 2007 (various tools and processes pre-dated or post-dated
the infrastructure launch). Ultimately, the success of this overall
system is measured in how well we met our goals of running more
experiments, running them better, and getting results faster.
6.1 More
We can use several measures to determine if we are successfully
running more experiments: how many experiments are run over
a period of time, how many launches resulted from those experiments, and how many different people run experiments (see Figure 5). For the number of experiments, note that we are including
control experiments in the count. For the count of unique entities,
some experiments have multiple owners (e.g., in case anyone is out
of town and something happens) or team mailing lists included as
owners, both of which are included in the count. Unfortunately, we
do not have an easy way to determine how many owners are non-engineers, but anecdotally, this number has increased over time.
For the number of launches, we only present the numbers after the
launch of overlapping experiments. Prior to overlapping experiments, we used other mechanisms to launch changes; after overlapping experiments, we still use other mechanisms to launch changes, but with decreasing frequency.
Figure 5: Graphs showing the trend over time for the number of experiments, people running experiments, and launches.
The y-axes on all of these graphs
have been elided for confidentiality (they are on a linear scale), but
the trends are clear in showing that we have enabled nearly an order
of magnitude more experiments, more launches, and more people
running experiments with this overall system.
6.2 Better
Another measure for the success of our overall system, tools, and
education is whether the experiments we run now are better than
before. We only have anecdotal data here, but we are members of
experiment council and the discussion forum and have seen many
of the experiments before and after this system was deployed. Our
observations are that we see:
Fewer misconfigured experiments, although we do still encounter the occasional logging issue (for counter-factuals) or weird error / failure cases.
Fewer forgotten experiments (i.e., people who start experiments
and then forget to analyze them).
Fewer discussions about “what exactly are you measuring here
for CTR” or “what filters are you using”; with a canonical analysis tool, the discussion can now focus solely on the interpretation of the metrics rather than on making sure that the definition
and calculation of the metrics makes complete sense.
Better sanity checks, e.g., with pre-periods, to ensure that there
are no issues with the traffic being sent to the experiment.
While ideally “fewer” would actually be “none”, overall it seems
that there are fewer mistakes and problems in the design, implementation, and analysis of experiments despite the fact
that we have even more people running even more experiments.
6.3 Faster
A final measure of the success of our overall system is whether
we are able to ultimately get data faster and make decisions more
quickly. For speed, we again do not have empirical data, but we
can discuss the general perception of the speed of experimentation
here. Experimentation can be broken up into several phases:
Implementing a new feature to test it out. This phase is now the
slowest part, and so we have built other tools (beyond the scope
of this paper) to expedite building and testing prototypes (i.e.,
separating out the process of building something for experimentation purposes vs. building the production-ready version).
Pushing a new experiment to serving given an implemented feature. This phase can take minutes to hours to create depending
on the complexity of the parameters, a negligible amount of
time to run the pre-submit checks (seconds to minutes), and
then a time comparable to creation time to review. The time
needed for the data push depends on the binary, but ranges from
1-2 hours (which includes time running a canary) to half a day.
Running the experiment depends on the sizing and how long
it takes to get statistically significant numbers. We can typically get at least a feel for what is happening with some basic
metrics within a few hours after the experiment starts running.
The total duration depends on the number of iterations the experiment needs, the sizing, the urgency, etc.
Analyzing the experiment is also variable. In many cases, no
custom analysis is needed at all, or only slight extensions to the
analysis tool. In those cases, the analysis can often be reasonably quick (days). However, in other cases, custom analysis is
still needed in which case the analysis time is highly variable.
Overall, the current pain points include the time needed to implement an experiment, some time waiting for a pre-period to run, and
in custom analysis. Those are all areas we are still working on.
7. CONCLUSIONS AND FUTURE WORK
In this paper, we have described the design of an overlapping
experiment infrastructure and associated tools and educational processes to facilitate more experimentation, better and more robust experimentation, and faster experimentation. We have also given results that show the practical impact of this work: more experiments, more experimenters, more launches, all faster and with fewer errors. While the actual implementation is specific to Google, the
discussion of the design choices throughout the paper should be
generalizable to any entity that wants to gather empirical data
to evaluate changes.
There are several areas in which we are continuing to improve
our experiment infrastructure, including:
Expediting the implementation of a new feature and facilitating
radically different experiments (beyond what can be expressed
via parameters).
Pushing on the limit of whether an experiment parameter is really limited to a single layer. Especially for numeric parameters, we have added operators (e.g., multiplication, addition) that are transitive and therefore composable. With these operators, we can use the same parameter in experiments in multiple layers, as long as those experiments only specify operations on the default value rather than overriding the default value (see the sketch after this list).
There are times when we need to run experiments that focus
on small traffic slices, such as rare languages (e.g., Uzbek or
Swahili). Given the sheer volume of experiments that we run,
it is often difficult to carve out the space to run a sufficiently
large experiment to get statistically significant results within a
reasonable time frame.
Continuing to push on efficient experiment space utilization by
providing even more expressive conditions (and the associated
verification to ensure robustness), etc.
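A minimal sketch of the composable numeric operators mentioned in the second item above; the operation encoding is an illustrative assumption, not the system's actual representation.

```python
def apply_operations(default_value, operations):
    """Experiments in different layers contribute operations on a shared numeric
    parameter's default value instead of overriding it outright, so their effects
    compose on top of the default."""
    value = default_value
    for op, operand in operations:
        if op == "multiply":
            value *= operand
        elif op == "add":
            value += operand
        else:
            raise ValueError(f"unsupported operation: {op}")
    return value

# E.g., a multiplicative change from an experiment in one layer plus an additive
# tweak from an experiment in another layer, both applied to the same default.
print(apply_operations(0.05, [("multiply", 1.2), ("add", 0.01)]))
```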
We continue to innovate on experimentation since the appetite for
experimentation and data-driven decisions keeps growing.
Acknowledgments: Many folks beyond the authors participated in
the work described here. An incomplete list includes Eric Bauer,
Ilia Mirkin, Jim Morrison, Susan Shannon, Daryl Pregibon, Diane
Lambert, Patrick Riley, Bill Heavlin, Nick Chamandy, Wael Salloum, Jeremy Shute, David Agraz, Simon Favreau-Lessard, Amir
Najmi, Everett Wetchler, Martin Reichelt, Jay Crim, and Eric Flatt.
Thanks also to Robin Jeffries, Rehan Khan, Ramakrishnan Srikant,
and Roberto Bayardo for useful comments on the paper.
8. REFERENCES
[1] D. Agarwal, A. Broder, D. Chakrabarti, D. Diklic,
V. Josifovski, and M. Sayyadian. Estimating rates of rare
events at multiple resolutions. In Proceedings of the ACM
Conference on Knowledge Discovery and Data Mining
(KDD), 2007.
[2] W. G. Cochran. Sampling Techniques. Wiley, 1977.
[3] D. Cox and N. Reid. The theory of the design of
experiments, 2000.
[4] T. Crook, B. Frasca, R. Kohavi, and R. Longbotham. Seven
pitfalls to avoid when running controlled experiments on the
web. Microsoft white paper, March 2008.
http://exp-platform.com/Documents/ExPpitfalls.pdf.
[5] Google. Google website optimizer.
http://www.google.com/analytics/siteopt.
[6] T. Joachims. Optimizing search engines using clickthrough
data. In Proceedings of the ACM Conference on Knowledge
Discovery and Data Mining (KDD), 2002.
[7] R. Kohavi, R. Longbotham, D. Sommerfield, and R. M.
Henne. Controlled experiments on the web: Survey and
practical guide. Data Mining and Knowledge Discovery, 18,
no. 1:140–181, July 2008.
[8] M. Krieger. Wrap up & experimentation: Cs147l lecture, 12
2009. http://hci.stanford.edu/courses/cs147/lab/slides/08-experimentation.pdf.
[9] Microsoft. Microsoft’s experimentation platform.
http://exp-platform.com/default.aspx.
[10] M. Richardson, E. Dominowska, and R. Ragno. Predicting
clicks: Estimating the click-through rate for new ads. In
Proceedings of the 16th International World Wide Web
Conference, 2007.
[11] L. Wasserman. All of Statistics: A Concise Course in
Statistical Inference. Springer Texts, 2004.