See the attached PDF for details; this copy is kept as a backup of http://kafka.apache.org/07/design.html
Apache Kafka
A high-throughput distributed messaging system.
0.7
Why we built this
Kafka is a messaging system that was originally developed at LinkedIn to serve as the foundation for LinkedIn's activity stream and operational data processing pipeline. It is now used at a variety of different companies for various data pipeline and messaging uses.
Activity stream data is a normal part of any website for reporting on usage of the site. Activity data is things like page views, information about what content was shown, searches, etc. This kind of thing is usually handled by logging the activity out to some kind of file and then periodically aggregating these files for analysis. Operational data is data about the performance of servers (CPU, IO usage, request times, service logs, etc) and a variety of different approaches to aggregating operational data are used.
In recent years, activity and operational data has become a critical part of the production features of websites, and a slightly more sophisticated set of infrastructure is needed.
Use cases for activity stream and operational data
"News feed" features that broadcast the activity of your friends.
Relevance and ranking: uses counts, ratings, votes, or click-throughs to determine which of a given set of items is most relevant.
Security: Sites need to block abusive crawlers, rate-limit apis, detect spamming attempts, and maintain other detection and prevention systems that key off site activity.
Operational monitoring: Most sites need some kind of real-time, heads-up monitoring that can track performance and trigger alerts if something goes wrong.
Reporting and batch processing: It is common to load data into a data warehouse or Hadoop system for offline analysis and reporting on business activity.
Characteristics of activity stream data
This high-throughput stream of immutable activity data represents a real computational challenge as the volume may easily be 10x or 100x larger than the next largest data source on a site.
Traditional log file aggregation is a respectable and scalable approach to supporting offline use cases like reporting or batch processing, but it is too high latency for real-time processing and tends to have rather high operational complexity. On the other hand, existing messaging and queuing systems are okay for real-time and near-real-time use cases, but handle large unconsumed queues very poorly, often treating persistence as an afterthought. This creates problems for feeding the data to offline systems like Hadoop that may only consume some sources once per hour or per day. Kafka is intended to be a single queuing platform that can support both offline and online use cases.
Kafka supports fairly general messaging semantics. Nothing ties it to activity processing, though that was our motivating use case.
Deployment
The following diagram gives a simplified view of the deployment topology at LinkedIn.
Note that a single kafka cluster handles all activity data from all different sources. This provides a single pipeline of data for both online and offline consumers. This tier acts as a buffer between live activity and asynchronous processing. We also use kafka to replicate all data to a different datacenter for offline consumption.
It is not intended that a single Kafka cluster span data centers, but Kafka is intended to support multi-datacenter data flow topologies. This is done by allowing mirroring or "syncing" between clusters. This feature is very simple: the mirror cluster simply acts as a consumer of the source cluster. This means it is possible for a single cluster to join data from many datacenters into a single location. Here is an example of a possible multi-datacenter topology aimed at supporting batch loads:
Note that there is no correspondence between nodes in the two clusters—they may be of different sizes and contain a different number of nodes, and a single cluster can mirror any number of source clusters. More details on using the mirroring feature can be found here.
Major Design Elements
There is a small number of major design decisions that make Kafka different from most other messaging systems:
Kafka is designed for persistent messages as the common case
Throughput rather than features is the primary design constraint
State about what has been consumed is maintained as part of the consumer, not the server
Kafka is explicitly distributed. It is assumed that producers, brokers, and consumers are all spread over multiple machines.
Each of these decisions will be discussed in more detail below.
Basics
First some basic terminology and concepts.
Messages are the fundamental unit of communication. Messages are published to a topic by a producer which means they are physically sent to a server acting as a broker (probably another machine). Some number of consumers subscribe to a topic, and each published message is delivered to all the consumers.
Kafka is explicitly distributed—producers, consumers, and brokers can all be run on a cluster of machines that co-operate as a logical group. This happens fairly naturally for brokers and producers, but consumers require some particular support. Each consumer process belongs to a consumer group and each message is delivered to exactly one process within every consumer group. Hence a consumer group allows many processes or machines to logically act as a single consumer. The concept of consumer group is very powerful and can be used to support the semantics of either a queue or topic as found in JMS. To support queue semantics, we can put all consumers in a single consumer group, in which case each message will go to a single consumer. To support topic semantics, each consumer is put in its own consumer group, and then all consumers will receive each message. A more common case in our own usage is that we have multiple logical consumer groups, each consisting of a cluster of consuming machines that act as a logical whole. Kafka has the added benefit in the case of large data that no matter how many consumers a topic has, a message is stored only a single time.
Message Persistence and Caching
Don't fear the filesystem!
Kafka relies heavily on the filesystem for storing and caching messages. There is a general perception that "disks are slow" which makes people skeptical that a persistent structure can offer competitive performance. In fact disks are both much slower and much faster than people expect depending on how they are used; and a properly designed disk structure can often be as fast as the network.
The key fact about disk performance is that the throughput of hard drives has been diverging from the latency of a disk seek for the last decade. As a result the performance of linear writes on a RAID-5 array of six 7200rpm SATA drives is about 300MB/sec, but the performance of random writes is only about 50KB/sec—a difference of nearly 10000X. These linear reads and writes are the most predictable of all usage patterns, and hence the ones detected and optimized best by the operating system, using read-ahead and write-behind techniques that prefetch data in large block multiples and group smaller logical writes into large physical writes. A further discussion of this issue can be found in this ACM Queue article; they actually find that sequential disk access can in some cases be faster than random memory access!
To compensate for this performance divergence modern operating systems have become increasingly aggressive in their use of main memory for disk caching. Any modern OS will happily divert all free memory to disk caching with little performance penalty when the memory is reclaimed. All disk reads and writes will go through this unified cache. This feature cannot easily be turned off without using direct I/O, so even if a process maintains an in-process cache of the data, this data will likely be duplicated in OS pagecache, effectively storing everything twice.
Furthermore we are building on top of the JVM, and anyone who has spent any time with Java memory usage knows two things:
The memory overhead of objects is very high, often doubling the size of the data stored (or worse).
Java garbage collection becomes increasingly sketchy and expensive as the in-heap data increases.
As a result of these factors using the filesystem and relying on pagecache is superior to maintaining an in-memory cache or other structure—we at least double the available cache by having automatic access to all free memory, and likely double again by storing a compact byte structure rather than individual objects. Doing so will result in a cache of up to 28-30GB on a 32GB machine without GC penalties. Furthermore this cache will stay warm even if the service is restarted, whereas the in-process cache will need to be rebuilt in memory (which for a 10GB cache may take 10 minutes) or else it will need to start with a completely cold cache (which likely means terrible initial performance). It also greatly simplifies the code as all logic for maintaining coherency between the cache and filesystem is now in the OS, which tends to do so more efficiently and more correctly than one-off in-process attempts. If your disk usage favors linear reads then read-ahead is effectively pre-populating this cache with useful data on each disk read.
This suggests a design which is very simple: rather than maintain as much as possible in-memory and flush to the filesystem only when necessary, we invert that. All data is immediately written to a persistent log on the filesystem without any call to flush the data. In effect this just means that it is transferred into the kernel's pagecache where the OS can flush it later. Then we add a configuration driven flush policy to allow the user of the system to control how often data is flushed to the physical disk (every N messages or every M seconds) to put a bound on the amount of data "at risk" in the event of a hard crash.
This style of pagecache-centric design is described in an article on the design of Varnish here (along with a healthy helping of arrogance).
Constant Time Suffices
The persistent data structure used in messaging systems metadata is often a BTree. BTrees are the most versatile data structure available, and make it possible to support a wide variety of transactional and non-transactional semantics in the messaging system. They do come with a fairly high cost, though: Btree operations are O(log N). Normally O(log N) is considered essentially equivalent to constant time, but this is not true for disk operations. Disk seeks come at 10 ms a pop, and each disk can do only one seek at a time so parallelism is limited. Hence even a handful of disk seeks leads to very high overhead. Since storage systems mix very fast cached operations with actual physical disk operations, the observed performance of tree structures is often superlinear. Furthermore BTrees require a very sophisticated page or row locking implementation to avoid locking the entire tree on each operation. The implementation must pay a fairly high price for row-locking or else effectively serialize all reads. Because of the heavy reliance on disk seeks it is not possible to effectively take advantage of the improvements in drive density, and one is forced to use small (< 100GB) high RPM SAS drives to maintain a sane ratio of data to seek capacity.
Intuitively a persistent queue could be built on simple reads and appends to files, as is commonly the case with logging solutions. Though this structure would not support the rich semantics of a BTree implementation, it has the advantage that all operations are O(1) and reads do not block writes or each other. This has obvious performance advantages since the performance is completely decoupled from the data size—one server can now take full advantage of a number of cheap, low-rotational-speed 1+TB SATA drives. Though they have poor seek performance, these drives often have comparable performance for large reads and writes at 1/3 the price and 3x the capacity.
Having access to virtually unlimited disk space without penalty means that we can provide some features not usually found in a messaging system. For example, in Kafka, instead of deleting a message immediately after consumption, we can retain messages for a relatively long period (say a week).
Maximizing Efficiency
Our assumption is that the volume of messages is extremely high, indeed it is some multiple of the total number of page views for the site (since a page view is one of the activities we process). Furthermore we assume each message published is read at least once (and often multiple times), hence we optimize for consumption rather than production.
There are two common causes of inefficiency: too many network requests, and excessive byte copying.
To encourage efficiency, the APIs are built around a "message set" abstraction that naturally groups messages. This allows network requests to group messages together and amortize the overhead of the network roundtrip rather than sending a single message at a time.
The MessageSet implementation is itself a very thin API that wraps a byte array or file. Hence there is no separate serialization or deserialization step required for message processing; message fields are lazily deserialized as needed (or not deserialized if not needed).
The message log maintained by the broker is itself just a directory of message sets that have been written to disk. This abstraction allows a single byte format to be shared by both the broker and the consumer (and to some degree the producer, though producer messages are checksummed and validated before being added to the log).
Maintaining this common format allows optimization of the most important operation: network transfer of persistent log chunks. Modern unix operating systems offer a highly optimized code path for transferring data out of pagecache to a socket; in Linux this is done with the sendfile system call. Java provides access to this system call with the FileChannel.transferTo api.
To understand the impact of sendfile, it is important to understand the common data path for transfer of data from file to socket:
The operating system reads data from the disk into pagecache in kernel space
The application reads the data from kernel space into a user-space buffer
The application writes the data back into kernel space into a socket buffer
The operating system copies the data from the socket buffer to the NIC buffer where it is sent over the network
This is clearly inefficient: there are four copies and two system calls. Using sendfile, this re-copying is avoided by allowing the OS to send the data from pagecache to the network directly. So in this optimized path, only the final copy to the NIC buffer is needed.
We expect a common use case to be multiple consumers on a topic. Using the zero-copy optimization above, data is copied into pagecache exactly once and reused on each consumption instead of being stored in memory and copied out to kernel space every time it is read. This allows messages to be consumed at a rate that approaches the limit of the network connection.
For more background on the sendfile and zero-copy support in Java, see this article on IBM developerworks.
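To make the transferTo path above concrete, here is a minimal, hedged sketch in plain Java NIO (not Kafka code): it streams a file to a socket with FileChannel.transferTo, which on Linux maps to sendfile; the file name, host, and port are placeholders.
import java.io.FileInputStream;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;

public class SendfileSketch {
    public static void main(String[] args) throws IOException {
        try (FileChannel file = new FileInputStream("00000000000.kafka").getChannel();
             SocketChannel socket = SocketChannel.open(new InetSocketAddress("localhost", 9999))) {
            long position = 0;
            long remaining = file.size();
            // transferTo asks the kernel to move bytes from pagecache to the socket
            // directly (sendfile on Linux), skipping the user-space copies listed above.
            while (remaining > 0) {
                long sent = file.transferTo(position, remaining, socket);
                position += sent;
                remaining -= sent;
            }
        }
    }
}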
End-to-end Batch Compression
In many cases the bottleneck is actually not CPU but network. This is particularly true for a data pipeline that needs to send messages across data centers. Of course the user can always send compressed messages without any support needed from Kafka, but this can lead to very poor compression ratios as much of the redundancy is due to repetition between messages (e.g. field names in JSON or user agents in web logs or common string values). Efficient compression requires compressing multiple messages together rather than compressing each message individually. Ideally this would be possible in an end-to-end fashion—that is, data would be compressed prior to sending by the producer and remain compressed on the server, only being decompressed by the eventual consumers.
Kafka supports this by allowing recursive message sets. A batch of messages can be clumped together, compressed, and sent to the server in this form. This batch of messages will all be delivered to the same consumer and will remain in compressed form until it arrives there.
Kafka supports GZIP and Snappy compression protocols. More details on compression can be found here.
Consumer state
Keeping track of what has been consumed is one of the key things a messaging system must provide. It is not intuitive, but recording this state is one of the key performance points for the system. State tracking requires updating a persistent entity and potentially causes random accesses. Hence it is likely to be bound by the seek time of the storage system not the write bandwidth (as described above).
Most messaging systems keep metadata about what messages have been consumed on the broker. That is, as a message is handed out to a consumer, the broker records that fact locally. This is a fairly intuitive choice, and indeed for a single machine server it is not clear where else it could go. Since the data structure used for storage in many messaging systems scales poorly, this is also a pragmatic choice—since the broker knows what is consumed it can immediately delete it, keeping the data size small.
What is perhaps not obvious is that getting the broker and consumer to come into agreement about what has been consumed is not a trivial problem. If the broker records a message as consumed immediately every time it is handed out over the network, then if the consumer fails to process the message (say because it crashes or the request times out or whatever) that message will be lost. To solve this problem, many messaging systems add an acknowledgement feature, which means that messages are only marked as sent, not consumed, when they are handed out; the broker waits for a specific acknowledgement from the consumer to record the message as consumed. This strategy fixes the problem of losing messages, but creates new problems. First of all, if the consumer processes the message but fails before it can send an acknowledgement then the message will be consumed twice. The second problem is performance: now the broker must keep multiple states about every single message (first to lock it so it is not given out a second time, and then to mark it as permanently consumed so that it can be removed). Tricky problems must be dealt with, like what to do with messages that are sent but never acknowledged.
Message delivery semantics
So clearly there are multiple possible message delivery guarantees that could be provided:
At most once—this handles the first case described. Messages are immediately marked as consumed, so they can't be given out twice, but many failure scenarios may lead to losing messages.
At least once—this is the second case where we guarantee each message will be delivered at least once, but in failure cases may be delivered twice.
Exactly once—this is what people actually want, each message is delivered once and only once.
This problem is heavily studied, and is a variation of the "transaction commit" problem. Algorithms that provide exactly once semantics exist, two- or three-phase commits and Paxos variants being examples, but they come with some drawbacks. They typically require multiple round trips and may have poor guarantees of liveness (they can halt indefinitely). The FLP result provides some of the fundamental limitations on these algorithms.
Kafka does two unusual things with respect to metadata. First the stream is partitioned on the brokers into a set of distinct partitions. The semantic meaning of these partitions is left up to the producer and the producer specifies which partition a message belongs to. Within a partition messages are stored in the order in which they arrive at the broker, and will be given out to consumers in that same order. This means that rather than store metadata for each message (marking it as consumed, say), we just need to store the "high water mark" for each combination of consumer, topic, and partition. Hence the total metadata required to summarize the state of the consumer is actually quite small. In Kafka we refer to this high-water mark as "the offset" for reasons that will become clear in the implementation section.
Consumer state
In Kafka, the consumers are responsible for maintaining state information (the offset) on what has been consumed. Typically, the Kafka consumer library writes its state data to zookeeper. However, it may be beneficial for consumers to write state data into the same datastore where they are writing the results of their processing. For example, the consumer may simply be entering some aggregate value into a centralized transactional OLTP database. In this case the consumer can store the state of what is consumed in the same transaction as the database modification. This solves a distributed consensus problem, by removing the distributed part! A similar trick works for some non-transactional systems as well. A search system can store its consumer state with its index segments. Though it may provide no durability guarantees, this means that the index is always in sync with the consumer state: if an unflushed index segment is lost in a crash, the index can always resume consumption from the latest checkpointed offset. Likewise our Hadoop load job, which does parallel loads from Kafka, does a similar trick. Individual mappers write the offset of the last consumed message to HDFS at the end of the map task. If a job fails and gets restarted, each mapper simply restarts from the offsets stored in HDFS.
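As an illustration of the OLTP pattern above, here is a hedged sketch (not Kafka code) that commits a processed aggregate and the consumed offset in one database transaction; the JDBC URL, table, and column names are made up for the example.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class TransactionalOffsetSketch {
    // Apply the aggregate update and record the new offset atomically.
    public static void recordBatch(String jdbcUrl, String topic, int partition,
                                   long newOffset, long viewCountDelta) throws SQLException {
        try (Connection conn = DriverManager.getConnection(jdbcUrl)) {
            conn.setAutoCommit(false);
            try (PreparedStatement update = conn.prepareStatement(
                         "UPDATE profile_view_counts SET views = views + ? WHERE topic = ?");
                 PreparedStatement offsets = conn.prepareStatement(
                         "UPDATE consumer_offsets SET consumed_offset = ? WHERE topic = ? AND partition_id = ?")) {
                update.setLong(1, viewCountDelta);
                update.setString(2, topic);
                update.executeUpdate();
                offsets.setLong(1, newOffset);
                offsets.setString(2, topic);
                offsets.setInt(3, partition);
                offsets.executeUpdate();
            }
            conn.commit();  // the result and the offset become visible together
        }
    }
}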
There is a side benefit of this decision. A consumer can deliberately rewind back to an old offset and re-consume data. This violates the common contract of a queue, but turns out to be an essential feature for many consumers. For example, if a bug in the consumer code is discovered after some messages have been consumed, the consumer can re-consume those messages once the bug is fixed.
Push vs. pull
A related question is whether consumers should pull data from brokers or brokers should push data to the subscriber. In this respect Kafka follows a more traditional design, shared by most messaging systems, where data is pushed to the broker from the producer and pulled from the broker by the consumer. Some recent systems focused on log aggregation, such as Scribe and Flume, follow a very different push-based path where each node acts as a broker and data is pushed downstream. There are pros and cons to both approaches. However a push-based system has difficulty dealing with diverse consumers as the broker controls the rate at which data is transferred. The goal is generally for the consumer to be able to consume at the maximum possible rate; unfortunately in a push system this means the consumer tends to be overwhelmed when its rate of consumption falls below the rate of production (a denial of service attack, in essence). A pull-based system has the nicer property that the consumer simply falls behind and catches up when it can. This can be mitigated with some kind of backoff protocol by which the consumer can indicate it is overwhelmed, but getting the rate of transfer to fully utilize (but never over-utilize) the consumer is trickier than it seems. Previous attempts at building systems in this fashion led us to go with a more traditional pull model.
Distribution
Kafka is built to be run across a cluster of machines as the common case. There is no central "master" node. Brokers are peers to each other and can be added and removed at anytime without any manual configuration changes. Similarly, producers and consumers can be started dynamically at any time. Each broker registers some metadata (e.g., available topics) in Zookeeper. Producers and consumers can use Zookeeper to discover topics and to co-ordinate the production and consumption. The details of producers and consumers will be described below.
Producer
Automatic producer load balancing
Kafka supports client-side load balancing for message producers, or use of a dedicated load balancer to balance TCP connections. A dedicated layer-4 load balancer works by balancing TCP connections over Kafka brokers. In this configuration all messages from a given producer go to a single broker. The advantage of using a layer-4 load balancer is that each producer only needs a single TCP connection, and no connection to zookeeper is needed. The disadvantage is that the balancing is done at the TCP connection level, and hence it may not be well balanced (if some producers produce many more messages than others, evenly dividing up the connections per broker may not result in evenly dividing up the messages per broker).
Client-side zookeeper-based load balancing solves some of these problems. It allows the producer to dynamically discover new brokers, and balance load on a per-request basis. Likewise it allows the producer to partition data according to some key instead of randomly, which enables stickiness on the consumer (e.g. partitioning data consumption by user id). This feature is called "semantic partitioning", and is described in more detail below.
The working of the zookeeper-based load balancing is described below. Zookeeper watchers are registered on the following events—
a new broker comes up
a broker goes down
a new topic is registered
a broker gets registered for an existing topic
Internally, the producer maintains an elastic pool of connections to the brokers, one per broker. This pool is kept updated to establish/maintain connections to all the live brokers, through the zookeeper watcher callbacks. When a producer request for a particular topic comes in, a broker partition is picked by the partitioner (see the section on semantic partitioning). An available connection from the pool is then used to send the data to the selected broker partition.
Asynchronous send
Asynchronous non-blocking operations are fundamental to scaling messaging systems. In Kafka, the producer provides an option to use asynchronous dispatch of produce requests (producer.type=async). This allows buffering of produce requests in an in-memory queue and batch sends that are triggered by a time interval or a pre-configured batch size. Since data is typically published from a set of heterogeneous machines producing data at variable rates, this asynchronous buffering helps generate uniform traffic to the brokers, leading to better network utilization and higher throughput.
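The following is a minimal sketch of turning on the asynchronous producer, assuming the 0.7 Java producer API described later in this document; the zookeeper address, topic, and tuning values are placeholders, and the queue.time/batch.size names are taken from the producer configuration discussed further below.
import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.javaapi.producer.ProducerData;
import kafka.producer.ProducerConfig;

public class AsyncProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zk.connect", "localhost:2181");                        // broker discovery via zookeeper
        props.put("serializer.class", "kafka.serializer.StringEncoder");  // encode String payloads
        props.put("producer.type", "async");                              // buffer and batch produce requests
        props.put("queue.time", "5000");                                  // flush the queue after 5 seconds...
        props.put("batch.size", "200");                                   // ...or after 200 buffered messages
        Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));
        producer.send(new ProducerData<String, String>("page_views", "a page view event"));
        producer.close();
    }
}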
Semantic partitioning
Consider an application that would like to maintain an aggregation of the number of profile visitors for each member. It would like to send all profile visit events for a member to a particular partition and, hence, have all updates for a member appear in the same stream for the same consumer thread. The producer is able to semantically map messages to the available kafka nodes and partitions. This allows partitioning the stream of messages with some semantic partition function, based on some key in the message, to spread them over broker machines. The partitioning function can be customized by providing an implementation of the kafka.producer.Partitioner interface, the default being the random partitioner. For the example above, the key would be member_id and the partitioning function would be hash(member_id)%num_partitions.
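For example, a member-id partitioner along the lines sketched above might look like the following; it implements the kafka.producer.Partitioner interface whose signature is shown later in this document, and the String key type is an assumption for illustration.
import kafka.producer.Partitioner;

public class MemberIdPartitioner implements Partitioner<String> {
    // The same member_id always hashes to the same partition, so all profile-visit
    // events for a member end up in the same stream for the same consumer thread.
    public int partition(String memberId, int numPartitions) {
        // mask the sign bit so the result is never negative
        return (memberId.hashCode() & 0x7fffffff) % numPartitions;
    }
}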
Support for Hadoop and other batch data load
Scalable persistence allows for the possibility of supporting batch data loads that periodically snapshot data into an offline system for batch processing. We make use of this for loading data into our data warehouse and Hadoop clusters.
Batch processing happens in stages beginning with the data load stage and proceeding in an acyclic graph of processing and output stages (e.g. as supported here). An essential feature of support for this model is the ability to re-run the data load from a point in time (in case anything goes wrong).
In the case of Hadoop we parallelize the data load by splitting the load over individual map tasks, one for each node/topic/partition combination, allowing full parallelism in the loading. Hadoop provides the task management, and tasks which fail can restart without danger of duplicate data.
Implementation Details
The following gives a brief description of some relevant lower-level implementation details for some parts of the system described in the above section.
API Design
Producer APIs
The Producer API wraps the two low-level producers, kafka.producer.SyncProducer and kafka.producer.async.AsyncProducer.
class Producer {
  /* Sends the data, partitioned by key to the topic using either the */
  /* synchronous or the asynchronous producer */
  public void send(kafka.javaapi.producer.ProducerData producerData);

  /* Sends a list of data, partitioned by key to the topic using either */
  /* the synchronous or the asynchronous producer */
  public void send(java.util.List<kafka.javaapi.producer.ProducerData> producerData);

  /* Closes the producer and cleans up */
  public void close();
}
The goal is to expose all the producer functionality through a single API to the client. The new producer -
can handle queueing/buffering of multiple producer requests and asynchronous dispatch of the batched data -
kafka.producer.Producer provides the ability to batch multiple produce requests (producer.type=async), before serializing and dispatching them to the appropriate kafka broker partition. The size of the batch can be controlled by a few config parameters. Events are buffered in a queue until either queue.time or batch.size is reached. A background thread (kafka.producer.async.ProducerSendThread) dequeues the batch of data and lets the kafka.producer.EventHandler serialize and send the data to the appropriate kafka broker partition. A custom event handler can be plugged in through the event.handler config parameter. At various stages of this producer queue pipeline, it is helpful to be able to inject callbacks, either for plugging in custom logging/tracing code or custom monitoring logic. This is possible by implementing the kafka.producer.async.CallbackHandler interface and setting the callback.handler config parameter to that class.
handles the serialization of data through a user-specified Encoder -
interface Encoder<T> {
public Message toMessage(T data);
}
The default is the no-op kafka.serializer.DefaultEncoder
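As an example, a hedged sketch of a user-specified encoder for UTF-8 strings might look like this; it assumes the Encoder interface lives in the kafka.serializer package alongside DefaultEncoder and that kafka.message.Message accepts a byte array, both of which should be checked against the 0.7 code.
import java.nio.charset.Charset;
import kafka.message.Message;
import kafka.serializer.Encoder;

public class Utf8StringEncoder implements Encoder<String> {
    // Wrap the UTF-8 bytes of the string in a Kafka Message envelope.
    public Message toMessage(String data) {
        return new Message(data.getBytes(Charset.forName("UTF-8")));
    }
}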
provides zookeeper based automatic broker discovery -
The zookeeper-based broker discovery and load balancing can be used by specifying the zookeeper connection url through the zk.connect config parameter. For some applications, however, the dependence on zookeeper is inappropriate. In that case, the producer can take in a static list of brokers through the broker.list config parameter. Each produce request gets routed to a random broker partition in this case. If that broker is down, the produce request fails.
provides software load balancing through an optionally user-specified Partitioner -
The routing decision is influenced by the kafka.producer.Partitioner.
interface Partitioner<T> {
int partition(T key, int numPartitions);
}
The partition API uses the key and the number of available broker partitions to return a partition id. This id is used as an index into a sorted list of broker_ids and partitions to pick a broker partition for the producer request. The default partitioning strategy is hash(key)%numPartitions. If the key is null, then a random broker partition is picked. A custom partitioning strategy can also be plugged in using the partitioner.class config parameter.
Consumer APIs
We have 2 levels of consumer APIs. The low-level "simple" API maintains a connection to a single broker and has a close correspondence to the network requests sent to the server. This API is completely stateless, with the offset being passed in on every request, allowing the user to maintain this metadata however they choose.
The high-level API hides the details of brokers from the consumer and allows consuming off the cluster of machines without concern for the underlying topology. It also maintains the state of what has been consumed. The high-level API also provides the ability to subscribe to topics that match a filter expression (i.e., either a whitelist or a blacklist regular expression).
Low-level API
class SimpleConsumer {
/* Send fetch request to a broker and get back a set of messages. */
public ByteBufferMessageSet fetch(FetchRequest request);
/* Send a list of fetch requests to a broker and get back a response set. */
public MultiFetchResponse multifetch(List<FetchRequest> fetches);
/**
* Get a list of valid offsets (up to maxSize) before the given time.
* The result is a list of offsets, in descending order.
* @param time: time in milliseconds;
* if set to OffsetRequest$.MODULE$.LATEST_TIME(), get from the latest offset available.
* if set to OffsetRequest$.MODULE$.EARLIEST_TIME(), get from the earliest offset available.
*/
public long[] getOffsetsBefore(String topic, int partition, long time, int maxNumOffsets);
}
The low-level API is used to implement the high-level API as well as being used directly for some of our offline consumers (such as the hadoop consumer) which have particular requirements around maintaining state.
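A hedged sketch of driving this low-level API directly is shown below; the broker address, topic, partition, and buffer sizes are placeholders, and the SimpleConsumer and FetchRequest constructor signatures are assumptions based on the 0.7 Java API rather than definitive.
import kafka.api.FetchRequest;
import kafka.javaapi.consumer.SimpleConsumer;
import kafka.javaapi.message.ByteBufferMessageSet;
import kafka.message.MessageAndOffset;

public class SimpleConsumerSketch {
    public static void main(String[] args) {
        // host, port, socket timeout (ms), socket buffer size (bytes)
        SimpleConsumer consumer = new SimpleConsumer("localhost", 9092, 10000, 64 * 1024);
        long offset = 0L;  // the caller owns the offset; start from the beginning of the partition
        FetchRequest request = new FetchRequest("page_views", 0, offset, 1024 * 1024);
        ByteBufferMessageSet messages = consumer.fetch(request);
        for (MessageAndOffset entry : messages) {
            // entry.message() is the opaque payload; entry.offset() is where to resume next time
            offset = entry.offset();
        }
        consumer.close();
    }
}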
High-level API
/* create a connection to the cluster */
ConsumerConnector connector = Consumer.create(consumerConfig);
interface ConsumerConnector {
/**
* This method is used to get a list of KafkaStreams, which are iterators over
* MessageAndMetadata objects from which you can obtain messages and their
* associated metadata (currently only topic).
* Input: a map of <topic, #streams>
* Output: a map of <topic, list of message streams>
*/
public Map<String,List<KafkaStream>> createMessageStreams(Map<String,Integer> topicCountMap);
/**
* You can also obtain a list of KafkaStreams, that iterate over messages
* from topics that match a TopicFilter. (A TopicFilter encapsulates a
* whitelist or a blacklist which is a standard Java regex.)
*/
public List<KafkaStream> createMessageStreamsByFilter(
TopicFilter topicFilter, int numStreams);
/* Commit the offsets of all messages consumed so far. */
public void commitOffsets();
/* Shut down the connector */
public void shutdown();
}
This API is centered around iterators, implemented by the KafkaStream class. Each KafkaStream represents the stream of messages from one or more partitions on one or more servers. Each stream is used for single threaded processing, so the client can provide the number of desired streams in the create call. Thus a stream may represent the merging of multiple server partitions (to correspond to the number of processing threads), but each partition only goes to one stream.
The createMessageStreams call registers the consumer for the topic, which results in rebalancing the consumer/broker assignment. The API encourages creating many topic streams in a single call in order to minimize this rebalancing. The createMessageStreamsByFilter call (additionally) registers watchers to discover new topics that match its filter. Note that each stream that createMessageStreamsByFilter returns may iterate over messages from multiple topics (i.e., if multiple topics are allowed by the filter).
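Putting the pieces above together, a hedged usage sketch might look like the following; it follows the class and method names given in this section, while the import paths, the zk.connect value, and the groupid property name are assumptions that should be checked against the 0.7 client.
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.MessageAndMetadata;

public class HighLevelConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zk.connect", "localhost:2181");    // discover brokers and coordinate via zookeeper
        props.put("groupid", "page_view_counters");   // consumer group id (property name assumed)
        ConsumerConnector connector = Consumer.create(new ConsumerConfig(props));

        Map<String, Integer> topicCount = new HashMap<String, Integer>();
        topicCount.put("page_views", 2);              // ask for two streams, one per consuming thread
        Map<String, List<KafkaStream>> streams = connector.createMessageStreams(topicCount);
        for (KafkaStream stream : streams.get("page_views")) {
            // Each stream should be handed to its own thread; a single loop is used here for brevity.
            for (MessageAndMetadata messageAndMetadata : stream) {
                // process messageAndMetadata.message()
            }
        }
        connector.commitOffsets();                    // checkpoint consumed offsets
        connector.shutdown();
    }
}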
Network Layer
The network layer is a fairly straight-forward NIO server, and will not be described in great detail. The sendfile implementation is done by giving the MessageSet interface a writeTo method. This allows the file-backed message set to use the more efficient transferTo implementation instead of an in-process buffered write. The threading model is a single acceptor thread and N processor threads which handle a fixed number of connections each. This design has been pretty thoroughly tested elsewhere and found to be simple to implement and fast. The protocol is kept quite simple to allow for future implementation of clients in other languages.
Messages
Messages consist of a fixed-size header and variable length opaque byte array payload. The header contains a format version and a CRC32 checksum to detect corruption or truncation. Leaving the payload opaque is the right decision: there is a great deal of progress being made on serialization libraries right now, and any particular choice is unlikely to be right for all uses. Needless to say a particular application using Kafka would likely mandate a particular serialization type as part of its usage. The MessageSet interface is simply an iterator over messages with specialized methods for bulk reading and writing to an NIO Channel.
Message Format
/**
* A message. The format of an N byte message is the following:
*
* If magic byte is 0
*
* 1. 1 byte "magic" identifier to allow format changes
*
* 2. 4 byte CRC32 of the payload
*
* 3. N - 5 byte payload
*
* If magic byte is 1
*
* 1. 1 byte "magic" identifier to allow format changes
*
* 2. 1 byte "attributes" identifier to allow annotations on the message independent of the version (e.g. compression enabled, type of codec used)
*
* 3. 4 byte CRC32 of the payload
*
* 4. N - 6 byte payload
*
*/
Log
A log for a topic named "my_topic" with two partitions consists of two directories (namely my_topic_0 and my_topic_1) populated with data files containing the messages for that topic. The format of the log files is a sequence of "log entries"; each log entry is a 4 byte integer N storing the message length, which is followed by the N message bytes. Each message is uniquely identified by a 64-bit integer offset giving the byte position of the start of this message in the stream of all messages ever sent to that topic on that partition. The on-disk format of each message is given below. Each log file is named with the offset of the first message it contains. So the first file created will be 00000000000.kafka, and each additional file will have an integer name roughly S bytes from the previous file, where S is the max log file size given in the configuration.
The exact binary format for messages is versioned and maintained as a standard interface so message sets can be transferred between producer, broker, and client without recopying or conversion when desirable. This format is as follows:
On-disk format of a message
message length : 4 bytes (value: 1+4+n)
"magic" value : 1 byte
crc : 4 bytes
payload : n bytes
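As a concrete illustration, here is a hedged sketch (plain Java, not Kafka code) that lays out a single log entry in the format above for a magic-0 message: a 4-byte length, a 1-byte magic value, a 4-byte CRC32 of the payload, and the payload itself.
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

public class MessageFormatSketch {
    static ByteBuffer encodeLogEntry(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload);
        ByteBuffer buffer = ByteBuffer.allocate(4 + 1 + 4 + payload.length);
        buffer.putInt(1 + 4 + payload.length);   // message length = 1 + 4 + n
        buffer.put((byte) 0);                    // "magic" format version
        buffer.putInt((int) crc.getValue());     // CRC32 of the payload
        buffer.put(payload);                     // opaque payload bytes
        buffer.flip();
        return buffer;
    }
}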
The use of the message offset as the message id is unusual. Our original idea was to use a GUID generated by the producer, and maintain a mapping from GUID to offset on each broker. But since a consumer must maintain an ID for each server, the global uniqueness of the GUID provides no value. Furthermore the complexity of maintaining the mapping from a random id to an offset requires a heavy weight index structure which must be synchronized with disk, essentially requiring a full persistent random-access data structure. Thus to simplify the lookup structure we decided to use a simple per-partition atomic counter which could be coupled with the partition id and node id to uniquely identify a message; this makes the lookup structure simpler, though multiple seeks per consumer request are still likely. However once we settled on a counter, the jump to directly using the offset seemed natural—both after all are monotonically increasing integers unique to a partition. Since the offset is hidden from the consumer API this decision is ultimately an implementation detail and we went with the more efficient approach.
Writes
The log allows serial appends which always go to the last file. This file is rolled over to a fresh file when it reaches a configurable size (say 1GB). The log takes two configuration parameters: M, which gives the number of messages to write before forcing the OS to flush the file to disk, and S, which gives a number of seconds after which a flush is forced. This gives a durability guarantee of losing at most M messages or S seconds of data in the event of a system crash.
Reads
Reads are done by giving the 64-bit logical offset of a message and an S-byte max chunk size. This will return an iterator over the messages contained in the S-byte buffer. S is intended to be larger than any single message, but in the event of an abnormally large message, the read can be retried multiple times, each time doubling the buffer size, until the message is read successfully. A maximum message and buffer size can be specified to make the server reject messages larger than some size, and to give a bound to the client on the maximum it need ever read to get a complete message. It is likely that the read buffer ends with a partial message; this is easily detected by the size delimiting.
The actual process of reading from an offset requires first locating the log segment file in which the data is stored, calculating the file-specific offset from the global offset value, and then reading from that file offset. The search is done as a simple binary search variation against an in-memory range maintained for each file.
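The lookup step above can be illustrated with a small, hedged sketch (not Kafka's implementation): given the sorted start offsets of the segment files, which are also the file names, a binary search finds the file holding a global offset and the position within that file.
import java.util.Arrays;

public class SegmentLookupSketch {
    // Returns the index of the segment whose range contains the requested offset.
    static int findSegment(long[] segmentStartOffsets, long offset) {
        int i = Arrays.binarySearch(segmentStartOffsets, offset);
        // Exact hit: the offset is the first message of that segment.
        // Otherwise binarySearch returns -(insertionPoint) - 1, and we want the preceding segment.
        return i >= 0 ? i : -i - 2;
    }

    public static void main(String[] args) {
        long[] starts = {0L, 1073741824L, 2147483648L};   // e.g. segments of roughly 1GB each
        long offset = 1500000000L;
        int segment = findSegment(starts, offset);
        long filePosition = offset - starts[segment];     // file-specific offset within that segment
        System.out.println("segment " + segment + ", position " + filePosition);
    }
}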
The log provides the capability of getting the most recently written message to allow clients to start subscribing as of "right now". This is also useful in the case the consumer fails to consume its data within its SLA-specified number of days. In this case when the client attempts to consume a non-existent offset it is given an OutOfRangeException and can either reset itself or fail as appropriate to the use case.
The following is the format of the results sent to the consumer.
MessageSetSend (fetch result)
total length : 4 bytes
error code : 2 bytes
message 1 : x bytes
...
message n : x bytes
MultiMessageSetSend (multiFetch result)
total length : 4 bytes
error code : 2 bytes
messageSetSend 1
...
messageSetSend n
Deletes
Data is deleted one log segment at a time. The log manager allows pluggable delete policies to choose which files are eligible for deletion. The current policy deletes any log with a modification time of more than N days ago, though a policy which retained the last N GB could also be useful. To avoid locking reads while still allowing deletes that modify the segment list we use a copy-on-write style segment list implementation that provides consistent views to allow a binary search to proceed on an immutable static snapshot view of the log segments while deletes are progressing.
Guarantees
The log provides a configuration parameter M which controls the maximum number of messages that are written before forcing a flush to disk. On startup a log recovery process is run that iterates over all messages in the newest log segment and verifies that each message entry is valid. A message entry is valid if the sum of its size and offset is less than the length of the file AND the CRC32 of the message payload matches the CRC stored with the message. In the event corruption is detected the log is truncated to the last valid offset.
Note that two kinds of corruption must be handled: truncation, in which an unwritten block is lost due to a crash, and corruption, in which a nonsense block is ADDED to the file. The reason for this is that in general the OS makes no guarantee of the write order between the file inode and the actual block data, so in addition to losing written data the file can gain nonsense data if the inode is updated with a new size but a crash occurs before the block containing that data is written. The CRC detects this corner case, and prevents it from corrupting the log (though the unwritten messages are, of course, lost).
Distribution
Zookeeper Directories
The following gives the zookeeper structures and algorithms used for co-ordination between consumers and brokers.
Notation
When an element in a path is denoted [xyz], that means that the value of xyz is not fixed and there is in fact a zookeeper znode for each possible value of xyz. For example /topics/[topic] would be a directory named /topics containing a sub-directory for each topic name. Numerical ranges are also given, such as [0...5] to indicate the subdirectories 0, 1, 2, 3, 4, 5. An arrow -> is used to indicate the contents of a znode. For example /hello -> world would indicate a znode /hello containing the value "world".
Broker Node Registry
/brokers/ids/[0...N] --> host:port (ephemeral node)
This is a list of all present broker nodes, each of which provides a unique logical broker id which identifies it to consumers (which must be given as part of its configuration). On startup, a broker node registers itself by creating a znode with the logical broker id under /brokers/ids. The purpose of the logical broker id is to allow a broker to be moved to a different physical machine without affecting consumers. An attempt to register a broker id that is already in use (say because two servers are configured with the same broker id) is an error.
Since the broker registers itself in zookeeper using ephemeral znodes, this registration is dynamic and will disappear if the broker is shutdown or dies (thus notifying consumers it is no longer available).
Broker Topic Registry
/brokers/topics/[topic]/[0...N] --> nPartitions (ephemeral node)
Each broker registers itself under the topics it maintains and stores the number of partitions for that topic.
Consumers and Consumer Groups
Consumers of topics also register themselves in Zookeeper, in order to balance the consumption of data and track their offsets in each partition for each broker they consume from.
Multiple consumers can form a group and jointly consume a single topic. Each consumer in the same group is given a shared group_id. For example if one consumer is your foobar process, which is run across three machines, then you might assign this group of consumers the id "foobar". This group id is provided in the configuration of the consumer, and is your way to tell the consumer which group it belongs to.
The consumers in a group divide up the partitions as fairly as possible; each partition is consumed by exactly one consumer in a consumer group.
Consumer Id Registry
In addition to the group_id which is shared by all consumers in a group, each consumer is given a transient, unique consumer_id (of the form hostname:uuid) for identification purposes. Consumer ids are registered in the following directory.
/consumers/[group_id]/ids/[consumer_id] --> {"topic1": #streams, ..., "topicN": #streams} (ephemeral node)
Each of the consumers in the group registers under its group and creates a znode with its consumer_id. The value of the znode contains a map of <topic, #streams>. This id is simply used to identify each of the consumers which is currently active within a group. This is an ephemeral node so it will disappear if the consumer process dies.
Consumer Offset Tracking
Consumers track the maximum offset they have consumed in each partition. This value is stored in a zookeeper directory
/consumers/[group_id]/offsets/[topic]/[broker_id-partition_id] --> offset_counter_value (persistent node)
Partition Owner registry
Each broker partition is consumed by a single consumer within a given consumer group. The consumer must establish its ownership of a given partition before any consumption can begin. To establish its ownership, a consumer writes its own id in an ephemeral node under the particular broker partition it is claiming.
/consumers/[group_id]/owners/[topic]/[broker_id-partition_id] --> consumer_node_id (ephemeral node)
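A hedged sketch of this ownership claim using the plain ZooKeeper client is shown below; the path layout follows this section, while the retry behavior during rebalancing is simplified away.
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class PartitionOwnerSketch {
    // Returns true if this consumer now owns the partition; the ephemeral node
    // disappears automatically when the consumer's zookeeper session ends.
    public static boolean claim(ZooKeeper zk, String groupId, String topic,
                                String brokerPartition, String consumerId) throws Exception {
        String path = "/consumers/" + groupId + "/owners/" + topic + "/" + brokerPartition;
        try {
            zk.create(path, consumerId.getBytes(),
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            return true;
        } catch (KeeperException.NodeExistsException e) {
            return false;   // another consumer owns it; the caller should retry after a delay
        }
    }
}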
Broker node registration
The broker nodes are basically independent, so they only publish information about what they have. When a broker joins, it registers itself under the broker node registry directory and writes information about its host name and port. The broker also registers the list of existing topics and their logical partitions in the broker topic registry. New topics are registered dynamically when they are created on the broker.
Consumer registration algorithm
When a consumer starts, it does the following:
Register itself in the consumer id registry under its group.
Register a watch on changes (new consumers joining or any existing consumers leaving) under the consumer id registry. (Each change triggers rebalancing among all consumers within the group to which the changed consumer belongs.)
Register a watch on changes (new brokers joining or any existing brokers leaving) under the broker id registry. (Each change triggers rebalancing among all consumers in all consumer groups.)
If the consumer creates a message stream using a topic filter, it also registers a watch on changes (new topics being added) under the broker topic registry. (Each change will trigger re-evaluation of the available topics to determine which topics are allowed by the topic filter. A new allowed topic will trigger rebalancing among all consumers within the consumer group.)
Force itself to rebalance within its consumer group.
Consumer rebalancing algorithm
The consumer rebalancing algorithm allows all the consumers in a group to come into consensus on which consumer is consuming which partitions. Consumer rebalancing is triggered on each addition or removal of both broker nodes and other consumers within the same group. For a given topic and a given consumer group, broker partitions are divided evenly among consumers within the group. A partition is always consumed by a single consumer. This design simplifies the implementation. Had we allowed a partition to be concurrently consumed by multiple consumers, there would be contention on the partition and some kind of locking would be required. If there are more consumers than partitions, some consumers won't get any data at all. During rebalancing, we try to assign partitions to consumers in a way that reduces the number of broker nodes each consumer has to connect to.
Each consumer does the following during rebalancing:
1. For each topic T that Ci subscribes to
2. let PT be all partitions producing topic T
3. let CG be all consumers in the same group as Ci that consume topic T
4. sort PT (so partitions on the same broker are clustered together)
5. sort CG
6. let i be the index position of Ci in CG and let N = size(PT)/size(CG)
7. assign partitions from i*N to (i+1)*N - 1 to consumer Ci
8. remove current entries owned by Ci from the partition owner registry
9. add newly assigned partitions to the partition owner registry
(we may need to re-try this until the original partition owner releases its ownership)
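To make steps 4-7 concrete, here is a hedged sketch of the assignment arithmetic only (not the zookeeper bookkeeping); the partition and consumer names are placeholders, and remainder partitions are ignored just as in the pseudocode above.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class RebalanceSketch {
    public static void main(String[] args) {
        List<String> partitions = new ArrayList<String>(Arrays.asList(
                "broker1-0", "broker1-1", "broker2-0", "broker2-1", "broker3-0", "broker3-1"));
        List<String> consumers = new ArrayList<String>(Arrays.asList("hostA:uuid1", "hostB:uuid2"));
        Collections.sort(partitions);   // clusters partitions on the same broker together
        Collections.sort(consumers);

        String self = "hostB:uuid2";    // the consumer running the rebalance
        int i = consumers.indexOf(self);
        int n = partitions.size() / consumers.size();
        for (int p = i * n; p <= (i + 1) * n - 1; p++) {
            System.out.println(self + " claims " + partitions.get(p));   // then write the owner znode
        }
    }
}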
When rebalancing is triggered at one consumer, rebalancing should be triggered in other consumers within the same group at about the same time.