[Repost] StateThreads: A High-Performance, Highly Concurrent, Scalable, and Readable Network Server Architecture

Posted 2012-11-30 11:25:57 by win_lin

The translation follows the original text. I have put the code on GitHub: http://github.com/ossrs/state-threads.

State Threads for Internet Applications

Introduction

State Threads is an application library which provides a foundation for writing fast and highly scalable Internet Applications on UNIX-like platforms. It combines the simplicity of the multithreaded programming paradigm, in which one thread supports each simultaneous connection, with the performance and scalability of an event-driven state machine architecture.

1. Definitions

1.1 Internet Applications

An Internet Application (IA) is either a server or client network application that accepts connections from clients and may or may not connect to servers. In an IA the arrival or departure of network data often controls processing (that is, IA is a data-driven application). For each connection, an IA does some finite amount of work involving data exchange with its peer, where its peer may be either a client or a server. The typical transaction steps of an IA are to accept a connection, read a request, do some finite and predictable amount of work to process the request, then write a response to the peer that sent the request. One example of an IA is a Web server; the most general example of an IA is a proxy server, because it both accepts connections from clients and connects to other servers.

We assume that the performance of an IA is constrained by available CPU cycles rather than network bandwidth or disk I/O (that is, CPU is a bottleneck resource).

1.2 Performance and Scalability

The performance of an IA is usually evaluated as its throughput measured in transactions per second or bytes per second (one can be converted to the other, given the average transaction size). There are several benchmarks that can be used to measure throughput of Web serving applications for specific workloads (such as SPECweb96, WebStone, WebBench). Although there is no common definition for scalability, in general it expresses the ability of an application to sustain its performance when some external condition changes. For IAs this external condition is either the number of clients (also known as "users," "simultaneous connections," or "load generators") or the underlying hardware system size (number of CPUs, memory size, and so on). Thus there are two types of scalability: load scalability and system scalability, respectively.
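The conversion between the two throughput units mentioned above is a single multiplication or division. A minimal sketch, with function names invented for this illustration:

```c
#include <assert.h>

/* Hypothetical helpers, not taken from any benchmark suite. */

/* Transactions per second -> bytes per second. */
double tps_to_bytes_per_sec(double tps, double avg_transaction_bytes)
{
    assert(avg_transaction_bytes > 0.0);
    return tps * avg_transaction_bytes;
}

/* Bytes per second -> transactions per second. */
double bytes_per_sec_to_tps(double bytes_per_sec, double avg_transaction_bytes)
{
    assert(avg_transaction_bytes > 0.0);
    return bytes_per_sec / avg_transaction_bytes;
}
```

For instance, 500 transactions per second at an average of 10,240 bytes per transaction corresponds to 5,120,000 bytes per second.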

The figure below shows how the throughput of an idealized IA changes with the increasing number of clients (solid blue line). Initially the throughput grows linearly (the slope represents the maximal throughput that one client can provide). Within this initial range, the IA is underutilized and CPUs are partially idle. Further increase in the number of clients leads to a system saturation, and the throughput gradually stops growing as all CPUs become fully utilized. After that point, the throughput stays flat because there are no more CPU cycles available. In the real world, however, each simultaneous connection consumes some computational and memory resources, even when idle, and this overhead grows with the number of clients. Therefore, the throughput of the real world IA starts dropping after some point (dashed blue line in the figure below). The rate at which the throughput drops depends, among other things, on application design.

We say that an application has a good load scalability if it can sustain its throughput over a wide range of loads. Interestingly, the SPECweb99 benchmark somewhat reflects the Web server's load scalability because it measures the number of clients (load generators) given a mandatory minimal throughput per client (that is, it measures the server's capacity). This is unlike SPECweb96 and other benchmarks that use the throughput as their main metric (see the figure below).

System scalability is the ability of an application to sustain its performance per hardware unit (such as a CPU) with the increasing number of these units. In other words, good system scalability means that doubling the number of processors will roughly double the application's throughput (dashed green line). We assume here that the underlying operating system also scales well. Good system scalability allows you to initially run an application on the smallest system possible, while retaining the ability to move that application to a larger system if necessary, without excessive effort or expense. That is, an application need not be rewritten or even undergo a major porting effort when changing system size.

Although scalability and performance are more important in the case of server IAs, they should also be considered for some client applications (such as benchmark load generators).

1.3 Concurrency

Concurrency reflects the parallelism in a system. The two unrelated types are virtual concurrency and real concurrency.

Virtual (or apparent) concurrency is the number of simultaneous connections that a system supports.

Real concurrency is the number of hardware devices, including CPUs, network cards, and disks, that actually allow a system to perform tasks in parallel.

An IA must provide virtual concurrency in order to serve many users simultaneously. To achieve maximum performance and scalability in doing so, the number of programming entities that an IA creates to be scheduled by the OS kernel should be kept close to (within an order of magnitude of) the real concurrency found on the system. These programming entities scheduled by the kernel are known as kernel execution vehicles. Examples of kernel execution vehicles include Solaris lightweight processes and IRIX kernel threads. In other words, the number of kernel execution vehicles should be dictated by the system size and not by the number of simultaneous connections.
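One way to follow this rule is to size the pool of kernel execution vehicles from the machine rather than from the load. The sketch below assumes a platform that supports the `_SC_NPROCESSORS_ONLN` query (a widespread extension, not required by POSIX); the function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper: derive the number of kernel execution vehicles
 * (here, worker processes) from the system size, not from the load. */
long worker_count_for_system(void)
{
    /* Number of CPUs currently online; an extension on Linux, Solaris, BSD. */
    long ncpu = sysconf(_SC_NPROCESSORS_ONLN);

    if (ncpu < 1)
        ncpu = 1;       /* fall back to one worker if the query fails */
    assert(ncpu >= 1);
    return ncpu;
}
```

The result depends only on the hardware, so it stays the same whether the application is serving ten connections or ten thousand.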

2. Existing Architectures

There are a few different architectures that are commonly used by IAs. These include the Multi-Process, Multi-Threaded, and Event-Driven State Machine architectures.

2.1 Multi-Process Architecture

In the Multi-Process (MP) architecture, an individual process is dedicated to each simultaneous connection. A process performs all of a transaction's initialization steps and services a connection completely before moving on to service a new connection.

User sessions in IAs are relatively independent; therefore, no synchronization between processes handling different connections is necessary. Because each process has its own private address space, this architecture is very robust. If a process serving one of the connections crashes, the other sessions will not be affected. However, to serve many concurrent connections, an equal number of processes must be employed. Because processes are kernel entities (and are in fact the heaviest ones), the number of kernel entities will be at least as large as the number of concurrent sessions. On most systems, good performance will not be achieved when more than a few hundred processes are created because of the high context-switching overhead. In other words, MP applications have poor load scalability.

On the other hand, MP applications have very good system scalability, because no resources are shared among different processes and there is no synchronization overhead.

The Apache Web Server 1.x ([Reference 1]) uses the MP architecture on UNIX systems.
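The core step of the MP architecture can be sketched as follows. This is an illustration under stated assumptions, not Apache's code: a `socketpair` stands in for an accepted TCP connection, the "transaction" is a simple echo, and all function names are invented for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Placeholder transaction: read one request and echo it back as the response. */
static void serve_connection(int fd)
{
    char buf[128];
    ssize_t n = read(fd, buf, sizeof buf);

    if (n > 0) {
        ssize_t written = write(fd, buf, (size_t)n);
        (void)written;              /* a real server would handle short writes */
    }
    close(fd);
}

/* MP step: dedicate a new process to one accepted connection.
 * Returns the child's pid, or -1 if fork() failed. */
pid_t serve_in_new_process(int connfd)
{
    pid_t pid = fork();

    if (pid == 0) {                 /* child: service the connection, then exit */
        serve_connection(connfd);
        _exit(0);
    }
    close(connfd);                  /* parent: the child owns the connection now */
    return pid;
}

/* Demo: returns 1 if the child echoed the request and exited cleanly. */
int mp_demo(void)
{
    int sv[2], status;
    char reply[5];
    pid_t pid;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    pid = serve_in_new_process(sv[1]);
    if (pid < 0)
        return -1;
    if (write(sv[0], "hello", 5) != 5 || read(sv[0], reply, 5) != 5)
        return -1;
    close(sv[0]);
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    assert(WIFEXITED(status));
    return WEXITSTATUS(status) == 0 && memcmp(reply, "hello", 5) == 0;
}
```

Every connection costs one `fork()` and one kernel-scheduled process, which is the source of both the robustness and the poor load scalability described above.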

2.2 Multi-Threaded Architecture

In the Multi-Threaded (MT) architecture, multiple independent threads of control are employed within a single shared address space. Like a process in the MP architecture, each thread performs all of a transaction's initialization steps and services a connection completely before moving on to service a new connection.

Many modern UNIX operating systems implement a many-to-few model when mapping user-level threads to kernel entities. In this model, an arbitrarily large number of user-level threads is multiplexed onto a lesser number of kernel execution vehicles. Kernel execution vehicles are also known as virtual processors. Whenever a user-level thread makes a blocking system call, the kernel execution vehicle it is using will become blocked in the kernel. If there are no other non-blocked kernel execution vehicles and there are other runnable user-level threads, a new kernel execution vehicle will be created automatically. This prevents the application from blocking when it can continue to make useful forward progress.

Because IAs are by nature network I/O driven, all concurrent sessions block on network I/O at various points. As a result, the number of virtual processors created in the kernel grows close to the number of user-level threads (or simultaneous connections). When this occurs, the many-to-few model effectively degenerates to a one-to-one model. Again, like in the MP architecture, the number of kernel execution vehicles is dictated by the number of simultaneous connections rather than by number of CPUs. This reduces an application's load scalability. However, because kernel threads (lightweight processes) use fewer resources and are more light-weight than traditional UNIX processes, an MT application should scale better with load than an MP application.

Unexpectedly, the small number of virtual processors sharing the same address space in the MT architecture destroys an application's system scalability because of contention among the threads on various locks. Even if an application itself is carefully optimized to avoid lock contention around its own global data (a non-trivial task), there are still standard library functions and system calls that use common resources hidden from the application. For example, on many platforms thread safety of memory allocation routines (malloc(3), free(3), and so on) is achieved by using a single global lock. Another example is a per-process file descriptor table. This common resource table is shared by all kernel execution vehicles within the same process and must be protected when one modifies it via certain system calls (such as open(2), close(2), and so on). In addition to that, maintaining the caches coherent among CPUs on multiprocessor systems hurts performance when different threads running on different CPUs modify data items on the same cache line.

In order to improve load scalability, some applications employ a different type of MT architecture: they create one or more thread(s) per task rather than one thread per connection. For example, one small group of threads may be responsible for accepting client connections, another for request processing, and yet another for serving responses. The main advantage of this architecture is that it eliminates the tight coupling between the number of threads and number of simultaneous connections. However, in this architecture, different task-specific thread groups must share common work queues that must be protected by mutual exclusion locks (a typical producer-consumer problem). This adds synchronization overhead that causes an application to perform badly on multiprocessor systems. In other words, in this architecture, the application's system scalability is sacrificed for the sake of load scalability.

Of course, the usual nightmares of threaded programming, including data corruption, deadlocks, and race conditions, also make MT architecture (in any form) non-simplistic to use.

2.3 Event-Driven State Machine Architecture

In the Event-Driven State Machine (EDSM) architecture, a single process is employed to concurrently process multiple connections. The basics of this architecture are described in Comer and Stevens [Reference 2]. The EDSM architecture performs one basic data-driven step associated with a particular connection at a time, thus multiplexing many concurrent connections. The process operates as a state machine that receives an event and then reacts to it.

In the idle state the EDSM calls select(2) or poll(2) to wait for network I/O events. When a particular file descriptor is ready for I/O, the EDSM completes the corresponding basic step (usually by invoking a handler function) and starts the next one. This architecture uses non-blocking system calls to perform asynchronous network I/O operations. For more details on non-blocking I/O see Stevens [Reference 3].
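The loop described above can be sketched as a small echo state machine. This is a minimal illustration, not code from any of the cited servers: the state names, `struct conn`, and the functions are invented for the example, socketpairs stand in for client connections, and error handling is reduced to closing the connection.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define NCONNS 2

enum conn_state { CONN_READING, CONN_WRITING, CONN_DONE };

struct conn {
    int fd;
    enum conn_state state;
    char buf[256];
    size_t len, off;        /* request length; response bytes already sent */
};

/* Handler: advance one connection by one basic step. It never blocks. */
static void conn_step(struct conn *c)
{
    ssize_t n;

    if (c->state == CONN_READING) {
        n = read(c->fd, c->buf, sizeof c->buf);
        if (n > 0) {
            c->len = (size_t)n;
            c->off = 0;
            c->state = CONN_WRITING;
        } else if (n == 0 || errno != EAGAIN) {
            c->state = CONN_DONE;               /* peer closed, or a real error */
        }
    } else if (c->state == CONN_WRITING) {
        n = write(c->fd, c->buf + c->off, c->len - c->off);
        if (n > 0 && (c->off += (size_t)n) == c->len)
            c->state = CONN_READING;            /* response fully sent */
        else if (n < 0 && errno != EAGAIN)
            c->state = CONN_DONE;
    }
}

/* Event loop: wait in poll(2), then run the handler of every ready descriptor. */
void edsm_run(struct conn *conns, int nconns)
{
    struct pollfd pfds[NCONNS];
    int i, active;

    assert(nconns > 0 && nconns <= NCONNS);
    for (;;) {
        for (i = 0, active = 0; i < nconns; i++) {
            int done = conns[i].state == CONN_DONE;
            pfds[i].fd = done ? -1 : conns[i].fd;   /* poll ignores negative fds */
            pfds[i].events = conns[i].state == CONN_WRITING ? POLLOUT : POLLIN;
            pfds[i].revents = 0;
            active += !done;
        }
        if (active == 0 || poll(pfds, (nfds_t)nconns, -1) < 0)
            return;
        for (i = 0; i < nconns; i++)
            if (pfds[i].revents != 0)
                conn_step(&conns[i]);
    }
}

/* Demo: returns 1 if every request was echoed back, 0 or -1 otherwise. */
int edsm_demo(void)
{
    static const char *requests[NCONNS] = { "ping", "pong" };
    struct conn conns[NCONNS];
    int client[NCONNS], sv[2], i, ok = 1;
    char reply[4];

    for (i = 0; i < NCONNS; i++) {
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
            return -1;
        client[i] = sv[0];
        memset(&conns[i], 0, sizeof conns[i]);
        conns[i].fd = sv[1];
        conns[i].state = CONN_READING;
        fcntl(sv[1], F_SETFL, O_NONBLOCK);      /* server side never blocks */
        if (write(client[i], requests[i], 4) != 4)
            return -1;
        shutdown(client[i], SHUT_WR);           /* client is done sending */
    }
    edsm_run(conns, NCONNS);
    for (i = 0; i < NCONNS; i++) {
        ok = ok && read(client[i], reply, 4) == 4 && memcmp(reply, requests[i], 4) == 0;
        close(client[i]);
        close(conns[i].fd);
    }
    return ok;
}
```

Note how the per-connection progress has to be stored by hand in `state`, `len`, and `off`; nothing is kept on the call stack between steps.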

To take advantage of hardware parallelism (real concurrency), multiple identical processes may be created. This is called Symmetric Multi-Process EDSM and is used, for example, in the Zeus Web Server ([Reference 4]). To more efficiently multiplex disk I/O, special "helper" processes may be created. This is called Asymmetric Multi-Process EDSM and was proposed for Web servers by Druschel and others [Reference 5].

EDSM is probably the most scalable architecture for IAs. Because the number of simultaneous connections (virtual concurrency) is completely decoupled from the number of kernel execution vehicles (processes), this architecture has very good load scalability. It requires only minimal user-level resources to create and maintain an additional connection.

Like MP applications, Multi-Process EDSM has very good system scalability because no resources are shared among different processes and there is no synchronization overhead.

Unfortunately, the EDSM architecture is monolithic rather than based on the concept of threads, so new applications generally need to be implemented from the ground up. In effect, the EDSM architecture simulates threads and their stacks the hard way.

3. State Threads Library

The State Threads library combines the advantages of all of the above architectures. The interface preserves the programming simplicity of thread abstraction, allowing each simultaneous connection to be treated as a separate thread of execution within a single process. The underlying implementation is close to the EDSM architecture as the state of each particular concurrent session is saved in a separate memory segment.

3.1 State Changes and Scheduling

The state of each concurrent session includes its stack environment (stack pointer, program counter, CPU registers) and its stack. Conceptually, a thread context switch can be viewed as a process changing its state. There are no kernel entities involved other than processes. Unlike other general-purpose threading libraries, the State Threads library is fully deterministic. The thread context switch (process state change) can only happen in a well-known set of functions (at I/O points or at explicit synchronization points). As a result, process-specific global data does not have to be protected by mutual exclusion locks in most cases. The entire application is free to use all the static variables and non-reentrant library functions it wants, greatly simplifying programming and debugging while increasing performance. This is somewhat similar to a co-routine model (co-operatively multitasked threads), except that no explicit yield is needed -- sooner or later, a thread performs a blocking I/O operation and thus surrenders control. All threads of execution (simultaneous connections) have the same priority, so scheduling is non-preemptive, like in the EDSM architecture. Because IAs are data-driven (processing is limited by the size of network buffers and data arrival rates), scheduling is non-time-slicing.
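The idea of a context switch as a deterministic change of process state can be illustrated with the `ucontext` functions. This sketch is not the library's implementation: it substitutes `makecontext`/`swapcontext` for the library's own switching code, replaces the I/O functions with an explicit `switch_point()`, and all names are invented. It assumes a platform that still provides `<ucontext.h>`, such as Linux with glibc.

```c
#define _XOPEN_SOURCE 600
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

#define NTHREADS   2
#define NSTEPS     3
#define STACK_SIZE (64 * 1024)

static ucontext_t scheduler_ctx;            /* state of the scheduling loop */
static ucontext_t thread_ctx[NTHREADS];     /* saved state of each user-level thread */
static int finished[NTHREADS];
static int current;                         /* index of the running thread */

/* Process-wide data shared by all threads, deliberately without a lock. */
static int shared_counter;
static char trace[NTHREADS * NSTEPS + 1];
static int trace_len;

/* The only place a switch can happen. In the library this role is played by
 * its I/O and synchronization functions; here it is an explicit call. */
static void switch_point(void)
{
    swapcontext(&thread_ctx[current], &scheduler_ctx);
}

static void thread_body(void)
{
    int step;                               /* lives on this thread's own stack */

    for (step = 0; step < NSTEPS; step++) {
        shared_counter++;                   /* cannot be interrupted mid-update */
        trace[trace_len++] = (char)('A' + current);
        switch_point();
    }
    finished[current] = 1;                  /* returning resumes uc_link */
}

/* Runs NTHREADS threads round-robin and returns the final shared counter. */
int run_threads(void)
{
    void *stacks[NTHREADS];
    int remaining = NTHREADS;

    shared_counter = trace_len = 0;
    for (current = 0; current < NTHREADS; current++) {
        stacks[current] = malloc(STACK_SIZE);
        assert(stacks[current] != NULL);
        getcontext(&thread_ctx[current]);
        thread_ctx[current].uc_stack.ss_sp = stacks[current];
        thread_ctx[current].uc_stack.ss_size = STACK_SIZE;
        thread_ctx[current].uc_link = &scheduler_ctx;
        makecontext(&thread_ctx[current], thread_body, 0);
        finished[current] = 0;
    }
    while (remaining > 0) {
        for (current = 0; current < NTHREADS; current++) {
            if (finished[current])
                continue;
            swapcontext(&scheduler_ctx, &thread_ctx[current]);
            if (finished[current])
                remaining--;
        }
    }
    trace[trace_len] = '\0';
    for (current = 0; current < NTHREADS; current++)
        free(stacks[current]);
    return shared_counter;
}
```

Because control changes hands only inside `switch_point()`, the two threads always interleave in the same order, and the unprotected counter and trace buffer are updated safely.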

Only two types of external events are handled by the library's scheduler, because only these events can be detected by select(2) or poll(2): I/O events (a file descriptor is ready for I/O) and time events (some timeout has expired). However, other types of events (such as a signal sent to a process) can also be handled by converting them to I/O events. For example, a signal handling function can perform a write to a pipe (write(2) is reentrant/asynchronous-safe), thus converting a signal event to an I/O event.
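The signal-to-I/O conversion described above can be sketched with a pipe and a handler that writes one byte per signal. The function names are hypothetical, and the sketch waits with plain `poll(2)` rather than through the library's scheduler.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static int signal_pipe[2] = { -1, -1 };     /* [0] read end, [1] write end */

/* Signal handler: only async-signal-safe work, a single write(2). */
static void forward_signal(int signo)
{
    int saved_errno = errno;
    unsigned char byte = (unsigned char)signo;
    ssize_t rc = write(signal_pipe[1], &byte, 1);

    (void)rc;                               /* pipe full: an event is already pending */
    errno = saved_errno;
}

/* Route signo through the pipe. Returns 0 on success, -1 on error. */
int signal_to_io_install(int signo)
{
    struct sigaction sa;

    if (signal_pipe[0] < 0) {
        if (pipe(signal_pipe) < 0)
            return -1;
        fcntl(signal_pipe[0], F_SETFL, O_NONBLOCK);
        fcntl(signal_pipe[1], F_SETFL, O_NONBLOCK);
    }
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = forward_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(signo, &sa, NULL);
}

/* Wait for a forwarded signal the same way an event loop waits for any
 * other descriptor. Returns the signal number, 0 on timeout, -1 on error. */
int signal_to_io_wait(int timeout_ms)
{
    struct pollfd pfd;
    unsigned char byte;
    int rc;

    pfd.fd = signal_pipe[0];
    pfd.events = POLLIN;
    pfd.revents = 0;
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);
    if (rc <= 0)
        return rc;
    return read(signal_pipe[0], &byte, 1) == 1 ? (int)byte : -1;
}
```

After `signal_to_io_install(SIGUSR1)`, a delivered `SIGUSR1` shows up as a readable byte on the pipe, so the signal is processed at a well-defined point instead of interrupting arbitrary code.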

To take advantage of hardware parallelism, as in the EDSM architecture, multiple processes can be created in either a symmetric or asymmetric manner. Process management is not in the library's scope but instead is left up to the application.

There are several general-purpose threading libraries that implement a many-to-one model (many user-level threads to one kernel execution vehicle), using the same basic techniques as the State Threads library (non-blocking I/O, event-driven scheduler, and so on). For an example, see GNU Portable Threads ([Reference 6]). Because they are general-purpose, these libraries have different objectives than the State Threads library. The State Threads library is not a general-purpose threading library, but rather an application library that targets only certain types of applications (IAs) in order to achieve the highest possible performance and scalability for those applications.

3.2 Scalability

State threads are very lightweight user-level entities, and therefore creating and maintaining user connections requires minimal resources. An application using the State Threads library scales very well with the increasing number of connections.

On multiprocessor systems an application should create multiple processes to take advantage of hardware parallelism. Using multiple separate processes is the only way to achieve the highest possible system scalability. This is because duplicating per-process resources is the only way to avoid significant synchronization overhead on multiprocessor systems. Creating separate UNIX processes naturally offers resource duplication. Again, as in the EDSM architecture, there is no connection between the number of simultaneous connections (which may be very large and changes within a wide range) and the number of kernel entities (which is usually small and constant). In other words, the State Threads library makes it possible to multiplex a large number of simultaneous connections onto a much smaller number of separate processes, thus allowing an application to scale well with both the load and system size.
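Since process management is left to the application, a symmetric arrangement might be set up roughly as follows. This is a sketch under assumptions: `run_workers` and `idle_worker` are invented names, and the worker body is a placeholder for a per-process scheduler serving many connections.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef int (*worker_fn)(int index);

/* Placeholder worker. In a real server each process would run its own
 * scheduler and serve many connections; here it just reports success. */
int idle_worker(int index)
{
    (void)index;
    return 0;
}

/* Fork nworkers identical processes (the symmetric arrangement) and wait for
 * them. Returns how many exited with status 0, or -1 if a fork failed. */
int run_workers(int nworkers, worker_fn fn)
{
    int i, status, started = 0, succeeded = 0;

    assert(nworkers > 0 && fn != NULL);
    for (i = 0; i < nworkers; i++) {
        pid_t pid = fork();

        if (pid < 0)
            break;
        if (pid == 0)
            _exit(fn(i) == 0 ? 0 : 1);      /* child never returns to the caller */
        started++;
    }
    for (i = 0; i < started; i++)
        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            succeeded++;
    return started == nworkers ? succeeded : -1;
}
```

The number of workers would normally be a small constant tied to the CPU count, while each worker multiplexes as many connections as the load requires.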

3.3 Performance

Performance is one of the library's main objectives. The State Threads library is implemented to minimize the number of system calls and to make thread creation and context switching as fast as possible. For example, per-thread signal mask does not exist (unlike POSIX threads), so there is no need to save and restore a process's signal mask on every thread context switch. This eliminates two system calls per context switch. Signal events can be handled much more efficiently by converting them to I/O events (see above).
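The cost being avoided can be seen in the standard `sigsetjmp`/`siglongjmp` pair, whose `savemask` argument decides whether the signal mask is saved and restored along with the execution context. The sketch below only demonstrates that difference in behavior; it is not the library's context-switch code, and the function name is invented.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

/* Jump back to a saved context after blocking SIGUSR1 and report whether the
 * signal is still blocked afterwards (1) or the old mask came back (0). */
int mask_survives_jump(int savemask)
{
    sigjmp_buf env;
    sigset_t usr1, original, now;
    int blocked;

    sigemptyset(&usr1);
    sigaddset(&usr1, SIGUSR1);
    sigprocmask(SIG_UNBLOCK, &usr1, &original);     /* start with SIGUSR1 unblocked */

    if (sigsetjmp(env, savemask) == 0) {
        sigprocmask(SIG_BLOCK, &usr1, NULL);        /* change the process-wide mask */
        siglongjmp(env, 1);                         /* "switch" back */
    }
    sigprocmask(SIG_SETMASK, NULL, &now);           /* query only: set argument is NULL */
    blocked = sigismember(&now, SIGUSR1);
    assert(blocked == 0 || blocked == 1);
    sigprocmask(SIG_SETMASK, &original, NULL);      /* leave the caller's mask untouched */
    return blocked;
}
```

With `savemask` set to 0 the jump leaves the mask alone, so the mask is effectively a single process-wide setting; with a nonzero `savemask` the mask in effect at `sigsetjmp` time is put back, which requires extra work on every switch.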

3.4 Portability

The library uses the same general, underlying concepts as the EDSM architecture, including non-blocking I/O, file descriptors, and I/O multiplexing. These concepts are available in some form on most UNIX platforms, making the library very portable across many flavors of UNIX. There are only a few platform-dependent sections in the source.

3.5 State Threads and NSPR

The State Threads library is a derivative of the Netscape Portable Runtime library (NSPR) [Reference 7]. The primary goal of NSPR is to provide a platform-independent layer for system facilities, where system facilities include threads, thread synchronization, and I/O. Performance and scalability are not the main concern of NSPR. The State Threads library addresses performance and scalability while remaining much smaller than NSPR. It is contained in 8 source files as opposed to more than 400, but provides all the functionality that is needed to write efficient IAs on UNIX-like platforms.

                                        NSPR        State Threads
Lines of code                           ~150,000    ~3,000
Dynamic library size (debug version):
  IRIX                                  ~700 KB     ~60 KB
  Linux                                 ~900 KB     ~70 KB

Conclusion

State Threads is an application library which provides a foundation for writing Internet Applications. To summarize, it has the following advantages:

It allows the design of fast and highly scalable applications. An application will scale well with both load and number of CPUs.

It greatly simplifies application programming and debugging because, as a rule, no mutual exclusion locking is necessary and the entire application is free to use static variables and non-reentrant library functions.

The library's main limitation:

All I/O operations on sockets must use the State Thread library's I/O functions because only those functions perform thread scheduling and prevent the application's processes from blocking.

References

1. Apache Software Foundation, http://www.apache.org.
2. Douglas E. Comer, David L. Stevens, Internetworking With TCP/IP, Vol. III: Client-Server Programming And Applications, Second Edition, Ch. 8, 12.
3. W. Richard Stevens, UNIX Network Programming, Second Edition, Vol. 1, Ch. 15.
4. Zeus Technology Limited, http://www.zeus.co.uk.
5. Peter Druschel, Vivek S. Pai, Willy Zwaenepoel, Flash: An Efficient and Portable Web Server. In Proceedings of the USENIX 1999 Annual Technical Conference, Monterey, CA, June 1999.
6. GNU Portable Threads, http://www.gnu.org/software/pth/.
7. Netscape Portable Runtime, http://www.mozilla.org/docs/refList/refNSPR/.

Other resources covering various architectural issues in IAs

Dan Kegel, The C10K problem, http://www.kegel.com/c10k.html.
James C. Hu, Douglas C. Schmidt, Irfan Pyarali, JAWS: Understanding High Performance Web Systems, http://www.cs.wustl.edu/~jxh/research/research.html.

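Network Architecture Library: StateThreads

Introduction

StateThreads is a C library for developing network programs. It provides a foundation for writing high-performance, highly concurrent, and readable network programs on UNIX-like platforms. It combines the simplicity of multithreaded programming, in which one thread serves each connection while a single process supports many concurrent connections, with the performance and concurrency of an event-driven state machine architecture.

(Translator's note: it offers the high performance, high concurrency, and stability of EDSM together with a simple "multithreaded" programming style. It uses setjmp and longjmp to simulate multiple threads inside a single thread, that is, user-space threads, similar to today's coroutines and fibers.)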

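1. Definitions

1.1 Internet Applications

An Internet Application (IA) is a network client or server program that accepts connections from clients and may also need to connect to other servers. In an IA, the arrival and departure of data usually drive the control flow; that is, an IA is a data-driven program. For each connection, an IA does a finite amount of work, including exchanging data with its peer, which may be a client or a server. The typical transaction steps of an IA are: accept a connection, read a request, do a finite amount of work to process the request, and write the response to the peer. A Web server is one example of an IA; a proxy server is the most general example, because it both accepts client connections and connects to other servers.

We assume that the performance of an IA is determined by the CPU rather than by network bandwidth or disk I/O; that is, the CPU is the system bottleneck.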

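1.2 Performance and Scalability

The performance of an IA is generally evaluated by its throughput, measured in transactions per second or bytes per second (one can be converted to the other given the average transaction size). Several benchmarks measure the throughput of Web applications under specific workloads, such as SPECweb96, WebStone, and WebBench. Although scalability has no common definition, in general it refers to a system's ability to sustain its performance when external conditions change. For IAs, the external condition is either the number of connections (concurrency) or the underlying hardware (number of CPUs, memory size, and so on). There are therefore two kinds of scalability: load scalability and system scalability.

(Translator's note: scalability means whether the system still runs efficiently when conditions change. Load scalability asks whether the system can carry the load as concurrency (the condition) grows; system scalability asks whether the system can make efficient use of additional CPUs and similar resources to reach higher capacity.)

The figure below shows how throughput changes as the number of clients grows; the solid blue line is the ideal case. Initially throughput grows linearly, and in this range the system and CPUs are partly idle. As connections keep increasing the system begins to saturate, and throughput approaches its ceiling (the throughput reached when the CPUs are fully utilized). Beyond that point throughput stays flat, because CPU capacity has reached its limit. In practice, every connection consumes computational and memory resources even when idle, and this overhead grows with the number of connections. The throughput of a real IA therefore starts to fall after some point (dashed blue line). The rate at which it falls depends, among other things, on the application's design.

We say a system has good load scalability if it still works well under high load. The SPECweb99 benchmark reflects load scalability fairly well, because it measures the maximum number of connections the system can support while each connection receives a required minimum throughput (translator's note: the point marked Capacity in the figure, where the gray sloped line crosses the blue line). This is unlike SPECweb96 and other benchmarks, which use system throughput as the metric (translator's note: Max throughput in the figure, the ceiling of the blue line).

System scalability describes how an application's performance behaves as hardware units such as CPUs are added. In other words, good system scalability means that doubling the number of CPUs roughly doubles the throughput (dashed green line in the figure). We assume that the underlying operating system also scales well. Good system scalability means that an application that runs well on a small machine can also achieve high performance when it needs to move to a larger server. That is, the application does not need to be rewritten or ported with great effort when the server environment changes.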

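(Translator's note:

The vertical axis is throughput; the horizontal axis is the number of connections.

The gray line (min acceptable throughput per client) is the throughput each client requires; service is smooth only at or above this level.

The blue line is the ideal server: system capacity is never a problem and it reaches the maximum throughput, the throughput at which the CPUs are fully utilized.

The dashed blue line is the real server: every connection consumes CPU and memory, so throughput starts to fall past some critical point, and where that point lies depends on the system architecture. A good architecture pushes the critical point further out and stably supports higher concurrency; with a poor architecture the system may freeze as concurrency grows.

The dashed gray lines mark the two benchmark measures: SPECweb96 measures the system's maximum throughput, while SPECweb99 measures the maximum number of connections the system can reach when each connection gets the minimum required throughput. The latter reflects load scalability better, because it measures the system's capacity under varying numbers of connections.

Load scalability refers to the maximum load the system can sustain: the value on the horizontal axis where the blue and gray lines cross, or where the blue line starts to fall.

System scalability refers to whether throughput also increases when server capacity is added, for example more CPUs; it is the green line in the figure. With good system scalability, performance rises as CPUs are added; with poor system scalability, adding CPUs brings no gain.

)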

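Although performance and scalability matter more for servers, some clients must also consider them, for example benchmark load generators.

1.3 Concurrency

Concurrency reflects a system's parallelism. It is divided into virtual concurrency and real (physical) concurrency:

Virtual concurrency is the number of simultaneous connections that a system supports.

Real concurrency is the number of hardware devices, such as CPUs, network cards, and disks, that allow the system to perform tasks in parallel.

An IA must provide virtual concurrency to serve many users at once. For maximum performance, the number of kernel-scheduled programming entities that an IA creates should stay close to (within an order of magnitude of) the real concurrency (translator's note: use about as many processes as there are CPUs). These kernel-scheduled entities are called kernel execution vehicles; examples include Solaris lightweight processes and IRIX kernel threads. In other words, the number of kernel execution vehicles should be dictated by the hardware, not by the number of connections (translator's note: the number of processes should be determined by the CPUs, not by the connection count).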

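2. Existing Architectures

Several architectures are commonly used for IAs, including the Multi-Process, Multi-Threaded, and Event-Driven State Machine architectures.

2.1 Multi-Process Architecture: MP

(Translator's note: Multi-Process literally means "multiple processes," but the event-driven state machine (EDSM) architecture also commonly uses multiple processes. To tell them apart, the Chinese translation calls this the "process-based architecture," meaning one process per connection.)

In the Multi-Process (MP) architecture, a separate process serves each connection. A process performs everything from initialization to servicing the connection, and serves another connection only after it has finished.

User sessions are relatively independent, so no synchronization is needed between the processes that handle different connections. Because each process has its own address space, this architecture is very robust: if the process serving one connection crashes, the other connections are unaffected. However, serving many concurrent connections requires an equal number of processes. Because processes are kernel entities, and in fact the heaviest kind, the kernel must hold at least as many of them as there are connections. On most systems performance degrades sharply once more than a few hundred processes are created, because of the excessive context-switching overhead. In other words, the MP architecture has poor load scalability and cannot support high load (high concurrency).

On the other hand, the MP architecture has very good system scalability (use of system resources, stability, complexity), because the processes share no resources and therefore carry no synchronization overhead.

The Apache Web Server 1.x ([Reference 1]) uses the MP architecture on UNIX systems.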

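2.2 Multi-Threaded Architecture: MT

(Translator's note: Multi-Threaded literally means "multiple threads," but the emphasis is on one thread serving one connection, so the Chinese translation uses "thread-based," which is more accurate.)

In the Multi-Threaded (MT) architecture, multiple independent threads share a single address space. Like a process in the MP architecture, each thread serves one connection to completion before it is used to serve another.

Many modern UNIX operating systems implement a many-to-few model for mapping user-level threads to kernel entities. In this model, an arbitrarily large number of user-level threads is multiplexed onto a smaller number of kernel execution vehicles, also known as virtual processors. When a user-level thread makes a blocking system call, the kernel execution vehicle it is using also blocks in the kernel. If no other kernel execution vehicle is unblocked and other user-level threads are runnable, a new kernel execution vehicle is created automatically, which keeps one blocked thread from blocking all the others.

Because IAs are driven by network I/O, all concurrent connections block at various points. The number of kernel execution vehicles therefore approaches the number of user-level threads, that is, the number of connections. At that point the many-to-few model degenerates into a one-to-one model and, as in the MP architecture, the number of kernel execution vehicles is dictated by concurrency rather than by the number of CPUs. As with MP, this reduces load scalability. Even so, because kernel threads (lightweight processes) use fewer resources and are lighter than processes, the MT architecture has somewhat better load scalability than MP.

In the MT architecture, the kernel threads share one address space, and contention on various locks destroys system scalability. Even if the program carefully avoids locks around its own data to improve performance (a complex task), standard library functions and system calls still lock shared resources. For example, on many platforms the thread-safe memory allocation functions (malloc, free, and so on) use a single global lock. Another example is the per-process file descriptor table, which is shared by the kernel threads and must be protected during system calls such as open and close. In addition, multiprocessor systems must keep caches coherent among CPUs, which severely hurts performance when threads running on different CPUs modify data on the same cache line.

To improve load scalability, some different kinds of MT architecture have emerged: they create several groups of threads, each group serving one task, rather than one thread per connection. For example, one small group of threads accepts client connections, another processes requests, and another sends responses. The main advantage of this architecture is that it decouples concurrency from the number of threads, so an equal number of threads is no longer needed to serve the connections. However, the thread groups must share task queues, which need locks for protection (a typical producer-consumer problem). The extra synchronization overhead leads to poor performance on multiprocessor systems. In other words, this architecture trades system scalability for load scalability (trades performance for high concurrency).

Of course, the nightmares of threaded programming, including data corruption, deadlocks, and race conditions, also make any form of MT architecture difficult to use.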

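2.3 Event-Driven State Machine Architecture: EDSM

In the Event-Driven State Machine (EDSM) architecture, a single process handles many concurrent connections. Comer and Stevens [Reference 2] describe the basics of this architecture. In EDSM, each connection advances by only one data-driven step at a time (translator's note: for example, receive one packet, act once), so many concurrent connections are multiplexed (translator's note: one process is multiplexed across many connections). The process is designed as a state machine: each time it receives an event, it handles the event and moves to the next state.

When idle, the EDSM calls select/poll/epoll to wait for network events. When a particular connection becomes readable or writable, the EDSM invokes the corresponding handler function and then moves on to the next connection. The EDSM architecture uses non-blocking system calls to perform asynchronous network I/O. For details on non-blocking I/O, see Stevens [Reference 3].

To exploit hardware parallelism, multiple identical processes can be created; this is called Symmetric Multi-Process EDSM and is used, for example, by the Zeus Web Server [Reference 4] (translator's note: a commercial high-performance server). To multiplex disk I/O more efficiently, helper processes can be created; this is called Asymmetric Multi-Process EDSM and was proposed for Web servers by Druschel and others [Reference 5].

EDSM is probably the best architecture for IAs. Because concurrent connections are completely decoupled from kernel processes, it has very good load scalability; it needs only a small amount of user-level resources to manage each connection.

Like the MP architecture, Multi-Process EDSM also has very good system scalability (multi-core performance, stability, and so on), because the processes share no resources and so carry no synchronization-lock overhead.

Unfortunately, the EDSM architecture is monolithic rather than based on the concept of threads (translator's note: what the state machine saves is in effect a thread's stack: the position of the last call, from which execution continues next time, just like a thread), so a new EDSM system has to implement its state machine from scratch. In effect, the EDSM architecture simulates multithreading in a very complicated way.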

3. State Threads Library

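The StateThreads library combines the advantages of all the architectures above. Its API offers a thread-like programming style, letting each concurrent connection run in its own "thread," with all of these threads inside a single process. The underlying implementation is similar to the EDSM architecture: the session state of each concurrent connection is kept in a separate memory region.

(Translator's note: what StateThreads provides is the EDSM mechanism, with the state machine replaced by its "threads" (coroutines or fibers). These "threads" are implemented inside one thread of one process but behave like multiple threads. The StateThreads model therefore has the high performance and high concurrency of EDSM while offering the programmability and simple interface of MT, which removes the hand-written state machine that EDSM requires.)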

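3.1 State Changes and Scheduling

Each concurrent session consists of its stack environment (stack pointer, program counter, CPU registers) and its stack. Conceptually, a thread context switch amounts to the process changing its state. No kernel entities other than processes are involved (translator's note: multiple threads are simulated within a single thread). Unlike other general-purpose threading libraries, the StateThreads library is fully deterministic: a thread context switch (process state change) can happen only in a well-known set of functions (at I/O points or at explicit synchronization points). As a result, process-level data does not need to be protected by locks in most cases, because execution is single-threaded. The whole program is free to use static variables and non-reentrant functions, which greatly simplifies programming and debugging while improving performance. This is similar to coroutines, except that no explicit yield is needed: sooner or later a thread calls a blocking I/O function and gives up control. All threads (concurrent connections) have the same priority, so scheduling is non-preemptive, as in the EDSM architecture. Because IAs are data-driven (processing is bounded by the size of network buffers and the rate at which data arrives), scheduling is not time-sliced.

The library's scheduler handles only two types of external events, because only these can be detected by select/poll:

1. I/O events: a file descriptor is ready for reading or writing.

2. Timer events: a specified timeout has expired.

However, other types of events (such as a signal sent to the process) can also be handled by converting them to I/O events. For example, a signal handler can write to a pipe when it receives a signal, turning the signal into an I/O event.

To better exploit hardware parallelism, as in the EDSM architecture, processes can be created in a symmetric or asymmetric arrangement. Process management is not part of the library; it is left to the user.

Several general-purpose threading libraries implement a many-to-one model (many user-level threads on one kernel execution vehicle) using techniques similar to those of StateThreads (non-blocking I/O, an event-driven scheduler, and so on); GNU Portable Threads [Reference 6] is one example. Because they are general-purpose, their goals differ from those of StateThreads. StateThreads is not a general-purpose threading library; it is designed for the small class of IA systems that need high performance, high concurrency, scalability, and readability.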

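3.2 Scalability

State threads are very lightweight user-level threads, so creating and maintaining user connections takes few resources. A system built on StateThreads performs well under high concurrency.

On multi-CPU systems, a program must create multiple processes to exploit hardware parallelism. Using separate processes is the only way to reach the highest system scalability, because duplicating per-process resources is the only way to avoid the overhead of locks and synchronization. Creating UNIX processes naturally duplicates those resources. Again, as in the EDSM architecture, the number of concurrent connections has no connection to the number of kernel entities (processes or threads). In other words, the StateThreads library multiplexes a large number of concurrent connections onto a small number of separate processes, and so achieves both good system scalability and good load scalability.

3.3 Performance

High performance is one of the main goals of the StateThreads library. It is implemented to minimize the number of system calls and to make thread creation and switching as fast as possible. For example, there is no per-thread signal mask (unlike POSIX threads), so a thread switch does not need to save and restore the process's signal mask, which removes two system calls per switch. Signal events can be converted efficiently into I/O events (as described above).

3.4 Portability

The StateThreads library uses the same underlying concepts as the EDSM architecture, including non-blocking I/O, file descriptors, and I/O multiplexing. These concepts are available on most UNIX platforms, so the library is highly portable across UNIX systems; only a few parts are platform-dependent.

3.5 State Threads and NSPR

The StateThreads library is derived from the Netscape Portable Runtime library (NSPR) [Reference 7]. The main goal of NSPR is to provide a platform-independent layer of system facilities, including threads, thread synchronization, and I/O. Performance and scalability are not NSPR's main concerns. StateThreads addresses performance and scalability while being much smaller than NSPR: it consists of only 8 source files, yet provides the functionality needed to write efficient IAs on UNIX: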

                                        NSPR        State Threads
Lines of code                           ~150,000    ~3,000
Dynamic library size (debug version):
  IRIX                                  ~700 KB     ~60 KB
  Linux                                 ~900 KB     ~70 KB

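Conclusion

StateThreads is a foundation library for writing IAs. It has the following advantages:

1. It enables the design of efficient IA systems with good load scalability and system scalability.

2. It simplifies programming and debugging: as a rule no synchronization locks are needed, and static variables and non-reentrant functions can be used.

Its main limitation:

1. All socket I/O must go through the library's I/O functions, because only those functions let the scheduler run and keep the process from blocking (translator's note: if you call the operating system's socket I/O functions directly, the scheduler naturally has no control over them).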

译注：

用StateThread写了几个程序。

开启10K和30K个线程的程序：


  
    
    
    
    
     
     
     
     
    
      
      
      
      
     
       
       
       
       
    
      
      
      
      
    
      
      
      
      
     
       
       
       
       
      
        
        
        
        #include 
     
       
       
       
       
    
      
      
      
      
     
     
     
     
    
      
      
      
      
     
       
       
       
       
    
      
      
      
      
    
      
      
      
      
     
       
       
       
        
     
       
       
       
       
    
      
      
      
      
     
     
     
     
    
      
      
      
      
     
       
       
       
       
    
      
      
      
      
    
      
      
      
      
     
       
       
       
       
      
        
        
        
/*
build and execute:
    gcc -I../obj -g huge_threads.c ../obj/libst.a -o huge_threads
    ./huge_threads 10000
10K report:
    10000 threads, running on a 1 CPU, 512M machine:
    CPU 6%, MEM 8.2% (~42M = 42991K = 4.3K/thread)
30K report:
    30000 threads, running on a 1 CPU, 512M machine:
    CPU 3%, MEM 24.3% (4.3K/thread)
*/
#include <stdio.h>
#include <stdlib.h>   /* atoi */

#include <st.h>       /* State Threads API, found via -I../obj */

/* Thread body: thread #n prints a line and then sleeps n * 10 ms, forever. */
void* do_calc(void* arg){
    int sleep_ms = (int)(long)arg * 10;

    for(;;){
        printf("in sthread #%dms\n", sleep_ms);
        st_usleep((st_utime_t)sleep_ms * 1000);  /* st_usleep() takes microseconds */
    }

    return NULL;  /* not reached */
}

int main(int argc, char** argv){
    if(argc <= 1){
        printf("Test the concurrency of state-threads!\n"
            "Usage: %s <thread_count>\n"
            "eg. %s 10000\n", argv[0], argv[0]);
        return -1;
    }

    if(st_init() < 0){
        printf("error!\n");
        return -1;
    }

    int i;
    int count = atoi(argv[1]);
    for(i = 1; i <= count; i++){
        /* Pass the thread index through the void* argument; non-joinable, default stack size. */
        if(st_thread_create(do_calc, (void*)(long)i, 0, 0) == NULL){
            printf("error!\n");
            return -1;
        }
    }

    /* Exit only the primordial thread; the created threads keep running. */
    st_thread_exit(NULL);

    return 0;
}
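
The per-thread memory figures in the report comment of the listing above can be checked with a little arithmetic. The sketch below is not part of the original program. The helper names `per_thread_kb` and `mem_kb_from_percent` are hypothetical, and it assumes the MEM percentage is taken against the full 512M of RAM.

```c
#include <assert.h>
#include <stdio.h>

/* Average memory per thread in KB: total memory (KB) divided by thread count. */
double per_thread_kb(double total_kb, long threads) {
    assert(threads > 0);
    return total_kb / (double)threads;
}

/* Memory in KB implied by a MEM percentage on a machine with ram_mb of RAM.
 * Assumption: the percentage is relative to the whole 512M. */
double mem_kb_from_percent(double percent, double ram_mb) {
    return percent / 100.0 * ram_mb * 1024.0;
}

/* Print the per-thread figure for one line of the report. */
void print_report(const char *label, double total_kb, long threads) {
    printf("%s: %.1fK/thread\n", label, per_thread_kb(total_kb, threads));
}
```

Calling `print_report("10K", 42991.0, 10000)` reproduces the 4.3K/thread figure of the 10K run. For the 30K run, `mem_kb_from_percent(24.3, 512.0)` gives roughly 127,400K, which works out to between 4.2K and 4.3K per thread, so memory use grows about linearly with the number of threads.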
