100 open source Big Data architecture papers for data professionals

Reposted from: https://www.linkedin.com/pulse/100-open-source-big-data-architecture-papers-anil-madan

Big Data technology has been extremely disruptive, with open source playing a dominant role in shaping its evolution. That disruption has also produced a complex ecosystem in which new frameworks, libraries and tools are released almost every day, leaving technologists struggling to keep up with the deluge.

If you are a Big Data enthusiast or a technologist ramping up (or scratching your head), it is important to spend serious time understanding the architecture of key systems in depth so that you can appreciate their evolution. Understanding the architectural components and their subtleties will also help you choose and apply the right technology for your use case. Over the last few years, some of this literature has helped me become a better-educated data professional. My goal here is not only to share the literature but also to use the opportunity to bring some order to the labyrinth of open source systems.
One caution: most of the literature included is heavily skewed towards deep architecture overviews (in most cases the original research papers) rather than basic introductions. I firmly believe that a deep dive will fundamentally help you understand the nuances, though it will not offer any shortcuts if you just want a quick overview.

Jumping right in…

Key architecture layers

File Systems – Distributed file systems that provide storage, fault tolerance, scalability, reliability and availability.
Data Stores – The evolution of application databases into polyglot storage, with application-specific databases instead of one size fits all. Common types are key-value, document, column and graph stores.
Resource Managers – Provide resource management capabilities and support schedulers for high utilization and throughput.
Coordination – Systems that manage state, distributed coordination, consensus and lock management.
Computational Frameworks – A lot of work is happening at this layer, with highly specialized compute frameworks for streaming, interactive, real-time, batch and iterative graph (BSP) processing. Powering these are complete computation runtimes such as BDAS (Spark) and Flink.
Data Analytics – Analytical (consumption) tools and libraries that support exploratory, descriptive, predictive and statistical analysis as well as machine learning.
Data Integration – Includes not only the orchestration tools for managing pipelines but also metadata management.
Operational Frameworks – Provide scalable frameworks for monitoring and benchmarking.

Architecture Evolution

Modern data architecture is evolving with the goal of reducing latency between data producers and consumers. This is driving real-time, low-latency processing and bridging the traditional batch and interactive layers into hybrid architectures such as Lambda and Kappa.

Lambda – Established architecture for a typical data pipeline.
Kappa – An alternative architecture that moves the processing upstream to the stream layer.
SummingBird – A reference model for bridging the online and traditional processing models.
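To make the Lambda idea concrete, here is a minimal Python sketch that assumes a simple page-view count. The batch layer recomputes a view from the immutable master dataset, the speed layer counts only the events that arrived after the last batch run, and a query merges the two. The function names and the event format are hypothetical. Real deployments use systems such as Hadoop for the batch layer and a stream processor such as Storm for the speed layer.

```python
from collections import Counter

# Hypothetical event format: (page, timestamp) tuples.
Event = tuple[str, int]


def batch_view(master_dataset: list[Event], cutoff: int) -> Counter:
    """Batch layer: recompute counts from scratch over all events up to `cutoff`."""
    return Counter(page for page, ts in master_dataset if ts <= cutoff)


def speed_view(recent_events: list[Event], cutoff: int) -> Counter:
    """Speed layer: incrementally count only the events newer than the last batch run."""
    view: Counter = Counter()
    for page, ts in recent_events:
        if ts > cutoff:
            view[page] += 1
    return view


def query(page: str, batch: Counter, speed: Counter) -> int:
    """Serving layer: merge the complete but stale batch view with the fresh speed view."""
    return batch[page] + speed[page]


events = [("home", 1), ("about", 2), ("home", 3), ("home", 7), ("about", 8)]
cutoff = 5  # assume the last batch job covered timestamps <= 5
b = batch_view(events, cutoff)
s = speed_view(events, cutoff)
print(query("home", b, s))   # 3
print(query("about", b, s))  # 2
```

A Kappa architecture drops `batch_view` entirely. When the logic changes, the same streaming code is re-run over the retained event log.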

Before you dive into the individual layers, here are some general documents that provide great background on NoSQL, warehouse-scale computing and distributed systems.

Data center as a computer – provides great background on warehouse-scale computing.
NOSQL Data Stores – background on a diverse set of key-value, document and column-oriented stores.
NoSQL Thesis – great background on distributed systems and first-generation NoSQL systems.
Large Scale Data Management – covers the data model, the system architecture and the consistency model, ranging from traditional database vendors to new, emerging internet-based enterprises.
Eventual Consistency – background on the different consistency models for distributed systems.
CAP Theorem – a nice background on CAP and its evolution.
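The consistency trade-offs these papers discuss are often made concrete with Dynamo-style quorums. With N replicas, a write waits for W acknowledgements and a read consults R replicas. If R + W > N, every read quorum overlaps every write quorum, so a read sees at least one copy of the latest write. The brute-force check below is a small illustrative sketch (the function name is hypothetical). It verifies that overlap property by enumerating every possible quorum.

```python
from itertools import combinations


def quorums_always_overlap(n: int, r: int, w: int) -> bool:
    """Return True if every read quorum of size r intersects every write quorum of size w."""
    replicas = range(n)
    for write_set in combinations(replicas, w):
        for read_set in combinations(replicas, r):
            if not set(write_set) & set(read_set):
                return False  # a read could miss every replica holding the latest write
    return True


print(quorums_always_overlap(3, 2, 2))  # True:  2 + 2 > 3
print(quorums_always_overlap(3, 1, 1))  # False: 1 + 1 <= 3, so reads may return stale data
```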

In the past there was also a fierce debate between the traditional parallel DBMS and MapReduce paradigms of processing. The pro-parallel-DBMS papers were rebutted by a pro-MapReduce one. Ironically, the Hadoop community has since come full circle with the introduction of MPI-style, shared-nothing processing on Hadoop: SQL on Hadoop.

File Systems

As the focus shifts to low-latency processing, there is a move from traditional disk-based file systems towards in-memory file systems, which drastically reduce I/O and disk serialization costs. Tachyon and Spark RDDs are examples of that evolution.

Google File System – The seminal work on distributed file systems, which shaped the Hadoop File System.
Hadoop File System – Historical context and architecture of the evolution of HDFS.
Ceph File System – An alternative to HDFS.
Tachyon – An in-memory storage system for modern low-latency data processing.

File systems have also evolved in their file formats and compression techniques. The following references give great background on the merits of row and column formats and on the shift towards newer nested, column-oriented formats, which are highly efficient for Big Data processing. Erasure codes use innovative techniques to reduce the storage overhead of triplication (three replicas) without compromising data recoverability and availability.

Column Oriented vs Row-Stores – good overview of data layout, compression and materialization.
RCFile – hybrid PAX structure that takes the best of both column- and row-oriented stores.
Parquet – column-oriented format first covered in Google's Dremel paper.
ORCFile – an improved column-oriented format used by Hive.
Compression – compression techniques and their comparison in the Hadoop ecosystem.
Erasure Codes – background on erasure codes and techniques; an improvement on Hadoop's default triplication that reduces storage cost.
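As a rough illustration of why erasure codes cut storage cost, the sketch below uses the simplest possible code: a single XOR parity block over k data blocks, which survives the loss of any one block. This code is an assumption chosen for clarity. HDFS erasure coding and the referenced work use Reed-Solomon-style codes that tolerate multiple failures. Triplication stores 3 bytes for every byte of data, while k data blocks plus one parity block store (k + 1)/k bytes per byte.

```python
from functools import reduce
from typing import Optional


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks: list[bytes]) -> list[bytes]:
    """Append one parity block: parity = d0 ^ d1 ^ ... ^ d(k-1)."""
    return data_blocks + [xor_blocks(data_blocks)]


def recover(stripe: list[Optional[bytes]]) -> list[bytes]:
    """Rebuild at most one missing block (marked None) from the surviving blocks."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError("a single XOR parity block can only repair one loss")
    stripe = list(stripe)  # copy so the caller's list is not modified
    if missing:
        survivors = [block for block in stripe if block is not None]
        stripe[missing[0]] = xor_blocks(survivors)
    return stripe


data = [b"AAAA", b"BBBB", b"CCCC"]   # k = 3 data blocks
stripe = encode(data)
stripe[1] = None                     # simulate losing one block, e.g. a failed DataNode
print(recover(stripe)[:3] == data)   # True
k = len(data)
print(f"bytes stored per data byte: XOR parity {(k + 1) / k:.2f}, triplication 3.00")
```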

Data Stores

Broadly, distributed data stores are classified as ACID or BASE stores, corresponding to the strong and weak ends of the consistency continuum respectively. BASE stores are further classified into key-value, document, column and graph stores, depending on the underlying schema and the supported data structures. While there are a multitude of systems and offerings in this space, I have covered a few of the more prominent ones. I apologize if I have missed a significant one.

BASE

Key Value Stores
Dynamo – key-value distributed storage system.
Cassandra – inspired by Dynamo; a multi-dimensional key-value/column-oriented data store.
Voldemort – another store inspired by Dynamo, developed at LinkedIn.

Column Oriented Stores
BigTable – seminal paper from Google on distributed column-oriented data stores.
HBase – while there is no definitive paper, this provides a good overview of the technology.
Hypertable – provides a good overview of the architecture.

Document Oriented Stores
CouchDB – a popular document-oriented data store.
MongoDB – a good introduction to MongoDB architecture.

Graph
Neo4j – the most popular graph database.
Titan – open source graph database under the Apache license.
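Dynamo, and the Cassandra and Voldemort stores inspired by it, partition keys across nodes with consistent hashing. Below is a minimal sketch that assumes MD5 as the hash function and a fixed number of virtual nodes per physical node. Both are illustrative choices, not values taken from the papers. Each key is stored on the first virtual node clockwise from its hash, so adding a node only moves the keys that fall into the new node's ranges.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a position on the ring (MD5 chosen here only for illustration)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class ConsistentHashRing:
    def __init__(self, nodes: list[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._ring: list[tuple[int, str]] = []  # sorted (position, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each physical node owns several virtual positions to smooth out the load.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first virtual node at or after the key's position.
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.add_node("node-d")
moved = sum(before[k] != ring.node_for(k) for k in keys)
print(f"{moved} of {len(keys)} keys moved")  # only keys now owned by node-d move
```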

ACID
I see a lot of evolution happening in the open source community as it tries to catch up with what Google has done. Three of the prominent papers below are from Google, which has solved the globally distributed, consistent data store problem.

Megastore – a highly available, distributed, consistent database that uses Bigtable as its storage subsystem.
Spanner – a globally distributed, synchronously replicated, linearizable database that supports SQL access.
MESA – provides consistency, high availability, reliability, fault tolerance and scalability for large data and query volumes.
CockroachDB – an open source version of Spanner (led by former Google engineers) in active development.

Resource Managers

While the first generation of the Hadoop ecosystem started with monolithic schedulers like YARN, the evolution is now towards hierarchical schedulers (Mesos) that can manage distinct kinds of compute workloads to achieve higher utilization and efficiency.
YARN – the next-generation Hadoop compute framework.
Mesos – scheduling between multiple diverse cluster computing frameworks.
These are loosely coupled with schedulers, whose primary function is to schedule jobs based on scheduling policies and configuration.

Schedulers
Capacity Scheduler – introduction to the different features of the capacity scheduler.
FairShare Scheduler – introduction to the different features of the fair scheduler.
Delayed Scheduling – introduction to delay scheduling for the FairShare scheduler.
Fair & Capacity schedulers – a survey of Hadoop schedulers.
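Delay scheduling, listed above, trades a little waiting for data locality. If the job that should run next (by fair share) has no task whose input lives on the free node, that job is skipped. After a bounded number of skipped scheduling opportunities, D, it is allowed to launch a non-local task. Below is a simplified sketch of that loop. The `Job` and `Task` classes, the `assign` function and the D value are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    name: str
    preferred_nodes: set[str]  # nodes holding a replica of this task's input


@dataclass
class Job:
    name: str
    pending: list[Task]
    running: int = 0
    skip_count: int = 0


def assign(node: str, jobs: list[Job], max_skips: int) -> Optional[tuple[str, str, bool]]:
    """Choose a task for a free slot on `node`; returns (job, task, is_local) or None."""
    # Fair-share order: the job with the fewest running tasks is offered the slot first.
    for job in sorted(jobs, key=lambda j: j.running):
        if not job.pending:
            continue
        local = next((t for t in job.pending if node in t.preferred_nodes), None)
        if local is not None:
            job.pending.remove(local)
            job.running += 1
            job.skip_count = 0  # a local launch resets the wait
            return job.name, local.name, True
        if job.skip_count >= max_skips:
            task = job.pending.pop(0)  # waited long enough: accept a non-local task
            job.running += 1
            return job.name, task.name, False
        job.skip_count += 1  # skip this job and offer the slot to the next one
    return None


jobs = [
    Job("etl", [Task("etl-0", {"n2"})]),
    Job("adhoc", [Task("adhoc-0", {"n1"})], running=1),
]
print(assign("n1", jobs, max_skips=1))  # ('adhoc', 'adhoc-0', True): etl is skipped once
print(assign("n1", jobs, max_skips=1))  # ('etl', 'etl-0', False): etl has waited long enough
```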

Coordination

These are systems used for coordination and state management across distributed data systems.
Paxos – a simple version of the classical paper; used for consensus and coordination in distributed systems.
Chubby – Google's distributed lock service, which implements Paxos.
Zookeeper – an open source coordination service inspired by Chubby, though it is a general coordination service rather than simply a lock service.
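The core of single-decree Paxos fits in a few lines. The sketch below is a simplified, in-process model with no networking, failure recovery or leader election, and its class and function names are hypothetical. A value is chosen once a majority of acceptors accept it. Phase 1 then forces any later proposer that reaches a majority to re-propose that same value.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Acceptor:
    promised_n: int = 0        # highest proposal number promised so far
    accepted_n: int = 0        # number of the last accepted proposal (0 = none)
    accepted_value: Any = None

    def prepare(self, n: int) -> Optional[tuple[int, Any]]:
        """Phase 1b: promise to ignore proposals numbered below n."""
        if n > self.promised_n:
            self.promised_n = n
            return self.accepted_n, self.accepted_value
        return None  # reject: already promised an equal or higher number

    def accept(self, n: int, value: Any) -> bool:
        """Phase 2b: accept unless a higher-numbered prepare has been promised."""
        if n >= self.promised_n:
            self.promised_n = n
            self.accepted_n = n
            self.accepted_value = value
            return True
        return False


def propose(n: int, value: Any, acceptors: list[Acceptor], total: int) -> Optional[Any]:
    """Run one round as proposer; `acceptors` are the reachable ones out of `total`."""
    majority = total // 2 + 1
    # Phase 1a: ask the reachable acceptors for promises.
    promises = [p for p in (a.prepare(n) for a in acceptors) if p is not None]
    if len(promises) < majority:
        return None
    # If any acceptor already accepted a value, re-propose the highest-numbered one.
    highest_n, highest_value = max(promises, key=lambda p: p[0])
    if highest_n > 0:
        value = highest_value
    # Phase 2a: the value is chosen once a majority accepts it.
    accepted = sum(a.accept(n, value) for a in acceptors)
    return value if accepted >= majority else None


acceptors = [Acceptor() for _ in range(5)]
print(propose(1, "A", acceptors[:3], total=5))  # A
print(propose(2, "B", acceptors[2:], total=5))  # A: acceptor 2 reports it during phase 1
```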

Computational Frameworks

The execution runtimes provide an environment for running distinct kinds of compute. The most common runtimes are:

Spark – its popularity and adoption are challenging the traditional Hadoop ecosystem.
Flink – very similar to the Spark ecosystem; its strength over Spark is in iterative processing.
The frameworks can broadly be classified by their processing model and latency:
Batch
MapReduce – the seminal paper from Google on MapReduce.
MapReduce Survey – a dated yet good survey of MapReduce frameworks.
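The programming model from the MapReduce paper is usually illustrated with its canonical word-count example. The single-process Python sketch below only mimics the map, shuffle and reduce phases. It leaves out the distribution, partitioning and fault tolerance that the paper is really about.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_fn(document: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in the input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group intermediate values by key (done by the framework across machines)."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_fn(word: str, counts: list[int]) -> tuple[str, int]:
    """Reduce: combine all values emitted for one key."""
    return word, sum(counts)


splits = ["the quick brown fox", "the lazy dog", "The end"]
intermediate = chain.from_iterable(map_fn(s) for s in splits)
result = dict(reduce_fn(w, c) for w, c in shuffle(intermediate).items())
print(result["the"])  # 3
```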

Iterative (BSP)
Pregel – Google's paper on large-scale graph processing.
Giraph – large-scale distributed graph processing system modelled on Pregel.
GraphX – graph computation framework that unifies graph-parallel and data-parallel computation.
Hama – general BSP computing engine on top of Hadoop.
Open source graph processing – a survey of open source systems modelled on Pregel's BSP model.
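The vertex-centric BSP model is easiest to see through the maximum-value example from the Pregel paper. The sketch below runs supersteps in a single process. It assumes a simplified halting rule: every vertex halts after computing and is re-activated only by incoming messages. The function name and the graph are illustrative.

```python
def pregel_max(values: dict[str, int], edges: dict[str, list[str]]) -> tuple[dict[str, int], int]:
    """Propagate the maximum value through a graph in BSP supersteps.

    Returns the final vertex values and the number of supersteps executed.
    """
    values = dict(values)
    inbox: dict[str, list[int]] = {v: [] for v in values}
    active = set(values)  # every vertex starts active
    superstep = 0
    while active:
        outbox: dict[str, list[int]] = {v: [] for v in values}
        for vertex in active:
            # compute(): superstep 0 always broadcasts; later supersteps only if the value grew.
            new_value = max([values[vertex], *inbox[vertex]])
            if superstep == 0 or new_value > values[vertex]:
                values[vertex] = new_value
                for neighbor in edges.get(vertex, []):
                    outbox[neighbor].append(new_value)
        # Barrier: messages become visible only in the next superstep,
        # and only vertices that received a message are active again.
        inbox = outbox
        active = {v for v, msgs in inbox.items() if msgs}
        superstep += 1
    return values, superstep


graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
vals, steps = pregel_max({"a": 3, "b": 6, "c": 2, "d": 1}, graph)
print(vals)  # {'a': 6, 'b': 6, 'c': 6, 'd': 6}
```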

Streaming
Stream Processing – a great overview of the distinct real-time processing systems.
Storm – real-time big data processing system.
Samza – stream processing framework from LinkedIn.
Spark Streaming – introduced the micro-batch architecture, bridging traditional batch and interactive processing.
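The micro-batch idea behind Spark Streaming can be sketched in a few lines. Incoming records are cut into small, fixed time intervals, and each interval is processed as an ordinary batch job. This is a single-process illustration with hypothetical function names, not the Spark Streaming API.

```python
from collections import Counter
from itertools import groupby
from typing import Iterator


def micro_batches(events: list[tuple[float, str]], interval: float) -> Iterator[tuple[float, list[str]]]:
    """Yield (batch_start, records) for consecutive fixed-size time intervals.

    Assumes `events` are (timestamp, value) pairs already sorted by timestamp.
    """
    for batch_id, group in groupby(events, key=lambda e: int(e[0] // interval)):
        yield batch_id * interval, [value for _, value in group]


def process_batch(records: list[str]) -> Counter:
    """The same batch-style computation is applied to every micro-batch."""
    return Counter(records)


stream = [(0.1, "click"), (0.4, "view"), (0.9, "click"), (1.2, "click"), (2.7, "view")]
for start, records in micro_batches(stream, interval=1.0):
    print(start, dict(process_batch(records)))
# 0.0 {'click': 2, 'view': 1}
# 1.0 {'click': 1}
# 2.0 {'view': 1}
```

Unlike Spark Streaming, this sketch skips intervals that contain no events instead of emitting empty batches.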

Interactive
Dremel – Google's paper on how it processes interactive big data workloads, which laid the groundwork for multiple open source SQL systems on Hadoop.
Impala – MPI-style processing to make Hadoop performant for interactive workloads.
Drill – an open source implementation of Dremel.
Shark – provides a good introduction to the data analysis capabilities of the Spark ecosystem.
Shark – another great paper, which goes deeper into SQL access.
Dryad – configuring and executing parallel data pipelines using a DAG.
Tez – open source implementation of Dryad using YARN.
BlinkDB – enables interactive queries over data samples and presents results annotated with meaningful error bars.
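BlinkDB's core idea is to answer an aggregate from a sample and report an error bar with it. That idea can be approximated with a uniform random sample and a normal-approximation confidence interval. This is a deliberate simplification: BlinkDB itself builds stratified samples offline and chooses among them at query time, and this sketch ignores the finite-population correction. The function name and the synthetic data are illustrative.

```python
import math
import random
import statistics


def approximate_avg(column: list[float], sample_fraction: float, seed: int = 0) -> tuple[float, float]:
    """Estimate AVG(column) from a uniform random sample.

    Returns (estimate, half_width) for an approximate 95% confidence interval
    based on the normal approximation 1.96 * s / sqrt(n).
    """
    rng = random.Random(seed)
    n = max(2, int(len(column) * sample_fraction))  # stdev needs at least 2 points
    sample = rng.sample(column, n)
    estimate = statistics.fmean(sample)
    half_width = 1.96 * statistics.stdev(sample) / math.sqrt(n)
    return estimate, half_width


# Synthetic data standing in for a large table column (e.g. request latencies in ms).
data_rng = random.Random(42)
latencies = [data_rng.expovariate(1 / 120) for _ in range(200_000)]

est, err = approximate_avg(latencies, sample_fraction=0.01)
print(f"AVG ~ {est:.1f} +/- {err:.1f} ms (exact: {statistics.fmean(latencies):.1f} ms)")
```

Here the query scans 1% of the rows. Raising `sample_fraction` narrows the interval roughly in proportion to the square root of the sample size.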

RealTime
Druid – a real-time OLAP data store for operationalized time-series analytics.
Pinot – LinkedIn's OLAP data store, very similar to Druid.

Data Analysis

The analysis tools range from declarative languages like SQL to procedural languages like Pig. Libraries, on the other hand, provide out-of-the-box implementations of the most common data mining and machine learning algorithms.

Tools
Pig – provides a good overview of Pig Latin.
Pig – provides an introduction to building data pipelines using Pig.
Hive – provides an introduction to Hive.
Hive – another good paper for understanding the motivations behind Hive at Facebook.
Phoenix – SQL on HBase.
Join Algorithms for Map Reduce – provides a great introduction to different join algorithms on Hadoop.
Join Algorithms for Map Reduce – another great paper on the different join techniques.
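The repartition (reduce-side) join is one of the standard MapReduce join techniques. Mappers tag each row with its source relation and emit it under the join key. Each reducer then combines the rows from both relations that share a key. In the sketch below, the relation names, the tables and the in-memory shuffle are illustrative assumptions.

```python
from collections import defaultdict
from typing import Iterator


def map_tagged(relation: str, rows: list[tuple], key_index: int) -> Iterator[tuple]:
    """Map: emit (join_key, (relation_tag, row)) so matching rows meet at one reducer."""
    for row in rows:
        yield row[key_index], (relation, row)


def reduce_join(key, tagged_rows: list[tuple], left_tag: str, right_tag: str) -> Iterator[tuple]:
    """Reduce: buffer rows per relation, then emit their cross product for this key."""
    left = [row for tag, row in tagged_rows if tag == left_tag]
    right = [row for tag, row in tagged_rows if tag == right_tag]
    for l_row in left:
        for r_row in right:
            yield l_row + r_row


users = [(1, "alice"), (2, "bob")]
orders = [(101, 1, 9.99), (102, 1, 5.00), (103, 3, 1.25)]

groups = defaultdict(list)  # stands in for the framework's shuffle-and-sort
for key, value in list(map_tagged("users", users, 0)) + list(map_tagged("orders", orders, 1)):
    groups[key].append(value)

joined = [row for key in sorted(groups) for row in reduce_join(key, groups[key], "users", "orders")]
print(joined)
# [(1, 'alice', 101, 1, 9.99), (1, 'alice', 102, 1, 5.0)]
```

Every row of both relations crosses the network in the shuffle. This is why broadcast (map-side) joins are often preferred when one relation is small.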

Libraries
MLlib – machine learning framework on Spark.
SparkR – distributed R on the Spark framework.
Mahout – machine learning framework on traditional MapReduce.

Data Integration

Data integration frameworks provide good mechanisms to ingest data into and export data out of Big Data systems. They range from orchestration pipelines to metadata frameworks with support for lifecycle management and governance.

Ingest/Messaging
Flume – a framework for collecting, aggregating and moving large amounts of log data from many different sources to a centralized data store.
Sqoop – a tool to move data between Hadoop and relational data stores.
Kafka – distributed messaging system for data processing.

ETL/Workflow
Crunch – library for writing, testing and running MapReduce pipelines.
Falcon – data management framework that helps automate the movement and processing of Big Data.
Cascading – data manipulation through scripting.
Oozie – a workflow scheduler system to manage Hadoop jobs.

Metadata
HCatalog – a table and storage management layer for Hadoop.

Serialization
ProtocolBuffers – language-neutral serialization format popularized by Google.
Avro – modeled around Protocol Buffers for the Hadoop ecosystem.
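Both formats save space by encoding integers compactly. As an illustration, here is a sketch of the base-128 varint encoding used in the Protocol Buffers wire format. Each byte carries 7 bits of the value, least significant group first, and the high bit marks whether another byte follows. The function names are illustrative; in practice, use the official protobuf libraries. Avro's binary encoding applies the same variable-length idea to `int` and `long`, combined with zig-zag encoding for negative numbers.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative int as a base-128 varint (least significant group first)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F          # low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # set the continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes) -> tuple[int, int]:
    """Decode one varint from the start of `data`; returns (value, bytes_consumed)."""
    result = 0
    for i, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return result, i + 1
    raise ValueError("truncated varint")


print(encode_varint(150).hex())  # 9601
print(encode_varint(300).hex())  # ac02
```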

Operational Frameworks

Finally, the operational frameworks provide capabilities for metrics, benchmarking and performance optimization to manage workloads.

Monitoring Frameworks
OpenTSDB – a time-series metrics system built on top of HBase.
Ambari – system for collecting, aggregating and serving Hadoop and system metrics.

Benchmarking
YCSB – performance evaluation of NoSQL systems.
GridMix – provides benchmarks for Hadoop workloads by running a mix of synthetic jobs.
Background on big data benchmarking and the key challenges associated with it.

Summary

I hope these papers are useful as you embark on or strengthen your journey. I am sure there are a few hundred more papers that I might have inadvertently missed, and a whole bunch of systems that I might be unfamiliar with. Apologies in advance; I don't mean to offend anyone, and I am happy to be educated.
