Java Code Geeks co-founder Byron Kiourtzoglou recently published an article dissecting the four Vs of Big Data, from theory to practice, and closing with thirteen mainstream open-source Big Data tools that Java engineers may need.
What is Big Data, you may ask; and more importantly, why is it the latest trend in nearly every business domain? Is it just hype, or is it here to stay?
As a matter of fact, “Big Data” is a pretty straightforward term – it is just what it says – a very large data-set. How large? The exact answer is “as large as you can imagine”!
How can this data-set be so massively big? Because the data may come from everywhere and at enormous rates: RFID sensors that gather traffic data, sensors used to gather weather information, GPRS packets from cell phones, posts to social media sites, digital pictures and videos, online purchase transaction records, you name it! Big Data is an enormous data-set that may contain information from every possible source that produces data we are interested in.
Nevertheless, Big Data is more than simply a matter of size; it is an opportunity to find insights in new and emerging types of data and content, to make businesses more agile, and to answer questions that were previously considered beyond our reach. That is why Big Data is characterized by four main aspects: Volume, Variety, Velocity, and Veracity (Value), known as “the four Vs of Big Data”. Let’s briefly examine what each one of them stands for and what challenges it presents:
Volume
Volume references the amount of content a business must be able to capture, store and access. 90% of the world’s data has been generated in the past two years alone. Organizations today are overwhelmed with volumes of data, easily amassing terabytes—even petabytes—of information of all types, some of which needs to be organized, secured and analyzed.
Variety
80% of the world’s data is semi-structured. Sensors, smart devices and social media are generating this data through Web pages, weblog files, social-media forums, audio, video, click streams, e-mails, documents, sensor systems and so on. Traditional analytics solutions work very well with structured information, for example data in a relational database with a well-formed schema. Variety in data types represents a fundamental shift in the way data must be stored and analyzed to support today’s decision-making and insight process. Thus Variety represents the various types of data that can’t easily be captured and managed in a traditional relational database but can be easily stored and analyzed with Big Data technologies.
Velocity
Velocity requires analyzing data in near real time, aka “sometimes 2 minutes is too late!”. Gaining a competitive edge means identifying a trend or opportunity in minutes or even seconds before your competitor does. Another example is time-sensitive processes such as catching fraud, where information must be analyzed as it streams into your enterprise in order to maximize its value. Time-sensitive data has a very short shelf-life, compelling organizations to analyze it in near real time.
Veracity (Value)
Acting on data is how we create opportunities and derive value. Data is all about supporting decisions, so when you are looking at decisions that can have a major impact on your business, you are going to want as much information as possible to support your case. Nevertheless the volume of data alone does not provide enough trust for decision makers to act upon information. The truthfulness and quality of data is the most important frontier to fuel new insights and ideas. Thus establishing trust in Big Data solutions probably presents the biggest challenge one should overcome to introduce a solid foundation for successful decision making.
While the existing installed base of business intelligence and data warehouse solutions was not engineered to support the four Vs, big data solutions are being developed to address these challenges.
What follows is a brief presentation of the major open-source, Java-based tools available today that support Big Data:
HDFS
HDFS is the primary distributed storage used by Hadoop applications. An HDFS cluster primarily consists of a NameNode, which manages the file system metadata, and DataNodes, which store the actual data. HDFS is specifically designed for storing vast amounts of data, so it is optimized for storing and accessing a relatively small number of very large files, in contrast to traditional file systems, which are optimized to handle large numbers of relatively small files.
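A minimal sketch of what HDFS client code looks like, reading a file through the Java FileSystem API; the path /data/input.txt is hypothetical, and the NameNode address is assumed to come from the standard core-site.xml on the classpath:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadExample {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS (the NameNode address) from core-site.xml.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical HDFS path used purely for illustration.
        Path file = new Path("/data/input.txt");
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(fs.open(file)))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```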
Hadoop MapReduce
Hadoop MapReduce is a software framework for easily writing applications that process vast amounts of data (multi-terabyte data-sets) in parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner.
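The programming model is easiest to see in the canonical word-count example. The mapper below, a sketch along the lines of the official Hadoop tutorial, emits a (word, 1) pair for every token; a companion reducer would then sum the counts that the framework shuffles to it:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Word-count mapper: for each input line, emit (word, 1) per token.
// The framework groups the pairs by word for the reduce phase.
public class TokenizerMapper
        extends Mapper<Object, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);
        }
    }
}
```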
Apache HBase
Apache HBase is the Hadoop database, a distributed, scalable, big data store. It provides random, realtime read/write access to Big Data and is optimized for hosting very large tables (billions of rows by millions of columns) atop clusters of commodity hardware. At its core Apache HBase is a distributed, versioned, column-oriented store modeled after Google’s “Bigtable: A Distributed Storage System for Structured Data” by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.
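A minimal sketch of the random read/write access described above, using the classic HBase Java client; the users table and its info column family are hypothetical and assumed to already exist:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseExample {
    public static void main(String[] args) throws Exception {
        // Reads hbase-site.xml from the classpath for the cluster location.
        Configuration conf = HBaseConfiguration.create();
        // Hypothetical table; it must be created beforehand (e.g. in the shell).
        HTable table = new HTable(conf, "users");

        // Write one cell: row key -> column family:qualifier -> value.
        Put put = new Put(Bytes.toBytes("row-1"));
        put.add(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Alice"));
        table.put(put);

        // Random, realtime read of the same row.
        Result result = table.get(new Get(Bytes.toBytes("row-1")));
        System.out.println(Bytes.toString(
                result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))));

        table.close();
    }
}
```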
Apache Cassandra
Apache Cassandra is a performant, linearly scalable, and highly available database that can run on commodity hardware or cloud infrastructure, making it the perfect platform for mission-critical data. Cassandra’s support for replicating across multiple datacenters is best-in-class, providing lower latency for users and the peace of mind of knowing that you can survive regional outages. Cassandra’s data model offers the convenience of column indexes with the performance of log-structured updates, strong support for denormalization and materialized views, and powerful built-in caching.
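A minimal sketch of talking to Cassandra from Java via CQL, assuming the DataStax Java driver (2.x or later); the contact point, the demo keyspace and its users table are all hypothetical:

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class CassandraExample {
    public static void main(String[] args) {
        // Hypothetical contact point; in production you would list several.
        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")
                .build();
        // "demo" is a hypothetical keyspace containing a users(id, name) table.
        Session session = cluster.connect("demo");

        session.execute("INSERT INTO users (id, name) VALUES (1, 'Alice')");

        ResultSet rs = session.execute("SELECT name FROM users WHERE id = 1");
        for (Row row : rs) {
            System.out.println(row.getString("name"));
        }

        cluster.close();
    }
}
```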
Apache Hive
Apache Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL.
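Since Hive speaks a SQL dialect, it can be queried from Java through a standard JDBC connection; the sketch below assumes a HiveServer2 instance with its JDBC driver on the classpath, and a hypothetical logs table:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryExample {
    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC driver; host, port and database are hypothetical.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "", "");

        Statement stmt = conn.createStatement();
        // Plain HiveQL; Hive compiles this into map/reduce work under the hood.
        ResultSet rs = stmt.executeQuery(
                "SELECT level, COUNT(*) FROM logs GROUP BY level");
        while (rs.next()) {
            System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
        }

        conn.close();
    }
}
```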
Apache Pig
Apache Pig is a platform for analyzing large data sets. It consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets. Pig’s infrastructure layer consists of a compiler that produces sequences of Map-Reduce programs. Pig’s language layer currently consists of a textual language called Pig Latin, which is developed with ease of programming, optimization opportunities and extensibility in mind.
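Pig Latin can also be driven from Java through the PigServer API; this sketch assumes local mode and a hypothetical tab-separated input file of (level, message) log lines:

```java
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class PigExample {
    public static void main(String[] args) throws Exception {
        // LOCAL mode for a quick test; ExecType.MAPREDUCE targets a real cluster.
        PigServer pig = new PigServer(ExecType.LOCAL);

        // Pig Latin statements registered as strings; paths are hypothetical.
        pig.registerQuery("logs = LOAD 'input/logs.txt' AS (level:chararray, msg:chararray);");
        pig.registerQuery("by_level = GROUP logs BY level;");
        pig.registerQuery("counts = FOREACH by_level GENERATE group, COUNT(logs);");

        // Storing an alias triggers compilation and execution of the plan.
        pig.store("counts", "output/level_counts");
    }
}
```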
Apache Chukwa
Apache Chukwa is an open source data collection system for monitoring large distributed systems. It is built on top of the Hadoop Distributed File System (HDFS) and Map/Reduce framework and inherits Hadoop’s scalability and robustness. Chukwa also includes a flexible and powerful toolkit for displaying, monitoring and analyzing results to make the best use of the collected data.
Apache Ambari
Apache Ambari is a web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters, with support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig and Sqoop. Ambari also provides a dashboard for viewing cluster health, such as heat maps, and the ability to view MapReduce, Pig and Hive applications visually, along with features to diagnose their performance characteristics in a user-friendly manner.
Apache ZooKeeper
Apache ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications. In short, Apache ZooKeeper is a high-performance coordination service for distributed applications such as those that run on a Hadoop cluster.
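A minimal sketch of the ZooKeeper Java client: connect, wait for the session to be established, then store and read back a small piece of shared configuration in a znode (the connection string and the /app-config path are hypothetical):

```java
import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZooKeeperExample {
    public static void main(String[] args) throws Exception {
        final CountDownLatch connected = new CountDownLatch(1);
        // Hypothetical connection string; 3000 ms session timeout.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 3000, new Watcher() {
            public void process(WatchedEvent event) {
                if (event.getState() == Event.KeeperState.SyncConnected) {
                    connected.countDown();
                }
            }
        });
        connected.await();

        // Create a znode holding a small piece of shared configuration.
        zk.create("/app-config", "v1".getBytes(),
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        System.out.println(new String(zk.getData("/app-config", false, null)));

        zk.close();
    }
}
```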
Apache Sqoop
Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases: it can import data from a relational database into HDFS, and export data from HDFS back into a relational database.
Apache Oozie
Apache Oozie is a scalable, reliable and extensible workflow scheduler system to manage Apache Hadoop jobs. Oozie Workflow jobs are Directed Acyclic Graphs (DAGs) of actions. Oozie Coordinator jobs are recurrent Oozie Workflow jobs triggered by time (frequency) and data availability. Oozie is integrated with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as Java map-reduce, Streaming map-reduce, Pig, Hive, Sqoop and Distcp) as well as system-specific jobs (such as Java programs and shell scripts).
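Besides its command-line tool, Oozie exposes a Java client API for submitting and tracking jobs; in this sketch the server URL, the workflow application path and the cluster addresses are all hypothetical placeholders:

```java
import java.util.Properties;

import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.WorkflowJob;

public class OozieSubmitExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical Oozie server URL.
        OozieClient client = new OozieClient("http://localhost:11000/oozie");

        Properties conf = client.createConfiguration();
        // Hypothetical HDFS path containing workflow.xml, plus cluster addresses.
        conf.setProperty(OozieClient.APP_PATH,
                "hdfs://localhost:8020/user/demo/wordcount-wf");
        conf.setProperty("nameNode", "hdfs://localhost:8020");
        conf.setProperty("jobTracker", "localhost:8021");

        // Submit and start the workflow, then poll its status once.
        String jobId = client.run(conf);
        WorkflowJob job = client.getJobInfo(jobId);
        System.out.println(jobId + " -> " + job.getStatus());
    }
}
```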
Apache Mahout
Apache Mahout is a scalable machine learning and data mining library. Currently Mahout mainly supports four use cases (a minimal recommender sketch follows the list):
- Recommendation mining: takes users’ behavior and from that tries to find items users might like.
- Clustering: takes e.g. text documents and groups them into groups of topically related documents.
- Classification: learns from existing categorized documents what documents of a specific category look like, and is able to assign unlabeled documents to the (hopefully) correct category.
- Frequent itemset mining: takes a set of item groups (terms in a query session, shopping cart content) and identifies which individual items usually appear together.
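For the recommendation-mining use case, a user-based recommender can be wired up with Mahout’s Taste API as below; ratings.csv, a hypothetical file of userID,itemID,preference rows, stands in for real behavior data:

```java
import java.io.File;
import java.util.List;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class RecommenderExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical CSV of userID,itemID,preference triples.
        DataModel model = new FileDataModel(new File("ratings.csv"));

        // Users are "similar" if their ratings correlate; use the 10 nearest.
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        UserNeighborhood neighborhood =
                new NearestNUserNeighborhood(10, similarity, model);
        Recommender recommender =
                new GenericUserBasedRecommender(model, neighborhood, similarity);

        // Top 3 item recommendations for user 1.
        List<RecommendedItem> items = recommender.recommend(1, 3);
        for (RecommendedItem item : items) {
            System.out.println(item.getItemID() + " " + item.getValue());
        }
    }
}
```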
Apache HCatalog
Apache HCatalog is a table and storage management service for data created using Apache Hadoop. This includes:
- Providing a shared schema and data type mechanism.
- Providing a table abstraction so that users need not be concerned with where or how their data is stored.
- Providing interoperability across data processing tools such as Pig, Map Reduce, and Hive.
That’s it; Big Data, a short theoretical introduction and a compact matrix of implementation approaches focused on overcoming the problems of a new era – the era that forces us to ask bigger questions!
Happy Coding
Byron