陈小哥cw

Big Data Resource Links

Awesome Big Data

A curated list of awesome big data frameworks, resources and other awesomeness. Inspired by awesome-php, awesome-python, awesome-ruby, hadoopecosystemtable & big-data.

Your contributions are always welcome!

Awesome Big Data
- RDBMS
- Frameworks
- Distributed Programming
- Distributed Filesystem
- Key-Map Data Model
- Document Data Model
- Key-value Data Model
- Graph Data Model
- NewSQL Databases
- Columnar Databases
- Time-Series Databases
- SQL-like processing
- Data Ingestion
- Service Programming
- Scheduling
- Machine Learning
- Benchmarking
- Security
- System Deployment
- Applications
- Search engine and framework
- MySQL forks and evolutions
- PostgreSQL forks and evolutions
- Memcached forks and evolutions
- Embedded Databases
- Business Intelligence
- Data Visualization
- Internet of things and sensor data
- Interesting Readings
- Interesting Papers
- Videos
- Books
Other Awesome Lists

RDBMS

MySQL - The world’s most popular open source database.
PostgreSQL - The world’s most advanced open source database.
Oracle Database - object-relational database management system.
Teradata - high-performance MPP data warehouse platform.

Frameworks

Bistro - general-purpose data processing engine for both batch and stream analytics. It is based on a novel data model, which represents data via functions and processes data via column operations as opposed to having only set operations in conventional approaches like MapReduce or SQL.
IBM Streams - platform for distributed processing and real-time analytics. Integrates with many of the popular technologies in the Big Data ecosystem (Kafka, HDFS, Spark, etc.)
Apache Hadoop - framework for distributed processing. Integrates MapReduce (parallel processing), YARN (job scheduling) and HDFS (distributed file system).
Tigon - High Throughput Real-time Stream Processing Framework.
Pachyderm - Pachyderm is a data storage platform built on Docker and Kubernetes to provide reproducible data processing and analysis.
Polyaxon - A platform for reproducible and scalable machine learning and deep learning.

Distributed Programming

AddThis Hydra - distributed data processing and storage system originally developed at AddThis.
AMPLab SIMR - run Spark on Hadoop MapReduce v1.
Apache APEX - a unified, enterprise platform for big data stream and batch processing.
Apache Beam - a unified model and set of language-specific SDKs for defining and executing data processing workflows.
Apache Crunch - a simple Java API for tasks like joining and data aggregation that are tedious to implement on plain MapReduce.
Apache DataFu - collection of user-defined functions for Hadoop and Pig developed by LinkedIn.
Apache Flink - high-performance runtime, and automatic program optimization.
Apache Gearpump - real-time big data streaming engine based on Akka.
Apache Gora - framework for in-memory data model and persistence.
Apache Hama - BSP (Bulk Synchronous Parallel) computing framework.
Apache MapReduce - programming model for processing large data sets with a parallel, distributed algorithm on a cluster.
Apache Pig - high level language to express data analysis programs for Hadoop.
Apache REEF - retainable evaluator execution framework to simplify and unify the lower layers of big data systems.
Apache S4 - framework for stream processing, implementation of S4.
Apache Spark - framework for in-memory cluster computing.
Apache Spark Streaming - framework for stream processing, part of Spark.
Apache Storm - framework for stream processing by Twitter, also runs on YARN.
Apache Samza - stream processing framework, based on Kafka and YARN.
Apache Tez - application framework for executing a complex DAG (directed acyclic graph) of tasks, built on YARN.
Apache Twill - abstraction over YARN that reduces the complexity of developing distributed applications.
Baidu Bigflow - an interface that allows for writing distributed computing programs providing lots of simple, flexible, powerful APIs to easily handle data of any scale.
Cascalog - data processing and querying library.
Cheetah - High Performance, Custom Data Warehouse on Top of MapReduce.
Concurrent Cascading - framework for data management/analytics on Hadoop.
Damballa Parkour - MapReduce library for Clojure.
Datasalt Pangool - alternative MapReduce paradigm.
DataTorrent StrAM - real-time engine designed to enable distributed, asynchronous, real-time in-memory big-data computations in as unblocked a way as possible, with minimal overhead and impact on performance.
Facebook Corona - Hadoop enhancement which removes single point of failure.
Facebook Peregrine - Map Reduce framework.
Facebook Scuba - distributed in-memory datastore.
Google Dataflow - create data pipelines to ingest, transform and analyze data.
Google MapReduce - map reduce framework.
Google MillWheel - fault tolerant stream processing framework.
IBM Streams - platform for distributed processing and real-time analytics. Provides toolkits for advanced analytics like geospatial, time series, etc. out of the box.
JAQL - declarative programming language for working with structured, semi-structured and unstructured data.
Kite - is a set of libraries, tools, examples, and documentation focused on making it easier to build systems on top of the Hadoop ecosystem.
Metamarkets Druid - framework for real-time analysis of large datasets.
Netflix PigPen - map-reduce for Clojure which compiles to Apache Pig.
Nokia Disco - MapReduce framework developed by Nokia.
Onyx - Distributed computation for the cloud.
Pinterest Pinlater - asynchronous job execution system.
Pydoop - Python MapReduce and HDFS API for Hadoop.
Ray - A fast and simple framework for building and running distributed applications.
Rackerlabs Blueflood - multi-tenant distributed metric processing system
Skale - High performance distributed data processing in NodeJS.
Stratosphere - general purpose cluster computing framework.
Streamdrill - useful for counting activities of event streams over different time windows and finding the most active one.
streamsx.topology - Libraries to enable building IBM Streams application in Java, Python or Scala.
Tuktu - Easy-to-use platform for batch and streaming computation, built using Scala, Akka and Play!
Twitter Heron - Heron is a realtime, distributed, fault-tolerant stream processing engine from Twitter replacing Storm.
Twitter Scalding - Scala library for Map Reduce jobs, built on Cascading.
Twitter Summingbird - Streaming MapReduce with Scalding and Storm, by Twitter.
Twitter TSAR - TimeSeries AggregatoR by Twitter.
Wallaroo - The ultrafast and elastic data processing engine. Big or fast data - no fuss, no Java needed.

Distributed Filesystem

Ambry - a distributed object store that supports storage of trillions of small immutable objects as well as billions of large objects.
Apache HDFS - a way to store large files across multiple machines.
Apache Kudu - Hadoop’s storage layer to enable fast analytics on fast data.
BeeGFS - formerly FhGFS, parallel distributed file system.
Ceph Filesystem - software storage platform.
Disco DDFS - distributed filesystem.
Facebook Haystack - object storage system.
Google Colossus - distributed filesystem (GFS2).
Google GFS - distributed filesystem.
Google Megastore - scalable, highly available storage.
GridGain - GGFS, Hadoop compliant in-memory file system.
Lustre file system - high-performance distributed filesystem.
Microsoft Azure Data Lake Store - HDFS-compatible storage in Azure cloud
Quantcast File System QFS - open-source distributed file system.
Red Hat GlusterFS - scale-out network-attached storage file system.
Seaweed-FS - simple and highly scalable distributed file system.
Alluxio - reliable file sharing at memory speed across cluster frameworks.
Tahoe-LAFS - decentralized cloud storage system.
Baidu File System - distributed filesystem.

Distributed Index

Pilosa - Open source distributed bitmap index that dramatically accelerates queries across multiple, massive data sets.

Document Data Model

Actian Versant - commercial object-oriented database management system.
Crate Data - is an open source massively scalable data store. It requires zero administration.
Facebook Apollo - Facebook’s Paxos-like NoSQL database.
jumboDB - document oriented datastore over Hadoop.
LinkedIn Espresso - horizontally scalable document-oriented NoSQL data store.
MarkLogic - Schema-agnostic Enterprise NoSQL database technology.
Microsoft Azure DocumentDB - NoSQL cloud database service with protocol support for MongoDB
MongoDB - Document-oriented database system.
RavenDB - A transactional, open-source Document Database.
RethinkDB - document database that supports queries like table joins and group by.

Key-Map Data Model

Note: There is some term confusion in the industry, and two different things are called “Columnar Databases”. Some, listed here, are distributed, persistent databases built around the “key-map” data model: all data has a (possibly composite) key, with which a map of key-value pairs is associated. In some systems, multiple such value maps can be associated with a key, and these maps are referred to as “column families” (with value map keys being referred to as “columns”).

Another group of technologies that can also be called “columnar databases” is distinguished by how it stores data, on disk or in memory – rather than storing data the traditional way, where all column values for a given key are stored next to each other, “row by row”, these systems store all column values next to each other. So more work is needed to get all columns for a given key, but less work is needed to get all values for a given column.

The former group is referred to as “key map data model” here. The line between these and the Key-value Data Model stores is fairly blurry.

The latter, being more about the storage format than about the data model, is listed under Columnar Databases.

You can read more about this distinction on Prof. Daniel Abadi’s blog: Distinguishing two major types of Column Stores.
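The distinction drawn in the note above can be sketched in a few lines of Python. This is only an illustration: the sample records, the "profile" column family, and the helper names `get_record` and `average` are all hypothetical, and plain dictionaries and lists stand in for real storage engines.

```python
from collections import defaultdict

# Hypothetical sample data: three records sharing the same logical columns.
rows = [
    {"id": "u1", "name": "Ada", "age": 36},
    {"id": "u2", "name": "Lin", "age": 41},
    {"id": "u3", "name": "Sam", "age": 29},
]

# 1. Key-map data model: key -> column family -> {column: value}.
#    "profile" is an assumed column family name.
key_map = {
    r["id"]: {"profile": {"name": r["name"], "age": r["age"]}} for r in rows
}

# 2. Row-oriented storage: all column values for one key sit next to each other.
row_store = [(r["id"], r["name"], r["age"]) for r in rows]

# 3. Column-oriented storage: all values of one column sit next to each other.
column_store = defaultdict(list)
for r in rows:
    for column, value in r.items():
        column_store[column].append(value)


def get_record(store, position):
    """Reassemble one record from the column store; touches every column."""
    return {column: values[position] for column, values in store.items()}


def average(store, column):
    """Aggregate a single column; touches only one list."""
    values = store[column]
    return sum(values) / len(values)
```

In this sketch, `get_record(column_store, 0)` has to visit every column list to rebuild one record, while `average(column_store, "age")` reads a single list. That is the trade-off the note describes for the storage-format sense of "columnar", and it is independent of the key-map structure shown first.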

Apache Accumulo - distributed key/value store, built on Hadoop.
Apache Cassandra - column-oriented distributed datastore, inspired by BigTable.
Apache HBase - column-oriented distributed datastore, inspired by BigTable.
Baidu Tera - an Internet-scale database, inspired by BigTable.
Facebook HydraBase - evolution of HBase made by Facebook.
Google BigTable - column-oriented distributed datastore.
Google Cloud Datastore - is a fully managed, schemaless database for storing non-relational data over BigTable.
Hypertable - column-oriented distributed datastore, inspired by BigTable.
InfiniDB - is accessed through a MySQL interface and uses massively parallel processing to parallelize queries.
Tephra - Transactions for HBase.
Twitter Manhattan - real-time, multi-tenant distributed database for Twitter scale.
ScyllaDB - column-oriented distributed datastore written in C++, totally compatible with Apache Cassandra.

Key-value Data Model

Aerospike - NoSQL flash-optimized, in-memory. Open source and “Server code in ‘C’ (not Java or Erlang) precisely tuned to avoid context switching and memory copies.”
Amazon DynamoDB - distributed key/value store, implementation of Dynamo paper.
Badger - a fast, simple, efficient, and persistent key-value store written natively in Go.
Bolt - an embedded key-value database for Go.
BTDB - Key Value Database in .Net with Object DB Layer, RPC, dynamic IL and much more
BuntDB - a fast, embeddable, in-memory key/value database for Go with custom indexing and geospatial support.
Edis - is a protocol-compatible Server replacement for Redis.
ElephantDB - Distributed database specialized in exporting data from Hadoop.
EventStore - distributed time series database.
GridDB - suitable for sensor data stored in a timeseries.
HyperDex - a scalable, next generation key-value and document store with a wide array of features, including consistency, fault tolerance and high performance.
Ignite - is an in-memory key-value data store providing full SQL-compliant data access that can optionally be backed by disk storage.
LinkedIn Krati - is a simple persistent data store with very low latency and high throughput.
Linkedin Voldemort - distributed key/value storage system.
Oracle NoSQL Database - distributed key-value database by Oracle Corporation.
Redis - in memory key value datastore.
Riak - a decentralized datastore.
Storehaus - library to work with asynchronous key value stores, by Twitter.
SummitDB - an in-memory, NoSQL key/value database, with disk persistence and using the Raft consensus algorithm.
Tarantool - an efficient NoSQL database and a Lua application server.
TiKV - a distributed key-value database powered by Rust and inspired by Google Spanner and HBase.
Tile38 - a geolocation data store, spatial index, and realtime geofence, supporting a variety of object types including latitude/longitude points, bounding boxes, XYZ tiles, Geohashes, and GeoJSON
TreodeDB - key-value store that’s replicated and sharded and provides atomic multirow writes.

Graph Data Model

AgensGraph - a new generation multi-model graph database for the modern complex data environment.
Apache Giraph - implementation of Pregel, based on Hadoop.
Apache Spark Bagel - implementation of Pregel, part of Spark.
ArangoDB - multi model distributed database.
DGraph - A scalable, distributed, low latency, high throughput graph database aimed at providing Google production level scale and throughput, with low enough latency to be serving real time user queries, over terabytes of structured data.
EliasDB - a lightweight graph based database that does not require any third-party libraries.
Facebook TAO - TAO is the distributed data store that is widely used at Facebook to store and serve the social graph.
GCHQ Gaffer - Gaffer by GCHQ is a framework that makes it easy to store large-scale graphs in which the nodes and edges have statistics.
Google Cayley - open-source graph database.
Google Pregel - graph processing framework.
GraphLab PowerGraph - a core C++ GraphLab API and a collection of high-performance machine learning and data mining toolkits built on top of the GraphLab API.
GraphX - resilient Distributed Graph System on Spark.
Gremlin - graph traversal Language.
Infovore - RDF-centric Map/Reduce framework.
Intel GraphBuilder - tools to construct large-scale graphs on top of Hadoop.
JanusGraph - open-source, distributed graph database with multiple options for storage backends (Bigtable, HBase, Cassandra, etc.) and indexing backends (Elasticsearch, Solr, Lucene).
MapGraph - Massively Parallel Graph processing on GPUs.
Microsoft Graph Engine - a distributed in-memory data processing engine, underpinned by a strongly-typed in-memory key-value store and a general distributed computation engine.
Neo4j - graph database written entirely in Java.
OrientDB - document and graph database.
Phoebus - framework for large scale graph processing.
Titan - distributed graph database, built over Cassandra.
Twitter FlockDB - distributed graph database.
NodeXL - A free, open-source template for Microsoft® Excel® 2007, 2010, 2013 and 2016 that makes it easy to explore network graphs.

Columnar Databases

Note: please read the note in the Key-Map Data Model section.

Columnar Storage - an explanation of what columnar storage is and when you might want it.
Actian Vector - column-oriented analytic database.
C-Store - column oriented DBMS.
ClickHouse - an open-source column-oriented database management system that allows generating analytical data reports in real time.
EventQL - a distributed, column-oriented database built for large-scale event collection and analytics.
MonetDB - column store database.
Parquet - columnar storage format for Hadoop.
Pivotal Greenplum - purpose-built, dedicated analytic data warehouse that offers a columnar engine as well as a traditional row-based one.
Vertica - is designed to manage large, fast-growing volumes of data and provide very fast query performance when used for data warehouses.
SQream DB - A GPU powered big data database, designed for analytics and data warehousing, with ANSI-92 compliant SQL, suitable for data sets from 10TB to 1PB.
Google BigQuery - Google’s cloud offering backed by their pioneering work on Dremel.
Amazon Redshift - Amazon’s cloud offering, also based on a columnar datastore backend.
IndexR - an open-source columnar storage format for fast, real-time analytics with big data.
LocustDB - an experimental analytics database aiming to set a new standard for query performance on commodity hardware.

NewSQL Databases

Actian Ingres - commercially supported, open-source SQL relational database management system.
ActorDB - a distributed SQL database with the scalability of a KV store, while keeping the query capabilities of a relational database.
Amazon RedShift - data warehouse service, based on PostgreSQL.
BayesDB - statistic oriented SQL database.
Bedrock - a simple, modular, networked and distributed transaction layer built atop SQLite.
CitusDB - scales out PostgreSQL through sharding and replication.
Cockroach - Scalable, Geo-Replicated, Transactional Datastore.
Comdb2 - a clustered RDBMS built on optimistic concurrency control techniques.
Datomic - distributed database designed to enable scalable, flexible and intelligent applications.
FoundationDB - distributed database, inspired by F1.
Google F1 - distributed SQL database built on Spanner.
Google Spanner - globally distributed semi-relational database.
H-Store - is an experimental main-memory, parallel database management system that is optimized for on-line transaction processing (OLTP) applications.
Haeinsa - linearly scalable multi-row, multi-table transaction library for HBase based on Percolator.
HandlerSocket - NoSQL plugin for MySQL/MariaDB.
InfiniSQL - infinitely scalable RDBMS.
Map-D - GPU in-memory database, big data analysis and visualization platform.
MemSQL - in-memory SQL database with optimized columnar storage on flash.
NuoDB - SQL/ACID compliant distributed database.
Oracle TimesTen in-Memory Database - in-memory, relational database management system with persistence and recoverability.
Pivotal GemFire XD - Low-latency, in-memory, distributed SQL data store. Provides SQL interface to in-memory table data, persistable in HDFS.
SAP HANA - is an in-memory, column-oriented, relational database management system.
SenseiDB - distributed, realtime, semi-structured database.
Sky - database used for flexible, high performance analysis of behavioral data.
SymmetricDS - open source software for both file and database synchronization.
TiDB - TiDB is a distributed SQL database. Inspired by the design of Google F1.
VoltDB - claims to be fastest in-memory database.

Time-Series Databases

Axibase Time Series Database - Integrated time series database on top of HBase with built-in visualization, rule-engine and SQL support.
Chronix - a time series storage built to store time series highly compressed and for fast access times.
Cube - uses MongoDB to store time series data.
Heroic - is a scalable time series database based on Cassandra and Elasticsearch.
InfluxDB - distributed time series database.
IronDB - scalable, general-purpose time series database.
Kairosdb - similar to OpenTSDB but allows for Cassandra.
M3DB - a distributed time series database that can be used for storing realtime metrics at long retention.
Newts - a time series database based on Apache Cassandra.
OpenTSDB - distributed time series database on top of HBase.
Prometheus - a time series database and service monitoring system.
Beringei - Facebook’s in-memory time-series database.
TrailDB - an efficient tool for storing and querying series of events.
Druid - Column-oriented distributed data store ideal for powering interactive applications.
Riak-TS - Riak TS is the only enterprise-grade NoSQL time series database optimized specifically for IoT and Time Series data.
Akumuli - Akumuli is a numeric time-series database. It can be used to capture, store and process time-series data in real-time. The word “akumuli” can be translated from Esperanto as “accumulate”.
Rhombus - A time-series object store for Cassandra that handles all the complexity of building wide row indexes.
Dalmatiner DB - Fast distributed metrics database.
Blueflood - A distributed system designed to ingest and process time series data.
Timely - Timely is a time series database application that provides secure access to time series data based on Accumulo and Grafana.
SiriDB - Highly-scalable, robust and fast, open source time series database with cluster functionality.
Thanos - Thanos is a set of components to create a highly available metric system with unlimited storage capacity using multiple (existing) Prometheus deployments.
VictoriaMetrics - fast, scalable and resource-effective open-source TSDB compatible with Prometheus. Single-node and cluster versions included

SQL-like processing

Actian SQL for Hadoop - high performance interactive SQL access to all Hadoop data.
Apache Drill - framework for interactive analysis, inspired by Dremel.
Apache HCatalog - table and storage management layer for Hadoop.
Apache Hive - SQL-like data warehouse system for Hadoop.
Apache Calcite - framework that allows efficient translation of queries involving heterogeneous and federated data.
Apache Phoenix - SQL skin over HBase.
Aster Database - SQL-like analytic processing for MapReduce.
Cloudera Impala - framework for interactive analysis, inspired by Dremel.
Concurrent Lingual - SQL-like query language for Cascading.
Datasalt Splout SQL - full SQL query engine for big datasets.
Dremio - an open-source, SQL-like Data-as-a-Service Platform based on Apache Arrow.
Facebook PrestoDB - distributed SQL query engine.
Google BigQuery - framework for interactive analysis, implementation of Dremel.
PipelineDB - an open-source relational database that runs SQL queries continuously on streams, incrementally storing results in tables.
Pivotal HDB - SQL-like data warehouse system for Hadoop.
RainstorDB - database for storing petabyte-scale volumes of structured and semi-structured data.
Spark Catalyst - is a Query Optimization Framework for Spark and Shark.
SparkSQL - Manipulating Structured Data Using Spark.
Splice Machine - a full-featured SQL-on-Hadoop RDBMS with ACID transactions.
Stinger - interactive query for Hive.
Tajo - distributed data warehouse system on Hadoop.
Trafodion - enterprise-class SQL-on-HBase solution targeting big data transactional or operational workloads.

Data Ingestion

Amazon Kinesis - real-time processing of streaming data at massive scale.
Amazon Web Services Glue - serverless fully managed extract, transform, and load (ETL) service
Apache Chukwa - data collection system.
Apache Flume - service to manage large amount of log data.
Apache Kafka - distributed publish-subscribe messaging system.
Apache NiFi - Apache NiFi is an integrated data logistics platform for automating the movement of data between disparate systems.
Apache Sqoop - tool to transfer data between Hadoop and a structured datastore.
Cloudera Morphlines - framework that helps with ETL to Solr, HBase and HDFS.
Embulk - open-source bulk data loader that helps data transfer between various databases, storages, file formats, and cloud services.
Facebook Scribe - streamed log data aggregator.
Fluentd - tool to collect events and logs.
Google Photon - geographically distributed system for joining multiple continuously flowing streams of data in real-time with high scalability and low latency.
Heka - open source stream processing software system.
HIHO - framework for connecting disparate data sources with Hadoop.
Kestrel - distributed message queue system.
LinkedIn Databus - stream of change capture events for a database.
LinkedIn Kamikaze - utility package for compressing sorted integer arrays.
LinkedIn White Elephant - log aggregator and dashboard.
Logstash - a tool for managing events and logs.
Netflix Suro - log aggregator like Storm and Samza, based on Chukwa.
Pinterest Secor - is a service implementing Kafka log persistence.
Linkedin Gobblin - LinkedIn’s universal data ingestion framework.
Skizze - sketch data store to deal with all problems around counting and sketching using probabilistic data-structures.
StreamSets Data Collector - continuous big data ingest infrastructure with a simple to use IDE.
Yahoo Pulsar - a distributed pub-sub messaging platform with a very flexible messaging model and an intuitive client API.
Alooma - data pipeline as a service enabling moving data sources such as MySQL into data warehouses.

Service Programming

Akka Toolkit - runtime for distributed, and fault tolerant event-driven applications on the JVM.
Apache Avro - data serialization system.
Apache Curator - Java libraries for Apache ZooKeeper.
Apache Karaf - OSGi runtime that runs on top of any OSGi framework.
Apache Thrift - framework to build binary protocols.
Apache Zookeeper - centralized service for process management.
Google Chubby - a lock service for loosely-coupled distributed systems.
Hydrosphere Mist - a service for exposing Apache Spark analytics jobs and machine learning models as realtime, batch or reactive web services.
Linkedin Norbert - cluster manager.
Mara - A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
OpenMPI - message passing framework.
Serf - decentralized solution for service discovery and orchestration.
Spotify Luigi - a Python package for building complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, handling failures, command line integration, and much more.
Spring XD - distributed and extensible system for data ingestion, real time analytics, batch processing, and data export.
Twitter Elephant Bird - libraries for working with LZOP-compressed data.
Twitter Finagle - asynchronous network stack for the JVM.

Scheduling

Apache Airflow - a platform to programmatically author, schedule and monitor workflows.
Apache Aurora - is a service scheduler that runs on top of Apache Mesos.
Apache Falcon - data management framework.
Apache Oozie - workflow job scheduler.
Azure Data Factory - cloud-based pipeline orchestration for on-prem, cloud and HDInsight
Chronos - distributed and fault-tolerant scheduler.
Linkedin Azkaban - batch workflow job scheduler.
Schedoscope - Scala DSL for agile scheduling of Hadoop jobs.
Sparrow - scheduling platform.

Machine Learning

Azure ML Studio - Cloud-based AzureML, R, Python Machine Learning platform
brain - Neural networks in JavaScript.
Cloudera Oryx - real-time large-scale machine learning.
Concurrent Pattern - machine learning library for Cascading.
convnetjs - Deep Learning in Javascript. Train Convolutional Neural Networks (or ordinary ones) in your browser.
DataVec - A vectorization and data preprocessing library for deep learning in Java and Scala. Part of the Deeplearning4j ecosystem.
Deeplearning4j - Fast, open deep learning for the JVM (Java, Scala, Clojure). A neural network configuration layer powered by a C++ library. Uses Spark and Hadoop to train nets on multiple GPUs and CPUs.
Decider - Flexible and Extensible Machine Learning in Ruby.
ENCOG - machine learning framework that supports a variety of advanced algorithms, as well as support classes to normalize and process data.
etcML - text classification with machine learning.
Etsy Conjecture - scalable Machine Learning in Scalding.
Feast - A feature store for the management, discovery, and access of machine learning features. Feast provides a consistent view of feature data for both model training and model serving.
GraphLab Create - A machine learning platform in Python with a broad collection of ML toolkits, data engineering, and deployment tools.
H2O - statistical, machine learning and math runtime with Hadoop. R and Python.
Keras - An intuitive neural net API inspired by Torch that runs atop Theano and Tensorflow.
Lambdo - a workflow engine which significantly simplifies data processing and analysis by combining in one analysis pipeline (i) feature engineering and machine learning (ii) model training and prediction (iii) table population and column evaluation via user-defined (Python) functions.
Mahout - An Apache-backed machine learning library for Hadoop.
MLbase - distributed machine learning libraries for the BDAS stack.
MLPNeuralNet - Fast multilayer perceptron neural network library for iOS and Mac OS X.
ML Workspace - All-in-one web-based IDE specialized for machine learning and data science.
MOA - MOA performs big data stream mining in real time, and large scale machine learning.
MonkeyLearn - Text mining made easy. Extract and classify data from text.
ND4J - A matrix library for the JVM. Numpy for Java.
nupic - Numenta Platform for Intelligent Computing: a brain-inspired machine intelligence platform, and biologically accurate neural network based on cortical learning algorithms.
PredictionIO - machine learning server built on Hadoop, Mahout and Cascading.
RL4J - Reinforcement learning for Java and Scala. Includes Deep-Q learning and A3C algorithms, and integrates with Open AI’s Gym. Runs in the Deeplearning4j ecosystem.
SAMOA - distributed streaming machine learning framework.
scikit-learn - scikit-learn: machine learning in Python.
Spark MLlib - a Spark implementation of some common machine learning (ML) functionality.
Sibyl - System for Large Scale Machine Learning at Google.
TensorFlow - Library from Google for machine learning using data flow graphs.
Theano - A Python-focused machine learning library supported by the University of Montreal.
Torch - A deep learning library with a Lua API, supported by NYU and Facebook.
Velox - System for serving machine learning predictions.
Vowpal Wabbit - learning system sponsored by Microsoft and Yahoo!.
WEKA - suite of machine learning software.
BidMach - CPU and GPU-accelerated Machine Learning Library.

Benchmarking

Apache Hadoop Benchmarking - micro-benchmarks for testing Hadoop performances.
Berkeley SWIM Benchmark - real-world big data workload benchmark.
Intel HiBench - a Hadoop benchmark suite.
PUMA Benchmarking - benchmark suite for MapReduce applications.
Yahoo Gridmix3 - Hadoop cluster benchmarking from Yahoo engineer team.
Deeplearning4j Benchmarks

Security

Apache Ranger - Central security admin & fine-grained authorization for Hadoop
Apache Eagle - real time monitoring solution
Apache Knox Gateway - single point of secure access for Hadoop clusters.
Apache Sentry - security module for data stored in Hadoop.
BDA - The vulnerability detector for Hadoop and Spark

System Deployment

Apache Ambari - operational framework for Hadoop management.
Apache Bigtop - system deployment framework for the Hadoop ecosystem.
Apache Helix - cluster management framework.
Apache Mesos - cluster manager.
Apache Slider - is a YARN application to deploy existing distributed applications on YARN.
Apache Whirr - set of libraries for running cloud services.
Apache YARN - Cluster manager.
Brooklyn - library that simplifies application deployment and management.
Buildoop - Similar to Apache BigTop based on Groovy language.
Cloudera HUE - web application for interacting with Hadoop.
Facebook Prism - multi datacenters replication system.
Google Borg - job scheduling and monitoring system.
Google Omega - job scheduling and monitoring system.
Hortonworks HOYA - application that can deploy HBase cluster on YARN.
Kubernetes - a system for automating deployment, scaling, and management of containerized applications.
Marathon - Mesos framework for long-running services.

Applications

411 - a web application for alert management resulting from scheduled searches into Elasticsearch.
Adobe spindle - Next-generation web analytics processing with Scala, Spark, and Parquet.
Apache Kiji - framework to collect and analyze data in real-time, based on HBase.
Apache Metron - a platform that integrates a variety of open source big data technologies in order to offer a centralized tool for security monitoring and analysis.
Apache Nutch - open source web crawler.
Apache OODT - capturing, processing and sharing of data for NASA’s scientific archives.
Apache Tika - content analysis toolkit.
Argus - Time series monitoring and alerting platform.
AthenaX - a streaming analytics platform that enables users to run production-quality, large scale streaming analytics using Structured Query Language (SQL).
Atlas - a backend for managing dimensional time series data.
Countly - open source mobile and web analytics platform, based on Node.js & MongoDB.
Domino - Run, scale, share, and deploy models — without any infrastructure.
Eclipse BIRT - Eclipse-based reporting system.
ElastAlert - a simple framework for alerting on anomalies, spikes, or other patterns of interest from data in Elasticsearch.
Eventhub - open source event analytics platform.
Hermes - asynchronous message broker built on top of Kafka.
HIPI Library - API for performing image processing tasks on Hadoop’s MapReduce.
Hunk - Splunk analytics for Hadoop.
Imhotep - large-scale analytics platform by Indeed.
Indicative - Web & mobile analytics tool, with data warehouse (AWS, BigQuery) integration.
Jupyter - Notebook and project application for interactive data science and scientific computing across all programming languages.
MADlib - data-processing library that runs inside an RDBMS to analyze data.
Kapacitor - an open source framework for processing, monitoring, and alerting on time series data.
Kylin - open source Distributed Analytics Engine from eBay.
PivotalR - R on Pivotal HD / HAWQ and PostgreSQL.
Rakam - open-source real-time custom analytics platform powered by Postgresql, Kinesis and PrestoDB.
Qubole - auto-scaling Hadoop cluster, built-in data connectors.
Sense - Cloud Platform for Data Science and Big Data Analytics.
SnappyData - a distributed in-memory data store for real-time operational analytics, delivering stream analytics, OLTP (online transaction processing) and OLAP (online analytical processing) built on Spark in a single integrated cluster.
Snowplow - enterprise-strength web and event analytics, powered by Hadoop, Kinesis, Redshift and Postgres.
SparkR - R frontend for Spark.
Splunk - analyzer for machine-generated data.
Sumo Logic - cloud based analyzer for machine-generated data.
Talend - unified open source environment for YARN, Hadoop, HBASE, Hive, HCatalog & Pig.
Warp - query by example tool for big data (OS X app)

Search engine and framework

Apache Lucene - Search engine library.
Apache Solr - Search platform for Apache Lucene.
Elassandra - a fork of Elasticsearch modified to run on top of Apache Cassandra in a scalable and resilient peer-to-peer architecture.
ElasticSearch - Search and analytics engine based on Apache Lucene.
Enigma.io – Freemium robust web application for exploring, filtering, analyzing, searching and exporting massive datasets scraped from across the Web.
Facebook Unicorn - social graph search platform.
Google Caffeine - continuous indexing system.
Google Percolator - continuous indexing system.
TeraGoogle - large search index.
HBase Coprocessor - implementation of Percolator, part of HBase.
Lily HBase Indexer - quickly and easily search for any content stored in HBase.
LinkedIn Bobo - faceted search implementation written purely in Java, an extension to Apache Lucene.
LinkedIn Cleo - flexible software library for rapid development of partial, out-of-order and real-time typeahead search.
LinkedIn Galene - search architecture at LinkedIn.
LinkedIn Zoie - real-time search/indexing system written in Java.
MG4J - MG4J (Managing Gigabytes for Java) is a full-text search engine for large document collections written in Java. It is highly customisable, high-performance and provides state-of-the-art features and new research algorithms.
Sphinx Search Server - fulltext search engine.
Vespa - an engine for low-latency computation over large data sets. It stores and indexes your data so that queries, selection and processing over the data can be performed at serving time.
Facebook Faiss - a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning. Faiss is written in C++ with complete wrappers for Python/numpy.
Annoy - a C++ library with Python bindings to search for points in space that are close to a given query point. It also creates large read-only file-based data structures that are mmapped into memory so that many processes may share the same data.

MySQL forks and evolutions

Amazon RDS - MySQL databases in Amazon’s cloud.
Drizzle - evolution of MySQL 6.0.
Google Cloud SQL - MySQL databases in Google’s cloud.
MariaDB - enhanced, drop-in replacement for MySQL.
MySQL Cluster - MySQL implementation using NDB Cluster storage engine.
Percona Server - enhanced, drop-in replacement for MySQL.
ProxySQL - High Performance Proxy for MySQL.
TokuDB - TokuDB is a storage engine for MySQL and MariaDB.
WebScaleSQL - is a collaboration among engineers from several companies that face similar challenges in running MySQL at scale.

PostgreSQL forks and evolutions

HadoopDB - hybrid of MapReduce and DBMS.
IBM Netezza - high-performance data warehouse appliances.
Postgres-XL - Scalable Open Source PostgreSQL-based Database Cluster.
RecDB - Open Source Recommendation Engine Built Entirely Inside PostgreSQL.
Stado - open source MPP database system solely targeted at data warehousing and data mart applications.
Yahoo Everest - multi-petabyte MPP database derived from PostgreSQL.
TimescaleDB - An open-source time-series database optimized for fast ingest and complex queries
PipelineDB - The Streaming SQL Database. An open-source relational database that runs SQL queries continuously on streams, incrementally storing results in tables

Memcached forks and evolutions

Facebook McDipper - key/value cache for flash storage.
Facebook Memcached - fork of Memcached.
Twemproxy - A fast, light-weight proxy for memcached and redis.
Twitter Fatcache - key/value cache for flash storage.
Twitter Twemcache - fork of Memcached.

Embedded Databases

Actian PSQL - ACID-compliant DBMS developed by Pervasive Software, optimized for embedding in applications.
BerkeleyDB - a software library that provides a high-performance embedded database for key/value data.
HanoiDB - Erlang LSM BTree Storage.
LevelDB - a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.
LMDB - ultra-fast, ultra-compact key-value embedded data store developed by Symas.
RocksDB - embeddable persistent key-value store for fast storage based on LevelDB.

Business Intelligence

BIME Analytics - business intelligence platform in the cloud.
Blazer - business intelligence made simple.
Chartio - lean business intelligence platform to visualize and explore your data.
datapine - self-service business intelligence tool in the cloud.
GoodData - platform for data products and embedded analytics.
Jaspersoft - powerful business intelligence suite.
Jedox Palo - customisable Business Intelligence platform.
Jethrodata - Interactive Big Data Analytics.
intermix.io - Performance Monitoring for Amazon Redshift
Metabase - The simplest, fastest way to get business intelligence and analytics to everyone in your company.
Microsoft - business intelligence software and platform.
Microstrategy - software platforms for business intelligence, mobile intelligence, and network applications.
Numeracy - Fast, clean SQL client and business intelligence.
Pentaho - business intelligence platform.
Qlik - business intelligence and analytics platform.
Redash - Open source business intelligence platform, supporting multiple data sources and scheduled queries.
Saiku - open source analytics platform.
Knowage - open source business intelligence platform (formerly SpagoBI).
SparklineData SNAP - modern B.I platform powered by Apache Spark.
Tableau - business intelligence platform.
Zoomdata - Big Data Analytics.

Data Visualization

Airpal - Web UI for PrestoDB.
AnyChart - fast, simple and flexible JavaScript (HTML5) charting library featuring pure JS API.
Arbor - graph visualization library using web workers and jQuery.
Banana - visualize logs and time-stamped data stored in Solr. Port of Kibana.
Bloomery - Web UI for Impala.
Bokeh - A powerful Python interactive visualization library that targets modern web browsers for presentation, with the goal of providing elegant, concise construction of novel graphics in the style of D3.js, but also delivering this capability with high-performance interactivity over very large or streaming datasets.
C3 - D3-based reusable chart library
CartoDB - open-source or freemium hosting for geospatial databases with powerful front-end editing capabilities and a robust API.
chartd - responsive, retina-compatible charts with just an img tag.
Chart.js - open source HTML5 Charts visualizations.
Chartist.js - another open source HTML5 Charts visualization.
Crossfilter - JavaScript library for exploring large multivariate datasets in the browser. Works well with dc.js and d3.js.
Cubism - JavaScript library for time series visualization.
Cytoscape - JavaScript library for visualizing complex networks.
DC.js - Dimensional charting built to work natively with crossfilter rendered using d3.js. Excellent for connecting charts/additional metadata to hover events in D3.
D3 - JavaScript library for manipulating documents.
D3.compose - Compose complex, data-driven visualizations from reusable charts and components.
D3Plus - A fairly robust set of reusable charts and styles for d3.js.
DevExtreme React Chart - High-performance plugin-based React chart for Bootstrap and Material Design.
ECharts - Baidu's enterprise charts.
Envisionjs - dynamic HTML5 visualization.
FnordMetric - write SQL queries that return SVG charts rather than tables
Frappe Charts - GitHub-inspired simple and modern SVG charts for the web with zero dependencies.
Freeboard - open source real-time dashboard builder for IoT and other web mashups.
Gephi - An award-winning open-source platform for visualizing and manipulating large graphs and network connections. It’s like Photoshop, but for graphs. Available for Windows and Mac OS X.
Google Charts - simple charting API.
Grafana - Graphite dashboard frontend, editor and graph composer.
Graphite - scalable Realtime Graphing.
Highcharts - simple and flexible charting API.
IPython - provides a rich architecture for interactive computing.
Kibana - visualize logs and time-stamped data
Lumify - open source big data analysis and visualization platform
Matplotlib - plotting with Python.
MetricsGraphics.js - a library built on top of D3 that is optimized for time-series data.
NVD3 - chart components for d3.js.
Peity - Progressive SVG bar, line and pie charts.
Plot.ly - Easy-to-use web service that allows for rapid creation of complex charts, from heatmaps to histograms. Upload data to create and style charts with Plotly’s online spreadsheet. Fork others’ plots.
Plotly.js - the open source JavaScript graphing library that powers Plotly.
Recline - simple but powerful library for building data applications in pure Javascript and HTML.
Redash - open-source platform to query and visualize data.
ReCharts - A composable charting library built on React components
Shiny - a web application framework for R.
Sigma.js - JavaScript library dedicated to graph drawing.
Superset - a data exploration platform designed to be visual, intuitive and interactive, making it easy to slice, dice and visualize data and perform analytics at the speed of thought.
Vega - a visualization grammar.
Zeppelin - a notebook-style collaborative data analysis.
Zing Charts - JavaScript charting library for big data.

Internet of things and sensor data

Apache Edgent (Incubating) - a programming model and micro-kernel style runtime that can be embedded in gateways and small footprint edge devices enabling local, real-time, analytics on the edge devices.
Azure IoT Hub - Cloud-based bi-directional monitoring and messaging hub
TempoIQ - Cloud-based sensor analytics.
2lemetry - Platform for Internet of things.
Pubnub - Data stream network
ThingWorx - Rapid development and connection of intelligent systems
IFTTT - If this then that
Evrything - Making products smart
NetLytics - Analytics platform to process network data on Spark.

Interesting Readings

Big Data Benchmark - Benchmark of Redshift, Hive, Shark, Impala and Stinger/Tez.
NoSQL Comparison - Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vs HBase vs Couchbase vs Neo4j vs Hypertable vs ElasticSearch vs Accumulo vs VoltDB vs Scalaris comparison.
Monitoring Kafka performance - Guide to monitoring Apache Kafka, including native methods for metrics collection.
Monitoring Hadoop performance - Guide to monitoring Hadoop, with an overview of Hadoop architecture, and native methods for metrics collection.
Monitoring Cassandra performance - Guide to monitoring Cassandra, including native methods for metrics collection.

Interesting Papers

2015 - 2016

2015 - Facebook - One Trillion Edges: Graph Processing at Facebook-Scale.

2013 - 2014

2014 - Stanford - Mining of Massive Datasets.
2013 - AMPLab - Presto: Distributed Machine Learning and Graph Processing with Sparse Matrices.
2013 - AMPLab - MLbase: A Distributed Machine-learning System.
2013 - AMPLab - Shark: SQL and Rich Analytics at Scale.
2013 - AMPLab - GraphX: A Resilient Distributed Graph System on Spark.
2013 - Google - HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm.
2013 - Microsoft - Scalable Progressive Analytics on Big Data in the Cloud.
2013 - Metamarkets - Druid: A Real-time Analytical Data Store.
2013 - Google - Online, Asynchronous Schema Change in F1.
2013 - Google - F1: A Distributed SQL Database That Scales.
2013 - Google - MillWheel: Fault-Tolerant Stream Processing at Internet Scale.
2013 - Facebook - Scuba: Diving into Data at Facebook.
2013 - Facebook - Unicorn: A System for Searching the Social Graph.
2013 - Facebook - Scaling Memcache at Facebook.

2011 - 2012

2012 - Twitter - The Unified Logging Infrastructure for Data Analytics at Twitter.
2012 - AMPLab - Blink and It’s Done: Interactive Queries on Very Large Data.
2012 - AMPLab - Fast and Interactive Analytics over Hadoop Data with Spark.
2012 - AMPLab - Shark: Fast Data Analysis Using Coarse-grained Distributed Memory.
2012 - Microsoft - Paxos Replicated State Machines as the Basis of a High-Performance Data Store.
2012 - Microsoft - Paxos Made Parallel.
2012 - AMPLab - BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data.
2012 - Google - Processing a trillion cells per mouse click.
2012 - Google - Spanner: Google’s Globally-Distributed Database.
2011 - AMPLab - Scarlett: Coping with Skewed Popularity Content in MapReduce Clusters.
2011 - AMPLab - Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center.
2011 - Google - Megastore: Providing Scalable, Highly Available Storage for Interactive Services.

2001 - 2010

2010 - Facebook - Finding a needle in Haystack: Facebook’s photo storage.
2010 - AMPLab - Spark: Cluster Computing with Working Sets.
2010 - Google - Pregel: A System for Large-Scale Graph Processing.
2010 - Google - Large-scale Incremental Processing Using Distributed Transactions and Notifications (basis of Percolator and Caffeine).
2010 - Google - Dremel: Interactive Analysis of Web-Scale Datasets.
2010 - Yahoo - S4: Distributed Stream Computing Platform.
2009 - HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads.
2008 - AMPLab - Chukwa: A large-scale monitoring system.
2007 - Amazon - Dynamo: Amazon’s Highly Available Key-value Store.
2006 - Google - The Chubby lock service for loosely-coupled distributed systems.
2006 - Google - Bigtable: A Distributed Storage System for Structured Data.
2004 - Google - MapReduce: Simplified Data Processing on Large Clusters.
2003 - Google - The Google File System.

Videos

Spark in Motion - Spark in Motion teaches you how to use Spark for batch and streaming data analytics.
Machine Learning, Data Science and Deep Learning with Python - LiveVideo tutorial that covers machine learning, Tensorflow, artificial intelligence, and neural networks.

Books

Streaming

Data Science at Scale with Python and Dask - Data Science at Scale with Python and Dask teaches you how to build distributed data projects that can handle huge amounts of data.
Streaming Data - Streaming Data introduces the concepts and requirements of streaming and real-time data systems.
Storm Applied - Storm Applied is a practical guide to using Apache Storm for the real-world tasks associated with processing and analyzing real-time data streams.
Fundamentals of Stream Processing: Application Design, Systems, and Analytics - This comprehensive, hands-on guide combining the fundamental building blocks and emerging research in stream processing is ideal for application designers, system builders, analytic developers, as well as students and researchers in the field.
Stream Data Processing: A Quality of Service Perspective - Presents a new paradigm suitable for stream and complex event processing.
Unified Log Processing - Unified Log Processing is a practical guide to implementing a unified log of event streams (Kafka or Kinesis) in your business
Kafka Streams in Action - Kafka Streams in Action teaches you everything you need to know to implement stream processing on data flowing into your Kafka platform, allowing you to focus on getting more from your data without sacrificing time or effort.
Big Data - Big Data teaches you to build big data systems using an architecture that takes advantage of clustered hardware along with new tools designed specifically to capture and analyze web-scale data.
Spark in Action & Spark in Action 2nd Ed. - Spark in Action teaches you the theory and skills you need to effectively handle batch and streaming data using Spark. Fully updated for Spark 2.0.
Kafka in Action - Kafka in Action is a fast-paced introduction to every aspect of working with Kafka you need to really reap its benefits.
Fusion in Action - Fusion in Action teaches you to build a full-featured data analytics pipeline, including document and data search and distributed data clustering.
Reactive Data Handling - Reactive Data Handling is a collection of five hand-picked chapters, selected by Manuel Bernhardt, that introduce you to building reactive applications capable of handling real-time processing with large data loads–free eBook!

Distributed systems

Distributed Systems for fun and profit - Theory of distributed systems. Includes sections on time and ordering, replication and impossibility results.

Graph Based approach

Graph-Powered Machine Learning - Alessandro Negro. Combine graph theory and models to improve machine learning projects

Data Visualization

The beauty of data visualization
Designing Data Visualizations with Noah Iliinsky
Hans Rosling’s 200 Countries, 200 Years, 4 Minutes
Ice Bucket Challenge Data Visualization

Other Awesome Lists

Other awesome lists awesome-awesomeness.
Even more lists awesome.
Another list? list.
WTF! awesome-awesome-awesome.
Analytics awesome-analytics.
Public Datasets awesome-public-datasets.
Graph Classification awesome-graph-classification.
Network Embedding awesome-network-embedding.
Community Detection awesome-community-detection.
Decision Tree Papers awesome-decision-tree-papers.
Fraud Detection Papers awesome-fraud-detection-papers.
Gradient Boosting Papers awesome-gradient-boosting-papers.
Kafka awesome-kafka.
