皓哥好运来

斯坦福数据挖掘教程·第三版》读书笔记（英文版）Chapter 10 Mining Social-Network Graphs

来源：《斯坦福数据挖掘教程·第三版》对应的公开英文书和PPT。

Chapter 10 Mining Social-Network Graphs

The essential characteristics of a social network are:

There is a collection of entities that participate in the network. Typically, these entities are people, but they could be something else entirely.
There is at least one relationship between entities of the network. On Facebook or its ilk, this relationship is called friends. Sometimes the relationship is all-or-nothing; two people are either friends or they are not. However, in other examples of social networks, the relationship has a degree. This degree could be discrete; e.g., friends, family, acquaintances,
or none as in Google+. It could be a real number; an example would be the fraction of the average day that two people spend talking to each other.
There is an assumption of nonrandomness or locality. This condition is the hardest to formalize, but the intuition is that relationships tend to cluster. That is, if entity A is related to both B and C, then there is a higher probability than average that B and C are related.

Social networks are naturally modeled as graphs, which we sometimes refer to as a social graph. The entities are the nodes, and an edge connects two nodes if the nodes are related by the relationship that characterizes the network. If there is a degree associated with the relationship, this degree is represented by labeling the edges. Often, social graphs are undirected, as for the Facebook friends graph. But they can be directed graphs, as for example the graphs of followers on Twitter or Google+.

There are many examples of social networks other than “friends” networks. Here, let us enumerate some of the other examples of networks that also exhibit locality of relationships.

Telephone Networks
Here the nodes represent phone numbers, which are really individuals. There is an edge between two nodes if a call has been placed between those phones in some fixed period of time, such as last month, or “ever.” The edges could be weighted by the number of calls made between these phones during the period. Communities in a telephone network will form from groups of people that communicate frequently: groups of friends, members of a club, or people working at the same company, for example.

Email Networks
The nodes represent email addresses, which are again individuals. An edge represents the fact that there was at least one email in at least one direction between the two addresses. Alternatively, we may only place an edge if there were emails in both directions. In that way, we avoid viewing spammers as “friends” with all their victims. Another approach is to label edges as weak or strong. Strong edges represent communication in both directions, while weak edges indicate that the communication was in one direction only. The communities seen in email networks come from the same sorts of groupings we mentioned in connection with telephone networks. A similar sort of network involves people who text other people through their cell phones.

Collaboration Networks
Nodes represent individuals who have published research papers. There is an edge between two individuals who published one or more papers jointly. Optionally, we can label edges by the number of joint publications. The communities in this network are authors working on a particular topic.
An alternative view of the same data is as a graph in which the nodes are papers. Two papers are connected by an edge if they have at least one author in common. Now, we form communities that are collections of papers on the same topic.
There are several other kinds of data that form two networks in a similar way. For example, we can look at the people who edit Wikipedia articles and the articles that they edit. Two editors are connected if they have edited an article in common. The communities are groups of editors that are interested in the same subject. Dually, we can build a network of articles, and connect
articles if they have been edited by the same person. Here, we get communities of articles on similar or related subjects.
In fact, the data involved in Collaborative filtering, as was discussed in Chapter 9, often can be viewed as forming a pair of networks, one for the customers and one for the products. Customers who buy the same sorts of products, e.g., science-fiction books, will form communities, and dually, products that are bought by the same customers will form communities, e.g., all science-fiction books.

Other Examples of Social Graphs
Many other phenomena give rise to graphs that look something like social graphs, especially exhibiting locality. Examples include: information networks (documents, web graphs, patents), infrastructure networks (roads, planes, water pipes, powergrids), biological networks (genes, proteins, food-webs of animals eating each other), as well as other types, like product co-purchasing networks (e.g., Groupon).

Figure 10.2 is an example of a tripartite graph (the case k = 3 of a k-partite graph). There are three sets of nodes, which we may think of as users ${U_1, U_2\}$ , tags ${T_1, T_2, T_3, T_4\}$ , and Web pages ${W_1, W_2, W_3\}$ .
Define the betweenness of an edge $(a, b)$ to be the number of pairs of nodes $x$ and $y$ such that the edge $(a, b)$ lies on the shortest path between $x$ and $y$ . To be more precise, since there can be several shortest paths between $x$ and $y$ , edge $(a, b)$ is credited with the fraction of those shortest paths that include the edge $(a, b)$ . As in golf, a high score is bad. It suggests that the edge $(a, b)$ runs between two different communities; that is, $a$ and $b$ do not belong to the same community.

Following is Girvan-Newman Algorithm

Can see this blog: Girvan-Newman Algorithm

A proper definition of a “good” cut must balance the size of the cut itself against the difference in the sizes of the sets that the cut creates. One choice that serves well is the “normalized cut.” First, define the volume of a set S of nodes, denoted $V o l (S)$ , to be the number of edges with at least one end in S. Suppose we partition the nodes of a graph into two disjoint sets S and T .
Let $C u t (S, T)$ be the number of edges that connect a node in S to a node in T . Then the normalized cut value for S and T is

$\frac{Cut(S,T)}{Vol(S)}+\frac{Cut(S,T)}{Vol(T)}$

Suppose our graph has adjacency matrix A and degree matrix D. Our third matrix, called the Laplacian matrix, is $L = D - A$ , the difference between the degree matrix and the adjacency matrix. That is, the Laplacian matrix L has the same entries as D on the diagonal. Off the diagonal, at row i and column j, L has −1 if there is an edge between nodes i and j and 0 if not.

Eigenvalues of the Laplacian Matrix

The smallest eigenvalue for every Laplacian matrix is 0, and its corresponding eigenvector is [1, 1, . . . , 1].

There is a simple way to find the second-smallest eigenvalue for any matrix, such as the Laplacian matrix, that is symmetric (the entry in row $i$ and column
$j$ equals the entry in row $j$ and column $i$ ). While we shall not prove this fact, the second-smallest eigenvalue of L is the minimum of $x^TLx$ , where $x = [x_1, x_2, . . . , x_n]$ is a column vector with n components, and the minimum is
taken under the constraints:

The length of x is 1; that is $\sum_{i=1}^{n}x_i^2=1$
$x$ is orthogonal to the eigenvector associated with the smallest eigenvalue.

When L is a Laplacian matrix for an n-node graph, we know something more. The eigenvector associated with the smallest eigenvalue is 1. Thus, if x is orthogonal to 1, we must have

$\bold{x}^T\bold{1}=\sum_{i=1}^nx_i=0$

In addition for the Laplacian matrix, the expression $x^TLx$ has a useful equivalent expression. Recall that $L = D - A$ , where D and A are the degree and adjacency matrices of the same graph. Thus, $x^TLx = x^TDx − x^TAx$ .

The Affiliation-Graph Model

The elements of the graph-generation mechanism are:

There is a given number of communities, and there is a given number of individuals (nodes of the graph).
Each community can have any set of individuals as members. That is, the memberships in the communities are parameters of the model.
Each community C has a probability $p_C$ associated with it, the probability that two members of community C are connected by an edge because they are both members of C. These probabilities are also parameters of the model.
If a pair of nodes is in two or more communities, then there is an edge between them if any of the communities of which both are members induces that edge according to Rule (3).
The decision whether a community induces an edge between two of its members is independent of that decision for any other community of which these two individuals are also members.

There is a solution to the problem caused by the affiliation-graph model, where membership of individuals in communities is discrete: either you are a member of the community or not. Instead of “all-or-nothing” membership of nodes in communities, we can suppose that for each node and each community, there is a “strength of membership” for that node and community. Intuitively, the stronger the membership of two individuals in the same community is, the more likely it is that this community will cause there to be an edge between them.

In this model, we can adjust the strength of membership for an individual in a community continuously, just as we can adjust the probability associated with a community in the affiliation-graph model. That improvement allows us to use methods for optimizing continuous functions, such as gradient descent, to maximize the expression for likelihood. In the new model, we have

Fixed sets of communities and individuals, as before.
For each community C and individual x, there is a strength of membership
parameter $F_{xC}$ . These parameters can take any nonnegative value, and a value of 0 means the individual is definitely not in the community.
The probability that community C causes there to be an edge between nodes u and v is

$p_C(u,v)=1-e^{-F_{uC}F_{vC}}$

As before, the probability of there being an edge between u and v is 1 minus the probability that none of the communities causes there to be an edge between them. That is, each community independently causes edges, and an edge exists between two nodes if any community causes it to exist. More formally, $p_{uv}$ , the probability of an edge between nodes u and v, is

$p_{uv}=1-\prod_C(1-p_C(u,v))$

Finally, let E be the set of edges in the observed graph. As before, we can write the formula for the likelihood of the observed graph as the product of the expression for $p_{uv}$ for each edge $(u, v)$ that is in E, times the product of $1−p_{uv}$ for each edge $(u, v)$ that is not in E. Thus, in the new model, the formula for the likelihood of the graph with edges E is

$\prod_{(u,v)\space in \space E}(1-e^{-F_{uC}F_{vC}})\prod_{(u,v)\space not\space in \space E}(e^{-F_{uC}F_{vC}})$

Continuous Versus Discrete Optimization

Whenever you are trying to find an optimum solution to a problem where some of the elements are continuous (e.g., the probabilities of communities inducing edges) and some elements are discrete (e.g., the memberships of the communities), it might make sense to replace each discrete element by a continuous variable. That change, while it doesn’t strictly speaking represent reality, enables us to perform only one optimization, which will use a continuous optimization method such as gradient descent. We run the risk of winding up in a local optimum that is not the globally best solution.

SimRank

In this section, we shall take up another approach to analyzing social-network graphs. This technique, called “simrank,” applies best to graphs with several types of nodes, although it can in principle be applied to any graph. The purpose of simrank is to measure the similarity between nodes of the same type, and it does so by seeing where random walkers on the graph wind up when starting at a particular node, the source node. Because calculation must be carried out once for each source node, there is a limit to the size of graphs that can be analyzed completely in this manner. However, we shall also offer an algorithm that approximates simrank, but is far more efficient than iterated matrix-vector multiplication. Finally, we show how simrank can be used to find communities.

Let us use β as the probability that the walker continues at random, so 1 − β is the probability the walker will teleport to the source node S. Let $e_S$ be the column vector that has 1 in the row for node S and 0’s elsewhere. Then if v is the column vector that reflects the probability the walker is at each of the nodes at a particular round, and $v'$ is the probability the walker is at each of the nodes at the next round, then $v'$ is related to v by:

$v'=\beta Mv+(1-\beta)e_S$

A simple approach is to add nodes until the density of edges – the fraction of edges that connect nodes in the community divided by the number of possible edges connecting nodes in the community (i.e., $C_2^n$ for an n-node community) – goes below a threshold.

Another measure of the goodness of a community C is the conductance. To define conductance, we need first to define the volume of a community C, which is the smaller of the sum of the degrees of the nodes in C and the sum of the degrees of the nodes not in C. Then the conductance of C is the ratio of the number of edges with exactly one end in C divided by the volume of C. The intuitive idea behind conductance is that it looks for sets C that are neither too small nor too large, and yet can be disconnected from the network by cutting a relatively small number of edges. Small conductance suggests a closely knit community.

The neighborhood of radius d for a node v is the set of nodes u for which there is a path of length at most d from v to u. We denote this neighborhood by $N (v, d)$ . For example, N(v, 0) is always {v}, and N(v, 1) is v plus the set of nodes to which there is an arc from v. More generally, if V is a set of nodes, then N(V, d) is the set of nodes u for which there is a path of length d or less from at least one node in the set V .

The neighborhood profile of a node v is the sequence of sizes of its neighborhoods $∣ N (v, 1) ∣, ∣ N (v, 2) ∣, ....$ We do not include the neighborhood of distance 0, since its size is always 1.

Summary of Chapter 10

Social-Network Graphs: Graphs that represent the connections in a social network are not only large, but they exhibit a form of locality, where small subsets of nodes (communities) have a much higher density of edges than the average density.
Communities and Clusters: While communities resemble clusters in some ways, there are also significant differences. Individuals (nodes) normally belong to several communities, and the usual distance measures fail to represent closeness among nodes of a community. As a result, standard algorithms for finding clusters in data do not work well for community finding.
Betweenness: One way to separate nodes into communities is to measure the betweenness of edges, which is the sum over all pairs of nodes of the fraction of shortest paths between those nodes that go through the given edge. Communities are formed by deleting the edges whose betweenness is above a given threshold.
The Girvan-Newman Algorithm: The Girvan-Newman Algorithm is an efficient technique for computing the betweenness of edges. A breadth-first search from each node is performed, and a sequence of labeling steps computes the share of paths from the root to each other node that go through each of the edges. The shares for an edge that are computed for each root are summed to get the betweenness.
Communities and Complete Bipartite Graphs: A complete bipartite graph has two groups of nodes, all possible edges between pairs of nodes chosen one from each group, and no edges between nodes of the same group. Any sufficiently dense community (a set of nodes with many edges among them) will have a large complete bipartite graph.
Finding Complete Bipartite Graphs: We can find complete bipartite graphs by the same techniques we used for finding frequent itemsets. Nodes of the graph can be thought of both as the items and as the baskets. The basket corresponding to a node is the set of adjacent nodes, thought of as items. A complete bipartite graph with node groups of size t and s can be thought of as finding frequent itemsets of size t with supports.
Graph Partitioning: One way to find communities is to partition a graph repeatedly into pieces of roughly similar sizes. A cut is a partition of the nodes of the graph into two sets, and its size is the number of edges that have one end in each set. The volume of a set of nodes is the number of edges with at least one end in that set.
Normalized Cuts: We can normalize the size of a cut by taking the ratio of the size of the cut and the volume of each of the two sets formed by the cut. Then add these two ratios to get the normalized cut value. Normalized cuts with a low sum are good, in the sense that they tend to divide the nodes into two roughly equal parts, and have a relatively small size.
Adjacency Matrices: These matrices describe a graph. The entry in row i and column j is 1 if there is an edge between nodes i and j, and 0 otherwise.
Degree Matrices: The degree matrix for a graph has d in the ith diagonal entry if d is the degree of the ith node. Off the diagonal, all entries are 0.
Laplacian Matrices: The Laplacian matrix for a graph is its degree matrix minus its adjacency matrix. That is, the entry in row i and column i of the Laplacian matrix is the degree of the ith node of the graph, and the entry in row i and column j, for i 6 = j, is −1 if there is an edge between nodes i and j, and 0 otherwise.
Spectral Method for Partitioning Graphs: The lowest eigenvalue for any Laplacian matrix is 0, and its corresponding eigenvector consists of all 1’s. The eigenvectors corresponding to small eigenvalues can be used to guide a partition of the graph into two parts of similar size with a small cut value. For one example, putting the nodes with a positive component in the eigenvector with the second-smallest eigenvalue into one set and those with a negative component into the other is usually good.
Overlapping Communities: Typically, individuals are members of several communities. In graphs describing social networks, it is normal for the probability that two individuals are friends to rise as the number of communities of which both are members grows.
The Affiliation-Graph Model: An appropriate model for membership in communities is to assume that for each community there is a probability that because of this community two members become friends (have an edge in the social network graph). Thus, the probability that two nodes have an edge is 1 minus the product of the probabilities that none of the communities of which both are members cause there to be an edge between them. We then find the assignment of nodes to communities and the values of those probabilities that best describes the observed social graph.
Maximum-Likelihood Estimation: An important modeling technique, useful for modeling communities as well as many other things, is to compute, as a function of all choices of parameter values that the model allows, the probability that the observed data would be generated. The values that yield the highest probability are assumed to be correct, and called the maximum-likelihood estimate (MLE).
Use of Gradient Descent: If we know membership in communities, we can find the MLE by gradient descent or other methods. However, we cannot find the best membership in communities by gradient descent, because membership is discrete, not continuous.
Improved Community Modeling by Strength of Membership: We can formulate the problem of finding the MLE of communities in a social graph by assuming individuals have a strength of membership in each community, possibly 0 if they are not a member. If we define the probability of an edge between two nodes to be a function of their membership strengths in their common communities, we can turn the problem of finding the MLE into a continuous problem and solve it using gradient descent.
Simrank: One way to measure the similarity of nodes in a graph with several types of nodes is to start a random walker at one node and allow it to wander, with a fixed probability of restarting at the same node. The distribution of where the walker can be expected to be is a good measure of the similarity of nodes to the starting node. This process must be repeated with each node as the starting node if we are to get all-pairs similarity.
Triangles in Social Networks: The number of triangles per node is an important measure of the closeness of a community and often reflects its maturity. We can enumerate or count the triangles in a graph with m edges in $O(m^{3/2})$ time, but no more efficient algorithm exists in general.
Triangle Finding by MapReduce: We can find triangles in a single round of MapReduce by treating it as a three-way join. Each edge must be sent to a number of reducers proportional to the cube root of the total number of reducers, and the total computation time spent at all the reducers is proportional to the time of the serial algorithm for triangle finding.
Neighborhoods: The neighborhood of radius d for a node v in a directed or undirected graph is the set of nodes reachable from v along paths of length at most d. The neighborhood profile of a node is the sequence of neighborhood sizes for all distances from 1 upwards. The diameter of a connected graph is the smallest d for which the neighborhood of radius d for any starting node includes the entire graph.
Transitive Closure: A node v can reach node u if u is in the neighborhood of v for some radius. The transitive closure of a graph is the set of pairs of nodes (v, u) such that v can reach u.
Computing Transitive Closure: Since the transitive closure can have a number of facts equal to the square of the number of nodes of a graph, it is infeasible to compute transitive closure directly for large graphs. One approach is to find strongly connected components of the graph and collapse them each to a single node before computing the transitive closure.
Transitive Closure and MapReduce: We can view transitive closure computation as the iterative join of a path relation (pairs of nodes v and u such that u is known to be reachable from v) and the arc relation of the graph. Such an approach requires a number of MapReduce rounds equal to the diameter of the graph.
Seminaive Evaluation: When computing the transitive closure for a graph, we can speed up the iterative evaluation of the Path relation by recognizing that a Path fact is only useful on the round after that round when it was first discovered. A similar idea speeds up the reachability calculation and many similar iterative algorithms.
Transitive Closure by Recursive Doubling: An approach that uses fewer MapReduce rounds is to join the path relation with itself at each round. At each round, we double the length of paths that are able to contribute to the transitive closure. Thus, the number of needed rounds is only the base-2 logarithm of the diameter of the graph.
Smart Transitive Closure: While recursive doubling can cause the same path to be considered many times, and thus increases the total computation time (compared with iteratively joining paths with single arcs), a variant called smart transitive closure avoids discovering the same path more than once. The trick is to require that when joining two paths, the first has a length that is a power of 2.
Approximating Neighborhood Sizes: By using the Flajolet-Martin technique for approximating the number of distinct elements in a stream, we can find the neighborhood sizes at different radii approximately. We maintain a set of tail lengths for each node. To increase the radius by 1, we examine each edge (u, v) and for each tail length for u we set it equal to the corresponding tail length for v if the latter is larger than the former.

【人工智能】Spring AI Alibaba，一个面向 Java 开发者的开源框架，它旨在简化将人工智能（AI）功能集成到应用程序中的过程。本本本添哥 A -AIGC 人工智能大模型人工智能 java spring
一、SpringAIAlibaba介绍SpringAIAlibaba是一个面向Java开发者的开源框架，它旨在简化将人工智能（AI）功能集成到应用程序中的过程。该项目基于SpringAI构建，并且是阿里云通义系列模型及服务在JavaAI应用开发领域的最佳实践。SpringAIAlibaba的目标是为开发者提供一套高层次的AIAPI抽象以及与云原生基础设施的深度集成方案，从而帮助他们快速构建智能应用
模型融合与人机协同：构建人机共生的智能未来 AI天才研究院 Agentic AI 实战计算 AI人工智能与大数据计算科学神经计算深度学习神经网络大数据人工智能大型语言模型 AI AGI LLM Java Python 架构设计 Agent RPA
1.背景介绍在科技日新月异的今天，人工智能（AI）已经成为了我们生活中不可或缺的一部分。从智能手机，到自动驾驶汽车，再到医疗诊断，AI的应用已经渗透到了我们生活的方方面面。然而，尽管AI的发展已经取得了显著的成就，但是我们仍然面临着一个重大的挑战：如何让AI系统更好地理解和适应人类的需求，以实现人机共生的智能未来。为了解决这个问题，越来越多的研究者开始探索模型融合和人机协同的方法。2.核心概念与联
vLLM 优化与调优：提升模型性能的关键策略强哥之神人工智能深度学习计算机视觉 deepseek 智能体 vllm
在当今人工智能领域，大语言模型（LLM）的应用日益广泛，而优化和调优这些模型的性能成为了至关重要的任务。vLLM作为一种高效的推理引擎，提供了多种策略来提升模型的性能。本文将深入探讨vLLMV1的优化与调优策略，帮助读者更好地理解和应用这些技术。抢占式调度（Preemption）由于Transformer架构的自回归特性，有时键值缓存（KVcache）空间不足以处理所有批量请求。在这种情况下，vL
Spring Data Neo4j 与后端人工智能算法的数据交互 AI大模型应用实战 spring neo4j 人工智能 ai
SpringDataNeo4j与后端人工智能算法的数据交互关键词：SpringDataNeo4j、图数据库、人工智能算法、数据交互、知识图谱、图神经网络、数据集成摘要：本文深入探讨了如何利用SpringDataNeo4j框架实现后端人工智能算法与图数据库的高效数据交互。文章首先介绍了图数据库和人工智能算法的基本概念，然后详细解析了SpringDataNeo4j的核心架构和原理。接着，通过实际代码示
【AI大模型】深入解析预训练：大模型时代的核心引擎我爱一条柴ya 学习AI记录深度学习人工智能 ai python AI编程算法
预训练已成为现代人工智能，尤其是自然语言处理和计算机视觉领域的基石技术。它彻底改变了模型开发范式，催生了BERT、GPT等革命性模型。本文将系统阐述预训练的核心概念、原理、方法、应用及挑战。一、预训练的本质：为何需要它？核心问题：数据标注的瓶颈监督学习依赖海量高质量标注数据，获取成本极高（时间、金钱、专业知识）。对于复杂任务（如理解语义、生成文本），标注难度呈指数级上升。标注数据稀缺导致模型泛化能
广州曼顿2P数字微断：保护电力设备的安全守护者 mdkk678 安全
在现代社会，电力设备的安全运行对各行各业至关重要。然而，电力系统中存在各种电压波动、过载和短路等问题，可能对设备造成损害。为了保护电力设备免受这些问题的影响，广州曼顿推出了2P数字微断器。本文将介绍这一创新产品的特点和优势，以及它对电力设备的保护作用。广州曼顿科技有限公司专注用户侧智慧数字电气产品研制，以及智慧电能服务大数据云平台建设。基于人工智能技术，大幅提升人触电时的生命安全保障，以及电气火灾
Python通关秘籍之基础教程(一） Smile丶Life丶 Python 通关指南：从零基础到高手之路 python 开发语言后端
引言在编程的世界里，Python就像一位温和而强大的导师，它以简洁优雅的语法和强大的功能吸引着无数初学者和专业人士。无论你是想开发网站、分析数据、构建人工智能，还是仅仅想学习编程思维，Python都是你的理想选择。Python的魅力在于它的易读性和广泛的应用场景。它的代码就像英语句子一样自然，即使是完全没有编程经验的人也能快速上手。同时，Python拥有庞大的生态系统，从Web开发（Django、
多模态大模型发展全景：从架构创新到应用突破陈敬雷-充电了么-CEO兼CTO python 大模型多模态大模型 AIGC 机器学习深度学习 DeepSeek
注：此文章内容均节选自充电了么创始人，CEO兼CTO陈敬雷老师的新书《GPT多模态大模型与AIAgent智能体》（跟我一起学人工智能）【陈敬雷编著】【清华大学出版社】《GPT多模态大模型与AIAgent智能体》新出书籍配套视频【陈敬雷】推荐算法系统实战全系列精品课【陈敬雷】文章目录GPT多模态大模型系列四多模态大模型发展全景：从架构创新到应用突破更多技术内容总结GPT多模态大模型系列四多模态大模型
Python爬虫在社交平台数据挖掘中的应用：深入探索用户互动程序员威哥 python 爬虫数据挖掘
引言社交媒体已经成为全球用户互动的主要平台，每天都有大量的信息生成，用户之间的互动行为如点赞、评论、分享、转发等构成了宝贵的数据资源。如何利用这些互动数据为商业决策、用户行为分析以及产品优化提供支持，已经成为数据科学与大数据分析领域的一个重要课题。Python作为一款强大的编程语言，凭借其丰富的爬虫库和数据分析工具，已经成为挖掘社交平台数据的重要工具。在本文中，我们将通过Python爬虫技术，深入
ollama v0.9.6版本发布详解：修复启动屏幕样式及新增工具名称参数支持福大大架构师每日一题文心一言vschatgpt ollama
作为近年来备受瞩目的开源对话式人工智能框架之一，ollama持续更新优化其产品，致力于为开发者带来更稳定、高效的使用体验。2025年7月8日，ollama发布了v0.9.6版本，这一版本在用户界面和API的可用性方面做出了重要改进，进一步增强了开发和集成的便捷性。本文将对ollamav0.9.6版本的更新内容进行全面解析，详细介绍新特性、修复的具体问题、应用示例及最佳实践，帮助开发者快速掌握和应用
AI人工智能与机器学习的大数据融合应用 AI智能探索者人工智能机器学习大数据 ai
AI人工智能与机器学习的大数据融合应用关键词：AI人工智能、机器学习、大数据、融合应用、数据挖掘摘要：本文深入探讨了AI人工智能与机器学习在大数据融合应用方面的相关内容。首先介绍了研究的背景、目的、预期读者和文档结构，对核心术语进行了清晰定义。接着阐述了AI、机器学习和大数据的核心概念及相互联系，给出了形象的文本示意图和Mermaid流程图。详细讲解了核心算法原理，并通过Python源代码进行说明
深入解读 Qwen3 技术报告（一）：引言小爷毛毛（卓寿杰）大模型AIGC 深度学习基础/原理人工智能自然语言处理 python 语言模型深度学习
重磅推荐专栏：《大模型AIGC》《课程大纲》《知识星球》本专栏致力于探索和讨论当今最前沿的技术趋势和应用领域，包括但不限于ChatGPT和StableDiffusion等。我们将深入研究大型模型的开发和应用，以及与之相关的人工智能生成内容（AIGC）技术。通过深入的技术解析和实践经验分享，旨在帮助读者更好地理解和应用这些领域的最新进展1.引言：迎接大型语言模型的新纪元我们正处在一个由人工智能（AI
AI人工智能遇上TensorFlow：技术融合新趋势 AI大模型应用之禅人工智能 tensorflow python ai
AI人工智能遇上TensorFlow：技术融合新趋势关键词：人工智能、TensorFlow、深度学习、神经网络、机器学习、技术融合、AI开发摘要：本文深入探讨了人工智能技术与TensorFlow框架的融合发展趋势。我们将从基础概念出发，详细分析TensorFlow在AI领域的核心优势，包括其架构设计、算法实现和实际应用。文章包含丰富的技术细节，如神经网络原理、TensorFlow核心算法实现、数学
边缘人工智能与医疗AI融合发展路径：技术融合与应用前景（上） Allen_Lyb 数智化医院2025 人工智能健康医疗算法
引言人工智能技术正以前所未有的速度改变着医疗保健领域，从辅助诊断到个性化治疗，AI应用的广度和深度不断拓展。在这一浪潮中，边缘人工智能（EdgeAI）作为一种新兴技术范式，正成为推动医疗AI创新的关键力量。边缘AI区别于传统的云计算模式，它将数据处理和AI模型部署在数据源头附近，实现快速响应和隐私保护。这种特性使其在医疗保健领域具有独特优势，特别是在实时监测、紧急响应和患者隐私保护等方面。边缘AI
AI人工智能领域中AI作画的技术优势 AI大模型应用之禅人工智能 AI作画 ai
AI人工智能领域中AI作画的技术优势关键词：AI作画、技术优势、人工智能、艺术创作、图像生成摘要：本文深入探讨了AI人工智能领域中AI作画的技术优势。从背景介绍出发，阐述了AI作画的起源与发展，明确了文章的目的、范围、预期读者以及文档结构。接着详细分析了AI作画的核心概念，包括其原理和架构，并通过Mermaid流程图进行直观展示。对核心算法原理进行了深入剖析，结合Python代码示例进行讲解。同时
快速掌握Python编程基础张彦峰ZYF python
干货分享，感谢您的阅读！备注：本博客将自己初步学习Python的总结进行分享，希望大家通过本博客可以在短时间内快速掌握Python的基本程序编码能力，如有错误请留言指正，谢谢！（持续更新）一、快速了解Python和环境准备（一）Python快速介绍Python是一种简洁、强大、易读的编程语言，广泛应用于Web开发、数据分析、人工智能、自动化运维等领域。它由GuidovanRossum在1991年设
人工智能开源的大模型训练微调框架LLaMA-Factory
LLaMA-Factory是一个开源的大模型训练微调框架，具有模块化设计和多种高效的训练方法，能够满足不同用户的需求。用户可以通过命令行或Web界面进行操作，实现个性化的语言模型微调。LLaMA-Factory是一个专注于高效微调LLaMA系列模型的开源框架（GitHub项目地址：https://github.com/hiyouga/LLaMA-Factory）。它以极简配置、低资源消耗和对中文任
Python爬虫实战：利用Selenium与反反爬技术高效爬取天眼查企业信息 Python爬虫项目 2025年爬虫实战项目 python 爬虫开发语言 scrapy selenium
摘要本文将详细介绍如何使用Python爬虫技术获取天眼查的企业信息数据。我们将从爬虫基础开始，逐步深入到高级反反爬技术，最终构建一个能够稳定获取天眼查数据的爬虫系统。文章包含完整的代码实现、技术原理分析以及实际应用场景，帮助读者全面掌握企业信息爬取的核心技术。关键词：Python爬虫、天眼查、Selenium、反反爬技术、企业信息采集、数据挖掘一、引言在当今大数据时代，企业信息数据对于市场分析、商
Python 爬虫实战：京东商品数据采集（登录态验证 + 价格监控系统） Python核芯 Python爬虫实战项目 python 爬虫开发语言
一、引言在电商飞速发展的当下，京东作为国内头部电商平台之一，拥有海量商品数据。对于商家而言，精准掌握这些数据能助力优化定价策略、洞察市场动态；对消费者来说，追踪商品价格走势有助于把握最佳购买时机。本文将深入剖析如何借助Python爬虫技术实现京东商品数据采集，包括突破登录态验证以及搭建价格监控系统，为读者呈上一份实用的电商数据挖掘指南。二、环境搭建安装Python库：执行以下命令安装所需的库：pi
智慧城市大脑解决方案
智慧城市大脑背景与意义智慧城市大脑作为城市管理的创新模式，通过集成大数据、人工智能等技术，实现了对城市运行的全面感知与智能决策。它不仅提升了城市管理效率，还为市民带来了更加便捷、安全的生活体验。智慧城市大脑建设历程某城市作为智慧城市大脑的创新策源地，自2016年起便与阿里巴巴集团深度合作，投入巨资自主研发城市数据大脑“交通小脑”平台。该平台成功接入了大量视频和数据，实现了对道路和时间资源的再分配，
csdn-AI测评 Right.W 人工智能
一、你平时会使用这类AI工具吗？你对这类型的工具有什么看法？AI工具灵活、多样、能够回答各种问题，大为方便了人们日常学习、工作、生活的需要。目前很流行的chartgpt就是一款超火爆的ai工具，可以写论文、敲代码各种功能十分强大，为各个领域的数字化和智能化进程给予了很大帮助。但是人的智慧和意识是机器无法取代的，人类对人工智能不能过度依赖，人工智能只是改善生活、提高效率的工具而已。二、你可以花几分钟
智慧城市大脑：城市治理的新引擎 Fulima_cloud 智慧城市人工智能
在科技日新月异的今天，智慧城市的概念已经深入人心。而智慧城市大脑，作为智慧城市的中枢神经系统，运用大数据、云计算、物联网、人工智能等先进技术，构建的城市级智能化管理体系，正逐步成为提升城市治理能力、优化城市服务、推动城市可持续发展的重要力量。智慧城市大脑是什么，简而言之，是运用大数据、云计算、物联网、人工智能等先进技术，构建的城市级智能化管理体系。它如同城市的“智慧中枢”，通过对城市全域运行数据的
【亲测免费】探索AudioSlicer：智能音频分割工具秦贝仁Lincoln
探索AudioSlicer：智能音频分割工具去发现同类优质开源项目:https://gitcode.com/项目介绍AudioSlicer是一个基于Python的轻量级工具，专门用于切割.wav音频文件。它通过检测静音段将音频拆分成多个独立样本，并生成一个.json文件，详细记录了每个切片的时间范围。该项目灵感源自AndrewPhillipDoss的工作，现在正向着人工智能适应的方向发展，有望实现
人工智能怎么入门？零基础入门指南：从小白到AI实战者的第一步 OpenCV图像识别人工智能人工智能计算机视觉自然语言处理神经网络机器学习
人工智能（AI）是当今最具前景的科技领域之一。从聊天机器人到自动驾驶，从图像识别到语音翻译，AI正在以前所未有的速度改变世界。但对于初学者来说，一个最常见的问题是：“我没有基础，也不是学数学或计算机的，人工智能还能学吗？我该怎么入门？”答案是：可以学，而且你并不孤单。越来越多的人正在以“跨专业、转行、自学”的方式进入AI领域。关键是，你需要一个清晰的入门路径，理解应该先做什么、学什么、避开什么误区
深度学习基础与应用：从理论到实战创新工场
本文还有配套的精品资源，点击获取简介：深度学习是人工智能的核心分支，通过模拟人脑神经网络处理大量数据以执行复杂任务。Python因其简洁性和强大的库支持成为深度学习研究的首选语言。本文概述了深度学习基础概念、核心算法、Python框架，并假设了一个包含教程、示例代码、数据集、交互式学习环境、性能评估指标和进阶主题的“deep-learning-study-main”压缩包内容，旨在帮助学习者深入理
从点子到原型只需10分钟：用 Copilot 快速验证产品功能网罗开发 AI 大模型 Python 技术汇总人工智能 copilot
网罗开发（小红书、快手、视频号同名）大家好，我是展菲，目前在上市企业从事人工智能项目研发管理工作，平时热衷于分享各种编程领域的软硬技能知识以及前沿技术，包括iOS、前端、HarmonyOS、Java、Python等方向。在移动端开发、鸿蒙开发、物联网、嵌入式、云原生、开源等领域有深厚造诣。图书作者：《ESP32-C3物联网工程开发实战》图书作者：《SwiftUI入门，进阶与实战》超级个体：CO
阿里开源WebSailor：超越闭源模型的网络智能体新星
WebSailor简介与开源背景在人工智能领域持续创新的浪潮中，阿里通义实验室于2025年7月正式开源了其突破性成果——WebSailor网络智能体。这一开源项目标志着中国企业在复杂推理与检索技术领域的重要突破，其设计初衷直指开源生态中长期存在的关键短板：面对超高不确定性任务时的系统性推理能力缺失。填补开源生态的关键空白WebSailor的诞生源于一个被长期忽视的技术鸿沟。根据斯坦福大学《2025
RAG实战指南 Day 11：文本分块策略与最佳实践在未来等你 RAG实战指南 RAG 检索增强生成文本分块语义分割文档处理 NLP 人工智能
【RAG实战指南Day11】文本分块策略与最佳实践文章标签RAG,检索增强生成,文本分块,语义分割,文档处理,NLP,人工智能,大语言模型文章简述文本分块是RAG系统构建中的关键环节，直接影响检索准确率。本文深入解析5种主流分块技术：1)固定大小分块的实现与调优技巧；2)基于语义的递归分割算法；3)文档结构感知的分块策略；4)LLM增强的智能分块方法；5)多模态混合内容处理方案。通过电商知识库和科
Spring AI：Tool Calling 虾条_花吹雪 Spring AI ai java
工具调用（也称为函数调用）是人工智能应用程序中的一种常见模式，允许模型与一组API或工具交互，以增强其功能。工具主要用于：信息检索。此类工具可用于从外部源（如数据库、web服务、文件系统或web搜索引擎）检索信息。目标是增强模型的知识，使其能够回答否则无法回答的问题。因此，它们可用于检索增强生成（RAG）场景。例如，一个工具可用于检索给定位置的当前天气，检索最新的新闻文章，或查询数据库中的特定记录
AI产品经理技术篇：从传统AI到生成式AI，解密大模型的核心概念让我看看好学吗人工智能产品经理学习深度学习自然语言处理
在人工智能技术飞速发展的今天，AI产品经理不仅需要理解业务逻辑，还需深入技术底层，把握从传统AI到生成式AI的演进脉络。传统AI以分类、预测和规则驱动为核心，而生成式AI则颠覆了这一范式，通过大模型实现内容创作、对话生成等创造性任务。这种转变背后，是参数规模、模型架构和训练方式的根本性革新。作为AI产品经理，理解大模型的核心概念至关重要。从“参数”的意义到“Token”的向量化，从Transfor
iOS http封装 374016526 ios 服务器交互 http 网络请求
程序开发避免不了与服务器的交互，这里打包了一个自己写的http交互库。希望可以帮到大家。内置一个basehttp，当我们创建自己的service可以继承实现。 KuroAppBaseHttp *baseHttp = [[KuroAppBaseHttp alloc] init]; [baseHttp setDelegate:self]; [baseHttp
lolcat ：一个在 Linux 终端中输出彩虹特效的命令行工具 brotherlamp linux linux教程 linux视频 linux自学 linux资料
那些相信 Linux 命令行是单调无聊且没有任何乐趣的人们，你们错了，这里有一些有关 Linux 的文章，它们展示着 Linux 是如何的有趣和“淘气” 。在本文中，我将讨论一个名为“lolcat”的小工具 – 它可以在终端中生成彩虹般的颜色。何为 lolcat ? Lolcat 是一个针对 Linux，BSD 和 OSX 平台的工具，它类似于 cat 命令，并为 cat
MongoDB索引管理（1）——[九] eksliang mongodb MongoDB管理索引
转载请出自出处：http://eksliang.iteye.com/blog/2178427 一、概述数据库的索引与书籍的索引类似，有了索引就不需要翻转整本书。数据库的索引跟这个原理一样，首先在索引中找，在索引中找到条目以后，就可以直接跳转到目标文档的位置，从而使查询速度提高几个数据量级。不使用索引的查询称
Informatica参数及变量 18289753290 Informatica 参数变量
下面是本人通俗的理解，如有不对之处，希望指正 info参数的设置：在info中用到的参数都在server的专门的配置文件中（最好以parma）结尾下面的GLOBAl就是全局的，$开头的是系统级变量，$$开头的变量是自定义变量。如果是在session中或者mapping中用到的变量就是局部变量，那就把global换成对应的session或者mapping名字。 [GLOBAL] $Par
python 解析unicode字符串为utf8编码字符串酷的飞上天空 unicode
php返回的json字符串如果包含中文，则会被转换成\uxx格式的unicode编码字符串返回。在浏览器中能正常识别这种编码，但是后台程序却不能识别，直接输出显示的是\uxx的字符，并未进行转码。转换方式如下 >>> import json >>> q = '{"text":"\u4
Hibernate的总结永夜-极光 Hibernate
1.hibernate的作用,简化对数据库的编码,使开发人员不必再与复杂的sql语句打交道做项目大部分都需要用JAVA来链接数据库，比如你要做一个会员注册的页面，那么获取到用户填写的基本信后，你要把这些基本信息存入数据库对应的表中，不用hibernate还有mybatis之类的框架，都不用的话就得用JDBC，也就是JAVA自己的，用这个东西你要写很多的代码，比如保存注册信
SyntaxError: Non-UTF-8 code starting with '\xc4' 随便小屋 python
刚开始看一下Python语言，传说听强大的，但我感觉还是没Java强吧！写Hello World的时候就遇到一个问题，在Eclipse中写的，代码如下 ''' Created on 2014年10月27日 @author: Logic ''' print("Hello World!"); 运行结果 SyntaxError: Non-UTF-8
学会敬酒礼仪不做酒席菜鸟 aijuans 菜鸟
俗话说，酒是越喝越厚，但在酒桌上也有很多学问讲究，以下总结了一些酒桌上的你不得不注意的小细节。细节一：领导相互喝完才轮到自己敬酒。敬酒一定要站起来，双手举杯。细节二：可以多人敬一人，决不可一人敬多人，除非你是领导。细节三：自己敬别人，如果不碰杯，自己喝多少可视乎情况而定，比如对方酒量，对方喝酒态度，切不可比对方喝得少，要知道是自己敬人。细节四：自己敬别人，如果碰杯，一
《创新者的基因》读书笔记 aoyouzi 读书笔记《创新者的基因》
创新者的基因创新者的“基因”，即最具创意的企业家具备的五种“发现技能”：联想，观察，实验，发问，建立人脉。第一部分破坏性创新，从你开始第一章破坏性创新者的基因如何获得启示：发现以下的因素起到了催化剂的作用：(1) -个挑战现状的问题；(2)对某项技术、某个公司或顾客的观察；(3) -次尝试新鲜事物的经验或实验；(4)与某人进行了一次交谈，为他点醒
表单验证技术百合不是茶 JavaScript DOM对象 String对象事件
js最主要的功能就是验证表单,下面是我对表单验证的一些理解,贴出来与大家交流交流 ,数显我们要知道表单验证需要的技术点, String对象,事件,函数一:String对象;通常是对字符串的操作; 1,String的属性; 字符串.length;表示该字符串的长度; var str= "java"
web.xml配置详解之context-param bijian1013 java servlet web.xml context-param
一.格式定义： <context-param> <param-name>contextConfigLocation</param-name> <param-value>contextConfigLocationValue></param-value> </context-param> 作用：该元
Web系统常见编码漏洞（开发工程师知晓） Bill_chen sql PHP Web fckeditor 脚本
1.头号大敌：SQL Injection 原因：程序中对用户输入检查不严格，用户可以提交一段数据库查询代码，根据程序返回的结果，获得某些他想得知的数据，这就是所谓的SQL Injection，即SQL注入。本质: 对于输入检查不充分，导致SQL语句将用户提交的非法数据当作语句的一部分来执行。示例： String query = "SELECT id FROM users
【MongoDB学习笔记六】MongoDB修改器 bit1129 mongodb
本文首先介绍下MongoDB的基本的增删改查操作，然后，详细介绍MongoDB提供的修改器，以完成各种各样的文档更新操作 MongoDB的主要操作 show dbs 显示当前用户能看到哪些数据库 use foobar 将数据库切换到foobar show collections 显示当前数据库有哪些集合 db.people.update，update不带参数，可
提高职业素养，做好人生规划白糖_ 人生
培训讲师是成都著名的企业培训讲师，他在讲课中提出的一些观点很新颖，在此我收录了一些分享一下。注：讲师的观点不代表本人的观点，这些东西大家自己揣摩。 1、什么是职业规划：职业规划并不完全代表你到什么阶段要当什么官要拿多少钱，这些都只是梦想。职业规划是清楚的认识自己现在缺什么，这个阶段该学习什么，下个阶段缺什么，又应该怎么去规划学习，这样才算是规划。
国外的网站你都到哪边看？ bozch 技术网站国外
学习软件开发技术，如果没有什么英文基础，最好还是看国内的一些技术网站，例如：开源OSchina，csdn，iteye,51cto等等。个人感觉如果英语基础能力不错的话，可以浏览国外的网站来进行软件技术基础的学习，例如java开发中常用的到的网站有apache.org 里面有apache的很多Projects,springframework.org是spring相关的项目网站,还有几个感觉不错的
编程之美-光影切割问题 bylijinnan 编程之美
package a; public class DisorderCount { /**《编程之美》“光影切割问题” * 主要是两个问题： * 1.数学公式（设定没有三条以上的直线交于同一点）： * 两条直线最多一个交点，将平面分成了4个区域； * 三条直线最多三个交点，将平面分成了7个区域； * 可以推出：N条直线 M个交点，区域数为N+M+1。
关于Web跨站执行脚本概念 chenbowen00 Web 安全跨站执行脚本
跨站脚本攻击(XSS)是web应用程序中最危险和最常见的安全漏洞之一。安全研究人员发现这个漏洞在最受欢迎的网站,包括谷歌、Facebook、亚马逊、PayPal,和许多其他网站。如果你看看bug赏金计划,大多数报告的问题属于 XSS。为了防止跨站脚本攻击,浏览器也有自己的过滤器,但安全研究人员总是想方设法绕过这些过滤器。这个漏洞是通常用于执行cookie窃取、恶意软件传播,会话劫持,恶意重定向。在
[开源项目与投资]投资开源项目之前需要统计该项目已有的用户数 comsci 开源项目
现在国内和国外,特别是美国那边,突然出现很多开源项目,但是这些项目的用户有多少,有多少忠诚的粉丝,对于投资者来讲,完全是一个未知数,那么要投资开源项目,我们投资者必须准确无误的知道该项目的全部情况,包括项目发起人的情况,项目的维持时间..项目的技术水平,项目的参与者的势力,项目投入产出的效益.....
oracle alert log file（告警日志文件） daizj oracle 告警日志文件 alert log file
The alert log is a chronological log of messages and errors, and includes the following items: All internal errors (ORA-00600), block corruption errors (ORA-01578), and deadlock errors (ORA-00060)
关于 CAS SSO 文章声明 denger SSO
由于几年前写了几篇 CAS 系列的文章，之后陆续有人参照文章去实现，可都遇到了各种问题，同时经常或多或少的收到不少人的求助。现在这时特此说明几点： 1. 那些文章发表于好几年前了，CAS 已经更新几个很多版本了，由于近年已经没有做该领域方面的事情，所有文章也没有持续更新。 2. 文章只是提供思路，尽管 CAS 版本已经发生变化，但原理和流程仍然一致。最重要的是明白原理，然后
初二上学期难记单词 dcj3sjt126com english word
lesson 课 traffic 交通 matter 要紧；事物 happy 快乐的，幸福的 second 第二的 idea 主意；想法；意见 mean 意味着 important 重要的，重大的 never 从来，决不 afraid 害怕的 fifth 第五的 hometown 故乡，家乡 discuss 讨论；议论 east 东方的 agree 同意；赞成 bo
uicollectionview 纯代码布局, 添加头部视图 dcj3sjt126com Collection
#import <UIKit/UIKit.h> @interface myHeadView : UICollectionReusableView { UILabel *TitleLable; } -(void)setTextTitle; @end #import "myHeadView.h" @implementation m
N 位随机数字串的 JAVA 生成实现 FX夜归人 java Math 随机数 Random
/** * 功能描述随机数工具类<br /> * @author FengXueYeGuiRen * 创建时间 2014-7-25<br /> */ public class RandomUtil { // 随机数生成器 private static java.util.Random random = new java.util.R
Ehcache（09）——缓存Web页面 234390216 ehcache 页面缓存
页面缓存目录 1 SimplePageCachingFilter 1.1 calculateKey 1.2 可配置的初始化参数 1.2.1 cach
spring中少用的注解@primary解析 jackyrong primary
这次看下spring中少见的注解@primary注解，例子 @Component public class MetalSinger implements Singer{ @Override public String sing(String lyrics) { return "I am singing with DIO voice
Java几款性能分析工具的对比 lbwahoo java
Java几款性能分析工具的对比摘自：http://my.oschina.net/liux/blog/51800 在给客户的应用程序维护的过程中，我注意到在高负载下的一些性能问题。理论上，增加对应用程序的负载会使性能等比率的下降。然而，我认为性能下降的比率远远高于负载的增加。我也发现，性能可以通过改变应用程序的逻辑来提升，甚至达到极限。为了更详细的了解这一点，我们需要做一些性能
JVM参数配置大全 nickys jvm 应用服务器
JVM参数配置大全 /usr/local/jdk/bin/java -Dresin.home=/usr/local/resin -server -Xms1800M -Xmx1800M -Xmn300M -Xss512K -XX:PermSize=300M -XX:MaxPermSize=300M -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=5 -
搭建 CentOS 6 服务器(14) - squid、Varnish rensanning varnish
（一）squid 安装 # yum install httpd-tools -y # htpasswd -c -b /etc/squid/passwords squiduser 123456 # yum install squid -y 设置 # cp /etc/squid/squid.conf /etc/squid/squid.conf.bak # vi /etc/
Spring缓存注解@Cache使用 tom_seed spring
参考资料 http://www.ibm.com/developerworks/cn/opensource/os-cn-spring-cache/ http://swiftlet.net/archives/774 缓存注解有以下三个： @Cacheable @CacheEvict @CachePut
dom4j解析XML时出现"java.lang.noclassdeffounderror: org/jaxen/jaxenexception"错误 xp9802
java.lang.NoClassDefFoundError: org/jaxen/JaxenExc 关键字: java.lang.noclassdeffounderror: org/jaxen/jaxenexception 使用dom4j解析XML时，要快速获取某个节点的数据，使用XPath是个不错的方法，dom4j的快速手册里也建议使用这种方式执行时却抛出以下异常： Exceptio

斯坦福数据挖掘教程·第三版》读书笔记（英文版）Chapter 10 Mining Social-Network Graphs

Chapter 10 Mining Social-Network Graphs

SimRank

Summary of Chapter 10

你可能感兴趣的:(数据挖掘,数据挖掘,人工智能)