Latent Semantic Analysis (LSA) Tutorial

Source (reposted from): http://www.puffinwarellc.com/index.php/news-and-articles/articles/33-latent-semantic-analysis-tutorial.html


LSA, also known as Latent Semantic Indexing (LSI), can be used to analyze the meaning, or concepts, underlying a collection of documents.


If every word mapped to exactly one concept, and every concept were described by exactly one word, then LSI would be easy: we could simply build a one-to-one mapping between words and concepts, as in the figure below:

Unfortunately, in reality the mapping between words and concepts is not one-to-one but many-to-many, as in the figure below:


How does LSI work?

LSI originated as a way to solve the following problem: how to use search words to find relevant documents. When we compare words to find relevant documents, what we really want to compare is the meaning behind the words, not just their surface forms. LSI solves this by mapping both words and documents into a common concept space and doing the comparison in that space.


Because authors have many words to choose from when writing, different authors may pick different words for the same concept, which can leave the concepts blurred. This somewhat arbitrary choice of words introduces noise into the concept-word mapping. LSI filters out some of this noise and also tries to find the smallest set of concepts that spans all of the documents.


To do this, LSI makes the following simplifications:

1. Documents are represented as "bags of words": the order of the words in a document does not matter, only how many times each word appears (see the small sketch after this list).

2. A concept is represented as a group of words that frequently appear together in documents. For example, "leash", "treat", and "obey" might usually appear in documents about dog training.

3. Each word is assumed to have only one meaning.
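To make the bag-of-words idea concrete, here is a minimal sketch (my own illustration, not part of the original tutorial) showing that two made-up titles with the same words in a different order produce exactly the same bag of words:

from collections import Counter

# Two hypothetical titles containing the same words in a different order.
title_a = "value investing value book"
title_b = "book value value investing"

bag_a = Counter(title_a.lower().split())
bag_b = Counter(title_b.lower().split())

print(bag_a)           # Counter({'value': 2, 'investing': 1, 'book': 1})
print(bag_a == bag_b)  # True -- only word frequencies matter, not order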


A Small Example

For this example, I searched for books on Amazon.com using the word "investing" and took the top 10 results as test data. One title was dropped because it shared only one index word with the other titles, leaving 9 titles. An index word is a word that:

1. appears in two or more book titles, and

2. is not a stop word such as "and" or "the".


In this example, the stop words we remove are: "and", "edition", "for", "in", "little", "of", "the", "to".


Here are the remaining 9 titles; the index words in each title are the ones that satisfy the two rules above:

  1. The Neatest Little Guide to Stock Market Investing
  2. Investing For Dummies, 4th Edition
  3. The Little Book of Common Sense Investing: The Only Way to Guarantee Your Fair Share of Stock Market Returns
  4. The Little Book of Value Investing
  5. Value Investing: From Graham to Buffett and Beyond
  6. Rich Dad's Guide to Investing: What the Rich Invest in, That the Poor and the Middle Class Do Not!
  7. Investing in Real Estate, 5th Edition
  8. Stock Investing For Dummies
  9. Rich Dad's Advisors: The ABC's of Real Estate Investing: The Secrets of Finding Hidden Profits Most Investors Miss


After analyzing these titles with LSI, we can plot the index words and titles on an XY graph and identify the clusters they belong to. The 9 titles are plotted as blue circles and the 11 index words as red squares. Not only can we pick out the clusters of titles, we can also label them, because the index words are plotted right alongside the titles. In the figure below, one cluster, containing titles T7 and T9, is about real estate; another, containing T2, T4, T5, and T8, is about value investing; a third, containing T1 and T3, is about the stock market. Title T6 is an outlier.



The following parts walk through the steps of LSI one at a time.


Part 1 - Create the Count Matrix

The first step is to create the word-by-title matrix: each index word is a row and each title is a column, and each cell holds the number of times that word appears in that title. In general this matrix is very large but very sparse; most cells are zero, and zeros are left blank in the table below.


Index Words   T1   T2   T3   T4   T5   T6   T7   T8   T9
book                     1    1
dads                                    1              1
dummies             1                             1
estate                                       1         1
guide          1                        1
investing      1    1    1    1    1    1    1    1    1
market         1         1
real                                         1         1
rich                                    2              1
stock          1         1                        1
value                         1    1

Python Implementation and Walkthrough

Python - Getting Started

Download the python code here.

Throughout this article, we'll give Python code that implements all the steps necessary for doing Latent Semantic Analysis. We'll go through the code section by section and explain everything. The Python code used in this article can be downloaded here and then run in Python. You need to have already installed the Python NumPy and SciPy libraries.

Python - Import Functions

First we need to import a few functions from Python libraries to handle some of the math we need to do. NumPy is the Python numerical library, and we'll import zeros, a function that creates a matrix of zeros that we use when building our words by titles matrix. From the linear algebra part of the scientific package (scipy.linalg) we import the svd function that actually does the singular value decomposition, which is the heart of LSA.

from numpy import zeros
from scipy.linalg import svd


Python - Define Data

Next, we define the data that we are using. Titles holds the 9 book titles that we have gathered, stopwords holds the 8 common words that we are going to ignore when we count the words in each title, and ignorechars has all the punctuation characters that we will remove from words. We use Python's triple quoted strings, so there are actually only 4 punctuation symbols we are removing: comma (,), colon (:), apostrophe ('), and exclamation point (!).

titles = [
    "The Neatest Little Guide to Stock Market Investing",
    "Investing For Dummies, 4th Edition",
    "The Little Book of Common Sense Investing: The Only Way to Guarantee Your Fair Share of Stock Market Returns",
    "The Little Book of Value Investing",
    "Value Investing: From Graham to Buffett and Beyond",
    "Rich Dad's Guide to Investing: What the Rich Invest in, That the Poor and the Middle Class Do Not!",
    "Investing in Real Estate, 5th Edition",
    "Stock Investing For Dummies",
    "Rich Dad's Advisors: The ABC's of Real Estate Investing: The Secrets of Finding Hidden Profits Most Investors Miss"
]
stopwords = ['and', 'edition', 'for', 'in', 'little', 'of', 'the', 'to']
ignorechars = ''',:'!'''


Python - Define LSA Class

The LSA class has methods for initialization, parsing documents, building the matrix of word counts, and calculating. The first method is the __init__ method, which is called whenever an instance of the LSA class is created. It stores the stopwords and ignorechars so they can be used later, and then initializes the word dictionary and the document count variables.

class LSA(object):
    def __init__(self, stopwords, ignorechars):
        self.stopwords = stopwords
        self.ignorechars = ignorechars
        self.wdict = {}
        self.dcount = 0


Python - Parse Documents

The parse method takes a document, splits it into words, removes the ignored characters and turns everything into lowercase so the words can be compared to the stop words. If the word is a stop word, it is ignored and we move on to the next word. If it is not a stop word, we put the word in the dictionary, and also append the current document number to keep track of which documents the word appears in.

The documents that each word appears in are kept in a list associated with that word in the dictionary. For example, the word book appears in titles 3 and 4; since the document count starts at 0, we would have self.wdict['book'] = [2, 3] after all titles are parsed.

After processing all words from the current document, we increase the document count in preparation for the next document to be parsed.

    def parse(self, doc):
        words = doc.split()
        for w in words:
            w = w.lower().translate(None, self.ignorechars)
            if w in self.stopwords:
                continue
            elif w in self.wdict:
                self.wdict[w].append(self.dcount)
            else:
                self.wdict[w] = [self.dcount]
        # all words in this document have been processed; move on to the next document
        self.dcount += 1


Python - Build the Count Matrix

Once all documents are parsed, all the words (dictionary keys) that are in more than 1 document are extracted and sorted, and a matrix is built with the number of rows equal to the number of words (keys), and the number of columns equal to the document count. Finally, for each word (key) and document pair the corresponding matrix cell is incremented.

    def build(self):
        self.keys = [k for k in self.wdict.keys() if len(self.wdict[k]) > 1]
        self.keys.sort()
        self.A = zeros([len(self.keys), self.dcount])
        for i, k in enumerate(self.keys):
            for d in self.wdict[k]:
                self.A[i,d] += 1


Python - Print the Count Matrix

The printA() method is very simple: it just prints out the matrix that we have built so it can be checked.

    def printA(self):
        print self.A


Python - Test the LSA Class

After defining the LSA class, it's time to try it out on our 9 book titles. First we create an instance of LSA, called mylsa, and pass it the stopwords and ignorechars that we defined. During creation, the __init__ method is called which stores the stopwords and ignorechars and initializes the word dictionary and document count.

Next, we call the parse method on each title. This method extracts the words in each title, strips out punctuation characters, converts each word to lower case, throws out stop words, and stores remaining words in a dictionary along with what title number they came from.

Finally we call the build() method to create the matrix of word by title counts. This extracts all the words we have seen so far, throws out words that occur in less than 2 titles, sorts them, builds a zero matrix of the right size, and then increments the proper cell whenever a word appears in a title.

mylsa = LSA(stopwords, ignorechars)
for t in titles:
    mylsa.parse(t)
mylsa.build()
mylsa.printA()

Here is the raw output produced by printA(). As you can see, it's the same as the matrix that we showed earlier.

[[ 0. 0. 1. 1. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 1. 0. 0. 1.]
[ 0. 1. 0. 0. 0. 0. 0. 1. 0.]
[ 0. 0. 0. 0. 0. 0. 1. 0. 1.]
[ 1. 0. 0. 0. 0. 1. 0. 0. 0.]
[ 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[ 1. 0. 1. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 1. 0. 1.]
[ 0. 0. 0. 0. 0. 2. 0. 0. 1.]
[ 1. 0. 1. 0. 0. 0. 0. 1. 0.]
[ 0. 0. 0. 1. 1. 0. 0. 0. 0.]]


Part 2 - Modify the Counts with TFIDF

In sophisticated Latent Semantic Analysis systems, the raw matrix counts are usually modified so that rare words are weighted more heavily than common words. For example, a word that occurs in only 5% of the documents should probably be weighted more heavily than a word that occurs in 90% of the documents. The most popular weighting is TFIDF (Term Frequency - Inverse Document Frequency). Under this method, the count in each cell is replaced by the following formula.

TFIDF[i,j] = ( N[i,j] / N[*,j] ) * log( D / D[i] ) where

  • N[i,j] = the number of times word i appears in document j (the original cell count).

  • N[*,j] = the total number of words in document j (just add the counts in column j).

  • D = the number of documents (the number of columns).

  • D[i] = the number of documents in which word i appears (the number of non-zero cells in row i).

In this formula, words that are concentrated in certain documents are emphasized (by the N[i,j] / N[*,j] ratio) and words that only appear in a few documents are also emphasized (by the log( D / D[i] ) term).
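As a quick worked example (my own, using the counts from the matrix above): the word "rich" appears twice in title 6, title 6 contains 5 index-word occurrences in total, there are 9 titles, and "rich" appears in 2 of them. Assuming the natural logarithm:

from math import log

# Worked example for the cell ("rich", T6), using the formula above.
n_ij = 2.0   # "rich" appears twice in title 6
n_j  = 5.0   # title 6 contains 5 index words in total (its column sum)
D    = 9.0   # 9 titles in the collection
d_i  = 2.0   # "rich" appears in 2 titles (T6 and T9)

print((n_ij / n_j) * log(D / d_i))  # about 0.60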

Since we have such a small example, we will skip this step and move on to the heart of LSA: doing the singular value decomposition of our matrix of counts. However, if we did want to add TFIDF to our LSA class, we could add the following two lines at the beginning of our Python file to import the log, asarray, and sum functions.

from math import log
from numpy import asarray, sum

Then we would add the following TFIDF method to our LSA class. WordsPerDoc (N[*,j]) just holds the sum of each column, which is the total number of index words in each document. DocsPerWord (D[i]) uses asarray to create an array of True and False values, depending on whether each cell value is greater than 0 or not; the 'i' argument turns these into 1's and 0's instead. Each row is then summed up, which tells us how many documents each word appears in. Finally, we just step through each cell and apply the formula. We do have to change cols (which is the number of documents) into a float to prevent integer division.

    def TFIDF(self):
        WordsPerDoc = sum(self.A, axis=0)
        DocsPerWord = sum(asarray(self.A > 0, 'i'), axis=1)
        rows, cols = self.A.shape
        for i in range(rows):
            for j in range(cols):
                self.A[i,j] = (self.A[i,j] / WordsPerDoc[j]) * log(float(cols) / DocsPerWord[i])


Part 3 - Using the Singular Value Decomposition

Once we have built our (words by titles) matrix, we call upon a powerful but little known technique called Singular Value Decomposition or SVD to analyze the matrix for us. The "Singular Value Decomposition Tutorial" is a gentle introduction for readers that want to learn more about this powerful and useful algorithm.

The reason SVD is useful is that it finds a reduced dimensional representation of our matrix that emphasizes the strongest relationships and throws away the noise. In other words, it makes the best possible reconstruction of the matrix with the least possible information. To do this, it throws out noise, which does not help, and emphasizes strong patterns and trends, which do help. The trick in using SVD is in figuring out how many dimensions or "concepts" to use when approximating the matrix. Too few dimensions and important patterns are left out, too many and noise caused by random word choices will creep back in.
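The idea of "the best possible reconstruction with the least possible information" can be made concrete with a small sketch (my own illustration, not part of the original code): keep only the k largest singular values and rebuild an approximation of the matrix from them.

import numpy as np

# A minimal sketch of a rank-k approximation, assuming A is a word-by-title
# count matrix such as the one built by the LSA class (e.g. mylsa.A).
def rank_k_approximation(A, k):
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep the k largest singular values and the matching columns of U / rows of Vt.
    return np.dot(U[:, :k], np.dot(np.diag(S[:k]), Vt[:k, :]))

# For example, a rank-3 approximation keeps the strongest patterns and drops the rest:
# A3 = rank_k_approximation(mylsa.A, 3)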

The SVD algorithm is a little involved, but fortunately Python has a library function that makes it simple to use. By adding the one line method below to our LSA class, we can factor our matrix into 3 other matrices. The U matrix gives us the coordinates of each word on our "concept" space, the Vt matrix gives us the coordinates of each document in our "concept" space, and the S matrix of singular values gives us a clue as to how many dimensions or "concepts" we need to include.

    def calc(self):
        self.U, self.S, self.Vt = svd(self.A)

In order to choose the right number of dimensions to use, we can make a histogram of the square of the singular values. This graphs the importance each singular value contributes to approximating our matrix. Here is the histogram in our example.



[Figure: Singular Value Importance]
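Here is one way such a histogram could be produced (my own sketch, assuming matplotlib is installed and mylsa.calc() has been called so that mylsa.S holds the singular values, largest first):

import matplotlib.pyplot as plt

# The squared singular values show how much each dimension contributes
# to approximating the count matrix.
importance = mylsa.S ** 2
plt.bar(range(1, len(importance) + 1), importance)
plt.xlabel("Singular value number")
plt.ylabel("Importance (squared singular value)")
plt.title("Singular Value Importance")
plt.show()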

For a large collection of documents, 100 to 500 dimensions are typically used. In our small example, since we want to be able to draw a clear picture, we use only 3 dimensions, throw away the first dimension, and plot the second and third dimensions.

Why do we throw away the first dimension? For documents, the first dimension correlates with the length of the document; for words, it correlates with the number of times the word appears across all documents. If we centered the matrix, by subtracting the average value of each column from that column, then we could use the first dimension.

However, centering is usually not done for LSI, because it turns a sparse matrix into a dense one, which greatly increases memory use and computation time. It is more efficient not to center the matrix and simply discard the first dimension.
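For completeness, if you did want to center the matrix (which we skip here), a minimal sketch might look like the following; note that the result is a dense matrix:

from numpy import mean

# A hypothetical helper, not used in this tutorial: subtract each column's
# average from that column. After centering, the first SVD dimension becomes
# meaningful, but the matrix is no longer sparse.
def center_columns(A):
    return A - mean(A, axis=0)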

Here is the complete 3-dimensional Singular Value Decomposition of our matrix. Each word has three numbers associated with it, one for each dimension. The first number for a word tends to correspond to the number of times that word appears in all the titles, so it is not as informative as the second and third dimensions. Similarly, each title has three numbers associated with it, one for each dimension. Here too, the first dimension corresponds roughly to the number of index words in the title, i.e. its length, so it is also discarded.

book 0.15 -0.27 0.04
dads 0.24 0.38 -0.09
dummies 0.13 -0.17 0.07
estate 0.18 0.19 0.45
guide 0.22 0.09 -0.46
investing 0.74 -0.21 0.21
market 0.18 -0.30 -0.28
real 0.18 0.19 0.45
rich 0.36 0.59 -0.34
stock 0.25 -0.42 -0.28
value 0.12 -0.14 0.23
*

3.91 0 0
0 2.61 0
0 0 2.00
*

T1 T2 T3 T4 T5 T6 T7 T8 T9
0.35 0.22 0.34 0.26 0.22 0.49 0.28 0.29 0.44
-0.32 -0.15 -0.46 -0.24 -0.14 0.55 0.07 -0.31 0.44
-0.41 0.14 -0.16 0.25 0.22 -0.51 0.55 0.00 0.34


Part 4 - Clustering by Color

We can convert these numbers into colors: blue for negative values, red for positive values, and white for values close to zero:

[Figure: Top 3 Dimensions of Book Titles]

We can use these colors to cluster the titles. We ignore the first dimension for clustering because all titles are red. In the second dimension, we have the following result.

Dim2 Titles
red 6-7, 9
blue 1-5, 8

Using the third dimension, we can split each of these groups again the same way. For example, looking at the third dimension, title 6 is blue, but title 7 and title 9 are still red. Doing this for both groups, we end up with these 4 groups.

Dim2 Dim3 Titles
red red 7, 9
red blue 6
blue red 2, 4-5, 8
blue blue 1, 3
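This grouping can also be reproduced programmatically from the signs of the second and third rows of Vt (a small sketch of my own, assuming mylsa.calc() has been run and that the signs come out as in the tables above; SVD sign conventions can flip an entire row):

# Group titles by the sign of their 2nd and 3rd dimension values.
# Positive corresponds to "red" and negative to "blue" in the tables above.
groups = {}
for j in range(mylsa.Vt.shape[1]):
    key = ('red' if mylsa.Vt[1, j] >= 0 else 'blue',
           'red' if mylsa.Vt[2, j] >= 0 else 'blue')
    groups.setdefault(key, []).append(j + 1)  # +1 so titles are numbered T1..T9
print(groups)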

It’s interesting to compare this table with what we get when we graph the results in the next section.

Part 5 - Clustering by Value

Leaving out the first dimension, as we discussed, let's graph the second and third dimensions using an XY graph. We'll put the second dimension on the X axis and the third dimension on the Y axis and graph each word and title. It's interesting to compare the XY graph with the table we just created that clusters the documents.

In the graph below, words are represented by red squares and titles are represented by blue circles. For example the word "book" has dimension values (0.15, -0.27, 0.04). We ignore the first dimension value 0.15 and graph "book" to position (x = -0.27, y = 0.04) as can be seen in the graph. Titles are similarly graphed.
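Here is a sketch of how such a graph could be drawn with matplotlib (my own code, assuming mylsa.calc() has been called; the original article only shows the finished figure):

import matplotlib.pyplot as plt

# Words: columns 1 and 2 of U hold the 2nd and 3rd dimensions (column 0 is ignored).
plt.scatter(mylsa.U[:, 1], mylsa.U[:, 2], marker='s', color='red')
for i, word in enumerate(mylsa.keys):
    plt.annotate(word, (mylsa.U[i, 1], mylsa.U[i, 2]))

# Titles: rows 1 and 2 of Vt hold the 2nd and 3rd dimensions.
plt.scatter(mylsa.Vt[1, :], mylsa.Vt[2, :], marker='o', color='blue')
for j in range(mylsa.Vt.shape[1]):
    plt.annotate('T%d' % (j + 1), (mylsa.Vt[1, j], mylsa.Vt[2, j]))

plt.xlabel("Dimension 2")
plt.ylabel("Dimension 3")
plt.show()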

[Figure: XY graph of the words and titles in the second and third dimensions]

One advantage of this technique is that both words and titles are placed on the same graph. Not only can we identify clusters of titles, but we can also label the clusters by looking at what words are also in the cluster. For example, the lower left cluster has titles 1 and 3 which are both about stock market investing. The words "stock" and "market" are conveniently located in the cluster, making it easy to see what the cluster is about. Another example is the middle cluster which has titles 2, 4, 5, and, to a somewhat lesser extent, title 8. Titles 2, 4, and 5 are close to the words "value" and "investing" which summarizes those titles quite well.


Advantages, Disadvantages, and Applications of LSI

Advantages:

1. Documents and words are mapped into the same concept space. In this space we can cluster documents, cluster words, and, more importantly, search for documents given a set of words and vice versa (see the sketch after this list).

2. The concept space has far fewer dimensions than the original matrix, and those dimensions carry the most important information with the least noise, so the concept space is a good input for other algorithms, for example for trying out different clustering algorithms.

3. LSI is a global algorithm: it looks for trends and patterns across all the words and all the documents, so it can find information that local algorithms may miss. It can also be combined with local algorithms, such as nearest neighbours, to become even more useful.
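For instance, one simple way to use point 1 above (a sketch of my own, not covered in the original article) is to compare a word and a title by the cosine similarity of their coordinates in the concept space, in the same spirit as the XY graph above:

import numpy as np

# Assumes mylsa.calc() has been run, so U, S, and Vt are available.
def cosine(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

word_vec  = mylsa.U[mylsa.keys.index('stock'), :3]   # the word "stock" in concept space
title_vec = mylsa.Vt[:3, 0]                          # title T1 in concept space
print(cosine(word_vec, title_vec))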


Disadvantages:

1. LSI assumes a Gaussian distribution and the Frobenius norm, which may not fit all data. For example, word counts in documents follow a Poisson distribution rather than a Gaussian one.

2. LSI assumes that each word has a single meaning, so it cannot handle polysemy (words with multiple meanings).

3. LSI depends on SVD, which is computationally expensive and hard to update as new documents arrive.


Despite these drawbacks, LSI is widely used, for example for finding and organizing search results, clustering documents, spam filtering, speech recognition, patent search, and automated essay grading.


