rokia_xmu

PCA and K-means Clustering of Delta Aircraft

June 22, 2014

By Myles Harrison

name="f149933c48" width="100px" height="21px" frameborder="0" allowtransparency="true" allowfullscreen="true" scrolling="no" title="fb:like Facebook Social Plugin" src="http://www.facebook.com/plugins/like.php?app_id=&channel=http%3A%2F%2Fstatic.ak.facebook.com%2Fconnect%2Fxd_arbiter%2FwjDNIDNrTQG.js%3Fversion%3D41%23cb%3Df2819fab98%26domain%3Dwww.r-bloggers.com%26origin%3Dhttp%253A%252F%252Fwww.r-bloggers.com%252Ff45965f68%26relation%3Dparent.parent&container_width=0&height=21&href=http%3A%2F%2Fwww.r-bloggers.com%2Fpca-and-k-means-clustering-of-delta-aircraft%2F&layout=button_count&locale=en_US&sdk=joey&send=true&show_faces=false&width=100" style="position: absolute; border-style: none; border-width: initial; visibility: visible; width: 0px; height: 0px;">

(This article was first published on everyday analytics, and kindly contributed to R-bloggers)

Introduction

I work in consulting. If you’re a consultant at a certain type of company, agency, organization, consultancy, whatever, this can sometimes mean travelling a lot.

Many business travellers ‘in the know’ have heard the old joke that if you want to stay at any type of hotel anywhere in the world and get a great rate, all you have to do is say that you work for IBM.

The point is that my line of business requires travel, and sometimes that is a lot of the time, like say almost all of last year. Inevitable comparisons to George Clooney’s character in Up in the Air were made (ironically I started to read that book, then left it on a plane in a seatback pocket), requests about favours involving duty free, and of course many observations and gently probing questions about frequent flier miles (FYI I’ve got more than most people, but a lot less than the entrepreneur I sat next to one time, who claimed to have close to 3 million).

But I digress.

Background

The point is that, as I said, I spent quite a bit of time travelling for work last year. Apparently the story with frequent fliers miles is that it’s best just to pick one airline and stick with it – and this also worked out well as most companies, including my employer, have preferred airlines and so you often don’t have much of a choice in the matter.

In my case this means flying Delta.

So I happened to notice in one of my many visits to Delta’s website that they have data on all of their aircraft in a certain site section. I thought this would be an interesting data set on which to do some analysis, as it has both quantitative and qualitative information and is relatively complex. What can we say about the different aircraft in Delta’s fleet, coming at it with ‘fresh eyes’? Which planes are similar? Which are dissimilar?

Aircraft data card from Delta.com

The data set comprises 33 variables on 44 aircraft taken from Delta.com, including both quantitative measures on attributes like cruising speed, accommodation and range in miles, as well as categorical data on, say, whether a particular aircraft has Wi-Fi or video. These binary categorical variables were transformed into quantitative variables by assigning them values of either 1 or 0, for yes or no respectively.

Analysis

As this a data set of many variables (33) I thought this would be an interesting opportunity to practice using a dimensionality reduction method to make the information easier to visualize and analyze.

First let’s just look at the intermediary quantitative variables related to the aircraft physical characteristics: cruising speed, total accommodation, and other quantities like length and wingspan. These variables are about in the middle of the data frame, so we can visualize all of them at once using a scatterplot matrix, which is the default for R’s output if plot() is called on a dataframe.

data <- read.csv(file="delta.csv", header=T, sep=",", row.names=1)

# scatterplot matrix of intermediary (size/non-categorical) variables
plot(data[,16:22])

We can see that there are pretty strong positive correlations between all these variables, as all of them are related to the aircraft’s overall size. Remarkably there is an almost perfectly linear relationship between wingspan and tail height, which perhaps is related to some principle of aeronautical engineering of which I am unaware.

The exception here is the variable right in the middle which is the number of engines. There is one lone outlier [Boeing 747-400 (74S)] which has four, while all the other aircraft have two. In this way the engines variable is really more like a categorical variable, but we shall as the analysis progresses that this is not really important, as there are other variables which more strongly discern the aircraft from one another than this.

How do we easier visualize a high-dimensional data set like this one? By using a dimensionality reduction technique like principal components analysis.

Principal Components Analysis

Next let’s say I know nothing about dimensionality reduction techniques and just naively apply principle components to the data in R:

# Naively apply principal components analysis to raw data and plot
pc <- princomp(data)
plot(pc)

Taking that approach we can see that the first principal component has a standard deviation of around 2200 and accounts for over 99.8% of the variance in the data. Looking at the first column of loadings, we see that the first principle component is just the range in miles.

# First component dominates greatly. What are the loadings?
summary(pc) # 1 component has > 99% variance
loadings(pc) # Can see all variance is in the range in miles

Importance of components:
Comp.1 Comp.2 Comp.3 Comp.4
Standard deviation 2259.2372556 6.907940e+01 2.871764e+01 2.259929e+01
Proportion of Variance 0.9987016 9.337038e-04 1.613651e-04 9.993131e-05
Cumulative Proportion 0.9987016 9.996353e-01 9.997966e-01 9.998966e-01


Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8
Seat.Width..Club. -0.144 -0.110
Seat.Pitch..Club. -0.327 -0.248 0.189
Seat..Club.
Seat.Width..First.Class. 0.250 -0.160 -0.156 0.136
Seat.Pitch..First.Class. 0.515 -0.110 -0.386 0.112 -0.130 0.183
Seats..First.Class. 0.258 -0.124 -0.307 -0.109 0.160 0.149
Seat.Width..Business. -0.154 0.142 -0.108
Seat.Pitch..Business. -0.514 0.446 -0.298 0.154 -0.172 0.379
Seats..Business. -0.225 0.187
Seat.Width..Eco.Comfort. 0.285 -0.224
Seat.Pitch..Eco.Comfort. 0.159 0.544 -0.442
Seats..Eco.Comfort. 0.200 -0.160
Seat.Width..Economy. 0.125 0.110
Seat.Pitch..Economy. 0.227 0.190 -0.130
Seats..Economy. 0.597 -0.136 0.345 -0.165 0.168
Accommodation 0.697 -0.104 0.233
Cruising.Speed..mph. 0.463 0.809 0.289 -0.144 0.115
Range..miles. 0.999
Engines
Wingspan..ft. 0.215 0.103 -0.316 -0.357 -0.466 -0.665
Tail.Height..ft. -0.100 -0.187
Length..ft. 0.275 0.118 -0.318 0.467 0.582 -0.418
Wifi
Video
Power
Satellite
Flat.bed
Sleeper
Club
First.Class
Business
Eco.Comfort

Economy

This is because the scale of the different variables in the data set is quite variable; we can see this by plotting the variance of the different columns in the data frame (regular scaling on the left, logarithmic on the right):

# verify by plotting variance of columns
mar <- par()$mar
par(mar=mar+c(0,5,0,0))
barplot(sapply(data, var), horiz=T, las=1, cex.names=0.8)
barplot(sapply(data, var), horiz=T, las=1, cex.names=0.8, log='x')
par(mar=mar)

We correct for this by scaling the data using the scale()function. We can then verify that the variances across the different variables are equal so that when we apply principal components one variable does not dominate.

# Scale
data2 <- data.frame(scale(data))
# Verify variance is uniform
plot(sapply(data2, var))

After applying the scale() function the variance is now constant across variables

Now we can apply principal components to the scaled data. Note that this can also be done automatically in call to theprcomp() function by setting the parameter scale=TRUE. Now we see a result which is more along the lines of something we would expect:

# Proceed with principal components
pc <- princomp(data2)
plot(pc)
plot(pc, type='l')
summary(pc) # 4 components is both 'elbow' and explains >85% variance

Great, so now we’re in business. There are various rules of thumb for selecting the number of principal components to retain in an analysis of this type, two of which I’ve read about are:

Pick the number of components which explain 85% or greater of the variation
Use the ‘elbow’ method of the scree plot (on right)

Here we are fortunate in that these two are the same, so we will retain the first four principal components. We put these into new data frame and plot.

# Get principal component vectors using prcomp instead of princomp
pc <- prcomp(data2)

# First for principal components
comp <- data.frame(pc$x[,1:4])
# Plot
plot(comp, pch=16, col=rgb(0,0,0,0.5))

So what were are looking at here are twelve 2-D projections of data which are in a 4-D space. You can see there’s a clear outlier in all the dimensions, as well as some bunching together in the different projections.

Normally, I am a staunch opponent of 3D visualization, as I’ve spoken strongly about previously . The one exception to this rule is when the visualization is interactive, which allows the user to explore the space and not lose meaning due to three dimensions being collapsed into a 2D image. Plus, in this particular case, it’s a good excuse to use the very cool, very awesome rgl package.

Click on the images to view the interactive 3D versions (requires a modern browser). You can better see in the 3D projections that the data are confined mainly to the one plane one the left (components 1-3), with the exception of the outlier, and that there is also bunching in the other dimensions (components 1,3,4 on right).

library(rgl)
# Multi 3D plot
plot3d(comp$PC1, comp$PC2, comp$PC3)
plot3d(comp$PC1, comp$PC3, comp$PC4)

So, now that we’ve simplified the complex data set into a lower dimensional space we can visualize and work with, how do we find patterns in the data, in our case, the aircraft which are most similar? We can use a simple unsupervised machine learning technique like clustering.

Cluster Analysis

Here because I’m not a data scientist extraordinaire, I’ll stick to the simplest technique and do a simple k-means – this is pretty straightforward to do in R.

First how do we determine the number of clusters? The simplest method is to look at the within groups sum of squares and pick the ‘elbow’ in the plot, similar to as with the scree plot we did for the PCA previously. Here I used the code from R in Action:

# Determine number of clusters
wss <- (nrow(mydata)-1)*sum(apply(mydata,2,var))
for (i in 2:15) wss[i] <- sum(kmeans(mydata,
                                     centers=i)$withinss)
plot(1:15, wss, type="b", xlab="Number of Clusters",
     ylab="Within groups sum of squares")

However, it should be noted that it is very important to set the nstart parameter and iter.max parameter (I’ve found 25 and 1000, respectively to be okay values to use), which the example in Quick-R fails to do, otherwise you can get very different results each time you run the algorithm, as below.


Clustering without the nstart parameter can lead to variable results for each run


Clustering with the nstart and iter.max parameters leads to consistent results, allowing proper interpretation of the scree plot

So here we can see that the “elbow” in the scree plot is at k=4, so we apply the k-means clustering function with k = 4 and plot.

# From scree plot elbow occurs at k = 4
# Apply k-means with k=4
k <- kmeans(comp, 4, nstart=25, iter.max=1000)
library(RColorBrewer)
library(scales)
palette(alpha(brewer.pal(9,'Set1'), 0.5))
plot(comp, col=k$clust, pch=16)

We can see that the one outlier is in its own cluster, there’s 3 or 4 in the other and the remainder are split into two clusters of greater size. We visualize in 3D below, as before (click for interactive versions):

# 3D plot
plot3d(comp$PC1, comp$PC2, comp$PC3, col=k$clust)
plot3d(comp$PC1, comp$PC3, comp$PC4, col=k$clust)

We look at the exact clusters below, in order of increasing size:

# Cluster sizes
sort(table(k$clust))
clust <- names(sort(table(k$clust)))

# First cluster
row.names(data[k$clust==clust[1],])
# Second Cluster
row.names(data[k$clust==clust[2],])
# Third Cluster
row.names(data[k$clust==clust[3],])
# Fourth Cluster
row.names(data[k$clust==clust[4],])

[1] “Airbus A319 VIP”

[1] “CRJ 100/200 Pinnacle/SkyWest” “CRJ 100/200 ExpressJet”
[3] “E120″ “ERJ-145″

[1] “Airbus A330-200″ “Airbus A330-200 (3L2)”
[3] “Airbus A330-200 (3L3)” “Airbus A330-300″
[5] “Boeing 747-400 (74S)” “Boeing 757-200 (75E)”
[7] “Boeing 757-200 (75X)” “Boeing 767-300 (76G)”
[9] “Boeing 767-300 (76L)” “Boeing 767-300 (76T)”
[11] “Boeing 767-300 (76Z V.1)” “Boeing 767-300 (76Z V.2)”
[13] “Boeing 767-400 (76D)” “Boeing 777-200ER”
[15] “Boeing 777-200LR”

[1] “Airbus A319″ “Airbus A320″ “Airbus A320 32-R”
[4] “Boeing 717″ “Boeing 737-700 (73W)” “Boeing 737-800 (738)”
[7] “Boeing 737-800 (73H)” “Boeing 737-900ER (739)” “Boeing 757-200 (75A)”
[10] “Boeing 757-200 (75M)” “Boeing 757-200 (75N)” “Boeing 757-200 (757)”
[13] “Boeing 757-200 (75V)” “Boeing 757-300″ “Boeing 767-300 (76P)”
[16] “Boeing 767-300 (76Q)” “Boeing 767-300 (76U)” “CRJ 700″
[19] “CRJ 900″ “E170″ “E175″
[22] “MD-88″ “MD-90″ “MD-DC9-50″

The first cluster contains a single aircraft, the Airbus A319 VIP. This plane is on its own and rightly so – it is not part of Delta’s regular fleet but one of Airbus’ corporate jets. This is a plane for people with money, for private charter. It includes “club seats” around tables for working (or not). Below is a picture of the inside of the A319 VIP:

Ahhh, that’s the way fly (some day, some day…). This is apparently the plane professional sports teams and the American military often charter to fly – this article in the Sydney Morning Herald has more details.

The second cluster contains four aircraft – the two CRJ 100/200’s and the Embraer E120 and ERJ-145. These are the smallest passenger aircraft, with the smallest accommodations – 28 for the E120 and 50 for the remaining craft. As such, there is only economy seating in these planes which is what distinguishes them from the remainder of the fleet. The E120 also has the distinction of being the only plane in the fleet with turboprops. Photos below.

Top: CRJ100/200. Bottom left: Embraer E120. Bottom right: Embraer ERJ-145.

I’ve flown many times in the venerable CRJ 100/200 series planes, in which I can assure you there is only economy seating, and which I like to affectionately refer to as “little metal tubes of suffering.”

The other two clusters comprise the remainder of the fleet, the planes with which most commercial air travellers are familiar – your Boeing 7-whatever-7’s and other Airbus and McDonnell-Douglas planes.

These are split into two clusters, which seem to again divide the planes approximately by size (both physical and accommodation), though there is crossover in the Boeing craft.

# Compare accommodation by cluster in boxplot
boxplot(data$Accommodation ~ k$cluster,
        xlab='Cluster', ylab='Accommodation',
        main='Plane Accommodation by Cluster')

# Compare presence of seat classes in largest clusters
data[k$clust==clust[3],30:33]
data[k$clust==clust[4],30:33]

	First.Class	Business	Eco.Comfort	Economy
Airbus A330-200	0	1	1	1
Airbus A330-200 (3L2)	0	1	1	1
Airbus A330-200 (3L3)	0	1	1	1
Airbus A330-300	0	1	1	1
Boeing 747-400 (74S)	0	1	1	1
Boeing 757-200 (75E)	0	1	1	1
Boeing 757-200 (75X)	0	1	1	1
Boeing 767-300 (76G)	0	1	1	1
Boeing 767-300 (76L)	0	1	1	1
Boeing 767-300 (76T)	0	1	1	1
Boeing 767-300 (76Z V.1)	0	1	1	1
Boeing 767-300 (76Z V.2)	0	1	1	1
Boeing 767-400 (76D)	0	1	1	1
Boeing 777-200ER	0	1	1	1
Boeing 777-200LR	0	1	1	1

	First.Class	Business	Eco.Comfort	Economy
Airbus A319	1	0	1	1
Airbus A320	1	0	1	1
Airbus A320 32-R	1	0	1	1
Boeing 717	1	0	1	1
Boeing 737-700 (73W)	1	0	1	1
Boeing 737-800 (738)	1	0	1	1
Boeing 737-800 (73H)	1	0	1	1
Boeing 737-900ER (739)	1	0	1	1
Boeing 757-200 (75A)	1	0	1	1
Boeing 757-200 (75M)	1	0	1	1
Boeing 757-200 (75N)	1	0	1	1
Boeing 757-200 (757)	1	0	1	1
Boeing 757-200 (75V)	1	0	1	1
Boeing 757-300	1	0	1	1
Boeing 767-300 (76P)	1	0	1	1
Boeing 767-300 (76Q)	1	0	1	1
Boeing 767-300 (76U)	0	1	1	1
CRJ 700	1	0	1	1
CRJ 900	1	0	1	1
E170	1	0	1	1
E175	1	0	1	1
MD-88	1	0	1	1
MD-90	1	0	1	1
MD-DC9-50	1	0	1	1

Looking at the raw data, the difference I can ascertain between the largest two clusters is that all the aircraft in the one have first class seating, whereas all the planes in the other have business class instead [the one exception being the Boeing 767-300 (76U)].

Conclusions

This was a little analysis which for me not only allowed me to explore my interest in commercial aircraft, but was also educational about finer points of what to look out for when using more advanced data science techniques like principal components, clustering and advanced visualization.

All in all, the techniques did a pretty admirable job in separating out the different type of aircraft into distinct categories. However I believe the way I structured the data may have biased it towards categorizing the aircraft by seating class, as that quality was replicated in the data set compared to other variables, being represented both in quantitative variables (seat pitch & width, number of seat in class) and categorical (class presence). So really the different seating classes where represented in triplicate within the data set compared to other variables, which is why the methods separated the aircraft in this way.

If I did this again, I would structure the data differently and see what relationships such analysis could draw out using only select parts of the data (e.g. aircraft measurements only). The interesting lesson here is that it when using techniques like dimensionality reduction and clustering it is not only important to be mindful of applying them correctly, but also what variables are in your data set and how they are represented.

For now I’ll just keep on flying, collecting the miles, and counting down the days until I finally get that seat in first class.

References & Resources

Delta Fleet at Delta.com

http://www.delta.com/content/www/en_US/traveling-with-us/airports-and-aircraft/Aircraft.html

Principal Components Analysis (Wikipedia):
http://en.wikipedia.org/wiki/Principal_components_analysis

The Little Book of R for Multivariate Analysis

http://little-book-of-r-for-multivariate-analysis.readthedocs.org/en/latest/src/multivariateanalysis.html

Quick R: Cluster Analysis

http://www.statmethods.net/advstats/cluster.html

Plane Luxury: how US sports stars fly (Syndney Morning Herald)

http://www.smh.com.au/travel/travel-planning/travel-news/plane-luxury-how-us-sports-stars-fly-20110106-19h6y.html

Code & Data on github
https://github.com/mylesmharrison/delta_PCA_kmeans

android系统selinux中添加新属性property 辉色投像
1.定位/android/system/sepolicy/private/property_contexts声明属性开头：persist.charge声明属性类型：u:object_r:system_prop:s0图12.定位到android/system/sepolicy/public/domain.te删除neverallow{domain-init}default_prop:property
四章-32-点要素的聚合彩云飘过
本文基于腾讯课堂老胡的课《跟我学Openlayers--基础实例详解》做的学习笔记，使用的openlayers5.3.xapi。源码见1032.html，对应的官网示例https://openlayers.org/en/latest/examples/cluster.htmlhttps://openlayers.org/en/latest/examples/earthquake-clusters.
用Python实现读取统计单词个数程序媛了了 python 游戏 java
完整实例代码：fromcollectionsimportCounterdefpythonit():danci={}withopen("pythonit.txt","r",encoding="utf-8")asf:foriinf:words=i.strip().split()forwordinwords:ifwordnotindanci:danci[word]=1else:danci[word]+=
光盘文件系统 (iso9660) 格式解析穷人小水滴光盘文件系统 iso9660 deno GNU/Linux javascript
越简单的系统,越可靠,越不容易出问题.光盘文件系统(iso9660)十分简单,只需不到200行代码,即可实现定位读取其中的文件.参考资料:https://wiki.osdev.org/ISO_9660相关文章:《光盘防水嘛?DVD+R刻录光盘泡水实验》https://blog.csdn.net/secext2022/article/details/140583910《光驱的内部结构及日常使用》ht
《跃迁》5/7-5组-橙子-张静12.16 静言物于
【便签5】【片段来源】《跃迁：成为高手的技术》第四章【R原文】一位客户咨询时抱怨：“这个我做不到。”我问他：“如果我请你现在出去裸奔，你能做到吗？”“这个我也做不到”“其实并不是做不到，而是不愿意做，或者不想承担裸奔的代价吧。你不是做不到，而是选择不去做。如果有一天你裸奔能救自己家人、孩子，也许就能做到了。”为什么要做这个区分？如果一个人经常和自己说“做不到”，他的能力范围会越来越小，会成为一个无
✔2848. 与车相交的点程序员小小聪力扣 leetcode
代码实现：方法一：哈希表#definefmax(a,b)((a)>(b)?(a):(b))intnumberOfPoints(int**nums,intnumsSize,int*numsColSize){inthash[101]={0};intmax=0;for(inti=0;i=x){j--;}if(i=nums[i][0]){r=r>nums[i][1]?r:nums[i][1];}else{
02-Cesium聚合分析EntityCluster完整代码 fxshy html css javascript
1.完整代码Document-->-->Cesium.Ion.defaultAccessToken='eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJqdGkiOiJhZjZkZDAwZC1mNTFhLTRhOTEtOGExNi00MzRhNGIzMDdlNDQiLCJpZCI6MTA1MTUzLCJpYXQiOjE2NjA4MDg0Njd9.qajeJtc4-kp
【加密算法基础——RSA 加密】 XWWW668899 网络服务器笔记 python
RSA加密RSA（Rivest-Shamir-Adleman）加密是非对称加密，一种广泛使用的公钥加密算法，主要用于安全数据传输。公钥用于加密，私钥用于解密。RSA加密算法的名称来源于其三位发明者的姓氏：R:RonRivestS:AdiShamirA:LeonardAdleman这三位计算机科学家在1977年共同提出了这一算法，并发表了相关论文。他们的工作为公钥加密的基础奠定了重要基础，使得安全通
Acwing 区间合并 Curry_Math 算法学习算法 c++开发语言
区间合并主要思想：给定很多区间。若两个区间有交集，将二者合并成一个区间。具体做法:先按照区间的左端点进行排序然后遍历每个区间，根据不同的情况进行合并，有一下几种情况：第一种情况，区间不变；第二种情况，end更新为区间i的右端点；以上两种情况，可以归结为end更新为max（end，r）;r为区间右端点第三种情况，将当前维护的区间加入结果，并将维护的区间更新为区间i；下面给出区间合并的板子：//区间合
Android shell 常用 debug 命令晨春计 Audio debug android linux
目录1、查看版本2、am命令3、pm命令4、dumpsys命令5、sed命令6、log定位查看APK进程号7、log定位使用场景1、查看版本1.1、Android串口终端执行getpropro.build.version.release#获取Android版本uname-a#查看linux内核版本信息uname-r#单独查看内核版本1.2、linux服务器执行lsb_release-a#查看Lin
k8s中Service暴露的种类以及用法听说唐僧不吃肉 K8S kubernetes 容器云原生
一、说明在Kubernetes中，有几种不同的方式可以将服务（Service）暴露给外部流量。这些方式通过定义服务的spec.type字段来确定。二、详解1.ClusterIP定义：默认类型，服务只能在集群内部访问。作用：通过集群内部IP地址暴露服务。示例：spec:type:ClusterIPports:-port:80targetPo
Windows安装ciphey编码工具，附一道ciscn编码题例 im-Miclelson CTF工具网络安全
TA是什么一款智能化的编码分析解码工具，对于CTF中复杂性编码类题目可以快速攻破。编码自动分析解码的神器。如何安装Windows环境Python3.864位（最新的版本不兼容，32位的也不行）PIP直接安装pipinstallciphey-ihttps://pypi.mirrors.ustc.edu.cn/simple/安装后若是出现报错请根据错误代码行数找到对应文件，r修改成rb即可。使用标准语
linux简单安装gcc和gdb chn-zgq Linux linux ubuntu
linux安装gcc以及环境配置和gdb安装gcc-10.0添加源:sudoadd-apt-repositoryppa:ubuntu-toolchain-r/ppa更新源:sudoaptupdate下载gcc:sudoaptinstallgcc-10g++-10默认GCC版本设置为gcc-10.0:sudoupdate-alternatives--install/usr/bin/gccgcc/us
梧桐数据库（WuTongDB）：数据库技术中都有哪些常见的优化器鲁鲁517 梧桐数据库梧桐数据库
以下是一些常见的数据库优化器：1.CBO（Cost-BasedOptimizer）应用场景：广泛应用于关系型数据库中，如Oracle、PostgreSQL、MySQL等。工作原理：通过计算不同执行计划的代价（如CPU、I/O等资源消耗），选择最低代价的执行计划。代表数据库：Oracle、PostgreSQL、MySQL。特点：CBO使用统计信息（如表大小、索引分布）来评估查询的代价。2.RBO（R
【机器人建模和控制】读书笔记 Piccab0o 机器人
机器人建模和控制——马克·斯庞A.x10=x1∙x0x^0_1=x_1\bulletx_0x10=x1∙x0，其实就是：1）x1x_1x1轴向量在O0O_0O0系下的坐标2）在x0x_0x0轴上的投影3）坐标变换矩阵的R10R_1^0R10的第一个元素B.点p在o1x1y1z1o_1x_1y_1z_1o1x1y1z1系下的坐标p1p^1p1可以表示为：p=ux1+vy1+wz1p=ux_1+vy_
Python和R均方根误差平均绝对误差算法模型亚图跨际 Python 交叉知识 R 回归模型误差指标归一化均方根误差生态状态指标神经网络成本误差气体排放气候模型多项式拟合
要点回归模型误差评估指标归一化均方根误差生态状态指标神经网络成本误差计算气体排放气候算法模型Python误差指标均方根误差和平均绝对误差均方根偏差或均方根误差是两个密切相关且经常使用的度量值之一，用于衡量真实值或预测值与观测值或估计值之间的差异。估计器θ^\hat{\theta}θ^相对于估计参数θ\thetaθ的RMSD定义为均方误差的平方根：RMSD⁡(θ^)=MSE⁡(θ^)=E((θ^−θ
PCIe进阶之TL：Common Packet Header Fields & TLPs with Data Payloads Rules 芯芯之火，可以燎原 PCIe进阶 PCIe进阶硬件工程信息与通信
1TransactionLayerProtocol-PacketDefinitionTLP有四种事务类型：Memory、I/O、Configuration和Messages，两种地址格式：32bit和64bit。构成TLP时，所有标记为Reserved的字段（有时缩写为R）都必须全为0。接收者Rx必须忽略此字段中的值，PCIeSwitch必须对其进行原封不动的转发。请注意，对于某些字段，既有指定值
python下载pandas库镜像_下载pandas库 weixin_39791152
背景交代：在下载matplotlib库时，我已经将pip的下载源手动更改为清华的镜像，所以，如果有小伙伴在下载库遇到问题，如timeout，请先将下载源改为国内镜像，具体操作见我的另一篇文章：今天的主题是安装pandas库~首先，按田字格+R，打开cmd，输入：pipinstallpandas嗯，不出所料地报错了……主要原因：pip._vendor.urllib3.exceptions.ReadT
用了这么多年的PCA可视化竟然是错的！！！生信宝典
本文启发于上周开的单细胞转录组课程，本次课程由资深单细胞算法研究者戴老师主讲，深入浅出，各部分分析原理从理论到应用层面解释透彻，最新流程，最新代码，绝对值得学习。课程尚未结束，我就迫不及待向一位未能安排出时间参加此课程的老友及时安利了视频课。言归正传，介绍培训课程的一张幻灯片：很多PCA可视化结果都是不合适的。PCA或PCoA是常用的降维工具，之前有几篇文章介绍PCA的原理和可视化。一文看懂PCA
FlexibleBI系统是现代制造企业提升生产质量和效率的重要工具三坐标CMM质量数据系统制造
SPC（统计过程控制）系统是现代制造企业提升生产质量和效率的重要工具。我们的SPC系统通过一键生成全面的SPC分析报告，帮助企业快速、精准地完成质量分析，并大大减少了手动处理数据的复杂性。FlexibleBI实时更新的控制图在生产过程中，控制图可以实时自动更新，确保企业能够随时掌握生产状态，及时发现并处理潜在问题。系统支持多种标准SPC控制图，如X-bar、R、P等图表，全面覆盖所有常见生产场景。
ResNet的半监督和半弱监督模型 Valar_Morghulis
Billion-scalesemi-supervisedlearningforimageclassificationhttps://arxiv.org/pdf/1905.00546.pdfhttps://github.com/facebookresearch/semi-supervised-ImageNet1K-models/权重在timm中也有：https://hub.fastgit.org/r
node初奶瓶SAMA
www.nodejs.org下载nodejs的安装文件,然后就直接下一步，下一步，下一步傻瓜式安装（打开命令符widow+r输入cmd）node-v查单当前node的版本号安装nodejs时，会自动安装npm包管理工具npm-v查看npm的版本可以直接在黑窗口中输入node然后点击回车以后，就可以输入javascripnt的代码了既然在浏览器鼠标右键中console和在黑窗口中输入node点击回车
ros2中使用launch.xml启动时，怎么在命令行里设置参数，或者加载参数文件（params.yaml） code . Autoware 自动驾驶 ROS2 xml Ros2 自动驾驶机器人
在ROS2中使用launch.xml启动时，可以通过命令行设置参数或加载参数文件（如params.yaml）。以下是具体的方法：1.在命令行中设置参数你可以在运行ros2launch命令时直接设置参数，使用key:=value的语法。例如：ros2launchparam_name:=param_value例如，如果你有一个参数background_r，你可以这样设置：ros2launchmy_pa
【机器学习与R语言】1-机器学习简介苹果酱0567 面试题汇总与解析 java 中间件开发语言 spring boot 后端
1.基本概念机器学习：发明算法将数据转化为智能行为数据挖掘VS机器学习：前者侧重寻找有价值的信息，后者侧重执行已知的任务。后者是前者的先期准备过程：数据——>抽象化——>一般化。或者：收集数据——推理数据——归纳数据——发现规律抽象化：训练：用一个特定模型来拟合数据集的过程用方程来拟合观测的数据：观测现象——数据呈现——模型建立。通过不同的格式来把信息概念化一般化：一般化：将抽象化的知识转换成可用
商业预测初识R hongyanwin r语言预测
1.打开帮助文档首页，查阅其中的“IntroductiontoR”helpRhelp2.安装vcd包install.packages("vcd")3.列出此包中可用的函数和数据集ls("package:vcd")/data(package="vcd")4.载入包并阅读数据集Arthritis的描述library("v.d")/?Arthritis5.显示数据集Arthritis的内容查看数据集结构
【YashanDB知识库】YashanDB 开机自启 YashanDB YashanDB知识库数据库数据库系统崖山数据库 YashanDB oracle
【问题分类】YashanDB开机自启【关键字】开机自启，依赖包【问题描述】数据库所在服务器重启后只拉起monit、yasom、yasom进程，缺少yasdb进程：【问题原因分析】数据库安装的时候未启动守护进程【解决/规避方法】进入数据库之前的安装目录，启动守护进程：Shellcd/home/yashan/install./bin/yasbootmonitstart--clusteryashandb
【NLP5-RNN模型、LSTM模型和GRU模型】一蓑烟雨紫洛 nlp rnn lstm gru nlp
RNN模型、LSTM模型和GRU模型1、什么是RNN模型RNN（RecurrentNeuralNetwork)中文称为循环神经网络，它一般以序列数据为输入，通过网络内部的结构设计有效捕捉序列之间的关系特征，一般也是以序列形式进行输出RNN的循环机制使模型隐层上一时间步产生的结果，能够作为当下时间步输入的一部分（当下时间步的输入除了正常的输入外还包括上一步的隐层输出）对当下时间步的输出产生影响2、R
2024上半年软考系统架构设计师-综合知识选择题及答案不对法系统架构
1.操作系统先来先服务调度算法2.操作系统多道程序设计，利用率3.操作系统状态流转错误的，执行态到运行态4.数据库2NF每一个非主属性完全依赖主键5.数据库笛卡尔积m*n6.数据库不属于事务的特点，并发性7.数据库交集表达式R-(R-S)8.数据库反规范化属于逻辑设计9.网络没有加密功能，物理层10.网络二层交换机数据，数据链路层11.知识产权专利法是否属于民法12.知识产权商标不属于，其他几个是
python 判断 ‘NoneType’的方法 cuisidong1997 文本转换 python
的错误时说明需要进行判断，而对‘NoneType’进行判断时直接使用‘isNone’即可，如下：iftextisNone:print('testis’+None)else:print('testisnot’+None)a=re.match(r’主叫号码(.*)客户姓名’,r’2、主叫号码：15558191990;3、客户姓名：韩东远;')print(type(a))ifaisNone:print(
R 数据可视化 —— 韦恩图名本无名
前言对于数据集之间交叠关系的可视化，通常想到的是绘制韦恩图。韦恩图是一种关系型图表，通过图形之间的重叠来反映数据集之间的相交关系。下面，我们来简单介绍一下如何绘制韦恩图韦恩图绘制韦恩图的包有很多，比如gplots包的venn()函数、limma包的vennDiagram()函数、venneuler包的venneuler()函数。但是这些包绘制出来的图像效果都不是很好，所以我们使用比较成熟的包Ven
java线程Thread和Runnable区别和联系 zx_code java jvm thread 多线程 Runnable
我们都晓得java实现线程2种方式，一个是继承Thread，另一个是实现Runnable。模拟窗口买票，第一例子继承thread，代码如下 package thread; public class ThreadTest { public static void main(String[] args) { Thread1 t1 = new Thread1(
【转】JSON与XML的区别比较丁_新 json xml
1.定义介绍 (1).XML定义扩展标记语言 (Extensible Markup Language, XML) ，用于标记电子文件使其具有结构性的标记语言，可以用来标记数据、定义数据类型，是一种允许用户对自己的标记语言进行定义的源语言。 XML使用DTD(document type definition)文档类型定义来组织数据;格式统一，跨平台和语言，早已成为业界公认的标准。 XML是标
c++ 实现五种基础的排序算法 CrazyMizzz C++c 算法
#include<iostream> using namespace std; //辅助函数，交换两数之值 template<class T> void mySwap(T &x, T &y){ T temp = x; x = y; y = temp; } const int size = 10; //一、用直接插入排
我的软件麦田的设计者我的软件音乐类娱乐放松
这是我写的一款app软件，耗时三个月，是一个根据央视节目开门大吉改变的，提供音调，猜歌曲名。1、手机拥有者在android手机市场下载本APP，同意权限，安装到手机上。2、游客初次进入时会有引导页面提醒用户注册。（同时软件自动播放背景音乐）。3、用户登录到主页后，会有五个模块。a、点击不胫而走，用户得到开门大吉首页部分新闻，点击进入有新闻详情。b、
linux awk命令详解被触发 linux awk
awk是行处理器: 相比较屏幕处理的优点，在处理庞大文件时不会出现内存溢出或是处理缓慢的问题，通常用来格式化文本信息 awk处理过程: 依次对每一行进行处理，然后输出 awk命令形式: awk [-F|-f|-v] ‘BEGIN{} //{command1; command2} END{}’ file [-F|-f|-v]大参数，-F指定分隔符，-f调用脚本，-v定义变量 var=val
各种语言比较 _wy_ 编程语言
Java Ruby PHP 擅长领域
oracle 中数据类型为clob的编辑知了ing oracle clob
public void updateKpiStatus(String kpiStatus,String taskId){ Connection dbc=null; Statement stmt=null; PreparedStatement ps=null; try { dbc = new DBConn().getNewConnection(); //stmt = db
分布式服务框架 Zookeeper -- 管理分布式环境中的数据矮蛋蛋 zookeeper
原文地址： http://www.ibm.com/developerworks/cn/opensource/os-cn-zookeeper/ 安装和配置详解本文介绍的 Zookeeper 是以 3.2.2 这个稳定版本为基础，最新的版本可以通过官网 http://hadoop.apache.org/zookeeper/来获取，Zookeeper 的安装非常简单，下面将从单机模式和集群模式两
tomcat数据源 alafqq tomcat
数据库 JNDI(Java Naming and Directory Interface，Java命名和目录接口)是一组在Java应用中访问命名和目录服务的API。没有使用JNDI时我用要这样连接数据库： 03. Class.forName("com.mysql.jdbc.Driver"); 04. conn
遍历的方法百合不是茶遍历
遍历在java的泛
linux查看硬件信息的命令 bijian1013 linux
linux查看硬件信息的命令一.查看CPU： cat /proc/cpuinfo 二.查看内存： free 三.查看硬盘： df linux下查看硬件信息 1、lspci 列出所有PCI 设备； lspci - list all PCI devices:列出机器中的PCI设备（声卡、显卡、Modem、网卡、USB、主板集成设备也能
java常见的ClassNotFoundException bijian1013 java
1.java.lang.ClassNotFoundException: org.apache.commons.logging.LogFactory 添加包common-logging.jar2.java.lang.ClassNotFoundException: javax.transaction.Synchronization
【Gson五】日期对象的序列化和反序列化 bit1129 反序列化
对日期类型的数据进行序列化和反序列化时，需要考虑如下问题： 1. 序列化时，Date对象序列化的字符串日期格式如何 2. 反序列化时，把日期字符串序列化为Date对象，也需要考虑日期格式问题 3. Date A -> str -> Date B,A和B对象是否equals 默认序列化和反序列化 import com
【Spark八十六】Spark Streaming之DStream vs. InputDStream bit1129 Stream
1. DStream的类说明文档： /** * A Discretized Stream (DStream), the basic abstraction in Spark Streaming, is a continuous * sequence of RDDs (of the same type) representing a continuous st
通过nginx获取header信息 ronin47 nginx header
1. 提取整个的Cookies内容到一个变量，然后可以在需要时引用，比如记录到日志里面， if ( $http_cookie ~* "(.*)$") { set $all_cookie $1; } 变量$all_cookie就获得了cookie的值，可以用于运算了
java-65.输入数字n，按顺序输出从1最大的n位10进制数。比如输入3，则输出1、2、3一直到最大的3位数即999 bylijinnan java
参考了网上的http://blog.csdn.net/peasking_dd/article/details/6342984 写了个java版的： public class Print_1_To_NDigit { /** * Q65.输入数字n，按顺序输出从1最大的n位10进制数。比如输入3，则输出1、2、3一直到最大的3位数即999 * 1.使用字符串
Netty源码学习-ReplayingDecoder bylijinnan java netty
ReplayingDecoder是FrameDecoder的子类，不熟悉FrameDecoder的，可以先看看 http://bylijinnan.iteye.com/blog/1982618 API说，ReplayingDecoder简化了操作，比如： FrameDecoder在decode时，需要判断数据是否接收完全： public class IntegerH
js特殊字符过滤 cngolon js特殊字符 js特殊字符过滤
1.js中用正则表达式过滤特殊字符, 校验所有输入域是否含有特殊符号function stripscript(s) { var pattern = new RegExp("[`~!@#$^&*()=|{}':;',\\[\\].<>/?~！@#￥……&*（）——|{}【】‘；：”“'。，、？]"
hibernate使用sql查询 ctrain Hibernate
import java.util.Iterator; import java.util.List; import java.util.Map; import org.hibernate.Hibernate; import org.hibernate.SQLQuery; import org.hibernate.Session; import org.hibernate.Transa
linux shell脚本中切换用户执行命令方法 daizj linux shell 命令切换用户
经常在写shell脚本时，会碰到要以另外一个用户来执行相关命令，其方法简单记下： 1、执行单个命令：su - user -c "command" 如：下面命令是以test用户在/data目录下创建test123目录 [root@slave19 /data]# su - test -c "mkdir /data/test123"
好的代码里只要一个 return 语句 dcj3sjt126com return
别再这样写了：public boolean foo() { if (true) { return true; } else { return false;
Android动画效果学习 dcj3sjt126com android
1、透明动画效果方法一：代码实现 public View onCreateView(LayoutInflater inflater, ViewGroup container, Bundle savedInstanceState) { View rootView = inflater.inflate(R.layout.fragment_main, container, fals
linux复习笔记之bash shell (4)管道命令 eksliang linux管道命令汇总 linux管道命令 linux常用管道命令
转载请出自出处： http://eksliang.iteye.com/blog/2105461 bash命令执行的完毕以后，通常这个命令都会有返回结果，怎么对这个返回的结果做一些操作呢？那就得用管道命令‘|’。上面那段话，简单说了下管道命令的作用，那什么事管道命令呢？答：非常的经典的一句话，记住了，何为管
Android系统中自定义按键的短按、双击、长按事件 gqdy365 android
在项目中碰到这样的问题：由于系统中的按键在底层做了重新定义或者新增了按键，此时需要在APP层对按键事件（keyevent）做分解处理，模拟Android系统做法，把keyevent分解成： 1、单击事件：就是普通key的单击； 2、双击事件：500ms内同一按键单击两次； 3、长按事件：同一按键长按超过1000ms（系统中长按事件为500ms）； 4、组合按键：两个以上按键同时按住；
asp.net获取站点根目录下子目录的名称 hvt .net C#asp.net hovertree Web Forms
使用Visual Studio建立一个.aspx文件(Web Forms)，例如hovertree.aspx,在页面上加入一个ListBox代码如下： <asp:ListBox runat="server" ID="lbKeleyiFolder" /> 那么在页面上显示根目录子文件夹的代码如下： string[] m_sub
Eclipse程序员要掌握的常用快捷键 justjavac java eclipse 快捷键 ide
判断一个人的编程水平，就看他用键盘多，还是鼠标多。用键盘一是为了输入代码（当然了，也包括注释），再有就是熟练使用快捷键。曾有人在豆瓣评《卓有成效的程序员》：“人有多大懒，才有多大闲”。之前我整理了一个程序员图书列表，目的也就是通过读书，让程序员变懒。写道程序员作为特殊的群体，有的人可以这么懒，懒到事情都交给机器去做，而有的人又可
c++编程随记 lx.asymmetric C++笔记
为了字体更好看，改变了格式…… &&运算符： #include<iostream> using namespace std; int main(){ int a=-1,b=4,k; k=(++a<0)&&!(b--
linux标准IO缓冲机制研究音频数据 linux
一、什么是缓存I/O(Buffered I/O)缓存I/O又被称作标准I/O,大多数文件系统默认I/O操作都是缓存I/O。在Linux的缓存I/O机制中，操作系统会将I/O的数据缓存在文件系统的页缓存(page cache)中，也就是说，数据会先被拷贝到操作系统内核的缓冲区中，然后才会从操作系统内核的缓冲区拷贝到应用程序的地址空间。1.缓存I/O有以下优点:A.缓存I/O使用了操作系统内核缓冲区，
随想生活暗黑小菠萝生活
其实账户之前就申请了，但是决定要自己更新一些东西看也是最近。从毕业到现在已经一年了。没有进步是假的，但是有多大的进步可能只有我自己知道。毕业的时候班里12个女生，真正最后做到软件开发的只要两个包括我，PS：我不是说测试不好。当时因为考研完全放弃找工作，考研失败，我想这只是我的借口。那个时候才想到为什么大学的时候不能好好的学习技术，增强自己的实战能力，以至于后来找工作比较费劲。我
我认为POJO是一个错误的概念 windshome java POJO 编程 J2EE 设计
这篇内容其实没有经过太多的深思熟虑，只是个人一时的感觉。从个人风格上来讲，我倾向简单质朴的设计开发理念；从方法论上，我更加倾向自顶向下的设计；从做事情的目标上来看，我追求质量优先，更愿意使用较为保守和稳妥的理念和方法。 &