Kaggle Bike Sharing Demand Prediction – How I got in top 5 percentile of participants?

Introduction

There are three types of people who take part in a Kaggle Competition:

Type 1: Who are experts in machine learning and their motivation is to compete with the best data scientists across the globe. They aim to achieve the highest accuracy

Type 2: Who aren’t experts exactly, but participate to get better at machine learning. These people aim to learn from the experts and the discussions happening and hope to become better with time.

Type 3: Who are new to data science and still choose to participate and gain experience of solving a data science problem.

If you think you fall in Type 2 and Type 3, go ahead and check how I got close to rank 150. I would strongly recommend you to type out the code and follow the article as you go. This will help you develop your data science muscles and they will be in better shape in the next challenge. The more you practice, the faster you’ll learn.

And if you are a Type 1 player, please feel free to drop your approach applied in this competition in the comments section below. I would like to learn from you!

Kaggle Bike Sharing Competition went live for 366 days and ended on 29th May 2015. My efforts would have been incomplete, had I not been supported by Aditya Sharma, IIT Guwahati (doing internship at Analytics Vidhya) in solving this competition.

Before you start – warming up to participate in Kaggle Competition

Here’s a quick approach to solve any Kaggle competition:

Acquire basic data science skills (Statistics + Basic Algorithms)
Get friendly with 7 steps of Data Exploration
Become proficient with any one of the language Python, R or SAS (or the tool of your choice).
Identify the right competition first according to your skills. Here’s a good read: Kaggle Competitions: How and where to begin?

Kaggle Bike Sharing Demand Challenge

In Kaggle knowledge competition – Bike Sharing Demand, the participants are asked to forecast bike rental demand of Bike sharing program in Washington, D.C based on historical usage patterns in relation with weather, time and other data.

Using these Bike Sharing systems, people rent a bike from one location and return it to a different or same place on need basis. People can rent a bike through membership (mostly regular users) or on demand basis (mostly casual users). This process is controlled by a network of automated kiosk across the city.

Solution

Here is the step by step solution of this competition:

Step 1. Hypothesis Generation

Before exploring the data to understand the relationship between variables, I’d recommend you to focus on hypothesis generation first. Now, this might sound counter-intuitive for solving a data science problem, but if there is one thing I have learnt over years, it is this. Before exploring data, you should spend some time thinking about the business problem, gaining the domain knowledge and may be gaining first hand experience of the problem (only if I could travel to North America!)

How does it help? This practice usually helps you form better features later on, which are not biased by the data available in the dataset. At this stage, you are expected to posses structured thinking i.e. a thinking process which takes into consideration all the possible aspects of a particular problem.

Here are some of the hypothesis which I thought could influence the demand of bikes:

Hourly trend: There must be high demand during office timings. Early morning and late evening can have different trend (cyclist) and low demand during 10:00 pm to 4:00 am.
Daily Trend: Registered users demand more bike on weekdays as compared to weekend or holiday.
Rain: The demand of bikes will be lower on a rainy day as compared to a sunny day. Similarly, higher humidity will cause to lower the demand and vice versa.
Temperature: In India, temperature has negative correlation with bike demand. But, after looking at Washington’s temperature graph, I presume it may have positive correlation.
Pollution: If the pollution level in a city starts soaring, people may start using Bike (it may be influenced by government / company policies or increased awareness).
Time: Total demand should have higher contribution of registered user as compared to casual because registered user base would increase over time.
Traffic: It can be positively correlated with Bike demand. Higher traffic may force people to use bike as compared to other road transport medium like car, taxi etc

2. Understanding the Data Set

The dataset shows hourly rental data for two years (2011 and 2012). The training data set is for the first 19 days of each month. The test dataset is from 20th day to month’s end. We are required to predict the total count of bikes rented during each hour covered by the test set.

In the training data set, they have separately given bike demand by registered, casual users and sum of both is given as count.

Training data set has 12 variables (see below) and Test has 9 (excluding registered, casual and count).

Independent Variables

datetime:   date and hour in "mm/dd/yyyy hh:mm" format
season:     Four categories-> 1 = spring, 2 = summer, 3 = fall, 4 = winter
holiday:    whether the day is a holiday or not (1/0)
workingday: whether the day is neither a weekend nor holiday (1/0)
weather:    Four Categories of weather
            1-> Clear, Few clouds, Partly cloudy, Partly cloudy
            2-> Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
            3-> Light Snow and Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
            4-> Heavy Rain + Ice Pallets + Thunderstorm + Mist, Snow + Fog
temp:       hourly temperature in Celsius
atemp:      "feels like" temperature in Celsius
humidity:   relative humidity
windspeed:  wind speed

Dependent Variables

registered: number of registered user
casual:     number of non-registered user
count:      number of total rentals (registered + casual)

3. Importing Data set and Basic Data Exploration

For this solution, I have used R (R Studio 0.99.442) in Windows Environment.

Below are the steps to import and perform data exploration. If you are new to this concept, you can refer this guide on Data Exploration in R

Import Train and Test Data Set

setwd("E:/kaggle data/bike sharing")
train=read.csv("train_bike.csv")
test=read.csv("test_bike.csv")

Combine both Train and Test Data set (to understand the distribution of independent variable together).
```
test$registered=0
test$casual=0
test$count=0
data=rbind(train,test)
```
Before combing test and train data set, I have made the structure similar for both.

Variable Type Identification

str(data)
'data.frame':  17379 obs. of  12 variables:
$ datetime  : Factor w/ 17379 levels "2011-01-01 00:00:00",..: 1 2 3 4 5 6 7 8 9 10 ...
$ season    : int  1 1 1 1 1 1 1 1 1 1 ...
$ holiday   : int  0 0 0 0 0 0 0 0 0 0 ...
$ workingday: int  0 0 0 0 0 0 0 0 0 0 ...
$ weather   : int  1 1 1 1 1 2 1 1 1 1 ...
$ temp      : num  9.84 9.02 9.02 9.84 9.84 ...
$ atemp     : num  14.4 13.6 13.6 14.4 14.4 ...
$ humidity  : int  81 80 80 75 75 75 80 86 75 76 ...
$ windspeed : num  0 0 0 0 0 ...
$ casual    : num  3 8 5 3 0 0 2 1 1 8 ...
$ registered: num  13 32 27 10 1 1 0 2 7 6 ...
$ count     : num  16 40 32 13 1 1 2 3 8 14 ...

Find missing values in data set if any.
```
table(is.na(data))

FALSE
208548
```
Above you can see that it has returned no missing values in the data frame.
Understand the distribution of numerical variables and generate a frequency table for numeric variables. Now, I’ll test and plot a histogram for each numerical variables and analyze the distribution.
```
par(mfrow=c(4,2))
par(mar = rep(2, 4))
hist(data$season)
hist(data$weather)
hist(data$humidity)
hist(data$holiday)
hist(data$workingday)
hist(data$temp)
hist(data$atemp)
hist(data$windspeed)
```
Few inferences can be drawn by looking at the these histograms:
- Season has four categories of almost equal distribution
- Weather 1 has higher contribution i.e. mostly clear weather.
```
prop.table(table(data$weather))
1     2      3      4
0.66  0.26   0.08   0.00
```
- As expected, mostly working days and variable holiday is also showing a similar inference. You can use the code above to look at the distribution in detail. Here you can generate a variable for weekday using holiday and working day. Incase, if both have zero values, then it must be a working day.
- Variables temp, atemp, humidity and windspeed looks naturally distributed.

Convert discrete variables into factor (season, weather, holiday, workingday)

data$season=as.factor(data$season)
data$weather=as.factor(data$weather)
data$holiday=as.factor(data$holiday)
data$workingday=as.factor(data$workingday)

4. Hypothesis Testing (using multivariate analysis)

Till now, we have got a fair understanding of the data set. Now, let’s test the hypothesis which we had generated earlier. Here I have added some additional hypothesis from the dataset. Let’s test them one by one:

Hourly trend: We don’t have the variable ‘hour’ with us right now. But we can extract it using the datetime column.
```
data$hour=substr(data$datetime,12,13)
data$hour=as.factor(data$hour)
```
Let’s plot the hourly trend of count over hours and check if our hypothesis is correct or not. We will separate train and test data set from combined one.
```
train=data[as.integer(substr(data$datetime,9,10))<20,]
test=data[as.integer(substr(data$datetime,9,10))>19,]

boxplot(train$count~train$hour,xlab="hour", ylab="count of users")
```
Above, you can see the trend of bike demand over hours. Quickly, I’ll segregate the bike demand in three categories:
- High : 7-9 and 17-19 hours
- Average : 10-16 hours
- Low : 0-6 and 20-24 hours
Here I have analyzed the distribution of total bike demand. Let’s look at the distribution of registered and casual users separately.Above you can see that registered users have similar trend as count. Whereas, casual users have different trend. Thus, we can say that ‘hour’ is significant variable and our hypothesis is ‘true’.

You might have noticed that there are a lot of outliers while plotting the count of registered and casual users. These values are not generated due to error, so we consider them as natural outliers. They might be a result of groups of people taking up cycling (who are not registered). To treat such outliers, we will use logarithm transformation. Let’s look at the similar plot after log transformation.
```
 boxplot(log(train$count)~train$hour,xlab="hour",ylab="log(count)")
```
Daily Trend: Like Hour, we will generate a variable for day from datetime variable and after that we’ll plot it.
```
date=substr(data$datetime,1,10)
days<-weekdays(as.Date(date))
data$day=days
```
Plot shows registered and casual users’ demand over days.
While looking at the plot, I can say that the demand of causal users increases over weekend.
Rain: We don’t have the ‘rain’ variable with us but have ‘weather’ which is sufficient to test our hypothesis. As per variable description, weather 3 represents light rain and weather 4 represents heavy rain. Take a look at the plot:It is clearly satisfying our hypothesis.

Temperature, Windspeed and Humidity: These are continuous variables so we can look at the correlation factor to validate hypothesis.
```
sub=data.frame(train$registered,train$casual,train$count,train$temp,train$humidity,train$atemp,train$windspeed)
cor(sub)
```
Here are a few inferences you can draw by looking at the above histograms:
- Variable temp is positively correlated with dependent variables (casual is more compare to registered)
- Variable atemp is highly correlated with temp.
- Windspeed has lower correlation as compared to temp and humidity

Time: Let’s extract year of each observation from the datetime column and see the trend of bike demand over year.

data$year=substr(data$datetime,1,4)
data$year=as.factor(data$year)
train=data[as.integer(substr(data$datetime,9,10))<20,]
test=data[as.integer(substr(data$datetime,9,10))>19,]
boxplot(train$count~train$year,xlab="year", ylab="count")

You can see that 2012 has higher bike demand as compared to 2011.

Pollution & Traffic: We don’t have the variable related with these metrics in our data set so we cannot test this hypothesis.

5. Feature Engineering

In addition to existing independent variables, we will create new variables to improve the prediction power of model. Initially, you must have noticed that we generated new variables like hour, month, day and year.

Here we will create more variables, let’s look at the some of these:

Hour Bins: Initially, we have broadly categorize the hour into three categories. Let’s create bins for the hour variable separately for casual and registered users. Here we will use decision tree to find the accurate bins.

train$hour=as.integer(train$hour) # convert hour to integer
test$hour=as.integer(test$hour) # modifying in both train and test data set

We use the library rpart for decision tree algorithm.

library(rpart)
library(rattle) #these libraries will be used to get a good visual plot for the decision tree model. 
library(rpart.plot)
library(RColorBrewer)
d=rpart(registered~hour,data=train)
fancyRpartPlot(d)

Now, looking at the nodes we can create different hour bucket for registered users.

data=rbind(train,test)
data$dp_reg=0
data$dp_reg[data$hour<8]=1
data$dp_reg[data$hour>=22]=2
data$dp_reg[data$hour>9 & data$hour<18]=3
data$dp_reg[data$hour==8]=4
data$dp_reg[data$hour==9]=5
data$dp_reg[data$hour==20 | data$hour==21]=6
data$dp_reg[data$hour==19 | data$hour==18]=7

Similarly, we can create day_part for casual users also (dp_cas).

Temp Bins: Using similar methods, we have created bins for temperature for both registered and casuals users. Variables created are (temp_reg and temp_cas).

Year Bins: We had a hypothesis that bike demand will increase over time and we have proved it also. Here I have created 8 bins (quarterly) for two years. Jan-Mar 2011 as 1 …..Oct-Dec2012 as 8.

data$year_part[data$year=='2011']=1
data$year_part[data$year=='2011' & data$month>3]=2
data$year_part[data$year=='2011' & data$month>6]=3
data$year_part[data$year=='2011' & data$month>9]=4
data$year_part[data$year=='2012']=5
data$year_part[data$year=='2012' & data$month>3]=6
data$year_part[data$year=='2012' & data$month>6]=7
data$year_part[data$year=='2012' & data$month>9]=8
table(data$year_part)

Day Type: Created a variable having categories like “weekday”, “weekend” and “holiday”.

data$day_type=""
data$day_type[data$holiday==0 & data$workingday==0]="weekend"
data$day_type[data$holiday==1]="holiday"
data$day_type[data$holiday==0 & data$workingday==1]="working day"

Weekend: Created a separate variable for weekend (0/1)

data$weekend=0
data$weekend[data$day=="Sunday" | data$day=="Saturday" ]=1

6. Model Building

As this was our first attempt, we applied decision tree, conditional inference tree and random forest algorithms and found that random forest is performing the best. You can also go with regression, boosted regression, neural network and find which one is working well for you.

Before executing the random forest model code, I have followed following steps:

Convert discrete variables into factor (weather, season, hour, holiday, working day, month, day)

train$hour=as.factor(train$hour)
test$hour=as.factor(test$hour)

As we know that dependent variables have natural outliers so we will predict log of dependent variables.
Predict bike demand registered and casual users separately.
y1=log(casual+1) and y2=log(registered+1), Here we have added 1 to deal with zero values in the casual and registered columns.

#predicting the log of registered users.
set.seed(415)
fit1 <- randomForest(logreg ~ hour +workingday+day+holiday+ day_type +temp_reg+humidity+atemp+windspeed+season+weather+dp_reg+weekend+year+year_part, data=train,importance=TRUE, ntree=250)
pred1=predict(fit1,test)
test$logreg=pred1

#predicting the log of casual users.

set.seed(415)
fit2 <- randomForest(logcas ~hour + day_type+day+humidity+atemp+temp_cas+windspeed+season+weather+holiday+workingday+dp_cas+weekend+year+year_part, data=train,importance=TRUE, ntree=250)
pred2=predict(fit2,test)
test$logcas=pred2

Re-transforming the predicted variables and then writing the output of count to the file submit.csv

test$registered=exp(test$logreg)-1
test$casual=exp(test$logcas)-1
test$count=test$casual+test$registered
s<-data.frame(datetime=test$datetime,count=test$count)
write.csv(s,file="submit.csv",row.names=FALSE)

After following the steps mentioned above, you can score 0.38675 on Kaggle leaderboard i.e. top 5 percentile of total participants. As you might have seen, we have not applied any extraordinary science in getting to this level. But, the real competition starts here. I would like to see, if I can improve this further by use of more features and some more advanced modeling techniques.

End Notes

In this article, we have looked at structured approach of problem solving and how this method can help you to improve performance. I would recommend you to generate hypothesis before you deep dive in the data set as this technique will not limit your thought process. You can improve your performance by applying advanced techniques (or ensemble methods) and understand your data trend better.

You can find the complete solution here : GitHub Link

Have you participated in any Kaggle problem? Did you see any significant benefits by doing the same? Do let us know your thoughts about this guide in the comments section below.

关于java通过背景图生成图片 a未来永远是个未知数 #java的图片处理 java java intellij-idea maven spring boot 图像处理
目录对接部分（碎碎念，可跳过）引入本地jar包文件路径错误尝试解决方案开发部分获取字体的方法关于二维码的生成关于在背景图上添加内容关于在背景图上写字关于在背景图上叠加图片关于保存图片第一次尝试第二次尝试第三次尝试最终方案关于文件读取为MultipartFile类型关于BufferedImage转MultipartFile最近用到了需要生成图片的开发，作为一个没有接触过这个的后端，实在头秃，记录一下
sda剩余的存储空间分配到sda2根目录（/）
sda8:0080G0disk├─sda18:101M0part└─sda28:2040G0part/sr011:013G0rom步骤1：检查分区布局使用lsblk或fdisk确认剩余空间的位置：sudofdisk-l/dev/sda确保剩余空间紧接在sda2分区之后。步骤2：安装必要工具确保已安装cloud-utils和e2fsprogs：sudoapt-getupdate&&sudoapt-g
【翻译】Part4: Texture samplers.
AtripthroughtheGraphicsPipeline2011,part4|Therygblog欢迎回来。上一部分讲的是顶点着色器，还大致介绍了通用的GPU着色器单元。总的来说，它们只是向量处理器，但它们可以访问一种在其他向量处理架构中不存在的资源：纹理采样器。纹理采样器是GPU流水线不可或缺的一部分，其复杂程度（以及趣味性！）足以单独写一篇文章来介绍，那接下来就开始吧。纹理状态在开始实际
1163 Dijkstra Sequence (30) 圣保罗的大教堂 PAT刷题图 pat考试
Dijkstra'salgorithmisoneoftheveryfamousgreedyalgorithms.Itisusedforsolvingthesinglesourceshortestpathproblemwhichgivestheshortestpathsfromoneparticularsourcevertextoalltheotherverticesofthegivengraph.
Day52｜动态规划part13：300.最长递增子序列、674. 最长连续递增序列、718. 最长重复子数组 QHG7C0 数据结构与算法（二刷）动态规划算法
子序列问题是动态规划解决的经典问题300.最长递增子序列首先我们明确一下子序列的定义，子序列与子串（必须要连续）不同，子序列是由数组派生而来的序列，删除（或不删除）数组中的元素而不改变其余元素的顺序。一看没啥思路，都忘光了。。。也不知道咋用动态规划做。确定dp数组含义及下标dp[i]表示i之前包括i的以nums[i]结尾的最长递增子序列的长度。这里明确dp[i]的含义很重要，因为这道题我们返回的结
快速排序的详解
分治策略：将大问题分解为小问题解决关键操作：选择基准（Pivot）并进行分区（Partition）递归处理：对分区后的子数组递归排序前言1.快速排序概述快速排序（QuickSort）是由英国计算机科学家TonyHoare于1960年提出的一种高效的分治排序算法。它在平均情况下的时间复杂度为O(nlogn)，最坏情况下为O(n²)（但可通过优化避免），且是原地排序（不需要额外空间）。2.算法步骤详解
wpf 学习笔记
1.同时加载两个窗体先添加一个子窗体，然后再app.xaml.cs里重写OnStartup方法//app.xaml.csusingSystem.Configuration;usingSystem.Data;usingSystem.Windows;namespaceWpfApp1{//////InteractionlogicforApp.xaml///publicpartialclassApp:Ap
英语杂记【4】身在此心在彼英语杂记笔记
permissivethechickensthathefeedsonthemountainwastoopermissivetohavealmostnofatonitsbody,tastingdefinitelydeliciouswhenitwassteamedorstir-fried.integralasanintegralpartsofOS,thekernelofthesystemwasgrea
KWIZ becomes part Crack SEO-狼术 net Delphi Crack 开发语言
KWIZbecomespartCrackNowundertheSnapOnSoftwarefamilyname,KWIZcarriesforwarditsdedicationtodeliveringproductivitysolutionsforMicrosoft365.KWIZ,longrecognizedforitsrobustMicrosoft365productivityandgovern
C#用递归的方法复制指定文件夹下所有文件(包括子文件夹)到指定位置未来无限 C#语言 c#递归的方法复制文件到另一个路径包含子文件覆盖
publicpartialclassForm1:Form{publicForm1(){InitializeComponent();}//////实现复制整个文件夹到另一个路径，如果存在此文件夹，便覆盖/////////publicstaticvoidCopyDir(stringsrcPath,stringaimPath){try{//检查目标目录是否以目录分割字符结束如果不是则添加if(aimPa
php artisan route:list | grep admin到底是干什么的？使用场景是什么？底层原理是什么？快点好好学习吧 Laravel php list android
phpartisanroute:list|grepadmin全解析：从命令到内核实现一、命令拆解与作用这个命令组合用于筛选并显示Laravel应用中包含admin关键字的路由：phpartisanroute:list功能：列出应用中所有注册的路由（包括URI、控制器、中间件等信息）输出示例：+--------+----------+-------------------+-------------
react setState详解谷歌玩家前端面试 react
ReactsetState调用的原理setState具体的执行过程如下：首先调用setState()函数：ReactComponent.prototype.setState=function(partialState,callback){//updater：一个带有形参的函数，返回被更新的状态对象。它可以接收到props和statethis.updater.enqueueSetState(this
以太坊dapp_构建以太坊DApp：使用自定义令牌进行投票 culi3182 java python 区块链人工智能 javascript ViewUI
以太坊dappInpart5ofthistutorialseriesonbuildingDAppswithEthereum,wedealtwithaddingcontenttothestory,lookingathowtoaddtheabilityforparticipantstobuytokensfromtheDAOandtoaddsubmissionsintothestory.It’snowt
Exception: This server is not the leader for that topic-partition. uplinker java java kafka
异常：2016081718:58:48ERRORcom.xxx.lac.service.impl.ComparePriceServiceImpl-307kafka-producer-network-thread|lac_compare_price_service_producer_3-sendCompleteexecptionThisserverisnottheleaderforthattopic
用“Gemini 2.0 Flash Preview Image Generation”模型修改图片，有哪些常用的提示词和方法子燕若水 AI画图 caoni
选定模型在GoogleAIStudio或API中切换到gemini-2.0-flash-preview-image-generation并将输出格式设为Image+Text，否则不会返回图片。12上传或贴入待修改的图片在Studio中点击“➕”上传；调用API时，把图片作为inline_data或多part请求的一部分。3输入编辑指令与聊天相同直接用自然语言描述，例如「把这辆蓝色轿车改成敞篷，然后
华为云挂载磁盘及初始化数据盘（Linux）_华为服务器怎么挂载硬盘 36氪（36Kr）频道首页程序员服务器 linux 华为云
Select(defaultp):pPartitionnumber(1-4,default1):接下来每步均使用默认值，直接按“Enter”。Partitionnumber(1-4,default1):Firstsector(2048-209715199,default2048):Usingdefaultvalue2048Lastsector,+sectorsor+size{K,M,G}(2048
Linux 超大规模数据盘初始化技巧我是峰迹 Linux运维 linux 运维服务器
在处理超大规模数据盘时，Linux系统管理员需要掌握一些高效的初始化技巧，以确保数据盘的性能和稳定性。本文将详细介绍如何使用fdisk和parted工具进行数据盘的初始化，包括分区创建、格式化以及挂载等步骤。使用fdisk进行数据盘初始化查看当前硬盘及分区情况首先，使用fdisk-l命令查看当前系统中所有的硬盘及分区情况，这有助于了解当前的存储资源分布。fdisk-l创建新分区使用fdisk对指定
C# 讯飞语音唤醒 jones.s c#
publicpartialclassMainWindow:Window{//导入C/C++的库文件[DllImport("msc_x64.dll",CallingConvention=CallingConvention.Winapi)]publicstaticexternintMSPLogin(stringusername,stringpassword,stringloginParams);[Dl
腾讯云文件上传流程从未、淡定前端领域腾讯云 http https
文件上传流程actoruseras"用户"participantClientas"浏览器"participantServeras"服务端"participantCOSas"腾讯云"autonumberCOSClient:点击上传按钮user->Client:选择上传文件Client->Server:向服务端发起请求需要上传的文件名Server->COS:结合密钥，向COS请求生成__临时密钥__S
pgsql14自动创建表分区健康马m pgsql 数据库
最近有pgsql的分区表功能需求，没想到都2025年了，pgsql和mysql还是没有自身支持自动创建分区表的功能现在pgsql数据库层面还是只能用老三样的办法来处理这个问题，每个方法各有优劣1.触发器这是最传统的方法，通过创建一个触发器来检查数据并创建新分区缺点是每次插入数据都会执行触发器，当数据量大时可能影响性能，现在基本很少用这个方案在生产环境上操作2.pg_partmanPostgreSQ
Xilinx FPGA ICAP原语实现多重配置 whik1194 ISE Vivado MicroBlaze系列教程 FPGA xilinx ICAP Multiboot 多重配置
文章目录1.FPGA可以运行几个固件2.XilinxICAP原语简介3.ICAP原语模板的使用4.ICAP在Spartan-6上的使用5.ICAP在Kintex-7上的使用工程下载1.FPGA可以运行几个固件众所周知，常见的FPGA通常为SRAM结构，固件程序一般存放在外置的串行Flash中，比如SPIFlash，M25P16或N25Q128等。FPGA启动时，一般先从SPI起始地址开始加载数据到
PART 7 视频 qq_39717490 音视频 opencv 人工智能
在Debian10上安装OpenCV的两种方法：从存储库和源代码中安装OpenCV_debianopencv-CSDN博客本人的树莓派系统是pi@pi:~$lsb_release-aNoLSBmodulesareavailable.DistributorID:DebianDescription:DebianGNU/Linux12(bookworm)Release:12Codename:bookwo
day02 数组part02 hwt819 算法 java 数据结构
209.长度最小的子数组滑动窗口，窗口满足条件，就开始移左边。classSolution{publicintminSubArrayLen(inttarget,int[]nums){intlength=Integer.MAX_VALUE;intsum=0;intleft=0;for(inti=0;i=target){//记录长度length=Math.min(length,i-left+1);//缩
day03 链表part01 hwt819 链表数据结构
203.移除链表元素使用dummy辅助，使用cur来遍历cur遍历到要操作节点的前一个节点。classSolution{publicListNoderemoveElements(ListNodehead,intval){ListNodedummy=newListNode(0);dummy.next=head;ListNodecur=dummy;//循环到链表结束while(cur.next!=nu
day04 链表part02
24.两两交换链表中的节点想不明白的时候，画图会很直观。写好操作的伪代码，按照伪代码写。classSolution{publicListNodeswapPairs(ListNodehead){if(head==null||head.next==null){//0个或者1个，直接返回returnhead;}ListNodedummy=newListNode(0);dummy.next=head;Li
16、流体力学数值模拟 404Feels 流体力学数值模拟纳维-斯托克斯方程
流体力学数值模拟1.流体力学的基本方程流体力学是研究流体（液体和气体）运动规律的学科，其基本方程是纳维-斯托克斯方程（Navier-Stokesequation）。该方程描述了流体的速度、压力、温度等物理量随时间和空间的变化。为了便于数值求解，我们需要将这些方程离散化。以下是纳维-斯托克斯方程的标准形式：[\frac{\partial\mathbf{u}}{\partialt}+(\mathbf{
拥抱Linux Mint，安装迅雷和微信 zhqh100 linux 运维服务器
迅雷的下载地址http://archive.kylinos.cn/kylin/partner/pool/com.xunlei.download_1.0.0.1_amd64.debLinuxMint自带的Transmission今天下载速度还可以，几兆的速度，挺满意的微信的下载地址https://linux.weixin.qq.com/搜狗拼音输入法虽然有官网，但官网最后说是支持Ubuntu20.0
H265 Intro - General Concepts fanbird2008 Stream Media Stream Media/HEVC/H265 hevc
http://www.f265.org/f265/static/txt/h265_companion.htmlH.265CompanionPurposeandorganizationofthisdocumentThisdocumentcontainshuman-readableinformationaboutthemorecomplexpartsoftheH.265specification.It
Leetcode 3599. Partition Array to Minimize XOR Espresso Macchiato leetcode笔记 leetcode 3599 leetcode medium leetcode周赛456 动态规划
Leetcode3599.PartitionArraytoMinimizeXOR1.解题思路2.代码实现题目链接：3599.PartitionArraytoMinimizeXOR1.解题思路这一题就是一个动态规划的思路。我们定义动态规划的状态函数dp(idx,k)将数组arr[idx:]切分为kkk个子串之后能够获得的最大XOR的最小值。此时，我们就能有状态转移函数：dp(i,k)=minj=i+
AI优化算法实战：使用粒子群优化求解复杂工程问题 AI学长带你学AI ai
AI优化算法实战：使用粒子群优化求解复杂工程问题关键词：粒子群优化（PSO）、全局优化、工程问题、智能算法、参数调优摘要：本文以“鸟群觅食”为灵感来源，深入浅出地讲解粒子群优化（ParticleSwarmOptimization,PSO）算法的核心原理，并通过机械结构轻量化设计的实战案例，展示其在复杂工程问题中的应用。文章从算法起源到数学模型，从代码实现到工程落地，层层拆解技术细节，帮助读者快速掌
多线程编程之理财周凡杨 java 多线程生产者消费者理财
现实生活中，我们一边工作，一边消费，正常情况下会把多余的钱存起来，比如存到余额宝，还可以多挣点钱，现在就有这个情况：我每月可以发工资20000万元（暂定每月的1号），每月消费5000（租房+生活费）元（暂定每月的1号），其中租金是大头占90%，交房租的方式可以选择（一月一交，两月一交、三月一交），理财：1万元存余额宝一天可以赚1元钱，
[Zookeeper学习笔记之三]Zookeeper会话超时机制 bit1129 zookeeper
首先，会话超时是由Zookeeper服务端通知客户端会话已经超时，客户端不能自行决定会话已经超时，不过客户端可以通过调用Zookeeper.close()主动的发起会话结束请求，如下的代码输出内容 Created /zoo-739160015 CONNECTEDCONNECTED .............CONNECTEDCONNECTED CONNECTEDCLOSEDCLOSED
SecureCRT快捷键 daizj secureCRT 快捷键
ctrl + a : 移动光标到行首ctrl + e ：移动光标到行尾crtl + b: 光标前移1个字符crtl + f: 光标后移1个字符crtl + h : 删除光标之前的一个字符ctrl + d ：删除光标之后的一个字符crtl + k ：删除光标到行尾所有字符crtl + u : 删除光标至行首所有字符crtl + w: 删除光标至行首
Java 子类与父类这间的转换周凡杨 java 父类与子类的转换
最近同事调的一个服务报错，查看后是日期之间转换出的问题。代码里是把 java.sql.Date 类型的对象强制转换为 java.sql.Timestamp 类型的对象。报java.lang.ClassCastException。代码：
可视化swing界面编辑朱辉辉33 eclipse swing
今天发现了一个WindowBuilder插件，功能好强大，啊哈哈，从此告别手动编辑swing界面代码，直接像VB那样编辑界面，代码会自动生成。首先在Eclipse中点击help，选择Install New Software,然后在Work with中输入WindowBui
web报表工具FineReport常用函数的用法总结（文本函数）老A不折腾 finereport web报表工具报表软件 java报表
文本函数 CHAR CHAR(number):根据指定数字返回对应的字符。CHAR函数可将计算机其他类型的数字代码转换为字符。 Number:用于指定字符的数字，介于1Number:用于指定字符的数字，介于165535之间（包括1和65535）。示例: CHAR(88)等于“X”。 CHAR(45)等于“-”。 CODE CODE(text):计算文本串中第一个字
mysql安装出错林鹤霄 mysql安装
[root@localhost ~]# rpm -ivh MySQL-server-5.5.24-1.linux2.6.x86_64.rpm Preparing... #####################
linux下编译libuv aigo libuv
下载最新版本的libuv源码，解压后执行： ./autogen.sh 这时会提醒找不到automake命令，通过一下命令执行安装（redhat系用yum，Debian系用apt-get）： # yum -y install automake # yum -y install libtool 如果提示错误：make: *** No targe
中国行政区数据及三级联动菜单 alxw4616
近期做项目需要三级联动菜单,上网查了半天竟然没有发现一个能直接用的! 呵呵,都要自己填数据....我了个去这东西麻烦就麻烦的数据上. 哎,自己没办法动手写吧. 现将这些数据共享出了,以方便大家.嗯,代码也可以直接使用文件说明 lib\area.sql -- 县及县以上行政区划分代码（截止2013年8月31日)来源：国家统计局发布时间：2014-01-17 15:0
哈夫曼加密文件百合不是茶哈夫曼压缩哈夫曼加密二叉树
在上一篇介绍过哈夫曼编码的基础知识,下面就直接介绍使用哈夫曼编码怎么来做文件加密或者压缩与解压的软件,对于新手来是有点难度的,主要还是要理清楚步骤; 加密步骤: 1,统计文件中字节出现的次数,作为权值 2,创建节点和哈夫曼树 3,得到每个子节点01串 4,使用哈夫曼编码表示每个字节
JDK1.5 Cyclicbarrier实例 bijian1013 java thread java多线程 Cyclicbarrier
CyclicBarrier类一个同步辅助类，它允许一组线程互相等待，直到到达某个公共屏障点 (common barrier point)。在涉及一组固定大小的线程的程序中，这些线程必须不时地互相等待，此时 CyclicBarrier 很有用。因为该 barrier 在释放等待线程后可以重用，所以称它为循环的 barrier。 CyclicBarrier支持一个可选的 Runnable 命令，
九项重要的职业规划 bijian1013 工作学习
一. 学习的步伐不停止古人说，活到老，学到老。终身学习应该是您的座右铭。世界在不断变化，每个人都在寻找各自的事业途径。您只有保证了足够的技能储
【Java范型四】范型方法 bit1129 java
范型参数不仅仅可以用于类型的声明上，例如 package com.tom.lang.generics; import java.util.List; public class Generics<T> { private T value; public Generics(T value) { this.value =
【Hadoop十三】HDFS Java API基本操作 bit1129 hadoop
package com.examples.hadoop; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.fs.FSDataInputStream; import org.apache.hadoop.fs.FileStatus; import org.apache.hadoo
ua实现split字符串分隔 ronin47 lua split
LUA并不象其它许多"大而全"的语言那样，包括很多功能，比如网络通讯、图形界面等。但是LUA可以很容易地被扩展：由宿主语言(通常是C或 C++)提供这些功能，LUA可以使用它们，就像是本来就内置的功能一样。LUA只包括一个精简的核心和最基本的库。这使得LUA体积小、启动速度快，从而适合嵌入在别的程序里。因此在lua中并没有其他语言那样多的系统函数。习惯了其他语言的字符串分割函
java-从先序遍历和中序遍历重建二叉树 bylijinnan java
public class BuildTreePreOrderInOrder { /** * Build Binary Tree from PreOrder and InOrder * _______7______ / \ __10__ ___2 / \ / 4
openfire开发指南《连接和登陆》开窍的石头 openfire 开发指南 smack
第一步官网下载smack.jar包下载地址：http://www.igniterealtime.org/downloads/index.jsp#smack 第二步把smack里边的jar导入你新建的java项目中开始编写smack连接openfire代码 p
[移动通讯]手机后盖应该按需要能够随时开启 comsci 移动
看到新的手机，很多由金属材质做的外壳，内存和闪存容量越来越大，CPU速度越来越快，对于这些改进，我们非常高兴，也非常欢迎但是，对于手机的新设计，有几点我们也要注意第一：手机的后盖应该能够被用户自行取下来，手机的电池的可更换性应该是必须保留的设计,
20款国外知名的php开源cms系统 cuiyadll cms
内容管理系统，简称CMS，是一种简易的发布和管理新闻的程序。用户可以在后端管理系统中发布，编辑和删除文章，即使您不需要懂得HTML和其他脚本语言，这就是CMS的优点。在这里我决定介绍20款目前国外市面上最流行的开源的PHP内容管理系统，以便没有PHP知识的读者也可以通过国外内容管理系统建立自己的网站。 1. Wordpress WordPress的是一个功能强大且易于使用的内容管
Java生成全局唯一标识符 darrenzhu java uuid unique identifier id
How to generate a globally unique identifier in Java http://stackoverflow.com/questions/21536572/generate-unique-id-in-java-to-label-groups-of-related-entries-in-a-log http://stackoverflow
php安装模块检测是否已安装过, 使用的SQL语句 dcj3sjt126com sql
SHOW [FULL] TABLES [FROM db_name] [LIKE 'pattern'] SHOW TABLES列举了给定数据库中的非TEMPORARY表。您也可以使用mysqlshow db_name命令得到此清单。本命令也列举数据库中的其它视图。支持FULL修改符，这样SHOW FULL TABLES就可以显示第二个输出列。对于一个表，第二列的值为BASE T
5天学会一种 web 开发框架 dcj3sjt126com Web 框架 framework
web framework层出不穷，特别是ruby/python,各有10+个,php/java也是一大堆根据我自己的经验写了一个to do list,按照这个清单，一条一条的学习，事半功倍，很快就能掌握一共25条，即便很磨蹭，2小时也能搞定一条，25*2=50。只需要50小时就能掌握任意一种web框架各类web框架大同小异:现代web开发框架的6大元素，把握主线，就不会迷路建议把本文
Gson使用三(Map集合的处理,一对多处理) eksliang json gson Gson map Gson 集合处理
转载请出自出处：http://eksliang.iteye.com/blog/2175532 一、概述 Map保存的是键值对的形式，Json的格式也是键值对的，所以正常情况下，map跟json之间的转换应当是理所当然的事情。二、Map参考实例 package com.ickes.json; import java.lang.refl
cordova实现“再点击一次退出”效果 gundumw100 android
基本的写法如下： document.addEventListener("deviceready", onDeviceReady, false); function onDeviceReady() { //navigator.splashscreen.hide(); document.addEventListener("b
openldap configuration leaning note iwindyforest configuration
hostname // to display the computer name hostname <changed name> // to change go to: /etc/sysconfig/network, add/modify HOSTNAME=NEWNAME to change permenately dont forget to change /etc/hosts
Nullability and Objective-C 啸笑天 Objective-C
https://developer.apple.com/swift/blog/?id=25 http://www.cocoachina.com/ios/20150601/11989.html http://blog.csdn.net/zhangao0086/article/details/44409913 http://blog.sunnyxx
jsp中实现参数隐藏的两种方法 macroli JavaScript jsp
在一个JSP页面有一个链接，//确定是一个链接?点击弹出一个页面，需要传给这个页面一些参数。//正常的方法是设置弹出页面的src="***.do?p1=aaa&p2=bbb&p3=ccc"//确定目标URL是Action来处理?但是这样会在页面上看到传过来的参数，可能会不安全。要求实现src="***.do"，参数通过其他方法传！//////
Bootstrap A标签关闭modal并打开新的链接解决方案 qiaolevip 每天进步一点点学习永无止境 bootstrap 纵观千象
Bootstrap里面的js modal控件使用起来很方便，关闭也很简单。只需添加标签 data-dismiss="modal" 即可。可是偏偏有时候需要a标签既要关闭modal，有要打开新的链接，尝试多种方法未果。只好使用原始js来控制。 <a href="#/group-buy" class="btn bt
二维数组在Java和C中的区别流淚的芥末 java c 二维数组数组
Java代码： public class test03 { public static void main(String[] args) { int[][] a = {{1},{2,3},{4,5,6}}; System.out.println(a[0][1]); } } 运行结果： Exception in thread "mai
systemctl命令用法 wmlJava linux systemctl
对比表，以 apache / httpd 为例任务旧指令新指令使某服务自动启动 chkconfig --level 3 httpd on systemctl enable httpd.service 使某服务不自动启动 chkconfig --level 3 httpd off systemctl disable httpd.service 检查服务状态 service h

Kaggle Bike Sharing Demand Prediction – How I got in top 5 percentile of participants?

Kaggle Bike Sharing Demand Prediction – How I got in top 5 percentile of participants?

Introduction

Before you start – warming up to participate in Kaggle Competition

Kaggle Bike Sharing Demand Challenge

Solution

Step 1. Hypothesis Generation

2. Understanding the Data Set

3. Importing Data set and Basic Data Exploration

4. Hypothesis Testing (using multivariate analysis)

5. Feature Engineering

6. Model Building

End Notes

你可能感兴趣的:(part)