import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
from sklearn.linear_model import LogisticRegression
df = pd.read_csv('/Users/gaoliang/Documents/Kaggle/titanic/train.csv')
df.head()
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
df.isna().sum()
PassengerId 0
Survived 0
Pclass 0
Name 0
Sex 0
Age 177
SibSp 0
Parch 0
Ticket 0
Fare 0
Cabin 687
Embarked 2
dtype: int64
# Check whether the dataset is balanced by counting the unique values of the target variable:
df.Survived.value_counts()
0 549
1 342
Name: Survived, dtype: int64
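For reference, the same check can be viewed as proportions: since 549 of the 891 passengers did not survive, a trivial model that predicts 0 for everyone would already reach about 61.6% accuracy, which is a useful baseline to keep in mind. A minimal sketch:
# class proportions (also the accuracy of a majority-class baseline)
df.Survived.value_counts(normalize=True)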
df.info()
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 PassengerId 891 non-null int64
1 Survived 891 non-null int64
2 Pclass 891 non-null int64
3 Name 891 non-null object
4 Sex 891 non-null object
5 Age 714 non-null float64
6 SibSp 891 non-null int64
7 Parch 891 non-null int64
8 Ticket 891 non-null object
9 Fare 891 non-null float64
10 Cabin 204 non-null object
11 Embarked 889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB
df.shape
(891, 12)
# We do not need column PassengerId, so we drop it as follows:
df.drop(columns = ['PassengerId'], inplace = True)
# Note: by default DataFrame.drop() does not change the original data; it returns a new copy,
# which is why we either pass inplace=True (as above) or reassign the result:
# df = df.drop(columns = ['PassengerId'])
df.head()
| | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
1 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
2 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
3 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
4 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
df.drop(columns = ['Cabin','Ticket','Name'],inplace = True)
# change Fare to Price
df.rename(columns = {'Fare':'Price'},inplace = True)
df.head()
| | Survived | Pclass | Sex | Age | SibSp | Parch | Price | Embarked |
|---|---|---|---|---|---|---|---|---|
0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.2500 | S |
1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C |
2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.9250 | S |
3 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1000 | S |
4 | 0 | 3 | male | 35.0 | 0 | 0 | 8.0500 | S |
# Let's plot a histogram of Price (formerly Fare):
df.Price.hist(bins=100)
![Histogram of Price](output_11_1.png)
# Next, we need to encode the values of column 'Sex' as numbers (female -> 0, male -> 1).
# Solution 1: boolean indexing with .loc
df.loc[df['Sex'] == 'female','Sex'] = 0
df.loc[df['Sex'] == 'male','Sex'] = 1
df.head(5)
| | Survived | Pclass | Sex | Age | SibSp | Parch | Price | Embarked |
|---|---|---|---|---|---|---|---|---|
0 | 0 | 3 | 1 | 22.0 | 1 | 0 | 7.2500 | S |
1 | 1 | 1 | 0 | 38.0 | 1 | 0 | 71.2833 | C |
2 | 1 | 3 | 0 | 26.0 | 0 | 0 | 7.9250 | S |
3 | 1 | 1 | 0 | 35.0 | 1 | 0 | 53.1000 | S |
4 | 0 | 3 | 1 | 35.0 | 0 | 0 | 8.0500 | S |
# Solution 2: a named function applied with .apply()
df2 = df.copy()
def Sex2Num(Sex_String):
    if Sex_String == 'female':
        return 0
    elif Sex_String == 'male':
        return 1
    else:
        return Sex_String
df2['Sex'] = df2['Sex'].apply(Sex2Num)
df2.head(3)
| | Survived | Pclass | Sex | Age | SibSp | Parch | Price | Embarked |
|---|---|---|---|---|---|---|---|---|
0 | 0 | 3 | 1 | 22.0 | 1 | 0 | 7.2500 | S |
1 | 1 | 1 | 0 | 38.0 | 1 | 0 | 71.2833 | C |
2 | 1 | 3 | 0 | 26.0 | 0 | 0 | 7.9250 | S |
# Solution 3: a lambda applied with .apply()
df3 = df.copy()
df3['Sex'] = df3['Sex'].apply(lambda x:0 if x == 'female' else 1 if x == 'male' else x)
df3.head(3)
| | Survived | Pclass | Sex | Age | SibSp | Parch | Price | Embarked |
|---|---|---|---|---|---|---|---|---|
0 | 0 | 3 | 1 | 22.0 | 1 | 0 | 7.2500 | S |
1 | 1 | 1 | 0 | 38.0 | 1 | 0 | 71.2833 | C |
2 | 1 | 3 | 0 | 26.0 | 0 | 0 | 7.9250 | S |
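For completeness, Series.map() with a dictionary is another common idiom for this kind of recoding. Unlike Solutions 2 and 3, it turns any value missing from the dictionary into NaN, so it must be applied to the original string column; since df has already been encoded by Solution 1, the sketch below works on a freshly loaded copy (the name df_raw is mine):
# a sketch on a freshly loaded copy; df itself was already encoded by Solution 1 above
df_raw = pd.read_csv('/Users/gaoliang/Documents/Kaggle/titanic/train.csv')
df_raw['Sex'] = df_raw['Sex'].map({'female': 0, 'male': 1})
df_raw['Sex'].head(3)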
pandas.get_dummies() allows us to convert a categorical variable with k possible values into k new binary variables, called dummy variables. This conversion is also called one-hot encoding in computer science. Below, we convert column Embarked into dummies.
df = pd.get_dummies(df, columns = ['Embarked'])
df.head(10)
| | Survived | Pclass | Sex | Age | SibSp | Parch | Price | Embarked_C | Embarked_Q | Embarked_S |
|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 3 | 1 | 22.0 | 1 | 0 | 7.2500 | 0 | 0 | 1 |
1 | 1 | 1 | 0 | 38.0 | 1 | 0 | 71.2833 | 1 | 0 | 0 |
2 | 1 | 3 | 0 | 26.0 | 0 | 0 | 7.9250 | 0 | 0 | 1 |
3 | 1 | 1 | 0 | 35.0 | 1 | 0 | 53.1000 | 0 | 0 | 1 |
4 | 0 | 3 | 1 | 35.0 | 0 | 0 | 8.0500 | 0 | 0 | 1 |
5 | 0 | 3 | 1 | NaN | 0 | 0 | 8.4583 | 0 | 1 | 0 |
6 | 0 | 1 | 1 | 54.0 | 0 | 0 | 51.8625 | 0 | 0 | 1 |
7 | 0 | 3 | 1 | 2.0 | 3 | 1 | 21.0750 | 0 | 0 | 1 |
8 | 1 | 3 | 0 | 27.0 | 0 | 2 | 11.1333 | 0 | 0 | 1 |
9 | 1 | 2 | 0 | 14.0 | 1 | 0 | 30.0708 | 1 | 0 | 0 |
We then need to drop one of the created dummies to avoid the multicollinearity problem. Let’s drop the most frequent one, Embarked_S.
df.drop(columns = 'Embarked_S', inplace = True)
df.head()
| | Survived | Pclass | Sex | Age | SibSp | Parch | Price | Embarked_C | Embarked_Q |
|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 3 | 1 | 22.0 | 1 | 0 | 7.2500 | 0 | 0 |
1 | 1 | 1 | 0 | 38.0 | 1 | 0 | 71.2833 | 1 | 0 |
2 | 1 | 3 | 0 | 26.0 | 0 | 0 | 7.9250 | 0 | 0 |
3 | 1 | 1 | 0 | 35.0 | 1 | 0 | 53.1000 | 0 | 0 |
4 | 0 | 3 | 1 | 35.0 | 0 | 0 | 8.0500 | 0 | 0 |
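As an aside, pd.get_dummies() can also drop a dummy for you via drop_first=True. Note that it drops the first category alphabetically (here that would be Embarked_C), not necessarily the most frequent one, which is why we dropped Embarked_S explicitly above. A sketch of the one-step version:
# one-step alternative (would drop Embarked_C instead of Embarked_S)
# df = pd.get_dummies(df, columns = ['Embarked'], drop_first = True)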
Suppose we want to move column Pclass to after column Parch. We can do so in two ways: with DataFrame.reindex(columns=[the columns in the order that you want]), or by simply selecting the columns in the new order, as done below (a reindex sketch follows the output).
df = df[['Survived','Sex','Age','SibSp','Parch','Pclass','Price','Embarked_C','Embarked_Q']]
# hint: we can use df.columns.to_list() to first produce the old column order, then copy & edit
df.head()
| | Survived | Sex | Age | SibSp | Parch | Pclass | Price | Embarked_C | Embarked_Q |
|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | 22.0 | 1 | 0 | 3 | 7.2500 | 0 | 0 |
1 | 1 | 0 | 38.0 | 1 | 0 | 1 | 71.2833 | 1 | 0 |
2 | 1 | 0 | 26.0 | 0 | 0 | 3 | 7.9250 | 0 | 0 |
3 | 1 | 0 | 35.0 | 1 | 0 | 1 | 53.1000 | 0 | 0 |
4 | 0 | 1 | 35.0 | 0 | 0 | 3 | 8.0500 | 0 | 0 |
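The reindex route mentioned above produces the same ordering. A sketch (shown commented out, since df has already been reordered):
# df = df.reindex(columns = ['Survived','Sex','Age','SibSp','Parch','Pclass','Price','Embarked_C','Embarked_Q'])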
# We need to separate the features from the label, since scikit-learn takes them as separate inputs.
# Separate the data into the feature matrix and the target array
X = df.drop(columns=['Survived'])
y = df['Survived']
# Next, split train and test
X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size=0.2, # reserve 20% data for testing
random_state=365)
# (Not required in our class) The following avoids the well-known SettingWithCopyWarning
# that pandas may raise when we later assign to the DataFrame slices returned by train_test_split():
# X_train = X_train.copy()
# X_test = X_test.copy()
print(X_train.shape)
print(X_test.shape)
(712, 8)
(179, 8)
# Any missing data?
df.isna().sum()
Survived 0
Pclass 0
Sex 0
Age 177
SibSp 0
Parch 0
Price 0
Embarked_C 0
Embarked_Q 0
dtype: int64
# This dataset has missing values in column Age, which we need to impute first
X_train_Age_mean = X_train['Age'].mean()
X_train['Age'] = X_train['Age'].fillna(X_train_Age_mean)
# Verify that there's no more missing values:
X_train.Age.isna().sum()
0
# Important: apply exactly the same data wrangling to the test dataset, using the mean computed from the training data (never from the test data)!
X_test['Age'] = X_test['Age'].fillna(X_train_Age_mean)
# Note that df itself still shows the missing Age values; we only imputed the X_train/X_test copies
df.isna().sum()
Survived 0
Pclass 0
Sex 0
Age 177
SibSp 0
Parch 0
Price 0
Embarked_C 0
Embarked_Q 0
dtype: int64
X_test.Age.isna().sum()
0
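For reference, scikit-learn's SimpleImputer does the same mean imputation and stores the training mean for you, which makes the fit-on-train, apply-to-test discipline explicit. A minimal sketch under the same split (not part of the original lecture; Age is already filled here, so it is shown only for the pattern):
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(strategy='mean')                      # learns the column mean during fit
X_train[['Age']] = imputer.fit_transform(X_train[['Age']])    # fit on the training data only
X_test[['Age']] = imputer.transform(X_test[['Age']])          # reuse the training mean on the test data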
# Let's try logistic regression as the learning algorithm
# First, load the package
from sklearn.linear_model import LogisticRegression
# Next, set the hyperparameters of this classifier
clf_lr = LogisticRegression(
    penalty='none',   # turn off regularization, which scikit-learn applies (L2) by default; we study it later
                      # (newer scikit-learn versions spell this option as penalty=None)
    max_iter=1000)    # the solver did not converge within the default 100 iterations
# Next, fit (a.k.a. train) this model over the train dataset
clf_lr.fit(X_train,y_train)
LogisticRegression(max_iter=1000, penalty='none')
# Run this code cell to observe the coefficients of the trained model:
coef_lr = pd.DataFrame(clf_lr.coef_[0],index=X_train.columns,columns=['coefficient'])
coef_lr.transpose()
| | Pclass | Sex | Age | SibSp | Parch | Price | Embarked_C | Embarked_Q |
|---|---|---|---|---|---|---|---|---|
coefficient | -0.995441 | -2.602657 | -0.032009 | -0.346223 | -0.059504 | 0.003378 | 0.246601 | 0.346033 |
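Because logistic regression models the log-odds, exponentiating a coefficient gives the multiplicative change in the odds of survival for a one-unit increase in that feature. A small sketch using the coef_lr table above:
# odds ratios: e.g. exp(-2.60) ≈ 0.07, i.e. being male multiplies the odds of survival by about 0.07
np.exp(coef_lr).transpose()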
One weakness of the scikit-learn package, compared to R packages, is that it focuses more on prediction and less on complete statistical reporting.
For example, LogisticRegression does not report p-values. If you need them, try the statsmodels package as follows:
import statsmodels.api as sm
logit_model=sm.Logit(y_train.astype(float),sm.add_constant(X_train.astype(float)))
result=logit_model.fit()
print(result.summary())
Optimization terminated successfully.
Current function value: 0.452236
Iterations 6
Logit Regression Results
==============================================================================
Dep. Variable: Survived No. Observations: 712
Model: Logit Df Residuals: 703
Method: MLE Df Model: 8
Date: Tue, 11 Oct 2022 Pseudo R-squ.: 0.3193
Time: 11:24:58 Log-Likelihood: -321.99
converged: True LL-Null: -473.03
Covariance Type: nonrobust LLR p-value: 1.491e-60
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const 4.2980 0.577 7.446 0.000 3.167 5.429
Pclass -0.9946 0.157 -6.337 0.000 -1.302 -0.687
Sex -2.6030 0.219 -11.902 0.000 -3.032 -2.174
Age -0.0320 0.008 -3.761 0.000 -0.049 -0.015
SibSp -0.3462 0.123 -2.818 0.005 -0.587 -0.105
Parch -0.0598 0.139 -0.431 0.666 -0.331 0.212
Price 0.0034 0.003 1.247 0.213 -0.002 0.009
Embarked_C 0.2466 0.260 0.947 0.344 -0.264 0.757
Embarked_Q 0.3443 0.376 0.915 0.360 -0.393 1.082
==============================================================================
# Now back to LogisticRegression in scikit-learn. Let's evaluate the performance of the
# trained model. To do so, we first use the trained model to predict the test dataset.
y_predict = clf_lr.predict(X_test)
# Then, compare the predicted values with the truth to get accuracy
from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_predict).round(4)
0.8156
# Observe the confusion matrix
from sklearn.metrics import confusion_matrix
print(confusion_matrix(y_test, y_predict))
[[96 12]
[21 50]]
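Accuracy alone hides the asymmetry visible in the confusion matrix (21 survivors are missed versus 12 non-survivors misclassified). scikit-learn's classification_report adds per-class precision and recall; a sketch:
from sklearn.metrics import classification_report
print(classification_report(y_test, y_predict))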
When it comes to creating a trained algorithm (a.k.a. a trained model, or simply a model), there are many possible choices.
There is no free lunch: no single choice dominates all the others. Otherwise, we would not see so many of them in today's analytics practice.
In this lecture, we try a few popular learning algorithms and discuss their pros and cons. We leave the topic of hyperparameter tuning, as well as the state-of-the-art boosting-based algorithms (which always require hyperparameter tuning), to the next lecture.
# A template for implementing various supervised learning algorithms
# I assume that, prior to running this code, we have already pre-processed the data
# Load the learning algorithm
from sklearn.linear_model import LogisticRegression
# Set the hyperparameters of this algorithm
clf = LogisticRegression(penalty='none', max_iter=1000)
# Fit the model over the train data
clf.fit(X_train,y_train)
# Use the fitted model to predict the test data
y_predict = clf.predict(X_test)
# Obtain performance metrics
accuracy = accuracy_score(y_test, y_predict).round(4)
print(f"The accuracy is: {accuracy:.2%}")
print("The confusion matrix is:")
cm = confusion_matrix(y_test, y_predict)
print(cm)
# Save the model and the performance metrics for later comparison.
# Here I use suffix "lr" because we just tried logistic regression.
# Change the suffix when you switch to a new learning algorithm!
clf_lr = clf
accuracy_lr = accuracy
cm_lr = cm
The accuracy is: 81.56%
The confusion matrix is:
[[96 12]
[21 50]]
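Since the same fit/predict/evaluate steps repeat for every algorithm below, the template can also be wrapped in a small helper function. A sketch, assuming X_train, X_test, y_train and y_test are already prepared (the function name is mine, not part of the lecture):
def fit_and_evaluate(clf):
    # fit on the training split, then report accuracy and the confusion matrix on the test split
    clf.fit(X_train, y_train)
    y_predict = clf.predict(X_test)
    accuracy = accuracy_score(y_test, y_predict).round(4)
    cm = confusion_matrix(y_test, y_predict)
    print(f"The accuracy is: {accuracy:.2%}")
    print("The confusion matrix is:")
    print(cm)
    return clf, accuracy, cm

# example: clf_lr, accuracy_lr, cm_lr = fit_and_evaluate(LogisticRegression(penalty='none', max_iter=1000))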
# k-Nearest Neighbors (kNN)
from sklearn.neighbors import KNeighborsClassifier
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train,y_train)
y_predict = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_predict).round(4)
print(f"The accuracy is: {accuracy:.2%}")
print("The confusion matrix is:")
cm = confusion_matrix(y_test, y_predict)
print(cm)
# save the results for later comparison
clf_knn = clf
accuracy_knn = accuracy
cm_knn = cm
The accuracy is: 70.95%
The confusion matrix is:
[[92 16]
[36 35]]
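One likely reason kNN lags behind here is that its distance computation is dominated by features with wide ranges (Price varies far more than the 0/1 dummies). Standardizing the features before fitting usually helps; a sketch, not part of the original lecture:
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
# scale every feature to zero mean and unit variance before the distance computation
clf_knn_scaled = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
clf_knn_scaled.fit(X_train, y_train)
accuracy_score(y_test, clf_knn_scaled.predict(X_test)).round(4)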
# Decision Trees
from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier(max_depth=2)
clf.fit(X_train,y_train)
y_predict = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_predict).round(4)
print(f"The accuracy is: {accuracy:.2%}")
print("The confusion matrix is:")
cm = confusion_matrix(y_test, y_predict)
print(cm)
# save the results for later comparison
clf_dt = clf
accuracy_dt = accuracy
cm_dt = cm
The accuracy is: 77.65%
The confusion matrix is:
[[104 4]
[ 36 35]]
One advantage of decision tree learning is that the trained model is often intuitive to human beings. Therefore, despite its often inferior predictive performance, especially on large and complicated datasets, analysts use it a lot in practice to understand the data and to communicate with others. Let's plot the tree we just trained.
from sklearn import tree
import matplotlib.pyplot as plt
# warning: if the tree is too big to read, limit the max_depth of the tree during training
plt.figure(figsize=(15,10)) # set plot size (denoted in inches)
tree.plot_tree(clf_dt,
feature_names=X_train.columns,
filled = True,
fontsize=12)
plt.show()
![The trained decision tree](output_47_0.png)
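If the rendered figure is hard to read, the same tree can also be printed as indented text with sklearn.tree.export_text; a sketch:
from sklearn.tree import export_text
print(export_text(clf_dt, feature_names=list(X_train.columns)))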
from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(max_depth=2, random_state=0)
clf.fit(X_train,y_train)
y_predict = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_predict).round(4)
print(f"The accuracy is: {accuracy:.2%}")
print("The confusion matrix is:")
cm = confusion_matrix(y_test, y_predict)
print(cm)
# save the results for later comparison
clf_rf = clf
accuracy_rf = accuracy
cm_rf = cm
The accuracy is: 79.33%
The confusion matrix is:
[[107 1]
[ 36 35]]
A handy feature of RandomForestClassifier is that it provides a robust ranking of the relative importance of all input variables.
importances = clf_rf.feature_importances_
pd.Series(importances, index=X_train.columns).sort_values(ascending=False)
Sex 0.322660
Price 0.215141
Pclass 0.201329
Age 0.122027
SibSp 0.070367
Parch 0.031325
Embarked_C 0.028627
Embarked_Q 0.008524
dtype: float64
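With the models and metrics saved under different suffixes, a quick side-by-side comparison of the four classifiers is straightforward. A sketch using the variables defined above:
pd.Series({'Logistic regression': accuracy_lr,
           'kNN': accuracy_knn,
           'Decision tree': accuracy_dt,
           'Random forest': accuracy_rf},
          name='test accuracy').sort_values(ascending=False)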
The scikit-learn package
The scikit-learn package contains a large selection of traditional supervised learning algorithms, with excellent documentation and coding examples.
I expect you to be able to use the learning algorithms covered in this lecture: logistic regression, k-nearest neighbors (kNN), decision trees, and random forests.
(Models NOT required for this course) It is a good idea for you to at least read a bit about the following learning algorithms:
You should also be familiar with the concept of regularization that is commonly used in machine learning (see Part 4 of this lecture).
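For reference ahead of Part 4: in LogisticRegression, regularization is controlled by the penalty and C arguments, where C is the inverse of the regularization strength (smaller C means a stronger penalty). A minimal sketch, not something you are expected to tune yet:
# L2-regularized logistic regression; compare with the penalty='none' model used above
clf_l2 = LogisticRegression(penalty='l2', C=1.0, max_iter=1000)
clf_l2.fit(X_train, y_train)
accuracy_score(y_test, clf_l2.predict(X_test)).round(4)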