Data Science (Part 3)

1. What Is K-Means Clustering?

K-means is an unsupervised learning algorithm that partitions a dataset into k clusters. It proceeds as follows:

  1. Choose the number of clusters to create and call it k. 
  2. Randomly choose k points from the dataset to serve as the initial centroids. 
  3. Assign each data point to its closest centroid. This forms k clusters. 
  4. Recompute each centroid as the mean of the data points currently assigned to its cluster.
  5. Repeat the third step, reassigning each data point to the closest of the new centroids. 
  6. If any reassignments have taken place, repeat from the fourth step. If not, the algorithm has converged and the model is ready. 
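
The steps above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not a production implementation: the function name `kmeans`, the `max_iter` and `seed` parameters, and the synthetic two-blob data are assumptions made for the example.

```python
import numpy as np

def kmeans(points, k, max_iter=100, seed=0):
    """Basic k-means following the six steps listed above."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 2: pick k distinct data points as the initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    labels = None
    for _ in range(max_iter):
        # Steps 3 and 5: assign every point to its nearest centroid.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = distances.argmin(axis=1)
        # Step 6: stop once no point changes cluster.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 4: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                centroids[j] = members.mean(axis=0)
    return centroids, labels

# Hypothetical data: two Gaussian blobs centred near (0, 0) and (5, 5).
data_rng = np.random.default_rng(42)
data = np.vstack([
    data_rng.normal(0.0, 0.5, size=(50, 2)),
    data_rng.normal(5.0, 0.5, size=(50, 2)),
])
centroids, labels = kmeans(data, k=2)
print(centroids)
```

Because the initial centroids are random, k-means can settle in a poor local optimum. A common remedy is to run it from several starting points and keep the result with the lowest within-cluster sum of squares.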

2. How Can You Select K for K-Means?

The most popular way to select k for the k-means algorithm is the elbow method. To apply it, you calculate the Within-Cluster Sum of Squares (WSS) for a range of k values. The WSS is the sum of the squared distances between each data point and the centroid of its cluster.

The WSS always decreases as k grows, so you do not pick the k with the smallest WSS. Instead, you plot WSS against k and choose the value at which the curve bends (the "elbow"): beyond that point, adding more clusters reduces the WSS only marginally. 
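
As a rough illustration, the WSS curve can be computed as below. The helper names `wss` and `fit_kmeans`, the number of restarts `n_init`, and the three-blob synthetic data are assumptions for this sketch. In practice a library implementation of k-means would normally replace the hand-written one.

```python
import numpy as np

def wss(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return float(((points - centroids[labels]) ** 2).sum())

def fit_kmeans(points, k, n_init=10, max_iter=100, seed=0):
    """Basic k-means from several random starts; returns the lowest-WSS fit."""
    rng = np.random.default_rng(seed)
    best = None
    for _start in range(n_init):
        centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
        for _step in range(max_iter):
            dist = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
            labels = dist.argmin(axis=1)
            # New centroid = mean of members; an empty cluster keeps its old centroid.
            new_centroids = np.array([
                points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                for j in range(k)
            ])
            if np.allclose(new_centroids, centroids):
                break
            centroids = new_centroids
        score = wss(points, centroids, labels)
        if best is None or score < best[0]:
            best = (score, centroids, labels)
    return best

# Hypothetical data: three Gaussian blobs, so the bend is expected around k = 3.
data_rng = np.random.default_rng(1)
data = np.vstack([
    data_rng.normal(center, 0.4, size=(60, 2))
    for center in ([0.0, 0.0], [4.0, 0.0], [2.0, 4.0])
])

wss_values = [fit_kmeans(data, k)[0] for k in range(1, 7)]
for k, value in zip(range(1, 7), wss_values):
    print(f"k={k}  WSS={value:.1f}")
```

Reading the printed values (or plotting them) shows where the drop in WSS flattens out. That flattening point is the k the elbow method suggests.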

3. What Are the Assumptions Required for a Linear Regression?

There are four major assumptions. 

1. There is a linear relationship between the dependent variable and the regressors, meaning the functional form of the model actually fits the data. 

2. The errors (residuals) are normally distributed and independent of each other. 

3. There is minimal multicollinearity between the explanatory variables. 

4. Homoscedasticity: the variance of the residuals around the regression line is the same for all values of the predictor variable.
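
A few of these assumptions can be checked numerically after fitting a model. The sketch below is an informal illustration rather than a full diagnostic suite. The helper names (`ols_fit`, `vif`, `durbin_watson`), the synthetic data, and the simple variance-ratio check are assumptions chosen for the example.

```python
import numpy as np

def ols_fit(X, y):
    """Least-squares fit with an intercept; returns (coefficients, residuals)."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef, y - design @ coef

def vif(X):
    """Variance inflation factor per column: 1 / (1 - R^2) of that column on the rest."""
    factors = []
    for j in range(X.shape[1]):
        _, resid = ols_fit(np.delete(X, j, axis=1), X[:, j])
        r2 = 1.0 - resid.var() / X[:, j].var()
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

def durbin_watson(resid):
    """Values near 2 suggest uncorrelated residuals; near 0 or 4 suggest autocorrelation."""
    return float(np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2))

# Hypothetical data that satisfies the assumptions: y = 1 + 2*x1 - 3*x2 + noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.5, size=500)

coef, resid = ols_fit(X, y)
fitted = y - resid

# Multicollinearity (assumption 3): VIF close to 1 means little collinearity.
vif_values = vif(X)

# Independence (assumption 2): Durbin-Watson statistic on the residuals.
dw = durbin_watson(resid)

# Homoscedasticity (assumption 4): compare residual variance for low vs high fitted values.
order = np.argsort(fitted)
half = len(order) // 2
variance_ratio = resid[order[half:]].var() / resid[order[:half]].var()

print("coefficients:", coef)
print("VIF:", vif_values, " Durbin-Watson:", dw, " variance ratio:", variance_ratio)
```

A variance ratio far from 1, a Durbin-Watson value far from 2, or a large VIF would each point to a violated assumption. Normality and linearity are usually inspected with residual plots, which are not shown here.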

4. What Is a Linear Regression Model? List Its Drawbacks.

A linear regression model describes the dependent variable as a linear function of one or more independent variables. 

Here are the drawbacks of linear regression: 

  • It models only the mean of the dependent variable, not the rest of its distribution. 
  • It assumes that the observations are independent of each other. 
  • It is sensitive to outliers. 
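
The sensitivity to outliers is easy to demonstrate with a small experiment. In this sketch the helper `fit_line` and the ten-point dataset are made up for illustration: a line is fitted to points lying exactly on y = 2x + 1, then refitted after a single observation is corrupted.

```python
import numpy as np

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    design = np.column_stack([np.ones_like(x), x])
    (intercept, slope), *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(intercept), float(slope)

x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0              # ten points exactly on y = 2x + 1
clean = fit_line(x, y)

y_outlier = y.copy()
y_outlier[-1] += 50.0          # corrupt only the last observation
shifted = fit_line(x, y_outlier)

print("clean fit   (intercept, slope):", clean)
print("with outlier (intercept, slope):", shifted)
```

One bad point out of ten moves the slope from 2 to roughly 4.73, because least squares penalizes squared errors and a single large residual dominates the fit.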

5.
