radar_sun

9. Data Manipulation with dplyr in R

Contents

1. Transforming Data with dplyr
- 1.1 The counties dataset (video)
- 1.2 Understanding your data
- 1.3 Selecting columns
- 1.4 The filter and arrange verbs (video)
- 1.5 Arranging observations
- 1.6 Filtering for conditions
- 1.7 Filtering and arranging
- 1.8 Mutate (video)
- 1.9 Calculating the number of government employees
- 1.10 Calculating the percentage of women in a county
- 1.11 Select, mutate, filter, and arrange
2. Aggregating Data
- 2.1 The count verb (video)
- 2.2 Counting by region
- 2.3 Counting citizens by state
- 2.4 Mutating and counting
- 2.5 The group by, summarize and ungroup verbs (video)
- 2.6 Summarizing
- 2.7 Summarizing by state
- 2.8 Summarizing by state and region
- 2.9 The top_n verb (video)
- 2.10 Selecting a county from each region
- 2.11 Finding the highest-income state in each region
- 2.12 Using summarize, top_n, and count together
3. Selecting and Transforming Data
- 3.1 Selecting (video)
- 3.2 Selecting columns
- 3.3 Select helpers
- 3.4 The rename verb (video)
- 3.5 Renaming a column after count
- 3.6 Rename a column as part of a select
- 3.7 The transmute verb (video)
- 3.8 Choosing among verbs
- 3.9 Using transmute
- 3.10 Matching verbs to their definitions
- 3.11 Choosing among the four verbs
4. Case Study: The babynames Dataset
- 4.1 The babynames data (video)
- 4.2 Filtering and arranging for one year
- 4.3 Using top_n with babynames
- 4.4 Visualizing names with ggplot2
- 4.5 Grouped mutates (video)
- 4.6 Finding the year each name is most common
- 4.7 Adding the total and maximum for each name
- 4.8 Visualizing the normalized change in popularity
- 4.9 Window functions (video)
- 4.10 Using ratios to describe the frequency of a name
- 4.11 Biggest jumps in a name
- 4.12 Congratulations!

1. Transforming Data with dplyr

1.1 The counties dataset (video)

1.2 Understanding your data

1.3 Selecting columns

Select the following four columns from the counties variable:

state
county
population
poverty

You don’t need to save the result to a variable.

Instruction:

Select the columns listed from the counties variable.

# Load dplyr (assumes the counties table is already loaded in your session)
library(dplyr)

# Select the columns
counties %>%
  select(state, county, population, poverty)

1.4 The filter and arrange verbs (video)

1.5 Arranging observations

Here you see the counties_selected dataset with a few interesting variables selected. The variables private_work, public_work, and self_employed give the percentage of people who work for private companies, for the government, or for themselves.

In these exercises, you’ll sort these observations to find the most interesting cases.

Instruction:

Add a verb to sort the observations of the public_work variable in descending order.

counties_selected <- counties %>%
  select(state, county, population, private_work, public_work, self_employed)

# Add a verb to sort in descending order of public_work
counties_selected %>%
  arrange(desc(public_work))

1.6 Filtering for conditions

You use the filter() verb to get only observations that match a particular condition, or match multiple conditions.

Instruction 1:

Find only the counties that have a population above one million (1000000).

counties_selected <- counties %>%
  select(state, county, population)

# Filter for counties with a population above 1000000
counties_selected %>%
  filter(population > 1000000)

Instruction 2:

Find only the counties in the state of California that also have a population above one million (1000000).

counties_selected <- counties %>%
  select(state, county, population)

# Filter for counties in the state of California that have a population above 1000000
counties_selected %>%
  filter(state == "California" & population > 1000000)
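As a minimal sketch, assuming dplyr is installed and using a small made-up table (toy_counties, with hypothetical county labels and populations) in place of the course data, the two ways of combining conditions in filter() can be compared directly: joining the conditions with & and passing them as separate, comma-separated arguments keep the same rows.

```r
library(dplyr)

# Hypothetical stand-in for the course's counties table (made-up values)
toy_counties <- tibble(
  state      = c("California", "California", "Texas", "Texas"),
  county     = c("A", "B", "C", "D"),
  population = c(2500000, 1200, 1500000, 90)
)

# Conditions joined with &
with_and <- toy_counties %>%
  filter(state == "California" & population > 1000000)

# The same conditions as separate arguments (also combined with AND)
with_comma <- toy_counties %>%
  filter(state == "California", population > 1000000)

with_and
```

Use | instead of & when you want rows that match either condition.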

1.7 Filtering and arranging

You’re often interested in both filtering and sorting a dataset, to focus on the observations of particular interest. Here, you’ll find counties that are extreme examples of what fraction of the population works in the private sector.

Instruction:

Filter for counties in the state of Texas that have more than ten thousand people (10000), and sort them in descending order of the percentage of people employed in private work.

counties_selected <- counties %>%
  select(state, county, population, private_work, public_work, self_employed)

# Filter for Texas and more than 10000 people; sort in descending order of private_work
counties_selected %>%
  filter(state == "Texas", population > 10000) %>%
  arrange(desc(private_work))

1.8 Mutate (video)

1.9 Calculating the number of government employees

In the video, you used the unemployment variable, which is a percentage, to calculate the number of unemployed people in each county. In this exercise, you’ll do the same with another percentage variable: public_work.

The code provided already selects the state, county, population, and public_work columns.

Instruction 1:

Use mutate() to add a column called public_workers to the dataset, with the number of people employed in public (government) work.

counties_selected <- counties %>%
  select(state, county, population, public_work)

# Add a new column public_workers with the number of people employed in public work
counties_selected %>%
  mutate(public_workers = population * public_work / 100)

Instruction 2:

Sort the new column in descending order.

counties_selected <- counties %>%
  select(state, county, population, public_work)

# Sort in descending order of the public_workers column
counties_selected %>%
  mutate(public_workers = public_work * population / 100) %>%
  arrange(desc(public_workers))
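Because public_work is stored as a percentage, the head count is population * public_work / 100. A minimal sketch on a hypothetical two-county table (made-up numbers) makes the arithmetic visible: 50000 * 12.5 / 100 = 6250 and 2000 * 30 / 100 = 600, so the larger county sorts first even though its percentage is lower.

```r
library(dplyr)

# Hypothetical two-county table; public_work is a percentage (made-up values)
toy_counties <- tibble(
  county      = c("B", "A"),
  population  = c(2000, 50000),
  public_work = c(30, 12.5)
)

public_counts <- toy_counties %>%
  # Convert the percentage into a number of people
  mutate(public_workers = population * public_work / 100) %>%
  # Largest number of public workers first
  arrange(desc(public_workers))

public_counts
```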

1.10 Calculating the percentage of women in a county

The dataset includes columns for the total number (not percentage) of men and women in each county. You could use this, along with the population variable, to compute the fraction of men (or women) within each county.

In this exercise, you’ll select the relevant columns yourself.

Instruction:

Select the columns state, county, population, men, and women.
Add a new variable called proportion_women with the fraction of the county’s population made up of women.

# Select the columns state, county, population, men, and women
counties_selected <- counties %>%
  select(state, county, population, men, women)

# Calculate proportion_women as the fraction of the population made up of women
counties_selected %>%
  mutate(proportion_women = women / population)

1.11 Select, mutate, filter, and arrange

In this exercise, you’ll put together everything you’ve learned in this chapter (select(), mutate(), filter() and arrange()), to find the counties with the highest proportion of men.

Instruction:

Select only the columns state, county, population, men, and women.
Add a variable proportion_men with the fraction of the county’s population made up of men.
Filter for counties with a population of at least ten thousand (10000).
Arrange counties in descending order of their proportion of men.

counties %>%
  # Select the five columns
  select(state, county, population, men, women) %>%
  # Add the proportion_men variable
  mutate(proportion_men = men / population) %>%
  # Filter for population of at least 10,000
  filter(population >= 10000) %>%
  # Arrange proportion of men in descending order
  arrange(desc(proportion_men))

2. Aggregating Data

2.1 The count verb (video)

2.2 Counting by region

The counties dataset contains columns for region, state, population, and the number of citizens, which we selected and saved as the counties_selected table. In this exercise, you’ll focus on the region column.

counties_selected <- counties %>%
  select(region, state, population, citizens)

Instruction:

Use count() to find the number of counties in each region, using a second argument to sort in descending order.

# Use count to find the number of counties in each region
counties_selected %>%
  count(region, sort = TRUE)

2.3 Counting citizens by state

You can weight your count by a particular variable rather than simply counting the number of counties. In this case, you’ll find the number of citizens in each state.

counties_selected <- counties %>%
  select(region, state, population, citizens)

Instruction:

Count the number of counties in each state, weighted based on the citizens column, and sorted in descending order.

# Find the total number of citizens per state (counties weighted by citizens)
counties_selected %>%
  count(state, wt = citizens, sort = TRUE)
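A weighted count adds up the weight column within each group instead of counting rows, so count(state, wt = citizens) should give the same totals as grouping by state and summing citizens. A minimal sketch with a hypothetical three-row table (made-up citizen counts) compares the two:

```r
library(dplyr)

# Hypothetical counties table with made-up citizen counts
toy_counties <- tibble(
  state    = c("Ohio", "Ohio", "Utah"),
  citizens = c(100, 250, 400)
)

# Weighted count: n is the sum of citizens per state, sorted descending
weighted <- toy_counties %>%
  count(state, wt = citizens, sort = TRUE)

# The same totals spelled out with group_by() and summarize()
manual <- toy_counties %>%
  group_by(state) %>%
  summarize(n = sum(citizens)) %>%
  arrange(desc(n))

weighted
```

Without wt, the same call would return 2 for Ohio and 1 for Utah, the number of rows (counties) in each state.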

2.4 Mutating and counting

You can combine multiple verbs together to answer increasingly complicated questions of your data. For example: “What are the US states where the most people walk to work?”

You’ll use the walk column, which offers a percentage of people in each county that walk to work, to add a new column and count based on it.

counties_selected <- counties %>%
  select(region, state, population, walk)

Instruction:

Use mutate() to calculate and add a column called population_walk, containing the total number of people who walk to work in a county.
Use a (weighted and sorted) count() to find the total number of people who walk to work in each state.

counties_selected %>%
  # Add population_walk containing the total number of people who walk to work
  mutate(population_walk = walk * population / 100) %>%
  # Count weighted by the new column
  count(state, wt = population_walk, sort = TRUE)

2.5 The group by, summarize and ungroup verbs (video)

2.6 Summarizing

The summarize() verb is very useful for collapsing a large dataset into a single observation.

counties_selected <- counties %>%
  select(county, population, income, unemployment)

Instruction:

Summarize the counties dataset to find the following columns: min_population (with the smallest population), max_unemployment (with the maximum unemployment), and average_income (with the mean of the income variable).

# Summarize to find minimum population, maximum unemployment, and average income
counties_selected %>%
  summarise(min_population = min(population),
            max_unemployment = max(unemployment),
            average_income = mean(income))

2.7 Summarizing by state

Another interesting column is land_area, which shows the land area in square miles. Here, you’ll summarize both population and land area by state, with the purpose of finding the density (in people per square mile).

counties_selected <- counties %>%
  select(state, county, population, land_area)

Instruction 1:

Group the data by state, and summarize to create the columns total_area (with total area in square miles) and total_population (with total population).

# Group by state and find the total area and population
counties_selected %>%
  group_by(state) %>%
  summarise(total_area = sum(land_area),
            total_population = sum(population))

Instruction 2:

Add a density column with the people per square mile, then arrange in descending order.

# Add a density column, then sort in descending order
counties_selected %>%
  group_by(state) %>%
  summarize(total_area = sum(land_area),
            total_population = sum(population)) %>%
  mutate(density = total_population / total_area) %>%
  arrange(desc(density))

2.8 Summarizing by state and region

You can group by multiple columns instead of grouping by one. Here, you’ll practice aggregating by state and region, and notice how useful it is for performing multiple aggregations in a row.

counties_selected <- counties %>%
  select(region, state, county, population)

Instruction 1:

Summarize to find the total population, as a column called total_pop, in each combination of region and state.

# Summarize to find the total population
counties_selected %>%
  group_by(region, state) %>%
  summarize(total_pop = sum(population))

Instruction 2:

Notice the tibble is still grouped by region; use another summarize step to calculate two new columns: the average state population in each region (average_pop) and the median state population in each region (median_pop).

# Calculate the average_pop and median_pop columns
counties_selected %>%
  group_by(region, state) %>%
  summarize(total_pop = sum(population)) %>%
  summarize(average_pop = mean(total_pop),
            median_pop = median(total_pop))

2.9 The top_n verb (video)

2.10 Selecting a country from each region

Previously, you used the walk column, which offers a percentage of people in each county that walk to work, to add a new column and count to find the total number of people who walk to work in each state.

Now, you’re interested in finding the county within each region with the highest percentage of citizens who walk to work.

counties_selected <- counties %>%
  select(region, state, county, metro, population, walk)

Instruction:

Find the county in each region with the highest percentage of citizens who walk to work.

# Group by region and find the county with the highest percentage of citizens who walk to work
counties_selected %>%
  group_by(region) %>%
  top_n(1, walk)
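top_n() keeps every row that ties for the top value, so a region can return more than one county. A minimal sketch with hypothetical values (made-up walk percentages, with a deliberate tie in the West) shows this, along with slice_max(), which supersedes top_n() in dplyr 1.0.0 and later; its with_ties = FALSE argument forces exactly one row per group.

```r
library(dplyr)

# Hypothetical counties with made-up walk percentages; West has a tie at the top
toy_counties <- tibble(
  region = c("West", "West", "West", "South", "South"),
  county = c("A", "B", "C", "D", "E"),
  walk   = c(5.0, 9.5, 9.5, 2.0, 3.1)
)

# top_n() keeps ties: West contributes both B and C
by_top_n <- toy_counties %>%
  group_by(region) %>%
  top_n(1, walk)

# slice_max() with with_ties = FALSE returns exactly one row per region
by_slice_max <- toy_counties %>%
  group_by(region) %>%
  slice_max(walk, n = 1, with_ties = FALSE)

by_top_n
```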

2.11 Finding the highest-income state in each region

You’ve been learning to combine multiple dplyr verbs together. Here, you’ll combine group_by(), summarize(), and top_n() to find the state in each region with the highest income.

When you group by multiple columns and then summarize, it’s important to remember that summarize() “peels off” the last grouping variable but leaves the rest on. For example, if you group_by(X, Y) then summarize, the result will still be grouped by X.

counties_selected <- counties %>%
  select(region, state, county, population, income)

Instruction:

Calculate the average income (as average_income) of counties within each region and state (notice the group_by() has already been done for you).
Find the highest income state in each region.

counties_selected %>%
  group_by(region, state) %>%
  # Calculate average income
  summarize(average_income = mean(income)) %>%
  # Find the highest income state in each region
  top_n(1, average_income)
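The “peeling off” can be checked with group_vars(), which reports the grouping variables that remain. A minimal sketch with a hypothetical region/state table (made-up incomes): after summarize() the result is grouped by region only, which is why the following top_n() picks one state per region. Recent dplyr versions print a message about this regrouping by default; passing .groups = "drop_last" to summarize() states the same behavior explicitly and silences the message.

```r
library(dplyr)

# Hypothetical counties with made-up incomes
toy_counties <- tibble(
  region = c("West", "West", "West", "South"),
  state  = c("Utah", "Utah", "Idaho", "Texas"),
  income = c(60000, 70000, 50000, 55000)
)

state_income <- toy_counties %>%
  group_by(region, state) %>%
  # Drops the last grouping variable (state); the result stays grouped by region
  summarize(average_income = mean(income), .groups = "drop_last")

group_vars(state_income)  # "region"

# One state per region: Texas (South) and Utah (West, average 65000)
best_states <- state_income %>%
  top_n(1, average_income)

best_states
```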

2.12 Using summarize, top_n, and count together

In this chapter, you’ve learned to use five dplyr verbs related to aggregation: count(), group_by(), summarize(), ungroup(), and top_n(). In this exercise, you’ll use all of them to answer a question: In how many states do more people live in metro areas than non-metro areas?

Recall that the metro column has one of the two values “Metro” (for high-density city areas) or “Nonmetro” (for suburban and country areas).

counties_selected <- counties %>%
  select(state, metro, population)

Instruction 1:

For each combination of state and metro, find the total population as total_pop.

# Find the total population for each combination of state and metro
counties_selected %>%
  group_by(state, metro) %>%
  summarize(total_pop = sum(population))

Instruction 2:

Extract the most populated row from each state, which will be either Metro or Nonmetro.

# Extract the most populated row for each state
counties_selected %>%
  group_by(state, metro) %>%
  summarize(total_pop = sum(population)) %>%
  top_n(1, total_pop)

Instruction 3:

Ungroup, then count how often Metro or Nonmetro appears to see how many states have more people living in those areas.

# Count the states with more people in Metro or Nonmetro areas
counties_selected %>%
  group_by(state, metro) %>%
  summarize(total_pop = sum(population)) %>%
  top_n(1, total_pop) %>%
  ungroup() %>%
  count(metro)

3. Selecting and Transforming Data

3.1 Selecting (video)

3.2 Selecting columns

Using the select verb, we can answer interesting questions about our dataset by focusing in on related groups of variables. The colon (:) is useful for getting many columns at a time.

Instruction:

Use glimpse() to examine all the variables in the counties table.
Select the columns for state, county, population, and (using a colon) all five of those industry-related variables; there are five consecutive variables in the table related to the industry of people’s work: professional, service, office, construction, and production.
Arrange the table in descending order of service to find which counties have the highest rates of working in the service industry.

# Glimpse the counties table
glimpse(counties)

counties %>%
  # Select state, county, population, and the industry-related columns
  select(state, county, population, professional:production) %>%
  # Arrange service in descending order
  arrange(desc(service))

3.3 Select helpers

In the video you learned about the select helper starts_with(). Another select helper is ends_with(), which finds the columns that end with a particular string.

Instruction:

Select the columns state, county, population, and all those that end with work.
Filter just for the counties where at least 50% of the population is engaged in public work.

counties %>%
  # Select the state, county, population, and those ending with "work"
  select(state, county, population, ends_with("work")) %>%
  # Filter for counties that have at least 50% of people engaged in public work
  filter(public_work >= 50)
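The colon range and the select helpers can be compared side by side on a hypothetical one-row table whose columns mimic the ones named in this chapter (the values are made up and do not matter). The colon selects every column between two names in table order, while starts_with(), ends_with(), and contains() match on parts of the column names.

```r
library(dplyr)

# Hypothetical one-row table; only the column names matter here
toy_counties <- tibble(
  state = "Ohio", county = "A",
  professional = 30, service = 20, office = 25, construction = 10, production = 15,
  private_work = 80, public_work = 15, self_employed = 5
)

# Every column from professional through production, in table order
range_cols <- names(select(toy_counties, professional:production))

# Columns whose names end in "work"
work_cols <- names(select(toy_counties, ends_with("work")))

# Columns whose names start with "p"
p_cols <- names(select(toy_counties, starts_with("p")))

# Columns whose names contain "employ"
employ_cols <- names(select(toy_counties, contains("employ")))

work_cols
```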

3.4 The rename verb (video)

3.5 Renaming a column after count

The rename() verb is often useful for changing the name of a column that comes out of another verb, such as count(). In this exercise, you’ll rename the n column from count() (which you learned about in Chapter 2) to something more descriptive.

Instruction 1:

Use count() to determine how many counties are in each state.

# Count the number of counties in each state
counties %>%
  count(state)

Instruction 2:

Notice the n column in the output; use rename() to rename that to num_counties.

# Rename the n column to num_counties
counties %>%
  count(state) %>%
  rename(num_counties = n)

3.6 Rename a column as part of a select

rename() isn’t the only way you can choose a new name for a column: you can also choose a name as part of a select().

Instruction:

Select the columns state, county, and poverty from the counties dataset; in the same step, rename the poverty column to poverty_rate.

# Select state, county, and poverty as poverty_rate
counties %>%
  select(state, county, poverty_rate = poverty)
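The two approaches differ in what happens to the other columns: rename() keeps all of them, while renaming inside select() keeps only the columns you list. A minimal sketch on a hypothetical one-row table (made-up values):

```r
library(dplyr)

# Hypothetical one-row table with made-up values
toy_counties <- tibble(state = "Ohio", county = "A", poverty = 14.2, income = 50000)

# rename(): every column is kept, one of them under a new name
renamed <- toy_counties %>% rename(poverty_rate = poverty)

# select() with new_name = old_name: only the listed columns survive
selected <- toy_counties %>% select(state, county, poverty_rate = poverty)

names(renamed)
names(selected)
```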

3.7 The transmute verb (video)

3.8 Choosing among verbs

3.9 Using transmute

As you learned in the video, the transmute verb allows you to control which variables you keep, which variables you calculate, and which variables you drop.

Instruction:

Keep only the state, county, and population columns, and add a new column, density, that contains the population per land_area.
Filter for only counties with a population greater than one million.
Sort the table in ascending order of density.

counties %>%
  # Keep the state, county, and population columns, and add a density column
  transmute(state, county, population, density = population / land_area) %>%
  # Filter for counties with a population greater than one million
  filter(population > 1000000) %>%
  # Sort density in ascending order
  arrange(density)
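mutate() and transmute() compute new columns the same way; they differ only in which columns they return. A minimal sketch on a hypothetical one-row table (made-up values, giving a density of 5000 / 250 = 20 people per square mile). Recent dplyr releases (1.1.0 and later) document transmute() as superseded in favor of mutate(.keep = "none"), but it still works as shown.

```r
library(dplyr)

# Hypothetical one-row table with made-up values
toy_counties <- tibble(state = "Ohio", county = "A", population = 5000,
                       land_area = 250, income = 50000)

# mutate(): all original columns plus density
mutated <- toy_counties %>%
  mutate(density = population / land_area)

# transmute(): only the columns named in the call
transmuted <- toy_counties %>%
  transmute(state, county, population, density = population / land_area)

names(mutated)
names(transmuted)
transmuted$density  # 20
```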

3.10 Matching verbs to their definitions

3.11 Choosing among the four verbs

In this chapter you’ve learned about the four verbs: select, mutate, transmute, and rename. Here, you’ll choose the appropriate verb for each situation. You won’t need to change anything inside the parentheses.

Instruction:

Choose the right verb for changing the name of the unemployment column to unemployment_rate.
Choose the right verb for keeping only the columns state, county, and the ones containing poverty.
Calculate a new column called fraction_women with the fraction of the population made up of women, without dropping any columns.
Keep only three columns: the state, county, and employed / population, which you’ll call employment_rate.

# Change the name of the unemployment column
counties %>%
  rename(unemployment_rate = unemployment)

# Keep the state and county columns, and the columns containing poverty
counties %>%
  select(state, county, contains("poverty"))

# Calculate the fraction_women column without dropping the other columns
counties %>%
  mutate(fraction_women = women / population)

# Keep only the state, county, and employment_rate columns
counties %>%
  transmute(state, county, employment_rate = employed / population)

4. Case Study: The babynames Dataset

4.1 The babynames data (video)

4.2 Filtering and arranging for one year

The dplyr verbs you’ve learned are useful for exploring data. For instance, you could find out the most common names in a particular year.

Instruction:

Filter for only the year 1990.
Sort the table in descending order of the number of babies born.

babynames %>%
  # Filter for the year 1990
  filter(year == 1990) %>%
  # Sort the number column in descending order
  arrange(desc(number))

4.3 Using top_n with babynames

You saw that you could use filter() and arrange() to find the most common names in one year. However, you could also use group_by and top_n to find the most common name in every year.

Instruction:

Use group_by and top_n to find the most common name for US babies in each year.

# Find the most common name in each year
babynames %>%
  group_by(year) %>%
  top_n(1, number)

4.4 Visualizing names with ggplot2

The dplyr package is very useful for exploring data, but it’s especially useful when combined with other tidyverse packages like ggplot2.

Instruction 1:

Filter for only the names Steven, Thomas, and Matthew, and assign it to an object called selected_names.

# Filter for the names Steven, Thomas, and Matthew
selected_names <- babynames %>%
  filter(name %in% c("Steven", "Thomas", "Matthew"))

Instruction 2:

Visualize those three names as a line plot over time, with each name represented by a different color.

library(ggplot2)

# Plot the names using a different color for each name
ggplot(selected_names, aes(x = year, y = number, color = name)) +
  geom_line()

4.5 Grouped mutates (video)

4.6 Finding the year each name is most common

In an earlier video, you learned how to filter for a particular name to determine the frequency of that name over time. Now, you’re going to explore which year each name was the most common.

To do this, you’ll be combining the grouped mutate approach with a top_n.

Instruction:

Complete the code so that it finds the year each name is most common.

# Find the year each name is most common
babynames %>%
  group_by(year) %>%
  mutate(year_total = sum(number)) %>%
  ungroup() %>%
  mutate(fraction = number / year_total) %>%
  group_by(name) %>%
  top_n(1, fraction)
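Dividing by the yearly total is what makes years comparable: a name can have its largest raw count in a year when it made up a smaller share of all babies. A minimal sketch with a hypothetical miniature babynames table (made-up counts, with 100 babies in 2000 and 200 in 2010) shows Ben’s raw count peaking in 2010 while his fraction peaks in 2000:

```r
library(dplyr)

# Hypothetical miniature babynames table with made-up counts
toy_babynames <- tibble(
  year   = c(2000, 2000, 2010, 2010),
  name   = c("Amy", "Ben", "Amy", "Ben"),
  number = c(30, 70, 100, 100)
)

peak_years <- toy_babynames %>%
  group_by(year) %>%
  # Total babies per year (100 in 2000, 200 in 2010)
  mutate(year_total = sum(number)) %>%
  ungroup() %>%
  # Share of that year's babies: 0.3, 0.7, 0.5, 0.5
  mutate(fraction = number / year_total) %>%
  group_by(name) %>%
  # Keep each name's highest-fraction year
  top_n(1, fraction)

peak_years
```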

4.7 Adding the total and maximum for each name

In the video, you learned how you could group by the year and use mutate() to add a total for that year.

In these exercises, you’ll learn to normalize by a different, but also interesting, metric: you’ll divide each name’s count by the maximum count for that name. This means that every name will peak at 1.

Once you add new columns, the result will still be grouped by name. This splits it into 48,000 groups, which actually makes later steps like mutates slower.

Instruction 1:

Use a grouped mutate to add two columns:

name_total, with the total number of babies born with that name in the entire dataset.
name_max, with the highest number of babies born in any year.

# Add columns name_total and name_max for each name
babynames %>%
  group_by(name) %>%
  mutate(name_total = sum(number),
         name_max = max(number))

Instruction 2:

Add another step to ungroup the table.
Add a column called fraction_max, with the number in the year divided by the maximum for that name.

babynames %>%
  group_by(name) %>%
  mutate(name_total = sum(number),
         name_max = max(number)) %>%
  # Ungroup the table
  ungroup() %>%
  # Add the fraction_max column containing the number by the name maximum
  mutate(fraction_max = number / name_max)
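Dividing by name_max rescales every name so that its best year equals exactly 1, whatever its absolute popularity. A minimal sketch with a hypothetical miniature babynames table (made-up counts):

```r
library(dplyr)

# Hypothetical miniature babynames table with made-up counts
toy_babynames <- tibble(
  year   = c(2000, 2010, 2000, 2010),
  name   = c("Amy", "Amy", "Ben", "Ben"),
  number = c(50, 200, 80, 40)
)

toy_normalized <- toy_babynames %>%
  group_by(name) %>%
  mutate(name_total = sum(number),    # Amy 250, Ben 120
         name_max = max(number)) %>%  # Amy 200, Ben 80
  ungroup() %>%
  mutate(fraction_max = number / name_max)

toy_normalized$fraction_max  # 0.25 1.00 1.00 0.50
```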

4.8 Visualizing the normalized change in popularity

You picked a few names and calculated each of them as a fraction of their peak. This is a type of “normalizing” a name, where you’re focused on the relative change within each name rather than the overall popularity of the name.

In this exercise, you’ll visualize the normalized popularity of each name. Your work from the previous exercise, names_normalized, has been provided for you.

names_normalized <- babynames %>%
  group_by(name) %>%
  mutate(name_total = sum(number),
         name_max = max(number)) %>%
  ungroup() %>%
  mutate(fraction_max = number / name_max)

Instruction:

Filter the names_normalized table to limit it to the three names Steven, Thomas, and Matthew.
Visualize fraction_max for those names over time.

# Filter for the names Steven, Thomas, and Matthew
names_filtered <- names_normalized %>%
  filter(name %in% c("Steven", "Thomas", "Matthew"))

# Visualize these names over time
ggplot(names_filtered, aes(x = year, y = fraction_max, color = name)) +
  geom_line()

4.9 Window functions (video)

4.10 Using ratios to describe the frequency of a name

In the video, you learned how to find the difference in the frequency of a baby name between consecutive years. What if instead of finding the difference, you wanted to find the ratio?

You’ll start with the babynames_fraction data, which already includes a fraction column, so that you can consider the popularity of each name within each year.

Instruction:

Arrange the data in ascending order of name and then year.
Group by name so that your mutate works within each name.
Add a column ratio containing the ratio between each year.

babynames_fraction %>%
  # Arrange the data in order of name, then year
  arrange(name, year) %>%
  # Group the data by name
  group_by(name) %>%
  # Add a ratio column: each year's fraction divided by the previous year's
  mutate(ratio = fraction / lag(fraction))
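lag() returns the previous value in the current row order, so both the arrange() and the group_by() matter: each name’s first year gets NA (there is no earlier year to compare with), and the lag never reaches back into the previous name’s rows. A minimal sketch with a hypothetical babynames_fraction-style table (made-up fractions, deliberately out of order). The lag() here is dplyr’s; without library(dplyr) attached, lag would resolve to stats::lag(), which behaves differently.

```r
library(dplyr)

# Hypothetical babynames_fraction-style table with made-up fractions, out of order
toy_fraction <- tibble(
  name     = c("Amy", "Amy", "Amy", "Ben", "Ben"),
  year     = c(2002, 2000, 2001, 2000, 2001),
  fraction = c(0.004, 0.001, 0.002, 0.003, 0.003)
)

toy_ratios <- toy_fraction %>%
  # Put each name's years in chronological order
  arrange(name, year) %>%
  # Keep lag() from crossing from one name into the next
  group_by(name) %>%
  # This year's fraction divided by the previous year's
  mutate(ratio = fraction / lag(fraction))

toy_ratios$ratio  # NA 2 2 NA 1
```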

4.11 Biggest jumps in a name

Previously, you added a ratio column to describe the ratio of the frequency of a baby name between consecutive years to describe the changes in the popularity of a name. Now, you’ll look at a subset of that data, called babynames_ratios_filtered, to look further into the names that experienced the biggest jumps in popularity in consecutive years.

babynames_ratios_filtered <- babynames_fraction %>%
  arrange(name, year) %>%
  group_by(name) %>%
  mutate(ratio = fraction / lag(fraction)) %>%
  filter(fraction >= 0.00001)

Instruction:

From each name in the data, keep the observation (the year) with the largest ratio; note the data is already grouped by name.
Sort the ratio column in descending order.
Filter the babynames_ratios_filtered data further by filtering the fraction column to only display results greater than or equal to 0.001.

babynames_ratios_filtered %>%
  # Extract the largest ratio from each name
  top_n(1, ratio) %>%
  # Sort the ratio column in descending order
  arrange(desc(ratio)) %>%
  # Filter for fractions greater than or equal to 0.001
  filter(fraction >= 0.001)

4.12 Congratulations!

你可能感兴趣的:(r语言,dplyr)

R语言中的函数32：seq_along() zoujiahui_2018 #R语言中的函数 r语言开发语言
介绍seq_along函数在R语言中用于生成一个整数序列，其长度与给定对象的长度相同。这个函数特别有用，当你想要创建一个索引序列来遍历一个向量或列表时。用法seq_along(x)参数x:任何R对象（如向量、列表等）。返回值:返回一个从1到x的长度的整数序列。示例#创建一个向量vec<-c("a","b","c")#使用seq_along生成索引indices<-seq_along(vec)pri
使用R语言绘制山脊图的ggridges包心之飞翼 r语言开发语言 R语言
使用R语言绘制山脊图的ggridges包山脊图（ridgeplot）是一种用于可视化多个分布或变量之间关系的图表类型。在R语言中，可以使用ggridges包来创建漂亮的山脊图。本文将介绍如何使用ggridges包绘制山脊图，并提供相应的源代码供参考。首先，确保已经安装了ggridges包。可以使用以下代码来安装：install.packages("ggridges")安装完毕后，加载ggridge
Anaconda3 介绍和安装 gorgor在码农 #python入门基础 python conda
介绍Anaconda是一个开源的Python和R语言发行版，专注于数据科学、机器学习和科学计算，主要面向数据科学和机器学习领域。它集成了大量常用的科学计算库（如NumPy、Pandas、Matplotlib、Scikit-learn等），并提供了强大的包管理工具Conda和环境管理功能，适合快速部署和管理复杂的开发环境。特点：预装丰富库：包含250+常用的数据科学工具包，无需手动安装。跨平台支持：
$ operator is invalid for atomic vectors什么意思滚菩提哦呢
"$operatorisinvalidforatomicvectors"意思是在对原子向量使用"$"操作符时是无效的。"$"操作符是R语言中用于访问数据框(dataframe)中的列的常用操作符。但是，原子向量(atomicvector)是R中的一种基本数据类型，它是一个长度固定的向量，并且所有元素都是相同的数据类型。因此，在对原子向量使用"$"操作符时是无效的，因为原子向量没有列的概念。例如，下
5-R循环 qwy715229258163 R语言 r语言 python 算法
R循环有的时候，我们可能需要多次执行同一块代码。一般情况下，语句是按顺序执行的：函数中的第一个语句先执行，接着是第二个语句，依此类推。编程语言提供了更为复杂执行路径的多种控制结构。循环语句允许我们多次执行一个语句或语句组，下面是大多数编程语言中循环语句的流程图：R语言提供的循环类型有:repeat循环while循环for循环R语言提供的循环控制语句有：break语句Next语句循环控制语句改变你代
R语言可视化散点图实战：为每一个数据点都绘制指示线段或者都不绘制、ggrepel包 statistics.insight r语言开发语言数据挖掘机器学习
R语言可视化散点图实战：为每一个数据点都绘制指示线段或者都不绘制、ggrepel包目录R语言可视化散点图（scatterplot）、为每一个数据点都绘制指示线段或者都不绘制、ggrepel包来帮忙#ggrepel包的安装和加载#为每一个数据点都绘制指示线段或者都不绘制#文本标签相互排斥，远离数据点，远离绘图区域（面板）的边缘。#ggrepel包的安装和加载#从CRAN安装install.packa
三菱PLC大型项目实战指南：从零基础到成功实施 Mountain and sea 三菱plc入门系列学习自动化
三菱PLC大型项目实战指南：从零基础到成功实施作为一名刚入门的电气工程师，想要通过一个大型项目来实践三菱PLC可能会感到有些挑战，但这是一个非常有意义的过程。以下将详细介绍如何从零基础开始，一步步完成一个大型项目，并最终成功实施。一、前期准备学习基础知识了解PLC的基本组成：首先，熟悉三菱PLC的基本结构，包括中央处理单元（CPU）、程序存储器、数据存储器和输入输出端口。掌握Ladder语言：三菱
22章9节：使用 R Markdown 和 Shiny 结合R语言进行数据报告和交互式应用的创建 DAT｜R科学用R探索医药数据科学 r语言开发语言大数据人工智能 r语言-4.2.1
R语言是数据科学领域中广泛应用的编程语言之一，它的强大之处不仅在于数据分析能力，还体现在其丰富的可视化和报告生成功能上。在数据分析的过程中，生成报告、展示结果和与他人共享工作成果是非常重要的任务。Shiny是一个用于构建交互式Web应用的R包，它能够将R语言的分析能力与动态、互动的Web界面结合起来，允许用户与数据交互、实时更新结果。在本文中，我们将探讨如何使用RMarkdown和Shiny结合R
4-R判断语句 qwy715229258163 R语言 r语言 python 开发语言
R判断语句判断结构要求程序员指定一个或多个要评估或测试的条件，以及条件为真时要执行的语句（必需的）和条件为假时要执行的语句（可选的）。下面是大多数编程语言中典型的判断结构的一般形式：R语言提供了以下类型的判断语句：if语句if…else语句switch语句1.if语句一个if语句由一个布尔表达式后跟一个或多个语句组成。语法格式如下：if(boolean_expression){//布尔表达式为真将
ProtoBuf 官方文档（二）- 语法指引（proto2） n大橘为重n C++ProtoBuf protobuf rpc 序列化数据结构
翻译查阅外网资料过程中遇到的比较优秀的文章和资料，一是作为技术参考以便日后查阅，二是训练英文能力。此文翻译自ProtocolBuffers官方文档LanguageGuide部分翻译为意译，不会照本宣科的字字对照翻译以下为原文内容翻译语法指引（proto2）本指南介绍如何使用protocolbuffer语言来构造protocolbuffer数据，包括.proto文件语法以及如何从.proto文件生成
R语言机器学习与临床预测模型77--机器学习预测常用R语言包武昌库里写JAVA 面试题汇总与解析 spring log4j java 开发语言算法
R小盐准备介绍R语言机器学习与预测模型的学习笔记你想要的R语言学习资料都在这里，快来收藏关注【科研私家菜】01预测模型常用R包常见回归分析包:rpart包含有分类回归树的方法;earth包可以实现多元自适应样条回归;mgev包含广义加性模型回归;Rweka包中的MSP函数可用于回归。pls包中的plsr函数实现偏最小二乘和主成分回归。stats包中的ppr函数实现投影寻踪分析，同时包括线性回归的方
R语言文本分析天龙八部 waterHBO R语言 r语言开发语言
起因，目的:前面有人对“倚天屠龙记”进行分析，我这里只是进行模仿而已。完整的文件，已经绑定了，反正读者可以找一下。案例背景小说《天龙八部》是金庸先生所著的武侠小说，也是“射雕三部曲”的前传。全书共50章，字数超过一百万字。故事发生在北宋末年，以大理国、大辽、西夏、吐蕃和北宋五国之间的纷争为背景，讲述了乔峰、虚竹、段誉三位主角的江湖恩怨和爱恨情仇。小说中融入了丰富的历史元素和深刻的人生哲理，展现了人
ggalign：热图等复杂组合图及图形数据对齐的 ggplot2 扩展万木春❀ r语言
ggalign一个R语言绘图工具ggplot2的高级扩展，它专注于在多个图形之间对齐观察值，利用vctrs包中的“numberofobservations”或NROW()函数，确保图形组织的一致性。无论是自包含排序图形的对齐，还是在多个图形中应用一致的分组和排序（如k-means聚类），ggalign都可以帮助简化这一过程。文档：Aggplot2ExtensionforConsistentAxis
R语言数据分析案例：使用R进行销售数据分析 ByteWhisper r语言数据分析开发语言 R语言