radar_sun

R-DataCamp-Cleaning Data in R

1. Introduction and Exploring the Data

1.1 Introduction to cleaning data in R (video)

1.2 The data cleaning process

1.3 Here’s what messy data look like

In the final chapter of this course, you will be presented with a messy, real-world dataset containing an entire year’s worth of weather data from Boston, USA. Among other things, you’ll be presented with variables that contain column names, column names that should be values, numbers coded as character strings, and values that are missing, extreme, and downright erroneous!

Instruction:

We’ve placed some R code in the script to the right. Run the code as-is to see just how messy the weather data really are!

# View the first 6 rows of data
head(weather)

# View the last 6 rows of data
tail(weather)

# View a condensed summary of the data
str(weather)

1.4 Here’s what clean data look like

In this course, you will acquire many new tools in your data cleaning toolbox for whipping the weather data into shape!

Instruction:

Run the code provided to see what the weather dataset will look like by the time you are done cleaning it. If it’s not immediately clear what’s changed, don’t worry! You will have a much deeper understanding by the end of this course.

# View the first 6 rows of data
head(weather_clean)

# View the last 6 rows of data
tail(weather_clean)

# View a condensed summary of the data
str(weather_clean)

1.5 Exploring raw data (video)

1.6 Getting a feel for your data

The first thing to do when you get your hands on a new dataset is to understand its structure. There are several ways to go about this in R, each of which may reveal different issues with your data that require attention.

In this course, we are only concerned with data that can be expressed in table format (i.e. two dimensions, rows and columns). As you may recall from earlier courses, tables in R often have the type data.frame. You can check the class of any object in R with the class() function.

Once you know that you are dealing with tabular data, you may also want to get a quick feel for the contents of your data. Before printing the entire dataset to the console, it’s probably worth knowing how many rows and columns there are. The dim() command tells you this.

Instruction:
We’ve loaded a dataset called bmi into your workspace. The data, which give the (age standardized) mean body mass index (BMI) among males in each country for the years 1980-2008, come from the School of Public Health, Imperial College London.

Check the class of bmi.
Find the dimensions of bmi.
Print the bmi column names.

# Check the class of bmi
class(bmi)

# Check the dimensions of bmi
dim(bmi)

# View the column names of bmi
names(bmi)

1.7 Viewing the structure of your data

Since bmi doesn’t have a huge number of columns, you can view a quick snapshot of your data using the str() (for structure) command. In addition to the class and dimensions of your entire dataset, str() will tell you the class of each variable and give you a preview of its contents.

Although we won’t go into detail on the dplyr package in this lesson (see the Data Manipulation in R with dplyr course), the glimpse() function from dplyr is a slightly cleaner alternative to str(). str() and glimpse() give you a preview of your data, which may reveal issues with the way columns are labelled, how variables are encoded, etc.

You can use the summary() command to get a better feel for how your data are distributed, which may reveal unusual or extreme values, unexpected missing data, etc. For numeric variables, this means looking at means, quartiles (including the median), and extreme values. For character or factor variables, you may be curious about the number of times each value appears in the data (i.e. counts), which summary() also reveals.

Instruction:

View the structure of bmi using the traditional method.
Load the dplyr package.
View the structure of bmi using dplyr.
Look at a summary() of bmi.

# Check the structure of bmi
str(bmi)

# Load dplyr
library(dplyr)

# Check the structure of bmi, the dplyr way
glimpse(bmi)

# View a summary of bmi
summary(bmi)

1.8 Exploring raw data (part 2) (video)

1.9 Looking at your data

You can look at all the summaries you want, but at the end of the day, there is no substitute for looking at your data – either in raw table form or by plotting it.

The most basic way to look at your data in R is by printing it to the console. As you may know from experience, the print() command is not even necessary; you can just type the name of the object. The downside to this option is that R will attempt to print the entire dataset, which can be a nuisance if the dataset is too large.

One way around this is to use the head() and tail() commands, which only display the first and last 6 rows of data, respectively. You can view more (or fewer) rows by passing the number of rows you want as a second argument. These functions provide a useful method for quickly getting a sense of your data without overly cluttering the console.

Instruction:

Print the full dataset to the console (you don’t need print() to do this).
View the first 6 rows of bmi.
View the first 15 rows of bmi.
View the last 6 rows of bmi.
View the last 10 rows of bmi.

# Print bmi to the console
print(bmi)

# View the first 6 rows
head(bmi)

# View the first 15 rows
head(bmi, 15)

# View the last 6 rows
tail(bmi)

# View the last 10 rows
tail(bmi, 10)

1.10 Visualizing your data

There are many ways to visualize data. Since this is not a course about data visualization, we will only touch on two types of plots that may be useful for quickly identifying extreme or suspicious values in your data: histograms and scatter plots.

A histogram, created with the hist() function, takes a vector (i.e. column) of data, breaks it up into intervals, then plots as a vertical bar the number of instances within each interval. A scatter plot, created with the plot() function, takes two vectors (i.e. columns) of data and plots them as a series of (x, y) coordinates on a two-dimensional plane.

Let’s look at a quick example of each.

Instruction:

Use hist() to look at the distribution of average BMI across all countries in 2008.
Use plot() to see how each country’s average BMI in 1980 (x-axis) compared with its BMI in 2008 (y-axis).

# Histogram of BMIs from 2008
hist(bmi$Y2008)

# Scatter plot comparing BMIs from 1980 to those from 2008
plot(x = bmi$Y1980, y = bmi$Y2008)

2. Tidying Data

2.1 Introduction to tidy data (video)

2.2 Principle of tidy data

2.3 Common symptoms of messy data

2.4 Introduction to tidyr (video)

2.5 What kind of messy are the BMI data?

2.6 Gathering columns into key-value pairs

The most important function in tidyr is gather(). It should be used when you have columns that are not variables and you want to collapse them into key-value pairs.

The easiest way to visualize the effect of gather() is that it makes wide datasets long. As you saw in the video, running the following command on wide_df will make it long:

gather(wide_df, my_key, my_val, -col)

Experiment with this in the console before attempting the exercise.
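The wide_df from the video is not reproduced in these notes, so here is a minimal sketch with a made-up stand-in. The column names col, A, B and C and all values are assumptions chosen for illustration.

```r
library(tidyr)

# Hypothetical stand-in for the video's wide_df: one identifier column (col)
# plus three columns (A, B, C) whose names are really values of a variable
wide_df <- data.frame(
  col = c("X", "Y"),
  A = c(1, 4),
  B = c(2, 5),
  C = c(3, 6),
  stringsAsFactors = FALSE
)

# Gather every column except col into key-value pairs
long_df <- gather(wide_df, my_key, my_val, -col)
long_df
```

The two rows and four columns become six rows and three columns: col, my_key (the old column names) and my_val (the old cell values). In recent tidyr releases gather() is kept for compatibility and pivot_longer() is the recommended replacement, but the idea is the same.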

Instruction:

Apply the gather() function to bmi, saving the result to bmi_long. This will create two new columns:
- year, containing as values what are currently column headers
- bmi_val, the actual BMI values
View the first 20 rows of bmi_long.

# Apply gather() to bmi and save the result as bmi_long
bmi_long <- gather(bmi, year, bmi_val, -Country)

# View the first 20 rows of the result
head(bmi_long, 20)

2.7 Spreading key-value pairs into columns

The opposite of gather() is spread(), which takes key-value pairs and spreads them across multiple columns. This is useful when values in a column should actually be column names (i.e. variables). It can also make data more compact and easier to read.

The easiest way to visualize the effect of spread() is that it makes long datasets wide. As you saw in the video, running the following command will make long_df wide:

spread(long_df, my_key, my_val)

Experiment with this in the console before attempting the exercise.
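As a rough illustration, the sketch below builds a small long_df by hand and spreads it. The data are invented; only the column names my_key and my_val follow the command above.

```r
library(tidyr)

# Hypothetical long_df: one row per (col, my_key) combination
long_df <- data.frame(
  col = c("X", "X", "X", "Y", "Y", "Y"),
  my_key = c("A", "B", "C", "A", "B", "C"),
  my_val = c(1, 2, 3, 4, 5, 6),
  stringsAsFactors = FALSE
)

# Spread the values of my_key into their own columns, filled from my_val
wide_df <- spread(long_df, my_key, my_val)
wide_df
```

Each distinct value of my_key (A, B, C) becomes a column, so the six rows collapse to one row per value of col.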

Instruction:

# Apply spread() to bmi_long
bmi_wide <- spread(bmi_long, year, bmi_val)

# View the head of bmi_wide
head(bmi_wide)

2.8 Introduction to tidyr (part 2) (video)

2.9 Functions in tidyr

2.10 Separating columns

The separate() function allows you to separate one column into multiple columns. Unless you tell it otherwise, it will attempt to separate on any character that is not a letter or number. You can also set the separator explicitly with the sep argument.

We’ve loaded the small dataset from the video called treatments into your workspace. This dataset obeys the principles of tidy data, but we’d like to split the treatment dates into two separate columns: year and month. This can be accomplished with the following:

separate(treatments, year_mo, c("year", "month"))

Experiment with this in the console before attempting the exercise.
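Since the treatments data are not shown here, the following sketch uses a made-up version. Only the year_mo column name comes from the notes; the other columns and all values are assumptions.

```r
library(tidyr)

# Hypothetical version of treatments
treatments <- data.frame(
  patient = c("X", "Y"),
  year_mo = c("2010-10", "2012-08"),
  response = c(1, 4),
  stringsAsFactors = FALSE
)

# Default behaviour: split on any run of non-alphanumeric characters (here "-")
separate(treatments, year_mo, c("year", "month"))

# Same split with an explicit separator; convert = TRUE turns the pieces into integers
treatments_split <- separate(treatments, year_mo, c("year", "month"),
                             sep = "-", convert = TRUE)
str(treatments_split)
```

Without convert = TRUE the new year and month columns stay character, which is worth remembering before doing arithmetic on them.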

Instruction:
We’ve loaded a dataset called bmi_cc into your workspace that is a slight variation of bmi_long, which you’ve already seen. The Country_ISO column of bmi_cc has the name of each country as well as its two-letter ISO country code, separated by a forward slash.

Apply the separate() function to bmi_cc
- Separate Country_ISO into two columns: Country and ISO
- Be sure to specify the correct separator with the sep argument
- Save the result to a new object called bmi_cc_clean
View the head of the result.

# Apply separate() to bmi_cc
bmi_cc_clean <- separate(bmi_cc, col = Country_ISO, into = c("Country", "ISO"), sep = "/")

# Print the head of the result
head(bmi_cc_clean)

2.11 Uniting columns

The opposite of separate() is unite(), which takes multiple columns and pastes them together. By default, the contents of the columns will be separated by underscores in the new column, but this behavior can be altered via the sep argument.

We’ve loaded the treatments data into your workspace again, but this time the year_mo column has been separated into year and month. The original column can be recreated by putting year and month back together:

unite(treatments, year_mo, year, month)

Experiment with this in the console before attempting the exercise.
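A small sketch of the default and custom separators, again on an invented treatments table (the patient column and the values are assumptions):

```r
library(tidyr)

# Hypothetical treatments data with year and month already split apart
treatments <- data.frame(
  patient = c("X", "Y"),
  year = c("2010", "2012"),
  month = c("10", "08"),
  stringsAsFactors = FALSE
)

# Default separator is an underscore
united_default <- unite(treatments, year_mo, year, month)
united_default$year_mo

# Use sep to get a dash instead
united_dash <- unite(treatments, year_mo, year, month, sep = "-")
united_dash$year_mo
```

By default unite() drops the input columns, so the result here has only patient and year_mo.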

Instruction:
In the last exercise, you separated the Country_ISO column of the bmi_cc dataset into two columns (Country and ISO) and saved the result to bmi_cc_clean. Now you’re going to put the columns back together!

Apply the unite() function to bmi_cc_clean
- Reunite the Country and ISO columns into a single column called Country_ISO
- Separate each country name and code with a dash (-)
- Save the result as bmi_cc
View the head of the result.

# Apply unite() to bmi_cc_clean
bmi_cc <- unite(bmi_cc_clean, Country_ISO, Country, ISO, sep = "-")

# View the head of the result
head(bmi_cc)

2.12 Column headers are values, not variable names

You saw earlier in the chapter how we sometimes come across datasets where column names are actually values of a variable (e.g. months of the year). This is often the case when working with repeated measures data, where measurements are taken on subjects of interest on multiple occasions over time. The gather() function is helpful in these situations.

tidyr and dplyr are already loaded for you.

Instruction:

View the head of census.
Gather the month columns, creating two new columns (month and amount), saving the result to census2.
Run the code given to arrange() the rows of census2 by the YEAR column.
View the first 20 rows of the result.

# View the head of census
head(census)

# Gather the month columns
census2 <- gather(census, month, amount, -YEAR)

# Arrange rows by YEAR using dplyr's arrange
census2_arr <- arrange(census2, YEAR)

# View first 20 rows of census2_arr
head(census2_arr, 20)

2.13 Variables are stored in both rows and columns

Sometimes you’ll run into situations where variables are stored in both rows and columns. To illustrate this, we’ve loaded the pets dataset from the video, which tells us in a convoluted way how many birds, cats, and dogs Jason, Lisa, and Terrence have. Print the pets dataset to see for yourself.

Although it may not be immediately obvious, if we treat the values in the type column as variables and create a separate column for each of them, we can set things straight. To do this, we use the spread() function. Run the following code to see for yourself:

spread(pets, type, num)

The result shows the exact same information in a much clearer way! Notice that the spread() function took in three arguments. The first argument takes the name of your messy dataset (pets), the second argument takes the name of the column to spread into new columns (type), and the third argument takes the column that contains the value with which to fill in the newly spread out columns (num).
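To make the three arguments concrete, here is a sketch on a hand-built pets table. The counts are invented, and one row (Lisa's dogs) is left out on purpose to show what spread() does with a missing combination.

```r
library(tidyr)

# Hypothetical reconstruction of pets; the counts are made up for illustration
pets <- data.frame(
  name = c("Jason", "Jason", "Jason", "Lisa", "Lisa",
           "Terrence", "Terrence", "Terrence"),
  type = c("bird", "cat", "dog", "bird", "cat", "bird", "cat", "dog"),
  num = c(0, 2, 1, 1, 0, 1, 1, 2),
  stringsAsFactors = FALSE
)

# data, then the column to spread (key), then the column that fills the cells (value)
pets_wide <- spread(pets, key = type, value = num)
pets_wide

# A combination absent from the input becomes NA unless fill says otherwise
pets_filled <- spread(pets, key = type, value = num, fill = 0)
pets_filled
```

Whether a missing combination should be read as zero or as unknown depends on the data, so fill = 0 is a choice to make deliberately rather than a default.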

Now let’s try this on a new messy dataset census_long. What information does this tell us?

tidyr and dplyr are already loaded for you.

Instruction:

View the first 50 rows of census_long.
Decide which column of census_long would be best to spread, and which column of census_long would be best to display in the newly spread out columns. Use the spread() function accordingly and save the result to census_long2.
View the first 20 rows of census_long2.

# View first 50 rows of census_long
head(census_long,50)

# Spread the type column
census_long2 <- spread(census_long, type, amount)

# View first 20 rows of census_long2
head(census_long2,20)

2.14 Multiple values are stored in one column

It’s also fairly common that you will find two variables stored in a single column of data. These variables may be joined by a separator like a dash, underscore, space, or forward slash.

The separate() function comes in handy in these situations. To practice using it, we have created a slight modification of last exercise’s result. Keep in mind that the into argument, which specifies the names of the 2 new columns being formed, must be given as a character vector (e.g. c("column1", "column2")).

tidyr and dplyr are already loaded for you.

Instruction:

View the head of census_long3.
Use tidyr’s separate() to split the yr_month column into two separate variables: year and month, saving the result to census_long4.
View the first 6 rows of result.

# View the head of census_long3
head(census_long3)

# Separate the yr_month column into two
census_long4 <- separate(census_long3, yr_month, c("year", "month") )

# View the first 6 rows of the result
head(census_long4)

3. Preparing Data for Analysis

3.1 Type conversions (video)

3.2 Types of variables in R

As in other programming languages, R is capable of storing data in many different formats, most of which you’ve probably seen by now.

Loosely speaking, the class() function tells you what type of object you’re working with. (There are subtle differences between the class, type, and mode of an object, but these distinctions are beyond the scope of this course.)

3.3 Common type conversions

It is often necessary to change, or coerce, the way that variables in a dataset are stored. This could be because of the way they were read into R (with read.csv(), for example) or perhaps the function you are using to analyze the data requires variables to be coded a certain way.

Only certain coercions are allowed, but the rules for what works are generally pretty intuitive. For example, trying to convert a character string that does not look like a number, as in as.numeric("some text"), returns NA with a warning rather than a number.

There are a few less intuitive results. For example, under the hood, the logical values TRUE and FALSE are coded as 1 and 0, respectively. Therefore, as.logical(1) returns TRUE and as.numeric(TRUE) returns 1.
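Running the coercions mentioned above in base R shows what actually comes back. This is a quick sketch; the variable name res is just a placeholder.

```r
# A string that does not look like a number becomes NA, and R emits a warning
res <- suppressWarnings(as.numeric("some text"))
res
is.na(res)

# Strings that do look like numbers convert cleanly
as.numeric("3.14")

# Logical and numeric values map onto each other
as.logical(1)
as.numeric(TRUE)
as.logical(0)

# Because TRUE counts as 1, summing a logical vector counts the TRUE values
sum(c(TRUE, FALSE, TRUE))
```

The last line is the reason expressions like sum(is.na(x)) work as a count of missing values.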

Instruction:
We’ve loaded a dataset called students into your workspace. These data provide information on 395 students including their grades in three classes (in the Grades column, separated by /).

Use str() to preview students and see the class of each variable.
Coerce the following columns:
- Grades to character
- Medu to factor (categorical variable representing mother’s education level)
- Fedu to factor (categorical variable representing father’s education level)
- Use str() again to see the changes to students.

# Preview students with str()
str(students)

# Coerce Grades to character
students$Grades <- as.character(students$Grades)

# Coerce Medu to factor
students$Medu <- as.factor(students$Medu)

# Coerce Fedu to factor
students$Fedu <- as.factor(students$Fedu)
    
# Look at students once more with str()
str(students)

3.4 Working with dates

Dates can be a challenge to work with in any programming language, but thanks to the lubridate package, working with dates in R isn’t so bad. Since this course is about cleaning data, we only cover the most basic functions from lubridate to help us standardize the format of dates and times in our data.

As you saw in the video, these functions combine the letters y, m, d, h, m, s, which stand for year, month, day, hour, minute, and second, respectively. The order of the letters in the function should match the order of the date/time you are attempting to read in, although not all combinations are valid. Notice that the functions are “smart” in that they are capable of parsing multiple formats.
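A short sketch of how the letter order maps onto the input, using example strings invented for this note (month names assume an English locale):

```r
library(lubridate)

# The function name gives the order of the components in the input
ymd("2015-09-17")
dmy("17 Sep 2015")
mdy("September 17, 2015")

# The same function copes with different separators
d <- dmy(c("17-09-2015", "17/9/2015", "17.09.2015"))
d

# Add _h, _hm or _hms when a time is present
dt <- mdy_hm("July 15, 2012 12:56")
dt
```

The date-only functions return Date objects, while the variants with a time part return date-times, in the UTC time zone unless a tz argument is supplied.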

Instruction:
We have loaded a dataset called students2 into your workspace. students2 is similar to students, except now instead of an age for each student, we have a (hypothetical) date of birth in the dob column. There’s another new column called nurse_visit, which gives a timestamp for each student’s most recent visit to the school nurse.

Preview students2 with str(). Notice that dob and nurse_visit are both stored as character.
Load the lubridate package.
Print “17 Sep 2015” as a date.
Print “July 15, 2012 12:56” as a date and time (note there are hours and minutes, but no seconds!).
Coerce dob to a date (with no time).
Coerce nurse_visit to a date and time.
Use str() to see the changes to students2.

# Preview students2 with str()
str(students2)

# Load the lubridate package
library(lubridate)

# Parse as date
dmy("17 Sep 2015")

# Parse as date and time (with no seconds!)
mdy_hm("July 15, 2012 12:56")

# Coerce dob to a date (with no time)
students2$dob <- ymd(students2$dob)

3.5 String manipulation (video)

3.6 Trimming and padding strings

One common issue that comes up when cleaning data is the need to remove leading and/or trailing white space. The str_trim() function from stringr makes it easy to do this while leaving intact the part of the string that you actually want.

str_trim(" this is a test ")
[1] "this is a test"

A similar issue is when you need to pad strings to make them a certain number of characters wide. One example is if you had a bunch of employee ID numbers, some of which begin with one or more zeros. When reading these data in, you find that the leading zeros have been dropped somewhere along the way (probably because the variable was read as numeric, in which case leading zeros carry no meaning).

str_pad("24493", width = 7, side = "left", pad = "0")
[1] "0024493"

Instruction:

Load the stringr package.
Trim all leading and trailing white space from the first set of strings.
Pad the second set of strings with leading zeros such that all are 9 characters in length.

# Load the stringr package
library(stringr)

# Trim all leading and trailing whitespace
str_trim(c("   Filip ", "Nick  ", " Jonathan"))

# Pad these strings with leading zeros
str_pad(c("23485W", "8823453Q", "994Z"), width = 9, side = "left", pad = "0")

3.7 Upper and lower case

In addition to trimming and padding strings, you may need to adjust their case from time to time. Making strings uppercase or lowercase is very straightforward in (base) R thanks to toupper() and tolower(). Each function takes exactly one argument: the character string (or vector/column of strings) to be converted to the desired case.

Instruction:
There’s a vector of state abbreviations called states in your workspace, but there’s a problem…it’s all lowercase. It’s more common for state abbreviations to be all uppercase.

Print states to the console.
Make states all uppercase and save the result to states_upper.
Make states_upper all lowercase again, but don’t save the result.

# Print state abbreviations
print(states)

# Make states all uppercase and save result to states_upper
states_upper <- toupper(states)

# Make states_upper all lowercase again
tolower(states_upper)

3.8 Finding and replacing strings

The stringr package provides two functions that are very useful for finding and/or replacing patterns in strings: str_detect() and str_replace().

Like all functions in stringr, the first argument of each is the string of interest. The second argument of each is the pattern of interest. In the case of str_detect(), this is the pattern we are searching for. In the case of str_replace(), this is the pattern we want to replace. Finally, str_replace() has a third argument, which is the string to replace with.

str_detect(c("banana", "kiwi"), "a")
[1] TRUE FALSE
str_replace(c("banana", "kiwi"), "a", "o")
[1] "bonana" "kiwi"
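Note that only the first "a" in "banana" was replaced. The sketch below puts str_replace() next to str_replace_all(), which is also part of stringr, to show the difference; the fruit vector name is just a placeholder.

```r
library(stringr)

fruit <- c("banana", "kiwi")

# str_detect() returns one logical per string
str_detect(fruit, "a")

# str_replace() changes only the first match in each string
str_replace(fruit, "a", "o")

# str_replace_all() changes every match
str_replace_all(fruit, "a", "o")
```

Patterns are interpreted as regular expressions by default, so characters such as . or ( need escaping (or wrapping the pattern in fixed()) when they should be matched literally.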

The data.frame students2 is already available for you in the workspace. stringr is already loaded. students3 is a copy of it for you to work on so you can always start from scratch if you happen to make a mistake.

Instruction:
The students2 dataset from earlier in the chapter has been loaded for you again.

Look at the head() of students3 to remind yourself of how it looks.
Detect all dates of birth (dob) in 1997 using str_detect(). This should return a vector of TRUE and FALSE values.
Replace all instances of "F" with "Female" in students3$sex.
Replace all instances of "M" with "Male" in students3$sex.
View the head() of students3 to see the result of these replacements.

# Copy of students2: students3
students3 <- students2

# Look at the head of students3
head(students3)

# Detect all dates of birth (dob) in 1997
str_detect(students3$dob, "1997")

# In the sex column, replace "F" with "Female" ...
students3$sex <- str_replace(students3$sex, "F","Female")

# ... and "M" with "Male"
students3$sex <- str_replace(students3$sex, "M","Male")

# View the head of students3
head(students3)

3.9 Missing and special values (video)

3.10 Types of missing and special values in R

3.11 Finding missing values

As you’ve seen, missing values in R should be represented by NA, but unfortunately you will not always be so lucky. Before you can deal with missing values, you have to find them in the data.

If missing values are properly coded as NA, the is.na() function will help you find them. Otherwise, if your dataset is too big to just look at the whole thing, you may need to try searching for some of the usual suspects like "", "#N/A", etc. You can also use the summary() and table() functions to turn up unexpected values in your data.

In this exercise, we’ve created a simple dataset called social_df that has 3 pieces of information for each of four friends:

Name
Number of friends on a popular social media platform
Current “status” on the platform
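The contents of social_df are not printed in these notes, so the sketch below uses a hand-built approximation. The column names name and n_friends and all of the values are assumptions; only the status column name and the general shape follow the description above.

```r
# Hypothetical reconstruction of social_df; the values are invented
social_df <- data.frame(
  name = c("Sarah", "Tom", "David", "Alice"),
  n_friends = c(244, NA, 145, 43),
  status = c("Going out!", "", "Movie night...", ""),
  stringsAsFactors = FALSE
)

# Logical matrix with TRUE wherever a value is NA
is.na(social_df)

# Is anything missing at all, and how many NAs are there per column?
any(is.na(social_df))
colSums(is.na(social_df))

# Empty strings are not NA, so is.na() misses them; table() exposes them
table(social_df$status)
sum(social_df$status == "")
```

Here is.na() finds one missing value, while the two empty strings in status only show up when the values are tabulated or compared against "" directly.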

Instruction:

Call is.na() on social_df to spot all NA values.
Wrap the above with the any() function to ask the question “Are there any NA values in my dataset?”.
View a summary() of the dataset to see how missing values are broken out.
Use table() to identify odd values of the status variable.

# Call is.na() on the full social_df to spot all NAs
is.na(social_df)

# Use the any() function to ask whether there are any NAs in the data
any(is.na(social_df))

# View a summary() of the dataset
summary(social_df)

# Call table() on the status column
table(social_df$status)

3.12 Dealing with missing values

Missing values can be a rather complex subject, but here we’ll only look at the simple case where you want to normalize and/or remove all missing values from your data. For more information on why this is not always the best strategy, search online for “missing not at random.”

Looking at the social_df dataset again, we asked around a bit and figured out what’s causing the missing values that you saw in the last exercise. Tom doesn’t have a social media account on this particular platform, which explains why his number of friends and current status are missing (although coded in two different ways). Alice is on the platform, but is a passive user and never sets her status, hence the reason it’s missing for her.
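As a sketch of the options, the code below recodes the empty strings and then compares dropping every incomplete row with dropping rows based on a single column. The social_df built here is a hypothetical approximation with invented values and assumed column names (name, n_friends).

```r
# Hypothetical reconstruction of social_df; the values are invented
social_df <- data.frame(
  name = c("Sarah", "Tom", "David", "Alice"),
  n_friends = c(244, NA, 145, 43),
  status = c("Going out!", "", "Movie night...", ""),
  stringsAsFactors = FALSE
)

# Recode empty strings as proper missing values
social_df$status[social_df$status == ""] <- NA

# TRUE for rows with no missing value in any column
complete.cases(social_df)

# Drop every row that has at least one NA
strict <- na.omit(social_df)
nrow(strict)

# Drop rows only when n_friends is missing, keeping users with no status
lenient <- social_df[!is.na(social_df$n_friends), ]
nrow(lenient)
```

The strict version loses Alice even though her friend count is known, which is one small example of why removing all incomplete rows is not always the right call.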

The stringr package is preloaded.

Instruction:

Replace all empty strings (i.e. "") with NA in the status column of social_df.
Print the updated version of social_df to confirm your changes.
Use complete.cases() to return a vector containing TRUE and FALSE to see which rows have NO missing values.
Use na.omit() to remove all rows with one or more missing values (without saving the result).

# Replace all empty strings in status with NA
social_df$status[social_df$status == ""] <- NA

# Print social_df to the console
print(social_df)

# Use complete.cases() to see which rows have no missing values
complete.cases(social_df)

# Use na.omit() to remove all rows with any missing values
na.omit(social_df)

3.13 Outliers and obvious errors (video)

3.14 Identifying outliers and obvious errors

3.15 Dealing with outliers and obvious errors

When dealing with strange values in your data, you often must decide whether they are just extreme or actually erroneous. Extreme values show up all over the place, but you, the data analyst, must figure out when they are plausible and when they are not.

We have loaded a dataset called students3, which is another slight variation of the original students dataset. Two variables appear to have suspicious values: age and absences. Let’s explore these values further.

Instruction:

Call summary() on the full students3 dataset to expose the concerning values of age and absences.
View a histogram (using hist()) of the age variable.
View a histogram of the absences variable.
View another histogram of absences, but force values of zero to be bucketed to the right of zero on the x-axis with right = FALSE (see ?hist for more info).

# Look at a summary() of students3
summary(students3)

# View a histogram of the age variable
hist(students3$age)

# View a histogram of the absences variable
hist(students3$absences)

# View a histogram of absences, but force zeros to be bucketed to the right of zero
hist(students3$absences, right = FALSE)

3.16 Another look at strange values

Another useful way of looking at strange values is with boxplots. Simply put, boxplots draw a box around the middle 50% of values for a given variable, with a bolded horizontal line drawn at the median. Values that fall far from the bulk of the data points (i.e. outliers) are denoted by open circles. (If you’re curious about the exact formula for determining what is “far”, check out ?boxplot.)
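For a feel of what counts as “far”, base R's boxplot.stats() returns the numbers that boxplot() draws. With the default setting, a point is flagged when it lies more than 1.5 times the height of the box beyond either edge of the box. The age vector below is invented for illustration.

```r
# Hypothetical ages: mostly teenagers plus one much older student
age <- c(15, 16, 16, 17, 17, 18, 18, 19, 22, 38)

# boxplot.stats() computes the values that boxplot() plots
stats <- boxplot.stats(age)

# Lower whisker, lower hinge, median, upper hinge, upper whisker
stats$stats

# Points that would be drawn as open circles
stats$out
```

Here the box runs from 16 to 19, so anything above 19 + 1.5 * 3 = 23.5 is flagged. The value 22 stays inside the whisker while 38 is marked as an outlier.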

In this situation, we are concerned about three things:

Since this dataset is about students and the only student above the age of 22 is 38 years old, we must wonder whether this is an error in the data or just an older student (perhaps returning to school after working for several years).
There are four values of -1 for the absences variable, which is either a mistake or an intentional coding meant to say, for example, “this value is missing”.
There are several extreme values of absences in the positive direction, with a maximum value of 75 (which is over 18 times the median value of 4).

Instruction:

View a boxplot() of the age variable from students3.
View a boxplot() of the absences variable from students3.

# View a boxplot of age
boxplot(students3$age)

# View a boxplot of absences
boxplot(students3$absences)

4. Putting it ALL Together

4.1 Time to put it all together! (video)

4.2 Get a feel for the data

Before diving into our data cleaning routine, we must first understand the basic structure of the data. This involves looking at things like the class() of the data object to make sure it’s what we expect (generally a data.frame) in addition to checking its dimensions with dim() and the column names with names().

Instruction:
For the weather dataset, which is loaded in your workspace:

Check that it’s a data.frame using the function class().
Look at the dimensions.
View the column names.

# Verify that weather is a data.frame
class(weather)

# Check the dimensions
dim(weather)

# View the column names
names(weather)

4.3 Summarize the data

Next up is to look at some summaries of the data. This is where functions like str(), glimpse() from dplyr, and summary() come in handy.

Instruction:

# View the structure of the data
str(weather)

# Load dplyr package
library(dplyr)

# Look at the structure using dplyr's glimpse()
glimpse(weather)

# View a summary of the data
summary(weather)

4.4 Take a closer look

After understanding the structure of the data and looking at some brief summaries, it often helps to preview the actual data. The functions head() and tail() allow you to view the top and bottom rows of the data, respectively. Recall you’ll be shown 6 rows by default, but you can alter this behavior with a second argument to the function.

Instruction:
For the weather data:

View the first 6 rows.
View the first 15 rows.
View the last 6 rows.
View the last 10 rows.

# View first 6 rows
head(weather)

# View first 15 rows
head(weather, 15)

# View the last 6 rows
tail(weather)

# View the last 10 rows
tail(weather, 10)

4.5 Let’s tidy the data (video)

4.6 Column names are values

The weather dataset suffers from one of the five most common symptoms of messy data: column names are values. In particular, the column names X1-X31 represent days of the month, which should really be values of a new variable called day.

The tidyr package provides the gather() function for exactly this scenario. To remind you of how it works, we’ve loaded a small dataset called df in your workspace. Give the following a try in the console before attempting the instructions below.

gather(df, time, val, t1:t3)

Notice that gather() allows you to select multiple columns to be gathered by using the : operator.
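The sketch below applies the same idea to a made-up miniature of the weather layout, with one column per day. The table mini, its values and the measure label "temp" are assumptions. It also uses gather()'s na.rm argument to drop cells that hold no measurement.

```r
library(tidyr)

# Hypothetical miniature of the weather layout: one column per day of the month
mini <- data.frame(
  month = c(1, 2),
  measure = c("temp", "temp"),
  X1 = c(30, 41),
  X2 = c(28, NA),
  X3 = c(35, NA),
  stringsAsFactors = FALSE
)

# X1:X3 selects the contiguous block of day columns; na.rm = TRUE drops
# the rows whose value would be NA
mini_long <- gather(mini, day, value, X1:X3, na.rm = TRUE)
mini_long
```

Six cells go in and four rows come out, because the two NA cells are removed rather than carried along as empty observations.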

Instruction:

Load the tidyr package.
Call gather() on the weather data to gather columns X1-X31. The two columns created as a result should be called day and value. Save the result as weather2.
View the result with head().

# Load the tidyr package
library(tidyr)

# Gather the columns
weather2 <- gather(weather, day, value, X1:X31, na.rm = TRUE)

# View the head
head(weather2)

4.7 Values are variable names

Our data suffer from a second common symptom of messy data: values are variable names. Specifically, values in the measure column should be variables (i.e. column names) in our dataset.

The spread() function from tidyr is designed to help with this. To remind you of how this function works, we’ve loaded another small dataset called df2 (which is the result of applying gather() to the original df from last exercise). Give the following a try before attempting the instructions below.

spread(df2, time, val)

Note how the values of the time column now become column names. The tidyr package is already loaded.

Instruction:

Using the code provided, remove the first column of weather2, assigning to without_x.
Spread the measure column of without_x and save the result to weather3.
View the result with head().

# First remove column of row names
without_x <- weather2[, -1]

# Spread the data
weather3 <- spread(without_x, measure, value)

# View the head
head(weather3)

4.8 Prepare the data for analysis (video)

4.9 Clean up dates

Now that the weather dataset adheres to tidy data principles, the next step is to prepare it for analysis. We’ll start by combining the year, month, and day columns and recoding the resulting character column as a date. We can use a combination of base R, stringr, and lubridate to accomplish this task.
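The three steps can be tried on a tiny hand-built table first. In this sketch the data frame w and its values are invented; only the column names year, month and day and the X prefix follow the notes.

```r
library(stringr)
library(tidyr)
library(lubridate)

# Hypothetical rows in the shape described above
w <- data.frame(
  year = c(2014, 2014),
  month = c(12, 12),
  day = c("X1", "X2"),
  stringsAsFactors = FALSE
)

# 1. Strip the leading X from day
w$day <- str_replace(w$day, "X", "")

# 2. Paste year, month and day together with dashes
w <- unite(w, date, year, month, day, sep = "-")

# 3. Parse the resulting strings as dates
w$date <- ymd(w$date)
str(w)
```

After step 2 the date column holds strings such as "2014-12-1"; ymd() accepts the single-digit day and returns a proper Date column.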

tidyr and dplyr are already loaded.

Instruction:

Load the stringr and lubridate packages.
Use stringr’s str_replace() to remove the Xs from the day column of weather3.
Create a new column called date. Use the unite() function from tidyr to paste together the year, month, and day columns in order, using - as a separator (see ?unite if you need help).
Coerce the date column using the appropriate function from lubridate.
Use the code provided (select()) to reorder columns, saving the result to weather5.
View the head of weather5.

# Load the stringr and lubridate packages
library(stringr)
library(lubridate)

# Remove X's from day column
weather3$day <- str_replace(weather3$day, "X","")

# Unite the year, month, and day columns
weather4 <- unite(weather3, date, year, month, day, sep = "-")

# Convert date column to proper date format using lubridate's ymd()
weather4$date <- ymd(weather4$date)

# Rearrange columns using dplyr's select()
weather5 <- select(weather4, date, Events, CloudCover:WindDirDegrees)

# View the head of weather5
head(weather5)

4.10 A closer look at column types

It’s important for analysis that variables are coded appropriately. This is not yet the case with our weather data. Recall that functions such as as.numeric() and as.character() can be used to coerce variables into different types.

It’s important to keep in mind that coercions are not always successful, particularly if there’s some data in a column that you don’t expect. For example, the following will cause problems:

as.numeric(c(4, 6.44, "some string", 222))

If you run the code above in the console, you’ll get a warning message saying that R introduced an NA in the process of coercing to numeric. This is because it doesn’t know how to make a number out of a string ("some string"). Watch out for this in our weather data!
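
One way to track down the values responsible for such a warning is to compare the vector before and after coercion. This is only a sketch on a made-up vector; the names x, x_num, and bad are illustrative.

```r
# Character vector containing one value that is not a number
x <- c("4", "6.44", "some string", "222")

# suppressWarnings() hides the "NAs introduced by coercion" warning
x_num <- suppressWarnings(as.numeric(x))

# Entries that were not NA before coercion but are NA afterwards
bad <- x[is.na(x_num) & !is.na(x)]
print(bad)
```

Applying the same comparison to a column of a data frame shows which raw values need to be handled before the coercion can succeed.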

Instruction:

Use str() to see how variables are stored in weather5.
View the first 20 rows of weather5. Keep an eye out for strange values!
Try coercing the PrecipitationIn column of weather5 to numeric without saving the result.

# View the structure of weather5
str(weather5)

# Examine the first 20 rows of weather5. Are most of the characters numeric?
head(weather5, 20)

# See what happens if we try to convert PrecipitationIn to numeric
as.numeric(weather5$PrecipitationIn)

4.11 Column type conversions

As you saw in the last exercise, "T" was used to denote a trace amount (i.e. too small to be accurately measured) of precipitation in the PrecipitationIn column. In order to coerce this column to numeric, you’ll need to deal with this somehow. To keep things simple, we will just replace "T" with zero, as a string ("0").

The dplyr and stringr packages are already loaded.
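
The replace-then-coerce idea can be tried on a small made-up vector first. In the sketch below, precip is hypothetical, and the anchored pattern "^T$" is an extra precaution that is not part of the exercise: it restricts the replacement to entries that consist of exactly "T".

```r
library(stringr)

# Hypothetical precipitation readings stored as character strings
precip <- c("0.00", "T", "0.12", "1.05")

# Replace entries that are exactly "T" (trace) with the string "0"
precip_clean <- str_replace(precip, "^T$", "0")

# The coercion now succeeds without introducing NAs
precip_num <- as.numeric(precip_clean)
print(precip_num)
```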

Instruction:

Use str_replace() from stringr to make the proper replacements in the PrecipitationIn column of weather5.
Run the call to mutate_at as-is to conveniently apply as.numeric() to all columns from CloudCover through WindDirDegrees (reading left to right in the data), saving the result to weather6.
View the structure of weather6 to confirm the coercions were successful.

# Replace "T" with "0" (T = trace)
weather5$PrecipitationIn <- str_replace(weather5$PrecipitationIn, "T", "0")

# Convert characters to numerics (as.numeric is passed directly, since funs() is deprecated as of dplyr 0.8.0)
weather6 <- mutate_at(weather5, vars(CloudCover:WindDirDegrees), as.numeric)

# Look at result
str(weather6)

4.12 Missing, extreme, and unexpected values (video)

4.13 Finding missing values

Before dealing with missing values in the data, it’s important to find them and figure out why they exist in the first place. If your dataset is too big to look at all at once, as it is here, remember that you can use sum() and is.na() to quickly size up the situation by counting the number of NA values.

The summary() function may also come in handy for identifying which variables contain the missing values. Finally, the which() function is useful for locating the missing values within a particular column.
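
A small sketch of these tools on a hypothetical data frame follows. The columns temp and gust are made up, and colSums() is an addition that gives a per-column count as a compact alternative to reading summary().

```r
# Hypothetical data with a few missing values
toy <- data.frame(
  temp = c(50, NA, 61, 58),
  gust = c(NA, 20, NA, 25)
)

# Total number of NA values in the whole data frame
total_na <- sum(is.na(toy))

# Number of NA values per column
na_per_col <- colSums(is.na(toy))

# Row numbers where gust is missing, then the full rows
ind <- which(is.na(toy$gust))
print(toy[ind, ])
```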

Instruction:

Use sum() and is.na() to count the number of NA values in weather6.
Look at a summary() of weather6 to figure out how the missings are distributed among the different variables.
Use which() to identify the indices (i.e. row numbers) where Max.Gust.SpeedMPH is NA and save the result to ind (for indices).
Use ind to look at the full rows of weather6 for which Max.Gust.SpeedMPH is missing.

# Count missing values
sum(is.na(weather6))

# Find missing values
summary(weather6)

# Find indices of NAs in Max.Gust.SpeedMPH
ind <- which(is.na(weather6$Max.Gust.SpeedMPH))

# Look at the full rows for records missing Max.Gust.SpeedMPH
weather6[ind, ]

4.14 An obvious error

Besides missing values, we want to know if there are values in the data that are too extreme or bizarre to be plausible. A great way to start the search for these values is with summary().

Once implausible values are identified, they must be dealt with in an intelligent and informed way. Sometimes the best way forward is obvious and other times it may require some research and/or discussions with the original collectors of the data.
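
When a variable has a known valid range, a range check can flag candidates directly. The sketch below assumes relative humidity is recorded as a percentage between 0 and 100; the vector humidity is made up.

```r
# Hypothetical humidity readings in percent
humidity <- c(87, 93, 1000, 76)

# Indices of readings outside the physically possible range
ind <- which(humidity < 0 | humidity > 100)
print(humidity[ind])
```

A check like this only locates suspicious values. Deciding what to do with them still requires knowledge of how the data were collected.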

Instruction:

View a summary() of weather6.
Use which() to find the index of the erroneous element of weather6$Max.Humidity, saving the result to ind.
Use ind to look at the full row of weather6 for that day.
You discover an extra zero was accidentally added to this value. Correct it in the data.

# Review distributions for all variables
summary(weather6)

# Find row with Max.Humidity of 1000
ind <- which(weather6$Max.Humidity == 1000)

# Look at the data for that day
weather6[ind, ]

# Change 1000 to 100
weather6$Max.Humidity[ind] <- 100

4.15 Another obvious error

You’ve discovered and repaired one obvious error in the data, but it appears that there’s another. Sometimes you get lucky and can infer the correct or intended value from the other data. For example, if you know the minimum and maximum values of a particular metric on a given day…
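
One way to use related columns is a consistency check: a daily mean must lie between that day's minimum and maximum. The data frame vis and its column names below are hypothetical.

```r
# Hypothetical daily visibility readings in miles
vis <- data.frame(
  min_vis  = c(2, 10, 5),
  mean_vis = c(6, -1, 8),
  max_vis  = c(10, 10, 10)
)

# Rows where the mean falls outside the [min, max] interval
ind <- which(vis$mean_vis < vis$min_vis | vis$mean_vis > vis$max_vis)
print(vis[ind, ])
```

In the flagged row the minimum and maximum are both 10, so the only mean consistent with them is 10.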

Instruction:

Use summary() to look at the value of only the Mean.VisibilityMiles variable of weather6.
Use which() to find the index of the clearly erroneous value in this column, saving the result to ind.
Use ind to look at the full row of weather6 for this day.
Inspect the values of other variables for this day to determine the correct value of Mean.VisibilityMiles, then make the appropriate fix.

# Look at summary of Mean.VisibilityMiles
summary(weather6$Mean.VisibilityMiles)

# Get index of row with -1 value
ind <- which(weather6$Mean.VisibilityMiles == -1)

# Look at full row
weather6[ind,]

# Set Mean.VisibilityMiles to the appropriate value
weather6$Mean.VisibilityMiles[ind] <- 10

4.16 Check other extreme values

Beyond dealing with obvious errors in the data, we want to see if there are other extreme values. Alongside the trusty summary() function, hist() is useful for quickly getting a feel for how different variables are distributed.
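
hist() can also return the bin counts without drawing anything, which makes an isolated extreme value easy to spot numerically. This is a sketch on made-up temperatures; the break points are an arbitrary choice.

```r
# Hypothetical temperatures with one suspiciously large value
temps <- c(48, 52, 55, 57, 60, 61, 63, 66, 70, 150)

# Compute the histogram without plotting; bins are 20 degrees wide
h <- hist(temps, breaks = seq(40, 160, by = 20), plot = FALSE)

# Counts per bin: a lone count far from the others suggests an outlier
print(h$counts)
```

Calling hist(temps) without plot = FALSE draws the usual plot, where the same value appears as a single bar separated from the rest.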

Instruction:

Check a summary() of weather6 one more time for extreme or unexpected values.
View a histogram for MeanDew.PointF.
Do the same for Min.TemperatureF.
And once more for Mean.TemperatureF to compare distributions.

# Review summary of full data once more
summary(weather6)

# Look at histogram for MeanDew.PointF
hist(weather6$MeanDew.PointF)

# Look at histogram for Min.TemperatureF
hist(weather6$Min.TemperatureF)

# Compare to histogram for Mean.TemperatureF
hist(weather6$Mean.TemperatureF)

4.17 Finishing touches

Before officially calling our weather data clean, we want to put a couple of finishing touches on the data. These are a bit more subjective and may not be necessary for analysis, but they will make the data easier for others to interpret, which is generally a good thing.

There are a number of stylistic conventions in the R language. Depending on who you ask, these conventions may vary. Because the period (.) has special meaning in certain situations, we generally recommend using underscores (_) to separate words in variable names. We also prefer all lowercase letters so that no one has to remember which letters are uppercase or lowercase.

Finally, the events column (renamed to be all lowercase in the first instruction) contains an empty string ("") for any day on which there was no significant weather event such as rain, fog, a thunderstorm, etc. However, if it’s the first time you’re seeing these data, it may not be obvious that this is the case, so it’s best for us to be explicit and replace the empty strings with something more meaningful.
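
Both finishing touches can be sketched on a tiny hypothetical data frame. The names toy and new_names and the replacement column names below are illustrative and are not the course's new_colnames.

```r
# Hypothetical data with mixed-case, period-separated column names
toy <- data.frame(
  Events = c("Rain", "", "Fog-Rain", ""),
  Max.TemperatureF = c(60, 72, 55, 68),
  stringsAsFactors = FALSE
)

# Replace the column names with lowercase, underscore-separated ones
new_names <- c("events", "max_temp")
names(toy) <- new_names

# Make the absence of a weather event explicit
toy$events[toy$events == ""] <- "None"
print(toy)
```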

Instruction:

We’ve created a vector of column names in your workspace called new_colnames, all of which obey the conventions described above. Clean up the column names of weather6 by assigning new_colnames to names(weather6).
Replace all empty strings in the events column of weather6 with "None".
One last time, print out the first 6 rows of the weather6 data frame to see the changes.

# Clean up column names
names(weather6) <- new_colnames

# Replace empty cells in events column
weather6$events[weather6$events == ""] <- "None"
    
# Print the first 6 rows of weather6
head(weather6)

4.18 Your data are clean! (video)
