SQL Queries from inside R

In your life as a data scientist, you'll often be working with huge databases that contain tables with millions of rows. If you want to do some analyses on this data, it's possible that you only need a fraction of this data. In this case, it's a good idea to send SQL queries to your database, and only import the data you actually need into R.

dbGetQuery() is what you need. As usual, you first pass the connection object to it. The second argument is an SQL query in the form of a character string. This example selects theagevariable from thepeopledataset where gender equals "male":

dbGetQuery(con, "SELECT age FROM people WHERE gender = 'male'")

Apart from checking equality, you can also check forless thanandgreater thanrelationships, with, just like in R.

# Connect to the database

library(DBI)

con <- dbConnect(RMySQL::MySQL(),

dbname = "tweater",

host = "courses.csrrinzqubik.us-east-1.rds.amazonaws.com",

port = 3306,

user = "student",

password = "datacamp")

# Import post column of tweats where date is higher than '2015-09-21': latest

latest<-dbGetQuery(con,"select post from tweats where date>'2015-09-21'")

# Print latest

latest

# Create data frame specific

specific<-dbGetQuery(con,"select message from comments where tweat_id=77 and user_id>4")

# Print specific

specific

There are also dedicated SQL functions that you can use in theWHEREclause of an SQL query. For example,CHAR_LENGTH() returns the number of characters in a string.


Of course, SQL does not stop with the the three keywordsSELECT,FROMandWHERE. Another very often used keyword isJOIN, and more specificallyINNER JOIN. Take this call for example:

SELECT name, post

FROM users INNER JOIN tweats on users.id = user_id

WHERE date > "2015-09-19"

Here, theuserstable is joined with thetweatstable. This is possible because theidcolumn in theuserstable corresponds to theuser_idcolumn in thetweatstable. Also notice howname, from theuserstable, andpostanddate, from thetweatstable, can be referenced to without problems.

Can you predict the outcome of the following query?

SELECT post, message

FROM tweats INNER JOIN comments on tweats.id = tweat_id

WHERE tweat_id = 77


You've used dbGetQuery() multiple times now. This is a virtual function from the DBI package, but is actually implemented by the RMySQL package. Behind the scenes, the following steps are performed:

Sending the specified query with dbSendQuery();

Fetching the result of executing the query on the database with dbFetch();

Clearing the result with dbClearResult().

# Send query to the database

res <- dbSendQuery(con, "SELECT * FROM comments WHERE user_id > 4")

# Use dbFetch() twice

dbFetch(res, n = 2)

dbFetch(res)

# Clear res

dbClearResult(res)

你可能感兴趣的:(SQL Queries from inside R)