【INST0069】Exercises on graph databases

Description
Using the Cypher statements given in the next section, you will create on Neo4j Desktop a graph
database describing a fictitious social network where users share information about the books they have read. In this network, users can follow other users and rate the books they have read. The
database contains information about the users (username and age), the books (id, title, genre,
author and publisher), the “follows” relationships among the users, the “read” relationships
between the users and the books, and the ratings of the users for the books they have read.
For the first exercise, you will create appropriate Cypher statements to retrieve information from
the database (as described in the exercise). For the second exercise, you will devise some simple
recommendation algorithms that suggest users to follow or books to read to a user of the network,
and implement these algorithms using Cypher statements.
Your report must contain:
- The Cypher statements that you created for exercise 1.
- The Cypher statements that you created for exercise 2, along with a brief description of
the recommendation algorithms they implement.
Setup
Before you attempt the exercises, follow the steps below to create the graph database on Neo4j
Desktop. Examine the CSV files and the database that you created to verify that the database has
been correctly implemented and to familiarise yourself with the structure of the database.
1. Using Neo4j Desktop, create a new graph database and name it Book Graph.
2. Download from Moodle and copy the files books.csv, users.csv, followers.csv and ratings.csv to
the import folder of your database.
3. Run the following Cypher statements (one at a time) to populate the database with data about
books, the users of a social network, the relationships among the users and the relationships
between the users and the books.
a. LOAD CSV WITH HEADERS FROM 'file:///books.csv' AS book
CREATE (:Book {bookID: book.BookId, title: book.Title, genre: book.Genre,
author: book.Author, publisher: book.Publisher})
This should create 101 nodes with label Book, each with a bookID, a title, a genre, an author and a
publisher property.
b. LOAD CSV WITH HEADERS FROM 'file:///users.csv' AS user
CREATE (:User {username: user.Username, age: toFloat(user.Age)})
This should create 26 nodes with label User, each with a username and an age property.
c. LOAD CSV WITH HEADERS FROM 'file:///followers.csv' AS fol
MATCH (u1:User {username:fol.User1}), (u2:User {username:fol.User2})
CREATE (u1)-[:FOLLOWS]->(u2)
This should create 100 :FOLLOWS relationships among users.
d. LOAD CSV WITH HEADERS FROM 'file:///ratings.csv' AS rat
MATCH (u:User {username: rat.User}), (b:Book {bookID: rat.Book})
CREATE (u)-[:READ {rating: toInteger(rat.Rating)}]->(b)
This should create 199 :READ relationships between users and books, each with a rating property.
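One possible way to carry out the check mentioned at the start of this section is sketched below. It assumes that the four statements above have been run once, without errors, on an otherwise empty database. The expected counts in the comments are those stated in steps a to d; the result column names are illustrative only. Run the queries one at a time.

```cypher
// Count nodes per label.
// According to the steps above: 101 Book nodes and 26 User nodes.
MATCH (n)
RETURN labels(n) AS nodeLabels, count(*) AS nodeCount;

// Count relationships per type.
// According to the steps above: 100 FOLLOWS and 199 READ relationships.
MATCH ()-[r]->()
RETURN type(r) AS relType, count(*) AS relCount;

// Preview the first rows of a CSV file without writing to the database.
// Replace books.csv with any of the other three files to inspect them.
LOAD CSV WITH HEADERS FROM 'file:///books.csv' AS row
RETURN row
LIMIT 5;
```

If a count is higher than expected, a likely cause is that a statement was run more than once, since CREATE adds new nodes and relationships on every run.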
Exercise 1
Create Cypher queries to:
1. List the titles of the books that have been read both by Charles and by a user whose age is
more than 20, and that have received a rating greater than 2 from each of them.
2. List the titles and authors of the books that have been published by MIT Press, Penguin,
Springer or Wiley and whose genre is fiction, history, mathematics or economics. Show the
results in alphabetical order of the titles.
3. For each pair of users such that one follows the other, list the titles and the publishers of
the books that they have both read.
4. List the names of users who follow Fiona and have read more than 10 books. For each
such user, show also the number of books they have read.
5. List all publishers such that the average rating of the books they have published is higher
than the average rating of the books published by Pearson.
6. List the nodes in the shortest path from Adam to Lilly.
7. Show the maximum distance from a user to a science book, where the distance from node
A to node B is the length of the shortest path from A to B.
8. List the titles of the books for which the publisher is not known and for each of them the list
of names of the users that have read them. For each such book, add the label
UknownPublisher.
9. List the names of the users that are followed by Fiona and, if they have read any nonfiction
books, the list of titles of those books.
10. A book is considered popular if it has been read by more than 4 users and it has received
at least two ratings that are greater than 3. List the titles of the popular books.
Exercise 2
1. Write down two algorithms that provide recommendations for users to follow, using the
available information about users and books, and implement each of them as a Cypher
statement. The statement should create new RECOMMENDED_USER relationships, each
connecting a user with a recommended user to follow. The recommended users should not
include those that the user already follows.
2. Write down two algorithms that provide recommendations for books, using the available
information about users and books, and implement each of them as a Cypher statement.
The statement should create new RECOMMENDED_BOOK relationships, each connecting
a user with a recommended book. The recommended books should not include those that
the user has already read.
Marking Criteria and Procedure
This set of exercises counts as 50% of the total course assessment. Exercise 1 is worth 30%
(marks are divided equally among its subquestions) and Exercise 2 is worth 20% (marks are
divided equally among its subquestions). Marks will be awarded according to:
- whether the answers are technically correct (i.e. the syntax of the Cypher statements is
correct and the statements produce the correct results)
- whether the answers given are as straightforward as possible and not more complicated
than necessary
- whether the answers are set out clearly and in good style
- (for Exercise 2) whether the recommendation algorithms are clearly described and correctly
implemented and the recommendations they produce are reasonable
