In this lab you will build a lock server and client that cache locks at the client, reducing the load on the server and improving client performance. For example, suppose that an application using YFS creates 100 files in a directory. Your Lab 3 yfs_client will probably send 100 acquire and release RPCs for the directory's lock. This lab will modify the lock client and server so that the lock client sends (in the common case) just one acquire RPC for the directory's lock, and caches the lock thereafter, only releasing it if another yfs_client needs it.
The challenge in the lab is the protocol between the clients and the server. For example, when client 2 acquires a lock that client 1 has cached, the server must revoke that lock from client 1 by sending a revoke RPC to client 1. The server can give client 2 the lock only after client 1 has released the lock, which may be a long time after sending the revoke (e.g., if a thread on client 1 holds the lock for a long period). The protocol is further complicated by the fact that concurrent RPC requests and replies may not be delivered in the same order in which they were sent.
We'll test your caching lock server and client by seeing whether it reduces the amount of lock RPC traffic that your yfs_client generates. We will test with both RPC_LOSSY set to 0 and RPC_LOSSY set to 5.
Begin by merging your solution to lab 3 with the new code for lab 4:
% cd ~/lab % git commit -am 'my solution to lab3' Created commit ... % git pull remote: Generating pack... ... % git checkout -b lab4 origin/lab4 Branch lab5 set up to track remote branch refs/remotes/origin/lab5. Switched to a new branch "lab4" % git merge lab3
This will add these new files to your lab directory:
Note that although lock_client_cache extends the lock_client class from Lab 1, you probably won't be able to reuse any code from the parent class; we use a subclass here so that yfs_client can use the two implementations interchangeably. However, you might find some member variables useful (such as lock_client's RPC client cl).
We have also made changes to the following files:
Here is the set of states we recommend for the client:
A single client may have multiple threads waiting for the same lock, but only one thread per client ever needs to be interacting with the server; once that thread has acquired and released the lock it can wake up other threads, one of which can acquire the lock (unless the lock has been revoked and released back to the server). If you need a way to identify a thread, you can use its thread id (tid), which you can get using pthread_self().
When a client asks for a lock with an acquire RPC, the server grants the lock and responds with OK if the lock is not owned by another client (i.e., the lock is free). If the lock is not free, and there are other clients waiting for the lock, the server responds with a RETRY. Otherwise, the server sends a revoke RPC to the owner of the lock, and waits for the lock to be released by the owner. Finally, the server sends a retry to the next waiting client (if any), grants the lock and responds with OK.
Note that RETRY and retry are two different things. RETRY is the value the server returns for a acquire RPC to indicate that the requested lock is not currently available. retry is the RPC that the server sends the client which is scheduled to hold a previously requested lock next.
Once a client has acquired ownership of a lock, the client caches the lock (i.e., it keeps the lock instead of sending a release RPC to the server when a thread releases the lock on the client). The client can grant the lock to other threads on the same client without interacting with the server.
The server sends the client a revoke RPC to get the lock back. This request tells the client that it should send the lock back to the server when it releases the lock or right now if no thread on the client is holding the lock.
The server's per-lock state should include whether it is held by some client, the ID (host name and port number) of that client, and the set of other clients waiting for that lock. The server needs to know the holding client's ID in order to send it a revoke message when another client wants the lock. The server needs to know the set of waiting clients in order to send one of them a retry RPC when the holder releases the lock.
For your convenience, we have defined a new RPC protocol called rlock_protocol in lock_protocol.h to use when sending RPCs from the server to the client. This protocol contains definitions for the retry and revoke RPCs.
Hint: don't hold any mutexes while sending an RPC. An RPC can take a long time, and you don't want to force other threads to wait. Worse, holding mutexes during RPCs is an easy way to generate distributed deadlock.
The following questions might help you with your design (they are in no particular order):
Hint: a client may receive a revoke RPC for a lock before it has received an OK response from its acquire RPC. Your client code will need to remember the fact that the revoke has arrived, and release the lock as soon as you are done with it. The same situation can arise with retry RPCs, which can arrive at the client before the corresponding acquire returns the RETRY failure code.
A reasonable first step would be to implement the basic design of your acquire protocol on both the client and the server, including having the server send revoke messages to the holder of a lock if another client requests it, and retry messages to the next waiting client.
Next you'll probably want to implement the release code path on both the client and the server. Of course, the client should only inform the server of the release if the lock has been revoked.
Also make sure you instantiate a lock_server_cache object in lock_smain.cc, and correctly register the RPC handlers.
Once you have your full protocol implemented, you can run it using the lock tester, just as in Lab 1. For now, don't bother testing with loss:
% export RPC_LOSSY=0 % ./lock_server 3772
Then, in another terminal:
% ./lock_tester 3772
Run lock_tester. You should pass all tests and see no timeouts. You can hit Ctrl-C in the server's window to stop it.
A lock client might be holding cached locks when it exits. This may cause another run of lock_tester using the same lock_server to fail when the lock server tries to send revokes to the previous client. To avoid this problem without worrying about cleaning up, you must restart the lock_server for each run of lock_tester.
Now that it works without loss, you should try testing with RPC_LOSSY=5. Here you may discover problems with reordered RPCs and responses.
% export RPC_LOSSY=5 % ./lock_server 3772
Then, in another terminal:
% export RPC_LOSSY=5 % ./lock_tester 3772
Again, you must restart the lock_server for each run of lock_tester.
In the constructor for your yfs_client, you should now instantiate a lock_client_cache object, rather than a lock_client object. You will also have to include lock_client_cache.h. Once you do that, your YFS should just work under all the Lab 3 tests. We will run your code against all 3 tests (a, b, and c) from Lab 3.
You should also compare running your YFS code with the two different lock clients and servers, with RPC count enabled at the lock server. For this reason, it would be helpful to keep your Lab 3 code around and intact, the way it was when you submitted it. As described below, you can turn on RPC statistics with the RPC_COUNT environment variable. Look for a dramatic drop in the number of acquire (0x7001) RPCs between your Lab 3 and Lab 4 code during the test-lab-3-c test.
The file system tests should pass with RPC_LOSSY set as well. You can pass a loss parameter to start.sh and it will enable RPC_LOSSY automatically:
% ./start.sh 5 # sets RPC_LOSSY to 5If you're having trouble, make sure that the Lab 2 tester passes. If it doesn't, then the issues are most likely with YFS under RPC_LOSSY, rather than your caching lock client.
Please restart all the servers using stop.sh and start.sh for each run of each test.
Our measure of performance is the number of acquire RPCs sent to your lock server while running yfs_client and test-lab-3-c.
The RPC library has a feature that counts unique RPCs arriving at the server. You can set the environment variable RPC_COUNT to N before you launch a server process, and it will print out RPC statistics every N RPCs. For example, in the bash shell you could do:
% export RPC_COUNT=25 % ./lock_server 3772 RPC STATS: 7001:23 7002:2 ...This means that the RPC with the procedure number 0x7001 ( acquire in the original lock_protocol.h file) has been called 23 times, while RPC 0x7002 ( release) has been called twice.
test-lab-3-c creates two subdirectories and creates/deletes 100 files in each directory, using each directory through only one of the two YFS clients. You should count the acquire RPCs for your lab 3 and for your lab 4. If your lab 4 produces a factor of 10 fewer acquire RPCs, then you are doing a good job. This performance goal is vague because the exact numbers depend a bit on how you use locks in yfs_client.
We will check the following:
You are welcome (and encouraged!) to discuss design questions with other students or on the 6.824 discussion list. However, you must write all the code you hand in for the programming assignments, except for code that we give you as part of the assignment. You are not allowed to look at anyone else's solution (and you're not allowed to look at solutions from previous years). You may discuss the assignments with other students, but you may not look at or copy each others' code.
% cd ~/lab % ./stop.sh % make clean % rm core* % rm *log % cd .. % tar czvf `whoami`-lab4.tgz lab/or
% cd ~/6.824/lab % make handinThat should produce a file called [your_user_name]-lab4.tgz in your lab/ directory. Attach that file to an email and send it to the 6.824 submit address.