7.25 BFS

[ Purpose ]

We implemented a distributed grep system that lets a user run grep on the log files local to every live machine in the distributed system.

[ User guide ]

1. Run source startUP.sh to start.
2. Follow the prompt to run the server or client code.
3. To test the logGenerator, run ./test.sh
4. To create new random logs, run:
   - go run ./src/github.com/jiayuc/logGenerator/logGenerator.go [known line] [known line count] [total line count]
5. Modify ./src/github.com/jiayuc/info/* if you need to add your own VM IPs.
   - Note: vmIPs contains the IPs of all machines in the system; contactList contains the IPs of the machines that the client will send the grep request to.

Tip:

- We assume the VMs have static machine names and IP addresses.

If needed, use the following command to determine a machine's IP address manually:
"ip route get 8.8.8.8 | awk '{print $NF; exit}'"

[ Design ]

- Basic architecture

All machines run as servers waiting for incoming client queries. Any machine can act as the querying machine by running the client code on it. The client prompts the user for a query pattern and sends it to the servers over TCP. We store an IP list containing the IPs of all machines in the system, and the client uses this list to connect to every server. When a server receives a query pattern, it runs grep on its local log file and sends the search result back to the client, along with a header containing information such as its machine name and total line count.
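The server side of this flow can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names formatHeader, grepLocal, handleQuery, and serve, the "host=... lines=..." header layout, the newline-terminated pattern, and the choice to report the number of matching lines in the header are all assumptions made for the example.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"log"
	"net"
	"os"
	"os/exec"
	"strings"
)

// formatHeader builds the one-line header sent before the matches.
// The "host=... lines=..." layout is hypothetical.
func formatHeader(host string, lineCount int) string {
	return fmt.Sprintf("host=%s lines=%d", host, lineCount)
}

// grepLocal runs grep on logPath and returns its output together with
// the number of matching lines.
func grepLocal(pattern, logPath string) (string, int, error) {
	// "-e" marks the next argument as the pattern, even if it starts with "-".
	out, err := exec.Command("grep", "-e", pattern, logPath).Output()
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) && exitErr.ExitCode() == 1 {
		err = nil // grep exits with status 1 when nothing matched
	}
	if err != nil {
		return "", 0, err
	}
	return string(out), strings.Count(string(out), "\n"), nil
}

// handleQuery serves one client: read the pattern, run grep, reply, close.
func handleQuery(conn net.Conn, logPath string) {
	defer conn.Close()
	pattern, err := bufio.NewReader(conn).ReadString('\n')
	if err != nil {
		return
	}
	host, _ := os.Hostname()
	matches, n, err := grepLocal(strings.TrimRight(pattern, "\r\n"), logPath)
	if err != nil {
		fmt.Fprintf(conn, "host=%s error=%v\n", host, err)
		return
	}
	fmt.Fprintf(conn, "%s\n%s", formatHeader(host, n), matches)
}

// serve accepts connections on addr and handles each one in its own goroutine.
func serve(addr, logPath string) error {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return err
	}
	defer ln.Close()
	for {
		conn, err := ln.Accept()
		if err != nil {
			return err
		}
		go handleQuery(conn, logPath)
	}
}

func main() {
	if len(os.Args) != 3 {
		fmt.Println("usage: server <listen-addr> <log-file>")
		return
	}
	log.Fatal(serve(os.Args[1], os.Args[2]))
}
```

Closing the connection after the reply gives the client a simple end-of-message signal: it can read until EOF instead of parsing a length field.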

- Fault tolerance

While sending the search pattern to all servers, the client prints a message for every server it fails to connect to because that server has died. This does not prevent the running program from getting correct results from all the remaining machines.
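A client with this behavior might look like the sketch below. It is an illustration under assumptions, not the project's client: the names queryAll and queryOne, the 2-second dial timeout, the newline-terminated pattern, and reading the reply until the server closes the connection are all choices made for the example.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"os"
	"sync"
	"time"
)

// dialTimeout is an assumed value; tune it for the actual network.
const dialTimeout = 2 * time.Second

// queryOne sends the pattern to a single server and reads its full reply.
func queryOne(addr, pattern string) (string, error) {
	conn, err := net.DialTimeout("tcp", addr, dialTimeout)
	if err != nil {
		return "", err
	}
	defer conn.Close()
	if _, err := fmt.Fprintf(conn, "%s\n", pattern); err != nil {
		return "", err
	}
	// Assumes the server closes the connection once the reply is complete.
	reply, err := io.ReadAll(conn)
	return string(reply), err
}

// queryAll queries every address concurrently. It returns the replies of
// the servers that answered and the addresses whose query failed, so one
// dead machine never blocks the results from the others.
func queryAll(addrs []string, pattern string) (map[string]string, []string) {
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		replies = make(map[string]string)
		failed  []string
	)
	for _, addr := range addrs {
		wg.Add(1)
		go func(addr string) {
			defer wg.Done()
			reply, err := queryOne(addr, pattern)
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				failed = append(failed, addr)
				return
			}
			replies[addr] = reply
		}(addr)
	}
	wg.Wait()
	return replies, failed
}

func main() {
	if len(os.Args) < 3 {
		fmt.Println("usage: client <pattern> <host:port>...")
		return
	}
	replies, failed := queryAll(os.Args[2:], os.Args[1])
	for _, addr := range failed {
		fmt.Printf("failed to connect to %s\n", addr)
	}
	for addr, reply := range replies {
		fmt.Printf("--- %s ---\n%s", addr, reply)
	}
}
```

Each server is queried in its own goroutine, so a machine that is down only costs its own dial timeout.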

[ Unit test ]

We created a logGenerator that generates a random log containing a specific number of copies of a given known line. It takes three parameters: a string as the known line to be inserted, an integer indicating the number of times this known line should occur in the output file, and another integer indicating the total number of lines in the output file. The other lines are randomly generated strings.
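The core of such a generator can be sketched as below. This is not the actual logGenerator.go: the function names, the explicit seed parameter, the 20-character lowercase random lines, and printing to standard output instead of writing a file are assumptions made to keep the example short and repeatable.

```go
package main

import (
	"fmt"
	"math/rand"
	"os"
	"strings"
)

const letters = "abcdefghijklmnopqrstuvwxyz"

// randomLine returns n random lowercase letters.
func randomLine(rng *rand.Rand, n int) string {
	b := make([]byte, n)
	for i := range b {
		b[i] = letters[rng.Intn(len(letters))]
	}
	return string(b)
}

// generateLines returns total lines, of which exactly knownCount are the
// known line placed at random positions; the rest are random strings.
func generateLines(known string, knownCount, total int, seed int64) ([]string, error) {
	if knownCount < 0 || knownCount > total {
		return nil, fmt.Errorf("known line count %d must be between 0 and total %d", knownCount, total)
	}
	rng := rand.New(rand.NewSource(seed))
	lines := make([]string, total)
	for i := range lines {
		lines[i] = randomLine(rng, 20)
	}
	// A random permutation gives knownCount distinct positions.
	for _, pos := range rng.Perm(total)[:knownCount] {
		lines[pos] = known
	}
	return lines, nil
}

func main() {
	lines, err := generateLines("KNOWN LINE", 3, 10, 42)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(strings.Join(lines, "\n"))
}
```

One caveat of this sketch: if the known line is itself a short lowercase string, a random line could contain it by chance, so a known line with uppercase letters or spaces keeps the expected grep count exact.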

To test logGenerator, we used Go's testing package, a lightweight testing framework for Go. Running the test verifies that the desired output file is created as described above. Internally, it checks that the file exists and runs grep on the file to confirm that it gets the correct line count.
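The two checks could be written as a helper along these lines, which a Go test function could then call. The helper name verifyLog and the exact grep flags are assumptions for illustration; the project's test may be structured differently.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
	"strconv"
	"strings"
)

// verifyLog checks that the log file exists and that grep finds the known
// line exactly want times.
func verifyLog(path, known string, want int) error {
	if _, err := os.Stat(path); err != nil {
		return fmt.Errorf("log file not created: %w", err)
	}
	// -c counts matching lines, -F treats the pattern as a fixed string,
	// -x requires the whole line to match, -e marks the pattern argument.
	out, err := exec.Command("grep", "-c", "-F", "-x", "-e", known, path).Output()
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) && exitErr.ExitCode() == 1 {
		err = nil // status 1 only means the count is zero
	}
	if err != nil {
		return fmt.Errorf("running grep: %w", err)
	}
	got, err := strconv.Atoi(strings.TrimSpace(string(out)))
	if err != nil {
		return fmt.Errorf("parsing grep output %q: %w", out, err)
	}
	if got != want {
		return fmt.Errorf("found %d known lines, want %d", got, want)
	}
	return nil
}

func main() {
	if len(os.Args) != 4 {
		fmt.Println("usage: verify <log-file> <known line> <expected count>")
		return
	}
	want, err := strconv.Atoi(os.Args[3])
	if err != nil {
		fmt.Fprintln(os.Stderr, "expected count must be an integer")
		os.Exit(1)
	}
	if err := verifyLog(os.Args[1], os.Args[2], want); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("ok")
}
```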

[ Performance ]

Tested on 7 machines, each with a 60 MB log file.
TODO: add more accurate data later.