Network programming (the area I am most excited about):
1. Set up a TCP/IP connection between the server and its clients: the server forks child processes to handle requests from clients and uses pipes to communicate between the parent process and its children (see the fork-per-connection sketch after this list).
2. (most difficult) Unlike the first project, this one uses UDP to transfer a file. To make the transmission reliable we added mechanisms such as a sliding window, timeouts, and congestion control (see the sliding-window sender sketch after this list).
3. (most challenging) Design an on-demand shortest-hop routing protocol for an arbitrary network with unknown connectivity. Compared with the first and second projects, this one required working at a lower layer, the link layer.
The source floods a route-request packet until it reaches the destination, and every node records the previous node while flooding, so the destination knows how to trace the route back to the source, and that route has the smallest hop count. Once the route is determined, the file is transferred from source to destination (see the flooding sketch after this list).
This was the most challenging experience because my partner was away on a trip and I had to complete the project by myself. In only two weeks I had to write thousands of lines of code. Fortunately, I managed to finish it.
4. We developed an application that walks through a set of nodes using IP raw sockets. To make up for my doing most of project 3 alone, my partner did the larger part of this one.
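
For project 1, a minimal sketch of the fork-per-connection pattern, assuming IPv4, an example port, and a pipe used only so the child can report a status string back to the parent; error handling and the actual request handling are omitted:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <sys/wait.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    int main(void) {
        int listenfd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(9000);                 /* example port */
        bind(listenfd, (struct sockaddr *)&addr, sizeof(addr));
        listen(listenfd, 16);

        for (;;) {
            int connfd = accept(listenfd, NULL, NULL);
            int pfd[2];
            pipe(pfd);                               /* child -> parent status channel */
            pid_t pid = fork();
            if (pid == 0) {                          /* child: handle this client */
                close(listenfd);
                close(pfd[0]);
                /* ... read the request from connfd and write the reply here ... */
                const char *status = "done";
                write(pfd[1], status, strlen(status));  /* report back to the parent */
                close(pfd[1]);
                close(connfd);
                exit(0);
            }
            close(connfd);                           /* parent: the child owns the connection */
            close(pfd[1]);
            char buf[64];
            ssize_t n = read(pfd[0], buf, sizeof(buf) - 1);
            if (n > 0) { buf[n] = '\0'; printf("child %d reports: %s\n", (int)pid, buf); }
            close(pfd[0]);
            waitpid(pid, NULL, 0);                   /* reap the child */
        }
    }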
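
For project 2, a simplified sketch of a sliding-window sender over UDP. The packet layout, the cumulative-ACK format (the ACK carries the next expected sequence number), the window size, and the 200 ms timeout are all assumptions for illustration; congestion control is left out, and the socket is assumed to be connect()ed to the receiver:

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <sys/select.h>
    #include <sys/time.h>

    #define WINDOW   8
    #define PKT_DATA 1024

    struct packet { unsigned int seq; char data[PKT_DATA]; };

    /* retransmit everything that is still unacknowledged, i.e. [base, next_seq) */
    static void resend_window(int sock, const struct packet *pkts,
                              unsigned int base, unsigned int next_seq) {
        for (unsigned int s = base; s < next_seq; s++)
            send(sock, &pkts[s], sizeof(struct packet), 0);
    }

    /* pkts[0..total-1] holds the file split into packets; sock is connect()ed */
    void send_file(int sock, const struct packet *pkts, unsigned int total) {
        unsigned int base = 0, next_seq = 0;
        while (base < total) {
            while (next_seq < total && next_seq < base + WINDOW) {   /* fill the window */
                send(sock, &pkts[next_seq], sizeof(struct packet), 0);
                next_seq++;
            }
            fd_set rfds;
            FD_ZERO(&rfds);
            FD_SET(sock, &rfds);
            struct timeval tv = { 0, 200000 };       /* 200 ms retransmission timeout */
            if (select(sock + 1, &rfds, NULL, NULL, &tv) > 0) {
                unsigned int ack;                    /* cumulative ACK: next expected seq */
                if (recv(sock, &ack, sizeof(ack), 0) == (ssize_t)sizeof(ack) && ack > base)
                    base = ack;                      /* slide the window forward */
            } else {
                resend_window(sock, pkts, base, next_seq);   /* timeout: resend the window */
            }
        }
    }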
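
For project 3, a sketch of the flooding step of route discovery: each node remembers the neighbor it first heard a request from, rebroadcasts the request once, and the destination replies along the recorded reverse path. The packet fields, tables, and send functions here are hypothetical stand-ins for the project's link-layer frames:

    #include <stdbool.h>
    #include <stdio.h>

    #define MAX_NODES    64
    #define MAX_REQUESTS 256

    struct rreq { int src, dst, prev_hop, request_id; };

    static int  my_id;
    static int  neighbors[MAX_NODES];
    static int  n_neighbors;
    static int  prev_hop_of[MAX_NODES];      /* prev_hop_of[src]: neighbor the request came from */
    static bool seen[MAX_REQUESTS];          /* request ids already flooded (ids assumed < 256) */

    static void send_to_neighbor(int neighbor, const struct rreq *r) {  /* stand-in for link-layer send */
        printf("forward request %d (src %d -> dst %d) to node %d\n",
               r->request_id, r->src, r->dst, neighbor);
    }

    static void send_route_reply(int toward, const struct rreq *r) {    /* stand-in for the reply path */
        printf("destination %d reached; reply back toward node %d\n", r->dst, toward);
    }

    /* Called when a route request arrives from a directly connected neighbor. */
    void on_route_request(const struct rreq *r, int from_neighbor) {
        if (seen[r->request_id]) return;     /* flood each request only once */
        seen[r->request_id] = true;
        prev_hop_of[r->src] = from_neighbor; /* remember how to trace back toward the source */

        if (r->dst == my_id) {               /* we are the destination: answer along the reverse path */
            send_route_reply(from_neighbor, r);
            return;
        }
        for (int i = 0; i < n_neighbors; i++) {      /* otherwise rebroadcast to the other neighbors */
            if (neighbors[i] == from_neighbor) continue;
            struct rreq fwd = *r;
            fwd.prev_hop = my_id;
            send_to_neighbor(neighbors[i], &fwd);
        }
    }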
load balancing:
This research is concerned with scheduling in parallel and distributed systems with divisible loads. A divisible load (job) is one that can be arbitrarily partitioned among the processors and links in a system. For instance, one may have a very large linear data file of numbers that need to be summed. One can send parts of the file to different processors through an interconnection network. The processors compute intermediate sums which are returned to the originating processor for a final summation. The purpose of this research is to find the optimal fractions of load to distribute to processors and links in a scheduled fashion taking into account the scheduling policy, interconnection network used, processor and link speeds and computation and communication intensity.
Applications include parallel and distributed processor network scheduling, cloud computing, grid computing, data intensive computing and metacomputing. The approach is particularly suited to the processing of very large data files as in signal processing, image processing, experimental data processing, linear algebra computation, DNA sequencing, video and computer utility applications.
This new methodology allows a close examination of the integration of computation and communication in networked computing. In fact what has been developed is a robust analytical "calculus" for scheduling problems of networked computation.
The methodology that has been developed to date is unique in that it is a linear and continuous one. Both computing time and channel transmission time are modeled linearly. Continuous time modeling is invoked as jobs can be arbitrarily partitioned. This leads to a very tractable overall model and in many cases recursive, linear or closed form solutions.
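
As an illustration of the kind of recursion involved, consider a single-level tree (bus/star) network in which the originating processor distributes load fractions alpha_1, ..., alpha_n sequentially to n processors, where w_i is the inverse computing speed of processor i, z_i the inverse speed of its link, and T_cp, T_cm the computation and communication intensities. This is a sketch of one standard formulation, not the exact model of every configuration studied. Using the result that in an optimal schedule all processors stop computing at the same time, the fractions satisfy a linear recursion roughly of the form

    \alpha_i w_i T_{cp} = \alpha_{i+1} ( z_{i+1} T_{cm} + w_{i+1} T_{cp} ),   i = 1, ..., n-1
    \alpha_1 + \alpha_2 + ... + \alpha_n = 1

so each alpha_{i+1} is a fixed multiple of alpha_i and the normalization condition pins down alpha_1: a recursive, linear, closed-form solution of the type mentioned above.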
computing cost:
The research is concerned with scheduling in parallel and distributed systems with divisible loads. A divisible load is one that can be arbitrarily partitioned among the processors and links in a system. Different processors have different computation speeds and monetary costs. The purpose of this research is to find the optimal fractions of load to distribute to processors and links in a scheduled fashion, and to trade off computing cost against solution time, taking into account the scheduling policy, the interconnection network used, processor and link speeds, computation and communication intensity, and each processor's monetary computing cost.
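
One simple way to write such a trade-off (a sketch under assumed notation, not necessarily the exact objective used in the research): let c_i be the monetary cost per unit of computing time on processor i, so a schedule with fractions alpha_i incurs a total cost of

    C = \sum_{i=1}^{n} c_i \alpha_i w_i T_{cp}

while its solution time T_f is the finish time of the last processor. One can then, for example, minimize C subject to T_f <= T_max, or minimize a weighted combination \theta T_f + (1 - \theta) C over the fractions alpha_i.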
MongoDB:
MongoDB is a cross-platform document-oriented database system. Classified as a NoSQL database, MongoDB eschews the traditional table-based relational database structure in favor of JSON-like documents with dynamic schemas (MongoDB calls the format BSON), making the integration of data in certain types of applications easier and faster.
Hadoop:
Apache Hadoop is an open-source software framework for storage and large-scale processing of data-sets on clusters of commodity hardware.
The Apache Hadoop framework is composed of the following modules: Hadoop Common (the libraries and utilities used by the other modules), the Hadoop Distributed File System (HDFS), Hadoop YARN (resource management and job scheduling), and Hadoop MapReduce (the programming model for large-scale data processing).
All the modules in Hadoop are designed with a fundamental assumption that hardware failures (of individual machines, or racks of machines) are common and thus should be automatically handled in software by the framework. Apache Hadoop's MapReduce and HDFS components originally derived respectively from Google's MapReduce and Google File System (GFS) papers.
Database management system:
Designed and implemented a simple social network and ran some simple queries against it using three different tools. The first is the XSB system, a logic programming and deductive database system. The second is PostgreSQL, with Java/JSP/Servlets used to build the application front end to the database. The third is eXist-db, an XML database, used to implement some XQueries.
Operating system:
It has several parts.
The first part is to examine the boot loader and then implement the physical memory allocator, the virtual memory mechanism using segmentation and page translation, and the memory management unit (MMU)'s page tables (see the page-mapping sketch after this list of parts).
The second part is to implement the basic kernel facilities for a protected user-mode environment. I also had to set up the GDT, IDT, and ISRs so that the kernel can handle system calls and exceptions.
The third part is to implement round-robin scheduling, basic process-management system calls, a copy-on-write fork, and inter-process communication (see the scheduler sketch below).
The last part is to build a file system in the microkernel style. I did not need to implement the entire file system, only certain key components, such as reading blocks into the block cache and flushing them back to disk, allocating disk blocks, and implementing read, write, and open over the IPC interface.
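
For the memory-management part, a sketch of inserting a mapping into a two-level x86 page table (the page-directory/page-table structure the MMU walks). The page tables here are ordinary arrays and the tiny allocator is a stand-in, so the sketch runs in user space as a data-structure exercise rather than as real kernel code:

    #include <stdint.h>
    #include <stdio.h>

    #define PTE_P   0x001u                                   /* present */
    #define PTE_W   0x002u                                   /* writable */
    #define PTE_U   0x004u                                   /* user-accessible */
    #define PDX(va) (((uintptr_t)(va) >> 22) & 0x3FFu)       /* page-directory index */
    #define PTX(va) (((uintptr_t)(va) >> 12) & 0x3FFu)       /* page-table index */

    typedef uintptr_t pte_t;   /* 32 bits on real x86; word-sized here so the sketch runs anywhere */

    static pte_t page_dir[1024] __attribute__((aligned(4096)));
    static pte_t pt_pool[4][1024] __attribute__((aligned(4096)));   /* tiny stand-in allocator */
    static int   pt_next;

    static pte_t *alloc_page_table(void) {
        return pt_next < 4 ? pt_pool[pt_next++] : NULL;
    }

    /* Map virtual address va to physical address pa with permission bits perm. */
    int map_page(uintptr_t va, uintptr_t pa, pte_t perm) {
        pte_t *pde = &page_dir[PDX(va)];
        if (!(*pde & PTE_P)) {                               /* no page table for this region yet */
            pte_t *pt = alloc_page_table();
            if (!pt) return -1;
            *pde = (pte_t)pt | PTE_P | PTE_W | PTE_U;        /* low 12 bits are free for flags */
        }
        pte_t *pt = (pte_t *)(*pde & ~(pte_t)0xFFF);
        pt[PTX(va)] = (pa & ~(uintptr_t)0xFFF) | perm | PTE_P;
        return 0;
    }

    int main(void) {                                         /* map one 4 KB page and show the entries */
        map_page(0x00801000u, 0x0013A000u, PTE_W);
        pte_t *pt = (pte_t *)(page_dir[PDX(0x00801000u)] & ~(pte_t)0xFFF);
        printf("PDE = %#lx, PTE = %#lx\n",
               (unsigned long)page_dir[PDX(0x00801000u)],
               (unsigned long)pt[PTX(0x00801000u)]);
        return 0;
    }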
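
For the process-management part, a minimal self-contained sketch of round-robin scheduling: scan the environment (process) table circularly, starting just after the environment that ran last, and run the first runnable entry. The env table and env_run() here are simplified stand-ins for the kernel's real ones:

    #include <stdio.h>

    #define NENV 8
    enum env_status { ENV_FREE, ENV_RUNNABLE, ENV_RUNNING };

    struct env { int env_id; enum env_status status; };
    static struct env envs[NENV];
    static struct env *curenv;                 /* environment currently running, or NULL */

    static void env_run(struct env *e) {       /* stand-in: a real kernel context-switches here */
        if (curenv && curenv->status == ENV_RUNNING) curenv->status = ENV_RUNNABLE;
        e->status = ENV_RUNNING;
        curenv = e;
        printf("running env %d\n", e->env_id);
    }

    void sched_yield(void) {
        int start = curenv ? (int)(curenv - envs) + 1 : 0;
        for (int i = 0; i < NENV; i++) {       /* circular scan over the env table */
            struct env *e = &envs[(start + i) % NENV];
            if (e->status == ENV_RUNNABLE) { env_run(e); return; }
        }
        if (curenv && curenv->status == ENV_RUNNING) { env_run(curenv); return; }
        printf("no runnable environments; halt\n");
    }

    int main(void) {                           /* tiny usage example: three runnable environments */
        for (int i = 0; i < NENV; i++) {
            envs[i].env_id = i;
            envs[i].status = (i < 3) ? ENV_RUNNABLE : ENV_FREE;
        }
        for (int k = 0; k < 5; k++) sched_yield();
        return 0;
    }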
Parallel computing:
Used MPI to implement an FFT, a 3D wave equation solver, and matrix multiplication on a supercomputer platform, IBM Blue Gene.
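
A minimal sketch of distributed matrix multiplication with MPI (not the actual course code), assuming the dimension N divides evenly among the ranks: rank 0 scatters rows of A, broadcasts B, and gathers rows of the result.

    #include <mpi.h>
    #include <stdlib.h>

    #define N 512

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int rows = N / size;                   /* rows of A (and C) owned by this rank */
        double *A = NULL, *C = NULL;
        double *B      = malloc((size_t)N * N * sizeof(double));
        double *localA = malloc((size_t)rows * N * sizeof(double));
        double *localC = malloc((size_t)rows * N * sizeof(double));

        if (rank == 0) {                       /* root initializes the full matrices */
            A = malloc((size_t)N * N * sizeof(double));
            C = malloc((size_t)N * N * sizeof(double));
            for (int i = 0; i < N * N; i++) { A[i] = 1.0; B[i] = 2.0; }
        }

        MPI_Scatter(A, rows * N, MPI_DOUBLE, localA, rows * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
        MPI_Bcast(B, N * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        for (int i = 0; i < rows; i++)         /* local block of C = localA * B */
            for (int j = 0; j < N; j++) {
                double sum = 0.0;
                for (int k = 0; k < N; k++) sum += localA[i * N + k] * B[k * N + j];
                localC[i * N + j] = sum;
            }

        MPI_Gather(localC, rows * N, MPI_DOUBLE, C, rows * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
        MPI_Finalize();
        return 0;
    }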