
CS210: Software Development

This assignment was written by Ali Malik originally for CS106L, a class he taught at Stanford University. I have adapted it to work in Java for CSc210.

Due: Wednesday 11/20 by 11:30PM

Submission:
• WikiRacer.java
• WikiScraper.java
• MaxPQ.java - Implementation of a max-heap priority queue.

Overview

This assignment has multiple goals:
• Implement a priority queue backed by a binary max-heap
• Practice using Sets
• Implement a program using multiple interacting data structures
• Learn about performance optimization strategies including parallelization
• Implement memoization for computationally expensive function calls
• See the importance of the above performance optimization techniques

You will implement a program that finds a Wikipedia link ladder between two given pages. A Wikipedia link ladder is a list of Wikipedia links that one can follow to get from the start page to the end page. It is based on the popular game WikiRace, which you can play online here. The objective is to race to a target Wikipedia page by using links to travel from page to page. The start and end pages can be anything but are usually unrelated to make the game harder.

Assignment

This assignment is broken up into different parts. Before diving into the separate parts, though, it is helpful to get a broad overview of the assignment. A high-level pseudocode of the algorithm is as follows:

    To find a ladder from startPage to endPage:
        Make startPage the currentPage being processed.
        Get the set of links on the currentPage.
        If endPage is one of the links on the currentPage:
            We are done! Return the path of links followed to get here.
        Otherwise, visit each link on currentPage in an intelligent way and search
        each of those pages in a similar manner.

Part One: Internet Connectivity

For this assignment we have provided some starter code. Part of this assignment requires fetching web pages in order to look at the HTML and find the various links on a page. You are responsible for testing this on your machine by Wednesday 11/13. We will not help, debug, or even look at issues with this code after that date. One of the benefits of Java is that it should work independent of what kind of computer it is running on, but as this is the first time this assignment has been given in Java, it is important to test this crucial aspect of the assignment. I will not detail how to test this part of the assignment; that is your job. Read through the starter code to understand which method is responsible for fetching the HTML, and test the inputs/outputs of this method to make sure they work as expected. This part tests code reading, understanding, and testing. The only hint I will mention is that most, if not all, web browsers allow you to see the HTML of any webpage.

Part Two: Scraping the HTML to find links

Congratulations! You have already finished this part. This is what you did for the drill. Note, however, that the function name has changed, since you need to do a few other tasks before finding the actual wiki links. Fill in the private static Set<String> scrapeHTML(String html) method with the solution you came up with for the drill. As a reminder, this function goes through a String of the HTML of a webpage and returns a Set<String> which contains the names of all of the valid wiki links found in that HTML. For our purposes, a valid link is any link that:
• is of the form "/wiki/PAGENAME"
• has a PAGENAME that does not contain either of the disallowed characters '#' ':'

You have examples in the drill testing code.
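If it helps to see the shape of such a scan, here is one minimal way scrapeHTML could be written with indexOf. Treat it as an illustration of the two rules above rather than the required solution: the assumption that every valid link appears in the HTML exactly as href="/wiki/PAGENAME" is mine, and the loop you wrote for the drill may be structured differently.

    // Illustrative sketch only; assumes java.util.HashSet and java.util.Set
    // are imported in the file that declares this method.
    private static Set<String> scrapeHTML(String html) {
        Set<String> links = new HashSet<>();
        String marker = "href=\"/wiki/";             // assumed form of a wiki link in the HTML
        int index = html.indexOf(marker);
        while (index != -1) {
            int start = index + marker.length();     // first character of PAGENAME
            int end = html.indexOf("\"", start);      // closing quote of the href
            if (end == -1) {
                break;                                // malformed HTML; stop scanning
            }
            String pageName = html.substring(start, end);
            // Keep only names without the disallowed characters '#' and ':'
            if (!pageName.contains("#") && !pageName.contains(":")) {
                links.add(pageName);
            }
            index = html.indexOf(marker, end);
        }
        return links;
    }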
Another example follows the brief HTML description below.

HTML

If you are curious, here is a quick overview of HTML. HTML (Hypertext Markup Language) is a language that lets you describe the content and structure of a web page. When a browser loads a webpage, it is interpreting the content and structure described in the HTML of the page and displaying things accordingly. There are many parts to HTML, but we are mainly focused on how hyperlinks are formatted. A section of HTML on a Wikipedia page might look like:

    The sea otter (Enhydra lutris) is a <a href="/wiki/Marine_mammal">marine mammal</a> native to the coasts of the northern and eastern North Pacific Ocean.

which would display the following:

    The sea otter (Enhydra lutris) is a marine mammal native to the coasts of the northern and eastern North Pacific Ocean.

where the blue text ("marine mammal") would be a link to another Wikipedia article. Here we can see that to specify a piece of text as a hyperlink, we have to surround it with the <a> tag. This would look like this:

    <a href="/wiki/PAGENAME">link text</a>

Example

    In <a href="/wiki/Topology">topology</a>, the long line (or Alexandroff line) is a <a href="/wiki/Topological_space">topological space</a> somewhat similar to the <a href="/wiki/Real_line">real line</a>, but in a certain way longer. It behaves locally just like the real line, but has different large-scale properties (e.g., it is neither <a href="/wiki/Lindel%C3%B6f_space">Lindelöf</a> nor <a href="/wiki/Separable_space">separable</a>). Therefore, it serves as one of the basic counterexamples of topology [1]. Intuitively, the usual real-number line consists of a countable number of line segments [0,1) laid end-to-end, whereas the long line is constructed from an uncountable number of such segments. You can consult this book for more information.

In the above case, your method should return a Set containing:

    Topology
    Topological_space
    Real_line
    Lindel%C3%B6f_space
    Separable_space

Part Three: findWikiLadder

Now it is time to find the actual wiki ladder! You will need to fill in the function:

    private static List<String> findWikiLadder(String start, String end)

This function takes in the name of a start page and the name of the target page and returns a List<String> that will be the link ladder between the start page and the end page. For example, a call to findWikiLadder("Mathematics", "American literature") might return a list that looks like Mathematics, Alfred North Whitehead, Americans, Visual art of the United States, American literature. To clarify, this function takes in link names, not actual URLs to these articles. These can be easily converted; see private static String getURL(String link) in WikiScraper.java.

We want to search for a link ladder from the start page to the end page. The hard part in solving a problem like this is dealing with the fact that Wikipedia is enormous. We need to make sure our algorithm makes intelligent decisions when deciding which links to follow so that it can find a solution quickly. A good first strategy to consider when designing algorithms like these is to contemplate how you as a human would solve this problem. Let's work with a small example using some simplified Wikipedia pages.

Suppose our start page is Lion and our target page is Barack Obama. Let's say these are the links we could follow from the Lion page:
• Middle Ages
• Federal government of the United States
• Carnivore
• Cowardly Lion
• Subspecies
• Taxonomy (biology)

Which link would you choose to explore first? Clearly, some of these links look more promising than others. For example, the link to Federal government of the United States looks like a winner since it is probably close to Barack Obama.
In our algorithm, we want to capture this idea of following links to pages "closer" in meaning to the target page before those that are more unrelated. How can we measure this similarity?

One idea to determine "closeness" of a page to the target page is to look at the links in common between that page and the target page. The intuition is that pages dealing with similar content will often have more links in common than unrelated pages. This intuition seems to pan out in terms of the links we just considered. For example, here are the number of links each of the pages above have in common with the target Barack Obama page:

    Pages                                       Links common with Barack Obama page
    Middle Ages                                 0
    Federal government of the United States    5
    Carnivore                                   0
    Cowardly Lion                               0
    Subspecies                                  0
    Taxonomy (biology)                          0

In our code, we will use a List<String> to represent a ladder of links. Our pseudocode looks like this:

    Finding a link ladder between pages startPage and endPage:
        Create an empty priority queue of ladders (a ladder is a List<String>).
        Create/add a ladder containing {startPage} to the queue.
        While the queue is not empty:
            Dequeue the highest priority partial-ladder from the front of the queue.
            Get the set of links on the current page. The current page is the page
                at the end of the just dequeued ladder.
            If the endPage is in this set:
                We have found a ladder!
                Add endPage to the ladder you just dequeued and return it.
            // Part Four optimizations. Ignore this comment until you get to Part Four.
            For each neighbor page in the current page's link set:
                If this neighbor page hasn't already been visited:
                    Create a copy of the current partial-ladder.
                    Put the neighbor page String name on top of the copied ladder.
                    Add the copied ladder to the queue.
        If the while loop exits, no ladder was found, so return an empty List.

Note: this priority queue dequeues the ladder with the highest priority number! This is different from what you did for PatientQueue. This priority queue should be backed by a binary MAX-heap.

It is tempting to see this pseudocode and jump into coding right away. That would be a poor choice. You need to first understand what this pseudocode is doing and why. Read through it a few times to ensure you understand the purpose of each step.

You will need to follow the above pseudocode and write a new implementation of a priority queue that will dequeue the ladder with the highest priority. Note that you will not need all of the methods that were required for PA9. You only need to implement the methods that are required for PA10. This class should be called MaxPQ.java and written in the empty file that we have provided in the starter code.

Remember that after completing Part Two, you have a very valuable tool. The WikiScraper class now has a method public static Set<String> findWikiLinks(String link) which takes in a link name like Barack Obama and returns a Set<String> of all of the wiki links from that page.

Part Four: Now make it fast!

Run your program on any of the below testcases. You will see that your program is extremely slow. Let's make it fast! I am going to describe my process when speeding up this program. I first had a working implementation like you (hopefully) have at this point. It also ran really slow. What do you do in this situation? Well, two things: 1. Read through your code to search for inefficiencies. 2. Run some tests to see what is taking so long! This can be sophisticated or as easy as adding print statements and seeing which parts of your code take really long.
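For instance, a throwaway timing harness along the lines below can show how long a full search takes before and after the optimizations in this part. The class name is made up, and it assumes findWikiLadder is reachable from a separate test file (see the Hints section below about calling WikiRacer.findWikiLadder(...) from another file); if it is not, temporarily time the call inside main instead.

    // Hypothetical timing harness; only WikiRacer.findWikiLadder comes from the assignment.
    import java.util.List;

    public class LadderTimer {
        public static void main(String[] args) {
            long startTime = System.currentTimeMillis();
            List<String> ladder = WikiRacer.findWikiLadder("Emu", "Stanford University");
            long elapsedMillis = System.currentTimeMillis() - startTime;
            System.out.println(ladder);
            System.out.println("Took about " + (elapsedMillis / 1000.0) + " seconds");
        }
    }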
I encourage you to do this right now.

I wanted to require you to time how long it takes to find a ladder between Emu and Stanford University without the optimizations you are about to perform, but I will not require you to turn this time in. You should do it though! It will make you very proud of the memoization you are about to perform.

The part that needs speeding up is fetching the HTML of the different webpages. Add the below lines of code right where the comment that says "Part Four optimizations" is in the pseudocode above.

    links.parallelStream().forEach(link -> {
        WikiScraper.findWikiLinks(link);
    });

The above code assumes you have named your Set<String> of links from the current page 'links'.

The above code simply calls findWikiLinks on each link in the set. This is worthless until you have memoized that function call! Now implement memoization as we saw in class for the Fibonacci sequence. Remember, the whole point of this process is to save the results of your function calls, i.e., if you have already called findWikiLinks("Emu"), don't do the work again; just return the result you previously calculated! (A short sketch of this caching pattern appears at the end of this handout.)

Memoization should only require 4-10 lines of code.

Now find a ladder between Emu and Stanford University. It should be way faster!

Test Pages

Here are some good test pages to try your algorithm on in the early stages:

• Start page: Fruit
• End page: Strawberry
• Expected return: Fruit, Strawberry
• Notes: This should return very quickly since it is a one-link jump.

• Start page: Milkshake
• End page: Gene
• Expected return: Milkshake, Carbohydrate, DNA, Gene
• Notes: This ran in about 30 seconds on my computer.

• Start page: Emu
• End page: Stanford University
• Expected return: Emu, Savannah, United States, University of California, Berkeley, Stanford University
• Notes: This ran in about 30 seconds on my computer.

You don't need to match this ladder exactly, but your code should run in roughly the same amount of time as the times specified. If not, chances are your priority queue is not working correctly, you did not implement memoization correctly, your scrapeHTML function is inefficient, or you have another inefficiency somewhere in your program.

Hints

• Make sure you understand the pseudocode or you will have a difficult time testing and fixing issues in your code. It is impossible to debug code you do not understand.
• You should NOT edit the main, fetchHTML, or getURL functions. If you do edit main() for testing, be sure to revert it back to its original state. Keep in mind you can create an entirely separate file to test and call WikiRacer.findWikiLadder(...).
• Start early!
• Test the various pieces of your program very well. If you do not ensure that each piece works perfectly, you will find it very difficult to detect issues with the whole. Use the debugger and/or various print statements.

Grading Criteria

We are not providing testcases for this PA. It is impossible since we are allowing different ladders. We encourage you to write your own JUnit testcases to ensure your classes work properly, but we will not be collecting or grading these test files. We will test your classes with our own testcases.

Your grade will consist of similar style and code clarity points we have looked for in earlier assignments.

Write your own code. We will be using a tool that finds overly similar code. Do not look at other students' code. Do not let other students look at your code or talk in detail about how to solve this programming project. Do not use online resources that have solved the same or similar problems.
It is okay to look up, "How do I do X in Java", where X is indexing into an array, splitting a string, or something like that. It is not okay to look up, "How do I solve {a programming assignment from CSc210}" and copy someone else's hard work in coming up with a working algorithm.
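As mentioned in Part Four, here is a short sketch of the caching pattern described there. It is an illustration rather than the required solution: the field name is made up, and the body of findWikiLinks shown is only a plausible composition of the getURL, fetchHTML, and scrapeHTML methods named in this handout, so your starter code may arrange those calls differently; the caching lines are the point.

    // Sketch of memoizing findWikiLinks inside WikiScraper; assumes java.util.HashMap,
    // java.util.Map, and java.util.Set are imported. Names other than findWikiLinks,
    // getURL, fetchHTML, and scrapeHTML are illustrative.
    private static Map<String, Set<String>> memo = new HashMap<>();

    public static Set<String> findWikiLinks(String link) {
        if (memo.containsKey(link)) {
            return memo.get(link);                 // already fetched: reuse the saved result
        }
        String html = fetchHTML(getURL(link));     // the expensive network call, now done once per page
        Set<String> links = scrapeHTML(html);
        memo.put(link, links);
        return links;
    }

One caveat worth noting: the parallelStream call in Part Four invokes findWikiLinks from multiple threads, and a plain HashMap is not thread-safe, so if you see erratic behavior, java.util.concurrent.ConcurrentHashMap is a drop-in replacement for the cache.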
