
groupname_assignment_b

Note that this is a group assignment - you are required to work in a group of 2-4 students for the whole assignment. One set of peer evaluation forms (submitted via Blackboard) is required for Assignment B.

1.1 Background

You have been employed by a company that sells apps and devices to help drivers reduce their risk of infringing on road rules (and getting caught!). The marketing department has come up with a two-pronged campaign that it wants to target to different demographics.

They plan to market the financial implications of infringements to university students, through an education campaign about the type and cost of infringements that are most likely to occur. They plan to market the safety aspects of their products to young families, focussing on situations where child safety is at risk.

Your job is to help support these campaigns: first, to establish the market for them, and secondly to provide information that will be used in the education pieces of the campaigns.

NB: The data set used in this assignment is both real and very recent (from the NSW Office of State Revenue, see [http://data.gov.au/] for all open government data sets, or [http://www.revenue.nsw.gov.au/info/statistics] for this particular one - the "Penalty Notice Data Set"). That means you may be the first person in the world to uncover an error, quirky fact, or meaningful result. Good luck!

1.2 Submission Instructions

1. Each group needs to submit a single Jupyter notebook (.ipynb file) which contains all of their code and analysis, via the link on Blackboard (Assessment, Assignment B Submission).
2. The provided material is a zip file containing a template notebook (this document: groupname_assignment_b.ipynb, also as a pdf), two data files (penalty_data_train.csv and penalty_data_test.csv), and an excel spreadsheet describing the data set (penalty_data_description.xlsx).
3. Complete the template notebook with your code. You may make extra cells as you prefer, but please leave the question cells there for ease of reading.
4. The notebook will be run using the menu "Cell->Run All" (using the latest Python 3 based Anaconda Python installation available on the date the assignment is posted), with the penalty_data_set.csv file in the same folder as the notebook.
5. All of your outputs (.csv files) need to be written to that same directory, with the filename and format as requested by the question.
6. The correctness of produced .csv files will be assessed automatically (by a python script), so specifications must be followed precisely. The most important thing is to have the (exact) correct column names and row ordering. The bold numbers (index) of the data frame will be ignored, so don’t worry about them.
7. Use Markdown Cells for longer explanation of your work and analysis, as required by some of the questions.
8. A short assessment of the content of the notebook will be made (for code style, clarity of explanation, and validity of your approach).

1.3 Marking Criteria

1. Correctness of results as per the given training / validation split.
2. Correctness of results on a different random training / validation split (to be determined by the marker after the assignment is handed in). This means that excessively tuning your results for the exact training/test data is not a good idea.
3. Clear, well commented code (using the "#" symbol to add comments to explain your thinking). This is particularly important when a result is incorrect, as you may still be able to get partial marks for your answer.
4. Specific marking criteria as described in the questions below.

1.4 Suggested Resources

While posting the questions online is strictly forbidden by the University’s academic honesty policy, you may find help in a variety of ways:

• You should be able to do the whole assignment with the following packages, which have very helpful documentation on their websites:
    • pandas: http://pandas.pydata.org/ (e.g. import pandas as pd)
    • scikit-learn: http://scikit-learn.org/stable/index.html (e.g. from sklearn.tree import DecisionTreeClassifier)
• There are many helpful online forums where python developers and data scientists discuss the best ways of solving particular problems. http://stackoverflow.com is the biggest, and will likely appear in any googling you do.
• If you still feel stuck with the basics, there are many free online resources to help you get up and running, e.g. http://datacamp.com, and inexpensive e-books such as those on O’Reilly.

1.5 Errors

If you believe there are any errors with the assignment please email the lecturer immediately at michael.bewley@sydney.edu.au.

1.6 Setup

The code below reads the data file, creates a training and test data set, and displays the first five rows for you.
Add code in the cells below (make more cells if you like) to answer the questions.

DO NOT EDIT (except for adding in your group name)

In [ ]: %pylab notebook
        import pandas as pd
        # EDIT HERE: Replace this bit with a unique name for your group
        GROUP_NAME = "my_group_name"
        df_train = pd.read_csv('penalty_data_train.csv', parse_dates=['OFFENCE_MONTH'])
        df_test = pd.read_csv('penalty_data_test.csv', parse_dates=['OFFENCE_MONTH'])

1.7 Question 1 (2 marks)

NB: Start with df_train

Initial exploratory analysis: Let’s find out the infringements that bring in the most revenue.

List all of the offence codes that brought in at least $1 million (aggregated throughout the entire duration of the data set for each offence code). List them in a data frame with the description, the number of occurrences, and the total revenue brought in by that offence. Order from highest to lowest total revenue.

The format for the data frame "df_top_offences" before saving to csv should be:

OFFENCE_CODE  OFFENCE_DESC                                         TOTAL_NUMBER  TOTAL_VALUE
79053         Use unregistered registrable Class A motor veh...   185602        116218644
6963          Disobey no stopping sign                             456009        103148508

1.7.1 Marking Guide

• 1 mark - Partially correct solution (fails automatic verification, but passes some manual inspection of code and results).
• 2 marks - Passes automatic verification for correct results

In [ ]: # Q1 HERE
        df_top_offences =
        #...
        df_top_offences.to_csv('q1_{}.csv'.format(GROUP_NAME))

1.8 Question 2 (3 marks)

NB: Start with df_top_offences

The marketing team wants to do a campaign about infringements relating to red lights. Take the data frame "df_top_offences" from Question 1, and restrict it to only those entries that mention the colour "red" (careful!). Save it as a csv in the same format as per Question 1.

1.8.1 Marking Guide

• 1 mark - Fails automatic verification, but solution has some correct aspects.
• 2 marks - Fails automatic verification with minor errors, e.g. text search not quite accurate, or wrong order.
• 3 marks - Passes automatic verification for correct results

In [ ]: # Q2 HERE
        df_top_offences_red =
        #...
        df_top_offences_red.to_csv('q2_{}.csv'.format(GROUP_NAME))

1.9 Question 3 (5 marks)

NB: Start with df_train

The marketing team now wants to understand the magnitude of infringements that relate to child safety, for use with their "young families" customer segment.

Take the original data frame ("df") and find any offence (regardless of number of occurrences) that relates to children (or school zones), based on the text. You’ll have to come up with your own definition for what this means - please explain it in a comment. Add a new boolean column called CHILD_RELATED that is True when the OFFENCE_DESC matches your search, and False when it does not. Leave rows in the same order as the df_train that you read in at the start.

Save data in csv of the following format:

CHILD_RELATED  OFFENCE_DESC
False          Proceed through red traffic light - Camera Det...
False          Stop on/near marked foot crossing
False          Enter restricted area without offering ticket ...

1.9.1 Marking Guide

• 1-2 marks - Solution incorrect, but some correct aspects.
• 3-4 marks - Some minor errors with the solution.
• 5 marks - Passes automatic verification for correct results (some leniency given to differing interpretations of "child related").

In [ ]: # Q3 HERE
        df_child_related =
        #...
        df_child_related.to_csv('q3_{}.csv'.format(GROUP_NAME))

1.10 Question 4 (10 marks)

Imagine the office of state revenue has just announced some changes that will be made to the data set in future (hey, you’re lucky they bothered to announce it!).

1. They want to "simplify" the data by removing precise details of the infringements:
    • The OFFENCE_CODE and OFFENCE_DESC columns will no longer be given in future.
    • The FACE_VALUE and TOTAL_NUMBER of infringement columns will be removed (but the TOTAL_VALUE column will stay).
2. The SCHOOL_ZONE_IND column will no longer be available in future.

Your marketing team panics that this data set, which is core to their "child related" strategy, is about to become useless for ongoing campaigns.
You assure them that you can build a predictive model which can make a reasonable guess whether a line entry in the new data set is about a child related offence, based on the remaining columns that will be left in the data.

Build a model that predicts whether a line represents a CHILD_RELATED infringement, as defined previously, using the remaining variables in df_train. Hint: Using dates in prediction is probably unwise.

Write the predictions for the test data set to a csv file in the following format, preserving the same row order as df_test, where CHILD_RELATED is the same as in your answer to Question 3, and CHILD_RELATED_PREDICTION is the binary (True/False) output of your predictive model for each row:

CHILD_RELATED  CHILD_RELATED_PREDICTION
False          False
False          False
False          ...

1.10.1 Marking Guide

• 1-4 marks - Code exhibits some aspects of a correct model build, but either no scores are produced, or the model is no better than random guessing.
• 5 marks - Model achieves fair (better than random) performance on the provided test set
• 6-8 marks - Model achieves fair to good performance on a different random split of the training/test data.
• 9-10 marks - Model achieves good to outstanding performance on an undisclosed test method.

NB: Questions about what a "good" model performance is, will not be answered, other than the generic "and 100% is a perfect model". We are simulating a "real world" model build, where you are not provided with a definition of "good enough" prior to building it!
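As a rough illustration of how the Question 3 label and the Question 4 model fit together, the sketch below works on a tiny invented data frame rather than the real penalty data. The keyword pattern is only one possible definition of "child related", and the column EXAMPLE_CATEGORY is a hypothetical stand-in for whatever columns remain after the announced removals; neither comes from the assignment itself. The toy signal is deliberately clean, so the scores here say nothing about what to expect on the real data.

```python
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Invented stand-in for df_train: 40 rows cycling through four descriptions.
descs = [
    "Exceed speed limit in school zone",
    "Child under 7 not in approved restraint",
    "Disobey no stopping sign",
    "Proceed through red traffic light",
]
toy = pd.DataFrame({
    "OFFENCE_DESC": [descs[i % 4] for i in range(40)],
    # Hypothetical remaining column (not from the real data set).
    "EXAMPLE_CATEGORY": ["A" if i % 4 < 2 else "B" for i in range(40)],
    "TOTAL_VALUE": [100 + 10 * (i % 7) for i in range(40)],
})

# One possible definition of "child related": whole-word, case-insensitive
# match on a few keywords (an assumption, to be explained in a comment).
CHILD_PATTERN = r"\b(?:child|children|school)\b"
toy["CHILD_RELATED"] = toy["OFFENCE_DESC"].str.contains(
    CHILD_PATTERN, case=False, regex=True
)

# Use only columns that survive the announced changes; one-hot encode categories.
X = pd.get_dummies(toy[["EXAMPLE_CATEGORY", "TOTAL_VALUE"]],
                   columns=["EXAMPLE_CATEGORY"])
y = toy["CHILD_RELATED"]

# Positional split standing in for the provided train/test files.
X_train, X_test = X.iloc[:30], X.iloc[30:]
y_train, y_test = y.iloc[:30], y.iloc[30:]

model = DecisionTreeClassifier(max_depth=3, random_state=0)
# Cross-validate on the training part only, e.g. to compare hyperparameters.
cv_scores = cross_val_score(model, X_train, y_train, cv=3,
                            scoring="balanced_accuracy")
model.fit(X_train, y_train)

# Same two-column layout as the requested output, in test-row order.
result = pd.DataFrame({
    "CHILD_RELATED": y_test.to_numpy(),
    "CHILD_RELATED_PREDICTION": model.predict(X_test),
})
print(cv_scores.mean())
print(result.head())
```

On real data the training and test files are encoded separately, so the test feature columns would need to be aligned to the training columns (for example with `DataFrame.reindex`) before calling `predict`.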
A range of binary performance metrics will be used in the assessment.

In [ ]: y_train = df_child_related.CHILD_RELATED
        X_train = df_train.drop(['SCHOOL_ZONE_IND', 'OFFENCE_CODE', 'OFFENCE_DESC', 'FACE_VALUE', 'TOTAL_NUMBER'], axis=1)
        # Do whatever you like with the rest of the X variables.
        # Build a classifier
        # Hint: Use cross validation on df_train to tune any hyperparameters.
        # Do the same manipulations to df_test as you did to df_train (selecting input variables etc)
        # and produce predictions on that data set.
        y_pred.to_csv('q4_{}.csv'.format(GROUP_NAME))

1.11 Question 5 (10 marks)

One of the most important aspects of data science is serendipitous discovery. If you’re lucky, you may have been asked by a business to do one fairly straightforward analysis, but you discover something else important along the way. More commonly, you will be provided with some data and a vague business goal, and expected to come up with something insightful that impacts the business. This is your job for Question 5 - still pretending you’re working for the same company described above, perform an unsupervised learning analysis, and write a report (in markdown cells below) documenting what you find. You must use either PCA or k-means, do some visualisation of results, and explain what you see.

Pretend companies aside, this is a real, up-to-date open government data set. If you find out something important that relates to the real world, you’re playing for more than just uni marks. Maybe you’ll alert the government to policy error or fraud. You could discover something juicy that’s of interest to the Australian media. Perhaps you’ll even find a business opportunity and make some money! If you find out something important that isn’t part of the "core business" of the pretend company - don’t worry. Great analysis and insight are the best way to get marks.

• 1-4 marks - Some attempt at analysis is made (a few graphs, and a bit of explanation), but there is neither correct use of PCA nor k-means, and nothing particularly insightful.
• 5 marks - Correctly applies PCA or k-means, and comes up with an interesting insight.
• 6-8 marks - Clear explanations, good visualisations, correct use of PCA or k-means, and a meaningful insight.
• 9-10 marks - Gets all the basics above, but comes up with a genuinely compelling insight.

2 5. Unsupervised Learning Report

Put your answers here! Markdown cells support all sorts of formatting (not quite as flexible as Word, but enough to write a good report).

Source: http://ass.3daixie.com/2018060769870117.html
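For the unsupervised analysis asked for in Question 5, the mechanics of PCA and k-means might look roughly like the sketch below. It runs on synthetic numbers, not the penalty data: the four numeric features and the two groups are invented purely to show the calls, and choosing real features, the number of clusters, and the interpretation is the actual work of the report.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Two invented groups of 50 rows each, described by 4 numeric features.
group_a = rng.normal(loc=0.0, scale=1.0, size=(50, 4))
group_b = rng.normal(loc=6.0, scale=1.0, size=(50, 4))
features = np.vstack([group_a, group_b])

# Standardise first so no single feature dominates distances or variance.
scaled = StandardScaler().fit_transform(features)

# PCA: project onto the two directions of greatest variance for plotting.
pca = PCA(n_components=2)
components = pca.fit_transform(scaled)

# k-means: assign each row to one of two clusters.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(scaled)

print(pca.explained_variance_ratio_)
print(np.bincount(labels))

# A scatter plot of the projection coloured by cluster is one simple
# visualisation (left as comments so the sketch has no plotting dependency):
# import matplotlib.pyplot as plt
# plt.scatter(components[:, 0], components[:, 1], c=labels)
# plt.xlabel("PC1"); plt.ylabel("PC2")
```

Printing `explained_variance_ratio_` alongside the plot helps judge how much of the structure the two-dimensional view actually retains.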
