Department of Accounting and Finance
Lancaster University

AcF 351b Career Skills in Accounting and Finance
Python for Data Analysis
Stream Assignment
2019/20

1. Overview

The Python for Data Analysis stream is designed to provide introductory programming knowledge to students who have little or no prior programming experience. Over its five sessions, the module has covered Python language basics, scientific computing and web scraping packages, and introductory textual analysis. The assignment is therefore intended to give students the opportunity to practice what they have learned on the course, and to encourage independent work towards writing a complete script that downloads public data and conducts basic textual analysis in the fields of accounting and finance.

In this coursework, students are expected to investigate the potential consequences of reporting in a negative/constraining tone, i.e., do investors respond strongly to such soft information hidden in companies’ financial reporting and, if so, in which way and to what extent? Students are also encouraged to be creative and to explore further research questions that are meaningful to this topic.

This document sets out the details of the stream coursework requirements, along with some instructions/tips and a core reading list.

2. Coursework Submission

• Submission Deadline: Wednesday 15th January 2020, 12:00 noon
• Submission Location: Moodle
• Submission Documents:
  o Part 1: A Jupyter Notebook file for your Python script. You are required to provide code that demonstrates the workflow to obtain, process and analyze your data. This part contributes 50% of your coursework assessment.
    In general, this coding part should consist of the following (for detailed instructions see Section 3, Part 1):
    ▪ A sub-section to scrape 10-K data from the SEC Edgar database;
    ▪ A sub-section to access financial data from WRDS;
    ▪ A sub-section to merge, clean, process and analyze the downloaded data.
    NOTE:
    • Make sure that your code can be run in other people’s environments and that your results can be reproduced;
    • Make sure you also add short comments/markdown text to highlight what your code is doing.
  o Part 2: A Microsoft Word document for your report. You are required to write a report that interprets the results of your analysis. This part contributes 50% of your coursework assessment. You should include the following:
    ▪ A cover sheet on the first page of your report, giving your full name and student ID;
    ▪ A short introduction to the topic;
    ▪ A relevant literature review (you are expected to read more than the selected references listed at the end of this document);
    ▪ Analysis of the data and main findings (for detailed instructions see Section 3, Part 2);
    ▪ A short conclusion;
    ▪ References (Review of Financial Studies or Journal of Finance style).
    NOTE:
    • The coursework assignment should be kept as short and concise as possible. The overall length of the coursework MUST NOT EXCEED 2,000 words. (Please note that the word limit excludes tables, the list of references, and appendices containing illustrative and supporting material, but includes footnotes.)
    • Completed reports that exceed the maximum word limit will be SEVERELY PENALIZED.
    • Make sure to use 12 point Times New Roman font with generous margins and 1.5 line spacing consistently.
    • For more detailed guidance on writing your report, please refer to the general outline of the AcF351b module.

3. Instructions

Part 1: Gathering and preparing data

In this part, the assessment will be based primarily on your Python code in Jupyter Notebook documents.
This part has two main tasks and counts for 50% of the final mark for this coursework.

• TASK 1 (25 marks)
  Download and scrape 10-K filings for all listed US companies for the period 2000 to 2018 from the SEC Edgar database and conduct preliminary analysis on the textual data. Some tips (as discussed in detail in the fifth session) are:
  o Start with the SEC Edgar Archives directory listing of full-index and understand how the SEC Edgar database organizes U.S. companies’ filings;
  o Download the crawler.idx file and parse the useful information in the crawler files (for each QTR of each year);
  o Follow the link to the 10-K filing summary page and then to the actual 10-K filing in htm format (for each filing);
  o Harvest all the text data in the 10-K filing report (for each filing);
  o Store all relevant data locally, ready for further processing;
  o Clean the downloaded textual data (text pre-processing; refer to Bodnaruk, Loughran and McDonald, 2015 for the detailed process).

• TASK 2 (25 marks)
  Access and download CRSP stock data (and Compustat data if you think it can enhance your analysis) from WRDS and merge it with the processed text data above. Some tips (as discussed in detail in the third session) are:
  o Going through the Fama-French sample code from WRDS can be enormously helpful for understanding CRSP and Compustat data. However, you do not need to rewrite the code; just take whatever you think is sensible for our purpose;
  o Merge databases. Pay extra attention to company IDs, since the CRSP, Compustat and SEC Edgar databases use different identification codes (for instance, CRSP has crsp_permno, Compustat uses GVKEY, and SEC Edgar uses CIK). You should have access to a linking table/dataset in WRDS to map these different IDs.
    There are also plenty of resources online that demonstrate how to link IDs, and you should also be able to find instructions in Bodnaruk, Loughran and McDonald (2015);
  o Equally, you need to figure out how to merge data on the time dimension. 10-K filings have two sets of timestamps: filing date and report date. The filing date is when an individual 10-K report is sent to the SEC, and you might want to treat this date as the date on which the report becomes publicly available. The report date, however, indicates the period covered by that report. For instance, a report date of 30th September 2018 for a 10-K file covers all the financials of the company since the last 10-K file in 2017;
  o Merge the multiple datasets into a panel data table ready for further processing and analysis.

Part 2: Exploring and analyzing data

In this part, the assessment will be based primarily on your academic report in a Microsoft Word document. This part has two main tasks and counts for 50% of the final mark for this coursework.

• TASK 3 (20 marks)
  Conduct exploratory data analysis (EDA). You need to provide detailed summary statistics on your data (including textual data from SEC Edgar and stock data from WRDS), and explore and interpret relevant questions, including (but certainly not restricted to):
  1. How many negative and/or constraining words are there in the 10-K files? Are these two tones highly correlated?
  2. Are there cross-sectional differences? Which companies report in the most negative (and/or constraining) tone?
  3. Do companies report in a consistent tone over time, or does their tone change dramatically?
  Note: You can search for and follow others’ textual analysis workflows if you think they can provide insights into your data, and it is always a solid idea to replicate this part from published journal articles such as Loughran and McDonald (2011), Bodnaruk, Loughran and McDonald (2015) and others.

• TASK 4 (30 marks)
  Follow Bodnaruk, Loughran and McDonald (2015) to conduct textual analysis on companies’ 10-K text and investigate the potential consequences of financial reporting with a negative/constraining tone. You need to conduct an event study around the 10-K filing disclosure date and examine whether and how the market reacts to such soft information. Answer the following question and interpret your findings:
  1. How did the most (least) constrained companies perform around the disclosure, for example one day/week prior to the 10-K filing and one day/week/month/year afterwards? You might want to group stocks into quantiles according to the tone of their reports. Please refer to market event studies published in journal articles, including (but not limited to) Bernard and Thomas (1989), who documented the post-earnings-announcement drift.
  Note: After attempting to answer the above question, you are encouraged to explore other research questions related to this topic and interpret your findings. One example would be whether companies’ operating performance actually deteriorates (improves) after reporting in a negative/constraining (positive) tone. Be creative, in an academic way of course.

4. Resources

Selected Academic References:
- Bernard, V. L., and J. K. Thomas. (1989). “Post-Earnings-Announcement Drift: Delayed Price Response or Risk Premium?”, Journal of Accounting Research, 27: 1-36.
- Loughran, Tim and Bill McDonald. (2011). “When is a Liability not a Liability? Textual Analysis, Dictionaries, and 10-Ks”, Journal of Finance, 66: 35-65.
- Bodnaruk, Andriy, Tim Loughran, and Bill McDonald.
  (2015). “Using 10-K Text to Gauge Financial Constraints”, Journal of Financial and Quantitative Analysis, 50: 623-646.
- Loughran, Tim and Bill McDonald. (2016). “Textual Analysis in Accounting and Finance: A Survey”, Journal of Accounting Research, 54: 1187-1230.

Selected Online Resources:
- Software Repository for Accounting and Finance (SRAF): https://sraf.nd.edu/
- Professor Bill McDonald’s website: https://www3.nd.edu/~mcdonald/ and his various recently published papers utilising textual analysis
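
Illustrative sketch for Task 3 (word counts by tone): the snippet below shows one possible way to compute the share of negative and constraining words in a piece of text. The function name tone_shares and the two tiny word sets are placeholders invented for this example; they are not the actual Loughran-McDonald word lists, which would need to be obtained separately (for instance from the SRAF site listed above). Treat it as a starting point under those assumptions rather than a prescribed method.

```python
import re
from collections import Counter

# Placeholder word sets for illustration only (assumption): a real analysis
# would load the full negative and constraining word lists instead.
NEGATIVE = {"loss", "losses", "impairment", "adverse", "decline"}
CONSTRAINING = {"required", "obligations", "covenants", "restricted"}


def tone_shares(text):
    """Return the word count and the share of negative/constraining words."""
    # Lower-case the text and keep alphabetic tokens only.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    negative = sum(counts[word] for word in NEGATIVE)
    constraining = sum(counts[word] for word in CONSTRAINING)
    return {
        "n_words": total,
        "negative_share": negative / total if total else 0.0,
        "constraining_share": constraining / total if total else 0.0,
    }


sample = "The impairment loss was required by restricted covenants."
result = tone_shares(sample)
print(result)
```

Applying such a function to each cleaned 10-K text would give one row of tone measures per filing, which could then be summarized across companies and years or correlated with each other.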