Assignment #4
Course: ISA 414
Points: 100
Due date: November 18th, 2019, before 11:59 pm

Submission instructions: this assignment is to be done individually. All your answers should be in a single R script. Your code must be well formulated (i.e., no errors) and sound (i.e., it does what the question asks it to do). In particular, the grader must be able to open your .R file using RStudio and run the code without running into errors. Code with errors may receive zero points. Submit the final document on Canvas before the due date.

Question 1: suppose you are responsible for developing the code that computes the number of times each word appears on Twitter every day. One can use these frequencies, for example, as input when calculating the daily trending words. Clearly, Twitter’s data are massive. Being an expert in Hadoop, you quickly realize that you can use MapReduce to complete your task. I highly encourage you to use Remote Desktop Connection to complete this question since all of the required libraries are already installed there, and these libraries are not straightforward to install. Also, make sure you set the version of R to 3.4.3.

a) To test your solution, you will be working with a sample of Twitter’s data. Start by loading the file tweets_asst_4.csv (available on Canvas) into R using the read.csv command (remember to set the argument stringsAsFactors = FALSE). Next, upload the resulting data frame to HDFS using the command to.dfs. [10 points]

b) Define a map function to solve your task. Hint: you might want to consider keys created by combining (“pasting”) the date a tweet was tweeted with each word in the tweet. For example, for a tweet tweeted on April 10th, 2017 containing the word “spider”, a possible key returned by the map function would be 2017-04-10_spider. [20 points]

c) Define a reduce function that counts the number of times each word appears on Twitter per day. Hint: see the reduce function in the word-counting example we covered in class.
[10 points]

d) Run the mapreduce function using the data in a), the map function in b), and the reduce function in c). Thereafter, retrieve the final output from HDFS and display it as a data frame (table). [10 points]

Question 2 – real-life case study: bol.com

A 2015 study sponsored by the Dutch electronic-commerce company bol.com, led by Arthur Carvalho (previously: Rotterdam School of Management – Erasmus University; currently: Farmer School of Business – Miami University) and Esther Hundepool (PwC), investigated some of the factors that affect customers’ willingness-to-buy in B2C e-commerce environments. The case below is an adaptation of the above study.

Business Understanding:

Over the past 20 years, the Internet has changed the way consumers buy goods and/or services. Ranging from groceries to vacation packages and clothing, more and more people are using the Internet to shop online. The online selling of products and/or services by businesses to consumers is often defined as business-to-consumer (B2C) electronic commerce (e-commerce).

E-commerce makes up a big share of the retail industry, often providing more product choices and faster delivery time than bricks-and-mortar retailers do. The transactions related to B2C e-commerce in Western Europe totaled 177.7 billion euros in 2013, an increase of 12 percent when compared to the previous year. Another interesting fact is that 95 million consumers in Western Europe bought goods and/or services online in 2013. The total e-commerce sales in the United States amounted to 1,233 billion US dollars in 2013. It is clear that e-commerce is a booming business, which creates an extensive array of research opportunities, e.g., understanding the factors that influence customers’ willingness-to-buy in B2C e-commerce environments.

One can argue that trust perception is one of the biggest barriers for consumers to engage in electronic commerce. A potential lack of trust will likely discourage consumers from participating in online shopping.
Therefore, it is interesting to study how to manage trust in e-commerce environments as well as to study the influence of different types of trust on consumers’ willingness-to-buy online.

In addition to trust perception, risk perception can be another challenging factor in e-commerce. Different types of risk perception are likely to influence consumers’ attitudes towards online transactions.

Finally, consumers’ demographic traits might also be of influence when it comes to online shopping behavior.

The goal of this study is to investigate the variables that significantly influence, either positively or negatively, customers’ willingness-to-buy in B2C e-commerce environments. Following the above background sketch, one can formulate the underlying business problem as:

What are the determinants of customers’ willingness-to-buy in B2C e-commerce environments?

In particular, this study aims at measuring the effects of perceived risk and perceived trust on consumers’ willingness-to-buy online. As e-commerce sales are expected to continue growing over the years, understanding these factors, and how to effectively deal with them, will play a crucial role in the online strategies of companies engaging in e-commerce.

Data Understanding:

The data in this study were collected by means of an electronic survey developed in partnership with PwC and bol.com. To illustrate the process of online shopping, the survey started by showing the respondents a 5-minute video of actual browsing and shopping behavior on bol.com, the number one online retailer in the Netherlands. Specifically, after exhibiting some features of the website, the video showed a search for and a purchase of a digital camera.

When the video was over, the survey showed a web page from bol.com containing a detailed description of the purchased camera. Following the video and product description, the survey measured three dimensions of perceived risk and three dimensions of perceived trust using five question-items per dimension.
The six dimensions are: Perceived Product Risk (PPR), Perceived Informational Risk (PIR), Perceived Economic Risk (PER), Perceived Integrity (PI), Perceived Safety (PS), and Perceived Benevolence (PB).

Next, the survey measured the main dimension of interest, Willingness-to-Buy (WTB), using five question-items. All the question-items used a 0-100 scale. Think about a chosen scale-value as the likelihood (represented in percentage values) that the respondent agrees with a statement in the question-item. At the end, the survey collected demographic information, such as respondents’ age, income, and gender.

The survey was available from March 17th, 2015 to April 18th, 2015. We invited participants via social networks and by sending emails to subject pools from Rotterdam School of Management at Erasmus University, and the office of the company PricewaterhouseCoopers (PwC) located in Rotterdam (the Netherlands). In total, 360 participants started the survey.

After the data collection phase, we prepared the resulting data set for subsequent analysis by removing all incomplete survey responses, which resulted in a total of 199 full observations in the data set, a completion rate of 55.28%. We show below the structure of the survey we used to collect data (translated from Dutch):

Perceived Product Risk (PPR)
- PPR_1: I think this product will perform as expected.
- PPR_2: The product purchased will likely not perform as expected.
- PPR_3: I think it is difficult to judge the quality of this product adequately.
- PPR_4: In case of a product purchase on this website, it is likely to fail the performance requirements originally intended.
- PPR_5: I believe the likelihood is high that something is wrong with the performance of this product.
Perceived Informational Risk (PIR)
- PIR_1: It is clear to me whether Bol.com intends to give my personal information to third parties.
- PIR_2: I believe this website will protect my personal information from exposure to third parties.
- PIR_3: I believe Bol.com does not intend to misuse the personal information provided by me.
- PIR_4: I believe Bol.com will protect and store my personal information correctly.
- PIR_5: I believe Bol.com is likely to misuse my personal information.

Perceived Economic Risk (PER)
- PER_1: Purchasing from this website would involve economic risk (fraud, hard to return).
- PER_2: I believe I can return this product and get a refund easily.
- PER_3: I believe there is a high chance that I stand to lose money if I purchase this product.
- PER_4: When I purchase this item from Bol.com I have the chance of financial loss.
- PER_5: I believe there is a great chance I do not receive the intended product.

Perceived Integrity (PI)
- PI_1: Bol.com acts sincere in dealing with their customers.
- PI_2: I believe this online shop is honest to their customers.
- PI_3: I believe Bol.com would keep its promise.
- PI_4: I would characterize Bol.com as honest.
- PI_5: Bol.com acts truthful in dealing with their customers.

Perceived Safety (PS)
- PS_1: I believe this online shop has sufficient technical capacity to ensure my data cannot be intercepted by hackers.
- PS_2: I believe this online shop shows great concern for the security of any of the transactions.
- PS_3: I think this online shop has mechanisms to ensure the safe transmission of my information.
- PS_4: I believe to have a safe transaction when purchasing from Bol.com.
- PS_5: Purchasing from this online shop is safe.
Perceived Benevolence (PB)
- PB_1: When problems occur, I believe this website will be prepared to solve my problems.
- PB_2: In case of a problem, I believe it will be easy to report a complaint to this website.
- PB_3: I believe, when required, Bol.com would do its best to offer help.
- PB_4: In case of a problem, I believe this website will make all the necessary efforts to solve it.
- PB_5: I believe this online shop keeps the well-being of the consumer needs in mind.

Willingness to Buy (WTB)
- WTB_1: The likelihood that I would shop at this online shop is high.
- WTB_2: I would consider buying this product at this price.
- WTB_3: I would be willing to recommend this online shop to friends.
- WTB_4: I would be willing to buy at this online shop.
- WTB_5: It is likely that I will purchase at this online shop.

Demographics:
- Gender: What is your gender?
    Male
    Female
- Age: What is your age?
    Below 18 years old
    Between 18 and 25 years old
    Between 26 and 35 years old
    Between 36 and 45 years old
    Between 46 and 55 years old
    Above 55 years old
- Income: What is your current yearly income?
    Less than $20,000
    Between $20,000 and $35,000
    Between $35,000 and $50,000
    Between $50,000 and $65,000
    More than $65,000
    I prefer not to say

Data Preparation:

It is now time to analyze our data in order to provide an answer to the business problem. From now on, you will be using the Spark technology in conjunction with the R programming language. I highly encourage you to use Remote Desktop Connection to complete this question. Make sure you set the version of R to 3.6.1. Then, run the following commands to install the required libraries:

install.packages("sparklyr")
library(sparklyr)
spark_install(version = "2.0.2")

a) Start by downloading the data set bol.csv from Canvas. Next, run the following commands to load the data locally, connect to a Spark cluster, and send the survey data to the Spark cluster.
[0 points]

library(sparklyr)
library(dplyr)
survey_data <- ...
sc <- ...
survey_tbl <- ...

Unless otherwise stated, all the following questions must be answered with code that is executed on the Spark cluster. You should expect to use functions from the R package dplyr in conjunction with Spark.

b) Note that the scales of PPR_1, PIR_5, and PER_2 are different from the scales of the other items in their dimensions (constructs). For example, the scale of PPR_1 is increasing in positivity, whereas the scales of PPR_2, PPR_3, PPR_4, and PPR_5 are decreasing in positivity. Hence, you have to transform the scales for the sake of consistency. The goal of this preprocessing step is to have all risk-related variables using scales in increasing negativity, and all trust-related variables using scales in increasing positivity. To do so, transform (mutate) the variables PPR_1, PIR_1, PIR_2, PIR_3, PIR_4, and PER_2 by subtracting their original values from 100, e.g., the new values of PPR_1 must be equal to 100 minus the old values. These transformations should change the data set in the Spark cluster. [10 points]

c) After fixing the scales, it is now time to create our variables. Remember that we measured each risk and trust dimension using five question-items. Since the question-items are highly subjective, one should expect that the respondents’ answers contain some “random component”. A common approach to eliminate some of this “randomness” is to average the values of the question-items across each dimension. In practice, one would have to perform reliability analysis and check for internal consistency before doing so (e.g., performing a confirmatory factor analysis and calculating Cronbach’s alpha), but this is beyond the scope of this assignment.
Using the mutate function from dplyr, add the following features to the data set in the Spark cluster: [10 points]

    PPR = (PPR_1 + PPR_2 + PPR_3 + PPR_4 + PPR_5)/5
    PIR = (PIR_1 + PIR_2 + PIR_3 + PIR_4 + PIR_5)/5
    PER = (PER_1 + PER_2 + PER_3 + PER_4 + PER_5)/5
    PI = (PI_1 + PI_2 + PI_3 + PI_4 + PI_5)/5
    PS = (PS_1 + PS_2 + PS_3 + PS_4 + PS_5)/5
    PB = (PB_1 + PB_2 + PB_3 + PB_4 + PB_5)/5
    WTB = (WTB_1 + WTB_2 + WTB_3 + WTB_4 + WTB_5)/5

Data Modeling:

d) Next, you will build an explanatory model that tries to relate the risk and trust dimensions to willingness-to-buy. To simplify the analysis, ignore the demographic variables in the data set. Using the ml_linear_regression function from the sparklyr package, build a linear regression model where the dependent variable is WTB and the independent variables are PPR, PIR, PER, PI, PS, and PB. Apply the summary function to your model to retrieve coefficients and associated p-values. [10 points]

Conclusion:

e) Given the coefficients and p-values from above, which actions would you suggest bol.com take to increase consumers’ willingness-to-buy? List and carefully explain at least three features that bol.com could add to its website to alleviate some significant risk and trust perception issues, e.g., money-back guarantees to reduce perceived economic risk, online reviews to decrease perceived product risk, etc. (sloppy answers will receive zero points) [20 points]
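
For orientation, the general sparklyr workflow that Question 2 relies on (connect, copy a local data frame to Spark, mutate with dplyr, fit a linear regression) can be sketched on R's built-in mtcars data. This is only a minimal illustration under assumptions, not a solution to the assignment: the local master, the table name cars_tbl, and the derived columns mpg_rev and power are hypothetical stand-ins, and a local Spark installation is assumed to exist already.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally; a real cluster would use a different master.
sc <- spark_connect(master = "local")

# mtcars is a stand-in data set; "cars_tbl" is a hypothetical table name.
cars_tbl <- copy_to(sc, mtcars, name = "cars_tbl", overwrite = TRUE)

# mutate() on a Spark table is translated to Spark SQL and evaluated lazily in Spark.
cars_tbl <- cars_tbl %>%
  mutate(
    mpg_rev = 40 - mpg,       # reverse a scale by subtracting it from a constant
    power   = (hp + disp) / 2 # average two columns into a single feature
  )

# Fit a linear regression inside Spark and inspect the fitted model.
fit <- ml_linear_regression(cars_tbl, mpg ~ power + wt)
summary(fit)

spark_disconnect(sc)
```

Reassigning the result of mutate() to the same name keeps later steps working on the transformed Spark table rather than on a local copy.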