Week3_Clean/Filter Data and Make Plots

After retrieving web data and storing it in MongoDB (via pymongo), the next steps are to clean the data into a consistent format, filter it with an aggregation "pipeline", and plot the results with the "chart" module. All of the code was run in Jupyter Notebook.

  1. Create a new collection, transfer the retrieved data (.json format) into it, and make a backup copy of that collection using either the mongo shell or cmd:
hw3_1.png
hw3_2.png
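The copy can also be made from the notebook instead of the shell. The following is a minimal sketch, assuming the aggregation `$out` stage is acceptable for the backup; the helper name `backup_pipeline` and the collection names in the comments are hypothetical and do not come from the screenshots.

```python
def backup_pipeline(target_name):
    """Build a pipeline that writes every document into `target_name`.

    `$out` must be the last stage of a pipeline, and it replaces the
    target collection if that collection already exists.
    """
    return [{"$out": target_name}]


pipeline = backup_pipeline("item_info_backup")  # hypothetical collection name

# With a live pymongo collection (assumed to be called item_info) this would be:
#     item_info.aggregate(pipeline)
```

Because `$out` overwrites the target, it is safer to choose a backup name that is not already in use.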
  1. Below is the link to the code that shows the top 3 most-posted categories in one selected zone:
    https://anaconda.org/tangli666/week3_hw_v2/notebook
hw3_3.png
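For readers who cannot open the notebook, a pipeline for this kind of query might look like the sketch below. The field names `area` and `cates` and the helper `top_categories_pipeline` are assumptions made for illustration; the actual field names in the linked notebook may differ.

```python
def top_categories_pipeline(zone, limit=3):
    """Build a pipeline that counts posts per category in one zone."""
    return [
        # keep only the documents posted in the selected zone
        {"$match": {"area": zone}},
        # count how many documents fall into each category
        {"$group": {"_id": "$cates", "counts": {"$sum": 1}}},
        # largest counts first
        {"$sort": {"counts": -1}},
        # keep only the top `limit` categories
        {"$limit": limit},
    ]


pipeline = top_categories_pipeline("some_zone")  # placeholder zone value

# With a live pymongo collection (assumed to be called item_info):
#     for row in item_info.aggregate(pipeline):
#         print(row["_id"], row["counts"])
```

Each result row then carries the category in `_id` and its post count in `counts`, which is the shape a bar or pie chart series needs.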
  1. Below is the link to the code that shows the relationship between item condition and average price:
    https://anaconda.org/tangli666/week3_hw_v10/notebook
    Note: to filter and format the 'price' field, its values were converted to integers and written back to the new collection:
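    """
    # item_info is the pymongo collection that holds the retrieved items
    for i in item_info.find():
        try:
            # keep the part before the first space and convert it to an integer
            price = int(i['price'].split(' ')[0])
        except ValueError:
            # prices that do not start with a number fall back to 0
            price = 0
        # update_one replaces Collection.update(), which was removed in pymongo 4
        item_info.update_one({'_id': i['_id']}, {'$set': {'price': price}})
    """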
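One caveat with the loop above: once the prices have been stored as integers, running it a second time raises `AttributeError`, because an `int` has no `split` method. A small helper makes the conversion safe to repeat. This is only a sketch; the name `parse_price` is hypothetical and the sample strings are made up for illustration.

```python
def parse_price(raw):
    """Return the leading integer of a price value, or 0 if there is none."""
    if isinstance(raw, int):
        # already cleaned on an earlier run
        return raw
    try:
        # same rule as the loop: take the text before the first space
        return int(str(raw).split(' ')[0])
    except ValueError:
        return 0


examples = [parse_price('1200 yuan'), parse_price('free'), parse_price(300)]
```

Inside the loop, `price = parse_price(i.get('price'))` would then replace the `try`/`except` block.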
hw3_4.png
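The grouping behind a condition-versus-price chart can be sketched as follows. The field name `condition` and the helper name are assumptions, and excluding prices of 0 is a choice made here because the cleaning step stores 0 for prices it could not parse; the linked notebook may handle this differently.

```python
def avg_price_by_condition_pipeline():
    """Build a pipeline that averages the cleaned price per item condition."""
    return [
        # skip items whose price could not be parsed (stored as 0)
        {"$match": {"price": {"$gt": 0}}},
        # average the integer price within each condition
        {"$group": {"_id": "$condition", "avg_price": {"$avg": "$price"}}},
        # highest average first
        {"$sort": {"avg_price": -1}},
    ]


pipeline = avg_price_by_condition_pipeline()

# With a live pymongo collection (assumed to be called item_info):
#     results = list(item_info.aggregate(pipeline))
```

Leaving the zero prices in would pull every average down, so filtering them out first gives a fairer comparison between conditions.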
  1. Finally, the command line used to export the collection to a CSV file:
hw3_5.png
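As an alternative to the command-line export, the same result can be produced from the notebook with Python's standard `csv` module. This is a minimal sketch; the function name `write_docs_to_csv`, the field names `title` and `price`, and the sample documents are hypothetical.

```python
import csv
import io


def write_docs_to_csv(docs, fields, fileobj):
    """Write the chosen fields of each document as one CSV row."""
    writer = csv.DictWriter(
        fileobj,
        fieldnames=fields,
        extrasaction="ignore",  # drop keys such as _id that are not listed
        restval="",             # leave missing fields empty
    )
    writer.writeheader()
    count = 0
    for doc in docs:
        writer.writerow(doc)
        count += 1
    return count


# Made-up documents standing in for the result of item_info.find()
sample = [{"_id": 1, "title": "bike", "price": 100}, {"_id": 2, "title": "desk"}]
buffer = io.StringIO()
written = write_docs_to_csv(sample, ["title", "price"], buffer)
```

To write a real file, pass the cursor returned by `find()` together with a file opened as `open('items.csv', 'w', newline='')`.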
