Running Data Analytics

How to install and run the analytics backend locally:

Several people have had trouble getting the analytics backend running locally, so I wrote up this quick installation guide. If you run into a problem that is not documented here, please add it. These instructions were tested on a clean install of Ubuntu 14.04.

Clone the analytics repositories

1. Navigate to the folder you want to install into and git clone these repositories:

  • edx/edx-analytics-data-api
  • edx/edx-analytics-pipeline
  • edx/edx-analytics-data-api-client
  • edx/edx-analytics-dashboard
cd {PATH_TO_EDX_FOLDER}/analytics




git clone https://github.com/edx/edx-analytics-pipeline.git

git clone https://github.com/edx/edx-analytics-data-api.git

git clone https://github.com/edx/edx-analytics-data-api-client.git

git clone https://github.com/edx/edx-analytics-dashboard.git
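The four clone commands above can also be written as a loop. The following is a minimal sketch, not part of the edX tooling: the function name clone_analytics_repos and the GIT override variable (used only for dry runs) are hypothetical, and the target directory stands in for {PATH_TO_EDX_FOLDER}/analytics.

```shell
# Repositories listed in this guide.
ANALYTICS_REPOS="edx-analytics-pipeline edx-analytics-data-api edx-analytics-data-api-client edx-analytics-dashboard"

# Clone every repository into the given directory, skipping any that exist.
# GIT is a hypothetical override, e.g. GIT="echo git" prints instead of cloning.
clone_analytics_repos() {
    target_dir=$1
    mkdir -p "$target_dir" || return 1
    for repo in $ANALYTICS_REPOS; do
        if [ -d "$target_dir/$repo" ]; then
            echo "skipping $repo (already present)"
        else
            ${GIT:-git} clone "https://github.com/edx/$repo.git" "$target_dir/$repo" || return 1
        fi
    done
}
```

Calling clone_analytics_repos with the analytics folder as its argument performs the clones; rerunning it leaves existing checkouts untouched.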

2. Create virtual environments in which to run the repositories

  • It is best to create a separate virtual environment for each repository; otherwise, you may run into conflicts between their dependencies.
mkdir ~/.venvs

cd ~/.venvs

virtualenv edx-analytics-pipeline

virtualenv edx-analytics-data-api

virtualenv edx-analytics-data-api-client

virtualenv edx-analytics-dashboard
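Note that a plain mkdir fails if ~/.venvs already exists. A rerunnable variant of the commands above might look like the sketch below; create_analytics_venvs and the VIRTUALENV override variable are hypothetical names introduced for illustration.

```shell
# Create one virtualenv per repository under the given directory
# (default: ~/.venvs). Existing environments are left alone.
# VIRTUALENV is a hypothetical override for the virtualenv command.
create_analytics_venvs() {
    venv_home=${1:-$HOME/.venvs}
    mkdir -p "$venv_home" || return 1
    for name in edx-analytics-pipeline edx-analytics-data-api \
                edx-analytics-data-api-client edx-analytics-dashboard; do
        if [ -d "$venv_home/$name" ]; then
            echo "virtualenv $name already exists"
        else
            ${VIRTUALENV:-virtualenv} "$venv_home/$name" || return 1
        fi
    done
}
```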

Install the dependencies

  • You will need to activate and deactivate each virtualenv in turn
  • Once you think the dependencies are installed, check them by running the repository's unit tests.
  • If the unit tests complete successfully, you will see output of the form "Ran X tests in Ys \n\n OK"
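One way to avoid forgetting a deactivate is to run each repository's commands in a subshell, so the activation disappears when the command finishes. This is only a sketch of that idea: run_in_venv and VENV_HOME are hypothetical names, and the steps below work fine without it.

```shell
# Run a command inside a named virtualenv and repository directory.
# The parentheses start a subshell, so neither the activation nor the
# directory change leaks into the calling shell.
# VENV_HOME is a hypothetical variable; it defaults to ~/.venvs.
run_in_venv() {
    venv_name=$1
    repo_dir=$2
    shift 2
    (
        . "${VENV_HOME:-$HOME/.venvs}/$venv_name/bin/activate" &&
        cd "$repo_dir" &&
        "$@"
    )
}

# Example (not executed here):
#   run_in_venv edx-analytics-pipeline edx-analytics-pipeline make test
```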

1. Installing edx-analytics-pipeline:

cd {PATH_TO_EDX_FOLDER}/analytics

cd edx-analytics-pipeline/

source ~/.venvs/edx-analytics-pipeline/bin/activate

make requirements

make test

If this raises a NoAuthHandlerFound error from boto, run:

export AWS_ACCESS_KEY_ID="TESTACCESSKEY"

export AWS_SECRET_ACCESS_KEY="TESTSECRET"

make test

To run this in production, we need to supply actual AWS credentials to boto, but the test suite does not care if they are valid.
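If real AWS credentials may already be exported in your shell, you probably do not want to overwrite them with the placeholders. The sketch below sets the dummy values only when the variables are unset or empty; use_dummy_aws_credentials is a hypothetical helper name.

```shell
# Export placeholder AWS credentials for the test suite, but only when
# no value is already present. ":=" assigns the default only if the
# variable is unset or empty.
use_dummy_aws_credentials() {
    : "${AWS_ACCESS_KEY_ID:=TESTACCESSKEY}"
    : "${AWS_SECRET_ACCESS_KEY:=TESTSECRET}"
    export AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY
}

# Example (not executed here):
#   use_dummy_aws_credentials && make test
```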

deactivate

2. Installing edx-analytics-data-api:

cd ../edx-analytics-data-api

source ~/.venvs/edx-analytics-data-api/bin/activate

make develop

./manage.py migrate --noinput

./manage.py migrate --noinput --database=analytics

./manage.py set_api_key edx edx

make validate

deactivate

3. Installing edx-analytics-data-api-client:

cd ../edx-analytics-data-api-client/

source ~/.venvs/edx-analytics-data-api-client/bin/activate

pip install -r requirements.txt 

make test

deactivate

4. Installing edx-analytics-dashboard:

cd ../edx-analytics-dashboard/

source ~/.venvs/edx-analytics-dashboard/bin/activate

sudo apt-get update

sudo apt-get install gettext

sudo apt-get install npm

sudo apt-get install openjdk-7-jre

sudo apt-get install openjdk-7-jdk

sudo apt-get install libxml2-dev libxslt-dev python-dev zlib1g-dev

make develop

make validate

If this raises an OfflineGenerationError for missing compression keys, run:

./manage.py compress --settings=analytics_dashboard.settings.test

make validate

deactivate

Run pipeline task locally and verify its completion

1. Install MySQL locally and create a credentials file for the pipeline

sudo apt-get install mysql-server

mysql -u root -p



CREATE USER 'analytics'@'localhost' IDENTIFIED BY 'edx';

GRANT ALL PRIVILEGES ON *.* TO 'analytics'@'localhost';

FLUSH PRIVILEGES;



cd {PATH_TO_EDX_FOLDER}/analytics

vi mysql_creds

***BEGIN mysql_creds FILE***

{

"host": "127.0.0.1",

"port": "3306",

"username": "analytics",

"password": "edx",

"database": "analytics"

}

***END mysql_creds FILE***
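Instead of typing the file into vi, the same contents can be written with a here-document. This is a sketch: write_mysql_creds is a hypothetical helper name, and the chmod 600 step is an added precaution that the original instructions do not require.

```shell
# Write the credentials file shown above to the given path.
# The quoted 'EOF' keeps the shell from expanding anything in the body.
write_mysql_creds() {
    creds_file=$1
    cat > "$creds_file" <<'EOF'
{
    "host": "127.0.0.1",
    "port": "3306",
    "username": "analytics",
    "password": "edx",
    "database": "analytics"
}
EOF
    # Assumption: restricting the file to its owner is desirable here.
    chmod 600 "$creds_file"
}

# Example (not executed here):
#   write_mysql_creds mysql_creds
```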

cd edx-analytics-pipeline

vi override.cfg

***BEGIN override.cfg***

[database-export]

database = analytics

credentials = {PATH_TO_EDX_FOLDER}/analytics/mysql_creds



[database-import]

database = edxprod

destination = s3://<bucket for intermediate hadoop products>/intermediate/database-import

credentials = s3://<secrets bucket>/edxapp_prod_ro_mysql_creds



[event-logs]

expand_interval = 2 days

pattern = .*tracking.log-(?P<date>[0-9]+).*

source = s3://<bucket to where all tracking logs are synched>/tracking/



[hive]

warehouse_path = s3://<bucket for intermediate hadoop products>/warehouse/hive/



[manifest]

path = s3://<bucket for intermediate hadoop products>/user-activity-file-manifests/manifest

lib_jar = s3://<secrets bucket>/oddjob-1.0.1-standalone-modified.jar

input_format = oddjob.ManifestTextInputFormat



[enrollments]

blacklist_date = 2001-01-01

blacklist_path = /tmp/blacklist



[answer-distribution]

valid_response_types = customresponse,choiceresponse,optionresponse,multiplechoiceresponse,numericalresponse,stringresponse,formularesponse

***END override.cfg***

2. Acquire a log file (or create a dummy one)

mkdir /tmp/log_files

cd /tmp/log_files

At this point, you can either obtain a real log file (from S3 or from another developer) or use the dummy file below. If you use the dummy file, include the empty line at the end. Either way, place the file in /tmp/log_files.

vi tracking.log-20150101-1234567890

*** BEGIN DUMMY LOG FILE ***

{"username": "test_user", "host": "class.stanford.edu", "event_source": "server", "event_type": "problem_check", "context": {"course_id": "edX/DemoX/DemoCourse", "course_user_tags": {}, "user_id": 555555, "org_id": "Education", "module": {"display_name": "Quiz - Reasoning"}}, "time": "2014-06-23T16:17:16.856434+00:00", "ip": "0.0.0.0", "event": {"submission": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": {"input_type": "checkboxgroup", "question": "Choose as many as you like.", "response_type": "choiceresponse", "answer": ["Reasoning is the essence of what mathematics is", "Reasoning is useful for working in most jobs", "Reasoning allows people to connect ideas and make mathematical breakthroughs"], "variant": "", "correct": false}}, "success": "incorrect", "grade": 0, "correct_map": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": {"hint": "", "hintmode": null, "correctness": "incorrect", "npoints": null, "msg": "", "queuestate": null}}, "state": {"student_answers": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": ["choice_2"]}, "seed": 1, "done": true, "correct_map": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": {"hint": "", "hintmode": null, "correctness": "incorrect", "npoints": null, "msg": "", "queuestate": null}}, "input_state": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": {}}}, "answers": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": ["choice_0", "choice_1", "choice_2"]}, "attempts": 2, "max_grade": 1, "problem_id": "i4x://edX/DemoX-S/problem/a58470ee54cc49ecb2bb7c1b1c0ab43a"}, "agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:30.0) Gecko/20100101 Firefox/30.0", "page": "x_module"}

{"username": "test_user_alt", "host": "class.stanford.edu", "event_source": "server", "event_type": "problem_check", "context": {"course_id": "edX/DemoX/DemoCourse", "course_user_tags": {}, "user_id": 555556, "org_id": "Education", "module": {"display_name": "Quiz - Reasoning"}}, "time": "2014-06-22T16:17:16.856434+00:00", "ip": "0.0.0.0", "event": {"submission": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": {"input_type": "checkboxgroup", "question": "Choose as many as you like.", "response_type": "choiceresponse", "answer": ["Reasoning is the essence of what mathematics is", "Reasoning is useful for working in most jobs", "Reasoning allows people to connect ideas and make mathematical breakthroughs"], "variant": "", "correct": false}}, "success": "incorrect", "grade": 0, "correct_map": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": {"hint": "", "hintmode": null, "correctness": "incorrect", "npoints": null, "msg": "", "queuestate": null}}, "state": {"student_answers": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": ["choice_2"]}, "seed": 1, "done": true, "correct_map": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": {"hint": "", "hintmode": null, "correctness": "incorrect", "npoints": null, "msg": "", "queuestate": null}}, "input_state": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": {}}}, "answers": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": ["choice_4", "choice_5", "choice_6"]}, "attempts": 2, "max_grade": 1, "problem_id": "i4x://edX/DemoX-S/problem/a58470ee54cc49ecb2bb7c1b1c0ab43a"}, "agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:30.0) Gecko/20100101 Firefox/30.0", "page": "x_module"}

{"username": "test_user", "host": "class.stanford.edu", "event_source": "server", "event_type": "problem_check", "context": {"course_id": "edX/DemoX/DemoCourse", "course_user_tags": {}, "user_id": 555555, "org_id": "Education", "module": {"display_name": "Quiz - Reasoning"}}, "time": "2014-06-22T16:17:16.856434+00:00", "ip": "0.0.0.0", "event": {"submission": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": {"input_type": "checkboxgroup", "question": "Choose as many as you like.", "response_type": "choiceresponse", "answer": ["Reasoning is the essence of what mathematics is", "Reasoning is useful for working in most jobs", "Reasoning allows people to connect ideas and make mathematical breakthroughs"], "variant": "", "correct": false}}, "success": "incorrect", "grade": 0, "correct_map": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": {"hint": "", "hintmode": null, "correctness": "incorrect", "npoints": null, "msg": "", "queuestate": null}}, "state": {"student_answers": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": ["choice_2"]}, "seed": 1, "done": true, "correct_map": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": {"hint": "", "hintmode": null, "correctness": "incorrect", "npoints": null, "msg": "", "queuestate": null}}, "input_state": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": {}}}, "answers": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": ["choice_4", "choice_5", "choice_6"]}, "attempts": 2, "max_grade": 1, "problem_id": "i4x://edX/DemoX-S/problem/a58470ee54cc49ecb2bb7c1b1c0ab43a"}, "agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:30.0) Gecko/20100101 Firefox/30.0", "page": "x_module"}

*** END DUMMY LOG FILE ***
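The [event-logs] pattern in override.cfg captures the digits that follow "tracking.log-" as the date group, so the log file's name matters. The sketch below approximates that match with sed to check a file name before running the pipeline; tracking_log_date is a hypothetical helper, and how the pipeline itself applies the pattern is not covered by this guide.

```shell
# Print the digits captured after "tracking.log-" in a file name,
# mirroring .*tracking.log-(?P<date>[0-9]+).* from override.cfg.
# Returns non-zero and prints nothing if the name does not match.
tracking_log_date() {
    date_part=$(printf '%s\n' "$1" | sed -n 's/.*tracking.log-\([0-9][0-9]*\).*/\1/p')
    [ -n "$date_part" ] && echo "$date_part"
}

# Example:
#   tracking_log_date /tmp/log_files/tracking.log-20150101-1234567890
#   prints 20150101
```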

3. Run the pipeline task, then run the API locally and query for the results of the pipeline's aggregation

cd {PATH_TO_EDX_FOLDER}/analytics/edx-analytics-pipeline

source ~/.venvs/edx-analytics-pipeline/bin/activate

launch-task AnswerDistributionToMySQLTaskWorkflow --local-scheduler --remote-log-level DEBUG --include '*tracking.log*' --src /tmp/log_files --dest /tmp/answer_dist --mapreduce-engine local --name test_task

mysql -u root -p



USE analytics;

SELECT COUNT(*) FROM answer_distribution;

If the pipeline task ran successfully (and you used the dummy file above), this should be the output:

+----------+

| COUNT(*) |

+----------+

|        2 |

+----------+

1 row in set (0.00 sec)

exit

deactivate
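The row count can also be checked without an interactive MySQL session. The sketch below is one way to do it, assuming the analytics account and password created earlier in this guide rather than root; answer_distribution_count and check_answer_distribution are hypothetical helper names.

```shell
# Print the number of rows in answer_distribution.
# -N suppresses the column header; -e runs one statement and exits.
answer_distribution_count() {
    mysql -u analytics -pedx -N -e 'SELECT COUNT(*) FROM answer_distribution;' analytics
}

# Compare the row count with an expected value (2 for the dummy log file).
check_answer_distribution() {
    expected=$1
    actual=$(answer_distribution_count) || return 1
    if [ "$actual" = "$expected" ]; then
        echo "OK: answer_distribution has $actual rows"
    else
        echo "MISMATCH: expected $expected rows, found $actual" >&2
        return 1
    fi
}

# Example (not executed here):
#   check_answer_distribution 2
```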

cd ../edx-analytics-data-api

source ~/.venvs/edx-analytics-data-api/bin/activate

./manage.py runserver --settings=analyticsdataserver.settings.local_mysql

Verify that the data API can connect to the database

1. Navigate to 127.0.0.1:8000 in your web browser:

  • If the page does not display and the logs show "ImproperlyConfigured: Error loading MySQLdb module", run: pip install mysql-python
  • If the page reports a 401 (unauthorized) error, rerun: ./manage.py set_api_key edx edx

2. Click on the answer_distribution query modal and enter 'i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1' into the box (or a different module_id from your logs if you didn't use the dummy log file from above)

3. Click to request the data from the API, and the results should match the log file from above (or whichever you used)
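The troubleshooting notes above can be turned into a quick command-line check. This is a sketch only: explain_api_status is a hypothetical helper, and which status the root URL returns for an unauthenticated request is an assumption that this guide does not confirm.

```shell
# Map an HTTP status code to the advice given in this guide.
# curl reports 000 when it cannot connect at all.
explain_api_status() {
    case $1 in
        200) echo "API reachable" ;;
        401) echo "API key rejected: rerun ./manage.py set_api_key edx edx" ;;
        000) echo "no response: is ./manage.py runserver still running?" ;;
        *)   echo "unexpected HTTP status $1: check the runserver logs" ;;
    esac
}

# Usage (requires the data API to be running on 127.0.0.1:8000):
#   status=$(curl -s -o /dev/null -w '%{http_code}' http://127.0.0.1:8000/)
#   explain_api_status "$status"
```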
