Discovering the motivation of behavior from the electricity consumption signal

Key Points

This chapter briefly explains how to infer the motivation behind people's operation of a system from the system's electricity consumption signal and other data.

Objective: understand how and why occupants interact with the system.
System: ventilation system in passive houses, with an adjustable flow rate.
Raw data: electricity consumption signal; environment sensor records (temperature, humidity, CO2, etc.); 3-min interval over 2 years (2013-2015): 325946 rows × 25 features.
Technique: noise reduction (Gaussian filter); edge detection (1st derivative Gaussian filter); feature selection (L1-penalized logistic regression, recursive feature elimination).

Figure 1 below shows the overall pipeline I designed.

Figure 1 Flowchart of the overall pipeline

(1) After essential preprocessing and cleaning (NaNs are backfilled), start with a system electricity consumption signal like the one in Figure 2 below. A sudden change in the signal could imply an occupant's interaction with the system (e.g. once the occupant turns the flow rate to a higher setting, there should be a steep rising edge in the electricity consumption signal). The first thing to do is to filter out the noise (caused by wind etc., or by the system itself) and "fake operations" (status changes that last too short a time).

Figure 2 Demo of Electricity Consumption Signal
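As an illustration of the noise-reduction step, here is a minimal sketch of Gaussian smoothing in plain Python. The pulse values, the function names and sigma=3.0 are made up for the example and are not taken from the project; in practice a library routine such as scipy.ndimage.gaussian_filter1d would likely be used instead.

```python
import math

def gaussian_kernel(sigma, truncate=4.0):
    """Return a normalized 1-D Gaussian kernel with half-width truncate * sigma."""
    radius = int(truncate * sigma + 0.5)
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(signal, sigma):
    """Convolve the signal with a Gaussian kernel, repeating the edge values."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # clamp the index at the borders
            acc += w * signal[j]
        out.append(acc)
    return out

# Hypothetical pulse readings: low setting, a one-sample spike, then a lasting step up.
pulses = [2.0] * 20 + [6.0] + [2.0] * 20 + [5.0] * 20
smoothed = smooth(pulses, sigma=3.0)
```

A larger sigma suppresses more noise but also blurs real setting changes, so it has to be tuned against the typical duration of a genuine operation.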

(2) With a finely tuned 1st derivative Gaussian filter, the noise and "fake operations" can be filtered out and the valid operations marked, as shown in Figure 3 below.

Figure 3 (three panels): noise-reduced signal; 1st derivative filtered signal; operations marked
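The marking step can be sketched as follows: convolve the signal with the first derivative of a Gaussian, then keep only the local extrema of the response whose magnitude exceeds a threshold. This is a plain-Python illustration under assumed values (the signal, sigma=3.0 and threshold=0.3 are hypothetical), not the tuned filter used in the project.

```python
import math

def dog_kernel(sigma, truncate=4.0):
    """First derivative of a Gaussian, sampled at integer offsets."""
    radius = int(truncate * sigma + 0.5)
    norm = sigma ** 3 * math.sqrt(2 * math.pi)
    return [-k / norm * math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]

def edge_response(signal, sigma):
    """Convolve the signal with the derivative-of-Gaussian kernel (edge values repeated)."""
    kernel = dog_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    resp = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i - (k - radius), 0), n - 1)  # true convolution: signal[i - offset]
            acc += w * signal[j]
        resp.append(acc)
    return resp

def mark_operations(signal, sigma, threshold):
    """Return +1 at rising edges, -1 at falling edges, 0 elsewhere."""
    resp = edge_response(signal, sigma)
    ops = [0] * len(signal)
    for i in range(1, len(resp) - 1):
        mag = abs(resp[i])
        # keep only local extrema of the response that are strong enough
        if mag >= threshold and mag >= abs(resp[i - 1]) and mag > abs(resp[i + 1]):
            ops[i] = 1 if resp[i] > 0 else -1
    return ops

# Hypothetical signal: a 2-sample "fake operation", then a lasting step up and a step down.
pulses = [2.0] * 30 + [6.0] * 2 + [2.0] * 30 + [5.0] * 40 + [2.0] * 30
ops = mark_operations(pulses, sigma=3.0, threshold=0.3)
```

With these values the short spike produces a response below the threshold and is ignored, while the lasting step up and step down are marked as +1 and -1 respectively.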

(3) The marked data set then undergoes undersampling, since it is now skewed: records marked 'no operation' far outnumber those marked with an operation (either increase or decrease). Undersampling balances the classes so that the following classification algorithm works effectively.
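A minimal sketch of this undersampling idea in plain Python is given below. It keeps every operation record and draws a random subset of the 'no operation' records, sized as half the total number of operations (the same sizing rule as the pandas code later in this post). The label list and the function name are hypothetical.

```python
import math
import random

def undersample(labels, seed=0):
    """Return sorted row indices: all operations plus a random subset of no-operation rows."""
    up = [i for i, op in enumerate(labels) if op == 1]
    down = [i for i, op in enumerate(labels) if op == -1]
    noop = [i for i, op in enumerate(labels) if op == 0]
    # about as many 'no operation' rows as each operation class has
    sample_size = min(len(noop), math.ceil((len(up) + len(down)) / 2))
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    kept_noop = rng.sample(noop, sample_size)
    return sorted(up + down + kept_noop)

# Hypothetical marks: 20 'no operation' rows, 3 increases, 2 decreases.
ops = [0] * 20 + [1] * 3 + [-1] * 2
kept = undersample(ops)
```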

(4) After undersampling, the training set is normalized and fed into an L1-penalized logistic regression classifier. Since a linear model penalized with the L1 norm has sparse solutions, i.e. many of its estimated coefficients are zero, it can be used for feature selection. Figure 4 below shows an example of the coefficients from one experiment.

Figure 4 Coefficient output of logistic regression model
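To show why the L1 penalty yields exact zeros, here is a small from-scratch sketch: features are standardized, then the penalized logistic loss is minimized by proximal gradient descent, whose soft-thresholding step sets weakly informative weights to exactly 0. This is a swapped-in illustration, not the scikit-learn solver the post relies on; the records, the feature meanings and lam=0.2 are hypothetical.

```python
import math

def standardize(rows):
    """Scale every column to zero mean and unit standard deviation."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]

def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def fit_l1_logistic(X, y, lam, lr=0.5, n_iter=500):
    """Proximal gradient descent for logistic loss + lam * ||w||_1 (no intercept)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iter):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(wj * xj for wj, xj in zip(w, xi))))
            for j in range(d):
                grad[j] += (p - yi) * xi[j] / n
        w = [soft_threshold(w[j] - lr * grad[j], lr * lam) for j in range(d)]
    return w

# Hypothetical records: [CO2 in ppm, relative humidity in %]; label 1 = flow rate raised.
raw = [[1200, 50], [1200, 50], [1200, 50], [1200, 40],
       [600, 50], [600, 50], [600, 40], [600, 40]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
weights = fit_l1_logistic(standardize(raw), labels, lam=0.2)
```

In this toy data humidity is only weakly related to the label, so its weight stays at exactly 0 while the CO2 weight remains positive; a smaller lam would let more features keep non-zero weights.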

The logistic regression is then run repeatedly to perform recursive feature elimination: first, the estimator is trained on the initial set of features and a weight is assigned to each of them; then the features whose absolute weights are the smallest are pruned from the current set. Finally, the most informative feature combination (judged by cross-validation accuracy) can be determined, as Figure 5 below shows. These features suggest the occupant's motivation for his/her behavior.

Figure 5 Best feature combination after recursive feature elimination
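The elimination loop described above can be sketched in plain Python as follows. The estimator here is an unpenalized logistic regression trained by gradient descent, used only to keep the example short (the post uses the L1-penalized model); the feature names, the pre-scaled data and the 4-fold split are all hypothetical. With scikit-learn, sklearn.feature_selection.RFECV provides this kind of procedure.

```python
import math

def fit_logistic(X, y, lr=0.5, n_iter=200):
    """Plain gradient descent on the logistic loss (no intercept, no penalty)."""
    d = len(X[0])
    w = [0.0] * d
    for _ in range(n_iter):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(wj * xj for wj, xj in zip(w, xi))))
            for j in range(d):
                grad[j] += (p - yi) * xi[j] / len(X)
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def accuracy(w, X, y):
    """Share of rows whose predicted class (score > 0 means class 1) matches the label."""
    hits = sum((sum(wj * xj for wj, xj in zip(w, xi)) > 0) == (yi == 1)
               for xi, yi in zip(X, y))
    return hits / len(X)

def cv_accuracy(X, y, k=4):
    """Mean held-out accuracy over k interleaved folds."""
    scores = []
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        test = [i for i in range(len(X)) if i % k == fold]
        w = fit_logistic([X[i] for i in train], [y[i] for i in train])
        scores.append(accuracy(w, [X[i] for i in test], [y[i] for i in test]))
    return sum(scores) / k

def rfe_with_cv(X, y, names):
    """Drop the smallest-|weight| feature each round; keep the best-scoring subset."""
    active = list(range(len(names)))
    best_score, best_subset = -1.0, active[:]
    while active:
        Xa = [[row[j] for j in active] for row in X]
        score = cv_accuracy(Xa, y)
        if score >= best_score:  # on ties, prefer the smaller subset
            best_score, best_subset = score, active[:]
        if len(active) == 1:
            break
        w = fit_logistic(Xa, y)
        weakest = min(range(len(active)), key=lambda j: abs(w[j]))
        active.pop(weakest)
    return [names[j] for j in best_subset], best_score

# Hypothetical, pre-scaled features; the label follows the first column only.
names = ["co2", "humidity", "hour"]
X = [[1.0, 0.3, -0.2], [-1.0, 0.1, 0.4], [1.0, -0.4, 0.1], [-1.0, -0.2, -0.3],
     [1.0, 0.2, 0.3], [-1.0, 0.4, -0.1], [1.0, -0.1, -0.4], [-1.0, -0.3, 0.2]]
y = [1, 0, 1, 0, 1, 0, 1, 0]
selected, score = rfe_with_cv(X, y, names)
```

Preferring the smaller subset on ties is a design choice of this sketch: when several feature sets reach the same cross-validation accuracy, the shortest one is the easiest to interpret as a motivation.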

(5) Repeat the process above for different occupants. The results suggest that there are different kinds of occupants, since their "best feature combinations" vary a lot: e.g. some show a strong "time pattern", while others are more sensitive to the indoor environment, such as temperature. K-Means clustering can help demonstrate this by grouping the occupants into different user profiles.

Grouping: different user profiles found
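As a rough illustration of the grouping step, the sketch below runs K-Means (Lloyd's algorithm) on one feature-importance vector per occupant. The importance values are invented for the example, and the deterministic farthest-point initialization is an assumption made so the result is reproducible; a library implementation such as sklearn.cluster.KMeans would normally be used.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, n_iter=20):
    """Lloyd's algorithm with a deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        # next center: the point farthest from all centers chosen so far
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = []
    for _ in range(n_iter):
        # assignment step: nearest center for every point
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # update step: move each center to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical per-occupant importance of [hour of day, indoor temperature].
profiles = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.7], [0.85, 0.05]]
labels, centers = kmeans(profiles, k=2)
```

Here the first, second and fifth occupants end up in one cluster (time-driven) and the third and fourth in the other (temperature-driven).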

What follows is a technical log of the relevant theory and code used to realize the whole process.

Technical Details: Noise reduction & Edge detection

Context

There is a ventilation system (with heat recovery) in a passive house, whose ventilation flow rate is controlled by a fan system and is adjustable by the occupants. There are 3 available settings (say low, medium and high) for the fan flow rate.

The electricity consumption of the fan system is recorded by a smart meter as pulses. Naturally, the occupants' flow rate setting has a significant influence on the electricity consumption, so we can determine when and how people adjust their ventilation system from the electricity consumption.

However, on the one hand, under the influence of back pressure, wind speed, etc., the record is not a clean 3-stage square wave; it is quite noisy. On the other hand, our research covers many different houses (with similar structure but different scales of records). Together these make it impractical to calibrate the ventilation setting position by fixed intervals (like pulse < 3 == position 1; 3 < pulse < 5 == position 2, etc.). We need a new algorithmic method for this job.

This is a small excerpt of the electricity consumption record (day 185 of 2014, house #9):

Electricity consumption record, house #9, day 185 of 2014

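    import math
    import numpy as np
    import pandas as pd

    # ventpos: records with the marked operation in column 'op' (1, -1 or 0)
    # Keep as many 'no operation' rows as half the total number of operations
    sample_size = math.ceil((sum(ventpos.op == 1) + sum(ventpos.op == -1)) / 2)

    # Draw that many 'no operation' rows at random, without replacement
    noop_indices = ventpos[ventpos.op == 0].index
    random_indices = np.random.choice(noop_indices, sample_size, replace=False)
    noop_sample = ventpos.loc[random_indices]

    # All rows marked as an increase (1) or a decrease (-1)
    up_sample = ventpos[ventpos.op == 1]
    down_sample = ventpos[ventpos.op == -1]
    op_sample = pd.concat([up_sample, down_sample])
    op_sample.head()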

    undersampled_up = pd.concat([up_sample, noop_sample])
    undersampled_up.head()

    # Generate month/hour attributes from the datetime string
    undersampled_up['dt'] = pd.to_datetime(undersampled_up['dt'])
    t = pd.DatetimeIndex(undersampled_up['dt'])
    undersampled_up['HourOfDay'] = t.hour
    undersampled_up['Month'] = t.month
    undersampled_up['Year'] = t.year
    undersampled_up.head()

    for col in undersampled_up:
        print(col)

    def remap(x):
        # Window flag: 't' -> 0, anything else -> 1
        return 0 if x == 't' else 1

    window_cols = ['wc_lr', 'wc_kitchen', 'wc_br3', 'wc_br2', 'wc_attic']
    for col in window_cols:
        undersampled_up[col] = undersampled_up[col].apply(remap)

    # Sum the five remapped flags into a single 'openwin' feature
    undersampled_up['openwin'] = undersampled_up[window_cols].sum(axis=1)

    # Drop the columns that are no longer needed
    undersampled_up = undersampled_up.drop(
        window_cols + ['Year', 'dt', 'pulse_channel_ventilation_unit'], axis=1)
    undersampled_up.head()

    for col in undersampled_up:
        print(col)

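    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    # X_scaled (normalized features) and y (operation labels) come from the
    # normalization step, which is not shown here.
    # The liblinear solver supports the L1 penalty.
    lg = LogisticRegression(penalty='l1', C=0.1, solver='liblinear')
    scores = cross_val_score(lg, X_scaled, y, cv=10)

    # Mean score and roughly a 95% interval (+/- 2 standard deviations)
    print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))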

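    import numpy as np
    import matplotlib.pyplot as plt

    # clf is the fitted classifier; fitting lg here is an assumption,
    # since the original fitting step is not shown.
    clf = lg.fit(X_scaled, y)

    plt.figure(figsize=(12, 9))
    y_pos = np.arange(len(X_scaled.columns))
    plt.barh(y_pos, abs(clf.coef_[0]))
    plt.yticks(y_pos, X_scaled.columns)  # bars are centered on y_pos in matplotlib >= 2.0
    plt.title('Feature Importance from Logistic Regression')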

