I know this question has been asked before, but each case is different... My problem is this:
import datetime
import pandas as pd
df = pd.read_csv('file.csv')
# convert the string into a datetime object
time = pd.to_datetime(df.dttm_utc)
Month = time.dt.month
Day = time.dt.day
Hour = time.dt.hour
InDayLightSavings = True
if (Month < 3): InDayLightSavings = False
if (Month == 3) and (Day < 11) and (Hour < 2): InDayLightSavings = False
if (Month > 11): InDayLightSavings = False
if (Month == 11) and (Day > 4) and (Hour >= 2): InDayLightSavings = False
if (InDayLightSavings):
    time = time - datetime.timedelta(hours=1)
And it returns, as you guessed correctly: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). I used this approach on a single timestamp, converting it to ISO 8601 first, and it works, but apparently it doesn't work for a Series. I tried adding .any() and that doesn't work either. I also changed and to & as suggested in another thread.
A part of my file.csv looks like this, running until the end of 2012:
timestamp   dttm_utc            value
1325376300  2012-01-01 0:05:00  16.9444
1325376600  2012-01-01 0:10:00  16.6837
1325376900  2012-01-01 0:15:00  16.6837
1325377200  2012-01-01 0:20:00  16.9444
1325377500  2012-01-01 0:25:00  16.1623
1325377800  2012-01-01 0:30:00  16.6837
Desired output:
Included is an example of data at 15-minute intervals:
3/13/2016 1:00  51
3/13/2016 1:15  48
3/13/2016 1:30  50.4
3/13/2016 1:45  51
3/13/2016 3:00  47.4
3/13/2016 3:15  49.8
3/13/2016 3:30  51
3/13/2016 3:45  51
3/13/2016 4:00  48.6
Any help is appreciated. Thank you!
Solution
The exception you are seeing is due to the fact that you put a series with many different entries into if statements, each of which expects a single True or False.
Briefly, let's have a look at what you do:
Error analysis (why not to do it like that):
First, you took a pandas dataframe column and converted it to datetime, which of course also returns a column (series).
time = pd.to_datetime(df.dttm_utc) # Convert content of dttm_utc COLUMN to datetime
# This returns a dataframe COLUMN / series
Month = time.dt.month # Extract the month of every entry in your COLUMN/series
Day = time.dt.day # Extract the day of every entry in your COLUMN/series
Hour = time.dt.hour # Extract the hour of every entry (note: lowercase .hour)
Your mistake: You then try to assess specific conditions along the series:
if (Month == whatever_condition):
    do_something()
However, you can't use a comparison on a whole series as an if condition, at least not like that. The comparison is carried out entry by entry, so for some items in the series the condition may be fulfilled and for others not, and Python doesn't know which entry you mean. Hence the ValueError: The truth value of a Series is ambiguous.
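To make that concrete, here is a small sketch with a made-up three-element series (the values are for illustration only, not taken from your file). The comparison itself is fine and gives one True/False per row; it is only the if that cannot reduce them to a single answer.

```python
import pandas as pd

month = pd.Series([1, 3, 7])  # made-up months, for illustration only

mask = month < 3              # element-wise comparison: a boolean Series
print(mask.tolist())          # one result per row

try:
    if mask:                  # `if` needs ONE bool, but gets three
        pass
except ValueError as err:
    message = str(err)
    print(message)

# .any() / .all() do collapse the Series to a single bool, but then every
# row is treated alike, which is not what a per-row adjustment needs
print(mask.any(), mask.all())
```

That last point is probably why adding .any() did not help in your case: it silences the error but throws away the per-row information.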
What you want to do instead:
Evaluate item by item, ideally in a vectorized way. My suggestion: stay in the pandas dataframe all the time:
df['Datetime'] = pd.to_datetime(df['dttm_utc']) # Add second column with datetime format
df['Month'] = df.Datetime.dt.month # Extract month and add to new column
# Same for day and hour
df['InDayLightSavings'] = True # Start with True everywhere, then switch rows off
df.loc[(df.Month < 3), 'InDayLightSavings'] = False
# You can add multiple conditions here, combined with & and |, each in its own parentheses
# Finally, your filter (shifting the Datetime column created above):
df.loc[(df.InDayLightSavings == True), 'Datetime'] = df['Datetime'] - dt.timedelta(hours=1)
# dt when import datetime as dt, else just datetime
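Putting the pieces together, here is a minimal sketch of the whole thing on a tiny hand-made frame (the three rows are made up for illustration). I assumed the switch dates from your question (March 11 and November 4, at 02:00) and grouped the day/hour conditions the way I think you meant them, so please check the boundaries against your own rules.

```python
import datetime as dt
import pandas as pd

# Made-up sample rows, only for illustration
df = pd.DataFrame({"dttm_utc": ["2012-01-01 00:05:00",
                                "2012-07-01 12:00:00",
                                "2012-12-01 00:05:00"]})

df["Datetime"] = pd.to_datetime(df["dttm_utc"])
month = df["Datetime"].dt.month
day = df["Datetime"].dt.day
hour = df["Datetime"].dt.hour

# Assumed window: from March 11, 02:00 up to November 4, 02:00
before_start = (month < 3) | ((month == 3) & ((day < 11) | ((day == 11) & (hour < 2))))
after_end = (month > 11) | ((month == 11) & ((day > 4) | ((day == 4) & (hour >= 2))))

# One True/False per row instead of a single flag
df["InDayLightSavings"] = ~(before_start | after_end)

# Shift only the flagged rows, following the subtraction in the question
in_dst = df["InDayLightSavings"]
df.loc[in_dst, "Datetime"] = df.loc[in_dst, "Datetime"] - dt.timedelta(hours=1)
print(df)
```

Each condition sits in its own parentheses because & and | bind more tightly than the comparison operators; without them you run into the same ambiguity error again.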
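As a side note, and not part of the approach above: if the goal is simply to turn the UTC column into local wall-clock time, pandas can apply the daylight saving rules itself with tz_localize and tz_convert. A minimal sketch, assuming the data belongs to a US Eastern location (the zone name "America/New_York" is my guess, the question doesn't say where the readings come from) and using two made-up timestamps around the 2012 spring switch:

```python
import pandas as pd

# Two made-up UTC timestamps on either side of the assumed switch
utc = pd.Series(pd.to_datetime(["2012-03-11 06:55:00", "2012-03-11 07:00:00"]))

# Mark the naive values as UTC, then convert to the assumed local zone
local = utc.dt.tz_localize("UTC").dt.tz_convert("America/New_York")
print(local)
```

The local hour jumps from 1 to 3 between the two rows, which is the same kind of gap as in the desired output in the question.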