My name is Jon Skeet

Correctly parse dates with timezones in Python

I am working from Europe and I have a series of datetime values that look like this:

 datetime(2016,10,30,0,0,0)
 datetime(2016,10,30,1,0,0)
 datetime(2016,10,30,2,0,0)
 datetime(2016,10,30,2,0,0)
 datetime(2016,10,30,3,0,0)
 datetime(2016,10,30,4,0,0)
 datetime(2016,10,30,5,0,0)

and so on for the entire day. I would like to convert them to datetimes that carry their UTC offset, so that in the end they look something like this:

2016-10-30 00:00:00 + 02:00
2016-10-30 01:00:00 + 02:00
2016-10-30 02:00:00 + 02:00
2016-10-30 02:00:00 + 01:00
2016-10-30 03:00:00 + 01:00
2016-10-30 04:00:00 + 01:00

I used the code below to convert the time zones, but I get something that looks like this instead:

2016-10-30 00:00:00 + 02:00
2016-10-30 01:00:00 + 02:00
2016-10-30 02:00:00 + 02:00
2016-10-30 02:00:00 + 02:00
2016-10-30 03:00:00 + 01:00   
2016-10-30 04:00:00 + 01:00

The dates actually come from an Excel file, but at the moment I am using the code below to check whether the conversion is correct.

import datetime

import pytz

european = pytz.timezone('Europe/Berlin')
startdate = datetime.datetime(2016, 10, 30, 0, 0, 0)
hours = []

# Build the naive local times 00:00, 01:00 and 02:00 ...
for i in range(3):
    hours.append(startdate + datetime.timedelta(hours=i))
# ... repeat 02:00, which occurs twice on this day ...
hours.append(hours[2])
# ... then continue with 03:00 through 23:00.
for i in range(3, 24):
    hours.append(startdate + datetime.timedelta(hours=i))

for i in range(len(hours)):
    # is_dst=True resolves every ambiguous local time to its DST (+02:00) reading
    hours[i] = european.localize(hours[i], is_dst=True)
    hours[i] = hours[i].astimezone(pytz.utc)
    hours[i] = european.normalize(hours[i].astimezone(european))
    print(hours[i])

Update: Edited to make the question clearer, hopefully
Update: Edited the time values in the code

This local value:

datetime(2016,10,30,2,0,0)

is ambiguous. 2am happens twice on October 30th 2016 in Berlin - once at 2016-10-30T00:00:00Z (with a UTC offset of +2), then once again at 2016-10-30T01:00:00Z (with a UTC offset of +1).
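As a small illustration of that ambiguity, here is a sketch that localizes the same naive value twice with pytz, once with each is_dst preference. The variable names are mine, not from the question.

```python
import datetime

import pytz

berlin = pytz.timezone('Europe/Berlin')
naive = datetime.datetime(2016, 10, 30, 2, 0, 0)

# First occurrence of 02:00: still on summer time (UTC+2).
first = berlin.localize(naive, is_dst=True)
# Second occurrence of 02:00: after the clocks have gone back (UTC+1).
second = berlin.localize(naive, is_dst=False)

print(first, '->', first.astimezone(pytz.utc))
print(second, '->', second.astimezone(pytz.utc))
```

The two results share the same wall-clock time but are one hour apart as instants.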

Now with your updated expectations, you appear to want the same input value to give a different output value the second time you convert it. Look at your input:

datetime(2016,10,30,0,0,0)
datetime(2016,10,30,1,0,0)
datetime(2016,10,30,2,0,0)
datetime(2016,10,30,2,0,0)

Lines 3 and 4 are the same. But you expect output of:

2016-10-30 00:00:00 + 02:00
2016-10-30 01:00:00 + 02:00
2016-10-30 02:00:00 + 02:00
2016-10-30 02:00:00 + 01:00

Now lines 3 and 4 are different.

The only way I can see for that to work is to detect that you've just tried the same value, and this time pass is_dst = False. Fundamentally, you shouldn't expect to run the same code on the same input and have it psychically work out whether you really "meant" DST this time or not.
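If you did want to go down that route, a minimal sketch might look like the following. The helper name localize_with_repeats is hypothetical, and it assumes the input is in chronological order and that an immediately repeated value always means "the second occurrence". That assumption is mine, and it may well not hold for your real data.

```python
import datetime

import pytz


def localize_with_repeats(naive_times, tz):
    """Hypothetical helper: localize naive times given in chronological order.

    A value identical to the one just before it is assumed to be the
    second (non-DST) occurrence of an ambiguous local time.
    """
    result = []
    previous = None
    for naive in naive_times:
        # Prefer DST unless we have just seen this exact value.
        is_dst = naive != previous
        result.append(tz.localize(naive, is_dst=is_dst))
        previous = naive
    return result


berlin = pytz.timezone('Europe/Berlin')
inputs = [datetime.datetime(2016, 10, 30, h) for h in (0, 1, 2, 2, 3)]
localized = localize_with_repeats(inputs, berlin)
for value in localized:
    print(value)
```

For unambiguous local times, pytz ignores the is_dst preference, so the flag only changes the result for the repeated 02:00.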

If you're genuinely running through all the hours of a day, then just work out the start time of the day in UTC and add an hour to that each time. Fundamentally, converting a local time to UTC is problematic precisely because of the potential for ambiguous local times (around the "fall back") and skipped local times (around the "spring forward").
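A rough sketch of that approach, assuming the day in question is 2016-10-30 in Europe/Berlin (which has 25 hours because of the fall back): localize midnight once, convert it to UTC, then step in UTC and convert each instant back to local time for display.

```python
import datetime

import pytz

berlin = pytz.timezone('Europe/Berlin')

# Midnight on this date is unambiguous in Berlin, so is_dst=None
# (which raises on ambiguous or skipped times) is safe here.
start_local = berlin.localize(datetime.datetime(2016, 10, 30), is_dst=None)
start_utc = start_local.astimezone(pytz.utc)

# Step through the day in UTC; 25 steps because this local day has 25 hours.
hours = [
    (start_utc + datetime.timedelta(hours=i)).astimezone(berlin)
    for i in range(25)
]
for hour in hours:
    print(hour)
```

Because the arithmetic happens in UTC, each instant is unambiguous, and 02:00 appears twice in the output with two different offsets.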

Your question has changed several times, so it's not really clear what your input data is (something to do with Excel, but quite possibly not the values you've shown us). Either way, you need to work out what you want to do with a given local time. The is_dst flag allows you to express a preference for or against DST, but you can't expect pytz to know more about your context than you do.
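If you'd rather find out about problematic values than silently guess, one option is to pass is_dst=None, which makes localize raise for ambiguous and skipped local times. The describe function below is a hypothetical helper of mine, purely to show the shape of it.

```python
import datetime

import pytz

berlin = pytz.timezone('Europe/Berlin')


def describe(naive):
    """Hypothetical helper: report how a naive local time maps to Berlin time."""
    try:
        # is_dst=None expresses no preference, so pytz raises instead of guessing.
        return str(berlin.localize(naive, is_dst=None))
    except pytz.AmbiguousTimeError:
        return 'ambiguous (occurs twice)'
    except pytz.NonExistentTimeError:
        return 'skipped (never occurs)'


print(describe(datetime.datetime(2016, 10, 30, 12, 0)))  # ordinary time
print(describe(datetime.datetime(2016, 10, 30, 2, 0)))   # fall back
print(describe(datetime.datetime(2016, 3, 27, 2, 30)))   # spring forward
```

What you then do with an ambiguous or skipped value is a decision for your application, based on what you know about where the data came from.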

See more on this question at Stack Overflow