strftime() and strptime() Behavior

date, datetime, and time objects all support a strftime(format) method, to create a string representing the time under the control of an explicit format string.

Conversely, the datetime.strptime() class method creates a datetime object from a string representing a date and time and a corresponding format string.

The table below provides a high-level comparison of strftime() versus strptime():
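| | strftime | strptime |
---|---|---|
Usage | Convert object to a string according to a given format | Parse a string into a datetime object given a corresponding format |
Type of method | Instance method | Class method |
Method of | date; datetime; time | datetime |
Signature | strftime(format) | strptime(date_string, format) |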
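A minimal round-trip sketch of the two methods compared above. The sample timestamp and the variable names are illustrative assumptions, not part of the original text.

```python
from datetime import datetime

# Illustrative value; any datetime would do.
moment = datetime(2021, 1, 4, 10, 13, 40)

# strftime() is an instance method: object -> string.
text = moment.strftime("%Y-%m-%d %H:%M:%S")

# strptime() is a class method: string + format -> datetime.
parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")

print(text)              # 2021-01-04 10:13:40
print(parsed == moment)  # True
```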
strftime() and strptime() Format Codes

The following is a list of all the format codes that the 1989 C standard requires, and these work on all platforms with a standard C implementation.
Directive | Meaning | Example | Notes |
---|---|---|---|
%a | Weekday as locale’s abbreviated name. | Sun, Mon, …, Sat (en_US); So, Mo, …, Sa (de_DE) | (1) |
%A | Weekday as locale’s full name. | Sunday, Monday, …, Saturday (en_US); Sonntag, Montag, …, Samstag (de_DE) | (1) |
%w | Weekday as a decimal number, where 0 is Sunday and 6 is Saturday. | 0, 1, …, 6 | |
%d | Day of the month as a zero-padded decimal number. | 01, 02, …, 31 | (9) |
%b | Month as locale’s abbreviated name. | Jan, Feb, …, Dec (en_US); Jan, Feb, …, Dez (de_DE) | (1) |
%B | Month as locale’s full name. | January, February, …, December (en_US); Januar, Februar, …, Dezember (de_DE) | (1) |
%m | Month as a zero-padded decimal number. | 01, 02, …, 12 | (9) |
%y | Year without century as a zero-padded decimal number. | 00, 01, …, 99 | (9) |
%Y | Year with century as a decimal number. | 0001, 0002, …, 2013, 2014, …, 9998, 9999 | (2) |
%H | Hour (24-hour clock) as a zero-padded decimal number. | 00, 01, …, 23 | (9) |
%I | Hour (12-hour clock) as a zero-padded decimal number. | 01, 02, …, 12 | (9) |
%p | Locale’s equivalent of either AM or PM. | AM, PM (en_US); am, pm (de_DE) | (1), (3) |
%M | Minute as a zero-padded decimal number. | 00, 01, …, 59 | (9) |
%S | Second as a zero-padded decimal number. | 00, 01, …, 59 | (4), (9) |
%f | Microsecond as a decimal number, zero-padded on the left. | 000000, 000001, …, 999999 | (5) |
%z | UTC offset in the form ±HHMM[SS[.ffffff]] (empty string if the object is naive). | (empty), +0000, -0400, +1030, +063415, -030712.345216 | (6) |
%Z | Time zone name (empty string if the object is naive). | (empty), UTC, GMT | (6) |
%j | Day of the year as a zero-padded decimal number. | 001, 002, …, 366 | (9) |
%U | Week number of the year (Sunday as the first day of the week) as a zero-padded decimal number. All days in a new year preceding the first Sunday are considered to be in week 0. | 00, 01, …, 53 | (7), (9) |
%W | Week number of the year (Monday as the first day of the week) as a zero-padded decimal number. All days in a new year preceding the first Monday are considered to be in week 0. | 00, 01, …, 53 | (7), (9) |
%c | Locale’s appropriate date and time representation. | Tue Aug 16 21:30:00 1988 (en_US); Di 16 Aug 21:30:00 1988 (de_DE) | (1) |
%x | Locale’s appropriate date representation. | 08/16/88 (None); 08/16/1988 (en_US); 16.08.1988 (de_DE) | (1) |
%X | Locale’s appropriate time representation. | 21:30:00 (en_US); 21:30:00 (de_DE) | (1) |
%% | A literal '%' character. | % | |
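As a rough illustration of the locale-independent directives in the table, the sketch below formats the table's own sample moment (16 August 1988, 21:30:00). The microsecond value and the variable names are assumptions added for the example.

```python
from datetime import datetime

# The table's example moment, plus an assumed microsecond value of 123.
sample = datetime(1988, 8, 16, 21, 30, 0, 123)

full_stamp = sample.strftime("%Y-%m-%d %H:%M:%S.%f")  # '1988-08-16 21:30:00.000123'
short_year = sample.strftime("%y")                    # '88'
twelve_hour = sample.strftime("%I:%M")                # '09:30'
day_of_year = sample.strftime("%j")                   # '229'
weekday_number = sample.strftime("%w")                # '2' (Tuesday; 0 is Sunday)
literal_percent = sample.strftime("100%%")            # '100%'

print(full_stamp, short_year, twelve_hour, day_of_year, weekday_number, literal_percent)
```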
Several additional directives not required by the C89 standard are included for convenience. These parameters all correspond to ISO 8601 date values.
Directive | Meaning | Example | Notes |
---|---|---|---|
%G | ISO 8601 year with century representing the year that contains the greater part of the ISO week (%V). | 0001, 0002, …, 2013, 2014, …, 9998, 9999 | (8) |
%u | ISO 8601 weekday as a decimal number where 1 is Monday. | 1, 2, …, 7 | |
%V | ISO 8601 week as a decimal number with Monday as the first day of the week. Week 01 is the week containing Jan 4. | 01, 02, …, 53 | (8), (9) |

These may not be available on all platforms when used with the strftime() method. The ISO 8601 year and ISO 8601 week directives are not interchangeable with the year and week number directives above. Calling strptime() with incomplete or ambiguous ISO 8601 directives will raise a ValueError.
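The difference between the ISO 8601 year and the calendar year can be sketched with a date near a year boundary. The example uses date.isocalendar() and strptime() only, since strftime() support for these directives may vary by platform; the chosen date and the variable names are assumptions made for illustration.

```python
from datetime import date, datetime

# 2021-01-03 is a Sunday, so it still belongs to ISO week 53 of ISO year 2020.
boundary = date(2021, 1, 3)
iso_year, iso_week, iso_weekday = boundary.isocalendar()
print(iso_year, iso_week, iso_weekday)  # 2020 53 7

# A complete ISO triple (%G, %V, %u) parses back to the same calendar date.
parsed = datetime.strptime("2020 53 7", "%G %V %u")
print(parsed.date())  # 2021-01-03

# An incomplete ISO specification (no weekday) raises ValueError.
try:
    datetime.strptime("2020 53", "%G %V")
    incomplete_failed = False
except ValueError:
    incomplete_failed = True
print(incomplete_failed)  # True
```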

The full set of format codes supported varies across platforms, because Python calls the platform C library’s strftime() function, and platform variations are common. To see the full set of format codes supported on your platform, consult the strftime(3) documentation.

New in version 3.6: %G, %u and %V were added.

Broadly speaking, d.strftime(fmt) acts like the time module’s time.strftime(fmt, d.timetuple()), although not all objects support a timetuple() method.
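The correspondence described above can be checked with a small sketch. The sample value, the format string, and the variable names are assumptions chosen for illustration.

```python
import datetime as dt
import time

d = dt.datetime(2021, 1, 4, 10, 13, 40)
fmt = "%Y-%m-%d %H:%M:%S"

via_method = d.strftime(fmt)
via_time_module = time.strftime(fmt, d.timetuple())
print(via_method == via_time_module)  # True

# datetime.time objects have no timetuple() method, so only the first form applies to them.
time_has_timetuple = hasattr(dt.time(10, 13), "timetuple")
print(time_has_timetuple)  # False
```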

For the datetime.strptime() class method, the default value is 1900-01-01T00:00:00.000: any components not specified in the format string will be pulled from the default value. [4]
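A short sketch of how the 1900-01-01 default fills in missing components, including the leap-day failure mentioned in the footnote. The input strings and variable names are illustrative assumptions; the warning filter is there because newer Python versions may emit a DeprecationWarning for formats that contain a day of month without a year.

```python
import warnings
from datetime import datetime

# Only hour and minute are given; year, month and day come from the default.
time_only = datetime.strptime("10:13", "%H:%M")
print(time_only)  # 1900-01-01 10:13:00

# 'Feb 29' without a year is checked against 1900, which is not a leap year.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    try:
        datetime.strptime("Feb 29", "%b %d")
        leap_day_failed = False
    except ValueError:
        leap_day_failed = True
print(leap_day_failed)  # True

# Supplying a leap year explicitly avoids the problem.
with_year = datetime.strptime("2020 Feb 29", "%Y %b %d")
print(with_year.date())  # 2020-02-29
```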

Using datetime.strptime(date_string, format) is equivalent to:

datetime(*(time.strptime(date_string, format)[0:6]))

except when the format includes sub-second components or timezone offset information, which are supported in datetime.strptime but are discarded by time.strptime.
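The equivalence, and the sub-second exception to it, can be sketched as follows. The input strings and variable names are assumptions made for the example.

```python
import time
from datetime import datetime

text, fmt = "2021-01-04 10:13:40", "%Y-%m-%d %H:%M:%S"
direct = datetime.strptime(text, fmt)
via_time = datetime(*(time.strptime(text, fmt)[0:6]))
print(direct == via_time)  # True

# With a fractional second, the time.strptime() route drops the microseconds.
frac_text, frac_fmt = "2021-01-04 10:13:40.250000", "%Y-%m-%d %H:%M:%S.%f"
direct_frac = datetime.strptime(frac_text, frac_fmt)
via_time_frac = datetime(*(time.strptime(frac_text, frac_fmt)[0:6]))
print(direct_frac.microsecond, via_time_frac.microsecond)  # 250000 0
```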

For time objects, the format codes for year, month, and day should not be used, as time objects have no such values. If they’re used anyway, 1900 is substituted for the year, and 1 for the month and day.

For date objects, the format codes for hours, minutes, seconds, and microseconds should not be used, as date objects have no such values. If they’re used anyway, 0 is substituted for them.
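A brief sketch of the substitutions described above; the sample values and variable names are assumptions.

```python
from datetime import date, time

# A time object has no calendar fields, so 1900-01-01 is substituted.
time_as_date = time(10, 13).strftime("%Y-%m-%d")
print(time_as_date)  # 1900-01-01

# A date object has no clock fields, so zeros are substituted.
date_as_time = date(2021, 1, 4).strftime("%H:%M:%S.%f")
print(date_as_time)  # 00:00:00.000000
```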

Because strftime() relies on the platform C library, handling of format strings containing Unicode code points that can’t be represented in the charset of the current locale is platform-dependent as well. On some platforms such code points are preserved intact in the output, while on others strftime may raise UnicodeError or return an empty string instead.

Notes:

(1) Because the format depends on the current locale, care should be taken when making assumptions about the output value. Field orderings will vary (for example, “month/day/year” versus “day/month/year”), and the output may contain Unicode characters encoded using the locale’s default encoding (for example, if the current locale is ja_JP, the default encoding could be any one of eucJP, SJIS, or utf-8; use locale.getlocale() to determine the current locale’s encoding).

(2) The strptime() method can parse years in the full [1, 9999] range, but years < 1000 must be zero-filled to 4-digit width.

Changed in version 3.2: In previous versions, the strftime() method was restricted to years >= 1900.

Changed in version 3.3: In version 3.2, the strftime() method was restricted to years >= 1000.

(3) When used with the strptime() method, the %p directive only affects the output hour field if the %I directive is used to parse the hour.

(4) Unlike the time module, the datetime module does not support leap seconds.

(5) When used with the strptime() method, the %f directive accepts from one to six digits and zero pads on the right. %f is an extension to the set of format characters in the C standard (but implemented separately in datetime objects, and therefore always available).

(6) For a naive object, the %z and %Z format codes are replaced by empty strings.

For an aware object:

%z
utcoffset() is transformed into a string of the form ±HHMM[SS[.ffffff]], where HH is a 2-digit string giving the number of UTC offset hours, MM is a 2-digit string giving the number of UTC offset minutes, SS is a 2-digit string giving the number of UTC offset seconds and ffffff is a 6-digit string giving the number of UTC offset microseconds. The ffffff part is omitted when the offset is a whole number of seconds, and both the ffffff and the SS parts are omitted when the offset is a whole number of minutes. For example, if utcoffset() returns timedelta(hours=-3, minutes=-30), %z is replaced with the string '-0330'.

Changed in version 3.7: The UTC offset is not restricted to a whole number of minutes.

Changed in version 3.7: When the %z directive is provided to the strptime() method, the UTC offsets can have a colon as a separator between hours, minutes and seconds. For example, '+01:00:00' will be parsed as an offset of one hour. In addition, providing 'Z' is identical to '+00:00'.

%Z
In strftime(), %Z is replaced by an empty string if tzname() returns None; otherwise %Z is replaced by the returned value, which must be a string.

strptime() only accepts certain values for %Z:

1. any value in time.tzname for your machine’s locale
2. the hard-coded values UTC and GMT

So someone living in Japan may have JST, UTC, and GMT as valid values, but probably not EST. It will raise ValueError for invalid values.

Changed in version 3.2: When the %z directive is provided to the strptime() method, an aware datetime object will be produced. The tzinfo of the result will be set to a timezone instance.

(7) When used with the strptime() method, %U and %W are only used in calculations when the day of the week and the calendar year (%Y) are specified.

(8) Similar to %U and %W, %V is only used in calculations when the day of the week and the ISO year (%G) are specified in a strptime() format string. Also note that %G and %Y are not interchangeable.

(9) When used with the strptime() method, the leading zero is optional for formats %d, %m, %H, %I, %M, %S, %j, %U, %W, and %V. Format %y does require a leading zero.
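
Several of the notes above can be checked with a short sketch. All input strings and variable names are assumptions made for illustration, and the %p lines assume the default C locale, where AM and PM are the recognized markers.

```python
from datetime import datetime, timedelta, timezone

# Note (3): %p changes the parsed hour only when the hour is read with %I.
hour_with_I = datetime.strptime("03 PM", "%I %p").hour  # 15
hour_with_H = datetime.strptime("03 PM", "%H %p").hour  # 3

# Note (5): %f accepts one to six digits and zero-pads on the right.
micro = datetime.strptime("0.5", "%S.%f").microsecond  # 500000

# Note (6): %z formatting for aware and naive objects, and lenient parsing.
aware = datetime(2021, 1, 4, tzinfo=timezone(timedelta(hours=-3, minutes=-30)))
offset_text = aware.strftime("%z")                                   # '-0330'
naive_text = datetime(2021, 1, 4).strftime("[%z%Z]")                 # '[]'
colon_offset = datetime.strptime("+01:00:00", "%z").utcoffset()      # one hour
zulu_offset = datetime.strptime("Z", "%z").utcoffset()               # zero

# Note (7): %W is used only when a weekday is also given.
with_weekday = datetime.strptime("2021 01 1", "%Y %W %w")  # Monday of week 1
without_weekday = datetime.strptime("2021 01", "%Y %W")    # week number ignored

# Note (9): leading zeros are optional for most numeric fields when parsing.
lenient = datetime.strptime("2021-1-4 9:5:7", "%Y-%m-%d %H:%M:%S")

print(hour_with_I, hour_with_H, micro, offset_text, naive_text)
print(with_weekday.date(), without_weekday.date(), lenient)
```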

Footnotes

[1] If, that is, we ignore the effects of Relativity.
[2] This matches the definition of the “proleptic Gregorian” calendar in Dershowitz and Reingold’s book Calendrical Calculations, where it’s the base calendar for all computations. See the book for algorithms for converting between proleptic Gregorian ordinals and many other calendar systems.
[3] See R. H. van Gent’s guide to the mathematics of the ISO 8601 calendar for a good explanation.
[4] Passing datetime.strptime('Feb 29', '%b %d') will fail since 1900 is not a leap year.

    import logging
    import re
    from datetime import datetime


    def extract_date(date_str):
        """
        Match and extract a date/time from a string.

        :param date_str: date/time string taken from a web page
        :return: date string formatted as %Y-%m-%d %H:%M:%S
                 (the current date and time if extraction fails)
        """
        # [(date_str_regex, datetime_format), ...], most specific pattern first
        date_patterns = [
            # 2020-12-22 12:12:12
            (r'\d{4}-\d{1,2}-\d{1,2}\s+\d{1,2}:\d{1,2}:\d{1,2}', '%Y-%m-%d %H:%M:%S'),
            # 2020-12-22 12:12
            (r'\d{4}-\d{1,2}-\d{1,2}\s+\d{1,2}:\d{1,2}', '%Y-%m-%d %H:%M'),
            # 2020-12-22
            (r'\d{4}-\d{1,2}-\d{1,2}', '%Y-%m-%d'),
            # 2020/12/22
            (r'\d{4}/\d{1,2}/\d{1,2}', '%Y/%m/%d'),
            # 12-22 12:21:22
            (r'\d{1,2}-\d{1,2}\s+\d{1,2}:\d{1,2}:\d{1,2}', '%m-%d %H:%M:%S'),
            # 12-22 12:21
            (r'\d{1,2}-\d{1,2}\s+\d{1,2}:\d{1,2}', '%m-%d %H:%M'),
            # 12-2116:07
            (r'\d{1,2}-\d{1,2}\d{1,2}:\d{1,2}', '%m-%d%H:%M'),
        ]
        now = datetime.now()
        try:
            for pattern, fmt in date_patterns:
                match = re.search(pattern, date_str)
                if match:
                    # The patterns contain no capture groups, so group(0) is the whole match.
                    parsed = datetime.strptime(match.group(0), fmt)
                    # strptime() defaults a missing year to 1900; use the current year instead.
                    # (A string replace of "1900" would also corrupt genuine 1900 dates.)
                    if '%Y' not in fmt:
                        parsed = parsed.replace(year=now.year)
                    return parsed.strftime('%Y-%m-%d %H:%M:%S')
        except (ValueError, TypeError):
            logging.info("Failed to convert date format - cur_date_str=%s", date_str)
        # No pattern matched or parsing failed: fall back to the current date and time.
        return now.strftime('%Y-%m-%d %H:%M:%S')


    def test_extract_date():
        """
        Exercise date/time extraction on sample inputs.

        Prints the extracted date/time string for each input.
        """
        date_str_list = [
            '2021-01-04 10:13',
            '2021-01-04 10:13:40',
            '2021-01-01',
            '2021/01/02',
            '01-02 12:35:36',
            '01-02 12:35',
            '01-0212:35',
        ]
        for date_str in date_str_list:
            new_date_str = extract_date(date_str)
            print(new_date_str)

Output (from a run in 2021; inputs without a year take the current year):
2021-01-04 10:13:00
2021-01-04 10:13:40
2021-01-01 00:00:00
2021-01-02 00:00:00
2021-01-02 12:35:36
2021-01-02 12:35:00
2021-01-02 12:35:00