Pandas To Datetime - String to Date - pd.to_datetime()
Convert strings to datetime in pandas
One of the Top 10 Pandas functions you must know is Pandas To Datetime. It's a need-to-have in your data analysis toolkit. The wonderful thing about to_datetime() is its flexibility: it will read about 95% of the dates you'll throw at it.
Pandas To Datetime (.to_datetime()) will convert your string representation of a date to an actual date format. This is extremely important when utilizing all of the Pandas Date functionality like resample.
If you walk away with anything from this post, make sure it's an understanding of how to use format codes when converting dates. Check out the code sample below.
Pseudo code: Given format, convert a string into a datetime object.
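As a minimal sketch of that idea (the sample strings and variable names here are made up for illustration, not taken from the original post), passing a format to pd.to_datetime() looks roughly like this:

```python
import pandas as pd

# A date stored as plain text; the layout here is month/day/year.
date_string = "12/25/2020"

# Tell pandas exactly how to read it:
# %m = month number, %d = day of month, %Y = four-digit year.
parsed = pd.to_datetime(date_string, format="%m/%d/%Y")
print(parsed)  # 2020-12-25 00:00:00

# The same call works on a whole Series of strings.
dates = pd.Series(["01/02/2020", "03/04/2020"])
parsed_series = pd.to_datetime(dates, format="%m/%d/%Y")
print(parsed_series.dt.month.tolist())  # [1, 3]
```

A single string comes back as a pandas Timestamp, while a Series comes back as a Series with a datetime dtype, which is what unlocks the .dt accessor and date functionality like resample.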
Pandas To Datetime
To DateTime Parameters
.to_datetime() has a ton of parameters, and they are all important to understand. After you become familiar with them, you'll need to understand the date format codes below.
- arg: This is the 'thing' that you want to convert to a datetime object. Pandas gives you a ton of flexibility; you can pass an int, float, string, datetime, list, tuple, Series, DataFrame, or dict. That's a ton of input options!
- format (Default=None): *Very Important* The format parameter will instruct Pandas how to interpret your strings when converting them to DateTime objects. The format must use the format codes below. See examples below.
- origin (Default='unix'): An origin is simply a reference date. Where do you want your universe of timestamps to start? By default it is set to 'unix', which is 1970-01-01. 'julian' starts at noon on January 1, 4713 BC (and requires unit='D'). You can even set your own origin.
- unit: Say you pass an int as your arg (like 20203939). With unit, you'll be able to specify what unit your int is away from the origin. In the example here, if we set unit='s', pandas will interpret 20203939 as 20,203,939 seconds away from the origin. Available units are [D, s, ms, us, ns].
- dayfirst: This parameter helps pandas understand if your 'day' is first in your format (ex: 01/02/2020 > 2020-02-01). I suggest playing with other parameters first before you try this one.
- yearfirst: Same as the dayfirst parameter above. This will help pandas parse your dates if your year is first. Try the format code options first.
- utc (Default=None in older releases, False in pandas 2.x): If you want your DateTime objects to be timezone-aware (meaning each datetime object also has a timezone) and you want that timezone to be UTC, then set utc=True.
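To make the parameters above concrete, here is a short sketch. The input values are illustrative only (the custom origin date is an arbitrary choice), and the printed results assume a reasonably recent pandas version:

```python
import pandas as pd

# unit: read an int as "this many units after the origin".
# With the default origin ('unix'), 20203939 seconds lands in August 1970.
from_seconds = pd.to_datetime(20203939, unit="s")
print(from_seconds)  # 1970-08-22 20:12:19

# origin: count from your own reference date instead of 1970-01-01.
# Here, 3 days after an arbitrary origin of 2020-01-01.
from_custom_origin = pd.to_datetime(3, unit="D", origin="2020-01-01")
print(from_custom_origin)  # 2020-01-04 00:00:00

# dayfirst: the same string parsed month-first (default) vs. day-first.
month_first = pd.to_datetime("01/02/2020")
day_first = pd.to_datetime("01/02/2020", dayfirst=True)
print(month_first)  # 2020-01-02 00:00:00
print(day_first)    # 2020-02-01 00:00:00

# utc: return a timezone-aware timestamp pinned to UTC.
aware = pd.to_datetime("2020-01-01 12:00", utc=True)
print(aware)  # 2020-01-01 12:00:00+00:00
```

When the layout of your strings is known, an explicit format is usually the safer choice than dayfirst or yearfirst, since those only nudge the parser rather than pin it down.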
DateTime Format Codes
One extremely important concept to understand is DateTime format codes. This is how you instruct Pandas what format your DateTime string is in. It's magic every time you see it work. In fact, I look forward to gross strings with dates in them just so I can parse them. See documentation.
Format Code | Description | Examples |
---|---|---|
%a | Weekday, abbreviated | Mon, Tue, Sat |
%A | Weekday, full name | Monday, Tuesday, Saturday |
%w | Weekday, decimal. 0=Sunday | 1, 2, 6 |
%d | Day of month, zero-padded | 01, 02, 21 |
%b | Month, abbreviated | Jan, Feb, Sep |
%B | Month, full name | January, February, September |
%m | Month number, zero-padded | 01, 02, 09 |
%y | Year, without century, zero-padded | 02, 95, 99 |
%Y | Year, with century | 1990, 2020 |
%H | Hour (24 hour), zero padded | 01, 22 |
%I | Hour (12 hour) zero padded | 01, 12 |
%p | AM or PM | AM, PM |
%M | Minute, zero-padded | 01, 02, 43 |
%S | Second, zero padded | 01, 32, 59 |
%f | Microsecond, zero-padded | 000001, 000342, 999999 |
%z | UTC offset ±HHMM[SS[.ffffff]] | +0000, -1030, -030712.345216 |
%Z | Time zone name | UTC, EST, CST |
%j | Day of year, zero-padded | 001, 365, 023 |
%U | Week # of year, zero-padded. Sunday first day of week | 00, 01, 51 |
%W | Week # of year, zero-padded. Monday first day of week | 00, 02, 51 |
%c | Locale's appropriate date and time | Thu Feb  1 21:30:00 1990 |
%x | Locale's appropriate date | 02/01/90 |
%X | Locale's appropriate time | 21:22:00 |
%% | Literal '%'. Use this when you have a % sign in your format. | % |
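Here is a rough sketch of how several of these codes combine to pull dates out of messy strings. The sample strings are invented for illustration, and %a, %b, and %p assume an English locale:

```python
import pandas as pd

# Hypothetical "gross" strings with a date buried inside fixed text.
messy = pd.Series([
    "Order placed on Sat, 21 Sep 2019 at 09:05 PM",
    "Order placed on Mon, 02 Mar 2020 at 11:30 AM",
])

# Literal text is matched as-is; each % code stands in for one date piece:
# %a weekday abbrev, %d day, %b month abbrev, %Y year,
# %I 12-hour hour, %M minute, %p AM/PM.
fmt = "Order placed on %a, %d %b %Y at %I:%M %p"

parsed = pd.to_datetime(messy, format=fmt)

# The same codes work in reverse to format datetimes back into strings.
clean = parsed.dt.strftime("%Y-%m-%d %H:%M").tolist()
print(clean)  # ['2019-09-21 21:05', '2020-03-02 11:30']
```

Note how the 12-hour clock (%I plus %p) is converted into 24-hour time once the strings become real datetimes.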
Let's run through each iteration of the above parameters.