Python list operations, computing and printing list length: a Pythonic way to compute the length of lists in a pandas DataFrame column...

I have a dataframe like this:


                     CreationDate
2013-12-22 15:25:02  [ubuntu, mac-osx, syslinux]
2009-12-14 14:29:32  [ubuntu, mod-rewrite, laconica, apache-2.2]
2013-12-22 15:42:00  [ubuntu, nat, squid, mikrotik]

I am calculating the length of the lists in the CreationDate column and creating a new Length column like this:

df['Length'] = df.CreationDate.apply(lambda x: len(x))

Which gives me this:

                     CreationDate                                 Length
2013-12-22 15:25:02  [ubuntu, mac-osx, syslinux]                  3
2009-12-14 14:29:32  [ubuntu, mod-rewrite, laconica, apache-2.2]  4
2013-12-22 15:42:00  [ubuntu, nat, squid, mikrotik]               4

Is there a more Pythonic way to do this?


You can use the str accessor for some list operations as well. In this example, df['CreationDate'].str.len() returns the length of each list. See the docs for str.len.

df['Length'] = df['CreationDate'].str.len()



                     CreationDate                                 Length
2013-12-22 15:25:02  [ubuntu, mac-osx, syslinux]                  3
2009-12-14 14:29:32  [ubuntu, mod-rewrite, laconica, apache-2.2]  4
2013-12-22 15:42:00  [ubuntu, nat, squid, mikrotik]               4
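For anyone who wants to try this without the original data, here is a minimal self-contained sketch. It rebuilds the three-row frame from the question (the index and tag values are copied from the question; building it with pd.DataFrame and pd.to_datetime is just an assumption about how the frame might be constructed) and computes the lengths with the str accessor.

```python
import pandas as pd

# Rebuild the example frame from the question: a datetime index and a
# CreationDate column whose cells are Python lists of tags.
df = pd.DataFrame(
    {
        "CreationDate": [
            ["ubuntu", "mac-osx", "syslinux"],
            ["ubuntu", "mod-rewrite", "laconica", "apache-2.2"],
            ["ubuntu", "nat", "squid", "mikrotik"],
        ]
    },
    index=pd.to_datetime(
        ["2013-12-22 15:25:02", "2009-12-14 14:29:32", "2013-12-22 15:42:00"]
    ),
)

# .str.len() works element-wise on lists as well as on strings.
df["Length"] = df["CreationDate"].str.len()

lengths = df["Length"].tolist()
print(lengths)
```

The column has object dtype here, which is why the str accessor is available even though the cells are lists rather than strings.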

For these operations, plain Python is generally faster, but the pandas str methods handle NaNs. Here are the timings:

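import random
import string
import pandas as pd

# One million lists of random letters, each with 1 to 20 elements
ser = pd.Series([random.sample(string.ascii_letters, random.randint(1, 20)) for _ in range(10**6)])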

%timeit ser.apply(lambda x: len(x))

1 loop, best of 3: 425 ms per loop

%timeit ser.str.len()

1 loop, best of 3: 248 ms per loop

%timeit [len(x) for x in ser]

10 loops, best of 3: 84 ms per loop

%timeit pd.Series([len(x) for x in ser], index=ser.index)

1 loop, best of 3: 236 ms per loop
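To illustrate the remark about NaNs, here is a small sketch with a made-up three-element Series (the values are hypothetical, chosen only so that one entry is missing). It shows that .str.len() passes the missing value through, while the apply version raises because len() is not defined for a float NaN.

```python
import numpy as np
import pandas as pd

# A column of lists with one missing value (illustrative data only).
ser = pd.Series([["a", "b"], np.nan, ["c"]])

# .str.len() propagates the missing value instead of raising.
str_lengths = ser.str.len()

# Plain len() fails on the float NaN, so the apply version raises TypeError.
try:
    ser.apply(len)
    apply_failed = False
except TypeError:
    apply_failed = True

print(str_lengths.tolist())
print(apply_failed)
```

Because of the NaN, the result of .str.len() is a float column rather than an integer one. A list comprehension would need an explicit check for missing values to get the same behaviour.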
