Python list operations: computing list lengths; a Pythonic way to compute the length of lists in a pandas DataFrame column...

I have a dataframe like this:

                                                    CreationDate
2013-12-22 15:25:02                  [ubuntu, mac-osx, syslinux]
2009-12-14 14:29:32  [ubuntu, mod-rewrite, laconica, apache-2.2]
2013-12-22 15:42:00               [ubuntu, nat, squid, mikrotik]

I am calculating the length of each list in the CreationDate column and storing it in a new Length column like this:

df['Length'] = df.CreationDate.apply(lambda x: len(x))

Which gives me this:

                                                    CreationDate  Length
2013-12-22 15:25:02                  [ubuntu, mac-osx, syslinux]       3
2009-12-14 14:29:32  [ubuntu, mod-rewrite, laconica, apache-2.2]       4
2013-12-22 15:42:00               [ubuntu, nat, squid, mikrotik]       4

Is there a more pythonic way to do this?
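To make the question reproducible, here is a minimal sketch that rebuilds the sample frame. The list contents and the timestamp index are reconstructed from the printed output, so the exact construction is an assumption. It also shows that the lambda wrapper is unnecessary, because the built-in len can be passed to apply directly.

```python
import pandas as pd

# Rebuild the example frame: a timestamp index and a CreationDate column
# whose cells are Python lists (reconstructed from the printed output).
df = pd.DataFrame(
    {"CreationDate": [
        ["ubuntu", "mac-osx", "syslinux"],
        ["ubuntu", "mod-rewrite", "laconica", "apache-2.2"],
        ["ubuntu", "nat", "squid", "mikrotik"],
    ]},
    index=pd.to_datetime([
        "2013-12-22 15:25:02",
        "2009-12-14 14:29:32",
        "2013-12-22 15:42:00",
    ]),
)

# Same result as apply(lambda x: len(x)); len is already a one-argument function.
df["Length"] = df["CreationDate"].apply(len)
print(df["Length"].tolist())  # [3, 4, 4]
```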

Solution

The str accessor works for some list operations too, not only for string operations. In this example,

df['CreationDate'].str.len()

returns the length of each list. See the pandas documentation for Series.str.len.

df['Length'] = df['CreationDate'].str.len()

df

Out:

                                                    CreationDate  Length
2013-12-22 15:25:02                  [ubuntu, mac-osx, syslinux]       3
2009-12-14 14:29:32  [ubuntu, mod-rewrite, laconica, apache-2.2]       4
2013-12-22 15:42:00               [ubuntu, nat, squid, mikrotik]       4
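The "some list operations" mentioned above go beyond len(). As a small sketch (using a plain Series built from the same sample lists), str.get and the [] shortcut index into each list, and str.join joins each list of strings into one string:

```python
import pandas as pd

ser = pd.Series([
    ["ubuntu", "mac-osx", "syslinux"],
    ["ubuntu", "mod-rewrite", "laconica", "apache-2.2"],
    ["ubuntu", "nat", "squid", "mikrotik"],
])

lengths = ser.str.len()      # len() of each list
firsts = ser.str[0]          # element 0 of each list (same as ser.str.get(0))
lasts = ser.str.get(-1)      # last element of each list
joined = ser.str.join(", ")  # each list of strings joined into one string

print(lengths.tolist())  # [3, 4, 4]
print(lasts.tolist())    # ['syslinux', 'apache-2.2', 'mikrotik']
print(joined[0])         # ubuntu, mac-osx, syslinux
```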

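For operations like this, plain Python is generally faster. pandas does handle missing values, though: str.len() returns NaN for a missing entry, whereas len() raises a TypeError on it. Here are timings on a Series of 10**6 random lists of letters: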
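A short sketch of that NaN difference, using a hypothetical three-element Series with one missing entry:

```python
import numpy as np
import pandas as pd

# A Series of lists with one missing value.
ser = pd.Series([["a", "b"], np.nan, ["c"]])

# pandas propagates the missing value; the result becomes float64
# because NaN cannot be stored in an integer column.
print(ser.str.len().tolist())  # [2.0, nan, 1.0]

# Plain len() has no notion of "missing" and fails on the float NaN.
try:
    [len(x) for x in ser]
except TypeError as exc:
    print("TypeError:", exc)

# A plain-Python equivalent needs an explicit check for missing entries.
safe = [len(x) if isinstance(x, list) else np.nan for x in ser]
print(safe)  # [2, nan, 1]
```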

import random
import string
import pandas as pd

ser = pd.Series([random.sample(string.ascii_letters, random.randint(1, 20))
                 for _ in range(10**6)])

%timeit ser.apply(lambda x: len(x))

1 loop, best of 3: 425 ms per loop

%timeit ser.str.len()

1 loop, best of 3: 248 ms per loop

%timeit [len(x) for x in ser]

10 loops, best of 3: 84 ms per loop

%timeit pd.Series([len(x) for x in ser], index=ser.index)

1 loop, best of 3: 236 ms per loop
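The %timeit lines above are IPython magics. As a rough sketch for running the same comparison as a plain script, the standard-library timeit module can be used instead. This version uses a smaller Series (10**5 lists) and a fixed random seed so that it finishes quickly. The absolute numbers will depend on the machine and the pandas version. The script also checks that the approaches agree on the computed lengths.

```python
import random
import string
import timeit

import pandas as pd

# Smaller than the 10**6 used above so it runs quickly; timings will differ
# by machine and pandas version.
random.seed(0)
ser = pd.Series([random.sample(string.ascii_letters, random.randint(1, 20))
                 for _ in range(10**5)])

candidates = {
    "apply(len)": lambda: ser.apply(len),
    "str.len()": lambda: ser.str.len(),
    "list comprehension": lambda: [len(x) for x in ser],
    "comprehension -> Series": lambda: pd.Series([len(x) for x in ser],
                                                 index=ser.index),
}

for name, fn in candidates.items():
    # Best of 3 runs, each run calling fn once.
    best = min(timeit.repeat(fn, number=1, repeat=3))
    print(f"{name:>25}: {best * 1000:.1f} ms")

# All approaches agree on the values; wrapping the comprehension in a Series
# with index=ser.index keeps it aligned with the original rows.
expected = ser.str.len()
assert (ser.apply(len) == expected).all()
assert pd.Series([len(x) for x in ser], index=ser.index).equals(expected)
```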
