A recent paper explores, from the perspective of GPS traces, how taxi drivers search for new passengers after a drop-off and how they deliver passengers to a given destination.
Its GPS traces cover approximately 7,600 taxis over one year in Hangzhou, China. From these traces, each driver's daily record is extracted.
What strategies are there for searching for passengers?
The researchers classify search strategies into three types: hunting locally, waiting locally, and going distant.
As defined in the paper, the three strategies are distinguished by the metric $d_{\textrm{drop}}$, the length of the initial intended path, together with the waiting time $\tau_{\textrm{wait}}$. Thresholds determine the strategy in each case:
$$\textrm{strategy}=\begin{cases}
\textrm{going distant}, & d_{\textrm{drop}}>\tau_{d}\\
\textrm{waiting locally}, & d_{\textrm{drop}}\leq\tau_{d}\ \textrm{and}\ \tau_{\textrm{wait}}>\omega_{d}\\
\textrm{hunting locally}, & d_{\textrm{drop}}\leq\tau_{d}\ \textrm{and}\ \tau_{\textrm{wait}}\leq\omega_{d}
\end{cases}$$
where $\tau_{d}$ is a distance threshold and $\omega_{d}$ is a time threshold; they are set empirically to $\tau_{d}=1.5$ km and $\omega_{d}=5$ min.
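The threshold rule above can be sketched as a small classifier. This is a minimal sketch, assuming $d_{\textrm{drop}}$ (in km) and the waiting time $\tau_{\textrm{wait}}$ (in minutes) have already been computed for one search trip; the names `Strategy` and `classify_strategy` are hypothetical, not from the paper.

```python
from enum import Enum

# Thresholds reported in the paper: tau_d = 1.5 km, omega_d = 5 min.
TAU_D_KM = 1.5
OMEGA_D_MIN = 5.0


class Strategy(Enum):
    GOING_DISTANT = "going distant"
    WAITING_LOCALLY = "waiting locally"
    HUNTING_LOCALLY = "hunting locally"


def classify_strategy(d_drop_km: float, wait_min: float,
                      tau_d: float = TAU_D_KM,
                      omega_d: float = OMEGA_D_MIN) -> Strategy:
    """Apply the two-level threshold rule to one search trip (hypothetical helper)."""
    if d_drop_km > tau_d:
        return Strategy.GOING_DISTANT
    # Within the distance threshold: split on the waiting time tau_wait.
    if wait_min > omega_d:
        return Strategy.WAITING_LOCALLY
    return Strategy.HUNTING_LOCALLY


if __name__ == "__main__":
    print(classify_strategy(3.2, 10.0).value)  # going distant
    print(classify_strategy(0.8, 10.0).value)  # waiting locally
    print(classify_strategy(0.8, 2.0).value)   # hunting locally
```

Boundary values follow the inequalities above: $d_{\textrm{drop}}=1.5$ km with a 5-minute wait counts as hunting locally.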
Which strategy did the driver follow?
As stated above, $d_{\textrm{drop}}$ is the length of the initial intended path. But what exactly is this path? Let me clarify.
Figure 1 depicts one service cycle of a taxi. Blue footprints indicate passenger delivery and red footprints indicate passenger searching.
The taxi drops off passengers at point A, drives to point C along the red path, waits there for 10 minutes (probably for potential passengers), and finally picks up passengers at point B, in the vicinity of the previous drop-off point A.
Which strategy did the driver follow?
- Going distant, because he drives to point C for passengers?
- Waiting locally, because he waits 10 min at point C? or
- Hunting locally, since the final pick-up point is near the drop-off point?
This is a critical issue to resolve. The researchers propose a solution, which they call the initial intended path.
For drop-off point A, they collect all recorded delivery paths that start from or pass through point A into a set $\mathbb{T}$. They also map the red search trace onto grid cells, indexing each point.
Next, they count the paths in $\mathbb{T}$ that go from point A to a given cell $g$ on the red trace. Apart from sharing the same start and end points, each such path must follow the same route as the driver's sub-trajectory $tf$ from A to $g$; this set of paths is denoted $hasPath(\mathbb{T},tf)$.
A support value is then defined by dividing the size of $hasPath(\mathbb{T},tf)$ by the number of all paths that go from point A to $g$ by any route, denoted $passTraj(\mathbb{T},g)$. Mathematically:
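Mapping a trace onto cells could look like the sketch below. The paper's actual grid (cell size, origin, indexing scheme) is not described here, so the 500 m square cells, the origin, and the helper names `to_cell` and `trace_to_cells` are illustrative assumptions; the projection is a simple equirectangular approximation that is adequate at city scale.

```python
import math
from typing import Iterable, List, Tuple

# Hypothetical grid parameters: the paper's real cell size and origin are not
# given here, so these values are illustrative only.
CELL_SIZE_M = 500.0
EARTH_RADIUS_M = 6_371_000.0


def to_cell(lat: float, lon: float, origin: Tuple[float, float],
            cell_size_m: float = CELL_SIZE_M) -> Tuple[int, int]:
    """Map a GPS point to a (row, col) cell via an equirectangular approximation."""
    lat0, lon0 = origin
    # Metres north and east of the origin.
    dy = math.radians(lat - lat0) * EARTH_RADIUS_M
    dx = math.radians(lon - lon0) * EARTH_RADIUS_M * math.cos(math.radians(lat0))
    return (math.floor(dy / cell_size_m), math.floor(dx / cell_size_m))


def trace_to_cells(points: Iterable[Tuple[float, float]],
                   origin: Tuple[float, float],
                   cell_size_m: float = CELL_SIZE_M) -> List[Tuple[int, int]]:
    """Turn a GPS trace into its sequence of visited cells, merging consecutive repeats."""
    cells: List[Tuple[int, int]] = []
    for lat, lon in points:
        cell = to_cell(lat, lon, origin, cell_size_m)
        if not cells or cells[-1] != cell:
            cells.append(cell)
    return cells
```

For example, with a hypothetical origin at (30.25, 120.15), the trace `[(30.25, 120.15), (30.2501, 120.1501), (30.26, 120.15)]` becomes `[(0, 0), (2, 0)]`: the first two points fall in the same cell and are merged, and the third lies about 1.1 km north.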
$$Support(\mathbb{T},tf)=\tfrac{|hasPath(\mathbb{T},tf)|}{|passTraj(\mathbb{T},g)|}$$
A threshold $\beta$ is set for $Support(\mathbb{T},tf)$: if the support is lower than $\beta$, the path from point A to $g$ is called $\beta$-anomalous.
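The support computation and the $\beta$ test can be sketched as follows, with every path represented as a sequence of cell IDs. This is a sketch under stated assumptions, not the paper's code: a path counts toward $passTraj$ if it visits A and later visits $g$ (using the first such visits), it counts toward $hasPath$ if the cells between those visits match the driver's sub-trajectory $tf$ exactly, and a sub-trajectory with no passing paths gets support 0. The function names are hypothetical.

```python
from typing import Hashable, List, Optional, Sequence

Cell = Hashable  # e.g. a (row, col) tuple or a string label


def segment_between(path: List[Cell], start: Cell, end: Cell) -> Optional[List[Cell]]:
    """Return path[i..j], i = first visit to `start`, j = first later visit to `end`."""
    try:
        i = path.index(start)
        j = path.index(end, i + 1)
    except ValueError:
        return None  # the path does not go from start to end
    return path[i:j + 1]


def support(trajectories: Sequence[List[Cell]], tf: Sequence[Cell]) -> float:
    """Support(T, tf) = |hasPath(T, tf)| / |passTraj(T, g)|, with g = tf[-1]."""
    a, g = tf[0], tf[-1]
    pass_traj = 0  # delivery paths going from A to g by any route
    has_path = 0   # ... of which follow exactly the driver's route tf
    for path in trajectories:
        seg = segment_between(list(path), a, g)
        if seg is None:
            continue
        pass_traj += 1
        if seg == list(tf):
            has_path += 1
    # Assumption: no supporting evidence at all is treated as support 0.
    return has_path / pass_traj if pass_traj else 0.0


def is_beta_anomalous(trajectories: Sequence[List[Cell]],
                      tf: Sequence[Cell], beta: float) -> bool:
    """True when the driver's sub-trajectory tf has support lower than beta."""
    return support(trajectories, tf) < beta
```

With the toy set `T = [["A", "X", "C"], ["A", "X", "C", "D"], ["Z", "A", "Y", "C"], ["A", "B"]]`, three paths reach C from A and two of them go via X, so `support(T, ["A", "X", "C"])` is 2/3. That route is not $\beta$-anomalous for a hypothetical $\beta=0.5$, but it is for $\beta=0.8$.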
To illustrate, consider the path from point A to C. We first take the set $\mathbb{T}$ of all delivery paths that start from or pass through point A; the number of these paths that go on to reach C is $|passTraj(\mathbb{T},C)|$. Among them, some deliver passengers along the same route the driver took from A to C; their number is $|hasPath(\mathbb{T},A\to C)|$. Thus, $Support(\mathbb{T},A\to C)=|hasPath(\mathbb{T},A\to C)|/|passTraj(\mathbb{T},C)|$. Intuitively, it indicates what proportion of delivering drivers choose the same path as this driver. If it reaches the threshold $\beta$, the path is an efficient one, since many drivers choose it; otherwise it is $\beta$-anomalous.
What does $\beta$-anomalous mean in practice? Assume that drivers delivering passengers always choose the shortest path. If A→C is chosen by many delivering drivers, it can be regarded as a shortest path from A to C. The driver's route along it is then not $\beta$-anomalous, and it is an initial intended path: it tells us where the driver intended to go and helps us understand his strategy. Among all such sub-paths of the search trace that start at the drop-off point and lead toward the pick-up point, the longest one is the longest initial intended path.
Don't worry, this will become clear shortly. The length of the longest initial intended path is exactly the $d_{\textrm{drop}}$ defined at the beginning.
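Putting these pieces together, $d_{\textrm{drop}}$ can be sketched as the length of the longest prefix of the search trace, starting at drop-off point A, that is not $\beta$-anomalous. In this hypothetical sketch the $\beta$ test is passed in as a callable `is_intended` (for instance, support of the prefix being at least $\beta$), and path length is approximated by straight hops between centres of square grid cells; both are assumptions rather than the paper's exact procedure.

```python
import math
from typing import Callable, Sequence, Tuple

Cell = Tuple[int, int]  # (row, col) grid index


def path_length_km(cells: Sequence[Cell], cell_size_km: float) -> float:
    """Approximate length: sum of straight-line hops between consecutive cell centres."""
    return sum(math.dist(p, q) for p, q in zip(cells, cells[1:])) * cell_size_km


def d_drop(trace: Sequence[Cell],
           is_intended: Callable[[Sequence[Cell]], bool],
           cell_size_km: float) -> float:
    """Length of the longest prefix trace[0..k] (k >= 1) accepted by is_intended.

    `is_intended(prefix)` stands in for "Support(T, prefix) >= beta".
    Returns 0.0 if no prefix qualifies.
    """
    best_k = 0
    for k in range(1, len(trace)):
        if is_intended(trace[:k + 1]):
            best_k = k  # keep the longest qualifying prefix seen so far
    return path_length_km(trace[:best_k + 1], cell_size_km)
```

With a toy trace `[(0, 0), (0, 1), (0, 2), (0, 3), (0, 4), (1, 3), (1, 2), (1, 1)]` that runs four cells east to a point like C and then turns back toward A, and a stub `lambda p: len(p) <= 5` that accepts only prefixes up to C, `d_drop(trace, stub, 0.5)` returns 2.0 km, which exceeds $\tau_d=1.5$ km. Note that the loop keeps the longest qualifying prefix even if some shorter prefix fails; if every shorter prefix must also qualify, break at the first failure instead.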
Take Figure 1 as an example. The route from A to C appears to be an efficient path, so it is not $\beta$-anomalous. The full route from A to B, however, is a detour: B is very close to A, and few delivering drivers would take such a roundabout route, or their passengers would complain. So A→C is the longest initial intended path, and, provided its length exceeds $\tau_d$, the driver followed the going-distant strategy rather than waiting or hunting locally.
With this method, the search strategy behind every individual search trip can be determined.
The researchers select a number of cells to illustrate drivers' search strategies (Figure 2), with passenger demand decreasing from top to bottom. We conclude that in busy areas (cell IDs 1-10), hunting locally (dark green) is a good strategy; in medium-demand areas (cell IDs 42-65), going distant pays off better in the morning; and in remote areas (cell IDs 72-97), no strategy guarantees income as profitable as in busy areas.
Note that cell 72 is the airport, from which some intriguing conclusions can be drawn:
- Few flights arrive in the morning, so waiting locally at the airport does not earn much; drivers had better hunt nearby or drive downtown;
- Drivers prefer to return downtown after dropping off passengers at the airport in the morning;
- Most flights arrive in the afternoon, so it is more profitable to wait for passengers locally then.
In the next part, we will focus on how drivers deliver passengers and how they choose their service area.