Data volume: 100k records
Requirement: export order data joined with the batch collection
Using the jenssegers/mongodb package to read and write data is convenient, but does it cost us anything in performance?
Tools/environment: homestead + "jenssegers/mongodb": "^3.2" + console
$t = microtime(true);
$order_id = '';
$is_true = true;
$limit = 1000;
$file_name = storage_path('files/csv_test.csv');
$fp = fopen($file_name, 'w');
while ($is_true) {
    $query = \App\Models\Order_test::query()
        ->with('batch')
        ->where('payment.status', '>', \App\Models\Order::PAYMENT_UNPAID);
    if (!empty($order_id)) {
        $query->where('id', '>', $order_id);
    }
    $orders = $query->orderBy('id', 'asc')->limit($limit)->get();
    $order_count = count($orders);
    if ($order_count > 0) {
        // first page: write the BOM and the header row
        if (empty($order_id)) {
            fwrite($fp, chr(0xEF) . chr(0xBB) . chr(0xBF));
            $title_list = [
                'Order No.',
                'Payment time',
                'Order status',
                'Product amount',
                'Order amount',
                'Product types',
                'Batch info 1',
                'Batch info 2',
                'Order created at',
            ];
            fputcsv($fp, $title_list);
        }
        // format each row
        foreach ($orders as $order) {
            $order_id = $order['id'];
            $data = [
                'order_id'      => $order_id . "\t", // trailing tab so Excel keeps the id as text
                'pay_time'      => $order->pay_time ?? '',
                'order_status'  => \App\Models\Order::$_statusArr[$order->status] ?? '',
                'products_cost' => $order->products_cost,
                'cost_total'    => $order->total,
                'items_total'   => $order->items_total,
                'batch_field1'  => $order->batch['field1'] ?? '',
                'batch_field2'  => $order->batch['field2'] ?? '',
                'created_at'    => (string) $order->created_at,
            ];
            fputcsv($fp, $data);
        }
    }
    // stop when the last page comes back short
    if ($order_count < $limit) {
        $is_true = false;
        // close the file
        fclose($fp);
    }
}
$tl = microtime(true) - $t;
dd($tl);
Elapsed: 73s
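The CSV bootstrap used above (UTF-8 BOM, then header row, then data rows) can be pulled out into a small helper. A minimal sketch, with a hypothetical `openCsv` helper and a throwaway file path:

```php
// Hypothetical helper: open a CSV for writing, emit the UTF-8 BOM once
// so Excel detects the encoding, then write the header row.
function openCsv(string $path, array $header)
{
    $fp = fopen($path, 'w');
    fwrite($fp, "\xEF\xBB\xBF"); // UTF-8 BOM, same bytes as chr(0xEF).chr(0xBB).chr(0xBF)
    fputcsv($fp, $header);
    return $fp;
}

$fp = openCsv('/tmp/demo.csv', ['order_id', 'total']);
fputcsv($fp, ['A1001' . "\t", 99.5]); // trailing tab keeps Excel from reformatting long ids
fclose($fp);
```

Note that the `\t` must be in double quotes; in single quotes PHP writes a literal backslash and `t` into the cell.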
Tools/environment: homestead + console
Artisan::command('tiway:mongo_raw', function () {
    $t = microtime(true);
    // connect to mongo directly via the official driver
    $con = new \MongoDB\Client("mongodb://username:password@host:port/");
    $db = $con->selectDatabase('test');
    $collection = $db->selectCollection('order_test');
    $order_id = '';
    $is_true = true;
    // page size
    $limit_item = 1000;
    $file_name = storage_path('files/mongo_raw.csv');
    $fp = fopen($file_name, 'w');
    while ($is_true) {
        // first page: write the BOM and the header row
        if (empty($order_id)) {
            fwrite($fp, chr(0xEF) . chr(0xBB) . chr(0xBF));
            $title_list = [
                'Order No.',
                'Payment time',
                'Order status',
                'Product amount',
                'Order amount',
                'Product types',
                'Batch info 1',
                'Batch info 2',
                'Order created at',
            ];
            fputcsv($fp, $title_list);
        }
        $pipeline = [];
        $match = ['payment.status' => ['$gt' => \App\Models\Order::PAYMENT_UNPAID]];
        if (!empty($order_id)) {
            $order_id_match = ['id' => ['$gt' => $order_id]];
            $match += $order_id_match;
        }
        array_push($pipeline, ['$match' => $match]);
        $sort = [
            '$sort' => ['id' => 1],
        ];
        $limit = [
            '$limit' => $limit_item,
        ];
        // sort must come before limit, otherwise each page is an arbitrary slice
        array_push($pipeline, $sort, $limit);
        // join the batch collection
        $batch_pipe = [
            '$lookup' => [
                'from'         => 'easyeda_batch', // collection to join
                'localField'   => 'batch_num',     // field on the order documents
                'foreignField' => 'batch_num',     // field on the batch documents
                'as'           => 'batch',
            ],
        ];
        array_push($pipeline, $batch_pipe);
        // let the sort spill to disk instead of failing on the 100MB in-memory limit
        $allow_disk = ['allowDiskUse' => true];
        $orders = $collection->aggregate($pipeline, $allow_disk);
        $row = 0;
        // format each row
        foreach ($orders as $order) {
            $order_id = $order->id;
            $data = [
                'order_id'      => $order_id . "\t", // trailing tab so Excel keeps the id as text
                'pay_time'      => $order->pay_time ?? '',
                'order_status'  => \App\Models\Order::$_statusArr[$order->status] ?? '',
                'products_cost' => $order->products_cost,
                'cost_total'    => $order->total,
                'items_total'   => $order->items_total,
                // $lookup returns an array of matched documents
                'batch_field1'  => $order->batch[0]['field1'] ?? '',
                'batch_field2'  => $order->batch[0]['field2'] ?? '',
                'created_at'    => (string) $order->created_at,
            ];
            $row++;
            fputcsv($fp, $data);
        }
        // stop when the last page comes back short
        if ($row < $limit_item) {
            $is_true = false;
            // close the file
            fclose($fp);
        }
    }
    $tl = microtime(true) - $t;
    dd($tl);
});
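To make the stage order in the command explicit: `$sort` runs before `$limit` so each page really is the next batch of ids, and `$lookup` comes after `$limit` so the join only touches the current page, not every matched order. A standalone sketch of the same pipeline (`PAYMENT_UNPAID` stubbed as 0 here):

```php
// Rebuilds the aggregation pipeline from the command above as plain arrays.
// Stage order matters: match -> sort -> limit -> lookup.
function buildPipeline(string $lastId, int $limit): array
{
    $match = ['payment.status' => ['$gt' => 0]]; // 0 stands in for PAYMENT_UNPAID
    if ($lastId !== '') {
        $match['id'] = ['$gt' => $lastId];
    }
    return [
        ['$match' => $match],
        ['$sort'  => ['id' => 1]],  // sort before limit: take the smallest ids first
        ['$limit' => $limit],
        ['$lookup' => [             // join runs on at most $limit documents
            'from'         => 'easyeda_batch',
            'localField'   => 'batch_num',
            'foreignField' => 'batch_num',
            'as'           => 'batch',
        ]],
    ];
}
```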
It took 172s. Even longer!!
My guess was the `['allowDiskUse' => true]` option, but without it there isn't enough memory:
In Aggregate.php line 219:
Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting. Aborting operation. Pass allowDiskUse:true to opt in.
So I tried dropping the sort and paginating with skip instead:
And then…
By record 37,000 the export had already passed 172s, and it kept getting slower and slower. The next day I found it had taken 2213s!!! What?!!!
Don't reach for skip casually: on large collections performance degrades sharply, because skip walks over the skipped documents one by one, so the bigger the offset, the slower it gets.
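The difference between the two strategies is easy to see in the pipelines themselves. A sketch (field names copied from the export above; the status constant is replaced by a placeholder 0):

```php
// skip-based paging: MongoDB still walks every skipped document,
// so the cost grows linearly with the offset.
function skipPage(int $page, int $limit): array
{
    return [
        ['$match' => ['payment.status' => ['$gt' => 0]]], // 0 stands in for PAYMENT_UNPAID
        ['$sort'  => ['id' => 1]],
        ['$skip'  => $page * $limit],
        ['$limit' => $limit],
    ];
}

// range-based ("seek") paging: resume from the last id already exported,
// so every page costs roughly the same.
function rangePage(string $lastId, int $limit): array
{
    $match = ['payment.status' => ['$gt' => 0]];
    if ($lastId !== '') {
        $match['id'] = ['$gt' => $lastId];
    }
    return [
        ['$match' => $match],
        ['$sort'  => ['id' => 1]],
        ['$limit' => $limit],
    ];
}
```

This is why the skip version crawled past record 37,000: each new page re-walks everything before it, while the range version always seeks straight to the resume point.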
MongoDB aggregation documentation
Still exploring…