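In real projects, the following situation comes up often:
at regular intervals, iterate over a Model's table and update the corresponding records.
The code commonly used for this is: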
Model.all.each do |obj|
  obj.do_something
end
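This code loads every record into memory at once and then processes them. When the Model's table holds too many rows, this can exhaust memory and crash the program. The find_each method exists to address this.
find_each loads 1000 records (the default) into memory at a time and processes them, repeating until all records have been processed.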
Model.where(conditions).find_each do |obj|
  obj.do_something
end
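find_each accepts two options:
:batch_size
:start
batch_size: the number of records loaded per batch; the default is 1000.
start: the primary key value to start from (inclusive). It lets several workers share one processing queue, each beginning at a different id. As the Rails documentation puts it:
This is especially useful if you want multiple workers dealing with the same processing queue.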
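To make the :start option more concrete, here is a minimal sketch of two workers splitting one id range. It does not use ActiveRecord: find_each_sim is a hypothetical stand-in that only mimics "yield every id greater than or equal to start, in ascending order", and the split point 6 is an arbitrary assumption.

```ruby
# Hypothetical stand-in for Model.find_each(:start => n); not a Rails API.
# Yields ids in ascending order, skipping those below `start`.
def find_each_sim(ids, start: 0)
  ids.sort.each { |id| yield id if id >= start }
end

ids = (1..10).to_a

# Worker A begins at the first id and stops where worker B takes over.
worker_a = []
find_each_sim(ids, start: 0) do |id|
  break if id >= 6 # leave ids 6 and above to worker B
  worker_a << id
end

# Worker B begins at id 6 (inclusive) and runs to the end.
worker_b = []
find_each_sim(ids, start: 6) { |id| worker_b << id }
```

Because the version shown here has only a start option and no end option, the first worker has to stop itself, which is what the break does in this sketch.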
The source code of find_each:
# File activerecord/lib/active_record/relation/batches.rb, line 19
def find_each(options = {})
  find_in_batches(options) do |records|
    records.each { |record| yield record }
  end
end
find_each simply delegates to find_in_batches and yields the records of each batch one by one.
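The delegation can be sketched in plain Ruby without a database. Both method names below are hypothetical stand-ins, and each_slice replaces the real database queries purely for illustration.

```ruby
# Hypothetical stand-in for find_in_batches: yields arrays of at most
# batch_size items. each_slice replaces the real SQL queries here.
def in_batches(items, batch_size: 1000)
  items.each_slice(batch_size) { |batch| yield batch }
end

# Same shape as find_each: unwrap every batch and yield one record at a time.
def each_record(items, batch_size: 1000)
  in_batches(items, batch_size: batch_size) do |batch|
    batch.each { |record| yield record }
  end
end

batch_sizes = []
in_batches(%w[a b c d e], batch_size: 2) { |batch| batch_sizes << batch.size }

seen = []
each_record(%w[a b c d e], batch_size: 2) { |record| seen << record }
```

The caller of each_record never sees the batch boundaries; only the batch-level method exposes them.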
The code of find_in_batches is as follows:
# File activerecord/lib/active_record/relation/batches.rb, line 48
def find_in_batches(options = {})
  relation = self
  unless arel.orders.blank? && arel.taken.blank?
    ActiveRecord::Base.logger.warn("Scoped order and limit are ignored, it's forced to be batch order and batch size")
  end
  if (finder_options = options.except(:start, :batch_size)).present?
    raise "You can't specify an order, it's forced to be #{batch_order}" if options[:order].present?
    raise "You can't specify a limit, it's forced to be the batch_size" if options[:limit].present?
    relation = apply_finder_options(finder_options)
  end
  start = options.delete(:start).to_i
  batch_size = options.delete(:batch_size) || 1000
  relation = relation.reorder(batch_order).limit(batch_size)
  records = relation.where(table[primary_key].gteq(start)).all
  while records.any?
    records_size = records.size
    primary_key_offset = records.last.id
    yield records # hand the current batch to the block
    break if records_size < batch_size
    if primary_key_offset
      records = relation.where(table[primary_key].gt(primary_key_offset)).to_a # fetch the next batch
    else
      raise "Primary key not included in the custom select clause"
    end
  end
end
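As the source shows, find_in_batches is essentially a while loop: it fetches one batch ordered by primary key, passes it to the block, then fetches the next batch of records whose primary key is greater than the last one seen, stopping once a batch comes back smaller than batch_size.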
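The loop above can be reproduced on an in-memory array to watch the batches form. This is only a sketch under stated assumptions: Row and each_batch are hypothetical names, and the select/first calls stand in for the WHERE and LIMIT clauses that the real method sends to the database.

```ruby
# Hypothetical in-memory stand-in for a table row; not part of ActiveRecord.
Row = Struct.new(:id, :name)

# Sketch of the loop in find_in_batches: order by id, take batch_size rows
# with id >= start, then keep asking for rows with id > the last id seen.
def each_batch(rows, start: 0, batch_size: 1000)
  sorted = rows.sort_by(&:id)
  batch = sorted.select { |row| row.id >= start }.first(batch_size)
  while batch.any?
    last_id = batch.last.id
    yield batch
    break if batch.size < batch_size # a short batch means nothing is left
    batch = sorted.select { |row| row.id > last_id }.first(batch_size)
  end
end

rows = [1, 2, 3, 5, 8, 13, 21].map { |id| Row.new(id, "row-#{id}") }

batches = []
each_batch(rows, batch_size: 3) { |batch| batches << batch.map(&:id) }
```

With seven rows and a batch size of 3, the block receives ids [1, 2, 3], then [5, 8, 13], then [21]. Gaps in the id sequence do not matter because each step continues from the last id seen rather than from a numeric offset.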