The text and figures are reposted from: http://kernel.meizu.com/linux-thermal-framework-intro.html
The code analysis is my own.
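Linux Thermal is the temperature-control module of the Linux kernel. Its job is to manage the heat a chip produces while the system runs, so that both the chip temperature and the device enclosure stay within a safe range.
Main framework of Thermal
Implementing temperature control requires devices that measure temperature, devices that control temperature, and policies that decide how the controlling devices are used.
Temperature-measuring devices: abstracted in the Thermal framework as a Thermal Zone Device;
Temperature-controlling devices: abstracted in the Thermal framework as a Thermal Cooling Device;
Control policies: abstracted in the Thermal framework as a Thermal Governor;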
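Thermal Zone Device
As mentioned above, the Thermal Zone Device is the abstraction of a temperature-measuring device. How is it abstracted? RTFSC.
From the code we can see that the main operations of a device that provides temperature are: a bind function, a get-temperature function, and a get-trip-temperature function.
Bind function: used by the Thermal core for binding; covered later;
Get-temperature function: reads the device temperature. An SoC usually has built-in temperature sensors, and a thermistor read through an ADC can also be converted to a temperature; this function returns those values;
Get-trip-temperature function: what is it for? This is a key point of the thermal framework. To control temperature, something has to describe when to act, and that something is the trip point. Each thermal zone device can define several trip points, and the temperature of each one is obtained through this function;
The structures are defined in: ./include/linux/thermal.h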
struct thermal_zone_device {
int id;
char type[THERMAL_NAME_LENGTH];
struct device device;
struct thermal_attr *trip_temp_attrs;
struct thermal_attr *trip_type_attrs;
struct thermal_attr *trip_hyst_attrs;
void *devdata;
int trips;
unsigned long trips_disabled; /* bitmap for disabled trips */
/* polling intervals, in milliseconds */
int passive_delay;
int polling_delay;
int temperature;
int last_temperature;
int emul_temperature;
int passive;
unsigned int forced_passive;
atomic_t need_update;
/* device operation callbacks */
struct thermal_zone_device_ops *ops;
struct thermal_zone_params *tzp;
/* cooling policy */
struct thermal_governor *governor;
void *governor_data;
/* important: head of this zone's instance list. @thermal_instances: list of &struct thermal_instance of this thermal zone */
struct list_head thermal_instances;
struct idr idr;
struct mutex lock;
struct list_head node;
/* delayed_work that drives the periodic polling loop */
struct delayed_work poll_queue;
struct sensor_threshold tz_threshold[2];
struct sensor_info sensor;
};
struct thermal_zone_device_ops {
/* bind / unbind functions */
int (*bind) (struct thermal_zone_device *,struct thermal_cooling_device *);
int (*unbind) (struct thermal_zone_device *,struct thermal_cooling_device *);
/* get-temperature function */
int (*get_temp) (struct thermal_zone_device *, unsigned long *);
int (*get_mode) (struct thermal_zone_device *,enum thermal_device_mode *);
int (*set_mode) (struct thermal_zone_device *,enum thermal_device_mode);
int (*get_trip_type) (struct thermal_zone_device *, int,enum thermal_trip_type *);
int (*activate_trip_type) (struct thermal_zone_device *, int,enum thermal_trip_activation_mode);
/* get the temperature of a trip point */
int (*get_trip_temp) (struct thermal_zone_device *, int,unsigned long *);
int (*set_trip_temp) (struct thermal_zone_device *, int,unsigned long);
int (*get_trip_hyst) (struct thermal_zone_device *, int,unsigned long *);
int (*set_trip_hyst) (struct thermal_zone_device *, int,unsigned long);
int (*get_crit_temp) (struct thermal_zone_device *, unsigned long *);
int (*set_emul_temp) (struct thermal_zone_device *, unsigned long);
int (*get_trend) (struct thermal_zone_device *, int,enum thermal_trend *);
int (*notify) (struct thermal_zone_device *, int,enum thermal_trip_type);
};
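To make the role of get_temp and get_trip_temp more concrete, here is a small user-space sketch. Everything in it (struct demo_zone, demo_get_temp, the made-up ADC scale, the millidegree unit) is a hypothetical stand-in that I invented for illustration; it only mirrors the callback shape of thermal_zone_device_ops and is not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; these are not the kernel types. */
struct demo_zone;

struct demo_zone_ops {
    int (*get_temp)(struct demo_zone *tz, int *temp);
    int (*get_trip_temp)(struct demo_zone *tz, int trip, int *temp);
};

struct demo_zone {
    const struct demo_zone_ops *ops;
    int ntrips;              /* number of trip points */
    const int *trip_temps;   /* one temperature per trip, in millidegrees C */
    int raw_adc;             /* pretend ADC reading */
};

/* Convert the pretend ADC reading to millidegrees (made-up linear scale). */
static int demo_get_temp(struct demo_zone *tz, int *temp)
{
    *temp = tz->raw_adc * 100;
    return 0;
}

/* Look up the temperature of one trip point. */
static int demo_get_trip_temp(struct demo_zone *tz, int trip, int *temp)
{
    if (trip < 0 || trip >= tz->ntrips)
        return -1;           /* a kernel driver would return -EINVAL */
    *temp = tz->trip_temps[trip];
    return 0;
}

static const struct demo_zone_ops demo_ops = {
    .get_temp = demo_get_temp,
    .get_trip_temp = demo_get_trip_temp,
};

/* Count how many trip points the current temperature has reached. */
static int demo_trips_crossed(struct demo_zone *tz)
{
    int temp, trip_temp, crossed = 0;

    assert(tz != NULL && tz->ops != NULL);
    if (tz->ops->get_temp(tz, &temp))
        return -1;
    for (int i = 0; i < tz->ntrips; i++)
        if (!tz->ops->get_trip_temp(tz, i, &trip_temp) && temp >= trip_temp)
            crossed++;
    return crossed;
}
```

The caller never touches the sensor directly; it only goes through the two callbacks, which is the point of the abstraction.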
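Thermal Cooling Devices
Thermal Cooling Devices are the abstraction of devices that can lower temperature. A fan is easy to understand as such a device, but how should we understand a CPU or GPU as a Cooling Device?
CPU and GPU cooling devices lower the temperature by generating less heat, whereas fans and heat sinks speed up heat dissipation.
The abstraction treats every cooling-capable device as having a number of individually selectable states; a fan, for example, has different speed states.
A CPU/GPU cooling device has different maximum-operating-frequency states, so when the temperature gets high it can be brought down by adjusting these states;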
struct thermal_cooling_device {
int id;
char type[THERMAL_NAME_LENGTH];
struct device device;
struct device_node *np;
void *devdata;
/* operation callbacks */
const struct thermal_cooling_device_ops *ops;
bool updated; /* true if the cooling device does not need update */
struct mutex lock; /* protect thermal_instances list */
/* as above: head of the instance list */
struct list_head thermal_instances;
struct list_head node;
};
struct thermal_cooling_device_ops {
int (*get_max_state) (struct thermal_cooling_device *, unsigned long *);
int (*get_cur_state) (struct thermal_cooling_device *, unsigned long *);
/* set the cooling state (level) */
int (*set_cur_state) (struct thermal_cooling_device *, unsigned long);
int (*get_requested_power)(struct thermal_cooling_device *,struct thermal_zone_device *, u32 *);
int (*state2power)(struct thermal_cooling_device *,struct thermal_zone_device *, unsigned long, u32 *);
int (*power2state)(struct thermal_cooling_device *,struct thermal_zone_device *, u32, unsigned long *);
};
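The "device with discrete states" idea can be sketched with a fan. This is a user-space illustration only: struct demo_fan and the fan_* helpers are hypothetical names of mine, and a real driver would program hardware where the comment says so.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a fan cooling device; state 0 means off. */
struct demo_fan {
    unsigned long max_state;   /* highest speed level supported */
    unsigned long cur_state;   /* speed level currently applied */
};

static int fan_get_max_state(struct demo_fan *fan, unsigned long *state)
{
    assert(fan != NULL && state != NULL);
    *state = fan->max_state;
    return 0;
}

static int fan_get_cur_state(struct demo_fan *fan, unsigned long *state)
{
    assert(fan != NULL && state != NULL);
    *state = fan->cur_state;
    return 0;
}

static int fan_set_cur_state(struct demo_fan *fan, unsigned long state)
{
    assert(fan != NULL);
    if (state > fan->max_state)
        return -1;             /* a kernel driver would return -EINVAL */
    fan->cur_state = state;    /* a real driver would set the PWM duty here */
    return 0;
}
```

A CPU cooling device would follow the same pattern, with each state mapping to a lower maximum frequency instead of a higher fan speed.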
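Thermal Governor
The Thermal Governor is the abstraction of a cooling policy: a method that chooses the state of the thermal cooling devices based on temperature. A simple example: if the temperature is rising quickly, select fan speed 3; if it is rising slowly, select fan speed 1. That is a Governor.
It is simple: every policy is implemented through the throttle callback. The kernel already ships several policies, such as step_wise, user_space, power_allocator, and bang_bang; the details of each algorithm are not covered here.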
/**
* struct thermal_governor - structure that holds thermal governor information
* @name: name of the governor
* @bind_to_tz: callback called when binding to a thermal zone. If it
* returns 0, the governor is bound to the thermal zone,
* otherwise it fails.
* @unbind_from_tz: callback called when a governor is unbound from a
* thermal zone.
* @throttle: callback called for every trip point even if temperature is
* below the trip point temperature
* @governor_list: node in thermal_governor_list (in thermal_core.c)
*/
struct thermal_governor {
char name[THERMAL_NAME_LENGTH];
int (*bind_to_tz)(struct thermal_zone_device *tz);
void (*unbind_from_tz)(struct thermal_zone_device *tz);
/* policy function */
int (*throttle)(struct thermal_zone_device *tz, int trip);
struct list_head governor_list;
};
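As a rough idea of what a throttle policy decides, here is a deliberately simplified stepping rule in user-space C. It is not the kernel's step_wise governor, which handles more cases; the enum, the function name demo_next_state, and the rule itself are assumptions I made up for illustration.

```c
#include <assert.h>

/* Hypothetical temperature trend, loosely modelled on the idea of a trend. */
enum demo_trend { DEMO_TREND_STABLE, DEMO_TREND_RAISING, DEMO_TREND_DROPPING };

/*
 * Pick the next cooling state for one (zone, trip, cooling device) binding.
 * Above the trip and still rising: step up by one.
 * Below the trip and dropping: step down by one.
 * Otherwise keep the current state. The result is clamped to [lower, upper].
 */
static unsigned long demo_next_state(unsigned long cur,
                                     unsigned long lower, unsigned long upper,
                                     int temp, int trip_temp,
                                     enum demo_trend trend)
{
    unsigned long next = cur;

    assert(lower <= upper);
    if (temp >= trip_temp) {
        if (trend == DEMO_TREND_RAISING && cur < upper)
            next = cur + 1;
    } else {
        if (trend == DEMO_TREND_DROPPING && cur > lower)
            next = cur - 1;
    }
    if (next < lower)
        next = lower;
    if (next > upper)
        next = upper;
    return next;
}
```

The lower and upper arguments play the same role as the per-binding limits described in the binding step: the policy may only move the device within that range.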
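Thermal Core
With temperature-measuring devices, temperature-controlling devices, and control policies in place, the Thermal Core is responsible for tying them together. RTFSC.
1. Registration. The Thermal Core exposes registration interfaces through which thermal zone devices, thermal cooling devices, and thermal governors register themselves.
This interface adds a new thermal zone device (sensor) under /sys/class/thermal, named thermal_zone[0-*], and at the same time tries to bind the registered thermal cooling devices to it. It returns a pointer to the newly created thermal_zone_device.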
struct thermal_zone_device *thermal_zone_device_register(const char *type,int trips, int mask, void *devdata,struct thermal_zone_device_ops *ops,struct thermal_zone_params *tzp,int passive_delay, int polling_delay)
thermal_zone_device_register() - register a new thermal zone device
@type: the thermal zone device type
@trips: the number of trip points the thermal zone supports
@mask: a bit string indicating the writeability of trip points
@devdata: private device data
@ops: standard thermal zone device callbacks
@tzp: thermal zone platform parameters
@passive_delay: number of milliseconds to wait between polls when performing passive cooling
@polling_delay: number of milliseconds to wait between polls when checking whether trip points have been crossed (0 for interrupt driven systems)
This interface adds a new thermal cooling device (fan/processor/…) under /sys/class/thermal/ as cooling_device[0-*] and tries to bind it to the registered thermal zone devices. It returns a pointer to the thermal_cooling_device structure.
struct thermal_cooling_device * thermal_cooling_device_register(char *type, void *devdata,const struct thermal_cooling_device_ops *ops)
thermal_cooling_device_register() - register a new thermal cooling device
@type: the thermal cooling device type.
@devdata: device private data.
@ops: standard thermal cooling devices callbacks.
This interface registers a thermal governor:
int thermal_register_governor(struct thermal_governor *governor)
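2. Binding. While a thermal zone or cooling device is being registered, the thermal core calls the bind function. The essence of binding is attaching one cooling device to one trip point of a thermal_zone.
This interface attaches a thermal cooling device to a given trip point of a thermal zone device. It returns 0 on success.
//First, the structure involved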
/*
* This structure is used to describe the behavior of
* a certain cooling device on a certain trip point
* in a certain thermal zone
*/
struct thermal_instance {
int id;
char name[THERMAL_NAME_LENGTH];
struct thermal_zone_device *tz;
struct thermal_cooling_device *cdev;
int trip;
bool initialized;
unsigned long upper; /* Highest cooling state for this trip point */
unsigned long lower; /* Lowest cooling state for this trip point */
unsigned long target; /* expected cooling state */
char attr_name[THERMAL_NAME_LENGTH];
struct device_attribute attr;
char weight_attr_name[THERMAL_NAME_LENGTH];
struct device_attribute weight_attr;
struct list_head tz_node; /* important: node in tz->thermal_instances */
struct list_head cdev_node; /* important: node in cdev->thermal_instances */
unsigned int weight; /* The weight of the cooling device */
};
thermal_zone_bind_cooling_device() - bind a cooling device to a thermal zone
@tz: pointer to struct thermal_zone_device
@trip: indicates which trip point the cooling devices is associated with in this thermal zone.
@cdev: pointer to struct thermal_cooling_device
@upper: the Maximum cooling state for this trip point. THERMAL_NO_LIMIT means no upper limit, and the cooling device can be in max_state.
@lower: the Minimum cooling state can be used for this trip point. THERMAL_NO_LIMIT means no lower limit, and the cooling device can be in cooling state 0.
@weight: The weight of the cooling device to be bound to the thermal zone. Use THERMAL_WEIGHT_DEFAULT for the default value
int thermal_zone_bind_cooling_device(struct thermal_zone_device *tz,
                                     int trip,
                                     struct thermal_cooling_device *cdev,
                                     unsigned long upper, unsigned long lower,
                                     unsigned int weight)
{
    struct thermal_instance *dev; /* describes how a zone and a cooling device relate at one trip */
    struct thermal_instance *pos;
    struct thermal_zone_device *pos1;
    struct thermal_cooling_device *pos2;
    unsigned long max_state;
    int result;

    /* (parameter checks are abridged in this excerpt) */
    /* walk the global lists to confirm that tz and cdev are both registered */
    list_for_each_entry(pos1, &thermal_tz_list, node) {
        if (pos1 == tz)
            break;
    }
    list_for_each_entry(pos2, &thermal_cdev_list, node) {
        if (pos2 == cdev)
            break;
    }
    if (tz != pos1 || cdev != pos2)
        return -EINVAL;

    /* ask the cooling device for its highest state via get_max_state */
    cdev->ops->get_max_state(cdev, &max_state);

    /* lower default 0, upper default max_state */
    lower = lower == THERMAL_NO_LIMIT ? 0 : lower;
    upper = upper == THERMAL_NO_LIMIT ? max_state : upper;
    dev = kzalloc(sizeof(struct thermal_instance), GFP_KERNEL); /* allocate the instance */
    if (!dev)
        return -ENOMEM;
    dev->tz = tz;         /* the zone device */
    dev->cdev = cdev;     /* the cooling device */
    dev->trip = trip;     /* the trip point this binding applies to */
    dev->upper = upper;   /* highest state allowed for this trip */
    dev->lower = lower;   /* lowest state allowed for this trip */
    dev->target = THERMAL_NO_TARGET; /* initial value: no target state requested yet */
    dev->weight = weight; /* relative weight of this cooling device within the zone */

    /* allocate an id from the zone's idr and use it as the instance id */
    result = get_idr(&tz->idr, &tz->lock, &dev->id);
    if (result)
        goto free_mem;

    sprintf(dev->name, "cdev%d", dev->id); /* instance name derived from the id */
    /*
     * A kobject corresponds to a directory in sysfs. Create a symlink named
     * dev->name in the zone's directory that points to the cooling device's directory.
     */
    result = sysfs_create_link(&tz->device.kobj, &cdev->device.kobj, dev->name);
    if (result)
        goto release_idr;

    sprintf(dev->attr_name, "cdev%d_trip_point", dev->id); /* attribute name derived from the id */
    sysfs_attr_init(&dev->attr.attr); /* required initialization for a dynamically allocated attribute */
    dev->attr.attr.name = dev->attr_name;
    dev->attr.attr.mode = 0444;
    dev->attr.show = thermal_cooling_device_trip_point_show; /* runs when the file is read, e.g. with cat */
    /* create the attribute file in the zone's sysfs directory */
    result = device_create_file(&tz->device, &dev->attr);
    if (result)
        goto remove_symbol_link;

    /* same pattern for the weight attribute, which is also writable through store */
    sprintf(dev->weight_attr_name, "cdev%d_weight", dev->id);
    sysfs_attr_init(&dev->weight_attr.attr);
    dev->weight_attr.attr.name = dev->weight_attr_name;
    dev->weight_attr.attr.mode = S_IWUSR | S_IRUGO;
    dev->weight_attr.show = thermal_cooling_device_weight_show;
    dev->weight_attr.store = thermal_cooling_device_weight_store;
    result = device_create_file(&tz->device, &dev->weight_attr);
    if (result)
        goto remove_trip_file;
    mutex_lock(&tz->lock);   /* protect the zone's instance list */
    mutex_lock(&cdev->lock); /* protect the cooling device's instance list */
    /* reject the binding if the zone already holds an identical instance */
    list_for_each_entry(pos, &tz->thermal_instances, tz_node)
        if (pos->tz == tz && pos->trip == trip && pos->cdev == cdev) {
            result = -EEXIST;
            break;
        }
    if (!result) {
        /* no duplicate: link the new instance into both lists */
        list_add_tail(&dev->tz_node, &tz->thermal_instances);
        list_add_tail(&dev->cdev_node, &cdev->thermal_instances);
        atomic_set(&tz->need_update, 1); /* flag the zone so it gets re-evaluated */
    }
    mutex_unlock(&cdev->lock);
    mutex_unlock(&tz->lock);

    if (!result)
        return 0;

    /* error path: undo the steps above in reverse order */
    device_remove_file(&tz->device, &dev->weight_attr);
remove_trip_file:
    device_remove_file(&tz->device, &dev->attr);
remove_symbol_link:
    sysfs_remove_link(&tz->device.kobj, dev->name);
release_idr:
    release_idr(&tz->idr, &tz->lock, dev->id);
free_mem:
    kfree(dev);
    return result;
}
EXPORT_SYMBOL_GPL(thermal_zone_bind_cooling_device); /* export the symbol so other GPL modules can call it */
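Two pieces of the bind logic are easy to try out in user space: resolving the "no limit" defaults and rejecting a duplicate binding. The sketch below uses hypothetical names (DEMO_NO_LIMIT, demo_resolve_limits, struct demo_instance) and plain arrays instead of kernel lists. The range check at the end of demo_resolve_limits is my own addition for the sketch and does not appear in the excerpt above.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define DEMO_NO_LIMIT ULONG_MAX   /* hypothetical stand-in for THERMAL_NO_LIMIT */

/* Replace "no limit" markers with concrete states: lower -> 0, upper -> max_state. */
static int demo_resolve_limits(unsigned long max_state,
                               unsigned long *lower, unsigned long *upper)
{
    assert(lower != NULL && upper != NULL);
    if (*lower == DEMO_NO_LIMIT)
        *lower = 0;
    if (*upper == DEMO_NO_LIMIT)
        *upper = max_state;
    /* sanity check added for this sketch: the range must be valid */
    if (*lower > *upper || *upper > max_state)
        return -1;
    return 0;
}

/* One binding: which cooling device acts on which trip of which zone. */
struct demo_instance {
    int zone_id;
    int cdev_id;
    int trip;
};

/* Return 1 if an identical (zone, trip, cdev) binding already exists. */
static int demo_is_duplicate(const struct demo_instance *list, size_t n,
                             struct demo_instance cand)
{
    for (size_t i = 0; i < n; i++)
        if (list[i].zone_id == cand.zone_id &&
            list[i].trip == cand.trip &&
            list[i].cdev_id == cand.cdev_id)
            return 1;
    return 0;
}
```

The same cooling device may be bound to several trips of one zone; only an identical triple counts as a duplicate.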
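3. Polling. The Thermal core enables a delayed_work that runs in a loop and keeps the whole thermal control flow going. When the temperature rises above a trip point, the corresponding cooling device is activated to bring it down.
First, thermal_zone_device_register() does the following:
a. bind_tz(tz); calls __bind(), which calls thermal_zone_bind_cooling_device() to bind the zone and the cooling devices.
b. INIT_DELAYED_WORK(&(tz->poll_queue), thermal_zone_device_check); initializes the poll_queue work item with thermal_zone_device_check as its work function.
c. if (!tz->ops->get_temp) thermal_zone_device_set_polling(tz, 0); If the zone has no get_temp callback, thermal_zone_device_set_polling is called with a delay of 0, which in turn calls cancel_delayed_work(&tz->poll_queue); to cancel the delayed work.
d. thermal_zone_device_reset(tz); resets the zone device: tz->temperature = THERMAL_TEMP_INVALID; tz->passive = 0; and pos->initialized = false for every instance.
e. Then comes the key part:
atomic_cmpxchg() is an atomic compare-and-exchange. It checks whether need_update equals 1; if so, it stores 0 in need_update, otherwise it leaves the value unchanged. It returns the value need_update held before the operation.
If the earlier bind succeeded, need_update was atomically set to 1,
so thermal_zone_device_update(tz) is called: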
if (atomic_cmpxchg(&tz->need_update, 1, 0))
thermal_zone_device_update(tz);
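The compare-and-exchange semantics described above can be reproduced in user space with C11 atomics. This is only an analogy: demo_cmpxchg is a hypothetical helper I wrote on top of atomic_compare_exchange_strong so that it returns the old value, matching the return convention described for atomic_cmpxchg().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/*
 * If *v equals old_val, store new_val; otherwise leave *v unchanged.
 * Either way, return the value *v held before the call.
 */
static int demo_cmpxchg(atomic_int *v, int old_val, int new_val)
{
    int expected = old_val;

    assert(v != NULL);
    /* On failure, C11 writes the current value of *v into expected. */
    atomic_compare_exchange_strong(v, &expected, new_val);
    return expected;
}

/* Consume a "needs update" flag exactly once: returns 1 only for the caller that clears it. */
static int demo_take_update_flag(atomic_int *need_update)
{
    return demo_cmpxchg(need_update, 1, 0) == 1;
}
```

Used as a flag, this means a pending update is picked up by exactly one caller, and a second check right afterwards sees 0 and does nothing.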
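//Inside thermal_zone_device_update(tz):
update_temperature() runs first:
--thermal_zone_get_temp(tz, &temp) -- tz->ops->get_temp(tz, temp) reads the current temperature
It saves the previous reading before storing the new one:
tz->last_temperature = tz->temperature;
Each trip point is then processed in turn; this is where the governor gets invoked: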
for (count = 0; count < tz->trips; count++) handle_thermal_trip(tz, count);
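handle_thermal_trip() first calls tz->ops->get_trip_type(tz, trip, &type); to get the type of the trip point.
Depending on the type, it then calls handle_critical_trips(tz, trip, type); for critical and hot trip points, or handle_non_critical_trips(tz, trip, type); for the others. The latter is the path that invokes the governor's throttle callback.
After a trip point has been handled, monitor_thermal_zone(tz) is called to restart the monitor.
Before looking at monitor_thermal_zone, here are the zone device members it uses:
passive: 1 if you've crossed a passive trip point, 0 otherwise. It becomes 1 once a passive trip point is crossed, and it was cleared to 0 by the reset above.
passive_delay: number of milliseconds to wait between polls when performing passive cooling. This is the polling interval while passive cooling is active.
polling_delay: number of milliseconds to wait between polls when checking whether trip points have been crossed (0 for interrupt driven systems). This is the polling interval for routine checks.
monitor_thermal_zone picks a delay from these three members (passive_delay when passive is set, otherwise polling_delay, which may be 0) and passes it to thermal_zone_device_set_polling: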
static void thermal_zone_device_set_polling(struct thermal_zone_device *tz,
                                            int delay)
{
    if (delay > 1000)
        /*
         * Long interval: queue tz->poll_queue on system_freezable_wq after the
         * delay. Precise timing does not matter above one second, so the
         * timeout is coarsened with round_jiffies().
         */
        mod_delayed_work(system_freezable_wq, &tz->poll_queue,
                         round_jiffies(msecs_to_jiffies(delay)));
    else if (delay)
        /* short interval: queue the work with the exact delay */
        mod_delayed_work(system_freezable_wq, &tz->poll_queue,
                         msecs_to_jiffies(delay));
    else
        /* a delay of 0 cancels the polling work */
        cancel_delayed_work(&tz->poll_queue);
}
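The two decisions involved here, which delay to use and how to schedule it, can be modelled without any kernel machinery. The sketch below is a user-space illustration under my own reading of the three members described above; demo_pick_delay, demo_classify_delay, and the enum are hypothetical names, and no work is actually queued.

```c
#include <assert.h>

/* What the polling code would do with a given delay (in milliseconds). */
enum demo_action {
    DEMO_CANCEL,         /* delay == 0: stop polling */
    DEMO_QUEUE_EXACT,    /* 0 < delay <= 1000: queue with the exact delay */
    DEMO_QUEUE_ROUNDED   /* delay > 1000: queue with a rounded timeout */
};

/* Choose the delay: passive_delay while passive cooling is active, else polling_delay. */
static int demo_pick_delay(int passive, int passive_delay, int polling_delay)
{
    assert(passive_delay >= 0 && polling_delay >= 0);
    if (passive)
        return passive_delay;
    if (polling_delay)
        return polling_delay;
    return 0;            /* interrupt-driven zone: no polling */
}

/* Mirror the three branches of thermal_zone_device_set_polling. */
static enum demo_action demo_classify_delay(int delay)
{
    if (delay > 1000)
        return DEMO_QUEUE_ROUNDED;
    else if (delay)
        return DEMO_QUEUE_EXACT;
    return DEMO_CANCEL;
}
```

For example, a zone configured with passive_delay 250 and polling_delay 2000 would be polled with a rounded 2000 ms timeout in normal operation and with an exact 250 ms delay once a passive trip point has been crossed.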
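Now for what the delayed work itself does:
static void thermal_zone_device_check(struct work_struct *work)
{
    /* recover the enclosing zone structure from the work item */
    struct thermal_zone_device *tz = container_of(work, struct thermal_zone_device, poll_queue.work);

    /*
     * The same update function as above: read and store the temperature, run
     * the governor for each trip point, restart the monitor, and set polling
     * again, so the work runs once more after the chosen delay
     * (passive_delay or polling_delay). The cycle repeats indefinitely.
     */
    thermal_zone_device_update(tz);
}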
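The container_of step is what lets a work callback find its zone, so it is worth seeing in isolation. Below is a user-space sketch with a simplified macro built on offsetof; the demo_* structures are hypothetical and only imitate the nesting of poll_queue.work inside the zone structure.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space version of the container_of idea (no type checking). */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct demo_work {
    int pending;
};

struct demo_delayed_work {
    struct demo_work work;   /* the embedded work item */
    int timer;               /* placeholder for the timer part */
};

struct demo_tz {
    int id;
    struct demo_delayed_work poll_queue;
};

/* Given a pointer to the embedded work item, recover the enclosing zone. */
static struct demo_tz *demo_tz_from_work(struct demo_work *work)
{
    assert(work != NULL);
    return demo_container_of(work, struct demo_tz, poll_queue.work);
}
```

The callback only receives the address of the embedded member, and subtracting the member's offset gives back the address of the structure that contains it.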