Fairness in A.I.
Being good is easy, what is difficult is being just. - Victor Hugo
Introduction
The prevalence of A.I. systems is no longer a new thing: from product and movie recommendations to taxi-hailing services, they are present everywhere, and as time progresses their adoption and popularity keep increasing. Fairness is the absence of any prejudice or favoritism toward an individual or a group based on their inherent or acquired characteristics; hence a system that is not fair will be biased toward a certain kind of individual.
Problems with unfair AI
There are many famous cases that underscore the importance of fairness in A.I. systems. One recent paper, ‘Imperfect ImaGANation: Implications of GANs Exacerbating Biases on Facial Data Augmentation and Snapchat Selfie Lenses’, highlights exactly this. It states: ‘In this paper, we show that in settings where data exhibits bias along some axes (eg. gender, race), failure modes of Generative Adversarial Networks (GANs) exacerbate the biases in the generated data.’ Many top researchers have been speaking about biases, and there is a lot of active research going in this direction.
COMPAS predicted whether or not a defendant was likely to re-offend. The system discriminated to some extent against African American defendants as compared to others. The issue was biased data, but the system was also not very transparent, which exacerbated the problem. A later study showed its accuracy to be merely around 65 percent, comparable to predictions made by non-experts. It was also found later that the risk could be predicted without using sensitive features such as race and gender. If COMPAS had been built with a fairness standpoint in mind, such a blunder could have been avoided, especially in such a critical domain.
Another very interesting area where bias was discovered is healthcare. The paper titled ‘Dissecting racial bias in an algorithm used to manage the health of populations’ found that a particular algorithm, which helps hospitals and insurance agencies identify which patients will benefit most from high-end healthcare services targeted at high-risk individuals, was trained on data in which the ratio of black patients to white patients was roughly 7:1. Even if this reflects reality, such imbalanced data problems need to be mitigated in general.
Biases
Going a bit deeper and simplifying the problem at hand, there are two classes of biases: those that emerge algorithmically and those that come from the data.
Commonly occurring data-related biases are Historical Bias, Representation Bias, Measurement Bias, Evaluation Bias, Aggregation Bias, Population Bias, Sampling Bias, Content Production Bias, Temporal Bias, Popularity Bias, Observer Bias, and Funding Bias. One good example to showcase bias is Simpson’s Paradox, where the characteristics of subgroups look very different from those of the aggregated whole. This means we should only aggregate data at a level where there is sufficient similarity between the components of the group, which is often difficult to accomplish. So we need to decide when and how much to aggregate not on the basis of convenience, but depending on how the data demands to be segregated.
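As a concrete illustration, the following minimal sketch (synthetic, made-up numbers) shows Simpson’s Paradox in action: a trend that holds in every subgroup reverses once the data is aggregated.

```python
# A minimal, synthetic illustration of Simpson's Paradox with pandas.
# All numbers are invented for illustration only.
import pandas as pd

df = pd.DataFrame({
    "group":    ["A"] * 8 + ["B"] * 8,
    "subgroup": ["easy"] * 6 + ["hard"] * 2 + ["easy"] * 2 + ["hard"] * 6,
    "success":  [1, 1, 1, 1, 1, 0, 0, 0,   1, 1, 1, 0, 0, 0, 0, 0],
})

# Aggregated view: group A looks clearly better than group B ...
print(df.groupby("group")["success"].mean())

# ... but conditioning on the subgroup reverses the conclusion:
# group B does better within *both* subgroups.
print(df.groupby(["subgroup", "group"])["success"].mean())
```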
When it comes to algorithms, we might classify types of discrimination into direct, indirect, systemic, statistical, explainable, and unexplainable. One good example of systemic discrimination is Amazon’s AI hiring algorithm, which turned out to be somewhat sexist in nature.
The ways to check for the presence of biases and discriminatory behaviors in these systems depend on the case at hand. Which bias to detect first, and which kind of discrimination to address first, is purely a matter of prioritization for the specific application. But one should try to rule out as many biases as possible from the algorithms and data, and should try to maintain a healthy balance between effectiveness and fairness.
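As one minimal sketch of such a check, assuming we have a classifier’s binary predictions and a binary sensitive attribute at hand, we can compare positive-prediction rates per group (demographic parity / disparate impact). The function names and data below are purely illustrative:

```python
import numpy as np

def group_selection_rates(y_pred, sensitive):
    """Positive-prediction rate for each value of a sensitive attribute."""
    y_pred, sensitive = np.asarray(y_pred), np.asarray(sensitive)
    return {g: y_pred[sensitive == g].mean() for g in np.unique(sensitive)}

def disparate_impact(y_pred, sensitive, unprivileged, privileged):
    """Ratio of selection rates; values far from 1.0 flag potential bias
    (the informal '80% rule' uses 0.8 as a common threshold)."""
    rates = group_selection_rates(y_pred, sensitive)
    return rates[unprivileged] / rates[privileged]

# Hypothetical predictions and group labels, for illustration only.
y_pred    = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
sensitive = ["f", "f", "f", "f", "f", "m", "m", "m", "m", "m"]
print(group_selection_rates(y_pred, sensitive))
print(disparate_impact(y_pred, sensitive, unprivileged="f", privileged="m"))
```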
Tools for Fairness
There are many interesting approaches by industry leaders to fairness in A.I. I would like to mention a few very interesting ideas in that sphere:
ML-fairness-gym relies on the foundational idea of understanding the long-term impact of ML decision systems through simulation, thereby trying to create a replica of a socially dynamic system. The simulation framework can also be extended to multi-agent interaction environments. Papers such as ‘Delayed Impact of Fair Machine Learning’ tell us how important it is to consider dynamic and temporal factors.
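To make the simulation idea concrete, here is a toy feedback loop, not the ml-fairness-gym API itself, with invented numbers, showing how a threshold policy can lock an initially disadvantaged group out over time:

```python
# Toy simulation of delayed impact: a decision policy feeds back into the
# population it scores, so a static notion of fairness can still cause
# long-term divergence. All numbers are invented for illustration.
import random

random.seed(0)
score = {"group_a": 0.65, "group_b": 0.55}   # average repayment probability
THRESHOLD = 0.6                              # lend only above this score

for step in range(10):
    for group, p in score.items():
        if p >= THRESHOLD:                   # group keeps getting loans ...
            repaid = random.random() < p
            # ... and successful repayment slowly improves its score.
            score[group] = min(1.0, p + (0.02 if repaid else -0.03))
        else:                                # group is shut out entirely,
            score[group] = max(0.0, p - 0.01)  # so it can never recover.

print(score)  # the initially disadvantaged group drifts further behind
```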
AI Fairness 360 by IBM is an open-source toolkit for addressing the issue of fairness in data and algorithms. It implements techniques described in a number of research papers and provides bias detection, bias mitigation, and bias explainability tools.
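A rough sketch of the detect-then-mitigate workflow AIF360 supports might look like the following; the tiny dataframe and column names are hypothetical, and constructor details can differ slightly between library versions:

```python
# Sketch of AIF360 usage on a tiny hypothetical dataframe.
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.algorithms.preprocessing import Reweighing

df = pd.DataFrame({
    "sex":   [0, 0, 0, 0, 1, 1, 1, 1],   # 0 = unprivileged, 1 = privileged
    "score": [0.2, 0.4, 0.6, 0.8, 0.3, 0.5, 0.7, 0.9],
    "label": [0, 0, 1, 1, 0, 1, 1, 1],
})
data = BinaryLabelDataset(df=df, label_names=["label"],
                          protected_attribute_names=["sex"],
                          favorable_label=1, unfavorable_label=0)

priv, unpriv = [{"sex": 1}], [{"sex": 0}]

# Bias detection: compare favorable-outcome rates between groups.
metric = BinaryLabelDatasetMetric(data, unprivileged_groups=unpriv,
                                  privileged_groups=priv)
print("statistical parity difference:", metric.statistical_parity_difference())
print("disparate impact:", metric.disparate_impact())

# Bias mitigation: reweigh instances so groups are balanced before training.
reweighed = Reweighing(unprivileged_groups=unpriv,
                       privileged_groups=priv).fit_transform(data)
print("instance weights after reweighing:", reweighed.instance_weights)
```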
FATE: Fairness, Accountability, Transparency, and Ethics in AI. In this offering by Microsoft we get very efficient tools such as assessment visualization dashboards and bias-mitigation algorithms. It is a good tool for comparing the trade-offs between the performance and the fairness of a system.
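A short, hedged sketch using Fairlearn (the open-source toolkit linked in the references for this Microsoft work) to compare performance and selection rates across groups; the labels and predictions are invented, and the MetricFrame API may differ slightly between versions:

```python
# Compare a model's behavior across groups with Fairlearn's MetricFrame.
from fairlearn.metrics import (MetricFrame, selection_rate,
                               demographic_parity_difference)
from sklearn.metrics import accuracy_score

y_true    = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred    = [1, 0, 1, 0, 0, 1, 1, 0]
sensitive = ["f", "f", "f", "f", "m", "m", "m", "m"]

frame = MetricFrame(metrics={"accuracy": accuracy_score,
                             "selection_rate": selection_rate},
                    y_true=y_true, y_pred=y_pred,
                    sensitive_features=sensitive)
print(frame.by_group)       # per-group accuracy and selection rate
print(frame.difference())   # largest gap between groups for each metric

# A single summary number for the performance/fairness trade-off discussion.
print(demographic_parity_difference(y_true, y_pred,
                                    sensitive_features=sensitive))
```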
Even the EU Commission’s white paper on artificial intelligence focuses on including fairness. And although many algorithms appear to be black boxes at the moment, active research is moving toward understanding them even better and is making great progress. With the increase in the explainability of these algorithms, it will become relatively much easier to track down biases and make the necessary interventions to ensure fairness. Papers such as ‘Inceptionism: Going deeper into neural networks’, ‘The building blocks of interpretability’, ‘Feature visualization’, and many more show progress in this direction. When it comes to explainable AI, there are many tools available to us right now that we can use to understand how even very complicated, black-box algorithms work. Tools such as LIME and SHAP, the use of simpler surrogate models, and feature-importance graphs are all very helpful here. For more advanced, unstructured-data applications such as deep learning, techniques such as Grad-CAM and attention visualization have also become popular for interpretability.
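As a small illustration of the “simpler surrogate model” idea mentioned above, one can fit a shallow decision tree to a black-box model’s predictions and inspect what drives them. This sketch uses only scikit-learn and synthetic data:

```python
# Global surrogate: approximate a black-box model with a shallow decision tree
# and inspect which features drive its decisions (synthetic data only).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=500, n_features=6, n_informative=3,
                           random_state=0)

black_box = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Train the surrogate on the black box's *predictions*, not the true labels,
# so the tree explains the model rather than the data.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))

print("fidelity to black box:", surrogate.score(X, black_box.predict(X)))
print("surrogate feature importances:", surrogate.feature_importances_)
print(export_text(surrogate, feature_names=[f"x{i}" for i in range(6)]))
```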
Practices to ensure fair AI
Google also recommends certain fair practices, which are fundamentally based on two major ideas: ensuring transparency in how the algorithm makes its decisions, and forming teams that are diverse in nature. The main idea is to capture many varied views about the data and algorithms so that the issue of bias can be attacked from all corners. In addition to having people from different domains on the team, the open-source community can also serve as an extended team, and community groups are useful for creating awareness and ensuring transparency. Model drift as well as the post-deployment performance of the systems should also be monitored vigilantly. There should be extended studies of the origin of the data, the data collection methods, preprocessing, post-processing, and labeling, the possible presence of sensitive fields such as race, gender, and religion, and whether the data is diverse and balanced enough in terms of all the classes present.
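A minimal data-audit sketch along those lines, with hypothetical column names, might flag sensitive fields and report label balance per group before any training happens:

```python
# Minimal pre-training data audit: flag sensitive columns and report
# class balance overall and per group. Column names are hypothetical.
import pandas as pd

SENSITIVE = {"race", "gender", "religion", "age"}

def audit(df: pd.DataFrame, label: str) -> None:
    present = SENSITIVE & set(df.columns)
    print("sensitive columns present:", present or "none")
    print("label balance:\n", df[label].value_counts(normalize=True))
    for col in present:
        # Per-group label rates can reveal representation and historical bias early.
        print(f"label rate by {col}:\n", df.groupby(col)[label].mean())

df = pd.DataFrame({"gender": ["f", "f", "m", "m", "m", "m"],
                   "income": [40, 55, 60, 42, 75, 38],
                   "label":  [0, 1, 1, 0, 1, 0]})
audit(df, label="label")
```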
A new ‘Broader Impact’ section has been introduced in recent papers, covering the ethical aspects of how the algorithms in those papers are used. This shows the increasing sensitivity of researchers toward developing not just more accurate systems but also fair ones. One of the most famous deep learning and machine learning conferences, NeurIPS 2020, published guidelines for the Broader Impact section as follows:
‘Authors are required to include a statement of the broader impact of their work, including its ethical aspects and future societal consequences. The authors should discuss both positive and negative outcomes if any. For instance, authors should discuss who may benefit from this research, who may be put at disadvantage from this research, what are the consequences of the failure of the system, whether the task/method leverages biases in the data. If authors believe this is not applicable to them, authors can simply state this.’
Conclusion
In conclusion, we can see that the research world, as well as the engineering world, is now taking this problem of unfairness in AI seriously, and good work is coming out. I feel that in the future it will become almost a prerequisite for these AI systems to meet a standard bar of fairness, both in terms of the training data used and the algorithms used. One troublesome aspect is tracking down faulty biases in increasingly complex systems and in the huge amounts of data on which they are trained. One good example is GPT-3, a 175-billion-parameter language model (deep learning system) trained on a corpus as large as 2000 GB. If progress in understanding these systems and in research on fairness keeps pace with the development of new methods, then the future is safe and we can look forward to a safer and fairer space. In the future we might see a dedicated body, consisting of experts from diverse fields, that ensures the fairness of these systems, something like the FDA. This might also require the development of standardized procedures, before it is too late, to check for biases and other ethical issues, and these procedures should scale well to huge data sources.
https://www.reuters.com/article/us-amazon-com-jobs-automation-insight/amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK08G
https://arxiv.org/pdf/1803.04383.pdf
https://arxiv.org/pdf/1908.09635.pdf
https://www.microsoft.com/en-us/research/publication/fairlearn-a-toolkit-for-assessing-and-improving-fairness-in-ai/
https://blog.tensorflow.org/2019/12/fairness-indicators-fair-ml-systems.html
https://www.ibm.com/blogs/research/2018/09/ai-fairness-360/
https://ai.googleblog.com/2020/02/ml-fairness-gym-tool-for-exploring-long.html
https://advances.sciencemag.org/content/4/1/eaao5580
http://www.crj.org/assets/2017/07/9_Machine_bias_rejoinder.pdf
https://science.sciencemag.org/content/366/6464/447
Translated from: https://towardsdatascience.com/fairness-in-a-i-5d3ceaaf649