Twice I’ve tried to realistically present the performance of the algorithm. Twice my paper was rejected because of “unfinished methods” or “disappointing results”. There’s a whole culture of “rounding-up”, and trying to do the evaluations fairly just gives you trouble. When fair evaluations get rejected and rounders-up pass through, what do you do?
Anonymous’s story is surely common.
On any given paper, there is an incentive to “cheat” with some of the above methods. This can be hard to resist when so much rides on a paper acceptance _and_ some of the above cheats are not easily detected. Nevertheless, it should be resisted because “cheating” of this sort inevitably fools you as well as others. Fooling yourself in research is a recipe for a career that goes nowhere. Your techniques simply won’t apply well to new problems, you won’t be able to tackle competitions, and ultimately you won’t even trust your own intuition, which is fatal in research.
My best advice for anonymous is to accept that life is difficult here. Spend extra time testing on many datasets rather than a few. Spend extra time thinking about what makes a good algorithm, or not. Take the long view and note that, in the long run, the quantity of papers you write is not important, but rather their level of impact. Using a “cheat” very likely subverts long term impact.
How about an index of negative results in machine learning? There’s a Journal of Negative Results in other domains: Ecology & Evolutionary Biology, Biomedicine, and there is the Journal of Articles in Support of the Null Hypothesis. A section on negative results in machine learning conferences? This kind of information is very useful in preventing people from taking pathways that lead nowhere: if one wants to classify an algorithm as good/bad, one certainly benefits from unexpectedly bad examples too, not just unexpectedly good examples.
I visited the workshop on negative results at NIPS 2002. My impression was that it did not work well.
The difficulty with negative results in machine learning is that they are too easy. For example, there are a plethora of ways to say that “learning is impossible (in the worst case)”. On the applied side, it’s still common for learning algorithms to not work on simple-seeming problems. In this situation, positive results (this works) are generally more valuable than negative results (this doesn’t work).
This discussion reminds me of some interesting research on “anti-learning”, by Adam Kowalczyk. This research studies (empirically and theoretically) machine learning algorithms that yield good performance on the training set but worse than random performance on the independent test set.
Hmm, rereading this post. What do you mean by “brittle”? Why is mutual information brittle?
Standard deviation of loss across the CV folds is not a bad summary of variation in CV performance. I’m not sure one can just reject a paper where the authors bothered to disclose the variation, rather than just plopping out the average. Standard error carries some Gaussian assumptions, but it is still a valid summary. The distribution of loss is sometimes quite close to being Gaussian, too.
As for significance, I came up with the notion of CV-values that measure how often method A is better than method B in a randomly chosen fold of cross-validation replicated very many times.
What I mean by brittle: Suppose you have a box which takes some feature values as input and predicts some probability of label 1 as output. You are not allowed to open this box or determine how it works other than by this process of giving it inputs and observing outputs.
Let x be an input.
Let y be an output.
Assume (x,y) are drawn from a fixed but unknown distribution D.
Let p(x) be a prediction.
For classification error I(|y – p(x)| < 0.5) you can prove a theorem of the rough form:
forall D, with high probability over the draw of m examples independently from D,
expected classification error rate of the box with respect to D is bounded by a function of the observations.
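One standard instance of such a theorem (a Hoeffding-style test-set bound; this is my gloss on the “rough form” above, not a quote from anyone here): with probability at least $1-\delta$ over the draw of the $m$ examples,

    \mathrm{err}_D(\mathrm{box}) \;\le\; \widehat{\mathrm{err}}_S(\mathrm{box}) + \sqrt{\frac{\ln(1/\delta)}{2m}},

where $\mathrm{err}_D$ is the true error rate of the box under $D$ and $\widehat{\mathrm{err}}_S$ is its observed error rate on the $m$ examples. Such a statement is possible for every $D$ exactly because the 0/1 loss is bounded.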
What I mean by “brittle” is that no statement of this sort can be made for any unbounded loss (including log-loss which is integral to mutual information and entropy). You can of course open up the box and analyze its structure or make extra assumptions about D to get a similar but inherently more limited analysis.
The situation with leave-one-out cross validation is not so bad, but it’s still pretty bad. In particular, there exists a very simple learning algorithm/problem pair with the property that the leave-one-out estimate has the variance and deviations of a single coin flip. Yoshua Bengio and Yves Grandvalet in fact proved that there is no unbiased estimator of variance. The paper that I pointed to above shows that for K-fold cross validation on m examples, all moments of the deviations might only be as good as on a test set of size $m/K$.
I’m not sure what a ‘valid summary’ is, but leave-one-out cross validation can not provide results I trust, because I know how to break it.
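For concreteness, here is a minimal sketch (my own illustration of the kind of construction alluded to above, not code from anyone in this thread) where the leave-one-out estimate behaves exactly like a single coin flip. The “learning algorithm” ignores the features and predicts the parity of the sum of its training labels; labels are fair coin flips independent of x, so every classifier it outputs has true error rate exactly 0.5, yet the LOO estimate is always exactly 0 or exactly 1:

    import random

    def loo_estimate(labels):
        # leave-one-out error of the parity-of-training-labels predictor
        m = len(labels)
        total = sum(labels)
        errors = 0
        for y in labels:
            prediction = (total - y) % 2   # parity of the held-in labels
            errors += (prediction != y)
        return errors / m

    random.seed(0)
    estimates = [loo_estimate([random.randint(0, 1) for _ in range(100)])
                 for _ in range(10)]
    print(estimates)   # every entry is 0.0 or 1.0; the true error rate is 0.5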
I have personally observed people using leave-one-out cross validation with feature selection to quickly achieve a severe overfit.
Thanks for the explanation of brittleness! This is a problem with log-loss, but I’d say that it is not a problem with mutual information. Mutual information has well-defined upper bounds. For log-loss, you can put a bound into effect by mixing the prediction with a uniform distribution over y, bounding the maximum log-loss in a way that’s analogous to the Laplace probability estimate. While I agree that unmixed log-loss is brittle, I find classification accuracy noisy.
A reasonable compromise is Brier score. It’s a proper loss function (so it makes good probabilistic sense), and it’s a generalization of classification error where the Brier score of a non-probabilistic classifier equals its classification error, but a probabilistic classifier can benefit from distributing the odds. So, the result you mention holds also for Brier score.
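As a small sketch (my own, using the definitions as I understand them from this thread) of the three losses being compared, for a binary problem: 0/1 error, log-loss with uniform mixing, and the (half-)Brier score.

    import math

    def zero_one(p, y):
        # classification error of thresholding the probability at 0.5
        return int((p >= 0.5) != (y == 1))

    def mixed_log_loss(p, y, eps=0.01):
        # mix the prediction with a uniform distribution over {0, 1}
        # so the maximum loss is bounded by -log(eps/2)
        q = (1 - eps) * p + eps * 0.5
        return -math.log(q if y == 1 else 1 - q)

    def half_brier(p, y):
        # 0.5 * squared error summed over both classes; equals the 0/1 error
        # when p is a hard 0/1 prediction
        return 0.5 * ((p - y) ** 2 + ((1 - p) - (1 - y)) ** 2)

    p, y = 0.999, 0   # a confident, wrong prediction
    print(zero_one(p, y), mixed_log_loss(p, y), half_brier(p, y))

Unmixed log-loss would blow up as p approaches 1 here; the mixed version stays bounded, and the Brier score stays in [0, 1].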
If I perform 2-replicated 5-fold CV of the NBC performance on the Pima indians dataset, I get the following [0.76 0.75 0.87 0.76 0.74 0.77 0.79 0.72 0.78 0.82 0.81 0.79 0.73 0.74 0.82 0.79 0.74 0.77 0.83 0.75 0.79 0.73 0.79 0.80 0.76]. Of course, I can plop out the average of 0.78. But it is nicer to say that the standard deviation is 0.04, and summarize the result as 0.78 +- 0.04. The performance estimate is a random quantity too. In fact, if you perform many replications of cross-validation, the classification accuracy will have a Gaussian-like shape too (a bit skewed, though).
I too recommend against LOO, for the simple reason that the above empirical summaries are often awfully strange.
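A sketch of the replicated cross-validation summary described above (this assumes scikit-learn and a stand-in dataset, since I am not reproducing the exact NBC/Pima setup):

    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import RepeatedKFold, cross_val_score
    from sklearn.naive_bayes import GaussianNB

    X, y = load_breast_cancer(return_X_y=True)           # stand-in dataset
    cv = RepeatedKFold(n_splits=5, n_repeats=2, random_state=0)
    scores = cross_val_score(GaussianNB(), X, y, cv=cv)  # 10 fold accuracies
    print("%.2f +- %.2f" % (scores.mean(), scores.std()))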
Very very interesting. However, I still feel (but would love to be convinced otherwise) that when the dataset is small and no additional data can be obtained, LOO-CV is the best among the (admittedly non-ideal) choices. What do you suggest as a practical alternative for a small dataset?
I’m not convinced by your observation about people using LOO-CV with feature selection to overfit. Isn’t this just a problem with reusing the same validation set multiple times? Even if I use a completely separately drawn validation set, which Bengio and Grandvalet show yields an unbiased estimate of the variance of the prediction error, I can still easily overfit the validation set when doing feature selection, right?
This is my first post on your blog. Thanks so much for putting it up — a very nice resource!
Aleks’s technique for bounding log loss by wrapping the box in a system that mixes with the uniform distribution has a problem: it introduces perverse incentives for the box. One reason why people consider log loss is that the optimal prediction is the probability. When we mix with the uniform distribution, this is no longer true. Mixing with the uniform distribution shifts all probabilistic estimates towards 0.5, which means that if the box wants to minimize log loss, it should make an estimate p such that after mixing, you get the actual probability.
David McAllester advocates truncation as a solution to the unboundedness. This has the advantage that it doesn’t create perverse incentives over all nonextreme probabilities.
Even when we swallow the issues of bounding log loss, rates of convergence are typically slower than for classification, essentially because the dynamic range of the loss is larger. Thus, we can expect log loss estimates to be more “noisy”.
Before trusing mutual information, etc…, I want to see rate of convergence bounds of the form I mentioned above.
I’m not sure what Brier score is precisely, but just using L(p,y)=(p-y)^2 has all the properties mentioned.
I consider reporting standard deviation of cross validation to be problematic. The basic reason is that it’s unclear what I’m supposed to learn. If it has a small deviation, this does not mean that I can expect the future error rate on i.i.d. samples to be within the range of the +/-. It does not mean that if I cut the data in another way (and the data is i.i.d.), I can expect to get results in the same range. There are specific simple counterexamples to each of these intuitions. So, while reporting the range of results you see may be a ‘summary’, it does not seem to contain much useful information for developing confidence in the results.
One semi-reasonable alternative is to report the confidence interval for a Binomial with m/K coin flips, which fits intuition (1), for the classifier formed by drawing randomly from the set of cross-validated classifiers. This won’t leave many people happy, because the intervals become much broader.
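A sketch of that alternative (my own, using scipy as an assumption): an exact Clopper-Pearson interval treating one fold’s held-out examples as m/K coin flips.

    from scipy.stats import beta

    def clopper_pearson(errors, n, alpha=0.05):
        # exact (conservative) binomial confidence interval on the error rate
        lower = beta.ppf(alpha / 2, errors, n - errors + 1) if errors > 0 else 0.0
        upper = beta.ppf(1 - alpha / 2, errors + 1, n - errors) if errors < n else 1.0
        return lower, upper

    m, K = 768, 10               # illustrative dataset size and fold count
    n = m // K                   # effective test-set size of a single fold
    errors = round(0.22 * n)     # an illustrative observed error count
    print(clopper_pearson(errors, n))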
The notion that cross validation errors are “gaussian-like” is also false in general, on two counts:
This is an important issue because it’s not always obvious from experimental results (and intuitions derived from experimental results) whether the approach works. The math says that if you rely on leave-one-out cross-validation in particular you’ll end up with bad inuitions about future performance. You may not encounter this problem on some problems, but the monsters are out there.
For rif’s questions — keep in mind that I’m only really considering methods of developing confidence here. I’m ok with people using whatever ugly nasty hacks they want in producing a good predictor. You are correct about the feature selection example being about using the same validation set multiple times. (Bad!) The use of leave-one-out simply aggravated the effect of this with respect to using a holdout set because it’s easier to achieve large deviations from the expectation on a leave-one-out estimate than on a holdout set.
Developing good confidence on a small dataset is a hard problem. The simplest solution is to accept the need for a test set even though you have few examples. In this case, it might be worthwhile to compute very exact confidence intervals (code here). Doing K-fold cross validation on m examples and using confidence intervals for m/K coin flips is better, but by an unknown (and variable) amount. The theory approach, which has never yet worked well, is to very carefully use the examples for both purposes. A blend of these two approaches can be helpful, but the computation is a bit rough. I’m currently working with Matti Kääriäinen on seeing how well the progressive validation approach can be beaten into shape.
And of course we should remember that all of this is only meaningful when the data is i.i.d, which it often clearly is not.
I think we have a case where the assumptions of applied machine learners differ from the assumptions of the theoretical machine learners. Let’s hash it out!
==
* (Half-)Brier score is 0.5(p-y)^2, where p and y are vectors of probabilities (p-predicted, y-observed).
* A side consequence of mixing is also truncation; but mixing is smooth, whereas truncation results in discontinuities of the gradient. There is a good justification for mixing: if you see that you misclassify in 10% of the cases on the unseen test data, you can anticipate similar error in the future, and calibrate the predictions by mixing with the uniform distribution.
* Standard deviation of the CV results is a foundation for bias/variance decomposition and a tremendous amount of work in applied statistics and machine learning. I wouldn’t toss it away so lightly, and especially not based on the argument of non-independence of folds. The purpose of non-independence of folds in the first place is that you get a better estimate of the distribution over all the training/test splits of a fixed proportion (one could say that the split is chosen by i.i.d., not the instances). You get a better estimate with 10-fold CV than by picking 10 train/test splits at random.
* Both binomial and Gaussian models of the error distribution are just models. Neither of them is ‘true’, but they are based on slightly different assumptions. I generally look at the histogram and eyeball it for gaussianity, as I have done in my example. The fact that it is a skewed distribution (with the truncated hump at ~85%) empirically invalidates the binomial error model too. One can compute the first two moments as a “finite”, informative summary even if the underlying distribution has more of them.
I am not advocating ‘tossing’ cross-validation. I am saying that caution should be exercised in trusting it.
Do you have a URL for this other analysis?
You are right to be skeptical about models, but the ordering of skepticism seems important. Models which make more assumptions (and in particular which makes assumptions that are clearly false) should be viewed with more skepticism.
What is the standard deviation of cross validation errors supposed to describe? I listed and dismissed a couple possibilities, so now I’m left without an understanding.
I’d like to follow up a bit on your comment that “It’s easier to achieve large deviations from the expectation on a leave-one-out estimate than on a holdout set.” I was not familiar with this fact. Could you discuss this in more detail, or provide a reference that would help me follow this up? Quite interesting.
I didn’t mean to imply that you’d disagree with cross-validation in general. The issue at hand is whether the standard deviation of CV errors is useful or not. I can see two reasons for why one can be unhappy about it:
a) It can happen that you get accuracy of 0.99 +- 0.03. What could that mean? The standard deviation is a summary. If you provide a summary consisting of the first two moments, it does not mean that you believe in the Gaussian model – of course those statistics are not sufficient. It is a summary that roughly describes the variance of the classifier, inasmuch as the mean accuracy indicates its bias.
b) The instances in a training and test set are not i.i.d. Yes, but the above summary relates to the question: “Given a randomly chosen training/test 9:1 split of instances, what can we say about the classifier’s accuracy on the test set?” This is a different question than “Given a randomly chosen instance, what will be the classifier’s expected accuracy?”
Several people have a problem with b) and use bootstrap instead of cross-validation in bias/variance analysis. Still, I don’t see a problem with the formulation, if one doesn’t attempt to perceive CV as an approximation to making statements about i.i.d. samples.
rif – see today’s post under “Examples”.
Aleks, I regard the 0.99 +/- 0.03 issue as a symptom that the wrong statistics are being used (i.e. assuming gaussianity on obviously non-gaussian draws).
I’m not particularly interested in “Given a randomly chosen training/test 9:1 split of instances, what can we say about the classifier’s accuracy on the test set?” because I generally think the goal of learning is doing well on future examples. Why should I care about this?
Reporting 0.99 +- 0.03 does not imply that one who wrote it believes that the distribution is Gaussian. Would you argue that reporting 0.99 +- 0.03 is worse than just reporting 0.99? Anyone surely knows that the classification accuracy cannot be more than 1.0, it would be most arrogant to assume such ignorance.
CV is the de facto standard method of evaluating classifiers, and many people trust the results that come out of this. Even if I might not like this approach, it is a standard, it’s an experimental bottom line. “Future examples” are something you don’t have, something you can only make assumptions about. Cross-validation and learning curves employ the training data to empirically demonstrate the stability and convergence of the learning algorithm on what effectively *is* future data for the algorithm, under the weak assumption of permutability of the training data. Permutability is a weaker assumption than iid. My main problem with most applications of CV is that people don’t replicate the cross-validation on multiple assignments to folds, something that’s been pointed out quite nicely by, e.g.,
Estimating Replicability of Classifier Learning Experiments. ICML, 2004.
The problem with LOO is that you *cannot* perform multiple replications.
If your assumptions grow from iid, you shouldn’t use cross-validation, it’s a) not solving your problem, and b) you could get better results with an evaluation method that assumes more. It is unfair to criticize CV on these grounds. One can grow a whole different breed of statistics based on permutability and training/test splitting.
Reporting 0.99 +- 0.03 does mean that the inappropriate statistics are being used.
I am not trying to claim anything about the belief of the person making the application (and certainly not trying to be arrogant).
I have a problem with reporting the +/- 0.03. It seems that it has no interesting interpretation, and the obvious statistical interpretation is simply wrong.
The standard statistical “meaning” of 0.99 +- 0.03 is a confidence interval about an observation. A confidence interval [lower_bound(observation), upper_bound(observation)] has the property that, subject to your assumptions, it will contain the true value of some parameter with high probability over the random draw of the observation. The parameter I care about is the accuracy, the probability that the classifier is correct. Since the true error rate can not go above 1, this confidence interval must be constructed with respect to the wrong assumptions about the observation generating process. This isn’t that damning though – what’s really hard to swallow is that this method routinely results in intervals which are much narrower than the standard statistical interpretation would suggest. In other words, it generates overconfidence.
> Would you argue that reporting 0.99 +- 0.03 is worse than just reporting 0.99?
Absolutely. 0.99 can be interpreted as an unbiased monte carlo estimate of the “true” accuracy. I do not have an interpretation of 0.03, and the obvious interpretations are misleading due to nongaussianity and nonindependence in the basic process. Using this obvious interpretation routinely leads to overconfidence which is what this post was about.
I don’t regard the distinction between “permutable” and “independent” as significant here, because de Finetti’s theorem says that all exchangeable (i.e. permutable) sequences can be thought of as i.i.d. samples conditioned on the draw of a hidden random variable. We do not care what the value of this hidden random variable is because a good confidence interval for accuracy works no matter what the data generation process is. Consequently, the ‘different breed’ you speak of will end up being the same breed.
Many people use cross validation in a way that I don’t disagree with. For example, tuning parameters might be reasonable. I don’t even have a problem with using cross validation error to report performance (except when this creates a subtle instance of “reproblem”). What seems unreasonable is making confidence interval-like statements subject to known-wrong assumptions. This seems especially unreasonable when there are simple alternatives which don’t make known-wrong assumptions.
I think you are correct: many other people (I would not say it’s quite “the” standard) try to compute (and report) confidence interval-like summaries. I think it’s harmful to do so because of the routine overconfidence this creates.
rif — Another reason LOO CV is bad is that it is asymptotically suboptimal. For example if you use Leave One Out cross-validation for feature selection, you might end up selecting a suboptimal subset, even with an infinite training sample. The neural-nets FAQ talks about it: http://www.faqs.org/faqs/ai-faq/neural-nets/part3/section-12.html
Experimentally, Ronny Kohavi and Breiman found independently that 10 is the best number of folds for CV.
The FAQ says “cross-validation is markedly superior [to split sample validation] for small data sets; this fact is demonstrated dramatically by Goutte (1997)”. (google scholar has the paper), but I’m not sure their conclusions extend beyond their Gaussian synthetic data.
I agree with you regarding the inappropriateness of +- notation, and I also agree about general overconfidence of confidence intervals. Over here it says: “LTCM’s loss in August 1998 was a -10.5 sigma-event on the firm’s risk model, and a -14 sigma-event in terms of the actual previous price movements.” Sometimes overfitting is very expensive: LTCM “lost” quite a few hundred million US$ (“lost” — financial transactions are largely a zero-sum game).
What if I had written 0.99(0.03), without implying that 0.03 is a confidence interval (because it is not)? It is quite rare in statistics to provide confidence intervals – usually one provides either the standard deviation of the distribution or the standard error of the estimate of the mean. Still, I consider the 0.03 a very useful piece of information, and I’m grateful to any author that is diligent enough to provide some information about the variation in the performance. I’d reject a paper that only provides the mean for a small dataset, or that didn’t perform multiply replicated experiments.
As far as I’m concerned, The Right Way of dealing with confidence intervals of cross-validated loss is to perform multiple replications of cross-validation and provide the scores at appropriate percentiles. My level of agreement with the binomial model is about at the same level as your agreement with the Gaussian model. Probability of error is meaningless: there are instances that you can almost certainly predict right, there are instances that you usually misclassify, and there are boundary instances where the predictions of the classifier vary, depending on the properties of the split. Treating all these groups as one would be misleading.
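A tiny sketch of that percentile summary (my own), applied to the twenty-five fold accuracies quoted earlier in this thread:

    import numpy as np

    scores = np.array([0.76, 0.75, 0.87, 0.76, 0.74, 0.77, 0.79, 0.72, 0.78,
                       0.82, 0.81, 0.79, 0.73, 0.74, 0.82, 0.79, 0.74, 0.77,
                       0.83, 0.75, 0.79, 0.73, 0.79, 0.80, 0.76])
    print(np.percentile(scores, [2.5, 50, 97.5]))   # empirical 95% range and median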
Regarding de Finetti, one has to be careful: there is a difference between finite and infinite exchangeability. The theorem goes from *infinite* exchangeability to iid. When you have an infinite set, there is no difference between forming a finite sample by sampling-with-replacement (bootstrap) versus sampling-without-replacement (cross-validation). When you have a finite set to sample from, it’s two different breeds.
As for assumptions, they are all wrong… But some are more agreeable than others.
0.99(0.03) is somewhat better, but I suspect people still interpret it as a confidence interval, even when you explicitly state that it is not.
Another problem is that I still don’t know why it’s interesting. You assert it’s very interesting, but can you explain why? How do you use it? Saying 0.99(0.03) seems semantically equivalent to saying “I achieved test set performance of 0.99 with variation 0.03 across all problems on the UCI database”, except not nearly as reassuring because the cross-validation folds do not encompass as much variation across real-world problems.
On Binomial vs. Gaussian model: the Binomial model (at least) has the advantage that it is not trivially disprovable.
On probability of error: it’s easy to criticize any small piece of information as incomplete. Nevertheless, we like small pieces of information because we can better understand and use them. “How often should I expect the classifier to be wrong in the future” seems like an interesting (if incomplete) piece of information to me. A more practical problem with your objection is that distinguishing between “always right”, “always wrong” and “sometimes right” examples is much harder, requiring more assumptions, than distinguishing error rate. Hence, such judgements will be more often wrong.
I had assumed you were interested in infinite exchangeability because we are generally interested in what the data tells us about future (not yet seen) events. Analysis which is only meaningful with respect to known labeled examples simply doesn’t interest me, in the same way that training error rate doesn’t interest me.
Why bother to write a paper at all? Why don’t you code stuff and throw it into e-market? There are forums, newsgroups, and selected “peers” for things that are incomplete and require some discussion.
No, 0.99(0.03) means 0.99 classification error across 90:10 training-test splits on a single data set. It is quite meaningless to try to assume any kind of average classification error across different data sets.
Regarding probability of error, if it’s easy to acquire this kind of information, why not do it?
Infinite exchangeability does not apply to a finite population. What do you do when I gather *all* the 25 cows from the farm and measure them? You cannot pretend that there are infinitely many cows on the farm. You can, however, wonder about the number of cows (2, 5, 10, 25?) you really need to measure to characterize all the 25 with reasonable precision.
I maintain that future is unknowable. Any kind of a statement regarding the performance of a particular classifier trained from data should always be seen as relative to the data set.
This still isn’t answering my question: Why is 0.03 useful? I can imagine using an error rate in decision making. I can imagine using a confidence interval on the error rate in decision making. But, I do not know how to use 0.03 in any useful way.
Note that 0.99 means 0.99 average classification error across multiple 90:10 splits. 0.99(0.03) should mean something else if 0.03 is useful.
Your comment on exchangeability makes more sense now. In this situation, what happens is that (basically) you trade using a Binomial distribution for a Hypergeometric distribution to analyze the number of errors on the portion of the set you haven’t seen. The trade Binomial->Hypergeometric doesn’t alter intuitions very much because the distributions are fairly similar (Binomial is a particular limit of the Hypergeometric, etc…)
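A quick sketch (my own, with made-up numbers) of how similar the two models are for tail probabilities on the unseen portion:

    from scipy.stats import binom, hypergeom

    N, bad = 1000, 300   # finite population and its total number of errors
    n = 100              # size of the unseen portion
    k = 40               # observed error count we ask about
    print(binom.sf(k, n, bad / N))       # P(more than k errors), iid model
    print(hypergeom.sf(k, N, bad, n))    # same tail, finite-population model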
0.03 gives you an indication of reliability, stability of a classifier. This relates to the old bias/variance tradeoff. A short bias/variance reading list:
Neural networks and the bias/variance dilemma
Bias, Variance, and Arcing Classifiers
A Unified Bias-Variance Decomposition for Zero-One and Squared Loss
This still isn’t the answer I want. How is 0.03 useful? How do you use it?
The meaning of “stability” here seems odd. It seems to imply nothing about how the algorithm would perform for new problems or even for a new draw of the process generating the current training examples. Why do we care about this very limited notion of stability?
If you don’t mind a somewhat philosophical argument, examine Figure 5 in Modelling Modelled. The NBC becomes highly stable beyond 150 instances. On the other hand, C4.5 has a higher average utility, but also a greater variation in its utility on the test set. Is it meaningful to compare both methods when the training set consists of ~100 instances? The difference in expected utility is negligible in comparison to the total amount of variation in performance.
This still isn’t answering my question. How and why do you use 0.03? There should be a simple answer to this, just like there are simple answeres for 0.99 and for confidence intervals about 0.99.
(I don’t want to spend time debating what is and is not “meaningful”, because that seems too vague.)
(0.03) indicates how much the classification accuracy is affected by the choice of the training data across the experiments. It quantifies the variance of the learned model. It describes that the estimate of classification accuracy across test sets of a certain size is not a number, it is a distribution.
I get my distribution of expected classification accuracy through sampling, and the only assumption is the fixed choice of the relative size of the training and test set. The purpose of (0.03) is to stress that the classification accuracy estimate depends on the assignment of instances to training or test set. You get your confidence interval starting from an arbitrary point estimate “0.99” along with a very strong binomial assumption, one that is invalidated by the above sampling experiments. It’s a simple answer alright, but a very dubious set of assumptions.
By now, I’ve listed sufficiently many papers that attempt to justify the bias/variance problem, and the purpose of (0.03) should be apparent in the context of this problem. Do you have a good reason for disagreeing with the whole issue of bias/variance decomposition?
I know what (0.03) indicates, but this still doesn’t answer my question. How do we _use_ it? How is this information supposed to affect the choices that we make? The central question is whether or not (0.03) is relevant to decision making, and I don’t yet see that relevance.
“Binomial distribution” is not the assumption. Instead, it is the implication. The assumption is iid samples. This assumption is not always true, but none of the experiments in the ‘modelling modeled’ reference seem to be the sort which disprove the correctness of the assumption. In particular, cutting up the data in several different ways and learning different classifiers with different observed test error rates cannot disprove the independence assumption.
This reminds me of Lance’s post on calibrating weather prediction numbers. The weatherman tells us that the (subjective) probability of rain tomorrow is 0.8. How do (should) we use that? Now suppose we know something about the prior he used to come up with the 0.8 estimate. Does that change the way we use the number?
Re: Yaroslav – Yes, if the prior doesn’t match our own prior, we can squeeze out the update and update *our* prior.
Re: John – If you accept the bias/variance issue, then (0.03) is interesting and therefore intrinsically useful. I guess you don’t buy this. It concerns the estimation of risk, second-order probability (probability-of-probability), etc. The issue is that you cannot characterize the error rate reliably, and must therefore use a probability distribution. This is the same pattern as with introducing error rate because you cannot say whether a classifier is always correct or always wrong.
A more practical use is comparing two classifiers in two cases. In one case, classifier A gets a classification accuracy of 0.88(0.31) and B gets 0.90(0.40). What probability would you assign to the statement “A is better than B?” in the absence of any other information? Now consider another experiment, where you get 0.88(0.01) for A and 0.90(0.01) for B.
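One way (my own sketch, under a Gaussian approximation that is itself debatable in this thread) to turn those summaries into such a probability:

    from math import erf, sqrt

    def p_a_better(mean_a, sd_a, mean_b, sd_b):
        # P(accuracy_A > accuracy_B) for independent Gaussian fold accuracies
        d = mean_a - mean_b
        s = sqrt(sd_a ** 2 + sd_b ** 2)
        return 0.5 * (1 + erf(d / (s * sqrt(2))))

    print(p_a_better(0.88, 0.31, 0.90, 0.40))   # high-variance case: ~0.48
    print(p_a_better(0.88, 0.01, 0.90, 0.01))   # low-variance case:  ~0.08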
Why would I want to assign a probability to “A is better than B”? How would you even do that given this information? And what does “better” mean?
a) What is the definition you use to do model selection? b) Any assignment is based upon a particular data set. c) “better” – lower aggregate loss on the test set.
a) I am generally inclined to avoid model selection because it is a source of overfitting. I would generally rather make a weighted integration of predictions. If pressed for computational reasons, I might choose the classifier with the smallest cross validation or validation set error rate.
I still don’t understand why you want to assign a probability.
b) I don’t understand your response. You give examples of 0.88(0.01) and 0.90(0.01). How do you use the 0.01 to decide?
c) I agree with your definition of better, as long as the test set is not involved in the cross validation process.
Interesting! Now I understand: all the stuff I’ve been talking about in this thread is very much about the tools and tricks in order to do model selection. But you dislike model selection, so obviously these tools and tricks may indeed seem useless.
a) If you have to make a choice, how easy is it for you to then state that A is better than B? It’s very rare that A would always be better than B. Instead, it may usually be better. Probability captures the uncertainty inherent to making such a choice. The probability of 0.9 means that in 90% of the test batches, A will be better.
b) With A:0.88(0.01) vs B:0.90(0.01), B will almost always be better than A. With A:0.88(0.1) vs B:0.90(0.1), we can’t really say which one will be better, and a choice could be arbitrary.
c) OK, but assume you have a certain batch of the data. That’s all you have. What do you do? Create a single test/train split, or create a bunch of them and ‘integrate out’ the dependence of your estimate on the particular choice?
Regarding the purpose of model selection. I’m sometimes working with experts, e.g. MD’s, who gathered the data and want to see the model. I train SVM, I train classification trees, I train NBC, I train many other things. Eventually, I would like to give them a single nicely presented model. They cannot evaluate or teach this ensemble of models. They won’t get insights from an overly complex model, they need something simpler, something they can teach/give to their ambulance staff to make decisions. So the nitty-gritty reality of practical machine learning has quite an explicit model complexity cost.
And one way of dealing with model complexity is model selection. It’s cold and brutal, but it gets the job done. The above probability is a way of quantifying how unjustified or arbitrary it is in a particular case. If it’s too brutal and if the models are making independent errors, then one can think about how to approximate or present the ensemble. Of course, I’d want to hand the experts the full Bayesian posterior, but how do I print it out on an A4 sheet of paper so that the expert can compare it to her intuition and experience?
Of course, I’m not saying that everyone should be concerned about model complexity and presentability. I am just trying to justify its importance to applied data analysis.
I understand that some form of predictor simplification/model selection is sometimes necessary.
a) I still don’t understand why you want to assign a probability to one being better than another. If we accept that model selection/simplification must be done, then it seems like you must make a hard choice. Why are probabilities required?
b) The reasoning about B and A does not hold on future data in general (and I am not interested in examples where we have already measured the label). In particular, I can give you learning algorithm/problem pairs in which there is a very good chance you will observe something which looks like a significant difference over cross validation folds, but which is not significant. The extreme example mentioned in this post shows you can get 1.00(0.00) and 0.00(0.00) for two algorithms producing classifiers with the same error rate.
c) If I thought there was any chance of a time ordering in the data, I would use a single train/test split with later things in the test set. I might also be tempted to play with “progressive validation” (although that’s much less standard). If there was obviously no time dependence, I might use k-fold cross validation (with _small_ k) and consider the average error rate a reasonable predictor of future performance. If I wanted to know roughly how well I might reasonably expect to do in the future and thought the data was i.i.d. (or effectively so), I would use the test set bound.
a) I consider 10-fold cross-validation to be a series of 10 experiments. For each of these experiments, we obtain a particular error rate. For a particular experiment, A might be better than B, but for a different experiment B would be better than A. Both probability and the standard deviations are ways of modelling the uncertainty that comes with this. If I cannot make a sure choice, and if modelling uncertainty is not too expensive, why not model it?
b) Any fixed method can be defeated by an adaptive adversary. I’m looking for a sensible evaluation protocol that will discount both overfitting and underfitting, and I realize that nothing is perfect.
c) I agree with your suggestions, especially with the choice of a small ‘k’. Still, I would stress that cross-validation is to be replicated multiple times, with several different permutations of the fold-assignment vector. Otherwise, the results are excessively dependent on a particular assignment to folds. If something affects your results, and if you are unsure about it, then you should not keep it fixed, but vary it.
a) I consider the notion that 10-fold cross validation is 10 experiments very misleading, because there can exist very strong dependencies between the 10 “experiments”. It’s like computing the average and standard deviations of the wheel locations of race car #1 and race car #2. These simply aren’t independent, and so the amount of evidence they provide towards “race car #1 is better than race car #2” is essentially the same as the amount of evidence given by “race car #1 is in front of race car #2”.
b) Pleading “but nothing works in general” is not convincing to me. In the extreme, this argument can be used to justify anything. There are some things which are more robust than other things, and it seems obvious that we should prefer the more robust things. If you use confidence intervals, this nasty example will not result in nonsense numbers, as it does with the empirical variance approach.
You may try to counterclaim that there are examples where confidence intervals fail, but the empirical variance approach works. If so, state them. If not, the confidence interval approach at least provides something reasonable subject to a fairly intuitive assumption. No such statement holds for the empirical variance approach.
c) I generally agree, as time allows.
I agree about b), but continue to disagree about a). The argument behind it is somewhat intricate. We’re estimating something random with a non-random set of experiments. Let me pose a small problem/analogy: if you wanted to use monte carlo sampling to estimate the area of a certain shape in 2D, but you can only take 10 samples, would you draw these samples purely at random? You would not, because you would risk the chance that you’d sample the same point twice, and would gain no information. Cross-validation is a bit like that: it tries to diversify the samples in order to get a better estimate with fewer samples. Does it make sense?
No, it does not. Cross validation makes samples which are (in analogy) more likely to be the same than independent samples. That’s why you can get the 1.00(0.00) or 0.00(0.00) behavior.
Back to this tar baby: I understand your concern, but it is inherent to *sampling without replacement* of instances as contrasted to *sampling with replacement* of instances. I was not arguing bootstrap or iid versus training/test split or cross-validation. I was arguing for cross-validation compared to random splitting into the training and test set.
It’s quite clear that i.i.d. is often incompatible with sampling without replacement, and I can demonstrate this experimentally. In some cases, i.i.d. is appropriate (large populations, random sampling), and in other cases splitting is appropriate (finite populations, exhaustive or stratified sampling). These two stances should be kept apart and not mixed, as seems to be the fashion. What should be a challenge is to study learning in the latter case.
I don’t understand what is meant by “incompatible” here.
Assuming m independent samples, what we know (detailed here) is that K-fold cross validation has a smaller variance, skew, or other higher order moment than a random train/test split with a test set of size m/K. We do not and cannot (fully) know how much smaller this variance is. There exist examples where K-fold cross validation has the same behavior as a random train/test split.
If you want to argue that cross-validation is a good idea because it removes variance, I can understand that. If you want to argue that the individual runs with different held out folds are experiments, I disagree. This really is like averaging the position of wheels on a race car. It reduces variance (i.e. doesn’t let a race car with a missing wheel win), but it is still only one experiment (i.e. one race). If you want more experiments, you should not share examples between runs of the learning algorithm.
Incompatible means that assuming i.i.d. within the classifier will penalize you if the classifier is evaluated using cross-validation: the classifier is not as confident as it can afford to be. I’m not arguing that CV is better, I’m just arguing that it’s different. I try to be agnostic with respect to evaluation protocols, and adapt to the problem at hand. CV tests some things, bootstrap other things, each method has its pathologies, but advocating a single individual train/test split is complete rubbish unless you’re in a highly cost-constrained adversarial situation.
But now I’ll play the devil’s advocate again. Assume that I’m training on 10% and testing on 90% of data in “-10”-fold CV. Yes, the experiments are not independent. Why should they be? Why shouldn’t I exhaustively test all the tires of the car in four *dependent* experiments? Why shouldn’t I test the blood pressure of every patient just once, even if this makes my experiments dependent? Why shouldn’t I hold out for validation each and every choice of 10% of instances? Why is having this kind of dependence any less silly than sampling the *same* tire multiple times in order to keep the samplings “independent”? Would it be less silly than sampling just one tire and computing a bound based on that single measurement, as any additional measure could be dependent? Why is using a Gaussian to model the heights of *all* the players in a basketball team silly, even if the samples are not independent?
The notion that “advocating a single individual train/test split is complete rubbish except in a cost constrained adversarial situation” is rubbish. As an example, suppose you have data from wall street and are trying to predict stock performance. This data is cheap and plentiful, but the notion of using cross validation is simply insane due to the “survivor effect”: future nonzero stock price is a strong and unfair predictor of past stock price. If you try to use cross validation, you will simply solve the wrong problem.
What’s happening here is that cross validation relies upon identicality of data in a far more essential manner than just having a training set and a test set. It is essential to understand this in considering methods for looking at your performance.
For your second point, I agree with the idea of reducing variance via cross validation (see second paragraph of comment 42) when the data is IID. What I disagree with is making confidence interval-like statements about the error rate based upon these nonindependent tests. If you want to know that one race car is better than another, you run them both on different tracks and observe the outcome. You don’t average over their wheel positions in one race and pretend that each wheel position represents a different race.
Well, of course neither cross-validation nor bootstrap makes sense when the assumption of instance exchangeability is clearly not justified. It was very funny to see R. Kalman make this mistake in http://www.pnas.org/cgi/content/abstract/101/38/13709/ – a journalist noticed this and wrote a pretty devastating paper on why peer review is important. My comment on “rubbish” was in the context of the validity of instance exchangeability, of course.
Regarding your note on “reducing variance”: I believe that you’re trying to find some benefit of cross-validation in the context of IID. Although you might do that, the crux of my message is that finite exchangeability (FEX) exercised by CV is different from infinite exchangeability (iid) exercised by bootstrap. Finite exchangeability has value on its own, not just as an approximation to infinite exchangeability. In fact, I’d consider finite exchangeability as primary, and infinite exchangeability as an approximation to it. I guess that your definition of confidence interval is based upon IID, so if I do “confidence intervals” based on FEX, it may look wrong.
I hope that I understand you correctly. What I’m suggesting is to allow for and appreciate the assumption of finite exchangeability, and build theory that accomodates for it. Until then, it would be unfair to dismiss empirical work assuming FEX in some places just because most theory work assumes IID.
I’ve worked on FEX confidence intervals here. The details change, but not the basic message w.r.t. the IID assumption.
The basic issue we seem to be debating, regardless of assumptions about the world, is whether we should think of the different runs of cross validation as “different” experiments. I know of no reasonable assumption under which the answer is “yes” and many reasonable assumptions under which the answer is “no”. For this conversation to be further constructive, I think you need to (a) state a theorem and (b) argue that it is relevant.
[...] Drug studies. Pharmaceutical companies make predictions about the effects of their drugs and then conduct blind clinical studies to determine their effect. Unfortunately, they have also been caught using some of the more advanced techniques for cheating here: including “reprobleming”, “data set selection”, and probably “overfitting by review”. It isn’t too surprising to observe this: when the testers of a drug have $10^9 or more riding on the outcome the temptation to make the outcome “right” is extreme. [...]
Useful list. Should be made required reading for students of ML.