Multiple Regression Training Data and Test Data
I just had a great one-on-one coding learning session with a good friend of mine over lunch. He's trying to take his coding skills to the "next level." Just as we plateau when we work out physically, I think we can plateau when coding and solving problems. That's one of the reasons I try to read a lot of code to be a better developer and why I started The Weekly Source Code. He said it was OK to write about this as someone else might benefit from our discussion. The code and problem domain have been changed to protect the not-so-innocent.
One of the things that we talked about was that some programmers/coders/developers have just a few tools in their toolbox, namely, if, for, and switch.
I'm not making any judgements about junior devs vs. senior devs. I'm talking about idioms and "vocab." I think that using only if, for and switch is the Computer Programmer equivalent of using "like" in every sentence. Like, you know, like, he was all, like, whatever, and I was like, dude, and he was, like, ewh, and I was like meh, you know?
When speaking English, by no means do I have a William F Buckley, Jr.-sized vocabulary, nor do I believe in being sesquipedal for sesquipedalianism's sake, but there is a certain sparkly joyfulness in selecting the right word for the right situation. I'm consistently impressed when someone can take a wordy paragraph and condense it into a crisp sentence without losing any information.
Refactoring code often brings me the same shiny feeling. Here's a few basic things that my friend and I changed in his application over lunch, turning wordy paragraphs into crisp sentences. Certainly this isn't an exhaustive list of anything, but perhaps it can act as a reminder to myself and others to be mindful and think about solving problems beyond, like, if and for and, like, switch, y'know?
He had some code that parsed a ridiculous XML document that came back from a Web Server. That the format of the XML was insane wasn't his fault, to be sure. We all have to parse crap sometimes. He had to check for the existence of a certain value and turn it into an Enum.
if (xmlNode.Attributes["someAttr"].Value.ToLower().IndexOf("fog") >= 0)
{
    wt = MyEnum.Fog;
}
if (xmlNode.Attributes["someAttr"].Value.ToLower().IndexOf("haze") >= 0)
{
    wt = MyEnum.Haze;
}
if (xmlNode.Attributes["someAttr"].Value.ToLower().IndexOf("dust") >= 0)
{
    wt = MyEnum.Haze;
}
if (xmlNode.Attributes["someAttr"].Value.ToLower().IndexOf("rain") >= 0)
{
    wt = MyEnum.Rainy;
}
...and this went on for 40+ values. There's a few problems with this.
First, he's using IndexOf() and ToLower() when he's trying to say "ignoring case, does this string contain this other string?" Using ToLower() for a string comparison is always a bad idea: it allocates a throwaway string and applies the current culture's casing rules, which is where the Turkish i problem comes from. Be sure to check out the Recommendations for Strings in .NET.
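To make the culture point concrete, here is a minimal sketch. The class and method names are made up for illustration, and the culture is passed in explicitly so the demo does not depend on the machine's settings. It assumes full culture data is available at runtime; under invariant globalization mode the Turkish case would behave differently.

```csharp
using System;
using System.Globalization;

public static class TurkishIDemo
{
    // The ToLower-then-IndexOf approach, with the culture made explicit.
    public static bool ContainsViaToLower(string s, string searchingFor, CultureInfo culture)
    {
        return s.ToLower(culture).IndexOf(searchingFor, StringComparison.Ordinal) >= 0;
    }

    // Ordinal, case-insensitive search: no culture involved, no temporary string.
    public static bool ContainsOrdinalIgnoreCase(string s, string searchingFor)
    {
        return s.IndexOf(searchingFor, StringComparison.OrdinalIgnoreCase) >= 0;
    }

    public static void Main()
    {
        string input = "LIGHT RAIN";

        // Turkish casing rules lowercase 'I' to dotless 'ı' (U+0131),
        // so "RAIN" does not lowercase to "rain" under tr-TR.
        Console.WriteLine(ContainsViaToLower(input, "rain", new CultureInfo("tr-TR")));
        Console.WriteLine(ContainsViaToLower(input, "rain", CultureInfo.InvariantCulture));
        Console.WriteLine(ContainsOrdinalIgnoreCase(input, "rain"));
    }
}
```

The ordinal version gives the same answer no matter which culture the thread happens to be running under.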
We could make this simpler and more generic with a helper method that we'll use later. It expresses what we want to do pretty well. If we were using .NET 3.5 we could make this an extension method, but he's on 2.0.
private static bool ContainsIgnoreCase(string s, string searchingFor)
{
    return s.IndexOf(searchingFor, StringComparison.OrdinalIgnoreCase) >= 0;
}
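For anyone who is on .NET 3.5 or later, the extension method version might look roughly like this. The containing class name StringExtensions is just a placeholder of mine; the method body is the same as the helper above.

```csharp
using System;

public static class StringExtensions
{
    // The 'this' modifier on the first parameter makes it an extension method (C# 3.0+).
    public static bool ContainsIgnoreCase(this string s, string searchingFor)
    {
        return s.IndexOf(searchingFor, StringComparison.OrdinalIgnoreCase) >= 0;
    }
}

public static class ExtensionDemo
{
    public static void Main()
    {
        string ws = "Patchy FOG";
        Console.WriteLine(ws.ContainsIgnoreCase("fog"));   // True
        Console.WriteLine(ws.ContainsIgnoreCase("haze"));  // False
    }
}
```

The call site then reads like a method on string itself, which is most of the appeal.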
Second, he's indexing into the Attributes collection over and over again, and he hasn't "else cased" the other ifs, so every indexing into Attributes and every other operation runs every time. He can do the indexing once, pull it out, then check his values.
string ws = xmlNode.Attributes["someAttr"].Value;
if (ContainsIgnoreCase(ws, "cloud"))
    wt = MyEnum.Cloudy;
else if (ContainsIgnoreCase(ws, "fog"))
    wt = MyEnum.Fog;
else if (ContainsIgnoreCase(ws, "haze"))
    wt = MyEnum.Haze;
else if (ContainsIgnoreCase(ws, "dust"))
    wt = MyEnum.Dust;
else if (ContainsIgnoreCase(ws, "rain"))
    wt = MyEnum.Rainy;
else if (ContainsIgnoreCase(ws, "shower"))
...and again, as a reminder, this goes on for dozens and dozens of lines.
We were talking this through step by step to explain my "from point a to point d" way of thinking. I tend to skip b and c, so it's useful to show these incremental steps, kind of like showing all your work in Math class when doing long division.
At this point, I pointed out that he was clearly mapping the strings to the enums. Now, if the mapping were direct (and it's not, for a variety of horrible many-to-one reasons that are spec-specific, and because this is a "contains" operation rather than a direct equality comparison), he could have parsed the string into an enum like this, where the last parameter ignores case:
wt = (MyEnum)Enum.Parse(typeof(MyEnum), ws, true);
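Here is a small sketch of how that call behaves on clean versus messy input. The enum members and the ParseOrDefault helper are placeholders of mine, not his real code. Enum.Parse throws an ArgumentException when the text isn't one of the member names, so the sketch falls back to a default.

```csharp
using System;

public enum MyEnum { Cloudy, Fog, Haze, Dust, Rainy }

public static class EnumParseDemo
{
    // Returns the parsed value, or 'fallback' when the text is not a member name.
    public static MyEnum ParseOrDefault(string text, MyEnum fallback)
    {
        try
        {
            // The third argument (true) asks Enum.Parse to ignore case.
            return (MyEnum)Enum.Parse(typeof(MyEnum), text, true);
        }
        catch (ArgumentException)
        {
            return fallback;
        }
    }

    public static void Main()
    {
        Console.WriteLine(ParseOrDefault("FOG", MyEnum.Cloudy));        // Fog
        Console.WriteLine(ParseOrDefault("light fog", MyEnum.Cloudy));  // Cloudy
    }
}
```

The second call is the whole problem in miniature: "light fog" contains a member name but isn't one, so a direct parse can't handle it.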
However, his mapping has numerous exceptions and the XML is messy. Getting one step simpler, I suggested making a map. There's a lot of folks who use Hashtable all the time, as they have for years in .NET 1.1, but haven't realized how lovely Dictionary<TKey, TValue> is.
var stringToMyEnum = new Dictionary<string, MyEnum>()
{
    { "fog", MyEnum.Fog},
    { "haze", MyEnum.Fog},
    { "fred", MyEnum.Funky},
    //and on and on
};
foreach (string key in stringToMyEnum.Keys)
{
    if (ContainsIgnoreCase(ws, key))
    {
        wt = stringToMyEnum[key];
        break;
    }
}
Because his input data is gross, he spins through the Keys collection and calls ContainsIgnoreCase on each key until settling on the right Enum. I suggested he clean up his input data to avoid the for loop, turning the whole operation into a simple mapping. At this point, of course, the Dictionary can be shoved off into some global readonly static resource.
string ws = xmlNode.Attributes["someAttr"].Value;
//preprocessing over ws to tidy up
var stringToMyEnum = new Dictionary<string, MyEnum>()
{
    { "fog", MyEnum.Fog},
    { "haze", MyEnum.Fog},
    { "fred", MyEnum.Funky},
    //and on and on
};
wt = stringToMyEnum[ws];
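Shoved off into a static readonly field, the direct mapping might look something like the sketch below. Everything here is illustrative: the enum members, the keys, the Unknown value, and the use of Trim() as a stand-in for whatever tidy-up the real input needs are all my assumptions. Passing StringComparer.OrdinalIgnoreCase to the Dictionary makes the lookup case-insensitive, and TryGetValue avoids the KeyNotFoundException the indexer throws on a miss.

```csharp
using System;
using System.Collections.Generic;

public enum MyEnum { Unknown, Cloudy, Fog, Haze }

public static class DirectMap
{
    // Built once for the whole app; the comparer makes key lookups ignore case.
    private static readonly Dictionary<string, MyEnum> StringToMyEnum =
        new Dictionary<string, MyEnum>(StringComparer.OrdinalIgnoreCase)
        {
            { "cloudy", MyEnum.Cloudy },
            { "fog", MyEnum.Fog },
            { "mist", MyEnum.Fog },   // many-to-one mappings are fine
            { "haze", MyEnum.Haze },
        };

    // Returns false (and leaves wt as Unknown) when the tidied string has no mapping.
    public static bool TryMap(string ws, out MyEnum wt)
    {
        return StringToMyEnum.TryGetValue(ws.Trim(), out wt);
    }

    public static void Main()
    {
        MyEnum wt;
        Console.WriteLine(TryMap(" MIST ", out wt) + " " + wt);   // True Fog
        Console.WriteLine(TryMap("sleet", out wt) + " " + wt);    // False Unknown
    }
}
```

Unknown is the first enum member on purpose, so a failed TryGetValue leaves wt at a sensible default.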
Often large switches "just happen." What I mean is that one introduces a switch to deal with some uncomfortable and (apparently) unnatural mapping between two things and then it just gets out of hand. They tell themselves they'll come back and fix it soon, but by then it's grown into a hairball.
My buddy had a method that was supposed to remove an icon from his WinForms app. The intent is simple, but the implementation became another mapping between a fairly reasonable Enum that he couldn't control, and a number of named icons that he could control.
The key here is that he could control the icons and he couldn't easily control the enum (someone else's code, etc). That the mapping was unnatural was an artifact of his design.
The next thing he knew he was embroiled in a switch statement without giving it much thought.
private void RemoveCurrentIcon()
{
    switch (CurrentMyEnumIcon)
    {
        case MyEnum.Cloudy:
            CustomIcon1 twc = (CustomIcon1)FindControl("iconCloudy");
            if (twc != null)
            {
                twc.Visible = false;
                this.Controls.Remove(twc);
            }
            break;
        case MyEnum.Dust:
            CustomIcon2 twd = (CustomIcon2)FindControl("iconHaze");
            if (twd != null)
            {
                twd.Visible = false;
                this.Controls.Remove(twd);
            }
            break;
        //etc...for 30+ other case:
There's a few things wrong here, besides the switch being yucky. (Yes, I know there are uses for switches, just not here.) First, it's useful to remember when dealing with a lot of custom classes, in this case CustomIcon1 and CustomIcon2, that they likely have a common ancestor. In this case a common ancestor is Control.
Next, he can control the name of his controls. For all intents in his WinForms app, the Control's name is arbitrary. Because his controls map directly to his Enum, why not literally map them directly?
private void RemoveCurrentIcon(MyEnumIcon CurrentMyEnumIcon)
{
    Control c = FindControl("icon" + CurrentMyEnumIcon);
    if(c != null)
    {
        c.Visible = false;
        Controls.Remove(c);
    }
}
The recognition of the shared base class along with the natural Enum to Control name mapping turns 160 lines of switch into a single helper method.
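The trick that makes the helper work is that concatenating an enum value onto a string calls ToString() on it, which gives you the member's name. Here's a tiny sketch with no WinForms in it: a plain list of names stands in for the form's Controls collection, and the enum members and ControlNameFor helper are placeholders of mine.

```csharp
using System;
using System.Collections.Generic;

public enum MyEnum { Cloudy, Dust, Rainy }

public static class IconNames
{
    // "icon" + MyEnum.Dust becomes "iconDust" because the enum's ToString() returns its name.
    public static string ControlNameFor(MyEnum icon)
    {
        return "icon" + icon;
    }

    public static void Main()
    {
        // Stand-in for the Controls collection: just the control names.
        List<string> controls = new List<string> { "iconCloudy", "iconDust", "iconRainy" };

        controls.Remove(ControlNameFor(MyEnum.Dust));

        Console.WriteLine(string.Join(", ", controls.ToArray()));  // iconCloudy, iconRainy
    }
}
```

This only holds up if every control is named exactly after its enum member, which is why being able to rename the controls mattered.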
It's very easy, seductive even, to try to explain to the computer to "do it like this" using if, for and switch. However, using even the basic declarative data structures to describe the shape of the data can help you avoid unnecessary uses of the classic procedural keywords if, for and switch.
FINAL NOTE: We worked for a while and turned an 8100-line program into 3950 lines. I think we can squeeze another 1500-2000 lines out of it. We did the refactoring with the new Resharper 4.0 and a private daily build of CodeRush/Refactor!Pro. I'll do some writeups on both wonderful tools in the coming weeks.
Translated from: https://www.hanselman.com/blog/back-to-basics-life-after-if-for-and-switch-like-a-data-structures-reminder