Data Mining and Taco Bell Programming

Original author:
Source: Data Mining and Taco Bell Programming
Translator: caiwei

Programmer Ted Dziuba suggests an alternative to traditional programming that he calls "Taco Bell Programming." The Taco Bell chain creates multiple menu items from about eight different ingredients. Dziuba wants to be able to create many applications from combinations of about eight different shell commands.

Here's an example from Dziuba:

Here's a concrete example: suppose you have millions of web pages that you want to download and save to disk for later processing. How do you do it? The cool-kids answer is to write a distributed crawler in Clojure and run it on EC2, handing out jobs with a message queue like SQS or ZeroMQ.

The Taco Bell answer? xargs and wget. In the rare case that you saturate the network connection, add some split and rsync. A "distributed crawler" is really only like 10 lines of shell script.
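To make the xargs and wget idea concrete, here is a minimal sketch of what such a crawler could look like. It is not Dziuba's actual script. The function name `fetch_all`, the URL list file, the output directory, and the `FETCHER` override (a hypothetical hook for dry runs) are all assumptions made for illustration.

```shell
#!/bin/sh
# fetch_all URL_FILE OUT_DIR [JOBS]
# Hypothetical helper: downloads every URL listed in URL_FILE (one per line)
# into OUT_DIR, running up to JOBS downloads at once (default 8).
fetch_all() {
    url_file=$1
    out_dir=$2
    jobs=${3:-8}

    mkdir -p "$out_dir"

    # xargs hands one URL at a time (-n 1) to the fetcher and keeps up to
    # $jobs fetchers running in parallel (-P, supported by GNU and BSD xargs).
    # FETCHER is an assumed override for dry runs; it defaults to wget.
    xargs -n 1 -P "$jobs" \
        "${FETCHER:-wget}" --quiet --directory-prefix="$out_dir" \
        < "$url_file"
}
```

Calling `fetch_all urls.txt pages 16` would then download the listed pages into `pages/` with 16 parallel wget processes. If one machine is not enough, the URL file could be cut into chunks with split and the chunks copied to other hosts with rsync, as the quote suggests.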

Dziuba gives another example. Instead of using Hadoop to process that data once you have it, you can use:

find crawl_dir/ -type f -print0 | xargs -n1 -0 -P32 ./process

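The command above runs `./process` once per file, up to 32 at a time, but the article does not say what `./process` does. As a sketch only, a per-file worker might extract the links from each downloaded page. The function name `extract_links`, the output directory argument, and the `.links` suffix are assumptions for illustration, not part of Dziuba's example.

```shell
#!/bin/sh
# extract_links FILE OUT_DIR
# Hypothetical stand-in for ./process: writes the unique href targets found
# in FILE to OUT_DIR/<basename of FILE>.links, one per line.
extract_links() {
    in=$1
    out_dir=$2

    mkdir -p "$out_dir"
    out="$out_dir/$(basename "$in").links"

    # grep -o prints each href="..." match on its own line (GNU/BSD grep),
    # sed strips the attribute wrapper, and sort -u removes duplicates.
    grep -o 'href="[^"]*"' "$in" \
        | sed -e 's/^href="//' -e 's/"$//' \
        | sort -u > "$out"
}
```

Saved as an executable script named `process` that ends with a call such as `extract_links "$1" links_out`, it could be dropped into the find and xargs pipeline unchanged. Writing results outside `crawl_dir/` keeps a rerun from picking up its own output, although files with the same base name in different subdirectories would overwrite each other in this sketch.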

"It is a viable way to deal with massive data problems, at least for one-off jobs," big data expert and ReadWriteWeb contributor Pete Warden says about Dziuba's Taco Bell programming concept. "You're trading off the ability to manage and tightly control the process against development speed."

Do you have any favorite hacks like this?
