hfs-delimited and lfs-delimited

Hey guys,

I've pushed a snapshot update to Cascalog that includes two new taps -- hfs-delimited and lfs-delimited. These support the same keyword options as the other hfs-* and lfs-* taps, with a few extras I'll detail below.

If any of you find these useful, I'd really appreciate it if you would give them a try and let me know how the API works out for you. This feature is available in either of the following builds:

[cascalog "1.8.7-SNAPSHOT"]
[cascalog "1.9.0-wip8"]

As an example, say you had a text file with data like this:

exchange,stock_symbol,date,open,high,low,close,volume,adj
NYSE,AA,2008-03-05,37.01,37.9,36.13,36.6,17752400,36.6
NYSE,AA,2008-03-04,38.85,39.28,38.26,38.37,11279900,38.37


The default separator is a tab character. The sample data contains no tabs, so the standard hfs-delimited tap with no options would produce 1-tuples, each holding a whole line of text:

(hfs-delimited "/path/to/file")
;; makes textlines

The ":delimiter" option allows you to change this:

(hfs-delimited "/pathto/data"
               :delimiter ",")

;; produces 9-tuples, all strings
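To make the effect of :delimiter concrete, here's a rough plain-Clojure sketch of the split applied to one of the sample lines. It's only an illustration of the idea, not how the tap is implemented, and split-line is a made-up helper name rather than anything in Cascalog.

```clojure
(require '[clojure.string :as str])
(import 'java.util.regex.Pattern)

;; Hypothetical helper for illustration only; not part of Cascalog.
;; Splits one line on a literal delimiter. The -1 limit keeps trailing
;; empty fields instead of silently dropping them.
(defn split-line [line delimiter]
  (str/split line (re-pattern (Pattern/quote delimiter)) -1))

(def sample-line
  "NYSE,AA,2008-03-05,37.01,37.9,36.13,36.6,17752400,36.6")

;; Splitting on "," yields one string per column.
(def fields (split-line sample-line ","))

;; Splitting on the default tab finds nothing to split on,
;; so the whole line comes back as a single field.
(def tab-fields (split-line sample-line "\t"))

(println (count fields) (count tab-fields))
```

With a comma the line breaks into nine strings; with a tab it stays in one piece, which is why the no-options tap behaves like a textline tap on this file.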

Now we have the problem of the header line getting in the way. :skip-header? to the rescue:

(hfs-delimited "/pathto/data"
               :delimiter ","
               :skip-header? true)

;; produces 9-tuples of strings

Next, if you include a vector of classes with the :classes keyword, the tap will do class conversions on the fields for you:

(hfs-delimited "/pathto/data"
               :delimiter ","
               :classes [String String String Float Float Float Float Integer Float]
               :skip-header? true)

;; produces 9-tuples with the above classes -- numbers are parsed properly, strings stay strings.
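If it helps to picture what the class conversion amounts to, here's a small plain-Clojure sketch that parses one row of strings against a vector of classes. The parsers map and convert-fields are names I made up for the example; the tap itself may well do this differently under the hood.

```clojure
;; Hypothetical illustration only; not part of Cascalog.
;; Maps each class used in the example to a function that parses
;; a string into an instance of that class.
(def parsers
  {String  identity
   Float   (fn [^String s] (Float/valueOf s))
   Integer (fn [^String s] (Integer/valueOf s))})

;; Pairs each field with the class in the same position and parses it.
(defn convert-fields [classes fields]
  (mapv (fn [klass field] ((parsers klass) field)) classes fields))

(def classes [String String String Float Float Float Float Integer Float])

(def raw-row
  ["NYSE" "AA" "2008-03-05" "37.01" "37.9" "36.13" "36.6" "17752400" "36.6"])

(def row (convert-fields classes raw-row))

;; Print the class of each converted field.
(println (map class row))
```

The classes vector lines up positionally with the columns, so it needs one entry per field.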

Finally, by providing :outfields you gain the ability to select out specific fields by name:

(def stock-tap
  (hfs-delimited "/pathto/data"
                 :delimiter ","
                 :outfields ["?exchange" "?stock-sym" "?date" "?open" "?high" "?low" "?close" "?volume" "?adj"]
                 :classes [String String String Float Float Float Float Integer Float]
                 :skip-header? true))


(select-fields stock-tap ["?stock-sym" "?open"])
;; returns 2-tuples of [String, Float] pairs representing the stock symbol and opening price for each line.
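For anyone who wants to try this end to end, a query over the selected fields might look roughly like the sketch below. It reuses the stock-tap def from above, and ?<- and stdout come from cascalog.api. I'm assuming hfs-delimited is already referred into your namespace, since where it lives may differ between the two builds.

```clojure
(use 'cascalog.api)

;; Sketch only: assumes stock-tap is defined as shown above and that
;; the data file actually exists at the path given to the tap.
;; Prints the stock symbol and opening price for each line to stdout.
(?<- (stdout)
     [?stock-sym ?open]
     ((select-fields stock-tap ["?stock-sym" "?open"]) ?stock-sym ?open))
```

Swapping hfs-delimited for lfs-delimited in the stock-tap definition should give the same thing against a file on the local filesystem, since the two taps take the same options.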

Looking forward to hearing your feedback! The API here will probably change a bit before release, so get your notes in now.

Cheers,


http://grokbase.com/t/gg/cascalog-user/123ky5apsx/new-taps-hfs-delimited-and-lfs-delimited
