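Arthur Whitney, one of the founders of Kx Systems, developed the column-oriented database KDB and its language Q in 2003. Official website: www.kx.com
Main features:
Getting started with KDB+ (Q) (the following material was collected from the web and saved for reference)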
Now that we know how q works and how to start it up, let's examine some real code that shows the power of q. The following program reads a csv file of time-stamped symbols and prices, places the data into a table and computes the maximum price for each symbol on each day. It then opens a socket connection to a q process on another machine and retrieves a similar daily aggregate. Finally, it merges the two intermediate tables and appends the result to an existing file.
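sample:{
  t:("DSF"; enlist ",") 0: `:c:/q/data/px.csv;             / read csv with a header row as Date, Symbol, Float columns
  tmpx:select mpx:max Price by Date,Sym from t;            / local max price per day and symbol
  h:hopen `:aerowing:5042;                                 / connect to the q process on aerowing
  rtmpx:h "select mpx:max Price by Date,Sym from tpx";     / same aggregate, computed remotely
  hclose h;                                                / close the connection
  .[`:c:/q/data/tpx.dat; (); ,; rtmpx,tmpx]                / merge both results and append to the file
  }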
All data is ultimately built from atoms, so we begin with atoms. An atom is an irreducible value with a specific data type. The basic data types in q correspond to those of SQL with some additional date and time related types that facilitate time series. We summarize the data types in the tables below, giving the corresponding types in SQL and, where appropriate, Java and C#. We cover enumerations in Casting and Enumerations.
Q           | SQL      | Java      | C#
boolean     | boolean  | Boolean   | Boolean
byte        | byte     | Byte      | Byte
short       | smallint | Short     | Int16
int         | int      | Integer   | Int32
long        | bigint   | Long      | Int64
real        | real     | Float     | Single
float       | float    | Double    | Double
char        | char(1)  | Character | Char
symbol      | varchar  | (String)  | (String)
date        | date     | Date      |
datetime    | datetime | Timestamp | DateTime
minute      |          |           |
second      |          |           |
time        | time     | Time      | TimeSpan
enumeration |          |           |
Note: The words boolean, short, int, etc. are not keywords in q, so they are not displayed in a special font in this text. They do have special meaning when used as name arguments in some operators. You should avoid using them as names.
The next table collects the important information about each of the q data types. We shall refer to it in subsequent sections.
type        | size | char type | num type | notation                    | null value
boolean     | 1    | b         | 1        | 1b                          |
byte        | 1    | x         | 4        | 0x26                        | 0x00
short       | 2    | h         | 5        | 42h                         | 0Nh
int         | 4    | i         | 6        | 42                          | 0N
long        | 8    | j         | 7        | 42j                         | 0Nj
real        | 4    | e         | 8        | 4.2e                        | 0Ne
float       | 8    | f         | 9        | 4.2                         | 0n
char        | 1    | c         | 10       | "z"                         | " "
symbol      | *    | s         | 11       | `zaphod                     | `
month       | 4    | m         | 13       | 2006.07m                    | 0Nm
date        | 4    | d         | 14       | 2006.07.21                  | 0Nd
datetime    | 8    | z         | 15       | 2006.07.21T09:13:39         | 0Nz
minute      | 4    | u         | 17       | 23:59                       | 0Nu
second      | 4    | v         | 18       | 23:59:59                    | 0Nv
time        | 4    | t         | 19       | 09:01:02.042                | 0Nt
enumeration |      |           | *        | `u$v                        |
dictionary  |      |           | 99       | `a`b`c!10 20 30             |
table       |      |           | 98       | ([] c1:`a`b`c; c2:10 20 30) |
The basic integer data type is common to nearly all programming environments.
An int is a signed four-byte integer. A numeric value is identified as an int by the fact that it contains only numeric digits, possibly with a leading minus sign, without a decimal point. In particular, it has no trailing character that would indicate that it is another numeric type (see below). Here is a typical int value,
42
The other two integer data types are short and long. The short type represents a two-byte signed integer and is denoted by a trailing 'h' after optionally signed numeric digits. For example,
b:-123h
b
-123h
Similarly, the long type represents an eight-byte signed integer denoted by a trailing 'j' after optionally signed numeric digits.
c:1234567890j
c
1234567890j
Important: Type promotion is performed automatically in q primitive operations. However, if a specific integer type is required in a list and a narrower type is presented (e.g., an int is expected and a short is presented), the submitted type will not be automatically promoted and an error will result. This may be unintuitive for programmers coming from languages of C ancestry, but it will make sense in the context of tables.
Single and double precision floating point data types are supported.
The float type represents an IEEE standard eight-byte floating point number, often called "double" in other languages. It is denoted by optionally signed numeric digits containing a decimal point, with an optional trailing 'f'; digits with a trailing 'f' and no decimal point, such as 1f, also denote a float. A floating point number can hold at least 15 decimal digits of precision.
For example,
pi:3.14159265
float1:1f
The real type represents a four-byte floating point number and is denoted by numeric digits containing a decimal point and a trailing 'e'. Keep in mind that this type is called 'float' in some languages. A real can hold at least 6 decimal digits of precision, 7 being the norm. Thus
r:1.4142e
r
1.4142e
is a valid real number.
Note: The q console abbreviates the display of float or real values having zeros to the right of the decimal.
2.0
2f
4.00e
4e
The behavior of substituting floating point types of different widths is analogous to the case of integer types.
Both float and real values can be specified in IEEE standard scientific notation for floating point values.
f:1.23456789e-10
r:1.2345678e-10e
By default, the q console displays float and real values to only seven significant digits, rounding the display at the seventh significant digit.
f
1.234568e-10
r
1.234568e-10e
You can change this by using the \P command (note upper case) to specify a display width of up to 16 digits.
f12:1.23456789012
f16:1.234567890123456
\P 12
f12
1.23456789012
f16
1.23456789012
\P 16
f12
1.23456789012
f16
1.234567890123456
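The rounding affects only what the console prints, not the stored value. As a rough Python analogue (an assumption: Python's `g` format stands in for q's display code, and `show` is a hypothetical helper), rounding the display to a chosen number of significant digits looks like this:

```python
# Rough analogue of \P: round the *display* to a number of significant digits.
# The stored float is unchanged; only its printed form differs.
def show(x: float, digits: int = 7) -> str:
    return format(x, f".{digits}g")


f = 1.23456789e-10
f12 = 1.23456789012
f16 = 1.234567890123456

print(show(f))                    # default of 7 significant digits
print(show(f12, 12))
print(show(f16, 12))              # display cut to 12 digits...
print(f16 == 1.234567890123456)   # ...but the stored value is intact
```

Raising the digit count, as \P 16 does, reveals more of the same stored value rather than changing it.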
Binary data can be represented as bit or byte values.
The boolean type uses one byte to store an individual bit and is denoted by the bit value followed by 'b'.
bit:0b
bit
0b
The byte type uses one byte to store 8 bits of data and is denoted by '0x' followed by a hexadecimal value,
byte:0x2a
In handling binary data, q is more like C than its descendants, in that both binary types are considered to be unsigned integers that can participate in arithmetic expressions or comparisons with other numeric types. There are no keywords for 'true' or 'false', nor are there separate logical operators. With byte and pi as above,
a:42
bit:1b
a+bit
43
is an int and
byte+pi
45.14159
is a float. Observe that type promotion has been performed automatically.
There are two atomic character types in q. They resemble the SQL types CHAR and VARCHAR more than the character types of verbose languages.
A char holds an individual ASCII character and is stored in one byte. This corresponds to a SQL CHAR. A char is denoted by a single character enclosed in double quotes.
ch:"q"
ch
"q"
Some keyboard characters, such as the double-quote, cannot be entered directly into a char since they have special meaning in q. As in C, these characters are escaped with a preceding back-slash ( \ ). While the console display also includes the escape, these are actually single characters.
ch:"\"" / double-quote
ch / console also displays the escape "\""
ch:"\\" / back-slash
ch:"\n" / newline
ch:"\r" / return
ch:"\t" / horizontal tab
You can also escape a character with its underlying numeric value expressed as three octal digits.
"\142"
"b"
A symbol holds a sequence of characters as a single unit. A symbol is denoted by a leading back-quote (`), also read "backtick" in q circles.
s1:`q
s2:`zaphod
A symbol is irreducible, meaning that the individual characters that comprise it are not directly accessible. Symbols are often used in q to hold names of other entities.
Important: A symbol is not a string. We shall see in Lists that there is an analogue of strings in q, namely a list of char. While a list of char is a kissing cousin to a symbol, we emphasize that a symbol is not made up of char. The symbol `a and the char "a" are not the same.
Advanced: You may ask whether a symbol can include embedded blanks and special characters such as back-tick. The answer is yes. You create such a symbol using the relationship between lists of char and symbols. See Creating Symbols from Strings for more on this.
`$"A symbol with `backtick"
`A symbol with `backtick
Note: A symbol is somewhat akin to a SQL VARCHAR, in that it can hold an arbitrary number of characters. It is different in that it is atomic. The char "q" and the symbol `kdb are both atomic entities.
A major benefit of q is that it can process both time series and relational data in a consistent and efficient manner. Q extends the basic SQL date and time data types to facilitate temporal arithmetic, which is minimal in SQL and can be clumsy in verbose languages (e.g., Java's date library and its use of time zones). We begin with the equivalents to SQL temporal types. The additional temporal types in q deal with constituents of a date or time.
A date is stored in four bytes and is denoted by yyyy.mm.dd, where yyyy represents the year, mm the month and dd the day. A date value stores the count of days from Jan 1, 2000.
d:2006.07.04
d
2006.07.04
Important: Months and days begin at 1 (not zero) so January is '01'.
Leading zeroes in months and days are required; their omission causes an error.
bday:2007.1.1
'2007.1.1
Advanced: The underlying day count can be obtained by casting to int.
`int$2000.02.01
31
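The day count is easy to reproduce outside q. The Python sketch below models only what the text states, that a date holds the number of days since Jan 1, 2000; `q_date_count` is a hypothetical helper, not part of any q interface.

```python
from datetime import date

# Hypothetical model of q's date encoding: days elapsed since 2000.01.01
Q_EPOCH = date(2000, 1, 1)


def q_date_count(d: date) -> int:
    """Days since 2000-01-01; in this model, earlier dates give negative counts."""
    return (d - Q_EPOCH).days


print(q_date_count(date(2000, 2, 1)))   # same count as `int$2000.02.01 above
```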
A time is stored in four bytes and is denoted by hh:mm:ss.uuu, where hh represents hours on the 24-hour clock, mm represents minutes, ss represents seconds, and uuu represents milliseconds. A time value stores the count of milliseconds from midnight.
t:09:04:59.000
t
09:04:59.000
Again, leading zeros are required in all constituents of a time.
Advanced: The underlying millisecond count can be obtained by casting to int.
`int$12:34:56.789
45296789
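In the same spirit, here is a Python model of the millisecond count behind a time value. The helper `q_time_count` is hypothetical and simply applies the hours, minutes, seconds and milliseconds arithmetic described above.

```python
def q_time_count(hh: int, mm: int, ss: int, ms: int = 0) -> int:
    """Milliseconds since midnight, the count behind a q time such as 12:34:56.789."""
    return ((hh * 60 + mm) * 60 + ss) * 1000 + ms


print(q_time_count(12, 34, 56, 789))   # same count as `int$12:34:56.789 above
```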
A datetime is the combination of a date and a time, separated by 'T' as in the ISO standard format. A datetime value stores the fractional day count from midnight Jan 1, 2000.
dt:2006.07.04T09:04:59.000
dt
2006.07.04T09:04:59.000
Advanced: The underlying fractional day count can be obtained by casting to float.
`float$2000.02.01T12:00:00.000
31.5
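A datetime's fractional day count can be modeled in Python by dividing the time elapsed since midnight Jan 1, 2000 by one day. This is a sketch with a hypothetical helper `q_datetime_count`, not q's own conversion code.

```python
from datetime import datetime, timedelta

# Hypothetical model: a q datetime as fractional days since midnight 2000.01.01
Q_EPOCH = datetime(2000, 1, 1)


def q_datetime_count(dt: datetime) -> float:
    """Fractional days since midnight 2000-01-01 (timedelta / timedelta is a float)."""
    return (dt - Q_EPOCH) / timedelta(days=1)


print(q_datetime_count(datetime(2000, 2, 1, 12)))   # compare `float$2000.02.01T12:00:00.000
```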
The month type uses four bytes and is denoted by yyyy.mm with a trailing 'm'. A month value stores the count of months since January 2000, so 2000.01m has count 0.
mon:2006.07m
mon
2006.07m
Advanced: The underlying month offset can be obtained by casting to int.
`int$2000.04m
3
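The example above, where `int$2000.04m yields 3, implies that the month count starts at 0 for January 2000. A minimal Python model under that assumption (the helper `q_month_count` is hypothetical):

```python
def q_month_count(year: int, month: int) -> int:
    """Months since January 2000, which counts as 0 (months themselves are 1-based)."""
    return (year - 2000) * 12 + (month - 1)


print(q_month_count(2000, 4))   # compare `int$2000.04m
print(q_month_count(2006, 7))   # the count behind 2006.07m
```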
The minute type uses four bytes and is denoted by hh:mm. A minute value stores the count of minutes from midnight.
mm:09:04
mm
09:04
Note: We did not use min for the variable name because min is a reserved name in q.
Advanced: The underlying minute offset can be obtained by casting to int.
`int$01:23
83
The second type uses four bytes and is denoted by hh:mm:ss. A second value stores a count of seconds from midnight.
sec:09:04:59
sec
09:04:59
The representation of the second type makes it look like an everyday time value. However, a q time value is a count of milliseconds from midnight, so the underlying values are different.
Advanced: The underlying values can be obtained by casting to int. This manifests the inequality.
`int$12:34:56
45296
`int$12:34:56.000
45296000
12:34:56=12:34:56.789
0b
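The inequality follows from the units of the underlying counts. A Python sketch of the minute and second encodings, with hypothetical helper names, shows that a second value and a time value for the same clock reading differ by a factor of 1000:

```python
def q_minute_count(hh: int, mm: int) -> int:
    """Minutes since midnight, the count behind a q minute such as 01:23."""
    return hh * 60 + mm


def q_second_count(hh: int, mm: int, ss: int) -> int:
    """Seconds since midnight, the count behind a q second such as 12:34:56."""
    return q_minute_count(hh, mm) * 60 + ss


print(q_minute_count(1, 23))              # compare `int$01:23
print(q_second_count(12, 34, 56))         # compare `int$12:34:56
print(q_second_count(12, 34, 56) * 1000)  # milliseconds, as held by 12:34:56.000
```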
The constituents of dates, times and datetimes can be extracted using dot notation. The individual field values are all extracted as int. The field values of a date are named 'year', 'mm' and 'dd'.
d:2006.07.04
d.year
2006
d.mm
7
d.dd
4
Similarly, the field values of time are 'hh', 'mm', 'ss'.
t:12:45:59.876
t.hh
12
t.mm
45
t.ss
59
Note: At the time of this writing (Jun 2007) there is no syntax to retrieve the millisecond constituent. Use the construct,
t mod 1000
876
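Since a time holds milliseconds since midnight, every constituent can be recovered with integer division and remainder, which is why `t mod 1000` yields the milliseconds. A Python sketch of that arithmetic, using a hypothetical helper `time_fields`:

```python
def time_fields(count_ms: int) -> dict:
    """Split a millisecond-since-midnight count into hh, mm, ss and ms."""
    hh, rest = divmod(count_ms, 3_600_000)   # whole hours
    mm, rest = divmod(rest, 60_000)          # whole minutes
    ss, ms = divmod(rest, 1_000)             # whole seconds and leftover ms
    return {"hh": hh, "mm": mm, "ss": ss, "ms": ms}


t = 45_959_876        # underlying count for 12:45:59.876
print(time_fields(t))
print(t % 1000)       # the `t mod 1000` trick from the text
```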
In addition to the individual field values, you can also extract higher-order constituents.
d.month
2006.07m
t.minute
12:45
t.second
12:45:59
Of course, this works for a datetime as well.
dt:2006.07.04T12:45:59.876
dt.date
2006.07.04
dt.time
12:45:59.876
dt.month
2006.07m
dt.mm
7
dt.minute
12:45
Advanced: It is a quirk in q that dot notation for accessing temporal constituents does not work on function arguments. For example,
fmm:{[x] x.mm}
fmm 2006.09.15
{[x] x.mm}
'x.mm
Instead, cast to the constituent type,
fmm:{[x] `mm$x}
fmm 2006.09.15
9
In addition to the regular numeric and temporal values, special values represent infinities, whose absolute values are greater than any “normal” numeric or temporal value.
Token | Value
0w    | Positive float infinity
0W    | Positive int infinity
0Wh   | Positive short infinity
0Wj   | Positive long infinity
0Wd   | Positive date infinity
0Wt   | Positive time infinity
0Wz   | Positive datetime infinity
0n    | NaN, or not a number
Important: Observe the distinction between lower case 'w' and upper case 'W'.
The result of dividing any positive (or unsigned) non-zero value by any zero value is positive float infinity, denoted 0w. Dividing a negative value by zero results in negative float infinity, denoted -0w. The way to remember these is that 'w' looks like the infinity symbol ∞.
The integral infinities cannot be produced via arithmetic division on normal int values, since the result of division in q is always a float.
The result of dividing any zero value by any zero value is undefined, so q represents this as the floating point null 0n.
The q philosophy is that any valid arithmetic expression will produce a result rather than an error. Therefore, dividing by 0 produces a special float value rather than an exception. You can perform a complex sequence of calculations without worrying about things blowing up in the middle or inserting cumbersome exception trapping. We shall see more about this in Primitive Operations.
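Python, by contrast, raises `ZeroDivisionError` for `1/0`. The sketch below is a hypothetical wrapper `q_divide` that mimics the rules just described, returning `inf`, `-inf` or `nan` instead of raising; it ignores IEEE signed zeros and is not how q itself is implemented.

```python
import math


def q_divide(x: float, y: float) -> float:
    """Toy model of q division: never raises, and always returns a float."""
    if y == 0:
        if x > 0:
            return math.inf     # plays the role of 0w
        if x < 0:
            return -math.inf    # plays the role of -0w
        return math.nan         # zero by zero, plays the role of 0n
    return x / y                # Python 3's / already yields a float


print(q_divide(1, 0), q_divide(-3, 0), q_divide(0, 0), q_divide(7, 2))
```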
Advanced: While infinities can participate in arithmetic operations, infinite arithmetic is not implemented. Instead, q performs the operation on the underlying bit patterns. Math propeller heads (including the author) find the following disconcerting.
0W-2
2147483645
2*0W
-2
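Both results are what signed 32-bit two's-complement arithmetic produces, assuming 0W is the largest 32-bit int, 2147483647 (consistent with the 0W-2 output above). A Python sketch of that wraparound, with a hypothetical helper `wrap_int32`:

```python
INT_MAX = 2**31 - 1   # assumed bit pattern behind q's int infinity 0W


def wrap_int32(n: int) -> int:
    """Reduce an arbitrary Python int to a signed 32-bit two's-complement value."""
    return (n + 2**31) % 2**32 - 2**31


print(wrap_int32(INT_MAX - 2))   # models 0W-2
print(wrap_int32(2 * INT_MAX))   # models 2*0W, which overflows and wraps
```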
The concept of a null value generally indicates missing data. This is an area in which q differs from both verbose programming languages and SQL.
In such languages as C++, Java and C#, the concept of a null value applies to complex entities (i.e., objects) that are accessed indirectly by pointer or by reference. A null value for such an entity corresponds to an un-initialized pointer, meaning that it has not been assigned the address of an allocated block of memory. There is no concept of null for entities that are of simple or value type. For those types that admit null, you test for being null by asking if the value is equal to null.
The NULL value in SQL indicates that the data value is inapplicable or missing. The NULL value is distinct from any value that can actually be contained in a field and does not have '=' semantics. That is, you cannot test a field for being null with = NULL. Instead, you ask if it IS NULL. Because NULL is a separate value, Boolean fields actually have three states: 0, 1 and NULL.
In q, the situation is more interesting. While most types have distinct null values, some types have no designated way of representing a null value.
The following table summarizes the way nulls are handled.
type     | null
boolean  | 0b
byte     | 0x00
short    | 0Nh
int      | 0N
long     | 0Nj
real     | 0Ne
float    | 0n
char     | " "
symbol   | `
month    | 0Nm
date     | 0Nd
datetime | 0Nz
minute   | 0Nu
second   | 0Nv
time     | 0Nt
Let's start with the binary types. As you can see, they have no special null value, which means that null is equivalent to the value zero. Consequently, you cannot distinguish between a missing boolean value and the value that represents false.
In practice, this isn't an issue, since in most applications it isn't a critical distinction. It can be a problem if the default value of a boolean flag in your application is not zero, so you must ensure that this does not occur. A similar precaution applies to byte values.
Next, observe that all the numeric and temporal types have their own designated null values. Here the situation is similar to SQL, in that you can distinguish missing data from data whose underlying value is zero. The difference from SQL is that there is no universal null value.
The advantage of the q approach is that the null values have equals semantics. The tradeoff is that you must use the correct null value in type-checked situations.
Finally, we consider the character types. Viewing a symbol as a variable-length character collection explains why the symbol null value is the empty symbol, designated by a back-tick (`).
In contrast, the null value for the char type is the char consisting of the blank character ( " " ). As with binary data, you cannot distinguish between a missing char value and a blank value. Again, this is not seriously limiting in practice, but you should ensure that your application does not rely on this distinction.
Note: The value "" is not the char null. Instead, it is the empty list of char.
Data complexity is built up from atoms, which we know, and lists. It is important to achieve a thorough understanding of lists since nearly all q programming involves processing lists. The concepts are simple but complexity can build rapidly. Our approach is to introduce the basic notion of a general list in the first section, take a quick detour to cover simple and singleton lists, then return to cover general lists in more detail.
A list is simply an ordered collection. A collection of what, you ask. More precisely, a list is an ordered collection of atoms and other lists. Since this definition is recursive, let's start with the simplest case in which the list comprises only atoms.
The notation for a general list encloses its items within matching parentheses and separates them with semicolons. For readability, optional whitespace is used after the semicolon separators in the last example.
(1;2;3)
("a";"b";"c";"d")
(`Life;`the;`Universe;`and;`Everything)
(-10.0; 3.1415e; 1b; `abc; "z")
In the preceding examples, the first three lists are simple, meaning that the list comprises atoms of uniform type. The last example is a general list, meaning that it is not simple. Otherwise put, a general list contains items that are not atoms of a uniform type. This could be atoms of mixed type, nested lists of uniform type, or nested lists of mixed type.
Important: The order of the items in the list is positional (i.e., left-to-right) and is part of its definition. The lists (1;2) and (2;1) are different. SQL is based on sets, which are inherently unordered. This distinction leads to some subtle differences between the results of queries on q tables versus the result sets from analogous SQL queries. The inherent ordering of lists makes time series processing natural and fast in q, while it is cumbersome and performs poorly in standard SQL.
Lists can be assigned to variables exactly like atoms.
L1:(1;2;3)
L2:("z";"a";"p";"h";"o";"d")
L3:(`Life;`the;`Universe;`and;`Everything)
L4:(0b;1b;0b;1b;1b;0b)
L5:(-10.0;3.1415e;1b;`abc;"z")
The number of items in a list is its count. You can obtain the count of a list as follows,
count L1
3
This is our first example of a function, which we will learn about in Functions. For now, we need only understand that count returns an int value equal to the number of items in the list to its right.
Observe that the count of any atom is 1.
count 42
1
count `abcd
1
A simple list (that is, a list of atoms of a uniform type) corresponds to the mathematical notion of a vector. Such lists are treated specially in q. They have a simplified notation, take less storage and compute faster than general lists. Of course, you can use general list notation for a vector, but q converts a general list to a vector whenever feasible.
A simple list of any numeric type omits the enclosing parentheses and replaces the separating semicolons with blanks. The following two expressions for a simple list of int are equivalent,
(100;200;300)
100 200 300
This is confirmed by the console display,
(100;200;300)
100 200 300
Similar notation is used for simple lists of short and long with the addition of the type indicator.
H:(1h;2h;255h)
H
1 2 255h
We conclude that a trailing type indicator in the display applies to the entire list and not just the last item of the list; otherwise, the list would not be simple and would be displayed in general form.
G:(1; 2; 255h)
G
1
2
255h
Simple lists of float and real are notated similarly. Observe that the q console suppresses the decimal point when displaying a float having zero(s) to the right of the decimal, but the value is not an int.
F:(123.4567;9876.543;99.0)
F
123.4567 9876.543 99
This notational efficiency for float display means that a list of floats having no decimal parts displays with a trailing f.
FF:1.0 2.0 3.0
FF
1 2 3f
The simplified notation for a simple list of binary data juxtaposes the individual data values together with a type indicator. The type indicator for boolean trails the value.
bits:(0b;1b;0b;1b;1b)
bits
01011b
The indicator for byte leads,
bytes:(0x20;0xa1;0xff)
bytes
0x20a1ff
Note: A simple list of boolean atoms requires the same number of bytes to store as it has atoms. While the simplified notation is suggestive, multiple bits are not compressed to fit inside a single byte. The list bits above holds its values in 5 bytes of storage.
The simplified notation for simple lists of symbols juxtaposes the individual atoms with no intervening whitespace.
symbols:(`Life;`the;`Universe;`and;`Everything)
symbols
`Life`the`Universe`and`Everything
Inserting spaces between the atoms causes an error.
bad:`This `is `wrong
'is
The simplified notation for a list of char looks just like a string in most languages, with the juxtaposed sequence of characters enclosed in double quotes.
chars:("s";"o";" ";"l";"o";"n";"g")
chars
"so long"
Note: A simple list of char is called a string.
Lists can be defined using simplified notation,
L:100 200 300
H:1 2 255h
F:123.4567 9876.543 99.99
bits:01011b
bytes:0x20a1ff
symbols:`Life`the`Universe`and`Everything
chars:"so long"
Finally, we observe that a list entered as intermixed ints and floats is converted to a simple list of floats.
1 2.0 3
1 2 3f
Specifying a list of mixed temporal types has a different behavior from that of a list of mixed numeric types. In this case, the list takes the type of the first item in the list; other items are widened or narrowed to match.
12:34 01:02:03
12:34 01:02
01:02:03 12:34
01:02:03 12:34:00
To force the type of a mixed list of temporal values, append a type specifier.
01:02:03 12:34 11:59:59.999u
01:02 12:34 11:59
Lists with one or no items merit special consideration.
It is useful to have lists with no items. A pair of parentheses with nothing (except possibly whitespace) between denotes the empty list.
L:( )
L
-
We shall see in Creating Typed Empty Lists that it is possible to define an empty list with a specific type.
There is a quirk in q regarding how it handles a list containing a single item, called a singleton. Creation of a singleton presents a notational problem. To see the issue, first realize that a list containing a single atom is distinct from the individual atom. As any UPS driver will readily tell you, an item in a box is not the same as an unboxed item. By now, we recognize the following as atoms,
42
1b
0x2a
`beeblebrox
"z"
We also recognize the following are all lists with two elements,
(42;6)
01b
`zaphod`beeblebrox
"zb"
(40;`two)
How to create a list of a single item? Good question. The answer is that there is no syntactic way to do so. You might think that you could simply enclose the item in parentheses, but this doesn't work since the result is an atom.
singleton:(42)
singleton
42
The reason for this is that parentheses are used for multiple purposes in q. As we have seen, paired parentheses are used to delimit items in the specification of a general list. Paired parentheses are also used for grouping in expressions, that is, to isolate the result of the expression inside the parentheses. The latter usage forces (42) to be the same as the atom 42 and so precludes the intention in the specification of singleton above.
The way to make a list with a single item is to use the enlist function, which returns a singleton list containing what is to its right.
singleton:enlist 42
singleton
,42
To distinguish between an atom and the equivalent singleton, examine the sign of their types.
signum type 42
-1
signum type enlist 42
1
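The sign rule can be modeled in Python. In this sketch the mapping from Python types to q type numbers is an assumption made for illustration (Python `int` stands for q int, a one-character `str` for char), and `q_type` is a hypothetical toy version of q's `type`:

```python
# Assumed mapping from Python atom types to q type numbers in the table above
Q_ATOM_TYPE = {bool: 1, int: 6, float: 9, str: 10}   # str: single characters only


def q_type(x) -> int:
    """Toy `type`: negative for atoms, positive for simple lists, 0 for general lists."""
    if not isinstance(x, list):
        return -Q_ATOM_TYPE[type(x)]
    item_types = {q_type(item) for item in x}
    if len(item_types) == 1:
        (t,) = item_types
        if t < 0:          # every item is an atom of the same type
            return -t
    return 0               # mixed atoms, nested lists, or the empty list


def signum(n: int) -> int:
    return (n > 0) - (n < 0)


print(signum(q_type(42)), signum(q_type([42])))   # atom versus singleton
```

The model deliberately does not convert mixed ints and floats into a float list, as q does for list literals; it only captures the sign convention.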
As a final check before moving on, make sure that you understand that the following also defines a list containing a single item,
singleton:enlist 1 2 3
count singleton
1
Recall that a list is ordered from left to right by the position of its items. The offset of an item from the beginning of the list is called its index. Thus, the first item has index 0, the second item (if there is one) has index 1, etc. A list of count n has index domain 0 to n-1.
Given a list L, the item at index i is accessed by L[i]. Retrieving an item by its index is called item indexing. For example,
L:(-10.0;3.1415e;1b;`abc;"z")
L[0]
-10f
L[1]
3.1415e
L[2]
1b
L[3]
`abc
L[4]
"z"
Items in a list can also be assigned via item indexing. Thus,
L1:1 2 3
L1[2]:42
L1
1 2 42
Important: Index assignment into a simple list enforces strict type matching with no type promotion. Otherwise put, when you reassign an item in a simple list, the type must match exactly and a narrower type is not widened.
L:100 200 300
L[1]:42h
'type
f:100.0 200.0 300.0
f
100 200 300f
f[1]:400
'type
This may come as a surprise if you are accustomed to numeric values always being promoted to wider types in a verbose language.
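A Python sketch of this rule, with a hypothetical `SimpleList` class standing in for a q simple list (Python floats and ints stand in for q float and int), shows an assignment being rejected rather than promoted:

```python
class SimpleList:
    """Toy model (hypothetical, not q's implementation) of a q simple list."""

    def __init__(self, items):
        kinds = {type(item) for item in items}
        if len(kinds) != 1:
            raise ValueError("a simple list needs items of exactly one type")
        self.kind = kinds.pop()      # the item type is fixed at creation
        self.items = list(items)

    def __getitem__(self, i):
        return self.items[i]

    def __setitem__(self, i, value):
        # No promotion: the new item must have exactly the list's type
        if type(value) is not self.kind:
            raise TypeError("type")
        self.items[i] = value


f = SimpleList([100.0, 200.0, 300.0])
try:
    f[1] = 400                       # an int into a float list, like f[1]:400
except TypeError as err:
    print("'" + str(err))            # echo q's 'type error style
f[1] = 400.0                         # an exact type match is accepted
print(f.items)
```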
Providing an invalid data type for the index results in an error.
L:(-10.0;3.1415e;1b;`abc;"z")
L[`1]
'type
If you attempt to index outside of the bounds of the list, the result is not an error. Rather, you get a null value. If the list is simple, this is the null for the type of atoms in the list. For general lists, the result is 0n.
L[5]
0n
One way to understand this is that the result of asking for a non-existent index is "missing value." Keep this in mind, since indexing one position past the end of the list is easy to do, especially if you're not used to indexing relative to 0.
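Python lists behave differently: an out-of-range index raises `IndexError`, and a negative index counts from the end. A hypothetical helper `q_index` models the q behavior described above, with the caller supplying the null to return (here `math.nan` standing in for 0n, and the string "abc" standing in for the symbol `abc):

```python
import math


def q_index(items, i, null):
    """Index like q: any position outside 0..n-1 yields `null` instead of raising.

    Negative positions are outside the index domain too; they do not count from the end.
    """
    return items[i] if 0 <= i < len(items) else null


L = [-10.0, 3.1415, True, "abc", "z"]
print(q_index(L, 4, math.nan))   # the last item
print(q_index(L, 5, math.nan))   # one past the end gives the null
print(q_index(L, -1, math.nan))  # so does a negative index
```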
An empty index returns the entire list.
L[]
-10f
3.1415e
1b
`abc
"z"
Note: An empty index is not the same as indexing with an empty list. The latter returns an empty list.
L[()]
_
The syntactic form double-colon ( :: ) denotes the null item, which allows explicit notation or programmatic generation of an empty index.
L[::]
-10f
3.1415e
1b
`abc
"z"
Advanced: The type of the null item is undefined; in particular, its type does not match that of any normal item in a list. As a consequence, inclusion of the null item in a list forces the list to be general.
L:(1;2;3;::)
L
1
2
3
::
type L
0h
This can be used to avoid a nasty surprise when q is too clever. To see how, consider the general list,
L:(1;2;3;`a)
type L
0h
Now, reassign the last item to an int and note what happens to the list.
L[3]:4
L
1 2 3 4
type L
6h
The list has been converted to a simple list of int! A subsequent attempt to reassign the last item back to its original value fails with a type error.
L[3]:`a
'type
This can be circumvented by placing a null item in the list, forcing it to remain general.
L:(1;2;3;`a;::)
L[3]:4
L
1
2
3
4
::
type L
0h
L[3]:`a
L
1
2
3
`a
::
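The collapse and the null-item workaround can be modeled in Python. In this sketch `QList` and `NULL_ITEM` are hypothetical stand-ins for a q list and for ::, a Python string such as "a" stands in for the symbol `a, and only lists of atoms are handled:

```python
NULL_ITEM = object()   # hypothetical stand-in for q's null item ::


class QList:
    """Toy model of a q list of atoms that re-derives its type after each assignment."""

    def __init__(self, items):
        self.items = list(items)
        self.kind = self._uniform_type()

    def _uniform_type(self):
        kinds = {type(item) for item in self.items}
        # Atoms of one type make a simple list; anything else is general (None)
        return kinds.pop() if len(kinds) == 1 else None

    def __setitem__(self, i, value):
        if self.kind is not None and type(value) is not self.kind:
            raise TypeError("type")            # simple list: exact match required
        self.items[i] = value
        if self.kind is None:
            self.kind = self._uniform_type()   # a general list may collapse to simple


L = QList([1, 2, 3, "a"])
L[3] = 4                 # the list collapses to a simple list of int
try:
    L[3] = "a"           # so restoring the symbol now fails
except TypeError as err:
    print("'" + str(err))

G = QList([1, 2, 3, "a", NULL_ITEM])
G[3] = 4
G[3] = "a"               # the null item keeps G general, so this succeeds
print(G.items[3], G.kind)
```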
Lists can be created from variables.
L1:(1;2;100 200)
L2:(1 2 3;`ab`c)
L6:(L1;L2)
L6
(1;2;100 200)
(1 2 3;`ab`c)
We scoop our presentation of operations in the next chapter to describe an important operation on lists. Probably the most common operation on two lists is to join them together to form a larger list. More precisely, the join operator (,) appends its right operand to the end of the left operand and returns the result. It accepts an atom in either argument.
1 2,3 4 5
1 2 3 4 5
1,2 3 4
1 2 3 4
1 2 3,4
1 2 3 4
Observe that if the arguments are not of uniform type, the result is a general list.
1 2 3,4.4 5.5
1
2
3
4.4
5.5
1 2 3,"ab"
1
2
3
"a"
"b"
Note: To accept either a scalar or a list x and produce a uniform shape, use the idiom,
(),x
which always yields a list with the content of x.
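In Python terms, join behaves like list concatenation once any atom has been wrapped in a list, which is what `(),x` does. A sketch with hypothetical helpers `as_list` and `join` (note that a Python `str` is treated as an atom here, whereas a q string is a list of char):

```python
def as_list(x):
    """Analogue of the q idiom (),x: wrap an atom, pass a list through (copied)."""
    return list(x) if isinstance(x, list) else [x]


def join(left, right):
    """Analogue of q's join (,): either argument may be an atom."""
    return as_list(left) + as_list(right)


print(join([1, 2], [3, 4, 5]))
print(join(1, [2, 3, 4]))
print(join([1, 2, 3], 4))
print(as_list(42), as_list([1, 2]))
```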
Thus far, we have viewed a list as a static collection of its items. We can also consider a list to be a mapping provided by item indexing. Specifically, a list L of count n represents a monadic mapping over the domain of non-negative integers 0,...,n-1. The list mapping assigns the output value L[i] to the input value i. Succinctly, the I/O association for the list is,
i -> L[i]
Here are the I/O tables for some basic lists:
101 102 103 104
I | O
0 | 101
1 | 102
2 | 103
3 | 104
(`a; 123.45; 1b)
I | O
0 | `a
1 | 123.45
2 | 1b
(1 2; 3 4)
I | O
0 | 1 2
1 | 3 4
The first two examples have ranges consisting of atoms. The last example has a range consisting of lists.
A list not only looks like a map, it is a map whose notation is a shortcut for the I/O table assignment. This is a useful way of looking at things. We shall see in Primitive Operations that a nested list can be viewed as a multivalent map whose range is atoms.
From the perspective of list as map, the fact that indexing outside the bounds of a list returns null means the map is implicitly extended to the domain of all integers with null values outside the list items.
Data complexity is built by using lists as items of lists.
Now that we're comfortable with simple lists, we return to general lists. We can nest by including lists as items of lists. The number of levels of nesting for a list is called its depth. Atoms are considered to have depth 0 and simple lists have depth 1.
The notation of complex lists reflects their nesting. For pedagogical purposes, in this section, we shall often use general notation to define even simple lists; however, the console always displays lists in simplified form. In subsequent sections, we shall use only simplified notation for simple lists.
Following is a list of depth 2 that has three items, the first two being atoms and the last a list.
L1: (1;2;(100;200))
count L1
3
Following is the simplified notation for the inner list,
L1:(1;2;100 200)
L1
1
2
100 200
We present a pictorial representation that may help in visualizing levels of nesting. An atom is represented as a circle containing its value. A list is represented as a box containing its items. A general list is a box containing boxes and atoms.
Following is a list of depth two having two elements, each of which is a simple list,
L2:((1;2;3);(`ab;`c))
L2
1 2 3
`ab`c
count L2
2
Following is a list of depth two having three elements, each of which is a general list,
L3:((1;2h;3j);("a";`bc);(1.23;4.56e))
L3
(1;2h;3j)
("a";`bc)
(1.23;4.55999994278e)
count L3
3
Following is a list of depth two having one item that is a simple list,
L4:enlist 1 2 3 4
L4
1 2 3 4
count L4
1
L4[0]
1 2 3 4
Following is a list of depth three having two items. The second item is a list of depth two having three items, the last of which is a simple list of four items.
L5:(1;(100;200;(1000;2000;3000;4000)))
L5
1
(100;200;1000 2000 3000 4000)
count L5
2
count L5[1]
3
Following is a "rectangular" list that can be thought of as a 3x4 matrix,
m:((11;12;13;14);(21;22;23;24);(31;32;33;34))
m
11 12 13 14
21 22 23 24
31 32 33 34
It is possible to index directly into the items of a nested list.
Retrieving an item via a single index always retrieves an uppermost item from a nested list.
L:(1;(100;200;(1000;2000;3000;4000)))
L[0]
1
L[1]
100
200
1000 2000 3000 4000
Recalling that q evaluates expressions from right-to-left, we interpret the second retrieval above as,
· Retrieve the item at index 1 from L
Alternatively, reading it functionally as left-of-right,
· Retrieve from L the item at index 1
Since the result L[1] is itself a list, we can retrieve its elements using a single index.
L[1][2]
1000 2000 3000 4000
Read this as:
· Retrieve the item at index 2 from the item at index 1 in L
or,
· Retrieve the item at index 1 from L, and from it retrieve the item at index 2
We can repeat single indexing once more to retrieve an item from the innermost nested list.
L[1][2][0]
1000
Read this as,
· Retrieve the item at index 0 from the item at index 2 in the item at index 1 in L
or,
· Retrieve the item at index 1 from L, and from it retrieve the item at index 2, and from it retrieve the item at index 0
There is an alternate notation for repeated indexing into the constituents of a nested list. The last retrieval can also be written as,
L[1;2;0]
1000
Retrieving inner items from a nested list with this notation is called indexing at depth.
Important: The semicolons in indexing at depth are critical.
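To make the equivalence concrete, here is a small Python model (Python rather than q, purely for illustration): indexing at depth with atom indices behaves like repeated single indexing, outermost index first. The helper name at_depth is hypothetical, and nested Python lists stand in for q lists.

```python
def at_depth(lst, *indices):
    """Model of q's L[i;j;k] with atom indices: apply each index in turn, outermost first."""
    result = lst
    for i in indices:
        result = result[i]
    return result

# The nested list from the text: (1;(100;200;(1000;2000;3000;4000)))
L = [1, [100, 200, [1000, 2000, 3000, 4000]]]

print(at_depth(L, 1, 2, 0))   # 1000, the same item as L[1][2][0]
```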
Assignment via index also works at depth.
L:(1;(100;200;(1000 2000 3000 4000)))
L[1;2;0]:999
L
1
(100;200; 999 2000 3000 4000)
To verify that the notation for indexing at depth is reasonable, we return to our matrix example,
m:((11;12;13;14);(21;22;23;24);(31;32;33;34))
m[0;2]
13
m[0][2]
13
The indexing at depth notation suggests thinking of m as a multi-dimensional matrix, whereas repeated single indexing suggests thinking of m as an array of arrays. Chacun à son goût.
A list of positions can be used to index a list.
In this section, we begin to see the power of q for manipulating lists. We start with,
L1:100 200 300 400
We know how to index single items of the list,
L1[0]
100
L1[2]
300
By extension, we can retrieve a list of multiple items via multiple indices,
L1[0 2]
100 300
The indices can be in any order, and the corresponding items are retrieved,
L1[3 2 0 1]
400 300 100 200
An index can be repeated,
L1[0 2 0]
100 300 100
Some more examples,
bits:01101011b
bits[0 2 4]
011b
chars:"beeblebrox"
chars[0 7 8]
"bro"
This explains why including the semicolon separators is essential when indexing at depth. Leaving them out effectively specifies multiple indices, and you will get a corresponding list of values from the top level as a result.
You have no doubt noticed that retrieving items via multiple indices looks just like we've substituted a list for the index. Indeed, this is exactly what is happening. Here are some examples of a simple index list,
I:3 2 0
L1[I]
400 300 100
L2:(-10.0;3.1415e;1b;`abc;"z")
L2[I]
`abc
1b
-10f
L3:(1;(100;200;(1000;2000;3000;4000));5;(600 700))
L3
1
(100;200;1000 2000 3000 4000)
5
600 700
J:2 1 0
L3[J]
5
(100;200;1000 2000 3000 4000)
1
Observe that in every case, the result of indexing a given list via a simple list is a new list whose values are retrieved from the first level of the given list and whose shape is the same as that of the index list. This suggests the behavior with an index that is a non-simple list.
L1:100 200 300 400
L1[(0 1; 2 3)]
100 200
300 400
I:(1;(0;(3 2)))
L1[I]
200
(100;400 300)
To figure out the result of indexing by any non-simple list, start with the fact that the result always has the same shape as the index.
Advanced: More precisely, the result of indexing via a list conforms to the index list. The notion of conformability of lists is defined recursively. All atoms conform. Two lists conform if they have the same number of items and each of their corresponding items conform. In plain language, two lists conform if they have the same shape.
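The recursive rule can be sketched in a few lines of Python (a model of the behavior, not q itself; the function name index is hypothetical). An atom index picks one top-level item, and a list index is mapped over item by item, so the result automatically conforms to the index.

```python
def index(lst, idx):
    """Model of q's L[I]: an atom index retrieves one top-level item;
    a list index is mapped over recursively, so the result conforms to idx."""
    if isinstance(idx, list):
        return [index(lst, i) for i in idx]
    return lst[idx]

L1 = [100, 200, 300, 400]
print(index(L1, [3, 2, 0]))          # [400, 300, 100]
print(index(L1, [[0, 1], [2, 3]]))   # [[100, 200], [300, 400]]
print(index(L1, [1, [0, [3, 2]]]))   # [200, [100, [400, 300]]]
```

The last line mirrors L1[(1;(0;(3 2)))] above: the nesting of the result follows the nesting of the index.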
Recall that a list item can be assigned via item indexing,
L:100 200 300 400
L[0]:1000
L
1000 200 300 400
Assignment via index extends to indexing via a simple list.
L:100 200 300 400
L[1 2 3]:2000 3000 4000
L
100 2000 3000 4000
Note: Assignment via a simple index list is processed in index order, i.e., from left to right. Thus,
L[3 2 1]:999 888 777
is equivalent to,
L[3]:999
L[2]:888
L[1]:777
Consequently, in the case of a repeated item in the index list, the right-most assignment prevails.
L:100 200 300 400
L[0 1 0 3]:1000 2000 3000 4000
L
3000 2000 300 4000
You can assign a single value to multiple items in a list by indexing on a simple list and using an atom for the assignment value.
L:100 200 300 400
L[1 3]:999
L
100 999 300 999
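The assignment rules above can be modeled in Python as follows. This is a sketch: the function assign is hypothetical, and it returns a new list, whereas q's L[I]:v updates the variable L itself. Processing the pairs left to right is what makes the last value win for a repeated index.

```python
def assign(lst, idx, values):
    """Model of q's L[I]:v for a simple index list I (returns a new list;
    in q the assignment updates the variable L itself).
    Items are assigned left to right, so for a repeated index the last value wins.
    An atom v is assigned to every indexed position."""
    out = list(lst)
    if not isinstance(values, list):
        values = [values] * len(idx)
    if len(values) != len(idx):
        raise ValueError("length: index and value lists must have the same count")
    for i, v in zip(idx, values):
        out[i] = v
    return out

L = [100, 200, 300, 400]
print(assign(L, [0, 1, 0, 3], [1000, 2000, 3000, 4000]))   # [3000, 2000, 300, 4000]
print(assign(L, [1, 3], 999))                              # [100, 999, 300, 999]
```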
Now that we're familiar with retrieving and assigning via an index list, we introduce a simplified notation. It is permissible to leave out the brackets and juxtapose the list and index with a separating blank. Some examples follow.
L:100 200 300 400
L[0]
100
L 0
100
L[2 1]
300 200
L 2 1
300 200
I:2 1
L[I]
300 200
L I
300 200
L[::]
100 200 300 400
L ::
100 200 300 400
Which notation you use is a matter of personal preference. In this manual, we usually use brackets, since this notation is probably most familiar from verbose programming. Experienced q programmers often use juxtaposition since it reduces notational density.
The dyadic primitive find ( ? ) returns the index of the first occurrence of the right operand in the left operand list.
1001 1002 1003?1002
1
Performing find on a list is the inverse of positional indexing because it maps an item to its position.
If you try to find an item that is not in the list, the result is an int equal to the count of the list.
1001 1002 1003?1004
3
The way to think of this result is that the position of an item that is not in the list is one past the end of the list, which is where it would be if you were to append it to the list.
Of course, find extends to lists of items.
1001 1002 1003?1003 1001
2 0
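A Python model of find on a simple list makes the "one past the end" convention explicit (the name find is hypothetical, and nested left operands are out of scope for this sketch):

```python
def find(lst, item):
    """Model of q's lst?item for a simple list lst: the index of the first occurrence
    of item, or count(lst) (one past the end) if it is absent.
    A list on the right is looked up item by item."""
    if isinstance(item, list):
        return [find(lst, x) for x in item]
    for pos, x in enumerate(lst):
        if x == item:
            return pos
    return len(lst)

ids = [1001, 1002, 1003]
print(find(ids, 1002))          # 1
print(find(ids, 1004))          # 3
print(find(ids, [1003, 1001]))  # [2, 0]
```

Because find is the inverse of indexing, ids[find(ids, x)] gives x back whenever x is present.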
We return to the situation of indexing at depth for nested lists. For simplicity, let's start with a list that looks like a matrix.
m:(1 2 3 4; 100 200 300 400; 1000 2000 3000 4000)
Analogy with traditional matrix notation suggests that we could retrieve a row or column from m by providing a "partial" index at depth. Indeed, this works.
m[1;]
100 200 300 400
m[;3]
4 400 4000
Observe that eliding the last index reduces to item indexing at the top level.
m[1;]
100 200 300 400
m[1]
100 200 300 400
Note: In the previous example, the two syntactic forms have the same result, but the first more clearly connotes the situation.
Eliding an index other than the last is more interesting. The way to read m[;3] above is,
· Retrieve the items in the fourth position (index 3) from all items at the top level of m
Let's tackle another level of nesting.
L:((1 2 3;4 5 6 7);(`a`b`c`d;`z`y`x`;`0`1`2);("now";"is";"the"))
L
(1 2 3;4 5 6 7)
(`a`b`c`d;`z`y`x`;`0`1`2)
("now";"is";"the")
L[;1;]
4 5 6 7
`z`y`x`
"is"
L[;;2]
3 6
`c`x`2
"w e"
Interpret L[;1;] as,
· Retrieve all items in the second position of each list at the top level
Interpret L[;;2] as,
· Retrieve the items in the third position for each list at the second level
Observe that in L[;;2] the attempt to retrieve the item at the third position of the string "is" resulted in the null value " "; hence the blank in "w e" of the result.
Recommendation: In general, it will make things more evident if you do not omit trailing semicolons when eliding indices. For example, with L as above,
L[ ;;] / instead of L[]
L[1;;] / instead of L[1]
L[;1;] / instead of L[;1]
As the final exam for this section, let's combine an elided index with indexing by simple lists. Let L be as above. Then we can retrieve a cross-section of L using a combination of elided and list indices.
L[0 2;;0 1]
(1 2;4 5)
("no";"is";"th")
Interpret this as,
· Retrieve the items at positions 0 and 1 from every list in rows 0 and 2
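The elision rules can be captured by one recursive function. The following Python sketch is a model, not q: None stands for an elided index, a Python list of ints for a simple index list, and an out-of-range index returns None in place of q's null. The names at and index_elided are hypothetical, and strings come back as lists of one-character strings.

```python
def at(x, i):
    """Single index; out of range gives None, standing in for q's null."""
    return x[i] if 0 <= i < len(x) else None

def index_elided(x, indices):
    """Model of q's indexing at depth with elided indices.
    In `indices`, None plays the role of an elided index (take every item)
    and a list of ints plays the role of a simple index list."""
    if not indices:
        return x
    first, rest = indices[0], indices[1:]
    if first is None:
        return [index_elided(item, rest) for item in x]
    if isinstance(first, list):
        return [index_elided(at(x, i), rest) for i in first]
    return index_elided(at(x, first), rest)

# The list L from the text; symbols become strings and `z`y`x` ends in an empty (null) symbol.
L = [[[1, 2, 3], [4, 5, 6, 7]],
     [["a", "b", "c", "d"], ["z", "y", "x", ""], ["0", "1", "2"]],
     ["now", "is", "the"]]

print(index_elided(L, [None, 1, None]))          # like L[;1;]
print(index_elided(L, [None, None, 2]))          # like L[;;2]; "is" has no third char, so None
print(index_elided(L, [[0, 2], None, [0, 1]]))   # like L[0 2;;0 1]
```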
In this section, we further investigate the matrix-like lists from the previous section. A "rectangular" list is a list of lists, all having the same count. Understand that this does not mean that a rectangular list is necessarily a traditional matrix, since there can be additional levels of nesting. For example, the following list is rectangular because each of its items has count three, but it is not a matrix.
L:(1 2 3; (10 20; 100 200; 1000 2000))
L
1 2 3
(10 20;100 200;1000 2000)
In a rectangular list, elision of the second index corresponds to generalized row retrieval and elision of the first index corresponds to generalized column retrieval.
r:(`a`b`c;(1 2 3 4;10 20 30 40;100 200 300 400))
r[0;]
`a`b`c
r[;1]
`b
10 20 30 40
Advanced: A rectangular list can be transposed with flip (see flip), meaning that the rows and columns are reflected, effectively reversing the first two indices in indexing at depth. For example, the transpose of L above is,
flip L
1 10 20
2 100 200
3 1000 2000
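As a rough Python model of flip on a rectangular list (the function name and the error raised for non-rectangular input are choices of this sketch, not q's behavior), the transpose swaps the first two indices, which zip does directly:

```python
def flip(rect):
    """Model of q's flip on a rectangular list: item j of item i becomes
    item i of item j, i.e. the first two indices are swapped."""
    if len({len(item) for item in rect}) != 1:
        raise ValueError("flip needs a non-empty rectangular list")
    return [list(col) for col in zip(*rect)]

L = [[1, 2, 3], [[10, 20], [100, 200], [1000, 2000]]]
print(flip(L))   # [[1, [10, 20]], [2, [100, 200]], [3, [1000, 2000]]]
```

Note that only the first two levels are swapped; the inner pairs such as [10, 20] are carried along unchanged.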
Matrices are a special case of rectangular lists and can most easily be defined recursively. A matrix of dimension 1 is a simple list. In the context of mathematical operations, the simple list would have numeric type, but this is not a restriction. The count of a one-dimensional matrix is called its size. In some contexts, a simple one-dimensional matrix is called a vector, its count is called its length, and an atom is called a scalar. Some examples.
v1:1 2 3
v2:98.60 99.72 100.34 101.93
v3:`so`long`and`thanks`for`all`the`fish
For n>1, we define a matrix of dimension n recursively as a list of matrices of dimension n-1, all having the same size. Thus, a matrix of dimension 2 is a list of matrices of dimension 1, all having the same size. If all items in a matrix have the same type, we call this the type of the matrix.
Two-dimensional matrices are frequently encountered and have special terminology. Let m be a two-dimensional matrix. The items of m are its rows. As we have already seen, the ith row of m can be obtained via item indexing as m[i]. Equivalently, we can use an elided index with indexing at depth to obtain the ith row as m[i;].
By laying out the rows of m in tabular form, we realize that the list m[;j] is the jth column of m. Note that the expressions m[i][j] and m[i;j] both retrieve the same item, namely, the element in row i and column j.
Following is an example of a two-dimensional matrix of int, having size 4x3,
m:(1 2 3;10 20 30;100 200 300;1000 2000 3000)
m[0]
1 2 3
m[0;]
1 2 3
m[;2]
3 30 300 3000
m[0][2]
3
m[0;2]
3
The specification of m demonstrates that our approach to matrix definition treats m as a collection of rows, i.e., m is in row order. Since each row is a simple list, the elements of a row are in fact stored in contiguous memory. This makes retrieval of an entire row very fast, but retrieval of a column will be slower since its elements are not contiguous. This choice was made so that list indexing would result in the conventional matrix notation.
Advanced: It is equally valid to consider a one-dimensional array as a column and a two-dimensional array as a collection of column vectors. This would make column retrieval very fast, but index order would be transposed from conventional notation. As we shall see in Tables, a table is in fact a collection of columns that are notationally transposed for convenience. The constraints and calculations of q-sql operate on columns, so they are fast, especially when the columns are vectors (i.e., simple lists). In particular, a simple time series can be represented by two parallel ordered columns, one holding the datetimes and the second holding the associated values. Retrieving and manipulating the points stored in time sequence is faster by orders of magnitude than performing the same operations in an RDBMS that stores data by row with undefined row order.
For completeness, here is an example of a three-dimensional 2x3x3 matrix, i.e., each item of mm is a 3x3 matrix,
mm:((1 2 3;4 5 6;7 8 9);(10 20 30; 40 50 60; 70 80 90))
mm[0]
1 2 3
4 5 6
7 8 9
mm[1;2]
70 80 90
mm[1;;2]
30 60 90
We have seen that matrices in q look and act like their mathematical counterparts. However, they have additional features not available in simple mathematical notation or in many verbose languages. We have seen that a matrix can be viewed and manipulated both as a multi-dimensional array (i.e., indexing at depth) and as an array of arrays (repeated item indexing). In addition, we can extend individual item indexing with indexing via a simple list. With m as above,
m[0 2]
1 2 3
100 200 300
Operators and functions are closely related. In fact, operators are just functions used with infix notation. We cover functions in depth in Functions, but provide a brief overview here. Function evaluation in q uses square brackets to enclose the arguments and semicolons to separate them. Thus the output value of a monadic function f for the input x is written,
f[x]
Similarly, the value of a dyadic function is written,
f[x;y]
The simplest functions are those whose domain and range are atomic data types. These functions are called (what else?) atomic functions.
The normal way of writing addition in mathematics and most programming languages uses an operator with infix notation, that is, a plus symbol between the two operands,
2+3
In q, we can also consider addition to be a dyadic function that takes two numeric arguments and returns a numeric result. You probably wouldn't think twice at seeing,
sum[a;b]
But you might blink at the following perfectly logical equivalent,
+[a;b]
A dyadic function that is written with infix notation is called a verb. This terminology arises from thinking of the left operand as the subject which acts on the right operand as object.
The primitive operators are the built-in atomic verbs, including the basic arithmetic, relation and comparison operators. Some are represented by a single ASCII symbol such as '+', '-', '=', and '<'. Others use compound symbols, such as '<=', '>=', and '<>'. Still others have names, such as 'not' and 'neg'. The extent of operations is not limited to the primitives, since any monadic or dyadic function can be made into a verb.
Any verb, including all the primitive operators, can also use regular function notation. So, in q you can write,
+[2;3]
5
It is even possible, and sometimes useful, to write a binary verb using a combination of infix and functional notation for the two operands. This may look very strange at first,
(2+)[3]
5
It is even possible to write,
(2+)3
5
A fundamental feature of an atomic function or operator is that its domain is extended to lists by item-wise application. Thus, a monadic atomic function is applied to a simple list by operating element-wise on the list. A dyadic atomic operator is extended to operate on an atom and a simple list by applying its operation to the atom and the item in each position of the list. Similarly, a dyadic atomic operator is extended to operate on a pair of simple lists by operating pair-wise on elements in corresponding positions.
Symbolically, let m be a unary atomic verb, op a binary atomic verb, a an atom, L, L1 and L2 simple lists, and i an int index. Then,
i th element of | is |
m[L] | m[L[i]] |
a op L | a op L[i] |
L op a | L[i] op a |
L1 op L2 | L1[i] op L2[i] |
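The four rows of this table can be modeled with a small Python wrapper that extends any two-argument function on atoms item-wise. This is a sketch of the rule, not q's implementation: the name atomic is hypothetical, and a count mismatch simply raises ValueError here.

```python
import operator

def atomic(op):
    """Extend a dyadic function on atoms item-wise, as described for atomic verbs:
    atom op list, list op atom, and list op list (equal counts) are handled
    recursively, so nested lists of numbers work too."""
    def apply(x, y):
        x_list, y_list = isinstance(x, list), isinstance(y, list)
        if x_list and y_list:
            if len(x) != len(y):
                raise ValueError("length")
            return [apply(a, b) for a, b in zip(x, y)]
        if x_list:
            return [apply(a, y) for a in x]
        if y_list:
            return [apply(x, b) for b in y]
        return op(x, y)
    return apply

add = atomic(operator.add)
print(add(99, [100, 200, 300, 400]))                         # [199, 299, 399, 499]
print(add([100, 200, 300, 400], [9, 8, 7, 6]))               # [109, 208, 307, 406]
print(atomic(operator.sub)([[100, 200], [1000, 2000]], 2))   # [[98, 198], [998, 1998]]
```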
For example, the result of applying neg to a simple list is obtained by application to each item of the list.
L:100 200 300 400
neg L
-100 -200 -300 -400
The result of adding an atom to a simple list is obtained by adding the atom to each item of the list.
99+L
199 299 399 499
The result of adding two simple lists of the same length is the addition of items at corresponding positions.
L1:100 200 300 400
L2:9 8 7 6
L1+L2
109 208 307 406
Recall that mathematical notation and verbose programming languages have a concept of operator precedence, which attempts to resolve ambiguities in the evaluation of arithmetic and logical operations in expressions. The arithmetic precedence rules were drummed into you in elementary school: multiplication and division are equal and come before addition and subtraction, etc. There are similar precedence rules for =, <, >, 'and' and 'or'.
Although the traditional notion of operator precedence has the weight of many years of incumbency (not to mention the imprecations of your fifth grade math teacher), it's time to throw the bum out. As mentioned in Atoms, q has no rules for operator precedence. Instead, it has one simple rule for evaluating any expression:
Expressions are evaluated left of right
We could also say "right to left" since the interpreter evaluates an expression from right to left. However, every action in q is essentially a function evaluation, and it is more natural to read "f of x" rather than "x evaluated by f". Thinking functionally makes "of" a paradigm, not just a preposition.
The adoption of left-of-right expression evaluation frees q to treat infix notation simply and uniformly. Which notation is used, infix or functional, depends on what is clearer in the specific context.
Left-of-right expression evaluation also means that there is no ambiguity in any expression. (This is from the compiler's perspective; it is certainly possible to write q expressions comprehensible only to the compiler and the q gods.) Parentheses can still be used to override the default evaluation order, but there will be far fewer of them once you abandon the old (bad) habit of using them to override operator precedence. You should arrange your expressions with a goal of placing parentheses on the endangered species list.
Due to left-of-right evaluation, parentheses are needed to isolate the result of an expression that is the left operand of a verb. Omitting such parentheses is a common error for q newbies, as this grouping is often unnecessary in verbose languages.
Here is a canonical example, where < and > have their usual meanings. As we shall see shortly, the | operator returns the maximum of its operands; this reduces to "or" for binary types. It is a rite of passage for q newbies to write the first expression intending the second,
x:100
x<42|x>98
0b
(x<42)|x>98
1b
The first expression parses from right to left as:
· x is tested against 98 by greater than, yielding 1b, which is compared for the larger with 42, yielding 42, against which x is tested by less than, yielding 0b.
The second expression parses from right to left as:
· x is tested against 98 by greater than, yielding 1b, which is compared for the larger with 0b (the result of testing x against 42 by less than), yielding 1b.
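The two parses above can be reproduced with a toy Python evaluator for flat expressions (a model only: the token format and the names VERBS and eval_right_to_left are hypothetical, and parentheses are handled by evaluating the parenthesized part beforehand). There is no precedence table; the rightmost verb is applied first, and its result becomes the right operand of the verb to its left.

```python
import operator

# Binary verbs used in the examples; | is max, which reduces to "or" on booleans.
VERBS = {"<": operator.lt, ">": operator.gt, "|": max, "&": min,
         "+": operator.add, "*": operator.mul}

def eval_right_to_left(tokens):
    """Evaluate a flat list [operand, verb, operand, ..., operand] with no precedence:
    the rightmost verb is applied first, and each verb's right operand is the
    value of everything to its right."""
    result = tokens[-1]
    for i in range(len(tokens) - 2, 0, -2):   # walk leftward over (operand, verb) pairs
        verb, left = tokens[i], tokens[i - 1]
        result = VERBS[verb](left, result)
    return result

x = 100
print(eval_right_to_left([x, "<", 42, "|", x, ">", 98]))   # False, like x<42|x>98
print(eval_right_to_left([x < 42, "|", x, ">", 98]))       # True, like (x<42)|x>98
print(eval_right_to_left([2, "*", 1, "+", 1]))             # 4, like 2*1+1
```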
Should this seem unnatural, don't worry. Once you complete this chapter, revisit here and it'll feel right as rain.
Operator precedence is quite feeble in that it requires all the components of an expression to be analyzed (think for a moment about how you do it manually) before it can be evaluated. Ironically, it results in the frequent use of parentheses to override the very rules that are purportedly there to help.
Even more damning is that operator precedence forces semantic content onto infix notation. Suppose a programming language wished to allow dyadic functions to be verbs, i.e., expressed in infix notation, so that
f[x;y]
can also be written,
x f y
This would entail the extension of precedence rules to cover verbs whenever they are mixed with arithmetic operations. Aside from being impractical, this would result in yet more parentheses.
The non-atomic, binary match operator ( ~ ) applies to any two entities, returning a boolean result of 1b if they are identical and 0b otherwise. For two entities to match, they must have the same shape, the same type and the same value(s), but they may occupy separate storage locations. Colloquially, clones are considered identical in q because they are indistinguishable.
Advanced: This differs from the notion of identity in some verbose languages, in that distinct q entities can be identical. For example, in languages of C ancestry, objects are equal if and only if their underlying pointers address the same memory location. Identical twins are not equal. You must write your own equivalence method to determine if one object is a deep copy of another.
There are no restrictions as to the type or shape of the two operands for match. Try to predict each of the following results of match,
42~42
1b
42~42h
0b
42f~42.0
1b
42~`42
0b
`42~"42"
0b
4 2~2 4
0b
42~(4 2;(1 0))
0b
(4 2)~(4; 2*1)
1b
(1 2; 3 4)~(1; 2 3 4)
0b
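A Python model of match (the name match is hypothetical) compares type, shape and values recursively, while ignoring storage location. Python's int and float stand in for two distinct q numeric types, much as 42 and 42h are distinct in q, and Python's == plays the role of q's type-insensitive equality.

```python
def match(a, b):
    """Model of q's ~ : true when a and b have the same type, the same shape
    and the same values; where they are stored does not matter."""
    if isinstance(a, list) or isinstance(b, list):
        return (isinstance(a, list) and isinstance(b, list)
                and len(a) == len(b)
                and all(match(x, y) for x, y in zip(a, b)))
    return type(a) is type(b) and a == b

print(match(42, 42))                            # True
print(match(42, 42.0), 42 == 42.0)              # False True: match checks type, == does not
print(match([4, 2], [4, 2 * 1]))                # True
print(match([[1, 2], [3, 4]], [1, [2, 3, 4]]))  # False
original = [1, 2, 3]
clone = list(original)                          # a separate copy in memory
print(match(original, clone), original is clone)  # True False
```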
While you are learning q, applying match can be an effective way to determine if you have entered what you intended, or to discover whether two different ways of expressing something produce the same result. For example, q newbies often trip over
42~(42)
1b
This technique can be useful in checking intermediate results when debugging (except for the q gods who enter perfect q code every time).
The relational operators are atomic verbs that return boolean results. Relational operations on atomic types have requirements regarding the compatibility of the operands.
We begin with the equality operator ( = ), which differs from match in that it is atomic, so it tests its operands component-wise instead of in their entirety. All atoms of numeric or char type are mutually compatible for equality, but symbols are compatible only with symbols.
Equality is not strict with regard to type, meaning that values of different types with the same underlying value are equal. For example, chars are equal to their underlying numeric values.
42h=2*21
1b
42=42.0
1b
42=(42)
1b
42=0x42
0b
42="*"
1b
A symbol and a character are not compatible, and an error results from the test,
`a="a"
'type
The not-equal primitive is ( <> ).
42<>0x42
1b
Note: The test "not equal" can also be expressed by applying not to the result of testing with =.
a:42
b:98.6
a<>b
1b
not a=b
1b
Note: When comparing floats, q uses multiplicative (relative) tolerance, so values that differ only by floating-point rounding error compare as equal.
r:1%3
r
0.3333333
2=r+r+r+r+r+r
1b
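A relative tolerance can be sketched in Python as follows. The tolerance value 1e-14 is an assumption chosen for illustration, not a documented q constant, and the name tolerant_eq is hypothetical. Plain Python == has no tolerance, so the naive sum may or may not equal 2.0 exactly.

```python
def tolerant_eq(x, y, tol=1e-14):
    """Relative (multiplicative) tolerance: x and y count as equal when their
    difference is tiny compared with their magnitude.
    tol=1e-14 is an assumed value for illustration, not a documented q constant."""
    if x == y:
        return True
    return abs(x - y) <= tol * max(abs(x), abs(y))

r = 1 / 3
total = r + r + r + r + r + r
print(total, total == 2.0, tolerant_eq(total, 2.0))
```

Because the threshold scales with the magnitude of the operands, the same rule behaves sensibly for very large and very small values, which a fixed absolute threshold would not.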
The monadic, atomic relational operator not differs from its equivalent in some verbose languages. It returns a boolean result and has a domain of all numeric, character and temporal types; it is not defined for symbols. The not operator generalizes the reversal of true and false values to any entity having an underlying numeric value by testing its argument against an underlying 0. In other words, it answers the Hamletonian question: to be, or not to be, zero.
The test against zero yields the expected results for boolean arguments.
not 0b
1b
not 1b
0b
More generally, the test against zero applies to any numeric type.
not 42
0b
not 0
1b
not 0j
1b
not 0xff
0b
f:98.6
not f
0b
not 0.0
1b
For char values, not returns false except for the character representing the underlying value of 0.
not "a"
0b
not " "
0b
not "\000"
1b
For date and datetime values, not tests against midnight of Jan 1, 2000, since this is the datetime with underlying value 0.
not 2042.04.02
0b
not 2000.01.01T00:00:00.000
1b
not 2000.01
1b
The last example obtains because omitted temporal constituents default to their underlying numeric 0 values.
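The idea that not simply tests an underlying value against zero can be modeled in Python. In this sketch, which is not q itself, one-character strings stand in for q chars, datetime.date stands in for q dates counted from 2000.01.01, and the names underlying and q_not are hypothetical.

```python
from datetime import date

def underlying(x):
    """Underlying numeric value as described in the text (a Python model):
    chars map to their character code, dates to days since 2000.01.01,
    and numbers (including booleans) stand for themselves."""
    if isinstance(x, str):
        return ord(x)              # single-character strings stand in for q chars
    if isinstance(x, date):
        return (x - date(2000, 1, 1)).days
    return x

def q_not(x):
    """Model of q's not: true exactly when the underlying value is zero."""
    return underlying(x) == 0

print(q_not(0), q_not(42), q_not(" "), q_not("\x00"))   # True False False True
print(q_not(date(2042, 4, 2)), q_not(date(2000, 1, 1)))  # False True
```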
For time values, not tests against 00:00:00.000.
not 00:00:00.000
1b
not 04:02:42.042
0b
We consider the binary atomic order operators. Less than ( < ), greater than ( > ), less or equal ( <= ) and greater or equal ( >= ) are defined for all atoms with the requirement that the operands be of compatible types. Numeric and char types are mutually compatible, but symbols are only compatible with symbols. Comparison for numeric and char types is based on underlying numeric value, independent of type.
4<42
1b
4h>=0x2a
0b
-1.59e<=99j
1b
For char atoms, the underlying numeric value results in comparison according to ASCII character sequence.
"A"<"Z"
1b
"a"<="Z"
0b
"A"<"0"
0b
"?"<"/"
0b
A numeric atom and a char are compared according to the underlying numeric value of the char.
42<"z"
1b
For symbols, comparison is based on lexicographic order.
`a>=`b
0b
`ab<`abc
1b
Now that we are familiar with relational operations on atoms, let's examine their item-wise extensions to simple lists.
2<1 2 3
001b
1 2 3h>=-987.65 1.234 567.89
110b
" "="Life the Universe and Everything"
00001000100000000100010000000000b
"zaphod"="Arthur"
000100b
"zaphod">"Arthur"
100000b
Note: As of this writing (Jun 2007), the primitive > is converted to the equivalent < under the covers by the q interpreter. That is,
a>b
is actually evaluated as,
b<a
This does not matter when a and b are atoms or lists, but it does have consequences when they are dictionaries.
Basic Arithmetic: +, -, *, %
The arithmetic operators are atomic verbs and come in two flavors: binary (in the mathematical sense of having two operands) and unary (one operand). We begin with the four operations of elementary arithmetic.
Symbol | Name | Example |
+ | add | 42+67 |
* | times | 2h*3h |
% | divide | 42%6 |
On the surface, things look pretty much like other programming languages, except that division is represented by % since / is used to delimit comments. We have,
6*7
42
a:42
b:3
c:a-b
c
39
100*a
4200
c%b
13f
Note: The result of division is always a float.
For a programmer not accustomed to left-of-right evaluation, the following may take some getting used to.
2*1+1
4
Things can get funky fast for the q newbie.
c:1000*b:1+a:42
c
43000
One way to read this is:
The integer value 42 is assigned to the variable named a, then the assigned value is added to 1, then this result is assigned to the variable named b, whose assigned value is multiplied by 1000, and the result is assigned to the variable named c.
The arithmetic operations are defined for all numeric types, and all numeric types are compatible. The type of the result depends on the operands. Loosely speaking, smaller types are promoted to their wider cousins and division always results in floats. Typing does not get in the way of arithmetic.
When binary types participate in addition, subtraction and multiplication, they are promoted to int. In other words, arithmetic is not performed modulo 2 for binary values, or modulo 256 for byte values.
1b+1b
2
0x2a+0x11
59
42+1b
43
5*0x2a
210
When integer types are used in addition, subtraction and multiplication, the result is an int or the widest type present, whichever is wider.
a:42
b:123h
c:1234567890j
b+b
246
a+b
165
a+b+c
1234568055j
The result of addition, subtraction and multiplication of integer data types is modulo the width of the result. That is, overflow is ignored. For example, int arithmetic is modulo 2^32.
i:2147483647
i+3
-2147483646
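Python integers never overflow, so a Python model of this behavior has to reduce the exact result to 32 bits explicitly. The helper wrap_int32 below is hypothetical; it uses the usual two's complement mapping.

```python
def wrap_int32(n):
    """Reduce an exact integer to a signed 32-bit value (two's complement),
    modelling int arithmetic in which overflow is ignored."""
    return (n + 2**31) % 2**32 - 2**31

i = 2147483647            # largest signed 32-bit value
print(wrap_int32(i + 3))  # -2147483646
```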
When any numeric types participate in division, they are promoted to float and the result is a float.
1%3
0.3333333
3%1
3f
When floating point data types are mixed, the result is float.
6.0*7.0e
42f
Note: The arithmetic operators are always dyadic. In particular, while ( - ) is also used syntactically to denote a negative number, there is no unary function ( - ) to negate a value. Its attempted use for such generates an error. Use the operator neg for this purpose.
a:-4
a
-4
-a / This is an error
'-
neg a
4
As described earlier for atomic functions, the arithmetic operators are extended item-wise to lists. Thus,
2+100 200 300
102 202 302
b:1000.0 2000.0 3000.0 4000.0
b*2
2000 4000 6000 8000f
c:2 4 6 8
b%c
500 500 500 500f
In the following example, observe that item-wise atomic application is recursive when all the list components are numeric,
e:(100 200;1000 2000)
e-2
98 198
998 1998
The max and min operators are atomic and binary, and return the type of the widest operand. Numeric types and char are mutually compatible; these operations are not defined for symbols.
The max operation ( | ) returns the maximum of its operands based on underlying numeric values; this reduces to logical "or" for binary operands. The min operation ( & ) returns the minimum of its operands based on underlying numeric values; this reduces to logical "and" for binary operands. The same type promotion rules apply as for the arithmetic operators.
0b|1b
1b
1b&0b
0b
42|0x2b
43
4.2e&42j
4.2e
"a"|"z"
"z"
"0"&"A"
"0"
`a|`z / this is an error
'type
Following are examples of max and min extended item-wise to simple lists.
2|0 1 2 3 4
2 2 2 3 4
11010101b&01100101b
01000101b
"zaphod"|"arthur"
"zrthur"
Note: For the symbolically challenged, the operator | can also be written as or. The operator & can be written as and.
1 and 3
1
"a" or "z"
"z"
The atomic unary sqrt has as domain all non-negative numeric values and returns a float representing the square root of its argument.
sqrt 2
1.414214
sqrt 4
2f
sqrt 0x42
8.124038
sqrt -1
0n
The atomic unary exp has as domain all numeric values and returns a float representing the base e raised to the power of its argument.
exp 1
2.718282
exp 4.2
66.68633
exp -12h
6.144212e-06
Note: Do not confuse the 'e' used in the display of scientific notation with the mathematical base of natural logarithms.
The atomic unary log has as domain all numeric values and returns a float representing the natural logarithm of its argument.
log 1
0f
log 0x2a
3.73767
log 0.0001
-9.21034
log -1
0n
The atomic binary xexp has as domain all numeric values in both operands and returns a float representing the left operand raised to the power of the right operand. If the mathematical operation does not make sense, the result is 0n.
2 xexp 5
32f
-2 xexp .5
0n
The atomic binary xlog has as domain all numeric values in both operands and returns a float representing the logarithm of the right operand with respect to the base of the left operand. If the mathematical operation does not make sense, the result is 0n.
2 xlog 32
5f
2 xlog -1
0n
These functions are useful in calculations.
The binary mod is atomic in its left operand (dividend), which is any numeric value. The right operand (divisor) is a numeric atom. The result is the remainder of dividing the dividend by the divisor. This produces the usual remainder from elementary school for positive integers but is somewhat more complex for general numeric arguments.
For a positive divisor, the remainder is defined as the difference between the dividend and the largest integral multiple of the divisor not exceeding the dividend. For a negative divisor, it is the difference between the dividend and the smallest integral multiple of the divisor not less than the dividend, as the last example below shows.
4 mod 3
1
0x2a mod 0x10
10
4.5 mod 2.3
2.2
4.5 mod -2.3
-0.1
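Both cases of this rule collapse into the single formula x - y*floor(x/y), which reproduces every example above. The Python sketch below uses it (the name q_mod is hypothetical); that the formula agrees with q for every input, for instance negative dividends, is an assumption of this sketch. Python's own % operator follows the same floored convention.

```python
import math

def q_mod(x, y):
    """Remainder following the rule in the text: x - y*floor(x/y).
    The result takes the sign of the divisor y."""
    return x - y * math.floor(x / y)

for x, y in [(4, 3), (42, 16), (4.5, 2.3), (4.5, -2.3), (-7, 3)]:
    print(f"{x} mod {y} = {q_mod(x, y)}   (Python %: {x % y})")
```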
The atomic unary signum has as domain all integral and floating point types and returns an int representing the sign of its argument. Here 1 represents "positive", -1 represents "negative" and 0 represents a zero argument.
signum 4.2
1
signum -42
-1
signum 0
0
The atomic unary reciprocal has as domain all numeric types and returns a float representing 1.0 divided by the argument.
reciprocal 0.02380952
42.00001
reciprocal 0
0w
The atomic unary floor has as domain int and floating point types and returns an int representing the largest integer that is less than or equal to its argument.
floor 4
4
floor 4.0
4
floor 4.2
4
floor -4.0
-4
floor -4.2
-5
The floor operator can be used to truncate or round floating point values to a specific number of digits to the right of the decimal point.
a:4.242
0.01*floor 100*a
4.24
0.1*floor 0.5+10*a
4.2
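The two floor idioms above generalize to any number of digits. Here they are as Python helpers, sketched with the hypothetical names truncate_to and round_to:

```python
import math

def truncate_to(x, digits):
    """Like 0.01*floor 100*a for digits=2: drop everything past `digits` decimals."""
    scale = 10 ** digits
    return math.floor(x * scale) / scale

def round_to(x, digits):
    """Like 0.1*floor 0.5+10*a for digits=1: add half a unit, then floor."""
    scale = 10 ** digits
    return math.floor(x * scale + 0.5) / scale

a = 4.242
print(truncate_to(a, 2), round_to(a, 1))
```

These helpers divide by the scale instead of multiplying by 0.01 or 0.1, because those factors have no exact binary representation and multiplying by them can introduce an extra rounding step. Also note that floor rounds toward negative infinity, so truncating a negative value moves it down: truncate_to(-4.242, 2) gives -4.25.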
Note: The floor function does not apply to boolean, byte, or short types.
floor 0x2a
'type
Analogous to floor, the atomic unary ceiling has as domain int, long and floating point types and returns the smallest int that is greater than or equal to its argument.
ceiling 4
4
ceiling 4.0
4
ceiling 4.2
5
ceiling -4.0
-4
ceiling -4.2
-4
Note: For reasons known to the q gods, ceiling does apply to boolean and byte types but not to the short type.
ceiling 0b
0
ceiling 42h
'type
The atomic unary abs has as domain all integral and floating point types. It returns its argument if the argument is greater than or equal to zero, or neg applied to its argument otherwise. The result of abs has the same type as the argument.
abs 4
4
abs -4
4
abs -4.2
4.2
abs -4.0
4f
abs -4.2e
4.2e
abs -4j
4j
We have separated temporal types and their operations into this section because they have richer semantics.
First, we note that a datetime is actually stored under the covers as a signed float, with 0.0 corresponding to midnight of January 1, 2000; a date is stored as the corresponding signed int day count. So,
0.0=2000.01.01T00:00:00.000
1b
The integral part of the floating point value corresponds to the number of days after (positive) or before (negative) the start of the millennium. The decimal portion of a datetime is the fractional portion of a 24-hour day represented by its time component. Thus,
33.5=2000.02.03T12:00:00.000
1b
Time is stored as the number of milliseconds from the start of day. Thus, a time value is between 0 and 86,400,000 (24*60*60*1000). So,
43200000=12:00:00.000
1b
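The same representations can be computed with Python's datetime module, which helps when checking such values by hand. This is a sketch, not q: the helper names and the Q_EPOCH constant are hypothetical, and time_to_ms truncates any sub-millisecond part.

```python
from datetime import datetime, time, timedelta

Q_EPOCH = datetime(2000, 1, 1)   # midnight, January 1, 2000

def datetime_to_days(dt):
    """Float day count from Q_EPOCH: the integral part counts whole days,
    the fraction is the elapsed part of the 24-hour day."""
    return (dt - Q_EPOCH) / timedelta(days=1)

def time_to_ms(t):
    """Milliseconds since the start of the day (microseconds are truncated)."""
    return ((t.hour * 60 + t.minute) * 60 + t.second) * 1000 + t.microsecond // 1000

print(datetime_to_days(datetime(2000, 2, 3, 12)))   # 33.5
print(datetime_to_days(datetime(2001, 1, 1)))       # 366.0, since 2000 is a leap year
print(time_to_ms(time(12)))                         # 43200000
```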
In contrast to some verbose languages, any expression involving temporal types and numerical types that should make sense actually does, and it works in the expected fashion. Comparison of dates or datetimes reduces to comparison of the underlying floating point values. Thus,
2006.01.01T00:00:00.000<2005.12.25T12:00:00.000
0b
2005.12.25=2005.12.25T00:00:00.000
1b
2005.12.25<2005.12.25T12:00:00.000
1b
Time values can be compared with each other and the result is based on the underlying millisecond counts.
12:01:10.987<17:05:42.986
1b
A date and a time can be added to give a datetime.
2007.07.04+12:45:59.876
2007.07.04T12:45:59.876
Note: A time is implicitly converted to a fractional day when it is added to a date to get a datetime.
A date or datetime can be compared, or tested for equality, with a float,
366.0=2001.01.01
1b
A time can be compared with an int.
43200000<12:00:00.001
1b
A float representing a fractional day count can be added toor subtracted from a datetime (or date) to give a datetime. In this context,the integral part of the fractional day count represents the number of days andthe decimal part represents the fractional part of a 24-hour day. For example,to move forward 33 days and 12 hours,
2000.01.01T00:00:00.000+33.5
2000.02.03T12:00:00.000
Or, to move back 2 hours and 30 minutes,
2000.01.01T00:00:00.000-2.5%24
1999.12.31T21:30:00.000
An int representing a day count can be added to or subtracted from a date to give a date.
2006.07.04+5
2006.07.09
The difference of two datetimes is a float representing the fractional day count between them.
2007.02.03T12:00:00.000-2007.01.01T00:00:00.000
33.5
The difference between two dates is an int day count representing the number of days between them.
2006.07.04-2006.04.04
91
An int representing a time count of milliseconds can be added to or subtracted from a time to give a time.
12:00:00.000+1000
12:00:01.000
The difference between two times is an int count of the number of milliseconds between them.
23:59:59.999-00:00:00.000
86399999
Observe that a time does not wrap when it exceeds 24 hours.
23:59:59.999+2
24:00:00.001
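If you want clock-style wrap-around instead, you can reduce the millisecond count modulo one day. The helper wrap below is hypothetical, a sketch that assumes the `int$ cast accepts a time and the `time$ cast accepts the resulting integer.

```q
/ wrap: hypothetical helper that folds a time back into a single day
/ 86400000 is the number of milliseconds in 24 hours
wrap:{`time$(`int$x) mod 86400000}
wrap 23:59:59.999+2      / 00:00:00.001
wrap 12:00:00.000        / 12:00:00.000, already within the day
```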
As you gain experience with the way q handles infinities and nulls, you'll find that it is simpler and more rational than in verbose languages. Injecting such an exceptional value into a calculation stream propagates it through subsequent steps in a predictable way, without the need for special error trapping and handling. While the result will contain some meaningless data, portions that do not depend on the invalid values will still compute correctly.
We now show how to produce and operate with the infinities we met in Infinities and NaN. Division of a positive numeric value by any zero results in float infinity, denoted by 0w.
4.0%0
0w
3.14%0.0
0w
0x32%0
0w
1b%0
0w
Similarly, division of a negative numeric value by any zero results in negative float infinity, denoted by -0w.
-4%0.0
-0w
-3.14%0
-0w
The int infinities cannot be produced by an arithmetic operation on normal int values, since the result of division in q is always of type float.
42%0
0w
-42%0
-0w
When any numeric zero is divided by zero, the mathematical result is undefined. This is sometimes represented in writing as NaN ("not-a-number"). It is denoted in q by 0n, which is the float null value,
0%0
0n
0.0%0.0
0n
0.0e%0b
0n
0j%0x00
0n
The infinities and nulls act reasonably in numeric expressions and comparisons. Generally, if one member of an expression is infinite or null, so is the result. In an arithmetic mix of infinity, null or NaN, the null prevails over infinity, and NaN prevails over other nulls. Note that the signs of infinities are carried correctly through arithmetic, and meaningless expressions involving infinities result in NaN.
2+0w-3
0w
0w*-0w
-0w
-0w+0w
0n
42+0n
0n
42+0N
0N
0w+0n
0n
0n+0N
0n
The exception to the above is that any integral infinity can be added to its negative to yield 0.
-0Wj+0Wj
0j
When nulls occur in expressions of mixed type, the same type promotion rules apply as for finite values.
42+0N
0N
42j+0N
0Nj
0N+0Nj
0Nj
0n+0N
0n
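In practice you usually want to locate or replace missing values once they have propagated. A minimal sketch, assuming a q version that provides the fill operator ^; the list x is just an example.

```q
x:1.5 0n 4.5
x+1          / 2.5 0n 5.5: the null propagates item-wise
null x       / 010b: where the nulls are
0^x          / 1.5 0 4.5: nulls filled with 0
sum x        / 6f: sum skips the missing item
```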
Infinities are distinct from all numeric values and from all nulls as well, since they do not represent missing data. All nulls are equal, since they differ only by type.
42=0W / can compare a numeric value to infinity
0b
0w=42%0 / can compare float infinity to itself
1b
0=0N / 0 is not the same as missing integer
0b
0=0n / 0 is not the same as missing float
0b
0w=0W / float infinity is not the same as int infinity
0b
0w=0N / float infinity is not the same as null integer
0b
0w=0n / float infinity is not the same as missing float
0b
0Nj=0N / missing long and missing int are the same
1b
0N=0n / missing int and missing float are the same
1b
Note: In contrast to some languages, such as C, separate NaNs are equal.
(0%0)=0%0
1b
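The built-in null tests for missing data, so it also separates the two kinds of exceptional value discussed here: infinities are not null, while NaN is. A short sketch:

```q
null (0w;-0w;0n)     / 001b: infinities are values, 0n is missing
null 1 0N 3          / 010b
```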
Advanced: The integral infinities, positive and negative, have underlying values whose bit patterns correspond to legitimate base-2 integral values.
Value | Bit Representation
0Wh | 0111111111111111b
0W | 01111111111111111111111111111111b
0Wj | 0111111111111111111111111111111111111111111111111111111111111111b
Consequently, we find
32767=0Wh
1b
2147483647j=0W
1b
-32767=-0Wh
1b
-2147483647j=-0W
1b
Match is a different story because type matters.
42~0w / can try to match a numeric value to infinity
0b
0w~42%0 / can match infinity to itself
1b
0~0N / 0 does not match a missing integer
0b
0~0n / 0 does not match missing float
0b
0w~0W / float infinity does not match int infinity
0b
0w~0N / infinity does not match missing integer
0b
0w~0n / infinity does not match missing float
0b
0Nj~0N / missing long and missing int do not match
0b
0N~0n / missing int and missing float do not match
0b
The not operator returns 0b for all infinities and nulls, since they all fail the test of equality with 0.
not 0w
0b
not 0W
0b
not 0N
0b
not 0n
0b
The neg operator returns -1 times its operand, so it reverses the sign of infinities but does nothing to nulls, since sign is meaningless for missing data.
neg 0W
-0W
neg -0w
0w
neg 0N
0N
not " "
0b
Comparisons apply to infinities and nulls, as summarized in the following ordering.
nulls < -0w < -0Wj < -0W < -0Wh < numeric values < 0Wh < 0W < 0Wj < 0w
As rules:
Note: These relations characterize the infinities, in the sense that they are larger or smaller than all normal values. The integral infinities have underlying bit patterns corresponding to legitimate base-2 values that yield the above relations. Infinite arithmetic will parse, but the results are not particularly useful. It is recommended that you limit operations on integral infinities to equals, not equals and inequalities.
Some examples,
42<0W
1b
-0w<42.0
1b
-0w<1901.01.01
1b
-0w<0w
1b
0W<0w
1b
-0w<0W
1b
-10000000<0N
0b
0Nj<42
1b
0n<-0w
1b
The null symbol is less than any other symbol.
`a<` / the right side is the null symbol
0b
The behavior of | and & with infinities and nulls derives from that of equality and comparison.
42|0W
0W
-42&0N
0N
0w|0n
0w
-0w&0n
0n
0n|0N
0n
0n&0n
0n
0W&0Wj
2147483647j
The last result obtains because the int infinity is promoted to a long, and its bit pattern corresponds to the value shown.
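A common use of | and & is clamping values into a range. The helper clamp below is hypothetical; note that because nulls sort below everything, a null input ends up at the lower bound.

```q
/ clamp: hypothetical helper limiting x to the range lo..hi
/ evaluated right to left: first cap at hi with &, then floor at lo with |
clamp:{[lo;hi;x] lo|hi&x}
clamp[0;10;-5 3 15]      / 0 3 10
clamp[0;10;0N]           / 0: the null is smaller than every value
```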
An alias is a variable that is defined as an expression involving other variables. This differs from ordinary assignment, which defines a variable as the result of an expression.
Double assignment (::) outside a function defines the left operand as an alias of the right operand. When the alias is referenced, the underlying expression is (re)evaluated. For example, the following defines b as an alias for a. Observe that changing the value of a is reflected in b but not in c:
a:42
b::a
c:a
b
42
c
42
a:98.6
b
98.6
c
42
Aliasing is useful when the underlying expression represents a calculation.
u:4
v:3
w::v+sqrt u
w
5f
u:9
w
6f
The result of aliasing can also be achieved with a function. In the previous example, we could define,
f:{y+sqrt x}
f[4;3]
5f
Aliasing provides convenient variable syntax instead of function semantics, but the dependencies are more evident in the function.
Alias chains are resolved and dependency loops are detected.
a:42
b::a
c::b+1000
b
42
c
1042
a:98.6
b
98.6
c
1098.6
a::c
'loop
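Aliases are handy for derived quantities that should always reflect their inputs. A sketch with the hypothetical names prices and total:

```q
prices:10 20 30
total::sum prices        / alias: recomputed from prices when referenced
total                    / 60
prices,:40               / append 40 to prices in place
total                    / 100
```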
Advanced: Aliasing can be used to provide a view in a database by specifying a query as the right operand. For example,
t:([]c1:`a`b`c`a;c2:20 15 10 20;c3:99.5 99.45 99.42 99.4)
va::select sym:c1,px:c3 from t where c1=`a
va
sym px
--------
a 99.5
a 99.4
Double assignment establishes a dependency of the alias on the entities in its underlying expression. For example,
u:4
v:3
w::u+v
establishes a dependency of w on u and v. Q maintains a list of dependencies in the dictionary .z.b.
.z.b
u| w
v| w
Each entity in the domain of .z.b is mapped to the entities that depend on it. If we add an alias z of u in our example, we find,
z::u
.z.b
u| w z
v| w
Advanced: The table dependencies implicit in views are not reflected in .z.b.
t:([]c1:`a`b`c`a;c2:20 15 10 20;c3:99.5 99.45 99.42 99.4)
s:select c1,c3 from t where c2=20
.z.b
u| w z
v| w
In this chapter, we cover functions in depth. Before starting, you may wish to review the Mathematical Functions Refresher if it has been a while since your last encounter with mathematical functions.
Appendix A contains specifics and examples of all the q built-in functions. We shall use built-in functions in the following sections without introduction; simply look them up in Appendix A.
The notion of a function in q corresponds to a (mathematical) map that is specified by an algorithm. A function is a sequence of expressions to be evaluated, having optional input parameters and a return value. Application of a function is the process of evaluating the expressions in sequence, substituting actual arguments for any formal parameters. If a return value is specified, the function evaluates to its return value.
Advanced: Because a q function can access global variables, the corresponding mathematical mapping actually includes the workspace as an implicit parameter. In other words, q is not a pure functional language, because functions can have side effects.
The distinguishing characteristic of function definition is a matching pair of braces { and } enclosing a sequence of expressions separated by semicolons. In contrast to verbose languages, a function's input parameters and return value are not typed. In fact, they don't even need to be declared explicitly. Even the function name is optional.
Following is a full specification of a function that returns the square of its input. Observe that we have added optional whitespace after the parameter list for readability.
f:{[x] x*x}
You call f by enclosing its actual parameter in square brackets,
f[3]
9
Here is a compact form of an equivalent function evaluation in which optional aspects are omitted,
{x*x}[5]
25
The notation for function definition is,
{[p1;...;pn]e1; ...; en}
where the optional p1, ..., pn are formal parameters and e1, ..., en is a sequence of expressions to be evaluated in left-to-right sequence.
For readability, we shall normally insert optional whitespace after the square bracket that closes the parameter list, as well as after each semicolon separator. Other styles may differ.
Note: The reason the expressions in a function are evaluated in left-to-right sequence is so that the sequence becomes top-to-bottom when the function definition is split across multiple lines. Specifically, right-to-left expression evaluation would result in the following definition,
f:{[p1;...;pn]
e1;
...;
en}
being evaluated from bottom to top, which would be very unnatural.
The number of formal input parameters, either implicit or explicit, is the function's valence. Most common are monadic (valence 1) and dyadic (valence 2). You specify a function with no parameters (niladic) with an empty argument list,
{[] ...}
Important: The maximum valence currently permitted is 8, so specifying more than eight parameters will cause an error. You can circumvent this restriction by encapsulating multiple parameters in a list argument.
Recommendation: Q functions should be compact and modular: each function should perform a well-defined unit of work. Due to the power of q operators and built-in functions, helper functions are often one-liners. When a function exceeds 20 expressions, you should ask yourself whether it can be factored.
Variables that are defined within the expression(s) of a function are called local variables.
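The "return value" of a function is the value carried by the function evaluation. It is determined as follows. If an explicit return of the form :e is evaluated, evaluation stops and the value of e is returned. Otherwise, the value of the last expression evaluated is returned. If that last expression is empty (the body ends with a semicolon), the function returns the generic null ::.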
For example, the following function specifications resultin the same input-output mapping.
f1:{[x] :x*x} / explicit return
f2:{[x] r:x*x} / local variable is returned
f3:{[x] x*x} / last expression is result
So does this one, even though it includes useless and unexecuted expressions.
f4:{[x] a:1;:x*x;3}
Advanced: In contrast to k, the q operators are not overloaded on valence, meaning that an operation does not have different functionality for different numbers of arguments. However, some q operators (and built-in functions) are overloaded on the types of the arguments, or even the sign of the arguments. For example, to understand the exact use of (?), you must carefully examine the operands.
If you omit the formal parameters and their brackets, three implicit positional parameters x, y and z are automatically available in the function's expressions. Thus, the following two specifications are equivalent:
f:{[x] x*x}
g:{x*x}
And so are,
f:{[x;y] x+y}
g:{x+y}
When using implicit parameters, x is always the first actual argument, y the second and z the third. The following function g cannot be fully evaluated unless it is called with three arguments; called with only two, it yields a projection rather than a result.
g:{x+z} / likely meant x+y; requires 3 parms in call
g[1;2] / projection...still waiting for the third parameter
{x+z}[1;2]
g[1;2;3] / OK...2nd value is required but ignored
4
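You can confirm at the console that the two-argument call produces a function rather than a value. A sketch (the name p is ours); type returns 104h for a projection:

```q
g:{x+z}
p:g[1;2]        / p is a projection waiting for z
type p          / 104h
p 3             / 4, the same as g[1;2;3]
```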
Recommendation: If you use the names x, y and z in a function, reserve them for the first three parameters, either explicit or implicit. Any other use will almost certainly lead to confusion, if not to trouble.
A function can be defined without being assigned to a variable. Such a function is called anonymous, since it cannot be evaluated by name.
{x+y}[4;5]
9
An anonymous function can be appropriate when it will be evaluated in only one location. A prevalent use is in-line helper functions within other functions.
f:{[...] ...; {...}[...]; ...}
It is arguably more readable to extract anonymous functions.
g:{...}
f:{...; g[...]; ...}
This is a matter of coding style.
The identity function :: returns its argument. It is useful for specifying defaults when using the functional forms of amend and select.
Important: The identity function cannot be used with juxtaposition.
::[`a]
`a
::[1 2 3]
1 2 3
:: 42
'
The q entities we have met until now have been either nouns or verbs. Atoms and lists are nouns. Operators are verbs. In the following expression,
a:1+L:100 200 300
a, L and the literals 100, 200, 300 are nouns, while the assign and plus operators are verbs.
It may come as a surprise that functions are also nouns. We can write,
a:3
f:{[x] 2*x}
a:f
a 3
6
Operators used as functions are also nouns, so continuing the previous example we can also write,
L:(f;+)
L
{[x] 2*x}
+
Note: The display of L illustrates that a function name is resolved to its body at the time of assignment. If the definition of f is subsequently modified, L will not change.
A variable that is defined by assignment in an expression in a function is called a local variable. For example, a is a local variable in the following function.
f:{a:42; a+x}
A local variable exists only from the time it is first assigned until the completion of the enclosing function's evaluation; it has no value until it is actually assigned. Provided there is no variable a already assigned in the workspace, evaluation of the function does not create such a variable. Using f as above,
f[6]
48
a
'a
Variables that have been assigned outside any function definition are called global variables.
b:6
f:{x*b}
f[7]
42
To assign a global variable inside a function, use a double colon (::), which tells the interpreter not to create a local variable with the same name.
b:6
f:{b::7; x*b}
f[6]
42
b
7
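A typical use of :: inside a function is maintaining state across calls. A minimal sketch with a hypothetical global counter n and helper incr:

```q
n:0
incr:{n::n+x; n}    / add x to the global n and return the new total
incr 5              / 5
incr 2              / 7
n                   / 7: the global was updated
```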
When a local variable is defined with the same name as a global variable, the global variable is obscured.
a:42
f:{a:98; x+a}
f[6]
104
a
42
Important: When local and global names collide, the global variable is always obscured. Even double colon assignment affects the local variable. For example,
a:42
f:{a:6;a::98; x*a}
f[6]
588
a
42
We have already seen the basic form of assignment,
a:42
Programmers from languages with C heritage will be familiar with expressions such as,
x += 2; // C expression representing amend
which is shorthand for,
x = x + 2; // C expression
This is usually read simply as "add 2 to x" but more precisely is "assign to x the result of adding 2 to the current value of x." This motivates the interpretation of such an operation as "amend," in which x is re-assigned the value obtained by applying the operation + to the operands x and 2. By implication, a variable can only be amended if it has been previously assigned.
In q, the equivalent of the above C expression uses +: as the operator.
x:42
x+:2
x
44
There is nothing special about + in the above discussion. Amend is available with any binary verb, as long as the operand types are compatible.
a:42
a-:1
a
41
We shall see interesting examples of amend with other operators in later chapters.
This capability to amend in one step extends to lists and indexing,
L1:100 200 300 400
L1[1]+:9
L1
100 209 300 400
L1[0 2]+:99
L1
199 209 399 400
L1:100 200 300 400
L1[0 1 2]+:1 2 3
L1
101 202 303 400
L2:(1 2 3; 10 20 30)
L2[;1]+:9
L2
1 11 3
10 29 30
L2:(1 2 3; 10 20 30)
L2[0;1]+:100
L2
1 102 3
10 20 30
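Amend works the same way when the indexed entity is a dictionary, which makes simple tallies easy. A sketch, with d a hypothetical dictionary of counters:

```q
d:`a`b!0 0
d[`a]+:1
d[`a]+:1
d[`b]+:1
d                   / displays a| 2 and b| 1
```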
Note: Amend enforces strict type matching with simple lists, since the result must be placed back into the list,
L1[0]+:42f
'type
Sometimes a function of valence two or more is evaluated repeatedly while some of its arguments are held constant. For this situation, a multivalent function can have one or more arguments fixed, and the result is a function of lower valence called the projection of the original function onto the fixed arguments. Notationally, a projection appears as a function call with the fixed arguments in place and nothing in the other positions.
For example, the dyadic function which returns thedifference of its arguments,
diff:{[x;y] x-y}
can be projected onto the first argument by setting it to 42, written as,
diff[42;]
The projected function is the monadic function "subtract from 42",
diff[42;][6]
36
This projection is equivalent to,
g:{[x] 42-x}
g[6]
36
We can also project diff onto its second argument to get "subtract 42",
diff[;42][6]
-36
which is equivalent to,
h:{[x] x-42}
When a function is projected onto any argument other than the last, the trailing semicolons can be omitted. Given diff as above,
diff[42][6]
36
Recommendation: It will make your intent more evident if you do not omit trailing semicolons when projecting. For example, with diff as above, a reader will immediately recognize the projection,
diff[42;][6] / instead of diff[42][6]
The brackets denoting a function projection are required, but the additional brackets in the projection's evaluation can be omitted with juxtaposition (as for any regular function).
diff[;42] 6
-36
diff[42] 6
36
Which notation to use is a matter of coding style.
A binary verb can also be projected onto its left argument, although the notation may take some getting used to. For example, the projection of - onto its left argument is,
(42-)6
36
A verb cannot be projected onto its right argument, since this would lead to notational ambiguity. For example, (-42) is the atom -42 and not a projection.
(-42)
-42
If you really want to project onto the right argument of an operator, you can do so by using the dyadic function form and juxtaposition of the argument.
-[;42] 98
56
In fact, the whitespace is not necessary in this example.
-[;42]98
56
We warned you about the notation.
When the original function has valence greater than two, it is possible to project onto multiple arguments simultaneously. For example, given,
f:{x+y+z}
we can project f onto its first and third arguments and end up with a monadic function,
f[1;;3][5]
9
We arrive at the same result by taking the projection f[1;;] (now a dyadic function) and projecting it onto its second argument to arrive at f[1;;][;3].
f[1;;][;3][5]
9
This is equivalent to projecting in the reverse order,
f[;;3][1;][5]
9
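Projections are a convenient way to name specialized versions of a general function. A sketch, where scale and double are hypothetical names:

```q
scale:{[k;x] k*x}
double:scale[2;]        / project onto the first argument
double 1 2 3            / 2 4 6
scale[;10] 1 2 3        / 10 20 30: projected onto the second argument instead
```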
Note: If g is defined as a projection of f and the definition of f is changed, g remains the projection of the original f.
f:{[x;y] x-y}
g:f[42;]
g
{[x;y] x-y}[42;]
g[6]
36
f:{[x;y] x+y}
g[6]
36
This can be seen by displaying g on the console,
g
{[x;y] x-y}[42;]
This section explores the deeper relationship between lists and functions. While it can be skipped on first reading by the mathematically faint of heart, that would be like not eating your vegetables when you were a kid.
You have no doubt noticed that the notation for list indexing is identical to that for function evaluation. That is,
L:(0 1 4 9 16 25 36)
f:{[x] x*x}
L[2]
4
f[2]
4
L 5
25
f 5
25
L 3 6
9 36
f 3 6
9 36
This is not an accident. In Creating Typed Empty Lists we saw that a list is a map defined by means of the implicit input-output correspondence given by item indexing. A function is a map defined by a sequence of expressions representing the algorithm used to obtain an output value from the input parameters. For consistency, the two different mechanisms for implementing a map have the same notation. It may take a little time to get accustomed to the rationality of q.
With the interpretation of lists and functions as maps, we can motivate the behavior of list indexing and function application when a simple index or atomic parameter is replaced by a simple list of the same. Specifically, we are referring to,
L[2 5]
4 25
f[2 5]
4 25
in the previous examples. The expression enclosed in brackets is a simple list; call it I. Viewing the list I as a map, the two expressions are the composition of L and I, and the composition of f and I,
L[2 5] is (L[2]; L[5])
f[2 5] is (f[2]; f[5])
For a general list L, function f and item index list $I = (i_0, i_1, \ldots)$, the compositions are,
$(L \circ I)(j) = L(i_j)$
$(f \circ I)(j) = f(i_j)$
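To see the two kinds of map side by side, here a list and a function encode the same squares on 0 through 5; they agree wherever the list is defined and differ only outside its domain. A sketch, with sq a hypothetical name:

```q
sq:0 1 4 9 16 25       / squares given by a list
f:{x*x}                / squares given by an algorithm
sq[2 5]~f[2 5]         / 1b: both compose with the index list 2 5
sq 7                   / 0N: 7 is outside the list's domain
f 7                    / 49
```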
Next, we show the deeper correspondence between list indexing and multivalent function evaluation. Notationally, a nested list is a list of lists, but it can also be viewed functionally as a compact form of the input-output relationship for a multivariate map. This mapping transforms tuples of integers onto the constituent atoms of the list and has valence equal to one plus the level of nesting of the list.
For example, a list with no nesting is a monadic map of integers to its atoms via item indexing.
L1:(1;2h;`three;"4")
L1[3]
"4"
A list with one level of nesting can be viewed as an irregular (or ragged) array by laying its rows out one above another. For example, the list L2 specified as,
L2:((1b;2j;3.0);(4.0e;`five);("6";7;0x08;2000.01.10))
can be thought of as a ragged array. The console display does just this,
L2
(1b;2j;3f)
(4e;`five)
("6";7;0x08;2000.01.10)
This representation of a ragged array is a generalization of the I/O table for monadic maps. From this perspective, indexing at depth is a function whose output value is obtained by indexing into the ragged array via position. In other words, the output value L2[i;j] is the jth element of the ith row,
L2[1;0]
4e
This motivates the interpretation of L2 as a dyadic map over a sub-domain of the two-dimensional Cartesian product of non-negative integers, with range equal to the atoms of L2. The pair i,j is mapped positionally, analogous to simple item indexing.
Advanced: It is possible to create a ragged array with any number of columns by using 0N as the number of rows with the reshape operator (#).
0N 3#til 10
0 1 2
3 4 5
6 7 8
,9
You may have also noticed that the notations for function projection and for elided indices in a list are identical. Revisiting the example of elided indices we used in Nesting,
L :((1 2 3;4 5 6 7);(`a`b`c`d;`z`y`x`;`0`1`2);("now";"is";"the"))
Define the list L1 by eliding the first and last indices as,
L1:L[;1;]
L1
4 5 6 7
`z`y`x`
"is"
Viewing L as a map of valence three whose output value is obtained by indexing at depth, this makes L1 the projection of L onto its second argument. From this perspective, L1 is a dyadic map that retrieves values from a sub-list,
L1[1;2]
`x
The previous discussion also motivates the explanation for the behavior of item indexing when an "out of bounds" index is presented. In verbose languages, this would result either in some sort of error (the infamous indexing off the end of an array in C) or in an exception in Java and C#.
By viewing a list as a function defined on a sub-domain of the integers, it is reasonable to extend the domain of the function to all integers by assigning a null output value to any input not in the original domain. In this context, null should be thought of as "missing value." This is exactly what happens.
In the following examples, observe that the type of null returned matches the item type for simple lists and is 0N for a general list.
L1:1 2 3
L1[-1]
0N
L2:100.1 200.2 300.3 400.4
L2[100]
0n
L3:"abcde"
L3[-1]
" "
L4:1001101b
L4[7]
0b
L5:(1;`two;3.0e)
L5[5]
0N
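Because an out-of-range index yields a null rather than an error, the usual null tools apply to the result. A minimal sketch, assuming a q version that provides the fill operator ^; px is a hypothetical price list:

```q
px:100.1 200.2 300.3
px 0 5 2              / 100.1 0n 300.3: index 5 is out of range
null px 0 5 2         / 010b: flags the missing item
0f^px 0 5 2           / 100.1 0 300.3: default the missing item to 0
```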
As mentioned earlier, q strings are simple lists of char, which play a role similar to strings in verbose languages. It is possible to convert data into strings, akin to the toString() method in O-O languages.
The function string can be applied to any q entity to produce a textual representation suitable for display or use in external contexts such as text editors, Excel, etc. In particular, the string result does not contain any q formatting information. Also, note that the result of applying string to an atom is always a list of char; applied to a list, string returns a list of strings. Following are some examples.
string 42
"42"
string 6*7
"42"
string 42422424242j
"42422424242"
string `Zaphod
"Zaphod"
See Appendix A for more details on string.
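Since string applied to a list yields one string per item, a common next step is to join those strings with a separator using sv. A short sketch:

```q
string 1 2 3            / ("1";"2";"3")
", " sv string 1 2 3    / "1, 2, 3"
```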
Syntactically, q has nouns, verbs and adverbs. Data entities such as atoms, lists, dictionaries and tables are nouns. Functions are also nouns. Primitive symbol operators and operations expressed in infix notation are verbs. For example, in the expression,
c:a+b
a, b and c are nouns, while : and + are verbs. On the other hand, in
c:+[a;b]
a, b, c and + are nouns, while : is a verb.
An adverb is an entity that modifies a verb or function to produce a new verb or function whose behavior is derived from the original.
The following adverbs are available in q.
Symbol | Name
' | each both
each | each monadic
/: | each right
\: | each left
/ | over
\ | scan
': | each previous
Note: The character that represents each-both is the single quote ('), which is distinct from the back-tick (`) used with symbols.
Loosely speaking, the adverb each-both (') modifies a verb or function by applying its behavior item-wise to corresponding list elements. This concept is similar to the manner in which an atomic verb or function is extended to lists.
Important: There cannot be any whitespace between ' and the verb it modifies.
Perhaps the most common example of each-both is join-each (,'), which concatenates two lists item-wise. In its base form, join takes two lists and returns the result of the second appended to the first.
L1:1 2 3 4
L2: 5 6
L1,L2
1 2 3 4 5 6
Two lists of the same count can be joined item-wise to form pairs.
L3:100 200 300 400
L1,'L3
1 100
2 200
3 300
4 400
As in the case of item-wise extension of atomic functions, thetwo arguments must be of the same length, or either can be an atom.
L1,'1000
1 1000
2 1000
3 1000
4 1000
`One,'L1
`One 1
`One 2
`One 3
`One 4
"a" ,' "z"
"az"
When both arguments of a derived function are atoms, the adverb has no effect.
3,'4
3 4
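Each-both applies equally to your own dyadic functions. In the sketch below, the lambda is applied to corresponding items, pairing 1 with 10, 2 with 20 and 3 with 30:

```q
{x+2*y}'[1 2 3;10 20 30]     / 21 42 63
```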
Advanced: A useful example of join-each arises when both arguments are tables. Since a table is a list of records, it is possible to apply join-each to tables with the same count. The item-wise join of records results in a sideways join of the tables.
t1:([] c1:1 2 3)
t2:([] c2:`a`b`c)
t1
c1
--
1
2
3
t2
c2
--
a
b
c
t1,'t2
c1 c2
-------
1 a
2 b
3 c
There is a form of each that applies to monadic functions and unary operators. It applies a (non-atomic) function to each element of a list. Monadic each can be notated in two equivalent ways for a monadic function f,
f each
each[f]
The latter form underscores the fact that each transforms a function into a new function.
reverse each (1 2;`a`b`c;"xyz")
2 1
`c`b`a
"zyx"
each[reverse] (1 2;`a`b`c;"xyz")
2 1
`c`b`a
"zyx"
The transform is arguably more readable when the base operation is a projection.
(1#) each 1001 1002 1004 1003
1001
1002
1004
1003
each[1#] 1001 1002 1004 1003
1001
1002
1004
1003
Observe that the result of the last example can also be obtained with enlist.
enlist each 1001 1002 1004 1003
1001
1002
1004
1003
flip enlist 1001 1002 1004 1003
1001
1002
1004
1003
The last expression executes fastest for long lists.
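The difference between applying a function to a list and applying it to each item of the list is easiest to see with count. A short sketch:

```q
count ("ab";"cde";"")          / 3: the number of strings
count each ("ab";"cde";"")     / 2 3 0: the length of each string
```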
The each-left adverb \: modifies the base function so that it applies the entire second argument to each item of the first argument.
Important: There cannot be any whitespace between \: and the verb it modifies.
To append a given string to every string in a list,
("Now";"is";"the";"time") ,\: ", "
"Now, "
"is, "
"the, "
"time, "
The each-right adverb /: modifies the base function so that it applies the entire first argument to each item of the second argument.
Important: There cannot be any whitespace between /: and the verb it modifies.
To prepend a given string to every string in a list,
" ," ,/: ("Now";"is";"the";"time")
" ,Now"
" ,is"
" ,the"
" ,time"
To achieve a Cartesian (cross) product of two lists, begin with join-right ,/: and modify it with each-left. The net effect is to join every item of the first argument with every item of the second argument.
L1:1 2
L2:`a`b`c
L1,/:\:L2
1 `a 1 `b 1 `c
2 `a 2 `b 2 `c
There is an extra level of nesting that can be eliminated with raze.
raze L1,/:\:L2
1 `a
1 `b
1 `c
2 `a
2 `b
2 `c
You can also begin with join-left ,\: and modify it with each-right.
raze L1,\:/:L2
1 `a
2 `a
1 `b
2 `b
1 `c
2 `c
Observe that the orders of the resulting items for ,/:\: and for ,\:/: are transposed.
Note: Cartesian product is also encapsulated in the function cross.
L1 cross L2
1 `a
1 `b
1 `c
2 `a
2 `b
2 `c
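Each-left and each-right also work with atomic verbs such as *, where they produce tables of results. In this sketch, the two adverbs arrange the same products either as one row per right item or as one row per left item:

```q
1 2 3*/:10 100     / each right: rows are 10 20 30 and 100 200 300
1 2 3*\:10 100     / each left: rows are 10 100, 20 200 and 30 300
```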
The over adverb / modifies a base dyadic function so that the items of the second argument are applied iteratively to the first argument.
Important: There cannot be any whitespace between / and the function it modifies.
To add multiple items to another entity,
L:100 200 300
((L+1)+2)+3
106 206 306
L+/1 2 3
106 206 306
0+/10 20 30 / easy way to add a list
60
To raze a list,
L1:(1; 2 3; (4 5; 6))
(),/L1
1
2
3
4 5
6
To use your own function,
f:{2*x+y}
100 f/ 1 2 3
822
Advanced: To delete multiple items from a dictionary,
d:1 2 3!`a`b`c
d _/1 3
2| b
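Over with a built-in verb gives compact reductions. For example, multiplying 1 through 5 with * over computes 5 factorial (q also provides the built-in prd for this product):

```q
1*/1+til 5       / 120: ((((1*1)*2)*3)*4)*5
```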
The scan adverb \ modifies a base dyadic function so that the items of the right operand are applied cumulatively to the left operand, returning every intermediate result.
Important: There cannot be any whitespace between \ and the function it modifies.
To find running sums,
100+\1 2 3
101 103 106
0+\10 20 30 / easy way to find running sums of list
10 30 60
To use your own function,
f:{2*x+y}
100 f\ 1 2 3
202 408 822
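Scan with | gives a running maximum, a common idiom for tracking the high so far. A sketch, seeded with 0, so it assumes non-negative data:

```q
0|\3 1 4 1 5 9 2      / 3 3 4 4 5 9 9
```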
The each-previous adverb ': modifies a base dyadic function so that each item of the right operand is applied to its predecessor. The left operand of the adverb is taken as the predecessor of the initial item.
Important: There cannot be any whitespace between ': and the function it modifies.
To find the running 2-item sum with 0 before the initial item,
0+':1 2 3 4 5
1 3 5 7 9
More interesting is to determine the positions where items increase in value.
0w>':8 9 7 8 6 7
010101b
-0w>':8 9 7 8 6 7
110101b
The left operand controls the initial result. The first expression results in an initial 0b for all numeric lists, while the second results in an initial 1b. Why?
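With - as the base verb, each-previous computes successive differences; the built-in deltas returns the same result here. A short sketch:

```q
0-':1 4 9 16        / 1 3 5 7: each item minus its predecessor
deltas 1 4 9 16     / 1 3 5 7
```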
We are familiar with the syntactic forms of indexing and function application using either square brackets or juxtaposition.
L:(1 2;3 4 5; 6)
L[0]
1 2
L[0 2]
1 2
6
L 0 2
1 2
6
L[1;2]
5
f:{x*x}
f[0]
0
f[0 2]
0 4
f 0 2
0 4
g:{x+y}
g[1;2]
3
There are equivalent verb forms for indexing and function application. The verb forms are read "index" or "apply" depending on the context.
The verb @ takes a list or a unary function as its left operand and a list of indices or an argument as its right operand. For a list operand, @ returns the items specified by the right operand, i.e., indexing at the top level. For a function operand, @ returns the result of applying the function to the right operand.
With L and f as above,
L@0
1 2
L@0 2
1 2
6
f@0
0
f@0 4
0 16
The evaluation of a niladic function with @ requires an arbitrary atom argument.
fn:{6*7}
fn[]
42
fn@0N
42
Advanced: The verb @ also applies to dictionaries, tables and keyed tables. For dictionaries and keyed tables it performs lookup. Since a table is a list of records, it indexes records.
d:`a`b`c!10 20 30
d@`b
20
t:([]c1:1 2 3; c2:`a`b`c)
t@1
c1| 2
c2| b
kt:([k:`a`b`c]f:1.1 2.2 3.3)
kt@`c
f| 3.3
The verb . takes a list or a multivalent function as its left operand and a list of indices or a list of arguments as its right operand. For a list left operand, verb . returns the result of indexing the list at depth as specified by the right operand. For a function left operand, verb . returns the result of applying the function to the arguments.
Important: Verb . must be separated from its operands by whitespace if they are names or literal constants.
With L and g as above,
L . 1 2
5
g . 1 2
3
The verb . evaluates functions of any valence. This is useful when the function or arguments are supplied programmatically and the valence cannot be known beforehand.
Note: The right argument of . must be a list.
f . 4
'type
f . enlist 4
16
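Because the argument list is an ordinary list, it can be assembled at run time and handed to a function of any valence. A minimal sketch, assuming a recent kdb+ session; h and args are hypothetical names used only for illustration.

```q
h:{x+y*z}                    / hypothetical function of valence three
args:(1;2;3)                 / argument list built at run time
h . args                     / 7, since 1+2*3 is 7
(h . args)~h[1;2;3]          / 1b: same as bracket application
```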
Use the null item :: to elide an index when using verb . to index at depth.
m:(1 2 3;4 5 6)
m[;1]
2 5
m . (::;1)
2 5
Evaluating a niladic function with . requires a singleton operand, whose value is arbitrary.
fn:{6*7}
fn[]
42
fn . enlist 0N
42
Advanced: Verb . provides a generalization of indexing at depth for complex entities comprised of general lists, dictionaries, tables and keyed tables. Perhaps the easiest way to understand its action is to view all such entities as composite mappings. Verb . evaluates the composite map by iteratively applying indexing or lookup with each item of the right operand to the result of the previous step.
In the first of the following examples, verb . performs list indexing at every level; in the second, the middle step is a dictionary lookup.
L1:(1;2 3;(4; 5 6))
L1 . 2 1 1
6
L2:(1;2 3;`a`b!(4;5 6))
L2 . (2;`b;1)
6
In the following complex dictionary, the first use of verb . yields lookup followed by indexing, whereas the second use is two lookups.
dd:`a`b`c!(1 2;1.1 2.2 3.3;`aa`bb!10 20)
dd . (`a;1)
2
dd . (`c`bb)
20
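The iterative description of verb . can be checked directly. A sketch, assuming a recent kdb+ session: folding the one-step function {x@y} over the right operand with Over (/) applies one index or lookup per item, and it reproduces verb . on dd.

```q
dd:`a`b`c!(1 2;1.1 2.2 3.3;`aa`bb!10 20)
{x@y}/[dd;(`c;`bb)]                 / 20: dd@`c, then that result @`bb
{x@y}/[dd;(`a;1)]                   / 2: a lookup, then list indexing
(dd . (`a;1))~{x@y}/[dd;(`a;1)]     / 1b
```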
Because a table is a list of records, verb . indexes a record on the first item and then performs a field lookup on the second.
t:([]c1:1 2 3;c2:`a`b`c)
t . (1;`c2)
`b
Because a keyed table is a dictionary mapping between two tables, verb . performs key lookup on the first item and then a field lookup on the second.
kt:([k:`a`b`c]f:1.1 2.2 3.3)
kt . `b`f
2.2
The functions @ and . can be used with valence three or four to apply any function to an indexed sublist and an optional second argument. The fact that the list can be a table that may be stored on disk makes this very powerful.
The general form of functional @ for dyadic functions is,
@[L;I;f;y]
While the notation is suggestive of lists, in fact L can be any mapping with explicit domain, such as a list, dictionary, table, keyed table or open handle to a table on disk. Then I is a list of items in the domain of the map, f is a dyadic function and y is an atom or list conforming to I. When L is a list, the result is the item-wise application of f and the parameter y to the items of L indexed at the top level by I. Over the subdomain I, the map output becomes,
L[I] f y / written as binary verb
f[L[I];y] / written as dyadic function
Or, using verb @ for indexing,
(L@I) f y / written as binary verb
f[L@I;y] / written as dyadic function
For example, to add 42 to certain items in a list,
L:100 200 300 400
I:1 2
@[L;I;+;42 43]
100 242 343 400
To replace these items,
@[L;I;:;42 43]
100 42 43 400
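Functional amend produces the same values as ordinary index assignment applied to a copy of the list. A sketch under that reading, assuming a recent kdb+ session; M is a hypothetical copy introduced only for the comparison.

```q
L:100 200 300 400
I:1 2
r:@[L;I;+;42 43]             / functional form: L is left untouched
M:L                          / hypothetical copy, amended in place
M[I]:M[I]+42 43              / the same update as index assignment
r~M                          / 1b
L~100 200 300 400            / 1b: the original is unchanged
```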
Observe that the argument L is unchanged,
L
100 200 300 400
In order to change the list argument, it must be referenced by name.
@[`L;I;:;42] / update L
`L
L
100 42 42 400
Note: The result of functional amend with a reference by name is a symbol containing the name of the entity affected, not to be confused with an error message.
Advanced: As mentioned previously, L can be a dictionary, a table, or even an open handle to a table on disk. In the general case, the result f[L@I;y] is applied along the subdomain.
d:`a`b`c!10 20 30
@[d;`a`c;+;9]
a| 19
b| 20
c| 39
t:([] c1:`a`b`c; c2:10 20 30)
@[t;0;:;(`aa;100)]
c1 c2
------
aa 100
b 20
c 30
The general form of functional @ for a monadic function is,
@[L;I;f]
Again the notation is suggestive of lists, but L is any map with explicit domain, I is a list of items in the domain of L, and f is a monadic function. When L is a list, the result is the item-wise application of f to the items of L indexed at the top level by I. Over the subdomain I, the map output becomes,
f L[I] / written as unary verb
f[L[I]] / written as monadic function
Or, using the verb form of @,
f[L@I]
For example,
L:101 102 103
I:0 2
@[L;I;neg]
-101 102 -103
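The monadic function need not be atomic. In this sketch, assuming a recent kdb+ session, reverse is applied to a single item of a nested list while the other items pass through unchanged.

```q
N:(1 2;3 4 5;6)
@[N;1;reverse]               / (1 2;5 4 3;6)
N                            / (1 2;3 4 5;6): N itself is unchanged
```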
Advanced: In the general case, the result f[L@I] is applied along the subdomain.
d:`a`b`c!10 20 30
@[d;`a`c;neg]
a| -10
b| 20
c| -30
The general form of functional . for dyadic functions is,
.[L;I;f;y]
Again the notation is suggestive of lists, but L is a mapping with explicit domain, I is a list in the domain of L, f is a dyadic function and y is an atom or list of the proper shape. For a list, the result is the item-wise application of f and the parameter y to the items of L indexed at depth by I. Over the subdomain I, the map output becomes,
(L . I) f y / binary operator
f[L . I;y] / dyadic function
For example, to add along a sublist,
L:(100 200;300 400 500)
I1:1 2
I2:(1;0 2)
.[L;I1;+;42]
100 200
300 400 542
.[L;I2;+;42 43]
100 200
342 400 543
To replace the same items,
.[L;I2;:;42 43]
100 200
42 400 43
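As with @, the functional form at depth computes what index assignment at depth would produce on a copy. A sketch, assuming a recent kdb+ session; M is again a hypothetical copy used only for the comparison.

```q
L:(100 200;300 400 500)
I2:(1;0 2)
r:.[L;I2;+;42 43]            / (100 200;342 400 543)
M:L                          / hypothetical copy
M[1;0 2]:M[1;0 2]+42 43      / the same update as index assignment at depth
r~M                          / 1b
```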
Observe that the argument L is not modified.
L
100 200
300 400 500
In order to change L, it must be referenced by name.
L:(100 200;300 400 500)
.[`L;I1;:;42] / update L
`L
L
100 200
300 400 42
Note: The result of functional amend with a reference by name is the name of the entity affected, not an error message.
Advanced: In the general case, the result f[L . I;y] is applied along the subdomain.
d:`a`b`c!(100 200;300 400 500;600)
.[d;(`b;1);+;42]
a| 100 200
b| 300 442 500
c| 600
The general form of functional . for a monadic function is,
.[L;I;f]
Again the notation is suggestive of lists, but L is any map with explicit domain, I is a list in the domain of L, and f is a monadic function. For a list, the result is the item-wise application of f to the items of L indexed at depth by I. Over the subdomain I, the map output becomes,
f[L . I]
For example,
L:(100 200;300 400 500)
I:1 2
.[L;I;neg]
100 200
300 400 -500
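Here too the function need not be atomic. A sketch, assuming a recent kdb+ session: the index 1 1 reaches the innermost list 5 6 7, which is replaced by its reversal.

```q
L:(1 2;(3 4;5 6 7))
.[L;1 1;reverse]             / (1 2;(3 4;7 6 5))
```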
Advanced: In the general case, the result f[L . I] is applied along the subdomain.
d:`a`b`c!(100 200;300 400 500;600)
.[d;(`b;1 2);neg]
a| 100 200
b| 300 -400 -500
c| 600
Casting manifests the malleability of data. In some cases, such as changing a string to a symbol, this is obvious and straightforward. Converting a char to its underlying ASCII code or converting a datetime to a float requires a little more consideration. Enumerations also fit into the cast pattern.
Every atom has both an associated numeric and symbolic data type. For convenience we repeat the data types table from Atoms.
type     | type symbol | type char | type num
boolean  | `boolean    | b         | 1h
byte     | `byte       | x         | 4h
short    | `short      | h         | 5h
int      | `int        | i         | 6h
long     | `long       | j         | 7h
real     | `real       | e         | 8h
float    | `float      | f         | 9h
char     | `char       | c         | 10h
symbol   | `           | s         | 11h
month    | `month      | m         | 13h
date     | `date       | d         | 14h
datetime | `datetime   | z         | 15h
minute   | `minute     | u         | 17h
second   | `second     | v         | 18h
time     | `time       | t         | 19h
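Two quick checks against the table, as a sketch assuming a recent kdb+ session. The first relies on .Q.t, an internal utility string whose character at position n is the type char for type number n; indexing it with the numbers from the last column should reproduce the type chars. The second anticipates the discussion below: an atom's type is the negative of the type of a simple list holding it.

```q
.Q.t 1 4 5 6 7 8 9 10 11 13 14 15 17 18 19   / "bxhijefcsmdzuvt"
(type 4.2)=neg type enlist 4.2                 / 1b
```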
The monadic function type can be applied to any entity in q to find its (numeric) short data type. It is a quirk of q that the data type of an atom is a short with the negative of the value in the fourth column above:
type 42
-6h
type 1b
-1h
type 4.2
-9h
type 4h
-5h
type `42
-11h
type "4"
-10h
type 2007.04.02
-14h
Observe that infinities also carry a type.
type 0W
-6h
type -0w
-9h
The type of a simple list is a short containing the positive value of the type of its constituent atoms.
type 1 2 3
6h
type "abc"
10h
type 1 2 3f
9h
The type of any general list is 0h.
type (1;2h;3j)
0h
type (1;2;(3 4))
0h
type (`1;"2";3)
0h
How q handles the type of a variable may be confusing to those coming from verbose languages. In many typed languages, the variable's type must be specified before the variable is assigned a value, that is, when it is declared. In q, a variable is assigned without declaration. The variable can subsequently be reassigned a new value of a different type.
a:42
type a
-6h
a:98.6
type a
-9h
This can be understood by considering that q considers a variable to be a name (symbol) associated with a value. The association is made upon assignment. A variable has the type of the value associated with its name.
In the example at hand, a variable with name 'a' is created when the initial assignment is made. Since this is the first time that the name 'a' is assigned, the q interpreter creates an entry for 'a' in its dictionary of variable names and associates it with the int value 42. On the second assignment, there is already an entry for 'a' in the dictionary, so this name is simply re-associated with the float value 98.6.
When you ask q for the type of a variable, it returns the type of the value associated with the variable's name. Thus, when you reassign the variable, the type of the variable reflects the type of its new value.
As in verbose languages, it is possible to cast an entity from one type to another, provided the underlying values are compatible. Such a cast informs the compiler that you want it to consider the variable to be of the specified type for subsequent operations. Such a cast may result in a compile-time or run-time error if it cannot be performed.
The q cast operator, denoted $, is a binary verb that is atomic in its right operand (the source value) and whose left operand is the target type. The target can be specified with any of the three type designators in the table of Basic Types.
First, examples using the numeric type.
5h$42
42h
6h$4.2
4
This form is useful when the target type is obtained programmatically using the type function.
It is arguably more readable to use the type's char in a cast.
"i"$4.2
4
"x"$42
0x2a
"d"$2004.04.02T04:02:24.042
2004.04.02
The most readable (but longest) form uses the symbolic type name.
`int$4.2
4
`short$42
42h
`date$2004.04.02T04:02:24.042
2004.04.02
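The numeric form pairs naturally with type, as mentioned above. A sketch, assuming a recent kdb+ session: the target type is read from an existing list, and 7 is cast to match it.

```q
target:1.5 2.5 3.5           / a float list, so type target is 9h
n:(type target)$7            / 9h$7 casts 7 to float
(type n)=neg type target     / 1b: n is a float atom
```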
The result of casting between superficially distinct types can be uncovered by considering the underlying numeric values. Chars correspond to their underlying ASCII codes; dates to their offset in days from Jan 1, 2000; and times to their count of milliseconds since midnight.
"c"$0x42
"B"
`date$42
2000.02.12
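The same correspondence can be read in the other direction by casting to int. A sketch, assuming a recent kdb+ session:

```q
"i"$"B"                      / 66, the ASCII code of "B"
`int$2000.01.02              / 1: one day after Jan 1, 2000
`int$1999.12.31              / -1: earlier dates are negative
"d"$0                        / 2000.01.01
```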
Because cast is atomic in its right operand, it is extended item-wise to a list.
"x"$(10 20 30;255)
0x0a141e
0xff
Cast is also atomic in its left operand.
5 6 7h$42
42h
42
42j
Advanced: When integral infinities are cast to integers of wider type, they are considered to be their underlying bit patterns. Since these bit patterns are legitimate values for the wider type, the cast results in a finite value.
"i"$0Wh
32767
"i"$-0Wh
-32767
"j"$-0W
-2147483647j
"j"$0W
2147483647j
Casting from a string (i.e., a list of char) to a symbol is a convenient way to create symbols. It is the preferred way to create symbols with embedded blanks or other special characters. To cast a char or a string to a symbol, use the empty symbol (`) as the target domain.
`$"z"
`z
`$"Zaphod Beeblebrox"
`Zaphod Beeblebrox
`$("Life";"the";"Universe";"and";"Everything")
`Life`the`Universe`and`Everything
Cast is atomic in both operands.
A string is trimmed as part of the cast.
`$" abc "
`abc
string `$" abc "
"abc"
Cast can also be used to parse data from a string by using an upper case type char in the left argument.
"I"$"4267"
4267
"T"$"23:59:59.999"
23:59:59.999
Date string parsing is flexible with respect to the format of the date.
"D"$"2007-04-24"
2007.04.24
"D"$"12/25/2006"
2006.12.25
"D"$"07/04/06"
2006.07.04
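Parsing with an upper-case type char undoes string for these types. A round-trip sketch, assuming a recent kdb+ session:

```q
s:string 2007.04.24          / "2007.04.24"
"D"$s                        / 2007.04.24
2007.04.24="D"$s             / 1b
42="I"$string 42             / 1b
```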
Casting can be used to coerce type-safe assignment. Recall that assignment into a simple list must strictly match the type.
c:10 20 30 40
c[1]:42h
'type
This situation can arise when the list and the assignment value are created dynamically. You can coerce the type by casting it to that of the target.
c[1]:(type c)$42h
c
10 42 30 40
c[0 1 3]:(type c)$(1.1; 42j; 0x2a)
c
1 42 30 42
We met the empty list in Lists. Observe that it has type 0h, meaning that it is a general list whose items have no specific type,
type ()
0h
This empty list can be considered as the degenerate case of a general list, so we call it the general empty list. In situations where type enforcement is desired, it is necessary to have an empty list with a specific type. Casting the general empty list using a symbolic type name makes this clear.
L1:`int$()
type L1
6h
L2:`float$()
type L2
9h
L3:`$()
type L3
11h
A typed empty list is the degenerate case of a simple list of the specified type. This is useful because type matching is enforced when you append items.
L1,:4.2
'type
L1,:42
L1
,42
We have seen that the dyadic casting operator ($) transforms its right operand into a conforming entity of the type specified by the left operand. In the basic operation, the left operand can be a char type abbreviation, a type short, or a symbol type name. In this section, casting is extended to user-defined target domains, providing a functional version of enumerated types.
To begin, recall that in some verbose languages, an enumerated type is a way of associating a series of names with a corresponding set of integral values. Often the sequence of numbers is consecutive and begins with 0. The specific set of names/values is called the domain of the enumerated type and its name identifies the enumeration.
A traditional enumerated type serves multiple purposes.
There is a subtler, more powerful use: an enumeration normalizes data.
Broadly speaking, data normalization seeks to eliminate duplicates and retain the minimum amount of data. Suppose you know that you will have a list (in either the colloquial or q sense) of text entries taken from a fixed and reasonably short set of values. Storing a long list of such strings verbatim presents two problems.
An enumeration solves both problems.
To see how, we start with the case of a q list v containing arbitrary symbols representing character values. Let u be the unique values in v. This is achieved with the distinct function (see Appendix A for a detailed description).
u:distinct v
Let's try a simple example.
v:`c`b`a`c`c`b`a`b`a`a`a`c
u:distinct v
u
`c`b`a
Observe that the order of the items in u is the order of their first appearances in v.
Now consider a new list k that represents the positions in u of each of the items in v. This is achieved with the find (?) operator (see Find).
k:u?v
k
0 1 2 0 0 1 2 1 2 2 2 0
Then we have,
u[k]
`c`b`a`c`c`b`a`b`a`a`a`c
v~u[k]
1b
We observe that u and k indeed normalize the data of v. In general, v will have many repetitions of each of the underlying values, but u stores each value once. Changing an underlying value requires only one operation in the normalized version but potentially many updates to the non-unique list.
Extra credit for recognizing that v is simply the composite map u◦k. Effectively, we have factored the non-unique list v through the unique list u via the index map k.
v = u◦k
Why would we want to do this? Easy: compactness and speed.
Advanced: Let's say that the count of u is a and the maximum width (in the colloquial sense) of the symbols in u is b. For a list v of variable count x, the amount of storage required is potentially
b*x
For the factored form, the storage is known to be
(a*b)+4*x
which represents the fixed amount of storage for u plus the variable amount of storage for the simple integer list k, at 4 bytes per int. If a is small and b is even moderately large, the factorization is significantly smaller.
This can be seen by comparing the sizes of v, u and k in a slightly modified version of our example.
v:`ccccccc`bbbbbbb`aaaaaaa`ccccccc`ccccccc`bbbbbbb
u:distinct v
u
`ccccccc`bbbbbbb`aaaaaaa
k:u?v
k
0 1 2 0 0 1
Now imagine v and k to be much longer.
Reading and writing the factored index list from/to disk is a block operation that will be very fast.
Assuming that the items of v are symbols stored in a hash table, item indexing in the un-factored list requires looking up each symbol. Indexing into the factored list can be done directly via position since it is a uniform list of integers. This will be faster.
Enumeration encapsulates the above factorization of an arbitrary list of symbols through a list of unique values. An enumeration uses the binary cast operator ($) and is a generalization of the basic cast between types.
The general form of an enumerated value is,
`u$v
where u is a simple list of unique symbol values and v is either an atom in u or a list of such. The projection `u$ is the enumeration, u is the domain of the enumeration and `u$v represents the enumerated value(s).
Under the covers, applying the enumeration `u$ to a vector v actually factors v through u as in the previous section. The resulting index list k is stored internally and the lookup is performed automatically.
We recast our factorization example as an enumeration,
u:`c`b`a
v:`c`b`a`c`c`b`a`b`a`a`a`c
ev:`u$v
ev
`u$`c`b`a`c`c`b`a`b`a`a`a`c
While the display of the enumeration ev shows the values of v within the domain u, only the implicit int index list is actually stored.
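One way to look at the stored index list is to cast the enumeration to int. This is a sketch, assuming a recent kdb+ session in which casting an enumeration to an integer type yields its indices; the result should agree with the find result from the factorization above.

```q
u:`c`b`a
v:`c`b`a`c`c`b`a`b`a`a`a`c
ev:`u$v
"i"$ev                       / 0 1 2 0 0 1 2 1 2 2 2 0
all (u?v)="i"$ev             / 1b
```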
The enumeration ev acts just like the original v.
v[3]
`c
ev[3]
`u$`c
v[3]:`b
v
`c`b`a`b`c`b`a`b`a`a`a`c
ev[3]:`b
ev
`u$`c`b`a`b`c`b`a`b`a`a`a`c
v=`a
001000101110b
ev=`a
001000101110b
v in `a`b
011101111110b
ev in `a`b
011101111110b
Note: While the enumeration is item-wise equal to (and can be freely substituted for) the original, they are not identical.
v=ev
111111111111b
v~ev
0b
The find operator (?) can be used with an enumeration to locate the first position of specific values.
v?`a
2
ev?`a
2
The function where can be used to find alloccurrences of a specific value.
where v=`a
2 6 8 9 10
where ev=`a
2 6 8 9 10
The normalization provided by an enumeration reduces updating all occurrences of a value to a single operation. This can have significant performance implications for large lists with many repetitions.
With u, v and ev as originally defined (before the assignments at index 3 above),
u[1]:`x
ev
`u$`c`x`a`c`c`x`a`x`a`a`a`c
v
`c`b`a`c`c`b`a`b`a`a`a`c
To make the equivalent update to v, it is necessary to change every occurrence.
v[where v=`b]:`x
v
`c`x`a`c`c`x`a`x`a`a`a`c
One situation in which an enumeration is more complicated than working with the denormalized data is when you want to add a new value. Continuing with the example above, appending a new item to v is a single operation, but this is not the case for the corresponding enumeration ev.
u:`c`b`a
v:`c`b`a`c`c`b`a`b`a`a`a`c
ev:`u$v
v,:`d
v
`c`b`a`c`c`b`a`b`a`a`a`c`d
ev,:`d
'cast
What went wrong? The new value must first be added to the unique list.
u,:`d
ev,:`d
ev
`u$`c`b`a`c`c`b`a`b`a`a`a`c`d
You may have already recognized that this presents a complication in practice. Because you may not know whether the value you are appending to v is already in u, in order to maintain uniqueness in u you must test this before appending.
Fortunately, q has anticipated this situation. When dyadic ? is used with the name of a (simple) list of symbols as its left argument and a symbol as its right argument, it appends the symbol to the list if and only if it is not an item in the list.
u
`c`b`a`d
`u?`a
`u$`a
u
`c`b`a`d
`u?`e
`u$`e
u
`c`b`a`d`e
If you wish to append items to an enumerated value programmatically, simply add to the unique list using ? before appending to the enumerated value.
u:`c`b`a
v:`c`b`a`c`c`b`a`b`a`a`a`c
ev:`u$v
`u?`e
`u$`e
ev,:`e
u
`c`b`a`e
ev
`u$`c`b`a`c`c`b`a`b`a`a`a`c`e
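Because ? also accepts a list of symbols on the right, several values can be enumerated and appended in one step. A sketch, assuming a recent kdb+ session: values already in u are found, and new ones are appended to u before the enumeration is formed.

```q
u:`c`b`a
ev:`u$`c`b`a`c
ev,:`u?`a`e`f                / `a is already in u; `e and `f are appended to u
u                            / `c`b`a`e`f
value ev                     / `c`b`a`c`a`e`f
```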
If you are given an enumerated value, you can recover the original value by applying value. With ev as originally defined,
ev
`u$`c`b`a`c`c`b`a`b`a`a`a`c
value ev
`c`b`a`c`c`b`a`b`a`a`a`c
Each enumeration is assigned a new numeric data type, beginning with 20h. If you start a new q session and load no script files, you will observe the following.
u1:`c`b`a
u2:`2`4`6`8
u3:`a`b`c
u4:`c`b`a
type `u1$`c`a`c`b`b`a
20h
type `u1$`a`a`b`b`c`c
20h
type `u2$`8`8`4`2`6`4
21h
type `u3$`c`a`c`b`b`a
22h
type `u4$`c`a`c`b`b`a
23h
Note: Enumerations over differently named domains are distinct, even when the domain lists match.
u1~u4
1b
v:`c`a`c`b`b`a
(`u1$v)~`u4$v
0b
Dictionaries are a generalization of lists and provide the foundation for tables. A dictionary is a (mathematical) mapping defined by an explicit I/O association between a domain list and a range list. The two lists must have the same count and the domain list should be a unique collection. While general lists can be used to create a dictionary, many useful dictionaries involve lists of special forms. The domain is frequently a collection of symbols representing names. As we shall see, a dictionary whose domain is a unique list of symbols and whose range is rectangular corresponds to a table.
A dictionary is an ordered collection of key-value pairs, roughly what verbose languages call a hash table.
A dictionary, also called an association, is a mapping defined by an explicit I/O association between a domain list and a range list via positional correspondence. A dictionary is created with the dyadic primitive (!),
Ldomain!Lrange
Recall from Mathematical Functions Refresher the view of a map's I/O table as a pair of input and output columns. Dictionary notation is simply the map's I/O table turned on its side for ease of entry and compactness of display.
Note: All dictionaries have type 99h.
The domain list comprises the keys of the dictionary and the range list its values. The keys of a dictionary are retrieved by the unary primitive key and the values by the unary primitive value. The count of the dictionary is the (common) count of its keys and values.
Note: Although q does not enforce the requirement that the key items are unique, a dictionary does provide a unique output value for each input value, thus guaranteeing a well-defined mathematical map. See below for details.
The most basic dictionary maps a simple list to a simple list. The following I/O table represents a mapping of three symbols containing names to the corresponding individuals' intelligence quotients,
I           | O
`Dent       | 98
`Beeblebrox | 42
`Prefect    | 126
This mapping is defined compactly as a dictionary.
d:`Dent`Beeblebrox`Prefect!98 42 126
count d
3
key d
`Dent`Beeblebrox`Prefect
value d
98 42 126
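A dictionary is determined by its two lists, so it can be rebuilt from key and value. A sketch, assuming a recent kdb+ session:

```q
d:`Dent`Beeblebrox`Prefect!98 42 126
d~(key d)!value d            / 1b
(count d)=count key d        / 1b
99h=type d                   / 1b
```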
The console displays a dictionary I/O table in columnar form.
d
Dent | 98
Beeblebrox | 42
Prefect | 126
The function cols also returns the domain.
cols d
`Dent`Beeblebrox`Prefect
Note: The order of the items in the domain and range lists is significant, just as positional order is significant for lists. Although the I/O assignments and the associated mappings are equivalent regardless of order, differently ordered dictionaries are not identical.
d1:`Prefect`Beeblebrox`Dent!126 42 98
d~d1
0b
Finding the dictionary output value corresponding to an input value is called looking up the input. This is actually achieved via a hash-table lookup under the covers. Similar to functions and lists, both d[x] and d x look up the output value for x.
d[`Beeblebrox]
42
d `Beeblebrox
42
As with item indexing, lookup of a key not in the domain of a dictionary results in an appropriately typed null value, not an error.
d[`Slartibartfast]
0N
As with lists and functions, key lookup in a dictionary is extended item-wise to a simple list of keys.
d[`Dent`Prefect]
98 126
Advanced: We can interpret key list lookup as the composition of the key lookup map with the item indexing map. Symbolically, let d be a dictionary and K a key list in the domain of d. Then for 0 ≤ j < count K,
d[K][j] = d[K[j]]
Using one of our examples,
d:`Dent`Beeblebrox`Prefect! 98 42 126
K:`Dent`Prefect
d[K][1]
126
d[K[1]]
126
Or, using the entire index list,
d K
98 126
d[K]
98 126
A dictionary is a generalization of a list in which item indexing has been extended to a non-integral domain. In particular, a dictionary cannot be indexed implicitly via position. Attempting positional indexing on a dictionary whose keys are not integers generates an error.
d:"abcde"!1.1 2.2 3.3 4.4 6.5
d["c"]
3.3
d[0]
'type
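Position-based access is still available by going through the range or domain list explicitly. A sketch, assuming a recent kdb+ session:

```q
d:"abcde"!1.1 2.2 3.3 4.4 6.5
(value d) 2                  / 3.3: item at position 2 of the range
(key d)?"c"                  / 2: position of the key "c"
d[(key d) 2]                 / 3.3: lookup of the key found at position 2
```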
We can define a dictionary whose lookup emulates the mapping of list item indexing.
L3:`one`two`three
L3[1]
`two
d3:0 1 2!`one`two`three
d3[1]
`two
When we ask q to compare the two entities for equality, it obliges by considering both as mappings with integral domain. It then tests the assignments item-wise.
L3=d3
0| 1
1| 1
2| 1
However, the dictionary so specified is not the same as the list.
L3~d3
0b
Although retrieving items from a list-like dictionary is notationally identical to item indexing, it is not the same. Item indexing is a positional offset, whereas dictionary retrieval is a lookup. They are implemented differently under the covers.
Recall that indexing into a list can be achieved with verb @.
L:100 200 300
L[1]
200
L@1
200
The same syntax works for dictionary lookup.
d:`a`b`c!10 20 30
d[`b]
20
d@`b
20
We noted earlier that q does not enforce uniqueness in a dictionary domain list. In the event of a repeated domain item, only the output value associated with the first occurrence in left-to-right order is accessible via lookup. This guarantees that a dictionary provides a unique output for each input value and is thus a well-defined mathematical map.
For example,
ddup:8 4 8 2 3 1!`one`two`three`four`five`six
ddup[8]
`one
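Lookup reaches only the first occurrence of a repeated key, but every occurrence can still be found by comparing against the domain list. A sketch, assuming a recent kdb+ session:

```q
ddup:8 4 8 2 3 1!`one`two`three`four`five`six
ddup 8                                   / `one
where 8=key ddup                         / 0 2: both positions of 8
(value ddup) where 8=key ddup            / `one`three
```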
Advanced: Reverse lookup works properly for a non-unique domain.
ddup?`three
8
The range values of a dictionary are not required to be atoms. The range can be a general list that contains nested lists.
dgv:(1;2h;3.3;"4")!(`one;2 3;"456";(7;8 9))
dgv["4"]
7
8 9
Nor are keys required to be atoms.
dgk:(0 1; 2 3)!`first`second
dgk[0 1]
`first
dgk[2 3]
`second
Advanced: If the keys are not a list of items of uniform shape, lookup does not work in a useful way.
dweird:(0 1; 2; 3)!`first`second`third
dweird[0 1]
`first
dweird[2]
`
dweird[3]
`
The observed behavior is that key lookup fails at the first key of different shape.
Dictionary lookup on a key or a list of keys returns the associated values. It is also possible to extract the key-value associations using the take operator (#). The left operand is a list of keys, the right operand is the source dictionary and the result is a new dictionary whose mapping is that of the original restricted to the specified keys.
(enlist `c)#d
c| 30
`a`c#d
a| 10
c| 30
This also works when the keys are not simple.
dns:(1 2; 3 4; 5 6)!("onetwo"; "threefour"; "fivesix")
(1 2; 5 6)#dns
1 2| "onetwo"
5 6| "fivesix"
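Take can be read as lookup on the listed keys, paired back up with those keys. A sketch of that reading, assuming a recent kdb+ session:

```q
d:`a`b`c!10 20 30
`a`c#d                       / the dictionary `a`c!10 30
(`a`c#d)~`a`c!d`a`c          / 1b
```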
As with lists, the items of a dictionary can be modified via indexed assignment.
d:10 20 30!"abc"
d[30]:"x"
d
10| a
20| b
30| x
Important: In contrast to lists, dictionaries can be extended via index assignment. For example,
d[40]:"y"
d
10| a
20| b
30| x
40| y
L:"abc"
L[3]:"x"
'length
Let's examine this capability to modify or extend a dictionary via index assignment more closely. Let d be a dictionary, c be an atom whose type matches the domain of d, and x an item whose type is compatible with the range of d. The assignment,
d[c]:x
updates the existing range value if c is in the domain of d, but inserts a new entry at the end of the dictionary if c is not in the domain of d.
This insert/update behavior is called upsert semantics. Because tables are essentially dictionaries, upsert semantics carry through to tables.
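Both halves of upsert appear in this short sketch, assuming a recent kdb+ session:

```q
d:`a`b!1 2
d[`b]:20                     / update: `b is already in the domain
d[`c]:30                     / insert: `c is new, so it is appended
d~`a`b`c!1 20 30             / 1b
```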
Recall that the dyadic primitive find (?) returns the index of the right operand in a list.
1001 1002 1003?1002
1
Extending this concept to dictionaries means reversing the domain-to-range mapping. We expect ? to perform reverse lookup by mapping a range element to its domain element.
d:`a`b`c!1001 1002 1003
d?1002
`b
The result of find on an entity not in the range is a null whose type matches the domain list. For simple lists, the null matches the type of the list; for general lists, the null is 0N.
d?1004 / the result is the null symbol `
`
dg:(1;`a;"z")!10 20 30
dg?50
0N
Note: For a non-unique range element, find returns the first item mapping to it from the domain list.
d:`a`b`c`d!1001 1002 1003 1002
d?1002
`b
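When every key that maps to a value is wanted, comparing the dictionary with the value and applying where returns all of them. A sketch, assuming a recent kdb+ session:

```q
d:`a`b`c`d!1001 1002 1003 1002
d?1002                       / `b: the first key only
where d=1002                 / `b`d: every key mapping to 1002
```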
The binary operation delete (_) returns the result of removing an entry from a dictionary by key value. The left operand of delete is the dictionary (target) and the right operand is a key value whose type matches that of the keys of target.
Note: Whitespace is required to the left of _ if the first operand is a variable.
For example,
d:1 2 3!`a`b`c
d _2
1| a
3| c
Observe that attempting to remove a key that does not exist has no effect.
d _42
1| a
2| b
3| c
The binary delete, also denoted by an underscore (_), returns the result of removing multiple entries from a dictionary. The left operand of delete is a list of key values whose type matches that of the dictionary's keys and the right operand is the dictionary (target). The result is a dictionary obtained by removing the specified key-value pairs from target.
Note: Whitespace is also required to the left of _ if the first operand is a variable.
Note: Since the left operand is required to be a list, a single key value must be enlisted.
For example,
d:1 2 3!`a`b`c
(enlist 2)_d
1| a
3| c
1 3_d
2| b
(enlist 42)_d
1| a
2| b
3| c
Attempting to remove a key that does not exist has no effect.
4 5_d
1| a
2| b
3| c
Observe that removing all the entries in a dictionary leaves a dictionary with empty domain and range lists of the appropriate types.
1 2 3_d
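Since the example above shows no output, here is one way to inspect the empty result. A sketch, assuming a recent kdb+ session:

```q
d:1 2 3!`a`b`c
e:1 2 3_d
count e                      / 0
(type key e)=type key d      / 1b: the empty domain keeps the key type
(type value e)=type value d  / 1b: the empty range keeps the value type
```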
The binary operator cut is the same as (_) on a dictionary.
(enlist 2) cut d
1| a
3| c
Because dictionaries are maps, it is possible to compose their mappings with function mappings to perform operations on dictionaries. Of course, this assumes that the range of each dictionary is in the domain of the indicated operation, so that the operation makes sense. The application of a unary operator is straightforward.
d1:`a`b`c!1 2 3
neg d1
a| -1
b| -2
c| -3
2*d1
a| 2
b| 4
c| 6
d1=2
a| 0
b| 1
c| 0
When the domains of two dictionaries are identical, performing binary operations is straightforward. For example, to add two dictionaries with a common domain, add their corresponding range elements,
d2:`a`b`c!10 20 30
d1+d2
a| 11
b| 22
c| 33
How do we combine two dictionaries whose domains are not identical? First, the domain of the resulting dictionary is the union of the domains of its operands. For items in the intersection of the domain lists, we should clearly apply the indicated operation to the corresponding range items.
The real question is what to do with non-common domain items. The answer: do what makes sense for the operation. We start with joining two dictionaries.
In the simple case of joining two disjoint dictionaries, the result should be the merge.
d3:`e`f`g!100 200 300
d1,d3
a| 1
b| 2
c| 3
e| 100
f| 200
g| 300
d3,d1
e| 100
f| 200
g| 300
a| 1
b| 2
c| 3
Observe that although the mappings arising from opposite-order joins have equivalent input-output assignments, the dictionaries are not identical because order is significant.
We examine another simple example of joining dictionaries with a special form. These dictionaries map symbols to simple lists. When the two are disjoint the result should again be the merge. For example,
dc1:`a`b!(1 2 3; 10 20 40)
dc2:(enlist `c)!enlist 10 20 30
dc1,dc2
a| 1 2 3
b| 10 20 40
c| 10 20 30
As in the previous example, join simply appends the domains and ranges in the obvious way. We shall refer to this case later.
Now we tackle the case of non-disjoint dictionaries. The issue is how to merge items that are common to both dictionary domains, since these elements each have two I/O assignments.
Important: In a join of dictionaries, the right operand's I/O assignment prevails for common domain elements.
The result is another illustration of upsert semantics. Each I/O assignment of the right operand is applied as an update if the domain element is assigned in the left operand, or as an insert if the domain element is not already assigned.
With d1 as above,
d3:`c`d!33 44
d1,d3
a| 1
b| 2
c| 33
d| 44
Observe that upsert is not commutative, even over a common domain. Join order matters.
d4:`a`b`c!300 400 500
d1,d4
a| 300
b| 400
c| 500
d4,d1
a| 1
b| 2
c| 3
Now that we understand how to join two dictionaries, we examine other operations. When arithmetic and comparison operations are performed on dictionaries, the indicated operation is performed on the common domain elements and the dictionaries are merged elsewhere,
d5:`c`x`y!1000 2000 3000
d1+d5
a| 1
b| 2
c| 1003
x| 2000
y| 3000
d1*d5
a| 1
b| 2
c| 3000
x| 2000
y| 3000
d1|d5
a| 1
b| 2
c| 1000
x| 2000
y| 3000
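The domain of the result is the union of the two domains, with the left operand's keys first. A sketch, assuming a recent kdb+ session:

```q
d1:`a`b`c!1 2 3
d5:`c`x`y!1000 2000 3000
key d1+d5                               / `a`b`c`x`y
(key d1+d5)~distinct (key d1),key d5    / 1b
```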
When a relational operation is performed on two dictionaries, the indicated operation is performed over the entire union domain. Effectively, each dictionary is extended to the union domain with (type-matched) nulls. Otherwise put, for non-common domain items, the operation is performed on a pair of items in which a null whose type matches the provided range item is substituted for the missing range item.
In the following examples, observe that operations on d1 and d6 are equivalent to the corresponding operations on d11 and d66,
d1:`a`b`c!1 2 3
d6:`b`c`d`e!22 3 44 55
d1=d6
a| 0
b| 0
c| 1
d| 0
e| 0
d1<d6
a| 0
b| 1
c| 0
d| 1
e| 1
d6<d1
b| 0
c| 0
d| 0
e| 0
a| 1
d1>d6
b| 0
c| 0
d| 0
e| 0
a| 1
d11:`a`b`c`d`e!1 2 3 0N 0N
d66:`a`b`c`d`e!0N 22 3 44 55
d11=d66
a| 0
b| 0
c| 1
d| 0
e| 0
d11<d66
a| 0
b| 1
c| 0
d| 1
e| 1
d66<d11
a| 1
b| 0
c| 0
d| 0
e| 0
d11>d66
a| 1
b| 0
c| 0
d| 0
e| 0
Note: The > operation is evidently converted to the equivalent < operation with reversed operands.
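You can also build the extension to the union domain by hand. The following sketch indexes each dictionary with the union of the two domains. Keys missing from a dictionary look up as nulls of its range type, which reproduces d11 and d66. The name k is ours, for illustration only.

```q
d1:`a`b`c!1 2 3
d6:`b`c`d`e!22 3 44 55
k:(key d1) union key d6     / union domain `a`b`c`d`e (hypothetical name k)
d11:k!d1 k                  / missing keys `d`e look up as nulls: 1 2 3 0N 0N
d66:k!d6 k                  / missing key `a looks up as a null: 0N 22 3 44 55
d11~`a`b`c`d`e!1 2 3 0N 0N  / 1b
```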
Column Dictionaries
Column dictionaries are the foundation for tables.
Definition and Terminology
A very useful type of dictionary is one that maps a list of symbols to a rectangular list of lists. Such a dictionary has the form,
c1...cn!(v1;...;vn)
where each ci is a symbol and the vi are lists with common count. Such a dictionary associates the symbol ci with the list of values vi.
Interpreting each symbol as a column name and the corresponding vector as the column values, we call such a dictionary a column dictionary. The type of the column named by ci is the type of its value list vi. For many column dictionaries, the vi are all simple lists, meaning that each column is a vector of atoms of uniform type. We call this a simple column dictionary.
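The defining conditions can be tested directly in q. The sketch below checks them for the scores dictionary used in the simple example. It is one illustrative way to test the definition, not a built-in validity check.

```q
scores:`name`iq!(`Dent`Beeblebrox`Prefect;42 98 126)
11h=type key scores                        / 1b: the domain is a list of symbols
1=count distinct count each value scores   / 1b: all columns have a common count
all 0<type each value scores               / 1b: every column is a simple list
```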
Simple Example
Let's reorganize the example of the previous section as a simple column dictionary.
scores:`name`iq!(`Dent`Beeblebrox`Prefect;42 98 126)
In this dictionary, the values for the name column are,
scores[`name]
`Dent`Beeblebrox`Prefect
It is possible to retrieve the values for a column in a column dictionary using dot notation.
scores.name
`Dent`Beeblebrox`Prefect
The value in row 1 of the name column is,
scores[`name][1]
`Beeblebrox
Similarly, the value in row 2 of the iq column is,
scores[`iq][2]
126
The dictionary console display shows the mapping clearly.
scores
name| Dent Beeblebrox Prefect
iq  | 42 98 126
Accessing Values
For a general column dictionary defined as,
dcols:c1...cn!(v1;...;vn)
the ith element of column cj is retrieved by,
dcols[cj][i]
What should we make of the following notation?
dcols[cj;i]
We can interpret it in three ways:
- Indexing at depth in the dictionary
- A generalization of a two dimensional matrix in which item indexing in the first dimension has become lookup into the list of column names
- A dyadic mapping
All three interpretations are equivalent and give the same result,
dcols[cj][i]
In our example,
scores[`iq][2]
126
scores[`iq; 2]
126
Rows and Columns
Viewing the dictionary as a dyadic function, we can project onto its first argument by fixing it, obtaining the monadic function dcols[cj;]. This projected form yields item indexing into the column list.
In simple terms, projecting onto the first argument retrieves a vector of column values from a column dictionary.
scores[`iq;]
42 98 126
Analogously, we would expect projection onto the second argument to retrieve a "row" corresponding to the values in the ith position of each column vector. What form does such a row take?
Observe that the projection of the dyadic function onto its second argument by fixing the item index,
dcols[;i]
is a monadic function corresponding to generalized indexing by column name, i.e., dictionary lookup. Thus, we expect the ith row to be a dictionary that maps each column name to the value in that column's ith row.
This is exactly what we find.
scores[;2]
name| `Prefect
iq  | 126
Notational differences aside, this resembles the result of retrieving a record from a table using a SQL query: we get the column names and the associated row values.
A column dictionary seems to be the perfect data structure to serve as the basis for a table: a generalized matrix with indexed rows and named columns. But you no doubt notice the fly in the ointment: the indices are in the wrong order. It is unnatural to retrieve a column with the first index and a row with the second.
Column Dictionary with a Single Column
The domain of a column dictionary must always be a list of symbols and the range must be a list of column vectors. Consequently, when there is only one column you must enlist both the domain and the range. The following is a valid column dictionary (the parentheses are necessary),
ds:(enlist `c)!enlist 100 200 300
The following dictionary, which maps a symbol to a list, is not a valid column dictionary,
dnot:`c!1 2 3
Flipping a Dictionary
Transpose of a Column Dictionary
A column dictionary can be viewed as a generalized rectangular matrix. Let d be a column dictionary defined as,
d:c1...cn!(v1;...;vn)
where each ci is a symbol and the vi have common count, say m. We can index at depth into d for each ci and each j,
d[ci;j] = vi[j]
Since all the vi have count m, in analogy with matrices, it makes sense to define the transpose t of d by the formula,
t[j;ci] = d[ci;j]
Exactly what is the t so defined? The answer comes from realizing that indexing at depth into t should be the same as repeated indexing,
t[j;ci] = t[j][ci]
The right hand side of this equation makes explicit that t is a list of m items t[j], for 0≤j<m.
What is each item in the list t? Combining the three previous equations, we see that,
t[j][ci] = vi[j]
Now fix j in this equation. We see that t[j] is a dictionary with the same domain as d, meaning the list of ci. This dictionary assigns to each item ci the output value vi[j]. Thus, the range of the dictionary is the collection of values v1[j],...,vn[j].
We summarize our findings,
- The transpose of a column dictionary is a list of dictionaries.
- The dictionaries in the transpose have as common domain the column names of the original dictionary.
- The dictionary in the jth item of the transpose maps the column names to the jth row of values across the column vectors.
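These findings can be spot-checked in q with the dictionary d used in the next subsection. In the sketch, the lambda compares each row of the transpose with the projection d[;j].

```q
d:`name`iq!(`Dent`Beeblebrox`Prefect;98 42 126)
t:flip d
t[2;`iq]~d[`iq;2]                  / 1b: t[j;ci] = d[ci;j]
t[1]~`name`iq!(`Beeblebrox;42)     / 1b: item j of t is a dictionary over the column names
all {t[x]~d[;x]} each til count t  / 1b: holds for every row j
```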
Flip of a Column Dictionary
As in the case of lists, the transpose of a dictionary is obtained by applying the unary flip operator,
flip d
When flip is applied to a column dictionary, no data is actually rearranged. The console display confirms the transposition of rows and columns.
d:`name`iq!(`Dent`Beeblebrox`Prefect;98 42 126)
flip d
name       iq
--------------
Dent       98
Beeblebrox 42
Prefect    126
The net effect of flipping a column dictionary is simply reversing the order of the indices. This is logically equivalent to transposing rows and columns.
Flip of a Flipped Column Dictionary
If you transpose a dictionary twice, you obtain the original dictionary,
d~flip flip d   / true for any column dictionary d
1b
Consequently, if you are given t, the transpose of a column dictionary, and you flip it, you obtain a column dictionary.
t:flip d   / pretend you didn't see this step
flip t
name| Dent Beeblebrox Prefect
iq  | 98 42 126
Advanced: As of this writing (Jan 2007), flip has been implemented in q for dictionaries of columns, although the operation makes sense for any rectangular dictionary. In the event that flip is implemented for a general rectangular dictionary (i.e., any dictionary in which the range is a list of lists all having the same count) we would find the following:
The transpose of a rectangular dictionary is a list of dictionaries. The dictionaries in the transpose have a common domain that is the domain of the original dictionary. The jth dictionary of the transpose maps the original domain to the jth row of values across the range list.
In this case, data likely will be rearranged.
Contents
- 1 Tables
- 1.1 Overview
- 1.2 Table Definition
- 1.2.1 Table is the flip of Column Dictionary
- 1.2.2 Table Display
- 1.2.3 Table Definition Syntax
- 1.2.4 Table Metadata
- 1.2.5 Records
- 1.2.6 Flipped Column Dictionary vs. List of Records
- 1.3 Empty Tables and Schema
- 1.4 Basic select
- 1.4.1 Syntax
- 1.4.2 Displaying the Result
- 1.4.3 Selecting Columns
- 1.4.4 Basic update
- 1.5 Primary Keys and Keyed Tables
- 1.5.1 Keyed Table
- 1.5.2 Simple Example
- 1.5.3 Keyed Table Specification
- 1.5.4 Accessing Records of a Keyed Table
- 1.5.5 Retrieving Multiple Records
- 1.5.6 Reverse Lookup
- 1.5.7 Components of a Keyed Table
- 1.5.8 Tables vs. Keyed Tables
- 1.5.9 Compound Primary Key
- 1.5.10 Retrieving Records with a Compound Primary Key
- 1.5.11 Key Lookup with txf
- 1.6 Foreign Keys and Virtual Columns
- 1.6.1 Definition of Foreign Key
- 1.6.2 Example of Simple Foreign Key
- 1.6.3 Resolving a Foreign Key
- 1.6.4 Foreign Keys and Relations
- 1.7 Working with Tables and Keyed Tables
- 1.7.1 First and Last Records
- 1.7.2 Find
- 1.7.3 Primitive Join (,)
- 1.7.4 Coalesce (^)
- 1.7.5 Column Join
- 1.8 Complex Column Data
- 1.8.1 Simple Example
- 1.8.2 Operations on Compound Column Data
- 1.8.3 Compound Foreign Key
- 1.9 Attributes
- 1.9.1 Sorted (`s#)
- 1.9.2 Unique (`u#)
- 1.9.3 Parted (`p#)
- 1.9.4 Grouped (`g#)
8. Tables
Overview
Tables form the basis for kdb+. A table is a collection of named columns implemented as a dictionary. Consequently, q tables are column-oriented, in contrast to row-oriented tables in relational databases. Moreover, a column's values in q comprise an ordered list; this contrasts with SQL, in which the order of rows is undefined. The fact that q tables comprise ordered column lists makes kdb+ very efficient at storing, retrieving and manipulating sequenced data. One important example is data that arrives in time sequence.
Kdb+ handles relational and time series data in the unified environment of q tables. There is no separate data definition language, no separate stored procedure language and no need to map internal representations to a separate form for persistence. Just q tables, expressions and functions.
Tables are built from dictionaries, so it behooves the cursory reader to review Dictionaries before proceeding.
Table Definition
Table is the flip of Column Dictionary
You undoubtedly realized at the end of Dictionaries that a table is implemented as a column dictionary that has been flipped (i.e., transposed). The only effect of flipping the column dictionary is to reverse the order of its indices; no data is rearranged under the covers.
Note: All tables have type 98h.
For example,
d:`name`iq!(`Dent`Beeblebrox`Prefect;98 42 126)
d[`iq;]
98 42 126
d[;2]
name| `Prefect
iq  | 126
d[`iq; 2]
126
t: flip `name`iq!(`Dent`Beeblebrox`Prefect;98 42 126)
t[;`iq]
98 42 126
t[2;]
name| `Prefect
iq  | 126
t[2;`iq]
126
To access items in a table t created by flipping a column dictionary d, simply reverse the order of the arguments in the projections of d. We also reverse the roles of i and j compared to dictionaries to make things more natural from the table perspective.
t[i;]    / row i is a dictionary mapping column names to values
t[i]     / ith element of list t; same as previous
t[;cj]   / vector of column values for column cj
This validates the implementation of a table as a flipped column dictionary. Retrieving rows and columns conforms to conventional matrix notation, in which the first index denotes the row and the second index the column.
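Type codes and match can confirm this relationship. A minimal sketch using the d and t from above:

```q
d:`name`iq!(`Dent`Beeblebrox`Prefect;98 42 126)
t:flip d
98h=type t              / 1b: every table has type 98h
99h=type flip t         / 1b: flipping back yields a dictionary
(flip t)~d              / 1b: the original column dictionary is recovered
t[1;`name]~d[`name;1]   / 1b: row index first for t, column name first for d
```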
Table Display
Observe that the rows and columns of a table display are indeed the transpose of the dictionary display, even though the internal data layout is the same.
d
name| Dent Beeblebrox Prefect
iq  | 98 42 126
t
name       iq
--------------
Dent       98
Beeblebrox 42
Prefect    126
Table Definition Syntax
Table definition can also be accomplished using a syntax that manifests the columns,
([] c1:L1; ...; cn:Ln)
Here each ci is a column name and Li is the corresponding list of column values. The Li are lists of equal count, but in some circumstances can be atoms. The purpose of the square brackets is to specify a primary key; this will be explained in Primary Keys and Keyed Tables.
Note: For readability, we shall normally include optional whitespace after the closing square bracket and to the right of semicolon separators.
In our example, we can define t as,
t:([] name:`Dent`Beeblebrox`Prefect; iq:98 42 126)
t[;`iq]
98 42 126
t[2;]
name| `Prefect
iq  | 126
t[2;`iq]
126
Defining t syntactically yields the same result as creating the column dictionary and flipping it. It is arguably simpler and clearer.
The value columns can be stored in variables, which is useful for programmatic table definition. The variable names become the column names.
c1:`Dent`Beeblebrox`Prefect
c2:98 42 126
t:([] c1; c2)
t
c1         c2
--------------
Dent       98
Beeblebrox 42
Prefect    126
Note: When you define a table with a single row, each Li must be a singleton list, so atoms must be enlisted.
tt:([] c1:`a; c2:100)
'type
tt:([] c1:enlist `a; c2:enlist 100)
Note: When at least one column is a list and one or more columns are atoms, each atom column is extended into a list whose count matches the other columns. This can be used to assign a default value.
tdef:([] c1:`a`b`c; c2:42; c3:1.1 2.2 3.3)
tdef
c1 c2 c3
---------
a  42 1.1
b  42 2.2
c  42 3.3
Advanced: If you create a table as the flip of a column dictionary, item-wise extension of an atom column is not performed on the dictionary definition, but it is performed when the column dictionary is flipped into a table.
ddef:`c1`c2`c3!(`a`b`c;42;1.1 2.2 3.3)
ddef
c1| `a`b`c
c2| 42
c3| 1.1 2.2 3.3
flip ddef
c1 c2 c3
---------
a  42 1.1
b  42 2.2
c  42 3.3
Table Metadata
The column names of a table can be retrieved by using the unary cols.
cols t
`name`iq
Recall that it is possible to retrieve the column values in a column dictionary using dot notation. This is also true after it is flipped to a table. For a table t and a column c, the expression t.c retrieves the value list for column c. In our example,
t.name
`Dent`Beeblebrox`Prefect
t.iq
98 42 126
The dot effectively disassociates a column's values from its name.
The function meta can be applied to a table t to retrieve its metadata. The result is a keyed table with one record for each column in t. The key column c of the result contains the column names. The column t contains a character denoting the type of the column. The column f contains the domains of any foreign keys. The column a contains any attribute associated with the column.
meta t
c   | t f a
----| -----
name| s
iq  | i
Advanced: If the result of meta displays an upper case type char for a column, this indicates that the column is a non-simple list in which each item is a list of the corresponding type. Such tables arise, for example, when you group without aggregating in a select.
t:([] sc:1 2 3; nsc:(1 2; 3 4; 5 6 7))
t
sc nsc
---------
1  1 2
2  3 4
3  5 6 7
Advanced: The function tables takes a symbol representing a context (see workspace organization) and returns a sorted list of symbol names of the tables in that context. For example,
tables `.
`s#`t`tt
lists all the tables in the default context. Alternatively, the command \a provides the same result. If no argument is provided, it returns the result for the current context.
Records
We observe that count returns the number of rows in the table, since each row is an item in the list. In our example,
count t
3
Now let's inspect the sequence of dictionaries that comprise the rows.
t[0]
name| `Dent
iq  | 98
t[1]
name| `Beeblebrox
iq  | 42
The dictionary in each row maps the common domain list of column names to the column values of the row. This motivates calling each row dictionary a record in the table.
Important: A table is a sequentially ordered list of records. Each record is an association of column names with one row's values.
Sometimes it is useful to separate a record's values from its column names. In this context, we shall refer to the row value list. The row value list for the ith row of a table is obtained by retrieving the ith item of each of the column vectors. This is simply the range of the record dictionary.
value t[1]
`Beeblebrox
42
Flipped Column Dictionary vs. List of Records
Is a table a flipped column dictionary or a list of records? Logically it is both, but physically it is stored as a column dictionary with a flipped indicator.
To verify this, we create a list of records, each of which is a dictionary that maps (common) column names to a row's values.
lrows:(`name`iq!(`Dent;98); `name`iq!(`Beeblebrox;42))
While this list is apparently different from the equivalent column dictionary, observe the curious result when you display the list of rows,
lrows
name       iq
-------------
Dent       98
Beeblebrox 42
The q interpreter has recognized that this list conforms to the requirements for a list of records of a table, i.e., the domain lists of all the dictionaries are the same, the range lists have common count, and the types of the range lists are consistent by position. It has converted the list of dictionaries to a flipped column dictionary by reorganizing the values that we specified record-by-record into column vectors.
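Type and match confirm the conversion. A short sketch that reuses lrows:

```q
lrows:(`name`iq!(`Dent;98); `name`iq!(`Beeblebrox;42))
98h=type lrows                                 / 1b: the list of records is already a table
lrows~flip `name`iq!(`Dent`Beeblebrox;98 42)   / 1b: same as the flipped column dictionary
lrows[`iq]                                     / the iq column is a simple vector: 98 42
```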
Advanced: In general, column retrieval and manipulation on a simple column dictionary will be significantly faster than operations on rows. The values in a simple column are stored contiguously, whereas the values in each row must be retrieved by indexing into all columns.
Be mindful that deletion of a row is an expensive operation because all the column lists must be compressed to close the resulting gap. This can result in large amounts of data being moved in a table with many rows.
Empty Tables and Schema
We saw in the previous section that a table can be defined and populated in one step using table syntax.
t:([] name:`Dent`Beeblebrox`Prefect; iq:98 42 126)
In practice, this is rarely done with individual values other than for small tests. Often the values are not known until run-time, or the value lists may be prohibitively long.
In these circumstances, it is useful to create an empty table initially and then populate it later. The empty parentheses here signify the empty list.
t:([] name:(); iq:())
The table will then be populated, for example, by reading the values from a file.
When an empty table is created as above, the columns are lists of general type, so data of any type can initially be loaded. The type of each column will be determined by the type of the first item placed in it. Thereafter, type checking is enforced for all inserts and updates, with no type promotion performed.
It is possible to fix the type of any column in an empty table definition by specifying an empty list of the appropriate type.
t:([] name:`symbol$(); iq:`int$())
Shorter, and arguably less obvious,
t:([] name:0#`; iq:0#0N)
Note: Either of the previous two forms of empty table definition is the q version of the table's schema.
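A typed schema is enforced from the first insert onward. The sketch below uses insert, and the row values are illustrative only. The i suffix matters: in kdb+ 3.0 and later a plain integer literal such as 98 is a long, and an int column rejects a long.

```q
t:([] name:`symbol$(); iq:`int$())
0=count t               / 1b: no rows yet, but the column types are already fixed
exec t from meta t      / "si": symbol and int columns
`t insert (`Dent;98i)   / append one row whose values match the column types
count t                 / 1
```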
Basic select
We shall use the following definition in this section,
t:([] name:`Dent`Beeblebrox`Prefect; iq:98 42 126)
Syntax
We shall cover select expressions in depth in q-sql, but we provide an introduction here in order to extract and display data in our examples. The basic select expression takes the form,
select cols from table
where table is either a table or a keyed table and cols is a comma-separated list of columns from table. This expression results in a list of all records for the specified columns in table.
The simplest form of select is,
select from table
which corresponds to the SQL statement,
SELECT * FROM table
In q you do not need to write the wildcard character when you want all columns in the table.
Note: The basic select expression may look familiar from SQL, but it should seem odd to the q newbie who is finally becoming accustomed to parsing expressions right-to-left. Neither select nor from represents a function that can stand alone. Instead, they are part of a template and always appear together.
Q has a host of extensions to the basic select template whose elements appear between the select and from or after the table element. As we shall see in q-sql, it is possible to convert any select template to a purely functional form, although this form isn't particularly friendly to the q newbie.
Displaying the Result
Since the result of select is a list of records, it too is a table.
select from t
name       iq
--------------
Dent       98
Beeblebrox 42
Prefect    126
We shall use this method of display in what follows unless we need to see the structure of the underlying column dictionary.
Selecting Columns
To select specific columns, list them in the desired order, comma-separated, between select and from.
select name from t
name
----------
Dent
Beeblebrox
Prefect
select iq,name from t
iq  name
--------------
98  Dent
42  Beeblebrox
126 Prefect
Basic update
The syntax of basic update is similar to select, but the named columns are replaced by the specified values. In our example,
show update iq:iq%100 from t
name       iq
---------------
Dent       0.98
Beeblebrox 0.42
Prefect    1.26
Primary Keys and Keyed Tables
Keyed Table
In SQL, it is possible to declare column(s) of a table as a primary key. Amongst other things, this means that the values in the column(s) are unique, making it possible to retrieve a row via its key value. These two features motivate how q implements a primary key.
We begin with a simple key, i.e., the key is a single column. The idea is to place the key column in a separate table parallel to a table containing the remaining columns. How do we associate each key with its corresponding value record? Simple: set up a dictionary mapping between the key records and the associated value records.
A keyed table is a dictionary that maps each row in a table of unique keys to a corresponding row in a table of values.
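This definition translates almost word for word into q. A minimal sketch, where the names k and v are ours:

```q
k:([] eid:1001 1002 1003)                           / table of unique keys
v:([] name:`Dent`Beeblebrox`Prefect; iq:98 42 126)  / table of values, in matching row order
kt:k!v                                              / map each key row to the value row at the same position
99h=type kt                                         / 1b: a keyed table is a dictionary
kt[`eid!1002]                                       / value record for key 1002
```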
Simple Example
Let's see how this works for our previous example. Viewing the data table as a flipped column dictionary will make things explicit.
values:flip `name`iq!(`Dent`Beeblebrox`Prefect;98 42 126)
Now say we want to add a key column named eid containing employee identifiers. We place the identifiers in a separate table. Recall from Column Dictionary with a Single Column that we must enlist both the column name and the value list for a column dictionary having a single column.
k:flip (enlist `eid)!enlist 1001 1002 1003
Now we establish the mapping between the two tables.
kt:k!values
Voilà! A keyed table. The console display of a keyed table lists the key column(s) on the left, separated by a vertical bar from the value columns on the right.
kt
eid | name       iq
----| --------------
1001| Dent       98
1002| Beeblebrox 42
1003| Prefect    126
Note: The key mapping assumes that the key rows and value records are in corresponding order, since the dictionary associates a key with the data row in the same position.
Note: The keys should be unique. As we have already noted, dictionary creation does not enforce uniqueness, but a value row associated with a repeated key is not accessible via key lookup. It can be retrieved via a select on the key column.
Keyed Table Specification
The console display of a keyed table demonstrates how to define it in one step as a dictionary of flipped dictionaries,
kt:(flip (enlist `eid)!enlist 1001 1002 1003)!flip `name`iq!(`Dent`Beeblebrox`Prefect;98 42 126)
Unless you are constructing the keyed table from its constituents, it is simpler to use table syntax. The key column goes between the square brackets and the value columns to the right, as in a normal table definition.
kt:([eid:1001 1002 1003] name:`Dent`Beeblebrox`Prefect; iq:98 42 126)
To define an empty keyed table, use empty key and value columns.
ktempty:([eid:()] name:(); iq:())
The empty columns can be typed with either of the following constructs,
ktempty:([eid:`int$()] name:`symbol$(); iq:`int$())
ktempty:([eid:0#0] name:0#`; iq:0#0)
Accessing Records of a Keyed Table
Since a keyed table is a dictionary mapping, it provides access to records in the value table via key lookup. Remember that the records in the key table and value table are both dictionary mappings for their rows.
kt[`eid!1002]
name| `Beeblebrox
iq  | 42
You can abbreviate the full dictionary specification of a key record to its key value. Our example reduces to,
kt[1002]
name| `Beeblebrox
iq  | 42
An individual column in the value record can be accessed via repeated indexing or indexing at depth.
kt[1002][`iq]
42
kt[1002;`iq]
42
Important: The net effect of placing a key on a table is to convert item indexing of the rows to generalized indexing via key value. Otherwise put, the first index is converted from positional retrieval to key lookup.
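Seen side by side, the first index switches from position to key. In this sketch, () xkey (described under Tables vs. Keyed Tables) produces an unkeyed copy for comparison.

```q
kt:([eid:1001 1002 1003] name:`Dent`Beeblebrox`Prefect; iq:98 42 126)
kt[1002]~(value kt)[1]   / 1b: key 1002 finds the record at position 1 of the value table
kt[1002;`iq]             / 42: first index is a key, second a column name
(() xkey kt)[1;`eid]     / 1002: once unkeyed, the first index is positional again
```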
Retrieving Multiple Records
Given that it is possible to look up a single record in a keyed table by the key value,
kt[1001]
you might think it is possible to retrieve multiple records from a keyed table via a simple list of keys. You would be wrong.
kt[1001 1002]
'length
To look up multiple key values in a keyed table, you must use a list of enlisted keys.
kt[(enlist 1001; enlist 1002)]
name       iq
-------------
Dent       98
Beeblebrox 42
A quicker way to write this is,
kt[flip enlist 1001 1002]
name       iq
-------------
Dent       98
Beeblebrox 42
Another convenient way to look up multiple keys is to index using a table having a single column with the name of the primary key and the value list of the desired keys. In our example,
kt[([] eid:1001 1002)]
name       iq
-------------
Dent       98
Beeblebrox 42
This works because the records of the inner table are in the domain of the keyed table dictionary. See Operations on Dictionaries for details.
If you want to retrieve the full entries of the keyed table instead of just the value records, use the # operator.
([] eid:1001 1002)#kt
eid | name       iq
----| -------------
1001| Dent       98
1002| Beeblebrox 42
Reverse Lookup
Because a keyed table is a dictionary, it is possible to perform reverse lookup from a value to a key. In a simple example,
kts:([eid:1001 1002 1003] name:`Dent`Beeblebrox`Prefect)
kts
eid | name
----| ----------
1001| Dent
1002| Beeblebrox
1003| Prefect
kts?`Prefect
eid| 1003
Components of a Keyed Table
Since a keyed table is a dictionary mapping between the table of keys and the table of values, the functions key and value provide a convenient way to retrieve the two constituent tables.
key kt
eid
----
1001
1002
1003
value kt
name       iq
--------------
Dent       98
Beeblebrox 42
Prefect    126
A list containing the names of the key column(s) can be retrieved with the function keys.
keys kt
,`eid
Observe that cols retrieves both the key and value column names for a keyed table.
cols kt
`eid`name`iq
Tables vs. Keyed Tables
It is sometimes convenient to convert between a regular table having a column of (presumably) unique values and the corresponding keyed table.
The dyadic primitive xkey converts a table with a column of unique values to a keyed table. The right operand of xkey is the table and the left operand is a symbol (or list of symbols) with the name of the column(s) to be used as the key(s).
t:([] eid:1001 1002 1003; name:`Dent`Beeblebrox`Prefect; iq:98 42 126)
t
eid  name       iq
-------------------
1001 Dent       98
1002 Beeblebrox 42
1003 Prefect    126
`eid xkey t
eid | name       iq
----| --------------
1001| Dent       98
1002| Beeblebrox 42
1003| Prefect    126
Conversely, to convert a keyed table to a regular table, use xkey with an empty list as the left operand.
kt:([eid:1001 1002 1003] name:`Dent`Beeblebrox`Prefect; iq:98 42 126)
kt
eid | name       iq
----| --------------
1001| Dent       98
1002| Beeblebrox 42
1003| Prefect    126
() xkey kt
eid  name       iq
-------------------
1001 Dent       98
1002 Beeblebrox 42
1003 Prefect    126
Note: The conversion expressions above do not affect the original table. You must refer to the table by name to modify the original.
`eid xkey `t
`t
t
eid | name       iq
----| --------------
1001| Dent       98
1002| Beeblebrox 42
1003| Prefect    126
() xkey `kt
`kt
kt
eid  name       iq
-------------------
1001 Dent       98
1002 Beeblebrox 42
1003 Prefect    126
Advanced: It is possible to apply xkey against a column that does not contain unique values. The result is a keyed table that does not have a primary key.
t:([] eid:1001 1002 1003 1001; name:`Dent`Beeblebrox`Prefect`Dup)
t
eid  name
---------------
1001 Dent
1002 Beeblebrox
1003 Prefect
1001 Dup
ktdup:`eid xkey t
ktdup
eid | name
----| ----------
1001| Dent
1002| Beeblebrox
1003| Prefect
1001| Dup
Duplicate key values are not accessible via key lookup,
ktdup 1001
name| Dent
but they are accessible via select.
select from ktdup where eid=1001
eid | name
----| ----
1001| Dent
1001| Dup
Compound Primary Key
We understand that the q implementation of a SQL table with a simple key is actually a dictionary mapping between a pair of tables in which the first table has a single key column. This has a straightforward extension to a compound key.
Recall that a compound key in SQL is a collection of two or more columns that together provide a unique value for each row. To implement a compound key in q, we generalize the key table from a single column to multiple columns by requiring that each record in the key table has a unique combination of column values.
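Uniqueness of a compound key is a property of the whole key record, not of each individual column. The sketch below checks it with distinct on the key table. Here k2 is a hypothetical table in which lname repeats.

```q
k:([] lname:`Dent`Beeblebrox`Prefect; fname:`Arthur`Zaphod`Ford)
(count distinct k)=count k     / 1b: each (lname;fname) combination occurs once
k2:([] lname:`Dent`Dent; fname:`Arthur`Ford)
(count distinct k2)=count k2   / 1b: lname repeats, yet every pair is unique
```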
Here is our example redone to replace the employee id with a compound key comprising the last and first names.
ktc:([lname:`Dent`Beeblebrox`Prefect; fname:`Arthur`Zaphod`Ford] iq:98 42 126)
Observe that the console displays a compound keyed table with the key columns on the left, separated by a vertical bar | from the value columns to the right.
ktc
lname      fname | iq
-----------------| ---
Dent       Arthur| 98
Beeblebrox Zaphod| 42
Prefect    Ford  | 126
As in the case of a simple primary key, we can abbreviate the full key record to the key value for retrieval.
ktc[`Dent`Arthur]
iq| 98
Here is how to initialize our example as an empty keyed table,
ktc:([lname:(); fname:()] iq:())
The empty keyed table can be typed with either of the following,
ktc:([lname:`symbol$(); fname:`symbol$()] iq:`int$())
ktc:([lname:0#`; fname:0#`] iq:0#0)
We shall see in Insert into Keyed Tables how to fill both the key columns and the value columns of a keyed table simultaneously.
For the fundamentalist, here is the same compound keyed table built from its constituent pair of tables,
ktc:(flip `lname`fname!(`Dent`Beeblebrox`Prefect;`Arthur`Zaphod`Ford))!flip (enlist `iq)!enlist 98 42 126
And here is retrieval by full key record,
ktc[`lname`fname!`Beeblebrox`Zaphod]
iq| 42
Most will agree that the table definition syntax and abbreviated key value retrieval are simpler.
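Match confirms that the two retrieval forms are equivalent. A short sketch, in which r1 and r2 are illustrative names:

```q
ktc:([lname:`Dent`Beeblebrox`Prefect; fname:`Arthur`Zaphod`Ford] iq:98 42 126)
r1:ktc[`lname`fname!`Beeblebrox`Zaphod]   / full key record
r2:ktc[`Beeblebrox`Zaphod]                / abbreviated key value
r1~r2                                     / 1b: both return the value record for Zaphod Beeblebrox
```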
Retrieving Records with a Compound Primary Key
Retrieval of multiple records via a compound primary key is actually easier than with a simple key, since each compound key value is already a list.
ktc (`Dent`Arthur; `Prefect`Ford)
iq
---
98
126
As was the case with a keyed table having a simple key, retrieval can be performed via a table whose columns and values match the key columns.
K:([] lname:`Dent`Prefect; fname:`Arthur`Ford)
ktc[K]
iq
---
98
126
ktc K   / use juxtaposition
iq
---
98
126
As in the case of a simple key, you can use # to retrieve the full entries of the keyed table instead of just the value records.
K#ktc
lname   fname | iq
--------------| ---
Dent    Arthur| 98
Prefect Ford  | 126
Key Lookup with txf
Looking up keys in a keyed table is complicated by the different formats for simple and compound keys. The triadic function txf provides a uniform way to perform such key lookup. The first argument is a keyed table (target). The second argument is a list of key values, either simple or compound. The third argument is a list of symbol column names in the value table of target. The result is a list comprising the matching row values from the specified columns of the value table of target.
In the following example using a simple key, observe the column order of the result.
kts:([k:101 102 103] c1:`a`b`c; c2:1.1 2.2 3.3)
txf[kts;101 103;`c2`c1]
1.1 `a
3.3 `c
With a compound key, the values to be looked up must be listed in columns.
ktc:([k1:`a`b`a; k2:`x`y`z] c1:100 200 300; c2:1.1 2.2 3.3)
txf[ktc;(`a`b;`z`y);`c1`c2]
300 3.3
200 2.2
Foreign Keys and Virtual Columns
A foreign key in SQL is a column in one table whose values are members of a primary key column in another table. Foreign keys are the mechanism for establishing relations between tables.
One of the important features of a foreign key is that the RDBMS enforces referential integrity, meaning that the value in a foreign key column is required to be in the referenced primary key column. To insert a foreign key value that is not in the primary key column, it must first be inserted into the primary key column.
Definition of Foreign Key
Q has the notion of a foreign key that also provides referential integrity. Extra credit to the reader who has guessed that a foreign key is implemented using an enumeration. In our introduction to enumerations, we saw that an enumeration domain can be any list with unique items. A keyed table meets the criterion of a unique domain, since the key records in the dictionary domain are unique.
A foreign key is a table column defined as an enumerated value over a keyed table. As an enumeration, a foreign key indeed provides referential integrity by restricting values in the foreign key column to be in the list of primary key values.
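Referential integrity can be observed by trapping the enumeration error. The sketch below uses protected evaluation, @[f;x;handler], which returns the error text instead of signalling it. The name e and the missing key 1004 are illustrative.

```q
kt:([eid:1001 1002 1003] name:`Dent`Beeblebrox`Prefect; iq:98 42 126)
e:`kt$1003 1001        / values that are primary keys of kt enumerate normally
e=1003                 / 10b
@[{`kt$x};1004;{x}]    / 1004 is not a primary key of kt: the error text is returned
```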
Example ofSimple Foreign Key
An enumeration over a keyed table domain acts just like oursimple enumeration examples. Let's return to a previous example.
kt:([eid:1001 1002 1003] name:`Dent`Beeblebrox`Prefect; iq:98 42 126)
To enumerate over the primary key of kt, use a symbol containing the keyed table name as the domain of the enumeration.
`kt$
The primary key table records provide the unique set of values for enumerating records.
`kt$`eid!1001
`kt$1001
As usual, q saves us the trouble of being so explicit. It allows the enumeration to be applied to the items in the value list of the primary key dictionary, that is, to the primary key values.
e1:`kt$1002 1001 1001 1003 1002 1003
e1 = 1003
000101b
As with any enumeration, attempting to enumerate a key value that is not in the domain causes an error.
`kt$1004
'cast
We can use table definition syntax to define a table with a foreign key over kt.
tdetails:([] eid:`kt$1003 1001 1002 1001 1002 1001; sc:126 36 92 39 98 42)
The foreign key column has simply been defined as an enumeration over the keyed table.
We see the foreign key table in the f column when we invoke meta on the table.
meta tdetails
c  | t f  a
---| ------
eid| i kt
sc | i
As of release 2.4, the built-in function fkeys returns a dictionary that maps each foreign key column name to its key domain, that is, the name of its primary key table.
treport:([] eid:`kt$1001 1002 1003; mgrid:`kt$1002 0N 1002)
fkeys treport
eid  | kt
mgrid| kt
Resolving a Foreign Key
There are occasions when you wish to resolve a foreign key, by which we mean substitute the actual values in place of the enumerated values. As with an ordinary enumeration, this is done by applying the value function to the foreign key column.
update eid:value eid from tdetails
eid  sc
--------
1003 126
1001 36
1002 92
1001 39
1002 98
1001 42
Foreign Keys and Relations
In SQL, an inner join is used to splice back together data that has been normalized via relations. The splice is usually done along a foreign key, which establishes a relation to the keyed table via the primary key. In the join, columns from both tables are available using dot notation.
In q, the same effect is achieved using foreign keys without explicitly creating the joined table. The notation is similar, but different enough to warrant close attention.
Let tf be a table having a foreign key f whose enumeration domain is the keyed table kt. All columns in kt are available via dot notation in any select expression whose from domain is tf. A column c in kt that is accessed in this way is called a virtual column and is specified with dot notation f.c in the select expression.
For example, given kt as above, we create a details table that contains individual test results for each person. We give the foreign key in the details table the same name as the primary key it refers to, but this is not required.
tdetails:([] eid:`kt$1003 1002 1001 1002 1001 1002; sc:126 36 92 39 98 42)
Now we can access columns in kt via a select on tdetails.
select eid.name, sc from tdetails
name       sc
--------------
Prefect    126
Beeblebrox 36
Dent       92
Beeblebrox 39
Dent       98
Beeblebrox 42
The case in which the enumeration domain of a foreign key has a compound primary key is slightly more complicated. We cover this in Operations on Compound Column Data.
Working with Tables and Keyed Tables
In this section, we use the following examples.
t:([] name:`Dent`Beeblebrox`Prefect; iq:98 42 126)
kt:([eid:1001 1002 1003] name:`Dent`Beeblebrox`Prefect; iq:98 42 126)
First and Last Records
Because a table is a list of records, the functions first and last retrieve the first and last records, respectively.
first t
name| `Dent
iq  | 98
last t
name| `Prefect
iq  | 126
first kt
name| `Dent
iq  | 98
last kt
name| `Prefect
iq  | 126
These functions are useful in select expressions, especially with grouping and aggregation.
Note: Every table in kdb+ has a first and last record since it is an ordered list of records. Moreover, the result of a select template is a table and so is also ordered. Contrast this with SQL, in which tables and result sets are unordered and you must use ORDER BY to impose an order.
You can retrieve the first or last n records of a table or keyed table using the take operator (#).
2#t
name       iq
-------------
Dent       98
Beeblebrox 42
-3#kt
eid | name       iq
----| --------------
1001| Dent       98
1002| Beeblebrox 42
1003| Prefect    126
See Appendix A for more on using take. Also see select[n] for another way to achieve this result.
Find
The find operator (?) used with a table performs a reverse lookup of a record and returns the corresponding row number. With t as above,
t?`name`iq!(`Dent;98)
0
As usual, the record can be abbreviated to a list of row values.
t?(`Dent;98)
0
You can reverse-lookup a list of multiple row values.
t?((`Dent;98);(`Prefect;126))
0 2
Since a keyed table is a dictionary, find performs a reverse lookup of a value record and returns the key record.
kt?`name`iq!(`Dent;98)
eid| 1001
kt?(`Dent;98)
eid| 1001
In the case of find on a table with a single column, each list of row values must be a singleton list.
t1:([] eid:1001 1002 1003)
t1?(enlist 1001; enlist 1002)
0 1
The list of singletons can be created by either of the following expressions. The first executes faster, especially for long lists.
flip enlist 1001 1002
1001
1002
enlist each 1001 1002
1001
1002
Primitive Join (,)
The join operator (,) is defined for tables and keyed tables.
You can use join to append a record to a table.
t:([]c1:`a`b;c2:10 20)
t,`c1`c2!(`c;30)
c1 c2
-----
a  10
b  20
c  30
This join is one situation in which you cannot use a list of row values. The result is a general list rather than a table.
t,(`a;30)
`c1`c2!(`a;10)
`c1`c2!(`b;20)
`a
30
You can, however, use a list of row values to amend the original table.
t,:(`c;30)
t
c1 c2
-----
a  10
b  20
c  30
Only tables having exactly the same list of column names and compatible column types can be joined. Since a table is a list of records, the result is obtained by appending the rows of the right operand to those of the left operand.
t1:([] a:1 2 3; b:100 200 300)
t2:([] a:3 4 5; b:300 400 500)
t1,t2
a b
-----
1 100
2 200
3 300
3 300
4 400
5 500
Note that common rows are duplicated in the result.
Two tables with the same columns in a different order cannot be joined with (,) because the order of columns in records is significant in q.
t3:([]b:1001 2001 3001; a:101 201 301)
t1,t3
'mismatch
Two keyed tables with the same key and value columns can be joined. Because a keyed table is a dictionary, the result has upsert semantics, as we saw in Join. Keys in the right operand that are not in the left operand are treated as inserts. For common key values, the right operand acts as an update.
kt1:([k:1 2 3] c:10 20 30)
kt2:([k:3 4 5] c:300 400 500)
kt1,kt2
k| c
-| ---
1| 10
2| 20
3| 300
4| 400
5| 500
Coalesce (^)
The coalesce operator (^) is defined for keyed tables. It differs from primitive join (,) in its treatment of null column items in the value tables.
When two keyed tables have the same key and value columns and the column values of both keyed tables are non-null atoms, ^ behaves the same as primitive join (,).
kt1:([k:1 2 3] c1:10 20 30;c2:`a`b`c)
kt2:([k:3 4 5] c1:300 400 500;c2:`cc`dd`ee)
kt1,kt2
k| c1  c2
-| ------
1| 10  a
2| 20  b
3| 300 cc
4| 400 dd
5| 500 ee
kt1^kt2
k| c1  c2
-| ------
1| 10  a
2| 20  b
3| 300 cc
4| 400 dd
5| 500 ee
When the right operand has null column values, the column values of the left operand are updated only with the non-null values of the right operand.
kt3:([k:2 3] c1:0N 3000;c2:`bbb`)
kt3
k| c1   c2
-| --------
2|      bbb
3| 3000
kt1,kt3
k| c1   c2
-| --------
1| 10   a
2|      bbb
3| 3000
kt1^kt3
k| c1   c2
-| --------
1| 10   a
2| 20   bbb
3| 3000 c
Note: ^ is slower than (,) since each column value of the right operand must be checked for null.
Column Join
Two tables with the same number of rows can be combined with join-each (,') to form a sideways, or column, join in which the columns are aligned in parallel.
t1:([] a:1 2 3)
t2:([] b:100 200 300)
t1,'t2
a b
-----
1 100
2 200
3 300
When the column lists of the tables are not disjoint, the operation on the common columns has upsert semantics. This is because each record is a dictionary.
t3:([] a:10 20 30; b:100 200 300)
t1,'t3
a  b
------
10 100
20 200
30 300
Because keyed tables are dictionaries, they can only be sideways joined if they have identical key columns. We can deduce the behavior in that case from Removing Entries: any operation on a dictionary is applied to the common elements of the merged domains and is extended to the non-common domain elements with appropriate nulls.
Thus, a sideways join on keyed tables with identical key columns has simple upsert semantics for common data columns. The non-common data columns are more interesting: the result becomes a column join spliced along common key values.
t4:([a:1 2 3] x:100 200 300)
t4
a| x
-| ---
1| 100
2| 200
3| 300
t5:([a:3 4 5] y:1000 2000 3000)
t5
a| y
-| ----
3| 1000
4| 2000
5| 3000
t4,'t5
a| x   y
-| --------
1| 100
2| 200
3| 300 1000
4|     2000
5|     3000
Complex Column Data
Simple Example
Recall from the definition of a column dictionary in Dictionary vs. List that there is no restriction that the column vectors must be lists of simple type. So far we have worked with examples having homogeneous atomic values in each column, because they correspond to familiar SQL tables. There is no need to limit ourselves to simple columns.
Suppose we want to keep track of a pair of daily observations, say a low temperature and a high temperature. We can do this by storing the low and high values in separate columns.
t1:([] d:2006.01.01 2006.01.02; l:67.9 72.8; h:82.1 88.4)
t1
d          l    h
--------------------
2006.01.01 67.9 82.1
2006.01.02 72.8 88.4
t1[0]
d| 2006.01.01
l| 67.9
h| 82.1
t1.l
67.9 72.8
t1.h
82.1 88.4
We can also store pairs in a single column.
t2:([] d:2006.01.01 2006.01.02; lh:(67.9 82.10; 72.8 88.4))
t2
d          lh
--------------------
2006.01.01 67.9 82.1
2006.01.02 72.8 88.4
t2[0]
d | 2006.01.01
lh| 67.9 82.1
t2.lh
67.9 82.1
72.8 88.4
t2.lh[;0]
67.9 72.8
t2.lh[;1]
82.1 88.4
The first form is arguably more natural if you intend to manipulate the low and high values separately. This example generalizes easily to n-tuples. There, storing multiple values in a single column has a definite advantage, since defining and populating n columns is unwieldy when n is not known in advance. Storing and retrieving n-tuples to/from a single column is a simple operation in q. A useful example in finance is storing daily values for a yield curve.
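The two layouts are easy to convert between. The sketch below assumes a current kdb+ release. It unpacks the pair column of t2 into separate columns by indexing at depth, then packs them back with flip. The names sep and packed are illustrative only.

```q
t2:([] d:2006.01.01 2006.01.02; lh:(67.9 82.1; 72.8 88.4))
/ unpack: lh[;0] takes item 0 of every pair, lh[;1] takes item 1
sep:select d, l:lh[;0], h:lh[;1] from sep2:t2
/ pack: flip turns the two column vectors (l;h) into a list of pairs
packed:select d, lh:flip (l;h) from sep
```

The same indexing (lh[;k]) extracts the k-th item of every n-tuple, so nothing in the sketch depends on the tuples being pairs.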
Operations on Compound Column Data
We generalize the above example to the case of storing a set of repeated observations in which the number of observations is not fixed, i.e., varies with each occurrence. Say we want to perform a statistical analysis on the weekly gross revenues for movies, and we don't care about the specific titles. Since a different number of movies will be in release each week, the number of observations will not be constant. An oversimplified version of this might look something like,
t3:([] wk:2006.01.01 2006.01.08; gr:( 38.92 67.34; 16.99 5.14 128.23 31.69))
t3
wk         gr
----------------------------------
2006.01.01 38.92 67.34
2006.01.08 16.99 5.14 128.23 31.69
Handling the situation in which the number of column values is not known in advance, or is variable, is cumbersome in SQL. You normalize the data into a master-detail pair of tables, but you cannot re-assemble the details into separate columns via a join. Instead, for each master record you get a collection of records that must be iterated over via some sort of cursor/loop. In verbose programming, this results in many lines of code that are slow and prone to error on edge cases.
By storing complex values in a single column in a q table, sophisticated operations can be performed in a single expression that executes fast. In the following q-sql examples, don't worry about the details of the syntax, and remember to read individual expressions from right to left. Observe that because there are no stinking loops, we never need to know the number of detail records.
Using our movie data, we can produce the sorted gross, the average and the high gross for each week in one expression.
select wk, srt:desc each gr, avgr:avg each gr, hi:max each gr from t3
wk         srt                     avgr    hi
-------------------------------------------------
2006.01.01 67.34 38.92             53.13   67.34
2006.01.08 128.23 31.69 16.99 5.14 45.5125 128.23
Sorts and aggregates such as MAX and AVG are standard SQL. Now think of how you'd produce the sorted sublist and the aggregates together. In your favorite verbose programming environment, you'll soon discover that you have a sordid list of rows requiring a loop to unravel into a single output line.
Next, think about what you'd do to compute the drops between successive gross numbers within each week. Because the sorted detail items are rows in SQL, this requires another loop. In q,
select wk,drp:neg 1_'deltas each desc each gr,avgr:avg each gr,hi:max each gr from t3
wk         drp              avgr    hi
------------------------------------------
2006.01.01 ,28.42           53.13   67.34
2006.01.08 96.54 14.7 11.85 45.5125 128.23
Compound Foreign Key
Storing multiple values in a column is how to make a foreign key on a compound primary key. We return to the example using last name and first name as the primary key.
ktc:([lname:`Dent`Beeblebrox`Prefect; fname:`Arthur`Zaphod`Ford] iq:98 42 126)
We create a details table with a foreign key enumeration over ktc by placing the names in the foreign key column.
tdetails:([] name:`ktc$(`Beeblebrox`Zaphod;`Prefect`Ford;`Beeblebrox`Zaphod); sc:36 126 42)
The columns of ktc are available as virtual columns from tdetails.
select name.lname,name.iq,sc from tdetails
lname      iq  sc
------------------
Beeblebrox 42  36
Prefect    126 126
Beeblebrox 42  42
Attributes
Attributes are metadata applied to lists of special form. They are used on a dictionary domain or a table column to reduce storage requirements and/or speed retrieval. When it sees an attribute, the q interpreter can make certain optimizations based on the structure of the list.
Important: Attributes are descriptive rather than prescriptive. Consequently, applying an attribute (other than `g#) to a list will not make it so. A modification that respects the form specified by the attribute leaves the attribute intact (other than `p#). A modification that breaks the form is permitted, but the attribute is lost on the result.
The syntax for applying an attribute looks like the verb # with a left operand containing the symbol for the attribute and the list as the right operand. However, this use of # is not functional.
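The built-in attr function offers a quick way to see that attributes are descriptive. It reports the attribute of a list, or the null symbol if there is none. The sketch assumes a current kdb+ release. There, applying `s# to a list that is not sorted signals an error instead of sorting the list; the sketch traps it with @[f;x;handler]. The variable names are ours.

```q
L:`s#1 2 2 4 8
a0:attr L                    / `s
a1:attr 1 2 2 4 8            / null symbol: a plain list carries no attribute
/ `s# only describes the list; it does not sort it, so an unsorted list is refused
r:@[{`s#x};3 1 2;{x}]        / the handler returns the error text
```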
Note: You will not see significant benefit from an attribute for fewer than a million items. This is why attributes are not automatically applied in mundane situations such as the result of til or distinct. You should test your particular situation to see whether applying an attribute actually provides a performance benefit.
Sorted (`s#)
Applying the sorted attribute (`s#) to a list indicates that the items of the list are sorted in ascending order.
Note: As of this writing (Jun 2007) there is no way to indicate a descending sort.
When a list has the sorted attribute, the default linear search used in lookups is replaced with binary search. The sorted attribute also makes certain operations much faster, for example min and max.
The following fragments show situations in which this applies.
x?v
... where x = v, ...
... where x in v, ...
... where x within v, ...
The sorted attribute can be applied to a simple list,
L:`s#1 2 2 4 8
L
`s#1 2 2 4 8
L,:16 / respects sort
L
`s#1 2 2 4 8 16
L,:0 / does not, attribute lost
L
1 2 2 4 8 16 0
or to a column of a table,
t:([]`s#t:04:02:42.001 04:02:42.003;v:101.05 100.95)
The sorted attribute can also be applied to a dictionary, which makes the dictionary into a step function.
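You can see the step-function behavior by indexing a sorted dictionary with a value that is not among its keys. The lookup returns the value for the largest key that does not exceed the argument, and null below the first key. The dictionary sd and its symbols are made up for this sketch, which assumes a current kdb+ release.

```q
sd:`s#10 20 30!`low`mid`high    / keys must already be in ascending order
e:sd 20                         / `mid : exact key
b:sd 25                         / `mid : the largest key not above 25 is 20
h:sd 99                         / `high
n:sd 5                          / null symbol : below the first key
```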
ds:`s#1 2 3 4 5!`a`b`c`d`e
ds
1| a
2| b
3| c
4| d
5| e
Applying the sorted attribute to a table implies binary search on the table and also that the first column is sorted.
ts:`s#([]t:04:02:42.001 04:02:42.003;v:101.05 100.95)
ts
t            v
-------------------
04:02:42.001 101.05
04:02:42.003 100.95
Applying the sorted attribute to a keyed table means that the dictionary, its key table and its key column(s) are all sorted.
kt:`s#([k:1 2 3 4] v:`d`c`b`a)
kt
k| v
-| -
1| d
2| c
3| b
4| a
Unique (`u#)
Applying the unique attribute (`u#) to a list indicates that the items of the list are distinct. Knowing that the elements of a list are unique dramatically speeds up distinct and allows q to exit some comparisons early.
Operations on the list must preserve uniqueness or the attribute is lost.
LU:`u#4 2 6 18 1
LU
`u#4 2 6 18 1
LU,:0 / uniqueness preserved
LU
`u#4 2 6 18 1 0
LU,:2 / attribute lost
LU
4 2 6 18 1 0 2
The unique attribute can be applied to the domain of a dictionary, a column of a table, or the key column of a keyed table. It cannot be applied directly to a dictionary, a table or a keyed table.
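Here is a small sketch of the uniqueness check, assuming a current kdb+ release. attr reports `u, and appending a new distinct item keeps the attribute. Applying `u# to a list that has duplicates is refused with a u-fail error; the list is not deduplicated. The error is trapped with @[f;x;handler].

```q
LU:`u#4 2 6 18 1
a0:attr LU                      / `u
LU,:0                           / 0 is new, so the list stays unique
a1:attr LU                      / `u
r:@[{`u#x};1 2 1;{x}]           / duplicates: the handler returns the error text
```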
Parted (`p#)
The parted attribute (`p#) indicates that the list represents a step function in which all occurrences of a particular output value are adjacent. The range is an int or temporal type that has an underlying int value, such as years, months, days, etc. You can also partition over a symbol provided it is enumerated.
Advanced: Applying the parted attribute causes the creation of an index dictionary that maps each unique output value to the position of its first occurrence.
When a list is parted, lookup is much faster since linear search is replaced by hashtable lookup.
Sorting in ascending or descending order is one way to produce the partitioned structure, but the list need not be in sorted order. For example,
L:`p#2 2 2 1 1 4 4 4 4 3 3
L,:3
L
2 2 2 1 1 4 4 4 4 3 3 3
The parted attribute is not preserved under an operation on the list, even if the operation preserves the partitioning.
Note: The parted attribute should be considered when the number of entities reaches a billion and most of the partitions are of substantial size, i.e., there is significant repetition.
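You can imitate the index described in the Advanced note by hand: first each group L maps each distinct value to the position of its first occurrence. This is only a model for intuition and does not read q's internal index. The sketch assumes a current kdb+ release.

```q
L:`p#2 2 2 1 1 4 4 4 4 3 3
a:attr L                        / `p
/ model of the parted index: distinct value -> position of its first occurrence
firstPos:first each group L     / 2 1 4 3!0 3 5 9
```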
Grouped (`g#)
The grouped attribute (`g#) differs from other attributes in that it imposes additional structure on the list by causing q to create and maintain an index. Grouping can be applied to a list when no other assumptions about its structure can be made.
Applying the grouped attribute to a table column roughly corresponds to placing a SQL index on a column. For example, if you wish to query a table via a symbol column sym, applying the grouped attribute to the column drastically speeds up queries such as,
select[-100] ... where sym=`xyz
Here we are retrieving the last 100 records matching a sym value.
Advanced: The index is a dictionary that maps each unique output value to a list of the positions of all its occurrences. This speeds lookups and some operations (e.g., distinct). The tradeoff is significant storage overhead.
For example,
L:`g#1 2 3 2 3 4 3 4 5 2 3 4 5 4 3 5 6
L
`g#1 2 3 2 3 4 3 4 5 2 3 4 5 4 3 5 6
Note: The grouped attribute is preserved for both inserts and upserts.
Applying the grouped attribute to a table column,
t:([]`g#c1:1 2 3 2 3 4 3 4; c2:`a`b`a`c`a`d`b`c)
Note: As of this writing (Jun 2007), the maximum number of `g# attributes that can be placed on a single table is 99.
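The group function computes the same value-to-positions mapping that the grouped index maintains, which gives a feel for its storage cost. The sketch below assumes a current kdb+ release. It also uses attr to check that `g# survives an append. To apply `g# to an existing column it uses update, an idiom we substitute for the in-definition form above.

```q
L:`g#1 2 3 2 3 4 3 4 5 2 3 4 5 4 3 5 6
pos:group L                     / each distinct value -> all of its positions
p4:pos 4                        / 5 7 11 13
L,:4                            / appending keeps the grouped attribute
a:attr L                        / `g
t:([] c1:1 2 3 2 3 4 3 4; c2:`a`b`a`c`a`d`b`c)
t:update `g#c1 from t           / apply `g# to an existing column
ac:attr t`c1                    / `g
```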
Contents
- 1 Queries: q-sql
- 1.1 Overview
- 1.2 Insert
- 1.2.1 Basic Insert
- 1.2.2 Alternate Forms
- 1.2.3 Repeated Inserts
- 1.2.4 Columnar Bulk Insert
- 1.2.5 Table Insert
- 1.2.6 Insert into Keyed Tables
- 1.2.7 Insert into Empty Tables
- 1.2.8 Insert and Foreign Keys
- 1.3 The select and exec Templates
- 1.3.1 Syntax
- 1.3.2 The where Phrase
- 1.3.3 The select Phrase
- 1.3.4 The by Phrase
- 1.3.5 The exec Template
- 1.3.6 Using distinct in select and exec
- 1.3.7 Using each in where
- 1.3.8 Nested where
- 1.3.9 select[n]
- 1.3.10 fby
- 1.4 The update Template
- 1.4.1 Basic update
- 1.4.2 update-by
- 1.5 upsert
- 1.6 delete
- 1.7 Grouping and Aggregation
- 1.7.1 SQL Aggregation
- 1.7.2 Grouping without Aggregation
- 1.7.3 Aggregation without Grouping
- 1.7.4 Grouping with Aggregation
- 1.7.5 Using Uniform and Aggregate Functions
- 1.7.6 Using each
- 1.7.7 Using ungroup
- 1.8 Sorting
- 1.8.1 xasc
- 1.8.2 xdesc
- 1.9 Renaming and Rearranging Columns
- 1.9.1 xcol
- 1.9.2 xcols
- 1.10 Joins
- 1.10.1 Equijoin on Foreign Key
- 1.10.2 Pseudo Join
- 1.10.3 Ad hoc Left Join
- 1.10.4 Plus Join
- 1.10.5 Union Join
- 1.10.6 Asof Join
- 1.11 Parameterized Queries
- 1.12 Views
- 1.12.1 View
- 1.13 Functional Forms
- 1.13.1 Functional select
- 1.13.2 Functional exec
- 1.13.3 Functional update
- 1.13.4 Functional delete
- 1.14 Examples
- 1.14.1 The Table Schemas
- 1.14.2 Creating the Tables
- 1.14.3 Basic Queries
- 1.14.4 Meaty Queries
- 1.14.5 Remote Queries
9. Queries: q-sql
Overview
Q has a collection of functions for manipulating tables that are similar to their counterparts in SQL. This collection, which we call q-sql, includes the usual suspects such as insert, select and update, as well as functionality that is not available in traditional SQL. While q-sql provides a superset of SQL functionality, there are some significant differences in syntax and behavior.
The first important difference is that a q table has well-defined record and column orders. This is particularly useful when records are inserted in a canonical order, because subsequent actions against the table will then retrieve records in this order. For example, a time series can be created by inserting (in time order) pairs consisting of a time (or date, or datetime) value and data value(s). The result of any select will then be in time order, without requiring a sort.
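Here is a minimal sketch of why canonical order matters. It uses a hypothetical trades table whose rows were inserted in time order, so last means "latest" with no sort or ORDER BY. The table, symbols and prices are invented for illustration.

```q
/ hypothetical trades, inserted in time order
trades:([] time:09:30:00 09:31:00 09:32:00 09:33:00; sym:`a`b`a`b; px:10 20 11 21)
/ because row order is time order, last px is the most recent price for each sym
latest:select last px by sym from trades
```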
A second difference is that every q table is stored physically as a collection of column vectors. This means that operations on column data are easy and fast, since atomic, aggregate or uniform functions applied to columns are optimized vector operations.
A third difference is that q-sql provides upsert semantics. This means that one dataset can be applied to another without the need to separate inserts from updates. Upsert can simplify operations significantly in practice.
In this chapter, we cover the important features of q-sql, including all the basic operations in kdb+. We demonstrate each feature with a simple example. More complex examples are introduced gradually.
Many examples are based on the sp.q distribution script. The schemas for the tables in the script are,
s:([s:()]name:();status:();city:())
p:([p:()]name:();color:();weight:();city:())
sp:([]s:`s$();p:`p$();qty:())
The contents of the tables are,
s
s | name  status city
--| -------------------
s1| smith 20     london
s2| jones 10     paris
s3| blake 30     paris
s4| clark 20     london
s5| adams 30     athens
p
p | name  color weight city
--| -------------------------
p1| nut   red   12     london
p2| bolt  green 17     paris
p3| screw blue  17     rome
p4| screw red   14     london
p5| cam   blue  12     paris
p6| cog   red   19     london
sp
s  p  qty
---------
s1 p1 300
s1 p2 200
s1 p3 400
s1 p4 200
s4 p5 100
s1 p6 100
s2 p1 300
s2 p2 400
s3 p2 200
s4 p2 200
s4 p4 300
s1 p5 400
Insert
Insert appends records to a table or keyed table.
Basic Insert
To add records to a table, use the dyadic function insert,
insert[st;L]
where st is a symbol containing the name of a table (target) and L is a list whose items correspond to records of target. The result of insert is a list of int representing the positions of the new record(s).
Note: Since the items in L are appended to the column vectors of st, each value must type-match the corresponding column vector.
For a regular (i.e., non-keyed) table, the effect of insert is to append a new record holding the specified values. Let's use our simple example.
t:([] name:`Dent`Beeblebrox`Prefect; iq:42 98 126)
Insert a record into t as follows,
insert[`t;(`Slartibartfast;156)]
,3
t
name           iq
-------------------
Dent           42
Beeblebrox     98
Prefect        126
Slartibartfast 156
Alternate Forms
Since the dyadic insert is also a verb, it can take various notational forms. For example, the previous insert can be written as a binary operator.
`t insert (`Slartibartfast; 156)
It can also be expressed as a projection onto the first argument, with the second argument juxtaposed.
insert[`t] (`Slartibartfast; 156)
You may find one of these more readable. We shall use them interchangeably.
You can also insert a record, as opposed to a list of row values.
`t insert `name`iq!(`Slartibartfast; 156)
This is useful when you wish to insert a table that is the result of a select.
Repeated Inserts
For a (non-keyed) table, repeatedly inserting the same data is permissible and results in duplicate records.
t:([] name:`Dent`Beeblebrox`Prefect; iq:42 98 126)
`t insert (`Slartibartfast; 156) / one form
,3
insert[`t] (`Slartibartfast; 156) / equivalent form
,4
t
name           iq
-------------------
Dent           42
Beeblebrox     98
Prefect        126
Slartibartfast 156
Slartibartfast 156
Columnar Bulk Insert
So far, we have considered the case in which the list in an insert represents a set of values for a single row. Each item is an atom destined for the corresponding column in the table. It is also possible to bulk insert multiple entries.
Recall that a table is a dictionary of columns. So in the example,
t:([] name:`Dent`Beeblebrox; iq:98 42)
`t insert (`Prefect;126)
,2
the right operand looks like a row, but is in fact a list of column values. With this perspective, a bulk insert can be achieved with a compound list, each of whose items is a list of column values destined for the corresponding column in the table.
t:([] name:`Dent`Beeblebrox; iq:98 42)
`t insert (`Prefect`Mickey;126 1024)
2 3
Table Insert
It is also possible to bulk insert records (i.e., rows). A table can be viewed as a list of records (and vice versa), so it is reasonable to insert one table into another provided the columns are compatible.
t:([] name:`Dent`Beeblebrox`Prefect; iq:98 42 126)
tnew:([] name:`Slartibartfast`Mickey; iq:158 1042)
`t insert tnew
3 4
t
name           iq
-------------------
Dent           98
Beeblebrox     42
Prefect        126
Slartibartfast 158
Mickey         1042
Insert into Keyed Tables
Inserting data into a keyed table works just like inserting data into a regular table, with the additional requirement that the key must not already exist in the table. Using our previous example of a keyed table,
t:([eid:1001 1002] name:`Dent`Beeblebrox; iq:98 42)
t
eid | name       iq
----| -------------
1001| Dent       98
1002| Beeblebrox 42
`t insert (1004; `Slartibartfast; 158)
,2
t
eid | name           iq
----| -------------------
1001| Dent           98
1002| Beeblebrox     42
1004| Slartibartfast 158
The following insert fails because the key 1004 already exists in t:
`t insert (1004; `Slartibartfast; 158)
'insert
Observe that, by default, the records in a keyed table are stored in insert order rather than key order.
`t insert (1003; `Prefect; 126)
t
eid | name           iq
----| -------------------
1001| Dent           98
1002| Beeblebrox     42
1004| Slartibartfast 158
1003| Prefect        126
Insert into Empty Tables
We consider the situation of an empty table with no column types specified. The column types are inferred from the first insert.
t:([] name:(); iq:())
type t.name
0h
type t.iq
0h
`t insert (`Dent; 98)
,0
type t.name
11h
type t.iq
6h
If you define an empty table without types, be especially careful to get the first insert correct.
`t insert (98; `Dent)
...
`t insert (`Beeblebrox; 42)
'type
It is advantageous to define an empty table with types. In our example,
t:([] name:`symbol$(); iq:`int$())
t:([] name:0#`; iq:0#0) / an equivalent way
type t.name
11h
type t.iq
6h
Insert and Foreign Keys
When inserting data into a table that has a foreign key, everything works as for a regular table, except that a value destined for a foreign key column must already exist as a key in the corresponding primary key table.
Note: This last requirement is how q implements referential integrity.
Returning to our example of the previous section,
kt:([eid:1001 1002 1003] name:`Dent`Beeblebrox`Prefect; iq:98 42 126)
tdetails:([] eid:`kt$1003 1002 1001 1002 1001; sc:126 36 92 39 98)
kt
eid | name       iq
----| --------------
1001| Dent       98
1002| Beeblebrox 42
1003| Prefect    126
tdetails
eid  sc
--------
1003 126
1002 36
1001 92
1002 39
1001 98
`tdetails insert (1002; 42)
,5
tdetails
eid  sc
--------
1003 126
1002 36
1001 92
1002 39
1001 98
1002 42
The following insert fails because the key 1004 does not exist in kt:
`tdetails insert (1004; 158)
'cast
The select and exec Templates
In this section, we investigate the general form of select, which we met briefly in Basic select. We present select as a template having required and optional elements. The template elements, in turn, contain phrases whose expressions involve column values. The q interpreter applies the template against the specified table to produce a result table. While the syntax and results resemble those of the analogous SQL statement, the underlying mechanics are quite different.
We shall examine each of the constituents of the select template in detail. Our approach is to introduce the concepts with illustrative examples using trivial tables, and then to proceed with more meaningful examples using time series. Here are our sample table definitions:
tk:([eid:1001 1002 1003] name:`Dent`Beeblebrox`Prefect; iq:98 42 126)
tdetails:([] eid:`tk$1003 1002 1001 1002 1001 1002; sc:126 36 92 39 98 42)
Syntax
The select template has the following form, where elements enclosed in matching angle brackets (<...>) are optional.
select <ps> <by pb> from texp <where pw>
The select and from keywords are required, as is texp, which is a q expression whose result is a table or keyed table. The elements ps, pb and pw are the select, the by and the where phrases, respectively. The result of select is a list of records or, equivalently, a table.
Note: If where is present and texp is itself the result of a select, the expression that produces texp must be enclosed in parentheses.
Some simple examples follow.
select from tk
eid | name       iq
----| --------------
1001| Dent       98
1002| Beeblebrox 42
1003| Prefect    126
select eid,name from tk where name=`Dent
eid  name
---------
1001 Dent
select cnt:count sc by eid.name from tdetails
name      | cnt
----------| ---
Beeblebrox| 3
Dent      | 2
Prefect   | 1
select topsc:max sc, cnt:count sc by eid.name from tdetails where eid.name<>`Prefect
name      | topsc cnt
----------| ---------
Beeblebrox| 42    3
Dent      | 98    2
The order of execution for select is:
(1) from expression texp
(2) where phrase pw
(3) by phrase pb
(4) select phrase ps
In particular, the from expression is always evaluatedfirst and the select phrase last.
Note: If ps is absent, all columns are returned. There is no need for the * wildcard of SQL.
Each phrase in the select template is a comma-separated list of subphrases. A subphrase is an expression involving columns of texp, or virtual columns of a table related to texp via a foreign key. The subphrases within a phrase are evaluated left-to-right, but each expression comprising a subphrase is parsed right-to-left, like any q expression.
Important: The commas separating the subphrases are separators, meaning that it is not necessary to enclose a subphrase in parentheses. However, any expression containing the join operator (,) must be enclosed in parentheses to distinguish it from the separator.
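Here is a small sketch of the parenthesization rule. The each-both join a,'b is wrapped in parentheses so that its comma is not read as a subphrase separator. The table t is a throwaway example.

```q
t:([] a:1 2 3; b:10 20 30)
/ the parentheses keep the comma of ,' from being read as a subphrase separator
r:select a, ab:(a,'b) from t
```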
The where Phrase
The where phrase controls which records appear in the result. The action of this phrase is a generalization of the built-in where function (see Appendix A).
Each subphrase is a criterion on columns. It produces a boolean result vector corresponding to records passing or failing the criterion. The effect of a where subphrase is to select only the records that pass its criterion.
The individual where subphrases are applied from left to right. Each step produces a result whose rows are a subset of the previous one. The net effect is a series of progressively narrowed interim tables.
select from tk where iq <100
eid  name       iq
------------------
1001 Dent       98
1002 Beeblebrox 42
select from tdetails where eid=1002
eid  sc
-------
1002 36
1002 39
1002 42
select from tdetails where eid=1002,sc
eid  sc
-------
1002 36
1002 39
select from tdetails where (eid=1002)&sc
eid  sc
-------
1002 36
1002 39
The last two queries return the same result but execute differently; we shall see more about this later. Also observe that the parentheses in the last query are necessary due to right-to-left evaluation of expressions.
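The sketch below makes the comparison concrete. It assumes a threshold of 40 for the sc criterion; this is our choice, since any threshold greater than 39 and at most 42 selects the same two rows. It uses tdet, a plain copy of tdetails without the foreign key. With the comma form, sc<40 is tested only on the rows that survived eid=1002. With &, both comparisons run over every row before being combined.

```q
tdet:([] eid:1003 1002 1001 1002 1001 1002; sc:126 36 92 39 98 42)
/ comma form: sc<40 is tested only on rows that passed eid=1002
a:select from tdet where eid=1002, sc<40
/ & form: both comparisons run on every row, then the boolean vectors are combined
b:select from tdet where (eid=1002)&sc<40
```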
The selectPhrase
The select phrase controls which columns appear in theresult. Each select subphrase produces a column. The name of the result columnfrom each subphrase is taken from the last underlying column referenced in thesubphrase evaluation unless the result is renamed by assignment.
select LastName:name,iq from tkLastName iq--------------Dent 98Beeblebrox 42Prefect 126If a column is repeated in the select phrase, it appearsmore than once in the result. This behaves like SQL SELECT.
select iq,iq from tkiq iq-------98 9842 42126 126Important:A virtual columni holding the position of each record is implicitly available in theselect phrase.This is useful, for example, in aggregation if you want a column with recordcounts without reference to a specific column name.
select cnt:count i by eid from tdetails
eid | cnt
----| ---
1001| 2
1002| 3
1003| 1
In this situation, i plays a role somewhat similar to * in SQL's COUNT(*), but is more useful since it can be used to select specific records. For example, criteria on i can be used to fill only one page of results when you do not wish to transmit an entire result set. Here is the second page of detail records for a page size of 3. Since i is zero-based, this page holds records 3 through 5; the same constraint could be written as i within 3 5, since the within function includes both its endpoints (see Appendix A).
select from tdetails where (3<=i) and i<6
eid  sc
-------
1002 39
1001 98
1002 42
This is difficult to do in SQL and vendors have added proprietary extensions to handle it.
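The paging pattern can be wrapped in a small function. The sketch below is one possible form rather than a built-in: page is a hypothetical helper taking a table, a zero-based page number and a page size, and tdetails is rebuilt as a plain table from the values displayed in this section (the original may define eid as a foreign key into tk).

```q
/ plain version of tdetails, rebuilt from the output shown above (assumption)
tdetails:([] eid:1003 1002 1001 1002 1001 1002; sc:126 36 92 39 98 42)
/ page is a hypothetical helper: n is the zero-based page number, sz the page size
/ within is inclusive, so the range runs from n*sz to ((n+1)*sz)-1
page:{[t;n;sz] select from t where i within (n*sz;(sz*n+1)-1)}
/ second page of size 3: records 3, 4 and 5
page[tdetails;1;3]
```

Asking for a page past the end simply returns an empty table, which is convenient when a client pages until no rows come back.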
The by Phrase
The by phrase controls how rows are grouped in the result. The action of this phrase is a generalization of the built-in group function (see Appendix A).
Each by subphrase is an expression involving a column. It produces a grouping criterion for that column. The columns resulting from the by phrase become the primary keys of the select result. Multiple subphrases in the by phrase result in a compound primary key in the result.
Note: If the by phrase is included, the result of select is a keyed table; if it is omitted, the result is a table.
Important: Every column included in the by phrase is automatically included in the result and should not be included separately in the select phrase.
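Both points can be checked with the built-in type function, which reports 99h for a keyed table (a dictionary from a key table to a value table) and 98h for a plain table. The sketch reuses a plain rebuild of tdetails; the result column name total is chosen only for illustration.

```q
tdetails:([] eid:1003 1002 1001 1002 1001 1002; sc:126 36 92 39 98 42)
/ with a by phrase: keyed table, type 99h
r:select total:sum sc by eid from tdetails
type r
/ without a by phrase: plain table, type 98h
type select from tdetails
/ eid was not listed in the select phrase, yet it is the key of the result
kr:key r
kr`eid
```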
It is possible to group without aggregation. The result is a keyed table with non-simple lists for columns - that is, non-atomic column values. (See Complex Column Data for more on tables with non-simple column lists.)
select sc by eid from tdetails
eid | sc
----| --------
1001| 92 98
1002| 36 39 42
1003| ,126
This cannot be achieved easily with GROUP BY in SQL.
The function ungroup can be used to normalize the result of grouping back to a flat table.
seid:select sc by eid from tdetails
seid
eid | sc
----| --------
1001| 92 98
1002| 36 39 42
1003| ,126
ungroup seid
eid  sc
--------
1001 92
1001 98
1002 36
1002 39
1002 42
1003 126
The exec Template
The syntax of the exec template is identical to select.
exec <ps> <by pb> from texp <where pw>
The difference from select is that the result is not a table.
If only one column is produced by the select phrase, the result of exec is a list containing the column values produced. This contrasts with select, which produces a table with a single column in this situation.
With tk as above,
tk
eid | name       iq
----| --------------
1001| Dent       98
1002| Beeblebrox 42
1003| Prefect    126
select name from tk
name
----------
Dent
Beeblebrox
Prefect
exec name from tk
`Dent`Beeblebrox`Prefect
Using exec to extract a single column of a table (as opposed to a keyed table) is more powerful than other mechanisms to extract the column because you can apply constraints on other columns.
tdetails.sc
126 36 92 39 98 42
tdetails[`sc]
126 36 92 39 98 42
exec sc from tdetails
126 36 92 39 98 42
exec sc from tdetails where eid in 1001 1002
36 92 39 98 42
If more than one column is produced by the select phrase, the result of exec is a dictionary mapping the column names to the values produced. This contrasts with select, which produces a table with the specified columns.
select eid,name from tk
eid  name
---------------
1001 Dent
1002 Beeblebrox
1003 Prefect
exec eid,name from tk
eid | 1001 1002 1003
name| Dent Beeblebrox Prefect
Using distinct in select and exec
The built-in distinct function (see Appendix A) applied to a source table returns a table containing the unique records in the source.
tdup:([]c1:10 20 10 30 10 20 40 30;c2:`a`b`a`c`z`b`d`c)
tdup
c1 c2
-----
10 a
20 b
10 a
30 c
10 z
20 b
40 d
30 c
distinct tdup
c1 c2
-----
10 a
20 b
30 c
10 z
40 d
By including distinct in the select phrase of a select or exec query, you can similarly suppress duplicates from the result.
select distinct c1 from tdup
c1
--
10
20
30
40
exec distinct c2 from tdup
`a`b`c`z`d
Note: When distinct is used in select, it appears immediately after ‘select’ and is applied across all the specified columns, meaning that it returns rows with distinct values in those columns. By contrast, in exec, distinct can apply to any column and the result will in general be non-rectangular.
select distinct c2,c1 from tdup
c2 c1
-----
a  10
b  20
c  30
z  10
d  40
exec distinct c2,c1 from tdup
c2| `a`b`c`z`d
c1| 10 20 10 30 10 20 40 30
exec distinct c2,distinct c1 from tdup
c2| `a`b`c`z`d
c1| 10 20 30 40
One way to understand this behavior is as follows. The result of select is a table, which is rectangular; hence distinct must produce full rows. The result of exec is a dictionary, so each column name (i.e., key) can have a different number of values.
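Counting the items makes the difference concrete. This sketch repeats tdup from above and compares the lengths produced by exec and by select.

```q
tdup:([]c1:10 20 10 30 10 20 40 30;c2:`a`b`a`c`z`b`d`c)
/ exec returns a dictionary, so each column may have its own length
d:exec distinct c2,distinct c1 from tdup
count each d
/ select distinct must return whole rows of a rectangular table
count select distinct c2,c1 from tdup
```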
Using each in where
If a function or operator used in a where criterion is not atomic or uniform in its argument, you must use an each adverb. This is because the criterion is applied across the column vector(s).
ts:([]f:1.1 2.2 3.3;s:("abc";"d";"ef"))
select from ts where s~"abc"
f s
---
select from ts where ("abc"~) each s
f   s
---------
1.1 "abc"
The first select does not achieve the desired result because it asks whether the entire column matches the specified string. The second select works correctly because it applies the projection ("abc"~) of the binary match operator to each item of the column.
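The same test can also be written with the each-left adverb \: applied to match, which pairs every item of the column with the fixed string. The sketch below repeats ts and checks that both forms select the same rows; which one to prefer is largely a matter of taste.

```q
ts:([]f:1.1 2.2 3.3;s:("abc";"d";"ef"))
/ each-left (\:) matches every item of s against the fixed string "abc"
r1:select from ts where s~\:"abc"
/ the projection form used above
r2:select from ts where ("abc"~) each s
r1~r2
```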
Nested where
As was mentioned in The where Phrase, the criteria in the subphrases of a where phrase are applied to the records of the table sequentially from left to right. Consequently, the final list of records is obtained via a succession of intermediate results, each of which is narrowed by the following subphrase criterion. Put another way, the where subphrases constitute a nested set of criteria.
The order of the subphrases in a nested where can have significant performance implications for queries against large tables. Whenever possible, list the subphrases in order of decreasing restrictiveness. That is, choose the subphrase at each position to be the one that results in the greatest narrowing. Each intermediate table will then be as small as possible, and consequently less processing will be required at the next step.
Note: If there is one where subphrase that will always result in a significantly smaller result set, it should be placed first in the sequence. In the case of a partitioned table, place any constraint on the partition column first.
A typical example is a series of measurements for entities with an identifier. This could be real-time stock prices, daily bond yields, yearly batting averages, test scores, etc. Say there are many different identifier values and you want to select certain records for a given identifier. It is better to filter on the identifier first since this will immediately restrict the result set to a small subset of the original. This can lead to an order of magnitude improvement.
Let's take our trivial example of IQ test scores and imagine that the table contains the SAT scores for all high school seniors in the United States. In this case, there will be several million students with only a few records per student. Clearly if you want to perform an analysis on the scores of an individual, it is best to limit the result by student first, since the initial and subsequent intermediate tables will be tiny.
Imagine the following table containing millions of student social security numbers and scores.
tscores:([]ssn:0#`;sc:0#0)
`tscores insert (`$"111-11-1111"; 999)
`tscores insert (`$"222-22-2222"; 1242)
`tscores insert (`$"333-33-3333"; 735)
`tscores insert (`$"444-44-4444"; 1600)
`tscores insert (`$"555-55-5555"; 1178)
`tscores insert (`$"111-11-1111"; 1021)
`tscores insert (`$"666-66-6666"; 882)
...
Since each student takes the test only a few times, the following query,
select from tscores where ssn=`$"111-11-1111",0<sc
executes significantly faster than,
select from tscores where (ssn=`$"111-11-1111")&0<sc
We point out that any nested where phrase is logically equivalent to an unnested phrase in which the subphrases are joined by &. In our example, the nested query produces the same results as either,
select from tscores where (ssn=`$"111-11-1111")&0<sc / or
select from tscores where (0<sc)&ssn=`$"111-11-1111"
However, both unnested versions will execute more slowly since the compound criterion is applied against all records in the table.
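Nesting changes the amount of work, not the answer. The sketch below builds a three-row tscores and compares a nested where phrase with its & form; the second criterion 0<sc is included only as an illustration. On a table this small there is no measurable speed difference, so only the results are compared.

```q
tscores:([]ssn:0#`;sc:0#0)
`tscores insert (`$"111-11-1111"; 999)
`tscores insert (`$"222-22-2222"; 1242)
`tscores insert (`$"111-11-1111"; 1021)
/ nested: the ssn filter runs first, so 0<sc only sees that student's rows
r1:select from tscores where ssn=`$"111-11-1111",0<sc
/ unnested: both comparisons are evaluated against every record
r2:select from tscores where (ssn=`$"111-11-1111")&0<sc
r1~r2
```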
select[n]
You can return the first or last n records of a select result using function parameter syntax on the select. A positive parameter returns the first n records specified by the select body, while a negative parameter returns the last n records.
select[2] from tk
eid | name       iq
----| -------------
1001| Dent       98
1002| Beeblebrox 42
select[-1] from tk
eid | name    iq
----| -----------
1003| Prefect 126
fby
It is sometimes desirable to use an aggregate function in the where phrase of select. For example, suppose we are given a table with a foreign key and we wish to determine which key values have more than one entry in the table. A first attempt might be to place a condition in the where phrase that filters on the count being greater than 1. In our example of tdetails, this would be something like,
select distinct eid from tdetails where 1<count eid
eid
----
1003
1002
1001
You can see this doesn't work, as the record for eid value 1003 is included even though it has only a single entry in tdetails. What went wrong?
The better question is, what does this where expression actually do? Since count is an aggregate function, it is applied against the entire list of column values for eid. It cannot select individual rows since it does not return a boolean vector result. Indeed, it returns the scalar 6, the number of items in the column vector, so the criterion 1<6 is simply true and no records are filtered out.
You could achieve the desired result with a correlated subquery. The inner query counts the records for each key value using aggregation and grouping.
q1:select cnt:count eid by eid from tdetails
q1
eid | cnt
----| ---
1001| 2
1002| 3
1003| 1
The outer query selects the records with the desired count.
select eid from q1 where 1<cnt
eid
----
1001
1002
An easier way to accomplish this result is to use fby in the where phrase. Placing fby in a where subphrase allows an aggregate function to be used to select individual rows. The action is similar to the grouping of by, with the specified aggregate function applied across the grouped values. (Hence the name "fby", which is short for "function by".)
The use of fby is somewhat more abstract than other elements of the select template. It is a binary operator of the form,
(fagg;expcol) fby c
The left operand is a two-item list consisting of an aggregate function fagg and a column expression expcol on which the function will be applied. The right operand c is the column whose values are grouped to form lists for the aggregate function; it is written as a plain column name, without a backtick.
Inclusion of fby in a where subphrase selects those records whose group passes the subphrase criterion specified by the aggregate function. This means that all records in a group either pass or fail together, depending on the result of the aggregation on the group.
In our example above, we can achieve the desired result with an un-nested select using fby. First, we verify that fby does indeed accomplish what we want. Remember to evaluate the where criterion right-to-left.
select eid from tdetails where 1<(count;eid) fby eid
eid
----
1002
1001
1002
1001
1002
Now we eliminate the duplicates.
select distinct eid from tdetails where 1<(count;eid) fby eid
eid
----
1002
1001
Note: Multiple columns in the right operand of fby must be encapsulated in a table. To do this, create an anonymous table from the desired columns by listing only their names in table syntax, as in ([]sym;ex) below.
t:([]sym:`IBM`IBM`MSFT`IBM`MSFT;ex:`N`O`N`N`N;time:12:10:00.0 12:30:00.0 12:45:00.0 12:50:00.0 13:30:00.0;price:82.1 81.95 23.45 82.05 23.40)
t
sym  ex time         price
--------------------------
IBM  N  12:10:00.000 82.1
IBM  O  12:30:00.000 81.95
MSFT N  12:45:00.000 23.45
IBM  N  12:50:00.000 82.05
MSFT N  13:30:00.000 23.4
select from t where price=(max;price) fby ([]sym;ex)
sym  ex time         price
--------------------------
IBM  N  12:10:00.000 82.1
IBM  O  12:30:00.000 81.95
MSFT N  12:45:00.000 23.45
It may take a while to get used to this notation.
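Any aggregate can be used with fby, not only max or count. As a further sketch, this version of t keeps only the sym and price columns from above and selects the trades priced above the average price for their symbol.

```q
t:([]sym:`IBM`IBM`MSFT`IBM`MSFT;price:82.1 81.95 23.45 82.05 23.40)
/ keep the rows whose price exceeds the average price of their sym
r:select from t where price>(avg;price) fby sym
r
```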
The update Template
Basic update
The update template has the same form as the select template.
update <pu> <by pb> from texp <where pw>
The difference is that column assignments in the update phrase pu represent modifications to columns instead of column name aliases.
t:([] c1:`one`two`three; c2:10 20 30)
update c1:`third,c2:33 from t where c1=`three
c1    c2
--------
one   10
two   20
third 33
Important: In order to modify the contents of texp you must refer to the table by name.
After execution of the query above, we still find,
t
c1    c2
--------
one   10
two   20
three 30
However, t can be modified in place by referring to the table by name.
update c1:`third,c2:33 from `t where c1=`three
`t
t
c1    c2
--------
one   10
two   20
third 33
Note: Unlike updates in SQL, update can add a new column.
t:([] c1:20 10 30 20; c2:`z`y`x`a)
t
c1 c2
-----
20 z
10 y
30 x
20 a
update c3:100+c1 from `t
`t
t
c1 c2 c3
---------
20 z  120
10 y  110
30 x  130
20 a  120
update-by
When the by phrase is present, update can be used to create new columns from the grouped values. When an aggregate function is used, it is applied to each group of values and the result is assigned to all records in the group.
t:([] n:`a`b`a`c`c`b; p:10 15 12 20 25 14)
t
n p
----
a 10
b 15
a 12
c 20
c 25
b 14
update av:avg p by n from t
n p  av
---------
a 10 11
b 15 14.5
a 12 11
c 20 22.5
c 25 22.5
b 14 14.5
If a uniform function is used, it is applied across the grouped values and the result is assigned in sequence to the records in the group. With t as above,
update s:sums p by n from t
n p  s
-------
a 10 10
b 15 15
a 12 22
c 20 20
c 25 45
b 14 29
upsert
The dyadic function upsert is an alternate name for join ( , ) on tables and keyed tables.
For keyed tables, the match is done by key value.
kt:([k:`one`two`three] c:10 20 30)
kt
k    | c
-----| --
one  | 10
two  | 20
three| 30
ku:([k:`three`four] c:300 400)
ku
k    | c
-----| ---
three| 300
four | 400
kt upsert ku
k    | c
-----| ---
one  | 10
two  | 20
three| 300
four | 400
For regular (non-keyed) tables, the records are appended.
t:([]c1:`one`two`three;c2:10 20 30)
t
c1    c2
--------
one   10
two   20
three 30
u:([]c1:`three`four;c2:30 40)
u
c1    c2
--------
three 30
four  40
t upsert u
c1    c2
--------
one   10
two   20
three 30
three 30
four  40
Note: The upsert expressions above do not affect the original table. You must refer to the table by name to modify the original.
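To change the table itself, pass its name. The sketch below repeats the keyed example and applies upsert to the symbol `kt; afterwards kt holds the updated row for three and the new row for four.

```q
kt:([k:`one`two`three] c:10 20 30)
ku:([k:`three`four] c:300 400)
/ passing the symbol `kt rather than the value kt updates the global in place
`kt upsert ku
kt
```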
delete
The syntax of the delete template is simpler than that of select, with the added restriction that either pcols or pw can be present but not both.
delete <pcols> from texp <where pw>
If pcols is present as a list of column names, the result is a table derived from texp in which the specified columns are removed. If pw is present, the result is a table derived from texp in which records meeting the criteria of pw are removed.
t:([]c1:`a`b`c;c2:`x`y`z)
t
c1 c2
-----
a  x
b  y
c  z
delete c1 from t
c2
--
x
y
z
delete from t where c2=`z
c1 c2
-----
a  x
b  y
Important: In order to modify the contents of texp you must refer to the table by name.
Thus, after execution of the last query above, we still find,
t
c1 c2
-----
a  x
b  y
c  z
However, t can be modified in place with,
delete from `t where c2=`z
`t
t
c1 c2
-----
a  x
b  y
Grouping and Aggregation
Aggregation is the result of applying an aggregate function - one that produces an atom from a list - to a column.
SQL Aggregation
In traditional SQL, aggregation and grouping are limited and cumbersome. Aggregation and grouping are bound together: apart from aggregates, only columns that appear in the GROUP BY can participate in the SELECT result. Moreover, there is a limited collection of built-in aggregation functions.
In q, grouping and aggregation can be used independently or together.
Grouping without Aggregation
Grouping in q collects rows having a common value in the group domain. Unlike SQL, any column can participate in the select result when grouping. Moreover, the columns in the by phrase are automatically included in the result as keys.
When a column not in the by phrase is explicitly specified in the select phrase, the result of grouping without aggregation has a corresponding column of non-simple type. There will be one item in the value list for each record matching a given group domain value.
For example, we can group order quantities by supplier in the sp.q script sample tables.
select qty by s from sp
s | qty
--| -----------------------
s1| 300 200 400 200 100 400
s2| 300 400
s3| ,200
s4| 100 200 300
You can group by the result of a function applied to a column. For example, the following query groups all products meeting a certain order quantity threshold.
select distinct p by thrsh:qty>200 from sp
thrsh| p
-----| ------------------
0    | `p$`p2`p4`p5`p6
1    | `p$`p1`p3`p2`p4`p5
You can also group by virtual columns from foreign keys.
select sname:s.name, qty by pname:p.name from sp
pname| sname                    qty
-----| ----------------------------------------
bolt | `smith`jones`blake`clark 200 400 200 200
cam  | `clark`smith             100 400
cog  | ,`smith                  ,100
nut  | `smith`jones             300 300
screw| `smith`smith`clark       400 200 300
Important: When no columns are explicitly specified in the select phrase, the result of grouping without aggregation has columns of simple type. The value for each result column is obtained by picking the value of the last record matching the group domain value.
For example, the following query,
select by p from sp
p | s  qty
--| ------
p1| s2 300
p2| s4 200
p3| s1 400
p4| s4 300
p5| s1 400
p6| s1 100
is equivalent to the following query using the aggregate last on each non-grouped column,
select last s, last qty by p from sp
p | s  qty
--| ------
p1| s2 300
p2| s4 200
p3| s1 400
p4| s4 300
p5| s1 400
p6| s1 100
One way to obtain all the remaining columns in a grouping without explicitly listing them in a select is to use the xgroup function. It takes column symbol(s) as the left operand and a table as its right operand. The result is a keyed table that is the same as listing all the non-grouped columns in the comparable select.
Using the distribution example,
`p xgroup sp
p | s               qty
--| -------------------------------
p1| `s$`s1`s2       300 300
p2| `s$`s1`s2`s3`s4 200 400 200 200
p3| `s$,`s1         ,400
p4| `s$`s1`s4       200 300
p5| `s$`s4`s1       100 400
p6| `s$,`s1         ,100
Aggregation without Grouping
Aggregation can be applied against a column of simple type in any table. The aggregate function can be any function that processes a list of the appropriate form and produces an atom. While q has many built-in aggregates, you can also define and use your own.
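For instance, a range (maximum minus minimum) is easily defined as a user aggregate. In the sketch below rng is a hypothetical name, and sp is a plain rebuild of the sample table with ordinary symbol columns (the distribution script enumerates s and p).

```q
/ plain rebuild of sp from the listing in this section (assumption: no enumerations)
sp:([]s:`s1`s1`s1`s1`s4`s1`s2`s2`s3`s4`s4`s1;p:`p1`p2`p3`p4`p5`p6`p1`p2`p2`p2`p4`p5;qty:300 200 400 200 100 100 300 400 200 200 300 400)
/ rng is a hypothetical user-defined aggregate: it maps a list to an atom
rng:{max[x]-min x}
tot:select r:rng qty from sp
res:select r:rng qty by s from sp
res
```

Because rng returns an atom for each list, it slots into select and select ... by exactly like avg or max.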
We calculate the total and average order quantity in the sp.q script sample tables by using the built-in aggregates sum and avg.
select totq:sum qty, avgq:avg qty from sp
totq avgq
-------------
3100 258.3333
Grouping with Aggregation
The equivalent of SQL aggregation is achieved in q by combining grouping with aggregation.
Continuing with the sp.q script example, we combine grouping and aggregation to compute the average order quantity by supplier.
select avgqty:avg qty by s.name from sp
name | avgqty
-----| --------
blake| 200
clark| 200
jones| 350
smith| 266.6667
Using Uniform and Aggregate Functions
Any uniform or aggregate function can be applied directly to columns in aggregation.
Again using the sp.q distribution example, for each salesperson we can find the cumulative low quantity along with the average and high.
select cumlo:mins qty, av:avg qty, hi:max qty by s.name from sp
name | cumlo                   av       hi
-----| ------------------------------------
blake| ,200                    200      200
clark| 100 100 100             200      300
jones| 300 300                 350      400
smith| 300 200 200 200 100 100 266.6667 400
Using each
If the data in a column is not atomic (that is, the column has a list of values for each row), you must use the each modifier to apply an aggregate.
In our sp.q example, suppose we define a table of intermediate results as,
o:select qty by p.name from sp
o
name | qty
-----| ---------------
bolt | 200 400 200 200
cam  | 100 400
cog  | ,100
nut  | 300 300
screw| 400 200 300
We must use each to compute the average order size for each product in o.
select name, avqty:avg qty from o
'length
select name, avqty:avg each qty from o
name  avqty
-----------
bolt  250
cam   250
cog   100
nut   300
screw 300
Using ungroup
The monadic function ungroup is a partial inverse of the grouping performed by select with a by phrase and by xgroup. It unwinds the keyed table into a table whose records have the same format as the original table. How closely its output resembles the original table depends on whether information has been collapsed in the grouping.
We use the sp table from the distribution script for our examples.
sp
s  p  qty
---------
s1 p1 300
s1 p2 200
s1 p3 400
s1 p4 200
s4 p5 100
s1 p6 100
s2 p1 300
s2 p2 400
s3 p2 200
s4 p2 200
s4 p4 300
s1 p5 400
`p xgroup sp
p | s               qty
--| -------------------------------
p1| `s$`s1`s2       300 300
p2| `s$`s1`s2`s3`s4 200 400 200 200
p3| `s$,`s1         ,400
p4| `s$`s1`s4       200 300
p5| `s$`s4`s1       100 400
p6| `s$,`s1         ,100
Since no aggregation has been performed and all non-key columns are present, the result of ungroup contains the same records as the original table, with the group column(s) moved to the front and the rows sorted by the group column(s).
ungroup `p xgroup sp
p  s  qty
---------
p1 s1 300
p1 s2 300
p2 s1 200
p2 s2 400
p2 s3 200
p2 s4 200
p3 s1 400
p4 s1 200
p4 s4 300
p5 s4 100
p5 s1 400
p6 s1 100
If aggregation has been performed or columns have been omitted, then only the selected values will be reflected after the ungroup. For example, we omit the s column in the following grouping, so it is also missing after the ungroup.
select qty by p from sp
p | qty
--| ---------------
p1| 300 300
p2| 200 400 200 200
p3| ,400
p4| 200 300
p5| 100 400
p6| ,100
ungroup select qty by p from sp
p  qty
------
p1 300
p1 300
p2 200
p2 400
p2 200
p2 200
p3 400
p4 200
p4 300
p5 100
p5 400
p6 100
Note: The result of a select in which grouping is specified but no columns are explicitly listed is not a keyed table of the proper form for ungroup. You will receive an error if you apply ungroup to the result of such a query.
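A quick sanity check of the round trip compares counts and columns. This sketch again uses a plain rebuild of sp (an assumption; the script version enumerates s and p) and confirms that ungroup applied to an xgroup result restores every record, with p moved to the front.

```q
sp:([]s:`s1`s1`s1`s1`s4`s1`s2`s2`s3`s4`s4`s1;p:`p1`p2`p3`p4`p5`p6`p1`p2`p2`p2`p4`p5;qty:300 200 400 200 100 100 300 400 200 200 300 400)
g:`p xgroup sp
u:ungroup g
/ no records are lost or created by the round trip
count[u]=count sp
/ the grouping column is moved to the front
cols u
/ quantities are preserved; only their order changes
(asc u`qty)~asc sp`qty
```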
Sorting
Recall that tables and keyed tables are lists of records and therefore have an inherent order. A table or keyed table can be sorted by the values of any column(s).
We use the following table definition in this section.
t:([]c1:20 10 30 20;c2:`z`y`x`a)
t
c1 c2
-----
20 z
10 y
30 x
20 a
xasc
The dyadic xasc takes a scalar or list of symbols containing column names as its left argument and a table as its right argument. It returns the records of the table sorted in ascending order of the items in the specified column(s). The order of the column names indicates the sort order, from major to minor.
`c1 xasc t
c1 c2
-----
10 y
20 z
20 a
30 x
`c2 xasc t
c1 c2
-----
20 a
30 x
10 y
20 z
`c1`c2 xasc t
c1 c2
-----
10 y
20 a
20 z
30 x
Important: In order to modify the contents of a table you must refer to the table by name.
After execution of the expressions above, we still find,
t
c1 c2
-----
20 z
10 y
30 x
20 a
However, t can be sorted in place with,
`c1`c2 xasc `t
`t
t
c1 c2
-----
10 y
20 a
20 z
30 x
xdesc
The dyadic xdesc behaves exactly as xasc, except that the sort is performed in descending order.
Returning to the original definition of t,
t
c1 c2
-----
20 z
10 y
30 x
20 a
`c1`c2 xdesc t
c1 c2
-----
30 x
20 z
20 a
10 y
Renaming and Rearranging Columns
Since a table is the flip of a column dictionary, its columns are named and ordered by the list of symbols in the dictionary domain. It is sometimes necessary to rename or reorder the columns. This is accomplished using the dyadic functions xcol and xcols.
We use the following table definition in this section.
t:([]c1:20 10 30 20;c2:`z`y`x`a;c3:101.1 202.2 303.3 404.4)
t
c1 c2 c3
-----------
20 z  101.1
10 y  202.2
30 x  303.3
20 a  404.4
xcol
The dyadic xcol takes a scalar or list of symbols containing column names as its left argument (names) and a table (source) as its right argument. The count of names must be less than or equal to the number of columns in source. The result is a table obtained from source by renaming the columns, in order, using the symbols in names.
For example,
`id`name`val xcol t
id name val
-------------
20 z    101.1
10 y    202.2
30 x    303.3
20 a    404.4
Important: The function xcol does not modify its table operand.
After execution of the expressions above, we still find,
t
c1 c2 c3
-----------
20 z  101.1
10 y  202.2
30 x  303.3
20 a  404.4
However, the columns of t can effectively be renamed with,
t:`id`name`val xcol t
t
id name val
-------------
20 z    101.1
10 y    202.2
30 x    303.3
20 a    404.4
If the count of names is less than the number of columns in source, the remaining columns are unaffected. Returning to the original definition of t,
`id`name xcol t
id name c3
-------------
20 z    101.1
10 y    202.2
30 x    303.3
20 a    404.4
xcols
The dyadic xcols takes a scalar or list of symbols containing column names as its left argument (names) and a table (source) as its right argument. The count of names must be less than or equal to the number of columns in source. It returns a table obtained from source by reordering the columns according to the symbols in names.
Note: The source operand cannot be a keyed table.
For example,
`c3`c2`c1 xcols t
c3    c2 c1
-----------
101.1 z  20
202.2 y  10
303.3 x  30
404.4 a  20
Important: The function xcols does not modify its source.
After execution of the expressions above, we still find,
t
c1 c2 c3
-----------
20 z  101.1
10 y  202.2
30 x  303.3
20 a  404.4
However, the columns of t can effectively be reordered with,
t: `c3`c2`c1 xcols t
t
c3    c2 c1
-----------
101.1 z  20
202.2 y  10
303.3 x  30
404.4 a  20
If the count of names is less than the number of columns in source, the specified columns are moved to the beginning of the column list in the given order and the remaining columns follow unchanged. Returning to the original definition of t,
`c3`c1 xcols t
c3    c1 c2
-----------
101.1 20 z
202.2 10 y
303.3 30 x
404.4 20 a
Joins
It is common in SQL to reassemble normalized data by joining a table having a foreign key (source) to its primary key table along common key values. This situation occurs when the tables have a master-detail relation, or when the values of a field have been factored into a lookup table. Such an inner join with equals in the join criterion is called an equal join or an equijoin. In an equijoin, the join can be specified in either order, and there will be exactly one record in the result for each record in the source.
An inner join combines two tables having compatible columns by selecting a subset of the Cartesian product along matching column values. In a left inner join, each row from the first table (source) is paired with any matching rows from the second table. In a right inner join, each row from the second table (source) is paired with any matching rows from the first table. The match columns do not need to be key columns. In an inner join, there may be no rows or multiple rows in the result for each row in the source.
SQL also has outer joins, in which each element of one table (source) is paired with all matching elements of the other table. The match columns do not need to be key columns. In an outer join, there is at least one row in the result for each row in the source.
Equijoin on Foreign Key
Given a primary key table m, foreign key table d and common key column k, an equijoin can be expressed in various SQL notations, among them,
m,d WHERE m.k = d.k
m INNER JOIN d ON m.k = d.k
A SELECT statement for this join refers to columns in the join by using dot notation based on the constituent tables.
SELECT d.cold,m.colm FROM m,d WHERE m.k = d.k
As we saw in Foreign Keys and Relations, a join along a foreign key is accomplished with an enumeration in q. The join is implicit in the following select on the detail table.
select cold, k.colm from d
This generalizes to the situation where d has multiple foreign keys. Say d has foreign keys k1, k2, ..., kn referring to primary key tables m1, m2, ..., mn. Columns from the n-way join of d to the primary key tables are accessed via a select of the form,
select cold, k1.colm1, k2.colm2,... , kn.colmn from d
For example, in the sp.q distribution script,
select sname:s.name,pname:p.name,qty from sp
sname pname qty
---------------
smith nut   300
smith bolt  200
smith screw 400
smith screw 200
clark cam   100
smith cog   100
jones nut   300
jones bolt  400
blake bolt  200
clark bolt  200
clark screw 300
smith cam   400
Multi-way equijoins also arise when m and d are as above and additionally d has a primary key l. If s is a table with a foreign key whose enumeration domain is l, then m, d and s can be joined. In SQL,
SELECT m.colm, d.cold, s.cols
FROM m,d,s WHERE m.k=d.k AND d.l=s.l
In q this is
select l.k.colm,l.cold,cols from s
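To make the two-level form concrete, the sketch below defines three small hypothetical tables: m keyed on k, d keyed on l with a foreign key k into m, and s with a foreign key l into d. Because cols is the name of a built-in q function, the stand-in column is called cs instead.

```q
/ hypothetical master table, keyed on k
m:([k:`a`b] colm:10 20)
/ hypothetical detail table keyed on l, with a foreign key k into m
d:([l:1 2 3] k:`m$`a`b`a; cold:`x`y`z)
/ s has a foreign key l into d; cs stands in for cols (a reserved q function name)
s:([] l:`d$1 3 2; cs:100 200 300)
/ l.k.colm follows two foreign keys: s -> d -> m
select l.k.colm, l.cold, cs from s
```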
Pseudo Join
It is possible to look up a table's values in a keyed table even if there is no foreign key relationship defined. One method to achieve this is to perform a dictionary lookup in select. There is no requirement for column names to match and the result will be a left outer join.
In the following example, observe that we must transform the column to be looked up into the proper shape.
kt:([k:101 102 103] v:`a`b`c)
t:([] c1:101 103 104)
select c1, v:kt[flip enlist c1;`v] from t
c1  v
-----
101 a
103 c
104
Here is an example with compound keys.
t:([]c1:`a`b`c; c2:`x`x`z)
ktc:([k1:`a`b`a; k2:`y`x`x] v:`one`two`three)
select c1, c2, v:txf[ktc;(c1;c2);`v] from t
c1 c2 v
-----------
a  x  three
b  x  two
c  z
Ad hoc Left Join
You can also create a left outer join using the dyadic lj. The right operand is a keyed table (lookup) and the left operand is a table (source) having column(s) that match the key column(s) in lookup. In particular, source can have a foreign key defined over lookup. The ad hoc join lj uses lookup to map the records of the appropriate source column(s) and upserts source with the value column(s) from lookup.
In our example,
tdetails lj tk
eid  sc  name       iq
-----------------------
1003 126 Prefect    126
1002 36  Beeblebrox 42
1001 92  Dent       98
1002 39  Beeblebrox 42
1001 98  Dent       98
1002 42  Beeblebrox 42
The same result can be obtained with a foreign key join by listing all the columns,
select eid, sc, eid.name, eid.iq from tdetails
eid  sc  name       iq
-----------------------
1003 126 Prefect    126
1002 36  Beeblebrox 42
1001 92  Dent       98
1002 39  Beeblebrox 42
1001 98  Dent       98
1002 42  Beeblebrox 42
Note: An equijoin along a foreign key is approximately 2.5 times faster than an ad hoc left join.
In contrast to the equijoin, an ad hoc left join does not require a column in the source table to be defined explicitly as a foreign key into the lookup keyed table.
td:([] eid:1003 1001 1002 1001 1002; sc:126 36 92 39 98)
td lj tk
eid  sc  name       iq
-----------------------
1003 126 Prefect    126
1001 36  Dent       98
1002 92  Beeblebrox 42
1001 39  Dent       98
1002 98  Beeblebrox 42
Note: If the column(s) for the join are not foreign key(s) into the keyed table, the name(s) must match the key name(s).
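When the source column has a different name, one workaround is to rename it with xcol just before the join. The sketch below uses a hypothetical detail table ts2 whose id column plays the role of eid; it is one option among several, not a required idiom.

```q
tk:([eid:1001 1002 1003] name:`Dent`Beeblebrox`Prefect; iq:98 42 126)
/ hypothetical detail table whose id column is not named eid
ts2:([] id:1003 1001; sc:126 36)
/ rename id to eid so that it matches the key name of tk, then join
r:(`eid`sc xcol ts2) lj tk
r
```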
Let's examine the general result of lj closely. Say t is the source table and kt is the lookup keyed table. For each record in t, the result has exactly one record, since the keys of kt are unique. If there is no record in kt whose key matches the values in the corresponding column(s) of t, the t columns are present in the result and the remaining columns are null. If there is a matching record in kt, the result record is the catenation of the t record with the value columns of the matching kt record.
kt:([k:1 2 3] b:100 200 300)
kt
k| b
-| ---
1| 100
2| 200
3| 300
t:([]k:1 1 2 2 3 4; a:10 11 20 21 30 40)
t
k a
----
1 10
1 11
2 20
2 21
3 30
4 40
t lj kt
k a  b
--------
1 10 100
1 11 100
2 20 200
2 21 200
3 30 300
4 40
Advanced: The behavior of lj differs from that of a SQL outer join when there are duplicate columns in the two tables. The SQL left outer join will display both columns, whereas lj upserts the appropriate column items of the source table with those of the lookup keyed table.
t2:([]k:1 2 3;b:10 20 30)
t2
k b
----
1 10
2 20
3 30
kt2:([k:1 2 3 4]b:100 200 300 400)
kt2
k| b
-| ---
1| 100
2| 200
3| 300
4| 400
t2 lj kt2
k b
-----
1 100
2 200
3 300
Plus Join
The plus join pj is a type of left join that is useful for adding matching values in tables containing numeric data. As with an ad hoc join, the right operand of plus join is a keyed table (lookup) and the left operand is a table (source) having column(s) that match the key column(s) in lookup. The plus join pj uses lookup to map the records of the appropriate source column(s), zero filling nulls in the result from the lookup value column(s). It then performs a table add of this interim result into the source table.
For example,
kt:([k1:1 2; k2:`x`y] a:10 20; b:1.1 2.2)
t:([]k1:1 2 3; k2:`x`y`z; a:100 200 300)
t pj kt
k1 k2 a   b
-------------
1  x  110 1.1
2  y  220 2.2
3  z  300 0
We examine the result of pj more closely. Each record of t has a corresponding record in the result.
Along the matching rows, the value columns from lookup ktare added to those of sourcet. In our example, this means thatcolumnsa and b are added intot on matching rows.Since a exists in both tables, corresponding values are added.According to the rules of table arithmetic, sinceb does not exist int,it is implicitly assumed to have 0 values int for the addition.
For non-matching rows, the values of the source t are extended with 0 in the columns of lookup.
Advanced: Note that the result in our example can also be obtained by the expression,
t+0^kt[`k1`k2#t]
k1 k2 a   b
-------------
1  x  110 1.1
2  y  220 2.2
3  z  300 0
Union Join
The union join uj combines any two tables. In the result, the rows and columns of the left operand appear before those of the right operand. Column value lists are joined for common columns. For non-common columns, the value lists are extended with nulls so that they are the same length. The column value lists of the left operand have nulls appended, whereas those of the right operand have nulls prepended.
t1:([]c1:1 2 3;c2:101 102 103;c3:`x`y`z)
t2:([]c2:103 104 105 106;c4:`a`b`c`d)
t1
c1 c2  c3
---------
1  101 x
2  102 y
3  103 z
t2
c2  c4
------
103 a
104 b
105 c
106 d
t1 uj t2
c1 c2  c3 c4
------------
1  101 x
2  102 y
3  103 z
   103    a
   104    b
   105    c
   106    d
Asof Join
The asof join is so named because it is often used to join tables along time columns, but this is not a restriction. In general, the triadic function aj can be used to join two tables along common columns. Significantly, there is no requirement for any of the join columns to be keys. The syntax of asof join is,
aj[c1...cn;t1;t2]
where c1...cn is a symbol list of common column names for the join and t1 and t2 are the tables to be joined. The last column, cn, is the as-of column (typically a time): it is matched against the most recent value in t2 that is less than or equal to the value in t1, while the remaining columns must match exactly. The result is a table containing records from the left outer join of t1 and t2 along the specified columns.
For each record in t1, the result has one record containing all the items in t1. If no record in t2 matches the t1 record in this sense, the remaining columns of the result record are null. If there are matching records in t2, the items of the last (in row order) matching record are appended to those of the t1 record in the result.
For example,
t:([]ti:10:01:01 10:01:03 10:01:04;sym:`msft`ibm`ge;qty:100 200 150)
t
ti       sym  qty
-----------------
10:01:01 msft 100
10:01:03 ibm  200
10:01:04 ge   150
q:([]ti:10:01:00 10:01:01 10:01:01 10:01:03;sym:`ibm`msft`msft`ibm; px:100 99 101 98)
q
ti       sym  px
-----------------
10:01:00 ibm  100
10:01:01 msft 99
10:01:01 msft 101
10:01:03 ibm  98
aj[`sym`ti;t;q]
ti       sym  qty px
---------------------
10:01:01 msft 100 101
10:01:03 ibm  200 98
10:01:04 ge   150
Parameterized Queries
Relational databases have the concept of stored procedures, which are programs that operate on tables via SQL statements. The programming languages that extend SQL are not part of the SQL standard, differ across vendors, and the capabilities of their programming environments vary greatly.
This situation forces a programmer to make a difficult choice: pay a steep price in programming power to place functionality close to the data, or extract the data into an application server in order to perform calculations. Multi-tier architectures with separate database and application servers have evolved largely to address this problem, but they increase cost and complexity.
This choice is obviated in kdb+, since the q programming environment has all the power and performance you need. In fact, q is much faster than traditional database programming environments for retrieval and calculations on large time series. Other components of the application can perform their data retrieval and manipulation by making calls to q.
Traditional calls to a database are made via stored procedures, which are programs executed by the database manager. Often the stored procedure has parameters that supply specific values to the queries. Such parameters are limited to the basic data types of SQL.
Any q program can serve as a stored procedure; there is no distinction between data retrieval and calculations. Any valid q expression that operates on tables or dictionaries can be invoked in a function. Function parameters can be used to supply specific values for queries. In particular, the select, update and delete templates can be invoked within a function by using parameters to pass specific values to the query. Such a function is called a parameterized query.
Important: Parameterized queries have restrictions. First, a parameterized query cannot use implicit function parameters. Second, columns cannot be passed as parameters.
In the following example using our tdetails table, we pass a specific value for a foreign key match criterion.
getScByEid:{[e] select from tdetails where eid=e}
getScByEid 1003
eid  sc
--------
1003 126
This example can be generalized to handle a scalar or list argument.
getScByEid:{[e] select from tdetails where eid in ((),e)}
getScByEid 1001
eid  sc
-------
1001 92
1001 98
getScByEid 1001 1003
eid  sc
--------
1003 126
1001 92
1001 98
The expression ((),e) in the revised function definition warrants closer examination. The empty-list join turns a scalar argument into a list and has no effect on a list. It must be enclosed in parentheses because it appears in a phrase in select; otherwise the comma would be interpreted as a separator.
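A quick console check of the empty-list join follows. It uses a hypothetical global e standing in for the function parameter.

```q
e:1001
count (),e        / 1: the atom has become a one-item list
(),1001 1003      / 1001 1003: a list passes through unchanged
```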
You can pass a table as a parameter to a stored procedure. Suppose we have multiple trade tables, all having at least the columns px (price) and date in common. The following parameterized query returns the maximum price over a specified date range from any trade table.
maxpx:{[t;range] select max px from t where date within range}
Here t is a trade table and range is a list of two dates in increasing order.
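Here is a minimal usage sketch. It assumes a hypothetical miniature trade table tt that has only the two columns maxpx relies on.

```q
/ hypothetical miniature trade table with just the date and px columns
tt:([]date:2007.01.01 2007.01.02 2007.01.20;px:10 30 20f)
maxpx:{[t;range] select max px from t where date within range}
/ only the first two rows fall inside the range, so the one-row result holds px 30
maxpx[tt;2007.01.01 2007.01.15]
```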
Advanced: You can effectively parameterize column names in two ways. First, you can mimic a common technique from SQL in which the query is created dynamically: build the query text in a string and then pass the string to value for execution. There is a performance penalty for this approach. Also, you must remember to escape special characters in the string.
The second method is to use the functional form of the query, which has no performance penalty. In the functional form, all columns are referred to by name, so column names are passed as symbols.
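Both approaches can be sketched on a hypothetical table tt. The helper names maxcol and maxcolstr are illustrative, not part of q. The functional version takes the column as a symbol. The string version takes the global name of the table as a symbol and relies on value to parse the text.

```q
tt:([]px:10 30 20f;qty:5 7 6)
/ functional form: c is a column name given as a symbol; 0b means no grouping
maxcol:{[t;c] ?[t;();0b;(enlist c)!enlist (max;c)]}
maxcol[tt;`px]     / same result as: select max px from tt
maxcol[tt;`qty]    / same result as: select max qty from tt
/ dynamic string: tn is the name of a global table, c a column name
maxcolstr:{[tn;c] value "select max ",string[c]," from ",string tn}
maxcolstr[`tt;`px]
```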
Views
In SQL, a view is essentially a stored procedure whose result set is used like a table. Views are used to encapsulate such data transformations as hiding data columns or rows, renaming columns, or simplifying complex queries. Q-sql implements a view as an alias to a query.
View
A view is a named query created as an alias with the double assignment (::) operator. In the following, the double colon signifies that v is an alias for the query rather than the current result of the query.
t:([] c1:`a`b`c; c2:1 2 3)
v::select c1 from t where c2=2
v
c1
--
b
When the content of the underlying table changes, the result will be reflected in the view. This is not true of the equivalent single assignment.
r:select c1 from t where c2=2
`t insert (`d;2)
,3
t
c1 c2
-----
a  1
b  2
c  3
d  2
r
c1
--
b
v
c1
--
b
d
Functional Forms
The functional forms of select, update and delete can be used in any situation but are especially useful for programmatically generated queries, such as when column names are dynamically produced. The functional forms are,
?[t;c;b;a]    / select
![t;c;b;a]    / update and delete
where t is a table, a is a dictionary of aggregates, b is a dictionary of groupbys and c is a list of constraints.
Note: All q entities in a, b and c must be referenced by name, meaning they appear as symbols containing the entity names.
The q interpreter parses the syntactic forms of select, exec, update and delete into their equivalent functional forms, so there is no performance difference.
Advanced: The function parse can be applied to a string containing a query template to produce a parse tree whose items are close to the arguments of the equivalent functional form. See the description of parse in Appendix A for more details.
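For instance, parsing a simple select yields a five-item list that lines up with the ?[t;c;b;a] form, and eval runs the tree. The minimal sketch below redefines a small table t so the snippet stands alone.

```q
t:([]n:`x`y`x`z`z`y;p:0 15 12 20 25 14)
pt:parse "select from t where p>0"
count pt       / 5: the ? function followed by the t, c, b and a arguments
eval pt        / evaluates the tree; same result as the select itself
```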
Functional select
Let's start with a simple select example.
t:([]n:`x`y`x`z`z`y;p:0 15 12 20 25 14)
t
n p
----
x 0
y 15
x 12
z 20
z 25
y 14
select m:max p,s:sum p by name:n from t where p>0,n in `x`y
name| m  s
----| -----
x   | 12 12
y   | 15 29
Following is the equivalent functional form. Note the use of enlist to create singletons, ensuring that appropriate entities are lists.
c: ((>;`p;0);(in;`n;enlist `x`y))
b: (enlist `name)!enlist `n
a: `m`s!((max;`p);(sum;`p))
?[t;c;b;a]
name| m  s
----| -----
x   | 12 12
y   | 15 29
Of course, the functional form can be written without the intermediate variables a, b and c. We leave this as an exercise to the macho coder.
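For reference, here is one way to inline the arguments, shown as a sketch against the same table t (redefined so the snippet stands alone).

```q
t:([]n:`x`y`x`z`z`y;p:0 15 12 20 25 14)
/ constraints, grouping and aggregates written in place of c, b and a
?[t;((>;`p;0);(in;`n;enlist `x`y));(enlist `name)!enlist `n;`m`s!((max;`p);(sum;`p))]
```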
The general form of functional select is,
?[t;c;b;a]
where t is a table, c is a list of where specifications (constraints), b is a dictionary of grouping specifications (by phrase), and a is a dictionary of select specifications (aggregations).
Every item in c is a triple consisting of a boolean or int valued dyadic function together with its arguments, each an expression containing column names and other variables. The function is applied to the two arguments, producing a boolean vector. The resulting boolean vector selects the rows that yield non-zero results. The selection is performed in the order of the items in c, from left to right.
The domain of b is a list of symbols that are the key names for the grouping. The range of b is a list of column expressions whose results are used to construct the groups. The grouping is ordered by the domain elements, from major to minor.
The domain of a is a list of symbols containing the names of the produced columns. Each element of the range of a is an evaluation list consisting of a function and its argument(s), each of which is a column name or another such evaluation list. For each evaluation list, the function is applied to the specified value(s) for each row and the result is returned. The evaluation lists are resolved recursively when operations are nested.
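As a sketch of nesting, the aggregate below is an evaluation list whose arguments are themselves evaluation lists. It computes the range (max p)-min p per group of the same t. The column name rng is illustrative.

```q
t:([]n:`x`y`x`z`z`y;p:0 15 12 20 25 14)
/ (-;(max;`p);(min;`p)) evaluates (max p)-min p within each group of n
?[t;();(enlist `n)!enlist `n;(enlist `rng)!enlist (-;(max;`p);(min;`p))]
/ equivalent template: select rng:(max p)-min p by n from t
/ rng is 12 for x, 1 for y and 5 for z
```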
Note: Here are the degenerate cases:
- For no constraints, make c the empty (general) list.
- For no grouping, make b the boolean 0b.
- To produce all columns of the original table in the result, make a the empty list.
For example,
select from t      / is equivalent to functional form
?[t;();0b;()]      / degenerate case for c, b, a
Functional exec
The functional form of exec is a simplified form of select. Since the constraint parameter is the same as in select, we omit it in the following.
In the simplest example of a single result column, the groupby parameter is the empty list and the aggregate parameter is a symbol atom.
exec n from t
`x`y`x`z`z`y
?[t;();();`n]    / same as previous exec
`x`y`x`z`z`y
In the same query with multiple columns, the groupby parameter is the empty list and the aggregate parameter is a dictionary as it would be in a select. Remember that the result is a dictionary rather than a table.
exec n,p from t
n| x y x z z y
p| 0 15 12 20 25 14
?[t;();();`n`p!`n`p]    / same as previous exec
n| x y x z z y
p| 0 15 12 20 25 14
If you wish to group by a single column, specify it as a symbol atom.
exec p by n from t
x| 0 12
y| 15 14
z| 20 25
?[t;();`n;`p]    / same as previous exec
x| 0 12
y| 15 14
z| 20 25
More complex examples of exec seem to reduce to the equivalent select.
Functional update
The functional form of update is completely analogous to that of select. Again note the use of enlist to create singletons to ensure that appropriate entities are lists.
update p:max p by n from t where p>0
n p
----
x 0
y 15
x 12
z 25
z 25
y 15
c: enlist (>;`p;0)
b: (enlist `n)!enlist `n
a: (enlist `p)!enlist (max;`p)
![t;c;b;a]
n p
----
x 0
y 15
x 12
z 25
z 25
y 15
Note: The degenerate cases are the same as in functional select.
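The degenerate cases combine naturally with update. The sketch below adds a column to every row, with no constraints and no grouping. The column name dp is illustrative.

```q
t:([]n:`x`y`x`z`z`y;p:0 15 12 20 25 14)
/ c is () (no constraints), b is 0b (no grouping); adds dp holding 2*p
![t;();0b;(enlist `dp)!enlist (*;2;`p)]
/ equivalent template: update dp:2*p from t
```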
Functional delete
The functional form of delete is a simplified form of functional update,
![t;c;0b;a]
where t is a table, c is a list of where specifications (constraints) and a is a list of column names. Either c or a, but not both, must be present. The list of constraints, which has the same format as in functional select and update, chooses which rows will be removed. The aggregates argument is a simple list of symbols with the names of columns to be removed.
In the following examples, note the use of enlist to create singletons to ensure that appropriate entities are lists.
t:([]c1:`a`b`c;c2:`x`y`z)
/ following is: delete c2 from t
![t;();0b;enlist `c2]
c1
--
a
b
c
/ following is: delete from t where c2 = `y
![t;enlist (=;`c2; enlist `y);0b;`symbol$()]
c1 c2
-----
a  x
c  z
Examples
In this section we demonstrate many of the capabilities of q-sql using semi-serious examples taken from the world of finance. We create a sample table representing a month's worth of trades for a small set of American stocks. To make things easy, we treat all trades as buys.
The Table Schemas
Our vastly over-simplified trading example involves two tables. The instrument table is a reference keyed table that contains basic information about the companies whose financial instruments (stocks in our case) are traded. Its schema has fields for the stock symbol, the name of the company and the industry classification of the company.
instrument:([sym:`symbol$()] name:`symbol$(); industry:`symbol$())
instrument
sym| name industry
---| -------------
The trade table represents a collection of trades. Each trade record comprises: the symbol of the instrument; the date and time of the trade; the quantity, i.e. the number of shares traded; and the price of the trade.
trade:([] sym:`instrument$(); date:`date$(); time:`time$(); quant:`int$(); px:`float$())
trade
sym date time quant px
----------------------
Note: In practice, the trade table would likely be partitioned by day on disk and only the current day's trades would be stored in memory.
Creating the Tables
Populating the instrument reference table is done via simple inserts.
`instrument insert (`ibm; `$"International Business Machines"; `$"Computer Services")
`instrument insert (`msft; `$"Microsoft"; `$"Software")
`instrument insert (`g; `$"Google"; `$"Internet")
`instrument insert (`intc; `$"Intel"; `$"Semiconductors")
`instrument insert (`gm; `$"General Motors"; `$"Automobiles")
`instrument insert (`ge; `$"General Electric"; `$"Diversified Industries")
Here is the console display of instrument,
instrument
sym | name                            industry
----| ------------------------------------------------------
ibm | International Business Machines Computer Services
msft| Microsoft                       Software
g   | Google                          Internet
intc| Intel                           Semiconductors
gm  | General Motors                  Automobiles
ge  | General Electric                Diversified Industries
In order to populate the trade table with somewhat realistic data, we create an auxiliary function. The filltrade function takes the name of the target trade table, a stock symbol, a median price and a count. It populates the named table with simulated trade data for the month of Jan 2007. The trades are randomly distributed across days and times. The quantities occur in multiples of 10. The prices are uniformly distributed around the median price. We do not claim that this represents realistic trade data; only that it is sufficient to serve our query examples.
filltrade:{[tname;s;p;n]
  / tname is name of target table
  / s is stock symbol
  / p is median price
  / n is count of items
  / sym column duplicates stock symbol n times
  sc:n#s;
  / date column has n random days in Jan 2007
  dc:2007.01.01+n?31;
  / time column has n random times
  tc:n?24:00:00.000;
  / quantity column has n random multiples of 10
  qc:10*n?1000;
  / price column has n random prices that are
  / distributed uniformly around p;
  / p*:100 converts p to pennies, so floor rounds down to whole pennies
  pc:.01*floor (.9*p)+n?.2*p*:100;
  / bulk insert columns into target table
  tname insert (sc;dc;tc;qc;pc)}
filltrade[`trade;`ibm;115;10000]
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 ..
trade
sym date       time         quant px
----------------------------------------
ibm 2007.01.15 02:32:54.217 9280  111.59
ibm 2007.01.20 08:56:05.985 9960  110.69
ibm 2007.01.24 19:20:17.727 5970  114.58
ibm 2007.01.21 08:44:50.939 1090  113.32
..
We invoke filltrade on each of the remaining instruments.
filltrade[`trade;`msft;30;5000]
10000 10001 10002 10003 10004 10005 10006 10007 10008 10009 10010..
filltrade[`trade;`g;540;12000]
15000 15001 15002 15003 15004 15005 15006 15007 15008 15009 15010..
filltrade[`trade;`intc;25;4000]
27000 27001 27002 27003 27004 27005 27006 27007 27008 27009 27010..
filltrade[`trade;`ge;40;9000]
31000 31001 31002 31003 31004 31005 31006 31007 31008 31009 31010..
filltrade[`trade;`gm;35;3000]
40000 40001 40002 40003 40004 40005 40006 40007 40008 40009 40010..
Finally, we sort trade by date and time so that it represents trades as they came in.
`date`time xasc `trade
`trade
trade
sym  date       time         quant px
-----------------------------------------
intc 2007.01.01 00:00:04.569 5440  26.63
ge   2007.01.01 00:02:24.871 8280  40.11
gm   2007.01.01 00:02:43.419 4280  32.13
ibm  2007.01.01 00:03:06.278 5070  105.73
intc 2007.01.01 00:03:24.229 1740  24.47
gm   2007.01.01 00:04:17.590 830   36.53
gm   2007.01.01 00:04:18.227 5060  33.02
ge   2007.01.01 00:04:18.772 8290  43.73
msft 2007.01.01 00:06:01.424 5170  27.71
..
Basic Queries
In this section, we demonstrate the use of basic q-sql to query the trade and instrument tables we have created.
We can count the total number of trades in several ways.
count trade
43000
select count i from trade
x
-----
43000
exec count i from trade
43000
We can count the number of trades for an individual symbol.
exec count i from trade where sym=`ibm
10000
count select from trade where sym=`ibm
10000
Observe that the former returns only a single value from the query, whereas the latter retrieves all matching records and then counts them.
We can count the number of trades across all symbols.
select count i by sym from trade
sym | x
----| -----
g   | 12000
ge  | 9000
gm  | 3000
ibm | 10000
intc| 4000
msft| 5000
() xkey select count i by sym from trade
sym  x
----------
g    12000
ge   9000
gm   3000
ibm  10000
intc 4000
msft 5000
Observe that the former retrieves the results as a keyed table and the latter removes the key.
We find one day's trades for GM.
select from trade where sym=`gm, date=2007.01.07
sym date       time         quant px
---------------------------------------
gm  2007.01.07 00:29:31.311 4390  32.24
gm  2007.01.07 00:29:57.886 1270  38.08
gm  2007.01.07 00:30:35.671 3370  35.67
gm  2007.01.07 00:30:43.216 8090  36.77
gm  2007.01.07 00:44:26.336 1800  35.03
..
We find all lunch hour trades for GM.
select from trade where sym=`gm, time within (12:00:00;13:00:00)
sym date       time         quant px
---------------------------------------
gm  2007.01.01 12:01:32.133 7960  33.61
gm  2007.01.01 12:37:45.021 8480  31.84
gm  2007.01.01 12:39:46.197 5350  32.34
gm  2007.01.01 12:57:13.215 1090  33.34
gm  2007.01.02 12:53:06.764 1080  31.63
..
We find the maximum daily price for GE. Due to our simplistic construction, it is statistically constant.
select maxpx:max px by date from trade where sym=`ge
date      | maxpx
----------| -----
2007.01.01| 43.97
2007.01.02| 43.99
2007.01.03| 43.99
2007.01.04| 43.98
..
We find the minimum and maximum trade price over the time span for each symbol and display the result by company name. Grouping by sym.name resolves the foreign key to the instrument table with an implicit inner join.
select lo:min px, hi:max px by sym.name from trade
name                           | lo    hi
-------------------------------| ------------
General Electric               | 36    43.99
General Motors                 | 31.5  38.49
Google                         | 486   593.99
Intel                          | 22.5  27.49
International Business Machines| 103.5 126.49
Microsoft                      | 27    32.99
We find the total and average trade volume for three symbols. Due to our simplistic construction, the latter are statistically the same.
select totq:sum quant, avgq:avg quant by sym from trade where sym in `ibm`msft`g
sym | totq     avgq
----| -----------------
g   | 59748830 4979.069
ibm | 49983940 4998.394
msft| 24988910 4997.782
We find the daily volume weighted average price for Intel.
select vwap:quant wavg px by date from trade where sym=`intc
date      | vwap
----------| --------
2007.01.01| 24.86849
2007.01.02| 25.00113
2007.01.03| 24.82538
2007.01.04| 24.98049
2007.01.05| 25.27898
..
We find the high, low and close over one minute intervals for Intel.
select hi:max px,lo:min px,close:last px by date, time.minute from trade where sym=`intc
date       minute| hi    lo    close
-----------------| -----------------
2007.01.01 00:12 | 23.3  23.3  23.3
2007.01.01 00:17 | 24.03 24.03 24.03
2007.01.01 00:26 | 24.45 24.45 24.45
2007.01.01 00:51 | 25.73 25.73 25.73
2007.01.01 00:55 | 25.34 25.34 25.34
..
We demonstrate how to use your own functions in queries. Suppose we define a funky average that weights items by their position.
favg:{(sum x*1+til count x)%(count x)*count x}
Then we can apply this just as we did the built-in q function avg.
select favgpx:favg px by sym from trade
sym | favgpx
----| --------
g   | 270.0021
ge  | 19.99897
gm  | 17.51145
ibm | 57.53255
intc| 12.48081
msft| 15.00309
Meaty Queries
In this section, we demonstrate more interesting q-sql against the trade table.
We find the volume weighted average price over 5 minute intervals for Intel.
select vwap:quant wavg px by date, bucket:5 xbar time.minute from trade where sym=`intc
date       bucket| vwap
-----------------| --------
2007.01.01 00:10 | 23.3
2007.01.01 00:15 | 24.03
2007.01.01 00:25 | 24.45
2007.01.01 00:50 | 25.73
2007.01.01 00:55 | 25.34
..
We use favg from the previous section to demonstrate that user functions can appear in any phrase of the query.
select from trade where px<2*(favg;px) fby sym
sym  date       time         quant px
-----------------------------------------
gm   2007.01.01 00:06:02.168 5270  33.6
g    2007.01.01 00:07:36.023 9340  527.71
g    2007.01.01 00:09:46.313 3640  491.6
intc 2007.01.01 00:12:05.909 610   23.3
ibm  2007.01.01 00:12:17.056 6410  112.92
..
We find the average daily volume and price for all instruments and store the result for the next example.
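atrades:select avgqt:avg quant, avgpx:avg px by sym, date from trade
atrades
sym date      | avgqt    avgpx
--------------| -----------------
g   2007.01.01| 5098.892 542.3796
g   2007.01.02| 5021.136 538.6672
g   2007.01.03| 5114     539.1208
g   2007.01.04| 4712.385 541.5371
g   2007.01.05| 5202.108 539.6128
..
We find the days when the average price went up. Note that we must explicitly exclude the first day because deltas is funky on its first value (it returns that value itself rather than a difference). Also note that the where clause sees the whole avgpx column of atrades rather than one symbol at a time, so the first day of each symbol is compared with the last day of the preceding symbol; this is why 2007.01.01 shows up for ibm and msft in the results. Applying the test per symbol, as in where 0<({0,1_deltas x};avgpx) fby sym, should avoid this. Observe that the avgpx column scrolls off the page.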
select date, avgpx by sym from atrades where 0<{0,1_deltas x} avgpx
sym | date
----| -------------------------------------------...
g   | 2007.01.03 2007.01.04 2007.01.06 2007.01.08...
ge  | 2007.01.02 2007.01.04 2007.01.06 2007.01.08...
gm  | 2007.01.02 2007.01.04 2007.01.05 2007.01.07...
ibm | 2007.01.01 2007.01.03 2007.01.05 2007.01.08...
intc| 2007.01.04 2007.01.05 2007.01.08 2007.01.10...
msft| 2007.01.01 2007.01.02 2007.01.04 2007.01.07...
To see a more representative display, take only the first few field values.
select 2#date, 2#avgpx by sym from atrades where 0<{0,1_deltas x} avgpx
sym | date                  avgpx
----| ---------------------------------------
g   | 2007.01.03 2007.01.04 539.1208 541.5371
ge  | 2007.01.02 2007.01.04 39.98092 40.115
gm  | 2007.01.02 2007.01.04 35.13107 35.25371
ibm | 2007.01.01 2007.01.03 115.1667 115.1036
intc| 2007.01.04 2007.01.05 24.83024 25.18836
msft| 2007.01.01 2007.01.02 29.73195 30.03784
We can denormalize trade to obtain a keyed table with one row and complex columns for each symbol. We display the first two items of each field to make the structure more evident.
dntrades:select date,time,quant,px by sym from trade
select 2#date,2#time,2#quant,2#px by sym from trade
sym | date                  time                      quant     px
----| -----------------------------------------------------------------------
g   | 2007.01.01 2007.01.01 00:09:54.444 00:12:34.851 4670 3080 591.05 523.08
ge  | 2007.01.01 2007.01.01 00:02:24.871 00:04:18.772 8280 8290 40.11 43.73
gm  | 2007.01.01 2007.01.01 00:02:43.419 00:04:17.590 4280 830  32.13 36.53
ibm | 2007.01.01 2007.01.01 00:03:06.278 00:06:27.951 5070 9740 105.73 117.76
intc| 2007.01.01 2007.01.01 00:00:04.569 00:03:24.229 5440 1740 26.63 24.47
msft| 2007.01.01 2007.01.01 00:06:01.424 00:23:28.908 5170 1370 27.71 29.86
In such a complex table or keyed table, you must use each to apply a monadic (unary) function across the items in a field.
select sym,cnt:count each date, avgpx:avg each px from dntrades
/ or the following alternate notation is equivalent
select sym,cnt:each[count] date, avgpx: each[avg] px from dntrades
sym  cnt   avgpx
-------------------
g    12000 540.0778
ge   9000  39.99574
gm   3000  34.98716
ibm  10000 114.978
intc 4000  24.96621
msft 5000  29.98583
We can also apply our own monadic favg function with each.
select sym, favgpx:favg each px from dntrades
sym  favgpx
-------------
g    269.94
ge   19.98121
gm   17.48443
ibm  57.49667
intc 12.48413
msft 15.0314
We find the volume weighted average price by applying the dyadic wavg. In this case we must use the each-both adverb '. Observe that our simplistic construction makes the average price and volume weighted average price statistically the same.
select sym, vwap:quant wavg' px from dntrades
/ is equivalent to the alternate notation
select sym, vwap:wavg'[quant;px] from dntrades
sym  vwap
-------------
g    540.1832
ge   40.00807
gm   34.95398
ibm  114.9836
intc 24.97542
msft 29.96661
Note that the latter form generalizes to n-adic functions for any n>1.
We find the profit of the ideal transaction over the month for each symbol. This is the maximum amount of money that could be made with 20-20 hindsight. In other words, find the largest profit obtainable by buying at any traded price and selling at the highest subsequently traded price. To solve this, we reverse the perspective. For each traded price, we look at the minimum of the prices up to and including it, which mins computes as a running minimum, so each difference is at least zero. The largest such difference is our answer.
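Here is the same computation, step by step, on a tiny hypothetical price list px.

```q
px:5 3 8 1 6
mins px           / running minimum: 5 3 3 1 1
px-mins px        / gain from buying at the lowest price so far: 0 0 5 0 5
max px-mins px    / 5: buy at 3 and sell at 8 (or buy at 1 and sell at 6)
```

Grouping by sym applies the same computation to each symbol's prices in trade.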
select max px-mins px by sym from trade
sym | px
----| ------
g   | 107.99
ge  | 7.99
gm  | 6.99
ibm | 22.99
intc| 4.99
msft| 5.99
Remote Queries
In this section, we demonstrate how to execute q-sql queries against a remote server. We assume that our sample tables have been created in a q instance (the server) that is listening on some port, say 5042. We also assume that we have another q process (the client) with an open handle h to the server. See IO for details on how to connect to remote processes in q. The following expressions are all executed on the client.
We can ask the server to list its tables.
h "tables `."
`dntrades`instrument`trade
We can ask the server for the count of its trade table.
h "count trade"
43000
We look up a sym by name. Observe that the result is a vector.
h "exec sym from instrument where name=`Intel"
,`intc
We can also select by a name that contains a space. Observe the necessity of escaping the double quotes inside the dynamic q-sql string.
h "exec name from instrument where name=`$\"General Electric\""
,`General Electric
We can construct a query on the client and send it to the server along with parameters to be executed.
qdaily:{[s;d] select from trade where sym=s, date=d}
h (qdaily;`g;2007.01.12)
sym date       time         quant px
----------------------------------------
g   2007.01.12 00:03:24.082 3570  507.44
g   2007.01.12 00:05:31.920 2900  588.99
..
We can construct the same query on the server and execute it remotely.
h "qdaily:{[s;d] select from trade where sym=s, date=d}"
/ verify that it's there
h "qdaily"
{[s;d] select from trade where sym=s, date=d}
/ execute it
h "qdaily[`msft;2007.01.31]"
sym  date       time         quant px
----------------------------------------
msft 2007.01.31 00:00:41.237 9940  29.65
msft 2007.01.31 00:01:36.508 580   27.19
..
Contents
- 1 Execution Control
- 1.1 Overview
- 1.2 Control Flow
- 1.2.1 Basic Conditional Evaluation
- 1.2.2 Extended Conditional Evaluation
- 1.2.3 Vector Conditional Evaluation
- 1.2.4 if
- 1.2.5 do
- 1.2.6 while
- 1.2.7 Return and Signal
- 1.2.8 Protected Evaluation
- 1.3 Debugging
- 1.4 Scripts
- 1.4.1 Creating and Loading a Script
- 1.4.2 Special Notations
- 1.4.3 Passing Parameters
- 1.4.4 Example
10. Execution Control
Overview
Function evaluation provides sequential execution of a series of expressions. In this chapter, we demonstrate how to control execution in q.
Control Flow
In a vector-oriented language such as q, the clearest code and best performance are generally obtained by avoiding loops and individual tests. For those times when you simply must write iffy or loopy code, q has versions of the usual constructs.
Warning: The constructs in this section all involve branching in the byte code that is generated by the q interpreter. The offset of the branch destination is limited (currently to 255), which means that the sequence of q expressions that can be contained in any part of $, if, do, or while must be short. At some point, insertion of one additional statement will result in a branch error, which is q's way of rejecting bloated code. If you insist on writing iffy or loopy code (never a good idea in q), factor code blocks into separate functions.
Basic Conditional Evaluation
Languages of C heritage have a form of in-line 'if' called conditional evaluation that has the form,
exprcond ? exprtrue : exprfalse
where exprcond is an expression that evaluates to a boolean (or int in C and C++). The result of the expression is exprtrue when exprcond is true (or non-zero) and exprfalse otherwise.
The same effect can be achieved in q using basic conditional evaluation,
$[exprcond;exprtrue;exprfalse]
where exprcond is an expression that evaluates to a boolean or int. The result is exprtrue when exprcond is not zero and exprfalse if it is zero.
a:42
b:98
$[a>60;`Pass;`Fail]
`Fail
$[b>60;`Pass;`Fail]
`Pass
Observe that a test for zero in exprcond can be abbreviated to the value itself.
c:0
$[a;`Nonzero;`Zero]
`Nonzero
$[b;`Nonzero;`Zero]
`Nonzero
$[c;`Nonzero;`Zero]
`Zero
Note: A null is not accepted for exprcond.
d:0N
$[d;`NonNull;`Null]
'type
Extended Conditional Evaluation
In languages of C heritage, the if-else construct has the form,
if (exprcond){
statementtrue1;
.
.
.
}
else {
statementfalse1;
.
.
.
}
where exprcond is an expression that evaluates to a boolean (or int in C and C++). If the expression exprcond is true (or non-zero), the first sequence of statements in braces is executed; otherwise, the second sequence of statements in braces is executed.
A similar effect can be achieved in q using an extended form of conditional evaluation.
$[exprcond;[exprtrue1;...];[exprfalse1;...]]
where exprcond is an expression that evaluates to a boolean or int. When exprcond evaluates to non-zero, the first bracketed sequence of expressions is executed in left-to-right order; otherwise, the second bracketed sequence of expressions is executed.
a1:42
a2:24
$[a1<>42;[a:6;b:7;a*b];[a:`Life;b:`the;c:`Universe;a,b,c]]
`Life`the`Universe
$[a2<>42;[a:6;b:7;a*b];[a:`Life;b:`the;c:`Universe;a,b,c]]
42
Languages of C heritage have a cascading form of if-else in which multiple tests can be made,
if (exprcond1){
statementtrue11;
.
.
.
}
else if (exprcondn){
statementtruen1;
.
.
.
}
.
.
.
else {
statementfalse;
.
.
.
}
In this construction, the exprcond are evaluated consecutively until one is true (or non-zero), at which point the associated block of statements is executed and the statement is complete. If none of the expressions passes, the final block of statements, called the default case, is executed.
Note that any conditional other than the first is only evaluated if all those prior to it have evaluated to false. In addition, only one of the statement blocks will be executed.
A similar effect can be achieved in q with another extended form of conditional execution.
$[exprcond1;exprtrue1;... ;exprcondn;exprtruen;exprfalse]
In this form, the conditional expressions are evaluated consecutively until one is non-zero, at which point the associated exprtrue is evaluated and its result is returned. If none of the conditional expressions evaluates to non-zero, exprfalse is evaluated and its result is returned. Observe that exprfalse is distinguished as the last expression following a sequence of paired expressions.
Note: Any conditional other than the first is only evaluated if all those prior to it have evaluated to zero. Otherwise put, a conditional evaluating to non-zero short-circuits the evaluation of all those after it.
a:42
b:0
c:-42
$[a=0;`zero;a>0;`pos;`neg]
`pos
$[b=0;`zero;b>0;`pos;`neg]
`zero
$[c=0;`zero;c>0;`pos;`neg]
`neg
Finally, the previous extended form of conditional execution can be further extended by substituting a bracketed sequence of expressions for any exprtrue or exprfalse.
$[exprcond1;[exprtrue11;...];... ; exprcondn;[exprtruen1;...];[exprfalse1;...]]
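Here is a minimal sketch of the bracketed variant. sgn is a hypothetical sign function whose first and last branches are bracketed sequences; the local r exists only to give each block more than one expression.

```q
sgn:{[x] $[x<0;[r:-1;r];x=0;0;[r:1;r]]}
sgn each -5 0 7    / -1 0 1
```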
Vector Conditional Evaluation
Triadic vector-conditional evaluation ( ? ) has the form,
?[vb; exprtrue ; exprfalse]
where vb is a simple boolean list and exprtrue and exprfalse are atoms or vectors of the same type that conform to vb. The result conforms to vb, and contains exprtrue in positions where vb has 1b and exprfalse in positions where vb has 0b.
The following example replaces the odd-valued items of a list with 42.
L:(til 10) mod 3
L
0 1 2 0 1 2 0 1 2 0
?[0=L mod 2;L;42]
0 42 2 0 42 2 0 42 2 0
Note: All arguments of a vector-conditional are fully executed. In other words, there is no short circuiting of the evaluation.
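Because exprtrue or exprfalse may be an atom, a single value can stand in for a whole branch. Here is a short sketch with a hypothetical boolean list b.

```q
b:1010b
?[b;1 2 3 4;0]     / 1 0 3 0: the atom 0 fills the positions where b is 0b
?[b;0;1 2 3 4]     / 0 2 0 4: the atom 0 fills the positions where b is 1b
```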
if
The if statement conditionally evaluates a sequence of expressions. It has the form,
if[exprcond;expr1;... ;exprn]
where exprcond is evaluated and, if it is non-zero, the expressions expr1 through exprn are evaluated in left-to-right order. The if statement does not have an explicit result.
For example,
a:42
b:98
z:""
if[a=42;z:"Life the universe and everything"]
z
"Life the universe and everything"
if[b<>42;x:6;y:7;z:x*y]
z
42
do
The do statement is an iterator of the form,
do[exprcount; expr1;... ; exprn]
where exprcount must evaluate to an integer. The expressions expr1 through exprn are evaluated exprcount times in left-to-right order. The do statement does not have an explicit result.
For example, the following expression computes n factorial. It iterates n-1 times, decrementing the factor f on each pass.
n:5
do[-1+f:r:n;r*:f-:1]
r
120
while
The while statement is an iterator of the form,
while[exprcond;expr1;... ;exprn]
where exprcond is evaluated and the expressions expr1 through exprn are evaluated repeatedly in left-to-right order as long as exprcond is non-zero. The while statement does not have an explicit result.
Let's examine a nifty example taken from the Q Language Reference Manual. The following function returns a list in which each null item in the argument list x has been replaced with the item before it.
f:{r:x;r[i]:r[-1+i:where null r];r}
Now observe that the expression,
max null v
indicates whether there are any nulls in a list v (why?).
The following expression applies f iteratively until there are no nulls left in v.
while[max null v;v:f v]
Effectively, non-null values are propagated forward across nulls.
v:10 -3.1 0n 42 0n 0n 0n 3.4
while[max null v;v:f v]
v
10 -3.1 -3.1 42 42 42 42 3.4
Do you see the problem with this example? Hint: consider the case where v has one or more initial null items, and remember that Ctrl-C terminates execution of a long-running q expression. The while expression will iterate forever because there is no value to propagate across the initial item.
When you know v will be of a type having an underlying numeric value, one solution is to prepend a default initial value and remove it afterward. We use a type-matched zero,
v:0n -3.1 0n 42 0n 0n 0n 3.0
w:((type v)$0),v
while[max null w;w:f w]
1_w
0 -3.1 -3.1 42 42 42 42 3
Return and Signal
Normal function execution evaluates each expression in the function and terminates after the last one. There are two mechanisms for ending the execution early: one returns successfully and the other aborts.
To terminate a function's execution successfully and return a value, use an empty assignment, which is assign (:) with a value to its right and no variable to its left. For example, in the following contrived function, execution is terminated and the result is returned after the third expression. The final expression is never evaluated.
c:0
f:{a:6;b:7;:a*b;c::98}
f 0
42
c
0
To abort function execution immediately, use signal, which is single-quote (') with a value to its right. For example, in the following function, execution will be aborted in the third expression. The final expression that assigns c is never evaluated.
c:0
g:{a:6;b:7;'`TheEnd;c::98}
g 0
{a:6;b:7;'`TheEnd;c::98}
'TheEnd
c
0
Note: Unless a function issuing a signal is invoked with protected execution, the signal will cause the calling routine to fail.
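Empty assignment also works inside an if, which gives an early exit from a function. A minimal sketch, with safeDiv as a hypothetical name: it returns a float null rather than the infinity 0w that q's % produces for division by zero.

```q
/ safeDiv: hypothetical example; returns 0n instead of dividing by zero
safeDiv:{[a;b]
  if[b=0;:0n];       / empty assignment inside if returns 0n from safeDiv
  a%b}
safeDiv[6;3]         / 2f
safeDiv[6;0]         / 0n
```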
You can also use signal within an if statement to terminate execution. Compare the following,
a:42
if[a<50; '`Stop; b:100]
'Stop
Protected Evaluation
Languages of C++ heritage have the concept of protected execution using a try-catch. The idea is that an unexpected condition arising from any statement enclosed in the try portion does not abort execution. Instead, control transfers to the catch block, where the exception can be handled or passed up to the caller. This mechanism allows the call stack to be unwound gracefully.
Q provides a similar capability using triadic forms of function evaluation (@) and (.). Triadic @ is used for monadic functions and triadic . is used for multivalent functions. The syntax is similar for both,
@[fmon;a;exprfail]
.[fmul;Largs;exprfail]
Here fmon is a monadic function, a is a single argument, fmul is a multivalent function, Largs is a list of arguments, and exprfail is any expression. In both forms, the function is applied to its argument(s). Provided there is no error in evaluating the function, its return value is returned from the protected evaluation. Should an error arise, the result of exprfail is returned instead.
Note: If exprfail results in an error, the protected call itself will fail.
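Two details are worth a sketch, as we understand current kdb+ releases. First, because q evaluates arguments before applying a function, an expression passed as exprfail is evaluated whether or not an error occurs. Second, if exprfail is itself a function, it is applied to the error text (a string), and only when an error occurs. The lambdas below are illustrative.

```q
/ handler is a function: it receives the error text, here "type"
@[{x+1};`a;{"caught: ",x}]           / "caught: type"
.[{x*y};(6;`7);{"caught: ",x}]       / "caught: type"
/ a signalled symbol arrives as a string; cast it back to a symbol
@[{'x};`TheEnd;{`$x}]                / `TheEnd
/ no error: the handler is not used and the result passes through
@[{x+1};41;{"caught: ",x}]           / 42
```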
These functions are especially useful when processing input received from users. In the following examples, you would replace the unhelpful error message with more useful error handling.
Suppose a user wishes to enter dynamic q expressions. You could place the expression in a string and pass it to value. The problem with this is that if the user types an invalid q expression, it will cause the application to fail. You should instead use protected execution.
s:"6*7"
@[value;s;`$"Invalid q expression"]
42
s:"6x7"
@[value;s;`$"Invalid q expression"]
`Invalid q expression
Similarly, triadic . provides protected execution for multivalent functions.
x:6
y:7
.[*;(x;y);`$"Invalid args for *"]
42
x:6
y:`7
.[*;(x;y);`$"Invalid args for *"]
`Invalid args for *
Debugging
Debugging in q harkens back to the olden days, before the advent of debuggers and integrated development environments. The q gods don't give debugging much consideration because their code always runs correctly the first time. For the rest of us, things aren't quite as bad as inserting print statements, but you are certainly on your own. There is no debugger, nor is there any notion of break points or tracing execution.
When any expression evaluation fails, the console displays an (often cryptic) error message along with a dump of the offending values. Many errors manifest as either 'type or 'length, indicating an incompatibility in function arguments with respect to type or length. The goal is to discover the root cause of the superficial error.
The first step is to examine the dump of the offending arguments. Sometimes, the error will be obvious. A common 'type culprit is violation of type checking by attempting to assign a non-matching value to a simple list (e.g., a table column). Another common 'type offense is attempting to perform an operation on an atom not in the domain of the operation. A common culprit is failure to enlist an argument when a list is expected.
In a technique passed on by Simon Garland, you can get a more useful display of relevant information when a function is suspended. Define a function, say zs, as follows,
zs:{`d`P`L`G`D!(system"d"),v[1 2 3],enlist last v:value x}
This function takes another function as its argument and returns a dictionary with entries for the current directory, function parameters, local variables referenced, global variables referenced and the function definition.
We demonstrate this with a trivial example.
b:7
f:{a:6;x+a*b}
f[100]   / this is OK
142
f[`100]  / this is an error
{a:6;x+a*b}
'type
+
`100
42
zs f     / see what's what
d| `.
P| ,`x
L| ,`a
G| ``b
D| "{a:6;x+a*b}"
Stopping execution prior to the offending expression is helpful. This can be done by inserting a signal before the expression you wish to examine. You can then evaluate the various items in the offending evaluation. Stopping execution with a signal is a poor man's break point.
However the execution is suspended, you can evaluate the expressions of the function by hand from the console. To resume execution with a return value, issue a return (:) with the desired value at the command prompt. To return an error, issue a signal (') from the command line. To terminate execution and clear the call stack, issue (\) from the command line.
Scripts
A script is a q program stored in a text file with an extension of 'q'. A script can contain any q expressions or commands. The contents of the script are executed sequentially from top to bottom. Non-local entities created in the script exist in the workspace after the script is loaded.
Creating and Loading a Script
You can create a script in a text editor and save it with a q extension. For example, enter the following lines and save to a file named trades.q in the q directory.
trades:([] sym:(); ex:(); time:(); price:())
`trades insert (`IBM;`N; 12:10:00.0; 82.1)
`trades insert (`IBM;`O; 12:30:00.0; 81.95)
`trades insert (`MSFT;`N; 12:45:00.0; 23.45)
`trades insert (`IBM;`N; 12:50:00.0; 82.05)
`trades insert (`MSFT;`N; 13:30:00.0; 23.40)
Now issue the load command,
\l trades.q
,0
,1
,2
,3
,4
You can verify that the trades table has been created and the records have been inserted.
count trades
5
A script can be loaded at the start of the q session, or at any time during the session using the \l command. The load command can be executed from the console or from another script. See the discussion of commands for more.
Special Notations
You can comment out a block of code by surrounding it with a line containing only / and a matching line containing only \. An unmatched \ exits the script.
Multi-line expressions are permitted in a script but they have a special form. The first line must be out-dented, meaning that it begins at the left of the line with no initial whitespace. Any continuation lines must be indented, meaning that there is at least one whitespace character at the beginning of the line. Empty lines between expressions are permitted.
Table definition syntax and function definition syntax have the same rule for splitting across multiple lines:
A table or function can have line breaks after the closing square bracket or after a semicolon separator (;).
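The rules above can be seen in a short script fragment. The names area and px are hypothetical, and the block comment shows the / and \ pairing described earlier.

```q
/ multi-line function: first line out-dented, continuation lines indented
area:{[w;h]
  a:w*h;
  a}

/ multi-line table: breaks after the closing ] and after semicolons
px:([]
  sym:`IBM`MSFT;
  price:82.1 23.45)

/
Everything between the lone / above and the lone \ below is ignored.
\
area[3;4]
```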
Passing Parameters
Parameters are passed to a q script at q startup similarly to command line parameters in a C or Java program. They are strings that are not explicitly declared and are accessed positionally, corresponding to the order in which they are passed.
Note: As of this writing (Jun 2007), parameters can be passed when a script is loaded at q startup but not when a script is loaded with the \l command.
Specifically, the system variable .z.x is a list of strings, each of which contains the char representation of an argument present when the script was invoked. For example, the script captureargs.q,
/ script that captures its first three arguments
p0:.z.x 0;
p1:.z.x 1;
p2:.z.x 2;
can be loaded during q startup,
q.exe captureargs.q 42 forty 2.0
and in the new q session you will find,
p0
"42"
p1
"forty"
p2
"2.0"
Example
Here is the commented script text for the sample program from Overview.
/ read px.csv file into table t
t:("DSF"; enlist ",") 0: `:c:/q/data/px.csv;
/ select max Price from t grouped by Date and Sym
tmpx:select mpx:max Price by Date,Sym from t;
/ open connection to q process on port 5042 on aerowing
h:hopen `:aerowing:5042;
/ issue above query against table tpx on remote machine
rtmpx:h "select mpx:max Price by Date, Sym from tpx";
/ close connection
hclose h;
/ append merger of local and remote results to file tpx.dat
.[`:c:/q/data/tpx.dat; (); ,; rtmpx,tmpx]
Contents
- 1 I/O
- 1.1 Overview
- 1.2 Data Files
- 1.2.1 File Handle
- 1.2.2 Using hcount and hdel
- 1.2.3 Using set and get
- 1.2.4 Using hopen and hclose
- 1.2.5 Using Dot Amend
- 1.2.6 Writing Splayed Tables
- 1.3 Save and Load on Tables
- 1.4 Text Files
- 1.4.1 Writing (0:) and Reading (read0)
- 1.4.2 Using hopen and hclose
- 1.5 Binary Files
- 1.5.1 Writing (1:) and Reading (read1)
- 1.5.2 Using hopen and hclose
- 1.5.3 Reading Text Files as Binary
- 1.6 Parsing File Records
- 1.6.1 Fixed Length Records
- 1.6.2 Variable Length Records
- 1.7 Saving and Loading Contexts
- 1.7.1 Saving a Context
- 1.7.2 Loading a Context
- 1.8 Interprocess Communication
- 1.8.1 Communication Handle
- 1.8.2 Connection Handle
- 1.8.3 Message Format
- 1.8.4 Synchronous Messages
- 1.8.5 Asynchronous Messages
- 1.8.6 Message Handlers
- 1.8.7 Handling Close
- 1.8.8 Http Connection Handler
11. I/O
Overview
I/O in q is achieved using handles, which are symbols whose values are file names. The handle acts as a mapping to an I/O stream, in the sense that retrieving a value from the handle results in a read and passing a value to the handle is a write.
Data Files
All q entities are automatically serializable to disk. The persistent form is a self-describing version of the in-memory form. A data file comprises a q entity written to disk.
File Handle
A file handle is a symbol that starts with a colon ( : ) and has the form,
`:[path]fname
where the bracketed expression represents an optional path and fname is a file name. Both path and fname must be valid names as recognized by the underlying operating system.
Important: The one caveat is that separators in q paths are always represented by the forward slash (/), even on Windows.
Using hcount and hdel
Use hcount with a file handle to determine the size of the file in bytes. The result is a long.
hcount`:c:/q/Life.txt
21210j
Use hdel with a file handle to delete a file from the file system of the underlying operating system. A return value of the file handle indicates that the deletion was successful. You will get an error message if the file does not exist or if the delete cannot be performed.
hdel`:c:/q/Life.txt
`:c:/q/Life.txt
Using set and get
A data file is created and a q entity written to it in a single step using binary set. The left operand is a file handle, the right operand is the entity to be written, and the result is the handle of the written file. The file is closed once the write is complete.
`:/q/qdata.dat set 101 102 103
`:/q/qdata.dat
Note: The behavior of set is to create the file if it does not exist and overwrite it if it does.
A data file can be read using unary get, whose argument is a file handle and whose result is the q entity contained in the data file.
get`:/q/qdata.dat
101 102 103
An alternate way to read a data file is with value,
value`:/q/qdata.dat
101 102 103
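A round trip can be sketched end to end. The file names below are hypothetical and relative to q's current working directory, so nothing under /q is touched; hdel removes the files afterwards.

```q
/ write a list and a table to data files (illustrative names)
`:demo_list.dat set 101 102 103
`:demo_tab.dat set ([]c1:`a`b`c;c2:10 20 30)
r1:get `:demo_list.dat        / 101 102 103
r2:value `:demo_list.dat      / value on a file handle also reads it
r3:get `:demo_tab.dat         / the table comes back intact
/ clean up
hdel `:demo_list.dat
hdel `:demo_tab.dat
```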
Using hopen and hclose
A data file handle is opened with hopen. The result of hopen is an int file handle that, once assigned to a variable, acts like a function for writing to the file.
h:hopen`:/q/qdata.dat
h[42] / handle used as function
h 1 2 3 4 / juxtaposition notation
If the file already exists, opening it with hopen appends to it rather than overwriting it.
To close the handle, issue hclose on the result of hopen. This flushes any data that might be buffered.
hclose h
After the operations above, we find,
get`:/q/qdata.dat
101 102 103 42 1 2 3 4
Using Dot Amend
Fundamentalists can use dot amend to write to data files. To overwrite the file if it exists, use assign (:).
.[`:/q/qdata.dat;();:;1001 1002 1003]
`:/q/qdata.dat
get`:/q/qdata.dat
1001 1002 1003
To append to the file if it exists, use join ( , ).
.[`:/q/qdata.dat;();,;42 43]
`:/q/qdata.dat
get`:/q/qdata.dat
1001 1002 1003 42 43
Writing Splayed Tables
Writing a table to a data file using the above methods puts it into a single file. For example,
t:([] c1:101 102 103; c2:1.1 2.2 3.3)
`:/q/data/t.dat set t
`:/q/data/t.dat
creates a single file in the data subdirectory of the q directory. List the directory on your disk now to verify this.
You can write each column of the table to its own file in the directory specified in the handle; this is especially useful for large tables. A table written in this form is called a splayed table.
To splay a table, specify the path as a directory, that is, with a trailing slash (/) and no file name.
`:/q/data/t/ set t
`:/q/data/t/
If you list the directory in the OS, you will see a new subdirectory named 't'. It contains three files: one for each of the two columns in the original table, plus a '.d' file containing q metadata. The latter describes how to put the columns back together.
Important: For a table to be splayed, each column must be of uniform width. Consequently, a splayed table cannot contain any symbol or non-simple columns. A table with symbol column(s) can effectively be splayed by enumerating the symbols.
Thus, the following fails,
ts:([]c1:`a`b`c`a;c2:10 20 30 40)
`:/q/data/ts/ set ts
'type
Enumerate the symbol column and the write succeeds.
syms:distinct ts.c1
update c1:`syms$c1 from `ts
`ts
ts
c1 c2
-----
a 10
b 20
c 30
a 40
`:/q/data/ts/ set ts
`:/q/data/ts/
Save and Load on Tables
The save and load functions simplify the process of writing and reading tables to/from disk files.
In its simplest form, save writes a table to a file with the same name as the table. The form,
save`:path/tname
in which path is an optional path name and tname is the name of a table in the workspace, is equivalent to,
`:path/tname set tname
Thus,
save`:/q/trade
writes the trade table to a file named trade in the q directory.
Similarly,
save`:path/tname/
splays the table within the directory tname.
As you would expect, load is the inverse of save, in that it reads a table from a file into a variable with the same name as the file. In other words,
load`:path/tname
is equivalent to,
tname:get`:path/tname
Thus, the expression,
load `:/q/trade
creates a table variable trade and populates it from the file data.
As before, appending a / indicates that the table has been splayed. So,
load`:path/tname/
populates a table tname from the directory tname.
You can also use save to write a table as delimited text simply by appending an appropriate file extension. The expression,
save`:path/tname.txt
writes the table as text records. The expression,
save`:path/tname.csv
writes the table as csv records. The expression,
save`:path/tname.xml
writes the table as xml records.
Note: Tables written as .txt or .csv can be read as text files.
As an example, we take the simple table,
tsimp:([]c1:`a`b`c; c2:10 20 30)
We save it,
save`:/q/tsimp
`:/q/tsimp
Then reload it
tsimp:()
load `:/q/tsimp
`tsimp
tsimp
c1 c2
-----
a 10
b 20
c 30
Next we save it in delimited text formats,
save`:/q/tsimp.txt
`:/q/tsimp.txt
save`:/q/tsimp.csv
`:/q/tsimp.csv
save`:/q/tsimp.xml
`:/q/tsimp.xml
Now we inspect the files with a text editor. In tsimp.txt, we find,
c1 c2
a 10
b 20
c 30
In tsimp.csv we have,
c1,c2
a,10
b,20
c,30
In tsimp.xml, we have,
a 10
b 20
c 30
Text Files
Importing and exporting data often involves reading and writing text files. The mechanism for doing this in q differs from processing q data files.
Writing (0:) and Reading (read0)
The q primitive verb denoted 0: takes a file handle as its left argument and a list of q strings as its right argument. It writes each string as a line of text in the specified file.
`:/q/Life.txt 0: ("So";"Long")
`:/q/Life.txt
Opening the file Life.txt in a text editor will show a file with two lines.
Read a text file with read0. The result is a list of strings, one for eachline in the file.
read0`:/q/Life.txt
"So"
"Long"
Using hopen and hclose
A text file handle can be opened with hopen. The result of hopen is a positive int whose negative is a file handle that can be used to write text to the file.
h:hopen`:/q/Life.txt
(neg h)["and"]
-152
(neg h)("Thanks";"for";"all";"the";"Fish")
-152
If the file already exists, opening it with hopen will append to it rather than overwriting it.
To close the handle, issue hclose on the int result of hopen. This flushes any data that might be buffered.
hclose h
read0`:/q/Life.txt
"So"
"Long"
"and"
"Thanks"
"for"
"all"
"the"
"Fish"
Binary Files
It is also useful to read and write data from/to binary files. The mechanism for doing this is similar to that for processing text files. In q, a binary record is simply a list of byte values.
Writing (1:) and Reading (read1)
The q primitive verb denoted 1: takes a file handle as its left argument and a simple byte list as its right argument. It writes each byte in the list as a byte in the specified file.
`:/q/answer.bin 1: 0x2a0607
`:/q/answer.bin
Opening the file answer.bin in an editor that displays binary data will show a file with three bytes.
Read a binary file with read1. The result is a list of bytes.
read1`:/q/answer.bin
0x2a0607
Using hopen and hclose
A binary file handle can be opened with hopen. The result of hopen is a positive int file handle that can be used to write a list of bytes to the file. Close the file by issuing hclose on the handle.
h:hopen`:/q/answer.bin
h[0x01]
152
h 0x020304
152
hclose h
read1`:/q/answer.bin
0x2a060701020304
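The text and binary primitives above can be tried together in one short sketch. The file names demo.txt and demo.bin are hypothetical and relative to q's current working directory; hdel removes them at the end. The expected values assume read0 strips line terminators, as in the Life.txt example.

```q
/ text: write three lines, read them back as a list of strings
`:demo.txt 0: ("So";"Long";"and")
lines:read0 `:demo.txt            / ("So";"Long";"and")
/ reopening with hopen appends; the negated handle writes one line of text
h:hopen `:demo.txt
(neg h) "Thanks"
hclose h
more:read0 `:demo.txt             / ("So";"Long";"and";"Thanks")
/ binary: write three raw bytes and read them back
`:demo.bin 1: 0x2a0607
b:read1 `:demo.bin                / 0x2a0607
/ chars cast to bytes with "x"$ and back to chars with "c"$
`:demo.bin 1: "x"$"So"            / overwrites demo.bin with 0x536f
s:"c"$read1 `:demo.bin            / "So"
hdel `:demo.txt
hdel `:demo.bin
```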
Reading Text Files as Binary
A text file can also be read as binary data by using read1. With Life.txt as above,
read0`:/q/Life.txt
"So"
"Long"
"and"
"Thanks"
"for"
"all"
"the"
"Fish"
read1`:c:/q/Life.txt
0x536f0d0a4c6f6e670d0a616e640d0a5468616e6b730d0a666f720d0...
To convert this binary data to char, cast the binary. On a Windows machine, this looks as follows,
"c"$read1 `:c:/q/Life.txt
"So\r\nLong\r\nand\r\nThanks\r\nfor\r\nall\r\nthe\r\nFish\r\n"
Parsing File Records
Binary forms of 0: and 1: parse individual fields of a text or binary record according to data type. Field parsing is based on the following field types.
0 | 1 | Type | Width(1) | Format(0)
B | b | boolean | 1 | [1tTyY]
X | x | byte | 1 | [0-9a-fA-F][0-9a-fA-F]
H | h | short | 2 |
I | i | int | 4 |
J | j | long | 8 |
E | e | real | 4 |
F | f | float | 8 |
C | c | char | 1 |
S | s | symbol | n |
M | m | month | 4 | [yy]yy[?]mm
D | d | date | 4 | [yy]yy[?]mm[?]dd or [m]m/[d]d/[yy]yy
Z | z | datetime | 8 | date?time
U | u | minute | 4 | hh[:]mm
V | v | second | 4 | hh[:]mm[:]ss
T | t | time | 4 | hh[:]mm[:]ss[[.]ddd]
blank | | skip | |
* | | literal chars | |
The column labeled '0' contains the (upper case) field type char for text data. The (lower case) char in column '1' is for binary data. The column labeled 'Width(1)' contains the number of bytes that will be parsed for a binary read. The column labeled 'Format(0)' displays the format(s) that are accepted in a text read.
Note: The parsed records are presented in column form rather than in row form because q considers a table to be a collection of columns.
Fixed Length Records
The binary form of 0: and 1: for reading fixed length files is,
(Lt;Lw) 0: f
(Lt;Lw) 1: f
The left operand is a (general) list containing two sublists: Lt is a simple list of char containing one letter per field; Lw is a simple list of int containing one int width per field. The sum of the field widths in Lw must equal the width of the record. The result of the function in all cases is a (general) list of lists with an item for each field.
The simplest form of the right operand f is a symbol representing a file handle. For example,
("IFC D";4 8 10 6 4) 0: `:/q/Fixed.txt
reads a text file containing fixed length records of width 32. The first field is an int of width 4; the second field is a float of width 8; the third field consists of 10 chars; the fourth slot of 6 positions, marked by a blank in the type list, is skipped; the fifth field is a date of width 10.
You might think that the widths are superfluous, but they are not. The actual width can be narrower than the default for small values. Alternatively, you may wish to specify a width larger than that required by the corresponding data type to indicate blanks between fields. If the file in the previous example were rewritten with one additional blank character between fields, the proper left operand to read it would be,
("IFC D"; 5 9 11 6 4)
For example, we take a file c:/q/data/Px.txt having the form,
1001DBT12345678  98.61002EQT98765432 24.751004CCR00000001121.23
The read is,
("ISF";4 11 6) 0: `:/q/data/Px.txt
1001 1002 1004
DBT12345678 EQT98765432 CCR00000001
98.6 24.75 121.23
The second form of the right operand f is,
(hfile;i;n)
where hfile is a symbol containing a file name, i is the offset into the file to begin reading, and n is the number of bytes to read. This is useful for large files that cannot be read into memory in one operation.
Note: A read operation must begin and end on a record boundary.
In our trivial example, the following reads the second and third records,
("ISF";4 11 6) 0: (`:/q/data/Px.txt; 21; 42)
1002 1004
EQT98765432 CCR00000001
24.75 121.23
Variable Length Records
The binary form of 0: and 1: for reading variable length delimited files is,
(Lt;D) 0: f
(Lt;D) 1: f
The left operand is a (general) list comprising two items: Lt is a simple list of char containing one type letter per field; D is either a char representing the delimiting character or an enlisted such char.
If D is a delimiter char, the result is a general list of lists. Each list in the result is made up of items of the type specified by Lt. The simplest form of the right operand f is a symbol representing a file handle.
For example, say we have a csv file /q/data/Px.csv having records,
1001,"DBT12345678",98.6
1002,"EQT98765432",24.75
1004,"CCR00000001",121.23
Reading with a simple delimiter char results in a list of column lists,
("ISF";",") 0: `:c:/q/data/Px.csv
1001 1002 1004
DBT12345678 EQT98765432 CCR00000001
98.6 24.75 121.23
If D is the enlist of a delimiter char, the first record is taken to be a list of column names. Subsequent records are read as data specified by the types in Lt. The result is a table in which each record is formed from a file record.
Say we have a csv file /q/data/pxtitles.csv having records,
"Seq","Sym","Px"
1001,"DBT12345678",98.6
1002,"EQT98765432",24.75
1004,"CCR00000001",121.23
Reading with an enlisted delimiter results in a table,
("ISF";enlist ",") 0: `:/q/data/pxtitles.csv
Seq Sym Px
-----------------------
1001 DBT12345678 98.6
1002 EQT98765432 24.75
1004 CCR00000001 121.23
You can also read this file with an atomic delimiter. The result is a list of lists with nulls in the positions where the header record does not match the specified types.
("ISF";",") 0: `:c:/q/data/pxtitles.csv
1001 1002 1004
Sym DBT12345678 EQT98765432 CCR00000001
98.6 24.75 121.23
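Both delimited forms can be tried without creating a file, because 0: also accepts a list of strings as its right operand (as far as we know, in all current kdb+ versions). The sample rows are illustrative and omit the quotes used in the files above.

```q
rows:("Seq,Sym,Px";"1001,DBT12345678,98.6";"1002,EQT98765432,24.75")
/ enlisted delimiter: the first row holds column names, result is a table
t:("ISF";enlist ",") 0: rows
/ atomic delimiter on the data rows only: a list of three column lists
c:("ISF";",") 0: 1_rows
t`Seq          / 1001 1002i
c 1            / `DBT12345678`EQT98765432
```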
Saving and Loading Contexts
It is possible to save or restore all the entities in a q context in one operation. This is useful to restore the state of a system to its initial condition or from a checkpoint.
Saving a Context
Recall that a context is actually a dictionary. You can write an entire context, with all its entities, to a single data file by writing the dictionary.
For example, to write out the default context,
`:currentws set value `.
`:currentws
Loading a Context
To retrieve a saved context, use get with the file handle,
dc:get`:currentws
Use set with a symbol containing the context name to replace the context,
`. set dc
Important: Overlaying the root context replaces all its entities. This isconvenient for re-initialization, but be sure of your intent.
Interprocess Communication
A q process can communicate with another q process residing anywhere on the network, provided that process is accessible. The process that initiates the communication is the client, while the process receiving and processing the request is the server. The server process can be on the same machine, the same network, a different network or on the internet. The communication can be synchronous (wait for a result to be returned) or asynchronous (don't wait and no result returned).
The easiest way to examine interprocess communication (IPC) is to start another q process on the same machine running your current q session. Make sure it is listening on a different port (the default port is 5000). In what follows we shall assume that a server q process has been started on the same machine with the command,
q -p 5042
This means it is listening on port 5042.
Communication Handle
A communication handle is similar to a file handle. It is a symbol that starts with a colon (:) and has the form,
`:[server]:port
where the bracketed expression represents an optional server machine identifier and port is a port number.
If the server process is running on the same machine as the client process, you can omit the server identifier. In our case, the communication handle is,
`::5042
If the server is on the same network as your machine, you can use its machine name. In our case,
`:aerowing:5042
You can use the IP address of the server,
`:198.162.0.2:5042
If the server is running on the internet, you can use a url,
`:www.yourco.com:5042
Connection Handle
Use a communication handle as the argument of hopen to open a connection to the server process. Store the int result of hopen, called the connection handle, in a variable. You issue commands to the server by treating this variable as if it were a function.
For example, if the server process is running on the same machine and is listening on port 5042, the following q code opens a connection to the server process. It assigns the value 42 to the variable a on the server and then retrieves the value of a from the server. Finally, the connection is closed.
h:hopen`::5042
h "a:42"
h "a"
42
hclose h
Note: Whitespace between h and the quoted string is optional, as it is in function juxtaposition. We include it for readability.
Message Format
The general message format for interprocess communication is a list,
(f; arg1; arg2; ...)
Here f is a symbol or string representing an expression to be evaluated on the server. It can be an expression containing q operators or it can be a function, dictionary or list. The remaining items arg1, arg2, ... are optional parameters for the map. The parameters are arguments when f is a function, indices when f is a list, or domain items when f is a dictionary. Message execution returns the result of the server's evaluation.
This form of remote call is very powerful, in that it can send a mapping to a remote q instance for evaluation. In particular, the lambda of a function is transported. In a simple example, say we already have an open handle h to a server. If f is defined on the client as,
f:{x*x}
then executing the following expression on the client,
h (f;2)
results in f being sent to the server with the argument 2 and then evaluated there. The result is,
h (f;2)
4
Important: Exercise caution when sending entities to a remote server. A trivial mistake could place the server into a non-responding state. It is safer to define a function on the server and screen its input internally.
A special case of the general message format, which we used previously, is a string in which f is a q expression to be executed on the server and there are no args. For example,
"a:6*7"
"select avg price from t where date>2006.01.01"
This format can be used to execute a function that has been defined on the server. For example, suppose g is defined on the server as,
g:{x*x*x}
Executing the following on the client sends the string "g 2" to the server, where it is evaluated. The result is,
h "g 2"
8
Compare this with the example above where f is defined on the client.
Note: If the expression in the execution string contains special characters, they must be escaped. For example, to define a string on the server, you must escape the double quotes in the message string.
"str:\"abc\""
When the remote function performs an operation on a table, it can be viewed as a remote stored procedure. For example, suppose t and f are defined on the server as,
t:([]c1:`a`b`c;c2:1 2 3)
f:{[x] select c2 from t where c1=x}
The following expression on the client executes f on the server, selecting rows that match the value `b in c1,
h "f`b"
c2
--
2
The equivalent of dynamic SQL can be achieved by passing a function definition.
h ({[x]select c2 from t where c1=x};`b)
+(,`c2)!,,2
Synchronous Messages
The messages sent in the previous sections were synchronous, meaning that the sending client process waits for a result from the server before proceeding. The result of the operation on the server becomes the return value of the remote call that uses the connection handle.
To send a synchronous message, use the original positive int value of the connection handle as if it were a function. A typical example of sending a synchronous message is executing a select expression on the server. In this case, you surely want to wait for the result to return.
For example, suppose a table has been defined on the server as,
t:([]c1:`a`b`c;c2:1 2 3)
The following message executes a query against t, assuming h is an open connection handle to the server.
h"select from t where c1=`b"
c1 c2
-----
b 2
Note: The previous example demonstrates how to perform the equivalent of dynamic SQL against the server process.
As another example, send an insert synchronously if you want confirmation of the operation.
h "`t insert (`x;42)"
,3
h"t"
c1 c2
-----
a 1
b 2
c 3
x 42
Asynchronous Messages
It is also possible to send messages asynchronously, meaning that the client does not wait and there is no result containing a return value. You would typically send an asynchronous message to kick off a long-running operation on the server. You might also send an asynchronous message if the operation does not have a meaningful result, or if you simply don't care to wait for the result.
To send an asynchronous message, use the negative of the int connection handle returned by hopen. For example, the insert that was sent synchronously in the previous example can also be sent asynchronously,
(neg h)"`t insert (`y;43)"
h"t"
c1 c2
-----
a 1
b 2
c 3
x 42
y 43
Observe that there is no return value from the first message.
Advanced: In the previous example, because the first message is asynchronous, it is possible that the second message will be sent from the client before the insert has completed on the server. However, the second message will not execute on the server until the first has completed.
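The ordering guarantee can be modeled outside q. The Python sketch below is an illustration only, not how kdb+ is built: a hypothetical Server class processes messages one at a time from a FIFO queue on a single worker thread, send_async plays the role of (neg h) and send_sync the role of h. Because there is one queue and one worker, a synchronous read sent after an asynchronous insert always sees the inserted row, even though the client never waited for the insert.

```python
import queue
import threading


class Server:
    """Hypothetical model of a q process that handles one message at a time."""

    def __init__(self):
        self.table = [("a", 1), ("b", 2), ("c", 3)]  # stands in for t
        self._inbox = queue.Queue()  # FIFO: arrival order is execution order
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            fn, reply = self._inbox.get()
            result = fn(self.table)
            if reply is not None:  # only synchronous messages are answered
                reply.put(result)

    def send_async(self, fn):
        """Like (neg h) msg: enqueue and return at once, with no result."""
        self._inbox.put((fn, None))

    def send_sync(self, fn):
        """Like h msg: enqueue, then block until this message's result arrives."""
        reply = queue.Queue(maxsize=1)
        self._inbox.put((fn, reply))
        return reply.get()


server = Server()
server.send_async(lambda t: t.append(("y", 43)))  # returns None immediately
rows = server.send_sync(lambda t: list(t))  # executes only after the insert
print(rows[-1])  # ('y', 43)
```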
Message Handlers
When a q process receives a message via interprocess communication, the default behavior is to evaluate the message, effectively executing the message content. If the message is synchronous, the result is returned to the client.
During message processing on the server, the server connection handle is automatically placed in .z.w. This can be used to manage connections on the server. See below for a simple example.
Note: The connection handle on the client side and the connection handle on the server side are assigned independently by their respective q processes. In general, they are not equal.
The default message processing can be overridden using message filters. Message filters are event-handling functions in the .z context. The .z.pg message filter processes synchronous requests and .z.ps processes asynchronous requests.
Advanced: The names end in 'g' and 's' because synchronous processing has "get" semantics and asynchronous processing has "set" semantics.
The following two assignments on the server recreate the default message processing behavior.
.z.ps:{value x}
.z.pg:{value x}
Message filtering can be used for a variety of purposes. For example, suppose the connection allows a user on the client side to execute dynamic q-sql against the server. You could improve on the default processing by enclosing the evaluation in protected execution.
.z.pg:{@[value; x; errHandler x]}
Here errHandler is a function that recovers from an unexpected error.
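The dispatch just described can be sketched in Python. This is a hypothetical model, not q's implementation: messages are Python callables instead of q strings, the attributes pg and ps stand in for .z.pg and .z.ps, their default evaluates the message as {value x} does, and err_handler is an assumed recovery function in the spirit of errHandler. The protected override mirrors @[value; x; errHandler x].

```python
def evaluate(msg):
    """Default processing: evaluate the message (here, call it)."""
    return msg()


class Handlers:
    """Hypothetical server-side message filters."""

    def __init__(self):
        self.pg = evaluate  # synchronous ("get"): result goes back to the caller
        self.ps = evaluate  # asynchronous ("set"): result is discarded

    def on_sync(self, msg):
        return self.pg(msg)

    def on_async(self, msg):
        self.ps(msg)  # nothing is returned to the client


def err_handler(msg, exc):
    """Hypothetical recovery function: report the error instead of failing."""
    return f"error: {exc}"


def protected(msg):
    """Protected evaluation, analogous to @[value; x; errHandler x]."""
    try:
        return evaluate(msg)
    except Exception as exc:
        return err_handler(msg, exc)


h = Handlers()
print(h.on_sync(lambda: 6 * 7))  # 42
h.pg = protected  # override the synchronous filter
print(h.on_sync(lambda: 1 / 0))  # error: division by zero
```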
A more interesting example is a server that keeps track of the clients connected to it. A simplistic way to do this is to maintain a dictionary of connection handles mapped to client names. The following function on the server registers a new client connection by upserting it to the global dictionary cp. Remember, .z.w has the connection handle.
cp:()!() / server
regConn:{cp[.z.w]::x} / server
The client could pass its machine name,
h:hopen`::5042 /client
h / client
224
h"regConn `",string .z.h / client
After this call, cp will contain an entry that reflects the specific handle assigned to the connection on the server. For example,
cp / server
4| macpro.local
As additional connections are made to the server, cp will contain one entry for each connection.
Handling Close
An open connection can be closed by either the client or the server. The close can be deliberate, meaning it occurs under user or program control, or it can be unanticipated due to a process terminating unexpectedly.
The close handler .z.pc can be used to perform processing whenever a connection is closed from the other end. While it will be invoked on any close, it does not know how the close was initiated.
In our example above, we use a close handler to remove the information about a connection once it is closed. Specifically, we create a handler to remove the appropriate entry from cp.
.z.pc:{cp::cp _ x} / server
When the client issues an hclose on its connection handle,
hclose h / client
the dictionary cp no longer shows the connection,
cp / server
_
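The registry and close handler above can be mirrored in a small Python sketch. The class and function names are hypothetical; the attribute w stands in for .z.w, which the server sets before evaluating each message, and the handle 4 and host name are taken from the example output above.

```python
class RegistryServer:
    """Hypothetical model of the cp registry kept by the q server."""

    def __init__(self):
        self.cp = {}  # connection handle -> client name, like cp:()!()
        self.w = None  # handle of the message being processed, like .z.w

    def handle_message(self, handle, fn, *args):
        self.w = handle  # set automatically before the message is evaluated
        try:
            return fn(self, *args)
        finally:
            self.w = None

    def on_close(self, handle):
        """Close handler, like .z.pc:{cp::cp _ x}."""
        self.cp.pop(handle, None)


def reg_conn(server, name):
    """Like regConn:{cp[.z.w]::x}: record the caller's handle and name."""
    server.cp[server.w] = name


srv = RegistryServer()
srv.handle_message(4, reg_conn, "macpro.local")
print(srv.cp)  # {4: 'macpro.local'}
srv.on_close(4)
print(srv.cp)  # {}
```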
Now that we have established basic close handling on the server, we turn our attention to the client. We want the client to reconnect automatically in the event the server disconnects for any reason. The easiest way to do this is with the timer.
We create a close handler that resets the global connection handle to 0 and issues a command that sets the timer to fire every 2 seconds (2000 milliseconds).
.z.pc:{h::0; value"\\t 2000"}
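The timer handler attempts to re-open the connection. Upon success, it issues a command that turns the timer off. Because hopen signals an error when the server cannot be reached, the call is wrapped in protected execution so that a failed attempt leaves h at 0 and the timer keeps running.
.z.ts:{h::@[hopen;`::5042;0]; if[h>0;value"\\t 0"]}
Note: In practice, you should restrict the number of connection retries rather than try forever.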
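Following the note above, a bounded retry loop might look like the Python sketch below. It is a model under stated assumptions rather than q code: each call to on_tick stands in for one .z.ts timer tick, connect stands in for hopen and raises on failure, and max_retries is a hypothetical limit. The timer is switched off either on success or once the limit is reached.

```python
class ReconnectingClient:
    """Hypothetical client that reconnects on timer ticks, up to max_retries."""

    def __init__(self, connect, max_retries=5):
        self.connect = connect  # returns a positive handle or raises OSError
        self.max_retries = max_retries
        self.h = 0  # 0 means "not connected", as with h::0
        self.timer_on = False
        self.attempts = 0

    def on_close(self):
        """Like .z.pc: reset the handle and start the timer."""
        self.h = 0
        self.attempts = 0
        self.timer_on = True

    def on_tick(self):
        """Like .z.ts: try to reconnect; stop on success or when retries run out."""
        if not self.timer_on:
            return
        self.attempts += 1
        try:
            self.h = self.connect()
        except OSError:
            self.h = 0
        if self.h > 0 or self.attempts >= self.max_retries:
            self.timer_on = False


def make_flaky(fail_times, handle=224):
    """Hypothetical connect function that fails fail_times times, then succeeds."""
    calls = 0

    def connect():
        nonlocal calls
        calls += 1
        if calls <= fail_times:
            raise ConnectionRefusedError("server down")
        return handle

    return connect


client = ReconnectingClient(make_flaky(2), max_retries=5)
client.on_close()
while client.timer_on:  # each iteration stands in for one 2-second tick
    client.on_tick()
print(client.h, client.attempts)  # 224 3
```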
Http Connection Handler
There is also a message handler for http connections, named .z.ph. Since http communication is always synchronous, there is only one handler. In contrast to other system handlers, there is a default handler for http, which is used for the q web viewer.
The default handler allows a q process to be accessed programmatically over the web, similar to a servlet. The ambitious reader could replace this with a handler that processes SOAP, thus enabling q to be a web service. (Such a handler would be the object of derision from those who decry SOAP as unnecessary and wasteful.)
Contents
- 1 Workspace Organization
- 1.1 Overview
- 1.2 Contexts
- 1.2.1 Context Notation
- 1.2.2 Reserved Contexts
- 1.2.3 Working with Contexts
- 1.2.4 A Context is a Dictionary
- 1.2.5 Expunging from a Context
- 1.2.6 Functions and Contexts
- 1.2.7 Namespaces (Advanced)
12. Workspace Organization
Overview
The collection of entities that exist in a q session comprises the workspace. In other words, the workspace includes all atoms, lists, dictionaries, functions, enumerations, etc., that have been created through the console or via script execution.
Any programming environment of reasonable complexity has the potential for name clashes. For example, should two separate q scripts both create a variable called 'foobar', one will overwrite the value of the other. Variable typing is of no help here, since a variable can be reassigned with a different type at any time.
The solution to name clashes is to create namespaces. This is accomplished with a hierarchical naming structure implemented with a separator character, usually a dot or a slash. For example, the namespaces A and B can both have an entity foobar, yet A.foobar and B.foobar are distinct. A familiar example of this is the hierarchical directory/file system used by operating systems.
Namespaces in q are called directories or contexts.Contexts provide an organization of the workspace.
Contexts
The q workspace provides a simple namespace structure using dot notation for entity names. Each of the nodes is called a context, or a directory. The default context, also called the root, comprises all entities whose names start with an initial alpha character. The variables we have created heretofore have resided in the default context.
Context Notation
A context name has the form of a dot (.) followed by alphanumeric characters, starting with an alpha. The following are all valid context names.
.a
.q
.z0
.zaphod
There is no need to pre-declare the context name. As in the case of variables, a context is created dynamically as required. You place a variable in a context by prepending the context name to the variable name, separated by a dot (.). The variable foobar can be created in various contexts,
foobar:42
.aa.foobar:43
.z0.foobar:45
.zaphod.foobar:46
Variables of the same name in different contexts are indeed distinct,
foobar
42
.aa.foobar
43
.z0.foobar
45
.zaphod.foobar
46
When an entity name includes its full context name, we say the name is fully qualified. When an entity name omits the context name, we say the name is unqualified.
Reserved Contexts
All contexts of a single letter (both lower and upper case) are reserved for q itself. Some of these are listed below:
Name | Use
.q | Built-in functions
.Q | Low-level routines used by q
.z | Environmental interaction
Important: While q will not prevent you from placing entities in the reserved contexts, doing so risks serious problems should you collide with names used by q.
Working with Contexts
At any time in a q session, there is a current or working context. When you start a q session, the current context is the default context. You change the current context with the \d command. For example, to switch to the 'files' context,
\d .files
To switch back to the default context,
\d .
To display the current context,
\d
`.
Any entity in the current context can be specified using its unqualified name.
\d . / switch to default context
.files.home:`c:
.files.home
`c:
\d .files
home
`c:
A Context is a Dictionary
A context is actually a sorted dictionary whose domain is a list of symbols with the names of the entities defined in the context. Apply the key function to the dictionary name to display the names of the entities in the context. Apply value to see the entire dictionary mapping.
.new.a:42
.new.L:1 2 3
.new.d:`a`b`c!1 2 3
key `.new
``a`L`d
value `.new
 | ::
a| 42
L| 1 2 3
d| `a`b`c!1 2 3
Observe that q places an entry into any non-default context that maps the null symbol to the null item.
You can look up an entity name in the directory to get its associated value. Use a symbol containing the context name to refer to the dictionary.
`.new[`L]
1 2 3
Note: In order to access an entity in the default context from another context, you must retrieve the value from the context dictionary. There is no syntactic form.
\d .
ztop:42
\d .new
`.[`ztop]
42
Expunging from a Context
We have seen that a context is a directory that maps entity names for the context to their values. This means that in order to expunge an entity from a context, we can simply delete it from the dictionary.
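The dictionary view of a context can be modeled in Python with one dictionary per context. This sketch rests on simplifying assumptions: it handles only one level of context, it does not keep the keys sorted as q does, and the assign and expunge helpers are hypothetical. Deleting a key plays the role of delete a from `.new.

```python
# Hypothetical model: the workspace maps each context name to a dictionary.
# "." is the default (root) context.
workspace = {".": {}}


def assign(name, value):
    """Assign a qualified name such as .new.a, or an unqualified root name."""
    if name.startswith("."):
        ctx, _, short = name[1:].partition(".")
        # A context is created on first use; q also seeds it with a null entry.
        d = workspace.setdefault("." + ctx, {"": None})
    else:
        d, short = workspace["."], name
    d[short] = value


def expunge(name, ctx="."):
    """Like: delete name from `ctx (removes the key from the context dictionary)."""
    del workspace[ctx][name]


assign(".new.a", 42)
assign(".new.L", [1, 2, 3])
assign("foobar", 42)
print(sorted(workspace[".new"]))  # ['', 'L', 'a']  (compare key `.new)
expunge("a", ".new")
print(workspace[".new"])  # {'': None, 'L': [1, 2, 3]}
```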
For example, we can define a variable a in the context .new and then remove it from the workspace when it is no longer needed. Observe that we use the symbolic name of the context to ensure that the delete is applied to it by reference.
.new.a:42
.new
 | ::
a| 42
/ do some work ...
delete a from `.new
`.new
.new
 | ::
In particular, to expunge a global entity from the default context, use `. as the directory name. In a fresh workspace we find,
a:42
b:98.6
c:`life
\v
`s#`a`b`c
delete a from `.
`.
\v
`s#`b`c
Functions and Contexts
Function definition presents an issue with respect to global variable references and unqualified names. In the following function, the variable a is an unqualified global variable,
f:{a+x}
There is a potential ambiguity with respect to the context of a. Is the context resolved at the time f is defined, or is it resolved at the time f is evaluated?
Important: The context of an unqualified global variable in a function is the context in which the function is defined, not the context in which it is evaluated.
Thus, we find
\d .
a:42
\d .lib
f:{a+x}
f[6]
{a+x}
'a
)\
a:100
f[6]
106
\d .
.lib.f[6]
106
We also find the following result, because even though g lives in the .lib context, it is defined in the default context.
\d .
.lib.g:{a*x}
a:42
g[2]
'g
\d .lib
g[3]
126
a:6
g[7]
294
Namespaces (Advanced)
It is possible to simulate a multi-level namespace hierarchy by using multiple dots in names.
.lib1.vars.op1:6
.lib1.vars.op2:7
.math.fns.f:{x*y}
.math.fns.f[.lib1.vars.op1;.lib1.vars.op2]
42
In the example above, q creates dictionaries at each node of the tree.
value `.lib1.vars
   | ::
op1| 6
op2| 7
value `.math.fns
 | ::
f| {x*y}
But appearances are deceiving. As of this writing (Jan 2007), q does not recognize a context tree below the first level. So, in our example, you cannot switch to a context .lib1.vars using the \d command.
\d .math.fns
'.math.fns
You must access the contents of a node dictionary below the top level functionally.
`.math.fns[`f] [6;7]
42
The following is arguably more readable.
mlib:`.math.fns
mlib[`f][6;7]
42
vlib:`.lib1.vars
vlib[`op1`op2]
6 7
mlib[`f] . vlib[`op1`op2]
42
This is one way to perform late-bound computation using members in the context tree.
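Both rules from this section, that an unqualified global is looked up in the context where the function was defined and that nested nodes must be reached through their dictionaries, can be modeled with nested Python dictionaries. This is a hypothetical sketch, not q's name-resolution code: define captures the defining context's dictionary and reads a from it at call time, which reproduces the g[7] result of 294 shown earlier, and the last line mimics mlib[`f] . vlib[`op1`op2].

```python
# Hypothetical model of the workspace as nested dictionaries, one per context.
tree = {
    ".": {"a": 42},
    ".lib": {},
    ".lib1": {"vars": {"op1": 6, "op2": 7}},
    ".math": {"fns": {"f": lambda x, y: x * y}},
}


def define(ctx, body):
    """Build a one-argument function whose unqualified global `a` lives in ctx,
    the context current at definition time. The value is read at call time."""
    env = tree[ctx]
    return lambda x: body(env["a"], x)


tree[".lib"]["g"] = define(".", lambda a, x: a * x)  # \d . then .lib.g:{a*x}
tree[".lib"]["a"] = 6  # \d .lib then a:6 creates .lib.a
print(tree[".lib"]["g"](7))  # 294, using the root a (42), not .lib.a (6)

# Late-bound lookup through the tree, like mlib[`f] . vlib[`op1`op2]
mlib = tree[".math"]["fns"]
vlib = tree[".lib1"]["vars"]
print(mlib["f"](*[vlib[k] for k in ("op1", "op2")]))  # 42
```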
Contents
- 1 Commands and System Variables
- 1.1 Command Format
- 1.1.1 Tables (\a)
- 1.1.2 Console (\c)
- 1.1.3 Web Console (\C)
- 1.1.4 Change O/S Directory (\cd path)
- 1.1.5 Directory (\d)
- 1.1.6 Functions (\f)
- 1.1.7 Load (\l)
- 1.1.8 Offset (\o)
- 1.1.9 Port (\p)
- 1.1.10 Precision (\P)
- 1.1.11 Seed (\S)
- 1.1.12 Timer (\t)
- 1.1.13 Elapsed Time (\t expr)
- 1.1.14 Timeout (\T)
- 1.1.15 Variables (\v)
- 1.1.16 Workspace (\w)
- 1.1.17 Week Offset (\W)
- 1.1.18 Expunge Handler (\x)
- 1.1.19 Date Format (\z)
- 1.1.20 Operating System (\text)
- 1.1.21 Interrupt (Ctrl-C)
- 1.1.22 Terminate (\)
- 1.1.23 Exit Q (\\)
- 1.2 System Variables
- 1.2.1 IP Address (.z.a)
- 1.2.2 Dependencies (.z.b)
- 1.2.3 Global Date (.z.d)
- 1.2.4 Local Date (.z.D)
- 1.2.5 Startup File (.z.f)
- 1.2.6 Host (.z.h)
- 1.2.7 Process ID (.z.i)
- 1.2.8 Release Date (.z.k)
- 1.2.9 Release Major Version (.z.K)
- 1.2.10 License Information (.z.l)
- 1.2.11 O/S (.z.o)
- 1.2.12 Process Close (.z.pc)
- 1.2.13 Process Get (.z.pg)
- 1.2.14 Process HTTP Get (.z.ph)
- 1.2.15 Process Input (.z.pi)
- 1.2.16 Process Open (.z.po)
- 1.2.17 Process HTTP Post (.z.pp)
- 1.2.18 Process Set (.z.ps)
- 1.2.19 Global Time (.z.t)
- 1.2.20 Local Time (.z.T)
- 1.2.21 Timer Expression (.z.ts)
- 1.2.22 User (.z.u)
- 1.2.23 Value Set (.z.vs)
- 1.2.24 Handle (.z.w)
- 1.2.25 Command Line Parameters (.z.x)
- 1.2.26 GMT (.z.z)
- 1.2.27 Local Date and Time (.z.Z)
- 1.3 Command Line Parameters
- 1.3.1 Console (-c)
- 1.3.2 Web Browser Console (-C)
- 1.3.3 Offset (-o)
- 1.3.4 Port (-p)
- 1.3.5 Print Digits (-P)
- 1.3.6 Timer (-t)
- 1.3.7 Timeout (-T)
- 1.3.8 Workspace Size (-w)
- 1.3.9 Week Offset (-W)
- 1.3.10 Date Format (-z)
13. Commands and System Variables
Command Format
Commands control aspects of the q environment. A command begins with a back-slash (\) and is followed by one or more characters. Some commands have an optional parameter that is separated from the command by whitespace.
Important: Case is significant in the command characters.
To execute a command programmatically, place it in a string and use the value function.
value "\\p 5042"
Note: A backslash in the string must be escaped.
Tables (\a)
The command \a returns a list of symbols with the names of all tables in the current context. For example, in a fresh q session,
t:([]c1:1 2 3; c2:`a`b`c)
\a
,`t
Console (\c)
The command \c (note lower case) controls the size of the q virtual console display. The first parameter specifies the number of rows and the second the number of columns. The default setting is 23 by 79.
til 100
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 ..
\c 23 200
til 1000
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 ..
Web Console (\C)
The command \C (note upper case) controls the size of the q web console display. The first parameter specifies the number of rows and the second the number of columns. The default setting is 36 by 2000.
Change O/S Directory (\cd path)
The \cd command affects the current working directory of the underlying operating system. To display the current directory, issue \cd with no argument.
\cd
"/Users/jeffry/bin"
The result of \cd is the text string as received from the O/S with escapes where applicable. For Windows, the back-slash characters are escaped and are not converted to forward-slashes.
To change the current working directory, issue \cd with the path of the desired directory.
\cd /q
If the specified directory does not exist, it will be created.
Note: Since the argument of \cd is not a string, special characters do not need to be escaped.
Directory (\d)
The \d command controls the current context (directory).
To determine the current context, issue \d with no parameter.
\d
`.
To set the current context, issue \d followed by the target context.
\d .tutorial
\d
`.tutorial
Note: If the specified context does not exist, using it in \d will cause its creation.
Issue \d . to set the current working context to the default context.
\d .
\d
`.
Functions (\f)
The \f command returns a sorted list containing the functions in a context (directory). When used with no parameters, it returns the functions in the current context.
\f
`s#`diff`f`g
Use \f with the name of a context to list its functions.
\f .debug
`s#``addBPs`break`clearBPs`deleteBPs`stop
Load (\l)
A script can be loaded at startup of q or during a session. To load the script from the session, issue the \l command with the (optionally qualified) name of the script file.
For example, to load the distribution script sp.q from the current directory,
\l sp.q
+`p`city!(`p$`p1`p2`p3`p4`p5`p6`p1`p2;`london`london`london`london`london`lon..
(+(,`color)!,`blue`green`red)!+(,`qty)!,900 1000 1200
+`s`p`qty!(`s$`s1`s1`s1`s2`s3`s4;`p$`p1`p4`p6`p2`p2`p4;300 200 100 400 200 300)
Offset (\o)
The \o command sets the offset in hours from GMT used to determine local time in .z.Z. For example,
.z.z
2007.04.12T11:31:13.352
.z.Z
2007.04.12T07:31:15.365
\o -2
.z.z
2007.04.12T11:31:35.954
.z.Z
2007.04.12T09:31:37.587
Port (\p)
The \p command controls which port the kdb+ server listens on. For example,
\p 5001
means that it will listen for connections on port 5001.
Note: When you issue the \p command, kdb+ attempts to open the port. For this to be successful, the security settings of the machine must allow it.
If the port has not been set, you will see,
\p
0
This means that no connection to this instance of kdb+ is currently possible because it is not listening on any port. You can also issue \p 0 to stop listening on any port.
Precision (\P)
The precision command \P (note the upper case) sets the display precision for floating point numbers to the specified number of digits.
The default precision is 7, meaning that the display of float or real values is rounded to the seventh significant digit.
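Python's g format specifier also rounds to a given number of significant digits, which offers a quick way to check the displayed values in the examples that follow. This is an analogy rather than q's formatting code; the show helper is hypothetical, and q's display rules may differ for other magnitudes.

```python
x = 1.23456789012345678


def show(value, digits=7):
    """Render value rounded to `digits` significant digits, in the spirit of \\P."""
    return format(value, f".{digits}g")


print(show(x))  # 1.234568  (like the default \P 7)
print(show(x, 12))  # 1.23456789012
print(show(x, 16))  # 1.234567890123457
```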
\P
7
f:1.23456789012345678
f
1.234568
Set the precision with a non-negative int parameter.
\P 12
f
1.23456789012
Set the precision to the maximum available that respects multiplicative tolerance with 0. This is currently the same as using 16.
\P 0
f
1.234567890123457
Set the precision to the maximum available with 17.
\P 17
f
1.2345678901234569
Seed (\S)
The \S command (note upper case) sets the seed for pseudo-random number generation. The default value is -314159. The argument is an integer.
\S
-314159
\S 424242
\S
424242
Timer (\t)
The \t command controls the timer. The optional parameter is the number of milliseconds between timer ticks, with 0 signifying that the timer is off. On each timer tick, the function .z.ts is invoked if it has been assigned.
To determine the current timer setting, issue \t with no parameter.
\t
0
To set the timer, issue \t with the number of milliseconds. For example, to set the timer to tick once a second,
\t 1000
Note: The actual timer tick frequency is determined by the timing granularity supported by the underlying operating system. This granularity can be considerably coarser than a millisecond.
To turn the timer off,
\t 0
Elapsed Time (\t expr)
When the \t command is invoked with an expression as its parameter, the expression is evaluated and its duration of execution is reported in milliseconds. This can be used to profile code execution when tuning an application.
In q there are often multiple ways to achieve a desired result, but one may execute significantly faster. This may not matter for small tables or sporadic updates, but for processing very large volumes of data in real time it can be essential. Inserting \t at key points in the program can identify the critical routines that are consuming the most time. By measuring the execution times of alternate expressions for the critical routines, you can determine which is most efficient in your environment.
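The timing comparison that follows relies on the identity 0 + 1 + ... + (n-1) = n(n-1)/2, which is exactly what sum til n computes. The Python sketch checks the identity by brute force and also shows a pitfall the timings do not reveal: n*(n-1) for n = 100,000 is 9,999,900,000, larger than the largest 32-bit int (2,147,483,647). On a q version whose default integer type is a 32-bit int, the product 100000*99999 would therefore overflow before the division. The wrap_int32 helper is a hypothetical illustration of two's-complement wraparound, not q's own arithmetic.

```python
def series_sum(n):
    """Closed form for 0 + 1 + ... + (n-1), the value of q's `sum til n`."""
    return n * (n - 1) // 2


def wrap_int32(x):
    """Reduce x to a signed 32-bit value, as two's-complement overflow would."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x


n = 100_000
assert series_sum(n) == sum(range(n))  # brute force agrees with the formula
product = n * (n - 1)
print(product)  # 9999900000
print(wrap_int32(product))  # 1409965408, far from the true product
```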
The following measures the time required to add the first 100,000 non-negative integers (0 through 99,999) 10,000 times on the author's laptop.
\t do[10000; sum til 100000]
2553
We conclude that adding the first 100,000 integers once requires approximately .25 milliseconds.
If it is actually necessary to add the first 100,000 integers in an application, you could use the formula,
sn = (n*n-1)%2
We time it for n = 100,000.
\t do[10000; (100000*99999)%2]
10
As you can see, this is more than 200 times faster than performing the actual addition. We can do even better by replacing the division with a multiplication,
sn = .5*n*n-1
To see the effects clearly, we increase the counter to 100,000.
\t do[100000; sum til 100000]
25216
\t do[100000; (100000*99999)%2]
120
\t do[100000; .5*100000*99999]
80
Timeout (\T)
The \T command (note upper case) controls execution timeout. The int parameter is the number of seconds any call from a client will execute before it is timed out and terminated. The default value is 0, which means no timeout.
Variables (\v)
The \v command returns a sorted list containing the variables in a context (directory). When used with no parameters, it returns the variables in the current context.
\v
`s#`L`h`kt`p`pi`r`sqrt2`t`tdetails`third
Use \v with the name of a context to list its variables.
\v .debug
`s#`breakPoints`stopPoints
Workspace (\w)
The workspace command \w (note lower case) displays six integer values that indicate memory usage by the current workspace.
\w
168144 67108864 67108864 0 0 8589934592j
The first value indicates the number of bytes currently allocated. The second indicates the total number of bytes available in the heap. The third indicates the maximum heap seen so far in the current session. The fourth indicates the maximum bytes available if set with the -w command line option, else 0. The fifth displays the bytes mapped. The sixth displays the physical memory.
Week Offset (\W)
The week offset command \W (note upper case) specifies the start of week offset. An offset of 0 corresponds to Saturday. The default is 2, which is Monday.
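Assuming, as stated above, that the offset counts days forward from Saturday (0 is Saturday, 1 Sunday, 2 Monday), the start of the week containing a given date can be computed as in this Python sketch. The week_start helper is hypothetical and is not how q implements week arithmetic; note that Python's date.weekday() numbers Monday as 0 and Saturday as 5.

```python
from datetime import date, timedelta

SATURDAY = 5  # date.weekday(): Monday=0 ... Saturday=5, Sunday=6


def week_start(d, offset=2):
    """First day of the week containing d, for a week offset counted in days
    from Saturday (0=Saturday, 1=Sunday, 2=Monday, ...)."""
    first_weekday = (SATURDAY + offset) % 7
    return d - timedelta(days=(d.weekday() - first_weekday) % 7)


d = date(2007, 4, 12)  # a Thursday
print(week_start(d))  # 2007-04-09, a Monday (offset 2)
print(week_start(d, 0))  # 2007-04-07, a Saturday (offset 0)
```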
Expunge Handler (\x)
The expunge handler command \x deletes the assignment of a user-specified function to one of the .z.p* event handlers and restores the default behavior. For example, if you have assigned a routine to .z.pc in order to process remote connection close, reset with,
\x .z.pc
Date Format (\z)
The date format command \z specifies the format for date parsing. A value of 0 corresponds to mm/dd/yyyy; a value of 1 corresponds to dd/mm/yyyy.
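The effect of \z can be imitated with Python's datetime.strptime by choosing the field order from the setting. In this hypothetical parse_date helper, None stands in for q's null date 0Nd; it is a sketch of the behavior shown in the example, not q's parser.

```python
from datetime import datetime


def parse_date(text, z=0):
    """Parse a date string with the order \\z selects: 0 -> mm/dd/yyyy,
    1 -> dd/mm/yyyy. Returns None (standing in for 0Nd) when it does not fit."""
    fmt = "%m/%d/%Y" if z == 0 else "%d/%m/%Y"
    try:
        return datetime.strptime(text, fmt).date()
    except ValueError:
        return None


print(parse_date("12/31/2007"))  # 2007-12-31
print(parse_date("31/12/2007"))  # None: there is no month 31
print(parse_date("31/12/2007", z=1))  # 2007-12-31
```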
\z
0
"D"$"12/31/2007"
2007.12.31
"D"$"31/12/2007"
0Nd
\z 1
"D"$"12/31/2007"
0Nd
"D"$"31/12/2007"
2007.12.31
Operating System (\text)
If a backslash is followed by characters not recognized as a kdb+ command, the text is assumed to be an operating system command and is passed to the O/S for execution.
For example, you can issue,
\dir / display Windows directory
(" Volume in drive C has no label.";" Volume Serial Number is E89F-3533";..
\pwd / display Unix directory
"/Users/jeffry/bin"
Any return value from the O/S is displayed as a list of strings.
Interrupt (Ctrl-C)
You can terminate a long-running routine by pressing the Ctrl-C combination.
Terminate (\)
The terminate command, denoted by a single backslash (\), exits one level of the q interpreter. This is useful when debugging a failed function evaluation. In the following console shot, we do not suppress the q prompt.
q)f:{x*y}
q)f[2;`3]
{x*y}
'type
*
2
`3
q))\
q)_
Here the underscore denotes the blinking cursor.
Advanced: If you issue \ at the "q)" prompt, you drop into a k session.
q)\
_
Again, the underscore denotes the blinking cursor. Because k is q's underlying implementation language, some q expressions will execute as expected in the k session but most will not. Explanation of k is beyond the scope of this manual.
To return to the q console from a k session and see the "q)" prompt again, enter a single \ at the prompt.
\
q)
Exit Q (\\)
To exit the q process, enter a double backslash (\\),
\\
Important: There is no confirmation prompt for \\. The q session is terminated with extreme prejudice.
System Variables
Variables in certain reserved contexts provide useful q environmental interaction.
IP Address (.z.a)
The variable .z.a is an int representing the IP address of the current running kdb+ instance. To see the usual four-integer IP address, decode the int using base 256. For example, on the author's laptop,
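Decoding works because .z.a holds the four address bytes packed into one signed 32-bit int, most significant byte first: 169*2^24 + 254*2^16 + 166*2^8 + 121 = 2,852,038,265, which exceeds the largest positive int and so is stored as 2,852,038,265 - 2^32 = -1,442,929,031. The Python sketch below performs the same base-256 split and its inverse; the function names are hypothetical.

```python
def decode_ip(a):
    """Split a signed 32-bit int such as .z.a into its four address bytes."""
    u = a & 0xFFFFFFFF  # reinterpret the signed int as unsigned
    return tuple((u >> shift) & 0xFF for shift in (24, 16, 8, 0))


def encode_ip(octets):
    """Inverse: pack four bytes (base 256) into a signed 32-bit int."""
    u = 0
    for b in octets:
        u = u * 256 + b
    return u - (1 << 32) if u >= (1 << 31) else u


print(decode_ip(-1442929031))  # (169, 254, 166, 121)
print(encode_ip((169, 254, 166, 121)))  # -1442929031
```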
.z.a
-1442929031
`int$0x00 vs .z.a
169 254 166 121
Dependencies (.z.b)
The system variable .z.b is a dictionary that represents variable dependencies. Recall that non-local assignment with :: establishes a dependency between the variable and variables in the expression assigned to it. These dependencies are recorded in the dictionary .z.b that maps a variable name to a list of the names of variables that depend on it.
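The bookkeeping behind .z.b can be sketched in Python. This is a simplified, hypothetical model: a view is a function of named inputs, the dependencies helper inverts the view definitions into the variable-to-dependents mapping that .z.b reports, and get recomputes a view on every read, whereas q recalculates a view only when it is referenced after one of its inputs has changed.

```python
# Hypothetical view definitions: name -> (inputs, function), like c::a+b
views = {"c": (("a", "b"), lambda a, b: a + b)}
values = {"a": 42, "b": 98}


def dependencies(view_defs):
    """Map each variable to the views that depend on it, like .z.b."""
    deps = {}
    for view, (inputs, _) in view_defs.items():
        for name in inputs:
            deps.setdefault(name, []).append(view)
    return deps


def get(name):
    """Read a variable; a view is recomputed from its inputs when read."""
    if name in views:
        inputs, fn = views[name]
        return fn(*(get(i) for i in inputs))
    return values[name]


print(dependencies(views))  # {'a': ['c'], 'b': ['c']}
print(get("c"))  # 140
values["a"] = 1
print(get("c"))  # 99
```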
For example, in a new q session, we find,
a:42
b:98
c::a+b
.z.b
a| c
b| c
Global Date (.z.d)
The variable .z.d retrieves the date component of Greenwich Mean Time (GMT) and is equivalent to,
`date$.z.z
Local Date (.z.D)
The variable .z.D retrieves the local date component from the local datetime and is equivalent to,
`date$.z.Z
Startup File (.z.f)
The system variable .z.f is a symbol representing the name of the file or directory provided on the command line when the running instance of q was invoked. For example, if q is invoked from the O/S console with,
q.exe convertargs.q 42 forty 2.0
we find,
.z.f
`convertargs.q
.z.x
("42";"forty";"2.0")
Host (.z.h)
The variable .z.h is a symbol representing the name of the host running the q instance.
.z.h
`macpro.local
Process ID (.z.i)
The system variable .z.i is an int representing the process id of the running q instance.
.z.i
8615
Note: As of this writing (Jun 2007), .z.i is not yet implemented on Windows.
Release Date (.z.k)
The system variable .z.k is a date value representing the release date of the running kdb+ instance.
.z.k
2006.06.01
Release Major Version (.z.K)
The system variable .z.K is a float value representing the major version of the running kdb+ instance.
.z.K
2.4
License Information (.z.l)
The variable .z.l is a list of strings containing information about the license of the running kdb+ instance. The most useful are the items in positions 1 and 2, which represent the expiry date and update date, respectively.
.z.l
("";"2007.07.01";"2007.07.01";,"1";,"1";,"0";,"0")
O/S (.z.o)
The system variable .z.o is a symbol representing the underlying operating system. For example, this tutorial is being written on a 64-bit Mac system.
.z.o
`m64
Process Close (.z.pc)
The variable .z.pc is a q function representing an event handler that is executed whenever a connection to the current q process is closed. See Interprocess Communication for a discussion.
To reset the .z.pc to the default behavior, issue the command,
\x .z.pc
Process Get (.z.pg)
The variable .z.pg is a q function representing an event handler that is executed whenever a client q process makes a synchronous call to the current q process. The name derives from the fact that a synchronous call has get semantics. See Interprocess Communication for a discussion.
To reset the .z.pg to the default setting, issue the command,
\x .z.pg
Process HTTP Get (.z.ph)
The variable .z.ph is a q function representing an event handler that is executed whenever an HTTP get is routed to the current q process. See Interprocess Communication for a discussion.
To reset the .z.ph to the default setting, issue the command,
\x .z.ph
Process Input (.z.pi)
The variable .z.pi is a q function representing an event handler that is executed for each line of input entered at the console, in place of the default evaluate-and-display behavior. You can make the console display mimic that of 2.3 by assigning,
.z.pi:{-1 .Q.s1 value x}
You can make the console display mimic that of 2.4 by assigning,
.z.pi:{-1 .Q.s value x}
To reset the .z.pi to the default setting, issue the command,
\x .z.pi
Process Open (.z.po)
The variable .z.po is a q function representing an event handler that is executed whenever a connection to the current q process is opened. See Interprocess Communication for a discussion.
To reset the .z.po to the default setting, issue the command,
\x .z.po
Process HTTP Post (.z.pp)
The variable .z.pp is a q function representing an event handler that is executed whenever an HTTP post is routed to the current q process.
To reset the .z.pp to the default setting, issue the command,
\x .z.pp
Process Set (.z.ps)
The variable .z.ps is a q function representing an event handler that is executed whenever a client q process makes an asynchronous call to the current q process. The name derives from the fact that an asynchronous call has set semantics. See Interprocess Communication for a discussion.
To reset the .z.ps to the default setting, issue the command,
\x .z.ps
Global Time (.z.t)
The variable .z.t retrieves the time component of Greenwich Mean Time (GMT) and is equivalent to,
`time$.z.z
Local Time (.z.T)
The variable .z.T retrieves the time component of the local datetime and is equivalent to,
`time$.z.Z
Timer Expression (.z.ts)
The variable .z.ts is a q function representing an event handler that is executed on every timer tick (see the command \t). For example, the following displays local time to the console approximately every two seconds.
.z.ts:{0N!`time$.z.Z}
\t 2000
07:20:00.329
07:20:02.332
07:20:04.335
...
User (.z.u)
The variable .z.u is a symbol representing the user id that invoked the running q instance.
.z.u
`Jeffry
Value Set (.z.vs)
The variable .z.vs is a q function representing an event handler that is executed whenever any global variable in the root namespace is assigned in q. You could use .z.vs, for example, to monitor who is modifying certain variables.
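A Python analogue of such a monitor is a namespace object that calls a hook on every assignment, passing the variable name and the assigned indices, much like the handler signature described next. The Watched class and its assign method are hypothetical; as with .z.vs, the hook sees every assignment, so any restriction to particular variables must be coded in the hook itself.

```python
class Watched(dict):
    """Hypothetical namespace that calls hook(name, indices) on each assignment,
    loosely like .z.vs:{[v;i] ...}."""

    def __init__(self, hook):
        super().__init__()
        self.hook = hook

    def assign(self, name, value, index=None):
        """Whole assignment (index None) or indexed assignment like a[2]:42."""
        if index is None:
            self[name] = value
            self.hook(name, ())
        else:
            for i in index:
                self[name][i] = value
            self.hook(name, tuple(index))


log = []
ws = Watched(lambda v, i: log.append((v, i)))
ws.assign("a", list(range(5)))  # like a:til 5
ws.assign("a", 42, index=[2])  # like a[2]:42
ws.assign("a", 6, index=[0, 3])  # like a[0 3]:6
print(ws["a"])  # [6, 1, 42, 6, 4]
print(log)  # [('a', ()), ('a', (2,)), ('a', (0, 3))]
```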
The signature of the handler is,
{[v;i]...}
where v represents a symbol with the name of the variable being assigned and i is the index for which the assignment is applied. The following trivial handler displays v and i to the console.
.z.vs:{[v;i]0N!v;0N!i;}
a:42
`a
()
a:til 5
`a
()
a[2]:42
`a
,2
a[0 3]:6
`a
,0 3
Since the granularity of .z.vs is all or nothing, monitoring only certain variables requires your own filtering logic inside the handler.
To remove the handler, issue the command \x .z.vs.
\x .z.vs
a:42
_
Handle (.z.w)
The variable .z.w contains an int with the connection handle (i.e., “who”) during synchronous or asynchronous request processing. See Interprocess Communication for a discussion.
Command Line Parameters (.z.x)
The system variable .z.x is a list of strings representing the command line parameters provided after the name of the file or directory on the command line when the running instance of q was invoked. For example, if q is invoked from the O/S console with,
q.exe convertargs.q 42 forty 2.0
we find,
.z.f
`convertargs.q
.z.x
("42";"forty";"2.0")
GMT (.z.z)
The variable .z.z is a datetime value representing the current Greenwich Mean Time (GMT) as reported by the operating system.
.z.z
2007.02.02T15:24:28.156
Local Date and Time (.z.Z)
The variable .z.Z is a datetime value representing the current local time as known to the operating system.
.z.Z
2007.02.02T10:24:30.820
Note: The -o startup option or \o command overrides the default time zone offset as determined by the operating system. This is useful when you want to adjust time manually, such as for daylight saving time.
Command Line Parameters
We describe here the options of a q session that can be set via command line parameters. A command line parameter is denoted by a dash (-) and a single character, followed by whitespace and then the value(s) of the parameter. Multiple command line parameters are separated by whitespace and can be entered in any order.
Note: The case of the command line character is significant.
Most command line parameters have equivalent workspace commands denoted by the same character. See Command Format for detailed descriptions and examples.
Console (-c)
The console parameter is a pair of ints that specify the size of the q virtual console display. The first specifies the number of rows and the second the number of columns. The default setting is 23 by 79. This parameter corresponds to the command \c.
Web Browser Console (-C)
The web console parameter (note upper case) is a pair of ints that specify the size of the q web console display. The first parameter specifies the number of rows and the second the number of columns. The default setting is 36 by 2000. This parameter corresponds to the command \C.
Offset (-o)
The offset parameter is an int that sets the offset in hours from GMT used to determine local time in .z.Z. This parameter corresponds to the command \o.
Port (-p)
The port parameter is an int that specifies the port number on which the kdb+ server listens. This parameter corresponds to the command \p.
Print Digits (-P)
The print digits parameter is an int that specifies the display precision for floating point numbers. The default precision is 7, meaning that the display of float or real values is rounded to the seventh significant digit. This parameter corresponds to the command \P.
Timer (-t)
The timer parameter is an int that specifies thenumber of milliseconds between timer ticks, with 0 signifying that the timer isturned off. This parameter corresponds to the command\t.
Timeout(-T)
The timeout parameter (note upper case) is an intthat specifies the number of milliseconds any call from a client will executebefore it is timed out and terminated. The default value is 0 which means notimeout. This parameter corresponds to the command \T.
WorkspaceSize (-w)
The workspace parameter is an int that specifies themaximum workspace size in megabytes. The default value is unlimited. A value of0 means an unlimited workspace. In a multithreaded mode, as each thread has itsown heap, this limit is per thread and not per process.
WeekOffset (-W)
The week offset parameter (note upper case) is anint that specifies the start of week as an offset from Saturday. For example,
q –W 2starts a q session in which Monday is considered thebeginning of the week.
DateFormat (-z)
The date format parameter is a boolean value that specifiesthe format expected in date parsing. A value of 0 corresponds tomm/dd/yyyy;a value of 1 corresponds todd/mm/yyyy. This parameter corresponds tothe command \z.
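Because most of these parameters have workspace command equivalents, the same settings can also be changed from inside a running session by passing the command text (without the backslash) to system. The sketch below is illustrative only: the launch line in the first comment and the particular values are arbitrary choices, not recommendations.

```q
/ Illustrative launch line (values are arbitrary):
/   q -c 30 100 -P 10
/ Runtime equivalents; system takes the command text without the leading backslash
system "c 30 100"      / console size: 30 rows, 100 columns (same as \c 30 100)
system "P 10"          / display floats to 10 significant digits (same as \P 10)
system "t 0"           / timer off (same as \t 0)
/ With no value, each command reports its current setting
system "c"
system "P"
system "t"
```

Querying a setting this way is a convenient check that a command line option actually took effect.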
Contents
- 1 Built-in Functions
- 1.1 Overview
- 1.2 String Functions
- 1.2.1 like
- 1.2.2 lower
- 1.2.3 ltrim
- 1.2.4 rtrim
- 1.2.5 ss
- 1.2.6 ssr
- 1.2.7 string
- 1.2.8 sv
- 1.2.9 trim
- 1.2.10 upper
- 1.2.11 vs
- 1.3 Mathematical Functions
- 1.3.1 acos
- 1.3.2 asin
- 1.3.3 atan
- 1.3.4 cor
- 1.3.5 cos
- 1.3.6 cov
- 1.3.7 cross
- 1.3.8 inv
- 1.3.9 lsq
- 1.3.10 mmu
- 1.3.11 sin
- 1.3.12 tan
- 1.3.13 var
- 1.3.14 wavg
- 1.3.15 wsum
- 1.4 Aggregate Functions
- 1.4.1 all
- 1.4.2 any
- 1.4.3 avg
- 1.4.4 dev
- 1.4.5 med
- 1.4.6 prd
- 1.4.7 sum
- 1.5 Uniform Functions
- 1.5.1 deltas
- 1.5.2 differ
- 1.5.3 fills
- 1.5.4 mavg
- 1.5.5 maxs
- 1.5.6 mcount
- 1.5.7 mdev
- 1.5.8 mins
- 1.5.9 mmax
- 1.5.10 mmin
- 1.5.11 msum
- 1.5.12 next
- 1.5.13 prds
- 1.5.14 prev
- 1.5.15 rank
- 1.5.16 ratios
- 1.5.17 rotate
- 1.5.18 sums
- 1.5.19 xbar
- 1.5.20 xprev
- 1.5.21 xrank
- 1.6 Miscellaneous Functions
- 1.6.1 Conditional Append (?)
- 1.6.2 asc
- 1.6.3 bin
- 1.6.4 count
- 1.6.5 cut
- 1.6.6 delete (_)
- 1.6.7 desc
- 1.6.8 distinct
- 1.6.9 drop (_)
- 1.6.10 eval
- 1.6.11 except
- 1.6.12 exit
- 1.6.13 fill (^)
- 1.6.14 find (?)
- 1.6.15 flip
- 1.6.16 getenv
- 1.6.17 group
- 1.6.18 iasc
- 1.6.19 identity
- 1.6.20 idesc
- 1.6.21 in
- 1.6.22 inter
- 1.6.23 join (,)
- 1.6.24 join-each (,')
- 1.6.25 list
- 1.6.26 null
- 1.6.27 parse
- 1.6.28 rand (?)
- 1.6.29 raze
- 1.6.30 reshape (#)
- 1.6.31 reverse
- 1.6.32 sublist
- 1.6.33 system
- 1.6.34 take (#)
- 1.6.35 til
- 1.6.36 ungroup
- 1.6.37 union
- 1.6.38 value
- 1.6.39 where
- 1.6.40 within
14. Built-in Functions
Overview
The collection of built-in functions in q is rich and powerful. In this chapter, we group functions by form. A string function takes a string and returns a string. An aggregate function takes a list and returns an atom. A uniform function takes a list and returns a list of the same count. A mathematical function takes numeric arguments and returns a numeric result derived by some numerical calculation.
Note that these categories are not mutually exclusive. For example, some mathematical functions are also aggregate functions.
String Functions
The basic string functions perform the usual string manipulations on a list of char. There are also powerful functions that are unique to q.
like
The dyadic like performs pattern matching on its first string argument (source) according to the pattern in its second string argument (pattern). It returns a boolean result indicating whether pattern is matched. The pattern is expressed as a mix of regular characters and special formatting characters. The special chars are "?", "*", the pair "[" and "]", and "^" enclosed in square brackets.
The special char "?" represents an arbitrary single character in the pattern.
    "fan" like "f?n"
1b
    "fun" like "f?n"
1b
    "foP" like "f?p"
0b
The special char "*" represents an arbitrary sequence of characters in the pattern.
Note: As of this writing (Jan 2007), only a single occurrence of * is allowed in the pattern.
    "how" like "h*"
1b
    "hercules" like "h*"
1b
    "wealth" like "*h"
1b
    "flight" like "*h*"
1b
    "Jones" like "J?ne*"
1b
    "Joynes" like "J?ne*"
0b
    "Joynes" like "J*ne*"
'nyi
The special character pair "[" and "]" encloses a sequence of alternatives for a single character match.
    "flap" like "fl[ao]p"
1b
    "flip" like "fl[ao]p"
0b
    "459-0609" like "[0-9][0-9][0-9]-0[0-9][0-9][0-9]"
1b
    "459-0609" like "[0-9][0-9][0-9]-1[0-9][0-9][0-9]"
0b
The special character "^" is used in conjunction with "[" and "]" to indicate that the enclosed sequence of characters is disallowed. For example, to test whether a string does not end in a numeric character,
    "M26d" like "*[^0-9]"
1b
    "Joe999" like "*[^0-9]"
0b
lower
The monadic lower takes a char or string argument and returns the result of converting any alpha characters to lower case.
    lower "A"
"a"
    lower "a Bc42De"
"a bc42de"
ltrim
The monadic ltrim takes a string argument and returns the result of removing leading blanks.
    ltrim " abc "
"abc "
You can also apply ltrim to a non-blank char.
    ltrim "a"
"a"
rtrim
The monadic rtrim takes a string argument and returns the result of removing trailing blanks.
    rtrim " abc "
" abc"
You can also apply rtrim to a non-blank char.
    rtrim "a"
"a"
ss
The dyadic ss ("string search") performs the same pattern matching as like against its first string argument (source), looking for matches to its second string argument (pattern). However, the result of ss is a list containing the position(s) of the matches of pattern in source. See above for a discussion of like.
    "Now is the time for all good men to come to" ss "me"
13 29 38
    "fun" ss "f?n"
,0
If no matches are found, an empty int list is returned.
    "aa" ss "z"
`int$()
Note: You cannot use * to match with ss.
ssr
The triadic ssr ("string search and replace") extends the capability of ss with replacement. The result is a string based on the first string argument (source) in which all occurrences of the second string argument (pattern) are replaced with the third string argument.
    ssr["suffering succotash";"s";"th"]
"thuffering thuccotathh"
Note: You cannot use * to match with ssr.
string
The monadic string can be applied to any q entity to produce a textual representation of the entity. For scalars, lists and functions, the result of string is a list of char that does not contain any q formatting characters. Following are some examples.
    string 42
"42"
    string 6*7
"42"
    string 42422424242j
"42422424242"
    string `Zaphod
"Zaphod"
    f:{[x] x*x}
    string f
"{[x] x*x}"
The next example demonstrates that string is not atomic, because the result of applying it to an atom is a list of char.
    string "4"
,"4"
The next example may be surprising.
    string 0x42
"42"
To see why, recall from Creating Symbols from Strings that a string can be parsed into q data using $ with the appropriate upper-case type domain character. Converting to a string and parsing from a string should be inverse maps, in that their composite returns the original input value. That is, we should find,
    "X"$string 0x42
0x42
Thus, the behavior of string is determined by that of parsing.
    "X"$"42"
0x42
Comparing these two results, we see that the result of string on a byte must not contain the format characters 0x. This reasoning works for other types as well.
Although string is not atomic (it returns a list from an atom), it does act like an atomic function in that its application is extended item-wise to a list.
    string 42 98
"42"
"98"
    string 1 2 3
,"1"
,"2"
,"3"
    string "Beeblebrox"
,"B"
,"e"
,"e"
,"b"
,"l"
,"e"
,"b"
,"r"
,"o"
,"x"
    string (42; `life; ("the"; 0x42))
"42"
"life"
((,"t";,"h";,"e");"42")
Considering a list as a mapping, we see that string acts on the range of the mapping. Viewing a dictionary as a generalized list, we conclude that the action of string on a dictionary should also apply to its range.
    d:1 2 3!100 101 102
    string d
1| "100"
2| "101"
3| "102"
A table is the flip of a column dictionary, so we expect string to operate on the range of the column dictionary.
    t:([] a:1 2 3; b:`a`b`c)
    string t
a    b
---------
,"1" ,"a"
,"2" ,"b"
,"3" ,"c"
Finally, a keyed table is a dictionary, so we expect string to operate on the value table.
    kt:([k:1 2 3] c:100 101 102)
    string kt
k| c
-| -----
1| "100"
2| "101"
3| "102"
sv
The basic form of dyadic sv ("string from vector") takes a char as its left operand and a list of strings (source) as its right operand. It returns a string that is the concatenation of the strings in source, separated by the specified char.
    ";" sv ("Now";"is";"the";"time";"")
"Now;is;the;time;"
When sv is used with an empty symbol as its left operand and a list of symbols as its right operand (source), the result is a symbol in which the items in source are concatenated with a separating dot.
    ` sv `qalib`stat
`qalib.stat
This is useful for q context names.
When sv is used with an empty symbol as its left operand and a symbol list right operand (source) whose first item is a file handle, the result is a symbol in which the items in source are concatenated with a separating forward slash. This is useful for fully qualified q path names.
    ` sv `:`q`tutorial`draft1
`:/q/tutorial/draft1
When sv is used with an int left operand (base) that is greater than 1, together with a right operand of a simple list of place values expressed in base, the result is an int representing the converted base 10 value.
    2 sv 101010b
42
    10 sv 1 2 3 4 2
12342
    256 sv 0x001092
4242
Advanced: More precisely, the last version of sv evaluates the polynomial,
(d[0]*b exp n-1) + (d[1]*b exp n-2) + ... + d[n-1]
where d is the list of digits, n is the count of d, and b is the base.
Thus, we find,
    10 sv 1 2 3 11 2
12412
    -10 sv 2 1 5
195
trim
The monadic trim takes a string argument and returns the result of removing leading and trailing blanks.
    trim " abc "
"abc"
Note: The function trim is equivalent to,
{ltrim rtrim x}
You can also apply trim to a non-blank char.
    trim "a"
"a"
upper
The monadic upper takes a char, string or symbol argument and returns the result of converting any alpha characters to upper case.
    upper "a"
"A"
    upper "a Bc42De"
"A BC42DE"
vs
The dyadic vs ("vector from string") takes a char as its left operand and a string (source) as its right operand. It returns a list of strings containing the tokens of source as delimited by the specified char.
    " " vs "Now is the time "
"Now"
"is"
"the"
"time"
""
When vs is used with an empty symbol as its left operand and a symbol right operand (source) containing separating dots, it returns a simple symbol list obtained by splitting source along the dots.
    ` vs `qalib.stat
`qalib`stat
When vs is used with an empty symbol as its left operand and a symbol representing a fully qualified file name as the right operand, it returns a simple list of symbols in which the first item is the path and the second item is the file name.
    ` vs `:/q/tutorial/draft
`:/q/tutorial`draft
Note that in the last usage, vs is not quite the inverse of sv.
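To see concretely where vs and sv are and are not inverses, the sketch below round-trips a context name and a file handle, then splits a handle that sv assembled from four parts. The symbols are the ones used above; the variable name p is only illustrative.

```q
/ context names: vs splits at every dot and sv rejoins with dots, so they round-trip
` vs `qalib.stat                  / `qalib`stat
` sv ` vs `qalib.stat             / `qalib.stat
/ file handles: vs splits off only the file name, and sv puts it back
` vs `:/q/tutorial/draft          / `:/q/tutorial`draft
` sv ` vs `:/q/tutorial/draft     / `:/q/tutorial/draft
/ a handle built by sv from several parts does not split back into those parts
p:` sv `:`q`tutorial`draft1       / `:/q/tutorial/draft1
` vs p                            / `:/q/tutorial`draft1, not `:`q`tutorial`draft1
```

In other words, sv composed after vs returns the original symbol in both cases, but vs composed after sv does not recover a multi-part file handle list.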
When vs is used with a null of binary type as the left operand and a value of integer type as the right operand (source), it returns a simple list whose items comprise the digits of the corresponding binary representation of source.
    0x00 vs 4242
0x00001092
    10h$0x00 vs 8151631268726338926j
"q is fun"
    0b vs 42
00000000000000000000000000101010b
Advanced: The last form can be used to display the internal representation of special values.
    0b vs 0W
01111111111111111111111111111111b
    0b vs -0W
10000000000000000000000000000001b
Mathematical Functions
The mathematical functions perform the mathematical operations for basic calculations. Their implementations are efficient.
acos
The monadic acos is the mathematical inverse of cos. For a float argument between -1 and 1, acos returns the float between 0 and π whose cosine is the argument.
    sqrt2:1.414213562373095
    acos 1
0f
    acos sqrt2
0n
    acos -1
3.141592653589793
    acos 0
1.570796326794897
asin
The monadic asin is the mathematical inverse of sin. For a float argument between -1 and 1, asin returns the float between -π/2 and π/2 whose sine is the argument.
    sqrt2:1.414213562373095
    asin 0
0f
    asin sqrt2%2
0.7853982
    asin 1
1.570796
    asin -1
-1.570796326794897
atan
The monadic atan is the mathematical inverse of tan. For a float argument, it returns the float between -π/2 and π/2 whose tangent is the argument.
    sqrt2:1.414213562373095
    atan 0
0f
    atan sqrt2
0.9553166181245093
    atan 1
0.7853981633974483
cor
The dyadic cor takes two numeric lists of the same count and returns a float equal to the mathematical correlation between the items of the two arguments.
    23 -11 35 0 cor 42 21 73 39
0.9070229
Note: The function cor is equivalent to,
{cov[x;y]%dev[x]*dev y}
cos
The monadic cos takes a float argument and returns the mathematical cosine of the argument.
    pi:3.141592653589793
    cos 0
1f
    cos pi%3
0.5000000000000001
    cos pi%2
6.123032e-017
    cos pi
-1f
cov
The dyadic cov takes a numeric atom or list in both arguments and returns a float equal to the mathematical covariance between the items of the two arguments. If both arguments are lists, they must have the same count.
    98 cov 42
0f
    23 -11 35 0 cov 42 21 73 39
308.4375
Note: The function cov is equivalent to,
{avg[x*y]-avg[x]*avg y}
cross
The binary cross takes atoms or lists as arguments and returns their Cartesian product, that is, the set of all pairs drawn from the two arguments.
    1 2 cross `a`b`c
1 `a
1 `b
1 `c
2 `a
2 `b
2 `c
Note: The cross operator is equivalent to the function,
{raze x,/:\:y}
inv
The monadic inv returns the inverse of a float matrix.
    m:(1.1 2.1 3.1; 2.3 3.4 4.5; 5.6 7.8 9.8)
    inv m
-8.165138 16.51376 -5
12.20183 -30.18349 10
-5.045872 14.58716 -5
Note: An integer argument will cause an error, so cast it to float.
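Following the note above, here is a minimal sketch of the cast-then-invert pattern. The 2x2 integer matrix is an arbitrary example; multiplying it by its inverse should give the identity up to floating point rounding, so the check uses a small tolerance rather than exact match.

```q
/ an arbitrary integer matrix, cast to float before calling inv
m:"f"$(2 1;1 1)
mi:inv m               / expected (1 -1f;-1 2f), possibly with rounding noise
/ multiplying back should give (approximately) the 2x2 identity
m mmu mi
```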
lsq
The dyadic matrix function lsq returns the matrix X that solves the following matrix equation, where A is the float matrix left operand, B is the float matrix right operand and · is matrix multiplication.
A = X·B
For example,
    A:(1.1 2.2 3.3;4.4 5.5 6.6;7.7 8.8 9.9)
    B:(1.1 2.1 3.1; 2.3 3.4 4.5; 5.6 7.8 9.8)
    A lsq B
1.211009 -0.1009174 2.993439e-12
-2.119266 2.926606 -3.996803e-12
-5.449541 5.954128 -1.758593e-11
Observe that the result of lsq can be obtained as,
    A mmu inv B
1.211009 -0.1009174 1.77991e-12
-2.119266 2.926606 -5.81224e-12
-5.449541 5.954128 -1.337952e-11
Note: Integer arguments will cause an error, so cast them to float.
mmu
The dyadic matrix multiplication function mmu returns the matrix product of its two float vector or matrix arguments, which must be of the correct shape.
Note: Integer arguments will cause an error, so cast them to float.
Here is an example of multiplying a matrix and its transpose.
    m1:(1.1 2.2 3.3;4.4 5.5 6.6;7.7 8.8 9.9)
    m2:flip m1
    m1 mmu m2
16.94 38.72 60.5
38.72 93.17 147.62
60.5 147.62 234.74
The $ operator is overloaded to yield matrix multiplication when its arguments are float vectors or matrices.
    1 2 3f mmu 1 2 3f
14f
    1 2 3f$1 2 3f
14f
sin
The monadic sin takes a float argument and returns the mathematical sine of the argument.
    pi:3.141592653589793
    sin 0
0f
    sin pi%4
0.7071068
    sin pi%2
1f
    sin pi
1.224606e-016
tan
The monadic tan takes a float argument and returns the mathematical tangent of the argument.
Note: The value tan x is (sin x)%cos x
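A quick numerical check of this identity, and of atan undoing tan on the open interval between -π/2 and π/2. The sample points are arbitrary; the comparison uses a small tolerance because the two sides are computed by different routes and may differ by rounding.

```q
pi:acos -1                        / acos -1 is π
x:pi*0 0.05 0.15 0.3 0.45         / arbitrary sample points inside (-π/2; π/2)
/ largest absolute gap between tan x and (sin x)%cos x
max abs (tan x)-(sin x)%cos x
/ largest absolute gap between x and atan tan x
max abs x-atan tan x
```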
    pi:3.141592653589793
    tan 0
0f
    tan pi%8
0.4142136
    tan pi%4
1f
    tan pi%2
1.633178e+016
    tan pi
-1.224606e-016
var
The monadic var takes a scalar or numeric list and returns a float equal to the mathematical variance of the items.
    var 42
0f
    var 42 45 37 38
10.25
Note: The function var is equivalent to
{(avg[x*x]) - (avg[x])*(avg[x])}
wavg
The dyadic wavg takes two numeric lists of the same count and returns the average of the second argument weighted by the first argument. The result is always of type float.
    1 2 3 4 wavg 500 400 300 200
300f
Note: The expression w wavg a is equivalent to,
(sum w*a)%sum w
In our example,
    (sum (1 2 3 4)*500 400 300 200)%sum 1 2 3 4
300f
It is possible to apply wavg to a nested list provided all sublists of both arguments conform. In this context, the result conforms to the sublists and the weighted average is calculated recursively across the sublists.
    (1 2;3 4) wavg (500 400; 300 200)
350 266.6667
    ((1;2 3);(4;5 6)) wavg ((600;500 400);(300;200 100))
360f
285.7143 200
wsum
The dyadic wsum takes two numeric lists of the same count and returns the sum of the second argument weighted by the first argument. The result is always of type float.
    1 2 3 4 wsum 500 400 300 200
3000f
Note: The expression w wsum a is equivalent to,
sum w*a
In our example,
    sum (1 2 3 4)*500 400 300 200
3000
It is possible to apply wsum to a nested list provided all sublists of both arguments conform. In this context, the result conforms to the sublists and the weighted sum is calculated recursively across the sublists.
    (1 2;3 4) wsum (500 400;300 200)
1400 1600
    ((1;2 3);(4;5 6)) wsum ((600;500 400);(300;200 100))
1800
2000 1800
Aggregate Functions
An aggregate function operates on a list and returns an atom. Aggregates are especially useful with grouping in select expressions.
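For instance, with a small hypothetical table (the name trade and its columns are invented for this illustration), each aggregate in a select with by collapses the px values of one group to a single atom, giving one row per group.

```q
/ hypothetical table used only for this illustration
trade:([] sym:`a`b`a`b`a; px:10 20 30 40 50f)
/ one output row per sym; each aggregate reduces that group's px list to an atom
r:select n:count i, total:sum px, mean:avg px, hi:max px by sym from trade
r
/ a: n 3, total 90, mean 30, hi 50;  b: n 2, total 60, mean 30, hi 40
```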
all
The monadic all takes a scalar or list of numeric type and returns the result of & applied across the items.
    all 1b
1b
    all 100100b
0b
    all 10 20 30
10
any
The monadic any takes a scalar or list of numeric type and returns the result of | applied across the items.
    any 1b
1b
    any 100100b
1b
    any 2001.01.01 2006.10.13
2006.10.13
avg
The monadic avg takes a scalar, list, dictionary or table of numeric type and returns the arithmetic average. The result is always of type float.
    avg 42
42f
    avg 1 2 3 4 5
3f
    avg `a`b`c!10 20 40
23.33333
It is possible to apply avg to a nested list provided the sublists conform. In this context, the result conforms to the sublists and the average is calculated recursively on the sublists.
    avg (1 2; 100 200; 1000 2000)
367 734f
    avg ((1 2;3 4); (100 200;300 400))
50.5 101
151.5 202
For tables, the result is a dictionary that maps each column name to the average of its column values.
    t:([]c1:1.1 2.2 3.3 4.4; c2:5 4 3 2)
    t
c1  c2
------
1.1 5
2.2 4
3.3 3
4.4 2
    avg t
c1| 2.75
c2| 3.5
dev
The monadic dev takes a scalar, list, or dictionary of numeric type and returns the standard deviation. The result is a float.
    dev 42
0f
    dev 42 45 37 38
3.201562
    dev `a`b`c!10 20 40
12.47219
Note: The function dev is equivalent to
{sqrt[var[x]]}
med
The monadic med takes a list, dictionary or table of numeric type and returns the statistical median.
For lists and dictionaries, the result is a float.
    med 42 21 73 39
40.5
    med `a`b`c!10 20 40
20f
Note: The function med is equivalent to,
{$[n:count x;.5*sum x[iasc x]@floor .5*n-1 0;0n]}
For tables, the result is a dictionary mapping the column names to their value medians.
    t:([]c1:1.1 2.2 3.3 4.4; c2:5 4 3 2)
    t
c1  c2
------
1.1 5
2.2 4
3.3 3
4.4 2
    med t
c1| 2.75
c2| 3.5
prd
The monadic prd takes a scalar, list, dictionary or table of numeric type and returns the arithmetic product.
For scalars, lists and dictionaries the result has the type of its argument.
    prd 42
42
    prd 1.1 2.2 3.3 4.4 5.5
193.2612
    prd `a`b`c!10 20 40
8000
It is possible to apply prd to a nested list provided the sublists conform. In this case, the result conforms to the sublists and the product is calculated recursively on the sublists.
    prd (1 2; 100 200; 1000 2000)
100000 800000
    prd ((1 2;3 4); (100 200;300 400))
100 400
900 1600
For tables, the result is a dictionary that maps each column name to the product of its column values.
    t:([]c1:1.1 2.2 3.3 4.4; c2:5 4 3 2)
    t
c1  c2
------
1.1 5
2.2 4
3.3 3
4.4 2
    prd t
c1| 35.1384
c2| 120
sum
The monadic sum takes a scalar, list, dictionary or table of numeric type and returns the arithmetic sum.
For scalars, lists and dictionaries the result has the type of its argument.
    sum 42
42
    sum 1.1 2.2 3.3 4.4 5.5
16.5
    sum `a`b`c!10 20 40
70
It is possible to apply sum to a nested list provided the sublists conform. In this case, the result conforms to the sublists and the sum is calculated recursively on the sublists.
    sum (1 2; 100 200; 1000 2000)
1101 2202
    sum ((1 2;3 4); (100 200;300 400))
101 202
303 404
For tables, the result is a dictionary that maps each column name to the sum of its column values.
    t:([]c1:1.1 2.2 3.3 4.4; c2:5 4 3 2)
    t
c1  c2
------
1.1 5
2.2 4
3.3 3
4.4 2
    sum t
c1| 11
c2| 14
Uniform Functions
Uniform functions operate on lists and return lists of the same shape. They are useful in select expressions.
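As an illustration with a hypothetical table (names invented for this sketch), uniform functions return one value per row, so they also fit naturally in update. With by sym, each function runs separately within each group and the results land back on the original rows.

```q
/ hypothetical table used only for this illustration
trade:([] sym:`a`b`a`b`a; px:10 20 30 40 50f)
/ per-sym price change and running total, computed group by group
r:update chg:deltas px, cum:sums px by sym from trade
r
/ chg: 10 20 20 20 20f   cum: 10 20 40 60 90f   (row order unchanged)
```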
deltas
The uniform deltas takes as its argument (source) a scalar, list, dictionary or table of numeric type and returns the difference of each item from its predecessor.
    deltas 42
42
    deltas 1 2 3 4 5
1 1 1 1 1
    deltas 96.25 93.25 58.25 73.25 89.50 84.00 84.25
96.25 -3 -35 15 16.25 -5.5 0.25
    deltas `a`b`c!10 20 40
a| 10
b| 10
c| 20
    t:([]c1:1.1 2.2 3.3 4.4; c2:5 4 3 2)
    t
c1  c2
------
1.1 5
2.2 4
3.3 3
4.4 2
    deltas t
c1  c2
------
1.1 5
1.1 -1
1.1 -1
1.1 -1
Important: As the third example shows, the result of deltas contains the initial item of source in its initial position. This may be inconsistent with the behavior of similar functions in other languages or libraries that return 0 in the initial position. The alternate behavior can be achieved with the expression
1_deltas (1#x),x
In our example above,
    1_deltas (1#x),x:96.25 93.25 58.25 73.25 89.50 84.00 84.25
0 -3 -35 15 16.25 -5.5 0.25
differ
The uniform differ takes as its argument (source) a list and returns a boolean list whose item in position i is 1b exactly when the item at position i does not match (~) the item at position i-1. The result of differ on a scalar is 0b.
Note: The item at position 0 in the result is always 1b.
    differ 1 1 2
101b
    differ 0N 0N 1 1 2
10101b
    differ "mississippi"
11101101101b
    differ (1 2; 1 2; 3 4 5)
101b
One use of differ is to locate runs of repeated items in a list.
    L:0 1 1 2 3 2 2 2 4 1 1 3 4 4 4 4 5
    L where nd|next nd:not differ L
1 1 2 2 2 1 1 4 4 4 4
fills
The uniform fills takes as its argument (source) a scalar, list, dictionary or table of numeric type and returns a copy of source in which non-null items are propagated forward to fill nulls.
    fills 42
42
    fills 1 0N 3 0N 5
1 1 3 3 5
    fills `a`b`c`d`e`f!10 0N 30 0N 0N 60
a| 10
b| 10
c| 30
d| 30
e| 30
f| 60
    tt:([] c1:1 0N 3 0N; c2:`a`b``d)
    tt
c1 c2
-----
1  a
   b
3
   d
    fills tt
c1 c2
-----
1  a
1  b
3  b
3  d
Note: Initial nulls are not affected by fills.
    fills 0N 0N 3 0N 5
0N 0N 3 3 5
mavg
The uniform dyadic mavg takes as its first argument an int (length) and as its second argument (source) a numeric list. It returns the moving average of source, obtained by applying avg over length consecutive items. For positions less than length-1, avg is applied only through that position.
In the following example, the first item in the result is the average of itself only; the second result item is the average of the first two source items; all other items reflect the average of the item at the position along with its two predecessors.
    3 mavg 10 20 30 40 50
10 15 20 30 40f
For length 1, the result is the source converted to float. For length less than or equal to 0 the result is all nulls.
Note: As of release 2.4, mavg ignores null values.
    3 mavg 10 20 0N 40 50 60 0N
10 15 15 30 45 50 55f
maxs
The uniform maxs takes as its argument (source) a scalar, list, dictionary or table and returns the cumulative maximum of the source items.
    maxs 42
42
    maxs 1 2 5 4 10
1 2 5 5 10
    maxs "Beeblebrox"
"Beeelllrrx"
    maxs `a`b`c`d!10 30 20 40
a| 10
b| 30
c| 30
d| 40
    t:([]c1:1.1 2.2 3.3 4.4; c2:5 4 3 2)
    t
c1  c2
------
1.1 5
2.2 4
3.3 3
4.4 2
    maxs t
c1  c2
------
1.1 5
2.2 5
3.3 5
4.4 5
mcount
The uniform dyadic mcount takes as its first argument an int (length) and as its second argument (source) a numeric list. It returns the moving count of source, obtained by applying count over length consecutive items. For positions less than length-1, count is applied only through that position.
This function is useful in computing other moving quantities. Its basic behavior is as follows.
    3 mcount 10 20 30 40 50
1 2 3 3 3
For length less than or equal to 0 the result is all zeroes.
Note: As of release 2.4, mcount ignores null values.
    3 mcount 10 20 0N 40 50 60 0N
1 2 2 2 2 3 2
mdev
The uniform dyadic mdev takes as its first argument an int (length) and as its second argument (source) a numeric list. It returns the moving standard deviation of source, obtained by applying dev over length consecutive items. For positions less than length-1, dev is applied only through that position.
In the following example, the first item in the result is the standard deviation of itself only; the second result item is the standard deviation of the first two source items; all other items reflect the standard deviation of the item at the position along with its two predecessors.
    3 mdev 10 20 30 40 50
0 5 8.164966 8.164966 8.164966
For length less than or equal to 0 the result is all nulls.
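To make the moving standard deviation less of a black box, here is a sketch that rebuilds it from mavg using the population-variance identity (mean of squares minus square of mean) that also underlies the definition of var above. The name mdevSketch is hypothetical, not a built-in, and the check uses null-free data only.

```q
/ hypothetical helper, not a built-in:
/ moving population std dev = sqrt of (moving mean of squares - (moving mean) squared)
mdevSketch:{[n;x] sqrt mavg[n;x*x]-m*m:mavg[n;x]}
v:10 20 30 40 50
mdevSketch[3;v]           / compare with 3 mdev v
```

The assignment m:mavg[n;x] is evaluated first (q evaluates right to left), so the moving mean is computed once and reused for the square.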
mins
The uniform mins takes as its argument (source) a scalar, list, dictionary or table and returns the cumulative minimum of the source items.
    mins 42
42
    mins 10 4 5 1 2
10 4 4 1 1
    mins "Beeblebrox"
"BBBBBBBBBB"
    mins `a`b`c`d!40 10 30 20
a| 40
b| 10
c| 10
d| 10
    t:([]c1:1.1 2.2 3.3 4.4; c2:5 4 3 2)
    t
c1  c2
------
1.1 5
2.2 4
3.3 3
4.4 2
    mins t
c1  c2
------
1.1 5
1.1 4
1.1 3
1.1 2
mmax
The uniform dyadic mmax takes as its first argument an int (length) and as its second argument (source) a numeric list. It returns the moving maximum of source, obtained by applying max over length consecutive items. For positions less than length-1, max is applied only through that position.
In the following example, the first item in the result is the max of itself only; the second result item is the max of the first two source items; all other items reflect the max of the item at the position along with its two predecessors.
    3 mmax 20 10 30 50 40
20 20 30 50 50
For length less than or equal to 0 the result is source.
mmin
The uniform dyadic mmin takes as its first argument an int (length) and as its second argument (source) a numeric list. It returns the moving minimum of source, obtained by applying min over length consecutive items. For positions less than length-1, min is applied only through that position.
In the following example, the first item in the result is the min of itself only; the second result item is the min of the first two source items; all other items reflect the min of the item at the position along with its two predecessors.
    3 mmin 20 10 30 50 40
20 10 10 10 30
For length less than or equal to 0 the result is source.
msum
The uniform dyadic msum takes as its first argument an int (length) and as its second argument (source) a numeric list. It returns the moving sum of source, obtained by applying sum over length consecutive items. For positions less than length-1, sum is applied only through that position.
In the following example, the first item in the result is the sum of itself only; the second result item is the sum of the first two source items; all other items reflect the sum of the item at the position along with its two predecessors.
    3 msum 10 20 30 40 50
10 30 60 90 120
For length less than or equal to 0 the result is all zeros.
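The moving functions fit together. The sketch below, using hypothetical helper names, rebuilds a moving sum from running sums (sums) and a shifted copy (xprev, described later in this chapter), and a moving average as a moving sum divided by a moving count. On this null-free data both should agree with the built-ins.

```q
v:10 20 30 40 50
/ hypothetical helper: moving sum = running sum minus the running sum n places back;
/ xprev leaves n leading nulls, which 0^ replaces with 0
msumSketch:{[n;x] s-0^n xprev s:sums x}
/ hypothetical helper: moving average = moving sum % moving count
mavgSketch:{[n;x] (n msum x)%n mcount x}
msumSketch[3;v]           / 10 30 60 90 120, as 3 msum v
mavgSketch[3;v]           / 10 15 20 30 40f, as 3 mavg v
```

The running-sum formulation does a constant amount of work per item regardless of length, which is why it is a common way to compute moving sums.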
next
The uniform next takes as its argument (source) a scalar, list, dictionary or table of numeric type and returns source shifted one position to the left with no wrapping. For lists and dictionaries, the last item of the result is a null matching the type of source. For tables, the last record of the result is a row of nulls.
    next 1 2 3 4 5
2 3 4 5 0N
    t:([]c1:1.1 2.2 3.3 4.4; c2:5 4 3 2)
    t
c1  c2
------
1.1 5
2.2 4
3.3 3
4.4 2
    next t
c1  c2
------
2.2 4
3.3 3
4.4 2
prds
The uniform prds takes as its argument (source) a scalar, list, dictionary or table of numeric type and returns the cumulative product of the source items.
    prds 42
42
    prds 1 2 3 4 5
1 2 6 24 120
    prds `a`b`c!10 20 40
a| 10
b| 200
c| 8000
    t:([]c1:1.1 2.2 3.3 4.4; c2:5 4 3 2)
    t
c1  c2
------
1.1 5
2.2 4
3.3 3
4.4 2
    prds t
c1      c2
-----------
1.1     5
2.42    20
7.986   60
35.1384 120
prev
The uniform prev takes as its argument (source) a scalar, list, dictionary or table. It returns source shifted one position forward with initial null filling.
    prev 42
42
    prev 1 2 3 4 5
0N 1 2 3 4
    prev `a`b`c!10 20 40
a|
b| 10
c| 20
    t:([]c1:`a`b`c;c2:10 20 40)
    t
c1 c2
-----
a  10
b  20
c  40
    prev t
c1 c2
-----
a  10
b  20
rank
The uniform rank takes as its argument (source) a list, dictionary or table whose values are sortable. It returns a list of int containing the position of each item of source under an ascending sort. For dictionaries, the operation is against the range.
    rank 5 2 3 1 4
4 1 2 0 3
    rank `a`b`c`e`f!5 2 3 1 4
4 1 2 0 3
For tables and keyed tables, the result is a list with the rank of the records under an ascending sort of the first column or the key column.
    ttt:([] c1:2.2 1.1 3.3 5.5 4.4; c2:1 2 3 4 5)
    ttt
c1  c2
------
2.2 1
1.1 2
3.3 3
5.5 4
4.4 5
    rank ttt
1 0 2 4 3
    kt:([k:103 102 101 105 104] d:1 2 3 4 5)
    kt
k  | d
---| -
103| 1
102| 2
101| 3
105| 4
104| 5
    rank kt
2 1 0 4 3
ratios
The uniform ratios takes as its argument (source) a scalar, list, dictionary or table of numeric type and returns the float ratio of each item to its predecessor.
    ratios 42
42
    ratios 1 2 3 4 5
1 2 1.5 1.333333 1.25
    ratios 96.25 93.25 58.25 73.25 89.50 84.00 84.25
96.25 0.9688312 0.6246649 1.257511 1.221843 0.9385475 1.002976
    ratios `a`b`c!10 20 40
a| 10
b| 2
c| 2
    t:([]c1:1.1 2.2 3.3 4.4; c2:5 4 3 2)
    t
c1  c2
------
1.1 5
2.2 4
3.3 3
4.4 2
    ratios t
c1       c2
------------------
1.1      5
2        0.8
1.5      0.75
1.333333 0.6666667
Important: As the third example shows, the result of ratios contains the initial item of source in its initial position. This may be inconsistent with the behavior of similar functions in other languages or libraries that return 1 in the initial position. The alternate behavior can be achieved with the expression,
1_ratios (1#x),x
In our example above,
    1_ratios (1#x),x:96.25 93.25 58.25 73.25 89.50 84.00 84.25
1 0.9688312 0.6246649 1.257511 1.221843 0.9385475 1.002976
rotate
The uniform dyadic rotate takes as its first argument an int (length) and as its second argument (source) a numeric list or table. It returns source shifted length positions to the left with wrapping if length is positive, or the absolute value of length positions to the right with wrapping if length is negative. For length 0, it returns source.
    2 rotate 1 2 3 4 5
3 4 5 1 2
    -2 rotate 1 2 3 4 5
4 5 1 2 3
    t:([]c1:1.1 2.2 3.3 4.4; c2:5 4 3 2)
    t
c1  c2
------
1.1 5
2.2 4
3.3 3
4.4 2
    2 rotate t
c1  c2
------
3.3 3
4.4 2
1.1 5
2.2 4
sums
The uniform sums takes as its argument (source) a scalar, list, dictionary or table of numeric type and returns the cumulative sum of the source items.
    sums 42
42
    sums 1 2 3 4 5
1 3 6 10 15
    sums `a`b`c!10 20 40
a| 10
b| 30
c| 70
    t:([]c1:1.1 2.2 3.3 4.4; c2:5 4 3 2)
    t
c1  c2
------
1.1 5
2.2 4
3.3 3
4.4 2
    sums t
c1  c2
------
1.1 5
3.3 9
6.6 12
11  14
xbar
The uniform dyadic xbar takes as its first argument a non-negative numeric atom (width) and as its second argument (source) a numeric list, dictionary or table. It returns an entity that conforms to source, in which each item of source is mapped to the largest multiple of width that is less than or equal to that item. The type of the result is that of the width parameter.
    3 xbar 2 7 12 17 22
0 6 12 15 21
    5.5 xbar 59.25 53.75 81.00 96.25 93.25 58.25 73.25 89.50 84.00 84.25
55 49.5 77 93.5 88 55 71.5 88 82.5 82.5
    15 xbar `a`b`c!10 20 40
a| 0
b| 15
c| 30
    t:([]c1:1.1 2.2 3.3 4.4; c2:5 4 3 2)
    t
c1  c2
------
1.1 5
2.2 4
3.3 3
4.4 2
    2 xbar t
c1 c2
-----
0  4
2  4
2  2
4  2
Since xbar is atomic in its second argument, it can be applied to a nested list.
    5 xbar ((11;21 31);201 301)
(10;20 30)
200 300
xprev
The dyadic xprev takes an int as its first argument (shift) and is uniform in its second argument (source), which can be a list or a table. It returns a result that conforms to source. When shift is 0 or positive, each entity in source is shifted shift positions forward in the result, with the initial shift entries null filled.
    2 xprev 10 20 30 40
0N 0N 10 20
    t:([]c1:`a`b`c`d;c2:10 20 30 40)
    t
c1 c2
-----
a  10
b  20
c  30
d  40
    2 xprev t
c1 c2
-----
a  10
b  20
When shift is negative, the items of source are shifted the absolute value of shift positions backward, and the final entries are null filled.
    -2 xprev 10 20 30 40
30 40 0N 0N
xrank
The binary xrank is uniform in its right operand (source), which is a list, dictionary, table or keyed table whose values are sortable. The left operand is a positive int (quantile). It returns a list of int containing the quantile of the source distribution to which each item of source belongs. The analysis is applied to the range of a dictionary and the first column of a table.
For example, by choosing quantile to be 4, xrank determines into which quartile each item of source falls.
    4 xrank 30 10 40 20 90
1 0 2 0 3
    4 xrank `a`b`c`d`e!30 10 40 20 90
1 0 2 0 3
    t:([]c1:30 10 40 20 90;c2:`a`b`c`d`e)
    t
c1 c2
-----
30 a
10 b
40 c
20 d
90 e
    4 xrank t
1 0 2 0 3
Choosing quantile to be 100 gives percentile ranking.
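The bucketing can be written out directly from rank: scale each item's rank by quantile and divide by the count. The sketch below (hypothetical name xrankSketch, integer arithmetic) agrees with xrank on the data above; the last line shows the percentile case.

```q
v:30 10 40 20 90
/ hypothetical helper: bucket = (quantile * rank) div count
xrankSketch:{[n;x] (n*rank x) div count x}
xrankSketch[4;v]          / 1 0 2 0 3, as 4 xrank v
100 xrank v               / 40 0 60 20 80: percentile of each item
```

Because rank runs from 0 to count-1, the bucket index always falls between 0 and quantile-1.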
MiscellaneousFunctions
We collect here the built-in functions that don't fit intoany of the previously defined categories.
Conditional Append (?)
The left operand of conditional append ( ? ) is a symbol representing the name of a list of symbols (target), and the right operand is a symbol. The right operand is appended to target if and only if it is not already in target; there is no effect when it is already present. The result is the enumeration of the right operand in target.
v:`a`b`c
`v?`z
`v$`z
v
`a`b`c`z
`v?`b
`v$`b
v
`a`b`c`z
Note: While conditional append is normally used with a target list of unique items, this is not a requirement.
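The append-if-absent behavior can be mimicked in Python as shown below. This is only a model:
- A Python list of strings stands in for the named symbol list v.
- q returns an enumeration of the symbol over target. The sketch instead returns the plain index that such an enumeration records.

```python
def conditional_append(target, sym):
    """Python model of q's `target?sym`: append sym only if absent, return its index."""
    if sym not in target:
        target.append(sym)        # update the list in place, as `v?`z updates v
    return target.index(sym)      # first occurrence, even if target has duplicates

v = ["a", "b", "c"]
print(conditional_append(v, "z"), v)   # 3 ['a', 'b', 'c', 'z']
print(conditional_append(v, "b"), v)   # 1 ['a', 'b', 'c', 'z']
```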
asc
The monadic function asc operates on a list or a dictionary (source). The result of asc on a list is a list comprising the items of source sorted in increasing order with the `s# attribute applied. The result of asc on a dictionary is an equivalent mapping with the range items sorted in increasing order and with the `s# attribute applied.
asc 3 7 2 8 1 9
`s#1 2 3 7 8 9
asc `b`c`a!3 2 1
a| 1
c| 2
b| 3
bin
The dyadic bin takes a simple list of items (target) in strictly increasing order as its first argument and is atomic in its second argument (token). Loosely speaking, the result of bin is the position at which token would fall in target.
More precisely, the result is -1 if token is less than the first item in target. Otherwise, the result is the position of the right-most item of target that is less than or equal to token; this reduces to the found position if token is in target. If token is greater than the last item in target, the result is the position of the last item, which is one less than the count of target.
Note: For large sorted lists, the binary search performed by bin is generally more efficient than the linear search algorithm used by in.
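Python's standard bisect module performs the same kind of binary search. For simple sorted lists, the rule above can therefore be sketched as bisect_right(target, token) - 1. This is a Python model for illustration, not q:
- It maps over a list token, since bin is atomic in its second argument.
- It does not reproduce q's strict type check.

```python
import bisect

def q_bin(target, token):
    """Python model of q's `target bin token` for a sorted simple list."""
    if isinstance(token, list):
        return [q_bin(target, t) for t in token]    # atomic in token
    # count of items <= token, minus one: the right-most such position, or -1
    return bisect.bisect_right(target, token) - 1

print(q_bin([1, 2, 3, 4], 3))                          # 2
print(q_bin([1.0, 2.0, 3.0], [0.0, 2.0, 2.5, 3.0]))    # [-1, 1, 1, 2]
print(q_bin([1, 2, 3], 10))                            # 2, the last position
```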
Some examples with simple lists,
1 2 3 4 bin 3
2
"xyz" bin "a"
-1
1.0 2.0 3.0 bin 0.0 2.0 2.5 3.0
-1 1 1 2
Observe that the type of token must strictly match that of target.
1 2 3 bin 1.5
'type
We can apply bin to a dictionary to perform reverse lookup, provided the dictionary range is in increasing order. When target is a dictionary, bin takes a token whose type matches that of the dictionary range. The result is null if token is less than every item of the range. Otherwise, the result is the right-most domain element whose corresponding range element is less than or equal to token. Loosely put, when token is not found, the result is the domain item after which you would make an insertion to place it into the dictionary in proper order.
Note that the result reduces to the corresponding domain item if token is found in target, and is the last domain item if token is greater than every range item.
d:10 20 30!`first`second`third
d bin `second
20
d bin `missing
10
d bin `zero
30
d bin `aaa
0N
Because a table is a list of records, we expect bin to return the row number of a record.
t:([] a:1 2 3; b:`a`b`c)
t
a b
---
1 a
2 b
3 c
t bin `a`b!(2;`b)
1
As always, the record can be abbreviated to the list of row values.
t bin (1;`a)
0
t bin (0;`z)
0N
Observe that a record that is not found results in a null result.
Finally, since a keyed table is a dictionary, bin will perform a reverse lookup on a record of the value table, which can be abbreviated to a list of row values.
kt:([k:1 2 3] c:100 101 102)
kt
k| c
-| ---
1| 100
2| 101
3| 102
kt bin (enlist `c)!enlist 101
k| 2
kt bin 101
k| 2
Warning: While the items of the first argument of bin should be in strictly increasing order for the result to be meaningful, this condition is not enforced. The results of bin when the first argument is not strictly increasing are predictable but not particularly useful.
count
The monadic count returns a non-negative int representing the number of entities in its argument. Its domain comprises scalars, lists, dictionaries, tables and keyed tables.
count 3
1
count 10 20 30
3
count `a`b`c`d!10 20 30 40
4
count ([] a:10 20 30; b:1.1 2.2 3.3)
3
count ([k:10 20] c:`one`two)
2
Note: You cannot use count to determine whether an entity is a scalar or a list, since scalars and singletons both have count 1.
count 3
1
count enlist 3
1
This test is accomplished instead by testing the sign of the type of the entity.
0>type 3
1b
0>type enlist 3
0b
Aside: Do you know why they call it count? Because it loves to count!! Nyah, ha, ha, ha, ha. Vun, and two, and tree, and....
cut
The binary operator cut is related to the _ operator. It is the same as _ when the right operand is a dictionary and the left operand is a list of items from the dictionary domain.
d:1 2 3!`a`b`c
(enlist 2) cut d
1| a
3| c
However, for a list right operand source and an int left operand size, cut returns a new list created by collecting the items of source into sublists of count size.
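The chunking can be sketched in Python. This is an illustration, not q's implementation. It follows the same idea as the equivalent definition given for cut and works in two steps:
1. A model of the positional _ splits a list at ascending start positions.
2. cut generates the start positions 0, size, 2*size, ... for ceiling(count/size) chunks.

The function names are Python stand-ins.

```python
import math

def cut_at(positions, source):
    """Python model of q's `positions _ source`: split at ascending start positions."""
    ends = positions[1:] + [len(source)]
    return [source[s:e] for s, e in zip(positions, ends)]

def q_cut(size, source):
    """Python model of q's `size cut source` for a positive int size."""
    n_chunks = math.ceil(len(source) / size)        # ceiling(count / size)
    return cut_at([size * k for k in range(n_chunks)], source)

print(q_cut(5, list(range(13))))                   # [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [10, 11, 12]]
print(cut_at([0, 3], [100, 200, 300, 400, 500]))   # [[100, 200, 300], [400, 500]]
```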
5 cut til 13
0 1 2 3 4
5 6 7 8 9
10 11 12
Advanced: The cut function is equivalent to,
{$[0>type x;x*til neg floor neg(count y)%x;x]_y}
delete (_)
The symbol _ is overloaded to have several meanings depending on the signature of its operands. See also drop.
Note: When _ is used as an operator, whitespace is required to the left if the left operand is a name. This is because _ is a valid non-initial name character. Whitespace is permitted but not required to the right.
When the first argument of dyadic ( _ ) is a list of non-negative int and the second argument (source) is a list, it produces a new list obtained by breaking source into sublists at the positions indicated in the first argument. An example will make this clear.
0 3_100 200 300 400 500
100 200 300
400 500
Each sublist includes the items from the beginning cut position up to, but not including, the next cut position. The final cut includes the items to the end of source. Observe that if the left argument does not begin with 0, the initial items of source will not be included in the result.
2 4_2006.01 2006.02 2006.03 2006.04 2006.05 2006.06
2006.03 2006.04
2006.05 2006.06
When the right operand of _ is a dictionary (source) and the left operand is a list of key values whose type matches the domain of source, the result is a dictionary obtained by removing the specified key-value pairs from source.
For example,
d:1 2 3!`a`b`c
(enlist 42) _ d
1| a
2| b
3| c
(enlist 2) _ d
1| a
3| c
1 3 _ d
2| b
(enlist 32) _ d
1| a
2| b
3| c
1 2 3 _ d
_
Note: The operand must be a list, so a single key value must be enlisted.
When the first argument of dyadic delete ( _ ) is a list or a dictionary (source) and the second argument is a position in the list or an item in the domain of the dictionary, the result is a new entity obtained by deleting the specified item from source.
L:101 102 103 104 105
L _2
101 102 104 105
d:`a`b`c`d!101 102 103 104
d _ `b
a| 101
c| 103
d| 104
Since a table is a list, delete can be applied by row number.
t:([]c1:1 2 3;c2:101 102 103;c3:`x`y`z)
t
c1 c2  c3
---------
1  101 x
2  102 y
3  103 z
t _ 1
c1 c2  c3
---------
1  101 x
3  103 z
Since a keyed table is a dictionary, delete can be applied by key value.
kt:([k:101 102 103]c:`one`two`three)
kt
k  | c
---| -----
101| one
102| two
103| three
kt _ 102
k  | c
---| -----
101| one
103| three
desc
The monadic function desc operates on a list or a dictionary (source). The result of desc on a list is a list comprising the items of source sorted in decreasing order. The result of desc on a dictionary is an equivalent mapping with the range items sorted in decreasing order. Unlike asc, desc does not apply the `s# attribute, which denotes ascending order, as the example output shows.
desc 3 7 2 8 1 9
9 8 7 3 2 1
desc `b`c`a!3 2 1
b| 3
c| 2
a| 1
distinct
The monadic function distinct returns the distinct entities in its argument. For a list, it returns the distinct items in the list, in order of first occurrence.
distinct 1 2 3 2 3 4 6 4 3 5 6
1 2 3 4 6 5
For a table, distinct returns a table comprising the distinct records of the argument, in the order of first occurrence.
tdup:([]a:1 2 3 2 1; b:`washington`adams`jefferson`adams`wasington)
tdup
a b
------------
1 washington
2 adams
3 jefferson
2 adams
1 wasington
distinct tdup
a b
------------
1 washington
2 adams
3 jefferson
1 wasington
Observe that all fields of the records must be identical for the records to be considered identical. Otherwise put, if any field differs, the records are distinct.
When applied to an int n, distinct produces a random int between 0 (inclusive) and n (exclusive).
distinct 42
37
distinct 42
39
drop (_)
The symbol _ is overloaded to have several meanings depending on the signature of its operands. See also delete.
Note: When _ is used as an operator, whitespace is required to the left if the left operand is a name. This is because _ is a valid non-initial name character. Whitespace is permitted but not required to the right.
When the first argument of the dyadic _ is an int and the second argument (source) is a list, the result is a new list created via removal from source. A positive int in the first argument indicates that the removal occurs from the beginning of source, whereas a negative int in the first argument indicates that the removal occurs from the end of source.
The source can be a list, a dictionary, a table or a keyed table.
2_10 20 30 40
30 40
-3_`one`two`three`four`five
`one`two
2_`a`b`c`d!10 20 30 40
c| 30
d| 40
-1_([] a:10 20 30 40; b:1.1 2.2 3.3 4.4)
a  b
------
10 1.1
20 2.2
30 3.3
2_([k:10 20 30] c:`one`two`three)
k | c
--| -----
30| three
The result of drop is of the same type and shape as source and is never a scalar.
1_42 67
,67
Observe that for nested lists, the deletion occurs at the top-most level.
1_(100 101 102;103 104 105)
,103 104 105
In the degenerate case, the result is an empty entity derived from source.
4_10 20 30 40
`int$()
4_`a`b`c`d!10 20 30 40
4_([] a:10 20 30 40; b:1.1 2.2 3.3 4.4)
a b
---
3_([k:10 20 30] c:`one`two`three)
k| c
-| -
eval
The monadic eval evaluates a list that represents a valid q parse tree, which can be produced by parse or by hand (if you know what you're doing). A discussion of parse trees is beyond the scope of this manual.
show pt:parse "a:6*7"
:
`a
(*;6;7)
eval pt
42
except
The dyadic except takes a simple list, or a dictionary whose range is a simple list, as its first argument (target). It returns a list containing the items of target excluding those that are in its second argument, which can be a scalar or a list. The returned items are in the order of their first occurrence in target.
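A Python model of except on a simple list is a filter. This is an illustration, not q. It keeps the items of target that are not found in the second argument, preserving their order and any repeats. A scalar second argument is wrapped into a one-item list, and the result is always a list.

```python
def q_except(target, exclude):
    """Python model of q's `target except exclude` for a simple list target."""
    if not isinstance(exclude, (list, str)):
        exclude = [exclude]           # a scalar acts like a one-item list
    # keep target's order; repeated items of target that survive stay repeated
    return [item for item in target if item not in exclude]

print(q_except([1, 2, 3, 4, 3, 2], 2))            # [1, 3, 4, 3]
print(q_except([1, 2, 3, 4, 3, 2], [1, 2, 10]))   # [3, 4, 3]
print(q_except([1, 2], [2, 1]))                   # []
```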
1 2 3 4 3 2 except 2
1 3 4 3
1 2 3 4 3 2 except 1 2 10
3 4 3
"Now is the time_" except "_"
"Now is the time"
d:`a`c`d`e!1 2 1 2
d except 1
2 2
The result of except is never a scalar.
1 2 except 1
,2
1 2 except 2 1
`int$()
d except 1 2
`int$()
exit
The monadic exit takes an int as its argument and executes the system command \\ with the specified parameter, which becomes the exit status of the q process.
Warning: exit does not prompt for a confirmation.
fill (^)
The dyadic fill ( ^ ) takes an atom as its first argument and a list or dictionary (target) as its second argument. For a list, it returns a list obtained by substituting the first argument for every occurrence of null in target. For a dictionary, it operates on the range.
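In Python terms, fill replaces every null with the given atom. This is an illustration, not q, and None stands in for q's null. The function applies itself to each item of a list and to each value of a dictionary. The recursion into sublists mirrors the recursive behavior noted for fill in q.

```python
def fill(value, target):
    """Python model of q's `value ^ target`; None plays the role of null."""
    if isinstance(target, list):
        return [fill(value, item) for item in target]          # recurse into sublists
    if isinstance(target, dict):
        return {k: fill(value, v) for k, v in target.items()}  # act on the range
    return value if target is None else target

print(fill(42, [1, 2, 3, None, 5, None]))       # [1, 2, 3, 42, 5, 42]
print(fill(42, {"a": 100, "b": None}))          # {'a': 100, 'b': 42}
print(fill(42, [1, None, [100, [200, None]]]))  # [1, 42, [100, [200, 42]]]
```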
42^1 2 3 0N 5 0N
1 2 3 42 5 42
";"^"Now is the time"
"Now;is;the;time"
`NULL^`First`Second``Fourth
`First`Second`NULL`Fourth
d:`a`b`c`d!100 0N 200 0N
42^d
a| 100
b| 42
c| 200
d| 42
Observe that the action of fill is recursive; that is, it is applied to sublists of the target.
42^(1;0N;(100;200 0N))
1
42
(100;200 42)
find (?)
When the first argument (target) of find ( ? ) is a simple list, find is atomic in the second argument (source) and returns the positions in target of the initial occurrence of each item of source.
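For a simple list target, find can be modeled in Python as follows. This is an illustration only, not q:
- The result is the index of the first occurrence.
- The count of target stands in for a miss.
- When the second argument is a list, the rule is applied item by item.

Python's list.index already returns the first occurrence, so the sketch only adds the not-found rule. It does not model the general-list, dictionary or table cases.

```python
def q_find(target, source):
    """Python model of q's `target ? source` for a simple list target."""
    if isinstance(source, list):
        return [q_find(target, s) for s in source]     # atomic in source
    # first occurrence, or one past the last position when absent
    return target.index(source) if source in target else len(target)

print(q_find([100, 99, 98, 87, 96], 98))              # 2
print(q_find(["one", "two", "three"], "four"))        # 3
print(q_find(list("Now is the time"), list("time")))  # [7, 4, 13, 9]
```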
The simplest case is when source is a scalar.
100 99 98 87 96?98
2
"Now is the time"?"t"
7
If source is not found in target, find returns the count of target, that is, the position one past the last element.
`one`two`three?`four
3
In this context, find is atomic in its second argument, so it is extended item-wise to a source list.
"Now is the time"?"the"
7 8 9
Note that find always returns the position of the first occurrence of each atom.
"Now is the time"?"time"
7 4 13 9
When the first argument (target) of find is a general list, find considers both arguments to be general lists and attempts to locate the second argument (source) in the target, returning the position where it is found or the count of target if not found.
(1 2;3 4)?3 4
1
Observe that find only compares items at the top level of the two arguments and does not look for nested items,
((0;1 2);3 4;5 6)?1 2
3
((0;1 2);3 4;5 6)?(1;(2;3 4))
3
When the first argument (target) of find is a dictionary, find represents reverse lookup and is atomic in the second argument (source). In other words, find returns the domain item mapping to source if source is in the range, or a null appropriate to the domain type otherwise.
d:1 2 3!100 101 102
d
1| 100
2| 101
3| 102
d?101
2
d?99
0N
d?102 100
3 1
When the first argument (target) of find is a table and the second argument (source) is a record of the target, find returns the position of source if it is in target, or the count of target otherwise.
t:([] a:1 2 3; b:`a`b`c)
t
a b
---
1 a
2 b
3 c
t?`a`b!(2;`b)
1
As usual with records, you can abbreviate the record to its row values.
t?(3;`c)
2
When the first argument of find is a keyed table, since a keyed table is a dictionary, find performs a reverse lookup on a record from the value table.
kt:([k:1 2 3] c:100 101 102)
kt
k| c
-| ---
1| 100
2| 101
3| 102
kt?`c!101
k| 2
Again, a record of the value table can be abbreviated to its row value(s).
kt?102
k| 3
flip
The monadic function flip takes a rectangular list, a column dictionary or a table as its argument (source). The result is the transpose of source.
When source is a rectangular list, the items are rearranged, effectively reversing the first two indices in indexing at depth. For example,
L:(1 2 3; (10 20; 100 200; 1000 2000))
L
1 2 3
10 20 100 200 1000 2000
L[1;0]
10 20
fL:flip L
fL
1 10 20
2 100 200
3 1000 2000
fL[0;1]
10 20
When source is a singleton list whose item is a simple list, flip creates a vertical list.
flip enlist 101 103
101
103
This idiom is used to index multiple key values into keyed tables.
kt:([k:101 102 103] c:`one`two`three)
kt flip enlist 101 103
c
-----
one
three
When source is a column dictionary, the result is a table with the given column names and values. Row and column access are effectively reversed, but no data is rearranged.
d:`a`b`c!(1 2 3;1.1 2.2 3.3;("one";"two";"three"))
d
a| 1 2 3
b| 1.1 2.2 3.3
c| "one" "two" "three"
d[`b;0]
1.1
t:flip d
t
a b   c
-------------
1 1.1 "one"
2 2.2 "two"
3 3.3 "three"
t[0;`b]
1.1
When source is a table, the result is the underlying column dictionary. Row and column access are effectively reversed, but no data is rearranged.
t:([]a:1 2 3;b:1.1 2.2 3.3;c:("one";"two";"three"))
t
a b   c
-------------
1 1.1 "one"
2 2.2 "two"
3 3.3 "three"
t[1;`c]
"two"
d:flip t
d
a| 1 2 3
b| 1.1 2.2 3.3
c| "one" "two" "three"
d[`c;1]
"two"
getenv
The monadic function getenv takes a symbol argument representing the name of an OS environment variable and returns the value (if any) of that environment variable.
getenv `SHELL
"/bin/bash"
group
The monadic function group operates on a list (source) and returns a dictionary in which each distinct item in source is mapped to a list of the indices of its occurrences in source. The items in the domain of the result are in the order of their first appearance in source.
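A Python model of group builds a dictionary that maps each distinct item to the list of its positions. This is an illustration, not q. Python dictionaries preserve insertion order, so the keys come out in order of first appearance, matching the ordering rule above.

```python
def group(source):
    """Python model of q's `group` on a list (or string)."""
    result = {}
    for i, item in enumerate(source):
        result.setdefault(item, []).append(i)   # first sighting creates the key
    return result

g = group("i miss mississippi")
print(g["i"], g["s"])                      # [0, 3, 8, 11, 14, 17] [4, 5, 9, 10, 12, 13]
print({k: len(v) for k, v in g.items()})   # {'i': 6, ' ': 2, 'm': 2, 's': 6, 'p': 2}
print({k: v[0] for k, v in g.items()})     # {'i': 0, ' ': 1, 'm': 2, 's': 4, 'p': 15}
```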
group "i miss mississippi"
i| 0 3 8 11 14 17
 | 1 6
m| 2 7
s| 4 5 9 10 12 13
p| 15 16
This can be used to extract specific information about the occurrences, such as,
dm:group "i miss mississippi"
count each dm
i| 6
 | 2
m| 2
s| 6
p| 2
first each dm
i| 0
 | 1
m| 2
s| 4
p| 15
iasc
The monadic function iasc operates on a list or a dictionary (source). Considering source as a mapping, the result of iasc is a list comprising the domain items arranged in increasing order of their associated range items. Otherwise put, retrieving the items of source in the order specified by iasc sorts source in ascending order.
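In Python, iasc on a list amounts to sorting the positions 0..n-1 by the values they index. This is an illustration, not q. For a dictionary, the keys are sorted by their values. The descending counterpart idesc follows the same pattern in reverse order. The function names are Python stand-ins, and Python's stable sort leaves ties in their original order.

```python
def iasc(source):
    """Python model of q's `iasc` on a list: positions that sort source ascending."""
    return sorted(range(len(source)), key=lambda i: source[i])

def idesc(source):
    """Python model of q's `idesc` on a list: positions that sort source descending."""
    return sorted(range(len(source)), key=lambda i: source[i], reverse=True)

def iasc_dict(d):
    """Dictionary form: domain items ordered by their range values."""
    return sorted(d, key=d.get)

L = [3, 7, 2, 8, 1, 9]
print(iasc(L), [L[i] for i in iasc(L)])    # [4, 2, 0, 1, 3, 5] [1, 2, 3, 7, 8, 9]
print(idesc(L))                            # [5, 3, 1, 0, 2, 4]
print(iasc_dict({"b": 3, "c": 2, "a": 1}))  # ['a', 'c', 'b']
```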
L:3 7 2 8 1 9
iasc L
4 2 0 1 3 5
L[iasc L]
1 2 3 7 8 9
d:`b`c`a!3 2 1
iasc d
`a`c`b
d[iasc d]
1 2 3
identity
The monadic function denoted by double colon ( :: ) is the identity function, meaning that the return value is the same as the argument.
::[42]
42
::[`zaphod]
`zaphod
::["Life the Universe and Everything"]
"Life the Universe and Everything"
Note: The identity function cannot be used with juxtaposition or @. Its argument must be enclosed in brackets.
:: 42
'
idesc
The monadic function idesc operates on a list or a dictionary (source). Considering source as a mapping, the result of idesc is a list comprising the domain items arranged in decreasing order of their associated range items. Otherwise put, retrieving the items of source in the order specified by idesc sorts source in descending order.
L:3 7 2 8 1 9
idesc L
5 3 1 0 2 4
L[idesc L]
9 8 7 3 2 1
d:`b`c`a!3 2 1
idesc d
`b`c`a
d[idesc d]
3 2 1
in
The dyadic function in is atomic in its first argument (source) and takes a second argument (target) that is an atom or list. It returns a boolean result that indicates whether source appears in target. The comparison is strict with regard to type.
3 in 8
0b
42 in 0 6 7 42 98
1b
"cat" in "abcdefg"
110b
`zap in `zaphod`beeblebrox
0b
2 in 0 2 4j
'type
inter
The dyadic inter can be applied to lists, dictionaries and tables. It returns an entity of the same type as its arguments, containing those elements of the first argument that appear in the second argument.
1 1 2 3 inter 1 2 3 4
1 1 2 3
"ab cd " inter " bc f"
"b c "
Note: Lists are not sets, and the operation of inter on lists is not identical to intersection of sets. In particular, the result of inter does not necessarily comprise the distinct items common to the two arguments. One consequence is that the expression,
(x inter y)~y inter x
is not true in general.
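The asymmetry is easy to see in a Python model of inter on lists. This is an illustration, not q. The model keeps the items of the first argument, repeats included, that occur anywhere in the second argument. Swapping the arguments can therefore change the result.

```python
def inter(x, y):
    """Python model of q's `x inter y` on lists: items of x that occur in y."""
    return [item for item in x if item in y]   # order and repeats come from x

print(inter([1, 1, 2, 3], [1, 2, 3, 4]))                   # [1, 1, 2, 3]
print(inter([1, 1, 2], [1, 2]), inter([1, 2], [1, 1, 2]))  # [1, 1, 2] [1, 2]
```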
When applied to dictionaries, inter returns the set of common range items that are mapped from the same domain items.
d1:1 2 3!100 200 300
d2:2 4 6!200 400 600
d1 inter d2
,200
Tables that have the same columns can participate in inter. The result is a table with the records that are common to the two tables.
t1
a b
--------
1 first
2 second
3 third
t2
a b
--------
2 second
4 fourth
6 sixth
t1 inter t2
a b
--------
2 second
join (,)
The dyadic join ( , ) can take many different combinations of arguments.
When both operands are either lists or atoms, the result is a list with the item(s) of the left operand followed by the item(s) of the right operand.
2,3
2 3
`a,`b`c
`a`b`c
"xy","yz"
"xyyz"
1.1 2.2,3 4
1.1
2.2
3
4
Observe that the result is a general list unless all items are of a homogeneous type.
When both operands are dictionaries, the result is the merge of the dictionaries using upsert semantics. The domain of the result is the (set theoretic) union of the two domains. Range assignment of the right operand prevails on common domain items.
d1:1 2 3!`a`b`c
d2:3 4 5!`cc`d`e
d1,d2
1| a
2| b
3| cc
4| d
5| e
When both operands are tables having the same column names and types, the result is a table in which the records of the right operand are appended to those of the left operand.
t1:([]a:1 2 3;b:`x`y`z)
t1
a b
---
1 x
2 y
3 z
t2:([]a:3 4;b:`yy`z)
t2
a b
----
3 yy
4 z
t1,t2
a b
----
1 x
2 y
3 z
3 yy
4 z
When both operands are keyed tables having the same key and value columns, the result is a keyed table in which the records of the left operand are upserted with those of the right operand.
kt1:([k:1 2 3]v:`a`b`c)
kt1
k| v
-| -
1| a
2| b
3| c
kt2:([k:3 4]v:`cc`d)
kt2
k| v
-| --
3| cc
4| d
kt1,kt2
k| v
-| --
1| a
2| b
3| cc
4| d
join-each (,')
The verb join ( , ) can be combined with the adverb each ( ' ) to yield join-each ( ,' ), which can be used on lists, dictionaries or tables.
List operands must have the same count.
L1:1 2 3
L2:`a`b`c
L1,'L2
1 `a
2 `b
3 `c
As always with dictionaries, the operation occurs along the common domain items, with null extension elsewhere.
d1:1 2 3!10 20 30
d2:2 3 4!`a`b`c
d1,'d2
1| 10 `
2| 20 `a
3| 30 `b
4| 0N `c
For two tables with the same count of records, join-each results in a column join (Column Join), in which columns with non-common names are juxtaposed and overlapping columns are upserted.
t1:([]c1:1 2 3;c2:1.1 2.2 3.3)
t1
c1 c2
------
1  1.1
2  2.2
3  3.3
t2:([]c2:`a`b`c;c3:100 200 300)
t2
c2 c3
------
a  100
b  200
c  300
t1,'t2
c1 c2 c3
---------
1  a  100
2  b  200
3  c  300
Note: When join-each is used in a select, it must be enclosed in parentheses to avoid the comma being interpreted as a separator.
select j:(c1,'c2) from t1
j
-----
1 1.1
2 2.2
3 3.3
list
The function list replaces plist. It takes a variable number of arguments and returns a list whose items are the arguments. It is useful for creating lists programmatically.
Note: Unlike user-defined functions, the number of arguments to list is not restricted to eight.
For example,
list[6;7;42;`Life;"The Universe"]
6
7
42
`Life
"The Universe"
list[1;2;3;4;5;6;7;8;9;10]
1 2 3 4 5 6 7 8 9 10
null
The atomic function null takes a list (source) and returns a boolean list comprising the result of testing each item in source against null.
null 1 2 3 0N 5 0N
000101b
null `a`b``d```f
0010110b
Since null is atomic, it is applied recursively to sublists.
null (1 2;3 0N)
00b
01b
It is useful to combine where with null to obtain the positions of the null items.
where null 1 2 3 0N 5 0N
3 5
When applied to a dictionary (source), null returns a dictionary in which each item in the source range is replaced with the result of testing the item against null.
null 1 2 3!100 0N 300
1| 0
2| 1
3| 0
The action of null on a table (source) is explained by recalling that the table is a flipped column dictionary. Based on the action of null on a dictionary, we expect the result of null on a table to be a new table in which each column value in the source is replaced with the result of testing the value against null.
tnull:([]a:1 0N 3; b:0N 200 300)
null tnull
a b
---
0 1
1 0
0 0
Similarly, we expect null to operate on a keyed table by returning a result keyed table whose value table entries are the result of testing those of the argument against null.
ktnull:([k:101 102 103] v:`first``third)
null ktnull
k  | v
---| -
101| 0
102| 1
103| 0
parse
The monadic function parse takes a string argument containing a valid q expression and returns a list containing the corresponding parse tree. Applying the function eval to the result will evaluate it. A discussion of q parse trees is beyond the scope of this tutorial.
.Q.s1 parse "a:6*7"
"(:;`a;(*;6;7))"
eval parse "a:6*7"
42
Note: It is useful to apply parse to a query template in order to discover its functional form. The result is not always exactly the functional form, especially for exec, but a little experimenting will lead to the correct form.
t:([]c1:`a`b`a; c2:1 2 3)
select c2 by c1 from t
c1| c2
--| ---
a | 1 3
b | ,2
parse "select c2 by c1 from t"
?
`t
()
(,`c1)!,`c1
(,`c2)!,`c2
?[t;();(enlist `c1)!enlist `c1;(enlist `c2)!enlist `c2]
c1| c2
--| ---
a | 1 3
b | ,2
exec c2 by c1 from t
a| 1 3
b| ,2
parse "exec c2 by c1 from t"
?
`t
()
,`c1
,`c2
?[t;();`c1;`c2]
a| 1 3
b| ,2
rand (?)
The dyadic function rand ( ? ) is overloaded to have different meanings. In the case where both arguments are numeric scalars, ? returns a list of random numbers. More specifically, the first argument must be of integer type, and the second argument can be any numeric value. In this context, ? returns a list of pseudo-random numbers of count given by the first argument.
In case the second argument is a positive number of floating point type and the first argument is positive, the result is a list of random floats selected with replacement from the range between 0 (inclusive) and the second argument (exclusive).
5?4.2
3.778553 1.230056 1.572286 0.517468 0.07107598
4?1.0
0.5274765 0.5435815 0.4611484 0.7493561
In case the second argument is of integer type and the first argument is positive, the result is a list of random integers selected with replacement from the range between 0 (inclusive) and the second argument (exclusive).
10?5
1 2 0 3 4 4 4 0 3 1
10?5
0 2 1 0 2 4 2 3 4 0
1+10?5
4 2 3 3 3 2 1 1 5 3
The last example shows how to select random integers between 1 and 5. More generally, for integers i and j, where i<j, and any positive integer n, the idiom
i+n?j+1-i
selects n random integers between i and j inclusive.
i:3
j:7
n:10
i+n?j+1-i
3 4 5 7 7 5 4 4 7 4
In case the second argument is of integer type and the first argument is negative, the result is a list of random integers selected without replacement from the range between 0 (inclusive) and the second argument (exclusive). Since the selected values are not replaced, the absolute value of the first argument cannot exceed the second argument,
-3?5
2 3 0
-5?5
4 1 2 0 3
-6?5
'length
raze
The monadic raze takes a list or dictionary (source) and returns the entity derived from the source by eliminating the top-most level of nesting.
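A Python model of raze splices each list item into the result and keeps non-list items as they are. This is an illustration, not q. The model removes exactly one level of nesting, and an atom argument becomes a one-item list. Python strings are treated as atoms here, which differs from q, where strings are lists of chars.

```python
def raze(source):
    """Python model of q's `raze`: remove the top-most level of nesting."""
    if not isinstance(source, list):
        return [source]                 # an atom becomes a one-item list
    result = []
    for item in source:
        if isinstance(item, list):
            result.extend(item)         # splice one level only; deeper nesting survives
        else:
            result.append(item)
    return result

print(raze([[1, 2], ["a", "b"]]))                       # [1, 2, 'a', 'b']
print(raze([[[1, 2], [3, 4]], [5, [[6, 7], [8, 9]]]]))  # [[1, 2], [3, 4], 5, [[6, 7], [8, 9]]]
print(raze(42))                                         # [42]
```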
raze (1 2;`a`b)
1
2
`a
`b
One way to envision the action of raze is to write the source list in general form, then remove the parentheses directly beneath the outer-most enclosing pair.
raze ((1;2);(`a;`b))
1
2
`a
`b
Observe that raze only removes the top-most level of nesting and does not apply recursively to sublists.
raze ((1 2;3 4);(5;(6 7;8 9)))
1 2
3 4
5
(6 7;8 9)
If source is not nested, the result is the source.
raze 1 2 3 4
1 2 3 4
When raze is applied to an atom, the result is a list.
raze 42
,42
When raze is applied to a dictionary, the result is raze applied to the range.
dd:`a`b`c!(1 2; 3 4 5;6)
raze dd
1 2 3 4 5 6
reshape (#)
When the first argument of the dyadic reshape ( # ) is a list (shape) of two positive int, the result reshapes the source into a rectangular list according to shape. Specifically, the count of the result in dimension i is given by the item in position i in shape. The elements are taken from the beginning of the source.
A simple example makes this clear.
2 3#1 2 3 4 5 6
1 2 3
4 5 6
As in the case of take, if the number of elements in the source exceeds what is necessary to form the result, trailing elements are ignored.
2 2#`a`b`c`d`e`f`g`h
a b
c d
Similarly, if the number of elements in the source is less than necessary to form the result, the extraction resumes from the initial item of the source; this process is repeated until the result is complete.
5 4#"Now is the time"
"Now "
"is t"
"he t"
"imeN"
"ow i"
It is possible to create a ragged array with a given number of columns by using 0N as the number of rows with the reshape operator ( # ).
0N 3#til 10
0 1 2
3 4 5
6 7 8
,9
reverse
The monadic reverse inverts the order of the constituents of its argument. In the case of an atom, it simply returns the argument.
reverse 42
42
In the case of a list, the result is a list in which the items are in reverse order of the argument.
reverse 1 2 3 4 5
5 4 3 2 1
For nested lists, the reversal takes place only at the topmost level.
reverse (1 2 3; "abc"; `Four`Score`and`Seven)
`Four`Score`and`Seven
"abc"
1 2 3
In the case of an empty list, reverse returns the argument.
reverse ()
()
In the case of a dictionary, reverse inverts both the domain and range lists.
reverse `a`b`c!1 2 3
c| 3
b| 2
a| 1
Since a table is a list of records, reverse inverts the order of the records.
t:([] c1:`a`b`c; c2:1 2 3)
t
c1 c2
-----
a  1
b  2
c  3
reverse t
c1 c2
-----
c  3
b  2
a  1
Since a keyed table is a dictionary, reverse inverts both the domain and range tables, effectively inverting the row order.
kt:([k:1 2 3] c:100 101 102)
kt
k| c
-| ---
1| 100
2| 101
3| 102
reverse kt
k| c
-| ---
3| 102
2| 101
1| 100
sublist
The dyadic function sublist retrieves a sublist of contiguous items from a list. The left operand is a simple list of two ints: the first item is the starting index (start); the second item is the number of items to retrieve (count). The right operand (target) is a list or dictionary.
If target is a list, the result is a list comprising count items from target beginning at index start.
L:1 2 3 4 5
1 3 sublist L
2 3 4
If target is a dictionary, the result is a dictionary whose domain comprises count items from the target domain beginning at index start, and whose range is the corresponding items in the target range.
d:`a`b`c`d`e!1 2 3 4 5
1 3 sublist d
b| 2
c| 3
d| 4
Since a table is a list of records, sublist applies to the rows of a table.
t:([]c1:`a`b`c`d`e;c2:1 2 3 4 5)
1 3 sublist t
c1 c2
-----
b  2
c  3
d  4
Since a keyed table is a dictionary, sublist is applied to the key table.
kt:([k:`a`b`c`d`e]c1:1 2 3 4 5)
1 3 sublist kt
k| c1
-| --
b| 2
c| 3
d| 4
system
The monadic system takes a string argument and executes it as a q command, if recognized, or as an OS command otherwise. The function system is equivalent to \ but can be more convenient or readable in situations such as remote or programmatic execution, in which the backslashes must be escaped.
The following changes the current working directory to its parent directory.
system "cd .."
take (#)
When the left operand of take ( # ) is an int atom, it creates a new entity via extraction from its right operand (source) as specified by the first operand. A positive integer in the first operand indicates that the extraction occurs from the beginning of the source, whereas a negative integer in the first operand indicates that the extraction occurs from the end of the source.
The source can be an atom, a list, a dictionary, a table or a keyed table.
2#3
3 3
-1#10 20 30 40
,40
-2#`a`b`c`d!10 20 30 40
c| 30
d| 40
3#([] a:10 20 30 40; b:1.1 2.2 3.3 4.4)
a  b
------
10 1.1
20 2.2
30 3.3
1#([k:10 20 30] c:`one`two`three)
k | c
--| ---
10| one
The result of take is of the same type and shape as the source, except the result is never a scalar.
1#42
,42
If the number of elements in source exceeds what is necessary to form the result, trailing elements are ignored.
4#`a`b`c`d`e`f`g`h
`a`b`c`d
If the number of elements in source is less than necessary to form the result, the extraction resumes from the starting point of the source list; this process is repeated until the result is filled.
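The cyclic extraction can be modeled in Python with modular indexing. This is an illustration, not q. A non-negative count walks forward from the first item, while a negative count produces a result that ends with the last item of source. The sketch treats an atom as a one-item list and assumes source is non-empty.

```python
def take(n, source):
    """Python model of q's `n # source` on a list, with cyclic reuse of items."""
    if not isinstance(source, list):
        source = [source]                  # an atom acts like a one-item list
    size = len(source)                     # assumed non-zero
    if n >= 0:
        return [source[i % size] for i in range(n)]
    m = -n
    # start m items before the end (wrapping as needed) so the result ends on the last item
    return [source[(size - m + i) % size] for i in range(m)]

print(take(5, [98, 99]))           # [98, 99, 98, 99, 98]
print(take(-7, ["a", "b", "c"]))   # ['c', 'a', 'b', 'c', 'a', 'b', 'c']
print(take(2, 3))                  # [3, 3]
```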
5#98 99
98 99 98 99 98
-7#`a`b`c
`c`a`b`c`a`b`c
In the degenerate case, the result is an empty entity with the same type as the source. This is an effective way to obtain the schema of a q dictionary or list.
0#42
`int$()
0#10 20 30 40
`int$()
0#`a`b`c`d!10 20 30 40
_
0#([] a:10 20 30 40; b:1.1 2.2 3.3 4.4)
a b
---
0#([k:10 20 30] c:`one`two`three)
k| c
-| -
Note: Since the result of 0# on a list is always a list, we can use this construct as shorthand to initialize an empty value column with a definite type in a table definition. This ensures that only values of the specified type can be inserted into the column. For example,
([] a:0#0; b:0#`)
a b
---
defines an empty table whose first column is of type int and whose second column is of type symbol.
When the left operand of # is a list of symbol column names and the right operand is a table, the result is the table obtained by extracting the specified columns from that table.
t:([] c1:`a`b`c; c2:1 2 3; c3:1.1 2.2 3.3)
`c1`c3#t
c1 c3
------
a  1.1
b  2.2
c  3.3
When the left operand of # is a table (keys) and the right operand is a keyed table whose key table contains keys, the result is the keyed table corresponding to those values in keys.
ktc:([lname:`Dent`Beeblebrox`Prefect; fname:`Arthur`Zaphod`Ford] iq:98 42 126)
ktc
lname      fname | iq
-----------------| ---
Dent       Arthur| 98
Beeblebrox Zaphod| 42
Prefect    Ford  | 126
K:([] lname:`Dent`Prefect; fname:`Arthur`Ford)
K#ktc
lname   fname | iq
--------------| ---
Dent    Arthur| 98
Prefect Ford  | 126
til
The monadic til returns a list of the integers from 0 to n-1, where its argument n is a non-negative integer.
til 4
0 1 2 3
The result of til is always a list of int. So,
til 1
,0
til 0
`int$()
Generating sequences is simple with til.
2*til 10 / evens
0 2 4 6 8 10 12 14 16 18
1+2*til 10 / odds
1 3 5 7 9 11 13 15 17 19
20+til 5
20 21 22 23 24
0.5*til 10
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5
The function til is useful for extracting a sublist from a list. The idiom,
L[i+til n]
extracts from the list L the sublist of length n starting with the element in position i. For example,
L:10 20 30 40 50 60 70
i:2
n:3
L[i+til n]
30 40 50
Similarly, the idiom
L[i+til j+1-i]
extracts the sublist from positions i through j, inclusive. With L and i as above,
i:2
j:5
L[i+til j+1-i]
30 40 50 60
Note: In the second idiom, omitting the increment-by-one retrieves one less item than you probably intend. This is an easy error to make.
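The two til idioms translate directly into Python. This is an illustration, not q, and the til helper below is a hypothetical stand-in that wraps range. The last variable shows the off-by-one slip described in the note: dropping the +1 loses the final item.

```python
def til(n):
    """Python stand-in for q's `til n`: the integers 0 .. n-1."""
    return list(range(n))

L = [10, 20, 30, 40, 50, 60, 70]
i, n, j = 2, 3, 5
by_length = [L[i + k] for k in til(n)]           # L[i+til n]
by_bounds = [L[i + k] for k in til(j + 1 - i)]   # L[i+til j+1-i]
off_by_one = [L[i + k] for k in til(j - i)]      # the +1 forgotten

print(by_length, by_bounds, off_by_one)   # [30, 40, 50] [30, 40, 50, 60] [30, 40, 50]
```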
These idioms are useful for extracting substrings.
s:"abcdefg"
i:1
n:2
j:4
s[i+til n]
"bc"
s[i+til j+1-i]
"bcde"
Note: You can use the built-in function sublist to retrieve substrings.
The expression,
n = count til n
is true for every n ≥ 0. Similarly, the expression,
L~L[til count L]
is true for every list L. Both expressions remain valid in the degenerate case of the empty list.
ungroup
The monadic ungroup can be applied to a keyed table that is the result of a select with grouping or of the xgroup function. The result will have the selected records in the same format as the original table, but they may be in a different order since they will be sorted by the grouping column(s).
Using the distribution example,
sp
s  p  qty
---------
s1 p1 300
s1 p2 200
s1 p3 400
s1 p4 200
s4 p5 100
s1 p6 100
s2 p1 300
s2 p2 400
s3 p2 200
s4 p2 200
s4 p4 300
s1 p5 400
ungroup select s, qty by p from sp
p  s  qty
---------
p1 s1 300
p1 s2 300
p2 s1 200
p2 s2 400
p2 s3 200
p2 s4 200
p3 s1 400
p4 s1 200
p4 s4 300
p5 s4 100
p5 s1 400
p6 s1 100
Note: You can apply ungroup to a keyed table that did not arise from a group operation, but it must have the correct form or an error will result.
union
The dyadic union can be applied to lists and tables. It returns an entity of the same type as its arguments containing the distinct elements from both arguments.
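    1 union 2 3
1 2 3
    1 2 union 2 3
1 2 3
    1 1 3 union 1 2 3 1
1 3 2
    "a good time" union "was had by all"
"a godtimewshbyl"
Observe that the distinct items of the first argument appear first in the result, in order of first occurrence, followed by any new items from the second argument.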
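The examples are consistent with reading union as "the distinct items of the concatenation of both arguments, in order of first occurrence". A hypothetical Python helper `union` modelling that reading:

```python
def union(x, y):
    """Hypothetical model of q's union on lists: distinct items of x then y."""
    seen = set()
    out = []
    for item in list(x) + list(y):
        if item not in seen:      # keep only the first occurrence
            seen.add(item)
            out.append(item)
    return out

print(union([1], [2, 3]))               # [1, 2, 3]
print(union([1, 1, 3], [1, 2, 3, 1]))   # [1, 3, 2]
print("".join(union("a good time", "was had by all")))  # a godtimewshbyl
```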
Tables that have the same columns can participate in union. The result is a table with the distinct records from the combination of the two tables.
    t1:([] a:1 2 3 4; b:`first`second`third`fourth)
    t2:([] a:2 4 6; b:`dos`cuatro`seis)
    t1
a b
--------
1 first
2 second
3 third
4 fourth
    t2
a b
--------
2 dos
4 cuatro
6 seis
    t1 union t2
a b
--------
1 first
2 second
3 third
4 fourth
2 dos
4 cuatro
6 seis
Note: As of this writing (Jun 2007), union does not apply to dictionaries or keyed tables.
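For tables, the same reading applies at the level of whole records. In this Python sketch each record is a tuple (a, b), and the hypothetical `table_union` keeps the distinct records of t1 followed by the new ones from t2. Because records are compared whole, (2, "dos") survives even though t1 also has a row with a = 2, which matches the output above.

```python
def table_union(t1, t2):
    """Hypothetical model of q's union on simple tables: distinct whole records."""
    out = []
    for rec in t1 + t2:
        if rec not in out:   # a record is dropped only if an identical one exists
            out.append(rec)
    return out

t1 = [(1, "first"), (2, "second"), (3, "third"), (4, "fourth")]
t2 = [(2, "dos"), (4, "cuatro"), (6, "seis")]

for a, b in table_union(t1, t2):
    print(a, b)
```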
value
The function value has two uses. When applied to a dictionary, value returns the range of the dictionary.
    d:`a`b`c!1 2 3
    value d
1 2 3
Logically enough, for a keyed table, value returns the value table.
    kt:([k:101 102 103] c1:`a`b`c)
    kt
k  | c1
---| --
101| a
102| b
103| c
    value kt
c1
--
a
b
c
When value is applied to a string, it passes the string to the q interpreter and returns the result.
    value "6*7"
42
    value "{x*x} til 10"
0 1 4 9 16 25 36 49 64 81
    z:98.6
    value "z"
98.6
    value "a:6;b:7;c:a*b"
    a
6
    b
7
    c
42
Note: This use of the value function is a powerful feature that allows q code to be written and executed on the fly. If abused, it can quickly lead to unmaintainable code. (The spellchecker suggests "unmentionable" instead of "unmaintainable." How did it know?)
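As a loose Python analogy only (q's value is not implemented this way), a hypothetical `q_value` can dispatch on the type of its argument: a dictionary yields its range, and a string is handed to the interpreter, here Python's own `eval`. The caution in the note applies even more strongly to `eval`, which will execute arbitrary code.

```python
def q_value(x, namespace=None):
    """Loose, hypothetical analogy to q's value; not how q implements it."""
    if isinstance(x, dict):
        # Dictionary: return its range (the values), like value d in q.
        return list(x.values())
    if isinstance(x, str):
        # String: hand the text to the interpreter, like value "6*7" in q.
        # Emptying __builtins__ is NOT a sandbox; use trusted input only.
        env = {"__builtins__": {}}
        env.update(namespace or {})
        return eval(x, env)
    raise TypeError("this sketch only handles dicts and strings")

print(q_value({"a": 1, "b": 2, "c": 3}))  # [1, 2, 3]
print(q_value("6*7"))                     # 42
print(q_value("z", {"z": 98.6}))          # 98.6
```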
A common use of value is to convert a symbol or string containing the name of a q entity into the value associated with the entity.
    a:42
    s:`a
    value `a
42
    value s
42
    value "a"
42
where
The monadic where has multiple uses, depending on the type of its argument.
When the argument is a boolean list, where returns a list of int comprising the positions in the argument having value 1b.
    where 00110101b
2 3 5 7
This is useful when the boolean list is generated by a test on a list.
    L:"Now;is;the;time"
    where L=";"
3 6 10
    L[where L=";"]:" "
    L
"Now is the time"
Note: The behavior of the where phrase in the select template is related to the where function on a boolean list. The former limits the selection to table rows in those positions where the value of the where expression is not zero. Since the expression involves test(s) on column value(s), the where phrase effectively selects the rows satisfying its column condition, just as in SQL. See The where Phrase for more on the where phrase.
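A Python model of where on a boolean list, using a hypothetical helper `where_bool`, reproduces both results above: the positions of the true items, and the use of those positions to overwrite the separators in place.

```python
def where_bool(flags):
    """Hypothetical model of q's where on a boolean list: positions that are true."""
    return [i for i, f in enumerate(flags) if f]

print(where_bool([0, 0, 1, 1, 0, 1, 0, 1]))  # [2, 3, 5, 7]

L = list("Now;is;the;time")
positions = where_bool([ch == ";" for ch in L])  # like where L=";"
print(positions)                                 # [3, 6, 10]
for k in positions:                              # like L[where L=";"]:" "
    L[k] = " "
print("".join(L))                                # Now is the time
```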
When the argument s of where is a list of non-negative int, the result is a list of int comprising the items 0, ..., -1+count s, in which each index i is repeated s[i] times.
For example,
    where 2 1 3
0 0 1 2 2 2
    where 4 0 2
0 0 0 0 2 2
    where 4#1
0 1 2 3
Note: The behavior of where on an int list reduces to that on a boolean list by considering the boolean values as ints.
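The counting behavior can be modelled in Python with a hypothetical `where_counts` that repeats each index i exactly s[i] times; it assumes, as the text does, that all counts are non-negative. Feeding it 0s and 1s gives back the boolean case, which is the reduction the note describes.

```python
def where_counts(s):
    """Hypothetical model of q's where on non-negative ints: index i repeated s[i] times."""
    return [i for i, c in enumerate(s) for _ in range(c)]

print(where_counts([2, 1, 3]))   # [0, 0, 1, 2, 2, 2]
print(where_counts([4, 0, 2]))   # [0, 0, 0, 0, 2, 2]
print(where_counts([1] * 4))     # [0, 1, 2, 3], like where 4#1
# Booleans as 0/1 counts give the positions of the 1s:
print(where_counts([0, 0, 1, 1, 0, 1, 0, 1]))  # [2, 3, 5, 7]
```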
When the argument s is a dictionary whose range is a list of non-negative int, where returns a list comprising items of the domain of s, in which each domain item k is repeated s[k] times.
For example,
    where `a`b`c!2 1 3
`a`a`b`c`c`c
    where `a`b`c!4 0 2
`a`a`a`a`c`c
Note: The behavior of where on a dictionary is consistent with its behavior on a list if a list L is considered as a mapping whose implicit domain is til count L.
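In Python terms, a hypothetical `where_dict` repeats each key k exactly d[k] times, relying on Python dictionaries preserving insertion order. Passing it `dict(enumerate(counts))` treats a list as a mapping from positions to counts, which is the implicit-domain view in the note, and recovers the list behavior.

```python
def where_dict(d):
    """Hypothetical model of q's where on a dict of counts: key k repeated d[k] times."""
    return [k for k, c in d.items() for _ in range(c)]

print(where_dict({"a": 2, "b": 1, "c": 3}))  # ['a', 'a', 'b', 'c', 'c', 'c']
print(where_dict({"a": 4, "b": 0, "c": 2}))  # ['a', 'a', 'a', 'a', 'c', 'c']

# A list viewed as a mapping whose implicit keys are its positions:
counts = [2, 1, 3]
print(where_dict(dict(enumerate(counts))))   # [0, 0, 1, 2, 2, 2]
```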
within
The dyadic function within is atomic in its first argument (source) and takes a second argument that is a list of two items that have underlying numeric values. It returns a boolean value representing whether source lies between the two items of the second argument (inclusive).
    3 within 2 5
1b
    100 within 0 100
1b
    "c" within "az"
1b
    2006.11.19 2007.07.04 2008.08.12 within 2007.01.01 2007.12.31
010b
Observe that within is type tolerant provided both arguments have underlying numeric values, meaning that the types of its arguments do not need to match.
    0x42 within (30h;100j)
1b
    100 within "aj"
1b
It is also possible to apply within to symbols, since they have lexicographic order.
    `ab within `a`z
1b
Note: The expression
    x within (a;b)
is equivalent to,
    (a<=x)&x<=b
Thus, if the first item of the second argument is greater than the second item, the result of within will always be 0b.
    5 within 6 2
0b
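The equivalence with (a<=x)&x<=b can be mirrored in Python with a hypothetical `within` that uses a chained comparison and, to imitate atomicity in the first argument, maps over a list source. One difference worth stating plainly: Python will not compare, say, an int with a one-character string, so the type-tolerant cases such as 100 within "aj" are not modelled here.

```python
from datetime import date

def within(source, bounds):
    """Hypothetical model of q's within: (a<=x)&x<=b, item by item for a list source."""
    a, b = bounds
    if isinstance(source, list):
        return [a <= x <= b for x in source]
    return a <= source <= b

print(within(3, (2, 5)))          # True
print(within(100, (0, 100)))      # True: both ends are inclusive
print(within("c", ("a", "z")))    # True
print(within("ab", ("a", "z")))   # True: lexicographic order, like `ab within `a`z
print(within([date(2006, 11, 19), date(2007, 7, 4), date(2008, 8, 12)],
             (date(2007, 1, 1), date(2007, 12, 31))))  # [False, True, False]
print(within(5, (6, 2)))          # False: reversed bounds never match
```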