Syntax is the study of how sentences are put together out of words.
The word “syntax” comes from Greek, meaning “setting out together” or “arrangement”, and refers to the way words are arranged together.
Context-free grammar, also known as phrase-structure grammar, is the most widely used formal system for modeling constituent structure in natural languages.
Context-free grammars are the backbone of many formal models of the syntax of natural language. They are powerful enough to express sophisticated relations among the words in a sentence.
A context-free grammar consists of a set of rules, or productions, each of which expresses the ways that symbols of the language can be grouped and ordered together, and a lexicon of words and symbols.
For example, the following productions express that an NP (or noun phrase) can be composed of either a ProperNoun or a determiner (Det) followed by a Nominal; a Nominal in turn can consist of one or more Nouns.
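As a concrete sketch, these NP and Nominal rules, together with a toy lexicon, can be written down as plain Python data structures; the specific words and category names below are illustrative only, not a complete grammar:

    # A toy CFG fragment: each left-hand-side symbol maps to a list of
    # possible right-hand sides (sequences of non-terminals or words).
    rules = {
        "NP":      [["ProperNoun"], ["Det", "Nominal"]],
        "Nominal": [["Noun"], ["Nominal", "Noun"]],
    }

    # A toy lexicon mapping pre-terminal categories to words.
    lexicon = {
        "Det":        ["a", "the"],
        "Noun":       ["flight", "morning"],
        "ProperNoun": ["Houston"],
    }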
The approach to grammar presented thus far emphasizes phrase-structure rules while minimizing the role of the lexicon. Numerous alternative approaches have been developed that all share the common theme of making better use of the lexicon. These approaches differ with respect to how lexicalized they are — the degree to which they rely on the lexicon as opposed to phrase structure rules to capture facts about the language.
Ambiguity is perhaps the most serious problem faced by syntactic parsers. Earlier we introduced the notions of part-of-speech ambiguity and part-of-speech disambiguation. Here, we introduce a new kind of ambiguity, called structural ambiguity, which arises from many commonly used rules in phrase-structure grammars.
Syntactic parsing is the task of recognizing a sentence and assigning a syntactic structure to it.
Context-free grammars themselves don’t specify how the parse tree for a given sentence should be computed. We therefore need to specify algorithms that employ these grammars to efficiently produce correct trees.
Related approaches include the Earley algorithm (Earley, 1970) and chart parsing (Kaplan 1973, Kay 1982). The CKY algorithm requires the grammar to be in Chomsky Normal Form (CNF). Grammars in CNF are restricted to rules of the form A -> B C or A -> w. That is, the right-hand side of each rule must expand either to two non-terminals or to a single terminal. Restricting a grammar to CNF does not lead to any loss in expressiveness, since any context-free grammar can be converted into a corresponding CNF grammar that accepts exactly the same set of strings as the original grammar.
There are three situations we need to address in any generic grammar:
rules that mix terminals with non-terminals on the right-hand side,
rules that have a single non-terminal on the right-hand side,
and rules in which the length of the right-hand side is greater than 2.
The remedy for rules that mix terminals and non-terminals is to simply introduce a new dummy non-terminal that covers only the original terminal. For example, a rule for an infinitive verb phrase such as INF-VP -> to VP would be replaced by the two rules INF-VP -> TO VP and TO -> to.
Rules with a single non-terminal on the right are called unit productions. We can eliminate unit productions by rewriting the right-hand side of the original rules with the right-hand side of all the non-unit production rules that they ultimately lead to. If A -> B by a chain of one or more unit productions and B -> γ is a non-unit production in our grammar, then we add A -> γ for each such rule in the grammar and discard all the intervening unit productions.
Rules with right-hand sides longer than 2 are normalized through the introduction of new non-terminals that spread the longer sequences over several new rules. If we have a rule like A -> B C γ, we replace the leftmost pair of non-terminals with a new dummy non-terminal, giving the rules A -> X1 γ and X1 -> B C, and repeat until every right-hand side has length 2.
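The sketch below illustrates this binarization step in Python; the helper name binarize and the X1, X2, ... dummy non-terminal names are assumptions made for illustration, not part of any standard library:

    from itertools import count

    _fresh = count(1)

    def binarize(lhs, rhs):
        """Split a rule lhs -> rhs whose right-hand side is longer than 2
        into a chain of binary rules, introducing dummy non-terminals."""
        new_rules = []
        rhs = list(rhs)
        while len(rhs) > 2:
            new_nt = "X%d" % next(_fresh)        # fresh dummy non-terminal
            new_rules.append((new_nt, rhs[:2]))  # e.g. X1 -> B C
            rhs = [new_nt] + rhs[2:]             # A -> X1 gamma ...
        new_rules.append((lhs, rhs))
        return new_rules

    print(binarize("A", ["B", "C", "gamma"]))
    # [('X1', ['B', 'C']), ('A', ['X1', 'gamma'])]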
With our grammar now in CNF, each non-terminal node above the part-of-speech level in a parse tree will have exactly two daughters. A two-dimensional matrix can be used to encode the structure of an entire tree.
The superdiagonal row in the matrix contains the parts of speech for each word in the input. The subsequent diagonals above that superdiagonal contain constituents that cover all the spans of increasing length in the input.
Given this setup, CKY recognition consists of filling the parse table in the right way. To do this, we proceed in a bottom-up fashion so that at the point where we are filling any cell [i, j], the cells containing the parts that could contribute to this entry (i.e., the cells to the left and the cells below) have already been filled. The algorithm given in Fig. 13.5 fills the upper-triangular matrix a column at a time, working from left to right, with each column filled from bottom to top. This scheme guarantees that at each point in time we have all the information we need: to the left, since all the columns to the left have already been filled, and below, since we are filling bottom to top.
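The following is a minimal CKY recognizer sketch written along these lines (it follows the same fill order but is not the exact pseudocode of Fig. 13.5); the function name cky_recognize and the toy grammar are assumptions for illustration:

    from collections import defaultdict

    def cky_recognize(words, lexicon, rules, start="S"):
        """Minimal CKY recognizer for a grammar already in CNF.

        lexicon: dict word -> set of pre-terminal categories (A -> w)
        rules:   dict (B, C) -> set of parent categories A    (A -> B C)
        """
        n = len(words)
        # table[(i, j)] holds the set of non-terminals spanning words[i:j]
        table = defaultdict(set)
        for j in range(1, n + 1):
            # superdiagonal cell: parts of speech for word j
            table[(j - 1, j)] |= lexicon.get(words[j - 1], set())
            # fill column j from bottom to top
            for i in range(j - 2, -1, -1):
                for k in range(i + 1, j):
                    for B in table[(i, k)]:
                        for C in table[(k, j)]:
                            table[(i, j)] |= rules.get((B, C), set())
        return start in table[(0, n)]

    # Toy CNF grammar: S -> NP VP, NP -> Det Noun, plus a small lexicon.
    rules = {("NP", "VP"): {"S"}, ("Det", "Noun"): {"NP"}}
    lexicon = {"the": {"Det"}, "flight": {"Noun"}, "left": {"VP"}}
    print(cky_recognize(["the", "flight", "left"], lexicon, rules))  # True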
One crucial use of probabilistic parsing is to solve the problem of disambiguation.
The most commonly used probabilistic constituency grammar formalism is the probabilistic context-free grammar (PCFG), a probabilistic augmentation of context-free grammars in which each rule is associated with a probability.
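As a small illustration, here is a toy PCFG fragment and the computation of a derivation's probability as the product of its rule probabilities; the rules and numbers below are made up for the example:

    # A toy PCFG fragment: the probabilities of all expansions of the
    # same non-terminal sum to 1. The probability of a parse tree is the
    # product of the probabilities of the rules used to build it.
    pcfg = {
        ("S",  ("NP", "VP")):    0.8,
        ("S",  ("VP",)):         0.2,
        ("NP", ("Det", "Noun")): 0.6,
        ("NP", ("ProperNoun",)): 0.4,
    }

    def tree_probability(rules_used, pcfg):
        """Multiply the probabilities of the rules in one derivation."""
        p = 1.0
        for rule in rules_used:
            p *= pcfg[rule]
        return p

    # Probability of a derivation using S -> NP VP and NP -> Det Noun.
    print(tree_probability([("S", ("NP", "VP")),
                            ("NP", ("Det", "Noun"))], pcfg))  # 0.8 * 0.6 = 0.48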
Shallow/Partial Parsing (Chunking): Identify the non-overlapping segments of a sentence, such as noun phrases, verb phrases, adjective phrases, and prepositional phrases.
Many language processing tasks do not require complex, complete parse trees for all inputs. For these tasks, a partial parse, or shallow parse, of input sentences may be sufficient.
Chunking: identifying and classifying the flat, non-overlapping segments of a sentence that constitute the basic non-recursive phrases corresponding to the major parts-of-speech found in most wide-coverage grammars. This set typically includes noun phrases, verb phrases, adjective phrases, and prepositional phrases; in other words, the phrases that correspond to the content-bearing parts-of-speech.
State-of-the-art approaches to chunking use supervised machine learning to train a chunker, using annotated data as a training set and training any sequence labeler. It is common to model chunking as IOB tagging. In IOB tagging we introduce a tag for the beginning (B) and inside (I) of each chunk type, and one for tokens outside (O) any chunk. The number of tags is thus 2n + 1, where n is the number of chunk types.
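For example, with only two chunk types (NP and VP) there are 2*2 + 1 = 5 tags; the short made-up sentence below shows one possible IOB encoding:

    # IOB encoding with two chunk types (NP and VP): B-NP, I-NP, B-VP,
    # I-VP, and O, i.e. 2n + 1 = 5 tags for n = 2 chunk types.
    tokens = ["The", "morning", "flight", "arrived", "from", "Denver"]
    tags   = ["B-NP", "I-NP",   "I-NP",   "B-VP",    "O",    "B-NP"]

    for token, tag in zip(tokens, tags):
        print(token, tag)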
In a dependency parse, relations among the words are shown as directed, labeled arcs from heads to dependents, drawn above the sentence. We call this a typed dependency structure because the labels are drawn from a fixed inventory of grammatical relations.
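A typed dependency structure can also be written down as a set of head-relation-dependent triples; the sentence and the relation labels below (nsubj, obj, det) are a toy example in the style of Universal Dependencies, not taken from the text:

    # Sentence: "United canceled the flight" (a made-up example).
    # Each triple records a labeled arc from a head to its dependent.
    dependencies = [
        ("canceled", "nsubj", "United"),   # United is the subject of canceled
        ("canceled", "obj",   "flight"),   # flight is the direct object
        ("flight",   "det",   "the"),      # the is a determiner of flight
    ]

    for head, rel, dep in dependencies:
        print("%s(%s, %s)" % (rel, head, dep))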
A major advantage of dependency grammars is their ability to deal with languages that are morphologically rich and have a relatively free word order.
Source: Speech and Language Processing
Chapter 12: Constituency Grammars
Chapter 13: Constituency Parsing
Chapter 14: Statistical Constituency Parsing
Chapter 15: Dependency Parsing