In formal language theory, a grammar is a set of rules used to generate strings in a language. Grammars are commonly used to describe the syntax of programming languages, natural languages, and various other formal languages.
A grammar consists of the following components:
- Terminal Symbols (Terminals): These are the basic symbols from which strings in the language are constructed. Terminals are also referred to as “tokens” or “terminal characters.” In formal notation, terminals are often represented by lowercase letters, digits, or special characters.
- Non-terminal Symbols (Non-terminals): These are symbols that represent sets of strings in the language. Non-terminals are placeholders that can be replaced by other symbols according to the rules of the grammar. Non-terminals are typically represented by uppercase letters or combinations of letters.
- Production Rules (or Productions): These rules specify how non-terminal symbols can be replaced by sequences of terminal and non-terminal symbols. A production rule consists of a non-terminal symbol (on the left-hand side) and a string of terminals and/or non-terminals (on the right-hand side). Each non-terminal symbol may have multiple production rules associated with it. Production rules are often written in the form:
css
A → α
where A is a non-terminal symbol, and α is a string of terminals and/or non-terminals.
- Start Symbol: This is a special non-terminal symbol that serves as the starting point for generating strings in the language. All strings in the language must be derivable from the start symbol by applying production rules.
A grammar can be formally defined as a 4-tuple G = (V, Σ, R, S), where:
- V is a finite set of non-terminal symbols.
- Σ is a finite set of terminal symbols (disjoint from V).
- R is a finite set of production rules, each of the form A → α, where A is a non-terminal symbol in V, and α is a string of symbols from (V ∪ Σ)* (the Kleene star denotes concatenation of zero or more elements).
- S is the start symbol, a non-terminal symbol in V.
Grammars are classified based on their expressive power. Regular grammars generate regular languages, while context-free grammars generate context-free languages, and so on. Non-regular grammars are grammars that generate languages that are not regular. These include context-sensitive grammars, unrestricted grammars, and recursively enumerable grammars, which have increasing levels of generative power.