# The CUE Language Specification

## Introduction

This is a reference manual for the CUE data constraint language. CUE, pronounced cue or Q, is a general-purpose and strongly typed constraint-based language. It can be used for data templating, data validation, code generation, scripting, and many other applications involving structured data. The CUE tooling, layered on top of CUE, provides a general-purpose scripting language for creating scripts as well as simple servers, also expressed in CUE.

CUE was designed with cloud configuration and related systems in mind, but is not limited to this domain. It derives its formalism from relational programming languages. This formalism allows for managing and reasoning over large amounts of data in a straightforward manner.

The grammar is compact and regular, allowing for easy analysis by automatic tools such as integrated development environments.

This document is maintained by mpvl@golang.org. CUE has a lot of similarities with the Go language. This document draws heavily from the Go specification as a result.

CUE draws its influence from many languages. Its main influences were BCL/GCL (internal to Google), LKB (LinGO), Go, and JSON. Others are Swift, TypeScript, JavaScript, Prolog, NCL (internal to Google), Jsonnet, HCL, Flabbergast, Nix, JSONPath, Haskell, Objective-C, and Python.

## Notation

The syntax is specified using Extended Backus-Naur Form (EBNF):

Production  = production_name "=" [ Expression ] "." .
Expression  = Alternative { "|" Alternative } .
Alternative = Term { Term } .
Term        = production_name | token [ "…" token ] | Group | Option | Repetition .
Group       = "(" Expression ")" .
Option      = "[" Expression "]" .
Repetition  = "{" Expression "}" .


Productions are expressions constructed from terms and the following operators, in increasing precedence:

|   alternation
()  grouping
[]  option (0 or 1 times)
{}  repetition (0 to n times)


Lower-case production names are used to identify lexical tokens. Non-terminals are in CamelCase. Lexical tokens are enclosed in double quotes "" or back quotes ``.

The form a … b represents the set of characters from a through b as alternatives. The horizontal ellipsis … is also used elsewhere in the spec to informally denote various enumerations or code snippets that are not further specified. The character … (as opposed to the three characters ...) is not a token of the CUE language.

## Source code representation

Source code is Unicode text encoded in UTF-8. Unless otherwise noted, the text is not canonicalized, so a single accented code point is distinct from the same character constructed from combining an accent and a letter; those are treated as two code points. For simplicity, this document will use the unqualified term character to refer to a Unicode code point in the source text.

Each code point is distinct; for instance, upper and lower case letters are different characters.
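
As a minimal sketch of these rules, the fields below compare strings that look alike but consist of different code points. The field names are hypothetical and chosen only for illustration. The results in the comments rely on string comparison with == and on the len builtin, which counts UTF-8 bytes for strings.

```cue
// U+00E9 is a single precomposed code point (é).
precomposed: "\u00e9"

// "e" followed by U+0301 (combining acute accent): two code points.
combining: "e\u0301"

// The text is not canonicalized, so the two strings differ.
sameAccent: precomposed == combining // false

// Upper and lower case letters are different characters.
sameCase: "a" == "A" // false

// The byte lengths make the difference visible.
precomposedLen: len(precomposed) // 2
combiningLen:   len(combining)   // 3
```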

Implementation restriction: For compatibility with other tools, a compiler may disallow the NUL character (U+0000) in the source text.

Implementation restriction: For compatibility with other tools, a compiler may ignore a UTF-8-encoded byte order mark (U+FEFF) if it is the first Unicode code point in the source text. A byte order mark may be disallowed anywhere else in the source.

### Characters

The following terms are used to denote specific Unicode character classes:

newline        = /* the Unicode code point U+000A */ .
unicode_char   = /* an arbitrary Unicode code point except newline */ .
unicode_letter = /* a Unicode code point classified as "Letter" */ .
unicode_digit  = /* a Unicode code point classified as "Number, decimal digit" */ .


In The Unicode Standard 8.0, Section 4.5 “General Category” defines a set of character categories. CUE treats all characters in any of the Letter categories Lu, Ll, Lt, Lm, or Lo as Unicode letters, and those in the Number category Nd as Unicode digits.
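
As an illustration of these character classes, the field names below use letters from several Letter categories and a decimal digit. This is a sketch that assumes the identifier grammar, which this excerpt does not reproduce, accepts a letter followed by further letters and digits. The names themselves are made up.

```cue
// Ll and Lu letters from the Latin script.
größe: 42
Größe: 43 // a different identifier: case matters

// Lo letters (no case distinction).
名前: "cue"

// A decimal digit (Nd) may follow a letter.
retry2: 3

// The underscore also counts as a letter.
max_retries: retry2 + 1 // 4
```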

### Letters and digits

The underscore character _ (U+005F) is considered a letter.

"foo" =~ "^[a-z]{4}$" // false

#### Logical operators

Logical operators apply to boolean values and yield a result of the same type as the operands. The right operand is evaluated conditionally.

&&    conditional AND    p && q  is  "if p then q else false"
||    conditional OR     p || q  is  "if p then true else q"
!     NOT                !p      is  "not p"

### Calls

Calls can be made to core library functions, called builtins. Given an expression f of function type F,

f(a1, a2, … an)

calls f with arguments a1, a2, … an. Arguments must be expressions of which the values are an instance of the parameter types of F and are evaluated before the function is called.

a: math.Atan2(x, y)

In a function call, the function value and arguments are evaluated in the usual order. After they are evaluated, the parameters of the call are passed by value to the function and the called function begins execution. The return parameters of the function are passed by value back to the calling function when the function returns.

### Comprehensions

Lists and fields can be constructed using comprehensions.

Comprehensions define a clause sequence that consists of a sequence of for, if, and let clauses, nesting from left to right. The sequence must start with a for or if clause. The for and let clauses each define a new scope in which new values are bound to be available for the next clause.

The for clause binds the defined identifiers, on each iteration, to the next value of some iterable value in a new scope. A for clause may bind one or two identifiers. If there is one identifier, it binds it to the value of a list element or struct field value. If there are two identifiers, the first value will be the key or index, if available, and the second will be the value.

For lists, for iterates over all elements in the list after closing it. For structs, for iterates over all non-optional regular fields.

An if clause, or guard, specifies an expression that terminates the current iteration if it evaluates to false.
The let clause binds the result of an expression to the defined identifier in a new scope.

A current iteration is said to complete if the innermost block of the clause sequence is reached. Syntactically, the comprehension value is a struct. A comprehension can generate non-struct values by embedding such values within this struct.

Within lists, the values yielded by a comprehension are inserted in the list at the position of the comprehension. Within structs, the values yielded by a comprehension are embedded within the struct. Both structs and lists may contain multiple comprehensions.

Comprehension = Clauses StructLit .

Clauses       = StartClause { [ "," ] Clause } .
StartClause   = ForClause | GuardClause .
Clause        = StartClause | LetClause .
ForClause     = "for" identifier [ "," identifier ] "in" Expression .
GuardClause   = "if" Expression .
LetClause     = "let" identifier "=" Expression .

a: [1, 2, 3, 4]
b: [ for x in a if x > 1 { x+1 } ]  // [3, 4, 5]

c: {
    for x in a
    if x < 4
    let y = 1 {
        "\(x)": x + y
    }
}
d: { "1": 2, "2": 3, "3": 4 }

### String interpolation

String interpolation allows constructing strings by replacing placeholder expressions with their string representation. String interpolation may be used in single- and double-quoted strings, as well as their multiline equivalent.

A placeholder consists of \( followed by an expression and ). The expression is evaluated in the scope within which the string is defined.

The result of the expression is substituted as follows:

- string: as is
- bool: the JSON representation of the bool
- number: a JSON representation of the number that preserves the precision of the underlying binary coded decimal
- bytes: as if substituted within single quotes or converted to valid UTF-8 replacing the maximal subpart of ill-formed subsequences with a single replacement character (W3C encoding standard) otherwise
- list: illegal
- struct: illegal

a: "World"
b: "Hello \( a )!" // Hello World!
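
The following sketch combines the two features described above: a comprehension with a two-identifier for clause, a guard, and a let clause, whose body builds strings by interpolation. The field names (ports, labels, summary) are hypothetical and chosen only for illustration. The results shown in comments follow from the substitution rules for string, number, and bool values.

```cue
ports: {http: 80, https: 443, debug: 6060}

// Two identifiers: name is bound to the field label, port to its value.
// The guard drops debug; let binds a derived value for the body.
labels: {
    for name, port in ports
    if port < 1024
    let secure = name == "https" {
        "\(name)": "port \(port), secure=\(secure)"
    }
}
// labels: {
//     http:  "port 80, secure=false"
//     https: "port 443, secure=true"
// }

// In a list, the first of two identifiers is the index.
summary: [for i, x in ["a", "b"] {"\(i)=\(x)"}] // ["0=a", "1=b"]
```

Placeholders may hold any expression, not just references, as `name == "https"` in the let clause and the interpolated `secure` value suggest.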
## Builtin Functions

Built-in functions are predeclared. They are called like any other function.

### len

The built-in function len takes arguments of various types and returns a result of type int.

Argument type    Result

string           string length in bytes
bytes            length of byte sequence
list             list length, smallest length for an open list
struct           number of distinct data fields, excluding optional

Expression           Result
len("Hellø")         6
len([1, 2, 3])       3
len([1, 2, ...])     >=2

### close

The builtin function close converts a partially defined, or open, struct to a fully defined, or closed, struct.

### and

The built-in function and takes a list and returns the result of applying the & operator to all elements in the list. It returns top for the empty list.

Expression:          Result
and([a, b])          a & b
and([a])             a
and([])              _

### or

The built-in function or takes a list and returns the result of applying the | operator to all elements in the list. It returns bottom for the empty list.

Expression:          Result
or([a, b])           a | b
or([a])              a
or([])               _|_

### div, mod, quo and rem

For two integer values x and y, the integer quotient q = div(x, y) and remainder r = mod(x, y) implement Euclidean division and satisfy the following relationship:

r = x - y*q  with 0 <= r < |y|

where |y| denotes the absolute value of y.

 x     y   div(x, y)  mod(x, y)
 5     3        1          2
-5     3       -2          1
 5    -3       -1          2
-5    -3        2          1

For two integer values x and y, the integer quotient q = quo(x, y) and remainder r = rem(x, y) implement truncated division and satisfy the following relationship:

x = q*y + r  and  |r| < |y|

with quo(x, y) truncated towards zero.

 x     y   quo(x, y)  rem(x, y)
 5     3        1          2
-5     3       -1         -2
 5    -3       -1          2
-5    -3        1         -2

A zero divisor in either case results in bottom (an error).

## Cycles

Implementations are required to interpret or reject cycles encountered during evaluation according to the rules in this section.

### Reference cycles

A reference cycle occurs if a field references itself, either directly or indirectly.
// x references itself
x: x

// indirect cycles
b: c
c: d
d: b

Implementations should treat these as _. Two particular cases are discussed below.

#### Expressions that unify an atom with an expression

An expression of the form a & e, where a is an atom and e is an expression, always evaluates to a or bottom. As it does not matter how we fail, we can assume the result to be a and postpone validating a == e until after all references in e have been resolved.

// Config            Evaluates to (requiring concrete values)
x: {                 x: {
    a: b + 100           a: _|_ // cycle detected
    b: a - 100           b: _|_ // cycle detected
}                    }

y: x & {             y: {
    a: 200               a: 200 // asserted that 200 == b + 100
                         b: 100
}                    }

#### Field values

A field value of the form r & v, where r evaluates to a reference cycle and v is a concrete value, evaluates to v. Unification is idempotent and unifying a value with itself ad infinitum, which is what the cycle represents, results in this value. Implementations should detect cycles of this kind, ignore r, and take v as the result of unification.

Configuration    Evaluated
//    c           Cycles in nodes of type struct evaluate
//  ↙︎   ↖         to the fixed point of unifying their
// a  →  b        values ad infinitum.

a: b & { x: 1 }   // a: { x: 1, y: 2, z: 3 }
b: c & { y: 2 }   // b: { x: 1, y: 2, z: 3 }
c: a & { z: 3 }   // c: { x: 1, y: 2, z: 3 }

// resolve a
b & {x:1}
// substitute b
c & {y:2} & {x:1}
// substitute c
a & {z:3} & {y:2} & {x:1}
// eliminate a (cycle)
{z:3} & {y:2} & {x:1}
// simplify
{x:1,y:2,z:3}

This rule also applies to field values that are disjunctions of unification operations of the above form.
a: b&{x:1} | {y:1}  // {x:1,y:3,z:2} | {y:1}
b: {x:2} | c&{z:2}  // {x:2} | {x:1,y:3,z:2}
c: a&{y:3} | {z:3}  // {x:1,y:3,z:2} | {z:3}

// resolving a
b&{x:1} | {y:1}
// substitute b
({x:2} | c&{z:2})&{x:1} | {y:1}
// simplify
c&{z:2}&{x:1} | {y:1}
// substitute c
(a&{y:3} | {z:3})&{z:2}&{x:1} | {y:1}
// simplify
a&{y:3}&{z:2}&{x:1} | {y:1}
// eliminate a (cycle)
{y:3}&{z:2}&{x:1} | {y:1}
// expand
{x:1,y:3,z:2} | {y:1}

Note that all nodes that form a reference cycle to form a struct will evaluate to the same value. If a field value is a disjunction, any element that is part of a cycle will evaluate to this value.

### Structural cycles

A structural cycle occurs when a node references one of its ancestor nodes. It is possible to construct a structural cycle by unifying two acyclic values:

// acyclic
y: {
    f: h: g
    g: _
}
// acyclic
x: {
    f: _
    g: f
}
// introduces structural cycle
z: x & y

Implementations should be able to detect such structural cycles dynamically.

A structural cycle can result in infinite structure or evaluation loops.

// infinite structure
a: b: a

// infinite evaluation
f: {
    n:   int
    out: n + (f & {n: 1}).out
}

CUE must allow or disallow structural cycles under certain circumstances.

If a node a references an ancestor node, we call it and any of its field values a.f cyclic. So if a is cyclic, all of its descendants are also regarded as cyclic. A given node x, whose value is composed of the conjuncts c1 & ... & cn, is valid if any of its conjuncts is not cyclic.

// Disallowed: a list of infinite length with all elements being 1.
#List: {
    head: 1
    tail: #List
}

// Disallowed: another infinite structure (a:{b:{d:{b:{d:{...}}}}}, ...).
a: {
    b: c
}
c: {
    d: a
}

// #List defines a list of arbitrary length. Because the recursive reference
// is part of a disjunction, this does not result in a structural cycle.
#List: {
    head: _
    tail: null | #List
}

// Usage of #List. The value of tail in the most deeply nested element will
// be null: as the value of the disjunct referring to list is the only
// conjunct, all conjuncts are cyclic and the value is invalid and so
// eliminated from the disjunction.
MyList: #List & { head: 1, tail: { head: 2 }}

## Modules, instances, and packages

CUE configurations are constructed by combining instances. An instance, in turn, is constructed from one or more source files belonging to the same package that together declare the data representation. Elements of this data representation may be exported and used in other instances.

### Source file organization

Each source file consists of an optional package clause defining the collection of files to which it belongs, followed by a possibly empty set of import declarations that declare packages whose contents it wishes to use, followed by a possibly empty set of declarations.

Like with a struct, a source file may contain embeddings. Unlike with a struct, the embedded expressions may be any value. If the result of the unification of all embedded values is not a struct, it will be output instead of its enclosing file when exporting CUE to a data format.

SourceFile = { attribute "," } [ PackageClause "," ] { ImportDecl "," } { Declaration "," } .

"Hello \(#place)!"

#place: "world"

// Outputs "Hello world!"

### Package clause

A package clause is an optional clause that defines the package to which a source file belongs.

PackageClause  = "package" PackageName .
PackageName    = identifier .

The PackageName must not be the blank identifier or a definition identifier.

package math

### Modules and instances

A module defines a tree of directories, rooted at the module root.

All source files within a module with the same package name belong to the same package. A module may define multiple packages.

An instance of a package is any subset of files belonging to the same package. It is interpreted as the concatenation of these files.
An implementation may impose conventions on the layout of package files to determine which files of a package belong to an instance. For example, an instance may be defined as the subset of package files belonging to a directory and all its ancestors.

### Import declarations

An import declaration states that the source file containing the declaration depends on definitions of the imported package and enables access to exported identifiers of that package. The import names an identifier (PackageName) to be used for access and an ImportPath that specifies the package to be imported.

ImportDecl       = "import" ( ImportSpec | "(" { ImportSpec "," } ")" ) .
ImportSpec       = [ PackageName ] ImportPath .
ImportLocation   = { unicode_value } .
ImportPath       = `"` ImportLocation [ ":" identifier ] `"` .

The PackageName is used in qualified identifiers to access exported identifiers of the package within the importing source file. It is declared in the file block. It defaults to the identifier specified in the package clause of the imported package, which must match either the last path component of ImportLocation or the identifier following it.

The interpretation of the ImportPath is implementation-dependent but it is typically either the path of a builtin package or a fully qualifying location of a package within a source code repository.

An ImportLocation must be a non-empty string using only characters belonging to Unicode’s L, M, N, P, and S general categories (the Graphic characters without spaces) and may not include the characters !"#$%&'()*,:;<=>?[\\]^{|} or the Unicode replacement character U+FFFD.

Assume we have a package containing the package clause package math, which exports the function Sin, at the path identified by lib/math. This table illustrates how Sin is accessed in files that import the package after the various types of import declaration.

Import declaration          Local name of Sin

import   "lib/math"         math.Sin
import   "lib/math:math"    math.Sin
import m "lib/math"         m.Sin

An import declaration declares a dependency relation between the importing and imported package. It is illegal for a package to import itself, directly or indirectly, or to directly import a package without referring to any of its exported identifiers.
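
As a hedged sketch of a complete file, the example below uses the grouped import form with one default and one explicit local name. It assumes the implementation provides builtin packages at the paths math and strings, with a constant Pi and a function ToUpper, as the reference CUE implementation does. The package name example and the field names are hypothetical.

```cue
package example

import (
    "math"      // local name defaults to math
    s "strings" // explicit local name s
)

// Both imported packages are referred to, as the rule above requires.
circumference: 2 * math.Pi * 3
shout:         s.ToUpper("cue") // "CUE"
```

Removing either field would leave an import without any reference to its exported identifiers, which the rule above makes illegal.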

### An example package

TODO

Last modified August 26, 2022: all: do not ignore generated files (fe31a57)