# Common Language Reference Manual

**Version 1.0**

**Target**: x86-32 (IA-32) Linux ELF

**Calling Convention**: cdecl

**Author**: Common Compiler Project

---

## Table of Contents

1. [Introduction](#introduction)
2. [Compiler Usage](#compiler-usage)
3. [Lexical Elements](#lexical-elements)
4. [Type System](#type-system)
5. [Declarations](#declarations)
6. [Expressions](#expressions)
7. [Statements](#statements)
8. [Functions](#functions)
9. [Scope and Linkage](#scope-and-linkage)
10. [Memory Model](#memory-model)
11. [Assembly Interface](#assembly-interface)
12. [Limitations](#limitations)
13. [Examples](#examples)

---

## 1. Introduction

Common is a statically typed, imperative programming language that compiles to x86-32 assembly (NASM syntax). It provides a minimal yet complete set of features for systems programming:

- Integer types from 8 to 64 bits
- Pointers and arrays
- Functions with parameters
- Control flow (if, while, for, switch)
- Full operator set (arithmetic, logical, bitwise)
- Direct C library interoperability

### Design Philosophy

- **No runtime dependencies**: Compiled programs link only against libc
- **Explicit control**: No hidden allocations; the only implicit conversions are array decay and integer promotion
- **Predictable output**: Direct mapping to assembly
- **C compatibility**: Can call and be called by C code

---

## 2. Compiler Usage

### Building the Compiler

```bash
gcc -o common common.c
```

### Compiling Programs

```bash
# Compile Common source to NASM assembly
./common source.cm output.asm

# Assemble to object file
nasm -f elf32 output.asm -o output.o

# Link (requires 32-bit support)
gcc -m32 output.o -o executable
```

### One-Line Compilation

```bash
./common source.cm output.asm && nasm -f elf32 output.asm && gcc -m32 output.o -o program
```

### Compiler Output

The compiler writes NASM x86-32 assembly to stdout (or to the specified file) using:

- **ELF32** object format
- **cdecl** calling convention
- **Sections**: `.text`, `.data`, `.bss`

### Error Reporting

Errors are reported to stderr with line numbers:

```
line 42: syntax error near 'token'
line 15: Unknown char '~'
```

---

## 3. Lexical Elements

### Comments

```c
// Single-line comment (C++ style)

/* Multi-line comment
   spanning multiple lines */
```

Comments are stripped during lexical analysis.

### Keywords

```
if else while for switch case default break continue return
void uint8 uint16 uint32 uint64 int8 int16 int32 int64
```

### Identifiers

```
[a-zA-Z_][a-zA-Z0-9_]*
```

- Must start with a letter or underscore
- Case-sensitive
- No language-level length limit, but the compiler's internal buffer holds 256 characters

### Integer Literals

```c
42        // Decimal
0x2A      // Hexadecimal
052       // Octal
0b101010  // Binary (if supported by strtoul)
```

Literals are parsed by `strtoul()` with base 0 (auto-detect), so binary literals work only where the host `strtoul()` accepts the `0b` prefix.

### String Literals

```c
"Hello, World!"
"Line 1\nLine 2"
"Tab\there"
```

Supported escape sequences:

- `\n` - newline
- `\t` - tab
- `\r` - carriage return
- `\0` - null character
- `\\` - backslash
- `\"` - quote
- Any other `\x` - literal `x` (the backslash is dropped)

String literals are null-terminated and stored in the `.data` section.

### Operators and Punctuation

**Multi-character operators**:

```
== != <= >= && || << >> ++ -- += -= *= /= %= &= |= ^= <<= >>=
```

**Single-character operators**:

```
+ - * / % & | ^ ~ ! < > =
```

**Punctuation**:

```
( ) { } [ ] ; , : ?
```

---

## 4. Type System

### Integer Types

| Type   | Size    | Range (Unsigned)   | Range (Signed)                  |
|--------|---------|--------------------|---------------------------------|
| uint8  | 1 byte  | 0 to 255           | -                               |
| int8   | 1 byte  | -                  | -128 to 127                     |
| uint16 | 2 bytes | 0 to 65,535        | -                               |
| int16  | 2 bytes | -                  | -32,768 to 32,767               |
| uint32 | 4 bytes | 0 to 4,294,967,295 | -                               |
| int32  | 4 bytes | -                  | -2,147,483,648 to 2,147,483,647 |
| uint64 | 8 bytes | 0 to 2^64-1        | -                               |
| int64  | 8 bytes | -                  | -2^63 to 2^63-1                 |

**Note**: 64-bit types are partially supported. They occupy 8 bytes in memory, but arithmetic operations truncate to 32 bits on x86-32.

### Void Type

```c
void
```

- On its own, used only for function return types (`void *` is available as a generic pointer, see below)
- Cannot declare variables of type void
- `void` in a parameter list means "no parameters"

### Pointer Types

```c
int32 *ptr;      // Pointer to int32
uint8 **pptr;    // Pointer to pointer to uint8
void *generic;   // Generic pointer (4 bytes)
```

- All pointers are 4 bytes (32-bit addresses)
- Pointer arithmetic scales by pointee size
- Can be cast between types

### Array Types

```c
int32 arr[10];        // Array of 10 int32
uint8 matrix[5][5];   // Not supported (single dimension only)
```

Arrays:

- Decay to pointers when used in expressions
- Cannot be returned from functions
- Cannot be assigned (use element-wise copy)

### Type Qualifiers

Common has no type qualifiers (no `const`, `volatile`, `restrict`).

---

## 5. Declarations

### Variable Declarations

**Local variables**:

```c
int32 x;          // Uninitialized
int32 y = 42;     // Initialized
uint8 c = 'A';    // Character (just an int)
```

**Global variables**:

```c
int32 global_var;          // Zero-initialized (.bss)
int32 initialized = 100;   // Explicitly initialized (.data)
```

### Array Declarations

**Local arrays**:

```c
int32 arr[10];                      // Uninitialized
int32 nums[5] = { 1, 2, 3, 4, 5 };  // Initialized
uint8 partial[10] = { 1, 2 };       // Rest zero-filled
```

**Global arrays**:

```c
int32 global_arr[100];           // Zero-initialized (.bss)
int32 data[3] = { 10, 20, 30 };  // Initialized (.data)
```

### Pointer Declarations

```c
int32 *ptr;      // Pointer to int32
uint8 *str;      // Pointer to uint8 (common for strings)
void *generic;   // Generic pointer
int32 **pptr;    // Pointer to pointer
```

### Type Syntax

```
type_specifier ::= base_type pointer_suffix
base_type      ::= "int8" | "int16" | "int32" | "int64"
                 | "uint8" | "uint16" | "uint32" | "uint64"
                 | "void"
pointer_suffix ::= ("*")*
```

Examples:

```c
int32 x;     // Base type: int32, no pointers
uint8 *s;    // Base type: uint8, 1 pointer level
void **pp;   // Base type: void, 2 pointer levels
```

---

## 6. Expressions

### Primary Expressions

```c
42             // Integer literal
"string"       // String literal
variable       // Identifier
(expression)   // Parenthesized expression
```

### Postfix Expressions

```c
array[index]     // Array subscript
function(args)   // Function call
expr++           // Post-increment
expr--           // Post-decrement
```

### Unary Expressions

```c
++expr       // Pre-increment
--expr       // Pre-decrement
-expr        // Negation
!expr        // Logical NOT
~expr        // Bitwise NOT
&expr        // Address-of
*expr        // Dereference
(type)expr   // Type cast
```

### Binary Expressions

**Arithmetic**:

```c
a + b   // Addition
a - b   // Subtraction
a * b   // Multiplication
a / b   // Division
a % b   // Modulo
```

**Bitwise**:

```c
a & b    // Bitwise AND
a | b    // Bitwise OR
a ^ b    // Bitwise XOR
a << b   // Left shift
a >> b   // Right shift (arithmetic for signed, logical for unsigned)
```

**Comparison**:

```c
a == b   // Equal
a != b   // Not equal
a < b    // Less than
a <= b   // Less than or equal
a > b    // Greater than
a >= b   // Greater than or equal
```

**Logical**:

```c
a && b   // Logical AND (short-circuit)
a || b   // Logical OR (short-circuit)
```

### Assignment Expressions

```c
a = b     // Assignment
a += b    // Add and assign
a -= b    // Subtract and assign
a *= b    // Multiply and assign
a /= b    // Divide and assign
a %= b    // Modulo and assign
a &= b    // AND and assign
a |= b    // OR and assign
a ^= b    // XOR and assign
a <<= b   // Left shift and assign
a >>= b   // Right shift and assign
```

### Ternary Expression

```c
condition ? true_expr : false_expr
```

Example:

```c
max = (a > b) ? a : b;
```

### Operator Precedence

From highest to lowest:

| Level | Operators                                    | Associativity |
|-------|----------------------------------------------|---------------|
| 1     | `()` `[]` `++` `--` (post)                   | Left to right |
| 2     | `++` `--` (pre) `-` `!` `~` `&` `*` `(cast)` | Right to left |
| 3     | `*` `/` `%`                                  | Left to right |
| 4     | `+` `-`                                      | Left to right |
| 5     | `<<` `>>`                                    | Left to right |
| 6     | `<` `<=` `>` `>=`                            | Left to right |
| 7     | `==` `!=`                                    | Left to right |
| 8     | `&`                                          | Left to right |
| 9     | `^`                                          | Left to right |
| 10    | `\|`                                         | Left to right |
| 11    | `&&`                                         | Left to right |
| 12    | `\|\|`                                       | Left to right |
| 13    | `?:`                                         | Right to left |
| 14    | `=` `+=` `-=` etc.                           | Right to left |

Level 2 lists only the unary operators that appear in the grammar (Appendix A); there is no unary `+`.

### Pointer Arithmetic

```c
int32 *p = arr;
p + 1   // Points to next int32 (address + 4)
p - 1   // Points to previous int32 (address - 4)
p[i]    // Equivalent to *(p + i)
```

Pointer arithmetic automatically scales by the size of the pointed-to type:

- `uint8*` increments by 1
- `uint16*` increments by 2
- `int32*` increments by 4
- Any pointer-to-pointer increments by 4

### Type Conversions

**Explicit casting**:

```c
(uint8)value        // Truncate to 8 bits
(int32)byte_value   // Sign-extend or zero-extend
(uint32*)ptr        // Pointer type conversion
```

**Implicit conversions**:

- Arrays decay to pointers
- Smaller integers promote to int32 in expressions

---

## 7. Statements

### Expression Statement

```c
expression;
```

Examples:

```c
x = 42;
function_call();
x++;
```

### Compound Statement (Block)

```c
{
    statement1;
    statement2;
    ...
}
```

Blocks create new scopes for local variables.
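
As a minimal sketch of what "new scope" means here, the function below declares `x` in an outer block and again in a nested block. The name `scope_demo` is hypothetical, and the `typedef` is an assumption added only so the snippet can also be checked with an ordinary C compiler; a Common source file would use the built-in `int32` directly.

```c
#include <stdint.h>

typedef int32_t int32;  // Assumption: stands in for Common's built-in int32 when compiled as C

// Hypothetical helper: encodes both values of x as (outer * 10 + inner).
int32 scope_demo(void) {
    int32 x = 1;
    int32 seen_inner = 0;
    {
        int32 x = 2;        // New variable; shadows the outer x inside this block
        seen_inner = x;     // Reads the inner x
    }
    // The inner x no longer exists here; the outer x was never modified.
    return x * 10 + seen_inner;
}
```

Under these assumptions `scope_demo()` returns 12: the outer `x` is still 1 after the nested block ends, and the nested block saw its own `x` of 2.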

### If Statement

```c
if (condition)
    statement

if (condition)
    statement
else
    statement
```

Examples:

```c
if (x > 0)
    printf("positive\n");

if (x > 0) {
    printf("positive\n");
} else if (x < 0) {
    printf("negative\n");
} else {
    printf("zero\n");
}
```

### While Statement

```c
while (condition)
    statement
```

Example:

```c
while (x < 100) {
    x = x * 2;
}
```

### For Statement

```c
for (init; condition; increment)
    statement
```

The `init` can be:

- Empty: `for (; condition; increment)`
- Expression: `for (x = 0; x < 10; x++)`
- Declaration: `for (int32 i = 0; i < 10; i++)`

Example:

```c
for (int32 i = 0; i < 10; i = i + 1) {
    sum = sum + i;
}
```

### Switch Statement

```c
switch (expression) {
    case value1:
        statements
        break;
    case value2:
        statements
        break;
    default:
        statements
}
```

- Cases must be integer constants
- Fall-through is allowed (no automatic break)
- `default` is optional

Example:

```c
switch (day) {
    case 0:
        printf("Sunday\n");
        break;
    case 6:
        printf("Saturday\n");
        break;
    default:
        printf("Weekday\n");
}
```

### Break Statement

```c
break;
```

Exits the innermost `while`, `for`, or `switch` statement.

### Continue Statement

```c
continue;
```

Skips to the next iteration of the innermost `while` or `for` loop.

### Return Statement

```c
return;              // Return from void function
return expression;   // Return value
```

Example:

```c
return 42;
return x + y;
return;
```

---

## 8. Functions

### Function Declarations

```c
return_type function_name(parameter_list);
```

Forward declaration (prototype):

```c
int32 add(int32 a, int32 b);
```

### Function Definitions

```c
return_type function_name(parameter_list) {
    statements
}
```

Example:

```c
int32 add(int32 a, int32 b) {
    return a + b;
}
```

### Parameters

```c
void no_params(void) { }                  // No parameters
int32 one_param(int32 x) { }              // One parameter
int32 two_params(int32 x, uint8 *s) { }   // Multiple parameters
```

Parameters are passed by value.
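
A minimal sketch of what pass-by-value implies in practice: a callee that assigns to its parameter changes only its own copy. The names `bump` and `caller_value_after_bump` are hypothetical, and the `typedef` is an assumption added only so the snippet can also be compiled as plain C.

```c
#include <stdint.h>

typedef int32_t int32;  // Assumption: stands in for Common's built-in int32 when compiled as C

// Hypothetical callee: modifies only its local copy of x.
void bump(int32 x) {
    x = x + 1;
}

// Hypothetical caller: returns the value of n after calling bump(n).
int32 caller_value_after_bump(void) {
    int32 n = 5;
    bump(n);     // bump receives a copy of n on the stack
    return n;    // n itself was never touched
}
```

Here `caller_value_after_bump()` returns 5, because the increment inside `bump` is lost when the callee returns.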
To modify the caller's data, use pointers:

```c
void swap(int32 *a, int32 *b) {
    int32 temp = *a;
    *a = *b;
    *b = temp;
}
```

### Return Values

```c
int32 get_value(void) {
    return 42;
}

void no_return(void) {
    // No return statement needed
    return;  // Optional
}
```

The return value is passed in the `eax` register (32-bit).

### Recursion

Recursion is fully supported:

```c
int32 factorial(int32 n) {
    if (n <= 1)
        return 1;
    return n * factorial(n - 1);
}
```

### Calling Convention

Functions use the **cdecl** convention:

- Arguments pushed right-to-left on the stack
- Caller cleans up the stack
- Return value in `eax`
- `eax`, `ecx`, `edx` are caller-saved
- `ebx`, `esi`, `edi`, `ebp` are callee-saved

### Calling C Functions

Common can call C library functions:

```c
// Declare C functions
int32 printf(uint8 *format, ...);   // printf returns an int in C
void *malloc(uint32 size);
void free(void *ptr);

int32 main(void) {
    printf("Hello from Common\n");
    void *mem = malloc(100);
    free(mem);
    return 0;
}
```

**Note**: Variadic functions (`...`) can be declared but not defined in Common.

---

## 9. Scope and Linkage

### Scope Rules

**Global scope**:

- Variables and functions declared outside any function
- Visible to all functions in the file

**Local scope**:

- Variables declared inside a function or block
- Visible only within that function/block
- Shadows global variables with the same name

**Block scope**:

```c
{
    int32 x = 1;
    {
        int32 x = 2;         // Different variable, shadows outer x
        printf("%d\n", x);   // Prints 2
    }
    printf("%d\n", x);       // Prints 1
}
```

### Linkage

**External linkage** (default for functions):

```c
int32 global_function(void) { ... }
```

The symbol is exported (`global` directive in assembly).

**No linkage** (local variables):

```c
void func(void) {
    int32 local;  // No linkage
}
```

**Static linkage**: Not supported. All functions have external linkage.

### Name Resolution

1. Check local scope (function parameters and locals)
2. Check global scope
3. If not found, the name is assumed to be an external symbol

---

## 10. Memory Model

### Stack Layout

```
High Address
+------------------+
| Return address   |
+------------------+
| Saved EBP        | <-- EBP
+------------------+
| Local variable 1 | EBP - 4
+------------------+
| Local variable 2 | EBP - 8
+------------------+
| ...              |
+------------------+
| Array data       | (grows down)
+------------------+ <-- ESP
Low Address
```

### Function Call Stack

```nasm
; caller():
    push arg2
    push arg1
    call callee
    add esp, 8      ; Clean up arguments

; callee(arg1, arg2):
    push ebp        ; Save old frame pointer
    mov ebp, esp    ; Set up new frame
    sub esp, N      ; Allocate locals
    ...
    mov esp, ebp    ; Restore stack
    pop ebp
    ret
```

Arguments are accessed via `[ebp+8]`, `[ebp+12]`, etc. Locals are accessed via `[ebp-4]`, `[ebp-8]`, etc.

### Data Sections

**.text**: Read-only code

```nasm
section .text
function_name:
    ; assembly code
```

**.data**: Initialized data

```nasm
section .data
global_var: dd 42
string: db "Hello", 0
```

**.bss**: Zero-initialized data

```nasm
section .bss
uninit_var: resd 1
array: resb 100
```

### Size Directives

| Directive   | Size    | Common Type           |
|-------------|---------|-----------------------|
| `resb`/`db` | 1 byte  | uint8/int8            |
| `resw`/`dw` | 2 bytes | uint16/int16          |
| `resd`/`dd` | 4 bytes | uint32/int32/pointers |
| `resq`/`dq` | 8 bytes | uint64/int64          |

### Alignment

- Stack is 16-byte aligned (per System V ABI)
- Local variables are 4-byte aligned
- Arrays follow element alignment

---

## 11. Assembly Interface

### Generated Assembly Structure

```nasm
BITS 32
section .text

; External function declarations
extern printf
extern malloc

; Exported functions
global main
global my_function

main:
    push ebp
    mov ebp, esp
    sub esp, 16             ; Allocate locals
    ; ... function body ...
    mov esp, ebp
    pop ebp
    ret

section .data
_s0: db "Hello", 0          ; String literal

section .bss
global_var: resd 1          ; Global variable
```

### Register Usage

**Caller-saved** (may be modified by called function):

- `eax` - Return value, scratch
- `ecx` - Scratch, left operand
- `edx` - Scratch, division remainder

**Callee-saved** (preserved across calls):

- `ebx` - Base register
- `esi` - Source index
- `edi` - Destination index
- `ebp` - Frame pointer
- `esp` - Stack pointer

**Common usage**:

- `eax` - Expression results, return values
- `ecx` - Left operand in binary operations
- `[ebp+N]` - Function parameters
- `[ebp-N]` - Local variables

### Calling C from Assembly

```nasm
; Call: printf("Value: %d\n", x);
push dword [ebp-4]    ; Push x
push _s0              ; Push format string
call printf
add esp, 8            ; Clean up (2 args × 4 bytes)
```

### Inline Assembly

Not supported. Use C library functions or write separate assembly files.

---

## 12. Limitations

### Language Limitations

1. **Single-file compilation only**
   - No `#include` or import mechanism
   - All code must be in one source file
   - Use forward declarations for ordering
2. **No structures or unions**
   - Can simulate with arrays: `node[0]` = data, `node[1]` = next
   - Manual offset calculation required
3. **No floating point**
   - Integer arithmetic only
   - No `float`, `double`, or `long double`
4. **No preprocessor**
   - No `#define`, `#ifdef`, etc.
   - No macro expansion
   - No file inclusion
5. **No enums**
   - Use integer constants instead
6. **Limited 64-bit support**
   - 64-bit types exist but operations truncate to 32-bit
   - Full 64-bit arithmetic not implemented
7. **No static/extern keywords**
   - All functions are global
   - No static local variables
   - No explicit extern declarations
8. **Single-dimensional arrays only**
   - Multidimensional arrays not supported
   - Can use manual indexing for 2D: `arr[i * width + j]`
9. **No goto**
   - Use loops and breaks instead
10. **No comma operator**
    - Cannot use `a = (b, c)`

### Implementation Limitations

1. **Fixed buffer sizes**
   - 256 characters per identifier or string (internal token buffer)
   - 256 local variables per function
   - 256 global variables total
   - 512 string literals
2. **No optimization**
   - Generated code is unoptimized
   - Expressions fully evaluated (no constant folding)
3. **Limited error messages**
   - Basic syntax errors reported
   - No semantic analysis warnings
   - No type mismatch warnings
4. **x86-32 only**
   - Not portable to other architectures
   - Requires 32-bit toolchain

### Workarounds

**Structures**: Use arrays

```c
// Instead of: struct { int x; int y; } point;
int32 point[2];  // point[0] = x, point[1] = y
```

**Multidimensional arrays**: Manual indexing

```c
// Instead of: int matrix[10][10];
int32 matrix[100];
int32 value = matrix[row * 10 + col];
```

**Enums**: Integer constants

```c
// Instead of: enum { RED, GREEN, BLUE };
int32 RED = 0;
int32 GREEN = 1;
int32 BLUE = 2;
```

---

## 13. Examples

### Hello World

```c
void puts(uint8 *s);

int32 main(void) {
    puts("Hello, World!");
    return 0;
}
```

### Factorial (Iterative)

```c
int32 factorial(int32 n) {
    int32 result = 1;
    for (int32 i = 2; i <= n; i = i + 1) {
        result = result * i;
    }
    return result;
}
```

### Fibonacci (Recursive)

```c
int32 fib(int32 n) {
    if (n <= 1)
        return n;
    return fib(n - 1) + fib(n - 2);
}
```

### String Length

```c
int32 strlen(uint8 *s) {
    int32 len = 0;
    while (s[len])
        len = len + 1;
    return len;
}
```

### Array Sum

```c
int32 sum_array(int32 *arr, int32 len) {
    int32 total = 0;
    for (int32 i = 0; i < len; i = i + 1) {
        total = total + arr[i];
    }
    return total;
}
```

### Pointer Swap

```c
void swap(int32 *a, int32 *b) {
    int32 temp = *a;
    *a = *b;
    *b = temp;
}
```

### Bubble Sort

```c
void bubble_sort(int32 *arr, int32 n) {
    for (int32 i = 0; i < n - 1; i = i + 1) {
        for (int32 j = 0; j < n - i - 1; j = j + 1) {
            if (arr[j] > arr[j + 1]) {
                int32 temp = arr[j];
                arr[j] = arr[j + 1];
                arr[j + 1] = temp;
            }
        }
    }
}
```

### Binary Search

```c
int32 binary_search(int32 *arr, int32 n, int32 target) {
    int32 left = 0;
    int32 right = n - 1;
    while (left <= right) {
        int32 mid = left + (right - left) / 2;
        if (arr[mid] == target)
            return mid;
        if (arr[mid] < target)
            left = mid + 1;
        else
            right = mid - 1;
    }
    return -1;  // Not found
}
```

### Linked List (Simulated)

```c
void *malloc(uint32 size);
void free(void *ptr);

// Node: [0] = data, [1] = next pointer
int32 *create_node(int32 value) {
    int32 *node = (int32*)malloc(8);
    node[0] = value;
    node[1] = 0;
    return node;
}

void insert_front(int32 **head, int32 value) {
    int32 *new_node = create_node(value);
    new_node[1] = (int32)(*head);
    *head = new_node;
}
```

### Bitwise Operations

```c
// Check if bit N is set
int32 is_bit_set(uint32 value, int32 n) {
    return (value >> n) & 1;
}

// Set bit N
uint32 set_bit(uint32 value, int32 n) {
    return value | (1 << n);
}

// Clear bit N
uint32 clear_bit(uint32 value, int32 n) {
    return value & ~(1 << n);
}

// Toggle bit N
uint32 toggle_bit(uint32 value, int32 n) {
    return value ^ (1 << n);
}
```

---

## Appendix A: Grammar Summary

```
program ::= declaration*

declaration ::=
    | type_spec identifier "(" param_list ")" ( ";" | block )
    | type_spec identifier ";"
    | type_spec identifier "=" expr ";"
    | type_spec identifier "[" expr "]" ";"
    | type_spec identifier "[" expr "]" "=" "{" expr_list "}" ";"

type_spec ::= base_type "*"*

base_type ::= "void" | "int8" | "int16" | "int32" | "int64"
            | "uint8" | "uint16" | "uint32" | "uint64"

param_list ::= "void" | ( param ( "," param )* )?

param ::= type_spec identifier

block ::= "{" statement* "}"

statement ::=
    | block
    | type_spec identifier ";"
    | type_spec identifier "=" expr ";"
    | type_spec identifier "[" expr "]" ( "=" "{" expr_list "}" )? ";"
    | expr ";"
    | "if" "(" expr ")" statement ( "else" statement )?
    | "while" "(" expr ")" statement
    | "for" "(" (decl | expr)? ";" expr? ";" expr? ")" statement
    | "switch" "(" expr ")" "{" case_clause* "}"
    | "return" expr? ";"
    | "break" ";"
    | "continue" ";"

case_clause ::=
    | "case" expr ":" statement*
    | "default" ":" statement*

expr ::= assignment

assignment ::= ternary ( assign_op ternary )?

assign_op ::= "=" | "+=" | "-=" | "*=" | "/=" | "%="
            | "&=" | "|=" | "^=" | "<<=" | ">>="

ternary ::= logical_or ( "?" expr ":" ternary )?

logical_or ::= logical_and ( "||" logical_and )*

logical_and ::= bit_or ( "&&" bit_or )*

bit_or ::= bit_xor ( "|" bit_xor )*

bit_xor ::= bit_and ( "^" bit_and )*

bit_and ::= equality ( "&" equality )*

equality ::= relational ( ("==" | "!=") relational )*

relational ::= shift ( ("<" | "<=" | ">" | ">=") shift )*

shift ::= additive ( ("<<" | ">>") additive )*

additive ::= multiplicative ( ("+" | "-") multiplicative )*

multiplicative ::= unary ( ("*" | "/" | "%") unary )*

unary ::=
    | postfix
    | "++" unary
    | "--" unary
    | "-" unary
    | "!" unary
    | "~" unary
    | "&" unary
    | "*" unary
    | "(" type_spec ")" unary

postfix ::=
    | primary
    | postfix "[" expr "]"
    | postfix "(" expr_list? ")"
    | postfix "++"
    | postfix "--"

primary ::=
    | integer_literal
    | string_literal
    | identifier
    | "(" expr ")"
```

---

## Appendix B: Quick Reference Card

**Types**: void, int8, int16, int32, int64, uint8, uint16, uint32, uint64

**Operators**: `+ - * / % & | ^ ~ ! < > <= >= == != << >> && || ?: = += -= *= /= %= &= |= ^= <<= >>= ++ -- & * []`

**Keywords**: if else while for switch case default break continue return

**Control**: if/else, while, for, switch/case, break, continue, return

**Functions**: `type name(params) { body }`

**Arrays**: `type name[size]`, `type name[size] = { values }`

**Pointers**: `type *name`, `&var`, `*ptr`, `ptr[index]`

**Comments**: `// line`, `/* block */`

---

*End of Common Language Reference Manual*