Files
common/MANUAL.md
2026-03-14 14:14:37 -04:00

1258 lines
24 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Common Language Reference Manual
**Version 1.0**
**Target**: x86-32 (IA-32) Linux ELF
**Calling Convention**: cdecl
**Author**: Common Compiler Project
---
## Table of Contents
1. [Introduction](#introduction)
2. [Compiler Usage](#compiler-usage)
3. [Lexical Elements](#lexical-elements)
4. [Type System](#type-system)
5. [Declarations](#declarations)
6. [Expressions](#expressions)
7. [Statements](#statements)
8. [Functions](#functions)
9. [Scope and Linkage](#scope-and-linkage)
10. [Memory Model](#memory-model)
11. [Assembly Interface](#assembly-interface)
12. [Limitations](#limitations)
13. [Examples](#examples)
---
## 1. Introduction
Common is a statically-typed, imperative programming language that compiles to x86-32 assembly (NASM syntax). It provides a minimal yet complete set of features for systems programming:
- Integer types from 8 to 64 bits
- Pointers and arrays
- Functions with parameters
- Control flow (if, while, for, switch)
- Full operator set (arithmetic, logical, bitwise)
- Direct C library interoperability
### Design Philosophy
- **No runtime dependencies**: Compiled programs link only against libc
- **Explicit control**: No hidden allocations or implicit conversions
- **Predictable output**: Direct mapping to assembly
- **C compatibility**: Can call and be called by C code
---
## 2. Compiler Usage
### Building the Compiler
```bash
gcc -o common common.c
```
### Compiling Programs
```bash
# Compile Common source to NASM assembly
./common source.cm output.asm
# Assemble to object file
nasm -f elf32 output.asm -o output.o
# Link (requires 32-bit support)
gcc -m32 output.o -o executable
```
### One-Line Compilation
```bash
./common source.cm output.asm && nasm -f elf32 output.asm && gcc -m32 output.o -o program
```
### Compiler Output
The compiler writes NASM x86-32 assembly to stdout (or specified file) using:
- **ELF32** object format
- **cdecl** calling convention
- **Sections**: `.text`, `.data`, `.bss`
### Error Reporting
Errors are reported to stderr with line numbers:
```
line 42: syntax error near 'token'
line 15: Unknown char '~'
```
---
## 3. Lexical Elements
### Comments
```c
// Single-line comment (C++ style)
/* Multi-line comment
spanning multiple lines */
```
Comments are stripped during lexical analysis.
### Keywords
```
if else while for switch case default
break continue return
void uint8 uint16 uint32 uint64
int8 int16 int32 int64
```
### Identifiers
```
[a-zA-Z_][a-zA-Z0-9_]*
```
- Must start with letter or underscore
- Case-sensitive
- No length limit (internal buffer: 256 chars)
### Integer Literals
```c
42 // Decimal
0x2A // Hexadecimal
052 // Octal
0b101010 // Binary (if supported by strtoul)
```
Literals are parsed by `strtoul()` with base 0 (auto-detect).
### String Literals
```c
"Hello, World!"
"Line 1\nLine 2"
"Tab\there"
```
Supported escape sequences:
- `\n` - newline
- `\t` - tab
- `\r` - carriage return
- `\0` - null character
- `\\` - backslash
- `\"` - quote
- Any other `\x` - literal `x`
String literals are null-terminated and stored in `.data` section.
### Operators and Punctuation
**Multi-character operators**:
```
== != <= >= && || << >> ++ --
+= -= *= /= %= &= |= ^= <<= >>=
```
**Single-character operators**:
```
+ - * / % & | ^ ~ ! < > =
```
**Punctuation**:
```
( ) { } [ ] ; , : ?
```
---
## 4. Type System
### Integer Types
| Type | Size | Range (Unsigned) | Range (Signed) |
|---------|-------|----------------------|-----------------------------|
| uint8 | 1 byte| 0 to 255 | - |
| int8 | 1 byte| - | -128 to 127 |
| uint16 | 2 bytes| 0 to 65,535 | - |
| int16 | 2 bytes| - | -32,768 to 32,767 |
| uint32 | 4 bytes| 0 to 4,294,967,295 | - |
| int32 | 4 bytes| - | -2,147,483,648 to 2,147,483,647 |
| uint64 | 8 bytes| 0 to 2^64-1 | - |
| int64 | 8 bytes| - | -2^63 to 2^63-1 |
**Note**: 64-bit types are partially supported. They occupy 8 bytes in memory but arithmetic operations truncate to 32 bits on x86-32.
### Void Type
```c
void
```
- Used only for function return types
- Cannot declare variables of type void
- `void` in parameter list means "no parameters"
### Pointer Types
```c
int32 *ptr; // Pointer to int32
uint8 **pptr; // Pointer to pointer to uint8
void *generic; // Generic pointer (4 bytes)
```
- All pointers are 4 bytes (32-bit addresses)
- Pointer arithmetic scales by pointee size
- Can be cast between types
### Array Types
```c
int32 arr[10]; // Array of 10 int32
uint8 matrix[5][5]; // Not supported (single dimension only)
```
Arrays:
- Decay to pointers when used in expressions
- Cannot be returned from functions
- Cannot be assigned (use element-wise copy)
### Type Qualifiers
Common has no type qualifiers (no `const`, `volatile`, `restrict`).
---
## 5. Declarations
### Variable Declarations
**Local variables**:
```c
int32 x; // Uninitialized
int32 y = 42; // Initialized
uint8 c = 'A'; // Character (just an int)
```
**Global variables**:
```c
int32 global_var; // Zero-initialized (.bss)
int32 initialized = 100; // Explicitly initialized (.data)
```
### Array Declarations
**Local arrays**:
```c
int32 arr[10]; // Uninitialized
int32 nums[5] = { 1, 2, 3, 4, 5 }; // Initialized
uint8 partial[10] = { 1, 2 }; // Rest zero-filled
```
**Global arrays**:
```c
int32 global_arr[100]; // Zero-initialized (.bss)
int32 data[3] = { 10, 20, 30 }; // Initialized (.data)
```
### Pointer Declarations
```c
int32 *ptr; // Pointer to int32
uint8 *str; // Pointer to uint8 (common for strings)
void *generic; // Generic pointer
int32 **pptr; // Pointer to pointer
```
### Type Syntax
```
type_specifier ::= base_type pointer_suffix
base_type ::= "int8" | "int16" | "int32" | "int64"
| "uint8" | "uint16" | "uint32" | "uint64"
| "void"
pointer_suffix ::= ("*")*
```
Examples:
```c
int32 x; // Base type: int32, no pointers
uint8 *s; // Base type: uint8, 1 pointer level
void **pp; // Base type: void, 2 pointer levels
```
---
## 6. Expressions
### Primary Expressions
```c
42 // Integer literal
"string" // String literal
variable // Identifier
(expression) // Parenthesized expression
```
### Postfix Expressions
```c
array[index] // Array subscript
function(args) // Function call
expr++ // Post-increment
expr-- // Post-decrement
```
### Unary Expressions
```c
++expr // Pre-increment
--expr // Pre-decrement
-expr // Negation
!expr // Logical NOT
~expr // Bitwise NOT
&expr // Address-of
*expr // Dereference
(type)expr // Type cast
```
### Binary Expressions
**Arithmetic**:
```c
a + b // Addition
a - b // Subtraction
a * b // Multiplication
a / b // Division
a % b // Modulo
```
**Bitwise**:
```c
a & b // Bitwise AND
a | b // Bitwise OR
a ^ b // Bitwise XOR
a << b // Left shift
a >> b // Right shift (arithmetic for signed, logical for unsigned)
```
**Comparison**:
```c
a == b // Equal
a != b // Not equal
a < b // Less than
a <= b // Less than or equal
a > b // Greater than
a >= b // Greater than or equal
```
**Logical**:
```c
a && b // Logical AND (short-circuit)
a || b // Logical OR (short-circuit)
```
### Assignment Expressions
```c
a = b // Assignment
a += b // Add and assign
a -= b // Subtract and assign
a *= b // Multiply and assign
a /= b // Divide and assign
a %= b // Modulo and assign
a &= b // AND and assign
a |= b // OR and assign
a ^= b // XOR and assign
a <<= b // Left shift and assign
a >>= b // Right shift and assign
```
### Ternary Expression
```c
condition ? true_expr : false_expr
```
Example:
```c
max = (a > b) ? a : b;
```
### Operator Precedence
From highest to lowest:
| Level | Operators | Associativity |
|-------|----------------------------|---------------|
| 1 | `()` `[]` `++` `--` (post) | Left to right |
| 2 | `++` `--` (pre) `+` `-` `!` `~` `&` `*` `(cast)` | Right to left |
| 3 | `*` `/` `%` | Left to right |
| 4 | `+` `-` | Left to right |
| 5 | `<<` `>>` | Left to right |
| 6 | `<` `<=` `>` `>=` | Left to right |
| 7 | `==` `!=` | Left to right |
| 8 | `&` | Left to right |
| 9 | `^` | Left to right |
| 10 | `|` | Left to right |
| 11 | `&&` | Left to right |
| 12 | `||` | Left to right |
| 13 | `?:` | Right to left |
| 14 | `=` `+=` `-=` etc. | Right to left |
### Pointer Arithmetic
```c
int32 *p = arr;
p + 1 // Points to next int32 (address + 4)
p - 1 // Points to previous int32 (address - 4)
p[i] // Equivalent to *(p + i)
```
Pointer arithmetic automatically scales by the size of the pointed-to type:
- `uint8*` increments by 1
- `uint16*` increments by 2
- `int32*` increments by 4
- Any pointer-to-pointer increments by 4
### Type Conversions
**Explicit casting**:
```c
(uint8)value // Truncate to 8 bits
(int32)byte_value // Sign-extend or zero-extend
(uint32*)ptr // Pointer type conversion
```
**Implicit conversions**:
- Arrays decay to pointers
- Smaller integers promote to int32 in expressions
---
## 7. Statements
### Expression Statement
```c
expression;
```
Examples:
```c
x = 42;
function_call();
x++;
```
### Compound Statement (Block)
```c
{
statement1;
statement2;
...
}
```
Blocks create new scopes for local variables.
### If Statement
```c
if (condition)
statement
if (condition)
statement
else
statement
```
Examples:
```c
if (x > 0)
printf("positive\n");
if (x > 0) {
printf("positive\n");
} else if (x < 0) {
printf("negative\n");
} else {
printf("zero\n");
}
```
### While Statement
```c
while (condition)
statement
```
Example:
```c
while (x < 100) {
x = x * 2;
}
```
### For Statement
```c
for (init; condition; increment)
statement
```
The `init` can be:
- Empty: `for (; condition; increment)`
- Expression: `for (x = 0; x < 10; x++)`
- Declaration: `for (int32 i = 0; i < 10; i++)`
Example:
```c
for (int32 i = 0; i < 10; i = i + 1) {
sum = sum + i;
}
```
### Switch Statement
```c
switch (expression) {
case value1:
statements
break;
case value2:
statements
break;
default:
statements
}
```
- Cases must be integer constants
- Fall-through is allowed (no automatic break)
- `default` is optional
Example:
```c
switch (day) {
case 0:
printf("Sunday\n");
break;
case 6:
printf("Saturday\n");
break;
default:
printf("Weekday\n");
}
```
### Break Statement
```c
break;
```
Exits the innermost `while`, `for`, or `switch` statement.
### Continue Statement
```c
continue;
```
Skips to the next iteration of the innermost `while` or `for` loop.
### Return Statement
```c
return; // Return from void function
return expression; // Return value
```
Example:
```c
return 42;
return x + y;
return;
```
---
## 8. Functions
### Function Declarations
```c
return_type function_name(parameter_list);
```
Forward declaration (prototype):
```c
int32 add(int32 a, int32 b);
```
### Function Definitions
```c
return_type function_name(parameter_list) {
statements
}
```
Example:
```c
int32 add(int32 a, int32 b) {
return a + b;
}
```
### Parameters
```c
void no_params(void) { } // No parameters
int32 one_param(int32 x) { } // One parameter
int32 two_params(int32 x, uint8 *s) { } // Multiple parameters
```
Parameters are passed by value. To modify caller's data, use pointers:
```c
void swap(int32 *a, int32 *b) {
int32 temp = *a;
*a = *b;
*b = temp;
}
```
### Return Values
```c
int32 get_value(void) {
return 42;
}
void no_return(void) {
// No return statement needed
return; // Optional
}
```
Return value is passed in `eax` register (32-bit).
### Recursion
Recursion is fully supported:
```c
int32 factorial(int32 n) {
if (n <= 1)
return 1;
return n * factorial(n - 1);
}
```
### Calling Convention
Functions use **cdecl** convention:
- Arguments pushed right-to-left on stack
- Caller cleans up stack
- Return value in `eax`
- `eax`, `ecx`, `edx` are caller-saved
- `ebx`, `esi`, `edi`, `ebp` are callee-saved
### Calling C Functions
Common can call C library functions:
```c
// Declare C functions
void printf(uint8 *format, ...);
void *malloc(uint32 size);
void free(void *ptr);
int32 main(void) {
printf("Hello from Common\n");
void *mem = malloc(100);
free(mem);
return 0;
}
```
**Note**: Variadic functions (`...`) can be declared but not defined in Common.
---
## 9. Scope and Linkage
### Scope Rules
**Global scope**:
- Variables and functions declared outside any function
- Visible to all functions in the file
**Local scope**:
- Variables declared inside a function or block
- Visible only within that function/block
- Shadows global variables with the same name
**Block scope**:
```c
{
int32 x = 1;
{
int32 x = 2; // Different variable, shadows outer x
printf("%d\n", x); // Prints 2
}
printf("%d\n", x); // Prints 1
}
```
### Linkage
**External linkage** (default for functions):
```c
int32 global_function(void) { ... }
```
Symbol is exported (`global` directive in assembly).
**No linkage** (local variables):
```c
void func(void) {
int32 local; // No linkage
}
```
**Static linkage**: Not supported. All functions have external linkage.
### Name Resolution
1. Check local scope (function parameters and locals)
2. Check global scope
3. If not found, assumed to be external symbol
---
## 10. Memory Model
### Stack Layout
```
High Address
+------------------+
| Return address |
+------------------+
| Saved EBP | <-- EBP
+------------------+
| Local variable 1 | EBP - 4
+------------------+
| Local variable 2 | EBP - 8
+------------------+
| ... |
+------------------+
| Array data | (grows down)
+------------------+ <-- ESP
Low Address
```
### Function Call Stack
```c
caller():
push arg2
push arg1
call callee
add esp, 8 // Clean up arguments
callee(arg1, arg2):
push ebp // Save old frame pointer
mov ebp, esp // Set up new frame
sub esp, N // Allocate locals
...
mov esp, ebp // Restore stack
pop ebp
ret
```
Arguments accessed via `[ebp+8]`, `[ebp+12]`, etc.
Locals accessed via `[ebp-4]`, `[ebp-8]`, etc.
### Data Sections
**.text**: Read-only code
```nasm
section .text
function_name:
; assembly code
```
**.data**: Initialized data
```nasm
section .data
global_var: dd 42
string: db "Hello", 0
```
**.bss**: Zero-initialized data
```nasm
section .bss
uninit_var: resd 1
array: resb 100
```
### Size Directives
| Directive | Size | Common Type |
|-----------|-------|----------------|
| `resb`/`db` | 1 byte | uint8/int8 |
| `resw`/`dw` | 2 bytes| uint16/int16 |
| `resd`/`dd` | 4 bytes| uint32/int32/pointers |
| `resq`/`dq` | 8 bytes| uint64/int64 |
### Alignment
- Stack is 16-byte aligned (per System V ABI)
- Local variables are 4-byte aligned
- Arrays follow element alignment
---
## 11. Assembly Interface
### Generated Assembly Structure
```nasm
BITS 32
section .text
; External function declarations
extern printf
extern malloc
; Exported functions
global main
global my_function
main:
push ebp
mov ebp, esp
sub esp, 16 ; Allocate locals
; ... function body ...
mov esp, ebp
pop ebp
ret
section .data
_s0: db "Hello", 0 ; String literal
section .bss
global_var: resd 1 ; Global variable
```
### Register Usage
**Caller-saved** (may be modified by called function):
- `eax` - Return value, scratch
- `ecx` - Scratch, left operand
- `edx` - Scratch, division remainder
**Callee-saved** (preserved across calls):
- `ebx` - Base register
- `esi` - Source index
- `edi` - Destination index
- `ebp` - Frame pointer
- `esp` - Stack pointer
**Common usage**:
- `eax` - Expression results, return values
- `ecx` - Left operand in binary operations
- `[ebp+N]` - Function parameters
- `[ebp-N]` - Local variables
### Calling C from Assembly
```nasm
; Call: printf("Value: %d\n", x);
push dword [ebp-4] ; Push x
push _s0 ; Push format string
call printf
add esp, 8 ; Clean up (2 args × 4 bytes)
```
### Inline Assembly
Not supported. Use C library functions or write separate assembly files.
---
## 12. Limitations
### Language Limitations
1. **Single-file compilation only**
- No `#include` or import mechanism
- All code must be in one source file
- Use forward declarations for ordering
2. **No structures or unions**
- Can simulate with arrays: `node[0]` = data, `node[1]` = next
- Manual offset calculation required
3. **No floating point**
- Integer arithmetic only
- No `float`, `double`, or `long double`
4. **No preprocessor**
- No `#define`, `#ifdef`, etc.
- No macro expansion
- No file inclusion
5. **No enums**
- Use integer constants instead
6. **Limited 64-bit support**
- 64-bit types exist but operations truncate to 32-bit
- Full 64-bit arithmetic not implemented
7. **No static/extern keywords**
- All functions are global
- No static local variables
- No explicit extern declarations
8. **Single-dimensional arrays only**
- Multidimensional arrays not supported
- Can use pointer arithmetic for 2D: `arr[i * width + j]`
9. **No goto**
- Use loops and breaks instead
10. **No comma operator**
- Cannot use `a = (b, c)`
### Implementation Limitations
1. **Fixed buffer sizes**
- 256 identifiers/strings
- 256 local variables per function
- 256 global variables total
- 512 string literals
2. **No optimization**
- Generated code is unoptimized
- Expressions fully evaluated (no constant folding)
3. **Limited error messages**
- Basic syntax errors reported
- No semantic analysis warnings
- No type mismatch warnings
4. **x86-32 only**
- Not portable to other architectures
- Requires 32-bit toolchain
### Workarounds
**Structures**: Use arrays
```c
// Instead of: struct { int x; int y; } point;
int32 point[2]; // point[0] = x, point[1] = y
```
**Multidimensional arrays**: Manual indexing
```c
// Instead of: int matrix[10][10];
int32 matrix[100];
int32 value = matrix[row * 10 + col];
```
**Enums**: Integer constants
```c
// Instead of: enum { RED, GREEN, BLUE };
int32 RED = 0;
int32 GREEN = 1;
int32 BLUE = 2;
```
---
## 13. Examples
### Hello World
```c
void puts(uint8 *s);
int32 main(void) {
puts("Hello, World!");
return 0;
}
```
### Factorial (Iterative)
```c
int32 factorial(int32 n) {
int32 result = 1;
for (int32 i = 2; i <= n; i = i + 1) {
result = result * i;
}
return result;
}
```
### Fibonacci (Recursive)
```c
int32 fib(int32 n) {
if (n <= 1)
return n;
return fib(n - 1) + fib(n - 2);
}
```
### String Length
```c
int32 strlen(uint8 *s) {
int32 len = 0;
while (s[len])
len = len + 1;
return len;
}
```
### Array Sum
```c
int32 sum_array(int32 *arr, int32 len) {
int32 total = 0;
for (int32 i = 0; i < len; i = i + 1) {
total = total + arr[i];
}
return total;
}
```
### Pointer Swap
```c
void swap(int32 *a, int32 *b) {
int32 temp = *a;
*a = *b;
*b = temp;
}
```
### Bubble Sort
```c
void bubble_sort(int32 *arr, int32 n) {
for (int32 i = 0; i < n - 1; i = i + 1) {
for (int32 j = 0; j < n - i - 1; j = j + 1) {
if (arr[j] > arr[j + 1]) {
int32 temp = arr[j];
arr[j] = arr[j + 1];
arr[j + 1] = temp;
}
}
}
}
```
### Binary Search
```c
int32 binary_search(int32 *arr, int32 n, int32 target) {
int32 left = 0;
int32 right = n - 1;
while (left <= right) {
int32 mid = left + (right - left) / 2;
if (arr[mid] == target)
return mid;
if (arr[mid] < target)
left = mid + 1;
else
right = mid - 1;
}
return -1; // Not found
}
```
### Linked List (Simulated)
```c
void *malloc(uint32 size);
void free(void *ptr);
// Node: [0] = data, [1] = next pointer
int32 *create_node(int32 value) {
int32 *node = (int32*)malloc(8);
node[0] = value;
node[1] = 0;
return node;
}
void insert_front(int32 **head, int32 value) {
int32 *new_node = create_node(value);
new_node[1] = (int32)(*head);
*head = new_node;
}
```
### Bitwise Operations
```c
// Check if bit N is set
int32 is_bit_set(uint32 value, int32 n) {
return (value >> n) & 1;
}
// Set bit N
uint32 set_bit(uint32 value, int32 n) {
return value | (1 << n);
}
// Clear bit N
uint32 clear_bit(uint32 value, int32 n) {
return value & ~(1 << n);
}
// Toggle bit N
uint32 toggle_bit(uint32 value, int32 n) {
return value ^ (1 << n);
}
```
---
## Appendix A: Grammar Summary
```
program ::= declaration*
declaration ::=
| type_spec identifier "(" param_list ")" ( ";" | block )
| type_spec identifier ";"
| type_spec identifier "=" expr ";"
| type_spec identifier "[" expr "]" ";"
| type_spec identifier "[" expr "]" "=" "{" expr_list "}" ";"
type_spec ::= base_type "*"*
base_type ::= "void" | "int8" | "int16" | "int32" | "int64"
| "uint8" | "uint16" | "uint32" | "uint64"
param_list ::= "void" | ( param ( "," param )* )?
param ::= type_spec identifier
block ::= "{" statement* "}"
statement ::=
| block
| type_spec identifier ";"
| type_spec identifier "=" expr ";"
| type_spec identifier "[" expr "]" ( "=" "{" expr_list "}" )? ";"
| expr ";"
| "if" "(" expr ")" statement ( "else" statement )?
| "while" "(" expr ")" statement
| "for" "(" (decl | expr)? ";" expr? ";" expr? ")" statement
| "switch" "(" expr ")" "{" case_clause* "}"
| "return" expr? ";"
| "break" ";"
| "continue" ";"
case_clause ::=
| "case" expr ":" statement*
| "default" ":" statement*
expr ::= assignment
assignment ::= ternary ( assign_op ternary )?
assign_op ::= "=" | "+=" | "-=" | "*=" | "/=" | "%="
| "&=" | "|=" | "^=" | "<<=" | ">>="
ternary ::= logical_or ( "?" expr ":" ternary )?
logical_or ::= logical_and ( "||" logical_and )*
logical_and ::= bit_or ( "&&" bit_or )*
bit_or ::= bit_xor ( "|" bit_xor )*
bit_xor ::= bit_and ( "^" bit_and )*
bit_and ::= equality ( "&" equality )*
equality ::= relational ( ("==" | "!=") relational )*
relational ::= shift ( ("<" | "<=" | ">" | ">=") shift )*
shift ::= additive ( ("<<" | ">>") additive )*
additive ::= multiplicative ( ("+" | "-") multiplicative )*
multiplicative ::= unary ( ("*" | "/" | "%") unary )*
unary ::=
| postfix
| "++" unary
| "--" unary
| "-" unary
| "!" unary
| "~" unary
| "&" unary
| "*" unary
| "(" type_spec ")" unary
postfix ::=
| primary
| postfix "[" expr "]"
| postfix "(" expr_list? ")"
| postfix "++"
| postfix "--"
primary ::=
| integer_literal
| string_literal
| identifier
| "(" expr ")"
```
---
## Appendix B: Quick Reference Card
**Types**: void, int8, int16, int32, int64, uint8, uint16, uint32, uint64
**Operators**: + - * / % & | ^ ~ ! < > <= >= == != << >> && || ?: = += -= *= /= %= &= |= ^= <<= >>= ++ -- & * []
**Keywords**: if else while for switch case default break continue return
**Control**: if/else, while, for, switch/case, break, continue, return
**Functions**: type name(params) { body }
**Arrays**: type name[size], type name[size] = { values }
**Pointers**: type *name, &var, *ptr, ptr[index]
**Comments**: // line, /* block */
---
*End of Common Language Reference Manual*