24 KiB
Common Language Reference Manual
Version 1.0
Target: x86-32 (IA-32) Linux ELF
Calling Convention: cdecl
Author: Common Compiler Project
Table of Contents
- Introduction
- Compiler Usage
- Lexical Elements
- Type System
- Declarations
- Expressions
- Statements
- Functions
- Scope and Linkage
- Memory Model
- Assembly Interface
- Limitations
- Examples
1. Introduction
Common is a statically-typed, imperative programming language that compiles to x86-32 assembly (NASM syntax). It provides a minimal yet complete set of features for systems programming:
- Integer types from 8 to 64 bits
- Pointers and arrays
- Functions with parameters
- Control flow (if, while, for, switch)
- Full operator set (arithmetic, logical, bitwise)
- Direct C library interoperability
Design Philosophy
- No runtime dependencies: Compiled programs link only against libc
- Explicit control: No hidden allocations or implicit conversions
- Predictable output: Direct mapping to assembly
- C compatibility: Can call and be called by C code
2. Compiler Usage
Building the Compiler
gcc -o common common.c
Compiling Programs
# Compile Common source to NASM assembly
./common source.cm output.asm
# Assemble to object file
nasm -f elf32 output.asm -o output.o
# Link (requires 32-bit support)
gcc -m32 output.o -o executable
One-Line Compilation
./common source.cm output.asm && nasm -f elf32 output.asm && gcc -m32 output.o -o program
Compiler Output
The compiler writes NASM x86-32 assembly to stdout (or specified file) using:
- ELF32 object format
- cdecl calling convention
- Sections:
.text,.data,.bss
Error Reporting
Errors are reported to stderr with line numbers:
line 42: syntax error near 'token'
line 15: Unknown char '~'
3. Lexical Elements
Comments
// Single-line comment (C++ style)
/* Multi-line comment
spanning multiple lines */
Comments are stripped during lexical analysis.
Keywords
if else while for switch case default
break continue return
void uint8 uint16 uint32 uint64
int8 int16 int32 int64
Identifiers
[a-zA-Z_][a-zA-Z0-9_]*
- Must start with letter or underscore
- Case-sensitive
- No length limit (internal buffer: 256 chars)
Integer Literals
42 // Decimal
0x2A // Hexadecimal
052 // Octal
0b101010 // Binary (if supported by strtoul)
Literals are parsed by strtoul() with base 0 (auto-detect).
String Literals
"Hello, World!"
"Line 1\nLine 2"
"Tab\there"
Supported escape sequences:
\n- newline\t- tab\r- carriage return\0- null character\\- backslash\"- quote- Any other
\x- literalx
String literals are null-terminated and stored in .data section.
Operators and Punctuation
Multi-character operators:
== != <= >= && || << >> ++ --
+= -= *= /= %= &= |= ^= <<= >>=
Single-character operators:
+ - * / % & | ^ ~ ! < > =
Punctuation:
( ) { } [ ] ; , : ?
4. Type System
Integer Types
| Type | Size | Range (Unsigned) | Range (Signed) |
|---|---|---|---|
| uint8 | 1 byte | 0 to 255 | - |
| int8 | 1 byte | - | -128 to 127 |
| uint16 | 2 bytes | 0 to 65,535 | - |
| int16 | 2 bytes | - | -32,768 to 32,767 |
| uint32 | 4 bytes | 0 to 4,294,967,295 | - |
| int32 | 4 bytes | - | -2,147,483,648 to 2,147,483,647 |
| uint64 | 8 bytes | 0 to 2^64-1 | - |
| int64 | 8 bytes | - | -2^63 to 2^63-1 |
Note: 64-bit types are partially supported. They occupy 8 bytes in memory but arithmetic operations truncate to 32 bits on x86-32.
Void Type
void
- Used only for function return types
- Cannot declare variables of type void
voidin parameter list means "no parameters"
Pointer Types
int32 *ptr; // Pointer to int32
uint8 **pptr; // Pointer to pointer to uint8
void *generic; // Generic pointer (4 bytes)
- All pointers are 4 bytes (32-bit addresses)
- Pointer arithmetic scales by pointee size
- Can be cast between types
Array Types
int32 arr[10]; // Array of 10 int32
uint8 matrix[5][5]; // Not supported (single dimension only)
Arrays:
- Decay to pointers when used in expressions
- Cannot be returned from functions
- Cannot be assigned (use element-wise copy)
Type Qualifiers
Common has no type qualifiers (no const, volatile, restrict).
5. Declarations
Variable Declarations
Local variables:
int32 x; // Uninitialized
int32 y = 42; // Initialized
uint8 c = 'A'; // Character (just an int)
Global variables:
int32 global_var; // Zero-initialized (.bss)
int32 initialized = 100; // Explicitly initialized (.data)
Array Declarations
Local arrays:
int32 arr[10]; // Uninitialized
int32 nums[5] = { 1, 2, 3, 4, 5 }; // Initialized
uint8 partial[10] = { 1, 2 }; // Rest zero-filled
Global arrays:
int32 global_arr[100]; // Zero-initialized (.bss)
int32 data[3] = { 10, 20, 30 }; // Initialized (.data)
Pointer Declarations
int32 *ptr; // Pointer to int32
uint8 *str; // Pointer to uint8 (common for strings)
void *generic; // Generic pointer
int32 **pptr; // Pointer to pointer
Type Syntax
type_specifier ::= base_type pointer_suffix
base_type ::= "int8" | "int16" | "int32" | "int64"
| "uint8" | "uint16" | "uint32" | "uint64"
| "void"
pointer_suffix ::= ("*")*
Examples:
int32 x; // Base type: int32, no pointers
uint8 *s; // Base type: uint8, 1 pointer level
void **pp; // Base type: void, 2 pointer levels
6. Expressions
Primary Expressions
42 // Integer literal
"string" // String literal
variable // Identifier
(expression) // Parenthesized expression
Postfix Expressions
array[index] // Array subscript
function(args) // Function call
expr++ // Post-increment
expr-- // Post-decrement
Unary Expressions
++expr // Pre-increment
--expr // Pre-decrement
-expr // Negation
!expr // Logical NOT
~expr // Bitwise NOT
&expr // Address-of
*expr // Dereference
(type)expr // Type cast
Binary Expressions
Arithmetic:
a + b // Addition
a - b // Subtraction
a * b // Multiplication
a / b // Division
a % b // Modulo
Bitwise:
a & b // Bitwise AND
a | b // Bitwise OR
a ^ b // Bitwise XOR
a << b // Left shift
a >> b // Right shift (arithmetic for signed, logical for unsigned)
Comparison:
a == b // Equal
a != b // Not equal
a < b // Less than
a <= b // Less than or equal
a > b // Greater than
a >= b // Greater than or equal
Logical:
a && b // Logical AND (short-circuit)
a || b // Logical OR (short-circuit)
Assignment Expressions
a = b // Assignment
a += b // Add and assign
a -= b // Subtract and assign
a *= b // Multiply and assign
a /= b // Divide and assign
a %= b // Modulo and assign
a &= b // AND and assign
a |= b // OR and assign
a ^= b // XOR and assign
a <<= b // Left shift and assign
a >>= b // Right shift and assign
Ternary Expression
condition ? true_expr : false_expr
Example:
max = (a > b) ? a : b;
Operator Precedence
From highest to lowest:
| Level | Operators | Associativity |
|---|---|---|
| 1 | () [] ++ -- (post) |
Left to right |
| 2 | ++ -- (pre) + - ! ~ & * (cast) |
Right to left |
| 3 | * / % |
Left to right |
| 4 | + - |
Left to right |
| 5 | << >> |
Left to right |
| 6 | < <= > >= |
Left to right |
| 7 | == != |
Left to right |
| 8 | & |
Left to right |
| 9 | ^ |
Left to right |
| 10 | ` | ` |
| 11 | && |
Left to right |
| 12 | ` | |
| 13 | ?: |
Right to left |
| 14 | = += -= etc. |
Right to left |
Pointer Arithmetic
int32 *p = arr;
p + 1 // Points to next int32 (address + 4)
p - 1 // Points to previous int32 (address - 4)
p[i] // Equivalent to *(p + i)
Pointer arithmetic automatically scales by the size of the pointed-to type:
uint8*increments by 1uint16*increments by 2int32*increments by 4- Any pointer-to-pointer increments by 4
Type Conversions
Explicit casting:
(uint8)value // Truncate to 8 bits
(int32)byte_value // Sign-extend or zero-extend
(uint32*)ptr // Pointer type conversion
Implicit conversions:
- Arrays decay to pointers
- Smaller integers promote to int32 in expressions
7. Statements
Expression Statement
expression;
Examples:
x = 42;
function_call();
x++;
Compound Statement (Block)
{
statement1;
statement2;
...
}
Blocks create new scopes for local variables.
If Statement
if (condition)
statement
if (condition)
statement
else
statement
Examples:
if (x > 0)
printf("positive\n");
if (x > 0) {
printf("positive\n");
} else if (x < 0) {
printf("negative\n");
} else {
printf("zero\n");
}
While Statement
while (condition)
statement
Example:
while (x < 100) {
x = x * 2;
}
For Statement
for (init; condition; increment)
statement
The init can be:
- Empty:
for (; condition; increment) - Expression:
for (x = 0; x < 10; x++) - Declaration:
for (int32 i = 0; i < 10; i++)
Example:
for (int32 i = 0; i < 10; i = i + 1) {
sum = sum + i;
}
Switch Statement
switch (expression) {
case value1:
statements
break;
case value2:
statements
break;
default:
statements
}
- Cases must be integer constants
- Fall-through is allowed (no automatic break)
defaultis optional
Example:
switch (day) {
case 0:
printf("Sunday\n");
break;
case 6:
printf("Saturday\n");
break;
default:
printf("Weekday\n");
}
Break Statement
break;
Exits the innermost while, for, or switch statement.
Continue Statement
continue;
Skips to the next iteration of the innermost while or for loop.
Return Statement
return; // Return from void function
return expression; // Return value
Example:
return 42;
return x + y;
return;
8. Functions
Function Declarations
return_type function_name(parameter_list);
Forward declaration (prototype):
int32 add(int32 a, int32 b);
Function Definitions
return_type function_name(parameter_list) {
statements
}
Example:
int32 add(int32 a, int32 b) {
return a + b;
}
Parameters
void no_params(void) { } // No parameters
int32 one_param(int32 x) { } // One parameter
int32 two_params(int32 x, uint8 *s) { } // Multiple parameters
Parameters are passed by value. To modify caller's data, use pointers:
void swap(int32 *a, int32 *b) {
int32 temp = *a;
*a = *b;
*b = temp;
}
Return Values
int32 get_value(void) {
return 42;
}
void no_return(void) {
// No return statement needed
return; // Optional
}
Return value is passed in eax register (32-bit).
Recursion
Recursion is fully supported:
int32 factorial(int32 n) {
if (n <= 1)
return 1;
return n * factorial(n - 1);
}
Calling Convention
Functions use cdecl convention:
- Arguments pushed right-to-left on stack
- Caller cleans up stack
- Return value in
eax eax,ecx,edxare caller-savedebx,esi,edi,ebpare callee-saved
Calling C Functions
Common can call C library functions:
// Declare C functions
void printf(uint8 *format, ...);
void *malloc(uint32 size);
void free(void *ptr);
int32 main(void) {
printf("Hello from Common\n");
void *mem = malloc(100);
free(mem);
return 0;
}
Note: Variadic functions (...) can be declared but not defined in Common.
9. Scope and Linkage
Scope Rules
Global scope:
- Variables and functions declared outside any function
- Visible to all functions in the file
Local scope:
- Variables declared inside a function or block
- Visible only within that function/block
- Shadows global variables with the same name
Block scope:
{
int32 x = 1;
{
int32 x = 2; // Different variable, shadows outer x
printf("%d\n", x); // Prints 2
}
printf("%d\n", x); // Prints 1
}
Linkage
External linkage (default for functions):
int32 global_function(void) { ... }
Symbol is exported (global directive in assembly).
No linkage (local variables):
void func(void) {
int32 local; // No linkage
}
Static linkage: Not supported. All functions have external linkage.
Name Resolution
- Check local scope (function parameters and locals)
- Check global scope
- If not found, assumed to be external symbol
10. Memory Model
Stack Layout
High Address
+------------------+
| Return address |
+------------------+
| Saved EBP | <-- EBP
+------------------+
| Local variable 1 | EBP - 4
+------------------+
| Local variable 2 | EBP - 8
+------------------+
| ... |
+------------------+
| Array data | (grows down)
+------------------+ <-- ESP
Low Address
Function Call Stack
caller():
push arg2
push arg1
call callee
add esp, 8 // Clean up arguments
callee(arg1, arg2):
push ebp // Save old frame pointer
mov ebp, esp // Set up new frame
sub esp, N // Allocate locals
...
mov esp, ebp // Restore stack
pop ebp
ret
Arguments accessed via [ebp+8], [ebp+12], etc.
Locals accessed via [ebp-4], [ebp-8], etc.
Data Sections
.text: Read-only code
section .text
function_name:
; assembly code
.data: Initialized data
section .data
global_var: dd 42
string: db "Hello", 0
.bss: Zero-initialized data
section .bss
uninit_var: resd 1
array: resb 100
Size Directives
| Directive | Size | Common Type |
|---|---|---|
resb/db |
1 byte | uint8/int8 |
resw/dw |
2 bytes | uint16/int16 |
resd/dd |
4 bytes | uint32/int32/pointers |
resq/dq |
8 bytes | uint64/int64 |
Alignment
- Stack is 16-byte aligned (per System V ABI)
- Local variables are 4-byte aligned
- Arrays follow element alignment
11. Assembly Interface
Generated Assembly Structure
BITS 32
section .text
; External function declarations
extern printf
extern malloc
; Exported functions
global main
global my_function
main:
push ebp
mov ebp, esp
sub esp, 16 ; Allocate locals
; ... function body ...
mov esp, ebp
pop ebp
ret
section .data
_s0: db "Hello", 0 ; String literal
section .bss
global_var: resd 1 ; Global variable
Register Usage
Caller-saved (may be modified by called function):
eax- Return value, scratchecx- Scratch, left operandedx- Scratch, division remainder
Callee-saved (preserved across calls):
ebx- Base registeresi- Source indexedi- Destination indexebp- Frame pointeresp- Stack pointer
Common usage:
eax- Expression results, return valuesecx- Left operand in binary operations[ebp+N]- Function parameters[ebp-N]- Local variables
Calling C from Assembly
; Call: printf("Value: %d\n", x);
push dword [ebp-4] ; Push x
push _s0 ; Push format string
call printf
add esp, 8 ; Clean up (2 args × 4 bytes)
Inline Assembly
Not supported. Use C library functions or write separate assembly files.
12. Limitations
Language Limitations
-
Single-file compilation only
- No
#includeor import mechanism - All code must be in one source file
- Use forward declarations for ordering
- No
-
No structures or unions
- Can simulate with arrays:
node[0]= data,node[1]= next - Manual offset calculation required
- Can simulate with arrays:
-
No floating point
- Integer arithmetic only
- No
float,double, orlong double
-
No preprocessor
- No
#define,#ifdef, etc. - No macro expansion
- No file inclusion
- No
-
No enums
- Use integer constants instead
-
Limited 64-bit support
- 64-bit types exist but operations truncate to 32-bit
- Full 64-bit arithmetic not implemented
-
No static/extern keywords
- All functions are global
- No static local variables
- No explicit extern declarations
-
Single-dimensional arrays only
- Multidimensional arrays not supported
- Can use pointer arithmetic for 2D:
arr[i * width + j]
-
No goto
- Use loops and breaks instead
-
No comma operator
- Cannot use
a = (b, c)
- Cannot use
Implementation Limitations
-
Fixed buffer sizes
- 256 identifiers/strings
- 256 local variables per function
- 256 global variables total
- 512 string literals
-
No optimization
- Generated code is unoptimized
- Expressions fully evaluated (no constant folding)
-
Limited error messages
- Basic syntax errors reported
- No semantic analysis warnings
- No type mismatch warnings
-
x86-32 only
- Not portable to other architectures
- Requires 32-bit toolchain
Workarounds
Structures: Use arrays
// Instead of: struct { int x; int y; } point;
int32 point[2]; // point[0] = x, point[1] = y
Multidimensional arrays: Manual indexing
// Instead of: int matrix[10][10];
int32 matrix[100];
int32 value = matrix[row * 10 + col];
Enums: Integer constants
// Instead of: enum { RED, GREEN, BLUE };
int32 RED = 0;
int32 GREEN = 1;
int32 BLUE = 2;
13. Examples
Hello World
void puts(uint8 *s);
int32 main(void) {
puts("Hello, World!");
return 0;
}
Factorial (Iterative)
int32 factorial(int32 n) {
int32 result = 1;
for (int32 i = 2; i <= n; i = i + 1) {
result = result * i;
}
return result;
}
Fibonacci (Recursive)
int32 fib(int32 n) {
if (n <= 1)
return n;
return fib(n - 1) + fib(n - 2);
}
String Length
int32 strlen(uint8 *s) {
int32 len = 0;
while (s[len])
len = len + 1;
return len;
}
Array Sum
int32 sum_array(int32 *arr, int32 len) {
int32 total = 0;
for (int32 i = 0; i < len; i = i + 1) {
total = total + arr[i];
}
return total;
}
Pointer Swap
void swap(int32 *a, int32 *b) {
int32 temp = *a;
*a = *b;
*b = temp;
}
Bubble Sort
void bubble_sort(int32 *arr, int32 n) {
for (int32 i = 0; i < n - 1; i = i + 1) {
for (int32 j = 0; j < n - i - 1; j = j + 1) {
if (arr[j] > arr[j + 1]) {
int32 temp = arr[j];
arr[j] = arr[j + 1];
arr[j + 1] = temp;
}
}
}
}
Binary Search
int32 binary_search(int32 *arr, int32 n, int32 target) {
int32 left = 0;
int32 right = n - 1;
while (left <= right) {
int32 mid = left + (right - left) / 2;
if (arr[mid] == target)
return mid;
if (arr[mid] < target)
left = mid + 1;
else
right = mid - 1;
}
return -1; // Not found
}
Linked List (Simulated)
void *malloc(uint32 size);
void free(void *ptr);
// Node: [0] = data, [1] = next pointer
int32 *create_node(int32 value) {
int32 *node = (int32*)malloc(8);
node[0] = value;
node[1] = 0;
return node;
}
void insert_front(int32 **head, int32 value) {
int32 *new_node = create_node(value);
new_node[1] = (int32)(*head);
*head = new_node;
}
Bitwise Operations
// Check if bit N is set
int32 is_bit_set(uint32 value, int32 n) {
return (value >> n) & 1;
}
// Set bit N
uint32 set_bit(uint32 value, int32 n) {
return value | (1 << n);
}
// Clear bit N
uint32 clear_bit(uint32 value, int32 n) {
return value & ~(1 << n);
}
// Toggle bit N
uint32 toggle_bit(uint32 value, int32 n) {
return value ^ (1 << n);
}
Appendix A: Grammar Summary
program ::= declaration*
declaration ::=
| type_spec identifier "(" param_list ")" ( ";" | block )
| type_spec identifier ";"
| type_spec identifier "=" expr ";"
| type_spec identifier "[" expr "]" ";"
| type_spec identifier "[" expr "]" "=" "{" expr_list "}" ";"
type_spec ::= base_type "*"*
base_type ::= "void" | "int8" | "int16" | "int32" | "int64"
| "uint8" | "uint16" | "uint32" | "uint64"
param_list ::= "void" | ( param ( "," param )* )?
param ::= type_spec identifier
block ::= "{" statement* "}"
statement ::=
| block
| type_spec identifier ";"
| type_spec identifier "=" expr ";"
| type_spec identifier "[" expr "]" ( "=" "{" expr_list "}" )? ";"
| expr ";"
| "if" "(" expr ")" statement ( "else" statement )?
| "while" "(" expr ")" statement
| "for" "(" (decl | expr)? ";" expr? ";" expr? ")" statement
| "switch" "(" expr ")" "{" case_clause* "}"
| "return" expr? ";"
| "break" ";"
| "continue" ";"
case_clause ::=
| "case" expr ":" statement*
| "default" ":" statement*
expr ::= assignment
assignment ::= ternary ( assign_op ternary )?
assign_op ::= "=" | "+=" | "-=" | "*=" | "/=" | "%="
| "&=" | "|=" | "^=" | "<<=" | ">>="
ternary ::= logical_or ( "?" expr ":" ternary )?
logical_or ::= logical_and ( "||" logical_and )*
logical_and ::= bit_or ( "&&" bit_or )*
bit_or ::= bit_xor ( "|" bit_xor )*
bit_xor ::= bit_and ( "^" bit_and )*
bit_and ::= equality ( "&" equality )*
equality ::= relational ( ("==" | "!=") relational )*
relational ::= shift ( ("<" | "<=" | ">" | ">=") shift )*
shift ::= additive ( ("<<" | ">>") additive )*
additive ::= multiplicative ( ("+" | "-") multiplicative )*
multiplicative ::= unary ( ("*" | "/" | "%") unary )*
unary ::=
| postfix
| "++" unary
| "--" unary
| "-" unary
| "!" unary
| "~" unary
| "&" unary
| "*" unary
| "(" type_spec ")" unary
postfix ::=
| primary
| postfix "[" expr "]"
| postfix "(" expr_list? ")"
| postfix "++"
| postfix "--"
primary ::=
| integer_literal
| string_literal
| identifier
| "(" expr ")"
Appendix B: Quick Reference Card
Types: void, int8, int16, int32, int64, uint8, uint16, uint32, uint64
Operators: + - * / % & | ^ ~ ! < > <= >= == != << >> && || ?: = += -= *= /= %= &= |= ^= <<= >>= ++ -- & * []
Keywords: if else while for switch case default break continue return
Control: if/else, while, for, switch/case, break, continue, return
Functions: type name(params) { body }
Arrays: type name[size], type name[size] = { values }
Pointers: type *name, &var, *ptr, ptr[index]
Comments: // line, /* block */
End of Common Language Reference Manual