common/MANUAL.md
2026-03-14 14:14:37 -04:00


Common Language Reference Manual

Version 1.0
Target: x86-32 (IA-32) Linux ELF
Calling Convention: cdecl
Author: Common Compiler Project


Table of Contents

  1. Introduction
  2. Compiler Usage
  3. Lexical Elements
  4. Type System
  5. Declarations
  6. Expressions
  7. Statements
  8. Functions
  9. Scope and Linkage
  10. Memory Model
  11. Assembly Interface
  12. Limitations
  13. Examples
  Appendix A: Grammar Summary
  Appendix B: Quick Reference Card

1. Introduction

Common is a statically-typed, imperative programming language that compiles to x86-32 assembly (NASM syntax). It provides a minimal yet complete set of features for systems programming:

  • Integer types from 8 to 64 bits
  • Pointers and arrays
  • Functions with parameters
  • Control flow (if, while, for, switch)
  • Full operator set (arithmetic, logical, bitwise)
  • Direct C library interoperability

Design Philosophy

  • Minimal runtime: Compiled programs link only against libc
  • Explicit control: No hidden allocations; implicit conversions are limited to array decay and integer promotion
  • Predictable output: Direct mapping to assembly
  • C compatibility: Can call and be called by C code

2. Compiler Usage

Building the Compiler

gcc -o common common.c

Compiling Programs

# Compile Common source to NASM assembly
./common source.cm output.asm

# Assemble to object file
nasm -f elf32 output.asm -o output.o

# Link (requires 32-bit support)
gcc -m32 output.o -o executable

One-Line Compilation

./common source.cm output.asm && nasm -f elf32 output.asm && gcc -m32 output.o -o program

Compiler Output

The compiler writes NASM x86-32 assembly to the specified output file (or to stdout). The output assumes:

  • ELF32 object format
  • cdecl calling convention
  • Sections: .text, .data, .bss

Error Reporting

Errors are reported to stderr with line numbers:

line 42: syntax error near 'token'
line 15: Unknown char '@'

3. Lexical Elements

Comments

// Single-line comment (C++ style)

/* Multi-line comment
   spanning multiple lines */

Comments are stripped during lexical analysis.

Keywords

if       else     while    for      switch   case     default
break    continue return
void     uint8    uint16   uint32   uint64
         int8     int16    int32    int64

Identifiers

[a-zA-Z_][a-zA-Z0-9_]*
  • Must start with letter or underscore
  • Case-sensitive
  • Length is limited only by the lexer's 256-character internal buffer
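The identifier pattern can be checked one character at a time. The following C sketch is illustrative only: `is_identifier` is a hypothetical helper, not a function from common.c, and it ignores both the buffer limit and keyword classification.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper: true if s matches [a-zA-Z_][a-zA-Z0-9_]*.
   Keywords such as "while" also match; a lexer would classify those separately. */
bool is_identifier(const char *s)
{
    /* First character: letter or underscore (an empty string fails here) */
    if (!(isalpha((unsigned char)s[0]) || s[0] == '_'))
        return false;
    /* Remaining characters: letters, digits, or underscores */
    for (const char *p = s + 1; *p != '\0'; p++) {
        if (!(isalnum((unsigned char)*p) || *p == '_'))
            return false;
    }
    return true;
}
```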

Integer Literals

42          // Decimal
0x2A        // Hexadecimal
052         // Octal
0b101010    // Binary (if supported by strtoul)

Literals are parsed by strtoul() with base 0 (auto-detect).
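Since literal parsing is delegated to the C library, its behavior can be reproduced directly in C. `parse_literal` is a hypothetical wrapper around the strtoul() call described above.

```c
#include <stdlib.h>

/* strtoul() with base 0 picks the base from the prefix:
   "0x"/"0X" = hexadecimal, a leading "0" = octal, otherwise decimal. */
unsigned long parse_literal(const char *text)
{
    return strtoul(text, NULL, 0);
}
```

All three of "42", "0x2A", and "052" yield 42. A C library without 0b support stops parsing at the `b`, so "0b101010" yields 0. Libraries that implement the C23 binary prefix return 42.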

String Literals

"Hello, World!"
"Line 1\nLine 2"
"Tab\there"

Supported escape sequences:

  • \n - newline
  • \t - tab
  • \r - carriage return
  • \0 - null character
  • \\ - backslash
  • \" - quote
  • \c for any other character c - the character c itself (for example, \q yields q)

String literals are null-terminated and stored in the .data section.
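A decoder for the escape rules above could look like the following C sketch. `decode_escapes` is a hypothetical name. The sketch also assumes that a backslash at the very end of the input is kept as a literal backslash, which the manual does not specify.

```c
#include <stddef.h>

/* Decodes src into dst (which must hold at least strlen(src) + 1 bytes)
   and returns the decoded length. The length is returned because a \0
   escape embeds a NUL byte in the output. */
size_t decode_escapes(const char *src, char *dst)
{
    size_t n = 0;
    while (*src != '\0') {
        char c = *src++;
        if (c == '\\' && *src != '\0') {
            char e = *src++;
            switch (e) {
            case 'n': c = '\n'; break;
            case 't': c = '\t'; break;
            case 'r': c = '\r'; break;
            case '0': c = '\0'; break;
            default:  c = e;    break; /* \\, \" and any other \c yield c */
            }
        }
        dst[n++] = c;
    }
    dst[n] = '\0'; /* terminating NUL, as stored in .data */
    return n;
}
```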

Operators and Punctuation

Multi-character operators:

==  !=  <=  >=  &&  ||  <<  >>  ++  --
+=  -=  *=  /=  %=  &=  |=  ^=  <<=  >>=

Single-character operators:

+  -  *  /  %  &  |  ^  ~  !  <  >  =

Punctuation:

(  )  {  }  [  ]  ;  ,  :  ?

4. Type System

Integer Types

Type     Size      Range
uint8    1 byte    0 to 255
int8     1 byte    -128 to 127
uint16   2 bytes   0 to 65,535
int16    2 bytes   -32,768 to 32,767
uint32   4 bytes   0 to 4,294,967,295
int32    4 bytes   -2,147,483,648 to 2,147,483,647
uint64   8 bytes   0 to 2^64-1
int64    8 bytes   -2^63 to 2^63-1

Note: 64-bit types are partially supported. They occupy 8 bytes in memory but arithmetic operations truncate to 32 bits on x86-32.
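The truncation can be modeled in C by comparing a full 64-bit addition with one that keeps only the low 32 bits. Both helpers are hypothetical. They model the documented result, not the instructions the compiler actually emits.

```c
#include <stdint.h>

/* Full 64-bit addition, as a C compiler performs it. */
uint64_t add_full64(uint64_t a, uint64_t b)
{
    return a + b;
}

/* Model of the documented limitation: the operands occupy 8 bytes,
   but only the low 32 bits of the result survive. */
uint32_t add_truncated32(uint64_t a, uint64_t b)
{
    return (uint32_t)(a + b);
}
```

With a = 0xFFFFFFFF and b = 1, the full sum is 0x100000000, but the truncated result is 0.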

Void Type

void
  • Used as a function return type, as (void) for an empty parameter list, and as the base type of generic pointers (void *)
  • Cannot declare variables of plain type void

Pointer Types

int32 *ptr;        // Pointer to int32
uint8 **pptr;      // Pointer to pointer to uint8
void *generic;     // Generic pointer (4 bytes)
  • All pointers are 4 bytes (32-bit addresses)
  • Pointer arithmetic scales by pointee size
  • Can be cast between types

Array Types

int32 arr[10];           // Array of 10 int32
uint8 matrix[5][5];      // Not supported (single dimension only)

Arrays:

  • Decay to pointers when used in expressions
  • Cannot be returned from functions
  • Cannot be assigned (use element-wise copy)

Type Qualifiers

Common has no type qualifiers (no const, volatile, restrict).


5. Declarations

Variable Declarations

Local variables:

int32 x;              // Uninitialized
int32 y = 42;         // Initialized
uint8 c = 'A';        // Character (just an int)

Global variables:

int32 global_var;           // Zero-initialized (.bss)
int32 initialized = 100;    // Explicitly initialized (.data)

Array Declarations

Local arrays:

int32 arr[10];                          // Uninitialized
int32 nums[5] = { 1, 2, 3, 4, 5 };     // Initialized
uint8 partial[10] = { 1, 2 };          // Rest zero-filled

Global arrays:

int32 global_arr[100];                  // Zero-initialized (.bss)
int32 data[3] = { 10, 20, 30 };        // Initialized (.data)

Pointer Declarations

int32 *ptr;              // Pointer to int32
uint8 *str;              // Pointer to uint8 (common for strings)
void *generic;           // Generic pointer
int32 **pptr;            // Pointer to pointer

Type Syntax

type_specifier ::= base_type pointer_suffix
base_type      ::= "int8" | "int16" | "int32" | "int64"
                 | "uint8" | "uint16" | "uint32" | "uint64"
                 | "void"
pointer_suffix ::= ("*")*

Examples:

int32 x;         // Base type: int32, no pointers
uint8 *s;        // Base type: uint8, 1 pointer level
void **pp;       // Base type: void, 2 pointer levels

6. Expressions

Primary Expressions

42              // Integer literal
"string"        // String literal
variable        // Identifier
(expression)    // Parenthesized expression

Postfix Expressions

array[index]           // Array subscript
function(args)         // Function call
expr++                 // Post-increment
expr--                 // Post-decrement

Unary Expressions

++expr          // Pre-increment
--expr          // Pre-decrement
-expr           // Negation
!expr           // Logical NOT
~expr           // Bitwise NOT
&expr           // Address-of
*expr           // Dereference
(type)expr      // Type cast

Binary Expressions

Arithmetic:

a + b           // Addition
a - b           // Subtraction
a * b           // Multiplication
a / b           // Division
a % b           // Modulo

Bitwise:

a & b           // Bitwise AND
a | b           // Bitwise OR
a ^ b           // Bitwise XOR
a << b          // Left shift
a >> b          // Right shift (arithmetic for signed, logical for unsigned)
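In C, the same distinction follows from the operand type. The sketch below assumes that Common's rule matches GCC, which documents right shift of a negative signed value as sign-extending. On x86, the two cases correspond to the sar and shr instructions, though the manual does not say which instructions the compiler emits.

```c
#include <stdint.h>

/* Signed operand: arithmetic shift, so the sign bit is copied in from the left.
   (Shifting a negative value right is implementation-defined in C;
   GCC and Clang implement it as an arithmetic shift.) */
int32_t shr_signed(int32_t v, int n)
{
    return v >> n;
}

/* Unsigned operand: logical shift, so zeros are shifted in from the left. */
uint32_t shr_unsigned(uint32_t v, int n)
{
    return v >> n;
}
```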

Comparison:

a == b          // Equal
a != b          // Not equal
a < b           // Less than
a <= b          // Less than or equal
a > b           // Greater than
a >= b          // Greater than or equal

Logical:

a && b          // Logical AND (short-circuit)
a || b          // Logical OR (short-circuit)
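Short-circuit evaluation means that the right operand is not evaluated at all when the left operand already decides the result. This C sketch counts evaluations to make that visible. The helper names are hypothetical.

```c
#include <stddef.h>

/* Counts how many times a right-hand operand is actually evaluated. */
static int rhs_calls = 0;

static int rhs(int value)
{
    rhs_calls++;
    return value;
}

/* With &&, rhs() runs only when lhs is nonzero; with ||, only when lhs is zero. */
int and_sc(int lhs, int value) { return lhs && rhs(value); }
int or_sc(int lhs, int value)  { return lhs || rhs(value); }

/* Typical use: the null check guards the dereference. */
int starts_with_x(const char *p) { return p != NULL && *p == 'x'; }
```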

Assignment Expressions

a = b           // Assignment
a += b          // Add and assign
a -= b          // Subtract and assign
a *= b          // Multiply and assign
a /= b          // Divide and assign
a %= b          // Modulo and assign
a &= b          // AND and assign
a |= b          // OR and assign
a ^= b          // XOR and assign
a <<= b         // Left shift and assign
a >>= b         // Right shift and assign

Ternary Expression

condition ? true_expr : false_expr

Example:

max = (a > b) ? a : b;

Operator Precedence

From highest to lowest:

Level  Operators                                Associativity
1      () [] ++ -- (postfix)                    Left to right
2      ++ -- (prefix)  - ! ~ & * (cast)         Right to left
3      * / %                                    Left to right
4      + -                                      Left to right
5      << >>                                    Left to right
6      < <= > >=                                Left to right
7      == !=                                    Left to right
8      &                                        Left to right
9      ^                                        Left to right
10     |                                        Left to right
11     &&                                       Left to right
12     ||                                       Left to right
13     ?:                                       Right to left
14     = += -= *= /= %= &= |= ^= <<= >>=        Right to left
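These levels follow C, so C code can show a practical consequence of the table: & binds more loosely than the comparison operators, so bit tests need parentheses. The function names are hypothetical. Compilers may warn (for example, GCC's -Wparentheses) about the unparenthesized forms, which is exactly the hazard being shown.

```c
/* == (level 7) binds tighter than & (level 8), so
   x & 2 == 2 parses as x & (2 == 2), that is, x & 1. */
int mask_unparenthesized(int x) { return x & 2 == 2; }

/* Parentheses give the intended "is bit 1 set?" test. */
int mask_parenthesized(int x)   { return (x & 2) == 2; }

/* << (level 5) binds more loosely than + (level 4): 1 + 2 << 1 is (1 + 2) << 1. */
int shift_after_add(void)       { return 1 + 2 << 1; }
```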

Pointer Arithmetic

int32 *p = arr;
p + 1           // Points to next int32 (address + 4)
p - 1           // Points to previous int32 (address - 4)
p[i]            // Equivalent to *(p + i)

Pointer arithmetic automatically scales by the size of the pointed-to type:

  • uint8* increments by 1
  • uint16* increments by 2
  • int32* increments by 4
  • Any pointer-to-pointer increments by 4
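The scaling rule can be observed in C by measuring the byte distance between p and p + 1. The helper names in this sketch are hypothetical. On a 64-bit host, a pointer-to-pointer step is 8 bytes, whereas on Common's x86-32 target it is 4.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte distance from p to p + 1, measured through char pointers. */
size_t step_u8(const uint8_t *p)   { return (size_t)((const char *)(p + 1) - (const char *)p); }
size_t step_u16(const uint16_t *p) { return (size_t)((const char *)(p + 1) - (const char *)p); }
size_t step_i32(const int32_t *p)  { return (size_t)((const char *)(p + 1) - (const char *)p); }
```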

Type Conversions

Explicit casting:

(uint8)value          // Truncate to 8 bits
(int32)byte_value     // Sign-extends signed sources, zero-extends unsigned ones
(uint32*)ptr          // Pointer type conversion

Implicit conversions:

  • Arrays decay to pointers
  • Smaller integers promote to int32 in expressions
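The three cast behaviors above correspond to ordinary C conversions. This sketch uses hypothetical helper names. x86 provides movzx and movsx for zero extension and sign extension, but the manual does not state which instructions the compiler generates.

```c
#include <stdint.h>

/* (uint8)value: keep only the low 8 bits. */
uint8_t to_u8(uint32_t v) { return (uint8_t)v; }

/* Widening a signed byte copies its sign bit into the upper bits (sign extension). */
int32_t widen_i8(int8_t b) { return (int32_t)b; }

/* Widening an unsigned byte fills the upper bits with zeros (zero extension). */
int32_t widen_u8(uint8_t b) { return (int32_t)b; }
```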

7. Statements

Expression Statement

expression;

Examples:

x = 42;
function_call();
x++;

Compound Statement (Block)

{
    statement1;
    statement2;
    ...
}

Blocks create new scopes for local variables.

If Statement

if (condition)
    statement

if (condition)
    statement
else
    statement

Examples:

if (x > 0)
    printf("positive\n");

if (x > 0) {
    printf("positive\n");
} else if (x < 0) {
    printf("negative\n");
} else {
    printf("zero\n");
}

While Statement

while (condition)
    statement

Example:

while (x < 100) {
    x = x * 2;
}

For Statement

for (init; condition; increment)
    statement

The init can be:

  • Empty: for (; condition; increment)
  • Expression: for (x = 0; x < 10; x++)
  • Declaration: for (int32 i = 0; i < 10; i++)

Example:

for (int32 i = 0; i < 10; i = i + 1) {
    sum = sum + i;
}

Switch Statement

switch (expression) {
    case value1:
        statements
        break;
    case value2:
        statements
        break;
    default:
        statements
}
  • Cases must be integer constants
  • Fall-through is allowed (no automatic break)
  • default is optional

Example:

switch (day) {
    case 0:
        printf("Sunday\n");
        break;
    case 6:
        printf("Saturday\n");
        break;
    default:
        printf("Weekday\n");
}
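Fall-through is easiest to see when a case deliberately omits break. This C sketch assumes that Common's switch behaves like C's, as the bullets above describe. `cases_run` is a hypothetical example function.

```c
/* Returns how many case bodies run for n. Cases 3 and 2 have no break,
   so control falls through into the following case. */
int cases_run(int n)
{
    int count = 0;
    switch (n) {
    case 3:
        count++;
        /* falls through */
    case 2:
        count++;
        /* falls through */
    case 1:
        count++;
        break;          /* without this, default would also run */
    default:
        count = -1;     /* any other value */
    }
    return count;
}
```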

Break Statement

break;

Exits the innermost while, for, or switch statement.

Continue Statement

continue;

Skips to the next iteration of the innermost while or for loop.

Return Statement

return;              // Return from void function
return expression;   // Return value

Example:

return 42;
return x + y;
return;

8. Functions

Function Declarations

return_type function_name(parameter_list);

Forward declaration (prototype):

int32 add(int32 a, int32 b);

Function Definitions

return_type function_name(parameter_list) {
    statements
}

Example:

int32 add(int32 a, int32 b) {
    return a + b;
}

Parameters

void no_params(void) { }                    // No parameters
int32 one_param(int32 x) { }                // One parameter
int32 two_params(int32 x, uint8 *s) { }     // Multiple parameters

Parameters are passed by value. To modify caller's data, use pointers:

void swap(int32 *a, int32 *b) {
    int32 temp = *a;
    *a = *b;
    *b = temp;
}

Return Values

int32 get_value(void) {
    return 42;
}

void no_return(void) {
    // No return statement needed
    return;  // Optional
}

The return value is passed in the eax register (32 bits).

Recursion

Recursion is fully supported:

int32 factorial(int32 n) {
    if (n <= 1)
        return 1;
    return n * factorial(n - 1);
}

Calling Convention

Functions use cdecl convention:

  • Arguments pushed right-to-left on stack
  • Caller cleans up stack
  • Return value in eax
  • eax, ecx, edx are caller-saved
  • ebx, esi, edi, ebp are callee-saved

Calling C Functions

Common can call C library functions:

// Declare C functions
int32 printf(uint8 *format, ...);
void *malloc(uint32 size);
void free(void *ptr);

int32 main(void) {
    printf("Hello from Common\n");
    void *mem = malloc(100);
    free(mem);
    return 0;
}

Note: Variadic functions (...) can be declared but not defined in Common.


9. Scope and Linkage

Scope Rules

Global scope:

  • Variables and functions declared outside any function
  • Visible to all functions in the file

Local scope:

  • Variables declared inside a function or block
  • Visible only within that function/block
  • Shadows global variables with the same name

Block scope:

{
    int32 x = 1;
    {
        int32 x = 2;  // Different variable, shadows outer x
        printf("%d\n", x);  // Prints 2
    }
    printf("%d\n", x);  // Prints 1
}

Linkage

External linkage (default for functions):

int32 global_function(void) { ... }

The symbol is exported with a global directive in the generated assembly.

No linkage (local variables):

void func(void) {
    int32 local;  // No linkage
}

Static linkage: Not supported. All functions have external linkage.

Name Resolution

  1. Check local scope (function parameters and locals)
  2. Check global scope
  3. If not found, the name is assumed to be an external symbol

10. Memory Model

Stack Layout

High Address
+------------------+
| ...              |
+------------------+
| Argument 2       | EBP + 12
+------------------+
| Argument 1       | EBP + 8
+------------------+
| Return address   | EBP + 4
+------------------+
| Saved EBP        | <-- EBP
+------------------+
| Local variable 1 | EBP - 4
+------------------+
| Local variable 2 | EBP - 8
+------------------+
| ...              |
+------------------+
| Array data       | (grows down)
+------------------+ <-- ESP
Low Address

Function Call Stack

caller:
    push arg2       ; Arguments pushed right to left
    push arg1
    call callee
    add esp, 8      ; Clean up arguments (2 × 4 bytes)

callee:             ; arg1 at [ebp+8], arg2 at [ebp+12]
    push ebp        ; Save old frame pointer
    mov ebp, esp    ; Set up new frame
    sub esp, N      ; Allocate locals
    ...
    mov esp, ebp    ; Restore stack
    pop ebp
    ret

Arguments accessed via [ebp+8], [ebp+12], etc. Locals accessed via [ebp-4], [ebp-8], etc.
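These offsets follow from the frame layout: the saved EBP sits at [ebp] and the return address at [ebp+4], so the first argument starts at [ebp+8]. The C helpers below compute the offsets, assuming every argument and local occupies one 4-byte slot (int64 values and arrays would need larger slots). They are hypothetical and not part of the compiler.

```c
/* Offset of argument `index` (0 = first argument) relative to EBP. */
int arg_offset(int index)    { return 8 + 4 * index; }

/* Offset of local `index`, assuming locals take consecutive 4-byte slots
   in declaration order, starting just below the saved EBP. */
int local_offset(int index)  { return -4 * (index + 1); }

/* Bytes the caller removes with "add esp, N" after pushing nargs dword arguments. */
int cleanup_bytes(int nargs) { return 4 * nargs; }
```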

Data Sections

.text: Read-only code

section .text
function_name:
    ; assembly code

.data: Initialized data

section .data
global_var: dd 42
string: db "Hello", 0

.bss: Zero-initialized data

section .bss
uninit_var: resd 1
array: resb 100

Size Directives

Directive    Size      Common Type
resb / db    1 byte    uint8, int8
resw / dw    2 bytes   uint16, int16
resd / dd    4 bytes   uint32, int32, pointers
resq / dq    8 bytes   uint64, int64
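Choosing a directive from a type's size is a simple lookup, sketched here in C. `nasm_directive` is a hypothetical helper; this manual does not document the compiler's actual selection logic.

```c
#include <stddef.h>

/* Returns the NASM directive for a value of `size` bytes:
   the reserve form (.bss) when `reserve` is nonzero, else the define form (.data). */
const char *nasm_directive(size_t size, int reserve)
{
    switch (size) {
    case 1: return reserve ? "resb" : "db";
    case 2: return reserve ? "resw" : "dw";
    case 4: return reserve ? "resd" : "dd";
    case 8: return reserve ? "resq" : "dq";
    default: return NULL; /* no single directive for other sizes */
    }
}
```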

Alignment

  • Stack is 16-byte aligned (per System V ABI)
  • Local variables are 4-byte aligned
  • Arrays follow element alignment
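Alignment rules reduce to rounding a size or offset up to a multiple of the alignment. This C sketch shows a common formulation, assuming the alignment is a power of two. `align_up` is a hypothetical helper.

```c
#include <stdint.h>

/* Round n up to the next multiple of align (align must be a power of two).
   Adding align - 1 and clearing the low bits rounds up without a branch. */
uint32_t align_up(uint32_t n, uint32_t align)
{
    return (n + align - 1) & ~(align - 1);
}
```

For example, a 5-byte offset rounded to 4-byte alignment becomes 8, and 20 rounded to 16 becomes 32.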

11. Assembly Interface

Generated Assembly Structure

BITS 32
section .text

; External function declarations
extern printf
extern malloc

; Exported functions
global main
global my_function

main:
    push ebp
    mov ebp, esp
    sub esp, 16          ; Allocate locals
    ; ... function body ...
    mov esp, ebp
    pop ebp
    ret

section .data
_s0: db "Hello", 0      ; String literal

section .bss
global_var: resd 1      ; Global variable

Register Usage

Caller-saved (may be modified by called function):

  • eax - Return value, scratch
  • ecx - Scratch, left operand
  • edx - Scratch, division remainder

Callee-saved (preserved across calls):

  • ebx - Base register
  • esi - Source index
  • edi - Destination index
  • ebp - Frame pointer
  • esp - Stack pointer

Common usage:

  • eax - Expression results, return values
  • ecx - Left operand in binary operations
  • [ebp+N] - Function parameters
  • [ebp-N] - Local variables

Calling C from Assembly

; Call: printf("Value: %d\n", x);
push dword [ebp-4]          ; Push x
push _s0                     ; Push format string
call printf
add esp, 8                   ; Clean up (2 args × 4 bytes)

Inline Assembly

Not supported. Use C library functions or write separate assembly files.


12. Limitations

Language Limitations

  1. Single-file compilation only

    • No #include or import mechanism
    • All code must be in one source file
    • Use forward declarations for ordering
  2. No structures or unions

    • Can simulate with arrays: node[0] = data, node[1] = next
    • Manual offset calculation required
  3. No floating point

    • Integer arithmetic only
    • No float, double, or long double
  4. No preprocessor

    • No #define, #ifdef, etc.
    • No macro expansion
    • No file inclusion
  5. No enums

    • Use global integer variables or literal values instead
  6. Limited 64-bit support

    • 64-bit types exist but operations truncate to 32-bit
    • Full 64-bit arithmetic not implemented
  7. No static/extern keywords

    • All functions are global
    • No static local variables
    • No explicit extern declarations
  8. Single-dimensional arrays only

    • Multidimensional arrays not supported
    • Use manual index arithmetic for 2D access: arr[i * width + j]
  9. No goto

    • Use loops and breaks instead
  10. No comma operator

    • Cannot use a = (b, c)

Implementation Limitations

  1. Fixed buffer sizes

    • 256 identifiers/strings
    • 256 local variables per function
    • 256 global variables total
    • 512 string literals
  2. No optimization

    • Generated code is unoptimized
    • Expressions fully evaluated (no constant folding)
  3. Limited error messages

    • Basic syntax errors reported
    • No semantic analysis warnings
    • No type mismatch warnings
  4. x86-32 only

    • Not portable to other architectures
    • Requires 32-bit toolchain

Workarounds

Structures: Use arrays

// Instead of: struct { int x; int y; } point;
int32 point[2];  // point[0] = x, point[1] = y

Multidimensional arrays: Manual indexing

// Instead of: int matrix[10][10];
int32 matrix[100];
int32 value = matrix[row * 10 + col];
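The manual index computation uses the same row-major layout that C uses for a real `int matrix[10][10]`. This C sketch wraps it in hypothetical accessor functions, assuming `width` is the number of columns.

```c
#include <stdint.h>

/* Element (row, col) of a flattened matrix with `width` columns
   lives at index row * width + col. */
int32_t flat_get(const int32_t *m, int width, int row, int col)
{
    return m[row * width + col];
}

void flat_set(int32_t *m, int width, int row, int col, int32_t value)
{
    m[row * width + col] = value;
}
```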

Enums: Global integer variables

// Instead of: enum { RED, GREEN, BLUE };
int32 RED = 0;
int32 GREEN = 1;
int32 BLUE = 2;

13. Examples

Hello World

int32 puts(uint8 *s);

int32 main(void) {
    puts("Hello, World!");
    return 0;
}

Factorial (Iterative)

int32 factorial(int32 n) {
    int32 result = 1;
    for (int32 i = 2; i <= n; i = i + 1) {
        result = result * i;
    }
    return result;
}

Fibonacci (Recursive)

int32 fib(int32 n) {
    if (n <= 1)
        return n;
    return fib(n - 1) + fib(n - 2);
}

String Length

int32 strlen(uint8 *s) {
    int32 len = 0;
    while (s[len])
        len = len + 1;
    return len;
}

Array Sum

int32 sum_array(int32 *arr, int32 len) {
    int32 total = 0;
    for (int32 i = 0; i < len; i = i + 1) {
        total = total + arr[i];
    }
    return total;
}

Pointer Swap

void swap(int32 *a, int32 *b) {
    int32 temp = *a;
    *a = *b;
    *b = temp;
}

Bubble Sort

void bubble_sort(int32 *arr, int32 n) {
    for (int32 i = 0; i < n - 1; i = i + 1) {
        for (int32 j = 0; j < n - i - 1; j = j + 1) {
            if (arr[j] > arr[j + 1]) {
                int32 temp = arr[j];
                arr[j] = arr[j + 1];
                arr[j + 1] = temp;
            }
        }
    }
}

Binary Search

int32 binary_search(int32 *arr, int32 n, int32 target) {
    int32 left = 0;
    int32 right = n - 1;
    
    while (left <= right) {
        int32 mid = left + (right - left) / 2;
        
        if (arr[mid] == target)
            return mid;
        
        if (arr[mid] < target)
            left = mid + 1;
        else
            right = mid - 1;
    }
    
    return -1;  // Not found
}

Linked List (Simulated)

void *malloc(uint32 size);
void free(void *ptr);

// Node: [0] = data, [1] = next pointer
int32 *create_node(int32 value) {
    int32 *node = (int32*)malloc(8);
    node[0] = value;
    node[1] = 0;
    return node;
}

void insert_front(int32 **head, int32 value) {
    int32 *new_node = create_node(value);
    new_node[1] = (int32)(*head);
    *head = new_node;
}

Bitwise Operations

// Check if bit N is set
int32 is_bit_set(uint32 value, int32 n) {
    return (value >> n) & 1;
}

// Set bit N
uint32 set_bit(uint32 value, int32 n) {
    return value | (1 << n);
}

// Clear bit N
uint32 clear_bit(uint32 value, int32 n) {
    return value & ~(1 << n);
}

// Toggle bit N
uint32 toggle_bit(uint32 value, int32 n) {
    return value ^ (1 << n);
}

Appendix A: Grammar Summary

program ::= declaration*

declaration ::= 
    | type_spec identifier "(" param_list ")" ( ";" | block )
    | type_spec identifier ";"
    | type_spec identifier "=" expr ";"
    | type_spec identifier "[" expr "]" ";"
    | type_spec identifier "[" expr "]" "=" "{" expr_list "}" ";"

type_spec ::= base_type "*"*

base_type ::= "void" | "int8" | "int16" | "int32" | "int64"
            | "uint8" | "uint16" | "uint32" | "uint64"

param_list ::= "void" | ( param ( "," param )* )?

param ::= type_spec identifier

block ::= "{" statement* "}"

statement ::=
    | block
    | type_spec identifier ";"
    | type_spec identifier "=" expr ";"
    | type_spec identifier "[" expr "]" ( "=" "{" expr_list "}" )? ";"
    | expr ";"
    | "if" "(" expr ")" statement ( "else" statement )?
    | "while" "(" expr ")" statement
    | "for" "(" (decl | expr)? ";" expr? ";" expr? ")" statement
    | "switch" "(" expr ")" "{" case_clause* "}"
    | "return" expr? ";"
    | "break" ";"
    | "continue" ";"

case_clause ::=
    | "case" expr ":" statement*
    | "default" ":" statement*

expr ::= assignment

assignment ::= ternary ( assign_op assignment )?

assign_op ::= "=" | "+=" | "-=" | "*=" | "/=" | "%="
            | "&=" | "|=" | "^=" | "<<=" | ">>="

ternary ::= logical_or ( "?" expr ":" ternary )?

logical_or ::= logical_and ( "||" logical_and )*

logical_and ::= bit_or ( "&&" bit_or )*

bit_or ::= bit_xor ( "|" bit_xor )*

bit_xor ::= bit_and ( "^" bit_and )*

bit_and ::= equality ( "&" equality )*

equality ::= relational ( ("==" | "!=") relational )*

relational ::= shift ( ("<" | "<=" | ">" | ">=") shift )*

shift ::= additive ( ("<<" | ">>") additive )*

additive ::= multiplicative ( ("+" | "-") multiplicative )*

multiplicative ::= unary ( ("*" | "/" | "%") unary )*

unary ::=
    | postfix
    | "++" unary
    | "--" unary
    | "-" unary
    | "!" unary
    | "~" unary
    | "&" unary
    | "*" unary
    | "(" type_spec ")" unary

postfix ::=
    | primary
    | postfix "[" expr "]"
    | postfix "(" expr_list? ")"
    | postfix "++"
    | postfix "--"

primary ::=
    | integer_literal
    | string_literal
    | identifier
    | "(" expr ")"

Appendix B: Quick Reference Card

Types: void, int8, int16, int32, int64, uint8, uint16, uint32, uint64

Operators: + - * / % & | ^ ~ ! < > <= >= == != << >> && || ?: = += -= *= /= %= &= |= ^= <<= >>= ++ -- & * []

Keywords: if else while for switch case default break continue return

Control: if/else, while, for, switch/case, break, continue, return

Functions: type name(params) { body }

Arrays: type name[size], type name[size] = { values }

Pointers: type *name, &var, *ptr, ptr[index]

Comments: // line, /* block */


End of Common Language Reference Manual