# Common Programming Language

A minimalist, dependency-free compiler for a C-like language that targets x86-32 assembly.

## Overview

Common is a statically-typed systems programming language with:

- **No runtime dependencies** - compiles to standalone executables
- **Direct C interoperability** - call and be called by C code
- **Predictable codegen** - straightforward mapping to assembly
- **Complete type system** - 8 integer types, pointers, arrays
- **Full control flow** - if/else, loops, switch, functions
- **Zero external dependencies** - just libc, gcc, and nasm

## Quick Start

### Build the Compiler

```bash
gcc -o common common.c
```

### Hello World

Create `hello.cm`:
```c
void puts(uint8 *s);

int32 main(void) {
    puts("Hello, World!");
    return 0;
}
```

Compile and run:
```bash
./common hello.cm hello.asm
nasm -f elf32 hello.asm -o hello.o
gcc -m32 hello.o -o hello
./hello
```

Or use the Makefile:
```bash
make                # Build compiler and test suite
make test           # Run all tests
make hello          # Build hello example
make examples       # Build all examples
make run-examples   # Build and run all examples
```

## Documentation

### For Users

- **[Quick Reference](QUICKREF.md)** - One-page cheat sheet for syntax and operators
- **[Reference Manual](MANUAL.md)** - Complete language specification (80+ pages)
- **[Troubleshooting Guide](TROUBLESHOOTING.md)** - Solutions to common problems

### For Developers

- **[Test Suite README](README_TESTS.md)** - How to run and write tests
- **[Source Code](common.c)** - Well-commented compiler implementation

## Language Features

### Types

```c
// Integers
int8   int16   int32   int64     // Signed
uint8  uint16  uint32  uint64    // Unsigned

// Pointers and arrays
int32 *ptr;              // Pointer
int32 arr[10];           // Array
uint8 *str = "text";     // String
```

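For readers coming from C, the integer type names suggest a direct correspondence with the fixed-width types in `<stdint.h>`. The sketch below assumes that correspondence, which this README does not state explicitly, and uses a hypothetical helper, `common_type_size`, to look up the width of each type in bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed mapping from Common type names to byte widths, inferred from the
 * names alone (int8 -> 1 byte, ..., uint64 -> 8 bytes). */
struct type_info {
    const char *name;
    size_t size;
};

static const struct type_info COMMON_TYPES[] = {
    {"int8",  sizeof(int8_t)},  {"uint8",  sizeof(uint8_t)},
    {"int16", sizeof(int16_t)}, {"uint16", sizeof(uint16_t)},
    {"int32", sizeof(int32_t)}, {"uint32", sizeof(uint32_t)},
    {"int64", sizeof(int64_t)}, {"uint64", sizeof(uint64_t)},
};

/* Hypothetical helper: returns the width in bytes, or 0 for an unknown name. */
size_t common_type_size(const char *name) {
    size_t count = sizeof COMMON_TYPES / sizeof COMMON_TYPES[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(COMMON_TYPES[i].name, name) == 0) {
            return COMMON_TYPES[i].size;
        }
    }
    return 0;
}
```

When declaring C functions from Common (or Common functions from C), matching widths this way is what keeps arguments lined up on the stack.
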
### Control Flow

```c
if (x > 0) { ... }
while (x < 100) { ... }
for (int32 i = 0; i < n; i++) { ... }
switch (x) { case 1: ... break; }
```

### Operators

```c
// Arithmetic:  + - * / %
// Comparison:  == != < <= > >=
// Logical:     && || !
// Bitwise:     & | ^ ~ << >>
// Pointers:    & *
// Increment:   ++ --
```

### Functions

```c
int32 add(int32 a, int32 b) {
    return a + b;
}

int32 factorial(int32 n) {
    if (n <= 1) return 1;
    return n * factorial(n - 1);
}
```

## Example Programs

All examples are in the `examples/` directory:

| Program | Description |
|---------|-------------|
| **hello.cm** | Hello World |
| **fibonacci.cm** | Recursive Fibonacci |
| **arrays.cm** | Array operations |
| **pointers.cm** | Pointer manipulation |
| **bubblesort.cm** | Bubble sort algorithm |
| **bitwise.cm** | Bitwise operations |
| **types.cm** | Type casting examples |
| **switch.cm** | Switch statements |
| **primes.cm** | Prime number calculator |
| **strings.cm** | String functions |
| **calculator.cm** | Expression evaluator |
| **linkedlist.cm** | Linked list (simulated) |

Build any example:
```bash
make fibonacci && ./fibonacci
make bubblesort && ./bubblesort
```

## Test Suite

The test suite includes 60+ automated tests covering:
- Arithmetic and operators
- Variables and arrays
- Control flow
- Functions and recursion
- Pointers and type casting
- All integer types

Run tests:
```bash
make test
# or
./run_tests.sh
```

## Compilation Pipeline

```
source.cm → [common compiler] → output.asm → [nasm] → output.o → [gcc] → executable
```

1. **Common compiler**: Parses source, generates NASM assembly
2. **NASM**: Assembles to ELF32 object file
3. **GCC**: Links with C runtime library

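The three stages above can be sketched as a small C helper that builds the command line for each step, using the same invocations shown in Quick Start. The function name `pipeline_commands` and the `CMD_MAX` limit are illustrative assumptions, not part of the project.

```c
#include <stdio.h>

#define CMD_MAX 256

/* Fills cmds[0..2] with the compile, assemble, and link commands for `base`
 * (e.g. "hello" -> hello.cm, hello.asm, hello.o, hello).
 * Returns 0 on success, -1 if any command would not fit in CMD_MAX bytes. */
int pipeline_commands(const char *base, char cmds[3][CMD_MAX]) {
    int n0 = snprintf(cmds[0], CMD_MAX, "./common %s.cm %s.asm", base, base);
    int n1 = snprintf(cmds[1], CMD_MAX, "nasm -f elf32 %s.asm -o %s.o", base, base);
    int n2 = snprintf(cmds[2], CMD_MAX, "gcc -m32 %s.o -o %s", base, base);

    /* snprintf returns the length it wanted to write; anything >= CMD_MAX
     * means the command was truncated. */
    if (n0 < 0 || n0 >= CMD_MAX) return -1;
    if (n1 < 0 || n1 >= CMD_MAX) return -1;
    if (n2 < 0 || n2 >= CMD_MAX) return -1;
    return 0;
}
```

A driver could pass each string to `system()` in order and stop at the first nonzero exit status, which is roughly what a Makefile rule does for you.
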
## Requirements

- **GCC** with 32-bit support (gcc-multilib)
- **NASM** assembler
- **Linux** or compatible environment (WSL works)

Installation:
```bash
# Ubuntu/Debian
sudo apt-get install build-essential gcc-multilib nasm

# Fedora/RHEL
sudo dnf install gcc glibc-devel.i686 nasm

# Arch
sudo pacman -S gcc lib32-gcc-libs nasm
```

## Language Limitations

- **Single file compilation** - no modules or includes
- **No structs/unions** - use arrays for structured data
- **No floating point** - integers only
- **No preprocessor** - no #define, #include
- **1D arrays only** - simulate 2D with manual indexing
- **Partial 64-bit support** - types exist but ops truncate to 32-bit

See [MANUAL.md](MANUAL.md) for complete details and workarounds.

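Two of the workarounds named above, manual 2D indexing and arrays standing in for structs, might look like the following. This is a sketch written in C rather than Common: the names (`grid_index`, `point_set`, and so on) are hypothetical, and because Common has no preprocessor, the `#define` constants would have to become literals or globals in a `.cm` file.

```c
#include <stdint.h>

#define ROWS 3
#define COLS 4

/* Row-major index into a flat array standing in for grid[ROWS][COLS]. */
int32_t grid_index(int32_t row, int32_t col) {
    return row * COLS + col;
}

/* A two-field "record" stored in a flat array:
 * slot 0 of each record holds x, slot 1 holds y. */
#define POINT_X 0
#define POINT_Y 1
#define POINT_FIELDS 2

void point_set(int32_t *points, int32_t i, int32_t x, int32_t y) {
    points[i * POINT_FIELDS + POINT_X] = x;
    points[i * POINT_FIELDS + POINT_Y] = y;
}

int32_t point_x(int32_t *points, int32_t i) {
    return points[i * POINT_FIELDS + POINT_X];
}

int32_t point_y(int32_t *points, int32_t i) {
    return points[i * POINT_FIELDS + POINT_Y];
}
```

For example, with `int32_t grid[ROWS * COLS]`, the element at row 2, column 1 is `grid[grid_index(2, 1)]`, which is `grid[9]`.
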
## Implementation Details

**Target**: x86-32 (IA-32) ELF
**Calling convention**: cdecl
**Stack alignment**: 16-byte (System V ABI)
**Registers**:
- `eax`: return values, expressions
- `ecx`: left operand
- `edx`: scratch
- `ebp`: frame pointer
- `esp`: stack pointer

**Code sections**:
- `.text`: executable code
- `.data`: initialized globals, strings
- `.bss`: zero-initialized globals

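To make the cdecl convention above concrete, here is a minimal sketch of how a code generator could emit a call in NASM syntax: arguments are pushed right to left, the result comes back in `eax`, and the caller pops the arguments afterwards. The function `emit_cdecl_call` is hypothetical and is not taken from `common.c`; it handles only 32-bit immediate arguments and omits the 16-byte stack alignment padding.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Writes NASM text for a cdecl call of `fn` with 32-bit immediate arguments
 * into buf. Returns the number of characters written, or -1 if buf is too
 * small. Illustrative only: stack alignment padding is not emitted. */
int emit_cdecl_call(char *buf, size_t cap, const char *fn,
                    const int32_t *args, size_t nargs) {
    size_t used = 0;
    int n;

    /* cdecl: push arguments right to left, so the first one ends up on top. */
    for (size_t i = nargs; i > 0; i--) {
        n = snprintf(buf + used, cap - used, "    push dword %d\n",
                     (int)args[i - 1]);
        if (n < 0 || (size_t)n >= cap - used) return -1;
        used += (size_t)n;
    }

    /* The callee leaves its return value in eax. */
    n = snprintf(buf + used, cap - used, "    call %s\n", fn);
    if (n < 0 || (size_t)n >= cap - used) return -1;
    used += (size_t)n;

    /* cdecl: the caller removes the arguments, 4 bytes each. */
    if (nargs > 0) {
        n = snprintf(buf + used, cap - used, "    add esp, %zu\n", nargs * 4);
        if (n < 0 || (size_t)n >= cap - used) return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

Calling it with `fn = "add"` and arguments `{1, 2}` produces `push dword 2`, `push dword 1`, `call add`, `add esp, 8`, one instruction per line.
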
## Architecture

The compiler is a single-pass implementation in C99:

```
┌─────────────┐
│   Lexer     │  Tokenize source
├─────────────┤
│   Parser    │  Build AST
├─────────────┤
│ Type Check  │  Infer expression types
├─────────────┤
│  Code Gen   │  Emit NASM assembly
└─────────────┘
```

Key components:
- **Lexer** (150 LOC): Tokenization with lookahead
- **Parser** (400 LOC): Recursive descent parser
- **Type System** (200 LOC): Type inference for pointer arithmetic
- **Code Generator** (800 LOC): Assembly emission

Total: ~2000 lines of C99

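As a rough illustration of the recursive descent technique named above, the sketch below parses integer expressions with one function per grammar rule. It is not the code in `common.c`: it covers only a tiny assumed grammar, and it evaluates the expression directly instead of building an AST or emitting assembly. All names (`eval_expr`, `parse_term`, and so on) are hypothetical.

```c
#include <ctype.h>
#include <stdint.h>

/* Grammar (an illustrative subset, not Common's full grammar):
 *   expr   := term   (('+' | '-') term)*
 *   term   := factor (('*' | '/') factor)*
 *   factor := NUMBER | '(' expr ')'
 */
static const char *cursor; /* current position in the input */

static int32_t parse_expr(void);

static void skip_spaces(void) {
    while (isspace((unsigned char)*cursor)) cursor++;
}

static int32_t parse_factor(void) {
    skip_spaces();
    if (*cursor == '(') {
        cursor++;                        /* consume '(' */
        int32_t inner = parse_expr();
        skip_spaces();
        if (*cursor == ')') cursor++;    /* consume ')' if present */
        return inner;
    }
    int32_t value = 0;
    while (isdigit((unsigned char)*cursor)) {
        value = value * 10 + (*cursor - '0');
        cursor++;
    }
    return value;
}

static int32_t parse_term(void) {
    int32_t value = parse_factor();
    for (;;) {
        skip_spaces();
        char op = *cursor;
        if (op != '*' && op != '/') return value;
        cursor++;
        int32_t rhs = parse_factor();
        if (op == '*') value *= rhs;
        else if (rhs != 0) value /= rhs; /* division by zero leaves value unchanged */
    }
}

static int32_t parse_expr(void) {
    int32_t value = parse_term();
    for (;;) {
        skip_spaces();
        char op = *cursor;
        if (op != '+' && op != '-') return value;
        cursor++;
        int32_t rhs = parse_term();
        value = (op == '+') ? value + rhs : value - rhs;
    }
}

/* Entry point: evaluates an expression such as "2 + 3 * 4". */
int32_t eval_expr(const char *src) {
    cursor = src;
    return parse_expr();
}
```

Operator precedence falls out of the call structure: `parse_expr` calls `parse_term`, which calls `parse_factor`, so `*` and `/` bind tighter than `+` and `-` without any precedence table.
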
## C Interoperability

Common can call C functions:

```c
// Declare C functions
void printf(uint8 *fmt, ...);
void *malloc(uint32 size);
void free(void *ptr);

int32 main(void) {
    printf("Allocated %d bytes\n", 100);
    void *mem = malloc(100);
    free(mem);
    return 0;
}
```

C can call Common functions:
```c
// common.cm
int32 compute(int32 x) {
    return x * x;
}

// main.c
#include <stdio.h>

extern int compute(int);

int main(void) {
    printf("%d\n", compute(10));
    return 0;
}
```

Compile:
```bash
./common common.cm common.asm
nasm -f elf32 common.asm -o common.o
gcc -m32 main.c common.o -o program
```

## Comparison to C

### Similar to C
- Syntax and semantics
- Type system (with fewer types)
- Pointer arithmetic
- Control flow
- Function calls (cdecl)

### Different from C
- No preprocessor
- No structs/unions
- No enums
- No static/extern keywords
- No goto
- Single file only
- Simpler type system

### Simpler than C
- No type qualifiers (const, volatile)
- No storage classes (auto, register)
- No function pointers (can cast to void*)
- No variadic function definitions
- No bitfields
- No flexible array members

## Project Structure

```
.
├── common.c              # Compiler source (2000 LOC)
├── commonl               # Linker
├── Makefile              # Build automation
├── run_tests.sh          # Quick test script
│
├── MANUAL.md             # Complete language reference
├── QUICKREF.md           # One-page cheat sheet
├── TROUBLESHOOTING.md    # Problem solutions
├── README_TESTS.md       # Test suite documentation
│
├── test_runner.c         # Automated test harness
├── test_suite.cm         # Test suite
│
└── examples/             # Example programs
    ├── hello.cm
    ├── fibonacci.cm
    ├── arrays.cm
    ├── pointers.cm
    ├── bubblesort.cm
    ├── bitwise.cm
    ├── types.cm
    ├── switch.cm
    ├── primes.cm
    ├── strings.cm
    ├── calculator.cm
    └── linkedlist.cm
```

## License & Educational Purpose

### Public Domain / CC0

This project is dedicated to the public domain under the CC0 1.0 Universal license. This means you can copy, modify, distribute, and perform the work, even for commercial purposes, all without asking permission.

### Why Public Domain?

- **Ease of Use:** With all licensing restrictions removed, students can freely integrate code snippets from `common.c` into their own projects without legal overhead or attribution requirements.
- **Educational Accessibility:** The compiler is designed to be a "pure" learning resource. Its single-file implementation is intended to be read and modified as if it were a textbook example.
- **No Barriers:** Just as the language requires "zero external dependencies," its legal status requires no compliance tracking, making it ideal for classroom settings and open-source forks.
- **Simplicity:** A complex license would contradict the project's philosophy of "simplicity over features."

## Credits

Inspired by:

- **C** - Dennis Ritchie and Brian Kernighan
- **chibicc** - Rui Ueyama's educational C compiler
- **8cc** - Rui Ueyama's C compiler
- **tcc** - Fabrice Bellard's Tiny C Compiler

Built for programmers who value:

- Simplicity over features
- Control over convenience
- Learning over abstraction

---

Start with the [Quick Reference](QUICKREF.md) or dive into the [Manual](MANUAL.md).