Train a 318M-parameter code model in 20 minutes on your laptop. No GPU. $0.004 electricity.
Training a 1B-parameter code model on GPU costs $10,000+ in cloud compute. Byte-level tokenization makes sequences 3-4x longer than BPE, multiplying the cost further. The majority of developers — the people who would benefit most from local code generation — simply cannot participate.
Three parallel encoder stacks at 1x, 2x, 4x resolution. The coarse scale processes 1/4 of tokens — exactly offsetting the byte tokenization length penalty. Cross-scale attention lets fine details inform coarse context and vice versa.
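The three resolution views can be pictured as average pooling along the time axis. This is a minimal NumPy sketch of the idea, not AXL's actual downsampling code; the function name and pooling choice are illustrative assumptions:

```python
import numpy as np

def make_scales(x, factors=(1, 2, 4)):
    """Build 1x/2x/4x views of a (seq_len, d_model) sequence by
    average-pooling along the time axis (hypothetical sketch; the
    real AXL downsampler may differ)."""
    views = []
    for f in factors:
        seq_len = x.shape[0] - x.shape[0] % f   # trim to a multiple of f
        pooled = x[:seq_len].reshape(seq_len // f, f, -1).mean(axis=1)
        views.append(pooled)
    return views

x = np.random.randn(512, 64)                # 512 byte positions, 64-dim embeddings
fine, mid, coarse = make_scales(x)
print(fine.shape, mid.shape, coarse.shape)  # (512, 64) (256, 64) (128, 64)
```

Each stack then attends over its own view, and cross-scale attention exchanges information between them.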
With matched parameters (both 12.8M), the same data, the same optimizer, and the same 3-minute wall-clock budget: AXL reaches 16x lower perplexity on Code-1B and completes 52% more training steps than a standard transformer.
Same model size (12.8M params), same data, same optimizer (Lion), same wall-clock (3 min on Ryzen 5 5600G). AXL wins 2/2 seeds. Std dev: +/- 0.00 (AXL) vs +/- 2.72 (Standard).
AXL demonstrates that transformer models can be trained on consumer CPUs. It is a starting point, not a destination — 318M parameters is tiny by 2026 standards. But the architecture works, the optimizer works, and the $0.004 cost makes iteration feasible for everyone.
Three resolution scales process the same sequence in parallel. Coarse attention is 16x cheaper than fine.
Byte-level tokenization makes sequences 3-4x longer. The coarse scale exactly offsets this.
The 4x byte penalty is exactly offset by the 4x downsampling at coarse scale. No information is lost — fine scale still sees every byte.
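The 16x figure follows from attention's quadratic cost: a 4x shorter coarse sequence costs 16x less per attention layer. The sequence lengths below are illustrative:

```python
# Attention cost grows with the square of sequence length.
bpe_len = 512                 # tokens after BPE (illustrative)
byte_len = 4 * bpe_len        # byte-level sequences are ~4x longer
coarse_len = byte_len // 4    # 4x downsampling at the coarse scale

cost = lambda n: n * n        # O(n^2) pairwise attention scores
print(cost(byte_len) // cost(coarse_len))   # 16
assert coarse_len == bpe_len  # coarse scale matches the BPE length
```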
Train the entire AXL family for less than a cup of coffee.
Based on AMD Ryzen 5 5600G, 100W system power, US average $0.12/kWh.
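The $0.004 figure is straightforward to reproduce from those assumptions (100 W system power, 20-minute run, $0.12/kWh):

```python
# Electricity cost of one 20-minute training run.
watts, minutes, price_per_kwh = 100, 20, 0.12
kwh = watts / 1000 * minutes / 60   # 100 W for 1/3 hour = 0.0333 kWh
cost = kwh * price_per_kwh
print(f"${cost:.4f}")               # $0.0040
```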
| Model | Params | PPL | tok/s | Q4_K_M size | Train time |
|---|---|---|---|---|---|
| AXL-Code-1B-Lion | 318M | 1.90 | 6.1 | 188 MB | 20 min |
| AXL-Reasoning-Lion | 70M | 1.79 | 22.4 | 44 MB | 10 min |
| AXL-Refactor-Lion | 19.1M | 1.11 | 52.2 | 12 MB | 3 min |
| AXL-TestGen-Lion | 15.2M | 1.15 | 57.3 | 18 MB | 3 min |
| AXL-Chat-Lion | 9.9M | 1.52 | 73.4 | 7 MB | 3 min |
| AXL-Micro-Lion | 12.8M | 1.04 | 66.2 | 15 MB | 3 min |
| AXL-Secure-Lion | 11.7M | 1.20 | 63.5 | 8 MB | 3 min |
| AXL-Docs-Lion | 9.9M | 1.12 | 72.8 | 7 MB | 2 min |
| AXL-Comment-Lion | 7.2M | 1.20 | 75.8 | 5 MB | 2 min |
| Model | Params | PPL | Focus | GGUF size |
|---|---|---|---|---|
| AXL-Micro-600K | 600K | 1.04 | Demo | 1 MB |
| AXL-Micro-8M | 12.8M | 3.13 | Code gen | 25 MB |
| AXL-Coder-15M | 26.0M | 1.54 | Agentic | 50 MB |
| AXL-Debugger-8M | 14.1M | 1.49 | Bug fixing | 27 MB |
| AXL-Fixer-12M | 20.9M | 1.52 | Debug | 40 MB |
| AXL-Reasoning-70M | 70M | 1.93 | CoT | 134 MB |
| AXL-300M | 322M | 1.11 | Flagship | 616 MB |
| AXL-Chat-10M | 9.9M | 1.48 | Dialogue | 19 MB |
| AXL-TestGen-15M | 15.2M | 1.15 | Test gen | 30 MB |
| AXL-Refactor-20M | 19.1M | 1.15 | Refactoring | 37 MB |
| AXL-Docs-8M | 9.9M | 1.12 | Docstrings | 19 MB |
| AXL-Comment-5M | 7.2M | 1.16 | Comments | 14 MB |
| AXL-Secure-10M | 11.7M | 1.20 | Security | 23 MB |
| Model | Params | PPL | Focus | GGUF size |
|---|---|---|---|---|
| AXL-Code-1B | 318M | 31.22 | Code gen (SGD) | 606 MB |
| AXL-Chat-Pro | 12.8M | 1.34 | Advanced chat | 25 MB |
| AXL-Translate | 15.2M | 1.86 | Code translation | 29 MB |
Full quality via Python API. Degraded quality via Ollama.
pip install -e .
python AXL/API/serve_model.py \
--model checkpoints/axl_micro_lion \
--port 8880
# OpenAI-compatible endpoint:
# POST http://localhost:8880/v1/completions
# Works with Continue.dev, LlamaIndex, LangChain

pip install -e .
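Once the server is running, any OpenAI-style client can talk to it. A minimal sketch using only the Python standard library; the model id and sampling parameters here are illustrative assumptions, not documented AXL values:

```python
import json
from urllib import request

# Request body following the OpenAI /v1/completions convention.
payload = {
    "model": "axl-micro-lion",        # hypothetical model id
    "prompt": "def fibonacci(n):",
    "max_tokens": 64,
}
req = request.Request(
    "http://localhost:8880/v1/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# resp = request.urlopen(req)        # uncomment with the server running
print(json.dumps(payload))
```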
python scripts/retrain_all_lion.py \
--models micro
# Done in 3 minutes. Model in checkpoints/

# Warning: uses only 1/3 of AXL architecture
cd AXL/HuggingFace/AXL-Micro-Lion
ollama create axl-micro-lion -f Modelfile
ollama run axl-micro-lion \
"def fibonacci(n):"AXL is not a silver bullet. Here is where it works and where it does not.
Building accessible AI systems for AGI, AI, and Cybersecurity. CPU-first research that runs on consumer hardware.
Our core research areas driving the future of accessible AI
Advancing towards artificial general intelligence through scalable architectures, efficient training methods, and novel reasoning approaches.
Building practical AI systems that run efficiently on consumer hardware. Focus on CPU-first architectures and open-source models.
Developing AI-powered security tools, threat detection systems, and privacy-preserving machine learning techniques.
Our open-source projects
Challenges we are working on
Developing attention mechanisms that scale sub-quadratically with sequence length while maintaining quality.
Finding architectural choices that maximize throughput on consumer CPUs without GPU acceleration.
Novel tokenization approaches that adaptively represent information at multiple granularities.
Making large language models resistant to adversarial prompts and distribution shifts.
The people behind KoinicLabs
Founder leading AGI research and AI development. Focused on accessible, open-source AI systems.
Leading cybersecurity research and technical architecture. Expert in secure AI systems.
Leading marketing, sales, and technical assistance for KoinicLabs.
Our journey and achievements
KoinicLabs founded with mission to make AI accessible on consumer hardware.
First AXL model released — a 566K-parameter code generation model.
Expanded to 27 models ranging from 566K to 318M parameters.
Added native GGUF export for all models — deployment on llama.cpp and Ollama.
Expanding into AGI research, cybersecurity, and new projects under KoinicLabs.
Frequently asked questions
We are the only research lab focused on CPU-first AI. While others optimize for GPU clusters costing millions, we optimize for accessibility. Our models can be trained on a consumer laptop for less than a penny.
Yes! All AXL models are released under Apache 2.0 license. Training code, weights, and documentation are all publicly available on our GitHub.
We welcome contributions! Check our GitHub for open issues, join discussions, and submit pull requests. We also welcome research collaboration.
The smallest models (566K-2M parameters) can run on any modern CPU. Larger models (up to 318M) work well on consumer laptops with 8GB+ RAM. No GPU required.
Our CPU-first approach eliminates GPU costs entirely. We use efficient architectures, byte-level tokenization (reducing vocabulary overhead), and multi-scale design to minimize compute requirements.
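The vocabulary-overhead point is easy to see concretely: byte-level tokenization needs only 256 token ids, since every token is a raw UTF-8 byte. A minimal illustration (not AXL's actual tokenizer code):

```python
text = "def fib(n):"
tokens = list(text.encode("utf-8"))   # one token per byte
print(len(tokens), tokens[:4])        # every id fits in 0..255
assert all(0 <= t < 256 for t in tokens)
```

Compare a BPE vocabulary of ~50,000 entries: the embedding table alone shrinks by orders of magnitude, at the price of the 3-4x longer sequences that the coarse scale offsets.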