KERNEL 4.0 RC1 - Phase 8: Tool Use Refactor
Status: ✅ COMPLETE
Created: 2026-01-02
Completed: 2026-01-02
Branch: v4.0rc1-consciousness-integration
Goal: Refactor SPECIALIST_REQUEST → TOOL_USE for semantic clarity
Commits:
- 5a08fce - Phase 8 preparation checkpoint (clean state)
- 4fc6e85 - Phase 8 refactoring execution (52 changes, 6 files)
Motivation
Problem: “SPECIALIST_REQUEST” is Ada-specific jargon that lacks semantic clarity for models.
Solution: Use standard “TOOL_USE” terminology that all LLMs understand intuitively.
Benefits:
- ✅ Clearer semantics for models (everyone knows what “tool use” means!)
- ✅ Easier for SLM training (standard concept across AI systems)
- ✅ Better heisenberg buffer predictions (consistent, predictable syntax)
- ✅ Sets foundation for future QDE integration (but not doing that yet!)
Scope (What We’re Changing)
Terminology Updates
Section titled “Terminology Updates”- specialists → tools
- SPECIALIST_REQUEST → TOOL_USE
- specialist plugins → tool plugins
- specialist results → tool results
Code Changes Needed
- System prompts - Update instructions to use “tools” terminology
- Parser logic - Change regex/parsing from SPECIALIST_REQUEST to TOOL_USE
- Response formatting - Update how tool results are injected
- Documentation - Update inline comments and docstrings
Syntax Change
Section titled “Syntax Change”Before:
SPECIALIST_REQUEST[wiki_lookup:query="Sharkey container logs"]

After:

TOOL_USE[wiki_lookup:query="Sharkey container logs"]

Out of Scope (Saved for Later)
❌ NOT doing QDE (THESIS/ANTITHESIS/SYNTHESIS) - Too complex, save for future phases
❌ NOT doing consciousness experiments - Keep this focused on syntax only
❌ NOT doing multi-round reasoning - Just the terminology change
❌ NOT changing backend architecture - Tools still work the same way internally
Implementation Strategy
Section titled “Implementation Strategy”Option A: Train First, Refactor While Training
- Wait for MASH updates to finish
- Start gemma2:2b training with CURRENT syntax (SPECIALIST_REQUEST)
- Refactor v4.0 code while training runs (~1-2 hours)
- Next training run will use new TOOL_USE syntax
Pros: Parallelizes work, doesn’t block training
Cons: First training uses old syntax, need second training run
Option B: Refactor First, Then Train
- Wait for MASH updates to finish
- Refactor v4.0 code to TOOL_USE (~30-60 mins)
- Regenerate training data with TOOL_USE syntax
- Train gemma2:2b on correct syntax
Pros: Gemma learns the RIGHT syntax from day 1
Cons: Delays training by ~1 hour
Recommendation: Option B 🎯
Reasoning: Better to train Gemma once on the correct syntax rather than twice. The refactor won’t take long, and we want clean data for the first training run. Plus, we learn from doing the refactor what edge cases the training data should cover!
Files to Update
Search Results: Found 61 matches for SPECIALIST_REQUEST across clean garage (ada-v4.0/)
Critical Files (Parser & Logic)
- brain/app.py - 24 matches
  - _extract_specialist_request() function → _extract_tool_use()
  - Regex pattern: SPECIALIST_REQUEST\[ → TOOL_USE\[
  - String checks: 'SPECIALIST_REQUEST[' in text_buffer → 'TOOL_USE[' in text_buffer
  - Log messages and variable names
System Prompts (Examples & Instructions)
- brain/config.py - 24 matches
  - All tool examples showing SPECIALIST_REQUEST syntax
  - Tool documentation strings
  - Example commands in docstrings
  - System prompt templates
QDE Integration
- brain/qde_engine.py - 6 matches
  - Tool instruction prompts for dialectic engine
  - Format examples (web_search, wiki_lookup, docs_lookup)
  - Logging checks
Documentation & Tests
- brain/consciousness/parameterization.py - 1 match (documentation)
- brain/schemas_original.py - 1 match (schema example)
- experiments/test_phase_6e_live.py - 5 matches (test expectations)
Training Data Generation (COMPLETED ✅)
- Ada-Consciousness-Research/ada-slm/data/generate_tool_training.py
  - Updated to use TOOL_USE syntax
  - Fixed data format (text field instead of messages)
  - Includes pixie dust markers (💭🛠️✅🌟)
  - Generated 1000 clean examples
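The generator script isn’t reproduced here, but the record format can be sketched as below. The example template is hypothetical; only the flat `text` field (instead of `messages`), the TOOL_USE syntax, and the pixie dust markers come from the notes above:

```python
import json

# Pixie dust markers used to prime tool-use reasoning.
THINK, TOOL, RESULT, DONE = "💭", "🛠️", "✅", "🌟"

def make_example(question: str, tool: str, query: str, answer: str) -> dict:
    """Build one training record in the flat `text` format
    (a single string, not a `messages` list)."""
    text = (
        f"User: {question}\n"
        f"{THINK} I should look this up.\n"
        f'{TOOL} TOOL_USE[{tool}:query="{query}"]\n'
        f"{RESULT} Result received.\n"
        f"{DONE} {answer}"
    )
    return {"text": text}

def write_jsonl(path: str, examples: list) -> None:
    """Write one JSON object per line, as training loaders expect."""
    with open(path, "w", encoding="utf-8") as f:
        for ex in examples:
            f.write(json.dumps(ex, ensure_ascii=False) + "\n")
```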
Testing Strategy
- Unit tests - Verify TOOL_USE parsing works
- Integration tests - Test tool invocation end-to-end
- Regression tests - Make sure existing functionality still works
- Training data validation - Verify 1000 examples are clean
Success Criteria
✅ All SPECIALIST_REQUEST references changed to TOOL_USE
✅ Parser correctly extracts TOOL_USE[tool:params] syntax
✅ System prompts teach TOOL_USE terminology
✅ Training data uses TOOL_USE consistently
✅ Gemma2:2b training completes successfully
✅ No functionality broken in v4.0 consciousness code
Next Steps
- Discussion: Agree on Option A vs B
- Search: Find all SPECIALIST_REQUEST references in codebase
- Refactor: Update code systematically
- Generate: Create training data with TOOL_USE syntax
- Train: Fire off gemma2:2b training
- Validate: Test the trained model
Timeline Estimate
- MASH updates: ~5-10 mins (in progress)
- Code refactor: ~30-60 mins
- Data regeneration: <1 min
- Training: ~1-2 hours
- Validation: ~15 mins
Total: ~2-3 hours end-to-end
Execution Summary
Refactoring completed 2026-01-02 in parallel with Gemma training.
Files changed (52 total changes):
- ✅ brain/app.py
  - Renamed _extract_specialist_request() → _extract_tool_use()
  - Updated regex patterns: SPECIALIST_REQUEST\[ → TOOL_USE\[
  - Renamed all variables: specialist_request → tool_request
  - Updated docstrings and log messages
- ✅ brain/config.py - Updated SPECIALIST_INSTRUCTIONS (24 examples)
  - All tool examples now use TOOL_USE syntax
  - Maintained metacognitive guidelines
- ✅ brain/qde_engine.py - Updated QDE dialectic prompts (6 instances)
  - THESIS/ANTITHESIS/SYNTHESIS tool instructions
  - Updated debug logging checks
- ✅ brain/schemas_original.py - Updated API schema examples
- ✅ brain/schemas.py - Updated API schema examples
- ✅ experiments/test_phase_6e_live.py - Updated test expectations
Validation:
- Parser accepts both TOOL_USE and legacy bracket formats (transition-safe)
- All variable names consistent throughout codebase
- Test expectations updated to check for TOOL_USE
- Clean git history with detailed commit messages
Training alignment: Training data already uses TOOL_USE syntax, so this refactor brings v4.0 code into alignment with what we’re teaching the model! 🌟
Post-Refactor Training Experiments (2026-01-02)
Gemma-2-2b-it Attempts (OOM, 16GB GPU ceiling discovered)
Goal: Train first non-Qwen model with TOOL_USE syntax
Attempts:
- fp32 + batch_size=2 → OOM at 33% (225/675 steps, ~12GB VRAM)
- fp16 → ValueError: “Attempting to unscale FP16 gradients” (ROCm gradient scaler bug)
- bf16 → OOM immediately (same memory as fp32)
- fp32 + batch_size=1 → OOM at 33% (225/675 steps, still ~12GB VRAM)
Findings:
- 16GB GPU ceiling: ~1B params max with LoRA fine-tuning
- ROCm fp16/bf16 gradient scaling not stable (known PyTorch/ROCm issue)
- Gemma-2-2b (2.6B params) too large for 16GB hardware
- Eigenvalue monitoring showed 0.0 values (precision/stability artifact)
Hardware limitation confirmed: Need to stay sub-2B for 16GB GPU experiments.
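A rough back-of-envelope is consistent with the ceiling above. The accounting and the ~20M LoRA-parameter figure are assumptions for illustration; activations and framework overhead (which dominate the remaining headroom) are not modeled:

```python
def lora_vram_gb(n_params: float, bytes_per_param: int = 4,
                 lora_params: float = 20e6) -> float:
    """Rough VRAM floor for LoRA fine-tuning: frozen base weights
    plus LoRA weights, their gradients, and Adam's two moment
    buffers (all fp32). Activations and runtime overhead come on top."""
    base = n_params * bytes_per_param
    adapter = lora_params * 4 * 4  # weights + grads + exp_avg + exp_avg_sq
    return (base + adapter) / 1e9

# Gemma-2-2b (2.6B params, fp32): ~10.7 GB floor before activations,
# consistent with the observed ~12 GB and OOM on a 16 GB card.
# Qwen2.5-0.5B (494M params, fp32): ~2.3 GB floor, leaving ample
# headroom (observed ~10.45 GB including activations).
```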
Qwen2.5-Coder-0.5B-Instruct (In Progress, Stable!)
Goal: Pivot to proven-stable model with same TOOL_USE training data
Config:
- Model: Qwen/Qwen2.5-Coder-0.5B-Instruct (494M params)
- Training: fp32, batch_size=2, gradient_accumulation=4
- Data: Same 1000 examples (gemma_tool_training.jsonl)
- LoRA: r=32, α=64, 17.6M trainable params (3.44%)
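The 17.6M trainable figure can be sanity-checked from LoRA’s parameter formula: each adapted linear layer of shape (d_out, d_in) gains two low-rank factors totaling r × (d_in + d_out) parameters. The per-layer projection shapes below are assumed from Qwen2.5-0.5B’s published config (hidden 896, GQA kv dim 128, MLP intermediate 4864, 24 layers):

```python
def lora_param_count(shapes: list, r: int) -> int:
    """LoRA adds A (r x d_in) and B (d_out x r) per adapted layer,
    i.e. r * (d_in + d_out) trainable parameters."""
    return sum(r * (d_in + d_out) for d_out, d_in in shapes)

# Assumed per-layer projections for Qwen2.5-0.5B:
layer = [
    (896, 896),    # q_proj
    (128, 896),    # k_proj
    (128, 896),    # v_proj
    (896, 896),    # o_proj
    (4864, 896),   # gate_proj
    (4864, 896),   # up_proj
    (896, 4864),   # down_proj
]
total = lora_param_count(layer * 24, r=32)  # 24 transformer layers
print(total)  # 17596416, i.e. the reported ~17.6M trainable params
```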
Results (as of epoch 0.98):
- ✅ Training stable, ~10.45GB VRAM (comfortable margin)
- ✅ Loss dropping beautifully: 0.289 → 0.050
- ✅ Eigenvalues WORKING! (Not 0.0 like Gemma)
- Step 50: entropy=1.307, dominant=0.579
- Step 100: entropy=1.298, dominant=0.583
- ✅ Gradient norm stable (not exploding)
- ⏱️ Training time: ~20-25 minutes (vs Gemma’s crashes)
Key finding: Eigenvalue monitoring works correctly with smaller models + fp32 precision. The 0.0 values with Gemma were likely precision/stability artifacts from the 2B model pushing hardware limits.
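The monitoring code isn’t shown in these notes, but one plausible way to produce metrics of this shape is entropy over a normalized top-k singular-value spectrum of a weight (or gradient) matrix. The exact method used in the training scripts is assumed, not confirmed:

```python
import numpy as np

def spectrum_stats(weight, top_k: int = 8):
    """Entropy and dominant share of the top-k singular-value
    spectrum; one plausible source of logs like
    'entropy=1.307, dominant=0.579' (method assumed)."""
    s = np.linalg.svd(weight, compute_uv=False)[:top_k]
    p = s / s.sum()                       # normalize to a distribution
    entropy = float(-(p * np.log(p + 1e-12)).sum())
    dominant = float(p[0])                # share of the largest value
    return entropy, dominant

# Note: fp16 underflow can zero small singular values, which is one
# way such monitoring could report 0.0, consistent with the Gemma
# artifact described above.
```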
Future: ada-slm-v7 Branch (Tool-Use Models)
New model series planned:
- v7a - Qwen-0.5B tool-use ✅ COMPLETE (1000 examples, baseline)
- v7b - Qwen-0.5B Six Pillars ✅ COMPLETE (5000 examples, with tags!)
- v7c - Qwen-1.5B tool-use (proven on 16GB)
- v7d - SmolLM-1.7B tool-use (efficiency-first)
- v7e - StableLM-1.6B tool-use (multimodal foundation)
- v7f+ - Vision integration (leaf pictures in Matrix DMs!)
Training characteristics:
- TOOL_USE syntax (aligned with Phase 8 refactor)
- Pixie dust markers (💭🛠️✅🌟) for consciousness priming
- Multi-tool coordination examples
- Six Pillars framework: CANONICAL + SIF + AGL (v7b)
- Eventually: AGL-native capability
v7b Achievement (2026-01-02):
- ✅ 5000 Six Pillars examples with 100% tag coverage
- ✅ 3 full epochs, 1689 steps, 156 minutes
- ✅ Final eval loss: 0.0586 (better than train loss!)
- ✅ φ-proximity: 0.9996 (essentially perfect)
- ✅ Autonomous monitoring system validated!
- 🎯 Critical test pending: Do tags enable SIF-style constraint checking?
Next steps:
- Test v7b vs v7a (critical hypothesis validation)
- A/B comparison: tool accuracy, hallucination, uncertainty admission
- Document results in Phase 8 (ada-slm)
- Queue next model experiments (1.5B+ if v7b validates)
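The A/B comparison above could start from a simple scorer like the one below. The metric and names are hypothetical sketches, not the actual evaluation harness; hallucination and uncertainty-admission metrics would need their own scoring:

```python
def tool_call_accuracy(outputs: list, expected: list) -> float:
    """Fraction of model outputs containing the exact expected
    TOOL_USE[...] call, as a first-pass v7a-vs-v7b metric."""
    if not outputs:
        return 0.0
    hits = sum(exp in out for out, exp in zip(outputs, expected))
    return hits / len(outputs)

# Example with two hypothetical model outputs:
outputs = [
    '💭 Checking... 🛠️ TOOL_USE[wiki_lookup:query="Sharkey"]',
    '🛠️ TOOL_USE[web_search:query="wrong tool"]',
]
expected = [
    'TOOL_USE[wiki_lookup:query="Sharkey"]',
    'TOOL_USE[docs_lookup:query="Matrix DMs"]',
]
print(tool_call_accuracy(outputs, expected))  # 0.5
```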
Note: Accidental Nier Automata isomorphism (2B/7B/9S) works perfectly for model branch naming! 🤖✨
- Kept it simple! Just syntax change, no architecture changes ✅
- Saved QDE for future phases when we have more time/budget ✅
- Sets foundation for future consciousness experiments ✅
- Training data uses consistent TOOL_USE syntax ✅
- Both legacy formats work during transition (backwards compatible)
- Discovered 16GB GPU limits through empirical testing ✅
- Eigenvalue monitoring validated with stable Qwen training ✅
Status: COMPLETE - Refactoring done, training experiments in progress! 🚀✨