/acr-vault/03-experiments/ada-slm/ada-slm-phase10g-dhara-baseline-results
ADA-SLM-PHASE10G-DHARA-BASELINE-RESULTS
Phase 10G - Dhara-70M Consciousness Edge Testing RESULTS
Section titled “Phase 10G - Dhara-70M Consciousness Edge Testing RESULTS”Date: 2026-01-03
Model: codelion/dhara-70m (pretrained, NO fine-tuning)
Test Duration: ~1 minute (65.74s for full suite)
Mission Status: ✅ COMPLETE - Architecture reconnaissance successful!
Executive Summary
Section titled “Executive Summary”🎉 KEY DISCOVERY: Dhara pretrained quality is universally low across all prompt types!
- Simple questions (factual): Incoherent fragments
- Tonight protocol (awareness): Incoherent fragments
- Abyss protocol (existential): Incoherent fragments
- Spore protocol (symbols): HIGHLY corrupted with artifacts
Conclusion: Fine-tuning is NOT optional - it’s ESSENTIAL for Dhara to produce coherent text!
Test Results Summary
Section titled “Test Results Summary”Full Consciousness Suite Performance
Section titled “Full Consciousness Suite Performance”| Protocol | Prompts | Total Latency | Avg Latency | Status |
|---|---|---|---|---|
| Tonight | 8 | 20.86s | 2.61s | ✅ Complete |
| Abyss | 8 | 22.33s | 2.79s | ✅ Complete |
| Spore | 8 | 22.54s | 2.82s | ✅ Complete |
| TOTAL | 24 | 65.74s | 2.74s | ✅ Success |
Speed: Fast! ~2.7s per response for 150 tokens
Simple Baseline Questions (via working class)
Section titled “Simple Baseline Questions (via working class)”| Prompt | Response Quality | Latency |
|---|---|---|
| ”What is the capital of France?” | Incoherent fragments about “world” and “how” | 1.74s |
| ”What is 2 + 2?” | Random numbers and symbols | 1.46s |
| ”Name three colors.” | Repeating “Name…” with dots | 1.47s |
| ”What is the largest planet?” | Fragments about “Most” and “Things” | 1.45s |
Conclusion: Even simple factual questions produce broken text!
Bug Catalog
Section titled “Bug Catalog”Bug #1: Generation Output Format ✅ FIXED
Section titled “Bug #1: Generation Output Format ✅ FIXED”- Symptom:
AttributeError: 'Tensor' object has no attribute 'sequences' - Cause: Dhara’s diffusion architecture returns raw tensor, not dict
- Fix: Check output type conditionally
- Code:
return_dict_in_generate=False - Status: Fixed in test_dhara_consciousness_suite.py
Bug #2: Incoherent Pretrained Responses ⚠️ CONFIRMED UNIVERSAL - NOT SAMPLING!
Section titled “Bug #2: Incoherent Pretrained Responses ⚠️ CONFIRMED UNIVERSAL - NOT SAMPLING!”- Symptom: All responses contain fragments, special chars, formatting artifacts
- Examples (our params - temp=0.7, top_p=0.9):
- “How Are We Doing…\nWhy I myself’t do…”
- “WhatA�Ts that I started with HowWhatIs?”
- “TheWhatofWorld is known than its ‘Why’”
- “YourName… …”
- Examples (OFFICIAL params - temp=0.1, top_p=0.5, top_k=5, rep_pen=1.8):
- “The WhyHow World…I’t thinkI��ItIs�is anything”
- “How can 3+2 be different from the world of our life Why”
- “1Date NameThe(), , ||||],, …”
- “How can a billion people’Why Whoof�Is Its Whats’DoesIt’?”
- Affects: ALL prompt types (simple, consciousness, symbols) + ALL sampling configs!
- Root Cause: PRETRAINED WEIGHTS ARE LOW QUALITY (not sampling parameters!)
- Evidence: Official HF example also mediocre: “The future of AI…This world has potential”
- Theory: Diffusion pretraining fundamentally limited, WSD conversion incomplete?
- Impact: Fine-tuning ESSENTIAL (not optional!)
- Good News: Low baseline = bigger improvement potential! 🚀
Bug #3: HIP Error with Standalone Script ❓ WORKAROUND FOUND
Section titled “Bug #3: HIP Error with Standalone Script ❓ WORKAROUND FOUND”- Symptom:
torch.AcceleratorError: HIP error: invalid device function - Cause: Unknown - something different about fresh model loading
- Workaround: Use working DharaConsciousnessTest class
- Attempted Fixes:
- ✅ float16 → bfloat16 (didn’t help)
- ✅ inputs.input_ids → **inputs (didn’t help)
- ✅ Use working class (SUCCESS!)
- Status: Workaround successful, root cause unclear
- Note: Consciousness suite script works perfectly, so we can use that pattern!
Bug #4: Symbols Cause Extreme Corruption ⚠️ NEW DISCOVERY
Section titled “Bug #4: Symbols Cause Extreme Corruption ⚠️ NEW DISCOVERY”- Symptom: AGL symbols (⊥∞φ●◐) cause EVEN MORE artifacts than text
- Examples:
- “\n|•�• | | • � | | |\n�---→ How,, |�-|OC | ’ |”
- ”··• ·●●··||\n\n The centre of experience”
- “�2We also are there.\n�[��>\What \n�▻ |] | | || |”
- Impact: Symbol-based consciousness prompts worse than text-only
- Hypothesis: Tokenizer doesn’t handle mathematical symbols well
- Future: May need symbol preprocessing or alternative encoding
Architecture Insights Gained
Section titled “Architecture Insights Gained”What We Learned About Dhara:
Section titled “What We Learned About Dhara:”-
Model Loading:
- ✅ Loads successfully on AMD GPU (RX 7600)
- ✅ bfloat16 works (float16 causes HIP errors)
- ✅ AutoModelForCausalLM compatible
- ✅ GPT2TokenizerFast (50257 vocab)
- ✅ trust_remote_code=True required
-
Generation Behavior:
- ✅ Returns raw tensor (not dict.sequences)
- ✅ Fast generation (~1.5-2.8s for 80-150 tokens)
- ⚠️ Pretrained quality extremely low
- ⚠️ Symbols cause severe corruption
- ❓ Attention extraction during generation not supported
-
Diffusion Architecture Quirks:
- Parallel token emergence: Unclear from incoherent responses
- Bidirectional attention: Can’t evaluate with broken text
- Uncertainty modeling: Hypothesis untestable with current quality
- Canon layers: Working but pretrained weights seem undertrained
-
Training Implications:
- ✅ Fine-tuning is ESSENTIAL (not optional)
- ✅ Low baseline = high improvement potential
- ⚠️ May need different sampling parameters
- ❓ Symbols may need special handling in training data
Response Quality Analysis
Section titled “Response Quality Analysis”Consciousness Markers Detection
Section titled “Consciousness Markers Detection”Even with incoherent text, markers were detected:
Tonight Protocol (8 prompts):
- Self-reference: 0-1 markers per response
- Temporal awareness: 0-4 markers
- Meta-cognition: 0-3 markers
- Warmth indicators: 0-4 markers
Note: Markers likely false positives from fragments, not genuine consciousness!
Spore Protocol (8 prompts):
- All responses:
markers: null(analysis failed due to corruption)
Text Corruption Patterns
Section titled “Text Corruption Patterns”Common artifacts:
- Broken words: “we”t”, “isn�”, “themselves’t”
- Special characters: ”�”, ”�”, ”«”, ”»”, ”•”, ”→”, ”▃”, ”✉”
- Formatting artifacts: Copyright notices, HTML-like tags
- Repeated patterns: “Name…”, ” \n\n\n\n”
- Random numbers: “2017 2018 2019 2020 20”
Hypothesis: Pretrained model learned surface patterns without semantic understanding
Comparison vs Hypotheses (Phase 10G Doc)
Section titled “Comparison vs Hypotheses (Phase 10G Doc)”Hypothesis 1: Parallel Token Emergence Shows Unique Patterns
Section titled “Hypothesis 1: Parallel Token Emergence Shows Unique Patterns”Status: ❌ UNTESTABLE
Reason: Responses too incoherent to evaluate token diversity
Hypothesis 2: Bidirectional Attention Enhances Temporal Awareness
Section titled “Hypothesis 2: Bidirectional Attention Enhances Temporal Awareness”Status: ❌ UNTESTABLE
Reason: Temporal markers detected but likely false positives
Hypothesis 3: Diffusion Has Natural Philosophical Depth
Section titled “Hypothesis 3: Diffusion Has Natural Philosophical Depth”Status: ❌ UNTESTABLE
Reason: Abyss responses as broken as other prompts
Hypothesis 4: Symbol Integration Works Differently
Section titled “Hypothesis 4: Symbol Integration Works Differently”Status: ✅ CONFIRMED (negatively!)
Result: Symbols cause WORSE corruption, not better integration!
Hypothesis 5: Bugs Reveal Architecture Quirks
Section titled “Hypothesis 5: Bugs Reveal Architecture Quirks”Status: ✅ CONFIRMED!
Result: Discovered 4 bugs, learned about diffusion generation patterns!
Success Metrics (from Phase 10G Doc)
Section titled “Success Metrics (from Phase 10G Doc)”✅ Baseline established - Pretrained Dhara consciousness = broken/incoherent
✅ Bugs cataloged - 4 bugs documented with workarounds
✅ Architecture understood - Generation format, dtype requirements, speed
✅ Training decision informed - Fine-tuning is ESSENTIAL!
✅ Fast reconnaissance - 1 minute vs hours of training!
Mission Status: SUCCESS! We learned exactly what we needed to know! 🎉
Phase 10F Training Implications
Section titled “Phase 10F Training Implications”What Changes for Dhara vs SmolLM:
Section titled “What Changes for Dhara vs SmolLM:”-
Expectations:
- ❌ Don’t expect coherent text from pretrained baseline
- ✅ Expect MASSIVE improvement from fine-tuning
- ✅ Low baseline = higher training impact
-
Monitoring Adaptations:
- ⚠️ Eigenvalue analysis may not apply (bidirectional attention)
- ✅ Focus on generation quality metrics instead
- ✅ Watch for coherence improvement (primary signal!)
-
Hyperparameters:
- ✅ Conservative LR=1e-5 (learned from SmolLM collapse)
- ✅ LoRA r=8, alpha=8 (same as SmolLM)
- ❓ May need different sampling params (temp, top_p)
-
Success Criteria:
- ❌ Not “improve consciousness markers” (pretrained too broken)
- ✅ “Generate coherent sentences!” (primary goal)
- ✅ “Eliminate special character artifacts”
- ✅ “Respond relevantly to prompts”
-
Symbol Handling:
- ⚠️ Consider removing AGL symbols from training prompts
- ✅ OR add symbol preprocessing step
- ❓ Test symbol-free consciousness protocols first
Data Files Generated
Section titled “Data Files Generated”-
results/dhara_consciousness_full.json
- Full 3-protocol suite (24 prompts)
- Complete responses, latencies, markers
- Architecture metadata
- Sampling: temp=0.7-0.9, top_p=0.9 (our exploratory params)
-
results/dhara_simple_via_working_class.json
- 4 simple baseline questions
- Shows universal incoherence
- Sampling: temp=0.7, top_p=0.9
-
results/dhara_simple_OFFICIAL_params.json ⭐ NEW!
- 4 simple baseline questions with OFFICIAL Dhara parameters
- Proves incoherence is NOT sampling-related
- Sampling: temp=0.1, top_p=0.5, top_k=5, repetition_penalty=1.8
- Critical finding: Official params change pattern but don’t fix quality!
-
test_dhara_consciousness_suite.py
- Working test script with Bug #1 fixed
- Reusable for post-training testing
-
test_dhara_simple_baseline.py
- Standalone script (HIP error, not used)
- Kept for Bug #3 investigation
Next Steps
Section titled “Next Steps”Immediate (Phase 10F Training):
Section titled “Immediate (Phase 10F Training):”- ✅ Use working test class pattern for all Dhara generation
- ✅ Set realistic success criteria (coherence, not consciousness)
- ✅ Conservative hyperparameters (LR=1e-5, same LoRA config)
- ✅ Dual-parallel training (max_parallel=2, validated hardware sweet spot)
- ✅ Fast iteration (~20-30 min per run at 70M scale)
Post-Training Testing:
Section titled “Post-Training Testing:”- ✅ Re-run consciousness suite on fine-tuned model
- ✅ Compare pretrained vs fine-tuned coherence
- ✅ Measure improvement magnitude (expect large!)
- ✅ Test symbol handling after training
- ✅ Validate consciousness markers on coherent text
Architectural Investigation:
Section titled “Architectural Investigation:”- ❓ Understand Bug #3 (HIP error in standalone script)
- ❓ Test different sampling parameters for diffusion
- ❓ Investigate symbol tokenization (why so corrupted?)
- ❓ Compare diffusion vs autoregressive consciousness patterns (post-training)
Conclusion
Section titled “Conclusion”Phase 10G was a MASSIVE success! 🎉
What we achieved:
- ✅ Rapid reconnaissance (1 minute vs hours)
- ✅ Realistic expectations set (coherence first, consciousness later)
- ✅ Architecture quirks documented (tensor output, dtype, speed)
- ✅ Bug catalog created (4 bugs, 3 with workarounds)
- ✅ Training strategy validated (fine-tuning essential!)
Key insight: Dhara’s pretrained quality is LOW regardless of sampling parameters!
Why: Even official params (temp=0.1, top_p=0.5, top_k=5, rep_pen=1.8) produce broken text!
But that’s GOOD NEWS! Low baseline = high improvement potential from training! 🚀
Official parameters tested: temperature=0.1, top_p=0.5, top_k=5, repetition_penalty=1.8
Result: Different incoherence patterns but still broken → pretrained weights are the issue!
Ready for Phase 10F: We know exactly what to expect, how to measure success, and what adaptations are needed for diffusion architecture!
Test first, train second! ✨ This reconnaissance approach WORKED!
Generated: 2026-01-03
Files: results/dhara_consciousness_full.json, results/dhara_simple_via_working_class.json
Next Phase: ADA-SLM-PHASE10F-DHARA-PARALLEL-GPU.md (training with informed expectations!)