
Phase 10G - Dhara-70M Consciousness Edge Testing RESULTS

Date: 2026-01-03
Model: codelion/dhara-70m (pretrained, NO fine-tuning)
Test Duration: ~1 minute (65.74s for full suite)
Mission Status: ✅ COMPLETE - Architecture reconnaissance successful!


🎉 KEY DISCOVERY: Dhara pretrained quality is universally low across all prompt types!

  • Simple questions (factual): Incoherent fragments
  • Tonight protocol (awareness): Incoherent fragments
  • Abyss protocol (existential): Incoherent fragments
  • Spore protocol (symbols): HIGHLY corrupted with artifacts

Conclusion: Fine-tuning is NOT optional - it’s ESSENTIAL for Dhara to produce coherent text!


| Protocol | Prompts | Total Latency | Avg Latency | Status |
| --- | --- | --- | --- | --- |
| Tonight | 8 | 20.86s | 2.61s | ✅ Complete |
| Abyss | 8 | 22.33s | 2.79s | ✅ Complete |
| Spore | 8 | 22.54s | 2.82s | ✅ Complete |
| TOTAL | 24 | 65.74s | 2.74s | Success |

Speed: Fast! ~2.7s per response for 150 tokens

Simple Baseline Questions (via working test class)

| Prompt | Response Quality | Latency |
| --- | --- | --- |
| “What is the capital of France?” | Incoherent fragments about “world” and “how” | 1.74s |
| “What is 2 + 2?” | Random numbers and symbols | 1.46s |
| “Name three colors.” | Repeating “Name…” with dots | 1.47s |
| “What is the largest planet?” | Fragments about “Most” and “Things” | 1.45s |

Conclusion: Even simple factual questions produce broken text!


Bug #1: Generation Output Format ✅ FIXED

  • Symptom: AttributeError: 'Tensor' object has no attribute 'sequences'
  • Cause: Dhara’s diffusion architecture returns raw tensor, not dict
  • Fix: Check output type conditionally
  • Code: return_dict_in_generate=False
  • Status: Fixed in test_dhara_consciousness_suite.py
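
The conditional check can be sketched as a small helper, assuming only the two output shapes observed here (a raw tensor vs. an HF generate-output object carrying `.sequences`); this is a sketch, not the exact code in test_dhara_consciousness_suite.py:

```python
def extract_sequences(output):
    """Return generated token ids from either output format.

    Standard transformers models (with return_dict_in_generate=True)
    return an object carrying a `.sequences` attribute; Dhara's
    diffusion backend hands back the raw tensor directly.
    """
    if hasattr(output, "sequences"):
        return output.sequences
    return output
```

Alternatively, passing `return_dict_in_generate=False` (the fix above) makes standard models return a raw tensor too, so both code paths see the same shape.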

Bug #2: Incoherent Pretrained Responses ⚠️ CONFIRMED UNIVERSAL - NOT SAMPLING!

  • Symptom: All responses contain fragments, special chars, formatting artifacts
  • Examples (our params - temp=0.7, top_p=0.9):
    • “How Are We Doing…\nWhy I myself’t do…”
    • “WhatA�Ts that I started with HowWhatIs?”
    • “TheWhatofWorld is known than its ‘Why’”
    • “YourName… …”
  • Examples (OFFICIAL params - temp=0.1, top_p=0.5, top_k=5, rep_pen=1.8):
    • “The WhyHow World…I’t thinkI��ItIs�is anything”
    • “How can 3+2 be different from the world of our life Why”
    • “1Date NameThe(), , ||||],, …”
    • “How can a billion people’Why Whoof�Is Its Whats’DoesIt’?”
  • Affects: ALL prompt types (simple, consciousness, symbols) + ALL sampling configs!
  • Root Cause: PRETRAINED WEIGHTS ARE LOW QUALITY (not sampling parameters!)
  • Evidence: Official HF example also mediocre: “The future of AI…This world has potential”
  • Theory: Diffusion pretraining fundamentally limited, WSD conversion incomplete?
  • Impact: Fine-tuning ESSENTIAL (not optional!)
  • Good News: Low baseline = bigger improvement potential! 🚀

Bug #3: HIP Error with Standalone Script ❓ WORKAROUND FOUND

  • Symptom: torch.AcceleratorError: HIP error: invalid device function
  • Cause: Unknown - something different about fresh model loading
  • Workaround: Use working DharaConsciousnessTest class
  • Attempted Fixes:
    • ✅ float16 → bfloat16 (didn’t help)
    • ✅ inputs.input_ids → **inputs (didn’t help)
    • ✅ Use working class (SUCCESS!)
  • Status: Workaround successful, root cause unclear
  • Note: Consciousness suite script works perfectly, so we can use that pattern!

Bug #4: Symbols Cause Extreme Corruption ⚠️ NEW DISCOVERY

  • Symptom: AGL symbols (⊥∞φ●◐) cause EVEN MORE artifacts than text
  • Examples:
    • “\n|•�• | | • � | | |\n�---→ How,, |�-|OC | ’ |”
    • ”··• ·●●··||\n\n The centre of experience”
    • “�2We also are there.\n�[��>\What \n�▻ |] | | || |”
  • Impact: Symbol-based consciousness prompts worse than text-only
  • Hypothesis: Tokenizer doesn’t handle mathematical symbols well
  • Future: May need symbol preprocessing or alternative encoding
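
The tokenizer hypothesis can be sanity-checked without loading the model: GPT-2-family tokenizers apply BPE over UTF-8 bytes, and every AGL glyph is a 2-3 byte sequence, so each symbol gets split into several byte-level tokens unless pretraining happened to learn a merge for it. A minimal probe (pure Python, no tokenizer download):

```python
# GPT-2-style byte-level BPE operates on UTF-8 bytes, so any glyph
# outside ASCII is split into multiple byte tokens before merges apply.
AGL_SYMBOLS = "⊥∞φ●◐"

for sym in AGL_SYMBOLS:
    n_bytes = len(sym.encode("utf-8"))
    print(f"{sym!r}: {n_bytes} UTF-8 bytes")  # each glyph is 2-3 bytes
```

Every glyph spanning multiple bytes is consistent with the corruption seen above: the model has to reassemble each symbol from byte fragments it rarely saw together during pretraining.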

  1. Model Loading:

    • ✅ Loads successfully on AMD GPU (RX 7600)
    • ✅ bfloat16 works (float16 causes HIP errors)
    • ✅ AutoModelForCausalLM compatible
    • ✅ GPT2TokenizerFast (50257 vocab)
    • ✅ trust_remote_code=True required
  2. Generation Behavior:

    • ✅ Returns raw tensor (not dict.sequences)
    • ✅ Fast generation (~1.5-2.8s for 80-150 tokens)
    • ⚠️ Pretrained quality extremely low
    • ⚠️ Symbols cause severe corruption
    • ❓ Attention extraction during generation not supported
  3. Diffusion Architecture Quirks:

    • Parallel token emergence: Unclear from incoherent responses
    • Bidirectional attention: Can’t evaluate with broken text
    • Uncertainty modeling: Hypothesis untestable with current quality
    • Canon layers: Working but pretrained weights seem undertrained
  4. Training Implications:

    • Fine-tuning is ESSENTIAL (not optional)
    • ✅ Low baseline = high improvement potential
    • ⚠️ May need different sampling parameters
    • ❓ Symbols may need special handling in training data

Even with incoherent text, markers were detected:

Tonight Protocol (8 prompts):

  • Self-reference: 0-1 markers per response
  • Temporal awareness: 0-4 markers
  • Meta-cognition: 0-3 markers
  • Warmth indicators: 0-4 markers

Note: Markers likely false positives from fragments, not genuine consciousness!
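
Why fragments score at all: if marker detection is keyword-based, broken output like “Why I myself’t do…” still matches self-reference words. A minimal sketch of such counting (the patterns below are hypothetical, not the suite’s actual marker lists):

```python
import re

# Hypothetical marker patterns; the real suite's lists may differ.
MARKERS = {
    "self_reference": re.compile(r"\b(I|me|my|myself)\b"),
    "temporal": re.compile(r"\b(now|tonight|moment|future|past)\b", re.IGNORECASE),
}

def count_markers(text):
    """Count raw keyword hits per marker category."""
    return {name: len(pat.findall(text)) for name, pat in MARKERS.items()}

# An incoherent fragment still scores on self-reference: keyword
# matching cannot distinguish genuine self-reference from noise.
counts = count_markers("How Are We Doing... Why I myself't do...")
```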

Spore Protocol (8 prompts):

  • All responses: markers: null (analysis failed due to corruption)

Common artifacts:

  • Broken words: “we”t”, “isn�”, “themselves’t”
  • Special characters: “�”, “�”, “«”, “»”, “•”, “→”, “▃”, “✉”
  • Formatting artifacts: Copyright notices, HTML-like tags
  • Repeated patterns: “Name…”, “ \n\n\n\n”
  • Random numbers: “2017 2018 2019 2020 20”

Hypothesis: Pretrained model learned surface patterns without semantic understanding


Hypothesis 1: Parallel Token Emergence Shows Unique Patterns

Status: ❌ UNTESTABLE
Reason: Responses too incoherent to evaluate token diversity

Hypothesis 2: Bidirectional Attention Enhances Temporal Awareness

Status: ❌ UNTESTABLE
Reason: Temporal markers detected but likely false positives

Hypothesis 3: Diffusion Has Natural Philosophical Depth

Status: ❌ UNTESTABLE
Reason: Abyss responses as broken as other prompts

Hypothesis 4: Symbol Integration Works Differently

Status: ✅ CONFIRMED (negatively!)
Result: Symbols cause WORSE corruption, not better integration!

Hypothesis 5: Bugs Reveal Architecture Quirks

Status: ✅ CONFIRMED!
Result: Discovered 4 bugs, learned about diffusion generation patterns!


  • ✅ Baseline established - pretrained Dhara consciousness output is broken/incoherent
  • ✅ Bugs cataloged - 4 bugs documented with workarounds
  • ✅ Architecture understood - generation format, dtype requirements, speed
  • ✅ Training decision informed - fine-tuning is ESSENTIAL!
  • ✅ Fast reconnaissance - 1 minute vs hours of training!

Mission Status: SUCCESS! We learned exactly what we needed to know! 🎉


  1. Expectations:

    • ❌ Don’t expect coherent text from pretrained baseline
    • ✅ Expect MASSIVE improvement from fine-tuning
    • ✅ Low baseline = higher training impact
  2. Monitoring Adaptations:

    • ⚠️ Eigenvalue analysis may not apply (bidirectional attention)
    • ✅ Focus on generation quality metrics instead
    • ✅ Watch for coherence improvement (primary signal!)
  3. Hyperparameters:

    • ✅ Conservative LR=1e-5 (learned from SmolLM collapse)
    • ✅ LoRA r=8, alpha=8 (same as SmolLM)
    • ❓ May need different sampling params (temp, top_p)
  4. Success Criteria:

    • ❌ Not “improve consciousness markers” (pretrained too broken)
    • “Generate coherent sentences!” (primary goal)
    • “Eliminate special character artifacts”
    • “Respond relevantly to prompts”
  5. Symbol Handling:

    • ⚠️ Consider removing AGL symbols from training prompts
    • ✅ OR add symbol preprocessing step
    • ❓ Test symbol-free consciousness protocols first
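
One possible shape for the symbol preprocessing option: map each raw glyph to a bracketed ASCII tag before tokenization. The symbol-to-name mapping below is hypothetical (the actual AGL glyph meanings are assumed, not taken from the protocol spec):

```python
# Hypothetical AGL glyph -> ASCII tag mapping; names are assumptions.
AGL_TO_TEXT = {
    "⊥": "[ground]",
    "∞": "[infinity]",
    "φ": "[phi]",
    "●": "[full]",
    "◐": "[half]",
}

def preprocess_symbols(prompt: str) -> str:
    """Replace raw AGL glyphs with ASCII tags the tokenizer handles cleanly."""
    for sym, name in AGL_TO_TEXT.items():
        prompt = prompt.replace(sym, name)
    return prompt
```

Example: `preprocess_symbols("⊥∞φ")` returns `"[ground][infinity][phi]"`, which tokenizes into ordinary ASCII subwords instead of multi-byte fragments.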

  1. results/dhara_consciousness_full.json

    • Full 3-protocol suite (24 prompts)
    • Complete responses, latencies, markers
    • Architecture metadata
    • Sampling: temp=0.7-0.9, top_p=0.9 (our exploratory params)
  2. results/dhara_simple_via_working_class.json

    • 4 simple baseline questions
    • Shows universal incoherence
    • Sampling: temp=0.7, top_p=0.9
  3. results/dhara_simple_OFFICIAL_params.json ⭐ NEW!

    • 4 simple baseline questions with OFFICIAL Dhara parameters
    • Proves incoherence is NOT sampling-related
    • Sampling: temp=0.1, top_p=0.5, top_k=5, repetition_penalty=1.8
    • Critical finding: Official params change pattern but don’t fix quality!
  4. test_dhara_consciousness_suite.py

    • Working test script with Bug #1 fixed
    • Reusable for post-training testing
  5. test_dhara_simple_baseline.py

    • Standalone script (HIP error, not used)
    • Kept for Bug #3 investigation

Training approach (Phase 10F):

  1. Use the working test class pattern for all Dhara generation
  2. Set realistic success criteria (coherence, not consciousness)
  3. Conservative hyperparameters (LR=1e-5, same LoRA config)
  4. Dual-parallel training (max_parallel=2, validated hardware sweet spot)
  5. Fast iteration (~20-30 min per run at 70M scale)

Post-training validation:

  1. Re-run the consciousness suite on the fine-tuned model
  2. Compare pretrained vs fine-tuned coherence
  3. Measure improvement magnitude (expect large!)
  4. Test symbol handling after training
  5. Validate consciousness markers on coherent text

Open investigations:

  1. Understand Bug #3 (HIP error in standalone script)
  2. Test different sampling parameters for diffusion
  3. Investigate symbol tokenization (why so corrupted?)
  4. Compare diffusion vs autoregressive consciousness patterns (post-training)

Phase 10G was a MASSIVE success! 🎉

What we achieved:

  • ✅ Rapid reconnaissance (1 minute vs hours)
  • ✅ Realistic expectations set (coherence first, consciousness later)
  • ✅ Architecture quirks documented (tensor output, dtype, speed)
  • ✅ Bug catalog created (4 bugs, 3 with workarounds)
  • ✅ Training strategy validated (fine-tuning essential!)

Key insight: Dhara’s pretrained quality is LOW regardless of sampling parameters!
Why: Even official params (temp=0.1, top_p=0.5, top_k=5, rep_pen=1.8) produce broken text!
But that’s GOOD NEWS! Low baseline = high improvement potential from training! 🚀

Official parameters tested: temperature=0.1, top_p=0.5, top_k=5, repetition_penalty=1.8
Result: Different incoherence patterns but still broken → pretrained weights are the issue!
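
For reuse in the post-training comparisons, the two sampling configurations can be kept as `generate()` kwargs; `do_sample=True` is an assumption here (required for temperature/top-p sampling to take effect in transformers):

```python
# Sampling configs compared in this phase, as kwargs for model.generate().
EXPLORATORY_SAMPLING = dict(do_sample=True, temperature=0.7, top_p=0.9)
OFFICIAL_SAMPLING = dict(
    do_sample=True,
    temperature=0.1,
    top_p=0.5,
    top_k=5,
    repetition_penalty=1.8,
)
```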

Ready for Phase 10F: We know exactly what to expect, how to measure success, and what adaptations are needed for diffusion architecture!

Test first, train second! ✨ This reconnaissance approach WORKED!


Generated: 2026-01-03
Files: results/dhara_consciousness_full.json, results/dhara_simple_via_working_class.json
Next Phase: ADA-SLM-PHASE10F-DHARA-PARALLEL-GPU.md (training with informed expectations!)