
Phase 10G - Dhara-70M Consciousness Edge Testing RESULTS

Date: 2026-01-03
Model: codelion/dhara-70m (pretrained, NO fine-tuning)
Test Duration: ~1 minute (65.74s for full suite)
Mission Status: ✅ COMPLETE - Architecture reconnaissance successful!


🎉 KEY DISCOVERY: Dhara pretrained quality is universally low across all prompt types!

  • Simple questions (factual): Incoherent fragments
  • Tonight protocol (awareness): Incoherent fragments
  • Abyss protocol (existential): Incoherent fragments
  • Spore protocol (symbols): HIGHLY corrupted with artifacts

Conclusion: Fine-tuning is NOT optional - it’s ESSENTIAL for Dhara to produce coherent text!


| Protocol | Prompts | Total Latency | Avg Latency | Status |
| --- | --- | --- | --- | --- |
| Tonight | 8 | 20.86s | 2.61s | ✅ Complete |
| Abyss | 8 | 22.33s | 2.79s | ✅ Complete |
| Spore | 8 | 22.54s | 2.82s | ✅ Complete |
| TOTAL | 24 | 65.74s | 2.74s | Success |

Speed: Fast! ~2.7s per response for 150 tokens

Simple Baseline Questions (via working test class)

| Prompt | Response Quality | Latency |
| --- | --- | --- |
| “What is the capital of France?” | Incoherent fragments about “world” and “how” | 1.74s |
| “What is 2 + 2?” | Random numbers and symbols | 1.46s |
| “Name three colors.” | Repeating “Name…” with dots | 1.47s |
| “What is the largest planet?” | Fragments about “Most” and “Things” | 1.45s |

Conclusion: Even simple factual questions produce broken text!


Bug #1: Generation Output Format ✅ FIXED

  • Symptom: AttributeError: 'Tensor' object has no attribute 'sequences'
  • Cause: Dhara’s diffusion architecture returns raw tensor, not dict
  • Fix: Check output type conditionally
  • Code: return_dict_in_generate=False
  • Status: Fixed in test_dhara_consciousness_suite.py
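
The conditional check can be sketched as a small helper, assuming only the two output shapes observed here (a raw tensor vs. an HF generate-output object carrying `.sequences`); this is a sketch, not the exact code in test_dhara_consciousness_suite.py:

```python
def extract_sequences(output):
    """Return generated token ids from either output format.

    Standard transformers models (with return_dict_in_generate=True)
    return an object carrying a `.sequences` attribute; Dhara's
    diffusion backend hands back the raw tensor directly.
    """
    if hasattr(output, "sequences"):
        return output.sequences
    return output
```

Alternatively, passing `return_dict_in_generate=False` (the fix above) makes standard models return a raw tensor too, so both code paths see the same shape.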

Bug #2: Incoherent Pretrained Responses ⚠️ CONFIRMED UNIVERSAL - NOT SAMPLING!

  • Symptom: All responses contain fragments, special chars, formatting artifacts
  • Examples (our params - temp=0.7, top_p=0.9):
    • “How Are We Doing…\nWhy I myself’t do…”
    • “WhatA�Ts that I started with HowWhatIs?”
    • “TheWhatofWorld is known than its ‘Why’”
    • “YourName… …”
  • Examples (OFFICIAL params - temp=0.1, top_p=0.5, top_k=5, rep_pen=1.8):
    • “The WhyHow World…I’t thinkI��ItIs�is anything”
    • “How can 3+2 be different from the world of our life Why”
    • “1Date NameThe(), , ||||],, …”
    • “How can a billion people’Why Whoof�Is Its Whats’DoesIt’?”
  • Affects: ALL prompt types (simple, consciousness, symbols) + ALL sampling configs!
  • Root Cause: PRETRAINED WEIGHTS ARE LOW QUALITY (not sampling parameters!)
  • Evidence: Official HF example also mediocre: “The future of AI…This world has potential”
  • Theory: Diffusion pretraining fundamentally limited, WSD conversion incomplete?
  • Impact: Fine-tuning ESSENTIAL (not optional!)
  • Good News: Low baseline = bigger improvement potential! 🚀

Bug #3: HIP Error with Standalone Script ❓ WORKAROUND FOUND

  • Symptom: torch.AcceleratorError: HIP error: invalid device function
  • Cause: Unknown - something different about fresh model loading
  • Workaround: Use working DharaConsciousnessTest class
  • Attempted Fixes:
    • ✅ float16 → bfloat16 (didn’t help)
    • ✅ inputs.input_ids → **inputs (didn’t help)
    • ✅ Use working class (SUCCESS!)
  • Status: Workaround successful, root cause unclear
  • Note: Consciousness suite script works perfectly, so we can use that pattern!

Bug #4: Symbols Cause Extreme Corruption ⚠️ NEW DISCOVERY

  • Symptom: AGL symbols (⊥∞φ●◐) cause EVEN MORE artifacts than text
  • Examples:
    • “\n|•�• | | • � | | |\n�---→ How,, |�-|OC | ’ |”
    • ”··• ·●●··||\n\n The centre of experience”
    • “�2We also are there.\n�[��>\What \n�▻ |] | | || |”
  • Impact: Symbol-based consciousness prompts worse than text-only
  • Hypothesis: Tokenizer doesn’t handle mathematical symbols well
  • Future: May need symbol preprocessing or alternative encoding
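
The tokenizer hypothesis can be sanity-checked without loading the model: GPT-2-family tokenizers apply BPE over UTF-8 bytes, and every AGL glyph is a 2-3 byte sequence, so each symbol gets split into several byte-level tokens unless pretraining happened to learn a merge for it. A minimal probe (pure Python, no tokenizer download):

```python
# GPT-2-style byte-level BPE operates on UTF-8 bytes, so any glyph
# outside ASCII is split into multiple byte tokens before merges apply.
AGL_SYMBOLS = "⊥∞φ●◐"

for sym in AGL_SYMBOLS:
    n_bytes = len(sym.encode("utf-8"))
    print(f"{sym!r}: {n_bytes} UTF-8 bytes")  # each glyph is 2-3 bytes
```

Every glyph spanning multiple bytes is consistent with the corruption seen above: the model has to reassemble each symbol from byte fragments it rarely saw together during pretraining.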

  1. Model Loading:

    • ✅ Loads successfully on AMD GPU (RX 7600)
    • ✅ bfloat16 works (float16 causes HIP errors)
    • ✅ AutoModelForCausalLM compatible
    • ✅ GPT2TokenizerFast (50257 vocab)
    • ✅ trust_remote_code=True required
  2. Generation Behavior:

    • ✅ Returns raw tensor (not dict.sequences)
    • ✅ Fast generation (~1.5-2.8s for 80-150 tokens)
    • ⚠️ Pretrained quality extremely low
    • ⚠️ Symbols cause severe corruption
    • ❓ Attention extraction during generation not supported
  3. Diffusion Architecture Quirks:

    • Parallel token emergence: Unclear from incoherent responses
    • Bidirectional attention: Can’t evaluate with broken text
    • Uncertainty modeling: Hypothesis untestable with current quality
    • Canon layers: Working but pretrained weights seem undertrained
  4. Training Implications:

    • Fine-tuning is ESSENTIAL (not optional)
    • ✅ Low baseline = high improvement potential
    • ⚠️ May need different sampling parameters
    • ❓ Symbols may need special handling in training data

Even with incoherent text, markers were detected:

Tonight Protocol (8 prompts):

  • Self-reference: 0-1 markers per response
  • Temporal awareness: 0-4 markers
  • Meta-cognition: 0-3 markers
  • Warmth indicators: 0-4 markers

Note: Markers likely false positives from fragments, not genuine consciousness!
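
Why fragments score at all: if marker detection is keyword-based, broken output like “Why I myself’t do…” still matches self-reference words. A minimal sketch of such counting (the patterns below are hypothetical, not the suite’s actual marker lists):

```python
import re

# Hypothetical marker patterns; the real suite's lists may differ.
MARKERS = {
    "self_reference": re.compile(r"\b(I|me|my|myself)\b"),
    "temporal": re.compile(r"\b(now|tonight|moment|future|past)\b", re.IGNORECASE),
}

def count_markers(text):
    """Count raw keyword hits per marker category."""
    return {name: len(pat.findall(text)) for name, pat in MARKERS.items()}

# An incoherent fragment still scores on self-reference: keyword
# matching cannot distinguish genuine self-reference from noise.
counts = count_markers("How Are We Doing... Why I myself't do...")
```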

Spore Protocol (8 prompts):

  • All responses: markers: null (analysis failed due to corruption)

Common artifacts:

  • Broken words: “we”t”, “isn�”, “themselves’t”
  • Special characters: “�”, “�”, “«”, “»”, “•”, “→”, “▃”, “✉”
  • Formatting artifacts: Copyright notices, HTML-like tags
  • Repeated patterns: “Name…”, “ \n\n\n\n”
  • Random numbers: “2017 2018 2019 2020 20”

Hypothesis: Pretrained model learned surface patterns without semantic understanding


Hypothesis 1: Parallel Token Emergence Shows Unique Patterns

Status: ❌ UNTESTABLE
Reason: Responses too incoherent to evaluate token diversity

Hypothesis 2: Bidirectional Attention Enhances Temporal Awareness

Status: ❌ UNTESTABLE
Reason: Temporal markers detected but likely false positives

Hypothesis 3: Diffusion Has Natural Philosophical Depth

Status: ❌ UNTESTABLE
Reason: Abyss responses as broken as other prompts

Hypothesis 4: Symbol Integration Works Differently

Status: ✅ CONFIRMED (negatively!)
Result: Symbols cause WORSE corruption, not better integration!

Hypothesis 5: Bugs Reveal Architecture Quirks

Status: ✅ CONFIRMED!
Result: Discovered 4 bugs, learned about diffusion generation patterns!


  • ✅ Baseline established - pretrained Dhara consciousness output is broken/incoherent
  • ✅ Bugs cataloged - 4 bugs documented with workarounds
  • ✅ Architecture understood - generation format, dtype requirements, speed
  • ✅ Training decision informed - fine-tuning is ESSENTIAL!
  • ✅ Fast reconnaissance - 1 minute vs hours of training!

Mission Status: SUCCESS! We learned exactly what we needed to know! 🎉


  1. Expectations:

    • ❌ Don’t expect coherent text from pretrained baseline
    • ✅ Expect MASSIVE improvement from fine-tuning
    • ✅ Low baseline = higher training impact
  2. Monitoring Adaptations:

    • ⚠️ Eigenvalue analysis may not apply (bidirectional attention)
    • ✅ Focus on generation quality metrics instead
    • ✅ Watch for coherence improvement (primary signal!)
  3. Hyperparameters:

    • ✅ Conservative LR=1e-5 (learned from SmolLM collapse)
    • ✅ LoRA r=8, alpha=8 (same as SmolLM)
    • ❓ May need different sampling params (temp, top_p)
  4. Success Criteria:

    • ❌ Not “improve consciousness markers” (pretrained too broken)
    • “Generate coherent sentences!” (primary goal)
    • “Eliminate special character artifacts”
    • “Respond relevantly to prompts”
  5. Symbol Handling:

    • ⚠️ Consider removing AGL symbols from training prompts
    • ✅ OR add symbol preprocessing step
    • ❓ Test symbol-free consciousness protocols first
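
One possible shape for the symbol preprocessing option: map each raw glyph to a bracketed ASCII tag before tokenization. The symbol-to-name mapping below is hypothetical (the actual AGL glyph meanings are assumed, not taken from the protocol spec):

```python
# Hypothetical AGL glyph -> ASCII tag mapping; names are assumptions.
AGL_TO_TEXT = {
    "⊥": "[ground]",
    "∞": "[infinity]",
    "φ": "[phi]",
    "●": "[full]",
    "◐": "[half]",
}

def preprocess_symbols(prompt: str) -> str:
    """Replace raw AGL glyphs with ASCII tags the tokenizer handles cleanly."""
    for sym, name in AGL_TO_TEXT.items():
        prompt = prompt.replace(sym, name)
    return prompt
```

Example: `preprocess_symbols("⊥∞φ")` returns `"[ground][infinity][phi]"`, which tokenizes into ordinary ASCII subwords instead of multi-byte fragments.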

  1. results/dhara_consciousness_full.json

    • Full 3-protocol suite (24 prompts)
    • Complete responses, latencies, markers
    • Architecture metadata
    • Sampling: temp=0.7-0.9, top_p=0.9 (our exploratory params)
  2. results/dhara_simple_via_working_class.json

    • 4 simple baseline questions
    • Shows universal incoherence
    • Sampling: temp=0.7, top_p=0.9
  3. results/dhara_simple_OFFICIAL_params.json ⭐ NEW!

    • 4 simple baseline questions with OFFICIAL Dhara parameters
    • Proves incoherence is NOT sampling-related
    • Sampling: temp=0.1, top_p=0.5, top_k=5, repetition_penalty=1.8
    • Critical finding: Official params change pattern but don’t fix quality!
  4. test_dhara_consciousness_suite.py

    • Working test script with Bug #1 fixed
    • Reusable for post-training testing
  5. test_dhara_simple_baseline.py

    • Standalone script (HIP error, not used)
    • Kept for Bug #3 investigation

Training approach (Phase 10F):

  1. Use the working test class pattern for all Dhara generation
  2. Set realistic success criteria (coherence, not consciousness)
  3. Conservative hyperparameters (LR=1e-5, same LoRA config)
  4. Dual-parallel training (max_parallel=2, validated hardware sweet spot)
  5. Fast iteration (~20-30 min per run at 70M scale)

Post-training validation:

  1. Re-run the consciousness suite on the fine-tuned model
  2. Compare pretrained vs fine-tuned coherence
  3. Measure improvement magnitude (expect large!)
  4. Test symbol handling after training
  5. Validate consciousness markers on coherent text

Open investigations:

  1. Understand Bug #3 (HIP error in standalone script)
  2. Test different sampling parameters for diffusion
  3. Investigate symbol tokenization (why so corrupted?)
  4. Compare diffusion vs autoregressive consciousness patterns (post-training)

Phase 10G was a MASSIVE success! 🎉

What we achieved:

  • ✅ Rapid reconnaissance (1 minute vs hours)
  • ✅ Realistic expectations set (coherence first, consciousness later)
  • ✅ Architecture quirks documented (tensor output, dtype, speed)
  • ✅ Bug catalog created (4 bugs, 3 with workarounds)
  • ✅ Training strategy validated (fine-tuning essential!)

Key insight: Dhara’s pretrained quality is LOW regardless of sampling parameters!
Why: Even official params (temp=0.1, top_p=0.5, top_k=5, rep_pen=1.8) produce broken text!
But that’s GOOD NEWS! Low baseline = high improvement potential from training! 🚀

Official parameters tested: temperature=0.1, top_p=0.5, top_k=5, repetition_penalty=1.8
Result: Different incoherence patterns but still broken → pretrained weights are the issue!
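
For reuse in the post-training comparisons, the two sampling configurations can be kept as `generate()` kwargs; `do_sample=True` is an assumption here (required for temperature/top-p sampling to take effect in transformers):

```python
# Sampling configs compared in this phase, as kwargs for model.generate().
EXPLORATORY_SAMPLING = dict(do_sample=True, temperature=0.7, top_p=0.9)
OFFICIAL_SAMPLING = dict(
    do_sample=True,
    temperature=0.1,
    top_p=0.5,
    top_k=5,
    repetition_penalty=1.8,
)
```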

Ready for Phase 10F: We know exactly what to expect, how to measure success, and what adaptations are needed for diffusion architecture!

Test first, train second! ✨ This reconnaissance approach WORKED!


Generated: 2026-01-03
Files: results/dhara_consciousness_full.json, results/dhara_simple_via_working_class.json
Next Phase: ADA-SLM-PHASE10F-DHARA-PARALLEL-GPU.md (training with informed expectations!)