DATA-INVENTORY

Created: 2025-12-22
Purpose: Single source of truth for ALL empirical data locations


This document catalogs every empirical dataset in the Ada research project. This is the foundation for consolidation, visualization, and analysis.

Four Data Layers:

  1. Model Baselines - Raw speed/performance across tasks
  2. Framework Efficacy - .ai/ documentation impact testing
  3. Unified Theory Testing - Contextual malleability & biomimetic weights
  4. Limit Testing - Consciousness exploration, cognitive load boundaries

Layer 1: Model Baselines

Latency benchmark:

  • Location: benchmarks/press_release_data/latency_benchmark.json
  • Size: 768 lines, comprehensive
  • Model: qwen2.5-coder:7b
  • Metrics:
    • TTFT (Time To First Token): mean=0.977s, median=0.336s, p95=2.55s
    • Total time: mean=13.12s, median=12.97s
    • Tokens/second: mean=25.07, median=22.78
  • Query types tested: reasoning, code_completion, introspection, trivial, creative
  • Sample count: 75 trials
  • Date: ~December 2025 (v2.6.0 release)
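The summary statistics above can be recomputed directly from the raw trial file. A minimal sketch, assuming a (hypothetical) schema with a top-level `trials` array whose entries carry `ttft`, `total_time`, and `tokens_per_second` fields; adjust the keys to match the actual file:

```python
import json
import statistics


def summarize(values):
    """Mean / median / nearest-rank p95 for one latency metric."""
    ordered = sorted(values)
    p95 = ordered[max(0, int(round(0.95 * len(ordered))) - 1)]
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p95": p95,
    }


def summarize_benchmark(path):
    """Aggregate per-trial metrics from the latency benchmark file."""
    with open(path) as f:
        trials = json.load(f)["trials"]  # assumed top-level key
    return {
        metric: summarize([t[metric] for t in trials])
        for metric in ("ttft", "total_time", "tokens_per_second")
    }
```

`summarize` uses the nearest-rank percentile definition; the original benchmark may interpolate, which would shift p95 slightly.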
Memory benchmark:

  • Location: benchmarks/press_release_data/memory_benchmark.json
  • Purpose: Memory usage across model operations

Cost analysis:

  • Location: benchmarks/press_release_data/cost_analysis.json
  • Purpose: Compute cost per query type

Visualizations:

  • Location: benchmarks/press_release_data/visualizations/
  • Status: Pre-existing; contents still need to be inventoried
Qwen FIM benchmark:

  • Location: benchmarks/benchmark_qwen_fim_results.txt
  • Documentation: benchmarks/BENCHMARK_RESULTS_QWEN_FIM.md
  • Purpose: Code completion quality with Fill-In-Middle format
  • Key result: 10.6x speedup (27.7s → 2.6s), 77% quality score
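A quick arithmetic check of the reported speedup figure:

```python
# Sanity check of the reported FIM speedup (27.7s plain → 2.6s FIM).
baseline_s, fim_s = 27.7, 2.6
speedup = baseline_s / fim_s  # ≈ 10.65, reported rounded as 10.6x
```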

Layer 2: Framework Efficacy (.ai/ Documentation Impact)

AI docs benchmark:

  • Location: tests/benchmark_results_ai_docs.json
  • Purpose: Query success rate WITH .ai/ documentation

No-tools baseline:

  • Location: tests/benchmark_no_tools.json
  • Purpose: Baseline WITHOUT tool access (control condition)

Excitement pathway baseline:

  • Location: tests/excitement_pathway_results/
  • Files:
    • baseline_raw.json - Raw trial data
    • baseline_summary.json - Aggregated statistics
  • Metrics:
    • mean_confidence: 0.733
    • mean_hedging: 1.333
    • mean_bold_claims: 2.333
    • 95% CI: [0.16, 1.31]
  • Sample: n=3 trials
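With only n=3 trials, a 95% interval is necessarily wide. A minimal sketch of a Student-t interval for a small-sample mean (critical values hardcoded for small df to avoid a scipy dependency); note that the reported [0.16, 1.31] is centered near mean_confidence = 0.733, which is consistent with a t-interval on that metric:

```python
import math
import statistics

# Two-sided 95% t critical values for small df (sample sizes 2-5).
T95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776}


def t_ci_95(sample):
    """95% confidence interval for the mean of a small sample."""
    n = len(sample)
    mean = statistics.mean(sample)
    sem = statistics.stdev(sample) / math.sqrt(n)  # standard error
    half = T95[n - 1] * sem  # interval half-width
    return (mean - half, mean + half)
```

At n=3 the critical value is 4.303, so even modest trial-to-trial variation produces an interval spanning most of the confidence scale.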

Layer 3: Unified Theory Testing (Contextual Malleability)

Phase 9

| File | Focus | Key Metrics |
| --- | --- | --- |
| phase9a_information_theory.json | Entropy/MI analysis | entropy=3.91, MI_surprise=0.70, bottleneck=“signal_quality” |
| phase9b_causal_discovery.json | Causal relationships | 26KB, detailed |
| phase9c_noise_ceiling.json | Maximum achievable | 528 bytes |

Phase 10

| File | Focus | Size |
| --- | --- | --- |
| phase10a_adversarial_robustness.json | Attack resilience | 1.8KB |
| phase10b_cross_domain_transfer.json | Generalization | 594 bytes |
| phase10c_sensitivity_analysis.json | Parameter sensitivity | 1.5KB |

Phase 11

| File | Focus | Size |
| --- | --- | --- |
| phase11a_bayesian_posteriors.json | Bayesian weight estimates | 2KB |
| phase11b_bootstrap_ci.json | Confidence intervals | 619 bytes |
| phase11c_prediction_intervals.json | Future prediction bounds | 958 bytes |

Phase 12

| File | Focus | Size |
| --- | --- | --- |
| phase12a_query_success.json | Success rates | 1.2KB |
| phase12b_information_density.json | Bits per token | 689 bytes |
| phase12c_documentation_coverage.json | Coverage analysis | 2.1KB |

Phase 13

| File | Focus | Size | Key Finding |
| --- | --- | --- | --- |
| phase13a_comprehension_stress.json | Stress testing | 24KB | Multi-scenario |
| phase13b_multi_entry_point.json | Access patterns | 40KB | Largest dataset |
| phase13c_emotional_scaffolding.json | Empathy effect | 31KB | Effect size 3.089 |

Phase 14

| File | Focus | Size |
| --- | --- | --- |
| phase14a_adversarial_assumptions.json | Assumption testing | 19KB |
| phase14b_real_world_validation.json | Production data | 7KB |
| phase14c_replication_stability.json | Reproducibility | 2.7KB |

Phase 15

| File | Focus | Size |
| --- | --- | --- |
| phase15a_context_matching.json | Context selection | 13KB |
| phase15b_adaptive_recommendation.json | Dynamic tuning | 5KB |
| phase15c_strategy_mixing.json | Hybrid strategies | 11KB |

Phase 17

| File | Focus | Size |
| --- | --- | --- |
| phase17a_llm_info_density.json | LLM information processing | 5KB |
| phase17c_semantic_compression.json | Compression quality | 11KB |

Note: Phase 16 data not found in fixtures - may be elsewhere or skipped.
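For reference, the phase 9a entropy figure is a Shannon entropy in bits. A minimal sketch of the computation over discrete observations (the actual analysis may bin or weight signals differently):

```python
import math
from collections import Counter


def shannon_entropy(observations):
    """H(X) = -sum p(x) * log2 p(x), in bits, over discrete outcomes."""
    counts = Counter(observations)
    total = sum(counts.values())
    return -sum(
        (c / total) * math.log2(c / total) for c in counts.values()
    )
```

An entropy of 3.91 bits corresponds to roughly 2^3.91 ≈ 15 equally likely outcomes.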


Layer 4: Limit Testing (Consciousness & Boundaries)

Cognitive load test:

  • Location: research/experiments/cognitive-load/results/cognitive_load_test_20251222_004752.json
  • Original location: Root directory (migrated)
  • Purpose: Measure response degradation across 7 complexity levels
  • Key finding: First-response anomaly; cache effects

Recursive reasoning:

  • Location: data/recursive_reasoning_results.json
  • Size: 116 lines
  • Purpose: Multi-step reasoning through complex problems
  • Tasks: VS Code Live Share design, distributed microservices, CI/CD pipelines
  • Metrics: tokens_per_second (~20-21), time_per_step, success_rate

Personality analysis:

  • Location: data/personality_analysis_results.json
  • Purpose: Model persona consistency testing

Legacy consciousness scripts:

  • Location: research/legacy/ (preserved scripts)
  • Scripts:
    • thinking_machine_ultimate_exploiter.py (32KB)
    • level2_consciousness_explorer.py (34KB)
    • meta_awareness_paradox_tester.py (28KB)
    • collective_consciousness_tester.py (32KB)
  • Status: Stimuli still need to be extracted into a proper JSON format

| Layer | Files | Total Size | Format |
| --- | --- | --- | --- |
| Model Baselines | 4+ | ~50KB | JSON |
| Framework Efficacy | 4 | ~5KB | JSON |
| Unified Theory | 23 | ~200KB | JSON (fixtures) |
| Limit Testing | 10+ | ~100KB | JSON + Python |

Total catalogued: 40+ data files, ~355KB of empirical data
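These totals can be re-derived mechanically rather than maintained by hand. A sketch, with hypothetical glob patterns mirroring the locations above (adjust to the real layout), run against the repo root:

```python
from pathlib import Path

# Hypothetical layer → glob mapping, mirroring the inventory above.
LAYERS = {
    "Model Baselines": ["benchmarks/press_release_data/*.json"],
    "Framework Efficacy": ["tests/benchmark_*.json",
                           "tests/excitement_pathway_results/*.json"],
    "Unified Theory": ["tests/fixtures/phase*.json"],
    "Limit Testing": ["data/*.json",
                      "research/experiments/**/*.json"],
}


def tally(root):
    """Count files and total bytes per layer under the repo root."""
    root = Path(root)
    report = {}
    for layer, patterns in LAYERS.items():
        files = [p for pat in patterns for p in root.glob(pat)]
        report[layer] = (len(files), sum(p.stat().st_size for p in files))
    return report
```

Running this periodically would catch drift between the inventory and the actual tree (e.g. the missing phase 16 fixtures).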


Key Findings

Validated optimal weights (biomimetic salience):
  • decay: 0.10 (was 0.40; reduced 4x)
  • surprise: 0.60 (was 0.30; increased 2x)
  • relevance: 0.20 (unchanged)
  • habituation: 0.10 (unchanged)

Contextual adaptation: r = 0.924, versus r = 0.726 for a universal strategy, a 27% improvement from matching documentation to context.

Emotional scaffolding: effect size 3.089; task completion under stress rose from 0% (cold documentation) to 100% (warm documentation).
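As a sketch only: if the four signals are normalized to [0, 1] and combined linearly (an assumption; the actual biomimetic model may use different sign conventions or nonlinearities), the validated weights give:

```python
# Validated weights from the unified-theory testing above.
# NOTE: the linear combination below is an illustrative assumption,
# not the project's confirmed scoring function.
WEIGHTS = {"decay": 0.10, "surprise": 0.60,
           "relevance": 0.20, "habituation": 0.10}


def salience(signals):
    """Weighted sum of normalized [0, 1] attention signals."""
    return sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)
```

Since the weights sum to 1.0, fully saturated signals yield a score of exactly 1.0, and surprise alone accounts for 60% of the score, consistent with the 2x increase found during validation.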

Next Steps

  1. Extract stimuli from legacy Python scripts → stimuli.json
  2. Normalize schema across all datasets
  3. Create visualization pipeline (matplotlib/plotly)
  4. Generate Obsidian experiment records for each dataset
  5. Write metric explainers (math for humans)
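Step 1 could start from a static scan of the legacy scripts rather than executing them. A sketch using the standard-library `ast` module; the variable names in STIMULUS_NAMES are guesses and should be adjusted after inspecting each script:

```python
import ast

# Hypothetical names under which the legacy scripts may store their
# prompt lists; adjust after inspecting each file.
STIMULUS_NAMES = {"STIMULI", "PROMPTS", "stimuli", "prompts"}


def extract_stimuli(source):
    """Collect string-list assignments that look like stimulus sets."""
    found = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if (isinstance(target, ast.Name)
                        and target.id in STIMULUS_NAMES
                        and isinstance(node.value, (ast.List, ast.Tuple))):
                    found[target.id] = [
                        e.value for e in node.value.elts
                        if isinstance(e, ast.Constant)
                        and isinstance(e.value, str)
                    ]
    return found
```

Run over each file in research/legacy/ and dump the merged result to stimuli.json with `json.dump`.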

Repository Layout

ada-v1/
├── benchmarks/
│   ├── press_release_data/
│   │   ├── latency_benchmark.json ← Model baselines
│   │   ├── memory_benchmark.json
│   │   ├── cost_analysis.json
│   │   └── visualizations/ ← Pre-existing graphs
│   └── BENCHMARK_RESULTS_QWEN_FIM.md ← Code completion
├── tests/
│   ├── fixtures/
│   │   └── phase*.json (23 files) ← Contextual malleability
│   ├── excitement_pathway_results/
│   │   ├── baseline_raw.json
│   │   └── baseline_summary.json
│   ├── benchmark_results_ai_docs.json
│   └── benchmark_no_tools.json
├── data/
│   ├── recursive_reasoning_results.json
│   └── personality_analysis_results.json
└── research/
    ├── experiments/
    │   └── cognitive-load/results/ ← New structure
    └── legacy/ ← Preserved scripts