SIF Quick Start Guide
- For the impatient: Get SIF working in 15 minutes
- For the curious: Understand what you're compressing and why
- For the builders: Start integrating SIF into your system
What Is SIF in 30 Seconds
SIF = Semantic Interchange Format
It's a way to:
- Compress knowledge 66-104x (not like ZIP: it preserves meaning)
- Transfer understanding between AI systems
- Store facts, entities, and relationships in a universal format
Example:
Original text (6,000 words): "Alice in Wonderland" → 38 KB

SIF v1.0 (2.5 KB):
- Entities: Alice, Queen, Rabbit, Wonderland
- Facts: Alice falls down rabbit hole, meets characters, confronts Queen
- Importance scores: All critical facts marked 0.85-0.95

Result: 104x smaller, meaning preserved ✓

Getting Started: Three Paths
Path 1: Just Read (5 minutes)
Goal: Understand what SIF does
Read: SIF-FROM-RESEARCH-TO-STANDARD.md
Time: 5 minutes
Path 2: See It Work (15 minutes)
Goal: Run working code, see compression in action
Setup:
```bash
# Copy the reference implementation
cp SIF-REFERENCE-IMPLEMENTATION.md ~/my-sif-project/

# Convert to working code (use first code blocks as reference)
# Extract compressor.py, decompressor.py, models.py, importance.py

# Install dependencies
pip install pydantic

# Run example
python -c "from sif.compressor import SIFCompressor
text = open('alice.txt').read()
compressor = SIFCompressor()
sif = compressor.compress(text, domain='literature', compression_tier=2)
print(f'Compressed from {len(text)} bytes to {len(sif.to_json())} bytes')
print(f'Ratio: {sif.validation.compression_ratio:.1f}x')"
```

Path 3: Build It In Your System (2-4 weeks)
Goal: Integrate SIF into your RAG, memory, or knowledge system
See: "Integration Guide" below
The 0.60 Number: Why It Matters
The importance score goes from 0.0 to 1.0:
| Score | Meaning | What to do |
|---|---|---|
| 0.90+ | Critical | Always include |
| 0.75-0.89 | Important | Include if space available |
| 0.60-0.74 | Threshold | This is the golden ratio |
| 0.40-0.59 | Contextual | Include for richness, drop if space-limited |
| <0.40 | Noise | Probably drop |
Why 0.60? Three independent discoveries converged:
- Biomimetic memory research: Optimal importance weight = 0.60
- Golden ratio: 1/φ ≈ 0.618 (nature's compression constant)
- Consciousness research: Information-to-consciousness transition at 60%
Practical: Keep facts ≥ 0.60 and you preserve meaning. Drop below 0.60 and you lose understanding.
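As a quick sketch, the table above can be turned into a tier lookup. The function name and tier labels below are illustrative only; SIF itself defines just the 0.60 threshold:

```python
def tier(score: float) -> str:
    # Map an importance score to the action tiers from the table above.
    # Labels are illustrative; only the 0.60 cutoff is part of SIF.
    if score >= 0.90:
        return "critical"    # always include
    if score >= 0.75:
        return "important"   # include if space available
    if score >= 0.60:
        return "threshold"   # the golden-ratio cutoff
    if score >= 0.40:
        return "contextual"  # include for richness, drop if space-limited
    return "noise"           # probably drop

# In practice: keep everything at or above 0.60
facts = [("Alice enters Wonderland", 0.95), ("The day was hot", 0.35)]
kept = [f for f, s in facts if s >= 0.60]
```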
Integration Guide
Step 1: Calculate Importance (1 day)
The core formula:
importance = 0.60 × surprise + 0.20 × relevance + 0.10 × decay + 0.10 × habituation

Implement each component:
Surprise (how unexpected?)
```python
def surprise(fact, context):
    # In production: call an LLM and measure prediction error
    # For MVP: use word overlap with the context
    fact_words = set(fact.lower().split())
    context_words = set(str(context).lower().split())
    unique = len(fact_words - context_words)
    # max(..., 1) guards against division by zero on empty facts
    return min(unique / max(len(fact_words), 1), 1.0)
```

Relevance (how relevant to query?)
```python
def relevance(fact, context_query):
    # In production: use embedding similarity
    # For MVP: word overlap with the query
    fact_words = set(fact.lower().split())
    query_words = set(context_query.lower().split())
    overlap = len(fact_words & query_words)
    return overlap / max(len(query_words), 1)
```

Decay (how fresh?)
```python
from datetime import datetime

def decay(fact, timestamp):
    # Exponential decay with a half-life of 1 day
    age_days = (datetime.now() - timestamp).days
    return 0.5 ** age_days  # halves every day
```

Habituation (penalty for repetition?)
```python
import math

def habituation(fact_id, mention_count):
    # More mentions = lower importance
    return 1.0 / (1.0 + math.log(mention_count + 1))
```

Combine:
```python
importance = (
    0.60 * surprise_score
    + 0.20 * relevance_score
    + 0.10 * decay_score
    + 0.10 * habituation_score
)
```

Step 2: Extract & Score (1-2 days)
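Step 2 calls two helpers the guide does not define: calculate_importance (combining the Step 1 components) and extract_facts. Here is a minimal MVP sketch of both; the signatures, the sentence-splitting heuristic, and the context keys 'query', 'timestamp', and 'mentions' are assumptions, not part of the spec:

```python
import math
import re
from datetime import datetime

def extract_facts(text):
    # MVP assumption: treat each sentence as one candidate fact.
    # Production would use an LLM or an NLP library instead.
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    return [{'id': f'fact_{i}', 'content': s, 'importance': 0.0}
            for i, s in enumerate(sentences, 1)
            if len(s.split()) >= 3]  # skip fragments

def calculate_importance(fact, context):
    # Weighted sum of the four Step 1 components (0.60/0.20/0.10/0.10)
    fact_words = set(fact.lower().split())
    context_words = set(str(context).lower().split())
    query_words = set(context.get('query', '').lower().split())

    surprise = min(len(fact_words - context_words) / max(len(fact_words), 1), 1.0)
    relevance = len(fact_words & query_words) / max(len(query_words), 1)

    ts = context.get('timestamp')  # assumed optional key
    if ts is None:
        decay = 1.0  # no timestamp: treat the fact as fresh
    else:
        age_days = max((datetime.now() - ts).total_seconds() / 86400, 0.0)
        decay = 0.5 ** age_days  # half-life of one day

    habituation = 1.0 / (1.0 + math.log(context.get('mentions', 0) + 1))

    return (0.60 * surprise + 0.20 * relevance
            + 0.10 * decay + 0.10 * habituation)
```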
```python
def compress_document(text: str, query: str = None):
    # Step 1: Extract facts from the text
    facts = extract_facts(text)  # use an LLM or NLP library

    # Step 2: Calculate importance for each fact
    context = {'query': query or text[:200]}
    for fact in facts:
        fact['importance'] = calculate_importance(fact['content'], context)

    # Step 3: Filter by threshold
    high_value_facts = [f for f in facts if f['importance'] >= 0.60]

    # Step 4: Store in SIF
    return {
        'facts': high_value_facts,
        'version': '1.0.0',
        'compression_ratio': len(text.encode()) / estimate_sif_size(facts)
    }
```

Step 3: Integrate with Your System (1-2 weeks)
Option A: RAG Enhancement
```python
# Instead of retrieving full documents:
# 1. Convert documents to SIF on ingestion
# 2. When a query comes in, decompress relevant SIFs
# 3. Inject high-importance facts into the context

relevant_sifs = search_sif_collection(query)
context_facts = []
for sif in relevant_sifs:
    context_facts.extend(
        [f.content for f in sif.facts if f.importance >= 0.60]
    )
prompt = build_prompt_with_facts(question, context_facts)
response = llm.generate(prompt)
```

Option B: Memory Enhancement
```python
# Store facts as memories with importance scores
for fact in sif.facts:
    if fact.importance >= 0.60:
        memory_store.add(
            content=fact.content,
            importance=fact.importance,
            confidence=fact.confidence,
            tags=fact.tags
        )
```

Option C: Knowledge Transfer
```python
import requests

# Send SIF to another system
sif_json = sif.to_json()

# Send to: another AI, a different service, a different LLM
response = requests.post(
    'http://other-system:8000/v1/ingest-sif',
    json={'sif': sif_json}
)
```

Complete Example: Alice in Wonderland
Before SIF
Text file (38 KB):

"Alice was beginning to get very tired of sitting by her sister on the bank and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, 'and what is the use of a book,' thought Alice, 'without pictures or conversation?'

So she was considering in her own mind (as well as she could, for the hot day made her feel very sleepy and stupid), whether the pleasure of making a daisy-chain would be worth the trouble of getting up and picking the daisies, when suddenly a White Rabbit with pink eyes ran close by her...."

After SIF (2.5 KB)
```json
{
  "entities": [
    {
      "id": "alice",
      "name": "Alice",
      "type": "person",
      "importance": 0.95,
      "description": "Young protagonist, curious and logical"
    },
    {
      "id": "wonderland",
      "name": "Wonderland",
      "type": "place",
      "importance": 0.90,
      "description": "Surreal underground world with nonsensical logic"
    },
    {
      "id": "white_rabbit",
      "name": "White Rabbit",
      "type": "person",
      "importance": 0.85,
      "description": "Hastily moving character who leads Alice into Wonderland"
    }
  ],
  "facts": [
    {
      "id": "fact_1",
      "content": "Alice falls down a rabbit hole and enters Wonderland",
      "type": "factual",
      "importance": 0.95,
      "confidence": 0.99
    },
    {
      "id": "fact_2",
      "content": "Alice meets the Queen who is temperamental and violent",
      "type": "factual",
      "importance": 0.90,
      "confidence": 0.98
    },
    {
      "id": "fact_3",
      "content": "Alice realizes Wonderland operates on dream logic, not rational rules",
      "type": "causal",
      "importance": 0.80,
      "confidence": 0.92
    }
  ]
}
```

Compression Ratio
Original: 38,000 bytes
SIF: 2,500 bytes
Ratio: 104x smaller ✓

Using It
```python
sif = SIFDocument.load_from_file('alice.sif.json')

# The LLM can now work with the compressed summary
prompt = f"""Based on these story elements: {sif.summary['text']}

Key facts:
{[f.content for f in sif.facts if f.importance >= 0.80]}

Question: What is the main conflict in the story?"""

response = llm.generate(prompt)
# Response: "Alice's main conflict is navigating Wonderland's illogical rules..."
```

Production Checklist
Before deploying SIF in production:
- Importance calculation working - Test on 10 documents
- Compression ratio acceptable - Target: 50-100x (adjust if needed)
- Decompression quality - Manual spot-check 5 SIFs for meaning preservation
- Safety validation - Run validator on all SIFs, check for hallucinations
- Integration tested - Works with your RAG/memory system
- Performance acceptable - Compression time < 1s per 1000 words
- Versioning in place - Can track SIF v1.x vs v2.0
- Monitoring configured - Track compression ratio, quality scores per domain
- Documentation updated - Team knows how to use SIF
- Backups configured - SIF files are as important as original data
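The performance item above ("compression time < 1s per 1000 words") can be checked with a small timing helper. This is a sketch: compress_fn stands in for your actual compressor, and the lambda below is only a placeholder:

```python
import time

def seconds_per_1000_words(compress_fn, text):
    # Time one compression run, normalized to 1,000 words of input
    words = max(len(text.split()), 1)
    start = time.perf_counter()
    compress_fn(text)
    return (time.perf_counter() - start) / (words / 1000)

# Placeholder compressor; swap in your SIF compressor in practice
sample = "word " * 2000
per_1k = seconds_per_1000_words(lambda t: t.lower(), sample)
```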
Common Questions
Section titled âCommon QuestionsâQ: Can I lose information with SIF?
A: Yes, intentionally. SIF preserves meaning but drops surface details. Example:
- Original: "The Queen, in her infinite malevolence, commanded the execution"
- SIF fact: "The Queen commands executions"
- Lost: The dramatic language
- Preserved: The semantic meaning (the Queen is violent)
Q: Is SIF lossy like JPEG?
A: More like semantic compression than JPEG. JPEG discards fine visual detail; SIF drops low-importance content strategically. You can't recover the original text, but you recover the meaning.
Q: What if the 0.60 threshold is wrong for my domain?
A: Test it! The threshold comes from empirical research (H2 + importance weighting), but different domains might need different values. Recommended: Try 0.60 first, then adjust based on your quality/compression tradeoff.
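One way to test this empirically is to sweep the threshold and watch the retention rate at each cutoff. The scores below are toy values; substitute your own scored facts:

```python
# Toy importance scores; substitute your own scored facts
scores = [0.95, 0.91, 0.82, 0.74, 0.63, 0.55, 0.41, 0.22]

def retention(scores, threshold):
    # Fraction of facts kept at a given cutoff
    kept = [s for s in scores if s >= threshold]
    return len(kept) / len(scores)

sweep = {t / 100: retention(scores, t / 100) for t in (40, 50, 60, 70, 80)}
# Inspect the quality/compression tradeoff at each cutoff, then pick the knee
```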
Q: Can I use SIF without importance calculation?
A: Yes, but you'll get worse compression. Importance calculation is what makes SIF 66-104x instead of 2-3x like normal compression.
Q: How do I add new entity types?
A: SIF v1.0 includes: person, place, thing, concept, event, organization. For custom types, you're doing SIF v2.0+ (future extension). For now, map your types to existing ones (e.g., 'protein' → 'thing').
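Such a mapping can be a simple lookup with a safe fallback. The specific mappings below are examples for illustration, not part of the spec:

```python
# The six v1.0 entity types; custom domain types map to the closest one
SIF_V1_TYPES = {'person', 'place', 'thing', 'concept', 'event', 'organization'}

CUSTOM_TYPE_MAP = {
    'protein': 'thing',        # illustrative domain mappings
    'company': 'organization',
    'meeting': 'event',
}

def to_sif_type(custom_type: str) -> str:
    t = custom_type.lower()
    if t in SIF_V1_TYPES:
        return t
    return CUSTOM_TYPE_MAP.get(t, 'thing')  # fall back to the broadest type
```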
Q: Is SIF tied to Python?
A: No! The spec is language-agnostic. We have a Python reference implementation, but JavaScript, Rust, Go, and Java implementations are welcome. See SIF-SPECIFICATION-v1.0.md for the language-independent spec.
What's Next
Section titled âWhatâs NextâFor Learning (This Week)
- Read: SIF-FROM-RESEARCH-TO-STANDARD.md (30 min)
- Read: SIF-SPECIFICATION-v1.0.md sections 1-4 (1 hour)
- Understand: Why 0.60 appears in three research domains
For Building (Next 2 Weeks)
- Extract importance calculation from SIF-REFERENCE-IMPLEMENTATION.md
- Implement on your data
- Measure compression ratio on 10 sample documents
- Adjust weights if compression is too low
For Shipping (4-6 Weeks)
- Integrate with your RAG/memory system
- Monitor quality metrics
- Get feedback from users
- Consider publishing your results
Contact & Community
Want to implement SIF?
- Start with SIF-SPECIFICATION-v1.0.md (formal spec)
- Use SIF-REFERENCE-IMPLEMENTATION.md as guide
- Test on your domain
Have results to share?
- This spec is CC0 (public domain)
- Share your compression ratios, quality metrics
- Contribute implementations in other languages
Feedback or questions?
- The standard is designed to evolve
- Version 1.x will include improvements based on feedback
- Version 2.0 will add new features
Further Reading
- SIF-SPECIFICATION-v1.0.md - Formal specification (all details)
- SIF-REFERENCE-IMPLEMENTATION.md - Working Python code
- SIF-FROM-RESEARCH-TO-STANDARD.md - Why this matters
- Ada-Consciousness-Research/EXPERIMENT-REGISTRY.md - Research foundation
Research Papers Coming Q1 2026 (in collaboration with QAL team, Poland)
Created: December 2025
License: CC0 (Public Domain)
Status: Ready to use, ready to extend, ready to improve
Start compressing!