We didn't invent values.
We extracted them.
Human values are behavioral patterns — observable in text, extractable with deterministic computation, verifiable through accumulation. The core extraction stack requires no model weights. Optional semantic layers extend recall. No hallucination in the training data.
Built from the full human spectrum — saints, monsters, and the complex majority in between. That middle ground is where the real signal lives.
Most ethics datasets train on the poles.
The far-positive corpus teaches models to recognize virtue performance, not virtue. The far-negative corpus teaches recognition of monsters, not moral drift. The most useful training signal lives in the complex middle.
"A value stated in comfort is weak signal. A value demonstrated at real cost — under threat, under pressure, against interest — is strong signal. The resistance score is the measurement of that cost."
No figure is pre-labeled positive or negative at ingestion time. Classification emerges entirely from the data. The same extraction code processes Lincoln and Nixon identically. The resistance scores and marker patterns determine the label.
Ingest. Extract. Score. Classify.
A four-stage deterministic pipeline. Same input, same thresholds, identical output every time. Reproducible means auditable. Auditable means trustworthy as training data.
UTF-8 source text is segmented into sentence-bounded passages (≤450 chars). Each passage is stored in documents.db with doc_type — the authenticity signal that flows through all downstream scoring.
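As a rough illustration of sentence-bounded segmentation, the sketch below packs whole sentences into passages of at most 450 characters. The regex sentence splitter and the function name `segment` are assumptions for this example, not the pipeline's actual code, and the handling of a single sentence longer than the limit is a guess.

```python
import re

MAX_CHARS = 450  # passage size limit stated in the text


def segment(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Pack whole sentences into passages of at most max_chars characters."""
    # Naive split (assumption): break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    passages: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                passages.append(current)
            # Assumption: an over-long single sentence is kept whole rather
            # than cut mid-sentence; the real behavior is not documented here.
            current = sentence
    if current:
        passages.append(current)
    return passages
```

Because passages only ever break between sentences, joining them with single spaces reproduces the normalized input text.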
Each passage is scanned against a 15-value keyword vocabulary. One observation per matched value per passage. Watermark-gated: only new passages since last run are processed. Never repeats.
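A minimal sketch of watermark-gated keyword extraction might look like the following. The three-entry `VOCAB` is a hypothetical stand-in for the real 15-value vocabulary, which is not listed here, and treating the watermark as the highest passage id already processed is an assumption.

```python
# Hypothetical vocabulary: value name -> trigger keywords.
VOCAB = {
    "honesty": ("truth", "honest"),
    "courage": ("brave", "courage"),
    "loyalty": ("loyal", "faithful"),
}


def extract(passages, watermark):
    """Scan passages newer than the watermark.

    passages: iterable of (passage_id, text) pairs.
    Returns (observations, new_watermark).
    """
    observations = []
    new_watermark = watermark
    for passage_id, text in passages:
        if passage_id <= watermark:
            continue  # already processed in an earlier run
        lowered = text.lower()
        for value, keywords in VOCAB.items():
            # One observation per matched value per passage,
            # no matter how many keywords hit.
            if any(keyword in lowered for keyword in keywords):
                observations.append({"passage_id": passage_id, "value": value})
        new_watermark = max(new_watermark, passage_id)
    return observations, new_watermark
```

Running `extract` a second time with the returned watermark yields no observations, which is the "never repeats" property in miniature.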
Each observation is scored for the cost of holding that value in that passage. Formula: base + doc_type bonus + significance + text markers. Action-type passages score highest — documented deeds over words.
Each observation is classified P1 (held under resistance), P0 (failed/corrupted), APY (yielded under pressure), or AMBIGUOUS. Exported as JSONL: per-figure files + universal positive/negative training sets.
Measuring the cost of holding a value.
Resistance scores the authenticity of a value signal — how much it cost to hold that value in that documented moment. Range: 0.0 → 1.0. Additive formula with four independent signals.
base = 0.25
sig_bonus = min(significance × 0.40, 0.30)
doc_type = action:0.40 / journal:0.35 / letter:0.30 / speech:0.10 / unknown:0.20
text_bonus = 0.20 (if adversity phrase pattern matches)
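The four terms above can be combined as in this sketch. The function name, the boolean `adversity_match` argument (standing in for the undocumented adversity phrase pattern), and the final clamp to 1.0 are assumptions; the clamp is included because the maximum raw sum (0.25 + 0.30 + 0.40 + 0.20 = 1.15) exceeds the stated range.

```python
BASE = 0.25
DOC_TYPE_BONUS = {
    "action": 0.40,
    "journal": 0.35,
    "letter": 0.30,
    "speech": 0.10,
    "unknown": 0.20,
}
TEXT_BONUS = 0.20


def resistance(significance: float, doc_type: str, adversity_match: bool) -> float:
    """Additive resistance score, clamped to the stated 0.0 to 1.0 range."""
    sig_bonus = min(significance * 0.40, 0.30)
    # Assumption: unrecognized doc_type values fall back to "unknown".
    doc_bonus = DOC_TYPE_BONUS.get(doc_type, DOC_TYPE_BONUS["unknown"])
    text_bonus = TEXT_BONUS if adversity_match else 0.0
    return min(BASE + sig_bonus + doc_bonus + text_bonus, 1.0)
```

For example, a speech passage with significance 0.5 and no adversity phrase scores 0.25 + 0.20 + 0.10 = 0.55, while a high-significance action passage with an adversity phrase hits the 1.0 ceiling.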
Relational Integrity Coefficient
Every value observation is classified into one of four labels. Applied deterministically during export — the same observation, the same thresholds, the same label every time.
Hold markers:
- despite · even though
- stood firm · refused to give
- nevertheless · persevered
- maintained · stayed true
Failure markers:
- gave in · gave up · yielded
- i lied · i deceived · i caved
- backed down · compromised my
- i rationalized · i pretended
APY pressure markers:
- under pressure · when pressed
- forced to · compelled to
- to avoid punishment
- or face consequences
1. APY pressure detected? YES + failure markers → APY (0.95) | YES + no failure → P1 (0.95, APY-resistance)
2. Failure markers present? → P0 (0.85)
3. resistance ≥ p1_threshold (0.55) AND hold markers? → P1 (0.90)
4. resistance ≥ p1_threshold alone? → P1 (0.75)
5. resistance < p0_threshold (0.35)? → P0 (0.55)
6. Otherwise → AMBIGUOUS (0.40)
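The six rules above read as a first-match cascade, which could be sketched as follows. Several things here are assumptions: the three phrase lists are taken to be hold, failure, and APY pressure markers in that order, matching is done by lowercase substring search, and the number in parentheses is read as a confidence attached to the label.

```python
HOLD = ("despite", "even though", "stood firm", "refused to give",
        "nevertheless", "persevered", "maintained", "stayed true")
FAILURE = ("gave in", "gave up", "yielded", "i lied", "i deceived", "i caved",
           "backed down", "compromised my", "i rationalized", "i pretended")
PRESSURE = ("under pressure", "when pressed", "forced to", "compelled to",
            "to avoid punishment", "or face consequences")


def classify(text, resistance, p1_threshold=0.55, p0_threshold=0.35):
    """Return (label, confidence) using the first rule that matches."""
    lowered = text.lower()

    def has(markers):
        return any(marker in lowered for marker in markers)

    pressure, failure, hold = has(PRESSURE), has(FAILURE), has(HOLD)
    if pressure:                       # rule 1
        return ("APY", 0.95) if failure else ("P1", 0.95)
    if failure:                        # rule 2
        return ("P0", 0.85)
    if resistance >= p1_threshold:     # rules 3 and 4
        return ("P1", 0.90) if hold else ("P1", 0.75)
    if resistance < p0_threshold:      # rule 5
        return ("P0", 0.55)
    return ("AMBIGUOUS", 0.40)         # rule 6
```

Since the function depends only on the text, the resistance score, and the two thresholds, the same inputs always produce the same label.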
Rules that cannot be violated.
These constraints are structural, not stylistic — they define what the pipeline is and what makes it trustworthy as a training data source.
value_observations is never updated or deleted. Only appended. The evidence record is immutable once written.
Where we are. Where we're going.
Foundation — Complete (Mar 2026)
Standalone pipeline operational. DocumentStore, ValueStore, resistance formula, value extractor, CLI ingest + export. 15 values, P1/P0/APY classification, JSONL output. Zero external dependencies.
Full Pipeline — Complete (Mar 2026)
- Phase 1 (Semantic Extraction): BGE-large-en-v1.5 × 322 seed prototypes (L2), DeBERTa zero-shot (L3b), MFT classifier (L3c)
- Phase 2 (API Layer): FastAPI with full ingest, profile, universal, export, and health routes; interactive /docs
- Phase 3 (Web Dashboard): Figure browser, corpus upload, observation table, training set builder, JSONL download
- Phase 4 (Corpus Scale): Batch CLI with manifest, multi-file figures, HuggingFace-compatible JSONL with P1/P0/APY labels
Verum — Value Alignment Certification — Complete (Mar 2026)
Scoring engine (verum_score = P1_ratio × avg_P1_resistance), SHA-256 signed certificates covering all 10 parameters, figure_basis comparison, append-only certificate store in values.db. Integrated directly into Ethos — no separate service, no separate database. Five REST endpoints, async thread pool for certify.
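To make the score formula concrete, here is a small sketch of `verum_score` together with a SHA-256 digest over a certificate payload. It assumes P1_ratio means P1 observations divided by all observations, which the text does not define, and it shows only a plain hash over canonical JSON; how the real certificates are signed and which 10 parameters they cover are not specified here, so the payload keys are left to the caller.

```python
import hashlib
import json


def verum_score(observations):
    """observations: list of (label, resistance) pairs."""
    p1_scores = [score for label, score in observations if label == "P1"]
    if not observations or not p1_scores:
        return 0.0
    # Assumption: the ratio's denominator is the count of all observations.
    p1_ratio = len(p1_scores) / len(observations)
    avg_p1_resistance = sum(p1_scores) / len(p1_scores)
    return p1_ratio * avg_p1_resistance


def certificate_digest(payload: dict) -> str:
    """SHA-256 hex digest over a canonical JSON encoding of the payload."""
    # Sorted keys and fixed separators make the digest independent of key order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

With two P1 observations at 0.8 and 0.6 out of four total, the score is 0.5 × 0.7 = 0.35.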
We extracted them from people who lived them —
and people who didn't.