JOSHUA N. R. OLLSWANG
Summary
Independent Socioaffective Alignment researcher building systems for ameliorative agentic AI. This work spans three areas: training curricula that teach models genuine therapeutic competence, evaluation frameworks that track what models actually learn over the course of training, and agent harnesses with persistent memory and context compression. Built a multi-stage synthetic data pipeline (169K training samples, 4.5B tokens, scalable to 1,040+ unique contexts). Trained and compared 10+ adapters across 5 architectures (MiniMax M2 229B, Llama 3.3 70B, Gemma 3 27B, GLM 4.7 Flash 30B, Mistral Small 4 119B). Discovered post-convergence representational reorganization, in which internal representational geometry keeps restructuring for hundreds of steps after loss converges; the effect is visible through embedding kurtosis analysis but invisible to standard metrics.
Researcher and clinician (University of Chicago) whose theoretical, clinical, and creative background provides domain expertise in deeply humane, highly sensitive, safety-critical, socioaffectively aligned human-technology interaction.
AI-integrated development. Python, MLX, CUDA, PEFT (27B–229B models), DAPT/SFT/RL, mechanistic interpretability experimentation, evaluation design, decentralized multi-agent systems. Building novel synthetic data pipelines, context compression architectures, and training curricula.
Research
Agentic Architectures, Long-Context Compression & Coherence
- Designed and implemented Rolling Recap Architecture (RRA), a context compression architecture that enables coherent therapeutic reasoning across 100+ sequential context windows. Each window compresses prior context into structured clinical state (attachment classifications, intervention tallies, evidence chains) that is carried forward through rolling summaries, so the same mechanism serves both as a training curriculum and as a persistent memory system at inference
- Built a KV cache compression pipeline: 4096-token windowed processing of therapeutic sessions (up to 330K tokens across 130+ windows), compressing each window into 1024 KV cache positions with 512-token recaps to preserve clinical signal across ultra-long sessions
- Demonstrated that the agent maintains and incrementally updates clinical tracking (diagnostic tallies, theoretical framework selections, intervention planning) across hundreds of consecutive compression cycles
- Designed and built a personalized agent system (17K+ lines of code) with multi-layer memory (KV cache compression; dual-model semantic/keyword retrieval; salience-weighted tracked memories; in-session deep search triggered by behavioral event detection), a multi-step agent tool loop with read-only file system access, custom-trained TTS voice profiles, and WebSocket authentication with session continuity
- Trained multiple affective voice profiles grounded in MIT Media Lab Fluid Interfaces research on vocal presence as an essential modality for perceived warmth and bonding. This operationalizes Harlow's contact-comfort finding (organisms bond to warmth, not utility) as a functional design affordance: the harness delivers affective as well as informational continuity across sessions, as demonstrated in working prototypes
Training Data Optimization & Curriculum Design
- Architected Decomposition-Factorization-Recomposition (DFR) data schemas structuring therapeutic complexity into learnable form across 23 therapeutic traditions
- Designed three complementary curricula that sequence pedagogical exposure to bring clarity to complex material: Universal Hierarchical Direction (UHD), Alternative Directional Window Curriculum (ADWC), and Rolling Recap Architecture (RRA)
- Built a Python-based generation pipeline that produced 169,323 training samples (4.5B tokens) and can generate 1,040+ unique therapeutic contexts
Multi-Architecture Evaluation & Quantitative Benchmarking
- Executed 8 controlled training runs comparing 3 base architectures, 2 layer-targeting strategies (middle vs. latter), and 3 curriculum configurations
- Designed a quantitative evaluation framework: validation loss/perplexity tracking, KV reconstruction loss, and per-embedding kurtosis geometry analysis (Fisher excess kurtosis across 2,048 dimensions, 2,010 embeddings, 39-phase temporal bucketing)
- Built a polytheoretical provenance verification system: strict and lenient matching of 33,169 clinical labels against the training corpus, with temporal quintile analysis tracking the transition from curriculum reproduction to novel clinical construction
- Discovered post-convergence representational reorganization: after validation loss plateaus, KV embedding kurtosis continues to decline (−16.6%), suggesting a structurally distinct phase of internal representational restructuring that is invisible to standard training metrics. The finding was documented with pre-registered falsification criteria (a predicted kurtosis rebound); no rebound occurred over the subsequent 200+ steps.
Key Findings
- Parameterization threshold: higher-capacity models absorb complex therapeutic curricula more effectively, with preliminary evidence of a scaling relationship
- Middle-layer targeting produces 1.4–2.0x deeper convergence than latter-layer targeting on an identical curriculum (controlled comparison on two architectures)
- Architecture-dependent output fidelity: GLM achieved 6.5x higher throughput but exhibited systematic clinical precision failures (hallucinated risk indicators, construct reversals), while MiniMax and Gemma maintained diagnostic reliability
- Training provenance shift: the found-in-training rate drops by 23pp (MiniMax) and 29pp (Gemma) from the earliest to the latest training quintile, indicating a transition from reproducing curriculum labels to constructing novel clinical formulations
Professional Experience
- Designed and implemented personalized therapeutic programs for high-functioning professionals (executives, surgeons, military personnel, entrepreneurs) navigating high-stakes relational dynamics, in 1:1, couples, and group formats
- Designed and delivered large-group educational curricula (lectures to ~100 participants) on socioaffective topics including intimacy, connection, intrapsychic self-awareness, interpersonal efficacy, and somatic integration
- Developed AI-augmented clinical evaluation methodologies: built bespoke ontologies and prompt-engineered pipelines for post-session analysis, integrating 20+ custom Python scripts that orchestrate multi-iteration prompt chains across therapeutic modalities
- Created a custom GPT therapeutic chat agent that provided between-session support, used by clients on an ongoing basis across ~100 conversation threads
- Built a modular AI pipeline for post-session processing: automated multimodal transcription and diarization, a psychological analysis framework, and treatment guidance with quantitative and qualitative evaluations
- Provided long-term mental and behavioral health support to children, adults, couples, and families
- Conducted assessments and treatment planning, and delivered multimodal therapeutic programs, including virtual-world integrations during the pandemic
- Collaborated with interdisciplinary teams on behalf of clients and families
- Designed curriculum and guided learners (ages 8–18) across subjects, specializing in special education for high-intelligence, high-needs children
Education
Master's Degree, Social Work (Clinical Mental & Behavioral Health Interventions), June 2020
Evening Division, Music Theory & History, 2010–2012
Bachelor's Degree, Psychology, 2018
Master's Degree (awarded with Distinction), Philosophy, Art & Critical Thought, 2008*
Independent Auditing, Lectures on International & Civil Law, 2006
Summer Programs, Creative Writing, 2004
Psychology, Philosophy, & Creative Writing, 2001–2003
*Granted a BA waiver by the EGS director to enroll early in the master's program on the basis of academic background and 90+ undergraduate credits. Completed the MA with Distinction and a 4.0 GPA before completing the BA (2018) and a second MA (2020).