JOSHUA N. R. OLLSWANG
Summary
Independent alignment researcher building personalized AI systems that learn over time, adapt to individuals, and support human flourishing. Work spans RLHF and post-training pipelines (DAPT/SFT/RL across 5 architectures at up to 229B parameters), evaluation frameworks that track what models actually learn beyond surface metrics, and agent harnesses with persistent memory, user modeling, and long-horizon context compression. Built a multi-stage synthetic data pipeline (169K training samples, 4.5B tokens, scalable to 1040+ unique contexts) encoding human preference structures from 23 therapeutic traditions, and discovered post-convergence representational reorganization: internal representational geometry continues restructuring hundreds of steps after loss convergence, visible through embedding kurtosis analysis but invisible to standard metrics.
Licensed clinician and researcher (University of Chicago) with deep domain expertise in human preference modeling, behavioral adaptation, and the relational dynamics that determine whether personalized AI systems build genuine trust or erode it.
Skills
Python, MLX, CUDA, PEFT/LoRA (27B–229B), RLHF/reward modeling, DAPT/SFT/RL pipelines, mechanistic interpretability, evaluation framework design, preference-learning systems, multi-agent architectures, long-context memory systems.
Research
RLHF, Post-Training & Evaluation Frameworks
- Designed and executed multi-stage post-training pipeline (DAPT → SFT → RL) across 10+ adapter runs on 5 architectures (MiniMax M2 229B, Llama 3.3 70B, Gemma 3 27B, GLM 4.7 Flash 30B, Mistral Small 4 119B), with controlled comparisons of layer-targeting strategies, curriculum configurations, and quantization precision
- Built quantitative evaluation framework capturing what models actually learn: validation loss/perplexity tracking, KV reconstruction loss, per-embedding kurtosis geometry analysis (Fisher excess kurtosis across 2048 dimensions, 2,010 embeddings, 39-phase temporal bucketing) — revealing training dynamics invisible to standard metrics
- Built preference-learning verification system: strict and lenient matching of 33,169 clinical labels against training corpus, with temporal quintile analysis tracking the transition from reproducing preference patterns to constructing novel preference-aligned formulations
- Discovered post-convergence representational reorganization: after validation loss plateaus, embedding kurtosis continues declining (−16.6%), revealing a structurally distinct phase of internal representational restructuring — a long-horizon training dynamic invisible to standard metrics. Documented the finding with pre-registered falsification criteria, which were subsequently disconfirmed over 200+ additional steps
- Designed RL reward modeling through Teaching by Negation: reinforcement learning from contrastive human preference signals, training models to distinguish aligned from misaligned behavioral patterns across nuanced relational contexts
Multimodal Post-Training & SME RLHF (In Progress)
- Architecting multimodal post-training pipeline grounded in 450 longitudinal psychotherapy sessions (580+ hours, 2.47 TB) — extracting aligned audio, video, face, and text embeddings to train models that learn from richer interaction signals beyond text alone
- Designed two-phase training: (1) self-supervised cross-modal contrastive learning (InfoNCE across 4 modalities + temporal forecasting via CPC) to build shared representational space, (2) supervised fine-tuning on gold-annotated segments with continuous Valence/Arousal/Dominance/Salience predictions per modality
- Built face recognition and affect tracking pipeline: RetinaFace detection → SAM2 temporal tracking → ArcFace identity embeddings (512d) + HSEmotion affect embeddings (256d), with cross-session identity resolution and adversarial gradient reversal (λ=0.1) to prevent affect representations from collapsing into identity
- Designed SME (Subject Matter Expert) RLHF: bus-conditioned LLM receives real-time multimodal affect events and generates candidate policy distributions over actions, scored by therapeutic reward model — clinician SMEs provide preference feedback on action alignment, appropriateness, and predicted outcomes
- Built audio emotional salience pipeline: WavLM-Large (1024d) + emotion2vec (768d) + 7-dimensional prosody features, capturing tone, rhythm, breath, tremor, and paralinguistic signals that carry preference and emotional-state information invisible to text-only systems
Training Data & Preference-Learning Pipelines
- Architected Decomposition-Factorization-Recomposition (DFR) data schemas encoding human preference structures into learnable form across 23 behavioral and relational traditions — each tradition representing a distinct lens on human needs, values, and well-being
- Designed three complementary curricula — Universal Hierarchical Direction (UHD), Alternative Directional Window Curriculum (ADWC), and Rolling Recap Architecture (RRA) — sequencing preference-learning exposure to build progressively richer user models
- Built Python-based generation pipeline producing 169,323 training samples (4.5B tokens) with 1040+ unique context capacity, encoding rubrics and evaluation frameworks that capture long-term human values
Personalization, Long-Term Memory & User Modeling
- Designed and implemented Rolling Recap Architecture (RRA): a context-compression scheme enabling coherent personalized reasoning across 100+ sequential context windows, where each window compresses prior context into a structured user state carried forward through rolling summaries — serving as both training curriculum and persistent memory system at inference
- Built KV cache compression pipeline: 4096-token windowed processing of long-horizon interactions (up to 330K tokens across 130+ windows), compressing each window into 1024 KV cache positions with 512-token recaps, preserving preference signal across ultra-long sessions
- Designed and built a personalized agent system (17K+ lines) with multi-layer memory (KV cache compression + dual-model semantic/keyword retrieval + salience-weighted tracked memories + in-session deep search triggered by behavioral event detection), multi-step agent tool loop with read-only file system access, trained custom TTS voice profiles, WebSocket auth with session continuity
- Trained multiple affective voice profiles grounded in MIT Media Lab Fluid Interfaces research on vocal presence as an essential modality for perceived warmth and bonding — operationalizing Harlow's contact-comfort finding (attachment forms around warmth, not utility) as a functional design affordance for personalized AI
Key Findings
- Parameterization threshold: higher-capacity models absorb complex preference-alignment curricula more effectively, with preliminary evidence that this advantage grows with scale
- Middle-layer targeting produces 1.4–2.0x deeper loss convergence than late-layer targeting on an identical curriculum (controlled comparison across two architectures), suggesting alignment is best embedded in representational composition layers
- Architecture-dependent output fidelity: GLM achieved 6.5x faster throughput but exhibited systematic preference-alignment failures (hallucinated signals, construct reversals), while MiniMax and Gemma maintained reliability — demonstrating that speed and alignment quality can diverge
- Preference provenance shift: found-in-training rate drops 23pp (MiniMax) and 29pp (Gemma) from earliest to latest training quintile, indicating a transition from reproducing training labels to constructing novel preference-aligned formulations
Professional Experience
- Designed and implemented personalized intervention programs for high-functioning professionals (executives, surgeons, military, entrepreneurs) — modeling individual user preferences, adapting strategies based on explicit and implicit feedback signals, and tracking long-horizon outcomes across 1:1, couples, and group formats
- Designed and delivered large-group educational curricula (lectures for ~100 participants) on human relational dynamics, including attunement, trust-building, intrapsychic self-awareness, and interpersonal efficacy
- Developed AI-augmented evaluation methodologies: built bespoke preference ontologies and multi-iteration prompt-engineered pipelines for post-session analysis, integrating 20+ custom Python scripts orchestrating evaluation chains across behavioral modalities
- Created custom GPT agent providing personalized between-session support, used continuously by clients across ~100 conversation threads — an early prototype of adaptive AI that learns from ongoing interaction
- Built modular AI pipeline for post-session processing: automated multi-modal transcription/diarization, behavioral analysis framework, and personalized guidance with quantitative and qualitative evaluations
- Provided long-term behavioral health support to children, adults, couples, and families — modeling individual preferences and adapting intervention strategies based on ongoing feedback
- Conducted assessments, treatment planning, and multi-modal personalized programs including virtual world integrations during the pandemic
- Collaborated with interdisciplinary teams on behalf of clients and families
- Designed personalized curriculum and guided learners (ages 8–18) across subjects, specializing in adaptive instruction for high-intelligence, high-needs children
Education
Master's Degree, Social Work (Clinical Mental & Behavioral Health Interventions), June 2020
Evening Division, Music Theory & History, 2010–2012
Bachelor's Degree, Psychology, 2018
Master's Degree (awarded with Distinction), Philosophy, Art & Critical Thought, 2008*
Independent Auditing, Lectures on International & Civil Law, 2006
Summer Programs, Creative Writing, 2004
Psychology, Philosophy, & Creative Writing, 2001–2003
*Granted a BA waiver by the EGS director to enroll in the master's program early based on academic background and 90+ undergraduate credits. Completed the MA with Distinction and a 4.0 GPA prior to completing the BA (2018) and a second MA (2020).