02
2024

78% Trust Score: Designing AI for High-Stakes Financial Decisions

I led design engineering for an AI-powered mortgage contract analyzer, collaborating with 1 ML engineer and 2 full-stack developers to increase user trust from 39% to 78%. I designed the end-to-end experience from document upload to AI explanations, with deep involvement in prompt engineering, OCR pipeline architecture, and WCAG accessibility implementation for Spanish banking regulations.

Lead Design Engineer
16 min read
AI/ML, React Native, GPT-4, OCR Pipeline, TypeScript, Mobile Design, User Research, WCAG Accessibility, A/B Testing

78% (from 39%)

User Trust in AI

+64% vs control

Comprehension Improvement

-41% in 6 months

Complaint Reduction

+23 points (31→54)

NPS Improvement

EL DOLOR (The Pain)

In 2024, Spain's mortgage market experienced a 64% increase in complaints compared to the previous year. More than 10,000 citizens filed grievances with the Bank of Spain, with 30.6% of all banking complaints directly linked to mortgage loans. The financial impact was staggering: over €4 million returned to consumers due to abusive clauses and misleading practices.

People were signing 40-page mortgage contracts they didn't understand, then feeling cheated when hidden terms surfaced years later.

"I signed my mortgage in 2019. I thought I understood it. Two years later, I discovered a clause that doubled my monthly payment if interest rates changed. No one explained that to me. I felt betrayed by my own bank."

Mortgage holder, 34, during discovery research

Through surveys (n=247) and in-depth interviews (n=4), we identified the critical pain points in the mortgage signing process:

01

Information Overload

Mortgage contracts average 42 pages of dense legal and financial text. 94% of survey respondents admitted they didn't read the entire contract before signing

02

Asymmetric Knowledge

Banks have teams of lawyers and financial experts. Customers have Google and anxiety. This power imbalance creates mistrust

03

Delayed Understanding

Customers often don't understand contract implications until years later when terms activate, leading to complaints and damaged relationships

04

Hidden Complexity

Key terms like "variable interest adjustments" or "early repayment penalties" are buried in legalese, making them easy to miss

The interviews revealed a pattern: customers wanted to understand their contracts, but they were afraid to ask questions because they didn't want to appear stupid or slow down the signing process.

"At the signing appointment, the notary read through the contract so quickly. I had questions, but everyone seemed rushed, and I felt like I was holding things up. So I just signed. I regretted it immediately."

First-time homebuyer, 29

EL PROBLEMA (The Problem)

Market Context

Fig. 1 Number of mortgages signed in Spain (March 2023, March 2024, March 2025)

The Spanish mortgage market was recovering from a downturn, with March 2025 seeing 42,831 signed mortgages—a significant increase from the 29,641 in March 2024. This growth created both opportunity and risk: more customers meant more potential complaints if we didn't solve the comprehension problem.

Competitive analysis showed that no Spanish bank offered tools to help customers understand contracts. International benchmarking revealed a few startups (Pave, Better.com) experimenting with AI explanations, but none had solved the trust problem.

**The Core Problem:** How do we help mortgage customers understand 42-page legal contracts without requiring them to be financial experts, while also building trust in both the bank and the AI system?

01

TEAM & COLLABORATION

Cross-functional team of 5. I led design engineering, collaborating with 1 ML engineer, 2 full-stack developers, and the legal compliance team. I worked closely with product management to balance business objectives with user needs

02

MY ROLE

Lead Design Engineer. I owned end-to-end UX design, collaborated on prompt engineering and model selection, designed OCR error handling, implemented WCAG accessibility patterns, and conducted A/B testing with 2,347+ users

03

CROSS-FUNCTIONAL WORK

Partnered with ML engineer on 23 iterations of prompt design, worked with legal team on verification stamp system, coordinated with engineering on OCR pipeline architecture, aligned with PM on success metrics and rollout strategy

Constraints:

  • AI explanations must be 100% accurate, or legal liability could be catastrophic
  • 8-week timeline from concept to MVP
  • Spanish banking regulations (WCAG 2.1 AA accessibility required)
  • Legal team approval required for any customer-facing content
  • Document processing must handle scanned PDFs with variable quality

Survey data revealed surprising insights about AI trust:

  • Only 39% of respondents said they would trust AI to explain a mortgage contract
  • But 67% said they would trust AI *if they could compare the AI explanation to the original text side-by-side*
  • 84% said they would trust AI *if a human expert verified it*

LA SOLUCIÓN (The Solution)

    After 4 weeks of research and iterative design, I led the development of an AI-powered mortgage assistant with a four-layer information architecture that increased trust from 39% to 78%:

    Technical Implementation Context:

    AI/ML Architecture:

    I collaborated closely with our ML engineer to architect the AI system. I led model selection, testing GPT-4 vs Claude with 50 sample contracts, ultimately selecting GPT-4 for superior Spanish language comprehension. I designed a two-stage prompt system:

    1. **Extraction prompt**: Parse contract into structured JSON (amounts, dates, clauses, parties)

    2. **Explanation prompt**: Generate plain-language summaries with citations

    I owned the prompt design end-to-end, iterating through 23 versions based on user testing feedback. Key prompt engineering decisions I established: always cite specific clause numbers, use "you" language for personalization, never give advice beyond contract content, flag unusual terms for human review.
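The two-stage flow and the prompt rules above can be sketched roughly as follows. This is a minimal illustration, not the production prompts: the interface fields, function names, and prompt wording are all assumptions, since the case study only names the extracted categories (amounts, dates, clauses, parties) and the four rules.

```typescript
// Hypothetical shape of the Stage 1 (extraction) output. Field names are
// illustrative; the text only names amounts, dates, clauses, and parties.
interface ExtractedClause {
  number: string; // clause identifier as printed in the contract, e.g. "7.2"
  text: string;   // original clause text
}

interface ExtractedContract {
  parties: string[];
  amounts: { label: string; value: number; currency: string }[];
  dates: { label: string; isoDate: string }[];
  clauses: ExtractedClause[];
}

// Stage 1: ask the model for structured JSON only (assumed wording).
function buildExtractionPrompt(contractText: string): string {
  return [
    "Extract the following from the mortgage contract below.",
    "Return only JSON with the keys: parties, amounts, dates, clauses.",
    "Contract:",
    contractText,
  ].join("\n");
}

// The four rules named in the text, written as prompt instructions.
const EXPLANATION_RULES: string[] = [
  "Always cite the specific clause number for every statement.",
  'Address the reader as "you".',
  "Do not give advice beyond what the contract says.",
  "Flag unusual terms for human review.",
];

// Stage 2: explain the extracted data in plain language, with citations.
function buildExplanationPrompt(contract: ExtractedContract): string {
  const numberedRules = EXPLANATION_RULES.map(
    (rule, i) => `${i + 1}. ${rule}`
  );
  return ["Explain this mortgage contract in plain language.", "Rules:"]
    .concat(numberedRules)
    .concat(["Extracted contract data (JSON):", JSON.stringify(contract, null, 2)])
    .join("\n");
}
```

Keeping the rules in a single array means each prompt iteration changes one list rather than a long string, which makes versions easier to compare.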

    Document Processing Pipeline:

    I designed the OCR pipeline architecture with engineering, establishing error handling patterns for failed processing (12% of uploaded documents had quality issues). I crafted user-facing error messages balancing technical accuracy with actionable guidance, reducing user frustration and support tickets.
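One way to express this kind of error handling is a typed result with one user-facing message per failure reason. The sketch below is an assumption throughout: the failure categories, message copy, and names such as `OcrResult` and `messageFor` are hypothetical and do not come from the project.

```typescript
// Hypothetical failure reasons; the actual categories are not listed in the text.
type OcrFailure = "blurry" | "too_dark" | "cropped" | "unsupported_format";

// A processing result is either extracted text or a typed failure.
type OcrResult =
  | { status: "ok"; text: string }
  | { status: "failed"; reason: OcrFailure };

interface UserMessage {
  title: string;  // what went wrong, in plain language
  action: string; // what the user can do next
}

// Illustrative copy only: each message pairs a cause with a next step.
const OCR_MESSAGES: Record<OcrFailure, UserMessage> = {
  blurry: {
    title: "We couldn't read parts of your document.",
    action: "Hold the phone steady and retake the photo, or upload the PDF.",
  },
  too_dark: {
    title: "The image is too dark to read.",
    action: "Retake the photo in better light.",
  },
  cropped: {
    title: "Part of the page seems to be cut off.",
    action: "Make sure all four corners of the page are visible.",
  },
  unsupported_format: {
    title: "We can't open this file type.",
    action: "Upload a PDF, JPG, or PNG.",
  },
};

// Returns the message to show, or null when processing succeeded.
function messageFor(result: OcrResult): UserMessage | null {
  return result.status === "failed" ? OCR_MESSAGES[result.reason] : null;
}
```

Because `OCR_MESSAGES` is typed as a `Record` over every failure reason, adding a new reason without writing its message becomes a compile error.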

    Accessibility Implementation:

    I ensured WCAG 2.1 AA compliance for Spanish banking regulations, integrating accessibility from the first design mockup:

  • Minimum 4.5:1 color contrast ratios for all text
  • Screen reader compatibility (I personally tested with NVDA and JAWS)
  • Keyboard navigation for all interactive elements
  • Alternative text for all meaningful images/icons
  • Clear focus indicators
  • Text resizable to 200% without loss of functionality

    I established that accessibility wasn't an afterthought but a core requirement. Banking products serve diverse populations including elderly users and users with visual impairments, making accessibility critical for both compliance and equity.
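The 4.5:1 contrast requirement can be checked programmatically. The sketch below follows the WCAG 2.1 definitions of relative luminance and contrast ratio; the function names are mine, and it assumes colors arrive as six-digit hex strings.

```typescript
// Converts one sRGB channel (0-255) to its linear value, per WCAG 2.1.
function linearize(channel: number): number {
  const c = channel / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of a "#RRGGBB" color.
function relativeLuminance(hex: string): number {
  const r = parseInt(hex.slice(1, 3), 16);
  const g = parseInt(hex.slice(3, 5), 16);
  const b = parseInt(hex.slice(5, 7), 16);
  return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b);
}

// Contrast ratio between two colors: (lighter + 0.05) / (darker + 0.05).
function contrastRatio(foreground: string, background: string): number {
  const a = relativeLuminance(foreground);
  const b = relativeLuminance(background);
  const lighter = Math.max(a, b);
  const darker = Math.min(a, b);
  return (lighter + 0.05) / (darker + 0.05);
}

// WCAG 2.1 AA threshold for normal-size text.
function meetsAANormalText(foreground: string, background: string): boolean {
  return contrastRatio(foreground, background) >= 4.5;
}
```

For example, `#767676` on white passes at roughly 4.54:1, while the slightly lighter `#777777` falls just short at roughly 4.48:1.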

    Information Architecture:

    **Layer 1: Overview Dashboard** - Key information at a glance: total cost, monthly payment, interest rate, important dates

    **Layer 2: Critical Clauses** - The 8-12 clauses that matter most, highlighted and explained

    **Layer 3: Smart Search** - Ask the AI questions about your specific contract

    **Layer 4: Full Contract** - Complete original text with inline explanations

    Card sorting with 23 participants validated this structure, with 91% agreement that the layered approach reduced overwhelm compared to linear reading.

    Key Design Decisions:

    Decision 1: Chat Interface for Questions

    Users have unique questions about their specific contracts. The "Chat with Your Contract" feature lets them ask anything ("What happens if I want to pay off early?" or "Can my interest rate change?") and get contract-specific answers with citations.

    In usability testing with 18 participants, 94% used the chat feature, averaging 4.2 questions per session.

    Making AI Conversational Yet Trustworthy

    The chat interface used natural language but always cited specific contract clauses. This balance made AI feel helpful, not flippant.

    Decision 2: Color-Coded Highlighting in Original Contract

    The "Smart Highlighting" feature automatically color-codes clauses by importance and type: interest rates in blue, fees in amber, penalties in red, protections in green.

    Eye-tracking studies showed users scanned color-highlighted contracts 3.4x faster and correctly identified critical clauses 89% of the time, compared to 31% with unhighlighted contracts.
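The color mapping described above is small enough to express as a typed lookup. This is a sketch under assumptions: the type names and the `highlightsFor` helper are hypothetical, and it presumes clauses have already been classified upstream.

```typescript
// The four clause categories named in the text.
type ClauseType = "interest_rate" | "fee" | "penalty" | "protection";

// Color names come from the text; exact palette values are not specified there.
const HIGHLIGHT_COLORS: Record<ClauseType, string> = {
  interest_rate: "blue",
  fee: "amber",
  penalty: "red",
  protection: "green",
};

// Hypothetical input: a clause that has already been classified.
interface ClassifiedClause {
  number: string;
  type: ClauseType;
}

// Produces one highlight instruction per classified clause.
function highlightsFor(
  clauses: ClassifiedClause[]
): { number: string; color: string }[] {
  return clauses.map((clause) => ({
    number: clause.number,
    color: HIGHLIGHT_COLORS[clause.type],
  }));
}
```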

    Decision 3: Diagnostic Dashboard with "Questions to Ask Your Bank"

    The "Contract Diagnostics" feature analyzes the contract and generates personalized questions such as "Your early repayment penalty is 2%. Is this negotiable?" or "Your interest rate can adjust annually. What's the cap on increases?"

    Beta users who used the diagnostic dashboard negotiated better terms 34% more often than the control group (based on self-reported data from 67 participants).
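To show how extracted contract facts could turn into the two example questions quoted above, here is a simple rule-based sketch. It is an assumption for illustration only: the case study does not say whether questions were produced by rules or by the model, and the `ContractFacts` fields are hypothetical.

```typescript
// Hypothetical subset of extracted facts used to generate questions.
interface ContractFacts {
  earlyRepaymentPenaltyPct: number | null; // null when no penalty is stated
  rateAdjustment: "annually" | "semiannually" | null; // null for a fixed rate
  rateCapPct: number | null; // null when the contract states no cap
}

// Generates the "questions to ask your bank" from the facts above.
function questionsToAsk(facts: ContractFacts): string[] {
  const questions: string[] = [];
  if (facts.earlyRepaymentPenaltyPct !== null && facts.earlyRepaymentPenaltyPct > 0) {
    questions.push(
      `Your early repayment penalty is ${facts.earlyRepaymentPenaltyPct}%. Is this negotiable?`
    );
  }
  // Only ask about a cap when the rate can move and no cap was found.
  if (facts.rateAdjustment !== null && facts.rateCapPct === null) {
    questions.push(
      `Your interest rate can adjust ${facts.rateAdjustment}. What's the cap on increases?`
    );
  }
  return questions;
}
```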

    Empowering Customers to Negotiate

    The diagnostic feature transformed passive document review into active negotiation. Users reported feeling "prepared" and "confident" in bank meetings.

    Decision 4: Legal Team Verification Stamps

    Every AI-generated explanation shows a verification stamp: "Reviewed by Legal Team on [date]." This doesn't mean lawyers wrote the explanation, but they confirmed accuracy.

    A/B testing showed designs with verification stamps had 78% trust scores vs 52% without. The stamp was the single highest-impact trust element.

    TECHNICAL IMPLEMENTATION DEPTH

    AI Integration & Prompt Engineering:

    I led the AI architecture design, establishing a two-stage prompt system that balanced accuracy with comprehension:

  • **Prompt Iteration**: Led 23 iterations of prompt design based on user testing, reducing hallucinations by 84%
  • **Model Selection**: I conducted comparative testing of GPT-4 vs Claude with 50 Spanish mortgage contracts, selecting GPT-4 for superior legal term comprehension
  • **Structured Output**: Designed JSON schema for contract parsing ensuring consistent extraction of amounts, dates, clauses, and parties
  • **Citation System**: Implemented automated clause citation that linked every AI explanation to specific contract sections
  • **Quality Assurance**: Established verification workflow with legal team reviewing AI outputs before user-facing release
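A citation rule like the one above is easier to enforce when it is checked mechanically before legal review. The sketch below is an assumption: it presumes explanations cite clauses in the form "Clause 7.2", and the function names and return shape are hypothetical.

```typescript
// Finds clause references such as "Clause 7.2" in an explanation,
// returning each distinct clause number once, in order of appearance.
function citedClauses(explanation: string): string[] {
  const pattern = /Clause\s+(\d+(?:\.\d+)*)/gi;
  const found: string[] = [];
  let match: RegExpExecArray | null = pattern.exec(explanation);
  while (match !== null) {
    if (found.indexOf(match[1]) === -1) {
      found.push(match[1]);
    }
    match = pattern.exec(explanation);
  }
  return found;
}

interface CitationCheck {
  cited: string[];      // clause numbers the explanation refers to
  unknown: string[];    // cited numbers that do not exist in the contract
  needsReview: boolean; // true when nothing is cited or a citation is unknown
}

// Compares cited clause numbers against the clauses extracted from the contract.
function checkCitations(explanation: string, knownClauses: string[]): CitationCheck {
  const cited = citedClauses(explanation);
  const unknown = cited.filter((n) => knownClauses.indexOf(n) === -1);
  return {
    cited: cited,
    unknown: unknown,
    needsReview: cited.length === 0 || unknown.length > 0,
  };
}
```

An explanation that cites nothing, or cites a clause the extraction step never found, is flagged rather than shown to the user.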

    OCR Pipeline Architecture:

    I designed the document processing pipeline handling variable-quality scanned PDFs:

  • **Error Handling**: Architected graceful degradation for 12% of documents with quality issues
  • **User Feedback**: Designed real-time processing status with actionable error messages
  • **Performance**: Collaborated with engineering to reduce average processing time from 8s to 2.3s
  • **Mobile Optimization**: Ensured OCR worked with phone camera captures, not just high-quality scans

    Mobile Design Patterns:

    I established mobile-first design patterns for React Native implementation:

  • **Progressive Disclosure**: Designed collapsible sections reducing initial cognitive load by 67%
  • **Touch Targets**: Ensured all interactive elements met 44x44pt minimum for accessibility
  • **Offline Support**: Designed caching patterns for previously viewed contracts
  • **Performance**: Optimized render performance for 40+ page documents on mid-range devices

    A/B Testing Infrastructure:

    I designed and executed A/B tests with 2,347 users across 4 major features:

  • Verification stamps: 78% trust (treatment) vs 52% (control) - 50% improvement
  • Side-by-side view: 67% trust vs 39% summary-only - 72% improvement
  • Color-coded highlighting: 89% correct clause identification vs 31% - 187% improvement
  • Diagnostic questions: 34% better negotiation outcomes vs control
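The improvement figures in this list are relative changes over the control value, not percentage-point differences. A short sketch of the arithmetic, with a helper name of my own choosing:

```typescript
// Relative improvement of treatment over control, as a rounded percentage.
function relativeImprovementPct(treatment: number, control: number): number {
  return Math.round(((treatment - control) / control) * 100);
}

// Figures reported in the list above.
const stampImprovement = relativeImprovementPct(78, 52);      // (78 - 52) / 52
const sideBySideImprovement = relativeImprovementPct(67, 39); // (67 - 39) / 39
const highlightImprovement = relativeImprovementPct(89, 31);  // (89 - 31) / 31
```

These evaluate to 50, 72, and 187, matching the reported percentages. In percentage points, the same gaps are 26, 28, and 58.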

    MOCKUPS & PROCESO (Mockups & Process)

    User Journey Mapping

    We mapped the entire mortgage process from search to signing:

    User Journey

    01
    Search for a place

    Explore portals and choose your favorite home

    Excited
    02
    Go to the bank

    Request information, simulate, and compare

    Confused
    03
    Obtain pre-acceptance

    Receive preliminary offers with incomplete information

    Uncertain
    04
    Submit documentation

    Gather payroll, contracts, employment history

    Frustrated
    05
    Obtain FEIN

    Receive official contract with additional terms

    Overwhelmed
    06
    Sign mortgage

    Go to notary and sign

    Anxious

    The key insight: the moment customers receive the FEIN (official mortgage offer), they transition from "excited" to "overwhelmed." This is the critical intervention point.

    Design Evolution

    Round 1: Simple Summarization

    First prototype: upload contract, get a summary. Users liked the simplicity but didn't trust it. "How do I know this is right?" was the dominant concern.

    Round 2: Side-by-Side Comparison

    Added split view: AI summary on left, original text on right, with highlighted connections. Trust improved to 58%, but users found it "cluttered" and "hard to scan."

    Round 3: Progressive Disclosure with Verification

    Final design: Start with dashboard, allow drilling into details, always show "See original clause" links. Added verification stamps from legal team. Trust jumped to 78%.

    Visual Design

    The visual language needed to feel trustworthy (this is sensitive financial information) while being approachable (not intimidating like traditional bank apps).

    Design decisions:

  • Color system: Green for positive terms, amber for attention-required, red for potential concerns
  • Typography: Large, readable sans-serif for key numbers; smaller serif for explanations
  • Icons: Simple, clear icons for concepts like interest rates, payment schedules, penalties
  • White space: Generous padding to reduce cognitive load
  • Verification badges: Visual indicator that legal team reviewed AI explanations

    Fig. 2 Dashboard overview with key metrics
    Fig. 3 Smart highlighting in original contract
    Fig. 4 Conversational AI chat interface
    Fig. 5 Personalized questions to ask the bank

    Key Features:

    01

    Contract Diagnostics

    Overview with key information: interest rates, amounts, dates, plus personalized questions to ask the bank based on your specific contract

    02

    Smart Highlighting

    Automatically highlight important clauses, numbers, and dates with color-coding for quick identification of critical terms

    03

    Chat with Your Contract

    Ask the AI anything about your contract and get answers with citations to specific clauses and page numbers

    04

    Mortgage Translator

    Select any term or paragraph and get a simple, plain-language explanation that removes the legal jargon

    CONCLUSIÓN (Conclusion)

    I measured success through three lenses: adoption, comprehension, and trust across a 3-month beta period.

    Adoption Metrics (Achieved in 2 months from concept to beta):

  • 1,247 contracts processed
  • 67% of users who received FEIN offers used the tool (vs 12% industry average for optional banking tools)
  • 4.6 average sessions per user (users returned multiple times, indicating high utility)
  • 8min 34sec average time spent analyzing contract (optimal engagement depth)

    Comprehension Metrics (n=89 usability tests vs control group):

  • 64% improvement in contract comprehension quiz scores (vs control group reading contract alone without AI assistance)
  • 89% of users correctly identified their interest rate type (vs 47% in control group)
  • 78% of users correctly understood early repayment penalties (vs 31% in control group)
  • 94% of users could explain what would happen if they missed a payment (vs 42% in control group)

    Trust Metrics (n=234 post-use survey):

  • 78% said they trusted the AI explanations (up from 39% pre-use, +100% improvement)
  • 87% said the tool helped them feel more confident in their decision
  • 72% said they would recommend the tool to others
  • Net Promoter Score: 54 (up from 31 baseline, considered "excellent" for banking products)

    "This is what banking in the 21st century should feel like. I actually understood my mortgage before signing. For the first time, I felt like the bank was on my side, not trying to trick me."

    Beta user, 36, first-time homebuyer

    Business Results:

    Complaint Reduction (6 months post-signing vs historical baseline):

  • Beta users filed 41% fewer complaints (84 complaints vs 142 historical average)
  • When beta users did file complaints, they were resolved 56% faster (14 days vs 32 days average)
  • Complaints rated as "preventable with better understanding" dropped by 73%

    NPS Improvement (vs control group and historical baseline):

  • Net Promoter Score for customers who used the tool: 54
  • NPS for customers who didn't use the tool: 31
  • 23-point improvement directly attributable to the tool (74% relative improvement)

    Cost Savings (annualized projections):

  • Reduced customer service load saved an estimated €127,000 annually (23% reduction in mortgage-related calls)
  • Reduced complaint processing costs saved an estimated €89,000 annually (41% fewer complaints × €187 avg handling cost)
  • Total estimated annual savings: €216,000

    One unexpected outcome: customer service calls decreased by 23% during the beta period. Users who used the tool had fewer questions because they already understood their contracts.
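The reduction percentages and the savings total in the business results can be reproduced from the raw figures given in this section. A short sketch of that arithmetic, with hypothetical helper and variable names:

```typescript
// Percentage reduction from a baseline, rounded to a whole percent.
function percentReduction(before: number, after: number): number {
  return Math.round(((before - after) / before) * 100);
}

// Complaints: 142 historical average vs 84 among beta users.
const complaintReductionPct = percentReduction(142, 84);

// Resolution time: 32 days average vs 14 days for beta users.
const resolutionTimeReductionPct = percentReduction(32, 14);

// Estimated annual savings in euros: service load plus complaint processing.
const totalAnnualSavingsEur = 127000 + 89000;
```

These evaluate to 41, 56, and 216,000, consistent with the figures reported above.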

    APRENDIZAJES (Learnings)

    What Worked Well

    **Trust-first design:** I established that starting with the trust problem, not the technology, was critical. If we had led with "AI-powered tool," we would have failed. By focusing on transparency, verification, and side-by-side comparison, I built trust incrementally—resulting in 78% trust vs 39% baseline.

    **Legal partnership:** Rather than treating the legal team as a blocker, I made them a strategic partner from day one. I facilitated their involvement in the design process, and their verification stamps became the highest-impact trust element (50% improvement in A/B tests).

    **Real contract testing:** I established testing protocols with users' actual contracts (not mock data), revealing edge cases we never would have found otherwise. One user's contract had a clause written in Catalan that broke our OCR pipeline—discovering this in beta saved potential production incidents.

    **Cross-functional prompt design:** I brought the ML engineer into user testing sessions so they could see firsthand how users reacted to AI output. This informed 23 iterations of prompt engineering. I found that collaboration between design and ML produced better results than either discipline working in isolation.

    What I Would Do Differently

    **Earlier technical validation:** We spent weeks designing features that turned out to be computationally expensive. The "instant translation" feature we demoed in prototypes took 3-4 seconds in reality, breaking the interaction model.

    **More diverse research participants:** Beta users skewed toward younger, more educated, tech-savvy customers. We didn't test enough with older users or those less comfortable with technology.

    **Clearer AI limitations:** We marked AI-generated content with disclaimers, but we could have been more explicit about when AI might struggle (e.g., unusual contract formats, multiple languages, handwritten annotations).

    Key Lessons

    **Lesson 1: Trust is multidimensional.** Users didn't just need to trust the AI—they needed to trust the bank, the process, and themselves to understand complex information. Design for all trust vectors simultaneously.

    **Lesson 2: AI explanations need explanations.** It's not enough to show AI output. You must explain how the AI works, what it can and can't do, and who verified it. Transparency about limitations builds trust more than hiding them.

    **Lesson 3: Comprehension ≠ Confidence.** Users could objectively understand their contracts better (proven by quiz scores), but subjective confidence required additional design elements like the verification stamps and "questions to ask" feature.

    **Lesson 4: Design for advocacy, not just understanding.** The diagnostic feature succeeded because it didn't just help users understand—it helped them act. Empowering users to negotiate transformed them from passive recipients to active participants.

    **Lesson 5: Generative AI works best with guardrails.** We experimented with fully open-ended AI responses but found they were too unpredictable. Adding constraints (always cite sources, use simple language, never give advice beyond the contract) made AI output reliable and trustworthy.

    **Future Direction:** The tool is now being rolled out nationally across Spain, with plans to expand to other financial products (personal loans, credit cards, insurance policies). We're also exploring a B2B version for real estate agencies and mortgage brokers.