Following the Vector Paths: How LLMs Navigate from Question to Answer
When you ask a language model "why is the sky blue?", what actually happens inside? The answer lies in tracing vector paths through
high-dimensional embedding space.
Understanding Vector Embeddings
Vector embeddings transform discrete tokens into continuous numerical representations. Each word becomes a point in a high-dimensional space, typically 768 to 4,096 dimensions.
Token to Vector Transformation

Each token maps to a unique high-dimensional vector that encodes aspects of its meaning.
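As a concrete sketch, the lookup can be modeled as indexing into a learned embedding table. This toy example uses 4 dimensions and made-up values purely for readability; real models use hundreds to thousands of dimensions and learn the table during training.

```python
import numpy as np

# Toy vocabulary and embedding table: each token id maps to a row vector.
# Values are random stand-ins, not taken from any actual model.
vocab = {"why": 0, "is": 1, "the": 2, "sky": 3, "blue": 4}
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), 4))

def embed(tokens):
    """Look up each token's vector in the embedding table."""
    return embedding_table[[vocab[t] for t in tokens]]

vectors = embed(["why", "is", "the", "sky", "blue"])
print(vectors.shape)  # (5, 4): one 4-dimensional vector per token
```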
Semantic Vector Relationships
In embedding space, semantic similarity corresponds to geometric proximity. The model learns that “sky” clusters near “atmosphere”, with a cosine similarity of roughly 0.85.

Vector Space Clustering
Semantic similarity shown as geometric proximity
“Sky” Vector Neighborhood
- atmosphere (~0.85 similarity)
- clouds (~0.82 similarity)
- air (~0.79 similarity)
- weather (~0.76 similarity)
“Blue” Vector Neighborhood
The vector for “blue” activates clusters connecting color perception to physical properties of light.
- color (~0.88 similarity)
- wavelength (~0.71 similarity)
- light (~0.68 similarity)
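The similarity scores above are cosine similarities. A minimal sketch of the computation, using hypothetical 3-dimensional vectors rather than real learned embeddings:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical vectors standing in for learned embeddings.
sky = np.array([0.9, 0.1, 0.3])
atmosphere = np.array([0.8, 0.2, 0.35])
engine = np.array([-0.5, 0.9, -0.2])

print(cosine_similarity(sky, atmosphere))  # high: related concepts
print(cosine_similarity(sky, engine))      # low: unrelated concepts
```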
The Attention Mechanism Journey
Transformer attention mechanisms route information through 32+ layers, where each layer refines understanding through query-key-value transformations.
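A single query-key-value transformation can be sketched as scaled dot-product attention. Dimensions here are toy values, and the learned weight matrices that produce Q, K, and V from token vectors are omitted for brevity:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d))
    return weights @ V, weights

rng = np.random.default_rng(1)
Q = rng.normal(size=(5, 8))  # one 8-dim query per token
K = rng.normal(size=(5, 8))
V = rng.normal(size=(5, 8))
out, w = attention(Q, K, V)
print(out.shape)        # (5, 8): a contextualized vector per token
print(w.sum(axis=-1))   # each row of attention weights sums to 1
```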

Layers 1-5: Syntax and Grammar
Initial layers detect structural patterns. The query vector identifies that “why” signals a causal explanation request.
Layers 6-15: Semantic Understanding
Middle layers activate domain knowledge. The combined [sky, blue] representation triggers optical phenomenon concepts, pulling in vectors for
light, atmosphere, and scattering.
Semantic Activation Pattern
Query: [sky, blue] → optical phenomenon
Key: Activates {light, color, atmosphere} clusters
Value: Retrieves scattering concepts
Result: Physics domain activation
Layers 16-25: Knowledge Retrieval
Deep layers access factual relationships stored in network weights. Rayleigh scattering emerges with associated wavelength mathematics.
Retrieved Physical Relationships
λ_blue = 450nm (wavelength of blue light)
scattering ∝ 1/λ⁴ (inverse fourth power law)
atmosphere ≈ N₂ + O₂ (primary molecular composition)
Rayleigh → Lord Rayleigh → elastic scattering
These relationships exist as learned weight patterns across billions of parameters.
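The inverse fourth-power law can be checked directly. Plugging in ~450 nm for blue (from the relationships above) and an assumed ~700 nm for red shows why blue dominates the scattered light:

```python
# Rayleigh scattering intensity scales as 1/lambda^4, so shorter
# wavelengths scatter far more strongly.
# Blue ~450 nm comes from the text; red ~700 nm is an assumed comparison value.
lam_blue = 450e-9  # meters
lam_red = 700e-9

ratio = (lam_red / lam_blue) ** 4
print(f"Blue light scatters about {ratio:.1f}x more than red")  # ~5.9x
```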
Layers 26-32: Reasoning and Assembly
Final layers construct causal chains. The model assembles the sequence:
sunlight → atmosphere → molecular scattering → wavelength selection → blue perception.
Multi-Head Attention Specialization
Transformer models employ multiple attention heads that specialize in different reasoning types. Each head learns distinct semantic relationships.
Specialized Processing Heads
- Head 1 – Subject Identification: Assigns high weight (0.89) to “sky”
- Head 2 – Property Attribution: Links subject to attribute (“sky” ↔ “blue”) with 0.92 weight
- Head 3 – Causal Reasoning: “why” activates explanation mode, pulls physics knowledge vectors
- Head 4 – Entity Relationships: Tracks sequential dependencies sun → light → atmosphere → eye
- Head 5 – Comparative Reasoning: Contrasts related concepts (blue vs red wavelengths, violet vs blue perception)
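Mechanically, head specialization begins with splitting the model width into per-head slices, each of which attends independently. A sketch of that split, with toy sizes not drawn from any real model:

```python
import numpy as np

def split_heads(x, num_heads):
    """Reshape (tokens, d_model) into (num_heads, tokens, d_head)."""
    tokens, d_model = x.shape
    d_head = d_model // num_heads
    return x.reshape(tokens, num_heads, d_head).transpose(1, 0, 2)

x = np.zeros((5, 32))  # 5 tokens, model width 32
heads = split_heads(x, num_heads=4)
print(heads.shape)  # (4, 5, 8): 4 heads, each seeing an 8-dim slice
```

Because each head attends over its own slice, the heads are free to learn different relationships, such as the subject-identification and causal-reasoning roles described above.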
Hidden State Evolution
As information flows through the network, each token’s vector representation evolves to incorporate contextual understanding.
Vector Evolution for “sky”
The representation shifts position in embedding space, moving closer to relevant physics concepts.
Geometric Operations in Vector Space
The model’s reasoning manifests as geometric operations. Semantic drift
describes how vectors move through space to approach related concepts.
Attention Weight Distribution
When processing “blue”, the attention mechanism distributes focus across semantically related tokens (illustrative weights):
"blue" attends to:
- "light" (0.87 weight)
- "wavelength" (0.79 weight)
- "scattering" (0.82 weight)
- "atmosphere" (0.76 weight)
Semantic Distance Reduction
Through layer processing, the “blue” vector transitions from color space toward physics space: its cosine similarity to “wavelength” rises sharply, for example from 0.42 to 0.81.
Output Generation: From Vectors to Text
Final layer representations project onto vocabulary space, producing probability distributions over 50,000+ possible next tokens.
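That projection can be sketched as a matrix multiply followed by a softmax. The model width here is a toy value; only the ~50,000-token vocabulary size comes from the text, and the weights are random stand-ins:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(2)
vocab_size, d_model = 50_000, 16          # toy width; illustrative vocab size
hidden = rng.normal(size=d_model)         # final-layer vector for last token
W_out = rng.normal(size=(d_model, vocab_size)) / np.sqrt(d_model)

probs = softmax(hidden @ W_out)           # distribution over next tokens
print(probs.shape)  # (50000,)
print(probs.sum())  # sums to 1
```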
Next Token Prediction
After "The sky is blue because"...
P("light") = 0.23 ← High probability
P("of") = 0.18 ← Grammatical connector
P("Rayleigh") = 0.15 ← Technical term
P("sunlight") = 0.12 ← Alternative phasing
P("the") = 0.09 ← Generic article
...
The model samples from this distribution, typically selecting high-probability tokens while maintaining coherence.
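Sampling is commonly controlled with a temperature parameter: values below 1 sharpen the distribution toward high-probability tokens, values above 1 flatten it. A sketch using the illustrative probabilities listed above (renormalized, since the tail of the distribution is elided):

```python
import numpy as np

# Candidate continuations and illustrative probabilities from the text.
tokens = ["light", "of", "Rayleigh", "sunlight", "the"]
probs = np.array([0.23, 0.18, 0.15, 0.12, 0.09])
probs = probs / probs.sum()  # renormalize: the "..." tail is omitted

def sample(tokens, probs, temperature=1.0, rng=None):
    """Temperature sampling: T < 1 sharpens, T > 1 flattens."""
    rng = rng if rng is not None else np.random.default_rng()
    logits = np.log(probs) / temperature
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return tokens[rng.choice(len(tokens), p=p)]

print(sample(tokens, probs, temperature=0.7, rng=np.random.default_rng(0)))
```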
The Complete Vector Journey
From “why is the sky blue?” to a complete explanation, the process traverses billions of learned parameters.
The Path Summary

From question to answer: the full processing pipeline
- Tokenization: Text splits into processable units
- Embedding: Tokens become high-dimensional vectors
- Attention: Vectors exchange information through learned patterns
- Transformation: Representations evolve through 32+ layers
- Knowledge Retrieval: Physics concepts activate from weight patterns
- Reasoning: Causal chains assemble in late layers
- Generation: Vectors project to vocabulary, producing text
Following the Path Forward
Understanding these vector paths reveals how language models think. Rather than retrieving pre-written answers, they navigate learned geometric relationships in high-dimensional space.
Each query initiates a unique journey through this learned landscape, where semantic proximity guides reasoning and geometric operations produce understanding.