[TITLE]

TODO: TLDR summary
TODO: Author list and title
TODO: Add Raymond to people page
TODO: Equations for each of the scores

1. Background

An ontology is a collection of primitives/concepts that model a certain domain of knowledge (Gruber et al.). Concepts in ontologies are hierarchically connected in parent-child relationships. A parent (also called a hypernym) can be thought of as encompassing a child (a hyponym). For example, "mammal" is a hypernym of "dog".

1.1 Linear Representation Hypothesis

Park and coauthors consider how LLMs might represent concepts. Since the introduction of word2vec (Mikolov et al., 2013), the field of natural language processing has known that neural networks trained with self-supervision can represent words or concepts as vectors.

Since then, the neural network architectures used for word representation have changed, from single-layer architectures like word2vec to deep architectures based on the Transformer (Vaswani et al., 2017). The typical Transformer LM consists of an embedding matrix, dozens of Transformer blocks, and an output unembedding matrix. Park et al. are concerned with connecting model behavior (counterfactual word pairs) to concept representations (vectors). The unembedding matrix is the part of the network that translates from vector representations to model outputs, which makes the space spanned by the unembedding matrix a natural place to look.

Park et al. define the causal inner product: a transformation of an LLM's unembedding matrix that preserves language semantics while encoding causally separable concepts as orthogonal vectors. For Park and coauthors, a concept is defined by a set of counterfactual word pairs, like {(“man”, “woman”), (“king”, “queen”)}. If a language model produces the word “man” as its next output, manipulating the binary gender concept should result in the model producing the output “woman” instead. Two concepts are causally separable if they can be “freely manipulated”. For example, the concepts color (red->blue) and vehicle type (car->truck) are causally separable, since it makes sense to think about a red car, red truck, blue car, or blue truck. But concepts like animal (dog->cat) and dog breed (beagle->poodle) are not causally separable, since a “poodle cat” is oxymoronic.

Ontological sets expand individual concepts into collections of synonyms of the concept term. Each set is encoded as a vector by retrieving the unembeddings of the concept's synonyms.
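As a rough illustration of that encoding step, the sketch below pulls the unembedding matrix from a Pythia checkpoint and builds a concept vector from a synset's words. The whitening transform is only a stand-in for the causal inner product, and taking the mean of the synonyms' unembeddings is a simplification of Park et al.'s actual estimator; the model choice and function names are our own.

```python
# A sketch (not Park et al.'s exact procedure): pull the unembedding matrix from a
# Pythia checkpoint, whiten it as a stand-in for the causal inner product, and encode
# a concept as the mean transformed unembedding of its synonyms.
import numpy as np
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/pythia-160m"  # illustrative choice of model size
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Unembedding matrix: one row per vocabulary item, mapping hidden states to logits.
gamma = model.get_output_embeddings().weight.detach().cpu().numpy()  # (vocab, d)

# Whitening transform (our simplification of the causal inner product).
mu = gamma.mean(axis=0)
cov = np.cov(gamma - mu, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
whiten = eigvecs @ np.diag((eigvals + 1e-8) ** -0.5) @ eigvecs.T
g = (gamma - mu) @ whiten  # transformed unembeddings, one row per token

def concept_vector(synonyms):
    """Mean transformed unembedding over the first token of each synonym."""
    rows = []
    for word in synonyms:
        ids = tokenizer(" " + word, add_special_tokens=False)["input_ids"]
        rows.append(g[ids[0]])
    return np.mean(rows, axis=0)

v_dog = concept_vector(["dog", "puppy", "hound"])
v_mammal = concept_vector(["mammal"])
```

With vectors like `v_dog` and `v_mammal` in hand, the predictions below can be checked directly with cosine similarities.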

Their work confirms the linear representation hypothesis - “that high-level concepts are linearly encoded in the representation spaces of LLMs” (Park 2024). This may already be expected, as word representations with linear structure date back to the 2013 Word2Vec paper, whose best-known example is the analogy vec(“king”) − vec(“man”) + vec(“woman”) ≈ vec(“queen”).

Moreover, they show that the vectors pointing from child to parent and from parent to grandparent nodes in an ontology are orthogonal to each other.

Park et al.'s theory gives some predictions about what we would expect the model unembedding matrix to look like under the causal inner product. Words belonging to the same concept should have similar vectors, as measured by cosine similarity. Words that are ontologically related should have high cosine similarity with each other. And child-parent and parent-grandparent difference vectors should be orthogonal.

1.2 Linear Representations in Practice

Park et al. show these relationships with heatmaps, included here as Figure 1. The heatmaps are ordered by concept hierarchy, so "entity" is expected to be the upper-left-most entry. The first heatmap is an adjacency matrix describing the child-parent relationships between concepts of an ontology, where one concept is adjacent to another if it descends from it at any depth. The second heatmap is a cosine similarity matrix between the linear representations of these concepts. As seen in Figure 1, the adjacency matrix is echoed in the second heatmap, suggesting that the ontology is represented in the LLM space. The third heatmap shows cosine similarities between child-parent difference vectors (a concept's linear representation minus its parent's). The branches visible in the adjacency matrix have near-zero values in the third heatmap, demonstrating the orthogonality between child-parent and parent-grandparent directions.

Figure 1 from Park et al.
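A sketch of how the three matrices behind Figure 1 could be assembled. It assumes a dictionary of concept vectors (for example from the `concept_vector` sketch above) and a child-to-parent map; the input names and the exact adjacency convention are our own.

```python
# Sketch: build the adjacency, cosine-similarity, and child-parent-difference
# similarity matrices behind Figure 1. `vecs` maps concept name -> vector and
# `parent` maps concept name -> parent name; both are hypothetical inputs.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def ancestors(c, parent):
    """All concepts above c in the hierarchy."""
    out = set()
    while c in parent:
        c = parent[c]
        out.add(c)
    return out

def figure1_matrices(names, vecs, parent):
    n = len(names)
    adj = np.zeros((n, n))       # 1 if names[j] is an ancestor of names[i]
    sim = np.zeros((n, n))       # cosine similarity of concept vectors
    diff_sim = np.zeros((n, n))  # cosine similarity of child-minus-parent vectors
    diffs = {c: vecs[c] - vecs[parent[c]] for c in names if c in parent}
    for i, a in enumerate(names):
        for j, b in enumerate(names):
            adj[i, j] = 1.0 if b in ancestors(a, parent) else 0.0
            sim[i, j] = cosine(vecs[a], vecs[b])
            if a in diffs and b in diffs:
                diff_sim[i, j] = cosine(diffs[a], diffs[b])
    return adj, sim, diff_sim
```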

Park et al. also compare the norms of the linear representations of ontological concepts against the norms of random words' representations. As seen in Figure 2, there is a significant gap between the two groups, indicating a high-level representation of ontological features in the LLM space.

Figure 2 from Park et al.
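A minimal sketch of that comparison, reusing the (hypothetical) `vecs` dictionary and transformed unembedding matrix `g` from the earlier sketches:

```python
# Sketch: compare norms of concept vectors against norms of random word unembeddings.
import numpy as np

concept_norms = np.array([np.linalg.norm(v) for v in vecs.values()])
rng = np.random.default_rng(0)
random_rows = rng.choice(g.shape[0], size=len(concept_norms), replace=False)
random_norms = np.linalg.norm(g[random_rows], axis=1)

print("mean concept-vector norm:", concept_norms.mean())
print("mean random-word norm:", random_norms.mean())
```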

2. Research Question

In this investigation, we explore whether LLMs trained for longer have better ontological representations.

2.1 Method

2.1.1 Scores/Evaluation

Using the formalization from Park et al. as a basis, we introduce three scores for evaluating a model's ontological maturity:

- Linear representation score: the average norm of the concept representation vectors (the quantity compared in Figure 2).
- Causal separability score: the Frobenius norm of the difference between the cosine similarity matrix and the adjacency matrix from Figure 1, ignoring the diagonal.
- Hierarchy score: the Frobenius norm of the matrix of cosine similarities between child-parent representation differences, again ignoring the diagonal.

Written out in our notation, with \( \bar{\ell}_c \) the representation vector of concept \( c \) in the concept set \( C \), \( S \) the cosine similarity matrix, \( A \) the adjacency matrix, \( D \) the matrix of cosine similarities between child-parent differences, \( I \) the identity matrix, \( \mathbf{1} \) the all-ones matrix, and \( \odot \) the elementwise product:

\[ \text{LinRep} = \frac{1}{|C|} \sum_{c \in C} \lVert \bar{\ell}_c \rVert \]

\[ \text{CausalSep} = \lVert (S - A) \odot (\mathbf{1} - I) \rVert_F \]

\[ \text{Hier} = \lVert D \odot (\mathbf{1} - I) \rVert_F \]
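In code, given the matrices from the Figure 1 sketch above, the three scores reduce to a few NumPy calls (a sketch; the function names are ours):

```python
# Sketch: the three ontological-maturity scores, given concept vectors and the
# adjacency / similarity matrices from the earlier sketch.
import numpy as np

def off_diagonal(m):
    """Zero out the diagonal so it does not contribute to the Frobenius norm."""
    return m - np.diag(np.diag(m))

def linear_representation_score(vecs):
    """Average norm of the concept representation vectors (higher is better)."""
    return float(np.mean([np.linalg.norm(v) for v in vecs.values()]))

def causal_separability_score(sim, adj):
    """Frobenius norm of (cosine similarity - adjacency), diagonal ignored (lower is better)."""
    return float(np.linalg.norm(off_diagonal(sim - adj), ord="fro"))

def hierarchy_score(diff_sim):
    """Frobenius norm of child-parent difference similarities, diagonal ignored (lower is better)."""
    return float(np.linalg.norm(off_diagonal(diff_sim), ord="fro"))
```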

2.1.2 Model

For the LLM, Park et al. use Google's Gemma-2b model. We instead use EleutherAI's Pythia models in our investigation: a collection of models at various parameter sizes, each released with checkpoints at regular training steps. These include every thousandth step between step1000 and step143000, of which we use the odd thousands (step1000, step3000, …). This allows us to measure the three scores above as a function of training steps.
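Pythia checkpoints are published as revisions on the Hugging Face Hub, so sweeping over training steps only requires passing a `revision` string (a sketch; the model size is illustrative):

```python
# Sketch: iterate over Pythia training-step checkpoints via Hugging Face revisions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/pythia-160m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
steps = range(1000, 143001, 2000)  # the odd thousands: 1000, 3000, ..., 143000

for step in steps:
    model = AutoModelForCausalLM.from_pretrained(model_name, revision=f"step{step}")
    # ...compute the three scores from this checkpoint's unembedding matrix...
```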

We use five model sizes: 70M, 160M, 1.4B, 2.8B, and 12B parameters. [add something about MMLU scores?] Of these, evaluations of the 160M, 2.8B, and 12B models can be found on the Open LLM Leaderboard. [link?]

2.1.3 Data

For the ontology, we use the WordNet database, as Park et al. did in their original exploration. This database contains thousands of synsets (groups of synonyms for a concept), related to one another in a hierarchical fashion.
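WordNet is conveniently available through NLTK; a minimal sketch of reading synsets and their hypernym (parent) links, which is all the hierarchy information the scores need:

```python
# Sketch: read WordNet synsets and hypernym (parent) relations via NLTK.
import nltk
nltk.download("wordnet", quiet=True)
from nltk.corpus import wordnet as wn

dog = wn.synset("dog.n.01")
print(dog.lemma_names())     # synonyms in the synset, e.g. ['dog', 'domestic_dog', ...]
print(dog.hypernyms())       # immediate parents, e.g. [Synset('canine.n.02'), ...]
print(dog.root_hypernyms())  # the noun hierarchy root, [Synset('entity.n.01')]
```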

2.2 Results

Figure 3: Causal separation, hierarchy, and linear representation scores for Pythia 70M, 160M, 1.4B, 2.8B, and 12B models, for training steps from 1,000 to 143,000.
Observation 1: Ontological maturity increases with training steps

Each graph shows a clear improvement in its respective score: the linear representation score increases, while the causal separation and hierarchy scores decrease. The improvement is sharp at first and then plateaus, mirroring the common trend that models improve most early in training.

Observation 2: Larger models demonstrate improved scores (generally)

Scores also improve with parameter count: models with more parameters generally achieve better scores. The exception is the linear representation score, where we see a counter-intuitive pattern: progressively smaller models reach higher scores.

Observation 3: Larger models show more variability in hierarchy and causal separation scores

Looking at the causal separation and hierarchy scores, the 70M and 160M models produce relatively smooth curves throughout training, while the curves become noisier for the larger models; the noise appears to grow with model size.

We quantified this noise as the total squared error between the points and a best-fit logarithmic function, as sketched below.
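A sketch of that noise measure, fitting \( a + b\log(\text{step}) \) with SciPy and summing the squared residuals (the function names are ours):

```python
# Sketch: quantify noise in a score curve as the total squared error around a
# best-fit logarithmic trend a + b * log(step).
import numpy as np
from scipy.optimize import curve_fit

def log_trend(step, a, b):
    return a + b * np.log(step)

def noise_level(steps, scores):
    steps, scores = np.asarray(steps, float), np.asarray(scores, float)
    params, _ = curve_fit(log_trend, steps, scores)
    residuals = scores - log_trend(steps, *params)
    return float(np.sum(residuals ** 2))
```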

This is not the case for the linear representation score, where all curves are smooth and generally monotonic.

A higher linear representation score means better high-level representation of ontological concepts, as the distinction between these concepts and random words grows.

The decrease in the causal separability score indicates that the adjacency matrix is reflected more closely in the cosine similarity matrix, implying an improved representation of ontological categories in the Pythia models' representation space.

The decrease in the hierarchy score indicates more pronounced orthogonality between child-parent and parent-grandparent difference vectors, as the cosine similarities between these pairs get closer and closer to 0.

Applications

[Content for Applications section]

Future Work

We are continuing this work with different ontologies found on the OBO Foundry. We are also computing term frequencies over The Pile, the dataset used to train the Pythia models; from these we can evaluate the relationship between individual ontology terms' scores and their respective term frequencies.