The letters of the alphabet are written out on a chalkboard, but there are blanks in place of a few of the letters. A school girl is writing in the missing letters.

Image Source: Photo 135270998 © Bangkok Click Studio | Dreamstime.com

From image recognition to conversational text powered by massive datasets of recorded human knowledge, our social and business ecosystems are on the cusp of a major paradigm shift in knowledge retrieval and information interaction. Large language models (LLMs) are the poster child for applying artificial intelligence (AI) in the real world. What’s the practical next step for AI and LLMs for your organization? In this two-part series, we’ll discuss how both digital experience and technical organizations can plan for the use of LLMs to improve the returns on digital transformation and get digital right.

Spoiler Alert

For readers researching what makes language models tick in the enterprise, one subtle conclusion you’ll reach is that information architecture is essential.

Since language models find their conceptual home in the world of data science, language model conversations will always take place in the context of information systems, where the term “information architecture” takes on different meanings. The following is a brief summary:

Information Architecture for IT Systems
In this context, information architecture often relates to the infrastructure and efficient exchange of data within technical systems in support of critical business functions. While it has an enterprise-wide scope, information architecture for IT systems has historically focused on technology infrastructure and business data objects tied to core business initiatives and operations.

Information Architecture for Digital Environments
Information architecture in this context is mostly human-centered and concerned with “facilitating shared understanding and alignment with conceptual clarity.” This translates into modeling business- and human-centered concepts in a way that gives structure to and promotes the usability of digital user interfaces. The information models managed as part of human-centered information architecture provide essential blueprints for digital strategy, experience design, and technical implementation.

While information architecture for systems and information architecture for humans have different goals, they are complementary and interdependent.

For the remainder of this two-part post, I’ll be referring to human-centered information architecture.

AI & Large Language Models: The Power and Pain of Prediction

There are a few views on what is meant by AI, which I won’t get into here. However, it’s generally held that today’s AI (the form we all see in the market) is weak AI, since it mainly represents advances in computational automation, machine learning (ML), and artificial neural networks (ANN). These techniques rely on statistical models and algorithms that are of little use unless fed with massive amounts of training data. Large language models like GPT-3.5 and GPT-4 fall into this category.

Language models owe their recent success to advancements in natural language processing (NLP) and training methods that allow them to predict, with a high degree of certainty, coherent responses to questions or commands as long as the topic of discussion is well represented in the training data (text). If the LLM does not have enough training data on a topic to produce a coherent sentence that is a statistically likely response, the model defaults to a response that is merely plausible. The difference in quality is the difference between a response that maps to common or expert knowledge and a response that is fictional, inaccurate, or incoherent. Responses of the latter kind are called hallucinations.

It’s important to note that prediction is different from understanding. The data science approach to enabling predictions is mainly about discovering and referencing strong text patterns that emerge over many iterations and refining the predictive parameters of the model to align as closely as possible to the intended result, as defined by the LLM engineers. It’s a brute-force approach to generate the results you want.

Armed with reliable predictive capabilities, today’s LLMs can fill in the blanks in text and images with impressive results. They do so by measuring the occurrence and proximity of terms and sequences as they appear across the immense training data, and then using those measurements to predict the next likely word or phrase. For example:

If given the task to fill in the blank in “I am ____”, the chance of getting a response with a high degree of certainty is low, because there are potentially thousands of words that are about equally probable.

However, if the sentence or prompt is changed to, “I have not eaten all day. I am ____”, the statistically likely responses will surely relate to words or phrases regarding hunger.
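To make the two prompts concrete, here is a minimal sketch of a count-based next-word predictor. It is only an illustration: the tiny corpus, the function names, and the use of entropy as a rough measure of uncertainty are my own assumptions, and real LLMs learn neural network parameters rather than looking up counts.

```python
import math
from collections import Counter

# Hypothetical toy "training data": (prompt, next word) pairs.
# Invented purely for illustration.
CORPUS = [
    ("I am", "happy"), ("I am", "tired"), ("I am", "late"),
    ("I am", "hungry"), ("I am", "here"), ("I am", "ready"),
    ("I have not eaten all day. I am", "hungry"),
    ("I have not eaten all day. I am", "hungry"),
    ("I have not eaten all day. I am", "hungry"),
    ("I have not eaten all day. I am", "starving"),
]

def next_word_distribution(prompt):
    """Turn observed next-word counts for a prompt into probabilities."""
    counts = Counter(word for context, word in CORPUS if context == prompt)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

def entropy_bits(distribution):
    """Shannon entropy in bits: higher means the prediction is less certain."""
    return -sum(p * math.log2(p) for p in distribution.values())

short = next_word_distribution("I am")
long_ = next_word_distribution("I have not eaten all day. I am")

# Many words are equally likely after the short prompt...
print(short, round(entropy_bits(short), 2))
# ...while the longer prompt concentrates probability on hunger words.
print(long_, round(entropy_bits(long_), 2))
```

In this toy data, the short prompt spreads its probability evenly over six words, while the longer prompt puts three quarters of it on “hungry.” The extra context is what makes the prediction more certain.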

The word that a language model platform actually uses can be determined by many factors. The following are examples of common criteria:

the strongest pattern observed in the training data
a degree of randomness that allows for a broader range of possible responses
filters that remove toxic and other disallowed language
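The three criteria above can be sketched together in a few lines. This is a hedged illustration, not how any particular platform works: the function name choose_word, the scores, and the blocked-word list are all hypothetical, and the randomness is modeled with a temperature-scaled softmax, which is one common way to do it.

```python
import math
import random

def choose_word(scores, temperature=1.0, blocked=frozenset(), rng=None):
    """Pick the next word from candidate scores (higher = stronger pattern).

    scores:      hypothetical pattern strengths per candidate word
    temperature: 0 always picks the strongest pattern; larger values
                 allow a broader range of responses
    blocked:     disallowed words removed before choosing
    """
    rng = rng or random.Random()
    # Filter step: drop disallowed language first.
    allowed = {w: s for w, s in scores.items() if w not in blocked}
    if not allowed:
        raise ValueError("every candidate word was filtered out")
    # No randomness: return the strongest observed pattern.
    if temperature <= 0:
        return max(allowed, key=allowed.get)
    # Randomness step: softmax weights, scaled by temperature.
    words = list(allowed)
    top = max(allowed.values())  # subtract the max for numerical stability
    weights = [math.exp((allowed[w] - top) / temperature) for w in words]
    return rng.choices(words, weights=weights, k=1)[0]

# Made-up scores for completing "I have not eaten all day. I am ____"
scores = {"hungry": 5.0, "starving": 4.0, "famished": 3.0, "darn": 4.5}

print(choose_word(scores, temperature=0))                    # strongest pattern
print(choose_word(scores, temperature=1.0, blocked={"darn"}))  # sampled, filtered
```

With the temperature at zero, the strongest pattern always wins. Raising it lets weaker candidates through more often, and the filter guarantees that a blocked word is never returned regardless of its score.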

It is important to note that the information used to train the model is NOT retained. Only the final parameters (the model itself) that produce the desired level of predictability from a prompt remain. Hence, while language models can emulate human conversation, they cannot unpack or deduce the symbolic (representation) and semantic (meaning) relationships inherent in the text unless those relationships are explicitly part of the training data.

This is similar to teaching someone mathematics for the first time and uttering the phrase “one plus one equals two.” Once the student has committed the phrase to memory, I can then ask the student to complete the sentence, “one plus one equals ____” and she would reply, “two.” Her confidence in her answer is high since she has just a single reference and no other reason to consider another possible answer. If I ask the same student to complete the sentence “two plus one equals ____”, she would have to understand that “one” represents any thing, and that “two” represents the case where two things have been added together and that “plus” represents the concept of addition.

Without referencing the symbolic (“plus”) and semantic nuances in the phrase, the student will have to either A) concede to not knowing what word to insert, B) guess a word, or C) use information she has picked up, such as the fact that eight fellow students inserted “three” and two students inserted “two.” Since our student is confident in the frequent pattern expressed by her eight classmates, she inserts “three” as the next word.

But what if the students’ answers were evenly distributed, with every response different? If the student is not told that there is only one possible answer, she might logically assume that all answers are equally valid and then randomly choose a number. This is similar (in a grossly oversimplified way) to how language models work. They learn independently from the textual patterns expressed in the training data and in post-training.
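The classroom scenario can be written down as a small sketch. The function name pick_answer and the idea of reporting confidence as the share of classmates who gave the chosen answer are my own assumptions for illustration. The student here copies the most frequent answer and picks at random when several answers are tied.

```python
import random
from collections import Counter

def pick_answer(classmate_answers, rng=None):
    """Return (answer, confidence) based only on how often answers appear.

    The student does not understand addition; she only sees patterns.
    Confidence is the share of classmates who gave the chosen answer.
    Assumes at least one classmate answer is provided.
    """
    rng = rng or random.Random()
    counts = Counter(classmate_answers)
    best = max(counts.values())
    # All answers that share the highest count are equally valid to her.
    leaders = [answer for answer, n in counts.items() if n == best]
    return rng.choice(leaders), best / len(classmate_answers)

# Eight classmates wrote "three" and two wrote "two".
print(pick_answer(["three"] * 8 + ["two"] * 2))

# Every classmate wrote something different: the choice is a coin toss.
print(pick_answer(["one", "two", "three", "four"]))
```

In the first case the student answers “three” with a confidence of 0.8. In the second, her confidence drops to 0.25 and the answer she lands on is arbitrary, which is roughly the position a language model is in when its training data offers no strong pattern.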

Language models are so good at producing statistically probable responses that are both accurate and coherent that it’s easy to forget how fallible they can be when they respond with less-than-desired confidence due to limited conceptual information (including symbolic or semantic understanding) about a particular subject.

Language models are amazing prediction engines that deserve our attention and praise. However, as you proceed to explore how to leverage them in your work, make a special note of these three implementation challenges:

It’s currently impossible for language model creators to fully explain the final predictive model that is created from training. It’s the classic black-box challenge.
Predictive language models don’t know what they don’t know. Their ability to express knowledge across many subjects will be extensive but never comprehensive or error-free without diligent human intervention.
Language models are largely trained on information that is publicly available. The burden of training even the best language models on the knowledge and conceptual nuances of your company falls on your digital team.

Since there is always a risk of misinformation and misalignment with reality, language models and other weak forms of AI with similar challenges should be limited in the scope of their autonomy.

Further, to avoid disruptive or catastrophic outcomes, teams should leverage language models as an assistive capability and assess how their use creates new domain-specific responsibilities for ensuring a model’s alignment with business, product, and human-centered guidelines.

Up Next

Now that you’re up to speed on language models, Part 2 of this series will review how human-centered information architecture is used to remove the symbolic and semantic challenges of language models while contributing to a responsible pursuit of AI.