Can AI do Mediation? - AI Conflict Analysis & Prognosis - Part I - Section 1

by Monika Ortega y Strupp

October 2024
Conflicts are as old as life itself. Understanding them and dealing with them efficiently promises to conserve one's own energies and to provide an advantage in life and survival that should not be underestimated. This applies not only to major political conflicts, but also to personal, private and interpersonal conflicts. Mediation is one way out of conflicts towards consensus solutions.
Nowadays, the following questions arise: To what extent has AI - or more precisely, Deep Learning Models (DLMs)* - already found its way into this area? What are the advantages of involving AI in conflicts? What are the risks? What are the opportunities?

* In the following, I use the terms Artificial Intelligence (AI) and Deep Learning Model (DLM) as synonyms, although there are various other kinds of AI.
In this first series of articles, I would like to begin by laying the foundations for understanding this technology, in order to answer these questions.
How do AIs work?
In order to understand how AIs and DLMs can currently be used, we first need to understand how they work. The development of artificial intelligence did not just start yesterday and has decades of history behind it.
The key question is: Can a machine think or learn like a human?*

* Of course, one can ask whether this is desirable at all - but that would be going too far here.
Deep Learning Models are currently (still) enjoying a great deal of hype. They are based on neural networks, which are, roughly speaking, modelled on the functioning of natural nerve cells: signals can enter a nerve cell and signals can leave the nerve cell again. In other words, there are input signals and output signals. A technically constructed neural network consists of a topology of technical neurons. Depending on their type and the network architecture, neurons can perform different tasks.
Figure 1: ChatGPT-4's imagining of a biological neural network
Figure 2: A technical neural network with multiple neuron layers
Fig. 1 shows an abstract depiction of several biological neurons connected to each other in a single strand. In such nerve connections, signals can be received, generated or suppressed as required, and thus transmitted to different connection points. Fig. 2 shows a technical neural network with neurons arranged in a topology of different layers. In this network, too, input signals are processed and output signals are generated.
Deep Learning Models
Deep Learning Models consist of neural networks and are named after their architecture: it is built from several hidden layers, or depths, of neurons. These layers enable them to learn complex patterns and representations from large amounts of data. And they require large amounts of data!
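To make the idea of layered neurons concrete, here is a minimal sketch of a forward pass through a small network with two hidden layers, written in Python with NumPy. The layer sizes, weights and activation function are arbitrary illustrative choices, not those of any real model.

```python
import numpy as np

def relu(x):
    # Simple non-linear activation: negative signals are suppressed
    return np.maximum(0, x)

rng = np.random.default_rng(0)

# Arbitrary layer sizes: 4 input signals -> two hidden layers -> 2 output signals
sizes = [4, 8, 8, 2]
weights = [rng.normal(size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def forward(x):
    # The input signal is transformed layer by layer - the "depth" of the network
    for w, b in zip(weights, biases):
        x = relu(x @ w + b)
    return x

print(forward(np.array([0.5, -1.0, 0.3, 2.0])))
```

In a real Deep Learning Model, the weights are not random but are adjusted during training on large amounts of data.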
A Deep Learning Model that is used to generate language is known as a Large Language Model (LLM). It is trained to calculate probabilities for language sequences. Put simply, the task is to calculate how probable it is that a sentence or text continues in a particular way. We are familiar with this in a simplified form when we listen to someone impatiently. They say 'I'm eating a ...', and in our minds we complete the sentence with '... bread'. But if we don't actually know, it could just as well be a jelly baby. The language sequences calculated by LLMs can of course be much more complex.
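As an illustration of this idea, here is a minimal sketch of a next-word predictor that simply counts how often words follow each other in a tiny corpus and turns those counts into probabilities. The corpus and the resulting probabilities are invented for demonstration; a real LLM learns such distributions over billions of tokens with a neural network, not with a lookup table.

```python
from collections import Counter, defaultdict

# Tiny invented corpus; a real LLM is trained on billions of tokens
corpus = [
    "i am eating a bread",
    "i am eating a bread",
    "i am eating a jelly baby",
]

# Count which word follows which (a simple bigram model)
follow_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current, nxt in zip(words, words[1:]):
        follow_counts[current][nxt] += 1

def next_word_probabilities(word):
    counts = follow_counts[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

# After "a", this toy model predicts "bread" with 2/3 and "jelly" with 1/3 probability
print(next_word_probabilities("a"))
```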
If an LLM is trained from scratch, it requires billions of text units (tokens). The ChatGPT model GPT-3.5 is such a Large Language Model and was trained with approx. 300 billion words from the internet, amounting to a total data size of approx. 570 GB.
Very few enterprises have this amount of data at their disposal.
To better visualise this: let's assume that a person can speak 140 words per minute, and let's further assume that they could do this for 12 hours a day without interruption. They would then be able to speak around one hundred thousand words in a day. If a person could do this every single day, it would take them about 8,000 years to feed an LLM like ChatGPT with this amount of text.
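The back-of-the-envelope arithmetic behind this estimate, as a short sketch (the 300 billion words are the figure cited above; the speaking rate and the 12-hour day are the stated assumptions):

```python
words_per_minute = 140          # assumed speaking rate
hours_per_day = 12              # assumed uninterrupted speaking time
training_words = 300e9          # approx. training volume cited above

words_per_day = words_per_minute * 60 * hours_per_day   # ~100,800 words
years_needed = training_words / words_per_day / 365

print(f"{words_per_day:,.0f} words per day")
print(f"about {years_needed:,.0f} years")                # roughly 8,000 years
```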
Classic programs, such as search or sorting procedures, are usually based on deterministic algorithms. The program flow is determined by rules: the same input produces the same output. DLMs are often designed differently.
A deep learning model constructs generalisations based on examples and transfers these to new data. DLMs can also deliver non-deterministic results: predictions based on statistical distributions that can differ even for the same input.
To put it plainly: deep learning models calculate probabilities, even if you cannot see this in the output. It is important to understand that the output of an AI is not based on facts and deductions, but on distributions and similarities.
An input in ChatGPT generates the probability of a response sequence. ChatGPT only displays the sequence, not its associated probability. And since probabilities do not necessarily hold, the output of ChatGPT is not necessarily true.
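The following sketch contrasts the two behaviours described above: a deterministic function that always returns the same output for the same input, and a toy stand-in for a generative model that samples its answer from a probability distribution, so the same input can yield different outputs. The continuations and their probabilities are invented purely for illustration.

```python
import random

def deterministic_sort(values):
    # Classic algorithm: the same input always produces the same output
    return sorted(values)

def sampled_answer(prompt):
    # Toy stand-in for a generative model: the prompt is ignored here and the
    # answer is drawn from an invented probability distribution, so repeated
    # calls with the same input can differ
    continuations = ["bread", "jelly baby", "sandwich"]
    probabilities = [0.7, 0.2, 0.1]
    return random.choices(continuations, weights=probabilities, k=1)[0]

print(deterministic_sort([3, 1, 2]))        # always [1, 2, 3]
print(sampled_answer("I'm eating a ..."))   # may change from run to run
```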
Figure 3: Visualisation generated by the DLM Midjourney, showing hands with six fingers each, given the task of imagining a handshake [...]
Image generators currently still produce a common error: hands with more or fewer than five fingers (see Figure 3). This is not surprising, because a DLM cannot count. It calculates the probability of which image part comes next to the image part of a finger. And how probable is it, for an AI, that there is a finger next to a finger in an image?
The development of a DLM for a specific task area begins with a training phase. In this phase, the AI is provided with positive and negative examples. From this training data, an AI can learn to recognise patterns or generalisations. The coverage and quality of the training examples determine the quality of the learning. In short, in the worst case: bullshit in, bullshit out!
Figure 4:
Learning from training examples - left: underfitting, centre: ideal, right: overfitting
Especially with a small database for training, there is a risk that the model to be trained cannot find ideal patterns. With small or poor coverage of training examples, underfitting, overfitting or catastrophic forgetting can occur in the AI (see Figure 4).
If a model is trained with poor example coverage, this can lead to overfitting (Fig. 4, right). The model then recognises the training data well, but fails at generalisation. In the case of underfitting, the AI cannot achieve its learning goal due to insufficient training data coverage (Fig. 4, left); the model cannot fulfil its task. With specific further training, it can also happen that newly learned patterns displace those the AI has already recognised. The model then forgets its previous knowledge, which is known as catastrophic forgetting.
Of course, these phenomena are to be avoided, which is why the selection of the right training examples and the training process of a DLM are extremely important, and the database should not be too small.
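A minimal sketch of how under- and overfitting can be made visible, assuming invented noisy sample data and polynomial models of different complexity (degree 1 underfits the curved data, a high degree fits the noise and generalises poorly):

```python
import numpy as np

rng = np.random.default_rng(42)

# Invented noisy training data following a simple curved trend
x_train = np.linspace(0, 1, 12)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.2, size=x_train.size)

# Unseen test data from the same underlying trend
x_test = np.linspace(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test)

for degree in (1, 3, 9):   # too simple, roughly right, too complex
    coeffs = np.polyfit(x_train, y_train, degree)
    train_error = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_error = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train error {train_error:.3f}, test error {test_error:.3f}")
```

The low-degree model fits neither the training nor the test data well (underfitting); the high-degree model fits the training data almost perfectly but does worse on new data (overfitting).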
If you now want to use an AI in the field of conflict analysis or forecasting, the question arises as to how and in what form sample data could be provided for an AI, possibly in large numbers.
The strengths and weaknesses of DLMs
The strengths of DLMs lie in recognising patterns. More precisely: in learning patterns and predicting and categorising values based on probabilities. A DLM can learn things from a vast amount of data that the human brain lacks the capacity for.
A DLM can use this pattern learning to make classifications, predict subsequent values, complete sequences, generate similar values for input, make decisions or recognise anomalies.
In his novel The Hitchhiker's Guide to the Galaxy, Douglas Adams describes the answer given by the supercomputer Deep Thought, which, after a very long period of reflection, delivers the answer 42 to the Ultimate Question of Life, the Universe, and Everything.
In terms of its derivation, Deep Thought is comparable to today's DLMs: it is a black box. Apart from the fact that no one in the story knows the exact question, the supercomputer's answer may or may not be true. The derivation of this answer is not known.
How a DLM comes up with its patterns is its secret. For humans, AI is a black box.
AI cannot (currently) explain what it learns. It recognises its own patterns, which can surprise people both positively and negatively.
For example, Amazon developed an AI in the HR area. The aim of this AI was to identify the best applicants quickly and easily. Applications from a period of ten years were used as training data. Over time, it turned out that the AI had defined masculinity as a quality characteristic of technicians. The AI had learnt a pattern from the applications: technicians are male. There were so few women in engineering and so few applications from women in the training data that the AI considered masculinity to be a key qualitative characteristic of the engineering profession.
Another example of a surprising pattern is provided by an AI that has learned to diagnose diabetes based on the voices of test subjects.
Figure 5: Possible 'thought steps' of o1 in response to the question 'What is 13 x 12?' (chain of thought)
OpenAI's o1 is also based on the same principles as all DLMs. The deduction it presents, the 'thinking' of o1 (see Fig. 5), consists only of learned patterns. They have been trained on examples and are applied by o1. The individual 'thought steps' still consist of calculating the probability of sequences, just like the final thought, the generated answer. They may or may not be true. This is not understanding in the sense that a principle is grasped, formulated and applied to a new task.
The real strength of current AIs is therefore their ability to independently learn patterns from large amounts of data and to use them to calculate similarities, classifications, predictions or forecasts. They are not good at deduction; their results are non-deterministic and not necessarily true.
Training an AI is expensive. It requires hardware with the appropriate capacity and energy, both in the training phase and during operation. The human preparation of training examples also contributes to the costs. In real-world deployments, the question therefore always arises as to whether the results an AI provides, which do not necessarily have to be correct, represent such a great benefit that they outweigh the costs.
How can the capabilities of AIs be assessed?
To evaluate an AI mathematically, its ability to predict new values accurately is considered. Let us imagine a simple classification task, for example: What is a cow? (See Fig. 6.) Assume an AI model evaluates inputs as positive if they show a cow and negative if they do not. Fictitious results are shown in Figure 7.
Figure 6:
Example classification task: What is a cow?
Figure 7:
An AI evaluates what a cow is as positive or negative.
To evaluate the capabilities of this AI, we assess its results. Which results are true and which are false? (See Fig. 8.) A so-called confusion matrix can be set up for this purpose (see Fig. 9), in which the accuracy of the AI's calculations can be viewed in relation to each other. In this example, the AI has made four true positive classifications (cow correctly recognised), three true negatives (no cow, correctly recognised), one false negative (cow, not recognised) and one false positive (incorrectly recognised as a cow). Mathematically, this allows very precise statements to be made about the precision and behaviour of the AI.
Figure 8:
Evaluation of the AI results: What is a cow?
Figure 9:
A confusion matrix for the AI results.
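The figures from this example can be turned into the usual evaluation metrics directly. Here is a minimal sketch that builds the confusion matrix for the cow example above and derives accuracy, precision and recall from it; the two label lists simply encode the 4/3/1/1 outcome described in the text.

```python
# Ground truth and AI predictions encoding the cow example above:
# 4 true positives, 3 true negatives, 1 false negative, 1 false positive
actual    = [1, 1, 1, 1, 1, 0, 0, 0, 0]   # 1 = cow, 0 = no cow
predicted = [1, 1, 1, 1, 0, 0, 0, 0, 1]

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)

print(f"confusion matrix: TP={tp}, TN={tn}, FN={fn}, FP={fp}")
print(f"accuracy:  {(tp + tn) / (tp + tn + fp + fn):.2f}")   # 7/9, approx. 0.78
print(f"precision: {tp / (tp + fp):.2f}")                    # 4/5 = 0.80
print(f"recall:    {tp / (tp + fn):.2f}")                    # 4/5 = 0.80
```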
What cannot be seen in the mathematical values are the effects of individual incorrectly categorised cases. False categorisations by an AI, or incorrect answers and results from generative AIs, can have very far-reaching consequences for individuals. What about the cow that is not a cow? What about the zebra that is a cow? We humans also make mistakes - is an AI better or worse at this? What happens to the mistakes?
In order to evaluate the use of an AI, it is essential to assess not only its purpose and functionality, but also the impact it can have on individuals, interactions and people. This applies not only to its intended use as an instrument in regular operation, but also to malfunctions, possible misuse and erroneous results in normal use.
What does this range of AI capabilities entail in conflicts? And what are the risks of using them in conflicts?
------------------ END PART I - Episode 1 ------------------------
Outlook
The results of modern AI are impressive. The capacities for data acquisition exceed the capabilities of a human being many times over. The results are calculated probabilities. Is the use of AI in conflict analysis and forecasting worthwhile? What possibilities are there? How interesting or advantageous is it to use AI to support the prediction of conflicts? How can this be done?
These and other topics continue in PART I - Episode 2
Literature & Sources
(1) Based on a presentation at the AI Campus on Natural Language Processing by Salar Mohtaj, German Research Centre for Artificial Intelligence, TU Berlin
(3) Source: "An Introductory Guide to Fine-Tuning LLMs", Joseph Ferrer, August 1, 2024
(4) Model for the illustration from "Machine Learning tips and tricks cheatsheet", Afshine Amidi and Shervine Amidi, Stanford University, Fall 2018