AI generally refers to 'narrow AI', which describes purpose-built algorithms that can perform specific tasks in a limited field only, such as language translation. In contrast, 'general AI' refers to algorithms that can learn anything humans can, by effectively transferring their capabilities in one field across to other fields (wherever necessary to perform new types of tasks). Most work in data science concentrates on narrow AI.

Machine learning (ML) is one of the most prevalent methods of achieving 'narrow AI'. It works by effectively giving machines the ability to learn from data and recognise patterns through statistical models in order to make accurate predictions and, where necessary (such as when being deployed in self-driving cars), to determine the best course of action.


In the legal services sector, narrow AI is often deployed to perform tasks in the field of natural language processing (NLP). That is, tasks that involve interpreting and generating human language in both verbal and written forms. Performing these tasks well usually requires machines to recognise and understand grammar, sentence structure and word meaning, which they are often (but not always) trained to do through the use of algorithms and statistical models in ML.


ML algorithms are used to generate statistical models (trained on historical data), which generalise to new examples that are similar to the training data. These statistical models are configured to produce a specified type of output and operate purely on numerical values: numbers in, numbers out. All other types of input data (e.g. images for face recognition) must be converted into numbers before being fed to an ML model. This also applies to words and sentences, which is the domain of NLP considered in Principle 4. ML can be supervised or unsupervised.


With supervised learning, a machine is given training data that includes inputs (e.g. a candidate's job history) and corresponding expected outputs (e.g. their current salary). When training, the machine tries to predict the outputs and, in doing so, assesses its own prediction errors to fine-tune the model. Supervised learning algorithms can be used to train models that perform regression (predicting a quantity, e.g. current salary) or classification (sorting objects into categories).
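A minimal sketch of supervised regression, using entirely made-up salary figures and a closed-form least-squares fit (real ML libraries do this iteratively and at far larger scale):

```python
# Toy supervised learning: predict salary (output) from years of experience (input).
# Training data pairs of (input, expected output) -- all figures are invented.
data = [(1, 32000), (3, 40000), (5, 48000), (7, 56000)]

xs = [x for x, _ in data]
ys = [y for _, y in data]
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# Closed-form least-squares fit: choose the line that minimises the squared
# prediction errors, which is what a training loop does for larger models.
slope = sum((x - mean_x) * (y - mean_y) for x, y in data) / sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

def predict(years):
    return intercept + slope * years

# The fitted model generalises to an input it has never seen.
print(predict(4))  # -> 44000.0
```

The same training data with category labels instead of salaries (e.g. 'senior'/'junior') would make this a classification task rather than regression.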

In unsupervised learning, correct answers are not provided to the machine. The algorithm will find similarities between observations (data points) in a data set and group these together (this is known as 'clustering'). Some unsupervised learning algorithms are well suited to detecting anomalies, i.e. observations that seem out of place in the wider data set. Unsupervised learning can be used to detect credit card fraud, by highlighting transaction anomalies (e.g. from unlikely geographical locations).
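The fraud example can be sketched with a simple statistical rule, here a two-standard-deviation threshold on invented transaction amounts; note that no 'correct answers' are supplied, the data alone defines what counts as normal:

```python
# Toy unsupervised anomaly detection on made-up transaction amounts.
amounts = [12.5, 9.0, 14.2, 11.8, 950.0, 10.3, 13.1]

mean = sum(amounts) / len(amounts)
std = (sum((a - mean) ** 2 for a in amounts) / len(amounts)) ** 0.5

# Flag any observation more than two standard deviations from the mean.
anomalies = [a for a in amounts if abs(a - mean) > 2 * std]
print(anomalies)  # -> [950.0]
```

Real systems use richer features (location, merchant, time of day), but the principle is the same: surface observations that sit far from the rest of the data set.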


Natural language processing (NLP) techniques enable machines to undertake a variety of tasks such as translating, summarising or categorising documents, identifying the names of entities mentioned within a given document, or recognising intentions or feelings expressed by the writer or speaker. In the legal services sector, these abilities are applied to a broad range of use cases, such as reviewing contracts for risks and insights, conducting due diligence, and researching case law.

Although many NLP techniques involve machine learning, some techniques can be strictly heuristic (rule-based) and may not involve statistical models at all. These techniques are often based on pattern matching, or 'regular expressions'. A simple example is detecting whole questions in a document by searching for sentences (which can be viewed as continuous sequences of characters) that start with a capitalised letter and end with '?'.
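The question-detection heuristic above can be written directly as a regular expression; the sample sentences are invented for illustration:

```python
import re

# Rule-based NLP: find sentences that start with a capital letter and end
# with '?', with no statistical model involved.
text = ("The lease ends in May. Who pays the deposit? "
        "Rent is due monthly. Can it be assigned?")

# [A-Z]      -- a capitalised first letter
# [^.?!]*    -- any run of characters that does not end the sentence
# \?         -- a literal question mark
questions = re.findall(r"[A-Z][^.?!]*\?", text)
print(questions)  # -> ['Who pays the deposit?', 'Can it be assigned?']
```

The brittleness of this approach is easy to see: a question beginning with a quotation mark, or an abbreviation like 'e.g.' inside the sentence, would break the rule and require another hand-written exception.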


Rule-based NLP cannot accommodate exceptions easily and it is therefore difficult to scale and maintain. However, compared to machine learning techniques, it can be deployed faster and, because it relies on human expertise to specify the set of applicable rules, its performance will not be hindered by a lack of sufficient training data otherwise required by ML algorithms and models.


Statistical models generated by ML algorithms can only be trained directly on numerical values, not on letters or words. Words in each document must therefore be converted into numbers before they are analysed by these algorithms and models. Often, the numerical outputs will also need to be converted back into natural language in order for them to be meaningful to humans (e.g. an algorithm output of 1 or 0 must be mapped back to the categories of 'confidential' and 'not confidential').
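The final mapping step is straightforward in code; the model outputs here are invented stand-ins for real predictions:

```python
# Map numeric model outputs back to human-readable category labels.
labels = {0: "not confidential", 1: "confidential"}

model_outputs = [1, 0, 1]  # e.g. one prediction per document (made-up values)
readable = [labels[o] for o in model_outputs]
print(readable)  # -> ['confidential', 'not confidential', 'confidential']
```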


One way of converting a passage of text into numbers is to decide on an ordered list of words to include in a model (e.g. [party, loan, facility, ...]) and generate a corresponding list indicating how many times each word occurred in the text (e.g. [145, 14, 7, ...]). An algorithm can then use these counts to categorise a suite of documents by topic based on the relative frequency of particular words (e.g. text that includes the words "rent" and "landlord" several times is likely to relate to property letting).
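This word-counting approach (often called a 'bag of words') can be sketched in a few lines, using an invented vocabulary and sample text:

```python
# Bag-of-words sketch: count how often each vocabulary word occurs in a text.
vocabulary = ["rent", "landlord", "loan"]

text = "The landlord may increase the rent. Rent is payable monthly to the landlord."
words = text.lower().replace(".", "").split()

counts = [words.count(w) for w in vocabulary]
print(counts)  # -> [2, 2, 0]
```

Counts like these, high for "rent" and "landlord" and zero for "loan", are what lets an algorithm infer that the document relates to property letting rather than lending.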

A more complex method of conversion involves using word embeddings as algorithmic inputs. For a given word, an 'embedding' is a list of numbers (like GPS coordinates) that represents the meaning of that word, so that two words with similar meanings have similar 'coordinates' when compared. A challenge in capturing the meaning of a word, however, is that meaning can be context-dependent (e.g. is 'bat' the mammal or the cricket equipment?), and this is still an area of active research.
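The 'similar coordinates' idea can be illustrated with hand-made toy embeddings (real embeddings are learned from data and have hundreds of dimensions); similarity of direction is typically measured with cosine similarity:

```python
import math

# Toy word embeddings: invented 3-number 'coordinates', not learned vectors.
embeddings = {
    "landlord": [0.9, 0.8, 0.1],
    "tenant":   [0.85, 0.75, 0.2],
    "banana":   [0.1, 0.05, 0.9],
}

def cosine_similarity(a, b):
    # Words with similar meanings point in similar directions,
    # giving a similarity close to 1; unrelated words score lower.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

sim_related = cosine_similarity(embeddings["landlord"], embeddings["tenant"])
sim_unrelated = cosine_similarity(embeddings["landlord"], embeddings["banana"])
print(sim_related > sim_unrelated)  # -> True
```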


The performance and usefulness of models generated by ML algorithms can be evaluated using many different measurements ('metrics'). It is much more important for the right evaluation metrics and methods to be chosen than simply for a high percentage score to be achieved on a favourable metric. These choices depend on the type of algorithm used, the type of task performed, and how 'success' or 'utility' in practice should be measured for that given task.

Creating useful classification models is often driven by business logic and requires discussion and input from subject matter experts. For example, a model for classifying credit card transactions as fraudulent may prefer a false alarm to an overlooked instance of fraud (higher model recall). On the other hand, a model for identifying good investments may prefer to find only the best opportunities rather than all good ones (higher model precision).
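This trade-off is usually quantified with two standard metrics, precision and recall, computed here from made-up confusion counts for the fraud example:

```python
# Precision and recall from toy confusion counts (all figures invented).
# 'Positive' means the model flagged a transaction as fraudulent.
true_positives = 8    # fraud correctly flagged
false_positives = 4   # legitimate transactions flagged (false alarms)
false_negatives = 2   # fraud the model missed

# Precision: of everything flagged, how much really was fraud?
precision = true_positives / (true_positives + false_positives)  # 8/12
# Recall: of all the fraud that occurred, how much did we catch?
recall = true_positives / (true_positives + false_negatives)     # 8/10
print(precision, recall)
```

A fraud model tunes its threshold towards high recall (missing fraud is costly); an investment-picking model tunes towards high precision (every pick should be good, even if some good opportunities are missed).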


With research still active in this space, it is important to assess ML tools in NLP not only on the high accuracy they promise but also on the metrics by which this is measured. For language translation, an example metric can be the percentage of correctly generated words. Given a correct output of 'I am here', an output of 'I is here' scores 2/3 = 67% accuracy. However, this metric cannot fully reflect true model performance since an arguably less accurate output of 'I am away' would also score 67%.
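The translation example above can be reproduced with a simple word-level accuracy function, which makes the metric's blind spot explicit:

```python
# Word-level accuracy: the fraction of positions where the output word
# exactly matches the reference translation.
def word_accuracy(output, reference):
    out_words, ref_words = output.split(), reference.split()
    matches = sum(o == r for o, r in zip(out_words, ref_words))
    return matches / len(ref_words)

reference = "I am here"
print(word_accuracy("I is here", reference))  # 2/3 -- one wrong word
print(word_accuracy("I am away", reference))  # 2/3 -- same score, opposite meaning
```

Both outputs score identically even though one merely has poor grammar and the other reverses the meaning, which is exactly why a single favourable metric cannot be taken at face value.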

