Data is information arranged in varying degrees of structure, whilst a data set is a collection of data. A data point is a discrete unit of information extracted from data, e.g. the contract date in an agreement. A data set can consist of text and numbers (or pictures, audio and video translated into numerical form). The degree of structure in a data set determines how easily it can be ‘understood’ by a computer or algorithm.
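
To illustrate the difference that structure makes, the following is a minimal Python sketch of pulling a single data point (the contract date) out of unstructured text and storing it in a structured record. The agreement text, the function name `extract_contract_date` and the assumption that the date is written as yyyy-mm-dd are all invented for this example; real agreements express dates in many other ways.

```python
import re
from datetime import date

# Hypothetical, made-up agreement text: unstructured data held as a plain string.
agreement_text = "This Agreement is made on 2021-03-15 between Alpha Ltd and Beta LLP."


def extract_contract_date(text):
    """Return the first yyyy-mm-dd date found in the text, or None if there is none."""
    match = re.search(r"\b(\d{4})-(\d{2})-(\d{2})\b", text)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        # The digits matched the pattern but do not form a real calendar date.
        return None


# The same information as a structured record: the data point now has a named field.
record = {"contract_date": extract_contract_date(agreement_text)}
print(record["contract_date"])  # prints 2021-03-15
```

Once the date sits in a named field, a computer can sort, filter or compare agreements by it without having to read the surrounding prose.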

A data type specifies what kind of value a variable holds (or must hold) and how that value is stored, for example whether an input must be text or a number. Many data types exist, but common ones include: string (any sequence of characters); date (e.g. yyyy-mm-dd); boolean (e.g. yes/no, true/false); integer (a whole number); floating point (a number with a decimal point); and character (a single letter, digit or symbol).
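
As a rough illustration, the data types listed above might look like this in Python. The variable names and values are invented for the example. Note also that Python has no separate character type, so a single character is simply a string of length one.

```python
from datetime import date

# Hypothetical values describing a single agreement.
party_name = "Alpha Ltd"           # string: a sequence of characters
signing_date = date(2021, 3, 15)   # date: year, month, day
is_executed = True                 # boolean: True or False
number_of_clauses = 42             # integer: a whole number
contract_value = 1250000.50        # floating point: a number with a decimal point
currency_symbol = "£"              # character: in Python, a one-character string

for value in (party_name, signing_date, is_executed,
              number_of_clauses, contract_value, currency_symbol):
    # Show the name of each value's type next to the value itself.
    print(type(value).__name__, value)

# The data type matters: text that looks like a number must be converted
# to an integer before it can be used in arithmetic.
clauses_as_text = "42"
total = int(clauses_as_text) + 1
```

The final two lines show why the distinction is practical rather than academic: adding 1 to the text "42" directly would raise an error, whereas converting it to an integer first gives 43.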

An algorithm is a procedure, or set of rules, that a computer follows in a specified sequence to achieve a particular goal. A common analogy is a recipe. Algorithms are the foundation of computer programming and data science because they allow computers to process data (and to automate decision making). To work successfully, an algorithm must have clearly defined steps, be feasible and be finite (that is, it must stop after a limited number of steps).
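
A very small algorithm, written here as a Python sketch, shows these properties in practice: it finds the earliest-dated contract in a list. The function name `earliest_contract` and the sample contracts are invented for illustration. Each step is clearly defined, each is something a computer can actually do, and the procedure stops once every contract has been examined.

```python
from datetime import date


def earliest_contract(contracts):
    """Return the (name, date) pair with the earliest date, or None for an empty list."""
    earliest = None
    # Step 1: look at each contract exactly once, so the procedure is finite.
    for contract in contracts:
        # Step 2: compare its date with the earliest date seen so far.
        if earliest is None or contract[1] < earliest[1]:
            # Step 3: remember it if it is earlier.
            earliest = contract
    # Step 4: report the result.
    return earliest


# Hypothetical sample data: (contract name, contract date).
contracts = [
    ("Lease", date(2020, 6, 1)),
    ("Supply agreement", date(2019, 11, 20)),
    ("NDA", date(2021, 3, 15)),
]

result = earliest_contract(contracts)
print(result[0])  # prints Supply agreement
```

Python's built-in `min` function could do the same job in one line; the loop is spelled out here only to make the individual steps of the "recipe" visible.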

Data scientists often use computer programming languages (such as SQL, R and Python) to analyse large amounts of data efficiently, but it is possible to conduct useful data analysis without writing code. Microsoft Excel has a number of useful data analysis and visualisation functions, and many tools and accompanying courses seek to make data analysis easy for beginners.

Data science can provide an alternative perspective on all kinds of legal work and can often make work more efficient and rigorous. For example, text and predictive analytics can assist with automating discovery (identifying relevant disclosures), predictive analytics can be used to anticipate litigation and settlement decisions, and data visualisation is increasingly used to support arguments in court.

