
Research and analysis

Appendix B: full methodology

Published 30 April 2026

Throughout the process to create the UK Standard Skills Classification (SSC), we relied heavily on Artificial Intelligence (AI), particularly Large Language Models (LLMs). These were mostly OpenAI models (generally the best available model at the time), but during early development we also used open-source Llama models for some of the more data-intensive tasks that would have been too costly using OpenAI. At each stage, we manually reviewed exceptions and inspected the outputs from the AI models, sometimes having to check large amounts of data. During the development of SSC Version 1.0, we investigated the performance of GPT-5.4. Evaluation of importance score differences between GPT-5.4 outputs and the GPT-4.1-derived prototype outputs for a sample of mappings showed noticeably more accurate performance by GPT-5.4. This led to the decision to regenerate all primary and secondary mappings using GPT-5.4 for SSC Version 1.0.

A central part of our AI approach was the use of text embeddings. These are numerical representations (vectors) of text that allow computers to capture the semantic meaning of a piece of text. Text embeddings are used to cluster text and to compare the meanings of text strings. To understand how related 2 text strings are to each other, we calculate the distance between their vectors using cosine similarity. The larger the score, the closer the two text strings are in semantic meaning.
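As a minimal illustration of the cosine similarity calculation described above, using toy vectors in place of real model output:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors (1.0 = same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-dimensional vectors standing in for real embeddings
# (OpenAI 3-Large vectors have 3,072 dimensions).
v1 = np.array([0.2, 0.8, 0.1])
v2 = np.array([0.25, 0.75, 0.05])
v3 = np.array([-0.9, 0.1, 0.4])

print(cosine_similarity(v1, v2))  # semantically close: score near 1
print(cosine_similarity(v1, v3))  # unrelated: much lower score
```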

This method of comparing text strings was used extensively throughout the project. Early on, we experimented with different embedding models and decided on the use of OpenAI 3-Large (OAL3) embeddings in the main. For the development of SSC Version 1.0, we re-checked the performance of OAL3 against newer embedding models (e.g. Qwen3) but it remained the best performing for our specific needs.

AI was also used in other ways in the project. We used AI prompts at many stages, for example to quality assure skill and task statements and to detect inconsistencies and errors in mappings. Designing and refining prompts often involved several iterations as we learnt the best ways to interact with the AI to get the desired results.

The steps below outline the main stages in the creation of the SSC.

Tasks

Figure 8: Development of UK SSC Occupational Tasks

Figure 8 outlines the development process of the SSC Occupational Task library, detailing the main input libraries used, the data cleaning steps, and the validation against other information sources.

This is displayed as a series of processing steps from T1 to T6 in a row across the top of the diagram with each step shown below in a flow diagram. On the left-hand side are the 4 input libraries which feed into the first processing step: T1.

The ‘T’ prefix for each step ID (such as T1) relates to ‘Task’. For similar processes shared in later sections an ‘S’ prefix relates to ‘Skills’ and a ‘K’ prefix to ‘Knowledge’.

The input libraries from top to bottom are:

  • the Graduate Futures Institute (GFI) responsibilities
  • Skills England Occupational Standard duties
  • the US Occupational Information Network (O*NET) tasks
  • National Careers Service (NCS) day-to-day tasks

The processing steps show:

  • T1 ‘validate as Task Statements’
  • T2 ‘cluster by SOC SUG’
  • T3 ‘use AI to sub-cluster by meaning’
  • T4 ‘use AI to merge and deduplicate’ which has arrows pointing to 2 steps under T5
  • T5 ‘validate via SOC SUG description’ and ‘validate against job ads’ which both have arrows to step T6
  • T6 ‘Occupational Tasks’

T1: Process and validate inputs

Task statement libraries were obtained from GFI (responsibilities), Skills England Occupational Standards (duties), O*NET (tasks), and the National Careers Service (day-to-day tasks). These libraries were then cleaned and standardised using AI tools. AI tools were used to quality assure the task statements and correct tasks that were too generic, too specific, too wordy, incorrectly structured, compound, or not tasks at all. The quality assurance process also converted US spellings and phrasing to UK English.

T2 - T4: Refine, deduplicate and cluster

Text embeddings were generated using two models: OpenAI 3-Large and Bidirectional Encoder Representations from Transformers (BERT) MP-Net, and a variety of clustering models were then tested and compared to remove duplicate and similar tasks. OpenAI 3-Large embeddings with a hierarchical clustering model produced the best results.
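The pipeline itself used a hierarchical clustering model; as a minimal numpy-only sketch of the underlying idea (removing near-duplicate statements via embedding similarity), a greedy threshold pass might look like this. The function name and threshold value are illustrative:

```python
import numpy as np

def dedupe_by_similarity(embeddings: np.ndarray, threshold: float = 0.95) -> list:
    """Keep a statement only if every previously kept statement is
    less similar than `threshold` (cosine similarity on unit vectors)."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    kept = []
    for i, vec in enumerate(normed):
        if all(float(np.dot(vec, normed[k])) < threshold for k in kept):
            kept.append(i)
    return kept

# Rows 0 and 1 are near-duplicates; row 2 is distinct.
toy = np.array([[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]])
print(dedupe_by_similarity(toy))  # → [0, 2]
```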

Clustered tasks were sorted by meaning (based on embeddings) to identify overlapping and close clusters, which were merged through manual inspection. Orphan clusters (those containing only one task) were integrated with multi-task clusters using results from other clustering and embedding models.

The centroid task statement within each cluster was identified and became the task label. These cluster labels then became the initial version of the SSC Task library.
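Choosing the centroid statement of a cluster can be sketched as picking the member closest to the normalised cluster mean (the function name and toy data are illustrative):

```python
import numpy as np

def centroid_label(texts, embeddings):
    """Return the statement whose embedding is closest to the cluster mean."""
    emb = np.asarray(embeddings, dtype=float)
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    centroid = emb.mean(axis=0)
    centroid = centroid / np.linalg.norm(centroid)
    return texts[int(np.argmax(emb @ centroid))]

texts = ["Install solar panels", "Fit photovoltaic panels", "Repair wind turbines"]
vecs = [[1.0, 0.0], [0.95, 0.05], [0.0, 1.0]]
print(centroid_label(texts, vecs))  # the middle statement sits closest to the mean
```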

T5: Validate against other sources

SOC SUGs

Tasks were extracted from all SOC SUG descriptions (except n.e.c. groups ending /99) using the Llama3 LLM, and then embeddings were created to enable matching to the SSC Tasks. The similarity between the SUG description task embeddings and the SSC Task embeddings was calculated to provide a numerical score representing the degree of similarity. The best matching SSC Task for each SUG description task was identified so that SSC Tasks were assigned to all relevant SOC SUGs. Potential Task to SUG matches were also identified via an analysis of existing job profiles, such as those within O*NET, where associated task statements appear in clusters used to derive SSC Tasks.
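The best-match step amounts to a pairwise cosine similarity matrix with a row-wise argmax; a sketch follows (the function name and toy data are assumptions):

```python
import numpy as np

def best_ssc_matches(source_embs, ssc_embs):
    """For each source task embedding, return the index and cosine
    similarity of its closest SSC Task embedding."""
    a = np.asarray(source_embs, dtype=float)
    b = np.asarray(ssc_embs, dtype=float)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    sims = a @ b.T                             # pairwise similarity matrix
    idx = sims.argmax(axis=1)                  # best SSC Task per source task
    return idx, sims[np.arange(len(idx)), idx]

idx, scores = best_ssc_matches([[1.0, 0.0], [0.0, 1.0]],
                               [[0.0, 1.0], [1.0, 0.0]])
print(list(idx), list(scores))  # each source task maps to its mirror vector
```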

Further AI prompts were used to check the combined mappings and estimate the relatedness of these tasks to the associated SUGs. Significant discrepancies with a legacy mapping (such as a task match with a high level of importance within an O*NET profile but rejected by the AI analysis) were manually checked and reconciled.

Vacancy data

The IER holds a large vacancy database which is coded to SOC SUGs. A sample of distinct vacancy descriptions was created, with a maximum size of 200 vacancies per SUG. The sample was selected from vacancies with longer job descriptions and those that were well coded to each SUG.

Llama3 was used to extract tasks from this database of vacancy descriptions; the tasks were then quality assured, clustered, and embedded using a similar process to the creation of the task library (T2 to T4). These embeddings were then compared to the SSC Task embeddings. Vacancy tasks that were quality assured as being ‘good’ tasks but had a low similarity score to an existing SSC Task were manually inspected to identify any tasks that should be added to the SSC Task library.

The database of vacancy tasks was also used to identify additional tasks for SUGs with no or low numbers of associated tasks, and similarly for SSC Skills with no linked SSC Tasks.

T6: Final SSC Occupational Tasks

The final list of SSC Tasks consists of 22,583 tasks. This is based primarily on the SSC prototype version, but following analysis of tasks added to O*NET, Skills England Occupational Standards (i.e. duty statements) and GFI responsibilities since the original task library was created, an extra 692 tasks were added. This extended library was then mapped to SSC Skills and Knowledge concepts and to occupations.

Skills

Figure 9: Development of UK SSC Occupational Skills

Figure 9 shows the equivalent process for the construction of the hierarchical classification of SSC Occupational Skills, together with a set of 13 Core Skills.

This is displayed as a series of processing steps from S1 to S7 in a row across the top of the diagram with each step shown below in a flow diagram. On the left-hand side are the 6 input libraries which feed into the first processing step: S1.

S1 to S7 refer to each processing step in the creation of the Occupational Skills library.

The input libraries from top to bottom are:

  • European Skills, Competences, Qualifications and Occupations (ESCO) Level 4 skills
  • the National Careers Service (NCS) skills
  • O*NET Detailed Work Activities (DWAs)
  • Skills England Occupational Standards skills
  • GFI skills
  • the Workforce Foresighting Hub, Innovate UK (WFH) skills

The processing steps show:

  • S1 ‘validate as Skills’
  • S2 ‘cluster by meaning’
  • S3 ‘use AI to merge and deduplicate’
  • S4 ‘map against SOC SUGs’ which has arrows pointing to 2 steps under S5
  • S5 ‘validate against Tasks’ and ‘validate against job ads’ which both have arrows to step S6
  • S6 ‘Occupational Skills’
  • S7 ‘Core Skills’

S1: Process and validate inputs

Skill statement libraries were obtained from GFI (skills), ESCO (Level 4 skills), Skills England (skills), the Innovate UK Workforce Foresighting Hub (skills), O*NET (Detailed Work Activities) and the National Careers Service (skills). These libraries were cleaned and standardised using AI tools. AI tools were again used to quality assure the skill statements and correct any that were too generic, incorrectly structured, compound, invalid, elementary, ambiguous, traversal, or too specific.

The text below shows an example of a prompt used to quality assure skill statements:

“““

A good occupational skill label complies with all of the following criteria:

1. It describes a skill that requires significant training and practice to acquire.

2. It describes a skill and not an attitude or outcome. For example, ‘maintaining a positive outlook’ or ‘Ensuring customer satisfaction’ would therefore not qualify as occupational skills.

3. It describes a skill that is developed and not innate. For example, ‘a good sense of smell’ is not a skill although “Smelling foods and ingredients to evaluate quality” is.

4. It begins with an action-based verb followed by a specific noun (i.e. describes something being actively done to an object).

5. It is no more than nine words long (and ideally between three and six).

6. It is unambiguous (i.e. it describes a specific skill and couldn’t be misinterpreted as something else).

7. It describes a specialist skill and therefore is only relevant to a subset of jobs. For example, “supervise workers” is too broad.

8. It describes a skill that is broad enough to be relevant to or transferable between multiple jobs but not overly generic.

Examples of good occupational skill labels include:

1. Install heat pumps

2. Administer standardised psychological tests

3. Manage software development projects

4. Read musical scores

5. Inspect aircraft to check airworthiness

6. Design relational database schemas

Quality Evaluation Category Codes, Category Names & Rewriting Guidance:

For evaluation and, where necessary, editing, occupational skill labels can be classified into one or more of the following categories:

  1. Good - This label meets all the criteria
  2. Compound - This describes multiple skills. It needs to be split into multiple skill labels, one per different skill.
  3. Too Generic - This is too generic and isn’t describing a specific skill.
  4. Invalid - This does not describe a skill and is instead a tool, subject, attitude or outcome. It needs to be removed.
  5. Too Complex – The vocabulary used to define the skill is verbose and unnecessarily difficult to read. It needs to be simplified.
  6. Disordered – This label does not follow the verb-noun sequential format. It needs to be rewritten to present the information in this order.
  7. Elementary - This is an unskilled or very low-skilled activity.
  8. Ambiguous - This label could represent two totally different skills.
  9. Traversal - This is a skill that is very broad and is required in a wide variety of unrelated job roles.
  10. Too Specific - This is a skill that is too specialised and only relevant to a specific part of one job.

Quality Evaluation Category Examples:

Examples of skill labels assigned to the various evaluation categories (some examples may belong to more than one category)

  1. Good – “Administer standardised psychological tests.”
  2. Compound – “Design, administer & interpret standardised psychological tests.”
  3. Too Generic – “Analyse data.”
  4. Invalid – “Stay positive.”
  5. Too Complex – “Apply research ethics and scientific integrity principles in research activities.”
  6. Disordered – “Safe working Practices: Meet legal, industry and organisational requirements.”
  7. Elementary - “Fill kettle with water” or “Pass dental instruments.”
  8. Ambiguous - “Conduct pipeline analysis” (this is ambiguous as it could refer to an oil or data pipeline)
  9. Traversal - “Think analytically”
  10. Too Specific - “Repair vehicles with fuel-injection problems”

With this context, please evaluate the occupational skill labels in the provided list of tuples (containing the statement_id and statement_text) and assign each one to one or more of the Evaluation Category codes.

Next step:

Rewrite each statement by applying the rewriting guidance for all of its category codes as well as using the original criteria for good occupational skill labels and examples of good skill labels provided.

For example, a code 2 (Compound) statement should be split into two distinct skill labels.

If the original statement does not contain enough information to apply the guidance properly then instead assign a label “Insufficient content to rewrite”.

Finally, return a json list of dictionaries (one dictionary per record) containing (in the following order):

  • Statement_id:
  • Statement_text:
  • Evaluation_categories: A comma separated list of the Evaluation Category codes and their corresponding names
  • Statement_refined: The rewritten statement or statements or the label “Insufficient content” (*If there is more than one statement, these should be separated by the “#” character.)

”””

Please note that this prompt was developed in May 2024 and used with the LLM model OpenAI GPT-4o. Current LLMs are significantly more capable and the prompt could be improved (quite possibly by an LLM) to produce better results. Use of this exact prompt is therefore not recommended.
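For illustration, a batch of statements could be sent with such a prompt and the requested JSON list parsed roughly as follows. `call_llm` is a placeholder for whichever chat-completion client is in use, not a real library function:

```python
import json

def quality_assure_batch(statements, call_llm, prompt):
    """Send (statement_id, statement_text) tuples with the QA prompt and
    parse the JSON list of dictionaries the model is asked to return."""
    response = call_llm(prompt + "\n" + json.dumps(statements))
    records = json.loads(response)
    for rec in records:
        # Compound rewrites are '#'-separated per the prompt instructions
        rec["Statement_refined"] = [
            part.strip() for part in rec["Statement_refined"].split("#")
        ]
    return records
```

In practice the response would also need validation (e.g. retrying on malformed JSON), which is omitted here.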

S2 - S3: Refine, deduplicate and cluster

OpenAI 3-Large embeddings were created and a hierarchical clustering model was used to deduplicate and refine the library of skills.

Skill clusters were then sorted by meaning to identify overlapping clusters, and these were manually inspected for inclusion or deletion. AI prompts were used to analyse the consistency of the skill clusters and to generate a new skill label to best describe each cluster of skills (rather than using the centroid skill as the label).

The verbs in the skill labels were standardised and these became the SSC Skills.

AI tools were used to write a description of each SSC Skill label, and then a further prompt identified any ambiguous skill labels and descriptions, which were rewritten.

Create Skill Groups, Areas and Domains

The SSC Skills were clustered to create Skill Groups and parent or child overlaps were manually checked. An AI prompt was used to check the SSC Skills within each Skill Group and identify any overlapping Skill Groups.

The Skill Groups were then clustered to create Skill Areas and the language of the Skill Groups and Skill Areas was standardised. An AI prompt was used to check the skills in each Skill Area and return a skill relatedness score.

The Skill Areas were then mapped to Skill Domains and an AI prompt used to check SSC Skills within Skill Domains.

S4: Map against SOC SUGs

The original prototype mapping from SOC SUGs to SSC Skills was based primarily on the occupational mappings in the input skill libraries.

The final Version 1.0 mapping was, however, entirely regenerated by first identifying potential matches from a text embedding comparison of the new Version 6 SOC SUG titles and descriptions against SSC Skills and descriptions. The potential match lists were then extended by the addition of any of the top 30 skill matches from the original mapping not already included.

The original SUG to skill mapping contained only a single weighted importance score but, even with the latest LLMs (e.g. GPT-5.4), prompts to generate similar importance scores were inconsistent (i.e. running the same prompt against the same dataset would generate significantly different scores). This aligned with broader concerns about the accuracy of the prototype mapping scores. SUGs are quite broad occupational concepts, and a skill may be very important within some roles in a SUG while irrelevant to others. For example, different application developer roles will involve using different programming languages and libraries which, in turn, will need different skills. A prompt that evaluated the probability that a skill is required within an SUG, and then separately the percentage ‘importance to competence’ of that skill, generated significantly more consistent scores. Moreover, these were significantly more correlated with a sample of independent importance evaluations of the existing SSC skill matches than the original mapping.

The augmented potential match lists were therefore evaluated using the AI prompt format below to generate these two distinct estimates of relatedness. Matches with a frequency score below 10 and an average weighted score below 25 were typically excluded, although some were retained to improve overall coverage (see “The UK Standard Skills Classification” for details).
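The exclusion rule can be sketched as below. The multiplicative combination of the two prompt outputs into a single weighted score is an assumption; the exact formula is not stated here:

```python
def weighted_score(required_probability_pct, importance_if_required_pct):
    # Assumed combination: probability x importance, rescaled to 0-100
    return required_probability_pct * importance_if_required_pct / 100

def keep_match(frequency, weighted_scores, min_frequency=10, min_avg=25):
    """A match is typically excluded only when BOTH the frequency score
    and the average weighted score fall below their thresholds."""
    avg = sum(weighted_scores) / len(weighted_scores)
    return frequency >= min_frequency or avg >= min_avg

print(weighted_score(80, 60))        # → 48.0
print(keep_match(5, [10.0, 12.0]))   # low frequency AND low average → False
```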

Example prompt:

“““

You are a skills analyst and need to evaluate the importance of skills within a list to a specific UK occupation.

To do this you will be given a list object that contains:

1) An occupation_id

2) An occupation_title and description (hyphen separated e.g. ‘Chemical engineers - Chemical engineers design and develop large scale chemical and physical production processes.’)

3) A list of ; separated tuples containing a skill_id and a hyphen-separated skill_label and skill_description

For example:

[1132/02,’Sales directors - Sales directors are responsible for overseeing all sales operations for an organisation or business.’,(S.2978;Supervise sales staff - Set daily priorities, monitor calls and deals, coach staff on products, and review progress against targets.);(S.0106;Analyse sales data - Analyse sales figures to find trends by product, customer or region and spot issues affecting revenue.);(S.2862;Set sales targets - Set measurable sales targets based on past results and forecasts, such as revenue, units sold or new customers.)]

For each of these occupation_skill lists please evaluate each skill and then:

1) Assign a % probability (as an integer value) that in a UK context the skill would be required for roles within the occupation described (for example, an application developer role would be more likely to require the skill of Python programming rather than Scala or Rust). Remember this score as the skill_required_probability_percentage.

2) Assign a % score (as an integer value) to indicate the importance of competence in that skill to overall competence of roles belonging to that occupation and requiring that skill (For example an application developer role requiring Django). Remember this as the skill_importance_if_required

3) Don’t include any rationale, return only a json list of dictionaries (one dictionary per occupation-skill pair) containing (in the following order):

a) occupation_id:

b) skill_id:

c) skill_required_probability_percentage:

d) skill_importance_if_required

”””

Weighted frequencies were then calculated to show how SUGs relate to SSC Skill Groups and SSC Skill Areas.
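Aggregating per-skill weights up to Skill Groups for a single SUG can be sketched as a simple keyed sum; the lookup structure and group names are assumptions, and only the skill IDs reuse the sales example above:

```python
from collections import defaultdict

def weighted_frequency_by_group(skill_weights, skill_to_group):
    """Sum per-skill weighted scores up to their parent Skill Group."""
    totals = defaultdict(float)
    for skill_id, weight in skill_weights.items():
        totals[skill_to_group[skill_id]] += weight
    return dict(totals)

# Illustrative group names; the skill IDs come from the example prompt above.
groups = weighted_frequency_by_group(
    {"S.2978": 40.0, "S.0106": 30.0, "S.2862": 20.0},
    {"S.2978": "Sales management", "S.0106": "Data analysis",
     "S.2862": "Sales management"},
)
print(groups)  # → {'Sales management': 60.0, 'Data analysis': 30.0}
```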

S5: Validate against other sources

SSC Tasks

The SSC Skills embeddings were compared to SSC Tasks embeddings to identify links between them. This mapping was then checked using an AI prompt, and a further prompt defined the importance score of the SSC Skill to the SSC Task.

Vacancy data

Following a similar process to the validation of tasks using vacancy data, skills were extracted from a sample of vacancy descriptions using Llama3. These were quality assured using AI, embeddings were created, and the vacancy skills were clustered within each SUG and then across all SUGs. The centroid embedding within each cluster became the vacancy skill label. These were then compared to the SSC Skills embeddings to check coverage, and any vacancy skills quality assured as being of good quality with a low similarity score to the SSC Skills were inspected for inclusion. This resulted in eight new concepts (e.g. S.1388 - Install EV charging points) being added to the prototype classification.

S6: Final SSC Occupational Skills

The set of SSC Skills consists of a hierarchy of 3,350 Occupational Skills, 607 Skill Groups, 106 Skill Areas and 22 Skill Domains. This is based primarily on the prototype classification but, following user feedback and evaluation of pilot outputs, 10 new occupational skills were added, 27 skill labels were modified (e.g. ‘S.0271 - Build axed arches and haunch brickwork’ changed to ‘Build arches and angled brickwork’ to improve clarity) and two redundant concepts were removed. All occupational skill descriptions were also revised using GPT-5.4 to improve consistency and readability. The datafile changelog contains full details.

S7: Core skills

The Skills Builder Partnership essential skill concepts were considered and then a list of 13 SSC Core Skills and definitions was drawn up.

AI prompts were used to help create definitions for each of the 5 skill levels of each SSC Core Skill, and then to evaluate the level of Core Skill proficiency in each SSC Skill and each SOC SUG. Several AI models were used in this step to try to attain the best and most consistent results.

Knowledge

Figure 10: Development of UK SSC Knowledge concepts

Figure 10 illustrates the process to develop the SSC library of Knowledge concepts.

K1 to K6 refer to each processing step in the creation of the Occupational Knowledge library.

The main input libraries from top to bottom are:

  • ESCO(European Skills, Competences, Qualifications and Occupations) Knowledge concepts
  • Higher Education Coding of Subjects (HECoS)
  • Learn Direct Classification of Subject Codes (LDCSC)
  • O*NET (knowledge, tools used and technology skills)
  • Stack Exchange (topic tags)
  • Wikipedia (article titles)

The processing steps show:

  • K1 ‘validate as Knowledge concepts’
  • K2 ‘cluster by meaning’
  • K3 ‘use AI to merge and deduplicate’ which has arrows pointing to 5 steps under K4
  • K4 ‘validate versus Ofqual’, ‘validate versus Skills England’, ‘validate versus tasks’, ‘validate versus job ads’, and ‘validate versus prototype’ which each have arrows to step K5
  • K5 ‘identify primary concepts’
  • K6 ‘Occupational Knowledge’

The Knowledge concept, subject and topic names were collected from the input libraries.

K1: Process and validate inputs

Knowledge libraries were obtained from ESCO (Knowledge), HECoS (Higher Education Coding of Subjects), LDCSC (Learn Direct Classification of Subject Codes), O*NET (Knowledge, Tools Used and Technology Skills), Stack Exchange (Topic Tags) and Wikipedia (Article Titles). These were cleaned and standardised using AI tools. The list of Knowledge concepts was checked for any matching or equivalent terms and then filtered to only include concepts that were evident within a UK context.

K2 - K3: Refine, deduplicate and cluster

Knowledge concepts were clustered by meaning using embeddings and further deduplicated using clustering methods.

K4: Validate against other sources

Ofqual

Up to 50 potential matches per qualification were identified by comparing a text embedding vector of a concatenated text string of each qualification title and its associated qualification units against a text embedding for each SSC Knowledge concept label.

Text embedding vectors were generated using the OpenAI 3-Large model, with a cosine-similarity match threshold of 0.3 being applied. Matches above this threshold were then evaluated by prompting an LLM (GPT-5.4) with a simplified text string for each qualification (its simplified title and up to 5 example qualification units) to validate each match and also, where appropriate, assign a percentage probability that “a significant amount of knowledge in that area would be learnt by achieving that qualification”. Following a sample inspection, matches assigned a match probability score below 50% were rejected.
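The shortlisting step (at most 50 candidates above a 0.3 cosine-similarity threshold) can be sketched as follows; the function name and toy inputs are illustrative:

```python
import numpy as np

def shortlist_matches(qual_emb, concept_embs, threshold=0.3, top_n=50):
    """Rank Knowledge concepts by cosine similarity to one qualification
    embedding and keep at most `top_n` above the threshold."""
    c = np.asarray(concept_embs, dtype=float)
    c = c / np.linalg.norm(c, axis=1, keepdims=True)
    q = np.asarray(qual_emb, dtype=float)
    q = q / np.linalg.norm(q)
    sims = c @ q
    ranked = np.argsort(-sims)[:top_n]
    return [(int(i), float(sims[i])) for i in ranked if sims[i] > threshold]

# Toy example: concepts 0 and 1 clear the 0.3 threshold, concept 2 does not.
matches = shortlist_matches([1.0, 0.0], [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
print([i for i, _ in matches])  # → [0, 1]
```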

The closest Sector Subject Areas were identified using embedding matches and validated using an LLM prompt and manual inspection.

IfATE and Skills England

Up to 10 potential matches per Occupational Standard Knowledge statement were identified by comparing a text embedding vector of a concatenated text string of each statement and its associated occupational standard against a text embedding for each SSC Knowledge concept label. Text embedding vectors were generated using the OpenAI 3-Large model, with a cosine-similarity match threshold of 0.3 being applied. Matches above this threshold were then evaluated by prompting an LLM (GPT-5.4) to validate each match and, where appropriate, assign a percentage importance of the knowledge to that statement. Following a sample inspection, matches assigned a probability score below 50% were rejected.

SSC Tasks

Embedding matches were also used to assign SSC Knowledge concepts to SSC Tasks, and then an AI prompt checked whether the Knowledge concepts had been correctly assigned to Tasks. An AI prompt was then used to define the importance score of the Knowledge to the Task.

Vacancy data

The sample of vacancy descriptions was searched for the SSC Knowledge concepts to check that they are all terms in common usage.

K5: Primary concepts

The primary concept type and potentially related concepts were identified using embedding matches and checked using LLM prompts.

K6: Final SSC Occupational Knowledge concepts

The final set of SSC Knowledge concepts consists of 5,056 concepts linked to SSC Tasks, SSC Skills and subjects. This is based primarily on the prototype classification but, following user feedback, evaluation of pilot outputs and a re-analysis of previously excluded terms, 145 new concepts were added, 10 concept labels were modified to improve clarity (e.g. ‘K.0663 – Casting’ changed to ‘Casting (Manufacturing)’) and 15 redundant concepts were removed. All concept descriptions were also revised using GPT-5.4 to improve consistency and readability. The datafile changelog contains full details.

Secondary mappings

Secondary mappings to existing classifications of Skills, Tasks and Knowledge concepts were created using embeddings matches. The full list of secondary mappings available can be found in Appendix A.

Skill categorisations

1. Numeracy skills and Digital skills

These classifications were created using the SSC Skills that were rated as requiring an expert level of proficiency in the SSC Core Skills of Numeracy and Digital Literacy.

2. Green skills

An AI prompt was used to score each SSC Skill on how related (directly or indirectly) it is to the UK’s net zero emissions target and other environmental goals. Using previous work to define the Green SOC, the skills mapped to green SUGs were also identified. A manual inspection of the skills with high AI green scores and those mapped to green SUGs was then carried out to identify a list of Green and Green-enabling skills.

3. STEM-M&H (Science, Technology and Engineering, Mathematics, Medicine and Health) skills

The definitions used to categorise SUGs as STEM-M&H were used in an AI prompt to score the SSC Skills against each of the four categories. After a manual inspection and comparison to the STEM-M&H SUGs linked to each skill, a threshold score was applied to define the STEM-M&H category.

4. Artificial Intelligence (AI) skills

A model was developed to define 4 different categories of AI skills, as listed in Table 5 below.

Table 5: AI Skills Categories within the UK SSC

AI Skill Category Name - AI Skill Category Description
AI Development Skills - Technical skills that help develop, implement and maintain Artificial Intelligence (AI) tools and capabilities.
AI Operation Skills - Skills that directly relate to the use of Artificial Intelligence (AI) tools and capabilities.
AI-Augmented Skills - Skills that can be performed without AI tools and capabilities but can be materially simplified, accelerated, improved or scaled through their use.
AI Oversight Skills - Skills that help plan, govern, monitor, audit, assure, validate, regulate, approve, or oversee the safe, lawful, ethical, effective, or responsible use of Artificial Intelligence (AI) tools and capabilities.

An AI prompt was used to assign a percentage score for each of the categories for each SSC Skill. For scores above 50%, a rationale was also generated for validation purposes.