The purpose of this document is to provide a rough framework for understanding the roles and responsibilities of a machine learning and artificial intelligence team in a small or medium business. It is a living document, and I will periodically add information and links.
Introduction
In a machine learning practice there are, potentially, a large number of responsibilities to distribute amongst practitioners. The specific responsibilities and practice will depend on what your team does and how they serve the business.
Whether your business operates a dedicated AI/ML team as a centre of excellence, or a squads-and-guilds model where ML expertise is embedded into specific teams, coordinating these competencies and roles is crucial to ensure that AI and ML projects are carried out to a high standard within your organisation.
Core Team Responsibilities
ML Project Workflow
Almost all machine learning projects follow the same general workflow. The main differences between projects lie in the area of focus.
```mermaid
graph LR
    Start((Start)) --> A[Data Preparation]
    A --> B[Model Development]
    B --> C{Evaluation}
    C -->|Satisfactory| D[Deployment]
    C -->|Not Satisfactory| B
    D --> E[Monitoring]
    E --> F{Performance OK?}
    F -->|Yes| E
    F -->|No| G{Major Changes?}
    G -->|Yes| A
    G -->|No| B
    classDef start fill:#4CAF50,stroke:#333,stroke-width:4px;
    class Start start;
```
This diagram illustrates the typical lifecycle of a machine learning project, from data preparation through model development, evaluation, deployment, and ongoing monitoring. It emphasises the iterative nature of ML projects, with feedback loops for model refinement and potential revisiting of earlier stages based on performance.
The ML team should take ownership of implementing and managing this entire workflow. They are responsible for ensuring each stage is executed effectively and that transitions between stages are smooth. However, successful integration with other business units is crucial. For instance, the data preparation stage may require collaboration with data engineering or business intelligence teams. The deployment phase often involves working closely with DevOps or IT infrastructure teams. During monitoring, the ML team should establish clear communication channels with end-users or product teams to gather feedback and identify performance issues promptly.
Horizon Scanning/Research
Team members must make a concerted effort to stay up to date with the current state-of-the-art in the machine learning community. AI and ML are fast-moving areas where important research quickly transitions from universities and academic institutions into engineering departments at tech firms.
To maintain cutting-edge knowledge and skills:
- Regularly read papers and preprints from advanced tech firms like Huggingface, OpenAI, Google, and Microsoft. Joining or forming a regular reading group or interest group can help facilitate discussions and deeper understanding of these materials.
- Attend meetups, conferences, and workshops to network with other practitioners in the space. Consider participating in industry-specific events that align with your company’s market. If there’s enough interest, hosting such an event can position your team as thought leaders in the field.
- Allocate time for experimenting with new models and frameworks. This hands-on experience can significantly reduce the learning curve when implementing cutting-edge models in live projects. It also provides necessary context and understanding for practitioners to more effectively access and interpret new research papers.
- Encourage team members to write about or present their findings and experiences to peers, either within the company or at external events. This practice helps cement learnings, builds personal brands, and contributes to the broader AI/ML community.
See below for specific advice on horizon scanning and staying up to date for different types of models and projects.
Advocacy and Evangelism
The AI and ML team’s advocacy and evangelism role involves educating stakeholders about the responsible and effective use of AI technologies within the organisation. Practitioners should focus on promoting AI solutions for use cases with clear, measurable outcomes and success criteria. They should emphasise the importance of starting with existing processes, pain points, and hypotheses rather than blindly searching for patterns in data.
Practitioners must advocate for ethical AI practices and ensure that stakeholders acknowledge that all models have inherent biases. They should stress the need to consider and mitigate potential negative impacts of automated decisions. Promoting human-in-the-loop approaches and ensuring decision-makers have access to relevant information is crucial.
Additionally, the ML team should educate stakeholders about the ongoing nature of AI projects, emphasising the need for continuous monitoring, maintenance, and model retraining to address concept drift over time. By focusing on these aspects, the team can foster a responsible and sustainable approach to AI adoption within the business.
Data Exploration, Management and Annotation
Dataset Management and Versioning
Secure and Compliant Data Management
The secure and compliant data management process is critical to ensuring that client data is handled responsibly and ethically. Practitioners must be aware of, and adhere to, all contractual and legal requirements for storing client data securely. This includes implementing a robust data deletion plan in the event of GDPR subject access requests or revocation of data processing consent by clients. This may also include committing to specific backup and recovery strategies (see Backup and Recovery).
Purposeful Data Collection
When collecting data, it’s essential to approach the task with purposefulness and caution. Practitioners should scope projects thoroughly to understand the specific data needs, avoiding blind data transmission that can lead to unnecessary collection of sensitive information. This includes pushing back on requests for data that could lead to discrimination or harm, and excluding sensitive attributes such as gender or race (NB: there are almost no valid use cases for collecting or using this kind of data in a commercial context. However, there may be some specific academic/investigative contexts where such data is carefully and sensitively analysed).
Code-Data Linkage and Access Control
Effective data version control and code-data linkage are essential for maintaining transparency and reproducibility in AI projects. Practitioners should utilise tools like DVC (Data Version Control), which integrates with Git and allows data versioning to be tied to code versions, whilst storing data separately from code repositories in secure environments. Additionally, implementing strict access controls through role-based access control (RBAC) ensures that only authorised personnel can view or modify data. Data storage tools like Google Cloud Storage, Amazon S3 and Azure Blob Storage provide RBAC by default, and practitioners may be able to lean on DevOps and infrastructure teams for advice on how to set up this access correctly.
Documentation and Data Lineage
Maintaining comprehensive documentation of data sources, preprocessing steps, and transformations applied is crucial for ensuring transparency and accountability in AI projects. Alongside commitments to version control and data integrity, practitioners should document data lineage to track the origin and evolution of datasets, making it easier to identify potential issues and ensure reproducibility of results. This could be as simple as keeping a data change log alongside scripts in the code repository in which comments are added when data sets are changed or updated.
Backup and Recovery
Maintaining regular backups of data and implementing a robust recovery process is critical for preventing data loss and ensuring business continuity. Practitioners should test recovery procedures periodically to ensure their effectiveness, reducing downtime and minimising potential disruptions to operations. Data stored in cloud storage can typically be backed up automatically, but practitioners may also want to make use of tools like Restic for local backups.
Exploratory Data Analysis (EDA)
Context-Driven Analysis
EDA is a crucial step in the machine learning pipeline, requiring a tailored approach based on the specific use case and stakeholder requirements. Practitioners should begin by thoroughly understanding the business context and desired outcomes, ensuring that the EDA process aligns with project goals and key performance indicators. Practitioners should push back on requests to start EDA without defined success criteria (see Purposeful Data Collection above) since EDA is tightly coupled to the business context and the task at hand.
Data Quality and Consistency
A comprehensive EDA involves rigorous assessment of data quality, consistency, and integrity. This includes checking for missing values, identifying outliers, and verifying data ranges. Practitioners should probe for existing patterns and correlations within the dataset, paying special attention to the relationships between features and the target variable. In the NLP domain, additional checks such as vocabulary size and document complexity should be conducted.
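As a rough illustration of these checks, the sketch below assumes a pandas DataFrame `df` with numeric features and a target column (the column name `target` is a placeholder):

```python
# Minimal data-quality checks on a pandas DataFrame; column names are illustrative.
import pandas as pd

def quality_report(df: pd.DataFrame, target: str = "target") -> None:
    print(df.isna().sum().sort_values(ascending=False))  # missing values per column
    print(df.describe())                                  # ranges and basic distributions

    # Simple IQR-based outlier count per numeric column
    numeric = df.select_dtypes(include="number")
    q1, q3 = numeric.quantile(0.25), numeric.quantile(0.75)
    iqr = q3 - q1
    print(((numeric < q1 - 1.5 * iqr) | (numeric > q3 + 1.5 * iqr)).sum())

    # Correlation of numeric features with the target, if present
    if target in numeric.columns:
        print(numeric.corr()[target].sort_values(ascending=False))
```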
Insights and Decision-Making
The EDA process should ultimately highlight the strengths and weaknesses of the dataset, providing insights that inform decision-making for subsequent stages of the ML pipeline. This may involve data cleaning, preprocessing, or even the need for additional data collection or generation (see Data Curation, Annotation and Synthesis). By conducting thorough EDA, practitioners can assess the feasibility of proceeding to the modelling stage and identify potential challenges or biases that may impact model performance. Remember, EDA is an iterative process that may need to be revisited as the project evolves, ensuring that the foundation for successful machine learning implementation remains solid and well-informed.
Data Curation, Annotation and Synthesis
Making Use of Subject Matter Expertise
Data curation and annotation require close collaboration with subject matter experts (SMEs) to ensure the quality and relevance of the dataset. SMEs provide crucial business context and insights that help ML practitioners understand why certain parts of the dataset are of high or low quality. Their expertise is invaluable in interpreting the findings from the Exploratory Data Analysis (EDA) phase and guiding subsequent data preparation steps.
Direct SME Involvement in Annotation
To maintain data integrity and accuracy, it’s essential that SMEs directly participate in the annotation process. The primary subject matter expert, usually the lead stakeholder on the client’s team, should be responsible for making tie-breaking decisions or directly conducting the bulk of the annotation work. This direct involvement ensures that the annotations align closely with the stakeholders’ requirements and success criteria.
Ensuring Annotation Consistency and Alignment
Practitioners should work closely with SMEs to develop comprehensive annotation guidelines. These guidelines serve as a reference point for consistent decision-making during the annotation process. They should include examples of confusing or controversial cases, their final assigned labels, and the rationale behind these decisions. Regular meetings between SMEs and practitioners to discuss challenging examples can refine these guidelines and improve the overall annotation process.
If multiple SMEs are involved, they should be in agreement about annotation practices. Regular check-ins and discussions about the annotation process can help identify and resolve any discrepancies or misunderstandings, ensuring a cohesive and high-quality dataset for subsequent machine learning tasks.
Thoughtful Task Design
When setting up annotation tasks, practitioners should be deliberate in their approach. Understanding human cognitive strengths and limitations is crucial. For instance, humans generally perform better at direct comparison or ranking tasks than at making absolute judgments on scales like Likert scales. Task design should leverage these strengths to ensure high-quality annotations.
Semi-Automatic Annotation Techniques
Advanced machine learning techniques can be employed to streamline the annotation process. Active learning and Few-Shot Learning approaches can help prioritise the annotation of data points that models find challenging to classify automatically. However, practitioners should be aware that these methods may introduce biases and that human annotators might be influenced by model predictions.
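As a hedged sketch of one such approach, the snippet below implements least-confidence sampling with a scikit-learn classifier; the variable names (labelled and unlabelled feature matrices) are assumptions for illustration:

```python
# Least-confidence active learning sketch: prioritise unlabelled examples
# that the current model is least sure about.
import numpy as np
from sklearn.linear_model import LogisticRegression

def select_for_annotation(X_labelled, y_labelled, X_unlabelled, budget=50):
    model = LogisticRegression(max_iter=1000).fit(X_labelled, y_labelled)
    confidence = model.predict_proba(X_unlabelled).max(axis=1)
    return np.argsort(confidence)[:budget]  # indices to send to human annotators
```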
Synthetic Data Generation
In some contexts, Large Language Models (LLMs) can be used to generate synthetic data to augment the annotation process. While this can be a powerful tool, practitioners must exercise caution and thoroughly understand the limitations and potential biases of synthetic data generation approaches. The use of synthetic data should be carefully evaluated and validated to ensure it enhances rather than compromises the quality of the dataset.
Modelling and Feature Engineering
Establishing a Baseline
Establishing a baseline model is a crucial first step in any machine learning project. It provides a performance benchmark, helps identify if simple methods suffice, and offers insights into task complexity, ultimately saving time and resources. Practitioners should implement straightforward methods that offer reasonable performance with minimal effort, such as logistic regression for classification, linear regression for regression tasks, or basic pre-trained models for NLP and computer vision tasks.
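As a minimal sketch (assuming a prepared feature matrix `X` and labels `y`), a scikit-learn baseline for a classification task might look like this:

```python
# Baseline classifier sketch: logistic regression on a held-out test split.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y  # X and y assumed to exist
)
baseline = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(classification_report(y_test, baseline.predict(X_test)))
```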
Once implemented, carefully evaluate and document the baseline model’s performance, architecture, and resource usage. Analyse these results to gauge the problem’s complexity and identify areas for improvement. Use this information to set realistic expectations with stakeholders and estimate the project’s potential impact.
Remember that the baseline is just the starting point. Use it as a foundation for iterative improvement, experimenting with more advanced techniques and model architectures as needed. This approach ensures efficient resource allocation and provides a solid framework for developing increasingly sophisticated machine learning solutions.
Further Modelling
Further modelling involves using a structured approach to iteratively build on the baseline and improve performance while working towards the defined success criteria. Practitioners should regularly check their model results against specific success criteria or set a firm deadline to prevent indefinite iteration. This ensures focused, efficient development and helps manage stakeholder expectations.
During this phase, practitioners should evaluate all models against the same test set used for the baseline, maintaining consistency in performance comparisons. Approaches may include training new models from scratch or fine-tuning pre-trained models, depending on the task and available resources.
For training new models, consider experimenting with more complex architectures or ensemble methods. When fine-tuning pre-trained models, especially in domains like NLP or computer vision, explore techniques such as gradual unfreezing or discriminative fine-tuning to optimise performance.
Throughout the process, document each experiment meticulously, recording hyperparameters, data preprocessing steps, and performance metrics (see Model Version Control) in order to keep track of the most promising directions for further improvement.
Regularly reassess progress against the defined success criteria or deadline. If improvements plateau or the deadline approaches, prepare to conclude the modelling phase and move towards model selection and deployment. Remember, the goal is to develop a model that sufficiently meets the project requirements, not necessarily to achieve state-of-the-art performance at the expense of project timelines or resources.
Model Version Control
Effective model version control is crucial for maintaining reproducibility, traceability, and efficiency in machine learning projects. Practitioners should leverage specialised tools like MLFlow to automate and streamline this process. MLFlow, or similar model management platforms, should be used to automatically record key information for each model iteration, including hyperparameters, training metrics, evaluation metrics, and model artefacts.
Importantly, each MLFlow run should also capture the git version of the training scripts used. This creates a crucial link between the model, its training code, and the data used (which should be version-controlled using tools like DVC). This ensures that for any given model, practitioners can reproduce the exact training conditions, trace the model back to specific code and data versions, compare different model versions effectively, and quickly identify the best-performing models for further analysis or deployment.
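A minimal tracking sketch along these lines, assuming a fitted scikit-learn model and illustrative parameter and metric values, might look like the following:

```python
# Log an experiment to MLFlow and tag it with the current git commit so the
# model can be traced back to the exact training code.
import subprocess

import mlflow
import mlflow.sklearn

git_sha = subprocess.check_output(["git", "rev-parse", "HEAD"]).decode().strip()

with mlflow.start_run():
    mlflow.set_tag("git_commit", git_sha)
    mlflow.log_params({"learning_rate": 1e-3, "epochs": 10})  # illustrative values
    # ... training happens here ...
    mlflow.log_metric("val_f1", 0.87)                         # illustrative value
    mlflow.sklearn.log_model(model, artifact_path="model")    # `model` assumed fitted
```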
Model Evaluation
Selecting appropriate evaluation metrics is a critical step in assessing model performance and ensuring alignment with business objectives. Practitioners should carefully consider the nature of their data and the specific requirements of their task when choosing evaluation metrics.
For classification tasks, it’s essential to look beyond simple accuracy, especially when dealing with imbalanced datasets. Metrics such as precision, recall, F1-score, and area under the ROC curve (AUC-ROC) often provide a more nuanced view of model performance. In cases of severe class imbalance, consider using the balanced accuracy or Matthews correlation coefficient.
For regression tasks, while mean squared error (MSE) and root mean squared error (RMSE) are common choices, mean absolute error (MAE) might be more appropriate when the target variable contains outliers. R-squared (R²) can provide insight into the proportion of variance explained by the model.
Practitioners should use visualisation techniques to get a clearer picture of model performance. Confusion matrices for classification tasks can reveal specific areas where the model excels or struggles. ROC curves and precision-recall curves offer a visual representation of the trade-offs between different performance aspects.
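As a brief sketch, assuming arrays of true labels `y_true`, predicted labels `y_pred`, and positive-class scores `y_score`, the relevant scikit-learn metrics can be computed as follows:

```python
# A fuller classification evaluation than accuracy alone.
from sklearn.metrics import (
    balanced_accuracy_score,
    classification_report,
    confusion_matrix,
    matthews_corrcoef,
    roc_auc_score,
)

print(classification_report(y_true, y_pred))    # precision, recall, F1 per class
print(confusion_matrix(y_true, y_pred))         # where the model confuses classes
print(balanced_accuracy_score(y_true, y_pred))  # robust to class imbalance
print(matthews_corrcoef(y_true, y_pred))        # single summary score for imbalanced data
print(roc_auc_score(y_true, y_score))           # needs scores/probabilities, not labels
```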
It’s crucial to interpret these metrics and visualisations within the context of the business problem. For instance, in a fraud detection scenario, recall might be prioritised over precision to ensure most fraudulent cases are caught, even at the cost of some false positives. Conversely, in a medical diagnosis context, high precision might be critical to avoid unnecessary treatments or anxiety.
In use cases where general purpose models are exposed directly to the user (e.g. a chatbot powered by an LLM), behavioural tests like those provided by the CheckList framework may be appropriate to reduce the risk that models will respond in an unpredictable or inappropriate way in the wild.
Model Explainability
As machine learning models become increasingly complex, the need for model explainability grows in importance. Practitioners must grapple with the challenge of interpreting and explaining the decisions of these often opaque “black box” models, particularly in domains where transparency is crucial for building trust or meeting regulatory requirements.
To address this challenge, practitioners can leverage a variety of explainability techniques and tools. LIME (Local Interpretable Model-agnostic Explanations) and SHAP (SHapley Additive exPlanations) are two widely-used approaches that can provide insights into individual predictions. These methods work by training interpretable meta-models that approximate the behaviour of the complex underlying model in the vicinity of specific data points. While these tools can offer valuable insights, practitioners must be aware of their limitations. The explanations provided are approximations and may not fully capture the nuances of the model’s decision-making process. It’s crucial to communicate these caveats clearly to stakeholders and end-users to prevent misinterpretation or over-reliance on these explanations.
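As a hedged sketch, a SHAP explanation for an already-fitted model (`model` and feature DataFrame `X` are assumed to exist) might look like this:

```python
# Explain a sample of predictions with SHAP and plot global feature influence.
import shap

explainer = shap.Explainer(model, X)    # unified entry point; picks a suitable algorithm
shap_values = explainer(X.iloc[:100])   # explain a sample of rows
shap.plots.beeswarm(shap_values)        # global view of which features drive predictions
```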
For Natural Language Processing (NLP) tasks, recent advancements in large language models have introduced new possibilities for explainability. Many modern NLP models can now generate human-readable rationales for their outputs, providing a form of self-explanation. Additionally, these models can often identify and extract relevant snippets from source documents, offering a clear link between the input data and the model’s decision.
Practitioners should consider incorporating these explainability techniques into their workflow, particularly when deploying models in high-stakes environments. By providing interpretable insights alongside model predictions, they can empower human decision-makers with additional context and confidence. However, it’s important to note that explainability remains an active area of research, and current methods may not be suitable for all types of models or use cases. Practitioners should stay informed about the latest developments in this field and carefully evaluate the appropriateness and limitations of explainability techniques for their specific applications.
MLOps and Deployment
Effective MLOps practices are crucial for seamlessly transitioning machine learning models from development to production and maintaining them throughout their lifecycle. Practitioners must consider various aspects of model deployment and management to ensure reliable, scalable, and maintainable ML systems.
Hardware Requirements and Resource Management
Understanding and documenting the hardware requirements of trained models is a critical first step in the deployment process. Practitioners should carefully assess and communicate the GPU, memory, and CPU requirements of their models. This information is essential for DevOps teams to provision appropriate resources and for stakeholders to understand the infrastructure costs associated with model deployment.
Standardised Deployment Process
To ensure consistency and reliability in model deployment, practitioners should establish a templatable, repeatable process. A common and effective approach is to encapsulate models within containers that expose a standardised API. Frameworks such as FastAPI or Flask, combined with containerisation technologies like Docker, provide an excellent foundation for creating self-contained modules that can be easily deployed and managed by other teams.
When developing these containerised solutions, practitioners should prioritise clear documentation and standardisation of API schemas. This documentation is crucial for facilitating seamless integration with other systems and for enabling other teams to effectively utilise the deployed models.
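As a minimal sketch of such a module (the model artefact name and request schema are illustrative assumptions), a FastAPI service might look like this:

```python
# A small prediction service that can be packaged into a Docker image.
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("model.joblib")  # artefact assumed to be produced at training time

class PredictionRequest(BaseModel):
    features: list[float]

@app.post("/predict")
def predict(request: PredictionRequest) -> dict:
    prediction = model.predict([request.features])[0]
    return {"prediction": float(prediction)}
```

Packaged in a container, a service like this exposes a single, documented endpoint (with an auto-generated OpenAPI schema) that other teams can integrate against.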
Automated Testing and Quality Assurance
To maintain the stability and reliability of deployed models, practitioners should implement comprehensive automated testing suites. These tests should verify not only the functionality of the API but also ensure that the model’s behaviour remains consistent across updates. By incorporating these tests into the deployment pipeline, practitioners can catch potential issues early and maintain the integrity of the production system.
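A hedged example of such a test, assuming the service above is importable as `service` and pytest is used as the test runner:

```python
# API test using FastAPI's test client: checks the contract, not just the code.
from fastapi.testclient import TestClient

from service import app  # hypothetical module containing the FastAPI app

client = TestClient(app)

def test_predict_returns_a_numeric_prediction():
    response = client.post("/predict", json={"features": [0.1, 0.2, 0.3]})
    assert response.status_code == 200
    assert isinstance(response.json()["prediction"], float)
```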
Load Testing and Model Optimisation
Collaboration with DevOps teams is essential for conducting thorough load and scale testing. These tests help ensure that the deployed solution can handle the expected system load and identify any performance bottlenecks. In cases where performance does not meet requirements, practitioners may need to employ model compression, quantisation, or optimisation techniques.
Frameworks like Huggingface Optimum and llama.cpp offer tools for reducing model size and improving inference speed, often with minimal impact on accuracy. Practitioners should work closely with stakeholders to determine acceptable trade-offs between model performance and resource utilisation, ensuring that the optimised models still meet the required accuracy thresholds for the specific use case.
Ongoing Monitoring and Logging
Effective MLOps practices extend beyond initial deployment to include comprehensive logging and monitoring strategies. These practices are crucial for maintaining model performance, detecting issues early, and continuously improving the system over time.
Practitioners should implement robust logging mechanisms to capture model requests and responses. By comparing model predictions with corresponding human judgments over time, practitioners can gain insights into real-world performance and identify when a model needs retraining or fine-tuning due to concept drift. To facilitate logging and monitoring, practitioners can use tools such as Prometheus and Grafana for data collection and visualisation. Purpose-built MLOps and LLOps platforms like Langfuse offer specialised features for monitoring and analysing the performance of language models in production.
In addition to performance monitoring, practitioners should implement liveness tests for their model servers. These tests help ensure continuous availability by automatically detecting issues and triggering restarts when necessary, minimising downtime and maintaining consistent service quality. These tests can also be flagged in reporting dashboards so that DevOps teams know when something is wrong.
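A minimal sketch of these hooks, assuming the FastAPI service sketched earlier and the prometheus_client library, might look like this:

```python
# Liveness endpoint plus Prometheus metrics exposed for scraping.
from fastapi import FastAPI
from prometheus_client import Counter, Histogram, make_asgi_app

app = FastAPI()
REQUESTS = Counter("prediction_requests_total", "Number of prediction requests")
LATENCY = Histogram("prediction_latency_seconds", "Prediction latency in seconds")

app.mount("/metrics", make_asgi_app())  # scraped by Prometheus, visualised in Grafana

@app.get("/health")
def health() -> dict:
    # A liveness probe can poll this and trigger a restart if it stops responding
    return {"status": "ok"}
```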
Key metrics to monitor may include model accuracy (or F1-score, etc.), response times, error rates, resource utilisation, and request volumes. This approach not only helps maintain optimal performance but also provides valuable insights for future model development and deployment strategies. Regular review of logs and metrics, combined with automated alerts for anomalies or performance degradation, ensures that practitioners can respond quickly to issues and make data-driven decisions about model updates or retraining.
LLM and VLM Specific Guidance
For teams working with Large Language Models, Vision Language Models, and NLP (+ vision) use cases that involve embeddings, generation, information extraction, etc.
Outside of large enterprise and Silicon Valley tech firms, practitioners are unlikely to be training their own new LLMs from scratch (this guide does not cover how to structure teams in those sorts of organisations). Therefore, teams should focus on staying informed about available third party models and their capabilities and weaknesses.
Model Fitness Testing and Evaluation
Practitioners should weigh models’ statistical performance against the cost to run and use them. Practitioners should also be aware of any applicable technical or legal limitations of specific models, and any ramifications for where models may or may not be used. For example, clients in the EU usually require that third party models are hosted in the EU to comply with GDPR.
Practitioners may use public leaderboards and metrics as a signpost. However, they should be aware that many of these benchmarks are easy to game and may not be applicable in the context of their product or project. Practitioners should instead build and maintain internal benchmarks and evaluation tasks showcasing task-specific model performance.
For example, a commonly used benchmark for LLMs is MMLU which evaluates generative models’ ability to answer high-school difficulty multiple choice questions across a broad set of STEM subjects. Performance at this task may be a useful signpost if you are training a question answering model but is not likely to yield helpful information about model performance in an information extraction setting. Furthermore, a model that performs poorly at MMLU could still perform well at question answering in a Retrieval Augmented Generation setting.
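A hedged sketch of such an internal benchmark is shown below; `generate(model_name, prompt)` is a placeholder for whatever model client the team uses, and the scoring rule (exact-match containment) is deliberately simple:

```python
# Run each candidate model over a curated, task-specific example set and score it.
def run_benchmark(model_names, examples, generate):
    scores = {}
    for name in model_names:
        correct = sum(
            example["expected"].lower() in generate(name, example["prompt"]).lower()
            for example in examples
        )
        scores[name] = correct / len(examples)
    return scores  # e.g. {"model-a": 0.82, "model-b": 0.74}
```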
Prompt Engineering, Management and Improvement
Prompt engineering, while widely discussed, remains a challenging and inconsistent practice in machine learning. Its lack of theoretical foundation, high variability between models and runs, and time-intensive nature make it a questionable investment for practitioners seeking reliable, scalable solutions. The field’s sensitivity to minor wording changes, and the need for model-specific optimisations, further complicate its practical application.
Instead of focusing heavily on manual prompt engineering, ML practitioners are encouraged to explore automated approaches like DSPy and other emerging tools that can systematically optimise prompts. Additionally, emphasis should be placed on developing robust evaluation frameworks, integrating prompting with other techniques like fine-tuning or retrieval-augmented generation, and maintaining careful documentation and versioning of prompts used in production systems. As the field evolves, research into more stable prompting techniques and less prompt-sensitive models may eventually reduce the need for extensive prompt engineering.
Retrieval Augmented Generation
RAG, or Retrieval-Augmented Generation, is a technique that combines the power of large language models with external knowledge retrieval. In this approach, when a query is received, relevant information is first retrieved from a knowledge base or document collection. This retrieved information is then used to augment the input to the language model, providing it with additional context and facts to generate a more informed and accurate response.
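A minimal sketch of this flow is shown below; `embed()` and `generate()` are placeholders for whichever embedding model and LLM client the team has chosen:

```python
# Retrieve the most similar documents by cosine similarity and prepend them
# to the prompt before generation.
import numpy as np

def answer_with_rag(query, documents, embed, generate, top_k=3):
    doc_vectors = np.array([embed(d) for d in documents])
    query_vector = np.array(embed(query))
    similarity = doc_vectors @ query_vector / (
        np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vector)
    )
    context = "\n\n".join(documents[i] for i in np.argsort(similarity)[::-1][:top_k])
    prompt = (
        f"Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return generate(prompt)
```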
Managing and Minimising Hallucinations
All generative models are essentially sophisticated pattern recognition systems and lack true understanding of the subject matter they are generating text about. They are prone to generating plausible-sounding but factually incorrect information – a phenomenon known as “hallucination.” All generative models, including those using advanced techniques like Retrieval-Augmented Generation (see Retrieval Augmented Generation above), are susceptible to this issue. While RAG can reduce hallucinations, it doesn’t eliminate them entirely.
To manage and minimise hallucinations, practitioners can employ various strategies. Verification techniques such as substring testing (does the answer that the model gave appear verbatim in the source document?) and factuality testing can help ensure the generated answers align with source documents. Controlled generation, through structured outputs or predefined vocabularies, can limit the model’s ability to produce invalid responses. Careful prompt optimisation and few-shot learning examples can guide the model towards more accurate outputs.
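As a simple illustration, a substring verification check might normalise whitespace and case before testing containment; anything that fails the check can be flagged for review:

```python
# Flag generated answers that do not appear verbatim in the source document.
import re

def appears_in_source(answer: str, source: str) -> bool:
    normalise = lambda text: re.sub(r"\s+", " ", text.strip().lower())
    return normalise(answer) in normalise(source)
```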
For critical applications, human-in-the-loop verification serves as a final safeguard, albeit resource-intensive. Importantly, practitioners should always communicate the limitations of LLMs to end-users and stakeholders, emphasising the potential for hallucinations and the need for critical evaluation of model outputs.
While these strategies can significantly reduce hallucinations, no method can completely eliminate them. Practitioners should implement a combination of techniques, tailored to their specific use case and risk tolerance. Regular monitoring and updating of these safeguards is crucial as LLM capabilities and mitigation techniques continue to evolve.
Prompt Injections and Attacks
Prompt injection attacks occur when malicious users craft inputs that manipulate the model into performing unintended actions or revealing sensitive information. To mitigate this risk, practitioners should implement robust input sanitisation and validation processes. It’s crucial to avoid allowing end-users to directly prompt the model without any intermediary checks or filters. Instead, user inputs should be carefully cleansed and validated before being passed to the model.
Information exfiltration is another significant concern, where attackers may attempt to extract sensitive or proprietary information from the model. To prevent this, practitioners should carefully curate the training data and fine-tune models to avoid incorporating sensitive information. Additionally, implementing strict output filters can help prevent the model from inadvertently revealing confidential data.
A critical best practice is to never directly link model outputs to security-critical processes without a verification step. For example, a model could potentially be tricked into generating commands that delete critical files or send data to an attacker. All model-generated actions that interact with sensitive systems or data should be subject to human review or additional automated checks before execution.
Practitioners should also consider implementing rate limiting, user authentication, and logging mechanisms to detect and prevent potential abuse. Regular security audits of the entire system, including the model, its inputs, and outputs, can help identify and address vulnerabilities.
LLM Fine Tuning and Adapter Training
Recent advancements in language model adaptation techniques have opened up new possibilities for practitioners to optimise model performance for specific tasks. Notably, methods like QLoRA (Quantised Low-Rank Adaptation) have demonstrated that smaller language models, when properly fine-tuned, can outperform larger frontier models on specific tasks. These techniques offer a cost-effective approach to model customisation, often requiring only a single machine for training.
Adapter training involves adding small, trainable modules to a pre-trained language model while keeping the original model parameters frozen. This approach allows for task-specific optimisation without the need for full model fine-tuning, significantly reducing computational requirements and training time.
To effectively implement adapter training, practitioners should follow a standard workflow:
- Data Preparation: Curate a high-quality dataset specific to the target task. This dataset should be split into training, validation, and test sets.
- Baseline Evaluation: Establish benchmark performance metrics using the pre-trained model without adapters on the test set.
- Adapter Configuration: Choose an appropriate adapter architecture and hyperparameters. For QLoRA, this involves selecting quantisation parameters and determining the rank of the low-rank approximation (a minimal configuration sketch follows this list).
- Training: Train the adapter modules on the training set, using the validation set to monitor for overfitting and adjust hyperparameters as needed.
- Evaluation: Assess the adapted model’s performance on the test set, comparing it to the baseline metrics established earlier.
- Iteration: If necessary, refine the adapter configuration or training process based on the evaluation results.
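As a hedged illustration of the adapter configuration step, the sketch below uses the Hugging Face transformers and peft libraries; the base model name and hyperparameter values are illustrative assumptions, not recommendations:

```python
# QLoRA-style setup: load the base model in 4-bit and attach LoRA adapters.
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",  # illustrative base model
    quantization_config=quant_config,
)

adapter_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,  # illustrative hyperparameters
    target_modules=["q_proj", "v_proj"],     # attention projections to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, adapter_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```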
Practitioners should pay particular attention to the quality and relevance of the training data, as this significantly impacts the effectiveness of the adaptation. Additionally, careful hyperparameter tuning can lead to substantial improvements in adapter performance.
The cost-effectiveness and efficiency of adapter training make it an attractive option for practitioners looking to customise large language models for specific applications. By leveraging these techniques, organisations can achieve state-of-the-art performance on targeted tasks without the need for extensive computational resources or the challenges associated with full model fine-tuning.
As the field of adapter training continues to evolve, practitioners should stay informed about new developments and best practices. This approach to model adaptation represents a powerful tool in the ML practitioner’s toolkit, enabling rapid, resource-efficient customisation of large language models for diverse applications.