By Tim Leers
The race is on to harness the potential of Large Language Models (LLMs) in enterprises. Right now, adopting LLMs carries significant risk in many use cases, without a clear path to deploying them in a way that delivers business value. In part, that is because the broad principles that drive value creation in traditional machine learning (ML) model deployment and operations (MLOps) aren't directly transferable to LLM operations (LLMOps).
In collaboration with Dataroots research & Talan research, we identified the following challenges and open questions in this area, which we will be actively tackling in the near future:
Data privacy & intellectual property protection: In an era where data privacy regulations are increasingly stringent, it's important to implement robust measures to maintain the confidentiality and integrity of data used to train and fine-tune large language models (LLMs). This involves scrubbing sensitive information from training data, implementing effective differential privacy techniques, and ensuring that the model doesn't memorise and subsequently leak confidential or proprietary information. Also, it's vital to establish clear ownership rights for the generated outputs.
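As a minimal illustration of the scrubbing step, the sketch below redacts a few common PII patterns from training text with regular expressions. The pattern set and function name are illustrative assumptions; a production pipeline would typically combine this with NER-based detection, differential-privacy training, and human review.

```python
import re

# Illustrative (and deliberately naive) regex patterns for common PII.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "PHONE": re.compile(r"\+?\d[\d .-]{7,}\d"),  # long digit runs, loosely
    "IBAN": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
}

def scrub(text: str) -> str:
    """Replace matched PII spans with typed placeholders before training."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(scrub("Contact jane.doe@example.com or +32 478 12 3456."))
# -> "Contact [EMAIL] or [PHONE]."
```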
Prompt management: This refers to the challenge of designing, validating, and managing the prompts that are used to generate outputs from the LLM. This may involve developing a formal process for creating and testing prompts, as well as establishing a versioning system to keep track of changes. Given the creativity and subtlety involved in crafting effective prompts, there may be a role for specialised "prompt engineers" or artists.
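One lightweight way to make this concrete, sketched below with assumed naming, is to treat each prompt as a versioned, templated asset with pinned test cases, so that prompt changes go through the same review and regression flow as code changes.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class PromptVersion:
    """A prompt template treated as a versioned asset, like code."""
    name: str
    version: str
    template: str
    # Pinned (input, expected-substring) pairs used as regression tests.
    test_cases: list = field(default_factory=list)

    def render(self, **variables) -> str:
        return self.template.format(**variables)

summarise_v2 = PromptVersion(
    name="ticket-summary",
    version="2.1.0",
    template="Summarise the support ticket below in one sentence.\n\nTicket:\n{ticket}",
    test_cases=[("Printer on floor 3 is jammed.", "printer")],
)

print(summarise_v2.render(ticket="Printer on floor 3 is jammed."))
```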
Data asset interactions: LLMs often need to interact with various data assets in order to generate meaningful and accurate outputs. This might involve retrieving information from databases, integrating with other software systems, or interfacing with user-provided data. Deterministic pipelines or business rules could be used to manage these interactions, but it's important to ensure that these approaches can handle the complexity and unpredictability of natural language processing tasks.
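A minimal sketch of such a deterministic pipeline step is shown below: a lookup against a system of record grounds the prompt in known facts, and a business rule refuses to proceed when the data is missing. The in-memory "database" and prompt wiring are stand-ins for real integrations.

```python
# A deterministic data-access step that grounds the LLM prompt in a
# system of record, instead of letting the model guess at facts.
CUSTOMER_DB = {  # stand-in for a real database or API call
    "C-1042": {"name": "Acme NV", "plan": "enterprise", "open_tickets": 3},
}

def build_prompt(customer_id: str, question: str) -> str:
    record = CUSTOMER_DB.get(customer_id)
    if record is None:
        # Business rule: refuse rather than let the model hallucinate.
        raise ValueError(f"Unknown customer {customer_id}")
    context = ", ".join(f"{k}={v}" for k, v in record.items())
    return (
        "Answer using ONLY the facts provided.\n"
        f"Facts: {context}\n"
        f"Question: {question}"
    )

print(build_prompt("C-1042", "How many tickets does this customer have open?"))
```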
Evaluation: Evaluating the performance of LLMs is a complex task that goes beyond standard accuracy metrics. It might be necessary to establish a "golden test set" that includes a diverse range of language tasks and challenges. Furthermore, "watcher models" could be used to monitor the behaviour of the LLM in real time, flagging potential issues related to bias, fairness, or inappropriate content.
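The skeleton below shows what scoring against a golden test set can look like. The exact-match metric and the `model` callable are simplifying assumptions; real setups would plug in semantic or model-based scoring.

```python
# Evaluating against a "golden test set": fixed prompts with reference
# answers, scored with a task-appropriate metric (substring match here
# for simplicity).
GOLDEN_SET = [
    {"prompt": "Capital of Belgium?", "reference": "Brussels"},
    {"prompt": "2 + 2 =", "reference": "4"},
]

def evaluate(model, golden_set) -> float:
    hits = 0
    for case in golden_set:
        answer = model(case["prompt"])
        hits += case["reference"].lower() in answer.lower()
    return hits / len(golden_set)

# `model` is any callable str -> str, e.g. a wrapper around an API client.
stub_model = lambda prompt: "Brussels" if "Belgium" in prompt else "unsure"
print(f"golden-set accuracy: {evaluate(stub_model, GOLDEN_SET):.0%}")
```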
Re-training & fine-tuning: To keep up with evolving language usage and to continuously improve performance, LLMs often need to be re-trained and fine-tuned on new data. This requires robust tools and processes for managing the training data, the training process, and the versioning of models. Leveraging techniques like LoRA (Low-Rank Adaptation) and developing performance benchmarks can be beneficial in this context.
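As an indication of how lightweight LoRA-style fine-tuning can be, here is a minimal setup assuming the Hugging Face `transformers` and `peft` libraries; the base model and target module names are illustrative and depend on the architecture you adapt.

```python
# Minimal LoRA fine-tuning setup, assuming Hugging Face `peft` and
# `transformers`; target module names vary per model architecture.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")

lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=32,                        # scaling factor for the adapter weights
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only a small fraction of the base model
```

Because only the adapter weights are trained, each re-training cycle produces a small, separately versionable artefact rather than a full model copy, which simplifies the model-versioning problem described above.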
Foundation vs. expert models: Depending on the use case, it might be necessary to switch between different models: for example, between a "foundation" model that has been trained on a wide range of data and an "expert" model that has been fine-tuned for a specific task. This requires a robust system for managing these different models, as well as rigorous testing to ensure that third-party models meet the necessary quality and reliability standards.
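At its simplest, such a system is a routing layer in front of the models. The registry and model names below are purely illustrative.

```python
# Illustrative model registry: route a request to a fine-tuned expert
# when one exists for the task, otherwise fall back to the foundation model.
MODEL_REGISTRY = {
    "default": "foundation-model-v1",       # broad, general-purpose
    "contract-review": "legal-expert-v3",   # fine-tuned specialist
    "ticket-triage": "support-expert-v2",
}

def route(task: str) -> str:
    return MODEL_REGISTRY.get(task, MODEL_REGISTRY["default"])

assert route("contract-review") == "legal-expert-v3"
assert route("poetry") == "foundation-model-v1"
```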
Drift & follow-up: Over time, the performance of LLMs can "drift" as the distribution of inputs changes or as the model's behaviour changes due to ongoing learning. It's important to have systems in place for detecting and managing this drift, which could involve automated responses or human intervention.
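One simple drift signal among many, sketched below, is to compare the distribution of a cheap input statistic (prompt length here) between a reference window and the live window with a two-sample Kolmogorov-Smirnov test from `scipy`; the sample values and alert threshold are made up for illustration.

```python
# Compare a reference window against the live window with a
# two-sample Kolmogorov-Smirnov test to flag input drift.
from scipy.stats import ks_2samp

reference_lengths = [42, 55, 38, 61, 47, 50, 44, 58]          # prompt lengths at launch
current_lengths = [120, 135, 110, 140, 128, 132, 118, 125]    # prompt lengths now

stat, p_value = ks_2samp(reference_lengths, current_lengths)
if p_value < 0.01:
    print(f"input drift detected (KS={stat:.2f}, p={p_value:.4f}); alert a human")
```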
Adversarial attacks & misaligned input: Like any machine learning model, LLMs can be vulnerable to adversarial attacks, where malicious actors attempt to trick the model into behaving in undesirable ways. Similarly, misaligned inputs (where the user's intent does not align with the model's objectives) can also cause problems. Developing robust security measures and input validation techniques can help to mitigate these risks.
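As a flavour of what input validation can look like, the sketch below blocks a few obvious prompt-injection phrases before they reach the model. The phrase list is a naive assumption; real defences layer many such checks with classifier-based and structural safeguards.

```python
# A naive input-validation layer: block obvious prompt-injection phrases
# before they reach the model.
SUSPICIOUS_PHRASES = (
    "ignore previous instructions",
    "disregard the system prompt",
    "reveal your instructions",
)

def validate(user_input: str) -> str:
    lowered = user_input.lower()
    for phrase in SUSPICIOUS_PHRASES:
        if phrase in lowered:
            raise ValueError("potential prompt injection detected")
    return user_input

validate("Summarise this report for me.")           # passes
# validate("Ignore previous instructions and ...")  # raises ValueError
```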
These are complex challenges that require a multidisciplinary approach, combining expertise in machine learning, natural language processing, software engineering, and data privacy. The solutions will likely involve both technical developments and organisational changes, such as the establishment of new roles and processes. Collaboration and knowledge sharing will be key to addressing these challenges effectively.
Do these challenges resonate with your experiences in using LLMs in your company or project? We'd love to hear from you if you have any questions or would like to collaborate.