The business risk of using LLM services
Using third-party LLM services poses significant security and privacy risks for businesses. There is genuine cause for concern about employees sharing personally identifiable information and sensitive business data with LLM services, since providers may use input data to improve their models. Several large corporations (Amazon, JPMorgan, Microsoft, Walmart) have restricted access or warned their employees about entering confidential information into ChatGPT.
Another concern is the increasing complexity of compliance related to LLM services. Earlier this month, OpenAI accidentally exposed the conversations of its users, a GDPR violation. And recently the Italian Data Protection Authority banned ChatGPT in Italy and launched an investigation into a suspected breach of privacy rules.
Another important aspect to understand is the cost of using LLM services. Most LLM services use a consumption pricing model that charges based on the amount of text (measured in tokens) exchanged between an application and the AI. For some businesses, OpenAI costs are already surpassing AWS costs as the largest line item in their cloud infrastructure expenses.
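To make the consumption model concrete, the sketch below estimates a monthly bill from token volume. The per-token price and traffic figures are hypothetical placeholders, not actual vendor pricing.

```python
# Back-of-the-envelope estimate of consumption-based LLM costs.
# All figures are illustrative assumptions, not actual vendor pricing.

PRICE_PER_1K_TOKENS = 0.002   # assumed price in USD per 1,000 tokens
REQUESTS_PER_DAY = 50_000     # assumed application traffic
TOKENS_PER_REQUEST = 1_500    # assumed prompt + completion length

monthly_tokens = REQUESTS_PER_DAY * TOKENS_PER_REQUEST * 30
monthly_cost = monthly_tokens / 1_000 * PRICE_PER_1K_TOKENS

print(f"Tokens per month: {monthly_tokens:,}")
print(f"Estimated cost:   ${monthly_cost:,.2f}")
```

Even at this hypothetical traffic level the estimate lands in the thousands of dollars per month, which is why token volume deserves the same scrutiny as any other cloud line item.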
Clearly, if large businesses are to successfully deploy LLMs at scale and maximize ROI on their LLM investments, they must develop a strategy that carefully addresses the risks associated with LLMs.
Expert LLMs bring more value to businesses
LLMs trained on generic data have a general linguistic understanding. Training on public data can only get you so far for complex, enterprise-specific tasks. Specialized domains like law or medicine contain lots of complex domain knowledge that is not present within a generic pre-training text data set. Therefore, a generic LLM like ChatGPT is less likely to successfully generate legal documents or summarize medical information. As a matter of fact, ChatGPT loses to specialized fine-tuned models 75% of the time, according to a recent paper.
For many businesses, the valuable uses of LLMs require that the model be taught new behavior relevant to a particular application. For such an application, we need to create an LLM with deeper knowledge of the particular domain the business is interested in.
A sensible approach is to start with a generically pre-trained LLM and then perform further model tuning on domain-specific data. By learning from a more specific text data source, the model captures more relevant information, enabling more specialized behavior. This can include further conditioning the LLM on example prompts that match the use cases it will encounter in the domain.
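As a rough illustration of this approach, the sketch below continues training a generic pre-trained model on a domain-specific corpus using the Hugging Face Transformers library. The base model name, corpus path, and hyperparameters are illustrative assumptions rather than a recommended recipe.

```python
# Minimal sketch: further tuning a generic pre-trained LLM on domain-specific text.
# Model name, file path, and hyperparameters are illustrative assumptions.
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)
from datasets import load_dataset

base_model = "gpt2"  # placeholder for any generic pre-trained causal LM
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model)

# Domain-specific corpus, e.g. internal legal or medical documents (hypothetical path).
dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="expert-llm", num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("expert-llm")
```

The same pattern extends to instruction-style tuning by replacing the raw text corpus with curated prompt and response examples from the target use case.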
Examples of expert LLMs
Bloomberg, a leading provider of data analysis and information services to financial professionals around the world, has announced BloombergGPT, a 50-billion parameter LLM purpose-built for finance. The model has been trained on a wide range of financial data to support a diverse set of tasks within the financial industry.
Europol, a law enforcement agency of the European Union (EU) that serves as a center for criminal intelligence and coordination among the member countries, recommends in their latest report on LLMs that: “Law enforcement agencies may want to explore possibilities of customized LLMs trained on their own, specialized data, to leverage this type of technology for more tailored and specific use”.
GitHub Copilot, an expert LLM built on OpenAI’s Codex model, already writes 46% of the code on GitHub and helps developers code up to 55% faster. GitHub has now launched Copilot X, an AI assistant that spans the entire development lifecycle and is expected to improve developer productivity even further.
Adopting a strategy based on expert LLMs presents significant benefits for businesses. By customizing an LLM to fit the organization and its unique use case, the business can better serve employees and customers. Businesses also gain increased control over potential risks, and can even reduce them compared to using an LLM service. Taking charge of model hosting minimizes the chance of data leaks, simplifies management of compliance needs, and gives better control of the costs linked to LLM deployment.
Removing risks with expert LLMs
Right from the outset, Scaleout has focused on delivering privacy-enhancing technologies (PETs) to customers. Our technology solution minimizes the risk of data leaks outside the organization and addresses compliance concerns by offering essential features for the development of private expert LLMs:
- An MLOps platform for managing the life cycle of expert LLM services in an organization, reducing compliance risks and the risk of data leaks
- Secure Sweden-based data centers for EU data privacy compliance, or alternatively on-premise deployment
- Federated learning, which makes it possible to tune language models on data sources spread across an organization with a complex data environment (see the sketch after this list).
- Privacy-enhancing technologies, such as differential privacy, to reduce the vulnerabilities of ML models.
- Confidence intervals, an added safety measure to maintain the quality of an LLM’s generated responses and actions.
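To make the federated learning point more concrete, the sketch below shows a generic federated-averaging (FedAvg) round in PyTorch: each data owner fine-tunes a local copy of the model on its own data, and only the resulting weights are averaged centrally, so raw text never leaves the site. This is a conceptual illustration under assumed PyTorch models and data loaders, not Scaleout’s actual framework API.

```python
# Conceptual sketch of federated averaging (FedAvg) for model tuning.
# Each site trains locally; only model weights are shared and averaged.
import copy
import torch

def local_update(model, data_loader, epochs=1, lr=1e-4):
    """Train a copy of the global model on one site's private data."""
    local_model = copy.deepcopy(model)
    optimizer = torch.optim.AdamW(local_model.parameters(), lr=lr)
    for _ in range(epochs):
        for inputs, labels in data_loader:
            optimizer.zero_grad()
            loss = torch.nn.functional.cross_entropy(local_model(inputs), labels)
            loss.backward()
            optimizer.step()
    return local_model.state_dict()

def federated_round(global_model, site_loaders):
    """One round: every site trains locally, the server averages the weights."""
    site_weights = [local_update(global_model, loader) for loader in site_loaders]
    averaged = {}
    for name, tensor in site_weights[0].items():
        if tensor.is_floating_point():
            averaged[name] = torch.stack([w[name] for w in site_weights]).mean(dim=0)
        else:
            averaged[name] = tensor  # non-float buffers are taken from the first site
    global_model.load_state_dict(averaged)
    return global_model
```

In practice, privacy-enhancing techniques such as differential privacy can additionally be applied to the shared updates before aggregation.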
Conclusion
Undoubtedly, if businesses are to successfully deploy LLMs at scale and maximize the return on investment of their LLM efforts, they have to develop a strategy that carefully addresses the risks associated with LLM services. At Scaleout we have developed a technology solution combining an MLOps platform, secure data centers, and privacy-enhancing features. This combination provides an excellent foundation for securely deploying accessible, sophisticated LLMs for businesses of all sizes. Talk to us to learn more!