
GenAI: The importance of good data management

Lessons learned to ensure optimal results when using LLMs


Businesses are using more data than ever – recent estimates suggest that 328.8m terabytes of data are created each day. Companies are increasingly using behavioral insights and predictive analytics to drive sales, and some have built entire business models around using consumer data effectively. Effective data management is more critical than ever, as this data is scattered across various platforms (such as SharePoint, OneDrive and local personal folders) and more and more people work remotely and access files from outside the office.

Generative AI (GenAI) provides personalized models that give immediate access to large volumes of internal data through simple conversational prompts. The possibilities are nearly endless. However, the successful use of GenAI solutions hinges on a robust data management strategy that ensures the accuracy, security and efficiency of the data used. Below, we look at some of the key lessons we have learned about robust and effective data management.

DATA GOVERNANCE

Increasingly scattered data storage poses significant challenges for AI models, and consolidating this data is crucial for efficient data access and management. Centralizing data storage not only improves accessibility and reduces the time spent searching for information; it also brings together the data needed for successful AI usage. While this can be difficult for large organizations with legacy systems, it is an important process that involves selecting the right data warehouse, integrating multiple data sources and making sure data is secure and compliant.
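One small, concrete part of consolidation is spotting the same file stored in several places. The sketch below is a minimal illustration, assuming documents have already been exported from each platform as plain text; the `Document` type, the source labels and the helper names are hypothetical, and real connectors for SharePoint or OneDrive are not shown.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    source: str  # hypothetical label, e.g. "sharepoint", "onedrive", "local"
    path: str
    text: str


def content_hash(text: str) -> str:
    """Hash whitespace- and case-normalized text so copies of the same file match."""
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def consolidate(sources: dict[str, list[Document]]):
    """Merge documents from several sources into one store.

    Keeps one copy of each distinct content and records every location where
    that content was found, so duplicates can be retired later.
    """
    store: dict[str, Document] = {}
    locations: dict[str, list[str]] = {}
    for source_name in sorted(sources):  # sorted for a deterministic result
        for doc in sources[source_name]:
            key = content_hash(doc.text)
            store.setdefault(key, doc)
            locations.setdefault(key, []).append(f"{doc.source}:{doc.path}")
    return store, locations


# Usage with made-up documents: the travel policy exists in two places.
sources = {
    "sharepoint": [Document("sharepoint", "/policies/travel.docx", "Travel policy v2")],
    "onedrive": [
        Document("onedrive", "/me/travel-copy.docx", "Travel  policy v2"),
        Document("onedrive", "/me/notes.txt", "Meeting notes"),
    ],
}
store, locations = consolidate(sources)
print(len(store))  # 2 distinct documents
```

Hashing normalized text catches exact copies only; near-duplicates (for example, two slightly different versions of a policy) would need a fuzzier comparison.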


DATA CLASSIFICATION

Of course, not all data should be accessible to everyone all the time. Implementing strict data classification helps determine access levels and ensures that sensitive information is protected. Integrating data classification protocols with AI systems helps maintain security and comply with privacy regulations, and it also lets data providers see what data is being used.
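As an illustration of wiring classification into an AI system, the sketch below filters documents by a user's clearance before they reach the model and logs what was handed over. It is a minimal sketch under stated assumptions: the four `Classification` levels, the `retrieve_for_user` helper and the log format are all hypothetical, not a description of any specific product.

```python
from dataclasses import dataclass
from enum import IntEnum


class Classification(IntEnum):
    """Hypothetical sensitivity levels; a higher value is more restricted."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


@dataclass(frozen=True)
class Document:
    doc_id: str
    classification: Classification


def retrieve_for_user(docs, user, clearance, usage_log):
    """Return only documents at or below the user's clearance, and log each
    document handed to the AI so data owners can see what is being used."""
    allowed = [d for d in docs if d.classification <= clearance]
    for d in allowed:
        usage_log.append((user, d.doc_id))
    return allowed


docs = [
    Document("staff-handbook", Classification.PUBLIC),
    Document("product-roadmap", Classification.CONFIDENTIAL),
    Document("salary-bands", Classification.RESTRICTED),
]
log = []
allowed = retrieve_for_user(docs, "alice", Classification.INTERNAL, log)
print([d.doc_id for d in allowed])  # ['staff-handbook']
```

Filtering before retrieval, rather than asking the model to withhold sensitive content, means restricted text never enters the prompt at all.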

CHANGE MANAGEMENT

AI systems often struggle with outdated data. Without a process to notify or retrain AI models when new versions of data are released, users are far more likely to receive obsolete information, and the process is more likely to break down. Establishing a robust change management process that keeps AI models updated with the latest data helps ensure accuracy and relevance.
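A change management process needs some way to notice that a new version exists. One possible approach, sketched here under the assumption that the AI system keeps a fingerprint of each document it last ingested, is to compare those fingerprints with the current files and list what must be refreshed. The function names and the added/changed/removed split are illustrative; what happens next (re-indexing or queueing for retraining) depends on the system.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Content fingerprint used to detect new versions of a document."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_refresh(indexed: dict, current: dict) -> dict:
    """Compare the fingerprints recorded at the last AI ingestion with the
    current documents.

    indexed: doc_id -> fingerprint stored when the AI last ingested the document
    current: doc_id -> current document text
    """
    added = sorted(d for d in current if d not in indexed)
    changed = sorted(
        d for d in current if d in indexed and fingerprint(current[d]) != indexed[d]
    )
    removed = sorted(d for d in indexed if d not in current)
    return {"added": added, "changed": changed, "removed": removed}


indexed = {
    "expense-policy": fingerprint("Limit: 300 EUR"),
    "faq": fingerprint("How do I book travel?"),
    "old-guide": fingerprint("Retired guidance"),
}
current = {
    "expense-policy": "Limit: 500 EUR",  # a new version was published
    "faq": "How do I book travel?",
    "onboarding": "Welcome pack",
}
print(plan_refresh(indexed, current))
# {'added': ['onboarding'], 'changed': ['expense-policy'], 'removed': ['old-guide']}
```

Removed documents matter as much as changed ones: if they stay in the AI's index, users can still receive retired guidance.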

DATA QUALITY

During SPS’ own AI implementation, we have often encountered contradictory statements within the documents used to train AI. This inconsistency can impact results, as AI might return conflicting information. Prioritizing data cleansing to remove or correct corrupt, inaccurate, or irrelevant data ensures consistent and accurate data for reliable outputs. AI can actually be used effectively in this process to spot patterns and inconsistencies. 
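To make the idea of spotting inconsistencies concrete, the sketch below flags any subject and attribute that different documents give different values for. It assumes the statements have already been extracted into structured tuples (possibly by an AI model, as described above); that extraction step, the tuple layout and the example facts are hypothetical.

```python
from collections import defaultdict


def find_contradictions(facts):
    """facts: iterable of (doc_id, subject, attribute, value) tuples.

    Returns {(subject, attribute): {value: [doc_ids]}} for every pair that is
    given more than one distinct value across the documents.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for doc_id, subject, attribute, value in facts:
        seen[(subject, attribute)][value].append(doc_id)
    return {key: dict(values) for key, values in seen.items() if len(values) > 1}


facts = [
    ("hr-guide-2022.pdf", "annual leave", "days", "25"),
    ("hr-guide-2024.pdf", "annual leave", "days", "28"),
    ("hr-guide-2024.pdf", "notice period", "weeks", "4"),
]
print(find_contradictions(facts))
# {('annual leave', 'days'): {'25': ['hr-guide-2022.pdf'], '28': ['hr-guide-2024.pdf']}}
```

Listing the source documents for each conflicting value tells the data owner exactly which files to correct or retire during cleansing.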

QUALITY-CHECKED DATA SETS

For AI to perform at its best, the training data provided must be high-quality and diverse. Using accurately tagged datasets that reflect real-world scenarios reduces biases and hallucinations in AI outputs. We have found that rigorous quality checks of all training data and regular fine-tuning of reference datasets help enhance AI performance.
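A few of these quality checks can be automated before a tagged dataset is used. The sketch below assumes examples are simple (text, label) pairs and checks for empty text, labels outside an agreed set and one label dominating the data (a rough proxy for lack of diversity). The function name and the 60% `max_share` threshold are hypothetical choices for illustration.

```python
from collections import Counter


def check_labeled_dataset(examples, allowed_labels, max_share=0.6):
    """examples: list of (text, label) pairs.

    Returns a list of human-readable problems; an empty list means every
    check passed.
    """
    problems = []
    for i, (text, label) in enumerate(examples):
        if not text.strip():
            problems.append(f"example {i}: empty text")
        if label not in allowed_labels:
            problems.append(f"example {i}: unknown label {label!r}")
    # Class balance, counted over examples with a valid label only.
    counts = Counter(label for _, label in examples if label in allowed_labels)
    total = sum(counts.values())
    for label, n in counts.items():
        if total and n / total > max_share:
            problems.append(f"label {label!r} makes up {n / total:.0%} of examples")
    return problems


examples = [
    ("Invoice from ACME", "invoice"),
    ("Contract renewal notice", "contract"),
    ("", "invoice"),
    ("Payment reminder", "invoic"),  # typo in the tag
]
for problem in check_labeled_dataset(examples, {"invoice", "contract"}):
    print(problem)
```

Automated checks like these catch mechanical errors; whether the examples genuinely reflect real-world scenarios still needs a human reviewer.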

CONTINUOUS MONITORING AND FEEDBACK

Ongoing monitoring of AI outputs against benchmark data is essential for robust results. This process involves periodic checks to detect anomalies or deviations and helps make sure that AI remains accurate and reliable over time. User and system feedback are invaluable for this fine-tuning of AI models. This might involve regular updates to the training set with new examples to help improve accuracy. Integrating feedback loops will help address misinterpretations. 
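A periodic benchmark check could look like the sketch below: run a fixed set of questions through the AI, compare against the expected answers and raise an alert when accuracy drops. This is a minimal sketch under stated assumptions: `answer_fn` stands in for the real AI system, exact matching after trimming and lowercasing is a deliberately simple comparison, and the 0.9 threshold is hypothetical.

```python
def evaluate_against_benchmark(answer_fn, benchmark, threshold=0.9):
    """Run every benchmark question through the AI and compare with the
    expected answer (exact match after trimming and lowercasing).

    Returns (accuracy, failures, alert), where alert is True when accuracy
    falls below the threshold and the deviation should be investigated.
    """
    if not benchmark:
        raise ValueError("benchmark must not be empty")
    failures = []
    for question, expected in benchmark:
        answer = answer_fn(question)
        if answer.strip().lower() != expected.strip().lower():
            failures.append((question, expected, answer))
    accuracy = 1 - len(failures) / len(benchmark)
    return accuracy, failures, accuracy < threshold


benchmark = [
    ("How many days of annual leave?", "28"),
    ("What is the notice period?", "4 weeks"),
    ("What is the hotel expense limit?", "500 EUR"),
]
# Stand-in for the real AI system: a fixed lookup of its current answers.
current_answers = {
    "How many days of annual leave?": "28",
    "What is the notice period?": "4 Weeks",
    "What is the hotel expense limit?": "300 EUR",
}
accuracy, failures, alert = evaluate_against_benchmark(current_answers.get, benchmark)
print(f"{accuracy:.2f}", alert)  # 0.67 True
```

The `failures` list is the useful output for the feedback loop: each entry is a concrete example that can be reviewed and, if appropriate, added to the training or reference set.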
 
 

MULTI-LAYERED VALIDATION SYSTEMS

AI-generated outputs should always undergo multiple validation checks. One of our key learnings is to employ a multi-tier validation system to ensure accuracy in both data capture and output. At SPS, AI-generated outputs undergo additional checks by secondary models or human reviewers to catch and correct errors. For instance, when our GenAI solution extracts and validates contextual data from documents, a secondary AI model cross-verifies a random selection of the datasets, and any discrepancies trigger a manual review.
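The sampling-and-cross-check step can be sketched as follows. This is not SPS's implementation; it assumes `primary` and `secondary` are hypothetical stand-ins for the main GenAI extractor and a secondary model, each returning a value for a record, and the default 20% sample rate is illustrative.

```python
import random


def cross_verify_sample(records, primary, secondary, sample_rate=0.2, seed=None):
    """Re-check a random sample of records with a second extractor.

    primary / secondary: callables that extract a value from one record.
    Returns a manual-review queue of the sampled records where they disagree.
    """
    if not records:
        return []
    rng = random.Random(seed)
    k = min(len(records), max(1, round(len(records) * sample_rate)))
    review_queue = []
    for record in rng.sample(records, k):
        first, second = primary(record), secondary(record)
        if first != second:
            review_queue.append({"record": record, "primary": first, "secondary": second})
    return review_queue


invoices = ["INV-001 total 120.00", "INV-002 total 75.50", "INV-003 total 310.00"]


def primary_extract(text):
    return text.split()[-1]


def secondary_extract(text):
    # Simulated disagreement on one record.
    return "75.05" if text.startswith("INV-002") else text.split()[-1]


# sample_rate=1.0 checks every record here so the example is deterministic.
queue = cross_verify_sample(invoices, primary_extract, secondary_extract, sample_rate=1.0, seed=0)
print(queue)
# [{'record': 'INV-002 total 75.50', 'primary': '75.50', 'secondary': '75.05'}]
```

Sampling keeps the cost of the second check predictable; the sample rate can be raised for high-risk document types or lowered once disagreement rates stay consistently low.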


OTHER LESSONS LEARNED

You can have too much data. In a recent contract analysis scenario, our AI was initially overwhelmed by the volume of data it was provided with. By splitting the problem into manageable parts and validating each segment individually, we were able to significantly improve the accuracy of the AI outputs. The lesson here is to break down complex tasks and validate iteratively to achieve better results. 
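Breaking a large document into parts and validating each one might look like the sketch below. It assumes paragraphs separated by blank lines are sensible split points; `analyze` and `validate` are hypothetical callables standing in for the AI step and its per-segment check, and the 2,000-character limit is illustrative.

```python
def split_into_chunks(text, max_chars=2000):
    """Group paragraphs into chunks of at most max_chars where possible.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    chunks, current = [], ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


def analyze_in_parts(text, analyze, validate, max_chars=2000):
    """Analyze each chunk separately and record whether its result passed validation."""
    results = []
    for i, chunk in enumerate(split_into_chunks(text, max_chars)):
        result = analyze(chunk)
        results.append({"chunk": i, "result": result, "valid": validate(chunk, result)})
    return results


contract = (
    "Clause 1: payment within 30 days.\n\n"
    "Clause 2: termination with 90 days notice.\n\n"
    "Clause 3: governing law is England."
)
results = analyze_in_parts(
    contract,
    analyze=lambda chunk: chunk.split(":")[0],  # stand-in for the AI step
    validate=lambda chunk, result: chunk.startswith(result),
    max_chars=50,
)
for r in results:
    print(r["chunk"], r["result"], r["valid"])
```

Because each chunk is validated on its own, a failure points to a specific section of the contract rather than to the whole analysis.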
 
You can also trust the response too much. In another example, when users queried some policy documents, the AI provided the source along with an interpretation. By treating the AI response as a suggestion from a novice employee, users were able to validate the information against the original source. The lesson here is to always cross-check AI interpretations thoroughly against the original documents to ensure accuracy and reliability, and to assume the AI has only a novice's level of expertise.
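Part of that cross-check can be automated when the AI returns a quoted passage with its answer. The sketch below assumes such a quote is available and simply confirms that it really appears in the source document; the function names are hypothetical, and the interpretation built on the quote still needs a human reading.

```python
import re


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so line breaks in the source don't matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def quote_is_grounded(quote: str, source_text: str) -> bool:
    """True if the passage the AI cited appears verbatim in the source document."""
    return normalize(quote) in normalize(source_text)


policy = "Employees may claim up to 500 EUR\nper night for accommodation."
print(quote_is_grounded("up to 500 EUR per night", policy))  # True
print(quote_is_grounded("up to 800 EUR per night", policy))  # False
```

A quote that fails this check is a strong signal to distrust the whole answer; a quote that passes only shows the source was cited correctly, not that it was interpreted correctly.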

CONCLUSION

Effective and meticulous data management is essential to successful AI usage. There are lessons to learn around centralizing data storage, implementing robust change management and classification processes, ensuring data quality and continuously monitoring and validating AI outputs. By putting these data management measures in place, organizations can leverage AI to its full potential.
