Privacy and data ownership in the age of Artificial Intelligence
by Jorge Sáez Gómez, Strategic Initiatives at Connecterra
Generative AI has attracted global attention for its potential implications for the dairy industry. There is incredible potential to automate very manual analyses and save valuable hours for time-strapped farmers and advisors. Its predictive capabilities can support more efficiency and productivity. Yet this new technology has also rekindled concerns about ownership and privacy. Understandably, farmers want control over who has access to their confidential data. But how does this work in practice? And what is Connecterra doing to ensure our customer data remains private?
How seemingly private data becomes public
The European Union defines personal data as any information that can be reasonably traced back to you. It enforces privacy with its strict General Data Protection Regulation, requiring your consent before personal data is collected, stored or shared.
- For instance, a statement like “John Smith from the state of Ohio has a 1000-acre dairy farm and net profit of $1 million annually” can be linked back to a specific person, making it a piece of data that reveals private information.
- In contrast, a statement like “a 1000-acre dairy farm that has a net profit of $1 million annually” would be virtually impossible to link back to a specific person without additional information.
- While the previous scenarios are relatively straightforward, a statement like “a 1000-acre dairy farm in the state of Ohio that has a net profit of $1 million annually” lies in a grey zone. The statement itself is not personal; however, if there are only a few 1000-acre farms in Ohio and public records link acreage to farm ownership, a third party could theoretically combine the two sources of information to connect the owner with the profitability of their farm.
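The grey-zone scenario above is an example of what privacy researchers call a linkage attack: combining quasi-identifiers (state, acreage) from an "anonymous" statement with a public dataset. The sketch below illustrates the idea in a few lines of Python; all names and figures are invented for illustration, not real farm data.

```python
# An "anonymized" statement: no name, but it carries quasi-identifiers
# (state and acreage) alongside the sensitive value (net profit).
anonymized = {"state": "Ohio", "acres": 1000, "net_profit": 1_000_000}

# Hypothetical public land records tying acreage to farm ownership.
public_records = [
    {"owner": "A. Farmer",  "state": "Ohio",    "acres": 350},
    {"owner": "B. Grower",  "state": "Ohio",    "acres": 1000},
    {"owner": "C. Rancher", "state": "Indiana", "acres": 1000},
]

# Join the two sources on the quasi-identifiers.
matches = [r for r in public_records
           if r["state"] == anonymized["state"]
           and r["acres"] == anonymized["acres"]]

# With exactly one match, the "anonymous" profit figure is re-identified.
if len(matches) == 1:
    print(f"{matches[0]['owner']} nets ${anonymized['net_profit']:,} annually")
```

If many Ohio farms had the same acreage, the join would return multiple candidates and the statement would stay effectively anonymous; the risk comes from combinations of attributes that are rare.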
Many users expect conversations with large language model (LLM) tools like ChatGPT to remain private. However, LLMs are trained on large text corpora, which often include actual conversations with users. If an LLM learns from your conversation, it can theoretically reproduce parts of it almost verbatim later for anyone who asks similar questions or has a similar conversation. The risk of compromising data privacy is directly tied to the content of interactions with generative AI systems.
What is the solution?
The clearest way to mitigate the risk is to never train any LLM on personal data. However, this approach is short-sighted and difficult to manage. It also eliminates the potential to address some of the industry’s biggest challenges, such as easing the acute skilled labor shortage by reducing manual computer work, improving decisions on protocols and products by understanding their impact on herds, and predicting the effect of interventions on a farm’s GHG footprint and profitability.
It is possible to create systems that leverage the power of LLMs while ensuring privacy and control over personal data sharing. However, designing these systems requires careful thought and expert knowledge. The team must recognize that the definition of personal data is not clear-cut and take action to address the grey areas. These measures come in addition to other data protection techniques, such as encrypting communications and storage to prevent unauthorized access to the data.
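One common building block in such systems is redacting direct identifiers before any text reaches an LLM. The sketch below shows the idea with simple regular-expression patterns; it is a minimal illustration, not a production approach, and real systems need far more robust detection (pattern matching alone misses names and other grey-area identifiers).

```python
import re

# Illustrative patterns for two direct identifiers. These regexes are
# simplified assumptions for the sketch, not exhaustive PII detectors.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace matched identifiers with placeholder tokens."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

prompt = "Contact John at john.smith@example.com or 555-123-4567."
print(redact(prompt))
# Note: the name "John" survives redaction, illustrating why
# pattern-based scrubbing alone cannot settle the grey areas.
```

Placeholder tokens (rather than deletion) let a downstream system restore the original values in the LLM's response if the data owner has consented to that use.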
Our approach to privacy
At Connecterra, people—not tech—will always come first. While artificial intelligence has incredible potential for positive outcomes, we must use it responsibly. Our commitment to social consciousness means that we prioritize transparency, integrity and accountability.
This commitment includes our work with generative AI. In addition to the standard privacy measures outlined above, we have taken the extra step of running all AI systems within our own cloud infrastructure. This step ensures that no data is transferred from our premises to third-party companies without the explicit consent of the data owner. It also gives confidence that our customers’ data is not being stored for long periods of time and ensures our customers’ data is not used to train AI systems hosted by third parties.
To learn more about our work with generative AI or our solutions for data privacy, send us a note and our team will get in touch.