With Generative AI (Gen AI) capturing people’s imagination, many organisations are thinking about how Artificial Intelligence could be used in their workplace. Numerous vendors and solutions have emerged to meet this demand. Because Gen AI relies heavily on vast amounts of data, data privacy has become a critical concern. In this quick article, we will look at the key things to watch for when doing a privacy assessment of Gen AI enterprise solutions.
Gen AI System Components
Before we get too deep into things, it is worth noting how enterprise Gen AI solutions are commonly designed.
While ChatGPT may be adequate for ‘off the cuff’ general knowledge questions, organisations are likely to demand much more of their enterprise Gen AI solutions. To offer useful answers, Gen AI solutions will need to access proprietary organisational data.
To achieve this, currently available solutions often use Vector Databases to supplement the knowledge of Large Language Models (LLMs) such as those behind ChatGPT. These databases ingest organisational documents or data, then index them so that relevant passages can be retrieved and supplied as extra context when the LLM answers a question. This mediation layer offers the best of both worlds, connecting the general knowledge of an LLM with the specifics of an individual organisation.
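To make the design more concrete, here is a minimal Python sketch of the retrieval step. It is an illustration only: the `embed` function is a toy word-count stand-in for a real embedding model, and `ToyVectorStore` and `build_prompt` are hypothetical names rather than any vendor's API. The final prompt is what would be passed on to the LLM.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store; real products differ considerably."""

    def __init__(self):
        self.items = []  # list of (vector, original text) pairs

    def ingest(self, text):
        self.items.append((embed(text), text))

    def search(self, query, k=1):
        query_vector = embed(query)
        ranked = sorted(
            self.items,
            key=lambda item: cosine(query_vector, item[0]),
            reverse=True,
        )
        return [text for _, text in ranked[:k]]


def build_prompt(store, question, k=1):
    """Combine the most relevant organisational text with the user's question."""
    context = "\n".join(store.search(question, k))
    return f"Context:\n{context}\n\nQuestion: {question}"


store = ToyVectorStore()
store.ingest("Annual leave requests must be approved by a line manager")
store.ingest("The office car park closes at 8pm on weekdays")
prompt = build_prompt(store, "who must approve annual leave requests")
```

The point for a privacy assessment is that whatever the store returns becomes part of the prompt, so any personal information in the ingested documents can travel to the LLM along with the question.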
Top 5 Issues to Consider
With the above in mind, the following are key things to look out for when evaluating the privacy implications of an enterprise Gen AI solution.
Access Permission
If a Vector Database is used to ingest organisational data, an obvious question is who will have access to that information. If no granular permission sets are available, a Gen AI solution could easily include sensitive data in its response to a prompt from a staff member who would otherwise not have access to such information. Where assignable permission sets do exist, organisations should determine how they will align them with administrative and other technical controls to minimise inadvertent disclosures of personal information.
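As a rough sketch of what granular permissions could look like, the Python example below tags each ingested chunk with the groups allowed to see it and filters on the requesting user's groups before anything reaches the LLM. The `Chunk` class, the `visible_chunks` function and the group names are all assumptions made for illustration; actual products may enforce permissions quite differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """A piece of ingested text plus the groups permitted to see it (hypothetical model)."""
    text: str
    allowed_groups: frozenset


def visible_chunks(chunks, user_groups):
    """Keep only chunks that share at least one group with the requesting user."""
    user_groups = frozenset(user_groups)
    return [chunk for chunk in chunks if chunk.allowed_groups & user_groups]


chunks = [
    Chunk("Office opening hours are 9am to 5pm", frozenset({"all-staff"})),
    Chunk("Salary review outcomes for the finance team", frozenset({"hr"})),
]

# A general staff member sees only the first chunk; an HR user sees both.
staff_view = visible_chunks(chunks, {"all-staff"})
hr_view = visible_chunks(chunks, {"all-staff", "hr"})
```

Filtering before retrieval results are added to the prompt matters here, since an LLM cannot be relied upon to withhold information it has already been given.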
Data Sovereignty
With ChatGPT having sparked much of the interest in Gen AI, many vendors are located in the US. If your organisation is located elsewhere and local privacy laws require data sovereignty, then it is worth asking vendors where each component of their solution is hosted. This should include not only the LLM server, but also the Vector Database and any other servers being used. Personal information may interact with any or all solution components in response to a user prompt.
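One simple way to record a vendor's answers is a per-component hosting inventory that can be checked against the regions your organisation permits. The Python sketch below assumes hypothetical component names and region labels; an undisclosed location is treated as a failure, so it is flagged for follow-up.

```python
def components_outside(components, permitted_regions):
    """Return the components hosted outside the permitted regions, or with unknown hosting."""
    return sorted(
        name
        for name, region in components.items()
        if region not in permitted_regions
    )


# Hypothetical vendor responses; None means the vendor has not disclosed the location.
vendor_components = {
    "llm_server": "us-east",
    "vector_database": "au-sydney",
    "logging_service": None,
}

flagged = components_outside(vendor_components, {"au-sydney", "au-melbourne"})
# flagged == ["llm_server", "logging_service"]
```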
Extensions and Plugins
With the rapid growth of the Gen AI ecosystem, extensions and plugins are playing an increasing role in quickly adapting solutions to new requirements. As with user permission sets, organisations should understand what data each extension or plugin can access, and where that data may flow, before enabling any such feature.
Permitted Use
If the Gen AI solution is to process personal information, organisations will need to determine whether it is being used in a manner consistent with the original intent of the data subject. Having said that, many jurisdictions also provide exceptions, particularly where use is related to the original purpose.
Right to Be Forgotten
The use of a Vector Database can also be logistically problematic when it comes to the Right to Be Forgotten. When a data subject exercises this right, these databases may also need to be updated. The format of Vector Databases is, however, somewhat different from that of standard relational databases. Organisations should therefore ensure they fully understand how they will meet their data subject obligations if self-service tools are not readily available.
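One possible approach, sketched below in Python, is to record which data subjects each entry relates to at ingestion time, so that an erasure request can be actioned by identifier. The `SubjectAwareStore` class and the subject identifiers are hypothetical, and the sketch assumes the solution lets you attach this kind of metadata to stored entries.

```python
class SubjectAwareStore:
    """Hypothetical store that tracks which data subjects each entry relates to."""

    def __init__(self):
        self._entries = []

    def add(self, vector, text, subject_ids):
        self._entries.append(
            {"vector": vector, "text": text, "subject_ids": set(subject_ids)}
        )

    def erase_subject(self, subject_id):
        """Remove every entry linked to the data subject; return how many were removed."""
        before = len(self._entries)
        self._entries = [
            entry for entry in self._entries
            if subject_id not in entry["subject_ids"]
        ]
        return before - len(self._entries)

    def texts(self):
        return [entry["text"] for entry in self._entries]


store = SubjectAwareStore()
store.add([0.1, 0.9], "Performance review for employee 1042", {"emp-1042"})
store.add([0.7, 0.2], "Office opening hours are 9am to 5pm", set())
removed = store.erase_subject("emp-1042")
```

Removing entries from the Vector Database is only part of the picture: the source documents they were derived from, and any backups, would normally need to be handled as well.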
While Gen AI solutions offer a great deal of promise in terms of workplace productivity, organisations should remember that they are also data-intensive applications. As such, adequate guardrails should be considered to ensure privacy obligations are not breached. Understanding a vendor’s system components and how potential data flows relate to privacy obligations in specific jurisdictions will help achieve this.