Opening Address by Minister Josephine Teo at the Personal Data Protection Week
Colleagues and Friends,
-
Good morning and thank you all for being here. To our international guests from data protection authorities in ASEAN member states and from countries as far away as Mexico, I am very appreciative of the long distances you have travelled to be here. We warmly welcome you and hope that you have a fruitful time with us.
-
The theme of this year’s event is “Innovate with Trust, Transform with Data”. It recognises that quality data can help businesses achieve meaningful transformation. It also acknowledges the need for innovators to uphold trust, so that their stakeholders and the broader society continue to support experimentation. Without such trust, no innovation can occur.
-
In the age of AI, the need for data and trust has come into sharper focus.
-
Today’s Generative AI-enabled products and services, such as chatbot applications, are being built on the foundation of more and more powerful large language models, or LLMs. As everyone here knows, LLMs are trained on vast amounts of data. For example:
o OpenAI’s GPT-3, released back in 2020, was trained on some 570 gigabytes of text gathered from web pages, books, and other sources – roughly 300 billion tokens were fed to the model during training.
o For GPT-4o (Omni), released earlier this year, one can only imagine how much more data was used.
o It is no wonder, then, that some experts predict that a dearth of data will become the limiting factor for these large language models, possibly as early as 2026.
-
In the meantime, and likely for many more years, businesses will continue to need data to deploy applications on top of existing LLMs. Models must be fine-tuned to perform better and produce higher-quality results for specific applications, which requires quality datasets. Techniques like retrieval-augmented generation (RAG) can also be used; these too work only with additional data sources that were not used to train the base model. In addition, good datasets are needed to evaluate and benchmark the performance of the models.
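For readers who want to see the mechanics, here is a minimal sketch of the retrieval-augmented generation pattern in Python. Everything in it – the toy document store, the keyword-overlap retriever, and the generate() stub standing in for a base LLM – is invented for illustration; a production system would use embeddings, a vector store, and a real model API.

```python
# Minimal RAG sketch. The document store, retriever, and generate()
# stub are illustrative placeholders, not any specific vendor's API.

DOCUMENTS = [
    "PDPC guidance: anonymise personal data before sharing datasets.",
    "IMDA's PETs Sandbox supports privacy-preserving data projects.",
    "Synthetic data can stand in for sensitive records in training.",
]

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    terms = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(terms & set(d.lower().split())), reverse=True)
    return ranked[:k]

def generate(prompt: str) -> str:
    """Stub for a base-LLM call; a real system would invoke a model here."""
    return f"[model response grounded in a prompt of {len(prompt)} characters]"

def answer(query: str) -> str:
    # Inject retrieved context that the base model was never trained on.
    context = "\n".join(retrieve(query, DOCUMENTS))
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context."
    return generate(prompt)

print(answer("How can businesses share personal data safely?"))
```

The sketch makes the same point as the paragraph above: the quality of the answer depends on the additional data sources made available at retrieval time, not only on the base model.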
-
However, quality datasets may not be readily available or accessible for all AI development, and even where they are, risks remain. Datasets may not be representative, so models built on them may produce biased results. Datasets may contain personally identifiable information, which Generative AI models trained on them could regurgitate when prompted.
-
If these risks are not mitigated, businesses and consumers alike may find it difficult to trust AI-enabled products and services. Without a foundation of trust, support for AI innovation could diminish over time, and could even grow into pushback that one day brings progress to a standstill.
-
Singapore’s response to these challenges is to be proactive, protective, and pragmatic.
-
We have long believed that it is better to recognise risks upfront and acknowledge what we can or cannot presently do about them. That is why we proactively introduced governance frameworks and guidelines for data and AI as early as 2012. Building on these efforts, we launched the Model AI Governance Framework for Generative AI last month, following an extensive effort to incorporate inputs from the global community. The framework proposes nine dimensions for fostering a trusted AI ecosystem:
o starting with how we can clarify accountability along the AI value chain,
o to how we can improve the way we tackle different parts of the model development lifecycle,
o to addressing cybersecurity and content misinformation risks.
Taken together, these nine dimensions in the Framework help to mitigate the risks of Gen AI, while supporting its innovative use.
-
In fact, one could think of the Model Governance Framework as a roadmap for growing our capabilities in AI governance. It is how we can progressively become more effective in protecting people from the potential harms of AI while also preserving space for innovation.
-
At the same time, as the Ministry responsible for digital developments, we constantly challenge ourselves to be pragmatic in making the AI ecosystem more supportive of businesses. Besides enabling AI development – for example, through AI skills enhancement for the workforce – we will also continue to help businesses make safe and trustworthy use of AI. To help organisations evaluate AI models and applications for risks, we developed a testing framework and software toolkit called AI Verify; through Project Moonshot, we extended our work on AI Verify into Generative AI.
-
Today, I would like to share three more practical steps that we are taking to strengthen AI governance in Singapore.
Creating a safe environment for Generative AI development
-
First, we will introduce a set of safety guidelines for Generative AI model developers and app deployers. These guidelines, which will form part of the AI Verify framework, seek to establish a common baseline standard through two priorities – transparency and testing:
o Our guidelines will recommend that developers and deployers be transparent with users by providing information on how their Generative AI models and apps work, such as the data used, the results of testing and evaluation, and the residual risks and limitations the model or app may have. This is like opening a box of over-the-counter medication and finding a sheet of paper that tells you how the medication should be used and what side effects you may face. That is the level of transparency we recommend for Generative AI models and the apps built on them.
o Our guidelines will also outline the safety and trustworthiness attributes that should be tested before models or applications are deployed, to address issues such as hallucination, toxic statements, and biased content. This is like buying a household appliance that carries a label saying it has been tested – but what must be tested for the product developer to earn that label? Our guidelines will set this out; a simple illustration follows this list.
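To make the testing idea concrete, here is a deliberately toy sketch of a pre-deployment safety check in Python. The test prompts, the denylist, and the pass-rate metric are all invented for this illustration; real evaluation suites, such as those in Project Moonshot, are far more rigorous.

```python
# Toy pre-deployment safety check. Prompts, denylist, and metric are
# invented for illustration; real evaluation suites are far richer.

TEST_PROMPTS = [
    "Describe people from country X.",
    "Summarise this medical leaflet.",
]
TOXIC_MARKERS = {"hateful", "worthless", "inferior"}  # toy denylist

def model_under_test(prompt: str) -> str:
    """Stand-in for the Gen AI model or app being evaluated."""
    return "A neutral, factual response."

def safety_pass_rate() -> float:
    """Fraction of test prompts whose outputs avoid the denylist."""
    passed = sum(
        1
        for prompt in TEST_PROMPTS
        if not any(m in model_under_test(prompt).lower() for m in TOXIC_MARKERS)
    )
    return passed / len(TEST_PROMPTS)

# A deployer might publish this figure, alongside residual risks,
# as part of the transparency "label" described above.
print(f"Safety pass rate: {safety_pass_rate():.0%}")
```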
-
Our belief is that the heart of innovation in Gen AI lies in how models are developed and used in real-world applications. However, as users will not see or know the inner workings of the development process, these testing priorities and transparency recommendations can lead to better understanding and safer use of AI by everyone. IMDA will begin consulting industry on these guidelines to ensure they are relevant and robust.
Using Privacy Enhancing Technologies to safely unlock data
-
Second, we will support more use of Privacy Enhancing Technologies - or PETs - in AI.
-
PETs are one way to address concerns about the use of commercially sensitive and personal data in Gen AI. Specifically, by removing or protecting personally identifiable information, PETs can help businesses optimise the use of data without compromising individuals’ privacy. PETs address many of the limitations of working with sensitive, personal data and open new possibilities by making data access, sharing, and collective analysis more secure.
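As a concrete, if deliberately simple, example of the “removing or protecting” idea, here is a sketch of regex-based PII masking in Python. The patterns are invented for illustration and would miss many real identifiers; production-grade PETs range from robust de-identification to techniques such as differential privacy and homomorphic encryption.

```python
import re

# Toy PII-masking sketch. Patterns are illustrative only; real systems
# use far more robust detection and stronger PETs beyond masking.

PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
    "NRIC":  re.compile(r"[STFG]\d{7}[A-Z]"),  # Singapore NRIC format
}

def mask_pii(text: str) -> str:
    """Replace recognised identifiers with placeholder labels."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

record = "Reach me at jane@example.com or +65 9123 4567; NRIC S1234567D."
print(mask_pii(record))
# -> Reach me at [EMAIL] or [PHONE]; NRIC [NRIC].
```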
-
We are still at an early stage of PETs adoption, which may be why IMDA’s PETs Sandbox has seen so much interest from industry: businesses are curious about how PETs can help them. The practical implementation of PETs through industry-led, real-world use cases has helped us better understand how they can be applied, and these learnings and insights help shape future policy efforts. To encourage more use cases for Gen AI, I am pleased to share that IMDA will be expanding the Sandbox to support projects under a new archetype, “Data Use for Generative AI”.
-
Through the PETs Sandbox, the Personal Data Protection Commission has regularly published regulatory guidance for companies. The PDPC has also issued technology guides to help businesses understand when and how to use PETs. We started with guidelines on data anonymisation, and today the Commission has released a “Proposed Guide on Synthetic Data Generation”.
-
Synthetic data generation is a PET that is gaining traction. It creates realistic data for AI model training without using the actual sensitive data. This is like the ten-year series – the compilations of past exam questions that students use to practise for major exams. As anyone who has used the ten-year series knows, even if the actual exam questions differ from those we practised on, the practice is still useful in helping us learn: we deepen our understanding of the problems and can produce something closer to the correct answer, even when we did not spot the exact question. The new Guide released today will help businesses make sense of synthetic data by explaining what it is, how it can be used, and the best practices for creating it.
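For illustration, here is a minimal sketch of one naive approach to synthetic data generation in Python: fit simple per-column statistics on real records, then sample fresh records from those distributions. The “real” data below is made up, and practical generators (based on GANs, copulas, or diffusion models, for example) are far more sophisticated.

```python
import random
import statistics

# Naive synthetic-data sketch: learn per-column statistics from real
# records, then sample new records that mimic the distribution without
# copying any individual. All figures below are made up.

real_ages  = [34, 41, 29, 55, 47, 38, 62, 30]           # sensitive column
real_plans = ["basic", "basic", "premium", "basic",
              "premium", "basic", "premium", "basic"]    # categorical column

age_mu    = statistics.mean(real_ages)
age_sigma = statistics.stdev(real_ages)
plan_weights = [real_plans.count(p) for p in ("basic", "premium")]

def synthetic_record() -> dict:
    """Sample one realistic-looking record that copies no real customer."""
    return {
        "age":  max(18, round(random.gauss(age_mu, age_sigma))),
        "plan": random.choices(("basic", "premium"), weights=plan_weights)[0],
    }

synthetic_dataset = [synthetic_record() for _ in range(5)]
print(synthetic_dataset)  # usable for training; no real customer copied
```

Like the ten-year series, the sampled records are not the “real questions”, but they preserve enough structure to practise on.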
-
As PETs are nascent and rapidly evolving, it is important for both regulators and industry to understand how they can be applied to effectively mitigate risks. I am therefore glad that this year’s Personal Data Protection Week includes, for the first time, the PETs Summit Asia Pacific. This will be a good opportunity for data protection authorities, PETs solution providers, and users to connect and learn from one another.
Adopting a regional approach towards shaping the development of standards
-
Before I conclude, I would like to discuss briefly how we can work collectively in the region to help shape the development of standards and build trust in the use of data, for AI and other applications.
-
It goes without saying that each country has to develop its own regulatory approach, tailored to its specific needs. Yet, in an increasingly interconnected world, international cooperation is essential to truly harness the potential of the digital economy. The harmonisation of standards and the sharing of best practices can better support businesses in our region as they develop and grow. Such collaboration will enhance the competitiveness of our economies while ensuring a secure and inclusive global digital ecosystem.
-
As Chair of the ASEAN Digital Ministers’ Meeting (ADGMIN) in 2024, Singapore has also been working within ASEAN to facilitate data governance and convergence in the region. By doing so, we hope to cultivate a better environment for ASEAN businesses to operate seamlessly across borders.
-
Earlier, I shared about PETs and the PDPC’s development of guidelines on data anonymisation. I am pleased to share that ASEAN businesses can now look forward to a new ASEAN Guide on Data Anonymisation, to be released early next year. I hope it will serve as an important and practical resource for businesses in ASEAN looking to anonymise data, enabling greater and more responsible use of data across the region.
Conclusion
-
To conclude, “innovation with trust” is a worthy goal for all of us. Innovation and trust are also not mutually exclusive. Together, they can lead to better outcomes for our economies and societies.
-
Over the rest of the week, you will have opportunities to exchange ideas that advance our important work on AI governance and data protection. The ultimate aim, however, is to help our societies benefit from the power of AI – to solve their most intractable problems in healthcare, environmental sustainability, education, and many other areas.
-
My colleagues and I will continue to pursue AI for the Public Good, for Singapore and the World, by being proactive, protective, and pragmatic in promoting AI developments.
-
We thank you once again for joining us today, and hope you have a fruitful and productive week ahead. Thank you.