Presented by Jacki O’Neill at Microsoft Research Forum, June 2024
“It’s only by working together that we can solve the challenges we face with generative AI at scale and, in doing so, capitalize on the opportunities these new technologies offer. … Now is the time to change the dialogue and change the direction of these new powerful technologies to ensure that they are globally equitable by design.”
– Jacki O’Neill, Lab Director, Microsoft Research Africa, Nairobi
Transcript: Keynote
Building Globally Equitable AI
Jacki O’Neill, Lab Director, Microsoft Research Africa, Nairobi
Jacki O’Neill discusses the importance of creating globally equitable generative AI. She addresses the technical and sociotechnical challenges that must be tackled to positively transform work futures worldwide.
Microsoft Research Forum, June 4, 2024
JACKI O’NEILL: Hi, I’m Jacki, and I head up Microsoft Research Africa, Nairobi. Welcome to the Microsoft Research Forum. I’m going to talk about the importance of building globally equitable generative AI.
Given its ability to generate and process human-like natural language, generative AI is set to transform the way we interact with technology. Like the graphical user interface before it, generative AI promises to make computing and AI more accessible to a wider range of people. This promise encompasses several features. Natural language interfaces mean users can interact with these models conversationally: they can ask questions, give commands, and get tasks done. This can reduce complexity across applications and devices, as one can imagine navigating through and creating content using natural language without having to open different applications to find and extract information, or even know which tool the content was created in. Given this, LLMs could reduce the burden of repetitive and nonessential tasks, from helping us craft emails to summarizing documents and supporting report writing, giving us more time to focus on the work we love. Finally, multimodal interactions with image, speech, and video processing and generation further enhance the transformational power of these tools. All of this could make both the power of AI specifically and that of computing more generally much more widely accessible, including to a mobile-first or mobile-only audience, thus reaching the billions of people who don’t work at desks.
As a result, generative AI is likely to transform the future of work across the globe in ways as yet unimagined and has sparked excitement about its potential impact on the Sustainable Development Goals. However, generative AI may not be equally useful for everyone, and its impact will not necessarily be evenly distributed globally, across regions, communities, or demographics; as a consequence, there’s a risk of compounding existing systemic inequalities. For example, those who could most benefit from the promise of generative AI include populations in the Global South, who have previously been excluded by both the traditional digital divide and the AI divides. The traditional digital divide has three levels: access to digital technology, the skills and knowledge required for its effective use, and the ability to translate use into desired outcomes. Generative AI then brings additional elements to the digital divide. The AI divide encompasses the socioeconomic conditions around who gets to create, deploy, and benefit from AI. The data divide refers to the consequences of not having good representation and equivalence in the training data and data-related processes, such as labeling and reinforcement learning. And the third divide is compute, given the GPU requirements to build, train, and deploy these immense and immensely powerful models.
So how does generative AI impact these divides? Well, it reduces the traditional divide in some ways, because natural language interfaces mean AI is more accessible to more people than ever before. For example, this latest generation of general-purpose, off-the-shelf tools can be and is being deployed to improve productivity by businesses around the globe, including many small businesses that were previously excluded from the AI revolution simply because they didn’t have access to machine learning professionals in their companies. In terms of devices, many of the current AI tools can be used on smartphones, although they do require data. But there’s a plethora of specific feature-phone services being created in areas such as health and agriculture which don’t require the end user to have data. Whilst it’s too early to definitively talk about the ability to translate use into outcomes, research on small and medium businesses’ adoption of generative AI in Kenya and Nigeria suggests that it provides value for those who start using it in a number of use cases, such as writing emails in English.
The AI divides, however, remain, and there’s much work to be done to arrive at globally equitable generative AI. Today, I want to focus on the data divide, which stems from the fact that the vast majority of training data comes from the English-speaking Global North. This has an impact on the representations of both language and knowledge in AI systems and, consequently, on their ability to process and produce appropriate output. But what does this mean in practice?
Let’s start with a look at language. Last year, research showed that on standard natural language processing (NLP) tasks and benchmarks, state-of-the-art, or SOTA, non-autoregressive models outperform large language models, including GPT-4. Large language models tended to work well on high-resource language families with Latin scripts but less well on low-resource languages with limited training data or non-Latin scripts. However, generative models introduced new challenges for NLP benchmarking, many of them due to prompt sensitivity. That is, even small changes in the prompt construction can impact performance, making consistent benchmarking difficult. For example, even asking the LLM to provide explanations for its output can change performance, as does the choice of examples used in the prompt. Nonetheless, currently, African language performance on traditional metrics isn’t yet on a par with English performance. But this doesn’t tell the whole story.
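To make prompt sensitivity concrete, here is a minimal sketch of how one might measure it, assuming the OpenAI Python SDK and an API key in the environment. The two prompt templates and the tiny two-example “benchmark” are hypothetical stand-ins, not the benchmarks referenced above.

```python
# Minimal sketch: the same toy sentiment benchmark scored under two
# semantically equivalent prompt templates. Assumes the OpenAI Python SDK
# and an OPENAI_API_KEY in the environment; all data here is illustrative.
from openai import OpenAI

client = OpenAI()

TEMPLATES = {
    "bare": "Classify the sentiment of this sentence as positive or negative:\n{text}",
    "with_explanation": (
        "Classify the sentiment of this sentence as positive or negative, "
        "and briefly explain your answer:\n{text}"
    ),
}

# Toy stand-in for a labeled benchmark set.
EXAMPLES = [
    ("Huduma ilikuwa nzuri sana!", "positive"),  # Swahili: "The service was very good!"
    ("I waited two hours and nobody came.", "negative"),
]

def classify(template: str, text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": template.format(text=text)}],
        temperature=0,  # deterministic-ish decoding; some variance can remain
    )
    return response.choices[0].message.content.lower()

for name, template in TEMPLATES.items():
    # Naive scoring: does the gold label appear in the model's output?
    correct = sum(label in classify(template, text) for text, label in EXAMPLES)
    print(f"{name}: {correct}/{len(EXAMPLES)} correct")
```

Even with the temperature fixed at zero, the two equivalent templates can yield different measured accuracy, which is exactly what makes consistent benchmarking difficult.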
In naturalistic interactions, GPT-4’s performance seems pretty amazing. For example, in a collaborative project with the University of Washington Global Health Department, we’ve been looking at building NLP, or natural language processing, tools to support medical facilitators. These facilitators manage peer support groups on WhatsApp for young people living with HIV in informal settlements in Nairobi. The data consists of chat messages in English, Swahili, and Sheng, a local dialect, and includes code-mixing, emojis, and “chat speak.” You can see an example of the data here. This message mixes English and Swahili, shown with its translation by human annotators. We found that even the best multilingual SOTA models performed so badly, even after fine-tuning, that we stopped working on this project. Then, along came GPT-4, and suddenly, these tools seem possible again. What’s going on? Why are NLP benchmarks telling us one thing about African language performance and application-based practice telling us another?
Well, one part of the explanation is that previous models typically just couldn’t handle code-mixing, whereas generative models are much better equipped to handle natural language. Therefore, they’re not only able to handle naturally produced code-mixed language, but they can also handle chat speak with its abbreviations, colloquialisms, emojis, and so on. Now, we found that whilst both GPT-4 and LLaMA showed impressive results in sentiment analysis on this dataset, GPT-4 appears to use more of the whole context of the sentence to produce slightly more robust predictions. Returning to our example, if we assume some correlation between explanations and prediction, we can see that GPT-4 gave more nuanced predictions, whereas LLaMA did not seem to pick up on the more positive, although conditional, sentiment in the second part of the sentence.
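As a rough illustration of this kind of analysis, here is a minimal sketch of prompting a model for sentiment on a code-mixed message, again assuming the OpenAI Python SDK; the message below is invented for illustration and is not drawn from the study data.

```python
# Minimal sketch: sentiment analysis of a code-mixed WhatsApp-style message,
# asking the model to use the whole sentence context and explain itself.
# Assumes the OpenAI Python SDK and an OPENAI_API_KEY; the message is invented.
from openai import OpenAI

client = OpenAI()

# Invented Swahili/English code-mixed message: roughly, "I'm feeling down
# today, but if I get my meds early I'll be fine."
message = "Niko down leo but nikipata dawa zangu mapema nitakuwa sawa"

prompt = (
    "This WhatsApp message mixes English, Swahili, and chat speak.\n"
    "1. Translate it into English.\n"
    "2. Classify its overall sentiment (positive, negative, or mixed).\n"
    "3. Briefly explain your answer, noting whether different parts of the "
    "message carry different sentiment.\n\n"
    f"Message: {message}"
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,
)
print(response.choices[0].message.content)
```

Asking for the translation and the explanation alongside the label is one way to see whether the model picks up the conditional, more positive sentiment in the second clause, as described above.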
Despite these impressive advances, there’s still plenty of work to be done. There are still many under-resourced African languages where performance lags far behind. For example, the models make more mistakes on Sheng, which is not included in the training data, and speech models lag behind, often failing at the code-mixing hurdle. This is important because voice interfaces are likely to be essential to enabling even broader access to AI. So this is an area of continued research for African and other under-resourced languages. But language is not the only concern. Whilst language is most researched, the widespread deployment globally of the latest generation of foundation models reveals another equally pressing problem. Models hallucinate, fail, or reproduce stereotypes in African and other Global South contexts.
On the positive side, we’ve seen adoption of text generation, text-to-image generation, generative AI search, AI-augmented design, and speech generation tools by small businesses in Nigeria and Kenya from a range of sectors, including law, design, outdoor recreation, and retail. These businesses successfully use generative AI to support communication, especially being professional, polite, and to the point in emails. A common example that we saw across sectors is illustrated here: how do I tell my client he’s four months late now to pay his fees, and I don’t want to sound rude? And we saw this across pretty much all of the small businesses where they needed customers to pay. They also used AI to support creative work, such as content creation and ideation. They described how it helped save time; for example, it reduced the time for ideation. As an architectural designer said, “We would manually, kind of, like, bounce ideas off each other. … Arriving at 10 strong ideas would take two or three sessions, whereas now we get the same results in one.” Even the lawyers, who charge by the hour, wanted to reduce their mundane work so they could spend more time on creative work. They would have liked to deploy generative AI to reduce the time spent on small-case document review. As a senior lawyer said, “We could have spent that 15 hours on important things. Once the machine, the AI, had given us the report, we’d be thinking creatively now.” So far so good.
Problems often arise, though, when they want to use generative AI in work involving the African context, which, for SMBs in Africa, is quite often. Whilst generative AI can sometimes help to navigate cultural and contextual boundaries, it’s more likely to hallucinate when the proportion of relevant training data is low, as it is in most African contexts, and a whole host of problems arises, from accent recognition in meeting transcription to speech production. For example, the CEO of an IT company used voice cloning for training videos but found it gave her a British accent. And as she said, it “takes away from my originality, which is I’m not British; I’m Kenyan.” Or the poor context and consistency we’ve seen in image production systems, creating unusable and sometimes stereotypical images of African people and landscapes, not to mention the tendency to produce answers which neglect or misrepresent African people, history, culture, and knowledge. And even where information about Africa is generated, it often portrays the Western perspective. This was perhaps most clearly encapsulated by one of the lawyers, who explained, “Even if you put into the particular AI that you’re asking from a Kenyan perspective—while in Kenya, does this law apply?—they’ll reference the people in the US, which is insane because we have Kenyan authors; we’ve done the actual work.” Overall, then, it can leave the feeling that generative AI is really Americanized.
This regional bias goes way beyond demographic biases like race, although it is compounded by them. Whole continents and their knowledge are severely underrepresented, and this comes through clearly in use, in both usability and the use cases that are directly impacted. Indeed, AI has a language problem, but just as importantly, it has a knowledge problem, and this is likely to compound existing systemic inequalities. But we’re at the very early stage of generative AI and the impacts it will have on work. This is a fast-moving field, and there’s an immense opportunity to take control of the agenda and build truly globally equitable AI systems. This requires ensuring that diverse contexts and applications, with their diverse datasets, drive the development of generative AI, and we need to be intentional about embracing these approaches. Machine learning methods like fine-tuning and retrieval-augmented generation, or RAG, are unlikely to work well if we don’t design and build for these diverse contexts from the beginning.
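For a concrete picture of what grounding in local sources could look like, here is a minimal RAG sketch, again assuming the OpenAI Python SDK; the corpus, the keyword-overlap retrieval, and the prompt are deliberately simplistic placeholders, not a production design.

```python
# Minimal RAG sketch: ground the model's answer in locally curated sources
# (e.g., Kenyan legal materials) rather than whatever dominates its training
# data. Assumes the OpenAI Python SDK and an OPENAI_API_KEY; the corpus and
# retrieval below are toy placeholders.
from openai import OpenAI

client = OpenAI()

# Hypothetical stand-ins for locally authored documents.
CORPUS = [
    "Kenya Employment Act, 2007: an employer must give written notice of "
    "termination; summary of the notice-period rules ...",
    "Commentary by Kenyan authors on the Data Protection Act, 2019 ...",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    # Toy keyword-overlap scoring; a real system would use embeddings
    # and a vector index.
    def overlap(doc: str) -> int:
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(CORPUS, key=overlap, reverse=True)[:k]

def answer(query: str) -> str:
    context = "\n".join(retrieve(query))
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": (
                "Answer from a Kenyan perspective, using ONLY the sources "
                f"below.\n\nSources:\n{context}\n\nQuestion: {query}"
            ),
        }],
        temperature=0,
    )
    return response.choices[0].message.content

print(answer("What notice does Kenyan law require to terminate employment?"))
```

The point of the sketch is the design choice, not the plumbing: retrieval only helps if locally authored, locally relevant sources exist in the corpus in the first place, which is exactly why diverse datasets have to drive development from the beginning.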
This is not something that any one group or company, nor any one discipline, can or should do on their own. This needs to be a collaborative effort incorporating different voices, different perspectives, and different disciplines working more closely together than ever before. And so just before I finish, I want to highlight one initiative that’s attempting to address some of the concerns raised: the African Health Stories Project.
This is a multi-institution, multi-country, multidisciplinary project. Microsoft Research is working with public health researchers at Stellenbosch University, human-computer interaction researchers at the University of Swansea, and machine learning and data science researchers at the University of Pretoria to create culturally appropriate and sensitive stories supporting good health behaviors. We will use generative AI to create interactive visual, oral, and text stories which enable patients to better understand how to apply health advice to their local circumstances. Together, as a multidisciplinary team, we will use this specific real-world application area to probe, evaluate, and extend the ability of generative AI to create situated and culturally appropriate content at the same time as addressing a real health need. Because it’s only by working together that we can solve the challenges we face with generative AI at scale and, in doing so, capitalize on the opportunities these new technologies offer.
We have plenty of work to do, but now is the time to change the dialogue and change the direction of these new powerful technologies to ensure that they are globally equitable by design. Thank you.
Related resources
- Research Lab Microsoft Research Lab – Africa, Nairobi
- Podcast What’s Your Story: Jacki O’Neill