Studying AI Alignment Ep1: the past and future between AI and me
A new blog series on AI alignment
I just started participating in the AI Safety Fundamentals: Alignment course by BlueDot Impact. It is a 12-week online course consisting of 8 weeks of readings and small-group discussions, followed by a 4-week capstone project. I will push myself to make the most of the course by starting a blog series on AI alignment. I am very interested in the field and read broadly, so I will also incorporate things I learn outside the course.
I intentionally chose ‘Studying AI Alignment’ as the series title because studying can mean both learning and research. At first, I expect to be mainly absorbing and thinking. But after a while, I hope to make my own contribution by conducting original research and perhaps even building a career in AI alignment. Hopefully, the blog series will continue alongside my journey in the field.
The course curriculum and materials are publicly available, so there is no point in using this space to blindly summarise the content. Instead, I will share the things that resonate with me, along with my own thoughts. Of course, my thoughts may be immature, biased, or simply wrong. I welcome all discussion and criticism.
The first session of the course is titled AI and the Years Ahead. The materials contain introductory articles and videos on AI’s impact, neural networks, LLMs and the evolution of deep learning. There are some exercises and discussions on imagining the future of AI. But before imagining the future, I would like to reflect a bit on the past.
What brings me to AI alignment?
I was exposed to AI in sci-fi as a child, but I don’t recall any special interest in it. The first significant AI moment in my life came in March 2016, when AlphaGo beat Lee Sedol. I watched the livestream and consumed many of the related discussions. Even though I don’t play Go, the event was significant to me because Go had been mystified to a certain degree in Chinese culture; I believed that one needed some unique human insight and wisdom to be good at Go. Of course, as my critical thinking developed, I would eventually have recognised how unexamined that belief was. But AlphaGo brought on a rapid disenchantment and led me to question my unexamined beliefs about human intelligence. From the discussions around AlphaGo, I first learned terms like machine learning and deep learning. The event planted the seed of AI in my mind, even though I was young and naive and failed to grasp the extent to which it could be relevant to my interest in social science.
Then, in 2018, my penultimate year as an undergraduate, the Facebook–Cambridge Analytica scandal broke out. My social psychology professor cited the scandal as an example of the misuse of social psychology research: malicious actors exploiting well-intentioned findings to serve their own agendas. The scandal connected the dots that had been forming in my mind and made me realise the power of AI and data science in both understanding and affecting society. After that, I took every statistics and data science course I could in my final undergraduate year and applied for Master’s programmes in social data science. I joined the MSc Applied Social Data Science programme at the LSE, and then went on to do a PhD in Social Research Methods, where I used data science and AI to study socioeconomic inequality.
During my PhD, ChatGPT happened. I did enjoy using data science and AI to study social science topics, but I became increasingly interested in the social impact of AI itself, especially given how much noise about AI’s impact has entered public discussion since ChatGPT and other generative AI products became popular. Expectations of AI range from ‘it will make everything better’ to ‘it will destroy humanity’. I really want to cut through the noise, gain real understanding and do tangible work. Now that I have finished my PhD, I want to shift my main focus from AI for social science to AI alignment. It is a slightly different but related field where many of my existing skills remain valuable: I have both some technical understanding of AI and some social science insight into humanity and society, a combination that should serve me well in AI alignment.
I like the term AI alignment because my main concern is indeed aligning AI with human values and intentions. I also like the irony of the term: we humans don’t even have aligned values among ourselves, yet we want AI to align with us. This irony points out precisely that AI provides a mirror in which to reflect on what makes us human. For me, studying AI alignment is a lens through which to understand humanity and society better.
My problems with imagining the future of AI
The course is well designed, with many prompts to stimulate thinking. The first exercise encourages me to think about what types of tasks make up most of my working hours and how well AI systems currently perform at those tasks, and then to imagine what things will look like in 2034.
I can summarise how well AI systems currently perform my tasks, but I find it hard to imagine what will happen in 2034. In fact, the difficulty of imagining the future of AI is exactly what motivates me to learn more about the field. Of course, it is possible to imagine AI being transformative and impactful (e.g., Dario Amodei, the CEO of Anthropic, recently shared his vision). But it is very hard for me to imagine how AI will affect my life and work over the next ten years because, as a social scientist, I hold a certain ‘reverence’ for the complexity of humanity. I think some popular optimistic views of AI progress tend to underrate the human factors in technological progress.
For example, people often talk about the scaling laws of neural networks, where model performance improves as model size, training data and computing power increase. Some extrapolate from the scaling laws to propose that model size, training data and computing power are all there is: as long as we keep increasing these key factors, AI will keep improving until it reaches or exceeds human-level intelligence. And since we can estimate a timeframe for increasing model size, training data and computing power, we can put a timeframe on AI development. I think this argument is problematic. The scaling laws are impressive, but just because they held in the past does not mean they will hold the same way in the future; assuming so is a classic inductive fallacy.
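To make that concrete, here is a minimal sketch of what a scaling law actually looks like. One widely cited formulation, from the ‘Chinchilla’ paper (Hoffmann et al., 2022), models pre-training loss roughly as

$$ L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}} $$

where $N$ is the number of model parameters, $D$ is the number of training tokens, and $E$, $A$, $B$, $\alpha$, $\beta$ are constants fitted to past training runs. The crucial caveat is that this is an empirical curve fitted to a particular regime of models and data; extrapolating it far beyond that regime, let alone equating ever-lower loss with human-level intelligence, is an additional assumption rather than something the formula itself guarantees.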
Even if we believe the scaling laws hold up to and beyond human-level intelligence, increasing model size, training data and computing power pulls in more and more human factors, rapidly increasing the complexity of building AI models. In a way, until recently, building AI models was relatively ‘simple’. Of course, it required a lot of resources, but it basically needed a group of people interested in the problem, plus suitable data and computing power. However, data and computing power do not increase in isolation. For example, expanding training data is already raising intellectual property issues. Expanding data storage and computing means building more data centres in different locations, which brings in many human factors. So, in a way, there is a parallel scaling law: as model size, training data and computing power grow, so does the involvement of human factors, which increases complexity and uncertainty. As the performance of AI models increases, the technical problem of AI model development becomes more and more social.
And that is just model development; implementing AI is even more complex. There are many aspects to the complexity of AI implementation; here, I want to focus on emotions. I think many people working on AI do not talk about emotions enough, which is problematic for both the development and the implementation of AI. The development of today’s popular AI models assumes that intelligence and emotions are separable, which is not a settled question in neuroscience. There is a chance that reaching human-level intelligence requires emotions.
Even if emotions do not play a part in the development of AI, they certainly play a critical role in its implementation. I keep seeing an ad for an AI app that automatically arranges my calendar, but I don’t want to use it at all. First, I don’t trust it. Second, I want to organise my calendar myself! This is a simple example of how people might resist the implementation of AI in their work or lives, and you can easily think of many more. For instance, many people would prefer an AI-assisted human teacher or doctor over a purely AI one. And we can all imagine the challenge of large-scale AI job replacement. So, even if AI does progress very fast, implementing it will take a lot of careful consideration.
That’s why I cannot easily imagine how AI would affect my work and life in ten years and why I am so interested in AI alignment. I am dealing with the uncertainty by delving into it.
I have questioned the extremely optimistic view of AI progress, but I am cautiously optimistic about AI. Current AI products are already doing amazing things, and we are still figuring out how to implement existing models better and how to build better ones. On the other hand, existing models, even the less capable ones before GPT, have already been misused and caused harm. I want to contribute to balanced and cautious ways of moving forward.
Postscript
As I wrote this article, I realised more deeply that the things I am interested in are very complex and that I have much more to study. Right now, I don’t even have the capacity to elaborate on some of my existing intuitions as well as I would like. So, there is a lot to learn, and writing more is good for me.
I hope you enjoy reading this article and will join my journey of studying AI alignment. If you also want to work in the field, that’s great! Of course, not everyone wants or needs to work directly in the field; in that case, subscribing to my Substack is one way to keep yourself somewhat informed. I also write about other things you might be interested in; have a look at my old posts (shameless self-promotion, because I do believe my writing can provide value to some people). See you next time.
I also translated this article into Chinese with the assistance of ChatGPT. Check here if you are interested.