A few weeks ago there was an open house at the NYU Center for Data Science intended for NYU faculty members. As one of the early members of the Center (I know! Already!), I was given an opportunity to share with the audience why I joined the Center and what my experience there has been so far. Although I'm much more familiar with giving a research talk using a set of slides, I decided to try something new and give a talk without any slides. Of course, this was totally new to me, and I couldn't help but prepare a script in advance. I didn't really stick to the script during my talk, but I thought it wouldn't be a bad idea to share it with a broader community beyond NYU.
What is intelligence?
It turned out that there was a great list of speakers scheduled after my quick lightning talk, covering a broad set of topics spanning mathematics, computer science, the natural sciences, healthcare and medicine, all the way to law. Each speaker will without a doubt tell us about the latest and greatest research in the direction they pursue and how it's connected to data science and, perhaps even more broadly, artificial intelligence.
As for me, I'm going through a bit of a research identity crisis at the moment, so I thought I would spend a brief moment talking about why I decided to join the NYU Center for Data Science as one of the earliest so-called core faculty members in 2015.
My background is in computer science; I received all of my degrees in computer science. The reason I decided to pursue computer science was simple: I was fascinated by the idea that we can pose and answer the question, "What is computation?" This seemingly straightforward question has many implications. First, it turns the abstract notion of computation into a scientifically well-founded concept that we can characterize and study. Second, this investigation into what computation is has led to practical solutions to many problems that were previously not even straightforward to define, precisely because we lacked a definition of computation. What started as a formal, mathematical journey into understanding computation had already become a major scientific discipline touching every corner of society by the time I started my undergraduate years. Look around and think of what you do every day, both personally and professionally. It is nowadays pretty much impossible to find a single activity that does not involve some outcome of computer science, and computer science continues to make progress in answering the question: "What is computation?"
Then what is the next question we should and must ask? In my opinion, it is this: "What is intelligence?" or, perhaps equivalently, "What is knowledge?"
This question asks us what key concepts are needed to define a sophisticated problem and its solution, how these concepts can be scientifically and rigorously defined and characterized, and how they should be combined and searched through for us to automatically find a solution, that is an algorithm, to complicated, real-world problems. In answering this question, two things have emerged as crucial components: learning and data.
"Learning" in this context refers to a process by which we automatically construct an algorithm to solve a problem. In other words, it is a meta-algorithm that automatically builds a new algorithm. This learning process relies heavily on the availability of "data", be it collected by humans, by other algorithms or by the learning process itself. From data, it identifies underlying rules and regularities that can be exploited to solve a problem efficiently and effectively. This is precisely why we refer to this whole new discipline as data science.
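To make the idea of a meta-algorithm concrete, here is a toy sketch, not taken from any particular library: a `learn` function that consumes data and returns a brand-new function, which is the "algorithm" constructed from that data. The least-squares fit and all names here are illustrative choices of mine.

```python
def learn(data):
    """Fit a least-squares line to (x, y) pairs and return a predictor.

    The returned function is the new "algorithm" that learning constructs:
    it encodes the rule hidden in the data rather than being hand-written.
    """
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in data)
    var = sum((x - mean_x) ** 2 for x, _ in data)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

# The data below follows the hidden rule y = 2x + 1; learning recovers it.
predict = learn([(0, 1), (1, 3), (2, 5)])
print(predict(10))  # → 21.0
```

The point of the sketch is the shape of the interface: `learn` is a meta-algorithm, and `predict` is the algorithm it built from data.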
We study mathematical and computational aspects surrounding this core concept of "data" behind intelligence and knowledge. What is the correct way to characterize data? What is the correct way to automate the collection of data in order to maximize the effectiveness and efficiency of learning? What is the correct way for a learning algorithm to maximally extract underlying rules and regularities from this data to construct an algorithm to solve a problem? All of these questions point to the ultimate question of what intelligence is and what knowledge is, and, along the way, they help us solve many real-world problems based on data and learning.
This is why I decided to join the Center for Data Science, in addition to computer science, in 2015, even though the center was still in its early years. I haven't had a single moment of regret since I joined CDS, especially looking at the trajectory we have been taking.
Now let me tell you briefly about my own research in this context. One particular aspect of intelligence that sets us (humans) apart from other seemingly intelligent animals, such as other mammals and insects, is our use of sophisticated language. This use of sophisticated language presents a unique opportunity for us to push the boundaries of our experience. Although none of us in this room (I'm quite certain) has ever been to Antarctica ourselves, we somehow all know that there are penguins in Antarctica. Although none of us in this room (I'm 100% certain) has ever been to ancient Greece in person, we somehow know a lot about ancient Greece, probably more so than the average ancient Greek who lived it. Both of these are possible because we use language to share experiences and broaden our boundaries both spatially and temporally, which sets us clearly apart from any other intelligent being on this earth. This, together with our unique level of intelligence, makes me believe we must study language carefully in order to answer the question, "What is intelligence?"
There are two parts to studying and designing learning algorithms for natural language. One is to build a learning algorithm that focuses on extracting the underlying semantics of language in order to solve problems that require in-depth knowledge expressed in text. This direction is pursued mainly by Sam Bowman and He He at NYU CDS, and I will skip it here myself. The other is to build a learning algorithm that knows how to generate well-formed text, and this is my main research direction.
The problem of text generation belongs to the wider category of structured output prediction. In structured output prediction, the set of possible outcomes is very large; technically, its size grows exponentially with respect to the input size. In other words, it is not possible for a learning algorithm to naively test each and every possible configuration, and it must instead extract and exploit underlying structures that are often not apparent. Once a good set of regularities has been extracted, learning provides us with an efficient algorithm that can rapidly search for a good configuration, such as a sentence, in this exponentially large space.
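To give a feel for how hopeless naive enumeration is, here is a back-of-the-envelope count with made-up but plausible numbers: with a vocabulary of V words, there are V to the power T distinct sequences of length T.

```python
# Size of the output space in text generation, under assumed toy numbers:
# a 30,000-word vocabulary and a 20-word sentence.
V = 30_000   # vocabulary size (illustrative)
T = 20       # sentence length (illustrative)

num_sequences = V ** T
print(len(str(num_sequences)))  # → 90, i.e. a 90-digit number of candidates
```

No amount of raw compute enumerates a 90-digit number of candidates, which is why the learning algorithm has to exploit structure instead of searching exhaustively.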
One particular approach I have been exploring since 2014 is neural autoregressive models with attention, which have become the de facto standard, not only in academia but also in industry, for building machine translation, speech recognition and speech synthesis systems. This approach has recently been found, both by others and by my own group, to be generally applicable to the generation of arbitrary structured objects, where structured objects refer to generic graphs. One quick example is conditional molecule design. Together with Prof. Kang from SKKU, who was visiting the NYU Center for Data Science on his sabbatical, I was able to demonstrate the effectiveness of recurrent nets with an attention mechanism and latent variables in the controllable generation of molecular hypotheses. This effort, which started in late 2017, has since been expanded to use graph neural networks (about which I believe Joan Bruna will tell you a more exciting story) to better capture the graph-like nature of molecules and proteins.
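The core idea behind autoregressive models is the chain rule of probability: the joint probability of a sequence factorizes as p(y_1) p(y_2 | y_1) ... p(y_T | y_1, ..., y_{T-1}), and generation proceeds one token at a time. The sketch below uses a hand-written bigram table as a stand-in for what a trained neural network would compute; the table and all names are illustrative inventions, not any real model.

```python
import random

# Toy conditional distributions p(next | previous), standing in for a
# neural network's predicted distribution over the next token.
NEXT = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a":   {"cat": 0.5, "dog": 0.5},
    "cat": {"sat": 0.7, "</s>": 0.3},
    "dog": {"sat": 0.7, "</s>": 0.3},
    "sat": {"</s>": 1.0},
}

def generate(rng, max_len=10):
    """Sample a sentence one token at a time, conditioning on the prefix."""
    tokens = ["<s>"]
    while tokens[-1] != "</s>" and len(tokens) < max_len:
        dist = NEXT[tokens[-1]]
        choices, probs = zip(*dist.items())
        tokens.append(rng.choices(choices, weights=probs)[0])
    return [t for t in tokens[1:] if t != "</s>"]  # drop boundary markers

print(generate(random.Random(0)))
```

A real neural autoregressive model replaces the lookup table with a network that conditions on the entire prefix (via recurrence or attention), but the generation loop has exactly this shape.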
We are a very long way from answering the questions "What is intelligence?" and "What is knowledge?" in a rigorous manner. We have barely taken a step toward this goal, and if the history of any scientific discipline is any indication, it will take many correct and incorrect steps over decades, if not centuries, before we can claim to have caught even a glimpse of the answer.
One thing that is certain, however, is that we have been successfully building an environment here at the Center for Data Science by bringing in and hiring people with the expertise necessary for us to advance toward answering this ultimate question. My research has certainly benefited from having a diverse set of colleagues of world-class caliber. Together with Cristina Savin, I have designed and proposed a unified framework for online learning algorithms for recurrent networks, which will be a crucial component in building an intelligent agent that lives indefinitely. I have worked with Sam Bowman to better understand and characterize these language understanding neural networks. I have studied the applicability of deep learning to physics and biology by working with Kyle Cranmer and Rich Bonneau. I have been building a deep learning based diagnostic system for early-stage breast cancer screening together with Krzysztof Geras, who was a postdoc at the NYU Center for Data Science and is now an assistant professor of Radiology. I have even had the pleasure of investigating the impact of uncertainty-aware word embeddings in political science together with Arthur Spirling.
Thanks for listening to me, and I’ll be happy to chat more about any of these topics as well as how my experience with CDS has been so far.