Keeping up with an industry that moves as fast as AI is a tall order. So until an AI can do it for you, here’s a handy roundup of the past week’s stories in the world of machine learning, along with notable research and experiments that we didn’t cover on their own.
This week, Google dominated the AI news cycle with a series of new products launched at its annual I/O developer conference. They range from a code-generating AI intended to compete with GitHub’s Copilot to an AI music generator that converts text prompts into short songs.
Quite a few of these tools look to be legitimate labor savers – more than marketing fluff, that is. I’m particularly intrigued by Project Tailwind, a note-taking app that uses AI to organize, summarize, and analyze files from a personal Google Docs folder. But they also expose the limitations and shortcomings of even today’s best AI technologies.
Take, for example, PaLM 2, Google’s latest large language model (LLM). PaLM 2 will power Google’s updated Bard chat tool, the company’s competitor to OpenAI’s ChatGPT, and will act as the base model for most of Google’s new AI features. But while PaLM 2 can write code, emails, and more, like comparable LLMs it also responds to queries in toxic and biased ways.
Google’s music generator is also quite limited in what it can achieve. As I wrote in my hands-on, most of the songs I’ve made with MusicLM sound passable at best – and like a four-year-old let loose on a DAW at worst.
Much has been written about how AI will replace jobs – possibly the equivalent of 300 million full-time jobs, according to a Goldman Sachs report. In a Harris survey, 40% of workers familiar with OpenAI’s AI-powered chatbot tool, ChatGPT, said they are concerned that it will replace their jobs entirely.
Google’s AI isn’t the be-all and end-all. Indeed, the company is arguably lagging behind in the AI race. But there’s no denying that Google employs some of the best AI researchers in the world. And if this is the best they can manage, that’s a testament to the fact that AI is far from a solved problem.
Here are the other AI headlines from the past few days:
- Meta brings generative AI to advertising: Meta this week announced an AI sandbox of sorts for advertisers to help them create alternate copy, generate backgrounds through text prompts, and crop images for Facebook or Instagram ads. The company said the features are currently available to select advertisers and that it will expand access to more advertisers in July.
- Added context: Anthropic has expanded the context window for Claude – its flagship text-generating AI model, still in preview – from 9,000 tokens to 100,000 tokens. The context window refers to the text that the model considers before generating additional text, while tokens represent raw text (for example, the word “fantastic” would be split into the tokens “fan”, “tas”, and “tic”). Historically and even today, poor memory has been a barrier to the usefulness of text-generating AI. But larger context windows could change that.
- Anthropic touts ‘constitutional AI’: Larger context windows are not the only differentiator of Anthropic’s models. The company this week detailed “constitutional AI,” its in-house AI training technique that aims to imbue AI systems with “values” defined by a “constitution.” Anthropic argues that, unlike other approaches, constitutional AI makes a system’s behavior both easier to understand and simpler to adjust as needed.
- An LLM built for research: The nonprofit Allen Institute for AI (AI2) has announced that it plans to train a research-focused LLM called Open Language Model to complement its large and growing open source library. AI2 sees Open Language Model, or OLMo for short, as a platform and not just a model – one that allows the research community to take any component AI2 makes and use it themselves or try to improve it.
- New fund for AI: In other AI2 news, AI2 Incubator, the nonprofit’s AI startup fund, is tripling in size, from $10 million to $30 million. Twenty-one companies have gone through the incubator since 2017, attracting some $160 million in further investment and at least one major acquisition: XNOR, an AI acceleration and efficiency outfit that was subsequently snapped up by Apple for approximately $200 million.
- EU introduces rules for generative AI: In a series of votes in the European Parliament, MEPs this week backed a raft of amendments to the bloc’s draft AI legislation, including requirements for the so-called foundational models that underpin generative AI technologies such as OpenAI’s ChatGPT. The changes place the responsibility on providers of foundational models to implement safeguards, data governance measures, and risk mitigation before their models go to market.
- A universal translator: Google is testing a powerful new translation service that redubs video in a new language while syncing the speaker’s lips with words they never said. It could be very useful for many reasons, but the company was upfront about the possibility of abuse and the steps it is taking to prevent it.
- Automated explanations: It’s often said that LLMs along the lines of OpenAI’s ChatGPT are a black box, and there’s certainly some truth to that. In an effort to peel back their layers, OpenAI is developing a tool to automatically identify which parts of an LLM are responsible for which of its behaviors. The engineers behind it stress that it’s in its early stages, but the code to run it is available in open source on GitHub as of this week.
- IBM launches new AI services: At its annual Think conference, IBM announced IBM Watsonx, a new platform that provides tools to build AI models and access to pre-trained models for generating computer code, text, and more. The company says the launch was prompted by the challenges many companies still experience deploying AI in the workplace.
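For readers who want the context window idea from the Anthropic item made concrete, here is a rough sketch in Python. It is a toy under loud assumptions: `toy_tokenize` and `fit_to_context` are hypothetical helpers invented for illustration, the "tokenizer" just splits on whitespace (real models use subword tokens), and nothing here reflects how Claude or any other production system is actually implemented. It only shows why text that falls outside the window is effectively forgotten.

```python
def toy_tokenize(text):
    """Split on whitespace. A stand-in only: real tokenizers use subword pieces."""
    return text.split()


def fit_to_context(tokens, window_size):
    """Keep only the most recent `window_size` tokens, dropping the oldest ones."""
    if window_size <= 0:
        return []
    return tokens[-window_size:]


conversation = "my name is Ada . I like tea . what is my name ?"
tokens = toy_tokenize(conversation)

# With a tiny window, the part of the conversation that mentions "Ada" is gone.
small = fit_to_context(tokens, 5)
# With a window larger than the conversation, nothing is dropped.
large = fit_to_context(tokens, 100)

print(small)
print("Ada" in small, "Ada" in large)
```

In the small-window case the model would be asked for a name it can no longer see, which is the "poor memory" problem in miniature.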
Other machine learning
Andrew Ng’s new company, Landing AI, is taking a more intuitive approach to creating computer vision training. Getting a model to understand what you want to identify in images is pretty painstaking, but their “visual prompting” technique lets you make just a few brush strokes, and it figures out your intent from there. Anyone who has to build segmentation models is saying “my god, finally!” Probably a lot of grad students who currently spend hours masking organelles and household objects.
Microsoft has applied diffusion models in a unique and interesting way, essentially using them to generate an action vector rather than an image, having trained them on numerous observed human actions. It is still very early days and diffusion is not the obvious solution for this, but since these models are stable and versatile it will be interesting to see how they can be applied beyond purely visual tasks. Their paper will be presented at ICLR later this year.
Meta is also pushing the boundaries of AI with ImageBind, which is claimed to be the first model to process and integrate data from six different modalities: images and video, text, audio, 3D depth data, thermal information, and motion or positional data. This means that in its compact machine learning embedding space, an image can be associated with a sound, a 3D shape, and various text descriptions, any of which can be queried or used to make a decision. It’s a step towards “general” AI in that it absorbs and associates data more like the brain does – but it’s still basic and experimental, so don’t get too excited just yet.
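To give a feel for what "associating" an image with a sound in a shared embedding space means, here is a minimal Python sketch. Everything in it is an assumption for illustration: the three-dimensional vectors are made up by hand, and `cosine_similarity` and `nearest` are hypothetical helpers, not ImageBind's API or Meta's method. The only point is that once different modalities live in one vector space, a query from one modality can retrieve the closest item from another.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: close to 1.0 means 'pointing the same way'."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made toy embeddings keyed by (modality, label). Real models learn
# vectors with hundreds of dimensions; these numbers are invented.
embeddings = {
    ("image", "dog photo"): [0.9, 0.1, 0.0],
    ("audio", "barking"): [0.8, 0.2, 0.1],
    ("text", "a dog"): [0.85, 0.15, 0.05],
    ("audio", "engine noise"): [0.0, 0.2, 0.9],
    ("text", "a car"): [0.1, 0.1, 0.95],
}


def nearest(query_key, modality):
    """Return the item of the given modality whose vector is most similar to the query."""
    query = embeddings[query_key]
    candidates = [
        (key, cosine_similarity(query, vector))
        for key, vector in embeddings.items()
        if key[0] == modality and key != query_key
    ]
    return max(candidates, key=lambda item: item[1])[0]


print(nearest(("image", "dog photo"), "audio"))
print(nearest(("audio", "engine noise"), "text"))
```

With these toy numbers, the dog photo retrieves the barking clip and the engine noise retrieves the car caption, which is the kind of cross-modal lookup the paragraph above describes.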
Everyone got excited about AlphaFold, and rightly so, but structure is only a small part of the very complex science of proteomics. It’s how those proteins interact that is both important and difficult to predict – but this new PeSTo model from EPFL aims to do just that. “It focuses on significant atoms and interactions within the protein structure,” says lead developer Lucien Krapp. “It means that this method effectively captures the complex interactions within protein structures to enable accurate prediction of protein binding interfaces.” Even if it’s not exact or 100% reliable, it’s super helpful for researchers not to have to start from scratch.
The federal government is going big on AI. The president even stopped by a meeting with some of the top AI CEOs to say how important it is to get this right. Maybe a group of corporations isn’t necessarily the right one to ask, but at least they’ll have some ideas worth considering. But don’t they already have lobbyists?
I’m more excited about the new AI research centers popping up with federal funding. Basic research is sorely needed to counterbalance the product-focused work being done by the likes of OpenAI and Google – so when you have AI centers with mandates to look into things like social science (at CMU), or climate change and agriculture (at U of Minnesota), it feels like green fields (both figuratively and literally). Although I would also like to give a little shout-out to this Meta research on forestry measurement.
There are many interesting conversations about AI. I found this interview with UCLA (my alma mater, go Bruins) academics Jacob Foster and Danny Snelson interesting. Here’s a great thought about LLMs to pretend you came up with this weekend when people are talking about AI:
These systems show how formally consistent most writing is. The more generic the formats these predictive models simulate, the more successful they are. These developments force us to recognize the normative functions of our forms and possibly transform them. After the introduction of photography, which is very good at capturing a representational space, the painterly milieu developed Impressionism, a style that completely rejected accurate representation to dwell on the materiality of paint itself.
Definitely use that!