Scale Is All You Need for Superintelligence: Interesting Assertion

July 2, 2024

This essay is the work of a dumb dinobaby. No smart software required. This blog post includes information from Claude 3.5 Sonnet. The information from smart software is in blue.

At this time, smart software boils down to the “transformer.” I navigated to Claude 3.5 Sonnet and asked the smart software to explain “transformer.” Here’s what the system generated:

The Google Transformer innovation, introduced in 2017, revolutionized natural language processing by introducing a novel architecture that relies entirely on attention mechanisms. This approach allows the model to process input sequences in parallel, rather than sequentially, leading to significant improvements in translation quality and training efficiency. The Transformer’s key features include self-attention, which enables it to capture relationships between words regardless of their position in a sentence, and positional encoding, which helps the model understand word order without relying on recurrent neural networks.
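For readers who want to see the two ideas in that explanation, here is a minimal sketch in plain Python. It is a dinobaby-grade illustration, not code from Claude, Google, or Etched. The toy token vectors and the function names are made up for the example, and the sketch skips the learned query, key, and value projections a real transformer uses: each token's vector stands in for all three.

```python
import math


def softmax(row):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def positional_encoding(seq_len, d_model):
    """Sinusoidal position signal: sine on even dimensions, cosine on odd."""
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe


def self_attention(x):
    """Scaled dot-product self-attention over a list of token vectors.

    Assumption for brevity: no learned projections, so queries, keys,
    and values are all the input vectors themselves.
    """
    d = len(x[0])
    scale = math.sqrt(d)
    out = []
    for q in x:
        # Score this token against every token in the sequence.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in x]
        weights = softmax(scores)
        # The output is a weighted average of all token vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


# Hypothetical toy "embeddings": tokens 0 and 2 are the same word.
tokens = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
]
pe = positional_encoding(len(tokens), 4)
# Adding the position signal makes the repeated word distinguishable by position.
x = [[t + p for t, p in zip(tok, pos)] for tok, pos in zip(tokens, pe)]
y = self_attention(x)
```

Every output row is computed from the whole sequence at once, which is the "parallel, rather than sequentially" point in the explanation above. The position signal is the only thing that tells the first and third tokens apart.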

I then asked, “Are there other ways to achieve smart software or AI information functions?” Claude 3.5 Sonnet spit out this list:

  1. Machine Learning Algorithms
  2. Expert Systems
  3. Neural Networks

Options are good. But the buzz focuses on transformers, a Google “invention” allegedly a decade old (but some suggest its roots reach back into the mists of time). But let’s stick with the Google and a decade.

The future is on the horizon. Thanks, MSFT Copilot. Good enough and you spelled “future” correctly.

“Etched Is Making the Biggest Bet in AI.” That’s an interesting statement. The company states what its chip is not:

By burning the transformer architecture into our chip, we can’t run most traditional AI models: the DLRMs powering Instagram ads, protein-folding models like AlphaFold 2, or older image models like Stable Diffusion 2. We can’t run CNNs, RNNs, or LSTMs either. But for transformers, Sohu is the fastest chip of all time.

What does the chip do? The company says:

With over 500,000 tokens per second in Llama 70B throughput, Sohu lets you build products impossible on GPUs. Sohu is an order of magnitude faster and cheaper than even NVIDIA’s next-generation Blackwell (B200) GPUs.

The company again points out the downside of its “bet the farm” approach:

Today, every state-of-the-art AI model is a transformer: ChatGPT, Sora, Gemini, Stable Diffusion 3, and more. If transformers are replaced by SSMs, RWKV, or any new architecture, our chips will be useless.

Yep, useless.

What is Etched’s big concept? The company says:

Scale is all you need for superintelligence.

This means, in my dinobaby-impaired understanding, that big delivers really smarter smart software. Skip the power, pipes, and pings. Just scale everything. The company agrees:

By feeding AI models more compute and better data, they get smarter. Scale is the only trick that’s continued to work for decades, and every large AI company (Google, OpenAI / Microsoft, Anthropic / Amazon, etc.) is spending more than $100 billion over the next few years to keep scaling.

Because existing chips are “hitting a wall,” a number of companies are in the smart software chip business. The write up mentions 12 of them, and I am not sure the list is complete.

Etched is different. The company asserts:

No one has ever built an algorithm-specific AI chip (ASIC). Chip projects cost $50-100M and take years to bring to production. When we started, there was no market.

The company walks through the problems of existing chips and delivers its knockout punch:

But since Sohu only runs transformers, we only need to write software for transformers!

Reduced coding and an optimized chip: Superintelligence is in sight. Does the company want you to write a check? Nope. Here’s the wrap up for the essay:

What happens when real-time video, calls, agents, and search finally just work? Soon, you can find out. Please apply for early access to the Sohu Developer Cloud here. And if you’re excited about solving the compute crunch, we’d love to meet you. This is the most important problem of our time. Please apply for one of our open roles here.

What’s the timeline? I don’t know. What’s the cost of an Etched chip? I don’t know. What’s the infrastructure required? I don’t know. But superintelligence is almost here.

Stephen E Arnold, July 2, 2024
