Here’s what actually happens inside LLM’s neural network

Arich Lawson | Good pictures

With most computer programs—even complex ones—you can drill down to find code and memory usage. why That program produces some specific behavior or output. This is generally not true in the field of AI, where the unexplainable neural networks underlying these models, for example, make it difficult even for experts to pinpoint exactly why they often confuse information.

now, New research from anthropology Claude offers a new window into what goes on in the LLM’s “black box.” of the company New paper “Extracting Interpretable Features from a Clad 3 Sonnet” describes a powerful new method to partially explain how a model’s millions of artificial neurons fire to produce surprisingly lifelike answers to common questions.

Opening the hood

When analyzing an LLM, it is trivial to see which specific artificial neurons are activated in response to which specific query. But LLMs do not simply store different words or concepts in a single neuron. Instead, as the Anthropic researchers explain, “Each concept is represented in multiple neurons, and each neuron is involved in representing multiple concepts.”

A system to sort out this one-to-many and many-to-one mess Scattered autoencoders and can be used to perform complex math A “dictionary learning” algorithm across the model. This procedure highlights which groups of neurons are most consistently activated for specific words appearing in various textual stimuli.

The same internal LL.M
Zoom in / The same internal LLM “feature” describes the Golden Gate Bridge in multiple languages ​​and methods.

These multidimensional neuron patterns are then called “features” associated with certain words or concepts. These features include anything from simple formal nouns such as Golden Gate Bridge For more concise concepts such as Programming errors Or addition function Representing the same concept in computer code and in multiple languages ​​and modes of communication (eg, text and images).

See also  Tropical Storm Joa causes dangerous surf and rip currents along the coasts of California and Mexico

A October 2023 Anthropological Review This basic process was shown to work on very small, one-layer toy models. The company’s new paper gauges have stepped up to identify the tens of thousands of active features on its mid-size Clad 3.0 Sonata model. The resulting feature map – you can Area study“Creates a rough conceptual map [Claude’s] The researchers write that internal levels are “halfway through its computation” and “reflect the depth, breadth, and abstraction of Sonnet’s advanced capabilities.” At the same time, the researchers caution that this is an “incomplete description of the model’s internal representations.” It may be “orders of magnitude” smaller than a complete mapping of clade 3. .

A simple diagram shows some concepts "nearby" The "Internal conflict" Feature in Anthropic's Claude model.
Zoom in / A simplified diagram shows some concepts that are “close” to the “Internal Conflict” feature in the Anthropocene clad model.

Even at a surface level, browsing through this feature map helps show how certain key words, phrases, and concepts connect to approximate knowledge. A The feature named “Capitals”, For example, the words “capital” tend to perform strongly, but specific city names such as Riga, Berlin, Azerbaijan, Islamabad, and Montpelier, Vermont are also mentioned to name a few.

The study also calculates a mathematical measure of “distance” based on the neural similarity of different features. This process results in “feature neighborhoods” that are “often organized into geometrically related clusters that share a semantic relationship,” the researchers write, adding that “the internal organization of concepts in an AI model resembles, at least to some extent, our human. concepts of similarity.” For example, the Golden Gate Bridge feature is relatively “close” to features describing “Alcatraz Island, Girardelli Square, the Golden State Warriors, California Governor Gavin Newsom, the 1906 earthquake, and the San Francisco-set Alfred Hitchcock film.” Vertigo.”

See also  Starliner takes off on 1st flight with NASA astronauts
Here are a few key points in answering the question about Kobe Bryant's team's state capital.
Zoom in / Here are a few key points in answering the question about Kobe Bryant’s team’s state capital.

Identifying specific LLM features can help researchers map the chain of inference the model uses to answer complex questions. For example, an announcement about “Capital of the state where Kobe Bryant played basketball” shows activity in a chain of features related to “Kobe Bryant,” “Los Angeles Lakers,” “California,” “Capitals,” and “Sacramento.” ,” to name a few calculated to have a greater effect on results.

JP Morgan expects to cut the base rate by 100 basis points this year

At the start of the month Wall Street was hopeful—but not convinced—that it would get a much-anticipated interest rate cut in September. Fed Chairman Jerome...

There are great players and potential matches

NFL Guardian Here's what we know about Caps right nowGuardian caps provide added protection over the helmets of players in inherently violent sports. Are...

Nasdaq falls as investors put their time ahead of Nvidia earnings

Shares of Coles ( KSS ) rose as much as 7% in early trading after the company beat Wall Street's revenue expectations by 15...

Dinosaur footprints found on two continents match

The video shows a large dinosaur with identical green bonesThe 150-million-year-old bones discovered in Utah will go on display at the Natural History Museum...

HMD’s Barbie Flip Phone is tough

HMD's Barbie-branded flip phone may be a bit late to catch the hot foldable summer wave, but it's certainly not lacking in appeal. Pre-announced...

Mark Zuckerberg says White House pressured Facebook to censor Covid-19 content | Meta

Meta boss Mark Zuckerberg has said he is bowing to what he says is pressure from the US government to censor Facebook and Instagram...