overview for Architeuthis

Google's Gemini 2.5 pro is out of beta. by Architeuthis in c/[email protected]

[-] [email protected] 21 points 3 weeks ago

Claude's system prompt leaked at one point. It was a whopping 15K words, and it included a directive that if the model were asked a math question "that you can't do in your brain" (or some very similar language), it should forward it to the calculator module.

Just tried it, Sonnet 4 got even fewer digits right: 425,808 × 547,958 = 233,325,693,264 (correct is 233,324,900,064)
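For anyone who wants to check the arithmetic themselves, here's a quick sketch in Python. Python integers are arbitrary precision, so the product is exact. The helper name `leading_digits_matched` is just something made up for illustration, and "digits right" is assumed here to mean leading digits that agree with the true product.

```python
def leading_digits_matched(claimed: int, correct: int) -> int:
    """Count how many leading digits of `claimed` agree with `correct`."""
    count = 0
    for a, b in zip(str(claimed), str(correct)):
        if a != b:
            break
        count += 1
    return count


correct = 425_808 * 547_958   # exact integer product
claimed = 233_325_693_264     # the model's answer quoted above

print(f"correct: {correct:,}")
print(f"claimed: {claimed:,}")
print("leading digits matched:", leading_digits_matched(claimed, correct))
```

Both numbers have twelve digits, so the count is directly comparable to the total length.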

I'd love to see benchmarks on exactly how bad at numbers LLMs are, since I'm assuming there's very little useful syntactic information you can encode in a word embedding that corresponds to a number. I know RAG was notoriously bad at matching facts with their proper year, for instance. Using an LLM as a shopping assistant ("ChatGPT, what's the best 2K monitor for less than $500 made after 2020?") is an incredibly obvious use case, yet the CEOs who love to claim that so-and-so profession will be done as a human endeavor by next Tuesday after lunch won't even allude to it.

I find absolutely consistently that gen-AI advocates are literally unable to tell good output from bad output. They think people who say their output is garbage are just lying to have a go at them. by Architeuthis in c/[email protected]

[-] [email protected] 24 points 4 months ago* (last edited 4 months ago)

That's the second model announcement in a row by the major LLM vendor where the supposed advantage over the current state of the art is presented as... better vibes. He actually doesn't even call the output good, just successfully metafictional.

Meanwhile, over at Anthropic, Dario just declared that we're about 12 months away from all written computer code being AI-generated, and 90% of all code by the summer.

This is not a serious industry.

I am rich and have no idea what to do by Architeuthis in c/[email protected]

[-] [email protected] 21 points 6 months ago

Anthropic and Apollo astounded to find that a chatbot will lie to you if you tell it to lie to you by Architeuthis in c/[email protected]

[-] [email protected] 21 points 6 months ago

What new AI abilities? LLMs aren't Pokémon.

TPOT hits the big time! by Architeuthis in c/[email protected]

[-] [email protected] 23 points 7 months ago

thinkers like computer scientist Eliezer Yudkowsky

That's gotta sting a bit.

Google Search is getting worse and worse by Architeuthis in c/[email protected]

[-] [email protected] 23 points 8 months ago

Maybe Momoa's PR agency forgot to send an appropriate tribute to Alphabet this month.

"The Subprime AI Crisis" - Ed Zitron on the bubble's impending collapse by Architeuthis in c/[email protected]

[-] [email protected] 21 points 10 months ago* (last edited 10 months ago)

On each step, one part of the model applies reinforcement learning, with the other one (the model outputting stuff) “rewarded” or “punished” based on the perceived correctness of their progress (the steps in its “reasoning”), and altering its strategies when punished. This is different to how other Large Language Models work in the sense that the model is generating outputs then looking back at them, then ignoring or approving “good” steps to get to an answer, rather than just generating one and saying “here ya go.”

Every time I've read how chain-of-thought works in o1 it's been completely different, and I'm still not sure I understand what's supposed to be going on. Apparently you get a strike notice if you try too hard to find out how the chain-of-thinking process goes, so one might be tempted to assume it's something that's readily replicable by the competition (and they need to prevent that as long as they can) instead of any sort of notably important breakthrough.

From the detailed o1 system card pdf linked in the article:

According to these evaluations, o1-preview hallucinates less frequently than GPT-4o, and o1-mini hallucinates less frequently than GPT-4o-mini. However, we have received anecdotal feedback that o1-preview and o1-mini tend to hallucinate more than GPT-4o and GPT-4o-mini. More work is needed to understand hallucinations holistically, particularly in domains not covered by our evaluations (e.g., chemistry). Additionally, red teamers have noted that o1-preview is more convincing in certain domains than GPT-4o given that it generates more detailed answers. This potentially increases the risk of people trusting and relying more on hallucinated generation.

Ballsy to just admit your hallucination benchmarks might be worthless.

The newsletter also mentions that the price for output tokens has quadrupled compared to the previous newest model, but the awesome part is, remember all that behind-the-scenes self-prompting that's going on while it arrives at an answer? Even though you're not allowed to see those tokens, according to Ed Zitron you sure as hell are paying for them (i.e. they count as output tokens), which is hilarious if true.

Sam Bankman-Fried funded a group with racist ties by Architeuthis in c/[email protected]

[-] [email protected] 21 points 1 year ago* (last edited 1 year ago)

Great quote from the article on why prediction markets and scientific racism currently appear to be at one degree of separation:

Daniel HoSang, a professor of American studies at Yale University and a part of the Anti-Eugenics Collective at Yale, said: “The ties between a sector of Silicon Valley investors, effective altruism and a kind of neo-eugenics are subtle but unmistakable. They converge around a belief that nearly everything in society can be reduced to markets and all people can be regarded as bundles of human capital.”

New Windows AI feature records everything you’ve done on your PC by Architeuthis in c/[email protected]

[-] [email protected] 21 points 1 year ago* (last edited 1 year ago)

Nightmare blunt rotation in the Rewind AI front page recommendations:

Recommended by Andreessen, Altman and Reddit founder

Also, it appears to differ from Recall in that it's a third-party app and not pushed as the default in every new OS installation.

SBF's effective altruism and rationalism considered an aggravating circumstance in sentencing by Architeuthis in c/[email protected]

[-] [email protected] 21 points 1 year ago

If I remember correctly, SBF taking the stand was completely against his lawyers' recommendations, and in general he seems to have a really hard time doing what people who know better tell him to, such as: don't DM journalists about your crimes, definitely don't start a Substack detailing how you felt justified in doing them, and also, trying to 'explain yourself' to prosecution witnesses is witness tampering and will get your bail revoked.

"Why I'm no longer a White Nationalist." Neoreactionary blogger goes to live in Red America just like he always dreamed. What followed will shock you! by Architeuthis in c/[email protected]

[-] [email protected] 23 points 1 year ago

conflict averse and probably low testosterone German Catholics [...] overcivilized and effete Teutons

Kind of off topic, but this piece of wall-to-wall insanity reminded me how Steven Pinker tried to explain away southern US crime rates that didn't fit with his Violence Is Declining And In Fact Everything's Improving Inexorably (As Long As You Don't Rock The Boat) thesis by randomly blaming Irish-Catholic sheepherder genealogy.

Ex-OpenAI board member Tasha McCauley is deep state because she married Joseph Gordon-Levitt by Architeuthis in c/[email protected]

[-] [email protected] 21 points 2 years ago

To be clear, it's because he played Edward Snowden in a movie. That's the conspiracy.