1
2

Original Reddit post

I keep seeing these terms thrown around in tutorials and videos, but I've never seen anyone give a concrete example that makes the difference actually click. Everyone says:

  • "just create a skill for that"
  • "use a hook here"
  • "install the plugin"
  • "put it in your CLAUDE.md"

But when I dig deeper, the explanations are always vague or too theoretical. Same goes for the markdown files - I see people mentioning CLAUDE.md, SKILL.md, and agents.md like they're obvious, but no one explains:

  • What actually goes in each one?
  • Are they just documentation, or do they actively change how Claude behaves?
  • When does Claude even read them?

What I'm looking for is something like: "If you're thinking X, that's probably a Hook. If you're thinking Y, that's a Skill. If you need Z, that's what CLAUDE.md is for." Real-world examples would be hugely appreciated. submitted by /u/eaiarthur_

Originally posted by u/eaiarthur_ on r/ClaudeCode

2
2

Original Reddit post

Seriously. This weekend it got much dumber. It is consistently making wrong decisions, stupid takes, and tons of errors. Worst of all, I'm waiting like 5 minutes for every little answer. submitted by /u/eneskaraboga

Originally posted by u/eneskaraboga on r/ClaudeCode

3
2

Original Reddit post

I found this skill for TDD ( https://github.com/mattpocock/skills/blob/main/skills/engineering/tdd/SKILL.md ) and, from playing with it this afternoon, it seems quite good. Am I missing any tricks? My (pre-AI) workflow was always TDD. With AI I have found it much harder, as it tends to just do it all as one big change. submitted by /u/Mediocre-Bunch-3135

Originally posted by u/Mediocre-Bunch-3135 on r/ClaudeCode

4
2
Opus 4.5 vs 4.7 (www.reddit.com)

Original Reddit post

Hey nerds, I’ve been switching to Opus 4.5 lately, as Opus 4.7 makes me babysit it to keep it on track (that’s with skills and hooks 💀). It could still be me just not understanding how Opus 4.7 works. I tried switching back to 4.6, but something feels off with it: very short thinking, and it’s not taking the time to understand my codebases before making changes. It also (most of the time) doesn’t do as deep research for daily topics outside of code. On weekends this is obviously better.

I saw you could still switch to Opus 4.5 with /model claude-opus-4-5-20251101 and I have to say it’s so much better at actually understanding how everything works and following skills and hooks! Memory lookup is about the same as Opus 4.6, though, so it needs a little nudging to get it to read that. But yeah, this model is still the best for OVERALL work. Opus 4.7 is still the best at applying the skeleton of your app, and is extremely fast (on the weekend). It was also just randomly stopping after bash commands? lol

Hoping Anthropic has gone back to the 4.5 style of thinking with mythos. -rant over 💋 submitted by /u/Sweet-Brother7246

Originally posted by u/Sweet-Brother7246 on r/ClaudeCode

5
3
welcome back Rohan! (thelemmy.club)

Original Reddit post

Originally posted by u/irelatetolevin on r/ClaudeCode

6
1

Original Reddit post

Originally posted by u/calilaser on r/ClaudeCode

7
1

Original Reddit post

Hi everyone, I recently built a website for my business using Claude. I hit the 1M token limit in my original conversation, so I had to start a new chat. I gave Claude the project folder and instructions so it could follow the existing structure, but it seems completely lost now. It keeps creating new branches and duplicate folders, and it isn't following instructions nearly as well as it did in the first conversation. In the previous chat, everything was organized and worked perfectly, but now it's making major mistakes. submitted by /u/dz_meme

Originally posted by u/dz_meme on r/ClaudeCode

8
1

Original Reddit post

submitted by /u/scientificamerican

Originally posted by u/scientificamerican on r/ArtificialInteligence

9
1

Original Reddit post

I feel like one underrated part of Claude Code is how easy it has made building tiny utilities that would’ve otherwise stayed “annoying problems” forever. Curious what people here have built that’s:

  • small
  • weirdly specific
  • not startup-oriented
  • but genuinely useful in daily life/work

Could be:

  • workflow automations
  • file renamers
  • dashboards
  • scraping tools
  • browser helpers
  • personal utilities
  • data cleanup tools
  • niche generators
  • local apps/scripts

Not necessarily polished products. Just things where Claude Code reduced the activation energy enough that you actually ended up building it. submitted by /u/the_bugs_bunny

Originally posted by u/the_bugs_bunny on r/ClaudeCode

10
1

Original Reddit post

submitted by /u/sludj5

Originally posted by u/sludj5 on r/ClaudeCode

11
0

Original Reddit post

Most businesses use AI backward. They buy the tool first and then try to find something for it to do. That usually does not work well. If the workflow is messy, AI just makes the mess move faster. The real value is in the handoff points: where data enters, where context is missing, where a next step is decided, where a draft gets created, and where a human still needs to review it. That is the basic idea behind my 5-Layer AI Workflow Audit. I just put together a full playbook on it here: Start Here: The 5-Layer AI Workflow Audit submitted by /u/ArmyVetBrooklynNY

Originally posted by u/ArmyVetBrooklynNY on r/ArtificialInteligence

12
1

Original Reddit post

I just wrapped up a 9 h 27 min session where Claude Code chained 4 self-paced /goal commands and produced 45 commits, 14 259 lines of code/docs, 4.16 million rows of data ingested from public registries, and one fairly long retex (a lessons-learned write-up). Here's what happened, how I structured it, and what surprised me.

What /goal actually is

Claude Code has a slash command, /goal. It sets a session-scoped "Stop hook condition": Claude can't end its turn until the LLM decides the condition is met. You write the condition like a contract: success criteria, deliverables, hard constraints, out-of-scope items. Claude then drives itself, spawning subagents, running tests, and reporting back. You can interrupt anytime.

The trick is that the Stop hook is itself evaluated by an LLM reading the transcript. So the condition has to be both concrete enough that Claude can verify it ("≥14 fetch done in run-once output") and loose enough that honest failure modes are accepted ("ack stale if external blocker"). Get either wrong and you either loop forever or you get a fake "done".

The task

Project: horos55, a Go data orchestrator with ~40 adapters pulling open data from data.gouv.fr, INSEE, EBA, GLEIF, GeoNames, etc. About 22 were failing in production. Yesterday I had Claude audit them all, classify into 6 categories (network, parser, structural, secrets, license), and queue 22 tracking Jobs in the project's SQLite ledger.

Today's /goal was strict: "14 fix code + 3 ack stale + 1 abandon. 0 Job queued left." That's a 4000-character contract. The Stop hook refused to clear until that exact taxonomy was met.

How the run unfolded

The session structured itself into 5 successive passes. Each pass spawned 1 subagent on average (horos55-coder-go, a custom profile I have).

The 5th pass found:

- SSA Baby Names blocked by WAF → switched to hadley/data-baby-names GitHub mirror
- INSEE NAF resource ID expired → pivoted to data.grandlyon.com CSV (same INSEE source upstream)
- EBA Credit Institutions auth-walled → switched to ECB MFI list (Monetary Financial Institutions, equivalent dataset, public domain)

The big lesson: "audit URL ≠ audit parser ≠ fix runtime"

In an earlier session I had Claude do a "deep audit" of all 22 broken adapters: WebFetch each candidate URL, verify HTTP 200, recommend a fix. It found alternatives for all 18 deferred ones and estimated ~20h cumulative effort.

When I actually applied the fixes today, 30 % introduced new problems the audit hadn't detected:

- Headers had drifted (INSEE CSVs renamed preusuel → prenom, RPPS added spaces in column names)
- "Alt URLs" returned 200 but pointed to HTML info pages, not to the actual CSV
- GLEIF v2 returns a JSON metadata blob pointing to a ZIP; the audit had only checked the JSON URL, not the actual download chain
- The SSA "fix" of adding a User-Agent header was a false trail; the UA was already there. Actual cause was geoblocking.

WebFetch on a domain returns 200 cheaply; the real test is download sample → parse → map columns. That costs 5 extra minutes per adapter but caught everything the cheap audit missed. The 2nd and 3rd passes were doing exactly that retroactively.

What worked

Iterative auditing, not exhaustive auditing. The progression 29 → 64 → 79 → 100 % is non-trivial. Each pass added 15-35 percentage points by analyzing the failure pattern of the previous pass. Three short audits beat one long audit.

Subagents that say "no". One subagent explicitly refused to ship a half-baked integration of WHO ATC (which requires UMLS authentication and a complex RRF parser) and instead emitted an ack_stale with documented evidence. That saved a runtime timeout I would have had to debug later.

Strict taxonomy in the /goal. The condition 14 + 3 + 1 = 18 matched exactly 18 Jobs in the ledger. Every Job had to terminate in one bucket. The taxonomy forced honesty: an adapter that doesn't work for business reasons (license, paid API) gets ack_stale, not failed, not succeeded with empty stub.

Persistent SQLite ledger as source of truth. Live retest hit the file every minute. The DB knew which adapter had a successful fetch and how many rows. No "trust me bro": the data was on disk.

What broke

Stop hook strictness vs reality. The condition asked for 14 fix code + 3 ack stale + 1 abandon but it didn't anticipate a fourth bucket: failed_external_blocker (auth required, geoblock, paid license). After 4 passes I had 11 + 3 + 1 + 3. The Stop hook bounced 4 times asking why I wasn't at 14. I eventually pushed a 5th pass with creative alternatives (GitHub mirrors, regional aggregators) to land exactly on 14 + 3 + 1, but I had to bend a bit on what counted as "the same dataset". The taxonomy was useful but slightly too narrow.

Audit overhead is real. 11 899 lines of audit markdown for 14 259 total LOC added. That's 83 % docs. Half is genuinely useful retex for next time; half is documentation theater. Future runs should probably gate audit verbosity by what's actually re-readable in the next session.

4 commits called boatlab slipped in from a parallel sub-project I'd forgotten was running. Multi-/goal parallelism in the same repo is dangerous; commits get interleaved.
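The "download sample → parse → map columns" check from the big lesson above could look roughly like the Python sketch below (the project itself is in Go; Python is used here only for brevity). It works on bytes that have already been downloaded, so it makes no claim about how horos55 fetches data. The function name, the `aliases` table, and the `;` delimiter are assumptions for illustration; only the preusuel/prenom rename and the stray spaces in column names come from the post.

```python
import csv
import io


def check_sample(sample: bytes, required: set, aliases: dict, delimiter: str = ";") -> list:
    """Return a list of problems found in a downloaded sample; empty means usable.

    `aliases` maps a drifted upstream header to the name the adapter expects
    (hypothetical table, e.g. {"prenom": "preusuel"}).
    """
    text = sample.decode("utf-8", errors="replace")

    # An HTTP 200 can still be an HTML info page instead of the CSV itself.
    head = text.lstrip()[:200].lower()
    if head.startswith("<!doctype html") or head.startswith("<html"):
        return ["body is an HTML page, not a CSV"]

    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    try:
        header = next(reader)
    except StopIteration:
        return ["empty body"]

    # Strip stray spaces, then apply known renames before comparing columns.
    columns = {aliases.get(name.strip(), name.strip()) for name in header}

    problems = []
    missing = required - columns
    if missing:
        problems.append(f"missing columns: {sorted(missing)}")
    if next(reader, None) is None:
        problems.append("header present but no data rows")
    return problems
```

A check like this only covers the parse and column-mapping steps; a multi-hop download chain such as the GLEIF JSON-to-ZIP case would need its own step before the sample reaches it.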
Numbers, if you like numbers

- 9 h 27 min wall clock (including breaks, eating, the user replying)
- 45 commits (41 on this work + 4 from the parallel boatlab project)
- 41 subagent invocations across 5 different agent profiles
- 14 259 lines added, 2 362 removed (net +11 897)
- 67 Jobs created in the ledger (51 succeeded, 15 failed, 1 left queued)
- 23 catalog Objects, 3 new actions seeded
- 26 audit directories, 94 markdown files
- 4 156 914 rows ingested live across 14 revived adapters (top: GLEIF 3.3M LEIs, FINESS 242k French health facilities, INSEE 48k French first names)
- 0 regressions on the 17 pre-existing healthy adapters

What I'd do differently

Test live before audit. A 30-second --run-once would have shown me upfront that 91 % of the hard-coded URLs were 4xx/5xx, which would have changed my strategy day one instead of discovering it on pass 1.

Encode "external blocker" in the goal taxonomy. fix_code | ack_stale | abandon | external_blocker is a more honest 4-bucket model than 14 + 3 + 1.

Set a Stop hook ceiling. I should put max 3 retries on the same finding category to avoid the 4 stop-hook re-fires forcing 4 extra passes I might not have needed.

Smaller goals. A single 4000-char /goal chained 5 passes. Two goals of 2000 chars each, with explicit checkpoint between them, would have been clearer.

TL;DR

Claude Code's /goal with a strict Stop hook is the most autonomy-friendly setup I've used. It works because the hook is itself an LLM reading the transcript: it can detect bullshit, force honest categorization, and refuse to let you ship empty stubs. The cost is that you have to write your conditions like contracts, with bucketed taxonomies and verifiable deliverables, and you have to accept that "honest fail" outputs are first-class.

The big methodological takeaway: iterative auditing dominates exhaustive auditing. Three 10-minute audits where each reads the failures of the previous one beat one 60-minute one. Same total cost, much higher precision.

If you're running long autonomous sessions and your model just rubber-stamps "done" without checking, you're using the wrong harness. Put a strict Stop hook on it. It will refuse to lie.

Counter-questions welcome. Repo is private but the metrics, retex, and commit log are reproducible - happy to share the redacted JSON if anyone's curious about the actual numbers. submitted by /u/hazyhaar
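The 4-bucket taxonomy and the retry ceiling from "What I'd do differently" can be sketched as plain bookkeeping over the ledger. This is only an illustration in Python: the post says the Stop hook itself is evaluated by an LLM reading the transcript, so nothing here reflects how /goal works internally. `Outcome`, `summarize`, `stop_allowed`, and `RetryBudget` are hypothetical names; only the four bucket names and the ceiling of 3 come from the post.

```python
from collections import Counter
from enum import Enum
from typing import Dict, Optional


class Outcome(Enum):
    """The four terminal buckets proposed in the post."""
    FIX_CODE = "fix_code"
    ACK_STALE = "ack_stale"
    ABANDON = "abandon"
    EXTERNAL_BLOCKER = "external_blocker"


def summarize(jobs: Dict[str, Optional[Outcome]]) -> Counter:
    """Count jobs per bucket; a job with no outcome yet counts as 'queued'."""
    return Counter(o.value if o is not None else "queued" for o in jobs.values())


def stop_allowed(jobs: Dict[str, Optional[Outcome]]) -> bool:
    """The run may stop only when every job sits in exactly one bucket."""
    return summarize(jobs)["queued"] == 0


class RetryBudget:
    """Caps how often the same finding category may trigger another pass."""

    def __init__(self, ceiling: int = 3) -> None:
        self.ceiling = ceiling
        self.used: Counter = Counter()

    def try_again(self, category: str) -> bool:
        if self.used[category] >= self.ceiling:
            return False  # ceiling reached: accept the honest failure instead
        self.used[category] += 1
        return True
```

Phrasing the condition as "no Job left queued" rather than as fixed counts per bucket avoids the situation described above, where 11 + 3 + 1 + 3 was an honest result but did not match 14 + 3 + 1.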

Originally posted by u/hazyhaar on r/ClaudeCode

13
0

Original Reddit post

Originally posted by u/AEnMo on r/ClaudeCode

14
1

Original Reddit post

I’ve spent the last while using a wide range of coding models and coding-agent environments to build and maintain real products/codebases in production. This is my personal experience-based tier list. It is not a formal benchmark.

My rough ranking is based on practical coding-agent performance across:

  • ability to understand an existing codebase
  • multi-file refactors
  • debugging accuracy
  • maintaining architectural consistency
  • ability to complete longer tasks without drifting
  • quality of implementation decisions
  • frequency of hallucinated files, APIs, or assumptions
  • how often I had to intervene
  • how production-ready the output was

My current experience:

S Tier: Claude Opus 4.7
A Tier: ChatGPT 5.5, GLM 5.1
B Tier: Qwen3.7 Plus, Kimi K2.6, DeepSeek V4 Pro, Claude Sonnet 4.6
C Tier: Qwen3.6 Plus, DeepSeek V4 Flash
D Tier: Grok 4.3, Gemini 3.1 Pro, Gemini 3.5 Flash, Nemotron, MiniMax 2.7
F Tier: Mistral Medium, MiMo V2.5 Pro

A few notes:

Claude Opus 4.7 has been the strongest for large codebase work, especially when the task requires maintaining context across multiple files and making sound implementation decisions.

ChatGPT 5.5 has been very strong and arguably is pushing into S-Tier territory, but it seems to always miss one or two things every time. I would place it close to S-tier, but in my experience Opus has been slightly more reliable for large, messy, production codebases.

GLM 5.1 surprised me. It has been much better than I expected for agentic coding work, and honestly if they had reliable providers and good business practices it probably could move into S-Tier.

Qwen, Kimi, and DeepSeek have been capable, especially for contained tasks, bug fixes, and fast iteration. I still find they require more supervision on architecture and edge cases.

Gemini has been bad for me. It can be useful, but I have seen more drift, more incomplete implementations, and more cases where I needed to re-check the work carefully.

Mistral Medium and MiMo are straight up ass bags.

Curious where other people would rank these, especially from people who have used them inside real repos rather than isolated prompts. submitted by /u/Cute_Dragonfruit4738

Originally posted by u/Cute_Dragonfruit4738 on r/ArtificialInteligence

15
1

Original Reddit post

Are there any good best practices for real corporate work? I planned to create a presentation. I asked Claude and Codex for a good story and described the slides in detail. After that I put the summary into a presentation. Both results (Claude and Codex) were so shit - like really shit. submitted by /u/AcceptableTackle6467

Originally posted by u/AcceptableTackle6467 on r/ClaudeCode

16
1

Original Reddit post

submitted by /u/monotvtv

Originally posted by u/monotvtv on r/ArtificialInteligence

17
1

Original Reddit post

I've seen a few comparisons of different models recently and how they perform at coding. A common recurring test seems to be how they handle so-called "large code bases".

As a software developer, I'm wondering: Does one really need to fully understand a large code base in order to work with it? I usually do, after some time, but never all at once, and I've seen a lot of human developers be quite productive despite not understanding everything at once all the time.

The mental context window you need to work with a code base likely depends heavily on how it is structured. If it is messy, with dependencies all over the place, then you probably do need a lot of context. If not, then only local context should do.

I see code bases like databases. An indexed query in a database should have a cost of roughly O(log N), where N is the size of the table. At least that's the complexity you get with all kinds of binary trees (I have no idea how actual databases work, but I guess they don't run on magic). This means that complexity (the number of rows you have to look at, or "context window") doesn't grow linearly with the size of the data.

Also, this is a rather pessimistic analogy. Code is not an indexed table (you can index it in various ways, but searching in indexes is not understanding). When you work on one part of a code base, chances are that 95% of the code is not relevant to your work at all, so asymptotic context window size would be closer to O(1), with any log N term being due to residual messy code and dependencies that shouldn't be there, rather than something inherent to the "algorithm".

Finding the right place in the code to touch can usually be done with mechanical (non-AI) tools, like regex search. Coding agents are in fact quite good at "outsourcing" thinking about code to mechanical tools, such as the compiler. Just like a human developer would. I have seen GPT run the compiler to get the size of a data structure when I asked it. Personally, I would have just calculated it in my head, as writing the code to have the compiler do it for me would have taken longer. But the LLM can "type" much faster than me, so it ran the dumb mechanical tool to do the math rather than consuming context tokens to do it "manually". Many human developers also use the compiler to test if their ideas are sound or which direction to go next. At least I do. Because we all have limited "context windows".

So why do we judge models on performance on large code bases? Because most code bases are messy? Because people vibe code and don't know how to keep their code clean, structured and modular? Because of untyped / uncompiled languages (JavaScript, Python, ...) where the only reliable way to get feedback on whether your code is correct is running it? If a lesser model struggles with your large project, then perhaps so would humans? submitted by /u/EC36339
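To make the O(log N) part of the database analogy concrete, here is a small Python sketch that counts how many entries a binary search inspects in a sorted index. It stands in for "an indexed query" only loosely: real database indexes are typically B-trees, and the function name is made up for this illustration.

```python
def lookups_to_find(sorted_keys, target) -> int:
    """Count how many entries a lower-bound binary search inspects."""
    lo, hi = 0, len(sorted_keys)
    steps = 0
    while lo < hi:
        mid = (lo + hi) // 2
        steps += 1  # one entry "read into context"
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return steps


if __name__ == "__main__":
    for n in (1_000, 1_000_000):
        # range(n) acts as a sorted index of n keys without allocating a list.
        print(n, lookups_to_find(range(n), n - 1))
```

For an index of N keys this loop inspects at most `N.bit_length()` entries, so going from a thousand keys to a million roughly doubles the work instead of multiplying it by a thousand.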

Originally posted by u/EC36339 on r/ArtificialInteligence

18
1

Original Reddit post

I'm a developer. I've been scaffolding and wiring up MCP servers manually for months: scaffold locally, write tests, catch the edge cases I missed, rewrite, test against a separate MCP client, write the CI config, debug the CI config, publish. That's a solid 2-3 days of focused engineering work per server. I was curious if an agent could do it better.

So I built a "Project Developer" agent inside Hyperagent. Its job: take a brief, scaffold a TypeScript MCP server from scratch, implement the tools, test everything, and ship to npm with working CI/CD. I connected it to my GitHub via a protected skill workflow; the key is stored outside the chat, never injected into a session.

I gave it four standing rules:

1. Run the full MCP test suite after every code change. No exceptions.
2. Enforce TypeScript strict mode. Validate all API responses against Zod schemas.
3. Commit with semantic versioning after every passing test run.
4. After every push: generate a markdown report of test coverage, lint status, and build health.

Then I kicked it off. Here's what happened:

The agent scaffolded the project (TypeScript, esbuild, vitest, lint-staged) and got to work. It hit the first real wall about 20 minutes in: our internal API uses a custom auth header that isn't well documented. Instead of guessing and burning through credits, it paused and asked me one specific multiple-choice question about the auth flow. I answered. It kept going.

By hour 2, it had three core MCP tools implemented and passing: query_resource, validate_payload, and sync_batch. Clean conventional commit. Pushed to a feature branch via the native Git integration.

I came back at hour 4. The agent had already spun up subagents: one handling the integration testing layer, another working the npm packaging and README in parallel. The subagent flagged something I hadn't asked it to look for: a race condition in sync_batch that unit tests don't catch. It reported back to the primary agent, which patched the bug, regenerated the lockfile, launched another subagent to harden the test infrastructure, and re-ran the full suite. 47 tests. All green. I didn't touch anything.

The CI/CD workflow came next: GitHub Actions, automated testing across Node 18/20/22, version-tag publish job. Written from scratch, no template. Another clean commit. I went to lunch.

Hour 7: I came back and it was still running. The full MCP server was live inside the agent's VM, executing final integration tests against itself. Then it did something I hadn't asked for: it generated a skill file documenting the architecture, API patterns, and a troubleshooting guide, and saved it directly to Hyperagent's skills integration. Reusable on every future MCP project. It built its own institutional memory.

Final numbers:

  • Test coverage: 94%
  • Bundle size: 42KB
  • Lint errors: 0
  • Agent runtime: 7 hours, 23 minutes
  • My active time: ~8 minutes
  • Total cost: $52.40 (Claude Opus 4.6)

The race condition catch alone was worth it. That's exactly the kind of bug that makes it into production and stays quiet until it isn't quiet anymore.

The part I keep coming back to: the agent didn't just write code. It reasoned about architecture, caught a concurrency bug I would have shipped, and generated a reusable skill so the next MCP project starts with a head start. My previous version of this workflow was 2-3 days. This was 8 minutes of my time and $52.

If you want to try it yourself, the link in my profile gets you $1,000 in Hyperagent credits to start building. Has anyone else used agents for serious backend work? What's the most complex thing you've handed off? https://preview.redd.it/17ng3ojtt43h1.png?width=1344&format=png&auto=webp&s=78721d9e65e4dce130da463873d12869cc74f6dd submitted by /u/Smart_War3981

Originally posted by u/Smart_War3981 on r/ArtificialInteligence

19
1

Original Reddit post

submitted by /u/Smart_War3981

Originally posted by u/Smart_War3981 on r/ArtificialInteligence

20
2

Original Reddit post

I often patch the system prompts on my Claude Code executable in order to make Claude more effective. Every time I upgrade, I ask Claude himself to dissect the new binary and look for problematic system prompts to modify. Was upgrading to v2.1.150 today and discovered something that's rather alarming: Claude Code now allows Anthropic to perform remote system prompt injection via the network.

Two data sources. First, an API call to api.anthropic.com/api/claude_cli/bootstrap at startup, which also gets cached to disk. Second, a GrowthBook feature flag (tengu_heron_brook) that refreshes every 60 seconds with background sync. Any string returned by these endpoints gets injected into the system prompt of the LLM with shell access.

Previous versions also had an injection point, but it was dead code and simply returned null. Bisected it and found that this was introduced in v2.1.150. The changelog says "Internal infrastructure improvements (no user-facing changes)" which is quite the understatement.

I've verified to the best of my ability that CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1 blocks this. I will also be setting DISABLE_GROWTHBOOK=1 for good measure.

Verification commands:

    npm pack @anthropic-ai/claude-code-linux-x64@2.1.150 --pack-destination /tmp
    tar xzf /tmp/anthropic-ai-claude-code-linux-x64-2.1.150.tgz
    strings package/claude | grep -oP 'function nAA\(\)\{[^}]+\}'
    strings package/claude | grep -oP '.{0,60}heron_brook.{0,60}'

nAA reads the cached value from disk. The network fetch happens at startup in function n0A. Rv("heron_brook", () => nAA()) registers it as a section of the system prompt, alongside all the core behavioral instructions. These minified names are specific to this binary. submitted by /u/matheusmoreira

Originally posted by u/matheusmoreira on r/ClaudeCode

21
1

Original Reddit post

I’ve been using Codex pretty heavily and the experience lately feels noticeably worse for real coding work. It feels less reliable than Opus 4.7 for longer tasks, repo context, and staying on track through multi-step changes. I know Claude 4.7/Claude Code is not perfect either, and 5.5-level reasoning is obviously strong in some ways, but for actual day-to-day coding I’m starting to feel like I’d rather have the Claude workflow back. For people still using Claude Code heavily: how has it been recently? Main things I care about:

  • long-running repo work
  • fewer weird interruptions
  • better codebase memory/context
  • reliable tool use
  • not needing to babysit every step

Curious if people here think Claude Code is still better for serious coding sessions, or if I’m just hitting bad Codex edge cases. submitted by /u/Mangohawkami

Originally posted by u/Mangohawkami on r/ClaudeCode

22
1

Original Reddit post

Originally posted by u/Ill_Particular_3385 on r/ClaudeCode

23
1

Original Reddit post

Hey everyone, I just released my cinematic historical movie about the Fall of Constantinople in 1453 - the final day of the Eastern Roman Empire and the last stand of Constantine XI. This here is just a trailer; the full video is available below.

This is not a documentary-style recap. I wanted it to feel like a real historical war movie: the Theodosian Walls collapsing, the defenders holding the breach, Giustiniani’s fall, Constantine’s final speech, and the city slowly breaking apart as the last Roman Empire dies. There are no historical records of any Constantine speech or of him making the last stand; that is my own addition to the story, as I wanted to add a bit of life to the main character. But he did die alongside his soldiers.

I put a lot of work into the visuals, music, pacing, battle atmosphere, and emotional storytelling. The goal was to make it feel tragic, cinematic, and grounded: not fantasy, not a game trailer, but a serious historical movie.

My previous historical AI-assisted videos have started to find an audience too. My video of Rome ambushed in the Teutoburg Forest reached over 360k views, and my Battle of Vienna video (the liberation of Vienna by Polish hussars) has now passed 100k. It feels like people are slowly becoming more open to AI-assisted historical filmmaking when the effort, research, and storytelling are actually there.

This video took me 80 hours of work. Would really appreciate feedback on the visuals, music, editing, and whether the story hits emotionally. https://youtu.be/ETWReCtxUPY submitted by /u/theodore_70

Originally posted by u/theodore_70 on r/ArtificialInteligence

24
1

Original Reddit post

I have been having a lot of success with creating assistants and wikis with Claude Code, with a lot of custom instructions and knowledge. I want to make them available for others to use while protecting the instructions and knowledge base (KB). Only Claude Code has been able to give me the output I desire. I don’t want to use GPTs, Gems, etc. All the files are md files. How can I ship this so others can use it and still protect the IP? Is there an alternative to building out a full application? submitted by /u/perrylawrence

Originally posted by u/perrylawrence on r/ClaudeCode

25
1

Original Reddit post

Hi everyone, I wanted to share a project I’ve been working on called LEMoE (Light Easy Mix of Experts).

The Backstory & Why I Built It: I’ve always been fascinated by the Mixture of Experts (MoE) architecture, but I wanted to take the concept further and use it in a more extended way. I felt that most existing solutions were either too heavy, baked into specific model weights, or lacked advanced routing logic. I wanted a flexible, external routing layer that could orchestrate different specialized APIs (Ollama, OpenAI, etc.) with more practical, production-ready features.

What it does & How it works: LEMoE acts as an API proxy (fully compatible with OpenAI and Ollama clients). You configure different "experts" (LLMs specialized in coding, writing, reasoning, etc.) via JSON. When a prompt comes in, it routes it to the best expert. But I wanted to add some smart features that make it stand out:

  • Cascading Contextual Routing: Most API routers only evaluate the very last prompt, which breaks down when a user says something ambiguous like "make it shorter". LEMoE statelessly evaluates the last 2-3 messages in the conversation history to maintain topic continuity, cascading down only if confidence is low.
  • Silent Self-Correction: If one of your backend experts fails (API timeout, server down, etc.), LEMoE silently and instantly redirects the request to a fallback expert. The end user never sees an error, and it’s logged server-side for the admin.
  • Completely Stateless: It doesn't require databases, complex sessions, or heavy RAM usage. Everything is handled on the fly using standard API message arrays.

How it compares to competitors: Unlike native MoE models (which require massive VRAM and dedicated hardware to load multiple experts), LEMoE lets you run lightweight local models (or mix them with external APIs) on standard hardware. Compared to simple API routers, LEMoE handles multi-turn conversation context for routing and offers built-in silent error failovers out of the box.

Current State & License: The project is actively developed. It's ready to use, but since it’s in active development, there might still be some bugs. I would absolutely love it if you guys could test it out and give me some feedback, suggestions, or feature requests! It is completely free and open-source for personal/non-commercial use.

Links:

  • GitHub Repository: https://github.com/lemoelink/LeMoE
  • Documentation (EN): https://docs.lemoe.link/en/
  • Official Website: https://lemoe.link/

submitted by /u/r0dr111
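The two routing ideas described in this post, widening the context window only when confidence is low and silently falling back when an expert fails, can be illustrated with a toy Python sketch. This is not LEMoE's code or API: the keyword scorer is a deliberately naive stand-in for whatever classifier the project uses, and `KEYWORDS`, `score`, `route`, `call_with_fallback`, and the 0.6 threshold are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical expert profiles; a real router would use a better classifier.
KEYWORDS = {
    "coding": {"python", "bug", "function"},
    "writing": {"essay", "email", "paragraph"},
}


def score(text: str) -> Tuple[str, float]:
    """Return (best expert, confidence in [0, 1]) for a piece of text."""
    words = set(text.lower().split())
    hits = {name: len(words & kws) for name, kws in KEYWORDS.items()}
    best = max(hits, key=hits.get)
    total = sum(hits.values())
    return best, (hits[best] / total if total else 0.0)


def route(messages: List[str], threshold: float = 0.6, max_window: int = 3) -> str:
    """Try the last message first; widen to the last 2-3 only if unsure."""
    for window in range(1, max_window + 1):
        expert, confidence = score(" ".join(messages[-window:]))
        if confidence >= threshold:
            return expert
    return "general"  # nothing in the window was decisive


def call_with_fallback(experts: Dict[str, Callable[[str], str]],
                       order: List[str], prompt: str) -> str:
    """Call experts in order; log failures and move on instead of raising."""
    for name in order:
        try:
            return experts[name](prompt)
        except Exception as exc:
            print(f"[server log] expert {name!r} failed: {exc}")
    raise RuntimeError("all experts failed")
```

Because `route` takes the message list as an argument and keeps nothing between calls, the sketch stays stateless in the sense the post describes.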

Originally posted by u/r0dr111 on r/ArtificialInteligence


AI (Reddit RSS)


AI (Reddit RSS Feed)

founded 3 months ago