Introducing effGen — The Future of SLM Agents

Build Powerful AI Agents
with Small Language Models

Optimized for SLMs. 5-10x faster with complexity routing, automatic task decomposition, multi-agent orchestration, and vLLM.

5-10x Faster (vLLM)
58+ Built-in Tools
8 Presets
14 Inference Backends
9 Cloud Providers
AGENT LOOP
Thought → Action → Observation → Answer
Features

Everything You Need to Build
Production-Ready AI Agents

Optimized for Small Language Models with production-grade features

Policy-Based ModelRouter

Compose FirstAvailable, CostBased, and LatencyBased policies over the cloud providers with explainable RouterDecisions and transparent failover.

  • Cost, latency, and first-available policies
  • Auto-failover on rate-limit, 5xx, timeout, budget
  • RouterEvent subscribers · cost CLI
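
To make the policy idea concrete, here is a minimal, self-contained sketch of cost, latency, and first-available policies with failover on a timeout. It does not use effGen's ModelRouter API: `Provider`, `Decision`, `call_with_failover`, the provider names, and all cost and latency figures are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Provider:
    name: str
    cost_per_1k_tokens: float  # illustrative figure, not real pricing
    p50_latency_ms: float      # illustrative figure, not a measurement
    available: bool = True


@dataclass(frozen=True)
class Decision:
    provider: str
    reason: str  # human-readable explanation of why this provider was chosen


Policy = Callable[[List[Provider]], Optional[Decision]]


def first_available(providers: List[Provider]) -> Optional[Decision]:
    for p in providers:
        if p.available:
            return Decision(p.name, "first available provider in list order")
    return None


def cost_based(providers: List[Provider]) -> Optional[Decision]:
    live = [p for p in providers if p.available]
    if not live:
        return None
    best = min(live, key=lambda p: p.cost_per_1k_tokens)
    return Decision(best.name, f"lowest cost: {best.cost_per_1k_tokens} per 1k tokens")


def latency_based(providers: List[Provider]) -> Optional[Decision]:
    live = [p for p in providers if p.available]
    if not live:
        return None
    best = min(live, key=lambda p: p.p50_latency_ms)
    return Decision(best.name, f"lowest latency: {best.p50_latency_ms} ms")


def call_with_failover(providers: List[Provider], policy: Policy,
                       send: Callable[[str], str]):
    """Ask the policy for a provider; on a timeout, drop it and ask again."""
    remaining = list(providers)
    while True:
        decision = policy(remaining)
        if decision is None:
            raise RuntimeError("every provider failed or is unavailable")
        try:
            return decision, send(decision.provider)
        except TimeoutError:
            remaining = [p for p in remaining if p.name != decision.provider]


PROVIDERS = [
    Provider("alpha", 0.50, 900.0),
    Provider("beta", 0.10, 1500.0),
    Provider("gamma", 0.30, 200.0),
]


def flaky_send(provider_name: str) -> str:
    # Simulate the cheapest provider timing out so failover is exercised.
    if provider_name == "beta":
        raise TimeoutError("simulated timeout")
    return f"response from {provider_name}"


decision, reply = call_with_failover(PROVIDERS, cost_based, flaky_send)
print(decision.provider, "|", decision.reason, "|", reply)
```

Each policy returns a reason string next to its pick, which is one simple way to keep routing decisions explainable.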

14 Inference Backends

Five local engines plus 9 cloud providers: OpenAI, Anthropic, Gemini, Cerebras, Groq, Together, Fireworks, Replicate, and HF Inference.

  • 9 cloud providers with one Agent API
  • Provider-prefixed model IDs
  • Streaming + provider-supported tools
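
As a rough illustration of provider-prefixed model IDs, the sketch below splits an ID such as `groq:llama-3.3-70b-versatile` into a provider and a model name. The `groq:` prefix appears in the Quick Start on this page; the other prefixes, the `split_model_id` helper, and its fallback behavior are assumptions, not effGen's actual resolution logic.

```python
from typing import Optional, Tuple

# Only "groq" is confirmed by this page; the other prefixes are assumed.
KNOWN_PREFIXES = {"groq", "openai", "cerebras"}


def split_model_id(model_id: str,
                   default_provider: Optional[str] = None) -> Tuple[Optional[str], str]:
    """Return (provider, model). IDs without a known prefix keep the default."""
    prefix, sep, rest = model_id.partition(":")
    if sep and rest and prefix.lower() in KNOWN_PREFIXES:
        return prefix.lower(), rest
    return default_provider, model_id


print(split_model_id("groq:llama-3.3-70b-versatile"))
print(split_model_id("Qwen/Qwen2.5-3B-Instruct"))
```

Treating an unknown prefix as part of the model name keeps IDs that legitimately contain a colon from being misrouted.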

ProviderRegistry + Doctor

A unified provider registry lists providers, resolves models, catches ambiguous IDs, and checks API key readiness.

  • list_providers() / list_models()
  • effgen doctor --json
  • Unified ModelAuthError
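
The key-readiness check can be pictured as a lookup over environment variables. This is a sketch only: the variable names match the ones exported in the Quick Start on this page, but `PROVIDER_ENV_VARS`, `key_status`, and `report` are hypothetical helpers and do not reflect how `effgen doctor` is implemented.

```python
import json
import os
from typing import Dict, Mapping, Optional

# Provider name -> environment variable (names taken from the Quick Start).
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "groq": "GROQ_API_KEY",
    "cerebras": "CEREBRAS_API_KEY",
}


def key_status(env: Optional[Mapping[str, str]] = None) -> Dict[str, bool]:
    """Report which providers have a non-empty API key set."""
    env = os.environ if env is None else env
    return {name: bool(env.get(var, "").strip())
            for name, var in PROVIDER_ENV_VARS.items()}


def report(env: Optional[Mapping[str, str]] = None) -> str:
    """Machine-readable form of the same status table."""
    return json.dumps(key_status(env), indent=2, sort_keys=True)


print(report({"OPENAI_API_KEY": "sk-example"}))
```

Passing the environment in as a mapping keeps the check easy to test without touching real credentials.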

Automatic Sub-Agent Routing

Runs complex tasks through Agent sub-agent mode with routing, decomposition, and result synthesis built into the agent loop.

  • AgentMode.AUTO routing
  • Built-in decomposition engine
  • Parallel or sequential sub-agents
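
The routing, decomposition, and synthesis steps can be sketched in a few lines. Everything here is a simplified stand-in: `decompose` just splits on semicolons, `run_sub_agent` returns a canned string instead of calling a model, and none of these names come from effGen.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def decompose(task: str) -> List[str]:
    """Naive decomposition: treat each semicolon-separated clause as a subtask."""
    return [part.strip() for part in task.split(";") if part.strip()]


def choose_mode(subtasks: List[str]) -> str:
    """Route simple tasks to a single agent and multi-part tasks to sub-agents."""
    return "sub_agents" if len(subtasks) > 1 else "single"


def run_sub_agent(subtask: str) -> str:
    # Stand-in for a real agent call.
    return f"done: {subtask}"


def run_task(task: str, parallel: bool = True) -> str:
    subtasks = decompose(task)
    if choose_mode(subtasks) == "sub_agents" and parallel:
        with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
            # map() yields results in submission order, so synthesis is stable.
            results = list(pool.map(run_sub_agent, subtasks))
    else:
        results = [run_sub_agent(s) for s in subtasks]
    return " | ".join(results)  # trivial "synthesis" step


print(run_task("summarize the report; list open risks; draft an email"))
```

Because `map()` preserves order, the parallel and sequential paths produce the same synthesized result.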

Multi-Agent Orchestration

Coordinate multiple specialized agents with lifecycle management and agent-to-agent communication.

  • Team patterns
  • Shared state
  • Message bus
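
A message bus plus shared state is one common way for agents to coordinate. The sketch below is a generic in-process publish/subscribe pattern, not effGen's orchestration API; `MessageBus`, the topic name, and both "agents" are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

Handler = Callable[[Any], None]


class MessageBus:
    """Minimal synchronous publish/subscribe bus."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, message: Any) -> int:
        """Deliver the message to every subscriber; return how many received it."""
        handlers = list(self._handlers.get(topic, ()))
        for handler in handlers:
            handler(message)
        return len(handlers)


shared_state: Dict[str, Any] = {}
bus = MessageBus()


def writer_agent(findings: List[str]) -> None:
    # Consumes research findings and records a summary in shared state.
    shared_state["summary"] = f"{len(findings)} findings: " + ", ".join(findings)


def researcher_agent(bus: MessageBus) -> None:
    # Stand-in for an agent that produces findings and announces them.
    bus.publish("research.done", ["finding A", "finding B"])


bus.subscribe("research.done", writer_agent)
researcher_agent(bus)
print(shared_state["summary"])
```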

Ultra-Fast vLLM Integration

Native vLLM support delivers 5-10x faster inference, with automatic multi-GPU tensor parallelism and PagedAttention.

  • 5-10x faster inference
  • PagedAttention memory efficiency
  • Auto multi-GPU support

Universal Tool Integration

58+ local tools plus provider-native tools across OpenAI, Gemini, and experimental Anthropic adapter specs.

  • 58+ local tools (docs, OCR, audio, image, geo, comms, research, news, social, finance, DevOps, RAG…)
  • OpenAI / Gemini Agent-native tools
  • Anthropic experimental tool specs

Guardrails & Safety

Offline, ML-free guardrails for toxicity, PII, prompt injection, topics, length, and tool safety. Composable chains with four presets.

  • PII (SSN/email/phone/CC-Luhn), Toxicity, Topic, Length
  • Prompt-injection detection (low/med/high)
  • Tool input/output/permission guardrails
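
The "CC-Luhn" item refers to validating candidate card numbers with the Luhn checksum, which keeps arbitrary digit runs from being flagged as PII. Here is a minimal offline sketch of that idea; the regex, the `[REDACTED-CC]` placeholder, and the function names are assumptions, and effGen's own guardrail may work differently.

```python
import re


def luhn_valid(number: str) -> bool:
    """Luhn checksum over the digits in the string (separators are ignored)."""
    digits = [int(c) for c in number if c.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def redact_cards(text: str) -> str:
    """Replace only the candidates that pass the Luhn check."""
    def _replace(match: "re.Match[str]") -> str:
        return "[REDACTED-CC]" if luhn_valid(match.group()) else match.group()
    return CARD_CANDIDATE.sub(_replace, text)


# 4111 1111 1111 1111 is a widely used test number that passes Luhn.
print(redact_cards("Card 4111 1111 1111 1111 on file"))
```

Checking the checksum after the regex match is what makes this cheaper to run and less noisy than pattern matching alone.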
Architecture

How effGen Works

The ReAct agent loop — reasoning and acting in perfect harmony

01

User Input

Natural language task or query is received by the agent

agent.run("Calculate 24344 * 334")
02

Reasoning

Agent analyzes the task using ReAct-style thinking

Thought: I need to multiply these numbers...
03

Tool Selection

Best tool is selected from 58+ built-in options

Action: Calculator(24344 * 334)
04

Execution

Tool runs in a sandboxed environment with safety controls

Executing Calculator...
05

Observation

Agent observes and validates the tool output

Observation: 8130896
06

Final Answer

Synthesized response returned to the user

Answer: 8,130,896
Steps 2-5 repeat until the task is complete (max_iterations configurable)
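
The six steps above can be traced with a toy loop. This sketch replaces the language model with a scripted function and the Calculator tool with a multiply-only stub, so `scripted_model`, `calculator`, and `run` are hypothetical stand-ins and not effGen internals; only the Thought/Action/Observation/Answer shape is taken from this page.

```python
import re
from typing import Callable, Dict, List


def calculator(expression: str) -> str:
    """Multiply-only stub standing in for a real calculator tool."""
    left, right = expression.split("*")
    return str(int(left) * int(right))


TOOLS: Dict[str, Callable[[str], str]] = {"Calculator": calculator}


def scripted_model(task: str, history: List[str]) -> str:
    """Stand-in for an SLM: act first, answer once an observation exists."""
    observations = [h for h in history if h.startswith("Observation:")]
    if not observations:
        expression = re.search(r"\d+\s*\*\s*\d+", task).group()
        return ("Thought: I need to multiply these numbers.\n"
                f"Action: Calculator({expression})")
    value = int(observations[-1].split(":", 1)[1])
    return f"Answer: {value:,}"


def run(task: str, max_iterations: int = 5) -> str:
    history: List[str] = []
    for _ in range(max_iterations):
        step = scripted_model(task, history)       # reasoning + tool selection
        history.append(step)
        if step.startswith("Answer:"):
            return step.split(":", 1)[1].strip()   # final answer
        match = re.search(r"Action: (\w+)\((.*)\)", step)
        tool_name, argument = match.group(1), match.group(2)
        result = TOOLS[tool_name](argument)        # execution
        history.append(f"Observation: {result}")   # observation
    raise RuntimeError("max_iterations reached without an answer")


print(run("Calculate 24344 * 334"))  # prints 8,130,896
```

The `max_iterations` cap is what stops a model that never emits an answer from looping forever.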
Built-in Tools

58+ Tools Ready to Use

From finance, data science, DevOps, and academic research to news, social, translation, QR codes, OCR, audio transcription, image analysis, document parsing (PDF/DOCX/Excel), geo/weather, and email/webhook communication — everything your agent needs, built in. 14 new tools landed in v0.2.6.

🧮
COMPUTATION

Calculator

Perform mathematical calculations, evaluate expressions, and convert units

Parameters: expression, operation, from_unit, to_unit, precision
CODE

CodeExecutor

Execute code in a secure sandboxed environment (Python, JS, Bash)

Parameters: code, language, timeout, memory_limit, network_enabled, files, env_vars
🐍
CODE

PythonREPL

Execute Python code in a persistent REPL session

Parameters: code, session_id, reset_session, return_variables, restricted_mode
🔍
INFO

WebSearch

Search the web using DuckDuckGo, SerpAPI, or Google

Parameters: query, num_results, backend, time_range, language, region
🌐
INFO

URLFetchTool

Fetch webpage content and extract readable text

Parameters: url, extract_links
📋
DATA

JSONTool

Parse, query (JSONPath), validate, and format JSON data

Parameters: data, operation, query
📁
FILES

FileOperations

Safe file system operations: read, write, search, convert

Parameters: operation, path, content, format, encoding, pattern, recursive, target_format
💻
SYSTEM

BashTool

Execute shell commands with security controls

command
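
For a sense of what a tool with security controls involves, here is one way a calculator-style tool could evaluate an `expression` argument without handing it to `eval`. It walks the parsed syntax tree and allows only numbers, basic arithmetic, and `sqrt`. This is an illustrative sketch built on Python's standard `ast` module; the `evaluate` function and its allow-lists are assumptions, not the implementation of effGen's Calculator.

```python
import ast
import math
import operator

# Allow-lists: anything not named here is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}
_FUNCTIONS = {"sqrt": math.sqrt}


def evaluate(expression: str) -> float:
    """Evaluate a small arithmetic expression by walking its syntax tree."""

    def _eval(node: ast.AST) -> float:
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in _FUNCTIONS
                and not node.keywords):
            return _FUNCTIONS[node.func.id](*[_eval(arg) for arg in node.args])
        raise ValueError(f"unsupported expression element: {type(node).__name__}")

    return _eval(ast.parse(expression, mode="eval").body)


print(evaluate("(17 * 23) + sqrt(144)"))  # prints 403.0
```

An input such as `__import__('os')` parses as a call to a name outside the allow-list, so it raises `ValueError` instead of executing.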
Agent Presets

One-Line Agent Creation with 8 Presets

Ready-to-use configurations optimized for common use cases. v0.2.6 adds media and notify.

🧮

math

Mathematical computations

temp: 0.3 · itr: 8
create_agent("math", model)
🔬

research

Web · academic · news · social · video · docs

temp: 0.5 · itr: 10
create_agent("research", model)
💻

coding

Code execution & development

temp: 0.4 · itr: 12
create_agent("coding", model)
🚀

general

All 32 general-purpose tools

temp: 0.7 · itr: 10
create_agent("general", model)
📖

rag

Retrieval-augmented Q&A over your docs

temp: 0.3 · itr: 8
create_agent("rag", model, knowledge_base="./docs/")
🎞️

media

Audio transcription + vision captioning

temp: 0.3 · itr: 8
create_agent("media", model)
📢

notify

Email + Slack + Discord notifications

temp: 0.3 · itr: 6
create_agent("notify", model)

minimal

Direct inference, no tools

temp: 0.7 · itr: 1
create_agent("minimal", model)
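
For quick comparison, the preset values listed above can be written as a small lookup table. The numbers are copied from the cards on this page; the `PresetConfig` type and `get_preset` helper are hypothetical, and reading "itr" as a maximum iteration count is an assumption.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class PresetConfig:
    temperature: float
    max_iterations: int  # assumed meaning of "itr" on the preset cards


PRESETS: Dict[str, PresetConfig] = {
    "math": PresetConfig(0.3, 8),
    "research": PresetConfig(0.5, 10),
    "coding": PresetConfig(0.4, 12),
    "general": PresetConfig(0.7, 10),
    "rag": PresetConfig(0.3, 8),
    "media": PresetConfig(0.3, 8),
    "notify": PresetConfig(0.3, 6),
    "minimal": PresetConfig(0.7, 1),
}


def get_preset(name: str) -> PresetConfig:
    """Look up a preset, failing with the list of valid names."""
    try:
        return PRESETS[name]
    except KeyError:
        raise ValueError(
            f"unknown preset {name!r}; choose from {sorted(PRESETS)}"
        ) from None


print(get_preset("coding"))
```

The table shows the pattern behind the presets: tool-heavy, precision-sensitive tasks run at lower temperatures, and the tool-free `minimal` preset needs only a single iteration.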
Quick Start

Up and Running in 60 Seconds

Three practical steps to your first local or cloud-backed AI agent. Expand the advanced section for budget-aware routing, thinking/grounding/caching, and persistent cost tracking.

01

Install

One command for the core; add an extra only when you need that provider.

bash
pip install -U effgen

# Optional extras
pip install "effgen[vllm]"       # local CUDA throughput
pip install "effgen[cerebras]"   # Cerebras
pip install "effgen[groq]"       # Groq
02

Set Keys & Check

Export the keys you have, then run the doctor to confirm what's wired up.

bash
export OPENAI_API_KEY="..."
export GROQ_API_KEY="..."
export CEREBRAS_API_KEY="..."

effgen doctor               # table of providers + key status
effgen doctor --json        # machine-readable
03

Create an Agent

Same Agent API across local SLMs and any registered cloud provider.

python
from effgen import load_model
from effgen.presets import create_agent

# Cloud
model = load_model("gpt-5.4-nano", provider="openai")
# model = load_model("groq:llama-3.3-70b-versatile")

# Or local
# model = load_model("Qwen/Qwen2.5-3B-Instruct", quantization="4bit")

agent = create_agent("general", model)
print(agent.run("What is (17 * 23) + sqrt(144)?").output)  # 403
Start Building Today — It's Free

Ready to Build the
Future of AI?

Join thousands of developers building next-gen agents with effGen. Open source, production-ready, and blazing fast.
