Introducing effGen — The Future of SLM Agents

Build Powerful AI Agents
with Small Language Models

A production-ready framework optimized for Small Language Models: 5-10x faster inference with vLLM, automatic task decomposition, and multi-agent orchestration.

5-10x Faster Inference · 15+ Built-in Tools · 3 Protocols · 100% Open Source
agent.py
from effgen import Agent, load_model

# Load any Hugging Face model with vLLM
model = load_model(
    "Qwen/Qwen2.5-7B-Instruct",
    engine="vllm",
    tensor_parallel_size=2
)

# Create agent with built-in tools
agent = Agent(
    model=model,
    tools=["web_search", "code_executor", "calculator"],
    enable_memory=True,
    enable_decomposition=True
)

# Execute complex multi-step tasks
result = agent.run("""
    Research the latest AI papers from 2024,
    calculate average citations for top 5,
    and generate a summary report.
""")

print(result)  # ✨ Magic happens!

Features

Everything You Need to Build
Production-Ready AI Agents

Optimized for Small Language Models with enterprise-grade features

Intelligent Task Decomposition

Automatically breaks down complex tasks with multi-dimensional complexity analysis and spawns specialized sub-agents.

  • Automatic complexity scoring
  • Sub-agent routing
  • Parallel execution
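The decomposition flow above can be sketched in plain Python. Everything in this sketch is illustrative: the `score_complexity` heuristic, its dimensions, and the routing threshold are assumptions, not effGen's actual internals.

```python
# Illustrative sketch of complexity-scored task routing; the heuristic
# and threshold are assumptions, not effGen's documented internals.

def score_complexity(task: str) -> float:
    """Score a task on a few simple dimensions, normalized to 0.0-1.0."""
    steps = task.count(",") + task.count(" and ") + 1   # rough step count
    length = min(len(task.split()) / 50, 1.0)           # normalized length
    multi_step = min(steps / 5, 1.0)
    return 0.5 * length + 0.5 * multi_step

def route(task: str, threshold: float = 0.4) -> str:
    """Send complex tasks to sub-agents, simple ones to a single agent."""
    if score_complexity(task) >= threshold:
        return "decompose"   # spawn specialized sub-agents, run in parallel
    return "single_agent"

print(route("What is 2 + 2?"))                                  # single_agent
print(route("Research papers, rank them, and write a report"))  # decompose
```

A real scorer would weigh more dimensions (tool requirements, dependencies between steps), but the shape is the same: score, compare to a threshold, route.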

Universal Tool Integration

15+ built-in tools with full MCP, A2A, and ACP protocol support. Create custom tools in minutes.

  • Web search & Wikipedia
  • Code execution sandbox
  • MCP client/server
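Registering a custom tool might look like the following sketch. The `@tool` decorator and `TOOLS` registry here are hypothetical stand-ins, since the exact effGen tool API isn't shown above.

```python
# Hypothetical sketch of a custom-tool registry; the @tool decorator
# and TOOLS dict are illustrative, not effGen's actual API.
from typing import Callable

TOOLS: dict[str, Callable] = {}

def tool(name: str):
    """Register a plain function as an agent-callable tool."""
    def register(fn: Callable) -> Callable:
        TOOLS[name] = fn
        return fn
    return register

@tool("word_count")
def word_count(text: str) -> int:
    """Count the words in a text snippet."""
    return len(text.split())

# The agent would dispatch by tool name at run time:
print(TOOLS["word_count"]("small models punch above their weight"))  # 6
```

The same registry shape is what protocol bridges like MCP expose: a name, a callable, and a schema the model can read.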

SLM-Optimized Prompts

Advanced prompt engineering specifically designed for smaller models with Jinja2 templates and few-shot learning.

  • Template management
  • Context compression
  • Chain orchestration
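Few-shot prompt assembly for a small model can be sketched with the stdlib's `string.Template` (the framework itself names Jinja2; `string.Template` is used here only to keep the sketch dependency-free, and the template text is illustrative):

```python
# Minimal sketch of few-shot prompt assembly for a small model.
# Template text and example pairs are illustrative, not shipped templates.
from string import Template

PROMPT = Template(
    "You are a concise assistant.\n"
    "$examples\n"
    "Q: $question\nA:"
)

def build_prompt(question: str, shots: list[tuple[str, str]]) -> str:
    """Render few-shot examples plus the new question into one prompt."""
    examples = "\n".join(f"Q: {q}\nA: {a}" for q, a in shots)
    return PROMPT.substitute(examples=examples, question=question)

shots = [("Capital of France?", "Paris")]
print(build_prompt("Capital of Japan?", shots))
```

Keeping the instruction short and the examples concrete matters more for 7B-class models than for frontier models, which is the point of maintaining these as managed templates.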

Multi-Agent Orchestration

Coordinate multiple specialized agents with lifecycle management and agent-to-agent communication.

  • Task routing
  • Shared memory
  • A2A protocol
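A minimal sketch of routing plus shared memory, with plain functions standing in for agents; the agent names, the blackboard dict, and the keyword routing rule are assumptions, not effGen internals:

```python
# Illustrative two-agent pipeline with a shared-memory blackboard.
# Agent names and the keyword routing rule are assumptions.

shared_memory: dict[str, str] = {}   # blackboard visible to all agents

def researcher(task: str) -> str:
    shared_memory["notes"] = f"notes on: {task}"
    return shared_memory["notes"]

def writer(task: str) -> str:
    notes = shared_memory.get("notes", "")
    return f"report({notes})"

AGENTS = {"research": researcher, "write": writer}

def route_task(task: str) -> str:
    """Pick a specialized agent with a naive keyword rule."""
    name = "research" if "research" in task.lower() else "write"
    return AGENTS[name](task)

route_task("Research SLM papers")
print(route_task("Write the summary"))  # report(notes on: Research SLM papers)
```

In a real deployment the blackboard would be the orchestrator's managed memory and the hand-off would go over the A2A protocol rather than a dict.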

Ultra-Fast vLLM Integration

Native vLLM support delivers 5-10x faster inference compared to standard Transformers. Automatic multi-GPU tensor parallelism and PagedAttention for optimal performance.

  • 5-10x faster inference
  • 60% memory reduction
  • Auto multi-GPU support

Production Infrastructure

Docker sandboxed execution, comprehensive logging, state persistence, and enterprise security.

  • YAML configuration
  • Monitoring & metrics
  • Secret management
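Configuration for a setup like the hero example could live in a single YAML file. A minimal sketch, assuming key names like these (the actual schema isn't documented here):

```yaml
# Illustrative config sketch; key names are assumptions, not a documented schema.
model:
  name: Qwen/Qwen2.5-7B-Instruct
  engine: vllm
  tensor_parallel_size: 2
agent:
  tools: [web_search, code_executor, calculator]
  enable_memory: true
  enable_decomposition: true
sandbox:
  backend: docker
logging:
  level: info
secrets:
  provider: env        # read API keys from environment variables
```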

Quick Start

Up and Running in 60 Seconds

Three simple steps to your first AI agent

01

Install

Get started with pip or choose your preferred installation method

bash
pip install effgen[vllm]

02

Create Agent

Initialize your agent with a model and tools

python
from effgen import Agent, load_model

model = load_model("Qwen/Qwen2.5-7B-Instruct")
agent = Agent(model=model, tools=["all"])

03

Execute Tasks

Run complex multi-step tasks with a single command

python
result = agent.run(
    "Analyze the latest tech trends and "
    "create a comprehensive report"
)

Ready to Build the Future of AI?

Join thousands of developers building next-gen agents with effGen

10k+ Developers · 2.4k GitHub Stars · 150+ Contributors