Build Powerful AI Agents
with Small Language Models
Optimized for SLMs: 5-10x faster inference with vLLM, plus complexity routing, automatic task decomposition, and multi-agent orchestration.
from effgen import Agent, load_model
from effgen.tools.builtin import Calculator, PythonREPL

# Load a small but mighty model
model = load_model("Qwen/Qwen2.5-1.5B-Instruct", quantization="4bit")

# Create agent with tools
agent = Agent(name="math_agent", model=model, tools=[Calculator(), PythonREPL()])

result = agent.run("What is 24344 * 334?")
print(f"Answer: {result.output}")  # 8,130,896

Everything You Need to Build
Production-Ready AI Agents
Optimized for Small Language Models with enterprise-grade features
Intelligent Task Decomposition
Automatically breaks down complex tasks with multi-dimensional complexity analysis and spawns specialized sub-agents.
- Automatic complexity scoring
- Sub-agent routing
- Parallel execution
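To make the routing idea concrete, here is a toy sketch of multi-dimensional complexity scoring in plain Python. The dimensions, keyword list, and threshold are all invented for illustration; this is not effGen's actual scorer.

```python
# Illustrative sketch of complexity-based routing (not effGen's real implementation).
# A task is scored on rough dimensions; high scores route to task decomposition.

def complexity_score(task: str) -> float:
    """Toy multi-dimensional score: length, conjunctions, and step keywords."""
    words = task.lower().split()
    length_dim = min(len(words) / 50, 1.0)                 # longer tasks score higher
    conjunctions = sum(w in ("and", "then", "after") for w in words)
    step_dim = min(conjunctions / 3, 1.0)                  # multi-step phrasing
    keyword_dim = any(k in task.lower() for k in ("analyze", "report", "compare"))
    return (length_dim + step_dim + float(keyword_dim)) / 3

def route(task: str, threshold: float = 0.4) -> str:
    """Route simple tasks to a single agent, complex ones to decomposition."""
    return "decompose" if complexity_score(task) >= threshold else "single_agent"

print(route("What is 2 + 2?"))                               # single_agent
print(route("Analyze sales data and then create a report"))  # decompose
```

A real scorer would weigh many more signals (tool requirements, context length, ambiguity), but the shape is the same: score, threshold, route.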
Universal Tool Integration
14 built-in tools with full MCP, A2A, and ACP protocol support. Create custom tools in minutes.
- 14 tools (search, code, JSON, weather...)
- Plugin system for custom tools
- MCP/A2A/ACP protocols
SLM-Optimized Prompts
Advanced prompt engineering designed for smaller models with Jinja2 templates and few-shot learning.
- Template management
- Context compression
- Chain orchestration
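A minimal sketch of how templated few-shot prompting works, using the stdlib `string.Template` in place of Jinja2; the prompt text and example shots are invented for illustration.

```python
from string import Template

# Illustrative few-shot template; effGen uses Jinja2 templates, but the
# structure is the same: rendered examples precede the user's question.
FEW_SHOT = Template(
    "You are a concise math assistant.\n\n"
    "$examples\n"
    "Q: $question\nA:"
)

shots = [("What is 2 + 2?", "4"), ("What is 10 * 3?", "30")]
examples = "\n".join(f"Q: {q}\nA: {a}" for q, a in shots)

prompt = FEW_SHOT.substitute(examples=examples, question="What is 7 * 6?")
print(prompt)
```

Keeping examples short and answers terse matters more for small models than large ones, which is why template and context-compression machinery is a first-class feature here.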
Multi-Agent Orchestration
Coordinate multiple specialized agents with lifecycle management and agent-to-agent communication.
- Task routing
- Shared memory
- A2A protocol
Ultra-Fast vLLM Integration
Native vLLM support delivers 5-10x faster inference, with automatic multi-GPU tensor parallelism and PagedAttention memory management.
- 5-10x faster inference
- 60% memory reduction
- Auto multi-GPU support
Production Infrastructure
Docker sandboxed execution, comprehensive logging, state persistence, and enterprise security.
- CI/CD pipelines
- OpenTelemetry tracing
- Prometheus metrics
Real Token Streaming
True token-by-token streaming via generate_stream() with callbacks for thoughts, tool calls, and answers.
- Token-by-token output
- SSE API endpoint
- Streaming callbacks
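The callback-driven streaming pattern can be sketched with a plain generator. The event types mirror the thoughts/tool-call/answer callbacks named above, but the `generate_stream` below is a stand-in with an assumed signature, not effGen's actual API.

```python
from typing import Callable, Iterator

# Conceptual sketch of callback-driven token streaming; event kinds mirror
# the thoughts / tool-call / answer callbacks, but this is not effGen's code.

def generate_stream(tokens: list[tuple[str, str]],
                    on_token: Callable[[str, str], None]) -> Iterator[str]:
    """Yield tokens one by one, firing a callback with (event_kind, token)."""
    for event_kind, token in tokens:
        on_token(event_kind, token)
        yield token

received: list[str] = []
fake_output = [("thought", "Let me compute. "),
               ("tool_call", "Calculator(24*3) "),
               ("answer", "72")]

answer = "".join(generate_stream(fake_output,
                                 lambda kind, tok: received.append(kind)))
print(received)  # ['thought', 'tool_call', 'answer']
```

Because the generator yields as it goes, a consumer (e.g. an SSE endpoint) can forward each token the moment it arrives instead of waiting for the full answer.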
Agent Presets
One-line agent creation with ready-to-use configurations. Math, research, coding, general, and minimal.
- create_agent() factory
- 5 built-in presets
- CLI --preset flag
Integrated Memory System
Short-term, long-term, and vector memory connected to every agent. Persistent multi-turn context.
- ShortTerm + LongTerm memory
- Vector store (FAISS/Chroma)
- Auto-summarization
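As a conceptual sketch, here is a short-term buffer that folds old turns into a running summary once a turn budget is exceeded. A real implementation would ask the model to write the summary; effGen's memory classes are richer than this toy.

```python
from collections import deque

# Toy short-term memory with auto-summarization on overflow.
# Not effGen's implementation; the summarization step is a crude stand-in.

class ShortTermMemory:
    def __init__(self, max_turns: int = 4):
        self.turns = deque()
        self.summary = ""
        self.max_turns = max_turns

    def add(self, turn: str) -> None:
        self.turns.append(turn)
        if len(self.turns) > self.max_turns:
            # Fold the oldest turn into a running summary (a real system
            # would call the model to summarize the evicted content).
            oldest = self.turns.popleft()
            self.summary += oldest.split(":")[0] + "; "

    def context(self) -> str:
        head = f"[summary] {self.summary}\n" if self.summary else ""
        return head + "\n".join(self.turns)

mem = ShortTermMemory(max_turns=2)
for t in ["user: hi", "agent: hello", "user: what's 2+2?", "agent: 4"]:
    mem.add(t)
print(mem.context())
```

The same eviction hook is where a vector store would come in: evicted turns get embedded and indexed so long-term recall can retrieve them later.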
14 Tools Ready to Use
From web search to code execution — everything your agent needs, built in
Calculator
Math, conversions, statistics
WebSearch
DuckDuckGo search
CodeExecutor
Sandboxed code execution
PythonREPL
Interactive Python
FileOperations
File read/write/search
Retrieval
RAG + BM25 hybrid search
AgenticSearch
ripgrep-based search
BashTool
Shell commands
WeatherTool
Open-Meteo API (free)
JSONTool
Parse, query, validate JSON
DateTimeTool
Timezones, date arithmetic
TextProcessing
Regex, word count, diff
URLFetchTool
Web page text extraction
WikipediaTool
Wikipedia article search
One-Line Agent Creation with 5 Presets
Ready-to-use configurations optimized for common use cases
math
Mathematical computations
create_agent("math", model)

research
Web research & information
create_agent("research", model)

coding
Code execution & development
create_agent("coding", model)

general
All 11 tools for any task
create_agent("general", model)

minimal
Direct inference, no tools
create_agent("minimal", model)

Up and Running in 60 Seconds
Three simple steps to your first AI agent
Install
Get started with pip. Includes vLLM for blazing-fast inference.
pip install effgen[vllm]

Create Agent
One-line agent creation with built-in presets and all 11 tools.
from effgen import load_model
from effgen.presets import create_agent
model = load_model("Qwen/Qwen2.5-3B-Instruct", quantization="4bit")
agent = create_agent("general", model)  # All 11 tools included

Execute Tasks
Run tasks with real-time token streaming and tool execution.
result = agent.run(
    "Analyze the latest tech trends and "
    "create a comprehensive report"
)

See effGen in Action
Real-world examples showcasing the power and versatility
Code Assistant
Generate, execute, and debug code with sandboxed CodeExecutor and PythonREPL. Real-time streaming.
Research Agent
Comprehensive research using WebSearch, Wikipedia, and URLFetch. Detailed reports with citations.
Data Analysis Pipeline
Load, clean, and analyze data with PythonREPL and FileOps. Create visualizations and insights.
Multi-Agent System
Coordinate specialized agents for complex workflows. A2A protocol with streaming callbacks.
Weather & JSON Pipeline
Fetch real-time weather data, process JSON responses, and format reports using WeatherTool and JSONTool.
RAG Knowledge Base
Build a knowledge base with document loaders, chunking strategies, and hybrid search (vector + BM25).
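The hybrid-search idea behind that last example (vector similarity blended with a lexical, BM25-style score) can be illustrated with a tiny bag-of-words ranker. A real deployment would use FAISS/Chroma embeddings and true BM25; everything below is a simplified stand-in.

```python
import math
from collections import Counter

# Toy hybrid retrieval: cosine over bag-of-words vectors stands in for
# embedding similarity, term overlap stands in for BM25. Illustration only.

docs = ["small language models run fast",
        "vLLM speeds up model inference",
        "agents can call tools"]

def bow(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def lexical(query: Counter, doc: Counter) -> float:
    return sum(1 for t in query if t in doc) / len(query)

def hybrid_rank(query: str, alpha: float = 0.5) -> str:
    q = bow(query)
    scored = [(alpha * cosine(q, bow(d)) + (1 - alpha) * lexical(q, bow(d)), d)
              for d in docs]
    return max(scored)[1]

print(hybrid_rank("fast small models"))  # small language models run fast
```

Blending the two scores is the point: lexical matching catches exact terms that embeddings blur, while vector similarity catches paraphrases that exact matching misses.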
Join Our Growing Developer Community
Connect with developers building the future of AI agents
GitHub
Star the repo, contribute code, report issues, and stay updated with the latest releases.
LinkedIn
Connect with us professionally, follow company updates, and network with the team.
Twitter/X
Follow us for the latest updates, announcements, tips, and community highlights.
Discord
Join our active community, get help, share projects, and discuss the future of AI agents.
Want to learn more about our community?
Visit our dedicated community page for resources, events, and contribution guidelines.
Explore Community

Ready to Build the
Future of AI?
Join thousands of developers building next-gen agents with effGen. Open source, production-ready, and blazing fast.