A simple RAG system with a React interface, powered by Ollama and OpenRouter through the LangChain library.

Ollama OpenRouter LangChain RAG System

A sophisticated Retrieval-Augmented Generation (RAG) system built with FastAPI, LangChain, and support for both local Ollama models and cloud-based OpenRouter models. This system enables intelligent document analysis and question-answering across multiple specialized knowledge domains.

🚀 Features

  • Multi-Service Architecture: Pre-configured RAG services for 11 different academic subjects
  • Dual LLM Support: Works with both local Ollama models and cloud OpenRouter models
  • Document Processing: Advanced PDF processing with relevance filtering and metadata standardization
  • Streaming Responses: Real-time streaming for both Q&A and summarization
  • Vector Database: Powered by Chroma with multilingual embeddings
  • RESTful API: Complete FastAPI-based REST API with automatic documentation
  • Document Relevance Filtering: AI-powered document filtering based on topic relevance
  • Batch Processing: Efficient document processing in configurable batches

📋 Pre-configured Services

The system comes with 11 pre-configured academic services:

  1. Macroeconomia - Macroeconomics
  2. Microeconomia - Microeconomics
  3. Diritto Commerciale - Commercial Law
  4. Bilancio - Accounting/Balance Sheet
  5. Diritto Tributario - Tax Law
  6. Diritto Privato - Private Law
  7. Storia Economica - Economic History
  8. Statistica I - Statistics I
  9. Informatica - Computer Science
  10. Metodi Statistici - Statistical Methods
  11. Marketing - Marketing

🛠 Installation

Prerequisites

  • Python 3.8+
  • Either Ollama (for local models) or OpenRouter API key (for cloud models)

Setup

  1. Clone the repository:
git clone <repository-url>
cd ollama_openrouter_langchain_rag
  2. Install dependencies:
pip install fastapi uvicorn langchain langchain-community langchain-huggingface langchain-openai chromadb pymupdf sentence-transformers
  3. Configure environment variables:

For Ollama (local models):

export OLLAMA_BASE_URL="http://localhost:11434"  # Single or comma-separated URLs
export USE_OPENROUTER="False"

For OpenRouter (cloud models):

export USE_OPENROUTER="True"
export OPENROUTER_API_KEY="your-openrouter-api-key"

Optional:

export ASSEMBLYAI_API_KEY="your-assemblyai-key"  # For audio transcription

🚀 Usage

Starting the Server

cd rag_logic
python rag_app.py

The server will start on http://localhost:8000 with automatic API documentation available at http://localhost:8000/docs.

API Endpoints

Service Management

  • GET /services/status - Get status of all services
  • GET /services/{service_id}/status - Get status of specific service
  • POST /services/{service_id}/initialize - Initialize a service
  • PATCH /services/{service_id}/topic - Update service topic

Document Management

  • POST /services/{service_id}/upload-documents - Upload PDF documents
    curl -X POST "http://localhost:8000/services/1/upload-documents" \
         -F "files=@document.pdf"
    

Question & Answer

  • POST /services/{service_id}/ask - Ask a question (blocking)

    curl -X POST "http://localhost:8000/services/1/ask" \
         -H "Content-Type: application/json" \
         -d '{"question": "What is macroeconomics?"}'
    
  • POST /services/{service_id}/stream-ask - Ask a question (streaming)

Summarization

  • POST /services/{service_id}/summarize - Get document summary (blocking)
  • POST /services/{service_id}/stream-summarize - Get document summary (streaming)

Python Client Example

import requests

# Initialize service
response = requests.post("http://localhost:8000/services/1/initialize")

# Upload documents (use a context manager so the file handle is closed)
with open("document.pdf", "rb") as f:
    response = requests.post(
        "http://localhost:8000/services/1/upload-documents",
        files={"files": f},
    )

# Ask a question
question_data = {"question": "Explain the concept of GDP"}
response = requests.post("http://localhost:8000/services/1/ask", json=question_data)
print(response.json())

🏗 Architecture

Core Components

  1. RAGService (rag_service.py): Core RAG logic with document processing, vector storage, and LLM integration
  2. FastAPI Application (rag_api.py): RESTful API endpoints with async support
  3. Application Runner (rag_app.py): Main application entry point

Key Features

  • Document Processing: Automatic PDF text extraction with metadata standardization
  • Relevance Filtering: AI-powered filtering to include only relevant documents
  • Vector Storage: Persistent Chroma database with multilingual embeddings
  • Streaming Support: Real-time response streaming for better user experience
  • Error Handling: Comprehensive error handling and graceful degradation

Model Support

  • Ollama Models: Support for local models like llama3.2:1b
  • OpenRouter Models: Cloud-based models like deepseek/deepseek-r1:free
  • Multiple Endpoints: Support for multiple Ollama instances via comma-separated URLs
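As a sketch of how the comma-separated `OLLAMA_BASE_URL` value could be handled — the actual selection strategy inside `RAGService` is not documented here, and round-robin is only one plausible choice:

```python
import itertools
import os

def load_ollama_urls(env_value=None):
    """Split OLLAMA_BASE_URL into a list of endpoint URLs."""
    raw = env_value or os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
    return [url.strip() for url in raw.split(",") if url.strip()]

# Round-robin over the configured endpoints (one plausible strategy):
endpoints = load_ollama_urls("http://host-a:11434, http://host-b:11434")
picker = itertools.cycle(endpoints)
```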

⚙️ Configuration

Environment Variables

| Variable | Description | Default |
|---|---|---|
| USE_OPENROUTER | Use OpenRouter instead of Ollama | False |
| OPENROUTER_API_KEY | OpenRouter API key | Required if using OpenRouter |
| OLLAMA_BASE_URL | Ollama base URL(s) | http://localhost:11434 |
| ASSEMBLYAI_API_KEY | AssemblyAI API key for audio | Optional |

Service Configuration

Services are configured in DEFAULT_SERVICE_CONFIGS with:

  • Service ID: Numeric identifier (1-11)
  • Topic: Subject matter for the service
  • Model: LLM model to use
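The shape sketched below is an assumption based on the fields listed above; the real `DEFAULT_SERVICE_CONFIGS` lives in `rag_service.py` and may carry extra fields, and the model names shown are the ones mentioned elsewhere in this README:

```python
# Hypothetical shape of DEFAULT_SERVICE_CONFIGS (keys are string IDs):
DEFAULT_SERVICE_CONFIGS = {
    "1": {"topic": "Macroeconomia", "model": "llama3.2:1b"},
    "2": {"topic": "Microeconomia", "model": "llama3.2:1b"},
    # ... entries up to "11"
}

def get_service_config(service_id):
    """Look up a service by its numeric ID (keys are strings)."""
    config = DEFAULT_SERVICE_CONFIGS.get(str(service_id))
    if config is None:
        raise KeyError(f"Unknown service ID: {service_id}")
    return config
```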

Document Processing

  • Batch Size: 100 documents per batch
  • Relevance Threshold: 0.9 (configurable)
  • Embedding Model: sentence-transformers/paraphrase-multilingual-mpnet-base-v2
  • Vector Database: Persistent Chroma storage
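The batching and threshold behavior above can be sketched as two small helpers. This is illustrative only: how `RAGService` actually scores document relevance (the AI-powered check) is internal to it, so `scored_docs` here is a hypothetical list of (document, score) pairs.

```python
def batched(items, batch_size=100):
    """Yield successive batches of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def filter_relevant(scored_docs, threshold=0.9):
    """Keep only documents whose relevance score meets the threshold.

    scored_docs: list of (document, score) pairs; how the scores are
    produced is internal to RAGService.
    """
    return [doc for doc, score in scored_docs if score >= threshold]
```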

📁 Project Structure

ollama_openrouter_langchain_rag/
├── README.md
└── rag_logic/
    ├── rag_app.py              # Main application entry point
    └── lib/
        ├── __init__.py
        ├── rag_api.py          # FastAPI application and endpoints
        └── rag_service.py      # Core RAG service implementation

🔧 Development

Adding New Services

  1. Add configuration to DEFAULT_SERVICE_CONFIGS in rag_service.py
  2. Restart the application
  3. The new service will be automatically available

Customizing Models

Edit the model configuration in DEFAULT_SERVICE_CONFIGS:

"12": {"topic": "New Subject", "model": "your-model-name"}

Extending Document Types

The system supports PDF documents by default. To add support for other formats, extend the document loading logic in RAGService.
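One way to structure such an extension is a registry keyed by file extension. The names below (`LOADERS`, `load_document`) are hypothetical and not part of the existing codebase; the PDF branch would delegate to the existing pipeline in `RAGService`.

```python
from pathlib import Path

def load_pdf(path):
    """Placeholder: delegate to the existing PDF pipeline in RAGService."""
    raise NotImplementedError("delegate to the existing PDF pipeline")

def load_txt(path):
    """Example of a new format: plain-text files."""
    return Path(path).read_text(encoding="utf-8")

# Hypothetical registry mapping file extensions to loader callables:
LOADERS = {".pdf": load_pdf, ".txt": load_txt}

def load_document(path):
    """Dispatch to the loader registered for the file's extension."""
    suffix = Path(path).suffix.lower()
    try:
        return LOADERS[suffix](path)
    except KeyError:
        raise ValueError(f"Unsupported document type: {suffix}")
```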

📝 API Documentation

Once the server is running, visit:

  • Swagger UI: http://localhost:8000/docs
  • ReDoc: http://localhost:8000/redoc

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

📄 License

This project is open source. Please check the license file for details.

🆘 Troubleshooting

Common Issues

  1. Model not found: Ensure Ollama is running and the model is pulled
  2. OpenRouter authentication: Verify your API key is correct
  3. Document upload fails: Check file permissions and available disk space
  4. Slow responses: Consider using smaller models or increasing batch sizes

Performance Tips

  • Use local Ollama models for faster responses
  • Adjust batch sizes based on available memory
  • Pre-initialize services at startup for better performance
  • Use SSD storage for vector database persistence

📞 Support

For issues and questions:

  1. Check the API documentation at /docs
  2. Review the troubleshooting section
  3. Create an issue in the repository