Build a research system that automatically discovers, analyzes, and synthesizes information from multiple sources.
This tutorial demonstrates how to create a research pipeline that can process thousands of sources, identify key insights, and generate comprehensive reports. The system runs its search and extraction agents in parallel to handle large-scale research tasks.
              Research Query
                     ↓
              ┌─────────────┐
              │   Source    │
              │  Discovery  │
              └─────────────┘
                     ↓
┌─────────────┬─────────────┬─────────────┐
│ Web Search  │  Academic   │  News/Blog  │
│    Agent    │    Agent    │    Agent    │
└─────────────┴─────────────┴─────────────┘
       ↓             ↓             ↓
┌─────────────┬─────────────┬─────────────┐
│   Content   │   Content   │   Content   │
│  Extractor  │  Extractor  │  Extractor  │
└─────────────┴─────────────┴─────────────┘
       ↓             ↓             ↓
       └─────────────┼─────────────┘
                     ↓
              ┌─────────────┐
              │  Analysis   │
              │    Agent    │
              └─────────────┘
                     ↓
              ┌─────────────┐
              │  Synthesis  │
              │    Agent    │
              └─────────────┘
                     ↓
              ┌─────────────┐
              │   Report    │
              │  Generator  │
              └─────────────┘
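The fan-out and fan-in shape of the diagram can be sketched with Python's standard concurrent.futures module. This is a minimal sketch, not the tutorial's implementation: web_search, academic_search, news_search, and extract are hypothetical stubs that return canned data in place of real agents.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the three search agents; each returns raw documents.
def web_search(query):
    return [f"web result for {query}"]

def academic_search(query):
    return [f"paper about {query}"]

def news_search(query):
    return [f"news item on {query}"]

def extract(raw_document):
    # Placeholder for the content extractor: it only normalizes whitespace.
    return " ".join(raw_document.split())

def run_pipeline(query):
    searchers = [web_search, academic_search, news_search]
    with ThreadPoolExecutor(max_workers=len(searchers)) as pool:
        # Fan out: run the three search agents concurrently.
        result_lists = list(pool.map(lambda search: search(query), searchers))
        # Flatten the per-agent results, then extract each document in parallel.
        raw_documents = [doc for results in result_lists for doc in results]
        extracted = list(pool.map(extract, raw_documents))
    # Fan in: this merged list is what the analysis agent would receive.
    return extracted

if __name__ == "__main__":
    print(run_pipeline("AI trends in healthcare"))
```

Threads suit this stage because search and extraction are mostly I/O-bound. The analysis, synthesis, and report steps run sequentially after the merge, so they are omitted here.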
"You are a research source discovery agent. Given a research topic, identify the best types of sources to search: academic papers, news articles, industry reports, government data, expert blogs, etc. Generate specific search queries for each source type and prioritize them by relevance and reliability."
Searches general web sources using search APIs and web scraping.
Searches academic databases and research repositories.
Searches news sources and industry publications.
"You are a content extraction agent. Extract the main content from web pages, PDFs, and documents. Remove navigation, ads, and irrelevant content. Preserve important metadata like publication date, author, and source. Structure the content for analysis."
"You are a research analysis agent. Analyze extracted content for key insights, trends, and patterns. Identify supporting evidence, contradictions, and gaps in the research. Categorize findings by theme, credibility, and relevance to the research question. Extract quantitative data and statistics when available."
"You are a research synthesis agent. Combine analyzed findings into coherent insights. Identify consensus views, conflicting perspectives, and emerging trends. Create a structured summary with key findings, supporting evidence, and confidence levels. Highlight areas needing further research."
Store research findings in the memory bank for future reference and cross-research insights.
{
  "content": "Key finding or insight",
  "metadata": {
    "research_topic": "AI trends 2024",
    "source_type": "academic",
    "credibility": "high",
    "date_found": "2024-01-15",
    "supporting_sources": ["source1.com", "source2.edu"],
    "confidence_level": 0.85,
    "category": "machine_learning"
  }
}
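Validating a finding before it is stored helps keep the memory bank consistent. The sketch below mirrors the fields of the record above with dataclasses. The class names Finding and FindingMetadata and the range check are assumptions, and the memory bank's actual write call is not shown in this tutorial, so the example stops at producing JSON.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class FindingMetadata:
    research_topic: str
    source_type: str
    credibility: str
    date_found: str  # ISO date string, as in the example record
    supporting_sources: list = field(default_factory=list)
    confidence_level: float = 0.5
    category: str = ""

    def __post_init__(self):
        # Assumed constraint: confidence is a probability-like value.
        if not 0.0 <= self.confidence_level <= 1.0:
            raise ValueError("confidence_level must be between 0 and 1")

@dataclass
class Finding:
    content: str
    metadata: FindingMetadata

    def to_json(self):
        # asdict() converts the nested dataclass into a plain dictionary.
        return json.dumps(asdict(self), indent=2)

finding = Finding(
    content="Key finding or insight",
    metadata=FindingMetadata(
        research_topic="AI trends 2024",
        source_type="academic",
        credibility="high",
        date_found="2024-01-15",
        supporting_sources=["source1.com", "source2.edu"],
        confidence_level=0.85,
        category="machine_learning",
    ),
)
print(finding.to_json())
```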
Google Search API: Web search results
arXiv API: Academic papers
News API: Current news articles
Wikipedia API: Background information
Web Scraping: Extract page content
PDF Parser: Process research papers
RSS Reader: Monitor news feeds
Data APIs: Government/industry data
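As an illustration of one tool from the list, the sketch below builds a query URL for the public arXiv API and parses the Atom feed it returns. The endpoint and the search_query, start, and max_results parameters follow arXiv's published API. The helper names and the sample feed are made up for this example, and the network call is left out so the snippet runs offline.

```python
import urllib.parse
import xml.etree.ElementTree as ET

ARXIV_ENDPOINT = "http://export.arxiv.org/api/query"
ATOM = {"atom": "http://www.w3.org/2005/Atom"}  # arXiv responses are Atom feeds

def build_arxiv_url(terms, max_results=5):
    # "all:" searches every field; urlencode escapes the colon and spaces.
    params = {"search_query": f"all:{terms}", "start": 0, "max_results": max_results}
    return f"{ARXIV_ENDPOINT}?{urllib.parse.urlencode(params)}"

def parse_arxiv_feed(atom_xml):
    root = ET.fromstring(atom_xml)
    papers = []
    for entry in root.findall("atom:entry", ATOM):
        title = entry.findtext("atom:title", default="", namespaces=ATOM)
        papers.append({
            "title": " ".join(title.split()),  # titles may wrap across lines
            "url": entry.findtext("atom:id", default="", namespaces=ATOM),
            "published": entry.findtext("atom:published", default="", namespaces=ATOM),
        })
    return papers

# Made-up feed with a placeholder identifier, used only to exercise the parser.
sample_feed = """<feed xmlns="http://www.w3.org/2005/Atom">
<entry><id>http://arxiv.org/abs/0000.00000v1</id>
<published>2024-01-15T00:00:00Z</published>
<title>A  Sample
 Title</title></entry></feed>"""

print(build_arxiv_url("renewable energy"))
print(parse_arxiv_feed(sample_feed))
```

To fetch live results, pass the URL to urllib.request.urlopen and hand the response body to parse_arxiv_feed. The other tools in the list would get similar thin wrappers.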
{"prompt": "Research AI trends in healthcare", "expected_result": "Comprehensive analysis covering ML diagnostics, robotic surgery, drug discovery", "quality_score": 0.9}
{"prompt": "Find information about renewable energy adoption", "expected_result": "Data on solar/wind growth, policy impacts, cost trends", "quality_score": 0.85}
{"prompt": "Analyze cryptocurrency market trends", "expected_result": "Price analysis, regulatory developments, adoption metrics", "quality_score": 0.8}
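Before training, it can help to check that every record carries the three fields the reward function reads. The loader below is a small sketch that assumes the records are stored as JSON Lines, one object per line. The function name load_examples and the range check on quality_score are illustrative.

```python
import json

REQUIRED_FIELDS = ("prompt", "expected_result", "quality_score")

def load_examples(jsonl_text):
    examples = []
    for line_number, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines between records
        record = json.loads(line)
        missing = [name for name in REQUIRED_FIELDS if name not in record]
        if missing:
            raise ValueError(f"line {line_number}: missing {missing}")
        if not 0.0 <= record["quality_score"] <= 1.0:
            raise ValueError(f"line {line_number}: quality_score out of range")
        examples.append(record)
    return examples

dataset = (
    '{"prompt": "Research AI trends in healthcare", '
    '"expected_result": "ML diagnostics, robotic surgery", "quality_score": 0.9}\n'
    '{"prompt": "Analyze cryptocurrency market trends", '
    '"expected_result": "Price analysis, adoption metrics", "quality_score": 0.8}\n'
)
print(len(load_examples(dataset)), "examples loaded")
```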
def reward_fn(completion, **kwargs):
    """Score a research response on topic coverage, evidence, and analysis depth."""
    response = completion[0].get('content', '').lower()
    expected = kwargs.get('expected_result', '')
    quality_score = kwargs.get('quality_score', 0.5)

    # Check for comprehensive coverage: fraction of expected topics mentioned.
    # Empty topics are dropped so a missing expected_result scores 0, not 1.
    key_topics = [topic.strip() for topic in expected.lower().split(',') if topic.strip()]
    coverage_score = (
        sum(1 for topic in key_topics if topic in response) / len(key_topics)
        if key_topics else 0.0
    )

    # Check for evidence and sources: full credit at three or more indicators.
    evidence_indicators = ['study', 'research', 'data', 'according to', 'source']
    evidence_score = min(1.0, sum(1 for indicator in evidence_indicators if indicator in response) / 3)

    # Check for analysis depth: full credit at three or more analysis words.
    analysis_words = ['trend', 'pattern', 'insight', 'implication', 'conclusion']
    analysis_score = min(1.0, sum(1 for word in analysis_words if word in response) / 3)

    # Combine scores with quality weighting
    final_score = (coverage_score * 0.4 + evidence_score * 0.3 + analysis_score * 0.3) * quality_score
    return final_score
Key findings and recommendations in 2-3 paragraphs
Sources searched, analysis methods, confidence levels
Structured insights with supporting evidence and source citations
Identified trends with quantitative data where available
Actionable insights and areas for further research
Set up agents to continuously monitor sources and update research as new information becomes available.
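The core of a monitoring loop is telling new items apart from ones already processed. The sketch below does this for an RSS 2.0 feed with the standard library. The function name new_items, the in-memory set of seen ids, and the example.com feed are assumptions, and a real deployment would persist the seen ids and fetch the feed on a schedule.

```python
import xml.etree.ElementTree as ET

def new_items(rss_xml, seen_ids):
    """Return feed items not seen before and record their ids in seen_ids."""
    root = ET.fromstring(rss_xml)
    fresh = []
    for item in root.iter("item"):
        # Prefer <guid>; fall back to <link> when the feed omits it.
        item_id = item.findtext("guid") or item.findtext("link")
        if item_id and item_id not in seen_ids:
            seen_ids.add(item_id)
            fresh.append({"id": item_id, "title": item.findtext("title", default="")})
    return fresh

# Made-up feed contents for two consecutive polls.
first_poll = """<rss version="2.0"><channel>
<item><title>Story A</title><link>https://example.com/a</link></item>
</channel></rss>"""
second_poll = """<rss version="2.0"><channel>
<item><title>Story B</title><guid>https://example.com/b</guid></item>
<item><title>Story A</title><link>https://example.com/a</link></item>
</channel></rss>"""

seen = set()
print(new_items(first_poll, seen))   # Story A is new
print(new_items(second_poll, seen))  # only Story B is new
```

Each batch of fresh items would then be passed to the extraction and analysis stages so that stored findings stay current.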
Use MotteMB to identify connections between different research topics and build knowledge graphs.
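MotteMB's own query interface is not shown in this tutorial, so the following sketch works on plain dictionaries shaped like the stored findings. It links two research topics whenever they cite the same supporting source, which is one simple way to seed a knowledge graph. The function name topic_graph and the linking rule are assumptions.

```python
from collections import defaultdict
from itertools import combinations

def topic_graph(findings):
    """Link research topics that share at least one supporting source."""
    topics_by_source = defaultdict(set)
    for finding in findings:
        meta = finding["metadata"]
        for source in meta["supporting_sources"]:
            topics_by_source[source].add(meta["research_topic"])

    # Each edge maps a sorted pair of topics to the sources they share.
    edges = defaultdict(set)
    for source, topics in topics_by_source.items():
        for first, second in combinations(sorted(topics), 2):
            edges[(first, second)].add(source)
    return dict(edges)

findings = [
    {"metadata": {"research_topic": "AI in healthcare",
                  "supporting_sources": ["source1.com", "source2.edu"]}},
    {"metadata": {"research_topic": "Drug discovery",
                  "supporting_sources": ["source2.edu"]}},
    {"metadata": {"research_topic": "Solar adoption",
                  "supporting_sources": ["source3.org"]}},
]
print(topic_graph(findings))
```

Shared categories or overlapping key terms could be used as additional edge types in the same way.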
Extend the pipeline to search and analyze sources in multiple languages for global research coverage.