AI Research Assistant with PUNKU.AI's DeepResearch Workflow

Apr 11, 2025

Building a Powerful AI Research Assistant with PUNKU.AI's DeepResearch Workflow

TL;DR: The DeepResearch workflow in PUNKU.AI integrates Claude with powerful web browsing capabilities, specialized knowledge tools, and adaptive search strategies to create a comprehensive research assistant. Using Firecrawl for web exploration, along with Wikipedia, Wikidata, arXiv, and Yahoo Finance, this workflow delivers detailed, source-attributed research on any topic.

Introduction

In today's information-rich world, conducting thorough research requires navigating vast amounts of data across multiple sources. Traditional research methods often fall short in terms of efficiency and comprehensiveness. PUNKU.AI's DeepResearch workflow addresses this challenge by creating an AI-powered research assistant that combines sophisticated web browsing capabilities with access to specialized knowledge sources.

This blog post will explore how the DeepResearch workflow is constructed, examine its components, and demonstrate how it can be leveraged for comprehensive research tasks. We'll dive into the technical architecture that powers this workflow and show how PUNKU.AI's visual programming approach simplifies the creation of complex AI applications.

Visual Representation of the Workflow

The workflow diagram illustrates the DeepResearch structure: the Deep Research Agent sits at the center, orchestrating the various tools that process user queries and deliver comprehensive research results.

Component Breakdown

Core Components

1. Deep Research Agent

The Deep Research Agent is the central coordinator of the workflow, powered by Claude (Anthropic) with special configuration to handle research tasks.

"model_name": "claude-3-7-sonnet-20250219",
"temperature": 0.1,
"max_tokens": 5000,
"max_iterations": 40

This agent orchestrates the research process (a minimal sketch of the loop follows this list) by:

  • Breaking complex questions into smaller, researchable components

  • Selecting the appropriate tools for each research task

  • Managing the iterative research process with configurable depth (default: 7 iterations)

  • Synthesizing findings into comprehensive, well-structured responses

  • Providing proper source attribution for all information
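
To make this concrete, here is a minimal sketch of what such a tool-use loop looks like when built directly against the Anthropic Python SDK. The web_search tool definition and the stub result are illustrative placeholders, not the workflow's actual internals:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Illustrative tool definition; the real workflow wires in Tavily, Firecrawl, etc.
tools = [{
    "name": "web_search",
    "description": "Search the web and return relevant results as text.",
    "input_schema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}]

messages = [{"role": "user", "content": "Research recent advances in battery chemistry."}]

for _ in range(40):  # mirrors the agent's max_iterations setting
    response = client.messages.create(
        model="claude-3-7-sonnet-20250219",
        temperature=0.1,
        max_tokens=5000,
        tools=tools,
        messages=messages,
    )
    if response.stop_reason != "tool_use":
        break  # the model has produced its final answer

    # Execute each requested tool call and feed the results back
    messages.append({"role": "assistant", "content": response.content})
    tool_results = []
    for block in response.content:
        if block.type == "tool_use":
            result = f"(stub) results for {block.input['query']}"  # call the real tool here
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": result,
            })
    messages.append({"role": "user", "content": tool_results})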

2. Chat Input & Output Components

  • Chat Input: Receives user queries and passes them to the research agent

  • Chat Output: Displays formatted results to the user, including all sources

3. System Prompt

The system prompt provides detailed instructions to the agent on how to conduct research, evaluate sources, and format responses.



Web Exploration Tools

1. Tavily AI Search

This component provides an AI-optimized search engine specifically designed for LLMs and RAG applications (a usage sketch follows the list):

  • Supports basic and advanced search depths

  • Configurable results limit and time range

  • Option to include images and summary answers

  • Structured output with URLs, titles, and content
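
As a rough illustration, here is how a comparable search could be issued with the tavily-python client; the query and API key are placeholders:

from tavily import TavilyClient

client = TavilyClient(api_key="tvly-...")  # placeholder key

# Advanced-depth search with a capped result count and a generated summary answer
results = client.search(
    "latest developments in solid-state batteries",
    search_depth="advanced",
    max_results=5,
    include_answer=True,
    include_images=False,
)

for item in results["results"]:
    print(item["title"], item["url"])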


2. Firecrawl Components

The workflow includes four specialized Firecrawl components that work together to provide comprehensive web browsing capabilities (a combined usage sketch follows the list):

  • FirecrawlMapApi: Maps website structure to identify relevant pages

    • Creates site maps for systematic navigation

    • Identifies content relationships within domains

    • Supports sitemap and subdomain exploration options

  • FirecrawlCrawlApi: Crawls entire websites for comprehensive content exploration

    • Follows links up to specified depth

    • Handles crawler options like depth and link following

    • Configurable timeout settings (default: 3000ms)

  • FirecrawlScrapeApi: Extracts content from specific URLs

    • Retrieves clean, structured content from web pages

    • Formats output as markdown for consistent presentation

    • Focuses on main content while filtering navigation elements and ads

  • FirecrawlExtractApi: Performs targeted extraction of specific information

    • Extracts structured information using schemas

    • Uses natural language prompts to guide extraction

    • Supports web search integration for additional context
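
The sketch below shows how two of these operations might be combined using the firecrawl-py client. Method signatures vary across client releases, so treat this as an approximation of the pattern rather than the component's exact calls:

from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="fc-...")  # placeholder key

# Map the site first to discover candidate pages, then scrape one as markdown.
# This follows the v1-style API; newer releases may differ.
site_map = app.map_url("https://example.com")
page = app.scrape_url("https://example.com/docs", params={"formats": ["markdown"]})
print(page.get("markdown", "")[:500])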

Knowledge Base Tools

1. Wikipedia Component

Provides encyclopedic knowledge on a wide range of topics (a usage sketch follows the list):

"lang": "en",
"k": 4,
"doc_content_chars_max": 4000
  • Configurable language selection (default: English)

  • Adjustable result count (default: 4 articles)

  • Content length management with character limits

  • Returns both structured data and formatted text
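
The configuration keys above match those of the LangChain Wikipedia wrapper, which this component appears to build on; a minimal usage sketch under that assumption:

from langchain_community.utilities import WikipediaAPIWrapper

# Mirrors the component configuration shown above (requires the wikipedia package)
wiki = WikipediaAPIWrapper(lang="en", top_k_results=4, doc_content_chars_max=4000)
print(wiki.run("large language models"))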

2. Wikidata Component

Accesses structured data about entities and concepts (a query sketch follows the list):

  • Returns entity information with labels, descriptions, and identifiers

  • Provides unique entity IDs (Q-numbers) for reliable reference

  • Includes concept URIs and Wikidata page URLs

  • Useful for identifying specific entities and their properties
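
For reference, the same kind of entity lookup can be reproduced against Wikidata's public API directly; a small sketch using the requests library:

import requests

# Query Wikidata's public wbsearchentities endpoint for entity IDs (Q-numbers)
resp = requests.get(
    "https://www.wikidata.org/w/api.php",
    params={
        "action": "wbsearchentities",
        "search": "Ada Lovelace",
        "language": "en",
        "format": "json",
    },
    timeout=10,
)
for entity in resp.json()["search"]:
    print(entity["id"], entity["label"], entity.get("description", ""))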

Specialized Research Tools

1. arXiv Component

Searches and retrieves academic research papers (a usage sketch follows the list):

"search_type": "all",
"max_results": 10
  • Searches papers by title, abstract, author, or category

  • Returns comprehensive metadata including abstracts, authors, and publication dates

  • Provides direct links to PDF downloads and journal references

  • Configurable result limit (default: 10 papers)
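
A comparable query can be issued with the arxiv Python package; a brief sketch mirroring the configuration above:

import arxiv

# Search across all fields with the default result cap of 10 papers
search = arxiv.Search(query="retrieval augmented generation", max_results=10)
for paper in arxiv.Client().results(search):
    print(paper.title, paper.pdf_url)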

2. Yahoo Finance Component

Accesses financial data and market information (a usage sketch follows the list):

"method": "get_news",
"num_news": 5
  • Retrieves stock data using various methods (info, news, financial statements)

  • Supports 25+ data retrieval methods including earnings reports, dividends, and SEC filings

  • Configurable news article count for news retrieval

  • Structured output with titles, links, and content
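
The same news retrieval can be approximated with the yfinance package, which this component's method names suggest it wraps; a short sketch:

import yfinance as yf

# Mirrors "method": "get_news", "num_news": 5 from the configuration above
ticker = yf.Ticker("AAPL")
for article in ticker.news[:5]:
    # Field names differ across yfinance releases; older ones expose title/link directly
    print(article.get("title"), article.get("link"))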

Workflow Explanation

Step-by-Step Execution Flow

  1. User Input Processing:

    • The workflow begins when a user submits a research question through the Chat Input component

    • The query is passed to the Deep Research Agent along with the system prompt and available tools

  2. Research Planning:

    • The agent analyzes the query and breaks it down into specific research components

    • It determines an appropriate research strategy including which tools to use and in what sequence

  3. Iterative Research Process:

    • The agent conducts research in multiple iterations (configurable depth)

    • For each iteration, it follows a systematic process:

      • SEARCH phase: Uses appropriate search tools (Tavily, Firecrawl) to find relevant sources

      • EXTRACT phase: Extracts content from identified sources

      • ANALYZE phase: Analyzes gathered information and identifies knowledge gaps

      • PLAN phase: Determines the next search focus based on analysis

  4. Information Synthesis:

    • After completing the research iterations, the agent synthesizes all findings

    • It organizes information logically and creates a comprehensive response

    • All sources are properly attributed with URLs

  5. Response Generation:

    • The final research results are formatted with clear structure and headings

    • Sources are included in a dedicated section with proper citation format

    • The response is displayed to the user through the Chat Output component

Data Transformations

The workflow performs several key data transformations (sketched as a state object after the list):

  1. Query → Search Results:

    • User query is transformed into multiple search queries across different tools

    • Search results are returned as structured data with URLs and metadata

  2. URLs → Content:

    • URLs from search results are used to extract full content

    • Content is cleaned, formatted, and structured for analysis

  3. Content → Insights:

    • Raw content is analyzed to extract key facts, data points, and concepts

    • Analysis identifies patterns, relationships, and knowledge gaps

  4. Insights → Comprehensive Answer:

    • Individual insights are synthesized into a coherent, comprehensive response

    • Information is organized with clear structure and logical flow

    • All sources are properly attributed
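
One way to picture these four transformations is as fields accumulating on a shared state object. The sketch below uses our own illustrative names, not the workflow's internal ones:

from dataclasses import dataclass, field

@dataclass
class ResearchState:
    query: str                                           # the user's question
    urls: list[str] = field(default_factory=list)        # Query -> Search Results
    documents: list[str] = field(default_factory=list)   # URLs -> Content
    insights: list[str] = field(default_factory=list)    # Content -> Insights
    answer: str = ""                                     # Insights -> Comprehensive Answer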


Key Mechanisms

1. Adaptive Search Strategy

The DeepResearch workflow employs an adaptive search strategy that selects the most appropriate tools based on the research topic and iteratively refines the search focus:

# Simplified logic for tool selection
def pick_primary_tool(topic_is_academic: bool, topic_is_financial: bool) -> str:
    if topic_is_academic:
        return "arXiv"
    if topic_is_financial:
        return "YahooFinance"
    return "TavilySearch"

2. Source Tracking

The workflow maintains meticulous tracking of all sources, ensuring that every piece of information can be traced back to its origin:

# Source tracking mechanism (simplified)
for content in extracted_content:
    source_url = content.get("source")
    if source_url and source_url not in state["all_sources"]:
        state["all_sources"].append(source_url)

3. Deep Research Loop

The iterative research process is managed through a structured loop that continues until sufficient information is gathered or the maximum depth is reached:

# Deep research loop (simplified)
query = topic  # the first search starts from the user's research topic
while state["current_depth"] < state["max_depth"]:
    state["current_depth"] += 1
    search_results = await search_web(query)
    extracted_content = await extract_from_urls(search_results)
    analysis = await analyze_and_plan(extracted_content)

    if not analysis["should_continue"]:
        break

    query = analysis["next_search_topic"]

Use Cases & Applications

The DeepResearch workflow can be applied to a wide range of research scenarios:

1. Academic Research

Researchers can use the workflow to:

  • Conduct comprehensive literature reviews across multiple sources

  • Identify key papers and research findings on specific topics

  • Discover connections between different research areas

  • Stay updated on recent developments in their field

Adaptation: Increase the priority of arXiv search and modify the system prompt to emphasize academic citation standards.

2. Market Intelligence

Business analysts can leverage the workflow to:

  • Research industry trends and market dynamics

  • Analyze competitor strategies and positioning

  • Monitor financial performance of companies and sectors

  • Track news and developments affecting specific markets

Adaptation: Prioritize Yahoo Finance and news sources, and adjust the system prompt to focus on business insights.

3. Due Diligence

Investors and legal professionals can utilize the workflow for:

  • Comprehensive background checks on companies and individuals

  • Verification of claims and statements

  • Identification of potential risks or issues

  • Discovery of connections and relationships

Adaptation: Add specialized databases and enhance the extraction capabilities for specific types of information.

4. Technical Documentation

Developers and technical writers can benefit from:

  • Gathering comprehensive information on technical topics

  • Compiling documentation from multiple sources

  • Identifying best practices and solutions to technical challenges

  • Staying informed about emerging technologies

Adaptation: Add GitHub and technical documentation sites as specialized tools, and adjust the system prompt to prioritize code examples and technical details.

5. Content Creation

Content creators can use the workflow to:

  • Research topics thoroughly before creating content

  • Gather diverse perspectives and viewpoints

  • Ensure factual accuracy with proper source attribution

  • Identify interesting angles and insights for their content

Adaptation: Modify the output format to align with content creation needs and enhance the system prompt to emphasize engaging presentation.

Optimization & Customization

Improving Performance

  1. Adjust Research Depth:

    • Increase max_research_depth for more thorough research (default: 7)

    • Decrease for faster but less comprehensive results

    • Example: "max_research_depth": 10 for extremely thorough research

  2. Optimize Content Extraction:

    • Adjust content_char_limit to control the amount of text extracted from each source

    • Default: 16000 characters

    • Example: "content_char_limit": 8000 for faster processing with less context

  3. Configure URLs per Search:

    • Modify urls_per_search to control how many URLs are processed in each iteration

    • Default: 5 URLs

    • Example: "urls_per_search": 10 for broader coverage in each iteration

  4. Adjust Model Parameters:

    • Optimize temperature based on research needs (lower for factual research, higher for creative exploration)

    • Adjust max_tokens to control response length (a consolidated configuration example follows this list)
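
Taken together, a tuned configuration might look like the following; the parameter names mirror those discussed above, and the values are starting points rather than recommendations:

{
  "max_research_depth": 10,
  "content_char_limit": 8000,
  "urls_per_search": 10,
  "temperature": 0.1,
  "max_tokens": 5000
}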

Customizing for Specific Domains

  1. Specialized Research Agents: Modify the system prompt to create a domain-specific research agent (for example, one focused on medical literature or patent filings).

  2. Tool Prioritization: Adjust the configuration to prioritize domain-relevant tools:

# For financial research
financial_tools = ["YahooFinance", "FirecrawlExtractApi"]

  3. Custom Extraction Schemas: Define specialized extraction schemas for specific types of information:

"schema": {
  "type": "object",
  "properties": {
    "companyName": {"type": "string"},
    "revenue": {"type": "string"},
    "growthRate": {"type": "string"},
    "marketPosition": {"type": "string"}
  }
}

  4. Output Format Customization: Modify the system prompt to specify domain-appropriate output formats, as in the example below.
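
For instance, a content-focused variant might append an instruction block like the following to the system prompt (our illustrative wording, not the shipped prompt):

Format every response as:
1. Executive summary (3-5 sentences)
2. Key findings, grouped by theme
3. Sources, as a numbered list of URLs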


Technical Insights

Architecture Design Patterns

The DeepResearch workflow exemplifies several important architectural patterns:

  1. Tool Orchestration Pattern:

    • The Deep Research Agent acts as an orchestrator for a diverse set of tools

    • Each tool is specialized for specific types of information retrieval

    • The agent dynamically selects and applies the appropriate tools based on the research context

  2. Iterative Refinement Pattern:

    • Research is conducted through multiple iterations

    • Each iteration builds on previous findings and addresses identified knowledge gaps

    • The process continues until sufficient information is gathered or the maximum depth is reached

  3. Hierarchical Processing Pattern:

    • Information is processed through progressive stages of abstraction:

      • Raw content → Structured data → Key insights → Comprehensive synthesis

    • Each stage transforms the data into more valuable and usable forms


Innovative Approaches

The workflow incorporates several innovative approaches to research:

  1. Deep Research Algorithm: The core algorithm combines iterative exploration with systematic analysis:

async def deep_research(self, topic: str, max_depth: int = 7):
    state = {
        "findings": [],
        "summaries": [],
        "key_insights": [],
        "all_sources": [],
        "current_depth": 0,
        "max_depth": max_depth
    }
    query = topic  # the first search starts from the user's topic

    while state["current_depth"] < state["max_depth"]:
        state["current_depth"] += 1

        # SEARCH phase
        search_results = await self.search_web(query)

        # EXTRACT phase
        extracted_content = await self.extract_from_urls(search_results)
        state["findings"].extend(extracted_content)

        # ANALYZE phase
        analysis, raw_analysis = await self.analyze_and_plan(state["findings"], topic)

        # Update state with analysis results
        state["summaries"].append(raw_analysis)
        state["next_search_topic"] = analysis.get("next_search_topic", "")

        # Check if we should continue
        if not analysis.get("should_continue", False):
            break

        # PLAN phase outcome: refine the query for the next iteration
        query = state["next_search_topic"] or topic

    # Final synthesis
    final_analysis = await self.synthesize_findings(state["findings"], state["summaries"], topic)

    return {
        "analysis": final_analysis,
        "findings": state["findings"],
        "sources": state["all_sources"]
    }

  2. Source Verification: The workflow implements a sophisticated approach to source tracking and verification:

    • Every piece of information is linked to its source

    • Sources are normalized to prevent duplication

    • Domain extraction provides additional context about source authority

    • Source formatting follows consistent citation standards

  3. Adaptive Tool Selection: The workflow dynamically selects the most appropriate tools based on patterns in the tool names and the research context:

# Tool selection strategy (simplified)
tool_patterns = [
    (lambda name: "tavily" in name.lower(), "Tavily Search"),
    (lambda name: "serp" in name.lower(), "Search Engine"),
    (lambda name: "wiki" in name.lower(), "Wikipedia Search"),
    (lambda name: "firecrawl" in name.lower(), "Firecrawl")
]

def select_search_tool(tools):
    for pattern_func, tool_type in tool_patterns:
        for tool in tools:
            if pattern_func(tool.name):
                return tool, tool_type
    return None, None  # fall through when no tool matches

Conclusion

The DeepResearch workflow in PUNKU.AI represents a powerful approach to AI-assisted research, combining the reasoning capabilities of advanced language models with specialized tools for information retrieval and analysis. By orchestrating these components through a systematic research process, the workflow enables comprehensive exploration of complex topics across multiple sources.

The modular architecture of the workflow allows for customization to specific domains and use cases, making it a versatile solution for researchers, analysts, and content creators. The emphasis on source attribution and structured presentation ensures that the research outputs are not only comprehensive but also credible and usable.

As AI technology continues to evolve, workflows like DeepResearch demonstrate how visual programming environments like PUNKU.AI can simplify the creation of sophisticated AI applications, making advanced capabilities accessible to a wider range of users without requiring deep technical expertise.

By leveraging the power of Claude, Firecrawl web browsing capabilities, and specialized knowledge sources, DeepResearch represents a significant step forward in how we approach information discovery and synthesis in the age of AI.

See PUNKU.AI in action

Fill in your details and a product expert will reach out shortly to arrange a demo.


Here’s what to expect:

A no-commitment product walkthrough 

Discussion built on your top priorities

Your questions, answered
