DataEvolver is an autonomous synthetic data construction framework where goal-driven loop agents orchestrate the full pipeline — from text-to-image generation through 3D reconstruction to scene-aware rendering — and iteratively refine output quality via VLM feedback loops.
```python
# Goal-driven loop agent core
class EvolutionAgent:
    def run_loop(self, scene, obj):
        for round_idx in range(self.max_rounds):
            render = self.blender_render(scene, obj)
            review = self.vlm_review(render)
            if review.verdict == "keep":
                return render  # quality goal met
            action = self.agent_decide(review.text, self.action_space)
            self.apply_action(scene, action)
        return render  # best effort after max_rounds
```
Naive automated rendering produces artifacts — flat lighting, color shifts, floating objects. Traditional pipelines rely on rigid scoring rules that lack semantic understanding. What's needed are goal-driven agents that can perceive, diagnose, and act.
Auto-rendered 3D objects often exhibit flat lighting, implausible shadows, and color mismatches with the scene environment.
Rigid numeric thresholds can't diagnose "this lighting feels flat" or "the object appears to float." Goal-driven agents with VLM perception can.
Human artists spend minutes per object adjusting Blender parameters. At 50+ objects rendered from 8+ viewpoints each, manual tuning becomes infeasible.
From a natural language seed concept to quality-verified rendered pairs — fully automated by goal-driven loop agents, no human intervention required.
The heart of DataEvolver: a goal-driven loop agent that perceives rendered outputs via VLM review, diagnoses semantic issues, selects targeted rendering adjustments from a structured action space, and repeats until quality goals are met.
Sign-flip tracking, dead-zone detection, and step-scale scheduling prevent infinite loops and parameter thrashing.
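These safeguards are easy to picture in code. Below is a minimal sketch of a step-scale controller with sign-flip and dead-zone guards; the class name, thresholds, and shrink factor are illustrative assumptions, not DataEvolver's actual implementation.

```python
# Sketch of step-scale scheduling with sign-flip and dead-zone guards.
# Class name, thresholds, and shrink factor are illustrative assumptions,
# not DataEvolver's actual implementation.
class StepController:
    def __init__(self, step, shrink=0.5, min_step=1e-3, dead_zone=1e-2):
        self.step = step            # current step magnitude
        self.shrink = shrink        # multiplier applied after a sign flip
        self.min_step = min_step    # below this the parameter is "settled"
        self.dead_zone = dead_zone  # deltas this small are treated as no-ops
        self.last_sign = 0          # sign of the previously applied delta
        self.flips = 0              # sign-flip counter for diagnostics

    def next_delta(self, direction):
        """direction in {-1, 0, +1}, chosen by the agent from VLM feedback."""
        if direction == 0:
            return 0.0
        if self.last_sign and direction != self.last_sign:
            # Sign flip: the agent is oscillating around the target,
            # so shrink the step instead of thrashing back and forth.
            self.flips += 1
            self.step *= self.shrink
        self.last_sign = direction
        delta = direction * self.step
        # Dead zone: once steps are negligible, report a no-op so the
        # outer loop can terminate instead of iterating forever.
        if abs(delta) < self.dead_zone or self.step < self.min_step:
            return 0.0
        return delta
```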
Objects placed in real Blender scenes with HDRI environments. Raycast ground detection ensures physical plausibility.
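As a rough illustration, ground snapping in Blender's Python API can cast a downward ray from above the object and rest it on the hit point. The helper below is a minimal sketch under that assumption (it compares hit objects by name and assumes the object's origin sits at its base); DataEvolver's actual placement logic may differ.

```python
# Sketch of raycast ground placement via Blender's Python API (bpy).
# Casts a downward ray from above the object and rests it on the first
# surface that is not the object itself. Assumes the object's origin is
# at its base; DataEvolver's actual placement logic may differ.
import bpy
from mathutils import Vector

def snap_to_ground(obj, clearance=0.0, max_hops=8):
    scene = bpy.context.scene
    depsgraph = bpy.context.evaluated_depsgraph_get()
    down = Vector((0.0, 0.0, -1.0))
    origin = obj.location + Vector((0.0, 0.0, 10.0))  # start well above
    for _ in range(max_hops):
        hit, loc, _, _, hit_obj, _ = scene.ray_cast(depsgraph, origin, down)
        if not hit:
            return False  # no ground found beneath the object
        if hit_obj.name != obj.name:  # evaluated copies keep the name
            obj.location.z = loc.z + clearance
            return True
        origin = loc + down * 1e-4  # passed through our own mesh; recast
    return False
```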
Structured action space across lighting, object transform, scene environment, and material property groups.
A benchmark dataset for rotation-conditioned image editing. Each sample pairs a canonical front-view image with a target view specified in natural language.
```python
import json
from pathlib import Path

from PIL import Image

root = Path("dataset_scene_v7_full50_rotation8_...")

rows = []
with (root / "pairs/train_pairs.jsonl").open("r") as f:
    for line in f:
        rows.append(json.loads(line))

row = rows[0]
source = Image.open(root / row["source_image"]).convert("RGB")
target = Image.open(root / row["target_image"]).convert("RGB")
instruction = row["instruction"]
```
Best VLM-gated renders across 7 diverse Blender scenes. Each object is automatically placed, lit, and iteratively refined by goal-driven loop agents.
The AI agent selects from a discrete, structured action space to address VLM-identified issues. Each action targets a specific rendering parameter.
- ×1.2 / ×0.8 multiplicative, bounded [0.5, 2.0]
- ±15° yaw step, bounded [-90°, 90°]
- ×1.2 / ×0.8 multiplicative, bounded [0.5, 2.0]
- ±30° step, bounded [-180°, 180°]
- ±0.02 step, bounded [-0.1, 0.1]
- ±0.08 step, bounded [-0.3, 0.6]
- Plus 18 more actions; see scene_action_space.json (a hypothetical entry format is sketched below)
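For intuition, here is a hypothetical shape such an entry could take, together with a bounded update rule. The key names and the two example actions are invented for illustration; the real schema lives in configs/scene_action_space.json.

```python
# Hypothetical shape of an action-space entry plus a bounded update rule.
# The key names and the two example actions are invented for illustration;
# the real schema lives in configs/scene_action_space.json.
ACTIONS = {
    "light_intensity_up": {"mode": "mul", "factor": 1.2, "bounds": (0.5, 2.0)},
    "object_yaw_left":    {"mode": "add", "step": -15.0, "bounds": (-90.0, 90.0)},
}

def apply_action(value, name):
    spec = ACTIONS[name]
    lo, hi = spec["bounds"]
    if spec["mode"] == "mul":
        value *= spec["factor"]
    else:
        value += spec["step"]
    return max(lo, min(hi, value))  # clamp: the agent can never leave the valid range

print(apply_action(1.0, "light_intensity_up"))  # 1.2
print(apply_action(-85.0, "object_yaw_left"))   # clamped to -90.0
```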
What sets DataEvolver apart from other synthetic data pipelines and why goal-driven loop agents produce better training data.
Free-form natural language feedback provides semantic diagnosis that numeric scores cannot. The reviewer identifies why a render fails.
The agent reads raw review text and reasons about which action to take — no scripted rule engine, no score-threshold mapping.
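A minimal sketch of that review-to-action step is shown below, where `llm` is any text-in/text-out callable. The prompt wording, the fallback policy, and the function signature are assumptions, not the repository's API.

```python
# Sketch of the review-to-action step. `llm` is any text-in/text-out
# callable (e.g., a chat-completion wrapper); the prompt wording and
# the fallback policy are assumptions, not DataEvolver's exact code.
def agent_decide(review_text: str, action_space: list[str], llm) -> str:
    prompt = (
        "You are tuning a Blender render. Reviewer feedback:\n"
        f"{review_text}\n\n"
        "Pick exactly one action name from this list that best fixes "
        "the dominant issue:\n" + "\n".join(action_space)
    )
    reply = llm(prompt).strip()
    if reply in action_space:
        return reply
    # Guard against free-form replies: fall back to the first action
    # whose name appears verbatim in the reply, else the first action.
    return next((a for a in action_space if a in reply), action_space[0])
```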
LoRA fine-tuning on DataEvolver-Rotate improves Qwen Image Edit 2511 on PSNR, SSIM, and LPIPS vs. the base model.
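For reference, the three metrics can be computed with scikit-image and the lpips package roughly as follows. This illustrates the metrics themselves, not the repo's evaluation harness.

```python
# Sketch of a PSNR / SSIM / LPIPS comparison between an edited image and
# its ground-truth target, using scikit-image and the lpips package.
# Illustrates the metrics only; the repo's evaluation harness may differ.
import numpy as np
import torch
import lpips
from PIL import Image
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def compare(pred_path, target_path):
    pred = np.asarray(Image.open(pred_path).convert("RGB"), dtype=np.float32) / 255.0
    tgt = np.asarray(Image.open(target_path).convert("RGB"), dtype=np.float32) / 255.0
    psnr = peak_signal_noise_ratio(tgt, pred, data_range=1.0)  # higher is better
    ssim = structural_similarity(tgt, pred, channel_axis=-1, data_range=1.0)

    # LPIPS expects NCHW tensors in [-1, 1]; lower is better.
    def to_tensor(a):
        return torch.from_numpy(a).permute(2, 0, 1).unsqueeze(0) * 2.0 - 1.0

    with torch.no_grad():
        lpips_val = lpips.LPIPS(net="alex")(to_tensor(pred), to_tensor(tgt)).item()
    return psnr, ssim, lpips_val
```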
Beyond RGB pairs: mask, depth, normal maps, and geometry metadata — enabling multi-modal conditioning research.
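Assuming the extra annotations are referenced from the same JSONL rows as the RGB pairs, loading them could look like the snippet below, reusing `root` and `row` from the dataset example above. The key names ("mask", "depth", "normal") are guesses about the schema, not confirmed field names.

```python
# Hypothetical loading of the extra annotations, reusing `root` and `row`
# from the dataset snippet above. The key names ("mask", "depth", "normal")
# are guesses about the JSONL schema, not confirmed field names.
mask = Image.open(root / row["mask"]).convert("L")        # binary object mask
depth = Image.open(root / row["depth"])                   # single-channel depth
normal = Image.open(root / row["normal"]).convert("RGB")  # surface normals as RGB
```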
The models, frameworks, and infrastructure powering the DataEvolver pipeline.
DataEvolver runs on a Linux server with GPU access and Blender. Clone the repo and start building self-improving data pipelines.
```bash
git clone https://github.com/PRIS-CV/DataEvolver.git
cd DataEvolver

# Explore the pipeline
ls pipeline/   # 6-stage data synthesis
ls configs/    # Action space, scene templates
ls scripts/    # Agent monitor, dataset builders

# Read the full documentation
cat CLAUDE.md  # Comprehensive project guide
```
If you use DataEvolver or DataEvolver-Rotate in your research, please cite our work.
```bibtex
@misc{zhang2026dataevolverletdatabuild,
  title         = {DataEvolver: Let Your Data Build and Improve Itself via Goal-Driven Loop Agents},
  author        = {Qisong Zhang and Wenzhuo Wu and Zhuangzhuang Jia and Yunhao Yang and Huayu Zhang and Xianghao Zang and Zhixiang He and Zhongjiang He and Kongming Liang and Zhanyu Ma},
  year          = {2026},
  eprint        = {2605.01789},
  archivePrefix = {arXiv},
  primaryClass  = {cs.AI},
  url           = {https://arxiv.org/abs/2605.01789}
}
```
Clone the repository and start running goal-driven loop agents on your own scenes and objects.
Get Started on GitHub