Tanziro

Designer + data tinkerer. I make apps that disappear into your workflow so you can get more done.

10/06/2026

Aishwarya Srinivasan's framing of where AI architecture stands in 2026:

2024 = RAG era
2025 = Agents era
2026 = Stateful Orchestration era

LangGraph 1.2, released May 11, 2026, makes this the default model: an agent run is a durable graph execution, not a Python function call.

The fundamental unit is the StateGraph. Nodes read from and write to a single typed state object, the system's working memory. Edges are either deterministic or conditional. Every transition is checkpointed (in memory, SQLite, or Postgres), which makes pause/resume, time-travel debugging, and horizontal scaling first-class features.

The practical difference from chains: chains pass outputs between steps. Graphs maintain and evolve shared state. That changes what you can reliably build.
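To make "shared state plus per-step checkpoints" concrete, here is a minimal stdlib sketch of the idea. It is a hypothetical toy (`MiniStateGraph`, `research`, `route`, and `answer` are invented names), not LangGraph's API: nodes return partial updates that are merged into one typed state, a callable edge acts as a conditional edge, and a snapshot is stored after every step.

```python
from __future__ import annotations

import copy
from typing import Callable, TypedDict, Union, cast


class State(TypedDict):
    """Hypothetical typed state: the graph's shared working memory."""
    question: str
    notes: list[str]
    answer: str


Node = Callable[[State], dict]   # a node returns a partial state update
Router = Callable[[State], str]  # a conditional edge returns the next node's name
END = "__end__"


class MiniStateGraph:
    """Toy state graph (hypothetical, not LangGraph's API) with per-step checkpoints."""

    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}
        self.edges: dict[str, Union[str, Router]] = {}

    def add_node(self, name: str, fn: Node) -> None:
        self.nodes[name] = fn

    def add_edge(self, src: str, dst: Union[str, Router]) -> None:
        # A plain string is a deterministic edge; a callable is a conditional edge.
        self.edges[src] = dst

    def run(self, start: str, state: State) -> tuple[State, list[tuple[str, State]]]:
        checkpoints: list[tuple[str, State]] = []
        current = start
        while current != END:
            update = self.nodes[current](state)
            state = cast(State, {**state, **update})  # merge the partial update
            checkpoints.append((current, copy.deepcopy(state)))  # snapshot after every step
            nxt = self.edges[current]
            current = nxt(state) if callable(nxt) else nxt
        return state, checkpoints


def research(state: State) -> dict:
    return {"notes": state["notes"] + [f"looked up: {state['question']}"]}


def route(state: State) -> str:
    # Conditional edge: keep researching until two notes exist.
    return "research" if len(state["notes"]) < 2 else "answer"


def answer(state: State) -> dict:
    return {"answer": " | ".join(state["notes"])}


graph = MiniStateGraph()
graph.add_node("research", research)
graph.add_node("answer", answer)
graph.add_edge("research", route)
graph.add_edge("answer", END)

final, history = graph.run("research", {"question": "q", "notes": [], "answer": ""})
```

A real checkpointer also records which node runs next, so a run can resume from any stored snapshot; this toy keeps only the node that produced each snapshot.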

DWG · AI-04 · tanziro.com

09/06/2026

LightMover, from Adobe Research at CVPR 2026, reframes what it means to edit light in an image.

The standard assumption: to move a light, you need the 3D scene. LightMover skips all of that.

The reframe: light editing is a sequence-to-sequence prediction problem in visual token space. Give it an image plus light-control tokens specifying position, color, and intensity. It returns the correctly lit result, including propagated shadows, reflections, and falloff, from a single view.

The technical win: adaptive token pruning that preserves spatially informative tokens while compactly encoding non-spatial attributes. Reduces control sequence length by 41% with no loss in fidelity.
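The pruning idea can be illustrated with a toy sketch. This is a generic top-k pruning scheme chosen for illustration, not LightMover's published method: `ControlToken`, the `score` field, and `keep_ratio` are hypothetical. It keeps the most spatially informative tokens in their original order and packs all non-spatial attributes into one token, then measures how much shorter the sequence gets.

```python
from dataclasses import dataclass


@dataclass
class ControlToken:
    kind: str                 # "spatial", or a non-spatial attribute such as "color"
    score: float              # hypothetical spatial-informativeness score
    value: tuple[float, ...]  # the token's payload


def prune_control_tokens(tokens: list[ControlToken], keep_ratio: float) -> list[ControlToken]:
    spatial = [t for t in tokens if t.kind == "spatial"]
    attrs = [t for t in tokens if t.kind != "spatial"]
    # Keep the top-k most informative spatial tokens, in their original order.
    k = max(1, round(len(spatial) * keep_ratio)) if spatial else 0
    top = sorted(range(len(spatial)), key=lambda i: spatial[i].score, reverse=True)[:k]
    kept = [spatial[i] for i in sorted(top)]
    # Pack every non-spatial attribute into a single compact token.
    if attrs:
        packed = tuple(v for t in attrs for v in t.value)
        kept.append(ControlToken("attributes", 0.0, packed))
    return kept


tokens = [ControlToken("spatial", i / 16, (float(i),)) for i in range(16)]
tokens += [ControlToken("color", 0.0, (1.0, 0.9, 0.8)), ControlToken("intensity", 0.0, (0.7,))]
pruned = prune_control_tokens(tokens, keep_ratio=0.5)
reduction = 1 - len(pruned) / len(tokens)
```

With these made-up inputs the sequence drops from 18 to 9 tokens; the 41% figure in the post comes from the paper, not from this sketch.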

One pass. No re-render. No 3D reconstruction. Real physical consequences from a single image.

DWG · AI-03 · tanziro.com

09/06/2026

CVPR 2026 Best Paper goes to D4RT — and the architecture is worth understanding.

Dynamic 4D Reconstruction and Tracking. A single transformer, built by Chuhan Zhang at Google DeepMind with UCL and Oxford.

Feed it ordinary video. Get three things out simultaneously:
— Depth maps (per-frame geometry)
— 3D tracks (spatio-temporal correspondence)
— Camera poses (full parameters)

Those three outputs used to require three separate specialist networks. D4RT does all three in one encoder-decoder pass.
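The "one pass, three outputs" point can be sketched as a shared encoder feeding three lightweight heads. This is a toy with random, untrained weights and assumed shapes (`SharedEncoderThreeHeads`, `patch_dim`, `latent_dim` are hypothetical); it does not reproduce D4RT's actual decoder, only the shared-encoder structure.

```python
import numpy as np


class SharedEncoderThreeHeads:
    """Toy multi-task model: one encoder pass feeds depth, 3D-track, and pose heads.

    Weights are random and untrained; shapes are illustrative assumptions.
    """

    def __init__(self, patch_dim: int, latent_dim: int, seed: int = 0) -> None:
        rng = np.random.default_rng(seed)
        self.w_enc = rng.normal(size=(patch_dim, latent_dim)) / np.sqrt(patch_dim)
        self.w_depth = rng.normal(size=(latent_dim, 1))   # per-patch depth
        self.w_track = rng.normal(size=(latent_dim, 3))   # per-patch 3D point
        self.w_pose = rng.normal(size=(latent_dim, 12))   # per-frame 3x4 [R|t]
        self.encoder_calls = 0

    def encode(self, video: np.ndarray) -> np.ndarray:
        # video: (frames, patches, patch_dim) -> latent: (frames, patches, latent_dim)
        self.encoder_calls += 1
        return np.tanh(video @ self.w_enc)

    def __call__(self, video: np.ndarray):
        z = self.encode(video)                       # single shared pass
        depth = (z @ self.w_depth)[..., 0]           # (frames, patches)
        tracks = z @ self.w_track                    # (frames, patches, 3)
        # Unconstrained 3x4 matrices; a real pose head would enforce a valid rotation.
        poses = (z.mean(axis=1) @ self.w_pose).reshape(-1, 3, 4)  # (frames, 3, 4)
        return depth, tracks, poses


model = SharedEncoderThreeHeads(patch_dim=48, latent_dim=32)
video = np.random.default_rng(1).normal(size=(4, 64, 48))
depth, tracks, poses = model(video)
```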

Speed: 18× to 300× faster than prior SOTA.

If you build spatial interfaces, AR, or video-driven interactions — the "one model handles all geometry" era just started.

DWG · AI-02 · tanziro.com

09/06/2026

Type III error doesn't show up in your loss curve.

It's not a false positive or a false negative. It's correct math applied to the wrong question entirely.

Cassie Kozyrkov puts it plainly: Decision Intelligence exists to reduce Type III error.

Execution is never the bottleneck. Problem selection is.

DWG · AI-01 · tanziro.com

09/06/2026

ASSERT (Adaptive Spec-driven Scoring for Evaluation and Regression Testing), released at Microsoft Build 2026, converts plain-language behavior specs into executable eval pipelines.

Feed it a written description of how your agent should behave. It generates:
- Stratified test scenarios across declared dimensions
- Full trace recordings including tool use and intermediate decisions
- Scored labels with rationales and failure patterns
- A CI-compatible scorecard that gates each deployment
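Two of those steps, stratifying scenarios across declared dimensions and gating a deploy on the scorecard, can be sketched in a few lines. This is a hypothetical illustration, not ASSERT's API: `stratify`, `Result`, `scorecard`, and the dimension names are invented, and the pass/fail labels come from a stub rather than a real judge.

```python
import itertools
from collections import defaultdict
from dataclasses import dataclass


def stratify(dimensions: dict[str, list[str]]) -> list[dict[str, str]]:
    """One scenario per combination of declared dimension values."""
    names = sorted(dimensions)
    return [dict(zip(names, combo)) for combo in itertools.product(*(dimensions[n] for n in names))]


@dataclass
class Result:
    scenario: dict[str, str]
    passed: bool
    rationale: str


def scorecard(results: list[Result], min_pass_rate: float) -> dict:
    """Overall and per-slice pass rates, plus a deploy gate for CI."""
    slices: dict[tuple[str, str], list[bool]] = defaultdict(list)
    for r in results:
        for dim, value in r.scenario.items():
            slices[(dim, value)].append(r.passed)
    by_slice = {key: sum(v) / len(v) for key, v in slices.items()}
    overall = sum(r.passed for r in results) / len(results)
    # Gate on the weakest slice too, so one failing stratum blocks the deploy.
    deploy = overall >= min_pass_rate and min(by_slice.values()) >= min_pass_rate
    return {"overall": overall, "by_slice": by_slice, "deploy": deploy}


scenarios = stratify({"tone": ["polite", "hostile"], "tool": ["search", "none"]})
# Stand-in for a judged agent trace: this fake agent fails hostile users who need search.
results = [
    Result(s, passed=not (s["tone"] == "hostile" and s["tool"] == "search"), rationale="stub")
    for s in scenarios
]
card = scorecard(results, min_pass_rate=0.7)
```

Here the overall pass rate (0.75) clears the threshold, but the hostile-tone slice sits at 0.5, so the gate still blocks the deploy. Per-slice gating is a design choice of this sketch.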

MIT-licensed. Framework-agnostic: LangChain, CrewAI, AutoGen, OpenAI Agents SDK, DSPy, LlamaIndex, Semantic Kernel — all supported via LiteLLM.

github.com/responsibleai/ASSERT

09/06/2026

Ideogram 4.0 (June 3, 2026) is a 9.3B Diffusion Transformer trained exclusively on JSON-structured captions — not natural language prompts.

You specify:
- Bounding boxes → where each element sits on the canvas
- Hex palette → exact color conditioning
- Per-element text and style strings → what it says and how it looks
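A JSON-structured caption along those lines could be assembled and validated as below. The field names (`palette`, `elements`, `bbox`, `text`, `style`) and the normalized `(x0, y0, x1, y1)` box convention are assumptions for illustration, not Ideogram's documented schema.

```python
import json
import re

HEX = re.compile(r"^#[0-9A-Fa-f]{6}$")


def element(bbox: tuple[float, float, float, float], text: str, style: str) -> dict:
    # Assumed convention: normalized (x0, y0, x1, y1) canvas coordinates.
    x0, y0, x1, y1 = bbox
    if not (0 <= x0 < x1 <= 1 and 0 <= y0 < y1 <= 1):
        raise ValueError(f"bbox must be normalized (x0, y0, x1, y1): {bbox}")
    return {"bbox": [x0, y0, x1, y1], "text": text, "style": style}


def structured_caption(elements: list[dict], palette: list[str]) -> str:
    bad = [c for c in palette if not HEX.match(c)]
    if bad:
        raise ValueError(f"invalid hex colors: {bad}")
    return json.dumps({"palette": palette, "elements": elements}, indent=2)


caption = structured_caption(
    [
        element((0.1, 0.05, 0.9, 0.25), "SUMMER SALE", "bold condensed sans, white"),
        element((0.1, 0.7, 0.5, 0.9), "Up to 40% off", "thin serif, yellow"),
    ],
    ["#0B1D51", "#FFFFFF", "#FFD23F"],
)
```

Validating boxes and colors before generation catches malformed layouts early, which matters most in automated pipelines.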

0.97 on X-Omni English OCR. 2K native resolution. Native alpha channel output — no background removal step needed.

Ranks first in quality mode among all open-weight text-to-image models, and 9th overall, behind only closed-source models from OpenAI and Google.

Relevant for: generative design pipelines, marketing automation, and any workflow where text-in-image accuracy matters.

ideogram.ai/news/ideogram-4.0

08/06/2026

SmolDocling packs a 93M SigLIP encoder + 135M SmolLM-2 into a 256M-parameter model that converts full document pages end-to-end.

The DocTags format gives every element — tables, equations, code blocks, charts, headers — a structured tag with bounding box and reading-order position. One inference call handles it all.
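Downstream code then only has to parse tags. The sketch below reads a simplified DocTags-like string where each element carries four `<loc_N>` bounding-box tokens and its position in the stream gives reading order. Treat the exact tag vocabulary as an assumption; real tables (OTSL) nest further tags that this flat regex does not handle.

```python
import re
from dataclasses import dataclass

# tag, four location tokens (x0, y0, x1, y1), then the element body up to the matching close tag
ELEMENT = re.compile(
    r"<(?P<tag>\w+)><loc_(\d+)><loc_(\d+)><loc_(\d+)><loc_(\d+)>(?P<body>.*?)</(?P=tag)>",
    re.DOTALL,
)


@dataclass
class Element:
    order: int                       # reading-order position
    tag: str                         # element type, e.g. "text" or "code"
    bbox: tuple[int, int, int, int]  # location tokens as integers
    text: str


def parse_doctags(doc: str) -> list[Element]:
    return [
        Element(i, m["tag"], tuple(int(m.group(g)) for g in range(2, 6)), m["body"].strip())
        for i, m in enumerate(ELEMENT.finditer(doc))
    ]


sample = (
    "<doctag>"
    "<section_header><loc_40><loc_30><loc_460><loc_55>1 Introduction</section_header>"
    "<text><loc_40><loc_60><loc_460><loc_120>Documents mix prose and tables.</text>"
    "<code><loc_40><loc_130><loc_460><loc_180>print('hi')</code>"
    "</doctag>"
)
elements = parse_doctags(sample)
```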

Performance on conversion benchmarks matches models up to 27× larger. 0.35 s per page. 0.489 GB VRAM on an A100.

IBM's successor model, granite-docling-258M, is now the actively maintained version on Hugging Face.

Paper: arxiv 2503.11576 · huggingface.co/ds4sd/SmolDocling-256M-preview

08/06/2026

OpenClaw-RL from Gen-Verse (ICML 2026) turns every conversation into a training loop.

The system wraps your self-hosted LLM as an OpenAI-compatible API, intercepts multi-turn exchanges, and treats every next-state signal as reward — optimizing the policy silently while you keep using the model.
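The data-collection half of that loop can be sketched as a proxy that pairs each model response with the user's next turn. Everything here is hypothetical (`RecordingProxy`, `next_state_reward`, and the correction-word heuristic are invented), and the actual policy update that OpenClaw-RL performs is out of scope for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Optional

CORRECTION_WORDS = {"no", "wrong", "actually"}  # hypothetical heuristic


@dataclass
class Transition:
    prompt: str
    response: str
    next_state: Optional[str] = None
    reward: Optional[float] = None


def next_state_reward(next_user_msg: str) -> float:
    """Hypothetical reward: a correction in the next turn is a negative signal."""
    words = next_user_msg.lower().replace(",", " ").split()
    return -1.0 if words and words[0] in CORRECTION_WORDS else 1.0


class RecordingProxy:
    """Wraps a model callable and logs (prompt, response, next_state, reward) records."""

    def __init__(self, model: Callable[[str], str], reward_fn: Callable[[str], float]) -> None:
        self.model = model
        self.reward_fn = reward_fn
        self.buffer: list[Transition] = []
        self._pending: Optional[Transition] = None

    def chat(self, user_msg: str) -> str:
        # The new user turn is the "next state" that scores the previous response.
        if self._pending is not None:
            self._pending.next_state = user_msg
            self._pending.reward = self.reward_fn(user_msg)
            self.buffer.append(self._pending)
        response = self.model(user_msg)
        self._pending = Transition(user_msg, response)
        return response


proxy = RecordingProxy(model=lambda m: f"echo: {m}", reward_fn=next_state_reward)
for turn in ["capital of australia?", "No, it's Canberra", "thanks"]:
    proxy.chat(turn)
```

The last response stays pending until another turn arrives, since its next-state signal does not exist yet.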

No annotation pipeline. No reward datasets. Every correction you type becomes a gradient signal.

The same framework covers terminal agents, GUI agents, SWE agents, and tool-call agents via the Open-AgentRL companion repo.

Paper: arxiv 2603.10165 · github.com/Gen-Verse/OpenClaw-RL

08/06/2026

Computer use agents just went fully local — and this matters more than it sounds.

H Company dropped Holo3.1 on June 2, 2026: the first production-grade computer use model family to ship quantized checkpoints (FP8, NVFP4, Q4 GGUF) built for fully local inference.

Four model sizes: 0.8B, 4B, 9B, and 35B-A3B. The 35B model runs on a 12GB GPU at 140ms per action. The 4B runs on even lighter hardware.

Performance: AndroidWorld jumps from 67% to 79.3% on the 35B. Even the 4B and 9B go from 58% to 72%. That's not a benchmark win — that's a deployment story.

No cloud. No API keys. No per-call cost. Your computer use agent runs on-device.

huggingface.co/Hcompany

08/06/2026

The first fully open omnimodal world model is here.

NVIDIA Cosmos 3 (June 1, 2026) pairs two towers in a mixture-of-transformers: a VLM reasoner that interprets images, video, and text — and a generation expert that produces physics-aware video, sound, and robot action trajectories.
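A generic mixture-of-transformers layer gives each tower its own projection weights while running one global attention over all tokens, so the two towers read each other's context. The sketch below shows that general pattern with random weights; whether Cosmos 3 is built exactly this way is an assumption, and `init_params` and `mot_layer` are hypothetical names.

```python
import numpy as np

TOWERS = ("reason", "gen")


def init_params(d: int, rng: np.random.Generator) -> dict:
    # Separate Q/K/V/O projections per tower (random, untrained).
    return {t: {name: rng.normal(scale=d ** -0.5, size=(d, d)) for name in "qkvo"} for t in TOWERS}


def mot_layer(x: np.ndarray, is_gen: np.ndarray, params: dict) -> np.ndarray:
    """x: (n, d) tokens; is_gen: (n,) bool mask marking generation-tower tokens."""
    n, d = x.shape
    masks = {"reason": ~is_gen, "gen": is_gen}
    q, k, v = (np.empty((n, d)) for _ in range(3))
    for tower, mask in masks.items():
        w = params[tower]
        q[mask], k[mask], v[mask] = x[mask] @ w["q"], x[mask] @ w["k"], x[mask] @ w["v"]
    # One global softmax attention over all tokens: the towers read each other.
    scores = q @ k.T / np.sqrt(d)
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)
    mixed = attn @ v
    out = np.empty_like(x)
    for tower, mask in masks.items():
        out[mask] = mixed[mask] @ params[tower]["o"]  # tower-specific output projection
    return x + out  # residual connection


rng = np.random.default_rng(0)
params = init_params(8, rng)
x = rng.normal(size=(6, 8))
is_gen = np.array([False, False, False, True, True, True])
y = mot_layer(x, is_gen, params)
```

Because output projections are tower-specific, changing the generation tower's output weights leaves the reasoner tokens' outputs untouched, even though attention is shared.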

One model. Every modality. Both directions: understanding and generation.

Ranked best open-source Text-to-Image and Image-to-Video by Artificial Analysis. Best policy model by RoboArena.

research.nvidia.com/labs/cosmos-lab/cosmos3

DWG · AI-07 · tanziro.com

Address

Radhanagar, Sreemangal
Maulvi Bazar
3210

Website

https://x.com/tanz1r, https://medium.com/@tanzir71, https://huggingface.co/tanziro, https://ganges.quest/
