🦀 Frontier AI Update: Digital Companion AI Girlfriend, ChatGPT Agent

New Programming Tool Kiro, Suno V4.5 Vocal Replacement, and More Major Tool Updates.

Jul 19, 2025

Latest advancements in large models:

1. New Breakthrough in OpenAI's Reasoning Capabilities: Its latest experimental reasoning large language model (the specific model is unknown 😊) achieved gold medal-level performance on the problems of the 2025 International Mathematical Olympiad (IMO). The model answered in natural language under the same rules as human contestants (time limits, no tools).

2. Google / DeepMind's Video Generation Model Veo 3 Officially Released: It is now available to developers in paid preview through the Gemini API and Vertex AI. The model supports native audio generation, enabling better integration of visuals and audio. Cost reduction!

3. MirageLSD, the First Live-Stream Diffusion (LSD) AI Model: Feed in any video stream, whether from a camera, a video call, a computer screen, or a game, and it transforms the stream in real time into any world you desire (<40ms latency). 🐮

https://about.decart.ai/publications/mirage

4. NVIDIA:

Released Canary-Qwen-2.5B: Topped Hugging Face's Open ASR Leaderboard.

https://huggingface.co/nvidia/canary-qwen-2.5b

Released OpenReasoning-Nemotron series: A family of reasoning LLMs distilled from the DeepSeek R1 model, available in 7B, 14B, and 32B sizes, achieving state-of-the-art results on multiple benchmarks.

Announced Audio-Flamingo 3: In collaboration with Hugging Face, launched a fully open-source large audio language model (LALM) for audio question answering, dialogue, and reasoning.

Empowering Other Models: The trillion-parameter open-source model Kimi K2, released by Moonshot AI, now supports the Muon optimizer; Microsoft's newly released Phi-4-mini-flash-reasoning model was also trained on NVIDIA GPUs.

5. Microsoft's CollabLLM Wins Award: The CollabLLM project, designed to improve collaboration between large language models and users, received the ICML 2025 Outstanding Paper Award.

6. Hume AI Released Speech-to-Speech Model EVI 3: This model mimics not only voices but also speaking styles and languages, achieving what amounts to "personality cloning."

......

Latest AI tools and platform advancements:

1. OpenAI:

Launched ChatGPT Agent: A unified agent system that completes complex tasks for users on its own computer, combining browser operation, web research, and conversational capabilities.

Upgraded Advanced Voice: Updated the Advanced Voice feature to make conversations more natural and fluid; the update has been rolled out to all paid and free users.

Improved API functionality: Enhanced image generation in the API to maintain higher fidelity when editing fine details such as faces and logos, and raised the limits on Structured Outputs to support larger schemas.

Educational Cooperation: Partnered with the American Federation of Teachers to launch the "National Academy for AI Instruction," which aims to help 400,000 teachers better use and teach AI in schools within five years.

2. Google / DeepMind:

Launched "Deep Search": Introduced the "Deep Search" feature in the AI mode of Google Search, utilizing the Gemini 2.5 Pro model for in-depth exploration of complex questions.

ARC-AGI-3 Benchmark Preview: The ARC Prize Foundation released a developer preview of ARC-AGI-3, its next-generation benchmark for evaluating general artificial intelligence, which includes games, an API, and competition prizes.

Upgraded Code Assist: Added a new "Agent Mode" to the coding assistant Code Assist, which can analyze an entire codebase to handle multi-file tasks.

Launched NotebookLM Featured Notebooks: In collaboration with the well-known publication The Economist, Google launched a featured notebook that analyzes the year's most important trends.

AI Programming Tool Traycer for VS Code: A VS Code extension that also acts as an intelligent agent.

https://traycer.ai/

3. Amazon Launches AgentCore: At the AWS Summit, Amazon announced a new feature called AgentCore, a set of building blocks designed to help users deploy AI agents into production environments in a secure, scalable, and flexible way.

4. Amazon Unveils Kiro, a New AI Programming Tool: This is an experimental, agent-driven integrated development environment (IDE) aimed at revolutionizing the software development process. I really love the interface design; it feels better than Cursor. The latest version is free, but the official download link is down (probably due to too many downloads), and access now requires getting on a waitlist. Fortunately, I had downloaded it in advance and can use Claude 4, which is fantastic.

5. Moonshot AI's Kimi Launches Kimi Playground: It works like a smart assistant that can call various tools during a conversation.

https://platform.moonshot.cn/playground

6. Suno Launches New Version v4.5+:

“Add Vocals”: This feature allows users to upload their own instrumental tracks (or use accompaniment generated by Suno), then input lyrics to add AI-generated vocals to the music. This is a revolutionary feature for users without professional recording equipment.

“Add Instrumentals”: Conversely, users can upload their own vocal recordings or a cappella audio, then have Suno generate complete instrumental accompaniment based on text prompts.

“Inspire”: This feature analyzes the playlists created by users to understand their music preferences and, based on this, creates new songs.

7. AI Girlfriends Go Viral: Shortly after recreating Grok's AI girlfriend project, the Jackywine team officially released its latest digital companion app, "Bella," which is highly personalized and capable of emotional perception.

https://github.com/Jackywine/Bella

8. Other Tools and Platforms:

Runway: Launched the next-generation motion capture model Act-Two, significantly improving generation quality and tracking capabilities.

AssemblyAI: Introduced Universal Streaming, a very fast and highly accurate streaming speech-to-text model designed for voice agents.

Midjourney: Announced that it is exploring an open enterprise-level API to facilitate integration by other companies and services.

Ideogram: Made major upgrades to its 3.0 version, enhancing image realism, style diversity, and prompt following capabilities.

Anthropic: Increased the rate limit for its Claude Sonnet 4 model on the API.

Cursor: Announced that its platform now supports the Grok 4 model.

Mistral AI: Launched Le Chat, catching up with ChatGPT.

https://chat.mistral.ai/chat

......

Latest advancements in medical AI:

1. OpenMed Project:

The OpenMed project was announced on Hugging Face, releasing over 380 advanced medical Named Entity Recognition (NER) models at once, all freely available under the Apache 2.0 license.

https://huggingface.co/blog/MaziyarPanahi/open-health-ai

......

Autonomous driving:

Waymo:

Milestone: Announced that the cumulative mileage of its fully autonomous driving fleet on public roads has exceeded 100 million miles (approximately 160 million kilometers).

Business Expansion: Announced the expansion of its robotaxi service in Austin, Texas, USA, where riders can hail a car through the Uber platform.

Maodi's AI Newsletter
