AI Daily Briefing

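A timeline of AI products, models, open-source tools, and official updates. Historical entries are kept and can be filtered further by category, date, and tag.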

June 24: yesterday's briefing
June 23, 2026 briefing
MarkTechPost · Official news

MarkTechPost: Datalab Releases lift: A 9B Open-Weights Vision Model That Extracts Structured JSON From PDF…

Original summary: Datalab released lift, a 9B open-weights vision model that turns PDFs and images into schema-matching JSON. It uses schema-constrained decoding for valid structure and trained abst… Source: MarkTechPost. We recommend reading the original article and checking which tool entry points, costs, risks, and real-world use cases it affects.

MarkTechPost · Official news

MarkTechPost: How to Use NVIDIA Canary-1B-v2 for ASR, Translation, and Automatic SRT Subtitle Export in Py…

Original summary: In this tutorial, we build a multilingual ASR and speech translation pipeline with NVIDIA Canary-1B-v2. We load the model on a GPU-enabled runtime, prepare audio into 16 kHz mono,… Source: MarkTechPost.

June 22, 2026 briefing

AWS Machine Learning updates: Running ComfyUI workflows on Amazon SageMaker AI processing jobs

Original summary: In this post, we walk you through how to deploy ComfyUI workflows on Amazon SageMaker AI processing jobs to generate hundreds of high-quality images in a single batch. You learn ho… Source: AWS Machine Learning updates.

AWS Machine Learning updates: Embed the world: Multimodal AI for searchable aerial imagery at scale

Original summary: In this post, we walk through the problem space, our architecture on Amazon Bedrock and Amazon OpenSearch Serverless, the evaluation methodology we built on OpenStreetMap ground tr… Source: AWS Machine Learning updates.

NVIDIA AI updates · Official news

NVIDIA AI updates: NVIDIA Vera CPU Opens the Way for Agentic Scientific AI at Los Alamos National Laboratory

Original summary: Mission, Vision and Veritas, new Los Alamos National Laboratory (LANL) supercomputers to be built with HPE and NVIDIA, are tapping NVIDIA Vera CPUs to accelerate scientific disco… Source: NVIDIA AI updates.

June 20, 2026 briefing
The Decoder · Official news

The Decoder: The EU doesn't really know what a deepfake is, and that's becoming a problem for retail

Original summary: Eurocommerce, the trade association behind Amazon, H&M, and IKEA, wants AI-generated ads exempt from the EU AI Act's transparency rules. The argument: an AI-generated livi… Source: The Decoder.

June 19, 2026 briefing
The Decoder · Official news

The Decoder: Norway bans generative AI tools in elementary schools to protect kids' basic learning skills

Original summary: Norway is banning generative AI tools in elementary schools starting in late August. Students in grades 1 through 7 won't be allowed to use AI at all; secondary schools wi… Source: The Decoder.

June 18, 2026 briefing

AWS Machine Learning updates: Monitor and debug generative AI inference with SageMaker detailed metrics and Insights dashb…

Original summary: Amazon SageMaker AI provides fully managed real-time inference hosting for machine learning models. You deploy a model to a SageMaker endpoint backed by one or more compute instanc… Source: AWS Machine Learning updates.

The Decoder · Official news

The Decoder: Midjourney, known for AI image generation, unveils a full-body ultrasound scanner and its ow…

Original summary: Rumors about Midjourney hardware have circulated for years, but nobody saw this coming. The AI image startup is building a full-body ultrasound scanner and opening its own… Source: The Decoder.

June 16, 2026 briefing

AWS Machine Learning updates: Introducing container caching in Amazon SageMaker AI for faster model scaling

Original summary: Today, we’re excited to announce container image caching for Amazon SageMaker AI inference, the next major advancement in our faster scaling optimization journey. This speeds up en… Source: AWS Machine Learning updates.

MarkTechPost · Official news

MarkTechPost: How to Build a Parsing Pipeline with Docling Parse for Layout-Aware Document Intelligence

Original summary: In this tutorial, we build a workflow that uses Docling Parse to analyze PDF documents at a detailed structural level. We prepare a stable Python environment, handle common Colab d… Source: MarkTechPost.

June 15, 2026 briefing

AWS Machine Learning updates: Introducing Gemma 4 models on Amazon Bedrock

Original summary: Today, we are announcing the availability of the Gemma 4 family on Amazon Bedrock. Built by Google DeepMind and released under the Apache 2.0 license, Gemma 4 is a family of open-w… Source: AWS Machine Learning updates.

NVIDIA Developer updates: Pretrained to Imagine, Fine-Tuned to Act: The Rise of World-Action Models

Original summary: Quick glossary for readers new to VLA/WAM terminology VLA Vision-Language-Action model: a robot policy that starts from a pretrained VLM backbone and adapts it... Source: NVIDIA Developer updates.

June 14, 2026 briefing
The Decoder · Official news

The Decoder: Microsoft Research's Mirage gives video generation a persistent spatial memory that doesn't …

Original summary: Mirage, a video world model from Microsoft Research and several universities, stores scene information directly in latent space instead of pixel-based point clouds. That s… Source: The Decoder.

June 13, 2026 briefing
The Decoder · Official news

The Decoder: New AI model called "Count Anything" does exactly what it says, and that's harder than it so…

Original summary: "Count Anything" is intended to be the first AI model capable of counting objects in any type of image, from crowds to cell samples under a microscope, using nothing more… Source: The Decoder.

June 12, 2026 briefing

NVIDIA Developer updates: Deploy Long-Context Reasoning and Agentic Workflows with MiniMax M3 on NVIDIA Accelerated In…

Original summary: As enterprise AI adoption scales, developers are increasingly forced to stitch together fragmented pipelines: separate models for text, vision, and... Source: NVIDIA Developer updates.

MarkTechPost · Official news

MarkTechPost: Zyphra Release Zamba2-VL: Hybrid Mamba2–Transformer Vision-Language Models That Cut Time-to-…

Original summary: Zyphra has released Zamba2-VL, a family of open vision-language models at 1.2B, 2.7B, and 7B parameters. The models use a hybrid Mamba2 state-space and Transformer backbone, shippi… Source: MarkTechPost.

MarkTechPost · Official news

MarkTechPost: A Coding Implementation on MONAI for End-to-End 3D Spleen Segmentation Using UNet on Medical…

Original summary: In this tutorial, we build an end-to-end 3D medical image segmentation pipeline using MONAI to segment the spleen on the Medical Segmentation Decathlon Task09 dataset. We work with… Source: MarkTechPost.

June 10, 2026 briefing
The Decoder · Official news

The Decoder: Google's new open model DiffusionGemma generates text from noise instead of word by word

Original summary: Google released DiffusionGemma, a 26-billion-parameter model that generates text not token by token but through diffusion, similar to how image AI turns noise into a pictu… Source: The Decoder.

MarkTechPost · Official news

MarkTechPost: Top AI Coding Agents and Development Platforms in 2026: Atoms, Devin, Windsurf, Cursor, Warp…

Original summary: Software development has changed. Engineers no longer type most code by hand. They describe intent, and AI agents do the work. Modern tools plan tasks, edit across files, run tests… Source: MarkTechPost.

June 9, 2026 briefing

NVIDIA Developer updates: Delivering Lifecycle Control for AI Infrastructure at Scale with NVIDIA DGX Spark Enterprise…

Original summary: As AI infrastructure scales, enterprise expectations for operational maturity are increasing. Organizations expect these systems to be provisionable,... Source: NVIDIA Developer updates.

MarkTechPost · Official news

MarkTechPost: Google Releases Gemini 3.5 Live Translate, a Streaming Speech-to-Speech Audio Model Covering…

Original summary: Gemini 3.5 Live Translate streams speech-to-speech translation across 70+ languages. It generates audio continuously, staying a few seconds behind the speaker. The model reaches de… Source: MarkTechPost.

The Decoder · Official news

The Decoder: Google's Gemini 3.5 Live Translate delivers real-time voice translation across 70+ languages

Original summary: Google releases Gemini 3.5 Live Translate, an audio model for real-time translation across more than 70 languages. The system translates continuously without waiting for a… Source: The Decoder.

June 8, 2026 briefing
The Decoder · Official news

The Decoder: Intel gets a second life as Google and Nvidia explore it as a TSMC backup for AI chips

Original summary: Google has ordered more than three million AI chips from Intel for 2028. Nvidia is testing Intel's manufacturing tech for its upcoming Feynman architecture. Both moves com… Source: The Decoder.

The Decoder · Official news

The Decoder: Microsoft Research's Lens proves detailed captions matter more than raw scale for training e…

Original summary: Microsoft Research presents Lens, a text-to-image model with just 3.8 billion parameters that matches much larger rivals on benchmarks, at a fraction of the training cost. Source: The Decoder.

AWS Machine Learning updates: Evaluate your Amazon Nova Sonic voice agent at scale, no microphone required

Original summary: In this post, we walk you through the Nova Sonic Test Harness, an open-source framework that we built to solve both problems. It serves as a rapid iteration tool for tuning system… Source: AWS Machine Learning updates.

InfoQ AI ML Data Engineering: Gemma 4 12B Enables On-Device, Multimodal Agentic Workflows with an Encoder-free Architectur…

Original summary: Google says Gemma 4 12B is "designed to bring agentic, multimodal intelligence directly to your laptop", further noting that the new model can be combined with Google AI Edge to "b… Source: InfoQ AI ML Data Engineering.

MarkTechPost · Official news

MarkTechPost: Microsoft AI Introduces MAI-Transcribe-1.5: 2.4% WER on Artificial Analysis, Best-in-Class F…

Original summary: Microsoft AI has released MAI-Transcribe-1.5, the second iteration of its in-house speech-to-text family. The model covers 43 languages, adds keyword (entity) biasing for domain-sp… Source: MarkTechPost.

June 6, 2026 briefing
The Decoder · Official news

The Decoder: New open-source voice model listens nonstop and decides every 0.4 seconds whether to speak o…

Original summary: Unlike GPT-4o or Qwen3.5-Omni, Audio Interaction doesn't wait for a recording to end: it translates, transcribes, chats, and picks up everyday noises like coughing in a si… Source: The Decoder.

The Decoder · Official news

The Decoder: Qwen3.7-Plus is Alibaba's bid to turn multimodal AI into a full-blown autonomous agent

Original summary: Alibaba's Qwen team has released Qwen3.7-Plus, a multimodal agent model that combines visual perception, GUI operation, and coding in a single agent loop. In a demo, an ag… Source: The Decoder.

June 5, 2026 briefing

InfoQ AI ML Data Engineering: Article Series: Securing the AI Stack: From Model to Production

Original summary: This series provides your roadmap for the machine age, exploring how to move from vulnerable prototypes to resilient systems through layered defense, robust MLOps, and integrated g… Source: InfoQ AI ML Data Engineering.

June 4, 2026 briefing
MarkTechPost · Official news

MarkTechPost: Miso Labs Releases MisoTTS: An 8B Emotive Text-to-Speech Model with Open Weights

Original summary: Miso Labs has released MisoTTS, an open-weights 8B text-to-speech model. It uses residual vector quantization (RVQ) to scale its sonic range without scaling parameters, and conditi… Source: MarkTechPost.

June 3, 2026 briefing
The Decoder · Official news

The Decoder: Google Deepmind's Gemma 4 12B squeezes multimodal AI onto a laptop with just 16 GB of RAM

Original summary: Google Deepmind's Gemma 4 12B is an open-source model that processes text, images, and audio natively and runs on laptops with just 16 GB of RAM. It nearly matches the twi… Source: The Decoder.

MarkTechPost · Official news

MarkTechPost: Google DeepMind Releases Gemma 4 12B: An Encoder-Free Multimodal Model with Native audio tha…

Original summary: Gemma 4 12B feeds vision and audio straight into the LLM backbone, running locally under an Apache 2.0 license. Source: MarkTechPost.

The Decoder · Official news

The Decoder: Ideogram 4.0 drops as an open-weight model with native 2K resolution and improved text rende…

Original summary: Ideogram releases version 4.0 of its text-to-image model as an open-weight model with native 2K resolution, bounding box control, and improved text rendering. On the Desig… Source: The Decoder.

NVIDIA AI updates · Official news

NVIDIA AI updates: NVIDIA Enables the Next Era Of Physical AI Research With Agent Skills For Autonomous Vehicle…

Original summary: At CVPR, NVIDIA is unveiling new physical AI agent skills that help researchers and developers speed the development of autonomous vehicles, robots and vision AI systems. The core… Source: NVIDIA AI updates.

The Decoder · Official news

The Decoder: Build 2026: Microsoft tops Google in image generation while playing catch-up on reasoning

Original summary: At Build 2026, Microsoft announced seven new AI models developed in-house, including its first reasoning model. The company also introduced a new tuning method and an auto… Source: The Decoder.

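June 2, 2026 briefing

MIT Technology Review AI: Rehumanizing global health care with agentic AI

Original summary: The global health care sector is under increasing strain. Decades of chronic underinvestment and constraints in recruitment have coincided with a surge in demand for services for… Source: MIT Technology Review AI.

Qwen3.7-Plus: Alibaba's Tongyi Qianwen (Qwen) multimodal agent model launches on the Bailian platform

One-sentence takeaway: Alibaba's Qwen team has released Qwen3.7-Plus, a multimodal agent model with visual understanding, deep reasoning, tool calling, and autonomous iteration, now available on the Bailian platform. What the source states: the model not only understands images and video but also adds self-programming and tool-calling capabilities, marking a shift from a single-purpose language model to an all-round agent. Why it matters: combining multimodality with autonomous iteration means the model can call external tools, write code, and correct itself on its own initiative, which substantially widens the range of complex tasks AI can take on. Who is affected: enterprise developers on Alibaba Cloud's Bailian platform, AI application builders, and industry users who need visual understanding and automated reasoning. Next step to verify: log in to the Bailian platform, find Qwen3.7-Plus in the model list, upload an image or video, and test its visual question answering and tool calling.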

June 1, 2026 briefing

AWS Machine Learning updates: Transforming rare cancer research with Amazon Quick: Integrating biomedical databases for br…

Original summary: In this post, we walk through how to use Amazon Quick Research to integrate biomedical data sources for rare cancer research. The walkthrough uses pediatric sarcoma as the research… Source: AWS Machine Learning updates.

MarkTechPost · Official news

MarkTechPost: MiniMax Releases MiniMax M3 with MSA Architecture Supporting 1M-Token Context, Native Multim…

Original summary: MiniMax M3 introduces MiniMax Sparse Attention, a 1M-token context window, and native image, video, and computer use support. Source: MarkTechPost.

The Decoder · Official news

The Decoder: MiniMax M3: Open-weight model with a million-token context challenges proprietary leaders

Original summary: Chinese AI company MiniMax has released its new model M3. It's billed as the first open-weight model to combine top-tier coding performance, a one-million-token context wi… Source: The Decoder.

The Decoder · Official news

The Decoder: Nvidia bets big on physical AI at GTC Taipei with a new world model, driving brain, and open…

Original summary: Nvidia used GTC Taipei to launch a series of models for robots, autonomous vehicles, and video systems. The centerpieces are the new world model Cosmos 3, a significantly… Source: The Decoder.

AI News · Official news

AI News: AI in video game development: How artificial intelligence is reshaping the industry

Original summary: A Google Cloud survey found that 90% of developers are already integrating AI into their daily work, and on Steam, 7,818 titles disclosed AI use in 2025 alone, a 681% increase over… Source: AI News.

The Decoder · Official news

The Decoder: OpenAI starts with infrastructure robots but aims for "everyone having a personal robot doin…

Original summary: OpenAI is building a robotics team again, five years after shutting the division down. The team grew out of the world simulation research program. CEO Sam Altman's long-te… Source: The Decoder.

NVIDIA Developer updates: How to Post-Train Autonomous Vehicle Models in Closed-Loop with NVIDIA Alpamayo

Original summary: Developing autonomous vehicle (AV) policies requires bridging an important gap between training and deployment. Vision-language-action (VLA) models that can... Source: NVIDIA Developer updates.

May 30, 2026 briefing
The Decoder · Official news

The Decoder: Terence Tao argues AI could bring division of labor to math for the first time in history

Original summary: Mathematician Terence Tao describes how AI could reshape math research by enabling division of labor for the first time. Until now, researchers had to master every step th… Source: The Decoder.

May 29, 2026 briefing
MarkTechPost · Official news

MarkTechPost: StepFun Releases Step 3.7 Flash: A 198B MoE Vision-Language Model for Coding Agents and Sear…

Original summary: StepFun releases Step 3.7 Flash, a 198B MoE model with native vision, 256k context, and Advisor Mode. Source: MarkTechPost.

The Decoder · Official news

The Decoder: Google fixes several bugs in Gemini usage limits that burned through quotas too fast

Original summary: A bug in Google's Gemini app caused just one or two Omni videos to eat up the entire usage quota. Google has fixed the bug, Ultra members now get twice as many video gener… Source: The Decoder.

NVIDIA Developer updates: Run Step 3.7 Flash on NVIDIA GPUs with Enterprise-Ready Multimodal AI

Original summary: AI applications are moving beyond text generation to multimodal systems that can perceive, search, and reason across images, documents, video, and... Source: NVIDIA Developer updates.

May 28, 2026 briefing
The Decoder · Official news

The Decoder: Amazon builds its own AI production platform and greenlights three AI animated series for Pr…

Original summary: Amazon MGM Studios and AWS are launching a "GenAI Creators' Fund" that gives filmmakers money and access to the in-house AI platform "Project Nara." Three animated series… Source: The Decoder.

May 27, 2026 briefing
The Decoder · Official news

The Decoder: China turns its aging camera network into an AI-powered mass surveillance apparatus

Original summary: China's police are upgrading millions of old surveillance cameras with AI. Manufacturers like Hikvision and Huawei now ship cameras with built-in computer vision and langu… Source: The Decoder.

May 26, 2026 briefing

Awesome Gemini Omni Guide: a collection of Gemini Omni prompts, API guides, and video examples

One-sentence takeaway: Awesome Gemini Omni Guide is a curated resource collection of Gemini Omni prompts, API guides, and video examples, covering video generation, editing, camera control, style transfer, text rendering, and multimodal workflows. What the source states: the project was published on GitHub by EvoLinkAI, is based on Google's Gemini Omni model family (including Gemini Omni Flash), and provides a wide range of prompt engineering examples. Why it matters: Gemini Omni is a multimodal generation model, but the official documentation offers few examples; this collection gathers community best practices to help developers get started quickly with video generation and multimodal applications. Who is affected: Gemini API developers, AI video creators, and multimodal application researchers. Next step to verify or use: browse the prompt examples, pick a video-editing prompt, test it in the Gemini API, and observe the output.

MarkTechPost · Official news

MarkTechPost: Stability AI Releases Stable Audio 3: A Family of Fast Latent Diffusion Models for Audio Gen…

Original summary: Stability AI has released Stable Audio 3, a family of latent diffusion models for instrumental music and sound effects generation. The release includes open weights for the small a… Source: MarkTechPost.

MarkTechPost · Official news

MarkTechPost: Design a Complete Multimodal RLVR Pipeline with Open-MM-RL, Vision-Language Prompting, Rewar…

Original summary: In this tutorial, we explore the TuringEnterprises/Open-MM-RL dataset as a practical foundation for multimodal reasoning and reinforcement learning with verifiable rewards. We load… Source: MarkTechPost.

May 24, 2026 briefing
MarkTechPost · Official news

MarkTechPost: StepFun Releases StepAudio 2.5 Realtime: An End-to-End Voice Model with Roleplay-Specific RL…

Original summary: StepFun, the Shanghai-based AI lab, released StepAudio 2.5 Realtime in May 2026, an end-to-end real-time speech large language model with fully customizable persona capabilities. Source: MarkTechPost.

The Decoder · Official news

The Decoder: ByteDance study finds that asking LMMs questions beats making it transcribe text for long do…

Original summary: ByteDance Seed shows that a 7B model can answer questions on long, image-heavy documents more reliably than much larger models, even when documents are four times longer t… Source: The Decoder.

May 23, 2026 briefing
May 22, 2026 briefing
The Decoder · Official news

The Decoder: OpenAI launches a ChatGPT Powerpoint plugin and warns it might accidentally delete your cont…

Original summary: OpenAI brings ChatGPT directly into PowerPoint. A new beta plugin creates presentations from notes, documents, or images and edits existing slides. The add-in is available… Source: The Decoder.

NVIDIA Developer updates: Synthesize Realistic 3D Medical Images at Scale to Ship Pre‑Trained Models

Original summary: High‑quality 3D medical imaging data is the foundation of modern radiology AI, but access to it is often constrained by data scarcity, privacy restrictions,... Source: NVIDIA Developer updates.

May 21, 2026 briefing
MarkTechPost · Official news

MarkTechPost: Cohere Releases Command A+: A 218B Sparse MoE Model for Agentic Workflows That Runs on as Fe…

Original summary: Cohere releases Command A+, an open-source 218B Sparse Mixture-of-Experts model consolidating four prior Command A variants into one. It runs on as few as two H100 GPUs at W4A4 qua… Source: MarkTechPost.

MarkTechPost · Official news

MarkTechPost: One Model, Three Modalities: ByteDance Releases Lance for Image and Video Understanding, Gen…

Original summary: ByteDance's Intelligent Creation Lab has released Lance, an open-source native unified multimodal model that handles image and video understanding, generation, and editing, all wi… Source: MarkTechPost.

May 20, 2026 briefing

AWS Machine Learning updates: Multimodal evaluators: MLLM-as-a-judge for image-to-text tasks in Strands Evals

Original summary: If you’re building visual shopping, image or document understanding, or chart analysis, you need a way to verify whether your model’s response is actually grounded in the source im… Source: AWS Machine Learning updates.

AWS Machine Learning updates: Build real-time voice applications with Amazon SageMaker AI and vLLM

Original summary: Voice agents, live captioning, contact center analytics, and accessibility tools all depend on real-time speech-to-text, where your application streams audio in and receives transc… Source: AWS Machine Learning updates.

The Decoder · Official news

The Decoder: Stability AI launches Stable Audio 3.0 with up to six-minute tracks and open weights

Original summary: Stability AI has unveiled Stable Audio 3.0, a new generation of audio models - three of which ship with open weights. The models generate music tracks up to six minutes lo… Source: The Decoder.

The Decoder · Official news

The Decoder: Google pairs its Genie world model with Street View to create explorable AI worlds based on …

Original summary: Google Deepmind connects its Genie 3 world model to Street View imagery: users drop a pin on a map and get a walkable, AI-generated world based on a real place. Google's S… Source: The Decoder.

MarkTechPost · Official news

MarkTechPost: NVIDIA AI Releases Nemotron-Labs-Diffusion: A Tri-Mode Language Model with 6× Tokens Per For…

Original summary: NVIDIA researchers have released Nemotron-Labs-Diffusion, a language model family that unifies three decoding modes in one architecture. The model supports autoregressive (AR) deco… Source: MarkTechPost.

MarkTechPost · Official news

MarkTechPost: Alibaba Qwen Team Introduces Qwen3.5-LiveTranslate-Flash: Real-Time Multimodal Interpretatio…

Original summary: Alibaba's Qwen team has released Qwen3.5-LiveTranslate-Flash, a real-time multimodal translation model that processes audio and video simultaneously. The model covers 60 input lang… Source: MarkTechPost.

May 19, 2026 briefing
The Decoder · Official news

The Decoder: Google's I/O announcements: new models, a cloud agent that never sleeps, and a redesigned Ge…

Original summary: Google used its I/O developer conference to unveil a wave of new AI products. The highlights: a new model called Gemini 3.5 Flash, a multimodal model called Gemini Omni, a… Source: The Decoder.