Hugging Face
Coverage
OpenAI's Privacy Filter is a 1.5B-parameter model designed to detect and redact personally identifiable information (PII) from text and images. The release includes tools for document exploration, image anonymization, and text redaction, all built on the Gradio framework.
A technical guide explaining how to implement local AI features in a Chrome extension using Transformers.js and Manifest V3. It details the architecture for a background service worker, side panel UI, and content scripts to run models directly in the browser.
This article demonstrates a Vision-Language-Action (VLA) implementation using Gemma 4 on an NVIDIA Jetson Orin Nano Super. The setup integrates speech-to-text, the Gemma 4 model, and text-to-speech to create a system capable of autonomous decision-making based on visual and auditory context.
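The perception-to-action loop described above can be sketched as a simple composition of three stages. This is a minimal illustration of the architecture only; the callables are stand-ins for the real components (a speech-to-text model, Gemma 4, and a TTS engine), not the article's actual code.

```python
def vla_step(audio, camera_frame, stt, llm, tts):
    """One cycle of a speech + vision -> decision -> speech pipeline.

    `stt`, `llm`, and `tts` are placeholder callables for the real
    models described in the article (e.g. Whisper, Gemma 4, a TTS engine).
    """
    # 1. Transcribe the auditory context.
    heard = stt(audio)
    # 2. Combine visual and auditory context into one prompt
    #    so the VLM can make an autonomous decision.
    prompt = (f"Observation: {camera_frame}\n"
              f"User said: {heard}\n"
              f"Decide the next action.")
    action = llm(prompt)
    # 3. Speak the decision back to the user.
    tts(action)
    return action
```

On a Jetson-class device, each stage would run locally; the loop structure itself is what makes the system reactive to both modalities.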
NVIDIA has released Nemotron-Personas-Korea, a dataset of 6 million synthetic personas grounded in official South Korean demographic statistics. The dataset is designed to help developers build demographically accurate AI agents while remaining compliant with Korean privacy laws.
Hugging Face has introduced a new Skill and test harness to facilitate porting language models from transformers to mlx-lm. The initiative aims to support contributors and reviewers in an era where AI code agents are increasingly capable of submitting pull requests.
This article provides a technical guide on fine-tuning the Qwen3-VL-Embedding-2B model for Visual Document Retrieval (VDR). It demonstrates how domain-specific fine-tuning can significantly improve retrieval performance compared to general-purpose base models.
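Embedding fine-tuning of this kind typically optimizes a contrastive objective: pull each query toward its matching document and away from in-batch negatives. Below is a minimal pure-Python sketch of the standard InfoNCE loss, not the post's actual training code; all values are toy illustrations.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce_loss(query, positive, negatives, temperature=0.05):
    """InfoNCE: cross-entropy over similarities, where the matching
    document is the correct 'class' among positive + negatives."""
    sims = [cosine(query, positive)] + [cosine(query, n) for n in negatives]
    logits = [s / temperature for s in sims]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    return -math.log(exps[0] / sum(exps))

# A query embedded near its positive document yields a near-zero loss.
q, pos = [1.0, 0.0], [0.9, 0.1]
negs = [[0.0, 1.0], [-1.0, 0.0]]
loss = info_nce_loss(q, pos, negs)
```

Domain-specific fine-tuning works by reshaping the embedding space so that in-domain query/document pairs score high under exactly this kind of objective.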
Hugging Face has introduced VAKRA, a new executable benchmark designed to evaluate AI agents' ability to perform multi-step reasoning and tool use in enterprise environments. The benchmark tests compositional reasoning across thousands of APIs and diverse domains to identify specific failure modes in agentic workflows.
This article explains the functionality and implementation of multimodal embedding and reranker models using Sentence Transformers. It details how these models map different modalities like text, images, and audio into a shared space for tasks like cross-modal search and RAG.
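The embedder/reranker division of labor described above follows a common two-stage pattern: a fast similarity search in the shared space, then a slower, more accurate rescoring of the survivors. A minimal sketch, with hypothetical toy embeddings and reranker scores standing in for real model outputs:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def retrieve(query_emb, doc_embs, k=2):
    """Stage 1 (embedder): rank all documents by similarity to the
    query in the shared embedding space and keep the top k indices.
    Works identically whether the documents were text, images, or audio,
    because everything lives in the same space."""
    scored = sorted(enumerate(doc_embs),
                    key=lambda pair: dot(query_emb, pair[1]),
                    reverse=True)
    return [idx for idx, _ in scored[:k]]

def rerank(pair_scores, candidates):
    """Stage 2 (reranker): reorder the shortlist by a joint
    (query, document) relevance score. `pair_scores` stands in for
    the output of a real cross-encoder reranker."""
    return sorted(candidates, key=lambda idx: pair_scores[idx], reverse=True)
```

In a RAG system, stage 1 keeps latency low over large corpora while stage 2 fixes the ordering of the few candidates that matter.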
Hugging Face is transitioning the Safetensors project to the PyTorch Foundation to ensure broader community governance. Safetensors was developed to provide a secure, zero-copy alternative to the risky pickle-based formats used in model weight storage.
Hugging Face introduces Falcon Perception, a 0.6B-parameter early-fusion Transformer designed for open-vocabulary grounding and segmentation. The post also details the release of Falcon OCR, a high-throughput 0.3B-parameter model for document processing.
Hugging Face has introduced gradio.Server, a new way to build custom frontends using frameworks like React or Svelte while leveraging Gradio's backend infrastructure. This allows developers to create complex, highly interactive web applications that still benefit from Gradio's queuing, ZeroGPU, and hosting capabilities.
IBM has released Granite 4.0 3B Vision, a compact multimodal model designed for enterprise document processing. The model specializes in table extraction, chart understanding, and semantic key-value pair extraction using a modular LoRA adapter architecture.
Hugging Face has released TRL v1.0, a post-training library designed to handle the rapidly evolving landscape of AI model refinement. The library supports over 75 methods, including PPO and DPO, with a focus on stability and ease of use in a shifting research environment.
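Of the methods TRL supports, DPO is a good illustration of what "post-training" means mechanically: a loss computed from log-probabilities of a chosen and a rejected response under the policy and a frozen reference model. Here is the standard DPO objective for a single preference pair in pure Python — a sketch of the math, not TRL's implementation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    The implicit reward for a response is beta * (policy logprob minus
    reference logprob); the loss pushes the chosen response's reward
    above the rejected one's."""
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    return -math.log(sigmoid(beta * (chosen_margin - rejected_margin)))
```

At initialization (policy equals reference) the loss sits at log 2; it falls as the policy learns to prefer the chosen responses relative to the reference.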
Hugging Face provides instructions for migrating OpenClaw, Pi, and Open Code agents to open-source models. Users can choose between using Hugging Face Inference Providers for hosted access or running models locally for privacy and control.
Researchers have introduced EVA, a new end-to-end evaluation framework designed to assess both the accuracy and conversational experience of voice agents. The framework uses a bot-to-bot architecture to score how well agents handle multi-turn spoken interactions across different domains.
Hugging Face and NVIDIA have released a tutorial and pipeline for building domain-specific embedding models using synthetic data. The process allows users to significantly improve model performance on specialized datasets using a single GPU in less than a day.
Hugging Face reports significant growth in its open-source ecosystem, reaching 13 million users and over 2 million models. The report highlights a shift from passive consumption to active creation of fine-tuned models and datasets.
H Company has released Holotron-12B, a multimodal model optimized for high-throughput computer-use agents. The model utilizes a hybrid State-Space Model (SSM) and attention architecture to improve inference efficiency and performance in interactive environments.
Hugging Face has introduced Storage Buckets, a new mutable, S3-like object storage solution designed for ML artifacts. The system uses Xet technology to provide efficient, chunk-based deduplication for training checkpoints and large datasets.
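Chunk-based deduplication is easy to picture with a toy content-addressed store: files become lists of chunk hashes, and identical chunks across uploads are stored once. The sketch below uses fixed-size chunks for brevity (Xet derives chunk boundaries from content, which survives insertions better); it is an illustration of the idea, not the Xet protocol.

```python
import hashlib

class ChunkStore:
    """Toy content-addressed store: chunks are keyed by their hash,
    so identical chunks across uploads occupy storage exactly once."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size
        self.chunks = {}  # hash -> raw bytes, each stored once

    def put(self, blob):
        """Split a blob into chunks; return the file's chunk references."""
        refs = []
        for i in range(0, len(blob), self.chunk_size):
            chunk = blob[i:i + self.chunk_size]
            h = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(h, chunk)  # dedup happens here
            refs.append(h)
        return refs

    def get(self, refs):
        """Reassemble a file from its chunk references."""
        return b"".join(self.chunks[h] for h in refs)
```

This is why successive training checkpoints are cheap to store: consecutive checkpoints share most of their bytes, so only the changed chunks consume new space.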
The article analyzes 16 open-source reinforcement learning libraries to address the efficiency gap between model inference and training. It highlights the importance of disaggregating inference and training to prevent GPU idle time and compares various orchestration and weight synchronization methods.
This article explains Ulysses Sequence Parallelism, a technique for distributing attention computation across multiple GPUs to enable million-token context training. It details how the protocol is integrated into the Hugging Face ecosystem, including Accelerate and Transformers Trainer.
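The core move in Ulysses is a redistribution: each GPU starts with a slice of the sequence for all attention heads, and an all-to-all leaves it with the full sequence for a subset of heads, so attention (which mixes tokens within a head) needs no further communication. The simulation below sketches that exchange with plain lists; it assumes heads divide evenly across ranks and is not the distributed implementation.

```python
def ulysses_exchange(seq_shards, num_heads):
    """Simulate the Ulysses all-to-all among P ranks.

    Input:  seq_shards[p] is rank p's contiguous slice of the sequence,
            each token a dict mapping head index -> activation.
    Output: out[p] is the FULL sequence, restricted to rank p's heads,
            so each rank can run attention for its heads locally.
    """
    num_ranks = len(seq_shards)
    heads_per_rank = num_heads // num_ranks  # assumes even division
    out = []
    for rank in range(num_ranks):
        my_heads = range(rank * heads_per_rank, (rank + 1) * heads_per_rank)
        # Gather this rank's heads from every other rank's sequence shard.
        full_seq = [{h: token[h] for h in my_heads}
                    for shard in seq_shards
                    for token in shard]
        out.append(full_seq)
    return out
```

A symmetric all-to-all after attention restores the sequence-sharded layout, which is why the technique composes cleanly with the rest of a standard training loop.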
Hugging Face has released LeRobot v0.5.0, which introduces full support for the Unitree G1 humanoid robot and advanced whole-body control. The update also adds new policies like Pi0-FAST, streaming video encoding, and integration with NVIDIA IsaacLab-Arena.
Hugging Face has introduced Modular Diffusers, a new way to build more flexible and composable diffusion pipelines. This update allows developers to build custom workflows by manipulating individual blocks like text encoding and denoising, and integrates with the Mellon visual interface.
The article details a 24-hour training speedrun of a text-to-image diffusion model using a specific set of architectural optimizations. The authors demonstrate how to achieve high-quality results with a limited compute budget and are open-sourcing the training code and framework.
Hugging Face has announced that Georgi Gerganov and the team behind GGML and llama.cpp are joining the platform. This partnership aims to provide long-term sustainable resources for the development of local AI inference and model definition.
Hugging Face and Unsloth are offering free credits and Pro subscriptions to help users fine-tune small language models like LFM2.5-1.2B-Instruct. The process utilizes Unsloth to significantly reduce training time and VRAM usage, making efficient model training accessible for on-device deployment.
Hugging Face announced new capabilities for Gradio's gr.HTML component, allowing for custom templates, scoped CSS, and JavaScript interactivity. This enables developers and LLMs to generate complex, interactive web applications within a single Python file.
Researchers developed an agent skill that enables AI coding agents to write production-ready CUDA kernels for diffusers and transformers. The method packages domain-specific expertise into the agent to automate the complex task of hardware-level optimization and PyTorch integration.
Meta and Hugging Face have introduced OpenEnv, an open-source framework designed to evaluate AI agents in real-world environments rather than simulations. The framework uses a standardized API to test how agents handle complex tasks like temporal reasoning and multi-agent coordination using real tools like calendars and browsers.
Hugging Face has released Transformers.js v4, which introduces a new WebGPU runtime rewritten in C++. This update enables high-performance, local execution of AI models across browsers and server-side environments like Node, Bun, and Deno.
Hugging Face has introduced SyGra Studio, a tool designed to help users build, configure, and debug AI-driven workflows. The platform allows for seamless integration with various model endpoints and data sources while providing visual debugging and monitoring capabilities.
Hugging Face is introducing a decentralized evaluation system to address the gap between benchmark scores and real-world performance. The new system allows the community to submit results via pull requests and uses verified badges to ensure reproducibility and transparency.
H Company has released the Holo2-235B-A22B Preview, a new model specialized in UI localization and element detection. The model utilizes agentic localization to achieve state-of-the-art performance on the ScreenSpot-Pro and OSWorld benchmarks.
This article explores the evolution and future trajectory of China's open-source AI ecosystem following the DeepSeek R1 release. It examines how Chinese AI organizations are utilizing open-source models, papers, and infrastructure to drive large-scale global deployment.
Hugging Face shares insights from their experimental logbook on training efficient text-to-image models from scratch. The post details how various training techniques and architectural choices impact model convergence, speed, and representation learning.
Hugging Face has introduced Daggr, a tool designed to help developers build and visualize AI application workflows. It allows users to define workflows in Python while providing a visual canvas to inspect and rerun individual steps in a pipeline.
Hugging Face introduces a process for using high-end models like Claude to generate 'agent skills' for smaller, open-source models. The demonstration focuses on teaching smaller models how to write complex CUDA kernels to improve performance on specialized tasks.
LinkedIn researchers explore the use of the GPT-OSS model for agentic reinforcement learning training. The post details how to optimize models for multi-step workflows and tool-calling capabilities using the verl framework.
