Milestone unveils Hafnia traffic VLM & XProtect plug-in
Milestone Systems has launched a vision language model focused on traffic video, alongside a video summarisation plug-in for its XProtect software and an API service for third-party developers.
The company said the model specialises in traffic understanding and uses NVIDIA Cosmos Reason. Milestone said it has fine-tuned the model on video data from Europe and the US.
The release covers two routes to market. One product sits inside the XProtect Smart Client as a plug-in called Video Summarization. The other product offers access to the same model through an API under the name Hafnia VLM as a Service.
Operator workflows
Milestone positioned the Video Summarization plug-in as a response to the volume of footage produced by modern camera systems and the time required for manual review. The company said early reports indicate video summarisation could reduce operators' false-alarm fatigue by up to 30%.
The plug-in analyses camera footage and generates a text description of what it shows. Users submit a short video snippet and a written prompt. The model then returns a summary.
Milestone said the tool converts video segments into structured text summaries inside XProtect Smart Client. It also allows users to search summaries based on content rather than timestamps or manual tagging.
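As an illustration of content-based search, the sketch below filters a list of text summaries by keyword instead of by timestamp. It is a minimal sketch and does not represent Milestone's implementation: the `Summary` fields, the camera labels and the sample texts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    camera: str  # hypothetical camera label
    start: str   # clip start time, ISO 8601
    text: str    # model-generated description of the clip


def search_summaries(summaries, query):
    """Return summaries whose text contains every query term.

    Matching is case-insensitive substring matching, so "bus" would
    also match "busy"; a real search index would tokenise the text.
    """
    terms = query.lower().split()
    return [s for s in summaries if all(t in s.text.lower() for t in terms)]


# Hypothetical sample data, for illustration only.
summaries = [
    Summary("cam-01", "2025-01-01T08:00:00",
            "A truck stops in the bus lane and blocks traffic."),
    Summary("cam-02", "2025-01-01T08:05:00",
            "Light traffic, no incidents."),
    Summary("cam-03", "2025-01-01T08:10:00",
            "A cyclist crosses against a red light."),
]

hits = search_summaries(summaries, "truck bus lane")
print([s.camera for s in hits])
```

The query is matched against what the summary says, so the operator does not need to know when the event happened or how the clip was tagged.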
The company said users can bookmark and filter summaries during review. It also said the plug-in works with existing XProtect event and rule logic, so alarms or alerts can trigger automated summaries.
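The idea of event-triggered summaries can be sketched as a small dispatcher that maps event types to prompts. This is an assumption-laden illustration, not XProtect's rule engine: the event dictionary shape, the `RULES` table and the `summarise_clip` placeholder are hypothetical.

```python
# Hypothetical mapping from event type to the prompt sent with the clip.
RULES = {
    "alarm": "Describe what triggered the alarm.",
    "alert": "Summarise the activity around the alert.",
}


def summarise_clip(clip_id, prompt):
    """Placeholder for the model call; returns a stand-in string."""
    return f"[summary of {clip_id} for prompt: {prompt}]"


def on_event(event, summarise=summarise_clip):
    """Request a summary only for event types that have a rule."""
    prompt = RULES.get(event["type"])
    if prompt is None:
        return None  # no rule for this event type, so no summary
    return summarise(event["clip_id"], prompt)


print(on_event({"type": "alarm", "clip_id": "clip-42"}))
print(on_event({"type": "motion", "clip_id": "clip-43"}))
```

Passing the summariser as a parameter keeps the rule logic testable without calling a model.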
Milestone said the product can filter out irrelevant motion or noise and focus attention on valid events. It also said it will offer "sovereign" models per region, starting with the US and EU, with more regions planned.
Milestone said Video Summarization is free to download and installs in a few minutes inside the XProtect Smart Client. Customers pay when they prompt the model.
Developer access
For developers and partners, Milestone has introduced Hafnia VLM as a Service. The company said the service provides API access over HTTPS and targets developers, integrators and partners that want to add video intelligence to applications.
Milestone said the API service targets teams that do not want to set up and manage their own AI systems. The company said the service can work with existing solutions, regardless of the analytics already in place.
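The article does not document the API itself, so the following is only a sketch of what a prompt-driven request over HTTPS might look like. The endpoint URL, the JSON field names (`video_url`, `prompt`, `region`) and the bearer-token header are all assumptions made for illustration. The code builds the request but does not send it.

```python
import json
import urllib.request

# Hypothetical endpoint; the real service URL is not given in the article.
API_URL = "https://api.example.com/v1/summaries"


def build_request(video_url, prompt, region, api_key):
    """Build (but do not send) an HTTPS POST carrying a clip and a prompt."""
    body = json.dumps(
        {"video_url": video_url, "prompt": prompt, "region": region}
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


req = build_request(
    "https://example.com/clip.mp4",
    "Summarise any traffic incidents.",
    "eu",
    "YOUR_API_KEY",
)
print(req.get_method(), req.full_url)
```

With a real endpoint and key, the request could be sent with `urllib.request.urlopen(req)`. Under a pay-per-use model, each such call would count towards the bill.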
The company described the model as prompt-driven for traffic-related operations. It also said it provides fine-tuned versions for the US and EU, with more regions planned.
Milestone said Hafnia VLM as a Service (VLMaaS) uses a pay-per-use pricing model based on API calls and does not require large upfront investments or custom training costs.
Milestone also made a quantitative claim about development effort. It said work on AI and analytics could require "up to 70 times less effort" compared with fine-tuning a vision language model to reach similar results.
Data and compliance
Milestone said the two products use its Hafnia VLM, which it has fine-tuned on 75,000 hours of "responsibly sourced" real-world video data from either Europe or the US. It said it used NVIDIA Cosmos Curator for data preparation.
The company said the system can run on cloud infrastructure or regional data centres. It also said the fine-tuning data has auditable lineage and aligns with GDPR and the EU AI Act.
Milestone named traffic management as a near-term use case. It cited customers in Genoa, Italy, and Dubuque, Iowa, as examples of XProtect users that plan to use the new products for traffic operations.
Milestone framed the launch around reducing time spent on manual review in control rooms and expanding integration options for developers that already work with video systems.
"With the Vision Language Model as a Service and Video Summarization for XProtect, we're tackling some of the most challenging bottlenecks: video overload and time-consuming manual work. Operators get immediate insight directly within XProtect; builders get API‐first access to production‐ready intelligence without bespoke training or heavy infrastructure. Because this model is specialized for real-world traffic video and fine-tuned on responsibly sourced data, customers can trust the results, deploy with confidence, and enhance all existing solutions in place. It's the fastest, most advanced and impactful path to turning video into actionable outcomes," said Andrew Burnett, Interim Chief Technology Officer, Milestone Systems.
Milestone said it will expand regional availability beyond the US and EU for both the plug-in and the API service.