Millions of police records are accidentally deleted — what are the lessons?
In March this year, eight million police files were deleted in Dallas, Texas. The files contained photos, videos, audio and case notes for criminal cases opened before July 2020.
So how does something like this happen? And what does it mean for the 175,000 Dallas County criminal court cases that have been impacted?
These are the questions officials have been trying to answer since the incident occurred. On September 30, the City of Dallas released its report on the incident. The report identified the following:
- In March 2021, 22 terabytes (TB) of police files were deleted by an IT staff member
- 14.49 TB were recovered, but the remaining 7.51 TB were unrecoverable
- A subsequent audit found that another 13 TB (or 14.6 million files) had been deleted previously from another server and could not be restored.
The report found the IT team was attempting to move content from its cloud servers back to on-premises servers because of cost constraints. The files contained many images and videos, so they took up a lot of space in the cloud, which gets expensive. The City of Dallas had initially estimated that its Azure storage would cost about $60,000 a year, but the bill ended up being $1.8 million, for a service that was handling only about 7% of the City's IT processes.
The data in scope for the migration back to the on-premises network was considered ‘archival’, not operational. However, the police have since identified more than 1,000 ‘priority’ criminal case files whose data they are now scrambling to find alternate copies of. So the data clearly did have continuing value.
But the business case in this scenario was driven entirely by cost. The published report found the leadership team did not pause to consider the risks or review data migration best practices. They also had no recovery plan in case something went wrong.
And something did go wrong.
The IT team did not follow the vendor's instructions or standard operating procedures for migrating data and ignored multiple warnings in the interface that data would be lost. The report found that:
“...the majority of the loss has seemingly affected the Family Violence Unit. This data consisted of information gathered by DPD detectives for prosecutable, adjudicated, on-going cases; or general evidence gathered.”
The City of Dallas is now searching every system it can for any old copies of parts of the data and re-uploading it into that expensive Azure cloud.
What are the takeaways?
There are three main takeaways from this incident that apply to anyone with responsibility for managing data:
- Archives are not records management systems
- Migration is not a copy/paste exercise; it's always a complex project
- Organisations need the ability to discover their data across their whole environment.
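To make the second takeaway concrete, here is a minimal sketch of a "verify before you delete" migration step. It is not the procedure described in the Dallas report or any vendor's tooling. The function names `sha256_of` and `migrate_file` are hypothetical, and it assumes files small enough to copy and hash on a single machine. The point is only that the source copy is removed after the destination copy has been confirmed identical, never before.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_file(source: Path, destination: Path) -> bool:
    """Copy source to destination and delete source only if checksums match.

    Hypothetical helper for illustration; returns True when the move
    was verified and the source was removed, False otherwise.
    """
    destination.parent.mkdir(parents=True, exist_ok=True)
    expected = sha256_of(source)
    shutil.copy2(source, destination)  # copy contents and metadata
    if sha256_of(destination) != expected:
        # Leave the source untouched so a bad copy never causes data loss.
        return False
    source.unlink()
    return True
```

A real migration would also log every file moved, keep a backup until the whole batch has been signed off, and run in a test environment first. The sketch covers only the ordering of copy, verify and delete.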
The City of Dallas has engaged contractors to help them search across the network for any possible copies of any of the important records. To do this, the contractors are crafting searches in Office 365, although it's taking an hour to create each search query string.
The good news is there will almost certainly be duplicates of much of this data, all over the network and on people's devices. The bad news is manual searching won't find it all, or find it affordably, and this whole problem started because cost was considered a constraint.
With AI technology, it's possible to search whole networks in seconds. It's simple to create taxonomies of terms and run them across every system and any data format. In the example above, this means the City of Dallas could know within 24 hours which parts of the lost files had been recovered and which were irretrievably lost.
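As a rough illustration of what running a taxonomy of terms over a set of files means, here is a deliberately simple keyword-matching sketch. It is a stand-in, not the AI tooling described above: the `TAXONOMY` categories and terms are invented for the example, the function names are hypothetical, and it only reads plain `.txt` files under one directory.

```python
from pathlib import Path

# Hypothetical taxonomy: each category maps to terms that signal it.
TAXONOMY = {
    "family_violence": ["family violence", "protective order"],
    "evidence": ["case notes", "evidence"],
}


def classify_text(text, taxonomy):
    """Return the set of taxonomy categories whose terms appear in text."""
    lowered = text.lower()
    return {
        category
        for category, terms in taxonomy.items()
        if any(term in lowered for term in terms)
    }


def scan_directory(root, taxonomy):
    """Map each matching .txt file under root to its matched categories."""
    hits = {}
    for path in sorted(Path(root).rglob("*.txt")):
        categories = classify_text(path.read_text(errors="ignore"), taxonomy)
        if categories:
            hits[str(path)] = categories
    return hits
```

Calling `scan_directory("/some/share", TAXONOMY)` (the path is a placeholder) returns only the files that matched at least one category. Production tools go well beyond this, handling images, audio, email stores and fuzzy matches, but the underlying idea of applying one shared set of terms everywhere is the same.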
Rapid data retrieval and risk assessment are illustrated by a recent case in Australia involving an organisation that experienced a data breach. In just 24 hours, AI compliance software was deployed on the organisation's network, run across all the compromised data, and used to find everything with material risk (including IP addresses, passwords, PII, credit card and passport numbers, and information related to topics for which there are secrecy provisions under law).
Importantly, the organisation's Records Authority was coded and applied over the following 24 hours. This enabled them to see what spilled data should already have been discarded (which helps determine their scope of liability).
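The idea of coding a retention schedule and applying it to spilled data can be sketched in a few lines. This is an illustrative assumption, not the organisation's actual Records Authority: the record classes and retention periods in `RETENTION_YEARS` are invented, and `is_overdue_for_disposal` is a hypothetical helper.

```python
from datetime import date

# Hypothetical retention periods in years, keyed by record class.
RETENTION_YEARS = {
    "invoice": 7,
    "job_application": 2,
}


def is_overdue_for_disposal(record_class, created, today):
    """Return True if a record has passed its retention period.

    Unknown record classes return False so they are kept for manual review.
    """
    years = RETENTION_YEARS.get(record_class)
    if years is None:
        return False
    try:
        disposal_date = created.replace(year=created.year + years)
    except ValueError:
        # Created on 29 February and the target year is not a leap year.
        disposal_date = date(created.year + years, 3, 1)
    return today >= disposal_date
```

Run over the spilled records, a check like this separates data the organisation was still entitled to hold from data that should already have been destroyed.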
These kinds of data confidentiality, integrity and availability breaches happen every day in organisations all over the world. They're not always avoidable, but they can be mitigated, and the level of risk can be assessed rapidly if the right tools are applied.
As the City of Dallas learned the hard way, you need to know what you have and where it is, and protect it, all the time.