Deleting a lot of data
05 Jan 2022
When supporting Xsan deployments that hold a lot of data, we occasionally need to remove a lot of it. With the year rolled over, we have created new folders for the 2022 projects, and it is time to remove some old data from the production volumes. After confirming that the tape archive and nearline volumes have the data, we are clearing out some of the previous year's projects to free up space on production. Our first instinct is to rm -r the folders, but in some cases using the Trash may make sense.
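As a rough illustration of the "confirm the copy first" step, here is a minimal sketch that lists files present under a production folder but missing from a copy. It assumes the copy (for example a nearline volume) is mounted as an ordinary path; the function name missing_from_copy and the paths in the usage note are hypothetical. It compares relative paths only, not sizes or checksums, and it will misreport file names that contain newlines.

```shell
# Hypothetical helper: print files that exist under $1 (production)
# but have no file at the same relative path under $2 (the copy).
# Compares names only; it does not verify sizes or contents.
missing_from_copy() {
    src=$1
    dst=$2
    # Refuse to run if either side is not a directory, so a mistyped
    # source path cannot produce a false "nothing missing" result.
    [ -d "$src" ] && [ -d "$dst" ] || { echo "both arguments must be directories" >&2; return 2; }
    src_list=$(mktemp) || return 1
    dst_list=$(mktemp) || return 1
    # Build sorted lists of relative file paths for each side.
    (cd "$src" && find . -type f | LC_ALL=C sort) > "$src_list"
    (cd "$dst" && find . -type f | LC_ALL=C sort) > "$dst_list"
    # comm -23 keeps only the lines unique to the first list.
    LC_ALL=C comm -23 "$src_list" "$dst_list"
    rm -f "$src_list" "$dst_list"
}
```

Something like missing_from_copy /Volumes/PRODUCTION/2021 /Volumes/NEARLINE/2021 (placeholder paths) should print nothing before we go ahead and delete.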
When we run rm -r on a large folder, the process first traverses the entire directory and checks that the account running the command has permission to remove the files. When you empty the Trash in the Finder, the Mac calls unlink on the files, which doesn’t check the user’s access and just starts trying to remove the files immediately. Some details are available in this Stack Exchange post.
In some recent deletions we saw the Finder removing more than 300 files/sec, versus a much lower rate for rm -rv. That said, once the Finder reached around 2,500,000 deleted files, some things started acting strangely. We were cleaning up from a SAN-connected Mac, not an MDC (metadata controller), yet we got some nonsensical notifications from the MDC that volumes were running out of space. For example: “The volume SANVOLUME has only 716.66 TB (50.89%) of space available.” Similar messages appeared for the Preboot, Recovery, and VM volumes. These volumes were all fine. So in the future we might still use the Finder and Empty Trash, but in smaller groups. Or we can just be patient with rm -r.
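If we go the patient rm -r route, the "smaller groups" idea can be borrowed there too. The sketch below is one possible approach, not something we have tested against the MDC notifications: it removes a folder's contents one top-level entry at a time and prints an item count and elapsed time per batch, which gives a rough files/sec figure to compare against the Finder. The function name remove_in_batches is hypothetical.

```shell
# Hypothetical helper: remove the contents of $1 one top-level entry
# at a time, reporting item count and elapsed seconds for each batch.
# The * glob skips hidden (dot) entries, and $1 itself is left in place.
remove_in_batches() {
    parent=$1
    [ -d "$parent" ] || { echo "not a directory: $parent" >&2; return 2; }
    for entry in "$parent"/*; do
        # Skip the literal pattern when the folder has no visible entries.
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        # Count everything in this batch (files and folders) before removal.
        count=$(find "$entry" | wc -l | tr -d ' ')
        start=$(date +%s)
        rm -r -- "$entry" || return 1
        elapsed=$(( $(date +%s) - start ))
        echo "removed $entry: $count items in ${elapsed}s"
    done
}
```

Running remove_in_batches on a project folder prints one line per subfolder, so a stalled or unusually slow batch is easy to spot before moving on to the next one.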