PowerShell - Find All File and Folder Names with Non-ASCII Characters

Even in a UTF-8 world, file and folder names containing non-ASCII characters can still trip processes up.

I was digging through my MP3 collection, around 420,000 tracks ripped from a spare room filled to the brim with CDs, and found that auto-naming had given some files and folders odd characters that made them hard to rename and locate. Names like Händel are fine; I expect those. But take something like [ \The Mission\Godʼs Own Medicine ]: it doesn't look wrong, yet the apostrophe is not the standard ASCII one (U+0027) but a look-alike from another part of the wonderful Unicode selection.

So how do you track them down quickly? PowerShell is always my go-to scripting language on Windows. It is simply the Swiss Army knife of Windows admin tools, or Schweizermesser as they say in German-speaking countries.

Scan down the folder structure recursively and pull out the names containing characters that fall outside the original 7-bit ASCII set (U+0000 to U+007F).

Long version for scripting:

# Walk every file and folder below the root
Get-ChildItem -Path "M:\GWJ\MUSIC\" -Recurse | ForEach-Object {
    $currFile = $_

    # Match any character outside the 7-bit ASCII range
    if ( $currFile.Name -cmatch "[^\u0000-\u007F]" ) {
        Write-Host "FOUND:[$($currFile.FullName)]"
    }
}
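
Once a name is flagged, it helps to see exactly which character is the culprit, since look-alikes such as that apostrophe are hard to spot by eye. The following is a minimal sketch, not part of the original script: `Get-NonAsciiChar` is a hypothetical helper name, and the sample assumes the apostrophe in the example above is U+02BC.

```powershell
# Hypothetical helper: lists each non-ASCII character in a string with its code point.
# Note: it walks UTF-16 code units, so a character outside the Basic Multilingual
# Plane (e.g. an emoji) is reported as two surrogate halves.
function Get-NonAsciiChar {
    param([Parameter(Mandatory)][string]$Name)

    foreach ($ch in $Name.ToCharArray()) {
        if ([int]$ch -gt 0x7F) {
            [pscustomobject]@{
                Char      = $ch
                CodePoint = 'U+{0:X4}' -f [int]$ch
            }
        }
    }
}

# Build the sample from a code point so the script file's own encoding does not matter.
# Assumption: the look-alike apostrophe is U+02BC.
$sample = 'God' + [char]0x02BC + 's Own Medicine'
Get-NonAsciiChar -Name $sample
```

Inside the loop above you could pass `$currFile.Name` to the helper instead of the sample string.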

This can be shortened to a single command if you need a one-liner:

Get-ChildItem -Path "M:\GWJ\MUSIC\" -Recurse | Where-Object { $_.Name -cmatch "[^\u0000-\u007F]" }

Alternatively, this pattern also flags control characters below the space character (U+0020):

Get-ChildItem -Path "M:\GWJ\MUSIC\" -Recurse | Where-Object { $_.Name -cmatch "[^\x20-\x7F]" }
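
To see how the two patterns differ without touching the disk, you can run them against a few sample strings. This is only an illustrative sketch; the variable names and sample values are my own, and the strings are built from code points so the script's encoding does not matter.

```powershell
$asciiOnly     = '[^\u0000-\u007F]'   # flags anything outside 7-bit ASCII
$printableOnly = '[^\x20-\x7F]'       # also flags control characters below U+0020

# Illustrative samples (not real file names from the collection)
$samples = [ordered]@{
    'plain ASCII'  = 'Gods Own Medicine'
    'umlaut'       = 'H' + [char]0x00E4 + 'ndel'
    'embedded tab' = 'Track' + [char]0x0009 + '01'
}

foreach ($label in $samples.Keys) {
    [pscustomobject]@{
        Sample        = $label
        AsciiOnly     = $samples[$label] -cmatch $asciiOnly
        PrintableOnly = $samples[$label] -cmatch $printableOnly
    }
}
```

The plain name matches neither pattern, the umlaut matches both, and the embedded tab is caught only by the second. Windows itself does not normally allow control characters in file names, so in practice the two commands will usually return the same list there.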