How do you fix a problem that spans more than two thousand files without spending days in a word processor? CLI to the rescue.

Having a JAM¹ stack, with a static site generator CMS² under VC³ in Git and automated CI/CD⁴ through a global CDN⁵, is great for many reasons (speed, security, scalability…). But what happens when a new version decides to change which characters are accepted in the build and which ones are deemed invalid? Then you have a problem that, in my case, spanned more than two thousand files.

I’m not nearly as much of a CLI⁶ master as my friends Alvaro or Santiago, but I love learning new tricks and get a kick out of the power of the terminal.

In this case, I had to locate a number of “offending characters” (leftovers from a previous blog engine migration, which handled text encoding differently) in thousands of files, and then do a “replace with” in all of them at once.

What did I do? Thanks to tutorials by CLI Magic, Winaero, Linuxize, and Maketecheasier, I put together the following.

First, I identified which files were causing trouble. After all, if there were only one or two, I could fix them by hand. So I ran this in the directory containing all my blog posts:

grep -iRl "&#"

Here are the options:

  * -i: ignore case
  * -R: search files in subdirectories recursively
  * -l: show only the names of matching files, not the matching lines

I searched for “&#” because every offending encoded sequence included that partial string.
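Before touching anything, it can help to know exactly how many files are affected and which encoded sequences they contain. The sketch below wraps two grep pipelines in hypothetical helper functions (count_affected_files and list_references are my own names, not standard commands). It assumes GNU grep, and it assumes the offending strings are HTML numeric character references such as &#8230; or &#x2026;, so adjust the pattern if yours look different. It drops -i, which makes no difference for “&#” since that string contains no letters, and skips the .git directory so Git's internal files stay out of the results.

```shell
# Hypothetical helpers; assume GNU grep and skip Git's internal directory.

# Count the files (under $1, default: current directory) that contain "&#".
count_affected_files() {
  grep -Rl --exclude-dir=.git -e '&#' "${1:-.}" | wc -l | tr -d ' '
}

# List each distinct numeric character reference (e.g. &#8230; or &#x2026;)
# with the number of times it appears, most frequent first.
# Assumption: the offending strings follow the &#NNNN; / &#xHHHH; shape.
list_references() {
  grep -RhoE --exclude-dir=.git -e '&#[xX]?[0-9a-fA-F]+;' "${1:-.}" |
    sort | uniq -c | sort -rn
}
```

Run from the posts directory, count_affected_files gives the number of files to fix, and list_references shows which sequences each need a replacement rule.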

Once I realized we were talking about more than two thousand files, I decided to use the CLI to replace the offending strings. I won’t list them all, so as not to give away clues or vulnerabilities, but the general form of the command I used was:

find . -type f -exec sed -i 's/…/.../g' {} +
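A slightly more cautious variant of the same replacement is sketched below. Instead of running sed on every file that find returns, it only rewrites files that grep reports as containing the pattern, and it skips .git. That matters if the command is run from the root of a Git repository, since editing Git's internal files in place could damage the repository. replace_in_tree is a hypothetical helper name; the sketch assumes GNU grep, xargs and sed (on macOS, BSD sed expects sed -i '' instead), and search and replacement strings without slashes, regular-expression special characters or “&”.

```shell
# Hypothetical helper: replace a string in every file under a directory that
# contains it, leaving Git's internal files alone.
#   $1 = text to find (used as a basic regular expression by grep and sed)
#   $2 = replacement text
#   $3 = directory to search (default: current directory)
# Assumes GNU grep, xargs and sed, and that neither string contains "/".
replace_in_tree() {
  # -l lists matching files and -Z ends each name with a NUL byte, so names
  # with spaces survive; xargs -0 reads them back and -r skips sed if none match.
  grep -RlZ --exclude-dir=.git -e "$1" "${3:-.}" |
    xargs -0 -r sed -i "s/$1/$2/g"
}
```

For example, replace_in_tree '…' '...' would rewrite only the posts that actually contain the ellipsis character, leaving every other file's modification time untouched.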

And, like magic, in the blink of an eye, every offending occurrence of “…” was turned into “...”.
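Since everything lives in Git, the result is easy to double-check. The hypothetical check_leftovers helper below reruns the search, prints how many files still contain a given string, and returns success only when none do; reviewing the output of git diff --stat before committing then shows which files actually changed. Same assumptions as before: GNU grep, with .git excluded from the search.

```shell
# Hypothetical helper: report how many files under a directory still contain
# a string ($1), searching $2 (default: current directory), skipping .git.
# Returns success (0) only when no file contains it.
check_leftovers() {
  local count
  count=$(grep -Rl --exclude-dir=.git -e "$1" "${2:-.}" | wc -l | tr -d ' ')
  echo "$count file(s) still contain '$1'"
  [ "$count" -eq 0 ]
}
```

Because it returns a proper exit status, check_leftovers '&#' could also run as a guard step in a CI pipeline before the build.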

That, in essence, is what lies at the heart of modern software and data transformation. Unix, still brilliant after more than half a century.

  1. JavaScript, APIs, and Markup

  2. Content Management System

  3. Version Control

  4. Continuous Integration / Continuous Delivery

  5. Content Delivery Network

  6. Command Line Interface