Web Scraping Images and Renaming Multiple Files

Imran Bangash
Dec 5, 2021

SED commands | Image files | CLI | Removing/replacing spaces and characters

We often need to rename many files with a few simple commands rather than by writing a script. The Stream EDitor (sed) provides a powerful interface for this task. Before going into the details, we will scrape images with arbitrary file names from a fictional bookstore [ link ]. Important: to run the script, first create a folder named imagesFolder, or change the folder name in the script. The code is provided here.
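The author's script is only linked, not reproduced here. For readers who prefer to stay on the command line, here is a rough shell sketch of the same idea. It is not the author's code. The function names, the placeholder URL and the assumption that image paths are relative to the page URL are all illustrative. The sketch fetches one page with curl, extracts the src attribute of each <img> tag with grep and sed, and saves each image into imagesFolder under its source name.

```shell
# Hypothetical sketch (not the author's script): save every <img> on one page
# into imagesFolder, keeping the source file names. Requires curl.

# Print the src attribute of each <img> tag in the HTML read from stdin.
# A regex is only adequate for simple, regular pages such as a demo store.
extract_img_srcs() {
  grep -o '<img[^>]*src="[^"]*"' | sed -e 's/.*src="\([^"]*\)"/\1/'
}

# $1: page URL, assumed to end with "/" so relative src paths can be appended.
download_images() {
  mkdir -p imagesFolder
  curl -fsSL "$1" | extract_img_srcs | while read -r src; do
    # basename keeps only the source file name, e.g. 2cdad67c.jpg
    curl -fsSL -o "imagesFolder/$(basename "$src")" "$1$src"
  done
}

# usage: download_images "https://example.com/"
```

A regular expression is no substitute for a real HTML parser. It is only meant for simple, consistently formatted pages.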

The script above downloads the images with their source names. It is possible to rename the images while downloading, but for the sake of the experiment we keep the original names and rename them afterwards in a few steps.
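Renaming while downloading could look like the following sketch. This is an illustrative alternative, not the author's method. The hypothetical plan_numbered_names function reads image URLs, one per line, and pairs each one with a name such as book_name_1.jpg while keeping the original extension. The equally hypothetical download_numbered function then fetches each URL with curl under its new name. The sketch assumes every URL ends in a file extension.

```shell
# Hypothetical helper: read image URLs (one per line) on stdin and print
# "URL<TAB>BASE_N.EXT", numbering from 1 and keeping each URL's extension.
# $1 is the base name (default: book_name).
plan_numbered_names() {
  base=${1:-book_name}
  n=0
  while read -r src; do
    n=$((n + 1))
    ext=${src##*.}                 # text after the last dot, e.g. jpg
    printf '%s\t%s_%d.%s\n' "$src" "$base" "$n" "$ext"
  done
}

# Download every URL from stdin into imagesFolder under its numbered name.
download_numbered() {
  mkdir -p imagesFolder
  plan_numbered_names "$1" | while read -r src name; do
    curl -fsSL -o "imagesFolder/$name" "$src"
  done
}

# usage: printf '%s\n' https://example.com/a.jpg | download_numbered book_name
```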

Image from the author

Rename all images at once with Windows File Explorer: open the folder and select all images with Ctrl+A.

Image by the author

Now right-click one of the selected images, choose Rename, type book name (we intentionally include a space, which we will later remove with sed), and press Enter. Windows renames all selected images to book name followed by a number in parentheses, for example book name (1).jpg, book name (2).jpg, and so on.
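On Linux or macOS, where this Explorer feature is not available, a shell loop can produce similarly named files so you can practise the sed commands that follow. The windows_style_rename function is a hypothetical helper. It numbers files in the shell's alphabetical glob order, which may differ from the order Explorer uses. It also assumes that no existing file already has one of the target names.

```shell
# Hypothetical helper that mimics the Explorer batch rename:
# every *.jpg in the current directory becomes "BASE (N).jpg", N = 1, 2, ...
windows_style_rename() {
  base=${1:-book name}
  n=0
  for f in *.jpg; do
    [ -e "$f" ] || continue        # no .jpg files: the glob stays unexpanded
    n=$((n + 1))
    mv -- "$f" "$base ($n).jpg"
  done
}

# usage (inside imagesFolder): windows_style_rename "book name"
```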

Now for the magic: sed commands.

We want names that follow the convention book_name_num.jpg; for example, the first image should be named book_name_1.jpg.

Before going into the details, let's look at the basic structure of a sed substitution command:

sed -i 's/original/new/g' file.txt
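The one-liners below illustrate each part of that structure. None of them are specific to this article. The -i line assumes GNU sed; BSD/macOS sed expects an explicit (possibly empty) backup suffix, as in sed -i '' ....

```shell
# s/REGEX/REPLACEMENT/FLAGS substitutes text on each input line.
# Without the g flag only the first match on each line is replaced.
first_only=$(echo "book name (1).jpg" | sed 's/ /_/')
all=$(echo "book name (1).jpg" | sed 's/ /_/g')
echo "$first_only"   # book_name (1).jpg
echo "$all"          # book_name_(1).jpg

# -i edits a file in place instead of printing to stdout (GNU sed syntax).
tmp=$(mktemp)
printf 'original text, original file\n' > "$tmp"
sed -i 's/original/new/g' "$tmp"
cat "$tmp"           # new text, new file
```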

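Here s stands for substitute, original is a regular expression, new is its replacement, the g flag replaces every match on a line rather than only the first, and -i edits file.txt in place. A good explanation is given [ link ]. Using this structure, we remove the blank spaces and the special characters ( and ) from the names. First, we create a folder named output and then remove the spaces: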

mkdir -p output; for filename in book*; do cp "$filename" "output/$(echo "$filename" | sed -e "s/ //g")"; done
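Before copying anything, it can help to preview what a sed expression does to each name. The preview_renames function below is a hypothetical helper. It takes a sed script followed by file names and prints old -> new pairs without touching any files. It uses printf instead of echo because echo may treat some strings (such as -n or backslashes) specially.

```shell
# Hypothetical helper: print "old -> new" for each file name given after
# the sed script. Nothing is copied or renamed.
preview_renames() {
  script=$1
  shift
  for filename in "$@"; do
    printf '%s -> %s\n' "$filename" "$(printf '%s\n' "$filename" | sed -e "$script")"
  done
}

# usage: preview_renames 's/ //g' book*
```

For example, preview_renames 's/ //g' book* lists each book file next to its space-free name, so a mistake in the expression shows up before any file is written.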

The output looks like this:

Image from the author

Now remove the special characters ( and ). Each loop copies from the original files, so the space-removal expression is repeated to keep both changes. The bracket expression [()] matches either parenthesis:

for filename in book*; do cp "$filename" "output/$(echo "$filename" | sed -e "s/ //g" -e "s/[()]//g")"; done

Finally, replace e with E, repeating the earlier expressions so that all changes are kept:

for filename in book*; do cp "$filename" "output/$(echo "$filename" | sed -e "s/ //g" -e "s/[()]//g" -e "s/e/E/g")"; done

Image from the author
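The steps above do not yet produce the book_name_num.jpg convention stated at the start. The sketch below assumes every file is named like book name (N).jpg. Under that assumption, two expressions are enough. The first turns " (N)" into "_N" using a capture group. The second turns the remaining spaces into underscores. The names to_convention and copy_with_convention are hypothetical.

```shell
# Turn "book name (1).jpg" into "book_name_1.jpg" in one sed call:
#   1) " (N)" becomes "_N"; \( \) is a BRE capture group, bare ( ) are literal
#   2) every remaining space becomes "_"
to_convention() {
  printf '%s\n' "$1" | sed -e 's/ (\([0-9][0-9]*\))/_\1/' -e 's/ /_/g'
}

# Copy every book* file into output/ under its new name.
copy_with_convention() {
  mkdir -p output
  for filename in book*; do
    [ -e "$filename" ] || continue   # skip if the glob matched nothing
    cp -- "$filename" "output/$(to_convention "$filename")"
  done
}

# usage: copy_with_convention
```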

The following commands should be correct in principle but have not been tested, so they may need syntax corrections.

Removing a dot that is followed by a digit (.5 becomes 5)

for filename in DJI*; do cp "$filename" "output/$(echo "$filename" | sed -e 's/\.\([0-9]\)/\1/g')"; done

Replacing - with _

for filename in DJI*; do cp "$filename" "output/$(echo "$filename" | sed -e 's/-/_/g')"; done

Removing blank spaces

for filename in DJI*; do cp "$filename" "output/$(echo "$filename" | sed -e "s/ //g")"; done

Removing a blank space followed by (1)

for filename in DJI*; do cp "$filename" "output/$(echo "$filename" | sed -e "s/ (1)//g")"; done

Removing the first underscore

for filename in container*; do cp "$filename" "output/$(echo "$filename" | sed -e 's/_//1')"; done

Replacing rot- with rot_ inside the XML files (sed -i edits the contents of the files in place; it does not rename them)

sed -i 's/rot-/rot_/g' *.xml
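Keeping a backup is a cheap safety net when editing files in place. The variant below runs on a hypothetical sample.xml in a temporary folder. Before applying the same rot- to rot_ substitution, it saves the original content as sample.xml.bak. Both GNU and BSD/macOS sed accept an attached suffix such as -i.bak.

```shell
# Create a hypothetical sample file in a temporary folder.
dir=$(mktemp -d)
printf '<name>rot-1</name>\n' > "$dir/sample.xml"

# Edit in place, keeping the original as sample.xml.bak.
sed -i.bak 's/rot-/rot_/g' "$dir"/*.xml

cat "$dir/sample.xml"       # <name>rot_1</name>
cat "$dir/sample.xml.bak"   # <name>rot-1</name>
```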

Replacing a space followed by ( with _

for filename in page*; do cp "$filename" "output/$(echo "$filename" | sed -e "s/ (/_/g")"; done

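Removing ) as well (each loop copies from the original files, so the previous expression is repeated)

for filename in page*; do cp "$filename" "output/$(echo "$filename" | sed -e "s/ (/_/g" -e "s/)//g")"; done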
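All of the renaming loops above share one pattern: copy each file into output/ under the name that sed produces. The hypothetical sed_copy helper below captures that pattern once. Several expressions can be combined in one script, separated by ;, and sed applies them from left to right.

```shell
# Hypothetical helper: copy each FILE into output/ under the name produced
# by applying SCRIPT with sed.  Usage: sed_copy SCRIPT FILE...
sed_copy() {
  script=$1
  shift
  mkdir -p output
  for filename in "$@"; do
    [ -e "$filename" ] || continue            # skip an unmatched glob
    new=$(printf '%s\n' "$filename" | sed -e "$script")
    cp -- "$filename" "output/$new"
  done
}

# usage: sed_copy 's/-/_/g' DJI*
```

For example, sed_copy 's/ (/_/g; s/)//g' page* performs the last two steps in one pass, turning page (1).png into output/page_1.png.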

Imran is a computer vision and AI enthusiast with a PhD in computer vision. Imran loves to share his experience with self-improvement and technology.