Web Scraping Images and Renaming Multiple Files
--
sed commands | Image files | CLI | Removing/replacing spaces and characters
Often we need to rename multiple files with simple commands, without writing a script. sed (the Stream EDitor) provides a powerful interface for this task. Before going into details, we will scrape images with random names from a fictional bookstore [ link ]. Important: to run the script, first create a folder named imagesFolder, or change the folder name in the script. The code is provided here
The above script will download the images under their source names. It is possible to rename the images while downloading, but for the sake of the experiment we keep the original names and then rename the images in a few steps.
Rename all images in Windows: select every image in the folder with Ctrl+A, then right-click one image and choose Rename. Type book name as the new name (we intentionally include a space, which we will later remove with sed) and press Enter. This renames all images to book name followed by a number in parentheses, for example book name (1).jpg.
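If no Windows machine is at hand, a comparable set of files can be faked from the shell to try the commands on. This is only a sketch: it assumes the Windows pattern is book name (N).jpg, creates empty files rather than real images, and uses a throwaway directory whose variable name demo_dir is made up for this example.

```shell
# Throwaway directory so nothing in the current folder is touched
demo_dir=$(mktemp -d)

# Create three empty stand-in files named like the Windows bulk rename
for i in 1 2 3; do
  : > "$demo_dir/book name ($i).jpg"
done

ls "$demo_dir"
```

Running cd "$demo_dir" afterwards puts you in a folder where the renaming loops can be tried safely.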
Now for the magic of sed commands.
We want the naming convention book_name_num.jpg, so the first file should end up as book_name_1.jpg.
Before going into details, let us look at the basic structure of a sed command:
sed -i 's/original/new/g' file.txt
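To see what each part of that structure does, the substitution can be run on a piece of text instead of a file. The sample sentence below is invented for illustration; the only difference between the two commands is the trailing g flag.

```shell
# Sample text, invented for this illustration
sample="original name, original file"

# Without g: only the first match on each line is replaced
first_only=$(printf '%s\n' "$sample" | sed 's/original/new/')

# With g: every match on the line is replaced
all_matches=$(printf '%s\n' "$sample" | sed 's/original/new/g')

printf '%s\n' "$first_only"    # new name, original file
printf '%s\n' "$all_matches"   # new name, new file
```

The -i option in the command above makes sed write the result back into file.txt. Without it, sed only prints the result, which is what the renaming loops rely on when they pipe a filename through sed.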
A good explanation is given at [ link ]. Using this structure, we remove the spaces and the special characters ( ) from the names. First, we create an output folder named output and copy the files into it with the spaces removed.
mkdir -p output
for filename in book*; do cp "$filename" "output/$(echo "$filename" | sed -e 's/ //g')"; done
The output will look like bookname(1).jpg, bookname(2).jpg, and so on.
Now remove the special characters:
for filename in book*; do cp "$filename" "output/$(echo "$filename" | sed -e 's/)//g')"; done
for filename in book*; do cp "$filename" "output/$(echo "$filename" | sed -e 's/(//g')"; done
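Both parentheses can also be removed in a single pass with a bracket expression, which matches any one of the characters listed inside it. A minimal sketch: strip_parens is a hypothetical helper name, and the sample filename assumes the Windows renaming pattern described earlier.

```shell
# strip_parens: hypothetical helper that deletes every "(" and ")" from a name
strip_parens() {
  printf '%s\n' "$1" | sed -e 's/[()]//g'
}

strip_parens "book name (1).jpg"   # prints: book name 1.jpg
```

Inside a loop, "output/$(strip_parens "$filename")" can take the place of the echo and sed pipeline.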
Finally, replace e with E:
for filename in book*; do cp "$filename" "output/$(echo "$filename" | sed -e 's/e/E/g')"; done
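Each loop above reads the original files, so the steps do not build on one another. To reach the book_name_num.jpg convention in one go, several -e expressions can be chained in a single sed call, because each expression works on the result of the previous one. This is a sketch that assumes the files are named like book name (1).jpg; to_book_name is a hypothetical helper name.

```shell
# to_book_name: hypothetical helper, "book name (1).jpg" -> "book_name_1.jpg"
#   1st expression: turn the " (" before the number into "_"
#   2nd expression: drop the closing ")"
#   3rd expression: turn every remaining space into "_"
to_book_name() {
  printf '%s\n' "$1" | sed -e 's/ (/_/' -e 's/)//' -e 's/ /_/g'
}

mkdir -p output
for filename in book*; do
  [ -e "$filename" ] || continue   # the pattern matched nothing: skip
  cp "$filename" "output/$(to_book_name "$filename")"
done
```

Copying with cp rather than moving with mv leaves the originals untouched, so a wrong pattern costs nothing but a second run.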
The following commands follow the same pattern but have not been tested, so they may need syntax corrections.
Removing a . that comes directly before a digit
for filename in DJI*; do cp "$filename" "output/$(echo "$filename" | sed -e 's/\.\([0-9]\)/\1/g')"; done
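The \( \) pair in this pattern captures the digit so that \1 can put it back. With the range written as [0-9], only a dot directly followed by a digit is dropped and the dot before the extension survives. A small sketch: the filenames are invented for illustration and drop_dot_before_digit is a hypothetical helper name.

```shell
# drop_dot_before_digit: hypothetical helper; removes "." only when a digit follows
drop_dot_before_digit() {
  printf '%s\n' "$1" | sed -e 's/\.\([0-9]\)/\1/g'
}

drop_dot_before_digit "DJI.0012.jpg"   # prints: DJI0012.jpg
drop_dot_before_digit "DJI_1.2.jpg"    # prints: DJI_12.jpg
```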
Replacing - with _
for filename in DJI*; do cp "$filename" "output/$(echo "$filename" | sed -e 's/-/_/g')"; done
Removing blank spaces
for filename in DJI*; do cp "$filename" "output/$(echo "$filename" | sed -e 's/ //g')"; done
Removing a blank space followed by (1)
for filename in DJI*; do cp "$filename" "output/$(echo "$filename" | sed -e 's/ (1)//g')"; done
Removing the first underscore
for filename in container*; do cp "$filename" "output/$(echo "$filename" | sed -e 's/_//1')"; done
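The number at the end of the s command selects which occurrence on the line is affected: 1 means the first match (also the default when no flag is given), 2 the second, and so on. A brief sketch with an invented filename shows the difference:

```shell
# Invented filename with two underscores
name="container_ship_01.jpg"

# Flag 1: remove only the first underscore
first_removed=$(printf '%s\n' "$name" | sed -e 's/_//1')

# Flag 2: remove only the second underscore
second_removed=$(printf '%s\n' "$name" | sed -e 's/_//2')

printf '%s\n' "$first_removed"    # containership_01.jpg
printf '%s\n' "$second_removed"   # container_ship01.jpg
```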
Replacing rot- with rot_ inside XML files
sed -i 's/rot-/rot_/g' *.xml
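Unlike the renaming loops, this command changes the text inside the files, and -i overwrites them directly. Attaching a suffix to -i makes sed keep a copy of each original, which is a cheap safety net. The sketch below works in a throwaway directory; the XML line and the names workdir and sample.xml are invented for illustration.

```shell
# Throwaway directory with one made-up XML file
workdir=$(mktemp -d)
printf '<image name="rot-90"/>\n' > "$workdir/sample.xml"

# -i.bak edits in place and keeps the original as sample.xml.bak
sed -i.bak 's/rot-/rot_/g' "$workdir"/*.xml

cat "$workdir/sample.xml"       # <image name="rot_90"/>
cat "$workdir/sample.xml.bak"   # <image name="rot-90"/>
```

Once the result looks right, the .bak files can be deleted.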
Replacing a space followed by ( with _
for filename in page*; do cp "$filename" "output/$(echo "$filename" | sed -e 's/ (/_/g')"; done
Removing )
for filename in page*; do cp "$filename" "output/$(echo "$filename" | sed -e 's/)//g')"; done