Notes from Chapter 5 of the Linux Shell Scripting Cookbook, "Tangled Web? Not At All!"
Parsing website data
$ lynx -dump -nolist http://www.johntorres.net/BoxOfficefemaleList.html | \
    grep -o "Rank-.*" | \
    sed -e 's/ *Rank-\([0-9]*\) *\(.*\)/\1\t\2/' | \
    sort -nk 1 > actresslist.txt

1 Keira Knightley
2 Natalie Portman
3 Monica Bellucci

The website no longer responds. The output above is quoted from the book.
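Because the site cannot be reached, the grep, sed, and sort stages can still be tried on local text. The following is a minimal sketch, not the book's script: parse_ranks is a hypothetical function name, the input lines are made up to resemble what lynx -dump is assumed to print, and the \t in the sed replacement relies on GNU sed.

```shell
#!/bin/bash
# parse_ranks is a hypothetical helper name, not part of the book's script.
# It reads lines such as "   Rank-2 Natalie Portman" on stdin and prints
# "2<TAB>Natalie Portman", sorted numerically by rank.
parse_ranks() {
    grep -o "Rank-.*" |
        sed -e 's/ *Rank-\([0-9]*\) *\(.*\)/\1\t\2/' |
        sort -nk 1
}

# Made-up sample input standing in for the lynx -dump output.
printf '%s\n' \
    '   Rank-3 Monica Bellucci' \
    '   Rank-1 Keira Knightley' \
    '   Rank-2 Natalie Portman' | parse_ranks
```

Keeping the pipeline in a function makes it easy to swap the printf for the real lynx command once a reachable page is available.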
Image crawler and download tool
#!/bin/bash
# Purpose: image download tool
# File name: img_downloader.sh

if [ $# -ne 3 ]; then
    echo "Usage: $0 URL -d DIRECTORY"
    exit -1
fi

# The case statement checks the first argument ($1). If it is -d, the next
# argument must be the directory: shift and save the directory name.
# Otherwise the argument is the URL.
while [ $# -gt 0 ]; do
    case $1 in
        -d) shift; directory=$1; shift ;;
        *)  url=$1; shift ;;
    esac
done

mkdir -p $directory;
baseurl=$(echo $url | egrep -o "https?://[a-z.\-]+")

echo Downloading $url
# egrep -o "<img[^>]*src=[^>]*>" prints only the <img> tags, with their attributes
# sed 's/<img[^>]*src=\"\([^"]*\).*/\1/g' extracts the URL from src="url"
# sed "s,^/,$baseurl/," replaces a leading / with the base URL
curl -s $url | egrep -o "<img[^>]*src=[^>]*>" |
    sed 's/<img[^>]*src=\"\([^"]*\).*/\1/g' |
    sed "s,^/,$baseurl/," > /tmp/$$.list

cd $directory;
while read filename; do
    echo Downloading $filename
    curl -s -O "$filename" --silent
done < /tmp/$$.list

The website no longer responds. The script above is quoted from the book.
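The extraction stage of the downloader can be checked without a network connection. The sketch below assumes a hypothetical helper, extract_img_urls, and a made-up HTML fragment and base URL; it uses grep -E in place of egrep, which matches the same pattern.

```shell
#!/bin/bash
# extract_img_urls is a hypothetical helper. It reads HTML on stdin and
# prints one image URL per line, turning root-relative paths (those that
# start with /) into absolute URLs using the base URL passed as $1.
extract_img_urls() {
    local baseurl=$1
    grep -Eo "<img[^>]*src=[^>]*>" |
        sed 's/<img[^>]*src="\([^"]*\).*/\1/g' |
        sed "s,^/,$baseurl/,"
}

# Made-up HTML fragment: one root-relative image and one absolute image.
html='<p><img alt="a" src="/images/a.png"> text <img src="http://example.com/b.jpg" width="10"></p>'
printf '%s\n' "$html" | extract_img_urls "http://example.com"
```

Paths that are relative to the page rather than to the site root (for example images/a.png) pass through unchanged, which is also how the pipeline in the script behaves.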
Web Album Generator
$ cat generate_album.sh
#!/bin/bash
# File name: generate_album.sh
# Purpose: create an album from the pictures in the current directory

echo "Creating album.."
mkdir -p thumbs

# Everything up to (but not including) EOF1 is redirected to index.html
cat <<EOF1 > index.html
<html>
<head>
<style>
body
{
  width:470px;
  margin:auto;
  border: 1px dashed grey;
  padding:10px;
}
img
{
  margin:5px;
  border: 1px solid black;
}
</style>
</head>
<body>
<center><h1> #Album title </h1></center>
<p>
EOF1

for img in *.jpg;
do
  # Create a thumbnail 100 pixels wide
  convert "$img" -resize "100x" "thumbs/$img"
  echo "<a href=\"$img\" >" >>index.html
  echo "<img src=\"thumbs/$img\" title=\"$img\" /></a>" >> index.html
done

cat <<EOF2 >> index.html
</p>
</body>
</html>
EOF2

echo Album generated to index.html

$ ./generate_album.sh
Creating album..
Album generated to index.html
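To preview the markup the loop writes for each picture without installing ImageMagick, the two echo lines can be isolated. This is only a sketch: album_entry is a hypothetical function name and the file names are made up.

```shell
#!/bin/bash
# album_entry is a hypothetical helper that prints the link-and-thumbnail
# markup for one image, matching what the loop appends to index.html.
album_entry() {
    local img=$1
    echo "<a href=\"$img\" >"
    echo "<img src=\"thumbs/$img\" title=\"$img\" /></a>"
}

# Made-up file names; no image files are read or created here.
for img in beach.jpg sunset.jpg; do
    album_entry "$img"
done
```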
Twitter command line client
#!/bin/bash
# File name: twitter.sh
# Purpose: basic Twitter client

oauth_consumer_key=YOUR_CONSUMER_KEY
oauth_consumer_secret=YOUR_CONSUMER_SECRET
config_file=~/.$oauth_consumer_key-$oauth_consumer_secret-rc

if [[ "$1" != "read" ]] && [[ "$1" != "tweet" ]]; then
    echo -e "Usage: $0 tweet status_message\n OR\n $0 read\n"
    exit -1;
fi

# source /usr/local/bin/TwitterOAuth.sh
# Load the TwitterOAuth.sh library with source so that the functions it
# defines can be used to access Twitter. TO_init initializes the library.
source bash-oauth-master/TwitterOAuth.sh
TO_init

if [ ! -e $config_file ]; then
    # Obtain an OAuth token and token secret
    TO_access_token_helper
    if (( $? == 0 )); then
        echo oauth_token=${TO_ret[0]} > $config_file
        echo oauth_token_secret=${TO_ret[1]} >> $config_file
    fi
fi

source $config_file

if [[ "$1" = "read" ]]; then
    # TO_statuses_home_timeline fetches the timeline from Twitter.
    # It returns one long JSON string, for example:
    # [{"created_at":"Thu Nov 10 14:45:20 +0000 2016","id":7...9,"id_str":"7...9","text":"Dining...
    TO_statuses_home_timeline '' 'YOUR_TWEET_NAME' '10'
    echo $TO_ret | sed 's/,"/\n/g' | sed 's/":/~/' | \
        awk -F~ '{} {if ($1 == "text") {txt=$2;} else if ($1 == "screen_name") printf("From: %s\n Tweet: %s\n\n", $2, txt);} \
        {}' | tr '"' ' '
elif [[ "$1" = "tweet" ]]; then
    shift
    TO_statuses_update '' "$@"
    echo 'Tweeted :)'
fi

Excerpt only.
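The sed, awk, and tr stages that format the timeline can be exercised on a local string, with no Twitter account involved. The sketch below is an illustration under assumptions: format_timeline is a hypothetical name, the JSON sample is made up and far simpler than a real timeline, the awk program is rewritten as two pattern-action rules with the same effect, and \n in the sed replacement relies on GNU sed.

```shell
#!/bin/bash
# format_timeline is a hypothetical name for the sed/awk/tr pipeline.
# It reads a JSON string on stdin and prints a From/Tweet pair each time a
# "screen_name" key is seen, using the most recent "text" value.
format_timeline() {
    sed 's/,"/\n/g' |      # start a new line before each key (GNU sed)
        sed 's/":/~/' |    # turn the first key/value separator on each line into ~
        awk -F~ '
            $1 == "text"        { txt = $2 }
            $1 == "screen_name" { printf("From: %s\n Tweet: %s\n\n", $2, txt) }
        ' |
        tr '"' ' '         # replace the remaining double quotes with spaces
}

# Made-up sample: "text" and "screen_name" each follow a comma, as the
# pipeline requires.
sample='[{"created_at":"Thu","text":"Dining out","user":{"id":1,"screen_name":"alice"}}]'
printf '%s\n' "$sample" | format_timeline
```

The output still carries stray JSON punctuation such as closing braces, and a tweet whose text contains ," would be split in the wrong place. A JSON-aware tool would be more robust; the pipeline is kept here because it is what the script uses.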
Look up word definitions through a web service
#!/bin/bash
# File name: define.sh
# Purpose: look up the meaning of a word on dictionaryapi.com

key=YOUR_API_KEY_HERE

if [ $# -ne 2 ]; then
    echo -e "Usage: $0 WORD NUMBER"
    exit -1;
fi

# nl prefixes each output line with its line number
curl --silent "http://www.dictionaryapi.com/api/v1/references/learners/xml/$1?key=$key" | \
    grep -o \<dt\>.*\</dt\> | \
    sed 's$</*[a-z]*>$$g' | \
    head -n $2 | nl

$ ./define.sh usb 1
1 :a system for connecting a computer to another device (such as a printer, keyboard, or mouse) by using a special kind of cord a USB cable/port USB is an abbreviation of "Universal Serial Bus."

Excerpt only. The script was not run.
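Since the script was not run against the live API, the text-processing half can at least be tried on local input. In this sketch extract_definitions is a hypothetical helper and the XML is made up; it only approximates the shape the pipeline expects (definitions wrapped in <dt> tags, one per line) and is not a real API response.

```shell
#!/bin/bash
# extract_definitions is a hypothetical helper. It reads XML on stdin, keeps
# only the <dt>...</dt> elements, removes the tags, limits the result to the
# number of lines given as $1, and numbers each line.
extract_definitions() {
    local count=$1
    grep -o '<dt>.*</dt>' |
        sed 's$</*[a-z]*>$$g' |   # $ is the sed delimiter, so / needs no escaping
        head -n "$count" |
        nl
}

# Made-up XML with one <dt> element per line.
xml='<entry><def><dt>:a first meaning <vi>an example</vi></dt></def></entry>
<entry><def><dt>:a second meaning</dt></def></entry>'
printf '%s\n' "$xml" | extract_definitions 1
```

The sed expression deletes every tag made of lowercase letters, including nested ones such as <vi>, so example sentences end up appended to the definition text.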
Find broken links in a website
#!/bin/bash
# File name: find_broken.sh
# Purpose: find broken links in a website

if [ $# -ne 1 ]; then
    echo -e "Usage: $0 URL\n"
    exit 1;
fi

echo Broken links:

mkdir /tmp/$$.lynx
cd /tmp/$$.lynx

# lynx -traversal URL writes several files to the current working directory,
# including reject.dat, which lists every link in the website
lynx -traversal $1 > /dev/null
count=0;

# sort -u produces a list without duplicates
sort -u reject.dat > links.txt

while read link;
do
    # Check the response header returned by curl -I
    output=`curl -I $link -s | grep -e "HTTP/.*OK" -e "HTTP/.*200"`
    if [[ -z $output ]]; then
        output=`curl -I $link -s | grep -e "HTTP/.*301"`
        if [[ -z $output ]]; then
            echo "BROKEN: $link"
            let count++
        else
            echo "MOVED: $link"
        fi
    fi
done < links.txt

[ $count -eq 0 ] && echo No broken links found.

$ ./find_broken.sh http://10.18.7.30
Broken links:
No broken links found.
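The status check inside the loop can be separated from the network call, which makes it testable with canned headers. The following is a sketch under assumptions: classify_headers is a hypothetical helper, and the header text is made up rather than captured from a server.

```shell
#!/bin/bash
# classify_headers is a hypothetical helper. It reads HTTP response headers
# on stdin (for example from: curl -I -s URL) and prints OK, MOVED, or BROKEN
# using the same grep patterns as the script.
classify_headers() {
    local headers
    headers=$(cat)
    if printf '%s\n' "$headers" | grep -q -e "HTTP/.*OK" -e "HTTP/.*200"; then
        echo OK
    elif printf '%s\n' "$headers" | grep -q -e "HTTP/.*301"; then
        echo MOVED
    else
        echo BROKEN
    fi
}

# Made-up header samples.
printf 'HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n' | classify_headers
printf 'HTTP/1.1 301 Moved Permanently\r\n' | classify_headers
printf 'HTTP/1.1 404 Not Found\r\n' | classify_headers
```

Used in the loop, this would need only one curl -I request per link, because the headers are fetched once and then tested against both patterns.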
Track website changes
#!/bin/bash
# File name: change_track.sh
# Purpose: track changes to a web page

if [ $# -ne 1 ]; then
    echo -e "Usage: $0 URL\n"
    exit 1;
fi

first_time=0   # Not the first run

# [ ! -e "last.html" ] checks whether this is the first run. If last.html
# does not exist, this is the first run: the page must be downloaded and
# copied to last.html.
if [ ! -e "last.html" ]; then
    first_time=1   # First run
fi

curl --silent $1 -o recent.html

if [ $first_time -ne 1 ]; then
    changes=$(diff -u last.html recent.html)
    if [ -n "$changes" ]; then
        echo -e "Changes:\n"
        echo "$changes"
    else
        echo -e "\nWebsite has no changes"
    fi
else
    echo "[First run] Archiving.."
fi

cp recent.html last.html

Excerpt only.
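The comparison step can be tried on two local files standing in for last.html and recent.html. This is a minimal sketch: report_changes is a hypothetical helper, and the file contents are made up.

```shell
#!/bin/bash
# report_changes is a hypothetical helper that compares two saved copies of
# a page and prints either the unified diff or a "no changes" message.
report_changes() {
    local old=$1 new=$2 changes
    # diff exits with status 1 when the files differ; that is not an error here.
    changes=$(diff -u "$old" "$new") || true
    if [ -n "$changes" ]; then
        printf 'Changes:\n\n%s\n' "$changes"
    else
        echo "Website has no changes"
    fi
}

# Temporary files stand in for last.html and recent.html.
workdir=$(mktemp -d)
printf '<p>old text</p>\n' > "$workdir/last.html"
printf '<p>new text</p>\n' > "$workdir/recent.html"
report_changes "$workdir/last.html" "$workdir/recent.html"
report_changes "$workdir/last.html" "$workdir/last.html"
rm -r "$workdir"
```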
Post to a web page and read the response
POST and GET are two HTTP request types used to send or retrieve information. In a GET request, parameters (name-value pairs) are sent as part of the page URL. In a POST request, parameters are sent in the HTTP message body. POST is commonly used to submit forms that carry a lot of content or private information.
Here we use the sample guestbook website that ships with the tclhttpd package. You can download tclhttpd from http://sourceforge.net/projects/tclhttpd and run it on the local system to create a local web server. When the user clicks the "Add me to your guestbook" button, the page sends a request containing the name and URL, and that information is added to the guestbook page to show who has visited the site.
Download the tclhttpd package and switch to its bin directory. Start the tclhttpd daemon:
tclsh httpd.tcl
Use curl to send a POST request and read the website's response (in HTML format):
$ curl URL -d "postvar=postdata2&postvar2=postdata2"
# or
$ curl http://127.0.0.1:8015/guestbook/newguest.html -d "name=Clif&url=www.noucorp.com&http=www.noucorp.com"
<HTML>
<Head>
<title>Guestbook Registration Confirmed</title>
</Head>
<Body BGCOLOR=white TEXT=black>
<a href="www.noucorp.com">www.noucorp.com</a>
<DL>
<DT>Name
<DD>Clif
<DT>URL
<DD>
</DL>
www.noucorp.com
</Body>
-d submits user data as a POST request. The string argument to -d has the same form as the parameters of a GET request: var=value pairs separated by &.
You can also submit data with wget's --post-data "string" option:
$ wget http://127.0.0.1:8015/guestbook/newguest.cgi --post-data "name=Clif&url=www.noucorp.com&http=www.noucorp.com" -O output.html
The name-value format is the same as for cURL. The contents of output.html are the same as what the cURL command returns.
Strings sent via POST (for example, with -d or --post-data) should always be quoted. Otherwise the shell interprets & as a request to run the command as a background process.
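To make the structure of the data string concrete, the sketch below splits one on & and prints each name and value. print_pairs is a hypothetical helper used only for illustration; it does not contact any server. Note that the string is passed in double quotes, so the shell hands the & characters to the function instead of treating them as background operators.

```shell
#!/bin/bash
# print_pairs is a hypothetical helper that splits a POST data string on &
# and prints each name and value on its own line.
print_pairs() {
    local data=$1 pair
    local -a pairs
    IFS='&' read -r -a pairs <<< "$data"
    for pair in "${pairs[@]}"; do
        # ${pair%%=*} is the text before the first =, ${pair#*=} the text after it.
        printf '%s = %s\n' "${pair%%=*}" "${pair#*=}"
    done
}

print_pairs "name=Clif&url=www.noucorp.com&http=www.noucorp.com"
```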
If you look at the source code of the website (using the View Source option of the web browser), you will find an HTML form similar to the following
<form action="newguest.cgi" method="post" >
<ul>
<li> Name: <input type="text" name="name" size="40" >
<li> Url: <input type="text" name="url" size="40" >
<input type="submit" >
</ul>
</form>
Here, newguest.cgi is the target URL. When the user enters the details and clicks the Submit button, the name and URL are sent to newguest.cgi as a POST request, and the response page is returned to the browser.
Download video from the Internet
There is a video download tool called youtube-dl. Most distributions do not include it, and the version in the package repository may not be the latest, so it is best to download it from the official website (http://yt-dl.org).
Follow the links and instructions on the page to download and install youtube-dl, then run it:
youtube-dl https://www.youtube.com/watch?v=AJrsl3fHQ74
Use OTS to summarize text
Open Text Summarizer (OTS) can delete irrelevant content from text and generate a concise summary
Most Linux distributions do not ship the ots package by default. It can be installed with the following command:
apt-get install libots-devel
ots is easy to use. It reads the input from a file or stdin and outputs the generated summary to stdout
ots LongFile.txt | less
# or
cat LongFile.txt | ots | less
ots can also be combined with curl to summarize a website. For example, you can use ots to summarize long-winded blogs:
curl http://BlogSite.org | sed -r 's/<[^>]+>//g' | ots | less
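The sed stage in that pipeline removes HTML tags before the text reaches ots. The sketch below tries it on a made-up fragment; strip_html is a hypothetical helper, and sed -E is used as the portable spelling of the -r option shown above.

```shell
#!/bin/bash
# strip_html is a hypothetical helper wrapping the sed stage of the pipeline:
# it deletes anything of the form <...> on each line of stdin.
strip_html() {
    sed -E 's/<[^>]+>//g'
}

# Made-up HTML fragment.
printf '%s\n' '<h1>Title</h1>' '<p>Some <b>bold</b> text.</p>' | strip_html
```

Because sed works line by line, a tag that spans two lines is not removed, and the contents of script or style elements are kept as text.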
Translate text on the command line
You can access Google's online translation service through your browser. Andrei Neculau wrote an awk script that accesses the service from the command line and translates text.
Most Linux distributions do not include this command-line translator, but you can install it directly from Git
cd ~/bin
wget git.io/trans
chmod 755 ./trans
trans can translate text into the language set by the locale environment variable
$> trans "J'adore Linux"
J'adore Linux

I love Linux

Translations of J'adore Linux
French -> English
J'adore Linux
I love Linux
You can use options before the text to be translated to control the language used for translation. The options are formatted as follows
from:to
To translate English into French, use the following command
$> trans en:fr "I love Linux"
J'aime Linux