Meeting-20100612

From WPLUG
Revision as of 03:53, 13 June 2010

WPLUG will have a General User Meeting and presentation on Saturday, June 12th, 2010, starting at 11am at the Wilkins School Community Center.

Schedule for the Day

10:30am - Doors open, set up
11:00am - Business Meeting starts
11:30am - Featured Presentation
12:30pm - Meeting ends, everyone out. We are likely to go to D's 6pack or Square Cafe for lunch.

Speaker/Presentation

Vance Kochenderfer will be talking a bit about the UNIX text processing utilities such as grep, sed, awk, cat, wc, and the like.

However, you don't get to just sit on your butt and listen; this is an audience-participation event. What we're going to do is take a couple of simple tasks and then explore how you could accomplish them using various UNIX utilities.

The goal is not solely to find the standard, quickest, or simplest solution, but to try out as many different whacked-out options as we can. So don't stop thinking once you've got an answer, even if it's a good one; see what else you can come up with!

We'll talk over all the suggestions and how they work (or don't work), so hopefully we'll all learn something new.

Start thinking about these, and bring your ideas to the meeting:

EXERCISE ONE

You have a large text file. Some lines contain text; others are blank. Your goal is to figure out how many non-blank lines are in the file.

I can think of at least six ways of doing this. How about you?

Below are the examples I came up with, along with their running times on an input file containing 1.5 million blank lines and 3 million non-blank lines. You can generate these statistics by preceding a command with 'time -p'.

# First example from <http://www.vectorsite.net/tsawk_3.html>
awk 'NF != 0 { ++count } END { print count }' filename
3000000
real 3.67
user 3.55
sys 0.10

awk '/./ { ++count } END { print count }' filename
3000000
real 2.96
user 2.81
sys 0.12

grep -c . filename
3000000
real 0.94
user 0.84
sys 0.08

# sed is just a slower grep here.
sed -n -e '/./p' filename | wc -l
real 6.00
user 5.64
sys 0.14
3000000

# If you REALLY love sed, you can replace wc -l, too!
sed -n -e '/./p' filename | sed -n -e '$='
real 7.43
user 5.70
sys 0.19
3000000

tr -s '\012' < filename | wc -l
real 1.21
user 0.84
sys 0.13
3000000

# -b and -s are non-POSIX extensions to cat found on GNU and
# BSD systems. The last number printed is the count.
cat -b -s filename | tail -n 2 | cut -f 1
real 1.20
user 0.63
sys 0.16
2999999
3000000

sh -c 'count=0
while read ln ; do
[ -n "$ln" ] && count=$(($count+1))
done
echo $count' < filename
3000000
real 240.14
user 214.12
sys 24.96

# Caveat: after chomp, a line containing just "0" is false in
# Perl, so this version would count it as blank. Testing
# "if (length)" instead of "if ($_)" avoids that.
perl -e 'while (<>) { chomp; if ($_) { ++$count } } ;
print "$count\n"' < filename
3000000
real 7.22
user 7.00
sys 0.14

perl -e 'while (<>) { if (/./) { ++$count } } ;
print "$count\n"' < filename
3000000
real 8.93
user 8.78
sys 0.11

# This one displays a separate count of blank, non-blank, and
# total lines.
awk 'NF != 0 {++nonblank} NF == 0 {++blank}
END {print "Non-blank:",nonblank ; print "Blank:",blank ;
print "Total:",NR}' filename
Non-blank: 3000000
Blank: 1500000
Total: 4500000
real 5.42
user 5.28
sys 0.12

# Actually, we don't need a separate pattern and action to
# count blank lines; we can subtract from the total instead.
awk 'NF != 0 {++count}
END {print "Non-blank:",count ; print "Blank:",NR-count ;
print "Total:",NR}' filename
Non-blank: 3000000
Blank: 1500000
Total: 4500000
real 3.75
user 3.64
sys 0.09

# This does the same, but has to read the file three separate
# times. On your system it might be faster or slower than the
# one above, depending on whether CPU or I/O is the bottleneck.
# The cut assumes GNU wc output; BSD wc pads the count with
# leading spaces, so 'wc -l < filename' is more portable there.
sh -c 'printf "Non-blank: " ; grep -c . filename ;
printf "Blank: " ; grep -v -c . filename ;
printf "Total: " ; wc -l filename | cut -d " " -f 1'
Non-blank: 3000000
Blank: 1500000
Total: 4500000
real 2.25
user 1.91
sys 0.31
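The commands above don't all agree on what "blank" means. awk's NF != 0 (and the while read loop, because read strips leading and trailing whitespace) treats a line of only spaces or tabs as blank, while grep -c ., awk '/./' and tr -s '\012' only treat a truly empty line as blank. They agree on the test file, but a tiny hypothetical sample.txt (a made-up name, not part of the original exercise) shows the difference:

```shell
# sample.txt is a hypothetical four-line file: "text", an empty
# line, a line of three spaces, and a tab followed by "more".
printf 'text\n\n   \n\tmore\n' > sample.txt

# Rule 1: any character at all makes a line non-blank.
grep -c . sample.txt                                   # 3
awk '/./ { ++n } END { print n + 0 }' sample.txt       # 3

# Rule 2: a line needs at least one non-whitespace character.
awk 'NF != 0 { ++n } END { print n + 0 }' sample.txt   # 2
grep -c '[^[:space:]]' sample.txt                      # 2
```

So if your file may contain whitespace-only lines, decide which rule you want before comparing answers.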

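The page doesn't say what the timing file contained beyond its line counts. As an assumption, here is one way to build a file with the same counts (3 million non-blank lines and 1.5 million blank lines) using awk. The text on each line is made up, so expect your timings to differ from the ones above.

```shell
# Build "filename" with 3,000,000 non-blank and 1,500,000 blank
# lines. Only the counts are meant to match the original file.
awk 'BEGIN {
    for (i = 0; i < 1500000; i++)
        print "some text\n\nmore text"   # two non-blank lines, one blank
}' > filename

# Check the counts (prefix any of these with time -p to time them).
wc -l < filename          # 4500000 lines in total
grep -c . filename        # 3000000 non-blank lines
grep -v -c . filename     # 1500000 blank lines
```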
EXERCISE TWO

Determine whether a given value is numeric (decimal).

Example numeric values:

 123       45.6789   -3.4567   -0        000123    .01234
 54321.    00000.    -0.987    -.987     -0123.    012
 0.0       .0        -.000

Example non-numeric values:

 hello     3f        3F        AB        0xAB      0.0.
 -0-       3.0E8     3.0e-08   .-0123    1.23.4    5.678-
 --98      a space   a tab

As a bonus, make your command also consider a value numeric if it starts with a + instead of a -.

I haven't thought about this one as much, and only have one solution so far. Maybe you can come up with something using bc or some other non-obvious method?
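One possible approach, sketched as a POSIX shell function (is_numeric is a made-up name, and this is not necessarily the solution mentioned above): describe a decimal number as an extended regular expression and let grep -E test the whole value. The pattern is an optional sign, then either digits with an optional point and optional fraction digits, or a point followed by at least one digit. Accepting [+-] rather than just - covers the bonus.

```shell
# is_numeric VALUE: exit status 0 if VALUE is a decimal number.
# Pattern: optional sign, then "digits[.[digits]]" or ".digits".
is_numeric() {
    # grep tests each line on its own, so reject embedded newlines
    # first; otherwise "abc<newline>5" would match on its second line.
    case $1 in
        *'
'*) return 1 ;;
    esac
    printf '%s\n' "$1" | grep -Eq '^[+-]?([0-9]+\.?[0-9]*|\.[0-9]+)$'
}

# Try it on a few of the example values (plus the + bonus).
for v in 123 -.987 54321. +12 0.0. 3.0E8 --98 ' '; do
    if is_numeric "$v"; then
        echo "numeric:     '$v'"
    else
        echo "not numeric: '$v'"
    fi
done
```

To drop the bonus and accept only a leading minus, change [+-]? to -? in the pattern.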

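In the spirit of trying something less obvious, here is a sketch that uses no external commands at all, only the shell's own case patterns. The hypothetical is_numeric_case function strips one leading sign, then rejects an empty value or a lone point, any character other than a digit or point, and more than one point. Whatever survives is digits with at most one point and at least one digit, which is the same set of values the regular expression above accepts.

```shell
# is_numeric_case VALUE: exit status 0 if VALUE is a decimal number,
# using only case patterns (no grep, no pipeline).
# Note: v is a global variable; POSIX sh has no "local".
is_numeric_case() {
    v=$1
    case $v in
        [+-]*) v=${v#?} ;;      # drop one leading + or - (the bonus)
    esac
    case $v in
        '' | .)    return 1 ;;  # nothing left, or just a point
        *[!0-9.]*) return 1 ;;  # a character that isn't a digit or point
        *.*.*)     return 1 ;;  # more than one point
    esac
    return 0
}

for v in -0123. .01234 +.5 1.23.4 .-0123 --98 "$(printf '\t')"; do
    if is_numeric_case "$v"; then
        echo "numeric:     '$v'"
    else
        echo "not numeric: '$v'"
    fi
done
```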
Meeting Minutes

(TBA)

Meeting Staff

If you would like to volunteer to assist with this meeting, please add your name to one or more of the categories below.

Carpooling

  • Your name/location here