From: tim@menzies.us
Newsgroups: comp.lang.awk,comp.unix.shell,comp.answers,news.answers
Followup-To: poster
Subject: Awk FAQ v2.003
Version: 2.003
Summary: FAQ (Frequently Asked Questions) about the awk programming language
Last-modified: 2010-Mar-1
URL: http://lawker.googlecode.com/svn/fridge/doc/faq.txt
Archive-name: http://lawker.googlecode.com/svn/fridge/doc/faq.txt

Frequently Asked Questions == FAQ

The FAQ list for comp.lang.awk can be found on the Internet at the URL
given in the header above.

===============================================================
Contents:

     0. Change history
     1. Disclaimer
     2. Spam
     3. Can you answer my awk question?
     4. How can I add a FAQ and its answer to the FAQ list?
     5. What is awk?
     6. What well-maintained awk-compatible languages are there?
        6.1 nawk
        6.2 gawk
        6.3 mawk
        6.4 xgawk
        6.5 sqawk
        6.6 jawk
        6.7 runawk
        6.8 older versions
     7. Where can I buy awk?
     8. Where can I get awk for free? For what platforms?
        8.1 OS/X
        8.2 Windows
        8.3 LINUX
     9. Why would anyone use awk instead of language XYZ?
    10. How can I learn awk?
    11. What are some other awk resources?
        11.1. The awk community portal.
        11.2. Short tutorials for newcomers.
        11.3. Longer Tutorials.
    12. How do I report a bug in gawk?
    13. How can I access shell or environment variables in an awk script?
    14. How does awk deal with multiple files?
        14.1 How can awk test for the existence of a file?
        14.2 How can I get awk to read multiple files?
        14.3 How can I tell from which file my input is coming?
        14.4 How can I get awk to open multiple files (selected at runtime)?
        14.5 How can I treat the first file specially?
        14.6 How can I explicitly pass in a filename to treat specially?
    15. How many elements were created by split()?
    16. How can I split a string into characters?
    17. How do I have dynamic-width printf strings, like C?
    18. Why doesn't "\\$" behave like /\\$/ ? Why don't parentheses match?
    19. What is awk's exit code?
    20. How can I get awk to be case-insensitive?
        20.1. use tolower()
        20.2. use IGNORECASE=1
    21. How can I force a numeric/non-numeric comparison?
    22. Why does { FS=":"; print $1 } not split the first record?
    23. Why doesn't awk 'begin {...}' work?
    24. Why does awk 'BEGIN { print 6 " " -22 }' lose the space?
    25. How do I take advantage of gawk's networking support?
    26. How do I delete all fields up to field N, preserving input formatting?
    27. How do I extract the string that matches a RE?
    28. How do I substitute matched REs in *sub()?
    29. How do I write changes back to the original file?
    30. How do I convert a string to an array?
    31. How do I convert and diff 2 date/time values?
    32. How do I select a range of records?
    33. How do I remove text between 2 tags?
    98. Miscellaneous
    99. Credits

===============================================================
0. Change history

    Mar  1, 2010: fixing minor typos
    Feb 28, 2010: lines reformatted to kill wordwrap.
    Feb 20, 2010: AWK FAQ (version 2) released.

===============================================================
1. Disclaimer

    Read at your own risk. The current, previous, or original authors
    make no claim as to fitness for any purpose or absence of any
    errors, and offer no warranty. Do not eat.

===============================================================
2. Spam

    You wouldn't believe how much spam I get to this address.

===============================================================
3. Can you answer my awk question?

    Probably not. Please don't mail it to me. Read the FAQ, and the
    materials pointed to by it, and if you can't find an answer there,
    by all means post to the newsgroup
    (see http://groups.google.com/group/comp.lang.awk).

    A FAQ list is intended to reduce traffic on a newsgroup, not
    eliminate it.

===============================================================
4. How can I add a FAQ and its answer to the FAQ list?

    Mail BOTH of them to me.
    Then I can add them to the FAQ and it should help people who have
    that same question later, as well as everyone who reads the group,
    because they won't see it asked and answered so often.

    I do not work on this FAQ every day, but I will try to get updates
    incorporated in a timely manner (say, monthly).

    Of course, don't mail me my entire FAQ! I already have a copy!
    There are copies available all over the web that I could use if I
    lost mine! I pay for my access; don't you?

===============================================================
5. What is awk?

    Awk is a stable, cross-platform computer language named for its
    authors Alfred Aho, Peter Weinberger & Brian Kernighan. They write:

        "Awk is a convenient and expressive programming language that
        can be applied to a wide variety of computing and
        data-manipulation tasks".
            -- Alfred V. Aho, Brian W. Kernighan, Peter J. Weinberger

    In Classic Shell Scripting, Arnold Robbins & Nelson Beebe confess
    their Awk bias:

        "We like it. A lot. The simplicity and power of Awk often make
        it just the right tool for the job."

    Besides the Bourne shell, Awk is the only other scripting language
    available in the standard Unix environment. Implementations of AWK
    exist as installed software for almost all other operating systems.

    AWK is a superb language for testing algorithms and applications
    with some complexity, especially where the problem can be broken
    into chunks which can be streamed as part of a pipe. It's an ideal
    tool for augmenting the features of shell programming because it is
    ubiquitous: it is found in some form on almost all Unix/Linux/BSD
    systems. Many problems dealing with text, log lines or symbol
    tables are handily solved, or at the very least prototyped, with
    awk along with the other tools found on Unix/Linux systems.

===============================================================
6. What well-maintained awk-compatible languages are there?

    6.1 nawk
        "The one true awk" (the original Bell Labs AWK). Interpreter.
        See http://www.cs.princeton.edu/~bwk/btl.mirror/awk.tar.gz

    6.2 gawk
        From the GNU project. Widely used. Interpreter.
        See http://www.gnu.org/software/gawk/

    6.3 mawk
        Mike's Awk (from Michael Brennan). For some code, runs very
        fast. Interpreter.
        See http://freshmeat.net/projects/mawk/

    6.4 xgawk
        Gawk + XML + ... Interpreter.
        See http://home.vrweb.de/~juergen.kahrs/gawk/XML/

    6.5 sqawk
        Gawk + SQL. Interpreter.
        See http://code.google.com/p/spawk/

    6.6 jawk
        Awk in the JAVA virtual machine. Interpreter.
        See http://jawk.sourceforge.net/

    6.7 runawk
        A wrapper for the AWK interpreter, providing modules.
        See http://sourceforge.net/projects/runawk/files/runawk/

    6.8 Older versions
        May not be currently supported.
        * awka (translates awk to C)

===============================================================
7. Where can I buy awk?

    MKS sells their version of AWK, or at least sells it as part of
    their toolkit. See http://www.mks.com

===============================================================
8. Where can I get awk for free? For what platforms?

    Most current AWK versions are open source; i.e. free. AWK runs on
    many platforms and can be downloaded and installed from many
    package management systems; e.g.

    8.1. OS/X
        From FINK: http://www.finkproject.org/
        From darwin ports: http://darwinports.com/

    8.2. Windows
        From GnuWin32: http://gnuwin32.sourceforge.net/
        From Cygwin: http://www.cygwin.com/

    8.3. LINUX
        Via apt-get, or from e.g. the Synaptic package manager.

===============================================================
9. Why would anyone use awk instead of language XYZ?

    Awk is a simple and elegant pattern scanning and processing
    language.

    Awk is also the most portable scripting language in existence.
    But why use it rather than Perl (or PHP or Ruby or...):

    - Awk is simpler (especially important if deciding which to learn
      first);
    - Awk syntax is far more regular (another advantage for the
      beginner, even without considering syntax-highlighting editors);
    - You may already know Awk well enough for the task at hand;
    - You may have only Awk installed;
    - Awk can be smaller, thus much quicker to execute for small
      programs.

    Tom Christiansen wrote in <3766d75e@cs.colorado.edu>:

        Awk is a venerable, powerful, elegant, and simple tool that
        everyone should know. (Languages like) Perl are a superset and
        child of awk, but has much more power that comes at expense of
        sacrificing some of that simplicity.

    Carlo Strozzi writes:

        (Other languages like Perl is) a good programming language for
        writing self-contained programs, but pre-compilation and long
        start-up time are worth paying only if once the program has
        loaded it can do everything in one go. This contrasts sharply
        with the Operator-stream Paradigm, where operators are chained
        together in pipelines of two, three or more programs. The
        overhead associated with initializing (say) Perl at every
        stage of the pipeline makes pipelining inefficient. A better
        way of manipulating structured ASCII files is to use the AWK
        programming language, which is much smaller, more specialized
        for this task, and is very fast at startup.

===============================================================
10. How can I learn awk?

    English Book:
        _The AWK Programming Language_, by Aho, Kernighan and
        Weinberger, who invented the language. Published by
        Addison-Wesley. Lots of good material in not a lot of space.
        Out of date with regard to POSIX awk.
        ISBN 0-201-07981-X

    English Book:
        _Effective Awk Programming_, by Arnold Robbins, published by
        O'Reilly and Associates.
        ISBN 0-596-00070-7 (third edition)

        We recommend buying the book instead of trying to print it all
        out, for three reasons:

        1. It's probably cheaper than using your own toner and paper.
        2. Some money goes back to help further development, both to
           Arnold Robbins (only if you buy from ORA) and the Free
           Software Foundation (if you buy from either ORA or the FSF).
        3. It helps convince publishers that we _like_ having full
           documentation available on-line (e.g., for searching), but
           will still pay for a compact, bound copy.

    English Book:
        _Sed & Awk_, second edition, by Dale Dougherty & Arnold
        Robbins, published by O'Reilly and Associates.
        ISBN 1-56592-225-5 (second edition)

        _sed & awk_ describes two text manipulation programs that are
        mainstays of the UNIX programmer's toolbox. The latest edition
        covers the sed and awk programs as they are now mandated by
        the POSIX standard and includes discussion of the GNU versions
        of these programs.

    English Book:
        _Classic Shell Scripting_, by Arnold Robbins and Nelson Beebe,
        published by O'Reilly and Associates.
        ISBN 0-596-00595-4

        Contains an (excellent) short introduction to Gawk, as well as
        numerous other UNIX shell languages that can be combined to
        quickly build applications.

    English Book:
        _Mastering Regular Expressions_, by Jeffrey E.F. Friedl,
        published by O'Reilly and Associates. 3rd edition.
        (the `Hip Owls Book')

        ``... you will learn how to use regular expressions to solve
        problems and get the most out of tools that provide them. Not
        only that, but much more: this book is about _mastering_
        regular expressions.''

        Errata, additions, and a change log are available at the
        author's home page.
        ISBN 1-56592-257-3

    German Book:
        Friedl's _Mastering Regular Expressions_.

    English Booklet:
        TCP/IP Internetworking With Gawk
        ISBN 1-882114-93-0

        An abridged form is included in O'Reilly's Effective Awk
        Programming 3e. A short worked example of this code is at
        http://awk.info/?tools/server.

===============================================================
11. What are some other awk resources?

    11.1. The awk community portal: a large collection of awk tips
          and tricks.

    11.2. Short tutorials for newcomers.

        Sorted by newbie-ness (so best to start at the top):

        Eric Wendelin: Awk is a beautiful tool
        Tim Sherwood: AWK: The Duct Tape of Computer Science Research
                      (slides)
        Ronald Loui: Samples of Gawk
        Andrew Ross: Getting started with awk
        Tim Menzies: Four Keys to Gawk
        Peteris Krumins: 10 Awk Tips, Tricks and Pitfalls
        Paul Jakma: Awk programmers' FAQ
        Ed Morton (and friends): Use (and Abuse) of Getline

        Note: Ed's text shows that getline is more complicated than it
        first appears. In short, getline should not be used by
        beginners.

    11.3. Longer Tutorials

        The following list is sorted by the number of times this
        material is tagged at delicious.com (most tagged at top):

        Greg Goebel: An Awk Primer
        Bruce Barnett: Awk - A Tutorial and Introduction
        Arnold Robbins: The GNU Awk User's Guide
        Emmett Dulaney: AWK: The Linux Administrators' Wisdom Kit

===============================================================
12. How do I report a bug in gawk?

    This is described in great detail in the gawk documentation.
    In brief:

    1. Make sure what you've discovered is really a bug by checking
       the documentation and, if possible, comparing with nawk and
       mawk.
    2. Cut down the program and data to as small a test case as
       possible that still illustrates the bug.
    3. Optionally post to comp.lang.awk; this allows others to confirm
       or deny the behavior, and its incorrectness (or lack thereof).
    4. Send mail to bug-gawk@gnu.org. This automatically sends a copy
       to Arnold Robbins.

    Do not JUST post in comp.lang.awk; Arnold's readership there is
    sporadic, and of course any Usenet article can be missed, killed,
    or dropped.

===============================================================
13. How can I access shell or environment variables in an awk script?

    Short answer = either of these, where "svar" is a shell variable
    and "avar" is an awk variable:

        awk -v avar="$svar" '... avar ...' file
        awk 'BEGIN{avar=ARGV[1];ARGV[1]=""}... avar ...' "$svar" file

    depending on your requirements for handling backslashes and
    handling ARGV[] if it contains a null string (see below for
    details).

    Long answer = There are several ways of passing the values of
    shell variables to awk scripts depending on which version of awk
    (and to a much lesser extent which OS) you're using. For this
    discussion, we'll consider the following 4 awk versions:

        oawk (old awk, /usr/bin/awk and /usr/bin/oawk on Solaris)
        nawk (new awk, /usr/bin/nawk on Solaris)
        sawk (non-standard name for /usr/xpg4/bin/awk on Solaris)
        gawk (GNU awk, downloaded from http://www.gnu.org/software/gawk)

    If you wanted to find all lines in a given file that match text
    stored in a shell variable "svar" then you could use one of the
    following:

        a) awk -v avar="$svar" '$0 == avar' file
        b) awk -vavar="$svar" '$0 == avar' file
        c) awk '$0 == avar' avar="$svar" file
        d) awk 'BEGIN{avar=ARGV[1];ARGV[1]=""}$0 == avar' "$svar" file
        e) awk 'BEGIN{avar=ARGV[1];ARGC--}$0 == avar' "$svar" file
        f) svar="$svar" awk 'BEGIN{avar=ENVIRON["svar"]}$0 == avar' file
        g) awk '$0 == '"$svar"'' file

    The following list shows which form is supported by which awk on
    Solaris (which should also apply to most other OSs):

        oawk = c, g
        nawk = a, c, d, f, g
        sawk = a, c, d, f, g
        gawk = a, b, c, d, f, g

    Notes:

    1) Old awk only works with forms "c" and "g", both of which have
       problems.

    2) GNU awk is the only one that works with form "b" (no space
       between "-v" and "var="). Since gawk also supports form "a", as
       do all the other new awks, you should avoid form "b" for
       portability between newer awks.

    3) In form "c", ARGV[1] is still getting populated, but because it
       contains an equals sign (=), awk changes its normal behavior of
       assuming that arguments are file names and instead treats it
       as a variable assignment, so you don't need to clear ARGV[1] as
       in form "d".
    4) In light of "3)" above, this raises the interesting question of
       how to pass awk a file name that contains an equals sign. The
       answer is to do one of the following:

       i) Specify a path, e.g. for a file named "abc=def" in the
          current directory, you'd use:

              awk '...' ./abc=def

          Note that this won't work with older versions of gawk or
          with sawk.

       ii) Redirect the input from a file so it's opened by the shell
           rather than awk having to parse the file name as an
           argument and then open it:

              awk '...' < abc=def

           Note that you will not have access to the file name in the
           FILENAME variable in this case.

    5) An alternative to setting ARGV[1]="" in form "d" is to delete
       that array entry, e.g.:

           awk 'BEGIN{avar=ARGV[1]; delete ARGV[1]}$0 == avar' "$svar" file

       This is slightly misleading, however: although ARGV[1] does get
       deleted in the BEGIN section and remains deleted for any files
       that precede the deleted variable assignment, the ARGV[] entry
       is recreated by awk when it gets to that argument during file
       processing, so in the case above when parsing "file", ARGV[1]
       would actually exist with a null string value, just as if you'd
       done ARGV[1]="". Given that it's misleading and introduces
       inconsistency of ARGV[] settings between files based on
       command-line order, it is not recommended.

    6) An alternative to setting svar="$svar" on the command line
       prior to invoking awk in form "f" is to export svar first,
       e.g.:

           export svar
           awk 'BEGIN{avar=ENVIRON["svar"]}$0 == avar' file

       Since this forces you to export variables that you wouldn't
       normally export, and so risks interfering with the environment
       of other commands invoked from your shell, it is not
       recommended.

    7) When you use form "d", you end up with a null string in
       ARGV[1], so if at the end of your program you want to print out
       all the file names then instead of doing:

           END{for (i in ARGV) print ARGV[i]}

       you need to check for a null string before printing, or store
       FILENAMEs in a different array during processing.
       Note that the above loop as written could also print the
       command name (typically "awk") stored in ARGV[0].

    8) When you use form "a", "b", or "c", the awk variable assignment
       gets processed during awk's lexical analysis stage (i.e. when
       the internal awk program gets built) and any backslashes
       present in the shell variable may get expanded. For example, if
       svar contains "hi\there" then avar could contain "hi<tab>here",
       where <tab> is a literal tab character. This behavior depends
       on the awk version as follows:

           oawk: does not print a warning and sets avar="hi\there"
           sawk: does not print a warning and sets avar="hi<tab>here"
           nawk: does not print a warning and sets avar="hi<tab>here"
           gawk: does not print a warning and sets avar="hi<tab>here"

       If the backslash precedes a character that has no special
       meaning to awk then the backslash may be discarded with or
       without a warning, e.g. if svar contained "hi\john" then the
       backslash precedes "j" and "\j" has no special meaning, so the
       various awks would each behave differently as follows:

           oawk: does not print a warning and sets avar="hi\john"
           sawk: does not print a warning and sets avar="hi\john"
           nawk: does not print a warning and sets avar="hijohn"
           gawk: prints a warning and sets avar="hijohn"

    9) None of the awk versions discussed here work with form "e", but
       it is included above as there are older (i.e. pre-POSIX)
       versions of awk that will treat form "d" as if it's intended to
       access a file named "", so you instead need to use form "e". If
       you find yourself with that or any other version of "old awk",
       you need to get a new awk to avoid future headaches; such
       versions will not be discussed further here.

    So, the forms accepted by all 3 newer awks under discussion (nawk,
    sawk, and gawk) are a, c, d, f, and g.
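
    As a quick check of the portable forms, the sketch below runs
    forms "a", "d" and "f" against the same input. The sample lines,
    the temporary file and the names form_a, form_d and form_f are
    made up for illustration; it assumes a POSIX shell, mktemp, and
    one of the newer awks installed as "awk".

```shell
#!/bin/sh
# Illustrative only: the sample data and the names form_a, form_d and
# form_f are assumptions, not part of the FAQ.
svar='needle'
tmp=$(mktemp) || exit 1
printf 'hay\nneedle\nstack\n' > "$tmp"

# form "a": -v assignment
form_a=$(awk -v avar="$svar" '$0 == avar' "$tmp")

# form "d": value passed as ARGV[1], then blanked so awk does not try
# to open it as a file
form_d=$(awk 'BEGIN { avar = ARGV[1]; ARGV[1] = "" } $0 == avar' "$svar" "$tmp")

# form "f": value passed through the environment of this one command
form_f=$(svar="$svar" awk 'BEGIN { avar = ENVIRON["svar"] } $0 == avar' "$tmp")

printf 'a: %s\nd: %s\nf: %s\n' "$form_a" "$form_d" "$form_f"
rm -f "$tmp"
```

    Each of the three forms should print the one matching line,
    "needle".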
    The main differences between these forms are as follows:

            |-------|-------|----------|-----------|-----------|--------|
            | BEGIN | files | requires | accepts   | expands   | null   |
            | avail | set   | access   | backslash | backslash | ARGV[] |
        |---|-------|-------|----------|-----------|-----------|--------|
        | a)|   y   |  all  |    n     |     y     |     y     |   n    |
        | c)|   n   |  sub  |    n     |     y     |     y     |   n    |
        | d)|   y   |  all  |    n     |     y     |     n     |   y    |
        | f)|   y   |  all  |    y     |     y     |     n     |   n    |
        | g)|   y   |  all  |    n     |     n     |    n/a    |   n    |
        |---|-------|-------|----------|-----------|-----------|--------|

    where the columns mean:

        BEGIN avail = y:       variable IS available in the BEGIN
                               section
        BEGIN avail = n:       variable is NOT available in the BEGIN
                               section
        files set = all:       variable is set for ALL files regardless
                               of command-line order
        files set = sub:       variable is ONLY set for those files
                               subsequent to the definition of the
                               variable on the command line
        requires access = y:   shell variable DOES need to be exported
                               or set on the command line
        requires access = n:   shell variable does NOT need to be
                               exported or set on the command line
        accepts backslash = y: variable CAN contain a backslash without
                               causing awk to fail with a syntax error
        accepts backslash = n: variable can NOT contain a backslash
                               without causing awk to fail with a
                               syntax error
        expands backslash = y: if the variable contains a backslash, it
                               IS expanded before execution begins
        expands backslash = n: if the variable contains a backslash, it
                               is NOT expanded before execution begins
        null ARGV[] = y:       you DO end up with a null entry in the
                               ARGV[] array
        null ARGV[] = n:       you do NOT end up with a null entry in
                               the ARGV[] array

    For most applications, forms "a" and "d" provide the most intuitive
    functionality. The only functional differences between the 2 are:

    1) Whether or not backslashes get expanded on assignment.
    2) Whether or not ARGV[] ends up containing a null string.

    So which one you choose to use depends on your requirements for
    these 2 situations.
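
    The two differences between forms "a" and "d" can be observed
    directly. The following is only a sketch: the value 'hi\there' is
    the example used in the notes above, the names len_a, len_d and
    nulls_d are hypothetical, and the lengths in the comments assume
    one of the newer awks (nawk, sawk, gawk, or mawk), all of which
    expand "\t" in a -v assignment.

```shell
#!/bin/sh
# Illustrative only: len_a, len_d and nulls_d are made-up names.
svar='hi\there'    # 8 characters, one of them a literal backslash

# form "a": the escape sequence is expanded, so "\t" becomes one tab
len_a=$(awk -v avar="$svar" 'BEGIN { print length(avar) }')

# form "d": the value is taken exactly as the shell passed it
len_d=$(awk 'BEGIN { avar = ARGV[1]; ARGV[1] = ""; print length(avar) }' "$svar")

# form "d" leaves a null entry behind in ARGV[]; count such entries
nulls_d=$(awk 'BEGIN { avar = ARGV[1]; ARGV[1] = "" }
END { for (i = 1; i < ARGC; i++) if (ARGV[i] == "") n++; print n + 0 }' "$svar" /dev/null)

echo "form a length: $len_a"           # 7 with the newer awks
echo "form d length: $len_d"           # 8
echo "null ARGV entries: $nulls_d"     # 1
```

    If the data may contain backslashes that must survive untouched,
    form "d" (or "f") is the safer choice; if a null ARGV[] entry would
    get in the way, form "a" is.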
===============================================================
14. How does awk deal with multiple files?

    Warning: some of these techniques will require non-ancient
    versions of awk.

    14.1 How can awk test for the existence of a file?

        The most portable way is to simply try to read from the file.

            function exists(file,   dummy, ret)
            {
                ret = 0
                if ((getline dummy < file) >= 0) {
                    # file exists (possibly empty) and can be read
                    ret = 1
                    close(file)
                }
                return ret
            }

        [ I've read reports that earlier versions of mawk would write
          to stderr as well as getline returning <0 -- is this still
          true? ]

        On Unix, you can probably use the `test' utility:

            if (system("test -r " file) == 0) {
                # file is readable
            } else {
                # file is not readable
            }

    14.2 How can I get awk to read multiple files?

        It's automatic (under Unix) -- use something like:

            awk '/^#include/ {print $2}' *.c *.h

    14.3 How can I tell from which file my input is coming?

        Use the built-in variable FILENAME:

            awk '/^#include/ {print FILENAME,$2}' *.c *.h

    14.4 How can I get awk to open multiple files (selected at
         runtime)?

        Use `getline', `close', `print EXPR > FILENAME', like:

            # assumes input file has at least 1 line,
            # output file writeable
            function double(infilename, outfilename,   aline)
            {
                while ((getline aline < infilename) > 0)
                    print(aline aline) > outfilename
                close(infilename)
                close(outfilename)
            }

    14.5 How can I treat the first file specially?

        For the first file read, FNR is the same as NR. Hence...

            FNR == NR { stuff }

    14.6 How can I explicitly pass in a filename to treat specially?

        Use `-v rulesfile=filename' like you would any other variable,
        and then use a `getline' loop (and `close') in your BEGIN
        statement.

            BEGIN \
            {
                if (rulesfile == "") {
                    print "must use -v rulesfile=filename"
                    exit(1)
                }
                while ((getline < rulesfile) > 0)
                    replace[$1] = $0
                close(rulesfile)
            }

            {
                if ($1 in replace)
                    print replace[$1]
                else
                    print
            }

===============================================================
15. How many elements were created by split()?
    When I do a split on a field, e.g.,

        split($1,x,"string")

    how can I find out how many elements x has (I mean other than
    testing for null string or doing a `for (n in x)' test)?

    split() is a function; use its return value:

        n = split($1, x, "string")

===============================================================
16. How can I split a string into characters?

    In portable POSIX awk, the only way to do this is to use substr to
    pull out each character, one by one. This is painful. However,
    gawk, mawk, and the newest version of the Bell Labs awk all allow
    you to set FS = "" and use "" as the third argument of split.

    So, split("chars",anarray,"") results in the array anarray
    containing 5 elements -- "c", "h", "a", "r", "s".

    If you don't have any ^As in your string, you could try:

        string = $0
        gsub(".", "&\001", string)
        # every character is now followed by ^A, so the last piece
        # returned by split() is empty; do not count it
        n = split(string, anarray, "\001") - 1
        for (i = 1; i <= n; i++)
            print "character " i " is '" anarray[i] "'"

===============================================================
17. How do I have dynamic-width printf strings, like C?

    With modern awks, you can just do it like you would in C (though
    the justification is less clear; C doesn't have the trivial in-line
    string concatenation that awk does), like so:

        maxlen = 0
        for (i in arr)
            if (maxlen < length(arr[i]))
                maxlen = length(arr[i])
        for (i in arr)
            printf("%*s\n", maxlen, arr[i])

===============================================================
18. Why doesn't "\\$" behave like /\\$/ ? Why don't parentheses match?

    A string used as a regular expression is scanned twice: once as a
    string, and then again as a regular expression.

        /\\$/  => regular expression: literal backslash at end
        "\\$"  => string: \$ => regular expression: literal dollar sign

    To get behavior like the first case in a string, use "\\\\$".
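
    The difference between a regular expression literal and a string
    used as a regular expression can be checked with a short test. This
    is only a sketch; the sample values "path\" and "cost$" and the
    variable names are made up for illustration.

```shell
#!/bin/sh
# Illustrative only: the sample strings and variable names are assumptions.
result=$(awk 'BEGIN {
    s = "path\\"                    # p a t h followed by one backslash
    lit     = (s ~ /\\$/)           # regex literal: backslash at end
    str     = (s ~ "\\\\$")         # string \\$ gives the same regex
    dollar  = ("cost$" ~ "\\$")     # string \$ gives regex: literal dollar sign
    nomatch = (s ~ "\\$")           # s contains no dollar sign
    print lit, str, dollar, nomatch
}')
echo "$result"
```

    This prints "1 1 1 0": the first two tests match the trailing
    backslash, the third matches a literal dollar sign, and the fourth
    finds no dollar sign in "path\".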
    There are other, less obvious characters which need the same
    attention; under-quoting or over-quoting should be avoided.

    Parentheses are special for grouping and alternation:

        /\(test\)/   => 6 characters `(test)'
        "\(test\)"   => /(test)/ => 4 characters `test'

    An example of trying to match some diagonal compass directions:

        /(N|S)(E|W)/        => `NE' or `NW' or `SE' or `SW' (good)
        "(N|S)(E|W)"        => /(N|S)(E|W)/ (good)
        "\(N|S\)\(E|W\)"    => /(N|S)(E|W)/ (good) (NOTE: \ has no effect)
        "\(N\|S\)\(E\|W\)"  => /(N|S)(E|W)/ (good) (NOTE: \ has no effect)

    Expressions that look similar but behave totally differently:

        /\(N|S\)\(E|W\)/     => `(N' or `S)(E' or `W)'
        /\(N\|S\)\(E\|W\)/   => `(N|S)(E|W)' only

    There is also confusion regarding different forms of special
    characters; POSIX requires that `\052' be treated as any other `*',
    even though it is written with 4 bytes instead of 1. In
    compatibility mode, gawk will treat it as though it were escaped,
    namely `\*'.

===============================================================
19. What is awk's exit code?

    With no exit command, awk exits with a zero value, unless there
    were problems closing input/output files.

    You can supply an optional numeric value to the `exit' command to
    make it exit with a value:

        if (whatever)
            exit 12;

    If you have an END block, control first transfers there. Within
    the END block, an `exit' command exits immediately; if you had
    previously supplied a value, that value is used. But, if you give
    a new value to `exit' within the END block, the new value is used.
    This is documented in the GNU Awk User's Guide (gawk.texi).

    If you have an END block you want to be able to skip sometimes,
    you may have to do something like this:

        BEGIN \
        {
            exitcode=0;
            ...
        }

        # normal rules processing...
        {
            ...
            if (fatal) {
                exitcode=12;
                exit(exitcode);
            }
            ...
        }

        END \
        {
            if (exitcode!=0) exit(exitcode);
            ...
        }

===============================================================
20. How can I get awk to be case-insensitive?

    20.1. use tolower() or toupper()

        - portable
        - must be explicitly used for each comparison

        instead of:
            if (avar=="a" || avar=="A") { ... }
        use:
            if (tolower(avar)=="a") { ... }

        or at the beginning of your code, add a line like

            { for (i=0;i<=NF;i++) $i=tolower($i) }
        or
            { $0=tolower($0); }    # awk rebuilds $1..$NF also

    20.2. use IGNORECASE=1;

        - gawk only
        - used for all comparisons, regex comparisons, index() function
        - not used for array indexing

===============================================================
21. How can I force a numeric/non-numeric comparison?

    These are the canonical, work-in-all-versions snippets. There are
    many others, most longer, some shorter (but possibly less
    portable).

    To compare two variables as numbers ONLY, use

        if (0+var1 == 0+var2)

    To compare two variables as non-numeric strings ONLY, use

        if ("" var1 == "" var2)

===============================================================
22. Why does { FS=":"; print $1 } not split the first record?

    Basically, you should set FS before it may be called upon to split
    $0 into fields. Once awk encounters a `{', it is probably too late.

    Some awk implementations set the fields at the beginning of the
    block, and don't re-parse just because you changed FS.

    To get the desired behavior, you must set FS _before_ reading in a
    line.

        e.g., BEGIN { FS=":" } { print $1 }
        e.g., awk -F: '{ print $1 }'

    If you run code like this

        { FS=":"; print $1 }

    on this data:

        first:second:third but not last:fourth
        First:Second:Third But Not Last:Fourth
        FIRST:SECOND:THIRD BUT NOT LAST:FOURTH

    you may get either this:    or this:
        ----                    -------
        first                   first:second:third
        First                   First
        FIRST                   FIRST

    Perhaps more surprisingly, code like

        { FS=":"; }
        { print $1; }

    will also behave in the same way.

===============================================================
23. Why doesn't awk 'begin {...}' work?

    It needs to be `BEGIN' (i.e., it's case-sensitive).

===============================================================
24. Why does awk 'BEGIN { print 6 " " -22 }' lose the space?

    You'd expect `6 -22', but you get `6-22'. It's because the
    `" " -22' is grouped first, as a subtraction instead of a
    concatenation, resulting in the numeric value `-22'; then it is
    concatenated with `6', giving the string `6-22'. Gentle application
    of parentheses will avoid this.

===============================================================
25. How do I take advantage of gawk's networking support?

    This code creates an html menu of local applications which you can
    season to taste. The usage requires two steps...

        1) run: 'gawk -f server.awk'
        2) open browser at: http://localhost:8080

    This code is based on the examples in the TCP/IP Internetworking
    With `gawk' manual and is licensed under GPL 3.0. For updates to
    this code, see http://topcat.hypermart.net/index.html.

        # by Michael Sanders, 2009

        BEGIN {
          x    = 1                         # exits if x < 1
          port = 8080                      # port number
          host = "/inet/tcp/" port "/0/0"  # host string
          url  = "http://localhost:" port  # server url
          RS = ORS = "\r\n"                # header terminators
          doc  = Setup()                   # html document
          while (x) {
             if ($1 == "GET") RunApp(substr($2, 2))
             if (! x) break
             Message(doc)
             host |& getline               # wait for new client request
          }
          Message(Bye())                   # server terminated...
        }

        #Server Message
        function Message(txt) {
          status = 200                     # 200 == OK
          reason = "OK"                    # server response
          len = length(txt) + length(ORS)  # length of document
          print "HTTP/1.0", status, reason |& host
          print "Connection: Close"        |& host
          print "Pragma: no-cache"         |& host
          print "Content-length:", len     |& host
          print ORS txt                    |& host
          close(host)
        }

        #HTML Menu
        function Setup() {
          tmp = "<html>\
          <head><title>Simple gawk server</title></head>\
          <body>\
          <p><a href=" url "/xterm>xterm</a>\
          <p><a href=" url "/xcalc>xcalc</a>\
          <p><a href=" url "/xload>xload</a>\
          <p><a href=" url "/exit>terminate script</a>\
          </body>\
          </html>"
          return tmp
        }

        #Saying Good-bye
        function Bye() {
          tmp = "<html>\
          <head><title>Simple gawk server</title></head>\
          <body><p>Script Terminated...</body>\
          </html>"
          return tmp
        }

        #Running Applications
        function RunApp(app) {
          if (app == "exit")  {x = 0}
          else if (app == "xterm") {system("xterm&")}
          else if (app == "xcalc") {system("xcalc&")}
          else if (app == "xload") {system("xload&")}
        }

===============================================================
26. How do I delete all fields up to field N, preserving input formatting?

    With a POSIX awk:

        awk '
        sub(/^[[:space:]]*([^[:space:]]*[[:space:]]*){N}/,"")'

    With GNU awk:

        gawk --re-interval '
        sub(/^[[:space:]]*([^[:space:]]*[[:space:]]*){N}/,"")'

    The number "N" within the "{...}" is the number of initial fields
    to delete.

    Note that "gensub()" is not available with "--posix" but it is
    available with "--re-interval". So if you need to use an interval
    expression (e.g. {1,} or {8} or {2,4}) with gensub(), you must use
    --re-interval rather than --posix; --re-interval is therefore
    generally the preferred method.

===============================================================
27. How do I extract the string that matches a RE?

        awk -v re='a|b' '
        function extract(str, regexp)
        {
            RMATCH = (match(str, regexp) ? substr(str, RSTART, RLENGTH) : "")
            return RSTART
        }
        extract($0, re) { print RMATCH }
        '

===============================================================
28. How do I substitute matched REs in *sub()?

        $ echo "abcbd" | awk 'sub(/b/,"|&|")'
        a|b|cbd
        $ echo "abcbd" | awk 'gsub(/b/,"|&|")'
        a|b|c|b|d
        $ echo "abcbd" | gawk '$0=gensub(/b/,"|&|","")'
        a|b|cbd
        $ echo "abcbd" | gawk '$0=gensub(/b/,"|&|","g")'
        a|b|c|b|d
        $ echo "abcbd" | gawk '$0=gensub(/(b)/,"|\\1|","")'
        a|b|cbd
        $ echo "abcbd" | gawk '$0=gensub(/(b)/,"|\\1|","g")'
        a|b|c|b|d
        $ echo "abcbd" | gawk '$0=gensub(/(b)(c)/,"|\\2\\1|","g")'
        a|cb|bd

===============================================================
29. How do I write changes back to the original file?
        awk '
        function saveRec(rec) { _File[++_Fnr] = rec }
        function printFile(   fnr) {
            if (_PrevFilename != "") {
                close(_PrevFilename)        # in case is called in END
                printf "" > _PrevFilename   # ensure later close() works
                for (fnr=1; fnr<=_Fnr; fnr++)
                    print _File[fnr] > _PrevFilename
                close(_PrevFilename)
            }
            _Fnr = 0
            _PrevFilename = FILENAME
        }
        FNR==1 { printFile() }
        { ... do stuff with $0...; saveRec( $0 ) }
        END { printFile() }
        ' file1 file2 ...

===============================================================
30. How do I convert a string to an array?

    To convert a string to an array indexed by each word's position in
    the string:

        awk 'BEGIN{str="abc def";c=split(str,arr);
             for (i=1;i<=c;i++) print arr[i]}'

    To convert a string to an array indexed by each word:

        awk 'BEGIN{str="abc def";c=split(str,tmp);
             for (i=1;i<=c;i++) arr[tmp[i]]++; delete tmp;
             for (w in arr) print w}'

===============================================================
31. How do I convert and diff 2 date/time values?

    This will print the number of seconds between 2 date/time values
    given in some non-standard format.

    gawk-only solution:

        function cvttime(t,   a) {
            split(t,a,"[/:]")
            match("JanFebMarAprMayJunJulAugSepOctNovDec",a[2])
            a[2] = sprintf("%02d",(RSTART+2)/3)
            return(mktime(a[3]" "a[2]" "a[1]" "a[4]" "a[5]" "a[6]))
        }
        BEGIN{
            t1="01/Dec/2005:00:04:42"
            t2="01/Dec/2005:17:14:12"
            print cvttime(t2) - cvttime(t1)
        }

===============================================================
32. How do I select a range of records?
The following idioms describe how to select a range of records given a
specific pattern to match:

a) Print all records from some pattern:

    awk '/pattern/{f=1}f' file

b) Print all records after some pattern:

    awk 'f;/pattern/{f=1}' file

c) Print the Nth record after some pattern:

    awk 'c&&!--c;/pattern/{c=N}' file

d) Print every record except the Nth record after some pattern:

    awk 'c&&!--c{next}/pattern/{c=N}1' file

e) Print the N records after some pattern:

    awk 'c&&c--;/pattern/{c=N}' file

f) Print every record except the N records after some pattern:

    awk 'c&&c--{next}/pattern/{c=N}1' file

g) Print the N records from some pattern:

    awk '/pattern/{c=N}c&&c--' file

The variable is named "f" for "found" where it acts as a flag and "c"
for "count" where it acts as a counter, as that's more expressive of
what the variable actually IS.

===============================================================
33. How do I remove text between 2 tags?

POSIX: use a 2-pass approach. First turn all the searched-for patterns
into a single char (control-B in this case, for no particular reason),
then use that char as the RS (since an RS that's an RE is gawk-only):

    awk '{$1=$1}1' FS='(begin|end)' OFS=^B file | awk 'NR%2' RS=^B ORS=

where the opening and closing tags are "begin" and "end" respectively.

The gawk equivalent is to use an RE directly for the RS:

    gawk -v RS='(begin|end)' -v ORS= 'NR%2'

===============================================================
98. Miscellaneous

===============================================================
99. Credits

Most of the information in this FAQ has been supplied by people other
than myself -- it just works better that way. The newsgroup readers
have a LOT more awk experience than I ever will (unless I multiply
myself by a few thousand, which is not legal with today's tax laws).
The following people have contributed to the well-being of the FAQ:

Version Two (from 2010):

    tim [at] menzies.us (Tim Menzies) <== maintainer
    arnold [at] skeeve.com (Arnold Robbins)
    g_r_a_n_t_ [at] bugsplatter.id.au
    mike [at] topcat.hypermart.net (Michael Sanders)
    mortonspam [at] gmail.com (Ed Morton)
    triflemenot [at] beewyz.com (Trifle Menot)

Version One (up until 2002):

    awkfaq at locutus.ofB.ORG (Russell Schulz) <== maintainer
    Alex.Schoenmakers [at] lhs.be
    David.Billinghurst [at] riotinto.com (David Billinghurst)
    Ferran.Jorba [at] uab.es (Ferran Jorba)
    Juergen.Kahrs [at] t-online.de
    Kalle.Tuulos [at] nmp.nokia.com (Kalle Tuulos)
    SimonN [at] draeger.com (Nicole Simon)
    afu [at] wta.att.ne.jp
    allen [at] gateway.grumman.com (John L. Allen)
    amnonc [at] mercury.co.il (Amnon Cohen)
    andrew_sumner [at] bigfoot.com (Andrew Sumner)
    arnold [at] skeeve.com (Arnold D. Robbins)
    art [at] pove.com (Art Povelones)
    bmarcum [at] iglou.com (Bill Marcum)
    boffi [at] rachele.stru.polimi.it (giacomo boffi)
    bps03z [at] email.mot.com (Peter Saffrey)
    brennan [at] whidbey.com (Michael D. Brennan)
    churchyh [at] ccwf.cc.utexas.edu (Henry Churchyard)
    db21 [at] ih4ess.ih.lucent.com (David Beyerl)
    dmckeon [at] swcp.com (Denis McKeon)
    dmeier.esperanto [at] gmx.de (Detlef Meier)
    dzubera [at] CS.ColoState.EDU (Zube)
    edgar.j.ramirez [at] lmco.com (Edgar J. Ramirez)
    eia018 [at] comp.lancs.ac.uk (Dr Andrew Wilson)
    epement [at] ripco.com (Eric Pement)
    gavin [at] wraith.u-net.com (Gavin Wraith)
    hankedr [at] mail.auburn.edu (Darrel Hankerson)
    hastinga [at] tarim.dialogic.com (Austin Hastings)
    heiner.steven [at] nexgo.de (Heiner Steven)
    hstein [at] airmail.net (Harry Stein)
    j-korsv [at] online.no (Jon-Egil Korsvold)
    jari.aalto [at] ntc.nokia.com (Jari Aalto)
    jblaine [at] shore.net (Jeff Blaine)
    jerabek [at] rm6208.gud.siemens.co.at (Martin Jerabek)
    jesusmc [at] scripps.edu (Jesus M. Castagnetto)
    jidanni [at] kimo.com.tw (Dan Jacobson)
    jlaiho [at] ichaos.nullnet.fi (Juha Laiho)
    jland [at] worldnet.att.net (Jim Land)
    jmccann [at] WOLFENET.com (James McCann)
    joe [at] plaguesplace.dyndns.org
    johnd [at] mozart.inet.co.th (John DeHaven)
    kahrs [at] iSenseIt.de (Juergen Kahrs)
    konrad [at] netcom.com (Konrad Hambrick)
    lehalle [at] earthling.net (Charles-Albert Lehalle)
    lothar [at] u-aizu.ac.jp (Lothar M. Schmitt)
    mark [at] ispc001.demon.co.uk (Mark Katz)
    markus [at] biewer.com (Markus B. Biewer)
    monty [at] primenet.com (Jim Monty)
    morrisl [at] scn.org (Larry D. Morris)
    neel [at] gnu.org
    neil_mahoney [at] il.us.swissbank.com (Neil Mahoney)
    neitzel [at] gaertner.de (Martin Neitzel)
    peter.tillier [at] btinternet.com (Peter S Tillier)
    pez68 [at] netscape.net (Peter Stromberg)
    phil [at] bolthole.com (Philip Brown)
    pholzleitner [at] unido.org (Peter HOLZLEITNER)
    pierre [at] mail.asianet.it (Gianni Rondinini)
    pjf [at] osiris.cs.uoguelph.ca (Peter Jaspers-Fayer)
    pjfarley [at] banet.net (Peter J. Farley III)
    ptjm [at] interlog.com (Patrick TJ McPhee)
    rms [at] friko.onet.pl (Rafal Sulejman)
    robin.moffatt [at] ntlworld.com (Robin Moffatt)
    rwab1 [at] cl.cam.ac.uk (Ralph Becket)
    saguyami [at] post.tau.ac.il (Shay)
    thobe [at] lafn.org (Glenn Thobe)
    thull [at] ocston.org (Tom Hull)
    tim [at] consultix-inc.com (Tim Maher/CONSULTIX)
    vincent [at] delau.nl (Vincent de Lau)
    vjpnreddy [at] hotmail.com (Jaya Reddy)
    walkerj [at] compuserve.com (James G. Walker)
    walter [at] wbriscoe.demon.co.uk (Walter Briscoe)
    yuli.barcohen [at] telrad.co.il (Yuli Barcohen)

Thanks.

===============================================================
thus endeth the awk FAQ.