Monday, April 21, 2008

sed programming - command combinations

Sed is a programming language. To me it basically consists of two things: regular expressions, and control commands such as "g G H h" that manipulate the hold buffer and the pattern buffer. Sed has only these two buffers, so the language itself is simple.

Although sed itself is simple, programming with it can be complicated. It is like chess: knowing the rule for each piece does not mean you know how to play. You need to know at least some combinations.

The "G h d" combination. The example "1!G;h;$!d" reverses a list of items, one per line. It runs through three states: the first line executes "h d", every middle line executes "G h d" (this state repeats), and the last line executes "G h". The explanation is not hard to follow: the hold buffer accumulates the result. For the first item, sed saves it in the hold buffer and deletes the pattern buffer. For the second item, "G" appends "\n" plus the contents of the hold buffer to the pattern buffer, and "h" saves the result back to the hold buffer, so the second item now sits ahead of the first. The same happens for the third, the fourth, and so on. The last item gets only "G h": without "d" the pattern buffer is not deleted, so at the end it contains the reversed list, which sed then prints. (In my opinion the last "h" is unnecessary, since the hold buffer is only scratch space.)
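To watch the three states at work, here is a minimal sketch that feeds three lines through the one-liner. The function name reverse_lines and the sample words are made up for illustration.

```shell
# Reverse the input lines with the "G h d" combination.
reverse_lines() {
    # 1!G  : on every line except the first, append "\n" + hold buffer
    # h    : save the pattern buffer (the reversed list so far) to the hold buffer
    # $!d  : on every line except the last, delete and start the next cycle
    sed '1!G;h;$!d'
}

reversed=$(printf 'one\ntwo\nthree\n' | reverse_lines)
printf '%s\n' "$reversed"
# prints:
# three
# two
# one
```

Only the last line escapes the "d", so sed prints the pattern buffer exactly once, when it already holds the whole reversed list.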

"x;x" combination. A second example:

/^<\/example>/ { h; }
x
/^<example>/ { x; p; d; }
x
/^<example>/ { h; }

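This five-line script prints the content of an XML "example" element. It takes a little time to figure out how it works: the hold buffer serves as a flag. The last line stores the opening tag there, the first line overwrites it with the closing tag, and the "x ... x" pair in between peeks at the flag and prints the current line only while the opening tag is held. The script uses neither the "b" nor the "n" command; remember the trick, since it is handy in some cases. An alternative version may be easier to understand. Here the hold buffer collects the lines of the example element.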

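/^<example>/ {
# start the hold buffer with the opening tag
h
:example-loop
# fetch the next input line
n
/^<\/example>/! {
# not the closing tag yet: append the line to the hold buffer and loop
H
b example-loop
}
# closing tag reached: swap the collected lines in, print them, start a new cycle
x
p
d
}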
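As a rough check, the sketch below runs both variants under "sed -n" on the same input. The sample text and the variable names (sample, flag_out, loop_out) are invented for this illustration, and the opening tag is assumed to be written without a space. The label is shortened to "loop".

```shell
# Hypothetical sample input: only the <example> element should come out.
sample='intro
<example>
line one
line two
</example>
outro'

# "x;x" version: the hold buffer is a flag that says "inside the element".
flag_out=$(printf '%s\n' "$sample" | sed -n '
/^<\/example>/ { h; }
x
/^<example>/ { x; p; d; }
x
/^<example>/ { h; }
')

# Branch version: the hold buffer accumulates the element line by line.
loop_out=$(printf '%s\n' "$sample" | sed -n '
/^<example>/ {
h
:loop
n
/^<\/example>/!{
H
b loop
}
x
p
d
}
')

printf '%s\n' "$flag_out"   # the two content lines only
printf '%s\n' "$loop_out"   # the opening tag followed by the two content lines
```

The two variants are not quite equivalent: the "x;x" version prints only the lines between the tags, while the branch version also prints the opening tag, because "h" stores it before the loop starts.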


Here are some of my thoughts:
1. "n" is always used together with "p", and therefore with "sed -n". And when you use "p", it is almost always followed by "d". You don't need "b end", since "d" itself starts a new cycle, that is, it reads the next line into the pattern buffer.
2. Branching makes the flow easy to follow, but in some cases the "x;x" combination is worth considering too.

Other thoughts:
3. EOL: use "sed -e 's/\r$//' your_file" to convert line endings from DOS to Unix (the "\r" escape needs GNU sed).
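Since "\r" in a sed regexp is a GNU extension, a more portable sketch passes a literal carriage return in from the shell. The names cr and strip_cr are made up for this example.

```shell
# Keep a literal carriage return in a variable. Command substitution
# strips trailing newlines only, so the CR survives.
cr=$(printf '\r')

# Remove one trailing CR from each line (DOS to Unix line endings).
strip_cr() {
    sed "s/${cr}\$//"
}

converted=$(printf 'first\r\nsecond\r\n' | strip_cr)
printf '%s\n' "$converted"
```

Anchoring the pattern with "$" means only a carriage return at the end of a line is removed, never one in the middle.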