sed quick

From Devpit

Escaping a JSON string

The following escapes a string's backslashes, quotes, tabs, and newlines using only sed and sh. It should be portable, and it works at least on FreeBSD with no dependencies:

printf '%b' 't\\est\ttest\n{}['\'\'\''"""]\n' | (cat; echo .) | sed -En '1x;1!H;${x;s~['\\\"\'']~\\&~g;s~\n~\\n~g;s~'"$(printf '\t')"'~\\t~g;s~\.$~~;p;}'

-E selects extended regular expressions, and -n suppresses sed's automatic printing, so only the explicit p produces output.

Granted, this doesn't catch backspace, formfeed, carriage return, or other control characters. Is there an easy way to match those in sed?
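
One possible approach, sketched here as an assumption rather than a tested answer, reuses the printf trick from the tab substitution: embed the literal carriage return, form feed, and backspace characters in the sed script. The function name escape_ctl is hypothetical. Other control characters (U+0000 through U+001F) would still need \u00XX escapes in JSON, which this sketch does not produce.

```shell
# Literal control characters captured once; command substitution only
# strips trailing newlines, so these survive intact.
cr=$(printf '\r')
ff=$(printf '\f')
bs=$(printf '\b')

# escape_ctl: hypothetical helper that rewrites CR, FF and BS as the
# two-character JSON escapes \r, \f and \b, line by line.
escape_ctl() {
    sed 's~'"$cr"'~\\r~g;s~'"$ff"'~\\f~g;s~'"$bs"'~\\b~g'
}

printf 'a\rb\fc\bd\n' | escape_ctl    # prints a\rb\fc\bd
```

The same three s commands could be added inside the $ block of the full script.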

Much of this higgledy-piggledy is because sed cannot match the newline that terminates each line: sed strips it before the line reaches the pattern space. The idea is to append every line to the hold space and, after appending the last line, fix everything up and print it. To handle the terminating newline correctly, append an extra dot to the input and strip it from the output. The dot could have been part of the test input, but adding it separately with (cat; echo .) keeps the placeholder for arbitrary input pure.
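
For reuse in a script, the pipeline can be wrapped in a function. This is a minimal sketch: the name json_escape is hypothetical, and, as a deliberate change from the one-liner, it escapes only backslash and double quote, because JSON defines no \' escape. It emits the escaped text without surrounding quotes.

```shell
# json_escape: hypothetical helper; reads stdin, writes the JSON-escaped
# text (without surrounding quotes) to stdout.
json_escape() {
    tab=$(printf '\t')
    # Append a sentinel dot so a trailing newline in the input survives,
    # then join all lines in the hold space and escape them in one pass.
    { cat; echo .; } |
    sed -En '1x;1!H;${x;s~[\"]~\\&~g;s~\n~\\n~g;s~'"$tab"'~\\t~g;s~\.$~~;p;}'
}

printf '%b' 't\\est\ttest\n{}['\'\'\''"""]\n' | json_escape
```

With the original test input this prints t\\est\ttest\n{}['''\"\"\"]\n, leaving the single quotes untouched.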

To walk through:

1x;1!H;

On line 1 only, swap the line into the hold space. On every other line, append a newline and the line to the hold space. I originally wrote this as just H, but that produced an extra leading newline, because the first H faithfully appends a newline and the line to the empty hold space. So swap on the first line, then append thereafter.
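
A quick way to see the leading-newline problem is to run both variants on two short lines and print the joined result unchanged:

```shell
# Plain H: the first line is appended after a newline to the empty
# hold space, so the joined text starts with a stray newline.
printf 'a\nb\n' | sed -n 'H;${x;p;}'        # prints an empty line, then a, then b

# Swap on line 1, append only afterwards: no stray newline.
printf 'a\nb\n' | sed -n '1x;1!H;${x;p;}'   # prints a, then b
```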

${}

Apply the commands within the braces only to the last line.

x;

Swap the hold space into the pattern space. By this point all of the input has been concatenated into the hold space, so a single swap puts every line of input into the pattern space.
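
To see why the swap is needed: at the last line the pattern space still holds only that line, while the hold space has everything accumulated so far.

```shell
# Without x, p sees only the current (last) line.
printf 'a\nb\nc\n' | sed -n '1x;1!H;${p;}'      # prints c

# With x, the accumulated hold space becomes the pattern space.
printf 'a\nb\nc\n' | sed -n '1x;1!H;${x;p;}'    # prints a, b and c on three lines
```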

s~['\\\"\'']~\\&~g;

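Prefix each backslash and quote with a backslash. After shell quoting, sed sees the bracket expression [\"'], which matches a backslash, a double quote, or a single quote. Note that JSON (RFC 8259) defines no \' escape, so for strict JSON the single quote should be left out, e.g. s~[\"]~\\&~g inside a single-quoted script.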
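
This step can be tried on its own, line by line. The sketch assumes strict JSON output, so its bracket expression matches only backslash and double quote; single quotes pass through unchanged.

```shell
# sed receives s~[\"]~\\&~g: inside a bracket expression the backslash
# is literal, and \\& emits a backslash followed by the matched character.
printf '%s\n' 'a\b"c' "it's" | sed 's~[\"]~\\&~g'
# a\\b\"c
# it's
```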

s~\n~\\n~g;

Replace every newline with the two characters \n. Without the trailing dot, this would not catch a terminating newline, because sed has already stripped it; in some cases that may be preferable.
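
The effect of the sentinel on this step is easy to compare directly:

```shell
# No sentinel: the final newline was stripped by sed, so no trailing \n.
printf 'a\nb\n' | sed -n '1x;1!H;${x;s~\n~\\n~g;p;}'
# a\nb

# Sentinel dot: the real trailing newline becomes an embedded one.
printf 'a\nb\n' | (cat; echo .) | sed -n '1x;1!H;${x;s~\n~\\n~g;s~\.$~~;p;}'
# a\nb\n
```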

s~'"$(printf '\t')"'~\\t~g;

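Portable sed has no escape sequence for tab; the only option is a literal tab character. "$(printf '\t')" is handy (although cluttered) for inserting one without other quoting problems: the single-quoted script is closed, the command substitution supplies the tab inside double quotes, and a new single-quoted section resumes.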
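
A variant that may read less cluttered stores the tab in a variable first (the name tab is just for illustration). GNU sed also accepts \t in a regex, but that is an extension, so the literal tab remains the portable choice.

```shell
# Capture a literal tab once, then splice it into the script.
tab=$(printf '\t')
printf 'a\tb\n' | sed 's~'"$tab"'~\\t~g'    # prints a\tb
```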

s~\.$~~;

Strip the placeholder dot. The $ anchors the match to the end of the pattern space, so only the appended dot is removed.
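
To check that only the placeholder is stripped, this sketch feeds two inputs through the same pipeline: one whose text itself ends in a dot, and one with no trailing newline at all.

```shell
# A dot in the input survives; only the sentinel at the very end goes.
printf 'end.\n' | (cat; echo .) | sed -n '1x;1!H;${x;s~\n~\\n~g;s~\.$~~;p;}'
# end.\n

# No trailing newline in the input: the dot joins the last line, so no \n appears.
printf 'end' | (cat; echo .) | sed -n '1x;1!H;${x;s~\n~\\n~g;s~\.$~~;p;}'
# end
```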

p;

Print the result. Because of -n, this is the only output; the closing } ends the $ block.