File:  [Local Repository] / gnujdoc / sed-4.1.2 / sed-ja.texi
Revision 1.1
Sat Oct 1 13:35:24 2005 UTC (15 years, 1 month ago) by futoshi
Branches: MAIN
CVS tags: HEAD
add sed-4.1.2.

    1: \input texinfo  @c -*-texinfo-*-
    2: @c Do not edit this file!! It is automatically generated from sed-in.texi.
    3: @c
    4: @c -- Stuff that needs adding: ----------------------------------------------
    5: @c (document the `;' command-separator)
    6: @c --------------------------------------------------------------------------
    7: @c Check for consistency: regexps in @code, text that they match in @samp.
    8: @c 
    9: @c Tips:
   10: @c    @command for command
   11: @c    @samp for command fragments: @samp{cat -s}
   12: @c    @code for sed commands and flags
   13: @c    Use ``quote'' not `quote' or "quote".
   14: @c
   15: @c %**start of header
   16: @setfilename sed-ja.info
   17: @settitle sed, a stream editor
   18: @c %**end of header
   19: 
   20: @c @documentlanguage ja
   21: 
   22: @c @smallbook
   23: 
   24: @include sed-v.texi
   25: 
   26: @c Combine indices.
   27: @syncodeindex ky cp
   28: @syncodeindex pg cp
   29: @syncodeindex tp cp
   30: 
   31: @defcodeindex op
   32: @syncodeindex op fn
   33: 
   34: @include config-ja.texi
   35: 
   36: @copying
   37: This file documents version @value{VERSION} of
   38: @value{SSED}, a stream editor.
   39: 
   40: Copyright @copyright{} 1998, 1999, 2001, 2002, 2003, 2004 Free
   41: Software Foundation, Inc.
   42: 
   43: This document is released under the terms of the GNU Free Documentation
   44: License as published by the Free Software Foundation; either version 1.1, or
   45: (at your option) any later version.
   46: 
   47: You should have received a copy of the GNU Free Documentation License along
   48: with @value{SSED}; see the file @file{COPYING.DOC}.  If not, write to the Free
   49: Software Foundation, 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
   50: 
   51: There are no Cover Texts and no Invariant Sections; this text, along
   52: with its equivalent in the printed manual, constitutes the Title Page.
   53: @end copying
   54: 
   55: @setchapternewpage off
   56: 
   57: @titlepage
   58: @title @command{sed}, a stream editor
   59: @subtitle version @value{VERSION}, @value{UPDATED}
   60: @author by Ken Pizzini, Paolo Bonzini
   61: @c Japanese translation: 西尾 太
   62: 
   63: @page
   64: @vskip 0pt plus 1filll
   65: Copyright @copyright{} 1998, 1999 Free Software Foundation, Inc.
   66: 
   67: @insertcopying
   68: 
   69: Published by the Free Software Foundation, @*
   70: 59 Temple Place - Suite 330 @*
   71: Boston, MA 02111-1307, USA
   72: @end titlepage
   73: 
   74: 
   75: @node Top
   76: @top
   77: 
   78: @ifnottex
   79: @insertcopying
   80: @end ifnottex
   81: 
   82: @menu
   83: * Introduction::               Introduction
   84: * Invoking sed::               Invocation
   85: * sed Programs::               @command{sed} programs
   86: * Examples::                   Some sample scripts
   87: * Limitations::                Limitations and (non-)limitations of @value{SSED}
   88: * Other Resources::            Other resources for learning about @command{sed}
   89: * Reporting Bugs::             Reporting bugs
   90: 
   91: * Extended regexps::           @command{egrep}-style regular expressions
   92: @ifset PERL
   93: * Perl regexps::               Perl-style regular expressions
   94: @end ifset
   95: 
   96: * Concept Index::              A menu with all the topics in this manual.
   97: * Command and Option Index::   A menu with all @command{sed} commands and
   98:                                command-line options.
   99: 
  100: @detailmenu
  101: @c --- The detailed node listing ---
  102: --- The detailed node listing ---
  103: 
  104: @c sed Programs:
  105: sed Programs:
  106: * Execution Cycle::                 How @command{sed} works
  107: * Addresses::                       Selecting lines with @command{sed}
  108: * Regular Expressions::             Overview of regular expression syntax
  109: * Common Commands::                 Often used commands
  110: * The "s" Command::                 @command{sed}'s Swiss Army Knife
  111: * Other Commands::                  Less frequently used commands
  112: * Programming Commands::            Commands for @command{sed} gurus
  113: * Extended Commands::               Commands specific of @value{SSED}
  114: * Escapes::                         Specifying special characters
  115: 
  116: @c Examples:
  117: Examples:
  118: * Centering lines::
  119: * Increment a number::
  120: * Rename files to lower case::
  121: * Print bash environment::
  122: * Reverse chars of lines::
  123: * tac::                             Reverse lines of files
  124: * cat -n::                          Numbering lines
  125: * cat -b::                          Numbering non-blank lines
  126: * wc -c::                           Counting chars
  127: * wc -w::                           Counting words
  128: * wc -l::                           Counting lines
  129: * head::                            Printing the first lines
  130: * tail::                            Printing the last lines
  131: * uniq::                            Make duplicate lines unique
  132: * uniq -d::                         Print duplicated lines of input
  133: * uniq -u::                         Remove all duplicated lines
  134: * cat -s::                          Squeezing blank lines
  135: 
  136: @ifset PERL
  137: @c Perl regexps::                      Perl-style regular expressions
  138: Perl regexps::                      Perl-style regular expressions
  139: * Backslash::                       Introduces special sequences
  140: * Circumflex/dollar sign/period::   Behave specially with regard to new lines
  141: * Square brackets::                 Are a bit different in strange cases
  142: * Options setting::                 Toggle modifiers in the middle of a regexp
  143: * Non-capturing subpatterns::       Are not counted when backreferencing
  144: * Repetition::                      Allows for non-greedy matching
  145: * Backreferences::                  Allows for more than 10 back references
  146: * Assertions::                      Allows for complex look ahead matches
  147: * Non-backtracking subpatterns::    Often gives more performance
  148: * Conditional subpatterns::         Allows if/then/else branches
  149: * Recursive patterns::              For example to match parentheses
  150: * Comments::                        Because things can get complex...
  151: @end ifset
  152: 
  153: @end detailmenu
  154: @end menu
  155: 
  156: 
  157: @node Introduction
  158: @c @chapter Introduction
  159: @chapter Introduction
  160: 
  161: @cindex Stream editor
  162: @c @command{sed} is a stream editor.
  163: @c A stream editor is used to perform basic text
  164: @c transformations on an input stream
  165: @c (a file or input from a pipeline).
  166: @c While in some ways similar to an editor which
  167: @c permits scripted edits (such as @command{ed}),
  168: @c @command{sed} works by making only one pass over the
  169: @c input(s), and is consequently more efficient.
  170: @c But it is @command{sed}'s ability to filter text in a pipeline
  171: @c which particularly distinguishes it from other types of
  172: @c editors.
  173: @c 
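  174: @command{sed} is a stream editor.  A stream editor is used to perform
  175: basic text transformations on an input stream (a file or input from a
  176: pipeline).  While in some ways similar to an editor which permits
  177: scripted edits (such as @command{ed}), @command{sed} works by making
  178: only one pass over the input(s), and is consequently more efficient.
  179: But it is @command{sed}'s ability to filter text in a pipeline which
  180: particularly distinguishes it from other types of editors.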
  181: 
  182: @node Invoking sed
  183: @c @chapter Invocation
  184: @chapter Invocation
  185: 
  186: @c Normally @command{sed} is invoked like this:
  187: @c 
  188: Normally @command{sed} is invoked like this:
  189: 
  190: @example
  191: sed SCRIPT INPUTFILE...
  192: @end example
  193: 
  194: @c The full format for invoking @command{sed} is:
  195: @c 
  196: The full format for invoking @command{sed} is:
  197: 
  198: @example
  199: sed OPTIONS... [SCRIPT] [INPUTFILE...]
  200: @end example
  201: 
  202: @c If you do not specify @var{INPUTFILE}, or if @var{INPUTFILE} is @file{-},
  203: @c @command{sed} filters the contents of the standard input.  The @var{script}
  204: @c is actually the first non-option parameter, which @command{sed} specially
  205: @c considers a script and not an input file if (and only if) none of the
  206: @c other @var{options} specifies a script to be executed, that is if neither
  207: @c of the @option{-e} and @option{-f} options is specified.
  208: @c 
  209: If you do not specify @var{INPUTFILE}, or if @var{INPUTFILE} is @file{-},
  210: @command{sed} filters the contents of the standard input.  The
  211: @var{script} is actually the first non-option parameter, which
  212: @command{sed} specially considers a script and not an input file if
  213: (and only if) none of the other @var{options} specifies a script to be
  214: executed, that is, if neither of the @option{-e} and @option{-f} options
  215: is specified.
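
The rule about which argument counts as the script can be tried with a small sketch.  It assumes GNU @command{sed} and a POSIX shell; the scratch file and the @code{out_*} variable names are hypothetical and exist only for this illustration.

```shell
# Hypothetical scratch file used only for this illustration.
tmp=$(mktemp)
printf 'hello world\n' > "$tmp"

# No -e or -f: the first non-option argument is the script,
# and the remaining arguments are input files.
out_positional=$(sed 's/hello/goodbye/' "$tmp")

# With -e, every non-option argument is treated as an input file,
# and the -e scripts are applied in the order given.
out_expression=$(sed -e 's/hello/goodbye/' -e 's/world/moon/' "$tmp")

echo "$out_positional"   # goodbye world
echo "$out_expression"   # goodbye moon
rm -f "$tmp"
```

Both calls read the same file; only the way the script is passed differs.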
  216: 
  217: @c @command{sed} may be invoked with the following command-line options:
  218: @c 
  219: @command{sed} may be invoked with the following command-line
  220: options:
  221: 
  222: @table @code
  223: @item --version
  224: @opindex --version
  225: @cindex Version, printing
  226: @c Print out the version of @command{sed} that is being run and a copyright notice,
  227: @c then exit.
  228: @c 
  229: Print out the version of @command{sed} that is being run and a copyright notice, then exit.
  230: 
  231: @item --help
  232: @opindex --help
  233: @cindex Usage summary, printing
  234: @c Print a usage message briefly summarizing these command-line options
  235: @c and the bug-reporting address,
  236: @c then exit.
  237: @c 
  238: Print a usage message briefly summarizing these command-line options
  239: and the bug-reporting address, then exit.
  240: 
  241: @item -n
  242: @itemx --quiet
  243: @itemx --silent
  244: @opindex -n
  245: @opindex --quiet
  246: @opindex --silent
  247: @cindex Disabling autoprint, from command line
  248: @c By default, @command{sed} prints out the pattern space
  249: @c at the end of each cycle through the script.
  250: @c These options disable this automatic printing,
  251: @c and @command{sed} only produces output when explicitly told to
  252: @c via the @code{p} command.
  253: @c 
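  254: By default, @command{sed} prints out the pattern space at the end of
  255: each cycle through the script.  These options disable this automatic
  256: printing, and @command{sed} only produces output when explicitly told
  257: to via the @code{p} command.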
  258: 
  259: @item -i[@var{SUFFIX}]
  260: @itemx --in-place[=@var{SUFFIX}]
  261: @opindex -i
  262: @opindex --in-place
  263: @cindex In-place editing, activating
  264: @cindex @value{SSEDEXT}, in-place editing
  265: @c This option specifies that files are to be edited in-place.
  266: @c @value{SSED} does this by creating a temporary file and
  267: @c sending output to this file rather than to the standard
  268: @c output.@footnote{This applies to commands such as @code{=},
  269: @c @code{a}, @code{c}, @code{i}, @code{l}, @code{p}.  You can
  270: @c still write to the standard output by using the @code{w}
  271: @c @cindex @value{SSEDEXT}, @file{/dev/stdout} file
  272: @c or @code{W} commands together with the @file{/dev/stdout}
  273: @c special file}.
  274: @c 
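  275: This option specifies that files are to be edited in-place.
  276: @value{SSED} does this by creating a temporary file and sending output
  277: to this file rather than to the standard output.@footnote{This applies
  278: to commands such as @code{=}, @code{a}, @code{c}, @code{i}, @code{l}, @code{p}.
  279: @cindex @value{SSEDEXT}, @file{/dev/stdout} file
  280: You can still write to the standard output by using the @code{w} or
  281: @code{W} commands together with the @file{/dev/stdout} special file}.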
  282: 
  283: @c This option implies @option{-s}.
  284: @c 
  285: This option implies @option{-s}.
  286: 
  287: @c When the end of the file is reached, the temporary file is
  288: @c renamed to the output file's original name.  The extension,
  289: @c if supplied, is used to modify the name of the old file
  290: @c before renaming the temporary file, thereby making a backup
  291: @c copy@footnote{Note that @value{SSED} creates the backup
  292: @c     file whether or not any output is actually changed.}).
  293: @c 
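  294: When the end of the file is reached, the temporary file is renamed to
  295: the output file's original name.  The extension, if supplied, is used to
  296: modify the name of the old file before renaming the temporary file,
  297: thereby making a backup copy@footnote{Note that @value{SSED} creates the
  298: backup file whether or not any output is actually changed.}.  The
  299: backup name is built according to the following rule.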
  300: 
  301: @cindex In-place editing, Perl-style backup file names
  302: @c This rule is followed: if the extension doesn't contain a @code{*},
  303: @c then it is appended to the end of the current filename as a
  304: @c suffix; if the extension does contain one or more @code{*}
  305: @c characters, then @emph{each} asterisk is replaced with the
  306: @c current filename.  This allows you to add a prefix to the
  307: @c backup file, instead of (or in addition to) a suffix, or
  308: @c even to place backup copies of the original files into another
  309: @c directory (provided the directory already exists).
  310: @c 
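  311: If the extension doesn't contain a @code{*}, then it is appended to the
  312: end of the current filename as a suffix; if the extension does contain
  313: one or more @code{*} characters, then @emph{each} asterisk is replaced
  314: with the current filename.  This allows you to add a prefix to the backup
  315: file, instead of (or in addition to) a suffix, or even to place backup copies
  316: of the original files into another directory (provided the directory already exists).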
  317: 
  318: @c If no extension is supplied, the original file is
  319: @c overwritten without making a backup.
  320: @c 
  321: If no extension is supplied, the original file is overwritten without
  322: making a backup.
  323: 
  324: @item -l @var{N}
  325: @itemx --line-length=@var{N}
  326: @opindex -l
  327: @opindex --line-length
  328: @cindex Line length, setting
  329: @c Specify the default line-wrap length for the @code{l} command.
  330: @c A length of 0 (zero) means to never wrap long lines.  If
  331: @c not specified, it is taken to be 70.
  332: @c 
  333: Specify the default line-wrap length for the @code{l} command.  A
  334: length of 0 (zero) means to never wrap long lines.  If not specified,
  335: it is taken to be 70.
  336: 
  337: @item --posix
  338: @cindex @value{SSEDEXT}, disabling
  339: @c @value{SSED} includes several extensions to @acronym{POSIX}
  340: @c sed.  In order to simplify writing portable scripts, this
  341: @c option disables all the extensions that this manual documents,
  342: @c including additional commands.
  343: @c 
  344: @value{SSED} includes several extensions to @acronym{POSIX}
  345: @command{sed}.  In order to simplify writing portable scripts, this option
  346: disables all the extensions that this manual documents, including additional commands.
  347: 
  348: @cindex @code{POSIXLY_CORRECT} behavior, enabling
  349: @c Most of the extensions accept @command{sed} programs that
  350: @c are outside the syntax mandated by @acronym{POSIX}, but some
  351: @c of them (such as the behavior of the @command{N} command
  352: @c described in @pxref{Reporting Bugs}) actually violate the
  353: @c standard.  If you want to disable only the latter kind of
  354: @c extension, you can set the @code{POSIXLY_CORRECT} variable
  355: @c to a non-empty value.
  356: @c 
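  357: Most of the extensions accept @command{sed} programs that are outside
  358: the syntax mandated by @acronym{POSIX}, but some of them (such as the
  359: behavior of the @command{N} command, @pxref{Reporting Bugs})
  360: actually violate the standard.  If you want to disable only the latter kind of
  361: extension, you can set the @code{POSIXLY_CORRECT} variable to a non-empty value.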
  362: 
  363: @item -r
  364: @itemx --regexp-extended
  365: @opindex -r
  366: @opindex --regexp-extended
  367: @cindex Extended regular expressions, choosing
  368: @cindex @acronym{GNU} extensions, extended regular expressions
  369: @c Use extended regular expressions rather than basic
  370: @c regular expressions.  Extended regexps are those that
  371: @c @command{egrep} accepts; they can be clearer because they
  372: @c usually have less backslashes, but are a @acronym{GNU} extension
  373: @c and hence scripts that use them are not portable.
  374: @c @xref{Extended regexps, , Extended regular expressions}.
  375: @c 
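  376: Use extended regular expressions rather than basic regular expressions.
  377: Extended regexps are those that @command{egrep} accepts; they can be
  378: clearer because they usually have fewer backslashes, but are a @acronym{GNU}
  379: extension and hence scripts that use them are not portable.
  380: @xref{Extended regexps, , Extended regular expressions}.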
  381: 
  382: @ifset PERL
  383: @item -R
  384: @itemx --regexp-perl
  385: @opindex -R
  386: @opindex --regexp-perl
  387: @cindex Perl-style regular expressions, choosing
  388: @cindex @value{SSEDEXT}, Perl-style regular expressions
  389: @c Use Perl-style regular expressions rather than basic
  390: @c regular expressions.  Perl-style regexps are extremely
  391: @c powerful but are a @value{SSED} extension and hence scripts that
  392: @c use it are not portable.  @xref{Perl regexps, ,
  393: @c Perl-style regular expressions}.
  394: @c 
  395: Use Perl-style regular expressions rather than basic regular
  396: expressions.  Perl-style regexps are extremely powerful but are a
  397: @value{SSED} extension and hence scripts that use them are not portable.
  398: @xref{Perl regexps, , Perl-style regular expressions}.
  399: @end ifset
  400: 
  401: @item -s
  402: @itemx --separate
  403: @cindex Working on separate files
  404: @c By default, @command{sed} will consider the files specified on the
  405: @c command line as a single continuous long stream.  This @value{SSED}
  406: @c extension allows the user to consider them as separate files:
  407: @c range addresses (such as @samp{/abc/,/def/}) are not allowed
  408: @c to span several files, line numbers are relative to the start
  409: @c of each file, @code{$} refers to the last line of each file,
  410: @c and files invoked from the @code{R} commands are rewound at the
  411: @c start of each file.
  412: @c 
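  413: By default, @command{sed} will consider the files specified on the
  414: command line as a single continuous long stream.  This @value{SSED}
  415: extension allows the user to consider them as separate files: range
  416: addresses (such as @samp{/abc/,/def/}) are not allowed to span several
  417: files, line numbers are relative to the start of each file, @code{$}
  418: refers to the last line of each file, and files invoked from the
  419: @code{R} commands are rewound at the start of each file.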
  420: 
  421: @item -u
  422: @itemx --unbuffered
  423: @opindex -u
  424: @opindex --unbuffered
  425: @cindex Unbuffered I/O, choosing
  426: @c Buffer both input and output as minimally as practical.
  427: @c (This is particularly useful if the input is coming from
  428: @c the likes of @samp{tail -f}, and you wish to see the transformed
  429: @c output as soon as possible.)
  430: @c 
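  431: Buffer both input and output as minimally as practical.  (This is
  432: particularly useful if the input is coming from the likes of
  433: @samp{tail -f}, and you wish to see the transformed output as soon as possible.)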
  434: 
  435: @item -e @var{script}
  436: @itemx --expression=@var{script}
  437: @opindex -e
  438: @opindex --expression
  439: @cindex Script, from command line
  440: @c Add the commands in @var{script} to the set of commands to be
  441: @c run while processing the input.
  442: @c 
  443: Add the commands in @var{script} to the set of commands to be run
  444: while processing the input.
  445: 
  446: @item -f @var{script-file}
  447: @itemx --file=@var{script-file}
  448: @opindex -f
  449: @opindex --file
  450: @cindex Script, from a file
  451: @c Add the commands contained in the file @var{script-file}
  452: @c to the set of commands to be run while processing the input.
  453: @c 
  454: Add the commands contained in the file @var{script-file} to the set of
  455: commands to be run while processing the input.
  456: @end table
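
A few of the options above can be tried together in a short sketch.  It assumes GNU @command{sed} and a POSIX shell; the scratch directory, the file names and the @code{out_*} variables are hypothetical.

```shell
# Hypothetical scratch directory with two small input files.
dir=$(mktemp -d)
printf 'a1\na2\n' > "$dir/a.txt"
printf 'b1\nb2\n' > "$dir/b.txt"

# -n turns off autoprint, so only the line selected for p is printed.
# Without -s the two files form one stream and line 2 exists once.
out_quiet=$(sed -n '2p' "$dir/a.txt" "$dir/b.txt")

# -s restarts line numbering for each file, so both second lines match.
out_separate=$(sed -s -n '2p' "$dir/a.txt" "$dir/b.txt")

# -i with a plain suffix: edit in place, keep the old content in a.txt.bak.
sed -i.bak 's/a1/A1/' "$dir/a.txt"
out_edited=$(cat "$dir/a.txt")
out_backup=$(cat "$dir/a.txt.bak")

# -i with an asterisk: each * is replaced by the file name,
# so the backup gets a prefix (orig_b.txt) instead of a suffix.
(cd "$dir" && sed -i'orig_*' 's/b1/B1/' b.txt)
out_prefix_backup=$(cat "$dir/orig_b.txt")

printf '%s\n' "$out_quiet" "$out_separate" "$out_edited" "$out_backup" "$out_prefix_backup"
rm -rf "$dir"
```

Since @option{-i} and @option{-s} are documented here as extensions, other @command{sed} implementations may not accept them in this form.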
  457: 
  458: @c If no @option{-e}, @option{-f}, @option{--expression}, or @option{--file}
  459: @c options are given on the command-line,
  460: @c then the first non-option argument on the command line is
  461: @c taken to be the @var{script} to be executed.
  462: @c 
  463: If no @option{-e}, @option{-f}, @option{--expression}, or @option{--file}
  464: options are given on the command line, then the first non-option argument
  465: on the command line is taken to be the @var{script} to be executed.
  466: 
  467: @cindex Files to be processed as input
  468: @c If any command-line parameters remain after processing the above,
  469: @c these parameters are interpreted as the names of input files to
  470: @c be processed.
  471: @c 
  472: If any command-line parameters remain after processing the above, these
  473: parameters are interpreted as the names of input files to be processed.
  474: 
  475: @cindex Standard input, processing as input
  476: @c A file name of @samp{-} refers to the standard input stream.
  477: @c The standard input will be processed if no file names are specified.
  478: @c 
  479: A file name of @samp{-} refers to the standard input stream.  The
  480: standard input will be processed if no file names are specified.
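
As a small sketch of the two cases (assuming GNU @command{sed} and a POSIX shell; the scratch file and the @code{out_*} variable names are hypothetical):

```shell
# Hypothetical scratch file used only for this illustration.
tmp=$(mktemp)
printf 'from file\n' > "$tmp"

# No file names at all: standard input is processed.
out_stdin=$(printf 'from pipe\n' | sed 's/from/read from/')

# '-' names standard input, so it can be mixed with ordinary files;
# inputs are read in the order they appear on the command line.
out_mixed=$(printf 'from pipe\n' | sed 's/from/read from/' "$tmp" -)

printf '%s\n' "$out_stdin" "$out_mixed"
rm -f "$tmp"
```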
  481: 
  482: 
  483: @node sed Programs
  484: @c @chapter @command{sed} Programs
  485: @chapter @command{sed} Programs
  486: 
  487: @cindex @command{sed} program structure
  488: @cindex Script structure
  489: @c A @command{sed} program consists of one or more @command{sed} commands,
  490: @c passed in by one or more of the
  491: @c @option{-e}, @option{-f}, @option{--expression}, and @option{--file}
  492: @c options, or the first non-option argument if zero of these
  493: @c options are used.
  494: @c This document will refer to ``the'' @command{sed} script;
  495: @c this is understood to mean the in-order catenation
  496: @c of all of the @var{script}s and @var{script-file}s passed in.
  497: @c 
  498: A @command{sed} program consists of one or more @command{sed} commands,
  499: passed in by one or more of the @option{-e}, @option{-f},
  500: @option{--expression}, and @option{--file} options, or the first
  501: non-option argument if none of these options are used.  This document will
  502: refer to ``the'' @command{sed} script; this is understood to mean the in-order
  503: concatenation of all of the @var{script}s and @var{script-file}s passed in.
  504: 
  505: @c Each @command{sed} command consists of an optional address or
  506: @c address range, followed by a one-character command name
  507: @c and any additional command-specific code.
  508: @c 
  509: Each @command{sed} command consists of an optional address or address
  510: range, followed by a one-character command name and any additional
  511: command-specific code.
  512: 
  513: @menu
  514: * Execution Cycle::          How @command{sed} works
  515: * Addresses::                Selecting lines with @command{sed}
  516: * Regular Expressions::      Overview of regular expression syntax
  517: * Common Commands::          Often used commands
  518: * The "s" Command::          @command{sed}'s Swiss Army Knife
  519: * Other Commands::           Less frequently used commands
  520: * Programming Commands::     Commands for @command{sed} gurus
  521: * Extended Commands::        Commands specific of @value{SSED}
  522: * Escapes::                  Specifying special characters
  523: @end menu
  524: 
  525: 
  526: @node Execution Cycle
  527: @c @section How @command{sed} Works
  528: @section How @command{sed} Works
  529: 
  530: @cindex Buffer spaces, pattern and hold
  531: @cindex Spaces, pattern and hold
  532: @cindex Pattern space, definition
  533: @cindex Hold space, definition
  534: @c @command{sed} maintains two data buffers: the active @emph{pattern} space,
  535: @c and the auxiliary @emph{hold} space. Both are initially empty.
  536: @c 
  537: @command{sed} maintains two data buffers: the active @emph{pattern}
  538: space, and the auxiliary @emph{hold} space.  Both are initially
  539: empty.
  540: 
  541: @c @command{sed} operates by performing the following cycle on each
  542: @c lines of input: first, @command{sed} reads one line from the input
  543: @c stream, removes any trailing newline, and places it in the pattern space.
  544: @c Then commands are executed; each command can have an address associated
  545: @c to it: addresses are a kind of condition code, and a command is only
  546: @c executed if the condition is verified before the command is to be
  547: @c executed.
  548: @c 
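  549: @command{sed} operates by performing the following cycle on each line
  550: of input: first, @command{sed} reads one line from the input stream,
  551: removes any trailing newline, and places it in the pattern space.  Then
  552: commands are executed; each command can have an address associated with
  553: it: addresses are a kind of condition code, and a command is only
  554: executed if the condition is verified before the command is to be executed.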
  555: 
  556: @c When the end of the script is reached, unless the @option{-n} option
  557: @c is in use, the contents of pattern space are printed out to the output
  558: @c stream, adding back the trailing newline if it was removed.@footnote{Actually,
  559: @c   if @command{sed} prints a line without the terminating newline, it will
  560: @c   nevertheless print the missing newline as soon as more text is sent to
  561: @c   the same output stream, which gives the ``least expected surprise''
  562: @c   even though it does not make commands like @samp{sed -n p} exactly
  563: @c   identical to @command{cat}.} Then the next cycle starts for the next
  564: @c input line.
  565: @c 
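  566: When the end of the script is reached, unless the @option{-n} option is
  567: in use, the contents of pattern space are printed out to the output stream,
  568: adding back the trailing newline if it was removed.@footnote{Actually, if
  569: @command{sed} prints a line without the terminating newline, it will nevertheless
  570: print the missing newline as soon as more text is sent to the same output stream,
  571: which gives the ``least expected surprise'' even though it does not make commands
  572: like @samp{sed -n p} exactly identical to @command{cat}.}  Then the next cycle starts for the next input line.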
  573: 
  574: @c Unless special commands (like @samp{D}) are used, the pattern space is
  575: @c deleted between two cycles. The hold space, on the other hand, keeps
  576: @c its data between cycles (see commands @samp{h}, @samp{H}, @samp{x},
  577: @c @samp{g}, @samp{G} to move data between both buffers).
  578: @c 
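  579: Unless special commands (like @samp{D}) are used, the pattern space is
  580: deleted between two cycles.  The hold space, on the other hand, keeps its
  581: data between cycles (see commands @samp{h}, @samp{H}, @samp{x},
  582: @samp{g}, @samp{G} to move data between both buffers).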
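
The cycle and the two buffers can be observed with a minimal sketch.  It assumes GNU @command{sed} and a POSIX shell; the sample input and the @code{out_*} variable names are hypothetical.

```shell
# Autoprint at the end of each cycle plus an explicit p:
# every input line comes out twice.
out_twice=$(printf 'a\nb\n' | sed 'p')

# The hold space keeps its data between cycles, so it can collect lines:
#   1!G  on every line but the first, append the hold space to the pattern space
#   h    copy the pattern space into the hold space
#   $p   on the last line, print the accumulated (reversed) text
# -n disables autoprint so that only the final p produces output.
out_reversed=$(printf 'one\ntwo\nthree\n' | sed -n '1!G;h;$p')

printf '%s\n' "$out_twice"      # a a b b, one per line
printf '%s\n' "$out_reversed"   # three two one, one per line
```

The second command only works because the hold space is not cleared between cycles, while the pattern space is refilled with each new input line.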
  583: 
  584: 
  585: @node Addresses
  586: @c @section Selecting lines with @command{sed}
  587: @section Selecting lines with @command{sed}
  588: @cindex Addresses, in @command{sed} scripts
  589: @cindex Line selection
  590: @cindex Selecting lines to process
  591: 
  592: @c Addresses in a @command{sed} script can be in any of the following forms:
  593: @c 
  594: Addresses in a @command{sed} script can be in any of the following forms:
  595: 
  596: @table @code
  597: @item @var{number}
  598: @cindex Address, numeric
  599: @cindex Line, selecting by number
  600: @c Specifying a line number will match only that line in the input.
  601: @c (Note that @command{sed} counts lines continuously across all input files
  602: @c unless @option{-i} or @option{-s} options are specified.)
  603: @c 
  604: Specifying a line number will match only that line in the input.  (Note
  605: that @command{sed} counts lines continuously across all input files
  606: unless @option{-i} or @option{-s} options are specified.)
  607: 
  608: @item @var{first}~@var{step}
  609: @cindex @acronym{GNU} extensions, @samp{@var{n}~@var{m}} addresses
  610: @c This @acronym{GNU} extension matches every @var{step}th line
  611: @c starting with line @var{first}.
  612: @c In particular, lines will be selected when there exists
  613: @c a non-negative @var{n} such that the current line-number equals
  614: @c @var{first} + (@var{n} * @var{step}).
  615: @c Thus, to select the odd-numbered lines,
  616: @c one would use @samp{1~2};
  617: @c to pick every third line starting with the second, @samp{2~3} would be used;
  618: @c to pick every fifth line starting with the tenth, use @samp{10~5};
  619: @c and @samp{50~0} is just an obscure way of saying @samp{50}.
  620: @c 
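  621: This @acronym{GNU} extension matches every @var{step}th line starting
  622: with line @var{first}.  In particular, lines will be selected when there
  623: exists a non-negative @var{n} such that the current line-number equals
  624: @var{first} + (@var{n} * @var{step}).  Thus, to select the odd-numbered
  625: lines, one would use @samp{1~2}; to pick every third line starting with
  626: the second, @samp{2~3} would be used; to pick every fifth line starting
  627: with the tenth, use @samp{10~5}; and @samp{50~0} is just an obscure way
  628: of saying @samp{50}.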
  629: 
  630: @item $
  631: @cindex Address, last line
  632: @cindex Last line, selecting
  633: @cindex Line, selecting last
  634: @c This address matches the last line of the last file of input, or
  635: @c the last line of each file when the @option{-i} or @option{-s} options
  636: @c are specified.
  637: @c 
  638: This address matches the last line of the last file of input, or the
  639: last line of each file when the @option{-i} or @option{-s} options are
  640: specified.
  641: 
  642: @item /@var{regexp}/
  643: @cindex Address, as a regular expression
  644: @cindex Line, selecting by regular expression match
  645: @c This will select any line which matches the regular expression @var{regexp}.
  646: @c If @var{regexp} itself includes any @code{/} characters,
  647: @c each must be escaped by a backslash (@code{\}).
  648: @c 
  649: This will select any line which matches the regular expression
  650: @var{regexp}.  If @var{regexp} itself includes any @code{/} characters,
  651: each must be escaped by a backslash (@code{\}).
  652: 
  653: @cindex empty regular expression
  654: @cindex @value{SSEDEXT}, modifiers and the empty regular expression
  655: @c The empty regular expression @samp{//} repeats the last regular
  656: @c expression match (the same holds if the empty regular expression is
  657: @c passed to the @code{s} command).  Note that modifiers to regular expressions
  658: @c are evaluated when the regular expression is compiled, thus it is invalid to
  659: @c specify them together with the empty regular expression.
  660: @c 
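  661: The empty regular expression @samp{//} repeats the last regular
  662: expression match (the same holds if the empty regular expression is passed
  663: to the @code{s} command).  Note that modifiers to regular expressions are evaluated when the
  664: regular expression is compiled, thus it is invalid to specify them together with the empty regular expression.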
  665: 
  666: @item \%@var{regexp}%
  667: @c (The @code{%} may be replaced by any other single character.)
  668: @c 
  669: (The @code{%} may be replaced by any other single character.)
  670: 
  671: @cindex Slash character, in regular expressions
  672: @c This also matches the regular expression @var{regexp},
  673: @c but allows one to use a different delimiter than @code{/}.
  674: @c This is particularly useful if the @var{regexp} itself contains
  675: @c a lot of slashes, since it avoids the tedious escaping of every @code{/}.
  676: @c If @var{regexp} itself includes any delimiter characters,
  677: @c each must be escaped by a backslash (@code{\}).
  678: @c 
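  679: This also matches the regular expression @var{regexp}, but allows one to
  680: use a different delimiter than @code{/}.  This is particularly useful if
  681: the @var{regexp} itself contains a lot of slashes, since it avoids the
  682: tedious escaping of every @code{/}.  If @var{regexp} itself includes any
  683: delimiter characters, each must be escaped by a backslash (@code{\}).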
  684: 
  685: @item /@var{regexp}/I
  686: @itemx \%@var{regexp}%I
  687: @cindex @acronym{GNU} extensions, @code{I} modifier
  688: @ifset PERL
  689: @cindex Perl-style regular expressions, case-insensitive
  690: @end ifset
  691: @c The @code{I} modifier to regular-expression matching is a @acronym{GNU}
  692: @c extension which causes the @var{regexp} to be matched in
  693: @c a case-insensitive manner.
  694: @c 
  695: The @code{I} modifier to regular-expression matching is a @acronym{GNU}
  696: extension which causes the @var{regexp} to be matched in a case-insensitive manner.
  697: 
  698: @item /@var{regexp}/M
  699: @itemx \%@var{regexp}%M
  700: @ifset PERL
  701: @cindex @value{SSEDEXT}, @code{M} modifier
  702: @end ifset
  703: @cindex Perl-style regular expressions, multiline
  704: @c The @code{M} modifier to regular-expression matching is a @value{SSED}
  705: @c extension which causes @code{^} and @code{$} to match respectively
  706: @c (in addition to the normal behavior) the empty string after a newline,
  707: @c and the empty string before a newline.  There are special character
  708: @c sequences
  709: @c 
  710: The @code{M} modifier to regular-expression matching is a @value{SSED}
  711: extension which causes @code{^} and @code{$} to match respectively (in addition to the normal behavior)
  712: the empty string after a newline, and the empty string before a newline.  There are special character sequences
  713: @c
  714: @ifset PERL
  715: @c (@code{\A} and @code{\Z} in Perl mode, @code{\`} and @code{\'}
  716: @c in basic or extended regular expression modes)
  717: @c
  718: (@code{\A} and @code{\Z} in Perl mode, @code{\`} and @code{\'} in basic
  719: or extended regular expression modes)
  720: @c
  721: @end ifset
  722: @ifclear PERL
  723: @c (@code{\`} and @code{\'})
  724: @c
  725: (@code{\`} and @code{\'})
  726: @end ifclear
  727: @c which always match the beginning or the end of the buffer.
  728: @c @code{M} stands for @cite{multi-line}.
  729: @c
  730: which always match the beginning or the end of the buffer.  @code{M}
  731: stands for @cite{multi-line}.
  732: 
  733: @ifset PERL
  734: @item /@var{regexp}/S
  735: @itemx \%@var{regexp}%S
  736: @cindex @value{SSEDEXT}, @code{S} modifier
  737: @cindex Perl-style regular expressions, single line
  738: @c The @code{S} modifier to regular-expression matching is only valid
  739: @c in Perl mode and specifies that the dot character (@code{.}) will
  740: @c match the newline character too.  @code{S} stands for @cite{single-line}.
  741: @c 
  742: The @code{S} modifier to regular-expression matching is only valid in
  743: Perl mode and specifies that the dot character (@code{.}) will match the
  744: newline character too.  @code{S} stands for @cite{single-line}.
  745: @end ifset
  746: 
  747: @ifset PERL
  748: @item /@var{regexp}/X
  749: @itemx \%@var{regexp}%X
  750: @cindex @value{SSEDEXT}, @code{X} modifier
  751: @cindex Perl-style regular expressions, extended
  752: @c The @code{X} modifier to regular-expression matching is also
  753: @c valid in Perl mode only.  If it is used, whitespace in the
  754: @c pattern (other than in a character class) and
  755: @c characters between a @kbd{#} outside a character class and the
  756: @c next newline character are ignored. An escaping backslash
  757: @c can be used to include a whitespace or @kbd{#} character as part
  758: @c of the pattern.
  759: @c 
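The @code{X} modifier to regular-expression matching is also
valid in Perl mode only.  If it is used, whitespace in the
pattern (other than in a character class) and
characters between a @kbd{#} outside a character class and the
next newline character are ignored.  An escaping backslash
can be used to include a whitespace or @kbd{#} character as part
of the pattern.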
  765: @end ifset
  766: @end table
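
The @code{I} and @code{M} modifiers described above can be tried with a
small sketch such as the following.  It assumes that the @command{sed} on
the path is GNU @command{sed}; the sample input and the shell variable
names are invented for the illustration.

```shell
# Assumes GNU sed; I and M are extensions and may be missing elsewhere.

# I: match /foo/ regardless of case.
ci=$(printf 'Foo\nbar\nFOO\n' | sed -n '/foo/Ip')
printf '%s\n' "$ci"

# M: N joins two input lines into one pattern space, and with M the ^
# anchor may also match right after the embedded newline.
with_m=$(printf 'one\ntwo\n' | sed -n 'N;/^two/Mp')
without_m=$(printf 'one\ntwo\n' | sed -n 'N;/^two/p')
printf 'with M: [%s]\nwithout M: [%s]\n' "$with_m" "$without_m"
```

Without @code{M} the second command prints nothing, because @code{^} then
matches only at the very start of the pattern space.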
  767: 
  768: @c If no addresses are given, then all lines are matched;
  769: @c if one address is given, then only lines matching that
  770: @c address are matched.
  771: @c 
If no addresses are given, then all lines are matched;
if one address is given, then only lines matching that
address are matched.
  775: 
  776: @cindex Range of lines
  777: @cindex Several lines, selecting
  778: @c An address range can be specified by specifying two addresses
  779: @c separated by a comma (@code{,}).  An address range matches lines
  780: @c starting from where the first address matches, and continues
  781: @c until the second address matches (inclusively).
  782: @c 
An address range can be specified by giving two addresses
separated by a comma (@code{,}).  An address range matches lines
starting from where the first address matches, and continues
until the second address matches (inclusively).
  787: 
  788: @c If the second address is a @var{regexp}, then checking for the
  789: @c ending match will start with the line @emph{following} the
  790: @c line which matched the first address: a range will always
  791: @c span at least two lines (except of course if the input stream
  792: @c ends).
  793: @c 
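If the second address is a @var{regexp}, then checking for the
ending match will start with the line @emph{following} the
line which matched the first address: a range will always
span at least two lines (except of course if the input stream
ends).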
  797: 
  798: @c If the second address is a @var{number} less than (or equal to)
  799: @c the line matching the first address, then only the one line is
  800: @c matched.
  801: @c 
If the second address is a @var{number} less than (or equal to)
the line matching the first address, then only the one line is
matched.
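
The range rules above can be checked with a short sketch.  The sample
input and the variable names are made up for the illustration, and the
commands are expected to behave the same in any @sc{posix}-conformant
@command{sed}, although only GNU @command{sed} is assumed here.

```shell
# Numeric and regexp ranges select the same three lines here.
by_number=$(printf '%s\n' alpha BEGIN beta END gamma | sed -n '2,4p')
by_regexp=$(printf '%s\n' alpha BEGIN beta END gamma | sed -n '/BEGIN/,/END/p')

# The end regexp is first tested on the line after the start line, so
# this range runs from the first "x" through the second one.
two_x=$(printf '%s\n' x y x z | sed -n '/x/,/x/p')

# A numeric second address that is not greater than the start line
# selects just the start line.
one_line=$(printf '%s\n' alpha BEGIN beta END gamma | sed -n '4,2p')

printf '%s\n' "$by_number" "$two_x" "$one_line"
```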
  804: 
  805: @cindex Special addressing forms
  806: @cindex Range with start address of zero
  807: @cindex Zero, as range start address
  808: @cindex @var{addr1},+N
  809: @cindex @var{addr1},~N
  810: @cindex @acronym{GNU} extensions, special two-address forms
  811: @cindex @acronym{GNU} extensions, @code{0} address
  812: @cindex @acronym{GNU} extensions, 0,@var{addr2} addressing
  813: @cindex @acronym{GNU} extensions, @var{addr1},+@var{N} addressing
  814: @cindex @acronym{GNU} extensions, @var{addr1},~@var{N} addressing
  815: @c @value{SSED} also supports some special two-address forms; all these
  816: @c are @acronym{GNU} extensions:
  817: @c 
@value{SSED} also supports some special two-address forms; all these
are @acronym{GNU} extensions:
  820: @table @code
  821: @item 0,/@var{regexp}/
  822: @c A line number of @code{0} can be used in an address specification like
  823: @c @code{0,/@var{regexp}/} so that @command{sed} will try to match
  824: @c @var{regexp} in the first input line too.  In other words,
  825: @c @code{0,/@var{regexp}/} is similar to @code{1,/@var{regexp}/},
  826: @c except that if @var{addr2} matches the very first line of input the
  827: @c @code{0,/@var{regexp}/} form will consider it to end the range, whereas
  828: @c the @code{1,/@var{regexp}/} form will match the beginning of its range and
  829: @c hence make the range span up to the @emph{second} occurrence of the
  830: @c regular expression.
  831: @c 
A line number of @code{0} can be used in an address specification like
@code{0,/@var{regexp}/} so that @command{sed} will try to match
@var{regexp} in the first input line too.  In other words,
@code{0,/@var{regexp}/} is similar to @code{1,/@var{regexp}/},
except that if @var{addr2} matches the very first line of input the
@code{0,/@var{regexp}/} form will consider it to end the range, whereas
the @code{1,/@var{regexp}/} form will match the beginning of its range and
hence make the range span up to the @emph{second} occurrence of the
regular expression.
  840: 
  841: @c Note that this is the only place where the @code{0} address makes
  842: @c sense; there is no 0-th line and commands which are given the @code{0}
  843: @c address in any other way will give an error.
  844: @c 
Note that this is the only place where the @code{0} address makes
sense; there is no 0-th line, and commands which are given the @code{0}
address in any other way will give an error.
  848: 
  849: @item @var{addr1},+@var{N}
  850: @c Matches @var{addr1} and the @var{N} lines following @var{addr1}.
  851: @c 
Matches @var{addr1} and the @var{N} lines following @var{addr1}.
  853: 
  854: @item @var{addr1},~@var{N}
  855: @c Matches @var{addr1} and the lines following @var{addr1}
  856: @c until the next line whose input line number is a multiple of @var{N}.
  857: @c 
Matches @var{addr1} and the lines following @var{addr1}
until the next line whose input line number is a multiple of @var{N}.
  860: @end table
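
A minimal sketch of the three special forms in the table above, assuming
GNU @command{sed} (none of these forms is portable); the input lines and
variable names are hypothetical.

```shell
# 0,/x/ lets the end regexp match on line 1; 1,/x/ does not.
zero_start=$(printf '%s\n' x y x z | sed -n '0,/x/p')
one_start=$(printf '%s\n' x y x z | sed -n '1,/x/p')

# addr1,+N: line 2 and the two lines after it.
plus=$(printf '%s\n' 1 2 3 4 5 6 7 | sed -n '2,+2p')

# addr1,~N: line 2 up to the next line number that is a multiple of 5.
multiple=$(printf '%s\n' 1 2 3 4 5 6 7 | sed -n '2,~5p')

printf '%s\n' "$zero_start" "$one_start" "$plus" "$multiple"
```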
  861: 
  862: @cindex Excluding lines
  863: @cindex Selecting non-matching lines
  864: @c Appending the @code{!} character to the end of an address
  865: @c specification negates the sense of the match.
  866: @c That is, if the @code{!} character follows an address range,
  867: @c then only lines which do @emph{not} match the address range
  868: @c will be selected.
  869: @c This also works for singleton addresses,
  870: @c and, perhaps perversely, for the null address.
  871: @c 
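Appending the @code{!} character to the end of an address
specification negates the sense of the match.
That is, if the @code{!} character follows an address range,
then only lines which do @emph{not} match the address range
will be selected.
This also works for singleton addresses,
and, perhaps perversely, for the null address.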
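
As an illustration of negated addresses, the following sketch (sample
input and variable names are assumptions) keeps only a range by deleting
everything outside it, and prints every line that does not match a regexp.

```shell
# Delete every line that is NOT in the range 2,4.
keep_range=$(printf '%s\n' 1 2 3 4 5 | sed '2,4!d')

# Print every line that does NOT match /3/.
skip_match=$(printf '%s\n' 1 2 3 4 5 | sed -n '/3/!p')

printf '%s\n' "$keep_range" "$skip_match"
```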
  877: 
  878: @node Regular Expressions
  879: @c @section Overview of Regular Expression Syntax
@section Overview of Regular Expression Syntax
  881: 
  882: @c To know how to use @command{sed}, people should understand regular
  883: @c expressions (@dfn{regexp} for short).  A regular expression
  884: @c is a pattern that is matched against a
  885: @c subject string from left to right.  Most characters are
  886: @c @dfn{ordinary}: they stand for
  887: @c themselves in a pattern, and match the corresponding characters
  888: @c in the subject.  As a trivial example, the pattern
  889: @c 
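To know how to use @command{sed}, people should understand regular
expressions (@dfn{regexp} for short).  A regular expression
is a pattern that is matched against a
subject string from left to right.  Most characters are
@dfn{ordinary}: they stand for
themselves in a pattern, and match the corresponding characters
in the subject.  As a trivial example, the pattern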
  895: 
  896: @example
  897:      The quick brown fox
  898: @end example
  899: 
  900: @noindent
  901: @c matches a portion of a subject string that is identical to
  902: @c itself.  The power of regular expressions comes from the
  903: @c ability to include alternatives and repetitions in the pattern.
  904: @c These are encoded in the pattern by the use of @dfn{special characters},
  905: @c which do not stand for themselves but instead
  906: @c are interpreted in some special way.  Here is a brief description
  907: @c of regular expression syntax as used in @command{sed}.
  908: @c 
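matches a portion of a subject string that is identical to
itself.  The power of regular expressions comes from the
ability to include alternatives and repetitions in the pattern.
These are encoded in the pattern by the use of @dfn{special characters},
which do not stand for themselves but instead
are interpreted in some special way.  Here is a brief description
of regular expression syntax as used in @command{sed}.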
  914: 
  915: @table @code
  916: @item @var{char}
  917: @c A single ordinary character matches itself.
  918: @c 
A single ordinary character matches itself.
  920: 
  921: @item *
  922: @cindex @acronym{GNU} extensions, to basic regular expressions
  923: @c Matches a sequence of zero or more instances of matches for the
  924: @c preceding regular expression, which must be an ordinary character, a
  925: @c special character preceded by @code{\}, a @code{.}, a grouped regexp
  926: @c (see below), or a bracket expression.  As a @acronym{GNU} extension, a
  927: @c postfixed regular expression can also be followed by @code{*}; for
  928: @c example, @code{a**} is equivalent to @code{a*}.  @acronym{POSIX}
  929: @c 1003.1-2001 says that @code{*} stands for itself when it appears at
  930: @c the start of a regular expression or subexpression, but many
  931: @c non@acronym{GNU} implementations do not support this and portable
  932: @c scripts should instead use @code{\*} in these contexts.
  933: @c 
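Matches a sequence of zero or more instances of matches for the
preceding regular expression, which must be an ordinary character, a
special character preceded by @code{\}, a @code{.}, a grouped regexp
(see below), or a bracket expression.  As a @acronym{GNU} extension, a
postfixed regular expression can also be followed by @code{*}; for
example, @code{a**} is equivalent to @code{a*}.  @acronym{POSIX}
1003.1-2001 says that @code{*} stands for itself when it appears at
the start of a regular expression or subexpression, but many
non-@acronym{GNU} implementations do not support this, and portable
scripts should instead use @code{\*} in these contexts.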
  943: 
  944: @item \+
  945: @cindex @acronym{GNU} extensions, to basic regular expressions
  946: @c As @code{*}, but matches one or more.  It is a @acronym{GNU} extension.
  947: @c 
As @code{*}, but matches one or more.  It is a @acronym{GNU} extension.
  950: 
  951: @item \?
  952: @cindex @acronym{GNU} extensions, to basic regular expressions
  953: @c As @code{*}, but only matches zero or one.  It is a @acronym{GNU} extension.
  954: @c 
As @code{*}, but only matches zero or one.  It is a @acronym{GNU} extension.
  957: 
  958: @item \@{@var{i}\@}
  959: @c As @code{*}, but matches exactly @var{i} sequences (@var{i} is a
  960: @c decimal integer; for portability, keep it between 0 and 255
  961: @c inclusive).
  962: @c 
As @code{*}, but matches exactly @var{i} sequences (@var{i} is a
decimal integer; for portability, keep it between 0 and 255
inclusive).
  965: 
  966: @item \@{@var{i},@var{j}\@}
  967: @c Matches between @var{i} and @var{j}, inclusive, sequences.
  968: @c 
Matches between @var{i} and @var{j}, inclusive, sequences.
  970: 
  971: @item \@{@var{i},\@}
  972: @c Matches more than or equal to @var{i} sequences.
  973: @c 
Matches more than or equal to @var{i} sequences.
  975: 
  976: @item \(@var{regexp}\)
  977: @c Groups the inner @var{regexp} as a whole, this is used to: 
  978: @c 
Groups the inner @var{regexp} as a whole.  This is used to:
  980: 
  981: @itemize @bullet
  982: @item
  983: @cindex @acronym{GNU} extensions, to basic regular expressions
  984: @c Apply postfix operators, like @code{\(abcd\)*}:
  985: @c this will search for zero or more whole sequences 
  986: @c of @samp{abcd}, while @code{abcd*} would search
  987: @c for @samp{abc} followed by zero or more occurrences
  988: @c of @samp{d}.  Note that support for @code{\(abcd\)*} is
  989: @c required by @acronym{POSIX} 1003.1-2001, but many non-@acronym{GNU}
  990: @c implementations do not support it and hence it is not universally
  991: @c portable.
  992: @c 
Apply postfix operators, like @code{\(abcd\)*}:
this will search for zero or more whole sequences
of @samp{abcd}, while @code{abcd*} would search
for @samp{abc} followed by zero or more occurrences
of @samp{d}.  Note that support for @code{\(abcd\)*} is
required by @acronym{POSIX} 1003.1-2001, but many non-@acronym{GNU}
implementations do not support it and hence it is not universally
portable.
  999: 
 1000: @item
 1001: @c Use back references (see below).
 1002: @c 
Use back references (see below).
 1004: @end itemize
 1005: 
 1006: @item .
 1007: @c Matches any character, including newline.
 1008: @c 
Matches any character, including newline.
 1010: 
 1011: @item ^
 1012: @c Matches the null string at beginning of line, i.e. what
 1013: @c appears after the circumflex must appear at the 
 1014: @c beginning of line. @code{^#include} will match only 
 1015: @c lines where @samp{#include} is the first thing on line---if
 1016: @c there are spaces before, for example, the match fails.
 1017: @c @code{^} acts as a special character only at the beginning
 1018: @c of the regular expression or subexpression (that is,
 1019: @c after @code{\(} or @code{\|}).  Portable scripts should avoid
 1020: @c @code{^} at the beginning of a subexpression, though, as
 1021: @c @acronym{POSIX} allows implementations that treat @code{^} as
 1022: @c an ordinary character in that context.
 1023: @c 
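Matches the null string at beginning of line, i.e. what
appears after the circumflex must appear at the
beginning of line.  @code{^#include} will match only
lines where @samp{#include} is the first thing on the line; if
there are spaces before it, for example, the match fails.
@code{^} acts as a special character only at the beginning
of the regular expression or subexpression (that is,
after @code{\(} or @code{\|}).  Portable scripts should avoid
@code{^} at the beginning of a subexpression, though, as
@acronym{POSIX} allows implementations that treat @code{^} as
an ordinary character in that context.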
 1032: 
 1033: @item $
 1034: @c It is the same as @code{^}, but refers to end of line.
 1035: @c @code{$} also acts as a special character only at the end
 1036: @c of the regular expression or subexpression (that is, before @code{\)}
 1037: @c or @code{\|}), and its use at the end of a subexpression is not
 1038: @c portable.
 1039: @c 
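It is the same as @code{^}, but refers to end of line.
@code{$} also acts as a special character only at the end
of the regular expression or subexpression (that is, before @code{\)}
or @code{\|}), and its use at the end of a subexpression is not
portable.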
 1043: 
 1044: @item [@var{list}]
 1045: @itemx [^@var{list}]
 1046: @c Matches any single character in @var{list}: for example,
 1047: @c @code{[aeiou]} matches all vowels.  A list may include
 1048: @c sequences like @code{@var{char1}-@var{char2}}, which
 1049: @c matches any character between (inclusive) @var{char1}
 1050: @c and @var{char2}.
 1051: @c 
Matches any single character in @var{list}: for example,
@code{[aeiou]} matches all vowels.  A list may include
sequences like @code{@var{char1}-@var{char2}}, which
matches any character between (inclusive) @var{char1}
and @var{char2}.
 1056: 
 1057: @c A leading @code{^} reverses the meaning of @var{list}, so that
 1058: @c it matches any single character @emph{not} in @var{list}.  To include
 1059: @c @code{]} in the list, make it the first character (after
 1060: @c the @code{^} if needed), to include @code{-} in the list,
 1061: @c make it the first or last; to include @code{^} put
 1062: @c it after the first character.
 1063: @c 
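A leading @code{^} reverses the meaning of @var{list}, so that
it matches any single character @emph{not} in @var{list}.  To include
@code{]} in the list, make it the first character (after
the @code{^} if needed); to include @code{-} in the list,
make it the first or last; to include @code{^}, put
it after the first character.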
 1069: 
 1070: @cindex @code{POSIXLY_CORRECT} behavior, bracket expressions
 1071: @c The characters @code{$}, @code{*}, @code{.}, @code{[}, and @code{\}
 1072: @c are normally not special within @var{list}.  For example, @code{[\*]}
 1073: @c matches either @samp{\} or @samp{*}, because the @code{\} is not
 1074: @c special here.  However, strings like @code{[.ch.]}, @code{[=a=]}, and
 1075: @c @code{[:space:]} are special within @var{list} and represent collating
 1076: @c symbols, equivalence classes, and character classes, respectively, and
 1077: @c @code{[} is therefore special within @var{list} when it is followed by
 1078: @c @code{.}, @code{=}, or @code{:}.  Also, when not in
 1079: @c @env{POSIXLY_CORRECT} mode, special escapes like @code{\n} and
 1080: @c @code{\t} are recognized within @var{list}.  @xref{Escapes}.
 1081: @c 
The characters @code{$}, @code{*}, @code{.}, @code{[}, and @code{\}
are normally not special within @var{list}.  For example, @code{[\*]}
matches either @samp{\} or @samp{*}, because the @code{\} is not
special here.  However, strings like @code{[.ch.]}, @code{[=a=]}, and
@code{[:space:]} are special within @var{list} and represent collating
symbols, equivalence classes, and character classes, respectively, and
@code{[} is therefore special within @var{list} when it is followed by
@code{.}, @code{=}, or @code{:}.  Also, when not in
@env{POSIXLY_CORRECT} mode, special escapes like @code{\n} and
@code{\t} are recognized within @var{list}.  @xref{Escapes}.
 1091: 
 1092: @item @var{regexp1}\|@var{regexp2}
 1093: @cindex @acronym{GNU} extensions, to basic regular expressions
 1094: @c Matches either @var{regexp1} or @var{regexp2}.  Use
 1095: @c parentheses to use complex alternative regular expressions.
 1096: @c The matching process tries each alternative in turn, from
 1097: @c left to right, and the first one that succeeds is used.
 1098: @c It is a @acronym{GNU} extension.
 1099: @c 
Matches either @var{regexp1} or @var{regexp2}.  Use
parentheses to use complex alternative regular expressions.
The matching process tries each alternative in turn, from
left to right, and the first one that succeeds is used.
It is a @acronym{GNU} extension.
 1104: 
 1105: @item @var{regexp1}@var{regexp2}
 1106: @c Matches the concatenation of @var{regexp1} and @var{regexp2}.
 1107: @c Concatenation binds more tightly than @code{\|}, @code{^}, and
 1108: @c @code{$}, but less tightly than the other regular expression
 1109: @c operators.
 1110: @c 
Matches the concatenation of @var{regexp1} and @var{regexp2}.
Concatenation binds more tightly than @code{\|}, @code{^}, and
@code{$}, but less tightly than the other regular expression
operators.
 1114: 
 1115: @item \@var{digit}
 1116: @c Matches the @var{digit}-th @code{\(@dots{}\)} parenthesized
 1117: @c subexpression in the regular expression.  This is called a @dfn{back
 1118: @c reference}.  Subexpressions are implicity numbered by counting
 1119: @c occurrences of @code{\(} left-to-right.
 1120: @c 
Matches the @var{digit}-th @code{\(@dots{}\)} parenthesized
subexpression in the regular expression.  This is called a @dfn{back
reference}.  Subexpressions are implicitly numbered by counting
occurrences of @code{\(} left-to-right.
 1124: 
 1125: @item \n
 1126: @c Matches the newline character.
 1127: @c 
Matches the newline character.
 1129: 
 1130: @item \@var{char}
 1131: @c Matches @var{char}, where @var{char} is one of @code{$},
 1132: @c @code{*}, @code{.}, @code{[}, @code{\}, or @code{^}.
 1133: @c Note that the only C-like
 1134: @c backslash sequences that you can portably assume to be
 1135: @c interpreted are @code{\n} and @code{\\}; in particular
 1136: @c @code{\t} is not portable, and matches a @samp{t} under most
 1137: @c implementations of @command{sed}, rather than a tab character.
 1138: @c 
Matches @var{char}, where @var{char} is one of @code{$},
@code{*}, @code{.}, @code{[}, @code{\}, or @code{^}.
Note that the only C-like
backslash sequences that you can portably assume to be
interpreted are @code{\n} and @code{\\}; in particular
@code{\t} is not portable, and matches a @samp{t} under most
implementations of @command{sed}, rather than a tab character.
 1144: 
 1145: @end table
 1146: 
 1147: @cindex Greedy regular expression matching
 1148: @c Note that the regular expression matcher is greedy, i.e., matches
 1149: @c are attempted from left to right and, if two or more matches are
 1150: @c possible starting at the same character, it selects the longest.
 1151: @c 
Note that the regular expression matcher is greedy, i.e., matches
are attempted from left to right and, if two or more matches are
possible starting at the same character, it selects the longest.
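
The effect of greedy matching can be seen in a small sketch; the sample
string and the variable names are invented for the example.

```shell
# a.*X starts at the first "a" and extends to the LAST "X".
longest=$(echo 'aXbXc' | sed 's/a.*X/-/')

# Excluding "X" from the repeated part stops the match at the first "X".
shortest=$(echo 'aXbXc' | sed 's/a[^X]*X/-/')

printf '%s\n' "$longest" "$shortest"
```

A bracket expression that excludes the terminating character is a common
way to get the shorter match, since basic regular expressions have no
non-greedy operators.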
 1155: 
 1156: @noindent
 1157: @c Examples:
 1158: @c 
Examples:
 1160: @table @samp
 1161: @item abcdef
 1162: @c Matches @samp{abcdef}.
 1163: @c 
Matches @samp{abcdef}.
 1165: 
 1166: @item a*b
 1167: @c Matches zero or more @samp{a}s followed by a single
 1168: @c @samp{b}.  For example, @samp{b} or @samp{aaaaab}. 
 1169: @c 
Matches zero or more @samp{a}s followed by a single
@samp{b}.  For example, @samp{b} or @samp{aaaaab}.
 1172: 
 1173: @item a\?b
 1174: @c Matches @samp{b} or @samp{ab}.
 1175: @c 
Matches @samp{b} or @samp{ab}.
 1177: 
 1178: @item a\+b\+
 1179: @c Matches one or more @samp{a}s followed by one or more
 1180: @c @samp{b}s: @samp{ab} is the shortest possible match, but
 1181: @c other examples are @samp{aaaab} or @samp{abbbbb} or
 1182: @c @samp{aaaaaabbbbbbb}.
 1183: @c 
Matches one or more @samp{a}s followed by one or more
@samp{b}s: @samp{ab} is the shortest possible match, but
other examples are @samp{aaaab} or @samp{abbbbb} or
@samp{aaaaaabbbbbbb}.
 1187: 
 1188: @item .*
 1189: @itemx .\+
 1190: @c These two both match all the characters in a string;
 1191: @c however, the first matches every string (including the empty
 1192: @c string), while the second matches only strings containing
 1193: @c at least one character.
 1194: @c 
These two both match all the characters in a string;
however, the first matches every string (including the empty
string), while the second matches only strings containing
at least one character.
 1198: 
 1199: @item ^main.*(.*)
 1200: @c This matches a string starting with @samp{main},
 1201: @c followed by an opening and closing
 1202: @c parenthesis.  The @samp{n}, @samp{(} and @samp{)} need not
 1203: @c be adjacent.
 1204: @c 
This matches a string starting with @samp{main},
followed by an opening and closing
parenthesis.  The @samp{n}, @samp{(} and @samp{)} need not
be adjacent.
 1207: 
 1208: @item ^#
 1209: @c This matches a string beginning with @samp{#}.
 1210: @c 
This matches a string beginning with @samp{#}.
 1212: 
 1213: @item \\$
 1214: @c This matches a string ending with a single backslash.  The
 1215: @c regexp contains two backslashes for escaping.
 1216: @c 
This matches a string ending with a single backslash.  The
regexp contains two backslashes for escaping.
 1219: 
 1220: @item \$
 1221: @c Instead, this matches a string consisting of a single dollar sign,
 1222: @c because it is escaped.
 1223: @c 
Instead, this matches a string consisting of a single dollar sign,
because it is escaped.
 1226: 
 1227: @item [a-zA-Z0-9]
 1228: @c In the C locale, this matches any @acronym{ASCII} letters or digits.
 1229: @c 
In the C locale, this matches any @acronym{ASCII} letters or digits.
 1231: 
 1232: @item [^ @kbd{tab}]\+
 1233: @c (Here @kbd{tab} stands for a single tab character.)
 1234: @c This matches a string of one or more
 1235: @c characters, none of which is a space or a tab.
 1236: @c Usually this means a word.
 1237: @c 
(Here @kbd{tab} stands for a single tab character.)
This matches a string of one or more
characters, none of which is a space or a tab.
Usually this means a word.
 1241: 
 1242: @item ^\(.*\)\n\1$
 1243: @c This matches a string consisting of two equal substrings separated by
 1244: @c a newline.
 1245: @c 
This matches a string consisting of two equal substrings separated by
a newline.
 1248: 
 1249: @item .\@{9\@}A$
 1250: @c This matches nine characters followed by an @samp{A}.
 1251: @c 
This matches nine characters followed by an @samp{A}.
 1253: 
 1254: @item ^.\@{15\@}A
 1255: @c This matches the start of a string that contains 16 characters,
 1256: @c the last of which is an @samp{A}.
 1257: @c 
This matches the start of a string that contains 16 characters,
the last of which is an @samp{A}.
 1260: 
 1261: @end table
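
Some of the patterns discussed in this section can be exercised as
follows.  This is only a sketch: the input strings and variable names are
assumptions, and the @code{N} command is used merely to get a newline
into the pattern space.

```shell
# Grouping: \(abcd\)* repeats the whole group, abcd* repeats only "d".
grouped=$(echo 'abcdabcdd' | sed 's/\(abcd\)*/<&>/')
ungrouped=$(echo 'abcdabcdd' | sed 's/abcd*/<&>/')

# Back reference: two equal halves separated by a newline.
repeated=$(printf 'same\nsame\n' | sed -n 'N;/^\(.*\)\n\1$/p')
different=$(printf 'one\ntwo\n' | sed -n 'N;/^\(.*\)\n\1$/p')

printf '%s\n' "$grouped" "$ungrouped" "$repeated"
```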
 1262: 
 1263: 
 1264: @node Common Commands
 1265: @c @section Often-Used Commands
@section Often-Used Commands
 1267: 
 1268: @c If you use @command{sed} at all, you will quite likely want to know
 1269: @c these commands.
 1270: @c 
If you use @command{sed} at all, you will quite likely want to know
these commands.
 1273: 
 1274: @table @code
 1275: @item #
 1276: @c [No addresses allowed.]
 1277: @c 
[No addresses allowed.]
 1279: 
 1280: @findex # (comments)
 1281: @cindex Comments, in scripts
 1282: @c The @code{#} character begins a comment;
 1283: @c the comment continues until the next newline.
 1284: @c 
The @code{#} character begins a comment;
the comment continues until the next newline.
 1286: 
 1287: @cindex Portability, comments
 1288: @c If you are concerned about portability, be aware that
 1289: @c some implementations of @command{sed} (which are not @sc{posix}
 1290: @c conformant) may only support a single one-line comment,
 1291: @c and then only when the very first character of the script is a @code{#}.
 1292: @c 
If you are concerned about portability, be aware that
some implementations of @command{sed} (which are not @sc{posix}
conformant) may only support a single one-line comment,
and then only when the very first character of the script is a @code{#}.
 1297: 
 1298: @findex -n, forcing from within a script
 1299: @cindex Caveat --- #n on first line
 1300: @c Warning: if the first two characters of the @command{sed} script
 1301: @c are @code{#n}, then the @option{-n} (no-autoprint) option is forced.
 1302: @c If you want to put a comment in the first line of your script
 1303: @c and that comment begins with the letter @samp{n}
 1304: @c and you do not want this behavior,
 1305: @c then be sure to either use a capital @samp{N},
 1306: @c or place at least one space before the @samp{n}.
 1307: @c 
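Warning: if the first two characters of the @command{sed} script
are @code{#n}, then the @option{-n} (no-autoprint) option is forced.
If you want to put a comment in the first line of your script
and that comment begins with the letter @samp{n}
and you do not want this behavior,
then be sure to either use a capital @samp{N},
or place at least one space before the @samp{n}.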
 1313: 
 1314: @item q [@var{exit-code}]
 1315: @c This command only accepts a single address.
 1316: @c 
This command only accepts a single address.
 1318: 
 1319: @findex q (quit) command
 1320: @cindex @value{SSEDEXT}, returning an exit code
 1321: @cindex Quitting
 1322: @c Exit @command{sed} without processing any more commands or input.
 1323: @c Note that the current pattern space is printed if auto-print is
 1324: @c not disabled with the @option{-n} options.  The ability to return
 1325: @c an exit code from the @command{sed} script is a @value{SSED} extension.
 1326: @c 
Exit @command{sed} without processing any more commands or input.
Note that the current pattern space is printed if auto-print is
not disabled with the @option{-n} option.  The ability to return
an exit code from the @command{sed} script is a @value{SSED} extension.
 1331: 
 1332: @item d
 1333: @findex d (delete) command
 1334: @cindex Text, deleting
 1335: @c Delete the pattern space;
 1336: @c immediately start next cycle.
 1337: @c 
Delete the pattern space;
immediately start next cycle.
 1339: 
 1340: @item p
 1341: @findex p (print) command
 1342: @cindex Text, printing
 1343: @c Print out the pattern space (to the standard output).
 1344: @c This command is usually only used in conjunction with the @option{-n}
 1345: @c command-line option.
 1346: @c 
Print out the pattern space (to the standard output).
This command is usually only used in conjunction with the @option{-n}
command-line option.
 1349: 
 1350: @item n
 1351: @findex n (next-line) command
 1352: @cindex Next input line, replace pattern space with
 1353: @cindex Read next input line
 1354: @c If auto-print is not disabled, print the pattern space,
 1355: @c then, regardless, replace the pattern space with the next line of input.
 1356: @c If there is no more input then @command{sed} exits without processing
 1357: @c any more commands.
 1358: @c 
If auto-print is not disabled, print the pattern space,
then, regardless, replace the pattern space with the next line of input.
If there is no more input then @command{sed} exits without processing
any more commands.
 1362: 
 1363: @item @{ @var{commands} @}
 1364: @findex @{@} command grouping
 1365: @cindex Grouping commands
 1366: @cindex Command groups
 1367: @c A group of commands may be enclosed between
 1368: @c @code{@{} and @code{@}} characters.
 1369: @c This is particularly useful when you want a group of commands
 1370: @c to be triggered by a single address (or address-range) match.
 1371: @c 
A group of commands may be enclosed between
@code{@{} and @code{@}} characters.
This is particularly useful when you want a group of commands
to be triggered by a single address (or address-range) match.
 1375: 
 1376: @end table
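
The commands in the table above can be tried one at a time.  The sketch
below assumes GNU @command{sed} (the exit code given to @code{q} and the
compact @samp{2@{p;p@}} spelling are not portable); the input and the
variable names are hypothetical.

```shell
# q: print through line 3, then quit.
first_three=$(printf '%s\n' 1 2 3 4 | sed '3q')

# n;d: print a line, fetch the next one, delete it (keeps odd lines).
odd_lines=$(printf '%s\n' 1 2 3 4 | sed 'n;d')

# { }: run two p commands for the single address 2.
twice=$(printf '%s\n' 1 2 3 | sed -n '2{p;p}')

# q with an exit code (GNU extension).
status=0
printf '%s\n' 1 2 3 | sed -n '2q5' || status=$?

printf '%s\n' "$first_three" "$odd_lines" "$twice" "exit status: $status"
```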
 1377: 
 1378: @node The "s" Command
 1379: @c @section The @code{s} Command
@section The @code{s} Command
 1381: 
 1382: @c The syntax of the @code{s} (as in substitute) command is
 1383: @c @samp{s/@var{regexp}/@var{replacement}/@var{flags}}.  The @code{/}
 1384: @c characters may be uniformly replaced by any other single
 1385: @c character within any given @code{s} command.  The @code{/}
 1386: @c character (or whatever other character is used in its stead)
 1387: @c can appear in the @var{regexp} or @var{replacement}
 1388: @c only if it is preceded by a @code{\} character.
 1389: @c 
The syntax of the @code{s} (as in substitute) command is
@samp{s/@var{regexp}/@var{replacement}/@var{flags}}.  The @code{/}
characters may be uniformly replaced by any other single
character within any given @code{s} command.  The @code{/}
character (or whatever other character is used in its stead)
can appear in the @var{regexp} or @var{replacement}
only if it is preceded by a @code{\} character.
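
For instance, when the text being edited contains many slashes, another
delimiter avoids most of the escaping.  In this sketch the path and the
variable names are made up; both commands perform the same substitution.

```shell
# Comma as delimiter: no escaping of "/" is needed.
with_commas=$(echo '/usr/bin/sed' | sed 's,/usr/bin,/opt,')

# Slash as delimiter: every "/" in the regexp and replacement is escaped.
with_slashes=$(echo '/usr/bin/sed' | sed 's/\/usr\/bin/\/opt/')

printf '%s\n' "$with_commas" "$with_slashes"
```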
 1396: 
 1397: @c The @code{s} command is probably the most important in @command{sed}
 1398: @c and has a lot of different options.  Its basic concept is simple:
 1399: @c the @code{s} command attempts to match the pattern
 1400: @c space against the supplied @var{regexp}; if the match is
 1401: @c successful, then that portion of the pattern
 1402: @c space which was matched is replaced with @var{replacement}.
 1403: @c 
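The @code{s} command is probably the most important in @command{sed}
and has a lot of different options.  Its basic concept is simple:
the @code{s} command attempts to match the pattern
space against the supplied @var{regexp}; if the match is
successful, then that portion of the pattern
space which was matched is replaced with @var{replacement}.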
 1408: 
 1409: @cindex Backreferences, in regular expressions
 1410: @cindex Parenthesized substrings
 1411: @c The @var{replacement} can contain @code{\@var{n}} (@var{n} being
 1412: @c a number from 1 to 9, inclusive) references, which refer to
 1413: @c the portion of the match which is contained between the @var{n}th
 1414: @c @code{\(} and its matching @code{\)}.
 1415: @c Also, the @var{replacement} can contain unescaped @code{&}
 1416: @c characters which reference the whole matched portion
 1417: @c of the pattern space.
 1418: @c 
The @var{replacement} can contain @code{\@var{n}} (@var{n} being
a number from 1 to 9, inclusive) references, which refer to
the portion of the match which is contained between the @var{n}th
@code{\(} and its matching @code{\)}.
Also, the @var{replacement} can contain unescaped @code{&}
characters which reference the whole matched portion
of the pattern space.
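
A short sketch of both kinds of reference in the replacement, using an
invented sample string and hypothetical variable names:

```shell
# \1 and \2 refer to the first and second \( ... \) groups.
swapped=$(echo 'hello world' | sed 's/\(hello\) \(world\)/\2 \1/')

# & stands for the whole matched text.
wrapped=$(echo 'hello world' | sed 's/w.*/<&>/')

printf '%s\n' "$swapped" "$wrapped"
```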
 1424: 
 1425: @cindex @value{SSEDEXT}, case modifiers in @code{s} commands
 1426: @c Finally, as a @value{SSED} extension, you can include a
 1427: @c special sequence made of a backslash and one of the letters
 1428: @c @code{L}, @code{l}, @code{U}, @code{u}, or @code{E}.
 1429: @c The meaning is as follows:
 1430: @c 
Finally, as a @value{SSED} extension, you can include a
special sequence made of a backslash and one of the letters
@code{L}, @code{l}, @code{U}, @code{u}, or @code{E}.
The meaning is as follows:
 1434: 
 1435: @table @code
 1436: @item \L
 1437: @c Turn the replacement
 1438: @c to lowercase until a @code{\U} or @code{\E} is found,
 1439: @c 
Turn the replacement
to lowercase until a @code{\U} or @code{\E} is found,
 1441: 
 1442: @item \l
 1443: @c Turn the
 1444: @c next character to lowercase,
 1445: @c 
Turn the
next character to lowercase,
 1447: 
 1448: @item \U
 1449: @c Turn the replacement to uppercase
 1450: @c until a @code{\L} or @code{\E} is found,
 1451: @c 
Turn the replacement to uppercase
until a @code{\L} or @code{\E} is found,
 1453: 
 1454: @item \u
 1455: @c Turn the next character
 1456: @c to uppercase,
 1457: @c 
Turn the next character
to uppercase,
 1459: 
 1460: @item \E
 1461: @c Stop case conversion started by @code{\L} or @code{\U}.
 1462: @c 
Stop case conversion started by @code{\L} or @code{\U}.
 1464: @end table
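
The case-conversion sequences can be combined within one replacement.
The following sketch assumes GNU @command{sed}, since these sequences are
an extension; the sample text and variable names are assumptions.

```shell
# \U upper-cases until \E; \u upper-cases only the next character.
upper=$(echo 'hello world' | sed 's/\(hello\) \(world\)/\U\1\E \u\2/')

# \L lower-cases the rest of the replacement.
lower=$(echo 'HELLO' | sed 's/.*/\L&/')

printf '%s\n' "$upper" "$lower"
```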
 1465: 
 1466: @c To include a literal @code{\}, @code{&}, or newline in the final
 1467: @c replacement, be sure to precede the desired @code{\}, @code{&},
 1468: @c or newline in the @var{replacement} with a @code{\}.
 1469: @c 
To include a literal @code{\}, @code{&}, or newline in the final
replacement, be sure to precede the desired @code{\}, @code{&},
or newline in the @var{replacement} with a @code{\}.
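
As a rough illustration of these three escapes (the inputs and variable
names are invented; note that the scripts are single-quoted so that the
shell passes the backslashes through unchanged):

```shell
# \& inserts a literal ampersand instead of the matched text.
amp=$(echo 'a' | sed 's/a/\&/')

# \\ inserts a literal backslash.
backslash=$(echo 'a b' | sed 's/ /\\/')

# A backslash followed by a real newline inserts a newline.
split=$(echo 'a b' | sed 's/ /\
/')

printf '%s\n' "$amp" "$backslash" "$split"
```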
 1473: 
 1474: @findex s command, option flags
 1475: @cindex Substitution of text, options
 1476: @c The @code{s} command can be followed by zero or more of the
 1477: @c following @var{flags}:
 1478: @c 
The @code{s} command can be followed by zero or more of the
following @var{flags}:
 1480: 
 1481: @table @code
 1482: @item g
 1483: @cindex Global substitution
 1484: @cindex Replacing all text matching regexp in a line
 1485: @c Apply the replacement to @emph{all} matches to the @var{regexp},
 1486: @c not just the first.
 1487: @c 
Apply the replacement to @emph{all} matches to the @var{regexp},
not just the first.
 1489: 
 1490: @item @var{number}
 1491: @cindex Replacing only @var{n}th match of regexp in a line
 1492: @c Only replace the @var{number}th match of the @var{regexp}.
 1493: @c 
Only replace the @var{number}th match of the @var{regexp}.
 1495: 
 1496: @cindex @acronym{GNU} extensions, @code{g} and @var{number} modifier interaction in @code{s} command
 1497: @cindex Mixing @code{g} and @var{number} modifiers in the @code{s} command
 1498: @c Note: the @sc{posix} standard does not specify what should happen
 1499: @c when you mix the @code{g} and @var{number} modifiers,
 1500: @c and currently there is no widely agreed upon meaning
 1501: @c across @command{sed} implementations.
 1502: @c For @value{SSED}, the interaction is defined to be:
 1503: @c ignore matches before the @var{number}th,
 1504: @c and then match and replace all matches from
 1505: @c the @var{number}th on.
 1506: @c 
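Note: the @sc{posix} standard does not specify what should happen
when you mix the @code{g} and @var{number} modifiers,
and currently there is no widely agreed upon meaning
across @command{sed} implementations.
For @value{SSED}, the interaction is defined to be:
ignore matches before the @var{number}th,
and then match and replace all matches from
the @var{number}th on.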
 1512: 
 1513: @item p
 1514: @cindex Text, printing after substitution
 1515: @c If the substitution was made, then print the new pattern space.
 1516: @c 
If the substitution was made, then print the new pattern space.
 1518: 
 1519: @c Note: when both the @code{p} and @code{e} options are specified,
 1520: @c the relative ordering of the two produces very different results.
 1521: @c In general, @code{ep} (evaluate then print) is what you want,
 1522: @c but operating the other way round can be useful for debugging.
 1523: @c For this reason, the current version of @value{SSED} interprets
 1524: @c specially the presence of @code{p} options both before and after
 1525: @c @code{e}, printing the pattern space before and after evaluation,
 1526: @c while in general flags for the @code{s} command show their
 1527: @c effect just once.  This behavior, although documented, might
 1528: @c change in future versions.
 1529: @c 
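 1530: Note: when both the @code{p} and @code{e} options are specified, their
 1531: relative order produces very different results.  In general, @code{ep}
 1532: (evaluate, then print) is what you want, but the opposite order can be
 1533: useful for debugging.  For this reason, the current version of
 1534: @value{SSED} treats @code{p} options given both before and after
 1535: @code{e} specially, printing the pattern space before and after
 1536: evaluation, whereas flags for the @code{s} command normally take effect
 1537: only once.  This behavior is documented, but it might change in future versions.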
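The @code{p} flag is most often paired with the @option{-n} option, so that only the lines on which a substitution happened are printed.  A minimal sketch with made-up input:

```shell
# -n suppresses automatic printing; the p flag prints only changed lines.
changed=$(printf 'foo 1\nbaz\nfoo 2\n' | sed -n 's/foo/bar/p')
printf '%s\n' "$changed"
```

The unchanged line @samp{baz} does not appear in the output because no substitution was made on it.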
 1538: 
 1539: @item w @var{file-name}
 1540: @cindex Text, writing to a file after substitution
 1541: @cindex @value{SSEDEXT}, @file{/dev/stdout} file
 1542: @cindex @value{SSEDEXT}, @file{/dev/stderr} file
 1543: @c If the substitution was made, then write out the result to the named file.
 1544: @c As a @value{SSED} extension, two special values of @var{file-name} are
 1545: @c supported: @file{/dev/stderr}, which writes the result to the standard
 1546: @c error, and @file{/dev/stdout}, which writes to the standard
 1547: @c output.@footnote{This is equivalent to @code{p} unless the @option{-i}
 1548: @c option is being used.}
 1549: @c 
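 1550: If the substitution was made, write the result to the named file.
 1551: As a @value{SSED} extension, two special values of @var{file-name} are
 1552: supported: @file{/dev/stderr}, which writes the result to standard error,
 1553: and @file{/dev/stdout}, which writes to standard output.@footnote{This is
 1554: equivalent to @code{p} unless the @option{-i} option is being used.}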
 1555: 
 1556: @item e
 1557: @cindex Evaluate Bourne-shell commands, after substitution
 1558: @cindex Subprocesses
 1559: @cindex @value{SSEDEXT}, evaluating Bourne-shell commands
 1560: @cindex @value{SSEDEXT}, subprocesses
 1561: @c This command allows one to pipe input from a shell command
 1562: @c into pattern space.  If a substitution was made, the command
 1563: @c that is found in pattern space is executed and pattern space
 1564: @c is replaced with its output.  A trailing newline is suppressed;
 1565: @c results are undefined if the command to be executed contains
 1566: @c a @sc{nul} character.  This is a @value{SSED} extension.
 1567: @c 
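 1568: This flag lets you pipe input from a shell command into the pattern
 1569: space.  If a substitution was made, the command found in the pattern
 1570: space is executed and the pattern space is replaced with its output.
 1571: A trailing newline is suppressed; the results are undefined if the command
 1572: to be executed contains a @sc{nul} character.  This is a @value{SSED} extension.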
 1573: 
 1574: @item I
 1575: @itemx i
 1576: @cindex @acronym{GNU} extensions, @code{I} modifier
 1577: @cindex Case-insensitive matching
 1578: @ifset PERL
 1579: @cindex Perl-style regular expressions, case-insensitive
 1580: @end ifset
 1581: @c The @code{I} modifier to regular-expression matching is a @acronym{GNU}
 1582: @c extension which makes @command{sed} match @var{regexp} in a
 1583: @c case-insensitive manner.
 1584: @c 
 1585: The @code{I} modifier to regular-expression matching is a @acronym{GNU}
 1586: extension that makes @command{sed} match @var{regexp} case-insensitively.
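As a quick illustration, the @code{I} flag can be combined with @code{g}.  The sketch assumes GNU @command{sed}; the input string is made up.

```shell
# I makes the match case-insensitive; g applies it to every match.
folded=$(printf 'Hello HELLO hello\n' | sed 's/hello/X/Ig')
printf '%s\n' "$folded"
```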
 1587: 
 1588: @item M
 1589: @itemx m
 1590: @cindex @value{SSEDEXT}, @code{M} modifier
 1591: @ifset PERL
 1592: @cindex Perl-style regular expressions, multiline
 1593: @end ifset
 1594: @c The @code{M} modifier to regular-expression matching is a @value{SSED}
 1595: @c extension which causes @code{^} and @code{$} to match respectively
 1596: @c (in addition to the normal behavior) the empty string after a newline,
 1597: @c and the empty string before a newline.  There are special character
 1598: @c sequences
 1599: @c 
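 1600: The @code{M} modifier is a @value{SSED} extension that makes @code{^} and
 1601: @code{$} also match, respectively, the empty string after a newline and the
 1602: empty string before a newline.  These special character sequences also exist: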
 1603: 
 1604: @ifset PERL
 1605: @c (@code{\A} and @code{\Z} in Perl mode, @code{\`} and @code{\'}
 1606: @c in basic or extended regular expression modes)
 1607: @c 
 1608: (@code{\A} and @code{\Z} in Perl mode, @code{\`} and @code{\'} in basic
 1609: or extended regular expression modes)
 1610: @c 
 1611: @end ifset
 1612: @ifclear PERL
 1613: @c (@code{\`} and @code{\'})
 1614: @c 
 1615: (@code{\`} and @code{\'})
 1616: @end ifclear
 1617: @c which always match the beginning or the end of the buffer.
 1618: @c @code{M} stands for @cite{multi-line}.
 1619: @c 
 1620: These sequences always match the beginning or the end of the buffer.
 1621: @code{M} stands for @cite{multi-line}.
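The effect of @code{M} only shows when the pattern space holds more than one line.  The sketch below assumes GNU @command{sed} and uses @code{N} to load two lines first; the input and the @samp{- } prefix are arbitrary choices.

```shell
# Without M, ^ matches only at the start of the pattern space.
plain=$(printf 'one\ntwo\n' | sed 'N;s/^/- /g')
# With M, ^ also matches right after the embedded newline.
multi=$(printf 'one\ntwo\n' | sed 'N;s/^/- /Mg')
printf '%s\n' "$plain" "$multi"
```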
 1622: 
 1623: @ifset PERL
 1624: @item S
 1625: @itemx s
 1626: @cindex @value{SSEDEXT}, @code{S} modifier
 1627: @cindex Perl-style regular expressions, single line
 1628: @c The @code{S} modifier to regular-expression matching is only valid
 1629: @c in Perl mode and specifies that the dot character (@code{.}) will
 1630: @c match the newline character too.  @code{S} stands for @cite{single-line}.
 1631: @c 
 1632: The @code{S} modifier to regular-expression matching is valid only in
 1633: Perl mode and makes the dot character (@code{.}) match the newline
 1634: character too.  @code{S} stands for @cite{single-line}.
 1635: @end ifset
 1636: 
 1637: @ifset PERL
 1638: @item X
 1639: @itemx x
 1640: @cindex @value{SSEDEXT}, @code{X} modifier
 1641: @cindex Perl-style regular expressions, extended
 1642: @c The @code{X} modifier to regular-expression matching is also
 1643: @c valid in Perl mode only.  If it is used, whitespace in the
 1644: @c pattern (other than in a character class) and
 1645: @c characters between a @kbd{#} outside a character class and the
 1646: @c next newline character are ignored. An escaping backslash
 1647: @c can be used to include a whitespace or @kbd{#} character as part
 1648: @c of the pattern.
 1649: @c 
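 1650: The @code{X} modifier to regular-expression matching is also valid only
 1651: in Perl mode.  When it is used, whitespace in the pattern (other than in
 1652: a character class) and the characters between a @kbd{#} outside a
 1653: character class and the next newline are ignored.  An escaping backslash
 1654: can be used to include a whitespace or @kbd{#} character in the pattern.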
 1655: @end ifset
 1656: @end table
 1657: 
 1658: 
 1659: @node Other Commands
 1660: @c @section Less Frequently-Used Commands
 1661: @section Less Frequently-Used Commands
 1662: 
 1663: @c Though perhaps less frequently used than those in the previous
 1664: @c section, some very small yet useful @command{sed} scripts can be built with
 1665: @c these commands.
 1666: @c 
 1667: Though perhaps used less frequently than those in the previous section,
 1668: these commands can be used to build some very small yet useful
 1669: @command{sed} scripts.
 1670: 
 1671: @table @code
 1672: @item y/@var{source-chars}/@var{dest-chars}/
 1673: @c (The @code{/} characters may be uniformly replaced by
 1674: @c any other single character within any given @code{y} command.)
 1675: @c 
 1676: (Within any given @code{y} command, the @code{/} characters may be
 1677: uniformly replaced by any other single character.)
 1678: 
 1679: @findex y (transliterate) command
 1680: @cindex Transliteration
 1681: @c Transliterate any characters in the pattern space which match
 1682: @c any of the @var{source-chars} with the corresponding character
 1683: @c in @var{dest-chars}.
 1684: @c 
 1685: Transliterate each character in the pattern space that matches one of the
 1686: @var{source-chars} into the corresponding character in @var{dest-chars}.
 1687: 
 1688: @c Instances of the @code{/} (or whatever other character is used in its stead),
 1689: @c @code{\}, or newlines can appear in the @var{source-chars} or @var{dest-chars}
 1690: @c lists, provide that each instance is escaped by a @code{\}.
 1691: @c The @var{source-chars} and @var{dest-chars} lists @emph{must}
 1692: @c contain the same number of characters (after de-escaping).
 1693: @c 
 1694: Instances of @code{/} (or whatever character is used in its stead),
 1695: @code{\}, or newlines can appear in the @var{source-chars} or
 1696: @var{dest-chars} lists, provided that each instance is escaped by a @code{\}.
 1697: The @var{source-chars} and @var{dest-chars} lists @emph{must} contain
 1698: the same number of characters (after de-escaping).
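Two short illustrations of @code{y}, with made-up input: a one-to-one mapping of lowercase to uppercase letters, and an alternative delimiter (@code{,}) that avoids escaping @code{/}.

```shell
# Each character in the first list maps to the one at the same position
# in the second list; both lists must have the same length.
upper=$(printf 'hello\n' | sed 'y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/')
# Using , as the delimiter so that / needs no backslash.
piped=$(printf 'a/b/c\n' | sed 'y,/,|,')
printf '%s\n' "$upper" "$piped"
```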
 1699: 
 1700: @item a\
 1701: @itemx @var{text}
 1702: @cindex @value{SSEDEXT}, two addresses supported by most commands
 1703: @c As a @acronym{GNU} extension, this command accepts two addresses.
 1704: @c 
 1705: As a @acronym{GNU} extension, this command accepts two addresses.
 1706: 
 1707: @findex a (append text lines) command
 1708: @cindex Appending text after a line
 1709: @cindex Text, appending
 1710: @c Queue the lines of text which follow this command
 1711: @c (each but the last ending with a @code{\},
 1712: @c which are removed from the output)
 1713: @c to be output at the end of the current cycle,
 1714: @c or when the next input line is read.
 1715: @c 
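 1716: Queue the lines of text that follow this command (each but the last
 1717: ending with a @code{\}, which is removed from the output) to be output
 1718: at the end of the current cycle or when the next input line is read.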
 1719: 
 1720: @c Escape sequences in @var{text} are processed, so you should
 1721: @c use @code{\\} in @var{text} to print a single backslash.
 1722: @c 
 1723: Escape sequences in @var{text} are processed, so use @code{\\} in
 1724: @var{text} to print a single backslash.
 1725: 
 1726: @c As a @acronym{GNU} extension, if between the @code{a} and the newline there is
 1727: @c other than a whitespace-@code{\} sequence, then the text of this line,
 1728: @c starting at the first non-whitespace character after the @code{a},
 1729: @c is taken as the first line of the @var{text} block.
 1730: @c (This enables a simplification in scripting a one-line add.)
 1731: @c This extension also works with the @code{i} and @code{c} commands.
 1732: @c 
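 1733: As a @acronym{GNU} extension, if anything other than a
 1734: whitespace-@code{\} sequence lies between the @code{a} and the newline,
 1735: the text of that line, from the first non-whitespace character after the
 1736: @code{a}, is taken as the first line of the @var{text} block.  (This simplifies
 1737: scripting a one-line add.)  The @code{i} and @code{c} commands support this too.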
 1738: 
 1739: @item i\
 1740: @itemx @var{text}
 1741: @cindex @value{SSEDEXT}, two addresses supported by most commands
 1742: @c As a @acronym{GNU} extension, this command accepts two addresses.
 1743: @c 
 1744: As a @acronym{GNU} extension, this command accepts two addresses.
 1745: 
 1746: @findex i (insert text lines) command
 1747: @cindex Inserting text before a line
 1748: @cindex Text, insertion
 1749: @c Immediately output the lines of text which follow this command
 1750: @c (each but the last ending with a @code{\},
 1751: @c which are removed from the output).
 1752: @c 
 1753: Immediately output the lines of text that follow this command (each but
 1754: the last ending with a @code{\}, which is removed from the output).
 1755: 
 1756: @item c\
 1757: @itemx @var{text}
 1758: @findex c (change to text lines) command
 1759: @cindex Replacing selected lines with other text
 1760: @c Delete the lines matching the address or address-range,
 1761: @c and output the lines of text which follow this command
 1762: @c (each but the last ending with a @code{\},
 1763: @c which are removed from the output)
 1764: @c in place of the last line
 1765: @c (or in place of each line, if no addresses were specified).
 1766: @c A new cycle is started after this command is done,
 1767: @c since the pattern space will have been deleted.
 1768: @c 
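 1769: Delete the lines matching the address or address range, and output the
 1770: lines of text that follow this command (each but the last ending with a
 1771: @code{\}, which is removed from the output) in place of the last line (or
 1772: in place of each line, if no addresses were specified).  Because the pattern
 1773: space has been deleted, a new cycle starts once this command is done.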
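The @code{a}, @code{i} and @code{c} commands can be compared side by side.  This sketch assumes GNU @command{sed}: it uses the one-line form described above for @code{a}, and adds the traditional backslash-newline form for comparison.  The two-line input is made up.

```shell
# One-line GNU forms: the text starts right after the command letter.
appended=$(printf 'one\ntwo\n' | sed '1a after one')
inserted=$(printf 'one\ntwo\n' | sed '2i before two')
changed=$(printf 'one\ntwo\n' | sed '2c replaced')
# Traditional form: a\ followed by the text on the next line.
portable=$(printf 'one\ntwo\n' | sed '1a\
after one')
printf '%s\n' "$appended" "$inserted" "$changed" "$portable"
```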
 1774: 
 1775: @item =
 1776: @cindex @value{SSEDEXT}, two addresses supported by most commands
 1777: @c As a @acronym{GNU} extension, this command accepts two addresses.
 1778: @c 
 1779: As a @acronym{GNU} extension, this command accepts two addresses.
 1780: 
 1781: @findex = (print line number) command
 1782: @cindex Printing line number
 1783: @cindex Line number, printing
 1784: @c Print out the current input line number (with a trailing newline).
 1785: @c 
 1786: Print the current input line number (followed by a newline).
 1787: 
 1788: @item l @var{n}
 1789: @findex l (list unambiguously) command
 1790: @cindex List pattern space
 1791: @cindex Printing text unambiguously
 1792: @cindex Line length, setting
 1793: @cindex @value{SSEDEXT}, setting line length
 1794: @c Print the pattern space in an unambiguous form:
 1795: @c non-printable characters (and the @code{\} character)
 1796: @c are printed in C-style escaped form; long lines are split,
 1797: @c with a trailing @code{\} character to indicate the split;
 1798: @c the end of each line is marked with a @code{$}.
 1799: @c 
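 1800: Print the pattern space in an unambiguous form: non-printable characters
 1801: (and the @code{\} character) are printed in C-style escaped form; long lines
 1802: are split, with a trailing @code{\} to mark the split; the end of each
 1803: line is marked with a @code{$}.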
 1804: 
 1805: @c @var{n} specifies the desired line-wrap length;
 1806: @c a length of 0 (zero) means to never wrap long lines.  If omitted,
 1807: @c the default as specified on the command line is used.  The @var{n}
 1808: @c parameter is a @value{SSED} extension.
 1809: @c 
 1810: @var{n} specifies the desired line-wrap length; a length of 0 (zero)
 1811: means never wrap long lines.  If omitted, the default given on the
 1812: command line is used.  The @var{n} parameter is a @value{SSED}
 1813: extension.
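Both @code{=} and @code{l} are handy for inspecting input.  A minimal sketch with made-up input: @code{$=} prints the number of the last line (a line count), and @code{l} shows a tab as @samp{\t} and marks the line end with @samp{$}.

```shell
# Count lines: print the line number only on the last line.
count=$(printf 'x\ny\n' | sed -n '$=')
# Make the embedded tab visible.
listed=$(printf 'a\tb\n' | sed -n 'l')
printf '%s\n' "$count" "$listed"
```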
 1814: 
 1815: @item r @var{filename}
 1816: @cindex @value{SSEDEXT}, two addresses supported by most commands
 1817: @c As a @acronym{GNU} extension, this command accepts two addresses.
 1818: @c 
 1819: As a @acronym{GNU} extension, this command accepts two addresses.
 1820: 
 1821: @findex r (read file) command
 1822: @cindex Read text from a file
 1823: @cindex @value{SSEDEXT}, @file{/dev/stdin} file
 1824: @c Queue the contents of @var{filename} to be read and
 1825: @c inserted into the output stream at the end of the current cycle,
 1826: @c or when the next input line is read.
 1827: @c Note that if @var{filename} cannot be read, it is treated as
 1828: @c if it were an empty file, without any error indication.
 1829: @c 
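 1830: Queue the contents of @var{filename} to be read and inserted into the
 1831: output stream at the end of the current cycle or when the next input line
 1832: is read.  Note that if @var{filename} cannot be read, it is treated as if
 1833: it were an empty file, without any error indication.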
 1834: 
 1835: @c As a @value{SSED} extension, the special value @file{/dev/stdin}
 1836: @c is supported for the file name, which reads the contents of the
 1837: @c standard input.
 1838: @c 
 1839: As a @value{SSED} extension, the special value @file{/dev/stdin} is
 1840: supported for the file name; it reads the contents of the standard input.
 1841: 
 1842: @item w @var{filename}
 1843: @findex w (write file) command
 1844: @cindex Write to a file
 1845: @cindex @value{SSEDEXT}, @file{/dev/stdout} file
 1846: @cindex @value{SSEDEXT}, @file{/dev/stderr} file
 1847: @c Write the pattern space to @var{filename}.
 1848: @c As a @value{SSED} extension, two special values of @var{file-name} are
 1849: @c supported: @file{/dev/stderr}, which writes the result to the standard
 1850: @c error, and @file{/dev/stdout}, which writes to the standard
 1851: @c output.@footnote{This is equivalent to @code{p} unless the @option{-i}
 1852: @c option is being used.}
 1853: @c 
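 1854: Write the pattern space to @var{filename}.  As a @value{SSED} extension,
 1855: two special values of @var{filename} are supported:
 1856: @file{/dev/stderr}, which writes the result to standard error, and
 1857: @file{/dev/stdout}, which writes to standard output.@footnote{This is
 1858: equivalent to @code{p} unless the @option{-i} option is being used.}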
 1859: 
 1860: @c The file will be created (or truncated) before the
 1861: @c first input line is read; all @code{w} commands
 1862: @c (including instances of @code{w} flag on successful @code{s} commands)
 1863: @c which refer to the same @var{filename} are output without
 1864: @c closing and reopening the file.
 1865: @c 
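 1866: The file is created (or truncated) before the first input line is read;
 1867: all @code{w} commands (including instances of the @code{w} flag on
 1868: successful @code{s} commands) that refer to the same @var{filename} write
 1869: their output without closing and reopening the file.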
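A small sketch of @code{w}, with made-up input.  It assumes @command{mktemp} is available to create a scratch file; because the file name runs to the end of the line, @code{w} is placed last in the script.

```shell
log=$(mktemp)                       # scratch file, removed below
# All lines are still printed; matching lines are also written to $log.
shown=$(printf 'keep 1\ndrop\nkeep 2\n' | sed "/keep/w $log")
saved=$(cat "$log")
rm -f "$log"
printf '%s\n' "$saved"
```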
 1870: 
 1871: @item D
 1872: @findex D (delete first line) command
 1873: @cindex Delete first line from pattern space
 1874: @c Delete text in the pattern space up to the first newline.
 1875: @c If any text is left, restart cycle with the resultant
 1876: @c pattern space (without reading a new line of input),
 1877: @c otherwise start a normal new cycle.
 1878: @c 
 1879: Delete text in the pattern space up to the first newline.
 1880: If any text is left, restart the cycle with the resulting pattern
 1881: space (without reading a new line of input); otherwise, start a
 1882: normal new cycle.
 1883: 
 1884: @item N
 1885: @findex N (append Next line) command
 1886: @cindex Next input line, append to pattern space
 1887: @cindex Append next input line to pattern space
 1888: @c Add a newline to the pattern space,
 1889: @c then append the next line of input to the pattern space.
 1890: @c If there is no more input then @command{sed} exits without processing
 1891: @c any more commands.
 1892: @c 
 1893: Add a newline to the pattern space, then append the next line of input
 1894: to the pattern space.  If there is no more input, @command{sed} exits
 1895: without processing any more commands.
 1896: 
 1897: @item P
 1898: @findex P (print first line) command
 1899: @cindex Print first line from pattern space
 1900: @c Print out the portion of the pattern space up to the first newline.
 1901: @c 
 1902: Print the portion of the pattern space up to the first newline.
 1903: 
 1904: @item h
 1905: @findex h (hold) command
 1906: @cindex Copy pattern space into hold space
 1907: @cindex Replace hold space with copy of pattern space
 1908: @cindex Hold space, copying pattern space into
 1909: @c Replace the contents of the hold space with the contents of the pattern space.
 1910: @c 
 1911: Replace the contents of the hold space with the contents of the pattern space.
 1912: 
 1913: @item H
 1914: @findex H (append Hold) command
 1915: @cindex Append pattern space to hold space
 1916: @cindex Hold space, appending from pattern space
 1917: @c Append a newline to the contents of the hold space,
 1918: @c and then append the contents of the pattern space to that of the hold space.
 1919: @c 
 1920: Append a newline to the contents of the hold space, and then append the
 1921: contents of the pattern space to the hold space.
 1922: 
 1923: @item g
 1924: @findex g (get) command
 1925: @cindex Copy hold space into pattern space
 1926: @cindex Replace pattern space with copy of hold space
 1927: @cindex Hold space, copy into pattern space
 1928: @c Replace the contents of the pattern space with the contents of the hold space.
 1929: @c 
 1930: Replace the contents of the pattern space with the contents of the hold space.
 1931: 
 1932: @item G
 1933: @findex G (appending Get) command
 1934: @cindex Append hold space to pattern space
 1935: @cindex Hold space, appending to pattern space
 1936: @c Append a newline to the contents of the pattern space,
 1937: @c and then append the contents of the hold space to that of the pattern space.
 1938: @c 
 1939: Append a newline to the contents of the pattern space, and then append
 1940: the contents of the hold space to the pattern space.
 1941: 
 1942: @item x
 1943: @findex x (eXchange) command
 1944: @cindex Exchange hold space with pattern space
 1945: @cindex Hold space, exchange with pattern space
 1946: @c Exchange the contents of the hold and pattern spaces.
 1947: @c 
 1948: Exchange the contents of the hold and pattern spaces.
 1949: 
 1950: @end table
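The multi-line and hold-space commands are easiest to understand from small, well-known idioms.  The sketches below use made-up input: reversing the order of lines with @code{G} and @code{h}, joining lines in pairs with @code{N}, and dropping consecutive duplicate lines with @code{N}, @code{P} and @code{D}.

```shell
# Reverse line order: prepend each new line to what is held so far.
reversed=$(printf '1\n2\n3\n' | sed -n '1!G;h;$p')
# Join lines two at a time: append the next line, then replace the newline.
pairs=$(printf 'a\nb\nc\nd\n' | sed '$!N;s/\n/ /')
# Drop consecutive duplicates: print the first line only when the two
# lines in the pattern space differ, then delete it and restart.
unique=$(printf 'a\na\nb\n' | sed '$!N; /^\(.*\)\n\1$/!P; D')
printf '%s\n' "$reversed" "$pairs" "$unique"
```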
 1951: 
 1952: 
 1953: @node Programming Commands
 1954: @c @section Commands for @command{sed} gurus
 1955: @section Commands for @command{sed} gurus
 1956: 
 1957: @c In most cases, use of these commands indicates that you are
 1958: @c probably better off programming in something like @command{awk}
 1959: @c or Perl.  But occasionally one is committed to sticking
 1960: @c with @command{sed}, and these commands can enable one to write
 1961: @c quite convoluted scripts.
 1962: @c 
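 1963: In most cases, needing these commands suggests that you would be better
 1964: off programming in something like @command{awk} or Perl.  But occasionally
 1965: one is committed to sticking with @command{sed}, and these commands make
 1966: it possible to write quite convoluted scripts.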
 1967: 
 1968: @cindex Flow of control in scripts
 1969: @table @code
 1970: @item : @var{label}
 1971: @c [No addresses allowed.]
 1972: @c 
 1973: [No addresses allowed.]
 1974: 
 1975: @findex : (label) command
 1976: @cindex Labels, in scripts
 1977: @c Specify the location of @var{label} for branch commands.
 1978: @c In all other respects, a no-op.
 1979: @c 
 1980: Specify the location of @var{label} for branch commands.  In all other
 1981: respects, a no-op.
 1982: 
 1983: @item b @var{label}
 1984: @findex b (branch) command
 1985: @cindex Branch to a label, unconditionally
 1986: @cindex Goto, in scripts
 1987: @c Unconditionally branch to @var{label}.
 1988: @c The @var{label} may be omitted, in which case the next cycle is started.
 1989: @c 
 1990: Branch unconditionally to @var{label}.  The @var{label} may be omitted,
 1991: in which case the next cycle is started.
 1992: 
 1993: @item t @var{label}
 1994: @findex t (test and branch if successful) command
 1995: @cindex Branch to a label, if @code{s///} succeeded
 1996: @cindex Conditional branch
 1997: @c Branch to @var{label} only if there has been a successful @code{s}ubstitution
 1998: @c since the last input line was read or conditional branch was taken.
 1999: @c The @var{label} may be omitted, in which case the next cycle is started.
 2000: @c 
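 2001: Branch to @var{label} only if there has been a successful @code{s}ubstitution
 2002: since the last input line was read or a conditional branch was taken.
 2003: The @var{label} may be omitted, in which case the next cycle is started.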
 2004: 
 2005: @end table
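Labels and branches are typically used to build loops.  The sketches below use made-up input and the label names @code{again} and @code{loop}, which are arbitrary: the first uses @code{b} to gather all input lines before a single substitution, the second uses @code{t} to repeat a substitution until it no longer succeeds.

```shell
# Unconditional loop: keep appending lines until the last one is reached,
# then turn every embedded newline into a comma.
joined=$(printf 'a\nb\nc\n' | sed ':again
$!{
N
b again
}
s/\n/,/g')
# Conditional loop: squeeze runs of spaces, one pair at a time,
# branching back only while the substitution keeps succeeding.
squeezed=$(printf 'a    b\n' | sed ':loop
s/  / /
t loop')
printf '%s\n' "$joined" "$squeezed"
```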
 2006: 
 2007: @node Extended Commands
 2008: @c @section Commands Specific to @value{SSED}
 2009: @section Commands Specific to @value{SSED}
 2010: 
 2011: @c These commands are specific to @value{SSED}, so you
 2012: @c must use them with care and only when you are sure that
 2013: @c hindering portability is not evil.  They allow you to check
 2014: @c for @value{SSED} extensions or to do tasks that are required
 2015: @c quite often, yet are unsupported by standard @command{sed}s.
 2016: @c 
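 2017: These commands are specific to @value{SSED}, so use them with care and
 2018: only when you are sure that the loss of portability is acceptable.  They
 2019: let you check for @value{SSED} extensions or perform tasks that are needed
 2020: quite often yet are unsupported by standard @command{sed}s.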
 2021: 
 2022: @table @code
 2023: @item e [@var{command}]
 2024: @findex e (evaluate) command
 2025: @cindex Evaluate Bourne-shell commands
 2026: @cindex Subprocesses
 2027: @cindex @value{SSEDEXT}, evaluating Bourne-shell commands
 2028: @cindex @value{SSEDEXT}, subprocesses
 2029: @c This command allows one to pipe input from a shell command
 2030: @c into pattern space.  Without parameters, the @code{e} command
 2031: @c executes the command that is found in pattern space and
 2032: @c replaces the pattern space with the output; a trailing newline
 2033: @c is suppressed.
 2034: @c 
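 2035: This command lets you pipe input from a shell command into the pattern
 2036: space.  Without parameters, the @code{e} command executes the command
 2037: found in the pattern space and replaces the pattern space with its output;
 2038: a trailing newline is suppressed.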
 2039: 
 2040: @c If a parameter is specified, instead, the @code{e} command
 2041: @c interprets it as a command and sends its output to the output stream
 2042: @c (like @code{r} does).  The command can run across multiple
 2043: @c lines, all but the last ending with a back-slash.
 2044: @c 
 2045: If a parameter is specified, the @code{e} command instead interprets it
 2046: as a command and sends its output to the output stream (as @code{r}
 2047: does).  The command can span multiple lines, all but the last ending
 2048: with a backslash.
 2049: 
 2050: @c In both cases, the results are undefined if the command to be
 2051: @c executed contains a @sc{nul} character.
 2052: @c 
 2053: In both cases, the results are undefined if the command to be executed
 2054: contains a @sc{nul} character.
 2055: 
 2056: @item L @var{n}
 2057: @findex L (fLow paragraphs) command
 2058: @cindex Reformat pattern space
 2059: @cindex Reformatting paragraphs
 2060: @cindex @value{SSEDEXT}, reformatting paragraphs
 2061: @cindex @value{SSEDEXT}, @code{L} command
 2062: @c This @value{SSED} extension fills and joins lines in pattern space
 2063: @c to produce output lines of (at most) @var{n} characters, like
 2064: @c @code{fmt} does; if @var{n} is omitted, the default as specified
 2065: @c on the command line is used.  This command is considered a failed
 2066: @c experiment and unless there is enough request (which seems unlikely)
 2067: @c will be removed in future versions.
 2068: @c 
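 2069: This @value{SSED} extension fills and joins lines in the pattern space to
 2070: produce output lines of (at most) @var{n} characters, as @code{fmt} does;
 2071: if @var{n} is omitted, the default given on the command line is used.
 2072: This command is considered a failed experiment and, unless there is enough
 2073: demand (which seems unlikely), will be removed in future versions.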
 2074: 
 2075: @ignore
 2076: Blank lines, spaces between words, and indentation are
 2077: preserved in the output; successive input lines with different
 2078: indentation are not joined; tabs are expanded to 8 columns.
 2079: 
 2080: If the pattern space contains multiple lines, they are joined, but
 2081: since the pattern space usually contains a single line, the behavior
 2082: of a simple @code{L;d} script is the same as @samp{fmt -s} (i.e.,
 2083: it does not join short lines to form longer ones).
 2084: 
 2085: @var{n} specifies the desired line-wrap length; if omitted,
 2086: the default as specified on the command line is used.
 2087: @end ignore
 2088: 
 2089: @item Q [@var{exit-code}]
 2090: @c This command only accepts a single address.
 2091: @c 
 2092: This command accepts only a single address.
 2093: 
 2094: @findex Q (silent Quit) command
 2095: @cindex @value{SSEDEXT}, quitting silently
 2096: @cindex @value{SSEDEXT}, returning an exit code
 2097: @cindex Quitting
 2098: @c This command is the same as @code{q}, but will not print the
 2099: @c contents of pattern space.  Like @code{q}, it provides the
 2100: @c ability to return an exit code to the caller.
 2101: @c 
 2102: This command is the same as @code{q}, but does not print the contents of
 2103: the pattern space.  Like @code{q}, it can return an exit code to the caller.
 2104: 
 2105: @c This command can be useful because the only alternative ways
 2106: @c to accomplish this apparently trivial function are to use
 2107: @c the @option{-n} option (which can unnecessarily complicate
 2108: @c your script) or resorting to the following snippet, which
 2109: @c wastes time by reading the whole file without any visible effect:
 2110: @c 
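 2111: This command is useful because the only other ways to accomplish this
 2112: apparently trivial task are to use the @option{-n} option (which can
 2113: unnecessarily complicate your script) or to resort to the following snippet,
 2114: which wastes time reading the whole file without any visible effect: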
 2115: 
 2116: @example
 2117: :eat
 2118: $d       @i{Quit silently on the last line}
 2119: N        @i{Read another line, silently}
 2120: g        @i{Overwrite pattern space each time to save memory}
 2121: b eat
 2122: @end example
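For comparison, here is a minimal sketch of @code{Q} next to @code{q}, assuming GNU @command{sed}.  The input and the exit code 5 are arbitrary; the status is captured with @code{||} so that the sketch also runs under @samp{set -e}.

```shell
quiet_status=0
loud_status=0
# Q quits before printing line 2; q prints line 2 and then quits.
quiet=$(printf '1\n2\n3\n' | sed '2Q5') || quiet_status=$?
loud=$(printf '1\n2\n3\n' | sed '2q5') || loud_status=$?
printf '%s\n' "$quiet" "$quiet_status" "$loud" "$loud_status"
```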
 2123: 
 2124: @item R @var{filename}
 2125: @findex R (read line) command
 2126: @cindex Read text from a file
 2127: @cindex @value{SSEDEXT}, reading a file a line at a time
 2128: @cindex @value{SSEDEXT}, @code{R} command
 2129: @cindex @value{SSEDEXT}, @file{/dev/stdin} file
 2130: @c Queue a line of @var{filename} to be read and
 2131: @c inserted into the output stream at the end of the current cycle,
 2132: @c or when the next input line is read.
 2133: @c Note that if @var{filename} cannot be read, or if its end is
 2134: @c reached, no line is appended, without any error indication.
 2135: @c 
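 2136: Queue a line of @var{filename} to be read and inserted into the output
 2137: stream at the end of the current cycle or when the next input line is read.
 2138: Note that if @var{filename} cannot be read, or if its end has been
 2139: reached, no line is appended, and no error is reported.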
 2140: 
 2141: @c As with the @code{r} command, the special value @file{/dev/stdin}
 2142: @c is supported for the file name, which reads a line from the
 2143: @c standard input.
 2144: @c 
 2145: As with the @code{r} command, the special value @file{/dev/stdin} is
 2146: supported for the file name; it reads a line from the standard input.
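Because @code{R} consumes one line per invocation, it can interleave two inputs.  The sketch assumes GNU @command{sed} and @command{mktemp}; the file contents are made up.  The second file runs out after two lines, so nothing is appended after the third input line.

```shell
extra=$(mktemp)                     # scratch file, removed below
printf 'x\ny\n' > "$extra"
# After each input line, append the next unread line of $extra.
merged=$(printf '1\n2\n3\n' | sed "R $extra")
rm -f "$extra"
printf '%s\n' "$merged"
```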
 2147: 
 2148: @item T @var{label}
 2149: @findex T (test and branch if failed) command
 2150: @cindex @value{SSEDEXT}, branch if @code{s///} failed
 2151: @cindex Branch to a label, if @code{s///} failed
 2152: @cindex Conditional branch
 2153: @c Branch to @var{label} only if there have been no successful
 2154: @c @code{s}ubstitutions since the last input line was read or
 2155: @c conditional branch was taken. The @var{label} may be omitted,
 2156: @c in which case the next cycle is started.
 2157: @c 
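 2158: Branch to @var{label} only if there have been no successful @code{s}ubstitutions
 2159: since the last input line was read or a conditional branch was taken.
 2160: The @var{label} may be omitted, in which case the next cycle is started.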
 2161: 
 2162: @item v @var{version}
 2163: @findex v (version) command
 2164: @cindex @value{SSEDEXT}, checking for their presence
 2165: @cindex Requiring @value{SSED}
 2166: @c This command does nothing, but makes @command{sed} fail if
 2167: @c @value{SSED} extensions are not supported, simply because other
 2168: @c versions of @command{sed} do not implement it.  In addition, you
 2169: @c can specify the version of @command{sed} that your script
 2170: @c requires, such as @code{4.0.5}.  The default is @code{4.0}
 2171: @c because that is the first version that implemented this command.
 2172: @c 
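 2173: This command does nothing, but it makes @command{sed} fail if
 2174: @value{SSED} extensions are not supported, simply because other versions
 2175: of @command{sed} do not implement it.  In addition, you can specify the
 2176: version of @command{sed} that your script requires, such as @code{4.0.5}.
 2177: The default is @code{4.0} because that is the first version that
 2178: implemented this command.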
 2179: 
 2180: @c This command enables all @value{SSEDEXT} even if
 2181: @c @env{POSIXLY_CORRECT} is set in the environment.
 2182: @c 
 2183: This command enables all @value{SSEDEXT} even if @env{POSIXLY_CORRECT}
 2184: is set in the environment.
 2185: 
 2186: @item W @var{filename}
 2187: @findex W (write first line) command
 2188: @cindex Write first line to a file
 2189: @cindex @value{SSEDEXT}, writing first line to a file
 2190: @c Write to the given filename the portion of the pattern space up to
 2191: @c the first newline.  Everything said under the @code{w} command about
 2192: @c file handling holds here too.
 2193: @c 
 2194: Write to the given @var{filename} the portion of the pattern space up to the
 2195: first newline.  Everything said about file handling under the @code{w} command applies here too.
 2196: @end table
 2197: 
 2198: @node Escapes
 2199: @c @section @acronym{GNU} Extensions for Escapes in Regular Expressions
 2200: @section @acronym{GNU} Extensions for Escapes in Regular Expressions
 2201: 
 2202: @cindex @acronym{GNU} extensions, special escapes
 2203: @c Until this chapter, we have only encountered escapes of the form
 2204: @c @samp{\^}, which tell @command{sed} not to interpret the circumflex
 2205: @c as a special character, but rather to take it literally.  For
 2206: @c example, @samp{\*} matches a single asterisk rather than zero
 2207: @c or more backslashes.
 2208: @c 
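 2209: Until this chapter, we have only encountered escapes of the form
 2210: @samp{\^}, which tell @command{sed} not to interpret the circumflex as a
 2211: special character but to take it literally.  For example, @samp{\*}
 2212: matches a single asterisk rather than zero or more backslashes.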
 2213: 
 2214: @cindex @code{POSIXLY_CORRECT} behavior, escapes
 2215: @c This chapter introduces another kind of escape@footnote{All
 2216: @c the escapes introduced here are @acronym{GNU}
 2217: @c extensions, with the exception of @code{\n}.  In basic regular
 2218: @c expression mode, setting @code{POSIXLY_CORRECT} disables them inside
 2219: @c bracket expressions.}---that
 2220: @c is, escapes that are applied to a character or sequence of characters
 2221: @c that ordinarily are taken literally, and that @command{sed} replaces
 2222: @c with a special character.  This provides a way
 2223: @c of encoding non-printable characters in patterns in a visible manner.
 2224: @c There is no restriction on the appearance of non-printing characters
 2225: @c in a @command{sed} script but when a script is being prepared in the
 2226: @c shell or by text editing, it is usually easier to use one of
 2227: @c the following escape sequences than the binary character it
 2228: @c represents:
 2229: @c 
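 2230: This chapter introduces another kind of escape@footnote{All the escapes
 2231: introduced here are @acronym{GNU} extensions, with the exception of
 2232: @code{\n}.  In basic regular expression mode, setting
 2233: @code{POSIXLY_CORRECT} disables them inside bracket expressions.}: one
 2234: applied to a character or sequence of characters that would ordinarily be
 2235: taken literally, which @command{sed} replaces with a special character.
 2236: This provides a way to encode non-printable characters in patterns visibly.
 2237: Non-printing characters may appear freely in a @command{sed} script, but
 2238: when a script is prepared in the shell or by text editing, it is usually easier
 2239: to use one of the following escape sequences than the binary character itself.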
 2240: 
 2241: @c The list of these escapes is:
 2242: @c 
 2243: The list of these escapes is:
 2244: 
 2245: @table @code
 2246: @item \a
 2247: @c Produces or matches a @sc{bel} character, that is an ``alert'' (@sc{ascii} 7).
 2248: @c 
 2249: Produces or matches a @sc{bel} character, that is, an ``alert''
 2250: (@sc{ascii} 7).
 2251: 
 2252: @item \f
 2253: @c Produces or matches a form feed (@sc{ascii} 12).
 2254: @c 
 2255: Produces or matches a form feed (@sc{ascii} 12).
 2256: 
 2257: @item \n
 2258: @c Produces or matches a newline (@sc{ascii} 10).
 2259: @c 
 2260: Produces or matches a newline (@sc{ascii} 10).
 2261: 
 2262: @item \r
 2263: @c Produces or matches a carriage return (@sc{ascii} 13).
 2264: @c 
 2265: Produces or matches a carriage return (@sc{ascii} 13).
 2266: 
 2267: @item \t
 2268: @c Produces or matches a horizontal tab (@sc{ascii} 9).
 2269: @c 
 2270: Produces or matches a horizontal tab (@sc{ascii} 9).
 2271: 
 2272: @item \v
 2273: @c Produces or matches a so called ``vertical tab'' (@sc{ascii} 11).
 2274: @c 
 2275: Produces or matches a so-called ``vertical tab''
 2276: (@sc{ascii} 11).
 2277: 
 2278: @item \c@var{x}
 2279: @c Produces or matches @kbd{@sc{Control}-@var{x}}, where @var{x} is
 2280: @c any character.  The precise effect of @samp{\c@var{x}} is as follows:
 2281: @c if @var{x} is a lower case letter, it is converted to upper case.
 2282: @c Then bit 6 of the character (hex 40) is inverted.  Thus @samp{\cz} becomes
 2283: @c hex 1A, but @samp{\c@{} becomes hex 3B, while @samp{\c;} becomes hex 7B.
 2284: @c 
 2285: Produces or matches @kbd{@sc{Control}-@var{x}}, where @var{x} is any
 2286: character.  The precise effect of @samp{\c@var{x}} is as follows: if
 2287: @var{x} is a lowercase letter, it is converted to upper case; then bit 6
 2288: of the character (hex 40) is inverted.  Thus @samp{\cz} becomes hex 1A,
 2289: but @samp{\c@{} becomes hex 3B, while @samp{\c;} becomes hex 7B.
 2290: 
 2291: @item \d@var{xxx}
 2292: @c Produces or matches a character whose decimal @sc{ascii} value is @var{xxx}.
 2293: @c 
 2294: 十進数の@sc{ascii}値が@var{xxx}の文字を生成またはそれにマッチします.
 2295: 
 2296: @item \o@var{xxx}
 2297: @ifset PERL
 2298: @item \@var{xxx}
 2299: @end ifset
 2300: @c Produces or matches a character whose octal @sc{ascii} value is @var{xxx}.
 2301: @c 
 2302: 八進数の@sc{ascii}値が@var{xxx}の文字を生成またはそれにマッチします.
 2303: 
 2304: @ifset PERL
 2305: @c The syntax without the @code{o} is active in Perl mode, while the one
 2306: @c with the @code{o} is active in the normal or extended @sc{posix} regular
 2307: @c expression modes.
 2308: @c 
 2309: @code{o}を用いない構文はPerlモードで動作しますが,@code{o}を用いたもの
 2310: は通常の,または拡張された@sc{posix}正規表現モードで動作します.
 2311: @end ifset
 2312: 
 2313: @item \x@var{xx}
 2314: @c Produces or matches a character whose hexadecimal @sc{ascii} value is @var{xx}.
 2315: @c 
 2316: 16進数の@sc{ascii}値が@var{xx}の文字を生成またはそれにマッチします.
 2317: @end table
 2318: 
@samp{\b} (backspace) was omitted because of the conflict with
the existing ``word boundary'' meaning.
 2324: 
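As a brief illustration of the escapes listed above, the following
sketch assumes @acronym{GNU} @command{sed} and a POSIX shell.  The
variable names are hypothetical and the input strings are made up for
the example.

@example
# \t in the regexp matches the literal tab produced by printf
tabbed=$(printf 'a\tb\n' | sed 's/\t/<TAB>/')

# \x42, \o102 and \d066 all denote the letter B (ASCII 66)
hex=$(echo 'ABC' | sed 's/\x42/-/')
oct=$(echo 'ABC' | sed 's/\o102/-/')
dec=$(echo 'ABC' | sed 's/\d066/-/')

printf '%s\n' "$tabbed" "$hex" "$oct" "$dec"
@end example
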
Other escapes match a particular character class and are valid only in
regular expressions:
 2330: 
@table @code
@item \w
Matches any ``word'' character.  A ``word'' character is any
letter or digit or the underscore character.

@item \W
Matches any ``non-word'' character.

@item \b
Matches a word boundary; that is, it matches if the character
to the left is a ``word'' character and the character to the
right is a ``non-word'' character, or vice-versa.

@item \B
Matches everywhere but on a word boundary; that is, it matches
if the character to the left and the character to the right
are either both ``word'' characters or both ``non-word''
characters.

@item \`
Matches only at the start of pattern space.  This is different
from @code{^} in multi-line mode.

@item \'
Matches only at the end of pattern space.  This is different
from @code{$} in multi-line mode.

@ifset PERL
@item \G
Match only at the start of pattern space or, when doing a global
substitution using the @code{s///g} command and option, at
the end-of-match position of the prior match.  For example,
@samp{s/\Ga/Z/g} will change an initial run of @code{a}s to
a run of @code{Z}s.
@end ifset
@end table
 2389: 
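The character-class escapes can be tried out with one-line commands.
This is only a sketch, assuming @acronym{GNU} @command{sed}; the
variable names and sample strings are illustrative.

@example
# \w\+ matches each run of word characters
words=$(echo 'one two' | sed 's/\w\+/X/g')

# \W matches the hyphen but not the underscore
nonword=$(echo 'a-b_c' | sed 's/\W/ /g')

# \b anchors at word boundaries, so only the standalone word changes
bounded=$(echo 'cat concat' | sed 's/\bcat\b/dog/g')

# \B requires a non-boundary on the left, so only the suffix changes
inner=$(echo 'cat concat' | sed 's/\Bcat/CAT/g')

printf '%s\n' "$words" "$nonword" "$bounded" "$inner"
@end example
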
 2390: @node Examples
@chapter Some Sample Scripts

Here are some @command{sed} scripts to guide you in the art of mastering
@command{sed}.
 2399: 
 2400: @menu
 2401: Some exotic examples:
 2402: * Centering lines::
 2403: * Increment a number::
 2404: * Rename files to lower case::
 2405: * Print bash environment::
 2406: * Reverse chars of lines::
 2407: 
 2408: Emulating standard utilities:
 2409: * tac::                             Reverse lines of files
 2410: * cat -n::                          Numbering lines
 2411: * cat -b::                          Numbering non-blank lines
 2412: * wc -c::                           Counting chars
 2413: * wc -w::                           Counting words
 2414: * wc -l::                           Counting lines
 2415: * head::                            Printing the first lines
 2416: * tail::                            Printing the last lines
 2417: * uniq::                            Make duplicate lines unique
 2418: * uniq -d::                         Print duplicated lines of input
 2419: * uniq -u::                         Remove all duplicated lines
 2420: * cat -s::                          Squeezing blank lines
 2421: @end menu
 2422: 
 2423: @node Centering lines
@section Centering Lines

This script centers all lines of a file on an 80-column width.
To change that width, the number in @code{\@{@dots{}\@}} must be
replaced, and the number of added spaces also must be changed.

Note how the buffer commands are used to separate parts in
the regular expressions to be matched; this is a common
technique.
 2441: 
 2442: @c start-------------------------------------------
 2443: @example
 2444: #!/usr/bin/sed -f
 2445: 
 2446: @group
 2447: # Put 80 spaces in the buffer
 2448: 1 @{
 2449:   x
 2450:   s/^$/          /
 2451:   s/^.*$/&&&&&&&&/
 2452:   x
 2453: @}
 2454: @end group
 2455: 
 2456: @group
 2457: # del leading and trailing spaces
 2458: y/@kbd{tab}/ /
 2459: s/^ *//
 2460: s/ *$//
 2461: @end group
 2462: 
 2463: @group
 2464: # add a newline and 80 spaces to end of line
 2465: G
 2466: @end group
 2467: 
 2468: @group
 2469: # keep first 81 chars (80 + a newline)
 2470: s/^\(.\@{81\@}\).*$/\1/
 2471: @end group
 2472: 
 2473: @group
 2474: # \2 matches half of the spaces, which are moved to the beginning
 2475: s/^\(.*\)\n\(.*\)\2/\2\1/
 2476: @end group
 2477: @end example
 2478: @c end---------------------------------------------
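
To see the effect of changing the width, here is a reduced sketch that
centers on 10 columns instead of 80.  It assumes @acronym{GNU}
@command{sed}, prepares the hold space on every line rather than only
on the first, and spells out eleven dots in place of the interval
expression.  The variable name is hypothetical.

@example
centered=$(echo 'abc' | sed '
  # swap in the hold space and fill it with 10 spaces if it is empty
  x
  s/^$/          /
  x
  # append a newline and the 10 spaces
  G
  # keep the first 11 characters, that is 10 plus the newline
  s/^\(...........\).*$/\1/
  # move half of the remaining spaces to the front
  s/^\(.*\)\n\(.*\)\2/\2\1/
')
echo "[$centered]"
@end example

Seven spaces are left after truncation: three are moved to the front,
three are dropped, and one stays behind the text.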
 2479: 
 2480: @node Increment a number
@section Increment a Number

This script is one of a few that demonstrate how to do arithmetic
in @command{sed}.  This is indeed possible,@footnote{@command{sed} guru Greg
Ubben wrote an implementation of the @command{dc} @sc{rpn} calculator!
It is distributed together with sed.} but must be done manually.

To increment a number you just add 1 to the last digit, replacing
it by the following digit.  There is one exception: when the digit
is a nine the previous digits must also be incremented until you
don't have a nine.

This solution by Bruno Haible is very clever and smart because
it uses a single buffer; if you don't have this limitation, the
algorithm used in @ref{cat -n, Numbering lines}, is faster.
It works by replacing trailing nines with an underscore, then
using multiple @code{s} commands to increment the last digit,
and then again substituting underscores with zeros.
 2515: 
 2516: @c start-------------------------------------------
 2517: @example
 2518: #!/usr/bin/sed -f
 2519: 
 2520: /[^0-9]/ d
 2521: 
 2522: @group
# replace all trailing 9s by _ (any other character except digits could
# be used)
 2525: :d
 2526: s/9\(_*\)$/_\1/
 2527: td
 2528: @end group
 2529: 
 2530: @group
 2531: # incr last digit only.  The first line adds a most-significant
 2532: # digit of 1 if we have to add a digit.
 2533: #
 2534: # The @code{tn} commands are not necessary, but make the thing
 2535: # faster
 2536: @end group
 2537: 
 2538: @group
 2539: s/^\(_*\)$/1\1/; tn
 2540: s/8\(_*\)$/9\1/; tn
 2541: s/7\(_*\)$/8\1/; tn
 2542: s/6\(_*\)$/7\1/; tn
 2543: s/5\(_*\)$/6\1/; tn
 2544: s/4\(_*\)$/5\1/; tn
 2545: s/3\(_*\)$/4\1/; tn
 2546: s/2\(_*\)$/3\1/; tn
 2547: s/1\(_*\)$/2\1/; tn
 2548: s/0\(_*\)$/1\1/; tn
 2549: @end group
 2550: 
 2551: @group
 2552: :n
 2553: y/_/0/
 2554: @end group
 2555: @end example
 2556: @c end---------------------------------------------
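
The three phases can be followed on a single value.  The sketch below
assumes @acronym{GNU} @command{sed} and splits the script into separate
invocations purely for illustration.  The variable names are made up,
and only the @code{s} command for the digit 2 is shown.

@example
# phase 1: turn trailing 9s into underscores (1299 becomes 12__)
marked=$(echo 1299 | sed -e :d -e 's/9\(_*\)$/_\1/' -e td)

# phase 2: increment the last real digit (12__ becomes 13__)
bumped=$(echo "$marked" | sed 's/2\(_*\)$/3\1/')

# phase 3: turn the underscores into zeros (13__ becomes 1300)
result=$(echo "$bumped" | sed 'y/_/0/')

echo "$marked $bumped $result"
@end example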
 2557: 
 2558: @node Rename files to lower case
@section Rename Files to Lower Case

This is a pretty strange use of @command{sed}.  We transform text, and
transform it to be shell commands, then just feed them to shell.
Don't worry, even worse hacks are done when using @command{sed}; I have
seen a script converting the output of @command{date} into a @command{bc}
program!

The main body of this is the @command{sed} script, which remaps the name
from lower to upper (or vice-versa) and even checks
whether the remapped name is the same as the original name.
Note how the script is parameterized using shell
variables and proper quoting.
 2584: 
 2585: @c start-------------------------------------------
 2586: @example
 2587: @group
 2588: #! /bin/sh
 2589: # rename files to lower/upper case... 
 2590: #
 2591: # usage: 
 2592: #    move-to-lower * 
 2593: #    move-to-upper * 
 2594: # or
 2595: #    move-to-lower -R .
 2596: #    move-to-upper -R .
 2597: #
 2598: @end group
 2599: 
 2600: @group
 2601: help()
 2602: @{
 2603: 	cat << eof
Usage: $0 [-n] [-R] [-h] files...
 2605: @end group
 2606: 
 2607: @group
 2608: -n      do nothing, only see what would be done
 2609: -R      recursive (use find)
 2610: -h      this message
 2611: files   files to remap to lower case
 2612: @end group
 2613: 
 2614: @group
 2615: Examples:
 2616:        $0 -n *        (see if everything is ok, then...)
 2617:        $0 *
 2618: @end group
 2619: 
 2620:        $0 -R .
 2621: 
 2622: @group
 2623: eof
 2624: @}
 2625: @end group
 2626: 
 2627: @group
 2628: apply_cmd='sh'
 2629: finder='echo "$@@" | tr " " "\n"'
 2630: files_only=
 2631: @end group
 2632: 
 2633: @group
 2634: while :
 2635: do
 2636:     case "$1" in 
 2637:         -n) apply_cmd='cat' ;;
 2638:         -R) finder='find "$@@" -type f';;
 2639:         -h) help ; exit 1 ;;
 2640:         *) break ;;
 2641:     esac
 2642:     shift
 2643: done
 2644: @end group
 2645: 
 2646: @group
 2647: if [ -z "$1" ]; then
        echo Usage: $0 [-h] [-n] [-R] files...
 2649:         exit 1
 2650: fi
 2651: @end group
 2652: 
 2653: @group
 2654: LOWER='abcdefghijklmnopqrstuvwxyz'
 2655: UPPER='ABCDEFGHIJKLMNOPQRSTUVWXYZ'
 2656: @end group
 2657: 
 2658: @group
 2659: case `basename $0` in
 2660:         *upper*) TO=$UPPER; FROM=$LOWER ;;
 2661:         *)       FROM=$UPPER; TO=$LOWER ;;
 2662: esac
 2663: @end group
 2664: 	
 2665: eval $finder | sed -n '
 2666: 
 2667: @group
 2668: # remove all trailing slashes
 2669: s/\/*$//
 2670: @end group
 2671: 
 2672: @group
 2673: # add ./ if there is no path, only a filename
 2674: /\//! s/^/.\//
 2675: @end group
 2676: 
 2677: @group
 2678: # save path+filename
 2679: h
 2680: @end group
 2681: 
 2682: @group
 2683: # remove path
 2684: s/.*\///
 2685: @end group
 2686: 
 2687: @group
 2688: # do conversion only on filename
 2689: y/'$FROM'/'$TO'/
 2690: @end group
 2691: 
 2692: @group
 2693: # now line contains original path+file, while
 2694: # hold space contains the new filename
 2695: x
 2696: @end group
 2697: 
 2698: @group
 2699: # add converted file name to line, which now contains
 2700: # path/file-name\nconverted-file-name
 2701: G
 2702: @end group
 2703: 
 2704: @group
# check if converted file name is equal to original file name,
# if it is, do not print anything
 2707: /^.*\/\(.*\)\n\1/b
 2708: @end group
 2709: 
 2710: @group
 2711: # now, transform path/fromfile\n, into
 2712: # mv path/fromfile path/tofile and print it
 2713: s/^\(.*\/\)\(.*\)\n\(.*\)$/mv \1\2 \1\3/p
 2714: @end group
 2715: 
 2716: ' | $apply_cmd
 2717: @end example
 2718: @c end---------------------------------------------
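
The @command{sed} part can be exercised on its own, without renaming
anything.  This sketch assumes @acronym{GNU} @command{sed}, hard-codes
the upper-to-lower mapping instead of using the shell variables, and
feeds two made-up file names.  It only collects the @command{mv}
commands that would be generated.

@example
cmds=$(printf '%s\n' './Docs/README.TXT' 'notes.txt' | sed -n '
  # add ./ if there is no path, then save path+filename
  /\//! s/^/.\//
  h
  # keep only the file name and convert it
  s/.*\///
  y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/
  # rebuild path/file-name\nconverted-file-name
  x
  G
  # skip names that are already in the target case
  /^.*\/\(.*\)\n\1/b
  s/^\(.*\/\)\(.*\)\n\(.*\)$/mv \1\2 \1\3/p
')
echo "$cmds"
@end example

Only the first name produces a command, because @file{notes.txt} is
already lower case.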
 2719: 
 2720: @node Print bash environment
@section Print @command{bash} Environment

This script strips the definition of the shell functions
from the output of the @command{set} Bourne-shell command.
 2729: 
 2730: @c start-------------------------------------------
 2731: @example
 2732: #!/bin/sh
 2733: 
 2734: @group
 2735: set | sed -n '
 2736: :x
 2737: @end group
 2738: 
 2739: @group
 2740: @ifinfo
 2741: # if no occurrence of "=()" print and load next line
 2742: @end ifinfo
 2743: @ifnotinfo
 2744: # if no occurrence of @samp{=()} print and load next line
 2745: @end ifnotinfo
 2746: /=()/! @{ p; b; @}
 2747: / () $/! @{ p; b; @}
 2748: @end group
 2749: 
 2750: @group
 2751: # possible start of functions section
 2752: # save the line in case this is a var like FOO="() "
 2753: h
 2754: @end group
 2755: 
 2756: @group
 2757: # if the next line has a brace, we quit because
 2758: # nothing comes after functions
 2759: n
 2760: /^@{/ q
 2761: @end group
 2762: 
 2763: @group
 2764: # print the old line
 2765: x; p
 2766: @end group
 2767: 
 2768: @group
 2769: # work on the new line now
 2770: x; bx
 2771: '
 2772: @end group
 2773: @end example
 2774: @c end---------------------------------------------
 2775: 
 2776: @node Reverse chars of lines
@section Reverse Characters of Lines

This script can be used to reverse the position of characters
in lines.  The technique moves two characters at a time, hence
it is faster than more intuitive implementations.

Note the @code{tx} command before the definition of the label.
This is often needed to reset the flag that is tested by
the @code{t} command.

Imaginative readers will find uses for this script.  An example
is reversing the output of @command{banner}.@footnote{This requires
another script to pad the output of banner; for example

@example
#! /bin/sh

banner -w $1 $2 $3 $4 |
  sed -e :a -e '/^.\@{0,'$1'\@}$/ @{ s/$/ /; ba; @}' |
  ~/sedscripts/reverseline.sed
@end example
}
 2812: 
 2813: @c start-------------------------------------------
 2814: @example
 2815: #!/usr/bin/sed -f
 2816: 
 2817: /../! b
 2818: 
 2819: @group
 2820: # Reverse a line.  Begin embedding the line between two newlines
 2821: s/^.*$/\
 2822: &\
 2823: /
 2824: @end group
 2825: 
 2826: @group
 2827: # Move first character at the end.  The regexp matches until
 2828: # there are zero or one characters between the markers
 2829: tx
 2830: :x
 2831: s/\(\n.\)\(.*\)\(.\n\)/\3\2\1/
 2832: tx
 2833: @end group
 2834: 
 2835: @group
 2836: # Remove the newline markers
 2837: s/\n//g
 2838: @end group
 2839: @end example
 2840: @c end---------------------------------------------
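
A quick way to try the technique is a one-line form.  This sketch
assumes @acronym{GNU} @command{sed}, which accepts @code{\n} in the
replacement, and omits the initial shortcut for lines shorter than two
characters.  The variable name and sample word are made up.

@example
# wrap the line in newline markers, then swap the characters next to
# the markers until the markers meet in the middle
reversed=$(echo 'stressed' | sed -e 's/^.*$/\n&\n/' -e tx -e :x \
  -e 's/\(\n.\)\(.*\)\(.\n\)/\3\2\1/' -e tx -e 's/\n//g')
echo "$reversed"
@end example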
 2841: 
 2842: @node tac
@section Reverse Lines of Files

This one begins a series of totally useless (yet interesting)
scripts emulating various Unix commands.  This, in particular,
is a @command{tac} workalike.

Note that on implementations other than @acronym{GNU} @command{sed}
@ifset PERL
and @value{SSED}
@end ifset
this script might easily overflow internal buffers.
 2865: 
 2866: @c start-------------------------------------------
 2867: @example
 2868: #!/usr/bin/sed -nf
 2869: 
# reverse all lines of input, i.e. first line becomes last, ...
 2871: 
 2872: @group
 2873: # from the second line, the buffer (which contains all previous lines)
 2874: # is *appended* to current line, so, the order will be reversed
 2875: 1! G
 2876: @end group
 2877: 
 2878: @group
 2879: # on the last line we're done -- print everything
 2880: $ p
 2881: @end group
 2882: 
 2883: @group
 2884: # store everything on the buffer again
 2885: h
 2886: @end group
 2887: @end example
 2888: @c end---------------------------------------------
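
In one-line form the script can be tested on a few lines of input.
This is a minimal sketch, assuming a POSIX shell; the variable name is
made up.

@example
# 1!G appends the saved lines, $p prints at the end, h saves again
reversed=$(printf 'one\ntwo\nthree\n' | sed -n '1!G;$p;h')
echo "$reversed"
@end example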
 2889: 
 2890: @node cat -n
@section Numbering Lines

This script replaces @samp{cat -n}; in fact it formats its output
exactly like @acronym{GNU} @command{cat} does.

Of course this is completely useless and for two reasons:  first,
because somebody else did it in C, second, because the following
Bourne-shell script could be used for the same purpose and would
be much faster:
 2908: 
 2909: @c start-------------------------------------------
 2910: @example
 2911: @group
 2912: #! /bin/sh
sed -e "=" "$@@" | sed -e '
 2914:   s/^/      /
 2915:   N
 2916:   s/^ *\(......\)\n/\1  /
 2917: '
 2918: @end group
 2919: @end example
 2920: @c end---------------------------------------------
 2921: 
It uses @command{sed} to print the line number, then groups lines two
by two using @code{N}.  Of course, this script does not teach as much as
the one presented below.

The algorithm used for incrementing uses both buffers, so the line
is printed as soon as possible and then discarded.  The number
is split so that changing digits go in a buffer and unchanged ones go
in the other; the changed digits are modified in a single step
(using a @code{y} command).  The line number for the next line
is then composed and stored in the hold space, to be used in the
next iteration.
 2944: 
 2945: @c start-------------------------------------------
 2946: @example
 2947: #!/usr/bin/sed -nf
 2948: 
 2949: @group
 2950: # Prime the pump on the first line
 2951: x
 2952: /^$/ s/^.*$/1/
 2953: @end group
 2954: 
 2955: @group
 2956: # Add the correct line number before the pattern
 2957: G
 2958: h
 2959: @end group
 2960: 
 2961: @group
 2962: # Format it and print it
 2963: s/^/      /
 2964: s/^ *\(......\)\n/\1  /p
 2965: @end group
 2966: 
 2967: @group
 2968: # Get the line number from hold space; add a zero
 2969: # if we're going to add a digit on the next line
 2970: g
 2971: s/\n.*$//
 2972: /^9*$/ s/^/0/
 2973: @end group
 2974: 
 2975: @group
 2976: # separate changing/unchanged digits with an x
 2977: s/.9*$/x&/
 2978: @end group
 2979: 
 2980: @group
 2981: # keep changing digits in hold space
 2982: h
 2983: s/^.*x//
 2984: y/0123456789/1234567890/
 2985: x
 2986: @end group
 2987: 
 2988: @group
 2989: # keep unchanged digits in pattern space
 2990: s/x.*$//
 2991: @end group
 2992: 
 2993: @group
 2994: # compose the new number, remove the newline implicitly added by G
 2995: G
 2996: s/\n//
 2997: h
 2998: @end group
 2999: @end example
 3000: @c end---------------------------------------------
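
The digit-splitting step can be looked at in isolation.  The sketch
below assumes a POSIX shell and runs the two halves as separate
@command{sed} invocations instead of using the hold space.  The
variable names are hypothetical, and the extra step that prepends a
zero to a number made only of nines is left out.

@example
n=1299

# digits that change: the last non-9 digit and the 9s after it
changing=$(echo "$n" | sed 's/.9*$/x&/; s/^.*x//; y/0123456789/1234567890/')

# digits that stay as they are
unchanged=$(echo "$n" | sed 's/.9*$/x&/; s/x.*$//')

next="$unchanged$changing"
echo "$next"
@end example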
 3001: 
 3002: @node cat -b
@section Numbering Non-blank Lines

Emulating @samp{cat -b} is almost the same as @samp{cat -n}: we only
have to select which lines are to be numbered and which are not.

The part that is common to this script and the previous one is
not commented to show how important it is to comment @command{sed}
scripts properly@enddots{}
 3018: 
 3019: @c start-------------------------------------------
 3020: @example
 3021: #!/usr/bin/sed -nf
 3022: 
 3023: @group
 3024: /^$/ @{
 3025:   p
 3026:   b
 3027: @}
 3028: @end group
 3029: 
 3030: @group
 3031: # Same as cat -n from now
 3032: x
 3033: /^$/ s/^.*$/1/
 3034: G
 3035: h
 3036: s/^/      /
 3037: s/^ *\(......\)\n/\1  /p
 3038: x
 3039: s/\n.*$//
 3040: /^9*$/ s/^/0/
 3041: s/.9*$/x&/
 3042: h
 3043: s/^.*x//
 3044: y/0123456789/1234567890/
 3045: x
 3046: s/x.*$//
 3047: G
 3048: s/\n//
 3049: h
 3050: @end group
 3051: @end example
 3052: @c end---------------------------------------------
 3053: 
 3054: @node wc -c
@section Counting Characters

This script shows another way to do arithmetic with @command{sed}.
In this case we have to add possibly large numbers, so implementing
this by successive increments would not be feasible (and possibly
even more complicated to contrive than this script).

The approach is to map numbers to letters, kind of an abacus
implemented with @command{sed}.  @samp{a}s are units, @samp{b}s are
tens and so on: we simply add the number of characters
on the current line as units, and then propagate the carry
to tens, hundreds, and so on.

As usual, running totals are kept in hold space.

On the last line, we convert the abacus form back to decimal.
For the sake of variety, this is done with a loop rather than
with some 80 @code{s} commands@footnote{Some implementations
have a limit of 199 commands per script}: first we
convert units, removing @samp{a}s from the number; then we
rotate letters so that tens become @samp{a}s, and so on
until no more letters remain.
 3096: 
 3097: @c start-------------------------------------------
 3098: @example
 3099: #!/usr/bin/sed -nf
 3100: 
 3101: @group
 3102: # Add n+1 a's to hold space (+1 is for the newline)
 3103: s/./a/g
 3104: H
 3105: x
 3106: s/\n/a/
 3107: @end group
 3108: 
 3109: @group
 3110: # Do the carry.  The t's and b's are not necessary,
 3111: # but they do speed up the thing
 3112: t a
 3113: : a;  s/aaaaaaaaaa/b/g; t b; b done
 3114: : b;  s/bbbbbbbbbb/c/g; t c; b done
 3115: : c;  s/cccccccccc/d/g; t d; b done
 3116: : d;  s/dddddddddd/e/g; t e; b done
 3117: : e;  s/eeeeeeeeee/f/g; t f; b done
 3118: : f;  s/ffffffffff/g/g; t g; b done
 3119: : g;  s/gggggggggg/h/g; t h; b done
 3120: : h;  s/hhhhhhhhhh//g
 3121: @end group
 3122: 
 3123: @group
 3124: : done
 3125: $! @{
 3126:   h
 3127:   b
 3128: @}
 3129: @end group
 3130: 
 3131: # On the last line, convert back to decimal
 3132: 
 3133: @group
 3134: : loop
 3135: /a/! s/[b-h]*/&0/
 3136: s/aaaaaaaaa/9/
 3137: s/aaaaaaaa/8/
 3138: s/aaaaaaa/7/
 3139: s/aaaaaa/6/
 3140: s/aaaaa/5/
 3141: s/aaaa/4/
 3142: s/aaa/3/
 3143: s/aa/2/
 3144: s/a/1/
 3145: @end group
 3146: 
 3147: @group
 3148: : next
 3149: y/bcdefgh/abcdefg/
 3150: /[a-h]/ b loop
 3151: p
 3152: @end group
 3153: @end example
 3154: @c end---------------------------------------------
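
The carry step of the abacus can be tried separately.  This sketch
assumes a POSIX shell with @command{tr}; the count of 123 and the
variable names are made up, and the conversion back to decimal is left
to the loop in the script above.

@example
# 123 units, as would accumulate after reading 123 characters
units=$(printf '%123s' '' | tr ' ' 'a')

# carry: every ten a become one b, then every ten b become one c
abacus=$(echo "$units" | sed 's/aaaaaaaaaa/b/g; s/bbbbbbbbbb/c/g')
echo "$abacus"
@end example

The result reads as one hundred, two tens and three units.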
 3155: 
 3156: @node wc -w
@section Counting Words

This script is almost the same as the previous one, once each
of the words on the line is converted to a single @samp{a}
(in the previous script each letter was changed to an @samp{a}).

It is interesting that real @command{wc} programs have optimized
loops for @samp{wc -c}, so they are much slower at counting
words than characters.  This script's bottleneck,
instead, is arithmetic, and hence the word-counting one
is faster (it has to manage smaller numbers).

Again, the common parts are not commented to show the importance
of commenting @command{sed} scripts.
 3184: 
 3185: @c start-------------------------------------------
 3186: @example
 3187: #!/usr/bin/sed -nf
 3188: 
 3189: @group
 3190: # Convert words to a's
 3191: s/[ @kbd{tab}][ @kbd{tab}]*/ /g
 3192: s/^/ /
 3193: s/ [^ ][^ ]*/a /g
 3194: s/ //g
 3195: @end group
 3196: 
 3197: @group
 3198: # Append them to hold space
 3199: H
 3200: x
 3201: s/\n//
 3202: @end group
 3203: 
 3204: @group
 3205: # From here on it is the same as in wc -c.
 3206: /aaaaaaaaaa/! bx;   s/aaaaaaaaaa/b/g
 3207: /bbbbbbbbbb/! bx;   s/bbbbbbbbbb/c/g
 3208: /cccccccccc/! bx;   s/cccccccccc/d/g
 3209: /dddddddddd/! bx;   s/dddddddddd/e/g
 3210: /eeeeeeeeee/! bx;   s/eeeeeeeeee/f/g
 3211: /ffffffffff/! bx;   s/ffffffffff/g/g
 3212: /gggggggggg/! bx;   s/gggggggggg/h/g
 3213: s/hhhhhhhhhh//g
 3214: :x
 3215: $! @{ h; b; @}
 3216: :y
 3217: /a/! s/[b-h]*/&0/
 3218: s/aaaaaaaaa/9/
 3219: s/aaaaaaaa/8/
 3220: s/aaaaaaa/7/
 3221: s/aaaaaa/6/
 3222: s/aaaaa/5/
 3223: s/aaaa/4/
 3224: s/aaa/3/
 3225: s/aa/2/
 3226: s/a/1/
 3227: y/bcdefgh/abcdefg/
 3228: /[a-h]/ by
 3229: p
 3230: @end group
 3231: @end example
 3232: @c end---------------------------------------------
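
The word-to-letter conversion at the top of the script can be checked
on one line of text.  This sketch assumes @acronym{GNU} @command{sed},
using @code{\t} inside the bracket expressions where the script has a
literal tab; the sample text and variable name are made up.

@example
# squeeze blanks, add a leading space, turn each word into one a
marks=$(printf '  the quick\tbrown fox \n' |
  sed 's/[ \t][ \t]*/ /g; s/^/ /; s/ [^ ][^ ]*/a /g; s/ //g')
echo "$marks"
@end example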
 3233: 
 3234: @node wc -l
@section Counting Lines

No strange things are done now, because @command{sed} gives us
@samp{wc -l} functionality for free!!! Look:
 3243: 
 3244: @c start-------------------------------------------
 3245: @example
 3246: @group
 3247: #!/usr/bin/sed -nf
 3248: $=
 3249: @end group
 3250: @end example
 3251: @c end---------------------------------------------
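
The same command works directly from the shell.  A minimal sketch with
made-up input and a hypothetical variable name:

@example
# on the last line, = prints the current line number
count=$(printf 'a\nb\nc\n' | sed -n '$=')
echo "$count"
@end example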
 3252: 
 3253: @node head
@section Printing the First Lines

This script is probably the simplest useful @command{sed} script.
It displays the first 10 lines of input; the number of displayed
lines is right before the @code{q} command.
 3264: 
 3265: @c start-------------------------------------------
 3266: @example
 3267: @group
 3268: #!/usr/bin/sed -f
 3269: 10q
 3270: @end group
 3271: @end example
 3272: @c end---------------------------------------------
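
Changing the number changes how many lines are shown.  A minimal
sketch with made-up input and a hypothetical variable name, printing
three lines instead of ten:

@example
# quit after the third line; the lines read so far are printed
first3=$(printf '1\n2\n3\n4\n5\n' | sed 3q)
echo "$first3"
@end example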
 3273: 
 3274: @node tail
@section Printing the Last Lines

Printing the last @var{n} lines rather than the first is more complex
but indeed possible.  @var{n} is encoded in the second line, before
the bang character.

This script is similar to the @command{tac} script in that it keeps the
final output in the hold space and prints it at the end:
 3290: 
 3291: @c start-------------------------------------------
 3292: @example
 3293: #!/usr/bin/sed -nf
 3294: 
 3295: @group
 3296: 1! @{; H; g; @}
 3297: 1,10 !s/[^\n]*\n//
 3298: $p
 3299: h
 3300: @end group
 3301: @end example
 3302: @c end---------------------------------------------
 3303: 
 3304: @c Mainly, the scripts keeps a window of 10 lines and slides it
 3305: @c by adding a line and deleting the oldest (the substitution command
 3306: @c on the second line works like a @code{D} command but does not
 3307: @c restart the loop).
 3308: @c 
 3309: そのスクリプトの中心では,10行のウィンドウを保持し,行を追加し最も古い
 3310: 行を削除しながらスライドしていきます(二行目の置換コマンドは@code{D}コマ
 3311: ンドのように動作しますがループを再開しません).
 3312: 
 3313: @c The ``sliding window'' technique is a very powerful way to write
 3314: @c efficient and complex @command{sed} scripts, because commands like
 3315: @c @code{P} would require a lot of work if implemented manually.
 3316: @c 
 3317: ``スライドウィンドウ''のテクニックは,効率的で複雑な@command{sed}を書く
 3318: 強力な方法で,それは,@code{P}のようなコマンドを手動で実装する場合は多
 3319: くの作業が必要になるためです.
 3320: 
 3321: @c To introduce the technique, which is fully demonstrated in the
 3322: @c rest of this chapter and is based on the @code{N}, @code{P}
 3323: @c and @code{D} commands, here is an implementation of @command{tail}
 3324: @c using a simple ``sliding window.''
 3325: @c 
 3326: この章の残りで十分に説明している,@code{N},@code{P},そして@code{D}コ
 3327: マンドを基本としているテクニックを導入するため,単純な`スライドウィンド
 3328: ウ' を使用している@code{tail}の実装を以下に上げます.
 3329: 
 3330: @c This looks complicated but in fact the working is the same as
 3331: @c the last script: after we have kicked in the appropriate number
 3332: @c of lines, however, we stop using the hold space to keep inter-line
 3333: @c state, and instead use @code{N} and @code{D} to slide pattern
 3334: @c space by one line:
 3335: @c 
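This looks complicated but in fact it works the same way as
the last script: once the appropriate number of lines has been
read in, however, we stop using the hold space to keep inter-line
state, and instead use @code{N} and @code{D} to slide pattern
space by one line: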
 3340: 
 3341: @c start-------------------------------------------
 3342: @example
 3343: #!/usr/bin/sed -f
 3344: 
 3345: @group
 3346: 1h
 3347: 2,10 @{; H; g; @}
 3348: $q
 3349: 1,9d
 3350: N
 3351: D
 3352: @end group
 3353: @end example
 3354: @c end---------------------------------------------
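
As a quick check of the sliding-window script above, the following shell
sketch wraps the same commands in a function and runs it on numbered
input.  The function name @code{sed_tail} and the use of @command{seq}
to generate input are assumptions made for illustration only; they do
not come from the original example.

```shell
# Hypothetical wrapper: the sliding-window tail script, passed inline.
sed_tail() {
    sed '
1h
2,10 { H; g; }
$q
1,9d
N
D
'
}

# Feed 25 numbered lines; only the last 10 should survive.
seq 1 25 | sed_tail
```

Inputs shorter than the window are printed whole, because @code{$q}
prints the accumulated pattern space as soon as the last line is reached.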
 3355: 
 3356: 
 3357: @node uniq
 3358: @c @section Make Duplicate Lines Unique
@section Make Duplicate Lines Unique
 3360: 
 3361: @c This is an example of the art of using the @code{N}, @code{P}
 3362: @c and @code{D} commands, probably the most difficult to master.
 3363: @c 
This is an example of the art of using the @code{N}, @code{P}
and @code{D} commands, probably the most difficult to master.
 3366: 
 3367: @c start-------------------------------------------
 3368: @example
 3369: @group
 3370: #!/usr/bin/sed -f
 3371: h
 3372: @end group
 3373: 
 3374: @group
 3375: :b
 3376: # On the last line, print and exit
 3377: $b
 3378: N
 3379: /^\(.*\)\n\1$/ @{
 3380:     # The two lines are identical.  Undo the effect of
    # the N command.
 3382:     g
 3383:     bb
 3384: @}
 3385: @end group
 3386: 
 3387: @group
 3388: # If the @code{N} command had added the last line, print and exit
 3389: $b
 3390: @end group
 3391: 
 3392: @group
 3393: # The lines are different; print the first and go
 3394: # back working on the second.
 3395: P
 3396: D
 3397: @end group
 3398: @end example
 3399: @c end---------------------------------------------
 3400: 
 3401: @c As you can see, we mantain a 2-line window using @code{P} and @code{D}.
 3402: @c This technique is often used in advanced @command{sed} scripts.
 3403: @c 
As you can see, we maintain a 2-line window using @code{P} and @code{D}.
This technique is often used in advanced @command{sed} scripts.
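
The two-line window can be exercised from the shell with a small
harness.  This is only a sketch: the wrapper name @code{sed_uniq} and
the sample input are hypothetical, and the script body is the one shown
above with its comments removed.

```shell
# Hypothetical wrapper around the uniq script.
sed_uniq() {
    sed '
h
:b
$b
N
/^\(.*\)\n\1$/ {
    g
    bb
}
$b
P
D
'
}

# Runs of equal adjacent lines collapse to a single line.
printf '%s\n' a a b c c c d | sed_uniq
```

The @code{h} at the top saves the first line of each window, so that
@code{g} can discard a duplicate appended by @code{N} without
restarting the cycle.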
 3406: 
 3407: @node uniq -d
 3408: @c @section Print Duplicated Lines of Input
@section Print Duplicated Lines of Input
 3410: 
 3411: @c This script prints only duplicated lines, like @samp{uniq -d}.
 3412: @c 
This script prints only duplicated lines, like @samp{uniq -d}.
 3415: 
 3416: @c start-------------------------------------------
 3417: @example
 3418: #!/usr/bin/sed -nf
 3419: 
 3420: @group
 3421: $b
 3422: N
 3423: /^\(.*\)\n\1$/ @{
 3424:     # Print the first of the duplicated lines
 3425:     s/.*\n//
 3426:     p
 3427: @end group
 3428: 
 3429: @group
 3430:     # Loop until we get a different line
 3431:     :b
 3432:     $b
 3433:     N
 3434:     /^\(.*\)\n\1$/ @{
 3435:         s/.*\n//
 3436:         bb
 3437:     @}
 3438: @}
 3439: @end group
 3440: 
 3441: @group
 3442: # The last line cannot be followed by duplicates
 3443: $b
 3444: @end group
 3445: 
 3446: @group
 3447: # Found a different one.  Leave it alone in the pattern space
 3448: # and go back to the top, hunting its duplicates
 3449: D
 3450: @end group
 3451: @end example
 3452: @c end---------------------------------------------
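
A minimal way to try the script, assuming it is passed inline to
@command{sed} with @option{-n}; the wrapper name @code{sed_uniq_d} and
the sample input are made up for this sketch.

```shell
# Hypothetical wrapper around the "uniq -d" script.
sed_uniq_d() {
    sed -n '
$b
N
/^\(.*\)\n\1$/ {
    s/.*\n//
    p
    :b
    $b
    N
    /^\(.*\)\n\1$/ {
        s/.*\n//
        bb
    }
}
$b
D
'
}

# Only lines that occur more than once in a row are printed, once each.
printf '%s\n' a a b c c c d | sed_uniq_d
```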
 3453: 
 3454: @node uniq -u
 3455: @c @section Remove All Duplicated Lines
@section Remove All Duplicated Lines
 3457: 
 3458: @c This script prints only unique lines, like @samp{uniq -u}.
 3459: @c 
This script prints only unique lines, like @samp{uniq -u}.
 3461: 
 3462: @c start-------------------------------------------
 3463: @example
 3464: #!/usr/bin/sed -f
 3465: 
 3466: @group
 3467: # Search for a duplicate line --- until that, print what you find.
 3468: $b
 3469: N
 3470: /^\(.*\)\n\1$/ ! @{
 3471:     P
 3472:     D
 3473: @}
 3474: @end group
 3475: 
 3476: @group
 3477: :c
 3478: # Got two equal lines in pattern space.  At the
 3479: # end of the file we simply exit
 3480: $d
 3481: @end group
 3482: 
 3483: @group
 3484: # Else, we keep reading lines with @code{N} until we
 3485: # find a different one
 3486: s/.*\n//
 3487: N
 3488: /^\(.*\)\n\1$/ @{
 3489:     bc
 3490: @}
 3491: @end group
 3492: 
 3493: @group
 3494: # Remove the last instance of the duplicate line
 3495: # and go back to the top
 3496: D
 3497: @end group
 3498: @end example
 3499: @c end---------------------------------------------
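
The same kind of harness works here.  Again this is a sketch under
stated assumptions: the name @code{sed_uniq_u} and the test input are
hypothetical, and the script body is the one above without comments.

```shell
# Hypothetical wrapper around the "uniq -u" script.
sed_uniq_u() {
    sed '
$b
N
/^\(.*\)\n\1$/ ! {
    P
    D
}
:c
$d
s/.*\n//
N
/^\(.*\)\n\1$/ {
    bc
}
D
'
}

# Lines that are repeated are dropped entirely; the rest pass through.
printf '%s\n' a a b c c c d | sed_uniq_u
```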
 3500: 
 3501: @node cat -s
 3502: @c @section Squeezing Blank Lines
@section Squeezing Blank Lines
 3504: 
 3505: @c As a final example, here are three scripts, of increasing complexity
 3506: @c and speed, that implement the same function as @samp{cat -s}, that is
 3507: @c squeezing blank lines.
 3508: @c 
As a final example, here are three scripts, of increasing complexity
and speed, that implement the same function as @samp{cat -s}, that is
squeezing blank lines.
 3511: 
 3512: @c The first leaves a blank line at the beginning and end if there are
 3513: @c some already.
 3514: @c 
The first leaves a blank line at the beginning and end if there are
some already.
 3517: 
 3518: @c start-------------------------------------------
 3519: @example
 3520: #!/usr/bin/sed -f
 3521: 
 3522: @group
 3523: # on empty lines, join with next
 3524: # Note there is a star in the regexp
 3525: :x
 3526: /^\n*$/ @{
 3527: N
 3528: bx
 3529: @}
 3530: @end group
 3531: 
 3532: @group
 3533: # now, squeeze all '\n', this can be also done by:
 3534: # s/^\(\n\)*/\1/
 3535: s/\n*/\
 3536: /
 3537: @end group
 3538: @end example
 3539: @c end---------------------------------------------
 3540: 
 3541: @c This one is a bit more complex and removes all empty lines
 3542: @c at the beginning.  It does leave a single blank line at end
 3543: @c if one was there.
 3544: @c 
This one is a bit more complex and removes all empty lines
at the beginning.  It does leave a single blank line at the end
if one was there.
 3547: 
 3548: @c start-------------------------------------------
 3549: @example
 3550: #!/usr/bin/sed -f
 3551: 
 3552: @group
 3553: # delete all leading empty lines
 3554: 1,/^./@{
 3555: /./!d
 3556: @}
 3557: @end group
 3558: 
 3559: @group
 3560: # on an empty line we remove it and all the following
 3561: # empty lines, but one
 3562: :x
 3563: /./!@{
 3564: N
 3565: s/^\n$//
 3566: tx
 3567: @}
 3568: @end group
 3569: @end example
 3570: @c end---------------------------------------------
 3571: 
 3572: @c This removes leading and trailing blank lines.  It is also the
 3573: @c fastest.  Note that loops are completely done with @code{n} and
 3574: @c @code{b}, without exploting the fact that @command{sed} cycles back
 3575: @c to the top of the script automatically at the end of a line.
 3576: @c 
This removes leading and trailing blank lines.  It is also the
fastest.  Note that loops are completely done with @code{n} and
@code{b}, without exploiting the fact that @command{sed} cycles back
to the top of the script automatically at the end of a line.
 3581: 
 3582: @c start-------------------------------------------
 3583: @example
 3584: #!/usr/bin/sed -nf
 3585: 
 3586: @group
 3587: # delete all (leading) blanks
 3588: /./!d
 3589: @end group
 3590: 
 3591: @group
 3592: # get here: so there is a non empty
 3593: :x
 3594: # print it
 3595: p
 3596: # get next
 3597: n
 3598: # got chars? print it again, etc... 
 3599: /./bx
 3600: @end group
 3601: 
 3602: @group
 3603: # no, don't have chars: got an empty line
 3604: :z
 3605: # get next, if last line we finish here so no trailing
 3606: # empty lines are written
 3607: n
 3608: # also empty? then ignore it, and get next... this will
 3609: # remove ALL empty lines
 3610: /./!bz
 3611: @end group
 3612: 
 3613: @group
 3614: # all empty lines were deleted/ignored, but we have a non empty.  As
 3615: # what we want to do is to squeeze, insert a blank line artificially
 3616: i\
 3617: @end group
 3618: 
 3619: bx
 3620: @end example
 3621: @c end---------------------------------------------
 3622: 
 3623: @node Limitations
 3624: @c @chapter @value{SSED}'s Limitations and Non-limitations
@chapter @value{SSED}'s Limitations and Non-limitations
 3626: 
 3627: @cindex @acronym{GNU} extensions, unlimited line length
 3628: @cindex Portability, line length limitations
 3629: @c For those who want to write portable @command{sed} scripts,
 3630: @c be aware that some implementations have been known to
 3631: @c limit line lengths (for the pattern and hold spaces)
 3632: @c to be no more than 4000 bytes.
 3633: @c The @sc{posix} standard specifies that conforming @command{sed}
 3634: @c implementations shall support at least 8192 byte line lengths.
 3635: @c @value{SSED} has no built-in limit on line length;
 3636: @c as long as it can @code{malloc()} more (virtual) memory,
 3637: @c you can feed or construct lines as long as you like.
 3638: @c 
For those who want to write portable @command{sed} scripts,
be aware that some implementations have been known to
limit line lengths (for the pattern and hold spaces)
to be no more than 4000 bytes.
The @sc{posix} standard specifies that conforming @command{sed}
implementations shall support at least 8192 byte line lengths.
@value{SSED} has no built-in limit on line length;
as long as it can @code{malloc()} more (virtual) memory,
you can feed or construct lines as long as you like.
 3646: 
 3647: @c However, recursion is used to handle subpatterns and indefinite
 3648: @c repetition.  This means that the available stack space may limit
 3649: @c the size of the buffer that can be processed by certain patterns.
 3650: @c 
However, recursion is used to handle subpatterns and indefinite
repetition.  This means that the available stack space may limit
the size of the buffer that can be processed by certain patterns.
 3654: 
 3655: @ifset PERL
 3656: There are some size limitations in the regular expression
 3657: matcher but it is hoped that they will never in practice
 3658: be relevant.  The maximum length of a compiled pattern
 3659: is 65539 (sic) bytes.  All values in repeating quantifiers
 3660: must be less than 65536.  The maximum nesting depth of
 3661: all parenthesized subpatterns, including capturing and
 3662: non-capturing subpatterns@footnote{The
 3663: distinction is meaningful when referring to Perl-style
 3664: regular expressions.}, assertions, and other types of
 3665: subpattern, is 200.
 3666: 
 3667: Also, @value{SSED} recognizes the @sc{posix} syntax
 3668: @code{[.@var{ch}.]} and @code{[=@var{ch}=]}
 3669: where @var{ch} is a ``collating element'', but these
 3670: are not supported, and an error is given if they are
 3671: encountered.
 3672: 
 3673: Here are a few distinctions between the real Perl-style
 3674: regular expressions and those that @option{-R} recognizes.
 3675: 
 3676: @enumerate
 3677: @item
Lookahead assertions do not allow repeat quantifiers after them.
Perl permits them, but they do not mean what you
 3680: might think. For example, @samp{(?!a)@{3@}} does not assert that the
 3681: next three characters are not @samp{a}. It just asserts three times that the
 3682: next character is not @samp{a} --- a waste of time and nothing else.
 3683: 
 3684: @item
Capturing subpatterns that occur inside negative lookahead
assertions are counted, but their entries are counted
as empty in the second half of an @code{s} command.
 3688: Perl sets its numerical variables from any such patterns
 3689: that are matched before the assertion fails to match
 3690: something (thereby succeeding), but only if the negative
 3691: lookahead assertion contains just one branch.
 3692: 
 3693: @item
 3694: The following Perl escape sequences are not supported:
 3695: @samp{\l}, @samp{\u}, @samp{\L}, @samp{\U}, @samp{\E},
 3696: @samp{\Q}. In fact these are implemented by Perl's general
 3697: string-handling and are not part of its pattern matching engine.
 3698: 
 3699: @item
 3700: The Perl @samp{\G} assertion is not supported as it is not
 3701: relevant to single pattern matches.
 3702: 
 3703: @item
 3704: Fairly obviously, @value{SSED} does not support the @samp{(?@{code@})}
 3705: and @samp{(?p@{code@})} constructions. However, there is some experimental
 3706: support for recursive patterns using the non-Perl item @samp{(?R)}.
 3707: 
 3708: @item
 3709: There are at the time of writing some oddities in Perl
 3710: 5.005_02 concerned with the settings of captured strings
 3711: when part of a pattern is repeated. For example, matching
 3712: @samp{aba} against the pattern @samp{/^(a(b)?)+$/} sets
 3713: @samp{$2}@footnote{@samp{$2} would be @samp{\2} in @value{SSED}.}
 3714: to the value @samp{b}, but matching @samp{aabbaa}
 3715: against @samp{/^(aa(bb)?)+$/} leaves @samp{$2}
 3716: unset.  However, if the pattern is changed to
 3717: @samp{/^(aa(b(b))?)+$/} then @samp{$2} (and @samp{$3}) are set.
 3718: In Perl 5.004 @samp{$2} is set in both cases, and that is also
 3719: true of @value{SSED}.
 3720: 
 3721: @item
 3722: Another as yet unresolved discrepancy is that in Perl
 3723: 5.005_02 the pattern @samp{/^(a)?(?(1)a|b)+$/} matches
 3724: the string @samp{a}, whereas in @value{SSED} it does not.
However, in both Perl and @value{SSED} @samp{/^(a)?a/} matched
against @samp{a} leaves @samp{$1} unset.
 3727: @end enumerate
 3728: @end ifset
 3729: 
 3730: @node Other Resources
 3731: @c @chapter Other Resources for Learning About @command{sed}
@chapter Other Resources for Learning About @command{sed}
 3733: 
 3734: @cindex Additional reading about @command{sed}
 3735: @c In addition to several books that have been written about @command{sed}
 3736: @c (either specifically or as chapters in books which discuss
 3737: @c shell programming), one can find out more about @command{sed}
 3738: @c (including suggestions of a few books) from the FAQ
 3739: @c for the @code{sed-users} mailing list, available from any of:
 3740: @c 
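In addition to several books that have been written about @command{sed}
(either specifically or as chapters in books which discuss
shell programming), one can find out more about @command{sed}
(including suggestions of a few books) from the FAQ
for the @code{sed-users} mailing list, available from any of: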
 3745: 
 3746: @display
 3747:  @uref{http://www.student.northpark.edu/pemente/sed/sedfaq.html}
 3748:  @uref{http://sed.sf.net/grabbag/tutorials/sedfaq.html}
 3749: @end display
 3750: 
 3751: @c Also of interest are
 3752: @c @uref{http://www.student.northpark.edu/pemente/sed/index.htm}
 3753: @c and @uref{http://sed.sf.net/grabbag},
 3754: @c which include @command{sed} tutorials and other @command{sed}-related goodies.
 3755: @c 
Also of interest are
@uref{http://www.student.northpark.edu/pemente/sed/index.htm}
and @uref{http://sed.sf.net/grabbag},
which include @command{sed} tutorials and other @command{sed}-related goodies.
 3760: 
 3761: @c The @code{sed-users} mailing list itself maintained by Sven Guckes.
 3762: @c To subscribe, visit @uref{http://groups.yahoo.com} and search
 3763: @c for the @code{sed-users} mailing list.
 3764: @c 
The unofficial @code{sed-users} mailing list is maintained by Sven Guckes.
To subscribe, visit @uref{http://groups.yahoo.com} and search
for the @code{sed-users} mailing list.
 3768: 
 3769: @node Reporting Bugs
 3770: @c @chapter Reporting Bugs
@chapter Reporting Bugs
 3772: 
 3773: @cindex Bugs, reporting
 3774: @c Email bug reports to @email{bonzini@@gnu.org}.
 3775: @c Be sure to include the word ``sed'' somewhere in the @code{Subject:} field.
 3776: @c Also, please include the output of @samp{sed --version} in the body
 3777: @c of your report if at all possible.
 3778: @c 
Email bug reports to @email{bug-gnu-utils@@gnu.org}.
Be sure to include the word ``sed'' somewhere in the @code{Subject:} field.
Also, please include the output of @samp{sed --version} in the body
of your report if at all possible.
 3782: 
 3783: @c Please do not send a bug report like this:
 3784: @c 
Please do not send a bug report like this:
 3786: 
 3787: @example
 3788: @i{while building frobme-1.3.4}
 3789: $ configure 
 3790: @error{} sed: file sedscr line 1: Unknown option to 's'
 3791: @end example
 3792: 
 3793: @c If @value{SSED} doesn't configure your favorite package, take a
 3794: @c few extra minutes to identify the specific problem and make a stand-alone
 3795: @c test case.  Unlike other programs such as C compilers, making such test
 3796: @c cases for @command{sed} is quite simple.
 3797: @c 
If @value{SSED} doesn't configure your favorite package, take a
few extra minutes to identify the specific problem and make a stand-alone
test case.  Unlike other programs such as C compilers, making such test
cases for @command{sed} is quite simple.
 3803: 
 3804: @c A stand-alone test case includes all the data necessary to perform the
 3805: @c test, and the specific invocation of @command{sed} that causes the problem.
 3806: @c The smaller a stand-alone test case is, the better.  A test case should
 3807: @c not involve something as far removed from @command{sed} as ``try to configure
 3808: @c frobme-1.3.4''.  Yes, that is in principle enough information to look
 3809: @c for the bug, but that is not a very practical prospect.
 3810: @c 
A stand-alone test case includes all the data necessary to perform the
test, and the specific invocation of @command{sed} that causes the problem.
The smaller a stand-alone test case is, the better.  A test case should
not involve something as far removed from @command{sed} as ``try to configure
frobme-1.3.4''.  Yes, that is in principle enough information to look
for the bug, but that is not a very practical prospect.
 3818: 
 3819: @c Here are a few commonly reported bugs that are not bugs.
 3820: @c 
Here are a few commonly reported bugs that are not bugs.
 3822: 
 3823: @table @asis
 3824: @c @item @code{N} command on the last line
@item @code{N} command on the last line
 3826: @cindex Portability, @code{N} command on the last line
 3827: @cindex Non-bugs, @code{N} command on the last line
 3828: 
 3829: @c Most versions of @command{sed} exit without printing anything when
 3830: @c the @command{N} command is issued on the last line of a file.
 3831: @c @value{SSED} prints pattern space before exiting unless of course
 3832: @c the @command{-n} command switch has been specified.  This choice is
 3833: @c by design.
 3834: @c 
Most versions of @command{sed} exit without printing anything when
the @code{N} command is issued on the last line of a file.
@value{SSED} prints pattern space before exiting unless of course
the @option{-n} command switch has been specified.  This choice is
by design.
 3839: 
 3840: @c For example, the behavior of
 3841: @c 
For example, the behavior of
 3843: @example
 3844: sed N foo bar
 3845: @end example
 3846: @noindent
 3847: @c would depend on whether foo has an even or an odd number of
 3848: @c lines@footnote{which is the actual ``bug'' that prompted the
 3849: @c change in behavior}.  Or, when writing a script to read the
 3850: @c next few lines following a pattern match, traditional
 3851: @c implementations of @command{sed} would force you to write
 3852: @c something like
 3853: @c 
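would depend on whether foo has an even or an odd number of
lines@footnote{which is the actual ``bug'' that prompted the
change in behavior}.  Or, when writing a script to read the
next few lines following a pattern match, traditional
implementations of @command{sed} would force you to write
something like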
 3858: 
 3859: @example
 3860: /foo/@{ $!N; $!N; $!N; $!N; $!N; $!N; $!N; $!N; $!N @}
 3861: @end example
 3862: @noindent
 3863: @c instead of just
 3864: @c 
instead of just
 3866: @example
 3867: /foo/@{ N;N;N;N;N;N;N;N;N; @}
 3868: @end example
 3869:  
 3870: @cindex @code{POSIXLY_CORRECT} behavior, @code{N} command
 3871: @c In any case, the simplest workaround is to use @code{$d;N} in
 3872: @c scripts that rely on the traditional behavior, or to set
 3873: @c the @code{POSIXLY_CORRECT} variable to a non-empty value.
 3874: @c 
In any case, the simplest workaround is to use @code{$d;N} in
scripts that rely on the traditional behavior, or to set
the @code{POSIXLY_CORRECT} variable to a non-empty value.
 3878: 
 3879: @c @item Regex syntax clashes
@item Regex syntax clashes
 3881: @cindex @acronym{GNU} extensions, to basic regular expressions
 3882: @cindex Non-bugs, regex syntax clashes
 3883: @c @command{sed} uses the @sc{posix} basic regular expression syntax.  According to
 3884: @c the standard, the meaning of some escape sequences is undefined in
 3885: @c this syntax;  notable in the case of @command{sed} are @code{\|},
 3886: @c @code{\+}, @code{\?}, @code{\`}, @code{\'}, @code{\<},
 3887: @c @code{\>}, @code{\b}, @code{\B}, @code{\w}, and @code{\W}.
 3888: @c 
@command{sed} uses the @sc{posix} basic regular expression syntax.  According to
the standard, the meaning of some escape sequences is undefined in
this syntax; notable in the case of @command{sed} are @code{\|},
@code{\+}, @code{\?}, @code{\`}, @code{\'}, @code{\<},
@code{\>}, @code{\b}, @code{\B}, @code{\w}, and @code{\W}.
 3894: 
 3895: @c As in all GNU programs that use @sc{posix} basic regular expressions, @command{sed}
 3896: @c interprets these escape sequences as special characters.  So, @code{x\+}
 3897: @c matches one or more occurrences of @samp{x}.  @code{abc\|def} matches
 3898: @c either @samp{abc} or @samp{def}.
 3899: @c 
As in all GNU programs that use @sc{posix} basic regular expressions, @command{sed}
interprets these escape sequences as special characters.  So, @code{x\+}
matches one or more occurrences of @samp{x}.  @code{abc\|def} matches
either @samp{abc} or @samp{def}.
 3904: 
 3905: @c This syntax may cause problems when running scripts written for other
 3906: @c @command{sed}s.  Some @command{sed} programs have been written with the
 3907: @c assumption that @code{\|} and @code{\+} match the literal characters
 3908: @c @code{|} and @code{+}.  Such scripts must be modified by removing the
 3909: @c spurious backslashes if they are to be used with modern implementations
 3910: @c of @command{sed}, like
 3911: @c 
This syntax may cause problems when running scripts written for other
@command{sed}s.  Some @command{sed} programs have been written with the
assumption that @code{\|} and @code{\+} match the literal characters
@code{|} and @code{+}.  Such scripts must be modified by removing the
spurious backslashes if they are to be used with modern implementations
of @command{sed}, like
@ifset PERL
@value{SSED} or
@end ifset
@acronym{GNU} @command{sed}.
 3926: 
 3927: @cindex @acronym{GNU} extensions, special escapes
 3928: @c In addition, this version of @command{sed} supports several escape characters
 3929: @c (some of which are multi-character) to insert non-printable characters
 3930: @c in scripts (@code{\a}, @code{\c}, @code{\d}, @code{\o}, @code{\r},
 3931: @c @code{\t}, @code{\v}, @code{\x}).  These can cause similar problems
 3932: @c with scripts written for other @command{sed}s.
 3933: @c 
In addition, this version of @command{sed} supports several escape characters
(some of which are multi-character) to insert non-printable characters
in scripts (@code{\a}, @code{\c}, @code{\d}, @code{\o}, @code{\r},
@code{\t}, @code{\v}, @code{\x}).  These can cause similar problems
with scripts written for other @command{sed}s.
 3939: 
 3940: @c @item @option{-i} clobbers read-only files
@item @option{-i} clobbers read-only files
 3942: @cindex In-place editing
 3943: @cindex @value{SSEDEXT}, in-place editing
 3944: @cindex Non-bugs, in-place editing
 3945: 
 3946: @c In short, @samp{sed -i} will let you delete the contents of
 3947: @c a read-only file, and in general the @option{-i} option
 3948: @c (@pxref{Invoking sed, , Invocation}) lets you clobber
 3949: @c protected files.  This is not a bug, but rather a consequence
 3950: @c of how the Unix filesystem works.
 3951: @c 
In short, @samp{sed -i} will let you delete the contents of
a read-only file, and in general the @option{-i} option
(@pxref{Invoking sed, , Invocation}) lets you clobber
protected files.  This is not a bug, but rather a consequence
of how the Unix filesystem works.
 3956: 
 3957: @c The permissions on a file say what can happen to the data
 3958: @c in that file, while the permissions on a directory say what can
 3959: @c happen to the list of files in that directory.  @samp{sed -i}
 3960: @c will not ever open for writing  a file that is already on disk.
 3961: @c Rather, it will work on a temporary file that is finally renamed
 3962: @c to the original name: if you rename or delete files, you're actually
 3963: @c modifying the contents of the directory, so the operation depends on
 3964: @c the permissions of the directory, not of the file.  For this same
 3965: @c reason, @command{sed} does not let you use @option{-i} on a writeable file
 3966: @c in a read-only directory (but unbelievably nobody reports that as a
 3967: @c bug@dots{}).
 3968: @c 
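The permissions on a file say what can happen to the data
in that file, while the permissions on a directory say what can
happen to the list of files in that directory.  @samp{sed -i}
will not ever open for writing a file that is already on disk.
Rather, it will work on a temporary file that is finally renamed
to the original name: if you rename or delete files, you're actually
modifying the contents of the directory, so the operation depends on
the permissions of the directory, not of the file.  For this same
reason, @command{sed} does not let you use @option{-i} on a writeable file
in a read-only directory (but unbelievably nobody reports that as a
bug@dots{}).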
 3979: 
 3980: @c @item @code{0a} does not work (gives an error)
@item @code{0a} does not work (gives an error)
 3982: @c There is no line 0.  0 is a special address that is only used to treat
 3983: @c addresses like @samp{0,/@var{RE}/} as active when the script starts: if
 3984: @c you write @samp{1,/abc/d} and the first line includes the word @samp{abc},
 3985: @c then that match would be ignored because address ranges must span at least
 3986: @c two lines (barring the end of the file); but what you probably wanted is
 3987: @c to delete every line up to the first one including @samp{abc}, and this
 3988: @c is obtained with @samp{0,/abc/d}.
 3989: @c 
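There is no line 0.  0 is a special address that is only used to treat
addresses like @samp{0,/@var{RE}/} as active when the script starts: if
you write @samp{1,/abc/d} and the first line includes the word @samp{abc},
then that match would be ignored because address ranges must span at least
two lines (barring the end of the file); but what you probably wanted is
to delete every line up to the first one including @samp{abc}, and this
is obtained with @samp{0,/abc/d}.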
 3997: @end table
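
Several of the non-bugs listed in the table can be reproduced from the
shell.  The commands below are a sketch with made-up sample input;
items 2 and 3 rely on @acronym{GNU} extensions and are not expected to
behave the same way in other implementations.

```shell
# 1. N on the last line: guarding with $! gives the same result on
#    GNU and traditional implementations.  Prints "1 2" then "3".
printf '%s\n' 1 2 3 | sed '$!N;s/\n/ /'

# 2. GNU extensions to basic regular expressions: \+ and \| are
#    special.  Prints "X-Y-Y" with GNU sed.
echo 'xxx-abc-def' | sed 's/x\+/X/;s/abc\|def/Y/g'

# 3. With address 0 the range may end on line 1, so only the first
#    line is deleted (GNU sed).  Prints "x", "abc", "y".
printf '%s\n' abc x abc y | sed '0,/abc/d'

# With address 1 the end pattern is first tried on line 2, so the
# range runs to the second "abc".  Prints "y".
printf '%s\n' abc x abc y | sed '1,/abc/d'
```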
 3998: 
 3999: @node Extended regexps
 4000: @c @appendix Extended regular expressions
@appendix Extended regular expressions
 4002: @cindex Extended regular expressions, syntax
 4003: 
 4004: @c The only difference between basic and extended regular expressions is in
 4005: @c the behavior of a few characters: @samp{?}, @samp{+}, parentheses,
 4006: @c and braces (@samp{@{@}}).  While basic regular expressions require
 4007: @c these to be escaped if you want them to behave as special characters,
 4008: @c when using extended regular expressions you must escape them if
 4009: @c you want them @emph{to match a literal character}.
 4010: @c 
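The only difference between basic and extended regular expressions is in
the behavior of a few characters: @samp{?}, @samp{+}, parentheses,
and braces (@samp{@{@}}).  While basic regular expressions require
these to be escaped if you want them to behave as special characters,
when using extended regular expressions you must escape them if
you want them @emph{to match a literal character}.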
 4016: 
 4017: @noindent
 4018: @c Examples:
 4019: @c 
Examples:
 4021: @table @code
 4022: @item abc?
 4023: @c becomes @samp{abc\?} when using extended regular expressions.  It matches
 4024: @c the literal string @samp{abc?}.
 4025: @c 
becomes @samp{abc\?} when using extended regular expressions.  It matches
the literal string @samp{abc?}.
 4028: 
 4029: @item c\+
 4030: @c becomes @samp{c+} when using extended regular expressions.  It matches
 4031: @c one or more @samp{c}s.
 4032: @c 
becomes @samp{c+} when using extended regular expressions.  It matches
one or more @samp{c}s.
 4035: 
 4036: @item a\@{3,\@}
 4037: @c becomes @samp{a@{3,@}} when using extended regular expressions.  It matches
 4038: @c three or more @samp{a}s.
 4039: @c 
becomes @samp{a@{3,@}} when using extended regular expressions.  It matches
three or more @samp{a}s.
 4042: 
 4043: @item \(abc\)\@{2,3\@}
 4044: @c becomes @samp{(abc)@{2,3@}} when using extended regular expressions.  It
 4045: @c matches either @samp{abcabc} or @samp{abcabcabc}.
 4046: @c 
becomes @samp{(abc)@{2,3@}} when using extended regular expressions.  It
matches either @samp{abcabc} or @samp{abcabcabc}.
 4049: 
 4050: @item \(abc*\)\1
 4051: @c becomes @samp{(abc*)\1} when using extended regular expressions.
 4052: @c Backreferences must still be escaped when using extended regular
 4053: @c expressions.
 4054: @c 
becomes @samp{(abc*)\1} when using extended regular expressions.
Backreferences must still be escaped when using extended regular
expressions.
 4057: @end table
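
The pairs in the table can be compared side by side.  The sketch below
assumes @acronym{GNU} @command{sed}, where @option{-r} selects extended
regular expressions; the sample strings are made up for illustration.
Each command is expected to print @samp{X}.

```shell
# Three or more "a"s: basic syntax, then extended syntax.
echo 'aaaa'   | sed    's/a\{3,\}/X/'
echo 'aaaa'   | sed -r 's/a{3,}/X/'

# A group repeated two or three times.
echo 'abcabc' | sed    's/\(abc\)\{2,3\}/X/'
echo 'abcabc' | sed -r 's/(abc){2,3}/X/'

# A literal question mark: plain in basic syntax, escaped in extended.
echo 'abc?'   | sed    's/abc?/X/'
echo 'abc?'   | sed -r 's/abc\?/X/'
```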
 4058: 
 4059: @ifset PERL
 4060: @node Perl regexps
 4061: @c @appendix Perl-style regular expressions
@appendix Perl-style regular expressions
 4063: @cindex Perl-style regular expressions, syntax
 4064: 
 4065: @emph{This part is taken from the @file{pcre.txt} file distributed together
 4066: with the free @sc{pcre} regular expression matcher; it was written by Philip Hazel.}
 4067: 
 4068: Perl introduced several extensions to regular expressions, some
 4069: of them incompatible with the syntax of regular expressions
 4070: accepted by Emacs and other @acronym{GNU} tools (whose matcher was
 4071: based on the Emacs matcher).  @value{SSED} implements
 4072: both kinds of extensions.
 4073: 
 4074: @iftex
 4075: Summarizing, we have:
 4076: 
 4077: @itemize @bullet
 4078: @item
 4079: A backslash can introduce several special sequences
 4080: 
 4081: @item
 4082: The circumflex, dollar sign, and period characters behave specially 
 4083: with regard to new lines
 4084: 
 4085: @item
 4086: Strange uses of square brackets are parsed differently
 4087: 
 4088: @item
 4089: You can toggle modifiers in the middle of a regular expression
 4090: 
 4091: @item
 4092: You can specify that a subpattern does not count when numbering backreferences
 4093: 
 4094: @item
 4095: @cindex Greedy regular expression matching
 4096: You can specify greedy or non-greedy matching
 4097: 
 4098: @item
 4099: You can have more than ten back references
 4100: 
 4101: @item
 4102: You can do complex look aheads and look behinds (in the spirit of
 4103: @code{\b}, but with subpatterns).
 4104: 
 4105: @item
You can often improve performance by keeping @command{sed} from wasting
time on backtracking
 4108: 
 4109: @item
 4110: You can have if/then/else branches
 4111: 
 4112: @item
 4113: You can do recursive matches, for example to look for unbalanced parentheses
 4114: 
 4115: @item
 4116: You can have comments and non-significant whitespace, because things can
 4117: get complex...
 4118: @end itemize
 4119: 
 4120: Most of these extensions are introduced by the special @code{(?}
 4121: sequence, which gives special meanings to parenthesized groups.
 4122: @end iftex
 4123: @menu
Other extensions can be roughly subdivided into two categories.
On one hand, Perl introduces several more escaped sequences
(that is, sequences introduced by a backslash).  On the other
hand, it specifies that if a question mark follows an open
parenthesis it should give a special meaning to the parenthesized
group.
 4130: 
 4131: * Backslash::                       Introduces special sequences
 4132: * Circumflex/dollar sign/period::   Behave specially with regard to new lines
 4133: * Square brackets::                 Are a bit different in strange cases
 4134: * Options setting::                 Toggle modifiers in the middle of a regexp
 4135: * Non-capturing subpatterns::       Are not counted when backreferencing
 4136: * Repetition::                      Allows for non-greedy matching
 4137: * Backreferences::                  Allows for more than 10 back references
 4138: * Assertions::                      Allows for complex look ahead matches
 4139: * Non-backtracking subpatterns::    Often gives more performance
 4140: * Conditional subpatterns::         Allows if/then/else branches
 4141: * Recursive patterns::              For example to match parentheses
 4142: * Comments::                        Because things can get complex...
 4143: @end menu
 4144: 
 4145: @node Backslash
 4146: @c @appendixsec Backslash
@appendixsec Backslash
 4148: @cindex Perl-style regular expressions, escaped sequences
 4149: 
There are a few differences in the handling of backslashed
sequences in Perl mode.
 4152: 
 4153: First of all, there are no @code{\o} and @code{\d} sequences.
 4154: @sc{ascii} values for characters can be specified in octal
 4155: with a @code{\@var{xxx}} sequence, where @var{xxx} is a
 4156: sequence of up to three octal digits.  If the first digit
 4157: is a zero, the treatment of the sequence is straightforward;
 4158: just note that if the character that follows the escaped digit
 4159: is itself an octal digit, you have to supply three octal digits
 4160: for @var{xxx}.  For example @code{\07} is a @sc{bel} character
 4161: rather than a @sc{nul} and a literal @code{7} (this sequence is
 4162: instead represented by @code{\0007}).
 4163: 
 4164: @cindex Perl-style regular expressions, backreferences
 4165: The handling of a backslash followed by a digit other than 0
 4166: is complicated.  Outside a character class, @command{sed} reads it
 4167: and any following digits as a decimal number. If the number
 4168: is less than 10, or if there have been at least that many
 4169: previous capturing left parentheses in the expression, the
 4170: entire sequence is taken as a back reference. A description
 4171: of how this works is given later, following the discussion
 4172: of parenthesized subpatterns.
 4173: 
 4174: Inside a character class, or if the decimal number is
 4175: greater than 9 and there have not been that many capturing
 4176: subpatterns, @command{sed} re-reads up to three octal digits following
 4177: the backslash, and generates a single byte from the
 4178: least significant 8 bits of the value. Any subsequent digits
 4179: stand for themselves.  For example:
 4180: 
 4181: @example
 4182:      \040  @i{is another way of writing a space}
 4183:      \40   @i{is the same, provided there are fewer than 40}
 4184:            @i{previous capturing subpatterns}
 4185:      \7    @i{is always a back reference}
 4186:      \011  @i{is always a tab}
 4187:      \11   @i{might be a back reference, or another way of}
 4188:            @i{writing a tab}
 4189:      \0113 @i{is a tab followed by the character @samp{3}}
 4190:      \113  @i{is the character with octal code 113 (since there}
 4191:            @i{can be no more than 99 back references)}
 4192:      \377  @i{is a byte consisting entirely of 1 bits (@sc{ascii} 255)}
 4193:      \81   @i{is either a back reference, or a binary zero}
 4194:            @i{followed by the two characters @samp{81}}
 4195: @end example
 4196: 
 4197: Note that octal values of 100 or greater must not be introduced
 4198: by a leading zero, because no more than three octal
 4199: digits are ever read.
 4200: 
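The unambiguous octal forms in the table above can be tried out with
the following minimal sketch.  It uses Python's @code{re} module, whose
escape syntax is largely Perl-derived, purely as a stand-in for
@command{sed}'s Perl mode; the variable names are illustrative, and the
ambiguous forms such as @code{\11} are left out because their handling
may differ between implementations.

```python
import re

# Python's re module stands in for Perl-style syntax here (an assumption,
# not sed itself).  Only the unambiguous octal escapes are exercised.
space = re.fullmatch(r"\040", " ")           # octal 040 is a space
tab = re.fullmatch(r"\011", "\t")            # octal 011 is a tab
tab_then_3 = re.fullmatch(r"\0113", "\t3")   # only three octal digits are read
letter_k = re.fullmatch(r"\113", "K")        # octal 113 is the letter K

print(all(m is not None for m in (space, tab, tab_then_3, letter_k)))
```
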
 4201: All the sequences that define a single byte value can be
 4202: used both inside and outside character classes. In addition,
 4203: inside a character class, the sequence @code{\b} is interpreted
 4204: as the backspace character (hex 08). Outside a character
 4205: class it has a different meaning (see below).
 4206: 
 4207: There are also four additional escapes specifying
 4208: generic character classes (like @code{\w} and @code{\W} do):
 4209: 
 4210: @cindex Perl-style regular expressions, character classes
 4211: @table @samp
 4212: @item \d
 4213: Matches any decimal digit
 4214: 
 4215: @item \D
 4216: Matches any character that is not a decimal digit

@item \s
Matches any whitespace character

@item \S
Matches any character that is not a whitespace character
 4217: @end table
 4218: 
 4219: In Perl mode, these character type sequences can appear both inside and
 4220: outside character classes.  In @sc{posix} mode, by contrast, these sequences
 4221: (as well as @code{\w} and @code{\W}) are treated as two literal characters
 4222: (a backslash and a letter) inside square brackets.
 4223: 
 4224: Escaped sequences specifying assertions are also different in
 4225: Perl mode.  An assertion specifies a condition that has to be met
 4226: at a particular point in a match, without consuming any
 4227: characters from the subject string. The use of subpatterns
 4228: for more complicated assertions is described below.  The
 4229: backslashed assertions are
 4230: 
 4231: @cindex Perl-style regular expressions, assertions
 4232: @table @samp
 4233: @item \b
 4234: Asserts that the point is at a word boundary.
 4235: A word boundary is a position in the subject string where
 4236: the current character and the previous character do not both
 4237: match @code{\w} or @code{\W} (i.e. one matches @code{\w} and
 4238: the other matches @code{\W}), or the start or end of the string
 4239: if the first or last character matches @code{\w}, respectively.
 4240: 
 4241: @item \B
 4242: Asserts that the point is not at a word boundary.
 4243: 
 4244: @item \A
 4245: Asserts the matcher is at the start of pattern space (independent
 4246: of multiline mode).
 4247: 
 4248: @item \Z
 4249: Asserts the matcher is at the end of pattern space,
 4250: or at a newline before the end of pattern space (independent of
 4251: multiline mode).
 4252: 
 4253: @item \z
 4254: Asserts the matcher is at the end of pattern space (independent
 4255: of multiline mode).
 4256: @end table
 4257: 
 4258: These assertions may not appear in character classes (but
 4259: note that @code{\b} has a different meaning, namely the
 4260: backspace character, inside a character class).
 4261: Note that Perl mode does not directly support assertions
 4262: for the beginning and the end of a word; the @acronym{GNU} extensions
 4263: @code{\<} and @code{\>} achieve this purpose in @sc{posix} mode
 4264: instead.
 4265: 
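As an illustration of the word-boundary assertions described above,
here is a small sketch using Python's @code{re} module as a stand-in
for Perl-style matching.  The sample text and variable names are
assumptions made for the example; only @code{\b} and @code{\B} are
shown, since the end-of-subject assertions are spelled differently in
Python.

```python
import re

text = "cat concat cat's"

# \b demands a transition between a word and a non-word character
# (or the start/end of the subject) on each side of "cat".
whole_words = re.findall(r"\bcat\b", text)

# \B demands that there is no such transition before "cat".
embedded = re.findall(r"\Bcat", text)

print(whole_words, embedded)
```

The first search skips the @samp{cat} inside @samp{concat}, and the
second finds only that one.
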
 4266: The @code{\A}, @code{\Z}, and @code{\z} assertions differ
 4267: from the traditional circumflex and dollar sign (described below)
 4268: in that they only ever match at the very start and end of the
 4269: subject string, whatever options are set; in particular @code{\A}
 4270: and @code{\z} are the same as the @acronym{GNU} extensions
 4271: @code{\`} and @code{\'} that are active in @sc{posix} mode.
 4272: 
 4273: @node Circumflex/dollar sign/period
 4274: @appendixsec Circumflex, dollar sign, period
 4275: @cindex Perl-style regular expressions, newlines
 4276: 
 4277: Outside a character class, in the default matching mode, the
 4278: circumflex character is an assertion which is true only if
 4279: the current matching point is at the start of the subject
 4280: string.  Inside a character class, the circumflex has an entirely
 4281: different meaning (see below).
 4282: 
 4283: The circumflex need not be the first character of the pattern if
 4284: a number of alternatives are involved, but it should be the
 4285: first thing in each alternative in which it appears if the
 4286: pattern is ever to match that branch. If all possible alternatives
 4287: start with a circumflex, that is, if the pattern is
 4288: constrained to match only at the start of the subject, it is
 4289: said to be an @dfn{anchored} pattern. (There are also other constructs
 4290: that can cause a pattern to be anchored.)
 4291: 
 4292: A dollar sign is an assertion which is true only if the
 4293: current matching point is at the end of the subject string,
 4294: or immediately before a newline character that is the last
 4295: character in the string (by default).  A dollar sign need not be the
 4296: last character of the pattern if a number of alternatives
 4297: are involved, but it should be the last item in any branch
 4298: in which it appears.  A dollar sign has no special meaning in a
 4299: character class.
 4300: 
 4301: @cindex Perl-style regular expressions, multiline
 4302: The meanings of the circumflex and dollar sign characters are
 4303: changed if the @code{M} modifier option is used. When this is
 4304: the case, they match immediately after and immediately
 4305: before an internal @code{\n} character, respectively, in addition
 4306: to matching at the start and end of the subject string.  For
 4307: example, the pattern @code{/^abc$/} matches the subject string
 4308: @samp{def\nabc} in multiline mode, but not otherwise.  Consequently,
 4309: patterns that are anchored in single line mode
 4310: because all branches start with @code{^} are not anchored in
 4311: multiline mode.
 4312: 
 4313: @cindex Perl-style regular expressions, multiline
 4314: Note that the sequences @code{\A}, @code{\Z}, and @code{\z}
 4315: can be used to match the start and end of the subject in both
 4316: modes, and if all branches of a pattern start with @code{\A}
 4317: it is always anchored, whether the @code{M} modifier is set or not.
 4318: 
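The effect of multiline mode on the anchors can be sketched as follows.
Python's @code{re} module is used as a stand-in, with its
@code{re.M} flag assumed to play the role of the @code{M} modifier;
the subject string is the one from the text.

```python
import re

subject = "def\nabc"

# Default mode: ^ matches only at the very start, so this finds nothing.
default_mode = re.search(r"^abc$", subject)

# Multiline mode: ^ also matches just after the internal newline.
multiline = re.search(r"^abc$", subject, re.M)

# \A ignores multiline mode and stays tied to the start of the subject.
anchored = re.search(r"\Aabc", subject, re.M)

print(default_mode, multiline is not None, anchored)
```
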
 4319: @cindex Perl-style regular expressions, single line
 4320: Outside a character class, a dot in the pattern matches any
 4321: one character in the subject, including a non-printing character,
 4322: but not (by default) newline.  If the @code{S} modifier is used,
 4323: dots match newlines as well.  Actually, the handling of
 4324: dot is entirely independent of the handling of circumflex
 4325: and dollar sign, the only relationship being that they both
 4326: involve newline characters. Dot has no special meaning in a
 4327: character class.
 4328: 
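A short sketch of the dot rules above, again using Python's @code{re}
module as a stand-in; its @code{re.S} flag is assumed to correspond to
the @code{S} modifier.

```python
import re

# By default a dot does not match a newline.
no_flag = re.fullmatch(r"a.b", "a\nb")

# With the single line flag, a dot matches a newline as well.
single_line = re.fullmatch(r"a.b", "a\nb", re.S)

# Inside a character class the dot is an ordinary character.
class_other = re.fullmatch(r"[.]", "x")
class_dot = re.fullmatch(r"[.]", ".")

print(no_flag, single_line is not None, class_other, class_dot is not None)
```
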
 4329: @node Square brackets
 4330: @appendixsec Square brackets
 4331: @cindex Perl-style regular expressions, character classes
 4332: 
 4333: An opening square bracket introduces a character class, terminated
 4334: by a closing square bracket.  A closing square bracket on its own
 4335: is not special.  If a closing square bracket is required as a
 4336: member of the class, it should be the first data character in
 4337: the class (after an initial circumflex, if present) or escaped with a backslash.
 4338: 
 4339: A character class matches a single character in the subject;
 4340: the character must be in the set of characters defined by
 4341: the class, unless the first character in the class is a circumflex,
 4342: in which case the subject character must not be in
 4343: the set defined by the class. If a circumflex is actually
 4344: required as a member of the class, ensure it is not the
 4345: first character, or escape it with a backslash.
 4346: 
 4347: For example, the character class @code{[aeiou]} matches any lower
 4348: case vowel, while @code{[^aeiou]} matches any character that is not
 4349: a lower case vowel. Note that a circumflex is just a convenient
 4350: notation for specifying the characters which are in
 4351: the class by enumerating those that are not. It is not an
 4352: assertion: it still consumes a character from the subject
 4353: string, and fails if the current pointer is at the end of
 4354: the string.
 4355: 
 4356: @cindex Perl-style regular expressions, case-insensitive
 4357: When caseless matching is set, any letters in a class
 4358: represent both their upper case and lower case versions, so
 4359: for example, a caseless @code{[aeiou]} matches uppercase
 4360: and lowercase @samp{A}s, and a caseless @code{[^aeiou]}
 4361: does not match @samp{A}, whereas a case-sensitive version would.
 4362: 
 4363: @cindex Perl-style regular expressions, single line
 4364: @cindex Perl-style regular expressions, multiline
 4365: The newline character is never treated in any special way in
 4366: character classes, whatever the setting of the @code{S} and
 4367: @code{M} options (modifiers) is.  A class such as @code{[^a]} will
 4368: always match a newline.
 4369: 
 4370: The minus (hyphen) character can be used to specify a range
 4371: of characters in a character class.  For example, @code{[d-m]}
 4372: matches any letter between d and m, inclusive.  If a minus
 4373: character is required in a class, it must be escaped with a
 4374: backslash or appear in a position where it cannot be interpreted
 4375: as indicating a range, typically as the first or last
 4376: character in the class.
 4377: 
 4378: It is not possible to have the literal character @code{]} as the
 4379: end character of a range.  A pattern such as @code{[W-]46]} is
 4380: interpreted as a class of two characters (@code{W} and @code{-})
 4381: followed by a literal string @code{46]}, so it would match
 4382: @samp{W46]} or @samp{-46]}. However, if the @code{]} is escaped
 4383: with a backslash it is interpreted as the end of range, so
 4384: @code{[W-\]46]} is interpreted as a single class containing a
 4385: range followed by two separate characters. The octal or
 4386: hexadecimal representation of @code{]} can also be used to end a range.
 4387: 
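The two readings of a closing square bracket near a hyphen can be
checked with the sketch below.  It relies on Python's @code{re} module,
which happens to parse these two patterns the same way as described
here; that agreement is an assumption about the stand-in, not a
statement about @command{sed}.

```python
import re

# [W-] is a class holding W and -, followed by the literal text 46]
two_char_class = re.fullmatch(r"[W-]46]", "-46]")

# With the bracket escaped, W-\] is a range (W X Y Z [ \ ]) and the
# class also contains the separate characters 4 and 6.
members = [c for c in "WZ]46-" if re.fullmatch(r"[W-\]46]", c)]

print(two_char_class is not None, members)
```

The hyphen itself is not a member of the second class, because it is
consumed as the range operator.
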
 4388: Ranges operate in @sc{ascii} collating sequence. They can also be
 4389: used for characters specified numerically, for example
 4390: @code{[\000-\037]}. If a range that includes letters is used when
 4391: caseless matching is set, it matches the letters in either
 4392: case. For example, a caseless @code{[W-c]} is equivalent to
 4393: @code{[][\\^_`wxyzabc]}, matched caselessly, and if character
 4394: tables for the French locale are in use, @code{[\xc8-\xcb]}
 4395: matches accented E characters in both cases.
 4396: 
 4397: Unlike in @sc{posix} mode, the character types @code{\d},
 4398: @code{\D}, @code{\s}, @code{\S}, @code{\w}, and @code{\W}
 4399: may also appear in a character class, and add the characters
 4400: that they match to the class. For example, @code{[\dABCDEF]} matches any
 4401: hexadecimal digit.  A circumflex can conveniently be used
 4402: with the upper case character types to specify a more restricted
 4403: set of characters than the matching lower case type.
 4404: For example, the class @code{[^\W_]} matches any letter or digit,
 4405: but not underscore.
 4406: 
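The two classes mentioned above can be exercised with this sketch,
which uses Python's @code{re} module as a stand-in.  The
@code{re.ASCII} flag is an assumption added so that @code{\d} and
@code{\W} are limited to @sc{ascii}, and the sample strings are made up
for the example.

```python
import re

# A character type inside a class adds its members to the class.
hex_upper = re.fullmatch(r"[\dABCDEF]+", "7F3A", re.ASCII)
not_hex = re.fullmatch(r"[\dABCDEF]+", "7G", re.ASCII)

# A negated class built from \W and _ leaves only letters and digits.
runs = re.findall(r"[^\W_]+", "foo_bar42 baz", re.ASCII)

print(hex_upper is not None, not_hex, runs)
```
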
 4407: All non-alphanumeric characters other than @code{\}, @code{-},
 4408: @code{^} (at the start) and the terminating @code{]}
 4409: are non-special in character classes, but it does no harm
 4410: if they are escaped.
 4411: 
 4412: Perl 5.6 supports the @sc{posix} notation for character classes, which
 4413: uses names enclosed by @code{[:} and @code{:]} within the enclosing
 4414: square brackets, and @value{SSED} supports this notation as well.
 4415: For example,
 4416: 
 4417: @example
 4418:      [01[:alpha:]%]
 4419: @end example
 4420: 
 4421: @noindent
 4422: matches @samp{0}, @samp{1}, any alphabetic character, or @samp{%}.
 4423: The supported class names are
 4424: 
 4425: @table @code
 4426: @item alnum
 4427: Matches letters and digits
 4428: 
 4429: @item alpha
 4430: Matches letters
 4431: 
 4432: @item ascii
 4433: Matches character codes 0 - 127
 4434: 
 4435: @item cntrl
 4436: Matches control characters
 4437: 
 4438: @item digit
 4439: Matches decimal digits (same as \d)
 4440: 
 4441: @item graph
 4442: Matches printing characters, excluding space
 4443: 
 4444: @item lower
 4445: Matches lower case letters
 4446: 
 4447: @item print
 4448: Matches printing characters, including space
 4449: 
 4450: @item punct
 4451: Matches printing characters, excluding letters and digits
 4452: 
 4453: @item space
 4454: Matches white space (same as \s)
 4455: 
 4456: @item upper
 4457: Matches upper case letters
 4458: 
 4459: @item word
 4460: Matches ``word'' characters (same as \w)
 4461: 
 4462: @item xdigit
 4463: Matches hexadecimal digits
 4464: @end table
 4465: 
 4466: The names @code{ascii} and @code{word} are extensions valid only in
 4467: Perl mode.  Another Perl extension is negation, which is
 4468: indicated by a circumflex character after the colon. For example,
 4469: 
 4470: @example
 4471:      [12[:^digit:]]
 4472: @end example
 4473: 
 4474: @noindent
 4475: matches @samp{1}, @samp{2}, or any non-digit.
 4476: 
 4477: @node Options setting
 4478: @appendixsec Options setting
 4479: @cindex Perl-style regular expressions, toggling options
 4480: @cindex Perl-style regular expressions, case-insensitive
 4481: @cindex Perl-style regular expressions, multiline
 4482: @cindex Perl-style regular expressions, single line
 4483: @cindex Perl-style regular expressions, extended
 4484: 
 4485: The settings of the @code{I}, @code{M}, @code{S}, @code{X}
 4486: modifiers can be changed from within the pattern by
 4487: a sequence of Perl option letters enclosed between @code{(?}
 4488: and @code{)}. The option letters must be lowercase.
 4489: 
 4490: For example, @code{(?im)} sets caseless, multiline matching. It is
 4491: also possible to unset these options by preceding the letter
 4492: with a hyphen; you can also have combined settings and unsettings:
 4493: @code{(?im-sx)} sets caseless and multiline matching,
 4494: while unsetting single line matching (for dots) and extended
 4495: whitespace interpretation.  If a letter appears both before
 4496: and after the hyphen, the option is unset.
 4497: 
 4498: The scope of these option changes depends on where in the
 4499: pattern the setting occurs. For settings that are outside
 4500: any subpattern (defined below), the effect is the same as if
 4501: the options were set or unset at the start of matching. The
 4502: following patterns all behave in exactly the same way:
 4503: 
 4504: @example
 4505:      (?i)abc
 4506:      a(?i)bc
 4507:      ab(?i)c
 4508:      abc(?i)
 4509: @end example
 4510: 
 4511: which in turn is the same as specifying the pattern @code{abc} with
 4512: the @code{I} modifier.  In other words, ``top level'' settings
 4513: apply to the whole pattern (unless there are other
 4514: changes inside subpatterns). If there is more than one setting
 4515: of the same option at top level, the rightmost setting
 4516: is used.
 4517: 
 4518: If an option change occurs inside a subpattern, the effect
 4519: is different.  This is a change of behaviour in Perl 5.005.
 4520: An option change inside a subpattern affects only that part
 4521: of the subpattern @emph{that follows} it, so
 4522: 
 4523: @example
 4524:      (a(?i)b)c
 4525: @end example
 4526: 
 4527: @noindent
 4528: matches @samp{abc} and @samp{aBc} and no other strings (assuming
 4529: case-sensitive matching is used).  By this means, options can
 4530: be made to have different settings in different parts of the
 4531: pattern.  Any changes made in one alternative do carry on
 4532: into subsequent branches within the same subpattern.  For
 4533: example,
 4534: 
 4535: @example
 4536:      (a(?i)b|c)
 4537: @end example
 4538: 
 4539: @noindent
 4540: matches @samp{ab}, @samp{aB}, @samp{c}, and @samp{C},
 4541: even though when matching @samp{C} the first branch is
 4542: abandoned before the option setting.
 4543: This is because the effects of option settings happen at
 4544: compile time. There would be some very weird behaviour otherwise.
 4545: 
 4546: @ignore
 4547: There are two PCRE-specific options PCRE_UNGREEDY and PCRE_EXTRA
 4548: that can be changed in the same way as the Perl-compatible options by
 4549: using the characters U and X respectively.  The (?X) flag
 4550: setting is special in that it must always occur earlier in
 4551: the pattern than any of the additional features it turns on,
 4552: even when it is at top level. It is best put at the start.
 4553: @end ignore
 4554: 
 4555: 
 4556: @node Non-capturing subpatterns
 4557: @appendixsec Non-capturing subpatterns
 4558: @cindex Perl-style regular expressions, non-capturing subpatterns
 4559: 
 4560: Marking part of a pattern as a subpattern does two things.
 4561: On one hand, it localizes a set of alternatives; on the other
 4562: hand, it sets up the subpattern as a capturing subpattern (as
 4563: defined above).  The subpattern can be backreferenced and
 4564: referenced in the right side of @code{s} commands.
 4565: 
 4566: For example, if the string @samp{the red king} is matched against
 4567: the pattern
 4568: 
 4569: @example
 4570:      the ((red|white) (king|queen))
 4571: @end example
 4572: 
 4573: @noindent
 4574: the captured substrings are @samp{red king}, @samp{red},
 4575: and @samp{king}, and are numbered 1, 2, and 3.
 4576: 
 4577: The fact that plain parentheses fulfil two functions is not
 4578: always helpful.  There are often times when a grouping
 4579: subpattern is required without a capturing requirement.  If an
 4580: opening parenthesis is followed by @code{?:}, the subpattern does
 4581: not do any capturing, and is not counted when computing the
 4582: number of any subsequent capturing subpatterns. For example,
 4583: if the string @samp{the white queen} is matched against the pattern
 4584: 
 4585: @example
 4586:      the ((?:red|white) (king|queen))
 4587: @end example
 4588: 
 4589: @noindent
 4590: the captured substrings are @samp{white queen} and @samp{queen},
 4591: and are numbered 1 and 2. The maximum number of captured
 4592: substrings is 99, while the maximum number of all subpatterns,
 4593: both capturing and non-capturing, is 200.
 4594: 
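The numbering of captured substrings in the two patterns above can be
observed with the following sketch, which uses Python's @code{re}
module as a stand-in for Perl-style matching; the variable names are
illustrative.

```python
import re

# Plain parentheses both group and capture: three numbered substrings.
capturing = re.search(r"the ((red|white) (king|queen))", "the red king")

# (?: ... ) groups without capturing, so only two substrings are numbered.
non_capturing = re.search(r"the ((?:red|white) (king|queen))",
                          "the white queen")

print(capturing.groups())
print(non_capturing.groups())
```
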
 4595: As a convenient shorthand, if any option settings are
 4596: required at the start of a non-capturing subpattern, the
 4597: option letters may appear between the @code{?} and the
 4598: @code{:}.  Thus the two patterns
 4599: 
 4600: @example
 4601:    (?i:saturday|sunday)
 4602:    (?:(?i)saturday|sunday)
 4603: @end example
 4604: 
 4605: @noindent
 4606: match exactly the same set of strings.  Because alternative
 4607: branches are tried from left to right, and options are not
 4608: reset until the end of the subpattern is reached, an option
 4609: setting in one branch does affect subsequent branches, so
 4610: the above patterns match @samp{SUNDAY} as well as @samp{Saturday}.
 4611: 
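The shorthand form can be tried with Python's @code{re} module, used
here as a stand-in (it accepts this spelling from Python 3.6 onwards).
Only the first of the two patterns is shown, because recent Python
versions do not accept an option change in the middle of a pattern;
the list of test words is made up for the example.

```python
import re

# The option letter sits between ? and : and applies to every branch.
pattern = re.compile(r"(?i:saturday|sunday)")

matched = [s for s in ("Saturday", "SUNDAY", "monday")
           if pattern.fullmatch(s)]

print(matched)
```
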
 4612: 
 4613: @node Repetition
 4614: @appendixsec Repetition
 4615: @cindex Perl-style regular expressions, repetitions
 4616: 
 4617: Repetition is specified by quantifiers, which can follow any
 4618: of the following items:
 4619: 
 4620: @itemize @bullet
 4621: @item
 4622: a single character, possibly escaped
 4623: 
 4624: @item
 4625: the @code{.} special character
 4626: 
 4627: @item
 4628: a character class
 4629: 
 4630: @item
 4631: a back reference (see next section)
 4632: 
 4633: @item
 4634: a parenthesized subpattern (unless it is an assertion; @pxref{Assertions})
 4635: @end itemize
 4636: 
 4637: The general repetition quantifier specifies a minimum and
 4638: maximum number of permitted matches, by giving the two
 4639: numbers in curly brackets (braces), separated by a comma.
 4640: The numbers must be less than 65536, and the first must be
 4641: less than or equal to the second. For example:
 4642: 
 4643: @example
 4644:      z@{2,4@}
 4645: @end example
 4646: 
 4647: @noindent
 4648: matches @samp{zz}, @samp{zzz}, or @samp{zzzz}. A closing brace on its own
 4649: is not a special character. If the second number is omitted,
 4650: but the comma is present, there is no upper limit; if the
 4651: second number and the comma are both omitted, the quantifier
 4652: specifies an exact number of required matches. Thus
 4653: 
 4654: @example
 4655:      [aeiou]@{3,@}
 4656: @end example
 4657: 
 4658: @noindent
 4659: matches at least 3 successive vowels, but may match many
 4660: more, while
 4661: 
 4662: @example
 4663:      \d@{8@}
 4664: @end example
 4665: 
 4666: @noindent
 4667: matches exactly 8 digits.  An opening curly bracket that
 4668: appears in a position where a quantifier is not allowed, or
 4669: one that does not match the syntax of a quantifier, is taken
 4670: as a literal character. For example, @{,6@} is not a quantifier,
 4671: but a literal string of four characters.@footnote{It
 4672: raises an error if @option{-R} is not used.}
 4673: 
 4674: The quantifier @samp{@{0@}} is permitted, causing the expression to
 4675: behave as if the previous item and the quantifier were not
 4676: present.
 4677: 
 4678: For convenience (and historical compatibility) the three
 4679: most common quantifiers have single-character abbreviations:
 4680: 
 4681: @table @code
 4682: @item *
 4683: is equivalent to @{0,@}
 4684: 
 4685: @item +
 4686: is equivalent to @{1,@}
 4687: 
 4688: @item ?
 4689: is equivalent to @{0,1@}
 4690: @end table
 4691: 
 4692: It is possible to construct infinite loops by following a
 4693: subpattern that can match no characters with a quantifier
 4694: that has no upper limit, for example:
 4695: 
 4696: @example
 4697:      (a?)*
 4698: @end example
 4699: 
 4700: Earlier versions of Perl used to give an error at
 4701: compile time for such patterns. However, because there are
 4702: cases where this can be useful, such patterns are now
 4703: accepted, but if any repetition of the subpattern does in
 4704: fact match no characters, the loop is forcibly broken.
 4705: 
 4706: @cindex Greedy regular expression matching
 4707: @cindex Perl-style regular expressions, stingy repetitions
 4708: By default, the quantifiers are @dfn{greedy}, as in @sc{posix}
 4709: mode, that is, they match as much as possible (up to the maximum
 4710: number of permitted times), without causing the rest of the
 4711: pattern to fail. The classic example of where this gives problems
 4712: is in trying to match comments in C programs. These appear between
 4713: the sequences @code{/*} and @code{*/} and within the sequence, individual
 4714: @code{*} and @code{/} characters may appear. An attempt to match C
 4715: comments by applying the pattern
 4716: 
 4717: @example
 4718:      /\*.*\*/
 4719: @end example
 4720: 
 4721: @noindent
 4722: to the string
 4723: 
 4724: @example
 4725:      /* first comment */ not comment /* second comment */
 4726: @end example
 4727: 
 4728: @noindent
 4730: fails, because it matches the entire string owing to the
 4731: greediness of the @code{.*} item.
 4732: 
 4733: However, if a quantifier is followed by a question mark, it
 4734: ceases to be greedy, and instead matches the minimum number
 4735: of times possible, so the pattern @code{/\*.*?\*/}
 4736: does the right thing with the C comments. The meaning of the
 4737: various quantifiers is not otherwise changed, just the preferred
 4738: number of matches.  Do not confuse this use of question
 4739: mark with its use as a quantifier in its own right.
 4740: Because it has two uses, it can sometimes appear doubled, as in
 4741: 
 4742: @example
 4743:      \d??\d
 4744: @end example
 4745: 
 4746: which matches one digit by preference, but can match two if
 4747: that is the only way the rest of the pattern matches.
 4748: 
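The difference between greedy and non-greedy quantifiers can be seen
in the sketch below, which uses Python's @code{re} module as a
stand-in for Perl-style matching.  The subject string is a C fragment
in the spirit of the example in the text, and the variable names are
illustrative.

```python
import re

subject = "/* first comment */ not comment /* second comment */"

# Greedy: .* runs to the last */ and swallows the text in between.
greedy = re.findall(r"/\*.*\*/", subject)

# Non-greedy: .*? stops at the first */ that lets the match succeed.
lazy = re.findall(r"/\*.*?\*/", subject)

# A doubled question mark: an optional digit that prefers to be absent.
one_digit = re.match(r"\d??\d", "12").group()
two_digits = re.fullmatch(r"\d??\d", "12").group()

print(len(greedy), lazy, one_digit, two_digits)
```

The last two lines show that @code{\d??\d} takes one digit when it is
free to, and two only when the whole subject has to be matched.
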
 4749: Note that greediness does not matter when specifying addresses,
 4750: but it can nevertheless be used to improve performance.
 4751: 
 4752: @ignore
 4753:    If the PCRE_UNGREEDY option is set (an option which is not
 4754:    available in Perl), the quantifiers are not greedy by
 4755:    default, but individual ones can be made greedy by following
 4756:    them with a question mark. In other words, it inverts the
 4757:    default behaviour.
 4758: @end ignore
 4759: 
 4760: When a parenthesized subpattern is quantified with a minimum
 4761: repeat count that is greater than 1 or with a limited maximum,
 4762: more memory is required for the compiled pattern, in
 4763: proportion to the size of the minimum or maximum.
 4764: 
 4765: @cindex Perl-style regular expressions, single line
 4766: If a pattern starts with @code{.*} or @code{.@{0,@}} and the
 4767: @code{S} modifier is used, the pattern is implicitly anchored,
 4768: because whatever follows will be tried against every character
 4769: position in the subject string, so there is no point in
 4770: retrying the overall match at any position after the first.
 4771: PCRE treats such a pattern as though it were preceded by @code{\A}.
 4772: 
 4773: When a capturing subpattern is repeated, the value captured
 4774: is the substring that matched the final iteration. For example,
 4775: after
 4776: 
 4777: @example
 4778:      (tweedle[dume]@{3@}\s*)+
 4779: @end example
 4780: 
 4781: @noindent
 4782: has matched @samp{tweedledum tweedledee} the value of the
 4783: captured substring is @samp{tweedledee}.  However, if there are
 4784: nested capturing subpatterns, the corresponding captured
 4785: values may have been set in previous iterations. For example,
 4786: after
 4787: 
 4788: @example
 4789:      /(a|(b))+/
 4790: @end example
 4791: 
 4792: matches @samp{aba}, the value of the second captured substring is
 4793: @samp{b}.
 4794: 
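What a repeated capturing subpattern retains can be inspected with the
sketch below.  Python's @code{re} module is used as a stand-in; in the
first pattern the class is written out three times in place of the
counted repeat from the text, which is an adaptation made for this
example only.

```python
import re

# The outer group is repeated; it keeps the text of its final iteration.
repeated = re.fullmatch(r"(tweedle[dume][dume][dume]\s*)+",
                        "tweedledum tweedledee")
last_iteration = repeated.group(1)

# The nested group was last set in the second iteration and keeps
# that value even though the final iteration took the other branch.
nested = re.fullmatch(r"(a|(b))+", "aba").groups()

print(last_iteration, nested)
```
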
 4795: @node Backreferences
 4796: @appendixsec Backreferences
 4797: @cindex Perl-style regular expressions, backreferences
 4798: 
 4799: Outside a character class, a backslash followed by a digit
 4800: greater than 0 (and possibly further digits) is a back
 4801: reference to a capturing subpattern earlier (i.e.  to its
 4802: left) in the pattern, provided there have been that many
 4803: previous capturing left parentheses.
 4804: 
 4805: However, if the decimal number following the backslash is
 4806: less than 10, it is always taken as a back reference, and
 4807: causes an error only if there are not that many capturing
 4808: left parentheses in the entire pattern. In other words, the
 4809: parentheses that are referenced need not be to the left of
 4810: the reference for numbers less than 10. @xref{Backslash},
 4811: for further details of the handling of digits following a backslash.
 4812: 
 4813: A back reference matches whatever actually matched the capturing
 4814: subpattern in the current subject string, rather than
 4815: anything matching the subpattern itself. So the pattern
 4816: 
 4817: @example
 4818:      (sens|respons)e and \1ibility
 4819: @end example
 4820: 
 4821: @noindent
 4822: matches @samp{sense and sensibility} and @samp{response and responsibility},
 4823: but not @samp{sense and responsibility}. If caseful
 4824: matching is in force at the time of the back reference, the
 4825: case of letters is relevant. For example,
 4826: 
 4827: @example
 4828:      ((?i)blah)\s+\1
 4829: @end example
 4830: 
 4831: @noindent
 4832: matches @samp{blah blah} and @samp{Blah Blah}, but not
 4833: @samp{BLAH blah}, even though the original capturing
 4834: subpattern is matched caselessly.
 4835: 
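The first back reference example above can be checked with the
following sketch, which uses Python's @code{re} module as a stand-in.
The example with an option change inside the subpattern is not
reproduced, since the stand-in does not support that form.

```python
import re

# \1 must repeat the text actually captured, not merely match the
# subpattern again.
pattern = re.compile(r"(sens|respons)e and \1ibility")

subjects = ("sense and sensibility",
            "response and responsibility",
            "sense and responsibility")
results = [pattern.fullmatch(s) is not None for s in subjects]

print(results)
```
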
 4836: There may be more than one back reference to the same subpattern.
 4837: Also, if a subpattern has not actually been used in a
 4838: particular match, any back references to it always fail. For
 4839: example, the pattern
 4840: 
 4841: @example
 4842:      (a|(bc))\2
 4843: @end example
 4844: 
 4845: @noindent
 4846: always fails if it starts to match @samp{a} rather than
 4847: @samp{bc}.  Because there may be up to 99 back references, all
 4848: digits following the backslash are taken as part of a potential
 4849: back reference number; this is different from what happens
 4850: in @sc{posix} mode. If the pattern continues with a digit
 4851: character, some delimiter must be used to terminate the back
 4852: reference.  If the @code{X} modifier option is set, this can be
 4853: whitespace.  Otherwise an empty comment can be used, or the
 4854: following character can be expressed in hexadecimal or octal.
 4855: 
 4856: A back reference that occurs inside the parentheses to which
 4857: it refers fails when the subpattern is first used, so, for
 4858: example, @code{(a\1)} never matches.  However, such references
 4859: can be useful inside repeated subpatterns. For example, the
 4860: pattern
 4861: 
 4862: @example
 4863:      (a|b\1)+
 4864: @end example
 4865: 
 4866: @noindent
 4867: matches any number of @samp{a}s and also @samp{aba}, @samp{ababbaa},
 4868: etc. At each iteration of the subpattern, the back reference matches
 4869: the character string corresponding to the previous iteration.  In
 4870: order for this to work, the pattern must be such that the first
 4871: iteration does not need to match the back reference.  This can be
 4872: done using alternation, as in the example above, or by a
 4873: quantifier with a minimum of zero.
 4874: 
 4875: @node Assertions
 4876: @appendixsec Assertions
 4877: @cindex Perl-style regular expressions, assertions
 4878: @cindex Perl-style regular expressions, asserting subpatterns
 4879: 
 4880: An assertion is a test on the characters following or
 4881: preceding the current matching point that does not actually
 4882: consume any characters. The simple assertions coded as @code{\b},
 4883: @code{\B}, @code{\A}, @code{\Z}, @code{\z}, @code{^} and @code{$}
 4884: are described above. More complicated assertions are coded as
 4885: subpatterns.  There are two kinds: those that look ahead of the
 4886: current position in the subject string, and those that look behind it.
 4887: 
 4888: @cindex Perl-style regular expressions, lookahead subpatterns
 4889: An assertion subpattern is matched in the normal way, except
 4890: that it does not cause the current matching position to be
 4891: changed. Lookahead assertions start with @code{(?=} for positive
 4892: assertions and @code{(?!} for negative assertions. For example,
 4893: 
 4894: @example
 4895:      \w+(?=;)
 4896: @end example
 4897: 
 4898: @noindent
 4899: matches a word followed by a semicolon, but does not include
 4900: the semicolon in the match, and
 4901: 
 4902: @example
 4903:      foo(?!bar)
 4904: @end example
 4905: 
 4906: @noindent
 4907: matches any occurrence of @samp{foo} that is not followed by
 4908: @samp{bar}.
 4909: 
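The two lookahead examples above can be tried with the sketch below,
using Python's @code{re} module as a stand-in for Perl-style matching.
The subject strings are made up for the illustration.

```python
import re

# Positive lookahead: the semicolon is required but not part of the match.
words = re.findall(r"\w+(?=;)", "alpha; beta gamma;")

# Negative lookahead: report where foo occurs without bar right after it.
starts = [m.start() for m in re.finditer(r"foo(?!bar)", "foobar foobaz foo")]

print(words, starts)
```
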
 4910: Note that the apparently similar pattern
 4911: 
 4912: @example
 4913:      (?!foo)bar
 4914: @end example
 4915: 
 4916: @noindent
 4917: @cindex Perl-style regular expressions, lookbehind subpatterns
 4918: finds any occurrence of @samp{bar} even if it is preceded by
 4919: @samp{foo}, because the assertion @code{(?!foo)} is always true
 4920: when the next three characters are @samp{bar}. A lookbehind assertion
 4921: is needed to find @samp{bar} only when it is not preceded by @samp{foo}.
 4922: Lookbehind assertions start with @code{(?<=} for positive
 4923: assertions and @code{(?<!} for negative assertions. So,
 4924: 
 4925: @example
 4926:      (?<!foo)bar
 4927: @end example
 4928: 
 4929: achieves the required effect of finding an occurrence of
 4930: @samp{bar} that is not preceded by @samp{foo}. The contents of a
 4931: lookbehind assertion are restricted
 4932: such that all the strings it matches must have a fixed
 4933: length.  However, if there are several alternatives, they do
 4934: not all have to have the same fixed length.  This is an extension
 4935: compared with Perl 5.005, which requires all branches to match
 4936: the same length of string. Thus
 4937: 
 4938: @example
 4939:      (?<=dog|dogs|cat|cats)
 4940: @end example
 4941: 
 4942: @noindent
 4943: is permitted, but the apparently equivalent regular expression
 4944: 
 4945: @example
 4946:      (?<=dogs?|cats?)
 4947: @end example
 4948: 
 4949: @noindent
 4950: causes an error at compile time. Branches that match different
 4951: length strings are permitted only at the top level of
 4952: a lookbehind assertion: an assertion such as
 4953: 
 4954: @example
 4955:      (?<=ab(c|de))
 4956: @end example
 4957: 
 4958: @noindent
 4959: is not permitted, because its single top-level branch can
 4960: match two different lengths, but it is acceptable if rewritten
 4961: to use two top-level branches:
 4962: 
 4963: @example
 4964:      (?<=abc|abde)
 4965: @end example
 4966: 
 4967: All this is required because lookbehind assertions simply
 4968: move the current position back by the alternative's fixed
 4969: width and then try to match.  If there are
 4970: insufficient characters before the current position, the
 4971: match is deemed to fail.  Lookbehinds, in conjunction with
 4972: non-backtracking subpatterns, can be particularly useful for
 4973: matching at the ends of strings; an example is given at the end
 4974: of the section on non-backtracking subpatterns.
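
The difference between @code{(?!foo)bar} and @code{(?<!foo)bar} can be
checked with a short sketch.  It assumes Python's @code{re} module as a
stand-in for the Perl-style matcher, and the subject string is invented.
Note that Python is stricter than the rule described above: it requires
every alternative inside a lookbehind to have the same width, so only a
single fixed-width branch is used here.

```python
import re

subject = "foobar bazbar"

# (?!foo)bar: the lookahead is tested at the point where "bar" begins,
# so it can never see "foo" there and always succeeds.
lookahead_hits = [m.start() for m in re.finditer(r"(?!foo)bar", subject)]

# (?<!foo)bar: the lookbehind inspects the three characters before "bar".
lookbehind_hits = [m.start() for m in re.finditer(r"(?<!foo)bar", subject)]

print(lookahead_hits)   # [3, 10]: both occurrences of "bar"
print(lookbehind_hits)  # [10]: only the "bar" that follows "baz"
```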
 4975: 
 4976: Several assertions (of any sort) may occur in succession.
 4977: For example,
 4978: 
 4979: @example
 4980:      (?<=\d@{3@})(?<!999)foo
 4981: @end example
 4982: 
 4983: @noindent
 4984: matches @samp{foo} preceded by three digits that are not @samp{999}.
 4985: Notice that each of the assertions is applied independently
 4986: at the same point in the subject string. First there is a
 4987: check that the previous three characters are all digits, and
 4988: then there is a check that the same three characters are not
 4989: @samp{999}.  This pattern does not match @samp{foo} preceded by six
 4990: characters, the first three of which are digits and the last three
 4991: of which are not @samp{999}.  For example, it doesn't match
 4992: @samp{123abcfoo}. A pattern to do that is
 4993: 
 4994: @example
 4995:      (?<=\d@{3@}...)(?<!999)foo
 4996: @end example
 4997: 
 4998: @noindent
 4999: This time the first assertion looks at the preceding six
 5000: characters, checking that the first three are digits, and
 5001: then the second assertion checks that the preceding three
 5002: characters are not @samp{999}.  Actually, assertions can be
 5003: nested in any combination, so one can write this as 
 5004: 
 5005: @example
 5006:      (?<=\d@{3@}(?!999)...)foo
 5007: @end example
 5008: 
 5009: or
 5010: 
 5011: @example
 5012:      (?<=\d@{3@}...(?<!999))foo
 5013: @end example
 5014: 
 5015: @noindent
 5016: both of which might be considered more readable.
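
The behaviour of successive assertions can be tabulated with a small
sketch.  It assumes Python's @code{re} module as a stand-in for the
Perl-style matcher; the names @code{adjacent}, @code{six_back} and
@code{accepts}, and all subjects other than @samp{123abcfoo}, are
invented for illustration.

```python
import re

# Both assertions look at the same three characters before "foo".
adjacent = re.compile(r"(?<=\d{3})(?<!999)foo")
# The first assertion looks six characters back, the second only three.
six_back = re.compile(r"(?<=\d{3}...)(?<!999)foo")


def accepts(pattern, subject):
    """Return True if pattern matches somewhere in subject."""
    return pattern.search(subject) is not None


for subject in ("123foo", "999foo", "123abcfoo", "123999foo"):
    print(subject, accepts(adjacent, subject), accepts(six_back, subject))
```

Under these assumptions only @samp{123foo} is accepted by the first
pattern and only @samp{123abcfoo} by the second.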
 5017: 
 5018: Assertion subpatterns are not capturing subpatterns, and may
 5019: not be repeated, because it makes no sense to assert the
 5020: same thing several times. If any kind of assertion contains
 5021: capturing subpatterns within it, these are counted for the
 5022: purposes of numbering the capturing subpatterns in the whole
 5023: pattern.  However, substring capturing is carried out only
 5024: for positive assertions, because it does not make sense for
 5025: negative assertions.
 5026: 
 5027: Assertions count towards the maximum of 200 parenthesized
 5028: subpatterns.
 5029: 
 5030: @node Non-backtracking subpatterns
 5031: @appendixsec Non-backtracking subpatterns
 5032: @cindex Perl-style regular expressions, non-backtracking subpatterns
 5033: 
 5034: With both maximizing and minimizing repetition, failure of
 5035: what follows normally causes the repeated item to be evaluated
 5036: again to see if a different number of repeats allows the
 5037: rest of the pattern to match. Sometimes it is useful to
 5038: prevent this, either to change the nature of the match, or
 5039: to cause it to fail earlier than it otherwise might, when the
 5040: author of the pattern knows there is no point in carrying
 5041: on.
 5042: 
 5043: Consider, for example, the pattern @code{\d+foo} when applied to
 5044: the subject line
 5045: 
 5046: @example
 5047:      123456bar
 5048: @end example
 5049: 
 5050: After matching all 6 digits and then failing to match @samp{foo},
 5051: the normal action of the matcher is to try again with only 5
 5052: digits matching the @code{\d+} item, and then with 4, and so on,
 5053: before ultimately failing. Non-backtracking subpatterns
 5054: provide the means for specifying that once a portion of the
 5055: pattern has matched, it is not to be re-evaluated in this way,
 5056: so the matcher would give up immediately on failing to match
 5057: @samp{foo} the first time.  The notation is another kind of special
 5058: parenthesis, starting with @code{(?>} as in this example:
 5059: 
 5060: @example
 5061:      (?>\d+)foo
 5062: @end example
 5063: 
 5064: This kind of parenthesis ``locks up'' the part of the pattern
 5065: it contains once it has matched, and a failure further into
 5066: the pattern is prevented from backtracking into it.
 5067: Backtracking past it to previous items, however, works as
 5068: normal.
 5069: 
 5070: Non-backtracking subpatterns are not capturing subpatterns.  Simple
 5071: cases such as the above example can be thought of as a maximizing
 5072: repeat that must swallow everything it can.  So,
 5073: while both @code{\d+} and @code{\d+?} are prepared to adjust the number of
 5074: digits they match in order to make the rest of the pattern
 5075: match, @code{(?>\d+)} can only match an entire sequence of digits.
 5076: 
 5077: This construction can of course contain arbitrarily complicated
 5078: subpatterns, and it can be nested.
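
The ``all or nothing'' behaviour of @code{(?>\d+)} can be seen in a short
sketch.  It assumes Python's @code{re} module as a stand-in for the
Perl-style matcher.  Python accepts the @code{(?>} notation only from
version 3.11 onwards, so on older interpreters the sketch substitutes a
lookahead followed by a back reference, a different construct that
behaves the same way in this example.  The group name @code{digits} and
the subject are invented.

```python
import re
import sys

if sys.version_info >= (3, 11):
    # Native non-backtracking (atomic) group.
    ATOMIC_DIGITS = r"(?>\d+)"
else:
    # Substitute idiom: the lookahead captures the longest run of digits and
    # the back reference then consumes exactly that run, with no way back in.
    ATOMIC_DIGITS = r"(?:(?=(?P<digits>\d+))(?P=digits))"

subject = "12345"

# \d+ gives back its final digit so that the literal 5 can match.
plain_matches = re.search(r"\d+5", subject) is not None
# The atomic group keeps every digit, so nothing is left for the literal 5.
atomic_matches = re.search(ATOMIC_DIGITS + "5", subject) is not None

print(plain_matches)   # True
print(atomic_matches)  # False
```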
 5079: 
 5080: @cindex Perl-style regular expressions, lookbehind subpatterns
 5081: Non-backtracking subpatterns can be used in conjunction with lookbehind
 5082: assertions to specify efficient matching at the end
 5083: of the subject string. Consider a simple pattern such as
 5084: 
 5085: @example
 5086:      abcd$
 5087: @end example
 5088: 
 5089: @noindent
 5090: when applied to a long string which does not match.  Because
 5091: matching proceeds from left to right, @command{sed} will look for
 5092: each @samp{a} in the subject and then see if what follows matches
 5093: the rest of the pattern. If the pattern is specified as
 5094: 
 5095: @example
 5096:      ^.*abcd$
 5097: @end example
 5098: 
 5099: @noindent
 5100: the initial @code{.*} matches the entire string at first, but when
 5101: this fails (because there is no following @samp{a}), it backtracks
 5102: to match all but the last character, then all but the
 5103: last two characters, and so on. Once again the search for
 5104: @samp{a} covers the entire string, from right to left, so we are
 5105: no better off. However, if the pattern is written as
 5106: 
 5107: @example
 5108:      ^(?>.*)(?<=abcd)
 5109: @end example
 5110: 
 5111: there can be no backtracking for the @code{.*} item; it can match
 5112: only the entire string. The subsequent lookbehind assertion
 5113: does a single test on the last four characters. If it fails,
 5114: the match fails immediately. For long strings, this approach
 5115: makes a significant difference to the processing time.
 5116: 
 5117: When a pattern contains an unlimited repeat inside a subpattern
 5118: that can itself be repeated an unlimited number of
 5119: times, the use of a non-backtracking subpattern is the only way to
 5120: avoid some failing matches taking a very long time
 5121: indeed.@footnote{Actually, the matcher embedded in @value{SSED}
 5122:     tries to do something for this in the simplest cases,
 5123:     like @code{([^b]*b)*}.  These cases are actually quite
 5124:     common: they happen for example in a regular expression
 5125:     like @code{\/\*([^*]*\*)*\/} which matches C comments.}
 5126: 
 5127: The pattern
 5128: 
 5129: @example
 5130:      (\D+|<\d+>)*[!?]
 5131: @end example
 5132: 
 5133: @c ([^0-9<]+<(\d+>)?)*[!?]
 5134: 
 5135: @noindent
 5136: matches an unlimited number of substrings that either consist
 5137: of non-digits, or digits enclosed in angle brackets, followed by
 5138: an exclamation or question mark. When it matches, it runs quickly.
 5139: However, if it is applied to
 5140: 
 5141: @example
 5142:      aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 5143: @end example
 5144: 
 5145: @noindent
 5146: it takes a long time before reporting failure.  This is
 5147: because the string can be divided between the two repeats in
 5148: a large number of ways, and all have to be tried.@footnote{The
 5149: example used @code{[!?]} rather than a single character at the end,
 5150: because both @value{SSED} and Perl have an optimization that allows
 5151: for fast failure when a single character is used. They
 5152: remember the last single character that is required for a
 5153: match, and fail early if it is not present in the string.}
 5154: 
 5155: If the pattern is changed to
 5156: 
 5157: @example
 5158:      ((?>\D+)|<\d+>)*[!?]
 5159: @end example
 5160: 
 5161: sequences of non-digits cannot be broken, and failure happens
 5162: quickly.
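
The cost of the nested repeats can be measured with a rough sketch.  It
assumes Python's @code{re} module as a stand-in for the Perl-style
matcher, which may optimize differently, and it uses a subject of only 18
characters so that the slow case still finishes promptly.  As in any
simple backtracking matcher, each extra character should roughly double
the time taken by the first pattern.  The names @code{plain},
@code{locked} and @code{seconds_to_fail} are invented, and on Python
versions before 3.11 a lookahead plus back reference is substituted for
@code{(?>}.

```python
import re
import sys
import time

if sys.version_info >= (3, 11):
    NON_DIGITS = r"(?>\D+)"
else:
    # Substitute idiom for interpreters without (?>...).
    NON_DIGITS = r"(?:(?=(?P<run>\D+))(?P=run))"

plain = re.compile(r"(\D+|<\d+>)*[!?]")
locked = re.compile(r"(" + NON_DIGITS + r"|<\d+>)*[!?]")


def seconds_to_fail(pattern, subject):
    """Return the search result and the time the search took, in seconds."""
    start = time.perf_counter()
    result = pattern.search(subject)
    return result, time.perf_counter() - start


subject = "a" * 18  # no "!" or "?", so neither pattern can match

plain_result, plain_seconds = seconds_to_fail(plain, subject)
locked_result, locked_seconds = seconds_to_fail(locked, subject)

print(plain_result, locked_result)
print("plain: %.6f s, locked: %.6f s" % (plain_seconds, locked_seconds))
```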
 5163: 
 5164: @node Conditional subpatterns
 5165: @appendixsec Conditional subpatterns
 5166: @cindex Perl-style regular expressions, conditional subpatterns
 5167: 
 5168: It is possible to cause the matching process to obey a subpattern
 5169: conditionally or to choose between two alternative
 5170: subpatterns, depending on the result of an assertion, or
 5171: whether a previous capturing subpattern matched or not. The
 5172: two possible forms of conditional subpattern are
 5173: 
 5174: @example
 5175:      (?(@var{condition})@var{yes-pattern})
 5176:      (?(@var{condition})@var{yes-pattern}|@var{no-pattern})
 5177: @end example
 5178: 
 5179: If the condition is satisfied, the yes-pattern is used; otherwise
 5180: the no-pattern (if present) is used. If there are more than two
 5181: alternatives in the subpattern, a compile-time error occurs.
 5182: 
 5183: There are two kinds of condition. If the text between the
 5184: parentheses consists of a sequence of digits, the condition
 5185: is satisfied if the capturing subpattern of that number has
 5186: previously matched.  The number must be greater than zero.
 5187: Consider the following pattern, which contains non-significant
 5188: white space to make it more readable (assume the @code{X} modifier)
 5189: and to divide it into three parts for ease of discussion:
 5190: 
 5191: @example
 5192:      ( \( )?   [^()]+   (?(1) \) )
 5193: @end example
 5194: 
 5195: The first part matches an optional opening parenthesis, and
 5196: if that character is present, sets it as the first captured
 5197: substring. The second part matches one or more characters
 5198: that are not parentheses. The third part is a conditional
 5199: subpattern that tests whether the first set of parentheses
 5200: matched or not.  If they did, that is, if the subject started
 5201: with an opening parenthesis, the condition is true, and so
 5202: the yes-pattern is executed and a closing parenthesis is
 5203: required. Otherwise, since no-pattern is not present, the
 5204: subpattern matches nothing.  In other words, this pattern
 5205: matches a sequence of non-parentheses, optionally enclosed
 5206: in parentheses.
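
The pattern above can be exercised with a short sketch.  It assumes
Python's @code{re} module as a stand-in for the Perl-style matcher, with
@code{re.VERBOSE} playing the role of the @code{X} modifier.  The use of
@code{fullmatch}, which insists that the whole subject be matched, is an
addition made here so that unbalanced subjects are visibly rejected; the
helper name and the subjects are invented.

```python
import re

# White space inside the pattern is ignored because of re.VERBOSE.
optionally_parenthesized = re.compile(
    r"( \( )?   [^()]+   (?(1) \) )", re.VERBOSE
)


def whole_subject_matches(subject):
    """Return True if the entire subject is matched by the pattern."""
    return optionally_parenthesized.fullmatch(subject) is not None


for subject in ("abc", "(abc)", "(abc", "abc)"):
    print(subject, whole_subject_matches(subject))
```

Under these assumptions the first two subjects are accepted and the two
unbalanced ones are rejected.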
 5207: 
 5208: @cindex Perl-style regular expressions, lookahead subpatterns
 5209: If the condition is not a sequence of digits, it must be an
 5210: assertion.  This may be a positive or negative lookahead or
 5211: lookbehind assertion. Consider this pattern, again containing
 5212: non-significant white space, and with the two alternatives
 5213: on the second and third lines:
 5214: 
 5215: @example
 5216:      (?(?=...[a-z])
 5217:         \d\d-[a-z]@{3@}-\d\d |
 5218:         \d\d-\d\d-\d\d )
 5219: @end example
 5220: 
 5221: The condition is a positive lookahead assertion that matches
 5222: a letter that is three characters away from the current point.
 5223: If a letter is found, the subject is matched against the first
 5224: alternative @samp{@var{dd}-@var{aaa}-@var{dd}} (where @var{aaa} are
 5225: letters and @var{dd} are digits); otherwise it is matched against 
 5226: the second alternative, @samp{@var{dd}-@var{dd}-@var{dd}}.
 5227: 
 5228: 
 5229: @node Recursive patterns
 5230: @appendixsec Recursive patterns
 5231: @cindex Perl-style regular expressions, recursive patterns
 5232: @cindex Perl-style regular expressions, recursion
 5233: 
 5234: Consider the problem of matching a string in parentheses,
 5235: allowing for unlimited nested parentheses. Without the use
 5236: of recursion, the best that can be done is to use a pattern
 5237: that matches up to some fixed depth of nesting. It is not
 5238: possible to handle an arbitrary nesting depth. Perl 5.6 has
 5239: provided an experimental facility that allows regular
 5240: expressions to recurse (amongst other things). It does this
 5241: by interpolating Perl code in the expression at run time,
 5242: and the code can refer to the expression itself. A Perl pattern
 5243: to solve the parentheses problem can be created like
 5244: this:
 5245: 
 5246: @example
 5247:      $re = qr@{\( (?: (?>[^()]+) | (?p@{$re@}) )* \)@}x;
 5248: @end example
 5249: 
 5250: The @code{(?p@{...@})} item interpolates Perl code at run time,
 5251: and in this case refers recursively to the pattern in which it
 5252: appears. Obviously, @command{sed} cannot support the interpolation of
 5253: Perl code.  Instead, the special item @code{(?R)} is provided for
 5254: the specific case of recursion. This pattern solves the
 5255: parentheses problem (assume the @code{X} modifier option is used
 5256: so that white space is ignored):
 5257: 
 5258: @example
 5259:      \( ( (?>[^()]+) | (?R) )* \)
 5260: @end example
 5261: 
 5262: First it matches an opening parenthesis. Then it matches any
 5263: number of substrings which can either be a sequence of
 5264: non-parentheses, or a recursive match of the pattern itself
 5265: (i.e. a correctly parenthesized substring). Finally there is
 5266: a closing parenthesis.
 5267: 
 5268: This particular example pattern contains nested unlimited
 5269: repeats, and so the use of a non-backtracking subpattern for
 5270: matching strings of non-parentheses is important when applying
 5271: the pattern to strings that do not match. For example, when
 5272: it is applied to
 5273: 
 5274: @example
 5275:      (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()
 5276: @end example
 5277: 
 5278: it yields a ``no match'' response quickly. However, if a
 5279: non-backtracking subpattern is not used, the match runs
 5280: for a very long time indeed because there are so many different
 5281: ways the @code{+} and @code{*} repeats can carve up the subject,
 5282: and all have to be tested before failure can be reported.
 5283: 
 5284: The values set for any capturing subpatterns are those from
 5285: the outermost level of the recursion at which the subpattern
 5286: value is set. If the pattern above is matched against
 5287: 
 5288: @example
 5289:      (ab(cd)ef)
 5290: @end example
 5291: 
 5292: @noindent
 5293: the value for the capturing parentheses is @samp{ef}, which is
 5294: the last value taken on at the top level.
 5295: 
 5296: @node Comments
 5297: @appendixsec Comments
 5298: @cindex Perl-style regular expressions, comments
 5299: 
 5300: The sequence @code{(?#} marks the start of a comment which continues
 5301: up to the next closing parenthesis. Nested parentheses
 5302: are not permitted. The characters that make up a comment
 5303: play no part in the pattern matching at all.
 5304: 
 5305: @cindex Perl-style regular expressions, extended
 5306: If the @code{X} modifier option is used, an unescaped @code{#} character
 5307: outside a character class introduces a comment that continues
 5308: up to the next newline character in the pattern.
 5309: @end ifset
 5310: 
 5311: 
 5312: @page
 5313: @node Concept Index
 5315: @unnumbered Concept Index
 5316: 
 5320: This is a general index of all issues discussed in this manual, with the
 5321: exception of the @command{sed} commands and command-line options.
 5322: 
 5323: @printindex cp
 5324: 
 5325: @page
 5326: @node Command and Option Index
 5328: @unnumbered Command and Option Index
 5329: 
 5333: This is an alphabetical list of all @command{sed} commands and command-line
 5334: options.
 5335: 
 5336: @printindex fn
 5337: 
 5338: @contents
 5339: @bye
 5340: 
 5341: @c XXX FIXME: the term "cycle" is never defined...
