Configuration

RedPen’s configuration file consists of two separate blocks. One block configures the RedPen validators and the other lets you override characters and symbols for the input documents.

Configuration file

RedPen has a single configuration file, which contains all the settings RedPen requires to work with different types of input documents. The configuration file is an XML file with a root element of “redpen-conf”. Within this element there are two sub-elements named “validators” and “symbols”.

To match the default validator and symbol settings to a target language, such as Japanese or English, specify the lang attribute (and, where needed, the type attribute) on the redpen-conf element.

The validators section specifies which validators RedPen loads. Each validator in this section can have its property values overridden.

The symbols section overrides the default symbol settings for the target language.

The following is an example of a RedPen configuration file.

<redpen-conf lang="en">
    <validators>
        <validator name="SentenceLength">
            <property name="max_len" value="200"/>
        </validator>
        <validator name="InvalidSymbol" />
        <validator name="SpaceWithSymbol" />
        <validator name="SectionLength">
            <property name="max_num" value="2000"/>
        </validator>
        <validator name="ParagraphNumber" />
    </validators>
    <symbols>
        <symbol name="EXCLAMATION_MARK" value="!" invalid-chars="！" after-space="true" />
        <symbol name="LEFT_SINGLE_QUOTATION_MARK" value="'" invalid-chars="“" before-space="true" />
    </symbols>
</redpen-conf>

In the next section we will cover the configuration of validators in greater detail. The settings for the symbols section are described in Setting symbols.

Validator configuration

The RedPen configuration file contains a “validators” section for registering validators. RedPen applies each validator specified in this section to the input document.

The following is a sample “validators” section.

<validators>
    <validator name="SentenceLength">
        <property name="max_len" value="200"/>
    </validator>
    <validator name="InvalidSymbol" />
    <validator name="SpaceWithSymbol" />
    <validator name="SectionLength">
        <property name="max_num" value="2000"/>
    </validator>
    <validator name="ParagraphNumber" />
</validators>

Each validator is configured within its own “validator” element. The “name” attribute of this element specifies the name of the validator, which is essentially the validator’s class name without the trailing “Validator”. Each validator is responsible for checking a particular aspect of the input document. For example, if the “SectionLength” validator is included in the configuration, then RedPen’s DocumentValidator will check the length of each ‘section’ of each input document.

Some validators can be configured using “property” elements. For example, you can override the maximum character count used by the “SectionLength” validator by specifying a “max_num” property. Some validators also have “sub-validators”, which can likewise be configured within the validators section.
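As a rough sketch of property overriding, the fragment below changes two of the limits that appear in the sample above. The validator and property names are taken from that sample; the values 120 and 1000 are arbitrary assumptions chosen only for illustration.

```xml
<validators>
    <!-- Override max_len for SentenceLength (120 is an illustrative value) -->
    <validator name="SentenceLength">
        <property name="max_len" value="120"/>
    </validator>
    <!-- Override the maximum character count per section (1000 is an illustrative value) -->
    <validator name="SectionLength">
        <property name="max_num" value="1000"/>
    </validator>
    <!-- No property element, so nothing is overridden for this validator -->
    <validator name="ParagraphNumber" />
</validators>
```

A validator that is not listed in this section is not registered, so removing an element is the way to switch a check off.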

All supported validators are described on the Supported Validators page.

Setting symbols

The lang attribute of the redpen-conf element determines how various symbols are handled by RedPen. RedPen supports default symbols for “en” and “ja”, which are described in Default Settings for English and Default Settings for Japanese.

The default symbol settings for a target language can be overridden by configuring the “symbols” section of the RedPen configuration file.

The default settings are described in the following sections. Within the symbols section, symbol elements specify which symbols to use when validating documents. Each “symbol” element overrides the default definition of one symbol.

The following table describes the properties of the symbol element.

Property       Mandatory  Default Value  Description
name           true       none           Name of the symbol
value          true       none           Value of the symbol
before-space   false      false          Need space before the symbol
after-space    false      false          Need space after the symbol
invalid-chars  false      “”             List of invalid symbols
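To make the attribute list concrete, here is a minimal sketch of a single symbol element that sets all five attributes. The choice of LEFT_PARENTHESIS, and of the full-width ‘（’ as its invalid character, is an assumption made for illustration only.

```xml
<symbols>
    <!-- name and value are mandatory; the remaining attributes are optional -->
    <symbol name="LEFT_PARENTHESIS" value="(" before-space="true" after-space="false" invalid-chars="（" />
</symbols>
```

Because after-space already defaults to false, it could be omitted here without changing the meaning; it is spelled out only to show every attribute once.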

Sample: Setting symbols

In the following example, the symbols section defines three symbols. The first element defines the exclamation mark as ‘!’. The second, FULL_STOP, defines the period as ‘.’ and specifies that the symbol must be followed by a space. The third element defines the comma as ‘,’, requires a space after it, and declares ‘、’ and ‘，’ to be invalid comma characters. This is useful because several characters can carry the same symbolic meaning.

For example, in Japanese both ‘．’ and ‘。’ can represent a FULL_STOP. The invalid-chars setting lets us restrict which of these alternatives are permitted in our documents.

<symbols>
    <symbol name="EXCLAMATION_MARK" value="!" />
    <symbol name="FULL_STOP" value="." after-space="true" />
    <symbol name="COMMA" value="," invalid-chars="、，" after-space="true" />
</symbols>

Default Settings for English

The following table shows the default symbol settings for English and other Latin-based documents. The first column contains the name of each symbol and the second column (Value) shows the symbol’s character value. The columns ‘NeedBeforeSpace’, ‘NeedAfterSpace’, and ‘InvalidChars’ indicate whether the symbol must be preceded by a space, whether it must be followed by a space, and which characters are invalid alternatives for it, respectively.

Symbol                       Value  NeedBeforeSpace  NeedAfterSpace  InvalidChars  Description
FULL_STOP                    ‘.’    false            true            ‘．’, ‘。’      Sentence period
SPACE                        ‘ ’    false            false           ‘　’           White space between words
EXCLAMATION_MARK             ‘!’    false            true            ‘！’           Exclamation mark
NUMBER_SIGN                  ‘#’    false            false           ‘＃’           Number sign
DOLLAR_SIGN                  ‘$’    false            false           ‘＄’           Dollar sign
PERCENT_SIGN                 ‘%’    false            false           ‘％’           Percent sign
QUESTION_MARK                ‘?’    false            true            ‘？’           Question mark
AMPERSAND                    ‘&’    false            true            ‘＆’           Ampersand
LEFT_PARENTHESIS             ‘(’    true             false           ‘（’           Left parenthesis
RIGHT_PARENTHESIS            ‘)’    false            true            ‘）’           Right parenthesis
ASTERISK                     ‘*’    false            false           ‘＊’           Asterisk
COMMA                        ‘,’    false            true            ‘、’, ‘，’      Comma
PLUS_SIGN                    ‘+’    false            false           ‘＋’           Plus sign
HYPHEN_SIGN                  ‘-’    false            false           ‘ー’           Hyphenation
SLASH                        ‘/’    false            false           ‘／’           Slash
COLON                        ‘:’    false            true            ‘：’           Colon
SEMICOLON                    ‘;’    false            true            ‘；’           Semicolon
LESS_THAN_SIGN               ‘<’    false            false           ‘＜’           Less than sign
GREATER_THAN_SIGN            ‘>’    false            false           ‘＞’           Greater than sign
EQUAL_SIGN                   ‘=’    false            false           ‘＝’           Equal sign
AT_MARK                      ‘@’    false            false           ‘＠’           At mark
LEFT_SQUARE_BRACKET          ‘[’    true             false                         Left square bracket
RIGHT_SQUARE_BRACKET         ‘]’    false            true                          Right square bracket
BACKSLASH                    ‘\’    false            false                         Backslash
CIRCUMFLEX_ACCENT            ‘^’    false            false           ‘＾’           Circumflex accent
LOW_LINE                     ‘_’    false            false           ‘＿’           Low line (under bar)
LEFT_CURLY_BRACKET           ‘{’    true             false           ‘｛’           Left curly bracket
RIGHT_CURLY_BRACKET          ‘}’    true             false           ‘｝’           Right curly bracket
VERTICAL_VAR                 ‘|’    false            false           ‘｜’           Vertical bar
TILDE                        ‘~’    false            false           ‘〜’           Tilde
LEFT_SINGLE_QUOTATION_MARK   ‘'’    false            false                         Left single quotation mark
RIGHT_SINGLE_QUOTATION_MARK  ‘'’    false            false                         Right single quotation mark
LEFT_DOUBLE_QUOTATION_MARK   ‘"’    false            false                         Left double quotation mark
RIGHT_DOUBLE_QUOTATION_MARK  ‘"’    false            false                         Right double quotation mark

These settings are used by several validators, such as InvalidSymbol and SpaceValidator. To change the symbol definitions used by these validators, add symbol elements to the symbols section of the RedPen configuration file.
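A sketch of such an override for English follows. It assumes we want to allow ‘&’ without a trailing space; that policy is an illustrative choice, not a recommendation. Because the configuration is XML, ‘&’ and ‘<’ must be written as the standard entities when they appear inside attribute values.

```xml
<redpen-conf lang="en">
    <validators>
        <validator name="InvalidSymbol" />
        <validator name="SpaceWithSymbol" />
    </validators>
    <symbols>
        <!-- "&" has to be escaped as an entity inside an XML attribute value -->
        <symbol name="AMPERSAND" value="&amp;" after-space="false" />
        <!-- "<" has to be escaped as well; here it is redefined to require a following space -->
        <symbol name="LESS_THAN_SIGN" value="&lt;" after-space="true" />
    </symbols>
</redpen-conf>
```

Since invalid-chars defaults to an empty list, an overriding element that should still reject certain characters needs to list them explicitly.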

Default Settings for Japanese

The following table shows the default symbol settings for Japanese documents. The first column contains the name of each symbol and the second column (Value) shows the symbol’s character value. The columns ‘NeedBeforeSpace’, ‘NeedAfterSpace’, and ‘InvalidChars’ indicate whether the symbol must be preceded by a space, whether it must be followed by a space, and which characters are invalid alternatives for it, respectively.

Symbol                       Value  NeedBeforeSpace  NeedAfterSpace  InvalidChars  Description
FULL_STOP                    ‘。’    false            false           ‘．’, ‘.’      Sentence period
SPACE                        ‘　’    false            false                         White space between words
EXCLAMATION_MARK             ‘！’    false            false           ‘!’           Exclamation mark
NUMBER_SIGN                  ‘＃’    false            false           ‘#’           Number sign
DOLLAR_SIGN                  ‘＄’    false            false           ‘$’           Dollar sign
PERCENT_SIGN                 ‘％’    false            false           ‘%’           Percent sign
QUESTION_MARK                ‘？’    false            false           ‘?’           Question mark
AMPERSAND                    ‘＆’    false            false           ‘&’           Ampersand
LEFT_PARENTHESIS             ‘（’    false            false           ‘(’           Left parenthesis
RIGHT_PARENTHESIS            ‘）’    false            false           ‘)’           Right parenthesis
ASTERISK                     ‘＊’    false            false           ‘*’           Asterisk
COMMA                        ‘、’    false            false           ‘，’, ‘,’      Comma
PLUS_SIGN                    ‘＋’    false            false           ‘+’           Plus sign
HYPHEN_SIGN                  ‘ー’    false            false           ‘-’           Hyphenation
SLASH                        ‘／’    false            false           ‘/’           Slash
COLON                        ‘：’    false            false           ‘:’           Colon
SEMICOLON                    ‘；’    false            false           ‘;’           Semicolon
LESS_THAN_SIGN               ‘＜’    false            false           ‘<’           Less than sign
GREATER_THAN_SIGN            ‘＞’    false            false           ‘>’           Greater than sign
EQUAL_SIGN                   ‘＝’    false            false           ‘=’           Equal sign
AT_MARK                      ‘＠’    false            false           ‘@’           At mark
LEFT_SQUARE_BRACKET          ‘「’    true             false                         Left square bracket
RIGHT_SQUARE_BRACKET         ‘」’    false            false                         Right square bracket
BACKSLASH                    ‘￥’    false            false                         Backslash
CIRCUMFLEX_ACCENT            ‘＾’    false            false           ‘^’           Circumflex accent
LOW_LINE                     ‘＿’    false            false           ‘_’           Low line (under bar)
LEFT_CURLY_BRACKET           ‘｛’    true             false           ‘{’           Left curly bracket
RIGHT_CURLY_BRACKET          ‘｝’    true             false           ‘}’           Right curly bracket
VERTICAL_VAR                 ‘｜’    false            false           ‘|’           Vertical bar
TILDE                        ‘〜’    false            false           ‘~’           Tilde
LEFT_SINGLE_QUOTATION_MARK   ‘‘’    false            false                         Left single quotation mark
RIGHT_SINGLE_QUOTATION_MARK  ‘’’    false            false                         Right single quotation mark
LEFT_DOUBLE_QUOTATION_MARK   ‘“’    false            false                         Left double quotation mark
RIGHT_DOUBLE_QUOTATION_MARK  ‘”’    false            false                         Right double quotation mark

These settings are used by several validators, such as InvalidSymbol and SpaceValidator. To change the symbol definitions used by these validators, add symbol elements to the symbols section of the RedPen configuration file.
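As a sketch of such an override for Japanese, the fragment below assumes a house style that writes parentheses in half-width form and treats the full-width forms as invalid. That style choice is hypothetical and serves only to show the mechanism.

```xml
<redpen-conf lang="ja">
    <validators>
        <validator name="InvalidSymbol" />
    </validators>
    <symbols>
        <!-- Accept half-width parentheses and flag the full-width forms -->
        <symbol name="LEFT_PARENTHESIS" value="(" invalid-chars="（" />
        <symbol name="RIGHT_PARENTHESIS" value=")" invalid-chars="）" />
    </symbols>
</redpen-conf>
```

Symbols that are not listed in the symbols section keep the Japanese defaults.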

Japanese Symbol Variations

Symbol usage in Japanese varies by author and writing group. RedPen therefore provides variations of the default Japanese symbol settings, which are selected with the type attribute. Currently there are two variations: “zenkaku2” and “hankaku”.

For example, the following is a sample configuration file for Japanese text with the “zenkaku2” setting.

<redpen-conf lang="ja" type="zenkaku2">
    <validators>
        <validator name="InvalidSymbol" />
        <validator name="SpaceWithSymbol" />
        <validator name="SectionLength" />
        <validator name="ParagraphNumber" />
    </validators>
</redpen-conf>

The “hankaku” symbols are the same as the “en” symbol settings. The “zenkaku2” symbols are almost the same as the normal “ja” settings, with the following exceptions.

Symbol     Value  NeedBeforeSpace  NeedAfterSpace  InvalidChars  Description
FULL_STOP  ‘．’    false            false           ‘.’, ‘。’      Sentence period
COMMA      ‘，’    false            false           ‘,’, ‘、’      Comma
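The “hankaku” variation is presumably selected in the same way as “zenkaku2”, through the type attribute. The following is a minimal sketch under that assumption; the choice of validators is illustrative.

```xml
<redpen-conf lang="ja" type="hankaku">
    <validators>
        <!-- These validators rely on the symbol settings selected by lang and type -->
        <validator name="InvalidSymbol" />
        <validator name="SpaceWithSymbol" />
    </validators>
</redpen-conf>
```

Individual symbols can still be adjusted on top of either variation by adding a symbols section, as described in Setting symbols.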