Configuration¶
RedPen has a configuration file, which has two blocks. One is for validators configuration and the other is for overriding characters and symbols for input documents.
Configuration file¶
RedPen has one configuration file, which does the all settings needed to work RedPen with input documents. The main configuration file is a xml file which has the root block “redpen-conf” and configuration block contains two sub blocks “validators” and “symbols”.
In order to fit the default validaotrs and character settings for the target language such as Japanese or English, we can specify lang in the redpen-conf attribute to override the default character settings.
The validators block specifies a setting file to add validators, and symbols block specifies the input language such as en, ja and the character setting file.
symbols block override the default symbol settings of target language.
The following is an example of main configuration file.
<redpen-conf lang="en">
<validators>
<validator name="SentenceLength">
<property name="max_len" value="200"/>
</validator>
<validator name="InvalidSymbol" />
<validator name="SpaceWithSymbol" />
<validator name="SectionLength">
<property name="max_num" value="2000"/>
</validator>
<validator name="ParagraphNumber" />
</validators>
<symbols>
<symbol name="EXCLAMATION_MARK" value="!" invalid-chars="!" after-space="true" />
<symbol name="LEFT_QUOTATION_MARK" value="\'" invalid-chars="“" before-space="true" />
</symbols>
</redpen-conf>
In the next section, we will see the configuration of validators. The character settings are described in the Setting symbols section.
Let’s go into the details of validator configuration.
Validator configuration¶
RedPen configuration file contains “validators” block for registrating Validators. If a user adds a validaor for one checking point into validator-conf.xml, then RedPen applies the added Validator to the input document.
The following is the sample validators block.
<validators>
<validator name="SentenceLength">
<property name="max_len" value="200"/>
</validator>
<validator name="InvalidSymbol" />
<validator name="SpaceWithSymbol" />
<validator name="SectionLength">
<property name="max_num" value="2000"/>
</validator>
<validator name="ParagraphNumber" />
</validators>
All configurations are surrounded by one “validator” block, which contains many inner component blocks. Each inner “component” block represents a validator, which checks one aspect of the input document. For instance, adding “SectionLength” component block into the configuration file, DocuemntValidator checks the length of sections in input documents.
As we see some components have “property” to configure the validator specific settings. For example, the “SectionLength” validator has maximum character number in one section. Some validator has sub-validators.
We will see the all the supported validators in the Supported Validators page.
Setting symbols¶
Default settings of symbols is provided by language, the target language is specified in the redpen-conf attribute, lang. RedPen supports defualt symbols for “en” and “ja”, which are described in English Default Setting and Japanese Default Setting.
To override defult setting of symbols defined for the target language, Users can add configure settings for characters and symbols with “symbols” block in the RedPen configuration file.
Default settings are described in the following sections. In the symbols configuration block, we add the symbols to use in the document. The symbols block has multiple symbol elements. “symbol” element overrides the character used in the written documents.
The following table is the properties of symbol element.
Property | Mandatory | Default Value | Description |
---|---|---|---|
name | true | none | Name of the symbol |
value | true | none | Value of the symbol |
before-space | false | false | Need space before the symbol |
after-space | false | false | Need space after the symbol |
invalid-chars | false | “” | List of invalid symbols |
Sample: Setting symbols¶
In the following setting, we can see that symbols has define 3 symbols. First element defines exlamation mark as ‘!’. Second element , FULL_STOP defines period as ”.” and in addition the sybmol need space after the period. Third element defines comma as ‘,’ and also define invalid symbols ‘、’ and ‘,’. Here invalid symbols represents the variations of the target symbol. For example, In japanese FULL_STOP can be not only ‘.’ but also ‘。’. If we registered invalid-chars, we can prevents the mixture usages of symbol variations.
<symbols>
<symbol name="EXCLAMATION_MARK" value="!" />
<symbol name="FULL_STOP" value="." after-space="true" />
<symbol name="COMMA" value="," invalid-chars="、," after-space="true" />
</symbols>
English Default Setting¶
The following table shows the default symbol settings for English and other latin based documents. In the table, first column shows the names of symbols, second colums (Value) shows the symbol character. Colums ‘NeedBeforeSpace’, ‘NeedAfterSpace’ ‘InvalidChars’ represent that the symbol should have space before or after the symbol and the invalid symbols respectively.
Symbol | Value | NeedBeforeSpace | NeedAfterSpace | InvalidChars | Description |
---|---|---|---|---|---|
FULL_STOP | ‘.’ | false | true | ‘.’, ‘。’ | Period of sentence |
SPACE | ‘ ‘ | false | false | ‘ ’ | White space between words |
EXCLAMATION_MARK | ‘!’ | false | true | ‘!’ | Exclamation mark |
NUMBER_SIGN | ‘#’ | false | false | ‘#’ | Number sign |
DOLLAR_SIGN | ‘$’ | false | false | ‘$’ | Dollar sign |
PERCENT_SIGN | ‘%’ | false | false | ‘%’ | Percent sign |
QUESTION_MARK | ‘?’ | false | true | ‘?’ | Question mark |
AMPERSAND | ‘&’ | false | true | ‘&’ | Ampersand |
LEFT_PARENTHESIS | ‘(‘ | true | false | ‘(’ | Left parenthesis |
RIGHT_PARENTHESIS | ‘)’ | false | true | ‘)’ | Right parenthesis |
ASTERISK | ‘*’ | false | false | ‘*’ | Asterrisk |
COMMA | ‘,’ | false | true | ‘、’,’,’ | Comma |
PLUS_SIGN | ‘+’ | false | false | ‘+’ | Plus sign |
HYPHEN_SIGN | ‘-‘ | false | false | ‘ー’ | Hyphenation |
SLASH | ‘/’ | false | false | ‘/’ | Slash |
COLON | ‘:’ | false | true | ‘:’ | Colon |
SEMICOLON | ‘;’ | false | true | ‘;’ | Semicolon |
LESS_THAN_SIGN | ‘<’ | false | false | ‘<’ | Less than sign |
GREATER_THAN_SIGN | ‘>’ | false | false | ‘>’ | Greater than sign |
EQUAL_SIGN | ‘=’ | false | false | ‘=’ | Equal sign |
AT_MARK | ‘@’ | false | false | ‘@’ | At mark |
LEFT_SQUARE_BRACKET | ‘[‘ | true | false | Left square bracket | |
RIGHT_SQUARE_BRACKET | ‘]’ | false | true | Right square bracket | |
BACKSLASH | ‘’ | false | false | Backslash | |
CIRCUMFLEX_ACCENT | ‘^’ | false | false | ‘^’ | Circumflex accent |
LOW_LINE | ‘_’ | false | false | ‘_’ | Low line (under bar) |
LEFT_CURLY_BRACKET | ‘{‘ | true | false | ‘{’ | Left curly bracket |
RIGHT_CURLY_BRACKET | ‘}’ | true | false | ‘}’ | Right curly bracket |
VERTICAL_VAR | ‘|’ | false | false | ‘|’ | Vertical bar |
TILDE | ‘~’ | false | false | ‘〜’ | Tilde |
LEFT_SINGLE_QUOTATION_MARK | ‘’‘ | false | false | Left single quotation mark | |
RIGHT_SINGLE_QUOTATION_MARK | ‘’‘ | false | false | Right single quotation mark | |
LEFT_DOUBLE_QUOTATION_MARK | ‘”’ | false | false | Left double quotation mark | |
RIGHT_DOUBLE_QUOTATION_MARK | ‘”’ | false | false | Right double quotation mark |
The symbol setting are made use of seveal Validators such as InvalidSymbol, and SpaceValidator. If users want to change the symbol configuration settings. Users can override the settings adding symbol element into the symbols block in the redpen configuration file.
Japanese Default Setting¶
The following table shows the default symbol settings for Japanese documents. In the table, first column shows the names of symbols, second colums (Value) shows the symbol. Colums ‘NeedBeforeSpace’, ‘NeedAfterSpace’ ‘InvalidChars’ represent that the symbol should have space before or after the symbol and the invalid symbols respectively.
Symbol | Value | NeedBeforeSpace | NeedAfterSpace | InvalidChars | Description |
---|---|---|---|---|---|
FULL_STOP | ‘。’ | false | false | ‘.’,’.’ | Period of sentence |
SPACE | ‘ ’ | false | false | White space between words | |
EXCLAMATION_MARK | ‘!’ | false | false | ‘!’ | Exclamation mark |
NUMBER_SIGN | ‘#’ | false | false | ‘#’ | Number sign |
DOLLAR_SIGN | ‘$’ | false | false | ‘$’ | Dollar sign |
PERCENT_SIGN | ‘%’ | false | false | ‘%’ | Percent sign |
QUESTION_MARK | ‘?’ | false | false | ‘?’ | Question mark |
AMPERSAND | ‘&’ | false | false | ‘&’ | Ampersand |
LEFT_PARENTHESIS | ‘(’ | false | false | ‘(‘ | Left parenthesis |
RIGHT_PARENTHESIS | ‘)’ | false | false | ‘)’ | Right parenthesis |
ASTERISK | ‘*’ | false | false | ‘*’ | Asterrisk |
COMMA | ‘、’ | false | false | ‘,’,’,’ | Comma |
PLUS_SIGN | ‘+’ | false | false | ‘+’ | Plus sign |
HYPHEN_SIGN | ‘ー’ | false | false | ‘-‘ | Hyphenation |
SLASH | ‘/’ | false | false | ‘/’ | Slash |
COLON | ‘:’ | false | false | ‘:’ | Colon |
SEMICOLON | ‘;’ | false | false | ‘;’ | Semicolon |
LESS_THAN_SIGN | ‘<’ | false | false | ‘<’ | Less than sign |
GREATER_THAN_SIGN | ‘>’ | false | false | ‘>’ | Greater than sign |
EQUAL_SIGN | ‘=’ | false | false | ‘=’ | Equal sign |
AT_MARK | ‘@’ | false | false | ‘@’ | At mark |
LEFT_SQUARE_BRACKET | ‘「’ | true | false | Left square bracket | |
RIGHT_SQUARE_BRACKET | ‘」’ | false | false | Right square bracket | |
BACKSLASH | ‘¥’ | false | false | Backslash | |
CIRCUMFLEX_ACCENT | ‘^’ | false | false | ‘^’ | Circumflex accent |
LOW_LINE | ‘_’ | false | false | ‘_’ | Low line (under bar) |
LEFT_CURLY_BRACKET | ‘{’ | true | false | ‘{‘ | Left curly bracket |
RIGHT_CURLY_BRACKET | ‘}’ | true | false | ‘}’ | Right curly bracket |
VERTICAL_VAR | ‘|’ | false | false | ‘|’ | Vertical bar |
TILDE | ‘〜’ | false | false | ‘~’ | Tilde |
LEFT_SINGLE_QUOTATION_MARK | ‘‘’ | false | false | Left single quotation mark | |
RIGHT_SINGLE_QUOTATION_MARK | ‘’’ | false | false | Right single quotation mark | |
LEFT_DOUBLE_QUOTATION_MARK | ‘“’ | false | false | Left double quotation mark | |
RIGHT_DOUBLE_QUOTATION_MARK | ‘”’ | false | false | Right double quotation mark |