RedPen Server¶

The RedPen server delivers the majority of RedPen’s functionality via a simple HTTP REST API.

Starting the RedPen server¶

Please refer to the Commands page for details on how to start the RedPen server.

RedPen Server API¶

Configuration¶

/rest/config/redpens

Return the configuration for available, preconfigured redpens.

GET Parameters:

lang=xx restricts the returned configurations to those that match the specified language. By default, all configurations are returned.

The JSON response is as follows:

{
  "version": "1.1.2",
  "documentParsers": ["PLAIN", "MARKDOWN", "WIKI"],
  "redpens": {
     "en": {
      "lang": "en",
      "tokenizer": "cc.redpen.tokenizer.WhiteSpaceTokenizer",
      "validators": {
         "CommaNumber": { "languages": [], "properties": {} },
         "Contraction": { "languages": ["en"], "properties": {} },
         "DoubledWord": { "languages": [], "properties": {} },
         "EndOfSentence": { "languages": ["en"], "properties": {} },
         "InvalidExpression": { "languages": [], "properties": {} },
         "InvalidSymbol": { "languages": [], "properties": {} },
         "InvalidWord": { "languages": ["en"], "properties": {} },
         "ParagraphNumber": { "languages": [], "properties": {} },
         "Quotation": { "languages": ["en"], "properties": {} },
         "SectionLength": { "languages": [], "properties": {"max_char_num": "2000"} },
         "SentenceLength": { "languages": [], "properties": {"max_len": "200"} },
         "SpaceBetweenAlphabeticalWord": { "languages": [], "properties": {} },
         "Spelling": { "languages": [], "properties": {} },
         "StartWithCapitalLetter": { "languages": ["en"], "properties": {} },
         "SuccessiveWord": { "languages": [], "properties": {} },
         "SymbolWithSpace": { "languages": [], "properties": {} },
         "WordNumber": { "languages": [], "properties": {} }
      }
    },
    "ja": {
      "lang": "ja",
      "tokenizer": "cc.redpen.tokenizer.JapaneseTokenizer",
      "validators": {
         "CommaNumber": { "languages": [], "properties": {} },
         "DoubledWord": { "languages": [], "properties": {} },
         "HankakuKana": { "languages": ["ja"], "properties": {} },
         "InvalidSymbol": { "languages": [], "properties": {} },
         "KatakanaEndHyphen": { "languages": ["ja"], "properties": {} },
         "KatakanaSpellCheck": { "languages": ["ja"], "properties": {} },
         "ParagraphNumber": { "languages": [], "properties": {} },
         "SectionLength": { "languages": [], "properties": {"max_num": "1500"} },
         "SentenceLength": { "languages": [], "properties": {"max_len": "100"} },
         "SpaceBetweenAlphabeticalWord": { "languages": [], "properties": {} },
         "SuccessiveWord": { "languages": [], "properties": {} }
      }
    }
  }
}

The version property indicates the version of RedPen.
The documentParsers array contains all supported document parsers
The redpens object shows the available pre-configured redpens and how they are configured. Within each object:
- lang specifies the language the redpen is designed for
- tokenizer specifies the tokenizer class used by the redpen
- validators shows which validators are configured within the redpen. This object is in a format suitable for the document/validate/json request below. For each validator:
  The languages array indicates which languages for which the validator is suitable. An empty array indicates all languages.
  
  The properties object specifies the currently configured properties for this validator, as described in Supported Validators

Document Validation¶

/document/validate

This POST request validates a document and returns the errors.

POST Parameters:

document contains the text of the document RedPen is to validate
documentParser specifies which parser should be used to parse the document. Valid options are:
- PLAIN
- MARKDOWN
- WIKI
lang specifies the language used to tokenize the document. Currently, values of ja (Japanese) and en (English/Whitespace) are supported.
The optional format field determines the format for the results. It can be one of json (the default), json2, plain, plain2 or xml.
The optional config field contains the contents of a RedPen XML configuration file

Examples using curl and document/validate

$ curl --data document="Twas brillig and the slithy toves did gyre and gimble in the wabe" \
     --data lang=en --data format=PLAIN2 \
     --data config="`cat ./redpen-server/target/classes/conf/redpen-conf.xml`" \
     localhost:8080/rest/document/validate/
Line: 1, Offset: 0
    Sentence: Twas brillig and the slithy toves did gyre and gimble in the wabe
        Spelling: Found possibly misspelled word "brillig".
        Spelling: Found possibly misspelled word "slithy".
        Spelling: Found possibly misspelled word "toves".
        Spelling: Found possibly misspelled word "gyre".
        Spelling: Found possibly misspelled word "gimble".
        Spelling: Found possibly misspelled word "wabe".
        DoubledWord: Found repeated word "and".

$ curl -s --data document="古池や,蛙飛び込む水の音" \
          --data config="`cat ./redpen-server/target/classes/conf/redpen-conf-ja.xml`" \
          localhost:8080/rest/document/validate/ | json_reformat
{
    "errors": [
        {
            "sentence": "古池や,蛙飛び込む水の音",
            "endPosition": {
                "offset": 4,
                "lineNum": 1
            },
            "validator": "InvalidSymbol",
            "lineNum": 1,
            "sentenceStartColumnNum": 0,
            "message": "Found invalid symbol \",\".",
            "startPosition": {
                "offset": 3,
                "lineNum": 1
            }
        }
    ]
}

/document/validate/json

This POST request processes a redpen validation request, specified in JSON, and returns redpen errors in a supported RedPen format.

Request format:

{
  "document": "Theyre is a blak rownd borl.",
  "format": "json2",
  "documentParser": "PLAIN",
  "config": {
    "lang": "en",
    "validators": {
      "CommaNumber": {},
      "Contraction": {},
      "DoubledWord": {},
      "EndOfSentence": {},
      "InvalidExpression": {},
      "InvalidSymbol": {},
      "InvalidWord": {},
      "ParagraphNumber": {},
      "Quotation": {},
      "SectionLength": {
        "properties": {
          "max_char_num": "2000"
        }
      },
      "SentenceLength": {
        "properties": {
          "max_len": "200"
        }
      },
      "SpaceBetweenAlphabeticalWord": {},
      "Spelling": {},
      "StartWithCapitalLetter": {},
      "SuccessiveWord": {},
      "SymbolWithSpace": {},
      "WordNumber": {}
    },
    "symbols": {
      "AMPERSAND": {
        "after_space": false,
        "before_space": true,
        "invalid_chars": "＆",
        "value": "&"
      },
      "ASTERISK": {
        "after_space": true,
        "before_space": true,
        "invalid_chars": "＊",
        "value": "*"
      }
    }
  }
}

The document property specifies the text of the document to validate
The documentParser property should contain the name of a valid RedPen documentparser (ie: PLAIN, MARKDOWN or WIKI)
The format property determines the format for the results. It can be one of json, json2, plain, plain2 or xml.
The config object specifies the validator configuration for the request. This consists of:
- A config object, consisting of a series of objects that are named after a RedPen validator. If the object is present, the validator will be configured. Within this named object, a properties object can be used to set the name and values of any property used by the validator, as described in Supported Validators
- The lang property indicates the language of the document. It determines how the document will be tokenized by RedPen.
- A symbols object containing overridden symbols, as described in Configuration. Each entry must be a validate symbol name, and can contain the following elements:
  
  value specifies the Symbol’s value
  
  invalid_chars is a string of invalid alternatives for this Symbol
  
  before_space and after_space specify if a space is required before or after the Symbol.

Response (json2 format):

{
  "errors": [
    {
      "sentence": "Theyre is a blak rownd borl.",
      "position": {
        "start": {
          "offset": 0,
          "line": 1
        },
        "end": {
          "offset": 27,
          "line": 1
        }
      },
      "errors": [
        {
          "subsentence": {
            "offset": 0,
            "length": 6
          },
          "validator": "Spelling",
          "position": {
            "start": {
              "offset": 0,
              "line": 1
            },
            "end": {
              "offset": 6,
              "line": 1
            }
          },
          "message": "Found possibly misspelled word \"Theyre\"."
        },
        {
          "subsentence": {
            "offset": 12,
            "length": 4
          },
          "validator": "Spelling",
          "position": {
            "start": {
              "offset": 12,
              "line": 1
            },
            "end": {
              "offset": 16,
              "line": 1
            }
          },
          "message": "Found possibly misspelled word \"blak\"."
        },
        {
          "subsentence": {
            "offset": 17,
            "length": 5
          },
          "validator": "Spelling",
          "position": {
            "start": {
              "offset": 17,
              "line": 1
            },
            "end": {
              "offset": 22,
              "line": 1
            }
          },
          "message": "Found possibly misspelled word \"rownd\"."
        },
        {
          "subsentence": {
            "offset": 23,
            "length": 4
          },
          "validator": "Spelling",
          "position": {
            "start": {
              "offset": 23,
              "line": 1
            },
            "end": {
              "offset": 27,
              "line": 1
            }
          },
          "message": "Found possibly misspelled word \"borl\"."
        }
      ]
    }
  ]
}

Some examples using curl and document/validate/json

$ curl -s --data "document=fish and chips" http://localhost:8080/rest/document/validate | json_reformat
{
    "errors": [
        {
            "sentence": "fish and chips",
            "validator": "StartWithCapitalLetter",
            "lineNum": 1,
            "sentenceStartColumnNum": 0,
            "message": "Sentence starts with a lowercase character \"f\"."
        }
    ]
}

$ curl -s --data "document=ここはどこでうか?&lang=ja&" http://localhost:8080/rest/document/validate | json_reformat
{
    "errors": [
        {
            "sentence": "ここはどこでうか?",
            "endPosition": {
                "offset": 9,
                "lineNum": 1
            },
            "validator": "InvalidSymbol",
            "lineNum": 1,
            "sentenceStartColumnNum": 0,
            "message": "Found invalid symbol \"?\".",
            "startPosition": {
                "offset": 8,
                "lineNum": 1
            }
        }
    ]
}

$ curl -s --data "document=# Markdown Test%0A%0ASpellink Errah&lang=en&documentParser=MARKDOWN" http://localhost:8080/rest/document/validate | json_reformat
{
    "errors": [
        {
            "sentence": "Spellink Errah",
            "endPosition": {
                "offset": 8,
                "lineNum": 3
            },
            "validator": "Spelling",
            "lineNum": 3,
            "sentenceStartColumnNum": 0,
            "message": "Found possibly misspelled word \"Spellink\".",
            "startPosition": {
                "offset": 0,
                "lineNum": 3
            }
        },
        {
            "sentence": "Spellink Errah",
            "endPosition": {
                "offset": 14,
                "lineNum": 3
            },
            "validator": "Spelling",
            "lineNum": 3,
            "sentenceStartColumnNum": 0,
            "message": "Found possibly misspelled word \"Errah\".",
            "startPosition": {
                "offset": 9,
                "lineNum": 3
            }
        }
    ]
}

curl -s -H "Content-Type: application/json" \
     --data '{document:"fisch and chipps",format:"plain",config:{Spelling:{},SentenceLength:{properties:{max_len:6}}}}' \
     http://localhost:8080/rest/document/validate/json
1: ValidationError[Spelling], Found possibly misspelled word "fisch". at line: fisch and chipps
1: ValidationError[Spelling], Found possibly misspelled word "chipps". at line: fisch and chipps
1: ValidationError[SentenceLength], The length of the sentence (16) exceeds the maximum of 6. at line: fisch and chipps

RedPen Server¶

Starting the RedPen server¶

RedPen Server API¶

Configuration¶

Document Validation¶

Table Of Contents

Search