Added in API level 24

BreakIterator

public abstract class BreakIterator
extends Object implements Cloneable

java.lang.Object
   ↳ android.icu.text.BreakIterator


[icu enhancement] ICU's replacement for BreakIterator. Methods, fields, and other functionality specific to ICU are labeled "[icu]".

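A class that locates boundaries in text. This class defines a protocol for objects that break up a piece of natural-language text according to a set of criteria. Instances or subclasses of BreakIterator can be provided, for example, to break a piece of text into words, sentences, or logical characters according to the conventions of some language or group of languages. We provide five built-in types of BreakIterator, one per factory method: getCharacterInstance() locates logical-character boundaries, getWordInstance() locates word boundaries, getLineInstance() locates legal line-wrapping positions, getSentenceInstance() locates sentence boundaries, and getTitleInstance() locates title boundaries.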

The text boundary positions are found according to the rules described in Unicode Standard Annex #29, Text Boundaries, and Unicode Standard Annex #14, Line Breaking Properties. These are available at http://www.unicode.org/reports/tr14/ and http://www.unicode.org/reports/tr29/.

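BreakIterator's interface follows an "iterator" model (hence the name), meaning it has a concept of a "current position" and methods such as first(), last(), next(), and previous() that update the current position. All BreakIterators uphold the following invariants: the beginning and end of the text are always treated as boundary positions; the current position of the iterator is always a boundary position (random-access methods such as following() and preceding() move the iterator to the nearest boundary before or after the specified position, not to the specified position itself); and DONE is returned only when next() is called at the end of the text or previous() is called at the beginning of the text.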

BreakIterator accesses the text it analyzes through a CharacterIterator, which makes it possible to use BreakIterator to analyze text in any text-storage vehicle that provides a CharacterIterator interface. Note: Some types of BreakIterator can take a long time to create, and instances of BreakIterator are not currently cached by the system. For optimal performance, keep instances of BreakIterator around as long as makes sense. For example, when word-wrapping a document, don't create and destroy a new BreakIterator for each line. Create one break iterator for the whole document (or whatever stretch of text you're wrapping) and use it to do the whole job of wrapping the text.
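The advice about keeping one iterator for a whole wrapping job can be sketched as follows. This is a minimal illustration, not the platform's own wrapping code: the class LineWrapSketch and the method wrap are hypothetical names, width is counted in UTF-16 code units (trailing spaces included), and the sketch imports java.text.BreakIterator so that it runs on a plain JVM. On Android the same calls exist on android.icu.text.BreakIterator, although break positions may differ for some text.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: greedy wrapping with a single line-break iterator.
final class LineWrapSketch {
    static List<String> wrap(String text, int width) {
        // One iterator is created for the whole text, not one per output line.
        BreakIterator lines = BreakIterator.getLineInstance(Locale.US);
        lines.setText(text);

        List<String> out = new ArrayList<>();
        int lineStart = lines.first();
        int lastBreak = lineStart;
        for (int b = lines.next(); b != BreakIterator.DONE; b = lines.next()) {
            // If extending to b would exceed the width, end the line at the
            // previous legal break (when there is one after lineStart).
            if (b - lineStart > width && lastBreak > lineStart) {
                out.add(text.substring(lineStart, lastBreak));
                lineStart = lastBreak;
            }
            lastBreak = b;
        }
        if (lineStart < text.length()) {
            out.add(text.substring(lineStart));
        }
        return out;
    }
}
```

A segment that is longer than width on its own is left unbroken on its own line; a real layout engine would measure rendered widths instead of counting code units.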

Examples

Creating and using text boundaries

 public static void main(String args[]) {
      if (args.length == 1) {
          String stringToExamine = args[0];
          //print each word in order
          BreakIterator boundary = BreakIterator.getWordInstance();
          boundary.setText(stringToExamine);
          printEachForward(boundary, stringToExamine);
          //print each sentence in reverse order
          boundary = BreakIterator.getSentenceInstance(Locale.US);
          boundary.setText(stringToExamine);
          printEachBackward(boundary, stringToExamine);
          printFirst(boundary, stringToExamine);
          printLast(boundary, stringToExamine);
      }
 }
 
Print each element in order
 public static void printEachForward(BreakIterator boundary, String source) {
     int start = boundary.first();
     for (int end = boundary.next();
          end != BreakIterator.DONE;
          start = end, end = boundary.next()) {
          System.out.println(source.substring(start,end));
     }
 }
 
Print each element in reverse order
 public static void printEachBackward(BreakIterator boundary, String source) {
     int end = boundary.last();
     for (int start = boundary.previous();
          start != BreakIterator.DONE;
          end = start, start = boundary.previous()) {
         System.out.println(source.substring(start,end));
     }
 }
 
Print first element
 public static void printFirst(BreakIterator boundary, String source) {
     int start = boundary.first();
     int end = boundary.next();
     System.out.println(source.substring(start,end));
 }
 
Print last element
 public static void printLast(BreakIterator boundary, String source) {
     int end = boundary.last();
     int start = boundary.previous();
     System.out.println(source.substring(start,end));
 }
 
Print the element at a specified position
 public static void printAt(BreakIterator boundary, int pos, String source) {
     int end = boundary.following(pos);
     int start = boundary.previous();
     System.out.println(source.substring(start,end));
 }
 
Find the next word
 public static int nextWordStartAfter(int pos, String text) {
     BreakIterator wb = BreakIterator.getWordInstance();
     wb.setText(text);
     int last = wb.following(pos);
     int current = wb.next();
     while (current != BreakIterator.DONE) {
         for (int p = last; p < current; p++) {
             if (Character.isLetter(text.charAt(p)))
                 return last;
         }
         last = current;
         current = wb.next();
     }
     return BreakIterator.DONE;
 }
 
(The iterator returned by BreakIterator.getWordInstance() is unique in that the break positions it returns don't represent both the start and end of the thing being iterated over. That is, a sentence-break iterator returns breaks that each represent the end of one sentence and the beginning of the next. With the word-break iterator, the characters between two boundaries might be a word, or they might be the punctuation or whitespace between two words. The above code uses a simple heuristic to determine which boundary is the beginning of a word: If the characters between this boundary and the next boundary include at least one letter (this can be an alphabetical letter, a CJK ideograph, a Hangul syllable, a Kana character, etc.), then the text between this boundary and the next is a word; otherwise, it's the material between words.)
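The letter heuristic described above can also be used to collect the words themselves. The sketch below is one possible way to do that, under stated assumptions: WordListSketch and words are hypothetical names, and java.text.BreakIterator is imported so the code runs on a plain JVM (the calls used here have the same names on android.icu.text.BreakIterator).

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: keep only the segments that contain at least one letter.
final class WordListSketch {
    static List<String> words(String text) {
        BreakIterator wb = BreakIterator.getWordInstance(Locale.US);
        wb.setText(text);

        List<String> out = new ArrayList<>();
        int start = wb.first();
        for (int end = wb.next(); end != BreakIterator.DONE; start = end, end = wb.next()) {
            String segment = text.substring(start, end);
            // Segments without any letter are whitespace, punctuation, or digits.
            if (segment.codePoints().anyMatch(Character::isLetter)) {
                out.add(segment);
            }
        }
        return out;
    }
}
```

Because the test is "contains a letter", purely numeric segments such as "42" are dropped; switch to Character::isLetterOrDigit if numbers should count as words.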

See also:

Summary

Constants

int DONE

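DONE is returned by previous() and next() after all valid boundaries have been returned.

int KIND_CHARACTER

[icu]

int KIND_LINE

[icu]

int KIND_SENTENCE

[icu]

int KIND_TITLE

[icu]

int KIND_WORD

[icu]

int WORD_IDEO

Tag value for words containing ideographic characters, lower limit.

int WORD_IDEO_LIMIT

Tag value for words containing ideographic characters, upper limit.

int WORD_KANA

Tag value for words containing kana characters, lower limit.

int WORD_KANA_LIMIT

Tag value for words containing kana characters, upper limit.

int WORD_LETTER

Tag value for words that contain letters, excluding hiragana, katakana, or ideographic characters, lower limit.

int WORD_LETTER_LIMIT

Tag value for words containing letters, upper limit.

int WORD_NONE

Tag value for "words" that do not fit into any other category.

int WORD_NONE_LIMIT

Upper bound for tags for uncategorized words.

int WORD_NUMBER

Tag value for words that appear to be numbers, lower limit.

int WORD_NUMBER_LIMIT

Tag value for words that appear to be numbers, upper limit.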

Protected constructors

BreakIterator()

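Default constructor.

Public methods

Object clone()

Clone method.

abstract int current()

Returns the iterator's current position.

abstract int first()

Sets the iterator to the first boundary position.

abstract int following(int offset)

Sets the iterator's current iteration position to the first boundary position following the specified position.

static Locale[] getAvailableLocales()

Returns a list of locales for which BreakIterators can be used.

static BreakIterator getCharacterInstance(ULocale where)

[icu] Returns a new instance of BreakIterator that locates logical-character boundaries.

static BreakIterator getCharacterInstance(Locale where)

Returns a new instance of BreakIterator that locates logical-character boundaries.

static BreakIterator getCharacterInstance()

Returns a new instance of BreakIterator that locates logical-character boundaries.

static BreakIterator getLineInstance(Locale where)

Returns a new instance of BreakIterator that locates legal line-wrapping positions.

static BreakIterator getLineInstance(ULocale where)

[icu] Returns a new instance of BreakIterator that locates legal line-wrapping positions.

static BreakIterator getLineInstance()

Returns a new instance of BreakIterator that locates legal line-wrapping positions.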

int getRuleStatus()

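For RuleBasedBreakIterators, returns the status tag from the break rule that determined the most recently returned break position.

int getRuleStatusVec(int[] fillInArray)

For RuleBasedBreakIterators, gets the status (tag) values from the break rule(s) that determined the most recently returned break position.

static BreakIterator getSentenceInstance(Locale where)

Returns a new instance of BreakIterator that locates sentence boundaries.

static BreakIterator getSentenceInstance(ULocale where)

[icu] Returns a new instance of BreakIterator that locates sentence boundaries.

static BreakIterator getSentenceInstance()

Returns a new instance of BreakIterator that locates sentence boundaries.

abstract CharacterIterator getText()

Returns a CharacterIterator over the text being analyzed.

static BreakIterator getTitleInstance(Locale where)

[icu] Returns a new instance of BreakIterator that locates title boundaries.

static BreakIterator getTitleInstance(ULocale where)

[icu] Returns a new instance of BreakIterator that locates title boundaries.

static BreakIterator getTitleInstance()

[icu] Returns a new instance of BreakIterator that locates title boundaries.

static BreakIterator getWordInstance(Locale where)

Returns a new instance of BreakIterator that locates word boundaries.

static BreakIterator getWordInstance(ULocale where)

[icu] Returns a new instance of BreakIterator that locates word boundaries.

static BreakIterator getWordInstance()

Returns a new instance of BreakIterator that locates word boundaries.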

boolean isBoundary(int offset)

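Returns true if the specified position is a boundary position.

abstract int last()

Sets the iterator to the last boundary position.

abstract int next()

Advances the iterator forward one boundary.

abstract int next(int n)

Moves the iterator by the specified number of steps in the text.

int preceding(int offset)

Sets the iterator's current iteration position to the last boundary position before the specified position.

abstract int previous()

Moves the iterator backward one boundary.

void setText(String newText)

Sets the iterator to analyze a new piece of text.

abstract void setText(CharacterIterator newText)

Sets the iterator to analyze a new piece of text.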

Inherited methods

From class java.lang.Object

Constants

DONE

Added in API level 24
int DONE

DONE is returned by previous() and next() after all valid boundaries have been returned.

Constant Value: -1 (0xffffffff)

KIND_CHARACTER

Added in API level 24
int KIND_CHARACTER

[icu]

Constant Value: 0 (0x00000000)

KIND_LINE

Added in API level 24
int KIND_LINE

[icu]

Constant Value: 2 (0x00000002)

KIND_SENTENCE

Added in API level 24
int KIND_SENTENCE

[icu]

Constant Value: 3 (0x00000003)

KIND_TITLE

Added in API level 24
int KIND_TITLE

[icu]

Constant Value: 4 (0x00000004)

KIND_WORD

Added in API level 24
int KIND_WORD

[icu]

Constant Value: 1 (0x00000001)

WORD_IDEO

Added in API level 24
int WORD_IDEO

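Tag value for words containing ideographic characters, lower limit.

Constant Value: 400 (0x00000190)

WORD_IDEO_LIMIT

Added in API level 24
int WORD_IDEO_LIMIT

Tag value for words containing ideographic characters, upper limit.

Constant Value: 500 (0x000001f4)

WORD_KANA

Added in API level 24
int WORD_KANA

Tag value for words containing kana characters, lower limit.

Constant Value: 300 (0x0000012c)

WORD_KANA_LIMIT

Added in API level 24
int WORD_KANA_LIMIT

Tag value for words containing kana characters, upper limit.

Constant Value: 400 (0x00000190)

WORD_LETTER

Added in API level 24
int WORD_LETTER

Tag value for words that contain letters, excluding hiragana, katakana, or ideographic characters, lower limit.

Constant Value: 200 (0x000000c8)

WORD_LETTER_LIMIT

Added in API level 24
int WORD_LETTER_LIMIT

Tag value for words containing letters, upper limit.

Constant Value: 300 (0x0000012c)

WORD_NONE

Added in API level 24
int WORD_NONE

Tag value for "words" that do not fit into any other category. Includes spaces and most punctuation.

Constant Value: 0 (0x00000000)

WORD_NONE_LIMIT

Added in API level 24
int WORD_NONE_LIMIT

Upper bound for tags for uncategorized words.

Constant Value: 100 (0x00000064)

WORD_NUMBER

Added in API level 24
int WORD_NUMBER

Tag value for words that appear to be numbers, lower limit.

Constant Value: 100 (0x00000064)

WORD_NUMBER_LIMIT

Added in API level 24
int WORD_NUMBER_LIMIT

Tag value for words that appear to be numbers, upper limit.

Constant Value: 200 (0x000000c8)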

Protected constructors

BreakIterator

Added in API level 24
BreakIterator ()

Default constructor. This abstract base class carries no state.

Public methods

clone

Added in API level 24
Object clone ()

Clone method. Creates another BreakIterator with the same behavior and current state as this one.

Returns
Object The clone.

current

Added in API level 24
int current ()

Returns the iterator's current position.

Returns
int The iterator's current position.

first

Added in API level 24
int first ()

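Sets the iterator to the first boundary position. This is always the beginning index of the text this iterator iterates over. For example, if the iterator iterates over a whole string, this function will always return 0.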

Returns
int The character offset of the beginning of the stretch of text being broken.

following

Added in API level 24
int following (int offset)

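Sets the iterator's current iteration position to the first boundary position following the specified position. (Whether the specified position is itself a boundary position does not matter; this function always moves the iteration position to the first boundary after the specified position.) If the specified position is the past-the-end position, returns DONE.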

Parameters
offset int: The character position to start searching from.
Returns
int The position of the first boundary position following "offset" (whether or not "offset" itself is a boundary position), or DONE if "offset" is the past-the-end offset.

getAvailableLocales

Added in API level 24
Locale[] getAvailableLocales ()

Returns a list of locales for which BreakIterators can be used.

Returns
Locale[] An array of Locales. All of the locales in the array can be used when creating a BreakIterator.

getCharacterInstance

Added in API level 24
BreakIterator getCharacterInstance (ULocale where)

[icu] Returns a new instance of BreakIterator that locates logical-character boundaries.

Parameters
where ULocale: A Locale specifying the language of the text being analyzed.
Returns
BreakIterator A new instance of BreakIterator that locates logical-character boundaries.
Throws
NullPointerException if where is null.

getCharacterInstance

Added in API level 24
BreakIterator getCharacterInstance (Locale where)

Returns a new instance of BreakIterator that locates logical-character boundaries.

Parameters
where Locale: A Locale specifying the language of the text being analyzed.
Returns
BreakIterator A new instance of BreakIterator that locates logical-character boundaries.
Throws
NullPointerException if where is null.

getCharacterInstance

Added in API level 24
BreakIterator getCharacterInstance ()

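Returns a new instance of BreakIterator that locates logical-character boundaries. This function assumes that the text being analyzed is in the default locale's language.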

Returns
BreakIterator A new instance of BreakIterator that locates logical-character boundaries.

getLineInstance

Added in API level 24
BreakIterator getLineInstance (Locale where)

Returns a new instance of BreakIterator that locates legal line-wrapping positions.

Parameters
where Locale: A Locale specifying the language of the text being broken.
Returns
BreakIterator A new instance of BreakIterator that locates legal line-wrapping positions.
Throws
NullPointerException if where is null.

getLineInstance

Added in API level 24
BreakIterator getLineInstance (ULocale where)

[icu] Returns a new instance of BreakIterator that locates legal line-wrapping positions.

Parameters
where ULocale: A Locale specifying the language of the text being broken.
Returns
BreakIterator A new instance of BreakIterator that locates legal line-wrapping positions.
Throws
NullPointerException if where is null.

getLineInstance

Added in API level 24
BreakIterator getLineInstance ()

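Returns a new instance of BreakIterator that locates legal line-wrapping positions. This function assumes the text being broken is in the default locale's language.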

Returns
BreakIterator A new instance of BreakIterator that locates legal line-wrapping positions.

getRuleStatus

Added in API level 24
int getRuleStatus ()

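For RuleBasedBreakIterators, returns the status tag from the break rule that determined the most recently returned break position.

For break iterator types that do not support a rule status, a default value of 0 is returned.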

Returns
int The status from the break rule that determined the most recently returned break position.
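The WORD_* constants on this page come in lower-limit and upper-limit pairs, which suggests testing a status tag against half-open ranges. The sketch below shows that range check in isolation. It is an illustration under assumptions: RuleStatusTags and describe are hypothetical names, the constant values are copied from the "Constant Value" entries on this page so that the code runs without the Android class, and each range is assumed to include its lower limit and exclude its upper limit.

```java
// Hypothetical helper: map a rule status tag to a word category.
final class RuleStatusTags {
    // Values copied from the constant listings on this page.
    static final int WORD_NONE = 0, WORD_NONE_LIMIT = 100;
    static final int WORD_NUMBER = 100, WORD_NUMBER_LIMIT = 200;
    static final int WORD_LETTER = 200, WORD_LETTER_LIMIT = 300;
    static final int WORD_KANA = 300, WORD_KANA_LIMIT = 400;
    static final int WORD_IDEO = 400, WORD_IDEO_LIMIT = 500;

    // Assumption: each category covers [lower limit, upper limit).
    static String describe(int tag) {
        if (tag >= WORD_NONE && tag < WORD_NONE_LIMIT) return "none";
        if (tag >= WORD_NUMBER && tag < WORD_NUMBER_LIMIT) return "number";
        if (tag >= WORD_LETTER && tag < WORD_LETTER_LIMIT) return "letter";
        if (tag >= WORD_KANA && tag < WORD_KANA_LIMIT) return "kana";
        if (tag >= WORD_IDEO && tag < WORD_IDEO_LIMIT) return "ideo";
        return "unknown";
    }
}
```

On Android, the tag would typically come from calling getRuleStatus() on a word iterator right after next() returns a boundary, and the comparisons would use the BreakIterator.WORD_* constants directly.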

getRuleStatusVec

Added in API level 24
int getRuleStatusVec (int[] fillInArray)

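For RuleBasedBreakIterators, gets the status (tag) values from the break rule(s) that determined the most recently returned break position.

For break iterator types that do not support rule status, no values are returned.

If the size of the output array is insufficient to hold the data, the output will be truncated to the available length. No exception is thrown.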

Parameters
fillInArray int[]: An array to be filled in with the status values.
Returns
int The number of rule status values from rules that determined the most recent boundary returned by the break iterator. In the event that the array is too small, the return value is the total number of status values that were available, not the reduced number that were actually returned.

getSentenceInstance

Added in API level 24
BreakIterator getSentenceInstance (Locale where)

Returns a new instance of BreakIterator that locates sentence boundaries.

Parameters
where Locale: A Locale specifying the language of the text being analyzed.
Returns
BreakIterator A new instance of BreakIterator that locates sentence boundaries.
Throws
NullPointerException if where is null.

getSentenceInstance

Added in API level 24
BreakIterator getSentenceInstance (ULocale where)

[icu] Returns a new instance of BreakIterator that locates sentence boundaries.

Parameters
where ULocale: A Locale specifying the language of the text being analyzed.
Returns
BreakIterator A new instance of BreakIterator that locates sentence boundaries.
Throws
NullPointerException if where is null.

getSentenceInstance

Added in API level 24
BreakIterator getSentenceInstance ()

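Returns a new instance of BreakIterator that locates sentence boundaries. This function assumes the text being analyzed is in the default locale's language.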

Returns
BreakIterator A new instance of BreakIterator that locates sentence boundaries.

getText

Added in API level 24
CharacterIterator getText ()

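Returns a CharacterIterator over the text being analyzed. For at least some subclasses of BreakIterator, this is a reference to the actual iterator being used by the BreakIterator, so this function's return value should be treated as const. No guarantees are made about the current position of this iterator when it is returned. If you need to move that position to examine the text, clone this function's return value first.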

Returns
CharacterIterator A CharacterIterator over the text being analyzed.

getTitleInstance

Added in API level 24
BreakIterator getTitleInstance (Locale where)

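[icu] Returns a new instance of BreakIterator that locates title boundaries. The iterator returned locates title boundaries as described for Unicode 3.2 only. For title boundary iteration under Unicode 4.0 and above, use a word boundary iterator instead; see getWordInstance().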

Parameters
where Locale: A Locale specifying the language of the text being analyzed.
Returns
BreakIterator A new instance of BreakIterator that locates title boundaries.
Throws
NullPointerException if where is null.

getTitleInstance

Added in API level 24
BreakIterator getTitleInstance (ULocale where)

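[icu] Returns a new instance of BreakIterator that locates title boundaries. The iterator returned locates title boundaries as described for Unicode 3.2 only. For title boundary iteration under Unicode 4.0 and above, use a word boundary iterator instead; see getWordInstance().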

Parameters
where ULocale: A Locale specifying the language of the text being analyzed.
Returns
BreakIterator A new instance of BreakIterator that locates title boundaries.
Throws
NullPointerException if where is null.

getTitleInstance

Added in API level 24
BreakIterator getTitleInstance ()

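[icu] Returns a new instance of BreakIterator that locates title boundaries. This function assumes the text being analyzed is in the default locale's language. The iterator returned locates title boundaries as described for Unicode 3.2 only. For title boundary iteration under Unicode 4.0 and above, use a word boundary iterator instead; see getWordInstance().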

Returns
BreakIterator A new instance of BreakIterator that locates title boundaries.

getWordInstance

Added in API level 24
BreakIterator getWordInstance (Locale where)

Returns a new instance of BreakIterator that locates word boundaries.

Parameters
where Locale: A locale specifying the language of the text to be analyzed.
Returns
BreakIterator An instance of BreakIterator that locates word boundaries.
Throws
NullPointerException if where is null.

getWordInstance

Added in API level 24
BreakIterator getWordInstance (ULocale where)

[icu] Returns a new instance of BreakIterator that locates word boundaries.

Parameters
where ULocale: A locale specifying the language of the text to be analyzed.
Returns
BreakIterator An instance of BreakIterator that locates word boundaries.
Throws
NullPointerException if where is null.

getWordInstance

Added in API level 24
BreakIterator getWordInstance ()

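Returns a new instance of BreakIterator that locates word boundaries. This function assumes the text being analyzed is in the default locale's language.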

Returns
BreakIterator An instance of BreakIterator that locates word boundaries.

isBoundary

Added in API level 24
boolean isBoundary (int offset)

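Returns true if the specified position is a boundary position. If the function returns true, the current iteration position is set to the specified position; if the function returns false, the current iteration position is set as though following() had been called.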

Parameters
offset int: the offset to check.
Returns
boolean True if "offset" is a boundary position.
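The side effect described for isBoundary() makes it convenient for snapping an arbitrary offset forward to a boundary. The following is a minimal sketch of that idea, with assumptions stated: SnapToBoundarySketch and snapForward are hypothetical names, the offset is assumed to lie within the text, and java.text.BreakIterator is imported so the code runs on a plain JVM (isBoundary() and current() have the same names on android.icu.text.BreakIterator).

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical helper: move an offset forward to the nearest word boundary.
final class SnapToBoundarySketch {
    static int snapForward(String text, int offset) {
        BreakIterator wb = BreakIterator.getWordInstance(Locale.US);
        wb.setText(text);
        if (wb.isBoundary(offset)) {
            // Already on a boundary: nothing to adjust.
            return offset;
        }
        // isBoundary() returned false, so the iterator now sits on the first
        // boundary after offset, as if following(offset) had been called.
        return wb.current();
    }
}
```

For the text "The quick fox", an offset inside "quick" snaps to the end of that word, while an offset that is already a boundary is returned unchanged.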

last

Added in API level 24
int last ()

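Sets the iterator to the last boundary position. This is always the "past-the-end" index of the text this iterator iterates over. For example, if the iterator iterates over an entire string (call it "text"), this function will always return text.length().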

Returns
int The character offset of the end of the stretch of text being broken.

next

Added in API level 24
int next ()

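Advances the iterator forward one boundary. The current iteration position is updated to point to the next boundary position after the current position, and this is also the value that is returned. If the current position is equal to the value returned by last(), or to DONE, this function returns DONE and sets the current position to DONE.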

Returns
int The position of the first boundary position following the iteration position.

next

Added in API level 24
int next (int n)

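Moves the iterator by the specified number of steps in the text. A positive number moves the iterator forward; a negative number moves the iterator backward. If this causes the iterator to move off either end of the text, this function returns DONE; otherwise, this function returns the position of the appropriate boundary. Calling this function is equivalent to calling next() or previous() n times.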

Parameters
n int: The number of boundaries to advance over (if positive, moves forward; if negative, moves backwards).
Returns
int The position of the boundary n boundaries from the current iteration position, or DONE if moving n boundaries causes the iterator to advance off either end of the text.
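One use of next(int n) is cutting a string after a given number of logical characters without splitting a base character from its combining marks. The sketch below illustrates this under assumptions: GraphemePrefixSketch and firstCharacters are hypothetical names, and java.text.BreakIterator is imported so the code runs on a plain JVM (getCharacterInstance(), first(), and next(int) have the same names on android.icu.text.BreakIterator, though the two libraries may disagree on boundaries for some complex sequences).

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical helper: keep the first n logical characters of a string.
final class GraphemePrefixSketch {
    static String firstCharacters(String text, int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        BreakIterator cb = BreakIterator.getCharacterInstance(Locale.US);
        cb.setText(text);
        cb.first();
        // Jump n boundaries forward in one call instead of looping on next().
        int end = cb.next(n);
        // DONE means the text has fewer than n logical characters.
        return end == BreakIterator.DONE ? text : text.substring(0, end);
    }
}
```

Unlike substring(0, n), this keeps "e" followed by U+0301 (combining acute accent) together as a single logical character.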

preceding

Added in API level 24
int preceding (int offset)

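Sets the iterator's current iteration position to the last boundary position before the specified position. (Whether the specified position is itself a boundary position does not matter; this function always moves the iteration position to the last boundary before the specified position.) If the specified position is the starting position, returns DONE.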

Parameters
offset int: The character position to start searching from.
Returns
int The position of the last boundary position preceding "offset" (whether or not "offset" itself is a boundary position), or DONE if "offset" is the starting offset of the iterator.
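A typical use of preceding() is a "delete back to the previous boundary" editing action. The sketch below shows one way this might look, with assumptions stated: DeleteBackSketch and deleteSegmentBefore are hypothetical names, the caret is assumed to lie within the text, and java.text.BreakIterator is imported so the code runs on a plain JVM (preceding() has the same name on android.icu.text.BreakIterator).

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical helper: remove the text between the caret and the last
// word boundary before it.
final class DeleteBackSketch {
    static String deleteSegmentBefore(String text, int caret) {
        BreakIterator wb = BreakIterator.getWordInstance(Locale.US);
        wb.setText(text);
        int start = wb.preceding(caret);
        if (start == BreakIterator.DONE) {
            // The caret is at the start of the text: nothing to delete.
            return text;
        }
        return text.substring(0, start) + text.substring(caret);
    }
}
```

Because preceding() always returns a boundary strictly before the offset, a caret that already sits on a boundary still deletes the whole segment to its left.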

previous

Added in API level 24
int previous ()

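Moves the iterator backward one boundary. The current iteration position is updated to point to the last boundary position before the current position, and this is also the value that is returned. If the current position is equal to the value returned by first(), or to DONE, this function returns DONE and sets the current position to DONE.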

Returns
int The position of the last boundary position preceding the iteration position.

setText

Added in API level 24
void setText (String newText)

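Sets the iterator to analyze a new piece of text. The new piece of text is passed in as a String, and the current iteration position is reset to the beginning of the string. (The old text is dropped.)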

Parameters
newText String: A String containing the text to analyze with this BreakIterator.

setText

Added in API level 24
void setText (CharacterIterator newText)

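Sets the iterator to analyze a new piece of text. The BreakIterator is passed a CharacterIterator through which it will access the text itself. The current iteration position is reset to the CharacterIterator's start index. (The old iterator is dropped.)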

Parameters
newText CharacterIterator: A CharacterIterator referring to the text to analyze with this BreakIterator (the iterator's current position is ignored, but its other state is significant).
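Passing a CharacterIterator makes it possible to analyze only part of a larger string, in which case first() returns the iterator's begin index and not 0. The sketch below illustrates this with the standard java.text.StringCharacterIterator. It is an illustration under assumptions: SubrangeSketch, firstBoundary, and wordsInRange are hypothetical names, and java.text.BreakIterator is imported so the code runs on a plain JVM (setText(CharacterIterator) has the same signature on android.icu.text.BreakIterator).

```java
import java.text.BreakIterator;
import java.text.StringCharacterIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: analyze only text[begin, end) of a larger string.
final class SubrangeSketch {
    private static BreakIterator over(String text, int begin, int end) {
        BreakIterator wb = BreakIterator.getWordInstance(Locale.US);
        // The iterator's range is [begin, end); its start position is begin.
        wb.setText(new StringCharacterIterator(text, begin, end, begin));
        return wb;
    }

    static int firstBoundary(String text, int begin, int end) {
        return over(text, begin, end).first();
    }

    static List<String> wordsInRange(String text, int begin, int end) {
        BreakIterator wb = over(text, begin, end);
        List<String> out = new ArrayList<>();
        int start = wb.first();
        for (int stop = wb.next(); stop != BreakIterator.DONE; start = stop, stop = wb.next()) {
            // Boundaries are offsets into the full string, not into the range.
            String segment = text.substring(start, stop);
            if (segment.codePoints().anyMatch(Character::isLetter)) {
                out.add(segment);
            }
        }
        return out;
    }
}
```

Since the returned boundaries index into the original string, they can be used with substring() on that string directly, with no offset arithmetic.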
