Added in API level 24

BreakIterator

public abstract class BreakIterator
extends Object implements Cloneable

java.lang.Object
↳	android.icu.text.BreakIterator

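[icu enhancement] ICU's replacement for BreakIterator. Methods, fields, and other functionality specific to ICU are labeled "[icu]".

A class that locates boundaries in text. This class defines a protocol for objects that break up a piece of natural-language text according to a set of criteria. Instances or subclasses of BreakIterator can be provided, for example, to break a piece of text into words, sentences, or logical characters according to the conventions of some language or group of languages. We provide five built-in types of BreakIterator: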

getTitleInstance() returns a BreakIterator that locates boundaries between title breaks.
getSentenceInstance() returns a BreakIterator that locates boundaries between sentences. This is useful for triple-click selection, for example.
getWordInstance() returns a BreakIterator that locates boundaries between words. This is useful for double-click selection or "find whole words" searches. This type of BreakIterator makes sure there is a boundary position at the beginning and end of each legal word. (Numbers count as words, too.) Whitespace and punctuation are kept separate from real words.
getLineInstance() returns a BreakIterator that locates positions where it is legal for a text editor to wrap lines. This is similar to word breaking, but not the same: punctuation and whitespace are generally kept with words (you don't want a line to start with whitespace, for example), and some special characters can force a position to be considered a line-break position or prevent a position from being a line-break position.
getCharacterInstance() returns a BreakIterator that locates boundaries between logical characters. Because of the structure of the Unicode encoding, a logical character may be stored internally as more than one Unicode code point. (A with an umlaut may be stored as an a followed by a separate combining umlaut character, for example, but the user still thinks of it as one character.) This iterator allows various processes (especially text editors) to treat as characters the units of text that a user would think of as characters, rather than the units of text that the computer sees as "characters".
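
As a rough illustration of the "logical character" idea above, the sketch below counts user-perceived characters in a string. It assumes `java.text.BreakIterator` from the JDK as a stand-in so that it can run off-device; `android.icu.text.BreakIterator` exposes the same `getCharacterInstance`, `setText`, `first`, `next`, and `DONE` names, although exact boundaries can differ between implementations. The class and method names are hypothetical.

```java
import java.text.BreakIterator;
import java.util.Locale;

class LogicalCharacters {
    /** Counts the segments reported by a character-boundary iterator. */
    static int count(String text) {
        // Assumption: the JDK iterator stands in for the ICU one.
        BreakIterator it = BreakIterator.getCharacterInstance(Locale.ROOT);
        it.setText(text);
        int n = 0;
        it.first();
        // Each successful next() steps over one logical character.
        while (it.next() != BreakIterator.DONE) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // 'a' followed by a combining diaeresis, then 'b'
        String s = "a\u0308b";
        System.out.println(s.length() + " code units, "
                + count(s) + " logical characters");
    }
}
```

Here `String.length()` counts storage units, while the iterator groups the base letter and its combining mark into one unit.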

The text boundary positions are found according to the rules described in Unicode Standard Annex #29, Text Boundaries, and Unicode Standard Annex #14, Line Breaking Properties. These are available at http://www.unicode.org/reports/tr14/ and http://www.unicode.org/reports/tr29/.

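BreakIterator's interface follows an "iterator" model (hence the name), meaning it has a concept of a "current position" and methods like first(), last(), next(), and previous() that update the current position. All BreakIterators uphold the following invariants: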

The beginning and end of the text are always treated as boundary positions.
The current position of the iterator is always a boundary position (random-access methods move the iterator to the nearest boundary position before or after the specified position, not _to_ the specified position).
DONE is used as a flag to indicate when iteration has stopped. DONE is only returned when the current position is the end of the text and the user calls next(), or when the current position is the beginning of the text and the user calls previous().
Break positions are numbered by the positions of the characters that follow them. Thus, under normal circumstances, the position before the first character is 0, the position after the first character is 1, and the position after the last character is 1 plus the index of the last character (that is, the length of the text).
The client can change the position of an iterator, or the text it analyzes, at will, but cannot change the behavior. A client that wants different behavior must instantiate a new iterator.
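
The first and third invariants can be checked with a short probe. This is a minimal sketch that assumes `java.text.BreakIterator` from the JDK as a runnable stand-in for `android.icu.text.BreakIterator` (the `first`, `last`, `next`, `previous`, and `DONE` members used here exist on both); the class name `BoundaryInvariants` is hypothetical.

```java
import java.text.BreakIterator;
import java.util.Locale;

class BoundaryInvariants {
    /** Returns {first(), previous() at start, last(), next() at end}. */
    static int[] probe(String text) {
        // Assumption: the JDK iterator stands in for the ICU one.
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(text);
        int first = it.first();          // beginning of the text
        int beforeFirst = it.previous(); // nothing before the start: DONE
        int last = it.last();            // end of the text
        int afterLast = it.next();       // nothing after the end: DONE
        return new int[] {first, beforeFirst, last, afterLast};
    }

    public static void main(String[] args) {
        for (int value : probe("Hello world")) {
            System.out.println(value);
        }
    }
}
```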

BreakIterator accesses the text it analyzes through a CharacterIterator, which makes it possible to use BreakIterator to analyze text in any text-storage vehicle that provides a CharacterIterator interface.

Note: Some types of BreakIterator can take a long time to create, and instances of BreakIterator are not currently cached by the system. For optimal performance, keep instances of BreakIterator around as long as it makes sense. For example, when word-wrapping a document, don't create and destroy a new BreakIterator for each line. Create one break iterator for the whole document (or whatever stretch of text you're wrapping) and use it to do the whole job of wrapping the text.
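
One way to follow this advice is to create the iterator once and point it at each new piece of text with setText(). The sketch below assumes `java.text.BreakIterator` as a runnable stand-in for the ICU class; `ReuseIterator` and its methods are hypothetical names, and treating a segment as a word when it starts with a letter or digit is a simplification chosen for this example.

```java
import java.text.BreakIterator;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class ReuseIterator {
    /** Counts segments in one line that start with a letter or digit. */
    static int wordsIn(BreakIterator words, String line) {
        words.setText(line); // re-targets the existing iterator
        int count = 0;
        int start = words.first();
        for (int end = words.next(); end != BreakIterator.DONE;
                start = end, end = words.next()) {
            if (Character.isLetterOrDigit(line.codePointAt(start))) {
                count++;
            }
        }
        return count;
    }

    /** Uses a single iterator instance for every line. */
    static int countWords(List<String> lines) {
        BreakIterator words = BreakIterator.getWordInstance(Locale.ROOT);
        int total = 0;
        for (String line : lines) {
            total += wordsIn(words, line);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(countWords(Arrays.asList("one two", "three, four five")));
    }
}
```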

Examples:

Creating and using text boundaries

 public static void main(String args[]) {
      if (args.length == 1) {
          String stringToExamine = args[0];
          //print each word in order
          BreakIterator boundary = BreakIterator.getWordInstance();
          boundary.setText(stringToExamine);
          printEachForward(boundary, stringToExamine);
          //print each sentence in reverse order
          boundary = BreakIterator.getSentenceInstance(Locale.US);
          boundary.setText(stringToExamine);
          printEachBackward(boundary, stringToExamine);
          printFirst(boundary, stringToExamine);
          printLast(boundary, stringToExamine);
      }
 }

Print each element in order

 public static void printEachForward(BreakIterator boundary, String source) {
     int start = boundary.first();
     for (int end = boundary.next();
          end != BreakIterator.DONE;
          start = end, end = boundary.next()) {
          System.out.println(source.substring(start,end));
     }
 }

Print each element in reverse order

 public static void printEachBackward(BreakIterator boundary, String source) {
     int end = boundary.last();
     for (int start = boundary.previous();
          start != BreakIterator.DONE;
          end = start, start = boundary.previous()) {
         System.out.println(source.substring(start,end));
     }
 }

Print first element

 public static void printFirst(BreakIterator boundary, String source) {
     int start = boundary.first();
     int end = boundary.next();
     System.out.println(source.substring(start,end));
 }

Print last element

 public static void printLast(BreakIterator boundary, String source) {
     int end = boundary.last();
     int start = boundary.previous();
     System.out.println(source.substring(start,end));
 }

Print the element at a specified position

 public static void printAt(BreakIterator boundary, int pos, String source) {
     int end = boundary.following(pos);
     int start = boundary.previous();
     System.out.println(source.substring(start,end));
 }

Find the next word

 public static int nextWordStartAfter(int pos, String text) {
     BreakIterator wb = BreakIterator.getWordInstance();
     wb.setText(text);
     int last = wb.following(pos);
     int current = wb.next();
     while (current != BreakIterator.DONE) {
         for (int p = last; p < current; p++) {
             if (Character.isLetter(text.charAt(p)))
                 return last;
         }
         last = current;
         current = wb.next();
     }
     return BreakIterator.DONE;
 }
 
(The iterator returned by BreakIterator.getWordInstance() is unique in that the break positions it returns don't represent both the start and end of the thing being iterated over. That is, a sentence-break iterator returns breaks that each represent the end of one sentence and the beginning of the next. With the word-break iterator, the characters between two boundaries might be a word, or they might be the punctuation or whitespace between two words. The above code uses a simple heuristic to determine which boundary is the beginning of a word: If the characters between this boundary and the next boundary include at least one letter (this can be an alphabetical letter, a CJK ideograph, a Hangul syllable, a Kana character, etc.), then the text between this boundary and the next is a word; otherwise, it's the material between words.)

See also:

CharacterIterator

Summary

Constants
`int`	`DONE` DONE is returned by previous() and next() after all valid boundaries have been returned.
`int`	`KIND_CHARACTER` [ICU]
`int`	`KIND_LINE` [ICU]
`int`	`KIND_SENTENCE` [ICU]
`int`	`KIND_TITLE` [ICU]
`int`	`KIND_WORD` [ICU]
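`int`	`WORD_IDEO` Tag value for words containing ideographic characters, lower limit.
`int`	`WORD_IDEO_LIMIT` Tag value for words containing ideographic characters, upper limit.
`int`	`WORD_KANA` Tag value for words containing kana characters, lower limit.
`int`	`WORD_KANA_LIMIT` Tag value for words containing kana characters, upper limit.
`int`	`WORD_LETTER` Tag value for words that contain letters, excluding hiragana, katakana, or ideographic characters, lower limit.
`int`	`WORD_LETTER_LIMIT` Tag value for words containing letters, upper limit.
`int`	`WORD_NONE` Tag value for "words" that do not fit into any other category.
`int`	`WORD_NONE_LIMIT` Upper bound for tags for uncategorized words.
`int`	`WORD_NUMBER` Tag value for words that appear to be numbers, lower limit.
`int`	`WORD_NUMBER_LIMIT` Tag value for words that appear to be numbers, upper limit.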

Protected constructors
`BreakIterator()` Default constructor.

Public methods
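`Object`	`clone()` Clone method.
`abstract int`	`current()` Returns the iterator's current position.
`abstract int`	`first()` Sets the iterator to the first boundary position.
`abstract int`	`following(int offset)` Sets the iterator's current iteration position to the first boundary position following the specified position.
`static Locale[]`	`getAvailableLocales()` Returns a list of locales for which BreakIterators can be used.
`static BreakIterator`	`getCharacterInstance(ULocale where)` [icu] Returns a new instance of BreakIterator that locates logical-character boundaries.
`static BreakIterator`	`getCharacterInstance(Locale where)` Returns a new instance of BreakIterator that locates logical-character boundaries.
`static BreakIterator`	`getCharacterInstance()` Returns a new instance of BreakIterator that locates logical-character boundaries.
`static BreakIterator`	`getLineInstance(Locale where)` Returns a new instance of BreakIterator that locates legal line-wrapping positions.
`static BreakIterator`	`getLineInstance(ULocale where)` [icu] Returns a new instance of BreakIterator that locates legal line-wrapping positions.
`static BreakIterator`	`getLineInstance()` Returns a new instance of BreakIterator that locates legal line-wrapping positions.
`int`	`getRuleStatus()` For RuleBasedBreakIterators, returns the status tag from the break rule that determined the most recently returned break position.
`int`	`getRuleStatusVec(int[] fillInArray)` For RuleBasedBreakIterators, gets the status (tag) values from the break rule(s) that determined the most recently returned break position.
`static BreakIterator`	`getSentenceInstance(Locale where)` Returns a new instance of BreakIterator that locates sentence boundaries.
`static BreakIterator`	`getSentenceInstance(ULocale where)` [icu] Returns a new instance of BreakIterator that locates sentence boundaries.
`static BreakIterator`	`getSentenceInstance()` Returns a new instance of BreakIterator that locates sentence boundaries.
`abstract CharacterIterator`	`getText()` Returns a CharacterIterator over the text being analyzed.
`static BreakIterator`	`getTitleInstance(Locale where)` [icu] Returns a new instance of BreakIterator that locates title boundaries.
`static BreakIterator`	`getTitleInstance(ULocale where)` [icu] Returns a new instance of BreakIterator that locates title boundaries.
`static BreakIterator`	`getTitleInstance()` [icu] Returns a new instance of BreakIterator that locates title boundaries.
`static BreakIterator`	`getWordInstance(Locale where)` Returns a new instance of BreakIterator that locates word boundaries.
`static BreakIterator`	`getWordInstance(ULocale where)` [icu] Returns a new instance of BreakIterator that locates word boundaries.
`static BreakIterator`	`getWordInstance()` Returns a new instance of BreakIterator that locates word boundaries.
`boolean`	`isBoundary(int offset)` Returns true if the specified position is a boundary position.
`abstract int`	`last()` Sets the iterator to the last boundary position.
`abstract int`	`next()` Advances the iterator forward one boundary.
`abstract int`	`next(int n)` Moves the iterator by the specified number of steps in the text.
`int`	`preceding(int offset)` Sets the iterator's current iteration position to the last boundary position preceding the specified position.
`abstract int`	`previous()` Moves the iterator backward one boundary.
`void`	`setText(String newText)` Sets the iterator to analyze a new piece of text.
`abstract void`	`setText(CharacterIterator newText)` Sets the iterator to analyze a new piece of text.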

Inherited methods

From class java.lang.Object

Constants

DONE

Added in API level 24

int DONE

DONE is returned by previous() and next() after all valid boundaries have been returned.

Constant Value: -1 (0xffffffff)

KIND_CHARACTER

Added in API level 24

int KIND_CHARACTER

[ICU]

Constant Value: 0 (0x00000000)

KIND_LINE

Added in API level 24

int KIND_LINE

[ICU]

Constant Value: 2 (0x00000002)

KIND_SENTENCE

Added in API level 24

int KIND_SENTENCE

[ICU]

Constant Value: 3 (0x00000003)

KIND_TITLE

Added in API level 24

int KIND_TITLE

[ICU]

Constant Value: 4 (0x00000004)

KIND_WORD

Added in API level 24

int KIND_WORD

[ICU]

Constant Value: 1 (0x00000001)

WORD_IDEO

Added in API level 24

int WORD_IDEO

Tag value for words containing ideographic characters, lower limit.

Constant Value: 400 (0x00000190)

WORD_IDEO_LIMIT

Added in API level 24

int WORD_IDEO_LIMIT

Tag value for words containing ideographic characters, upper limit.

Constant Value: 500 (0x000001f4)

WORD_KANA

Added in API level 24

int WORD_KANA

Tag value for words containing kana characters, lower limit.

Constant Value: 300 (0x0000012c)

WORD_KANA_LIMIT

Added in API level 24

int WORD_KANA_LIMIT

Tag value for words containing kana characters, upper limit.

Constant Value: 400 (0x00000190)

WORD_LETTER

Added in API level 24

int WORD_LETTER

Tag value for words that contain letters, excluding hiragana, katakana, or ideographic characters, lower limit.

Constant Value: 200 (0x000000c8)

WORD_LETTER_LIMIT

Added in API level 24

int WORD_LETTER_LIMIT

Tag value for words containing letters, upper limit.

Constant Value: 300 (0x0000012c)

WORD_NONE

Added in API level 24

int WORD_NONE

Tag value for "words" that do not fit into any other category. Includes spaces and most punctuation.

Constant Value: 0 (0x00000000)

WORD_NONE_LIMIT

Added in API level 24

int WORD_NONE_LIMIT

Upper bound for tags for uncategorized words.

Constant Value: 100 (0x00000064)

WORD_NUMBER

Added in API level 24

int WORD_NUMBER

Tag value for words that appear to be numbers, lower limit.

Constant Value: 100 (0x00000064)

WORD_NUMBER_LIMIT

Added in API level 24

int WORD_NUMBER_LIMIT

Tag value for words that appear to be numbers, upper limit.

Constant Value: 200 (0x000000c8)
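
The WORD_* constants come in lower/upper pairs, and each upper limit equals the next category's lower limit. The sketch below is one possible way to map a status value onto those categories. It is plain Java with the constant values copied from the listing above, so it does not call the ICU class at all. Treating each range as half-open (lower inclusive, limit exclusive) is an assumption inferred from those shared values, and `WordTagRanges` and `describe` are hypothetical names.

```java
class WordTagRanges {
    // Values copied from the constant listing in this document.
    static final int WORD_NONE = 0, WORD_NONE_LIMIT = 100;
    static final int WORD_NUMBER = 100, WORD_NUMBER_LIMIT = 200;
    static final int WORD_LETTER = 200, WORD_LETTER_LIMIT = 300;
    static final int WORD_KANA = 300, WORD_KANA_LIMIT = 400;
    static final int WORD_IDEO = 400, WORD_IDEO_LIMIT = 500;

    /** Maps a rule status value to a category name (assumed half-open ranges). */
    static String describe(int status) {
        if (status >= WORD_NONE && status < WORD_NONE_LIMIT) return "none";
        if (status >= WORD_NUMBER && status < WORD_NUMBER_LIMIT) return "number";
        if (status >= WORD_LETTER && status < WORD_LETTER_LIMIT) return "letter";
        if (status >= WORD_KANA && status < WORD_KANA_LIMIT) return "kana";
        if (status >= WORD_IDEO && status < WORD_IDEO_LIMIT) return "ideo";
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(describe(200));
    }
}
```

On a device, the value passed to such a helper would typically come from getRuleStatus() after a call that returned a word boundary.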

Protected constructors

BreakIterator

Added in API level 24

BreakIterator ()

Default constructor. This abstract base class carries no state.

Public methods

clone

Added in API level 24

Object clone ()

Clone method. Creates another BreakIterator with the same behavior and current state as this one.

Returns
`Object`	The clone.
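
A small sketch of what "same behavior and current state" can look like in practice: the clone starts at the original's position and then moves independently. It assumes `java.text.BreakIterator` as a runnable stand-in for the ICU class, and `CloneDemo` is a hypothetical name.

```java
import java.text.BreakIterator;
import java.util.Locale;

class CloneDemo {
    /** Returns {original position, clone's initial position, clone's position after next()}. */
    static int[] positions(String text) {
        // Assumption: the JDK iterator stands in for the ICU one.
        BreakIterator original = BreakIterator.getWordInstance(Locale.ROOT);
        original.setText(text);
        original.next(); // move off the start so there is state to copy

        BreakIterator copy = (BreakIterator) original.clone();
        int copyStart = copy.current(); // inherits the original's position
        copy.next();                    // advancing the clone only

        return new int[] {original.current(), copyStart, copy.current()};
    }

    public static void main(String[] args) {
        for (int p : positions("Hello world")) {
            System.out.println(p);
        }
    }
}
```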

current

Added in API level 24

int current ()

Returns the iterator's current position.

Returns
`int`	The iterator's current position.

first

Added in API level 24

int first ()

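Sets the iterator to the first boundary position. This is always the beginning index of the text this iterator iterates over. For example, if the iterator iterates over a whole string, this function will always return 0.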

Returns
`int`	The character offset of the beginning of the stretch of text being broken.

following

Added in API level 24

int following (int offset)

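Sets the iterator's current iteration position to the first boundary position following the specified position. (Whether the specified position is itself a boundary position or not doesn't matter; this function always moves the iteration position to the first boundary after the specified position.) If the specified position is the past-the-end position, returns DONE.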

Parameters
`offset`	`int`: The character position to start searching from.

Returns
`int`	The position of the first boundary position following "offset" (whether or not "offset" itself is a boundary position), or DONE if "offset" is the past-the-end offset.
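
The "strictly after" behavior is easiest to see on a boundary offset. The following sketch assumes `java.text.BreakIterator` as a runnable stand-in for the ICU class; `FollowingDemo` and `followingAll` are hypothetical names.

```java
import java.text.BreakIterator;
import java.util.Arrays;
import java.util.Locale;

class FollowingDemo {
    /** Calls following() for each offset, reusing one iterator. */
    static int[] followingAll(String text, int... offsets) {
        // Assumption: the JDK iterator stands in for the ICU one.
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(text);
        int[] out = new int[offsets.length];
        for (int i = 0; i < offsets.length; i++) {
            out[i] = it.following(offsets[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        // Offset 5 is itself a boundary in "Hello world"; the result moves past it.
        // Offset 11 is the past-the-end offset, so the last result is DONE.
        System.out.println(Arrays.toString(followingAll("Hello world", 0, 5, 7, 11)));
    }
}
```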

getAvailableLocales

Added in API level 24

Locale[] getAvailableLocales ()

Returns a list of locales for which BreakIterators can be used.

Returns
`Locale[]`	An array of Locales. All of the locales in the array can be used when creating a BreakIterator.

getCharacterInstance

Added in API level 24

BreakIterator getCharacterInstance (ULocale where)

[icu] Returns a new instance of BreakIterator that locates logical-character boundaries.

Parameters
`where`	`ULocale`: A Locale specifying the language of the text being analyzed.

Returns
`BreakIterator`	A new instance of BreakIterator that locates logical-character boundaries.

Throws
`NullPointerException`	if `where` is null.

getCharacterInstance

Added in API level 24

BreakIterator getCharacterInstance (Locale where)

Returns a new instance of BreakIterator that locates logical-character boundaries.

Parameters
`where`	`Locale`: A Locale specifying the language of the text being analyzed.

Returns
`BreakIterator`	A new instance of BreakIterator that locates logical-character boundaries.

Throws
`NullPointerException`	if `where` is null.

getCharacterInstance

Added in API level 24

BreakIterator getCharacterInstance ()

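Returns a new instance of BreakIterator that locates logical-character boundaries. This function assumes that the text being analyzed is in the default locale's language.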

Returns
`BreakIterator`	A new instance of BreakIterator that locates logical-character boundaries.

getLineInstance

Added in API level 24

BreakIterator getLineInstance (Locale where)

Returns a new instance of BreakIterator that locates legal line-wrapping positions.

Parameters
`where`	`Locale`: A Locale specifying the language of the text being broken.

Returns
`BreakIterator`	A new instance of BreakIterator that locates legal line-wrapping positions.

Throws
`NullPointerException`	if `where` is null.

getLineInstance

Added in API level 24

BreakIterator getLineInstance (ULocale where)

[icu] Returns a new instance of BreakIterator that locates legal line-wrapping positions.

Parameters
`where`	`ULocale`: A Locale specifying the language of the text being broken.

Returns
`BreakIterator`	A new instance of BreakIterator that locates legal line-wrapping positions.

Throws
`NullPointerException`	if `where` is null.

getLineInstance

Added in API level 24

BreakIterator getLineInstance ()

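Returns a new instance of BreakIterator that locates legal line-wrapping positions. This function assumes that the text being broken is in the default locale's language.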

Returns
`BreakIterator`	A new instance of BreakIterator that locates legal line-wrapping positions.

getRuleStatus

Added in API level 24

int getRuleStatus ()

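For RuleBasedBreakIterators, returns the status tag from the break rule that determined the most recently returned break position.

For break iterator types that do not support a rule status, a default value of 0 is returned.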

Returns
`int`	The status from the break rule that determined the most recently returned break position.

getRuleStatusVec

Added in API level 24

int getRuleStatusVec (int[] fillInArray)

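For RuleBasedBreakIterators, gets the status (tag) values from the break rule(s) that determined the most recently returned break position.

For break iterator types that do not support rule status, no values are returned.

If the size of the output array is insufficient to hold the data, the output is truncated to the available length. No exception is thrown.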

Parameters
`fillInArray`	`int[]`: an array to be filled in with the status values.

Returns
`int`	The number of rule status values from rules that determined the most recent boundary returned by the break iterator. In the event that the array is too small, the return value is the total number of status values that were available, not the reduced number that were actually returned.

getSentenceInstance

Added in API level 24

BreakIterator getSentenceInstance (Locale where)

Returns a new instance of BreakIterator that locates sentence boundaries.

Parameters
`where`	`Locale`: A Locale specifying the language of the text being analyzed.

Returns
`BreakIterator`	A new instance of BreakIterator that locates sentence boundaries.

Throws
`NullPointerException`	if `where` is null.

getSentenceInstance

Added in API level 24

BreakIterator getSentenceInstance (ULocale where)

[icu] Returns a new instance of BreakIterator that locates sentence boundaries.

Parameters
`where`	`ULocale`: A Locale specifying the language of the text being analyzed.

Returns
`BreakIterator`	A new instance of BreakIterator that locates sentence boundaries.

Throws
`NullPointerException`	if `where` is null.

getSentenceInstance

Added in API level 24

BreakIterator getSentenceInstance ()

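Returns a new instance of BreakIterator that locates sentence boundaries. This function assumes that the text being analyzed is in the default locale's language.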

Returns
`BreakIterator`	A new instance of BreakIterator that locates sentence boundaries.

getText

Added in API level 24

CharacterIterator getText ()

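Returns a CharacterIterator over the text being analyzed. For at least some subclasses of BreakIterator, this is a reference to the actual iterator being used by the BreakIterator, so this function's return value should be treated as const. No guarantees are made about the current position of this iterator when it is returned. If you need to move that position to examine the text, clone this function's return value first.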

Returns
`CharacterIterator`	A CharacterIterator over the text being analyzed.
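
The "clone first" advice can be followed as in this minimal sketch, which reads the analyzed text through a cloned CharacterIterator and leaves the break iterator's own position alone. It assumes `java.text.BreakIterator` as a runnable stand-in for the ICU class; `GetTextDemo`, `copyText`, and `demo` are hypothetical names.

```java
import java.text.BreakIterator;
import java.text.CharacterIterator;
import java.util.Locale;

class GetTextDemo {
    /** Reads the analyzed text through a clone of getText(). */
    static String copyText(BreakIterator it) {
        CharacterIterator copy = (CharacterIterator) it.getText().clone();
        StringBuilder sb = new StringBuilder();
        // CharacterIterator.DONE marks the end of the iterated range.
        for (char c = copy.first(); c != CharacterIterator.DONE; c = copy.next()) {
            sb.append(c);
        }
        return sb.toString();
    }

    /** Returns the copied text plus the break iterator's position afterwards. */
    static String demo(String text) {
        // Assumption: the JDK iterator stands in for the ICU one.
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(text);
        it.next(); // move to the first boundary after the start
        String read = copyText(it);
        return read + "|" + it.current();
    }

    public static void main(String[] args) {
        System.out.println(demo("abc def"));
    }
}
```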

getTitleInstance

Added in API level 24

BreakIterator getTitleInstance (Locale where)

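[icu] Returns a new instance of BreakIterator that locates title boundaries. The iterator returned locates title boundaries as described for Unicode 3.2 only. For title boundary iteration under Unicode 4.0 and above, use the word boundary iterator returned by getWordInstance().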

Parameters
`where`	`Locale`: A Locale specifying the language of the text being analyzed.

Returns
`BreakIterator`	A new instance of BreakIterator that locates title boundaries.

Throws
`NullPointerException`	if `where` is null.

getTitleInstance

Added in API level 24

BreakIterator getTitleInstance (ULocale where)

[icu] Returns a new instance of BreakIterator that locates title boundaries.

Parameters
`where`	`ULocale`: A Locale specifying the language of the text being analyzed.

Returns
`BreakIterator`	A new instance of BreakIterator that locates title boundaries.

Throws
`NullPointerException`	if `where` is null.

getTitleInstance

Added in API level 24

BreakIterator getTitleInstance ()

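[icu] Returns a new instance of BreakIterator that locates title boundaries. This function assumes that the text being analyzed is in the default locale's language. The iterator returned locates title boundaries as described for Unicode 3.2 only. For title boundary iteration under Unicode 4.0 and above, use the word boundary iterator returned by getWordInstance().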

Returns
`BreakIterator`	A new instance of BreakIterator that locates title boundaries.

getWordInstance

Added in API level 24

BreakIterator getWordInstance (Locale where)

Returns a new instance of BreakIterator that locates word boundaries.

Parameters
`where`	`Locale`: A locale specifying the language of the text to be analyzed.

Returns
`BreakIterator`	An instance of BreakIterator that locates word boundaries.

Throws
`NullPointerException`	if `where` is null.

getWordInstance

Added in API level 24

BreakIterator getWordInstance (ULocale where)

[icu] Returns a new instance of BreakIterator that locates word boundaries.

Parameters
`where`	`ULocale`: A locale specifying the language of the text to be analyzed.

Returns
`BreakIterator`	An instance of BreakIterator that locates word boundaries.

Throws
`NullPointerException`	if `where` is null.

getWordInstance

Added in API level 24

BreakIterator getWordInstance ()

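Returns a new instance of BreakIterator that locates word boundaries. This function assumes that the text being analyzed is in the default locale's language.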

Returns
`BreakIterator`	An instance of BreakIterator that locates word boundaries.

isBoundary

Added in API level 24

boolean isBoundary (int offset)

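Returns true if the specified position is a boundary position. If the function returns true, the current iteration position is set to the specified position; if the function returns false, the current iteration position is set as though following() had been called.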

Parameters
`offset`	`int`: the offset to check.

Returns
`boolean`	True if "offset" is a boundary position.
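
The side effect on the iteration position is easy to overlook. The sketch below reports both the result and the position afterwards; it assumes `java.text.BreakIterator` as a runnable stand-in for the ICU class, and `IsBoundaryDemo` and `check` are hypothetical names.

```java
import java.text.BreakIterator;
import java.util.Arrays;
import java.util.Locale;

class IsBoundaryDemo {
    /** Returns {1 if offset is a boundary else 0, iterator position afterwards}. */
    static int[] check(String text, int offset) {
        // Assumption: the JDK iterator stands in for the ICU one.
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(text);
        boolean boundary = it.isBoundary(offset);
        return new int[] {boundary ? 1 : 0, it.current()};
    }

    public static void main(String[] args) {
        // Offset 5 sits between "Hello" and the space; offset 2 is inside "Hello".
        System.out.println(Arrays.toString(check("Hello world", 5)));
        System.out.println(Arrays.toString(check("Hello world", 2)));
    }
}
```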

last

Added in API level 24

int last ()

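Sets the iterator to the last boundary position. This is always the "past-the-end" index of the text this iterator iterates over. For example, if the iterator iterates over a whole string (call it "text"), this function will always return text.length().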

Returns
`int`	The character offset of the end of the stretch of text being broken.

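next

Added in API level 24

int next ()

Advances the iterator forward one boundary. The current iteration position is updated to point to the next boundary position after the current position, and this is also the value that is returned. If the current position is equal to the value returned by last(), or to DONE, this function returns DONE and sets the current position to DONE.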

Returns
`int`	The position of the first boundary position following the iteration position.

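next

Added in API level 24

int next (int n)

Moves the iterator by the specified number of steps in the text. A positive number moves the iterator forward; a negative number moves the iterator backward. If this causes the iterator to move off either end of the text, this function returns DONE; otherwise, this function returns the position of the appropriate boundary. Calling this function is equivalent to calling next() or previous() n times.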

Parameters
`n`	`int`: The number of boundaries to advance over (if positive, moves forward; if negative, moves backwards).

Returns
`int`	The position of the boundary n boundaries from the current iteration position, or DONE if moving n boundaries causes the iterator to advance off either end of the text.
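
A short sketch of stepping by several boundaries at once, in both directions, assuming `java.text.BreakIterator` as a runnable stand-in for the ICU class. `StepDemo` and `steps` are hypothetical names.

```java
import java.text.BreakIterator;
import java.util.Arrays;
import java.util.Locale;

class StepDemo {
    /** Returns the results of next(2), next(-1), and next(10) from the start. */
    static int[] steps(String text) {
        // Assumption: the JDK iterator stands in for the ICU one.
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(text);
        it.first();
        int forwardTwo = it.next(2); // two boundaries forward
        int backOne = it.next(-1);   // one boundary backward
        int offEnd = it.next(10);    // more steps than boundaries remain
        return new int[] {forwardTwo, backOne, offEnd};
    }

    public static void main(String[] args) {
        // Word boundaries of "one two three" fall at 0, 3, 4, 7, 8, and 13.
        System.out.println(Arrays.toString(steps("one two three")));
    }
}
```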

preceding

Added in API level 24

int preceding (int offset)

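Sets the iterator's current iteration position to the last boundary position preceding the specified position. (Whether the specified position is itself a boundary position or not doesn't matter; this function always moves the iteration position to the last boundary before the specified position.) If the specified position is the starting position, returns DONE.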

Parameters
`offset`	`int`: The character position to start searching from.

Returns
`int`	The position of the last boundary position preceding "offset" (whether or not "offset" itself is a boundary position), or DONE if "offset" is the starting offset of the iterator.
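
Combining preceding() with following() gives one way to find the segment that contains a given character. This is a sketch under stated assumptions: it uses `java.text.BreakIterator` as a runnable stand-in for the ICU class, requires `0 <= offset < text.length()`, and `SegmentAt` is a hypothetical name.

```java
import java.text.BreakIterator;
import java.util.Locale;

class SegmentAt {
    /** Returns the word-iterator segment containing the character at offset. */
    static String segmentAt(String text, int offset) {
        // Assumption: the JDK iterator stands in for the ICU one.
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(text);
        // Last boundary at or before offset: search backward from offset + 1.
        int start = it.preceding(offset + 1);
        // First boundary strictly after offset.
        int end = it.following(offset);
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(segmentAt("Hello world", 7));
    }
}
```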

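previous

Added in API level 24

int previous ()

Moves the iterator backward one boundary. The current iteration position is updated to point to the last boundary position before the current position, and this is also the value that is returned. If the current position is equal to the value returned by first(), or to DONE, this function returns DONE and sets the current position to DONE.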

Returns
`int`	The position of the last boundary position preceding the iteration position.

setText

Added in API level 24

void setText (String newText)

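Sets the iterator to analyze a new piece of text. The new piece of text is passed in as a String, and the current iteration position is reset to the beginning of the string. (The old text is dropped.)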

Parameters
`newText`	`String`: A String containing the text to analyze with this BreakIterator.

setText

Added in API level 24

void setText (CharacterIterator newText)

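Sets the iterator to analyze a new piece of text. The BreakIterator is passed a CharacterIterator through which it will access the text itself. The current iteration position is reset to the CharacterIterator's start index. (The old iterator is dropped.)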

Parameters
`newText`	`CharacterIterator`: A CharacterIterator referring to the text to analyze with this BreakIterator (the iterator's current position is ignored, but its other state is significant).
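
The "other state" of the CharacterIterator includes its begin and end indices, so a sub-range of a larger string can be analyzed without copying it. The sketch below assumes `java.text.BreakIterator` as a runnable stand-in for the ICU class and uses the JDK's `StringCharacterIterator`; `SubrangeDemo` and `bounds` are hypothetical names.

```java
import java.text.BreakIterator;
import java.text.StringCharacterIterator;
import java.util.Arrays;
import java.util.Locale;

class SubrangeDemo {
    /** Returns {first(), last()} when only text[begin, end) is analyzed. */
    static int[] bounds(String text, int begin, int end) {
        // Assumption: the JDK iterator stands in for the ICU one.
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        // The iterator covers [begin, end); its initial position is begin.
        it.setText(new StringCharacterIterator(text, begin, end, begin));
        return new int[] {it.first(), it.last()};
    }

    public static void main(String[] args) {
        // Indices 3..14 cover "Hello world" inside the larger string.
        System.out.println(Arrays.toString(bounds("xx Hello world yy", 3, 14)));
    }
}
```

Boundaries are reported as offsets into the original string, so they can be passed to `substring` on that string directly.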