Added in API level 9

Normalizer

public final class Normalizer
extends Object

java.lang.Object
   ↳ java.text.Normalizer


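This class provides the method normalize, which transforms Unicode text into an equivalent composed or decomposed form, allowing for easier sorting and searching of text. The normalize method supports the standard normalization forms described in Unicode Standard Annex #15: Unicode Normalization Forms.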

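Characters with accents or other adornments can be encoded in several different ways in Unicode. For example, take the character A-acute. In Unicode, this can be encoded as a single character (the "composed" form):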

      U+00C1    LATIN CAPITAL LETTER A WITH ACUTE

or as two separate characters (the "decomposed" form):

      U+0041    LATIN CAPITAL LETTER A
      U+0301    COMBINING ACUTE ACCENT

To a user of your program, however, both of these sequences should be treated as the same "user-level" character "A with acute accent". When you are searching or comparing text, you must ensure that these two sequences are treated as equivalent. In addition, you must handle characters with more than one accent. Sometimes the order of a character's combining accents is significant, while in other cases accent sequences in different orders are really equivalent.
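
The equivalence described above can be sketched in a few lines. This is a minimal illustration, not a prescribed approach: the class name CanonicalEquivalence and the helper sameUserText are hypothetical, and NFC is chosen here only as one reasonable common form (NFD would work equally well for the comparison).

```java
import java.text.Normalizer;

public class CanonicalEquivalence {
    // Hypothetical helper (not part of java.text.Normalizer): treats two
    // strings as equal if they match after both are converted to NFC.
    static boolean sameUserText(String a, String b) {
        String na = Normalizer.normalize(a, Normalizer.Form.NFC);
        String nb = Normalizer.normalize(b, Normalizer.Form.NFC);
        return na.equals(nb);
    }

    public static void main(String[] args) {
        String composed = "\u00C1";     // LATIN CAPITAL LETTER A WITH ACUTE
        String decomposed = "A\u0301";  // LATIN CAPITAL LETTER A + COMBINING ACUTE ACCENT

        // A plain comparison sees two different char sequences.
        System.out.println(composed.equals(decomposed));        // false
        // After normalization both are the same "user-level" character.
        System.out.println(sameUserText(composed, decomposed)); // true
    }
}
```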

Similarly, the string "ffi" can be encoded as three separate letters:

      U+0066    LATIN SMALL LETTER F
      U+0066    LATIN SMALL LETTER F
      U+0069    LATIN SMALL LETTER I

or as the single character

      U+FB03    LATIN SMALL LIGATURE FFI

The ffi ligature is not a distinct semantic character, and strictly speaking it shouldn't be in Unicode at all, but it was included for compatibility with existing character sets that already provided it. The Unicode standard identifies such characters by giving them "compatibility" decompositions into the corresponding semantic characters. When sorting and searching, you will often want to use these mappings.
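
As a rough illustration of using the compatibility mappings for searching, the sketch below folds the ffi ligature into plain letters with NFKC. The class name CompatibilityFolding, the helper searchKey, and the sample word are assumptions made for this example only.

```java
import java.text.Normalizer;

public class CompatibilityFolding {
    // Hypothetical helper: builds a search key by applying compatibility
    // decomposition followed by canonical composition (NFKC).
    static String searchKey(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        String withLigature = "o\uFB03ce"; // "office" written with U+FB03

        // Canonical normalization alone leaves the ligature untouched.
        String nfc = Normalizer.normalize(withLigature, Normalizer.Form.NFC);
        System.out.println(nfc.equals(withLigature));                  // true

        // Compatibility normalization replaces it with f, f, i.
        System.out.println(searchKey(withLigature).equals("office")); // true
    }
}
```

Canonical forms (NFC, NFD) preserve the ligature, so a search that should match both spellings needs one of the compatibility forms (NFKC, NFKD).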

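The normalize method helps solve these problems by transforming text into the canonical composed and decomposed forms, as shown in the first example above. In addition, you can have it perform compatibility decompositions so that compatibility characters are treated the same as their equivalents. Finally, the normalize method rearranges accents into the proper canonical order, so that you do not have to handle accent reordering yourself.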
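
The accent reordering can be seen with a base letter that carries two combining marks. The sketch below is illustrative only: the class name AccentOrder, the helper nfd, and the choice of U+0301 (COMBINING ACUTE ACCENT, above the letter) and U+0323 (COMBINING DOT BELOW) are assumptions for this example. Because the two marks attach at different positions, their relative order is not significant, and normalization places the dot below first.

```java
import java.text.Normalizer;

public class AccentOrder {
    // Hypothetical helper: shorthand for canonical decomposition (NFD).
    static String nfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        String acuteThenDot = "a\u0301\u0323"; // a + acute + dot below
        String dotThenAcute = "a\u0323\u0301"; // a + dot below + acute

        // The raw sequences differ...
        System.out.println(acuteThenDot.equals(dotThenAcute));           // false
        // ...but both normalize to the same canonical order.
        System.out.println(nfd(acuteThenDot).equals(nfd(dotThenAcute))); // true
        System.out.println(nfd(acuteThenDot).equals(dotThenAcute));      // true
    }
}
```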

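The W3C generally recommends exchanging text in NFC. Note also that most legacy character encodings use only precomposed forms and often do not encode any combining marks by themselves. To convert to such a character encoding, Unicode text needs to be normalized to NFC first. For more usage examples, see the Unicode Standard Annex.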
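
The point about legacy encodings can be sketched with ISO-8859-1, which has a code for the precomposed A-acute but none for the combining accent. ISO-8859-1 is used here only as an assumed example of a legacy encoding, and the class LegacyEncoding with its helpers fitsLatin1 and toLatin1 is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

public class LegacyEncoding {
    // Hypothetical helper: reports whether every char can be encoded in ISO-8859-1.
    static boolean fitsLatin1(String s) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(s);
    }

    // Hypothetical helper: normalizes to NFC, then encodes as ISO-8859-1.
    static byte[] toLatin1(String s) {
        String nfc = Normalizer.normalize(s, Normalizer.Form.NFC);
        return nfc.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String decomposed = "A\u0301"; // A + COMBINING ACUTE ACCENT

        // The combining mark has no ISO-8859-1 code, so the raw text does not fit.
        System.out.println(fitsLatin1(decomposed)); // false

        // After NFC the text is the single char U+00C1, which encodes as byte 0xC1.
        byte[] bytes = toLatin1(decomposed);
        System.out.println(bytes.length);                      // 1
        System.out.println(Integer.toHexString(bytes[0] & 0xFF)); // c1
    }
}
```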

Summary

Nested classes

enum Normalizer.Form

This enum provides constants for the four Unicode normalization forms described in Unicode Standard Annex #15: Unicode Normalization Forms, and two methods to access them.

Public methods

static boolean isNormalized(CharSequence src, Normalizer.Form form)

Determines if the given sequence of char values is normalized.

static String normalize(CharSequence src, Normalizer.Form form)

Normalize a sequence of char values.

Inherited methods

From class java.lang.Object

Public methods

isNormalized

Added in API level 9
boolean isNormalized (CharSequence src, 
                Normalizer.Form form)

Determines if the given sequence of char values is normalized.

Parameters
src CharSequence: The sequence of char values to be checked.
form Normalizer.Form: The normalization form; one of NFC, NFD, NFKC, NFKD
Returns
boolean true if the sequence of char values is normalized; false otherwise.
Throws
NullPointerException If src or form is null.
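
One possible use of isNormalized is to skip the conversion when the text is already in the requested form. The sketch below assumes that pattern; the class NormalizeIfNeeded and the helper toForm are hypothetical and not part of the API.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;

public class NormalizeIfNeeded {
    // Hypothetical helper: returns src unchanged when it is already in the
    // requested form, otherwise returns the normalized text.
    static String toForm(CharSequence src, Form form) {
        if (Normalizer.isNormalized(src, form)) {
            return src.toString();
        }
        return Normalizer.normalize(src, form);
    }

    public static void main(String[] args) {
        String composed = "\u00C1";    // precomposed A-acute
        String decomposed = "A\u0301"; // A + COMBINING ACUTE ACCENT

        System.out.println(Normalizer.isNormalized(composed, Form.NFC));   // true
        System.out.println(Normalizer.isNormalized(decomposed, Form.NFC)); // false
        System.out.println(Normalizer.isNormalized(decomposed, Form.NFD)); // true

        System.out.println(toForm(decomposed, Form.NFC).equals(composed)); // true
    }
}
```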

normalize

Added in API level 9
String normalize (CharSequence src, 
                Normalizer.Form form)

Normalize a sequence of char values. The sequence will be normalized according to the specified normalization form.

Parameters
src CharSequence: The sequence of char values to normalize.
form Normalizer.Form: The normalization form; one of NFC, NFD, NFKC, NFKD
Returns
String The normalized String
Throws
NullPointerException If src or form is null.
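
To compare the four forms side by side, the sketch below runs normalize on one input that contains both A-acute and the ffi ligature and prints the resulting code points. The class FourForms and the helper codePoints are hypothetical, and the code assumes Java 8 or later for String.codePoints().

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;

public class FourForms {
    // Hypothetical helper: renders a string as space-separated U+XXXX code points.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String input = "\u00C1\uFB03"; // A-acute followed by the ffi ligature

        for (Form form : Form.values()) {
            String result = Normalizer.normalize(input, form);
            System.out.println(form + ": " + codePoints(result));
        }
        // NFC  keeps both characters:    U+00C1 U+FB03
        // NFD  splits only the accent:   U+0041 U+0301 U+FB03
        // NFKC splits only the ligature: U+00C1 U+0066 U+0066 U+0069
        // NFKD splits both:              U+0041 U+0301 U+0066 U+0066 U+0069
    }
}
```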
