Conversion from unicode to ASCII (simple transliteration) and related features.

m-m-m/text

Apache License, Version 2.0 Build Status

mmm-text

Maven Central mmm-text JavaDoc

mmm-text-ascii

The module io.github.mmm.text.ascii (artifactId mmm-text-ascii) provides an efficient conversion from Unicode to ASCII. Unicode is a powerful standard, but it is also extremely complex. This can cause unexpected problems: for example, two String values may not be equal even though they look identical and mean the same thing. The character ä, for instance, can be encoded either as a single precomposed code point or as the letter a followed by a combining diaeresis. As a result, code that compares, searches, or otherwise processes text can produce surprising results. Furthermore, when a name comes from arbitrary user input and you need to use it in a file-system path or to derive a business key, you may want to restrict it to a safe character set instead of allowing arbitrary Unicode characters.
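The equality problem described above can be reproduced with the JDK alone. The following is a minimal sketch that uses only java.text.Normalizer; the class name UnicodeEqualityDemo and the helper equalsNormalized are made up for illustration and are not part of this library.

```java
import java.text.Normalizer;

public class UnicodeEqualityDemo {

  /** Compares two strings after normalizing both to NFC (canonical composition). */
  static boolean equalsNormalized(String a, String b) {
    return Normalizer.normalize(a, Normalizer.Form.NFC)
        .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
  }

  public static void main(String[] args) {
    String precomposed = "\u00E4"; // the single precomposed code point for a-umlaut
    String decomposed = "a\u0308"; // the letter a followed by a combining diaeresis

    // Both strings render identically, but their char sequences differ.
    System.out.println(precomposed.equals(decomposed));            // prints false
    System.out.println(equalsNormalized(precomposed, decomposed)); // prints true
  }
}
```

Normalization makes the two forms comparable, but the result is still non-ASCII. Reducing text to plain ASCII for paths or business keys is the additional step this module addresses.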

Features

This library offers the following features:

  • Transformation of any Unicode text to a corresponding String containing only 7-bit ASCII characters.

  • Highly optimized for performance: fast, with a low memory footprint, because mappings are loaded lazily per code-point high byte.

  • Support for simplified transliteration. It is not fully compatible with standards like ISO-843, ISO-9:1995, ISO-15919, ISO 11940-2:2007, etc. If you need that level of accuracy, this is the wrong place. However, if a "good enough" approach is sufficient, this library is a good fit.

  • Extremely robust due to its simplicity. Any input is accepted and you will always get ASCII output.
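To make the ideas in this list concrete, here is a rough sketch of how lazily loaded per-block mapping tables could look. It is not the implementation or the API of this library: the class SimpleAsciiSketch, the EXTRAS table, and the rule of deriving mappings from NFKD decomposition are all assumptions chosen for illustration, and the real module ships its own mapping data.

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Map;

public class SimpleAsciiSketch {

  // Hypothetical hand-written mappings for characters that have no ASCII decomposition.
  private static final Map<Integer, String> EXTRAS = Map.of(0xDF, "ss", 0xE6, "ae", 0xF8, "o");

  // One 256-entry table per code-point block (codePoint >>> 8), created on first use only.
  private static final Map<Integer, String[]> BLOCKS = new HashMap<>();

  /** Builds the mapping table for a single block on demand. */
  private static String[] loadBlock(int block) {
    String[] table = new String[256];
    for (int i = 0; i < 256; i++) {
      table[i] = mapCodePoint((block << 8) + i);
    }
    return table;
  }

  /** Maps one code point to an ASCII replacement (assumed rule, see lead-in). */
  private static String mapCodePoint(int cp) {
    if (cp < 128) {
      return String.valueOf((char) cp);
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) {
      return "?"; // unpaired surrogate
    }
    String extra = EXTRAS.get(cp);
    if (extra != null) {
      return extra;
    }
    // Decompose and keep only the ASCII part, e.g. a-umlaut becomes a.
    String decomposed = Normalizer.normalize(new String(Character.toChars(cp)), Normalizer.Form.NFKD);
    StringBuilder ascii = new StringBuilder();
    for (int j = 0; j < decomposed.length(); j++) {
      char c = decomposed.charAt(j);
      if (c < 128) {
        ascii.append(c);
      }
    }
    if (ascii.length() > 0) {
      return ascii.toString();
    }
    // Combining marks are dropped; everything else gets a placeholder.
    return Character.getType(cp) == Character.NON_SPACING_MARK ? "" : "?";
  }

  /** Converts any input to a String that contains only 7-bit ASCII characters. */
  static String toAscii(String text) {
    StringBuilder sb = new StringBuilder(text.length());
    text.codePoints().forEach(cp -> {
      if (cp < 128) {
        sb.append((char) cp); // fast path, no table lookup
      } else {
        String[] table = BLOCKS.computeIfAbsent(cp >>> 8, SimpleAsciiSketch::loadBlock);
        sb.append(table[cp & 0xFF]);
      }
    });
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(toAscii("K\u00E4se"));    // prints Kase
    System.out.println(toAscii("Stra\u00DFe"));  // prints Strasse
    System.out.println(toAscii("\u65E5\u672C")); // prints ??
  }
}
```

In this sketch a table is built only for blocks that actually occur in the input, so pure ASCII text never triggers any loading. Every code point maps to some ASCII string, which is why no input can make the conversion fail.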

Usage

Maven Dependency:

<dependency>
  <groupId>io.github.m-m-m</groupId>
  <artifactId>mmm-text-ascii</artifactId>
  <version>${mmm.base.version}</version>
</dependency>

Module Dependency:

  requires transitive io.github.mmm.text.ascii;
