Create an account

Very important

  • To access the important data of the forums, you must be active in each forum and especially in the leaks and database leaks section, send data and after sending the data and activity, data and important content will be opened and visible for you.
  • You will only see chat messages from people who are at or below your level.
  • More than 500,000 database leaks and millions of account leaks are waiting for you, so access and view with more activity.
  • Many important data are inactive and inaccessible for you, so open them with activity. (This will be done automatically)


Thread Rating:
  • 575 Vote(s) - 3.56 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Working with Unicode code points in Swift

#1
If you are not interested in the details of Mongolian but just want a quick answer about using and converting Unicode values in Swift, then skip down to the first part of the [accepted answer][1].

---

#Background

I want to render Unicode text for [traditional Mongolian to be used in iOS apps][2]. The better and long term solution is to use an [AAT smart font][3] that would render this complex script. ([Such fonts do exist][4] but their license does not allow modification and non-personal use.) However, since I have never made a font, let alone all of the rendering logic for an AAT font, I just plan to do the rendering myself in Swift for now. Maybe at some later date I can learn to make a smart font.

Externally I will use Unicode text, but internally (for display in a `UITextView`) I will convert the Unicode to individual glyphs that are stored in a dumb font (coded with Unicode [PUA][5] values). So my rendering engine needs to convert Mongolian Unicode values (range: U+1820 to U+1842) to glyph values stored in the PUA (range: U+E360 to U+E5CF). Anyway, this is my plan since it is [what I did in Java in the past][6], but maybe I need to change my whole way of thinking.

#Example

The following image shows *su* written twice in Mongolian using two different forms for the letter *u* (in red). (Mongolian is written vertically with letters being connected like cursive letters in English.)

![enter image description here][7]

In Unicode these two strings would be expressed as

var suForm1: String = "\u{1830}\u{1826}"
var suForm2: String = "\u{1830}\u{1826}\u{180B}"

The Free Variation Selector (U+180B) in `suForm2` is recognized (correctly) by Swift `String` to be a unit with the *u* (U+1826) that precedes it. It is considered by Swift to be a single character, an extended grapheme cluster. However, for the purposes of doing the rendering myself, I need to differentiate *u* (U+1826) and FVS1 (U+180B) as two distinct UTF-16 code points.

For internal display purposes, I would convert the above Unicode strings to the following rendered glyph strings:

suForm1 = "\u{E46F}\u{E3BA}"
suForm2 = "\u{E46F}\u{E3BB}"

#Question

I have been playing around with Swift `String` and `Character`. There are a lot of convenient things about them, but since in my particular case I deal exclusively with UTF-16 code units, I wonder if I should be using the old `NSString` rather than Swift's `String`. I realize that I can use `String.utf16` to get UTF-16 code points, but [the conversion back to `String` isn't very nice][8].

Would it be better to stick with [`String`][9] and [`Character`][10] or should I use [`NSString`][11] and [`unichar`][12]?

#What I have read

- [Strings and Characters documentation][13]
- [Strings in Swift][14]
- [NSString and Unicode][15]

*Updates to this question have been hidden in order to clean the page up. See the edit history.*

[1]:

[To see links please register here]

[2]:

[To see links please register here]

[3]:

[To see links please register here]

[4]:

[To see links please register here]

[5]:

[To see links please register here]

[6]:

[To see links please register here]

[7]:

[8]:

[To see links please register here]

[9]:

[To see links please register here]

[10]:

[To see links please register here]

[11]:

[To see links please register here]

[12]:

[To see links please register here]

[13]:

[To see links please register here]

[14]:

[To see links please register here]

[15]:

[To see links please register here]

[16]:

[To see links please register here]

[17]:

[To see links please register here]

[18]:

[To see links please register here]

[19]:

[To see links please register here]

[20]:

[To see links please register here]

Reply

#2
*Updated for Swift 3*

# String and Character

For almost everyone in the future who visits this question, [`String` and `Character`][1] will be the answer for you.

**Set Unicode values directly in code:**

var str: String = "I want to visit 北京, Москва, मुंबई, القاهرة, and 서울시. 😊"
var character: Character = "🌍"

**Use hexadecimal to set values**

var str: String = "\u{61}\u{5927}\u{1F34E}\u{3C0}" // a大🍎π
var character: Character = "\u{65}\u{301}" // é = "e" + accent mark

Note that the Swift Character can be composed of multiple Unicode code points, but appears to be a single character. This is called an Extended Grapheme Cluster.

See [this question][2] also.

**Convert to Unicode values:**

str.utf8
str.utf16
str.unicodeScalars // UTF-32

String(character).utf8
String(character).utf16
String(character).unicodeScalars

**Convert from Unicode hex values:**

let hexValue: UInt32 = 0x1F34E

// convert hex value to UnicodeScalar
guard let scalarValue = UnicodeScalar(hexValue) else {
// early exit if hex does not form a valid unicode value
return
}

// convert UnicodeScalar to String
let myString = String(scalarValue) // 🍎

Or alternatively:

let hexValue: UInt32 = 0x1F34E
if let scalarValue = UnicodeScalar(hexValue) {
let myString = String(scalarValue)
}

A few more examples

let value0: UInt8 = 0x61
let value1: UInt16 = 0x5927
let value2: UInt32 = 0x1F34E

let string0 = String(UnicodeScalar(value0)) // a
let string1 = String(UnicodeScalar(value1)) // 大
let string2 = String(UnicodeScalar(value2)) // 🍎

// convert hex array to String
let myHexArray = [0x43, 0x61, 0x74, 0x203C, 0x1F431] // an Int array
var myString = ""
for hexValue in myHexArray {
myString.append(UnicodeScalar(hexValue))
}
print(myString) // Cat‼🐱

Note that for UTF-8 and UTF-16 the conversion is not always this easy. (See [UTF-8][3], [UTF-16][4], and [UTF-32][5] questions.)

# NSString and unichar

It is also possible to work with `NSString` and `unichar` in Swift, but you should realize that unless you are familiar with Objective C and good at converting the syntax to Swift, it will be difficult to find good documentation.

Also, `unichar` is a `UInt16` array and as mentioned above the conversion from `UInt16` to Unicode scalar values is not always easy (i.e., converting surrogate pairs for things like emoji and other characters in the upper code planes).

# Custom string structure

For the reasons mentioned in the question, I ended up not using any of the above methods. Instead I wrote my own string structure, which was basically an array of `UInt32` to hold Unicode scalar values.

Again, this is not the solution for most people. First consider using [extensions][6] if you only need to extend the functionality of `String` or `Character` a little.

But if you really need to work exclusively with Unicode scalar values, you could write a custom struct.

The advantages are:

- Don't need to constantly switch between Types (`String`, `Character`, `UnicodeScalar`, `UInt32`, etc.) when doing string manipulation.
- After Unicode manipulation is finished, the final conversion to `String` is easy.
- Easy to add more methods when they are needed
- Simplifies converting code from Java or other languages

Disadavantages are:

- makes code less portable and less readable for other Swift developers
- not as well tested and optimized as the native Swift types
- it is yet another file that has to be included in a project every time you need it

You can make your own, but here is mine for reference. The hardest part was [making it Hashable][7].

// This struct is an array of UInt32 to hold Unicode scalar values
// Version 3.4.0 (Swift 3 update)


struct ScalarString: Sequence, Hashable, CustomStringConvertible {

fileprivate var scalarArray: [UInt32] = []


init() {
// does anything need to go here?
}

init(_ character: UInt32) {
self.scalarArray.append(character)
}

init(_ charArray: [UInt32]) {
for c in charArray {
self.scalarArray.append©
}
}

init(_ string: String) {

for s in string.unicodeScalars {
self.scalarArray.append(s.value)
}
}

// Generator in order to conform to SequenceType protocol
// (to allow users to iterate as in `for myScalarValue in myScalarString` { ... })
func makeIterator() -> AnyIterator<UInt32> {
return AnyIterator(scalarArray.makeIterator())
}

// append
mutating func append(_ scalar: UInt32) {
self.scalarArray.append(scalar)
}

mutating func append(_ scalarString: ScalarString) {
for scalar in scalarString {
self.scalarArray.append(scalar)
}
}

mutating func append(_ string: String) {
for s in string.unicodeScalars {
self.scalarArray.append(s.value)
}
}

// charAt
func charAt(_ index: Int) -> UInt32 {
return self.scalarArray[index]
}

// clear
mutating func clear() {
self.scalarArray.removeAll(keepingCapacity: true)
}

// contains
func contains(_ character: UInt32) -> Bool {
for scalar in self.scalarArray {
if scalar == character {
return true
}
}
return false
}

// description (to implement Printable protocol)
var description: String {
return self.toString()
}

// endsWith
func endsWith() -> UInt32? {
return self.scalarArray.last
}

// indexOf
// returns first index of scalar string match
func indexOf(_ string: ScalarString) -> Int? {

if scalarArray.count < string.length {
return nil
}

for i in 0...(scalarArray.count - string.length) {

for j in 0..<string.length {

if string.charAt(j) != scalarArray[i + j] {
break // substring mismatch
}
if j == string.length - 1 {
return i
}
}
}

return nil
}

// insert
mutating func insert(_ scalar: UInt32, atIndex index: Int) {
self.scalarArray.insert(scalar, at: index)
}
mutating func insert(_ string: ScalarString, atIndex index: Int) {
var newIndex = index
for scalar in string {
self.scalarArray.insert(scalar, at: newIndex)
newIndex += 1
}
}
mutating func insert(_ string: String, atIndex index: Int) {
var newIndex = index
for scalar in string.unicodeScalars {
self.scalarArray.insert(scalar.value, at: newIndex)
newIndex += 1
}
}

// isEmpty
var isEmpty: Bool {
return self.scalarArray.count == 0
}

// hashValue (to implement Hashable protocol)
var hashValue: Int {

// DJB Hash Function
return self.scalarArray.reduce(5381) {
($0 << 5) &+ $0 &+ Int($1)
}
}

// length
var length: Int {
return self.scalarArray.count
}

// remove character
mutating func removeCharAt(_ index: Int) {
self.scalarArray.remove(at: index)
}
func removingAllInstancesOfChar(_ character: UInt32) -> ScalarString {

var returnString = ScalarString()

for scalar in self.scalarArray {
if scalar != character {
returnString.append(scalar)
}
}

return returnString
}
func removeRange(_ range: CountableRange<Int>) -> ScalarString? {

if range.lowerBound < 0 || range.upperBound > scalarArray.count {
return nil
}

var returnString = ScalarString()

for i in 0..<scalarArray.count {
if i < range.lowerBound || i >= range.upperBound {
returnString.append(scalarArray[i])
}
}

return returnString
}


// replace
func replace(_ character: UInt32, withChar replacementChar: UInt32) -> ScalarString {

var returnString = ScalarString()

for scalar in self.scalarArray {
if scalar == character {
returnString.append(replacementChar)
} else {
returnString.append(scalar)
}
}
return returnString
}
func replace(_ character: UInt32, withString replacementString: String) -> ScalarString {

var returnString = ScalarString()

for scalar in self.scalarArray {
if scalar == character {
returnString.append(replacementString)
} else {
returnString.append(scalar)
}
}
return returnString
}
func replaceRange(_ range: CountableRange<Int>, withString replacementString: ScalarString) -> ScalarString {

var returnString = ScalarString()

for i in 0..<scalarArray.count {
if i < range.lowerBound || i >= range.upperBound {
returnString.append(scalarArray[i])
} else if i == range.lowerBound {
returnString.append(replacementString)
}
}
return returnString
}

// set (an alternative to myScalarString = "some string")
mutating func set(_ string: String) {
self.scalarArray.removeAll(keepingCapacity: false)
for s in string.unicodeScalars {
self.scalarArray.append(s.value)
}
}

// split
func split(atChar splitChar: UInt32) -> [ScalarString] {
var partsArray: [ScalarString] = []
if self.scalarArray.count == 0 {
return partsArray
}
var part: ScalarString = ScalarString()
for scalar in self.scalarArray {
if scalar == splitChar {
partsArray.append(part)
part = ScalarString()
} else {
part.append(scalar)
}
}
partsArray.append(part)
return partsArray
}

// startsWith
func startsWith() -> UInt32? {
return self.scalarArray.first
}

// substring
func substring(_ startIndex: Int) -> ScalarString {
// from startIndex to end of string
var subArray: ScalarString = ScalarString()
for i in startIndex..<self.length {
subArray.append(self.scalarArray[i])
}
return subArray
}
func substring(_ startIndex: Int, _ endIndex: Int) -> ScalarString {
// (startIndex is inclusive, endIndex is exclusive)
var subArray: ScalarString = ScalarString()
for i in startIndex..<endIndex {
subArray.append(self.scalarArray[i])
}
return subArray
}

// toString
func toString() -> String {
var string: String = ""

for scalar in self.scalarArray {
if let validScalor = UnicodeScalar(scalar) {
string.append(Character(validScalor))
}
}
return string
}

// trim
// removes leading and trailing whitespace (space, tab, newline)
func trim() -> ScalarString {

//var returnString = ScalarString()
let space: UInt32 = 0x00000020
let tab: UInt32 = 0x00000009
let newline: UInt32 = 0x0000000A

var startIndex = self.scalarArray.count
var endIndex = 0

// leading whitespace
for i in 0..<self.scalarArray.count {
if self.scalarArray[i] != space &&
self.scalarArray[i] != tab &&
self.scalarArray[i] != newline {

startIndex = i
break
}
}

// trailing whitespace
for i in stride(from: (self.scalarArray.count - 1), through: 0, by: -1) {
if self.scalarArray[i] != space &&
self.scalarArray[i] != tab &&
self.scalarArray[i] != newline {

endIndex = i + 1
break
}
}

if endIndex <= startIndex {
return ScalarString()
}

return self.substring(startIndex, endIndex)
}

// values
func values() -> [UInt32] {
return self.scalarArray
}

}

func ==(left: ScalarString, right: ScalarString) -> Bool {
return left.scalarArray == right.scalarArray
}

func +(left: ScalarString, right: ScalarString) -> ScalarString {
var returnString = ScalarString()
for scalar in left.values() {
returnString.append(scalar)
}
for scalar in right.values() {
returnString.append(scalar)
}
return returnString
}



[1]:

[To see links please register here]

[2]:

[To see links please register here]

[3]:

[To see links please register here]

[4]:

[To see links please register here]

[5]:

[To see links please register here]

[6]:

[To see links please register here]

[7]:

[To see links please register here]

Reply

#3
//Swift 3.0
// This struct is an array of UInt32 to hold Unicode scalar values
struct ScalarString: Sequence, Hashable, CustomStringConvertible {

private var scalarArray: [UInt32] = []

init() {
// does anything need to go here?
}

init(_ character: UInt32) {
self.scalarArray.append(character)
}

init(_ charArray: [UInt32]) {
for c in charArray {
self.scalarArray.append©
}
}

init(_ string: String) {

for s in string.unicodeScalars {
self.scalarArray.append(s.value)
}
}

// Generator in order to conform to SequenceType protocol
// (to allow users to iterate as in `for myScalarValue in myScalarString` { ... })

//func generate() -> AnyIterator<UInt32> {
func makeIterator() -> AnyIterator<UInt32> {

let nextIndex = 0

return AnyIterator {
if (nextIndex > self.scalarArray.count-1) {
return nil
}
return self.scalarArray[nextIndex + 1]
}
}

// append
mutating func append(scalar: UInt32) {
self.scalarArray.append(scalar)
}

mutating func append(scalarString: ScalarString) {
for scalar in scalarString {
self.scalarArray.append(scalar)
}
}

mutating func append(string: String) {
for s in string.unicodeScalars {
self.scalarArray.append(s.value)
}
}

// charAt
func charAt(index: Int) -> UInt32 {
return self.scalarArray[index]
}

// clear
mutating func clear() {
self.scalarArray.removeAll(keepingCapacity: true)
}

// contains
func contains(character: UInt32) -> Bool {
for scalar in self.scalarArray {
if scalar == character {
return true
}
}
return false
}

// description (to implement Printable protocol)
var description: String {

var string: String = ""

for scalar in scalarArray {
string.append(String(describing: UnicodeScalar(scalar))) //.append(UnicodeScalar(scalar)!)
}
return string
}

// endsWith
func endsWith() -> UInt32? {
return self.scalarArray.last
}

// insert
mutating func insert(scalar: UInt32, atIndex index: Int) {
self.scalarArray.insert(scalar, at: index)
}

// isEmpty
var isEmpty: Bool {
get {
return self.scalarArray.count == 0
}
}

// hashValue (to implement Hashable protocol)
var hashValue: Int {
get {

// DJB Hash Function
var hash = 5381

for i in 0 ..< scalarArray.count {
hash = ((hash << 5) &+ hash) &+ Int(self.scalarArray[i])
}
/*
for i in 0..< self.scalarArray.count {
hash = ((hash << 5) &+ hash) &+ Int(self.scalarArray[i])
}
*/
return hash
}
}

// length
var length: Int {
get {
return self.scalarArray.count
}
}

// remove character
mutating func removeCharAt(index: Int) {
self.scalarArray.remove(at: index)
}
func removingAllInstancesOfChar(character: UInt32) -> ScalarString {

var returnString = ScalarString()

for scalar in self.scalarArray {
if scalar != character {
returnString.append(scalar: scalar) //.append(scalar)
}
}

return returnString
}

// replace
func replace(character: UInt32, withChar replacementChar: UInt32) -> ScalarString {

var returnString = ScalarString()

for scalar in self.scalarArray {
if scalar == character {
returnString.append(scalar: replacementChar) //.append(replacementChar)
} else {
returnString.append(scalar: scalar) //.append(scalar)
}
}
return returnString
}

// func replace(character: UInt32, withString replacementString: String) -> ScalarString {
func replace(character: UInt32, withString replacementString: ScalarString) -> ScalarString {

var returnString = ScalarString()

for scalar in self.scalarArray {
if scalar == character {
returnString.append(scalarString: replacementString) //.append(replacementString)
} else {
returnString.append(scalar: scalar) //.append(scalar)
}
}
return returnString
}

// set (an alternative to myScalarString = "some string")
mutating func set(string: String) {
self.scalarArray.removeAll(keepingCapacity: false)
for s in string.unicodeScalars {
self.scalarArray.append(s.value)
}
}

// split
func split(atChar splitChar: UInt32) -> [ScalarString] {
var partsArray: [ScalarString] = []
var part: ScalarString = ScalarString()
for scalar in self.scalarArray {
if scalar == splitChar {
partsArray.append(part)
part = ScalarString()
} else {
part.append(scalar: scalar) //.append(scalar)
}
}
partsArray.append(part)
return partsArray
}

// startsWith
func startsWith() -> UInt32? {
return self.scalarArray.first
}

// substring
func substring(startIndex: Int) -> ScalarString {
// from startIndex to end of string
var subArray: ScalarString = ScalarString()
for i in startIndex ..< self.length {
subArray.append(scalar: self.scalarArray[i]) //.append(self.scalarArray[i])
}
return subArray
}
func substring(startIndex: Int, _ endIndex: Int) -> ScalarString {
// (startIndex is inclusive, endIndex is exclusive)
var subArray: ScalarString = ScalarString()
for i in startIndex ..< endIndex {
subArray.append(scalar: self.scalarArray[i]) //.append(self.scalarArray[i])
}
return subArray
}

// toString
func toString() -> String {
let string: String = ""

for scalar in self.scalarArray {
string.appending(String(describing:UnicodeScalar(scalar))) //.append(UnicodeScalar(scalar)!)
}
return string
}

// values
func values() -> [UInt32] {
return self.scalarArray
}

}

func ==(left: ScalarString, right: ScalarString) -> Bool {

if left.length != right.length {
return false
}

for i in 0 ..< left.length {
if left.charAt(index: i) != right.charAt(index: i) {
return false
}
}

return true
}

func +(left: ScalarString, right: ScalarString) -> ScalarString {
var returnString = ScalarString()
for scalar in left.values() {
returnString.append(scalar: scalar) //.append(scalar)
}
for scalar in right.values() {
returnString.append(scalar: scalar) //.append(scalar)
}
return returnString
}
Reply



Forum Jump:


Users browsing this thread:
1 Guest(s)

©0Day  2016 - 2023 | All Rights Reserved.  Made with    for the community. Connected through