String Iteration in Dart

I’ve just begun my deep dive into Dart. It’s a very nice language, but string iteration had me scratching my head for a while. To be honest, I still don’t know if I am doing it the best way.

In most languages the string class offers a way to iterate over characters. For example, in C#:

foreach (var ch in text) { ... }

However, Dart represents a string as a sequence of UTF-16 code units. A Unicode code point is represented as an integer, which Dart calls a rune.

Consider this example:

var text = 'Hello World'; 
print(text.codeUnits); 
   
// outputs
[72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100]
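For plain ASCII text the code units and the runes are the same numbers, so the difference is easy to miss. Here is a minimal sketch (the sample string is my own choice, not from the script below) showing where they diverge: a code point above U+FFFF takes two UTF-16 code units but is still a single rune.

```dart
void main() {
  // 'Hi ' followed by U+1F600, a code point outside the 16-bit range.
  const text = 'Hi \u{1F600}';

  // length and codeUnits count UTF-16 code units: the last code point
  // is stored as a surrogate pair, so it contributes two entries.
  print(text.length); // 5
  print(text.codeUnits); // [72, 105, 32, 55357, 56832]

  // runes yields whole Unicode code points, one entry per code point.
  print(text.runes.length); // 4
  print(text.runes.toList()); // [72, 105, 32, 128512]
}
```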

There appears to be no concept of characters in the traditional sense. For example, to iterate over text and print out its contents, you need to convert each rune back to a string:

import 'dart:io'; // for stdout

var text = 'Hello World';

// Write each code point without a trailing newline.
text.runes.forEach((rune) {
  var s = String.fromCharCode(rune);
  stdout.write(s);
});

// outputs
Hello World
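A few other ways to walk a string with only the core libraries, as far as I can tell. This is a sketch rather than a recommendation; `charStrings` is a helper name I made up for illustration.

```dart
import 'dart:io';

/// Hypothetical helper: splits [text] into one String per code point.
List<String> charStrings(String text) =>
    [for (final rune in text.runes) String.fromCharCode(rune)];

void main() {
  const text = 'Hello World';

  // for-in over the code points instead of forEach with a closure.
  for (final rune in text.runes) {
    stdout.write(String.fromCharCode(rune));
  }
  stdout.writeln();

  // Indexing returns a String holding a single UTF-16 code unit,
  // which is fine as long as the text has no surrogate pairs.
  for (var i = 0; i < text.length; i++) {
    stdout.write(text[i]);
  }
  stdout.writeln();

  print(charStrings('Hey')); // [H, e, y]
}
```

If you need user-perceived characters (grapheme clusters, such as an emoji with a skin-tone modifier), the separate `characters` package is what usually gets suggested. I haven’t used it here because it isn’t part of the core libraries.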

This post has two aims: first, to introduce this concept to new Dart developers, and second, to ask for feedback on a more efficient, idiomatic way of doing things in Dart.

I’m currently writing a simple script to parse a file with very basic markup. Once this script is working well, I’ll modify it so the code is asynchronous and bring it into a Flutter application I’m working on. It needs to be as fast as possible. The idea is to annotate a document so I can generate the appropriate Flutter widgets:

@1. The sanguine temperament

A person of sanguine temperament reacts quickly and strongly to almost any stimulation or impression, but the reaction is usually of short duration. The stimulation or impression is quickly forgotten, and the remembrance of past experiences does not easily arouse a new response.

Among the good qualities of the sanguine temperament, we may list the following:

* affability and cheerfulness;
* sympathy and generosity toward others;
* sensitivity and compassion for the sufferings of others;
* docility and submission to superiors;
* sincerity and spontaneity.

~intro_fig2.png

$St. Teresa of Avila had to overcome the superficiality of her sanguine nature

There is a process which operates on the file before I add it as an asset to Flutter. That process strips out newlines, inserts paragraph markers, and adds a final paragraph marker to simplify parsing: the real last element is always processed, so there is no need to check for leftover content after iteration finishes.
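To make that step concrete, here is a rough sketch of what such a preprocessing pass could look like. It is not the actual tool: the function name `flatten`, the rule "any line that does not start with a tag is a paragraph", and the trimming of each line are all assumptions made for illustration.

```dart
// Tag characters from the markup, except '|' which this pass inserts itself.
const tagChars = r'@#~$*><^';

/// Hypothetical preprocessing pass: drops blank lines and newlines, marks
/// untagged lines as paragraphs, and appends a final '|' marker.
String flatten(String source) {
  final out = StringBuffer();
  for (final line in source.split('\n')) {
    final trimmed = line.trim(); // also removes a stray '\r'
    if (trimmed.isEmpty) continue; // drop blank lines
    // A line without a leading tag is assumed to be a plain paragraph.
    if (!tagChars.contains(trimmed[0])) out.write('|');
    out.write(trimmed);
  }
  out.write('|'); // terminator so the last real element gets flushed
  return out.toString();
}

void main() {
  const source = '@1. Title\n\nSome text.\n\n* first;\n* second.\n';
  print(flatten(source)); // @1. Title|Some text.* first;* second.|
}
```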

I have a basic enumeration:

enum ElementType {
  Paragraph,  // | 124
  Chapter,    // @ 64
  Section,    // # 35
  Image,      // ~ 126
  FigCaption, // $ 36
  Bullet,     // * 42
  Number,     // > 62
  Letter,     // < 60
  Roman,      // ^ 94
  None,
}

Their associated tags (runes):

final List<int> tags =
    List.unmodifiable([124, 64, 35, 126, 36, 42, 62, 60, 94]);

And a simple element class which contains a type and its associated content. I’m working with integer character codes rather than one-character strings, on the assumption that this should be much faster and use fewer resources:

class Element {
  final String content;
  final ElementType type;

  Element(this.type, this.content);

  static ElementType getType(int rune) {
    var index = tags.indexOf(rune);
    return index == -1 ? ElementType.None : ElementType.values[index];
  }
}
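One thing about `getType` is that it only works while the order of `tags` matches the order of the enum values. A possible alternative, sketched below, is a constant map from tag code to type, so the pairing is written down explicitly. The names `tagTypes` and `typeOf` are mine; with only nine tags I would not expect a measurable speed difference either way without profiling.

```dart
enum ElementType {
  Paragraph, Chapter, Section, Image, FigCaption,
  Bullet, Number, Letter, Roman, None,
}

// Tag code -> element type, written out explicitly so the pairing does
// not depend on the order in which the enum values are declared.
const tagTypes = <int, ElementType>{
  124: ElementType.Paragraph, // |
  64: ElementType.Chapter, // @
  35: ElementType.Section, // #
  126: ElementType.Image, // ~
  36: ElementType.FigCaption, // $
  42: ElementType.Bullet, // *
  62: ElementType.Number, // >
  60: ElementType.Letter, // <
  94: ElementType.Roman, // ^
};

/// Returns the type for a tag code, or ElementType.None for ordinary text.
ElementType typeOf(int code) => tagTypes[code] ?? ElementType.None;

void main() {
  print(typeOf('@'.codeUnitAt(0))); // ElementType.Chapter
  print(typeOf('a'.codeUnitAt(0))); // ElementType.None
}
```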

The usual argument checking:

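// Needs `import 'dart:io';` at the top of the file.
// Note: asserts are only checked when enabled, e.g. `dart --enable-asserts`.
void main(List<String> args) {
  assert(args.length == 1, 'usage: bd.dart myfile.bdc');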

  var path = args[0];

  assert(FileSystemEntity.typeSync(path) != FileSystemEntityType.notFound,
      'file does not exist');

  var text = File(path).readAsStringSync();
  var codes = text.codeUnits;

  assert(codes.isNotEmpty, 'file is empty');
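Because Dart skips `assert` unless assertions are enabled, the checks above do nothing in a normal `dart bd.dart` run or in a release build. A sketch of the same checks with ordinary `if` statements follows; `validateArgs` is a hypothetical helper, and the exit code 64 is just the conventional "usage error" value, not something the script requires.

```dart
import 'dart:io';

/// Hypothetical helper: returns an error message, or null if args look usable.
String? validateArgs(List<String> args) {
  if (args.length != 1) return 'usage: bd.dart myfile.bdc';
  if (!File(args[0]).existsSync()) return 'file does not exist';
  return null;
}

void main(List<String> args) {
  final error = validateArgs(args);
  if (error != null) {
    stderr.writeln(error);
    exit(64); // arbitrary non-zero code chosen for this sketch
  }
  print('ok: ${args[0]}');
}
```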

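This is the core part that parses the file and generates elements, and it is where I have to iterate over the file. The content assignment is the only place where I convert the character codes back to a string. Strictly speaking, the values in codes are UTF-16 code units rather than runes, despite the variable name; every tag is an ASCII character, so for matching tags the two are the same.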

  var elements = <Element>[];
  var buffer = StringBuffer();
  var type = ElementType.None;

  for (var i = 0; i < codes.length; ++i) {
    var rune = codes[i];
    var newType = Element.getType(rune);

    if (newType == ElementType.None) {
      buffer.writeCharCode(rune);
      continue;
    }

    if (buffer.isNotEmpty) {
      var content = buffer.toString();
      elements.add(Element(type, content));
      buffer.clear();
    }

    type = newType;
  }

And a final assertion:

  assert(buffer.isEmpty, 'document is missing a termination marker');

Printing the extracted elements:

  elements.forEach((e) => print('${e.type} => ${e.content}'));
} // end of main

The results are as expected:

ElementType.Chapter => 1. The sanguine temperament
ElementType.Paragraph => A person of sanguine temperament reacts quickly and strongly to almost any stimulation or impression, but the reaction is usually of short duration. The stimulation or impression is quickly forgotten, and the remembrance of past experiences does not easily arouse a new response.
ElementType.Paragraph => Among the good qualities of the sanguine temperament, we may list the following:
ElementType.Bullet =>  affability and cheerfulness;
ElementType.Bullet =>  sympathy and generosity toward others;
ElementType.Bullet =>  sensitivity and compassion for the sufferings of others;
ElementType.Bullet =>  docility and submission to superiors;
ElementType.Bullet =>  sincerity and spontaneity.
ElementType.Image => intro_fig2.png
ElementType.FigCaption => St. Teresa of Avila had to overcome the superficiality of her sanguine nature
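For anyone who wants to experiment, here is the same parsing idea as a self-contained function that can be run on a short string. It differs from my script in two ways that are assumptions on my part, not measured improvements: it slices content out with `substring` instead of appending to a `StringBuffer` one code at a time, and it throws a `FormatException` instead of asserting when the terminator is missing. The names `parse` and `tagTypes` are made up for this sketch.

```dart
enum ElementType {
  Paragraph, Chapter, Section, Image, FigCaption,
  Bullet, Number, Letter, Roman, None,
}

// Tag code unit -> element type (same tags as in the post).
const tagTypes = <int, ElementType>{
  124: ElementType.Paragraph, 64: ElementType.Chapter,
  35: ElementType.Section, 126: ElementType.Image,
  36: ElementType.FigCaption, 42: ElementType.Bullet,
  62: ElementType.Number, 60: ElementType.Letter, 94: ElementType.Roman,
};

class Element {
  final ElementType type;
  final String content;
  const Element(this.type, this.content);
}

List<Element> parse(String text) {
  final elements = <Element>[];
  var type = ElementType.None;
  var start = 0; // index where the current element's content begins

  for (var i = 0; i < text.length; i++) {
    final newType = tagTypes[text.codeUnitAt(i)];
    if (newType == null) continue; // ordinary content, keep scanning

    // A tag closes the previous element: slice its content out in one go.
    if (i > start) elements.add(Element(type, text.substring(start, i)));
    type = newType;
    start = i + 1;
  }

  // Leftover content means the final marker was never seen.
  if (start < text.length) {
    throw FormatException('document is missing a termination marker');
  }
  return elements;
}

void main() {
  for (final e in parse('@Title|Some text.* one;* two.|')) {
    print('${e.type} => ${e.content}');
  }
}
```

Run on that sample string, it yields four elements: a Chapter, a Paragraph, and two Bullets, with the bullets keeping their leading space just like the output above.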

I hope this gives a heads up to those new to iterating strings in Dart.

Appreciate any feedback and tips for improvement.
