Parsing GMI


Legend has it that implementing a Gemini client is a weekend job. Maybe I am much stupider than I thought, or maybe I care too much about being correct and minimal, but it is taking me much longer. Procrastination does not help, I admit.


As I see it, a parser for streamed data should be a state machine that eats characters. In my case I already have the whole string: since I am using GTK, it is a good idea to shove the entire string into the text buffer and style it afterwards. So my parser is there for styling purposes only.


So it is a state machine that eats characters and returns NIL unless it detects the end of a stylable span, in which case it returns enough information to style that span.
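
A minimal sketch of that interface, in Python purely for illustration (the post shows no code of its own). It assumes each styled line is one span, and it postpones the decision until the newline by remembering the first three characters. The class name, method names and style names are all made up for this example.

```python
class SpanMachine:
    """Eats one character at a time. feed() returns None until a
    stylable line ends, then returns (style, start, end)."""

    def __init__(self):
        self.pos = 0        # offset of the next character to eat
        self.start = 0      # offset where the current line began
        self.prefix = ""    # first three characters of the current line
        self.pre = False    # inside a preformatted block?

    def feed(self, ch):
        if ch != "\n":
            if len(self.prefix) < 3:
                self.prefix += ch
            self.pos += 1
            return None
        span = self._close()
        self.pos += 1               # step over the newline itself
        self.start = self.pos
        self.prefix = ""
        return span

    def finish(self):
        """Flush a last line that has no trailing newline."""
        if self.pos == self.start:
            return None
        span = self._close()
        self.start = self.pos
        self.prefix = ""
        return span

    def _close(self):
        p = self.prefix
        if p == "```":
            self.pre = not self.pre
            style = "pre-toggle"
        elif self.pre:
            style = "pre"
        elif p == "###":
            style = "h3"
        elif p.startswith("##"):
            style = "h2"
        elif p.startswith("#"):
            style = "h1"
        elif p.startswith("=>"):
            style = "link"
        elif p.startswith("* "):
            style = "list"
        elif p.startswith(">"):
            style = "quote"
        else:
            return None             # plain text: nothing to style
        return (style, self.start, self.pos)


machine = SpanMachine()
text = "# Title\nplain\n=> gemini://example.org A link\n"
spans = [s for s in map(machine.feed, text) if s is not None]
```

The offsets are character offsets into the string, which is the sort of information one would need to apply a tag to a range of a text buffer; how that maps onto GTK calls is left out here.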


Since the determination happens at the beginning of a line and requires one to three characters, there are more states than one would expect. This reaffirms my conjecture that the GMI format is rather unfortunate...
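
To make the state count concrete, here is a sketch of just the line-start decision as an explicit transition table, again in Python and with invented state names. It follows the usual gemtext prefixes ("#", "##", "###", "=>", "* ", ">", and three backticks) and reports how many characters had to be examined before the line kind was settled.

```python
# (state, character) -> next state, for the start of a line only.
TRANSITIONS = {
    ("START", "#"): "H1",
    ("H1", "#"): "H2",
    ("H2", "#"): "H3",
    ("START", "="): "EQ",
    ("EQ", ">"): "LINK",
    ("START", "*"): "STAR",
    ("STAR", " "): "LIST",
    ("START", "`"): "TICK1",
    ("TICK1", "`"): "TICK2",
    ("TICK2", "`"): "PRE",
    ("START", ">"): "QUOTE",
}

# States that still have somewhere to go.
NON_FINAL = {state for (state, _) in TRANSITIONS}

# Intermediate states that turn out to be plain text if the match stops.
FALLBACK = {"START": "TEXT", "EQ": "TEXT", "STAR": "TEXT",
            "TICK1": "TEXT", "TICK2": "TEXT"}


def decide(line):
    """Return (kind, characters_examined) for one line."""
    state = "START"
    examined = 0
    for ch in line[:3]:
        examined += 1
        nxt = TRANSITIONS.get((state, ch))
        if nxt is None:
            break                   # this character does not extend the match
        state = nxt
        if state not in NON_FINAL:
            break                   # nothing further could change the answer
    return FALLBACK.get(state, state), examined
```

Even this small table has eleven states including START, before counting anything needed for the rest of the line, and that is with the preformatted toggle left as a single state.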


In an Ideal Gemini...


Each line starts with a 'kind' character - a specially selected Unicode character that visually communicates the purpose of the line, be it a bullet, an arrow for links, or one of three headers. Only blockquote lines do not start with a kind character, allowing us to easily copy and paste them...
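
For comparison, a sketch of what the line classifier could shrink to under such a scheme. The particular characters below are placeholders picked for this example, not a proposal; the idea only requires that one character decides the kind.

```python
# Hypothetical kind characters: one character per line kind.
KINDS = {
    "•": "bullet",
    "→": "link",
    "①": "h1",
    "②": "h2",
    "③": "h3",
    "¶": "text",
}


def classify(line):
    """One lookup on the first character; anything else is a blockquote."""
    return KINDS.get(line[:1], "quote")
```

No lookahead and no intermediate states: the whole decision is a single table lookup.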


But in the real world...


I just have to slog it with a 100-line state machine...




