2007-08-15
This is the manual of LitJSON, a library for the .Net framework whose purpose is to handle data in the JSON format.
It covers version 0.3.0 of the library and at the moment it includes a quick start section and some appendices containing specific information about features of LitJSON, such as the specific approach used in the parser component, and some notes regarding performance as measured by a set of benchmarks.
Due to some problems in the generation of HTML output from the XML sources, the complete API reference is not included yet, although this shortcoming will likely be fixed in a future release. Until then, developers are encouraged to familiarise themselves with the API by reading the unit tests distributed with the library's sources.
JSON is a simple, yet powerful notation to specify data. It defines simple scalar types such as boolean, number (integers and reals) and string, and a couple of data structures: arrays (lists) and objects (dictionaries). For more information on the JSON format, visit JSON.org.
LitJSON is written in C#, and it's intended to be small, fast and easy to use. It was developed in a GNU/Linux environment, using the Mono framework.
In order to consume data in JSON format inside .Net programs, the natural approach that comes to mind is to use JSON text to populate a new instance of a particular class; either a custom one, built to match the structure of the input JSON text, or a more general one which acts as a dictionary.
Conversely, in order to build new JSON strings from data stored in objects, a simple export-like operation sounds like a good idea.
For this purpose, LitJSON includes the JsonMapper class, which provides two main methods used to do JSON-to-object and object-to-JSON conversions. These methods are ToObject and ToJson.
As the following example shows, the method JsonMapper.ToObject has a generic variant that is used to specify the type of the object to be returned.
Example 1.1. Simple JsonMapper examples
using LitJson;
using System;

public class Person
{
    // Person members are defined here ...
}

public class JsonSample
{
    public static void Main ()
    {
        PersonToJson ();
        JsonToPerson ();
    }

    public static void PersonToJson ()
    {
        Person bill = new Person ();

        bill.Name = "William Shakespeare";
        bill.Age = 51;
        bill.Birthday = new DateTime (1564, 4, 26);

        string json_bill = JsonMapper.ToJson (bill);

        Console.WriteLine (json_bill);
        // {"Name":"William Shakespeare","Age":51,"Birthday":"04/26/1564 00:00:00"}
    }

    public static void JsonToPerson ()
    {
        string json = @"
            {
                ""Name"" : ""Thomas More"",
                ""Age"" : 57,
                ""Birthday"" : ""02/07/1478 00:00:00""
            }";

        Person thomas = JsonMapper.ToObject<Person> (json);

        Console.WriteLine ("Thomas' age: {0}", thomas.Age);
        // Thomas' age: 57
    }
}
When JSON data is to be read and a custom class that matches a particular data structure is not available or desired, users can use the non-generic variant of ToObject, which returns a JsonData instance. JsonData is a general purpose type that can hold any of the data types supported by JSON, including lists and dictionaries.
Example 1.2. Using the non-generic variant of JsonMapper.ToObject
using LitJson;
using System;

public class JsonSample
{
    public void LoadAlbumData (string json_text)
    {
        JsonData data = JsonMapper.ToObject (json_text);

        // Dictionaries are accessed like a hash-table
        Console.WriteLine ("Album's name: {0}", data["album"]["name"]);

        // Scalar elements stored in a JsonData instance can be cast to
        // their natural types
        string name = (string) data["album"]["name"];
        int year = (int) data["album"]["year"];

        // Arrays are accessed like regular lists as well
        Console.WriteLine ("First track: {0}", data["album"]["tracks"][0]);
    }
}
An alternative interface for handling JSON data, which might be familiar to some developers, is provided by classes that read and write data in a stream-like fashion. These classes are JsonReader and JsonWriter.
These two types are in fact the foundation of this library, and the JsonMapper type is built on top of them, so in a way the developer can think of the reader and writer classes as the low-level programming interface for LitJSON.
Example 2.1. Using JsonReader
using LitJson;
using System;

public class DataReader
{
    public static void Main ()
    {
        string sample = @"{
            ""name"" : ""Bill"",
            ""age"" : 32,
            ""awake"" : true,
            ""n"" : 1994.0226,
            ""note"" : [ ""life"", ""is"", ""but"", ""a"", ""dream"" ]
        }";

        ReadJson (sample);
    }

    public static void ReadJson (string json)
    {
        JsonReader reader = new JsonReader (json);

        Console.WriteLine ("{0,14} {1,10} {2,16}", "Token", "Value", "Type");
        Console.WriteLine (new String ('-', 42));

        // The Read() method returns false when there's nothing else to read
        while (reader.Read ()) {
            string type = reader.Value != null ?
                reader.Value.GetType ().ToString () : "";

            Console.WriteLine ("{0,14} {1,10} {2,16}",
                               reader.Token, reader.Value, type);
        }
    }
}
This example would produce the following output:
         Token      Value             Type
------------------------------------------
   ObjectStart
  PropertyName       name    System.String
        String       Bill    System.String
  PropertyName        age    System.String
           Int         32     System.Int32
  PropertyName      awake    System.String
       Boolean       True   System.Boolean
  PropertyName          n    System.String
        Double  1994.0226    System.Double
  PropertyName       note    System.String
    ArrayStart
        String       life    System.String
        String         is    System.String
        String        but    System.String
        String          a    System.String
        String      dream    System.String
      ArrayEnd
     ObjectEnd
Example 2.2. Using JsonWriter
using LitJson;
using System;
using System.Text;

public class DataWriter
{
    public static void WriteJson ()
    {
        StringBuilder sb = new StringBuilder ();
        JsonWriter writer = new JsonWriter (sb);

        writer.WriteArrayStart ();
        writer.Write (1);
        writer.Write (2);
        writer.Write (3);

        writer.WriteObjectStart ();
        writer.WritePropertyName ("color");
        writer.Write ("blue");
        writer.WriteObjectEnd ();

        writer.WriteArrayEnd ();

        Console.WriteLine (sb.ToString ());
        // [1,2,3,{"color":"blue"}]
    }
}
LitJSON was built out of a specific need for a small JSON library for the .Net framework. Another big motivation was simply the fun of learning a little bit about parsing techniques and putting some of that knowledge into action by writing a parser for the JSON grammar, which is not very complicated.
This appendix includes some specific details about the approach used by the parser. This section should clarify some of the internal details of this library's source code.
The parser in this library is based on a couple of elements: a finite state machine used to recognize tokens, and a pushdown automaton that recognizes the grammar of JSON text.
A finite state machine recognizes tokens from the input text, in effect acting as a lexer. It's composed of 28 states, listed below:
As you may notice, the lexer is designed to read more than just the strict JSON grammar as defined in RFC 4627. It may also allow comments (the forms //comment and /*comment*/ are supported) and single-quoted strings. The library allows the developer to configure whether these extensions are accepted or not. They are allowed by default.
A simple table lists which state to go to next depending on the current state and the current input character.
Now, the actual handling of the states can do a little more than just setting the next state. The following flags are used to indicate special information related to the actions in a state:
[A] Add char to buffer. A string buffer is kept and returned in certain states along with the token ID. For example, when reading a number, the entire sequence of characters corresponding to the number is returned as a string, along with the NUMBER token.
[C] Only accept this state if the lexer is configured to allow comments.
[E] Add the unescaped character to the buffer.
[L] Leave character unconsumed. This leaves the current input character untouched so it is used in the next lexer cycle.
[R] Return. This means that a token has been recognized, so the corresponding ID should be returned. If the string buffer is not empty, it should be returned as well, and then the buffer cleared.
[SQ] Only accept this state if the lexer is configured to allow string literals enclosed in single quotes.
[SS] Stack State. When used as a flag, it indicates that the current state number should be put in a stack. When used as a state value, it means that the next state to go to is the one in the stack.
[U] Unicode escaping sequence. In this state, exactly 4 hex characters should be read and, at the end, the corresponding Unicode character is added to the string buffer and the machine continues.
One last thing to notice: the Char token means that the literal input character is returned as a token, except when it's a single quote ('), in which case the double quote (") is returned. This is just to simplify things a little bit for the grammar parser.
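The way a transition table and its flags drive the lexer can be illustrated with a minimal sketch. It is not LitJSON's actual source: it only covers the integer subset of the machine (states 1 to 4 and the inputs space, [,\]}], -, 0 and [1-9]), and the names LexerSketch, Cell, Flag and Tokenize, as well as the "NUMBER:" and "Char:" output strings, are hypothetical choices made for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class LexerSketch
{
    // Subset of the flags described above: [A], [L] and [R]
    [Flags]
    private enum Flag { None = 0, Add = 1, Leave = 2, Return = 4 }

    private struct Cell
    {
        public readonly Flag Flags;
        public readonly int Next;   // 0 means "no transition"
        public Cell (Flag flags, int next) { Flags = flags; Next = next; }
    }

    private static readonly Cell[,] Table = BuildTable ();

    // Maps a character to a row of the table, or -1 if unsupported here
    private static int Classify (char c)
    {
        if (c == ' ') return 0;
        if (c == ',' || c == ']' || c == '}') return 1;
        if (c == '-') return 2;
        if (c == '0') return 3;
        if (c >= '1' && c <= '9') return 4;
        return -1;
    }

    private static Cell[,] BuildTable ()
    {
        Cell[,] t = new Cell[5, 5];   // [state, input class]
        t[1, 0] = new Cell (Flag.None, 1);
        t[1, 1] = new Cell (Flag.Return, 1);
        t[1, 2] = new Cell (Flag.Add, 2);
        t[1, 3] = new Cell (Flag.Add, 4);
        t[1, 4] = new Cell (Flag.Add, 3);
        t[2, 3] = new Cell (Flag.Add, 4);
        t[2, 4] = new Cell (Flag.Add, 3);
        t[3, 0] = new Cell (Flag.Return, 1);
        t[3, 1] = new Cell (Flag.Leave | Flag.Return, 1);
        t[3, 3] = new Cell (Flag.Add, 3);
        t[3, 4] = new Cell (Flag.Add, 3);
        t[4, 0] = new Cell (Flag.Return, 1);
        t[4, 1] = new Cell (Flag.Leave | Flag.Return, 1);
        return t;
    }

    public static List<string> Tokenize (string input)
    {
        List<string> tokens = new List<string> ();
        StringBuilder buffer = new StringBuilder ();
        string text = input + " ";   // a trailing space flushes a final number
        int state = 1;
        int pos = 0;

        while (pos < text.Length) {
            char c = text[pos];
            int cls = Classify (c);
            Cell cell = cls < 0 ? new Cell () : Table[state, cls];
            if (cell.Next == 0)
                throw new FormatException (
                    "Unexpected '" + c + "' in state " + state);

            if ((cell.Flags & Flag.Add) != 0)
                buffer.Append (c);
            if ((cell.Flags & Flag.Leave) == 0)
                pos++;               // [L] keeps the character for the next cycle
            if ((cell.Flags & Flag.Return) != 0) {
                if (state == 1) {
                    tokens.Add ("Char:" + c);
                } else {
                    tokens.Add ("NUMBER:" + buffer);
                    buffer.Length = 0;
                }
            }
            state = cell.Next;
        }
        return tokens;
    }
}
```

Calling LexerSketch.Tokenize ("12, -7 ]") yields the four tokens NUMBER:12, Char:,, NUMBER:-7 and Char:]. The comma after 12 is seen twice: once in state 3, where [L][R] returns the number without consuming it, and once in state 1, where it is returned as a Char token.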
Table A.1. Lexer State Machine (part one)
| Input | State 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Token | Char | | NUMBER | NUMBER | | NUMBER | | NUMBER | | | TRUE | | | | FALSE |
| space | 1 | | [R] 1 | [R] 1 | | [R] 1 | | [R] 1 | | | | | | | |
| " | [R] 19 | | | | | | | | | | | | | | |
| ' | [SQ][R] 23 | | | | | | | | | | | | | | |
| + | | | | | | | [A] 8 | | | | | | | | |
| [,\]}] | [R] 1 | | [L][R] 1 | [L][R] 1 | | [L][R] 1 | | [L][R] 1 | | | | | | | |
| - | [A] 2 | | | | | | [A] 8 | | | | | | | | |
| . | | | [A] 5 | [A] 5 | | | | | | | | | | | |
| / | [C] 25 | | | | | | | | | | | | | | |
| 0 | [A] 4 | [A] 4 | [A] 3 | | [A] 6 | [A] 6 | [A] 8 | [A] 8 | | | | | | | |
| [1-9] | [A] 3 | [A] 3 | [A] 3 | | [A] 6 | [A] 6 | [A] 8 | [A] 8 | | | | | | | |
| : | [R] 1 | | | | | | | | | | | | | | |
| E | | | [A] 7 | [A] 7 | | [A] 7 | | | | | | | | | |
| [\[{] | [R] 1 | | | | | | | | | | | | | | |
| a | | | | | | | | | | | | 13 | | | |
| e | | | [A] 7 | [A] 7 | | [A] 7 | | | | | [R] 1 | | | | [R] 1 |
| f | 12 | | | | | | | | | | | | | | |
| l | | | | | | | | | | | | | 14 | | |
| n | 16 | | | | | | | | | | | | | | |
| r | | | | | | | | | 10 | | | | | | |
| s | | | | | | | | | | | | | | 15 | |
| t | 9 | | | | | | | | | | | | | | |
| u | | | | | | | | | | 11 | | | | | |
Table A.2. Lexer State Machine (part two)
| Input | State 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Token | | | NULL | CHARSEQ | Char | | | CHARSEQ | Char | | | | |
| \n | | | | | | | | | | | 1 | 27 | 27 |
| " | | | | [L][R] 20 | [R] 1 | [E] SS | | [A] 23 | | | 26 | 27 | 27 |
| ' | | | | [A] 19 | | [E] SS | | [L][R] 24 | [R] 1 | | 26 | 27 | 27 |
| * | | | | | | | | | | 27 | 26 | 28 | 28 |
| / | | | | [A] 19 | | [E] SS | | [A] 23 | | 26 | 26 | 27 | 1 |
| [0-9A-Facde] | | | | [A] 19 | | | [U] SS | [A] 23 | | | 26 | 27 | 27 |
| \ | | | | [SS] 21 | | [E] SS | | [SS] 21 | | | 26 | 27 | 27 |
| [bf] | | | | [A] 19 | | [E] SS | [U] SS | [A] 23 | | | 26 | 27 | 27 |
| l | | 18 | [R] 1 | [A] 19 | | | | [A] 23 | | | 26 | 27 | 27 |
| [nrt] | | | | [A] 19 | | [E] SS | | [A] 23 | | | 26 | 27 | 27 |
| u | 17 | | | [A] 19 | | 22 | | [A] 23 | | | 26 | 27 | 27 |
| other | | | | [A] 19 | | | | [A] 23 | | | 26 | 27 | 27 |
Since the lexer handles most of the hairy stuff (returning it as terminals), the resulting grammar deals mainly with the recursive portions.
Text      -> Object
           | Array
Object    -> { Object'
Object'   -> }
           | Pair PairRest }
PairRest  -> ε
           | , Pair PairRest
Pair      -> String : Value
Array     -> [ Array'
Array'    -> ]
           | Value ValueRest ]
ValueRest -> ε
           | , Value ValueRest
Value     -> String
           | Object
           | Array
           | NUMBER
           | TRUE
           | FALSE
           | NULL
String    -> " CHARSEQ "
Table A.3. Grammar FIRST(X) sets
| Rule | FIRST set |
|---|---|
| Array | [ |
| Array' | ] " { [ NUMBER TRUE FALSE NULL |
| Object | { |
| Object' | } " |
| Pair | " |
| PairRest | ε , |
| String | " |
| Text | { [ |
| Value | " { [ NUMBER TRUE FALSE NULL |
| ValueRest | ε , |
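The FIRST sets in the table above can be derived mechanically from the grammar with a fixed-point iteration. The following is a minimal sketch of that textbook procedure, not code taken from LitJSON; the class name FirstSets and the encoding of productions as string arrays (left-hand side first, an empty right-hand side standing for ε) are assumptions made for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class FirstSets
{
    // Each entry: left-hand side, followed by the right-hand side symbols
    private static readonly string[][] Productions = {
        new[] { "Text", "Object" },
        new[] { "Text", "Array" },
        new[] { "Object", "{", "Object'" },
        new[] { "Object'", "}" },
        new[] { "Object'", "Pair", "PairRest", "}" },
        new[] { "PairRest" },
        new[] { "PairRest", ",", "Pair", "PairRest" },
        new[] { "Pair", "String", ":", "Value" },
        new[] { "Array", "[", "Array'" },
        new[] { "Array'", "]" },
        new[] { "Array'", "Value", "ValueRest", "]" },
        new[] { "ValueRest" },
        new[] { "ValueRest", ",", "Value", "ValueRest" },
        new[] { "Value", "String" },
        new[] { "Value", "Object" },
        new[] { "Value", "Array" },
        new[] { "Value", "NUMBER" },
        new[] { "Value", "TRUE" },
        new[] { "Value", "FALSE" },
        new[] { "Value", "NULL" },
        new[] { "String", "\"", "CHARSEQ", "\"" },
    };

    public static Dictionary<string, HashSet<string>> Compute ()
    {
        var first = new Dictionary<string, HashSet<string>> ();
        foreach (string[] p in Productions)
            if (!first.ContainsKey (p[0]))
                first[p[0]] = new HashSet<string> ();

        // Repeat until a full pass adds nothing new to any set
        bool changed = true;
        while (changed) {
            changed = false;
            foreach (string[] p in Productions) {
                HashSet<string> target = first[p[0]];
                bool allNullable = true;

                for (int i = 1; i < p.Length && allNullable; i++) {
                    string sym = p[i];
                    if (!first.ContainsKey (sym)) {
                        // A terminal starts the production and ends the scan
                        changed |= target.Add (sym);
                        allNullable = false;
                    } else {
                        // Copy first, in case sym and the target coincide
                        foreach (string t in new List<string> (first[sym]))
                            if (t != "ε")
                                changed |= target.Add (t);
                        allNullable = first[sym].Contains ("ε");
                    }
                }

                if (allNullable)
                    changed |= target.Add ("ε");
            }
        }
        return first;
    }
}
```

For instance, FirstSets.Compute ()["Array'"] contains ] plus every token that can start a Value, and the set for PairRest contains ε and the comma, in agreement with the table.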
Note: $ is the end of input marker
Table A.4. Grammar FOLLOW(X) sets
| Rule | FOLLOW set |
|---|---|
| Array | $ , ] } |
| Array' | $ , ] } |
| CharList | " |
| Object | $ , ] } |
| Object' | $ , ] } |
| Pair | , } |
| PairRest | } |
| String | : , ] } |
| Text | $ |
| Value | , ] } |
| ValueRest | ] |
Table A.5. Parse Table
| Input Token | Array | Array' | Object | Object' | Pair | PairRest | String | Text | Value | ValueRest |
|---|---|---|---|---|---|---|---|---|---|---|
| " | | Value ValueRest ] | | Pair PairRest } | String : Value | | " CHARSEQ " | | String | |
| , | | | | | | , Pair PairRest | | | | , Value ValueRest |
| [ | [ Array' | Value ValueRest ] | | | | | | Array | Array | |
| ] | | ] | | | | | | | | ε |
| { | | Value ValueRest ] | { Object' | | | | | Object | Object | |
| } | | | | } | | ε | | | | |
| NUMBER | | Value ValueRest ] | | | | | | | NUMBER | |
| TRUE | | Value ValueRest ] | | | | | | | TRUE | |
| FALSE | | Value ValueRest ] | | | | | | | FALSE | |
| NULL | | Value ValueRest ] | | | | | | | NULL | |
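To show how a parse table like this one drives a pushdown automaton, here is a minimal sketch of a table-driven predictive recognizer. It is an illustration under assumptions, not LitJSON's parser: tokens are represented as plain strings, the class and method names (PredictiveParser, Accepts) are hypothetical, and the recognizer only answers whether a token sequence matches the grammar.

```csharp
using System;
using System.Collections.Generic;

public static class PredictiveParser
{
    private static readonly HashSet<string> NonTerminals = new HashSet<string> {
        "Text", "Object", "Object'", "Pair", "PairRest",
        "Array", "Array'", "Value", "ValueRest", "String"
    };

    // Key: "<non-terminal> <input token>", value: symbols to push
    private static readonly Dictionary<string, string[]> Table = BuildTable ();

    private static Dictionary<string, string[]> BuildTable ()
    {
        var t = new Dictionary<string, string[]> ();
        string[] valueStart = { "\"", "[", "{", "NUMBER", "TRUE", "FALSE", "NULL" };

        t["Text {"] = new[] { "Object" };
        t["Text ["] = new[] { "Array" };
        t["Object {"] = new[] { "{", "Object'" };
        t["Object' }"] = new[] { "}" };
        t["Object' \""] = new[] { "Pair", "PairRest", "}" };
        t["Pair \""] = new[] { "String", ":", "Value" };
        t["PairRest ,"] = new[] { ",", "Pair", "PairRest" };
        t["PairRest }"] = new string[0];              // ε
        t["Array ["] = new[] { "[", "Array'" };
        t["Array' ]"] = new[] { "]" };
        foreach (string tok in valueStart)
            t["Array' " + tok] = new[] { "Value", "ValueRest", "]" };
        t["ValueRest ,"] = new[] { ",", "Value", "ValueRest" };
        t["ValueRest ]"] = new string[0];             // ε
        t["Value \""] = new[] { "String" };
        t["Value ["] = new[] { "Array" };
        t["Value {"] = new[] { "Object" };
        foreach (string tok in new[] { "NUMBER", "TRUE", "FALSE", "NULL" })
            t["Value " + tok] = new[] { tok };
        t["String \""] = new[] { "\"", "CHARSEQ", "\"" };
        return t;
    }

    public static bool Accepts (IList<string> tokens)
    {
        var stack = new Stack<string> ();
        stack.Push ("$");        // end of input marker
        stack.Push ("Text");     // start symbol
        int pos = 0;

        while (stack.Count > 0) {
            string top = stack.Pop ();
            string look = pos < tokens.Count ? tokens[pos] : "$";

            if (NonTerminals.Contains (top)) {
                string[] rhs;
                if (!Table.TryGetValue (top + " " + look, out rhs))
                    return false;            // empty table cell: syntax error
                // Push in reverse so the leftmost symbol ends up on top
                for (int i = rhs.Length - 1; i >= 0; i--)
                    stack.Push (rhs[i]);
            } else {
                if (top != look)
                    return false;            // terminal mismatch
                pos++;
            }
        }
        return pos == tokens.Count + 1;      // every token and $ were matched
    }
}
```

With this sketch, the token sequence for {"a":[1,true]}, that is {, ", CHARSEQ, ", :, [, NUMBER, ,, TRUE, ], }, is accepted, while [, NUMBER, ,, ] is rejected because the Value column has no entry for ].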
One of the main design goals of LitJSON is to be a small and fast library.
This doesn't mean that it should perform every operation as fast as possible down to the last nanosecond (which would be a pointless exercise anyway). However, a set of simple benchmarks does serve a useful purpose: it keeps track of the library's performance as it grows, and compares it against other JSON libraries for .Net. In simple terms, a set of benchmarks has been created to make sure LitJSON's performance doesn't suck.
In an attempt to keep the benchmarks useful and organised, different binaries are created, each one with its own set of micro-benchmarks measuring different features of the libraries.
The benchmarks are split into four main categories:
Readers.
Writers.
Importing data into objects.
Exporting data from objects.
Additionally, information about heap memory usage is gathered using heap-buddy.
The complete source code of the benchmarks can be found under the benchmarks directory included in the library's source tree. More information about this code is included in the README file under that directory.
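For readers who want a feel for what such a micro-benchmark binary might look like, the following is a minimal sketch. It is not the harness shipped in the benchmarks directory: the names BenchmarkRunner and BenchmarkSample, the use of reflection to discover benchmark methods and the iteration count are all assumptions made for this illustration.

```csharp
using System;
using System.Diagnostics;
using System.Reflection;

public static class BenchmarkRunner
{
    // Times every public static void method without parameters declared
    // in the given type, and returns how many methods were run.
    public static int Run (Type type, int iterations)
    {
        Console.WriteLine ("Benchmarking type {0}", type.Name);
        int count = 0;

        foreach (MethodInfo method in
                 type.GetMethods (BindingFlags.Public | BindingFlags.Static)) {
            if (method.GetParameters ().Length != 0 ||
                method.ReturnType != typeof (void))
                continue;

            // A delegate avoids paying reflection overhead on every call
            Action run = (Action) Delegate.CreateDelegate (
                typeof (Action), method);

            Stopwatch watch = Stopwatch.StartNew ();
            for (int i = 0; i < iterations; i++)
                run ();
            watch.Stop ();

            Console.WriteLine ("{0} {1}", method.Name, watch.Elapsed);
            count++;
        }
        return count;
    }
}

// Hypothetical benchmark container, used only to exercise the runner
public static class BenchmarkSample
{
    public static void SampleSqrt ()
    {
        Math.Sqrt (2.0);
    }
}
```

A call such as BenchmarkRunner.Run (typeof (BenchmarkSample), 100000) prints one line per method, with the elapsed time in the default TimeSpan format.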
The benchmarks included test related capabilities of the following .Net libraries:
Jayrock version 0.9.8316.
LitJSON version 0.3.0.
Newtonsoft Json.NET version 1.1.
Much gratitude goes out to the developers of Jayrock and Json.NET for their excellent work, and, needless to say, LitJSON wouldn't be what it is now without them.
It is important to notice that, although benchmarks like these can show trends in performance and help the developers notice when something goes notably wrong, they are not meant to produce conclusive results.
For this reason, every number presented here is a relative figure that depends on a number of variables (hardware, operating system, runtime, system load, etc.). Also notice that different libraries may have been designed with different purposes in mind, so benchmark suites that only focus on a common subset of functionality, like this one, don't present the full picture to a developer who is deciding whether to use a given tool or not.
These benchmarks have been run while trying to keep the environment as consistent as possible. The machine used is an old Pentium III, running Linux 2.6.19. The binaries are executed using Mono 1.2.4.
The benchmarks implemented are:
Using a reader class (JsonTextReader in Jayrock, JsonReader in LitJSON and Json.NET), read the following input:
[
    42,
    1,
    1,
    2,
    3,
    5,
    8,
    -50,
    -678.56,
    3.1415,
    1.4e10,
    4.0e5,
    8.0e-3
]
Using a reader class, the following input is read:
[
    "Hello World!",
    "The quick brown fox jumps over the lazy dog",
    "Lorem ipsum dolor sit amet, consectetur adipisicing elit", // .. more text
    "$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^"
]
Given the following input:
{
    "Image": {
        "FirstProperty": true,
        "Width": 800,
        "Height": 600,
        "Title": "View from 15th Floor",
        "Comment": "Sample text:\t\"abcdef\"\nSecond Line",
        "Comment2": "\u03c0\u03c1\u03cc\u03b3\u03c1\u03b1\u03bc\u03bc\u03b1",
        "Default": true,
        "Active": false,
        "Resource": null,
        "Thumbnail": {
            "Url": "http://www.example.com/image/481989943",
            "Height": 125,
            "Width": "100" },
        "IDs": [ 116, 943, 234, 38793 ],
        "Score": 9.40,
        "Scale": 1.0e-1,
        "Views": 3000000000,
        "LastProperty": true
    }
}
A reader is created and tokens are read until the "FirstProperty" element is found.
Given the same input as the last benchmark, a reader is created and tokens are read until "LastProperty" is found.
The results:
Benchmarking type BenchmarkJayrock
JayrockReaderNumbers 00:00:01.9310770
JayrockReaderStrings 00:00:01.7263010
JayrockReaderFirstProperty 00:00:00.5530070
JayrockReaderLastProperty 00:00:04.2311710
Benchmarking type BenchmarkLitJson
LitJsonReaderNumbers 00:00:01.4522950
LitJsonReaderStrings 00:00:01.7806950
LitJsonReaderFirstProperty 00:00:00.3135730
LitJsonReaderLastProperty 00:00:03.4958360
Benchmarking type BenchmarkNewtonsoft
NewtonsoftReaderNumbers 00:00:02.6169990
NewtonsoftReaderStrings 00:00:02.3476750
NewtonsoftReaderFirstProperty 00:00:01.1853270
NewtonsoftReaderLastProperty 00:00:03.5655060
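Because these timings are only meaningful relative to each other, it can help to turn a pair of them into a ratio. The snippet below is a small sketch of that arithmetic; the RelativeTiming class and its Ratio method are hypothetical names introduced for this illustration.

```csharp
using System;
using System.Globalization;

public static class RelativeTiming
{
    // Returns how many times longer `other` took compared to `baseline`.
    // Both arguments use the hh:mm:ss.fffffff format of the listings.
    public static double Ratio (string baseline, string other)
    {
        TimeSpan b = TimeSpan.Parse (baseline, CultureInfo.InvariantCulture);
        TimeSpan o = TimeSpan.Parse (other, CultureInfo.InvariantCulture);
        return (double) o.Ticks / b.Ticks;
    }
}
```

For example, RelativeTiming.Ratio ("00:00:01.4522950", "00:00:01.9310770") compares the LitJsonReaderNumbers and JayrockReaderNumbers figures above and gives roughly 1.33, meaning that on this particular machine and run the Jayrock reader took about a third longer for that input.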
Heap memory usage:
SUMMARY
Filename: BmJayrockReader-heap.out
Allocated Bytes: 132.3M
Allocated Objects: 2511436
GCs: 980
Resizes: 5
Final heap size: 468k
Distinct Types: 81
Backtraces: 565
SUMMARY
Filename: BmLitJsonReader-heap.out
Allocated Bytes: 65.4M
Allocated Objects: 1251602
GCs: 479
Resizes: 5
Final heap size: 472k
Distinct Types: 92
Backtraces: 671
SUMMARY
Filename: BmNewtonsoftReader-heap.out
Allocated Bytes: 353.5M
Allocated Objects: 1361535
GCs: 3122
Resizes: 5
Final heap size: 476k
Distinct Types: 81
Backtraces: 527
The benchmarks implemented are:
Using a writer object (JsonTextWriter in Jayrock, JsonWriter in LitJSON and Json.NET), write the following numbers in an array:
0.0,
10.0,
3.1416,
0.0000001,
-789.123,
0.00056,
50000000000.0
0,
42,
100000,
-1,
-123,
7777,
25
Using a writer, write the following information:
'[', null,
'{', null,
'P', "precision",
'S', "zip",
'P', "Latitude",
'D', 37.7668,
'P', "Longitude",
'D', -122.3959,
'P', "Address",
'S', "",
'P', "City",
'S', "SAN FRANCISCO",
'P', "State",
'S', "CA",
'P', "Zip",
'S', "94107",
'P', "Country",
'S', "US",
'P', "Visited",
'B', true,
'P', "Ref",
'N', null,
'P', "Comment",
'S', "This is a \"comment\"\tColumn2\nLine2. " +
"\u00c6nema is a good album.",
'}', null,
'{', null,
'P', "precision",
'S', "zip",
'P', "Latitude",
'D', 37.371991,
'P', "Longitude",
'D', -122.026020,
'P', "Address",
'S', "",
'P', "City",
'S', "SUNNYVALE",
'P', "State",
'S', "CA",
'P', "Zip",
'S', "94085",
'P', "Country",
'S', "US",
'P', "Visited",
'B', false,
'P', "Ref",
'N', null,
'}', null,
']', null
This information is processed in pairs. The first item of each pair indicates the type of token to write: {, }, [ and ] for opening/closing objects and arrays, P for properties, S for strings, B for booleans, D for double numbers and N for null. The second item is the value to write, or null when the token carries no value.
The results:
Benchmarking type BenchmarkJayrock
JayrockWriterNumbers 00:00:10.7310150
JayrockWriterObjects 00:00:08.8977730
Benchmarking type BenchmarkLitJson
LitJsonWriterNumbers 00:00:10.6846000
LitJsonWriterObjects 00:00:08.4647110
Benchmarking type BenchmarkNewtonsoft
NewtonsoftWriterNumbers 00:00:10.6271730
NewtonsoftWriterObjects 00:00:08.4410420
Heap memory usage:
SUMMARY
Filename: BmJayrockWriter-heap.out
Allocated Bytes: 88.8M
Allocated Objects: 1890928
GCs: 671
Resizes: 5
Final heap size: 468k
Distinct Types: 64
Backtraces: 419
SUMMARY
Filename: BmLitJsonWriter-heap.out
Allocated Bytes: 70.0M
Allocated Objects: 1320942
GCs: 543
Resizes: 5
Final heap size: 468k
Distinct Types: 66
Backtraces: 434
SUMMARY
Filename: BmNewtonsoftWriter-heap.out
Allocated Bytes: 74.0M
Allocated Objects: 1470936
GCs: 563
Resizes: 5
Final heap size: 468k
Distinct Types: 64
Backtraces: 422
These benchmarks receive the following as input:
{
    "Name" : "Art Vandelay",
    "Age" : 30,
    "Height" : 1.65,
    "Retired" : false,
    "Urls" : [
        "http://example.com/artvandelay",
        "http://artvandelay.org/" ],
    "Job" : {
        "Title" : "Importer/Exporter",
        "Description" : "import matches... long matches"
    }
}
The benchmarks implemented are:
Using the specific API to import JSON data (JsonConvert.Import in Jayrock, JsonMapper.ToObject in LitJSON and JavaScriptConvert.DeserializeObject in Json.NET), the input data is imported into a general-purpose data type defined by the library: JsonObject in Jayrock, JsonData in LitJSON and JavaScriptObject in Json.NET.
Import the input JSON into an object of type System.Collections.Hashtable. Unfortunately, Jayrock doesn't seem to be able to perform this conversion, so it's not included in the results. Notice that this also affects the heap memory usage results (Jayrock's summary corresponds to running 2 of the 3 benchmarks).
A custom Person class is used to import the input data.
The results:
Benchmarking type BenchmarkJayrock
JayrockConversionToGenericObject 00:00:03.8663690
JayrockConversionToObject 00:00:12.6593170
Benchmarking type BenchmarkLitJson
LitJsonConversionToGenericObject 00:00:02.5881870
LitJsonConversionToHashtable 00:00:06.2299200
LitJsonConversionToObject 00:00:06.4707210
Benchmarking type BenchmarkNewtonsoft
NewtonsoftConversionToGenericObject 00:00:10.3858570
NewtonsoftConversionToHashtable 00:00:11.3133830
NewtonsoftConversionToObject 00:00:24.4709470
Heap memory usage:
SUMMARY
Filename: BmJayrockImport-heap.out
Allocated Bytes: 111.0M
Allocated Objects: 2923200
GCs: 677
Resizes: 6
Final heap size: 628k
Distinct Types: 162
Backtraces: 1236
SUMMARY
Filename: BmLitJsonImport-heap.out
Allocated Bytes: 87.7M
Allocated Objects: 2132239
GCs: 582
Resizes: 6
Final heap size: 628k
Distinct Types: 142
Backtraces: 988
SUMMARY
Filename: BmNewtonsoftImport-heap.out
Allocated Bytes: 504.1M
Allocated Objects: 8801931
GCs: 3797
Resizes: 6
Final heap size: 644k
Distinct Types: 107
Backtraces: 811
The benchmarks implemented are:
Using an object of the same type as in the ConversionToGenericObject benchmark, convert it into a JSON string via JsonConvert.ExportToString in Jayrock, JsonMapper.ToJson in LitJSON, and JavaScriptConvert.SerializeObject in Json.NET.
Convert an object with the same structure and data, but of type System.Collections.Hashtable into JSON.
Convert from the custom Person type into JSON.
The results:
Benchmarking type BenchmarkJayrock
JayrockConversionFromGenericObject 00:00:01.5508170
JayrockConversionFromHashtable 00:00:03.9338940
JayrockConversionFromObject 00:00:07.6953470
Benchmarking type BenchmarkLitJson
LitJsonConversionFromGenericObject 00:00:00.2117890
LitJsonConversionFromHashtable 00:00:02.7785360
LitJsonConversionFromObject 00:00:03.7776010
Benchmarking type BenchmarkNewtonsoft
NewtonsoftConversionFromGenericObject 00:00:03.1178680
NewtonsoftConversionFromHashtable 00:00:03.0265470
NewtonsoftConversionFromObject 00:00:08.6941390
Heap memory usage:
SUMMARY
Filename: BmJayrockExport-heap.out
Allocated Bytes: 104.7M
Allocated Objects: 2744490
GCs: 639
Resizes: 6
Final heap size: 632k
Distinct Types: 191
Backtraces: 1667
SUMMARY
Filename: BmLitJsonExport-heap.out
Allocated Bytes: 30.3M
Allocated Objects: 651969
GCs: 211
Resizes: 5
Final heap size: 472k
Distinct Types: 144
Backtraces: 945
SUMMARY
Filename: BmNewtonsoftExport-heap.out
Allocated Bytes: 83.5M
Allocated Objects: 2081974
GCs: 586
Resizes: 5
Final heap size: 476k
Distinct Types: 110
Backtraces: 762