Tesseract: traversing Java Domains using JavaScript

For those who don’t know, I’m currently working at Instituto Superior Técnico on the Fenix Project. Fenix is a web application that handles the academic administration of the entire college, from course enrollment to parking.

This application is written in Java and uses a rich domain with more than 1000 distinct entities. Two of the technologies that support Fenix are the JVSTM (Java Versioned Software Transactional Memory) and the FenixFramework. The framework lets you describe complex domains (both entities and relationships) in a Java-like syntax, and generates both the SQL [1] code that creates or alters the tables and the high-level code used at the application level.

One of the first problems that I faced was the size of the domain and the number of relations in it. The team is used to opening the domain model file and scrolling up and down searching for relations and classes, but this file has 17000+ lines. So one of the first things I developed was the Fenix Domain Browser, a domain structure navigator that presents the domain as a series of UML diagrams. But this browser can only inspect the domain structure, not its instances. For that you normally use SQL queries, which make navigating a relation a slow and painful process (inner joins), or you build the entire stack from the domain up to the view layer. Neither of these alternatives was ideal, but it was what we had.

Last year, in “Programação Avançada” (Advanced Programming), one of the course projects was to develop a Java interpreter [2]. Developing this REPL [3] was a lot of fun, because we had time and implemented a bunch of features that weren’t required (dynamic functions, macros, etc.). After the project ended, I tried to connect this REPL to the Fenix domain, but it didn’t turn out to be a nice interaction. The main problem was Java’s static typing. Each time you created a variable you had to declare its type and probably do some kind of cast. Using Object variables only delays the problem, since you then have to downcast them to call any method. The other problem was the intersection library that I was using: JA doesn’t support Java 5, which means no foreach loops, autoboxing, etc. For those reasons I stopped using it and put it aside.

Recently I was given the responsibility of reorganizing an application used at IST called FeaRS. It is a feature request system with a voting system similar to Digg or Reddit. This application uses the same domain technology as Fenix. When I finally got the application running on my server, I wanted to inspect the instances in the domain, so I remembered the REPL I had developed. But this time I had one of those “what if” moments.

What if I used Rhino to interact with the domain, with JavaScript as the interface language? After a week working on this I ended up with Tesseract.

Tesseract is a REPL for any domain developed with the Fenix Framework. By default it uses JavaScript, but it can also work with Java. This is essential because the team’s know-how is in Java. Since Java and JavaScript are to some extent similar, the team can use this program and benefit from JavaScript’s dynamism and functional behavior and, if needed, still use Java for a particular task.

With the technology that we currently work with, every relation is represented as a list at the higher layers. Since JavaScript is a functional language, functions that operate over the java.util.Collection interface (map, reduce, filter, etc.) were implemented. Although these functions receive a JavaScript function, they are implemented in Java and run really fast.
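
To give an idea of what such collection functions do, here is a minimal sketch in plain JavaScript. It is not Tesseract's code (the real functions are written in Java and take a java.util.Collection); the signatures below are assumptions made for illustration, and any iterable stands in for the collection.

```javascript
// Hypothetical stand-ins for Tesseract's Java-implemented helpers.
// Each one walks the collection once and calls the JavaScript callback.
function map(collection, fn) {
  var out = [];
  for (var item of collection) out.push(fn(item));
  return out;
}

function filter(collection, predicate) {
  var out = [];
  for (var item of collection) {
    if (predicate(item)) out.push(item);
  }
  return out;
}

function reduce(collection, fn, initial) {
  var acc = initial;
  for (var item of collection) acc = fn(acc, item);
  return acc;
}

// A Set plays the role of the Java collection here.
var votes = new Set([0, 3, 5]);
var used = filter(votes, function (v) { return v > 0; });
var total = reduce(used, function (a, b) { return a + b; }, 0);
console.log(used, total); // [ 3, 5 ] 8
```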

On top of this I decided to implement a query language based on Microsoft’s LINQ. This allows developers to create queries with the same power as SQL (actually more, since JavaScript is Turing complete and plain SQL isn’t), fully replacing it for read/write tasks. Structural changes to the domain are not possible yet. This language can work with both Java and JavaScript objects, since the interfaces to slots and methods are similar.

Examples

To start a query, pass a list (either a Java Collection, or something that implements a map method) to the $ function. This generates a Query object.

tes> $(FearsApp.getFears().getAdmins())
[query: size: 7]

It’s important to notice that in this language we are always operating over lists of objects. The query can return the result set as a native JavaScript object (toArray) or as a Java List (toList). Now let’s see which of the admins have voted on a request, limiting the result to 3:

tes> var z = $(FearsApp.getFears().getAdmins()).where(
    function(user) {
        return find(user.getVoter(),
            function(voter) {
                return voter.getVotesUsed() > 0;
            });
    }).limit(3);
tes> z
[query: where$limit, size: 3]
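
The chaining shown above can be sketched in a few lines of plain JavaScript. This is only an illustration under stated assumptions, not Tesseract's implementation: the Query constructor, the derive helper and the internal items/ops fields are hypothetical names, and plain arrays replace the Java collections. Each operation returns a new query and records its name, which is how a label such as where$limit could be produced.

```javascript
// Hypothetical sketch of an immutable query object.
function Query(items, ops) {
  this.items = Array.from(items); // copy, so the source is never mutated
  this.ops = ops || [];           // names of the operations applied so far
}

// Build a new query from this one, remembering the operation name.
Query.prototype.derive = function (name, items) {
  return new Query(items, this.ops.concat([name]));
};

Query.prototype.where = function (predicate) {
  return this.derive("where", this.items.filter(function (x) {
    return Boolean(predicate(x));
  }));
};

Query.prototype.limit = function (n) {
  return this.derive("limit", this.items.slice(0, n));
};

Query.prototype.toArray = function () {
  return this.items.slice();
};

Query.prototype.toString = function () {
  var label = this.ops.length ? this.ops.join("$") + ", " : "";
  return "[query: " + label + "size: " + this.items.length + "]";
};

// Entry point: accepts a list or another query.
function $(source) {
  return new Query(source instanceof Query ? source.items : source);
}

var users = [
  { name: "a", votesUsed: 2 },
  { name: "b", votesUsed: 0 },
  { name: "c", votesUsed: 1 },
  { name: "d", votesUsed: 4 }
];
var firstVoters = $(users)
  .where(function (u) { return u.votesUsed > 0; })
  .limit(2);
console.log(String(firstVoters)); // [query: where$limit, size: 2]
```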

To show the results we ask for a table:

tes> z.table();
+------------------------------------------+
| eu.ist.fears.server.domain.User@24c68a98 |
| eu.ist.fears.server.domain.User@1494cb8b |
| eu.ist.fears.server.domain.User@34bf1d3b |
+------------------------------------------+

By default a table shows the result of calling toString on each object. It’s possible to show other results:

tes> z.table(["username"]);
+----------+
| username |
+----------+
| istXXXXX |
| istXXXXY |
| istXXXXZ |
+----------+

The string can represent a slot, a method, or a getter (in this case “getUsername”). A table can have as many columns as desired:

tes> z.table([
    {
        label: "The UserName",
        slot: "username"
    }, {
        label: "Another UserName",
        func: function(x) {
            return x.getUsername();
        }
    }
]);
+--------------+------------------+
| The UserName | Another UserName |
+--------------+------------------+
|     istXXXXX |         istXXXXX |
|     istXXXXY |         istXXXXY |
|     istXXXXZ |         istXXXXZ |
+--------------+------------------+
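
One way a column description could be resolved and rendered is sketched below in plain JavaScript. The helper names (readColumn, toColumn, table) and the lookup order (method, then slot, then getter) are assumptions for illustration; the real implementation may resolve names differently, especially for Java objects.

```javascript
// Hypothetical: resolve a column name as a method, a slot, or a getter.
function readColumn(obj, name) {
  var getter = "get" + name.charAt(0).toUpperCase() + name.slice(1);
  if (typeof obj[name] === "function") return obj[name]();
  if (name in obj) return obj[name];
  if (typeof obj[getter] === "function") return obj[getter]();
  return undefined;
}

// Normalise a column spec (a string or {label, slot, func}) to {label, read}.
function toColumn(spec) {
  if (typeof spec === "string") {
    return { label: spec, read: function (o) { return readColumn(o, spec); } };
  }
  var read = spec.func || function (o) { return readColumn(o, spec.slot); };
  return { label: spec.label || spec.slot, read: read };
}

// Render the rows as a right-aligned text table.
function table(rows, specs) {
  var cols = specs.map(toColumn);
  var cells = rows.map(function (row) {
    return cols.map(function (c) { return String(c.read(row)); });
  });
  var widths = cols.map(function (c, i) {
    return cells.reduce(function (w, r) {
      return Math.max(w, r[i].length);
    }, c.label.length);
  });
  var rule = "+" + widths.map(function (w) {
    return "-".repeat(w + 2);
  }).join("+") + "+";
  var line = function (vals) {
    return "| " + vals.map(function (v, i) {
      return v.padStart(widths[i]);
    }).join(" | ") + " |";
  };
  var header = line(cols.map(function (c) { return c.label; }));
  return [rule, header, rule].concat(cells.map(line), [rule]).join("\n");
}

var sampleUser = { getUsername: function () { return "ist1"; } };
console.log(table([sampleUser], ["username"]));
```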

To travel along a relation, use the select function when a single object is expected, or selectAll if the relation is *-to-many. These functions receive either a string representing a slot, method, or getter, or a function that returns the selected object:

tes> var t = $(FearsApp.getFears().getAdmins()).selectAll("getVoter").select("project");
tes> t.count();
55

It’s important to note that selectAll collects the results as they come, so the list is likely to contain repetitions if they occur in the relation. To select the unique objects in a list:

tes> t.distinct().count();
8
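
The difference between the three operations can be sketched with plain arrays. This is an illustration only: the standalone functions below are hypothetical (in Tesseract they are query methods), they accept only functions rather than slot names, and distinct compares by identity, which may not match how Java objects are compared.

```javascript
// select: one result per element (the relation leads to a single object).
function select(items, fn) {
  return items.map(fn);
}

// selectAll: each element yields a list; results are concatenated as they
// come, so duplicates are kept.
function selectAll(items, fn) {
  return items.reduce(function (acc, item) {
    return acc.concat(Array.from(fn(item)));
  }, []);
}

// distinct: keep the first occurrence of each object.
function distinct(items) {
  return Array.from(new Set(items));
}

var p1 = { name: "P1" };
var p2 = { name: "P2" };
var admins = [
  { voters: [{ project: p1 }, { project: p2 }] },
  { voters: [{ project: p1 }, { project: p1 }] }
];
var projects = select(
  selectAll(admins, function (a) { return a.voters; }),
  function (v) { return v.project; }
);
console.log(projects.length);           // 4
console.log(distinct(projects).length); // 2
```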

To see which object you have at a given position you can get it with elementAt, but normally what you want is to inspect it. inspect shows the object at the index you provide. This function only works with DomainObjects:

tes> t.inspect(1);
Instance of: eu.ist.fears.server.domain.Project
+---------------------+-----------------------------------------------------------------------------+
|                slot |                                                                       value |
+---------------------+-----------------------------------------------------------------------------+
|                name |                                                             CIIST-Taguspark |
|         description | Propostas e sugest?es para melhoramento dos servi?os do CIIST no Taguspark. |
| featuresIncrementID |                                                                           4 |
|        initialVotes |                                                                           5 |
|        listPosition |                                                                           6 |
+---------------------+-----------------------------------------------------------------------------+
+----------------+--------------------------------------------------------+
|       relation |                                             value/size |
+----------------+--------------------------------------------------------+
|          voter |        eu.ist.fears.server.domain.Voter(<lenght: 215>) |
|         author |            eu.ist.fears.server.domain.User(4294967497) |
|          admin |           eu.ist.fears.server.domain.User(<lenght: 0>) |
|       fearsApp |       eu.ist.fears.server.domain.FearsApp(17179869185) |
| featureRequest | eu.ist.fears.server.domain.FeatureRequest(<lenght: 4>) |
+----------------+--------------------------------------------------------+

If you need to know the structure of the entity, you can use entity:

tes> t.distinct().entity(1);
Entity eu.ist.fears.server.domain.Project
+---------------------+------------------+
|                slot |             type |
+---------------------+------------------+
|                name | java.lang.String |
|         description | java.lang.String |
| featuresIncrementID |              int |
|        initialVotes |              int |
|        listPosition |              int |
+---------------------+------------------+
+----------------+-------------------------------------------+--------------+
|       relation |                                      type | multiplicity |
+----------------+-------------------------------------------+--------------+
|          voter |          eu.ist.fears.server.domain.Voter |            * |
|         author |           eu.ist.fears.server.domain.User |         1..1 |
|          admin |           eu.ist.fears.server.domain.User |            * |
|       fearsApp |       eu.ist.fears.server.domain.FearsApp |            1 |
| featureRequest | eu.ist.fears.server.domain.FeatureRequest |            * |
+----------------+-------------------------------------------+--------------+

It is also important to know what kinds of objects you have in your list. The function types returns a list with the class names of the objects:

tes> t.distinct().types();
[query: selectAll$select$distinct$types, size: 1]
tes> t.distinct().types().table();
+------------------------------------+
| eu.ist.fears.server.domain.Project |
+------------------------------------+

A query can be passed to another query, either as the starting list:

tes> $(z).where(function(x){ return x.getUsername().equals("istXXXXX"); });
[query: where, size: 1]

Or as the argument of an intersection with another query:

tes> $(range(0,20)).intersect($(range(5,15)));
[query: intersect, size: 10]
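
The size of 10 is what you would get if range excludes its upper bound, which is an assumption on my part based on that output. A minimal plain-JavaScript sketch of that reading, with hypothetical standalone versions of range and intersect working on arrays:

```javascript
// Assumed semantics: range(start, end) yields start, ..., end - 1.
function range(start, end) {
  var out = [];
  for (var i = start; i < end; i++) out.push(i);
  return out;
}

// Keep the elements of left that also occur in right, in left's order.
function intersect(left, right) {
  var inRight = new Set(right);
  return left.filter(function (x) { return inRight.has(x); });
}

var common = intersect(range(0, 20), range(5, 15));
console.log(common.length); // 10
```

With an inclusive upper bound the same intersection would have 11 elements, so the half-open reading is the one consistent with the session above.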

Performance

I ran some tests with this language and, for the most part, performance is acceptable. The first iteration over a new relation is a bit slower because the objects are not yet in the cache.

The biggest relation that exists contains about 2 million objects. Using only integers, an iteration over a list with 2.5 million objects took about 30 seconds. This means that the bottleneck is still the connection to the database.

Future Work

One of the things I would like to implement is some kind of lazy evaluation. Currently each function call returns a new query object, which means that an operation has no side effects on the query it is called on. So it’s possible to create promises of execution for each new operation and, when a result is requested, minimize and optimize the full set of operations. But since this is going to be used in a development environment, this is not critical right now.
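
A rough sketch of what this could look like, in plain JavaScript over arrays. Everything here is hypothetical (LazyQuery, its fields and its step format are invented for illustration and nothing like this exists in Tesseract yet): operations are only recorded, and toArray runs them in a single pass that stops as soon as a limit is filled.

```javascript
// Hypothetical lazy query: operations are recorded, not executed.
function LazyQuery(source, steps) {
  this.source = source;
  this.steps = steps || [];
}

LazyQuery.prototype.add = function (step) {
  return new LazyQuery(this.source, this.steps.concat([step]));
};
LazyQuery.prototype.where = function (predicate) {
  return this.add({ kind: "where", fn: predicate });
};
LazyQuery.prototype.select = function (fn) {
  return this.add({ kind: "select", fn: fn });
};
LazyQuery.prototype.limit = function (n) {
  return this.add({ kind: "limit", n: n });
};

// Run all recorded steps in one pass over the source.
LazyQuery.prototype.toArray = function () {
  var out = [];
  var taken = this.steps.map(function () { return 0; });
  for (var s = 0; s < this.source.length; s++) {
    var value = this.source[s];
    var keep = true;
    var exhausted = false;
    for (var i = 0; i < this.steps.length && keep; i++) {
      var step = this.steps[i];
      if (step.kind === "where") {
        keep = Boolean(step.fn(value));
      } else if (step.kind === "select") {
        value = step.fn(value);
      } else if (taken[i] >= step.n) { // limit already full
        keep = false;
        exhausted = true;
      } else {                         // limit with room left
        taken[i]++;
        if (taken[i] === step.n) exhausted = true;
      }
    }
    if (keep) out.push(value);
    if (exhausted) break; // nothing further can pass the limit
  }
  return out;
};

var calls = 0;
var lazy = new LazyQuery([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
  .where(function (x) { calls++; return x % 2 === 0; })
  .limit(2)
  .select(function (x) { return x * 10; });
console.log(calls);          // 0, nothing has run yet
console.log(lazy.toArray()); // [ 20, 40 ]
console.log(calls);          // 4, the pass stopped after the second match
```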

Using JavaScript as an interaction tool with Java can be useful because it’s not hard for a person who understands Java to work with JavaScript, and this language allows a more dynamic approach to software development in Java.

Tesseract is available on GitHub. It’s NOT production ready; at this moment it isn’t even at version 0.1.


  1. There are attempts to use NoSQL systems, which have better performance in this kind of domain.
  2. Actually it was a bit different: the language was called µJava, but it was essentially the same thing.
  3. If you aren’t lispy enough, it means Read-Eval-Print Loop.