Recursive Descent Parser With Functions and Variables



This is a fully functional math expression parser based on a recursive descent parser (RDP). The underlying java code for the actual evaluator is in the public domain and is available here. To be useful, a parser should have a method of storing and retrieving variables by name and value, persistence of those values from session to session, and the ability to accept standard math expressions as simple text and process the text to calculate the correct mathematical result, regardless of the complexity of the expression. While there are numerous expression evaluators available in virtually every programming language, most lack the full functionality referred to above.


About twenty years ago, while working with health cars statistics, I came across Herb Schildt’s Simple Expression Parser in C++. I was convinced that such an approach could assist me in customizing and speeding up my work in mathematical statistics. Over several years, I managed to customize Schildt’s parser using C++ to accomodate more statistically useful variables such as vectors and matrices. But to my dismay, the code expanded to a large, complex, unmanagable , non-reuseable collection of interrelated classes. Porting the modules to other language became impossibly difficult. Since then, I have searched for a more compact and easily modifiable parser, and have found many in written in everything from Pascal, C, C++, C#, Python, Java, and various forms of pseudocode, none of which have been found satisfactory. Complicating the matter is the wide variety of parsing techniques that have been described including, top-down, bottom-up, recursive descent, reverse polish notation, Shunting-Yard, and on. When I found here the extremely compact little math expression parser in Java, I immediately went to work trying to expand it to suite my own needs. This article is the result of my early efforts to do so.

Using the code

There are three classes that work together to accomplish the parsing, JVarMap, JPreScan, and JParser.

JVarMap stores variable names using an ArrayList variable, Key, and an ArrayList variable, Value. There are methods for adding, updating, and erasing variables from the mapping. These methods are fairly straightforward using ArrayLists. The value of a named variable can be accessed through the variable name using the getValue(String key) method. Class methods writeMap and readMap provide for binary disk storage and retrieval of data.

public class JVarMap { private static ArrayList Key; private static ArrayList Value; private byte[] BArray; private int nOffset; public JVarMap() { Key = new ArrayList(); Value = new ArrayList(); BArray = null; nOffset = 0; } public int Add(String svarnam, double dval) { Key.add(svarnam); Value.add(dval); BArray = null; nOffset = 0; return 1; } public int Update(String svarnam, double dval) { int nIndex = getKeyIndex(svarnam); if(nIndex < 0) return 0; Value.set(nIndex, dval); return 1; } public int Erase(String svarnam) { int nIndex = getKeyIndex(svarnam); if(nIndex < 0) return 0; Key.remove(nIndex); Value.remove(nIndex); return 1; } public double getValue(String sKey) { int ndx = -1; boolean bfound = false; double dnull = NaN; if( !Key.contains(sKey) ) return dnull; for(int i = 0; i < Key.size(); i++) { if(Key.get(i).equals(sKey)) bfound = true; if(bfound) ndx = i; if(bfound) break; } if(!bfound) return dnull; double dm = Value.get(ndx); return dm; } 

Because the JParser.eval method will only work with actual values and not with variable names, it is necessary to translate any variable names into their actual values in order to reformat an expression such as ‘m*x + b’ into ‘0.8 * 3.0 + 4.0’, in other words, substituting variable names with actual values. This is done through the JPreScan class. The Prescan method takes an initial string expression as a parameter, reformats as necessary to insure proper spacing which is required to split the input expression into string tokens, then uses these tokens to reconstruct an output string that uses a CVarMap variable, varmap, to substitute actual values for literal value names where necessary.

public String Prescan(String sCommand)
{ String sCommand2, sv; double dv = 0.0; sv = ""; sCommand2 = sCommand; String sx = sCommand2; System.out.println("sx =: " + sx); sx = sx.replaceAll("\\(", " "); sx = sx.replaceAll("\\)", " "); sx = sx.replaceAll("\\*", " "); sx = sx.replaceAll("\\+", " "); sx = sx.replaceAll("\\-", " "); sx = sx.replaceAll("\\/", " "); sx = sx.replaceAll("\\^", " "); System.out.println("sx =: " + sx); String[] sar = sx.split(" "); for(int i = 0; i < sar.length; i++) { System.out.println(String.format("sar[%d] =: %s", i, sar[i])); } System.out.println("sCommand2 =: " + sCommand2); for(int i = 0; i < sar.length; i++) { if(sar[i].isEmpty()) continue; if(sar[i].matches("[-+]?\\d*\\.?\\d+") ) continue; System.out.println(String.format("sar[%d] =: %s", i, sar[i])); if(varmap.isKey(sar[i])) { dv = .getValue(sar[i]); System.out.println("dv =: " + dv); sv = Double.toString(dv); System.out.println("sv =: " + sv); sCommand2 = sCommand2.replaceAll(sar[i], sv); } } System.out.println("sCommand2 =: " + sCommand2); return sCommand2;

Evaluation of the final prescanned expression is done using the JParser class eval(final String str) method. The eval method for arithmetic expressions does addition, subtraction, multiplication, division, exponentiation (using the ^ symbol), and a few basic functions like sqrt, log, atan, exp, sin, cos, and tan.. It supports grouping using (), and it gets the operator precedence and associativity rules correct. The parser is a recursive descent parser so it internally uses separate parse methods for each level of operator precedence in its grammar. A posting of some of the code is in the public domain and can be found here.

public double eval(final String str) { return new Object() { int pos = -1, ch; void nextChar() { ch = (++pos < str.length()) ? str.charAt(pos) : -1; } boolean eat(int charToEat) { while (ch == ' ') nextChar(); if (ch == charToEat) { nextChar(); return true; } return false; } double parse() { nextChar(); double x = parseExpression(); if (pos < str.length()) throw new RuntimeException("Unexpected: " + (char)ch); return x; } double parseExpression() { double x = parseTerm(); for (;;) { if (eat('+')) x += parseTerm(); else if (eat('-')) x -= parseTerm(); else return x; } } double parseTerm() { double x = parseFactor(); for (;;) { if (eat('*')) x *= parseFactor(); else if (eat('/')) x /= parseFactor(); else if (eat('^')) x = Math.pow(x, parseFactor()); else return x; } } double parseFactor() { if (eat('+')) return parseFactor(); if (eat('-')) return -parseFactor(); double x; int startPos = this.pos; if (eat('(')) { x = parseExpression(); eat(')'); } else if ((ch >= '0' && ch <= '9') || ch == '.') { while ((ch >= '0' && ch <= '9') || ch == '.') nextChar(); x = Double.parseDouble(str.substring(startPos, this.pos)); } else if (ch >= 'a' && ch <= 'z') { while (ch >= 'a' && ch <= 'z') nextChar(); String func = str.substring(startPos, this.pos); x = parseFactor(); if (func.equals("sqrt")) x = Math.sqrt(x); else if (func.equals("log")) x = Math.log(x); else if (func.equals("atan")) x = Math.atan(x); else if (func.equals("exp")) x = Math.exp(x); else if (func.equals("sin")) x = Math.sin(Math.toRadians(x)); else if (func.equals("cos")) x = Math.cos(Math.toRadians(x)); else if (func.equals("tan")) x = Math.tan(Math.toRadians(x)); else throw new RuntimeException("Unknown function: " + func); } else { throw new RuntimeException("Unexpected: " + (char)ch); } return x; } }.parse(); }

In order to demonstrate the program, the Console class has been used. A more detailed description of the class methods can be found here. Note that the JParser.eval method alone does not have the capability of assignment. That is, an expression such as ‘r = 10.2 * x’ will be rejected as an error. The workaround for this problem was to use the Console to recognize when an equals sign is contained in the initial input expression, if so, to separate the assignment from the evaluation part, and process each separately, reestablishing their connection following evaluation and adding or updating that was assigned to the variable map. This is implicit variable initialization. In other words, simply assigning a variable name, whether or not it is already mapped, will result in a new variable if it is not, or an updating of an existing variable. Notice that the Console class utilizes the NetBeans StdOut as well as it’s own console window output, thus enhancing the task of debugging.

A built in help method is available from the console by typing in ‘help’ or ‘?’. The following will appear:

public class Console extends javax.swing.JFrame { public Path p; public JParser parser = new JParser(); public JVarMap varmap = new JVarMap(); public JPreScan prescan = new JPreScan(); public Console() { initComponents(); p = Paths.get("./data/varmap.dat"); try { varmap.readMap(p); varmap.dumpKeys(); } catch (FileNotFoundException ex) { System.out.println(String.format("%s", ex.getMessage())); } } private void jTextArea1KeyPressed(java.awt.event.KeyEvent evt) { if (evt.getKeyCode() == KeyEvent.VK_ENTER) { boolean bAssign = false; int nIndex = sCommand.indexOf('='); else if(nIndex >= 0) { System.out.println("Assignment of new or update of variable plus evaluation"); svarnam = sCommand.substring(0, nIndex-1); sCommand = sCommand.substring(nIndex + 1, sCommand.length()); JPreScan prescan = new JPreScan(varmap, sCommand); sCommand = prescan.getOutStr(); try { double dval = parser.eval(sCommand); if(varmap.isKey(svarnam)) varmap.Update(svarnam, dval); else varmap.Add(svarnam, dval); String smtx = Double.toString(dval); smtx += "\n"; appendString(smtx); } catch (RuntimeException ex) { System.out.println(String.format("Warning: %s", ex.getMessage())); appendString("Unknown variable or function\n"); } } 

Points of Interest

The main challange has been to modify this parser to handle more complex variables such as vectors and matrices, and functions that require multiple input parameters. To date, I have been unsuccessful in doing so. I have even tried substituting the Shunting-Yard Algorithm parser using reverse polish notation with some limited success. If one could devise a code in the public domain that was readable, expandable, easily modified, and reusable, it would be of considerable value to many programmers. But vague allusions as to how various parser classes might be extended is all that I have found.


Version May 26, 2017