Putting a Java Interface on your C, C++, or Fortran Code
Abstract : The purpose of this report is to document some of the technical aspects of creating Java interfaces for codes written in languages other than Java. We outline a procedure where one separates the construction of the interface from the external codes with the introduction of an intermediate "wrapper" class. This intermediate class serves to isolate user interface details from the details of calling external routines. This intermediate class also facilitates the incorporation of external routines into Java based systems for distributed computing and/or visual programming.
Chris Anderson
Department of Mathematics
UCLA Los Angeles, CA 91555
7/15/97
These software components were developed in conjunction with the research supported by Air Force Office of Scientific Research Grant F49620-96-I-0327 and National Science Foundation/ARPA Grant NSF-DMS-961584
Introduction
While people are debating whether or not Java is good for computationally intensive tasks, the fact is that C, C++ and Fortran are the primary languages for those who do scientific/technical computing. It also seems unlikely that this situation will change in the near future. Unfortunately, C, C++ and Fortran do not contain (as Java does) standardized and platform independent constructs for creating user interfaces, managing threads, networking, and a variety of other tasks associated with creating "applications". Thus, there is interest in creating applications in which the user interface, or other "application packaging", is written in Java, but the core computational component is written in C, C++, or Fortran (see Figure 1). The purpose of this document is to describe, principally by means of an extended example, the process of creating a Java interface for a program written in C, C++ or Fortran.
In order to create applications which have the form indicated in Figure 1, one needs to know how to write Java interfaces and how to call routines written in C, C++ and Fortran from Java. The process of writing a Java interface is well described in a variety of books [1][2][3] [4] and we will assume that the reader is capable of writing a modest Java interface which accepts input and displays output to a user. The task of calling routines written in C, C++ and Fortran from Java comes under the heading of implementing and using "native" methods. Here too, other documents [1][7] describe the process of interfacing Java to other languages. While, for completeness, we will outline the steps required to create and implement Java classes with native methods, we assume that the reader has implemented a Java class that has at least one native method procedure in it (e.g. the "Hello World" example of [7]).
In one aspect, this report is the presentation of an extended example demonstrating how this knowledge of writing Java interfaces and implementing native methods can be combined to create a Java/"other language" application. In addition to providing samples of the mechanisms for data exchange, the example also reveals the choices we made (and choices you will have to make) concerning the dividing line between the Java interface and routines written in C, C++ or Fortran. Our example concerns the creation of a Java interface for a program which solves the heat equation in a two dimensional rectangular region. Examples in C++ and Fortran are given (as is readily seen the C++ example is very close to what might be composed in C).
In the first section we outline the process that we follow for creating applications of the type described by Figure 1. In the second section we present the example which will form the basis of our discussion, and in the third and fourth sections we detail the construction of the Java classes which form the primary components of the application.
The process of creating a Java interface to C, C++ and Fortran routines
The process that we use for creating Java interfaces consists of the three steps indicated in Figure 2.
A noticeable feature of the process is that we utilize three steps, rather than two. One may wonder about the need for the intermediate step: that of writing an intermediate class that "wraps" the C, C++ or Fortran code. Originally we didn't have three steps, but adopted this practice for several reasons:
The example program written in "another" language.
The starting point for the process of writing a Java interface is to have a program or a selected set of code components that one wishes to write interfaces for. Rather than discuss the process of writing interfaces in an abstract way, we discuss the process for a specific example. The example program is one that computes the evolution of the temperature of a rectangular plate. The main driver routine is given below (as well as in the file tempCalc.cpp); the include file for the functions which the main routine calls is given in tempCalcRoutines.h and the source for these routines is given in tempCalcRoutines.cpp.
In the first part of the main routine, the problem and run parameters are set, memory is allocated and the temperature distribution is initialized. A time stepping loop is then executed. In this loop, the temperature of the plate is evolved in time increments of size dt by calling the routine evolveTemperature(...), and after every predetermined number of time steps the temperature distribution is output (in this case, written to a file tempOut.dat).
Even though the temperature values are associated with a two-dimensional set of nodes covering the plate, we allocate and pass one-dimensional arrays of values. This was done because the standard method for exchanging data with other languages is through one-dimensional arrays; Java is no exception. Using one-dimensional sets of data values does not preclude using a two-dimensional array structure to access the data. The routines create2dArrayStructure(...) and destroy2dArrayStructure(...) in tempCalcRoutines.cpp demonstrate how one can create a two-dimensional array structure which accesses the data allocated as a one-dimensional array.
#include <iostream>
#include <fstream>
#include "tempCalcRoutines.h"
using namespace std;

int main()
{
    //
    // Set Problem parameters
    //
    double diffusivity = 0.1;
    double a = 0.0; double b = 1.0;
    double c = 0.0; double d = 1.0;
    //
    // Set Runtime parameters
    //
    long m = 10;
    long n = 20;
    long nSteps = 100;
    long nOut = 10;
    double dt = 0.01;
    //
    // Allocate space for solution and work arrays
    //
    double* Tarray = new double[m*n];
    double* workArray = new double[m*n];
    //
    // Open output file
    //
    ofstream Fout("tempOut.dat");
    initializeTemperature(Tarray,m,n,a,b,c,d);
    double time = 0.0;
    int i; int j;
    for(i = 1; i <= nSteps; i++)
    {
        evolveTemperature(Tarray,m,n,a,b,c,d,dt,diffusivity,workArray);
        time = time + dt;
        if((i%nOut)== 0)
        {
            cout << " Step " << i << endl; // print out step to screen
            Fout << m << " " << n << endl; // output to file tempOut.dat ...
            Fout << time << endl;
            for(j = 0; j < m*n; j++){Fout << Tarray[j] << endl;}
        }
    }
    delete [] Tarray;
    delete [] workArray;
    return 0;
}
This program is typical of many computationally intensive applications; data is allocated, parameters and values are initialized, and then a time stepping loop is executed. As the calculation proceeds data is output periodically.
The Fortran version of this program is given in tempCalc.f and the supporting routines are given in tempCalcRoutines.f. One may notice that the C++ program is nearly identical to the Fortran program and does not use any of the object oriented features of C++ (i.e. it does not utilize classes). This was done intentionally so that the code would serve as an example of codes which are likely to be used (and/or written) by the majority of those involved in scientific/technical computation.
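To make the storage convention described in this section concrete, here is a minimal Java sketch of a two-dimensional view over a one-dimensional array, the same idea that create2dArrayStructure(...) provides on the C++ side. The class name FlatGrid and the choice that the first index varies fastest are assumptions made for illustration; they are not taken from tempCalcRoutines.cpp.

```java
// Hypothetical helper (not part of the report's source files): a 2-D view
// over a flat double[] so the data can cross a language boundary as one block.
final class FlatGrid {
    final int m;          // number of nodes in the first index direction
    final int n;          // number of nodes in the second index direction
    final double[] data;  // flat storage of length m*n

    FlatGrid(int m, int n) {
        this.m = m;
        this.n = n;
        this.data = new double[m * n];
    }

    // Assumed layout: the first index varies fastest (Fortran-style ordering).
    int index(int i, int j) { return i + j * m; }

    double get(int i, int j) { return data[index(i, j)]; }

    void set(int i, int j, double value) { data[index(i, j)] = value; }
}

final class FlatGridDemo {
    public static void main(String[] args) {
        FlatGrid t = new FlatGrid(10, 20);
        t.set(3, 4, 1.5);
        // The 2-D accessor and the flat array refer to the same storage.
        System.out.println(t.get(3, 4) == t.data[3 + 4 * 10]);
    }
}
```

Only the flat array `data` would be handed to an external routine; the accessors exist purely for convenience on the Java side.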
The Java class that encapsulates the C, C++ or Fortran code components.
The second step in the process of creating an interface is to create a Java wrapper class that encapsulates the C, C++ or Fortran code components. It is in this class that the connection between the external routines and the corresponding Java routines is made. This class is also responsible for "loading" the external routines.
Essentially, this class replaces the main() routine. In this regard the class allocates the required arrays, contains the parameters as data members and also contains the methods (declared native) which are invoked by the main() driver routine (the initializeTemperature and evolveTemperature routines).
To facilitate the execution of the program as a separate thread, this class implements the Runnable interface (it implements a run() method). In this run() method, we have changed the output process to be one which displays a color contour plot of the data, rather than write the output to a file. The requisite Java classes are contained in the files ColorContourPlot.java, ColorContourCanvas.java and ContourResolutionDialog.java.
Lastly, this class also includes a main routine of its own for testing purposes. The complete code is given in TempCalcJava.java.
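The overall shape of such a wrapper class can be sketched as follows. This is not the code in TempCalcJava.java: the names TemperatureKernel, TempCalcSketch and UniformDecayKernel are hypothetical, and the two external routines are represented by an ordinary Java interface with a placeholder implementation so that the sketch runs without a native library. In the actual class these routines would be declared native, and the contour plot would be updated inside the loop.

```java
// Hypothetical stand-in for the external routines. In the report these are
// methods declared `native` and implemented in C++ or Fortran.
interface TemperatureKernel {
    void initialize(double[] t, int m, int n);
    void evolve(double[] t, int m, int n, double dt, double diffusivity, double[] work);
}

// Placeholder numerics (uniform decay); this is NOT the heat-equation solver.
final class UniformDecayKernel implements TemperatureKernel {
    public void initialize(double[] t, int m, int n) {
        java.util.Arrays.fill(t, 1.0);
    }
    public void evolve(double[] t, int m, int n, double dt, double diffusivity, double[] work) {
        for (int j = 0; j < t.length; j++) {
            t[j] *= (1.0 - diffusivity * dt);
        }
    }
}

// Sketch of a wrapper that takes over the role of the C++ main() routine.
final class TempCalcSketch implements Runnable {
    // Problem and run parameters become data members.
    double diffusivity = 0.1;
    int m = 10;
    int n = 20;
    int nSteps = 100;
    double dt = 0.01;
    double time = 0.0;

    // The wrapper owns the arrays that are handed to the external routines.
    final double[] tArray = new double[m * n];
    final double[] workArray = new double[m * n];
    private final TemperatureKernel kernel;

    TempCalcSketch(TemperatureKernel kernel) { this.kernel = kernel; }

    // The time stepping loop of main() moves into run(), so it can be a thread body.
    @Override
    public void run() {
        kernel.initialize(tArray, m, n);
        for (int i = 1; i <= nSteps; i++) {
            kernel.evolve(tArray, m, n, dt, diffusivity, workArray);
            time += dt;
        }
    }
}

final class TempCalcSketchDemo {
    public static void main(String[] args) {
        TempCalcSketch calc = new TempCalcSketch(new UniformDecayKernel());
        calc.run(); // called directly here; a GUI would wrap it in a Thread
        System.out.println("final time = " + calc.time);
    }
}
```

Keeping the parameters and arrays as members means a user interface can set them before starting the thread, without knowing anything about how the external routines are called.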
Since the TempCalcJava class has native methods, one must create the dynamically linked library (DLL) or shared library that contains their implementations. As outlined in the discussions on implementing native methods [1][7], this is a multi-step process:
See "Native Method Implementation Process" for a diagram of these steps.
At this point, if the native method implementation process is successful, one should be able to run a "command line" version of the program by executing the main routine of the class, i.e. just execute "java TempCalcJava". Problems which occur at this point are often caused by an incorrect or missing specification of the path that is searched for the library containing the native method implementation. On PC/Windows platforms the PATH variable must include the directory containing the native method implementation DLL. On UNIX machines running Solaris the LD_LIBRARY_PATH variable must include the directory containing the native method implementation shared library.
The Java Interface
The third step in writing the interface is to write the Java class that implements the interface. Minimally this means creating a Java application that possesses program control buttons and fields for data input. The interface is displayed below, and the associated Java code is contained within TemperatureApp.java.
This user interface was constructed using tools that generate Java 1.0.2. However, since our implementation of native methods is Java 1.1 based, after the initial construction, we compiled and worked with this code using the Java 1.1 compiler. In the Java 1.1 compilation step one must specify the flag "-deprecation" and put up with all the warnings that are generated. Hopefully the interface construction tools will support Java 1.1 soon and these nuisances will disappear.
In looking over the Java code, one should take note that the computationally intensive part of the application is done as a separate thread [6]. Specifically, within the code which gets executed when the Run button is hit (the code fragment is given below) we create threads for the separate components of the application--- one thread for the calculation component and one thread for the color contour plot. The calculation component thread is given a lower priority, so that on machines whose implementation of the Java virtual machine doesn't time-slice among equal priority threads, the computationally intensive component will not cause the user interface to "freeze".
void RunButton_Clicked(Event event)
{
    *
    *
    *
    //
    // Set up and start the threads for the contour plot and the
    // calculation
    //
    Thread current = Thread.currentThread(); // capture current thread
    Thread contourThread = new Thread(temperatureRun.contourPlot);
    contourThread.start();
    TempRunThread = new Thread(temperatureRun);
    TempRunThread.setPriority(current.getPriority() -1);
    TempRunThread.start();
}
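The priority pattern used in this fragment can be isolated in a small self-contained sketch. The class and method names below are hypothetical; the only addition to the idea in the fragment is a clamp so that the requested priority never falls below Thread.MIN_PRIORITY, which setPriority would otherwise reject.

```java
final class LowPriorityDemo {
    // Starts `work` on a thread one priority level below the calling thread,
    // never going below Thread.MIN_PRIORITY.
    static Thread startBelowCurrent(Runnable work) {
        Thread current = Thread.currentThread();
        Thread worker = new Thread(work);
        int lowered = Math.max(Thread.MIN_PRIORITY, current.getPriority() - 1);
        worker.setPriority(lowered);
        worker.start();
        return worker;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t = startBelowCurrent(() -> System.out.println("computing..."));
        t.join(); // wait for the background work to finish
    }
}
```

How much a lower priority helps depends on the virtual machine and operating system scheduler, so this should be read as a hint to the scheduler rather than a guarantee.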
The color contour plot that results from the execution of the program is given below
//
Java/Fortran Interface
This tutorial gives an example of how to interface Fortran and Java. It is intended to supplement the example given in Java/Fortran Interface tips and it's a good idea to read that site first. The example was developed on Sun Solaris using Sun Workshop 5.0 C++ and Fortran compilers and Sun JDK 1.2. Because the example is no longer fully portable, it may not work on your system and I don't know enough about other systems to really help you. So, if you do manage to get things running on other systems then please send in your tips and makefiles for the benefit of other users.
Why do it?
Java provides object-oriented programming, easy visualisation and true portability. Fortran has none of these things but does offer highly optimised compilers (with the option of parallelisation), excellent numerical libraries, simplicity, maths support (e.g. complex numbers and multi-dimensional arrays) and masses of legacy code all of which make it excellent for number crunching. Although one might like to move over to pure Java, the performance advantages of Fortran or the effort of converting legacy code often enforces a dual approach with an interface between the two languages.
Introduction
This tutorial will work through an example which sets some variables in Java, changes them in Fortran and shows the effect in Java. The Java program must call the Fortran subroutine via a C++ wrapper (as explained in Java/Fortran Interface tips) because Java native methods can only be bound to C or C++ functions. The variables are integers, doubles, booleans and strings, one of each with the suffix "_A" and one with the suffix "_B". Each variable will be set in Java and reset in Fortran. However, only the changes in the "_A" variables are passed through to Java via the C++ wrapper. In addition two 2-D arrays (a double precision array and a complex double precision array) are passed to Fortran. These arrays are overwritten in the Fortran subroutine. Because Java has no contiguous multi-dimensional arrays (a Java 2-D array is an array of separate row arrays), a 2-D array of size (mx,my) is referenced as a 1-D array of size (mx*my). In addition, Java does not support complex numbers so a 2-D complex array of size (mx,my) is referenced as a 1-D array of size (2*mx*my).
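The index arithmetic implied by this packing can be sketched in Java as follows. The helper names are hypothetical, and the sketch assumes Fortran's column-major ordering with 1-based indices and a complex value stored as a (real, imaginary) pair of consecutive doubles; check JniDemo.java for the layout actually used there.

```java
final class FortranLayout {
    // Offset of the Fortran element a(i, j) (1-based) inside a flat Java
    // array of length mx*my, assuming column-major storage.
    static int realIndex(int i, int j, int mx) {
        return (i - 1) + (j - 1) * mx;
    }

    // Offset of the real part of the complex element c(i, j) inside a flat
    // Java array of length 2*mx*my; the imaginary part is at offset + 1.
    static int complexIndex(int i, int j, int mx) {
        return 2 * realIndex(i, j, mx);
    }

    public static void main(String[] args) {
        int mx = 3, my = 2;
        double[] complexArray = new double[2 * mx * my];
        int k = complexIndex(2, 2, mx);
        complexArray[k] = 1.0;      // real part of c(2,2)
        complexArray[k + 1] = -1.0; // imaginary part of c(2,2)
        System.out.println("c(2,2) is stored at offsets " + k + " and " + (k + 1));
    }
}
```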
The Java program
The Java program initialises the variables and echoes them to the screen, then calls the Fortran subroutine through its C++ wrapper and echoes the updated variable values. In the line
public native void cppsub(int int_A, int int_B, double double_A, double double_B, boolean bool_A, boolean bool_B, int mx, int my, double[] doubleArray, double[] complexArray, String string_A, String string_B);
we declare a native routine "cppsub" (our C++ wrapper). In the line
static { System.loadLibrary("JniDemo");}
we load in the dynamic library "libJniDemo.so". The Java code is compiled by "javac JniDemo.java" and is found here: JniDemo.java. Notice how the 1-D Java arrays are unpacked to access the 2-D Fortran arrays.
Java include file
We can automatically generate a Java include file for the native routine "cppsub" from JniDemo.class by typing "javah -jni JniDemo". The include file is found here: JniDemo.h. It gives us the declaration of the subroutine "Java_JniDemo_cppsub" for the C++ wrapper. All the Java integers, doubles etc. have been passed as C++ types jint, jdouble etc. specified in "jni.h".
The C++ wrapper
The C++ wrapper is the heart of the interface and must do more than simply pass its parameters to Fortran. The link between Fortran and C++ is unfortunately not standardised. Information on the Solaris interface is found here: Fortran/C Interface. The line
extern "C" void f90sub_(int* int_A, int* int_B, double* double_A, double* double_B, unsigned char* bool_A, unsigned char* bool_B, int* mx, int* my, double* doubleArray, double* complexArray, const char* str_A, const char* str_B, int* str_A_len, int* str_B_len);
declares the externally compiled Fortran subroutine "f90sub" which gets an underscore added to its name. The routine is called with:
f90sub_(&int_A, &int_B, &double_A, &double_B, &bool_A, &bool_B, &mx, &my, daPtr, caPtr, str_A, str_B, &str_A_len, &str_B_len);
All the variables are passed by reference. Note that the jbooleans must be passed to Fortran as type unsigned char.
Let us first look at the scalar variables (jint, jdouble, jboolean). These can simply be passed by reference to Fortran. On return from the Fortran subroutine, the corresponding Java variables ending in "_A" are updated by:
jclass cls = env->GetObjectClass(obj);
jfieldID fid;
....
// int_A is an int field (JNI signature "I")
fid = env->GetFieldID(cls, "int_A", "I");
if (fid==0){
    return;
}
env->SetIntField (obj, fid, int_A);
// double_A is a double field (JNI signature "D")
fid = env->GetFieldID(cls, "double_A", "D");
if (fid==0){
    return;
}
env->SetDoubleField (obj, fid, double_A);
// bool_A is a boolean field (JNI signature "Z"), so it is set with SetBooleanField
fid = env->GetFieldID(cls, "bool_A", "Z");
if (fid==0){
    return;
}
env->SetBooleanField (obj, fid, bool_A);
where we directly overwrite the variables "int_A", "double_A" and "bool_A" in the calling object "obj".
For the arrays, we create pointers to those arrays by calling a JNI routine "GetDoubleArrayElements" and pass those to the Fortran subroutine. Before leaving the C++ wrapper we must call "ReleaseDoubleArrayElements" in case Java had to make a copy of the array (see Java Native Interface tutorial ).
jdouble* caPtr = env->GetDoubleArrayElements(complexArray,0);
jdouble* daPtr = env->GetDoubleArrayElements(doubleArray,0);
...
env->ReleaseDoubleArrayElements(complexArray, caPtr,0);
env->ReleaseDoubleArrayElements(doubleArray, daPtr,0);
Passing strings is slightly more complicated. We first obtain the pointers "str_A" and "str_B" using "GetStringUTFChars" and the lengths of the strings using "GetStringUTFLength", and pass those to Fortran. "GetStringUTFLength" counts the bytes of the UTF-8 data that the pointers refer to, whereas "GetStringLength" counts Java characters; the two agree only for ASCII text:
jstring jstr;
int str_A_len,str_B_len;
const char *str_A = env->GetStringUTFChars(string_A, 0);
str_A_len=env->GetStringUTFLength(string_A);
const char *str_B = env->GetStringUTFChars(string_B, 0);
str_B_len=env->GetStringUTFLength(string_B);
Next we make a copy of the string "str_A" to ensure it is only "str_A_len" characters long:
char buf[128]="";
// copy at most sizeof(buf)-1 characters so that buf cannot overflow
strncat (buf,str_A,(str_A_len < 127) ? str_A_len : 127);
Now we create the jstring "jstr" using the JNI method "NewStringUTF" and use that to overwrite the Java string "string_A" in the calling program:
fid = env->GetFieldID(cls, "string_A", "Ljava/lang/String;");
if (fid==0){
return;
}
jstr = env->NewStringUTF(buf);
env->SetObjectField(obj, fid, jstr);
Finally, we must release the string elements:
env->ReleaseStringUTFChars(string_A, str_A);
env->ReleaseStringUTFChars(string_B, str_B);
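One detail worth checking when a string length is passed alongside UTF-8 character data is that the number of Java characters and the number of encoded bytes coincide only for ASCII text. The following Java sketch illustrates the difference; the class and method names are hypothetical. It uses standard UTF-8, which matches the modified UTF-8 used by JNI for the characters shown here.

```java
import java.nio.charset.StandardCharsets;

final class StringLengthDemo {
    // Number of Java (UTF-16) characters in the string.
    static int javaCharCount(String s) {
        return s.length();
    }

    // Number of bytes in the UTF-8 encoding of the string.
    static int utf8ByteCount(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String ascii = "abc";
        String accented = "caf\u00e9"; // the last character needs two bytes in UTF-8
        System.out.println(javaCharCount(ascii) + " chars, " + utf8ByteCount(ascii) + " bytes");
        System.out.println(javaCharCount(accented) + " chars, " + utf8ByteCount(accented) + " bytes");
    }
}
```

If the Fortran side only ever receives ASCII strings the distinction does not matter, but a length taken from the byte count is the safer choice for a character buffer.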
Here is the C++ wrapper code: JniDemo.cpp. In Solaris, the C++ wrapper is compiled to create the dynamic library "JniDemo.so" with something like:
CC -G -I/usr/java/include -I/usr/java/include/solaris -o JniDemo.so JniDemo.cpp
The Fortran subroutine
The Fortran subroutine first prints out all the variables passed to it, then overwrites them all and finally prints out the variables again. Note that booleans (Fortran logicals) must be converted from the unsigned chars that were passed from the C++ wrapper with the lines:
bool_A=(cbool_A.eq.char(1))
bool_B=(cbool_B.eq.char(1))
At the end, the Fortran logicals must be converted back to unsigned characters with the lines:
cbool_A=char(0)
cbool_B=char(0)
if(bool_A) cbool_A=char(1)
if(bool_B) cbool_B=char(1)
The passing of character strings between C++ and Fortran is not standardised so we have used character arrays and passed a separate integer with the string length. The arrays are indexed in the normal Fortran fashion. The Fortran subroutine is found here: JniDemo_fortran.f. In Solaris, we compile the Fortran subroutine and link in the C++ wrapper like this:
f90 -lF77_mt -lM77 -lfsu -lf77compat -G -L/usr/java/jre/lib/sparc -ljava -I/usr/java/include -I/usr/java/include/solaris -o libJniDemo.so JniDemo.so JniDemo_fortran.f
The makefile
Here is the makefile for Sun Solaris: makefile. It is hopefully easy to adapt to your system. Note that files with the suffix ".so" are dynamic libraries (equivalent to ".DLL" files in Windows). The "-G" compiler flag signals creation of a dynamic library. The "-o" flag names the output file. The "-I" flag names directories that are searched for include files. The "-L" flag names the directory where libraries to be linked to are kept and the "-l" flag names individual libraries (the names are expanded, e.g. -ljava indicates the file libjava.so). Here, dynamic libraries are used throughout rather than object files. The final linking step must be done with the Fortran f90 compiler so the dynamic library "libJniDemo.so" is created by linking the libraries "JniDemo.so" and "JniDemo_fortran.so" together.
Output of the program
Compile the program using "make" and run it using "java JniDemo". You should get the output shown here: Output of "java JniDemo". Note that the print statements may be jumbled as you shouldn't really mix Java and native input/output. I hope you got it to work. Now try adapting it to suit your needs. If there's anything that hasn't been covered then let me know at the address below. Good luck!
Notes
There seems to be a problem (using Sun/Solaris/Workshop) when using e.g. REAL(8) and COMPLEX(16) in Fortran 90 instead of REAL*8 and COMPLEX*16. I get the message:
FATAL ERROR in native method: Try to unpin an object that is not pinned
I don't know why this is and the workaround is to just use the REAL*8 notation. Please email me if you know what's going on...
//
Features of Perl and Python compared with Java, C/C++ and Fortran:
· shorter programs
· much faster development
· no variable declarations
· lots of gluing tools
· lots of text processing tools
· GUIs are simpler/shorter
· lots of Web programming tools
The popularity of scripting is growing fast!
· 70s/80s: Unix shell, job control languages (primitive, cryptic programming)
· 90s: Development of Perl, Python, Ruby, Tcl ++
· Y2K: Scripting languages are stable and powerful on all major platforms
· Next decade: explosive interest???
· Scripting supports software reuse
· Scripting simplifies GUI programming
· Scripts are fast enough on today's computers
· Python supports both short scripts and Java/C++-like systems
· Case: make a window on the screen with the text 'Hello World'
· C + X11: 176 lines of ugly code
· Python + Tk: 6 lines of readable code

#!/usr/bin/env python
from Tkinter import *
root = Tk()
Label(root, text='Hello, World!',
      foreground="white", background="black").pack()
root.mainloop()

· Java and C++ codes are longer than Python + Tk
· Object-oriented programming can also be used to parameterize types
· Introduce base class A and a range of subclasses, all with a (virtual) print function
· Let debug work with var as an A reference
· Now debug works for all subclasses of A
· Advantage: complete control of the legal variable types that debug is allowed to print (may be important in big systems to ensure that a function can only make transactions with certain objects)
· Disadvantage: much more work, much more code, less reuse of debug in new occasions
· Does the application implement complicated algorithms and data structures?
· Does the application manipulate large datasets so that execution speed is critical?
· Are the application's functions well-defined and changing slowly?
· Will strong typing be an advantage, e.g., in large development teams?
· Automate manual interaction with the computer
· Customize your own working environment and become more efficient
· Have more fun
· Increase the reliability of your work
· Scripts (usually) run on Unix, Windows, Mac and many hand-held devices (i.e. scripts are cross-platform)
· You get the power of Unix also in non-Unix environments
· Python is easier to learn because of its clean syntax and simple/clear concepts
· Python supports OO in a much easier way than Perl
· GUI programming in Python is easier
· Documenting Python is easier because of doc strings and more readable code
· Complicated data structures are easier to work with in Python
· Python is simpler to integrate with C++ and Fortran
· Python can be seamlessly integrated with Java
· Perl is more widespread
· Perl has more additional modules
· Perl is faster
· Perl enables many different solutions to the same problem
· Perl programming is more fun (?) and more intellectually challenging (?)
Python is the very high-level Java,
Perl is the very high-level C?
· NumPy enables efficient numerical computing in Python
· NumPy is a Python/C package which offers efficient arrays (contiguous storage) and mathematical operations in C
· Old Numeric module: from Numeric import *
· New Numarray module: from numarray import *
· NumPy contains other modules as well
· Fill a 2D NumPy array with function values:

n = 2000
a = zeros((n,n), Float)
xcoor = arange(0,1,1/float(n))
ycoor = arange(0,1,1/float(n))
for i in range(n):
    for j in range(n):
        a[i,j] = f(xcoor[i], ycoor[j])

· Fortran/C/C++ version: (normalized) time 1.0
· Python version: time 75
· NumPy vectorized evaluation of f: time 3
· Python loops over arrays are extremely slow
· NumPy may be sufficient
· However, NumPy vectorization may be inconvenient
· Write administering code in Python
· Identify bottlenecks (via profiling)
· Migrate slow Python code to Fortran, C, or C++
· A Python variable can hold different objects:

d = 3.2                         # d holds a float
d = 'txt'                       # d holds a string
d = Button(frame, text='push')  # instance of class Button

· In C, C++ and Fortran, a variable is declared of a specific type:

double d; d = 4.2;
d = "some string"; /* illegal, compiler error */

· This difference makes it quite complicated to call C, C++ or Fortran from Python
· Suppose we have a C function

extern double hw1(double r1, double r2);

· We want to call this from Python as

from hw import hw1
r1 = 1.2; r2 = -1.2
s = hw1(r1, r2)

· The Python variables r1 and r2 hold numbers; we need to extract these in the C code, convert to double variables, and then call hw1
· This is done in wrapper code
· Between Python and hw1 we need wrapper code:

static PyObject *_wrap_hw1(PyObject *self, PyObject *args) {
    PyObject *resultobj;
    double arg1;
    double arg2;
    double result;
    /* extract two doubles from the Python argument tuple */
    if(!PyArg_ParseTuple(args,(char *)"dd:hw1",&arg1,&arg2)) {
        return NULL; /* wrong arguments provided */
    }
    result = (double) hw1(arg1,arg2);
    /* convert the C double back to a Python float object */
    resultobj = PyFloat_FromDouble(result);
    return resultobj;
}
· A Python module, pymat, enables communication with Matlab:

from Numeric import *
import math
import pymat
x = arrayrange(0,4*math.pi,0.1)
m = pymat.open()
# can send NumPy arrays to Matlab:
pymat.put(m, 'x', x);
pymat.eval(m, 'y = sin(x)')
pymat.eval(m, 'plot(x,y)')
# get a new NumPy array back:
y = pymat.get(m, 'y')
· The demonstrated functionality can be coded in Matlab as well
· Why Python + F77?
· We can define our own interface in a much more powerful language (Python) than Matlab
· We can much more easily transfer data to and from our own F77 or C or C++ libraries
· We can use any appropriate visualization tool
· And, of course, we can call up Matlab if we want
· Python + F77 gives tailored interfaces and maximum flexibility
· The basic variable types are primitive
· No struct or class construct in F77
· No dynamic memory allocation
· Very primitive language for text processing, lists and more complicated data structures
· F77 codes are difficult to extend, maintain and re-use
· F90/F95: extension of F77 with classes
· The efficiency of F90/F95 is still problematic, and F77 programmers move very slowly to F90/F95
· F77 is a very simple language
· F77 is easy to learn
· F77 was made for computing mathematical formulas and operating efficiently on plain arrays
· F77 is still (and will forever be?) the dominating language for numerical computing
· Lots of well tested and efficient F77 libraries exist
· Compilers have 50 years of experience with optimizing F77 loops over arrays
· Result: F77 is hard to beat when it comes to array operations
· Use F77 for time-critical loops (array operations)
· Make the rest of the program in Python, C++, Java or F95
· Python and F77 is an easy-to-use pair!
· Note: integration of F77 with C++ or Java is not always straightforward
//
A programmer can be significantly more productive in Python than in Java. How much more productive? The most widely accepted estimate is 5-10 times. On the basis of my own personal experience with the two languages, I agree with this estimate.
Managers who are considering adding Python to their organization's list of approved development tools, however, cannot afford to accept such reports uncritically. They need evidence, and some understanding of why programmers are making such claims. This page is for those managers.
On this page, I present a list of side-by-side comparisons of features of Java and Python. If you look at these comparisons, you can see why Python can be written much more quickly, and maintained much more easily, than Java. The list is not long -- it is meant to be representative, not exhaustive.
This page looks only at programmer productivity, and does not attempt to compare Java and Python on any other basis. There is, however, one related topic that is virtually impossible to avoid. Python is a dynamically-typed language, and this feature is an important reason why programmers can be more productive with Python; they don't have to deal with the overhead of Java's static typing. So the debates about Java/Python productivity inevitably turn into debates about the comparative advantages and drawbacks of static typing versus dynamic typing (or strong typing versus weak typing) in programming languages. I will not discuss that issue here, other than to note that in the last five years a number of influential voices in the programming community have been expressing serious doubts about the supposed advantages of static typing. For those who wish to pursue the matter, Strong versus Weak Typing: A Conversation with Guido van Rossum, Part V is a good place to start. See also Bruce Eckel's weblog discussion Strong Typing vs. Strong Testing and Robert C. Martin's weblog discussion Are Dynamic Languages Going to Replace Static Languages?. For background, see one of the papers that started it all in 1998, Scripting: Higher Level Programming for the 21st Century by John Ousterhout.
Several of these discussions contain valuable comparisons of Java and Python. For other language comparisons, see the Python language comparisons page at www.python.org, and the PythonComparedToJava page at Python for Java Programmers.
Finally, it is important to note that asserting that a programmer can be more productive in Python than in Java, is not the same as asserting that one ought always to use Python and never to use Java. Programming languages are tools, and different tools are appropriate for different jobs. It is a poor workman whose toolbox contains only a hammer (no matter how big it is!), and it is a poor programmer (or software development organization) whose development toolkit contains only one programming language. Our toolboxes should contain both Python and Java, so that in any given situation we have the option of choosing the best tool for the job. So our claim is not that Python is the only programming language that you'll ever need — only that the number of jobs for which Python is the best tool is much larger than is generally recognized.
There are three main language characteristics that make programmers more productive with Python than with Java.
Java | Python
statically typed: In Java, all variable names (along with their types) must be explicitly declared. Attempting to assign an object of the wrong type to a variable name triggers a type error. That's what it means to say that Java is a statically typed language. Java container objects (e.g. Vector and ArrayList) hold objects of the generic type Object, but cannot hold primitives such as int. To store an int in a Vector, you must first convert the int to an Integer. When you retrieve an object from a container, it doesn't remember its type, and must be explicitly cast to the desired type. | dynamically typed: In Python, you never declare anything. An assignment statement binds a name to an object, and the object can be of any type. If a name is assigned to an object of one type, it may later be assigned to an object of a different type. That's what it means to say that Python is a dynamically typed language. Python container objects (e.g. lists and dictionaries) can hold objects of any type, including numbers and lists. When you retrieve an object from a container, it remembers its type, so no casting is required. For more information on static vs. dynamic typing, see the appendix.
verbose: "abounding in words; using or containing more words than are necessary" | concise (aka terse): "expressing much in a few words. Implies clean-cut brevity, attained by excision of the superfluous"
not compact | compact: In The New Hacker's Dictionary, Eric S. Raymond gives the following definition for "compact": Compact adj. Of a design, describes the valuable property that it can all be apprehended at once in one's head. This generally means the thing created from the design can be used with greater facility and fewer errors than an equivalent tool that is not compact.
The reason for this is that Java, virtually alone among object-oriented programming languages, uses checked exceptions: exceptions that must be caught or declared (with a throws clause) by every method in which they might appear, or the code will fail to compile. Recently (as of June 2003) there seems to be an increasing amount of unhappiness with Java's use of checked exceptions. See Bruce Eckel's "Does Java need Checked Exceptions?" and Rod Waldhoff's "Java's checked exceptions were a mistake".
As chromatic, the Technical Editor of the O'Reilly Network, put it:
In Why Python? Eric S. Raymond notes that:
Python ... is compact--you can hold its entire feature set (and at least a concept index of its libraries) in your head.
In Why I Love Python Bruce Eckel notes that Java is not compact.
I can remember many Python idioms because they're simpler. That's one more reason I program faster [in Python]. I still have to look up how to open a file every time I do it in Java. In fact, most things in Java require me to look something up.
Java's string-handling capabilities are surprisingly weak. (But they have improved considerably with the addition of the split method to the String class in Java 1.4.)
Example
Code to add an int to a Vector, and then retrieve it. This example illustrates both the verbosity and non-compactness of Java, when compared to Python.
In Java, a new Integer object must be created and initialized from the int before it can be added to the Vector. In order to retrieve the value, the member of the Vector must be cast back to an Integer, and then converted back to an int.
In Python, none of these conversions are necessary.
Java |
Python |
public Vector aList = new Vector();
public int aNumber = 5;
public int anotherNumber;
aList.addElement(new Integer(aNumber));
anotherNumber = ((Integer)aList.elementAt(0)).intValue();
|
aList = []
aNumber = 5
aList.append(aNumber)
anotherNumber = aList[0]
|
In particular, this example illustrates the clumsiness of container objects in Java, which require casting an object whenever it is extracted from a container.
It is a common practice among Java programmers to overcome this clumsiness by writing application-specific container objects to wrap Java's generic containers. You might, for example, have an Employees class to wrap a Vector of Employee objects. The get() method of the Employees object would retrieve the appropriate member of the Vector, and cast it back to an Employee object before returning it. This is actually a good practice in Java. But it has the effect of increasing the number of classes in the application -- classes that the programmer must write -- and (because each public top-level Java class must exist in its own file) it increases the number of Java files that the programmer must manage. None of this, of course, is necessary in Python.
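To make the Python side of this comparison concrete, here is a minimal sketch. The Employee class, its fields, and the sample data are hypothetical, invented only for this illustration. It shows an ordinary list holding Employee objects and handing them back with their type intact, so no wrapper class and no cast are involved.

```python
class Employee:
    """Hypothetical record type, used only for this illustration."""

    def __init__(self, name, salary):
        self.name = name
        self.salary = salary


# A plain list serves as the container; no Employees wrapper class is written.
employees = [Employee("Ann", 52000), Employee("Bob", 48000)]

# The retrieved object is still an Employee, so its attributes are used directly.
first = employees[0]
print(first.name)

# Iterating works the same way, again without any cast.
total = sum(e.salary for e in employees)
print(total)
```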
The burden on Java programmers will be eased somewhat in Java 1.5 with the introduction of generics and boxing conversions, also known as autoboxing.
Verbosity is not just a matter of increasing the number of characters that must be typed -- it is also a matter of increasing the number of places where mistakes can be made. The Java code on the left has 5 control characters: ( ) { } ; where the corresponding Python code has only one control character, the colon. (Or two, if you count indentation. See below.)
Java |
Python |
if ( a > b ) { a = b; b = c; } |
if a > b :
    a = b
    b = c
|
Omitting or duplicating such characters is easy to do accidentally, and constitutes a severe error in the code. In my personal estimate, I spend 5 times as much time fixing such errors in Java as I do in Python. It really cuts into your productivity -- and your creative energy -- when you spend that much of your time just trying to satisfy the compiler.
Technically, Python has another control character that Java does not: indentation. But the requirement for correct indentation is the same in Java as it is in Python, because in both languages correct indentation is a practical requirement for human-readable code. The Python interpreter automatically enforces correct indentation, whereas the Java compiler does not. With Java, you need an add-on product such as the Jalopy code formatter to provide automated enforcement of indentation standards.
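The claim that the Python interpreter enforces indentation can be checked directly. The sketch below is only an illustration: the helper name indentation_is_valid and the two sample snippets are my own. It asks Python's built-in compile() to accept a consistently indented block and then a block whose second statement is indented to a level that matches nothing.

```python
# Two versions of the same snippet: one consistently indented, one not.
GOOD_SOURCE = "if a > b:\n    a = b\n    b = c\n"
BAD_SOURCE = "if a > b:\n    a = b\n  b = c\n"  # 'b = c' matches no indentation level


def indentation_is_valid(source):
    """Return True if Python accepts the indentation of `source`."""
    try:
        # compile() only parses the text; the names a, b, c need not exist.
        compile(source, "<example>", "exec")
    except IndentationError:
        return False
    return True


print(indentation_is_valid(GOOD_SOURCE))
print(indentation_is_valid(BAD_SOURCE))
```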
APPENDIX: About static vs. dynamic typing, and strong vs. weak typing, of programming languages.
There is widespread confusion or disagreement about the meanings of the words static, dynamic, strong and weak when used to describe the type systems of programming languages. What follows is a description of the way (or at least one of the ways) these terms are most commonly used.
In a statically typed language, every variable name is bound both (1) to a type (at compile time, by means of a data declaration) and (2) to an object. The binding to an object is optional: if a name is not bound to an object, the name is said to be null. Once a variable name has been bound to a type (that is, declared) it can be bound (via an assignment statement) only to objects of that type; it cannot ever be bound to an object of a different type. An attempt to bind the name to an object of the wrong type will raise a type exception.
In a dynamically typed language, every variable name is (unless it is null) bound only to an object. Names are bound to objects at execution time by means of assignment statements, and it is possible to bind a name to objects of different types during the execution of the program.
Here is an example. In a statically-typed language, the following sequence of statements (which binds an integer object, then a string object, to the name employeeName) is illegal. If employeeName had been declared to be an int, then the second statement would be illegal; if it had been declared to be a String, then the first statement would be illegal. But in a dynamically-typed language this sequence of statements is perfectly fine.
employeeName = 9
employeeName = "Steve Ferg"
|
Python is a dynamically-typed language. Java is a statically-typed language.
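The Python half of this statement can be tried out as written. The following is a small sketch, assuming a Python 3 interpreter; the variable names are chosen for the illustration. It rebinds one name to objects of two different types and also shows that the items in a list each keep their own type.

```python
# The name employee_name is bound first to an int, then to a str.
employee_name = 9
first_type = type(employee_name).__name__

employee_name = "Steve Ferg"
second_type = type(employee_name).__name__

print(first_type, second_type)

# A list can hold objects of different types; each one remembers its type.
mixed = [5, "five", [5.0]]
item_types = [type(item).__name__ for item in mixed]
print(item_types)
```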
In a weakly typed language, variables can be implicitly coerced to unrelated types, whereas in a strongly typed language they cannot, and an explicit conversion is required. (Note that I said unrelated types. Most languages will allow implicit coercions between related types, for example the addition of an integer and a float. By unrelated types I mean things like numbers and strings.) In a typical weakly typed language, the number 9 and the string "9" are interchangeable, and the following sequence of statements is legal.
a = 9
b = "9"
c = concatenate(a, b) // produces "99"
d = add(a, b) // produces 18
|
In a strongly typed language, on the other hand, the last two statements would raise type exceptions. To avoid these exceptions, some kind of explicit type conversion would be necessary, like this.
a = 9
b = "9"
c = concatenate( str(a), b)
d = add(a, int(b) )
|
Both Java and Python are strongly typed languages. Examples of weakly typed languages are Perl and Rexx.
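For Python, the strong-typing behaviour described above can be observed with the built-in + operator standing in for the concatenate and add pseudo-functions. This is a sketch only, assuming Python 3; the helper try_add is hypothetical and exists just to capture the exception.

```python
a = 9
b = "9"


def try_add(x, y):
    """Return x + y, or the name of the exception Python raises."""
    try:
        return x + y
    except TypeError as exc:
        return type(exc).__name__


# Mixing a number and a string is refused rather than silently coerced.
print(try_add(a, b))

# With explicit conversions, both operations succeed.
c = str(a) + b
d = a + int(b)
print(c, d)
```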
A third distinction may be made between manifestly typed languages in which variable names must have explicit type declarations, and implicitly typed languages in which this is not required. Most static languages are also manifestly typed (Java certainly is), but Frank Mitchell notes that some are not: "Haskell and the dialects of ML, for example, can infer the type of any variable based on the operations performed on it, with only occasional help from an explicit type."
Java Python
Used to be "JPython", name recently changed to "jython".
An implementation of the PythonLanguage that generates bytecodes for the Java virtual machine.
from the site ... (some from an older site)
Jython is an implementation of the high-level, dynamic, object-oriented PythonLanguage written in 100% Pure Java, and seamlessly integrated with the Java platform. It thus allows you to run Python on any Java platform. Jython is freely available for both commercial and non-commercial use and is distributed with source code. Jython is complementary to Java and is especially suited for the following tasks:
Strengths:
Weaknesses:
Jan 24th, 2001 08:25
Marcel Oliver, Louis Luang
[Note: This is a complicated question. I am currently trying to make this determination for myself, too. This is what I came up with after a bit of searching and trying. Hopefully more knowledgeable people will add some meat.]
Mathematical Features:
- Currently Matlab sets the standard. Octave is trailing Matlab by several versions. The main gist behind Octave is to provide a free implementation of Matlab.
- Numpy has a set of basic features that cover most of what is required for a standard undergraduate numerical analysis curriculum, (standard numerical linear algebra, FFT) but not much more. Fancy stuff like domain triangulations is not supported and does not seem currently available. [Don't know Octave's status on this particular issue.]
- There are some bindings to standard numerical libraries, e.g. Multipack [Link http://oliphant.netpedia.net is currently broken]
Graphical Capabilities:
- Again, Matlab is state of the art.
- A frequent criticism of Octave is that its graphical capabilities are substandard, in particular with regard to publication-quality 3D graphics. [Current status?]
- There are graphical add-ons for numpy [No experience with any.]
Interfaces to External Code:
- All three are sufficiently able to interface with external C or Fortran code.
Programming Language:
- Numpy builds upon an attractive general-purpose object-oriented programming language.
- Octave and Matlab share mostly the same special-purpose language. Maybe the second weakest point of Matlab after licensing.
- Some clumsiness in Numpy with matrix multiplication. May disappear at some point, see http://python.sourceforge.net/peps/pep-0211.html http://python.sourceforge.net/peps/pep-0225.html
- Existing Python modules should make it easy to interface with Internet or low-level I/O if so desired.
Documentation:
- Matlab: Lots of books and internet resources available.
- Octave: Some Matlab info may work; the user manual is somewhat complete, but somewhat outdated.
- Numpy: Many good books on Python in general, a rather short but up-to-date introduction to Numpy.
Portability:
- Numpy and Octave: generally good [Known problems?]
- Matlab: Any platforms that Mathworks sees fit to support. This currently excludes all non-Intel Linux ports.
License:
- Matlab: proprietary.
- Octave: GPL
- Numpy: Python License, which is considered free.
Conclusion:
I currently plan to give Numpy a try as both a teaching platform and for doing small toy-case research codes. I am not yet sure how this will work out, but I decided for the probably least-established option for the following reasons.
- After falling into the proprietary trap with a well-known program for symbolic math, I do not want to repeat this experience.
- Python as a language has an aesthetic appeal that Matlab/Octave simply do not have. Further, the language is useful far beyond numerical math, which is a big bonus when using it as a teaching language.
PyMat - An interface between Python and MATLAB
NumPy is a set of numerical extensions for Python that introduces a multidimensional array type and a rich set of matrix operations and mathematical functions. Users who have MATLAB 5 installed, however, may wish to take advantage of some of MATLAB's additional functions, including the plotting interface. The PyMat module acts as an interface between NumPy arrays in Python and a MATLAB engine session, allowing arrays to be passed back and forth and arbitrary commands to be executed in the MATLAB workspace.
PyMat is usable on both UNIX and Win32 platforms, although there are some slight differences between the two (mainly due to MATLAB's limitations).
You can download the latest version of PyMat from:
http://claymore.engineer.gvsu.edu/~steriana/Python
open([startcmd])
This function starts a MATLAB engine session and returns a handle that is used to identify the session to all other PyMat functions. The handle is of integer type.
On the Win32 platform, the optional parameter startcmd is always ignored. On UNIX platforms, it indicates the method of starting MATLAB. Quoting from the documentation for the engOpen function from the MATLAB API reference manual:
On UNIX systems, if startcmd is NULL or the empty string, engOpen starts MATLAB on the current host using the command matlab. If startcmd is a host-name, engOpen starts MATLAB on the designated host by embedding the specified hostname string into the larger string:
"rsh hostname \"/bin/csh -c 'setenv DISPLAY hostname:0; matlab'\""
If startcmd is any other string (has white space in it, or nonalphanumeric characters), the string is executed literally to start MATLAB.
On UNIX systems, engOpen performs the following steps:
1. Creates two pipes.
2. Forks a new process and sets up the pipes to pass stdin and stdout from MATLAB (parent) to two file descriptors in the engine program (child).
3. Executes a command to run MATLAB (rsh for remote execution).
Under Windows on a PC, engOpen opens an ActiveX channel to MATLAB. This starts the MATLAB that was registered during installation. If you did not register during installation, on the command line you can enter the command:
matlab /regserver
close(handle)
This function closes the MATLAB session represented by handle.
eval(handle, string)
This function evaluates the given string in the MATLAB workspace. Essentially, it is as if you had typed the string directly in MATLAB's command window.
Note that this function always succeeds without raising an exception unless the handle is invalid, even if the evaluation failed in the MATLAB workspace! You are responsible for verifying successful execution of your code.
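One possible way to do that verification is sketched below. This is not part of PyMat: checked_eval and the scratch variable pymat_ok are hypothetical names of my own. The sketch assumes that the MATLAB version in use understands try/catch, and that get returns a non-empty one- or two-dimensional array for a scalar. It wraps the command so that a flag in the MATLAB workspace is set only when the command runs to completion, then reads the flag back.

```python
def checked_eval(engine, handle, command, flag="pymat_ok"):
    """Evaluate `command` in MATLAB; raise RuntimeError if it did not finish.

    `engine` is any object offering eval(handle, string) and
    get(handle, name) as described in this document, e.g. the pymat module.
    `flag` is a hypothetical scratch variable name in the MATLAB workspace.
    """
    # Reset the flag, then set it to 1 only if the command completes.
    # ASSUMPTION: the MATLAB version in use supports try/catch.
    engine.eval(handle, "%s = 0;" % flag)
    engine.eval(handle, "try, %s; %s = 1; catch, end" % (command, flag))

    # ASSUMPTION: get() returns a non-empty 1-D or 2-D array for a scalar;
    # index into it until a plain number is left.
    value = engine.get(handle, flag)
    while hasattr(value, "__len__"):
        value = value[0]
    if value != 1:
        raise RuntimeError("MATLAB command failed: %r" % command)
```

With the real module this would be called as checked_eval(pymat, H, 'y = dct(x)') in place of pymat.eval(H, 'y = dct(x)'). Passing the engine in as a parameter is a design choice that lets the helper be exercised with a stand-in object when no MATLAB installation is available.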
get(handle, name)
This function retrieves a matrix from MATLAB's workspace and returns a NumPy array with the same shape and contents. The name parameter specifies the name of the array in MATLAB's workspace.
Currently, only one-dimensional and two-dimensional floating-point arrays (real or complex) are supported. Structures, cell arrays, multi-dimensional arrays, etc. are not yet supported. On UNIX systems, this function can also be used to retrieve character strings, in which case a Python string object is returned. This functionality is not supported on Win32 (due to MATLAB restrictions).
put(handle, name, data)
This function places a Python object into MATLAB's workspace under the given name. The data parameter can be one of three things:
array()
function) and that object is instantiated in MATLAB's workspace. The following limitations apply to the current version of PyMat:
Here is a simple example that computes the Discrete Cosine Transform of a short NumPy array using MATLAB's dct function:
>>> import pymat
>>> from Numeric import *
>>> x = array([1, 2, 3, 4, 5, 6])
>>> H = pymat.open()
>>> pymat.put(H, 'x', x)
>>> pymat.eval(H, 'y = dct(x)')
>>> print pymat.get(H, 'y')
[ 8.57321410e+00 -4.16256180e+00 -1.55403172e-15 -4.08248290e-01 -1.88808530e-15 -8.00788912e-02]
>>> pymat.close(H)
Simply place the pymat.pyd file somewhere on your Python search path. Also ensure that MATLAB's BIN directory is somewhere on your DOS path, or else that you have copied MATLAB's LIBENG.DLL and LIBMX.DLL files somewhere on this path.
///
A lightweight language has one data structure and uses it well.
Perl, TCL - String
Python, Scheme - List
Matlab - Matrix
All of these (well, I don't actually know TCL and Perl) offer other data types, but if you use the predominant data type things go much more smoothly. The key is that when there is one form of I/O, you can hook lots of things together easily. As pointed out at the conference, this trades short term ease of use against long-term maintainability.