Sunday, March 13, 2011

App Engine Datastore: Fast, consistent check if entity exists by key

Many people have asked if it is possible in App Engine to check if an entity exists, so that the whole operation satisfies the following conditions:
  • It is as fast as possible
  • It is strongly consistent (i.e. returns true immediately after a put and false immediately after a delete)
In App Engine Datastore the fastest operation is "get"; however, it always retrieves the whole entity. If your entity is big (e.g. contains blobs), then using "get" may be too slow for our purpose. On the other hand, queries allow you to retrieve only keys (keys-only queries). However, queries are inherently slower than "get" because they have to scan an index, and on the High Replication Datastore (HRD) they are not strongly consistent.

Here's the solution. Split your logical entity (e.g. a blog post) into two physical entities:
  • The first entity (e.g. BlogPost) contains all the data fields and the key (generated and assigned key types will both work). We will only retrieve this entity when we actually need to work with the data.
  • The second entity (let's call it BlogPostCheck) does not contain any fields, only the key. When we want to check if a blog post exists, we will run a "get" against the BlogPostCheck. Because the latter does not contain any data "get" will be extremely light and fast.
So what does the existence of BlogPostCheck have to do with the existence of BlogPost? The trick is that we will put both entities into the same entity group by making BlogPost the parent of BlogPostCheck. When saving a blog post we will create both a BlogPost and a BlogPostCheck in a single transaction. When deleting a blog post we will delete both in a single transaction. This way a BlogPostCheck exists if and only if a BlogPost exists (in a strongly consistent manner), therefore a "get" on BlogPostCheck is equivalent to a "get" on BlogPost, except that it doesn't retrieve the data, so it's faster. Here's some Java pseudo-code for this example:


// Checks if a blog post exists
boolean check(Key blogPostKey) {
    DatastoreService ds = ...;
    // The check entity is a child of the blog post, so both live in one entity group
    Key blogPostCheckKey = KeyFactory.createKey(blogPostKey, "BlogPostCheck", "check");
    Transaction tx = ds.beginTransaction();
    try {
        ds.get(blogPostCheckKey);
        return true;
    } catch (EntityNotFoundException notFound) {
        return false;
    } finally {
        tx.commit();
    }
}

// Saving a blog post
DatastoreService ds = ...;
Transaction tx = ds.beginTransaction();
Key blogPostKey = ds.allocateIds("BlogPost", 1).getStart();
Key blogPostCheckKey = KeyFactory.createKey(blogPostKey, "BlogPostCheck", "check");
BlogPost bp = new BlogPost(blogPostKey, ...);
BlogPostCheck bpc = new BlogPostCheck(blogPostCheckKey);
save(bp);
save(bpc);
tx.commit();

// Deleting a blog post
Key blogPostKey = ...;
DatastoreService ds = ...;
Transaction tx = ds.beginTransaction();
Key blogPostCheckKey = KeyFactory.createKey(blogPostKey, "BlogPostCheck", "check");
// Both deletes happen in the same transaction
ds.delete(blogPostKey, blogPostCheckKey);
tx.commit();

Sunday, March 22, 2009

Installing SSL (HTTPS) Certificate Into Java Keystore Without The Certificate File

I stumbled into a problem with Java and SSL. Our Archiva server serves a Maven2 repository through SSL. I could browse it without issues using Firefox, after accepting the SSL security certificate. But Maven2 failed to download artifacts (and it didn't even raise a warning). It turned out that Java did not have the SSL certificate installed. I did not have the SSL certificate file, so how could I install it?

Turns out it's very simple. Here's how. Open the site in Firefox using an https link. You might be prompted to resolve the security certificate issue. If so, do that first. After you successfully load the page, click on the left tip of the address bar. You should see something like this:



Click "More Information...". Then click "View Certificate", open "Details" tab and click "Export...". Select "X.509 Certificate with chain (PEM)" and save the file. This file will be compatible with Java's keytool command. You can use a variant of the following command to install the certificate:

keytool -keystore "%JAVA_HOME%\jre\lib\security\cacerts" -import -file [PATH_TO_FILE]

Friday, March 13, 2009

Java Checked Exceptions Revisited

As promised this is a follow up to the discussion of Java's checked exceptions. I consider this topic important enough to visit it again. I will start by summarizing different aspects of checked exceptions, including my own ideas as well as ideas from external sources. Then I will suggest a couple of solutions to the problem.

Facts about checked exceptions:

  • A method that contains a statement that may throw a checked exception must either handle that exception in a try/catch block or declare the exception in the method's signature using the throws clause

  • Checked exceptions are only checked at compile time (see JLS). This fact is actually not widely known. If, for example, you compile your application against a library which uses runtime exceptions, your application will still run if, after compilation, you replace the library with another version of the same library in which the runtime exceptions were refactored into checked exceptions. This is because the JVM does not perform the check when loading and running classes.
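The compile-time nature of the check can be made visible with a small sketch. The names below (SneakyThrowDemo, sneakyThrow, readConfig) are hypothetical and do not come from any library. The generic cast relies on type erasure and is shown only to illustrate that the JVM throws a checked exception from a method with no throws clause without complaint; it is not meant as a recommended practice.

```java
import java.io.IOException;

final class SneakyThrowDemo {

    // At the call site E is RuntimeException, so the compiler asks for no
    // throws clause. After erasure the cast does nothing and the original
    // exception is thrown unchanged.
    @SuppressWarnings("unchecked")
    static <E extends Throwable> void sneakyThrow(Throwable t) throws E {
        throw (E) t;
    }

    // No throws clause here, yet a checked IOException escapes at run time.
    static void readConfig() {
        SneakyThrowDemo.<RuntimeException>sneakyThrow(new IOException("simulated failure"));
    }

    public static void main(String[] args) {
        try {
            readConfig();
        } catch (Exception e) {
            // The JVM delivered the checked exception as-is.
            System.out.println(e.getClass().getName());
        }
    }
}
```

Note that the caller has to catch Exception here: the compiler rejects a catch of IOException because, as far as it can tell, nothing in the try block throws one.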


Pros

  • Checked exceptions automatically remind you of themselves and have a smaller chance of going unhandled. Consequently, if developers follow the best practice of making recoverable errors checked and unrecoverable errors unchecked, then checked exceptions will encourage recovery rather than application failure.

  • As James Gosling explains here, "the knowledge of the situation is always fairly localized". I would replace the word "always" with "sometimes", but the point stands. If the knowledge of a situation is localized, then handling the exception closer to the point of occurrence is likely the best approach. And again, with a checked exception the compiler will remind you of that.


Cons

  • Bruce Eckel gives an excellent overview of psychological phenomena observed in many Java developers when it comes to checked exceptions. Checked exceptions get swallowed, making it hard to track down bugs. Declaration of "throws Exception" or even worse, "throws Throwable", is not uncommon, and it defeats the purpose of checked exceptions altogether.

  • Checked exceptions reduce encapsulation (a more detailed example will follow). Implementations of an interface are not allowed to throw checked exceptions other than those declared by the interface. This is good. The bad part is that code which cannot recover locally from a checked exception is still forced to handle it; at the very least you have to wrap it in a runtime exception. Wrapping exceptions means you can no longer use the standard exception hierarchy.

  • Apparently Sun found a situation when checked exceptions should be bypassed (they tricked the compiler). Why should we not have the same option?

  • Versionability and scalability - checked exceptions effectively add themselves to the method's signature, with API-breaking consequences for the client code. Additionally, as the project grows and the number of libraries it uses grows, the number of possible exceptions grows with it. With checked exceptions this means that methods up the stack have two choices: 1) declare the common ancestor of all thrown exceptions in the throws clause, or 2) list them all. Declaring the common ancestor is not desirable, or not possible, as it usually turns out to be either java.lang.Exception or java.lang.Throwable. Listing all possible exceptions has a scalability issue in that a) it increases code bloat, with long throws clauses proliferating throughout the code base, and b) every time you add a new checked exception into the mix, you have to update all methods up the stack.

  • Code Testability - quite often there are cases when a checked exception will never happen. This makes the catch block untestable. See details in the post.


Locality and Encapsulation
When James Gosling talks about localized knowledge of exceptions I believe what he means is that exceptions are local in the sense that they do not (or should not) travel very high up the call stack. For example, if method1 calls method2, method2 calls method3, ..., method9 calls method10 and method10 throws an exception, then James would rather handle that exception in method10 or method9, but wouldn't let it go too far up the stack, say, up to method2. Now, exceptions were specifically created to fly up the stack until someone cares to handle them. Where exactly that happens depends on the design of a given program. There is a perfectly solid program design according to which exception handling happens very high up the stack. This design stems from the practice of using exceptions as special, exceptional, out-of-the-ordinary conditions.

Let's go straight to Wikipedia and read the definition:

Exception handling is a programming language construct or computer hardware mechanism designed to handle the occurrence of exceptions - special conditions that change the normal flow of execution.


Now, what is the value of classifying a particular condition as "special" or being outside of "the normal flow of execution" if it is part of the method signature like all the "normal" conditions and has the same consequences for the API and the client code? I believe the idea of exceptions is to give the programmer the ability to design their logic as if no exceptions ever occur in the flow, but also to create a "special" mechanism (as special as the exceptions themselves) to deal with exceptional situations outside the normal flow. Forcing the client code to handle an exception is forcing changes to the normal flow of execution. Does this not immediately destroy the purpose of exceptions? The exception becomes more like a return status code than something that represents an exceptional situation.

In fact locality is not something that the designer of the exception class can decide. Java interfaces, dependency injection and AOP allow us to design programs with very high level of separation of concerns. For example, it is an accepted practice to start a database transaction as soon as a client request (think RPC or HTTP) is accepted, then run all the business logic and finally commit or rollback the transaction based on the outcome. This translates to the following call stack sequence:



See how your database code consists of two layers. One - the Transaction Manager - is up the stack, responsible for opening and committing/rolling back transactions. The second layer is down the stack, responsible for data access - Data Access Object (DAO). All the Business Logic - the majority of your code - is in between. Note, that your Business Logic code calls DAO and it is abstracted from the database driver. It does not "know" if your data access is implemented using pure JDBC or Hibernate, whether it is a relational database or document-oriented, like CouchDB. Here we observe the power of separation of concerns. At any moment we may decide to migrate from one type of database to another and we do not have to change a single line in our business logic. Unless! Unless we use checked exceptions. In order to let a checked exception fly through the business logic layer, every method in the business logic has to declare the exception. For example, if the database is accessed using JDBC, then the business layer will declare throws SQLException. Now, even without going any further we see the problem with this. Namely, the separation of concerns is no more, because the business logic is "aware" of the SQL nature of the data access. You now have an extra week of work to migrate from SQL to something else. The separation of concerns would not break if SQLException were runtime.
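The layering described above can be sketched in a few lines. This is an illustration under assumptions, not the design of any particular framework: all names (LayeringSketch, DataAccessException, BlogPostDao, JdbcBlogPostDao, renderHeadline, handleRequest) are hypothetical, and the JDBC call is simulated so the sketch runs without a database. Since SQLException is in fact checked, the DAO wraps it in an unchecked exception, which is the workaround that lets the business logic stay free of any mention of SQL.

```java
import java.sql.SQLException;

final class LayeringSketch {

    // Hypothetical unchecked exception owned by the data access layer.
    static class DataAccessException extends RuntimeException {
        DataAccessException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // The business logic sees only this interface; nothing here mentions SQL.
    interface BlogPostDao {
        String findTitle(long id);
    }

    // JDBC-flavoured implementation with a simulated query.
    static class JdbcBlogPostDao implements BlogPostDao {
        @Override
        public String findTitle(long id) {
            try {
                return queryTitle(id);
            } catch (SQLException e) {
                // Wrap so that callers are not forced to declare SQLException.
                throw new DataAccessException("Could not load post " + id, e);
            }
        }

        private String queryTitle(long id) throws SQLException {
            if (id < 0) {
                throw new SQLException("simulated driver failure");
            }
            return "Post #" + id;
        }
    }

    // Business logic: no try/catch and no throws clause.
    static String renderHeadline(BlogPostDao dao, long id) {
        return dao.findTitle(id).toUpperCase();
    }

    // Stand-in for the transaction manager at the top of the stack.
    static String handleRequest(BlogPostDao dao, long id) {
        try {
            return "commit: " + renderHeadline(dao, id);
        } catch (DataAccessException e) {
            return "rollback: " + e.getCause().getMessage();
        }
    }
}
```

Swapping JdbcBlogPostDao for an implementation backed by a different store would leave renderHeadline and handleRequest untouched, which is the separation of concerns the paragraph above argues for.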


Now let's see how the designers of Java could deal with the situation.

Solution I - Remove Checked Exceptions from the Language

As I mentioned above, checked exceptions are only checked at compile time. This means that when you run your Java program all exceptions are runtime. At the compiler level, removing the check is a backwards-compatible change too, as exception checking is a restriction on the Java source code. All previously written Java code will still compile without any checking.

But is there a way to implement an exception mechanism in Java that has the benefits of checked exceptions but without their drawbacks?

Solution II - Annotations

In addition to Solution I, introduce a set of standard annotations applicable to exception classes.

@Recoverable - indicates that the program may be able to recover from an exception of this type.

@Documented - indicates that this exception should be documented in the JavaDoc comment.

With these annotations in place, developers could configure their tools to highlight cases where a @Recoverable exception is not handled or a @Documented exception is not documented. Then James Gosling would configure his tools to fail compilation in these cases. I wouldn't.
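As a rough sketch of what such an annotation could look like, here is a hypothetical @Recoverable definition applied to an exception class. None of these names (RecoverableSketch, Recoverable, QuotaExceededException, isRecoverable) exist in the JDK. A real tool would more likely inspect the annotation at compile time; the reflective check below is only there to make the example runnable. One practical caveat: java.lang.annotation.Documented already exists, so the second proposed annotation would need a different name or package.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

final class RecoverableSketch {

    // Hypothetical marker: the program may be able to recover from this exception type.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Recoverable {}

    // An unchecked exception that advertises itself as recoverable.
    @Recoverable
    static class QuotaExceededException extends RuntimeException {
        QuotaExceededException(String message) {
            super(message);
        }
    }

    // Checks the exact class of the throwable; subclasses are not covered
    // because the annotation is not declared @Inherited.
    static boolean isRecoverable(Throwable t) {
        return t.getClass().isAnnotationPresent(Recoverable.class);
    }
}
```

The exception stays unchecked, so nobody is forced to catch it, while tools that care can still find every class marked as recoverable.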

Solution III - Compiler Option

Just as we can enable/disable assertions at runtime, we could have an option to enable/disable exception checking at compile time. Simple, easy to implement.

Wednesday, March 11, 2009

Java Checked Exceptions and Code Testability

There have been many blog posts, articles and discussions about the dangers of Java's checked exceptions (Does Java need Checked Exceptions?, Java's checked exceptions were a mistake). Some of the prominent Java technologies, such as Hibernate and the Spring Framework, explicitly reject checked exceptions and not only use runtime exceptions in their APIs, but also wrap Java's standard checked exceptions to make them runtime (SQLException vs HibernateException). I am not going to restate any of the points outlined in those sources, but I will admit in advance that I am an opponent of checked exceptions. In this post I am going to show another aspect of the problem, one that I haven't seen mentioned on the web yet. Namely, I am going to show how checked exceptions reduce the testability of the code.

Let's start with an example. As you may know, the Java I/O system uses checked exceptions, with java.io.IOException being the great-grandfather of them all. Any I/O system is error prone, so you can get all kinds of conditions. However, there are cases when I/O exceptions are never expected to happen. One of these cases concerns static classpath resources. Static classpath resources could be bundled icons, properties, translations, etc. The important thing here is that a static resource is an inseparable part of the application. It is loaded from the same location from which your application classes are loaded. If a classpath resource is missing or corrupted, then the application is corrupted and may not run correctly. The effect of a missing resource is the same as having one of your jar or class files missing or corrupted. Now, we know that an attempt to initialize a class that is not in the classpath will result in a NoClassDefFoundError. Errors in Java are unchecked. This was a good decision. Why? Because if an application class is missing from the classpath, then the application has not been properly built or initialized and there is nothing the application can do to reliably recover from such a condition. Consequently you never write unit or integration tests that test for those kinds of conditions. The same applies to bundled static classpath resources. However, when loading static resources you have to do I/O manually and handle all the checked exceptions in your application code. And this is where your code testability falls. A code sample is worth a thousand words, so let's dive into the code and demonstrate.

Suppose you have a classpath resource example/AppProperties.properties with the following contents:

greeting=Hello, World!

A class example/AppProperties.java that loads the resource from the class path:

package example;

import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public class AppProperties {

    public String getGreeting() {
        Properties props = new Properties();
        try {
            InputStream inStream =
                AppProperties.class.getResourceAsStream("AppProperties.properties");
            props.load(inStream);
            inStream.close();
        } catch (IOException e) {
            // What to do here?
            throw new RuntimeException("This should never have happened!", e);
        }
        return props.getProperty("greeting");
    }

}


And a test case for this class example/AppTest.java:

package example;

import org.junit.Assert;
import org.junit.Test;

public class AppTest {

    @Test
    public void testAppProperties() {
        Assert.assertEquals("Hello, World!",
            new AppProperties().getGreeting());
    }

}


In my opinion a simple class like AppProperties should have 100% line coverage by tests. Let's see what we get when running with a code coverage tool (I used EclEmma):





What we see here is that the exception did not happen, so we never caught it and the catch block was never executed, causing incomplete line coverage. Moreover, there is no easy way to mock out a condition that will make it happen, because the resource is always there and the build system will ensure it is there along with the rest of the classes.

And no, I do not want to propagate an IOException by declaring it in the throws clause of the method, because I do not want to expose the clients to the implementation details (could become SQL in the future). Also even if I do, the client code will have to handle the exception and then the client code will not be covered. So I wrapped the exception into a RuntimeException, which is unchecked. I could use a different unchecked exception, but chose not to in order to keep it simple.

Back to the general idea. Sometimes we face a situation when we are forced to handle exceptions that will "never" happen. That is, "never" in the sense that they can only happen due to improper application assembly. They are not expected. Consequently there is no point in writing any recovery code or test cases for those conditions. However, checked exceptions force us to write those try/catch blocks, and those blocks are not testable.

This is not about trying to reach 100% coverage. That question deserves its own discussion. As noted in a very insightful article you should only use code coverage as a "clue" to places in the code that may contain bugs. Some of these clues may be false alarms, i.e. a piece of code is not tested but it does not contain bugs. The problem that comes with checked exceptions is that they produce many more of these false alarms. Suppose you run code coverage tool once a week and then go through results in search for bug-prone areas. Because try/catch blocks similar to the one above will never be covered you will be forced to return to that piece of code once a week, review it and say "Oh, it's just that condition that will never happen" and move on. Of course, it will only take you several seconds to verify it depending on code complexity, but if your code contains hundreds of places like this you may waste hours of your time, every week.
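One way to keep the number of such false alarms down, offered as a sketch rather than a fix for the underlying problem, is to confine the translation of IOException to a single helper that accepts any InputStream. The names below (ClasspathProperties, load, failingStream) are hypothetical. Because the helper takes a stream rather than a resource name, a test can hand it a stream that always fails and so exercise the catch block once, instead of leaving an uncovered catch block at every call site. UncheckedIOException is part of the JDK since Java 8.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Properties;

final class ClasspathProperties {

    // The only place that translates the checked IOException.
    static Properties load(InputStream in) {
        Properties props = new Properties();
        try (InputStream stream = in) {
            props.load(stream);
        } catch (IOException e) {
            throw new UncheckedIOException("Failed to read properties", e);
        }
        return props;
    }

    // Test-only stream whose reads always fail, used to reach the catch block.
    static InputStream failingStream() {
        return new InputStream() {
            @Override
            public int read() throws IOException {
                throw new IOException("simulated failure");
            }
        };
    }
}
```

A caller would pass something like AppProperties.class.getResourceAsStream("AppProperties.properties") to load and would need no try/catch of its own. The checked exception still forces a catch block to exist; this only reduces how many of them show up in the coverage report.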

I am thinking about writing a follow-up with a conclusion in which I want to use points made for and against checked exceptions by others and put them on some sort of imaginary scale. I also have a middle-ground solution to the problem. But this is next time.

Thank you for reading. Your comments are welcome.

Wednesday, September 17, 2008

Thoughts on Linux hardware issues, etc

(originally I wrote this as a reply to a blog post, but because that blog has a horrible formatter I decided to expand it and post it here as well)

I have been using Linux (mostly Ubuntu and mostly at home) and Windows (mostly at work) for years now and I would probably use Linux full-time save, maybe, for games, although if a game runs on Linux I play it on Linux. My choice of Linux is purely practical. Simply put, I can work much more efficiently on Linux than I can on Windows. But enough about how much I like it, the problem is I keep hearing these complaints (or is it excuses?) about Linux that I find baseless. What is more surprising is that I hear them from computer-literate people, such as software developers. The rest might seem like another "5 biggest Linux myths" blog post, but in fact it is just a couple of notes I decided to take down not to forget them later and maybe use them as arguments in discussions.

1 - I find hardware compatibility hardly a relevant topic when comparing operating systems. I wish it stopped popping up, at least in computer-literate environments. If you are going to install Linux on Windows hardware, please do not expect that the install will go smoothly. Specifically, do not expect that sound, video or wireless will work out of the box. This is not a problem with Linux; it is just a fundamental law of computing that software must be compatible with the hardware it is running on. The same thing or even worse will happen to Windows when you try to install it on Linux hardware. Just recently, my brother got an Ubuntu laptop from System76 and tried to install XP on it. It wouldn't install until he patched some firmware on it. And then try installing Windows on anything that does not look like a standard PC. And then try installing Mac OS on anything.

2 - A commenter on a blog post noted: "doing ANYTHING in Linux that becomes a 25-60 minute black hole". This is an utterly wrong statement. People usually apply this to hardware issues (in which case see point 1). However, once hardware issues are resolved (either by manual configuration or by buying certified Linux hardware) doing ANYTHING in Linux is a breeze. Of course, due to exposure to such a vast repository of open-source software (of which maybe only 5% is of good quality) you will always install things that will not work as well as you want. This brings me to my next point...

3 - Finding, installing and uninstalling software in Linux is easier than in any other operating system I have seen. This plays well with the huge amount of software available for Linux in that you can always install it, try it and uninstall it easily if you don't like it. No registry or program files leftovers will ever clog your system. Also, thanks to advanced automatic dependency management, programs take much less space by sharing many standard libraries. I have tons of stuff installed on my laptop and still have room for music and video on my 80 GB hard drive.

4 - Keeping your system up-to-date, and I don't mean just the OS, but everything installed on your system is incomparable. Linux is ahead of any other OS by orders of magnitude. Anything you install from a registered software repository (usually it's 99% of all your software) will be kept up-to-date with the rest of the system. Every now and then you are reminded that updates are available and these updates apply to your entire system: the OS (kernel), utilities, desktop manager, office suites, Apache, Firefox, well you get the point.

5 - This might be a matter of personal preference, but to me it feels much better to work on a system and to know that all it does at any given moment in time is what you asked it to do. There is no antivirus that checks every file you touch. There is no spyware and there are no spyware removers. Let's face it, no matter how clean your Windows system is, there is always some dark magic happening under the hood that for some reason slows your system down, blocks you from performing normal tasks, crashes, automatically reboots without asking you first, etc. When I copy/move/delete files I want the OS to just do it, and since I still use Windows at work I am constantly reminded how much better these basic operations are on Linux (from both speed and usability perspectives).

In conclusion I would like to reiterate that Linux hardware issues are overrated and irrelevant to discussions about features/usability/performance of the OS. What's worse is that the issue is magnified by the fact that most installations take place on foreign hardware and then the performance of the operating system is evaluated on that hardware. This is wrong. The question we should be discussing is 'When it does work does it perform better (from all aspects) compared to the competition or not?'

Thursday, June 26, 2008

On Scala Language - First Impressions

A couple of days ago I started playing with the Scala programming language (http://www.scala-lang.org). I decided to document my experience with this language. I will be talking in Java terms, so if you see me referring to, for example, something 'static', think of Java's definition of 'static'.

Several things immediately caught my attention. The language is concise and type-safe, lacks keywords for declaring 'static' and 'final' members, lacks a 'throws' keyword (i.e. all exceptions are runtime), fully interoperates with Java and Java libraries, and contains elements of a functional language.

Certain things pop up in a Java developer's mind right away after hearing the above (I tested it on some of my Java-savvy friends and on myself, of course):
  • But how do I declare static variables and constants?
  • Functional elements, ha. Are we messing with the 'return' statement again? If there's no return statement I don't wanna hear about it. (Then they close their ears and go: "Bla-bla-bla-bla...")
  • Interoperates with Java as in JNI or somethin'?
  • No checked exceptions? How come?
I will try to answer these questions right away as it is almost impossible to move on. Java developers have heard a lot about attempts to make a language concise, expressive, and so on. All was at the expense of type-safety (Python, Ruby, Groovy, etc). So their scepticism just keeps growing.

This example answers all the questions:


package scalatest

import java.io.File
import java.util.Random

object HelloWorld {

  def main(args : Array[String]) : Unit = {
    println(message0)
    println(message1)
    println(message2)
    println(message3)

    this.message3 = "hehe"
    println(message3)

    /*
    This won't compile
    message2 = "hehe"
    */

    val f = new File(".")
    println(f.getAbsolutePath)

    if (false) {
      throw new java.io.IOException("Just a test")
    }

    println("Semi-colons at the end of an expression are optional too");
    println("But you can" +
      " still do multi-line "
      + "expressions")

    val a =
      """This might be useful sometimes too.
You can embed Python-style multiline strings like this one.
Looks ugly at first, but I can see myself getting used to it.
I don't think I will use it a lot though"""

    println (a)
  }

  /**
   * A method with a return statement
   */
  def message0 : String = {
    if (new Random().nextBoolean()) {
      return "Hello, World!"
    } else {
      return "Goodbye, World!"
    }
  }

  /**
   * A method without a return statement
   */
  def message1 : String = {
    println("This is definitely a multi-line method")
    println("Looks like return statements are optional")
    if (new Random().nextBoolean()) {
      // Hopefully future IDEs will mark the returned expressions automatically
      "Hello, World!"
    } else {
      // It's not that hard
      "Goodbye, World!"
    }
  }

  /**
   * A constant (i.e. equivalent of Java final keyword)
   */
  val message2 : String = "abc"

  /**
   * Not a constant, just a field
   */
  var message3 : String = "def"

}


Static Variables: Scala solves Java's problem with static members by dedicating a special construct for this purpose. It's called 'object'. 'object' is a class-singleton. In this example HelloWorld is an object, so everything in it is automatically static. This is good, because when you allow static members in your regular classes (as Java does) you run into problems with polymorphism. Scala doesn't, hurray!
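One reading of the "problems with polymorphism" mentioned above is that Java static methods are hidden rather than overridden, so a call through a reference is resolved by the declared type and not by the runtime type. The Java sketch below illustrates that reading; the class names (StaticHidingDemo, Base, Derived) are made up for the example.

```java
final class StaticHidingDemo {

    static class Base {
        static String kind() { return "Base"; }
        String name() { return "Base"; }
    }

    static class Derived extends Base {
        // Hides Base.kind(); it does not override it.
        static String kind() { return "Derived"; }

        @Override
        String name() { return "Derived"; }
    }

    // The static call is bound at compile time using the declared type Base.
    // (Calling a static method through an instance compiles with a warning.)
    static String kindOf(Base b) {
        return b.kind();
    }

    // The instance call is dispatched on the runtime type.
    static String nameOf(Base b) {
        return b.name();
    }
}
```

Here nameOf(new Derived()) returns "Derived" while kindOf(new Derived()) returns "Base". Moving the static members into a separate singleton, as Scala's 'object' does, removes this mismatch because every member of a class is then an ordinary instance member.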

Constants: Scala has a very elegant solution for constants. Constants are simply values (the keyword is 'val'). Java uses an oxymoronic syntax, declaring a constant as a variable that doesn't change. We are used to it by now, so we don't look at it this way anymore, but Scala does remove a lot of code bloat by calling constants what they really are:



val HELLO = "World!"


VS


public static final String HELLO = "World!";


The 'return' statement: Scala does mess with the 'return' statement. The good news is that you can still use it. It is optional. Plus, because the language is type-safe the compiler won't allow you to return just anything. Plus, there is a good chance that an intelligent syntax highlighter will mark the returning expression.

Interoperability with Java: Looks good. I checked it by using a Java class in Scala code (see the example). The claim is that you can also extend Java classes and implement Java interfaces and use Scala classes in your Java classes. This was a very practical decision. Struts application in Scala, anyone?

Checked exceptions: Scala does not require you to catch checked exceptions, but you can still use them (see the example). There is no mechanism to declare thrown exceptions either. Checked exceptions are still a matter of debate (http://www.mindview.net/Etc/Discussions/CheckedExceptions). I don't like them. Only Java has them. The rest of the world lives happily without them. I try to avoid them as much as I can. Documenting thrown exceptions is one thing. Fixing bugs caused by swallowed exceptions, or interpreting methods that throw Throwable, is another. Both are products of the psychological phenomena that checked exceptions create. So I am glad that Scala does not have checked exceptions (I can already see 50% of my audience leaving this web-site :).

So, Scala seems to have gotten right the things that I usually complain about in Java, and they did not have to sacrifice type-safety. Let's see if they've got everything else right...

Monday, June 4, 2007

Welcome to my first public blog!

I am thinking about using this blog as a central place for my thoughts on technology, mostly on free and open source software, Java, Linux and related subjects.

I hope you will enjoy it, but if you do not, please let me know :)