GHads mind

developers thoughts & finds

Testing MapReduce (MongoDB) in Java

Yesterday a co-worker of mine thought about using MapReduce in MongoDB for calculating the visibility of products in our company shop. I was not convinced that an operation that basically just groups is the right tool for the job, so I wanted a way to test MapReduce operations without the hassle of setting up a MongoDB with example data for each co-worker and of running JavaScript from our Java environment (think of debugging…). So I thought about providing a tiny implementation in Java that behaves like MapReduce in MongoDB, just without parallelization. This way all of my co-workers can play with their operations and check the practicability of their solution.

So today I quickly coded the following class, which imho also helps to explain the way MapReduce works to newbies:

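import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

public abstract class MapReduce<In, Out> {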

	public static <Input, Output> Map<Object, Output> execute(MapReduce<Input, Output> operation, Collection<Input> input) {
		if (operation == null || input == null || input.isEmpty()) throw new IllegalArgumentException();

		// map input to emit values
		for (Input in : input) {
			operation.map(in);
		}

		// reduce emitted values
		Map<Object, Output> result = new HashMap<Object, Output>();
		for (Map.Entry<Object, Collection<Output>> entry : operation.emits.entrySet()) {
			Object key = entry.getKey();
			Output reduced = operation.reduce(key, entry.getValue());
			result.put(key, reduced);
		}
		return result;
	}

	private Map<Object, Collection<Out>> emits = new HashMap<Object, Collection<Out>>();

	protected void emit(Object key, Out value) {
		Collection<Out> forKey = emits.get(key);
		if (forKey == null) {
			forKey = new ArrayList<Out>();
			emits.put(key, forKey);
		}
		forKey.add(value);
	}

	public abstract void map(In input);

	public abstract Out reduce(Object key, Collection<Out> emits);

}

So we have an abstract class that defines a “map” and a “reduce” method, both of which must be implemented by the operation. The protected method “emit” must be called from the map method (map phase) to add a value for the reduce phase; values are collected per key (grouping). And the static method “execute” runs the operation on a collection of input elements: first the map method for each element, then the reduce method once per key for all emitted values.

To test your MapReduce operation the implementation is easy:

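import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

public class MapReduce_Test {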

	public static class Customer {
		public String name;
	}

	public static class GroupByName extends MapReduce<Customer, Integer> {

		@Override
		public void map(Customer input) {
			emit(input.name, 1);
		}

		@Override
		public Integer reduce(Object key, Collection<Integer> emits) {
			int count = 0;
			for (Integer i : emits) {
				count = count + i;
			}
			return count;
		}
	}

	public static void main(String[] args) {
		// create some customers
		List<Customer> input = new ArrayList<Customer>();
		for (int i = 0; i < 1000; i++) {
			Customer in = new Customer();
			in.name = "name" + (i % 6);
			input.add(in);
		}

		// execute MapReduce operation
		Map<Object, Integer> result = MapReduce.execute(new GroupByName(), input);

		// show result
		System.out.println(result);
	}

}

Here we just group by customer name and count the number of occurrences of each name. The output of this test is: {name1=167, name5=166, name2=167, name3=167, name4=166, name0=167}

So it’s now easier for my co-workers to test their MapReduce operations, and once an operation produces the desired output for the given input, we can translate the map/reduce methods from Java to JavaScript, saving JavaScript debugging time and head-scratching woes 😉
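
One caveat before porting, sketched here rather than taken from the class above: as far as I know, MongoDB may call the reduce function more than once for the same key, feeding earlier results back in as values, and it skips reduce for keys with only a single emitted value. The Java harness calls reduce exactly once per key, so it will not notice a reduce that breaks under such a re-reduce. The hypothetical ReReduceCheck class below (the class and its helper are made up for this example) copies the summing logic of GroupByName and checks that reducing two partial results gives the same answer as reducing everything at once:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

class ReReduceCheck {

    // Same logic as GroupByName.reduce: sums the emitted counts.
    static Integer reduce(Object key, Collection<Integer> emits) {
        int count = 0;
        for (Integer i : emits) {
            count = count + i;
        }
        return count;
    }

    // True if reducing two partial results equals reducing all values at once.
    static boolean survivesReReduce(Object key, List<Integer> emits, int split) {
        Integer whole = reduce(key, emits);
        Integer left = reduce(key, emits.subList(0, split));
        Integer right = reduce(key, emits.subList(split, emits.size()));
        return whole.equals(reduce(key, Arrays.asList(left, right)));
    }

    public static void main(String[] args) {
        // five emits of 1, split after the second one
        System.out.println(survivesReReduce("name0", Arrays.asList(1, 1, 1, 1, 1), 2)); // prints true
    }
}
```

A reduce that returned emits.size() instead of the sum would pass in the harness for this input but fail this check, which is exactly the kind of bug that only shows up after the port.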

Greetz,
GHad

Written by ghads

February 17, 2011 at 11:17 am

HTML5 <time> microformat with Grails (UTC dates…)

As I mentioned in my last post, I’m doing a little pet project with Grails, HTML5 and MongoDB. Now I wanted to render dates in StackOverflow style, something like: 5 minutes ago, about an hour ago and such. As HTML5 defines a <time> element, I naturally wanted to use it, and I also found a nice jQuery plugin that converts times into the format I like: timeago. It supports the <time> tag by default.

But there’s a catch (as always…). The <time> tag and the plugin depend on ISO 8601 UTC dates, which MongoDB stores nicely. But when you read data from MongoDB you get a java.util.Date, and that is where the trouble starts, as there is no built-in way to convert and format a Date as UTC with one call. Otherwise I could have used the formatDate tag that ships with Grails.

So I first searched for an easy way to format a Date instance in ISO 8601 UTC format. Luckily I found the following post, which shows how to set UTC on a formatter and gives the right format string for ISO 8601. This is a two-line call, but I didn’t want to write a utils class just for presentation, call it from the controller and end up with a String for rendering instead of a Date. I also didn’t want to include the formatter code on every page, so I came up with a Grails Filter solution.

By including this class in grails-app/conf:

import java.text.SimpleDateFormat

class MyFilters {
	def filters = {
		addUtils(controller:'*', action:'*') {
			after = {
				// add formatter for UTC dates in ISO8601 format
				SimpleDateFormat ISO8601UTC = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'")
				ISO8601UTC.setTimeZone(TimeZone.getTimeZone("UTC"))
				params["_UTC"] = ISO8601UTC
			}
		}
	}
}

Now there is a new formatter instance inside the params, ready to use for formatting dates as UTC in the correct way for the tag:

<time class="timeago" datetime="${params._UTC.format(created)}">${created}</time>

I used an underscore to separate the utils from the regular parameters used to render the site. For now I don’t care about a new instance for every call, but if you do, make the formatter static (guarding access, as SimpleDateFormat is not thread-safe) and reduce the scope of the addUtils filter to only the pages that need it.
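
One way to share the formatter without creating it on every request, given that SimpleDateFormat must not be used from several threads at once, is to keep one instance per thread. The following is a minimal sketch in plain Java, not part of the filter above: the helper class UtcDates is hypothetical, the pattern and time zone are the ones from the filter, and Locale.ROOT is my own addition so that the output does not depend on the server’s default locale.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class UtcDates {

    // One formatter per thread, because SimpleDateFormat is not thread-safe.
    private static final ThreadLocal<SimpleDateFormat> ISO8601_UTC =
            ThreadLocal.withInitial(() -> {
                SimpleDateFormat format =
                        new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'", Locale.ROOT);
                // The literal 'Z' in the pattern is only correct if the formatter works in UTC.
                format.setTimeZone(TimeZone.getTimeZone("UTC"));
                return format;
            });

    private UtcDates() {
    }

    static String format(Date date) {
        return ISO8601_UTC.get().format(date);
    }

    public static void main(String[] args) {
        // the epoch itself, formatted for the datetime attribute
        System.out.println(format(new Date(0L))); // prints 1970-01-01T00:00:00.000Z
    }
}
```

Whether such a helper is exposed through the filter or called from a tag is a separate decision; the sketch only covers the formatting part.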

Greetings,
GHad

Written by ghads

January 26, 2011 at 3:10 pm

Grails + HTML5 = Works

Today, while building a project to test MongoDB integration with Grails, I was curious whether Grails can emit HTML5. So I just tried to apply the great HTML5 Boilerplate (http://html5boilerplate.com/) template and it worked out wonderfully. I just changed the default main.gsp, main.css and application.js and added some more files…

Of course some things need adaptation:

– Using jQuery 1.4.4 instead of 1.4.2
– All hardcoded paths are replaced, for example with ${resource(dir:'images',file:'favicon.ico')}
– JavaScript libs can be embedded with <g:javascript library="…" /> instead of a hardcoded path when you place the lib file in the web-app/js folder
– For the jQuery fallback loading, the src can be exchanged with ${resource(dir:'js',file:'jQuery-1.4.4.js')} to avoid a hardcoded path
– Because of the templating/layout nature of Grails views, the JavaScript loading of jQuery and the additional plugins.js and application.js (also via <g:javascript …>) cannot be pushed to the very end of the page, but must happen before any part of a page is decorated by the template (except for belatedpng)
– I reused the default spinner div, but cleared the default Grails code in application.js and introduced two simple functions for showing and hiding the spinner
– For a flexible layout I used the following construct plus two additional GSPs, header.gsp and footer.gsp

<div id="container">
 <header>
  <g:applyLayout name="header" >
   <content tag="header">
    <g:pageProperty name="page.header" />
   </content>
  </g:applyLayout>
 </header>
 <div id="main">
  <g:layoutBody />
 </div>
 <footer>
  <g:applyLayout name="footer">
   <content tag="footer">
    <g:pageProperty name="page.footer" />
   </content>
  </g:applyLayout>
 </footer>
</div>

This enables the header.gsp and footer.gsp to include portions of the GSP page to show while being modular. The header/footer.gsp looks like:

<%@ page contentType="text/html;charset=UTF-8" %>
<b>Footer begin...</b>
<g:pageProperty name="page.footer" />
<b>Footer end...</b>

While a GSP page may look like:

<%@ page contentType="text/html;charset=UTF-8" %>
<html>
 <head>
  <title>my title</title>
  <meta name="layout" content="main">
 </head>
 <body>
  <content tag="header">
   … draw the header here…
  </content>
  <h1>Hello ${name}!</h1>
  <content tag="footer">
   … draw the footer here…
  </content>
 </body>
</html>

This allows for dynamic customizing of header/footer portions per page if desired, e.g. to render a status per page, which has its position and styling defined via the layout.

The rendered result of this looks like:

Header begin… … draw the header here… Header end…
Hello World!
Footer begin… … draw the footer here… Footer end…

The HTML5 Validator http://html5.validator.nu/ only complains about the chrome frame but validates the source of my rendered test page without any moaning.

For CSS3, the integration was a no-brainer: I just added everything to main.css, removed all Grails styles and used an @import url(styles.css); for a clear separation of the predefined styles and my own styles in styles.css. The W3C Validator for CSS3 doesn’t really like the main.css file, but as it comes directly from the boilerplate template I don’t really care. I can now start writing my pet project with HTML5 and CSS3, can be sure to have maximum compatibility with yesterday’s, today’s and tomorrow’s browsers, and will update my code when a new version of the fantastic boilerplate template comes out.

Oh, for your own integration, before going to production don’t forget to add expires headers to external resources, minify and combine the .css and .js files into one each, and enable GZIP transfer at your server. That should be all for a snappy, cool HTML5/CSS3 site done with Grails.

Have fun and greetings,
GHad

Written by ghads

January 21, 2011 at 9:43 am

Happy new year 2011!

I wish you a good start and an even better year 2011!

I hope to blog more regularly this year, with various themes and interesting topics. So stay tuned…

Greetz, GHad

Written by ghads

January 3, 2011 at 10:30 am

Posted in Uncategorized

Filter Collections via “Double Brace Initialization”

Hi again,

here’s another little piece of code I wrote recently while trying to answer a Stack Overflow question.

The author of the question (here) asks for a smarter way to filter collections/lists in Java, as it is done in Python/Scala/Groovy: as a one-liner.

I immediately thought of JDK7 closures, but as is widely known, Oracle shifted the feature lists for JDK7 and 8 not long ago and it seems closures will not arrive before 2012. But the accepted answer shows that, in their current state, even closures would be an awkward and over-complicated solution compared to Groovy, for example (quoted from seanizer’s answer):

JDK7 sample code:

findItemsLargerThan(List l, int what){
  return filter(boolean(Integer x) { x > what }, l);
}
findItemsLargerThan(Arrays.asList(1,2,5,6,9), 5)

Groovy sample code:

  Arrays.asList(1,2,5,6,9).findAll{it > 5}

You see…

So I started to experiment with pure Java to find a solution that is easy to read and easy to program while not being too verbose. After some hours I came up with the following code:

List<Integer> filtered = new Filter<Integer>(Arrays.asList(1,2,5,6,9)) {
  {
    findAll(it > 5);
  }
};

I think it’s quite elegant, because it’s readable pure Java code and fits on one line sans code formatting. How does it work? Obviously one needs a class Filter. The abstract Filter class implements the List interface (delegating all methods to an internal List variable), takes the collection to filter as a constructor argument (assigned to the internal List variable) and throws a RuntimeException when the collection is null or empty. Then we have the double brace initialization pattern, which allows us to dynamically and anonymously subclass the Filter class while providing an instance initializer that is executed after the constructor of the Filter class is done. Inside the instance initializer all we do is call findAll(), which takes a boolean argument. But where does “it” come from and how will “it” be mapped to each element of the collection?

“it” is a variable inside the Filter class. As the instance initializer is executed after the constructor, we can set “it” to the first element of the collection to filter. The instance initializer is then executed and the boolean condition it > 5 is evaluated. So at the time findAll() is executed, the condition whether to include the first element in the filtered result is already done. So the magic to repeat the evaluation for each element must be inside the findAll() method:

protected void findAll(boolean b) {
  // exit condition for future calls
  if (values.size() > 1) {
    // only repeat for each entry, if values has multiple entries
    Constructor<?> constructor = this.getClass()
       .getDeclaredConstructors()[0];
    Iterator<T> iterator = values.iterator();
    boolean first = true;
    while (iterator.hasNext()) {
      T element = iterator.next();
      // don't evaluate again for the first entry
      if (first) {
        if (!b) {
          iterator.remove();
        }
        first = false;
      } else {
        // else repeat Filter invocation for all elements
        Filter filtered = null;
        try {
          // invoked constructor for the element
          filtered = (Filter) constructor.newInstance(
            new Object[] { null, Arrays.asList(element) });
        } catch (Exception e) {
          e.printStackTrace();
        }
        // if values is empty, the condition didn't match and the
        // element can be removed
        if (filtered != null && filtered.isEmpty()) {
          iterator.remove();
        }
      }
    }
  } else {
    // one element can be checked directly
    if (!b) {
      values.clear();
    }
  }
}

(a little bit more polished than my answer at Stackoverflow)

The first thing is to define an exit condition, as the findAll() method will eventually be executed multiple times. The condition says: if there is more than one element in the list, we must iterate; otherwise we can look at the evaluated boolean condition directly. In the latter case “false” just clears the internal List, whereas “true” does not change it.

When iteration is needed, we get the constructor of this class once (there is only one) and take the next element. If it is the first one, we already know the evaluated condition and thus remove or keep the first item. For all other items we create a new instance of the anonymous class that extended Filter via the instance initializer in the first place. By creating a new instance and providing a list with a single element as argument, the instance initializer is called again after the constructor has set “it”. So the evaluation of the boolean condition starts again, but for the next element this time. After instance creation it’s a simple check whether the boolean condition matched or not. If the internal List is empty, the condition was “false” and the element can be removed from the Filter instance that iterates the elements and creates the new instances.

So the only catch is that n-1 additional instances are created for a collection to filter, where n is the number of elements. But as instance creation is quite cheap these days and the additional instances do not need to iterate again, this runs quite fast. I’m sure there is still potential to optimize the Filter class, as new ArrayLists are created here and there, but I’m very satisfied with the result so far.
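
Since only findAll() is shown, here is a minimal runnable sketch of how the rest of the class might look. It is a reconstruction under assumptions, not the original Filter: it extends AbstractList instead of delegating every List method by hand, the field names values and it follow the text, and FilterDemo with its largerThanFive() helper is made up for the example. It also assumes that the anonymous subclass is created in a static context and captures no local variables, so that its only constructor takes just the collection; created inside an instance method, the constructor gets an extra leading parameter for the enclosing instance, which is what the null in the listing above is for.

```java
import java.lang.reflect.Constructor;
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

abstract class Filter<T> extends AbstractList<T> {

    private final List<T> values;
    // the element the condition in the instance initializer is evaluated against
    protected final T it;

    protected Filter(Collection<T> input) {
        if (input == null || input.isEmpty()) {
            throw new IllegalArgumentException("nothing to filter");
        }
        values = new ArrayList<T>(input);
        it = values.get(0);
    }

    @Override
    public T get(int index) {
        return values.get(index);
    }

    @Override
    public int size() {
        return values.size();
    }

    protected void findAll(boolean firstMatches) {
        if (values.size() == 1) {
            // exit condition: a single element can be decided directly
            if (!firstMatches) {
                values.clear();
            }
            return;
        }
        Constructor<?> constructor = getClass().getDeclaredConstructors()[0];
        constructor.setAccessible(true);
        Iterator<T> iterator = values.iterator();
        boolean first = true;
        while (iterator.hasNext()) {
            T element = iterator.next();
            // the condition for the first element was already evaluated by the caller
            boolean keep = first ? firstMatches : !singleElementFilter(constructor, element).isEmpty();
            first = false;
            if (!keep) {
                iterator.remove();
            }
        }
    }

    // Re-runs the anonymous subclass' initializer with "it" set to the given element.
    private Filter<?> singleElementFilter(Constructor<?> constructor, T element) {
        try {
            return (Filter<?>) constructor.newInstance(Collections.singletonList(element));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot re-run the initializer", e);
        }
    }
}

class FilterDemo {

    static List<Integer> largerThanFive(List<Integer> input) {
        return new Filter<Integer>(input) {
            {
                findAll(it > 5);
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(largerThanFive(Arrays.asList(1, 2, 5, 6, 9)));
    }
}
```

The sketch throws on reflection errors instead of printing the stack trace, so a mismatched constructor signature shows up immediately rather than silently keeping the element.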

The Filter class can be used for any kind of object; for example, checking whether people are at least 18 would be “findAll(it.getAge() >= 18);”, given a People class with a getAge() method.

So long for this time, hope you enjoyed this blog post (feedback is appreciated).
Greetz,
GHad

Written by ghads

September 22, 2010 at 3:49 pm

Java Event handling via Enumerations

It’s been a while

but I’m still active. I just had no time for posting updates, but today I’ll start over, hoping to post more frequently. I want to start with a series of Java-related posts showing some usage code for my personal API JBase, which I developed over the last year. I will not post the full implementation, but I’ll highlight one part at a time and give you hints on how to code it yourself.

To start, I will show an event raising and handling system based on enumerations. There is no 3rd party library involved, just pure Java 5+ code. So why events?

Well, it turns out that unless you use bindings, updating a UI from model values or listening for user actions involves an eventing system that needs quite a lot of work when done the way Swing promotes it (e.g. WindowListener):

– A class for each event type (WindowEvent)
– An interface for the listener (WindowListener) with a method per event (windowActivated, windowClosed, …)
– An internal raise method (processWindowEvent)
– An add method and a remove method per listener interface (addWindowListener, removeWindowListener)
– List handling for keeping the listener references (windowListeners)
– Synchronizing for multi-threading (when accessing the list)
– Did I forget something?

So scaling is a mess, especially as changing any part leads to massive editing because of the tight coupling. Adding a new event type, for example, needs all of the above. For adding a new event (a new method on the interface), all implementing listener classes must be changed! And so on…
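
To make the list above concrete, here is a sketch of the classic pattern for a made-up Tank class with just two events. All names (TankEvent, TankListener, Tank) are hypothetical and only mirror the Swing naming scheme, and CopyOnWriteArrayList stands in for manual synchronization.

```java
import java.util.EventListener;
import java.util.EventObject;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// a class for the event type
class TankEvent extends EventObject {
    TankEvent(Object source) {
        super(source);
    }
}

// a listener interface with one method per event
interface TankListener extends EventListener {
    void tankFilled(TankEvent e);

    void tankEmptied(TankEvent e);
}

class Tank {

    // list handling; the copy-on-write list keeps iteration safe across threads
    private final List<TankListener> tankListeners = new CopyOnWriteArrayList<TankListener>();

    // an add and a remove method per listener interface
    public void addTankListener(TankListener listener) {
        tankListeners.add(listener);
    }

    public void removeTankListener(TankListener listener) {
        tankListeners.remove(listener);
    }

    // internal raise methods, one per event
    protected void fireTankFilled() {
        TankEvent event = new TankEvent(this);
        for (TankListener listener : tankListeners) {
            listener.tankFilled(event);
        }
    }

    protected void fireTankEmptied() {
        TankEvent event = new TankEvent(this);
        for (TankListener listener : tankListeners) {
            listener.tankEmptied(event);
        }
    }

    public void fill() {
        fireTankFilled();
    }

    public void empty() {
        fireTankEmptied();
    }
}
```

A third event would mean a new method on TankListener, a new fire method on Tank and a change to every class that implements the interface.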

So I tried to solve those issues by decoupling the events completely from the class, using an enumeration for the event type and an enumeration variable per event.

Use it like this:


public class SimpleEvent_Example implements SimpleEvent.Listener {

 public static void main(String[] args) {
   SimpleEvent.addListener(new SimpleEvent_Example());

   System.out.println("Raise SOME...");
   SimpleEvent.SOME.raise("some id");

   System.out.println("Raise OTHER...");
   SimpleEvent.OTHER.raise("other id");
 }

 public void handle(SimpleEvent event, String id) {
   switch (event) {
   case SOME:
     System.out.printf("SimpleEvent_Example.handle SOME: id= %1$s", id);
     System.out.println();
     break;
   case OTHER:
     System.out.printf("SimpleEvent_Example.handle OTHER: id= %1$s", id);
     System.out.println();
     break;
   }
 }
}

Easy, hm?

Let’s look at this line by line. The class implements an interface and thus listens to SimpleEvent, which would normally be the event class (e.g. WindowEvent). The enumeration contains a nested interface called Listener, which is implemented here. For listening to multiple event types this would just be ‘implements SimpleEvent.Listener, OtherEvent.Listener’ and so on. It’s clearly declared which event types this class listens to, so what may look confusing at first is quite obvious on second thought.

In the main method we first create a new instance of the example class and add this instance as a listener to the enumeration/event type. It has static methods for adding and removing listeners, so this is straightforward.

Next the event SOME of the type SimpleEvent is raised. As the enumeration variable IS the event, it has a raise method. So this is completely decoupled from any class actually raising the event. Here the raise method is public, but you can make it package-private, for example, if you want to limit which classes are allowed to raise events. So the event SOME of the event type SimpleEvent is like the previously mentioned windowActivated event of the event type WindowEvent.

The handle method is implemented as declared in the Listener interface. For acting upon an event, the raised event is provided as well as the argument id from raising the event. A simple switch on the event makes reacting to multiple events inside one method possible, instead of using a method per event as with WindowListener (windowActivated, windowClosed, …). Now when adding a new event, you do not NEED to edit all listeners anymore, if you design your switch right (don’t forget about default).

What’s missing is the source. But as the Listener interface and the raise method are up to the developer when creating the enumeration for the event type, an additional source parameter can be added at will.

How is SimpleEvent implemented then? Simple of course:


public enum SimpleEvent {

 // 1. declare events as enum variables
 SOME, OTHER;

 // 2. add the nested interface for listeners
 public interface Listener {

   public void handle(SimpleEvent event, String id);

 }

 // 3. create delegator for handling listeners and events
 private static JEvents<Listener, SimpleEvent> EVENTS = JEvents.<Listener, SimpleEvent> create();

 // 4. add raise method and delegate to event types raise method
 public void raise(String id) {
   EVENTS.raise(this, new Object[] { id });
 }

 // 5. create static delegates for adding and removing listeners
 public static boolean addListener(Listener listener) {
   return EVENTS.addListener(listener);
 }

 public static boolean removeListener(Listener listener) {
   return EVENTS.removeListener(listener);
 }
}

As you can see there is not much code involved but there is a static instance of JEvents doing all the hard stuff in the background. So let’s look a little deeper inside the enum first and then go into JEvents…

The first few lines of the enumeration are just standard stuff: declaring the events as enum variables and defining the listener interface. Here we have one constraint though: the first parameter of each method of the listener interface must be the enumeration itself and is called ‘event’ by convention. One can always add more listener methods, but this is usually not needed: as the event is an enumeration variable, you can switch inside the method implementation instead of implementing multiple methods.

The third step creates the background instance of the class JEvents. This create call is just another way of providing concrete type information to a generic static create method instead of passing the class types. The JEvents instance uses the type information to restrict the raise method and the listener add and remove methods to the correct interface and enumeration, giving you a little type safety for free.

At step four the non-static raise method is added to make the raise calls from the usage example possible. As the event itself is known by then (the selected enumeration variable), ‘this’ can be used when calling JEvents. As you can see, the second part of the raise call looks rather awkward but is needed. Here you need to pass all the arguments matching the parameter signature of one of the listener methods, except the event itself. The reason why I used an Object[] instead of varargs is listener methods that take varargs or arrays. When passing a String[], for example, to a varargs raise method, it would be treated as the Object[] of arguments itself rather than as a single argument. So when looking for the right listener method to call, we would have the wrong signature. Thus, to reduce errors, I decided to use an Object[] directly, so that I do not need to think about when I have to create a new Object[] and when using varargs is enough. A slightly awkward syntax is the only trade-off here for reducing errors. Note that you can still use varargs in the listener methods, as they are compiled to arrays and JEvents can find the right method to call.
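
The varargs pitfall is easy to reproduce in isolation. The following sketch has nothing to do with JEvents itself; VarargsPitfall and its count() method are made up to show how many arguments a varargs method actually receives when it is handed a String[]:

```java
class VarargsPitfall {

    // Returns how many arguments the varargs method actually received.
    static int count(Object... args) {
        return args.length;
    }

    public static void main(String[] unused) {
        String[] ids = { "a", "b" };

        // The array is taken as the argument list itself. An uncast count(ids)
        // compiles to the same call, with a javac warning about the inexact type.
        System.out.println(count((Object[]) ids));       // prints 2

        // The array is passed as one single argument, which is what a listener
        // method with a String[] parameter would need.
        System.out.println(count((Object) ids));         // prints 1

        // Wrapping it explicitly, as the raise method does, has the same effect.
        System.out.println(count(new Object[] { ids })); // prints 1
    }
}
```

With an explicit Object[] parameter there is only one way to read the call, which is the trade-off described above.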

There are multiple raise methods in JEvents. I can use veto events (returning true if one listener returns true or throws an exception), synchronous events (like here), asynchronous events and repeated events. The developer decides which kind of event to use when implementing the enumeration’s raise method and its delegation to JEvents.

Finally, adding and removing listeners are just static delegates to the JEvents instance, and you’re done. This is far less code and far fewer changes for scaling than the classic approach:

– Adding a new event type means creating a new enumeration (<30 lines of code)
– Adding a new event is just adding a new enum variable, with no or few listener changes involved
– No additional raise methods are needed if the listener method can handle all the parameters
– No list handling for listeners, as adding/removing is just a delegation
– JEvents does all the synchronizing for you
– Completely decoupled

So how does JEvents work?

The JEvents instance manages a list of listeners for the interface and lets you add/remove listeners at any time via the enumeration itself through delegation. When a raise method is called, JEvents knows the signature of the listener method, looks up the corresponding method and calls it for every listener. It can handle arguments whose types are assignable to a parameter’s type and takes primitive types into account while checking the signature. Basically that’s all. The fine print involves error handling, asynchronous/delayed/repeated events via scheduled executors that can be canceled, synchronizing via ReentrantLocks for multi-threaded usage and veto events as mentioned above. I’ll just show the part that actually checks whether a method is the listener method for the event and the arguments:


private boolean isListener(Method m, E event, Object[] args) throws Exception {
   if (args == null) {
     args = new Object[0];
   }
   Class<?>[] parameterTypes = m.getParameterTypes();
   if (parameterTypes.length == args.length + 1) {
     if (parameterTypes[0].equals(event.getClass())) {
       for (int i = 0, max = args.length; i < max; i++) {
         Class<?> argsType = args[i].getClass();
         Class<?> parameterType = parameterTypes[i + 1];
         if (!argsType.equals(parameterType) && !parameterType.isAssignableFrom(argsType)) {
           if (Reflection.isPrimitveOrWrapper(argsType)) {
             argsType = Reflection.getPrimitiveClass(argsType);
             if (!argsType.equals(parameterType)) {
               return false;
             }
           } else {
             return false;
           }
         }
       }
       return true;
     }
   }
   return false;
 }

Note that a class Reflection is used, which is also a part of JBase but needs no further explanation here. We first check whether the number of the method’s parameters is correct, then whether the first parameter is the event’s type. After that we check for every other parameter whether the parameter’s class matches the argument’s class or is assignable from it. If not, we try once more for primitives, as the Object array contains only the wrapper classes of the primitives.
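
JEvents itself is not shown in full, so the following is only a sketch of the idea under my own assumptions: a hypothetical MiniEvents class that supports nothing but synchronous raising, takes the listener interface as a Class argument instead of the generic create() call, and uses CopyOnWriteArrayList in place of ReentrantLocks. DoorEvent is a made-up event type in the style of SimpleEvent, with an int parameter to exercise the primitive check.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for JEvents: synchronous raise only.
final class MiniEvents<L, E extends Enum<E>> {

    // Wrapper type per primitive, so an Integer argument matches an int parameter.
    private static final Map<Class<?>, Class<?>> WRAPPERS = Map.of(
            boolean.class, Boolean.class, byte.class, Byte.class,
            char.class, Character.class, short.class, Short.class,
            int.class, Integer.class, long.class, Long.class,
            float.class, Float.class, double.class, Double.class);

    private final Class<L> listenerType;
    // Copy-on-write keeps iteration safe while listeners are added or removed.
    private final List<L> listeners = new CopyOnWriteArrayList<>();

    MiniEvents(Class<L> listenerType) {
        this.listenerType = listenerType;
    }

    boolean addListener(L listener) {
        return listeners.add(listener);
    }

    boolean removeListener(L listener) {
        return listeners.remove(listener);
    }

    // Calls every interface method with the signature (event, args...) on every listener.
    void raise(E event, Object[] args) {
        Object[] call = new Object[args.length + 1];
        call[0] = event;
        System.arraycopy(args, 0, call, 1, args.length);
        for (Method method : listenerType.getMethods()) {
            if (!matches(method, event, args)) {
                continue;
            }
            for (L listener : listeners) {
                try {
                    method.invoke(listener, call);
                } catch (IllegalAccessException | InvocationTargetException e) {
                    throw new IllegalStateException("listener call failed", e);
                }
            }
        }
    }

    private boolean matches(Method method, E event, Object[] args) {
        Class<?>[] params = method.getParameterTypes();
        if (params.length != args.length + 1 || params[0] != event.getDeclaringClass()) {
            return false;
        }
        for (int i = 0; i < args.length; i++) {
            Class<?> param = params[i + 1];
            Class<?> boxed = param.isPrimitive() ? WRAPPERS.get(param) : param;
            // null fits any reference type but never a primitive parameter
            if (args[i] == null ? param.isPrimitive() : !boxed.isInstance(args[i])) {
                return false;
            }
        }
        return true;
    }
}

enum DoorEvent {
    OPENED, CLOSED;

    public interface Listener {
        void handle(DoorEvent event, String id, int count);
    }

    private static final MiniEvents<Listener, DoorEvent> EVENTS = new MiniEvents<>(Listener.class);

    public void raise(String id, int count) {
        EVENTS.raise(this, new Object[] { id, count });
    }

    public static boolean addListener(Listener listener) {
        return EVENTS.addListener(listener);
    }
}

class MiniEventsDemo {
    public static void main(String[] args) {
        DoorEvent.addListener((event, id, count) -> System.out.println(event + " " + id + " " + count));
        DoorEvent.OPENED.raise("front", 2);
    }
}
```

Looking the methods up on the interface rather than on the listener’s class keeps unrelated public methods with a similar signature from being called by accident.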

That concludes today’s journey into my JBase API. I hope you enjoyed the dive, and I’d like to get some feedback from you if you don’t mind leaving a comment.

Thank you and greetz,
GHad

Written by ghads

September 2, 2010 at 12:16 pm

Java 7 – Small language changes

Hi everyone,
Apart from the great proposals made by Neal Gafter, which can be found everywhere on the web, I have some too… But I don’t know how to make them “official”, because I lack knowledge about deeper changes to the specs or the compiler. Anyway, I’d like to post them; maybe someone can give me a hint on how to get them in…

At home I’ve written a small closures framework with double brace initialization, but I stumbled upon two issues that made me (almost) surrender:
– Java should allow access to non-final variables from inside anonymous classes
– And I need to finish the construction of an object in its super constructor to prevent direct execution of a double-braced anonymous class, as the instructions inside the double braces are “copied” into the anonymous constructor

Something I came across just today:

Java should ease the usage of static final fields/constants when they are declared inside the class whose method you call. Instead of writing

TableWrapData tdGrabName = new TableWrapData(TableWrapData.FILL_GRAB);

it should be possible to write

TableWrapData tdGrabName = new TableWrapData(FILL_GRAB);

The same goes for enumerations. As the method definition states the type, why do I need to write it again? E.g. having a class

class SomeClass { public SomeClass(SomeEnum value) {} }

instead of

new SomeClass(SomeEnum.Value);

I want to do

new SomeClass(Value);

of course with full IDE support 😉
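
Part of this wish can already be covered with static imports, which have been in the language since Java 5, although they need one explicit import per constant or class instead of being inferred from the parameter type. A small sketch, using the JDK’s TimeUnit enum in place of TableWrapData so that it runs without the Eclipse libraries on the classpath (the class StaticImportDemo is made up):

```java
import static java.util.concurrent.TimeUnit.SECONDS;

import java.util.concurrent.TimeUnit;

class StaticImportDemo {

    // The parameter type already says which enum is expected.
    static long toMillis(TimeUnit unit, long amount) {
        return unit.toMillis(amount);
    }

    public static void main(String[] args) {
        // SECONDS instead of TimeUnit.SECONDS, thanks to the static import
        System.out.println(toMillis(SECONDS, 2)); // prints 2000
    }
}
```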

Another one is to allow Interfaces for Annotation fields. That would ease the use of Enums that implement this interface, thus making customizable/extendable enums possible.

If you have questions or suggestions, feel free to post some comments, I’ll get back to you…

Greetz, GHad

Written by ghads

March 3, 2009 at 12:23 pm

Posted in Java development
