
Testing MapReduce (MongoDB) in Java


Yesterday a co-worker of mine thought about using MapReduce in MongoDB to calculate the visibility of products for our company shop. Not convinced that an operation which basically just groups is the right tool for the job, I wanted a way to test MapReduce operations without the hassle of setting up a MongoDB instance with example data for every co-worker, and without having to use JavaScript from our Java environment (think of debugging…). So I decided to provide a tiny implementation in Java that behaves like MapReduce in MongoDB (minus the parallelization). This way all of my co-workers can play with their operations and check the practicability of their solutions.

So today I quickly coded the following class, which imho also helps explain to newbies how MapReduce works:

import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

public abstract class MapReduce<In, Out> {

	public static <Input, Output> Map<Object, Output> execute(MapReduce<Input, Output> operation, Collection<Input> input) {
		if (operation == null || input == null || input.isEmpty()) throw new IllegalArgumentException();

		// map input to emit values
		for (Input in : input) {
			operation.map(in);
		}

		// reduce emitted values
		Map<Object, Output> result = new HashMap<Object, Output>();
		for (Map.Entry<Object, Collection<Output>> entry : operation.emits.entrySet()) {
			Object key = entry.getKey();
			Output reduced = operation.reduce(key, entry.getValue());
			result.put(key, reduced);
		}
		return result;
	}

	// values emitted during the map phase, grouped by key
	private Map<Object, Collection<Out>> emits = new HashMap<Object, Collection<Out>>();

	protected void emit(Object key, Out value) {
		Collection<Out> forKey = emits.get(key);
		if (forKey == null) {
			forKey = new ArrayList<Out>();
			emits.put(key, forKey);
		}
		forKey.add(value);
	}

	// map phase: call emit(key, value) at least once per input element
	public abstract void map(In input);

	// reduce phase: combine all values emitted for one key into a single result
	public abstract Out reduce(Object key, Collection<Out> emits);

}

So we have an abstract class that defines a “map” and a “reduce” method, both of which must be implemented by the operation. There is an internal method “emit” which must be called from the map method (map phase) to add an entry for the reduce phase under a key (the grouping). And there is a static method “execute” that drives the abstract methods for a collection of input elements: first the map method is called for each element, then the reduce method is called once per key with all values emitted for that key.

Testing your own MapReduce operation is now easy; here is an example implementation:

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

public class MapReduce_Test {

	public static class Customer {
		public String name;
	}

	public static class GroupByName extends MapReduce<Customer, Integer> {

		@Override
		public void map(Customer input) {
			emit(input.name, 1);
		}

		@Override
		public Integer reduce(Object key, Collection<Integer> emits) {
			int count = 0;
			for (Integer i : emits) {
				count = count + i;
			}
			return count;
		}
	}

	public static void main(String[] args) {
		// create some customers
		List<Customer> input = new ArrayList<Customer>();
		for (int i = 0; i < 1000; i++) {
			Customer in = new Customer();
			in.name = "name" + (i % 6);
			input.add(in);
		}

		// execute MapReduce operation
		Map<Object, Integer> result = MapReduce.execute(new GroupByName(), input);

		// show result
		System.out.println(result);
	}

}

Here we simply group the customers by name and count the number of occurrences for each name. Since the 1000 customers cycle through six names, four names occur 167 times and two occur 166 times. The output of this test is: {name1=167, name5=166, name2=167, name3=167, name4=166, name0=167}

So my co-workers can now test their MapReduce operations more easily, and once an operation produces the desired output for the given input, we can transform the map/reduce methods from Java to JavaScript, saving JavaScript debugging time and headscratching woes 😉
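For the GroupByName example above, the translation could look roughly like the following; this is just a minimal sketch for the mongo shell, assuming MongoDB 1.8+ (for the inline output option) and a hypothetical “customers” collection whose documents carry a “name” field:

var map = function () {
	// one emit per customer document, keyed by name
	emit(this.name, 1);
};

var reduce = function (key, values) {
	// sum up the emitted 1s for this name
	var count = 0;
	for (var i = 0; i < values.length; i++) {
		count += values[i];
	}
	return count;
};

// run the operation and return the results inline instead of writing to a collection
db.customers.mapReduce(map, reduce, { out: { inline: 1 } });

The map and reduce bodies are line-for-line what the Java methods do, which is exactly the point of testing them in Java first.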

Greetz,
GHad


Written by ghads

February 17, 2011 at 11:17 am