MapReduce

2013. 2. 26. 17:33

MapReduce is a programming model for processing large data sets, and the name of an implementation of the model by Google. MapReduce is typically used to do distributed computing on clusters of computers.

The model is inspired by the map and reduce functions commonly used in functional programming, although their purpose in the MapReduce framework is not the same as in their original forms.
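For comparison, the functional-programming originals can be sketched with Java streams. This is only an illustration: the class and method names (FunctionalMapReduce, sumOfSquares) are made up for this note. Here map transforms each element independently and reduce folds all the results into a single value.

```java
import java.util.Arrays;
import java.util.List;

class FunctionalMapReduce {
    // map: apply a function to every element; reduce: fold all results into one value
    static int sumOfSquares(List<Integer> numbers) {
        return numbers.stream()
                .map(n -> n * n)          // map step: square each number
                .reduce(0, Integer::sum); // reduce step: add the squares together
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(Arrays.asList(1, 2, 3, 4))); // prints 30
    }
}
```

In the MapReduce framework, by contrast, map emits key/value pairs, and reduce is called once per key with all the values collected for that key.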

MapReduce libraries have been written in many programming languages. A popular free implementation is Apache Hadoop.

Example

Sample code

Mapper (map): receives the input and processes it; the processed output is sorted by key

Reducer (reduce): receives the results processed by the Mapper and merges them
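Before the Hadoop classes, the flow between the two roles can be sketched in plain Java, without any cluster. This is a minimal single-process simulation, not how Hadoop is implemented: the class WordCountSketch and its methods are hypothetical names, and a TreeMap stands in for the framework's shuffle/sort step that groups values by key.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WordCountSketch {
    // Map phase: emit a (word, 1) pair for every word in one input line
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: merge all values collected for one key
    static int reduce(String word, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    static Map<String, Integer> run(List<String> lines) {
        // Shuffle/sort stand-in: group values by key; TreeMap keeps the keys sorted
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        // Call reduce once per key
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            counts.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("a b a", "b c"))); // prints {a=2, b=2, c=1}
    }
}
```

In a real cluster the map and reduce calls run on different machines, and the grouping step is done by the framework rather than by an in-memory map.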

Mapper

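import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

//--- Input : Object, a Long-type key; Text, one line read from the file
//---         An input reader (in Hadoop, an InputFormat) can be used to change the form of the incoming value (Text)
//--- Output: Text, IntWritable -> passed to Reduce; Text is the key
public class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {
	private final static IntWritable one = new IntWritable(1);
	@Override
	public void map(Object key, Text word, Context context) throws IOException, InterruptedException {
		// Emits (line, 1) as-is, so this counts words only if each input line holds a single word
		context.write(word, one);
	}
}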

Reducer

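import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

//--- Input : Text, IntWritable <- received from Map; the values arrive grouped by the Text key
//--- Output: Text, IntWritable
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
	// Reused for every key to avoid allocating a new object per output record
	private final IntWritable count = new IntWritable();
	@Override
	public void reduce(Text word, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
		// Add up all the counts emitted for this word
		int sum = 0;
		for (IntWritable val : values) {
			sum += val.get();
		}
		count.set(sum);
		context.write(word, count);
	}
}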
