MapReduce on swift container

asked 2014-12-18 05:53:37 -0500

ashleydw

I have a Swift container with JSON files in pseudo-directories. I have also created a Hadoop cluster that I would like to run MapReduce jobs on. However, instead of providing a single object as input (passing the container gives the error "URL must be of the form swift://container.sahara/object"), I would like to run jobs across the whole container, i.e. search in each individual file.
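Outside of Hadoop, one way to process the whole container is to list every object and fetch each one. Below is a minimal sketch. It assumes `conn` is a python-swiftclient `Connection`, or anything else with the same `get_container`/`get_object` methods. `iter_container_objects` is a hypothetical helper name. Skipping names that end in "/" assumes those are zero-byte pseudo-directory marker objects.

```python
from typing import Iterator, Optional, Tuple


def iter_container_objects(
    conn, container: str, prefix: Optional[str] = None
) -> Iterator[Tuple[str, bytes]]:
    """Yield (object_name, body) for every object in a Swift container.

    `conn` is assumed to behave like swiftclient.client.Connection.
    A flat listing (no delimiter) already includes objects inside
    pseudo-directories, because the "/" is just part of the object name.
    """
    # full_listing=True asks swiftclient to follow pagination for us
    _headers, listing = conn.get_container(container, prefix=prefix, full_listing=True)
    for entry in listing:
        name = entry["name"]
        if name.endswith("/"):
            # Assumption: "dir/" objects are directory markers, not data
            continue
        _obj_headers, body = conn.get_object(container, name)
        yield name, body
```

Each object is downloaded one at a time, so this is only practical for modest containers. How you build `conn` (auth URL, credentials, auth version) depends on your deployment.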

Is it possible to set this up? I'm open to alternatives to MapReduce.

Thanks

Edit: My question is partly answered by this Amazon answer, so I think I need an alternative. I would like to find files where a data attribute matches a search value; for example, find all files where the timestamp is X.
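The search itself is a simple filter over (name, body) pairs. Below is a minimal sketch. It assumes each object holds a single JSON document (not one record per line) and that the attribute is a top-level key such as `timestamp`. `find_matching` is a hypothetical name.

```python
import json
from typing import Any, Iterable, Iterator, Tuple


def find_matching(
    objects: Iterable[Tuple[str, bytes]], field: str, value: Any
) -> Iterator[str]:
    """Yield names of JSON objects whose top-level `field` equals `value`.

    Objects that fail to parse, or that are not JSON objects (e.g. arrays),
    are skipped rather than aborting the whole scan.
    """
    for name, body in objects:
        try:
            doc = json.loads(body)  # accepts bytes on Python 3.6+
        except ValueError:  # covers JSONDecodeError and bad encodings
            continue
        # Check membership first so a missing key never matches value=None
        if isinstance(doc, dict) and field in doc and doc[field] == value:
            yield name
```

For example, `find_matching(pairs, "timestamp", X)` yields the matching object names. Note that this re-reads every object on every query, so its cost grows linearly with the size of the container.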

Are you trying to recursively traverse input directories? http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-troubleshoot-errors-io.html
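On the Hadoop side, Hadoop 2.x `FileInputFormat` has a `mapreduce.input.fileinputformat.input.dir.recursive` property for recursive input. I have not checked whether the Swift filesystem driver used here honours it. On the Swift side there are no real directories. However, listing with `delimiter="/"` groups names into pseudo-directories, so a recursive walk is possible. Below is a minimal sketch. It assumes a swiftclient-style `conn`. `walk_pseudo_dirs` is a hypothetical name.

```python
from typing import Iterator, Optional


def walk_pseudo_dirs(conn, container: str, prefix: Optional[str] = None) -> Iterator[str]:
    """Recursively yield object names, descending into pseudo-directories.

    With delimiter="/", a Swift listing returns the objects directly under
    `prefix` plus {"subdir": "..."} entries, one per pseudo-directory.
    """
    _headers, listing = conn.get_container(
        container, prefix=prefix, delimiter="/", full_listing=True
    )
    for entry in listing:
        if "subdir" in entry:
            # Descend into the pseudo-directory (its name ends with "/")
            yield from walk_pseudo_dirs(conn, container, prefix=entry["subdir"])
        elif not entry["name"].endswith("/"):
            # Skip zero-byte directory marker objects such as "a/"
            yield entry["name"]
```

A flat listing without a delimiter returns the same names in one pass. The recursive form is mainly useful when you want to skip some pseudo-directories entirely.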
