Given all my recent posts on Hadoop Streaming, I thought it would be useful to summarize them in a single post. The main objective of these posts was to put together a codebase that enables F# developers to write Map/Reduce libraries through a simple API.
The full code posting can be found here: https://code.msdn.microsoft.com/Hadoop-Streaming-and-F-f2e76850
The idea was to provide reusable code so that one only needs to implement the Map and Reduce functions, using the following function prototypes:
For Text Streaming:
Map : string -> (string * obj) option
Reduce : string -> seq<string> -> obj option
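As an illustration of the text-streaming prototypes, here is a minimal sketch of a mapper and reducer with those signatures, plus a small in-memory runner that imitates what Hadoop Streaming does between the two phases: write each pair as a tab-separated line, sort by key, group, and reduce. The "category,count" input format, the function names (mapLine, reduceCounts, runLocal) and the runner itself are hypothetical and are not taken from the posted codebase; a real streaming job reads records from standard input and writes key/value lines to standard output.

```fsharp
open System
open System.Globalization

// Hypothetical input: one "category,count" record per line.
let tryParseInt (s: string) =
    match Int32.TryParse(s.Trim(), NumberStyles.Integer, CultureInfo.InvariantCulture) with
    | true, n -> Some n
    | _ -> None

// Map : string -> (string * obj) option
// Malformed lines return None and are skipped.
let mapLine (line: string) : (string * obj) option =
    match line.Split(',') with
    | [| category; count |] ->
        tryParseInt count |> Option.map (fun n -> (category.Trim(), box n))
    | _ -> None

// Reduce : string -> seq<string> -> obj option
// Values arrive as text because streaming exchanges "key<TAB>value" lines.
let reduceCounts (key: string) (values: seq<string>) : obj option =
    values
    |> Seq.sumBy (fun v -> Int32.Parse(v, CultureInfo.InvariantCulture))
    |> box
    |> Some

// Render a value the way a driver might write it to stdout.
let toText (value: obj) =
    match value with
    | :? IFormattable as f -> f.ToString(null, CultureInfo.InvariantCulture)
    | _ -> string value

// Split a record at the first tab (Hadoop Streaming's default key/value separator).
let splitRecord (record: string) =
    let i = record.IndexOf('\t')
    (record.Substring(0, i), record.Substring(i + 1))

// Hypothetical local runner: map, serialize, sort (standing in for the shuffle), group, reduce.
let runLocal (lines: seq<string>) : Map<string, string> =
    lines
    |> Seq.choose mapLine
    |> Seq.map (fun (k, v) -> k + "\t" + toText v)
    |> Seq.sort
    |> Seq.map splitRecord
    |> Seq.groupBy fst
    |> Seq.choose (fun (k, pairs) ->
        reduceCounts k (Seq.map snd pairs) |> Option.map (fun r -> (k, toText r)))
    |> Map.ofSeq

let result = runLocal [ "books,3"; "music,2"; "books,4"; "not a record" ]
// result.["books"] = "7", result.["music"] = "2"
printfn "%A" result
```

The reducer receives strings rather than objects because, in streaming mode, everything crossing the process boundary is text; that is why the Reduce prototype takes seq<string> while Map produces obj.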
For Binary Streaming:
Map : WordprocessingDocument -> seq<string * obj>
Map : PdfReader -> seq<string * obj>
Reduce : string -> seq<string> -> obj option
For XML Streaming:
Map : XElement -> seq<(string * string) * obj>
Reduce : string * string -> seq<string> -> obj option
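For the XML prototypes, a similar minimal sketch follows. It assumes a hypothetical record element of the form <order customer="..."><item product="..." qty="..."/></order> and keys each quantity by a (customer, product) pair, matching the string * string key in the signatures. The element and attribute names and the functions mapOrder and reduceOrder are illustrative only, and the code assumes System.Xml.Linq is available, as it is in a standard .NET installation.

```fsharp
open System
open System.Globalization
open System.Xml.Linq

// Read an attribute value, or None when the attribute is absent.
let attr (name: string) (e: XElement) : string option =
    match e.Attribute(XName.Get name) with
    | null -> None
    | a -> Some a.Value

// Map : XElement -> seq<(string * string) * obj>
// Hypothetical record: <order customer="..."><item product="..." qty="..."/>...</order>
// Emits ((customer, product), quantity) for each <item> that has both attributes.
let mapOrder (order: XElement) : seq<(string * string) * obj> =
    seq {
        match attr "customer" order with
        | None -> ()
        | Some customer ->
            for item in order.Elements(XName.Get "item") do
                match attr "product" item, attr "qty" item with
                | Some product, Some qty ->
                    yield ((customer, product), box (Int32.Parse(qty, CultureInfo.InvariantCulture)))
                | _ -> ()
    }

// Reduce : string * string -> seq<string> -> obj option
// Sums the quantities for one (customer, product) key; values arrive as text.
let reduceOrder (key: string * string) (values: seq<string>) : obj option =
    if Seq.isEmpty values then None
    else
        values
        |> Seq.sumBy (fun v -> Int32.Parse(v, CultureInfo.InvariantCulture))
        |> box
        |> Some

let order =
    XElement.Parse("<order customer=\"c1\"><item product=\"p1\" qty=\"2\"/><item product=\"p2\" qty=\"5\"/></order>")

mapOrder order |> Seq.iter (printfn "%A")
```

Returning a sequence from Map, rather than a single option as in the text case, lets one XML record fan out into several key/value pairs.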
Here is the full list of posts:
Hadoop Streaming and F# MapReduce
Using Hadoop on Azure JS Console for Data Visualizations
MapReduce Tester: A Quick Word
Hadoop Binary Streaming and F# MapReduce
Hadoop Binary Streaming and PDF File Inclusion
Hadoop Streaming and Reporting
Hadoop Streaming and Windows Azure Blob Storage
Hadoop XML Streaming and F# MapReduce
Look out for more Hadoop posts in the coming months.