000 a
999 _c31327
_d31327
008 221014b xxu||||| |||| 00| 0 eng d
020 _a9780134846019
082 _a006.312
_bAVE
100 _aAven, Jeffrey
245 _aData analytics with Spark using Python
260 _aBoston :
_bAddison-Wesley,
_c2018
300 _ax, 306 p. ;
_bill.
_c24 cm
365 _b475.00
_cINR
_d01
504 _aIncludes bibliographical references and index.
520 _aSpark for Data Professionals introduces and solidifies the concepts behind Spark 2.x, teaching working developers, architects, and data professionals exactly how to build practical Spark solutions. Jeffrey Aven covers all aspects of Spark development, from basic programming to SparkSQL, SparkR, Spark Streaming, messaging, NoSQL, and Hadoop integration. Each chapter presents practical exercises deploying Spark to your local or cloud environment, plus programming exercises for building real applications. Unlike other Spark guides, Spark for Data Professionals explains crucial concepts step-by-step, assuming no extensive background as an open source developer. It provides a complete foundation for quickly progressing to more advanced data science and machine learning topics. This guide will help you: understand Spark basics that will make you a better programmer and cluster “citizen”; master Spark programming techniques that maximize your productivity; choose the right approach for each problem; make the most of built-in platform constructs, including broadcast variables, accumulators, effective partitioning, caching, and checkpointing; and leverage powerful tools for managing streaming, structured, semi-structured, and unstructured data.
650 _aPython
650 _aAnonymous function
650 _aApache
650 _aBig data
650 _aBroadcast method
650 _aCheckpointing
650 _aCreate Direct Stream method
650 _aData frames
650 _aDStreams
650 _aEnvironment variables
650 _aFlatMap
650 _aHadoop
650 _aHigher-order function
650 _aJSON
650 _aJDBC
650 _aLambda syntax
650 _aMapReduce
650 _aNoSQL systems
650 _aPySpark
650 _aSpark cluster architecture
650 _aTuple function
650 _aUnpersist method
650 _aYARN
942 _2ddc
_cBK