Hemanth Kumar M

37 Followers


Published in Towards AWS · Sep 1, 2022

Hive Metastore Service on EKS

Running a highly available and scalable Hive Metastore service on EKS. Apache Hive is open source data warehousing software for big data that provides distributed data management, metadata management, and a SQL query engine. …

Apache Hive

3 min read



Published in Nerd For Tech · Aug 29, 2022

Kubernetes CronJob: Automate Administrative tasks in EKS

A simple use case for automating administrative tasks in EKS with a Kubernetes CronJob. Kubernetes is one of the most widely used open source container orchestration platforms. For most DevOps engineers, the quick and handy way to automate a task on Unix is to schedule a script with crontab. Similarly, Kubernetes provides a native way to run tasks on a fixed schedule. This…

Kubernetes

2 min read



Published in Nerd For Tech · Dec 12, 2021

Spark remote job submission to Yarn running on AWS EMR

Spark remote job submission lets a client submit Spark jobs to a YARN cluster from anywhere, decoupling the client from the cluster. It can also be used to submit Spark jobs to different YARN clusters running different versions of Hadoop and Spark. Problem Statement: We are using an AWS EMR cluster with YARN as the resource manager to run…

Apache Spark

3 min read



Published in Towards Data Science · Aug 8, 2021

Compare PySpark DataFrames based on Grain

A simple approach to comparing PySpark DataFrames based on grain and generating reports with data samples. Comparing two datasets and producing accurate, meaningful insights is a common and important task in the big data world. By running parallel jobs in PySpark, we can efficiently compare huge datasets based on grain and generate reports that pinpoint the differences at the level of each column. Requirement:

Big Data Analytics

4 min read



Published in Nerd For Tech · Aug 3, 2021

Simple AWS S3 Logging in Python3 Using Boto3

A simple way of implementing S3 logging in Python 3. One of the key aspects of a software framework is logging the necessary information and persisting it. Using boto3 and the native Python logger, it's easy to persist logs to AWS S3 from within a Python program. Logs help to debug, monitor, audit, and understand the behaviour of the framework…

Big Data

2 min read



Big Data Infrastructure and Platform Engineering | Automation | Cloud

Following
  • Netflix Technology Blog
  • Amit Singh Rathore
  • Kubernetes Advocate
  • Crack FAANG
  • AirbnbEng
