Sunday, January 19, 2025

Security Certificates

 

1. Cryptography Basics

  • Understand Key Concepts:
    • Encryption, decryption, hashing, and digital signatures.
    • Key terms: confidentiality, integrity, authentication, non-repudiation.
  • Learn Symmetric Cryptography:
    • Algorithms: AES, DES (legacy, avoid for new work), ChaCha20.
    • Modes of operation: ECB, CBC, GCM.
  • Learn Asymmetric Cryptography:
    • Algorithms: RSA, ECC, DSA.
    • Key exchange (Diffie-Hellman).
  • Practical Steps:
    • Use OpenSSL for simple encryption and decryption (see the sketch below).
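
  A minimal sketch of password-based symmetric encryption with OpenSSL; file names are placeholders:

    openssl enc -aes-256-cbc -pbkdf2 -salt -in secret.txt -out secret.enc   # encrypt (prompts for a password)
    openssl enc -d -aes-256-cbc -pbkdf2 -in secret.enc -out decrypted.txt   # decrypt with the same password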

2. Keys: Private vs. Public

  • Private Key:
    • Always secret, used for decryption and signing.
    • Stored securely (e.g., in HSM or keystore).
  • Public Key:
    • Shared openly, used for encryption and signature verification.
  • Master Public-Private Key Pair Usage:
    • Encrypt a message using the public key and decrypt with the private key (see the sketch below).
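
  A minimal OpenSSL sketch of the key-pair round trip; key and message file names are placeholders:

    openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 -out private.pem   # generate a private key
    openssl pkey -in private.pem -pubout -out public.pem                            # derive the public key
    openssl pkeyutl -encrypt -pubin -inkey public.pem -in msg.txt -out msg.enc      # encrypt with the public key (message must be smaller than the key size)
    openssl pkeyutl -decrypt -inkey private.pem -in msg.enc -out msg.dec            # decrypt with the private key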

3. X.509 Certificates

  • What is X.509?
    • A standard format for public key certificates.
    • Includes details like subject, issuer, validity period, and public key.
  • Practical Work:
    • Generate certificates using OpenSSL.
    • Create a Certificate Authority (CA) and sign requests (see the sketch below).
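
  A minimal sketch of a test CA with OpenSSL; subject names and file names are placeholders, and this is not a production setup:

    # create a self-signed CA certificate
    openssl req -x509 -newkey rsa:2048 -nodes -keyout ca.key -out ca.crt -days 365 -subj "/CN=MyTestCA"
    # create a server key and certificate signing request (CSR)
    openssl req -newkey rsa:2048 -nodes -keyout server.key -out server.csr -subj "/CN=example.com"
    # sign the CSR with the CA
    openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out server.crt -days 365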

4. PKCS Standards

  • Key Standards:
    • PKCS#11: Cryptographic token interface (e.g., HSM integration).
    • PKCS#12: Personal Information Exchange format for private keys and certificates.
    • PKCS#7: Cryptographic message syntax for signed/encrypted data.
  • Practical Work:
    • Use pkcs11-tool and OpenSSL to explore PKCS implementations (a PKCS#12 example follows below).
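
  A minimal PKCS#12 sketch with OpenSSL (file names are placeholders): bundle a private key and certificate chain into one .p12 file, then inspect it:

    openssl pkcs12 -export -inkey server.key -in server.crt -certfile ca.crt -out bundle.p12   # prompts for an export password
    openssl pkcs12 -in bundle.p12 -info -noout                                                 # show bundle details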

5. PEM vs. DER

  • PEM (Privacy-Enhanced Mail):
    • Base64-encoded certificate format, wrapped in -----BEGIN/END----- markers.
    • Common extensions: .pem, .crt, .cer.
  • DER (Distinguished Encoding Rules):
    • Binary certificate format.
    • Common extension: .der (note that .crt and .cer files may hold either encoding).
  • Practical Comparison:
    • Convert between PEM and DER using OpenSSL (reverse conversion and inspection shown below):
      openssl x509 -in cert.pem -outform der -out cert.der
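      openssl x509 -in cert.der -inform der -out cert.pem -outform pem   # DER back to PEM
      openssl x509 -in cert.pem -text -noout                             # inspect certificate fields (issuer, validity, key)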

6. Keystore and Truststore

  • Keystore:
    • Stores private keys and associated certificates.
    • Used by applications to establish their identity.
  • Truststore:
    • Stores trusted certificates (usually CA certificates).
    • Verifies the identity of external systems.
  • Tools:
    • Use Java keytool to manage keystores and truststores (see the sketch below).
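
  A minimal keytool sketch; alias and file names are placeholders:

    keytool -genkeypair -alias myapp -keyalg RSA -keysize 2048 -keystore keystore.p12 -storetype PKCS12   # create a keystore with a key pair
    keytool -list -v -keystore keystore.p12                                                               # inspect keystore contents
    keytool -importcert -alias myca -file ca.crt -keystore truststore.p12 -storetype PKCS12               # add a CA cert to a truststore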

  • Certificate (X.509 Certificate):

    • A digital certificate (e.g., X.509) is a document that binds a public key to an entity (such as a person, organization, or website).
    • Contains:
      • The public key of the entity.
      • Information about the entity (name, organization, etc.).
      • Issuer details (Certificate Authority - CA).
      • Validity period (start and expiry dates).
      • Signature of the CA to ensure authenticity.
    • Does not contain: The private key.

    Example:
    A website's SSL/TLS certificate contains the site's public key but not the private key.

  • Key (Public/Private Key):

    • A key pair consists of:
      • Public key: Used for encryption or signature verification.
      • Private key: Used for decryption or signing (must be kept secret).
    • A key itself does not contain a certificate, but it can be associated with a certificate.
  • How They Work Together:

    • When a certificate is issued, it includes the public key that corresponds to a private key stored securely by the certificate owner.
    • When encrypting or verifying data, the certificate's public key is used.
    • The corresponding private key (not in the certificate) is used to decrypt or sign.
    Sunday, December 29, 2024

    Kubernetes

    Kubernetes Architecture Diagram


    Kubernetes architecture is designed to manage and orchestrate containerized applications efficiently. It consists of a Master Node (control plane) and multiple Worker Nodes, along with various components that communicate and collaborate to ensure application reliability, scalability, and high availability.




     

    Kubernetes Architecture Overview

    1. Master Node (Control Plane)

    The master node is responsible for managing the cluster, maintaining the desired state of applications, and scheduling workloads.

    Key Components of the Master Node

    1. API Server:

      • Acts as the cluster's front-end.
      • Receives REST API requests from users, tools, and other components.
      • Validates and processes the requests, and updates the cluster's state in etcd.
    2. Scheduler:

      • Assigns work (Pods) to worker nodes based on resource availability, constraints, and policies.
      • Ensures efficient utilization of cluster resources.
    3. Controller Manager:

      • Runs various controllers (control loops) to ensure the desired state of the cluster.
      • Types of controllers:
        • Node Controller: Monitors node health.
        • Replication Controller: Ensures the desired number of pod replicas.
        • Endpoint Controller: Manages service and pod relationships.
        • Service Account Controller: Manages service accounts and API tokens.
    4. etcd:

      • A distributed key-value store used for storing cluster state and configuration data.
      • Provides consistency and high availability for cluster metadata.

    2. Worker Node

    Worker nodes run application workloads and manage containers. Each worker node is responsible for running Pods.

    Key Components of a Worker Node

    1. Kubelet:

      • An agent that runs on each worker node.
      • Ensures containers are running in the desired state as specified by the control plane.
      • Communicates with the API Server.
    2. Container Runtime:

      • Software responsible for running containers (e.g., Docker, containerd, CRI-O).
      • Interfaces with Kubernetes using the Container Runtime Interface (CRI).
    3. Kube Proxy:

      • Manages networking for Pods.
      • Implements network rules to allow communication between Pods and Services.
      • Supports various networking modes (e.g., iptables or IPVS).
    4. Pod:

      • The smallest deployable unit in Kubernetes, which encapsulates one or more containers, storage resources, and networking.
      • All Pods on a worker node are scheduled by the Master Node.

    3. Add-Ons

    Additional components that extend Kubernetes functionality:

    • Dashboard: A web UI for managing and monitoring the cluster.
    • DNS: Internal DNS for resolving service names within the cluster.
    • Ingress Controller: Manages HTTP and HTTPS traffic to applications.
    • Monitoring Tools: Tools like Prometheus or Grafana for monitoring.
    • Logging: Centralized logging solutions like Fluentd or Elasticsearch.

    Key Concepts

    1. Pod

    • The smallest deployable unit in Kubernetes.
    • Encapsulates containers, shared storage, and network.

    2. Service

    • Exposes a group of Pods as a network service.
    • Types:
      • ClusterIP (default): Internal access within the cluster.
      • NodePort: External access via a node's IP and a static port.
      • LoadBalancer: External access via a cloud provider's load balancer.

    3. Ingress

    • Manages external access to services, usually HTTP/HTTPS.

    4. ReplicaSet

    • Ensures a specified number of pod replicas are running at all times.

    5. Deployment

    • Manages updates, rollbacks, and scaling of applications.

    6. Namespace

    • Logical partitioning for resource isolation and organization.

    Benefits of Kubernetes Architecture

    1. High Availability: Redundancy at multiple levels ensures no single point of failure.
    2. Scalability: Automatically scales applications based on demand (see the autoscaling sketch after this list).
    3. Flexibility: Supports hybrid and multi-cloud deployments.
    4. Self-Healing: Automatically restarts or replaces failed containers.
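
    As a sketch of the scalability point (item 2 above), a Horizontal Pod Autoscaler can be created imperatively; the deployment name and thresholds are placeholders:

    kubectl autoscale deployment my-deployment --cpu-percent=80 --min=2 --max=10   # keep 2-10 replicas based on CPU load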

    Cluster Information

    kubectl cluster-info            # View cluster info
    kubectl get nodes               # List nodes in the cluster
    kubectl get componentstatuses   # Check cluster components' health

    Namespaces

    kubectl get namespaces                                     # List all namespaces
    kubectl create namespace <name>                            # Create a namespace
    kubectl delete namespace <name>                            # Delete a namespace
    kubectl config set-context --current --namespace=<name>    # Set default namespace

    Pods

    kubectl get pods                                          # List all pods in the current namespace
    kubectl get pods -n <namespace>                           # List pods in a specific namespace
    kubectl describe pod <pod-name>                           # Detailed information about a pod
    kubectl logs <pod-name>                                   # View logs of a pod
    kubectl exec -it <pod-name> -- /bin/bash                  # Access a pod's shell
    kubectl delete pod <pod-name>                             # Delete a pod
    kubectl get pods --field-selector=status.phase=Running    # List running pods

    Deployments

    kubectl get deployments                               # List deployments
    kubectl describe deployment <name>                    # Detailed deployment information
    kubectl apply -f deployment.yaml                      # Apply a deployment configuration
    kubectl scale deployment <name> --replicas=<count>    # Scale a deployment
    kubectl rollout restart deployment/<name>             # Restart deployment pods
    kubectl edit deployment <name>                        # Edit a deployment interactively
    kubectl delete deployment <name>                      # Delete a deployment

    Services

    kubectl get services                                            # List all services
    kubectl describe service <name>                                 # Detailed service information
    kubectl expose deployment <name> --type=NodePort --port=8080    # Expose a deployment
    kubectl apply -f service.yaml                                   # Apply a service configuration
    kubectl delete service <name>                                   # Delete a service

    ConfigMaps & Secrets

    kubectl get configmaps                                           # List all ConfigMaps
    kubectl create configmap <name> --from-literal=key=value         # Create a ConfigMap
    kubectl describe configmap <name>                                # Detailed ConfigMap information
    kubectl delete configmap <name>                                  # Delete a ConfigMap
    kubectl get secrets                                              # List all Secrets
    kubectl create secret generic <name> --from-literal=key=value    # Create a Secret
    kubectl describe secret <name>                                   # Detailed Secret information
    kubectl delete secret <name>                                     # Delete a Secret

    Ingress

    kubectl get ingress              # List all Ingress rules
    kubectl apply -f ingress.yaml    # Apply an Ingress configuration
    kubectl delete ingress <name>    # Delete an Ingress

    YAML File Templates

    Pod YAML

    apiVersion: v1
    kind: Pod
    metadata:
      name: my-pod
      labels:
        app: my-app
    spec:
      containers:
        - name: my-container
          image: nginx
          ports:
            - containerPort: 80

    Deployment YAML

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-deployment
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: my-app
      template:
        metadata:
          labels:
            app: my-app
        spec:
          containers:
            - name: my-container
              image: nginx
              ports:
                - containerPort: 80

    Service YAML

    apiVersion: v1
    kind: Service
    metadata:
      name: my-service
    spec:
      selector:
        app: my-app
      ports:
        - protocol: TCP
          port: 80
          targetPort: 80
      type: NodePort

    Ingress YAML

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: my-ingress
    spec:
      rules:
        - host: example.com
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: my-service
                    port:
                      number: 80

    Advanced Commands

    Config & Resources

    kubectl get all                  # Get all resources in the namespace
    kubectl top nodes                # Show resource usage of nodes
    kubectl top pods                 # Show resource usage of pods
    kubectl apply -f <file>          # Apply configuration from a file
    kubectl edit <resource> <name>   # Edit a resource interactively
    kubectl delete -f <file>         # Delete resources defined in a file

    Debugging

    kubectl describe <resource> <name>          # Detailed info about a resource
    kubectl logs <pod-name>                     # Fetch logs from a pod
    kubectl logs <pod-name> -c <container>      # Logs for a specific container
    kubectl exec -it <pod-name> -- <command>    # Run commands inside a container
    kubectl get events                          # View cluster events

    Rollouts

    kubectl rollout status deployment/<name> # Check rollout status
    kubectl rollout undo deployment/<name> # Rollback to a previous version




    Wednesday, November 27, 2024

    AWS

     

    • What are IAM roles and policies?
    • Lambda pros/cons (cold-start latency).
    • How many ways can we invoke a Lambda?
    • SQS vs. SNS.
    • S3 usage and the different types of S3 data encryption.
    • Can we create the same bucket name in different regions?
    • How to integrate Spring Boot with Lambda.
    • What is ECS, and how to deploy microservices: configuration steps for each (cluster creation, service creation, task definition, domains, VPC networking).
    • How to configure VPC inbound and outbound rules for ECS.
    • What is EC2, and how to deploy a Spring Boot microservice on it (pros and cons)?
    • Difference between Fargate and EC2.
    • What is Fargate?
    • AWS RDS and ECR configuration.
    • Use cases for EKS and how it differs from plain Kubernetes.
    • Difference between DynamoDB and MongoDB (with index creation).
    • Different types of AWS gateways and load balancers.
    • Subnet and VPC configuration for IP restrictions from one server to another.
    • Difference between CloudWatch and CloudTrail.
    • How to integrate Spring Boot with CloudWatch.
    • What is Elastic Beanstalk?
    • AWS CLI commands for ECS and EC2 (see the sketch below).
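
    A few illustrative AWS CLI commands for the last item; cluster and service names are placeholders:

    aws ecs list-clusters                                                            # list ECS clusters
    aws ecs describe-services --cluster my-cluster --services my-service             # inspect an ECS service
    aws ec2 describe-instances --filters "Name=instance-state-name,Values=running"   # list running EC2 instances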

    Saturday, November 23, 2024

    ReactJS

     

    Getting Started

    • Install React:

      npx create-react-app my-app
      cd my-app
      npm start
    • Basic Component:

      import React from 'react';
      const MyComponent = () => {
        return <h1>Hello, World!</h1>;
      };

      export default MyComponent;

    JSX (JavaScript XML)

    • Embed expressions in curly braces:

      const name = "React"; <h1>Hello, {name}!</h1>
    • Conditional rendering:

      const isLoggedIn = true;
      <div>{isLoggedIn ? 'Welcome!' : 'Please log in.'}</div>;
    • Apply CSS:

      <div style={{ color: 'blue', fontSize: '20px' }}>Styled Text</div>

    Props

    • Passing data to child components:

      const Welcome = ({ name }) => <h1>Hello, {name}!</h1>;
      <Welcome name="John" />;
    • Default props:

      Welcome.defaultProps = { name: 'Guest' };

    State (useState Hook)

    • Manage component state:
      import React, { useState } from 'react';
      const Counter = () => {
        const [count, setCount] = useState(0);
        return (
          <div>
            <p>Count: {count}</p>
            <button onClick={() => setCount(count + 1)}>Increment</button>
          </div>
        );
      };

    Effect (useEffect Hook)

    • Handle side effects (e.g., fetching data):
      import React, { useEffect, useState } from 'react';

      const FetchData = () => {
        const [data, setData] = useState([]);
        useEffect(() => {
          fetch('https://api.example.com/data')
            .then((response) => response.json())
            .then((data) => setData(data));
        }, []); // Empty array: runs only on mount
        return <div>{JSON.stringify(data)}</div>;
      };

    Events

    • Handling events:

      const handleClick = () => alert('Button clicked!');
      <button onClick={handleClick}>Click Me</button>;
    • Passing parameters:

      <button onClick={() => handleClick('React')}>Click Me</button>;

    Forms and Inputs

    • Controlled inputs:
      const [value, setValue] = useState('');
      const handleChange = (event) => setValue(event.target.value);
      return <input type="text" value={value} onChange={handleChange} />;

    Lists

    • Rendering lists:
      const items = ['React', 'Angular', 'Vue'];
      return (
        <ul>
          {items.map((item, index) => (
            <li key={index}>{item}</li>
          ))}
        </ul>
      );

    Routing (React Router)

    • Install React Router:

      npm install react-router-dom
    • Basic example:

      import { BrowserRouter as Router, Route, Link, Switch } from 'react-router-dom';

      const App = () => (
        <Router>
          <nav>
            <Link to="/">Home</Link> <Link to="/about">About</Link>
          </nav>
          <Switch>
            <Route path="/" exact component={() => <h1>Home</h1>} />
            <Route path="/about" component={() => <h1>About</h1>} />
          </Switch>
        </Router>
      );

      (Note: this is the React Router v5 API; v6 replaces Switch with Routes.)

    Lifecycle Methods

    • Using Hooks for lifecycle:
      • Mounting: useEffect(() => {}, []);
      • Updating: useEffect(() => {}, [dependencies]);
      • Unmounting:

        useEffect(() => {
          return () => {
            // Cleanup code here
          };
        }, []);
    • Mounting:
      • constructor()
      • getDerivedStateFromProps()
      • render()
      • componentDidMount()
    • Updating:
      • getDerivedStateFromProps()
      • shouldComponentUpdate()
      • render()
      • getSnapshotBeforeUpdate()
      • componentDidUpdate()
    • Unmounting:
      • componentWillUnmount()

    Context API

    • Context for global state:
      import React, { createContext, useContext } from 'react';
      const ThemeContext = createContext('light');

      const App = () => (
        <ThemeContext.Provider value="dark">
          <Toolbar />
        </ThemeContext.Provider>
      );

      const Toolbar = () => {
        const theme = useContext(ThemeContext);
        return <div>Theme: {theme}</div>;
      };

    Custom Hooks

    • Create reusable logic:
      const useCounter = (initialValue = 0) => {
        const [count, setCount] = useState(initialValue);
        const increment = () => setCount(count + 1);
        return [count, increment];
      };

      const Counter = () => {
        const [count, increment] = useCounter();
        return <button onClick={increment}>Count: {count}</button>;
      };

    Optimizations

    • React.memo: Prevent unnecessary renders.

      const MemoizedComponent = React.memo(MyComponent);
    • useCallback: Memoize functions.

      const memoizedCallback = useCallback(() => { doSomething(); }, [dependencies]);
    • useMemo: Memoize values.

      const memoizedValue = useMemo(() => computeExpensiveValue(a, b), [a, b]);



    Full Expansions
    imr - Import React
    import * as React from "react";

    imrc - Import React, Component
    import * as React from "react";
    import { Component } from "react";

    imrd - Import ReactDOM
    import ReactDOM from "react-dom";

    imrs - Import React, useState
    import * as React from "react";
    import { useState } from "react";

    imrse - Import React, useState, useEffect
    import * as React from "react";
    import { useState, useEffect } from "react";

    impt - Import PropTypes
    import PropTypes from "prop-types";

    impc - Import PureComponent
    import * as React from "react";
    import { PureComponent } from "react";

    cc - Class Component
    class | extends React.Component {
      render() {
        return <div>|</div>
      }
    }

    export default |;

    ccc - Class Component With Constructor
    class | extends Component {
      constructor(props) {
        super(props);
        this.state = { | };
      }
      render() {
        return ( | );
      }
    }

    export default |;

    cpc - Class Pure Component
    class | extends PureComponent {
      state = { | },
      render() {
        return ( | );
      }
    }

    export default |;

    ffc - Function Component
    function (|) {
        return ( | );
    }

    export default |;

    sfc - Stateless Function Component (Arrow function)
    const | = props => {
      return ( | );
    };

    export default |;

    cdm - componentDidMount
    componentDidMount() {
      |
    }

    uef - useEffect Hook
    useEffect(() => {
      |
    }, []);

    ucb - useCallback Hook
    useCallback((val) => {
      |
    }, []);

    cwm - componentWillMount
    //WARNING! To be deprecated in React v17. Use componentDidMount instead.
    componentWillMount() {
      |
    }

    cwrp - componentWillReceiveProps
    //WARNING! To be deprecated in React v17. Use new lifecycle static getDerivedStateFromProps instead.
    componentWillReceiveProps(nextProps) {
      |
    }

    gds - getDerivedStateFromProps
    static getDerivedStateFromProps(nextProps, prevState) {
      |
    }

    scu - shouldComponentUpdate
    shouldComponentUpdate(nextProps, nextState) {
      |
    }

    cwu - componentWillUpdate
    //WARNING! To be deprecated in React v17. Use componentDidUpdate instead.
    componentWillUpdate(nextProps, nextState) {
      |
    }

    cdu - componentDidUpdate
    componentDidUpdate(prevProps, prevState) {
      |
    }

    cwun - componentWillUnmount
    componentWillUnmount() {
      |
    }

    cdc - componentDidCatch
    componentDidCatch(error, info) {
      |
    }

    gsbu - getSnapshotBeforeUpdate
    getSnapshotBeforeUpdate(prevProps, prevState) {
      |
    }

    ss - setState
    this.setState({ | : | });

    ssf - Functional setState
    this.setState(prevState => {
      return { | : prevState.| }
    });

    usf - Declare a new state variable using State Hook
    const [|, set|] = useState();

    Hit Tab to apply CamelCase to function. e.g. [count, setCount]

    ren - render
    render() {
      return (
        |
      );
    }

    rprop - Render Prop
    class | extends Component {
      state = { | },
      render() {
        return this.props.render({
          |: this.state.|
        });
      }
    }

    export default |;

    hoc - Higher Order Component
    function | (|) {
      return class extends Component {
        constructor(props) {
          super(props);
        }

        render() {
          return < | {...this.props} />;
        }
      };
    }

    cpf - Class Property Function
      | = (e) => {
        |
      }

    Tuesday, November 19, 2024

    Kafka

    Kafka architecture is designed as a distributed, scalable, and fault-tolerant system for real-time data streaming. It is widely used for building streaming applications and data pipelines.


    Producer Mechanics

    • Batching and Compression:
      • Producers batch messages to optimize network utilization.
      • Compression (e.g., gzip, Snappy, LZ4, Zstd) reduces message size and improves throughput.
    • Acknowledgment Modes:
      • acks=0: Fire-and-forget, no guarantees.
      • acks=1: Leader acknowledgment; higher throughput, possible data loss.
      • acks=all: Full acknowledgment; durable but slower (see the sketch after this list).
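
    A quick way to exercise these settings is the console producer shipped in Kafka's bin/ directory; broker address and topic name are placeholders, and --bootstrap-server assumes a reasonably recent Kafka:

    kafka-console-producer.sh --bootstrap-server localhost:9092 --topic demo \
      --producer-property acks=all --producer-property compression.type=gzip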

    Consumer Mechanics

    • Offset Management:
      • Consumers track message offsets via Kafka’s internal topic (__consumer_offsets) or externally (e.g., database).
    • Rebalancing:
      • Dynamic assignment of partitions to consumers in a group.
      • Sticky partitioning strategies reduce data re-fetching during rebalances.
    • Commit Strategies:
      • Auto-commit: Automatic offset saving; fast but risk of duplicate processing.
      • Manual commit: Offers control but requires careful management (see the offset-inspection sketch after this list).
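
    A sketch for inspecting a consumer group's offsets and lag; broker, topic, and group names are placeholders:

    kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic demo --group my-group --from-beginning
    kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group my-group   # current offset, log-end offset, and lag per partition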

    Kafka Storage Insights

    • Retention Policies:
      • Time-based: Retain data for a configured duration.
      • Size-based: Retain data until partition log reaches a set size.
    • Log Compaction:
      • Retains the latest record for a key, useful for change data capture (CDC) or state storage (see the topic-config sketch after this list).
    • Tiered Storage (Newer feature):
      • Offloads cold data to cheaper, external storage like S3 or HDFS.
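
    A sketch of setting these policies per topic; names and values are placeholders:

    kafka-topics.sh --bootstrap-server localhost:9092 --create --topic user-state --partitions 3 --replication-factor 1 --config cleanup.policy=compact
    kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name demo --add-config retention.ms=604800000   # 7 days, time-based retention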

    Kafka Clustering

    • Broker Management:
      • Horizontal scaling by adding brokers.
      • Partition reassignment tools manage data redistribution.

    Monitoring and Optimization

    • Key Metrics:
      • Broker Metrics: Disk I/O, network throughput, replication lag.
      • Producer Metrics: Record send rate, compression ratio, batch size.
      • Consumer Metrics: Fetch lag, commit latency, offset lag.

    Kafka at Scale

    • Capacity Planning:
      • Estimate throughput, storage, and partitioning needs.
    • Scaling Strategies:
      • Dynamic addition of brokers, partition scaling, and topic rebalance.
    • High Throughput:
      • Optimize producer and broker configurations for sustained performance.

    Error handling strategies

    Several options are available for handling messages stored in a dead letter queue:

    • Re-process: Some messages in the DLQ need to be re-processed. However, first, the issue needs to be fixed. The solution can be an automatic script, human interaction to edit the message, or returning an error to the producer asking for re-sending the (corrected) message.
    • Drop the bad messages (after further analysis): Bad messages might be expected depending on your setup. However, before dropping them, a business process should examine them. For instance, a dashboard app can consume the error messages and visualize them.
    • Advanced analytics: Instead of processing each message in the DLQ, another option is to analyze the incoming data for real-time insights or issues. For instance, a simple ksqlDB application can apply stream processing for calculations, such as the average number of error messages per hour or any other insights that help decide on the errors in your Kafka applications.
    • Stop the workflow: If bad messages are rarely expected, the consequence might be stopping the overall business process. The action can either be automated or decided by a human. Of course, stopping the workflow could also be done in the Kafka application that throws the error. The DLQ externalizes the problem and decision-making if needed.
    • Ignore: This might sound like the worst option. Just let the dead letter queue fill up and do nothing. However, even this is fine in some use cases, like monitoring the overall behavior of the Kafka application. Keep in mind that a Kafka topic has a retention time, and messages are removed from the topic after that time. Just set this up the right way for your use case, and monitor the DLQ topic for unexpected behavior (like filling up way too quickly).


    Kafka Questions

    1. Kafka Basics

    1. What is Kafka, and how does it differ from traditional messaging systems like RabbitMQ or ActiveMQ?
    2. Explain Kafka’s architecture and its core components.
    3. How does Kafka ensure fault tolerance?
    4. What is the difference between a topic, partition, and offset in Kafka?
    5. Can Kafka be used as a database? Why or why not?

    2. Kafka Producers

    1. How do Kafka producers achieve high throughput?
    2. What are the acknowledgment (acks) configurations in Kafka, and how do they affect message delivery guarantees?
    3. Explain Kafka producer's batching mechanism.
    4. How does Kafka handle retries and retries with idempotence?
    5. What is the role of the partition key in a Kafka producer? How does it influence message routing?

    3. Kafka Consumers

    1. What is the purpose of consumer groups in Kafka?
    2. Explain how Kafka ensures message delivery semantics: at-least-once, at-most-once, and exactly-once.
    3. How are offsets managed in Kafka? What are the pros and cons of auto-committing offsets?
    4. What is rebalancing in Kafka, and how can it affect consumers?
    5. How would you troubleshoot offset lag in a consumer group?

    4. Kafka Brokers and Clustering

    1. How does Kafka distribute partitions among brokers?
    2. What is ISR (In-Sync Replica) in Kafka, and why is it important?
    3. How does Kafka handle leader election for partitions?
    4. Explain the difference between Kafka’s old ZooKeeper-based architecture and the new KRaft architecture.
    5. What happens when a Kafka broker fails? How is data consistency ensured?

    5. Kafka Storage

    1. How does Kafka handle log segmentation and log compaction?
    2. What are Kafka’s retention policies, and when would you use each type?
    3. How does Kafka achieve high performance with its write-ahead log (WAL) design?
    4. Explain tiered storage in Kafka and its advantages.
    5. What is the role of indexes in Kafka logs, and how do they optimize reads?

    6. Kafka Security

    1. What security features does Kafka provide?
    2. How do SASL and SSL/TLS work in Kafka for authentication and encryption?
    3. What is the purpose of Kafka ACLs, and how do you configure them?
    4. Explain the concept of role-based access control (RBAC) in Kafka.
    5. How would you secure a Kafka cluster in a production environment?

    7. Kafka Operations

    1. How do you monitor the health of a Kafka cluster?
    2. What are some common Kafka metrics, and why are they important?
    3. How would you handle partition reassignment in Kafka?
    4. What are the best practices for scaling a Kafka cluster?
    5. How do you troubleshoot issues like high replication lag or message delays?

    8. Kafka Streams and Kafka Connect

    1. What is Kafka Streams, and how does it differ from Apache Spark Streaming or Flink?
    2. How do stateful operations in Kafka Streams work, and where is the state stored?
    3. Explain the difference between KTable and KStream.
    4. What is Kafka Connect, and how does it help in integrating systems with Kafka?
    5. How would you handle schema evolution in Kafka Connect with tools like Schema Registry?

    9. Advanced Kafka Topics

    1. How does Kafka achieve exactly-once semantics (EOS)?
    2. What are the advantages and limitations of using Kafka for event sourcing?
    3. Explain the concept of a dead letter queue (DLQ) in Kafka.
    4. How does Kafka MirrorMaker 2.0 work for cross-cluster replication?
    5. What strategies would you use to optimize Kafka throughput?

    10. Kafka at Scale

    1. What factors influence Kafka’s partitioning strategy, and how do you determine the number of partitions?
    2. How would you design a Kafka deployment to handle high-throughput workloads?
    3. Explain Kafka’s performance trade-offs when handling large messages.
    4. How would you handle multi-region Kafka deployments?
    5. What are the key considerations for capacity planning in a Kafka cluster?

    11. Real-World Scenarios

    1. How would you design a fault-tolerant Kafka pipeline for a payment system?
    2. What challenges have you faced in Kafka production environments, and how did you resolve them?
    3. Describe a use case where you used Kafka Streams to process real-time data.
    4. How do you handle schema compatibility in Kafka when integrating with multiple systems?
    5. Have you implemented a Kafka monitoring or alerting system? What tools did you use?


    1. What are Kafka’s main components and their roles?

    Answer:

    • Producer: Sends messages to Kafka topics.
    • Consumer: Reads messages from Kafka topics.
    • Broker: Kafka server that stores messages on disk and serves client requests.
    • Topic: A logical channel to which messages are published and read.
    • Partition: A topic is divided into partitions for scalability and parallelism.
    • Offset: A unique identifier for each message within a partition.
    • ZooKeeper/KRaft: (Legacy/New) Responsible for metadata management, leader election, and state coordination.

    2. How does Kafka achieve fault tolerance?

    Answer:

    Kafka achieves fault tolerance through:

    • Replication: Each partition has replicas across brokers. If the leader fails, another replica is promoted.
    • In-Sync Replica (ISR): Replicas synchronized with the leader; ensures consistency.
    • Data Persistence: Messages are stored on disk and survive broker failures.
    • Leader Election: Handles broker or partition leader failure using ZooKeeper or KRaft.

    3. What is the role of a partition in Kafka?

    Answer:

    • Partitions allow Kafka to scale horizontally by distributing data across brokers.
    • Each partition is processed independently, enabling parallelism.
    • Partitions maintain message order within themselves but not across the entire topic.
    • They also play a critical role in replication for fault tolerance.

    4. How does Kafka handle message delivery guarantees?

    Answer:

    • At-least-once: Default; messages are delivered at least once, possible duplicates.
      • Achieved by re-sending if acknowledgments fail.
    • At-most-once: Messages are delivered at most once, possible data loss.
      • Achieved by disabling retries and acknowledgment.
    • Exactly-once: Ensures no duplicates or losses.
      • Achieved using idempotent producers and Kafka transactions.

    5. What is Kafka’s log compaction?

    Answer:

    Log compaction is a mechanism to retain only the latest message for a key in a topic, ensuring:

    • Storage optimization by discarding old values.
    • Supporting use cases like change data capture (CDC) or maintaining up-to-date key-value states.
    • It is controlled by the cleanup.policy=compact configuration.

    6. Explain ZooKeeper’s role in Kafka.

    Answer:

    In Kafka (legacy):

    • Manages metadata like brokers, topics, and partitions.
    • Handles leader election for partitions.
    • Tracks broker heartbeats to detect failures.

    In newer Kafka versions (KRaft):

    • ZooKeeper is replaced by Kafka-native consensus for managing metadata.

    7. How do Kafka consumers handle offset management?

    Answer:

    Consumers use offsets to track their progress:

    • Automatic Offset Commit: Kafka automatically commits offsets at regular intervals.
    • Manual Offset Commit: Consumers explicitly commit offsets for greater control.

    Offsets are stored:

    • In Kafka: Default; stored in the __consumer_offsets topic.
    • Externally: Custom storage mechanisms (e.g., databases) for advanced use cases.

    8. What are ISR (In-Sync Replicas) and their importance?

    Answer:

    ISR is the set of replicas that are fully synchronized with the partition leader.

    • Importance:
      • Ensures data durability and fault tolerance.
      • Leader election only happens among ISR replicas to prevent data loss.
    • If a replica falls behind, it’s removed from ISR.

    9. What are the configurations for producer acknowledgment (acks)?

    Answer:

    • acks=0: Producer doesn’t wait for acknowledgment. Fast but risky (possible data loss).
    • acks=1: Leader acknowledges once it writes to the log. Balances reliability and performance.
    • acks=all: All ISR replicas acknowledge. Ensures durability at the cost of latency.

    10. How does Kafka achieve exactly-once semantics?

    Answer:

    Kafka achieves exactly-once semantics (EOS) through:

    • Idempotent Producers: Ensures the same message isn’t written twice to the log.
    • Transactions: Groups multiple producer and consumer operations into atomic units.
    • Kafka Streams: Automatically supports EOS for stream processing.

    11. What is rebalancing in Kafka?

    Answer:

    Rebalancing occurs when:

    • A new consumer joins a group.
    • A consumer leaves or fails.
    • Topics or partitions change.

    Impact:

    • Partitions are reassigned among consumers.
    • May cause temporary unavailability or duplicate processing.

    Optimization:

    • Use sticky partition assignment strategies to minimize disruptions.

    12. What are Kafka’s retention policies?

    Answer:

    • Time-Based: Retain messages for a configured duration (log.retention.hours).
    • Size-Based: Retain messages until log size reaches a threshold (log.retention.bytes).
    • Log Compaction: Retain the latest record for a key (cleanup.policy=compact).

    13. What is Kafka Streams?

    Answer:

    Kafka Streams is a Java library for building real-time stream processing applications.

    • Features:
      • Supports stateful and stateless transformations.
      • Scales horizontally across multiple instances.
      • Provides fault-tolerant state stores.
    • Example Use Case:
      • Real-time data transformation or aggregations (e.g., computing metrics from logs).

    14. How would you monitor a Kafka cluster?

    Answer:

    Use tools and metrics like:

    • JMX Metrics: Monitor broker, producer, and consumer performance.
    • Prometheus/Grafana: Visual dashboards for Kafka metrics.
    • Key Metrics:
      • Broker: Disk usage, network throughput.
      • Producer: Record send rate, retries.
      • Consumer: Offset lag, fetch latency.
    • Tools: Confluent Control Center, LinkedIn’s Burrow.

    15. How does Kafka handle high throughput?

    Answer:

    • Batching: Combines multiple messages into a single network request.
    • Compression: Reduces message size using gzip, Snappy, or LZ4.
    • Partitioning: Distributes load across brokers for parallel processing.
    • Efficient I/O: Uses sequential disk writes (write-ahead logs).
    • Optimized Configurations:
      • Increase num.partitions and tune producer batch sizes.

    Issues with Resolutions

     

    1.
    Situation / Task
    (Explain the situation or task so others understand the context):

    A partition queue was blocked (for emails / for any events) due to one bad event.
    Action (Give details about what you or another person did to handle the situation):
    Raised an alert and logged the exception.
    Added a feature flag for logging/catching the exception so the bad event no longer blocks the partition.
    Result (Describe what was achieved by the action and why it was effective):
    The partition queue released all blocked events, and the queue is no longer blocked.


    2.
    Situation / Task (Explain the situation or task so others understand the context):
    A few messages in the queue were going missing.
    Action (Give details about what you or another person did to handle the situation):
    Previously we had registered multiple event-specific listeners; we consolidated them into a single endpoint listener and, once events arrive, split them out to the appropriate handlers based on event type.
    Result (Describe what was achieved by the action and why it was effective):
    All messages in the queue are now processed.

     

    3.
    Situation / Task (Explain the situation or task so others understand the context):
    Idempotency issue (duplicate events).
    Action (Give details about what you or another person did to handle the situation):
    Maintained the event key in a cache and validated incoming events against it (implemented idempotency logic),
    filtering out the duplicates.
    Result (Describe what was achieved by the action and why it was effective):
    Records are now captured reliably, with duplicates discarded.


    4.
    Situation / Task (Explain the situation or task so others understand the context):
    Alignment issue in the UX (browser-specific).
    Action (Give details about what you or another person did to handle the situation):
    It looked like a one-day task, but the issue was only fixed after revamping the complete HTML div block.
    Result (Describe what was achieved by the action and why it was effective):
    It now renders correctly in all browsers.

    5.
    Situation / Task (Explain the situation or task so others understand the context):
    ETL performance issues; each job was taking longer than expected to execute.
    Action (Give details about what you or another person did to handle the situation):
    Result (Describe what was achieved by the action and why it was effective):

     


    6.
    Situation / Task (Explain the situation or task so others understand the context):
    While using the @Async, @Transactional, and @Cacheable annotations, they were not working as expected.

    Action (Give details about what you or another person did to handle the situation):

    Result (Describe what was achieved by the action and why it was effective):
    Those annotations are Spring-specific and proxy-based.
    They are not applied to private methods, or when the calling method is in the same class (self-invocation).
    This is because Spring wraps the bean in a proxy when instantiating it; the proxy intercepts external calls to add behavior before/after the method, and internal calls bypass the proxy.

    7. Situation / Task (Explain the situation or task so others understand the context):
    FIS

    Action (Give details about what you or another person did to handle the situation):

    Result (Describe what was achieved by the action and why it was effective):

    Sunday, August 25, 2024

    Authentication & Authorization

    Key words 

    Knowledge Factor

    Possession Factor

    Inherence Factor

    No individual factor is secure on its own, because any single factor can be compromised.

    That is why 2FA and MFA came into the picture.


    SSO

    Oauth

    JWT (JSON Web Token) ==> not authentication by itself; it's a token format used for authorization.

    Okta

    OpenID Connect

    SAML (Security Assertion Markup Language)

    Azure Active Directory

    Service Provider

    Identity Provider




    Eg: Access token --> like an employee ID card (authorization), with validity checks (lost or reported stolen, left the organization, expired).



    Session or Cookie Based Authorization

    In cookie-based auth, the cookie carries a SESSION_ID that maps to session state held on the server.


    The load balancer maintains session affinity using the sticky-session pattern (a scalability problem).


    There is also a problem here: if the shared session cache is corrupted, it becomes a single point of failure.




    With JWT, the user brings all the security information to each microservice; the token itself carries it.

    For a stateless server, the client passes the JWT on every request, typically in the Authorization header, for example:
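
    A minimal sketch (the URL is a placeholder):

    curl -H "Authorization: Bearer <jwt>" https://api.example.com/orders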






    OAuth:
    Service-to-service authorization. Roles:
    Resource Owner
    Resource Server
    Authorization Server
    Client








    Used across different microservices within the same system.

    OpenID Connect
    It's an identity/authentication specification built on top of OAuth 2.0.





