January 23, 2023 Libish Varghese Jacob Reading time ~5 minutes

Setting Up Kafka

Apache Kafka is a high-throughput, low-latency event processing system designed to handle millions of events per second. It can store, read, and analyze streaming data. In an object-based system, we are interested in the objects, their interactions with one another, and the state of each object, which we may store in a database. In an event-based system, we care more about the events generated by those objects or by the system than about the objects themselves. Event data is stored in a structure called a log, which is an ordered sequence of events. Unlike databases, logs are easy to scale, and Apache Kafka is a system for managing and processing them. The main components of a Kafka system are
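To make the idea of a log concrete, here is a minimal Python sketch of an append-only, ordered sequence of events. It is a toy in-memory model, not Kafka's API: the `EventLog` class, its method names, and the sample order events are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class EventLog:
    """Toy append-only log (hypothetical class, not part of Kafka)."""

    _events: List[Any] = field(default_factory=list)

    def append(self, event: Any) -> int:
        """Add an event at the end of the log and return its offset."""
        self._events.append(event)
        return len(self._events) - 1

    def read_from(self, offset: int) -> List[Tuple[int, Any]]:
        """Return (offset, event) pairs in order, starting at `offset`."""
        return list(enumerate(self._events[offset:], start=offset))


log = EventLog()
log.append({"type": "order_created", "order_id": 1})
log.append({"type": "order_paid", "order_id": 1})
last_offset = log.append({"type": "order_shipped", "order_id": 1})

# A reader that has already processed offset 0 resumes from offset 1.
for offset, event in log.read_from(1):
    print(offset, event["type"])
```

Events are never updated in place; a reader only needs to remember the offset it has reached, which is part of what makes a log simpler to scale than a database table.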

January 31, 2022 Libish Varghese Jacob Reading time ~7 minutes

Parquet file experiments, findings and recommendations

Parquet is a binary file format designed for big data workloads, where data must be accessed frequently and efficiently. It also stores data on disk differently from other file formats. It is a column-based data file, although in practice it combines row-based and column-based approaches to get the best of both worlds. The data is first encoded, which keeps the file small relative to the raw data, and then compressed, a step that scans the data as a whole and removes redundant parts. Query and read speeds are much faster than with many other file formats. Nested data, which is cumbersome to handle in other formats, is handled efficiently. Because of the way the data is laid out, a reader does not need to parse the entire file to find the data it needs, which makes reads efficient. Parquet works well with data processing frameworks, stores schema information automatically, and can be queried with SQL using the proper tools.
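The mix of row-based and column-based storage can be sketched in plain Python: rows are first split into groups, and each group is then stored column by column. This is only a toy model of the layout, with no encoding, compression, or real Parquet I/O; the function names and sample data are assumptions made for illustration.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]
RowGroup = Dict[str, List[Any]]  # column name -> values for that group

rows: List[Row] = [
    {"id": 1, "city": "Oslo", "temp": 3.5},
    {"id": 2, "city": "Oslo", "temp": 4.0},
    {"id": 3, "city": "Rome", "temp": 15.2},
    {"id": 4, "city": "Rome", "temp": 16.1},
]


def to_row_groups(data: List[Row], group_size: int) -> List[RowGroup]:
    """Split rows into groups, then store each group column by column."""
    groups: List[RowGroup] = []
    for start in range(0, len(data), group_size):
        chunk = data[start:start + group_size]
        groups.append({name: [r[name] for r in chunk] for name in chunk[0]})
    return groups


def read_column(groups: List[RowGroup], name: str) -> List[Any]:
    """Read one column across all groups without touching the others."""
    values: List[Any] = []
    for group in groups:
        values.extend(group[name])
    return values


groups = to_row_groups(rows, group_size=2)
temps = read_column(groups, "temp")
print(temps)
```

Reading `temp` never touches the `id` or `city` values, which is the intuition behind not having to parse the entire file to answer a query on a few columns.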

Data formats:

Data formats can be

Unstructured – no specific structure, e.g. text, CSV
Semi-structured – e.g. XML, JSON
Structured – has records and rows, a well-defined schema, and very predictable locations where the data can be found, e.g. SQL, Parquet

March 30, 2021 Libish Varghese Jacob Reading time ~1 minute

How to move a git folder repository to git repository without losing the history.

In this scenario, what I had was a folder repository: a Git repository kept in a folder on a network location, which users treat as the repository and commit their changes to. I now want to move it to a Bitbucket server so that changes can be merged through a proper process. The challenge is to retain the history of changes while porting. Let’s see how this can be achieved. For this, what we need is
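One common way to carry a repository's full history to a new server is to make a mirror clone and then push that mirror to the new remote. The Python sketch below only builds the two `git` commands for that approach; it is not necessarily the procedure this post goes on to describe, and the paths, URL, and function names are hypothetical placeholders.

```python
import subprocess
from typing import List


def migration_commands(source_repo: str, target_url: str, work_dir: str) -> List[List[str]]:
    """Build the git commands for a mirror-based move (assumed approach)."""
    return [
        # Copy every branch, tag, and other ref from the folder repository.
        ["git", "clone", "--mirror", source_repo, work_dir],
        # Push all of those refs, with their history, to the new server.
        ["git", "-C", work_dir, "push", "--mirror", target_url],
    ]


def run_commands(commands: List[List[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# Hypothetical locations, shown for illustration only.
plan = migration_commands(
    source_repo="//fileserver/repos/project.git",
    target_url="https://bitbucket.example.com/scm/proj/project.git",
    work_dir="project-mirror.git",
)
for cmd in plan:
    print(" ".join(cmd))
# run_commands(plan)  # uncomment to execute against real repositories
```

The target repository is assumed to exist already and to be empty, since `push --mirror` overwrites the refs on the remote.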

August 27, 2020 Libish Varghese Jacob Reading time ~1 minute

Delphi how to check if you have read permission on a directory

Checking read permission on a directory seems to be difficult in Delphi 2009. Here we will see an alternative way to find out whether we have read permission. Later versions of Delphi offer a more straightforward way of achieving this.
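One indirect way to test read permission, when no direct API is at hand, is simply to try listing the directory and treat a failure as "not readable". The sketch below shows that idea in Python rather than Delphi, and it is an assumption about the general technique, not the post's actual Delphi code; `can_read_directory` is a hypothetical helper name.

```python
import os


def can_read_directory(path: str) -> bool:
    """Return True if the current user can list the entries of `path`."""
    try:
        with os.scandir(path) as entries:
            next(entries, None)  # force at least one read attempt
        return True
    except OSError:
        # Covers permission errors as well as missing or non-directory paths.
        return False


print(can_read_directory(os.getcwd()))
```

Note that this sketch does not distinguish "no permission" from "path does not exist"; a caller that needs the difference would catch `PermissionError` separately.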

February 26, 2020 Libish Varghese Jacob Reading time ~3 minutes

Process WaitForExit with a timeout will not be able to collect the output message.

In this post we look at a scenario where we execute a process and try to collect its output, but the calling process does not receive it. If you are on this page, you have probably hit a similar issue: collecting the output of a process you launched fails intermittently. You may have done everything right by collecting the output asynchronously, and yet the output still goes missing now and then. If you are using process.WaitForExit with a timeout, then it is the culprit. process.WaitForExit with a timeout is known to cause issues when there is parallelism in place, for example when we run the same process multiple times in parallel or many different processes in parallel. The way out is to wait for the process without a timeout. This has a practical difficulty: without a timeout, we may end up waiting forever for the process to exit. The solution is to run the process and wait for it without a timeout, but enforce the timeout on the thread that is executing it. The example below shows how to implement this.
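As an illustration of the pattern only, here is a minimal Python sketch: a worker thread waits for the process with no timeout and collects its output, while the caller applies the timeout when joining that thread. It is not the .NET code the post is about, and the function name and return convention (`None` as the exit code on timeout) are assumptions.

```python
import subprocess
import sys
import threading
from typing import Dict, List, Optional, Tuple


def run_with_thread_timeout(cmd: List[str], timeout_s: float) -> Tuple[Optional[int], str]:
    """Run `cmd`; return (exit code, output), or (None, partial output) on timeout."""
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    result: Dict[str, str] = {}

    def wait_and_collect() -> None:
        # Blocking wait with no timeout: returns only once the process
        # has exited and all of its output has been read.
        output, _ = proc.communicate()
        result["output"] = output

    worker = threading.Thread(target=wait_and_collect, daemon=True)
    worker.start()
    worker.join(timeout_s)  # the timeout applies to the thread, not the wait

    if worker.is_alive():
        proc.kill()    # stop the runaway process
        worker.join()  # let the worker finish reading what was written
        return None, result.get("output", "")
    return proc.returncode, result["output"]


code, output = run_with_thread_timeout(
    [sys.executable, "-c", "print('hello')"], timeout_s=30.0
)
print(code, output.strip())
```

Because the wait inside the worker is unbounded, the output is always read to the end for processes that finish in time; the timeout only decides when the caller gives up and kills the process.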


Welcome to Simple Basics
