Understanding and Applying Normalization Techniques in Data models
Gugha Priyaa
by Gughapriyaa Elango
1M ago
In the realm of databases, normalization reigns supreme. It’s a structured approach to organizing data, minimizing redundancy, and ensuring data integrity. But navigating the world of normal forms can be daunting, with acronyms like 1NF, 2NF, and BCNF flying around. Fear not, data enthusiast, for this article will be your guide! 1NF: The Foundation (First Normal Form) Imagine a database table where a single cell holds multiple values. Not ideal, right? 1NF lays the groundwork by ensuring each cell contains a single atomic value. No more lists, no more chaos. This is the basic buildin ..read more
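As a rough illustration of the 1NF idea described above, the sketch below splits a multi-valued cell into one row per atomic value. The function name, the `orders` sample data, and the comma separator are assumptions made for this example; they do not come from the original post.

```python
def to_first_normal_form(rows, column, sep=","):
    """Return new rows in which `column` holds exactly one value per row."""
    normalized = []
    for row in rows:
        for value in str(row[column]).split(sep):
            atomic = dict(row)              # copy, so the input rows are not mutated
            atomic[column] = value.strip()  # one atomic value per cell
            normalized.append(atomic)
    return normalized


# Hypothetical table where the "items" cell holds a list of values.
orders = [
    {"order_id": 1, "items": "pen, ink"},
    {"order_id": 2, "items": "paper"},
]
flat = to_first_normal_form(orders, "items")
```

Each resulting row carries a single item, so the pair (`order_id`, `items`) can serve as a key for later normalization steps.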
Beyond the Basics: Exploring Advanced Data Quality Metrics
Gugha Priyaa
by Gughapriyaa Elango
1M ago
Building on your existing knowledge, let’s delve deeper into advanced data quality metrics to enhance your data analysis and anomaly detection capabilities. Information Theory Metrics: KL Divergence (KLD): This measures the difference between two probability distributions. For discrete data, it represents the relative entropy between the observed distribution and an expected distribution. You can define thresholds for KLD to identify significant deviations from the expected distribution (e.g., min and max values). Statistical Tests: Chi-squared Test: This tests the independence be ..read more
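The KL divergence check described above can be sketched for discrete data as follows. This is a minimal example under stated assumptions: distributions are passed as plain dicts of probabilities, the result is in nats, and the `THRESHOLD` value is a made-up number for illustration, not a recommended setting.

```python
import math


def kl_divergence(p, q):
    """D_KL(P || Q) in nats for discrete distributions given as {category: probability}."""
    total = 0.0
    for key, p_k in p.items():
        if p_k == 0:
            continue              # a zero-probability term contributes nothing
        q_k = q.get(key, 0.0)
        if q_k == 0:
            return math.inf       # P has mass where Q has none
        total += p_k * math.log(p_k / q_k)
    return total


expected = {"A": 0.5, "B": 0.5}   # hypothetical expected distribution
observed = {"A": 0.9, "B": 0.1}   # hypothetical observed distribution
THRESHOLD = 0.1                   # assumed alert threshold, chosen only for this example

drift = kl_divergence(observed, expected)
is_anomalous = drift > THRESHOLD
```

Note that KL divergence is not symmetric, so swapping the observed and expected arguments generally gives a different value.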
Dealing with PII as data engineer
Gugha Priyaa
by Gughapriyaa Elango
1M ago
In today’s data-driven world, data engineers occupy a crucial yet precarious position. While they wield the power to unlock valuable insights, they also face the daunting responsibility of safeguarding sensitive information, particularly Personally Identifiable Information (PII). Navigating the complex landscape of data privacy regulations and best practices requires a nuanced understanding of the challenges and effective strategies for mitigation. PII encompasses a broad spectrum of data elements, ranging from names and addresses to social security numbers and health records. As data collecti ..read more
The buzz around new age databases
Gugha Priyaa
by Gughapriyaa Elango
1M ago
ScyllaDB: A NoSQL Dynamo for Large-Scale Ingestion and Analytics Leveraging C++ for raw performance, ScyllaDB shines as a NoSQL database built for high-throughput, low-latency operations. Its architecture, inspired by Amazon’s Dynamo, excels at handling massive datasets across geographically distributed clusters. Real-time sensor analysis, large-scale e-commerce platforms, and fraud detection systems all benefit from its blazing-fast query execution and write performance. Redpanda: A Streaming Kafka Companion for Microservices and Event-Driven Architecture Redpanda emerges as a modern, distrib ..read more
Database Indexing
Gugha Priyaa
by Gughapriyaa Elango
2M ago
Imagine a vast library, not just with books, but entire worlds of information neatly arranged in rows and columns. Now, imagine finding a specific piece of data within seconds, no frantic page-flipping required. That’s the magic of database indexing! But indexing isn’t just one-size-fits-all. It’s a multi-faceted tool with different approaches, each with its own strengths and quirks. Clustered Indexes Imagine a phone book where names and numbers are physically glued together. That’s the essence of a clustered index. It dictates the physical order of data within a table. Think of it like o ..read more
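To make the clustered index idea above concrete, here is a toy in-memory sketch in which rows are physically kept in key order, so lookups and range scans can use binary search. The `ClusteredTable` class and its methods are hypothetical names for this illustration; real database engines use on-disk structures such as B-trees rather than Python lists.

```python
import bisect


class ClusteredTable:
    """Toy table whose rows are stored in the physical order of their key."""

    def __init__(self):
        self._keys = []   # sorted keys
        self._rows = []   # rows stored in the same order as the keys

    def insert(self, key, row):
        pos = bisect.bisect_left(self._keys, key)   # position that keeps order
        self._keys.insert(pos, key)
        self._rows.insert(pos, row)

    def find(self, key):
        pos = bisect.bisect_left(self._keys, key)   # binary search, no full scan
        if pos < len(self._keys) and self._keys[pos] == key:
            return self._rows[pos]
        return None

    def range(self, low, high):
        # Rows with low <= key <= high sit next to each other, so a slice suffices.
        start = bisect.bisect_left(self._keys, low)
        end = bisect.bisect_right(self._keys, high)
        return self._rows[start:end]


table = ClusteredTable()
for key, name in [(30, "Chen"), (10, "Asha"), (20, "Bruno")]:
    table.insert(key, {"id": key, "name": name})
```

Because the data can only be stored in one physical order, a table can have only one clustered index, which is why the choice of its key matters.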
Parquet and Avro formats — how does data get compressed?
Gugha Priyaa
by Gughapriyaa Elango
2M ago
PARQUET: In Parquet, compression is performed column by column, and the format is built to support flexible compression options and extensible encoding schemes per data type (e.g., different encodings can be used for compressing integer and string data). Parquet data can be compressed using these encoding methods: Dictionary encoding: this is enabled automatically and dynamically for data with a small number of unique values. Bit packing: Storage of integers is usually done with dedicated 32 or 64 bits per integer. This allows more ..read more
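The two encodings named above can be illustrated with a simplified sketch. This is not Parquet's actual on-disk format or API; the function name and sample column are assumptions made for the example. It only shows why a column with few unique values shrinks: each value becomes a small integer code, and those codes need far fewer than 32 or 64 bits each.

```python
def dictionary_encode(values):
    """Replace each value with an integer code that points into a dictionary."""
    dictionary = []   # unique values in first-seen order
    index_of = {}     # value -> code
    codes = []
    for value in values:
        if value not in index_of:
            index_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index_of[value])
    return dictionary, codes


# Hypothetical column with a small number of unique values.
column = ["US", "IN", "US", "US", "IN", "DE"]
dictionary, codes = dictionary_encode(column)

# Bit packing idea: codes only need enough bits to address the dictionary.
bits_per_code = max(1, (len(dictionary) - 1).bit_length())

# Decoding restores the original column.
decoded = [dictionary[code] for code in codes]
```

With three distinct values, every code fits in 2 bits instead of a full-width integer, which is where the space saving comes from.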
Relational Data modeling essentials — notes
Gugha Priyaa
by Gughapriyaa Elango
2M ago
Strategies for efficient relational modeling: Identifying Entities: Start by identifying the key entities involved: Customers, Orders, Products, and Categories. Defining Relationships: Discuss the relationships between these entities. For example, a Customer can have multiple Orders, and each Order can contain multiple Products. A Product can belong to a Category. Establishing keys: primary, foreign, surrogate, compound. Normalization: Discuss the importance of normalization in reducing data redundancy and improving data integrity. Also, consider w ..read more
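The entities, relationships, and key types listed above could be sketched as a schema like the one below, using Python's built-in `sqlite3` module with an in-memory database. All table names, column names, and sample rows are assumptions for illustration; the original notes do not specify a schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE categories (
    category_id INTEGER PRIMARY KEY,          -- surrogate primary key
    name        TEXT NOT NULL
);
CREATE TABLE products (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    category_id INTEGER NOT NULL REFERENCES categories(category_id)  -- foreign key
);
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id)
);
CREATE TABLE order_items (                    -- resolves the Order/Product many-to-many
    order_id    INTEGER NOT NULL REFERENCES orders(order_id),
    product_id  INTEGER NOT NULL REFERENCES products(product_id),
    quantity    INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id)        -- compound key
);
""")

# Hypothetical sample rows.
conn.execute("INSERT INTO categories VALUES (1, 'Stationery')")
conn.execute("INSERT INTO products VALUES (1, 'Pen', 1)")
conn.execute("INSERT INTO customers VALUES (1, 'Asha')")
conn.execute("INSERT INTO orders VALUES (1, 1)")
conn.execute("INSERT INTO orders VALUES (2, 1)")
conn.execute("INSERT INTO order_items VALUES (1, 1, 3)")

orders_per_customer = conn.execute("""
    SELECT c.name, COUNT(o.order_id)
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id
""").fetchall()
```

The `order_items` junction table is one common way to model "each Order can contain multiple Products"; its compound primary key prevents the same product from appearing twice in one order.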
Dimensional data modeling — Essentials
Gugha Priyaa
by Gughapriyaa Elango
2M ago
Dimensional data modeling notes: Denormalized modeling is used for agile and ad hoc analytics, and for user applications where the speed of read operations is critical. Dimensional and denormalized models have different purposes. Dimensional models are generally used for data warehousing scenarios, and are particularly useful where super-fast query results are required for computed numbers such as “quarterly sales by region” or “by salesperson”. Data is stored in the dimensional model after pre-calculating these numbers, and updated on a fixed schedule. But even without a data wareho ..read more
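The pre-calculation step mentioned above, such as "quarterly sales by region", might look like the following sketch. The record layout (`date`, `region`, `amount`) and the sample rows are assumptions for this example; in a real warehouse this aggregation would typically run as a scheduled load job rather than in application code.

```python
from collections import defaultdict
from datetime import date


def quarterly_sales_by_region(sales):
    """Pre-aggregate raw sales rows into totals keyed by (year, quarter, region)."""
    totals = defaultdict(float)
    for sale in sales:
        quarter = (sale["date"].month - 1) // 3 + 1   # months 1-3 -> Q1, 4-6 -> Q2, ...
        totals[(sale["date"].year, quarter, sale["region"])] += sale["amount"]
    return dict(totals)


# Hypothetical raw transaction rows.
sales = [
    {"date": date(2024, 1, 15), "region": "North", "amount": 100.0},
    {"date": date(2024, 3, 31), "region": "North", "amount": 50.0},
    {"date": date(2024, 4, 1), "region": "North", "amount": 70.0},
    {"date": date(2024, 2, 10), "region": "South", "amount": 30.0},
]
summary = quarterly_sales_by_region(sales)
```

Queries then read the small `summary` structure instead of scanning every transaction, which is the trade-off the excerpt describes: faster reads in exchange for results that are only as fresh as the last scheduled update.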
Snowflake — Datawarehouse essentials
Gugha Priyaa
by Gughapriyaa Elango
4M ago
As accountadmin, you can see all the roles. In practice, in Snowflake, you will usually get access to only some tools. Accountadmin has the highest level of control. Sysadmin is the role used for creating databases, tables, and warehouses. Your default role is sysadmin. If you can't find a table, check your role setting to confirm that you are authorized to access it. Understand the admin roles for privileges: Admin, sysadmin, etc. You can create databases and decide who will have access to them. You can transfer ownership of databases between roles, you ..read more
Dynamics of hate speech analysis in tweets using text mining techniques
Gugha Priyaa
by Gughapriyaa Elango
4M ago
PROJECT OVERVIEW The project focuses on tackling the prevalence of hate speech on social media, aiming to create a robust model for detecting and flagging such harmful content. Primarily leveraging cutting-edge techniques like Convolutional Neural Networks (CNN), it aims to develop an effective hate speech detection system that contributes to a safer online environment. Beyond just user protection, the project prioritizes algorithmic fairness, striving to minimize biases in hate speech classification and ensure equitable treatment across diverse user groups. Additionally, it aims to foster co ..read more