Docker Fundamentals for Data Engineers
Start Data Engineering
3d ago
1. Introduction
2. Docker concepts
2.1. Define the OS and its configurations with an image
2.2. Use the image to run containers
2.2.1. Communicate between containers and local OS
2.2.2. Start containers with docker CLI or compose
2.2.3. Executing commands in your docker container
3. Conclusion
4. References

1. Introduction: Docker can be overwhelming to start with …
Data Engineering Best Practices - #2. Metadata & Logging
Start Data Engineering
1M ago
1. Introduction
2. Setup & Logging architecture
3. Data Pipeline Logging Best Practices
3.1. Metadata: Information about pipeline runs, & data flowing through your pipeline
3.2. Obtain visibility into the code’s execution sequence using text logs
3.3. Understand resource usage by tracking Metrics
3.4. Monitoring UI & Traceability
3.5. Rapid issue identification and resolution with actionable alerts
4. Conclusion
5. …
Uplevel your dbt workflow with these tools and techniques
Start Data Engineering
3M ago
1. Introduction
2. Setup
3. Ways to uplevel your dbt workflow
3.1. Reproducible environment
3.1.1. A virtual environment with Poetry
3.1.2. Use Docker to run your warehouse locally
3.2. Reduce feedback loop time when developing locally
3.2.1. Run only required dbt objects with selectors
3.2.2. Use prod datasets to build dev models with defer
3.2.3. Parallelize model building by increasing thread count
3.…
What is an Open Table Format & Why Use One?
Start Data Engineering
5M ago
1. Introduction
2. What is an Open Table Format (OTF)
3. Why use an Open Table Format (OTF)
3.0. Setup
3.1. Evolve data and partition schema without reprocessing
3.2. See previous point-in-time table state, aka time travel
3.3. Git-like branches & tags for your tables
3.4. Handle multiple reads & writes concurrently
4. Conclusion
5. Further reading
6. …
6 Steps to Avoid Messy Data in Your Warehouse
Start Data Engineering
6M ago
1. Introduction
2. Six Steps for a Clean Data Warehouse
2.1. Understand the business
2.2. Make data easy to use with the appropriate data model
2.3. Good input data is necessary for a good data warehouse
2.4. Define Source of Truth (SOT) and trace its usage
2.5. Keep stakeholders in the loop for a more significant impact
2.6. Watch out for org-level red flags
3. …
Data Engineering Best Practices - #1. Data flow & Code
Start Data Engineering
8M ago
1. Introduction
2. Sample project
3. Best practices
3.1. Use standard patterns that progressively transform your data
3.2. Ensure data is valid before exposing it to its consumers (aka data quality checks)
3.3. Avoid data duplicates with idempotent pipelines
3.4. Write DRY code & keep I/O separate from data transformation
3.5. Know the when, how, & what (aka metadata) of pipeline runs for easier debugging
3.…
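The idempotent-pipeline practice this post's outline mentions (3.3) is commonly implemented as a delete-then-insert (partition overwrite): rerunning a load for the same run date replaces that date's rows instead of appending duplicates. A minimal sketch in plain Python, where an in-memory list stands in for the warehouse table; all names here are illustrative, not taken from the post:

```python
from datetime import date

# Toy "warehouse table": a list of row dicts, partitioned by run_date.
warehouse_table = []

def run_pipeline(run_date, rows):
    """Idempotent load: delete the run_date partition, then insert.

    Re-running for the same run_date produces the same table state,
    so retries and backfills never create duplicate rows.
    """
    global warehouse_table
    # 1. Remove any rows left by a previous run for this partition.
    warehouse_table = [r for r in warehouse_table if r["run_date"] != run_date]
    # 2. Insert the freshly computed rows, stamped with the partition key.
    warehouse_table.extend({**row, "run_date": run_date} for row in rows)

d = date(2024, 1, 1)
run_pipeline(d, [{"user": "a"}, {"user": "b"}])
run_pipeline(d, [{"user": "a"}, {"user": "b"}])  # retry: still 2 rows, no duplicates
```

The same shape applies to a real warehouse: replace the list comprehension with a `DELETE ... WHERE run_date = :d` (or a partition overwrite) executed in the same transaction as the insert.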
What is a self-serve data platform & how to build one
Start Data Engineering
10M ago
1. Introduction
2. What is self-serve?
2.1. Components of a self-serve platform
3. Building a self-serve data platform
3.1. Creating dataset(s)
3.1.1. Gather requirements
3.1.2. Get data foundations right
3.2. Accessing data
3.3. Identify and remove dependencies
4. Conclusion
5. Further reading
6. References

1. Introduction: Most companies want to build a self-serve data platform …
How to become a valuable data engineer
Start Data Engineering
11M ago
1. Introduction
2. Skills
2.1. Business Impact
2.1.1. Know your business
2.1.2. Money & Time
2.2. Technical skills
3. Build impactful projects
4. Conclusion
5. Further reading

1. Introduction: So you are a new data engineer (or looking for a DE job) and want to improve as a data engineer. However, when you look at job postings or a company's tech stack, you are overwhelmed by the sheer number of tools you have to learn …
Data Pipeline Design Patterns - #2. Coding patterns in Python
Start Data Engineering
1y ago
Introduction
Sample project
Code design patterns
1. Functional design
2. Factory pattern
3. Strategy pattern
4. Singleton & Object pool patterns
Python helpers
1. Typing
2. Dataclass
3. Context Managers
4. Testing with pytest
5. Decorators
Misc
Conclusion
Further reading
References

Introduction: Using the appropriate code design pattern can make your code easier to read, extend, modify, and debug, and help developers onboard quicker …
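The Factory and Strategy patterns this post's outline lists combine naturally in a data pipeline: each transformation is an interchangeable callable with one shared signature (Strategy), and a factory picks one by a config-driven name. A minimal sketch with hypothetical `upper`/`lower` transforms — the names are illustrative, not taken from the post:

```python
from typing import Callable, Dict, List

# Strategy pattern: every transform is a callable with the same
# signature, List[str] -> List[str], so callers can swap them freely.
def upper_transform(rows: List[str]) -> List[str]:
    return [r.upper() for r in rows]

def lower_transform(rows: List[str]) -> List[str]:
    return [r.lower() for r in rows]

# Factory pattern: resolve a strategy from its name, e.g. read from
# pipeline config. Adding a transform means adding one dict entry;
# the calling code never changes.
_TRANSFORMS: Dict[str, Callable[[List[str]], List[str]]] = {
    "upper": upper_transform,
    "lower": lower_transform,
}

def get_transform(name: str) -> Callable[[List[str]], List[str]]:
    try:
        return _TRANSFORMS[name]
    except KeyError:
        raise ValueError(f"unknown transform: {name}") from None

result = get_transform("upper")(["a", "b"])  # → ["A", "B"]
```

The payoff is in the caller: a pipeline step can be configured as `{"transform": "upper"}` and resolved at runtime, with unknown names failing fast and loudly.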
Stitch S3 DB Integration
Start Data Engineering
1y ago
Given:
- Source S3 path and file delimiter
- Data warehouse connection details (endpoint, port, username, password, and database name)
- Data warehouse schema name and table name
- Run frequency

Steps:
1. Log into your Stitch account.
2. Click on the Destination tab and use the data warehouse connection details to establish a destination database.
3. Click the Add Integration button on your dashboard.
4. Select Amazon S3 CSV as the integration on the next page. …
