Taming the Kaniko beast
Medium » Hacking Analytics
by Julien Kervizic
1y ago
Kaniko offers a lightweight way to build container images on Kubernetes, but it may need some taming before it becomes usable.
Photo by Geran de Klerk on Unsplash
What is Kaniko
Kaniko is a tool developed by Google to help build Docker container images in Kubernetes. It is an application written in Go that doesn’t depend on a Docker daemon.
Why use Kaniko
There are multiple reasons why you might want to use Kaniko. Kaniko is a lightweight tool that doesn’t require as many permissions and privileges as Docker and doesn’t need a running Docker serv ..read more
Should you do a Master (MSc) in Data Science Or Analytics?
Medium » Hacking Analytics
by Julien Kervizic
2y ago
Two temple guards, Rijksmuseum
Master's programs in Data Science and Analytics have become well established in university curricula. There are now over a thousand master's programs offered across different departments and schools, from schools of Computer Science and Mathematics & Statistics to Economics or Business. When considering an MSc in Data Science, a few questions will probably come to mind: Do I need a Master's in Data Science or Analytics to become a Data Scientist? Do I need to follow a Master of Data Science or Analytics to learn Data Science? Will a Master of Data Scie ..read more
Data Engineer Archetypes
Medium » Hacking Analytics
by Julien Kervizic
2y ago
ON DATA ENGINEERING
An overview of the different profiles of Data Engineers
Photo by Wonderlane on Unsplash
With increased digitalization and the data use cases stemming from it, the field of data engineering is becoming quite in demand. Yet more often than not, hiring managers and companies don't fully grasp the nuances of the field. There are many different data engineer archetypes, and while true generalists exist, a Data Engineer will typically have particular expertise in and affinity towards one of the areas of Data Engineering.
Data warehouse archetype
The data warehouse archetype f ..read more
Files formats for Data Engineers — (Part 1) — Standards Data Formats
Medium » Hacking Analytics
by Julien Kervizic
2y ago
ON DATA ENGINEERING
Photo by Maksym Kaharlytskyi on Unsplash
Data Engineers need to understand the different file formats they read from and write to, as well as the limitations and pitfalls of each format. They need to be able to judge which format is the most suitable for the task at hand and understand how to interact with each of these file formats. Data engineers tend to deal with some of the more typical file formats when looking to ingest data: CSV/TSV, JSON, XML, and XLSX. Man ..read more
Speeding up your Python data pipelines with Cython and Nim
Medium » Hacking Analytics
by Julien Kervizic
2y ago
ON PYTHON
How Cython or Nim can help you get more from Python and achieve better performance
Photo by CHUTTERSNAP on Unsplash
There are many ways to speed up Python, from writing more efficient Python code to leveraging optimized libraries such as NumPy for mathematical computation. But it is also possible to speed up Python by writing C extension code or by using other languages such as Cython or Nim, which compile to C/C++ and make it possible to extend Python more efficiently. Cython and Nim allow you to embed some Python code as part of your code in the new language. It is for instance possi ..read more
Machine Learning in Production — How to Operate a Model factory
Medium » Hacking Analytics
by Julien Kervizic
2y ago
How to leverage MLOps principles for more impactful machine learning initiatives
Photo by Johannes Plenio on Unsplash
A couple of years ago, I touched on some of the key principles for handling the deployment of machine learning models. These days, merely deploying machine learning models into production is often not sufficient. What matters is operating them at scale, and this requires a very different process and ways of working. The term MLOps is often used to identify this new paradigm. MLOps places primary cons ..read more
Scaling with Pandas beyond the millions (of records)
Medium » Hacking Analytics
by Julien Kervizic
2y ago
Photo by billow926 on Unsplash
Typically, Pandas finds its sweet spot in low- to medium-sized datasets of up to a few million rows. Beyond this, distributed frameworks such as Spark or Dask are usually preferred. It is, however, possible to scale Pandas well beyond this point. The typical issue when scaling with Pandas is memory utilization: Pandas holds data in memory and doesn't keep it on disk. However, it offers some generator functionality that allows iterating over some datasets chunk by chunk. Pandas of ..read more
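The chunk-by-chunk iteration mentioned in this excerpt can be illustrated with a minimal sketch. It assumes the data arrives as CSV and uses the `chunksize` argument of `pandas.read_csv`; the column names (`user_id`, `amount`) and the tiny in-memory dataset are made up for illustration and do not come from the article.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a file too large to load at once.
csv_data = io.StringIO(
    "user_id,amount\n" + "\n".join(f"{i % 3},{i}" for i in range(10))
)

totals = {}
n_chunks = 0

# With chunksize set, read_csv returns an iterator of DataFrames
# instead of one large DataFrame, so only one chunk is in memory at a time.
for chunk in pd.read_csv(csv_data, chunksize=4):
    n_chunks += 1
    # Aggregate within the chunk, then merge into the running totals.
    partial = chunk.groupby("user_id")["amount"].sum()
    for user_id, amount in partial.items():
        key = int(user_id)
        totals[key] = totals.get(key, 0) + int(amount)

print(n_chunks, totals)
```

This pattern only works for aggregations that can be combined across chunks (sums, counts, min/max); operations that need the whole dataset at once, such as an exact median, need a different approach.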
On Data Engineering code reviews
Medium » Hacking Analytics
by Julien Kervizic
2y ago
Photo by Markus Winkler on Unsplash
It is essential to do code reviews in Data Engineering. Code reviews provide a good foundation for the future, especially for real-time use cases, and a way to avoid regressions now. Data Engineering code reviews can be similar to code reviews in software engineering, but since Data Engineering deals with a higher degree of unknowns, the coding style often needs to be more defensive. The focus and priorities are also often quite different from those in software engineering.
The four pillars of data engineering code review
There are different pilla ..read more
Finance & Financial Analytics for Data Engineers
Medium » Hacking Analytics
by Julien Kervizic
2y ago
ON DATA ENGINEERING
Financial methods, applications, and modeling for Data Engineers
Photo by Markus Spiske on Unsplash
As their role is meant to support the decision-making process, Data Engineers need to understand certain financial concepts and know how best to leverage them in their data models. Some concepts are particularly important for Data Engineers' activities, namely amortization and allocation, while other controlling concepts such as entitlement values, mix shift, and variance decomposition can also be helpful to understand how some of their data consumers might be levera ..read more
Python’s Data Classes a Data Engineer’s best friend
Medium » Hacking Analytics
by Julien Kervizic
2y ago
ON DATA ENGINEERING
Data Engineering application of data classes
Photo by Nam Hoang on Unsplash
Data classes are a relatively new addition to Python, first released in Python 3.7. They provide an abstraction layer that leverages type annotations to define container objects for data. Compared to a normal Python class, data classes provide some syntactic sugar for instantiation, and there are a number of areas where data classes can add value to data engineering.
Understanding Data Classes
Data classes
The data class library introduces a lightweight way to define objects ..read more
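To make the idea of a type-annotated container object concrete, here is a minimal sketch using the standard-library `dataclasses` module. The `Order` class, its fields, and the sample row are hypothetical and are not taken from the article; they only show the kind of record a data pipeline might pass around.

```python
from dataclasses import asdict, dataclass, field
from datetime import date


@dataclass(frozen=True)
class Order:
    """Hypothetical record type; __init__, __repr__ and __eq__ are generated."""

    order_id: int
    customer: str
    amount: float
    order_date: date
    # Mutable defaults must go through default_factory.
    tags: list = field(default_factory=list)


# A row as it might come out of a CSV reader or database cursor (made up).
row = {
    "order_id": 1,
    "customer": "acme",
    "amount": 19.99,
    "order_date": date(2021, 1, 5),
}

order = Order(**row)    # generated __init__ accepts the fields as keywords
record = asdict(order)  # back to a plain dict, e.g. for serialization

print(order)
print(record)
```

Note that the type annotations are not enforced at runtime; `frozen=True` only prevents attribute reassignment after construction, so any validation of incoming values would still have to be added separately.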
