A Data Engineer’s Guide To Non-Traditional Data Storages

Data Engineering

With the rise of big data and data science, many engineering roles are being challenged and expanded. One new-age role is data engineering.

Originally, data engineering focused on loading external data sources and designing databases: designing and developing pipelines to collect, manipulate, store, and analyze data.

It has since grown to support the volume and complexity of big data, so data engineering now encompasses a wide range of skills, from web crawling and data cleansing to distributed computing and data storage and retrieval.

For data engineers, data storage and retrieval is the critical component of the pipeline, along with understanding how the data will be used and analyzed.

In recent years, many new data storage technologies have emerged. But which one is best suited to data engineering?

Keep Reading

An HDFS Tutorial for Data Analysts Stuck With Relational Databases

Introduction

By now, you have probably heard of the Hadoop Distributed File System (HDFS), especially if you are a data analyst or someone responsible for moving data from one system to another. However, what benefits does HDFS offer over relational databases?

HDFS is a scalable, open source solution for storing and processing large volumes of data. HDFS has been proven to be reliable and efficient across many modern data centers.

HDFS utilizes commodity hardware along with open source software to reduce the overall cost per byte of storage.

With its built-in replication and resilience to disk failures, HDFS is an ideal system for storing and processing data for analytics. It does not require the underpinnings and overhead to support transaction atomicity, consistency, isolation, and durability (ACID) as is necessary with traditional relational database systems.

Moreover, when compared with enterprise and commercial databases, such as Oracle, utilizing Hadoop as the analytics platform avoids any extra licensing costs.

One of the questions many people ask when first learning about HDFS is: How do I get my existing data into the HDFS?

In this article, we will examine how to import data from a PostgreSQL database into HDFS. We will use Apache Sqoop, which is currently the most efficient, open source solution to transfer data between HDFS and relational database systems. Apache Sqoop is designed to bulk-load data from a relational database to the HDFS (import) and to bulk-write data from the HDFS to a relational database (export).
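
To give a flavor of what that looks like in practice, here is a minimal, illustrative Sqoop import invocation; the host, database, table, and HDFS path are placeholders rather than values from the article:

    sqoop import \
      --connect jdbc:postgresql://db-host:5432/sales_db \
      --username analyst -P \
      --table orders \
      --target-dir /user/hadoop/orders \
      --num-mappers 4

Sqoop turns a command like this into parallel map tasks (four here) that each read a slice of the table over JDBC and write the rows as files under the target HDFS directory.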

Keep Reading

Landing Page Design: Building the Ultimate Landing Page

When I started building the Ultimate Hacking Keyboard, I wasn’t very marketing-savvy. As an engineer, all I could see ahead was product development and technical challenges. However, marketing is just as important and must not be overlooked. A good landing page is a must-have.

Luckily for us, we realized that there’s a lot to do before we start our crowdfunding campaign, and an attractive site could turn this otherwise idle time to our advantage by capturing people’s attention, generating more subscribers and priming us for the campaign.

Keep Reading

How To Improve ASP.NET App Performance In Web Farm With Caching

A Brief Introduction to Caching

Caching is a powerful technique for increasing performance through a simple trick: instead of doing expensive work (like a complicated calculation or a complex database query) every time a result is needed, the system can store – or cache – the result of that work and simply supply it the next time it is requested. Because the work is not redone, the system can respond tremendously faster.
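
The article is about ASP.NET specifically, but the core trick is language-agnostic. Here is a minimal cache-aside sketch (written in TypeScript, with invented names and an assumed time-to-live) that stores a result and reuses it on later requests:

    // Minimal in-memory cache-aside sketch; names and TTL are illustrative assumptions.
    const cache = new Map<string, { value: unknown; expires: number }>();

    async function getOrCompute<T>(
      key: string,
      compute: () => Promise<T>, // the expensive work (query, calculation, ...)
      ttlMs = 60_000,
    ): Promise<T> {
      const hit = cache.get(key);
      if (hit && hit.expires > Date.now()) {
        return hit.value as T;       // cache hit: reuse the stored result
      }
      const value = await compute(); // cache miss: do the expensive work once
      cache.set(key, { value, expires: Date.now() + ttlMs });
      return value;
    }

Note that an in-process cache like this lives separately on each server, which is exactly why caching in a web farm needs extra care.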

Keep Reading

Webpack or Browserify & Gulp: Which Is Better?

As web applications grow increasingly complex, making your web app scalable becomes of the utmost importance. Whereas in the past writing ad-hoc JavaScript and jQuery would suffice, nowadays building a web app requires a much greater degree of discipline and formal software development practices, such as:

  • Unit tests to ensure modifications to your code don’t break existing functionality
  • Linting to ensure consistent coding style free of errors
  • Production builds that differ from development builds

The web also provides some of its own unique development challenges. For example, since webpages make a lot of asynchronous requests, your web app’s performance can be significantly degraded by having to request hundreds of JS and CSS files, each with its own tiny overhead (headers, handshakes, and so on). This particular issue can often be addressed by bundling the files together, so you request a single bundled JS file and a single bundled CSS file rather than hundreds of individual ones.
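
For illustration, bundling with webpack boils down to a small configuration file. This is a minimal sketch (shown as a TypeScript config, with assumed entry and output paths), not something taken from the article:

    // webpack.config.ts: a minimal, illustrative bundling setup (paths are assumptions).
    import * as path from 'path';
    import type { Configuration } from 'webpack';

    const config: Configuration = {
      mode: 'production',
      entry: './src/index.js',   // webpack follows imports/requires from here
      output: {
        filename: 'bundle.js',   // everything reachable is emitted as one bundled file
        path: path.resolve(__dirname, 'dist'),
      },
    };

    export default config;

Browserify reaches the same end result (a single bundled file), but it is typically driven from a Gulp task rather than a standalone config file, which is where the comparison below comes in.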

Bundling tool tradeoffs: Webpack vs. Browserify

Which bundling tool should you use: Webpack or Browserify + Gulp? Here is a guide to choosing.

Keep Reading