Data#


Resources#

Tools & Technologies

  • [ h ] Apache Airflow

  • [ h ] Apache Hadoop

  • [ h ] Apache Kafka

  • [ h ] Apache Spark

  • [ h ] Dask

  • [ h ] Docker

  • [ h ] Kubeflow

  • [ h ] Kubernetes

  • [ h ] MongoDB

  • [ h ] MySQL

  • [ h ] Neo4j

  • [ h ] PostgreSQL

  • [ h ] Redis

freeCodeCamp

  • [ y ] 06-16-2021. “Data Analytics Crash Course: Teach Yourself in 30 Days”.

  • [ y ] 07-02-2018. “SQL Tutorial - Full Database Course for Beginners”.

  • [ y ] 08-31-2018. “Database Design Course - Learn how to design and plan a database for beginners”.

Geek’s Lessons

  • [ y ] Geek’s Lesson’s SQL Tutorial

  • [ y ] Geek’s Lesson’s Advanced SQL Tutorial

My Lesson

  • [ y ] 11-17-2022. “IBM Data Analyst Complete Course | Data Analyst Tutorial For Beginners”.

Nerd’s Lesson

  • [ y ] 07-10-2023. “Advanced Data Analytics, Google | Data Analytics”.

  • [ y ] 04-15-2023. “Database Engineering Complete Course | DBMS Complete Course”. YouTube.

more

  • [ y ] 03-01-2023. Stenway. “Stop Using CSV !”.

  • [ y ] 01-21-2024. ExplainingComputers. “Explaining File Compression Formats”.


Texts#

DBs

  • [ h ] Abiteboul, Serge; Richard Hull; & Victor Vianu. Foundations of Databases.

  • Campbell, Laine & Charity Majors. (2017). Database Reliability Engineering: Designing and Operating Resilient Database Systems. O’Reilly.

  • [ h ] Lemahieu, Wilfred, Seppe Vanden Broucke, & Bart Baesens. (2018). Principles of Database Management: The Practical Guide to Storing, Managing, and Analyzing Big and Small Data. Cambridge University Press.

  • [ h ] Petrov, Alex. (2019). Database Internals: A Deep Dive into How Distributed Data Systems Work. O’Reilly.

  • [ h ] Ramakrishnan, Raghu & Johannes Gehrke. (2003). Database Management Systems, 3rd Ed. McGraw-Hill.

Data

  • [ h ] Doan, AnHai; Alon Halevy; & Zachary Ives. (2012). Principles of Data Integration. Morgan Kaufmann.

Data Analytics & Science

  • [ g ] Buisson, Florent. (2021). Behavioral Data Analysis with R and Python. O’Reilly.

  • [ G ] Cohen, Mike X. (2022). Practical Linear Algebra for Data Science: From Core Concepts to Applications Using Python. O’Reilly.

  • [ g ] Densmore, James. (2021). Data Pipelines Pocket Reference: Moving and Processing Data for Analytics. O’Reilly.

  • Downey, Allen B. (2023). Elements of Data Science. No Starch Press.

  • Farrelly, Colleen M. & Yae Ulrich Gaba. (2023). The Shape of Data: Network Science, Geometry-Based Machine Learning, and Topological Data Analysis in R. No Starch Press.

  • [ g ] Fregly, Chris & Antje Barth. (2021). Data Science on AWS: Implementing End-to-End, Continuous AI and Machine Learning Pipelines. O’Reilly.

  • [ g ] Grus, Joel. (2019). Data Science from Scratch: First Principles with Python. 2nd Ed. O’Reilly.

  • [ h ][ g ] Janssens, Jeroen. (2021). Data Science at the Command Line: Obtain, Scrub, Explore, and Model Data with Unix Power Tools. 2nd Ed. O’Reilly.

  • [ g ] Lakshmanan, Valliappa. (2022). Data Science on the Google Cloud Platform: Implementing End-to-End Real-Time Data Pipelines: From Ingest to Machine Learning. 2nd Ed. O’Reilly.

  • [ g ] Lakshmanan, Valliappa. (2018). Data Science on the Google Cloud Platform: Implementing End-to-End Real-Time Data Pipelines: From Ingest to Machine Learning. O’Reilly.

  • McGregor, Susan E. (2021). Practical Python Data Wrangling and Data Quality: Getting Started with Reading, Cleaning, and Analyzing Data. O’Reilly.

  • [ h ] McKinney, Wes. (2017). Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, 2nd Ed. O’Reilly.

  • Nield, Thomas. (2022). Essential Math for Data Science: Take Control of Your Data with Fundamental Linear Algebra, Probability, and Statistics. O’Reilly.

  • Provost, Foster & Tom Fawcett. (2013). Data Science for Business: What You Need to Know About Data Mining and Data-Analytic Thinking. O’Reilly.

  • Tuckfield, Bradford. (2023). Data Science for Business People. No Starch Press.

  • [ g ] Tuulos, Ville. (2022). Effective Data Science Infrastructure: How to make data scientists productive. Manning.

  • [ g ] VanderPlas, Jake. (2016). Python Data Science Handbook: Essential Tools for Working with Data. O’Reilly.

Data Viz

  • Dale, Kyran. (2022). Data Visualization with Python and JavaScript. 2nd Ed. O’Reilly.

  • [ g ] Dougherty, Jack & Ilya Ilyankou. (2021). Hands-On Data Visualization: Interactive Storytelling from Spreadsheets to Code. O’Reilly.

  • [ g ] Fisher, Danyel & Miriah Meyer. (2018). Making Data Visual: A Practical Guide to Using Visualization for Insight. O’Reilly.

  • Healy, Kieran. (2019). Data Visualization: A Practical Introduction. Princeton University Press.

  • [ g ] Murray, Scott. (2017). Interactive Data Visualization for the Web: An Introduction to Designing with D3, 2nd Ed. O’Reilly.

  • [ g ] Thomas, Stephen A. (2015). Data Visualization with JavaScript. No Starch Press.

  • [ g ] Wilke, Claus O. (2019). Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures. O’Reilly.

More

  • Alexopoulos, Panos. (2020). Semantic Modeling for Data: Avoiding Pitfalls and Breaking Dilemmas. O’Reilly.

  • Eryurek, Evren et al. (2021). Data Governance: The Definitive Guide: People, Processes, and Tools to Operationalize Data Trustworthiness. O’Reilly.

  • [ h ][ g ] Kleppmann, Martin. (2017). Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems. O’Reilly.

  • Reis, Joe & Matt Housley. (2022). Fundamentals of Data Engineering: Plan and Build Robust Data Systems. O’Reilly.

  • Seldess, Jesse, Ben Darnell, & Guy Harrison. (2022). CockroachDB: The Definitive Guide: Distributed Data at Scale. O’Reilly.

  • Strengholt, Piethein. (2020). Data Management at Scale: Best Practices for Enterprise Architecture. O’Reilly.

  • [ h ] Uttamchandani, Sandeep. (2020). The Self-Service Data Roadmap: Democratize Data and Reduce Time to Insight. O’Reilly.


Terms#

  • [ w ] Column

  • [ w ] Column-Oriented DBMS

  • [ w ] Create, Read, Update, Delete (CRUD)

  • [ w ] Data Architecture

  • [ w ] Data Cleaning

  • [ w ] Data Extraction

  • [ w ] Data Integration

  • [ w ] Data Integrity

  • [ w ] Data Lake

  • [ w ] Data Management

  • [ w ] Data Migration

  • [ w ] Data Pipeline

  • [ w ] Data Processing

  • [ w ] Data Retrieval

  • [ w ] Data Sink

  • [ w ] Data Storage

  • [ w ] Data Transformation

  • [ w ] Data Type

  • [ w ] Data Warehouse

  • [ w ] Database

  • [ w ] Database Management System (DBMS)

  • [ w ] Database Modeling

  • [ w ] Database System

  • [ w ] Enterprise Data Model

  • [ w ] Extract, Load, Transform (ELT)

  • [ w ] Extract, Transform, Load (ETL)

  • [ w ] Graph Database (GDB)

  • [ w ] Heterogeneous Database System (HDB)

  • [ w ] Hierarchical Database Model

  • [ w ] MUMPS

  • [ w ] NoSQL

  • [ w ] Object Database

  • [ w ] Object-Oriented Database Management System (OODBMS)

  • [ w ] Object-Relational Database (ORD)

  • [ w ] Object-Relational Database Management System (ORDBMS)

  • [ w ] Object-Relational Mapping (ORM)

  • [ w ] Online Analytical Processing (OLAP)

  • [ w ] Online Transaction Processing (OLTP)

  • [ w ] Operational Data Store

  • [ w ] Query Language

  • [ w ] Relational Database Management System (RDBMS)

  • [ w ] Row

  • [ w ] Structured Query Language (SQL)

  • [ w ] Unstructured Data