E-commerce Inventory Analysis with SQL: A 3-Approach Project


Introduction

As a data analyst, I'm always looking for ways to use data to benefit businesses. In this project, I used SQL to analyze e-commerce inventory data. I answered a variety of questions using 3 different SQL approaches. I also provided a detailed explanation of each approach, so that other data analysts can learn from my work.

I hope you enjoy this project as much as I did while working on it.

Problem Statement

The eCommerce store of a major online retailer is struggling with inaccurate inventory tracking, which leads to dissatisfied customers and lost sales and could ultimately put the company's survival at risk. As a data analyst, your responsibility is to use SQL to monitor and assess the store's inventory levels, so that the issue can be addressed before it causes further damage.

Database Set Up

For this project we will be using two tables. You can copy the schema queries from the sections below.

Table 1: Products Table

This table presents comprehensive details regarding the merchandise offered by the eCommerce store. The table comprises the following fields:

  1. product_id: A distinct identifier for each product.

  2. product_name: The designated name of the product.

  3. product_category: The specific category to which the product belongs.

  4. product_price: The retail price of the product.

Below is the SQL code to generate the products table and populate it with sample data:

CREATE TABLE products (
product_id SERIAL PRIMARY KEY,
product_name VARCHAR(50),
product_category VARCHAR(20),
product_price NUMERIC(10,2)
);
INSERT INTO products (product_name, product_category, product_price)
VALUES ('Product A', 'Category 1', 19.99),
('Product B', 'Category 2', 29.99),
('Product C', 'Category 1', 39.99),
('Product D', 'Category 3', 49.99),
('Product E', 'Category 2', 59.99);

Table 2: Inventory Table

This table encompasses crucial details concerning the inventory levels of products available at the eCommerce store. The fields within this table consist of:

  1. product_id: A distinct identifier for each product.

  2. inventory_date: The date when the inventory count was conducted.

  3. inventory_level: The quantity of units present in inventory on the specified inventory date.

Below is the SQL code to generate the inventory table and populate it with sample data:

CREATE TABLE inventory (
  product_id INT,
  inventory_date DATE,
  inventory_level INT
);
INSERT INTO inventory (product_id, inventory_date, inventory_level)
VALUES (1, '2022-01-01', 100),
       (2, '2022-01-01', 200),
       (3, '2022-01-01', 150),
       (4, '2022-01-01', 75),
       (5, '2022-01-01', 250),
       (1, '2022-01-02', 80),
       (2, '2022-01-02', 180),
       (3, '2022-01-02', 100),
       (4, '2022-01-02', 60),
       (5, '2022-01-02', 220),
       (1, '2022-01-03', 50),
       (2, '2022-01-03', 150),
       (3, '2022-01-03', 75),
       (4, '2022-01-03', 80),
       (5, '2022-01-03', 200);
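
Before moving on, it can be worth confirming that the sample data loaded as expected. The queries below are a minimal sketch that assumes both tables were created fresh with the statements above; the column aliases are chosen only for illustration.

```sql
-- Sanity check: assumes the products and inventory tables created above.
-- For the sample data this should show 5 products, 15 inventory rows,
-- and dates running from 2022-01-01 to 2022-01-03.
SELECT
    (SELECT COUNT(*) FROM products) AS product_count,
    COUNT(*)                        AS inventory_rows,
    MIN(inventory_date)             AS first_date,
    MAX(inventory_date)             AS last_date
FROM inventory;

-- inventory.product_id has no foreign key, so also look for
-- inventory rows that do not match any product.
SELECT i.product_id
FROM inventory AS i
LEFT JOIN products AS p ON p.product_id = i.product_id
WHERE p.product_id IS NULL;   -- returns no rows for the sample data
```

If the second query returns any rows, the joins used in the rest of this project would silently drop those inventory records.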

Case Study Questions and Answers

In this analysis, we will answer 5 major questions, using 3 different approaches for each question:

Question 1. What are the top 5 products with the highest inventory levels on the most recent inventory date?

Approach 1

SELECT p.product_name,
(SELECT inventory_level FROM inventory WHERE product_id = p.product_id ORDER BY inventory_date DESC LIMIT 1) AS inventory_level
FROM products AS p
ORDER BY inventory_level DESC
LIMIT 5;

Explanation:

  • This query retrieves each product_name and its latest inventory_level from the products and inventory tables.

  • It uses a correlated subquery in the SELECT clause: for each product, the subquery sorts that product's inventory rows by inventory_date in descending order and keeps only the first row, which is the most recent count.

  • The results are then sorted by inventory_level in descending order, and LIMIT 5 keeps the top 5 products.

Approach 2:

SELECT p.product_name,i.inventory_level
FROM products AS p
INNER JOIN inventory AS i
ON i.product_id = p.product_id
WHERE i.inventory_date = (SELECT MAX(inventory_date) FROM inventory)
ORDER BY i.inventory_level DESC
LIMIT 5;

Explanation:

  • This query retrieves the latest inventory level for each product by joining the products and inventory tables on the product_id column.

  • It uses a subquery to find the most recent inventory_date in the whole inventory table, and the WHERE clause keeps only the rows recorded on that date. The results are sorted in descending order by inventory_level.
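
One thing to keep in mind with this approach is that it compares every row against a single, table-wide latest date. The sketch below uses a made-up three-row inventory (the inv CTE and its values are hypothetical, not part of the project data) to show what happens when a product has no count on that date.

```sql
-- Hypothetical data: product 2 was not counted on 2022-01-03.
WITH inv (product_id, inventory_date, inventory_level) AS (
    VALUES (1, DATE '2022-01-02', 80),
           (1, DATE '2022-01-03', 50),
           (2, DATE '2022-01-02', 180)
)
SELECT product_id, inventory_level
FROM inv
WHERE inventory_date = (SELECT MAX(inventory_date) FROM inv);
-- Only product 1 (level 50) is returned; product 2 is dropped
-- because its most recent count is older than the table-wide maximum.
```

In the sample data every product is counted on every date, so this difference does not affect the results of this project.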

Approach 3:

SELECT p.product_name, i.inventory_level
FROM products AS p
INNER JOIN 
(
SELECT product_id, MAX(inventory_date) AS max_date
    FROM inventory
    GROUP BY product_id
) AS latest_inv
ON p.product_id =latest_inv.product_id
INNER JOIN inventory AS i
ON latest_inv.product_id=i.product_id
AND latest_inv.max_date= i.inventory_date
ORDER BY i.inventory_level DESC
LIMIT 5;

Explanation:

  • This query retrieves the latest inventory level for each product by using a subquery to find the most recent inventory date for each product.

  • The subquery selects the product_id and the maximum inventory_date for each product from the inventory table, grouping the results by product_id.

  • The latest inventory date for each product is then joined with the products table on the product_id column to retrieve the corresponding product_name.

  • The results are then joined with the inventory table on the product_id and inventory_date columns to retrieve the inventory_level recorded on that latest date.

  • Finally, the results are sorted in descending order by inventory_level using the ORDER BY clause, and LIMIT 5 keeps the top 5 products.
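
The "latest row per product" pattern can also be written with a window function. The following is a sketch of that alternative and not one of the three approaches above. It assumes the products and inventory tables from the Database Set Up section, and the names ranked and rn are introduced only for illustration.

```sql
-- Number each product's inventory rows from newest (1) to oldest.
WITH ranked AS (
    SELECT
        product_id,
        inventory_level,
        ROW_NUMBER() OVER (
            PARTITION BY product_id
            ORDER BY inventory_date DESC
        ) AS rn
    FROM inventory
)
-- Keep only the newest row per product, then take the top 5 by level.
SELECT p.product_name, r.inventory_level
FROM ranked AS r
INNER JOIN products AS p ON p.product_id = r.product_id
WHERE r.rn = 1
ORDER BY r.inventory_level DESC
LIMIT 5;
```

Like the correlated subquery and the grouped subquery, this picks the most recent count separately for each product, so it should return the same rows for the sample data.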

Result

product_name | inventory_level
-------------+----------------
Product E    | 200
Product B    | 150
Product D    | 80
Product C    | 75
Product A    | 50

  • The top 5 inventory levels are:

  • Product E with an inventory level of 200

  • Product B with an inventory level of 150

  • Product D with an inventory level of 80

  • Product C with an inventory level of 75

  • Product A with an inventory level of 50

  • This suggests you would need to restock Product A and Product C, which have the lowest inventory levels, as quickly as possible before customers start complaining.

Question 2. What is the total inventory level for each product category on the most recent inventory date?

Approach 1

SELECT p.product_category, SUM(i.inventory_level) AS total_inventory_level
FROM products AS p
INNER JOIN inventory AS i
ON i.product_id = p.product_id
WHERE i.inventory_date = (SELECT MAX(inventory_date) FROM inventory)
GROUP BY p.product_category;

Explanation:

  • This query retrieves the total inventory level for each product category on the most recent inventory date by joining the products and inventory tables on the product_id column.

  • It uses a subquery to find the latest inventory date and then filters the inventory table to retrieve the inventory levels for that date. The results are then grouped by product category, and the total inventory level for each category is calculated using the SUM function.

Approach 2

SELECT p.product_category,
SUM((SELECT inventory_level FROM inventory WHERE product_id = p.product_id ORDER BY inventory_date DESC LIMIT 1)) AS total_inventory_level
FROM products AS p
GROUP BY p.product_category
ORDER BY 2 DESC;

Explanation:

  • This query retrieves the total inventory level for each product category based on the latest inventory data by using a subquery in the SELECT clause to retrieve the latest inventory_level for each product from the inventory table.

  • The results are then grouped by product_category using the GROUP BY clause, and sorted in descending order by the total_inventory_level using the ORDER BY clause.

Approach 3

SELECT p.product_category, sum(i.inventory_level) AS total_inventory_level
FROM products AS p
INNER JOIN 
(
SELECT product_id, MAX(inventory_date) AS max_date
    FROM inventory
    GROUP BY product_id
) AS latest_inv
ON p.product_id =latest_inv.product_id
INNER JOIN inventory AS i
ON latest_inv.product_id=i.product_id
AND latest_inv.max_date= i.inventory_date
GROUP BY p.product_category
ORDER BY 2 DESC

Explanation:

  • This query retrieves the total inventory level for each product category based on the latest inventory data by joining the products and inventory tables using a subquery to filter the inventory table for the most recent inventory data for each product.

  • The subquery selects the product_id and the maximum inventory_date for each product from the inventory table, grouping the results by product_id.

  • The latest inventory data for each product is then joined with the products table on the product_id column to retrieve the corresponding product_category.

  • The results are then joined with the inventory table on the product_id and inventory_date columns to retrieve the inventory_level for the latest inventory data. The results are then summed using the SUM function and displayed as the total_inventory_level.

  • The results are then grouped by product_category using the GROUP BY clause, and sorted in descending order by the total_inventory_level using the ORDER BY clause.

Result

product_category | total_inventory_level
-----------------+----------------------
Category 1       | 125
Category 2       | 350
Category 3       | 80

  • Product category 1 has a total stock level of 125

  • Product category 2 has a total stock level of 350

  • Product category 3 has a total stock level of 80

Question 3. What is the average inventory level for each product category for the month of January 2022?

Approach 1

SELECT p.product_category,round(avg(i.inventory_level),2) AS avg_inventory_level
FROM products AS p
INNER JOIN inventory AS i
ON p.product_id = i.product_id
WHERE EXTRACT(YEAR FROM i.inventory_date)= 2022 AND
EXTRACT(MONTH FROM inventory_date)= 01
GROUP BY p.product_category

Explanation:

  • This approach uses the EXTRACT function to filter inventory data for January 2022, and joins the products and inventory tables using the product_id column.

  • It then groups the results by product_category and calculates the average inventory level, rounded to 2 decimal places using the ROUND function.

Approach 2

SELECT p.product_category,round(avg(i.inventory_level),2) AS avg_inventory_level
FROM products AS p
INNER JOIN inventory AS i
ON p.product_id = i.product_id
WHERE i.inventory_date >= '2022-01-01' AND inventory_date < '2022-02-01'
GROUP BY p.product_category

Explanation:

  • This approach filters inventory data for January 2022 with a half-open date range: inventory_date must be on or after 2022-01-01 and before 2022-02-01. It joins the products and inventory tables on the product_id column, then groups the results by product_category and calculates the average inventory level, rounded to 2 decimal places using the ROUND function.

Approach 3

SELECT p.product_category, ROUND(AVG(i.inventory_level), 2) AS avg_inventory_level
FROM products AS p, inventory AS i
WHERE p.product_id = i.product_id
AND i.inventory_date BETWEEN '2022-01-01' AND '2022-01-31'
GROUP BY p.product_category

Explanation:

  • This approach uses the BETWEEN operator to filter inventory data for the month of January 2022, then joins the products and inventory tables using the product_id column.

  • It then groups the results by product_category and calculates the average inventory level, rounded to 2 decimal places using the ROUND function.
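
BETWEEN includes both endpoints, which works here because inventory_date is a DATE. If the column were a timestamp instead (an assumption made only for illustration; the counts CTE and the counted_at column below are hypothetical), the upper bound '2022-01-31' would be read as midnight at the start of that day. A small sketch of the difference:

```sql
-- Hypothetical timestamps around the end of January 2022.
WITH counts (counted_at) AS (
    VALUES (TIMESTAMP '2022-01-31 00:00:00'),
           (TIMESTAMP '2022-01-31 15:30:00'),
           (TIMESTAMP '2022-02-01 00:00:00')
)
SELECT
    counted_at,
    (counted_at BETWEEN '2022-01-01' AND '2022-01-31')         AS in_between,
    (counted_at >= '2022-01-01' AND counted_at < '2022-02-01') AS in_half_open
FROM counts;
-- 2022-01-31 00:00:00 | true  | true
-- 2022-01-31 15:30:00 | false | true
-- 2022-02-01 00:00:00 | false | false
```

The afternoon count on January 31 is excluded by BETWEEN but kept by the half-open range, which is why the half-open form is often the safer habit when a column stores a time of day.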

Result

product_category | avg_inventory_level
-----------------+--------------------
Category 1       | 92.50
Category 2       | 200.00
Category 3       | 71.67

Question 4. Which products had a decrease in inventory level from the previous inventory date to the current inventory date?

Approach 1

WITH cte AS 
(
SELECT
product_name,
curr.inventory_date,
curr.inventory_level - LAG(curr.inventory_level) OVER (PARTITION BY curr.product_id ORDER BY curr.inventory_date) AS inventory_diff
FROM inventory AS curr
INNER JOIN products as p
ON p.product_id = curr.product_id
)
SELECT * 
FROM cte
WHERE inventory_diff < 0;

Explanation:

  • This query uses the LAG window function to calculate the difference in inventory levels between the current date and the previous date for each product in the inventory table.

  • It then returns the product name, inventory date, and inventory difference for products that experienced a decrease in inventory level.
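
To see what LAG returns, it can help to look at a single product. This sketch assumes the inventory table from the Database Set Up section and shows each count for product_id 1 next to the previous one (previous_level is a name chosen for illustration).

```sql
-- Each row of product 1 alongside the level from the previous inventory date.
SELECT
    inventory_date,
    inventory_level,
    LAG(inventory_level) OVER (ORDER BY inventory_date) AS previous_level
FROM inventory
WHERE product_id = 1
ORDER BY inventory_date;
-- 2022-01-01 | 100 | NULL
-- 2022-01-02 |  80 | 100
-- 2022-01-03 |  50 | 80
```

The first row has no earlier count, so LAG returns NULL, and any difference computed from it is NULL as well.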

Approach 2

SELECT  p.product_name, inv_1.inventory_date, inv_1.inventory_level - inv_2.inventory_level AS inventory_diff
FROM inventory AS inv_1
JOIN inventory inv_2 ON inv_1.product_id = inv_2.product_id 
         AND inv_1.inventory_date = inv_2.inventory_date + INTERVAL '1 day'
JOIN products p ON inv_1.product_id = p.product_id
WHERE inv_1.inventory_level < inv_2.inventory_level;

Explanation:

  • This SQL query retrieves the product name, inventory date, and inventory level difference between two consecutive days for each product in the inventory table where the inventory level decreased.

  • It helps to identify the products with declining inventory levels on consecutive days, which can be useful for inventory management and supply chain optimization.
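
Because this join matches dates that are exactly one day apart, it relies on a count being recorded every day. The sketch below uses hypothetical data (the inv CTE and its two rows are made up for illustration) to show what happens when a day is skipped.

```sql
-- Hypothetical data: no count was recorded on 2022-01-02.
WITH inv (product_id, inventory_date, inventory_level) AS (
    VALUES (1, DATE '2022-01-01', 100),
           (1, DATE '2022-01-03', 50)
)
SELECT curr.inventory_date,
       curr.inventory_level - prev.inventory_level AS inventory_diff
FROM inv AS curr
JOIN inv AS prev
  ON prev.product_id = curr.product_id
 AND curr.inventory_date = prev.inventory_date + INTERVAL '1 day';
-- Returns no rows: the drop from 100 to 50 is missed
-- because the two dates are two days apart.
```

The sample data has a count for every product on every day, so this limitation does not change the results here.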

Approach 3

SELECT
    p.product_name,
    curr.inventory_date,
    curr.inventory_level - prev.inventory_level AS inventory_diff
FROM inventory AS curr
JOIN inventory AS prev 
    ON curr.product_id = prev.product_id 
    AND curr.inventory_date > prev.inventory_date 
    AND NOT EXISTS (
        SELECT 1 FROM inventory 
        WHERE product_id = curr.product_id 
        AND inventory_date > prev.inventory_date 
        AND inventory_date < curr.inventory_date
    )
JOIN products AS p
ON p.product_id =curr.product_id
WHERE curr.inventory_level < prev.inventory_level
ORDER BY curr.product_id, curr.inventory_date DESC;

Explanation:

  • This SQL query identifies products whose inventory levels decreased between two consecutive inventory counts. The NOT EXISTS condition ensures that no other count for the same product falls between the two dates being compared. It returns the product name, inventory date, and inventory difference for each decrease in inventory level.

Result

product_name | inventory_date | inventory_diff
-------------+----------------+---------------
Product A    | 2022-01-03     | -30
Product A    | 2022-01-02     | -20
Product B    | 2022-01-03     | -30
Product B    | 2022-01-02     | -20
Product C    | 2022-01-03     | -25
Product C    | 2022-01-02     | -50
Product D    | 2022-01-02     | -15
Product E    | 2022-01-03     | -20
Product E    | 2022-01-02     | -30

Question 5. What is the overall trend in inventory levels for each product category over the month of January 2022?

Approach 1

SELECT p.product_category,i.inventory_date , round(avg(i.inventory_level),2) AS avg_inventory_level
FROM products AS p
INNER JOIN inventory AS i
ON p.product_id = i.product_id
WHERE EXTRACT(YEAR FROM i.inventory_date)= 2022 AND
EXTRACT(MONTH FROM inventory_date)= 01
GROUP BY p.product_category, i.inventory_date
ORDER BY p.product_category, i.inventory_date

Explanation:

  • This query joins the products and inventory tables on the product_id column. It filters the results to only include inventory levels from January 2022, then groups the data by product category and inventory date and calculates the average inventory level for each group.

Approach 2

SELECT p.product_category, i.inventory_date,round(avg(i.inventory_level),2) AS avg_inventory_level
FROM products AS p
INNER JOIN inventory AS i
ON p.product_id = i.product_id
WHERE i.inventory_date >= '2022-01-01' AND inventory_date < '2022-02-01'
GROUP BY p.product_category, i.inventory_date
ORDER BY p.product_category, i.inventory_date

Explanation:

  • This query joins the products and inventory tables using the product_id column.

  • The WHERE clause filters the results to include only inventory levels recorded between January 1, 2022, and January 31, 2022. The GROUP BY clause groups the data by product category and inventory date, allowing the query to calculate the average inventory level for each product category on each inventory date in January 2022.

  • Finally, the ORDER BY clause sorts the results by product category and inventory date, which makes it easier to analyze the data.

Approach 3

SELECT p.product_category,i.inventory_date, ROUND(AVG(i.inventory_level), 2) AS avg_inventory_level
FROM products AS p, inventory AS i
WHERE p.product_id = i.product_id
AND i.inventory_date BETWEEN '2022-01-01' AND '2022-01-31'
GROUP BY p.product_category, i.inventory_date
ORDER BY p.product_category, i.inventory_date

Explanation:

  • This query performs an inner join between the products and inventory tables on the product_id column.

  • The WHERE clause filters the results to include only inventory levels recorded between January 1, 2022, and January 31, 2022. The GROUP BY clause groups the data by product category and inventory date, allowing the query to calculate the average inventory level for each product category on each inventory date in January 2022.

  • Finally, the ORDER BY clause sorts the results by product category and inventory date, which makes it easier to analyze the data.

Result

product_category | inventory_date | avg_inventory_level
-----------------+----------------+--------------------
Category 1       | 2022-01-01     | 125
Category 1       | 2022-01-02     | 90
Category 1       | 2022-01-03     | 62.5
Category 2       | 2022-01-01     | 225
Category 2       | 2022-01-02     | 200
Category 2       | 2022-01-03     | 175
Category 3       | 2022-01-01     | 75
Category 3       | 2022-01-02     | 60
Category 3       | 2022-01-03     | 80
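
The daily averages can be condensed into one number per category by comparing the first and last dates in the month. The query below is a sketch of that idea rather than part of the three approaches. It assumes the products and inventory tables from the Database Set Up section, and the names daily, w, and net_change are introduced only for illustration.

```sql
-- Daily average per category for January 2022 (same logic as Approach 2).
WITH daily AS (
    SELECT p.product_category,
           i.inventory_date,
           AVG(i.inventory_level) AS avg_inventory_level
    FROM products AS p
    INNER JOIN inventory AS i ON p.product_id = i.product_id
    WHERE i.inventory_date >= '2022-01-01' AND i.inventory_date < '2022-02-01'
    GROUP BY p.product_category, i.inventory_date
)
-- Net change: last daily average minus first daily average, per category.
SELECT DISTINCT
    product_category,
    ROUND(
        LAST_VALUE(avg_inventory_level) OVER w
        - FIRST_VALUE(avg_inventory_level) OVER w, 2
    ) AS net_change
FROM daily
WINDOW w AS (
    PARTITION BY product_category
    ORDER BY inventory_date
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
ORDER BY product_category;
-- Category 1 | -62.50
-- Category 2 | -50.00
-- Category 3 |   5.00
```

For the sample data, Category 1 and Category 2 trend downward across the three dates, while Category 3 ends slightly higher than it started.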

And we're done. In this case study, I've provided 3 different approaches to solve each of the questions, alongside a detailed explanation of each query.

Please like, comment, and share. Also, let me know if you encounter any challenges with this project.