Ivy Professional School
Rating

Automating Excel Reports Using Python and OpenPyXL: From Manual Drudgery to One-Click Reports

Eeshani Agrawal
By Eeshani Agrawal
16+ years experience
February 15, 2026
18 min read
Interactive Lesson

Watch: Excel Automation in Action

Automate Excel Python workflows are no longer a 'nice to have' skill; they are quickly becoming a core capability for anyone working with data, reports, or business operations. If you've ever spent hours updating Excel formulas, formatting sheets, or generating the same monthly report again and again, this tutorial is written for you.

Imagine you are good at Excel. You know how to filter, sort, build pivot tables, and format reports. But your pain is not 'I don't know Excel.' Your pain is: the same report comes back every week, and you are doing the same steps again and again. That repetitive effort is exactly what we will remove. The goal here is simple: by the end of this article, you should clearly understand what Python is doing at every step, even if you have never coded before, and you should be able to connect each Python step to something you already understand in Excel.

Why Excel Automation Matters in the Real World

Every Monday morning, your team gets a file called: raw_sales.csv or sales_dump.xlsx. It contains transactions: product, date, quantity, revenue, region, salesperson, etc. Then someone (often you) does this: Open the file, Clean the data, Add formulas, Apply formatting, Create summary tables, Save it as a 'final report' and share. This is manageable when it happens once. But when it happens weekly, daily, or for multiple departments, the time waste becomes huge.

The 'Robotic Assistant' Analogy

Think of Excel as your workshop where you craft reports manually with tools. Python is like hiring a robotic assistant who can use those same tools but never gets tired or makes typos.

The Manual Process:

You spend 2 hours every Monday doing the exact same clicks, drags, and formats.

The Automated Solution:

Python does all the clicks, drags, and formats in 2 seconds, while you review the final output.

Phase One: Preparation

Setting Up the Environment

Before writing any automation, we need the right tools. OpenPyXL is our primary library for working with Excel files at the cell level.

install_openpyxl.py
# Install OpenPyXL using pip
pip install openpyxl

Behind the Code:

pip is Python's package installer that downloads and installs libraries from the Python Package Index (PyPI). openpyxl is a library specifically designed to read/write Excel 2010 xlsx/xlsm/xltx/xltm files. After installation, Python can now 'speak Excel' and manipulate spreadsheets programmatically.

Make sure you have Python installed before running this command.

Phase Two: Building Blocks

Creating Your First Excel Report File with OpenPyXL

Here is the simplest report automation use case: You want Python to create a brand-new Excel file that already has a sheet name, column headers, and a basic structure ready for data.

create_excel.py
"text-purple-400">from openpyxl "text-purple-400">import Workbook

# Create a new workbook (like clicking 'New' "text-purple-400">in Excel)
wb = Workbook()

# Select the active sheet (default is 'Sheet1')
ws = wb.active

# Rename the sheet to something meaningful
ws.title = "Sales Data"

# Write headers to specific cells
ws["A1"] = "Product"
ws["B1"] = "Quantity"
ws["C1"] = "Revenue"

# Save the workbook to a file
wb.save("sales_report.xlsx")

Behind the Code:

Workbook() creates a new Excel workbook in memory, similar to opening Excel and clicking 'New'. wb.active selects the currently active sheet (usually the first one). ws.title renames the sheet just like double-clicking the sheet tab in Excel. Writing to cells like ws['A1'] is exactly like typing into cell A1 in Excel. Finally, wb.save() writes the file to disk - without this step, your work only exists in memory.

This creates a real .xlsx file that you can open in Excel, even though Excel was never opened during the process.

Writing Data into Excel Automatically

So far, we created a structure. Now we fill in rows, like entering data into a table. This is where automation truly shines - handling multiple rows without manual entry.

write_data.py
# Sample data (could come "text-purple-400">from database, API, or CSV)
data = [
    ("Laptop", 10, 80000),
    ("Tablet", 25, 125000),
    ("Phone", 40, 400000)
]

# Start "text-purple-400">from row 2 (row 1 has headers)
row_num = 2

"text-purple-400">for item "text-purple-400">in data:
    # Write each value to its respective column
    ws.cell(row=row_num, column=1, value=item[0])  # Product -> Column A
    ws.cell(row=row_num, column=2, value=item[1])  # Quantity -> Column B
    ws.cell(row=row_num, column=3, value=item[2])  # Revenue -> Column C
    row_num += 1  # Move to next row

# Save the updated workbook
wb.save("sales_report_with_data.xlsx")

Behind the Code:

We start from row_num = 2 because row 1 contains our headers. The for loop iterates through each item in our data list. ws.cell(row, column, value) is OpenPyXL's precise way of addressing cells - it's like saying 'go to this specific row and column, put this value there'. After writing all data, we save again to preserve our changes. This pattern - update workbook, then save workbook - is fundamental to Excel automation with Python.

You could have thousands of rows here, and Python would handle them all with the same few lines of code.

Adding Totals and Business Logic with Formulas

Excel reports are rarely just raw data. They usually contain totals, averages, and KPIs. OpenPyXL allows you to place formulas exactly like you would in Excel, preserving the Excel logic your team is familiar with.

add_formulas.py
# Add a label "text-purple-400">for total
ws["B6"] = "Total Revenue"

# Add SUM formula (just like "text-purple-400">in Excel)
ws["C6"] = "=SUM(C2:C4)"

# You can also add other formulas
ws["B7"] = "Average Quantity"
ws["C7"] = "=AVERAGE(B2:B4)"

# Save "text-purple-400">with formulas
wb.save("sales_report_with_formulas.xlsx")

Important Clarification:

OpenPyXL does not calculate the formulas itself - it only writes the formula text into the Excel file. Excel calculates the result when you open the file. This is actually beneficial because it keeps Excel logic intact, and your finance or operations team can still inspect formulas exactly as they're used to. This respect for existing workflows is why Python Excel automation adoption is growing in companies.

The formula syntax is exactly what you'd type in Excel, including the = sign at the beginning.

Formatting the Excel Report so It Looks Professional

Automation doesn't mean ugly output. Business reports must be readable with bold headers, proper number formatting, aligned columns, and consistent style every time. Python can enforce the same formatting every single time, which avoids human error.

formatting.py
"text-purple-400">from openpyxl.styles "text-purple-400">import Font, Alignment, PatternFill, Border, Side

# Make headers bold and centered
header_font = Font(bold="text-blue-400">True, size=12, color="FFFFFF")
header_fill = PatternFill(start_color="366092", end_color="366092", fill_type="solid")
alignment = Alignment(horizontal="center", vertical="center")

"text-purple-400">for cell "text-purple-400">in ["A1", "B1", "C1", "D1"]:
    ws[cell].font = header_font
    ws[cell].fill = header_fill
    ws[cell].alignment = alignment

# Format revenue "text-purple-400">as currency
"text-purple-400">from openpyxl.styles "text-purple-400">import numbers
"text-purple-400">for row "text-purple-400">in range(2, 5):
    ws.cell(row=row, column=3).number_format = numbers.FORMAT_CURRENCY_USD_SIMPLE

# Auto-adjust column widths
"text-purple-400">from openpyxl.utils "text-purple-400">import get_column_letter
"text-purple-400">for col "text-purple-400">in range(1, 4):
    col_letter = get_column_letter(col)
    ws.column_dimensions[col_letter].width = 15

wb.save("sales_report_formatted.xlsx")

Behind the Code:

We import various styling classes from openpyxl.styles. Font controls text appearance (bold, size, color). PatternFill sets background colors. Alignment controls text positioning. number_format applies Excel-style number formatting (currency, percentages, dates). column_dimensions adjusts column widths. This is where Advanced Excel vs Python becomes obvious: Excel is great for one-time formatting, Python is great for consistent formatting repeatedly across hundreds of files.

All these formatting options mirror what you can do manually in Excel's Format Cells dialog.

Phase Three: Real-World Scenarios

Automating an Already Present Excel Sheet

Most real-world automation doesn't start from scratch. You receive existing Excel files and need to automate cleaning, formatting, adding new columns, and calculating totals. Here's how Python handles existing workbooks.

load_existing.py
"text-purple-400">from openpyxl "text-purple-400">import load_workbook

# Load an existing Excel file
wb = load_workbook("existing_sales_report.xlsx")
ws = wb.active

print(f"Working ">with sheet: {ws.title}")
print(f"Total rows: {ws.max_row}")
print(f"Total columns: {ws.max_column}")

Behind the Code:

load_workbook() opens an existing Excel file for editing, preserving all existing content, formulas, and formatting. ws.max_row and ws.max_column give you the dimensions of used cells. This allows you to programmatically understand the structure of any Excel file you receive, regardless of who created it or how it's formatted.

This is powerful for handling reports from different departments or systems that all need the same processing.

Data Cleaning: Remove Duplicates and Blank Rows

Real data is messy. Python can automatically clean duplicate rows and remove blanks - tasks that take hours manually but milliseconds programmatically.

clean_data.py
# Track unique rows to identify duplicates
seen = set()
rows_to_delete = []

# Start "text-purple-400">from row 2 to avoid headers
"text-purple-400">for row "text-purple-400">in range(2, ws.max_row + 1):
    # Check "text-purple-400">if row is completely blank
    "text-purple-400">if all(ws.cell(row=row, column=col).value is "text-blue-400">None "text-purple-400">for col "text-purple-400">in range(1, ws.max_column + 1)):
        rows_to_delete.append(row)
        continue
    
    # Get all values "text-purple-400">in the row "text-purple-400">as a tuple
    row_data = tuple(ws.cell(row=row, column=col).value "text-purple-400">for col "text-purple-400">in range(1, ws.max_column + 1))
    
    # Check "text-purple-400">for duplicates
    "text-purple-400">if row_data "text-purple-400">in seen:
        rows_to_delete.append(row)
    "text-purple-400">else:
        seen.add(row_data)

# Delete rows "text-purple-400">from bottom to top to maintain correct indices
"text-purple-400">for row "text-purple-400">in sorted(rows_to_delete, reverse="text-blue-400">True):
    ws.delete_rows(row)

print(f"Deleted {len(rows_to_delete)} duplicate/blank rows")
wb.save("cleaned_report.xlsx")

Behind the Code:

We use a set() to track unique rows because sets automatically handle duplicates. The all() function checks if every cell in a row is None (blank). We collect rows to delete in a list first, then delete from bottom to top - this is crucial because deleting rows changes the indices of rows below. This approach handles thousands of rows efficiently, something that would be extremely tedious in Excel.

This is a 'super-eraser' that works at computer speed, not human speed.

Using Pandas with OpenPyXL (The Power Combo)

For complex data transformations, combining Pandas for data analysis with OpenPyXL for Excel formatting gives you the best of both worlds.

pandas_openpyxl.py
"text-purple-400">import pandas "text-purple-400">as "text-blue-400">pd
"text-purple-400">from openpyxl "text-purple-400">import load_workbook
"text-purple-400">from openpyxl.utils.dataframe "text-purple-400">import dataframe_to_rows

# Use Pandas "text-purple-400">for powerful data operations
df = "text-blue-400">pd.read_excel("raw_data.xlsx")

# Clean and transform "text-purple-400">with Pandas
df_cleaned = df.dropna()  # Remove rows "text-purple-400">with missing values
df_cleaned["Profit Margin"] = (df_cleaned["Revenue"] - df_cleaned["Cost"]) / df_cleaned["Revenue"]
summary = df_cleaned.groupby("Product").agg({
    "Quantity": "sum",
    "Revenue": "sum",
    "Profit Margin": "mean"
}).reset_index()

# Write back to Excel "text-purple-400">with OpenPyXL formatting
wb = Workbook()
ws = wb.active
ws.title = "Summary Report"

# Convert Pandas "text-blue-400">DataFrame to rows
"text-purple-400">for r_idx, row "text-purple-400">in enumerate(dataframe_to_rows(summary, index="text-blue-400">False, header="text-blue-400">True), 1):
    "text-purple-400">for c_idx, value "text-purple-400">in enumerate(row, 1):
        ws.cell(row=r_idx, column=c_idx, value=value)

# Apply formatting
header_font = Font(bold="text-blue-400">True)
"text-purple-400">for cell "text-purple-400">in ws[1]:
    cell.font = header_font

wb.save("pandas_openpyxl_report.xlsx")

Behind the Code:

Pandas excels at data manipulation (groupby, calculations, filtering) while OpenPyXL excels at Excel file control. We use pd.read_excel() to load data into a DataFrame, perform complex operations efficiently, then use dataframe_to_rows() to convert the DataFrame back to Excel rows. This combination is common in Python for Data Analytics projects where you need both analytical power and polished reporting.

This is the professional approach: use the right tool for each job.

Phase Four: Production Deployment

How to Schedule Python Scripts for Excel Automation

Automation becomes real only when it runs without you. On Windows you use Task Scheduler. On Linux/macOS you use cron. This turns your Python script from a manual tool into a production system.

cron_schedule.txt
# Run every Monday at 9 AM
0 9 * * 1 python /path/to/excel_report_script.py

# Run every weekday at 8 PM
0 20 * * 1-5 python /path/to/daily_report.py

# Run first day of every month at 6 AM
0 6 1 * * python /path/to/monthly_report.py

Behind the Schedule:

Cron syntax has five time fields: minute (0-59), hour (0-23), day of month (1-31), month (1-12), day of week (0-7 where 0 and 7 are Sunday). The example '0 9 * * 1' means 'at minute 0 of hour 9, any day of month, any month, on Monday'. On Windows, you'd use Task Scheduler with similar timing logic. This is what transforms your script from a 'cool trick' into a business process.

Always test your script thoroughly before scheduling it to run automatically.

Common Pitfalls: Troubleshooting Excel Automation

Even with automation, things can go wrong. Here are solutions to common issues:

File Permission Errors

Python can't save because the Excel file is open. Close Excel before running the script, or save to a different filename.

Formula Not Calculating

OpenPyXL writes formulas but doesn't calculate them. Open the file in Excel to see calculated results, or use data_only=True when loading to get last saved values.

Performance with Large Files

Processing 100,000+ rows? Consider using read-only mode (read_only=True) for reading and write-only mode (write_only=True) for writing large files.

The Transformation: From Manual Worker to System Designer

You have now completed a workflow that transforms repetitive Excel tasks into automated processes. What we've learned—creating workbooks, writing data, adding formulas, applying formatting, cleaning existing files, and scheduling automation—forms the foundation of modern business process automation. In fact, professionals who master this skill often become the go-to experts for efficiency improvements in their organizations.

Summary Checklist

Setup:pip install openpyxl
Create:wb = Workbook(); wb.save('file.xlsx')
Write:ws.cell(row, column, value)
Formulas:ws['A1'] = '=SUM(B2:B10)'
Format:from openpyxl.styles import Font, Alignment
Load Existing:load_workbook('existing.xlsx')
Schedule:Use cron (Linux/Mac) or Task Scheduler (Windows)

This concludes our guide on Excel automation with Python. Start small with one repetitive task, automate it, and gradually expand your automation portfolio.

By understanding both the Excel perspective (cells, formulas, formatting) and the Python perspective (loops, functions, libraries), you've built a bridge between manual work and automation. The repetitive Excel work that used to consume your Mondays now happens in seconds, freeing you for higher-value analysis and decision-making.

Identify Your Knowledge Gaps with Intelligent Quizzes

Take personalized quizzes tailored to your domain, topic, and difficulty level. Get detailed feedback on your strengths and weaknesses. Receive a customized learning plan to improve based on your quiz performance. Join 50,000+ learners who've improved their skills with PrepAI Diagnose.

Start Your PrepAI Diagnose
#1 Data Science & GenAI Institute | IIT Certified | Pay After Placement | Ivy