TidyDataCLI is a command-line tool for cleaning, transforming, and visualizing Excel and CSV data. It runs on Linux, macOS, and Windows, and can also be run through Docker without a local Python installation.
Why use TidyDataCLI?
TidyDataCLI simplifies common data tasks with the following features:
- Remove Duplicates: Efficiently remove duplicate entries from your dataset.
- Regex Cleaning: Sanitize data using customizable regular expressions.
- Column Name Cleaning: Standardize column names by stripping spaces and converting to lowercase.
- Trim Spaces: Remove leading and trailing spaces from string columns.
- Age Validation: Validate and clean 'age' columns to ensure data integrity.
- Change Case: Convert text columns to lowercase, uppercase, title case, or capitalized form.
- Date Standardization: Standardize date formats across specified columns.
- Sorting: Sort data by one or more columns with ascending or descending options.
- Filtering: Apply conditions to filter rows based on specified criteria.
- Custom Transformations: Apply user-defined lambda functions for complex transformations.
- Column Addition: Add values to existing columns and perform arithmetic operations.
- Aggregation: Aggregate data by summing, averaging, or counting grouped values.
- Bar Charts: Generate bar charts with customizable x and y axes.
- Pie Charts: Create pie charts with labels and values for visualization.
- Word Clouds: Visualize text data using word clouds.
- Line Charts: Plot line charts for trend analysis.
- Box-and-Whisker Plots: Create box plots to analyze data distributions.
- Gantt Charts: Visualize project timelines with Gantt charts.
- Heat Maps: Generate heat maps to represent data density.
- Histograms: Plot histograms with adjustable bin sizes.
- Tree Maps: Visualize hierarchical data using tree maps.
- Cross-Platform: Runs on Linux, macOS, and Windows, and in Docker environments.
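To make the aggregation feature in the list above concrete, here is a minimal sketch of grouping rows and then summing, averaging, or counting the grouped values. It uses only the Python standard library and is not TidyDataCLI's implementation; the function name `aggregate`, its parameters, and the sample column names are assumptions made for illustration.

```python
from collections import defaultdict


def aggregate(rows, group_by, value_column, how="sum"):
    """Group rows by one column and aggregate another (hypothetical helper)."""
    groups = defaultdict(list)
    for row in rows:
        # CSV values arrive as text, so convert them before doing arithmetic.
        groups[row[group_by]].append(float(row[value_column]))

    if how == "sum":
        return {key: sum(values) for key, values in groups.items()}
    if how == "mean":
        return {key: sum(values) / len(values) for key, values in groups.items()}
    if how == "count":
        return {key: len(values) for key, values in groups.items()}
    raise ValueError(f"unsupported aggregation: {how}")


# Illustrative data only; the column names are made up.
sample_rows = [
    {"region": "east", "sales": "10"},
    {"region": "east", "sales": "30"},
    {"region": "west", "sales": "5"},
]
totals = aggregate(sample_rows, "region", "sales", how="sum")
```

With the sample rows above, `totals` maps each region to the sum of its sales values.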
To install from PyPI:
pip install TidyDataCLI
To install from source:
git clone https://github.com/Siam3h/tidydatacli.git
cd tidydatacli
pip install .
For a containerized approach:
docker pull tidydatacli
docker run -v "$(pwd)":/data tidydatacli tidydata <command> --input /data/input.csv --output /data/output.csv
Once installed, TidyDataCLI can be invoked using the following syntax:
tidydata <command> [options]
tidydata clean --input data.csv --output cleaned_data.csv --remove_duplicates --clean_columns
tidydata transform --input data.csv --output transformed.csv --sort column1 --filter "age > 30"
tidydata visualize accounts.csv --type bar --x 'BILL TO' --y 'INVOICE NUMBER' --output bill_invoice_number.png
tidydata report accounts.xlsx report.pdf --format pdf
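If you want to call the CLI from a Python script, one possible approach is to build the argument list and hand it to `subprocess`. The sketch below uses only the `clean` flags shown in the examples above; the helper names `build_clean_command` and `run_clean` are hypothetical, and it assumes `tidydata` is installed and on your `PATH`.

```python
import shlex
import subprocess


def build_clean_command(input_path, output_path,
                        remove_duplicates=True, clean_columns=True):
    """Build the argument list for `tidydata clean` (hypothetical helper)."""
    cmd = ["tidydata", "clean", "--input", input_path, "--output", output_path]
    if remove_duplicates:
        cmd.append("--remove_duplicates")
    if clean_columns:
        cmd.append("--clean_columns")
    return cmd


def run_clean(input_path, output_path):
    """Run the command; assumes TidyDataCLI is installed and on PATH."""
    cmd = build_clean_command(input_path, output_path)
    # check=True makes subprocess raise CalledProcessError on a non-zero exit status.
    return subprocess.run(cmd, check=True, capture_output=True, text=True)


# Render the command as a shell string without running it.
command_text = shlex.join(build_clean_command("data.csv", "cleaned_data.csv"))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.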
clean
Clean your dataset by removing duplicates, trimming spaces, or performing regex-based cleaning.
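As a rough illustration of what these cleaning steps mean, the sketch below standardizes column names, trims spaces, and drops duplicate rows on a list of dictionaries. It is a standard-library approximation, not the tool's own code; the function names and the order of the steps (trim first, then deduplicate) are assumptions.

```python
def clean_column_name(name):
    """Strip surrounding spaces and convert the column name to lowercase."""
    return name.strip().lower()


def clean_rows(rows):
    """Normalize column names, trim string values, and drop duplicate rows."""
    cleaned = []
    seen = set()
    for row in rows:
        normalized = {
            clean_column_name(key): value.strip() if isinstance(value, str) else value
            for key, value in row.items()
        }
        # A hashable fingerprint of the row lets us detect exact duplicates.
        fingerprint = tuple(sorted(normalized.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(normalized)
    return cleaned


# Illustrative data only: the first two rows differ only by stray spaces.
raw_rows = [
    {" Name ": " Ann ", "Age": "31"},
    {" Name ": "Ann", "Age": "31"},
    {" Name ": "Bob", "Age": "25"},
]
cleaned_rows = clean_rows(raw_rows)
```

Because trimming happens before the duplicate check in this sketch, rows that differ only by leading or trailing spaces are treated as duplicates.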
transform
Apply transformations such as sorting, filtering, adding columns, and custom lambda functions.
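To show what sorting and filtering amount to, here is a small standard-library sketch that filters rows with a condition such as `age > 30` and sorts the result by a column. It is not how TidyDataCLI parses `--filter`; the `column operator value` format with numeric values, and all function names, are assumptions for illustration.

```python
import operator

# Comparison operators supported by this sketch.
OPERATORS = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}


def parse_condition(condition):
    """Split 'column op value' into its parts; the value must be numeric."""
    column, op, value = condition.split()
    return column, OPERATORS[op], float(value)


def filter_rows(rows, condition):
    """Keep only the rows that satisfy the condition."""
    column, compare, threshold = parse_condition(condition)
    return [row for row in rows if compare(float(row[column]), threshold)]


def sort_rows(rows, column, descending=False):
    """Return the rows sorted by one column."""
    return sorted(rows, key=lambda row: row[column], reverse=descending)


# Illustrative data only.
people = [
    {"name": "Cy", "age": "45"},
    {"name": "Ann", "age": "31"},
    {"name": "Bob", "age": "25"},
]
result = sort_rows(filter_rows(people, "age > 30"), "name")
```

Here `result` contains the rows for Ann and Cy, in that order.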
visualize
Create visual representations of your data, such as bar charts, pie charts, and word clouds.
report
Generate reports in text or PDF format with customizable summaries or detailed outputs.
To avoid dependency management, you can use Docker:
docker run -v "$(pwd)":/data tidydatacli tidydata clean --input /data/input.csv --output /data/output.csv
TidyDataCLI displays an error message for common problems such as a missing input file, invalid column names, or missing options.
Example error:
Error: Input file 'non_existent_file.csv' not found.
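A script that wraps the CLI might want to catch the missing-file case before invoking it. The sketch below is one way to do that; the helper name `check_input_file` is hypothetical, and the message text simply mirrors the example above rather than anything read from the tool.

```python
import os


def check_input_file(path):
    """Return an error message if the input file is missing, otherwise None."""
    if not os.path.isfile(path):
        # Same wording as the example error shown above.
        return f"Error: Input file '{path}' not found."
    return None


message = check_input_file("non_existent_file.csv")
```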
We welcome contributions!
Found a bug or have a suggestion? Please open an issue on GitHub.
TidyDataCLI is licensed under the MIT License. See the LICENSE file for more details.
For any questions or issues, please contact Siama at siamaphilbert@outlook.com.