A comprehensive Python-based automation system for collecting, processing, and analyzing social media data across multiple platforms. This suite integrates with various APIs to fetch social media metrics, processes the data, stores it in a PostgreSQL database (Supabase), and syncs with Notion for reporting and analysis.
This automation suite consists of three main modules that work together to create a complete social media analytics pipeline:
- Social Client - Fetches data from social media platform APIs
- Process - Transforms and uploads data to PostgreSQL/Supabase
- Notion - Syncs data with Notion databases for reporting
- LinkedIn (Profile & Posts)
- Instagram (Profile & Posts)
- Twitter/X (Profile & Posts)
- Threads (Profile & Posts)
- Substack (Profile & Posts)
```
┌─────────────────┐      ┌─────────────────┐      ┌─────────────────┐
│   Social APIs   │─────▶│ Data Processing │─────▶│    Supabase     │
│   (RapidAPI)    │      │   & Transform   │      │   PostgreSQL    │
└─────────────────┘      └─────────────────┘      └─────────────────┘
                                                          │
                                                          ▼
                                                  ┌─────────────────┐
                                                  │     Notion      │
                                                  │    Databases    │
                                                  └─────────────────┘
```
1. **Data Collection**
   - Raw JSON data fetched from social media APIs via RapidAPI
   - Daily collection frequency with smart caching
   - Raw responses stored in the `results/raw/` directory
   - Skip existing data to avoid duplicate API calls
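The skip-existing behavior can be sketched as a thin wrapper around any fetch call. This is a minimal illustration, not the suite's actual API; the function name, filename pattern, and directory default are assumptions:

```python
import json
from datetime import date
from pathlib import Path

def fetch_if_missing(platform: str, fetch_fn, raw_dir: str = "results/raw"):
    """Fetch today's data for a platform unless a raw file already exists.

    `fetch_fn` is any callable returning the parsed API response
    (hypothetical stand-in for the real RapidAPI client call).
    """
    out_dir = Path(raw_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Timestamp-based file naming: one raw file per platform per day
    out_path = out_dir / f"{platform}_{date.today():%Y%m%d}.json"
    if out_path.exists():  # smart caching: skip duplicate API calls
        return json.loads(out_path.read_text())
    data = fetch_fn()
    out_path.write_text(json.dumps(data))
    return data
```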
2. **Data Processing**
   - JSON data transformed using configurable field mappings
   - Automatic type conversion and validation
   - Required fields validated before processing
   - Error handling for missing or invalid data
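Conceptually, a configurable field mapping resolves dotted paths into the raw JSON and rejects records that are missing required fields. A minimal sketch, assuming `mapping` maps output columns to dotted paths (the helper names and mapping shape are illustrative, not the suite's exact logic):

```python
def get_path(obj, path):
    """Follow a dotted path ('data.user.followers') through nested dicts."""
    for key in path.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj

def apply_mapping(raw, mapping, required=()):
    """Build a flat record from a raw API response using a field mapping."""
    record = {col: get_path(raw, path) for col, path in mapping.items()}
    # Required-field validation before any further processing
    missing = [f for f in required if record.get(f) is None]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return record
```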
3. **Database Storage**
   - Processed data uploaded to PostgreSQL (Supabase)
   - Automatic table creation based on DataFrame structure
   - Batch processing for large datasets
   - Upsert logic (update existing rows, insert new ones)
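In PostgreSQL, upsert logic is commonly expressed with `INSERT ... ON CONFLICT ... DO UPDATE`. The following sketch composes such a statement from a column list; it illustrates the idea and is not a copy of the suite's uploader code:

```python
def build_upsert(table, columns, key="date"):
    """Compose a PostgreSQL upsert: insert new rows, update existing ones.

    Uses psycopg2-style named placeholders (%(col)s); the caller would
    execute the statement with a dict of values per row.
    """
    cols = ", ".join(columns)
    placeholders = ", ".join(f"%({c})s" for c in columns)
    # On a primary-key collision, overwrite every non-key column
    updates = ", ".join(f"{c} = EXCLUDED.{c}" for c in columns if c != key)
    return (
        f"INSERT INTO {table} ({cols}) VALUES ({placeholders}) "
        f"ON CONFLICT ({key}) DO UPDATE SET {updates};"
    )
```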
4. **Data Aggregation**
   - Profile data consolidated across platforms
   - Posts data aggregated by content type (video vs non-video)
   - One record per day with all platform data
   - Optimized for time-series analysis
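The "one record per day" consolidation is essentially a pivot from per-platform rows to per-date columns. A minimal sketch (pure Python; the real aggregator works against the database, so this only illustrates the transformation):

```python
from collections import defaultdict

def consolidate_profiles(rows):
    """Pivot per-platform follower rows into one record per day.

    `rows` are dicts like {'date': ..., 'platform': ..., 'num_followers': ...};
    output columns follow the consolidated `profile` table naming.
    """
    by_date = defaultdict(dict)
    for row in rows:
        by_date[row["date"]][f"num_followers_{row['platform']}"] = row["num_followers"]
    return [{"date": d, **cols} for d, cols in sorted(by_date.items())]
```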
5. **Reporting & Sync**
   - Data synced to Notion databases for visualization
   - Bidirectional sync capabilities
   - Change tracking and comprehensive logging
   - Support for all Notion property types
- Frequency: Daily automated collection
- Platforms: LinkedIn, Instagram, Twitter/X, Threads, Substack
- Data Types: Profile information and recent posts performance
- Storage: Raw JSON files with metadata
- Validation: Field mapping and type conversion
- Quality: Comprehensive error handling and logging
```
automation/
├── reporting/
│   ├── social_client/          # API data collection
│   │   ├── social_api_client.py
│   │   └── README.md
│   ├── process/                # Data processing & database operations
│   │   ├── data_processor.py
│   │   ├── supabase_uploader.py
│   │   ├── profile_aggregator.py
│   │   ├── posts_consolidator.py
│   │   └── README.md
│   ├── notion/                 # Notion integration
│   │   ├── notion_update.py
│   │   ├── notion_supabase_sync.py
│   │   ├── notion_database_structure.py
│   │   └── README.md
│   ├── config/                 # Configuration files
│   │   ├── config.json
│   │   ├── mapping.json
│   │   ├── logger_config.py
│   │   └── README.md
│   └── results/                # Output directories
│       ├── raw/                # Raw API responses
│       └── processed/          # Processed data files
└── README.md
```
- Python 3.7+
- PostgreSQL database (local or Supabase cloud)
- API keys for social media platforms (via RapidAPI)
- Notion API token (for Notion integration)
1. **Clone the repository**

   ```bash
   git clone <repository-url>
   cd automation/reporting
   ```

2. **Install dependencies**

   ```bash
   pip install -r requirements.txt
   ```

3. **Configure environment**

   ```bash
   # Copy example configurations
   cp config/config_example.json config/config.json
   cp process/.env_example process/.env
   # Edit config files with your credentials
   ```

4. **Set up database**
   - Create a Supabase project or set up local PostgreSQL
   - Update database credentials in the `.env` file
1. **Collect social media data**

   ```bash
   cd social_client
   python social_api_client.py
   ```

2. **Process and upload data**

   ```bash
   cd ../process
   python data_processor.py
   ```

3. **Aggregate profiles and posts**

   ```bash
   python profile_aggregator.py
   python posts_consolidator.py
   ```

4. **Sync with Notion (optional)**

   ```bash
   cd ../notion
   python notion_update.py YYYYMMDD
   ```
Fetches data from social media APIs:
- Automatic timestamp-based file naming
- Skip existing data to avoid duplicate API calls
- Debug mode for troubleshooting
- Progress tracking and comprehensive logging
Key Features:
- Multi-platform support
- JSON output with metadata
- Smart caching
- Debug mode
Handles data transformation and database operations:
- Field mapping based on configuration
- Automatic type conversion
- Database table creation
- Batch processing for large datasets
Key Components:
- `data_processor.py` - Main processing engine
- `supabase_uploader.py` - Database interface
- `profile_aggregator.py` - Consolidates follower counts
- `posts_consolidator.py` - Merges posts data
Integrates with Notion for reporting:
- Bidirectional sync with Supabase
- Automatic schema detection
- Change tracking and logging
- Support for all Notion property types
Key Tools:
- `notion_update.py` - Updates Notion with Supabase data
- `notion_supabase_sync.py` - Continuous database sync
- `notion_database_structure.py` - Schema analysis
Central configuration management:
- API credentials and endpoints
- Field mapping rules
- Database settings
- Logging configuration
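To make the field-mapping idea concrete, a `mapping.json` entry might look like the fragment below. This is a hypothetical shape for illustration only; the actual keys and paths are defined in the suite's own `mapping.json`:

```json
{
  "linkedin_profile": {
    "required_fields": ["num_followers"],
    "fields": {
      "num_followers": "data.follower_count",
      "platform": "linkedin"
    }
  }
}
```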
```bash
# 1. Fetch latest social media data
cd reporting/social_client
python social_api_client.py

# 2. Process and upload to database
cd ../process
python data_processor.py

# 3. Aggregate data
python profile_aggregator.py
python posts_consolidator.py

# 4. Update Notion (if needed)
cd ../notion
python notion_update.py $(date +%Y%m%d)
```

Run continuous sync in the background:

```bash
cd reporting/notion
python notion_supabase_sync.py
```

The system uses PostgreSQL (via Supabase) with a normalized schema that separates raw data collection from aggregated analytics. All tables use `date` as the primary key for efficient time-series queries.
The system creates individual tables for each platform and data type to store raw API responses:
- `linkedin_profile`: LinkedIn follower counts and profile data
- `instagram_profile`: Instagram follower counts and profile data
- `twitter_profile`: Twitter/X follower counts and profile data
- `threads_profile`: Threads follower counts and profile data
- `substack_profile`: Substack subscriber counts and profile data
Common Profile Fields:
- `date` (date, PRIMARY KEY): Date of data collection
- `platform` (text): Platform identifier
- `data_type` (text): Data type identifier ('profile')
- `num_followers` (integer): Number of followers/subscribers
- `linkedin_posts`: LinkedIn post performance metrics
- `instagram_posts`: Instagram post performance metrics
- `twitter_posts`: Twitter/X post performance metrics
- `threads_posts`: Threads post performance metrics
- `substack_posts`: Substack post performance metrics
Common Posts Fields:
- `date` (date, PRIMARY KEY): Date of data collection
- `platform` (text): Platform identifier
- `data_type` (text): Data type identifier ('posts')
- `post_id` (text): Unique post identifier
- `posted_at` (date): Date when post was published
- `is_video` (integer): Boolean flag (1 for video, 0 for non-video)
- `num_likes` (integer): Number of likes/reactions
- `num_comments` (integer): Number of comments
- `num_reshares` (integer): Number of reshares/reposts
`profile` - Consolidated daily follower counts across all platforms:
- `date` (date, PRIMARY KEY): Date of data collection
- `num_followers_linkedin` (integer): LinkedIn follower count
- `num_followers_instagram` (integer): Instagram follower count
- `num_followers_twitter` (integer): Twitter follower count
- `num_followers_substack` (integer): Substack subscriber count
- `num_followers_threads` (integer): Threads follower count
`posts` - Daily post performance metrics separated by content type:
- `date` (date, PRIMARY KEY): Date of data collection
Non-Video Posts (by platform):
- `post_id_*_no_video`: Post ID for latest non-video content
- `posted_at_*_no_video`: Publication date
- `num_likes_*_no_video`: Engagement metrics
- `num_comments_*_no_video`: Comment counts
- `num_reshares_*_no_video`: Share counts
Video Posts (by platform):
- `post_id_*_video`: Post ID for latest video content
- `posted_at_*_video`: Publication date
- `num_likes_*_video`: Engagement metrics
- `num_comments_*_video`: Comment counts
- `num_reshares_*_video`: Share counts
(`*` = linkedin, instagram, twitter, substack, threads)
The system implements a sophisticated two-stage data pipeline designed for scalability and analysis:
Social Media Data:
- Platform-specific tables store raw API responses
- Automatic table creation based on data structure
- Preserves original data integrity before transformation
Notion Integration:
- Dynamic schema detection from Notion databases
- Bidirectional sync with change tracking
- Complex data types stored as JSONB for flexibility
- SQL aggregation scripts process raw data into analysis-ready tables
- Platform-specific data merged into unified views
- Optimized for time-series analysis and cross-platform comparisons
All Notion-synced tables share standardized columns:
| Column | Data Type | Description |
|---|---|---|
| `notion_id` | text | Notion page ID (UUID), primary key |
| `created_time` | timestamp with time zone | When the page was created in Notion |
| `last_edited_time` | timestamp with time zone | When the page was last edited |
| `archived` | boolean | Whether the page is archived |
| `notion_data_jsonb` | jsonb | Complex data types and unmapped properties |
Tables are automatically created with columns derived from Notion properties:
- Property names → normalized column names (lowercase, underscores)
- Data types automatically mapped from Notion to PostgreSQL
- Complex types (relations, arrays) stored in JSONB column
| Notion Property Type | PostgreSQL Data Type |
|---|---|
| Title, Rich Text, URL, Email, Phone | text |
| Number | bigint or double precision |
| Select, Status | text |
| Date | timestamp with time zone |
| Checkbox | boolean |
| Formula (various) | Mapped to appropriate types |
| Multi-Select, Relation, People, Files | jsonb |
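The name normalization and type mapping described above can be sketched as follows. This is a simplified illustration of the rules in the table, not the suite's actual implementation (for instance, it maps every Number to `double precision` rather than choosing `bigint` where appropriate):

```python
# Simplified Notion-to-PostgreSQL type map, following the table above
NOTION_TO_PG = {
    "title": "text", "rich_text": "text", "url": "text",
    "email": "text", "phone_number": "text",
    "select": "text", "status": "text",
    "number": "double precision",
    "date": "timestamp with time zone",
    "checkbox": "boolean",
    "multi_select": "jsonb", "relation": "jsonb",
    "people": "jsonb", "files": "jsonb",
}

def normalize_column(prop_name: str) -> str:
    """Normalize a Notion property name into a PostgreSQL column name
    (lowercase, underscores)."""
    safe = "".join(ch if ch.isalnum() else "_" for ch in prop_name.strip())
    while "__" in safe:
        safe = safe.replace("__", "_")
    return safe.strip("_").lower()
```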
The system syncs data from 15+ Notion databases for comprehensive content management:
Content & Publishing:
- `notion_posts` - Social media posts and content
- `notion_articles` - Blog articles and written content
- `notion_newsletter` - Newsletter content and campaigns
Media & Assets:
- `notion_clips` - Video/audio clips and media assets
- `notion_illustrations` - Images and visual content
- `notion_visual_types` - Media categorization
Business & Analytics:
- `notion_companies` - Company profiles and relationships
- `notion_connections` - Network and relationship data
- `notion_interactions` - User engagement and interactions
Content Strategy:
- `notion_editorial` - Editorial calendar and planning
- `notion_concepts` - Content ideas and brainstorming
- `notion_books` - Book recommendations and reviews
- `notion_books_recommendations` - Reading lists and suggestions
Additional Databases:
- `notion_episodes` - Podcast episodes and series
- `notion_comments` - User comments and feedback
- `notion_wins_and_features` - Success metrics and feature tracking
Social Media Data:
- Raw platform tables feed into consolidated tables
- Foreign key relationships based on the `date` field
- No traditional foreign keys between Notion tables
Notion Data:
- Relationships stored as Notion page ID arrays in JSONB
- Application-layer joins required for complex queries
- Preserves Notion's flexible relationship model
Data Integrity:
- Primary keys ensure uniqueness
- Timestamp tracking for change detection
- Archive status management for data lifecycle
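An application-layer join over the JSONB relationship arrays can be sketched like this. The record shape follows the standard columns above; the relation key name (`"companies"`) and helper are illustrative assumptions:

```python
def resolve_relations(record, related_index, relation_key):
    """Application-layer join: expand a JSONB array of Notion page IDs.

    `record['notion_data_jsonb'][relation_key]` holds related page IDs;
    `related_index` maps notion_id -> row loaded from another synced table.
    IDs with no match (e.g. archived or unsynced pages) are skipped.
    """
    ids = record.get("notion_data_jsonb", {}).get(relation_key, [])
    return [related_index[i] for i in ids if i in related_index]
```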
Monthly follower growth by platform:

```sql
SELECT
    DATE_TRUNC('month', date) as month,
    platform,
    AVG(num_followers) as avg_followers,
    MAX(num_followers) - MIN(num_followers) as growth
FROM (
    SELECT date, 'linkedin' as platform, num_followers_linkedin as num_followers FROM profile
    UNION ALL
    SELECT date, 'instagram' as platform, num_followers_instagram as num_followers FROM profile
    UNION ALL
    SELECT date, 'twitter' as platform, num_followers_twitter as num_followers FROM profile
) combined
GROUP BY month, platform
ORDER BY month, platform;
```

Average engagement by content type and platform:
```sql
SELECT
    'linkedin' as platform,
    'video' as content_type,
    AVG(num_likes_linkedin_video) as avg_likes,
    AVG(num_comments_linkedin_video) as avg_comments,
    AVG(num_reshares_linkedin_video) as avg_reshares
FROM posts
WHERE num_likes_linkedin_video IS NOT NULL;
```

Features for ML model training:
```sql
SELECT
    p.date,
    -- Historical performance (7-day average)
    AVG(ps.num_likes_linkedin_no_video) OVER (
        ORDER BY p.date ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING
    ) as avg_likes_7d,
    -- Growth momentum
    p.num_followers_linkedin - LAG(p.num_followers_linkedin, 7) OVER (ORDER BY p.date) as follower_growth_7d,
    -- Engagement rate (cast to numeric to avoid integer division)
    CASE
        WHEN p.num_followers_linkedin > 0
        THEN (ps.num_likes_linkedin_no_video + ps.num_comments_linkedin_no_video)::numeric / p.num_followers_linkedin
        ELSE NULL
    END as engagement_rate
FROM profile p
LEFT JOIN posts ps ON p.date = ps.date
WHERE ps.num_likes_linkedin_no_video IS NOT NULL;
```

Total audience reach across platforms:
```sql
SELECT
    p.date,
    num_followers_linkedin + num_followers_instagram + num_followers_twitter +
    num_followers_substack + num_followers_threads as total_followers,
    -- Engagement rates (cast to numeric to avoid integer division)
    CASE WHEN num_followers_linkedin > 0
         THEN (num_likes_linkedin_no_video + num_comments_linkedin_no_video)::numeric / num_followers_linkedin
         ELSE 0 END as linkedin_engagement_rate,
    CASE WHEN num_followers_instagram > 0
         THEN (num_likes_instagram_no_video + num_comments_instagram_no_video)::numeric / num_followers_instagram
         ELSE 0 END as instagram_engagement_rate
FROM profile p
LEFT JOIN posts ps ON p.date = ps.date
ORDER BY p.date DESC;
```

Most scripts support debug mode for detailed logging:
```bash
python script_name.py --debug
```

Switch between local and cloud databases:

```bash
python supabase_uploader.py --environment local
```

Override default configuration files:

```bash
python notion_supabase_sync.py --config custom_config.json
```
1. **API Rate Limits**
   - Use the `--skip-existing` flag to avoid re-fetching data
   - Implement delays between API calls

2. **Database Connection Errors**
   - Verify credentials in the `.env` file
   - Check network connectivity
   - Ensure database is accessible

3. **Missing Data Fields**
   - Review `mapping.json` for correct field paths
   - Enable debug mode to see raw API responses
   - Check if API response structure changed

4. **Notion Sync Issues**
   - Verify Notion API token is valid
   - Check database IDs in configuration
   - Review Notion API rate limits
```bash
# Test database connection
cd reporting/process
python supabase_test_connect.py

# Analyze Notion database structure
cd ../notion
python notion_database_structure.py --debug

# Process single platform
cd ../social_client
python social_api_client.py --platform linkedin_profile --debug
```

- Batch Processing: Data is processed in batches to handle large datasets
- Incremental Sync: Only new/modified data is synced to avoid redundant operations
- Connection Pooling: Database connections are pooled for efficiency
- Smart Caching: API responses are cached daily to minimize API calls
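Batch processing, as listed above, amounts to splitting a large row set into fixed-size chunks before each upload. A minimal sketch (the batch size of 500 is an illustrative default, not the suite's setting):

```python
def batches(rows, size=500):
    """Yield fixed-size chunks so large datasets upload in manageable batches."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]
```

Each chunk would then be passed to a single multi-row insert or upsert, keeping memory usage and statement size bounded.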
1. **Never commit sensitive data**
   - Keep `config.json` out of version control
   - Use `.env` files for database credentials
   - Rotate API keys regularly

2. **Use environment variables in production**

   ```bash
   export SUPABASE_URL="your-url"
   export SUPABASE_KEY="your-key"
   ```

3. **Implement access controls**
   - Use read-only database users where possible
   - Limit API key permissions
   - Enable Supabase Row Level Security (RLS)
1. **Update configuration**
   - Add platform config to `config.json`
   - Define field mappings in `mapping.json`

2. **Test data collection**

   ```bash
   python social_api_client.py --platform new_platform --debug
   ```

3. **Verify processing**

   ```bash
   python data_processor.py --debug
   ```
- Create custom processors in the `process` module
- Add new Notion property type handlers
- Implement additional aggregation queries
This project is free software for personal use, from Roberto Ferraro.
https://www.linkedin.com/in/ferraroroberto/
Built with ❤️ for automated social media analytics and reporting