AtliQ Hotels Data Analysis Project

Data Load¶

Datasets¶

We have 5 csv files with records over 1 lac

  • dim_date.csv
  • dim_hotels.csv
  • dim_rooms.csv
  • fact_aggregated_bookings.csv
  • fact_bookings.csv

Quick Exploration¶

booking_id property_id booking_date check_in_date checkout_date no_guests room_category booking_platform ratings_given booking_status revenue_generated revenue_realized
0 May012216558RT11 16558 27-04-22 1/5/2022 2/5/2022 -3.0 RT1 direct online 1.0 Checked Out 10010 10010
1 May012216558RT12 16558 30-04-22 1/5/2022 2/5/2022 2.0 RT1 others NaN Cancelled 9100 3640
2 May012216558RT13 16558 28-04-22 1/5/2022 4/5/2022 2.0 RT1 logtrip 5.0 Checked Out 9100000 9100
3 May012216558RT14 16558 28-04-22 1/5/2022 2/5/2022 -2.0 RT1 others NaN Cancelled 9100 3640
4 May012216558RT15 16558 27-04-22 1/5/2022 2/5/2022 4.0 RT1 direct online 5.0 Checked Out 10920 10920
date mmm yy week no day_type
0 01-May-22 May 22 W 19 weekend
1 02-May-22 May 22 W 19 weekeday
2 03-May-22 May 22 W 19 weekeday
3 04-May-22 May 22 W 19 weekeday
4 05-May-22 May 22 W 19 weekeday
property_id check_in_date room_category successful_bookings capacity
0 16559 1-May-22 RT1 25 30.0
1 19562 1-May-22 RT1 28 30.0
2 19563 1-May-22 RT1 23 30.0
3 17558 1-May-22 RT1 30 19.0
4 16558 1-May-22 RT1 18 19.0
property_id property_name category city
0 16558 Atliq Grands Luxury Delhi
1 16559 Atliq Exotica Luxury Mumbai
2 16560 Atliq City Business Delhi
3 16561 Atliq Blu Luxury Delhi
4 16562 Atliq Bay Luxury Delhi
room_id room_class
0 RT1 Standard
1 RT2 Elite
2 RT3 Premium
3 RT4 Presidential

Preparing Data¶

We can clearly see that the dates are inconsistent across the dataset.

booking_id property_id booking_date check_in_date checkout_date no_guests room_category booking_platform ratings_given booking_status revenue_generated revenue_realized
0 May012216558RT11 16558 2022-04-27 00:00:00 2022-05-01 00:00:00 2022-05-02 00:00:00 -3.0 RT1 direct online 1.0 Checked Out 10010 10010
1 May012216558RT12 16558 2022-04-30 00:00:00 2022-05-01 00:00:00 2022-05-02 00:00:00 2.0 RT1 others NaN Cancelled 9100 3640
date mmm yy week no day_type
0 2022-05-01 May 22 W 19 weekend
1 2022-05-02 May 22 W 19 weekeday
property_id check_in_date room_category successful_bookings capacity
0 16559 2022-05-01 RT1 25 30.0
1 19562 2022-05-01 RT1 28 30.0
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 92 entries, 0 to 91
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype         
---  ------    --------------  -----         
 0   date      92 non-null     datetime64[ns]
 1   mmm yy    92 non-null     object        
 2   week no   92 non-null     object        
 3   day_type  92 non-null     object        
dtypes: datetime64[ns](1), object(3)
memory usage: 3.0+ KB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9200 entries, 0 to 9199
Data columns (total 5 columns):
 #   Column               Non-Null Count  Dtype         
---  ------               --------------  -----         
 0   property_id          9200 non-null   int64         
 1   check_in_date        9200 non-null   datetime64[ns]
 2   room_category        9200 non-null   object        
 3   successful_bookings  9200 non-null   int64         
 4   capacity             9198 non-null   float64       
dtypes: datetime64[ns](1), float64(1), int64(2), object(1)
memory usage: 359.5+ KB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 134590 entries, 0 to 134589
Data columns (total 12 columns):
 #   Column             Non-Null Count   Dtype  
---  ------             --------------   -----  
 0   booking_id         134590 non-null  object 
 1   property_id        134590 non-null  int64  
 2   booking_date       134590 non-null  object 
 3   check_in_date      134590 non-null  object 
 4   checkout_date      134590 non-null  object 
 5   no_guests          134587 non-null  float64
 6   room_category      134590 non-null  object 
 7   booking_platform   134590 non-null  object 
 8   ratings_given      56683 non-null   float64
 9   booking_status     134590 non-null  object 
 10  revenue_generated  134590 non-null  int64  
 11  revenue_realized   134590 non-null  int64  
dtypes: float64(2), int64(3), object(7)
memory usage: 12.3+ MB

df_bookings date columns are still object dtypes

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 134590 entries, 0 to 134589
Data columns (total 12 columns):
 #   Column             Non-Null Count   Dtype         
---  ------             --------------   -----         
 0   booking_id         134590 non-null  object        
 1   property_id        134590 non-null  int64         
 2   booking_date       134590 non-null  datetime64[ns]
 3   check_in_date      134590 non-null  datetime64[ns]
 4   checkout_date      134590 non-null  datetime64[ns]
 5   no_guests          134587 non-null  float64       
 6   room_category      134590 non-null  object        
 7   booking_platform   134590 non-null  object        
 8   ratings_given      56683 non-null   float64       
 9   booking_status     134590 non-null  object        
 10  revenue_generated  134590 non-null  int64         
 11  revenue_realized   134590 non-null  int64         
dtypes: datetime64[ns](3), float64(2), int64(3), object(4)
memory usage: 12.3+ MB

The datatypes are successfully converted

Helper functions for Visualization¶

Exploring bookings data

(134590, 12)
array(['RT1', 'RT2', 'RT3', 'RT4'], dtype=object)
array(['direct online', 'others', 'logtrip', 'tripster', 'makeyourtrip',
       'journey', 'direct offline'], dtype=object)
booking_platform
others            55066
makeyourtrip      26898
logtrip           14756
direct online     13379
tripster           9630
journey            8106
direct offline     6755
Name: count, dtype: int64

Number of bookings done through booking platforms¶

No description has been provided for this image

Overall Aggregation of Bookings data

property_id booking_date check_in_date checkout_date no_guests ratings_given revenue_generated revenue_realized
count 134590.000000 134590 134590 134590 134587.000000 56683.000000 1.345900e+05 134590.000000
mean 18061.113493 2022-06-11 05:51:49.871461120 2022-06-14 22:52:29.939816960 2022-06-17 07:51:51.262352640 2.036170 3.619004 1.537805e+04 12696.123256
min 16558.000000 2022-04-07 00:00:00 2022-05-01 00:00:00 2022-05-02 00:00:00 -17.000000 1.000000 6.500000e+03 2600.000000
25% 17558.000000 2022-05-19 00:00:00 2022-05-23 00:00:00 2022-05-25 00:00:00 1.000000 3.000000 9.900000e+03 7600.000000
50% 17564.000000 2022-06-11 00:00:00 2022-06-15 00:00:00 2022-06-17 00:00:00 2.000000 4.000000 1.350000e+04 11700.000000
75% 18563.000000 2022-07-04 00:00:00 2022-07-08 00:00:00 2022-07-10 00:00:00 2.000000 5.000000 1.800000e+04 15300.000000
max 19563.000000 2022-07-31 00:00:00 2022-07-31 00:00:00 2022-08-06 00:00:00 6.000000 5.000000 2.856000e+07 45220.000000
std 1093.055847 NaN NaN NaN 1.034885 1.235009 9.303604e+04 6928.108124

Min and Max Revenue generated

Minimum Revenue: 6500
Maximum Revenue: 28560000

Exploring rest of the files

(25, 4)
property_id property_name category city
0 16558 Atliq Grands Luxury Delhi
1 16559 Atliq Exotica Luxury Mumbai
2 16560 Atliq City Business Delhi
3 16561 Atliq Blu Luxury Delhi
4 16562 Atliq Bay Luxury Delhi
category
Luxury      16
Business     9
Name: count, dtype: int64
city
Delhi        5
Hyderabad    6
Bangalore    6
Mumbai       8
Name: count, dtype: int64

Number of Hotels per City¶

No description has been provided for this image
property_id check_in_date room_category successful_bookings capacity
0 16559 2022-05-01 RT1 25 30.0
1 19562 2022-05-01 RT1 28 30.0
2 19563 2022-05-01 RT1 23 30.0
3 17558 2022-05-01 RT1 30 19.0
4 16558 2022-05-01 RT1 18 19.0
array([16559, 19562, 19563, 17558, 16558, 17560, 19558, 19560, 17561,
       16560, 16561, 16562, 16563, 17559, 17562, 17563, 18558, 18559,
       18561, 18562, 18563, 19559, 19561, 17564, 18560])

Total Successful Bookings done by each Property

property_id
16558    3153
16559    7338
16560    4693
16561    4418
16562    4820
16563    7211
17558    5053
17559    6142
17560    6013
17561    5183
17562    3424
17563    6337
17564    3982
18558    4475
18559    5256
18560    6638
18561    6458
18562    7333
18563    4737
19558    4400
19559    4729
19560    6079
19561    5736
19562    5812
19563    5413
Name: successful_bookings, dtype: int64

Properties where Successful Bookings crossed the Capacity

property_id check_in_date room_category successful_bookings capacity
3 17558 2022-05-01 RT1 30 19.0
12 16563 2022-05-01 RT1 100 41.0
4136 19558 2022-06-11 RT2 50 39.0
6209 19560 2022-07-02 RT1 123 26.0
8522 19559 2022-07-25 RT1 35 24.0
9194 18563 2022-07-31 RT4 20 18.0

Max Capacity

property_id check_in_date room_category successful_bookings capacity
27 17558 2022-05-01 RT2 38 50.0
128 17558 2022-05-02 RT2 27 50.0
229 17558 2022-05-03 RT2 26 50.0
328 17558 2022-05-04 RT2 27 50.0
428 17558 2022-05-05 RT2 29 50.0
... ... ... ... ... ...
8728 17558 2022-07-27 RT2 22 50.0
8828 17558 2022-07-28 RT2 21 50.0
8928 17558 2022-07-29 RT2 23 50.0
9028 17558 2022-07-30 RT2 32 50.0
9128 17558 2022-07-31 RT2 30 50.0

92 rows × 5 columns

Data Cleaning¶

date mmm yy week no day_type
0 2022-05-01 May 22 W 19 weekend
1 2022-05-02 May 22 W 19 weekeday
2 2022-05-03 May 22 W 19 weekeday
3 2022-05-04 May 22 W 19 weekeday
4 2022-05-05 May 22 W 19 weekeday

We can see that the day_type column values are inconsistent

date mmm yy week no day_type
0 2022-05-01 May 22 W 19 weekend
1 2022-05-02 May 22 W 19 weekday
2 2022-05-03 May 22 W 19 weekday
3 2022-05-04 May 22 W 19 weekday
4 2022-05-05 May 22 W 19 weekday

Now it is cleaned

Cleaning df_bookings

property_id booking_date check_in_date checkout_date no_guests ratings_given revenue_generated revenue_realized
count 134590.000000 134590 134590 134590 134587.000000 56683.000000 1.345900e+05 134590.000000
mean 18061.113493 2022-06-11 05:51:49.871461120 2022-06-14 22:52:29.939816960 2022-06-17 07:51:51.262352640 2.036170 3.619004 1.537805e+04 12696.123256
min 16558.000000 2022-04-07 00:00:00 2022-05-01 00:00:00 2022-05-02 00:00:00 -17.000000 1.000000 6.500000e+03 2600.000000
25% 17558.000000 2022-05-19 00:00:00 2022-05-23 00:00:00 2022-05-25 00:00:00 1.000000 3.000000 9.900000e+03 7600.000000
50% 17564.000000 2022-06-11 00:00:00 2022-06-15 00:00:00 2022-06-17 00:00:00 2.000000 4.000000 1.350000e+04 11700.000000
75% 18563.000000 2022-07-04 00:00:00 2022-07-08 00:00:00 2022-07-10 00:00:00 2.000000 5.000000 1.800000e+04 15300.000000
max 19563.000000 2022-07-31 00:00:00 2022-07-31 00:00:00 2022-08-06 00:00:00 6.000000 5.000000 2.856000e+07 45220.000000
std 1093.055847 NaN NaN NaN 1.034885 1.235009 9.303604e+04 6928.108124

(1) Cleaning invalid guests

booking_id property_id booking_date check_in_date checkout_date no_guests room_category booking_platform ratings_given booking_status revenue_generated revenue_realized
0 May012216558RT11 16558 2022-04-27 2022-05-01 2022-05-02 -3.0 RT1 direct online 1.0 Checked Out 10010 10010
3 May012216558RT14 16558 2022-04-28 2022-05-01 2022-05-02 -2.0 RT1 others NaN Cancelled 9100 3640
17924 May122218559RT44 18559 2022-05-12 2022-05-12 2022-05-14 -10.0 RT4 direct online NaN No Show 20900 20900
18020 May122218561RT22 18561 2022-05-08 2022-05-12 2022-05-14 -12.0 RT2 makeyourtrip NaN Cancelled 9000 3600
18119 May122218562RT311 18562 2022-05-05 2022-05-12 2022-05-17 -6.0 RT3 direct offline 5.0 Checked Out 16800 16800
18121 May122218562RT313 18562 2022-05-10 2022-05-12 2022-05-17 -4.0 RT3 direct online NaN Cancelled 14400 5760
56715 Jun082218562RT12 18562 2022-06-05 2022-06-08 2022-06-13 -17.0 RT1 others NaN Checked Out 6500 6500
119765 Jul202219560RT220 19560 2022-07-19 2022-07-20 2022-07-22 -1.0 RT2 others NaN Checked Out 13500 13500
134586 Jul312217564RT47 17564 2022-07-30 2022-07-31 2022-08-01 -4.0 RT4 logtrip 2.0 Checked Out 38760 38760

As you can see above, the number of guests having less than zero value represents data error. We can filter these records.

booking_id property_id booking_date check_in_date checkout_date no_guests room_category booking_platform ratings_given booking_status revenue_generated revenue_realized
1 May012216558RT12 16558 2022-04-30 2022-05-01 2022-05-02 2.0 RT1 others NaN Cancelled 9100 3640
2 May012216558RT13 16558 2022-04-28 2022-05-01 2022-05-04 2.0 RT1 logtrip 5.0 Checked Out 9100000 9100
4 May012216558RT15 16558 2022-04-27 2022-05-01 2022-05-02 4.0 RT1 direct online 5.0 Checked Out 10920 10920
5 May012216558RT16 16558 2022-05-01 2022-05-01 2022-05-03 2.0 RT1 others 4.0 Checked Out 9100 9100
6 May012216558RT17 16558 2022-04-28 2022-05-01 2022-05-06 2.0 RT1 others NaN Cancelled 9100 3640
... ... ... ... ... ... ... ... ... ... ... ... ...
134584 Jul312217564RT45 17564 2022-07-30 2022-07-31 2022-08-01 2.0 RT4 others 2.0 Checked Out 32300 32300
134585 Jul312217564RT46 17564 2022-07-29 2022-07-31 2022-08-03 1.0 RT4 makeyourtrip 2.0 Checked Out 32300 32300
134587 Jul312217564RT48 17564 2022-07-30 2022-07-31 2022-08-02 1.0 RT4 tripster NaN Cancelled 32300 12920
134588 Jul312217564RT49 17564 2022-07-29 2022-07-31 2022-08-01 2.0 RT4 logtrip 2.0 Checked Out 32300 32300
134589 Jul312217564RT410 17564 2022-07-31 2022-07-31 2022-08-01 2.0 RT4 makeyourtrip NaN Cancelled 32300 12920

134578 rows × 12 columns

(134578, 12)

After removing invalid guests:
Rows : 134578
Columns : 12

(2) Outlier removal in revenue generated

Minimum Revenue: 6500
Maximum Revenue: 28560000
(15378.036937686695, 93040.1549314641)
294498.50173207896
-263742.4278567056
booking_id property_id booking_date check_in_date checkout_date no_guests room_category booking_platform ratings_given booking_status revenue_generated revenue_realized
2 May012216558RT13 16558 2022-04-28 2022-05-01 2022-05-04 2.0 RT1 logtrip 5.0 Checked Out 9100000 9100
111 May012216559RT32 16559 2022-04-29 2022-05-01 2022-05-02 6.0 RT3 direct online NaN Checked Out 28560000 28560
315 May012216562RT22 16562 2022-04-28 2022-05-01 2022-05-04 2.0 RT2 direct offline 3.0 Checked Out 12600000 12600
562 May012217559RT118 17559 2022-04-26 2022-05-01 2022-05-02 2.0 RT1 others NaN Cancelled 2000000 4420
129176 Jul282216562RT26 16562 2022-07-21 2022-07-28 2022-07-29 2.0 RT2 direct online 3.0 Checked Out 10000000 12600

The Rows above show revenue values higher than the higher limit. Therefore, they should be removed.

booking_id property_id booking_date check_in_date checkout_date no_guests room_category booking_platform ratings_given booking_status revenue_generated revenue_realized
1 May012216558RT12 16558 2022-04-30 2022-05-01 2022-05-02 2.0 RT1 others NaN Cancelled 9100 3640
4 May012216558RT15 16558 2022-04-27 2022-05-01 2022-05-02 4.0 RT1 direct online 5.0 Checked Out 10920 10920
5 May012216558RT16 16558 2022-05-01 2022-05-01 2022-05-03 2.0 RT1 others 4.0 Checked Out 9100 9100
6 May012216558RT17 16558 2022-04-28 2022-05-01 2022-05-06 2.0 RT1 others NaN Cancelled 9100 3640
7 May012216558RT18 16558 2022-04-26 2022-05-01 2022-05-03 2.0 RT1 logtrip NaN No Show 9100 9100
... ... ... ... ... ... ... ... ... ... ... ... ...
134584 Jul312217564RT45 17564 2022-07-30 2022-07-31 2022-08-01 2.0 RT4 others 2.0 Checked Out 32300 32300
134585 Jul312217564RT46 17564 2022-07-29 2022-07-31 2022-08-03 1.0 RT4 makeyourtrip 2.0 Checked Out 32300 32300
134587 Jul312217564RT48 17564 2022-07-30 2022-07-31 2022-08-02 1.0 RT4 tripster NaN Cancelled 32300 12920
134588 Jul312217564RT49 17564 2022-07-29 2022-07-31 2022-08-01 2.0 RT4 logtrip 2.0 Checked Out 32300 32300
134589 Jul312217564RT410 17564 2022-07-31 2022-07-31 2022-08-01 2.0 RT4 makeyourtrip NaN Cancelled 32300 12920

134573 rows × 12 columns

After removing the rows showing values above higher limit of revenue:
Rows: 134573
Columns: 12

(134573, 12)

Checking for Revenue realized

count    134573.000000
mean      12695.983585
std        6927.791692
min        2600.000000
25%        7600.000000
50%       11700.000000
75%       15300.000000
max       45220.000000
Name: revenue_realized, dtype: float64
33479.358661845814
booking_id property_id booking_date check_in_date checkout_date no_guests room_category booking_platform ratings_given booking_status revenue_generated revenue_realized
137 May012216559RT41 16559 2022-04-27 2022-05-01 2022-05-07 4.0 RT4 others NaN Checked Out 38760 38760
139 May012216559RT43 16559 2022-05-01 2022-05-01 2022-05-02 6.0 RT4 tripster 3.0 Checked Out 45220 45220
143 May012216559RT47 16559 2022-04-28 2022-05-01 2022-05-03 3.0 RT4 others 5.0 Checked Out 35530 35530
149 May012216559RT413 16559 2022-04-24 2022-05-01 2022-05-07 5.0 RT4 logtrip NaN Checked Out 41990 41990
222 May012216560RT45 16560 2022-04-30 2022-05-01 2022-05-03 5.0 RT4 others 3.0 Checked Out 34580 34580
... ... ... ... ... ... ... ... ... ... ... ... ...
134328 Jul312219560RT49 19560 2022-07-31 2022-07-31 2022-08-02 6.0 RT4 direct online 5.0 Checked Out 39900 39900
134331 Jul312219560RT412 19560 2022-07-31 2022-07-31 2022-08-01 6.0 RT4 others 2.0 Checked Out 39900 39900
134467 Jul312219562RT45 19562 2022-07-28 2022-07-31 2022-08-01 6.0 RT4 makeyourtrip 4.0 Checked Out 39900 39900
134474 Jul312219562RT412 19562 2022-07-25 2022-07-31 2022-08-06 5.0 RT4 direct offline 5.0 Checked Out 37050 37050
134581 Jul312217564RT42 17564 2022-07-31 2022-07-31 2022-08-01 4.0 RT4 makeyourtrip 4.0 Checked Out 38760 38760

1299 rows × 12 columns

One observation we get from the above dataframe is that all rooms are of RT4 category which means a presidential suite. Now, since RT4 is a luxurious room it is likely it's rent will be higher. To make a fair analysis, we need to do data analysis only on RT4 room types

Checking Revenue realized for RT4 rooms

room_id room_class
0 RT1 Standard
1 RT2 Elite
2 RT3 Premium
3 RT4 Presidential
count    16071.000000
mean     23439.308444
std       9048.599076
min       7600.000000
25%      19000.000000
50%      26600.000000
75%      32300.000000
max      45220.000000
Name: revenue_realized, dtype: float64
50583

Here, the higher limit comes to be 50583 and in our dataframe above we can see that the max value for the revenue realized is 45220. Hence, we can conclude that there are no outliers in the room_category column and therefore, we don't need to do any data cleaning on it.

Checking for null values

booking_id               0
property_id              0
booking_date             0
check_in_date            0
checkout_date            0
no_guests                0
room_category            0
booking_platform         0
ratings_given        77897
booking_status           0
revenue_generated        0
revenue_realized         0
dtype: int64

Total values in our dataframe are 1,34,573. Out of that 77,899 rows have null rating. Since there are many rows with null rating, we should not filter these values. Also we should not replace this rating with a median or mean rating.

Cleaning df_agg_bookings

property_id check_in_date room_category successful_bookings capacity
0 16559 2022-05-01 RT1 25 30.0
1 19562 2022-05-01 RT1 28 30.0
2 19563 2022-05-01 RT1 23 30.0
3 17558 2022-05-01 RT1 30 19.0
4 16558 2022-05-01 RT1 18 19.0

Checking for null values

property_id            0
check_in_date          0
room_category          0
successful_bookings    0
capacity               2
dtype: int64
property_id check_in_date room_category successful_bookings capacity
8 17561 2022-05-01 RT1 22 NaN
14 17562 2022-05-01 RT1 12 NaN

Filling the null values with median values

25.0
property_id check_in_date room_category successful_bookings capacity
8 17561 2022-05-01 RT1 22 25.0
14 17562 2022-05-01 RT1 12 25.0

Properties where successful bookings are more than capacity. That is an outlier

property_id check_in_date room_category successful_bookings capacity
3 17558 2022-05-01 RT1 30 19.0
12 16563 2022-05-01 RT1 100 41.0
4136 19558 2022-06-11 RT2 50 39.0
6209 19560 2022-07-02 RT1 123 26.0
8522 19559 2022-07-25 RT1 35 24.0
9194 18563 2022-07-31 RT4 20 18.0
(9200, 5)
(9194, 5)

Rows after cleaning: 9194
Columns after cleaning: 5

Data Transformation¶

property_id check_in_date room_category successful_bookings capacity
0 16559 2022-05-01 RT1 25 30.0
1 19562 2022-05-01 RT1 28 30.0
2 19563 2022-05-01 RT1 23 30.0
4 16558 2022-05-01 RT1 18 19.0
5 17560 2022-05-01 RT1 28 40.0

Create occupancy percentage column

property_id check_in_date room_category successful_bookings capacity occ_pct
0 16559 2022-05-01 RT1 25 30.0 0.833333
1 19562 2022-05-01 RT1 28 30.0 0.933333
2 19563 2022-05-01 RT1 23 30.0 0.766667
4 16558 2022-05-01 RT1 18 19.0 0.947368
5 17560 2022-05-01 RT1 28 40.0 0.700000

Converting values to percentage format

property_id check_in_date room_category successful_bookings capacity occ_pct
0 16559 2022-05-01 RT1 25 30.0 83.33
1 19562 2022-05-01 RT1 28 30.0 93.33
2 19563 2022-05-01 RT1 23 30.0 76.67
4 16558 2022-05-01 RT1 18 19.0 94.74
5 17560 2022-05-01 RT1 28 40.0 70.00
<class 'pandas.core.frame.DataFrame'>
Index: 9194 entries, 0 to 9199
Data columns (total 6 columns):
 #   Column               Non-Null Count  Dtype         
---  ------               --------------  -----         
 0   property_id          9194 non-null   int64         
 1   check_in_date        9194 non-null   datetime64[ns]
 2   room_category        9194 non-null   object        
 3   successful_bookings  9194 non-null   int64         
 4   capacity             9194 non-null   float64       
 5   occ_pct              9194 non-null   float64       
dtypes: datetime64[ns](1), float64(2), int64(2), object(1)
memory usage: 502.8+ KB

Column is successfully created.

Insights Generation¶

1. What is the average occupancy rate in each of the room categories?

room_category
RT1    57.89
RT2    58.01
RT3    58.03
RT4    59.28
Name: occ_pct, dtype: float64

Manager may say, "I don't understand RT1, RT2".
Therefore:
Printing room categories such as Standard, Premium, Elite etc along with average occupancy percentage would be better

room_id room_class
0 RT1 Standard
1 RT2 Elite
2 RT3 Premium
3 RT4 Presidential

Merging df_agg_bookings and df_rooms

New datframe: df

property_id check_in_date room_category successful_bookings capacity occ_pct room_id room_class
0 16559 2022-05-01 RT1 25 30.0 83.33 RT1 Standard
1 19562 2022-05-01 RT1 28 30.0 93.33 RT1 Standard
2 19563 2022-05-01 RT1 23 30.0 76.67 RT1 Standard
3 16558 2022-05-01 RT1 18 19.0 94.74 RT1 Standard
4 17560 2022-05-01 RT1 28 40.0 70.00 RT1 Standard

Dropping room_id which is already in df_agg_bookings by name room_category

property_id check_in_date room_category successful_bookings capacity occ_pct room_class
0 16559 2022-05-01 RT1 25 30.0 83.33 Standard
1 19562 2022-05-01 RT1 28 30.0 93.33 Standard
2 19563 2022-05-01 RT1 23 30.0 76.67 Standard
3 16558 2022-05-01 RT1 18 19.0 94.74 Standard
4 17560 2022-05-01 RT1 28 40.0 70.00 Standard

Average Occupancy % by Room Class

room_class
Elite           58.01
Premium         58.03
Presidential    59.28
Standard        57.89
Name: occ_pct, dtype: float64

2. Average occupancy rate per city.

property_id property_name category city
0 16558 Atliq Grands Luxury Delhi
1 16559 Atliq Exotica Luxury Mumbai
2 16560 Atliq City Business Delhi
3 16561 Atliq Blu Luxury Delhi
4 16562 Atliq Bay Luxury Delhi

Merging df and df_hotels

df already has df_agg_bookings data and df_rooms data before this

property_id check_in_date room_category successful_bookings capacity occ_pct room_class property_name category city
0 16559 2022-05-01 RT1 25 30.0 83.33 Standard Atliq Exotica Luxury Mumbai
1 19562 2022-05-01 RT1 28 30.0 93.33 Standard Atliq Bay Luxury Bangalore
2 19563 2022-05-01 RT1 23 30.0 76.67 Standard Atliq Palace Business Bangalore
3 16558 2022-05-01 RT1 18 19.0 94.74 Standard Atliq Grands Luxury Delhi
4 17560 2022-05-01 RT1 28 40.0 70.00 Standard Atliq City Business Mumbai

Average Occupancy Rate by City¶

No description has been provided for this image

3. When was the occupancy better? Weekday or Weekend?

date mmm yy week no day_type
0 2022-05-01 May 22 W 19 weekend
1 2022-05-02 May 22 W 19 weekday
2 2022-05-03 May 22 W 19 weekday
3 2022-05-04 May 22 W 19 weekday
4 2022-05-05 May 22 W 19 weekday

Merging df and df_date

property_id check_in_date room_category successful_bookings capacity occ_pct room_class property_name category city date mmm yy week no day_type
0 16559 2022-05-01 RT1 25 30.0 83.33 Standard Atliq Exotica Luxury Mumbai 2022-05-01 May 22 W 19 weekend
1 19562 2022-05-01 RT1 28 30.0 93.33 Standard Atliq Bay Luxury Bangalore 2022-05-01 May 22 W 19 weekend
2 19563 2022-05-01 RT1 23 30.0 76.67 Standard Atliq Palace Business Bangalore 2022-05-01 May 22 W 19 weekend
3 16558 2022-05-01 RT1 18 19.0 94.74 Standard Atliq Grands Luxury Delhi 2022-05-01 May 22 W 19 weekend
4 17560 2022-05-01 RT1 28 40.0 70.00 Standard Atliq City Business Mumbai 2022-05-01 May 22 W 19 weekend

Occupancy %- Weekday vs Weekend

day_type
weekday    51.81
weekend    73.96
Name: occ_pct, dtype: float64

4. Monthly Trend by Occupancy

mmm yy
May 22    58.93
Jun 22    58.03
Jul 22    57.94
Name: occ_pct, dtype: float64

Monthly Trend by Occupancy¶

No description has been provided for this image
No description has been provided for this image
No description has been provided for this image

In the month of June, what is the occupancy for different cities?

array(['May 22', 'Jun 22', 'Jul 22'], dtype=object)
property_id check_in_date room_category successful_bookings capacity occ_pct room_class property_name category city date mmm yy week no day_type
3098 16559 2022-06-01 RT1 14 30.0 46.67 Standard Atliq Exotica Luxury Mumbai 2022-06-01 Jun 22 W 23 weekday
3099 18560 2022-06-01 RT1 18 30.0 60.00 Standard Atliq City Business Hyderabad 2022-06-01 Jun 22 W 23 weekday
3100 19562 2022-06-01 RT1 18 30.0 60.00 Standard Atliq Bay Luxury Bangalore 2022-06-01 Jun 22 W 23 weekday
3101 19563 2022-06-01 RT1 14 30.0 46.67 Standard Atliq Palace Business Bangalore 2022-06-01 Jun 22 W 23 weekday
3102 17558 2022-06-01 RT1 8 19.0 42.11 Standard Atliq Grands Luxury Mumbai 2022-06-01 Jun 22 W 23 weekday

Average Occupancy % by City in June

city
Delhi        61.46
Mumbai       57.79
Hyderabad    57.69
Bangalore    55.85
Name: occ_pct, dtype: float64

Reading and Exploring new data

property_id property_name category city room_category room_class check_in_date mmm yy week no day_type successful_bookings capacity occ%
0 16559 Atliq Exotica Luxury Mumbai RT1 Standard 2022-08-01 Aug-22 W 32 weekeday 30 30 100.00
1 19562 Atliq Bay Luxury Bangalore RT1 Standard 2022-08-01 Aug-22 W 32 weekeday 21 30 70.00
2 19563 Atliq Palace Business Bangalore RT1 Standard 2022-08-01 Aug-22 W 32 weekeday 23 30 76.67
3 19558 Atliq Grands Luxury Bangalore RT1 Standard 2022-08-01 Aug-22 W 32 weekeday 30 40 75.00
4 19560 Atliq City Business Bangalore RT1 Standard 2022-08-01 Aug-22 W 32 weekeday 20 26 76.92

Cleaning the day_type column

property_id property_name category city room_category room_class check_in_date mmm yy week no day_type successful_bookings capacity occ%
0 16559 Atliq Exotica Luxury Mumbai RT1 Standard 2022-08-01 Aug-22 W 32 weekday 30 30 100.0
1 19562 Atliq Bay Luxury Bangalore RT1 Standard 2022-08-01 Aug-22 W 32 weekday 21 30 70.0
Index(['property_id', 'property_name', 'category', 'city', 'room_category',
       'room_class', 'check_in_date', 'mmm yy', 'week no', 'day_type',
       'successful_bookings', 'capacity', 'occ%'],
      dtype='object')
Index(['property_id', 'check_in_date', 'room_category', 'successful_bookings',
       'capacity', 'occ_pct', 'room_class', 'property_name', 'category',
       'city', 'date', 'mmm yy', 'week no', 'day_type'],
      dtype='object')
(7, 13)

Columns in august data: 13

(9194, 14)

Columns in the dataframe where the august data is to be appended: 14

Number of Columns should be same

Dropping the extra date column created due to merge operation

property_id check_in_date room_category successful_bookings capacity occ_pct room_class property_name category city mmm yy week no day_type
0 16559 2022-05-01 RT1 25 30.0 83.33 Standard Atliq Exotica Luxury Mumbai May 22 W 19 weekend
1 19562 2022-05-01 RT1 28 30.0 93.33 Standard Atliq Bay Luxury Bangalore May 22 W 19 weekend
2 19563 2022-05-01 RT1 23 30.0 76.67 Standard Atliq Palace Business Bangalore May 22 W 19 weekend
3 16558 2022-05-01 RT1 18 19.0 94.74 Standard Atliq Grands Luxury Delhi May 22 W 19 weekend
4 17560 2022-05-01 RT1 28 40.0 70.00 Standard Atliq City Business Mumbai May 22 W 19 weekend
(9194, 13)

Updating the records

property_id check_in_date room_category successful_bookings capacity occ_pct room_class property_name category city mmm yy week no day_type occ%
9196 19563 2022-08-01 RT1 23 30.0 NaN Standard Atliq Palace Business Bangalore Aug-22 W 32 weekday 76.67
9197 19558 2022-08-01 RT1 30 40.0 NaN Standard Atliq Grands Luxury Bangalore Aug-22 W 32 weekday 75.00
9198 19560 2022-08-01 RT1 20 26.0 NaN Standard Atliq City Business Bangalore Aug-22 W 32 weekday 76.92
9199 17561 2022-08-01 RT1 18 26.0 NaN Standard Atliq Blu Luxury Mumbai Aug-22 W 32 weekday 69.23
9200 17564 2022-08-01 RT1 10 16.0 NaN Standard Atliq Seasons Business Mumbai Aug-22 W 32 weekday 62.50

5. Revenue realized per city

booking_id property_id booking_date check_in_date checkout_date no_guests room_category booking_platform ratings_given booking_status revenue_generated revenue_realized
1 May012216558RT12 16558 2022-04-30 2022-05-01 2022-05-02 2.0 RT1 others NaN Cancelled 9100 3640
4 May012216558RT15 16558 2022-04-27 2022-05-01 2022-05-02 4.0 RT1 direct online 5.0 Checked Out 10920 10920
5 May012216558RT16 16558 2022-05-01 2022-05-01 2022-05-03 2.0 RT1 others 4.0 Checked Out 9100 9100
property_id property_name category city
0 16558 Atliq Grands Luxury Delhi
1 16559 Atliq Exotica Luxury Mumbai
2 16560 Atliq City Business Delhi

Merging df_bookings and df_hotels

booking_id property_id booking_date check_in_date checkout_date no_guests room_category booking_platform ratings_given booking_status revenue_generated revenue_realized property_name category city
0 May012216558RT12 16558 2022-04-30 2022-05-01 2022-05-02 2.0 RT1 others NaN Cancelled 9100 3640 Atliq Grands Luxury Delhi
1 May012216558RT15 16558 2022-04-27 2022-05-01 2022-05-02 4.0 RT1 direct online 5.0 Checked Out 10920 10920 Atliq Grands Luxury Delhi
2 May012216558RT16 16558 2022-05-01 2022-05-01 2022-05-03 2.0 RT1 others 4.0 Checked Out 9100 9100 Atliq Grands Luxury Delhi
city
Bangalore    420383550
Delhi        294404488
Hyderabad    325179310
Mumbai       668569251
Name: revenue_realized, dtype: int64

Revenue realized by City¶

No description has been provided for this image

6. Month by month revenue

array(['May 22', 'Jun 22', 'Jul 22'], dtype=object)
date mmm yy week no day_type
0 2022-05-01 May 22 W 19 weekend
1 2022-05-02 May 22 W 19 weekday

Merging df_bookings_all and df_date

booking_id property_id booking_date check_in_date checkout_date no_guests room_category booking_platform ratings_given booking_status revenue_generated revenue_realized property_name category city date mmm yy week no day_type
0 May012216558RT12 16558 2022-04-30 2022-05-01 2022-05-02 2.0 RT1 others NaN Cancelled 9100 3640 Atliq Grands Luxury Delhi 2022-05-01 May 22 W 19 weekend
1 May012216558RT15 16558 2022-04-27 2022-05-01 2022-05-02 4.0 RT1 direct online 5.0 Checked Out 10920 10920 Atliq Grands Luxury Delhi 2022-05-01 May 22 W 19 weekend
2 May012216558RT16 16558 2022-05-01 2022-05-01 2022-05-03 2.0 RT1 others 4.0 Checked Out 9100 9100 Atliq Grands Luxury Delhi 2022-05-01 May 22 W 19 weekend
booking_id property_id booking_date check_in_date checkout_date no_guests room_category booking_platform ratings_given booking_status revenue_generated revenue_realized property_name category city mmm yy week no day_type
0 May012216558RT12 16558 2022-04-30 2022-05-01 2022-05-02 2.0 RT1 others NaN Cancelled 9100 3640 Atliq Grands Luxury Delhi May 22 W 19 weekend
1 May012216558RT15 16558 2022-04-27 2022-05-01 2022-05-02 4.0 RT1 direct online 5.0 Checked Out 10920 10920 Atliq Grands Luxury Delhi May 22 W 19 weekend
2 May012216558RT16 16558 2022-05-01 2022-05-01 2022-05-03 2.0 RT1 others 4.0 Checked Out 9100 9100 Atliq Grands Luxury Delhi May 22 W 19 weekend
mmm yy
Jul 22    572843348
Jun 22    553925855
May 22    581767396
Name: revenue_realized, dtype: int64

Monthly Trend by Revenue¶

No description has been provided for this image

7. Revenue realized per hotel type

property_name
Atliq Seasons     66086735
Atliq Grands     211462134
Atliq Bay        259996918
Atliq Blu        260851922
Atliq City       285798439
Atliq Palace     304081863
Atliq Exotica    320258588
Name: revenue_realized, dtype: int64
Index(['Atliq Exotica', 'Atliq Palace', 'Atliq City', 'Atliq Blu', 'Atliq Bay',
       'Atliq Grands', 'Atliq Seasons'],
      dtype='object', name='property_name')

Top Hotels by Revenue¶

No description has been provided for this image

8. Average rating per city

city
Delhi        3.78
Hyderabad    3.66
Mumbai       3.65
Bangalore    3.41
Name: ratings_given, dtype: float64
Index(['Delhi', 'Hyderabad', 'Mumbai', 'Bangalore'], dtype='object', name='city')

Avg Rating per City¶

No description has been provided for this image

Revenue realized per Booking Platform¶

No description has been provided for this image

Revenue by City and Booking Platform¶

No description has been provided for this image

Revenue: Weekday vs Weekend¶

No description has been provided for this image

Finding Revenue by Room Category

Merging df_bookings and df_rooms

booking_id property_id booking_date check_in_date checkout_date no_guests room_category booking_platform ratings_given booking_status revenue_generated revenue_realized
1 May012216558RT12 16558 2022-04-30 2022-05-01 2022-05-02 2.0 RT1 others NaN Cancelled 9100 3640
4 May012216558RT15 16558 2022-04-27 2022-05-01 2022-05-02 4.0 RT1 direct online 5.0 Checked Out 10920 10920
5 May012216558RT16 16558 2022-05-01 2022-05-01 2022-05-03 2.0 RT1 others 4.0 Checked Out 9100 9100
6 May012216558RT17 16558 2022-04-28 2022-05-01 2022-05-06 2.0 RT1 others NaN Cancelled 9100 3640
7 May012216558RT18 16558 2022-04-26 2022-05-01 2022-05-03 2.0 RT1 logtrip NaN No Show 9100 9100
room_id room_class
0 RT1 Standard
1 RT2 Elite
2 RT3 Premium
3 RT4 Presidential
booking_id property_id booking_date check_in_date checkout_date no_guests room_category booking_platform ratings_given booking_status revenue_generated revenue_realized room_id room_class
0 May012216558RT12 16558 2022-04-30 2022-05-01 2022-05-02 2.0 RT1 others NaN Cancelled 9100 3640 RT1 Standard
1 May012216558RT15 16558 2022-04-27 2022-05-01 2022-05-02 4.0 RT1 direct online 5.0 Checked Out 10920 10920 RT1 Standard
2 May012216558RT16 16558 2022-05-01 2022-05-01 2022-05-03 2.0 RT1 others 4.0 Checked Out 9100 9100 RT1 Standard
3 May012216558RT17 16558 2022-04-28 2022-05-01 2022-05-06 2.0 RT1 others NaN Cancelled 9100 3640 RT1 Standard
4 May012216558RT18 16558 2022-04-26 2022-05-01 2022-05-03 2.0 RT1 logtrip NaN No Show 9100 9100 RT1 Standard
booking_id property_id booking_date check_in_date checkout_date no_guests room_category booking_platform ratings_given booking_status revenue_generated revenue_realized room_class
0 May012216558RT12 16558 2022-04-30 2022-05-01 2022-05-02 2.0 RT1 others NaN Cancelled 9100 3640 Standard
1 May012216558RT15 16558 2022-04-27 2022-05-01 2022-05-02 4.0 RT1 direct online 5.0 Checked Out 10920 10920 Standard
2 May012216558RT16 16558 2022-05-01 2022-05-01 2022-05-03 2.0 RT1 others 4.0 Checked Out 9100 9100 Standard
3 May012216558RT17 16558 2022-04-28 2022-05-01 2022-05-06 2.0 RT1 others NaN Cancelled 9100 3640 Standard
4 May012216558RT18 16558 2022-04-26 2022-05-01 2022-05-03 2.0 RT1 logtrip NaN No Show 9100 9100 Standard

Revenue by Room Category¶

No description has been provided for this image