  • Tech_Classes
Data Analysis End-to-End Project for Portfolio STEP BY STEP | How to create a Data Analyst Project

Hello everyone, welcome to my channel, Tech Classes. In this video we are going to build one end-to-end data analysis project. In a company, a client comes with a problem statement; on the basis of that you research and find the data, then analyze it and present the results in a report. That is how I have built this project. You must have seen the hotel booking data set on YouTube. It is a very common data set and many people have made projects on it, but what they usually do is open a Jupyter notebook, load the data set, and perform EDA without any aim. A project is not done that way. A project starts from a problem statement: we do not just explore the data, we analyze it against the problem statement and work toward a solution. That is how I have framed this project. If you like it, modify it, share a link to your version in the comment section, and add it to your resume or portfolio.
This is a complete project. First we take the problem statement and understand it. Then we identify the data. Here the data is already identified, but in a company you have to research the data set yourself on the basis of the problem statement. Then we clean and explore the data set: which columns match our requirement, which values and variables are there, and what kind of distribution they have. Finally, based on the problem statement, we analyze the data, turn the insights into visualizations, and add them to a report.
Here you can see the complete report. It starts with the business problem, then the assumptions we made, then the research questions and hypotheses, then the analysis and findings, and finally the suggestions we give to the hotel in the conclusion. Suppose the hotels are our clients. They come to us and say: our cancellation rate is very high, analyze our data and tell us where the problem is coming from, how we can tackle it, and how we can reduce cancellations. That is the business problem, and we will look at it in detail later. First we will go through the main steps of a data analysis project and then build the project according to those steps. If you like this project, please like the video and subscribe to my channel.
Before starting, let us understand the steps through which you can complete a data analysis project. The first step is to define a problem statement. Everything else follows from it: selecting the data, gathering requirements, and performing the analysis. So the problem statement is very important in any data analysis project. The second step is to identify the data you want to analyze. Once you have defined the problem you are going to tackle, you find the data for it, collect it, and gather it in one place so that you can perform the analysis on it.
The third step is to explore and clean the data set. In many cases the data we collect has inconsistencies: some variables may be inappropriate, some values may be missing, or there may be a lot of duplicate rows. We have to remove these errors before the analysis, because if we analyze the raw data set the results and insights can differ. So we clean whatever data we have collected and perform the analysis on the cleaned data set.
The fourth step is to analyze the data to get useful insights, always keeping the problem statement in mind. Defining the problem statement also includes formulating research questions and hypotheses. Research questions are the questions you want to find answers to. Hypotheses are assumptions about what those answers might be, made before you perform the analysis. When the analysis is complete, we find out whether each hypothesis was true or false. All the analysis is performed on this basis. You can run many different kinds of analysis on a data set, but if you keep the problem statement in mind, your project proceeds in a sequence and every analysis you make addresses the problem statement.
When the analysis is complete and you have useful insights, the best way to present them is a report. In this project, I have made a report.
As I showed at the start, the report contains the problem statement, the research questions, the hypotheses, the assumptions we made about the data set, the analysis and findings, and the suggestions that follow from them. These suggestions can be presented to the client, or to whoever the project is for; if we made the project for ourselves, we can still define suggestions for the problem statement. Until you present the answer to the problem statement in some form, the project is not complete. Many people make a Jupyter notebook, perform EDA in it, and show visualizations, but the notebook does not say why each visualization was made or what the findings are. Presenting the findings through a report or a dashboard is the final step of an end-to-end data analysis project.
So, to recap: define the problem statement, collect the data, clean the data, perform different kinds of analysis using visualization, and present the results as a report. In this video you will see all of these steps. This is the business problem that I have defined for the hotel booking project. In recent years, City Hotel and Resort Hotel have seen high
cancellation rates. So this is the problem: a high cancellation rate. Suppose City Hotel and Resort Hotel are your clients. They come to you and say they are losing a lot of revenue because of cancellations. You have to understand their problem and find the data set accordingly. The report continues: each hotel is now dealing with a number of issues as a result, including fewer revenues and less than ideal hotel room use. In other words, as the cancellation rate increases, revenue decreases and rooms stay idle, while the hotels still spend a lot of money on the rooms and their servicing. Hence, lowering the cancellation rate is both hotels' primary goal in order to increase their efficiency in generating revenue, and ours is to offer some business advice to address the problem. The cancellation rate is the biggest concern for these two hotels, so we address that first and then give business advice. The analysis of hotel booking cancellations, as well as other factors that have no bearing on their business and yearly revenue generation, are the main topics of this report. That is, we focus on why cancellations happen and which factors affect them, and we also identify the things the hotels do not need to pay attention to.
The hotel booking data set is available everywhere on the internet and many people have performed EDA on it in many ways. Whatever I have done in this project has been done keeping only this problem statement in mind. You can also define your own problem statement and create your own project.
This is an idea I am giving you so that you can create any data analysis project. That is all about the problem statement; now come the assumptions we have made for the data set and the hotels. This is the kind of report you would present when you work in a company and perform an analysis for a client: first the problem statement, then the assumptions based on it.
The first assumption is that no unusual occurrences between 2015 and 2017 will have a substantial impact on the data used. By unusual occurrences I mean outliers, so we are assuming there are no outliers. The data set I am using is taken from Kaggle and in fact has many outliers, but this is the assumption written for the analysis. The second assumption is that the information is still current and can be used to analyze a hotel's possible plans in an efficient manner. This is very old data, it only goes up to 2017, but we assume it is current and that our analysis can be used for decision making. The third assumption is that there are no unanticipated negatives to the hotel employing any advised technique. That is, whatever advice and solutions we present, the hotels will accept them without any problem or negative feedback. Fourth, the hotels are not currently using any of the suggested solutions. Fifth, the biggest factor affecting the effectiveness of earning income is booking cancellations. Sixth, cancellations result in vacant rooms for the booked length of time. That means a room whose booking is canceled stays empty for those days and generates no earnings.
Seventh, clients make hotel reservations the same year they make cancellations. It is not the case that a reservation is made today and canceled next year.
Now the research questions we have defined for the problem statement. First, what are the variables that affect hotel reservation cancellations? Second, how can we make hotel reservation cancellations better? Third, how will hotels be assisted in making pricing and promotional decisions? So first we identify the factors affecting cancellations, next we ask how the cancellation rate can be reduced, and lastly we see whether we can help the hotels with pricing and promotion decisions.
Based on the research questions we have created the hypotheses. First, more cancellations occur when prices are higher. Second, when there is a longer waiting list, customers tend to cancel more frequently. Third, the majority of clients are coming from offline travel agents to make their reservations. To explain the first: if the price is higher, customers will cancel more. To explain the second: if a reservation is confirmed the same day, customers will not cancel that much, but if it takes more than a day they may move to another hotel, so the cancellation rate may increase. That was the second hypothesis.
The third hypothesis is that most of the hotels' customers make their reservations through offline travel agents. So these are the hypotheses we have created. Now come the analysis and findings, which we are going to produce in the Jupyter notebook, followed by the suggestions.
We will use the hotel booking data set. It has around 36 columns and, I think, more than one lakh (100,000) records. I will give its link in the description box. We start the project in the Jupyter notebook and perform the analysis keeping in mind the problem statement, research questions, and hypotheses. This data is very useful and you can find many other kinds of insights in it, but in this project we will find only as much as our project needs.
So let's start. First we create a Jupyter notebook, and we build it in a sequence: first we import all the libraries, then we load the data set, after that we perform EDA, and finally we perform the analysis and visualization. Data cleaning is done as part of the EDA. Let's start with importing libraries. First we import pandas to handle the data. We also import warnings so that we can ignore any warnings that come up in our Jupyter notebook.
The notebook looks good without any warnings. Here I rename the notebook to Data Analysis Hotel Booking. The data is in CSV format, so we call the pd.read_csv function and pass it the path of the data set. We do not need a full path here because the notebook is in the same folder as the data, so we can directly paste the file name. It ran successfully.
Next comes exploratory data analysis and data cleaning. We have loaded the data set, so let's see what the data looks like. df.head() is a simple function that returns the top five rows if we do not pass a number. Our data set looks something like this. Let's see the end of it as well; you can view the last five rows the same way. If you pass a number, that many rows are returned; here it has returned 10 rows. Five is just the default value, and you can put any value in its place. Moving ahead, to see how many rows and columns are present in the data set, use df.shape.
The original data set had some columns that specifically identify a person. If anyone gives you data in a real project, they will never give you their customers' personal data; they will only give general data on which the analysis is performed. So I have removed all of those columns from the data set. I will share the report I have created and the data set without customer personal data in the description box.
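As a rough sketch of the loading and first-look steps just described, assuming a tiny in-memory CSV in place of the real file (the file name is not given in the video, so `csv_text` and its rows are hypothetical):

```python
import io
import warnings

import pandas as pd

# Hide warnings so the notebook output stays clean.
warnings.filterwarnings("ignore")

# Hypothetical stand-in for the hotel booking CSV file.
csv_text = """hotel,is_canceled,adr,country
Resort Hotel,0,75.0,PRT
Resort Hotel,1,98.0,GBR
City Hotel,0,60.5,PRT
City Hotel,1,120.0,FRA
City Hotel,1,88.0,PRT
Resort Hotel,0,110.0,ESP
City Hotel,0,95.0,PRT
"""
df = pd.read_csv(io.StringIO(csv_text))

first_rows = df.head()     # first five rows by default
last_rows = df.tail(3)     # pass a number to change how many rows come back
n_rows, n_cols = df.shape  # (number of rows, number of columns)
```

In the notebook, `pd.read_csv` would receive the file name of the data set sitting next to the notebook instead of the `io.StringIO` wrapper.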
You can get everything from there and change whatever you want in this project. You can create a new problem statement, add new research questions to this one, or build a dashboard on this data set in addition to the report.
Let's look at some of the columns. hotel has two values, City Hotel and Resort Hotel. is_canceled is our main variable. It is binary: 0 means the reservation was not canceled and 1 means it was canceled. There are many more columns, but we do not want to focus on all of them right now, only on those related to our problem statement.
After seeing the columns, we have to see their data types, and for this you can use the df.info() function. It also shows how many non-null values each column has. Here you see 6737 for one column while 119390 is the total number of rows, so the columns with lower counts have missing values. All the data types are correct except one: reservation_status_date is of object type, and we have to perform analysis on it, so we must convert it to datetime first. Whenever you start with a data set, first look for any date column.
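A minimal sketch of this data type check, assuming a small hypothetical frame whose column names follow the ones mentioned in the video:

```python
import pandas as pd

# Hypothetical sample; the real data set has far more rows and columns.
df = pd.DataFrame({
    "hotel": ["Resort Hotel", "City Hotel", "City Hotel"],
    "is_canceled": [0, 1, 0],
    "company": [None, 40.0, None],
    "reservation_status_date": ["2015-07-01", "2016-02-10", "2017-03-05"],
})

# Prints each column's dtype and non-null count.
df.info()

# The same non-null counts as a Series, handy for spotting sparse columns.
non_null = df.notna().sum()

# The date column was read as text, so it is not a datetime column yet.
is_date = pd.api.types.is_datetime64_any_dtype(df["reservation_status_date"])
```

A column whose non-null count is well below the number of rows, like `company` here, is a column with missing values.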
If that date column is needed, convert it to datetime first. To do that, we pass the reservation_status_date column to pd.to_datetime(), which converts a date column to datetime format, and save the result back to the same column. Let's check again: now you can see it is datetime.
The remaining object columns are the categorical columns. For these we can see how many unique values, that is, how many categories, each one has. For that we use the describe function. By default describe summarizes only the numerical columns, but it has an include parameter: if you pass object to include, you get summary statistics for the object columns only. Here you can see count, the number of non-null values, and unique, the number of categories. hotel has 2, City Hotel and Resort Hotel. The arrival date month has 12 values, meal has 5, country has 177, market segment has 8, distribution channel has 5, and so on for all the columns.
Now the number of categories is known, but we also need to see which values occur in each category. For that we run a for loop over the columns of object type. How do we get those columns? The describe call above returns only object columns, so if we take its columns we get all the columns of object data type.
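The conversion and the categorical summary can be sketched as follows. The sample rows are hypothetical, and the frame is built with object dtype so that the text columns behave like the object columns described in the video:

```python
import pandas as pd

# Hypothetical sample held as object dtype, like the columns df.info() reported.
df = pd.DataFrame(
    {
        "hotel": ["Resort Hotel", "City Hotel", "City Hotel", "Resort Hotel"],
        "meal": ["BB", "HB", "BB", "SC"],
        "reservation_status_date": [
            "2015-07-01", "2016-02-10", "2016-02-10", "2017-03-05",
        ],
    },
    dtype=object,
)

# Convert the date column and store it back in the same column.
df["reservation_status_date"] = pd.to_datetime(df["reservation_status_date"])

# Summary of the object (categorical) columns only: count, unique, top, freq.
summary = df.describe(include="object")

# The summary's columns are exactly the object-typed columns.
object_columns = list(summary.columns)
```

After the conversion the date column no longer appears in the object summary, which is a quick way to confirm the change took effect.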
So we write a for loop over those columns and call the unique() function, which we can use with any column in the data frame. We select the column from the data frame and then call unique on it. Let's run this, and make the output clearly visible by printing a line after every category. Now every column is separate. hotel has 2 values, Resort Hotel and City Hotel. The arrival date month has July, August, September, and so on. There is no need to go into so much detail. country has many countries; I have seen that some countries have a value count of only 1, so we do not count those and look only at the top countries. The market segment categories are Direct, Corporate, Online TA (TA is travel agent), Offline TA and TO (TO means tour operators), Complimentary, Groups, Undefined, and Aviation. Direct means customers come directly to the hotel and make their reservation. reservation_status has Checkout, Canceled, and No show. No show means they did not come to the hotel.
Let's check missing values. The easiest way is df.isnull().sum(), which returns each column name with the number of missing values in it. Here you can see that only four columns have missing values: children has only 4, country has 488, agent has 16340, and company has more than one lakh. The data itself is only a little over one lakh rows, so handling these would be very hectic. We will simply remove both the agent and company columns: we do not need agent, and company has too many missing values to handle. In country only 488 values are missing and in children only 4, out of more than one lakh records in total.
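A short sketch of the loop over categorical columns and the missing value check, assuming hypothetical sample rows; the text columns are cast to object dtype so that `describe(include="object")` selects them as it does in the video:

```python
import pandas as pd

# Hypothetical sample with a few missing values.
df = pd.DataFrame({
    "hotel": ["Resort Hotel", "City Hotel", "City Hotel", "Resort Hotel"],
    "reservation_status": ["Check-Out", "Canceled", "No-Show", "Check-Out"],
    "country": ["PRT", None, "GBR", None],
    "children": [0.0, None, 1.0, 0.0],
})
text_cols = ["hotel", "reservation_status", "country"]
df = df.astype({col: object for col in text_cols})

# Print the distinct values of every object column, separated by a line.
categories = {}
for col in df.describe(include="object").columns:
    categories[col] = df[col].unique()
    print(col)
    print(categories[col])
    print("-" * 50)

# Number of missing values per column.
missing = df.isnull().sum()
```

Sorting `missing` in descending order is a convenient way to see which columns are candidates for dropping.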
If only about 500 records have missing values, we can simply drop them. So first we remove the two columns, and then we drop all the rows with missing values. The drop function is used to remove a row or a column. To remove a column you pass axis=1, and to make the change in the data frame itself you pass inplace=True. You can pass as many column names as you like as a list. To remove the rows with missing values you can simply use dropna, again with inplace=True to change the original data frame. Let's run this and check the missing values again: all the missing values have been removed from all the columns.
Now let's see the summary statistics of the numerical columns. describe returns summary statistics of the numerical columns unless you pass include=object. In the returned data frame you will notice a lot of outliers. For example, 10 children is not plausible and 10 babies is not plausible either, but these columns are not needed for our analysis, so we will not remove their outliers. adr is the average daily rate, or you can say the price. Its maximum of 5400 is very high. If I show it with a box plot you can see how high it is: there is only one value beyond 5000 and all the other values are below 1000. That is a very big outlier, which we have to remove. There are outliers throughout the data set, but removing all of them is optional here because I am giving an overview of how a data project is done; if you want to do better, you should remove them. So here we keep only the rows where adr is below 5000.
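The cleaning steps above, sketched on a small hypothetical frame (the numbers are invented; only the column names and the 5000 threshold come from the video):

```python
import pandas as pd

# Hypothetical sample with sparse columns and one extreme adr value.
df = pd.DataFrame({
    "adr": [75.0, 120.0, 5400.0, 98.5, 60.0],
    "country": ["PRT", "GBR", "PRT", None, "FRA"],
    "agent": [9.0, None, 240.0, None, None],
    "company": [None, None, None, 40.0, None],
})

# Remove the two columns with too many missing values.
df.drop(["agent", "company"], axis=1, inplace=True)

# Remove every remaining row that still has a missing value.
df.dropna(inplace=True)

# Keep only rows whose average daily rate is below the outlier threshold.
df = df[df["adr"] < 5000]
```

Running `df.isnull().sum()` and `df.describe()` afterwards confirms that no missing values remain and that the adr maximum has dropped.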
If we run describe again, the maximum value of adr is 510, which is much less than 5400. adr is a required column, which is why we removed its outlier.
Now, how do we perform the analysis, keeping the problem statement in mind? First we look at how many reservations were canceled and how many were not, as a percentage, over the data from 2015 to 2017. The hotel knows cancellations are happening, but we need to know whether the share is large enough to be a major issue. To get the percentage we use the is_canceled column. The value_counts() function returns the category names and how many times each occurs in the column, and if we pass normalize=True it returns proportions instead. 62.8 percent of the reservations are not canceled and around 37% are canceled. 37% is very high. If it were 5% or 10% it would be manageable, but 37% is more than half the number of reservations that were not canceled.
So let's visualize it, since a visualization tells more about the findings. First we print the percentage. I hope you know all these functions. We keep the figure small and set the title to "Reservation status count". We plot a bar graph because the data is categorical and we are presenting counts. You have to think for yourself about which form suits what you are presenting. The x axis needs two labels, Not canceled and Canceled, and the bar heights come from the same value_counts() call on is_canceled.
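As an illustration of the count and percentage step, here is a minimal sketch on a hypothetical is_canceled column:

```python
import pandas as pd

# Hypothetical sample: five reservations kept, three canceled.
df = pd.DataFrame({"is_canceled": [0, 0, 0, 1, 1, 0, 1, 0]})

# Exact number of rows per category.
counts = df["is_canceled"].value_counts()

# Share of rows per category (fractions that sum to 1).
shares = df["is_canceled"].value_counts(normalize=True)
```

Multiplying `shares` by 100 gives the percentages quoted in the report.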
Here we remove normalize=True because we need the exact counts, and then customize the chart. You can also see from this that the canceled bar is more than half the height of the other, so this is a major issue for the hotels.
Let's see the next visualization. We now have the count and the percentage of cancellations overall. Next, let's see which hotel has the higher ratio of canceled to not canceled reservations. For that we can use a count plot. First we set the size of the figure. Then, using the seaborn library (sns), we put the hotel type on the x axis and split the counts by is_canceled, with df as the data. Then we customize it: the legend labels, the title of the plot, and so on. All of this is just to beautify the visualization and show as much information as possible in it, so play with the options to make it more informative.
Look at the tops of the bars for Resort Hotel and City Hotel. In the resort hotel the gap between the two bars is large, which means cancellations are fewer there. In the city hotel there are more cancellations, and the ratio is also higher than in the resort hotel. What can be the reason? One possibility is a price difference between the resort hotel and the city hotel, which we will check further with the help of other columns. More attention should be given to the city hotel: whatever the reasons for cancellation there, we should remove them.
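The two charts can be sketched like this. The sample data is hypothetical, and the grouped bars are drawn with pandas plotting on top of a cross-tabulation as a stand-in for the seaborn count plot used in the video:

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; not needed inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample of reservations.
df = pd.DataFrame({
    "hotel": ["Resort Hotel"] * 4 + ["City Hotel"] * 6,
    "is_canceled": [0, 0, 0, 1, 0, 0, 0, 1, 1, 1],
})

# Chart 1: exact counts of not canceled (0) and canceled (1).
counts = df["is_canceled"].value_counts().sort_index()
fig, ax = plt.subplots(figsize=(5, 4))
ax.bar(["Not canceled", "Canceled"], counts.values, edgecolor="k", width=0.7)
ax.set_title("Reservation status count")

# Chart 2: the same counts split by hotel, one pair of bars per hotel.
by_hotel = pd.crosstab(df["hotel"], df["is_canceled"])
ax2 = by_hotel.plot(kind="bar", figsize=(8, 4), rot=0)
ax2.set_title("Reservation status in different hotels")
ax2.set_xlabel("hotel")
ax2.set_ylabel("number of reservations")
ax2.legend(["Not canceled", "Canceled"])
```

The cross-tabulation `by_hotel` holds the same numbers a count plot would draw, so it can also be printed next to the chart in the report.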
It may be that adr is the most important factor for cancellations in the resort hotel, while in the city hotel maintenance and other facilities are the reason. The counts are known from the chart, but let's also see the percentages: out of 100% of resort hotel reservations, what percentage are canceled and what percentage are not? For this we filter the data on the hotel column to keep only the resort hotel. Around 28% of its reservations are canceled and 72% are not. If we change the filter to the city hotel, around 42% of reservations are canceled, which is a very high number.
Now let us see whether price has any effect on cancellations in the resort hotel and the city hotel. There are many records on the same day, so we group each hotel's data by reservation_status_date using the groupby function and take the mean of the average daily rate for each day. Remember the hypothesis we had in mind: adr may be driving cancellations. Then we plot the date index against adr for both hotels. The blue line represents the resort hotel and the orange line represents the city hotel. In some stretches data is not present, so the lines have gaps, and you can also see spikes. On some days the price of the resort hotel is the higher one.
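A minimal sketch of the per-hotel percentages and the daily mean adr, assuming hypothetical rows:

```python
import pandas as pd

# Hypothetical sample: two days of reservations for each hotel.
df = pd.DataFrame({
    "hotel": ["Resort Hotel"] * 4 + ["City Hotel"] * 4,
    "is_canceled": [0, 0, 0, 1, 0, 1, 1, 0],
    "adr": [100.0, 140.0, 90.0, 110.0, 80.0, 60.0, 70.0, 90.0],
    "reservation_status_date": pd.to_datetime([
        "2016-07-01", "2016-07-01", "2016-07-02", "2016-07-02",
        "2016-07-01", "2016-07-01", "2016-07-02", "2016-07-02",
    ]),
})

resort_hotel = df[df["hotel"] == "Resort Hotel"]
city_hotel = df[df["hotel"] == "City Hotel"]

# Share of canceled (1) and not canceled (0) reservations per hotel.
resort_share = resort_hotel["is_canceled"].value_counts(normalize=True)
city_share = city_hotel["is_canceled"].value_counts(normalize=True)

# Mean average daily rate per day, one row per date.
resort_adr = resort_hotel.groupby("reservation_status_date")[["adr"]].mean()
city_adr = city_hotel.groupby("reservation_status_date")[["adr"]].mean()
```

Plotting `resort_adr["adr"]` and `city_adr["adr"]` against their date index gives the two lines compared in the video.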
The price of the city hotel, in comparison to the resort hotel, is lower for some periods and higher for others. So this supports the hypothesis we had in mind: on certain days the city hotel is cheaper than the resort hotel, and the spikes may be holidays and weekends.
Next, in which month do we see the most reservations and in which month the most cancellations? For that we create a figure with a count plot, because we need a grouped bar chart with canceled and not canceled counts side by side for each month. On the x axis we need months, but we do not have a month column for reservation_status_date. Since we converted that column to datetime, we can now extract the month from it. So we create a new column, df['month'], from reservation_status_date, which holds the month of every date. Then the x axis is month and the bars are split by is_canceled.
The blue bars show the reservations that were not canceled and the orange bars show the canceled ones. Cancellations are highest in January and lowest in month number 8, August. Not canceled reservations are highest in that same month, August, and lowest in December and January. So the insight is a little puzzling at first: when cancellations are lowest, reservations are highest, and when cancellations are highest, reservations are lowest. One explanation could be price: perhaps prices are lower in August, so fewer people cancel, and higher in January, so more people cancel.
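The month extraction and the per-month counts might look like the sketch below. The rows are hypothetical, and the `.dt.month` accessor is an assumption about how the month column is built, since the video only says the month is extracted from the datetime column:

```python
import pandas as pd

# Hypothetical sample covering January and August.
df = pd.DataFrame({
    "is_canceled": [1, 1, 0, 0, 0, 1, 0],
    "reservation_status_date": pd.to_datetime([
        "2016-01-05", "2016-01-20", "2016-01-25", "2016-08-03",
        "2016-08-10", "2016-08-15", "2016-08-21",
    ]),
})

# Month number (1 to 12) of every reservation status date.
df["month"] = df["reservation_status_date"].dt.month

# Rows are months, columns are not canceled (0) and canceled (1) counts.
monthly = pd.crosstab(df["month"], df["is_canceled"])
```

This table contains the bar heights of the grouped chart: one pair of bars per month.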
Let's plot the average daily rate for each month. To create this visualization we first filter the data frame to the rows where is_canceled is 1, then group by month, aggregate adr, and reset the index. Now you can see that here too the lowest value is in August and the highest is in January. So this supports our hypothesis: when prices are higher, people still make reservations, but at the last moment some of them decide that too much money will be spent, so they cancel and book some other hotel. That is the problem.
Next, let's compare countries. There are a great many countries, so we take only the top 10. We create a data frame of canceled reservations, cancelled_data, by filtering the data, then call value_counts on country, which returns the values in descending order, and keep the top 10 in a variable. We will show it as a pie chart. We set the figure size, add a title, and plot the pie chart with the top 10 counts and their index as labels. Portugal has seen the highest share of cancellations, 70%. In the other countries, such as Brazil, France, and Italy, it is much lower. So if we want to give a suggestion to the hotels, it is this: for Portugal they should improve the facilities, correct the prices, and run promotions such as discounts, campaigns, advertising, and marketing so that cancellations decrease. That will happen only if you advertise a lot, keep the price low, or provide a lot of facilities at the same low price.
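A rough sketch of both steps on hypothetical rows. The video does not say which aggregate is applied to adr after grouping, so the mean used here is an assumption, as is the name `top_10_country`:

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; not needed inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample of reservations.
df = pd.DataFrame({
    "is_canceled": [1, 1, 1, 1, 0, 1, 0, 1],
    "country": ["PRT", "PRT", "PRT", "GBR", "PRT", "ESP", "GBR", "GBR"],
    "month": [1, 1, 8, 8, 1, 1, 8, 1],
    "adr": [120.0, 100.0, 60.0, 80.0, 90.0, 110.0, 70.0, 130.0],
})

# Keep only the canceled reservations.
cancelled_data = df[df["is_canceled"] == 1]

# adr per month for canceled reservations (mean is an assumption).
adr_by_month = cancelled_data.groupby("month")[["adr"]].mean().reset_index()

# value_counts sorts in descending order, so head(10) is the top 10.
top_10_country = cancelled_data["country"].value_counts().head(10)

plt.figure(figsize=(8, 8))
plt.title("Top 10 countries with reservations canceled")
plt.pie(top_10_country, autopct="%.2f", labels=top_10_country.index)
```

With only three countries in the sample the pie has three slices; on the full data set `head(10)` limits it to ten.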
If you provide that, there is a chance that these cancellations will come down. Next, let's see the clients and where they are coming from: are they coming from online travel agents or offline travel agents? Our hypothesis was that the majority of clients come from offline travel agents to make their reservations. So is it true or not? Let's see. For this we simply use value counts. We have used value counts the most in this project; it is a very good function that gives a lot of detail about a data set. There is a market segment column in which all these values, such as the travel agents, are present, so we will use that and call value counts. Now you can see that around 56,000 customers are coming from online travel agents, 24,000 are coming from offline travel agents, and about 12,000 are coming directly to the hotel to make the reservation. We can also see the percentages by setting normalize equal to True. You can see here that only around 20% is coming from offline travel agents, so our hypothesis is wrong here. We thought that offline travel agents send more customers to hotels, but it is the online ones. Nowadays it is the era of online: if anyone wants to go anywhere, they first look online for the hotel nearest to the places they want to visit, and then they make their reservation there. So most of the customers are coming from online travel agencies. Let's see the same counts and percentages for the reservations that have been canceled. We had created a data frame, cancel data, so we will perform the same thing there. You can see that here too, around 47% of the cancellations are coming from online travel agents.
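The value-counts check described above can be sketched like this. The rows are made up, and the column name `market_segment` and its labels (`Online TA`, `Offline TA/TO`, `Direct`, `Groups`) are assumptions for illustration.

```python
import pandas as pd

# Made-up rows; labels and column names are assumed.
df = pd.DataFrame({
    "market_segment": ["Online TA", "Online TA", "Online TA", "Offline TA/TO",
                       "Direct", "Online TA", "Offline TA/TO", "Groups"],
    "is_canceled":    [1, 0, 1, 0, 0, 0, 1, 1],
})

# Absolute number of reservations per market segment, largest first.
segment_counts = df["market_segment"].value_counts()
print(segment_counts)

# normalize=True returns fractions of the total instead of counts.
segment_share = df["market_segment"].value_counts(normalize=True)
print(segment_share)

# Same breakdown, but only for canceled reservations: each value is that
# segment's share of all cancellations, not the segment's own cancellation rate.
cancelled_data = df[df["is_canceled"] == 1]
cancelled_share = cancelled_data["market_segment"].value_counts(normalize=True)
print(cancelled_share)
```

Note the difference in what the two shares mean: a segment that sends the most customers will usually also account for the most cancellations, so the two tables are best read side by side.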
With the online travel agents, this is also a problem: reservations are happening online, but around 47% of the cancellations also come from there. So what can be the possible reasons? It can be something like this: we make the reservation online, but when we actually go and see the hotel, it is nothing special. The photos shown by the online travel agent are very good, it looks like a very good three-star hotel, the rooms are big with enough space, and the facilities are good. But when we go there ourselves, we see that there is nothing like that, and so the cancellation rate is very high. So the problem is the gap between the actual facilities and what the online travel agents have written about them, and that is why most of the cancellations happen. The hotel should take care of this first: it should show online the same thing as the hotel really is. That will reduce the cancellation rate. Now let's check the ADR of canceled and non-canceled reservations. I have already written the code, so I copy and paste it here just to save time. What I have done here is this: first, from cancel df, which is the cancel data, I used the group by function, grouped by reservation status date, and for the ADR we have taken the mean. Then we reset the index and sorted the values, so that we get the average daily rate date-wise in increasing order. We have done the same thing with the non-canceled data frame. Here I create the non-canceled data, which we had not created yet; it is where canceled equals zero. Here the plot does not look very clean, and you can see these lines. The possible reason is that our data is slightly spread out: there is very little data in 2015 and also after 2017, which means there is data for one day, and then the next data comes 30 days later.
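The date-wise average daily rate for canceled and non-canceled reservations, as described above, can be sketched as follows. The data is made up, the column names are assumed, and the helper `daily_mean_adr` is a hypothetical name introduced only for this example. The last step, measuring the gaps between consecutive dates, is an added check to show why sparse dates make the line plot look jagged.

```python
import pandas as pd

# Made-up rows; column names are assumed.
df = pd.DataFrame({
    "reservation_status_date": pd.to_datetime([
        "2015-07-01", "2016-03-10", "2016-03-10", "2016-03-11",
        "2016-03-11", "2017-10-15", "2016-03-10",
    ]),
    "is_canceled": [1, 1, 1, 1, 0, 1, 0],
    "adr": [50.0, 100.0, 120.0, 90.0, 80.0, 70.0, 60.0],
})

def daily_mean_adr(frame):
    """Mean ADR per status date, sorted by date (hypothetical helper)."""
    out = frame.groupby("reservation_status_date")[["adr"]].mean().reset_index()
    return out.sort_values("reservation_status_date").reset_index(drop=True)

cancelled_df_adr = daily_mean_adr(df[df["is_canceled"] == 1])
not_cancelled_df_adr = daily_mean_adr(df[df["is_canceled"] == 0])
print(cancelled_df_adr)
print(not_cancelled_df_adr)

# Days between consecutive dates; large values mark the sparse stretches.
gaps = cancelled_df_adr["reservation_status_date"].diff().dt.days
print(gaps)
```

Plotting `adr` against `reservation_status_date` for both frames on the same axes gives the two lines discussed in the video.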
Then there is data two days after that, so there is a little inconsistency in the data set, and you can see the gaps here. So we will filter the data from 2016 to September 2017 and plot only that. What we have done is filter the data from 2016 to September 2017 for both data frames, canceled and not canceled. After this we plot again, from 2016 to the end. In comparison, this plot is clearer and gives more information. The title has come out a little too small, so we increase it a bit. The orange line, which is canceled, is higher in comparison. This is proof that the price has an effect on cancellations, so the average daily rate is the factor that most influences the cancellation rate. As you can see, where the average daily rate is higher, cancellations are also high, and the average daily rate of the not-canceled reservations is quite low. Here it has increased a little, and then again the canceled average daily rate decreases. These spikes that you are seeing are nothing special; they are just on weekends or at the end of the month. It is seen that at the end of the month everyone goes to hotels, goes sightseeing, and visits places, so the hotel charges go up a bit. Now, this project is meant to be presented in front of a non-technical person, so how will you present it? In the form of a report. So we will make a report of it. At the start we prepared the problem statement, the business problem, and the research. After that we collected the data, and after collecting the data we cleaned it.
Then we performed the analysis, and the last step is to present it: present the analysis, the data set, and whatever findings you have, in the form of a report or a dashboard. You can also create a dashboard, but I will make a report here. Understand that we have to present the findings to a non-technical person. He will not understand this code; he needs the analysis in text format with visualizations. So for that I made a report here, and at the end of it we give some suggestions. Whatever visualization you put in, it should have a proper explanation in your report. You must have heard the name Storytelling with Data. What is storytelling with data? It is nothing but what we call a report: what I am explaining to you, you should explain as a story, the story of what came out of the data set once the analysis was performed. So this is a kind of story. For the first graph: it is obvious that a significant number of reservations have not been canceled, around 63% to 64%, while 37% of clients canceled their reservation, which has a significant impact on the hotels' earnings. So here I have explained what the bar graph represents: the bar for not canceled is higher, so there are still a lot of reservations that are not getting canceled, but 37% of clients still cancel, and that is not a small number that we can neglect. 37% is high, so it is impacting the hotels' earnings significantly. Next is the reservation status in the different hotels. From the graph, it is possible that resort hotels are more expensive than those in cities. So the conclusion we came to from this graph about why bookings are fewer in resort hotels is that a possible reason is the very high price, which we also showed with this graph. You can see from this line graph that the price of the city hotel is lower in comparison to the resort hotel.
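The headline numbers quoted in the report above, the share of canceled reservations and the price difference between hotel types, are the kind of figures that can be computed directly before writing them up. A minimal sketch on made-up rows follows; the column names `hotel`, `is_canceled`, and `adr` and the hotel labels are assumptions, and the values do not reproduce the 63% and 37% from the video.

```python
import pandas as pd

# Made-up rows: 6 city hotel bookings and 4 resort hotel bookings.
df = pd.DataFrame({
    "hotel": ["City Hotel"] * 6 + ["Resort Hotel"] * 4,
    "is_canceled": [0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
    "adr": [80.0, 90.0, 100.0, 70.0, 60.0, 80.0, 150.0, 130.0, 170.0, 150.0],
})

# Fraction of reservations that were not canceled (0) and canceled (1).
cancel_share = df["is_canceled"].value_counts(normalize=True)
print(cancel_share)

# Mean average daily rate per hotel type.
mean_adr_by_hotel = df.groupby("hotel")["adr"].mean()
print(mean_adr_by_hotel)
```

In a report, each of these numbers would sit next to its chart with one or two sentences explaining what it means for the hotel.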
We have also written about it here: the line graph shows that on certain days the average daily rate for a city hotel is less than that of a resort hotel, and on other days it is even less. It goes without saying that weekends and holidays may see a rise in resort hotel rates. So all these spikes are, as we concluded, basically the weekends and holidays. Next, we have developed the grouped bar graph to analyze the months with the highest and lowest reservation levels according to reservation status. As can be seen from both the number of confirmed reservations and the number of canceled reservations, the not-canceled reservations are highest in August, but in January the reservations that get canceled are highest. A low price leads to fewer cancellations and a high price leads to more cancellations. This bar graph demonstrates that cancellations are most common when prices are greatest and least common when they are lowest. Therefore, the cost of accommodation is solely responsible for the cancellations. Our first research question is complete here: what are the factors that affect reservation cancellations? The most important factor is the price. Next, the country with the highest number of cancellations is Portugal, at 70%. Let's check the area from where guests are visiting the hotels and making reservations: is it direct, groups, or online or offline travel agents? Around 46% of the clients come from online travel agencies, whereas 27% come from groups, and only 4% of clients book hotels directly by visiting them and making a reservation. Now, the last graph shows the reservations, and this last graph fully supports our research question and our hypothesis. At the end we have given some suggestions for the hotels. First, in order to prevent cancellations of reservations, hotels could work on their pricing strategies and try to lower the rates for specific hotels based on location. They can also provide some discounts to the consumers. As the ratio of cancellations to non-cancellations is higher in the resort hotel than in the city hotel, the hotels should provide a reasonable discount on room prices on weekends or holidays. In the month of January, hotels can start campaigns or marketing with
a very reasonable amount to increase their revenue, as cancellations are the highest in this month. And lastly, they can also increase the quality of their hotels and their services, mainly in Portugal, to reduce the cancellation rate. So these are some suggestions that we have given to the hotels, based on the data analysis we have done on the data set. So this is the complete project, and this is the way you have to create a project. First of all you have to do the data analysis as a complete project: create the problem statement, then find the data, collect it, then explore it and clean it. After that you have to analyze it and get useful insights, and then you have to present those insights in the form of reports or a dashboard. So those are the project steps. If you modify the project, then definitely tell me in the comment section. Also tell me how you liked this step-by-step project and how I explained it. If I have missed any step, or you want to know something else, such as the detailed steps of the project, you can tell me that in the comment section too. And if you modify this project, please give the link in the comment section so that I can see how innovative and how creative you have been in making this project better. If you liked this video, please subscribe to the channel, like it, tell me how you liked the video in the comment section, and share it with your friends. Thank you.
