Wednesday, June 5, 2019

Data Mining Techniques in Airline Industry

Purpose and Scope

Around the world, the airline industry can be described in a few words: intensely competitive and dynamic. The industry generates billions of dollars every year but still has a cumulative profit margin of less than 1% [1]. Many airlines are trying to recover from deep debt. The reasons are manifold: fuel prices, high cyclicality and seasonality, fierce competition, high fixed costs, and many other issues related to security and passenger safety.

To secure the best economic outcome, airlines are turning to their most valuable asset: data. Data used in conjunction with data mining techniques enables a comprehensive, intelligent management and decision-making system. Achieving these benefits in a timely and intelligent manner can lead to lower operating costs, better customer service, market competitiveness, increased profit margins, and shareholder value gains.

The purpose of this paper is to demonstrate applications of data mining techniques to multiple aspects of the airline business. Examples include predicting the number of domestic and international passengers from a particular city or airport; dynamically pricing tickets depending on seasonality and demand; exploring the frequent flyer database to prepare for a CRM implementation; making operational decisions about catering, personnel, and gate traffic flow; and assisting security agencies in ensuring secure and safe flights for passengers, especially after the 9/11 incident.

Forecasting the Number of Passengers by Applying Data Mining Techniques

Forecasting is critical to any business for planning and revenue management, especially in the airline industry, where a lot of planning is required to buy or lease new aircraft, hire crew members, find new slots at busy airports, and obtain approvals from many aviation authorities.

Air travel involves a lot of seasonality and cyclicality. Passengers are more likely to fly to some destinations depending on the time of year. Business travelers are more likely to travel on weekdays than on weekends. Early morning and evening flights are preferred by business travelers who want to accomplish a day's work at their destination and return the same day.

To forecast the number of passengers, an artificial neural network (ANN) can be used. The purpose of a neural network is to learn to recognize patterns in given data. Once the network has been trained on samples of the data, it can make predictions by detecting similar patterns in future data.

The growth factors that influence air travel demand depend on several things. Mauro Calvano [2], in the Transport Canada aviation forecast 2002-2016, considered 12 major socio-economic factors:

1. GDP
2. Personal disposable income
3. Adult populations
4. US economic outlook
5. Airline yield
6. Fleet/route structure/average aircraft size
7. Passenger load factors
8. Labor cost and productivity
9. Fuel cost/fuel efficiency
10. Airline costs other than fuel and labor
11. Passenger traffic allocation assumptions
12. New technology

Factors 1 to 5 relate to the demand side of the forecast, factors 6 to 10 relate to operations and the supply side, and factors 11 and 12 represent structural changes.

The historical data used to develop the forecast model is called the estimation set. A portion of the overall available data is reserved for validating the accuracy of the developed forecast model.
This reserved data set is called the forecasting set because no information contained in it is used in any form during the development of the forecast model. The data in the forecasting set are used for testing the extrapolative properties of the developed forecast model. The estimation set is further divided into a training set and a testing set. Information in the training set is used directly for the determination of the forecast model, whereas information in the testing set is used indirectly for the same purpose.

Figure 1: Forecasting Process Model

For a given ANN architecture and training set, the basic mechanism behind most supervised learning rules is the updating of the weights and bias terms until the mean squared error (MSE) between the output predicted by the network and the desired output (the target) is less than a pre-specified tolerance.

Neural networks can be represented as layers of functional nodes. The most general form of a neural network model used in forecasting can be written as

$Y = F[H_1(x), H_2(x), \ldots, H_n(x)] + u$

where Y is the dependent (output) variable, x is the set of input (influencing) variables, F and the H_i are network functions, and u is the model error. The input layer is connected to a hidden layer. The H_i are the hidden-layer nodes and represent different nonlinear functions. Each node in a layer receives its input from the preceding layer through links with assigned weights, which are adjusted using an appropriate learning algorithm and the information contained in the training set.

Figure 2: ANN Architecture

Abdullah Omer BaFail [3] studied forecasting the number of airline passengers in Saudi Arabia. He selected the most influential factors to forecast the number of domestic passengers in different cities of Saudi Arabia.

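
As a rough illustration of the model form Y = F[H1(x), ..., Hn(x)] + u and of the MSE stopping rule, the sketch below computes the forward pass of a network with one hidden layer and checks the error against a tolerance. The tanh hidden nodes, the linear output node, and all function and parameter names are assumptions made for this example; they are not taken from BaFail's study, and the weight-update step itself is left out.

```python
import math


def forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Compute F[H1(x), ..., Hn(x)] for one input vector x.

    Assumption: each hidden node Hi is tanh(w_i . x + b_i) and F is a
    weighted sum of the hidden outputs plus a bias.
    """
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sum(v * h for v, h in zip(output_weights, hidden)) + output_bias


def mean_squared_error(predicted, target):
    """Average squared difference between network outputs and targets."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def converged(predicted, target, tolerance):
    """Training would stop once the MSE drops below the tolerance."""
    return mean_squared_error(predicted, target) < tolerance
```

In a full training loop, a learning algorithm such as backpropagation would adjust the weights after each pass over the training set until `converged` returns true.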
For Dhahran, BaFail [3] selected factors such as oil gross domestic product for the last 6 years, private non-oil gross domestic product, imports of goods and services for the last 10 years, and population size for the last 2 years.

The actual and forecasted numbers of domestic and international passengers for the city of Dhahran for the years 1993 through 1998 are shown below. The forecasts underestimated actual travel. The Mean Absolute Percentage Error (MAPE) is about 10% for domestic travel and about 3% for international travel.

Figure 3: Forecasting results from Abdullah Omer BaFail [3]

My takeaway from BaFail [3] is that an efficient forecasting model can be built using an ANN if the right influencing indicators are used. In this study, the influential indicators include oil gross domestic product and per capita income in the domestic and international sectors. In view of the fluctuating passenger usage of airline services in Saudi Arabia, certain suggestions were made. Most of these recommendations aimed to improve the flexibility of the system in responding to fluctuations in demand and supply. A hub-and-spoke model was also suggested as a solution in certain sectors, to increase flexibility in adjusting capacity allocations across markets as new information about demand conditions becomes available.

Application of Data Mining Techniques to Predict Airline Passenger No-show Rates

Airlines overbook flights based on the expectation that some percentage of booked passengers will not show up for each flight. Accurate forecasts of the expected number of no-shows for each flight can increase airline revenue by reducing the number of spoiled seats (empty seats that might otherwise have been sold) and the number of involuntary denied boardings at the departure gate.
Typically, the simplest approach is to use average no-show rates of historically similar flights, without the use of passenger-specific information.

Lawrence, Hong, and Cherrier [4] predicted no-show rates in their research paper using specific information on the individual passengers booked on each flight.

Airlines offer multiple fares in different booking classes. The number of seats allocated to each booking class is driven by demand for each class, such that revenue is maximized. For example, a few seats can be held for last-minute travelers paying high fares, while a number of seats are sold in lower-fare classes earlier in the booking process. Terms and conditions for cancellations and no-shows also vary by class.

No-shows result in lost revenue if the flight departs with empty seats that might otherwise have been sold. Accurate forecasts of the expected number of no-shows for each flight are highly desirable: under-prediction of no-shows leads to loss of potential revenue from empty seats, while over-prediction can produce a significant cost penalty associated with denied boardings at the departure gate and also creates customer dissatisfaction.

In the simplest model, the overbooking limit is taken as the capacity plus the estimated number of no-shows. Bookings are offered up to this level. No-show numbers are predicted using time-series methods such as taking the seasonally weighted moving average of no-shows for previous instances of the same flight.

Figure 4: No-show trend over days to departure. Source: Lawrence, Hong, Cherrier [4]

The simple model does not take account of specific characteristics of the passengers. Lawrence, Hong, and Cherrier [4] used classification methods in their study. Similarly, Kalka and Weber [5] at Lufthansa used induction trees to compute passenger-level no-show probabilities and compared their accuracy with conventional, historical-based methods.

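
The simple model described above can be sketched in a few lines. This is only an illustration under stated assumptions: the weights stand in for whatever seasonal weighting an airline would actually use, rounding the estimate down is a conservative choice made for this example, and the function names and sample numbers are hypothetical.

```python
import math


def weighted_moving_average(no_shows, weights):
    """Weighted average of no-show counts from previous instances of a flight.

    Assumption: `weights` holds one non-negative weight per historical
    observation (for example, larger for the same season or recent dates).
    """
    if not no_shows or len(no_shows) != len(weights):
        raise ValueError("need one weight per historical observation")
    return sum(n * w for n, w in zip(no_shows, weights)) / sum(weights)


def overbooking_limit(capacity, no_shows, weights):
    """Capacity plus the estimated number of no-shows.

    Assumption: the estimate is rounded down so that the limit errs on
    the side of fewer denied boardings.
    """
    expected = weighted_moving_average(no_shows, weights)
    return capacity + math.floor(expected)


# Hypothetical history for one flight: no-show counts and their weights.
history = [10, 12, 14]
weights = [1, 2, 3]
print(overbooking_limit(150, history, weights))
```

Bookings would then be accepted up to the returned limit rather than up to the physical capacity.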
I summarize the approach and results of Lawrence, Hong, and Cherrier [4] briefly below.

Whenever a ticket is booked, a Passenger Name Record (PNR) is generated and all the passenger information is recorded. The PNR data includes, for each passenger, the specifics of all flights in the itinerary, the booking class, and passenger-specific information such as frequent-flier membership, ticketing status, and the agent or channel through which the booking originated. Each PNR also records whether the passenger was a no-show for the specified flight.

In the simplest model, the mean no-show rate over a group of similar historical flights is computed. The mean is in turn used to predict the number of no-shows over all booking classes.

The passenger-level model can be implemented using any classification method capable of generating normalized probabilities. The PNR records are partitioned into segments, and separate predictive models are developed for each segment. In passenger-level modeling, each passenger is characterized using the PNR details. Let $X_i$, $i = 1, \ldots, I$, denote the $I$ features associated with each passenger. Combining all features yields the feature vector

$X = [X_1, \ldots, X_I]$

Each passenger $n = 1, \ldots, N$ booked on flight $m$ is represented by the vector of feature values

$x_{m,n} = [x_{m,n,1}, \ldots, x_{m,n,i}, \ldots, x_{m,n,I}]$

The predicted no-show rate is known from the historical model, and the passenger is assumed to inherit this rate. The passenger-level predictive model is then stated as follows: given a set of class labels $c_{m,n}$, a set of feature vectors $x_{m,n}$, and a cabin-level historical prediction $\mu_m^{hist}$, predict the output class of passenger $n$ on flight $m$:

$P(C = c_{m,n} \mid \mu_m^{hist}, X = x_{m,n})$

We are specifically interested in the no-show probability, $c_{m,n} = NS$, and write this probability in the simplified form

$P(NS \mid \mu_m^{hist}, x_{m,n})$

The number of no-shows in the cabin is estimated as

$\sum_{n=1}^{N} P(NS \mid \mu_m^{hist}, x_{m,n})$

Summing the probabilities over all passengers in the cabin gives the expected number of no-shows; dividing by the number of booked passengers gives the no-show rate for the cabin.
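
The aggregation step, from passenger-level probabilities to a cabin-level estimate, can be illustrated as follows. The probabilities are assumed to come from some already-trained classifier; the function names and the sample values are hypothetical.

```python
def expected_no_shows(probabilities):
    """Sum of per-passenger no-show probabilities for one cabin."""
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must lie in [0, 1]")
    return sum(probabilities)


def cabin_no_show_rate(probabilities):
    """Expected no-shows divided by the number of booked passengers."""
    if not probabilities:
        raise ValueError("cabin has no booked passengers")
    return expected_no_shows(probabilities) / len(probabilities)


# Hypothetical classifier outputs for four passengers booked in one cabin.
cabin = [0.25, 0.5, 0.25, 0.0]
print(expected_no_shows(cabin), cabin_no_show_rate(cabin))
```
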
An analogous approach can also be used to predict no-show rates at the fare-class level.

Lawrence, Hong, and Cherrier [4] compare results computed using the historical, passenger-level, and cabin-level models. The models were built using approximately 880,000 PNRs booked on 10,931 flights and evaluated against 374,900 PNRs booked on 4,088 flights. The figure shows a conventional lift curve computed using three different implementations of the passenger-level model.

Figure 5: Gain Charts. Source: Lawrence, Hong, Cherrier [4]

Each point on the lift curve shows the fraction of actual no-shows observed in a sample of PNRs selected in order of decreasing no-show probability. The diagonal line shows the baseline case in which the probabilities are assumed to be drawn from a random distribution. The three implementations of the passenger-level model identify approximately 52% of the actual no-shows in the first 10% of the sorted PNRs.

This shows that data mining models incorporating specific information on individual passengers can produce more accurate predictions of no-show rates than conventional, historical-based statistical methods.

Application of Data Mining Techniques to Customer Relationship Management Strategies

Today, most industries use frequency marketing programs as a strategy for retaining customer loyalty, in the form of points, miles, dollars, beans, and so on. Airlines are big fans of this: Kingfisher's Kingmiles, Jet Airways' Jet Privilege, American Airlines' AAdvantage, Japan Airlines' Mileage Bank, KrisFlyer Miles, and others all seem to have carved out their own identities.

A frequent flyer program presents an invaluable opportunity to gather customer information. It helps to understand behavioral patterns and to discover new opportunities for customer acquisition and retention.
This helps airlines identify their most valuable customers and the appropriate strategies to use in developing one-to-one relationships with them.

The objectives of applying data mining to frequent flyer customer data can be many, but ideally they are as follows:

- Customer segmentation
- Customer satisfaction analysis
- Customer activity analysis
- Customer retention analysis

Some examples in each category are:

- Classify customers into groups based on sectors most frequently flown, class, period of the year, time of day, and purpose of the trip.
- Which types of customers are more valuable?
- Do the most valuable customers receive value for money?
- What are the attributes and characteristics of the most valuable customer segments?
- What type of campaign makes the best use of resources?
- What are the opportunities for up-selling and cross-selling, for example hotel bookings, upgrades to the next class, credit cards, etc.?
- Design packages or groupings of services.
- Customer acquisition.

Yoon [6] designed a database knowledge discovery process consisting of five steps: selecting the application domain, target data selection, pre-processing data, extracting knowledge, and interpretation and evaluation. This study refers to the Yoon process to deal with three mining phases for airlines, namely the pre-process, data-mining, and interpretation phases, as illustrated in the figure below.

Figure 6: Database knowledge discovery process. Source: Yoon [6]

Some straightforward solutions that can also be scaled up in the future can be implemented, such as K-means, Kohonen self-organizing networks, and classification trees.

The K-means algorithm is applied to the customer data, assigning each customer to the closest existing cluster center. The K-means model is run with different numbers of clusters until the clusters are well separated.

In the case of classification trees (C5.0), a simple rule set is derived that uniquely classifies the complete database.
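
Returning to the K-means step described above, the following minimal sketch shows the assign-and-update loop and one way of comparing runs with different numbers of clusters. It is not the implementation used in any of the cited studies. Seeding the centers with the first k points, using the within-cluster sum of squares as the separation measure, and the two-feature customer records are all assumptions made for this example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Assign each point to the index of its closest cluster center."""
    return [
        min(range(len(centers)), key=lambda c: squared_distance(p, centers[c]))
        for p in points
    ]


def k_means(points, k, iterations=20):
    # Assumption: the first k points seed the centers so runs are repeatable.
    centers = [list(p) for p in points[:k]]
    for _ in range(iterations):
        labels = assign(points, centers)
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the previous center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign(points, centers), centers


def within_cluster_ss(points, labels, centers):
    """Total squared distance of points to their assigned centers."""
    return sum(squared_distance(p, centers[l]) for p, l in zip(points, labels))


# Hypothetical customers described by two scaled features.
customers = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (9.0, 8.0)]
for k in (1, 2, 3):
    labels, centers = k_means(customers, k)
    print(k, within_cluster_ss(customers, labels, centers))
```

A sharp drop in the within-cluster sum of squares followed by little further improvement is one common sign that the clusters are reasonably well separated at that k.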
For the classification trees, the attributes again have to be generated from the sequence of flight segments. The accuracy of the forecast for each segment is ensured by balancing the training set according to equally sized clusters. The number of resulting rules is controlled by setting a minimum number of records for each subgroup.

Maalouf and Mansour [7] based their study on 1,322,409 customer activity transactions and 79,782 passengers over a period of 6 years. They prepared the data using Z-score normalization, ran multiple queries, and transformed the data to create the clustering input records. They used the K-means and O-Cluster algorithms. The result generated by clustering provides customer segmentation with respect to important dimensions of customer needs and value. The table below summarizes the profiles produced by k-means clustering, including revenue mileage, number of services used, and customer membership period.

Figure 7: Clustering result on airline customer data. Source: Maalouf and Mansour [7]

The results generated by k-means clustering are used as a basis for the association rules algorithm. Two different scenarios were applied. The first scenario is based on Financial, Flight, and Hotel activities, with 1,896 records. The second scenario is based on flight activities, especially the sectors, with 1,867 records.

Figure 8: Association rules for best customer activities. Source: Maalouf and Mansour [7]

Some of the takeaways from the Maalouf and Mansour [7] study:

- Clustering using the k-means algorithm generated 9 different clusters, each with a specific profile.
- From the cluster analysis, the best customer clusters (those with higher mileage per passenger than other clusters) can be identified.
- A retention strategy is needed for these clusters.
- Cross-selling strategies can be formulated between clusters (for example, between clusters 15 and 11, or 13 and 17) because they are close in services value.
- The cluster analysis provides an opportunity for the airline to produce more revenue from a customer. For example, the airline could apply an up-selling strategy by selling a higher-fare seat depending on the cluster.
- Based on the cluster analysis, the airline may adopt an enhanced strategy for customers in particular clusters in order to increase services usage and revenue mileage per passenger.
- Marketing campaigns or special offers can be planned through association rule analysis. For example, customers using the Flight and Financial services never use the Hotel services, and customers using the Flight and Hotel services never use the Financial services.
- By analyzing the services used in different clusters, the airline can characterize services integration. This enables the airline to serve a customer the way the customer wants to be served.

Application of Data Mining Techniques to Understand the Impacts of Severe Weather

Severe weather has a major impact on air traffic and flight delays. Appropriate proactive strategies for different severe-weather days may reduce delays and cancellations. Understanding en-route weather impacts on flight performance is therefore an important step toward improving flight performance.

Zohreh and Jianping [8] proposed a framework for a data mining approach to the analysis of weather impacts on airspace system performance. This approach consists of three phases: data preparation, feature extraction, and data mining.
The data preparation phase includes the usual process of selecting data sources, data integration, and data formatting.

Figure 9: Framework proposed by Zohreh and Jianping [8]

They used three data sources: Airline Service Quality Performance (ASQP), Enhanced Traffic Management System (ETMS), and the National Convective Weather Forecast (NCWF) supplied by the National Center for Atmospheric Research. NCWF data from April through September 2000 was used to represent the severe weather season.

These data sets include, for each flight of ten reporting airlines, the scheduled, planned, and actual departure and arrival times, tail number, wheels off/on times, taxi times, planned and actual flight routes, and cancellation and diversion information, as well as flight frequencies between two airports, intended flight routes between two airports, flight delays, flight cancellations, and flight diversions.

The image segmentation step resulted in a set of severe-weather regions. For each of these regions, a set of weather features and a set of air traffic features are extracted. A day is described by a set of severe-weather regions, each having a number of weather and traffic features.

The study found that flight performance is strongly correlated with the number of blocked flights, the number of bad-weather regions, bad-weather airports, blocked distance, bad-weather longitude, bypass distance, bad-weather latitude, and the number of bad-weather pixels.

Clustering algorithms (such as K-means) can be applied in a similar way. The expectation is that days in the same cluster have similar weather impacts on flight performance. Zohreh and Jianping [8] generated clusters for the entire airspace. It was found that a cluster with worse weather almost always had bad performance. The clusters with a large percentage of blocked flights, bypass distance, and blocked distance had worse performance.
These results were promising and showed that days in a cluster have similar weather impacts on flight performance.

Another data mining approach that can be applied is classification. Classification can help discover the patterns or rules that have a significant impact on flight performance. Discovered rules may be used to predict whether a day will be a good or a bad performance day based on its weather. For example:

Rule for Good: if %BlockedFlights and BypassDistance then Good (n, prob)

There are different ways to apply a data mining approach to the analysis of weather impacts on airline performance. The results obtained from clustering and classification appear to be very meaningful for airlines and passengers planning ahead.

Application of Data Mining Techniques to Ensure the Safety and Security of Airline Passengers

The reaction to the terrorist attacks of 26/11 and 9/11 resulted in increased security at airports: allowing only ticketed passengers past the security gates and screening carry-on luggage more carefully for possible weapons. The question is whether these steps could have prevented the attacks, since the people involved had legitimate tickets and were carrying box cutters and razor blades, as any other passenger might.

What was unusual was the combination of their characteristics: none were U.S. citizens, all had lived in the U.S. for some period of time, all had connections to a particular foreign country, and all had purchased one-way tickets at the gate with cash.

Given the amount of data available about a passenger during ticketing, the records can be reviewed to characterize the relevant available passenger information. Given a passenger's name, address, and a contact phone number, various databases (public or private) can identify the social security number (SSN), from which much information is readily available (credit history, police record, education, employment, age, gender, etc.).
Since a large number of characteristics is available on any individual passenger, it is important to identify signals within the natural variability, or noise. A wrong prediction may lead either to falsely detaining an innocent passenger or to failing to detain a plane that carries a terrorist.

Airlines already collect much data on various flights. When the data come in the form of multiple characteristics on a single item, tools for multivariate data can be applied, such as classification and regression trees and multivariate adaptive regression splines. The security of air transportation can be improved substantially through modern, intelligent use of pattern recognition techniques applied to large linked databases.

Similarly, data mining techniques can be used for passenger safety. An air safety office plays a key role in ensuring that an aviation organization operates in a safe manner. Currently, air safety offices collect and analyze incident reports using a combination of manual and automated methods. Data analysis is done by safety officers who are very familiar with the domain. With data mining, one can find interesting and useful information hidden in the data that might not be found by simply tracking and querying the data, or even by using more sophisticated query and reporting tools.

A study by Zohreh Nazeri, Eric Bloedorn, and Paul Ostwald [10] found that discovering associations and distribution patterns in the data brings important insight. Another finding is that linking the incident reports to other sources of safety-related data, such as aircraft maintenance and weather data, could help in finding better causal relationships.

Summary

Business intelligence through efficient and appropriate data mining applications can be very useful in the airline industry.
Appropriate action plans derived from data mining analysis can result in improved customer service, help generate considerable financial lift, and set future strategy.
