Pacific Business Review International

A Refereed Monthly International Journal of Management Indexed With THOMSON REUTERS(ESCI)
ISSN: 0974-438X
Impact factor (SJIF): 6.56
RNI No.:RAJENG/2016/70346
Postal Reg. No.: RJ/UD/29-136/2017-2019
Editorial Board

Prof. B. P. Sharma
(Editor in Chief)

Dr. Khushbu Agarwal
(Editor)

Ms. Asha Galundia
(Circulation Manager)

Editorial Team

Mr. Ramesh Modi

Performance Pattern of PGDM Students Using Two Step Clustering

Author

Jaya Srivastava (Ph. D. Scholar,

Department of Computer Science,

Jaipur National University Jaipur INDIA,

email jsrivastava29@gmail.com,

mobile 9451478982)

Abhay K Srivastava* Assistant Professor,

Department of Decision Sciences,

Lucknow, INDIA, email abhay.srivastava@jaipuria.ac.in.

mobile 9454703657).

ABSTRACT

Clustering is the first step in discovering knowledge from databases. It is convenient for researchers who want to extract hidden information from databases. It groups data showing similar characteristics so as to bring homogeneity within each cluster and heterogeneity between clusters. It is termed unsupervised learning because it is not dictated in advance by a set of rules and boundaries. It simplifies a dataset by reducing it to fewer clusters, at the cost of losing some minor details through compression of the data.

Through SPSS one can easily perform clustering on sufficiently large data sets. SPSS offers three methods to group data without supervision: hierarchical clustering, k-means clustering, and two-step clustering. With a large data file, or where there is a mixed set of continuous and categorical variables, the SPSS two-step clustering procedure offers a convenient way to interpret clusters, which is a major concern in hierarchical and k-means clustering. In this paper we demonstrate the use of two-step clustering on the academic records of PGDM students in two consecutive sessions to discover hidden information. The objective of the study is to discover useful patterns in students’ performance in B-grade business schools that can help stakeholders formulate strategies to improve the quality of their student intake.

With this procedure we are able to easily extract interesting information that academic institutions can use to design their academic and training programs, improving students’ performance as well as their brand value.

Keywords: Two step clustering, Academic performance, two way ANOVA, Hierarchical Clustering, K-means Clustering, PGDM (Post Graduate Diploma in Business Management).

Introduction

Understanding the performance pattern of its students is a critical issue for any institute in today’s competitive era. If an institute fails to map the performance pattern, it becomes very difficult for it to survive in the long run. This is even more crucial for professional institutes.

With the use of simple techniques of Data Analysis, one can easily explore this pattern.

Clustering is one such technique; it is very useful for segmenting students showing better results from those showing poor results. Cluster analysis is simply a process of partitioning data so that the cases within each resulting group appear similar to one another. Cluster analysis tries to maximize within-group homogeneity and between-group heterogeneity.

The choice of a suitable clustering procedure depends on the number of cases and the type of variables. SPSS has three different procedures for clustering data: hierarchical cluster analysis, k-means clustering and two-step clustering.

In hierarchical clustering, a suitable statistic (mean or median) is selected to quantify the distance between two cases. Then a “bottom up” or “top down” method is chosen to form the groups. Since there can be as many clusters as there are cases, determining the number of clusters is a crucial issue, which depends on the type of linkage used to measure the similarity between groups.
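
The “bottom up” variant can be illustrated with a minimal sketch. It assumes one-dimensional marks, absolute difference as the distance and average linkage between groups; the function names and the marks are hypothetical and are not taken from the study or from SPSS.

```python
from itertools import combinations

def average_linkage(a, b):
    """Mean absolute distance over every pair of points drawn from clusters a and b."""
    return sum(abs(x - y) for x in a for y in b) / (len(a) * len(b))

def agglomerate(points, n_clusters):
    """Bottom-up clustering: start from singletons and repeatedly merge the two closest clusters."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest average-linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is still valid after the deletion
    return [sorted(c) for c in clusters]

# Hypothetical marks, for illustration only.
marks = [58.0, 60.5, 61.0, 70.0, 71.5, 72.0, 79.0]
groups = agglomerate(marks, n_clusters=3)
```

Stopping at a different `n_clusters` gives a different solution from the same merge sequence, which is why the choice of the number of clusters remains a judgment call in this method.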

In k-means clustering, the number of clusters to be formed is given initially. The algorithm iteratively estimates the cluster means and assigns each case to the cluster whose mean is nearest. Lastly, in two-step clustering, cases are first assigned to “pre-clusters.” In the second step, the pre-clusters are clustered using the hierarchical clustering algorithm. The number of clusters can be specified by the user or decided by the algorithm itself.
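
The k-means iteration described above can be sketched as follows. This is a minimal illustration under stated assumptions: one-dimensional data, user-supplied starting centers, and hypothetical marks. It is not the SPSS implementation.

```python
def k_means(values, centers, max_iter=100):
    """Plain 1-D k-means: alternate the assignment and mean-update steps until the centers stop moving."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(max_iter):
        # Assignment step: each value goes to the nearest current center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: each center becomes the mean of its cluster (unchanged if the cluster is empty).
        new_centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

# Hypothetical marks and starting centers, for illustration only.
marks = [58.0, 60.0, 62.0, 70.0, 72.0, 74.0]
centers, clusters = k_means(marks, centers=[58.0, 74.0])
```

Note that the number of clusters is fixed by the length of `centers`, which is exactly the pre-specification of k discussed in the next section.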

Limitations in Hierarchical and K-means Clustering Technique

Deciding the number of clusters is a critical decision in cluster analysis. Unfortunately, hierarchical methods do not provide enough support for this decision. They only provide distance-related information as a criterion for combining objects. We can use the gradient of the scree plot to decide the number of clusters, but since this is not a very objective decision, it may be erroneous.

K-means clustering can be used only with large datasets of interval- or ratio-scaled data. The major problem with the k-means algorithm is that the value of k must be specified before running it in SPSS. This is a critical decision that is not easy to make. Hence a hierarchical procedure is often used first to determine the number of clusters, with k-means applied afterwards.

For metric data, we can easily use either hierarchical or partitioning clustering techniques, but with mixed-type data sets these algorithms run into problems (Shih, Jheng & Lai 2010). Two-step clustering is an appropriate approach to handle this.

Two-Step Clustering

Two-step cluster analysis was developed by Chiu, Fang, Chen, Wang and Jeris (2001) to handle the problem of mixed variables, where different measurement scales are used for different variables. If, for example, one variable is on a real scale while another is on a nominal scale, both k-means and hierarchical clustering fail to perform the clustering. Two-step clustering can easily be applied to large data sets, and the analysis can be done in one go, which is important for very large data files. It produces a varying number of clusters from a mixed variable set that includes categorical as well as continuous variables.

The procedure performs the overall clustering in two stages. The first is the pre-cluster stage, which reduces the original data points to a much smaller number of initial sub-clusters so that traditional clustering methods can then be used effectively. In the second stage the sub-clusters are grouped, with two options for assessing goodness of fit: Akaike’s information criterion (AIC) and the Bayesian information criterion (BIC), both of which are based on the likelihood of the model. AIC yields a constant plus the distance between the actual but unknown likelihood function of the number of clusters that actually exist in the population and the fitted likelihood function of the model. BIC is based on the posterior probability of the model being true under certain Bayesian conditions. The algorithm of two-step clustering is based on two steps (Shih, Jheng & Lai 2010):

Step 1: It uses the co-occurrence concept, under which categorical variables are converted into continuous values. It explores the relationships between pairs of objects to find their similarity and create sub-clusters. The assumption behind co-occurrence is that if two items always occur together in one object, a strong similarity must exist between them.

Step 2: It takes the pre-clusters as input and groups them into the desired number of clusters based on the AIC or BIC. A hierarchical clustering method, in which clusters are merged recursively, is used very effectively in this stage (Fraley, 1998). Clusters are merged on the basis of proximity, and this process continues until all clusters are merged. Since the merging is recursive, it is easy to compare solutions with different numbers of clusters.
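
To make the role of AIC and BIC concrete, the sketch below uses their standard general definitions, AIC = -2 ln L + 2k and BIC = -2 ln L + k ln n, where L is the maximized likelihood, k the number of free parameters and n the number of cases. The candidate log-likelihoods and parameter counts are hypothetical, and simply picking the lowest BIC is an assumption made for illustration; the automatic procedure in SPSS is more involved than this.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike's information criterion (lower is better)."""
    return -2.0 * log_likelihood + 2.0 * n_params

def bic(log_likelihood, n_params, n_cases):
    """Bayesian information criterion (lower is better); penalizes parameters by ln(n)."""
    return -2.0 * log_likelihood + n_params * math.log(n_cases)

def best_by_bic(candidates, n_cases):
    """candidates maps number of clusters -> (log_likelihood, n_params); returns the key with the lowest BIC."""
    return min(candidates, key=lambda k: bic(*candidates[k], n_cases))

# Hypothetical fits for 2 to 5 clusters on 550 cases, for illustration only.
candidates = {
    2: (-2100.0, 8),
    3: (-2040.0, 12),
    4: (-2000.0, 16),
    5: (-1995.0, 20),
}
chosen = best_by_bic(candidates, n_cases=550)
```

In this made-up example the fifth cluster improves the log-likelihood only slightly, so its extra parameters cost more than they gain and the four-cluster solution has the lowest BIC.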

Literature Review

N. Sivaram (2010) surveyed clustering and classification algorithms for mining recruitment data and matched techniques to the problems identified. The study applied several data mining techniques, including k-means, fuzzy C-means clustering and decision tree classification, to the recruitment data of an industry.

Md. Hedayetul Islam Shovon (2012) predicted student academic performance by applying the k-means clustering algorithm. Class quizzes and mid-term and final exam marks were used as evaluation parameters. The study aimed to help teachers reduce the drop-out ratio to a significant level and improve the performance of students.

Oyelade, O. J. (2010) predicted students’ academic performance using k-means clustering. He reviewed different clustering techniques that could be applied in educational data mining to predict the academic performance of students, and their implications.

D. Kabakchieva (2013) found patterns in student data that could contribute to predicting Bulgarian students’ performance at university based on their personal and pre-university attributes.

Steinmayr and Spinath (2009) examined the extent to which motivational concepts contribute to the achievement of teenage students and predicted their success.

Farooq et al. (2011) used the t-test and ANOVA to investigate the effect of factors such as gender, age and nationality on students’ achievement. They used various learning methods to measure the effect on academic scores.

Wilson & Hardgrave (1995) found that the classification techniques such as discriminant analysis or logistic regression are more appropriate than the multiple linear regression to predict academic success or failure of students.

Kumar and Vijaylakshmi (2011) used a decision tree to predict students’ final exam results. The aim of the study was to help professors identify students who needed help, in order to improve their performance and clear the exams.

Erdogan and Timor (2005) studied the relationship between students’ university entrance examination results and their success using cluster analysis and the k-means algorithm. The study sought to determine patterns of academic success and failure, and thus to predict the likelihood of students dropping out or performing poorly.

Ogor (2007) developed a framework of performance prediction indicators by deploying a simple student performance assessment and monitoring system within a teaching and learning environment. The system focused on monitoring students’ performance in continuous assessment (tests) and examination scores in order to predict their final achievement status upon graduation.

Sembiring et al. (2011) used data mining techniques, including the kernel method, to analyze how student success is related to behavior, and developed a model of student performance predictors.

Dietz-Uhler and Hurn (2013) defined learning analytics in educational institutions, available learning analytics tools, and how faculty can make use of data in their courses to monitor and predict student performance. They also discussed several issues and concerns with the use of learning analytics in higher education.

Delavari et al. (2008) presented educational data mining capabilities in the context of the higher education system. They proposed an analytical guideline for higher education institutions to enhance their decision making processes. They also applied data mining techniques to discover new hidden knowledge that could be useful for decision making in higher education.

Research Methodology

As already discussed, SPSS two-step clustering provides an easy mechanism for examining clusters. To show its effectiveness in extracting useful information, we took student records for two successive academic years. The data were obtained from the placement departments of two B-category business schools. Students’ PGDM marks, used as the dependent variable, were recorded for two consecutive years (550 students in academic year 2014-16 and 572 students in academic year 2015-17). Students’ academic performance in 10th, 12th and graduation was also recorded for further analysis. Two independent variables, gender and background, were also used, since these two variables were found to contribute significantly to the analysis. This was verified using two-way ANOVA (Yahya Al-Nakeeb 2015); the output of this test is shown below. Two-step clustering was carried out in SPSS 13 to discover interesting patterns in students’ performance. Student data from the same colleges were used for both years to study the patterns.

The major objective is to extract, through two-step clustering, meaningful patterns of students’ performance in a professional course such as PGDM. Another objective is to find out whether background or gender plays a significant role in the performance pattern of PGDM students. Since students differ in background and gender, four hypotheses have been framed for the analysis.

H1: Gender plays an important role in the performance of PGDM students

H2: Background of PGDM students in their undergraduate level impacts on their performance.

H3: Students with commerce/ management background perform better in PGDM program.

H4: Irrespective of background, students having good track record in 10th, 12th, and graduation perform better in PGDM also

The first two hypotheses will be tested using two-way ANOVA; for the remaining two hypotheses, H3 and H4, useful performance patterns will be explored using two-step clustering.

Discussions

Before conducting two-step clustering, it is important to decide which categorical variables play a significant role in forming the clusters. Two-way ANOVA with profile plots is used to analyze the impact of the variables gender and background, as shown in Tables 1 and 2 and Figures 1 and 2.

Table No. 1: Descriptive Statistics of batch 14-16

Dependent Variable: PGDM Marks

| Gender | Background | Mean | Std. Deviation | N |
|---|---|---|---|---|
| Female | Commerce | 71.0171 | 7.97893 | 138 |
| | Management | 70.9343 | 10.35156 | 60 |
| | Science | 79.0750 | 5.99581 | 8 |
| | Arts | 62.5360 | 4.50871 | 10 |
| | Technical | 75.4191 | 9.20052 | 44 |
| | Total | 71.6647 | 9.02619 | 260 |
| Male | Commerce | 65.3552 | 8.53346 | 122 |
| | Management | 67.1710 | 7.98515 | 82 |
| | Science | 65.2250 | 13.50438 | 12 |
| | Arts | 60.5140 | 5.91358 | 10 |
| | Technical | 71.9888 | 6.67838 | 64 |
| | Total | 67.1603 | 8.62179 | 290 |

Table No. 2: Tests of Between-Subjects Effects (Two way ANOVA)

Dependent Variable: PGDM Marks

| Source | Type III Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Corrected Model | 7140.537(a) | 9 | 793.393 | 11.208 | .000 |
| Intercept | 967524.988 | 1 | 967524.988 | 13668.050 | .000 |
| Gender | 1680.832 | 1 | 1680.832 | 23.745 | .000 |
| Background | 3715.194 | 4 | 928.798 | 13.121 | .000 |
| Gender * Background | 564.572 | 4 | 141.143 | 1.994 | .094 |
| Error | 38225.167 | 540 | 70.787 | | |
| Total | 2685945.243 | 550 | | | |
| Corrected Total | 45365.704 | 549 | | | |

From the above table, both gender and background have a significant effect, but the interaction between gender and background is not significant (p = .094). Figure 1 suggests an interaction effect between gender and background, as there is a large difference in the mean marks of males and females among science-background students. But since the sample sizes for the science and arts backgrounds are very small, this does not produce a significant interaction effect in the ANOVA model.

Figure No.1: Plots on marginal means of PGDM marks considering gender and background.
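
The F values in Table 2 follow from the sums of squares and degrees of freedom reported there: each mean square is SS/df, and each F is the effect mean square divided by the error mean square. A minimal sketch of that arithmetic, using only the figures printed in Table 2 (the function and variable names are illustrative):

```python
def f_ratio(ss_effect, df_effect, ss_error, df_error):
    """F statistic for one effect: (SS_effect / df_effect) / (SS_error / df_error)."""
    ms_effect = ss_effect / df_effect
    ms_error = ss_error / df_error
    return ms_effect / ms_error

# Error row of Table 2 (batch 14-16).
SS_ERROR, DF_ERROR = 38225.167, 540

# Effect rows of Table 2: name -> (Type III sum of squares, df).
effects = {
    "Gender": (1680.832, 1),
    "Background": (3715.194, 4),
    "Gender * Background": (564.572, 4),
}

f_values = {
    name: f_ratio(ss, df, SS_ERROR, DF_ERROR) for name, (ss, df) in effects.items()
}
```

Rounded to three decimals, these ratios agree with the F column of Table 2. The significance values additionally require the F distribution with the corresponding degrees of freedom, which this sketch does not compute.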

Batch 15-17 can be analyzed similarly. Both gender and background are significant, i.e. the average performance of PGDM students depends on both gender and background. The interaction between gender and background also has a significant effect, as shown in Table 4. As shown in Figure 2, the performance of males and females varies remarkably for the commerce and management backgrounds. The same holds for technical-background students.

Table No. 3: Descriptive Statistics of batch 15-17

Dependent Variable: PGDM Marks

| Gender of Students | Background of Students | Mean | Std. Deviation | N |
|---|---|---|---|---|
| Female | Commerce | 71.1211 | 10.37952 | 142 |
| | Management | 62.4107 | 9.04535 | 30 |
| | Science | 72.3086 | 12.46124 | 14 |
| | Arts | 69.7700 | 6.81225 | 16 |
| | Technical | 77.8092 | 7.63355 | 52 |
| | Total | 71.4419 | 10.49654 | 254 |
| Male | Commerce | 67.4697 | 6.57465 | 154 |
| | Management | 66.0118 | 7.25925 | 68 |
| | Science | 65.7956 | 8.56755 | 18 |
| | Arts | 68.9067 | 2.42898 | 6 |
| | Technical | 72.2644 | 9.30590 | 72 |
| | Total | 68.1759 | 7.79756 | 318 |

Table No. 4: Tests of Between-Subjects Effects (Two way ANOVA)

Dependent Variable: PGDM Marks

| Source | Type III Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Corrected Model | 7835.236(a) | 9 | 870.582 | 11.986 | .000 |
| Intercept | 1067888.126 | 1 | 1067888.126 | 14702.399 | .000 |
| Gender | 373.205 | 1 | 373.205 | 5.138 | .024 |
| Background | 6009.157 | 4 | 1502.289 | 20.683 | .000 |
| Gender * Background | 1250.795 | 4 | 312.699 | 4.305 | .002 |
| Error | 40820.082 | 562 | 72.634 | | |
| Total | 2821600.447 | 572 | | | |
| Corrected Total | 48655.318 | 571 | | | |

Figure No.2: Plots on marginal means of PGDM marks considering gender and background.

In the coming sections A and B, the output of two-step clustering is discussed to study the performance patterns of PGDM students for academic batches 2014-16 and 2015-17 respectively.

A. Two step Clustering Output of SPSS for PGDM batch 2014-16

After deciding on the variables used in forming the clusters, we now examine the number of cases in the final cluster solution from two-step clustering, as shown in the tables below. Four clusters are formed. The largest cluster has 33.2% of the clustered cases and the smallest has 19.7%, as shown in Table 5. All clusters are of substantial size.

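Table No. 5: Cluster Distribution

| | N | % of Combined | % of Total |
|---|---|---|---|
| Cluster 1 | 136 | 24.8% | 24.7% |
| Cluster 2 | 182 | 33.2% | 33.1% |
| Cluster 3 | 108 | 19.7% | 19.6% |
| Cluster 4 | 122 | 22.3% | 22.2% |
| Combined | 548 | 100.0% | 99.6% |
| Excluded Cases | 2 | | 0.4% |
| Total | 550 | | 100.0% |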

Table 6 presents summary statistics of all four clusters for the different levels of study: 10th (High School), 12th (Intermediate), Graduation and PGDM.

Table No. 6: Summary Statistics of all four clusters

| Cluster | Highschool Mean | Highschool S.D. | Intermediate Mean | Intermediate S.D. | Graduation Mean | Graduation S.D. | PGDM Mean | PGDM S.D. |
|---|---|---|---|---|---|---|---|---|
| 1 | 69.3694 | 9.74142 | 70.5890 | 9.14710 | 62.0897 | 7.41374 | 71.1596 | 7.94958 |
| 2 | 66.5897 | 10.27900 | 66.1419 | 9.60194 | 66.7344 | 8.94679 | 68.1862 | 9.61742 |
| 3 | 73.3719 | 9.46423 | 70.9148 | 8.77387 | 69.1326 | 7.53547 | 73.3863 | 7.94643 |
| 4 | 65.9933 | 8.92106 | 65.8521 | 8.57972 | 57.6325 | 5.70008 | 65.3552 | 8.53346 |
| Combined | 68.4834 | 10.04976 | 68.1217 | 9.38115 | 64.0280 | 8.71692 | 69.3187 | 9.09412 |

If we examine the mean and standard deviation of each cluster, the combined average for PGDM is 69.32, which is the highest among the four levels of study. Comparing cluster-wise averages, cluster 3 shows the highest mean at all four levels.

From Table 7, clusters 2 and 3 consist of both males and females. Cluster 1 consists of females only, while cluster 4 consists of males only. There are 258 females and 290 males, so the numbers of males and females are roughly equal.
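
The combined PGDM mean in Table 6 is the size-weighted average of the four cluster means, with the cluster sizes taken from Table 5. A minimal sketch of that check (the function and variable names are illustrative):

```python
def combined_mean(sizes, means):
    """Weighted average of group means, weighting each group by its number of cases."""
    total = sum(sizes)
    return sum(n * m for n, m in zip(sizes, means)) / total

cluster_sizes = [136, 182, 108, 122]                 # N per cluster, Table 5
pgdm_means = [71.1596, 68.1862, 73.3863, 65.3552]    # PGDM mean per cluster, Table 6

overall = combined_mean(cluster_sizes, pgdm_means)
```

The result agrees with the combined PGDM mean of 69.3187 reported in Table 6. A simple unweighted average of the four cluster means would differ, because the clusters are not equally sized.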

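Table No. 7: Clusters Based on Gender

| Cluster | Female Frequency | Female Percent | Male Frequency | Male Percent |
|---|---|---|---|---|
| 1 | 136 | 52.7% | 0 | .0% |
| 2 | 78 | 30.2% | 104 | 35.9% |
| 3 | 44 | 17.1% | 64 | 22.1% |
| 4 | 0 | .0% | 122 | 42.1% |
| Combined | 258 | 100.0% | 290 | 100.0% |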

Looking at the clusters by graduation background in Table 8, cluster 1 consists only of students who graduated in commerce, cluster 2 consists of students with management, science and arts backgrounds, cluster 3 consists of students with a technical background only, and cluster 4 again consists of students with a commerce background. In terms of numbers, 258 students are from a commerce background, 142 from management and 108 from technical, while only 40 students have science or arts as their undergraduate background (20 each).

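Table No. 8: Clusters based on Background

| Cluster | Commerce No. | Commerce % | Management No. | Management % | Science No. | Science % | Arts No. | Arts % | Technical No. | Technical % |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 136 | 52.7% | 0 | .0% | 0 | .0% | 0 | .0% | 0 | .0% |
| 2 | 0 | .0% | 142 | 100.0% | 20 | 100.0% | 20 | 100.0% | 0 | .0% |
| 3 | 0 | .0% | 0 | .0% | 0 | .0% | 0 | .0% | 108 | 100.0% |
| 4 | 122 | 47.3% | 0 | .0% | 0 | .0% | 0 | .0% | 0 | .0% |
| Combined | 258 | 100.0% | 142 | 100.0% | 20 | 100.0% | 20 | 100.0% | 108 | 100.0% |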

Examining the Composition of the Clusters

Cluster composition can easily be examined with bar charts for categorical data and box plots for continuous data, though SPSS offers numerous other displays and tables to help researchers determine the composition of the clusters.

Bar charts are used to examine categorical variables such as gender. Gender appears to be an important variable in forming the clusters. Table 7 shows the percentage of males and females in each cluster. The gender distribution differs across clusters: cluster 1 has only females and cluster 4 has only males, while the gender composition of clusters 2 and 3 is almost the same.

Similarly, the background of the student is another categorical variable, analyzed in Table 8. While clusters 1 and 4 contain commerce-stream students only, cluster 3 contains only technical-stream students. Cluster 2 is a hybrid cluster containing students from the other streams. Commerce-stream students are the most numerous in PGDM programs. Students with science and arts backgrounds make up less than 8 percent. This shows that mostly commerce and technical background students are joining PGDM programs as an option for higher studies.

Here we have shown the box plot of PGDM marks only. In the plot, the average marks of each of the four clusters are shown by a small circle and the overall spread by the vertical line. The average mark of students in cluster 3 is far above the overall PGDM average. The performance of cluster 3 (technical-background students, both female and male) is above average at all levels from 10th standard to PGDM (refer Table 6), while the performance of cluster 4, which consists of male commerce-background students, is far below average at all four levels. Another important observation concerns cluster 2, whose average marks are below the combined average in 10th, 12th and PGDM, although not in graduation (Table 6). This cluster consists largely of students who opted for management programs in graduation, such as BBA and BBM (Bachelor of Business Administration and Bachelor of Business Management).

Figure No.3: Performance of students in PGDM program

B. Two step Clustering Output of SPSS for PGDM batch 2015-17

The student records of batch 15-17 are also analyzed using two-step clustering in SPSS 13. The same process is repeated to find useful patterns in the dataset. The clusters are evenly distributed, with an almost equal number of observations in each (Table 9). In this dataset too, four clusters are formed, with almost the same distribution pattern as in the previous dataset. The combined strength of students is 572.

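Table No. 9: Cluster Distribution

| | N | % of Combined | % of Total |
|---|---|---|---|
| Cluster 1 | 154 | 26.9% | 26.9% |
| Cluster 2 | 124 | 21.7% | 21.7% |
| Cluster 3 | 152 | 26.6% | 26.6% |
| Cluster 4 | 142 | 24.8% | 24.8% |
| Combined | 572 | 100.0% | 100.0% |
| Total | 572 | | 100.0% |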

Table No. 10: Summary Statistics of clusters based on Academic performance

| Cluster | PGDM first year Mean | PGDM first year S.D. | Graduation Mean | Graduation S.D. | 12th standard Mean | 12th standard S.D. | 10th standard Mean | 10th standard S.D. |
|---|---|---|---|---|---|---|---|---|
| 1 | 67.4697 | 6.57465 | 58.2745 | 6.82192 | 66.8253 | 9.93256 | 64.2278 | 8.65343 |
| 2 | 74.5897 | 9.03862 | 68.0132 | 8.63722 | 68.0632 | 8.42724 | 73.8171 | 8.91064 |
| 3 | 66.3653 | 8.57294 | 65.0680 | 6.76636 | 64.1696 | 8.12345 | 67.2550 | 9.73112 |
| 4 | 71.1211 | 10.37952 | 62.8162 | 7.17765 | 71.7386 | 9.03846 | 70.5411 | 9.59673 |
| Combined | 69.6262 | 9.23097 | 63.3185 | 8.11756 | 67.6077 | 9.32511 | 68.6783 | 9.87072 |

Table 10 summarizes the marks in PGDM (first year), graduation, 12th and 10th standard respectively. Cluster 2 has the highest average marks in PGDM and cluster 3 the lowest.

Table No. 11: Clusters based on gender of Students

| Cluster | Female Frequency | Female Percent | Male Frequency | Male Percent |
|---|---|---|---|---|
| 1 | 0 | .0% | 154 | 48.4% |
| 2 | 52 | 20.5% | 72 | 22.6% |
| 3 | 60 | 23.6% | 92 | 28.9% |
| 4 | 142 | 55.9% | 0 | .0% |
| Combined | 254 | 100.0% | 318 | 100.0% |

As shown in Table 11, cluster 1 consists of males only and cluster 4 of females only, while clusters 2 and 3 contain both male and female students, with males slightly outnumbering females.

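Table No. 12: Clusters based on Background of Students

| Cluster | Commerce No. | Commerce % | Management No. | Management % | Science No. | Science % | Arts No. | Arts % | Technical No. | Technical % |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 154 | 52.0% | 0 | .0% | 0 | .0% | 0 | .0% | 0 | .0% |
| 2 | 0 | .0% | 0 | .0% | 0 | .0% | 0 | .0% | 124 | 100.0% |
| 3 | 0 | .0% | 98 | 100.0% | 32 | 100.0% | 22 | 100.0% | 0 | .0% |
| 4 | 142 | 48.0% | 0 | .0% | 0 | .0% | 0 | .0% | 0 | .0% |
| Combined | 296 | 100.0% | 98 | 100.0% | 32 | 100.0% | 22 | 100.0% | 124 | 100.0% |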

Table 12, based on students’ undergraduate stream, shows that clusters 1 and 4 contain only commerce-stream students while cluster 2 contains only technical-stream students. Cluster 3 contains students with management, science and arts backgrounds and is hence a hybrid cluster.

In Figure 4, the average PGDM performance of technical students (cluster 2) is the best among all streams, while the performance of students in cluster 3 is the worst. This cluster contains students from management, science and arts backgrounds. In graduation, too, technical students perform better than commerce-stream students. Cluster 4, which contains females only, is also above average.

Fig. No.4: Descriptive of Marks in PGDM

Finally, the performance of students in cluster 3 is below average in 10th standard, 12th standard and PGDM (refer Table 10). The students in cluster 2 are consistent performers who do well at all levels (10th, 12th, under-graduation and PGDM). So PGDM students with a technical background are doing far better academically than students with other graduation backgrounds.

To get detailed insights for each graduation stream (management, commerce, technical, arts and science), we ran one-way ANOVA to obtain descriptive statistics for each stream. This is needed to uncover hidden information, since some of the streams (management, science and arts) were merged into a single cluster. We analyzed this for both PGDM batches, 14-16 and 15-17, in Table 13 and Table 14 respectively.

| PGDM (14-16) | N | Mean | Std. Deviation |
|---|---|---|---|
| Commerce | 260 | 68.3604 | 8.70115 |
| Management | 142 | 65.7611 | 9.21670 |
| Science | 20 | 70.7650 | 12.93394 |
| Arts | 20 | 66.5250 | 5.22208 |
| Technical | 108 | 73.3863 | 7.94643 |
| Total | 550 | 69.2896 | 9.09029 |

Table 13. Descriptive of Marks in PGDM with respect to all streams of Batch 14-16

As shown in Table 13, the PGDM performance of students with a management background in graduation is below average. It is even below that of students with an arts background.

| PGDM (15-17) | N | Mean | Std. Deviation |
|---|---|---|---|
| Commerce | 296 | 69.2214 | 8.78927 |
| Management | 98 | 64.9094 | 7.97765 |
| Science | 32 | 68.6450 | 10.77718 |
| Arts | 22 | 69.5345 | 5.89129 |
| Technical | 124 | 74.5897 | 9.03862 |
| Total | 572 | 69.6262 | 9.23097 |

Table 14. Descriptive of Marks in PGDM with respect to Background of Batch 15-17

The same trend can be observed in the next batch, 2015-17, which repeats the pattern of batch 2014-16. One interesting point is that science and arts background students are also performing well, but since their number is very small, this is not reflected in the cluster performance.
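
Per-stream descriptives of the kind shown in Tables 13 and 14 (N, mean and sample standard deviation of PGDM marks for each graduation stream) can be computed with a simple group-by. The sketch below is illustrative only: the records are hypothetical and the function name is an assumption, since the study's raw data are not reproduced here.

```python
from collections import defaultdict
from statistics import mean, stdev

def describe_by_group(records):
    """records: iterable of (group, value) pairs; returns N, mean and sample S.D. for each group."""
    groups = defaultdict(list)
    for group, value in records:
        groups[group].append(value)
    return {
        group: {
            "n": len(values),
            "mean": mean(values),
            # The sample standard deviation needs at least two values.
            "sd": stdev(values) if len(values) > 1 else 0.0,
        }
        for group, values in groups.items()
    }

# Hypothetical (stream, PGDM mark) records, for illustration only.
records = [
    ("Commerce", 68.0), ("Commerce", 70.0),
    ("Management", 64.0), ("Management", 66.0),
    ("Technical", 73.0), ("Technical", 75.0), ("Technical", 77.0),
]
summary = describe_by_group(records)
```

Breaking the marks down by stream in this way separates streams that a single hybrid cluster would otherwise average together.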

Conclusion:

Cluster identification is a critical job in cluster analysis. This step is very important, since it establishes whether the segments are distinct or not. Two-step clustering makes this very simple to achieve, as shown by analyzing PGDM students’ performance in two successive academic years.

Two-step clustering is an effective way to classify a mixed data set into meaningful groups that can be easily interpreted and analyzed. As the two-step analysis shows, we could easily extract meaningful information from a large dataset of management students covering two successive years. Students with a technical background in graduation performed well not only in the PGDM program but also in graduation, 12th and 10th standard.

The categorical variables selected in the data set are an important factor in determining the number of clusters. Here we used two categorical variables, gender and background of students. It is very important to select these attributes carefully before two-step clustering. In our case both variables were important in forming the clusters.

Our hypotheses H1 and H2 were validated at the outset using two-way ANOVA, although the interaction of gender and background was significant only for batch 2015-17 (p = .002) and not for batch 2014-16 (p = .094).

Our third hypothesis, H3, cannot be validated, as commerce-background students, especially the male cluster, are not doing well in the course.

It is very interesting to observe that students with a management background at the undergraduate level performed very badly in the PGDM program, even though the syllabi of BBA or BBM are aligned with PGDM courses. Looking for the root cause, it can easily be observed that these students are not very sound academically. This is verified by their marks in 10th and 12th standard, which are far below the average marks. Although this cluster also contains students with science and arts backgrounds, their share of the cluster is very small, so the average marks are driven mainly by students with the management stream in graduation. An important conclusion is that, irrespective of students’ background in undergraduate programs, if they are not strong academically they will not do well in PGDM, even if they studied the same subjects at the undergraduate level. Hence hypothesis H4 is validated.

Another interesting observation is the mix of students joining PGDM programs. Mostly commerce, management and technical students are joining this course, so colleges should focus more on attracting students from diverse backgrounds to management programs, admitting them carefully by giving more weightage to past academic performance and less weightage to the management aptitude tests conducted by various government and private agencies.

References

Aziz, A. A.; Ismail, N. H.; Ahmad, F. (2013) Mining Students’ Academic Performance. Journal of Theoretical and Applied Information Technology. Vol. 53 No. 3. Pakistan.

Amershi, S.; Conati, C. (2009) Combining Unsupervised and Supervised Classification to Build User Models for Exploratory Learning Environments. Journal of Educational Data Mining, Article 2, Vol 1, No 1. Canada.

Chiu T, Fang D, Chen J, Wang Y, Jeris C (2001). A robust and scalable clustering algorithm for mixed type attributes in large database environment. In: Proceedings of the 7th ACM SIGKDD international conference in knowledge discovery and data mining, Association for Computing Machinery, San Francisco, CA, pp 263–268.

Dietz-Uhler, B.; Hurn, J. E. (2013) Using Learning Analytics to Predict (and Improve) Student Success: A Faculty Perspective. Journal of Interactive Online Learning, Volume 12, Number 1. USA.

Delavari, N.; Phon-Amnuaisuk, S.; Beikzadeh, M. R. (2008) Data Mining Application in Higher Learning Institutions. Informatics in Education, Vol. 7, No. 1, 31-54. Institute of Mathematics and Informatics, Vilnius, Lithuania

D. Kabakchieva. (2013) Analyzing University Data for Determining Student Profiles and Predicting Performance. Cybernetics and Information Technologies, Vol.1(3).

Erdogan, S.Z.; Timor, M. (2005) A Data Mining Application in a Student Database. Journal of Aeronautics and Space Technologies (AST), Vol.2, no.2, pp.53-57. Springer.

Fraley, C. and A.E. Raftery. (1998). How many clusters? Which clustering method? Answers via model-based cluster analysis. Computer Journal, 41. p. 578–588.

Fraley, C. (1998). Algorithms for model-based Gaussian hierarchical clustering. SIAM Journal on Scientific Computing, 20. p. 270–281.

Farooq, M.S.; Chaudhry, A.H.; Shafiq, M.; Berhanu, G. (2011) Factors Affecting Students’ Quality of Academic Performance: A Case of Secondary School Level. Journal of Quality and Technology Management. Volume VII, Issue II, Page 01-14.

Huang, Z. (1998). Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Mining and Knowledge Discovery, 2. p. 283–304.

Jain, Anoop Kumar, & Satyam Maheswari (2012). A Survey of Recent Clustering Techniques in Data Mining, International Archive of Applied Sciences and Technology (ISSN: 0976-4828), vol. 3[2], pp. 68-75.

Jain, A.K., Murty, M.N. and Flynn, P.J. (1999). Data Clustering: A Review. ACM Computing Surveys, Vol. 31, No. 3.

Jiawei Han, Micheline Kamber and Jian Pei (2013), Data Mining: Concepts and Techniques, 3rd ed., The Morgan Kaufmann Series.

Kaufman L, Rousseeuw PJ (2005) Finding groups in data: An introduction to cluster analysis. Wiley, Hoboken, NJ.

K. Kameshwaran, and K. Malarvizhi. (2014). Survey on Clustering Techniques in Data Mining, IJCSIT International Journal of Computer Science and Information Technologies (ISSN: 0975-9646), vol. 5(2), pp. 2272-2276.

Kumar, S. A.; Vijayalakshmi, M. N. (2011) Efficiency of Decision Trees in Predicting Student's Academic Performance, First International Conference on Computer Science, Engineering and Applications, CS and IT 02, pp. 335-343. Dubai.

L. Arockiam, S. Charles, I. Carol, P. Bastin Thiyagaraj, S. Yosuva, V. Arulkumar. (2010). Deriving Association between Urban and Rural Students Programming Skills. International Journal on Computer Science and Engineering Vol. 02, No. 03, 687-690.

M. Kuchaki Rafsanjani, Z. Asghari Varzaneh, and N. Emami Chukanlo. (2012). A survey of hierarchical clustering algorithms, TJMCS: The Journal of Mathematics and Computer Science, vol. 5, No. 3, pp. 229-240.

Manpreet Kaur, and Usvir Kaur. (2013). A Survey on Clustering Principles with K-Means clustering Algorithms Using Different Methods in Detail. International Journal of Computer Science and Mobile Computing, vol. 2, Issue 5, pg. 327-331.

Ming-Yi Shih, Jar-Wen Jheng and Lien-Fu Lai. (2010). A Two-Step Method for Clustering Mixed Categorical and Numeric Data, Tamkang Journal of Science and Engineering, Vol. 13, No. 1, pp. 11-19.

Md. Hedayetul Islam Shovon. (2012) “Prediction of Student Academic Performance by an Application of K-Means Clustering Algorithm”, International Journal of Advanced Research in Computer Science and Software Engineering, Vol. 2(7).

N. Sivaram. (2010) Applicability of Clustering and Classification Algorithms for Recruitment Data Mining, International Journal of Computer Applications, Vol. 4(5).

Oyelade, O. J. (2010). Application of k-Means Clustering algorithm for prediction of Students’ Academic Performance. (IJCSIS) International Journal of Computer Science and Information Security, Vol.7.

Ogor, E. N. (2007) Student Academic Performance Monitoring and Evaluation Using Data Mining Techniques. Electronics, Robotics and Automotive Mechanics Conference, CERMA 2007, pp. 354-359. Mexico.

Oussena, S.; Kim, H.; Clark, T. (2011) Exploiting Student Intervention System Using Data Mining. IMMM 2011: The First International Conference on Advances in Information Mining and Management. Barcelona, Spain.

Osmanbegović, E.; Suljić, M. (2012) Data Mining Approach for Predicting Student Performance. Journal of Economics and Business, Vol. X, Issue 1, Elsevier.

P. Cortez, and A. Silva. (2008). Using Data Mining to Predict Secondary School Student Performance. In EUROSIS, A. Brito and J. Teixeira (Eds.) pp.5-12.

P. Indira Priya, and Dr. D. K. Ghosh. (2013). A Survey on Different Clustering Algorithms in Data Mining Technique, IJMER: International Journal of Modern Engineering Research (ISSN: 2249-6645), vol. 3, Issue 1, pp. 267-274.

Prof. Neha Soni, and Prof. Amit Ganatra. (2012). Categorization of Several Clustering Algorithms from Different Perspective: A Review, IJARCSSE: International Journal of Advanced Research in Computer Science and Software Engineering (ISSN: 2277 128X), vol. 2, Issue 8.

Rama. B, Jayashree. P, and Salim Jiwani. (2010). A Survey on clustering, IJCSE: International Journal on Computer Science and Engineering (ISSN: 0975-3397), vol. 02, No. 09, pp. 2976-2980.

Ramandeep Kaur, and Dr. Gurjit Singh Bhathal. (2013). A Survey of Clustering Techniques, International Journal of Advanced Research in Computer Science and Software Engineering (ISSN: 2277 128X), vol. 3, Issue 5.

Romero, C.; Ventura, S. (2007) Educational Data Mining: A Survey from 1995 to 2005, Expert Systems with Applications (33), pp. 135-146. Elsevier.

S. Revathi, and Dr. T. Nalini. (2013). Performance Comparison of Various Clustering Algorithm, IJARCSSE: International Journal of Advanced Research in Computer Science and Software Engineering (ISSN: 2277 128X), vol. 3, Issue 2.

Suma. V, Pushpavathi T.P, and Ramaswamy. V. (2012). An Approach to Predict Software Project Success by Data Mining Clustering. International Conference on Data Mining and Computer Engineering (ICDMCE'2012), Bangkok (Thailand).

Superby, J.F.; Vandamme, J.P.; Meskens, N. (2006) Determination of Factors Influencing the Achievement of the First Year University Students using Data Mining Methods. Proceedings of the 8th International Conference on Intelligent Tutoring Systems, Educational Data Mining Workshop, (ITS'06), pp. 37-44. Taiwan.

Steinmayr, R.; Spinath, B. (2009) The importance of motivation as a predictor of school achievement. Learning and Individual Differences. Journal of Psychology and Education, 19, 80-90, Elsevier. USA.

Sajadin Sembiring, (2011). Prediction of Student Academic Performance by an Application of Data Mining Techniques. International Conference on Management and Artificial Intelligence IPEDR, IACSIT Press, Vol. 6.

Theodoridis, S. and K. Koutroumbas. (1999). Pattern recognition. Academic Press, New York.

Wilson, R. L.; Hardgrave, B. C. (1995) Predicting graduate student success in an MBA program: Regression versus classification. Educational and Psychological Measurement, 55, 186-195. USA.

Yin, J., Tan, Z. F., Ren, J. T. and Chen, Y. Q. (2005), An Efficient Clustering Algorithm for Mixed Type Attributes in Large Dataset, Proc. of the Fourth International Conference on Machine Learning and Cybernetics, 2005 August. Guangzhou China.

Yahya Al-Nakeeb, Mark Lyons, Lorna J. Dodd and Anwar Al-Nuaim (2015) An Investigation into the Lifestyle, Health Habits and Risk Factors of Young Adults. International Journal of Environmental Research and Public Health 2015, 12, 4380-4394; doi:10.3390/ijerph120404380

Yadav, Asmita. (2013). A Survey of Issues and Challenges Associated with Clustering Algorithms, IJSETT: International Journal for Science and Emerging Technologies with Latest Trends (ISSN (Online): 2250-3641), vol. 10(1), pp. 7-11.