regression analysis from 200 observations (600 values)

chapolito

Hi. I hope somebody can help me out of my dilemma. My task: suppose I am the manager responsible for research and development (R&D) in a firm that produces and sells robots. I have to examine the relationship between R&D expenditure, innovations and monthly sales in my firm.
My last task is to decide whether it would be a good idea to increase R&D spending by 1 million dollars per month. Would this be a good strategy? I really need some help from you guys. Thank you so much in advance.


OK, here are the data. The first column on the left is the expenditure, the middle one is innovations and the right one is sales. These are the data I was given for the task explained above; in total there are 200 observations.

R&D expenditure per month (hundreds of thousands) | Innovations per month | Sales per month (hundreds of thousands)
930 3 1549
1194 4 1591
719 3 1410
1340 4 1762
903 3 1499
1035 3 1462
573 3 1300
1008 3 1471
391 2 1234
1343 3 1297
1251 4 1623
720 3 1471
1476 3 1601
1317 4 1658
742 3 1368
914 3 1410
974 3 1444
972 3 1295
706 2 1200
1445 4 1799
1063 2 1296
901 2 1095
1292 3 1506
452 2 1185
992 2 1234
907 3 1317
718 3 1522
1055 4 1663
975 2 1113
1242 5 1715
1136 2 1136
1253 3 1222
940 3 1443
853 2 880
878 3 1393
1160 4 1797
1194 4 1586
1122 3 1299
1086 2 980
693 3 1423
1245 3 1729
416 3 1237
497 2 1215
887 3 1340
653 3 1334
1163 3 1242
1420 3 1388
581 2 1074
1502 4 1835
716 3 1219
839 3 1470
737 2 1196
854 2 1117
1164 3 1422
1260 4 1538
821 2 1092
953 3 1405
1193 4 1716
1324 4 1460
1119 3 1396
508 2 1083
1077 3 1340
732 3 1434
631 3 1287
788 3 1313
788 2 1260
1296 4 1798
938 3 1558
1120 3 1396
844 3 1249
779 2 1177
904 2 1105
1316 3 1593
1448 4 1730
893 3 1461
1168 4 1606
1065 2 1186
882 3 1620
1268 3 1522
1116 4 1662
1197 3 1211
683 2 1119
963 3 1460
997 2 1166
1470 4 1489
765 2 1094
667 3 1324
1055 3 1239
693 4 1452
1089 3 1493
1434 4 1868
1036 3 1435
1430 4 1714
609 2 1210
1148 3 1397
876 3 1639
737 3 1446
947 3 1428
1028 3 1335
961 3 1626
1064 3 1566
947 3 1246
920 3 1301
1360 4 1430
1011 4 1759
552 3 1408
1053 2 1171
1195 4 1716
966 3 1188
1165 4 1743
918 3 1475
934 3 1267
704 3 1213
808 2 1151
784 2 1109
616 2 1095
1177 3 1512
533 2 1135
1157 4 1551
1001 3 1393
819 3 1291
1177 4 1700
1326 4 1735
857 3 1197
986 2 1270
1239 4 1592
874 3 1405
779 3 1526
1179 4 1669
943 2 1197
1073 3 1514
1319 4 1559
1416 4 1833
1163 3 1295
1154 3 1543
816 3 1331
1032 3 1356
1667 4 1722
520 2 1052
1158 3 1541
1049 4 1621
985 3 1437
868 2 1204
945 3 1583
1013 3 1270
753 2 1316
948 2 1208
1386 3 1473
1071 3 1324
1020 3 1512
986 3 1363
1248 2 1255
908 3 1532
1075 3 1293
1058 3 1570
1227 5 1889
742 3 1409
1368 4 1746
1165 4 1889
1368 3 1325
1426 5 1825
1094 3 1460
861 2 1275
1076 3 1456
962 3 1218
934 4 1627
1312 3 1461
670 3 1393
1014 3 1316
1075 4 1526
845 3 1238
1152 3 1482
1177 3 1545
849 4 1707
677 2 1266
1316 3 1446
1018 3 1605
1216 3 1534
1545 4 1835
1077 3 1507
1352 4 1730
1181 3 1522
786 2 1170
939 3 1467
906 3 1562
1090 2 1225
1006 2 1088
1007 3 1485
936 2 957
1079 4 1707
781 3 1126
1257 4 1452
1067 3 1317
507 3 1519
1079 4 1550
627 2 1221
1233 4 1868
1561 4 1650
712 2 1100
927 3 1593
 
What are your thoughts? What have you tried? How far did you get? Where are you stuck? Are you supposed to use any particular strategy, software, techniques, etc, on this exercise?

Please be complete. Thank you! :D

Eliz.
 
What I think you need to ask yourself is: Mathematically speaking, what do you know about this data? If my job were on the line and I had this much data, the first thing I would do would be to find out everything I could about the real-world data. Then, how can you use that information to say whether or not it would be a good decision to increase R&D spending?

It would be a good decision to increase R&D spending, I think, if the data show that an increase in R&D spending will result in greater monthly sales. It would be a bad decision to increase R&D spending if doing so would result in a decrease in monthly sales (or in a decrease in net sales, depending on how detailed you want to get with this). In addition, you can ask, based on an analysis of the data, whether or not a million dollars is too much. That is, would spending $750,000 more result in greater monthly sales (or net sales) than spending an additional $1m? I mean, is there a point of diminishing returns (and what is that point)? In its simplest form, however, your dependent variable is the monthly sales, which is what you're being asked to maximize, and your independent variable is the R&D expenses, which is what you're being asked to manipulate.

Are you able to see a relationship between spending and sales (I did, with a correlation of about .59)? How would you describe the trend in sales as you increase spending? Is it a positive slope or negative slope over the whole domain? This will tell you what the data suggests you might get from increasing your spending. The correlation will help you in determining just how certain you can feel in giving that advice.
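A minimal sketch of how the correlation and slope mentioned above could be computed, assuming plain Python with no statistics package. Only the first ten rows of the posted table are included here, so the numbers it prints will not match the .59 figure; paste in all 200 rows for that. The function names `pearson_r` and `fit_line` are made up for this illustration.

```python
from math import sqrt

# First ten rows of the posted table: (expenditure, innovations, sales).
# Expenditure and sales are in hundreds of thousands of dollars.
# Only a sample: paste in all 200 rows to reproduce the figures in the thread.
rows = [
    (930, 3, 1549), (1194, 4, 1591), (719, 3, 1410), (1340, 4, 1762),
    (903, 3, 1499), (1035, 3, 1462), (573, 3, 1300), (1008, 3, 1471),
    (391, 2, 1234), (1343, 3, 1297),
]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fit_line(x, y):
    """Least-squares line y = intercept + slope * x; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


spend = [row[0] for row in rows]
sales = [row[2] for row in rows]

r = pearson_r(spend, sales)
intercept, slope = fit_line(spend, sales)

# $1 million is 10 units when the data are in hundreds of thousands.
extra_spend_units = 1_000_000 / 100_000
predicted_change = slope * extra_spend_units

print(f"r = {r:.3f}")
print(f"sales = {intercept:.1f} + {slope:.3f} * expenditure")
print(f"predicted change in monthly sales: {predicted_change:.1f} (hundreds of thousands)")
```

The last figure is only what a straight-line fit implies. It assumes the trend holds at higher spending levels and says nothing about whether the spending causes the sales. A straight line also cannot show diminishing returns, since every extra dollar is predicted to have the same effect.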

The effect of spending on the number of innovations is a little trickier, since the range of innovations is a lot smaller. There's still a very strong correlation in these data between the number of innovations and monthly sales, but the effect of an increase in spending on the number of innovations is small.

If you can help us out by telling us what trends you have observed (or computed) in these data (relationship of your dependent variable -- monthly sales -- to your independent variables would be a good start), I think we can help you use that mathematical information to construct a good answer for this company of yours. But just giving us the data like this doesn't tell us where you want to go with your analysis. There are so many possibilities as to what you can do with good data like this, and we need to narrow down the focus a little.
 
Hi. Thank you very much for the help so far. My thoughts on this are similar. First of all I wanted to examine whether there is a correlation between expenditures and sales. I computed the same result of .59. I wasn't sure if this is enough evidence for a correlation. Is it? Assuming it is, I don't quite know how to judge whether it is sufficient evidence to say yes, it would be a good idea to increase spending since sales go up as well. What are your thoughts? How would one do an analysis for this task (does regression work)?

Thanks a lot.

P.S. Connected to the same data and problem, I am stuck somewhere else. I am also asked to test whether 2 is the optimal number of innovations.
My thought: I would test the null hypothesis H0: mean of innovations = 2.
The alternative would be: mean of innovations not equal to 2.

What do you think about the approach?
 
Thanks for writing back. This gives people a lot more to chew on...

You're right that there's a correlation between expenditures and sales, .59. Whether that's good enough to justify pumping a million dollars more into the company depends on your particular biases. If you have the politics of a banker, you would say 'no' for sure, especially given the scatter plot of this data. There are several months where spending was significantly higher than it was in other months and monthly sales were still lower. If your biases are more along the lines of one of the creative people at this company, you would probably say .59 is perfectly adequate to justify taking the risk. There's no hard-and-fast rule about how much correlation is enough. It's all very opinion-driven when it comes to using that as a decision-making process.
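As a rough aid to the "how much correlation is enough" question, here is a small sketch that turns the reported r = .59 and n = 200 into two standard summaries: r squared and the usual t-statistic for a correlation. The t-statistic assumes independent observations and roughly normal errors, which nothing in the thread confirms, so treat it as illustrative.

```python
from math import sqrt

r = 0.59   # correlation reported in the thread
n = 200    # number of observations

# Share of the variation in sales associated with expenditure
# under a straight-line fit.
r_squared = r ** 2

# Standard t-statistic for testing "no linear correlation",
# with n - 2 degrees of freedom.
t_stat = r * sqrt((n - 2) / (1 - r ** 2))

print(f"r^2 = {r_squared:.4f}")
print(f"t = {t_stat:.2f} on {n - 2} degrees of freedom")
```

With these inputs r squared comes out at about 0.35, so roughly a third of the month-to-month variation in sales lines up with expenditure and the rest does not. That speaks to whether a linear association is present in the data; it does not settle whether the extra spending is worth the risk, which remains a judgment call.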

Here's another thing: In the real situation, many other variables have an influence on the sales of this company. Any real answer would have to take those into account as well. In a regression model, we have so far found that

sales = f ( expenditures )

has a .59 correlation and a positive slope. Therefore, we can definitely say sales tend to go up with increased expenditures. A better analysis would take other variables into account. You've already started.

sales = f ( expenditures ) + f ( innovations ) + ... + f ( global_economy ) + f ( average_temperature )

I'm exaggerating a little, I suppose, but if you have the data, why not see what the coefficient is for innovations as well as expenditures in your analysis? But keep in mind, the president could make a speech tomorrow that makes people want to stop buying robots, and all this analysis would go out the window, because it was too simplified. The best analysis takes all the contributing factors into account, but companies almost always have to make decisions with limited data. So, join the club.
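One way to get coefficients for both expenditures and innovations at once is an ordinary least-squares fit with two predictors. The sketch below assumes plain Python and solves the normal equations directly; `solve_linear` and `least_squares` are helper names invented for this illustration, and only the first ten rows of the posted table are included, so the printed coefficients are for that sample only.

```python
# First ten rows of the posted table: (expenditure, innovations, sales).
# Only a sample: paste in all 200 rows for the full analysis.
rows = [
    (930, 3, 1549), (1194, 4, 1591), (719, 3, 1410), (1340, 4, 1762),
    (903, 3, 1499), (1035, 3, 1462), (573, 3, 1300), (1008, 3, 1471),
    (391, 2, 1234), (1343, 3, 1297),
]


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def least_squares(design, y):
    """Coefficients minimizing squared error of y against the design columns."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve_linear(xtx, xty)


# Each design row is [1, expenditure, innovations]; the 1 gives the intercept.
design = [[1.0, spend, innov] for spend, innov, _ in rows]
sales = [s for _, _, s in rows]

intercept, b_spend, b_innov = least_squares(design, sales)
print(f"sales = {intercept:.1f} + {b_spend:.3f} * expenditure "
      f"+ {b_innov:.1f} * innovations")
```

Each coefficient is the estimated change in sales for a one-unit change in that variable with the other held fixed. If expenditure and innovations move together in the data, the two coefficients can be hard to tell apart, which is worth keeping in mind when reading them.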

Also, about innovations, it's not exactly something you have any control over. The slope of the line "innovations vs. expenditures" (first-order) is positive, with a high correlation, but it is nearly flat over the whole domain. It wouldn't work to say to the creative types: "Have no more than two innovations this month, but have more than one." They come up with what they come up with.

Computing the mean number of innovations is straightforward, and with the regression you could also compute confidence intervals for the number of innovations. Again, I'm not sure what that information would be used for in terms of your decision-making process.

It is therefore my opinion that being asked to find the "optimum number" of innovations is misguided, but if that's the assignment, it's not really your fault. I would start by taking a look at a scatter plot of "sales vs. innovations." If you think sales are higher with 2 innovations than with 6, say, someone else could easily say it's 3. So, I think your null hypothesis should be something like "Sales are highest when there are two innovations." That's what we really mean by "optimum," which is what your assignment asks for. To find an optimum point, try to do a second-order regression with innovations on the x axis and sales on the y axis. If the maximum occurs at 2 (and it's concave down), then I suppose two would be the optimum number of innovations. In other words, find the coefficients for sales = a + b1 innovations + b2 innovations^2, or y = a + b1 x + b2 x^2. If the parabola is concave up, forget it (reject Ho); if it's concave down, at what value of x does the maximum occur? For whatever it's worth, that's your optimum number of innovations.
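The second-order fit described above can be sketched as follows, assuming plain Python and the same least-squares approach via normal equations. The helper names (`solve_linear`, `least_squares`, `fit_quadratic`, `optimum`) are invented for this illustration, and only the first ten rows of the posted table are included, so the result is for that sample only.

```python
# First ten rows of the posted table: (expenditure, innovations, sales).
# Only a sample: paste in all 200 rows for the full analysis.
rows = [
    (930, 3, 1549), (1194, 4, 1591), (719, 3, 1410), (1340, 4, 1762),
    (903, 3, 1499), (1035, 3, 1462), (573, 3, 1300), (1008, 3, 1471),
    (391, 2, 1234), (1343, 3, 1297),
]


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("not enough distinct x values for this fit")
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def least_squares(design, y):
    """Coefficients minimizing squared error of y against the design columns."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve_linear(xtx, xty)


def fit_quadratic(x, y):
    """Fit y = a + b1*x + b2*x^2; returns [a, b1, b2]."""
    return least_squares([[1.0, xi, xi * xi] for xi in x], y)


def optimum(b1, b2):
    """x at the parabola's maximum, or None if it is not concave down."""
    if b2 >= 0:
        return None
    return -b1 / (2 * b2)


innovations = [innov for _, innov, _ in rows]
sales = [s for _, _, s in rows]

a, b1, b2 = fit_quadratic(innovations, sales)
peak = optimum(b1, b2)
if peak is None:
    print("Parabola is concave up (or flat): no interior maximum.")
else:
    print(f"Fitted maximum at about {peak:.2f} innovations.")
```

Because innovations only take a handful of integer values, the fitted maximum may fall between integers or outside the observed range, in which case it should be read with caution.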
 
chivox said:
... the president could make a speech tomorrow that makes people want to stop buying robots ...


Nothing that man could say would surprise me.

 