
Seeking advice on an interview project

dandelionjmy
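I recently received a take-home project for an interview. It looks fairly simple, but since I don't have a statistics background I'm finding it a bit of a struggle. I've been job hunting for a while and I really hope to land this one. Some background: I have a degree in applied mathematics, and after a lot of thought I decided that moving toward statistical programming gives me the best chance of an offer. The prompt is below. It's rather long; I've thought about it for quite a while and have some questions I hope you can help with.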

Reginald Vinegar Industries Inc, a major vinegar producer, must provide health insurance to its employees. Every few years, Company A’s insurer, InsurAHealth, reviews the health status of the employees. To do this, InsurAHealth calculates a health score between 0 and 6, where 6 denotes a very sick person. InsurAHealth calculates this score every quarter and claims that the employees have gotten sicker. Mean Health Score in Quarter 1 was 3.4, in Quarter 6 it was 3.5, and in Quarter 12 it was 3.9.

Reginald Vinegar Industries Inc has hired you to evaluate InsurAHealth's claim that employees are sicker. The ‘health score’ is a proprietary tool used by InsurAHealth, and it does not release the items that go into its formula. InsurAHealth has provided data over 12 quarters on 2,000 employees from Company A. This is a representative sample of the employees at the company, and we know that the information included in this data is not part of the health score calculation.

You should not spend more than two hours on this investigation. If a question is unclear to you or you want more clarification, answer to the best of your ability as no questions will be answered regarding the prompt. Many questions can be approached from multiple angles and have many correct answers. Regression analysis is not necessary to answer these questions, but can be used if you like. Feel free to manipulate/subset the data in any way you see fit and use extra sheets, graphs, or whatever tools you may like. This tab and the data tab are locked, so you will need to copy and paste the information to manipulate it. The only requirement is that all output should be placed in this Excel workbook and all work and calculations be shown.

Questions:

1) What are the demographic characteristics of employees at Reginald Vinegar Industries Inc?

Suggestion: Create a few tables, check if demographics change over time



2) What characteristics is the health score associated with?

Suggestion: Create a few scatter plots



3) Based on the data provided, how do you evaluate InsurAHealth's claim that employees are getting sicker?

Suggestion: First list how you would evaluate the claim. Then, time-permitting, implement the steps you suggested.

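The data set is too large to upload as an attachment, so I'm posting a screenshot for now; if anyone wants the full data I can send it. My thinking so far: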

1) I built a table in SAS to see whether each characteristic changes over time.
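A per-quarter demographic table of this kind could be sketched as follows. This is Python rather than SAS, and the columns (`employee_id`, `quarter`, `age`, `sex`) and all values are made up for illustration, since the post does not show the actual workbook layout.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (employee_id, quarter, age, sex). The real column
# names and values in the workbook are not shown in the post.
rows = [
    (1, 1, 34, "F"), (2, 1, 51, "M"), (3, 1, 45, "M"),
    (1, 12, 37, "F"), (2, 12, 54, "M"), (4, 12, 29, "F"), (5, 12, 31, "F"),
]

def summarize_by_quarter(rows):
    """Return {quarter: {"n", "mean_age", "pct_female"}} for each quarter."""
    by_quarter = defaultdict(list)
    for _emp, quarter, age, sex in rows:
        by_quarter[quarter].append((age, sex))

    summary = {}
    for quarter in sorted(by_quarter):
        ages = [age for age, _sex in by_quarter[quarter]]
        n_female = sum(1 for _age, sex in by_quarter[quarter] if sex == "F")
        summary[quarter] = {
            "n": len(ages),
            "mean_age": mean(ages),
            "pct_female": 100.0 * n_female / len(ages),
        }
    return summary

summary = summarize_by_quarter(rows)
for quarter, stats in summary.items():
    print(quarter, stats)
```

Laying the quarters side by side makes a shift in the mix of employees (for example, a younger group joining later) visible before any health score is examined.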

2) I think this means drawing scatter plots to see whether the health score is related to age, hospital visits, and so on. Is that the right reading?
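A scatter plot can be paired with a single number summarizing the linear association. The sketch below computes a Pearson correlation by hand; the `ages` and `scores` values are invented, and Pearson's r only captures linear relationships, so it complements the plot rather than replacing it.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical values; the real data are not shown in the post.
ages = [25, 35, 45, 55, 65]
scores = [2.0, 2.5, 3.0, 3.5, 4.0]

r = pearson_r(ages, scores)
print(f"correlation between age and health score: {r:.3f}")
```

The same function can be applied to each candidate characteristic in turn (hospital visits and so on) to decide which scatter plots are worth showing.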

3) This is the one I have the most doubts about.

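First, the number of people measured differs from quarter to quarter: quarter 1 has only 600 samples, while quarter 12 has nearly 2,000. The quarters are also not independent of each other. If I use a repeated-measures analysis to see whether the mean changes over time, the unmeasured quarters effectively become missing data.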
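Because the set of measured employees grows over time, a change in the overall mean can come from who is being measured rather than from anyone getting sicker. One way to separate the two, sketched here with invented records and a hypothetical `(employee_id, quarter, health_score)` layout, is to compare the cross-sectional change with the change among employees measured in both quarters.

```python
from statistics import mean

# Hypothetical long-format records: (employee_id, quarter, health_score).
records = [
    (1, 1, 3.0), (2, 1, 4.0),
    (1, 12, 3.0), (2, 12, 4.0), (3, 12, 5.0), (4, 12, 4.0),
]

def scores_in(records, quarter):
    """Map employee_id -> health_score for one quarter."""
    return {emp: score for emp, q, score in records if q == quarter}

first = scores_in(records, 1)
last = scores_in(records, 12)

# Cross-sectional change: everyone measured in each quarter.
overall_change = mean(list(last.values())) - mean(list(first.values()))

# Within-person change: only employees measured in both quarters.
both = sorted(first.keys() & last.keys())
paired_change = mean([last[emp] - first[emp] for emp in both])

print("overall change:", overall_change)
print("change among employees seen in both quarters:", paired_change)
```

In this toy data the overall mean rises by 0.5 while the employees present in both quarters do not change at all, so the entire increase comes from the newly measured employees.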

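Second, although the health score is supposed to be between 0 and 6, some values are 10; I removed those records. I first ran a normality test, which rejected normality, although the histogram looks roughly normal; I haven't yet checked whether the residuals are normal. I then used proc mixed to test whether the change in the mean is significant. From quarter 1 to quarter 12 the change in the mean is significant, but if I test only quarters 2 through 11 it is not. I think that means the data do not prove that employees are getting sicker, because the health score could well depend on the season or some other factor.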
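Two of the checks described above can be made explicit: how many out-of-range scores are dropped in each quarter, and how much the trend depends on the first and last quarters. The sketch below uses invented `(quarter, health_score)` pairs and only four quarters for brevity; the slope is a descriptive least-squares fit to the quarter means, not a significance test and not a substitute for the proc mixed model.

```python
from collections import defaultdict
from statistics import mean

VALID_MIN, VALID_MAX = 0, 6  # score range stated in the prompt

# Hypothetical (quarter, health_score) pairs; real data are not shown.
observations = [
    (1, 3.0), (1, 10.0), (1, 3.4),
    (2, 3.5), (2, 3.5),
    (3, 3.5), (3, 10.0), (3, 3.5),
    (4, 4.0), (4, 3.8),
]

def split_by_validity(observations):
    """Return (kept scores per quarter, count of dropped scores per quarter)."""
    kept = defaultdict(list)
    dropped = defaultdict(int)
    for quarter, score in observations:
        if VALID_MIN <= score <= VALID_MAX:
            kept[quarter].append(score)
        else:
            dropped[quarter] += 1
    return kept, dropped

def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

kept, dropped = split_by_validity(observations)
quarters = sorted(kept)
quarter_means = [mean(kept[q]) for q in quarters]

# Trend over all quarters versus the trend with the endpoints removed.
slope_all = ols_slope(quarters, quarter_means)
slope_interior = ols_slope(quarters[1:-1], quarter_means[1:-1])

print("dropped per quarter:", dict(dropped))
print("slope, all quarters:", slope_all)
print("slope, interior quarters:", slope_interior)
```

If the dropped values cluster in particular quarters, removing them may itself shift the quarterly means, so it is worth reporting the counts alongside the results.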

Does using proc mixed seem reasonable here, or is there a better approach?

Thank you very much!






