- #Job Info
- #Job Search
- #Machine Learning
- #Guide
[New member asking for rice] Pre-Interview Checklist of Machine Learning Knowledge

Books:
1. Dive into Deep Learning (动手学深度学习)
2. Statistical Learning Methods (统计学习方法)
3. Deep learning for recommendation (深度推荐学习)
4. Machine Learning, the "watermelon book" (机器学习-西瓜书)
Papers:
1. Attention Is All You Need
2. Positional Encoding
3. Tokenizer
4. BERT
5. RoBERTa
6. InfoXLM
7. Transformer-XL
8. XLNet
9. XLM
10. XLM-RoBERTa
11. ALBERT
12. DSSM
13. Adapter
14. Knowledge Distillation
15. TinyBERT
16. MiniLM
Algorithm:
1. Logistic Regression
2. K-means
3. SVM
4. KNN
5. Naive Bayes
6. Decision Tree
7. Conditional Random Field (CRF)
8. GBDT
9. XGBoost
10. LightGBM
11. Ensemble learning
12. Optimization algorithms
13. Activation functions
14. Model initialization
15. Model evaluation
a. F1, Macro-F1, Micro-F1
b. Precision, Recall
c. AUC
d. NDCG
16. Loss functions
17. LSTM
18. GRU
19. Word2vec
20. FastText
21. TextCNN
22. Seq2seq
23. TF-IDF
24. BM25
25. LambdaMART
26. Reinforcement learning
27. SVD and PCA
28. KD-Tree, ANN, locality-sensitive hashing
29. FGM
Technique:
1. Overfitting
a. Dropout
b. Layer Normalization
c. Batch Normalization
d. More training data / data augmentation
e. ResNet (residual connections)
f. LSTM
g. Weight decay
h. Parameter initialization
i. Early stopping
j. Gradient clipping
2. Underfitting
a. Increase model parameters
b. Reduce the learning rate
c. Reduce the L2 regularization coefficient
3. Model compression
4. L1 and L2 regularization
5. Bias and variance
6. Data imbalance
a. Focal loss
7. Knowledge distillation
8. Distributed training
a. Single machine, multiple GPUs
b. Multiple machines, multiple GPUs
9. Entropy, cross-entropy, relative entropy (KL divergence)
QA:
1. TextCNN: how does it work?
2. Transformer: why divide QK^T by sqrt(d_k) in scaled dot-product attention? Without the scaling, the dot products grow with d_k and the softmax saturates; scaling leads to more stable gradients (see the attention sketch after this list).
3. Transformer positional encoding? For the model to make use of the order of the sequence, we must inject some information about the relative or absolute position of the tokens in the sequence (see the positional-encoding sketch after this list).
4. Why multi-head? Multi-head attention allows the model to jointly attend to information from different representation subspaces at different positions.
5. Core idea of self-attention
6. Pretraining objectives of multilingual models; the core idea behind multilingual models
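
A minimal NumPy sketch for Q2 and Q4, assuming toy dimensions and random projections in place of learned weight matrices (none of this comes from the original post): the attention scores are divided by sqrt(d_k) before the softmax, and each head attends in its own lower-dimensional subspace before the head outputs are concatenated.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (seq_len, d_k). Without the 1/sqrt(d_k) scaling the dot products
    # grow with d_k, the softmax saturates, and gradients become unstable.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (seq_len, seq_len)
    weights = softmax(scores, axis=-1)   # attention distribution per query token
    return weights @ V                   # (seq_len, d_k)

def multi_head_attention(X, num_heads, rng):
    # Toy multi-head attention: each head projects X into a smaller subspace
    # (d_model / num_heads), attends there, and the head outputs are concatenated.
    seq_len, d_model = X.shape
    d_head = d_model // num_heads
    heads = []
    for _ in range(num_heads):
        # Random projections stand in for the learned W_Q, W_K, W_V of a real model.
        W_q, W_k, W_v = (rng.standard_normal((d_model, d_head)) for _ in range(3))
        heads.append(scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v))
    return np.concatenate(heads, axis=-1)  # (seq_len, d_model)

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 16))                             # 5 tokens, d_model = 16
print(multi_head_attention(X, num_heads=4, rng=rng).shape)   # (5, 16)
```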
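
For Q3, a sketch of the sinusoidal positional encoding described in the Transformer paper; the sequence length and model dimension below are illustrative. Because self-attention by itself is order-invariant, this encoding is added to the token embeddings to inject position information.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    # PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    positions = np.arange(seq_len)[:, None]                        # (seq_len, 1)
    div = np.power(10000.0, np.arange(0, d_model, 2) / d_model)    # (d_model/2,)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(positions / div)   # even dimensions
    pe[:, 1::2] = np.cos(positions / div)   # odd dimensions
    return pe

# Added to token embeddings before the first attention layer.
emb = np.zeros((5, 16))                     # 5 tokens, d_model = 16 (toy values)
x = emb + sinusoidal_positional_encoding(5, 16)
```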
Questions to ask the interviewer:
1. The team's business goals and its biggest challenges
2. Engineering culture
3. Growth prospects of the role, promotion targets, and career development direction
4. Career planning
5. Work pace
6. Competitive advantages