Best Practices for Data Science Projects

  1. How was the data collected or sampled? Engineers can, intentionally or unintentionally, introduce bias into the data, and data scientists can present the same data and the same results as either favorable or unfavorable.
  2. How is the data split into training, validation, and test sets? An inappropriate split can make production results differ significantly from those seen during development. Applying an 80/20 split regardless of data set size, and failing to stratify skewed (imbalanced) data sets, are very common mistakes that engineers make regularly.
  3. Does the test data represent the data the model will be used on? What percentage of the data changes over a given period of time? A basic check is to make sure that the data received after the project is completed still represents the business needs.
  4. Is it maintainable? The software should meet the requirements of the project. Yet when engineers finish the last step of a project, they frequently rush to the project manager to demonstrate the model's metrics (mostly accuracy).
  5. Is it scalable? If the data set is small, an 80/20 split is fine. With a large data set of, say, 10 million records, however, an 80/20 split yields a training set of 8 million records and a test set of 2 million records. Do you really want a test set that large?
  6. Is it documented? Engineers understand every piece of the project and every line of code because they wrote it, but good documentation lets new hires come up to speed quickly. It should not take newcomers long to understand existing work.
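
For item 2, here is a minimal sketch of a stratified train/test split using only the standard library: indices are grouped by class and each class is split separately, so a minority class keeps roughly the same share in both sets. The function name `stratified_split`, its parameters, and the 95/5 example labels are illustrative assumptions; in practice, scikit-learn's `train_test_split` offers the same behavior through its `stratify` argument.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Hypothetical helper: return (train_idx, test_idx), splitting each class separately."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed so the split is reproducible

    # Group row indices by their class label
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        # Keep at least one example of a class in the test set when the class has 2+ rows
        if n_test == 0 and len(idx) > 1:
            n_test = 1
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Usage: a skewed data set with 950 rows of class 0 and 50 rows of class 1
labels = [0] * 950 + [1] * 50
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
print(len(train_idx), len(test_idx))            # 800 200
print(sum(labels[i] for i in test_idx))         # 10 minority rows in the test set
```

A plain random 80/20 split of the same data could, by chance, put noticeably more or fewer than 10 minority rows into the test set; splitting per class removes that variation.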
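
Item 3 asks how much of the data changes over time. One common way to quantify this, swapped in here rather than named in the checklist, is the Population Stability Index (PSI), which compares the category shares of a feature in a reference window with those in a recent window: PSI = Σ (actual − expected) × ln(actual / expected). The sketch below assumes a categorical feature; the function name, the `eps` floor, and the sample "web"/"store" values are hypothetical. Thresholds often quoted for PSI (below 0.1 as stable, above 0.25 as a significant shift) are rules of thumb, not a standard.

```python
import math
from collections import Counter


def psi(reference, recent, eps=1e-6):
    """Hypothetical helper: Population Stability Index for a categorical feature."""
    if not reference or not recent:
        raise ValueError("both samples must be non-empty")
    ref_counts, new_counts = Counter(reference), Counter(recent)
    total = 0.0
    for category in set(reference) | set(recent):
        # eps avoids log(0) when a category is missing from one of the samples
        expected = max(ref_counts[category] / len(reference), eps)
        actual = max(new_counts[category] / len(recent), eps)
        total += (actual - expected) * math.log(actual / expected)
    return total


# Usage: sales channel at training time vs. in the latest month
reference = ["web"] * 80 + ["store"] * 20
recent = ["web"] * 50 + ["store"] * 50
print(round(psi(reference, reference), 4))  # 0.0, no shift
print(round(psi(reference, recent), 4))     # about 0.4159, a large shift
```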
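
For item 5, one option is to keep the test fraction for small data sets but cap the absolute size of the test set for large ones. The sketch below is only an illustration: the `split_sizes` name and the 100,000-row cap are assumptions, and the right cap depends on how many rows are needed for stable metrics in a given project.

```python
def split_sizes(n_rows, test_fraction=0.2, max_test_rows=100_000):
    """Hypothetical helper: return (n_train, n_test), capping the test set size."""
    n_test = min(int(n_rows * test_fraction), max_test_rows)
    return n_rows - n_test, n_test


print(split_sizes(10_000))      # (8000, 2000): small data set, plain 80/20
print(split_sizes(10_000_000))  # (9900000, 100000): test set capped at 1%
```

With the cap, the 10-million-record case moves 1.9 million records from the test set back into training instead of spending them on evaluation.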

Make sure you have the following at the end of the project:

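  1. Everything is scripted: all code from the notebooks has been implemented as scripts.
  2. An ML model image has been generated.
  3. The system demonstrates a smooth flow from start (the first step) to finish (the last step).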
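
The checklist asks for everything to be scripted and to run smoothly from the first step to the last. A minimal sketch of a single entry point that runs each step in order and logs its progress follows; the step functions, the shared `ctx` dictionary, and the majority-label "model" are hypothetical placeholders for a project's real loading, feature, training, and evaluation code.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("pipeline")


# Hypothetical steps; in a real project each would live in its own module.
def load_data(ctx):
    ctx["rows"] = [{"x": i, "y": i % 2} for i in range(10)]


def prepare_features(ctx):
    ctx["features"] = [[row["x"]] for row in ctx["rows"]]
    ctx["labels"] = [row["y"] for row in ctx["rows"]]


def train_model(ctx):
    # Placeholder "model" that always predicts the most frequent label
    ctx["model"] = max(set(ctx["labels"]), key=ctx["labels"].count)


def evaluate(ctx):
    # Scores on the training labels, for illustration only
    correct = sum(1 for y in ctx["labels"] if y == ctx["model"])
    ctx["accuracy"] = correct / len(ctx["labels"])


STEPS = [load_data, prepare_features, train_model, evaluate]


def run_pipeline():
    """Run every step in order, passing shared state through one dictionary."""
    ctx = {}
    for step in STEPS:
        log.info("running %s", step.__name__)
        step(ctx)
    return ctx


if __name__ == "__main__":
    result = run_pipeline()
    log.info("accuracy=%.2f", result["accuracy"])
```

Because every step is an ordinary function listed in `STEPS`, the whole flow can be rerun with one command, and a new team member can read the order of the project from a single list.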

If you have any questions or are interested in our services, feel free to contact us.