50 IBM A1000-120 - Assessment: Data Science Foundations Practice Questions: Question Bank 2025

Build your exam confidence with our curated bank of 50 practice questions for the IBM A1000-120 - Assessment: Data Science Foundations certification. Each question includes detailed explanations to help you understand the concepts deeply.

Why Use Our 50 Question Bank?

Strategically designed questions to maximize your exam preparation

50 Questions

A comprehensive set of practice questions covering key exam topics

All Domains Covered

Questions distributed across all exam objectives and domains

Mixed Difficulty

Easy, medium, and hard questions to test all skill levels

Detailed Explanations

Learn from comprehensive explanations for each answer

Practice Questions

AI Generated

Data Science Fundamentals

A retail team wants to define the business problem for a data science project. Which statement best represents a well-formed problem definition?

Data Manipulation and Visualization

A dataset contains ages with a few negative values due to a data entry issue. What is the most appropriate initial step?

Data Manipulation and Visualization

Which visualization is most appropriate to examine the distribution and potential skew of a single continuous variable (e.g., transaction amount)?

Machine Learning Basics

A model predicts whether an email is spam (spam vs not spam). Which type of machine learning problem is this?

Statistical Analysis and Mathematics

A/B testing results show conversion rates of 4.2% (A) and 4.8% (B). The team wants to know if the difference is likely due to chance. Which statistical concept is primarily used to make this decision?
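The scenario above can be explored with a two-proportion z-test, sketched here in plain Python. The question gives only the two rates, so the sample size of 10,000 users per variant is an assumption made purely for illustration, and `two_proportion_z_test` is a hypothetical helper name.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two_sided_p) for H0: both groups share one conversion rate."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF, via the error function.
    normal_cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - normal_cdf)
    return z, p_value

# ASSUMPTION: 10,000 users per variant (not stated in the question).
# 4.2% of 10,000 = 420 conversions; 4.8% of 10,000 = 480 conversions.
z, p_value = two_proportion_z_test(conv_a=420, n_a=10_000, conv_b=480, n_b=10_000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

With smaller samples the same 0.6-point gap would give a larger p-value, which is why the rates alone cannot settle whether the difference is due to chance.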

Data Manipulation and Visualization

A dataset includes an "income" feature with extreme outliers. The model is sensitive to feature scale. Which preprocessing approach is generally most robust to outliers while scaling?
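One scaling approach that is often described as robust to outliers centers on the median and divides by the interquartile range. The sketch below uses only the standard library; the income values and the `robust_scale` helper are made up for illustration.

```python
import statistics

def robust_scale(values):
    """Center on the median and divide by the interquartile range (IQR)."""
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; robust scaling is undefined for this data")
    return [(v - median) / iqr for v in values]

# Hypothetical incomes with one extreme outlier at the end.
incomes = [32_000, 41_000, 45_000, 52_000, 58_000, 63_000, 2_500_000]
scaled = robust_scale(incomes)

for raw, s in zip(incomes, scaled):
    print(f"{raw:>9} -> {s:8.2f}")
```

Because the median and IQR barely move when the outlier changes, the typical values keep a usable spread after scaling, while the outlier still stands out as a large scaled value.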

Machine Learning Basics

During model evaluation, accuracy is 95%, but the dataset is highly imbalanced (only 3% positive class). Which metric is generally more informative for the positive class performance?
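To see how 95% accuracy can coexist with poor positive-class performance, the sketch below builds a hypothetical set of 1,000 predictions with a 3% positive rate and computes accuracy, precision, and recall by hand. The counts are invented to match the question's figures and are not real results.

```python
def classification_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# Hypothetical data: 30 positives (3%) and 970 negatives.
y_true = [1] * 30 + [0] * 970
# The model catches 5 positives, misses 25, and raises 25 false alarms.
y_pred = [1] * 5 + [0] * 25 + [1] * 25 + [0] * 945

accuracy, precision, recall = classification_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```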

Data Science Fundamentals

A data scientist finds that "customer_id" is included as a numeric feature and the model’s performance jumps unexpectedly on the training set but not on validation data. What is the most likely issue?

Machine Learning Basics

A data scientist performs feature selection using the full dataset, then splits into train/test and reports test performance. The score seems unusually high. What is the best explanation?

Statistical Analysis and Mathematics

A team wants to estimate the uncertainty of a sample mean for a skewed distribution with unknown population variance, using only the observed data. Which approach is most appropriate?
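A resampling approach such as the percentile bootstrap is one way to estimate uncertainty from the observed data alone. The following is a minimal sketch, assuming a small made-up skewed sample; `bootstrap_mean_ci` and its defaults are illustrative choices.

```python
import random
import statistics

def bootstrap_mean_ci(data, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)
    # Resample with replacement, same size as the original sample.
    means = sorted(
        statistics.fmean(rng.choices(data, k=len(data)))
        for _ in range(n_resamples)
    )
    alpha = 1 - confidence
    lower_idx = int((alpha / 2) * n_resamples)
    upper_idx = int((1 - alpha / 2) * n_resamples) - 1
    return means[lower_idx], means[upper_idx]

# Hypothetical right-skewed sample.
sample = [1.2, 0.8, 2.5, 0.9, 1.1, 14.0, 1.6, 0.7, 3.3, 1.0, 0.95, 22.0]
lower, upper = bootstrap_mean_ci(sample)
print(f"mean={statistics.fmean(sample):.2f}, 95% CI=({lower:.2f}, {upper:.2f})")
```

With a sample this small the interval is only a rough guide; the sketch is meant to show the mechanics rather than a recommended sample size.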

Data Science Fundamentals

A product manager asks a data scientist to describe what a "data science lifecycle" is and why it matters. Which description is most accurate?

Data Manipulation and Visualization

A dataset contains an "age" field where some values are negative due to a data entry issue. What is the best initial action?

Data Manipulation and Visualization

You want to visualize the relationship between two continuous variables (e.g., advertising spend and sales). Which plot is most appropriate?

Machine Learning Basics

A bank is evaluating a binary classifier for fraud detection where only 0.5% of transactions are fraud. Which metric is generally more informative than accuracy for model selection in this scenario?

Statistical Analysis and Mathematics

A team reports a strong correlation between ice cream sales and drowning incidents. Which statement best explains why this correlation does not imply causation?

Statistical Analysis and Mathematics

During exploratory data analysis, you find a numeric feature with a long right tail (highly skewed) that will be used in a linear model. Which transformation is commonly used to reduce right skewness?
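A logarithmic transform is a common choice for right-skewed, non-negative data. The sketch below compares a moment-based skewness before and after `math.log1p` on invented values; the `skewness` helper is a simple illustration, not a library function.

```python
import math
import statistics

def skewness(values):
    """Moment-based skewness: the mean of cubed z-scores."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - mean) / sd) ** 3 for v in values])

# Hypothetical feature with a long right tail.
amounts = [120, 150, 180, 200, 240, 310, 400, 650, 1_200, 9_500]
# log1p computes log(1 + x), which stays defined when x is 0.
logged = [math.log1p(v) for v in amounts]

print(f"skewness before: {skewness(amounts):.2f}")
print(f"skewness after:  {skewness(logged):.2f}")
```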

Data Manipulation and Visualization

A dataset has 20% missing values in a numeric column. You plan to train a model and want a simple baseline approach that preserves row count. Which option is most reasonable as a starting point?
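A simple baseline that keeps every row is to fill the gaps with the column median. This sketch represents missing values as `None` and uses a made-up column; in a real pipeline the median would normally be computed on the training split only and then reused for validation and test data.

```python
import statistics

def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]

# Hypothetical column: 10 rows, 2 of them missing (20%).
column = [12.0, None, 15.5, 11.0, 14.0, None, 13.0, 90.0, 12.5, 14.5]
imputed = impute_median(column)
print(imputed)
```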

Machine Learning Basics

A stakeholder requests a model to predict a continuous value (monthly energy consumption in kWh). Which algorithm type is the most appropriate starting point?

Statistical Analysis and Mathematics

You are asked to estimate the average customer satisfaction score. The distribution is skewed and contains outliers. Which approach is most appropriate to quantify uncertainty around the mean without relying heavily on normality assumptions?

Machine Learning Basics

A model shows excellent performance during cross-validation, but when evaluated on truly new data collected after deployment, performance drops significantly. Which issue most likely explains this behavior?

Statistical Analysis and Mathematics

A stakeholder asks for a quick metric that summarizes the typical order value, but the dataset contains a few extremely large orders that are rare. Which measure is MOST appropriate to report as the 'typical' value?

Data Manipulation and Visualization

You are preparing a dataset for a classification model. A column contains values like 'N/A', empty strings, and real categories. What is the BEST practice for handling these values before modeling?

Data Science Fundamentals

A data science team needs to explain what the 'target variable' is to a non-technical audience. Which description is MOST accurate?

Machine Learning Basics

A dataset has two numeric features: 'annual_income' ranges from 20,000 to 200,000 and 'months_with_company' ranges from 0 to 240. You plan to use k-nearest neighbors (k-NN). What preprocessing step is MOST recommended and why?
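Distance-based methods such as k-NN are sensitive to feature scale, and the effect is easy to show. In the sketch below (hypothetical rows of `annual_income` and `months_with_company`), the nearest neighbor of the first row changes once both columns are standardized.

```python
import math
import statistics

def standardize_columns(rows):
    """Rescale each column to zero mean and unit standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    sds = [statistics.pstdev(c) for c in columns]
    return [
        tuple((v - m) / s for v, m, s in zip(row, means, sds))
        for row in rows
    ]

def nearest_index(rows, query_idx):
    """Index of the row closest (Euclidean) to rows[query_idx]."""
    query = rows[query_idx]
    others = [i for i in range(len(rows)) if i != query_idx]
    return min(others, key=lambda i: math.dist(rows[i], query))

# Hypothetical (annual_income, months_with_company) rows.
rows = [
    (50_000, 12),    # query row
    (50_500, 230),   # nearly the same income, very different tenure
    (56_000, 14),    # nearly the same tenure, slightly different income
    (120_000, 120),
    (180_000, 60),
    (25_000, 200),
]

print("nearest before scaling:", nearest_index(rows, 0))
print("nearest after scaling: ", nearest_index(standardize_columns(rows), 0))
```

Before scaling, income differences in the thousands swamp tenure differences in the tens, so tenure has almost no influence on which neighbor is chosen.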

Data Manipulation and Visualization

A dashboard shows monthly revenue. Management wants to identify whether there is a repeating seasonal pattern across multiple years. Which visualization is MOST appropriate?

Machine Learning Basics

You build a binary classifier for fraud detection on a dataset where only 1% of transactions are fraudulent. Accuracy is 99%, but the model rarely flags fraud. Which metric is MOST informative for this situation?

Statistical Analysis and Mathematics

A data scientist computes a 95% confidence interval for the mean time-to-resolution of support tickets. What is the BEST interpretation of this interval?

Data Science Fundamentals

You are building a model to predict employee attrition. Your dataset includes a feature 'left_company_next_month' which is populated based on future HR updates. The model achieves extremely high performance in testing. What is the MOST likely issue?

Statistical Analysis and Mathematics

A team fits a linear regression model to predict house prices. Residual plots show a clear funnel shape: residual variance increases with predicted price. Which assumption is MOST clearly violated, and what is a reasonable next step?

Data Manipulation and Visualization

You are cleaning a dataset with 10 million rows using pandas. A teammate repeatedly appends rows to a DataFrame inside a loop and the job becomes extremely slow. What is the BEST troubleshooting recommendation?

Statistical Analysis and Mathematics

A retail team wants to summarize customer spending where a small number of customers spend extremely large amounts, creating a long right tail. Which metric is the most robust single-number summary of a typical customer's spending?

Data Science Fundamentals

A data scientist is given a dataset that includes a column called customer_id containing unique identifiers for each customer. For most modeling tasks, how should customer_id be treated?

Data Manipulation and Visualization

A dataset contains the columns: date, revenue, and units_sold. You want a visualization to show how revenue changes over time. Which chart is most appropriate?

Machine Learning Basics

A classification model is evaluated on a highly imbalanced dataset where only 2% of cases are positive. Accuracy is 98%, but the model rarely finds positives. Which metric is most appropriate to highlight this issue?

Data Science Fundamentals

A team is preparing data for a churn model. The target column churned is sometimes missing for recently acquired customers who have not been observed long enough. What is the best practice for handling these rows during supervised training?

Statistical Analysis and Mathematics

A researcher compares two groups (A and B) and wants a 95% confidence interval for the difference in their mean values. The data is not strongly non-normal and sample sizes are moderate. Which approach is most appropriate?

Data Manipulation and Visualization

A dataset includes a categorical column city with 200 unique values. You plan to use a linear model and want to include city. What is a common, appropriate encoding approach?

Machine Learning Basics

A team notices their model performs extremely well during cross-validation but poorly after deployment. Investigation shows that a feature 'post_purchase_support_calls' was created using calls made after the churn event. What issue best explains the discrepancy?

Machine Learning Basics

A logistic regression model outputs probabilities for a binary classification problem. The cost of false negatives is far higher than false positives. What is the most appropriate adjustment to address this while keeping the model unchanged?
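Changing the decision threshold is one adjustment that leaves the fitted model untouched. The sketch below uses made-up labels and probabilities to show how a lower threshold trades extra false positives for fewer false negatives.

```python
def evaluate_threshold(y_true, probabilities, threshold):
    """Return (recall, false_positives) when predicting 1 for p >= threshold."""
    predicted = [1 if p >= threshold else 0 for p in probabilities]
    tp = sum(1 for t, p in zip(y_true, predicted) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, predicted) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, predicted) if t == 0 and p == 1)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return recall, fp

# Hypothetical labels and model probabilities (not real model output).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
probs = [0.90, 0.55, 0.35, 0.20, 0.45, 0.30, 0.15, 0.10, 0.05, 0.02]

for threshold in (0.5, 0.25):
    recall, false_positives = evaluate_threshold(y_true, probs, threshold)
    print(f"threshold={threshold}: recall={recall:.2f}, false positives={false_positives}")
```

In practice the threshold would be chosen on validation data, weighing the relative cost of each error type.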

Statistical Analysis and Mathematics

You are building a linear regression model. Residual plots show variance increasing with the fitted value (a funnel shape), and a normality check suggests heavy-tailed errors. Which action is a reasonable first step to improve model validity?

Statistical Analysis and Mathematics

A retail team wants to summarize customer spending by month and compare it across months. Which measure is most appropriate to report for each month to reduce the impact of a few extremely large purchases?

Data Manipulation and Visualization

A data scientist is building a churn dataset and discovers duplicate customer records caused by multiple sign-ups with the same email. What is the best next step before modeling?

Machine Learning Basics

A team is unsure whether a problem should be treated as classification or regression. The target variable is the number of days until a customer’s next purchase. Which type of problem is this?

Data Science Fundamentals

You create a feature 'total_spend_last_30_days' to predict whether a customer will churn next week. The feature was computed using transactions that occurred after the churn label date for some customers. What issue does this introduce?

Data Manipulation and Visualization

A dataset contains two categorical columns: 'city' with 10,000 unique values and 'membership_tier' with 4 values. For a foundational baseline model, which encoding approach is most appropriate for each column?

Statistical Analysis and Mathematics

A marketer claims a new email campaign increased conversion rate. You have conversion outcomes for a random sample of users who received the email and a control group that did not. Which statistical test is most appropriate to compare conversion rates?

Machine Learning Basics

A team evaluates a model for predicting rare fraud events (1% positive class). Accuracy is 99% but the model misses most fraud cases. Which metric is most appropriate to prioritize if the goal is to catch as many fraud cases as possible?

Data Manipulation and Visualization

You want to communicate the distribution of response times for two APIs and highlight differences in median and spread, while also showing potential outliers. Which visualization is most appropriate?

Statistical Analysis and Mathematics

A linear regression model is trained to predict house price. Residual plots show a funnel shape: residual variance increases as predicted price increases. Which assumption is most directly violated?

Data Science Fundamentals

A model performs extremely well in cross-validation but fails in production. Investigation shows that many predictors are derived from a 'final_status' field recorded only after the outcome occurs. What is the best corrective action?

FAQ

IBM A1000-120 - Assessment: Data Science Foundations 50 Practice Questions FAQs

IBM A1000-120 - Assessment: Data Science Foundations is a professional certification from IBM that validates expertise in data science foundations concepts and technologies. The official exam code is A1000-120.

Our 50 IBM A1000-120 - Assessment: Data Science Foundations practice questions include a curated selection of exam-style questions covering key concepts from all exam domains. Each question includes detailed explanations to help you learn.

50 questions is a great starting point for IBM A1000-120 - Assessment: Data Science Foundations preparation. For comprehensive coverage, we recommend also using our 100 and 200 question banks as you progress.

The 50 IBM A1000-120 - Assessment: Data Science Foundations questions are organized by exam domain and include a mix of easy, medium, and hard questions to test your knowledge at different levels.
