Professional Data Engineer Intermediate Practice Exam: Medium Difficulty 2025
Ready to level up? Our intermediate practice exam features medium-difficulty, scenario-based questions that test your ability to apply concepts in real-world situations. Perfect for bridging the gap between foundational knowledge and exam-ready proficiency.
Your Learning Path
What Makes Intermediate Questions Different?
Apply your knowledge in practical scenarios
Medium Difficulty
Questions that test application of concepts in real-world scenarios
Scenario-Based
Practical situations requiring multi-concept understanding
Exam-Similar
Question style mirrors what you'll encounter on the actual exam
Bridge to Advanced
Prepare yourself for the most challenging questions
Medium Difficulty Practice Questions
10 intermediate-level questions for the Professional Data Engineer exam
A retail company wants to build a near-real-time analytics pipeline for clickstream events. Events arrive continuously, must be deduplicated (the same eventId may be delivered more than once due to retries), enriched with a small reference dataset that is updated daily, and written to BigQuery for dashboards with a typical end-to-end latency under 2 minutes. Which architecture best meets these requirements with minimal operational overhead?
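The pattern this question probes is typically Pub/Sub feeding a streaming Dataflow (Apache Beam) job that deduplicates on eventId, enriches from a side input, and writes to BigQuery. A minimal Beam sketch of those core transforms, assuming hypothetical subscription, bucket, and table names (and an already-created BigQuery table):

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms.deduplicate import DeduplicatePerKey
from apache_beam.utils.timestamp import Duration


def run():
    with beam.Pipeline(options=PipelineOptions(streaming=True)) as p:
        # Small daily reference dataset, materialized as a dict side input.
        # (A production job would refresh this periodically, e.g., with a
        # slowly updating side input pattern.)
        ref = (
            p
            | "ReadRef" >> beam.io.ReadFromText("gs://example-bucket/ref/latest.jsonl")
            | "ParseRef" >> beam.Map(json.loads)
            | "KeyRef" >> beam.Map(lambda r: (r["productId"], r["category"]))
        )
        (
            p
            | "ReadEvents" >> beam.io.ReadFromPubSub(
                subscription="projects/example/subscriptions/clickstream-sub")
            | "Parse" >> beam.Map(json.loads)
            # Drop retried duplicates of the same eventId seen within 10 minutes.
            | "KeyById" >> beam.Map(lambda e: (e["eventId"], e))
            | "Dedup" >> DeduplicatePerKey(processing_time_duration=Duration.of(600))
            | "DropKey" >> beam.Values()
            # Attach reference attributes to each event.
            | "Enrich" >> beam.Map(
                lambda e, lookup: {**e, "category": lookup.get(e.get("productId"), "unknown")},
                lookup=beam.pvalue.AsDict(ref))
            | "Write" >> beam.io.WriteToBigQuery(
                "example-project:analytics.clickstream",
                create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
        )


if __name__ == "__main__":
    run()
```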
A media company has 500 TB of semi-structured JSON logs in Cloud Storage and wants to enable ad-hoc SQL analysis with schema-on-read. Analysts frequently filter by event_date and user_id, and query cost/performance must be optimized. Which approach is most appropriate?
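For a scenario like this, the trade-off being tested is usually external (schema-on-read) tables versus materializing into a native BigQuery table partitioned on event_date and clustered on user_id so frequent filters prune both. A sketch of the materialized variant using the google-cloud-bigquery client; project, dataset, and field names are hypothetical, and the source NDJSON is assumed to carry matching field names:

```python
from google.cloud import bigquery

client = bigquery.Client(project="example-project")

# Partitioning on event_date and clustering on user_id lets BigQuery prune
# data for the analysts' two most common filters.
client.query("""
CREATE TABLE IF NOT EXISTS analytics.events (
  event_date DATE,
  user_id STRING,
  event_type STRING
)
PARTITION BY event_date
CLUSTER BY user_id
""").result()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    ignore_unknown_values=True,  # tolerate extra fields in the raw logs
)
client.load_table_from_uri(
    "gs://example-bucket/logs/*.json",
    "example-project.analytics.events",
    job_config=job_config,
).result()  # wait for the load job to finish
```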
A financial services company needs to process daily batch files from multiple partners. The pipeline must validate schema, quarantine bad records, and produce a curated dataset in BigQuery. They also want to version and promote changes to transformations across dev/test/prod with code review. Which solution best fits?
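Questions like this usually point toward orchestrating the flow (e.g., Cloud Composer) while keeping the transformations themselves in a reviewed, versioned repository such as a Dataform workspace promoted through dev/test/prod. A skeletal Airflow DAG showing the validate-quarantine-curate sequence; table names and the valid_schema UDF are hypothetical:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import (
    BigQueryInsertJobOperator,
)

with DAG(
    "partner_batch_load",
    start_date=datetime(2025, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Route rows that fail validation to a quarantine table for review.
    quarantine = BigQueryInsertJobOperator(
        task_id="quarantine_bad_records",
        configuration={"query": {
            "query": """
                INSERT INTO curated.quarantine
                SELECT * FROM staging.partner_files
                WHERE NOT valid_schema(raw)  -- hypothetical validation UDF
            """,
            "useLegacySql": False,
        }},
    )
    # Promote only the valid records into the curated dataset.
    curate = BigQueryInsertJobOperator(
        task_id="load_curated",
        configuration={"query": {
            "query": """
                INSERT INTO curated.partner_data
                SELECT * FROM staging.partner_files
                WHERE valid_schema(raw)
            """,
            "useLegacySql": False,
        }},
    )
    quarantine >> curate
```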
A Dataflow streaming pipeline reading from Pub/Sub intermittently fails due to malformed messages. The team needs the pipeline to continue processing valid messages, route invalid payloads for later inspection, and set up alerting when the rate of invalid messages exceeds a threshold. What should they do?
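This is the classic dead-letter pattern: catch parse failures in the DoFn, side-output the raw payload to a dead-letter sink, and drive alerting from a custom counter (or from backlog on the dead-letter topic) in Cloud Monitoring. A minimal Beam sketch; subscription, topic, and table names are hypothetical:

```python
import json

import apache_beam as beam
from apache_beam import pvalue
from apache_beam.metrics import Metrics
from apache_beam.options.pipeline_options import PipelineOptions


class ParseOrDeadLetter(beam.DoFn):
    INVALID = "invalid"
    invalid_counter = Metrics.counter("pipeline", "invalid_messages")

    def process(self, raw):
        try:
            yield json.loads(raw.decode("utf-8"))
        except Exception:
            # Count the failure (feeds a Cloud Monitoring alert) and
            # side-output the raw bytes for later inspection.
            self.invalid_counter.inc()
            yield pvalue.TaggedOutput(self.INVALID, raw)


def run():
    with beam.Pipeline(options=PipelineOptions(streaming=True)) as p:
        results = (
            p
            | beam.io.ReadFromPubSub(
                subscription="projects/example/subscriptions/events-sub")
            | beam.ParDo(ParseOrDeadLetter()).with_outputs(
                ParseOrDeadLetter.INVALID, main="valid")
        )
        results.valid | "WriteValid" >> beam.io.WriteToBigQuery(
            "example-project:analytics.events",
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
        results.invalid | "DeadLetter" >> beam.io.WriteToPubSub(
            topic="projects/example/topics/events-dead-letter")
```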
Your team operates multiple BigQuery datasets across projects. A new requirement states that analysts must only see rows for their region (e.g., EMEA, APAC) without duplicating tables, and access should be manageable centrally. What is the best approach?
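The capability being tested here is BigQuery row-level security: row access policies on a single shared table, granted to centrally managed groups. A sketch of the DDL issued through the Python client; table and group names are hypothetical:

```python
from google.cloud import bigquery

client = bigquery.Client(project="example-project")

# Each policy filters the same shared table, so no data is duplicated and
# membership is managed centrally through the granted groups.
for region, group in [("EMEA", "emea-analysts@example.com"),
                      ("APAC", "apac-analysts@example.com")]:
    client.query(f"""
        CREATE OR REPLACE ROW ACCESS POLICY {region.lower()}_only
        ON analytics.sales
        GRANT TO ("group:{group}")
        FILTER USING (region = '{region}')
    """).result()
```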
A batch pipeline loads data into BigQuery each hour. Occasionally, the same source file is delivered twice, and the curated table ends up with duplicate rows. The team wants the load to be idempotent and auditable. What should they implement?
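The usual idempotent-load pattern is to land each file in a staging table, MERGE into the curated table on a stable key, and record the file in an audit table. A sketch with hypothetical table names and keys:

```python
from google.cloud import bigquery

client = bigquery.Client(project="example-project")

# MERGE makes the load idempotent: rows from a re-delivered file match on
# order_id and insert nothing new.
merge_job = client.query("""
MERGE curated.orders AS t
USING staging.orders_batch AS s
ON t.order_id = s.order_id
WHEN NOT MATCHED THEN
  INSERT (order_id, amount, source_file, load_time)
  VALUES (s.order_id, s.amount, s.source_file, CURRENT_TIMESTAMP())
""")
merge_job.result()

# Record the processed file and job id so every load is auditable.
client.query(
    """
    INSERT INTO ops.load_audit (source_file, loaded_at, job_id)
    VALUES (@source_file, CURRENT_TIMESTAMP(), @job_id)
    """,
    job_config=bigquery.QueryJobConfig(query_parameters=[
        bigquery.ScalarQueryParameter(
            "source_file", "STRING", "orders_2025-01-01.csv"),
        bigquery.ScalarQueryParameter("job_id", "STRING", merge_job.job_id),
    ]),
).result()
```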
A team has trained a model in Vertex AI and deployed it to an endpoint. After deployment, they suspect training/serving skew because a key categorical feature is encoded differently online than in training. They want a solution that reduces drift risk and makes preprocessing consistent between training and prediction. What should they do?
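The skew-reduction idea this question targets is packaging preprocessing with the model artifact so the same transformation runs at training and prediction time. A minimal scikit-learn sketch (feature names are hypothetical); the fitted Pipeline is then uploaded to Vertex AI as a single artifact served by a prebuilt container:

```python
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# The encoder is fitted once, inside the Pipeline, so online predictions
# reuse exactly the training-time encoding of the categorical feature.
preprocess = ColumnTransformer(
    [("plan_type", OneHotEncoder(handle_unknown="ignore"), ["plan_type"])],
    remainder="passthrough",
)
model = Pipeline([
    ("preprocess", preprocess),
    ("classifier", LogisticRegression(max_iter=1000)),
])
# model.fit(X_train, y_train) -> export the whole Pipeline (encoder + model)
# as one artifact; there is no separate, hand-maintained online encoder to
# drift out of sync with training.
```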
A data science team needs to retrain a churn model weekly using the latest labeled data in BigQuery, run evaluation, and only deploy if performance meets a defined threshold. The process must be repeatable and auditable. Which approach best satisfies this?
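This scenario describes a scheduled Vertex AI Pipelines run with a metric gate before deployment. A skeletal KFP v2 definition; component bodies, the threshold, and all names are hypothetical:

```python
from kfp import dsl


@dsl.component
def train_and_evaluate() -> float:
    # Train on the latest labeled BigQuery data and return an eval metric.
    auc = 0.91  # placeholder for real training/evaluation logic
    return auc


@dsl.component
def deploy_model():
    # Upload the model version and deploy it to the serving endpoint.
    pass


@dsl.pipeline(name="weekly-churn-retrain")
def weekly_retrain():
    eval_task = train_and_evaluate()
    # Deploy only when the evaluation metric clears the threshold.
    with dsl.Condition(eval_task.output >= 0.85):
        deploy_model()
```

Compiled and attached to a weekly Vertex AI pipeline schedule, each run records its inputs, metric, and deploy decision, which gives the repeatable, auditable flow the question asks for.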
A company has regulatory requirements to ensure that sensitive columns (e.g., national_id) are protected in analytics. Analysts should still be able to join on the identifier and compute aggregates, but must not see the raw values. What is the best solution in BigQuery?
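The requirement (joinable, aggregatable, never visible raw) typically points at a deterministic token: SHA256 in an authorized view or AEAD deterministic encryption, with policy tags plus dynamic data masking as the managed alternative. A sketch of the view-based variant, with hypothetical names and an illustrative inline salt (a real deployment would keep key material out of the view definition):

```python
from google.cloud import bigquery

client = bigquery.Client(project="example-project")

client.query("""
CREATE OR REPLACE VIEW secure.customers_masked AS
SELECT
  -- The same input always yields the same token, so analysts can still
  -- join on it and run COUNT(DISTINCT ...) without seeing raw values.
  TO_HEX(SHA256(CONCAT(national_id, 'per-env-salt'))) AS national_id_token,
  region,
  balance
FROM raw.customers
""").result()
```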
A data platform team needs to improve reliability for a critical dataset produced by an ELT workflow in BigQuery. They want automated checks for freshness and null rates on key columns, and they need to be alerted when checks fail. Which approach best meets these needs using Google Cloud-native capabilities?
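Dataplex data quality scans are the Cloud-native answer here (freshness and null-rate rules with built-in alerting), but the checks they run reduce to assertions like the ones below. A sketch of the underlying logic, with hypothetical tables and thresholds; the failure log line could drive a log-based alert in Cloud Monitoring:

```python
from google.cloud import bigquery

client = bigquery.Client(project="example-project")

# Each query returns a single boolean: does the table pass the check?
checks = {
    "freshness_90min": """
        SELECT TIMESTAMP_DIFF(CURRENT_TIMESTAMP(), MAX(updated_at), MINUTE) <= 90
        FROM curated.orders
    """,
    "customer_id_null_rate_1pct": """
        SELECT COUNTIF(customer_id IS NULL) / COUNT(*) <= 0.01
        FROM curated.orders
    """,
}
for name, sql in checks.items():
    passed = list(client.query(sql).result())[0][0]
    if not passed:
        # Structured failure log; a log-based alert notifies the team.
        print(f"DATA_QUALITY_FAILURE check={name}")
```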
Mastered the intermediate level?
Challenge yourself with advanced questions when you score above 85%
Professional Data Engineer Intermediate Practice Exam FAQs
Professional Data Engineer is a professional certification from Google Cloud that validates expertise in designing, building, securing, and operationalizing data processing systems on Google Cloud. The official exam code is GCP-9.
The Professional Data Engineer intermediate practice exam contains medium-difficulty questions that test your working knowledge of core concepts. These questions are similar to what you'll encounter on the actual exam.
Take the Professional Data Engineer intermediate practice exam after you've completed the beginner level and feel comfortable with basic concepts. This helps bridge the gap between foundational knowledge and exam-ready proficiency.
The Professional Data Engineer intermediate practice exam includes scenario-based questions and multi-concept problems similar to the GCP-9 exam, helping you apply knowledge in practical situations.
Continue Your Journey
More resources to help you pass the exam