Professional Data Engineer Intermediate Practice Exam: Medium Difficulty 2025
Ready to level up? Our intermediate practice exam features medium-difficulty, scenario-based questions that test your ability to apply concepts in real-world situations. Perfect for bridging the gap between foundational knowledge and exam-ready proficiency.
Your Learning Path
What Makes Intermediate Questions Different?
Apply your knowledge in practical scenarios
Medium Difficulty
Questions that test application of concepts in real-world scenarios
Scenario-Based
Practical situations requiring multi-concept understanding
Exam-Similar
Question style mirrors what you'll encounter on the actual exam
Bridge to Advanced
Prepare yourself for the most challenging questions
Medium Difficulty Practice Questions
10 intermediate-level questions for the Professional Data Engineer exam
A retail company wants to build a near-real-time analytics pipeline for clickstream events. Events arrive continuously, must be deduplicated (the same eventId may be delivered more than once due to retries), enriched with a small reference dataset that is updated daily, and written to BigQuery for dashboards with a typical end-to-end latency under 2 minutes. Which architecture best meets these requirements with minimal operational overhead?
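The pattern this question probes is typically Pub/Sub feeding a streaming Dataflow (Apache Beam) job that deduplicates on eventId, enriches from a side input, and writes to BigQuery. A minimal Beam sketch of those core transforms, assuming hypothetical subscription, bucket, and table names (and an already-created BigQuery table):

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms.deduplicate import DeduplicatePerKey
from apache_beam.utils.timestamp import Duration


def run():
    with beam.Pipeline(options=PipelineOptions(streaming=True)) as p:
        # Small daily reference dataset, materialized as a dict side input.
        # (A production job would refresh this periodically, e.g., with a
        # slowly updating side input pattern.)
        ref = (
            p
            | "ReadRef" >> beam.io.ReadFromText("gs://example-bucket/ref/latest.jsonl")
            | "ParseRef" >> beam.Map(json.loads)
            | "KeyRef" >> beam.Map(lambda r: (r["productId"], r["category"]))
        )
        (
            p
            | "ReadEvents" >> beam.io.ReadFromPubSub(
                subscription="projects/example/subscriptions/clickstream-sub")
            | "Parse" >> beam.Map(json.loads)
            # Drop retried duplicates of the same eventId seen within 10 minutes.
            | "KeyById" >> beam.Map(lambda e: (e["eventId"], e))
            | "Dedup" >> DeduplicatePerKey(processing_time_duration=Duration.of(600))
            | "DropKey" >> beam.Values()
            # Attach reference attributes to each event.
            | "Enrich" >> beam.Map(
                lambda e, lookup: {**e, "category": lookup.get(e.get("productId"), "unknown")},
                lookup=beam.pvalue.AsDict(ref))
            | "Write" >> beam.io.WriteToBigQuery(
                "example-project:analytics.clickstream",
                create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
        )


if __name__ == "__main__":
    run()
```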
A media company has 500 TB of semi-structured JSON logs in Cloud Storage and wants to enable ad-hoc SQL analysis with schema-on-read. Analysts frequently filter by event_date and user_id, and query cost/performance must be optimized. Which approach is most appropriate?
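For a scenario like this, the trade-off being tested is usually external (schema-on-read) tables versus materializing into a native BigQuery table partitioned on event_date and clustered on user_id so frequent filters prune both. A sketch of the materialized variant using the google-cloud-bigquery client; project, dataset, and field names are hypothetical, and the source NDJSON is assumed to carry matching field names:

```python
from google.cloud import bigquery

client = bigquery.Client(project="example-project")

# Partitioning on event_date and clustering on user_id lets BigQuery prune
# data for the analysts' two most common filters.
client.query("""
CREATE TABLE IF NOT EXISTS analytics.events (
  event_date DATE,
  user_id STRING,
  event_type STRING
)
PARTITION BY event_date
CLUSTER BY user_id
""").result()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    ignore_unknown_values=True,  # tolerate extra fields in the raw logs
)
client.load_table_from_uri(
    "gs://example-bucket/logs/*.json",
    "example-project.analytics.events",
    job_config=job_config,
).result()  # wait for the load job to finish
```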
A financial services company needs to process daily batch files from multiple partners. The pipeline must validate schema, quarantine bad records, and produce a curated dataset in BigQuery. They also want to version and promote changes to transformations across dev/test/prod with code review. Which solution best fits?
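Questions like this usually point toward orchestrating the flow (e.g., Cloud Composer) while keeping the transformations themselves in a reviewed, versioned repository such as a Dataform workspace promoted through dev/test/prod. A skeletal Airflow DAG showing the validate-quarantine-curate sequence; table names and the valid_schema UDF are hypothetical:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import (
    BigQueryInsertJobOperator,
)

with DAG(
    "partner_batch_load",
    start_date=datetime(2025, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Route rows that fail validation to a quarantine table for review.
    quarantine = BigQueryInsertJobOperator(
        task_id="quarantine_bad_records",
        configuration={"query": {
            "query": """
                INSERT INTO curated.quarantine
                SELECT * FROM staging.partner_files
                WHERE NOT valid_schema(raw)  -- hypothetical validation UDF
            """,
            "useLegacySql": False,
        }},
    )
    # Promote only the valid records into the curated dataset.
    curate = BigQueryInsertJobOperator(
        task_id="load_curated",
        configuration={"query": {
            "query": """
                INSERT INTO curated.partner_data
                SELECT * FROM staging.partner_files
                WHERE valid_schema(raw)
            """,
            "useLegacySql": False,
        }},
    )
    quarantine >> curate
```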
A Dataflow streaming pipeline reading from Pub/Sub intermittently fails due to malformed messages. The team needs the pipeline to continue processing valid messages, route invalid payloads for later inspection, and set up alerting when the rate of invalid messages exceeds a threshold. What should they do?
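This is the classic dead-letter pattern: catch parse failures in the DoFn, side-output the raw payload to a dead-letter sink, and drive alerting from a custom counter (or from backlog on the dead-letter topic) in Cloud Monitoring. A minimal Beam sketch; subscription, topic, and table names are hypothetical:

```python
import json

import apache_beam as beam
from apache_beam import pvalue
from apache_beam.metrics import Metrics
from apache_beam.options.pipeline_options import PipelineOptions


class ParseOrDeadLetter(beam.DoFn):
    INVALID = "invalid"
    invalid_counter = Metrics.counter("pipeline", "invalid_messages")

    def process(self, raw):
        try:
            yield json.loads(raw.decode("utf-8"))
        except Exception:
            # Count the failure (feeds a Cloud Monitoring alert) and
            # side-output the raw bytes for later inspection.
            self.invalid_counter.inc()
            yield pvalue.TaggedOutput(self.INVALID, raw)


def run():
    with beam.Pipeline(options=PipelineOptions(streaming=True)) as p:
        results = (
            p
            | beam.io.ReadFromPubSub(
                subscription="projects/example/subscriptions/events-sub")
            | beam.ParDo(ParseOrDeadLetter()).with_outputs(
                ParseOrDeadLetter.INVALID, main="valid")
        )
        results.valid | "WriteValid" >> beam.io.WriteToBigQuery(
            "example-project:analytics.events",
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
        results.invalid | "DeadLetter" >> beam.io.WriteToPubSub(
            topic="projects/example/topics/events-dead-letter")
```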
Your team operates multiple BigQuery datasets across projects. A new requirement states that analysts must only see rows for their region (e.g., EMEA, APAC) without duplicating tables, and access should be manageable centrally. What is the best approach?
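The capability being tested here is BigQuery row-level security: row access policies on a single shared table, granted to centrally managed groups. A sketch of the DDL issued through the Python client; table and group names are hypothetical:

```python
from google.cloud import bigquery

client = bigquery.Client(project="example-project")

# Each policy filters the same shared table, so no data is duplicated and
# membership is managed centrally through the granted groups.
for region, group in [("EMEA", "emea-analysts@example.com"),
                      ("APAC", "apac-analysts@example.com")]:
    client.query(f"""
        CREATE OR REPLACE ROW ACCESS POLICY {region.lower()}_only
        ON analytics.sales
        GRANT TO ("group:{group}")
        FILTER USING (region = '{region}')
    """).result()
```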
A batch pipeline loads data into BigQuery each hour. Occasionally, the same source file is delivered twice, and the curated table ends up with duplicate rows. The team wants the load to be idempotent and auditable. What should they implement?
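The usual idempotent-load pattern is to land each file in a staging table, MERGE into the curated table on a stable key, and record the file in an audit table. A sketch with hypothetical table names and keys:

```python
from google.cloud import bigquery

client = bigquery.Client(project="example-project")

# MERGE makes the load idempotent: rows from a re-delivered file match on
# order_id and insert nothing new.
merge_job = client.query("""
MERGE curated.orders AS t
USING staging.orders_batch AS s
ON t.order_id = s.order_id
WHEN NOT MATCHED THEN
  INSERT (order_id, amount, source_file, load_time)
  VALUES (s.order_id, s.amount, s.source_file, CURRENT_TIMESTAMP())
""")
merge_job.result()

# Record the processed file and job id so every load is auditable.
client.query(
    """
    INSERT INTO ops.load_audit (source_file, loaded_at, job_id)
    VALUES (@source_file, CURRENT_TIMESTAMP(), @job_id)
    """,
    job_config=bigquery.QueryJobConfig(query_parameters=[
        bigquery.ScalarQueryParameter(
            "source_file", "STRING", "orders_2025-01-01.csv"),
        bigquery.ScalarQueryParameter("job_id", "STRING", merge_job.job_id),
    ]),
).result()
```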
A team has trained a model in Vertex AI and deployed it to an endpoint. After deployment, they suspect training/serving skew because a key categorical feature is encoded differently online than in training. They want a solution that reduces drift risk and makes preprocessing consistent between training and prediction. What should they do?
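The skew-reduction idea this question targets is packaging preprocessing with the model artifact so the same transformation runs at training and prediction time. A minimal scikit-learn sketch (feature names are hypothetical); the fitted Pipeline is then uploaded to Vertex AI as a single artifact served by a prebuilt container:

```python
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# The encoder is fitted once, inside the Pipeline, so online predictions
# reuse exactly the training-time encoding of the categorical feature.
preprocess = ColumnTransformer(
    [("plan_type", OneHotEncoder(handle_unknown="ignore"), ["plan_type"])],
    remainder="passthrough",
)
model = Pipeline([
    ("preprocess", preprocess),
    ("classifier", LogisticRegression(max_iter=1000)),
])
# model.fit(X_train, y_train) -> export the whole Pipeline (encoder + model)
# as one artifact; there is no separate, hand-maintained online encoder to
# drift out of sync with training.
```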
A data science team needs to retrain a churn model weekly using the latest labeled data in BigQuery, run evaluation, and only deploy if performance meets a defined threshold. The process must be repeatable and auditable. Which approach best satisfies this?
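This scenario describes a scheduled Vertex AI Pipelines run with a metric gate before deployment. A skeletal KFP v2 definition; component bodies, the threshold, and all names are hypothetical:

```python
from kfp import dsl


@dsl.component
def train_and_evaluate() -> float:
    # Train on the latest labeled BigQuery data and return an eval metric.
    auc = 0.91  # placeholder for real training/evaluation logic
    return auc


@dsl.component
def deploy_model():
    # Upload the model version and deploy it to the serving endpoint.
    pass


@dsl.pipeline(name="weekly-churn-retrain")
def weekly_retrain():
    eval_task = train_and_evaluate()
    # Deploy only when the evaluation metric clears the threshold.
    with dsl.Condition(eval_task.output >= 0.85):
        deploy_model()
```

Compiled and attached to a weekly Vertex AI pipeline schedule, each run records its inputs, metric, and deploy decision, which gives the repeatable, auditable flow the question asks for.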
A company has regulatory requirements to ensure that sensitive columns (e.g., national_id) are protected in analytics. Analysts should still be able to join on the identifier and compute aggregates, but must not see the raw values. What is the best solution in BigQuery?
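The requirement (joinable, aggregatable, never visible raw) typically points at a deterministic token: SHA256 in an authorized view or AEAD deterministic encryption, with policy tags plus dynamic data masking as the managed alternative. A sketch of the view-based variant, with hypothetical names and an illustrative inline salt (a real deployment would keep key material out of the view definition):

```python
from google.cloud import bigquery

client = bigquery.Client(project="example-project")

client.query("""
CREATE OR REPLACE VIEW secure.customers_masked AS
SELECT
  -- The same input always yields the same token, so analysts can still
  -- join on it and run COUNT(DISTINCT ...) without seeing raw values.
  TO_HEX(SHA256(CONCAT(national_id, 'per-env-salt'))) AS national_id_token,
  region,
  balance
FROM raw.customers
""").result()
```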
A data platform team needs to improve reliability for a critical dataset produced by an ELT workflow in BigQuery. They want automated checks for freshness and null rates on key columns, and they need to be alerted when checks fail. Which approach best meets these needs using Google Cloud-native capabilities?
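Dataplex data quality scans are the Cloud-native answer here (freshness and null-rate rules with built-in alerting), but the checks they run reduce to assertions like the ones below. A sketch of the underlying logic, with hypothetical tables and thresholds; the failure log line could drive a log-based alert in Cloud Monitoring:

```python
from google.cloud import bigquery

client = bigquery.Client(project="example-project")

# Each query returns a single boolean: does the table pass the check?
checks = {
    "freshness_90min": """
        SELECT TIMESTAMP_DIFF(CURRENT_TIMESTAMP(), MAX(updated_at), MINUTE) <= 90
        FROM curated.orders
    """,
    "customer_id_null_rate_1pct": """
        SELECT COUNTIF(customer_id IS NULL) / COUNT(*) <= 0.01
        FROM curated.orders
    """,
}
for name, sql in checks.items():
    passed = list(client.query(sql).result())[0][0]
    if not passed:
        # Structured failure log; a log-based alert notifies the team.
        print(f"DATA_QUALITY_FAILURE check={name}")
```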
Mastered the intermediate level?
Challenge yourself with advanced questions when you score above 85%
Professional Data Engineer Intermediate Practice Exam FAQs
Professional Data Engineer is a professional certification from Google Cloud that validates expertise in designing, building, securing, and operationalizing data processing systems on Google Cloud. The official exam code is GCP-9.
The Professional Data Engineer intermediate practice exam contains medium-difficulty questions that test your working knowledge of core concepts. These questions are similar to what you'll encounter on the actual exam.
Take the Professional Data Engineer intermediate practice exam after you've completed the beginner level and feel comfortable with basic concepts. This helps bridge the gap between foundational knowledge and exam-ready proficiency.
The Professional Data Engineer intermediate practice exam includes scenario-based questions and multi-concept problems similar to the GCP-9 exam, helping you apply knowledge in practical situations.
Continue Your Journey
More resources to help you pass the exam