GCP Developer Exam Study Guide - Part IV
Part 4 of 6
Last month I took the Google Cloud Platform Professional Developer Exam. To prepare, I put together a study guide. I'm posting it here in six parts. Hopefully, it will help someone else with the exam. You can see the full study guide at my GitHub.
Section 4: Integrating Google Cloud Platform Services
4.1 Integrating an application with Data and Storage services
- Enabling BigQuery and setting permissions on a dataset: There are a number of permissions and roles associated with BigQuery. Setting up a service account with the appropriate roles allows compute resources to access BigQuery data at the desired level (read, write, delete, etc.).
- Writing a SQL query to retrieve data from relational databases: All coding languages and frameworks have their own tools/libraries for interfacing with SQL databases. Point the database connection at the endpoint provided by GCP and construct queries in standard SQL. Cloud Spanner can also be queried through the SDK.
- Analyzing data using BigQuery: This is a developer exam, not an analyst one. You can run standard SQL queries in BigQuery, but for more robust analysis you will need an analytics tool, like Datalab.
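A minimal sketch of running a query from application code with the Python BigQuery client library (the public dataset and column names are just an example):

from google.cloud import bigquery

client = bigquery.Client()  # uses application default credentials
query = """
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    GROUP BY name
    ORDER BY total DESC
    LIMIT 10
"""
for row in client.query(query).result():  # run the query and wait for the results
    print(row.name, row.total)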
- Fetching data from various databases: Managed databases have SDKs that can be used for queries in addition to a more conventional interface library in application code.
- Enabling Cloud SQL and configuring an instance: You can create an instance using the console (or gcloud, or the API). After the instance has been created, run
gcloud sql connect <instance_name> --user=root
then provide the root password set when creating the instance, and you will be brought to the MySQL or PostgreSQL prompt. You can create databases and upload data using standard SQL.
- Connecting to a Cloud SQL instance: Use the gcloud sql connect command above.
- Enabling Cloud Spanner and configuring an instance: You can create an instance and configure the schema using the console (or gcloud, or the client libraries).
- Creating an application that uses Cloud Spanner: Cloud Spanner has client libraries for most major programming languages; see the docs on writing application code against it.
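A minimal sketch of reading from Spanner with the Python client library (the instance, database, and table names are assumptions):

from google.cloud import spanner

client = spanner.Client()
database = client.instance("my-instance").database("my-database")

# run a read-only query inside a snapshot
with database.snapshot() as snapshot:
    for row in snapshot.execute_sql("SELECT SingerId, FirstName FROM Singers"):
        print(row)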
- Configuring a Cloud Pub/Sub push subscription to call an endpoint: Pub/Sub pushes messages in the subscription to a preconfigured endpoint (webhook/URL) as HTTPS POST requests. Subscriptions can be configured with an auth header for endpoints that require it. The message.data field is base64-encoded. The endpoint must return a success code (102, 200, 201, 202, or 204); otherwise Pub/Sub retries delivery until the message expires. You can create a push subscription (subscriptions are pull by default) using the following command:
gcloud pubsub subscriptions create mySubscription --topic myTopic --push-endpoint="https://myapp.appspot.com/push"
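On the receiving side, a minimal Flask sketch of what a push endpoint might look like (the route and everything besides the push message format are assumptions):

import base64
import json
from flask import Flask, request

app = Flask(__name__)

@app.route("/push", methods=["POST"])
def receive_push():
    envelope = json.loads(request.data)
    message = envelope["message"]
    payload = base64.b64decode(message["data"]).decode("utf-8")  # data arrives base64-encoded
    print(f"received {payload} (messageId={message['messageId']})")
    return "", 204  # any success code acknowledges the message; otherwise Pub/Sub retries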
- Connecting and running a Cloud SQL query: You can connect to the Cloud SQL instance via Cloud Shell, using the mysql client, or via the Cloud SQL Proxy for an external app. Once the connection is established and authenticated, queries can be run in standard SQL, like any IaaS RDBMS. You can connect to a MySQL instance using the mysql client (or psql for PostgreSQL) installed on another server, or run
gcloud sql connect [INSTANCE_ID] --user=root
in Cloud Shell to connect. Once connected, you can run queries using the client. Connecting from different compute resources will require different tools/methods.
- Storing and retrieving objects from Google Storage: You can use the console or the gcloud SDK to read and write objects to buckets. In addition to the GUI, you can use the gsutil command line tool to interact with objects in buckets, and there are client SDKs for the major programming languages (a Python sketch follows below).
- Pushing and consuming from Data Ingestion sources: There are dozens of combinations of services that can do this; the best approach really depends on the use case. Whether you have batches of data from Dataproc or streaming data coming from Dataflow, it's usually best to write it to a store (Cloud SQL, Spanner, Datastore) or push it through a queue like Pub/Sub to make sure no data is lost. Pub/Sub is a good multipurpose solution for this use case: any data source (app, batch, logs) can write to it, and many different services can pick it up without data loss.
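For the Cloud Storage bullet above, a minimal sketch with the Python client library (the bucket and object names are placeholders):

from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-example-bucket")

# upload an object
blob = bucket.blob("reports/report.txt")
blob.upload_from_string("hello from the app")

# download it again
print(bucket.blob("reports/report.txt").download_as_text())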
- Reading and updating an entity in a Cloud Datastore transaction from an application: Use the SDK. In Datastore, an entity can represent any object and can have as many key-value properties as needed. In the Python client library, get(key, missing=None, deferred=None, transaction=None, eventual=False) reads an entity, and put(entity) adds or updates one; doing both inside a transaction makes the read-modify-write atomic.
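Putting that together, a minimal sketch of a read-modify-write inside a Datastore transaction with the Python client (the kind, key name, and property are assumptions):

from google.cloud import datastore

client = datastore.Client()
key = client.key("Account", "alice")

with client.transaction():
    account = client.get(key)   # read inside the transaction
    account["balance"] += 100   # update a property on the entity
    client.put(account)         # queued write; commits when the block exits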
- Using the CLI tools: Every action in GCP is an API call, and the gcloud CLI can make those calls. The library is huge, but commands are generally something like gcloud <service> <action> <options>. Some services have special CLI tools, like Cloud Storage (gsutil) and BigQuery (bq).
- Provisioning and configuring networks: There is a lot to cover here. GCP uses software-defined networking, so VPCs are global and subnets are regional. GCP offers a Shared VPC, where a host project holds the VPC and other service projects can deploy resources into that VPC. This is useful for enterprise architectures where app teams need their own projects to work in, but security and networking teams maintain control over the network. VPCs have firewall rules to control traffic. VPCs can be peered to one another for easy access, and they can be connected to hybrid environments using Cloud VPN or Interconnect. TL;DR: when configuring networks, make sure that firewall rules allow communication on the ports needed for your data solutions.
4.2 Integrating an application with Compute services.
- Implementing service discovery in GKE, GAE, and Compute Engine: GCP includes service discovery in the form of a metadata server. You can configure project metadata with shared environment variables and query the metadata server endpoint (e.g., with cURL) from your instances; see the docs. In the context of containers (GKE), service discovery is handled by Kubernetes Services, which give a stable name to a group of Pods.
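A minimal sketch of querying the metadata server from a VM in Python (uses the requests library; the custom metadata key is an assumption):

import requests

METADATA_URL = "http://metadata.google.internal/computeMetadata/v1"
HEADERS = {"Metadata-Flavor": "Google"}  # required header, or the request is rejected

# read a project-level custom metadata value shared across instances
resp = requests.get(f"{METADATA_URL}/project/attributes/backend-url", headers=HEADERS)
print(resp.text)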
- Writing an application that publishes/consumes from Cloud Pub/Sub: Use the client SDK to publish and consume; you can also hit the HTTPS endpoints to access the same functionality. In addition to reaching the queues with an SDK and a service account, Pub/Sub can also push to an endpoint you expose, complete with authentication, for easy, secure access.
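A minimal sketch of publishing and consuming with the Python client library (the project, topic, and subscription names are assumptions):

from concurrent.futures import TimeoutError
from google.cloud import pubsub_v1

project_id = "my-project"

# publish
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(project_id, "myTopic")
publisher.publish(topic_path, b"hello", origin="my-app").result()  # attributes are optional

# consume via streaming pull
subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(project_id, "mySubscription")

def callback(message):
    print(message.data)
    message.ack()

streaming_pull_future = subscriber.subscribe(subscription_path, callback=callback)
with subscriber:
    try:
        streaming_pull_future.result(timeout=30)  # process messages for a while
    except TimeoutError:
        streaming_pull_future.cancel()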
- Reading instance metadata to obtain application configuration: Instances (VMs as well as GAE instances) can be assigned labels (as opposed to network tags): key-value pairs that describe the instance. By labeling instances, you can use the gcloud tool to filter and sort for specific instances. Instance metadata usually includes information about the infrastructure (IP, machine type, applicable service accounts, etc.). You can also add custom metadata as key-value pairs; startup and shutdown scripts are attached this way. Application configuration would have to be added as custom metadata, which could then be queried after provisioning and used for maintenance, governance, and other tasks. (I couldn't find any info on app configuration in metadata.)
- Authenticating users by using OAuth2 Web Flow and Identity Aware Proxy:
- OAuth: A widely used authorization framework; see the docs and the detailed walkthrough for configuring OAuth in GCP.
- IAP: Identity-Aware Proxy is a GCP service that uses identity and context to control sign-in to apps and VMs. See the tutorial. You can allow users access by whitelisting them, add conditions based on location, and grant access (such as admin rights) depending on the URL path. Check the quick demo.
- Using the CLI tools: Same info as previous section.
- Configuring Compute services network settings (e.g., subnet, firewall ingress/egress, public/private IPs): Subnets and firewall rules have already been covered above. All Compute instances have an internal IP that allows resources within the network to reach them. Public IPs are optional and can be removed; they allow access from outside the network. Multiple IPs can be added by attaching additional virtual NICs.
4.3 Integrating Google Cloud APIs with applications.
- Enabling a GCP API: APIs must be enabled on a per service basis on each project. You can do so in the GUI or programmatically.
- Using pre-trained Google ML APIs: Google provides a set of ML-as-a-service APIs for natural language, image recognition, translation, and speech-to-text. You can use these APIs via the client SDKs or via their REST endpoints. (You can also use the GUI, but that doesn't integrate into an application.)
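For example, a sketch of calling the Natural Language API with a recent version of the Python client library (the text is arbitrary):

from google.cloud import language_v1

client = language_v1.LanguageServiceClient()
document = language_v1.Document(
    content="GCP makes this surprisingly easy.",
    type_=language_v1.Document.Type.PLAIN_TEXT,
)
sentiment = client.analyze_sentiment(request={"document": document}).document_sentiment
print(sentiment.score, sentiment.magnitude)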
- Making API calls with a Cloud Client Library, the REST API, or the APIs Explorer, taking into consideration:
- batching requests: You can batch API calls to send multiple requests at once. This is useful for sending requests that were cached while offline, or for new APIs where there is a lot of data to upload.
- restricting return data: Since there are limits and quotas on the number of calls you can make, calls should be carefully constructed. Calls returning large amounts of data (for instance, all files in a bucket where logs are stored) can also cause performance issues. Ensure that calls contain filters to get only the data needed; when calling for info about resources, filter with tools like labels to return just the resources needed. (I couldn't find any specific tools or guides on how to execute this; defer to the specific guide for the tool you are using.)
- paginating results: When a large collection of results needs to be returned, pagination can be an effective way to prevent overloading systems. Pagination is the practice of splitting a list of results into groups (or pages) and returning results a 'page' at a time. The REST APIs generally implement this with pageToken/nextPageToken parameters, and the client libraries wrap it in iterators (see the sketch below).
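For example, a sketch of listing Cloud Storage objects one page at a time with the Python client (the bucket name and page size are assumptions):

from google.cloud import storage

client = storage.Client()
# the iterator fetches results lazily, 100 per underlying API call
blobs = client.list_blobs("my-example-bucket", page_size=100)
for page in blobs.pages:
    print("processing one page of results")
    for blob in page:
        print(blob.name)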
- caching results: When designing applications that will poll relatively static information, caching results and holding them for a designated TTL can be a performance improving and cost saving design pattern. Make sure to balance how quickly the data will become stale with how much data would need to be queried and how great the performance impact on the system would be. Caching can also be an effective design choice for applications that need to function with poor internet connections.
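A minimal in-memory TTL cache sketch (nothing GCP-specific; the key, fetch function, and TTL are whatever suits the app):

import time

_cache = {}

def cached(key, fetch, ttl_seconds=300):
    """Return a cached value if it is younger than ttl_seconds, otherwise re-fetch it."""
    entry = _cache.get(key)
    if entry and time.time() - entry[0] < ttl_seconds:
        return entry[1]
    value = fetch()
    _cache[key] = (time.time(), value)
    return value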
- Using service accounts to make Google API calls: Service accounts allow services and applications to call GCP resources. Create the service account, make sure it has rights to make whatever calls the app should. The service account can be assigned to GCP resources within the context of GCP (console/command line). For external services, the service account has a key that can be installed and used to authenticate API calls.
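A minimal sketch of authenticating an external app with a downloaded service account key file (the key path, project, and bucket listing are placeholders):

from google.oauth2 import service_account
from google.cloud import storage

credentials = service_account.Credentials.from_service_account_file("sa-key.json")
client = storage.Client(project="my-project", credentials=credentials)

for bucket in client.list_buckets():
    print(bucket.name)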
- Using APIs to read/write to data services (BigQuery, Cloud Spanner): For BigQuery you can make GET and POST requests to access datasets. For Cloud Spanner, the API has an ExecuteSql method that takes a query, runs it, and returns the result.
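For instance, a sketch of calling the BigQuery REST API (jobs.query) directly with an authorized session; the query is trivial, and the response handling assumes the job finishes synchronously:

import google.auth
from google.auth.transport.requests import AuthorizedSession

credentials, project = google.auth.default(scopes=["https://www.googleapis.com/auth/bigquery"])
session = AuthorizedSession(credentials)

resp = session.post(
    f"https://bigquery.googleapis.com/bigquery/v2/projects/{project}/queries",
    json={"query": "SELECT 1 AS x", "useLegacySql": False},
)
print(resp.json().get("rows"))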
- Using the Cloud SDK to perform basic tasks: Basic tasks will vary depending on role. However, gcloud syntax is usually something like gcloud <service> <method> <submethod> --<flags>.