Introduction
Log-based discovery is an easy-to-deploy, low-cost, and simple-to-manage way for the Ghost Platform to discover APIs and API Endpoints in your cloud environment in just a few minutes. It processes replicas of the AWS Load Balancer access logs from the Load Balancers in your account and feeds the results into the Ghost Platform to populate your API and API Endpoint inventories.
System Overview
Supported Sources
The following access log-based sources are currently supported:
Planned
API Gateways (REST and HTTP)
AWS WAFv2 Logs
AWS CloudFront Distribution Logs
Architecture
The Ghost Log Based processing stack consists of two S3 buckets and one AWS Lambda log processing function. Not pictured are several IAM Roles that Ghost needs to monitor and update the stack. The detailed resource breakdown is available in the Terraform module.
Ghost Input Bucket (S3 bucket) - Accepts Load Balancer logs replicated to it via S3 bucket replication rules from an existing customer Load Balancer logging bucket.
Ghost Log Processing Function (Lambda) - Processes log files landing in the Ghost Input Bucket and submits the sanitized and reduced data to the Ghost API.
Deployment Guide
The Ghost Log Based processing stack can be deployed in the following regions:
af-south-1, ap-east-1, ap-northeast-1, ap-northeast-2, ap-northeast-3, ap-south-1, ap-south-2, ap-southeast-1, ap-southeast-2, ap-southeast-3, ca-central-1, eu-central-1, eu-central-2, eu-north-1, eu-south-1, eu-south-2, eu-west-1, eu-west-2, eu-west-3, me-central-1, me-south-1, sa-east-1, us-east-1, us-east-2, us-west-1, us-west-2.
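Before running Terraform, it can help to confirm that the target region appears in the list above. The following is a minimal sketch only: the names GHOST_SUPPORTED_REGIONS and is_supported_region are hypothetical and are not part of the Terraform module. The region list is copied from this page.

```shell
# Hypothetical pre-flight check; the region list is copied from this page.
GHOST_SUPPORTED_REGIONS="af-south-1 ap-east-1 ap-northeast-1 ap-northeast-2 \
ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2 \
ap-southeast-3 ca-central-1 eu-central-1 eu-central-2 eu-north-1 eu-south-1 \
eu-south-2 eu-west-1 eu-west-2 eu-west-3 me-central-1 me-south-1 sa-east-1 \
us-east-1 us-east-2 us-west-1 us-west-2"

# Returns 0 if region $1 is in the list, 1 otherwise.
is_supported_region() {
  [ -n "$1" ] || return 1
  case " $GHOST_SUPPORTED_REGIONS " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example:
# is_supported_region eu-west-1 && echo "region is supported"
```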
The foundational buckets and Lambda function are deployed into a single region of a single account using Terraform. We provide a Terraform module that simplifies deploying the necessary resources.
The following diagram depicts two deployments in the same Account - one per Region.
Installation Steps
1. Navigate to the API Keys page.
2. Generate a new API key with "Write Logs" permissions by pressing the "Create API Key" button in the top right of the page.
3. After pressing the "Create API Key" button, copy the value that is displayed in the UI. The Lambda function needs this key to authenticate with the Ghost API.
4. Create a new secret in AWS Secrets Manager in the account where you intend to deploy the forwarder. Use a Plaintext secret whose value is the API key created in the previous step. For example, set the value to gho_Ym... rather than {"key":"gho_Ym..."}. Once the secret is created, note its ARN for the next step.
5. Follow this example in the log forwarder module documentation to deploy the resources in your AWS account.
The Ghost Log processing stack is now deployed. If S3 Bucket Replication was not already enabled through Terraform, proceed to the Configuration Guide section to enable it so that logs flow into this stack and on to the Ghost Platform.
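Step 4 of the installation can also be done from the AWS CLI. The following is a minimal sketch rather than the documented procedure: the function name create_ghost_secret and the example secret name are hypothetical. It stores the API key as a plain string, not wrapped in JSON, and prints the ARN needed in step 5.

```shell
# Hypothetical helper: stores the Ghost API key as a plaintext secret
# and prints the ARN of the new secret.
#   $1 = secret name (any name you choose)
#   $2 = the API key copied from the Ghost UI
create_ghost_secret() {
  secret_name="$1"
  api_key="$2"
  # --secret-string receives the raw key, not a {"key":"..."} JSON wrapper.
  aws secretsmanager create-secret \
    --name "$secret_name" \
    --secret-string "$api_key" \
    --query ARN \
    --output text
}

# Example (placeholder values):
# create_ghost_secret ghost/log-forwarder-api-key "gho_Ym..."
```

Note that passing the key as a command-line argument can leave it in your shell history, so the Secrets Manager console may be preferable on shared machines.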
Configuration Guide
Enable Load Balancer Logging
In any region with a supported log discovery source, determine if the log source is already logging to an S3 bucket. For a Load Balancer that is not yet configured for logging to an S3 Bucket, enable access logging by following the official AWS documentation first. Make note of this bucket name.
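The check described above can be scripted with the AWS CLI. This is a sketch under assumptions: it applies to Application and Network Load Balancers managed through the elbv2 API, and the function names are hypothetical. The logging bucket also needs a bucket policy that allows the load balancer to write to it, which the official AWS documentation covers.

```shell
# Hypothetical helpers; both take the Load Balancer ARN as $1.

# Prints the current access-log attributes (enabled flag, bucket, prefix).
show_lb_access_log_settings() {
  aws elbv2 describe-load-balancer-attributes \
    --load-balancer-arn "$1" \
    --query "Attributes[?starts_with(Key, 'access_logs.s3.')]" \
    --output table
}

# Turns on access logging to the S3 bucket named in $2.
enable_lb_access_logs() {
  aws elbv2 modify-load-balancer-attributes \
    --load-balancer-arn "$1" \
    --attributes Key=access_logs.s3.enabled,Value=true \
                 Key=access_logs.s3.bucket,Value="$2"
}

# Example (placeholder values):
# show_lb_access_log_settings "$LB_ARN"
# enable_lb_access_logs "$LB_ARN" YOUR_LOG_BUCKET
```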
Enable Bucket Replication
1. Now that Load Balancer logs are being sent to your own logging S3 bucket, enable versioning on that bucket:
aws s3api put-bucket-versioning --bucket YOUR_LOG_BUCKET --versioning-configuration Status=Enabled
Note: replace YOUR_LOG_BUCKET with your actual bucket name.
2. Next, configure a bucket replication rule from YOUR_LOG_BUCKET to the GHOST_INPUT_BUCKET:
aws s3api put-bucket-replication --bucket YOUR_LOG_BUCKET --replication-configuration '{
  "Role": "arn:aws:iam::YOUR_ACCOUNT_ID:role/S3ReplicationRole",
  "Rules": [
    {
      "ID": "ReplicateALBLogsToGhostInput",
      "Prefix": "",
      "Status": "Enabled",
      "Destination": {
        "Bucket": "arn:aws:s3:::GHOST_INPUT_BUCKET"
      },
      "ExistingObjectReplication": {
        "Status": "Disabled"
      }
    }
  ]
}'
Note: replace YOUR_LOG_BUCKET with your actual bucket name, YOUR_ACCOUNT_ID with your actual AWS account number, and GHOST_INPUT_BUCKET with the bucket name from the outputs of the CloudFormation Stack.
We recommend against replicating Load Balancer logs older than 24 hours, whether automatically via sync or manually. The Lambda function evaluates such logs only to discard them, so they add processing cost and are never processed by the Ghost Platform.
3. After 5 minutes, confirm that logs are being replicated and delivered to the GHOST_INPUT_BUCKET:
aws s3 ls s3://GHOST_INPUT_BUCKET
4. To verify that logs are flowing to the platform, navigate to the API Keys page. The "Last Seen" column for the API key you created earlier should show a recent timestamp. Assuming the Load Balancers you configured to send logs to the forwarder are receiving API traffic, you should also see new Endpoints and APIs being populated in the platform.
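If logs do not appear in the GHOST_INPUT_BUCKET, the replication setup from the steps above can be inspected from the CLI. A minimal sketch, with hypothetical function names, that reads back the versioning status, the configured replication rules, and the replication status of a single log object:

```shell
# Hypothetical helpers for checking the replication setup.

# Prints the versioning status of bucket $1; replication requires "Enabled".
show_bucket_versioning() {
  aws s3api get-bucket-versioning --bucket "$1" --query Status --output text
}

# Prints the ID, status, and destination of each replication rule on bucket $1.
show_replication_rules() {
  aws s3api get-bucket-replication --bucket "$1" \
    --query "ReplicationConfiguration.Rules[].[ID,Status,Destination.Bucket]" \
    --output text
}

# Prints the replication status of one object: $1 = bucket, $2 = object key.
show_object_replication_status() {
  aws s3api head-object --bucket "$1" --key "$2" \
    --query ReplicationStatus --output text
}

# Example (placeholder values):
# show_bucket_versioning YOUR_LOG_BUCKET
# show_replication_rules YOUR_LOG_BUCKET
```

On the source bucket, an object's replication status is reported as PENDING, COMPLETED, or FAILED; a FAILED status usually points to a permissions problem on the replication role or the destination bucket.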
Uninstallation Guide
If you configured bucket replication using Terraform, you can skip steps 1 and 2, which remove the bucket replication rules.
1. Note the name of the GHOST_INPUT_BUCKET.
2. Navigate to the S3 console and find the Load Balancer logging bucket. Under the Management tab, delete any replication rules that are configured to send copies of log files to the GHOST_INPUT_BUCKET. Repeat this step for all Load Balancer log buckets that have replication configured to replicate log files to the GHOST_INPUT_BUCKET.
3. Run terraform destroy to remove the log forwarder module.
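The replication rules can also be located from the AWS CLI instead of the console. This is a sketch with hypothetical function names. Be aware that delete-bucket-replication removes the entire replication configuration of a bucket, including rules that point at destinations other than the GHOST_INPUT_BUCKET, so use it only on buckets whose sole replication target is Ghost; otherwise delete the individual rule in the console.

```shell
# Hypothetical helpers for the uninstallation steps.

# Prints the IDs of replication rules on bucket $1 that target bucket $2.
list_replication_rules_to_ghost() {
  aws s3api get-bucket-replication --bucket "$1" \
    --query "ReplicationConfiguration.Rules[?Destination.Bucket=='arn:aws:s3:::$2'].ID" \
    --output text
}

# Removes ALL replication rules from bucket $1, not only the Ghost rule.
remove_all_replication() {
  aws s3api delete-bucket-replication --bucket "$1"
}

# Example (placeholder values):
# list_replication_rules_to_ghost YOUR_LOG_BUCKET GHOST_INPUT_BUCKET
```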
Support
Frequently Asked Questions
What is the cost of the solution?
The primary points where cloud provider costs are incurred are:
Load Balancer Logs - Charged for log storage per AWS S3 pricing of logs delivered to the customer-owned logging bucket. In many organizations, this is already in place.
For bucket replication from the customer Load Balancer log bucket to the Ghost Input Bucket within the same AWS Region, there is no AWS transit cost. For bucket replication that crosses AWS Regions, see the Data Transfer OUT From Amazon S3 To section of the pricing page. In most cases, this is $0.023/GB.
Replicated Logs in the Ghost Input Bucket - Normal S3 storage pricing that will not exceed one day's worth of Load Balancer logs as the Ghost Input Bucket has a 24hr lifecycle deletion policy.
The idle cost of the Ghost-deployed infrastructure that is processing no logs is near zero. Log processing by the Go-based Lambda function and networking costs will vary based on the amount of Load Balancer log files and log lines sent to the Ghost Input Bucket. By default, the AWS logging service writes gzipped log files every 5 minutes for every Load Balancer with access logging configured.
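For a rough sense of scale, the figures above can be turned into a quick estimate. The sketch below uses hypothetical function names. It defaults to the $0.023/GB cross-region rate quoted above and assumes one log file per Load Balancer every 5 minutes; check the AWS pricing page for the rate that applies to your regions, and treat the file count as an approximation.

```shell
# Hypothetical estimators based on the figures quoted in this FAQ.

# Estimated cross-region replication transfer cost in USD.
#   $1 = GB of logs replicated per day
#   $2 = price per GB (default 0.023, the rate quoted above)
#   $3 = number of days (default 30)
estimate_crr_transfer_cost() {
  awk -v gb="$1" -v price="${2:-0.023}" -v days="${3:-30}" \
    'BEGIN { printf "%.2f\n", gb * price * days }'
}

# Approximate number of log files written per day for $1 Load Balancers,
# assuming one file per Load Balancer every 5 minutes.
log_files_per_day() {
  echo $(( $1 * 24 * 60 / 5 ))
}

# Example:
# estimate_crr_transfer_cost 10    # 10 GB/day for 30 days at the default rate
# log_files_per_day 3              # 3 Load Balancers
```

With these defaults, 10 GB of logs per day replicated across regions comes to about $6.90 over 30 days. Same-region replication has no transfer cost, so this estimate applies only to cross-region setups.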
Why require S3 bucket replication to send the logs to Ghost?
The Ghost log forwarder is designed to receive a replicated copy of the log files in a dedicated, Ghost-controlled "input" bucket in the same region so that:
There is no disruption to or overlap with the event notification configuration of existing log processing pipelines.
The customer has full control over which load balancer logs are to be sent to Ghost by configuring S3 bucket replication rules.
The Ghost Log Forwarding processes can safely delete the logs replicated to the Ghost "input" bucket after they are processed as they are not the primary system of record.
What about Cross-Region Replication (CRR)?
It is possible to replicate source logs across regions into a single, centralized input bucket. Many organizations find that cross-region replication costs outweigh the benefit of centralized log storage. That said, if cross-region log replication is already in place to facilitate other use cases, Ghost can take advantage of that configuration with either deployment approach. If logs are in a single Account and Region, leverage the targeted deployment method. If logs are in a small number of Accounts and Regions, use the Organization-wide approach and configure the CloudFormation Stack Set deployment Accounts and Regions to include only those that are involved in log aggregation.