<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://engseclabs.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://engseclabs.com/" rel="alternate" type="text/html" /><updated>2026-05-09T05:17:54+00:00</updated><id>https://engseclabs.com/feed.xml</id><title type="html">EngSecLabs</title><subtitle>Practical security programs for early to growth-stage B2B SaaS companies. Real risk reduction, not compliance theater.</subtitle><entry><title type="html">Credential isolation and least privilege for AWS agents</title><link href="https://engseclabs.com/blog/iam-agent-proxy/" rel="alternate" type="text/html" title="Credential isolation and least privilege for AWS agents" /><published>2026-05-08T00:00:00+00:00</published><updated>2026-05-08T00:00:00+00:00</updated><id>https://engseclabs.com/blog/iam-agent-proxy</id><content type="html" xml:base="https://engseclabs.com/blog/iam-agent-proxy/"><![CDATA[<p>Two problems come up every time you give an AI agent AWS access: the agent has exfiltratable credentials, and you have to guess what permissions it needs in the form of an IAM policy.</p>

<p><a href="https://github.com/engseclabs/iam-agent-proxy">iam-agent-proxy</a> is an HTTPS proxy for AWS CLI/SDK calls that validates requests using fake AWS keys and re-signs with real credentials. And because the proxy intercepts every request, it can resolve each one to an IAM action string, generate, and even enforce a least-privilege policy from what the agent actually called.</p>

<h2 id="getting-started">Getting started</h2>

<p>Start the proxy with whatever AWS profile has the permissions your agent needs:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">AWS_PROFILE</span><span class="o">=</span>my-real-profile iam-agent-proxy
</code></pre></div></div>

<p>In a second terminal, point the agent at it:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">export </span><span class="nv">AWS_PROFILE</span><span class="o">=</span>iam-agent-proxy
<span class="nb">export </span><span class="nv">HTTPS_PROXY</span><span class="o">=</span>http://localhost:8080
</code></pre></div></div>

<p>The agent gets proxy-issued fake keys — no IAM identity behind them:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"Version"</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w">
  </span><span class="nl">"AccessKeyId"</span><span class="p">:</span><span class="w"> </span><span class="s2">"AKIAPROXY0000000001"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"SecretAccessKey"</span><span class="p">:</span><span class="w"> </span><span class="s2">"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"Expiration"</span><span class="p">:</span><span class="w"> </span><span class="s2">"2026-05-08T15:00:00Z"</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>Make some AWS calls:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>aws sts get-caller-identity
aws s3 <span class="nb">ls</span>
</code></pre></div></div>

<p>The proxy terminal logs each resolved action:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[14:32:01] ALLOWED  sts:GetCallerIdentity
[14:32:09] ALLOWED  s3:ListAllMyBuckets
</code></pre></div></div>

<p>Run the agent through a representative workload, then extract the observed policy:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>iam-agent-proxy policy
</code></pre></div></div>

<p>That emits standard IAM policy JSON you can use as an inline policy or session policy. Set <code class="language-plaintext highlighter-rouge">PROXY_MODE=enforce</code>, point <code class="language-plaintext highlighter-rouge">ALLOWLIST_PATH</code> at that file, and the proxy starts blocking anything outside it, returning a well-formed <code class="language-plaintext highlighter-rouge">AccessDenied</code> 403 so the agent’s error handling works as designed.</p>
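
<p>Put together, the observe-then-enforce loop looks roughly like this (a sketch, assuming the <code class="language-plaintext highlighter-rouge">policy</code> subcommand writes the JSON to stdout; the file name is arbitrary):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># 1. Observe: run the agent through a representative workload, then capture the policy
iam-agent-proxy policy &gt; observed-policy.json

# 2. Enforce: restart the proxy with the observed policy as the allowlist
PROXY_MODE=enforce \
ALLOWLIST_PATH=./observed-policy.json \
AWS_PROFILE=my-real-profile \
iam-agent-proxy
</code></pre></div></div>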

<p>The workflow inverts the usual least-privilege approach: instead of guessing what the agent needs before it runs, you observe what it actually does and lock in that baseline.</p>

<p>Check it out at <a href="https://github.com/engseclabs/iam-agent-proxy">github.com/engseclabs/iam-agent-proxy</a>. If you’re building in this space or hit a case it doesn’t cover, reach out on <a href="https://www.linkedin.com/in/alex-smolen-8a59a31">LinkedIn</a> or <a href="https://infosec.exchange/@alsmola">Mastodon</a>.</p>]]></content><author><name>Alex Smolen</name></author><category term="aws," /><category term="iam," /><category term="ai-agents," /><category term="security," /><category term="credential-isolation" /><summary type="html"><![CDATA[A proxy that holds real AWS credentials, gives agents fake keys, re-signs outbound requests, and generates a least-privilege policy from observed behavior.]]></summary></entry><entry><title type="html">AWS Credential Isolation for Local AI Agents</title><link href="https://engseclabs.com/blog/agent-credential-isolation/" rel="alternate" type="text/html" title="AWS Credential Isolation for Local AI Agents" /><published>2026-04-27T00:00:00+00:00</published><updated>2026-04-27T00:00:00+00:00</updated><id>https://engseclabs.com/blog/agent-credential-isolation</id><content type="html" xml:base="https://engseclabs.com/blog/agent-credential-isolation/"><![CDATA[<p>If you run local agents, you need to make tough choices between autonomy and safety. Setting <code class="language-plaintext highlighter-rouge">dangerously-skip-permissions</code> while sword fighting on desk chairs and letting the tokens burn bright is an out-of-reach dream when you’re forced to babysit file access and “can I use the internet for this?” requests. But the impact of an agent doing something stupid needs to be pretty low for humans to check out of the loop.</p>

<p><img src="/assets/images/agent-credential-isolation.png" alt="Agent credential isolation diagram" /></p>

<p>I’ve been interacting with AWS from my local coding agents and ran into a source of flummoxation. If you’ve got even a moderately complicated AWS IAM setup, profile management gets gnarly. The problem has three parts:</p>

<ul>
  <li>The agent should only have access to a specific AWS identity, not whatever creds happen to be in your shell</li>
  <li>That identity should use short-lived credentials that refresh automatically, not static keys you rotate manually</li>
  <li>And (ideally) the permissions on that identity are scoped to what the agent actually needs for the task at hand</li>
</ul>

<p>Personal opinion, but <a href="https://github.com/engseclabs/trailtool">trailtool</a> is an awesome way to handle the third point - see my <a href="https://engseclabs.com/blog/cloudtrail-for-ai-agents#defining-least-privilege-iam-policies-for-roles">blog post</a> for how TrailTool can generate least privilege IAM roles from observed CloudTrail activity. For the first two, I ran into a tool called <a href="https://github.com/61418/elhaz">elhaz</a> that solves the credentials part nicely. By tying them together, you can get a sandboxed agent scoped to a specific AWS identity, with automatically refreshing short-lived credentials, locked down to least privilege.</p>

<h2 id="elhaz">elhaz</h2>

<p>Elhaz is a local credential broker daemon that manages in-memory AWS STS credentials and serves them over a Unix socket at <code class="language-plaintext highlighter-rouge">~/.elhaz/sock/daemon.sock</code>. A core concept is a <em>config</em> - a saved set of parameters for assuming an AWS role (the role ARN, session configuration, etc.). When you run <code class="language-plaintext highlighter-rouge">elhaz daemon add -n my-config</code> you’re telling the daemon to actually assume that role using your active AWS cred chain and start managing the session. From that point the daemon holds live, automatically refreshing credentials for that config in memory.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Create a config for the role you want the agent to assume</span>
elhaz config add

<span class="c"># Start the daemon and initialize the session</span>
elhaz daemon start
elhaz daemon add <span class="nt">-n</span> my-agent-role

<span class="c"># Verify it's working</span>
elhaz <span class="nb">whoami</span> <span class="nt">-n</span> my-agent-role
</code></pre></div></div>

<p>A very cool aspect of elhaz is that it exposes the assumed role credentials via Unix socket IPC rather than through the standard env var or filesystem approaches that underlie the AWS credential chain. That’s what makes the sandboxing story clean.</p>

<h3 id="sandboxing-aws-credentials-in-a-container">Sandboxing AWS credentials in a container</h3>

<p>Agent sandboxing is moving fast and anything I say about specific tools will get stale quickly. There’s a whole ecosystem here: <a href="https://simonwillison.net/2025/Oct/22/living-dangerously-with-claude/">sandbox-exec</a>, <a href="https://github.com/containers/bubblewrap">Bubblewrap</a>, <a href="https://github.com/Use-Tusk/fence">fence</a>, <a href="https://developers.openai.com/codex/concepts/sandboxing">Codex</a> and <a href="https://code.claude.com/docs/en/sandboxing">Claude</a> sandboxes, and more landing constantly. The point is that these mechanisms restrict agent access to resources using OS-level primitives. The credential delivery mechanism needs to fit those affordances, and it doesn’t seem like env vars, files, and ports fit the bill. In my research, I’ve been using Docker for agent isolation, and what follows is what I found, listed in increasing order of how well each approach actually holds up.</p>

<p><em>Note</em>: this assumes you’re using ephemeral, role-based credentials throughout. IAM user access keys are static, long-lived, and <a href="https://aws.amazon.com/blogs/security/practical-steps-to-minimize-key-exposure-using-aws-security-services/">the top entry point for attackers</a> when they leak.</p>

<h3 id="environment-variables">Environment variables</h3>

<p>When you need isolated creds, you might consider reaching for <code class="language-plaintext highlighter-rouge">aws-vault exec</code> or <code class="language-plaintext highlighter-rouge">aws configure export-credentials</code> (<code class="language-plaintext highlighter-rouge">elhaz</code> also supports <code class="language-plaintext highlighter-rouge">export</code>), capturing the output, and injecting it into the container. It works for a one-off, but those are snapshot credentials. STS session tokens typically expire in an hour, at which point your long-running agent session breaks and you’re back to manually refreshing and re-injecting. You may be able to wire something like this up where the container stays updated over time, but it’s not a hands-off approach.</p>
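
<p>As a concrete sketch of the snapshot approach (and why it decays), something like this works until the STS session expires; the profile name and image are placeholders, and <code class="language-plaintext highlighter-rouge">export-credentials</code> needs AWS CLI v2:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Snapshot the current session creds from a host profile into env vars
eval "$(aws configure export-credentials --profile my-agent-role --format env)"

# Pass the snapshot into the container as plain environment variables
docker run --rm \
  -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY -e AWS_SESSION_TOKEN \
  -e AWS_DEFAULT_REGION=us-east-1 \
  amazon/aws-cli sts get-caller-identity

# Works now; breaks when the session token expires and nothing refreshes it
</code></pre></div></div>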

<h3 id="credential-files">Credential files</h3>

<p>Mounting <code class="language-plaintext highlighter-rouge">~/.aws/</code> into the container is a step up since the SDK can re-read profiles and SSO tokens directly. But it gives the agent access to every profile in that directory, not just the one you intended. And SSO tokens cached on the host are not refreshable from inside the container without a browser. In a multi-agent setup the access boundary is too coarse: you can’t easily say “Agent A gets dev access and Agent B gets prod access” without a lot of manual wiring.</p>
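
<p>For reference, the coarse version is a single bind mount; every profile and cached SSO token in the directory rides along (profile name and image are placeholders):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Mount the whole AWS config/credentials directory, read-only
docker run --rm \
  -v ~/.aws:/root/.aws:ro \
  -e AWS_PROFILE=dev \
  amazon/aws-cli sts get-caller-identity

# The agent can switch to any other profile in ~/.aws/config,
# and cached SSO tokens can't be refreshed without a browser on the host
</code></pre></div></div>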

<p>One note: Claude Code has a <a href="https://code.claude.com/docs/en/sandboxing">native sandbox</a> worth understanding here, because it does not solve the credential isolation problem. The sandbox restricts writes to an allowlist of paths and filters outbound HTTP through a proxy. Reads are unrestricted by default. That means Claude can read your entire home directory out of the box, including <code class="language-plaintext highlighter-rouge">~/.aws/credentials</code> etc. You can add <code class="language-plaintext highlighter-rouge">~/.aws</code> to a deny list, but it doesn’t hold: the deny list applies to Claude’s built-in file read tool, not bash subprocesses. Running <code class="language-plaintext highlighter-rouge">cat ~/.aws/credentials</code> via Bash succeeds regardless.</p>

<h3 id="metadata-emulation">Metadata emulation</h3>

<p>The more sophisticated approach is emulating the <a href="https://www.wiz.io/blog/the-many-ways-to-obtain-credentials-in-aws">AWS instance metadata service</a> locally. This is how ECS delivers credentials to tasks in production: the SDK makes HTTP requests to a well-known local endpoint and gets fresh credentials back on demand. Tools like aws-vault replicate this locally via <code class="language-plaintext highlighter-rouge">--server</code> mode, pointing <code class="language-plaintext highlighter-rouge">AWS_CONTAINER_CREDENTIALS_FULL_URI</code> at a local HTTP server that vends credentials.</p>
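
<p>The mechanism itself is just an environment variable pointing at a local HTTP endpoint that the SDK polls for fresh credentials. A sketch of the shape (the port, path, and the server behind them are placeholders; tools like aws-vault wire this up for you):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># The SDK's container credential provider fetches creds from this endpoint on demand
export AWS_CONTAINER_CREDENTIALS_FULL_URI=http://127.0.0.1:9099/creds

# The endpoint returns ECS-style JSON: AccessKeyId, SecretAccessKey, Token, Expiration
curl -s "$AWS_CONTAINER_CREDENTIALS_FULL_URI"

# Any SDK/CLI call in this environment now pulls (and re-pulls) creds from that endpoint
aws sts get-caller-identity
</code></pre></div></div>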

<p>The problem is that this doesn’t work on macOS, and the workarounds are dead ends. Docker Desktop runs the engine inside a Linux VM, so <code class="language-plaintext highlighter-rouge">--network host</code> reaches the VM’s loopback, not the Mac’s (<a href="https://docs.docker.com/engine/network/drivers/host/">Docker host networking doesn’t work on macOS</a>). And even if you could route it, the AWS SDK has a hardcoded allowlist for HTTP credential endpoints: only <code class="language-plaintext highlighter-rouge">localhost</code>, <code class="language-plaintext highlighter-rouge">127.0.0.1</code>, and the ECS/EKS metadata IPs are permitted. Adding <code class="language-plaintext highlighter-rouge">host.docker.internal</code> was <a href="https://github.com/aws/aws-sdk/issues/562">requested</a> and closed as “not planned.” The aws-vault issue tracker has <a href="https://github.com/99designs/aws-vault/issues/767">an open thread</a> on this that has been sitting unresolved for years.</p>

<h3 id="unix-sockets">Unix sockets</h3>

<p>Unix sockets are a convenient way to express dynamic AWS credential access in terms of filesystem access. If you don’t explicitly mount a socket into a container, that socket doesn’t exist in that container’s universe. Run two elhaz configs, get two socket files, and the volume mount itself becomes the authorization decision. Agent B cannot reach Agent A’s credentials because the socket paths are partitioned.</p>

<p>Here’s what it looks like in practice:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># On the host: start the daemon and assume the role</span>
elhaz daemon start
elhaz daemon add <span class="nt">-n</span> my-agent-role

<span class="c"># Run a container with only that socket mounted</span>
docker run <span class="nt">--rm</span> <span class="se">\</span>
  <span class="nt">-v</span> ~/.elhaz/sock/daemon.sock:/tmp/elhaz.sock <span class="se">\</span>
  <span class="nt">-e</span> <span class="nv">AWS_DEFAULT_REGION</span><span class="o">=</span>us-east-1 <span class="se">\</span>
  <span class="nt">-e</span> <span class="nv">AWS_PROFILE</span><span class="o">=</span>elhaz <span class="se">\</span>
  python:3.12-slim <span class="se">\</span>
  bash <span class="nt">-c</span> <span class="s1">'
pip install elhaz awscli -q &amp;&amp;
mkdir -p ~/.aws &amp;&amp;
cat &gt; ~/.aws/config &lt;&lt;EOF
[profile elhaz]
credential_process = elhaz --socket-path /tmp/elhaz.sock export --format credential-process -n my-agent-role
region = us-east-1
EOF
aws sts get-caller-identity
'</span>
</code></pre></div></div>

<p>The container never touches your <code class="language-plaintext highlighter-rouge">~/.aws</code> directory or environment variables. Credentials are scoped to a single named config, automatically refreshed by the daemon, and never written to disk inside the container. The same pattern likely works with non-Docker sandboxing approaches since Unix socket access is controlled by filesystem permissions, a primitive that most sandboxing tools expose in some form (swapping Docker for Bubblewrap, testing on different Linux platforms, etc. is left as an exercise to the reader).</p>]]></content><author><name>Alex Smolen</name></author><summary type="html"><![CDATA[How to give a local coding agent exactly the AWS access it needs, nothing more, using elhaz]]></summary></entry><entry><title type="html">TrailTool: CloudTrail for AI Agents</title><link href="https://engseclabs.com/blog/cloudtrail-for-ai-agents/" rel="alternate" type="text/html" title="TrailTool: CloudTrail for AI Agents" /><published>2026-03-23T00:00:00+00:00</published><updated>2026-03-23T00:00:00+00:00</updated><id>https://engseclabs.com/blog/cloudtrail-for-ai-agents</id><content type="html" xml:base="https://engseclabs.com/blog/cloudtrail-for-ai-agents/"><![CDATA[<p>Running security for AWS-centric companies means getting down and dirty with CloudTrail. Not only will you crawl the logs with SIEMs to “find the baddies” via IoCs; as a proactive engineering-focused security team, you’ll rely on them to implement access control, validate changes, and debug problems.</p>

<p>In the agentic AI era<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> these tasks can be carried out by Claude et al., but CloudTrail logs are hard to synthesize. Answering “did <code class="language-plaintext highlighter-rouge">contractor@company.com</code> update this S3 bucket in the last 30 days?” could mean sifting through terabytes of logs, toiling with custom queries, and correlating role assumptions by hand. You can connect agents with <a href="https://aws.amazon.com/about-aws/whats-new/2025/09/aws-cloudtrail-mcp-server-enhanced-security-analysis/">MCP tooling</a> and build skills to standardize query patterns, but it’s a fair amount of non-trivial configuration and undifferentiated heavy lifting. Every query (especially loops where the agent needs to learn/fix the syntax) wastes time and tokens while bloating the context window.</p>

<p>Enter <a href="https://github.com/engseclabs/trailtool">TrailTool</a>. The big idea is to process (Lambda) and cache (DynamoDB) CloudTrail based on access patterns, grouping events into entities (People, Sessions, Roles, Services, Resources). When you want to ask common questions (<em>what has this role accessed?</em>, <em>who accessed this resource?</em>, etc.) you get quick, trustable answers (<code class="language-plaintext highlighter-rouge">trailtool roles detail &lt;RoleName&gt;</code>). TrailTool is open source - deploy the Ingestor Lambda via SAM and your agent (or you) can query with a CLI that works with standard AWS credentials.</p>
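
<p>For a flavor of the CLI, here’s the role-level query mentioned above against a role that shows up later in this post; it runs with whatever standard AWS credentials you already have:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># One entity-level answer instead of a pile of custom log queries:
# everything this role has touched, pre-aggregated at ingest time
trailtool roles detail SandboxPowerUser
</code></pre></div></div>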

<p>Here are four workflows I’ve been running with it. For each one, there’s a prompt I gave Claude Code and the resulting session transcript.</p>

<h2 id="detecting-clickops-modifications">Detecting “ClickOps” modifications</h2>

<p>Ah, ClickOps, the primordial ooze from which fully realized cloud services emerge. Who amongst us hasn’t felt the rush, the thrill, of building software with only a faint understanding of the resources being created using a wizard UI and a prayer? It’s the original way to vibe software development.</p>

<p>If you do cloud security, you know that ClickOps resources bypass some kinda important security mechanisms like “change control” and “cloud hardening standards.” They may represent a drift from Infrastructure as Code that needs to be rectified. Or they may represent an opportunity to nudge someone onto an IaC pattern. At the very least, it’s an opportunity to review the resource for security best practices.</p>

<p><strong>Prompt:</strong></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Use trailtool to identify resources created or modified via ClickOps over the last 30 days, import them into Terraform state, and create the relevant Terraform configuration.
</code></pre></div></div>

<details>
<summary><strong>Session transcript</strong> (<a href="https://gist.github.com/alsmola/e51a5a100a4c537b8d19f2366f593b1a">view gist</a>)</summary>

<script src="https://gist.github.com/alsmola/e51a5a100a4c537b8d19f2366f593b1a.js"> </script>

</details>

<p>Detecting ClickOps in CloudTrail means filtering out traffic that happens via a web browser user agent, finding the mutating actions, pulling out the resource names, and sifting everything down to a list that can be sliced and diced by who, what service, what region. TrailTool has already done that work at ingest time, so the agent skips straight to reasoning about the results.</p>

<p>This prompt assumes you’re in catch-up mode and people are spinning stuff up without IaC. Rather than nagging them, you can fire something like this off to clean up after them. You could run a longer-lived agent looking for ClickOps in real time, nagging folks over Slack. Of course, the long-term fix is IAM policies strict enough to prevent it in the first place, which brings us to the next case.</p>

<h2 id="defining-least-privilege-iam-policies-for-roles">Defining least-privilege IAM policies for roles</h2>

<p><a href="https://alsmola.medium.com/designing-least-privilege-aws-iam-policies-for-people-ea4185c8a44b">Least privilege for IAM</a> is definitely a journey, not a destination. Especially when you consider humans, with all of their non-deterministic behavior and their “I’m an admin, get me out of here” entitlement. We all agree that things should be locked down, until we can’t do something we need for our job.</p>

<p>Often permissions start with the block of stone known as <code class="language-plaintext highlighter-rouge">AdministratorAccess</code>, which is then whittled by IAM artisans into an artful figure of “enough removed to satisfy security, enough retained to avoid complaints.” Like Michelangelo, who (apocryphally) stated that the creation of his masterpiece was simple: “I just removed everything that is not David.”</p>

<p>How do we know what to remove? CloudTrail is a pretty good way to figure it out. Generating least-privilege policy from CloudTrail logs is non-trivial, but tools like <a href="https://aws.amazon.com/iam/access-analyzer/">IAM Access Analyzer</a> and <a href="https://github.com/iann0036/iamlive">iamlive</a> have mapped out this path. TrailTool’s session-level analysis maps log lines into coherent narratives about what a user did over the course of an authenticated login, and uses iamlive mappings to translate that into IAM policy actions.</p>

<p><strong>Prompt:</strong></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Use trailtool to remove unused permissions from IAM policies for SandboxPowerUser role.
</code></pre></div></div>

<details>
<summary><strong>Session transcript</strong> (<a href="https://gist.github.com/alsmola/be4b61de4a1d514d23d359a3a4c606e3">view gist</a>)</summary>

<script src="https://gist.github.com/alsmola/be4b61de4a1d514d23d359a3a4c606e3.js"> </script>

</details>

<p>Use cases come and go, and access granted before may no longer be needed. You can run this as a recurring workflow: generate a policy, create a PR, review, deploy, repeat. Of course, this leads to what happens when you <em>over</em>-tighten, or when there are new use cases.</p>

<h2 id="responding-to-accessdenied-errors">Responding to AccessDenied errors</h2>

<p>Tightening down IAM policies is half the battle. The other half is knowing what to do when a role starts throwing AccessDenied errors. In my experience, they’re often a feedback signal - someone tried to do something legitimate and got blocked. Rather than having them file a ticket or ping you in Slack, use an agent to automatically identify the errors and draft the fix.</p>

<p><strong>Prompt:</strong></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Use trailtool to add permissions for events that received "AccessDenied" errors with IAM policies associated with SandboxPowerUser role.
</code></pre></div></div>

<details>
<summary><strong>Session transcript</strong> (<a href="https://gist.github.com/alsmola/3ab81be2dca0e1825548346cef3642e8">view gist</a>)</summary>

<script src="https://gist.github.com/alsmola/3ab81be2dca0e1825548346cef3642e8.js"> </script>

</details>

<p>The idea here is that least privilege needs to evolve, and someone bumping up against a permissions error is a signal that policies need to be loosened. A human-in-the-loop is a good idea here, as with all permission changes, but shortening the loop from “hey I tried to do this thing and I couldn’t” to “fixed, try again” by plumbing together all the implicit data into an agent-generated PR that’s ready to merge means less back-and-forth and faster forward progress for developers.</p>

<h2 id="validating-emergency-break-glass-access-justifications">Validating emergency break-glass access justifications</h2>

<p>One common pattern for implementing IAM access is the break-glass case. If there’s an incident or high-priority operation, an operator can ask for an exception, usually accompanied by a justification. The approver typically uses this justification as context for their decision, but:</p>
<ul>
  <li>The justification may be extremely brief</li>
  <li>The operator may end up doing something different after they receive the access</li>
</ul>

<p><strong>Prompt:</strong></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Use trailtool to investigate the session associated with the user alex@engseclabs.com assuming BreakGlassEmergency account around 8AM March 18, 2026 to see how it aligns with this justification: I was investigating an incident and needed to access SSM session for production instance.
</code></pre></div></div>

<details>
<summary><strong>Session transcript</strong> (<a href="https://gist.github.com/alsmola/91bd796e3ade4f33884ba9e9b5079b3c">view gist</a>)</summary>

<script src="https://gist.github.com/alsmola/91bd796e3ade4f33884ba9e9b5079b3c.js"> </script>

</details>

<p>This is where session-level analysis is useful. TrailTool lets you summarize activity at the session level, without manually correlating role assumptions and API calls across raw log files. By comparing this with stated justifications, it highlights discrepancies that might represent unwanted system changes or even attacks. It eliminates one of the foibles of <a href="https://alsmola.medium.com/access-approvals-considered-harmful-f24fa2fe2f87">access approvals</a>, closing the loop to ensure people do what they say they will.</p>

<h2 id="check-out-trailtool">Check out TrailTool</h2>

<p><a href="https://github.com/engseclabs/trailtool">TrailTool</a> is open source, so you can deploy it to your own account and start querying with the CLI. Or, you can check out <a href="https://trailtool.io">trailtool.io</a> for a more full-featured hosted version.</p>

<p>If you’re building AI-driven security workflows and CloudTrail analytics are slowing you down, let’s talk. Connect on <a href="https://www.linkedin.com/in/alex-smolen-8a59a31">LinkedIn</a> or <a href="https://infosec.exchange/@alsmola">Mastodon</a>.</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>I’ve apparently been trying to build this since 2017 — <a href="https://github.com/alsmola/cloudtrail-daily">cloudtrail-daily</a> was a Go CLI that walked S3 and printed a People → Services → Actions summary, which is basically TrailTool minus the persistence layer and eight years of IAM scar tissue. The problem hasn’t changed; just the consumer. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Alex Smolen</name></author><summary type="html"><![CDATA[TrailTool pre-aggregates CloudTrail events into entities, making AI-driven security analysis fast, cheap, and actionable.]]></summary></entry><entry><title type="html">GraphGRC v2: SOC 2 Compliance in GitHub</title><link href="https://engseclabs.com/blog/graphgrc-v2-soc2-compliance-in-github/" rel="alternate" type="text/html" title="GraphGRC v2: SOC 2 Compliance in GitHub" /><published>2026-01-13T00:00:00+00:00</published><updated>2026-01-13T00:00:00+00:00</updated><id>https://engseclabs.com/blog/graphgrc-v2-soc2-compliance-in-github</id><content type="html" xml:base="https://engseclabs.com/blog/graphgrc-v2-soc2-compliance-in-github/"><![CDATA[<blockquote>
  <p><strong>tl;dr</strong> — GraphGRC is available at <a href="https://github.com/engseclabs/graphgrc">github.com/engseclabs/graphgrc</a>. Fork it to get pre-written SOC 2 controls, policies, processes, and standards in Markdown. GitHub Actions validate your docs and generate a compliance site. Free and open source.</p>
</blockquote>

<p>You need SOC 2 for your first enterprise deal. The sales team says you need it yesterday. You look at Vanta or Drata - $12,000 per year minimum, probably more. They’ll give you pre-written policies and a nice dashboard. They’ll also lock all your compliance documentation in their proprietary platform, where it lives as long as you keep paying.</p>

<p>The alternative is Google Docs or Confluence. You copy policies from the internet, paste them into a folder somewhere, and hope your auditor accepts them. There’s no structure, no validation, no way to ensure your documentation stays current. When someone asks “what’s our incident response process?” you send them a link to a doc that was last updated 18 months ago.</p>

<p class="blog-lede">Your compliance documentation should live in version control, not a SaaS platform you're renting.</p>

<h2 id="grc-as-code">GRC as code</h2>

<p>GraphGRC is compliance documentation in GitHub. Fork the repo and you get pre-written controls, policies, processes, and standards. Everything’s in Markdown with semantic linking between documents. GitHub Actions validate your docs and generate a static compliance site. Free and open source.</p>

<p>This isn’t a full GRC platform - there’s no vendor management module, no training attestation tracker, no fancy dashboard. It’s the foundational documentation you need to pass SOC 2, structured in a way that actually makes sense and validates itself automatically.</p>

<h2 id="how-it-works">How it works</h2>

<p>The documentation model has four layers that map to each other:</p>

<p><strong>Controls</strong> map SOC 2 requirements to your actual documentation. Each control references the policies, processes, and standards that satisfy it. These are the things your auditor cares about.</p>

<p><strong>Standards</strong> describe technical requirements. AWS security baseline. GitHub security configuration. Laptop security standards. These are the concrete security configurations you need to maintain.</p>

<p><strong>Processes</strong> are step-by-step procedures. Incident response runbook. Access review process. Onboarding and offboarding workflows. These tell people what to do.</p>

<p><strong>Policies</strong> set objectives for different roles. What the security team expects from engineers, from support, from finance. Not generic boilerplate about password complexity - specific guidance about handling customer data, using AI tools, managing secrets.</p>

<p>Everything links together via heading-level anchors. A control references specific sections of policies and processes. A process references relevant standards. Change a policy and the validation checks ensure the control mappings stay accurate. Update a process and automated reviews flag when it needs attention.</p>

<p>The validation runs in GitHub Actions. It checks that every control maps to documentation that exists. It verifies review dates are current and owners are assigned. It ensures your processes reference standards that are actually maintained. When you open a pull request to update documentation, the checks tell you if you broke anything.</p>
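
<p>The real checks live in the repo’s Actions workflows, but the flavor is easy to sketch. Something like this (illustrative only, not GraphGRC’s actual script; it assumes controls live in <code class="language-plaintext highlighter-rouge">controls/</code> and link to sibling directories with relative paths) fails the build when a control references a document that doesn’t exist:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!/usr/bin/env bash
# Illustrative link check: fail if a control references a missing doc
set -eu
broken=0
for f in controls/*.md; do
  # Pull relative markdown link targets like (../policies/engineering.md#secrets)
  for target in $(grep -oE '\]\(\.\./[^)#]+' "$f" | sed 's/^](//'); do
    if [ ! -f "$(dirname "$f")/$target" ]; then
      echo "Broken reference in $f: $target"
      broken=1
    fi
  done
done
exit "$broken"
</code></pre></div></div>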

<h2 id="who-this-is-for">Who this is for</h2>

<p>This works if you’re preparing for SOC 2 and don’t want to pay for Vanta. It works if your security team values owning your compliance documentation and wants it in version control. It works if your team is comfortable with Git and Markdown workflows.</p>

<p>It doesn’t work if you need extensive hand-holding or want a full-featured GRC platform with vendor questionnaires and training modules. It requires some technical comfort - you need to be able to fork a repo, edit Markdown files, and run GitHub Actions.</p>

<p>The documentation is still being validated. I’ve used this model at multiple companies, but GraphGRC v2 as an open source project is new. Feedback is welcome. If you find gaps or things that don’t make sense, open an issue.</p>

<h2 id="try-it">Try it</h2>

<p>Check it out on GitHub: <a href="https://github.com/engseclabs/graphgrc">github.com/engseclabs/graphgrc</a></p>

<p>Fork it, customize it for your company, let me know what’s missing. Open to contributions and feedback. If you have questions or want to talk through whether this makes sense for your situation, reach out on <a href="https://www.linkedin.com/in/alexsmolen/">LinkedIn</a> or <a href="https://infosec.exchange/@alexsmolen">Mastodon</a>.</p>

<p>Your compliance documentation should be yours. This is a start.</p>]]></content><author><name>Alex Smolen</name></author><summary type="html"><![CDATA[GRC tools like Vanta cost $12K+/year and lock your compliance docs in proprietary systems. GraphGRC v2 gives you SOC 2 documentation in GitHub - pre-written controls, policies, and processes in Markdown with automated validation. Free and open source.]]></summary></entry><entry><title type="html">Fix Dependabot Security Alerts That Don’t Open Pull Requests</title><link href="https://engseclabs.com/blog/fix-dependabot-security-alerts-when-prs-fail/" rel="alternate" type="text/html" title="Fix Dependabot Security Alerts That Don’t Open Pull Requests" /><published>2026-01-10T00:00:00+00:00</published><updated>2026-01-10T00:00:00+00:00</updated><id>https://engseclabs.com/blog/fix-dependabot-security-alerts-when-prs-fail</id><content type="html" xml:base="https://engseclabs.com/blog/fix-dependabot-security-alerts-when-prs-fail/"><![CDATA[<p>Dependabot catches security vulnerabilities and opens pull requests to fix them. Except when it doesn’t. If Dependabot can’t create a PR for a security alert, <a href="https://github.com/engseclabs/dependabot-wolf/">dependabot-wolf</a> automatically sends the details to Copilot to figure it out. It’s a GitHub Action that monitors Dependabot security alerts and automatically sends any without PRs to GitHub Copilot for resolution.</p>

<p><strong>How it works:</strong></p>

<ol>
  <li>Checks for Dependabot security alerts that don’t have pull requests</li>
  <li>Extracts the vulnerability details and dependency conflict information</li>
  <li>Sends the context to an issue and assigns Copilot</li>
</ol>

<p>You need to create a fine-grained PAT and store it as an Actions secret, with permissions to create issues and assign them to Copilot.</p>
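
<p>The setup is a one-liner with the GitHub CLI; the secret name below is a placeholder, so use whatever name the action’s README expects:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Store the fine-grained PAT as an Actions secret (prompts for the value,
# or pipe it in on stdin). DEPENDABOT_WOLF_TOKEN is a placeholder name.
gh secret set DEPENDABOT_WOLF_TOKEN --repo your-org/your-repo
</code></pre></div></div>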

<h2 id="installation">Installation</h2>

<p>Add the action to your repository’s workflow:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
<span class="na">name</span><span class="pi">:</span> <span class="s">Dependabot Wolf</span>
<span class="na">on</span><span class="pi">:</span>
  <span class="na">schedule</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">cron</span><span class="pi">:</span> <span class="s1">'</span><span class="s">0</span><span class="nv"> </span><span class="s">0</span><span class="nv"> </span><span class="s">*</span><span class="nv"> </span><span class="s">*</span><span class="nv"> </span><span class="s">*'</span>  <span class="c1"># Daily check</span>
  <span class="na">workflow_dispatch</span><span class="pi">:</span>

<span class="na">jobs</span><span class="pi">:</span>
  <span class="na">check-dependabot</span><span class="pi">:</span>
    <span class="na">runs-on</span><span class="pi">:</span> <span class="s">ubuntu-latest</span>
    <span class="na">permissions</span><span class="pi">:</span>
      <span class="na">contents</span><span class="pi">:</span> <span class="s">read</span>
      <span class="na">security-events</span><span class="pi">:</span> <span class="s">read</span>
      <span class="na">issues</span><span class="pi">:</span> <span class="s">write</span>
    <span class="na">steps</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">uses</span><span class="pi">:</span> <span class="s">engseclabs/dependabot-wolf@v1</span>
</code></pre></div></div>
<hr />

<p><strong>Found this useful?</strong> Check out the <a href="https://github.com/engseclabs/dependabot-wolf/">repo</a> or let me know if you run into issues.</p>]]></content><author><name>Alex Smolen</name></author><category term="dependabot" /><category term="github-security" /><category term="github-actions" /><category term="automation" /><category term="copilot" /><summary type="html"><![CDATA[Dependabot throws security alerts but sometimes can't create pull requests. Here's a GitHub Action that automatically sends failed alerts to Copilot for resolution.]]></summary></entry><entry><title type="html">Backyard APT: A Raccoon Story</title><link href="https://engseclabs.com/blog/raccoon-diaries/" rel="alternate" type="text/html" title="Backyard APT: A Raccoon Story" /><published>2025-11-07T00:00:00+00:00</published><updated>2025-11-07T00:00:00+00:00</updated><id>https://engseclabs.com/blog/raccoon-diaries</id><content type="html" xml:base="https://engseclabs.com/blog/raccoon-diaries/"><![CDATA[<p>Living in urban coastal California means you’re never far from an APT: an Advanced Persistent Trashpanda. Cute, yes. Harmless? Not even close. In my battle against backyard raccoons, I learned some things about security. I learned some things about myself. I learned some things about MCP servers and microscopic nematodes. Allow me to tell you my story.</p>

<p><img src="/assets/images/raccoon/raccoon.png" alt="Raccon" /></p>

<p><em>I Can Haz Grub?</em></p>

<p>Every fall, we’d observe their nocturnal wanderings through our little grass patch of a backyard. Sometimes they’d bring the whole family on their explorations - furry assorted-size blobs clambering over the fence. The raccoons began to wear out their welcome, though, when they started “rolling the grass” - tearing up big chunks of lawn to (as I learned from our gardener) look for grubs growing beneath the sod until their emergence as mature bugs. This foraging made our lawn an eyesore and muddy mess.</p>

<p><img src="/assets/images/raccoon/grass.jpg" alt="Grass" /></p>

<p><em>Looking for grubs in all the wrong places</em></p>

<p>Muddy lawns made me grumpy, but hardly battle-ready. That changed the night our beloved chihuahua Jolene darted out between my legs through a cracked-open back door to “sweep the perimeter” - part of her canine rituals. Perhaps she had sensed the family of raccoons perusing the yard at that moment. Once she spotted them, she immediately gave chase. The largest raccoon broke towards her and began a fierce attack as Jolene yelped in fear and pain.</p>

<p>Panicking, I leapt off the deck onto the lawn and gave chase. The raccoon had Jo and was heading under the patio. I bent down, snatched Jo from its clutches, and issued a stern Birkenstock-ed kick to the intruder.</p>

<video autoplay="" loop="" muted="" playsinline="" preload="metadata">
  <source src="/assets/video/raccoon.mp4" type="video/mp4" />
  Your browser doesn't support embedded videos.
</video>

<p><em>The fateful encounter</em></p>

<p>As I carried Jo back inside, the raccoon followed me, glaring at us from the back door as if to continue the altercation. My inner Jersey Shore came out as I raged - “you wanna go bro?” - but my wife wisely encouraged me to keep the door shut and alerted me to Jo’s wound: a bite mark opened on her belly. We spent that night at an emergency vet (waiting in our car, as was the style in those COVID-era times) before Jojo returned to us, stitched up, rabies-inoculated, and more than a little terrified by her assault.</p>

<p>I’m generally a peacenik, but I was stirred with a revenge-fueled craving for justice. First you come for my lawn, then you come for my dog? I made it a goal - a mission, a blood pact, even - to keep these pests out of my backyard. As a security thought leader, I first wanted to evaluate the effectiveness of my defenses. So I put out a camera to see the action.</p>

<p><img src="/assets/images/raccoon/jo.jpg" alt="Jo" /></p>

<p><em>Jo investigating the surveillance system</em></p>

<h2 id="security-lesson-1-you-cant-stop-advanced-threats-with-basic-defense">Security Lesson #1: You Can’t Stop Advanced Threats with Basic Defense</h2>

<p>I searched Amazon for “raccoon deterrent” and purchased a few different ultrasonic/flashing light yard stakes. Well, if you’re in my shoes in the future, let me save you a few dollars and suggest you skip these devices. The raccoons looked at my fancy deterrents and kept right on digging. Much like deploying a flashy new security product without tuning, my defenses looked good on paper and did nothing in the field. Maybe these things work for rural raccoons unaccustomed to strange noises and signs of humans, but my backyard baby bug burglars were entirely nonplussed.</p>

<p><img src="/assets/images/raccoon/motion.png" alt="Motion detector" /></p>

<p><em>Ornamental e-waste</em></p>

<h2 id="security-lesson-2-too-many-false-positives-spoil-the-detection">Security Lesson #2: Too Many False Positives Spoil the Detection</h2>

<p>My next defensive strategy: spray them with water. I bought a motion-activated sprinkler to soak their spirits. I did get a few good direct shots. But more than once, I took out the trash and forgot to disarm, activating the system and soaking myself - a serious case of “alert fatigue”.</p>

<p><img src="/assets/images/raccoon/sprinkler.png" alt="Motion-activated sprinkler" /></p>

<p><em>My petard on which I was hoist</em></p>

<p>I also found water just wasn’t a big deal to these raccoons. I’d see them outside and accost them with a hose and spray nozzle: full blast, point blank. They would give no ground, just stare at me with their glowing eyes from the tree branches.</p>

<p>I commiserated with neighbors, consulted message boards, and talked with friends. A buddy whose family grew up on a farm suggested cages to trap the raccoons. “Well what do you do then?”, I asked. “Well,” he said, “my dad was a bit of a softie and would take them somewhere far away and release them. My mom, though, she would go get the gun and…” I blanched at the notion of dispatching anything more sentient than a housefly, so blowing a raccoon’s brains out was a hard no. Also, the idea of Ubering raccoon after raccoon from Oakland to Vacaville didn’t seem like a sustainable strategy.</p>

<h2 id="security-lesson-3-custom-code-to-wire-defensive-controls-together-is-a-superpower">Security Lesson #3: Custom Code to Wire Defensive Controls Together is a Superpower</h2>

<p>I’d set up Home Assistant and been tinkering with a few different motion sensors and smart switches. That’s when I got an idea - what if I connected something that moved to my own motion sensor? This would also let me build increasingly sophisticated detection and response flows.</p>

<p>My first challenge: what could I plug in outdoors that would move around and be threatening to a raccoon? Of course - those wild wacky inflatable arm guys! It’s basically just a fan and a sock with both ends open. I had to solve two problems. One was figuring out the right search term to buy one (“air dancer”), and the second was finding a supplier that had the right size - “scare a backyard raccoon”-size rather than “pay attention to my used car lot from the freeway”-size.</p>

<p><img src="/assets/images/raccoon/airdancer.png" alt="Air dancer" /></p>

<p><em>Like a scarecrow, but make it wacky</em></p>

<p>After building a few automations to wire the motion sensor to the plug (plus some patio lights) and turn the whole formula on only at night, I watched in glee as raccoons startled at the initial whir of the fan and then retreated from the wacky air dancing.</p>

<p>Just when I thought my problem was solved, a new wrinkle emerged. While the raccoons had made the “flight” decision, another backyard visitor was provoked to aggression - the normally placid possum. Reviewing my video audit log footage one night, I was shocked to see a possum react to my air dancer not with deference, but instead grab the flapping sock in its mouth and carry it off to somewhere I never found it.</p>

<video autoplay="" loop="" muted="" playsinline="" preload="metadata">
  <source src="/assets/video/Possum.mp4" type="video/mp4" />
  Your browser doesn't support embedded videos.
</video>

<p><em>Fearless possum claiming its prize</em></p>

<p>Supposing this may have been a one-time act of bravery, I replaced the lost investment and purchased a spooky ghost version that I believe will intimidate even the most resolute possum.</p>

<video autoplay="" loop="" muted="" playsinline="" preload="metadata">
  <source src="/assets/video/ghost.mp4" type="video/mp4" />
  Your browser doesn't support embedded videos.
</video>

<p><em>The ghost air dancer - surely possum-proof</em></p>

<p>Through all of this, I’ve been using Home Assistant to power my surveillance and wildlife PsyOps architecture. In this AI era, I’ve started using Claude Code and the Home Assistant MCP server to make some sick dashboards.</p>

<p><img src="/assets/images/raccoon/dashboard.png" alt="Dashabord" /></p>

<p><em>Vibe coding and the vibe is - raccoons begone!</em></p>

<h2 id="security-lesson-4-remove-the-attacker-incentives">Security Lesson #4: Remove the Attacker Incentives</h2>

<p>Despite my creativity in keeping these garden grub guzzlers at bay, I’d still wander out some mornings to find a corner of lawn beyond sensor distance torn up. This is where I discovered the most effective element of my defensive system. In a line of thinking that has deep profundity for all security endeavors, I realized the best defense isn’t higher fences. It’s changing what makes the target appealing in the first place. That’s true for raccoons and ransomware alike.</p>

<p>Toxic pesticides may have worked, but I didn’t want to poison my poor beset small dog. Instead, I identified glorious nematodes - microscopic organisms that feed on grubs and can be bought by the millions, mixed with water, and sprayed on the lawn.</p>

<p><img src="/assets/images/raccoon/nematodes.png" alt="Nematode" /></p>

<p><em>Microscopic, macro-effective</em></p>

<p>With these latest technological advantages, I believe I’ve won the cat and mouse game - or rather, the raccoon and person game - at least for now. And though there have been costs to my lawn’s integrity, my dog’s safety, and my sanity, I gained valuable security insights which I share with you. Good luck out there, whether your APTs wear hoodies or fur coats.</p>

<hr />

<iframe src="https://www.linkedin.com/embed/feed/update/urn:li:share:7393730688509906944?collapsed=1" height="682" width="504" frameborder="0" allowfullscreen="" title="Embedded post"></iframe>

<p><em>Sharing the raccoon saga on LinkedIn</em></p>]]></content><author><name>Alex Smolen</name></author><summary type="html"><![CDATA[Raccoons are both advanced and persistent threats. After one attacked my chihuahua Jolene, I declared war on my backyard invaders. Through ultrasonic deterrents, motion-activated sprinklers, and wacky inflatable air dancers, I learned critical security lessons - including that removing attacker incentives beats detection every time.]]></summary></entry><entry><title type="html">Data Retention is Two Different Problems</title><link href="https://engseclabs.com/blog/data-retention-two-different-problems/" rel="alternate" type="text/html" title="Data Retention is Two Different Problems" /><published>2025-10-23T00:00:00+00:00</published><updated>2025-10-23T00:00:00+00:00</updated><id>https://engseclabs.com/blog/data-retention-two-different-problems</id><content type="html" xml:base="https://engseclabs.com/blog/data-retention-two-different-problems/"><![CDATA[<p>A common artifact in security programs is something called a <em>data retention policy</em>. Like other policies, there’s a lot of jargon, but the centerpiece is typically a big table, with categories of data pointing to specific timeframes - for example:</p>

<table>
  <thead>
    <tr>
      <th>Data Category</th>
      <th>Retention Period</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Customer data</td>
      <td>30 days</td>
    </tr>
    <tr>
      <td>Employee records</td>
      <td>2 years</td>
    </tr>
    <tr>
      <td>Security event logs</td>
      <td>5 years</td>
    </tr>
    <tr>
      <td>Financial records</td>
      <td>7 years</td>
    </tr>
  </tbody>
</table>

<p>To me, these policies are confusing because they conflate two different goals:</p>

<ul>
  <li><strong>Data preservation</strong> for the <em>minimum</em> amount of time you need to keep important data. This applies to stuff like audit logs you need if you get hacked, or financial records you need to investigate fraud.</li>
  <li><strong>Data deletion</strong> for the <em>maximum</em> amount of time you’re allowed to keep personal data. This is driven by contracts and privacy laws which require you to delete personal data when you don’t need it anymore.</li>
</ul>

<p>It’s a good idea to write clear data retention policies to detangle these two goals. But also - you should provide advice to the people who are storing data, at design time, to help make sure you actually do the thing you said you’d do. Two simple and valuable ideas I’ve brought to numerous conversational tables to support this fidelity  are <em>immutability</em> (for preservation) and <em>ephemerality</em> (for deletion).</p>

<h2 id="can-it-be-immutable">Can it be immutable?</h2>

<p>Remember that for data you need to preserve - audit logs, security events, compliance records - the numeric duration specified in the policy is a minimum. In the median case, I propose any duration other than <em>indefinitely</em> is cargo-cult copy-paste with zero explicit rationale. Presuming it’s not privacy-risk-bearing personal data, other traditional reasons to delete old data - storage costs, database performance - matter far less than they used to. Cloud storage is cheap. Modern data warehouses handle large datasets well. The operational complexity of managing data lifecycle policies costs more than just keeping the data.</p>

<p>Beyond keeping archival data indefinitely, consider immutability - write once, never modify or delete. For data in cloud-based SaaS, indefinite is the default retention strategy, but immutability is more rigorous. The difference is ensuring the data is not just retained - it also can’t be deleted. The controls for immutability are built into cloud storage systems like S3: <a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lock.html">object lock</a> or <a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/Versioning.html">versioning</a>. A more indirect route to immutability is restricting access with <a href="https://docs.aws.amazon.com/organizations/latest/userguide/orgs_manage_policies_scps.html">service control policies</a> or <a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/MultiFactorAuthenticationDelete.html">MFA delete</a>. Combined with resilient, isolated backups, these controls uphold retention policies even in black swan events like data breaches or accidental outages.</p>
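
<p>In S3 terms, the rigorous version is a small amount of one-time setup; roughly (bucket name and retention window are placeholders):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Object Lock must be enabled at bucket creation
# (add --create-bucket-configuration LocationConstraint=&lt;region&gt; outside us-east-1)
aws s3api create-bucket --bucket my-audit-logs --object-lock-enabled-for-bucket

# Versioning comes along with Object Lock, but being explicit doesn't hurt
aws s3api put-bucket-versioning --bucket my-audit-logs \
  --versioning-configuration Status=Enabled

# COMPLIANCE mode: no principal, root included, can delete or overwrite a version
# before its retention window passes
aws s3api put-object-lock-configuration --bucket my-audit-logs \
  --object-lock-configuration '{"ObjectLockEnabled": "Enabled", "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Years": 7}}}'
</code></pre></div></div>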

<h2 id="can-it-be-be-ephemeral">Can it be be ephemeral?</h2>

<p>If you store personal data, you should delete it when you lose the rights to it, lest you suffer the wrath of privacy legislators. You may lose your rights when a contract ends, or when someone sends you a deletion request.</p>

<p>The traditional approach requires building deletion machinery to plumb these requests through every system where personal data lives: your production database, analytics warehouse, ML training datasets, etc. Each system needs its own deletion endpoint, and you need orchestration to coordinate all of it. Yuck.</p>

<p>Ephemeral data solves this by design. If you automatically expire old data with something like <a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lifecycle-mgmt.html">S3 lifecycle management</a>, your retention period becomes your deletion mechanism. You retain the data for a little while, and then it naturally disappears. This doesn’t work for your main durable stores of personal data (e.g. your production database), but is worth considering everywhere else.</p>
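
<p>The ephemeral version is similarly small. A lifecycle rule like this one (bucket name, prefix, and window are placeholders) ages the data out with no deletion machinery to build or operate:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Expire objects 30 days after creation; S3 performs the deletes automatically
aws s3api put-bucket-lifecycle-configuration --bucket my-app-logs \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "expire-personal-data",
      "Status": "Enabled",
      "Filter": {"Prefix": "logs/"},
      "Expiration": {"Days": 30}
    }]
  }'
</code></pre></div></div>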

<p>Like, you probably don’t want to selectively delete data from database backups. Same with logs that capture personal identifiers or detailed analytics that include user-level information. Aging this data out means no-touch compliance with your data retention policies.</p>

<p>For data lakes and warehouses with personal data AND long-term analytics requirements, enforce a lifecycle retention on fact tables with personal data, then apply pseudonymization or anonymization on transformed tables as part of your data pipeline. I’ll note that validating the effective removal of personal information during these data transformations is an exercise left to the not-faint-of-heart reader.</p>

<h2 id="design-for-immutability-or-ephemerality-from-the-start">Design for immutability or ephemerality from the start</h2>

<p>So, when you’re doing security design reviews or threat modeling for systems that will store data, ask these questions:</p>

<ul>
  <li><strong>For archival data</strong>: Can we make this immutable?</li>
  <li><strong>For personal data</strong>: Can we make this ephemeral?</li>
</ul>

<p>If the answer is yes, use storage systems with built-in immutability or expiration. Make the data lifecycle automatic. If the answer is no, then you need to build custom machinery - either deletion controls for durable personal data, or integrity and availability controls for archival data. Pointing people in the right direction early - during design, not after the system is built - is what keeps operational complexity at bay.</p>

<p>Remember - the hard part of security isn’t writing a policy. It’s the work that goes into making it reality.</p>]]></content><author><name>Alex Smolen</name></author><summary type="html"><![CDATA[Data retention covers two different problems - preservation (minimum time you must keep archival data) and deletion (maximum time you can keep personal data). They require opposite technical approaches - one prevents deletion, the other enforces it. The elegant solutions? Indefinite and ephemeral data.]]></summary></entry><entry><title type="html">Role-Based Everything: Aligning Access Control, Policies, and Training</title><link href="https://engseclabs.com/blog/role-based-everything/" rel="alternate" type="text/html" title="Role-Based Everything: Aligning Access Control, Policies, and Training" /><published>2025-10-14T00:00:00+00:00</published><updated>2025-10-14T00:00:00+00:00</updated><id>https://engseclabs.com/blog/role-based-everything</id><content type="html" xml:base="https://engseclabs.com/blog/role-based-everything/"><![CDATA[<p>Your new hire’s first day looks something like this: they sit through two hours of generic security training about phishing and password hygiene, they click through a 47-page acceptable use policy that covers everything from clean desk requirements to data retention, and then they get added to whatever Slack channels and tools their manager remembers to request. Three months later, they’re blocked on something, so they ping their manager who pings their manager, and suddenly they have production access. The security policies they acknowledged on day one? Nobody’s looked at those since.</p>

<p>This is broken. We have three systems (RBAC, security policies, and security training) that don’t talk to each other. Policies gather dust in Confluence. Training is a checkbox exercise everyone resents. And RBAC becomes a tangled mess of individual exceptions and Okta group rules hanging off job titles that hiring managers invented on the spot. There’s a better way. What if we stopped pretending these are separate problems?</p>

<p class="blog-lede">If someone has access to production, they have specific security responsibilities. Those responsibilities should be written as policies. Those policies should be their training.</p>

<h2 id="roles-describe-responsibilities-not-just-access">Roles describe responsibilities, not just access</h2>

<p>With great power comes great responsibility. If someone has elevated access, they have specific security obligations. Those obligations should be documented as role-specific security policies.</p>

<p>Security policies, <a href="https://csrc.nist.gov/glossary/term/security_policy">properly defined</a>, are rules that guide how people do their work securely. They’re not just documents for the security team. They’re objectives the security team has for everyone else - telling engineers how to handle secrets, telling support how to verify customer requests, telling finance how to process vendor payments securely.</p>

<p>These role-specific policies should form the basis of role-specific training. Training is just teaching people how to meet the objectives you’ve set for them. It all fits together.</p>

<h2 id="what-this-looks-like-in-practice">What this looks like in practice</h2>

<p>Let’s say you create a “Product Engineer” role. This role gets you everything you need to contribute code to the main product: GitHub access to the relevant repos, your CI/CD pipeline, your observability stack, your cloud environments. All the things an engineer needs to do their job.</p>

<p>Because this role has access to production systems and customer data, it also comes with a “Product Engineering Security Policy”. Not generic copy about password complexity requirements or physical document shredding that was written in 2015 and hasn’t been updated since. Specific policies about things like: can you use AI coding assistants (and if so, with what guardrails), how do you handle customer data in development, what’s the process for emergency production access, how do we think about secrets management.</p>

<p>For this system to work, your policies need to actually be good. I’ve written about <a href="https://alsmola.medium.com/better-security-policies-66eae7d6f722">better security policies</a> in software engineering environments before. The core principle: write policies for the people who need to follow them, not for auditors. When policies drive training and connect to access control, that principle becomes critical.</p>

<p>With role-specific policies, you can build role-specific training. Your onboarding training for Product Engineers walks through these policies. You don’t need expensive learning management systems to make this work. Use your existing knowledge management software (Notion, Confluence, whatever) to host the content and track who’s viewed it. Add a Google Form at the end for attestation. Write some light scripts against your identity system to check completion, send reminder emails or Slack messages to people who haven’t finished, and generate the compliance reports your auditors want.</p>
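
<p>As a sketch of those light scripts, here’s a completion check, assuming you can export the role roster from your identity provider and the attestation responses from the Google Form as CSVs - the file names and column headers are hypothetical:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Minimal sketch: compare a role roster export against attestation responses.
# CSV file names and column headers are hypothetical.
import csv

def load_emails(path, column):
    with open(path, newline="") as f:
        return {row[column].strip().lower() for row in csv.DictReader(f)}

role_holders = load_emails("product_engineer_role.csv", "email")
attested = load_emails("attestation_responses.csv", "email")

for email in sorted(role_holders - attested):
    # Swap the print for an email or Slack reminder in practice.
    print(f"Reminder: {email} has not completed Product Engineer training")
</code></pre></div></div>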

<p>Then there’s annual refresher training. Many compliance frameworks require it, and the spirit of the requirement is sound: security knowledge needs reinforcement and updating. But forcing people to take the exact same training again feels like a waste of time. It is a waste of time. The only people who benefit from unchanged annual training are security teams who don’t have to think creatively about how to fulfill the requirement.</p>

<p>Instead, use refreshers to do two things: briefly review your role-specific policies and highlight any recent changes or additions, then dive into substantive content that’s temporally relevant. If you have policies around secure coding standards for Product Engineers, your annual refresher should walk through the bugs from your bug bounty program over the past year. Show how they were introduced and how to prevent them. The best predictor of the next bug is the last bug. This is infinitely more valuable than generic OWASP content nobody remembers, and it directly reinforces the policies you’ve already established.</p>

<p>Here’s where it gets interesting: let’s say you have a product manager who wants to learn to code and contribute to the product. Great! They can get the Product Engineer role. But they also need to understand the policies that now apply to them, and they need to go through the training. Access and responsibility travel together.</p>

<h2 id="details-that-make-it-work">Details that make it work</h2>

<p><strong>Start with your highest-risk roles.</strong> Identify the 2-3 roles with the most elevated access (Product Engineer, Support, Infrastructure). If you have a data classification scheme, use it to think through who has access to your most sensitive data. Check with your auditors early that the evidence you’ll collect for security policies and training controls works for them. Document the policies for these roles, build the training, and get people through it. Once you have the pattern working, the next roles get easier.</p>

<p><strong>Keep the granularity manageable.</strong> Think 10 to 20 roles maximum, at the job family level. If you need more than you can count on your fingers and toes, you’ve gone too far. These are the big blocks, not the fine-grained technical permissions you’d configure in AWS or Salesforce.</p>

<p><strong>Integrate with hiring.</strong> This is where most RBAC systems go off the rails. Job descriptions get created by hiring managers, titles get invented, and then later someone tries to figure out what access these people need. Flip it around. When a new job description is created, security should be involved. What role does this map to? What access does this role need? Don’t retrofit access patterns onto org structures that were designed without any thought to security.</p>

<p><strong>Handle exceptions gracefully.</strong> You won’t be able to say that every product manager needs the Product Engineer role, and some people will need access their default role doesn’t cover. You need a system that’s flexible enough to handle those exceptions. But at a high level - the structural, big-block level - the roles should do most of the heavy lifting.</p>

<p><strong>Layer your policies.</strong> You can have a baseline employee security policy that applies to everyone. But then layer on role-specific policies for people with elevated access. Support engineers who have broad access to customer data and can perform administrative actions on behalf of customers? They get specific policies about customer privacy and how to verify requests. Infrastructure engineers with cloud admin permissions? They get specific policies about root account usage and infrastructure changes.</p>

<hr />

<p>This is hard work for the security team. You can’t just copy and paste policies from a SOC 2 in a box product. You can’t buy generic security awareness web video training and call it done. You actually need to own the underlying content, make sure it communicates what’s important, and build deep empathy with the people in your organization and the work they do. But when you align RBAC, policies, and training, you get better outcomes: people understand what matters to them specifically instead of drowning in generic noise, you spend less on overpriced compliance theater while actually reducing risk, and the system scales naturally as people move into new roles. This isn’t about finding a shortcut. It’s about doing the work that actually matters, respecting people’s intelligence enough to give them exactly what they need to know for the access they have.</p>]]></content><author><name>Alex Smolen</name></author><summary type="html"><![CDATA[Your new hire sits through generic security training, clicks through a 47-page policy, and gets random access over time. Three months later they ping for production access. The policies? Nobody's looked at them since day one. There's a better way.]]></summary></entry><entry><title type="html">Refocusing Vendor Security on Risk Reduction</title><link href="https://engseclabs.com/blog/refocusing-vendor-security-on-risk-reduction/" rel="alternate" type="text/html" title="Refocusing Vendor Security on Risk Reduction" /><published>2025-10-01T00:00:00+00:00</published><updated>2025-10-01T00:00:00+00:00</updated><id>https://engseclabs.com/blog/refocusing-vendor-security-on-risk-reduction</id><content type="html" xml:base="https://engseclabs.com/blog/refocusing-vendor-security-on-risk-reduction/"><![CDATA[<p>Modern software companies use a lot of software services. Data flows across organizational boundaries, and security risk moves from first-party to third-party, challenging security teams that have responsibility without control. Traditional security teams address this through certifications and questionnaires, supporting risk visibility and acceptance. What’s often overlooked is the opportunity to actually <em>reduce</em> risk by collaborating with implementation teams on secure configuration decisions specific to the software in question. Similar to the challenge of design involvement (threat modeling) in AppSec versus post-implementation analysis (running scans), this approach requires an embedded, empowered, and empathetic security team.</p>

<p class="blog-lede">Vendor security reviews focus on what vendors do. But the real risk (and opportunity) lies in what <em>you</em> do with their software.</p>

<p><em>The old way:</em> Marketing wants to buy HubSpot. Security asks for a SOC 2 and pen test, sends a 200-question spreadsheet, waits 3 weeks for responses, notes that the vendor doesn’t enforce password rotation, asks an exec to “accept the risk,” then moves on. Six months later, a developer connects HubSpot to the entire customer database via a Zapier integration using an API key that never expires, tied to an intern’s account.</p>

<p><em>The better way:</em> Security sits with marketing during onboarding. Together they enable OIDC, set up role-based access so only marketing can see marketing data, configure audit logging to the SIEM, and document that the Salesforce integration uses a read-only service account with auto-rotating credentials.</p>

<h2 id="avoid-generic-vendor-risk"><strong>Avoid generic “vendor risk”</strong></h2>

<p>Security teams that care about third-party SaaS risk find themselves involved in the procurement process. This is reasonable - knowing (asset inventory) is half the battle. But typical security team activities are surface-level and generic:</p>

<ul>
  <li>Request and review a SOC 2 report</li>
  <li>Request and review a penetration test</li>
  <li>Request and review responses to a security questionnaire</li>
  <li>Consult a third-party vendor risk service (e.g. SecurityScorecard)</li>
</ul>

<p>Sure, sometimes these exercises reduce risk. Don’t have a SOC 2? No purchase. Unresolved high-risk findings? Commit to fix before we sign a contract. But that only works when there’s a favorable power differential. While a large purchaser may influence a small supplier’s security practices, there’s less potential in more equal or inverse dynamics. If the supplier won’t budge, the security team needs to spend reputational capital (or convince someone else to spend it) to push back against compelling business interests.</p>

<p>In the median case, vendors have already prepared industry-standard responses by hiring audit and pen test firms incentivized to give them the sales-enabling materials they need. The riskiest outcome from a “vendor security review” is asking a harried executive to approve and accept risk for deficiencies divorced from the context of how the software will actually be used. A marketing team isn’t abandoning their new analytics platform because the vendor’s audit noted they didn’t offboard personnel within 30 days, or had a stored XSS vulnerability.</p>

<p>Perhaps most distressing for those who’ve seen the sausage made: SOC 2 reports, pen tests, and other external security attestations provide very little value for understanding risk. It’s extremely difficult to reason about first-party risk, let alone third-party risk. Large security teams spend tremendous effort identifying and reducing causes of security breaches in their own infrastructure and often fail. The idea that we can elucidate third-party system risk as part of a brief “vendor review” feels unrealistic. Cookie-cutter audits and commoditized pen tests simply don’t provide much assurance. We might get some signal about security maturity, but exhaustive security analysis that would let organizations make confident buy/no-buy decisions wouldn’t be cost-effective or practical for the hundreds of vendors organizations use.</p>

<p>Like many “supply chain” problems, there aren’t easy answers. As an ecosystem, we’ve accepted the risk of this free-flowing interconnectedness to enjoy the benefits of collaborative integrations and software specialization. That said, my take is that we spend too much time on risk visibility and not enough on risk reduction. For SaaS vendors, risk reduction activity is essentially hardening the service - making sure the decisions made while configuring the software are the best available security choices, accounting for tradeoffs.</p>

<p>I’m not saying don’t ask for SOC 2 or pen tests. <a href="https://www.latacora.com/blog/2020/03/12/the-soc-starting/">Latacora’s SOC 2 Starting Seven</a> suggests simply “tracking all the software you subscribe to, buy, or install in a spreadsheet and start doing some simple risk tracking”.  Maybe you even want security questionnaires to look good for auditors or cover your bases with regulators in the event of a breach. But if you do it and don’t do what I describe below, you’re missing an opportunity to optimize the energy you spend on your security program.</p>

<p>One note: these activities are different from the very “external” vendor security review process that asks for evidence and risk approvals. They require the security team to be deeply connected to the IT function that clicks the buttons to enable identity/access/logs, and to have strong relationships with teams using software so there’s trust to accept recommendations. Expect less paperwork and more teamwork.</p>

<h2 id="instead-support-smart-security-decisions"><strong>Instead, support smart security decisions</strong></h2>

<h3 id="understand-data-flows"><strong>Understand data flows</strong></h3>

<p>New software review should start by understanding what data the vendor will collect and how. Similar to threat modeling, this requires context gathering - sitting down with implementers to make sure the security team understands what the heck this software is doing in the first place.</p>

<p>Recommendations often come out of just this step: Can we self-host rather than use cloud? Do we intend to turn on such-and-such third-party integrations?</p>

<h3 id="define-access"><strong>Define access</strong></h3>

<p>Pretty much any software will have you make security-related decisions about access. Typically this means roles and permissions (coarse- or fine-grained) plus identity (username/password, MFA, OAuth, SAML). I’ve seen plenty of wild west situations with access to new software tooling. This is an ideal opportunity for a security team to deliver a win-win: make access easy out of the gate through well-defined role-based access control strategies that can be applied to new use cases.</p>

<p>This is where you make an impact: “Hey, they offer SAML and SCIM, let’s use that.”</p>

<h3 id="configure-auditing"><strong>Configure auditing</strong></h3>

<p>If the software holds particularly sensitive data, it may be wise to track access, changes, or sensitive operations. Consider what you’d do in the event of a breach - does the software offer configurable audit logs? Would it make sense to write detections? Can it ship audit logs to your SIEM?</p>
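
<p>If the vendor exposes an audit-log API and your SIEM has an HTTP collector, the glue can be small. The sketch below is entirely hypothetical - the endpoints, auth scheme, and payload shape are stand-ins for whatever the vendor and your SIEM actually provide:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical sketch: poll a vendor audit-log API and forward events to a
# SIEM HTTP collector. Every URL, header, and field name here is made up.
import json
import os
import urllib.request

VENDOR_AUDIT_URL = "https://vendor.example.com/api/audit-events"    # hypothetical
SIEM_COLLECTOR_URL = "https://siem.example.com/services/collector"  # hypothetical

def fetch_events():
    req = urllib.request.Request(
        VENDOR_AUDIT_URL,
        headers={"Authorization": "Bearer " + os.environ["VENDOR_API_TOKEN"]},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def forward(event):
    req = urllib.request.Request(
        SIEM_COLLECTOR_URL,
        data=json.dumps({"event": event}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(req).close()

for event in fetch_events():
    forward(event)
</code></pre></div></div>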

<h3 id="lock-down-integrations"><strong>Lock down integrations</strong></h3>

<p>This feels like the biggest area where vendor risk gets introduced, and the hardest for a security team to get a handle on, because these integrations are often nuanced and not well understood by anyone except the people configuring or administering the software.</p>

<p>Nearly all software these days integrates with other systems. Does it read/write from your data analytics pipeline? Tie into your observability stack? Does it need to integrate with your marketing website?</p>

<p>The boom in “Non-Human Identity” security points to the risk associated with these integrations if they’re created and maintained without security rigor. We see API keys stored in plaintext on endpoints, integrations granted full administrative access, and integrations tied to a real human’s account that break when that person leaves the company. Rather than trying to solve these problems after the fact with tools, providing guidance upfront during onboarding helps reduce risk from the get-go and obviates the need for potentially breaking changes later.</p>
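
<p>Part of that upfront guidance can be as simple as requiring the integration credential to live in a secrets manager with a named owner, rather than in plaintext on someone’s laptop. A minimal sketch, assuming AWS Secrets Manager - the secret name, tags, and key value are placeholders:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Minimal sketch: store an integration API key centrally with an owner tag,
# instead of leaving it in plaintext on an endpoint. Names and values are placeholders.
import boto3

secrets = boto3.client("secretsmanager")
secrets.create_secret(
    Name="integrations/hubspot/zapier-api-key",
    Description="Read-only HubSpot key used by the Zapier integration",
    SecretString="example-api-key-value",
    Tags=[
        {"Key": "owner", "Value": "marketing-ops"},
        {"Key": "rotation-policy", "Value": "90-days"},
    ],
)
</code></pre></div></div>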

<h3 id="harden-with-security-guides"><strong>Harden with security guides</strong></h3>

<p>Beyond these standard security configuration areas, many software services have unique configuration knobs that can profoundly affect security. Larger vendors have started publishing “security guides” that help implementers understand their options and how they work. It should be the security team’s responsibility to consume these and ensure best practices are followed if they make sense for the organization’s context.</p>

<hr />

<p>The bottom line: vendor security isn’t just about collecting attestations and checking boxes. It’s about being embedded enough in your organization to understand how software actually gets used, building relationships strong enough that people trust your recommendations, and focusing your energy on the configuration decisions that actually reduce risk. That’s harder than asking for a SOC 2 report, but it’s also where the real security work happens.</p>]]></content><author><name>Alex Smolen</name></author><summary type="html"><![CDATA[Modern software companies use a lot of software services. Traditional security teams address third-party risk through certifications and questionnaires, but there's an opportunity to actually reduce risk by collaborating with implementation teams on secure configuration decisions.]]></summary></entry><entry><title type="html">What Should I Work on Next? A Framework for High-Impact Security Work</title><link href="https://engseclabs.com/blog/what-should-i-work-on-next/" rel="alternate" type="text/html" title="What Should I Work on Next? A Framework for High-Impact Security Work" /><published>2025-06-03T00:00:00+00:00</published><updated>2025-06-03T00:00:00+00:00</updated><id>https://engseclabs.com/blog/what-should-i-work-on-next</id><content type="html" xml:base="https://engseclabs.com/blog/what-should-i-work-on-next/"><![CDATA[<p>“So, my last project is basically wrapped up. What should I work on next?”</p>

<p>This question comes up often in my 1:1s, and how I respond influences whether my report will do high-impact work or spin their wheels. I believe one of the most valuable things I do as a security engineering manager is steering my team towards the best work. When a security engineer looks to me for advice on what to work on, it’s a high-leverage opportunity. I’ve succeeded by aligning people with work that mattered; I’ve also failed to recognize and redirect wasted energy. Over time, I’ve developed three criteria/questions to guide this discussion:</p>

<ul>
  <li><strong>Business Goals:</strong> What is the impact of this work on the business?</li>
  <li><strong>Implicit Interest:</strong> How engaged are you when doing this work?</li>
  <li><strong>Personal Growth:</strong> How much does this work align with your stated career interests?</li>
</ul>

<h2 id="side-note-decision-making-as-an-optimization-problem">Side-note: Decision making as an optimization problem</h2>

<p>Back in 2012 at Twitter, I joined an internal group completing the Coursera “Machine Learning” course. One insight that stuck: behind all the matrix math, we were asking a hill climbing question. Given a bunch of dimensions, what are the best weights to predict outcomes?</p>

<p>That optimization mindset applies to simpler problems too, like “what to work on next”. Without a decision-making rubric, you might jump to recent conversations with your boss, backlog items, or what would look good for promotion. Context is king and circumstances differ, but the best choice is unlikely to be based on just one criterion. The “right work” involves tradeoffs. Similar to “cheap/good/fast - pick two”, you can’t maximize everything. So, with that in mind, not all work will crush Business Goals, ignite Implicit Interests, <em>and</em> speed-run Personal Growth. But they’re all heuristics to consider for making optimal(-ish) choices about what to work on.</p>

<h2 id="business-goals">Business Goals</h2>

<p>Information security isn’t a first-order business goal. While no company wants bad security outcomes, security work rarely moves the needle for top company metrics like revenue growth. Like other “platform” work and technical debt, it’s difficult to show direct value. As I mentioned in <a href="https://alsmola.medium.com/building-effective-security-okrs-94f249230a39">Building Effective Security OKRs</a>, it’s easy to describe project value in ways that aren’t connected to business goals (“we shipped a thing”). Security engineers rarely explain their work’s value with spreadsheets full of numbers.</p>

<p>I’ve found one way to communicate security business value to stakeholders is through <em>narrative</em>. The story of why you’re doing what you’re doing needs to land with people thinking about business from the broadest perspective - executives, investors, customers, etc. The work may be a small part of this story. It may require explanation, and some squinting and hand waving. But work disconnected from business value won’t be watered and grown over time.</p>

<p>“Zero trust” is a narrative - “having network access shouldn’t get you access to services, there should be authentication/device verification involved”. There’s a deep risk discussion about why that’s good, but the narrative encodes it. Similar to “paved roads”, “detection engineering”, or “shift left” - these concepts connect projects to business goals via “strategy”. This is the connective tissue from grungy security work to IPOs/dollar signs.</p>

<p>Take static analysis tools as “shift left” thinking. The business narrative is preventing vulnerabilities before they become expensive bug bounty payouts or worse. But implementation details matter for that story to hold - you need the tool to catch vulnerability classes you’re paying for, not just generate noise engineers ignore. Similarly, when implementing detections, work backwards from credible threats people already understand - what’s in the news, what happened to similar companies - and tell the story from that threat-focused angle.</p>

<p>Some projects are inherently difficult to connect to business value. System rewrites (like migrating secret managers), building internal tooling that generates too many alerts to triage effectively, or <a href="https://alsmola.medium.com/access-approvals-considered-harmful-f24fa2fe2f87">access approval systems that slow productivity</a> have <em>very</em> indirect customer impact. This is a problem many engineering leaders deal with, and selecting your portfolio is an art in itself. Suffice it to say, it’s best to have clear examples of how you’re impacting users to offset the ennui around big projects where the successful outcome is “everything works like it did before”.</p>

<p>The other way security projects impact business is unlocking revenue through compliance and customer assurance. This may not excite security engineers the way a novel threat detection algorithm does, but it’s often the most direct path from security work to business value. When your SOC 2 Type II enables your first big enterprise deal or your FedRAMP certification opens government contracts, the ROI conversation becomes much easier. People focused on business outcomes recognize these compliance achievements as tangible value they can point to with customers and prospects.</p>

<p>Regardless of framing, workshopping a project’s business impact is a great way to prioritize it. How would you communicate the progress and outcome to the organization? If the message isn’t clearly “here’s how we’re driving our business forward”, take heed.</p>

<h2 id="implicit-interest">Implicit Interest</h2>

<p>People prefer different types of work. One person may love diving into obscure technical challenges - a big “nerd snipe” target. Another may relish presenting to the organization about how security works. Others may be “red team” or “blue team”-coded.</p>

<p>The more someone gravitates toward and is absorbed by their work, the more productive they’ll be. Ask someone who dreads writing to complete a big documentation exercise, and it’ll take forever. Some engineers are allergic to frontend code. But find a problem that fits someone’s implicit interest, and they’ll often fly with it, bulldozing through obstacles and bringing unexpected creativity to problem solving.</p>

<p>Don’t forget you can combine people with complementary interests on the same project. Pair the person who loves prototyping with someone who enjoys writing thorough documentation. Match the frontend enthusiast with the API builder. This often produces better outcomes than forcing one person to do everything.</p>

<p>Given work with equal business value, figure out which project is more interesting. It’s also a good signal of whether someone fits the team. If they’re not interested in the work, it’s hard to expect them to be more productive than a replacement who is.</p>

<h2 id="personal-growth">Personal Growth</h2>

<p>A final dimension to consider: how the work aligns with their stated career goals. A trap managers and reports fall into is losing the plot on the report’s growth. If a report doesn’t feel they’re headed in the direction they want at the pace they expect, their engagement and productivity will plummet, and their departure becomes imminent.</p>

<p>If people know what they want to be when they grow up, connect the work they’re doing to that path. For engineers interested in the management track, prioritize work involving mentoring interns or working cross-functionally with other teams. For those who want to build their external presence, look for projects that could become “conference talk worthy” - novel approaches or interesting problems the broader security community would want to hear about.</p>

<p>In an ideal world, we’d all have deep clarity about how our career journey should unfold. In reality, many people don’t know how they want to grow. Something like a career ladder can be useful as a default “here’s what growth looks like” description. Remember there’s no one true path for anyone’s career, and if you’re creative about how work can foster growth, you can connect world-weary engineers with new challenges and kindle productive energy.</p>

<p>The world of security is broad and deep, and people are unique. Figuring out how to get the most out of your team will continue to be the mark of effective security management. Use this three-part framework to evaluate options and get people onboard with winning projects.</p>]]></content><author><name>Alex Smolen</name></author><summary type="html"><![CDATA[A framework for helping security engineers choose high-impact work using three criteria - business goals, implicit interest, and personal growth.]]></summary></entry></feed>