<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Data on Morten Ankerstjerne</title>
    <link>https://mortenankerstjerne.com/categories/data/</link>
    <description>Recent content in Data on Morten Ankerstjerne</description>
    <generator>Hugo</generator>
    <language>en-us</language>
    <lastBuildDate>Fri, 18 Jul 2025 11:04:47 +0200</lastBuildDate>
    <atom:link href="https://mortenankerstjerne.com/categories/data/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Repeat last known value</title>
      <link>https://mortenankerstjerne.com/posts/2025/repeat-last-known-value/</link>
      <pubDate>Fri, 18 Jul 2025 00:00:00 +0000</pubDate>
      <guid>https://mortenankerstjerne.com/posts/2025/repeat-last-known-value/</guid>
      <description>&lt;p&gt;&lt;em&gt;Demo scripts &lt;a href=&#34;#demo-scripts&#34;&gt;below&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;&#xA;&lt;p&gt;&lt;em&gt;&lt;strong&gt;Disclaimer:&lt;/strong&gt; This technique is by no means something I came up with; I&amp;rsquo;ve just needed it often enough that I figured I might as well put it into my own words for future reference.&lt;/em&gt;&lt;/p&gt;&#xA;&lt;h2 id=&#34;background&#34;&gt;Background&lt;/h2&gt;&#xA;&lt;p&gt;Suppose you have a collection of time series data recording some status or other measurement at points in time.&lt;/p&gt;&#xA;&lt;p&gt;In an OLTP system, we probably don&amp;rsquo;t want to store the same value again every time we take a measurement, or we may record changes to a ledger only when they occur.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Stop auto-shrinking your staging database</title>
      <link>https://mortenankerstjerne.com/posts/2025/stop-shrinking-stage/</link>
      <pubDate>Thu, 05 Jun 2025 00:00:00 +0000</pubDate>
      <guid>https://mortenankerstjerne.com/posts/2025/stop-shrinking-stage/</guid>
      <description>&lt;h2 id=&#34;background&#34;&gt;Background&lt;/h2&gt;&#xA;&lt;p&gt;If you&amp;rsquo;ve worked with data warehouse development on Microsoft SQL Server, you have probably had a run-in with your SAN admin at some point, with them complaining about the size of your databases and asking whether you can do some cleanup to free up disk space.&lt;/p&gt;&#xA;&lt;p&gt;What often happens next is that you take a look at the database files and realize there&amp;rsquo;s a lot of free space that could be released, especially if you have a database dedicated to staging transformed data before loading it into data marts.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Ambiguity in daylight saving time</title>
      <link>https://mortenankerstjerne.com/posts/2025/daylight-saving-time/</link>
      <pubDate>Mon, 28 Apr 2025 00:00:00 +0000</pubDate>
      <guid>https://mortenankerstjerne.com/posts/2025/daylight-saving-time/</guid>
      <description>&lt;p&gt;&lt;em&gt;Demo scripts &lt;a href=&#34;#demo-scripts&#34;&gt;below&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;&#xA;&lt;h2 id=&#34;background&#34;&gt;Background&lt;/h2&gt;&#xA;&lt;p&gt;Most data professionals will have to deal with dates at one point or another. More often than not, we don&amp;rsquo;t get to choose the data types in the source systems we work with, so some interesting challenges may come up.&lt;/p&gt;&#xA;&lt;p&gt;One such challenge is daylight saving time (or summer time, as it is called in Europe), when clocks are moved forward or back an hour to make better use of daylight during waking hours (this is not universal, but it is observed throughout most of Europe and North America, &lt;a href=&#34;https://en.wikipedia.org/wiki/Daylight_saving_time_by_country&#34;&gt;among other places&lt;/a&gt;).&lt;/p&gt;</description>
    </item>
    <item>
      <title>Column order matters in Clustered Indexes</title>
      <link>https://mortenankerstjerne.com/posts/2025/index-column-order/</link>
      <pubDate>Sun, 23 Mar 2025 00:00:00 +0000</pubDate>
      <guid>https://mortenankerstjerne.com/posts/2025/index-column-order/</guid>
      <description>&lt;p&gt;&lt;em&gt;Demo scripts &lt;a href=&#34;#demo-scripts&#34;&gt;below&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;&#xA;&lt;p&gt;I recently worked on a problem for a customer who was experiencing performance issues with a job on their SQL Server database.&lt;/p&gt;&#xA;&lt;p&gt;Because they use SQL Server Standard Edition, their solution included a home-rolled version of table partitioning, where two tables were created every week:&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;one contained 10-second aggregate values, written in microbatches every 10 seconds&lt;/li&gt;&#xA;&lt;li&gt;the other contained 5-minute values, written from the 10-second table every 5 minutes&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;p&gt;All of these aggregates were based on raw measurements from their various systems, one measurement per device.&lt;/p&gt;</description>
    </item>
  </channel>
</rss>
