Joshua Coles 2023-03-19 16:32:37 +00:00
parent 22ba78d068
commit 4f7f018858
6 changed files with 114 additions and 71 deletions

.gitignore

@@ -1,2 +1,3 @@
.texpadtmp
*.pdf
tectonic


@@ -1,56 +1,36 @@
\section*{Appendix}
\subsection*{Generic DLA Model}
\label{generic-dla}
The main tool used to generate the data in this report was the generic DLA framework written to support the paper. Here we will briefly discuss the process of creating a verifying this framework.
The main tool used to generate the data in this report was the generic DLA framework written to support the report. Here we will briefly discuss the process of creating and verifying this framework.
Innate within the problem of designing an exploratory model is the question of correctness, is something unusual you are observing a bug in your model, or an interesting new behaviour to explore.
% TODO Do I want to ref an example of this, it sound fun.
To counter this we operated on a system of repeatedly grounding alterations of our model to the previous version, treating the initially provided codebase as our root ground, once its behaviour had been verified to be roughly in accordance to the literature.
An intrinsic problem of developing computational models for exploratory work is the question of correctness: is some novel result you find a bug in your model, or exactly the interesting new behaviour you set out to explore?
To this end, starting with the initially provided code we made the aforementioned minimal alterations such that it would run in reasonable time\fnmark{macos-speed} and output the data required for later analysis. This data was then analysed and compared with literature (refer to the results for this work).
To mitigate this issue the model for this system was created iteratively, with each step being checked against the last where their domains overlap (naturally the newer model is likely to cover a superset of the domain of the old model, so there will be some areas where they do not), alongside unit tests of specific behaviours to verify expectations\fnmark{unit-test-egs}.
\fntext{unit-test-egs}{Examples that came up in development include: ensuring our uniform random walks are indeed uniform, that they visit all the desired neighbours, etc.}
This creates a chain of grounding between one model and the next, where our trust in model $N+1$ is grounded in our trust in model $N$ and our unit testing. However, this chain of trust depends on our trust in model $N = 0$, the initially provided code. For this we rely on both the extensive history of the code, and (rough) agreement with literature (see the results section for this comparison).
To this end, starting with the initially provided code we made the minimal alterations necessary such that it would run in reasonable time\fnmark{macos-speed} and output the data required for later analysis. This was done explicitly with the goal of perturbing the initial code's behaviour as little as possible, including not performing relatively obvious performance improvements that might introduce bugs (the previously mentioned performance improvements were predominantly code removal as opposed to code change). This allowed us to collect the data we needed and ground the initial model in theory.
\fntext{macos-speed}{When running on macOS systems the rendering code slows down the model by several orders of magnitude, making it unsuitable for large scale modelling; hence it was removed and replaced with separate image generation, as discussed later.}
Once rough accordance with literature was obtained (see Figure \ref{nc-fd-convergence}), and most importantly, consistency between runs (verifying against an ill-behaved system is a fruitless and painful endeavour), we added the sticking probability alteration as the simplest alteration to the DLA algorithm, verifying agreement between the traditional and probabilistic sticking models at $p_{stick} = 1$.
Once rough accordance with literature was obtained (see Figure \ref{nc-fd-convergence}), and most importantly, consistency between runs (verifying against an ill-behaved system is a fruitless and painful endeavour), we added the sticking probability alteration as the simplest alteration to the DLA algorithm, verifying agreement between the traditional and probabilistic sticking models at $p_{stick} = 1$. See Figure \ref{sp-fd-rust-vs-c} for this comparison.
% TODO Rust vs C nc-fd-convergence graph
This then provided sufficient data for us to transition to our new generic framework, verifying that it agreed with this dataset to ensure correctness. In addition unit tests for a number of key behaviours were written, benefiting greatly from the composability of the system.
\begin{figure}[t]
\includegraphics[width=\columnwidth]{figures/sp-fd-rust-vs-c.png}
\caption{A comparison of the reported fractal dimension for the probabilistic sticking extension of the Initially Provided Code (IPC + PS) in blue, and the New Framework with probabilistic sticking enabled (NF) in red. We can clearly see a high degree of agreement, grounding our new framework and the basic functions of the model.}
\label{sp-fd-rust-vs-c}
\end{figure}
%We will investigate these characteristics in turn as time and computational modelling allows through the following process. Starting with the provided code and working towards a more bespoke and customisable model.
%
% We first took the initially provided code for DLA modelling \ref{IPC} and make minimal alterations such that it will run with reasonable speed \footnote[1]{When running on macOS systems the rendering code slows down the model by several orders of magnitude making it unsuitable for large scale modelling, hence it is removed and replaced with image generation mitigation as discussed later.} and output data for analysis.
This data will be analysed and compared with literature\ref{initial-fractal-dimension-data} to confirm agreement. This will then act as a baseline implementation of the DLA model against which we can compare future alterations to ground them and ensure preservation of correct behaviour. In addition we will be creating a small auxiliary program to generate static images of the final result for manual verification of qualitative characteristics as the rendering code is not suitable for large data collection\footnotemark[1].
We first took the initially provided code \cite{IPC} and made minimal alterations such that the code ran in reasonable time\footnote{When running on macOS systems the rendering code slows down the model by several orders of magnitude, making it unsuitable for large scale modelling; hence it was removed and visualisation was handled externally.} and output data for analysis. For large configuration space exploring runs the code was run using \cite{GNUParallel} to allow for substantially improved throughput.
This then provided sufficient data for us to transition to our new generic framework, verifying that it agreed with this dataset to ensure correctness.
%TODO Should we reference git commits here? Or keep them all in one repo. Maybe a combo and have them as submodules in a report branch allowing for a linear history and also concurrent presentation for a report.
Once this minimal viable alteration is complete we will implement our first proper change to the system, introducing a sticking probability, $p_{stick}$, such that a particle is no longer guaranteed to stick when moving adjacent to the cluster, but instead has a chance of simply passing by. This represents a change in our first identified orthogonal behaviour of the model, and the simplest to implement in the framework of the initially provided code. We will verify behaviour against the minimal viable alteration to ensure it is correct. Once this has been done this data will then be analysed to identify a quantitative relationship between $p_{stick}$ and our observables previously listed.
\subsection*{Auxiliary Programs}
\todo{Do we want to show that bouncing has no real effect}
A number of auxiliary programs were also developed to assist in the running and visualisation of the model. Most notable was the image generation tool, which allowed the model to focus on one thing: modelling DLA, separating out generating visualisations of the system. This was used to generate images such as that shown in Figure \ref{dla-eg}, which are useful both for presentation and for visual qualitative assessment of model correctness.
\todo{Do we want to talk about testing, for example that we get a uniform offset, etc.}
\todo{Do we have any theory to link for this? Probably in results but worth bearing in mind}
For further alterations a new codebase will be engineered to allow for more efficient alteration of the other two, more systematic, orthogonal characteristics of the system, containing initially the sticking probability alteration. To ensure fidelity of results we will compare the behaviour and observables of this new system to that of the minimal viable alteration, as well as the sticking probability alteration of it \todo{Better word}.
Once accuracy has been determined the model will be embedded in spaces of higher dimensions, with different values of $p_{stick}$, to observe changes in our desired behaviour and compared against literature where possible.
Finally a system for more complex particle motion will be developed such that we can plug in multiple walk modes in addition to a standard random walk, for example by introducing an external force or various varieties.
\section{Specific Method}
\begin{enumerate}
\item Choice of maxParticles such that it converges?
\item Use of {convergent} eee
\end{enumerate}
Choice of maxParticles such that it converges?


@@ -1,28 +1,33 @@
\singlecolumnabstract{
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Donec quam felis, ultricies nec, pellentesque eu, pretium quis, sem. Nulla consequat massa quis enim. Donec pede justo, fringilla vel, aliquet nec, vulputate eget, arcu. In enim justo, rhoncus ut, imperdiet a, venenatis vitae, justo. Nullam dictum felis eu pede mollis pretium. Integer tincidunt. Cras dapibus. Vivamus elementum semper nisi. Aenean vulputate eleifend tellus. Aenean leo ligula, porttitor eu, consequat vitae, eleifend ac, enim. Aliquam lorem ante, dapibus in, viverra quis, feugiat a, tellus. Phasellus viverra nulla ut metus varius laoreet. Quisque rutrum. Aenean imperdiet. Etiam ultricies nisi vel augue.
}
% TODO Write abstract
\medskip
% TODO Do I want a TOC?
%\tableofcontents
\section*{Introduction}
Diffusion-limited aggregation (DLA) models processes where the diffusion of small particles into a larger aggregate is the limiting factor in a system's growth. It is applicable to a wide range of systems such as, A, B, and C.
% TODO Provide examples
This process gives rise to structures which are fractal in nature (for example see Figure \ref{dla-eg}), i.e.\ objects which contain detailed structure at arbitrarily small scales. These objects are associated with a fractal dimension, $\mathrm{fd}$ or $df$. This number relates how measures of the object, such as mass, scale when the object itself is scaled. For a non-fractal object this will be its traditional dimension: if you double the scale of a square, you quadruple its area, $2 ^ 2$; if you double the scale of a sphere, you octuple its volume, $2 ^ 3$. For a DLA aggregate in a 2D embedding space, its ``traditional'' dimension would be 1, as it is not by nature a 2D object, but its fractal dimension is higher than that.
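As a brief illustration of this scaling (a sketch in our own notation, where $M$ denotes a measure such as mass, $L$ a linear scale, and $\lambda$ a scale factor, none of which are fixed elsewhere in this report), the fractal dimension $\mathrm{fd}$ can be taken to satisfy
\[
M(\lambda L) = \lambda^{\mathrm{fd}} M(L),
\]
so that doubling the scale ($\lambda = 2$) gives a factor of $2^2 = 4$ for a square and $2^3 = 8$ for a sphere, while for a fractal object the exponent need not be an integer.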
% TODO We need to clean up the symbol
% TODO Source the fractal dimension
In this paper we will consider a number of alterations to the standard DLA process and the effect they have on the fractal dimension of the resulting aggregate. This data will be generated by a number of computational models derived initially from the code provided \cite{IPC} but altered and optimised as needed for the specific modelling problem.
% Mention MVA I think so I can reference it in the section on spaces alteration.
% TODO Explain Fractal Dimension
In this paper we will consider a number of alterations to the standard DLA process and the effect they have on the fractal dimension of the resulting aggregate. This data will be generated by a number of computational models derived initially from the code provided \cite{IPC} but altered and optimised as needed for the specific modelling problem.
\begin{figure}[t]
\includegraphics[width=\columnwidth]{figures/dla-eg}
\caption{A $5000$ particle aggregate on}
\caption{A $5000$ particle aggregate on a 2D square grid.}
\label{dla-eg}
\end{figure}
% TODO Do I want to show something akin to the comparison image with a 2x2 grid of different sizes?
% TODO Extension, can do we do something akin to renormalisation with that scaling property?
\section*{Discussion}
@@ -49,24 +54,16 @@ Finally we arrive at the final characteristic we will consider: the space that t
\section*{Method}
%TODO Include a note on long running and exploration simulations in the methodology section?
To this end we designed a generic system such that these different alterations of the traditional DLA model could be written, explored, and composed quickly, whilst generating sufficient data for statistical measurements. This involved separating the various orthogonal behaviours of the DLA algorithm into components which could be combined in a variety of ways, enabling a number of distinct models to exist concurrently within the same codebase.
% TODO Verify stats for said statistical measurements!!!
This code was based off the initially provided code, altered to allow for data extraction and optimised for performance. For large configuration space exploring runs the code was run using GNU Parallel \nocite{GNUParallel} to allow for substantially improved throughput.
This code was based off the initially provided code, altered to allow for data extraction and optimised for performance. For large configuration space exploring runs the code was run using GNU Parallel \nocite{GNUParallel} to allow for substantially improved throughput (as opposed to long-running, high-$N$ simulations, which were simply left to run).
The code was written such that it is reproducible based on a user provided seed for the random number generator; this provided the needed balance between reproducibility and repeated runs. Instructions for building the specific models used in the paper can be found in the appendix.
We first took the initially provided code \cite{IPC} and made minimal alterations such that the code ran in reasonable time\footnote{When running on macOS systems the rendering code slows down the model by several orders of magnitude, making it unsuitable for large scale modelling; hence it was removed and visualisation was handled externally.} and output data for analysis.
\subsection*{Statistical Considerations}
% TODO Is this something we need to talk about?
% TODO Verify stats for said statistical measurements!!!
%\subsection*{Statistical Considerations}
% TODO Is this something we need to talk about? Or should it be in the appendix?
\subsection*{Fractal Dimension Calculation}


@@ -11,12 +11,12 @@
\usepackage{geometry} % to change the page dimensions
\geometry{a4paper, left=17.5mm, right=17.5mm, textwidth=85mm,columnsep=5mm, top=32mm}
%\setlength{\parindent}{0pt}
\setlength{\parskip}{8pt}
\usepackage[backend=bibtex, dateabbrev=false]{biblatex}
\usepackage{graphicx} % support the \includegraphics command and options
\usepackage[parfill]{parskip} % Activate to begin paragraphs with an empty line rather than an indent
%\usepackage[parfill]{parskip} % Activate to begin paragraphs with an empty line rather than an indent
%%% PACKAGES
\usepackage{booktabs} % for much better looking tables


@@ -34,6 +34,24 @@
file = {/Users/joshuacoles/Zotero/storage/F2M3CGET/batty1989.pdf.pdf;/Users/joshuacoles/Zotero/storage/YAGMYPYZ/Batty et al. - 1989 - Urban Growth and Form Scaling, Fractal Geometry, .pdf}
}
@article{bentleyMultidimensionalBinarySearch1975,
title = {Multidimensional Binary Search Trees Used for Associative Searching},
author = {Bentley, Jon Louis},
date = {1975-09},
journaltitle = {Communications of the ACM},
shortjournal = {Commun. ACM},
volume = {18},
number = {9},
pages = {509--517},
issn = {0001-0782, 1557-7317},
doi = {10.1145/361002.361007},
url = {https://dl.acm.org/doi/10.1145/361002.361007},
urldate = {2023-03-18},
abstract = {This paper develops the multidimensional binary search tree (or k -d tree, where k is the dimensionality of the search space) as a data structure for storage of information to be retrieved by associative searches. The k -d tree is defined and examples are given. It is shown to be quite efficient in its storage requirements. A significant advantage of this structure is that a single data structure can handle many types of queries very efficiently. Various utility algorithms are developed; their proven average running times in an n record file are: insertion, O (log n ); deletion of the root, O ( n ( k -1)/ k ); deletion of a random node, O (log n ); and optimization (guarantees logarithmic performance of searches), O ( n log n ). Search algorithms are given for partial match queries with t keys specified [proven maximum running time of O ( n ( k - t )/ k )] and for nearest neighbor queries [empirically observed average running time of O (log n ).] These performances far surpass the best currently known algorithms for these tasks. An algorithm is presented to handle any general intersection query. The main focus of this paper is theoretical. It is felt, however, that k -d trees could be quite useful in many applications, and examples of potential uses are given.},
langid = {english},
file = {/Users/joshuacoles/Zotero/storage/EZJWE76J/Bentley - 1975 - Multidimensional binary search trees used for asso.pdf}
}
@article{botetClusteringClustersProcesses1985,
title = {Clustering of Clusters Processes above Their Upper Critical Dimensionalities},
author = {Botet, R},


@@ -1,17 +1,18 @@
\section*{Results}
\begin{figure}[t]
\includegraphics[width=\columnwidth]{figures/rmax-n.png}
\caption{The growth of $N$ vs $r_{\mathrm{max}}$ for $20$ runs of the standard DLA model. Also included is a line of best fit for the data, less the first $50$ which are removed to improve accuracy.
% TODO Check all of my captions are correct.
% TODO Add information for this
}
\label{rmax-n}
\end{figure}
\subsection*{Preliminary Work: Testing Initial Implementation and Fractal Dimension Calculations}
\label{ii-fdc}
\begin{figure}
\includegraphics[width=\columnwidth]{figures/rmax-n.png}
\caption{The growth of $N$ vs $r_{\mathrm{max}}$ for $20$ runs of the standard DLA model. The first $1000$ points are excluded when fitting to improve accuracy. The first
$50$ data points are not included as the data contains too much noise to be meaningfully displayed. Also included in the figure is the value from theory, $1.71 \pm 0.01$ from \cite[Table 1, $\langle D(d = 2)\rangle$]{nicolas-carlockUniversalDimensionalityFunction2019}.}
\label{rmax-n}
\end{figure}
\begin{figure}[t!]
\begin{figure}[hbt]
\includegraphics[width=\columnwidth]{figures/nc-fd-convergence.png}
\caption{The convergence of the fractal dimension of $20$ runs of the standard DLA model. This uses the simple calculation method. The first $50$ data points are not included as the data contains too much noise to be meaningfully displayed. Also included in the figure is the value from theory, $1.71 \pm 0.01$ from \cite[Table 1, $\langle D(d = 2)\rangle$]{nicolas-carlockUniversalDimensionalityFunction2019}.}
\label{nc-fd-convergence}
@@ -26,7 +27,17 @@ This also allows us to say with reasonable confidence that we can halt our model
\subsection{Probabilistic Sticking}
\begin{figure}[t!]
\begin{figure}[hbt]
\includegraphics[width=\columnwidth]{figures/eg-across-sp/sp-range.png}
\caption{Here we see the result of three different DLA simulations with $p_{stick} = 0.1,0.5,1.0$ from left to right. Note the thickening of the arms at low probabilities.}
\label{sp-dla-comparison}
\end{figure}
As discussed one of the possible alterations of the system is the introduction of a probabilistic component to the sticking behaviour of the DLA system. Here we introduced a probability $p_{stick}$ to the initial grid based sticking behaviour of the particles, with the particle being given this probability to stick at each site (for example, if the particle was adjacent to two cells in the aggregate, then the probabilistic aspect would apply twice).
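As a worked illustration of the example above, and assuming each adjacent aggregate cell gives an independent chance $p_{stick}$ of sticking (an assumption made here for illustration, with $k$ denoting the number of adjacent aggregate cells), the overall probability that the particle sticks at such a site would be
\[
P_{\mathrm{stick}}(k) = 1 - (1 - p_{stick})^{k},
\]
so for $k = 2$ and $p_{stick} = 0.5$ this gives $1 - 0.5^2 = 0.75$, and for $p_{stick} = 1$ it reduces to $1$ for any $k \geq 1$, recovering the traditional sticking behaviour.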
Comparing the results from different runs we can see in Figure \ref{sp-dla-comparison} the clear thickening of the arms with lower values of $p_{stick}$. This aligns with our observation of the fractal dimension, as seen in Figure \ref{sp-fd}.
\begin{figure}[hbt]
\includegraphics[width=\columnwidth]{figures/sp-fd}
\caption{The fractal dimension for the DLA system on a 2D grid lattice as a function of the sticking probability $p_{stick}$. This value was obtained from $100$ runs with different seeds, by computing the value of the fractal dimension using the simple method and taking a mean across the last $100$ measurements on a $2000$ particle cluster.
% TODO These numbers are way too small given the results of Figure 1.
@@ -34,10 +45,46 @@ This also allows us to say with reasonable confidence that we can halt our model
\label{sp-fd}
\end{figure}
As discussed one of the possible alterations of the system is the introduction of a probabilistic component to the sticking behaviour of the DLA system. Here we introduced a probability $p_{stick}$ to the initial grid based sticking behaviour of the particles, with the particle being given this probability to stick at each site (for example, if the particle was adjacent to two cells in the aggregate, then the probabilistic aspect would apply twice).
This was also the case used to ground both the minimally altered code, and our new generic system, to ensure they are functioning correctly. The data for both is presented in Figure \ref{sp-fig}. As we would expect we see a g
This was also the case used to ground both the minimally altered code, and our new generic system, to ensure they are functioning correctly. This is discussed in more depth in the Appendix, \nameref{generic-dla}.
\subsection*{Higher Dimensions}
The next alteration to explore is changing the embedding space to be higher dimensional. This is an excellent example of the versatility of the generic DLA framework as for higher dimensions it becomes advantageous to move from an array based grid storage to a k-dimensional tree structure for more efficient storage, while only moving to $O(\log n)$ average performance for searches and inserts\cite{bentleyMultidimensionalBinarySearch1975}.
Running with various values of $p_{stick}$ we get the results shown in Figure \ref{sp-fd-2d-3d}.
\begin{figure}
\includegraphics[width=\columnwidth]{figures/3d-nc-fd-convergence}
\caption{TODO}
\label{3d-nc-fd-convergence}
\end{figure}
\begin{figure}
\includegraphics[width=\columnwidth]{figures/sp-fd-2d-3d}
\caption{TODO}
\label{sp-fd-2d-3d}
\end{figure}
% TODO Do I want to do higher dimensions still 4d?
% TODO how am I going to cope with the fact we don't agree with theory?
% TODO Look at theory to see if I can find a curve for these sp-fd graphs or at the very least note similarities and differences between them. (Given the erroneous behaviour for low sp we are uncertain as to the correctness). Maybe take another crack at boxcount since you've mentioned it and it might be interesting.
\begin{enumerate}
\item The next obvious extension is 3D
\item Try with on axis and off-axis movement
\item See that off-axis has no effect but is quicker as traverses space quicker (I mean also validate this)
\item Now try 3D + SP
\end{enumerate}
% MORE EXTENSIONS
\subsection*{Continuous Space}
\begin{enumerate}
\item We get a divergence from theory, what happens if we use continuous
\end{enumerate}
\subsection*{Hexagonal}
\subsection*{External force onto wall}