SEMINAR REPORT PDF
Radar sets use the echo to determine the direction and distance of the reflecting object. The radar antenna illuminates the target with a microwave signal, which is then reflected and picked up by a receiving device. The electrical signal picked up by the receiving antenna is called echo or return. The radar signal is generated by a powerful transmitter and received by a highly sensitive receiver. All targets produce a diffuse reflection i.
The reflected signal is also called scattering. Backscatter is the term given to reflections in the opposite direction to the incident rays. Radar signals can be displayed on the traditional plan position indicator (PPI) or on other, more advanced radar display systems.
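The distance measurement mentioned above follows from the echo's round-trip delay: the pulse travels to the target and back, so the range is half the distance covered at the speed of light. The sketch below is only an illustration of that arithmetic; the function name `echo_range_m` is hypothetical, and propagation at the vacuum speed of light is assumed.

```python
# Speed of light in vacuum, in metres per second (assumed propagation speed).
C = 299_792_458.0


def echo_range_m(round_trip_s: float) -> float:
    """Return the distance to the reflecting object, in metres.

    round_trip_s is the time between transmitting the pulse and
    receiving its echo, in seconds.
    """
    if round_trip_s < 0:
        raise ValueError("round-trip time must be non-negative")
    # The pulse covers the range twice (out and back), hence the division by 2.
    return C * round_trip_s / 2.0


if __name__ == "__main__":
    # An echo received 1 ms after transmission.
    print(echo_range_m(1e-3))
```

The bearing is not computed here; as the text notes, it comes from the pointing direction of the antenna.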
A PPI has a rotating vector with the radar at the origin, which indicates the pointing direction of the antenna and hence the bearing of targets.
Transmitter: The radar transmitter produces the short-duration, high-power RF pulses of energy that are radiated into space by the antenna.
Switching the antenna between the transmitter and the receiver is necessary because the high-power pulses of the transmitter would destroy the receiver if their energy were allowed to enter it. The receiver amplifies and demodulates the received RF signals and provides video signals at its output.
Radar Antenna: The antenna transfers the transmitter energy to signals in space with the required distribution and efficiency.
This process is applied in an identical way on reception.
For one thing, it solves a long-standing issue in gesture-recognition technology. Previous forays into the topic yielded almost-answers such as stereo cameras, which have difficulty understanding the overlap of fingers, e. Google ATAP's answer is radar. Radar is capable of interpreting an object's position and motion even through other objects, making it well suited to a sensor that can be embedded in different kinds of devices, such as smartphones.
The difficulty was that radar hardware is too large for wearable applications. Way too large.
Even the scaled-down early prototypes ATAP developed were about the size of a briefcase, and that's including the antenna array.
RBMs perform a kind of factor analysis on input data, extracting a smaller set of hidden variables that can be used as a data representation. An RBM differs from other representation algorithms in machine learning (ML) in two respects. Since it is generative, it can generate data on its own after learning.
Figure 4. Comparison of model structures
Their training can be done from a large supply of unlabeled sensory inputs, and very limited labeled data can then be used to only slightly fine-tune the model for the specific task at hand.
DBMs also handle ambiguous inputs more robustly, because they incorporate top-down feedback in the training procedure. However, the correlation between features within each data modality is much stronger than that between data modalities. As a result, the learning algorithms are easily tempted to learn the dominant patterns in each data modality separately while giving up on learning patterns that occur simultaneously in multiple data modalities.
To resolve this issue, deep learning methods such as deep autoencoders or deep Boltzmann machines (DBM) have been adapted. The common strategy is to learn joint representations that are shared across multiple modalities at the higher layers of the deep network, after learning layers of modality-specific networks. The rationale is that the learned features may have less within-modality correlation than raw features, which makes it easier to capture patterns across data modalities.
This has shown promise, but there still remains the challenging question of how to learn associations between multiple heterogeneous data modalities so that we can effectively deal with missing data modalities at testing time. One necessary condition for a good generative model of multimodal data is the ability to predict or reason about missing data modalities given partial observation.
Their emphasis is on efficiently learning associations between heterogeneous data modalities. According to their study, the data from multiple sources are semantically correlated and provide complementary information about each other, and a good multimodal model must be able to generate a missing data modality given the rest of the modalities.
In the first phase, each RBM component of the proposed multimodal DBM is pre-trained using the greedy layer-wise pre-training strategy.
Under this scheme, we train each layer with a set of different parameters and choose the best-performing parameter set for the model. For this, a contrastive divergence (CD) algorithm is used, since the cost of exact computation grows with the number of neurons.
The 1-step contrastive divergence (CD1) algorithm is widely used for RBM training to learn the parameters approximately. CD allows us to approximate the gradient of the log-likelihood with respect to the parameters.
The approximation of the gradient is based on a Markov chain. In the CD1 algorithm, a Markov chain is run for one full step, and then the parameters are modified so that the chain is less likely to wander away from the initial distribution. This reduces the time and computational effort, since we do not wait for the chain to reach its equilibrium state before comparing the initial and final distributions.
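The CD1 procedure described above can be sketched for a small binary RBM as follows. This is a minimal illustration under stated assumptions rather than the implementation used in the report: the function name `cd1_update`, the learning rate `lr`, and the choice of binary (Bernoulli) visible and hidden units are all assumptions made for the example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def cd1_update(v0, W, b, c, lr=0.1, rng=None):
    """Apply one CD-1 update to a binary RBM and return the new (W, b, c).

    v0: (batch, n_visible) array of 0/1 training data
    W:  (n_visible, n_hidden) weights
    b:  (n_visible,) visible biases
    c:  (n_hidden,) hidden biases
    """
    rng = np.random.default_rng() if rng is None else rng

    # Positive phase: hidden probabilities and a sample, given the data.
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)

    # One full Markov-chain step: reconstruct the visibles, then the hiddens.
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)

    # Move the parameters towards the data statistics and away from the
    # statistics after one step of the chain.
    n = v0.shape[0]
    W = W + lr * (v0.T @ ph0 - v1.T @ ph1) / n
    b = b + lr * (v0 - v1).mean(axis=0)
    c = c + lr * (ph0 - ph1).mean(axis=0)
    return W, b, c


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = np.array([[1, 1, 0, 0], [0, 0, 1, 1]], dtype=float)
    W = 0.01 * rng.standard_normal((4, 3))
    b, c = np.zeros(4), np.zeros(3)
    for _ in range(100):
        W, b, c = cd1_update(data, W, b, c, rng=rng)
    print(W.shape, b.shape, c.shape)
```

Only a single reconstruction step is taken per update, which is what distinguishes CD1 from running the chain to equilibrium.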
The distribution produced by the Markov chain can be regarded as an approximation of the distribution defined by the RBM, since both depend on the same energy function. Running the chain for only one step is sufficient because the result of that single step already indicates the direction in which the parameters should change, that is, the direction of the gradient. CD1 does, however, perform poorly in approximating the size of the change in parameters.
The key intuition is that for the lower-level RBM to compensate for the lack of top-down input into h1, the input must be doubled, with the copies of the visible-to-hidden connections tied. Conversely, for the top-level RBM to compensate for the lack of bottom-up input into h2, the number of hidden units is doubled. For the intermediate layers, the RBM weights are simply doubled.
An alternative algorithm is supervised, greedy and layer-wise: the deep network is obtained by training each layer as the hidden layer of a supervised one-hidden-layer neural network. During each phase of the greedy unsupervised training strategy, layers are trained to represent the dominant factors of variation extant in the data.
This has the effect of leveraging knowledge of X to form, at each layer, a representation of X consisting of statistically reliable features of X that can then be used to predict the output (usually a class label) Y.
This perspective places unsupervised pre-training well within the family of learning strategies collectively known as semi-supervised methods. As with other recent work demonstrating the effectiveness of semi-supervised methods in regularizing model parameters, we claim that the effectiveness of the unsupervised pre-training strategy is limited to the extent that learning P(X) is helpful in learning P(Y|X).
Here, we find transformations of X (learned features) that are predictive of the main factors of variation in P(X), and when the pre-training strategy is effective, some of these learned features of X are also predictive of Y. In the context of deep learning, the greedy unsupervised strategy may also have a special function. This proxy criterion encourages significant factors of variation, present in the input data, to be represented in intermediate layers.
Here the methodology is to split the learning process into two stages. First, each RBM component of the proposed multimodal DBM is pre-trained using the greedy layer-wise pre-training strategy. In this stage, the time cost of exactly computing the derivatives of the probability distributions with respect to the parameters increases exponentially with the number of units in the network. Thus, we adopt 1-step contrastive divergence, an approximate learning method.
The second way is to infer the missing modalities by alternating Gibbs sampling. Meanwhile, the joint representation is updated with the generated data of the missing modalities.
RBM: The proposed network architecture, shown above, is composed of three different pathways for the visual, auditory and textual modalities, respectively.
Each pathway is formed by stacking multiple Restricted Boltzmann Machines (RBMs), aiming to learn several layers of increasingly complex representations of an individual modality. Unlike other deep networks for feature extraction, such as Deep Belief Networks (DBN) and denoising autoencoders (dA), the DBM is a fully generative model that can be used to extract features from data with certain modalities missing.
Additionally, besides the bottom-up information propagation found in DBN and dA, top-down feedback is also incorporated in the DBM, which makes it more stable on missing or noisy inputs such as weakly labeled data on the Web. The pathways eventually meet, and the sophisticated non-linear relationships among the three modalities are jointly learned. Every RBM tries to optimize its energy function in order to maximize the probability of the training data.
DBNs can be trained using the CD algorithm to extract a deep hierarchical representation of the training data. During the learning process, the DBN is first trained one layer at a time, in a greedy unsupervised manner, by treating the values of the hidden units in each layer as the training data for the next layer (except for the first layer, which is fed with the raw input data).
This learning procedure, called pre-training, finds a set of weights that determine how the variables in one layer depend on the variables in the layer above. If the network is to be used for a classification task, then supervised discriminative fine-tuning is performed by adding an extra layer of output units and back-propagating the error derivatives using some form of stochastic gradient descent (SGD).
Erhan et al. examined why unsupervised pre-training helps. One possible explanation is that pre-training initializes the parameters of the network in an area of parameter space where optimization is easier and a better local optimum is found.
This is equivalent to penalizing solutions that are outside a particular region of the solution space. Another explanation is that pre-training acts as a kind of regularizer that minimizes the variance and introduces a bias towards configurations of the parameters that stochastic gradient descent can explore during the supervised learning phase, by defining a data-dependent prior on the parameters obtained through the unsupervised learning.
In other words, pre-training implicitly imposes constraints on the parameters of the network to specify which minimum, out of all local minima of the objective function, is desired.
Multimodal DBM: The proposed network architecture is composed of three different pathways for the visual, auditory and textual modalities, respectively.
The final joint representation can be viewed as a shared embedded space, where features with very different statistical properties from different modalities can be represented in a unified way.
Visual Pathway: The visual input consists of five complementary low-level features widely used in previous works. As shown in the figure, each feature is modeled with a separate two-layer DBM.
The Audio-Six descriptor, which can capture different aspects of an audio signal, is expected to be complementary to the MFCC. Since the dimension of Audio-Six is only six, we directly concatenate the MFCC feature with Audio-Six rather than separating them into two sub-pathways as in the design of the visual pathway. The correlation between these two features can be learned by the deep architecture of the DBM. The real-valued auditory features form the visible layer, and h1 and h2 denote the first and second hidden layers, respectively.
Textual Pathway: Unlike the visual and auditory modalities, the inputs of the textual pathway are discrete values i.
Thus, we use Replicated Softmax to model the distribution over the word count vectors. Let one visible unit denoting the associated metadata i.
As argued in the introduction, many real-world applications will often have one or more modalities missing.
The Multimodal DBM can be used to generate such missing data modalities by clamping the observed modalities at the inputs and sampling the hidden modalities from the conditional distribution by running the standard alternating Gibbs sampler.
Inferring Joint Representations: This fused representation is inferred by clamping the observed modalities and doing alternating Gibbs sampling, conditioning on both modalities if both are present or on the other modality alone if text is missing.
This representation can then be used to do information retrieval for multimodal or unimodal queries.
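The clamping-and-sampling procedure described above can be illustrated with a deliberately simplified model. The sketch below assumes a single joint RBM layer over binary image and text units, not the full multi-layer DBM; the function name `infer_missing_text`, the weight names, and the number of Gibbs steps are all hypothetical choices for the example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def infer_missing_text(v_img, W_img, W_txt, b_txt, c, n_steps=50, rng=None):
    """Fill in missing text units while the image units stay clamped.

    v_img: (n_img,) observed binary image units (never resampled)
    W_img: (n_img, n_hidden) image-to-joint weights
    W_txt: (n_txt, n_hidden) text-to-joint weights
    b_txt: (n_txt,) text biases
    c:     (n_hidden,) joint hidden biases
    Returns (sampled text units, joint hidden probabilities).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Start the missing modality from a random binary state.
    v_txt = (rng.random(W_txt.shape[0]) < 0.5).astype(float)

    for _ in range(n_steps):
        # Sample the joint hidden layer given both modalities.
        p_h = sigmoid(v_img @ W_img + v_txt @ W_txt + c)
        h = (rng.random(p_h.shape) < p_h).astype(float)
        # Resample only the missing modality; the image stays clamped.
        p_txt = sigmoid(h @ W_txt.T + b_txt)
        v_txt = (rng.random(p_txt.shape) < p_txt).astype(float)

    # Joint (fused) representation given the clamped and inferred inputs.
    joint = sigmoid(v_img @ W_img + v_txt @ W_txt + c)
    return v_txt, joint


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W_img = 0.1 * rng.standard_normal((6, 4))
    W_txt = 0.1 * rng.standard_normal((5, 4))
    text, joint = infer_missing_text(
        np.ones(6), W_img, W_txt, np.zeros(5), np.zeros(4), rng=rng
    )
    print(text, joint)
```

In a real DBM the sampler would alternate over several hidden layers per pathway; the single joint layer here only shows which units are clamped and which are resampled.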
Each data point in the database, whether missing some modalities or not, can be mapped to this latent space. Queries can also be mapped to this space, and an appropriate distance metric can be used to retrieve results that are close to the query.
Discriminative Tasks: Classifiers such as SVMs can be trained with these fused representations as inputs. Alternatively, the model can be used to initialize a feed-forward network, which can then be fine-tuned.
In our experiments, logistic regression was used to classify the fused representations.
Our first set of experiments evaluates the DBM as a discriminative model for multimodal data. For each model that we trained, the fused representation of the data was extracted and fed to a separate logistic regression for each of the 38 topics. The text input layer in the DBM was left unclamped when the text was missing. Hence, to make a fair comparison, our model was first trained using only labeled data with a similar set of features i.
We call this model DBM-Lab. To measure the effect of using unlabeled data, a DBM was trained using all the unlabeled examples that had both modalities present. We call this model DBM-Unlab. Adding these features improves the MAP to 0. We compared our model to two other deep learning models. These models were trained with the same number of layers and hidden units as the DBM.
Their performance was comparable to, but slightly worse than, that of the DBM. In terms of precision at 50, the autoencoder performs marginally better than the rest.
We also note that the Multiple Kernel Learning approach proposed in Guillaumin et.
However, they used a much larger set of image features 37, dimensions.
Unimodal Inputs: Next, we evaluate the ability of the model to improve classification of unimodal inputs by filling in other modalities. For multimodal models, the text input was only used during training. At test time, all models were given only image inputs. The next set of experiments was designed to evaluate the quality of the learned joint representations.
A database of images was created by randomly selecting image-text pairs from the test set. We also randomly selected a disjoint set of images to be used as queries. Each query contained both image and text modalities. Binary relevance labels were created by assuming that if any of the 38 class labels overlapped between a query and a data point, then that data point is relevant to the query.
For each model, all queries and all points in the database were mapped to the joint hidden representation under that model. The cosine similarity function was used to match queries to data points. Note that even though there is little overlap in terms of text, the model is able to perform well.
Unimodal Queries: The DBM model can also be used to query for unimodal inputs by filling in the missing modality. By effectively inferring the missing text, the DBM model was able to achieve far better results than any unimodal method MAP of 0.
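The retrieval protocol described above (label-overlap relevance, cosine matching in the joint space, and average precision per query) can be sketched as follows. This is an illustrative reading of the protocol, not the authors' evaluation code; the function names are hypothetical and all representation vectors are assumed to be non-zero.

```python
import numpy as np


def cosine_rank(query, database):
    """Return database row indices sorted by decreasing cosine similarity."""
    q = query / np.linalg.norm(query)
    d = database / np.linalg.norm(database, axis=1, keepdims=True)
    return np.argsort(-(d @ q), kind="stable")


def is_relevant(query_labels, item_labels):
    """Binary relevance: true if the two 0/1 label vectors share any class."""
    return bool(np.any(np.logical_and(query_labels, item_labels)))


def average_precision(ranked_relevance):
    """Average precision of one ranked list of 0/1 relevance flags."""
    rel = np.asarray(ranked_relevance, dtype=float)
    if rel.sum() == 0:
        return 0.0
    precision_at_k = np.cumsum(rel) / np.arange(1, len(rel) + 1)
    return float((precision_at_k * rel).sum() / rel.sum())


if __name__ == "__main__":
    query = np.array([1.0, 0.0])
    database = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    query_labels = np.array([1, 0, 0])
    db_labels = np.array([[1, 0, 0], [0, 1, 0], [1, 1, 0]])

    order = cosine_rank(query, database)
    flags = [is_relevant(query_labels, db_labels[i]) for i in order]
    print(order, flags, average_precision(flags))
```

Mean average precision (MAP) is then the mean of `average_precision` over all queries.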
The overall task can be divided into three phases: feature learning, supervised training, and testing. We keep the supervised training and testing phases fixed and examine different feature learning models with multimodal data.
In detail, we consider three learning settings: multimodal fusion, cross modality learning, and shared representation learning.
Multimodal learning setting: For the multimodal fusion setting, data from all modalities is available at all phases; this represents the typical setting considered in most prior work in audio-visual speech recognition.
In cross modality learning, one has access to data from multiple modalities only during feature learning. During the supervised training and testing phases, only data from a single modality is provided. In this setting, the aim is to learn better single-modality representations given unlabeled data from multiple modalities. This setting allows us to evaluate whether the feature representations can capture correlations across different modalities. Specifically, studying this setting allows us to assess whether the learned representations are modality-invariant.
We used all the datasets for feature learning. We ensured that no test data was used for unsupervised feature learning.
CUAVE: 36 individuals saying the digits 0 to 9.
We used the normal portion of the dataset, where each speaker was frontal-facing and spoke each digit 5 times. As there has not been a fixed protocol for evaluation on this dataset, we chose to use odd-numbered speakers for the test set and even-numbered ones for the training set. The dataset provided pre-extracted lip regions at 60x80 pixels.
As we were not able to obtain the raw audio information for this dataset, we used it for evaluation on a visual-only lipreading task. We report results on the third-test settings used for comparisons. This is a new high-definition version of the AVLetters dataset.
We used this dataset for unsupervised training only.
Stanford Dataset: We collected this data in a similar fashion to the CUAVE dataset and used it for unsupervised training only.
We note that in all datasets there is variability in the lips in terms of appearance, orientation and size. Our features were evaluated on speech classification of isolated letters and digits.
We extracted features from overlapping windows. Since examples had varying durations, we divided each example into S equal slices and performed average-pooling over each slice.
The features from all slices were subsequently concatenated together. In these experiments, we evaluate cross modality learning where one learns better representations for one modality e.
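The slicing, average-pooling and concatenation steps described above turn a variable-length sequence of window features into a fixed-length vector. The sketch below is one possible reading: the function name `slice_pool` is hypothetical, and when the number of windows is not divisible by S the slices are made as equal as possible with `np.array_split`, which is an assumption not stated in the text.

```python
import numpy as np


def slice_pool(frames, n_slices):
    """Pool a (T, D) array of per-window features into an (n_slices * D,) vector.

    The T windows are divided into n_slices consecutive, near-equal slices,
    each slice is average-pooled over time, and the pooled vectors are
    concatenated in order.
    """
    frames = np.asarray(frames, dtype=float)
    if frames.ndim != 2:
        raise ValueError("frames must have shape (T, D)")
    if frames.shape[0] < n_slices:
        raise ValueError("need at least one window per slice")
    chunks = np.array_split(frames, n_slices, axis=0)
    return np.concatenate([chunk.mean(axis=0) for chunk in chunks])


if __name__ == "__main__":
    # Six windows of two-dimensional features, pooled into S = 3 slices.
    example = np.arange(12).reshape(6, 2)
    print(slice_pool(example, 3))
```

Because the output length depends only on S and the feature dimension, examples of different durations can be fed to the same classifier.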
For the bimodal deep autoencoder, we set the value of the other modality to zero when computing the shared representation, which is consistent with the feature learning phase. All deep autoencoder models are trained with all available unlabeled audio and video data.
On the AVLetters dataset, there is an improvement over hand-engineered features from prior work. The deep autoencoder models performed the best on the dataset. On the CUAVE dataset (Table 1b), there is an improvement from learning video features with both video and audio compared to learning features with only video data.
The deep autoencoder models ultimately perform the best. In our model, we chose to use a very simple front-end that only extracts bounding boxes, without any correction for orientation or perspective changes. The video classification results show that the deep autoencoder model achieves cross modality learning by discovering better video representations when given additional audio data.
In particular, even though the AVLetters dataset did not have any audio data, we were able to obtain better performance by learning better video features using other unlabeled data sources which had both audio and video data. However, we also note that cross modality learning did not help to learn better audio features; since our feature learning mechanism is unsupervised, we find that our model learns features that adapt to the video modality but are not useful for speech classification.
It is convenient if users can submit any media content at hand as the query. Suppose we are visiting the Great Wall: by taking a photo, we may expect to use it to retrieve relevant textual materials to serve as a visual guide.
Therefore, cross-modal retrieval, as a natural way of searching, becomes increasingly important. Cross-modal retrieval aims to take one type of data as the query to retrieve relevant data of another type.
The radio-frequency (RF) energy is transmitted to and reflected from the reflecting object.
Most mechanisms for landing gear retraction systems are based upon a four-bar linkage, using three members connected by pivots. These data are employed in the early preliminary design of landing gear.
It is a particular approach to build and train neural networks. This requirement sets minimum and maximum limits for the wheel track.